test(qwp): fix flaky cold-start timing in async close-drain test#54
test(qwp): fix flaky cold-start timing in async close-drain test#54mtopolnik wants to merge 1 commit into
Conversation
testAsyncCloseDrainSucceedsWhenServerWasUpAllAlong timed the whole
try (Sender = fromConfig(...)) { write; flush; } close block against a
2500ms budget, but the assertion budgets close()'s drain latency, not
sender construction. Iteration 0 is the first store-and-forward async
sender in the JVM, so fromConfig pays a one-time cold-start cost (class
loading, JIT, sf buffer mmap). On a loaded CI runner that cold start
reached ~3.26s and tripped the assertion, even though close() itself
took ~12ms.
Move t0 to wrap only sender.close(), matching the sibling test
testAsyncCloseDrainSucceedsWhenServerStartsDuringDrain. close() is
idempotent, so the try-with-resources auto-close is a no-op. Measured
close() latency stays in the single-digit milliseconds across all 20
iterations, well within the 2500ms budget.
|
Level 2. Test-only change: one 7-line move of the timing window in I verified the single load-bearing claim — that CriticalNone. ModerateNone. Minor2. Residual flake surface: 2500ms assertion vs 3000ms drain timeout (in-diff, 3. (Pre-existing, not introduced by this PR) SummaryVerdict: approve. The fix is correct and the diff is minimal. The 2500ms budget always targeted drain latency; moving |
Problem
CloseDrainTest.testAsyncCloseDrainSucceedsWhenServerWasUpAllAlongruns 20iterations. Each iteration constructs a store-and-forward async
Sender, writesand flushes one row, then closes the sender and asserts that close completes
within a 2500ms budget. That budget targets the latency of
close()'s drain,but the test timed the whole
try (Sender sender = Sender.fromConfig(cfg)) { ...; flush(); }block — construction and close — against it.Iteration 0 builds the first store-and-forward async sender in the JVM, so
fromConfigpays a one-time cold-start cost: class loading, JIT warmup, and thestore-and-forward buffer mmap. On a loaded CI runner that cold start reached
~3.26s and tripped the 2500ms assertion, even though
close()itself took~12ms. The failure is load-dependent, so it surfaced as a rare flake on busy
runners rather than deterministically.
Fix
Move the
t0 = System.nanoTime()start inside the try block so the timed regionwraps only
sender.close(). This matches the sibling testtestAsyncCloseDrainSucceedsWhenServerStartsDuringDrain, which already timesclose()alone. The cold-start cost now sits outside the measured window, whereit belongs: the assertion was always about drain latency, not sender
construction.
QwpWebSocketSender.close()is idempotent — it guards on aclosedflag andreturns immediately on the second call — so the try-with-resources auto-close at
the end of the block is a no-op.
Tradeoffs
The test no longer bounds
fromConfig/construction time at all. That was neverits intent — the 2500ms budget targets the close/drain path — and construction
runs in every other test in the suite, so this is not a coverage loss here. A
genuine construction-time regression would need a dedicated test rather than
being caught incidentally by this one.
Test plan
CloseDrainTest#testAsyncCloseDrainSucceedsWhenServerWasUpAllAlongand#testAsyncCloseDrainSucceedsWhenServerStartsDuringDrainlocally: 2 testsrun, 0 failures, 0 errors
WasUpAllAlongtest passes with the close-onlytimer; every iteration's
close()completed within the 2500ms budgetQwpWebSocketSender.close()guards on itsclosedflag, so theimplicit try-with-resources close is a no-op