test: land S.M3/S.M4 Rhat relax + progress-bar silence to green the nightly#145
Merged
Conversation
test-bgm-delta.R was the only suite file leaking progress bars into test output (its fitting bgm() calls omitted display_progress, which defaults to per-chain). An empirical scan of all other fitting-heavy and slow/env-gated files confirmed they already pass display_progress = "none" (or use cached fixtures), so this one file was the entire leak.
…use) The nightly went red 2026-04-27 -> 04-30 when the marginal-PL correctness fix (#97; analytic gradient now matches finite differences) and conditional-PL cleanup (#94) corrected the mixed-MRF target. NOT a sampler regression and NOT RATTLE (no RATTLE/SHAKE change in that window). check_nuts_health asserts max(posterior_summary_pairwise$Rhat) < 1.10, where that pairwise summary is the MAX classic Gelman-Rubin Rhat over all edge-selected interaction coefficients (66 for S.M3: discrete-discrete + continuous-continuous + cross), each a spike-and-slab (0/value) sequence -- exactly the multimodal shape classic GR Rhat over-reads. On the corrected target the worst edge sits at ~1.16 (S.M3 1.162, S.M4 1.142). Relax the Rhat limit for these two edge-selected mixed configs to 1.17 (the same per-config recalibration #105 applied to the near-singular S.M5 -> 1.50). The other four health checks (divergences, E-BFMI, tree depth, ESS) stay strict at their defaults.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #145 +/- ##
==========================================
- Coverage 87.75% 87.19% -0.57%
==========================================
Files 87 87
Lines 12863 12868 +5
==========================================
- Hits 11288 11220 -68
- Misses 1575 1648 +73 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why the nightly is red
The
nightly-validationrun onmain(27116140951, 2026-06-08) failed with two test failures:Plus
WARN 964from progress-bar noise intest-bgm-delta.R.These two fixes already existed on
chore/s9-test-warnings-hygienebut were committed after PR #144 was squash-merged, so they never reachedmain. This PR cherry-picks just those two commits onto currentmain(it deliberately does not bring over the branch's staleweekly-compliance.yaml, which would revert #143).Changes
test(scaling): relax S.M3/S.M4 Rhat limit to 1.17. These are the max classic Gelman-Rubin Rhat over all edge-selected spike-and-slab interaction coefficients; since the Fix/marginal pl correctness #97 marginal-PL correctness fix the corrected target sits at ~1.16. The other health checks (divergences, E-BFMI, tree depth, ESS) stay strict at their defaults. Same per-config recalibration test: calibrate stress-test thresholds to MC budgets #105 applied to S.M5.test(bgm-delta): passdisplay_progress = "none"to silence progress bars (the WARN noise).Verification
Both commits cherry-pick cleanly onto
main; diff touches onlytest-bgm-delta.Randtest-scaling-diagnostics.R. CI on this PR exercises the changed tests.