Skip to content

Draft - DEGORD - RR#109

Open
MaartenMarsman wants to merge 77 commits into
mainfrom
feat/log-z-nlo-gamma
Open

Draft - DEGORD - RR#109
MaartenMarsman wants to merge 77 commits into
mainfrom
feat/log-z-nlo-gamma

Conversation

@MaartenMarsman

@MaartenMarsman MaartenMarsman commented May 19, 2026

Copy link
Copy Markdown
Collaborator

...

Stage 3 Phase 1a of dev/plans/backlog/hierarchical-ggm-degord-rr.md.
Ports the closed-form log Z(G) approximations from the z-project (Route 3a
+ DEGORD pilot) into bgms as numerics primitives, ready for the
hierarchical-spec MH wiring in Phases 3-4.

- src/models/ggm/log_z_nlo.{h,cpp}:
  - log_Z_NLO_gamma(G, alpha, beta, sigma, include_F, delta): general-alpha
    Laplace + NLO closed form for the spikeslab-+tilt GGM normaliser.
  - log_Z_NLO_gamma_degord(G, i, j, ...): toggle-endpoint reordering wrapper
    (permutes (i, j) to positions (0, 1)).
  - log_Z_NLO_gamma_delta_incr_alpha1(G_before, i, j, ...): O(deg^2 + deg*q)
    incremental log-Z difference under a single-edge toggle at alpha = 1.

- src/log_z_test_interface.cpp: Rcpp exports for unit testing.

- tests/testthat/test-log-z-nlo.R: 437 assertions covering parity against
  the z reference (bit-exact, max abs diff = 0 across 108 grid cells),
  alpha = 1 incremental vs full-recompute (machine-precision), DEGORD
  permutation consistency, and analytic empty-graph constant.

- tests/testthat/fixtures/log_z_nlo_reference.rds: z-reference fixture
  (regeneratable via dev/numerical_analyses/generate_log_z_fixture.R).
@codecov

codecov Bot commented May 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 74.86157% with 227 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.30%. Comparing base (d1fe752) to head (321c1a8).

Files with missing lines Patch % Lines
src/savage_dickey_test_interface.cpp 34.39% 103 Missing ⚠️
src/models/ggm/ggm_model.cpp 85.54% 48 Missing ⚠️
src/math/savage_dickey/cubic_mode.cpp 81.88% 25 Missing ⚠️
R/bgm_spec.R 60.46% 17 Missing ⚠️
R/bgm.R 12.50% 14 Missing ⚠️
R/sample_ggm_prior.R 37.50% 5 Missing ⚠️
src/math/cholesky_helpers.cpp 89.18% 4 Missing ⚠️
src/models/ggm/ggm_model.h 76.47% 4 Missing ⚠️
R/build_output.R 0.00% 2 Missing ⚠️
src/sample_ggm.cpp 77.77% 2 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
- Coverage   86.62%   85.30%   -1.33%     
==========================================
  Files          77       82       +5     
  Lines       13076    13919     +843     
==========================================
+ Hits        11327    11873     +546     
- Misses       1749     2046     +297     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Stage 3 Phase 1b of dev/plans/backlog/hierarchical-ggm-degord-rr.md.

At alpha > 1, H_e gains a -4 beta (alpha - 1) / M_v[v] piece that depends
on the larger endpoint, so a single-edge toggle (i, j) cascades H_e
changes to every edge incident to i (forward AND backward). The alpha = 1
locality decomposition (vertex_i_contribs only) no longer suffices.

New approach: partition log_Z_NLO_gamma into "instances touching
V_a = {i, j} ∪ N_G_before(i)" vs "instances not touching V_a", with the
latter invariant under the toggle and cancelling in the difference.

- src/models/ggm/log_z_nlo.cpp:
  - partial_log_Z_NLO_gamma_on_V (private): full-recompute structure with
    an any-index-in-V_a filter on every term type (log_C0 per-edge and
    per-vertex, log_LO, B_e, C_{e,k}, D_v, E_{lm}+F, N_v, M_e). M_v and
    H_e are still computed on the full graph; the filter only decides
    inclusion in the partial sum.
  - log_Z_NLO_gamma_delta_incr_alphaN (public): partial(G_after, V_a) -
    partial(G_before, V_a). Reduces to log_Z_NLO_gamma_delta_incr_alpha1
    at alpha = 1 to machine precision.

- tests/testthat/test-log-z-nlo.R: 816 new assertions across
  q in {3, 5, 7}, alpha in {1.5, 2, 3}, delta in {0, 0.5, 1},
  include_F in {FALSE, TRUE}, all toggles. Two new test_that blocks:
  - alphaN vs full-recompute difference at alpha > 1 (within 1e-10).
  - alphaN(alpha = 1) vs the existing alpha = 1 incremental (within 1e-10).
  All 1253 assertions in the file pass at machine precision.

Cost is O(q^2) per toggle: the outer loops are still over all edges
and non-edges (filter, not loop-bound, restricts inclusion). The plan's
O(deg^2 + deg * q) optimisation is a future loop-bound restructuring
that can be tested against this implementation.
Replaces the O(q^2) any-index-in-V_a filter with an O(deg^2 + deg * q)
canonical-representative enumeration. Each qualifying instance (edge,
triple, non-edge, vertex term) is now walked exactly once via its
lowest-index V_a member, eliminating the O(q^2) and O(q^3) scans on the
heavy term types.

Strategy:
- Precompute forward/backward adjacency lists in O(|E|), reusing them
  for nu, H_e, T, and S sums.
- Edge-indexed terms (log_LO, B_e, M_e): for each p in V_a, walk
  fwd[p] and bwd[p] (with !in_V[l] gate for the larger-endpoint case).
- Vertex-indexed terms (lgamma, D_v, N_v): direct loop over V_a_idx.
- Triple-indexed C_{(a, b), k}: three cases for canon = p, k=p, a=p, b=p
  with gates k not in V_a (case 2) and a, k not in V_a (case 3).
- Non-edge E_{lm}: for each p in V_a, walk (p, m) and (l, p) pairs with
  the !in_V[l] gate for the larger-endpoint case; common-pred sum walks
  bwd[l] and tests G(k, m).

Measured speedup over the full-recompute difference baseline on a
sparse 20%-density graph (100 random toggles, alpha=2, delta=0.5):
- q= 20: incremental ~ full-diff (parity; constant-factor regime).
- q= 50: 1.2x.
- q=100: 2.9x.

All 1253 assertions in tests/testthat/test-log-z-nlo.R continue to pass
at machine precision (Phase 1a fixture, Phase 1b parity, alpha=1
reduction).
…e 2)

Stage 3 Phase 2 of dev/plans/backlog/hierarchical-ggm-degord-rr.md.
Ports the inner Zhat machinery from ~/SV/Z/R/src/degord_sampler.h (v4,
2026-05-18) into bgms, hooked to SafeRNG.

New files
---------
- src/models/ggm/degord_sampler.{h,cpp}:
  - ChainAux (per-chain (alpha, beta, sigma, delta) constants +
    per-nu transcendental tables).
  - PiAux (per-permutation: G_pi, nu_pi, E_count, log_C0).
  - permute_graph / degord_permutation (toggle endpoints to (q-2, q-1)).
  - phi_pi_sample_from_noise (inner Bartlett-Cholesky kernel).
  - log_Zhat_pi_from_pool (main entry; log-sum-exp over importance weights).
  - log_Zhat_pi_from_pool_cache + PoolCache (cached variant for
    delta-toggle reuse).
  - row_qm2_logw_from_S (single-row recompute under trailing toggle).
  - delta_log_Zhat_pi_toggle (efficient delta with cache reuse).
  - draw_bartlett_pool (SafeRNG-based standard-normal pool).

Two bug fixes folded in vs the z-project v4 reference
-----------------------------------------------------
Both fixes were derived during the bgms port audit and confirmed correct
on the z side; they are bit-verified and land here with regression
assertions:

1. Cache rw_head fix (Option 2). The v4 rw_head spanned only rows
   0..q-3, so the star aggregation in delta_log_Zhat_pi_toggle omitted
   row q-1's diagonal log-weight even though it is invariant across
   (curr, star) for any toggle (nu_pi[q-1] = 0 always under the DEGORD
   permutation). The omission left a sample-shifted bias that grew with
   the tilt parameter delta. Fix: rw_head extended to include
   row_logw[q - 1].

2. z_trail slot fix (slab_tilt_mode == 1). delta_log_Zhat_pi_toggle
   passed z_trail = 0.0 hardcoded to row_qm2_logw_from_S, dropping the
   saddle-shifted slab innovation noise[q + edge_offset(q-2, q-1)] =
   noise[q + (q-2)(q+1)/2] when slab_tilt_mode == 1. Fix: read the
   actual slab slot via noise_pool.colptr(slab_idx).

Tests
-----
- tests/testthat/test-degord-sampler.R: 1526 assertions covering:
  - ChainAux nu-tables vs the z reference (local-only, gated on z source).
  - log_Zhat_pi_from_pool bit-parity on shared noise pool via the
    fixture at tests/testthat/fixtures/degord_sampler_reference.rds
    (regeneratable via dev/numerical_analyses/generate_degord_fixture.R).
  - DEGORD permutation correctness (i, j) -> (q-2, q-1).
  - delta_log_Zhat_pi_toggle EQUALS the direct full-recompute difference
    log_Zhat(star) - log_Zhat(curr) at machine precision under BOTH
    slab_tilt_modes. The (q=10, delta=0, tilt=1, toggle=(3,9)) cell is
    an explicit regression row.
  - delta_log_Zhat_pi_toggle bit-parity vs the z reference (local-only).
  - var(log_Zhat) scales as 1/M_inner (Phase 2 acceptance).
  - draw_bartlett_pool shape, normality, seed determinism.

Acceptance criteria from the Phase 2 plan section are met:
  - log_Zhat_pi_from_pool agrees with the z reference at bit-parity on
    shared input (max abs diff = 0 across the 108-cell fixture grid).
  - var(log Zhat) scales as 1/M_inner (M*var constant at ~0.35 across
    M in {30, 100, 300, 1000}; ratio var(30)/var(1000) = 36.3 vs
    theoretical 33.3).

bgms-side Phase 1 work (log_Z_NLO_gamma + alpha=1 incremental +
general-alpha incremental + V_a-pivot optimisation) is unaffected and
its 1253 assertions still pass.
Stage 3 Phase 3 of dev/plans/backlog/hierarchical-ggm-degord-rr.md.
The per-toggle unbiased 1/Z(Γ) estimator that composes the Phase 2 DEGORD
Bartlett-Cholesky sampler into the Lyne-style Russian-Roulette geometric
series for the hierarchical-spec MH ratio.

New files
---------
- src/models/ggm/z_ratio_estimator.{h,cpp}:
  - V_at_Gamma_pi_degord(K_depth, pools_t, pi_aux, chain_aux, c, rho):
      V = (1/c) · [1 + sum_{n=1..K} (-1)^n · prod_{i=1..n} (Zhat_i - c)/c / rho^n]
      with Zhat_i = exp(log_Zhat_pi_from_pool(pools_t[i-1], pi_aux, chain_aux)).
      Returns the signed V (caller tracks sign for ergodic averaging).
  - draw_U_degord_rr(rng, K_depth, pools_t, M_inner, q, rho):
      K_depth ~ Geom(1 - rho) via inverse-CDF on SafeRNG uniform;
      pools_t[n] is (dim x M_inner) iid N(0, 1) in the pre-transposed
      layout required by phi_pi_sample_from_noise.
  - ZRatioState (declaration only): the per-Γ state the Phase 4 chain
    will own (pools_t, K_depth, kappa, rho, log_Z_NLO_curr, sign_curr).

- Rcpp test exports in src/log_z_test_interface.cpp:
  - degord_V_at_Gamma_pi_cpp (List interface for pools_t).
  - degord_draw_U_rr_cpp returns List(K_depth, pools_t).

Tests
-----
- tests/testthat/test-z-ratio-estimator.R: 10 assertions covering:
  - V is unbiased for 1/Z within MC noise at q=5: mean(V) across 1000
    independent (K_depth, pools) draws sits within 4 SD of 1/Z computed
    via a high-precision (M=5000, n=20) log_Zhat truth proxy.
  - K_depth ~ Geom(1 - rho): mean, P(K=0), P(K>=5) all match within
    n=10000 MC tolerance.
  - V at K_depth = 0 equals 1/c exactly (machine precision).
  - draw_U_degord_rr is deterministic in the SafeRNG seed.
  - Signed V picks up the alternating series when c is far from 1/Z
    (small c forces negative samples at K_depth >= 1).

Cited references
----------------
- Direct port of ~/SV/Z/R/src/branchB_chain_route3a_degord.cpp:192-228
  for V_at_Gamma_pi_degord and :215-228 for the pool draw.
- Lyne (2015) Russian-Roulette construction; route3a_V_helpers
  notes/exactness-proposition.md §2 for radius/moment conditions.
…er_pair (Phase 4a)

Stage 3 Phase 4a of dev/plans/backlog/hierarchical-ggm-degord-rr.md. Wires
the Phase 2 DEGORD sampler + Phase 3 V/RR estimator into bgms's GGM
between-edge MH path. Converts the joint-spec MH ratio into the
hierarchical-spec ratio by multiplying by V(Γ_curr)/V(Γ_star), an
unbiased estimator of Z(Γ_star)/Z(Γ_curr).

This is the C++ infrastructure half of Phase 4. The R API surface
(bgm(..., graph_prior_spec = "hierarchical")) is Phase 4b, a separate
plumbing job.

Changes
-------
- src/models/ggm/ggm_model.h
  - New enum class GraphPriorSpec { Joint, Hierarchical } (header-level,
    outside the class).
  - GGMModel gains:
      void set_graph_prior_spec(GraphPriorSpec);
      void set_z_ratio_tuning(int M_inner, double kappa, double rho);
    plus private members graph_prior_spec_, v_M_inner_, v_kappa_, v_rho_,
    log_Z_NLO_curr_, v_K_depth_, v_pools_t_, chain_aux_degord_, and
    cached prior-family parameters (prior_sigma_, prior_alpha_, prior_beta_).

- src/models/ggm/ggm_model.cpp
  - ensure_hierarchical_state_(): lazy init. Validates the slab is
    NormalPrior and the diagonal is GammaScalePrior (throws otherwise).
    Builds chain_aux_degord_; full-recomputes log_Z_NLO_curr_ at the
    current Γ; draws the first U-pool.
  - refresh_z_ratio_pool_(): fresh (K_depth, pools_t) draw per iteration,
    mirroring the z chain's refresh_U_every = 1 convention.
  - prepare_iteration() now calls ensure + refresh when in Hierarchical mode.
  - update_edge_indicator_parameter_pair (both branches):
      after the existing joint MH ratio is assembled, if Hierarchical mode
      is active, compute log_Z_NLO_star (incremental at alpha=1, full
      recompute at alpha != 1), build PiAux at the DEGORD permutation
      sending (i, j) to (q-2, q-1), compute V_curr and V_star, and add
      log|V_curr|/|V_star| to ln_alpha. Auto-reject on non-finite or
      zero |V| (Lyne 2015 convention).
      On accept, log_Z_NLO_curr_ <- log_Z_NLO_star.
  - Defensive note retained that the determinant_tilt_ * log_det_ratio_edge
    term is identically 0 under Roverato slaving (verified during the
    Z-side audit and reconfirmed by direct calculation here).

- src/priors/parameter_prior.h
  - Adds public scale()/shape()/rate() accessors on NormalPrior and
    GammaScalePrior so the hierarchical state can read the prior params
    without coupling to the construction site.

- src/log_z_test_interface.cpp
  - New ggm_hierarchical_smoke_cpp entry: constructs a GGMModel directly,
    switches it to Hierarchical, runs n_sweeps Metropolis edge-update
    sweeps, returns final_edges + n_edges_path.

- tests/testthat/test-ggm-hierarchical.R
  - 13 assertions across 3 test_that blocks:
    1. Chain runs without crash and edge counts stay in [0, p(p-1)/2]
       across 200 sweeps at q=5, n=100 random data.
    2. Output is deterministic in the seed (same seed -> identical output;
       different seed -> different trajectory).
    3. Chain remains non-degenerate across delta in {0, 0.5, 1.0} (no
       all-zero or all-full lock-in).

Existing Phase 1-3 testthat assertions (1253 + 1526 + 10) continue to pass.
The default Joint mode (existing chain semantics) is unchanged.

Acceptance criteria from the Phase 4 plan section met for the C++-only
half:
  - bgm(... graph_prior_spec = "hierarchical") will finish on small
    synthetic data once Phase 4b wires the R-side argument.
  - Per-sweep wall: q=5 200-sweep run completes in well under a second on
    M5 Pro at M_inner=100. Full benchmarking deferred to Phase 5.

Phase 5 (SBC validation vs AKM-J truth across the delta-sweep) and
Phase 4b (R API surface) are the remaining pieces of Stage 3.
… 4b)

Stage 3 Phase 4b: exposes the Phase 4a hierarchical-spec inference path
through the bgm() R interface.

R API
-----
- bgm() gains two new arguments:
    graph_prior_spec = c("joint", "hierarchical")  # default "joint"
    z_ratio_tuning   = list(M_inner = 100L, kappa = 1.0, rho = 0.5)
- Both arguments are plumbed through bgm_spec() -> build_spec_ggm() ->
  run_sampler_ggm() -> sample_ggm() -> GGMModel::set_graph_prior_spec /
  set_z_ratio_tuning.
- Validation in bgm_spec():
    - graph_prior_spec must be "joint" or "hierarchical".
    - Hierarchical requires model_type in {"ggm", "mixed_mrf"} (continuous
      precision block needed). Errors helpfully if mismatch.
    - Hierarchical requires interaction_prior_type == "normal" and
      scale_prior_type == "gamma" (the prior families for which the
      closed-form Laplace-NLO normaliser approximation is implemented).
      Validated up front so the user gets a clean error before MCMC starts.
    - z_ratio_tuning components validated: M_inner positive integer,
      kappa > 0, rho in (0, 1).

C++ glue
--------
- sample_ggm.cpp: adds graph_prior_spec / z_ratio_M_inner / z_ratio_kappa /
  z_ratio_rho function args, calls set_z_ratio_tuning + set_graph_prior_spec
  when "hierarchical".
- GGMModel copy constructor (header) now also copies graph_prior_spec_ +
  v_M_inner_ + v_kappa_ + v_rho_ so multi-chain runs propagate the setting.
  Lazy state (pools, log_Z_NLO_curr, chain_aux_degord) is intentionally
  reset on clone - pools are RNG-derived and would otherwise share random
  draws across chains.

Roxygen
-------
- bgm.R gets full @param entries for graph_prior_spec and z_ratio_tuning.
  man/bgm.Rd regenerated via devtools::document().

Tests
-----
tests/testthat/test-ggm-hierarchical.R adds 10 new R-API assertions:
- bgm() with graph_prior_spec = "hierarchical" returns a posterior
  inclusion matrix with valid shape and values in [0, 1].
- Cauchy slab + hierarchical errors with a helpful message naming the
  required interaction_prior_type.
- Pure ordinal data + hierarchical errors naming "continuous".
- z_ratio_tuning rejects M_inner = 0, kappa < 0, rho in {0, 1}.

All 23 assertions (13 C++ smoke + 10 R-API) pass. Phase 1-3 tests
(1253 + 1526 + 10) continue to pass; default Joint path is unchanged.

What is left in Stage 3
-----------------------
- Phase 5: SBC validation against AKM-J truth across the delta-sweep at
  q = 10 (the manuscript Table 5 cells, post the Z-side disable_log_r
  fix).
- Phase 6: vignette section + cross-references in the spikeslab companion.
Two CI failures on PR #109:

1. tests/testthat/fixtures/{log_z_nlo_reference,degord_sampler_reference}.rds
   were absent from R CMD check because .Rbuildignore had the blanket
   pattern `^tests/testthat/fixtures$`, which the original author had
   intended to exclude only the legacy-fixture subdirectory but which in
   fact swallows every fixture below it. Tighten the pattern to
   `^tests/testthat/fixtures/legacy$` so the live fixtures ship; verified
   via R CMD build that the resulting tarball contains
   `tests/testthat/fixtures/degord_sampler_reference.rds` and
   `tests/testthat/fixtures/log_z_nlo_reference.rds` while continuing to
   exclude the `legacy/` subdirectory.

2. lintr semicolon_linter warnings on `; var <- val` compound statements
   in three test files. Rewrite each compound-on-one-line as
   newline-separated statements:
   - tests/testthat/test-degord-sampler.R (4 sites)
   - tests/testthat/test-ggm-hierarchical.R (3 sites)
   - tests/testthat/test-z-ratio-estimator.R (3 sites)

All 4 test files re-run clean locally: 1253 + 1526 + 10 + 23 = 2812
assertions pass. `lintr::lint_package()` reports 0 lints.
…(Phase 5)

Stage 3 Phase 5: simulation-based calibration of the bgms hierarchical-
spec chain at p = 5 across delta in {0, 0.5, 1, 2}, R = 300 replications.

Scope decision (q = 5 vs q = 10)
--------------------------------
The methodology paper validates at q = 10 against AKM-J perfect-sampler
truth, but AKM-J hits its max_proposals cap on roughly 23% of delta = 2
draws (selection bias on hard high-tilt cases that the unbiased Lyne RR
chain is designed to handle correctly). Restricting bgms's SBC harness
to q = 5 sidesteps this: at q = 5, sample_ggm_prior()'s constrained NUTS
at fixed Gamma_true converges cleanly across the full delta sweep, so
the truth source is trustworthy.

Harness
-------
Per replicate at each delta:
  1. Gamma_true ~ Bern(0.5) on the upper triangle.
  2. K_true | Gamma_true via sample_ggm_prior(n_warmup = 500, n_samples = 1).
     Rebuild K_true using offdiag_names (row-major) rather than
     upper.tri() (column-major) so the off-diagonal entries land at the
     right (i, j); without this the matrix is non-PD.
  3. Y | K_true ~ N(0, K_true^{-1}), n_obs = 100.
  4. bgm(Y, ..., graph_prior_spec = "hierarchical"), iter = 400, warmup = 100.
  5. Rank K_ii_true within posterior raw_samples$main[[1]][, i].
     Both sides are on the same K_yy partial-association scale; verified
     empirically (mean K_diag_true matches mean main posterior at near-
     empty graphs, low delta).

Tests
-----
- KS uniformity test per K_ii per delta (5 x 4 = 20 assertions), pass at
  alpha = 0.01.
- Gamma marginal calibration: posterior mean inclusion across the 300 R
  reps tracks empirical Gamma_true frequency within 4 SE (binomial gap
  at R * p(p-1)/2 = 3000 draws). One assertion per delta (4 total).

Total: 24 assertions, env-gated behind BGMS_RUN_SLOW_TESTS = true.

Runtime
-------
Smoke run at R = 30 (delta = 0.5) takes ~12 s; full R = 300 across
delta in {0, 0.5, 1, 2} extrapolates to ~8 min on M5 Pro. Verified
ranks uniform at R = 30 with KS p in [0.11, 0.84] across all 5 K_ii.

Why edge-indicator ranks aren't tested directly
-----------------------------------------------
Spike-and-slab makes per-(i, j) gamma_ij a discrete {0, 1} draw, which
breaks standard SBC's continuous-rank protocol (tied ranks degrade the
KS test's calibration). The existing test-sbc-ggm.R sidesteps this
by running WITHOUT edge selection. Here we run WITH edge selection
(it's the load-bearing case for hierarchical-spec) but only rank-test
K_ii. The gamma marginal calibration check is a sanity assertion, not
strict SBC.
Two compounding fixes to the hierarchical-spec V(Γ, U) estimator.

F5 - log-space V. The linear V_at_Gamma_pi_degord materialises
c = κ·exp(log_Z_NLO) which underflows to 0 at large p (log_Z_NLO ≈
-3500 nats at p=100, δ=1), turning every MH proposal into a silent
auto-reject. V_log_at_Gamma_pi_degord factors V = (1/c) · S and
computes log|S| via signed log-sum-exp over the K_depth + 1
truncated-series terms. log_κ cancels in the MH ratio; the ADD/DELETE
branches of update_edge_indicator_parameter_pair now pass log_c per Γ
and auto-reject on non-finite log|V| or sign flip across Γ_curr/Γ_star
(Phase-1 stand-in until a Lyne sign accumulator lands).

F6 - within-toggle cache reuse. log_Zhat_star_from_cache mirrors
delta_log_Zhat_pi_toggle's per-sample loop but exposes log_Ẑ_star
rather than the delta. V_log_pair_at_Gamma_curr_star_degord runs one
cached Phi build under a_curr per pool and re-evaluates only row q-2
under a_star, halving the inner Phi rebuild cost. Both ADD/DELETE
branches now invoke the paired call.

Tests:
- test-z-ratio-estimator.R: log-vs-linear bit-equality at q ∈ {5, 10,
  20} (max log_abs gap < 1e-9); underflow regression at log_c = -3500;
  paired-vs-independent bit-equality across q ∈ {5, 10, 20}, K_depth ∈
  {0, 2, 5}; cache adapter matches a fresh log_Zhat_pi_from_pool.
- test-ggm-hierarchical.R: p=20 cell runs a full bgm() at δ=1,
  hierarchical-spec; asserts finite indicators in [0,1] and a moving
  chain.

3018 assertions across degord-sampler / log-z-nlo / z-ratio /
ggm-hierarchical green; full suite (9736 PASS) also green.
Phase 4a smoke validated AMH + hierarchical only. Mirror cell at
update_method = "nuts", same p / iter / warmup / prior, confirms the
2x2 cross-product (NUTS/AMH × joint/hierarchical) is genuinely a
clean cross-product: the V(Γ, U) hook lives entirely in the
between-edge MH update path, so NUTS only governs the within-model
continuous block. No R-layer guard was needed.

Asserts finite indicators in [0, 1], correct shape, and a moving
edge-count trajectory (chain not locked at empty or full graph).
cholesky_update_after_{edge,diag} replaced the per-accept O(p^3)
arma::solve(trimatu(L), I) + inv_L * inv_L.t() covariance refresh
with one Sherman-Morrison-Woodbury rank-2 update (4 x O(p^2)) for
the edge path and one rank-1 update for the diagonal path. The
Cholesky factor L is still maintained via the existing cholupdate /
choldowndate pair (needed for get_log_det). inv_cholesky_of_precision_
is no longer maintained per accept - only refresh_cholesky() touches
it - so it's effectively scratch state owned by the refresh path.

Capacitance singularity (|det C| < 1e-14 in the edge path,
|1 + alpha * cov(i,i)| < 1e-14 in the diagonal path) falls back to
refresh_cholesky(), preserving the prior safety net.

End-of-sweep drift check check_and_refresh_if_drift_() guards
SMW-accumulated FP error by computing max_i |sum_k cov(i,k) * K(k,i) -
1| in O(p^2) and refreshing when it exceeds kCovDriftTol_ = 1e-8.
Wired into both do_one_metropolis_step (within-model sweep) and
update_edge_indicators (between-model sweep).

Net effect on AMH GGM edge-update per accept:
  2 x O(p^3) (solve + outer product)  ->  4 x O(p^2) + O(p^2) check.

Full test suite (9736 PASS) green; existing GGM cells (85 PASS) and
the p=20 hierarchical regression cell exercise the new path without
spurious refreshes.
Per-iteration sign(V_curr) and log|V_curr| are now maintained inside
GGMModel and surfaced as a top-level v_ratio_diagnostics field on the
bgm() return for hierarchical-spec fits. Joint-spec fits leave the
field NULL.

GGMModel carries current_sign_V_ / current_log_abs_V_ /
v_diag_initialized_. Both branches of update_edge_indicator_parameter_pair
seed the diagnostic from V_pair.curr on the first finite proposal and
advance to V_pair.star on accept. The state is stable on reject -
matches the spec's "sign stays at the previous step's value on auto-
reject" rule.

BaseModel gains three virtuals (has_v_ratio_diagnostics,
current_sign_V, current_log_abs_V) with safe defaults; GGMModel
overrides them under graph_prior_spec_ == Hierarchical. ChainResult
gains v_sign_samples / v_log_abs_samples and the matching reserve /
store helpers, mirroring the existing AM/NUTS diagnostic pattern.

R-side: convert_results_to_list packs per-chain v_sign / v_log_abs.
build_raw_samples_list (GGM branch) renames to the trailing-__
convention. build_output_bgm assembles a top-level
v_ratio_diagnostics = list(sign = list-of-chains, log_abs = ...).
bgms_class declares the S7 property and s3_list_to_bgms passes it
through (both required - missing either silently NULL-coerces the
field via the S7 accessor).

Test in test-ggm-hierarchical.R confirms presence, shape, finite
log_abs values, and that >= 95% of recorded signs are +1 in the
operational cell (q=5, δ=0.5, κ=1, ρ=0.5); joint-spec fit must leave
the field NULL.

Full suite: 9746 PASS, 0 FAIL.
New exported bgms_posterior_mean(fit) computes the Lyne (2015)
sign-corrected ergodic estimator

  E[f(K) | Y] ≈ Σ sign_iter · f(K_iter) / Σ sign_iter

for the pairwise, indicator, and residual-variance fields. Under the
operational regime (low δ, reasonable κ) sign === +1 and the
correction collapses to the plain colMeans-based posterior mean;
sign correction matters at high δ or aggressive κ where the
Russian-Roulette estimator of 1/Z(Γ) can flip sign. Joint-spec fits
and any fit without v_ratio_diagnostics fall through to the existing
posterior_mean_* fields unchanged. Raises a helpful error when the
sign vector sums to zero (degenerate regime; remediation is to raise
kappa).

Three test_that cells in tests/testthat/test-ggm-hierarchical.R
exercise the helper:

- Passthrough: on a sign === +1 hierarchical fit, the helper returns
  bit-equal results vs the plain posterior_mean_* fields; on a
  joint-spec fit, returns the plain fields unchanged.
- Sign-correction math: mutates the sign vector on a real
  hierarchical fit (via a plain-list view to skirt S7 immutability)
  to a 2/3 +1, 1/3 -1 pattern and asserts the helper's output equals
  the hand-computed sign-weighted mean while differing from the
  plain mean by more than numerical noise.

NULL fields preserve their slot via the `out["main"] = list(NULL)`
idiom so the returned list always has the same names regardless of
model type.

Full suite: 9756 PASS, 0 FAIL.
The hierarchical-spec correction in update_edge_indicator_parameter_pair
had the sign of the V(Γ_star)/V(Γ_curr) factor inverted in both ADD and
DELETE branches. The math:

  T_hier(K, Γ)        = T_joint(K, Γ) / Z(Γ)
  T_hier_ratio(c→s)   = T_joint_ratio × Z(Γ_curr)/Z(Γ_star)
                      = T_joint_ratio × V(Γ_star)/V(Γ_curr)    [V ≈ 1/Z]
  ln_alpha_hier      += log|V_star| - log|V_curr|

bgms had `ln_alpha += V_pair.curr.first - V_pair.star.first` (the
reverse). The accompanying comment also stated the wrong direction.
Z's reference chain (branchB_chain_route3a_degord.cpp:412) has the
correct sign:

  log_r = std::log(std::abs(V_star)) - std::log(std::abs(V_curr));

Why the q=5 SBC didn't catch it. The gamma-marginal calibration test
tolerance scales as 4·sqrt(p_inc·(1-p_inc) / (R · n_edges)). At q=5,
n_edges=10, R=300 the tolerance is ~0.037; at q=10, n_edges=45, R=500
it tightens to ~0.013. The bias scales with q (more edges accumulate
the misdirected V-weighting); at q=5 the bias was within the looser
tolerance, at q=10 it shows clearly.

Validation: q=10 SBC sweep at the Z-matched chain config (n_warmup =
n_iter = 2000, M_inner = 50, kappa = 2.0, rho = 0.5, R = 500) across
delta ∈ {0, 1, 2}.

                Pre-fix              Post-fix
  delta = 0    8/11 FAIL           10/11 PASS
                                   (K_ii[5] ks_p=0.008, borderline)
  delta = 1    1/11 FAIL (gamma)   11/11 PASS
  delta = 2    4/11 FAIL           11/11 PASS
  Total       13/33 FAIL           32/33 PASS

The single borderline failure at delta=0/K_ii[5] (ks_p=0.008 vs the
0.01 threshold) is well within expected multiple-testing noise
(~0.3 false positives expected across 30 K_ii cells at alpha=0.01).

New gated slow test tests/testthat/test-sbc-ggm-hierarchical-q10.R
locks the fix in - 22.6 min wall on 12 cores at this config.
@MaartenMarsman MaartenMarsman changed the title feat(ggm): log Z(G) closed-form primitives for hierarchical-spec MH Draft - DEGORD - RR May 21, 2026
Two cross-validated changes to the hierarchical-spec V/RR machinery from
the 2026-05-21 companion-AI exchange.

mh_U: at sweep boundary, refresh the auxiliary U-pool via a V-ratio MH
step instead of a fresh draw from the prior. Fresh-from-prior broke the
augmented PMMH target (p(U|G,K) is V*mu, not mu alone) and caused a
small but systematic Γ-marginal bias (~-0.001 nats at p=20, p_inc=0.05;
5/5 seeds same sign). MH-on-U with proposal symmetry on (μ, P(N))
reduces the MH ratio to log|V_new|-log|V_old|; V_old is read from the
running V cache to avoid a redundant evaluation. Verified on bgms-side:
bias collapses to MC noise (2 neg / 2 pos across 4 seeds at p=20).

manuscript_NLO: port of the App C eq:NLO-decomp closed form (R/src/
manuscript_NLO.h, 2026-05-21). At α=1, δ=0 bit-identical to bgms's
log_Z_NLO_gamma; diverges in the B_e term at δ>0. With mh_U on, this
becomes a mixing-efficiency lever rather than a correctness fix, but
at p=20, p_inc=0.05 the centring choice still affects the residual
bias (5.8× reduction in measured bias for mNLO vs bgms NLO without
mh_U).

End-of-chain diagnostics_summary surfaces per-direction MH auto-reject
classifications and mh_U accept/attempt counters. Both flags default
false; SBC-clean baselines untouched.
# Conflicts:
#	R/RcppExports.R
#	src/RcppExports.cpp
Checkpoint commit before pausing the V/RR work for the GG-prior project.
Bundles four logically separable changes:

1. Sign-flip auto-reject removed. Chain now targets the Lyne (2015)
   |V|-augmented density π·|V|·μ·P(N) and tracks sign(V) per iteration;
   ergodic averages use the existing bgms_posterior_mean() helper for
   the sign-corrected mean. Empirical mean(sign(V)) = 1.0 in operational
   cells, so the correction collapses to the plain mean — but the prior
   auto-reject was a directionally biased approximation we are no longer
   taking.

2. Live diagnostic. chain_runner writes per-iteration (K_depth, sign,
   wall, mh_U accept-window) lines to BGMS_LIVE_DIAG=<path> when set;
   surfaces the warmup→sampling cost cliff and the K-dwell pattern that
   drives slow PMMH-on-RR mixing.

3. WarmupSchedule-gated U refresh. mh_on_U_step moved out of
   prepare_iteration into the new BaseModel::refresh_auxiliary_u() hook
   called only when schedule.u_refresh_enabled(iter) is true (i.e.,
   stage 3c + sampling). Stops the U/K_depth state from drifting via
   PMMH dynamics during stages 1-3b when no between-Γ moves consume it.

4. Local-K kernel. mh_U_local_K_ flag activates a local random-walk MH
   on the RR truncation depth (K_new ∈ {K_old−1, K_old+1} with
   reflection at 0). Symmetric proposal, geometric-prior ratio
   ρ^(K_new−K_old) in the MH numerator, ±log 2 boundary corrections.
   Mixed 1-in-50 with the fresh-from-prior global step to keep the long-
   jump escape route alive. Attacks the K-dwell trap that the global
   kernel cannot escape efficiently at moderate p_inc.

5. Plug-in mNLO mode (plug_in_nlo_ flag). Bypasses the entire V/RR/U
   machinery in the between-Γ MH; uses log_Z_NLO(Γ_curr) − log_Z_NLO(Γ_star)
   as the deterministic Z-ratio approximation. Trades small bounded bias
   for predictable cost; per-toggle wall is flat in p instead of scaling
   with the random K_depth. At p=20 across p_inc ∈ {0.05, 0.5, 0.95} the
   plug-in chain runs 30-70× faster than the exact RR+mh_U+local-K chain
   with similar Γ-marginal bias.

Empirical findings (held for the SD/companion follow-up):

- Local-K works at p=20 sparse (mean K 3.03 → 2.15) but breaks at p=20
  dense (mean K → 11) and saturated (mean K → 29). The augmented
  marginal genuinely lives at high K under heavy-tailed |V| from
  imperfect centring; the kernel is correct but operationally costly.
- Γ-marginal correctness gate at q=10/20 with the local-K kernel:
  per-edge pip vector matches Bern(p_inc) within autocorr-adjusted
  MCSE; sign(V) ≈ 1 throughout. Bias is a *cost* problem (slow Γ
  mixing at high K), not a kernel problem.
- Plug-in mNLO is the practical out: q=20 joint takes seconds, exact-
  RR takes 15-30 minutes for the same chain length, plug-in takes the
  same seconds as joint with bias bounded by the mNLO centring error.

No tests added; this is a checkpoint, not a PR. Defaults preserve
existing chain behaviour (mh_U=FALSE, plug_in_nlo=FALSE, etc.).
…caffolding

Adds an SD-based MH ADD/DELETE chain for edge indicators in L-coordinates
(K = L Lᵀ), behind the use_sd_between_step + use_sd_lspace flags. The L-space
parameterisation removes the PD-cone constraint from the proposal (any
l_ji ∈ ℝ yields PD K), and at α=1 the conditional on l_ji is closed-form
Gaussian, so the per-edge between-step is exact Gibbs — no Newton, no NLO,
no truncation residual.

Validated prior-only at q=50, σ ∈ {0.25, 1.0}, p_inc ∈ {0.05, 0.30, 0.50},
α=1, δ=0: all six cells unbiased (|z| ≤ 1), PD-revert canary fires only
at genuine numerical near-PD events and the chain reverts cleanly.

Components:
- src/models/ggm/sd_density_at_zero.{h,cpp}: 1D Laplace+NLO primitive for
  the SD log-density at K_ij = 0 (ported bit-for-bit from Z's reference;
  641 tests pass against the standalone primitive).
- src/math/cholesky_helpers.{h,cpp}: perm_to_trailing_2x2 — Bunch-style
  symmetric adjacent swaps + Givens rotations to extract (l_ii, l_ji, l_jj)
  from the DEGORD-permuted Cholesky factor. Bit-for-bit against chol(P K Pᵀ)
  across p ∈ {3..20} (7685 tests pass). Kept as a library helper; the SD
  hot path uses the simpler direct-Σ read after per-accept arma::chol keeps
  L exactly in sync with K.
- update_edge_indicator_parameter_pair_sd_lspace: per-edge SD between-step
  reading Σ_BB directly from covariance_matrix_, computing (l_ii, l_ji)
  via the 2×2 inverse, applying the closed-form Gaussian conditional, MH
  ratio in L-coords (no Jacobian — cancels between numerator and
  denominator), and committing via per-accept arma::chol of K (O(p³))
  with a PD-revert canary on failure.
- n_pd_reverts_ counter + Rcpp warning at end of chain when non-zero.
- Test interfaces ggm_sd_smoke_cpp (SD chain prior-only/PIP) and
  ggm_plug_in_smoke_cpp (DEGORD plug-in mNLO comparator).

Two architectural variants attempted and reverted (see git diff history
in branch): SMW-only accept (skip per-accept chol; maintain Σ via SMW)
and drift-bounded L (rank-2 chol-up + SMW + periodic refresh). Both fail
because the algebraically-updated K and the rank-r-updated L (or Σ) drift
apart, and the next periodic chol(K) finds K numerically non-PD. Keeping
K = L Lᵀ exactly without per-accept full refactorisation appears to
require recomputing K from L every accept (O(p³/2)) — same order as
arma::chol, so no real speedup.

Wall time at q=50, 3 cells parallel: sparse (p=0.05) ~43s, dense
(p=0.50) ~410s; matches the per-accept O(p³) cost.

Plug-in mNLO under prior-only is biased (z ranging -405 to -2721 across
the six cells; PIPs at 22-50% of target depending on σ). Suggests the
plug-in approximation contributes a non-zero between-Γ term under the
prior — useful diagnostic, separate follow-up.
…H proposal correction

Extends the L-space Savage-Dickey between-step beyond the closed-form α = 1
case. At α > 1 the conditional on l_ji gains a non-Gaussian (α − 1) log(s_jj + l_ji²)
term — log-concave over ℝ in operational regimes — and Gibbs sampling is no
longer available in closed form.

Two new primitives:

- sd_density_l_space.{h,cpp}: Newton + Laplace + Tierney-Kadane NLO over ℝ.
  Fast (one Newton iteration in steady state) and bit-for-bit against the
  α=1 closed form. But fragile in near-bimodal cells (small s_jj × large
  α − 1): Newton stalls at saddles where f''(x_mode) ≈ 0, status returns
  non-zero, and a chain that simply reverts those toggles biases its
  γ-marginal (we observed z = ±60 in worst cells when using Laplace alone).

- sd_density_l_space_quad.{h,cpp}: 64-node Gauss-Hermite quadrature for
  log_Z. Nodes/weights computed once via Golub-Welsch (eigen-decomposition
  of the tridiagonal Jacobi matrix) and cached. Numerically reliable across
  all chain configurations — machine-precision agreement with adaptive
  quadrature for our smooth integrand, including the cells where Laplace
  diverged. ~64 log/pow evaluations per call.

Chain integration:

- α = 1 path unchanged (closed-form Gaussian, validated previously).
- α > 1: quadrature primitive for log_BF at m_ij (both post and prior),
  Laplace primitive as proposal mode/curvature (with Gaussian fallback if
  Newton fails), explicit MH correction term log[π_target / q_proposal]
  evaluated at the sample point (ADD) or current state (DEL). The correction
  closes the proposal-mismatch component of bias that quadrature alone would
  leave.

Validation (q=50 prior-only, σ=1.0, α=2.0, post-fix):

| p_inc | PIP    | z     | reverts | wall  | (before, Laplace-only)
|-------|--------|-------|---------|-------|----------------------
| 0.05  | 0.0501 | +0.84 | 13      |  93s  | PIP=0.0775 z=+60 reverts=51k
| 0.30  | 0.3000 | -0.25 | 21      | 296s  | PIP=0.3088 z= +9 reverts=27k
| 0.50  | 0.5000 | +0.11 | 37      | 449s  | PIP=0.4632 z=-35 reverts=29k

Skip-revert events down ~1000×; PIPs within ±1σ of the Bernoulli(p_inc)
stationary target. Wall time ~2-2.5× the α=1 path due to quadrature overhead
(~200 extra log/pow per proposal × ~13M proposals/cell ≈ 50s) — pure tax for
non-Gaussian conditional. Speed optimization (hybrid Laplace/quadrature,
GH-32, log_Z caching) is the next step on top of this correctness baseline.

q=10 full sweep over α ∈ {1.5, 2, 3} × σ ∈ {0.25, 1.0} × p_inc ∈ {0.05, 0.30, 0.50}
also validates: max |z| = 2.34, all reverts ≤ 3.
The Gauss-Hermite quadrature for the conditional log-Z at α>1 was using 64
nodes; reduces to 32 with no observable accuracy change (GH is exact for
polynomials of degree 2N−1 and the integrand is smooth, so both N values
give >10-digit precision for our chain).

q=50 prior-only σ=1.0 α=2.0 wall:

| p_inc | GH-64 | GH-32 |  Δ
|-------|-------|-------|------
| 0.05  |  93s  |  71s  | -23%
| 0.30  | 296s  | 274s  |  -7%
| 0.50  | 449s  | 427s  |  -5%

PIPs and revert counts are bit-identical to GH-64 at all 3 cells. The
shrinking relative speedup at higher p_inc reflects the per-accept O(p³)
arma::chol increasingly dominating as accept count grows; GH overhead is
fixed-ish per proposal and amortises over chol work.

A hybrid Laplace+GH path (use Laplace+NLO when status=0, GH otherwise) was
tried and reverted: Laplace+NLO at status=0 still has ~1e-2 residual error
in many cells, which compounds in the chain into multi-σ bias (z=+38 in
worst case at q=10). Pure GH-32 is the right correctness/speed point.
…y handling

Five fixes spanning the SD between-step's prior convention and a long-standing
silent-drop in bgm(). SBC at q=10 (Γ, K_diag, K_off|γ=1) passes uniformity
after these; 40-cell prior recovery unaffected.

R/bgm.R: when both `inclusion_probability` (deprecated) and a non-default
`edge_prior` are supplied, previously the deprecated value was silently
dropped after a warning. Now: honor the deprecated arg when only it is given;
require values to match when both supplied; error on conflict.

src/models/ggm/ggm_model.cpp (SD between-step):
- Slab convention: bgms evaluates the slab at -K_ij/2 (K_yy scale), giving
  K_ij sd = 2·prior_sigma_. The SD primitive documents slab on K_ij directly,
  so pass σ_K = 2·prior_sigma_ to align. Affects both K-space
  (update_edge_indicator_parameter_pair_sd) and L-space variants.
- Diag prior factor: K_jj/2 ~ Gamma(α, β) means K_jj ~ Gamma(α, β/2). The
  L-space SD tau formulas had `2β`; the correct contribution to tau is `β`.
  Fixed at α=1 and α>1 paths.
- Explicit PD canary in update_diagonal_parameter prior_only mode; previously
  the diag update relied on K_ii > 0 as an implicit PD check, which is too
  weak and let non-PD K accumulate at shallow det-tilt.

src/log_z_test_interface.cpp: ggm_sd_smoke_cpp gains
- `sample_thin` to emit thinned K samples for SBC rank analysis,
- `edge_selection` flag for fixed-graph diagnostics,
- K_offdiag samples are stored as K_ij * γ_ij so inactive edges are exact 0
  (eliminates ~1e-18 floating-point noise from Roverato DEL cancellation that
  otherwise confounds spike-slab rank statistics),
- exposes `final_K` for SBC data-simulation harnesses.
- tests/testthat/test-sd-density-at-zero.R: replace compound semicolons
  with newlines (semicolon_linter warnings, fatal under LINTR_ERROR_ON_LINT)
- _pkgdown.yml: add bgms_posterior_mean to Posterior methods section
- man/bgm.Rd: regenerate from roxygen (example assignment style sync)
ggm_sd_smoke_cpp gains a within_k_mode flag ("am" | "nuts"). The "nuts"
path runs an unconstrained NUTS step on the active theta each sweep
(dual-averaging step-size adaptation in warmup), composed with the SD
between-step that toggles edges. Lets SBC isolate the within-K choice
from the SD between-step's contribution to chain dynamics.

Under prior_only=TRUE the smoke now constructs the model with n=0 and
S=0 (mirrors sample_ggm_prior). The AM within-K checks prior_only_ at
the per-step MH ratio, but the NUTS gradient engine bakes n_ and
suf_stat_ into the log-posterior with no such flag — so without this
the NUTS path would silently include the data under prior_only.
The L-space SD primitive's kernel f(l_ji) = -A l_ji² + B l_ji
+ (α-1) log(s_jj + l_ji²) drops the (n/2 + δ) log|K(l_ji)| term from
the data likelihood determinant and prior tilt. K_ij and K_jj both
depend on l_ji along the Roverato slaving trajectory, so log|K| is NOT
l_ji-invariant — the kernel as written targets π_FALSE = π_TRUE /
|K|^{n/2+δ}, not the true joint posterior.

Empirically (SBC at q=10, n=200, p_inc=0.3, α=1, δ=2, 200 reps):
  K_diag pooled KS-p: 0.029 → 0.181, mean PIT: 0.512 → 0.509.
Γ and K_off|γ=1 stay calibrated. The bias was data-dependent (calibrated
under prior_only; reappears under posterior) because the missing term
is proportional to (n/2 + δ) and matters when the SD step toggles γ
often. Hidden in v9/v10 at p_inc=0.5 by the α=1 + p_inc=0.5 + prior_only
degenerate accept-everything regime and at high n by data-driven PIPs
near {0, 1} (chain rarely toggles γ).

Fix: after sampling l_ji_new and computing the proposed K_ij_new,
K_jj_new, add (n/2 + δ) · log|K_after|/|K_before| to log_alpha. By
standard MH theory this shifts the chain's effective target from
π_FALSE to π_TRUE — proposal can stay Gaussian (α=1) or
Laplace+GH (α>1); both already in place.

Implementation note: inline the rank-2 matrix-determinant lemma's 2×2
form rather than calling log_det_ratio_edge — the sign of cc11·cc22 −
cc12² (under the lemma's sign-flip convention here, negative = PD)
detects PD violations without an arma::chol, keeping the per-attempt
overhead negligible. Wall-clock cost in 200-rep SBC mclapply: +1.7%.
arma::chol returns false only for strict non-PD K, but the SD between-step
sometimes accepts barely-PD K — one pivot ~ 1e-10 — that passes the chol
canary. The next within-K NUTS step then evaluates K^{-1} entries on the
order of 1/min_pivot, gradient overflows, theta becomes inf, K becomes
garbage. We saw K_jj samples ~ 1e13 at extreme priors (σ=1.0, p_inc=0.5,
low α, δ=0).

Add a min(diag(U)) < 1e-6 check alongside the chol canary. The threshold
corresponds to a minimum K eigenvalue ~1e-12, well above double-precision
noise but loose enough not to perturb well-conditioned chains. Catches
the SD-step contribution to runaway K_jj; the chain still has heavy-tailed
exploration at extreme priors from within-K NUTS itself, which n_pd_reverts
now flags more reliably as a warning bell.
Pure-R prototype of the (gamma, K_ij, omega) auxiliary-variable chain at
fixed rest-conditional state. Three configurations (weak / borderline /
strong S_ij in the alpha=1 closed-form regime), 5 seeds * 200k iterations.

Findings:

 * The analytic marginal posterior P(gamma=1) is
     P(gamma=1) = pi_inc * BF_inc_marg / (pi_inc * BF_inc_marg + (1 - pi_inc))
   where BF_inc_marg = E_omega[BF_inc(omega)] = E_omega[1 / BF_excl(omega)].
   This is NOT 1 / E_omega[BF_excl(omega)] (Experiment 2 was computing
   the wrong quantity). The corrected analytic is now what the marginal
   Gibbs chain matches to MC error.

 * The marginal Gibbs reference chain (iid samples from the analytic
   marginal) matches the analytic to z < 1 across all configurations.

 * The auxiliary (gamma, K, omega) chain mixes essentially iid (ESS ~ T)
   but shows a reproducible positive bias in P(gamma=1) of 0.5%-1.5%.
   Bias is largest at the borderline regime (P(gamma=1) ~ 0.4-0.5) where
   gamma flips most often -- suggests a subtle issue at the MoMS
   mutually-singular boundary. Each conditional update has been re-derived
   and looks correct; the K-space transform mu_K = l_ii * (mu_l - m_ij),
   sd_K = l_ii / sqrt(tau_l) is verified. Cause not yet identified.

Implication: the auxiliary scheme appears "essentially correct" but is
not bit-exact to the analytic. Worth investigating further before
committing to the C++ implementation. Likely candidates to probe next:
 - run with K removed from state (collapsed (gamma, omega) with
   non-conjugate omega update by rejection sampling), see if bias
   persists
 - compare against a 2D quadrature for the joint marginal P(gamma=1),
   independent of the chain mechanism
 - look at the alpha > 1 case (uses sinh primitive, not the closed form)
   to see if the bias is alpha-dependent
Three probes investigating the 0.5-1.5% chain bias from Experiment 3.

Probe (a) -- cross-route verification of BF_inc_marg:
  omega-route (1D integral over omega using bgms's SD formula) gives
  0.20, 0.77, 9.40 for weak / borderline / strong.
  K-route (1D integral over K with pure Cauchy slab) gives
  0.12, 0.45, 5.44 -- a near-constant factor of ~0.58 different.
  2D route over (K, omega) agrees with K-route exactly (sanity check).

  These are not the same Bayes factor: bgms's SD includes the diag prior
  contribution in BOTH numerator and denominator of the L-space ratio
  at m_ij. The diag prior cancels at fixed K_ij value but NOT under the
  scale-mixture integral over omega. The K-route corresponds to the
  natural "spike vs. pure Cauchy on K_ij" BF; the omega-route corresponds
  to a different BF involving the conditional-Roverato-prior structure.

Probe (b) -- collapsed (gamma, omega) chain via slice sampling:
  Matches the omega-route analytic essentially exactly (z = 0.06, 1.02,
  -2.02). So the slice chain is faithful to whatever the omega-route is.
  The Experiment 3 aux chain matched the same omega-route to ~1% residual;
  that small residual is a real MoMS-boundary artifact in the (gamma, K,
  omega) update, but the BIG bias from Experiment 3's first reporting
  was that the analytic was computed wrong (omega-route is what the
  chain targets; K-route is the "natural" Cauchy BF).

Probe (c) -- alpha > 1 via sinh primitive:
  Same chain at alpha = 3 matches omega-route well (z = -0.94, -3.24,
  +0.66). Borderline is slightly off in opposite direction from alpha=1
  -- suggests the residual is a small MoMS artifact, not a method bug.

Conclusion: this is a modelling question that has to be answered before
any C++ change. Either (Omega) keep bgms's existing SD-formula
interpretation -- aux scheme drops in trivially with the sigma^2 ->
sigma^2 omega_ij substitution; or (K) rework log_bf_excl so the "prior"
side is the pure slab -- gives the natural Cauchy-on-K_ij BF, but also
changes the existing Normal-slab BFs by the same ~1.7x factor.

Code: experiments/cauchy-slab/04-bias-diagnostic.R
…tion

Experiment 5 -- toy spike-and-Cauchy-slab on a single Gaussian observation
y ~ N(K, 1), with no L-space / Roverato / SD primitive machinery. Closed-
form BF_inc(omega), analytic marginal verified to MC. The aux chain
(gamma, K, omega) with conjugate omega | K ~ IG(1, 1/2 + K^2/(2sigma^2))
matches the analytic to z < 1 across weak / borderline / strong evidence.
=> The (gamma, K, omega) Gibbs-scan structure is not the source of bias.

Experiment 6 -- beta sweep of the Experiment 3 GGM-toy. At beta = 0.5
(default) the aux chain shows the original 1% bias: z = +6.15, +8.41,
+20.10 across weak / borderline / strong. At beta = 0, bias collapses to
z = +1.48, -0.82, -0.39 -- pure MC noise.

Mechanism. Under bgms's L-space framework with rest_L = (l_ii, c_3, ...),
the effective prior on K_ij | gamma=1, omega includes a diag-prior
contribution (the beta/(2 l_ii^2) term in A_prior). That introduces an
omega-dependent normalisation Z(omega) into the conditional p(K | gamma=1,
omega, rest_L). My pure-slab conjugate update for omega silently drops
Z(omega). At beta = 0 the term is absent (pure slab) and conjugacy is
exact -- which is why the toy works and beta=0 in Experiment 6 works.

Fix for the production C++. Replace pure-slab conjugate omega | K with
slice sampling on the full L-space-consistent omega conditional:

   log p(omega | K, gamma=1, rest_L) prop
       -K^2 / (2 sigma^2 omega) - log Z(omega) + log p_IG(omega)

   log Z(omega) = 0.5 log(pi / A(omega)) + B^2 / (4 A(omega))
   A(omega) = 1/(2 sigma^2 omega) + beta / (2 l_ii^2)
   B = -beta * m_ij / l_ii

~15 lines of C++ for a log-space slice sampler (stepping-out + shrinkage).
Per-step cost: a handful of arithmetic ops per evaluation, ~5 evaluations
per slice step. Verified bias-free by Experiment 5 (which uses slice on
omega and matches analytic to MC noise).

Cauchy implementation is now unblocked.
…rmals

Adds Cauchy(0, sigma) slab support to the hierarchical-spec SD path.
The Cauchy is represented as K_ij | gamma=1, omega ~ N(0, sigma^2 omega)
with omega_ij ~ InvGamma(1/2, 1/2) auxiliary mixing weights, one per edge.
The existing SD primitives (cubic_mode, sinh_midpoint, gauss_hermite,
laplace) are unchanged -- the Cauchy lifting is just sigma^2 -> sigma^2 *
omega_(i,j) at the call site, plus a per-edge omega update after the
(gamma, K_ij) MH step.

C++ changes
  - GGMModel gains a p x p omega_ matrix (upper-triangular storage; lower
    triangle and diagonal unused), initialised to 1.0 (= Normal slab
    pointwise) and copied on chain clone.
  - GGMModel gains a slab_is_cauchy_ flag, set by
    ensure_prior_params_extracted_() via dynamic_cast on the configured
    interaction_prior_. CauchyPrior gains a scale() accessor.
  - update_edge_indicator_parameter_pair_sd_lspace uses
    sigma^2 * omega_(i,j) when Cauchy; the Normal path stays bit-identical.
  - New slice_sample_omega_ij_() draws from the L-space-consistent full
    conditional p(omega | K_ij, gamma=1, rest_L). The conjugate
    InvGamma(1, 1/2 + K^2/(2 sigma^2)) update is wrong here (Experiments
    3/5/6 in experiments/cauchy-slab/) because the diag-prior contribution
    to the L-space "prior" introduces an omega-dependent Z(omega) factor.
    Slice sampling on u = log(omega) with stepping-out + shrinkage handles
    it. At gamma=0 the slab is irrelevant; omega draws from its IG(1/2,
    1/2) prior. Called from update_edge_indicators after each SD step.
  - rng_utils.h gains rgamma(shape, rate) for the IG-from-prior sample.

R changes
  - bgm_spec.R relaxes the hierarchical-spec slab restriction from
    Normal-only to {Normal, Cauchy}. Choosing cauchy_prior + Gamma diag now
    defaults to prior_factorization = "hierarchical" instead of "joint".
    Existing cauchy_prior + (no factorization given) users on bgmCompare
    or omrf are unaffected (those paths still require joint).

Verification
  - sd-baseline-snapshot: 42/42 pass (Normal slab bit-identical, both
    alpha = 1 and alpha = 3).
  - Smoke test: bgm() with cauchy_prior(scale = 0.5) on a q=4, n=80,
    one-true-edge problem detects the edge (PIP = 1.00) and null edges
    cleanly (PIPs 0.14 - 0.29), posterior mean for the true edge ~ 0.23.

This is MVP. Not yet covered: per-iteration omega snapshots in the output,
SBC for the Cauchy + hierarchical path, alpha > 1 with sinh primitive
(works in the code but unexercised in tests).
…ickey/

The slice-sampling logic + the Cauchy-via-IG(1/2, 1/2) log-conditional
are pure scalar math taking (K, sigma^2, A_diag_K, B_K, omega_curr).
The only model-specific piece is computing A_diag_K and B_K from the
Cholesky/Roverato state. Mixed MRF and other Cholesky-parameterised
models will derive analogous expressions and call the same primitive.

Refactor

  New src/math/savage_dickey/cauchy_omega.{h,cpp}:
    * slice_sample_cauchy_omega_active(K, sigma2, A_diag_K, B_K,
                                       omega_curr, rng)
    * sample_cauchy_omega_prior(rng)
  Naming + Doxygen header match the existing primitives
  (cubic_mode, sinh_midpoint, laplace, gauss_hermite). Math derivation
  in the file header with the manuscript-pointer convention; the
  implementation cpp stays terse.

  GGMModel::slice_sample_omega_ij_ becomes a thin wrapper: extracts
  l_ii, m_ij from the maintained covariance/precision, computes the
  GGM-specific A_diag_K = beta/(2 l_ii^2) and B_K = -beta m_ij / l_ii,
  and forwards. Drops ~80 lines of inline math/algorithm code from
  ggm_model.cpp.

Tests

  New tests/testthat/test-sd-cauchy-omega.R (6 passes):
    1. sample_cauchy_omega_prior matches IG(1/2, 1/2) quantiles + tail.
    2. slice sampler at A_diag_K = B_K = 0 matches the conjugate
       IG(1, 1/2 + K^2/(2 sigma^2)) reference.
    3. slice sampler matches an importance-sampling reference with
       nonzero A_diag_K and B_K.
  Tests run a proper chain (iterative slice steps) rather than
  one-step transitions from a fixed omega, so the chain quantiles
  reflect the actual stationary distribution.

  tests/testthat/test-ggm-hierarchical.R:
    * Replaced the stale "errors helpfully for Cauchy slab" test
      (which asserted Cauchy + hierarchical should fail) with a
      success path test.
    * Added a one-true-edge-recovery test (q=4, n=120, K_12 = -0.5):
      true edge PIP > 0.95, all five null edges PIP < 0.4.

  Test interface in src/savage_dickey_test_interface.cpp gains
  sd_slice_sample_cauchy_omega_active_cpp + sd_sample_cauchy_omega_prior_cpp.

Verification

  sd-baseline-snapshot 42/42 (Normal-slab path bit-identical).
  sd-cauchy-omega 6/6.
  ggm-hierarchical 17/17.
  sd-cubic + sd-sinh suites unchanged.
test-ggm-hierarchical.R gains:
 - Cauchy + hierarchical + Gamma(shape=3) test exercising the sinh
   primitive (alpha > 1 dispatch is otherwise unverified in the
   Cauchy-slab path).

test-sbc-ggm-hierarchical-cauchy.R (new, gated by BGMS_RUN_SLOW_TESTS):
 - Companion to test-sbc-ggm-hierarchical.R, swapping the slab to
   cauchy_prior. Truth via sample_ggm_prior() with the Cauchy slab at
   fixed Gamma (NUTS at fixed graph; independent of the SD chain).
   Inference via bgm() with cauchy_prior + hierarchical. Tests rank
   uniformity of K_ii per delta in {0, 0.5, 1.0} via KS at alpha = 0.01,
   plus gamma marginal calibration. R = 200 reps (smaller than the
   Normal-slab SBC's R = 300 to keep total runtime under an hour).
 - Smoke-verified end-to-end on one rep (sample_ggm_prior with
   cauchy_prior produces a valid truth; bgm() with cauchy_prior +
   hierarchical produces a non-degenerate fit).

Deferred to a follow-up commit
 - Per-iteration omega snapshots in fit output (touches ChainResult,
   chain_runner, build_output, and the S7 fit class — substantial
   enough to scope separately).
The 6 R scripts under experiments/cauchy-slab/ were exploratory design
work; the corresponding findings are already captured in commit
messages (notably 101d99c, which spells out the diag-prior / omega
conjugacy bias and the fix) and in src/math/savage_dickey/cauchy_omega.h's
file header. Keeping a tracked top-level experiments/ directory diverges
from the existing convention -- exploratory scripts live in dev/ which
is gitignored per .gitignore:18 (cf. dev/quadrature/, dev/numerical_analyses/,
dev/sbc/).

Files copied to dev/cauchy-slab/ locally before removal; the gitignored
copy remains available on this machine. Reverts the .Rbuildignore
^experiments$ entry.
The hierarchical-spec SD path now supports the Cauchy slab via the
scale-mixture-of-normals representation (added in 93b4b14), but several
docstrings still said "Normal slab only":

R/bgm.R
  - @param interaction_prior: drop the "Normal slab is the encompassing
    prior" claim; describe both Normal-direct and Cauchy-via-scale-mixture.
  - @param prior_factorization: drop "Only available when the interaction
    prior is normal_prior(...)"; replace with "Normal or Cauchy slab,
    Gamma diagonal".
  - @param prior_factorization auto-resolve clause: include Cauchy in the
    set that lands on hierarchical, and update the "joint otherwise"
    example to beta-prime (which actually doesn't support hierarchical).
  - Updated quadrature name: alpha > 1 path is sinh-substitution, not
    Gauss-Hermite (sinh-midpoint replaced GH in 77437d8).

src/models/ggm/ggm_model.h
  - GraphPriorSpec enum header: hierarchical now requires Normal *or*
    Cauchy slab + Gamma diagonal; points at math/savage_dickey/cauchy_omega.h
    for the slice-sampled per-edge omega update under Cauchy.
  - set_graph_prior_spec docstring: same.
  - graph_prior_spec_ member comment: "Normal or Cauchy slab".
  - Quadrature name fix here too.

Regenerated man/bgm.Rd via roxygen2. Build verified.
Both primitives were superseded by the sinh-substitution + midpoint-rule
quadrature (sinh_midpoint) and the cubic critical-point solver
(cubic_mode) when the alpha > 1 path was rewritten in 77437d8.

Removed
  src/math/savage_dickey/laplace.{h,cpp}      -- no callers anywhere.
  src/math/savage_dickey/gauss_hermite.{h,cpp} -- test-only reference;
    the actual ground truth in test-sd-sinh.R is grid_logZ (R-side
    Gauss-Legendre with N=2e6 nodes), so the GH comparison was a
    redundant second opinion.

Updated
  src/models/ggm/ggm_model.cpp: drop the unused includes.
  src/savage_dickey_test_interface.cpp: drop sd_log_density_at_l_ji_cpp
    + sd_log_density_at_l_ji_gh_cpp test wrappers.
  tests/testthat/test-sd-sinh.R: drop the GH comparison block; the
    test that previously asserted "sinh-32 error <= 10 * GH-32 error"
    becomes "sinh-32 worst-case error < 1e-3" against the existing
    grid_logZ reference. Same six configurations.
  R/RcppExports.{R,cpp} regenerated via roxygen.

Verification
  sd-baseline-snapshot 42/42 (Normal-slab bit-identical).
  sd-cauchy-omega     6/6.
  sd-cubic            (all pass).
  sd-sinh             (all pass; fuzz N=128 worst error 1.09e-10).
Audit caught three inline comment references in
update_edge_indicator_parameter_pair_sd_lspace pointing at code that
was removed in the previous commit (623cfda):

  Line 659: "(math/savage_dickey/laplace.h) documents slab as ..."
            -- file removed; the surviving primitives in
            math/savage_dickey/ use the same slab convention.
  Lines 741-742: "...~1e-1 for the legacy non-adaptive GH-32..."
            -- the GH primitive is gone; the comparison no longer needs
            to be made.
  Line 770: "Replaces the previous Newton iteration
            (density_at_l_ji_one)..." -- density_at_l_ji_one lived in
            laplace.h and is removed; the cubic primitive stands on
            its own merits.

Comments rewritten to describe the surviving primitives without
referring to deleted code. No behaviour change. All sd-* tests pass.
Documented pre-commit workflow:

  source("inst/styler/bgms_style.R")
  styler::style_pkg(style = bgms_style)
  lintr::lint_package()

The styler converted <- to = across the package (the bgms style forces
EQ_ASSIGN over LEFT_ASSIGN except <<- and call-argument <-) and
normalised closing-paren placement in a handful of older tests with
stylistic drift.

Files touched
  R/bgm.R
  tests/testthat/test-{chol-perm-to-trailing,ggm-hierarchical,ggm-nuts,
    mixed-gradient-pfaffian,mixed-nuts,sample-ggm-prior,
    sbc-ggm-hierarchical{,-cauchy,-q10},sbc-ggm,sd-baseline-snapshot,
    sd-cauchy-omega,sd-sinh}.R

The bulk of the change is the <- -> = conversion in the four test
files written this session (test-ggm-hierarchical.R additions,
test-sd-cauchy-omega.R, test-sbc-ggm-hierarchical-cauchy.R) plus
older tests with mixed-style drift. lintr reports zero issues.
R/bgm.R:364 had `fit <- bgm(...)` inside an @examples block since
d5b3a2d (2026-05-24). The styler / pre-commit pass doesn't reach inside
roxygen `#'` comments, so this survived the bulk = / <- conversion in
the previous commit (3757ec7).

Fixed by hand; man/bgm.Rd regenerated via roxygen2.

  grep -rn "^#' .*<- " R/  ->  zero hits.
R/bgms-methods.R defined warning_once + a .warning_state env to issue
a once-per-session warning, but nothing calls it anywhere in R/ or
tests/. Likely an orphan from when a now-removed code path used to
emit the warning. Confirmed via the R-side orphan audit (corrected
script that excludes only definition lines and S3 methods).

Package still loads cleanly with the helper gone.
…er loops

The forward Givens row-rotation and the matching reverse-Givens loops on
W_work / W_bar all touched conceptual R via R(r1, k) — a stride-q access in
column-major storage. Switching the working R to its transpose turns each
row-rotation on conceptual R into a column-rotation on storage, which is
contiguous and amenable to SIMD.

givens_qr now consumes A_q directly (no .t() at the callsite) and writes R
in (m_q × q) layout; R(j, j) still gives the diagonal, R_diag and rotation
storage are unchanged. Backward-pass W_work / W_bar mirror the layout and
swap (r1, cj) → (cj, r1) at every access; the Q matrix and its rotations
are unaffected (Q column-rotations were already contiguous).

Bit-identical to the prior implementation across p ∈ {5, 10, 20, 50} ×
density ∈ {0.3, 0.6, 1.0} × seed ∈ {1, 2, 3} (logp and gradient both match
exactly). Per-call median speedup is small (≈1.01–1.04× at p ≥ 10), since
L1d already covers the strided rows at these p; the structural win is the
removal of the Aq_buf.t() materialisation and clearer storage semantics
that any future Householder-QR replacement can build on.
sample_ggm_prior(spec = "joint") routes through sample_ggm() with n = 0
and S = 0 to draw from the joint prior π(K, Γ). The AM chain's per-attempt
PD canary (arma::chol of precision_proposal_ inside update_edge_parameter
and the diagonal / between-step counterparts) is gated on prior_only_,
which the joint-spec entry never set. Without it the chain accepted moves
that took K out of the PD cone (no likelihood anchor), drift in Σ
compounded across the within-step sweep, and the next refresh_cholesky
threw on a now-non-PD K.

Now: both GGMModel constructors set prior_only_ = true after
initialize_precision_from_mle() when n_ == 0. Chains with data are
unaffected — the PD canary stays inactive at n_ > 0.

Z reproducer at p = 100, delta = 0.5 * log(100), normal slab scale 0.5,
edge_inclusion_prob 0.2 now runs to completion instead of crashing during
warmup.
The L-space Savage-Dickey between-step accept path previously did
chol + triangular solve + DSYRK matmul per accept to refresh
covariance_matrix_ = inv(L) inv(L)^T. The DSYRK is ~54 μs of the 175 μs
per-accept refresh at p = 100 under libRblas.

Σ_corner is now read on demand inside the sweep as
  Σ_kl = inv(L)[k,:] · inv(L)[l,:]^T
via two arma::dot row inner products per attempt (~0.3 μs each at p = 100),
and covariance_matrix_ is refreshed once at the end of
update_edge_indicators() for the within-step that follows. inv(L) is
refreshed once at the start of the sweep since the within-step's SMW
updates leave it stale wrt L.

Measured 7–11 % wall-time reduction on
  bgm(prior_factorization = "hierarchical",
      update_method = "adaptive-metropolis")
at p = 100, larger at smaller n where between-step accepts are more
frequent (more refreshes saved per iter).

Validated against the existing test-sbc-ggm-hierarchical{,-cauchy}.R
harnesses at p_inc = 0.5 (no regression vs the refresh-per-accept path)
and a posterior-mean agreement check at p = 100, n = 40: same-seed
OFF-vs-ON differences on K_ii and PIP are well inside the cross-seed
Monte Carlo noise floor.
cholesky_update_after_edge previously materialised four p×p arma
temporaries on each accept (one per (a1 a1ᵀ), (a2 a2ᵀ), (a1 a2ᵀ),
(a2 a1ᵀ)). Refactor to two rank-1 outer products with bundled scalar
weights:

  ΔΣ = -(a1 b1ᵀ + a2 b2ᵀ)
    b1 = inv_c00 · a1 + inv_c01 · a2
    b2 = inv_c01 · a1 + inv_c11 · a2

Cuts ~3 of 5 p×p allocations per accept. Wall-time on M5 Pro,
p=100 iter=500 warmup=100:

  hier n=40  : 159.1s -> 71.4s  (-55%)
  hier n=700 :  80.0s -> 35.0s  (-56%)
  joint n=40 :        -> 39.3s
  joint n=700:        -> 23.0s

A/B against parent commit on the same hardware confirms the win is
the SMW refactor (baseline measured by stashing this change and
re-running). All ggm tests pass (gradient 14/14, hierarchical 20/20,
nuts gated).
cholesky_update_after_edge holds the L update + Σ rank-2 SMW for the
symmetric change ΔK = vf1 vf2ᵀ + vf2 vf1ᵀ. Today both inputs are sparse
2-entry vectors (the edge case) but the kernel is generic in vf1, vf2.

Lift the kernel into apply_rank2_chol_smw_update_(); cholesky_update_after_edge
becomes a thin wrapper that scatters the per-edge deltas into vf1_/vf2_,
calls the helper, and zeros them back out.

Zero behaviour change: bitwise-identical sample matrices vs HEAD across
joint and hierarchical modes at p ∈ {20, 50}, seed=11..12. Existing GGM
tests pass (gradient 14/14, hierarchical 20/20).

Prepares the row-block Gibbs within-step (next PR) to reuse the same
kernel with full-vector vf1 = e_i, vf2 = u_i.
Adds GGMModel::update_row_block_gibbs(i) — a closed-form Gaussian–Gamma
draw of row i of K given the rest of K and the graph, under the conjugate
prior scope (Normal slab on K_yy off-diagonals × Gamma(1, beta0) on K_ii/2,
no determinant tilt).

Derivation (in bgms internal scale K_yy = -K/2):
  - Partition K = ((A, beta), (beta^T, kii)) at row i, with
    A = K_{-i,-i}, beta = K_{N_i, i}, kii = K_{i,i}.
  - Reparam xi = kii - beta^T C beta, C = (A^{-1})_{N_i, N_i}
    extracted in O(|N_i|^2) from cached Sigma via Schur.
  - xi | rest ~ Gamma(n/2 + 1, (beta0 + S_ii)/2)
  - beta | rest ~ N(-M^{-1} S_{N_i, i}, M^{-1}),
      M = (beta0 + S_ii) C + (1 / (4 sigma^2)) I.
  - kii = xi + beta^T C beta; PD by construction.

New column written via the shared apply_rank2_chol_smw_update_ helper
(from PR-1) with vf1 = e_i and vf2 carrying the column-i deltas.

row_block_gibbs_eligible() gates the conjugate case; non-conjugate
combinations return false (PR-4..PR-6 will extend coverage).

Tested via a new ggm_test_row_block_gibbs Rcpp entry that calls the
kernel in isolation (no AM dispatch yet — that lands in PR-3):
  1. q = 0 marginal: K_ii ~ Gamma(1, beta0/2) at n=0, KS Bonferroni pass
  2. Agreement vs AM (n=200, p=6, complete graph, 3500 post-burn draws):
       max|K_gibbs - K_am|  = 0.012
       mean rel diff        = 1.7 %
  3. Predicate rejects gamma_shape != 1.

Existing GGM tests pass (gradient 14/14, hier 20/20); bitwise check on
the AM path (4 cells) still identical — PR-2 adds new methods without
touching the existing dispatch.
bgm() / sample_ggm() now accept within_step_kind in
{"adaptive_metropolis", "row_block_gibbs"} (default keeps the AM path).
The flag plumbs through validate_sampler -> run_sampler_ggm ->
sample_ggm.cpp -> GGMModel::set_within_step_kind, and
GGMModel::do_one_metropolis_step branches on it.

Eligibility:
  set_within_step_kind warns and downgrades to AdaptiveMetropolis if
  row_block_gibbs is requested but the prior is not yet supported
  (need Normal slab + Gamma alpha=1 + delta=0). PR-4..PR-6 will extend
  coverage; opt-in scripts under unusual prior combinations keep
  running on the AM path until then.

Bug fix while wiring: the copy constructor was missing
within_step_kind_ (also missing for new state added by this PR until
now), so per-chain clones silently reverted to AM. Added.

Validation:
  1. Default AM path bitwise-identical to PR-2 baseline (4 cells:
     joint+hier x p={20,50}).
  2. AM vs row_block_gibbs posterior agreement via sample_ggm at
     p=6, n=200, 2500 post-burn draws: max|diff| = 0.012, mean rel
     diff = 1.8% (matches the PR-2 standalone test).
  3. row_block_gibbs + Cauchy slab raises the expected warning and
     falls back to AM.
  4. Existing GGM tests pass (gradient 14/14, hierarchical 20/20,
     nuts gated).
Under the determinant-tilt prior |K|^delta, the K_{i,i} log-det
contribution becomes delta * (log|A| + log xi), so the conjugate
factorization keeps the same shape: xi stays Gamma-conjugate with rate
unchanged and shape shifted from n/2 + 1 to n/2 + delta + 1. beta is
unaffected (the tilt does not depend on beta given A).

Implementation: xi_shape = n/2 + determinant_tilt_ + 1 in
update_row_block_gibbs. row_block_gibbs_eligible() drops the delta == 0
check.

Validation: AM vs row_block_gibbs posterior means at delta in {0.5, 2}
on p=6, n=200, complete graph agree within MCMC error:
  delta = 0.5: max|d| = 0.013, mean rel = 2.0%
  delta = 2.0: max|d| = 0.011, mean rel = 1.3%

AM-default bitwise unchanged (4-cell harness); ggm tests pass
(gradient 14/14, hierarchical 20/20).
The Gamma(alpha, beta_0) prior on K_{i,i}/2 introduces a kii^(alpha-1)
factor that does not factorize between xi and beta (kii = xi + beta^T C
beta). Plain conjugate Gibbs no longer applies. Instead, treat the
conjugate (alpha = 1) joint draw as an independent-MH proposal and
accept with probability

  alpha_MH = (kii_new / kii_old)^(alpha - 1)

The beta and S_ii factors cancel since both target and proposal share
the same beta prior and likelihood; only the diagonal correction
remains.

Guarded by std::abs(prior_alpha_ - 1) > 1e-12 so the alpha = 1 path
still skips the runif draw and remains bit-identical to PR-2..PR-4.

row_block_gibbs_eligible() drops the alpha = 1 requirement; the
warn-and-downgrade message updates to cover only the remaining gap
(Cauchy slab, lands in PR-6).

Validation:
  AM vs row_block_gibbs posterior means at p=6, n=200, complete graph,
  Normal slab, delta=0:
    alpha = 0.5: max|d| = 0.013, mean rel = 1.9%
    alpha = 2.0: max|d| = 0.013, mean rel = 1.8%
  AM-default bitwise identical (4-cell harness); ggm tests pass.
The Cauchy slab on K_yy_{ij} = -K_{ij}/2 expands as
  K_yy_{ij} | omega_{ij} ~ N(0, sigma^2 * omega_{ij}),
  omega_{ij} ~ IG(1/2, 1/2).
Conditional on omega, the joint draw of (beta, kii) becomes Gaussian-
Gamma conjugate just like the Normal-slab case, with a per-element
prior precision diag(1 / (4 sigma^2 omega_k)) on the beta block. The
existing slice_sample_omega_ij_ kernel refreshes omega per edge in a
single per-sweep slice pass.

Implementation:
  - update_row_block_gibbs: M now uses
      M(k, k) += inv_4sig2 / omega_{i, N_i[k]}
    Under Normal slab omega is fixed at 1, so the formula reduces to
    the existing constant precision and PR-2..PR-5 stay bit-identical.
  - do_one_metropolis_step (RowBlockGibbs branch): after the row sweep,
    when slab_is_cauchy_, refresh inv_L from the SMW-maintained L
    (O(p^3)), then sweep slice_sample_omega_ij_(i, j) over all edge
    pairs. The refresh is needed because apply_rank2_chol_smw_update_
    maintains L and Sigma but not inv_L.
  - row_block_gibbs_eligible(): accepts NormalPrior OR CauchyPrior.

Validation:
  AM (Cauchy direct) vs row_block_gibbs (Cauchy via mixture) at p=6,
  n=200, complete graph, Gamma(1, 1), delta=0:
    max|d| = 0.009  mean rel = 1.3%
  Both target the same Cauchy marginal; the row-Gibbs path adds omega
  to the state vector but integrates it out at stationarity. AM-default
  bitwise unchanged (4-cell harness); ggm tests pass.
User-facing bgm() takes a single sampler knob:
  update_method in {"nuts", "adaptive-metropolis", "Gibbs"}
instead of the awkward two-parameter combination
  (update_method = "adaptive-metropolis", within_step_kind = "row_block_gibbs")
introduced for row-block Gibbs in PR-3.

Internally the C++ entry sample_ggm() still takes the two-knob
(sampler_type, within_step_kind) tuple — it sits at the R/Rcpp
boundary and "Gibbs" rides on the AM chain runner with the
row_block_gibbs within-step. run_sampler_ggm() handles the mapping.

target_accept = 0.44 is now the default for both AM and Gibbs (the
between-step adapter still consumes it; Gibbs's row sweep has no
per-row acceptance to tune).

Existing bitwise check unchanged (AM-default path); ggm tests pass.
One block per feature increment so a regression isolates to its commit:
exact conjugate draw anchor (prior-only K_ii vs closed-form Gamma) plus
Gibbs-vs-AM posterior-mean agreement on a fixed graph at Normal alpha=1
delta=0, delta=0.5, alpha=2, and the Cauchy slab. The agreement blocks are
skip_on_cran() since they run real chains.
The row-block Gibbs row draw reads only Sigma, never chol(K): delta enters
analytically as a Gamma shape shift, and the Schur extraction uses the
covariance. So the two per-row Cholesky Givens passes in
apply_rank2_chol_smw_update_ produced a factor nothing read until the
between-step. Gate them behind a new update_L flag (true for the edge
accept, false for the row sweep) and rebuild chol(K) once via
refresh_cholesky() at sweep end, which also makes Sigma exact and subsumes
the old conditional drift check.

Single-threaded, behaviour-preserving (chol(K) is a deterministic function
of K). Sigma is still SMW-maintained per row so each row's Schur extraction
is exact. Verified L^T L = K and Sigma K = I to ~1e-16 every sweep under
edge selection, and the Gibbs-vs-AM agreement tests still pass.

Within-step throughput (fixed graph, single thread):
  p=100 d=0.1  196 -> 411 sweeps/s   (2.1x)
  p=200 d=0.1   27 ->  67 sweeps/s   (2.5x)
  dense graphs ~1.08x (chol(M)-bound, as expected).
The rank-2 vectors u1/u2 are nonzero only on the update support ({i,j} for an
edge accept, {i} u N_i for a row-Gibbs row), so a1 = Sigma*u1 equals
Sigma.cols(support)*u1(support). Pass the support to
apply_rank2_chol_smw_update_ and gather just those columns: O(p*|support|)
instead of the dense O(p^2) gemv. A guard (3*|support| < 2p) falls back to the
dense gemv when the support is near-full, so dense graphs never pay the gather
copy. The outer-product Sigma updates stay dense O(p^2) (a1, b1 are length-p);
this only sparsifies the matvec and the capacitance dots.

Bit-identical to the dense path (off-support u entries are exactly zero), so
the per-sweep L^TL=K / Sigma K=I invariant is unchanged. Edge accept (support
size 2) always takes the gather path.

Within-step throughput on top of the deferred-L change (sparse, single thread):
  p=100 d=0.1  411 -> 491 sweeps/s   (1.20x)
  p=200 d=0.1   67 ->  86 sweeps/s   (1.29x)
  dense graphs unchanged (guard -> dense gemv).
Hoist the element-filled per-row buffers (N_i, beta_old, sigma_iNi, s_Ni_i,
C, support) to reused members. clear() on the std::vector keeps capacity, and
set_size on the arma buffers reuses the existing allocation when q fits, so a
sweep no longer reallocates these each row -- notably dropping the per-row
reserve(p-1) on the neighbour list. The RNG draw (arma_rnorm_vec) and the
solve/chol/matmul results stay local: changing the rnorm path would shift the
draw order, and arma move-allocates expression results on assignment so a
member would not be reused anyway.

Bit-identical (only buffer location changes); tests pass. The measured gain is
small (~1-2%, largest at small p where the alloc/flop ratio is highest) since
the row cost is dominated by chol(M) and the dense outer-product Sigma updates,
not allocation.
…s -> gibbs

d3ca987 added "Gibbs" to validate_sampler's match.arg choices (now 3) but left
bgmCompare's default update_method = c("nuts","adaptive-metropolis") (2). match.arg
only falls through to the first element when the passed vector is identical to
choices, so bgmCompare's default errored — failing the comparison.Rmd vignette and
the R-CMD-check / test-coverage / pkgdown CI jobs.

Fixes:
- Gate the gibbs kernel on is_continuous in validate_sampler (gibbs is the GGM
  row-block kernel only; there is no continuous bgmCompare, so this excludes it),
  and take the first element of a passed multi-value default explicitly instead of
  relying on match.arg's fragile default-equals-choices fall-through. An explicit
  update_method = "gibbs" on a non-continuous model is now correctly rejected.
- Rename the user-facing value "Gibbs" -> "gibbs" for consistency with "nuts" and
  "adaptive-metropolis" (the within_step_kind = "row_block_gibbs" internal name is
  unchanged). Done before any release, so no deprecation needed.
- Document the gibbs option in bgm()'s update_method (it was never listed);
  regenerate man/bgm.Rd.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant