Skip to content

Clear monitor-pending RAA once regenerated#4684

Open
wpaulino wants to merge 1 commit into
lightningdevkit:mainfrom
wpaulino:bogus-async-raa-regenerated
Open

Clear monitor-pending RAA once regenerated#4684
wpaulino wants to merge 1 commit into
lightningdevkit:mainfrom
wpaulino:bogus-async-raa-regenerated

Conversation

@wpaulino

Copy link
Copy Markdown
Contributor

The chanmon_consistency fuzz target found a reconnect ordering where signer_pending_revoke_and_ack and monitor_pending_revoke_and_ack could both describe the same owed revoke_and_ack.

The channel first received a commitment_signed whose monitor update completed, but the signer could not provide the next point or secret, leaving signer_pending_revoke_and_ack set. Later, receiving the peer revoke_and_ack freed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked, channel_reestablish saw the peer one state behind and recorded monitor_pending_revoke_and_ack, plus the corresponding monitor-pending commitment_signed, so the messages could be replayed once monitor updating was restored.

If the signer unblocked before the held monitor update was released, signer_maybe_unblocked generated and sent the RAA using signer_pending_revoke_and_ack. The monitor-pending flag was not cleared at that point, so monitor_updating_restored later generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed.

Fix this by clearing monitor_pending_revoke_and_ack whenever get_last_revoke_and_ack successfully constructs an RAA, alongside signer_pending_revoke_and_ack. All resend paths regenerate RAAs through this helper, so successful generation through either pending path satisfies the other pending record. If generation fails, pending signer state is still left set and monitor-pending state remains available for monitor restoration to retry.

This failure was discovered in https://github.com/lightningdevkit/rust-lightning/actions/runs/26905971318/job/79370860747.

@wpaulino wpaulino added this to the 0.3 milestone Jun 11, 2026
@wpaulino wpaulino requested a review from TheBlueMatt June 11, 2026 21:38
@wpaulino wpaulino self-assigned this Jun 11, 2026
@ldk-reviews-bot

ldk-reviews-bot commented Jun 11, 2026

Copy link
Copy Markdown

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot

ldk-claude-review-bot commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

No new issues found.

The production change (lightning/src/ln/channel.rs:10262-10268) clears monitor_pending_revoke_and_ack in signer_maybe_unblocked when the signer-pending path successfully regenerates an RAA. On re-verification:

  • The clear is placed after the resend-order reblocking logic (10250-10261), so an RAA that gets nulled by the CommitmentFirst reorder correctly leaves monitor_pending_revoke_and_ack set for later retry. This placement is correct.
  • The reverse duplicate (monitor path then signer path) is already prevented by get_last_revoke_and_ack clearing signer_pending_revoke_and_ack on success (line 10352).
  • monitor_updating_restored clears monitor_pending unconditionally (10047), so the held update no longer regenerates the RAA after the signer path sent it.

Minor non-blocking observation (not a code bug): the PR description and inline comment frame the fix as happening "whenever get_last_revoke_and_ack successfully constructs an RAA," but the change is actually in signer_maybe_unblocked, not the helper. The implementation is nonetheless functionally complete and the chosen location is in fact safer than placing it in the helper would be.

The symmetric commitment_signed resend path remains untreated, as noted in my prior review — still out of scope for the bug this PR targets.

@ldk-reviews-bot

Copy link
Copy Markdown

🔔 1st Reminder

Hey @TheBlueMatt! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@TheBlueMatt

Copy link
Copy Markdown
Collaborator

Why does this not need backport to 0.1/0.2?

@TheBlueMatt TheBlueMatt left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused here, why is it safe to always send an RAA (via get_last_revoke_and_ack) based on only signer_pending_... or monitor_pending_...? eg if we reconnect and find that we owe a commitment_signed that is blocked on the signer but have a blocked revoke_and_ack on a monitor update, we'll set signer_pending_raa and then send it if the signer completes even if the monitor is pending.

@ldk-reviews-bot

Copy link
Copy Markdown

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

The `chanmon_consistency` fuzz target found a reconnect ordering where
`signer_pending_revoke_and_ack` and `monitor_pending_revoke_and_ack`
could both describe the same owed `revoke_and_ack`.

The channel first received a `commitment_signed` whose monitor update
completed, but the signer could not provide the next point or secret,
leaving `signer_pending_revoke_and_ack` set. Later, receiving the peer
`revoke_and_ack` freed holding-cell HTLCs and produced a held monitor
update. While that monitor update was still blocked,
`channel_reestablish` saw the peer one state behind and recorded
`monitor_pending_revoke_and_ack`, plus the corresponding monitor-pending
`commitment_signed`, so the messages could be replayed once monitor
updating was restored.

If the signer unblocked before the held monitor update was released,
`signer_maybe_unblocked` generated and sent the already monitor-safe RAA
using `signer_pending_revoke_and_ack`. The monitor-pending flag was not
cleared at that point, so `monitor_updating_restored` later generated
the same RAA again when the held update completed. The peer had already
advanced after accepting the signer-unblocked RAA, so it rejected the
duplicate secret as not corresponding to its current pubkey and
force-closed.

Fix this by clearing `monitor_pending_revoke_and_ack` in the
signer-resume path only once a signer-pending RAA is actually being
returned.
@wpaulino wpaulino force-pushed the bogus-async-raa-regenerated branch from 38f6df2 to 27223fd Compare June 15, 2026 17:29
@wpaulino wpaulino requested a review from TheBlueMatt June 15, 2026 17:29
@wpaulino

wpaulino commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Why does this not need backport to 0.1/0.2?

I think we can, though I do wonder why this went uncaught for so long if it actually was an issue in those releases as well. Something about our current fuzz harness made this much easier to find.

EDIT: It's reachable in the fuzzer now because there's a path to reload with a stale manager.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants