Clear monitor-pending RAA once regenerated#4684
Conversation
|
👋 Thanks for assigning @TheBlueMatt as a reviewer! |
|
No new issues found. The production change (
Minor non-blocking observation (not a code bug): the PR description and inline comment frame the fix as happening "whenever The symmetric |
|
🔔 1st Reminder Hey @TheBlueMatt! This PR has been waiting for your review. |
|
Why does this not need backport to 0.1/0.2? |
TheBlueMatt
left a comment
There was a problem hiding this comment.
I'm a bit confused here, why is it safe to always send an RAA (via get_last_revoke_and_ack) based on only signer_pending_... or monitor_pending_...? eg if we reconnect and find that we owe a commitment_signed that is blocked on the signer but have a blocked revoke_and_ack on a monitor update, we'll set signer_pending_raa and then send it if the signer completes even if the monitor is pending.
|
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
The `chanmon_consistency` fuzz target found a reconnect ordering where `signer_pending_revoke_and_ack` and `monitor_pending_revoke_and_ack` could both describe the same owed `revoke_and_ack`. The channel first received a `commitment_signed` whose monitor update completed, but the signer could not provide the next point or secret, leaving `signer_pending_revoke_and_ack` set. Later, receiving the peer `revoke_and_ack` freed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked, `channel_reestablish` saw the peer one state behind and recorded `monitor_pending_revoke_and_ack`, plus the corresponding monitor-pending `commitment_signed`, so the messages could be replayed once monitor updating was restored. If the signer unblocked before the held monitor update was released, `signer_maybe_unblocked` generated and sent the already monitor-safe RAA using `signer_pending_revoke_and_ack`. The monitor-pending flag was not cleared at that point, so `monitor_updating_restored` later generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed. Fix this by clearing `monitor_pending_revoke_and_ack` in the signer-resume path only once a signer-pending RAA is actually being returned.
38f6df2 to
27223fd
Compare
I think we can, though I do wonder why this went uncaught for so long if it actually was an issue in those releases as well. Something about our current fuzz harness made this much easier to find. EDIT: It's reachable in the fuzzer now because there's a path to reload with a stale manager. |
The
chanmon_consistencyfuzz target found a reconnect ordering wheresigner_pending_revoke_and_ackandmonitor_pending_revoke_and_ackcould both describe the same owedrevoke_and_ack.The channel first received a
commitment_signedwhose monitor update completed, but the signer could not provide the next point or secret, leavingsigner_pending_revoke_and_ackset. Later, receiving the peerrevoke_and_ackfreed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked,channel_reestablishsaw the peer one state behind and recordedmonitor_pending_revoke_and_ack, plus the corresponding monitor-pendingcommitment_signed, so the messages could be replayed once monitor updating was restored.If the signer unblocked before the held monitor update was released,
signer_maybe_unblockedgenerated and sent the RAA usingsigner_pending_revoke_and_ack. The monitor-pending flag was not cleared at that point, somonitor_updating_restoredlater generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed.Fix this by clearing
monitor_pending_revoke_and_ackwheneverget_last_revoke_and_acksuccessfully constructs an RAA, alongsidesigner_pending_revoke_and_ack. All resend paths regenerate RAAs through this helper, so successful generation through either pending path satisfies the other pending record. If generation fails, pending signer state is still left set and monitor-pending state remains available for monitor restoration to retry.This failure was discovered in https://github.com/lightningdevkit/rust-lightning/actions/runs/26905971318/job/79370860747.