backport: merge bitcoin/bitcoin#25144: refactor: Pass Peer& to Misbehaving()#7331
backport: merge bitcoin/bitcoin#25144: refactor: Pass Peer& to Misbehaving()#7331knst wants to merge 1 commit into
Conversation
|
|
This pull request has conflicts, please rebase. |
|
This pull request has conflicts, please rebase. |
|
✅ Review complete (commit e1f94ef) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e1f94ef60f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| void PeerManagerImpl::PeerMisbehaving(const NodeId pnode, const int howmuch, const std::string& message) | ||
| { | ||
| Misbehaving(pnode, howmuch, message); | ||
| Misbehaving(*Assert(GetPeerRef(pnode)), howmuch, message); |
There was a problem hiding this comment.
Avoid asserting on stale peer ids from async handlers
External handlers can penalize peer ids after the peer has already been finalized; for example ISDLOCK messages are enqueued with the sender id (src/instantsend/net_instantsend.cpp:281), then verified later on the InstantSend worker (src/instantsend/net_instantsend.cpp:414-416), and an invalid source calls PeerMisbehaving (src/instantsend/net_instantsend.cpp:195). If that remote peer disconnects before the worker reaches this line, GetPeerRef(pnode) returns null and Assert aborts the node instead of preserving the old no-op behavior of Misbehaving(NodeId). A malicious peer can therefore turn an invalid async message plus disconnect into a process crash.
Useful? React with 👍 / 👎.
WalkthroughThe PR replaces node-id based misbehavior handling in Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/net_processing.cpp`:
- Line 3600: The `*Assert(GetPeerRef(node))` use in
`PostProcessMessage`/`PeerPostProcessMessage` can abort if the peer is already
absent, unlike the safer `MaybePunishNodeForBlock`/`MaybePunishNodeForTx` paths.
Update the Dash-specific misbehavior handling to tolerate a missing peer by
checking the result of `GetPeerRef(node)` before calling `Misbehaving`, and
verify all `PeerMisbehaving` callers only report still-connected peers so `node`
cannot be invalid or default to `-1`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3bf222ef-10d4-42c7-b356-e33f78fd5707
📒 Files selected for processing (3)
src/net_processing.cppsrc/net_processing.hsrc/test/denialofservice_tests.cpp
| { | ||
| if (result.m_error) { | ||
| Misbehaving(node, result.m_error->score, result.m_error->message); | ||
| Misbehaving(*Assert(GetPeerRef(node)), result.m_error->score, result.m_error->message); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Verify these *Assert(GetPeerRef(...)) paths can't abort on an absent peer.
Unlike MaybePunishNodeForBlock/ForTx (which use if (peer)), the Dash-specific glue here dereferences *Assert(GetPeerRef(...)). If the peer is no longer in m_peer_map, Assert aborts rather than performing a previously-safe no-op. Two things to confirm:
PostProcessMessagedefaultsnode = -1in the public interface; an error result reaching Line 3600 withnode == -1(e.g. viaPeerPostProcessMessage) would makeGetPeerRef(-1)empty and trigger the assert.PeerMisbehaving(Line 6598) is invoked by externalNetHandlers; confirm they only report misbehavior for a still-connected peer id.
#!/bin/bash
# Inspect PeerPostProcessMessage forwarding (does it pass node or default -1?)
rg -nP --type=cpp -C3 '\bPeerPostProcessMessage\b' src/net_processing.cpp
# Find all PostProcessMessage call sites and the node argument they pass
rg -nP --type=cpp 'PostProcessMessage\s*\(' src
# Find PeerMisbehaving callers (external handlers) to confirm peer liveness
rg -nP --type=cpp -C3 '\bPeerMisbehaving\b' src
# Confirm whether any MessageProcessingResult can carry m_error with no node context
ast-grep run --pattern 'PeerPostProcessMessage($$$)' --lang cpp src/net_processing.cppAlso applies to: 6598-6598
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/net_processing.cpp` at line 3600, The `*Assert(GetPeerRef(node))` use in
`PostProcessMessage`/`PeerPostProcessMessage` can abort if the peer is already
absent, unlike the safer `MaybePunishNodeForBlock`/`MaybePunishNodeForTx` paths.
Update the Dash-specific misbehavior handling to tolerate a missing peer by
checking the result of `GetPeerRef(node)` before calling `Misbehaving`, and
verify all `PeerMisbehaving` callers only report still-connected peers so `node`
cannot be invalid or default to `-1`.
fa8aa0a Pass Peer& to Misbehaving() (MacroFake) Pull request description: `Misbehaving` has several coding related issues (ignoring the conceptual issues here for now): * It is public, but it is not supposed to be called from outside of net_processing. Fix that by making it private and creating a public `UnitTestMisbehaving` method for unit testing only. * It doesn't do anything if a `nullptr` is passed. It would be less confusing to just skip the call instead. Fix that by passing `Peer&` to `Misbehaving()`. * It calls `GetPeerRef`, causing `!m_peer_mutex` lock annotations to be propagated. This is harmless, but verbose. Fix it by removing the no longer needed call to `GetPeerRef` and the no longer needed lock annotations. ACKs for top commit: vasild: ACK fa8aa0a w0xlt: Code Review ACK bitcoin@fa8aa0a Tree-SHA512: e60a6b317f2b826f9e0724285d00b632d3e2a91ded9fa5ba01c80766c5d39270b719be234c01302d46eaba600910032693836aa116ff05ee1b590c7530881cd3 Co-authored-by: MacroFake <falke.marco@gmail.com>
thepastaclaw
left a comment
There was a problem hiding this comment.
Code Review
The backport of bitcoin#25144 is mechanically clean and correctly adapts the upstream call sites (MaybePunishNodeForBlock/MaybePunishNodeForTx with if (peer), ProcessMessage with *peer after the null check). However, the two Dash-only entry points that bridge NodeId callers into the new Peer& API — PeerMisbehaving and PostProcessMessage — were adapted using *Assert(GetPeerRef(node)), which is fatal on disconnected peers. PeerMisbehaving in particular is reachable from worker-thread paths (e.g. NetInstantSend::ApplyVerificationResults) where the NodeId can race against disconnect, regressing the prior silent no-op into a node-aborting Assert.
🔴 1 blocking | 🟡 1 suggestion(s)
1 additional finding(s) omitted (not in diff).
🤖 Prompt for all review comments with AI agents
These findings are from an automated code review. Verify each finding against the current code and only fix it if needed.
In `src/net_processing.cpp`:
- [BLOCKING] src/net_processing.cpp:6596-6599: PeerMisbehaving aborts the node on disconnected peers — regression vs. prior no-op
Before this PR, `Misbehaving(NodeId, ...)` started with `PeerRef peer = GetPeerRef(pnode); if (peer == nullptr) return;`, so the Dash-only `PeerMisbehaving` wrapper was a graceful no-op when the peer had already been removed. After this PR, `PeerMisbehaving` is `Misbehaving(*Assert(GetPeerRef(pnode)), howmuch, message);`. `Assert` (src/util/check.h:71) calls `assertion_fail` and aborts the process when its argument is falsy.
`PeerMisbehaving` is invoked from many Dash subsystems with NodeIds captured earlier, including worker-thread paths where the peer may have disconnected since enqueuing:
- `src/instantsend/net_instantsend.cpp:195` — `NetInstantSend::ApplyVerificationResults` runs on the isman work thread (see `workInterrupt` loop at line 410) and uses `pending.node_id` captured at enqueue time.
- `src/instantsend/net_instantsend.cpp:67,100,266`, `src/llmq/net_dkg.cpp:176/183/198/...`, `src/llmq/net_signing.cpp:37/231/275`, `src/llmq/net_quorum.cpp` (many), `src/governance/net_governance.cpp:86/140/197`, `src/coinjoin/server.cpp:139/144/153/179` — all pass NodeId by value.
Upstream solved this exact race in this same PR for `MaybePunishNodeForBlock`/`MaybePunishNodeForTx` by guarding with `if (peer) Misbehaving(*peer, ...)`. The Dash wrapper needs the same null-tolerant pattern; otherwise a routine disconnect race (or an attacker engineering one during verification latency) turns a benign "peer already gone" event into a node crash.
- [SUGGESTION] src/net_processing.cpp:3597-3611: PostProcessMessage Assert is fragile for NodeId=-1 callers
`PostProcessMessage` now does `Misbehaving(*Assert(GetPeerRef(node)), result.m_error->score, result.m_error->message);` when `result.m_error` is set. The public `PeerPostProcessMessage` wrapper (line 6651) forwards to `PostProcessMessage(std::move(ret), /*node=*/-1)`, so `GetPeerRef(-1)` would return an empty `PeerRef` and the Assert would abort.
Today this is safe only because the producers of `MessageProcessingResult` invoked without a peer (e.g. `ChainlockHandler::ProcessNewChainLock` called via `src/chainlock/signing.cpp:286` with `-1`) explicitly guard with `if (from != -1) return MisbehavingError{...};` (src/chainlock/handler.cpp:109). That invariant is not enforced by the type system and is easy to break in future processors that return a `MisbehavingError` without checking `from`.
Mirroring the `MaybePunishNodeForBlock`/`MaybePunishNodeForTx` adaptation (look up, skip if null) keeps `PostProcessMessage` robust without relying on a cross-file invariant.
| void PeerManagerImpl::PeerMisbehaving(const NodeId pnode, const int howmuch, const std::string& message) | ||
| { | ||
| Misbehaving(pnode, howmuch, message); | ||
| Misbehaving(*Assert(GetPeerRef(pnode)), howmuch, message); | ||
| } |
There was a problem hiding this comment.
🔴 Blocking: PeerMisbehaving aborts the node on disconnected peers — regression vs. prior no-op
Before this PR, Misbehaving(NodeId, ...) started with PeerRef peer = GetPeerRef(pnode); if (peer == nullptr) return;, so the Dash-only PeerMisbehaving wrapper was a graceful no-op when the peer had already been removed. After this PR, PeerMisbehaving is Misbehaving(*Assert(GetPeerRef(pnode)), howmuch, message);. Assert (src/util/check.h:71) calls assertion_fail and aborts the process when its argument is falsy.
PeerMisbehaving is invoked from many Dash subsystems with NodeIds captured earlier, including worker-thread paths where the peer may have disconnected since enqueuing:
src/instantsend/net_instantsend.cpp:195—NetInstantSend::ApplyVerificationResultsruns on the isman work thread (seeworkInterruptloop at line 410) and usespending.node_idcaptured at enqueue time.src/instantsend/net_instantsend.cpp:67,100,266,src/llmq/net_dkg.cpp:176/183/198/...,src/llmq/net_signing.cpp:37/231/275,src/llmq/net_quorum.cpp(many),src/governance/net_governance.cpp:86/140/197,src/coinjoin/server.cpp:139/144/153/179— all pass NodeId by value.
Upstream solved this exact race in this same PR for MaybePunishNodeForBlock/MaybePunishNodeForTx by guarding with if (peer) Misbehaving(*peer, ...). The Dash wrapper needs the same null-tolerant pattern; otherwise a routine disconnect race (or an attacker engineering one during verification latency) turns a benign "peer already gone" event into a node crash.
| void PeerManagerImpl::PeerMisbehaving(const NodeId pnode, const int howmuch, const std::string& message) | |
| { | |
| Misbehaving(pnode, howmuch, message); | |
| Misbehaving(*Assert(GetPeerRef(pnode)), howmuch, message); | |
| } | |
| void PeerManagerImpl::PeerMisbehaving(const NodeId pnode, const int howmuch, const std::string& message) | |
| { | |
| if (PeerRef peer = GetPeerRef(pnode)) Misbehaving(*peer, howmuch, message); | |
| } |
source: ['claude', 'coderabbit']
What was done?
After all refactorings to separate network and consensus code it's finally possible to do bitcoin#25144
Depends on #7314, #7332 - both are merged.
How Has This Been Tested?
Run unit & functional tests
Breaking Changes
N/A
Checklist: