perf: skip merge on subtrees the memo proves are unaffected#35
Conversation
| out.extend(nodes[i:]) | ||
| if len(out) == 1: | ||
| return out[0] | ||
| merged_nodes = tuple(node.merge(token, merge) for node in out) |
There was a problem hiding this comment.
i love the idea, but think we might need a better implementation - this is a bit messy and not straightforward
|
Fair — I tried two cleaner rewrites and benchmarked both against main; both lose the speedup:
The win comes specifically from the single inline pass: the So it's a real tension: the speedup needs the inline scan, which is the part that reads as messy. Given this is the smallest of the five wins and resists a clean expression, my recommendation is to close this one rather than add code that's either messy or doesn't help. The other four (slots, skip-flatten, argmax-items, interning) carry the bulk of the improvement. Happy to close it unless you'd rather keep the inline version as-is. |
When get_merges is memoized, _merges lists every mergeable subsequence in the subtree. If the chosen merge isn't among them, this subtree contains it nowhere, so merge() returns self immediately — skipping the scan, the child recursion, and the rebuild. Falls back to the plain scan when the node isn't memoized. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5babe23 to
c4d8cee
Compare
Reworked from the earlier lazy-out version (which read messy and only gave ~5%) to reuse the
get_mergesmemo as an "is this subtree affected?" check, per review._merges(when memoized — the default) already lists every mergeable subsequence in the subtree, including nested ones (yield from child.get_merges()). So if the chosen merge isn't among them, this subtree contains it nowhere andmerge()can returnselfimmediately — skipping the scan, the child recursion, and the rebuild in one membership check:Falls back to the scan when the node isn't memoized (e.g.
TRADE_MEMORY_FOR_SPEED=False, or pre-seeded merges before anyget_merges), so behavior is unchanged there. Output identical; 137 tests pass (rebased on current main, which now has #32/#33/#34/#36).Measured back-to-back, 3 identical rounds (50 texts / 200 merges; time = best of 3):
Far bigger than the original lazy-out (−4–7%): an unaffected word now skips the scan, recursion, and rebuild in a single C-level
incheck. Memory is mixed and small (BNE −1.2MB from avoided transient rebuilds; BPE/Boundless +~0.25MB).🤖 Generated with Claude Code