feat(payload): positional-cursor /check paging; remove the spool#52
Open
andinux wants to merge 7 commits into
Open
feat(payload): positional-cursor /check paging; remove the spool#52andinux wants to merge 7 commits into
andinux wants to merge 7 commits into
Conversation
cloudsync_payload_chunks could only resume a window from since>db_version, so a chunk boundary that lands inside a single committed db_version (or inside a fragmented oversized value) was not addressable — which is the whole reason the server stages the stream into a cloudsync_payload_spool table to page it out. This makes the vtab resumable by position instead. Add an optional positional cursor: hidden inputs resume_db_version, resume_seq, resume_frag_offset start the scan at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset; new outputs next_db_version, next_seq, next_frag_offset and is_final report where the emitted chunk stopped. A stateless /check can then page the whole window with an O(1) seek per call (vs O(N^2) replay-from-since), no spool table, no server-side state. Legacy since>db_version callers (send path, spool fill) are unchanged: columns are appended and the positional branch only activates when resume_db_version is bound. Tiling is exact, not idempotent-overlap: (db_version, seq) is a unique total order (changes rowid = (db_version<<30)|seq), next_* names the row that did not fit or the exact byte already emitted, and the fragment plan is a deterministic function of the row, so a resumed fragment tiles identically. The drain-start cursor must seek exclusive-after the last applied change (see docs/internal design note) so the protocol never relies on changes-level idempotency to absorb a re-sent row. New unit test Payload Chunks Positional Resume drives a window mixing a db_version split across chunks with a value larger than the chunk budget, pages it one chunk per call via the cursor, and asserts byte-identity with a full-window scan (including a mid-fragment resume). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…into codex/positional-cursor-chunks
… SRF Mirror the SQLite vtab's positional cursor on the PostgreSQL SRF so the /check job can page the node one chunk per round-trip without the cloudsync_payload_spool table. Adds inputs resume_db_version, resume_seq, resume_frag_offset (start the scan at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset) and outputs next_db_version, next_seq, next_frag_offset, is_final (where the emitted chunk stopped). The positional query reuses the (db_version > $ OR (db_version = $ AND seq >= $)) shape already used by payload_blob_checked's estimate, with an inclusive seq to match the vtab's exact tiling. Fragment setup is factored into payload_chunks_pg_begin_fragment so a streamed and a resumed fragment build identically; the next cursor is read from the buffered source row (or the same row mid-fragment), peeking one row ahead to set is_final. Legacy callers (send path, spool fill) are unchanged: new args default to NULL so the positional branch only activates when resume_db_version is bound, and the existing output columns keep their positions. New test 55_payload_chunks_positional_resume.sql resumes at every non-final chunk's reported cursor and asserts the fetched chunk equals the next chunk of a full-window scan byte-for-byte, including a mid-fragment resume. Verified against PostgreSQL 17; existing chunk/spool/fragment tests (52-54) still pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Part 1 of test 55 proves the positional cursor produces byte-identical chunks to a full scan. Part 2 now closes the loop end-to-end: a plpgsql helper drains the whole window the way the /check job will (legacy exclusive since=0, then the positional cursor one chunk per call, ORDER BY chunk_index LIMIT 1 so each value-per-call SRF runs to completion), the drained stream is applied to a fresh database, and the receiver's table content is hashed and compared to the source. This validates the real path (positional drain -> apply -> faithful replica), not just byte-identity against a baseline. Verified on PostgreSQL 17: 501 rows reproduced, hashes match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirror the PostgreSQL test's end-to-end check on the SQLite side: after asserting the positional drain reproduces the full-scan chunks byte-for-byte, apply that drained stream to a fresh receiver database and assert its split_test content matches the source via test_split_tables_equal. Chunks are applied in reverse drain order so the apply path must reassemble v3 fragments and merge rows independent of transport order. Validates positional drain -> apply -> faithful replica, the real path the /check job will use. All unit tests pass, no leaks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chunk_bench builds a window of N chunks and times paging the whole window one chunk per call via the positional cursor on cloudsync_payload_chunks, reporting wall time, per-chunk cost and throughput. Local-only (loads the built dylib, no network). Env-tunable rows/row_bytes/txns/repeats/chunk_size (TXNS splits the rows across db_versions). Lets the drain's computational growth be tracked — e.g. to confirm a future indexed (db_version, seq) seek flattens it from O(N^2) to O(N). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ions The /check job now pages the tenant via the positional cursor on cloudsync_payload_chunks, so the server-side download spool is dead. Remove the cloudsync_payload_spool table and the cloudsync_payload_spool_fill / _drop / _drop_chunk functions on both engines (SQLite: sql_sqlite.c constants, sql.h decls, cloudsync_sqlite.c functions + registration; PostgreSQL: cloudsync.sql.in and the 1.0->1.1 migration), plus the SQLite "Payload Download Spool" unit test and the spool sections of the PostgreSQL chunk test. The client-side cursor paging (network.c) is unchanged: the wire protocol still pages chunks by an integer cursor; only the server's node-side staging mechanism changed. All remaining chunk/fragment/positional tests pass on both engines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the server-side download spool with a positional cursor on
cloudsync_payload_chunks, so the/checkjob can page a tenant's change windowone chunk per round-trip with no temporary table. This is the extension half of the
change; the companion server PR is sqliteai/cloudsync#52.
Why
The spool stages a whole
/checkwindow into acloudsync_payload_spooltable onthe tenant, then pages it out. That has three operational costs this PR removes:
spool_fillissuesCREATE TABLE, which the tenant's defaultreadwriteapikey is not authorized for →failures.check: database_permission_denied(the original bug; the bench polled forever). Positional needs no DDL.
Positional writes nothing to the tenant.
creatable.
How it works
cloudsync_payload_chunksgains an optional positional cursor:resume_db_version,resume_seq,resume_frag_offset— start the scanat
(db_version, seq)inclusive and re-enter a mid-value fragment at a byte offset;next_db_version,next_seq,next_frag_offset,is_final— where theemitted chunk stopped.
The server pages with
… LIMIT 1, threading each chunk'snext_*back as the nextresume_*under a pinneduntilwatermark, untilis_final. Tiling is exact —(db_version, seq)is a unique total order, so each change is emitted in exactly onechunk with no overlap and no reliance on changes-level idempotency. The first fetch
uses the legacy exclusive
since(drain-start "exclusive-after" rule).Client wire protocol is unchanged — the positional cursor is server-internal
(job ↔ node); the client still pages by an integer artifact cursor.
network.cisuntouched.
Feature comparison
CREATE TABLEprivilege/checkPerformance (
test/chunk_bench.c, local SQLite, end-to-end paging)Chunk counts/bytes match exactly (positional is correct, just slower); the ratio
doubles as the window doubles ⇒ positional is O(N²). Root cause: each resume runs
… FROM cloudsync_changes WHERE (db_version,seq) ≥ cursor ORDER BY db_version, seq,and
cloudsync_changeshas no(db_version, seq)index, so it re-scans + re-sortsthe window every call.
Accepted trade-off:
/checkis an async background job, so the O(N²) is node CPUon cold-start/large windows — fine for typical windows and mitigatable. The real O(N)
fix is a follow-up (indexed
(db_version, seq)seek oncloudsync_changes);test/chunk_bench.cis kept as the guard to confirm it flattens the curve.Changes
cloudsync_payload_chunkspositional cursor — SQLite vtab (cloudsync_sqlite.c)and PostgreSQL SRF (
cloudsync_postgresql.c+ SQL signatures). Legacy callers(send path) unchanged; columns appended, positional branch only active when
resume_db_versionis bound.cloudsync_payload_spooltable +spool_fill/spool_drop/drop_chunkon both engines, the SQLite
Payload Download Spoolunit test, and the spoolsections of the PG chunk test.
an end-to-end drain→apply→content-hash round-trip.
test/chunk_bench.c— local positional-drain benchmark / O(N²)→O(N) guard.Verification
the spool removed.
failures.check(separate commit) surfaces apermanent error immediately instead of polling.
Follow-ups (not blocking)
(db_version, seq)seek oncloudsync_changes→ positional drain O(N)./checkjob-duration metric/alert for pathological large windows.🤖 Generated with Claude Code