feat(payload): positional-cursor /check paging; remove the spool by andinux · Pull Request #52 · sqliteai/sqlite-sync

andinux · 2026-06-27T01:05:11Z

Summary

Replace the server-side download spool with a positional cursor on
cloudsync_payload_chunks, so the /check job can page a tenant's change window
one chunk per round-trip with no temporary table. This is the extension half of the
change; the companion server PR is sqliteai/cloudsync#52.

Why

The spool stages a whole /check window into a cloudsync_payload_spool table on
the tenant, then pages it out. That has three operational costs this PR removes:

- Privilege: spool_fill issues CREATE TABLE, which the tenant's default
  readwrite apikey is not authorized for, so the job fails with
  failures.check: database_permission_denied (the original bug; the bench
  polled forever). Positional needs no DDL.
- Disk: the spool temporarily ~doubles tenant DB space per requesting client.
  Positional writes nothing to the tenant.
- Ops: no provisioning / migration / dbadmin-fallback to keep the spool table
  creatable.

How it works

cloudsync_payload_chunks gains an optional positional cursor:

- inputs resume_db_version, resume_seq, resume_frag_offset: start the scan
  at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset;
- outputs next_db_version, next_seq, next_frag_offset, is_final: where the
  emitted chunk stopped.

The server pages with … LIMIT 1, threading each chunk's next_* back as the next
resume_* under a pinned until watermark, until is_final. Tiling is exact —
(db_version, seq) is a unique total order, so each change is emitted in exactly one
chunk with no overlap and no reliance on changes-level idempotency. The first fetch
uses the legacy exclusive since (drain-start "exclusive-after" rule).
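
The paging loop described above can be sketched in a few lines. This is a toy in-memory model, not the extension's code: `payload_chunk`, `drain`, the `Cursor`/`Chunk` types and the byte-budget rule are hypothetical stand-ins, and the exclusive `since` is expressed as an inclusive start at `(since + 1, 0)`, which assumes `seq` is non-negative.

```python
from typing import NamedTuple, Optional

class Cursor(NamedTuple):
    db_version: int
    seq: int
    frag_offset: int  # byte offset inside the value at (db_version, seq)

class Chunk(NamedTuple):
    data: bytes
    next: Optional[Cursor]  # None when is_final is True
    is_final: bool

def payload_chunk(rows, until, budget, resume):
    """Toy stand-in for one `... LIMIT 1` call: emit at most `budget` bytes."""
    out = bytearray()
    # Inclusive start at (db_version, seq), bounded by the pinned `until` watermark.
    window = [r for r in rows
              if (r[0], r[1]) >= (resume.db_version, resume.seq) and r[0] <= until]
    offset = resume.frag_offset
    for db_version, seq, value in window:
        room = budget - len(out)
        remaining = value[offset:]
        if len(remaining) > room:
            # Chunk is full: report the row (and byte) where the next call resumes.
            out += remaining[:room]
            return Chunk(bytes(out), Cursor(db_version, seq, offset + room), False)
        out += remaining
        offset = 0  # only the first row of a call can start mid-value
    return Chunk(bytes(out), None, True)

def drain(rows, since, until, budget):
    """Thread each chunk's next cursor back as the resume cursor until is_final."""
    cursor = Cursor(since + 1, 0, 0)
    chunks = []
    while True:
        chunk = payload_chunk(rows, until, budget, cursor)
        chunks.append(chunk.data)
        if chunk.is_final:
            return chunks
        cursor = chunk.next

# Rows sorted by (db_version, seq); the 10-byte value is larger than the budget.
rows = [(1, 0, b"aaaa"), (1, 1, b"bb"), (2, 0, b"c" * 10), (3, 0, b"dd")]
chunks = drain(rows, since=0, until=2, budget=4)
print(chunks)
```

The job holds only the cursor between calls, which is why no node-side state is needed in this model; the row at `db_version` 3 is excluded by the pinned `until`.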

Client wire protocol is unchanged — the positional cursor is server-internal
(job ↔ node); the client still pages by an integer artifact cursor. network.c is
untouched.

Feature comparison

| | positional cursor (new) | spool (old) |
|---|---|---|
| Tenant `CREATE TABLE` privilege | none | required (the bug) |
| Tenant disk during `/check` | none | ~doubles per client |
| Provisioning / migration | none | spool table must exist |
| Node-side state | none (cursor is the job's loop var) | spool rows per stream |
| Client wire protocol | unchanged | unchanged |
| Drain cost | O(N²) (see below) | O(N) |

Performance (`test/chunk_bench.c`, local SQLite, end-to-end paging)

| window (multi-version) | spool O(N) | positional O(N²) | ratio (positional / spool) |
|---|---|---|---|
| 100 chunks | 164 ms | 872 ms | 5.3× |
| 200 chunks | 350 ms | 3,439 ms | 9.8× |

Chunk counts and bytes match exactly, so positional is correct, just slower. Doubling
the window roughly doubles the ratio (5.3× to 9.8×) and roughly quadruples the
positional time (872 ms to 3,439 ms), which is the signature of O(N²). Root cause:
each resume runs
… FROM cloudsync_changes WHERE (db_version,seq) ≥ cursor ORDER BY db_version, seq,
and cloudsync_changes has no (db_version, seq) index, so every call re-scans and
re-sorts the window.
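
As a rough consistency check on the complexity claim, assume each of the N resume calls costs time proportional to the whole window. The constants c and c' are illustrative, not measured; the timings are the ones in the benchmark table.

```latex
\[
T_{\text{positional}}(N) \approx \sum_{k=1}^{N} c\,N = c\,N^{2},
\qquad
T_{\text{spool}}(N) \approx c'\,N
\]
\[
\frac{T_{\text{positional}}(200)}{T_{\text{positional}}(100)}
  = \frac{3439}{872} \approx 3.9 \approx 2^{2},
\qquad
\frac{T_{\text{spool}}(200)}{T_{\text{spool}}(100)}
  = \frac{350}{164} \approx 2.1 \approx 2
\]
```

Two data points cannot prove an asymptotic bound, but both ratios are consistent with quadratic growth for positional and linear growth for the spool.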

Accepted trade-off: /check is an async background job, so the O(N²) is node CPU
on cold-start/large windows — fine for typical windows and mitigatable. The real O(N)
fix is a follow-up (indexed (db_version, seq) seek on cloudsync_changes);
test/chunk_bench.c is kept as the guard to confirm it flattens the curve.

Changes

- cloudsync_payload_chunks positional cursor: SQLite vtab (cloudsync_sqlite.c)
  and PostgreSQL SRF (cloudsync_postgresql.c + SQL signatures). Legacy callers
  (send path) unchanged; columns appended, positional branch only active when
  resume_db_version is bound.
- Removed cloudsync_payload_spool table + spool_fill/spool_drop/drop_chunk
  on both engines, the SQLite Payload Download Spool unit test, and the spool
  sections of the PG chunk test.
- New tests on both engines: byte-for-byte tiling incl. a mid-fragment resume, plus
  an end-to-end drain→apply→content-hash round-trip.
- test/chunk_bench.c: local positional-drain benchmark / O(N²)→O(N) guard.

Verification

- SQLite unit suite passes; PG chunk tests (52/53/54/55) pass on PostgreSQL 17 with
  the spool removed.
- Client fail-fast on a non-retryable failures.check (separate commit) surfaces a
  permanent error immediately instead of polling.

Follow-ups (not blocking)

- Indexed (db_version, seq) seek on cloudsync_changes → positional drain O(N).
- /check job-duration metric/alert for pathological large windows.
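
To illustrate why an indexed seek should help, the sketch below compares SQLite query plans for the resume query shape with and without a `(db_version, seq)` index. It runs against a plain stand-in table, `demo_changes`, with a hypothetical index name; it is not `cloudsync_changes` itself, whose storage may not accept an index this way.

```python
import sqlite3

# Same shape as the resume query: inclusive (db_version, seq) start, ordered scan.
QUERY = (
    "SELECT db_version, seq FROM demo_changes "
    "WHERE (db_version, seq) >= (?, ?) "
    "ORDER BY db_version, seq"
)

def query_plan(conn):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (2, 0)).fetchall()
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_changes (db_version INTEGER, seq INTEGER, payload BLOB)"
)

plan_without_index = query_plan(conn)

# Hypothetical index standing in for the follow-up's indexed seek.
conn.execute("CREATE INDEX demo_changes_pos ON demo_changes (db_version, seq)")
plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

Without the index the plan includes a full scan plus a temporary B-tree for the ORDER BY; with it, the planner can search the index and read rows already in order. The exact plan wording varies between SQLite versions.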

🤖 Generated with Claude Code

cloudsync_payload_chunks could only resume a window from since>db_version, so a chunk boundary that lands inside a single committed db_version (or inside a fragmented oversized value) was not addressable, which is the whole reason the server stages the stream into a cloudsync_payload_spool table to page it out. This makes the vtab resumable by position instead.

Add an optional positional cursor: hidden inputs resume_db_version, resume_seq, resume_frag_offset start the scan at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset; new outputs next_db_version, next_seq, next_frag_offset and is_final report where the emitted chunk stopped. A stateless /check can then page the whole window with an O(1) seek per call (vs O(N^2) replay-from-since), no spool table, no server-side state.

Legacy since>db_version callers (send path, spool fill) are unchanged: columns are appended and the positional branch only activates when resume_db_version is bound.

Tiling is exact, not idempotent-overlap: (db_version, seq) is a unique total order (changes rowid = (db_version<<30)|seq), next_* names the row that did not fit or the exact byte already emitted, and the fragment plan is a deterministic function of the row, so a resumed fragment tiles identically. The drain-start cursor must seek exclusive-after the last applied change (see docs/internal design note) so the protocol never relies on changes-level idempotency to absorb a re-sent row.

New unit test Payload Chunks Positional Resume drives a window mixing a db_version split across chunks with a value larger than the chunk budget, pages it one chunk per call via the cursor, and asserts byte-identity with a full-window scan (including a mid-fragment resume).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
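
The rowid formula quoted in this commit message, (db_version<<30)|seq, can be checked with a small sketch. The helper names `pack` and `unpack` are hypothetical, and the sketch assumes `seq` always fits in 30 bits, which the shift implies but the source does not state.

```python
SEQ_BITS = 30  # from the formula rowid = (db_version << 30) | seq

def pack(db_version, seq):
    """Combine (db_version, seq) into one integer that sorts the same way."""
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("seq must fit in 30 bits for the ordering to hold")
    return (db_version << SEQ_BITS) | seq

def unpack(rowid):
    """Recover (db_version, seq) from a packed rowid."""
    return rowid >> SEQ_BITS, rowid & ((1 << SEQ_BITS) - 1)

positions = [(2, 0), (1, 5), (1, 0), (3, 1), (2, 7)]
by_tuple = sorted(positions)
by_rowid = sorted(positions, key=lambda p: pack(*p))
print(by_tuple == by_rowid)
```

Because packing is injective and order-preserving under that assumption, no two changes share a position, so a cursor naming one (db_version, seq) splits the window into exactly two parts.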

…into codex/positional-cursor-chunks

… SRF

Mirror the SQLite vtab's positional cursor on the PostgreSQL SRF so the /check job can page the node one chunk per round-trip without the cloudsync_payload_spool table. Adds inputs resume_db_version, resume_seq, resume_frag_offset (start the scan at (db_version, seq) inclusive and re-enter a mid-value fragment at a byte offset) and outputs next_db_version, next_seq, next_frag_offset, is_final (where the emitted chunk stopped).

The positional query reuses the (db_version > $ OR (db_version = $ AND seq >= $)) shape already used by payload_blob_checked's estimate, with an inclusive seq to match the vtab's exact tiling. Fragment setup is factored into payload_chunks_pg_begin_fragment so a streamed and a resumed fragment build identically; the next cursor is read from the buffered source row (or the same row mid-fragment), peeking one row ahead to set is_final.

Legacy callers (send path, spool fill) are unchanged: new args default to NULL so the positional branch only activates when resume_db_version is bound, and the existing output columns keep their positions.

New test 55_payload_chunks_positional_resume.sql resumes at every non-final chunk's reported cursor and asserts the fetched chunk equals the next chunk of a full-window scan byte-for-byte, including a mid-fragment resume. Verified against PostgreSQL 17; existing chunk/spool/fragment tests (52-54) still pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
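
The predicate shape named in this commit message is the expanded form of an inclusive tuple comparison. A minimal sketch of that equivalence, using Python's bundled SQLite as a stand-in for PostgreSQL and a hypothetical `demo_changes` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_changes (db_version INTEGER, seq INTEGER)")
conn.executemany(
    "INSERT INTO demo_changes VALUES (?, ?)",
    [(v, s) for v in range(1, 4) for s in range(3)],
)

resume_db_version, resume_seq = 2, 1

# Expanded shape: db_version > $ OR (db_version = $ AND seq >= $)
expanded = conn.execute(
    "SELECT db_version, seq FROM demo_changes "
    "WHERE db_version > ? OR (db_version = ? AND seq >= ?) "
    "ORDER BY db_version, seq",
    (resume_db_version, resume_db_version, resume_seq),
).fetchall()

# Row-value shape: (db_version, seq) >= ($, $)
row_value = conn.execute(
    "SELECT db_version, seq FROM demo_changes "
    "WHERE (db_version, seq) >= (?, ?) "
    "ORDER BY db_version, seq",
    (resume_db_version, resume_seq),
).fetchall()

print(expanded)
```

The inclusive `seq >= $` matters: the cursor names the first row that has not been fully emitted, so that row must be returned again by the next call.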

Part 1 of test 55 proves the positional cursor produces byte-identical chunks to a full scan. Part 2 now closes the loop end-to-end: a plpgsql helper drains the whole window the way the /check job will (legacy exclusive since=0, then the positional cursor one chunk per call, ORDER BY chunk_index LIMIT 1 so each value-per-call SRF runs to completion), the drained stream is applied to a fresh database, and the receiver's table content is hashed and compared to the source. This validates the real path (positional drain -> apply -> faithful replica), not just byte-identity against a baseline. Verified on PostgreSQL 17: 501 rows reproduced, hashes match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Mirror the PostgreSQL test's end-to-end check on the SQLite side: after asserting the positional drain reproduces the full-scan chunks byte-for-byte, apply that drained stream to a fresh receiver database and assert its split_test content matches the source via test_split_tables_equal. Chunks are applied in reverse drain order so the apply path must reassemble v3 fragments and merge rows independent of transport order. Validates positional drain -> apply -> faithful replica, the real path the /check job will use. All unit tests pass, no leaks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
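
The reverse-order apply check can be pictured with a toy receiver that buffers fragments until a value is complete. Everything here is hypothetical (the record layout, `apply_chunk`, `content_hash`); it models only fragment reassembly, not the extension's row-merge logic or its v3 fragment format.

```python
import hashlib

def apply_chunk(table, pending, chunk):
    """Apply one chunk; `pending` buffers fragments of values not yet complete."""
    for key, frag_index, frag_total, data in chunk:
        parts = pending.setdefault(key, {})
        parts[frag_index] = data
        if len(parts) == frag_total:
            # All fragments have arrived: reassemble in index order.
            table[key] = b"".join(parts[i] for i in range(frag_total))
            del pending[key]

def content_hash(table):
    """Order-independent digest of the table's content."""
    h = hashlib.sha256()
    for key in sorted(table):
        h.update(repr((key, table[key])).encode())
    return h.hexdigest()

source = {"a": b"small", "b": b"x" * 10}

# Drained stream: the value for "b" is split into three fragments across chunks.
chunks = [
    [("a", 0, 1, b"small"), ("b", 0, 3, b"xxxx")],
    [("b", 1, 3, b"xxxx")],
    [("b", 2, 3, b"xx")],
]

receiver, pending = {}, {}
for chunk in reversed(chunks):  # apply in reverse drain order
    apply_chunk(receiver, pending, chunk)

print(content_hash(receiver) == content_hash(source))
```

Applying in reverse is a stricter check than applying in drain order, since a receiver that silently depends on arrival order would fail it.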

chunk_bench builds a window of N chunks and times paging the whole window one chunk per call via the positional cursor on cloudsync_payload_chunks, reporting wall time, per-chunk cost and throughput. Local-only (loads the built dylib, no network). Env-tunable rows/row_bytes/txns/repeats/chunk_size (TXNS splits the rows across db_versions). Lets the drain's computational growth be tracked — e.g. to confirm a future indexed (db_version, seq) seek flattens it from O(N^2) to O(N). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ions

The /check job now pages the tenant via the positional cursor on cloudsync_payload_chunks, so the server-side download spool is dead. Remove the cloudsync_payload_spool table and the cloudsync_payload_spool_fill / _drop / _drop_chunk functions on both engines (SQLite: sql_sqlite.c constants, sql.h decls, cloudsync_sqlite.c functions + registration; PostgreSQL: cloudsync.sql.in and the 1.0->1.1 migration), plus the SQLite "Payload Download Spool" unit test and the spool sections of the PostgreSQL chunk test.

The client-side cursor paging (network.c) is unchanged: the wire protocol still pages chunks by an integer cursor; only the server's node-side staging mechanism changed. All remaining chunk/fragment/positional tests pass on both engines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

andinux and others added 7 commits June 26, 2026 18:08

Merge remote-tracking branch 'origin/codex/chunked-payloads-network' …

d95a655

…into codex/positional-cursor-chunks

andinux changed the title ~~feat(payload): positional-cursor resume for chunk vtab (prototype)~~ feat(payload): positional-cursor /check paging; remove the spool Jun 30, 2026

andinux wants to merge 7 commits into codex/chunked-payloads-network from codex/positional-cursor-chunks
