Add GET /telemetry/events to read archived telemetry from S2#285
Add GET /telemetry/events to read archived telemetry from S2#285archandatta wants to merge 14 commits into
Conversation
bb9b965 to
0f44c61
Compare
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… limit Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reading telemetry emitted its own api_call event back into the stream, a feedback loop that prevents a paginated read from catching the tail. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A fresh s2.New() per request re-handshakes TLS/HTTP2 on every page of a paginated walk; build the client once in NewS2Reader and reuse it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pins the telemetryReadOps skip against operationId drift; without it a rename silently revives the read feedback loop with all fast tests green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- unify the category enum: PublishEventRequest now references the shared TelemetryEventCategory schema instead of inlining it (no drift on add) - clamp negative since/until/offset to 0 rather than wrapping to a huge uint64; add minimum:0 to the since/until schema - drop the comment claiming a pagination "convention" that does not exist, and stop leaking the internal ring-buffer size into the replay doc - cover category-filtered pagination (empty intermediate pages) e2e and the negative-bound clamping unit case Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
49d0a07 to
48e3e12
Compare
|
Created a monitoring plan for this PR. What this PR does: Browser telemetry events that age out of the live SSE stream are now accessible via a new paginated read endpoint, so consumers can retrieve the full history of archived events rather than only live-streamed ones. Also fixes a feedback loop where the per-request api_call middleware was self-emitting events for telemetry read endpoints, which could prevent paginated reads from ever catching the stream tail at small page sizes. Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses. |
Reflection-assert the skip-set keys are real StrictServerInterface methods so an operationId rename can't silently revive the read feedback loop, and skip-with-warn an unparseable archived record so one bad record can't wedge every paginated read past its seqnum. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rgarcia
left a comment
There was a problem hiding this comment.
reviewed — solid PR. core pagination mechanics are correct (i traced the S2 cursor/has_more termination through the SDK) and match the main API's offset_pagination convention; unit + e2e coverage is thorough. main things are spec-surface wording and a couple of consistency items before this gets relayed to the public API. no correctness bugs.
API surface — keep implementation out of the spec (this gets relayed to the public API)
server/openapi.yaml:1383(summary) — "Read archived telemetry events from durable storage" is implementation-heavy → something like "Read telemetry events for a browser session"server/openapi.yaml:1385-1390(description) — drop the S2 / "age out of the live SSE ring buffer" / "durable storage is not configured" detail → e.g. "Reads telemetry event data for the browser session."server/openapi.yaml:1398(offset param) — "This is a stream position" leaks implementation; reword as an opaque pagination cursor. keep the useful guidance (pass backX-Next-Offset; don't derive it from the response body / it's not the envelopeseq).
Pagination consistency (vs kernel/kernel packages/api/openapi.yaml)
- core mechanics MATCH: header-based offset pagination, top-level array body,
offset/limitparams,X-Has-More/X-Next-Offsetheaders — i.e. the Stainlessoffset_paginationmodel. the planned SDK wiring should work as-is. server/openapi.yaml:1428(limit max) — max is 1000; every main-API list endpoint capslimitat 100 (default 20). align to 100, or call out the higher cap as a deliberate telemetry exception.server/openapi.yaml:1406/:1414(since/until) — typed as int64 unix ms. the main API'ssinceis a string (RFC-3339 or a duration like5m, e.g.packages/api/openapi.yaml:8346) and there's nountilanywhere. prefer RFC-3339/duration forsinceto stay consistent for the public relay, and treatuntilas a documented telemetry-specific addition.offsettaking precedence oversinceis new behavior (no main-API endpoint combines a cursor with a time filter) — worth an explicit doc note.
Handler
server/cmd/api/api/events.go:142(defaultReadWindow) — only sets the start (now−5m); the end is bounded bycount, not a window. reword the name/comment to "default start" to avoid implying an end bound.server/cmd/api/api/events.go:190-195(buildReadOptions) —until-only requests (nosince/offset) still anchor the start at now−5m via the default branch, sountil< now−5m silently yields an empty page. consistent with the documentedsincedefault, but a mild footgun — confirm intended, or special-case until-only to start from the beginning.
Test coverage
- the core
S2Reader.Read+has_more/cursor logic (the subtlest code here) is exercised only by the e2e tests, which auto-skip without S2 creds + docker. worth confirming CI runs them with creds; otherwise an interface seam over the S2 read session for a mocked unit test would give fast-feedback coverage of the cursor math on every run. - windowed (
until) pagination termination on an actively-appended stream is hand-verified only (the "one benign empty boundary page" in the description), not in a committed test. my read of the SDK agrees it terminates (a zero-record page →NextReadPosition()==nil→has_more=false), but consider an e2e guard.
Nits (optional)
- duplicate "is S2 configured?" predicate — the explicit three-field
if config.S2Basin != "" && …inmain.gofor the writer vsNewS2Reader's internal copy of the same check; a smallconfig.S2Enabled()helper would keep them from drifting. - config defaults (
100ms/50) live in both the config struct tags and thes2storage.gozero-value fallbacks (already acknowledged in a comment) — just duplication to be aware of. S2Reader.Readunbounded-read guard (Count==nil && Until==nil → Until=now) is dead from the only caller (the handler always setsCount) and untested — fine as defensive library code, just noting it's unreachable from this path.
Summary
Adds a paginated read path for browser telemetry archived in S2, so events stay available after they age out of the live SSE ring buffer (
GET /telemetry/stream).GET /telemetry/events— reads archived telemetry envelopes for the browser in ascending sequence order. Returns a top-level JSON array of envelopes plusX-Has-More/X-Next-Offsetpagination headers. Empty result serializes as[], nevernull.offsetis an S2 sequence cursor (pass the previous response'sX-Next-Offset);since/untilbound the time window;categoryfilters;limitis the page size (default 100, max 1000).has_moreis derived from the S2 read's next-position vs the stream tail, so it is correct whether a page stops on the count, byte, or time bound. Negativesince/until/offsetare clamped to 0 (minimum: 0in the spec; no request-validation middleware is mounted).events.S2Reader— one-shot, bounded S2 reader injected intoApiService; builds its S2 client once and reuses it across pages.S2_BATCHER_LINGER/S2_BATCHER_MAX_RECORDSenv vars wire into the existingS2Config(defaults unchanged: 100ms / 50).TelemetryEventCategoryschema; both the read (TelemetryEvent) and write (PublishEventRequest) sides reference it, so adding a category can't drift the two apart.api_callevents. The per-requestapi_callmiddleware was firing forReadTelemetryEvents/StreamTelemetryEvents, appending to the very stream being read — a feedback loop that, at page size 1, prevented a paginated read from ever catching the tail. Reads are now side-effect-free.Envelope/event schemas are unchanged. The response is encoded directly from
events.Envelope(matching the SSE stream and publish endpoints), so all telemetry endpoints emit an identical envelope shape.Consumers paginate via
X-Next-Offset(an S2 stream position) — not the envelope'sseqfield; the param/header docs call this out.This endpoint is harmless until a consumer calls it, so it can ship independently. A follow-up on the control-plane API will mirror it and wire Stainless
offset_paginationso the SDK auto-paginates.Note on category-filtered pagination
The category filter is applied after the cursor-bounded S2 read (S2 has no server-side filtering), so a filtered page can come back empty while
X-Has-Moreis true.has_moreis authoritative — a client pages until it is false, not until a page is empty. Verified end-to-end (see below).Test plan
buildReadOptions(offset-over-since precedence, negative-bound clamping,Countmapping),pageSizeclamping,filterByCategory, and a middleware regression asserting the telemetry read endpoints emit noapi_call.server/e2e, real S2 viaKI_E2E_BACKEND=docker): round-trip, category filter, empty-window[], multi-page cursor walk (ascending, no gaps/dupes), category-filtered pagination across empty intermediate pages, and a read-is-side-effect-free assertion. Auto-skips withoutS2_BASIN/S2_ACCESS_TOKEN/S2_STREAM.go build,go vet, package tests green;make oapi-generateoutput committed.Live runs
Ran against a real S2 stream (freshly built headless image,
KI_E2E_BACKEND=docker):Also exercised by hand over HTTP at scale — published ~1000 events and paged the full stream at
limit=50:limit=1terminates cleanly (the feedback-loop fix), and a windowed (until) read terminates even while the stream is actively appended (one benign empty boundary page).🤖 Generated with Claude Code
Note
Medium Risk
New external read surface and S2 pagination semantics (especially category-filtered empty pages) need correct client cursor handling; middleware skip fixes a real pagination bug on read paths.
Overview
Adds paginated read access to telemetry archived in S2 via
GET /telemetry/events, so clients can recover events after they leave the live SSE ring buffer. When S2 is not configured, the handler returns200with[](nevernull). Pages useX-Has-More/X-Next-Offset(S2 stream cursor, not envelopeseq), withsince/until,limit(default 100, max 1000), and post-readcategoryfiltering.events.S2Readerperforms bounded, reusable-client S2 reads;ApiServicetakes an optional reader andbuildReadOptionsmaps query params (including negative-bound clamping andoffsetoversince).S2 append batching is tunable via
S2_BATCHER_LINGER/S2_BATCHER_MAX_RECORDSenv vars passed into the existing writer.OpenAPI promotes a shared
TelemetryEventCategoryfor publish and read paths (replacingPublishEventRequestCategory).Telemetry HTTP middleware skips
ReadTelemetryEventsandStreamTelemetryEventsso reads do not appendapi_callevents into the stream being consumed.E2e tests cover round-trip, pagination, filtered empty pages, and read side-effect-freedom against real S2 when creds are set.
Reviewed by Cursor Bugbot for commit 4c4e6ff. Bugbot is set up for automated code reviews on this repo. Configure here.