feat(core): OIDC sign-in via device flow (RFC 8628)#52
Open
glasstiger wants to merge 22 commits into
Open
Conversation
Two fixes for the OIDC device flow in the Java client. M2 - Bidi / zero-width Unicode bypassed the display sanitizer. sanitizeForDisplay and OidcAuthException.putSanitized filtered only on Character.isISOControl, which covers C0/C1 and DEL but not the bidirectional overrides (U+202A-202E), isolates (U+2066-2069), marks (U+200E/200F), zero-width characters or the BOM (U+FEFF). Those fields - user_code, verification_uri(_complete), error and error_description - all come from the IdP/settings boundary and reach System.out and the exception messages, so a hostile or MITM'd IdP could embed a right-to-left override and spoof the verification URL a human reads and then opens. The JSON lexer's \uXXXX decoding widens the vector, since an escaped override decodes to the real character before display. Both sanitizers now share OidcAuthException.isUnsafeForDisplay, which also strips the Unicode format category (Cf) plus the explicit bidi/BOM set. The predicate uses hex int literals rather than char escapes, keeping the source strictly ASCII so the file carries none of the characters it guards against. M3 - httpTokenProvider forced a successful sign-in before build(). createLineSender eagerly rebuilt the pending request when a provider was set, calling getToken() at build time. With the documented .httpTokenProvider(auth::getTokenSilently), that threw unless the caller had already signed in, so the natural "construct the sender, sign in, then send" ordering was impossible. The first token pull is now deferred off the build path to the first row (table()). The provider is wired at build but not queried; the initial request is stamped with a token when the first row starts, and the pending flag is cleared only after the pull succeeds, so a not-yet-signed-in provider that throws leaves the stamp pending for a retry. The Sender.httpTokenProvider Javadoc now states the provider is not called at build time. Tests: new bidi/zero-width cases for the challenge fields and the oauth error message (fed as JSON \uXXXX escapes so they exercise the decode-then-display path), and a new LineHttpSenderTokenProviderTest covering the deferred pull and a lazily signing-in provider. Each test was confirmed to fail without its fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three robustness fixes in the OIDC device-flow parser. m3 - A JSON null arrives from the lexer as the literal "null". The token and device parsers used putValue, which stored it verbatim, so "access_token": null became the 4-char token "null" and "error": null was read as an OAuth error code "null". Merged putValue with SettingsDiscoveryParser's null-guarding putNonNull into one shared helper used by all three parsers, so a JSON null is treated as absent everywhere. m4 - Endpoint.parse did not range-check the port, so host:0, host:-1 and host:99999 parsed and flowed to the transport. Added a 1..65535 guard that rejects them with a clear message. m5 - The token-response expires_in was not clamped, unlike the device-auth value, so a TTL near Integer.MAX_VALUE cached the token for ~68 years. storeTokens now applies the same boundedSeconds clamp (the default for a non-positive value, capped at MAX_EXPIRES_IN_SECONDS). The server still enforces the real expiry; this only bounds how long the client trusts its cached copy. Tests: null access_token and null error are rejected/ignored, out-of-range ports are rejected at build, and a clamped token expiry forces a fresh sign-in (observed via a clock-skew margin set above the clamp). Each test was confirmed to fail without its fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-questdb-client into ia_oidc_device_flow
The post-flush reset() eagerly rebuilt the next request and pulled the provider token via httpTokenProvider.getToken() after the current batch had already been sent and accepted. If that pull threw (e.g. OidcDeviceAuth::getTokenSilently when a silent refresh fails) it turned an already-successful flush into a thrown exception and left the shared Request half-built (contentStart == -1, no withContent()), so the next row's data went into the header region - a malformed request, lost rows and a permanently corrupted sender. Route every request's token pull through the same deferred, retriable path the initial request already used: newRequest() no longer pulls the provider token (it marks the request token-pending and builds a valid token-less request), and stampTokenIfPending() pulls it lazily when the first row of a request starts. A failed pull leaves the flag set and the sender untouched, so the next row re-runs the stamp and fully rebuilds the request. Per-request token rotation is unchanged. Rename isInitialTokenPending/stampInitialTokenIfPending to isTokenPending/stampTokenIfPending since the deferral now covers every request, and stamp the token in putRawMessage() too. Add a regression test that fails at the first, successful flush without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
getToken() and getTokenSilently() were both synchronized on the instance
monitor, and getToken() holds it for the entire interactive device flow -
up to the device-code lifetime, clamped to one hour. A long-lived Sender
wired with httpTokenProvider(auth::getTokenSilently) therefore stalled on
the flush path for up to an hour whenever another thread ran an
interactive sign-in (e.g. a re-auth after the refresh token died). The
javadoc claimed the opposite ("safe on a request/flush path").
Replace the synchronized methods with a ReentrantLock. getToken() and
clearCache() still acquire it blocking, but getTokenSilently() now uses
tryLock() and fails fast with an OidcAuthException instead of waiting:
while a sign-in is in progress there is no token to serve anyway, so the
caller gets a prompt, retriable exception rather than a wedged flush. The
interactive flow still holds the lock for its whole duration and close()
still sets the volatile cancellation flag before acquiring the lock, so
the no-use-after-free guarantee is unchanged.
Correct the class and getTokenSilently() javadocs, and add a regression
test that fails (getTokenSilently blocks ~10s behind an in-flight
sign-in) without the fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
isUnsafeForDisplay() inspected one UTF-16 code unit at a time, so a supplementary-plane (>= U+10000) format or control character - an invisible U+E00xx "tag" char, for instance - arrived as a surrogate pair whose halves are each neither a control nor category Cf and so passed the filter unstripped. Because the JSON lexer reassembles such 😀-style escapes, a hostile or man-in-the-middled identity provider could smuggle invisible/spoofing characters into a user_code, a verification_uri, or an error_description and on into the terminal prompt and exception messages. Judge a Unicode code point instead: isUnsafeForDisplay() takes an int, and both sanitizers (putSanitized for exception messages, sanitizeForDisplay for the prompt) walk the text by code point with Character.codePointAt/charCount, so Character.getType classifies a supplementary char as one character. A legitimate astral character (an emoji) is still preserved. Make the assertNoUnsafeDisplayChars test helper code-point-aware too - it shared the blind spot - and add a regression test that fails (the U+E0001 tag char survives) without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pollOnce() checked for a token before the HTTP status and the OAuth error field, so a response that carried a token alongside an error, or under a non-2xx status, was cached as a valid grant. tryRefresh() had the same flaw: it accepted the refreshed token on token presence alone. Both contradict RFC 6749 - 5.1 makes a grant a 2xx response carrying a token, and 5.2 says an error response must not be treated as a grant. Handle the OAuth error first in pollOnce(), so a token smuggled alongside an error never counts, and accept a token only when the status is 2xx; a token under a non-2xx status goes to the transport- error budget instead of being trusted. Guard tryRefresh() the same way: cache the refreshed token only from a clean 2xx response with no error, otherwise fall back to the interactive flow. The happy path and the existing pending/slow_down/access_denied/empty- body outcomes are unchanged. Add regression tests for a token alongside an error, a token under a non-2xx status, and a refresh that smuggles a token with an error - each fails without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
newRequest() passed the token from httpTokenProvider.getToken() straight to authToken(), which does not null- or empty-check it. A provider that returned null, "", or whitespace therefore produced a malformed "Authorization: Bearer " header that the server only answered with a 401 far from the cause - no client-side error at all. The HttpTokenProvider contract forbids such a return but nothing enforced it, and httpToken() already rejects a blank token, so the provider path was the weaker spot. Validate the pulled token with Chars.isBlank (as httpToken does) and throw a clear LineSenderException instead. The check sits inside the deferred pull, so a rejected token leaves the stamp pending and the next row retries cleanly, just like a throwing provider does. OidcDeviceAuth never returns a blank token, so this guards custom providers. Add tests that a null, an empty, and a whitespace-only provider token is rejected at first use - each fails without the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLexer.getCharSequence rescanned every decoded value and name from the start to look for a backslash, even though the parse loop already detects one when it sets ignoreNext. Record that in a sawEscape flag (carried across parse() fragments) and resolve escapes only when it is set, so the common no-escape value returns the assembled sink without a second pass. OidcDeviceAuth.Endpoint.parse now rejects a host that contains control characters or whitespace - a smuggled CR/LF would otherwise flow into the outbound Host header. Add the tests these paths lacked: a cross-fragment escape; the lexer's lenient and exotic escape arms (surrogate pairs, \b/\f, unknown and malformed escapes, lone surrogates); the version-probe settings parser reading an escaped key through unescape; HTTP-token-provider rejection for UDP and WebSocket (not just TCP); and the control-character host cases above. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the issuer feature from py-questdb-client (PR #133) onto OidcDeviceAuth, so the device flow keeps working against servers that do not advertise their device-authorization endpoint, and so the device code and refresh token are only sent where the caller pins. The issuer plays three roles: - Discovery fallback: when /settings omits the device (and/or token) endpoint, fromQuestDB(url, issuer) reads it from the issuer's .well-known/openid-configuration document. The discovery origin comes only from the out-of-band issuer (or an explicit discoveryUrl), never from a /settings-supplied value, so a tampered /settings cannot redirect discovery. Without a pin, discovery is refused. - Plaintext-channel pin: a /settings response fetched over plaintext http to a non-loopback host (only reachable with allowInsecureTransport) cannot route credentials to its advertised endpoints without a pin. - Endpoint-origin pin: validateEndpointOrigins, enforced in Builder.build() on every construction path, requires the token and device endpoints to share one origin (RFC 8628 co-location) and, when an issuer is set, to belong to it. Config surface: Builder.issuer(...); new fromQuestDB overloads (url, issuer), (url, issuer, allowInsecure), and a 5-arg master taking issuer, discoveryUrl and a TLS config. Tradeoffs: - The co-location check makes the token and device endpoints share an origin. testPersistentTransportFailureDuringPollingAborts simulated an unreachable token endpoint with a dead second port; it now uses a new MockOidcServer.dropConnection() against a co-located path. - The origin pin compares scheme/host/port and ignores the path, so an identity provider that hosts its endpoints on a different origin than its issuer must be configured without an issuer. This matches the Python client. - allowInsecureTransport still relaxes the identity provider endpoints too (unchanged); the Python client always forces https/loopback for the IdP. Left as-is to avoid changing settled transport behavior. Adds 7 tests and updates the README OIDC section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Endpoint.parse now rejects control characters and whitespace anywhere in the url before splitting it. The host was already checked, but the path was not, so a tampered /settings or discovery document could carry a CR/LF in an endpoint path that the JSON lexer decodes and postForm writes verbatim onto the request line via .url(endpoint.path) - a header-injection / request-smuggling vector that the origin pin (which compares scheme/host/port only) does not catch. Validating the whole url up front also keeps it safe to echo in the parse error messages. fromQuestDB now derives the pin origin from a caller-supplied discoveryUrl when no issuer was resolved. Previously a discoveryUrl pin only took effect when discovery actually ran (an endpoint missing from /settings); when /settings advertised both endpoints the discovery branch was skipped and validateEndpointOrigins ran with a null issuer, so a compromised server could advertise both endpoints at an attacker origin and slip past the pin. The discoveryUrl pin now behaves like the issuer pin on every construction path. Adds regression tests for both: a CR/LF-injected advertised endpoint, path and query cases in Endpoint.parse, and discoveryUrl-pin accept and reject against on- and off-origin endpoints. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Endpoint.parse already rejected control characters and whitespace in the url, which kept it safe to echo into the exception messages once it passed validation. That scan did not catch bidi, zero-width or other format characters (U+202E, U+200B, U+FEFF, the Cf category, and the supplementary-plane tag characters), so a tampered /settings or discovery endpoint url could still smuggle one into an OidcAuthException message and reorder, hide or forge the log line it lands in. The url scan now runs per code point and also rejects anything isUnsafeForDisplay flags, so an OIDC url may carry no control, whitespace or display-unsafe character. Every raw url echo in Endpoint.parse, requireSecureTransport and fromQuestDB is therefore safe on screen as well as on the wire, and the rejection message sanitizes the url it reports. Adds a regression test covering a right-to-left override, a zero-width space, the BOM and a supplementary-plane tag character. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
isUnsafeForDisplay now treats an unpaired UTF-16 surrogate as unsafe, so a lone surrogate half - which JsonLexer emits verbatim for a single backslash-u-XXXX escape and which codePointAt surfaces as a SURROGATE code point - is stripped from a user_code, verification_uri or error string before it reaches a terminal or a log line. A valid high+low pair is still reassembled by codePointAt and judged on its real category, so a legitimate emoji survives. The method comment is corrected too: codePointAt in the callers reassembles pairs, not the lexer. close() and the class Javadoc no longer claim an in-flight sign-in is cancelled "promptly". The cancel flag is observed between polls (within about 100ms) but a poll request already in flight is not interrupted, so close() can take up to one HTTP request timeout to return - still far short of the device-code lifetime. The docs now say so. Adds tests: lone high and low surrogates are stripped from the device challenge while an emoji survives; and the private isLoopbackHost classifier (which gates the plaintext-channel MITM pin) is pinned for localhost and the 127.0.0.0/8 block, and against non-loopback and spoofing hosts such as 127.evil.com, localhost.evil.com, 127.1 and 127.0.0.256. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The poll loop now clamps the slow_down-inflated interval to the same MAX_POLL_INTERVAL_SECONDS cap the initial interval already respects, so repeated slow_down responses from the identity provider cannot grow the wait without bound. The device-authorization, token and well-known parsers now reset their current field to FIELD_NONE after each value, matching SettingsDiscoveryParser. The parsers are not currently confusable - in well-formed JSON a name event always sets the field before the next value, array elements arrive as EVT_ARRAY_VALUE, and nested values are filtered by the depth check - so this is a defensive consistency fix that removes a latent field-confusion foot-gun rather than a behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLexer.unescape no longer re-scans the value from the start to re-find the backslash the lexer already flagged via hasEscape; it walks the value once, copying plain characters and resolving escapes in place. That drops the now-dead "no escapes" early return and the separate prefix copy, so an escaped value is traversed about twice (decode then unescape) instead of three times. parseHex4 looks the hex digit up in the shared Numbers.hexNumbers table instead of Character.digit, keeping the same -1-on-non-hex contract. All of this is on the cold error/discovery/auth parse path, never on ingestion. Reorders pollForToken ahead of pollOnce so the private methods stay in alphabetical order; no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 4 MiB response-body cap (MAX_RESPONSE_BODY_BYTES) that bounds the OIDC device flow against a hostile or MITM'd server streaming an endless body had no test coverage on the parseBody path. Add an oversizedJson() mode to MockOidcServer that streams a chunked, mostly-whitespace body past the cap, and a test that drives discovery against it and asserts the bounded read aborts with the size-limit error - which also confirms the token-bearing body never reaches the message. The body is whitespace so the lexer keeps consuming until the byte cap trips, instead of hitting its per-value length limit first. Verified both ways: the test passes with the 4 MiB cap and fails when the cap is disabled, where the full body is read and parsing fails with "Unterminated object" instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three small fixes to the OIDC device authorization flow, all in OidcDeviceAuth: - runDeviceFlow now rejects a non-2xx device authorization response. Previously it trusted any body that carried device_code/user_code/ verification_uri and no OAuth error, so a non-2xx response would prompt the user and start polling. It now applies the same 2xx gate pollOnce and tryRefresh already use before trusting a body. - pollForToken checks the device-code deadline at the top of the loop and never sleeps past it, so an expiry that elapses during a sleep times out promptly instead of after one more wasted poll and up to a full extra poll interval. - tryRefresh drops an unreachable branch that rethrew on an OAuth error. postForm only throws on a parse failure here, and a real OAuth error arrives in tokenParser.error (handled by the hasRequiredToken check), so the branch was dead. No behaviour change. Add testNonSuccessDeviceAuthorizationResponseRejected covering the new 2xx gate; it fails without the check (the 403 is accepted, the user is prompted, and polling fails later instead). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
[PR Coverage check]😍 pass : 832 / 884 (94.12%) file detail
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds interactive OIDC sign-in to the Java client using the OAuth 2.0 Device Authorization Grant (RFC 8628). A process with no local browser — a remote notebook kernel, a container, a headless job — can sign a human in against QuestDB Enterprise: the user authorizes on any device (laptop or phone) while the process only makes outbound calls to the identity provider.
On first use it prints a verification URL and a short code; once the user authorizes, the token is cached in memory and refreshed silently on later calls.
What's new
OidcDeviceAuth(io.questdb.client.cutlass.auth) — runs the flow and owns the token:OidcDeviceAuth.fromQuestDB(url)discovers the client id, scope and IdP endpoints from the server's unauthenticated/settings;OidcDeviceAuth.fromQuestDB(url, issuer)additionally pins the identity provider (see Discovery and trust below);OidcDeviceAuth.builder()configures the identity provider explicitly.getToken()signs in interactively on first use, then serves a cached token and refreshes it silently;getTokenSilently()never prompts (safe on a request/flush path);getAuthorizationHeaderValue()returns the fullBearer …value;close()cancels an in-flight sign-in. Token state lives in memory only and does not survive a process restart.Senderintegration — newHttpTokenProviderinterface andSender.builder(...).httpTokenProvider(auth::getTokenSilently). The sender pulls a freshly refreshed token on every request, so a long-lived sender keeps working as the token rotates — unlike a fixedhttpToken(...), which is captured once and eventually starts returning 401s. Mutually exclusive withhttpToken/httpUsernamePassword, and HTTP-transport only (rejected for TCP/UDP/WebSocket).DeviceCodePrompt/DeviceAuthorizationChallenge— how the verification URL and user code are shown. The default prints toSystem.out; supply your own to render a clickable link or a QR code, e.g. in a notebook.The token can be presented to QuestDB over any auth path the server already validates:
Authorization: Bearer <token>._ssowith the token as the password (requiresacl.oidc.pg.token.as.password.enabled=trueon the server).Discovery and trust
fromQuestDB(...)takes the IdP endpoints from the server's unauthenticated/settings, so by default it trusts that server to designate where the user signs in: a spoofed, compromised, or man-in-the-middled server could otherwise redirect the sign-in — and the long-lived refresh token — to an attacker-controlled identity provider. An optionalissuerargument addresses this, and also covers servers that do not advertise a device-authorization endpoint. It plays three roles:.well-knowndiscovery fallback. Current servers do not advertise the device-authorization endpoint. When it (and/or the token endpoint) is missing,fromQuestDB(url, issuer)reads it from the issuer's{issuer}/.well-known/openid-configurationdocument. The discovery origin comes only from the caller-suppliedissuer(or an explicitdiscoveryUrl), never from a/settings-supplied value, so a tampered/settingscannot choose where discovery — and the credential POSTs it resolves — are aimed. Without an issuer, discovery is refused rather than guessed.validateEndpointOrigins, enforced on every construction path (discovery and the explicitbuilder()), requires the token and device-authorization endpoints to share one origin (RFC 8628 co-locates them), and — when an issuer is set — to belong to that issuer's origin, so a compromised-but-reachable server cannot redirect the credentials off the trusted issuer./settingsresponse fetched over plaintexthttpto a non-loopback host (only reachable withallowInsecureTransport) is MITM-able, so its advertised endpoints are not trusted to route credentials without an issuer/discoveryUrlpin.Without an
issuer, the behaviour against anhttpsserver that advertises its endpoints is unchanged: that server is trusted, as before.Security
httpsis required by default for both the QuestDB server and the IdP endpoints;httpis rejected unless the caller opts in withallowInsecureTransport(true)(local development only).expires_in/intervalare clamped; a persistent transport failure while polling aborts on a small error budget instead of running silently to the device-code timeout; IPv6-literal hosts are rejected rather than mis-parsed.Supporting changes
JsonLexernow resolves JSON string escape sequences (\",\\,\/,\b \f \n \r \t,\uXXXX; lenient on malformed input), so string values arrive fully decoded.Responsegainsrecv(int timeout)for deadline-bounded body reads.AbstractLineHttpSenderplumbs the token provider through and rebuilds the pending request so the very first send already carries a provider-sourced token.Tradeoffs and limitations
builder(). This matches the Python client.MockOidcServer.dropConnection()).allowInsecureTransportstill relaxes the IdP endpoints as well as the QuestDB link; it is not narrowed to the server link only (unlike the Python client, which always holds the IdP to https/loopback). The plaintext-channel pin mitigates the credential-routing risk this leaves, but the final hop to the IdP can still be plaintext when this flag is set.Tests & docs
OidcDeviceAuthTest+MockOidcServer,SenderBuilderErrorApiTest,JsonLexerTest; a runnableOidcDeviceFlowExample; and a README "OIDC Sign-In (Device Flow)" section. The issuer feature adds tests for:.well-knowndiscovery via an issuer, and via an explicitdiscoveryUrl