gateway: custom-domain app lookup does an uncached DNS query on every connection, adding ~0.8–3s to the TLS handshake #736


## Problem

For a **custom domain** (an SNI that isn't a `<app-id>.<base-domain>` subdomain), the gateway looks up the target app via DNS on **every connection**, **before** the TLS handshake completes — so the delay shows up as handshake latency. The lookup is slow because it:

1. Builds a **new DNS resolver every call** (`AsyncResolver::tokio_from_system_conf()`), so nothing is cached between connections and the record TTL is never used.
2. Runs the primary and legacy TXT lookups with `tokio::join!` and **waits for both**, putting a slow/negative legacy lookup on the critical path.

This is worst when the gateway's resolver is slow — e.g. a CVM on QEMU user-mode (SLIRP) networking, where DNS is forwarded and uncached. Subdomain-routed apps skip the lookup and are unaffected.

## Code

`gateway/src/proxy/tls_passthough.rs`, `resolve_app_address()` (called per connection from `proxy_with_sni()`, before `tls_accept()`):

```rust
let resolver = hickory_resolver::AsyncResolver::tokio_from_system_conf()?;  // (1) new resolver every call
// ...
let (lookup, lookup_legacy) = tokio::join!(   // (2) waits for BOTH; legacy is usually NXDOMAIN
    resolver.txt_lookup(txt_domain),
    resolver.txt_lookup(txt_domain_legacy),
);
```

## Evidence

TLS-handshake time against one gateway, over loopback (no internet RTT), 18 samples each:

| SNI | pre-handshake work | median | max |
|---|---|---:|---:|
| custom domain | DNS lookup + handshake | **820 ms** | **3373 ms** |
| `<app-id>.<base>` subdomain | no DNS | 10 ms | 16 ms |

The ~810 ms median gap (820 ms vs. 10 ms) is entirely the DNS step. Adding the missing legacy TXT record (so the second lookup isn't NXDOMAIN) dropped the median to ~499 ms. That confirms the `join!`-on-both cost, but most of the delay still comes from the per-connection uncached resolver.

## Suggested fix

1. Build the resolver **once** and reuse it (hickory caches by TTL).
2. Cache resolved app-addresses by record TTL so steady-state connections skip DNS.
3. Make the legacy lookup a fallback (only on primary miss), not a `join!` that always waits.

(1)+(2) should bring custom-domain connections down to the subdomain baseline.
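
A minimal sketch of (2) and (3), assuming a per-SNI cache keyed by expiry time. It uses only the standard library: `AddrCache`, `resolve_cached`, and the two lookup closures are hypothetical names for illustration, and the closures stand in for the primary and legacy TXT lookups rather than calling hickory. The real code is async and would need a shared, lock-protected cache.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache of resolved app addresses, keyed by SNI.
/// Each entry stores the address and the instant at which it expires.
struct AddrCache {
    entries: HashMap<String, (String, Instant)>,
}

impl AddrCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the cached address if the entry has not expired at `now`.
    fn get(&self, sni: &str, now: Instant) -> Option<&str> {
        match self.entries.get(sni) {
            Some((addr, expires)) if now < *expires => Some(addr.as_str()),
            _ => None,
        }
    }

    /// Stores `addr` for `sni`, valid for the record's TTL from `now`.
    fn insert(&mut self, sni: &str, addr: String, ttl: Duration, now: Instant) {
        self.entries.insert(sni.to_string(), (addr, now + ttl));
    }
}

/// Resolves `sni` through the cache. On a cache miss it runs the primary
/// lookup and calls the legacy lookup only if the primary returns nothing.
/// Each lookup returns the address together with the record TTL.
fn resolve_cached<P, L>(
    cache: &mut AddrCache,
    sni: &str,
    now: Instant,
    primary: P,
    legacy: L,
) -> Option<String>
where
    P: FnOnce(&str) -> Option<(String, Duration)>,
    L: FnOnce(&str) -> Option<(String, Duration)>,
{
    if let Some(addr) = cache.get(sni, now) {
        return Some(addr.to_string()); // steady state: no DNS at all
    }
    // Legacy is a fallback, so a slow NXDOMAIN there is off the hot path.
    let (addr, ttl) = primary(sni).or_else(|| legacy(sni))?;
    cache.insert(sni, addr.clone(), ttl, now);
    Some(addr)
}
```

Passing `now` explicitly keeps expiry deterministic in tests. This sketch does not cache negative results, so an SNI with neither record would still hit DNS on every connection; whether to add a short negative TTL is a separate decision.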

