Search quality, source prioritization, OutSystems connector, dark mode & indexing perf by rajivml · Pull Request #47 · UiPath/danswer

rajivml · 2026-06-23T12:20:15Z

Rolls up ~3 weeks of search-quality, connector, indexing, and UX work on feature/improve-queries (115 commits) for merge into feature/darwin. All changes have been running in prod (backend vha-204, web vha-103). Brief feature summary below.

Search & answer quality

Reranking (cross-encoder) — two-level gate (RERANK_ENABLED global × Persona.rerank_enabled), unbiased single-query candidates, per-conversation chat toggle. Cluster-config-driven; currently disabled (no GPU).
LLM relevance filter — one-shot listwise call on the main LLM, independently gated; no GPU.
Source prioritization & authoritative citations (layered, global, keyed off PROTECTED_SOURCES = web/sfkbarticles/highspot/outsystems):
- reserved retrieval (SOURCE_RESERVED_RETRIEVAL_SLOTS) so curated sources reach the candidate set;
- diversity promotion (SOURCE_DIVERSITY_RESERVED_SLOTS) + per-source cap (MAX_PROMPT_DOCS_PER_SOURCE) so a chatty source can't monopolize the prompt;
- soft authoritative-sources nudge in the citation prompt;
- verify-then-retain footer (AUTHORITATIVE_CITATION_RETENTION_ENABLED) — surfaces relevant uncited authoritative docs, judged on the matched chunk against the question+answer;
- version-aware docs-link rewrite — resolves docs.uipath.com links to the version the question asks about, else latest.
Versioned-docs dedup + recency/decay documentation.
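The per-source cap mentioned above can be pictured with a small sketch. This is an illustration only, not the branch's code: the function name `cap_per_source` and the `(doc_id, source)` tuple shape are assumptions; only the idea of a `MAX_PROMPT_DOCS_PER_SOURCE`-style limit comes from the summary.

```python
from collections import Counter
from typing import List, Tuple


def cap_per_source(
    ranked: List[Tuple[str, str]], max_per_source: int
) -> List[Tuple[str, str]]:
    """Keep at most `max_per_source` docs per source, preserving rank order.

    `ranked` is a best-first list of (doc_id, source) pairs (hypothetical shape).
    """
    counts: Counter = Counter()
    kept: List[Tuple[str, str]] = []
    for doc_id, source in ranked:
        # Drop a doc once its source has already used up its allowance.
        if counts[source] < max_per_source:
            kept.append((doc_id, source))
            counts[source] += 1
    return kept


ranked_docs = [
    ("s1", "slack"), ("s2", "slack"), ("s3", "slack"),
    ("w1", "web"), ("k1", "sfkbarticles"),
]
capped = cap_per_source(ranked_docs, max_per_source=2)
```

With a cap of 2, the third Slack document is dropped and the web and KB documents move up, which is the "chatty source can't monopolize the prompt" effect.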

Connectors & indexing

OutSystems connector for inside.uipath.com (PDF/file extraction, large-doc hardening, skip-list for pathological pages).
Web connector: opt-in latest-N version tracking for docs.uipath.com, sitemap/prefix scoping, version-expand fallback hardening.
Highspot sync.
Indexing perf/reliability: resumable document-set sync (persisted cursor), sync-concurrency caps + per-source override, Vespa bulk-update batching + KeyError fix, prune-check cadence fix, re-drive of orphaned connector deletions.
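As a rough picture of what "resumable document-set sync (persisted cursor)" can mean, here is a minimal sketch. Everything in it is hypothetical (the function name, the dict standing in for a persisted store, the batch size), and it assumes the document list is returned in a stable order between runs.

```python
from typing import Callable, Dict, List, Sequence


def sync_document_set(
    doc_ids: Sequence[str],
    cursor_store: Dict[str, int],  # stand-in for a DB-persisted cursor
    key: str,
    update_batch: Callable[[List[str]], None],
    batch_size: int = 2,
) -> None:
    """Process doc_ids in batches, resuming from the last persisted offset."""
    start = cursor_store.get(key, 0)
    for offset in range(start, len(doc_ids), batch_size):
        batch = list(doc_ids[offset : offset + batch_size])
        update_batch(batch)
        # Advance the cursor only after the batch succeeded, so a crash
        # replays at most one batch instead of restarting the whole sync.
        cursor_store[key] = offset + len(batch)
    # Sync finished: clear the cursor so the next run starts from the top.
    cursor_store.pop(key, None)
```

If `update_batch` raises partway through, the stored offset still points at the first unprocessed batch, and calling the function again picks up from there.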

Assistants & UX

Opt-out assistant visibility — admin-created assistants appear for all users by default; users can hide per-user (hidden_assistants).
App-wide dark mode (default) + token theming, global light/dark toggle, chat landing redesign + readability overhaul.
Assistants management UX, lazy-loaded sidebar chat history, best-match-first connector search, doc-set picker fixes, responsive admin indexing tabs, persona display_name.

Reliability, auth & ops

User-friendly API errors (no raw SQL/exception leakage), async DB pool_pre_ping/recycle, fix for X-API-Key auth under enforced OIDC.
Chat history pagination, Slack response blocklist.
build-deploy.sh registry-existence guard before apply; Apple-Silicon web cloud-build routing.
Self-managed Velero overlay for weekly Vespa backups; backup/runbook docs.

Docs

Branch design doc (docs/search-quality-reranking-and-recency.md) covering the full reranking → source-prioritization → authoritative-citation pipeline, plus web-deploy / backup runbooks.

🤖 Generated with Claude Code

…d candidates

Incremental, A/B-comparable reranking instead of a single global switch. Two-level gate: rerank runs only when the global master switch RERANK_ENABLED (a GPU-backed model server is deployed) AND the per-assistant opt-in Persona.rerank_enabled are both on. Default off everywhere, so existing assistants and the GPU-free local setup are unchanged and need no GPU.

- Persona.rerank_enabled column (+ migration f6a7b8c9d0e1, server_default false), threaded through upsert/create_update_persona and the persona API models, with a 'Rerank results (beta)' toggle in the assistant editor.
- Single resolver _resolve_skip_rerank() in retrieval_preprocessing is now the one place both chat and Slack decide reranking (Slack passes skip_rerank=None to share it). Legacy ENABLE_RERANKING_* flags kept as a fallback.
- RERANK_MODEL_NAME makes the cross-encoder env-selectable (prod can pick a stronger model, e.g. BAAI/bge-reranker-v2-m3); model server warms it when RERANK_ENABLED.
- Retrieval split: when reranking is on, skip the two-query source-prioritization flow (it normalizes a narrow source-filtered set independently, inflating those scores and polluting the rerank candidate window) and run a single all-sources query; when off, the legacy prioritized flow is unchanged. Driven by prioritize_sources=query.skip_rerank.
- Tests: global x per-assistant resolver matrix; single-vs-two-query split.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
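The two-level gate can be sketched as a small pure function. This is not the actual `_resolve_skip_rerank()` from the branch: the name `resolve_skip_rerank`, its parameters, and the rule that an explicit per-request value overrides the assistant setting are assumptions made for illustration, and the legacy ENABLE_RERANKING_* fallback is left out.

```python
from typing import Optional


def resolve_skip_rerank(
    global_enabled: bool,
    persona_enabled: bool,
    request_skip_rerank: Optional[bool] = None,
) -> bool:
    """Return True when reranking should be skipped for this query."""
    # Global master switch off: nothing downstream can turn reranking on.
    if not global_enabled:
        return True
    # Assumption: an explicit per-request value wins over the assistant
    # setting. Callers that pass None fall through to the assistant opt-in.
    if request_skip_rerank is not None:
        return request_skip_rerank
    return not persona_enabled
```

Keeping the decision in one function is what makes the "global x per-assistant resolver matrix" test cheap to write: every combination is a single call.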

Optional kustomize component that pins the inference-model-server onto a GPU node pool (nodeSelector agentpool=gpupool, toleration sku=gpu:NoSchedule, nvidia.com/gpu: 1) and serves the cross-encoder reranker (RERANK_MODEL_NAME, default BAAI/bge-reranker-v2-m3) alongside the embedding + intent models. Also sets real cpu/mem requests+limits (base leaves them empty -> eviction-prone). Opt in from the prod overlay's components: and set RERANK_ENABLED=true in env.properties. The existing model-server image already bundles CUDA torch, so no rebuild. Local omits the component and runs GPU-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Captures the verified query-path analysis (chat + Slack), the rerank flow and its corrected mental model, the recency decay math + levers (incl. the dead 'auto' auto-detect finding), the source-prioritization normalization bias and its two-path fix, the incremental per-assistant rollout, the GPU sizing/plan, the implementation map, and how to enable in prod. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…reranker client

- LLM relevance filter is now a single listwise call on the MAIN llm (llm_eval_chunks_listwise + LISTWISE_CHUNK_FILTER_PROMPT, fails open), gated independently of reranking by LLM_RELEVANCE_FILTER_ENABLED x per-assistant llm_relevance_filter (resolver _resolve_skip_llm_chunk_filter). No GPU.
- Reranker served by Hugging Face TEI on CPU when RERANK_SERVER_URL is set (CrossEncoderEnsembleModel /rerank path); model server skips loading the cross-encoder in that case. Replaces the GPU plan.
- _query_vespa simplified to a single all-sources query (removed the two-query source-prioritization union and its score-inflation bias).
- Source diversity moved to final selection: ensure_source_diversity in doc_pruning reserves up to SOURCE_DIVERSITY_RESERVED_SLOTS slots for PROTECTED_SOURCES, so KB/web aren't crowded out. Always-on, global, no per-assistant knob.
- Chat per-conversation toggles (use_reranking / use_relevance_filter) threaded via SearchTool -> SearchRequest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
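One way to read "reserves up to SOURCE_DIVERSITY_RESERVED_SLOTS slots for PROTECTED_SOURCES" is sketched below. It is an assumption-laden illustration, not the branch's `ensure_source_diversity`: the `Doc` type, the function name, and the eviction rule (drop the lowest-ranked unprotected document to make room) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str


def ensure_source_diversity_sketch(
    ranked: List[Doc], limit: int, protected: Set[str], reserved_slots: int
) -> List[Doc]:
    """Select `limit` docs from a best-first list, promoting protected sources."""
    selected = list(ranked[:limit])
    if reserved_slots <= 0:
        return selected  # feature disabled: plain top-N
    have = sum(1 for d in selected if d.source in protected)
    needed = min(reserved_slots, limit) - have
    # Protected docs that missed the cut, still in best-first order.
    for promoted in (d for d in ranked[limit:] if d.source in protected):
        if needed <= 0:
            break
        # Evict the lowest-ranked unprotected doc to make room.
        for i in range(len(selected) - 1, -1, -1):
            if selected[i].source not in protected:
                del selected[i]
                selected.append(promoted)
                needed -= 1
                break
        else:
            break  # nothing left to evict
    return selected


ranked = [Doc("a", "slack"), Doc("b", "slack"), Doc("c", "slack"),
          Doc("d", "web"), Doc("e", "web")]
picked = ensure_source_diversity_sketch(ranked, limit=3, protected={"web"}, reserved_slots=2)
```

Because promotion happens at final selection rather than at query time, retrieval scores are left untouched, which fits the commit's stated goal of removing the score-inflation bias.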

…nput bar Two per-conversation switches (default off) wired through sendMessage -> CreateChatMessageRequest (use_reranking / use_relevance_filter), independent of the assistant's own settings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- optional/tei-rerank: Hugging Face TEI on CPU (bge-reranker-v2-m3, fp32), Deployment + Service, health probes, model-cache volume.
- prod overlay includes it + sets RERANK_ENABLED / RERANK_SERVER_URL / LLM_RELEVANCE_FILTER_ENABLED. Local sets RERANK_ENABLED (no RERANK_SERVER_URL) so the model server loads the reranker in-process, with no GPU anywhere.
- Removed the optional/gpu-inference component.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…p updates

- Unit: relevance-filter gating matrix, listwise parser, source-diversity promotion/caps/disable (replaces the removed two-query test).
- Integration: TEI rerank transport (mocked), real CPU cross-encoder reordering (MiniLM), filter_chunks with a stub LLM.
- Docs updated to the final design (TEI-on-CPU, source diversity at selection, two assistant knobs); CONTRIBUTING documents local in-process reranking.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The TEI CPU image is ONNX-Runtime-based and bge-reranker-v2-m3 ships no ONNX weights, so the stock image 404s on onnx/model.onnx. Our own image exports the model to ONNX at build time (HF Optimum; pinned optimum 1.23.3 + torch 2.2.2 + numpy<2 for the >2GB external-data path) and bakes it at /model — no runtime download, no re-download on restart, no HF dependency. Deployment uses the ACR image with --model-id /model (dropped the emptyDir that shadowed /data and the unneeded istio annotation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Built + deployed from feature/improve-queries; api-server self-migrated persona.rerank_enabled on rollout. Validated live: TEI reranker healthy and scoring correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CPU serving was validated as non-viable (24-98s for 15 chunks on prod; the long blocking inference also starved /health and got the pod liveness-killed). Move reranking to a T4 GPU node pool (Standard_NC4as_T4_v3, tainted gpu=true:NoSchedule) where the same bge-reranker-v2-m3 reranks in ~0.6-3.2s (45ms for short inputs).

- Use the UPSTREAM TEI GPU image directly (ghcr turing-1.5) instead of building our own: TEI's GPU backend is Candle+safetensors, which the model ships, so the ONNX-export/custom-image dance (a CPU-runtime-only constraint) is gone. Deleted the custom Dockerfile.
- Take the pod out of the istio mesh (sidecar.istio.io/inject: false): leaf service, PERMISSIVE mTLS so the api-server still reaches it, and an injected sidecar isn't up during the init phase (so the prefetch init container couldn't reach the network).
- Prefetch the model into the PVC with the Python HF client in an init container: TEI's Rust hf-hub client fails on HF's redirect with 'relative URL without a base'; the Python client handles it. TEI loads offline from /data/model.

Cluster-side (not in repo): added a gpu=true:NoSchedule toleration to the nvidia-device-plugin-daemonset so it schedules on the tainted GPU node and advertises nvidia.com/gpu.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

PROTECTED_SOURCES and SOURCE_DIVERSITY_RESERVED_SLOTS were running on code defaults (web,sfkbarticles / 2), so the always-on diversity logic was active but invisible in the configmap. Pin them to the current defaults for visibility and tunability — no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

These api keys are service credentials for automation and intentionally don't map to a User. Enabling OIDC (AUTH_TYPE=oidc) flipped DISABLE_AUTH off, so current_user started 403'ing api-key requests (no session, and the keys don't resolve to a user) — bouncing automation into the SSO login flow. current_user now authorizes a request carrying a valid X-API-Key as an anonymous service caller (user=None, which endpoints already handle). Browser requests without a session still 403 (SSO gate intact), and a key alone does not grant admin (current_admin_user still requires an admin user). Adds request_has_valid_api_key() mirroring validate_api_key's lookup+cache, plus integration tests locking both the SSO and api-key flows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
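The authorization rules described above can be summarized in a framework-free sketch. The real `current_user` / `current_admin_user` are request-scoped dependencies; the signatures, the `User` dataclass, and the `Forbidden` exception below are simplified stand-ins invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


@dataclass
class User:
    email: str
    is_admin: bool = False


def current_user(session_user: Optional[User], has_valid_api_key: bool) -> Optional[User]:
    # A logged-in browser session always resolves to its user.
    if session_user is not None:
        return session_user
    # A valid X-API-Key authorizes the request as an anonymous service caller.
    if has_valid_api_key:
        return None
    # No session and no key: the SSO gate stays closed.
    raise Forbidden("authentication required")


def current_admin_user(session_user: Optional[User], has_valid_api_key: bool) -> User:
    user = current_user(session_user, has_valid_api_key)
    # A key alone never grants admin: an actual admin user is still required.
    if user is None or not user.is_admin:
        raise Forbidden("admin user required")
    return user
```

The important property is the asymmetry: the key opens ordinary endpoints as `user=None` but can never satisfy the admin check.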

Deployed + validated live: GET /api/persona with a valid x-api-key returns 200; without a key still 403s (SSO gate intact). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Make design-system color tokens CSS-variable-driven so they flip under a `.dark` ancestor (light values unchanged via fallbacks). Dark is now the app-wide default via a no-flash init script on <html>; users opt into light via the sidebar toggle (persisted app-wide to localStorage).
- Typography: replace Inter with IBM Plex Sans (body) + Fraunces (display); empty-state headline uses the display font with a staggered entrance.
- Chat polish: input bar elevated with an accent focus-glow (fixes the previously-broken focus ring); subtle atmosphere glow on the chat canvas; reusable da-fade-up motion (reduced-motion guarded).
- Assistant picker ("Choose Assistant") gains a live search box (name + description) with an empty state.
- Consistency sweep: map raw neutral colors to semantic tokens across shared components + search/admin so they flip correctly in dark.

Followup judgment-call stragglers + per-page UX land in a separate commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Add color-scheme: dark to the .dark root so the browser renders native form controls (checkboxes, date pickers, default input backgrounds) and scrollbars in dark. This fixes the glaring white checkboxes (assistant form Tools, knowledge-set list) and any browser-default white input backgrounds.
- Brand the checkbox/radio accent-color to the app accent.
- Give the chat Filters knowledge-set search input an explicit dark surface (bg-background + token text/placeholder); it had no bg class and rendered a bright white box.

Followup to 8c5f4f8 (kept separate so it can be reverted independently).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Deployed + verified live: web pod Running 2/2 on vha-79, build-deploy verify shows live==manifest, and /auth/login serves the dark-mode no-flash init. Build note: the local Mac amd64 web build SIGSEGVs (next build / musl under Rosetta), so vha-79 was built natively via 'az acr build' on darwinacr and copied into sfbrdevhelmweacr (docker pull/tag/push) — same digest d53f076e. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Captures the working web-image deploy flow: local Mac next build SIGSEGVs under amd64 emulation (even with Rosetta), so build natively via az acr build on darwinacr, copy into sfbrdevhelmweacr, bump tag, apply, verify. Includes the Contributor/PIM requirement and the optional service-principal path for CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The toggle previously lived only in the chat sidebar, so admin/assistants and other pages had no way to switch themes. Add a Light/Dark item to UserDropdown (the avatar menu present app-wide), reusing the same darwin-theme localStorage + .dark-on-<html> mechanism. Verified: flips both ways, label reflects state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

next build's eslint (rules-of-hooks) failed the prod build because the darkMode useState/useEffect sat after the `if (!combinedSettings) return null` early return. Hooks must be unconditional — moved them above it. (Dev mode didn't catch this; only the production build's lint does.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Deployed + verified: web pod Running 2/2 on vha-80, verify shows live==manifest. Built via az acr build on darwinacr -> transfer to sfbrdevhelmweacr (digest a1f3167). Note: Docker Hub anon pull rate-limit on the node:20-alpine base flaked several ACR runs; succeeded once the window freed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Single source of truth = RERANK_ENABLED / LLM_RELEVANCE_FILTER_ENABLED env:

- Backend surfaces the effective global flags via /settings (load_settings), mirroring the existing chat_file_max_size_mb env-injection (relevance also respects the DISABLE_LLM_CHUNK_FILTER kill-switch).
- Frontend hides the per-conversation (ChatInputBar) and per-assistant (AssistantEditor) rerank/relevance toggles when disabled cluster-wide.
- prod overlay: RERANK_ENABLED=false, LLM_RELEVANCE_FILTER_ENABLED=false; drop the tei-rerank component so the GPU node pool can be removed.

Code is intact: flip the env (+ re-add the tei-rerank component and a GPU node pool) to re-enable. No code change needed to toggle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
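A minimal sketch of how the effective cluster-level flags might be derived from the environment follows. The helper names and the "true"-string parsing are assumptions; only the three env variable names and the two output field names appear in the commits.

```python
from typing import Dict, Mapping


def _env_flag(env: Mapping[str, str], name: str) -> bool:
    """Treat only a case-insensitive 'true' as enabled (assumed convention)."""
    return env.get(name, "").strip().lower() == "true"


def effective_search_flags(env: Mapping[str, str]) -> Dict[str, bool]:
    """Compute the flags a settings endpoint could expose to the frontend."""
    return {
        "rerank_enabled": _env_flag(env, "RERANK_ENABLED"),
        # The relevance filter also honours the kill-switch.
        "llm_relevance_filter_enabled": (
            _env_flag(env, "LLM_RELEVANCE_FILTER_ENABLED")
            and not _env_flag(env, "DISABLE_LLM_CHUNK_FILTER")
        ),
    }
```

The frontend then only needs to read these two booleans to decide whether to render the per-conversation and per-assistant toggles.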

…lags) Deployed + verified: load_settings() returns rerank_enabled=False, llm_relevance_filter_enabled=False — the cluster-level flags the chat + assistant UIs read to hide the toggles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Deployed + verified: web Running 2/2 on vha-81; live /api/settings returns rerank_enabled=false / llm_relevance_filter_enabled=false, so the chat + assistant toggles are hidden. GPU pool (gput4) scaled to 0 — no GPU VM running. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Standalone prod overlay (k8s/overlays/prod-velero), applied separately from the app like prod-vespa. Backs up only the backup=vespa PVCs (Vespa index + configserver state) weekly, ttl 504h (last 3 retained).

- reuses SP darwinvelero (155fae3c), which already holds Contributor on the backup/node/darwin RGs -> no Owner/UAA role assignment needed
- BSL reuses the darwinaksbackup SA; disk snapshots go to the unlocked node RG
- alerts via the existing robusta Prometheus stack: PrometheusRule + ServiceMonitor (release: robusta), incl. a no-recent-successful-backup staleness alert that catches silent failure
- notifier CronJob posts per-run success/failure to #darwin-devs via the Slack bot token
- SP secret + bot token sourced from gitignored files (templates committed)

Replaces the prior self-managed Velero that silently failed for ~a year on an expired SP secret.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sets llm_relevance_filter=false for the 3 seeded personas (Darwin, GPT, Paraphrase) so the on-boot upsert_persona() reseed doesn't re-enable the filter. A/B eval over 120 questions showed the filter yields no measurable answer-quality gain (20/20/80 A/B/tie, p~1.0) and discarded all chunks in 14% of cases. Global LLM_RELEVANCE_FILTER_ENABLED=false already gates it off; this keeps the per-assistant seed config consistent and durable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
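The commit does not say which test produced p~1.0. As a sanity check, assuming a two-sided exact sign test that drops the 80 ties, 20 wins against 20 wins gives the same figure:

```python
from math import comb


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test on non-tied pairs (ties are discarded)."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins for one side under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # Double for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)


p_value = sign_test_p(20, 20)  # 20/20/80 A/B/tie from the eval
```

A perfectly even split is the least surprising outcome under the null hypothesis, so the p-value hits the cap of 1.0, consistent with "no measurable answer-quality gain".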

Runbook for the managed Azure Backup-for-AKS approach, kept for reference. The cluster uses self-managed Velero instead (k8s/overlays/prod-velero) because managed AKS Backup needs role assignments / Trusted Access that require Owner/UAA, which isn't available; this documents that path + the [OWNER] hand-off if it's ever pursued. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Document starting Postgres + Redis together under the -p danswer-stack project (shared danswer-stack_default network), run Vespa via the manual docker run on that same network (the compose `index` service is unreliable locally), add Redis ping/stop + no-auth notes, and set REDIS_HOST=localhost for the host-run backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ings gating) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…essages

Connector deletion runs as the event-driven cleanup_connector_credential_pair_task on the non-durable Redis broker. If Redis or the celery worker restarts while the task is queued, the broker message is lost but the task_queue_jobs row stays PENDING forever: the connector shows "Deleting" indefinitely and nothing re-runs it (deletion, unlike sync/prune, is never periodically rescheduled), while the delete API's dedup guard blocks resubmission. We had 5 connectors stuck Deleting since Mar–May for exactly this reason.

Adds a periodic celery-beat task (check_for_stuck_deletion_tasks, every 30m) that re-enqueues any cleanup task whose latest task_queue_jobs row has been non-terminal past JOB_TIMEOUT (db.tasks.get_stuck_deletion_cc_ids). The cleanup task's per-cc-pair advisory lock makes a re-enqueue a no-op if a deletion is genuinely still running, and the fresh row stays live for JOB_TIMEOUT, so this self-throttles to one re-drive per cc-pair per window. Also recovers the existing stuck connectors on first run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
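The selection rule ("latest row is non-terminal and older than JOB_TIMEOUT") can be sketched in plain Python. This is not `db.tasks.get_stuck_deletion_cc_ids`: the `JobRow` shape, the status strings other than PENDING, and the in-memory scan are assumptions; the real lookup presumably runs as a database query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Assumed terminal status names; only PENDING is mentioned in the commit.
TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}


@dataclass
class JobRow:
    cc_pair_id: int
    status: str
    registered_at: datetime


def stuck_deletion_cc_ids(
    rows: Iterable[JobRow], now: datetime, job_timeout: timedelta
) -> List[int]:
    """Return cc-pair ids whose latest cleanup job looks abandoned."""
    latest: Dict[int, JobRow] = {}
    for row in rows:
        current = latest.get(row.cc_pair_id)
        if current is None or row.registered_at > current.registered_at:
            latest[row.cc_pair_id] = row
    return sorted(
        cc_id
        for cc_id, row in latest.items()
        # Non-terminal for longer than the timeout: the broker message
        # was most likely lost, so the task is safe to re-enqueue.
        if row.status not in TERMINAL_STATUSES
        and now - row.registered_at > job_timeout
    )
```

Looking only at the latest row per cc-pair is what gives the self-throttling behaviour: once a re-drive writes a fresh row, that pair is not selected again until the new row itself ages past the timeout.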