Search quality, source prioritization, OutSystems connector, dark mode & indexing perf #47
Open
rajivml wants to merge 115 commits into
…d candidates
Incremental, A/B-comparable reranking instead of a single global switch. Two-level gate: rerank runs only when the global master switch RERANK_ENABLED (a GPU-backed model server is deployed) AND the per-assistant opt-in Persona.rerank_enabled are both on. Default off everywhere, so existing assistants and the GPU-free local setup are unchanged and need no GPU.
- Persona.rerank_enabled column (+ migration f6a7b8c9d0e1, server_default false), threaded through upsert/create_update_persona and the persona API models, with a 'Rerank results (beta)' toggle in the assistant editor.
- Single resolver _resolve_skip_rerank() in retrieval_preprocessing is now the one place both chat and Slack decide reranking (Slack passes skip_rerank=None to share it). Legacy ENABLE_RERANKING_* flags kept as a fallback.
- RERANK_MODEL_NAME makes the cross-encoder env-selectable (prod can pick a stronger model, e.g. BAAI/bge-reranker-v2-m3); model server warms it when RERANK_ENABLED.
- Retrieval split: when reranking is on, skip the two-query source-prioritization flow (it normalizes a narrow source-filtered set independently, inflating those scores and polluting the rerank candidate window) and run a single all-sources query; when off, the legacy prioritized flow is unchanged. Driven by prioritize_sources=query.skip_rerank.
- Tests: global x per-assistant resolver matrix; single-vs-two-query split.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
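The two-level gate described in this commit can be sketched as a small pure function. This is an illustrative sketch only, not the actual `_resolve_skip_rerank()`: the function name, parameter names, and the rule that an explicit per-request value overrides the assistant setting are assumptions, and the legacy `ENABLE_RERANKING_*` fallback is left out.

```python
from typing import Optional


def resolve_skip_rerank(
    global_rerank_enabled: bool,
    persona_rerank_enabled: bool,
    request_skip_rerank: Optional[bool] = None,
) -> bool:
    """Return True when reranking should be skipped (hypothetical sketch)."""
    # Master switch off: nothing can turn reranking on.
    if not global_rerank_enabled:
        return True
    # Assumption: an explicit per-request value wins over the assistant setting.
    # Callers that pass None (as Slack is described as doing) defer to the assistant.
    if request_skip_rerank is not None:
        return request_skip_rerank
    # Otherwise rerank only when the assistant has opted in.
    return not persona_rerank_enabled
```

With both switches defaulting to off, every combination except "global on and assistant on" skips reranking, which matches the "default off everywhere" behaviour described above.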
Optional kustomize component that pins the inference-model-server onto a GPU node pool (nodeSelector agentpool=gpupool, toleration sku=gpu:NoSchedule, nvidia.com/gpu: 1) and serves the cross-encoder reranker (RERANK_MODEL_NAME, default BAAI/bge-reranker-v2-m3) alongside the embedding + intent models. Also sets real cpu/mem requests+limits (base leaves them empty -> eviction-prone). Opt in from the prod overlay's components: and set RERANK_ENABLED=true in env.properties. The existing model-server image already bundles CUDA torch, so no rebuild. Local omits the component and runs GPU-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the verified query-path analysis (chat + Slack), the rerank flow and its corrected mental model, the recency decay math + levers (incl. the dead 'auto' auto-detect finding), the source-prioritization normalization bias and its two-path fix, the incremental per-assistant rollout, the GPU sizing/plan, the implementation map, and how to enable in prod. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reranker client
- LLM relevance filter is now a single listwise call on the MAIN llm (llm_eval_chunks_listwise + LISTWISE_CHUNK_FILTER_PROMPT, fails open), gated independently of reranking by LLM_RELEVANCE_FILTER_ENABLED x per-assistant llm_relevance_filter (resolver _resolve_skip_llm_chunk_filter). No GPU.
- Reranker served by Hugging Face TEI on CPU when RERANK_SERVER_URL is set (CrossEncoderEnsembleModel /rerank path); model server skips loading the cross-encoder in that case. Replaces the GPU plan.
- _query_vespa simplified to a single all-sources query (removed the two-query source-prioritization union and its score-inflation bias).
- Source diversity moved to final selection: ensure_source_diversity in doc_pruning reserves up to SOURCE_DIVERSITY_RESERVED_SLOTS slots for PROTECTED_SOURCES, so KB/web aren't crowded out. Always-on, global, no per-assistant knob.
- Chat per-conversation toggles (use_reranking / use_relevance_filter) threaded via SearchTool -> SearchRequest.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nput bar Two per-conversation switches (default off) wired through sendMessage -> CreateChatMessageRequest (use_reranking / use_relevance_filter), independent of the assistant's own settings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- optional/tei-rerank: Hugging Face TEI on CPU (bge-reranker-v2-m3, fp32), Deployment + Service, health probes, model-cache volume.
- prod overlay includes it + sets RERANK_ENABLED / RERANK_SERVER_URL / LLM_RELEVANCE_FILTER_ENABLED. Local sets RERANK_ENABLED (no RERANK_SERVER_URL) so the model server loads the reranker in-process, with no GPU anywhere.
- Removed the optional/gpu-inference component.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p updates
- Unit: relevance-filter gating matrix, listwise parser, source-diversity promotion/caps/disable (replaces the removed two-query test).
- Integration: TEI rerank transport (mocked), real CPU cross-encoder reordering (MiniLM), filter_chunks with a stub LLM.
- Docs updated to the final design (TEI-on-CPU, source diversity at selection, two assistant knobs); CONTRIBUTING documents local in-process reranking.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The TEI CPU image is ONNX-Runtime-based and bge-reranker-v2-m3 ships no ONNX weights, so the stock image 404s on onnx/model.onnx. Our own image exports the model to ONNX at build time (HF Optimum; pinned optimum 1.23.3 + torch 2.2.2 + numpy<2 for the >2GB external-data path) and bakes it at /model — no runtime download, no re-download on restart, no HF dependency. Deployment uses the ACR image with --model-id /model (dropped the emptyDir that shadowed /data and the unneeded istio annotation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Built + deployed from feature/improve-queries; api-server self-migrated persona.rerank_enabled on rollout. Validated live: TEI reranker healthy and scoring correctly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CPU serving was validated as non-viable (24-98s for 15 chunks on prod; the long blocking inference also starved /health and got the pod liveness-killed). Move reranking to a T4 GPU node pool (Standard_NC4as_T4_v3, tainted gpu=true:NoSchedule) where the same bge-reranker-v2-m3 reranks in ~0.6-3.2s (45ms for short inputs).
- Use the UPSTREAM TEI GPU image directly (ghcr turing-1.5) instead of building our own: TEI's GPU backend is Candle+safetensors, which the model ships, so the ONNX-export/custom-image dance (a CPU-runtime-only constraint) is gone. Deleted the custom Dockerfile.
- Take the pod out of the istio mesh (sidecar.istio.io/inject: false): leaf service, PERMISSIVE mTLS so the api-server still reaches it, and an injected sidecar isn't up during the init phase (so the prefetch init container couldn't reach the network).
- Prefetch the model into the PVC with the Python HF client in an init container: TEI's Rust hf-hub client fails on HF's redirect with 'relative URL without a base'; the Python client handles it. TEI loads offline from /data/model.
Cluster-side (not in repo): added a gpu=true:NoSchedule toleration to the nvidia-device-plugin-daemonset so it schedules on the tainted GPU node and advertises nvidia.com/gpu.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PROTECTED_SOURCES and SOURCE_DIVERSITY_RESERVED_SLOTS were running on code defaults (web,sfkbarticles / 2), so the always-on diversity logic was active but invisible in the configmap. Pin them to the current defaults for visibility and tunability — no behavior change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
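A minimal sketch of how reserved-slot source diversity could work, using the defaults quoted in this commit (web,sfkbarticles / 2). The real `ensure_source_diversity` in doc_pruning is not shown in this PR page, so the `Doc` type, the signature, and the replacement policy (swap out the lowest-ranked unprotected document) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Doc:  # hypothetical stand-in for a retrieved document
    doc_id: str
    source: str


PROTECTED_SOURCES = frozenset({"web", "sfkbarticles"})  # defaults quoted above
SOURCE_DIVERSITY_RESERVED_SLOTS = 2


def ensure_source_diversity(
    ranked: List[Doc],
    limit: int,
    protected: FrozenSet[str] = PROTECTED_SOURCES,
    reserved: int = SOURCE_DIVERSITY_RESERVED_SLOTS,
) -> List[Doc]:
    """Keep the top `limit` docs, promoting protected docs into up to `reserved` slots."""
    selected = list(ranked[:limit])
    # Protected docs that missed the cut, best-ranked first.
    overflow = [d for d in ranked[limit:] if d.source in protected]
    have = sum(1 for d in selected if d.source in protected)
    while have < reserved and overflow:
        # Assumed policy: evict the lowest-ranked unprotected doc.
        victim = next(
            (i for i in range(len(selected) - 1, -1, -1)
             if selected[i].source not in protected),
            None,
        )
        if victim is None:
            break
        del selected[victim]
        selected.append(overflow.pop(0))
        have += 1
    return selected
```

Setting `reserved=0` makes the function a plain top-`limit` cut, which is one way the "disable" case mentioned in the tests could behave.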
These api keys are service credentials for automation and intentionally don't map to a User. Enabling OIDC (AUTH_TYPE=oidc) flipped DISABLE_AUTH off, so current_user started 403'ing api-key requests (no session, and the keys don't resolve to a user) — bouncing automation into the SSO login flow. current_user now authorizes a request carrying a valid X-API-Key as an anonymous service caller (user=None, which endpoints already handle). Browser requests without a session still 403 (SSO gate intact), and a key alone does not grant admin (current_admin_user still requires an admin user). Adds request_has_valid_api_key() mirroring validate_api_key's lookup+cache, plus integration tests locking both the SSO and api-key flows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
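The authorization rules in this commit can be summarised as a small decision function. The sketch below is framework-free and hypothetical: `User`, `Forbidden`, and the boolean `has_valid_api_key` argument are stand-ins (the real code checks the X-API-Key header via `request_has_valid_api_key()` and raises HTTP 403).

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


@dataclass
class User:  # hypothetical minimal user model
    email: str
    is_admin: bool = False


def current_user(session_user: Optional[User], has_valid_api_key: bool) -> Optional[User]:
    # A logged-in session always resolves to its user.
    if session_user is not None:
        return session_user
    # A valid API key authorizes the request as an anonymous service caller.
    if has_valid_api_key:
        return None
    # No session and no key: the SSO gate stays closed.
    raise Forbidden("no session and no valid API key")


def current_admin_user(session_user: Optional[User], has_valid_api_key: bool) -> User:
    user = current_user(session_user, has_valid_api_key)
    # A key alone never grants admin: an actual admin user is required.
    if user is None or not user.is_admin:
        raise Forbidden("admin user required")
    return user
```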
Deployed + validated live: GET /api/persona with a valid x-api-key returns 200; without a key still 403s (SSO gate intact). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Make design-system color tokens CSS-variable-driven so they flip under a
`.dark` ancestor (light values unchanged via fallbacks). Dark is now the
app-wide default via a no-flash init script on <html>; users opt into light
via the sidebar toggle (persisted app-wide to localStorage).
- Typography: replace Inter with IBM Plex Sans (body) + Fraunces (display);
empty-state headline uses the display font with a staggered entrance.
- Chat polish: input bar elevated with an accent focus-glow (fixes the
previously-broken focus ring); subtle atmosphere glow on the chat canvas;
reusable da-fade-up motion (reduced-motion guarded).
- Assistant picker ("Choose Assistant") gains a live search box (name +
description) with an empty state.
- Consistency sweep: map raw neutral colors to semantic tokens across shared
components + search/admin so they flip correctly in dark.
Followup judgment-call stragglers + per-page UX land in a separate commit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add color-scheme: dark to the .dark root so the browser renders native form controls (checkboxes, date pickers, default input backgrounds) and scrollbars in dark. Fixes the glaring white checkboxes (assistant form Tools, knowledge-set list) and any browser-default white input backgrounds.
- Brand the checkbox/radio accent-color to the app accent.
- Give the chat Filters knowledge-set search input an explicit dark surface (bg-background + token text/placeholder); it had no bg class and rendered a bright white box.
Followup to 8c5f4f8 (kept separate so it can be reverted independently).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + verified live: web pod Running 2/2 on vha-79, build-deploy verify shows live==manifest, and /auth/login serves the dark-mode no-flash init. Build note: the local Mac amd64 web build SIGSEGVs (next build / musl under Rosetta), so vha-79 was built natively via 'az acr build' on darwinacr and copied into sfbrdevhelmweacr (docker pull/tag/push) — same digest d53f076e. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the working web-image deploy flow: local Mac next build SIGSEGVs under amd64 emulation (even with Rosetta), so build natively via az acr build on darwinacr, copy into sfbrdevhelmweacr, bump tag, apply, verify. Includes the Contributor/PIM requirement and the optional service-principal path for CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The toggle previously lived only in the chat sidebar, so admin/assistants and other pages had no way to switch themes. Add a Light/Dark item to UserDropdown (the avatar menu present app-wide), reusing the same darwin-theme localStorage + .dark-on-<html> mechanism. Verified: flips both ways, label reflects state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
next build's eslint (rules-of-hooks) failed the prod build because the darkMode useState/useEffect sat after the `if (!combinedSettings) return null` early return. Hooks must be unconditional — moved them above it. (Dev mode didn't catch this; only the production build's lint does.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + verified: web pod Running 2/2 on vha-80, verify shows live==manifest. Built via az acr build on darwinacr -> transfer to sfbrdevhelmweacr (digest a1f3167). Note: Docker Hub anon pull rate-limit on the node:20-alpine base flaked several ACR runs; succeeded once the window freed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single source of truth = RERANK_ENABLED / LLM_RELEVANCE_FILTER_ENABLED env:
- Backend surfaces the effective global flags via /settings (load_settings), mirroring the existing chat_file_max_size_mb env-injection (relevance also respects the DISABLE_LLM_CHUNK_FILTER kill-switch).
- Frontend hides the per-conversation (ChatInputBar) and per-assistant (AssistantEditor) rerank/relevance toggles when disabled cluster-wide.
- prod overlay: RERANK_ENABLED=false, LLM_RELEVANCE_FILTER_ENABLED=false; drop the tei-rerank component so the GPU node pool can be removed.
Code is intact: flip the env (+ re-add the tei-rerank component and a GPU node pool) to re-enable. No code change needed to toggle.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
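As a rough illustration of how the effective global flags might be derived from the environment, here is a hedged sketch. The output keys match the field names reported by `load_settings()` in the later deploy notes; the helper names and the "true"-string parsing convention are assumptions.

```python
from typing import Dict, Mapping


def _env_true(env: Mapping[str, str], name: str) -> bool:
    # Assumed convention: a flag is on only when set to the string "true".
    return env.get(name, "").strip().lower() == "true"


def effective_search_flags(env: Mapping[str, str]) -> Dict[str, bool]:
    """Compute the cluster-level flags the UIs read to show or hide toggles."""
    rerank = _env_true(env, "RERANK_ENABLED")
    # The relevance filter also respects the DISABLE_LLM_CHUNK_FILTER kill-switch.
    relevance = _env_true(env, "LLM_RELEVANCE_FILTER_ENABLED") and not _env_true(
        env, "DISABLE_LLM_CHUNK_FILTER"
    )
    return {"rerank_enabled": rerank, "llm_relevance_filter_enabled": relevance}
```

In a real process the mapping would be `os.environ`; passing it in explicitly keeps the sketch easy to test.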
…lags) Deployed + verified: load_settings() returns rerank_enabled=False, llm_relevance_filter_enabled=False — the cluster-level flags the chat + assistant UIs read to hide the toggles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + verified: web Running 2/2 on vha-81; live /api/settings returns rerank_enabled=false / llm_relevance_filter_enabled=false, so the chat + assistant toggles are hidden. GPU pool (gput4) scaled to 0 — no GPU VM running. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone prod overlay (k8s/overlays/prod-velero), applied separately from the app like prod-vespa. Backs up only the backup=vespa PVCs (Vespa index + configserver state) weekly, ttl 504h (last 3 retained).
- reuses SP darwinvelero (155fae3c), which already holds Contributor on the backup/node/darwin RGs -> no Owner/UAA role assignment needed
- BSL reuses the darwinaksbackup SA; disk snapshots go to the unlocked node RG
- alerts via the existing robusta Prometheus stack: PrometheusRule + ServiceMonitor (release: robusta), incl. a no-recent-successful-backup staleness alert that catches silent failure
- notifier CronJob posts per-run success/failure to #darwin-devs via the Slack bot token
- SP secret + bot token sourced from gitignored files (templates committed)
Replaces the prior self-managed Velero that silently failed for ~a year on an expired SP secret.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
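The "last 3 retained" figure follows from the TTL and the weekly schedule: 504 hours is 21 days, and 21 days divided by a 7-day interval is 3. A quick check, where the variable names are illustrative only:

```python
from datetime import timedelta

backup_ttl = timedelta(hours=504)      # ttl from the schedule above
backup_interval = timedelta(weeks=1)   # weekly backups

# Floor division of two timedeltas yields an int: how many whole intervals fit in the TTL.
retained_backups = backup_ttl // backup_interval
print(backup_ttl.days, retained_backups)
```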
Sets llm_relevance_filter=false for the 3 seeded personas (Darwin, GPT, Paraphrase) so the on-boot upsert_persona() reseed doesn't re-enable the filter. A/B eval over 120 questions showed the filter yields no measurable answer-quality gain (20/20/80 A/B/tie, p~1.0) and discarded all chunks in 14% of cases. Global LLM_RELEVANCE_FILTER_ENABLED=false already gates it off; this keeps the per-assistant seed config consistent and durable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
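The commit does not say which test produced p~1.0. One test that gives this result for a 20/20/80 win/win/tie split is the exact two-sided sign test, which discards ties and asks how surprising the win split is under a fair coin. The sketch below assumes that test; the function name is hypothetical.

```python
from math import comb


def two_sided_sign_test(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test p-value; ties are assumed to be already discarded."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Probability of seeing k or fewer wins for the weaker side under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 20 wins for A, 20 for B, 80 ties dropped: a perfectly even split.
p_value = two_sided_sign_test(20, 20)
print(p_value)
```

An even split is the least surprising outcome possible, so the p-value is capped at 1.0, consistent with "no measurable answer-quality gain".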
Runbook for the managed Azure Backup-for-AKS approach, kept for reference. The cluster uses self-managed Velero instead (k8s/overlays/prod-velero) because managed AKS Backup needs role assignments / Trusted Access that require Owner/UAA, which isn't available; this documents that path + the [OWNER] hand-off if it's ever pursued. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document starting Postgres + Redis together under the -p danswer-stack project (shared danswer-stack_default network), run Vespa via the manual docker run on that same network (the compose `index` service is unreliable locally), add Redis ping/stop + no-auth notes, and set REDIS_HOST=localhost for the host-run backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ings gating) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…essages Connector deletion is the event-driven cleanup_connector_credential_pair_task on the non-durable Redis broker. If Redis or the celery worker restarts while it's queued, the broker message is lost but the task_queue_jobs row stays PENDING forever — the connector shows "Deleting" indefinitely and nothing re-runs it (deletion, unlike sync/prune, is never periodically rescheduled), while the delete API's dedup guard blocks resubmission. We had 5 connectors stuck Deleting since Mar–May for exactly this reason. Adds a periodic celery-beat task (check_for_stuck_deletion_tasks, every 30m) that re-enqueues any cleanup task whose latest task_queue_jobs row has been non-terminal past JOB_TIMEOUT (db.tasks.get_stuck_deletion_cc_ids). The cleanup task's per-cc-pair advisory lock makes a re-enqueue a no-op if a deletion is genuinely still running, and the fresh row stays live for JOB_TIMEOUT, so this self-throttles to one re-drive per cc-pair per window. Also recovers the existing stuck connectors on first run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
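The stuck-task detection described above ("latest task_queue_jobs row has been non-terminal past JOB_TIMEOUT") can be sketched over in-memory rows. This is not the actual `db.tasks.get_stuck_deletion_cc_ids` query: the `TaskRow` shape, the status names, and the timeout value are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}  # assumed terminal states
JOB_TIMEOUT = timedelta(hours=3)            # placeholder value, not from the source


@dataclass
class TaskRow:  # hypothetical projection of a task_queue_jobs row
    cc_pair_id: int
    status: str
    registered_at: datetime


def get_stuck_deletion_cc_ids(
    rows: Iterable[TaskRow], now: datetime, timeout: timedelta = JOB_TIMEOUT
) -> List[int]:
    # Keep only the most recent row per cc-pair.
    latest: Dict[int, TaskRow] = {}
    for row in rows:
        current = latest.get(row.cc_pair_id)
        if current is None or row.registered_at > current.registered_at:
            latest[row.cc_pair_id] = row
    # Stuck = latest row is non-terminal and older than the timeout.
    return sorted(
        cc_id
        for cc_id, row in latest.items()
        if row.status not in TERMINAL_STATUSES and now - row.registered_at > timeout
    )
```

Because a re-enqueue writes a fresh row, that cc-pair drops out of the result until the new row itself ages past the timeout, which is the self-throttling behaviour the commit describes.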
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eration)
Citations are the LLM's output and it inconsistently cites curated sources even when they're promoted to prompt position [1] (confirmed in chat + Slack across runs). Prompt nudging can't reliably fix this. Add a deterministic post-generation step in the shared Answer pipeline (covers chat + Slack, all assistants):
- additive: the LLM's own inline citations are untouched
- deduped: only authoritative (PROTECTED_SOURCES) docs the LLM did NOT cite, deduped by document_id (same page often appears as multiple chunks)
- honest: one batched LLM call verifies each candidate actually SUPPORTS a statement in the answer (topic-match is not enough); fail-closed on any error
- bounded: at most ONE extra call, and only when an uncited authoritative doc is in context (no call otherwise)
Supporting docs are appended as an "Authoritative sources" markdown footer (renders in chat UI + Slack). Gated by AUTHORITATIVE_CITATION_RETENTION_ENABLED (default off; prod on). In this deployment fast_llm == main llm, so the verify call uses self.llm.
- chat_configs: AUTHORITATIVE_CITATION_RETENTION_ENABLED
- llm/answering/authoritative_retention.py: select/verify/footer + orchestrator
- llm/answering/answer.py: accumulate answer+cited-ids in _process_stream, append verified footer after the citation stream
- prod env: enabled
- tests: 14 unit tests (selection/dedupe/parse/verify-fail-closed/footer)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
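The "deduped" selection rule (authoritative docs the LLM did not cite, one entry per document_id) might look roughly like the sketch below. `ContextDoc` and the exact signature are hypothetical; only the rule itself comes from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ContextDoc:  # hypothetical stand-in for a doc chunk in the prompt context
    document_id: str
    source_type: str


def select_authoritative_candidates(
    context_docs: Iterable[ContextDoc],
    cited_ids: Set[str],
    protected_sources: Set[str],
) -> List[ContextDoc]:
    """Uncited docs from protected sources, deduped by document_id, in context order."""
    seen: Set[str] = set()
    candidates: List[ContextDoc] = []
    for doc in context_docs:
        if doc.source_type not in protected_sources:
            continue  # not an authoritative source
        if doc.document_id in cited_ids or doc.document_id in seen:
            continue  # already cited by the LLM, or another chunk of the same page
        seen.add(doc.document_id)
        candidates.append(doc)
    return candidates
```

An empty result means there is nothing to verify, so no extra LLM call is needed, which is how the "bounded" property falls out of the selection step.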
…ations) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The verify step is a non-streaming gateway completion that occasionally times out (observed transiently; normally 1.6-5.5s). Retry once before failing closed so a single gateway hiccup doesn't drop the authoritative-sources footer. Still fail-closed after retries (never appends on a real error). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
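A minimal sketch of "retry once, then fail closed", assuming the verify step is a callable that returns the list of verified document ids and raises on timeout. The function and parameter names are illustrative, not the ones in authoritative_retention.py.

```python
from typing import Callable, List, Sequence


def verify_with_retry(
    verify: Callable[[Sequence[str]], List[str]],
    candidate_ids: Sequence[str],
    retries: int = 1,
) -> List[str]:
    """Call `verify`, retrying on error; return [] (append nothing) if every attempt fails."""
    for _attempt in range(retries + 1):
        try:
            return verify(candidate_ids)
        except Exception:
            # Transient gateway hiccup: fall through to the next attempt, if any.
            continue
    # Fail closed: never surface unverified sources on a real error.
    return []
```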
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tighten gate
Two changes per product feedback:
1. Single Sources section (no footer): inject the verified authoritative doc as a CitationInfo using its context position as citation_num. Since authoritative docs are promoted to positions 1-3 and the UI orders the Sources group by citation number (JS integer-key order / Slack list), it lands at the TOP of the existing single "Sources" section. Drops the separate markdown footer.
2. Tighter gate: only run when the answer cites NO authoritative source at all. If the LLM already cited any PROTECTED_SOURCES doc, do nothing (no verify call): the answer is already authoritatively grounded.
Net: at most one conditional verify call, only on answers missing authoritative citations; result merges into the one Sources section instead of a second block.
- authoritative_retention.py: gate in select_authoritative_candidates; retained_authoritative_citations() returns CitationInfo (footer removed)
- answer.py: yield the CitationInfo packets after the stream
- tests: 18 (gate skip + no-LLM-call, context-position citation_num, verify reject)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ources) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ollided) Merging the retained authoritative doc into the single Sources section assigned it a citation_num equal to its context position — which collides with the LLM's own citations (it owns the low numbers), and translate_citations de-dupes first-wins, so the injected citation was dropped (validated: 0 OutSystems shown). Putting it at the TOP of the numbered list would require renumbering the LLM's inline [[n]], which breaks the inline links. Revert to the 'Authoritative sources' footer block (reliably surfaces the link), but KEEP the tightened gate from the prior change: only run when the answer cites NO authoritative source at all (no verify call otherwise). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
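The collision is easy to reproduce in miniature. The sketch below is not `translate_citations` itself; it only assumes the first-wins de-duplication by citation number that the commit describes, with made-up document ids.

```python
from typing import Dict, Iterable, Tuple


def dedupe_first_wins(citations: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Keep the first document seen for each citation number."""
    kept: Dict[int, str] = {}
    for citation_num, document_id in citations:
        kept.setdefault(citation_num, document_id)
    return kept


# The LLM's own citations arrive first and own the low numbers.
llm_citations = [(1, "slack-thread"), (2, "kb-article")]
# The injected authoritative doc reuses its context position (1) as its number.
injected = [(1, "outsystems-doc")]

shown = dedupe_first_wins(llm_citations + injected)
print(shown)
```

The injected entry collides with citation 1 and is silently dropped, which is why the footer block was restored instead.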
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Strict 'directly supports a statement' verification dropped clearly-relevant authoritative docs whose content didn't literally restate the answer — e.g. the 'Elastic Robot Orchestration Setup For AWS' KB article was at prompt position #1 for an ERO/Automation-Suite question but got filtered out because the answer's key statement was a negative ('not available self-hosted') the setup guide doesn't literally assert. Loosen the verify prompt to 'is a RELEVANT authoritative reference for this answer (same subject/scenario; need not restate it)'. Surfaces the authoritative KB/docs link in these cases; tradeoff is occasionally a topically-related doc, acceptable for the support-engineer use case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ance) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s cited
The tightened gate ('skip if any authoritative source was cited') was too coarse:
on an Orchestrator upgrade question the LLM cited a KB article (sfkbarticles) +
Slack, which suppressed surfacing the relevant, uncited docs.uipath.com page
(/2023.10/.../maintenance-considerations) that was at the front of the prompt — a
regression vs the old 2-phase union. Loosen: candidates = any uncited relevant
authoritative doc, regardless of whether some other authoritative source was
cited. Verify call now fires whenever an uncited authoritative doc is present
(still conditional/batched/retry-guarded).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…!= relevant) Relevance verify was too loose: for an 'AI in standalone Orchestrator' question it surfaced 'Does Standalone Orchestrator leverage Azure SignalR Service?' (a messaging/transport doc) as an authoritative reference. Tighten to require the doc be about the SAME SPECIFIC topic/feature, not merely the same product — with an explicit messaging/infra-vs-AI exclusion and 'when unsure, exclude'. Middle ground between strict 'supports-a-statement' (missed relevant docs) and loose 'same-subject' (false positives). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The messaging-vs-AI example was overfit to one question. State the general rule: the doc must be about the SAME SPECIFIC topic/feature, not merely the same product; a different feature/component/service isn't a relevant reference. No domain example. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… signal) The verify was judging relevance from the title + first 600 chars of the doc — usually boilerplate, not the passage that matched — causing oscillation (signalR false-positive when loose; 'Before you upgrade' false-negative when strict). Now pass each candidate's full MATCHED PASSAGE (LlmDoc.content = the retrieved chunk; capped at 4000 chars) and judge from that, with rebalanced wording (same topic as the answer = include; different feature/component = exclude). Candidate list is small (1-3), so the extra tokens are bounded. This makes precision come from the chunk rather than from overfit prompt strictness. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The reserved pass guaranteed only the top-3 protected docs into the candidate set; relevant docs ranked #4 among protected sources (e.g. Orchestrator Maintenance Considerations on an upgrade question) fell off and never reached verify/footer. Bump to 6 so more of the top protected docs are guaranteed in; the per-source prompt cap still bounds how many reach the LLM. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… indexed
The version dedup only collapses versions that were retrieved, so when retrieval surfaces a stale version (content is near-identical across versions, so which one ranks is ~arbitrary) the old link is shown even though newer versions are indexed (e.g. /2023.4/ surfaced while /2025.10/ exists). After pruning, rewrite each versioned docs.uipath.com link to the NEWEST version of that same page (same URL with the version segment stripped, slug preserved) found in the index. Fixes both inline docs citations and the authoritative footer (both use final_context_docs links). One PK-indexed prefix-scan per distinct page-prefix; no reindex.
- doc_pruning: _versioned_url_parts + rewrite_docs_links_to_latest(db_session)
- search_tool: rewrite final_context_documents after prune
- tests: 6 (parse, rewrite-to-latest, noop-when-latest, slug-variants, non-docs, new-scheme)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
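A hedged sketch of the rewrite-to-latest idea follows. It assumes the version appears as a `/YYYY.N/` path segment (as in the `/2023.4/` and `/2025.10/` examples) and replaces the database prefix-scan with a plain list of indexed URLs; the function names, the regex, and the example URLs in the usage are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed URL shape: <prefix>/<YYYY.N>/<slug>
_VERSION_SEGMENT = re.compile(r"/(\d{4})\.(\d{1,2})/")


def versioned_url_parts(url: str) -> Optional[Tuple[str, Tuple[int, int], str]]:
    """Split a versioned docs URL into (prefix, (year, minor), slug), else None."""
    match = _VERSION_SEGMENT.search(url)
    if "docs.uipath.com" not in url or match is None:
        return None
    version = (int(match.group(1)), int(match.group(2)))
    return url[: match.start()], version, url[match.end():]


def rewrite_to_latest(url: str, indexed_urls: Iterable[str]) -> str:
    """Return the newest indexed version of the same page, or the URL unchanged."""
    parts = versioned_url_parts(url)
    if parts is None:
        return url
    prefix, best_version, slug = parts
    best_url = url
    for candidate in indexed_urls:
        cand = versioned_url_parts(candidate)
        # Same page = same prefix and same slug; only the version differs.
        if cand and cand[0] == prefix and cand[2] == slug and cand[1] > best_version:
            best_version, best_url = cand[1], candidate
    return best_url
```

Versions are compared as integer tuples rather than strings, so 2023.10 correctly sorts after 2023.4.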
… version) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Blindly rewriting docs links to latest is wrong for version-specific questions
("is X supported in 23.10?" must resolve to the 23.10 doc, not latest). Make it
version-aware: parse_question_doc_version() extracts a version from the question
(23.10 -> 2023.10) only when EXACTLY one is named (multiple, e.g. 'upgrade 23.10 to
25.10', is ambiguous -> None -> latest). rewrite_docs_links() then resolves each
docs page to that exact indexed version (even if older than retrieved); falls back
to newest indexed when no version is specified. Not confused by '2.9 million'.
- doc_pruning: parse_question_doc_version + rewrite_docs_links(target_version)
- search_tool: parse from the query, pass through
- tests: parse cases (single/multiple/none/2.9M), target-older, target-not-indexed
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
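The "exactly one version named" rule can be sketched with a regular expression. This is an illustrative approximation, not the committed `parse_question_doc_version()`: the pattern (two-digit year with optional "20" prefix, then a one- or two-digit minor) is an assumption and would still misread inputs such as "12.9 million".

```python
import re
from typing import Optional

# Assumed pattern: optional "20", a two-digit year, a dot, a 1-2 digit minor.
# The lookarounds stop matches inside longer numbers such as "2.9" or "23.10.2".
_VERSION = re.compile(r"(?<![\d.])(?:20)?(\d{2})\.(\d{1,2})(?!\d)(?!\.\d)")


def parse_question_doc_version(question: str) -> Optional[str]:
    """Return the single docs version named in the question, normalized to YYYY.N."""
    found = {f"20{yy}.{int(minor)}" for yy, minor in _VERSION.findall(question)}
    # Zero versions: nothing to target. Several versions: ambiguous. Both fall back to latest.
    return found.pop() if len(found) == 1 else None
```

Returning `None` for both the "none" and "multiple" cases keeps the caller simple: any `None` means "resolve to the newest indexed version".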
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Judging relevance only against the ANSWER let cloud-connectivity-adjacent KBs through non-deterministically: for 'does standalone Orchestrator have AI?' the answer says 'AI requires an Automation Cloud connection', so KBs sharing that theme (Azure SignalR, 'Automation Cloud cannot be accessed' Studio error) matched the answer even though they don't address AI capability. Pass the QUESTION to the verify and judge whether the doc helps answer THE QUESTION's specific subject — keyword/product/service overlap is explicitly not enough, and error/troubleshooting docs are excluded unless that's what's asked. When in doubt, exclude. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…answer) Replace the carve-out-heavy wording with a clean rule: include a candidate only if its matched passage is genuinely relevant to BOTH the question and the answer. Requiring relevance to the question (not just the answer) is what excludes cloud-connectivity-adjacent KBs (Azure SignalR, 'Automation Cloud cannot be accessed' Studio error) that shared the answer's 'needs cloud connection' theme but don't address the question's AI-capability subject. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ive-citation pipeline
Rewrite §5 from 'source diversity' into the full layered pipeline as built:
5.1 reserved retrieval (recall, SOURCE_RESERVED_RETRIEVAL_SLOTS=6)
5.2 diversity promotion (SOURCE_DIVERSITY_RESERVED_SLOTS=3)
5.3 per-source cap (MAX_PROMPT_DOCS_PER_SOURCE=8)
5.4 authoritative-sources nudge (soft, from PROTECTED_SOURCES)
5.5 verify-then-retain footer (chunk + question/answer relevance; why footer not merged)
5.6 version-aware docs link rewrite
Plus the citation-attribution lesson (presence != citation), PROTECTED_SOURCES now web,sfkbarticles,highspot,outsystems, updated §6b/§8/§9, and §10 follow-ups (configmap externalization, inline-card version gap, indexed-content freshness, DB-backed prompts).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
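Of the layers listed, the per-source cap (5.3) is the simplest to illustrate. The sketch below assumes documents arrive as ranked `(doc_id, source)` pairs and uses the MAX_PROMPT_DOCS_PER_SOURCE=8 value from the list; the function name and data shape are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

MAX_PROMPT_DOCS_PER_SOURCE = 8  # value quoted in the list above


def cap_per_source(
    ranked: Sequence[Tuple[str, str]],
    max_per_source: int = MAX_PROMPT_DOCS_PER_SOURCE,
) -> List[Tuple[str, str]]:
    """Walk docs in rank order, keeping at most `max_per_source` per source."""
    counts: Counter = Counter()
    kept: List[Tuple[str, str]] = []
    for doc_id, source in ranked:
        if counts[source] < max_per_source:
            counts[source] += 1
            kept.append((doc_id, source))
    return kept
```

Because the walk preserves rank order, a chatty source keeps only its best-ranked documents and lower-ranked documents from other sources still make it into the prompt.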
swati354
approved these changes
Jun 23, 2026
Sarath1018
approved these changes
Jun 23, 2026
Rolls up ~3 weeks of search-quality, connector, indexing, and UX work on feature/improve-queries (115 commits) for merge into feature/darwin. All changes have been running in prod (backend vha-204, web vha-103). Brief feature summary below.
Search & answer quality
- … RERANK_ENABLED global × Persona.rerank_enabled), unbiased single-query candidates, per-conversation chat toggle. Cluster-config-driven; currently disabled (no GPU).
- … PROTECTED_SOURCES = web/sfkbarticles/highspot/outsystems):
- … SOURCE_RESERVED_RETRIEVAL_SLOTS) so curated sources reach the candidate set;
- … SOURCE_DIVERSITY_RESERVED_SLOTS) + per-source cap (MAX_PROMPT_DOCS_PER_SOURCE) so a chatty source can't monopolize the prompt;
- … AUTHORITATIVE_CITATION_RETENTION_ENABLED): surfaces relevant uncited authoritative docs, judged on the matched chunk against the question+answer;
Connectors & indexing
Assistants & UX
- … hidden_assistants).
- … display_name.
Reliability, auth & ops
- pool_pre_ping/recycle, X-API-Key auth under enforced OIDC fix.
- build-deploy.sh registry-existence guard before apply; Apple-Silicon web cloud-build routing.
Docs
- … docs/search-quality-reranking-and-recency.md) covering the full reranking → source-prioritization → authoritative-citation pipeline, plus web-deploy / backup runbooks.
🤖 Generated with Claude Code