
Search quality, source prioritization, OutSystems connector, dark mode & indexing perf #47

Open
rajivml wants to merge 115 commits into feature/darwin from feature/improve-queries

rajivml (Collaborator) commented Jun 23, 2026

Rolls up ~3 weeks of search-quality, connector, indexing, and UX work on feature/improve-queries (115 commits) for merge into feature/darwin. All changes have been running in prod (backend vha-204, web vha-103). Brief feature summary below.

Search & answer quality

  • Reranking (cross-encoder) — two-level gate (RERANK_ENABLED global × Persona.rerank_enabled), unbiased single-query candidates, per-conversation chat toggle. Cluster-config-driven; currently disabled (no GPU).
  • LLM relevance filter — one-shot listwise call on the main LLM, independently gated; no GPU.
  • Source prioritization & authoritative citations (layered, global, keyed off PROTECTED_SOURCES = web/sfkbarticles/highspot/outsystems):
    • reserved retrieval (SOURCE_RESERVED_RETRIEVAL_SLOTS) so curated sources reach the candidate set;
    • diversity promotion (SOURCE_DIVERSITY_RESERVED_SLOTS) + per-source cap (MAX_PROMPT_DOCS_PER_SOURCE) so a chatty source can't monopolize the prompt;
    • soft authoritative-sources nudge in the citation prompt;
    • verify-then-retain footer (AUTHORITATIVE_CITATION_RETENTION_ENABLED) — surfaces relevant uncited authoritative docs, judged on the matched chunk against the question+answer;
    • version-aware docs-link rewrite — resolves docs.uipath.com links to the version the question asks about, else latest.
  • Versioned-docs dedup + recency/decay documentation.

Connectors & indexing

  • OutSystems connector for inside.uipath.com (PDF/file extraction, large-doc hardening, skip-list for pathological pages).
  • Web connector: opt-in latest-N version tracking for docs.uipath.com, sitemap/prefix scoping, version-expand fallback hardening.
  • Highspot sync.
  • Indexing perf/reliability: resumable document-set sync (persisted cursor), sync-concurrency caps + per-source override, Vespa bulk-update batching + KeyError fix, prune-check cadence fix, re-drive of orphaned connector deletions.

Assistants & UX

  • Opt-out assistant visibility — admin-created assistants appear for all users by default; users can hide per-user (hidden_assistants).
  • App-wide dark mode (default) + token theming, global light/dark toggle, chat landing redesign + readability overhaul.
  • Assistants management UX, lazy-loaded sidebar chat history, best-match-first connector search, doc-set picker fixes, responsive admin indexing tabs, persona display_name.

Reliability, auth & ops

  • User-friendly API errors (no raw SQL/exception leakage), async DB pool_pre_ping/recycle, fix for X-API-Key auth under enforced OIDC.
  • Chat history pagination, Slack response blocklist.
  • build-deploy.sh registry-existence guard before apply; Apple-Silicon web cloud-build routing.
  • Self-managed Velero overlay for weekly Vespa backups; backup/runbook docs.

Docs

  • Branch design doc (docs/search-quality-reranking-and-recency.md) covering the full reranking → source-prioritization → authoritative-citation pipeline, plus web-deploy / backup runbooks.

🤖 Generated with Claude Code

rajivml and others added 30 commits June 3, 2026 22:26
…d candidates

Incremental, A/B-comparable reranking instead of a single global switch.

Two-level gate: rerank runs only when the global master switch RERANK_ENABLED
(a GPU-backed model server is deployed) AND the per-assistant opt-in
Persona.rerank_enabled are both on. Default off everywhere, so existing
assistants and the GPU-free local setup are unchanged and need no GPU.

- Persona.rerank_enabled column (+ migration f6a7b8c9d0e1, server_default false),
  threaded through upsert/create_update_persona and the persona API models, with
  a 'Rerank results (beta)' toggle in the assistant editor.
- Single resolver _resolve_skip_rerank() in retrieval_preprocessing is now the
  one place both chat and Slack decide reranking (Slack passes skip_rerank=None
  to share it). Legacy ENABLE_RERANKING_* flags kept as a fallback.
- RERANK_MODEL_NAME makes the cross-encoder env-selectable (prod can pick a
  stronger model, e.g. BAAI/bge-reranker-v2-m3); model server warms it when
  RERANK_ENABLED.
- Retrieval split: when reranking is on, skip the two-query source-prioritization
  flow (it normalizes a narrow source-filtered set independently, inflating those
  scores and polluting the rerank candidate window) and run a single all-sources
  query; when off, the legacy prioritized flow is unchanged. Driven by
  prioritize_sources=query.skip_rerank.
- Tests: global x per-assistant resolver matrix; single-vs-two-query split.
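
A minimal sketch of the two-level gate, assuming the resolver reduces to "global AND per-assistant" with an optional explicit per-request override (Slack passing skip_rerank=None suggests None means "defer to the gate"). The function is hypothetical, not the actual _resolve_skip_rerank(); treating the global flag as a hard gate that an explicit request cannot override is also an assumption.

```python
from typing import Optional


def resolve_skip_rerank(
    global_rerank_enabled: bool,
    persona_rerank_enabled: bool,
    skip_rerank: Optional[bool] = None,
) -> bool:
    """Return True when reranking should be skipped (illustrative sketch).

    global_rerank_enabled  -> cluster-wide RERANK_ENABLED
    persona_rerank_enabled -> the assistant's Persona.rerank_enabled
    skip_rerank            -> explicit caller decision; None defers to the gate
    """
    # Without a deployed reranker there is nothing to call, so the global
    # switch is treated as a hard gate that no caller can override.
    if not global_rerank_enabled:
        return True
    if skip_rerank is not None:
        return skip_rerank
    # Otherwise rerank only if the assistant opted in.
    return not persona_rerank_enabled
```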

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Optional kustomize component that pins the inference-model-server onto a GPU
node pool (nodeSelector agentpool=gpupool, toleration sku=gpu:NoSchedule,
nvidia.com/gpu: 1) and serves the cross-encoder reranker (RERANK_MODEL_NAME,
default BAAI/bge-reranker-v2-m3) alongside the embedding + intent models. Also
sets real cpu/mem requests+limits (base leaves them empty -> eviction-prone).

Opt in by adding it to the prod overlay's components: list and setting
RERANK_ENABLED=true in env.properties. The existing model-server image already
bundles CUDA torch, so no rebuild is needed. Local omits the component and runs
GPU-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the verified query-path analysis (chat + Slack), the rerank flow and
its corrected mental model, the recency decay math + levers (incl. the dead
'auto' auto-detect finding), the source-prioritization normalization bias and
its two-path fix, the incremental per-assistant rollout, the GPU sizing/plan,
the implementation map, and how to enable in prod.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…reranker client

- LLM relevance filter is now a single listwise call on the MAIN llm
  (llm_eval_chunks_listwise + LISTWISE_CHUNK_FILTER_PROMPT, fails open), gated
  independently of reranking by LLM_RELEVANCE_FILTER_ENABLED x per-assistant
  llm_relevance_filter (resolver _resolve_skip_llm_chunk_filter). No GPU.
- Reranker served by Hugging Face TEI on CPU when RERANK_SERVER_URL is set
  (CrossEncoderEnsembleModel /rerank path); model server skips loading the
  cross-encoder in that case. Replaces the GPU plan.
- _query_vespa simplified to a single all-sources query (removed the two-query
  source-prioritization union and its score-inflation bias).
- Source diversity moved to final selection: ensure_source_diversity in
  doc_pruning reserves up to SOURCE_DIVERSITY_RESERVED_SLOTS slots for
  PROTECTED_SOURCES, so KB/web aren't crowded out. Always-on, global, no
  per-assistant knob.
- Chat per-conversation toggles (use_reranking / use_relevance_filter) threaded
  via SearchTool -> SearchRequest.
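
One way ensure_source_diversity could reserve slots, sketched under assumptions: docs arrive already ranked, the promoted protected-source docs go to the front (a later commit notes authoritative docs land at prompt positions 1-3), and everything else keeps its rank order. Doc and the signature are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    document_id: str
    source: str


def ensure_source_diversity(
    ranked: list[Doc],
    protected_sources: frozenset[str],
    reserved_slots: int,
    limit: int,
) -> list[Doc]:
    """Reserve up to `reserved_slots` of the final `limit` slots for protected sources."""
    # Indices of the best-ranked protected docs, at most `reserved_slots` of them.
    promoted_idx = [
        i for i, doc in enumerate(ranked) if doc.source in protected_sources
    ][:reserved_slots]
    chosen = set(promoted_idx)
    promoted = [ranked[i] for i in promoted_idx]
    rest = [doc for i, doc in enumerate(ranked) if i not in chosen]
    # Promoted docs first, then the remainder in original rank order.
    return (promoted + rest)[:limit]
```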

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nput bar

Two per-conversation switches (default off) wired through sendMessage ->
CreateChatMessageRequest (use_reranking / use_relevance_filter), independent of
the assistant's own settings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- optional/tei-rerank: Hugging Face TEI on CPU (bge-reranker-v2-m3, fp32),
  Deployment + Service, health probes, model-cache volume.
- prod overlay includes it + sets RERANK_ENABLED / RERANK_SERVER_URL /
  LLM_RELEVANCE_FILTER_ENABLED. Local sets RERANK_ENABLED (no RERANK_SERVER_URL)
  so the model server loads the reranker in-process — no GPU anywhere.
- Removed the optional/gpu-inference component.
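
For reference, a sketch of a client for TEI's /rerank endpoint, which accepts {"query": ..., "texts": [...]} and returns a list of {"index": i, "score": s} entries. The helper names are hypothetical, and this is not the project's CrossEncoderEnsembleModel transport.

```python
import json
import urllib.request


def tei_rerank(server_url: str, query: str, texts: list[str], timeout: float = 30.0) -> list[dict]:
    """POST to a TEI server's /rerank endpoint and return its JSON result list."""
    request = urllib.request.Request(
        f"{server_url.rstrip('/')}/rerank",
        data=json.dumps({"query": query, "texts": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def reorder_by_scores(texts: list[str], results: list[dict]) -> list[str]:
    """Order texts by descending rerank score; each result refers back by index."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [texts[r["index"]] for r in ranked]
```

Usage would look like reorder_by_scores(chunks, tei_rerank("http://tei-rerank:80", question, chunks)), where the service host is a placeholder.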

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p updates

- Unit: relevance-filter gating matrix, listwise parser, source-diversity
  promotion/caps/disable (replaces the removed two-query test).
- Integration: TEI rerank transport (mocked), real CPU cross-encoder reordering
  (MiniLM), filter_chunks with a stub LLM.
- Docs updated to the final design (TEI-on-CPU, source diversity at selection,
  two assistant knobs); CONTRIBUTING documents local in-process reranking.
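
The reply format of LISTWISE_CHUNK_FILTER_PROMPT isn't shown here. Assuming the model answers with 1-based chunk numbers or NONE, a fail-open parser could look like the following. The function name and the reply format are hypothetical.

```python
import re


def parse_listwise_selection(raw: str, n_chunks: int) -> list[int]:
    """Return 0-based indices of chunks the LLM kept.

    Fails open: any reply that can't be parsed, or that names an
    out-of-range chunk, keeps every chunk.
    """
    text = raw.strip()
    if text.upper() == "NONE":
        return []
    numbers = [int(tok) for tok in re.findall(r"\d+", text)]
    if not numbers or any(not 1 <= n <= n_chunks for n in numbers):
        return list(range(n_chunks))  # fail open
    # Deduplicate while preserving the model's order.
    return list(dict.fromkeys(n - 1 for n in numbers))
```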

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The TEI CPU image is ONNX-Runtime-based and bge-reranker-v2-m3 ships no ONNX
weights, so the stock image 404s on onnx/model.onnx. Our own image exports the
model to ONNX at build time (HF Optimum; pinned optimum 1.23.3 + torch 2.2.2 +
numpy<2 for the >2GB external-data path) and bakes it at /model — no runtime
download, no re-download on restart, no HF dependency. Deployment uses the ACR
image with --model-id /model (dropped the emptyDir that shadowed /data and the
unneeded istio annotation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Built + deployed from feature/improve-queries; api-server self-migrated
persona.rerank_enabled on rollout. Validated live: TEI reranker healthy and
scoring correctly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CPU serving was validated as non-viable (24-98s for 15 chunks on prod; the long
blocking inference also starved /health and got the pod liveness-killed). Move
reranking to a T4 GPU node pool (Standard_NC4as_T4_v3, tainted gpu=true:NoSchedule)
where the same bge-reranker-v2-m3 reranks in ~0.6-3.2s (45ms for short inputs).

- Use the UPSTREAM TEI GPU image directly (ghcr turing-1.5) instead of building
  our own: TEI's GPU backend is Candle+safetensors, which the model ships, so the
  ONNX-export/custom-image dance (a CPU-runtime-only constraint) is gone. Deleted
  the custom Dockerfile.
- Take the pod out of the istio mesh (sidecar.istio.io/inject: false): leaf
  service, PERMISSIVE mTLS so the api-server still reaches it, and an injected
  sidecar isn't up during the init phase (so the prefetch init container couldn't
  reach the network).
- Prefetch the model into the PVC with the Python HF client in an init container:
  TEI's Rust hf-hub client fails on HF's redirect with 'relative URL without a
  base'; the Python client handles it. TEI loads offline from /data/model.

Cluster-side (not in repo): added a gpu=true:NoSchedule toleration to the
nvidia-device-plugin-daemonset so it schedules on the tainted GPU node and
advertises nvidia.com/gpu.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PROTECTED_SOURCES and SOURCE_DIVERSITY_RESERVED_SLOTS were running on code
defaults (web,sfkbarticles / 2), so the always-on diversity logic was active
but invisible in the configmap. Pin them to the current defaults for visibility
and tunability — no behavior change.
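
Pinning a value to its code default is a no-op because the code already falls back to the same value. The sketch below shows env-backed settings using the defaults quoted above (web,sfkbarticles and 2). The loader functions are hypothetical and take a mapping so they can be exercised without touching the real environment.

```python
import os
from typing import Mapping


def load_protected_sources(env: Mapping[str, str] = os.environ) -> frozenset[str]:
    # Comma-separated source names; blanks and stray whitespace are ignored.
    raw = env.get("PROTECTED_SOURCES", "web,sfkbarticles")
    return frozenset(s.strip() for s in raw.split(",") if s.strip())


def load_diversity_reserved_slots(env: Mapping[str, str] = os.environ) -> int:
    return int(env.get("SOURCE_DIVERSITY_RESERVED_SLOTS", "2"))
```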

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These api keys are service credentials for automation and intentionally don't
map to a User. Enabling OIDC (AUTH_TYPE=oidc) flipped DISABLE_AUTH off, so
current_user started 403'ing api-key requests (no session, and the keys don't
resolve to a user) — bouncing automation into the SSO login flow.

current_user now authorizes a request carrying a valid X-API-Key as an
anonymous service caller (user=None, which endpoints already handle). Browser
requests without a session still 403 (SSO gate intact), and a key alone does
not grant admin (current_admin_user still requires an admin user). Adds
request_has_valid_api_key() mirroring validate_api_key's lookup+cache, plus
integration tests locking both the SSO and api-key flows.
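
The authorization rule above, reduced to a framework-free sketch. User, Forbidden, resolve_caller and require_admin are hypothetical stand-ins; the real code uses the app's auth dependencies and request_has_valid_api_key(). What the sketch shows is the ordering: a session user comes first, a valid key maps to an anonymous caller, and the admin check can never be satisfied by a key alone.

```python
from dataclasses import dataclass
from typing import Optional


class Forbidden(Exception):
    """Stands in for an HTTP 403 response."""


@dataclass(frozen=True)
class User:
    email: str
    is_admin: bool = False


def resolve_caller(session_user: Optional[User], has_valid_api_key: bool) -> Optional[User]:
    """Session user if present; None (anonymous service caller) for a valid key; else 403."""
    if session_user is not None:
        return session_user
    if has_valid_api_key:
        return None
    raise Forbidden("no session and no valid API key")


def require_admin(session_user: Optional[User], has_valid_api_key: bool) -> User:
    """Admin endpoints need a real admin user; a key alone resolves to None and is rejected."""
    user = resolve_caller(session_user, has_valid_api_key)
    if user is None or not user.is_admin:
        raise Forbidden("admin user required")
    return user
```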

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + validated live: GET /api/persona with a valid x-api-key returns 200;
without a key still 403s (SSO gate intact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Make design-system color tokens CSS-variable-driven so they flip under a
  `.dark` ancestor (light values unchanged via fallbacks). Dark is now the
  app-wide default via a no-flash init script on <html>; users opt into light
  via the sidebar toggle (persisted app-wide to localStorage).
- Typography: replace Inter with IBM Plex Sans (body) + Fraunces (display);
  empty-state headline uses the display font with a staggered entrance.
- Chat polish: input bar elevated with an accent focus-glow (fixes the
  previously-broken focus ring); subtle atmosphere glow on the chat canvas;
  reusable da-fade-up motion (reduced-motion guarded).
- Assistant picker ("Choose Assistant") gains a live search box (name +
  description) with an empty state.
- Consistency sweep: map raw neutral colors to semantic tokens across shared
  components + search/admin so they flip correctly in dark.

Follow-up judgment-call stragglers and per-page UX land in a separate commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add color-scheme: dark to the .dark root so the browser renders native form
  controls (checkboxes, date pickers, default input backgrounds) and scrollbars
  in dark — fixes the glaring white checkboxes (assistant form Tools, knowledge-
  set list) and any browser-default white input backgrounds.
- Brand the checkbox/radio accent-color to the app accent.
- Give the chat Filters knowledge-set search input an explicit dark surface
  (bg-background + token text/placeholder) — it had no bg class and rendered a
  bright white box.

Follow-up to 8c5f4f8 (kept separate so it can be reverted independently).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + verified live: web pod Running 2/2 on vha-79, build-deploy verify
shows live==manifest, and /auth/login serves the dark-mode no-flash init.

Build note: the local Mac amd64 web build SIGSEGVs (next build / musl under
Rosetta), so vha-79 was built natively via 'az acr build' on darwinacr and
copied into sfbrdevhelmweacr (docker pull/tag/push) — same digest d53f076e.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures the working web-image deploy flow: local Mac next build SIGSEGVs under
amd64 emulation (even with Rosetta), so build natively via az acr build on
darwinacr, copy into sfbrdevhelmweacr, bump tag, apply, verify. Includes the
Contributor/PIM requirement and the optional service-principal path for CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The toggle previously lived only in the chat sidebar, so admin/assistants and
other pages had no way to switch themes. Add a Light/Dark item to UserDropdown
(the avatar menu present app-wide), reusing the same darwin-theme localStorage +
.dark-on-<html> mechanism. Verified: flips both ways, label reflects state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
next build's eslint (rules-of-hooks) failed the prod build because the darkMode
useState/useEffect sat after the `if (!combinedSettings) return null` early
return. Hooks must be unconditional — moved them above it. (Dev mode didn't
catch this; only the production build's lint does.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + verified: web pod Running 2/2 on vha-80, verify shows live==manifest.
Built via az acr build on darwinacr -> transfer to sfbrdevhelmweacr (digest
a1f3167). Note: Docker Hub anon pull rate-limit on the node:20-alpine base
flaked several ACR runs; succeeded once the window freed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single source of truth = RERANK_ENABLED / LLM_RELEVANCE_FILTER_ENABLED env:
- Backend surfaces the effective global flags via /settings (load_settings),
  mirroring the existing chat_file_max_size_mb env-injection (relevance also
  respects the DISABLE_LLM_CHUNK_FILTER kill-switch).
- Frontend hides the per-conversation (ChatInputBar) and per-assistant
  (AssistantEditor) rerank/relevance toggles when disabled cluster-wide.
- prod overlay: RERANK_ENABLED=false, LLM_RELEVANCE_FILTER_ENABLED=false; drop
  the tei-rerank component so the GPU node pool can be removed.

Code is intact — flip the env (+ re-add the tei-rerank component and a GPU node
pool) to re-enable. No code change needed to toggle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lags)

Deployed + verified: load_settings() returns rerank_enabled=False,
llm_relevance_filter_enabled=False — the cluster-level flags the chat + assistant
UIs read to hide the toggles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployed + verified: web Running 2/2 on vha-81; live /api/settings returns
rerank_enabled=false / llm_relevance_filter_enabled=false, so the chat + assistant
toggles are hidden. GPU pool (gput4) scaled to 0 — no GPU VM running.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Standalone prod overlay (k8s/overlays/prod-velero), applied separately from
the app like prod-vespa. Backs up only the backup=vespa PVCs (Vespa index +
configserver state) weekly, ttl 504h (last 3 retained).

- reuses SP darwinvelero (155fae3c), which already holds Contributor on the
  backup/node/darwin RGs -> no Owner/UAA role assignment needed
- BSL reuses the darwinaksbackup SA; disk snapshots go to the unlocked node RG
- alerts via the existing robusta Prometheus stack: PrometheusRule +
  ServiceMonitor (release: robusta), incl. a no-recent-successful-backup
  staleness alert that catches silent failure
- notifier CronJob posts per-run success/failure to #darwin-devs via the
  Slack bot token
- SP secret + bot token sourced from gitignored files (templates committed)

Replaces the prior self-managed Velero that silently failed for ~a year on an
expired SP secret.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sets llm_relevance_filter=false for the 3 seeded personas (Darwin, GPT,
Paraphrase) so the on-boot upsert_persona() reseed doesn't re-enable the
filter. A/B eval over 120 questions showed the filter yields no measurable
answer-quality gain (20/20/80 A/B/tie, p~1.0) and discarded all chunks in
14% of cases. Global LLM_RELEVANCE_FILTER_ENABLED=false already gates it off;
this keeps the per-assistant seed config consistent and durable.
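
The commit doesn't say which test produced p~1.0. Assuming an exact two-sided sign test on the 40 decisive comparisons (the 80 ties dropped), an even 20/20 split gives p = 1.0, meaning no evidence in either direction:

```python
from math import comb


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Exact two-sided sign test p-value; ties must be excluded by the caller."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for two sides and capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * lower_tail)
```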

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Runbook for the managed Azure Backup-for-AKS approach, kept for reference.
The cluster uses self-managed Velero instead (k8s/overlays/prod-velero)
because managed AKS Backup needs role assignments / Trusted Access that
require Owner/UAA, which isn't available; this documents that path + the
[OWNER] hand-off if it's ever pursued.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document starting Postgres + Redis together under the -p danswer-stack project
(shared danswer-stack_default network), run Vespa via the manual docker run on
that same network (the compose `index` service is unreliable locally), add
Redis ping/stop + no-auth notes, and set REDIS_HOST=localhost for the host-run
backend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ings gating)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…essages

Connector deletion is the event-driven cleanup_connector_credential_pair_task
on the non-durable Redis broker. If Redis or the celery worker restarts while
it's queued, the broker message is lost but the task_queue_jobs row stays
PENDING forever — the connector shows "Deleting" indefinitely and nothing
re-runs it (deletion, unlike sync/prune, is never periodically rescheduled),
while the delete API's dedup guard blocks resubmission. We had 5 connectors
stuck Deleting since Mar–May for exactly this reason.

Adds a periodic celery-beat task (check_for_stuck_deletion_tasks, every 30m)
that re-enqueues any cleanup task whose latest task_queue_jobs row has been
non-terminal past JOB_TIMEOUT (db.tasks.get_stuck_deletion_cc_ids). The cleanup
task's per-cc-pair advisory lock makes a re-enqueue a no-op if a deletion is
genuinely still running, and the fresh row stays live for JOB_TIMEOUT, so this
self-throttles to one re-drive per cc-pair per window. Also recovers the
existing stuck connectors on first run.
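
An in-memory sketch of the stuck-deletion check, assuming each task_queue_jobs row carries a cc-pair id, a status, and a registration time. The status names, field names, and the 3-hour timeout used in the example are placeholders, not the real schema or JOB_TIMEOUT. The sketch also shows the self-throttling: a re-enqueue writes a fresh row, and that row keeps the cc-pair out of the result until it ages past the timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Placeholder terminal states; the real task_queue_jobs statuses may differ.
TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}


@dataclass(frozen=True)
class TaskJob:
    cc_pair_id: int
    status: str
    registered_at: datetime


def find_stuck_deletions(jobs: list[TaskJob], now: datetime, job_timeout: timedelta) -> list[int]:
    """Return cc-pair ids whose LATEST job row is non-terminal and older than job_timeout."""
    latest: dict[int, TaskJob] = {}
    for job in jobs:
        current = latest.get(job.cc_pair_id)
        if current is None or job.registered_at > current.registered_at:
            latest[job.cc_pair_id] = job
    return sorted(
        cc_id
        for cc_id, job in latest.items()
        if job.status not in TERMINAL_STATUSES and now - job.registered_at > job_timeout
    )
```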

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rajivml and others added 28 commits June 21, 2026 23:00
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eration)

Citations are the LLM's output and it inconsistently cites curated sources even
when they're promoted to prompt position [1] (confirmed in chat + Slack across
runs). Prompt nudging can't reliably fix this. Add a deterministic post-generation
step in the shared Answer pipeline (covers chat + Slack, all assistants):

- additive: the LLM's own inline citations are untouched
- deduped: only authoritative (PROTECTED_SOURCES) docs the LLM did NOT cite,
  deduped by document_id (same page often appears as multiple chunks)
- honest: one batched LLM call verifies each candidate actually SUPPORTS a
  statement in the answer (topic-match is not enough); fail-closed on any error
- bounded: at most ONE extra call, and only when an uncited authoritative doc is
  in context (no call otherwise)

Supporting docs are appended as an "Authoritative sources" markdown footer
(renders in chat UI + Slack). Gated by AUTHORITATIVE_CITATION_RETENTION_ENABLED
(default off; prod on). In this deployment fast_llm == main llm, so the verify
call uses self.llm.

- chat_configs: AUTHORITATIVE_CITATION_RETENTION_ENABLED
- llm/answering/authoritative_retention.py: select/verify/footer + orchestrator
- llm/answering/answer.py: accumulate answer+cited-ids in _process_stream, append
  verified footer after the citation stream
- prod env: enabled
- tests: 14 unit tests (selection/dedupe/parse/verify-fail-closed/footer)
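
A sketch of the candidate-selection rule (authoritative, uncited, deduped by document_id), where an empty result means the verify call is skipped. ContextDoc and select_uncited_authoritative are hypothetical. The real logic lives in authoritative_retention.py, whose code isn't shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDoc:
    document_id: str
    source: str
    position: int  # 1-based position in the prompt context


def select_uncited_authoritative(
    context_docs: list[ContextDoc],
    cited_ids: set[str],
    protected_sources: frozenset[str],
) -> list[ContextDoc]:
    """Uncited protected-source docs, one per document_id, in context order."""
    seen: set[str] = set()
    out: list[ContextDoc] = []
    for doc in sorted(context_docs, key=lambda d: d.position):
        if doc.source not in protected_sources:
            continue
        if doc.document_id in cited_ids or doc.document_id in seen:
            continue  # already cited, or another chunk of a page we kept
        seen.add(doc.document_id)
        out.append(doc)
    return out
```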

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ations)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The verify step is a non-streaming gateway completion that occasionally times out
(observed transiently; normally 1.6-5.5s). Retry once before failing closed so a
single gateway hiccup doesn't drop the authoritative-sources footer. Still
fail-closed after retries (never appends on a real error).
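
The retry-then-fail-closed policy as a generic wrapper. This is a sketch: the helper name is hypothetical, and catching every exception is a simplification. Here "fail closed" means returning an empty verification result, so no footer is appended.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry_fail_closed(call: Callable[[], T], fallback: T, attempts: int = 2) -> T:
    """Try `call` up to `attempts` times; on repeated failure return `fallback`.

    With attempts=2 this is "retry once". The broad except is deliberate for
    the sketch: any error (timeout, bad response) must fail closed.
    """
    for _ in range(attempts):
        try:
            return call()
        except Exception:
            continue
    return fallback
```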

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tighten gate

Two changes per product feedback:
1. Single Sources section (no footer): inject the verified authoritative doc as a
   CitationInfo using its context position as citation_num. Since authoritative
   docs are promoted to positions 1-3 and the UI orders the Sources group by
   citation number (JS integer-key order / Slack list), it lands at the TOP of the
   existing single "Sources" section. Drops the separate markdown footer.
2. Tighter gate: only run when the answer cites NO authoritative source at all. If
   the LLM already cited any PROTECTED_SOURCES doc, do nothing (no verify call) —
   the answer is already authoritatively grounded.

Net: at most one conditional verify call, only on answers missing authoritative
citations; result merges into the one Sources section instead of a second block.

- authoritative_retention.py: gate in select_authoritative_candidates;
  retained_authoritative_citations() returns CitationInfo (footer removed)
- answer.py: yield the CitationInfo packets after the stream
- tests: 18 (gate skip + no-LLM-call, context-position citation_num, verify reject)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ources)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ollided)

Merging the retained authoritative doc into the single Sources section assigned it
a citation_num equal to its context position — which collides with the LLM's own
citations (it owns the low numbers), and translate_citations de-dupes first-wins,
so the injected citation was dropped (validated: 0 OutSystems shown). Putting it at
the TOP of the numbered list would require renumbering the LLM's inline [[n]],
which breaks the inline links.

Revert to the 'Authoritative sources' footer block (reliably surfaces the link),
but KEEP the tightened gate from the prior change: only run when the answer cites
NO authoritative source at all (no verify call otherwise).
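
Here is why the merged citation vanished, in miniature: if citations are de-duplicated by number with first-wins semantics, an injected entry that reuses a number the LLM already owns is silently discarded. The function is a hypothetical stand-in for translate_citations, and the document ids are made up.

```python
def dedupe_citations_first_wins(citations: list[tuple[int, str]]) -> dict[int, str]:
    """Map citation_num -> document_id, keeping the first document seen per number."""
    by_num: dict[int, str] = {}
    for citation_num, document_id in citations:
        by_num.setdefault(citation_num, document_id)
    return by_num


llm_citations = [(1, "kb-article-1"), (2, "slack-thread-7")]
# The retained authoritative doc sat at context position 2, so it reused number 2.
injected = [(2, "outsystems-page-3")]
result = dedupe_citations_first_wins(llm_citations + injected)
```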

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Strict 'directly supports a statement' verification dropped clearly-relevant
authoritative docs whose content didn't literally restate the answer — e.g. the
'Elastic Robot Orchestration Setup For AWS' KB article was at prompt position #1
for an ERO/Automation-Suite question but got filtered out because the answer's key
statement was a negative ('not available self-hosted') the setup guide doesn't
literally assert. Loosen the verify prompt to 'is a RELEVANT authoritative
reference for this answer (same subject/scenario; need not restate it)'. Surfaces
the authoritative KB/docs link in these cases; tradeoff is occasionally a
topically-related doc, acceptable for the support-engineer use case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ance)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s cited

The tightened gate ('skip if any authoritative source was cited') was too coarse:
on an Orchestrator upgrade question the LLM cited a KB article (sfkbarticles) +
Slack, which suppressed surfacing the relevant, uncited docs.uipath.com page
(/2023.10/.../maintenance-considerations) that was at the front of the prompt — a
regression vs the old 2-phase union. Loosen: candidates = any uncited relevant
authoritative doc, regardless of whether some other authoritative source was
cited. Verify call now fires whenever an uncited authoritative doc is present
(still conditional/batched/retry-guarded).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…!= relevant)

Relevance verify was too loose: for an 'AI in standalone Orchestrator' question it
surfaced 'Does Standalone Orchestrator leverage Azure SignalR Service?' (a
messaging/transport doc) as an authoritative reference. Tighten to require the doc
be about the SAME SPECIFIC topic/feature, not merely the same product — with an
explicit messaging/infra-vs-AI exclusion and 'when unsure, exclude'. Middle ground
between strict 'supports-a-statement' (missed relevant docs) and loose
'same-subject' (false positives).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The messaging-vs-AI example was overfit to one question. State the general rule:
the doc must be about the SAME SPECIFIC topic/feature, not merely the same product;
a different feature/component/service isn't a relevant reference. No domain example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… signal)

The verify was judging relevance from the title + first 600 chars of the doc —
usually boilerplate, not the passage that matched — causing oscillation (signalR
false-positive when loose; 'Before you upgrade' false-negative when strict). Now
pass each candidate's full MATCHED PASSAGE (LlmDoc.content = the retrieved chunk;
capped at 4000 chars) and judge from that, with rebalanced wording (same topic as
the answer = include; different feature/component = exclude). Candidate list is
small (1-3), so the extra tokens are bounded. This makes precision come from the
chunk rather than from overfit prompt strictness.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The reserved pass guaranteed only the top-3 protected docs into the candidate set;
relevant docs ranked #4 among protected sources (e.g. Orchestrator Maintenance
Considerations on an upgrade question) fell off and never reached verify/footer.
Bump to 6 so more of the top protected docs are guaranteed in; the per-source
prompt cap still bounds how many reach the LLM.
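
A sketch of how the reserved pass and the per-source cap interact. It assumes hits are (document_id, source) pairs in rank order and that reserved protected hits are appended after the main results, leaving their final position to the later diversity step. The function names are hypothetical. In the example, w4 plays the "#4 protected doc" that a 3-slot reservation drops.

```python
from collections import Counter

Hit = tuple[str, str]  # (document_id, source)


def merge_reserved(main_hits: list[Hit], protected_hits: list[Hit], reserved_slots: int) -> list[Hit]:
    """Guarantee the top `reserved_slots` protected hits reach the candidate set."""
    merged = list(main_hits)
    seen = {doc_id for doc_id, _ in main_hits}
    for doc_id, source in protected_hits[:reserved_slots]:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append((doc_id, source))
    return merged


def cap_per_source(hits: list[Hit], max_per_source: int) -> list[Hit]:
    """Keep at most `max_per_source` hits from any one source, in rank order."""
    counts: Counter = Counter()
    kept: list[Hit] = []
    for doc_id, source in hits:
        if counts[source] < max_per_source:
            counts[source] += 1
            kept.append((doc_id, source))
    return kept
```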

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… indexed

The version dedup only collapses versions that were retrieved, so when retrieval
surfaces a stale version (content is near-identical across versions, so which one
ranks is ~arbitrary) the old link is shown even though newer versions are indexed
(e.g. /2023.4/ surfaced while /2025.10/ exists). After pruning, rewrite each
versioned docs.uipath.com link to the NEWEST version of that same page (same URL
with the version segment stripped, slug preserved) found in the index. Fixes both
inline docs citations and the authoritative footer (both use final_context_docs
links). One PK-indexed prefix-scan per distinct page-prefix; no reindex.

- doc_pruning: _versioned_url_parts + rewrite_docs_links_to_latest(db_session)
- search_tool: rewrite final_context_documents after prune
- tests: 6 (parse, rewrite-to-latest, noop-when-latest, slug-variants, non-docs, new-scheme)
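
A sketch of the link rewrite. It assumes versions appear as a YYYY.N path segment (as in /2023.4/ and /2025.10/) and that a page's identity is its path with that segment removed. It scans an in-memory list instead of the PK-indexed DB prefix scan and ignores the slug-variant and new-scheme cases the real tests cover. The function names and example URLs are illustrative. Versions are compared numerically, so 2023.10 sorts above 2023.4.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

_VERSION_SEGMENT = re.compile(r"\d{4}\.\d{1,2}")


def versioned_url_parts(url: str) -> Optional[tuple[str, str]]:
    """Return (page_key, version) for a versioned docs.uipath.com URL, else None.

    page_key is the path with the version segment removed.
    """
    parts = urlsplit(url)
    if parts.netloc != "docs.uipath.com":
        return None
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        if _VERSION_SEGMENT.fullmatch(segment):
            return "/".join(segments[:i] + segments[i + 1:]), segment
    return None


def version_key(version: str) -> tuple[int, int]:
    # Numeric comparison: "2023.10" must sort above "2023.4".
    year, minor = version.split(".")
    return int(year), int(minor)


def rewrite_to_latest(url: str, indexed_urls: list[str]) -> str:
    """Replace url with the newest indexed version of the same page, if any."""
    parsed = versioned_url_parts(url)
    if parsed is None:
        return url
    page_key, best_version = parsed
    best_url = url
    for candidate in indexed_urls:
        cand = versioned_url_parts(candidate)
        if cand and cand[0] == page_key and version_key(cand[1]) > version_key(best_version):
            best_url, best_version = candidate, cand[1]
    return best_url
```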

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… version)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Blindly rewriting docs links to latest is wrong for version-specific questions
("is X supported in 23.10?" must resolve to the 23.10 doc, not latest). Make it
version-aware: parse_question_doc_version() extracts a version from the question
(23.10 -> 2023.10) only when EXACTLY one is named (multiple, e.g. 'upgrade 23.10 to
25.10', is ambiguous -> None -> latest). rewrite_docs_links() then resolves each
docs page to that exact indexed version (even if older than retrieved); falls back
to newest indexed when no version is specified. Not confused by '2.9 million'.

- doc_pruning: parse_question_doc_version + rewrite_docs_links(target_version)
- search_tool: parse from the query, pass through
- tests: parse cases (single/multiple/none/2.9M), target-older, target-not-indexed
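
A sketch of the single-version extraction rule, assuming versions are written YY.N or YYYY.N and normalized to 20YY.N. The regex and the function name are illustrative, and the real parse_question_doc_version() may differ. Requiring a two-digit year is what rules out "2.9 million". The lookarounds stop matches inside longer numbers such as 2023.10.3.

```python
import re
from typing import Optional

# "23.10" or "2023.10", not preceded by a digit or "digit.", and not followed
# by a digit or ".digit", so fragments of longer numbers don't match.
_QUESTION_VERSION = re.compile(r"(?<!\d)(?<!\d\.)(?:20)?(\d{2})\.(\d{1,2})(?!\d|\.\d)")


def extract_single_doc_version(question: str) -> Optional[str]:
    """Return the docs version named in the question, or None if zero or several."""
    versions = {
        f"20{m.group(1)}.{int(m.group(2))}" for m in _QUESTION_VERSION.finditer(question)
    }
    return versions.pop() if len(versions) == 1 else None
```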

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Judging relevance only against the ANSWER let cloud-connectivity-adjacent KBs
through non-deterministically: for 'does standalone Orchestrator have AI?' the
answer says 'AI requires an Automation Cloud connection', so KBs sharing that theme
(Azure SignalR, 'Automation Cloud cannot be accessed' Studio error) matched the
answer even though they don't address AI capability. Pass the QUESTION to the
verify and judge whether the doc helps answer THE QUESTION's specific subject —
keyword/product/service overlap is explicitly not enough, and error/troubleshooting
docs are excluded unless that's what's asked. When in doubt, exclude.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…answer)

Replace the carve-out-heavy wording with a clean rule: include a candidate only if
its matched passage is genuinely relevant to BOTH the question and the answer.
Requiring relevance to the question (not just the answer) is what excludes
cloud-connectivity-adjacent KBs (Azure SignalR, 'Automation Cloud cannot be
accessed' Studio error) that shared the answer's 'needs cloud connection' theme but
don't address the question's AI-capability subject.
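
A sketch of how the verify input might be assembled under the final rule: the question and the answer, followed by each candidate's matched chunk capped at 4000 characters. The instruction wording, Candidate, and build_verify_prompt are illustrative, not the project's actual prompt.

```python
from dataclasses import dataclass

MAX_PASSAGE_CHARS = 4000  # cap on each matched passage


@dataclass(frozen=True)
class Candidate:
    title: str
    matched_passage: str  # the retrieved chunk, not the document's opening text


def build_verify_prompt(question: str, answer: str, candidates: list[Candidate]) -> str:
    lines = [
        "Include a passage only if it is genuinely relevant to BOTH the question "
        "and the answer. Reply with the numbers of the passages to include.",
        f"Question:\n{question}",
        f"Answer:\n{answer}",
    ]
    for number, cand in enumerate(candidates, start=1):
        lines.append(f"[{number}] {cand.title}\n{cand.matched_passage[:MAX_PASSAGE_CHARS]}")
    return "\n\n".join(lines)
```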

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ive-citation pipeline

Rewrite §5 from 'source diversity' into the full layered pipeline as built:
5.1 reserved retrieval (recall, SOURCE_RESERVED_RETRIEVAL_SLOTS=6)
5.2 diversity promotion (SOURCE_DIVERSITY_RESERVED_SLOTS=3)
5.3 per-source cap (MAX_PROMPT_DOCS_PER_SOURCE=8)
5.4 authoritative-sources nudge (soft, from PROTECTED_SOURCES)
5.5 verify-then-retain footer (chunk + question/answer relevance; why footer not merged)
5.6 version-aware docs link rewrite
Plus the citation-attribution lesson (presence != citation), PROTECTED_SOURCES now
web,sfkbarticles,highspot,outsystems, updated §6b/§8/§9, and §10 follow-ups
(configmap externalization, inline-card version gap, indexed-content freshness,
DB-backed prompts).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>