codefrydev · PrashantUnity · Jun 21, 2026 · Jun 20, 2026
diff --git a/AGENT.md b/AGENT.md
@@ -43,14 +43,47 @@ Developer reference for agents and contributors. User-facing overview: [README.m
 | Local analysis | `analysis/local.py`, `requirements.txt` |
 | AI insights (LLM) | `llm/enrich.py`, `llm/agent.py`, `llm_config.py`, `requirements.txt` |
 | Audit query tools (MCP + chat) | `tools/audit_tools/`, `mcp/server.py`, `mcp/http_server.py`, `commands/chat_cmd.py` |
-| Agent readiness checks | `tools/audit_tools/agent_readiness.py`, `tools/audit_tools/_aeo_helpers.py` |
+| Agent readiness checks | `tools/audit_tools/geo/agent_readiness.py`, `tools/audit_tools/_aeo_helpers.py` |
 | Config / CLI | `config.py` (`load_config`, `load_config_from_db`), `cli.py`, `input.txt.example` |
 | UI pipeline schema | `web/src/lib/pipelineConfigSchema.ts` |
 | UI LLM schema | `web/src/lib/llmConfigSchema.ts` |
 | UI config I/O | `web/src/server/pipelineConfig.ts`, `web/src/server/llmConfig.ts` |
+| D3 charts (custom / compare / overview) | `web/src/components/charts/d3/`, `web/src/lib/viz/` |
+| Chart.js charts (standard bar/line/doughnut) | `web/src/utils/chartJsDefaults.ts`, `react-chartjs-2` in views under `web/src/views/`, `web/src/components/searchPerformance/`, `web/src/components/traffic/` |
 
 Schema changes: add Alembic migration (`alembic revision`).
 
+**Charts — Chart.js + D3 (hybrid)**
+
+The web UI uses **both** Chart.js and D3.js. Pick the library that fits each chart; do not migrate everything to one stack.
+
+| Prefer **Chart.js** when… | Prefer **D3** when… |
+|---------------------------|---------------------|
+| Standard bar, line, or doughnut with typical legend/tooltip/responsive canvas | Custom layout (grouped compare bars, dual lines with null gaps, arc gauges) |
+| Quick add with minimal custom SVG | Tight theme control via CSS vars (`--chart-grid`, `--chart-title`, etc.) |
+| Page already on Chart.js (GSC, GA4, Links, Content Analytics) | Reusing shared components in `web/src/components/charts/d3/` |
+| Chart.js plugins or defaults are enough | Neutral data types + adapters in `web/src/lib/viz/` |
+
+**Decision rule:** If a D3 component already exists (`D3GroupedBarChart`, `D3DualLineChart`, `D3VerticalBarChart`, `D3DonutChart`, compact charts, `arcGauge.ts`), reuse it. If it is a one-off standard chart on a Chart.js page, stay on Chart.js unless D3 clearly wins.
+
+**Current split (indicative)**
+
+| Area | Library |
+|------|---------|
+| Overview dashboard (`/dashboard`) | D3 |
+| Compare (`/compare`) | D3 |
+| Content analytics — Analytics tab (`/content-analytics?tab=analytics`) | D3 |
+| GSC / GA4 / scatter (`GscCharts`, `Ga4Charts`) | Chart.js |
+| Links explorer, Content Analytics (other tabs), Text Content Analysis | Chart.js |
+| Score rings, distribution donuts, compact sparklines | D3 |
+
+**Conventions (both stacks)**
+
+- Wrap charts in `ChartPanel`, `ChartAccessibleFallback`, and/or `ChartCard` where applicable.
+- Theme helpers live in `web/src/utils/chartJsDefaults.ts` (`getGridColor`, `getChartTitleColor`, `truncateChartLabel`) — use them from D3 as well as Chart.js.
+- Keep chart-library types out of data-prep: use neutral shapes (`BarChartData`, `DualSeriesChartData` in `web/src/lib/viz/types.ts` and `web/src/lib/compareChartData.ts`); convert at the render layer via `web/src/lib/viz/adapters.ts` when needed.
+- Migrate page-by-page when D3 is the better fit; do not remove `chart.js` from `package.json` until all consumers are migrated.
+
 **Company standards:** UI copy in `web/src/strings.json` (Site Audit, Properties, Run audit). Data provenance on `report_meta` in report payload. Docs: `docs/COMPANY_STANDARDS.md`, `docs/GLOSSARY.md`. Migration `003_company_standards` (properties, pipeline_jobs, audit_log). Durable jobs in `web/src/server/pipelineJobsDb.ts`. Export: `GET /api/report/export`, `src/website_profiling/tools/export_audit.py`.
 
 **Common footguns (check before finishing web or DB work)**

diff --git a/AGENTS.md b/AGENTS.md
@@ -32,8 +32,11 @@ python -m website_profiling.mcp   # Start MCP server (stdio)
 |------|-------|
 | Crawl | `src/website_profiling/crawl/` |
 | Report | `src/website_profiling/reporting/` |
-| GEO / AEO / Agent readiness | `src/website_profiling/tools/audit_tools/geo_tools.py`, `agent_readiness.py` |
+| GEO / AEO / Agent readiness | `src/website_profiling/tools/audit_tools/geo/geo_tools.py`, `geo/agent_readiness.py` |
 | DB schema | `alembic/versions/` |
 | UI | `web/src/views/`, `web/app/` |
+| Charts | D3: `web/src/components/charts/d3/`, `web/src/lib/viz/` · Chart.js: GSC/GA4/Links etc. — see [AGENT.md](AGENT.md) § Charts |
+
+**Charts:** Use **both** Chart.js and D3 — choose per chart (Overview/Compare → D3; standard GSC/GA4 bars → Chart.js). Full rules in [AGENT.md](AGENT.md).
 
 **Common pitfalls:** See [AGENT.md](AGENT.md) for the full footguns checklist (React context, Python local imports, psycopg dict rows, coverage gates).
diff --git a/README.md b/README.md
@@ -25,6 +25,7 @@
 
 <p align="center">
   <a href="#getting-started">Quick start</a> ·
+  <a href="#seo-feedback-loop">Feedback loop</a> ·
   <a href="#features">Features</a> ·
   <a href="#scope-and-limitations">Limitations</a> ·
   <a href="#architecture">Structure</a> ·
@@ -50,8 +51,33 @@ Site Audit is a **developer-friendly SEO audit** tool: self-hosted, transparent,
 - Content writing and optimization with live SEO scoring
 - Search Console, GA4, and Bing Webmaster integration
 - Agency portfolio management and run comparison
+- **Closed-loop SEO workflow** — audit, report, feed data to IDE agents via MCP, fix in code, review and compare
 - Optional AI-assisted analysis over audit data via MCP-compatible tools
 
+## SEO feedback loop
+
+Site Audit is built for a **continuous improve-and-verify cycle**, not one-off dashboard checks. Crawl your site, generate reports, expose audit data to AI agents in **Cursor, Claude Code, or Copilot** via [340 MCP tools](docs/MCP.md), fix issues in your repository, then **review** the next run to compare health scores and issue deltas.
+
+```text
+Audit → Report → MCP → Fix → Review → (repeat)
+```
+
+<p align="center">
+  <img src="docs/assets/seo-feedback-loop.png" alt="Site Audit SEO feedback loop — Audit, Report, MCP, Fix, Review" width="920">
+</p>
+
+**How each step maps to the product**
+
+| Step | What you do | In Site Audit |
+|------|-------------|---------------|
+| **Audit** | Crawl and score the site | Pipeline (`python -m src`), Lighthouse, on-page checks |
+| **Report** | Export and prioritize fixes | PDF/HTML/CSV exports, issue board, fix roadmap |
+| **MCP** | Pull audit context into your IDE | `python -m website_profiling.mcp` — read-only tools for Cursor / Claude Desktop |
+| **Fix** | Ship changes in your codebase | Your PR workflow (MCP does not write to the site) |
+| **Review** | Prove improvement | Compare runs, category deltas, GSC metric changes |
+
+See [docs/MCP.md](docs/MCP.md) for MCP setup and example prompts (e.g. compare two reports, export issue diffs).
+
 ## Scope and limitations
 
 Site Audit focuses on **honest, self-hosted technical SEO**. It is not a drop-in replacement for every paid SaaS data product.
@@ -93,7 +119,7 @@ Site Audit focuses on **honest, self-hosted technical SEO**. It is not a drop-in
   </tr>
 </table>
 
-Also included: **AI chat** over audit data (optional), **Content studio** (write &amp; optimize with live SEO scoring), **340 MCP tools** (local stdio or remote Streamable HTTP), image SEO, GEO/AEO readiness, keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, and portfolio management for agencies.
+Also included: **AI chat** over audit data (optional), **Content studio** (write &amp; optimize with live SEO scoring), **340 MCP tools** (local stdio or remote Streamable HTTP), image SEO, GEO/AEO readiness, keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, portfolio management for agencies, and the **agent-driven feedback loop** above.
 
 <img src="docs/assets/social-preview.png" alt="Site Audit — developer-friendly SEO audit preview" width="100%">
 

diff --git a/alembic/versions/024_app_settings.py b/alembic/versions/024_app_settings.py
@@ -0,0 +1,30 @@
+"""Add app_settings table for generic application-level key-value settings.
+
+Used to persist appearance customisations (custom color palette, etc.) and
+any future app-level preferences that have no dedicated table.
+
+Revision ID: 024_app_settings
+Revises: 023_crawl_page_markdown
+"""
+from __future__ import annotations
+
+from alembic import op
+
+revision = "024_app_settings"
+down_revision = "023_crawl_page_markdown"
+branch_labels = None
+depends_on = None
+
+
+def upgrade() -> None:
+    op.execute("""
+        CREATE TABLE app_settings (
+            key         TEXT        NOT NULL PRIMARY KEY,
+            value       TEXT        NOT NULL DEFAULT '',
+            updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
+        )
+    """)
+
+
+def downgrade() -> None:
+    op.execute("DROP TABLE IF EXISTS app_settings")
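The migration only creates a plain key/value table, so reads and writes are left to application code. Below is a minimal sketch of the upsert pattern such a table typically uses, assuming values are stored as JSON text. It uses the standard-library `sqlite3` module only so that it runs anywhere, with `CURRENT_TIMESTAMP` standing in for Postgres `now()`. The key name `appearance.palette` and the helpers `set_setting` / `get_setting` are hypothetical. On Postgres the equivalent statement would be `ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, updated_at = now()`.

```python
import json
import sqlite3

# SQLite stand-in for the app_settings table created by 024_app_settings.
SCHEMA = """
CREATE TABLE app_settings (
    key        TEXT NOT NULL PRIMARY KEY,
    value      TEXT NOT NULL DEFAULT '',
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def set_setting(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Insert a setting, or overwrite it (and bump updated_at) if the key exists."""
    conn.execute(
        """
        INSERT INTO app_settings (key, value) VALUES (?, ?)
        ON CONFLICT (key) DO UPDATE
            SET value = excluded.value,
                updated_at = CURRENT_TIMESTAMP
        """,
        (key, value),
    )


def get_setting(conn: sqlite3.Connection, key: str, default: str = "") -> str:
    """Return the stored value for key, or default when the key is absent."""
    row = conn.execute(
        "SELECT value FROM app_settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

Because `key` is the primary key, repeated saves of the same preference update one row instead of accumulating history.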
diff --git a/docs/README.md b/docs/README.md
@@ -25,6 +25,7 @@ Marketing and README assets are stored in [assets/](assets/):
 | Asset | Purpose |
 |-------|---------|
 | `readme-banner.png` | README header banner |
+| `seo-feedback-loop.png` | SEO feedback loop diagram (Audit → Report → MCP → Fix → Review) |
 | `social-preview.png` | Application screenshot for README and social previews |
 | `banner.svg` | Source artwork for the banner |
 | `logo.svg`, `logo-icon.svg` | Product logo and icon |

diff --git a/docs/assets/seo-feedback-loop.png b/docs/assets/seo-feedback-loop.png
diff --git a/local-prod b/local-prod
@@ -0,0 +1,2 @@
+#!/usr/bin/env bash
+exec "$(cd "$(dirname "$0")" && pwd)/scripts/local-prod.sh" "$@"
diff --git a/scripts/local-prod.sh b/scripts/local-prod.sh
@@ -0,0 +1,114 @@
+#!/usr/bin/env bash
+# Local prod: same Postgres as ./local-run, Next.js build + start (NODE_ENV=production).
+# Usage: ./local-prod [command]
+#   (default) start   — DB, migrations, npm run build, npm run start
+#   build             — npm run build only
+#   help              — show commands
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+cd "$ROOT"
+
+PG_CONTAINER="${WP_PG_CONTAINER:-wp-pg}"
+PG_PORT="${WP_PG_PORT:-5432}"
+PG_USER="${WP_PG_USER:-postgres}"
+PG_PASSWORD="${WP_PG_PASSWORD:-dev}"
+PG_DB="${WP_PG_DB:-website_profiling}"
+
+export DATABASE_URL="${DATABASE_URL:-postgres://${PG_USER}:${PG_PASSWORD}@127.0.0.1:${PG_PORT}/${PG_DB}}"
+export DATA_DIR="${DATA_DIR:-$ROOT/data}"
+export PYTHON="${PYTHON:-$ROOT/.venv/bin/python}"
+export WEBSITE_PROFILING_ROOT="$ROOT"
+export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$ROOT/src"
+export NODE_ENV=production
+
+WEB="$ROOT/web"
+LOCAL_RUN="$ROOT/scripts/local-run.sh"
+
+log() { printf '\033[1;36m→\033[0m %s\n' "$*"; }
+die() { printf '\033[1;31m✗\033[0m %s\n' "$*" >&2; exit 1; }
+
+need_cmd() {
+  command -v "$1" >/dev/null 2>&1 || die "Missing required command: $1"
+}
+
+cmd_web_deps() {
+  need_cmd npm
+  if [[ ! -d "$WEB/node_modules" ]]; then
+    log "Installing web dependencies (npm ci)"
+    (cd "$WEB" && npm ci)
+  fi
+}
+
+cmd_build() {
+  cmd_web_deps
+  log "Building Next.js (production)"
+  (cd "$WEB" && npm run build)
+}
+
+cmd_start() {
+  local skip_build=0
+  for arg in "$@"; do
+    case "$arg" in
+      --skip-build) skip_build=1 ;;
+    esac
+  done
+
+  mkdir -p "$DATA_DIR"
+  log "Ensuring Postgres and migrations (via ./local-run migrate)"
+  "$LOCAL_RUN" migrate
+  if [[ "$skip_build" -eq 0 ]]; then
+    cmd_build
+  else
+    cmd_web_deps
+    log "Skipping build (--skip-build)"
+  fi
+  log "Starting Next.js production server (Ctrl+C to stop)"
+  log "DATABASE_URL=$DATABASE_URL"
+  log "DATA_DIR=$DATA_DIR"
+  log "PYTHON=$PYTHON"
+  log "NODE_ENV=$NODE_ENV"
+  cd "$WEB"
+  export DATABASE_URL DATA_DIR PYTHON WEBSITE_PROFILING_ROOT PYTHONPATH NODE_ENV
+  exec npm run start
+}
+
+cmd_help() {
+  cat <<EOF
+Local prod runner — same Postgres as ./local-run, Next.js in production mode
+
+  ./local-prod              Same as: start
+  ./local-prod start        DB + migrations + build + npm run start
+  ./local-prod start --skip-build   Start without rebuilding (reuse .next)
+  ./local-prod build        npm run build only
+  ./local-prod help         Show this help
+
+Environment overrides (optional):
+  DATABASE_URL  (default: postgres://postgres:dev@127.0.0.1:5432/website_profiling)
+  DATA_DIR      (default: <repo>/data)
+  AUTH_SECRET   (optional — enables login when set)
+  WP_PG_CONTAINER, WP_PG_USER, WP_PG_PORT, WP_PG_PASSWORD, WP_PG_DB
+
+After start, open: http://localhost:3000/home
+Use localhost (not 127.0.0.1) for pipeline APIs.
+
+Dev mode with hot reload: ./local-run start
+EOF
+}
+
+main() {
+  local cmd="${1:-start}"
+  case "$cmd" in
+    start)
+      shift || true
+      cmd_start "$@"
+      ;;
+    build) cmd_build ;;
+    help|-h|--help) cmd_help ;;
+    *)
+      die "Unknown command: $cmd (try: ./local-prod help)"
+      ;;
+  esac
+}
+
+main "$@"
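`local-prod.sh` resolves its environment with two Bash expansions. `${VAR:-default}` uses the default when `VAR` is unset *or* empty. `${PYTHONPATH:+$PYTHONPATH:}` prepends the existing value plus a colon only when that value is non-empty. The Python function below is a hypothetical mirror, written only to make that precedence testable. The script does not call it, and `resolve_local_prod_env` is not part of the repository.

```python
def _or_default(env: dict[str, str], name: str, default: str) -> str:
    # Bash ${name:-default}: fall back when the variable is unset OR empty.
    value = env.get(name, "")
    return value if value else default


def resolve_local_prod_env(env: dict[str, str], root: str) -> dict[str, str]:
    """Mirror the variable resolution at the top of scripts/local-prod.sh."""
    user = _or_default(env, "WP_PG_USER", "postgres")
    password = _or_default(env, "WP_PG_PASSWORD", "dev")
    port = _or_default(env, "WP_PG_PORT", "5432")
    db = _or_default(env, "WP_PG_DB", "website_profiling")
    default_url = f"postgres://{user}:{password}@127.0.0.1:{port}/{db}"
    # Bash ${PYTHONPATH:+$PYTHONPATH:}: keep an existing value, separated by ':'.
    existing = env.get("PYTHONPATH", "")
    return {
        "DATABASE_URL": _or_default(env, "DATABASE_URL", default_url),
        "DATA_DIR": _or_default(env, "DATA_DIR", f"{root}/data"),
        "PYTHON": _or_default(env, "PYTHON", f"{root}/.venv/bin/python"),
        "WEBSITE_PROFILING_ROOT": root,  # always overwritten
        "PYTHONPATH": (f"{existing}:" if existing else "") + f"{root}/src",
        "NODE_ENV": "production",  # always overwritten
    }
```

One consequence is visible here: an explicit `DATABASE_URL` takes precedence, so the `WP_PG_*` overrides only shape the default URL.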
diff --git a/scripts/local-run.sh b/scripts/local-run.sh
@@ -172,6 +172,8 @@ Environment overrides (optional):
 After start, open: http://localhost:3000/home
 Run audits via sidebar "Run audit" (bottom-right FAB).
 
+Production Next.js (same Postgres, no hot reload): ./local-prod start
+
 Run CI-style tests: ./local-test (see ./local-test help). JS crawl integration: ./local-test browser.
 EOF
 }

diff --git a/src/website_profiling/analysis/local.py b/src/website_profiling/analysis/local.py
@@ -41,7 +41,11 @@ def _cfg_int(cfg: dict[str, str] | None, key: str, default: int) -> int:
 
 
 def _tokenize_simhash(text: str) -> list[str]:
-    return re.findall(r"[a-z0-9]{3,}", text.lower())
+    # `[^\W_]` is word chars minus underscore: identical to the old `[a-z0-9]`
+    # for ASCII (input is lowercased) but ALSO matches Unicode letters/digits, so
+    # CJK / Cyrillic / Arabic / Greek pages no longer tokenize to nothing and
+    # collapse to SimHash 0 (which falsely clustered them all as duplicates).
+    return re.findall(r"[^\W_]{3,}", text.lower(), re.UNICODE)
 
 
 def _stable_token_hash(token: str) -> int:
@@ -123,6 +127,11 @@ def compute_duplicate_groups(
 
     bucket: dict[int, list[str]] = defaultdict(list)
     for u, h in url_to_sh.items():
+        # SimHash 0 means "no tokenizable content", not "identical content".
+        # Bucketing those together unioned every untokenizable page as a single
+        # giant duplicate group — skip them.
+        if h == 0:
+            continue
         bucket[h].append(u)
 
     fuzz = _import_rapidfuzz()
@@ -163,7 +172,9 @@ def union(a: str, b: str, method: str) -> None:
             union(base, m, "simhash")
 
     if hamming_max > 0 and len(urls) <= simhash_max_urls:
-        sh_list = [(u, url_to_sh[u]) for u in urls]
+        # Exclude SimHash-0 (untokenizable) pages — every pair of them has
+        # Hamming distance 0 and would be wrongly merged as duplicates.
+        sh_list = [(u, url_to_sh[u]) for u in urls if url_to_sh[u] != 0]
         for i, (u1, h1) in enumerate(sh_list):
             for u2, h2 in sh_list[i + 1 :]:
                 if _hamming(h1, h2) <= hamming_max:

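The two SimHash fixes above can be checked in isolation. This sketch compares the old and new tokenizer patterns and mirrors the two zero-hash guards: exact bucketing and the pairwise Hamming scan. The MD5-based `simhash64` is a stand-in, not the project's `_stable_token_hash`. The helpers `bucket_exact` and `near_duplicate_pairs` are hypothetical and only imitate the loops in `compute_duplicate_groups`.

```python
import hashlib
import re
from collections import defaultdict


def tokenize_old(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]{3,}", text.lower())


def tokenize_new(text: str) -> list[str]:
    # Word characters minus underscore; Unicode-aware for str patterns in Python 3.
    return re.findall(r"[^\W_]{3,}", text.lower())


def simhash64(tokens: list[str]) -> int:
    """Classic 64-bit SimHash: each token votes +1/-1 on every bit position."""
    if not tokens:
        return 0  # no tokens means no votes, so the fingerprint degenerates to 0
    votes = [0] * 64
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit, v in enumerate(votes) if v > 0)


def bucket_exact(url_to_sh: dict[str, int]) -> dict[int, list[str]]:
    """Group URLs by identical SimHash, skipping the 'no content' value 0."""
    bucket: dict[int, list[str]] = defaultdict(list)
    for url, h in url_to_sh.items():
        if h == 0:
            continue
        bucket[h].append(url)
    return dict(bucket)


def near_duplicate_pairs(url_to_sh: dict[str, int], hamming_max: int) -> list[tuple[str, str]]:
    """Pairwise Hamming scan over non-zero fingerprints only."""
    items = [(u, h) for u, h in url_to_sh.items() if h != 0]
    pairs = []
    for i, (u1, h1) in enumerate(items):
        for u2, h2 in items[i + 1 :]:
            if bin(h1 ^ h2).count("1") <= hamming_max:
                pairs.append((u1, u2))
    return pairs
```

With the old pattern, a Cyrillic-only page tokenizes to nothing and hashes to 0. Any two such pages then have Hamming distance 0, and the guards keep them out of every group instead.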
diff --git a/src/website_profiling/analysis/page.py b/src/website_profiling/analysis/page.py
@@ -91,7 +91,7 @@ def walk(obj: object) -> bool:
     "corporation",
     "store",
     "restaurant",
-    "professionalService",
+    "professionalservice",
     "newsmediaorganization",
 })
 _CONTACT_CAP = 10

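The lowercase fix matters if the lookup lowercases JSON-LD `@type` values before testing membership, which the otherwise all-lowercase set suggests. Under that assumption a camelCase entry such as `"professionalService"` can never match. The sketch is hypothetical: `ORG_TYPES` holds only the entries visible in this hunk, and `is_org_type` is an illustrative helper, not the function in `analysis/page.py`.

```python
# Hypothetical subset: only the entries visible in the hunk above.
ORG_TYPES = frozenset({
    "corporation",
    "store",
    "restaurant",
    "professionalservice",
    "newsmediaorganization",
})


def is_org_type(type_value: object) -> bool:
    """Return True if a JSON-LD @type (a string or list of strings) names an org type."""
    values = type_value if isinstance(type_value, list) else [type_value]
    # Normalize case so "ProfessionalService" matches the lowercase set entry.
    return any(isinstance(v, str) and v.strip().lower() in ORG_TYPES for v in values)
```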
diff --git a/src/website_profiling/cli.py b/src/website_profiling/cli.py
@@ -9,6 +9,7 @@
     enrich_cmd,
     google_cmd,
     gsc_links_cmd,
+    help_cmd,
     keywords_cmd,
     lighthouse_cmd,
     page_coach_cmd,
@@ -46,6 +47,8 @@ def main() -> None:
         chat_cmd.run(cfg, args)
     elif args.command == "page-markdown":
         page_markdown_cmd.run(cfg, args)
+    elif args.command == "help":
+        help_cmd.run(cfg, args)
     else:
         pipeline_cmd.run(cfg, args)
 

diff --git a/src/website_profiling/commands/config_resolve.py b/src/website_profiling/commands/config_resolve.py
@@ -281,6 +281,7 @@ def build_parser() -> argparse.ArgumentParser:
             "page-coach",
             "chat",
             "page-markdown",
+            "help",
         ],
         help="Run only this step (default: run all steps according to config)",
     )
@@ -394,7 +395,7 @@ def build_parser() -> argparse.ArgumentParser:
         "--stdin-json",
         action="store_true",
         dest="stdin_json",
-        help="For 'chat' command: read JSON payload from stdin and emit NDJSON events.",
+        help="For 'chat' and 'help' commands: read JSON payload from stdin and emit NDJSON events.",
     )
     parser.add_argument(
         "--resume-run-id",

diff --git a/src/website_profiling/commands/help_cmd.py b/src/website_profiling/commands/help_cmd.py
@@ -0,0 +1,41 @@
+"""CLI: help --stdin-json — single-turn help chat (NDJSON events on stdout)."""
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+
+from ..text_sanitize import sanitize_unicode_deep
+from ..llm.help_agent import run_help_turn
+
+
+def run(_cfg: dict, args: argparse.Namespace) -> None:
+    if not getattr(args, "stdin_json", False):
+        print("Error: help requires --stdin-json", file=sys.stderr)
+        sys.exit(1)
+
+    try:
+        payload = json.load(sys.stdin)
+    except json.JSONDecodeError as e:
+        print(json.dumps({"type": "error", "message": f"Invalid stdin JSON: {e}"}))
+        sys.exit(1)
+
+    messages = payload.get("messages") if isinstance(payload, dict) else None
+    if not isinstance(messages, list):
+        messages = []
+
+    def on_event(event: dict) -> None:
+        print(json.dumps(sanitize_unicode_deep(event), default=str), flush=True)
+
+    try:
+        result = run_help_turn(messages, on_event=on_event)
+    except Exception as e:
+        msg = str(e).strip() or type(e).__name__
+        print(json.dumps({"type": "error", "message": msg}), flush=True)
+        sys.exit(1)
+
+    if not result.get("ok"):
+        err = result.get("error", "Help agent failed")
+        print(json.dumps({"type": "error", "message": err}), flush=True)
+        sys.exit(1)
+    sys.exit(0)
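`help --stdin-json` reads one JSON object with a `messages` list from stdin. It writes one JSON event per line (NDJSON) to stdout and exits 1 after emitting an `{"type": "error", ...}` event. The sketch below shows the caller's side of that contract. Several details are assumptions: the message shape (`role` / `content`), and the non-error event type used in the usage check, because `run_help_turn`'s events are not shown in this diff. The exact command line is left to the caller, since the CLI entry point is not part of this change. `build_help_payload`, `parse_ndjson`, `first_error`, and `run_help` are hypothetical helpers.

```python
import json
import subprocess
from typing import Optional


def build_help_payload(messages: list[dict]) -> str:
    """Serialize the single JSON object that help_cmd.run() reads from stdin."""
    return json.dumps({"messages": messages})


def parse_ndjson(stdout: str) -> list[dict]:
    """Parse newline-delimited JSON: one event object per non-blank line."""
    return [json.loads(line) for line in stdout.splitlines() if line.strip()]


def first_error(events: list[dict]) -> Optional[str]:
    """Return the message of the first {"type": "error"} event, if any."""
    for event in events:
        if event.get("type") == "error":
            return event.get("message", "")
    return None


def run_help(cmd: list[str], messages: list[dict]) -> tuple[int, list[dict]]:
    """Run a caller-supplied help command line and collect its NDJSON events."""
    proc = subprocess.run(
        cmd, input=build_help_payload(messages), capture_output=True, text=True
    )
    return proc.returncode, parse_ndjson(proc.stdout)
```

Checking both the exit code and `first_error` covers both failure paths in `help_cmd.run`: invalid stdin JSON and a failed agent turn.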