2 changes: 2 additions & 0 deletions .coveragerc
@@ -17,6 +17,8 @@ omit =
*/website_profiling/llm/providers/*
*/website_profiling/llm/*
*/website_profiling/llm_config.py
*/website_profiling/llm_client_http.py
*/website_profiling/commands/chat_cmd.py
*/website_profiling/cli.py
*/website_profiling/commands/enrich_cmd.py
# FastAPI server — tested via integration tests, not unit tests
20 changes: 20 additions & 0 deletions .github/workflows/ci.yml
@@ -133,3 +133,23 @@ jobs:
dotnet-version: '10.0.x'
- name: Test Data service
run: dotnet test services/Data/Data.slnx

ai:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: '10.0.x'
- name: Test AiService
run: dotnet test services/AiService/AiService.slnx

integrations:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-dotnet@v4
with:
dotnet-version: '10.0.x'
- name: Test IntegrationsService
run: dotnet test services/IntegrationsService/IntegrationsService.slnx
25 changes: 14 additions & 11 deletions AGENT.md
@@ -4,19 +4,21 @@ Developer reference for agents and contributors. User-facing overview: [README.m

**What it is:** `python -m src` from repo root (`src/__main__.py` -> package **`website_profiling`**). Config: stored in **PostgreSQL** (`pipeline_config` table, `key/value/is_unknown/updated_at`). A shadow **`pipeline-config.txt`** is auto-written to `DATA_DIR` on every Save/Run. CLI loads DB first (`DATABASE_URL`), then shadow file; `--config` overrides with a file. Reference keys: `input.txt.example` and `pipeline-config.example.txt` (not auto-loaded).
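The load order described above can be sketched as a small precedence function. This is only an illustration: `pick_config_source` and its parameters are hypothetical names, not the project's `load_config` / `load_config_from_db`, and the returned labels are invented for the example.

```python
from typing import Mapping, Optional


def pick_config_source(
    cli_config_path: Optional[str],
    db_rows: Mapping[str, str],
    shadow_path: Optional[str],
) -> str:
    """Return a label for the config source that would win (hypothetical helper)."""
    # An explicit --config file overrides everything else.
    if cli_config_path:
        return f"file:{cli_config_path}"
    # Otherwise the pipeline_config table is used when it has rows.
    if db_rows:
        return "db:pipeline_config"
    # The shadow file is the fallback when the table is empty.
    if shadow_path:
        return f"shadow:{shadow_path}"
    raise LookupError("no pipeline config available")
```

For instance, `pick_config_source(None, {}, "/data/pipeline-config.txt")` falls through to the shadow file because neither an override nor any DB rows are present.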

**LLM / AI:** Settings live in **`llm_config`** table in PostgreSQL. Providers: OpenAI, Google Gemini, Anthropic, Groq, Ollama (`web/src/lib/llmConfigSchema.ts`). Configure only via web UI **AI** tab (`GET/PUT /api/llm-config`, localhost). Never in `pipeline-config.txt` or `--config`.
**LLM / AI:** Settings live in **`llm_config`** (and related tables) in PostgreSQL. Providers: OpenAI, Google Gemini, Anthropic, Groq, Ollama (`web/src/lib/llmConfigSchema.ts`). **Browser writes** for API keys and LLM toggles go **BFF → AiService** (`PUT /api/secrets`, `PUT /api/llm-config`). Configure via **Secrets** (`/secrets`) and **Run audit → AI settings**. Never in `pipeline-config.txt` or `--config`. Worker/CLI calls AiService via `llm_client_http.py` (`AI_SERVICE_URL`, default `:8092`).
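A minimal sketch of how a worker-side client might resolve the AiService base URL from `AI_SERVICE_URL`. The helper name `ai_service_endpoint`, the `127.0.0.1` host in the default, and the example path are assumptions; the source states only the variable name and the default port `:8092`, and the actual contents of `llm_client_http.py` are not shown here.

```python
import os
from typing import Mapping, Optional

# Assumption: the source gives only the port (:8092); the host is a guess.
DEFAULT_AI_SERVICE_URL = "http://127.0.0.1:8092"


def ai_service_endpoint(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Join the configured AiService base URL with a request path (hypothetical helper)."""
    env = os.environ if env is None else env
    # Treat an unset or blank variable the same way and fall back to the default.
    base = env.get("AI_SERVICE_URL", "").strip() or DEFAULT_AI_SERVICE_URL
    return base.rstrip("/") + "/" + path.lstrip("/")
```

Passing `env` explicitly keeps the helper easy to test without touching the process environment.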

**Frontend:** **`web/`** (Vite + React SPA) — browser calls **`services/Bff/`** for all `/api/*`; BFF proxies to FastAPI and FileService.
**Frontend:** **`web/`** (Vite + React SPA) — browser calls **`services/Bff/`** for all `/api/*`; BFF proxies to FastAPI, AiService, Data, and FileService.

**Key paths**

- `src/website_profiling/` -- `cli.py`, `config.py`, `crawl/`, `db/storage.py`, `lighthouse/`, `reporting/`, `analysis/`, `llm/`, `tools/`
- `services/Bff/` -- .NET BFF (auth, CORS, `/api/*` proxy)
- `src/website_profiling/` -- `cli.py`, `config.py`, `crawl/`, `db/storage.py`, `lighthouse/`, `reporting/`, `analysis/`, `llm_client_http.py`, `tools/`
- `services/Bff/` -- .NET BFF (auth, CORS, `/api/*` proxy to FastAPI + AiService + Data + FileService)
- `services/AiService/` -- .NET AI (chat, secrets, LLM config, MCP, enrichment; port 8092)
- `services/Data/` -- .NET read service (report payloads, portfolio, issue status; port 8091)
- `services/FileService/` -- .NET PDF + Excel workbook export (HTTP-only; see [README](services/FileService/README.md))
- `web/src/` -- React SPA (`AppRoutes.tsx`, `views/`, `components/`); pipeline UI: `PipelineRunnerFab`, `PipelineContext`
- `alembic/` -- schema migrations

**Local dev:** `./local-run` (Postgres in Docker `wp-pg`, FileService on `:8080`, FastAPI on `:8001`, BFF on `:8090`, Vite on `:3000`; default `DATABASE_URL`: `postgres://postgres:dev@127.0.0.1:5432/website_profiling`). See `scripts/local-run.sh`. **Local tests:** `./local-test` runs **three** Python coverage gates (core 100%, reporting 100%, tools 100%) plus web checks — mirrors CI **python** and **web** jobs; Docker CI is separate (see `.github/workflows/ci.yml`). `./local-test browser` for `@pytest.mark.browser` integration tests — see `scripts/local-test.sh`. Mocked browser unit tests: `tests/test_browser_fetcher_unit.py`.
**Local dev:** `./local-run` (Postgres in Docker `wp-pg`, FileService `:8080`, Data `:8091`, AiService `:8092`, FastAPI `:8001`, BFF `:8090`, Vite `:3000`; default `DATABASE_URL`: `postgres://postgres:dev@127.0.0.1:5432/website_profiling`). See `scripts/local-run.sh`. **Local tests:** `./local-test` runs **three** Python coverage gates (core 100%, reporting 100%, tools 100%) plus web and .NET checks — mirrors CI; Docker CI is separate (see `.github/workflows/ci.yml`). `./local-test browser` for `@pytest.mark.browser` integration tests — see `scripts/local-test.sh`. Mocked browser unit tests: `tests/test_browser_fetcher_unit.py`.

**JavaScript crawl (optional):** Config keys `crawl_render_mode` (`static` | `javascript` | `auto`) and `crawl_js_*` in pipeline config / `pipelineConfigSchema.ts`. JS/auto crawls can capture browser console errors and uncaught exceptions (`crawl_js_capture_console`, stored under `page_analysis.browser`). **Auto mode** uses static-first fetch, pre-parse SPA heuristics (`needs_js_render`), then post-parse low-outlink fallback (`needs_js_render_after_parse`) in `crawler.py`. **Preflight:** `GET /api/crawl/browser-status` (localhost) spawns Python `browser_status()`; Run audit settings/run validation calls it when render mode is `javascript` or `auto`. Browser deps: Playwright from `requirements.txt` (installed by `./local-run setup` and `./local-test`). Runtime needs Chromium on `PATH` or `CHROME_PATH` (Docker sets `CHROME_PATH=/usr/bin/chromium`). Integration tests: `@pytest.mark.browser` — excluded by default in `pytest.ini`; Docker CI runs `tests/test_crawl_fetchers.py` and `tests/test_crawler_browser_e2e.py -m browser`; locally `./local-test browser`.

@@ -25,15 +27,15 @@ Developer reference for agents and contributors. User-facing overview: [README.m
- Run audit (CLI): `python -m src` — reads config from PostgreSQL (`pipeline_config`); shadow `DATA_DIR/pipeline-config.txt` if table empty. CLI override: `python -m src --config path`
- Optional step: `crawl` | `report` | `plot` | `lighthouse` | `keywords` | `warnings` | `enrich` | `google` | `chat`
- **`preserve_crawl_history`** (default true): append crawls; `false` truncates crawl tables but restores `report_payload`, Lighthouse, `google_data`, `keyword_data`, `keyword_history`, `keyword_suggest_cache`, and `crawl_runs`
- **`DATABASE_URL`** env: PostgreSQL connection string (required). **`DATA_DIR`**: secrets + shadow config (Docker: `/data`).
- **`DATABASE_URL`** env: PostgreSQL connection string (required). **`DATA_DIR`**: shadow pipeline config and local artifacts (Docker: `/data`); API keys live in Postgres via AiService.
- **Pipeline storage** (crawl, edges, nodes, report payload, Lighthouse, keywords, warnings) lives in **PostgreSQL only**. Deliverables use the Export view, `GET /api/report/export`, or MCP `export_*` tools — not files written by the main pipeline step.
- **Pool tuning:** `DB_POOL_MIN` / `DB_POOL_MAX` (Python). Bulk crawl writes via `executemany`; optional **`crawl_stream_to_db`** streams rows during fetch. Per-URL raw HTML: `crawl_page_html` table (migration `015`); API `GET/POST /api/crawl/page-html`.
- **Browser API (BFF):** All `/api/*` routes are served by `services/Bff/` (proxied to FastAPI / FileService). Notable: `/api/report/*`, `/api/run`, `/api/jobs/*`, `/api/pipeline-config`, `/api/llm-config`, `/api/chat` (SSE), `/api/integrations/google/*` (OAuth callback on BFF origin). `PipelineRunnerFab` saves pipeline + LLM state before each run. OpenAPI: `web/openapi.json`; BFF client: `services/Bff/src/Bff.Application/Generated/`.
- **MCP:** `python -m website_profiling.mcp` (stdio) or `python -m website_profiling.mcp.http` (remote Streamable HTTP). Configure at **`/mcp`** in the web UI. See `docs/MCP.md`.
- **Browser API (BFF):** All `/api/*` routes are served by `services/Bff/`. **FastAPI:** `/api/run`, `/api/jobs/*`, `/api/pipeline-config`, crawl, integrations (OAuth reads), properties, content drafts, etc. **AiService:** `/api/chat` (SSE), `/api/llm-config`, `/api/secrets`, `/api/ollama/status`, `/api/issues/fix-suggestion`, `/api/issues/action-plan`, `/api/dashboards/ai-generate`, `/api/content/analyze`, `/api/content/wizard`, `/api/links/page-coach`, `/api/mcp-tools`, `/api/report/audit-tool`. **Data:** report payload reads, portfolio, issue status, saved filters (see `DATA_ROUTES`). **FileService:** PDF/workbook export. `PipelineRunnerFab` saves pipeline config (FastAPI) and LLM state (`PUT /api/llm-config` → AiService) before each run. OpenAPI: `web/openapi.json` (FastAPI routes only — AiService routes are not in this spec); BFF client: `services/Bff/src/Bff.Application/Generated/`.
- **MCP:** AiService (.NET) — stdio host or HTTP at `/mcp` when `WP_MCP_HTTP=1` on `:8092`. Configure at **`/mcp`** in the web UI. See `docs/MCP.md` and [services/AiService/README.md](services/AiService/README.md).
- **AI Chat UI:** `/chat` — property-scoped chat with saved sessions (`chat_sessions`, `chat_messages`; migration `012_chat_sessions`).
- **Job store:** PostgreSQL `pipeline_jobs` (FastAPI); live job status via `/api/jobs/*` through the BFF.
- **Schema head:** `015_crawl_page_html` (recent: `013` link_edges/discovery, `014` job log truncation, `015` per-URL HTML storage).
- **Docker:** Root `Dockerfile` (Python backend); `web/Dockerfile` (Vite SPA + nginx); `docker-compose.yml` (postgres + fastapi + worker + bff + web + FileService); **`docker-compose.prod.yml`** (production + optional MCP on `:8000`); **`docker-compose.pull.yml`** for pre-built images (`BACKEND_IMAGE`, `WEB_IMAGE`); **`LIGHTHOUSE_CHROME_FLAGS`**
- **Docker:** Root `Dockerfile` (Python backend); `web/Dockerfile` (Vite SPA + nginx); `docker-compose.yml` (postgres + fastapi + worker + ai + data + bff + web + FileService); **`docker-compose.prod.yml`** (production + optional MCP profile mapping host `:8000` → AiService `:8092`); **`docker-compose.pull.yml`** for pre-built images (`BACKEND_IMAGE`, `WEB_IMAGE`); **`LIGHTHOUSE_CHROME_FLAGS`**
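The route ownership listed in the **Browser API (BFF)** bullet can be illustrated with a lookup table. This is a Python sketch for readers only: the real BFF is .NET, `upstream_for` is a hypothetical name, and Data routes are left unresolved because `DATA_ROUTES` is not reproduced in this document. The FileService paths are the two export routes named in the "Where to edit" table.

```python
from typing import Optional

# Exact paths the bullet attributes to AiService.
AI_SERVICE_PATHS = frozenset({
    "/api/chat", "/api/llm-config", "/api/secrets", "/api/ollama/status",
    "/api/issues/fix-suggestion", "/api/issues/action-plan",
    "/api/dashboards/ai-generate", "/api/content/analyze",
    "/api/content/wizard", "/api/links/page-coach",
    "/api/mcp-tools", "/api/report/audit-tool",
})
FILE_SERVICE_PATHS = frozenset({"/api/report/export", "/api/report/export-workbook"})
# Only the FastAPI routes spelled out in the bullet; the list there ends in "etc."
FASTAPI_PREFIXES = ("/api/run", "/api/jobs", "/api/pipeline-config")


def upstream_for(path: str) -> Optional[str]:
    """Name the upstream for a browser path, or None if this sketch cannot tell."""
    clean = path.split("?", 1)[0].rstrip("/")
    if clean in AI_SERVICE_PATHS:
        return "AiService"
    if clean in FILE_SERVICE_PATHS:
        return "FileService"
    if any(clean == p or clean.startswith(p + "/") for p in FASTAPI_PREFIXES):
        return "FastAPI"
    return None  # e.g. Data routes, which live in DATA_ROUTES
```

Exact matching matters under `/api/report/`: `/api/report/audit-tool` belongs to AiService while `/api/report/export` belongs to FileService, so a single prefix rule for that path segment would misroute one of them.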

**Where to edit**

@@ -44,12 +46,13 @@ Developer reference for agents and contributors. User-facing overview: [README.m
| PDF / workbook export | `services/FileService/` (rendering); BFF routes `/api/report/export` and `/api/report/export-workbook` to FileService |
| DB schema | `alembic/versions/` |
| Local analysis | `analysis/local.py`, `requirements.txt` |
| AI insights (LLM) | `llm/enrich.py`, `llm/agent.py`, `llm_config.py`, `requirements.txt` |
| Audit query tools (MCP + chat) | `tools/audit_tools/`, `mcp/server.py`, `mcp/http_server.py`, `commands/chat_cmd.py` |
| AI insights (LLM) | `services/AiService/` (browser-facing), `llm_client_http.py` (worker/CLI), `llm_config.py` |
| Audit query tools (MCP + chat) | `services/AiService/src/AiService.Tools/`, `services/AiService/src/AiService.Mcp/`, `tools/audit_tools/`, `commands/chat_cmd.py` |
| Agent readiness checks | `tools/audit_tools/geo/agent_readiness.py`, `tools/audit_tools/_aeo_helpers.py` |
| Config / CLI | `config.py` (`load_config`, `load_config_from_db`), `cli.py`, `input.txt.example` |
| UI pipeline schema | `web/src/lib/pipelineConfigSchema.ts` |
| UI LLM schema | `web/src/lib/llmConfigSchema.ts` |
| UI secrets schema | `web/src/lib/secretsConfigSchema.ts`, `web/src/hooks/useSecrets.ts` |
| Browser API client | `web/src/lib/publicBase.ts` (`apiUrl`, `apiFetch`, `VITE_BFF_BASE_URL`) |
| D3 charts (custom / compare / overview) | `web/src/components/charts/d3/`, `web/src/lib/viz/` |
| Chart.js charts (standard bar/line/doughnut) | `web/src/utils/chartJsDefaults.ts`, `react-chartjs-2` in views under `web/src/views/`, `web/src/components/searchPerformance/`, `web/src/components/traffic/` |
16 changes: 10 additions & 6 deletions AGENTS.md
@@ -4,15 +4,17 @@

This file is the canonical entry point for agents. For full detail see [AGENT.md](AGENT.md).

**What it is:** Self-hosted SEO crawl and technical audit platform — `python -m src` from repo root. Stack: Python (crawl + analysis + MCP + FastAPI), Vite + React SPA (web UI), .NET BFF (browser API), .NET Data (report reads), PostgreSQL.
**What it is:** Self-hosted SEO crawl and technical audit platform — `python -m src` from repo root. Stack: Python (crawl + analysis + FastAPI), Vite + React SPA (web UI), .NET BFF (browser API), .NET Data (reads), .NET AiService (AI/LLM/MCP), .NET IntegrationsService (Google/Bing I/O), PostgreSQL.

**Key paths**

- `src/website_profiling/` — core Python package
- `cli.py`, `config.py`, `api/`, `worker/`, `crawl/`, `db/`, `reporting/`, `analysis/`, `llm/`, `tools/`
- `cli.py`, `config.py`, `api/`, `worker/`, `crawl/`, `db/`, `reporting/`, `analysis/`, `llm_client_http.py`, `tools/`
- `web/` — Vite + React SPA (static nginx in prod); browser calls `services/Bff/` for all `/api/*`
- `services/Bff/` — .NET BFF (auth, CORS, proxy to FastAPI + Data + FileService)
- `services/Bff/` — .NET BFF (auth, CORS, proxy to FastAPI + IntegrationsService + Data + AiService + FileService)
- `services/Data/` — .NET read service (report payloads, portfolio, issue status, filters; port 8091)
- `services/AiService/` — .NET AI service (Microsoft.Extensions.AI, chat, enrichment, MCP, **secrets/llm-config writes**; port 8092). See [services/AiService/README.md](services/AiService/README.md)
- `services/IntegrationsService/` — .NET Google/Bing integrations (GSC/GA4 fetch, OAuth, page-live, keyword reads; port 8093). See [services/IntegrationsService/README.md](services/IntegrationsService/README.md)
- `services/FileService/` — .NET PDF + Excel workbook export (port 8080). HTTP-only via `REPORT_API_URL`; no Postgres. Profiles: `executive|standard|full|premium`. Details: [services/FileService/README.md](services/FileService/README.md). Env: `FILE_SERVICE_URL` (MCP), `REPORT_API_URL` (FileService).
- `alembic/` — DB migrations
- `docs/` — documentation index
@@ -21,13 +23,15 @@ This file is the canonical entry point for agents. For full detail see [AGENT.md
**Run / dev**

```bash
./local-run # Start Postgres + FileService + Data + worker + FastAPI + BFF + Vite dev server
./local-run # Start Postgres + FileService + Data + AiService + IntegrationsService + worker + FastAPI + BFF + Vite
./local-test # Python + web + .NET tests (CI parity)
python -m src # Run audit pipeline
python -m website_profiling.mcp # Start MCP server (stdio)
# MCP: AiService stdio/HTTP — see services/AiService/README.md and docs/MCP.md
```

**MCP:** 340 read-only audit tools via Model Context Protocol. See [docs/MCP.md](docs/MCP.md).
**MCP:** 369 read-only audit tools via Model Context Protocol (AiService). See [docs/MCP.md](docs/MCP.md).

**Secrets / credentials:** Browser writes go BFF → AiService only (`PUT /api/secrets`, `PUT /api/llm-config`). Python FastAPI keeps `pipeline-config` and read-only integration routes; worker/crawl reads `llm_config` / `google_app_settings` from Postgres at runtime.

**Edit targets**
