Skip to content

feat(resolver): asset classifier mapping identifiers to specific taxonomy#33

Open
robertoecf wants to merge 7 commits into
mainfrom
claude/kind-euler-06c857
Open

feat(resolver): asset classifier mapping identifiers to specific taxonomy#33
robertoecf wants to merge 7 commits into
mainfrom
claude/kind-euler-06c857

Conversation

@robertoecf

@robertoecf robertoecf commented Jun 29, 2026

Copy link
Copy Markdown
Owner

O que

resolve_asset(identifier) — resolver determinístico que mapeia qualquer identificador de ativo brasileiro (ticker/CNPJ/ISIN/nome) para a taxonomia de consolidação (cliente: Wealthuman). Exposto como biblioteca (findata.resolver.resolve_asset()), REST (GET /resolver/resolve) e tool MCP resolve_asset.

📄 Contrato/entrega para o demandante: docs/RESOLVER.md.

Dois eixos (decisão de modelo)

  • macro_class = classe de ativo pura: RF / RV / Multimercado / Alternativos / Estruturados (+ Indefinido).
  • exposure = geografia ortogonal: Brasil / Internacional. B3 = domicílio do ativo, não a exposição → IVVB11, BDR, FIA global = RV + Internacional.

⚠️ Diverge do spec doc atual (lista ARBOR/WHG como macro=Internacional). Correção do spec por handoff em outra sessão. Os testes deste PR refletem o modelo novo.

Campos de auditoria

  • Certeza fiscal: lei_12431_status (confirmed/candidate/not_applicable/unknown) + isento_status. Heurística Petrobras → candidate (confirmar ISIN); IFRA11/infra explícito → confirmed; CRA/CRI/LCI-LCA → confirmed_exempt.
  • signals[]: trilha estruturada — qual regra disparou + evidência concreta (ex.: debenture / DEB / basis=heuristic;indexador=IPCA+).
  • source + confidence + as_of (America/Sao_Paulo) + cascade.

Cascata

Núcleo openfindata (curated → rules) resolve o test set offline; providers externos (Mais Retorno MCP / CVM-B3 / web search restrito) são ponto de extensão injetável (AssetProvider), consultado só quando o resultado é fraco. No estado atual só o degrau 1 está ligado (externos = stubs a conectar no deploy).

Test set (passa 100%, offline)

Identificador macro_class exposure nota
IFRA11 / FI ITAUINFRA Renda Fixa Brasil ETF de debêntures de infra; "Indexada à Inflação"; isento confirmed
ARBOR FIC FIA Renda Variável Internacional mandato global sem IE
WHG GLOBAL FIC FIA IE Renda Variável Internacional estrutura IE
DEB PETROBRAS IPCA+ Renda Fixa Brasil incentivada candidate (confirmar ISIN)
COE Estruturados (n/a) kind=coe, nunca ETF
"Crédito Estruturado" (Warren/AMW) Renda Fixa Brasil name-trap
IVVB11 Renda Variável Internacional ETF de ações S&P 500
HGLG11 / MXRF11 Renda Variável Brasil subclasse FII

Reviews aplicadas

  • Adversarial review (Codex, cross-host) — veio DO_NOT_MERGE (1 P0 + 3 P1); todos corrigidos: P0 isento heurístico → candidate + confidence 0.7 abaixo do short-circuit; IE/LC/LF crus, substring LCI, seed ARBOR, ETF RF underlying, as_of fuso → corrigidos.
  • CI review bots (Gemini + CodeRabbit) — gate de PARTICIPACOES/MACRO/ACOES em contexto de fundo (evita nomes de empresa), subclasse de Tesouro pelo código do título (NTN-B → inflação), double-fold, e doc drift (macro_class não lista mais "Internacional").

Validação

  • 278 passed, 15 deselected (testes live de rede), ruff + mypy limpos
  • REST + MCP testados end-to-end; REST surface 95 → 96; MCP curado 24 → 25

Commits

  1. feat(resolver) — classificador base + taxonomia + cascata
  2. feat(resolver) — eixo de certeza fiscal (lei_12431_status / isento_status)
  3. fix(resolver) — debenture=None pra ETF de Tesouro
  4. feat(resolver) — trilha estruturada signals[]
  5. fix(resolver) — review-bot findings (collisions, Tesouro subclasse) + doc drift
  6. docs(resolver) — contrato client-facing (docs/RESOLVER.md)

Follow-ups restantes (pós-merge)

  • Wire dos providers externos reais (Mais Retorno MCP, web search)
  • Confirmação ISIN-level da incentivada via ANBIMA/debentures.com.br
  • Correção do spec doc (openfindata-mcp-spec.md) no repo do cliente — handoff já preparado

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added an asset resolver available through the app, REST API, and MCP tool.
    • Supports resolving Brazilian assets from ticker, CNPJ, ISIN, or name into taxonomy-based classifications with exposure, subclass, and audit details.
    • Expanded structured output with certainty statuses, signals, source, confidence, and timestamped results.
  • Bug Fixes

    • Improved handling for tricky asset names and instruments to reduce misclassification.
  • Documentation

    • Updated the resolver and MCP surface documentation to reflect the new tool and API counts.

…xonomy

Add resolve_asset(): turns any Brazilian asset identifier (ticker/CNPJ/ISIN/
name) into a classification mapped to the consolidation macro taxonomy
(RF/RV/Multimercado/Alternativos/Estruturados) plus an orthogonal exposure
axis (Brasil/Internacional), subclasse, underlying_nature, debenture
Lei-12.431 facts, source, confidence, and an audit cascade.

Deterministic, cacheable, no PII. Core is offline (curated ETF/global-fund
seed + structural rules); external providers (Mais Retorno / CVM-B3 / web
search) are an injectable fallback chain. Exposed over REST (/resolver/resolve)
and as the resolve_asset MCP tool.

Passes the Wealthuman spec test set: IFRA11→RF/Indexada à Inflação,
ARBOR/WHG→RV+Internacional, DEB PETROBRAS IPCA+→RF incentivada+isento,
COE→Estruturados (never ETF), "Crédito Estruturado"→RF (name-trap),
IVVB11→RV+Internacional, FIIs→RV.

Geography is modeled as the exposure axis, not a macro class (macro_class is
pure asset class). Hardened after cross-host adversarial review: heuristic
Lei-12.431 isento kept below the cascade short-circuit so a provider confirms
by ISIN; bare-token collisions (IE/LC/LF, substring LCI) removed; API length
caps; as_of stamped in America/Sao_Paulo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50f57c4f-d6be-4f5f-b7fd-a3f512ede220

📥 Commits

Reviewing files that changed from the base of the PR and between 24e77d7 and 63b5b3f.

📒 Files selected for processing (11)
  • CHANGELOG.md
  • docs/MCP_SURFACE.md
  • docs/RESOLVER.md
  • src/findata/api/mcp_app.py
  • src/findata/api/routers/resolver.py
  • src/findata/resolver/__init__.py
  • src/findata/resolver/engine.py
  • src/findata/resolver/models.py
  • src/findata/resolver/normalize.py
  • src/findata/resolver/seed.py
  • tests/test_resolver.py
✅ Files skipped from review due to trivial changes (2)
  • CHANGELOG.md
  • docs/MCP_SURFACE.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/findata/api/routers/resolver.py
  • src/findata/api/mcp_app.py
  • src/findata/resolver/init.py
  • src/findata/resolver/seed.py
  • src/findata/resolver/normalize.py
  • src/findata/resolver/engine.py

📝 Walkthrough

Walkthrough

Adds a new resolver package for deterministic classification of Brazilian asset identifiers, wires it into REST and MCP surfaces, and expands tests, docs, and configuration to match the new resolver behavior.

Changes

Asset-classification resolver

Layer / File(s) Summary
Output contract, normalization, and taxonomy
src/findata/resolver/models.py, src/findata/resolver/normalize.py, src/findata/resolver/taxonomy.py
Defines strict output models and certainty enums, adds structured signals, and implements identifier folding/tokenization plus CVM/ANBIMA/exposure mapping helpers.
Curated seed lookup
src/findata/resolver/seed.py
Defines curated ETF and global-fund seed entries, builds lookup indexes, and resolves seeded classifications by ticker, CNPJ, or constrained name substrings.
Classification engine
src/findata/resolver/engine.py
Implements the deterministic rule cascade, assembles typed classifications, and adds async provider enrichment with confidence-based short-circuiting.
Public API, routing, docs, and config
src/findata/resolver/__init__.py, src/findata/api/routers/resolver.py, src/findata/api/app.py, src/findata/api/mcp_app.py, docs/RESOLVER.md, docs/MCP_SURFACE.md, CHANGELOG.md, tests/test_mcp_surface.py, pyproject.toml
Re-exports resolver symbols, registers REST/MCP entry points, updates resolver and MCP docs, records the changelog entry, adjusts surface-count tests, and relaxes Ruff complexity checks for the engine.
Resolver tests
tests/test_resolver.py
Adds offline coverage for taxonomy mapping, certainty states, signal emission, provider behavior, contract validation, and edge-case regressions.

Sequence Diagram

sequenceDiagram
  participant Client
  participant FastAPI as /resolver/resolve
  participant resolve_asset
  participant classify
  participant lookup_seed
  participant _rule_payload
  participant AssetProvider

  Client->>FastAPI: GET /resolver/resolve
  FastAPI->>resolve_asset: optional identifiers
  resolve_asset->>classify: normalized input
  classify->>lookup_seed: seed match lookup
  lookup_seed-->>classify: seed or none
  classify->>_rule_payload: structural rules
  _rule_payload-->>classify: payload
  classify-->>resolve_asset: AssetClassification
  loop optional providers
    resolve_asset->>AssetProvider: norm, current
    AssetProvider-->>resolve_asset: enriched result or None
  end
  resolve_asset-->>FastAPI: AssetClassification
  FastAPI-->>Client: JSON response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • robertoecf/OpenFinData#29: Directly related to the resolver MCP/API surface, including the resolve_asset tool endpoint wiring.

Poem

🐇 I hop through tickers, swift and neat,
and fold each name to find the beat.
With signals, seeds, and rules in line,
the asset path now reads just fine.
Hop hop! The resolver shines today 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 27.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and accurately reflects the main change: adding a resolver that maps asset identifiers to taxonomy classifications.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/kind-euler-06c857

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 markdownlint-cli2 (0.22.1)
CHANGELOG.md

markdownlint-cli2 wrapper config was not available before execution

docs/MCP_SURFACE.md

markdownlint-cli2 wrapper config was not available before execution

docs/RESOLVER.md

markdownlint-cli2 wrapper config was not available before execution


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new asset-classification resolver module and its corresponding API endpoints and MCP tool, allowing Brazilian assets to be mapped to a macro taxonomy and exposure axis. The code review feedback focuses on enhancing the robustness of the rule-based classification engine. Key recommendations include restricting keywords like 'PARTICIPACOES', 'MACRO', 'ACOES', and 'EQUITY' to a fund context to prevent collisions with direct equities or commercial names. Additionally, the feedback suggests mapping federal public bond abbreviations (e.g., LFT, LTN) to their respective indexers to avoid incorrect classification as 'Crédito Privado', and removing a redundant string folding operation in the seed lookup.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/findata/resolver/engine.py Outdated
Comment thread src/findata/resolver/engine.py Outdated
Comment thread src/findata/resolver/engine.py Outdated
Comment thread src/findata/resolver/engine.py Outdated
Comment thread src/findata/resolver/engine.py Outdated
Comment thread src/findata/resolver/seed.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/MCP_SURFACE.md (1)

56-72: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

The heading count and the listed tools disagree.

This section is titled "The 25 curated tools", but the block still includes the optional findata_run_code, so the rendered list shows 26 names. Either move the code-mode tool below this section or rename the heading to make the 25+1 split explicit.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/MCP_SURFACE.md` around lines 56 - 72, The section title and tool list
are inconsistent because the `findata_run_code` entry is still included under
“The 25 curated tools,” making the rendered count 26. Update the `MCP_SURFACE`
tool catalog block by either removing `findata_run_code` from this section and
placing it elsewhere as the opt-in code-mode tool, or renaming the heading to
clearly indicate a 25+1 split so the count matches the listed tools.
🧹 Nitpick comments (3)
tests/test_mcp_surface.py (1)

22-23: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the new resolver surface explicitly, not just the totals.

Bumping the counts alone will still pass if some unrelated route/tool is added while resolve_asset stays unwired. Since this PR's contract is the new resolver surface, please pin it directly in the test.

Example strengthening
 def test_curated_mcp_is_a_small_fraction_of_the_rest_surface() -> None:
     mcp_ids = _operation_ids(mcp_app)
     rest_ids = _operation_ids(app)
+    assert "resolve_asset" in mcp_ids
+    assert "/resolver/resolve" in app.openapi()["paths"]
     assert len(mcp_ids) == EXPECTED_TOOLS
     assert len(rest_ids) == EXPECTED_REST_OPERATIONS
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_mcp_surface.py` around lines 22 - 23, The surface test currently
only checks aggregate tool and REST counts, so it can pass even if the new
resolver is not actually wired in. Update the assertions in test_mcp_surface.py
to explicitly verify the resolver surface by name, including resolve_asset and
its related MCP/REST exposure, alongside the existing totals. Use the existing
EXPECTED_TOOLS, EXPECTED_REST_OPERATIONS, and the test that enumerates the
surface to pin the new contract directly instead of relying on count-only
coverage.
pyproject.toml (1)

140-142: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Prefer a targeted Ruff suppression over a file-wide ignore.

This disables complexity checks for every future function in src/findata/resolver/engine.py, not just the intentional rule cascade. A function-level # noqa on the specific dispatcher(s) keeps the exception local and preserves signal for new helpers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pyproject.toml` around lines 140 - 142, Move the Ruff suppression out of the
pyproject.toml file-wide ignore and localize it to the intentional dispatcher in
src/findata/resolver/engine.py. Keep the complexity exceptions only on the
specific function(s) implementing the classification cascade, using a
function-level noqa on the relevant dispatcher symbol instead of suppressing
C901, PLR0912, and PLR0911 for the entire module.
src/findata/resolver/seed.py (1)

146-150: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Fail fast on duplicate curated identifiers.

These dict comprehensions silently let the last duplicate ticker/CNPJ win. One accidental duplicate in SEED_ENTRIES will change resolver output with no signal, which is risky for a hand-maintained taxonomy seed.

Proposed fix
+def _build_unique_index(entries: list[SeedEntry], attr: str) -> dict[str, SeedEntry]:
+    index: dict[str, SeedEntry] = {}
+    for entry in entries:
+        key = getattr(entry, attr)
+        if not key:
+            continue
+        if key in index:
+            raise ValueError(f"Duplicate seed {attr}: {key}")
+        index[key] = entry
+    return index
+
+
 SEED_ENTRIES: list[SeedEntry] = _build_etf_seed() + _GLOBAL_FUNDS
 
 # Index by ticker for O(1) hits (the common path).
-_BY_TICKER: dict[str, SeedEntry] = {e.ticker: e for e in SEED_ENTRIES if e.ticker}
-_BY_CNPJ: dict[str, SeedEntry] = {e.cnpj: e for e in SEED_ENTRIES if e.cnpj}
+_BY_TICKER: dict[str, SeedEntry] = _build_unique_index(SEED_ENTRIES, "ticker")
+_BY_CNPJ: dict[str, SeedEntry] = _build_unique_index(SEED_ENTRIES, "cnpj")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/findata/resolver/seed.py` around lines 146 - 150, The seed indexes in the
resolver silently overwrite duplicate tickers/CNPJs when building _BY_TICKER and
_BY_CNPJ from SEED_ENTRIES, so add an explicit duplicate check during
initialization and fail fast if any curated identifier repeats. Update the logic
around _build_etf_seed, _GLOBAL_FUNDS, and the _BY_TICKER/_BY_CNPJ construction
so duplicates are detected before the dicts are created, and raise a clear error
naming the repeated identifier and its SeedEntry source.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/MCP_SURFACE.md`:
- Line 58: The fenced block in the MCP_SURFACE documentation is missing a
language identifier and is triggering MD040; update that markdown fence to
include an appropriate language tag by locating the affected fenced block in the
docs content and adding the missing specifier so the Markdown linter passes.

In `@src/findata/api/mcp_app.py`:
- Around line 101-126: The resolve_asset_tool endpoint metadata is advertising
the wrong contract by including Intl/Internacional in the macro taxonomy. Update
the router summary and the resolve_asset_tool docstring to match the actual
resolver output and wording, keeping the taxonomy focused on the real macro
classes and moving any geography-related concept to exposure instead. Use the
resolve_asset_tool function and its response description as the source of truth
so the text clearly matches what resolve_asset returns.

In `@src/findata/api/routers/resolver.py`:
- Around line 27-33: Update the resolver route docstring in the resolver module
so the documented macro_class values no longer include Internacional. Keep the
description aligned with the current contract by reflecting that geography is
now represented via exposure, and make sure the text around the
resolver/classification behavior matches the symbols used by the API route (for
example, the resolver docstring and its macro_class field description).

In `@src/findata/resolver/__init__.py`:
- Around line 3-7: The public resolver docs still list Internacional as a macro
class, but the contract now uses geography only in exposure. Update the
documentation tied to resolve_asset in __init__.py so the macro taxonomy only
includes the actual macro_class values and describes Internacional as an
exposure attribute instead. Keep the rest of the resolver contract wording
aligned with the symbols resolve_asset and its returned classification fields.

In `@src/findata/resolver/engine.py`:
- Around line 440-475: The ticker-only fallback in the resolver engine is too
aggressive for ambiguous B3 suffixes and should not force every non-BDR symbol
into a specific asset class. Update the fallback logic in the ticker branch of
the resolver so only high-confidence cases like known BDR suffixes stay
classified, and ambiguous suffixes such as units or other non-share codes return
Indefinido to let the HITL/provider cascade decide. Keep the change localized
around the ticker suffix handling in the same resolver method and preserve the
existing confidence/notes pattern for truly safe classifications.
- Around line 223-248: The debenture classification in
classify()/resolve_asset() is setting tax-exempt fields too early on the
heuristic Lei 12.431 path. Update the logic around _infer_incentivada so that
deb["incentivada_1243"] and tax["isento"] are only populated when the basis is
explicit (provider-confirmed by ISIN), and leave them unset for the heuristic
issuer+IPCA branch. Keep the confidence handling and other classification fields
unchanged.
- Around line 557-565: The optional enrichment loop in resolve_asset() currently
lets a single provider exception abort the whole classification path. Wrap each
await provider(norm, result) call in per-provider exception handling inside the
provider loop, keep the last successful result unchanged on failure, and
continue to the next provider; preserve the existing early-stop check and the
cascade merge behavior in the resolve_asset logic.
- Around line 530-555: The deterministic offline classification path in
classify()/resolve_asset() ignores ISIN even though normalize() preserves it, so
ISIN-only inputs still resolve to Indefinido unless an external provider runs.
Update the core path to use norm.isin in the seed lookup and/or _rule_payload
logic, or remove isin from the public resolve_asset() surface if it is not meant
to be supported. Make the change in the classify and _rule_payload flow so
bare-ISIN assets are handled consistently without relying on providers.

In `@src/findata/resolver/models.py`:
- Around line 74-124: The resolver contract models currently allow unknown
fields to pass silently, which can hide typos and produce incomplete
classifications. Update IdentifierResolved, CvmInfo, DebentureInfo, TaxInfo, and
AssetClassification to use strict Pydantic config with extra fields forbidden,
ideally via a shared base or ConfigDict(extra="forbid"), so payloads passed into
DebentureInfo and TaxInfo fail validation on unexpected keys.

In `@src/findata/resolver/normalize.py`:
- Around line 61-68: The predicate helpers in Normalize are too dependent on
callers pre-folding inputs; update has_token() and name_contains() to normalize
their candidate arguments internally before matching against self.tokens and
self.name_folded. Keep the behavior of the Normalize class consistent by
applying the same folding/normalization logic used elsewhere in the resolver so
natural-string callers work correctly without silently missing matches.

In `@src/findata/resolver/seed.py`:
- Around line 154-168: The name-based lookup in lookup_seed is using raw
substring checks, so short markers like FIA can incorrectly match longer tokens
such as FIAGRO and return the wrong SeedEntry. Update the matching logic in
lookup_seed and/or the SeedEntry name_substrings handling to be token-aware or
boundary-aware for structural markers, while preserving the existing ticker/CNPJ
exact-match behavior and the fold-based normalization.

---

Outside diff comments:
In `@docs/MCP_SURFACE.md`:
- Around line 56-72: The section title and tool list are inconsistent because
the `findata_run_code` entry is still included under “The 25 curated tools,”
making the rendered count 26. Update the `MCP_SURFACE` tool catalog block by
either removing `findata_run_code` from this section and placing it elsewhere as
the opt-in code-mode tool, or renaming the heading to clearly indicate a 25+1
split so the count matches the listed tools.

---

Nitpick comments:
In `@pyproject.toml`:
- Around line 140-142: Move the Ruff suppression out of the pyproject.toml
file-wide ignore and localize it to the intentional dispatcher in
src/findata/resolver/engine.py. Keep the complexity exceptions only on the
specific function(s) implementing the classification cascade, using a
function-level noqa on the relevant dispatcher symbol instead of suppressing
C901, PLR0912, and PLR0911 for the entire module.

In `@src/findata/resolver/seed.py`:
- Around line 146-150: The seed indexes in the resolver silently overwrite
duplicate tickers/CNPJs when building _BY_TICKER and _BY_CNPJ from SEED_ENTRIES,
so add an explicit duplicate check during initialization and fail fast if any
curated identifier repeats. Update the logic around _build_etf_seed,
_GLOBAL_FUNDS, and the _BY_TICKER/_BY_CNPJ construction so duplicates are
detected before the dicts are created, and raise a clear error naming the
repeated identifier and its SeedEntry source.

In `@tests/test_mcp_surface.py`:
- Around line 22-23: The surface test currently only checks aggregate tool and
REST counts, so it can pass even if the new resolver is not actually wired in.
Update the assertions in test_mcp_surface.py to explicitly verify the resolver
surface by name, including resolve_asset and its related MCP/REST exposure,
alongside the existing totals. Use the existing EXPECTED_TOOLS,
EXPECTED_REST_OPERATIONS, and the test that enumerates the surface to pin the
new contract directly instead of relying on count-only coverage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6929b1a2-994c-4b49-8a04-fb9f422ff799

📥 Commits

Reviewing files that changed from the base of the PR and between b43c6d5 and 24e77d7.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • docs/MCP_SURFACE.md
  • pyproject.toml
  • src/findata/api/app.py
  • src/findata/api/mcp_app.py
  • src/findata/api/routers/resolver.py
  • src/findata/resolver/__init__.py
  • src/findata/resolver/engine.py
  • src/findata/resolver/models.py
  • src/findata/resolver/normalize.py
  • src/findata/resolver/seed.py
  • src/findata/resolver/taxonomy.py
  • tests/test_mcp_surface.py
  • tests/test_resolver.py

Comment thread docs/MCP_SURFACE.md Outdated
Comment thread src/findata/api/mcp_app.py
Comment thread src/findata/api/routers/resolver.py
Comment thread src/findata/resolver/__init__.py Outdated
Comment thread src/findata/resolver/engine.py Outdated
Comment thread src/findata/resolver/engine.py
Comment thread src/findata/resolver/engine.py
Comment thread src/findata/resolver/models.py Outdated
Comment thread src/findata/resolver/normalize.py Outdated
Comment thread src/findata/resolver/seed.py
robertoecf and others added 5 commits June 29, 2026 21:06
…tion

The flat incentivada_1243/isento bools cannot tell a structurally certain
infra signal (explicit name / FI-Infra ETF) apart from a weak issuer+IPCA
heuristic. Add two Literal status fields that carry that certainty:

- DebentureInfo.lei_12431_status: confirmed | candidate | not_applicable | unknown
- TaxInfo.isento_status: confirmed_exempt | candidate_exempt | confirmed_taxable | unknown

Existing bool fields are kept unchanged (spec/test contract). Engine debenture
rule, CRA/CRI and LCI/LCA statutory exemptions, and the IFRA11 seed now stamp
these statuses; Tesouro-backed RF ETFs get not_applicable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A non-debenture ETF should not carry a stub DebentureInfo just to hold
lei_12431_status=not_applicable; None is the honest 'no debenture facts'
shape. Status not_applicable remains on actual non-infra debentures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a machine-readable `signals` list to AssetClassification alongside the
free-text `notes` and coarse `cascade`, so an auditor sees WHICH rule fired and
WHAT concrete token/phrase/ticker matched. Each `_rule_payload` branch records
its rule id + real matched evidence (via `_first_matching_token`/`_phrase`
helpers); the curated-seed path synthesizes a `curated_seed` signal in
`classify()` without mutating the frozen seed entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cross-host CI reviewers (Gemini, CodeRabbit) on PR #33:
- Gate the FIP/Multimercado/FIA keyword paths ("PARTICIPACOES", "MACRO",
  "ACOES"/"EQUITY") on a fund context so company/trade names ("Randon
  Participações SA", "Macro Atacadista") are not misclassified as funds.
- Derive the Tesouro subclasse from the bond code when the name lacks the
  index word: NTN-B → Indexada à Inflação, LFT → Pós-fixada, LTN/NTN-F →
  Prefixada; unmapped public bonds → "Título Público", never "Crédito Privado".
- Drop the redundant second fold() in lookup_seed (input is pre-folded).
- Fix doc drift: macro_class no longer lists "Internacional" (it is the
  exposure axis) in the REST router, MCP tool summary/docstring, package
  docstring, and MCP_SURFACE.md; add a language to the fenced block.

6 new regression tests. 272 → 278 passed, 15 deselected. ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…deliverable)

Add docs/RESOLVER.md: the MCP/REST/Python contract the consolidator calls —
input, full output schema with the two-axis model (asset class + exposure),
fiscal-certainty status, cascade, verified test-set table, and the pre-prod
pending list.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@robertoecf robertoecf changed the title feat(resolver): asset classifier mapping identifiers to Wealthuman taxonomy feat(resolver): asset classifier mapping identifiers to specific taxonomy Jun 30, 2026
- models: strict ConfigDict(extra="forbid") base for the contract models, so a
  typo in an internally-built DebentureInfo/TaxInfo payload fails loudly instead
  of silently dropping a field.
- engine: per-provider try/except in the cascade — a raising provider logs
  provider_error:<Type> on the cascade and continues, never nuking the
  deterministic core result.
- engine: tighten the ticker fallback — only suffixes 3-8 map to Ações; other
  suffixes (subscription rights / odd codes) defer to Indefinido/HITL. Bare ...11
  stays FII (spec test set: HGLG11/MXRF11).
- seed: token-aware name-marker matching, so ("ARBOR","FIA") matches
  "ARBOR FIC FIA" but not "ARBOR FIAGRO" (FIA not a token of FIAGRO).
- normalize: fold candidate args inside has_token/name_contains so natural-string
  callers cannot silently miss rules.
- tests: regressions for each of the above.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant