feat: add fastCRW reader by us · Pull Request #23 · InternScience/InternAgent

us · 2026-06-14T23:47:35Z

What

Adds fastCRW as a web scrape/search provider, alongside the existing Firecrawl integration — additive, mirrors the Firecrawl wiring (Firecrawl untouched).

Why

fastCRW is a genuinely different engine, not a Firecrawl alias:

Runs 100% locally with stealth + JS rendering in the open core. Anti-bot handling (Cloudflare JS-challenge, UA rotation), BYO-proxy + rotation, SPA rendering, and an HTTP→headless→proxy fallback ladder all ship in the AGPL binary. Firecrawl's OSS self-host gates its stealth engine (fire-engine) behind a cloud-only flag, so its self-hosted instance falls back to plain fetch/Playwright and can't reach protected or JS-heavy sites. fastCRW's open core can.

Faster and higher recall on Firecrawl's own benchmark dataset. Truth-recall 63.74% vs 56.04%; median latency ~1.9 s vs ~2.3 s (p50). Ships as a single ~8 MB Rust binary using ~6 MB RAM — no multi-service stack.

Search is built on SearXNG, not an alternative to it. SearXNG is the metasearch aggregator underneath; crw adds a quality layer on top: query expansion (multi-variant rewrite), content-aware reranking (re-scoring by fetched content instead of SearXNG's content-blind ordering), and category routing (research queries fan out to arxiv/semantic scholar/Google Scholar, code queries to GitHub). You get SearXNG's breadth plus a measurable accuracy layer — all open-source (AGPL) and self-hostable with configurable engines.

Flat pricing: 1 credit = 1 page, no stealth surcharge, no billed-on-failure. Key via CRW_API_KEY (free tier at https://fastcrw.com/dashboard); self-host base URL supported.

The integration diff is small precisely because fastCRW exposes a Firecrawl-compatible API — that compatibility is the reason this is additive rather than a larger change, not the reason to use it.

Happy to adjust to your conventions — I maintain the integration and can provide free credits to evaluate.

gemini-code-assist

Code Review

This pull request introduces a new Crw loader class in internagent/mas/agents/dr_agents/camel/loaders/crw_reader.py to integrate the fastCRW web scraper, exposing it in the package's __init__.py. The review feedback suggests simplifying the parameter unpacking in the crawl method to use **(params or {}) for consistency, and adding a type check in structured_scrape to ensure data is a dictionary before calling .get() to prevent potential AttributeErrors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-14T23:48:22Z

+            crawl_response = self.app.crawl(
+                url=url,
+                **({} if params is None else params),
+                **kwargs,
+            )


For consistency with the scrape and map_site methods in this class, use **(params or {}) instead of the more verbose inline conditional expression.

Suggested change

crawl_response = self.app.crawl(

url=url,

**({} if params is None else params),

**kwargs,

)

crawl_response = self.app.crawl(

url=url,

**(params or {}),

**kwargs,

)

gemini-code-assist · 2026-06-14T23:48:22Z

+            data = self.app.scrape(
+                url,
+                formats=['json'],
+                jsonSchema=response_format.model_json_schema(),
+            )
+            return data.get('json', {})


If self.app.scrape returns None or a non-dictionary response (e.g., due to an unexpected API failure or empty response), calling .get() on it will raise an AttributeError. Adding a type check ensures robustness.

Suggested change

data = self.app.scrape(

url,

formats=['json'],

jsonSchema=response_format.model_json_schema(),

)

return data.get('json', {})

data = self.app.scrape(

url,

formats=['json'],

jsonSchema=response_format.model_json_schema(),

)

return data.get('json', {}) if isinstance(data, dict) else {}

feat: add fastCRW reader

d1a84a6

gemini-code-assist Bot reviewed Jun 14, 2026

View reviewed changes

refactor(crw): simplify crawl params and guard structured_scrape return

98a1e07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add fastCRW reader#23

feat: add fastCRW reader#23
us wants to merge 2 commits into
InternScience:mainfrom
us:feat/add-fastcrw

us commented Jun 14, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

us commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

us commented Jun 14, 2026 •

edited

Loading