feat: add fastCRW tool block by us · Pull Request #5025 · simstudioai/sim

us · 2026-06-13T17:21:36Z

What

Adds fastCRW as a tool block (scrape / crawl / map / search), mirroring the existing Firecrawl block.

Why

fastCRW is a Firecrawl-API-compatible web engine that ships as a single ~8MB binary, available self-hosted for free or as a managed cloud service. It offers flat pricing (1 credit = 1 page, no 4x stealth surcharge, no charge for failed requests) and anti-bot stealth at no extra cost, making it a drop-in alternative to the Firecrawl block for Sim workflows.

Changes (additive only)

apps/sim/tools/crw/: scrape/crawl/map/search + types (mirrors tools/firecrawl/).
apps/sim/blocks/blocks/crw.ts + registered in blocks/registry.ts, tools/registry.ts.
Icon, CSP allowlist entry, BYOK key entry, integrations.json — every place Firecrawl is registered.

Config

CRW_API_KEY from https://fastcrw.com/dashboard (free tier); base URL overridable for self-host.

Why fastCRW — beyond Firecrawl compatibility

The common assumption: "Firecrawl is open-source, so self-hosting it gets you the same thing." It doesn't.

Firecrawl's real anti-bot and Cloudflare-bypass path runs through fire-engine, which is cloud-only — the self-hosted build falls back to plain fetch / Playwright and cannot get past Cloudflare or most JS-heavy, protected sites. It also requires a multi-service stack (Redis + workers + Playwright) to run at all.

fastCRW ships the full capability set in its open core (AGPL): Cloudflare JS-challenge handling, UA rotation, SPA rendering, BYO-proxy with rotation, and an HTTP → headless → proxy fallback ladder — one binary, no cloud dependency, no asterisks.

Practical upshot for Sim users: self-hosted Sim + fastCRW is a genuinely complete, cloud-free scrape/crawl/search stack — something you cannot actually build today with Firecrawl's OSS. For the managed path, flat per-page pricing (no stealth surcharge, no charge on failure) makes cost predictable for crawl- and search-heavy workflows.

Happy to adjust — I maintain the integration and can provide free credits.

vercel · 2026-06-13T17:21:43Z

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Jun 13, 2026 7:40pm

cursor · 2026-06-13T17:21:46Z

PR Summary

Low Risk
Additive integration with BYOK-only outbound API calls; crawl polling could hold a workflow slot until timeout, similar to other async crawl providers.

Overview
Adds fastCRW as a Firecrawl-compatible web data integration so workflows can scrape, search, crawl, and map sites via managed cloud (https://fastcrw.com/api) or a self-hosted Base URL.

New crw workflow block routes operations to four tools (crw_scrape, crw_search, crw_crawl, crw_map) with BYOK API keys (CRW_API_KEY / workspace BYOK provider crw). Crawl creates an async job and polls until completion or the execution timeout.

Wiring is additive across block/tool registries, integrations.json, icon mapping, CSP connect-src for fastcrw.com, and the Search & web BYOK settings section. Vitest coverage exercises URL resolution, request shaping, and response mapping.

^{Reviewed by Cursor Bugbot for commit 056eac2. Bugbot is set up for automated code reviews on this repo. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

^{Reviewed by Cursor Bugbot for commit 2964aed. Configure here.}

cursor · 2026-06-13T17:24:08Z

+        pages: [],
+        total: 0,
+      },
+    }


Crawl job id wrong path

High Severity

The crawl create handler sets jobId from the top-level id on the JSON body, but fastCRW’s documented POST /v1/crawl response puts the job id under the nested data object. jobId stays undefined, so status polling hits /v1/crawl/undefined and the crawl operation fails end-to-end.
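
A minimal sketch of one way to address this finding, assuming the create response may carry the job id either under a nested `data` object (as the finding describes) or at the top level (Firecrawl-style). The `CrawlCreateResponse` type and `extractJobId` helper are hypothetical names for illustration, not code from this PR.

```typescript
// Hypothetical shape of the POST /v1/crawl response body (assumption).
interface CrawlCreateResponse {
  success?: boolean
  id?: string
  data?: { id?: string }
  error?: string
}

// Prefer the nested id, then fall back to a top-level id.
// Returns undefined when neither is present so the caller can fail fast.
function extractJobId(body: CrawlCreateResponse): string | undefined {
  return body.data?.id ?? body.id
}
```

Reading both locations keeps the handler tolerant of either response shape, at the cost of hiding which one the server actually sent.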

cursor · 2026-06-13T17:24:08Z

+          formats: params.formats || ['markdown'],
+          onlyMainContent: params.onlyMainContent || false,
+        },
+      }


Crawl sends maxPages not limit

Medium Severity

The crawl request body sends maxPages, while fastCRW’s Firecrawl-compatible POST /v1/crawl expects limit for the page cap. The block’s Max Pages value is ignored and the service falls back to its default crawl size.
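
A sketch of the request shaping this finding suggests, assuming the block keeps its `maxPages` parameter and only the wire format changes. The `CrawlParams` type, the `buildCrawlBody` helper, and the nested `scrapeOptions` key are assumptions for illustration; the actual body builder in the PR may differ.

```typescript
// Hypothetical subset of the crawl tool's parameters (assumption).
interface CrawlParams {
  url: string
  maxPages?: number
  formats?: string[]
  onlyMainContent?: boolean
}

// Map the block-level `maxPages` onto the `limit` field named in the finding.
function buildCrawlBody(params: CrawlParams): Record<string, unknown> {
  const body: Record<string, unknown> = {
    url: params.url,
    // `scrapeOptions` is assumed here, following the Firecrawl v1 convention.
    scrapeOptions: {
      formats: params.formats ?? ['markdown'],
      onlyMainContent: params.onlyMainContent ?? false,
    },
  }
  // Only send a cap when the user set one, so the service default still applies otherwise.
  if (params.maxPages !== undefined) {
    body.limit = Number(params.maxPages)
  }
  return body
}
```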

greptile-apps · 2026-06-13T17:28:16Z

Greptile Summary

This PR adds fastCRW as a new tool block (scrape / crawl / map / search), mirroring the existing Firecrawl block. The integration is additive-only: new files under tools/crw/ and blocks/blocks/crw.ts, plus registration in the block/tool registries, BYOK keys, CSP allowlist, icon, and integrations.json.

Four tool configs (crw_scrape, crw_search, crw_crawl, crw_map) mirror Firecrawl's structure with fastCRW-specific differences: maxPages instead of limit for crawl, a dynamic baseUrl param for self-hosting, and a resolveCrwBaseUrl helper.
Registration is complete across all required locations (BYOK schema, type union, CSP, icon mapping, integrations JSON), and a test file covers URL construction, body building, and response transformation for all four operations.

Confidence Score: 4/5

The change is purely additive and isolated to new files; no existing functionality is modified. The three tools with hardcoded success responses will silently swallow API-level errors, but they won't cause data corruption or affect other blocks.

Three of the four new tools (scrape, search, crawl) always return success: true from transformResponse even when the API body indicates failure — the crawl case is the worst because an undefined jobId leads the poll loop to request /v1/crawl/undefined, masking the real error. The fourth tool (map) handles this correctly, making the inconsistency self-contained within this PR. No other part of the codebase is touched.

apps/sim/tools/crw/scrape.ts, apps/sim/tools/crw/search.ts, apps/sim/tools/crw/crawl.ts — the transformResponse functions in all three need to check data.success before reporting a successful result.

Important Files Changed

Filename	Overview
apps/sim/blocks/blocks/crw.ts	New block config mirroring Firecrawl; routes scrape/search/crawl/map to the correct crw_* tools, formats params, and exposes baseUrl for self-hosting. Clean and consistent with existing block patterns.
apps/sim/tools/crw/scrape.ts	Scrape tool is structurally correct but hardcodes `success: true` in transformResponse regardless of API-level errors, unlike map.ts which properly checks data.success.
apps/sim/tools/crw/search.ts	Search tool also hardcodes `success: true` in transformResponse; same inconsistency with map.ts. Additionally, `limit` and `sources` params are used in the body builder but not declared in the tool's params definition (though this mirrors the Firecrawl search pattern).
apps/sim/tools/crw/crawl.ts	Crawl tool implements async polling correctly, but transformResponse ignores data.success — if job creation returns HTTP 200 with success:false, postProcess will poll /v1/crawl/undefined leading to a confusing 404 error instead of the real failure.
apps/sim/tools/crw/map.ts	Map tool correctly checks data.success in transformResponse and handles missing links with a fallback array. Well-structured and complete.
apps/sim/tools/crw/types.ts	Comprehensive type definitions and output property constants. Clean mirror of the Firecrawl types, with appropriate additions for fastCRW-specific fields.
apps/sim/tools/crw/crw.test.ts	Good coverage of URL construction, body building, and response transformation for all four operations. Tests document the expected API response shapes clearly.
apps/sim/lib/core/security/csp.ts	Adds https://fastcrw.com to connect-src allowlist. Covers the full domain/origin, which is sufficient since the API lives at /api/v1/* on the same origin.
apps/sim/tools/crw/base-url.ts	Clean utility for resolving the base URL, with trailing-slash stripping and a sensible default. Well-tested.
apps/sim/lib/api/contracts/byok-keys.ts	Correctly adds 'crw' to the BYOK provider ID zod schema enum.
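
For readers unfamiliar with the base-URL handling summarized for base-url.ts, the behavior described (a default plus trailing-slash stripping) could look roughly like the sketch below. The function name `resolveBaseUrlSketch` is hypothetical, and the default value is taken from the managed cloud URL quoted in the PR summary; the real `resolveCrwBaseUrl` helper may behave differently.

```typescript
// Default taken from the managed cloud URL mentioned in this PR (assumption for the sketch).
const DEFAULT_CRW_BASE_URL = 'https://fastcrw.com/api'

// Use the override when provided, otherwise the default; strip trailing slashes
// so that `${baseUrl}/v1/scrape` never contains a double slash.
function resolveBaseUrlSketch(baseUrl?: string): string {
  const trimmed = baseUrl?.trim()
  if (!trimmed) return DEFAULT_CRW_BASE_URL
  return trimmed.replace(/\/+$/, '')
}
```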

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CrwBlock - crw.ts] -->|operation=scrape| B[crw_scrape tool]
    A -->|operation=search| C[crw_search tool]
    A -->|operation=crawl| D[crw_crawl tool]
    A -->|operation=map| E[crw_map tool]

    B --> F["POST /v1/scrape\n(fastcrw.com/api)"]
    C --> G["POST /v1/search\n(fastcrw.com/api)"]
    D --> H["POST /v1/crawl\n(fastcrw.com/api)"]
    E --> I["POST /v1/map\n(fastcrw.com/api)"]

    D -->|async job| J[postProcess polling loop]
    J --> K["GET /v1/crawl/{jobId}"]
    K -->|completed| L[Return pages + total]
    K -->|failed| M[Return error]
    K -->|timeout| N[Return timeout error]

    B --> O[transformResponse - always success:true]
    C --> P[transformResponse - always success:true]
    E --> Q[transformResponse - checks data.success]
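
The polling branch of the flowchart (completed, failed, timeout) can be sketched as a small decision function plus a loop. This is an illustration only: the status strings, the `nextPollAction` and `pollUntilSettled` names, and the injected `getStatus` and `sleep` callbacks are assumptions, not the PR's postProcess implementation.

```typescript
type PollAction = 'done' | 'failed' | 'timeout' | 'wait'

// Decide what to do after one status check.
function nextPollAction(status: string, elapsedMs: number, timeoutMs: number): PollAction {
  if (status === 'completed') return 'done'
  if (status === 'failed') return 'failed'
  if (elapsedMs >= timeoutMs) return 'timeout'
  return 'wait'
}

// Poll until the job settles or the time budget is spent.
// `getStatus` would wrap GET /v1/crawl/{jobId}; `sleep` is injected so tests need no real delay.
async function pollUntilSettled(
  getStatus: () => Promise<string>,
  sleep: (ms: number) => Promise<void>,
  timeoutMs: number,
  intervalMs: number
): Promise<PollAction> {
  let elapsedMs = 0
  for (;;) {
    const action = nextPollAction(await getStatus(), elapsedMs, timeoutMs)
    if (action !== 'wait') return action
    await sleep(intervalMs)
    elapsedMs += intervalMs
  }
}
```

Keeping the decision separate from the loop makes the timeout and failure branches easy to cover in unit tests.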

Comments Outside Diff (1)

apps/sim/tools/crw/crawl.ts, lines 623-634

transformResponse ignores API-level job creation failure

If the crawl POST returns HTTP 200 with { success: false, error: "…" }, transformResponse still returns success: true with jobId: undefined. postProcess then checks if (!result.success) (passes), and proceeds to poll ${baseUrl}/v1/crawl/undefined, which returns a 404 and surfaces a confusing "Failed to get crawl status: Not Found" error rather than the original creation error. Guard against this by checking data.success (or at least data.id) in transformResponse before the poll loop begins.
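
One possible shape for that guard is sketched below. The `CrawlCreateBody` and `CrawlCreateResult` types, the `toCrawlCreateResult` name, and the fallback error text are hypothetical; the sketch reads `data.id` as this comment does, which may not match where the API actually places the id.

```typescript
// Hypothetical parsed body of the crawl creation response (assumption).
interface CrawlCreateBody {
  success?: boolean
  id?: string
  error?: string
}

interface CrawlCreateResult {
  success: boolean
  output: { jobId?: string }
  error?: string
}

// Fail fast when the API reports failure or returns no job id,
// so the poll loop never requests /v1/crawl/undefined.
function toCrawlCreateResult(data: CrawlCreateBody): CrawlCreateResult {
  if (data.success === false || !data.id) {
    return {
      success: false,
      output: {},
      error: data.error ?? 'Crawl job creation returned no job id',
    }
  }
  return { success: true, output: { jobId: data.id } }
}
```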

_{Reviews (1): Last reviewed commit: "feat: add fastCRW tool block" | Re-trigger Greptile}

greptile-apps · 2026-06-13T17:28:20Z

+    const result = data.data ?? data
+
+    return {
+      success: true,
+      output: {
+        markdown: result.markdown,
+        html: result.html,
+        metadata: result.metadata,
+      },
+    }
+  },
+
+  outputs: {


Scrape/search always report success: true regardless of API error body

Both scrape.ts and search.ts hardcode success: true in transformResponse. The map.ts counterpart correctly propagates data.success. When the fastCRW API returns HTTP 200 with { success: false, error: "…" } (e.g., invalid URL or auth error), the scrape and search tools will still emit success: true with undefined output fields, masking the failure from downstream blocks. map.ts shows the correct pattern: return success: data.success and reflect it in the output envelope.
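
A hedged sketch of what propagating the API-level status could look like for scrape, keeping the `data.data ?? data` unwrapping from the diff above. The type names, the `toScrapeResult` helper, and the fallback error message are assumptions for illustration, not the code in map.ts or in this PR.

```typescript
interface ScrapePayload {
  markdown?: string
  html?: string
  metadata?: Record<string, unknown>
}

// Hypothetical response body: payload either nested under `data` or flat (assumption).
interface ScrapeBody extends ScrapePayload {
  success?: boolean
  error?: string
  data?: ScrapePayload
}

interface ScrapeResult {
  success: boolean
  output: ScrapePayload
  error?: string
}

// Report failure when the body says so, instead of hardcoding success: true.
function toScrapeResult(data: ScrapeBody): ScrapeResult {
  if (data.success === false) {
    return { success: false, output: {}, error: data.error ?? 'Scrape request failed' }
  }
  const result: ScrapePayload = data.data ?? data
  return {
    success: true,
    output: { markdown: result.markdown, html: result.html, metadata: result.metadata },
  }
}
```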

greptile-apps · 2026-06-13T17:28:21Z

+  transformResponse: async (response: Response) => {
+    const data = await response.json()
+
+    return {
+      success: true,
+      output: {
+        data: data.data,
+      },
+    }
+  },


Search always reports success: true on API-level failures

Same issue as scrape.ts — transformResponse always returns success: true without checking data.success. The map.ts tool in this same PR correctly checks data.success. If the search API returns { success: false, error: "…" } with HTTP 200, downstream blocks see a successful result with data: undefined rather than a proper error.

scrape, search, and crawl transformResponse hardcoded success: true, masking HTTP 200 responses with { success: false, error }. They now reflect data.success and surface the error, matching map.ts. Crawl additionally fails fast when job creation has no id, preventing a poll loop against /v1/crawl/undefined. Adds error-path tests.

feat: add fastCRW tool block

2964aed

vercel Bot temporarily deployed to Preview June 13, 2026 17:21 Inactive

cursor Bot reviewed Jun 13, 2026

greptile-apps Bot reviewed Jun 13, 2026

vercel Bot temporarily deployed to Preview June 13, 2026 19:40 Inactive

feat: add fastCRW tool block #5025
us wants to merge 2 commits into simstudioai:main from us:feat/add-fastcrw

1 participant
