fix: return successful scrapes for empty 2xx pages by tsubasakong · Pull Request #3131 · firecrawl/firecrawl

tsubasakong · 2026-03-12T21:21:48Z

Summary

- treat 2xx scrape results with no page error and no extractable body text as successful instead of cascading into SCRAPE_ALL_ENGINES_FAILED
- keep the existing waterfall behavior for genuinely unsuccessful scrapes, but include the empty-page factor in the quality logs
- add a focused regression test for the empty-page text detection helper
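
The change in acceptance behavior described above can be sketched roughly as follows. This is an illustration only, not the PR's actual code: the `EngineResult` shape, `acceptedBefore`, `acceptedAfter`, and the simplified `hasNoExtractableTextSketch` helper are all hypothetical names, and the "before" rule is an assumption about how the quality check behaved. Only `isExplicitlyEmptyPage` echoes a name mentioned in the PR summary.

```typescript
// Hypothetical result shape; firecrawl's real engine result type may differ.
interface EngineResult {
  statusCode: number;
  html: string;
  pageError?: string;
}

function is2xx(statusCode: number): boolean {
  return statusCode >= 200 && statusCode < 300;
}

// Simplified stand-in for the PR's hasNoExtractableText helper.
function hasNoExtractableTextSketch(html: string): boolean {
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-rendered blocks
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/&nbsp;/gi, " "); // treat &nbsp; as whitespace
  return text.trim().length === 0;
}

// A 2xx response with no page error and no text is "explicitly empty".
function isExplicitlyEmptyPage(result: EngineResult): boolean {
  return (
    is2xx(result.statusCode) &&
    !result.pageError &&
    hasNoExtractableTextSketch(result.html)
  );
}

// Assumed old rule: a result needed some text to count as a success.
function acceptedBefore(result: EngineResult): boolean {
  return (
    is2xx(result.statusCode) &&
    !result.pageError &&
    !hasNoExtractableTextSketch(result.html)
  );
}

// New rule: explicitly empty pages are also accepted; everything else
// still falls through to the next engine in the waterfall.
function acceptedAfter(result: EngineResult): boolean {
  return acceptedBefore(result) || isExplicitlyEmptyPage(result);
}
```

Under these assumptions, a `200` response whose body contains only whitespace is rejected by `acceptedBefore` but accepted by `acceptedAfter`, while a `500` response or a result carrying a page error is rejected by both.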

Testing

pnpm exec jest src/scraper/scrapeURL/empty-page.test.ts --runInBand
git diff --check

Notes

no repo PR template was present in .github/ at checkout time

Summary by cubic

Treat 2xx scrapes with no page error and no extractable text as successful to prevent false SCRAPE_ALL_ENGINES_FAILED and avoid unnecessary fallbacks. Adds hasNoExtractableText and updates quality logs and tests.

Bug Fixes
- Mark explicitly empty pages (2xx, no error, no text) as successful; keep waterfall for real failures or bad status codes.
- Include isExplicitlyEmptyPage in success/failure and proxy-adequacy logs.
- Add a focused regression test for hasNoExtractableText.

Written for commit a6c4229. Summary will update on new commits.

cubic-dev-ai

1 issue found across 3 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/api/src/scraper/scrapeURL/emptyPage.ts">

<violation number="1" location="apps/api/src/scraper/scrapeURL/emptyPage.ts:2">
P2: Empty-page detection can produce false negatives (head-only/bodyless docs and non-`&nbsp;` invisible entities), causing valid empty 2xx scrapes to be marked unsuccessful.</violation>
</file>


cubic-dev-ai · 2026-03-12T21:28:16Z

apps/api/src/scraper/scrapeURL/emptyPage.ts

@@ -0,0 +1,16 @@
+export function hasNoExtractableText(html: string): boolean {
+  const body = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i)?.[1] ?? html;


P2: Empty-page detection can produce false negatives (head-only/bodyless docs and non-`&nbsp;` invisible entities), causing valid empty 2xx scrapes to be marked unsuccessful.

Prompt for AI agents

Check if this issue is valid; if so, understand the root cause and fix it. At apps/api/src/scraper/scrapeURL/emptyPage.ts, line 2: <comment>Empty-page detection can produce false negatives (head-only/bodyless docs and non-`&nbsp;` invisible entities), causing valid empty 2xx scrapes to be marked unsuccessful.</comment> <file context> @@ -0,0 +1,16 @@ +export function hasNoExtractableText(html: string): boolean { + const body = html.match(/<body\b[^>]*>([\s\S]*?)<\/body>/i)?.[1] ?? html; + + const text = body </file context>
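
One way the two false-negative cases raised in this review could be handled is sketched below. It is not the PR's implementation: `hasNoExtractableTextStrict`, `stripInvisibleNumericEntities`, and the entity and code-point lists are hypothetical and deliberately partial. The idea is to drop `<head>` and other non-rendered blocks before stripping tags (so a bodyless document's `<title>` does not count as text), and to treat invisible named and numeric entities as whitespace.

```typescript
// Named entities that render as whitespace or nothing (partial, illustrative list).
const INVISIBLE_NAMED_ENTITIES = /&(nbsp|ensp|emsp|thinsp|zwnj|zwj|lrm|rlm|shy);/gi;

// Whitespace plus a few zero-width and formatting characters.
const INVISIBLE_CHARS = /[\s\u00A0\u00AD\u200B-\u200F\u2060\uFEFF]/g;

// Code points considered invisible when written as numeric entities.
const INVISIBLE_CODE_POINTS = [
  0x09, 0x0a, 0x0d, 0x20, 0xa0, 0xad,
  0x200b, 0x200c, 0x200d, 0x200e, 0x200f, 0x2060, 0xfeff,
];

// Replace numeric entities such as &#160; or &#x200B; with a space when they
// encode an invisible code point; leave all other entities untouched.
function stripInvisibleNumericEntities(input: string): string {
  return input.replace(/&#(x[0-9a-f]+|\d+);/gi, (match: string, code: string) => {
    const isHex = code.charAt(0).toLowerCase() === "x";
    const value = parseInt(isHex ? code.slice(1) : code, isHex ? 16 : 10);
    return INVISIBLE_CODE_POINTS.indexOf(value) !== -1 ? " " : match;
  });
}

function hasNoExtractableTextStrict(html: string): boolean {
  const withoutMarkup = html
    .replace(/<!--[\s\S]*?-->/g, " ") // comments
    .replace(/<(head|script|style|template)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ") // non-rendered blocks
    .replace(/<[^>]+>/g, " ") // remaining tags
    .replace(INVISIBLE_NAMED_ENTITIES, " ");
  const text = stripInvisibleNumericEntities(withoutMarkup);
  return text.replace(INVISIBLE_CHARS, "").length === 0;
}
```

A regex-based approach like this stays close to the style of the helper in the diff, but it remains a heuristic; an HTML parser would be more reliable for malformed markup. Note that this sketch also reports image-only pages as having no extractable text, which may or may not be the desired behavior.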


fix: treat empty 2xx pages as successful scrapes

1e20407

tsubasakong requested a review from mogery as a code owner March 12, 2026 21:21

cubic-dev-ai bot reviewed Mar 12, 2026


Merge remote-tracking branch 'upstream/main' into fix/2316-empty-page…

a6c4229

…-success

tsubasakong wants to merge 2 commits into firecrawl:main from tsubasakong:fix/2316-empty-page-success
