fix: handle empty webpage content without 500 error by withsivram · Pull Request #3213 · firecrawl/firecrawl

withsivram · 2026-03-22T06:41:34Z

Summary

When scraping a valid but content-empty webpage (HTTP 200, no page errors, but no extractable text), the scraper now returns a successful response with empty content instead of throwing a misleading 500 SCRAPE_ALL_ENGINES_FAILED error.
The fix adds a third condition to the engine success check in scrapeURLLoopIter: if the status code is good (2xx/304) and there is no page-level error, treat the scrape as successful even when content is empty.
Existing behavior for pages with content, bad status codes, or page-level errors is unchanged.

Test plan

Scrape a valid but empty HTML page (e.g., <html><body></body></html>) — should return 200 with empty markdown, not 500
Scrape a normal page with content — should still return 200 with content (no regression)
Scrape a page that returns a 4xx/5xx status code — should still attempt fallback engines as before
Scrape a page where the engine reports a page-level error but status is 200 — should still try fallback engines

🤖 Generated with Claude Code

Summary by cubic

Return 200 with empty content for valid webpages that have no extractable text, instead of a 500 SCRAPE_ALL_ENGINES_FAILED.
Adds a success condition in scrapeURLLoopIter for good status (2xx/304) with no page-level error; fallback behavior for bad status or page-level errors is unchanged (fixes #2316).

^{Written for commit 1aadaa8. Summary will update on new commits.}

…2316) When scraping a valid but empty webpage (good HTTP status, no page errors, but no extractable content), the engine loop previously treated the empty result as an engine failure. All engines would fail this check, leading to a NoEnginesLeftError and a misleading 500 response with SCRAPE_ALL_ENGINES_FAILED. The fix adds a third condition to the success check: if the status code is good (2xx/304) and there is no page-level error, the scrape is treated as successful even when the content is empty. This returns a 200 with empty content instead of a 500 error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cubic-dev-ai

No issues found across 1 file

Copilot

Pull request overview

Updates the scrape engine loop success criteria so that valid, content-empty pages no longer surface as SCRAPE_ALL_ENGINES_FAILED (500), aligning the API response with the “empty but successful” outcome described in issue #2316.

Changes:

Treats 2xx/304 + no page-level error as a successful scrape even when extracted content is empty.
Adds targeted logging to distinguish “successful but empty” scrapes from typical successes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-22T06:45:52Z

apps/api/src/scraper/scrapeURL/index.ts

+    // When the page returns a good status code but has no content, this is a
+    // valid empty page — not an engine failure. We accept the result so the
+    // caller gets a 200 with empty content instead of a misleading 500
+    // SCRAPE_ALL_ENGINES_FAILED error. (fixes #2316)
+    if (isLongEnough || !isGoodStatusCode || (isGoodStatusCode && hasNoPageError)) {
+      if (!isLongEnough && isGoodStatusCode && hasNoPageError) {
+        meta.logger.info(
+          "Scrape via " + engine + " returned empty content with a good status code. Treating as successful empty page.",
+          {
+            factors: { isLongEnough, isGoodStatusCode, hasNoPageError },
+            statusCode: engineResult.statusCode,
+          },
+        );
+      } else {
+        meta.logger.info("Scrape via " + engine + " deemed successful.", {
+          factors: { isLongEnough, isGoodStatusCode, hasNoPageError },
+        });
+      }


The new success condition ... || (isGoodStatusCode && hasNoPageError) will treat any 2xx/304 response without engineResult.error as successful even when extracted markdown is empty. Since engines like fetch/index typically leave error undefined, this effectively disables the “try fallbacks when content is empty” behavior and can return empty content for pages where a later engine (e.g. Fire Engine / Playwright) would have produced text. Consider narrowing the empty-content acceptance to cases where you’ve exhausted fallbacks (or returning an “empty-but-acceptable” candidate only if all engines yield empty) so you still waterfall to other engines when the first result has no extractable content.

Suggested change

// When the page returns a good status code but has no content, this is a

// valid empty page — not an engine failure. We accept the result so the

// caller gets a 200 with empty content instead of a misleading 500

// SCRAPE_ALL_ENGINES_FAILED error. (fixes #2316)

if (isLongEnough || !isGoodStatusCode || (isGoodStatusCode && hasNoPageError)) {

if (!isLongEnough && isGoodStatusCode && hasNoPageError) {

meta.logger.info(

"Scrape via " + engine + " returned empty content with a good status code. Treating as successful empty page.",

{

factors: { isLongEnough, isGoodStatusCode, hasNoPageError },

statusCode: engineResult.statusCode,

},

);

} else {

meta.logger.info("Scrape via " + engine + " deemed successful.", {

factors: { isLongEnough, isGoodStatusCode, hasNoPageError },

});

}

// When the page returns a good status code but has no content, we do not

// treat this as a successful scrape here, so that the caller can continue

// to waterfall to other engines that may be able to extract content. The

// decision to ultimately accept an empty result should be made only after

// fallbacks are exhausted.

if (isLongEnough || !isGoodStatusCode) {

meta.logger.info("Scrape via " + engine + " deemed successful.", {

factors: { isLongEnough, isGoodStatusCode, hasNoPageError },

});

Copilot · 2026-03-22T06:45:52Z

apps/api/src/scraper/scrapeURL/index.ts

+    // When the page returns a good status code but has no content, this is a
+    // valid empty page — not an engine failure. We accept the result so the
+    // caller gets a 200 with empty content instead of a misleading 500
+    // SCRAPE_ALL_ENGINES_FAILED error. (fixes #2316)
+    if (isLongEnough || !isGoodStatusCode || (isGoodStatusCode && hasNoPageError)) {
+      if (!isLongEnough && isGoodStatusCode && hasNoPageError) {
+        meta.logger.info(
+          "Scrape via " + engine + " returned empty content with a good status code. Treating as successful empty page.",
+          {
+            factors: { isLongEnough, isGoodStatusCode, hasNoPageError },
+            statusCode: engineResult.statusCode,
+          },
+        );


This change introduces a new “empty content but successful” path for good status codes with no engineResult.error, but there doesn’t appear to be a regression test asserting that an empty (but valid) HTML page returns success: true with empty markdown (and that we don’t raise SCRAPE_ALL_ENGINES_FAILED). Adding a unit test around this branch would help prevent future changes from reintroducing the 500 behavior.

Copilot · 2026-03-22T06:45:53Z

apps/api/src/scraper/scrapeURL/index.ts

+    if (isLongEnough || !isGoodStatusCode || (isGoodStatusCode && hasNoPageError)) {
+      if (!isLongEnough && isGoodStatusCode && hasNoPageError) {


PR description/test plan says 4xx/5xx status codes should still attempt fallback engines, but this success check (and the prior behavior) returns immediately when !isGoodStatusCode is true, so the loop will typically not try additional engines for 4xx/5xx responses (except the special proxy-handling case above). Please align the PR description/test plan with the actual behavior, or adjust the logic if fallback-on-bad-status is intended.

withsivram requested a review from mogery as a code owner March 22, 2026 06:41

Copilot AI review requested due to automatic review settings March 22, 2026 06:41

Copilot started reviewing on behalf of withsivram March 22, 2026 06:42 View session

cubic-dev-ai bot approved these changes Mar 22, 2026

View reviewed changes

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: handle empty webpage content without 500 error#3213

fix: handle empty webpage content without 500 error#3213
withsivram wants to merge 1 commit intofirecrawl:mainfrom
withsivram:fix/empty-page-500-error-issue-2316

withsivram commented Mar 22, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if (isLongEnough \|\| !isGoodStatusCode \|\| (isGoodStatusCode && hasNoPageError)) {
		if (!isLongEnough && isGoodStatusCode && hasNoPageError) {

Conversation

withsivram commented Mar 22, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by cubic

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

withsivram commented Mar 22, 2026 •

edited by cubic-dev-ai bot

Loading