Conversation
|
Found 4 test failures on Blacksmith runners: Failures
|
There was a problem hiding this comment.
18 issues found across 45 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/java-sdk/src/main/java/com/firecrawl/models/ParseFile.java">
<violation number="1" location="apps/java-sdk/src/main/java/com/firecrawl/models/ParseFile.java:21">
P2: Return a defensive copy from `getContent()` so callers cannot mutate the stored file bytes.</violation>
</file>
<file name="apps/api/src/routes/v2.ts">
<violation number="1" location="apps/api/src/routes/v2.ts:229">
P2: Check credits before buffering the multipart file. As written, exhausted accounts can still force up to 50 MB into memory before the request is rejected.</violation>
</file>
<file name="apps/js-sdk/firecrawl/src/v2/client.ts">
<violation number="1" location="apps/js-sdk/firecrawl/src/v2/client.ts:143">
P2: Add the same schema-aware overloads as `scrape`; otherwise `parse(..., { formats: [{ type: 'json', schema }] })` always returns `json` as `unknown` to TypeScript callers.</violation>
</file>
<file name="apps/python-sdk/firecrawl/__tests__/e2e/v2/aio/test_aio_parse.py">
<violation number="1" location="apps/python-sdk/firecrawl/__tests__/e2e/v2/aio/test_aio_parse.py:13">
P2: Treat whitespace-only `API_KEY`/`API_URL` as missing in the skip condition so this e2e test does not run with invalid environment values.
(Based on your team's feedback about rejecting empty and whitespace-only API keys.) [FEEDBACK_USED]</violation>
</file>
<file name="apps/python-sdk/firecrawl/__tests__/unit/v2/methods/test_parse_request_preparation.py">
<violation number="1" location="apps/python-sdk/firecrawl/__tests__/unit/v2/methods/test_parse_request_preparation.py:51">
P2: Avoid a hard-coded “missing” file path here; it makes the test environment-dependent and potentially flaky if that path ever exists.</violation>
</file>
<file name="apps/python-sdk/firecrawl/__tests__/e2e/v2/test_parse.py">
<violation number="1" location="apps/python-sdk/firecrawl/__tests__/e2e/v2/test_parse.py:13">
P2: Trim the environment variables in the skip condition so whitespace-only `API_KEY`/`API_URL` values do not run this e2e test.
(Based on your team's feedback about rejecting whitespace-only API keys during validation.) [FEEDBACK_USED]</violation>
</file>
<file name="apps/java-sdk/src/test/java/com/firecrawl/FirecrawlClientTest.java">
<violation number="1" location="apps/java-sdk/src/test/java/com/firecrawl/FirecrawlClientTest.java:235">
P2: Assert the uploaded HTML content appears in the parsed markdown; a non-empty check alone can let this E2E test pass on unrelated output.</violation>
</file>
<file name="apps/python-sdk/firecrawl/v2/client.py">
<violation number="1" location="apps/python-sdk/firecrawl/v2/client.py:191">
P3: Include `pathlib.Path` in the new `parse()` file type so the public SDK signature matches the implementation.</violation>
</file>
<file name="apps/python-sdk/firecrawl/v2/methods/aio/parse.py">
<violation number="1" location="apps/python-sdk/firecrawl/v2/methods/aio/parse.py:18">
P1: Duplicate function `_prepare_file_payload` exists in both `parse.py` and `aio/parse.py` with identical implementation. The two helper functions share the same name, same logic, same docstring structure, and same return type. This violates DRY principles and creates a maintenance burden where any bug fix or enhancement to file payload preparation must be manually synchronized in both files.</violation>
<violation number="2" location="apps/python-sdk/firecrawl/v2/methods/aio/parse.py:27">
P2: Avoid synchronous file reads in the async `parse` path; large uploads will block the event loop and reduce concurrency.</violation>
</file>
<file name="apps/python-sdk/firecrawl/v2/utils/http_client.py">
<violation number="1" location="apps/python-sdk/firecrawl/v2/utils/http_client.py:134">
P2: Merge caller headers into multipart-safe defaults; otherwise passing `_prepare_headers(...)` into `post_multipart` sends `Content-Type: application/json` on a multipart body.</violation>
</file>
<file name="apps/api/src/services/logging/log_job.ts">
<violation number="1" location="apps/api/src/services/logging/log_job.ts:236">
P2: Parse jobs are still persisted to GCS as `scrape`, so the new `parses` path produces inconsistent metadata.</violation>
</file>
<file name="apps/api/src/__tests__/snips/v2/parse.test.ts">
<violation number="1" location="apps/api/src/__tests__/snips/v2/parse.test.ts:256">
P3: Do not swallow Supabase query errors inside the polling callback; fail immediately so the real DB error is visible.</violation>
</file>
<file name="apps/js-sdk/firecrawl/src/v2/methods/parse.ts">
<violation number="1" location="apps/js-sdk/firecrawl/src/v2/methods/parse.ts:63">
P2: `ParseOptions.timeout` is ignored here because multipart requests bypass the HTTP client's timeout override path.</violation>
</file>
<file name="apps/api/src/scraper/scrapeURL/engines/fetch/index.ts">
<violation number="1" location="apps/api/src/scraper/scrapeURL/engines/fetch/index.ts:36">
P2: This bypasses the engine's charset re-decoding logic, so non-UTF-8 HTML uploads will be mis-decoded.</violation>
</file>
<file name="apps/rust-sdk/src/v2/parse.rs">
<violation number="1" location="apps/rust-sdk/src/v2/parse.rs:78">
P2: `parse` accepts `ScrapeOptions`, but that type serializes several advanced parse settings in a wire format the `/v2/parse` schema rejects.</violation>
<violation number="2" location="apps/rust-sdk/src/v2/parse.rs:90">
P2: Trim the filename before attaching it to the multipart part; otherwise filenames with surrounding whitespace can fail server-side file-type detection.</violation>
</file>
<file name="apps/python-sdk/firecrawl/v2/methods/parse.py">
<violation number="1" location="apps/python-sdk/firecrawl/v2/methods/parse.py:17">
P3: Path support was added internally, but the public parse methods still omit `Path` from their type hints.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
apps/java-sdk/src/main/java/com/firecrawl/models/ParseFile.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
4 issues found across 20 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/python-sdk/firecrawl/v2/methods/aio/parse.py">
<violation number="1" location="apps/python-sdk/firecrawl/v2/methods/aio/parse.py:33">
P1: `asyncio.to_thread()` drops Python 3.8 compatibility, but the SDK still advertises `>=3.8` support.</violation>
</file>
<file name="apps/rust-sdk/src/v2/parse.rs">
<violation number="1" location="apps/rust-sdk/src/v2/parse.rs:82">
P2: This new `ParseOptions` type omits supported parse fields, so Rust callers lose JSON/change-tracking/attribute configuration that `/v2/parse` still accepts.</violation>
<violation number="2" location="apps/rust-sdk/src/v2/parse.rs:106">
P2: `ParseOptions.proxy` still permits `stealth` and `enhanced`, but `/v2/parse` only accepts `basic` or `auto`. Narrow or validate this field before sending the request.</violation>
</file>
<file name="apps/api/src/scraper/scrapeURL/engines/fetch/index.ts">
<violation number="1" location="apps/api/src/scraper/scrapeURL/engines/fetch/index.ts:32">
P2: Fall back to the document's `<meta charset>` when the header charset cannot be decoded.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
5 issues found across 28 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/js-sdk/firecrawl/src/v2/utils/validation.ts">
<violation number="1" location="apps/js-sdk/firecrawl/src/v2/utils/validation.ts:81">
P2: Reject unsupported parse format string shorthands before continuing; `"screenshot"`, `"changeTracking"`, and `"branding"` currently bypass validation in JS callers.</violation>
</file>
<file name="apps/java-sdk/src/main/java/com/firecrawl/models/ParseOptions.java">
<violation number="1" location="apps/java-sdk/src/main/java/com/firecrawl/models/ParseOptions.java:74">
P2: Guard null format entries before reflection; a null element currently crashes `build()` with `NullPointerException`.</violation>
<violation number="2" location="apps/java-sdk/src/main/java/com/firecrawl/models/ParseOptions.java:125">
P2: Blank `proxy` values bypass validation and are still serialized as request data.</violation>
</file>
<file name="apps/js-sdk/firecrawl/src/v2/types.ts">
<violation number="1" location="apps/js-sdk/firecrawl/src/v2/types.ts:70">
P2: Exclude bare `json` from `ParseFormatString`; the current type accepts `formats: ['json']` even though parse validation rejects it.</violation>
</file>
<file name="apps/rust-sdk/src/v2/parse.rs">
<violation number="1" location="apps/rust-sdk/src/v2/parse.rs:105">
P2: `formats` is now too narrow for `/v2/parse`: it can no longer represent the supported `query` format object, so Rust clients cannot request query answers from parse.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| if (fmt === "json") { | ||
| throw new Error("json format must be an object with { type: 'json', prompt, schema }"); | ||
| } | ||
| continue; |
There was a problem hiding this comment.
P2: Reject unsupported parse format string shorthands before continuing; "screenshot", "changeTracking", and "branding" currently bypass validation in JS callers.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/js-sdk/firecrawl/src/v2/utils/validation.ts, line 81:
<comment>Reject unsupported parse format string shorthands before continuing; `"screenshot"`, `"changeTracking"`, and `"branding"` currently bypass validation in JS callers.</comment>
<file context>
@@ -62,3 +70,73 @@ export function ensureValidScrapeOptions(options?: ScrapeOptions): void {
+ if (fmt === "json") {
+ throw new Error("json format must be an object with { type: 'json', prompt, schema }");
+ }
+ continue;
+ }
+
</file context>
| continue; | |
| if (fmt === "changeTracking" || fmt === "screenshot" || fmt === "branding") { | |
| throw new Error(`parse does not support ${fmt} format`); | |
| } | |
| continue; |
| if (proxy != null && !proxy.isBlank()) { | ||
| if (!proxy.equals("basic") && !proxy.equals("auto")) { | ||
| throw new IllegalArgumentException("parse only supports proxy values 'basic' or 'auto'"); | ||
| } |
There was a problem hiding this comment.
P2: Blank proxy values bypass validation and are still serialized as request data.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/java-sdk/src/main/java/com/firecrawl/models/ParseOptions.java, line 125:
<comment>Blank `proxy` values bypass validation and are still serialized as request data.</comment>
<file context>
@@ -0,0 +1,155 @@
+ if (timeout != null && timeout <= 0) {
+ throw new IllegalArgumentException("timeout must be positive");
+ }
+ if (proxy != null && !proxy.isBlank()) {
+ if (!proxy.equals("basic") && !proxy.equals("auto")) {
+ throw new IllegalArgumentException("parse only supports proxy values 'basic' or 'auto'");
</file context>
| if (proxy != null && !proxy.isBlank()) { | |
| if (!proxy.equals("basic") && !proxy.equals("auto")) { | |
| throw new IllegalArgumentException("parse only supports proxy values 'basic' or 'auto'"); | |
| } | |
| if (proxy != null) { | |
| if (proxy.isBlank()) { | |
| proxy = null; | |
| } else if (!proxy.equals("basic") && !proxy.equals("auto")) { | |
| throw new IllegalArgumentException("parse only supports proxy values 'basic' or 'auto'"); | |
| } | |
| } |
| if (type instanceof String) return (String) type; | ||
| } | ||
| try { | ||
| Object type = fmt.getClass().getMethod("getType").invoke(fmt); |
There was a problem hiding this comment.
P2: Guard null format entries before reflection; a null element currently crashes build() with NullPointerException.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/java-sdk/src/main/java/com/firecrawl/models/ParseOptions.java, line 74:
<comment>Guard null format entries before reflection; a null element currently crashes `build()` with `NullPointerException`.</comment>
<file context>
@@ -0,0 +1,155 @@
+ if (type instanceof String) return (String) type;
+ }
+ try {
+ Object type = fmt.getClass().getMethod("getType").invoke(fmt);
+ if (type instanceof String) return (String) type;
+ } catch (ReflectiveOperationException ignored) {
</file context>
|
|
||
| export type ParseFormatString = Exclude< | ||
| FormatString, | ||
| 'screenshot' | 'changeTracking' | 'branding' |
There was a problem hiding this comment.
P2: Exclude bare json from ParseFormatString; the current type accepts formats: ['json'] even though parse validation rejects it.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/js-sdk/firecrawl/src/v2/types.ts, line 70:
<comment>Exclude bare `json` from `ParseFormatString`; the current type accepts `formats: ['json']` even though parse validation rejects it.</comment>
<file context>
@@ -65,6 +65,22 @@ export type FormatOption =
+export type ParseFormatString = Exclude<
+ FormatString,
+ 'screenshot' | 'changeTracking' | 'branding'
+>;
+
</file context>
| 'screenshot' | 'changeTracking' | 'branding' | |
| 'screenshot' | 'changeTracking' | 'branding' | 'json' |
| #[serde(rename_all = "camelCase")] | ||
| pub struct ParseOptions { | ||
| /// Output formats to include in the response. | ||
| pub formats: Option<Vec<ParseFormat>>, |
There was a problem hiding this comment.
P2: formats is now too narrow for /v2/parse: it can no longer represent the supported query format object, so Rust clients cannot request query answers from parse.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/rust-sdk/src/v2/parse.rs, line 105:
<comment>`formats` is now too narrow for `/v2/parse`: it can no longer represent the supported `query` format object, so Rust clients cannot request query answers from parse.</comment>
<file context>
@@ -79,16 +79,30 @@ pub enum ParseProxyType {
pub struct ParseOptions {
/// Output formats to include in the response.
- pub formats: Option<Vec<Format>>,
+ pub formats: Option<Vec<ParseFormat>>,
/// Additional HTTP headers.
pub headers: Option<HashMap<String, String>>,
</file context>
Summary by cubic
Add a new
/v2/parseAPI to upload and parse local files (HTML, PDF, DOCX, etc.) through the existing scraping pipeline, returning a Document in requested formats. Implements ENG-4608 with full SDK support, correct parse billing, dedicated logging, and explicit rejection of browser-only options.New Features
/v2/parse(multipart/form-data). Fields:file(binary) andoptions(JSON, scrape‑shaped subset). 50 MB limit. Validates input and returns a standard Document. New error:PARSE_UNSUPPORTED_OPTIONS.scrape-workerviauploadedFile; forces fetch pipeline (skipsindexengine), adds HTML charset detection from headers/meta, disables caching and search indexing, logs to theparsestable, avoids GCS uploads, bills under theparseendpoint, and honors zero data retention.parseto JS, Python (sync/async), Java, and Rust with multipart support and parse‑specific validation (disallows change tracking, screenshot, branding, actions, waitFor, location, mobile). Unit/e2e tests, including billing; README examples across SDKs.Dependencies
multerand@types/multer.FormDataand no longer forces JSON Content‑Type.post_multipartfor sync/async clients.ParseFileandParseOptions.reqwest"multipart" and addedmime_guess.Written for commit 4888b93. Summary will update on new commits.