🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Updated Mar 21, 2026 - TypeScript
Python scraper based on AI
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web into clean markdown, ready for your agents.
Web Data Scraper - no-code internet scraping. Extract and export to CSV, Excel, JSON, Google Sheets, and Webhook.
AI-based web wrapper for web content extraction
qcrawl - fast async web crawling & scraping framework for Python.
Using LLMs and AI browser automation to robustly extract web data
The this.url class fetches and parses URL data, returning an object with structured information that can be kept in a database or other storage and used by machine learning algorithms.
Quick guide, with a code example, on how to use Java for web scraping
GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making the scraped information convenient to access and use.
An API wrapper for Scrappey.com written in Node.js (Cloudflare bypass & solver)
AI-based web extractor
Java framework used by the Web Data Commons project to extract Microdata, Microformats, and RDFa data, web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.
Open-source web crawler
A pipeline to scrape, extract, and analyze book data, from web pages to insights.
High-performance HTML to Markdown converter with full GitHub Flavored Markdown support. Written in Rust, available for Node.js and as a native Rust crate.
The Tableau Web Data Connector for Facebook Insights API
RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.
Lightfeed SDK to search and filter web data