The web scraping API that gets past the CAPTCHA — and stays this simple
Send a URL, get back clean minified HTML built for AI. The crawler renders pages on a mobile viewport and solves CAPTCHAs automatically — no proxies, no fingerprints, no headless browser fleet to run.
We didn't set out to build a crawler. The web made us.
We run a grocery shopping & recipe API. Which means reading public webpages all day — recipe blogs, supermarket listings, social posts, across 13 countries — and turning the mess into clean data our models can actually use.
Then the web fought back. Bloated JavaScript. The good stuff hiding on mobile. And CAPTCHAs everywhere.
Every scraper made us pick a side. The easy ones quit the second a CAPTCHA showed up. The tough ones wanted proxies, fingerprints, and a whole browser fleet before they'd read a single page.
So we built our own. Mobile viewport. CAPTCHAs solved on the fly. Pages stripped to lean HTML our AI reads in one pass. It's been quietly doing exactly this in production behind our /parse endpoint ever since.
Nothing out there was this simple. So we're handing it to you.
Every scraping API forces a trade-off. This one doesn't.
The market splits cleanly in two. One side is easy but stops at the wall. The other gets through the wall but is a project to set up. Almost no one does both, and almost no one defaults to mobile.
Simple, but stops at the wall
Reader-style APIs and prefix tricks. Dead simple to call. Then a CAPTCHA or bot check shows up and you get an error page instead of content.
- Trivial to call
- Folds on CAPTCHAs
- Desktop-first
Gets through, but it's a project
Proxy networks and unblockers. They can defeat bot protection — once you've configured proxies, fingerprints, sessions, and retries, and accepted the bill.
- Defeats bot protection
- Heavy setup & config
- Desktop-first
Both — plus mobile
One authenticated POST with a URL. CAPTCHAs solved inside the call. Output is mobile-rendered, minified, and ready for an LLM. Nothing to configure.
- Trivial to call
- Solves CAPTCHAs
- Mobile-first & AI-ready
How it works
URL in, AI-ready HTML out. Three steps, one request.
Send a URL
POST any public HTTP or HTTPS URL to /api/crawl with your Bearer token. No options to learn.
We render & unblock
The page is rendered on a mobile viewport in a real browser. If a CAPTCHA appears, it's solved automatically — proxies and fingerprints handled for you.
Get clean HTML
You receive a JSON object with an html field: compact, minified HTML built for parsing and AI workflows.
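The three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not official client code: the host in `BASE_URL` is a placeholder, the `"url"` key in the JSON request body is a guess at the payload shape, and the function names are ours. Only the `/api/crawl` path, the Bearer token, and the `html` response field come from the description above.

```python
import json
import urllib.request

# Assumption: placeholder host. The page only names the /api/crawl path.
BASE_URL = "https://api.example.com"


def build_crawl_request(page_url: str, token: str,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the authenticated POST without sending it."""
    # Assumption: the target URL travels in a JSON body under a "url" key.
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/crawl",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def crawl(page_url: str, token: str, timeout: float = 60.0) -> str:
    """Send the request and return the "html" field of the JSON response."""
    request = build_crawl_request(page_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["html"]
```

Splitting request construction from sending keeps the network call in one place, so the request shape can be checked without spending a credit.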
Output an LLM can actually read
"Minified HTML" means a compact version of the page body: visible text is kept, hidden elements and <script>/<style> tags are removed, and only id and class attributes survive. You feed it straight to a model, with no boilerplate to strip and no token budget wasted on markup.
Before: <body> <script src="analytics.js">…</script> <style>.hdr{display:flex;…}</style> <div class="product" data-ga="x9" style="…"> <div hidden>tracking pixel</div> <h1 data-id="42">Organic Oats</h1> <span class="price">£2.40</span> </div> <!-- 40kb of nav, footer, modals --> </body>
After: <body> <div class="product"> <h1>Organic Oats</h1> <span class="price">£2.40</span> </div> </body>
Every token earns its place.
Same content. A fraction of the tokens. Structure preserved via id and class so your parser still knows where things are.
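To make the rules concrete, here is a rough local approximation of them in Python, using only the standard library. It is an illustration of the described output, not the service's implementation: the class and function names are ours, "hidden" is detected only through the `hidden` attribute (not CSS), and comments are dropped as in the sample above.

```python
import re
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
DROPPED_TAGS = {"script", "style"}
KEPT_ATTRS = ("id", "class")


class Minifier(HTMLParser):
    """Hypothetical sketch of the minification rules described above."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # pieces of the minified document
        self.skipped = []  # open tags inside a subtree being dropped

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.skipped or tag in DROPPED_TAGS or "hidden" in attr_map:
            if tag not in VOID_TAGS:  # void tags never get a closing tag
                self.skipped.append(tag)
            return
        kept = "".join(
            f' {name}="{escape(attr_map[name])}"'
            for name in KEPT_ATTRS
            if attr_map.get(name)
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if self.skipped:
            if tag in self.skipped:
                # Unwind to the matching open tag, tolerating unclosed children.
                while self.skipped.pop() != tag:
                    pass
            return
        if tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skipped:
            return
        text = re.sub(r"\s+", " ", data)  # collapse runs of whitespace
        if text.strip():
            self.out.append(escape(text, quote=False))


def minify(html_text: str) -> str:
    parser = Minifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Run on the "before" sample, this yields the same structure as the "after" sample, minus the whitespace between tags.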
What you can build with it
It started as recipe parsing. It works for any public page you need turned into clean data.
Give an agent eyes on the live web
Let a LangChain, AutoGPT-style, or custom agent read any public page — even CAPTCHA-protected ones — and get back HTML it can reason over directly.
Clean ingestion for retrieval
Feed minified HTML into your chunker and embeddings without writing a boilerplate stripper for every site. Less markup, more signal per token.
Track pages that fight back
Monitor public product, listing, or pricing pages that throw bot checks. The crawler solves the check inside the call and returns the rendered content.
Turn any URL into structured input
Summarizers, readers, and research assistants that need the real rendered text — not a half-loaded SPA or a "verify you're human" page.
Parse recipes from anywhere
The original use case. Pull a recipe page or social post and pair it with /parse for structured ingredients, steps, and nutrition.
Reach what desktop scrapers miss
Some sites serve their best content only to mobile. Because the crawler requests a mobile viewport by default, you get the lean version made for phones.
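Several of these use cases come down to the same move: pick elements out of the returned HTML by class, then act on the text. A minimal sketch for the page-tracking case follows, assuming you already hold two crawl results as strings. The helper names are ours, and the `price` class comes from the sample markup on this page, so real sites will use their own class names.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given class."""

    def __init__(self, wanted_class: str):
        super().__init__()
        self.wanted = wanted_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text pieces of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags have no closing tag, so they never nest
        if self.depth:
            self.depth += 1
        elif self.wanted in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html_text: str, wanted_class: str) -> list:
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html_text)
    parser.close()
    return parser.results


def price_changed(previous_html: str, current_html: str,
                  price_class: str = "price") -> bool:
    """Compare the price text found in two crawls of the same page."""
    return (extract_by_class(previous_html, price_class)
            != extract_by_class(current_html, price_class))
```

Because only id and class survive minification, a selector this simple is often enough. Anything more involved is better served by a full HTML library.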
How it compares
A fair look at where the Pepesto Crawl API sits against the tools developers reach for first.
| | Pepesto Crawl | Reader-style APIs | Proxy / unblocker platforms |
|---|---|---|---|
| Solves CAPTCHAs automatically | Yes | No | Yes |
| Setup to first call | One POST | One call | Proxies, fingerprints, config |
| Mobile viewport by default | Yes | No | Optional / manual |
| Output tuned for AI | Minified HTML | Markdown / text | Raw HTML |
| Pricing model | €0.20 / request, pay-as-you-go | Token / credit tiers | Subscription + usage |
| No subscription required | Yes | Varies | Usually no |
A general comparison of common approaches, not specific products. Capabilities vary by provider and plan.
Simple pricing
Pay only for what you crawl. No subscription, no per-site fee.
One price per crawled page — CAPTCHA solving, mobile rendering, and minification all included. Pay-as-you-go via Stripe; your API key is returned instantly.
Running real volume? Volume discounts are available, and we're happy to tailor a rate to what you're building. Tell us about your project — we'd love to chat.
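For budgeting, the list price makes the arithmetic simple. The sketch below assumes the flat €0.20 per request quoted on this page and ignores any volume discount, since those rates are negotiated; the function name is ours.

```python
from decimal import Decimal

# List price quoted on this page; volume discounts are not modelled here.
PRICE_PER_REQUEST_EUR = Decimal("0.20")


def estimate_cost(request_count: int) -> Decimal:
    """Return the pay-as-you-go cost in euros for a number of crawl requests."""
    if request_count < 0:
        raise ValueError("request_count must not be negative")
    return PRICE_PER_REQUEST_EUR * request_count


# Example: tracking 50 pages once a day for 30 days is 1,500 requests.
monthly = estimate_cost(50 * 30)
```

Using `Decimal` rather than floats keeps the euro amounts exact.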
See full pricing →
Frequently asked questions
It's a REST endpoint that fetches a public webpage and returns a compact, mobile-optimized minified HTML representation of the rendered page. You send a single URL and receive clean HTML built for downstream parsing and AI workflows. It solves CAPTCHAs when they're encountered, so it returns content for CAPTCHA-protected public pages that most scrapers fail on.
Yes. When the crawler hits a CAPTCHA on a public page, it solves it and returns the content. You don't configure proxies, browser fingerprints, or third-party CAPTCHA services — it's all inside the single /api/crawl call. That's the core difference: easy scraping APIs stop at the CAPTCHA wall, and APIs that get past it usually require heavy setup.
A JSON object with an html field — the minified HTML from the rendered page. Visible text is preserved, hidden elements and <script>/<style> tags are removed, and only id and class attributes are retained. It's an AI-ready representation of the page body for parsing and LLM workflows, not a byte-for-byte copy of the original source.
The crawler requests pages with a mobile viewport. Mobile pages are leaner, render faster, and carry the core content without desktop-only clutter — so the resulting minified HTML is smaller and cleaner for AI analysis. Most competing scraping APIs default to a desktop browser; mobile-first rendering is a niche almost no one serves.
No. The endpoint works on public pages only. It doesn't work for anything that requires authentication — pages behind a login, a private account area, a paywall login, or any flow that depends on a user-specific session.
Most scraping tools force a trade-off. The simple, easy-to-call ones tend to fold the moment a page is CAPTCHA-protected. The ones that get past bot protection usually require proxy configuration, fingerprinting, and real engineering setup. The Pepesto Crawl API does both in a single POST — it solves CAPTCHAs and stays trivial to call — and renders mobile-first by default, which almost no one else does.
€0.20 per request, pay-as-you-go via Stripe credits — no monthly subscription and no per-site fee. Volume discounts bring it down further, and we're glad to tailor a rate, so get in touch if you're running real volume — we'd love to chat. Buy credits and your API key is returned instantly, with no approval process. See the pricing page for current rates across all endpoints.