Does the crawl API solve CAPTCHAs automatically?

Yes. When the crawler encounters a CAPTCHA on a public page, it solves it automatically and returns the page content. You do not configure proxies, browser fingerprints, or third-party CAPTCHA-solving services — it is handled inside the single API call. This is the main differentiator — most easy-to-use scraping APIs stop at the CAPTCHA wall, and most APIs that get past it require heavy setup.

Why is the crawler mobile-first?

The crawler requests pages using a mobile viewport and is optimized for mobile pages. Mobile pages are typically leaner, render faster, and contain the core content without desktop-only clutter — which makes the resulting minified HTML smaller and cleaner for AI analysis. Most competing scraping APIs default to a desktop browser; mobile-first rendering is a niche almost no one serves.

How much does the crawl API cost?

Pay-as-you-go per request via Stripe credits, with no monthly subscription and no per-site fee. Volume discounts are available and we are happy to tailor a rate, so get in touch if you are running serious volume. See the pricing page for current per-request rates across all Pepesto endpoints.

How do I get an API key?

Buy credits through Stripe and your API key is returned instantly — no approval process. Authenticate with a Bearer token and POST a URL to https://s.pepesto.com/api/crawl to receive the minified HTML.

✓ Solves CAPTCHAs
Mobile-first
AI-ready HTML
One POST request

The web scraping API that gets past the CAPTCHA — and stays this simple

Send a URL, get back clean minified HTML built for AI. The crawler renders pages on a mobile viewport and solves CAPTCHAs automatically — no proxies, no fingerprints, no headless browser fleet to run.

Get an API Key
See How It Works

13 countries, battle-tested

1 POST: URL in, HTML out

Mobile viewport by default

€0.20 per request, pay-as-you-go

Why we built this

We didn't set out to build a crawler.
The web made us.

We run a grocery shopping & recipe API. Which means reading public webpages all day — recipe blogs, supermarket listings, social posts, across 13 countries — and turning the mess into clean data our models can actually use.

Then the web fought back. Bloated JavaScript. The good stuff hiding on mobile. And CAPTCHAs everywhere.

Every scraper made us pick a side. The easy ones quit the second a CAPTCHA showed up. The tough ones wanted proxies, fingerprints, and a whole browser fleet before they'd read a single page.

So we built our own. Mobile viewport. CAPTCHAs solved on the fly. Pages stripped to lean HTML our AI reads in one pass. It's been quietly doing exactly this in production behind our /parse endpoint ever since.

Nothing out there was this simple. So we're handing it to you.

Every scraping API forces a trade-off. This one doesn't.

The market splits cleanly in two. One side is easy but stops at the wall. The other gets through the wall but is a project to set up. Almost no one does both — and no one defaults to mobile.

Camp 1 — Easy

Simple, but stops at the wall

Reader-style APIs and prefix tricks. Dead simple to call. Then a CAPTCHA or bot check shows up and you get an error page instead of content.

Trivial to call
Folds on CAPTCHAs
Desktop-first

Camp 2 — Powerful

Gets through, but it's a project

Proxy networks and unblockers. They can defeat bot protection — once you've configured proxies, fingerprints, sessions, and retries, and accepted the bill.

Defeats bot protection
Heavy setup & config
Desktop-first

Pepesto Crawl API

Both — plus mobile

One authenticated POST with a URL. CAPTCHAs solved inside the call. Output is mobile-rendered, minified, and ready for an LLM. Nothing to configure.

Trivial to call
Solves CAPTCHAs
Mobile-first & AI-ready

How it works

URL in, AI-ready HTML out. Three steps, one request.

Send a URL

POST any public HTTP or HTTPS URL to /api/crawl with your Bearer token. No options to learn.

We render & unblock

The page is rendered on a mobile viewport in a real browser. If a CAPTCHA appears, it's solved automatically — proxies and fingerprints handled for you.

Get clean HTML

You receive a JSON object with an html field: compact, minified HTML built for parsing and AI workflows.
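
The three steps above can be sketched in Python with only the standard library. The endpoint URL, the Bearer authentication, and the html response field come from this page. The JSON request body with a url key, the 60-second timeout, and the function names are assumptions made for illustration, so check the API docs for the actual request schema.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://s.pepesto.com/api/crawl"  # endpoint named on this page


def build_crawl_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated POST.

    Assumption: the body is a JSON object of the form {"url": "<page url>"}.
    """
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def crawl(page_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and return the minified HTML from the `html` field."""
    request = build_crawl_request(page_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["html"]
```

Calling crawl("https://example.com/some-page", "YOUR_API_KEY") would then return the minified HTML as a string, provided the assumed body shape matches the real one.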

Want the request and response schema? Read the API docs →

Output an LLM can actually read

"Minified HTML" means a compact version of the page body: visible text is kept, hidden elements and <script>/<style> are removed, and only id and class survive. You feed it straight to a model — no boilerplate to strip, no token budget wasted on markup.

Raw rendered page

<body>
  <script src="analytics.js">…</script>
  <style>.hdr{display:flex;…}</style>
  <div class="product" data-ga="x9" style="…">
    <div hidden>tracking pixel</div>
    <h1 data-id="42">Organic Oats</h1>
    <span class="price">£2.40</span>
  </div>
  <!-- 40kb of nav, footer, modals -->
</body>

→

Pepesto minified HTML

<body>
  <div class="product">
    <h1>Organic Oats</h1>
    <span class="price">£2.40</span>
  </div>
</body>
// every token earns its place

Same content. A fraction of the tokens. Structure preserved via id and class so your parser still knows where things are.
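
To make the rules concrete, here is a rough local sketch of the same kind of transformation, written with Python's standard html.parser. It is not Pepesto's implementation. It only illustrates the rules described above (drop script and style, drop elements carrying the hidden attribute, keep only id and class), and it assumes well-formed markup in which every non-void element is explicitly closed. The names Minifier and minify are hypothetical.

```python
import re
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
DROPPED_TAGS = {"script", "style"}
KEPT_ATTRS = ("id", "class")


class Minifier(HTMLParser):
    """Re-emit HTML without scripts, styles, hidden elements, or extra attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        hidden = any(name == "hidden" for name, _ in attrs)
        dropped = self.skip_depth > 0 or tag in DROPPED_TAGS or hidden
        if tag in VOID_TAGS:  # void elements have no end tag to balance
            if not dropped:
                self.out.append(self._open_tag(tag, attrs))
            return
        if dropped:
            self.skip_depth += 1
            return
        self.out.append(self._open_tag(tag, attrs))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        text = re.sub(r"\s+", " ", data)  # collapse whitespace runs
        if self.skip_depth == 0 and text.strip():
            self.out.append(escape(text, quote=False))

    @staticmethod
    def _open_tag(tag, attrs):
        parts = []
        for name, value in attrs:
            if name in KEPT_ATTRS:
                parts.append(' {}="{}"'.format(name, escape(value or "", quote=True)))
        return "<{}{}>".format(tag, "".join(parts))


def minify(html: str) -> str:
    """Return a compact version of `html` following the rules above."""
    parser = Minifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Run on the raw sample above, this sketch yields the same elements as the minified sample, on a single line and without indentation. HTML comments disappear because the parser's default comment handler ignores them.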

What you can build with it

It started as recipe parsing. It works for any public page you need turned into clean data.

AI agents

Give an agent eyes on the live web

Let a LangChain, AutoGPT-style, or custom agent read any public page — even CAPTCHA-protected ones — and get back HTML it can reason over directly.

RAG & LLM pipelines

Clean ingestion for retrieval

Feed minified HTML into your chunker and embeddings without writing a boilerplate stripper for every site. Less markup, more signal per token.

Price & catalog monitoring

Track pages that fight back

Monitor product, listing, or pricing pages that throw bot checks. The crawler gets through and returns the rendered content every time.

Content & research tools

Turn any URL into structured input

Summarizers, readers, and research assistants that need the real rendered text — not a half-loaded SPA or a "verify you're human" page.

Recipe & food apps

Parse recipes from anywhere

The original use case. Pull a recipe page or social post and pair it with /parse for structured ingredients, steps, and nutrition.

Mobile-only content

Reach what desktop scrapers miss

Some sites serve their best content only to mobile. Because the crawler requests a mobile viewport by default, you get the lean version made for phones.
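
For the RAG and LLM pipeline use case listed above, a chunker over the minified HTML can stay very small. The sketch below is one possible approach, not part of the Pepesto API: it collects text nodes with Python's standard html.parser and greedily packs them into chunks. The names TextCollector and chunk_html_text and the 500-character default are hypothetical choices.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML string, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise internal whitespace
        if text:
            self.segments.append(text)


def chunk_html_text(html: str, max_chars: int = 500) -> list:
    """Greedily pack text nodes into chunks of at most max_chars characters.

    A single text node longer than max_chars becomes its own oversized chunk.
    """
    collector = TextCollector()
    collector.feed(html)
    collector.close()

    chunks = []
    current = ""
    for segment in collector.segments:
        candidate = current + " " + segment if current else segment
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk at a node boundary
            current = segment
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on text-node boundaries keeps a heading or price intact inside one chunk. A production pipeline would likely also keep the surrounding id and class values as chunk metadata.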

How it compares

A fair look at where the Pepesto Crawl API sits against the tools developers reach for first.

	Pepesto Crawl	Reader-style APIs	Proxy / unblocker platforms
Solves CAPTCHAs automatically	Yes	No	Yes
Setup to first call	One POST	One call	Proxies, fingerprints, config
Mobile viewport by default	Yes	No	Optional / manual
Output tuned for AI	Minified HTML	Markdown / text	Raw HTML
Pricing model	€0.20 / request, pay-as-you-go	Token / credit tiers	Subscription + usage
No subscription required	Yes	Varies	Usually no

A general comparison of common approaches, not specific products. Capabilities vary by provider and plan.

Simple pricing

Pay only for what you crawl. No subscription, no per-site fee.

€0.20 / request

One price per crawled page — CAPTCHA solving, mobile rendering, and minification all included. Pay-as-you-go via Stripe; your API key is returned instantly.

Running real volume? Volume discounts are available, and we're happy to tailor a rate to what you're building. Tell us about your project — we'd love to chat.
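
As a quick budgeting aid, the list price can be turned into a back-of-the-envelope estimate. This sketch assumes one billed request per crawled page at the €0.20 list price and ignores any volume discount. The function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_REQUEST_EUR = Decimal("0.20")  # list price quoted on this page


def estimate_cost_eur(pages: int) -> Decimal:
    """Estimate the list-price cost of crawling `pages` pages, before discounts."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_REQUEST_EUR * pages
```

For example, estimate_cost_eur(5000) gives Decimal("1000.00"), that is €1,000 at list price.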

See full pricing →

Frequently asked questions

What is the Pepesto Crawl API?

It's a REST endpoint that fetches a public webpage and returns a compact, mobile-optimized minified HTML representation of the rendered page. You send a single URL and receive clean HTML built for downstream parsing and AI workflows. It solves CAPTCHAs when they're encountered, so it returns content for CAPTCHA-protected public pages that most scrapers fail on.

Does it solve CAPTCHAs automatically?

Yes. When the crawler hits a CAPTCHA on a public page, it solves it and returns the content. You don't configure proxies, browser fingerprints, or third-party CAPTCHA services — it's all inside the single /api/crawl call. That's the core difference: easy scraping APIs stop at the CAPTCHA wall, and APIs that get past it usually require heavy setup.

What format does it return?

A JSON object with an html field — the minified HTML from the rendered page. Visible text is preserved, hidden elements and <script>/<style> tags are removed, and only id and class attributes are retained. It's an AI-ready representation of the page body for parsing and LLM workflows, not a byte-for-byte copy of the original source.
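
A small defensive reader for that response might look like the sketch below. Only the html field is documented on this page. The shape of error responses is not described here, so the sketch simply rejects anything without a string html field. The function name is hypothetical.

```python
import json
from typing import Union


def extract_html(response_body: Union[str, bytes]) -> str:
    """Return the `html` field of a crawl response, or raise ValueError."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict) or not isinstance(payload.get("html"), str):
        raise ValueError("expected a JSON object with a string 'html' field")
    return payload["html"]
```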

Why is it mobile-first?

The crawler requests pages with a mobile viewport. Mobile pages are leaner, render faster, and carry the core content without desktop-only clutter — so the resulting minified HTML is smaller and cleaner for AI analysis. Most competing scraping APIs default to a desktop browser; mobile-first rendering is a niche almost no one serves.

Can it scrape pages behind a login or paywall?

No. The endpoint works on public pages only. It doesn't work for anything that requires authentication: pages behind a login, a private account area, a paywall, or any flow that depends on a user-specific session.

How is the Pepesto Crawl API different from other scraping APIs?

Most scraping tools force a trade-off. The simple, easy-to-call ones tend to fold the moment a page is CAPTCHA-protected. The ones that get past bot protection usually require proxy configuration, fingerprinting, and real engineering setup. The Pepesto Crawl API does both in a single POST — it solves CAPTCHAs and stays trivial to call — and renders mobile-first by default, which almost no one else does.

How much does it cost and how do I start?

€0.20 per request, pay-as-you-go via Stripe credits — no monthly subscription and no per-site fee. Volume discounts bring it down further, and we're glad to tailor a rate, so get in touch if you're running real volume — we'd love to chat. Buy credits and your API key is returned instantly, with no approval process. See the pricing page for current rates across all endpoints.