The playground

Try any of 1042 endpoints — live.

Pick an endpoint, load a working example, tweak the params, and send — no signup to try. Results render the way the data deserves; raw JSON, headers & code are one tab away.

Playground demo key · api.reefapi.com

post/web-extract/v1/crawl1 credit

Crawl a site starting from a seed URL (up to 25 pages): follows internal links breadth-first with configurable depth, include/exclude URL patterns, and returns every visited page in the formats you choose. Ideal for content indexing, site audits, and building knowledge bases from documentation or blog sites.

Working example

Parameters

urlrequired

The page URL to extract. Full URL or bare domain (https:// assumed). Only http/https; private/internal/metadata targets are SSRF-blocked.

max_pagesoptional

Maximum number of pages to crawl in one call (1–25, default 10). Increase for deeper site coverage. (1–25)

max_depthoptional

Max link depth from the seed URL (0-5, default 2). (0–5)

same_domain_onlyoptional

Only follow links on the seed's registrable domain (default true).

include_patternsoptional

Regex patterns; only URLs whose path matches are crawled.

exclude_patternsoptional

Regex patterns; URLs whose path matches are skipped.

formatsoptional

Which outputs to return (array or comma-string). Any of: markdown, text, html (cleaned main-content), rawHtml, metadata, links, images, jsonld. Default: markdown+metadata. Unknown values are ignored. For screenshots use the web-capture engine.

timeoutoptional

Per-request timeout in seconds (3-60, default 25). (3–60)

request preview

curl -X POST https://api.reefapi.com/web-extract/v1/crawl \
  -H "x-api-key: $REEF_KEY" \
  -H "content-type: application/json" \
  -d '{"url":"https://example.com","max_pages":"2","max_depth":"1"}'

Hit Send to run this endpoint live.