Web Archive API

/web-archive/v1/snapshots · 1 credit

All Wayback captures for a URL (timestamp, status, mimetype, digest, archival size, snapshot_url). Supports date, status, and MIME filters, collapsing, and resume-key pagination.

| Parameter | Required | Allowed / range | Description |
|---|---|---|---|
| url | required | — | The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen. |
| from | optional | — | Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive. |
| to | optional | — | Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive. |
| limit = 100 | optional | 1–1000 | Max captures to return (1-1000, default 100). To page further, pass meta.resume_key from the response as resume_key. |
| collapse | optional | digest · timestamp:4 · timestamp:6 · timestamp:8 · timestamp:10 · urlkey · original · statuscode · mimetype | Deduplicate consecutive captures on a field (CDX 'collapse'). collapse=digest is the change-detection workhorse. |
| match = exact | optional | exact · prefix · host · domain | How to match url: exact, prefix, host, or domain. The domain_captures endpoint always uses domain. |
| status | optional | — | Keep only captures with this HTTP status (e.g. 200, 404, 301). Prefix with '!' to exclude (e.g. '!200'). |
| mime | optional | — | Keep only captures of this MIME type (e.g. text/html, application/pdf, image/png). Prefix with '!' to exclude. |
| filter | optional | — | Advanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch. |
| resume_key | optional | — | Opaque pagination token from meta.resume_key of the previous page; returns the next page of captures. |
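
The resume-key pagination described above can be sketched as a small generator. This is a minimal sketch under assumptions the reference does not confirm: that `data` is a list of captures, and that the last page carries no `resume_key` (or a null one) in `meta`. The `post` callable and the `iter_captures` name are hypothetical; `post` stands in for whatever HTTP client you use.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A transport is any callable that takes a request payload and returns the
# decoded response envelope (a dict with "data" and "meta" keys).
Transport = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_captures(post: Transport, url: str, page_size: int = 100,
                  max_pages: int = 50, **params: Any) -> Iterator[Any]:
    """Yield captures page by page, following meta.resume_key.

    Assumptions: "data" is a list, and the final page has a missing or
    null "resume_key" in "meta".
    """
    payload: Dict[str, Any] = {"url": url, "limit": page_size, **params}
    for _ in range(max_pages):  # hard cap so a bad token cannot loop forever
        envelope = post(payload)
        yield from envelope.get("data") or []
        resume_key: Optional[str] = (envelope.get("meta") or {}).get("resume_key")
        if not resume_key:
            return
        # Re-send the same query with the token for the next page.
        payload = {**payload, "resume_key": resume_key}
```

Passing the transport in keeps the loop independent of any HTTP library, and `max_pages` bounds how many requests a single iteration can make.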

/web-archive/v1/available · 1 credit

Closest single Wayback snapshot to a given date (or the latest). Fast existence check.

| Parameter | Required | Allowed / range | Description |
|---|---|---|---|
| url | required | — | The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. |
| timestamp | optional | — | Target date for the closest snapshot. ISO date or Wayback timestamp prefix. Omitted → the most recent capture. |
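
The date parameters accept either an ISO date or a Wayback timestamp prefix. The sketch below shows one way to produce and check such values on the client side. The helper names are hypothetical, and treating timestamps as UTC is an assumption based on how the Wayback Machine normally records them, not something this reference states.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# ISO date (YYYY-MM-DD) or a 4-14 digit Wayback timestamp prefix.
_ACCEPTED_DATE = re.compile(r"\d{4}-\d{2}-\d{2}|\d{4,14}")


def to_wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as a 14-digit timestamp (YYYYMMDDhhmmss).

    Aware datetimes are converted to UTC first; naive ones are assumed
    to be in UTC already.
    """
    if moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def is_accepted_date(value: str) -> bool:
    """Check a string against the two date formats named in the tables."""
    return _ACCEPTED_DATE.fullmatch(value) is not None


def available_payload(url: str, moment: Optional[datetime] = None) -> Dict[str, Any]:
    """Build a request body for the available endpoint."""
    payload: Dict[str, Any] = {"url": url}
    if moment is not None:  # omitted timestamp means "most recent capture"
        payload["timestamp"] = to_wayback_timestamp(moment)
    return payload
```

For example, `available_payload("github.com", datetime(2020, 1, 2))` asks for the capture closest to the start of 2 January 2020.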

/web-archive/v1/history · 1 credit

Lifespan and capture cadence for a URL: exact first_seen/last_seen and span, plus per-year capture counts and a status/MIME breakdown. Intended as an SEO or due-diligence summary.

| Parameter | Required | Allowed / range | Description |
|---|---|---|---|
| url | required | — | The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen. |
| from | optional | — | Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive. |
| to | optional | — | Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive. |
| match = exact | optional | exact · prefix · host · domain | How to match url: exact, prefix, host, or domain. The domain_captures endpoint always uses domain. |

/web-archive/v1/domain_captures · 1 credit

All archived URLs under a domain (the domain and its subdomains), one row per unique URL: the 'every page this site ever had' view. Resume-key paginated.

| Parameter | Required | Allowed / range | Description |
|---|---|---|---|
| url | required | — | The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. The domain and its subdomains are always matched. |
| from | optional | — | Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive. |
| to | optional | — | Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive. |
| limit = 100 | optional | 1–1000 | Max rows to return (1-1000, default 100). To page further, pass meta.resume_key from the response as resume_key. |
| status | optional | — | Keep only captures with this HTTP status (e.g. 200, 404, 301). Prefix with '!' to exclude (e.g. '!200'). |
| mime | optional | — | Keep only captures of this MIME type (e.g. text/html, application/pdf, image/png). Prefix with '!' to exclude. |
| filter | optional | — | Advanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch. |
| resume_key | optional | — | Pagination token from the previous page's meta.resume_key. |
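
The `filter` parameter takes comma-separated `[!]field:regex` expressions. A minimal sketch of assembling such a string follows. The helper names are hypothetical, and rejecting commas inside a regex is a precaution based on the comma-separated format, not documented behaviour.

```python
# Fields the reference lists as filterable.
CDX_FIELDS = {"urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"}


def cdx_filter(field: str, regex: str, exclude: bool = False) -> str:
    """Build one [!]field:regex expression."""
    if field not in CDX_FIELDS:
        raise ValueError(f"unknown CDX field: {field!r}")
    if "," in regex:
        # Assumption: a comma would be read as the expression separator.
        raise ValueError("regex must not contain a comma")
    prefix = "!" if exclude else ""
    return f"{prefix}{field}:{regex}"


def join_filters(*expressions: str) -> str:
    """Combine expressions into the comma-separated form."""
    return ",".join(expressions)


# Example body for domain_captures: PDFs only, excluding 5xx captures.
payload = {
    "url": "example.com",
    "filter": join_filters(
        cdx_filter("original", r".*\.pdf$"),
        cdx_filter("statuscode", "5..", exclude=True),
    ),
}
```

For the common cases, the dedicated `status` and `mime` parameters are simpler; `filter` is only needed when you want a regex or a field they do not cover.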

/web-archive/v1/batch · 1 credit

Closest-snapshot existence and capture count for up to 20 URLs in one call (bulk archival presence, e.g. competitor or portfolio sweeps).

| Parameter | Required | Allowed / range | Description |
|---|---|---|---|
| urls | required | — | Up to 20 URLs/hosts to look up at once (JSON array or comma/newline-separated string). |
| timestamp | optional | — | Target date for the closest snapshot. ISO date or Wayback timestamp prefix. Omitted → the most recent capture. |
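
Because a batch call accepts at most 20 URLs, a longer list has to be split on the client side. The sketch below shows one way to do that. The `batch_payloads` helper is hypothetical, and de-duplicating the input is a design choice here, not an API requirement.

```python
from typing import Any, Dict, Iterator, Optional, Sequence

BATCH_LIMIT = 20  # "up to 20 URLs" per batch call


def batch_payloads(urls: Sequence[str],
                   timestamp: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield one request body per group of at most BATCH_LIMIT URLs."""
    # Strip whitespace, drop blanks, and de-duplicate while keeping order.
    cleaned = (u.strip() for u in urls)
    unique = list(dict.fromkeys(u for u in cleaned if u))
    for start in range(0, len(unique), BATCH_LIMIT):
        payload: Dict[str, Any] = {"urls": unique[start:start + BATCH_LIMIT]}
        if timestamp is not None:
            payload["timestamp"] = timestamp
        yield payload
```

Each yielded payload sends `urls` as a JSON array, which is one of the two input forms the table allows.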

/web-archive/v1/cc_search · 1 credit

Common Crawl index lookup for a URL: alternate or broader coverage with richer fields (detected language, encoding, WARC offset/filename). Useful for cross-source corroboration.

| Parameter | Required | Allowed / range | Description |
|---|---|---|---|
| url | required | — | The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen. |
| collection | optional | — | Common Crawl index id (e.g. 'CC-MAIN-2026-21'). Omit to use the newest index. List ids via action='cc_indexes'. |
| match = exact | optional | exact · prefix · host · domain | URL match mode for the Common Crawl index query. |
| limit = 100 | optional | 1–1000 | Max captures to return (1-1000, default 100). Page further with meta.resume_key on the snapshots action. |
| from | optional | — | Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive. |
| to | optional | — | Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive. |
| filter | optional | — | Advanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch. |

/web-archive/v1/cc_indexes · 1 credit

Lists the available Common Crawl monthly indexes (id, name, date range) so you can pick a collection for cc_search.

Example request:

    curl -X POST https://api.reefapi.com/web-archive/v1/snapshots \
      -H "x-api-key: $REEF_KEY" \
      -H "content-type: application/json" \
      -d '{"url":"github.com","limit":10,"collapse":"digest"}'

Example response:

    {
      "ok": true,
      "data": { /* the result */ },
      "meta": {
        "latency_ms": 240,
        "record_count": 12,
        "completeness_pct": 100
      },
      "error": null
    }
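
The request and response envelope above translate to Python's standard library roughly as follows. This is a sketch, not an official client: `build_request`, `unwrap`, and `ArchiveApiError` are hypothetical names. It also assumes that every endpoint accepts a JSON POST like the snapshots example, and that a failed call sets `ok` to false with a non-null `error`.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.reefapi.com/web-archive/v1"


class ArchiveApiError(RuntimeError):
    """Raised when the response envelope reports a failure."""


def build_request(action: str, payload: Dict[str, Any],
                  api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        f"{BASE_URL}/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "content-type": "application/json"},
        method="POST",
    )


def unwrap(envelope: Dict[str, Any]) -> Any:
    """Return envelope["data"], or raise if the call did not succeed."""
    if not envelope.get("ok") or envelope.get("error") is not None:
        # The shape of "error" is not documented, so it is kept as-is.
        raise ArchiveApiError(str(envelope.get("error")))
    return envelope.get("data")


def call(action: str, payload: Dict[str, Any], api_key: str) -> Any:
    """Send the request and unwrap the result (performs a network call)."""
    request = build_request(action, payload, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return unwrap(json.load(response))
```

With a valid key, `call("snapshots", {"url": "github.com", "limit": 10, "collapse": "digest"}, key)` mirrors the curl request above.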