docs / web-archive

Web Archive API

base /web-archive/v17 endpoints

post/web-archive/v1/snapshots1 credit

All Wayback captures for a URL (timestamp, status, mimetype, digest, archival size, snapshot_url), date/status/mime filtered, collapsible, RESUME-KEY PAGINATED.

Parameter		Allowed / range	Description
url	required	—	The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
from	optional	—	Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
to	optional	—	Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
limit = 100	optional	1–1000	Max captures to return (1-1000, default 100). Page further with meta.resume_key on the snapshots action.
collapse	optional	digest · timestamp:4 · timestamp:6 · timestamp:8 · timestamp:10 · urlkey · original · statuscode · mimetype	Deduplicate consecutive captures on a field (CDX 'collapse'). collapse=digest is the change-detection workhorse.
match = exact	optional	exact · prefix · host · domain	How to match url: exact \| prefix \| host \| domain. domain_captures forces 'domain'.
status	optional	—	Keep only captures with this HTTP status (e.g. 200, 404, 301). Prefix with '!' to exclude (e.g. '!200').
mime	optional	—	Keep only captures of this MIME type (e.g. text/html, application/pdf, image/png). Prefix with '!' to exclude.
filter	optional	—	Advanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch.
resume_key	optional	—	Opaque pagination token from meta.resume_key of the previous page; returns the next page of captures.

Try in playground →

post/web-archive/v1/available1 credit

Closest single Wayback snapshot to a given date (or the latest). Fast existence check.

Parameter		Allowed / range	Description
url	required	—	The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
timestamp	optional	—	Target date for the closest snapshot. ISO date or Wayback timestamp prefix. Omitted → the most recent capture.

Try in playground →

post/web-archive/v1/history1 credit

Lifespan + capture cadence for a URL: EXACT first_seen/last_seen + span, plus per-year capture counts and status/mime breakdown — the SEO/due-diligence summary.

Parameter		Allowed / range	Description
url	required	—	The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
from	optional	—	Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
to	optional	—	Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
match = exact	optional	exact · prefix · host · domain	How to match url: exact \| prefix \| host \| domain. domain_captures forces 'domain'.

Try in playground →

post/web-archive/v1/domain_captures1 credit

All archived URLs under a domain (the domain + its subdomains), one row per unique URL — the 'every page this site ever had' view. Resume-key paginated.

Parameter		Allowed / range	Description
url	required	—	The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
from	optional	—	Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
to	optional	—	Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
limit = 100	optional	1–1000	Max captures to return (1-1000, default 100). Page further with meta.resume_key on the snapshots action.
status	optional	—	Keep only captures with this HTTP status (e.g. 200, 404, 301). Prefix with '!' to exclude (e.g. '!200').
mime	optional	—	Keep only captures of this MIME type (e.g. text/html, application/pdf, image/png). Prefix with '!' to exclude.
filter	optional	—	Advanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch.
resume_key	optional	—	Pagination token from the previous page's meta.resume_key.

Try in playground →

post/web-archive/v1/batch1 credit

Closest-snapshot existence + capture-count for up to 20 URLs in one call (bulk archival presence — competitor/portfolio sweeps).

Parameter		Allowed / range	Description
urls	required	—	Up to 20 URLs/hosts to look up at once (JSON array or comma/newline-separated string).
timestamp	optional	—	Target date for the closest snapshot. ISO date or Wayback timestamp prefix. Omitted → the most recent capture.

Try in playground →

post/web-archive/v1/cc_search1 credit

Common Crawl index lookup for a URL — alternate/broader coverage with richer fields (detected language, encoding, WARC offset/filename). Cross-source corroboration.

Parameter		Allowed / range	Description
url	required	—	The URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
collection	optional	—	Common Crawl index id (e.g. 'CC-MAIN-2026-21'). Omit to use the newest index. List ids via action='cc_indexes'.
match = exact	optional	exact · prefix · host · domain	URL match mode for the Common Crawl index query.
limit = 100	optional	1–1000	Max captures to return (1-1000, default 100). Page further with meta.resume_key on the snapshots action.
from	optional	—	Earliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
to	optional	—	Latest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
filter	optional	—	Advanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch.

Try in playground →

post/web-archive/v1/cc_indexes1 credit

List the available Common Crawl monthly indexes (id, name, date range) — pick a collection for cc_search.

Try in playground →

Example request · snapshots

curl -X POST https://api.reefapi.com/web-archive/v1/snapshots \
  -H "x-api-key: $REEF_KEY" \
  -H "content-type: application/json" \
  -d '{"url":"github.com","limit":10,"collapse":"digest"}'

Response shape

{
  "ok": true,
  "data": { /* the result */ },
  "meta": {
    "latency_ms": 240,
    "record_count": 12,
    "completeness_pct": 100
  },
  "error": null
}