docs / web-archive

Web Archive API

Web Archive API

base /web-archive/v17 endpoints
post/web-archive/v1/snapshots1 credit

All Wayback captures for a URL (timestamp, status, mimetype, digest, archival size, snapshot_url), date/status/mime filtered, collapsible, RESUME-KEY PAGINATED.

ParameterAllowed / rangeDescription
urlrequiredThe URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
fromoptionalEarliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
tooptionalLatest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
limit = 100optional1–1000Max captures to return (1-1000, default 100). Page further with meta.resume_key on the snapshots action.
collapseoptionaldigest · timestamp:4 · timestamp:6 · timestamp:8 · timestamp:10 · urlkey · original · statuscode · mimetypeDeduplicate consecutive captures on a field (CDX 'collapse'). collapse=digest is the change-detection workhorse.
match = exactoptionalexact · prefix · host · domainHow to match url: exact | prefix | host | domain. domain_captures forces 'domain'.
statusoptionalKeep only captures with this HTTP status (e.g. 200, 404, 301). Prefix with '!' to exclude (e.g. '!200').
mimeoptionalKeep only captures of this MIME type (e.g. text/html, application/pdf, image/png). Prefix with '!' to exclude.
filteroptionalAdvanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch.
resume_keyoptionalOpaque pagination token from meta.resume_key of the previous page; returns the next page of captures.
Try in playground →
post/web-archive/v1/available1 credit

Closest single Wayback snapshot to a given date (or the latest). Fast existence check.

ParameterAllowed / rangeDescription
urlrequiredThe URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
timestampoptionalTarget date for the closest snapshot. ISO date or Wayback timestamp prefix. Omitted → the most recent capture.
Try in playground →
post/web-archive/v1/history1 credit

Lifespan + capture cadence for a URL: EXACT first_seen/last_seen + span, plus per-year capture counts and status/mime breakdown — the SEO/due-diligence summary.

ParameterAllowed / rangeDescription
urlrequiredThe URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
fromoptionalEarliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
tooptionalLatest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
match = exactoptionalexact · prefix · host · domainHow to match url: exact | prefix | host | domain. domain_captures forces 'domain'.
Try in playground →
post/web-archive/v1/domain_captures1 credit

All archived URLs under a domain (the domain + its subdomains), one row per unique URL — the 'every page this site ever had' view. Resume-key paginated.

ParameterAllowed / rangeDescription
urlrequiredThe URL, host, or path to look up in the archive (e.g. 'github.com', 'github.com/torvalds', 'https://example.com/page'). Scheme optional. Use with match=prefix/domain/host to widen.
fromoptionalEarliest capture to include. ISO date (YYYY-MM-DD) or a Wayback timestamp (YYYYMMDDhhmmss, any 4-14 digit prefix). Inclusive.
tooptionalLatest capture to include. ISO date or Wayback timestamp prefix. Inclusive.
limit = 100optional1–1000Max captures to return (1-1000, default 100). Page further with meta.resume_key on the snapshots action.
statusoptionalKeep only captures with this HTTP status (e.g. 200, 404, 301). Prefix with '!' to exclude (e.g. '!200').
mimeoptionalKeep only captures of this MIME type (e.g. text/html, application/pdf, image/png). Prefix with '!' to exclude.
filteroptionalAdvanced raw CDX filter expression(s), comma-separated. Format [!]field:regex over urlkey/timestamp/original/mimetype/statuscode/digest/length (e.g. 'original:.*\.pdf$'). Power-user escape hatch.
resume_keyoptionalPagination token from the previous page's meta.resume_key.
Try in playground →
post/web-archive/v1/batch1 credit

Closest-snapshot existence + capture-count for up to 20 URLs in one call (bulk archival presence — competitor/portfolio sweeps).

ParameterAllowed / rangeDescription
urlsrequiredUp to 20 URLs/hosts to look up at once (JSON array or comma/newline-separated string).
timestampoptionalTarget date for the closest snapshot. ISO date or Wayback timestamp prefix. Omitted → the most recent capture.
Try in playground →
post/web-archive/v1/cc_indexes1 credit

List the available Common Crawl monthly indexes (id, name, date range) — pick a collection for cc_search.

Try in playground →