Overview
Forage is a declarative language for gathering data. A recipe says what records you want and where they live; the runtime fetches them, types them, and returns a structured result. A recipe is data, not code, so it's easy to review and share.
Every recipe follows the same arc: gather records from a source, enrich them, share the result on the hub.
Recipes are data, not code
Most data collection is bespoke code: a script per site, each with its own request loop, retries, and parsing quirks. That code is expressive, but hard to review or hand off.
A Forage recipe declares; it doesn't execute. It names the types it emits, the requests to make, and how each response binds into typed records. The engine is the only thing that runs requests, drives a browser, or applies transforms. A recipe can't run arbitrary code: transforms come from a fixed vocabulary, pagination from a named set of strategies. Narrower expression buys a much smaller trusted surface, which is what makes a recipe reviewable as a diff.
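To make the "fixed vocabulary" idea concrete, here is a minimal Python sketch. It is not Forage syntax and not the engine's implementation: the dictionary layout, the transform names (`trim`, `lower`, `to_int`), and the pagination strategy names are all hypothetical. It only illustrates why a declaration that can reference nothing outside an allowlist is straightforward to validate and review.

```python
# Conceptual sketch only: this is not Forage syntax or the real engine.
# ALLOWED_TRANSFORMS, ALLOWED_PAGINATION, and the recipe layout are
# hypothetical names invented for illustration.

ALLOWED_TRANSFORMS = {
    "trim": str.strip,
    "lower": str.lower,
    "to_int": int,
}
ALLOWED_PAGINATION = {"none", "page_number", "cursor"}


def validate(recipe: dict) -> list[str]:
    """Return a list of problems; an empty list means the recipe is acceptable."""
    problems = []
    if recipe.get("pagination", "none") not in ALLOWED_PAGINATION:
        problems.append(f"unknown pagination strategy: {recipe['pagination']}")
    for field, transforms in recipe.get("fields", {}).items():
        for name in transforms:
            if name not in ALLOWED_TRANSFORMS:
                problems.append(f"{field}: unknown transform {name!r}")
    return problems


def apply_transforms(recipe: dict, raw: dict) -> dict:
    """The stand-in 'engine': the only place where anything actually runs."""
    record = {}
    for field, transforms in recipe["fields"].items():
        value = raw[field]
        for name in transforms:
            value = ALLOWED_TRANSFORMS[name](value)
        record[field] = value
    return record


recipe = {
    "pagination": "page_number",
    "fields": {"title": ["trim", "lower"], "price": ["trim", "to_int"]},
}
# A recipe that tries to name something outside the vocabulary is rejected
# before anything runs.
bad_recipe = {"pagination": "none", "fields": {"title": ["eval"]}}
```

Because the recipe is plain data, a reviewer only has to confirm that each name it uses is in the allowed set; there is no recipe-supplied code path to audit.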
Gather
A recipe pulls typed records from any site or API. You declare the types you want, plus the for / emit bindings that map each response onto them.
A recipe acquires data two ways, mixed freely in one body:
- step: the data is already in the response. Fetches over HTTP, no browser.
- visit: the data only appears once JavaScript runs. Drives a real browser, which also clears bot-management gates that block plain clients.
Both emit the same record types, so downstream code never cares which ran; the runtime only stands up a browser when a visit is present. Pagination, auth, and HTML extraction are part of gathering too.
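The "browser only when a visit is present" rule can be pictured with a short Python sketch. This is a conceptual model, not Forage syntax or the runtime's code: the list-of-dicts body layout, the `kind` key, and the `needs_browser` helper are all hypothetical.

```python
# Conceptual sketch only; the body layout and helper name are hypothetical.

def needs_browser(body: list[dict]) -> bool:
    """A browser is required only if at least one entry is a visit."""
    return any(entry["kind"] == "visit" for entry in body)


# A body made only of steps can run over plain HTTP.
http_only = [{"kind": "step", "url": "https://example.com/api/items"}]

# Mixing in a single visit is enough to require a browser for the run.
mixed = [
    {"kind": "step", "url": "https://example.com/api/items"},
    {"kind": "visit", "url": "https://example.com/items"},
]
```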
→ Syntax · Engines · HTML · Auth
Enrich
Records are typed, so they compose. compose chains recipes so one's output feeds the next; an adapter adds fields or reconciles against another source. aligns maps types and fields onto shared vocabularies (schema.org, Wikidata) so a record means the same thing across authors.
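As a rough illustration of why typed records compose, the Python sketch below chains a gathering stage into an enrichment stage. It is an analogy, not Forage's `compose`: the `Product` type, the `gather` and `add_prices` stages, and the `PRICES` lookup are hypothetical.

```python
# Conceptual sketch only; every name here is hypothetical, not Forage's API.
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Product:
    name: str
    sku: str
    price_cents: Optional[int] = None


Stage = Callable[[Iterable[Product]], Iterable[Product]]


def compose(*stages: Stage) -> Stage:
    """Chain stages so each one's output feeds the next."""
    def run(records: Iterable[Product]) -> Iterable[Product]:
        for stage in stages:
            records = stage(records)
        return records
    return run


def gather(_: Iterable[Product]) -> list[Product]:
    # Stand-in for a recipe that emits Product records from a source.
    return [Product("Widget", "W-1"), Product("Gadget", "G-2")]


PRICES = {"W-1": 1999}  # stand-in for a second source to reconcile against


def add_prices(records: Iterable[Product]) -> list[Product]:
    # Stand-in for an adapter: adds a field, leaves the type unchanged.
    return [replace(r, price_cents=PRICES.get(r.sku)) for r in records]


pipeline = compose(gather, add_prices)
result = list(pipeline([]))
```

Because both stages agree on the `Product` type, the enrichment stage needs no knowledge of how the records were gathered.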
Share
A recipe and its types publish to the hub as a self-contained package: source, shared declarations, fixtures, and the snapshot it produced. Anyone can search, import, fork, or build an adapter on its types. The hub indexes by type and alignment, so enrichment travels across authors.
→ Hub
Run it
A workspace is a directory of .forage files plus the data that travels with them: _fixtures/<recipe>.jsonl (captured exchanges) and _snapshots/<recipe>.json (the records the recipe should produce). One execution path, three modes:
- Replay: requests are served from fixtures. Sub-second, no network. Your test loop.
- Record: hit the live source, refresh the fixtures, re-run in replay. The one-command repair when a site changes shape.
- Live: production. Records flow wherever you wire them.
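The three modes above can be modeled as one loop in which only the source of each response changes. The Python sketch below is a conceptual model under that assumption, not the Forage runtime: `Mode`, `run`, `fake_fetch`, and the in-memory `fixtures` dict (standing in for the fixture file) are hypothetical.

```python
# Conceptual sketch only; names and fixture layout are hypothetical.
from enum import Enum
from typing import Callable


class Mode(Enum):
    REPLAY = "replay"
    RECORD = "record"
    LIVE = "live"


def run(urls: list[str], mode: Mode, fixtures: dict[str, str],
        fetch_live: Callable[[str], str]) -> list[str]:
    """Single execution path; the mode only decides where responses come from."""
    responses = []
    for url in urls:
        if mode is Mode.REPLAY:
            body = fixtures[url]        # served from fixtures, no network
        else:
            body = fetch_live(url)      # hit the live source
            if mode is Mode.RECORD:
                fixtures[url] = body    # refresh the captured exchange
        responses.append(body)
    return responses


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP client so the sketch runs offline.
    return f"<live body for {url}>"


fixtures: dict[str, str] = {}
urls = ["https://example.com/a"]
recorded = run(urls, Mode.RECORD, fixtures, fake_fetch)   # refresh fixtures
replayed = run(urls, Mode.REPLAY, fixtures, fake_fetch)   # re-run from fixtures
```

Recording and then replaying yields the same responses, which is what lets a replay run serve as a fast, network-free test of the recipe.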
Every run returns a diagnostic report: whether anything cut the run short, and which expect invariants didn't hold. In Studio, recipes hot-reload on save.
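One way to picture such a report is as a small structure with two parts: an optional reason the run stopped early, and the names of invariants that failed. The Python sketch below assumes that shape; the `Report` fields, the `check` helper, and the example invariants are hypothetical and do not describe Forage's actual report format or `expect` syntax.

```python
# Conceptual sketch only; the report shape and all names are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Report:
    stopped_early: Optional[str] = None  # reason the run was cut short, if any
    failed_expectations: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return self.stopped_early is None and not self.failed_expectations


def check(records: list[dict],
          expectations: dict[str, Callable[[list[dict]], bool]]) -> Report:
    """Evaluate each named invariant and collect the ones that did not hold."""
    report = Report()
    for name, holds in expectations.items():
        if not holds(records):
            report.failed_expectations.append(name)
    return report


records = [{"name": "Widget", "price": 1999}, {"name": "Gadget", "price": None}]
expectations = {
    "at least one record": lambda rs: len(rs) >= 1,
    "every record has a price": lambda rs: all(r["price"] is not None for r in rs),
}
report = check(records, expectations)
```

Here the run completes, but the report flags the second invariant, since one record is missing a price.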
→ Replay · Expectations · Diagnostics
Where to go next
- New here? Quickstart runs your first recipe end to end.
- Writing recipes? Start with the Syntax reference.
- Tools: the CLI, Studio, and the hub.