Syntax reference
Every construct in the .forage DSL. Read top-to-bottom for a tour; jump to a section if you know what you're looking for. For the formal EBNF, see Grammar.
Workspace shape
A workspace is a directory marked by forage.toml. Source files live as .forage files at any depth, typically flat at the workspace root. File position carries no semantics: every .forage file declares zero or one recipes and zero or more type / enum / fn declarations. Reserved data dirs sit alongside source:
my-recipes/
├── forage.toml
├── catalog.forage          // shared type / enum declarations
├── acme-store.forage       // recipe "acme-store"
├── market-deals.forage     // recipe "market-deals"
├── _fixtures/
│   ├── acme-store.jsonl
│   └── market-deals.jsonl
├── _snapshots/
│   ├── acme-store.json
│   └── market-deals.json
└── .forage/                // daemon runtime state
The default workspace is ~/.forage/, but any directory marked by forage.toml works. _fixtures/<recipe>.jsonl is the replay capture stream, keyed by recipe header name; _snapshots/<recipe>.json is the golden snapshot; .forage/ holds the daemon's SQLite state. Source files can't live inside these reserved directories.
Recipe header
A .forage file declares at most one recipe. The header (recipe "<name>") sits at the top of the file alongside the other top-level forms. There is no engine declaration and no surrounding { } block — a recipe mixes steps (author requests) and visits (drive a browser) freely, and the runtime stands up a browser only if the body has a visit.
recipe "acme-store"
// ... body ...
The string in the header is the recipe's identity: the daemon, output stores, fixtures, snapshots, and hub publishes all key on it. File basenames are organizational and incidental; a file named foo.forage can declare a recipe named bar. Comments are // to end-of-line or /* … */ block.
A file without a recipe "..." header is a pure declarations file. It contributes shared types / enums / fns to the workspace catalog and can't carry recipe-only forms (auth, step, visit, for, emit, expect).
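As a rough illustration, a pure declarations file might look like the sketch below. It only combines forms described on this page; the file name catalog.forage follows the workspace layout above, and the tidy function is a hypothetical helper, not something the engine provides.

```forage
// catalog.forage: no recipe "..." header, so this is a pure declarations file.
// Everything marked share lands in the workspace catalog.

share type Product {
  externalId: String
  name: String
  brand: String?
}

share enum Channel { ONLINE, STORE }

// Hypothetical helper, built only from registry transforms.
share fn tidy($x) { $x | trim | lowercase }

// Recipe-only forms (auth, step, visit, for, emit, expect) are not allowed here.
```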
Types
Declare the shape of records the recipe will emit. Fields are typed; ? marks a field optional; [T] is a list. Nested record types are allowed.
type Product {
  externalId: String
  name: String
  brand: String?
  price: Double?
  tags: [String]
}
Built-in scalars: String, Int, Double, Bool.
By default a type is file-scoped: visible only to the recipe (or other declarations) in the same file. Prefix with share to publish it to the workspace catalog:
share type Product { … } // visible to every recipe in the workspace
type LocalPanel { … } // file-scoped helper
Workspace-wide name collisions among shared types are a validator error. A file-scoped type in one file overrides a same-named shared type when both reach the same recipe's catalog, useful for recipe-specific overrides of a shared shape.
Enums
A closed set of named variants. Used as field types and in iteration.
share enum Channel { ONLINE, STORE }
Like type, enum is file-scoped by default and share-able to the workspace.
Inputs
Per-run parameters supplied by the consumer. The same recipe can serve every store on a platform; per-store config (store id, menu URL, category list) comes in as inputs.
input storeId: String
input channels: [Channel]
input categoryIds: [Int]
Reference an input anywhere a value is expected as $input.fieldName. input declarations are recipe-local; they don't take share.
Emits
A recipe may declare the set of types it emits with a top-level emits clause. Single-type recipes use emits T; multi-type recipes declare a sum with |:
emits Product // single-type
emits Product | Variant | PriceObservation // multi-type sum
The clause is optional. When present, every emit X { … } in the body must reference a type listed in emits; the validator flags mismatches. When omitted, the recipe's output shape is inferred from whatever its body emits, and no per-emit cross-check fires. The clause sits alongside the header and other top-level forms.
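The cross-check can be pictured with a sketch like the following. It assumes Product, Variant, and PriceObservation are declared somewhere in the recipe's catalog; the emit bodies are elided, and the exact wording of the validator's diagnostic is not shown here.

```forage
recipe "acme-store"

emits Product | Variant

// ... steps and loops ...

emit Product { … }            // fine: Product is listed in emits
emit Variant { … }            // fine: Variant is listed in emits
emit PriceObservation { … }   // flagged by the validator: not listed in emits
```

Dropping the emits line would let all three emits through, since the output shape would then be inferred from the body.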
Auth
Auth strategies are named, fixed primitives. Pick one (or none); the engine knows how to apply it.
auth.staticHeader
A single header sent on every request.
auth.staticHeader {
  name: "X-Store-Id"
  value: $input.storeId
}
auth.htmlPrime
For sites that gate their AJAX endpoints behind a per-session nonce and a cookie set on first page load. A named step performs the prime; the engine extracts the nonce by regex on the response body and carries the cookie forward.
auth.htmlPrime {
  step: prime
  nonceVar: "ajaxNonce"
  ajaxUrlVar: "ajaxUrl"
}
Steps
A step names an HTTP request whose response becomes addressable as $<stepName>. Steps appear at the top level of a recipe and can be nested inside for loops.
step products {
  method "POST"
  url "https://api.example.com/products"
  body.json {
    page: 1
    pageSize: 50
    filters: { category: [$catId] }
  }
}
Step keys:
| Key | Form | Notes |
|---|---|---|
| method | String literal | "GET", "POST", … |
| url | String literal | Templated: {$input.x} and {$var.path} interpolations. |
| headers | Object | Per-request headers. Static-header auth is layered on top. |
| body.json | Object | JSON body. Values can reference inputs, loop vars, prior step outputs. |
| body.form | Object | Form-encoded body. |
| paginate | Strategy block | See Pagination. |
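To show several of these keys together, here is a hedged sketch of a step that uses a templated url, per-request headers, and a form-encoded body. The endpoint, the header, and the $catId loop variable are illustrative assumptions; body.form is assumed to take the same object syntax as body.json.

```forage
step search {
  method "POST"
  url "https://api.example.com/stores/{$input.storeId}/search"
  headers {
    "Accept": "application/json"
  }
  body.form {
    category: $catId   // assumed to come from an enclosing for loop
    page: 1
  }
}
```

After the step runs, its response would be addressable as $search.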
Visits
A visit is the browser-driving counterpart of a step. Instead of authoring an HTTP request, it drives the browser: navigate to a URL, optionally scroll or click until the page settles, then observe what the page rendered and fetched. It binds $<name>, mixes freely with steps in the same body, and like a step can be nested inside for loops.
visit list {
  url "https://letterboxd.com/films/popular/"
  scroll until noProgressFor(2)
}
Visit keys:
| Key | Form | Notes |
|---|---|---|
| url | String literal | Templated, like a step's url. The page to navigate to. Required. |
| paginate | Settle clause | Optional: scroll until noProgressFor(n) or click "<sel>" until noProgressFor(n), plus optional maxIterations <n> / iterationDelay <secs>. Absent, the visit navigates once and settles on load. |
The binding exposes two things:
- $<name>.dom: the settled document as a node; query it with select / text / attr.
- $<name> | matched("<url-substring>"): the body of the first intercepted fetch/XHR whose URL contains the argument, parsed as JSON; reach into it with getField.
Nest a visit in a for and template its url off a prior capture to chain master → detail — or feed a step listing into per-item visits. See the visit statement for worked examples.
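A minimal sketch of the step-listing-into-visits pattern, under stated assumptions: the listing endpoint, the items field on its response, the "h1" selector, and the Item type (taken to have id and title fields) are all hypothetical, and the select / text chain is assumed to yield a single string.

```forage
step listing {
  method "GET"
  url "https://api.example.com/items"
}

for $item in $listing.items[*] {
  // One browser visit per listed item, with the url templated off the step's response.
  visit detail {
    url "https://example.com/items/{$item.id}"
  }

  emit Item {
    id ← $item.id | toString
    title ← $detail.dom | select("h1") | text
  }
}
```

Because the body contains a visit, the runtime would stand up a browser for this recipe; the step on its own would not need one.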
Iteration
Two iteration sources: a list value (e.g. an input or a path into a response) or an enum's variants.
for $channel in $input.channels {
  // $channel is a Channel value, available in nested steps and emits
}
for $product in $products[*] {
  // $product is one element of the $products response list
}
Loops can nest. Inner scopes see all variables from enclosing scopes.
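Nesting might look like the sketch below, which reuses the inputs and the products step from earlier sections. The channel field in the request body and the shape of the response are assumptions for illustration.

```forage
for $channel in $input.channels {
  for $catId in $input.categoryIds {
    // Both $channel and $catId are in scope here.
    step products {
      method "POST"
      url "https://api.example.com/products"
      body.json {
        channel: $channel
        filters: { category: [$catId] }
      }
    }

    for $product in $products[*] {
      // The innermost scope sees $channel, $catId, and $product.
      emit Product {
        externalId ← $product.id | toString
        name ← $product.name
      }
    }
  }
}
```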
Emit
An emit binds the fields of a declared type to extraction expressions. Each emit produces one record in the output snapshot.
emit Product {
  externalId ← $product.id | toString
  name ← $product.name
  brand ← $product.brand?.name
  price ← $product.price
}
The validator checks every required (non-optional) field is bound and every bound field type matches.
Path expressions
The right-hand side of an emit field is a path expression with optional pipes through transforms.
| Form | Meaning |
|---|---|
| $step | The full response value from a named step. |
| $visit.dom | The settled document from a named visit (use matched("…") for its XHRs). |
| $input.x | A recipe input. |
| $loopVar | The current iteration value. |
| .field | Object field access. |
| ?.field | Optional chaining: short-circuits to null if any intermediate is null. |
| [*] | Iterate over a list (in for-loops) or map over a list (in expressions). |
| [N] | Index a list by integer. |
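The forms in the table compose. The sketch below annotates each one in place; the response shape (an items list whose elements carry id, names, brand, and tags fields) is assumed purely for illustration.

```forage
for $product in $products.items[*] {          // [*] iterates in a for-loop
  emit Product {
    externalId ← $product.id | toString       // .field, then a transform
    name ← $product.names[0]                  // [N] indexes a list
    brand ← $product.brand?.name              // ?.field yields null if brand is null
    tags ← $product.tags[*].label             // [*] maps over a list in an expression
  }
}
```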
String templates
A string literal in a URL, header, or body becomes a template: every {...} interpolation is a full extraction expression, evaluated against the current scope and stringified into the surrounding text.
url "https://api.example.com/stores/{$input.storeId}/products?page={$i}"
headers {
  "X-Trace": "page-{$i | toString}"
}
body.json {
  key: "price_{$weight | lowercase | replace(" ", "_")}" // dynamic key built from a pipeline
}
Inside {...}, you can use the same forms an extraction supports:
- bare paths: {$input.x}, {$step.list[0].id}
- pipe transforms: {$count | toString}, {$label | lowercase}
- function-call transforms: {coalesce($a, $b, "fallback")}
- case … of { … } branches
Transforms inside template interpolations are checked by the validator at load time, so a typo'd {$x | snak_case} fails before any HTTP request fires rather than at runtime, three pages into a paginated scrape.
Transforms
Transforms are named, engine-implemented functions chained with |. The vocabulary is fixed: new transforms land in Rust as real platforms need them, not invented per-recipe.
| Transform | Effect |
|---|---|
| toString | Number or bool to a string. |
| lower / lowercase, upper / uppercase | Change case. |
| capitalize / titleCase | Capitalize the first letter, or every word. |
| trim | Strip surrounding whitespace. |
| replace(find, repl) | Replace a literal substring. |
| split(sep) | Split a string into a list. |
| match(/re/) | Regex match; returns { matched, captures }. |
| matches(/re/) | Whether the string matches (bool). |
| replaceAll(/re/, repl) | Replace every regex match. |
| parseInt / parseFloat / parseBool | Parse a string to that scalar; null if it doesn't parse. |
| parseJson | Parse a JSON string into a value. |
| parseHtml | Parse an HTML string into a node for select. |
| length | Length of a list or string (0 for null). |
| dedup | Drop duplicates, preserving order. |
| first | First element of a list (null if empty). |
| coalesce(a, …) | The piped value if non-null, else the first non-null argument. |
| default(v) | Substitute v when the piped value is null. |
| getField(name) | Look up a field whose name is computed at runtime. |
| select(css) | CSS-select within a parsed node. |
| text | Text content of a node. |
| attr(name) | A node's attribute value. |
| html / innerHtml | Outer / inner HTML of a node. |
The HTML transforms are covered in HTML extraction. A few transforms fetch over the engine's transport for reconciliation (today, wikidataEntity(qid)); see Compose.
Domain-specific transforms (size/price parsers, currency normalizers) live as user-defined functions, or share fn declarations in the workspace. The engine registry stays generic.
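As a sketch of how registry transforms chain, the emit below cleans up a few raw fields. The source fields (priceText, tagList) are hypothetical, and it assumes parseFloat produces a value acceptable for a Double? field.

```forage
emit Product {
  externalId ← $product.id | toString
  name ← $product.name | trim | titleCase
  brand ← $product.brand?.name | default("Unknown")
  price ← $product.priceText | replace("$", "") | parseFloat   // null if the text doesn't parse
  tags ← $product.tagList | split(",") | dedup
}
```

Each stage receives the previous stage's output, so order matters: stripping the currency sign has to happen before parseFloat.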
User functions
A fn declaration introduces a named transform, pipe-callable as $x | myFn or directly as myFn($x, $y). Like type and enum, fn is file-scoped by default and workspace-visible with share.
share fn shouty($x) { $x | upper | trim }
fn variantKey($name) {
  case $name of {
    "Half Ounce" → "half_ounce"
    "Ounce" → "ounce"
  }
}
See User-defined functions for full semantics: let-bindings, scope rules, namespace resolution.
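Both call forms can appear in the same emit. The sketch below uses the shouty and variantKey functions declared above; the Variant type and its label and key fields are assumptions for illustration.

```forage
emit Variant {
  label ← $variant.name | shouty        // pipe form: $x | myFn
  key ← variantKey($variant.name)       // direct-call form: myFn($x)
}
```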
case expressions
Branch on an enum value. Useful when the same emit binds differently per dimension.
price ← case $channel of {
  ONLINE → $variant.priceOnline
  STORE → $variant.priceInStore
}
The validator requires every variant of the enum to be covered.
Expectations
An expect block declares an invariant about the snapshot the recipe is supposed to produce. The engine evaluates each clause at the end of a run and adds any failures to report.unmetExpectations. The result is a structured diagnostic, so the consumer isn't left wondering why the output looks thin.
expect { records.where(typeName == "Product").count >= 50 }
expect { records.where(typeName == "Variant").count > 0 }
See the expectations page for the full grammar and failure rendering.
The validator is your first reader
Most mistakes are caught statically before any HTTP request fires: unknown types, unbound paths, missing required fields, unknown transforms (including ones inside {...} template interpolations), and non-exhaustive case branches. Errors point at the line and column with terms from the DSL.