Prose flows

A .flow.md file is the primary authoring surface. Describe the scenario the way you’d type it into a ticket. The planner compiles the prose to a deterministic IR and writes the result alongside as .flow.json; you commit that cache, and the runtime executes it like any hand-authored YAML flow.


# Sign in and see today's notifications
 
Sign in with the seeded test user, dismiss the onboarding modal, and
assert that the home screen shows today's notifications. Take a visual
snapshot called "home-after-login".

That is the whole flow file. No imports, no selectors, no setup boilerplate.

File structure

A .flow.md file has three parts; only the title is required.


---
hints:
  effort: high
  preferRoles: ['button', 'link']
fixtures:
  email: pm@example.com
---
 
# Login smoke
 
Type the email into the email field and tap Sign in. Wait for the
welcome greeting.

Front-matter (optional)

YAML between --- fences. Two top-level keys are recognized:

fixtures — free-form key/value pairs surfaced to the planner. Useful for inline test data that does not deserve a separate fixtures/ file.
hints.model / hints.effort / hints.preferRoles — planner overrides. effort: high lets the planner think longer; preferRoles: ['button'] biases targets toward role-tagged elements.
Unknown keys are rejected at parse time.
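
The unknown-key rule can be sketched as a small validator over the already-parsed YAML object. This is an illustration only: the `validateFrontMatter` name, the error wording, and the assumption that `model` and `effort` are plain strings are not taken from klera itself.

```typescript
// Hypothetical sketch: klera's real parser and error messages may differ.
type Hints = { model?: string; effort?: string; preferRoles?: string[] };
type FrontMatter = { hints?: Hints; fixtures?: Record<string, unknown> };

const TOP_LEVEL_KEYS = ["hints", "fixtures"];
const HINT_KEYS = ["model", "effort", "preferRoles"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Throws on the first key that is not in the allowed list.
function rejectUnknownKeys(obj: Record<string, unknown>, allowed: string[], where: string): void {
  for (const key of Object.keys(obj)) {
    if (!allowed.includes(key)) {
      throw new Error(`Unknown ${where} key: ${key}`);
    }
  }
}

function validateFrontMatter(raw: unknown): FrontMatter {
  if (!isRecord(raw)) throw new Error("Front-matter must be a mapping");
  rejectUnknownKeys(raw, TOP_LEVEL_KEYS, "front-matter");

  const result: FrontMatter = {};
  const hints = raw["hints"];
  if (hints !== undefined) {
    if (!isRecord(hints)) throw new Error("hints must be a mapping");
    rejectUnknownKeys(hints, HINT_KEYS, "hints");
    result.hints = hints as Hints;
  }
  const fixtures = raw["fixtures"];
  if (fixtures !== undefined) {
    if (!isRecord(fixtures)) throw new Error("fixtures must be a mapping");
    result.fixtures = fixtures; // free-form: values are not inspected
  }
  return result;
}
```

Failing fast on a misspelled key such as `hintz` is what keeps a typo from silently disabling a planner override.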

Title

The first # Heading line. It becomes Flow.name in the IR.

Body

Everything after the title is handed to the LLM planner verbatim. Write prose; the planner figures out the steps.
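
As a rough sketch of the three-part structure, the split into front-matter, title, and body might look like the following. The `splitFlowFile` function and `FlowSource` type are hypothetical names; the real parser may treat edge cases (for example, headings inside the body) differently.

```typescript
// Hypothetical sketch of the three-part split; not klera's actual parser.
type FlowSource = { frontMatter: string | null; title: string; body: string };

function splitFlowFile(text: string): FlowSource {
  const lines = text.split(/\r?\n/);
  let frontMatter: string | null = null;
  let start = 0;

  // Optional front-matter: the file must open with a --- fence.
  if ((lines[0] ?? "").trim() === "---") {
    const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
    if (close === -1) throw new Error("Unclosed front-matter fence");
    frontMatter = lines.slice(1, close).join("\n");
    start = close + 1;
  }

  // Title: the first "# Heading" line after the front-matter.
  const titleIndex = lines.findIndex((line, i) => i >= start && line.startsWith("# "));
  if (titleIndex === -1) throw new Error("A .flow.md file needs a # title");
  const titleLine = lines[titleIndex] ?? "";

  return {
    frontMatter,
    title: titleLine.slice(2).trim(),
    // Body: everything after the title, passed along unmodified.
    body: lines.slice(titleIndex + 1).join("\n"),
  };
}
```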

How the planner compiles prose to IR

The planner takes three inputs: the prose body, an element-graph snapshot of the screen the flow starts on, and the current planner version. It emits a SemanticPlan: the IR plus a _meta block recording how the plan was produced.


prose body  ─┐
snapshot    ─┼──▶ planner ──▶ SemanticPlan { steps, _meta }
version     ─┘                       │
                                     ▼
                          flows/login.flow.json (committed)

Each prose sentence maps to one or more IR steps. “Tap Sign In” becomes a tap step; “if a What’s New modal appears, dismiss it” becomes an optional step (see IR reference). The planner only emits IR variants the runtime understands — it cannot invent step kinds.
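
The "cannot invent step kinds" constraint can be pictured as a closed set that every emitted step is checked against. The sketch below assumes each step is an object with exactly one key naming its kind, as in the JSON examples on this page, and lists only the two kinds shown here; the IR reference has the full set, and `stepKind` is a hypothetical helper.

```typescript
// Only the two step kinds shown on this page; the real IR has more.
const KNOWN_STEP_KINDS = ["tap", "optional"] as const;
type StepKind = (typeof KNOWN_STEP_KINDS)[number];

// Assumption: a step is an object with exactly one key naming its kind.
function stepKind(step: Record<string, unknown>): StepKind {
  const keys = Object.keys(step);
  if (keys.length !== 1) throw new Error("A step must have exactly one kind");
  const kind = keys[0] ?? "";
  if (!(KNOWN_STEP_KINDS as readonly string[]).includes(kind)) {
    throw new Error(`Unknown step kind: ${kind}`);
  }
  return kind as StepKind;
}
```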

Capturing the snapshot is a one-liner the CLI handles for you:


klera plan flows/login.flow.md --snapshot snap.json

If snap.json does not exist, the CLI starts the runtime, captures the current screen, and writes the snapshot. Subsequent klera plan calls reuse it unless the screen has materially changed.

The `.flow.json` cache

Treat the cache like a lockfile. It is generated, committed, and reviewed alongside the prose change in the same PR — adopters never hand-edit it.

- login.flow.md
- login.flow.json

_meta carries fingerprints of every input the planner saw:


{
  "_meta": {
    "model": "anthropic:claude-sonnet-4-7",
    "promptHash": "sha256:a4f9…",
    "snapshotHash": "sha256:1e2c…",
    "plannerVersion": "0.1.0",
    "combined": "sha256:7b09…",
    "fixturesUsed": ["users.regular"]
  },
  "steps": [
    { "tap": { "testID": "login-email" } },
    /* … */
  ]
}

The combined hash is hash(prose + element_graph + planner_version). When any of the three inputs changes, the cache is stale.
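
A minimal sketch of such a fingerprint, assuming SHA-256 (suggested by the `sha256:` prefix in `_meta`) and a NUL separator between inputs. How klera actually serializes the element graph and joins the parts is not documented here, so treat `combinedHash` as illustrative.

```typescript
import { createHash } from "node:crypto";

// Assumption: inputs are hashed in order with a NUL separator after each one.
// klera's real serialization of the element graph may differ.
function combinedHash(prose: string, elementGraphJson: string, plannerVersion: string): string {
  const hash = createHash("sha256");
  for (const part of [prose, elementGraphJson, plannerVersion]) {
    hash.update(part, "utf8");
    hash.update("\0"); // separator so ("ab", "c") and ("a", "bc") hash differently
  }
  return `sha256:${hash.digest("hex")}`;
}
```

The separator matters: plain concatenation would let a character move from the end of one input to the start of the next without changing the hash.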

Staleness detection

The CLI knows three states:

State	Meaning
fresh	`combined` matches a recompute against the current prose + snapshot + planner version.
stale	At least one input changed. CI gates on this.
missing-combined	A pre-ADR-0054 cache. Treated as stale; regenerate once and the new field lands.
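
The three states reduce to a small decision function. In this sketch, `recomputed` stands for a hash freshly computed from the current prose, snapshot, and planner version; the function names are hypothetical.

```typescript
type CacheState = "fresh" | "stale" | "missing-combined";

// `recomputed` is the hash of the current prose, snapshot, and planner version.
function cacheState(meta: { combined?: string }, recomputed: string): CacheState {
  if (meta.combined === undefined) return "missing-combined";
  return meta.combined === recomputed ? "fresh" : "stale";
}

// Mirrors the documented --check contract: exit 1 if stale, 0 if fresh.
// missing-combined is treated as stale.
function checkExitCode(state: CacheState): 0 | 1 {
  return state === "fresh" ? 0 : 1;
}
```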


klera compile flows/login.flow.md --check     # exits 1 if stale, 0 if fresh
klera compile flows/login.flow.md --force     # always regenerate
klera compile flows/login.flow.md --diff out.md  # write a Markdown step-list diff
klera compile --all --check                   # gate every flow at once (CI)
klera compile --all                           # batch regenerate every flow

klera compile is the canonical way to bring a stale cache up to date. klera plan covers the same ground but is biased toward first-time generation; once you have a .flow.json committed, prefer compile.

CI typically runs klera compile --all --check as a PR gate. The optional klera ci scaffold flag compileMode: 'auto' instead regenerates stale caches and posts a Markdown step-list diff as a PR comment for review.

Run-time drift recovery vs recompile

The cache fingerprints what the planner saw at compile time; the runtime sees what the screen actually looks like at run time. They can disagree.

Symptom	Resolution
Selector drift (`testID` renamed, button moved a pixel)	Matcher self-heals via the strategy ladder. No replan, no recompile.
One-off optional surface (What’s New modal, A/B test)	Runtime replans the remaining steps in-memory. No cache rewrite.
Whole-screen redesign (the planner’s snapshot is wrong)	Recompile: delete `snap.json`, rerun `klera compile --force`.
Prose intent itself changed	Edit the prose, run `klera compile`, commit both files together.

Runtime replanning is on by default and bounded — three rungs of recovery, and every replan attempt is recorded in the report’s matcher trace. Replans never rewrite the on-disk .flow.json; PR diffs stay deterministic.

`--strict` mode


klera run flows/login.flow.md --strict

--strict disables runtime replanning entirely. Intent drift surfaces as a hard failure with the same matcher diagnostics a YAML flow produces. This is the expected mode for CI — the cache committed on the PR is what runs, full stop.

Runtime replanning is a debugging affordance for local iteration. CI should always run --strict. If --strict fails and the local non-strict run passes, you have a stale cache; run klera compile and commit the result.
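
The interaction between bounded replanning and --strict can be sketched as a loop with a hard cap. This assumes each of the three rungs is one in-memory recovery attempt; the `runStep` signature, the `Attempt` shape, and the callbacks are hypothetical.

```typescript
// Hypothetical sketch of a bounded recovery loop. The rung count comes from the
// text above; the names and shapes are illustrative.
const MAX_RUNGS = 3;

type Attempt = { rung: number; ok: boolean };
type Outcome = { passed: boolean; trace: Attempt[] };

// `tryStep` runs the step as cached; `recover` tries one recovery rung in memory.
function runStep(
  tryStep: () => boolean,
  recover: (rung: number) => boolean,
  strict: boolean,
): Outcome {
  const trace: Attempt[] = [];
  if (tryStep()) return { passed: true, trace };
  if (strict) return { passed: false, trace }; // --strict: no replanning at all

  for (let rung = 1; rung <= MAX_RUNGS; rung++) {
    const ok = recover(rung);
    trace.push({ rung, ok }); // every attempt is recorded for the report
    if (ok) return { passed: true, trace };
  }
  return { passed: false, trace }; // bounded: give up after the last rung
}
```

Nothing in the loop writes to disk, which is the property that keeps the committed .flow.json stable across runs.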

How the planner uses the snapshot

The element-graph snapshot is a JSON tree of every accessible node the runtime saw on the starting screen — testID, accessibilityLabel, role, text, frame, parent. The planner projects only IR-relevant fields (no internal id, no fiber bookkeeping) into the prompt so the LLM cannot cite handles that won’t exist at run time.
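
The projection step might look like the sketch below. The field list follows the paragraph above; the `frame` and `parent` shapes, and the use of `id` and `fiber` as stand-ins for internal bookkeeping, are assumptions.

```typescript
// Fields named in the text above; `id` and `fiber` stand in for internal bookkeeping.
type RuntimeNode = {
  id: number;
  fiber?: unknown;
  testID?: string;
  accessibilityLabel?: string;
  role?: string;
  text?: string;
  frame?: { x: number; y: number; width: number; height: number }; // assumed shape
  parent?: string; // assumed to be a reference to the parent node
};
type PromptNode = Omit<RuntimeNode, "id" | "fiber">;

// Drops internal handles so the LLM never sees anything it could cite
// that will not exist at run time.
function projectNode(node: RuntimeNode): PromptNode {
  const { id: _id, fiber: _fiber, ...visible } = node;
  return visible;
}
```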

The snapshot has two jobs:

Disambiguate references. “Tap the Sign In button” is one node; “tap the second Sign In link” is a different one. The snapshot tells the planner which is which.
Anti-hallucination grounding. The planner is instructed to prefer testIDs and labels that appear in the snapshot. Targets that don’t appear get rejected by a semantic-check pass before the cache is written; the retry loop carries the rejection back to the LLM.
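
The grounding check can be approximated as a set-membership test of every planned target against the snapshot. This is a simplified sketch: the `Target` shape and `unknownTargets` name are hypothetical, and the real semantic-check pass may match on more than testID and label.

```typescript
type Target = { testID?: string; accessibilityLabel?: string };

function collect(values: (string | undefined)[]): Set<string> {
  return new Set(values.filter((v): v is string => v !== undefined));
}

// Returns the targets that cite a testID or label absent from the snapshot.
// A non-empty result would be carried back to the LLM by the retry loop.
function unknownTargets(targets: Target[], snapshot: Target[]): Target[] {
  const ids = collect(snapshot.map((n) => n.testID));
  const labels = collect(snapshot.map((n) => n.accessibilityLabel));
  return targets.filter((t) => {
    const idOk = t.testID === undefined || ids.has(t.testID);
    const labelOk = t.accessibilityLabel === undefined || labels.has(t.accessibilityLabel);
    return !(idOk && labelOk);
  });
}
```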

The snapshot itself participates in the cache key — change the screen substantially and the cache goes stale, even if the prose did not change.

Planner transports

Four transports produce a bit-identical SemanticPlan cache. Adopters pick based on what auth they already have.

API


klera plan flows/login.flow.md --snapshot snap.json

Default transport. Calls the Anthropic API directly. Needs ANTHROPIC_API_KEY in the environment. Deterministic, headless, ideal for CI.

Local CLI


klera plan flows/login.flow.md --via-cli claude
# or codex / gemini — auto-detected from PATH at init time

Spawns the developer’s local coding-agent CLI, pipes the prompt to stdin, parses JSON from stdout. No API key needed — the LLM bill goes through your existing Claude / Codex / Gemini subscription. klera init writes the chosen CLI into .klera/config.yaml.

Manual paste


klera plan flows/login.flow.md --snapshot snap.json --manual
# wrote flows/login.plan-prompt.md
 
# … paste into any chat-style LLM, save the response, then:
 
klera plan flows/login.flow.md --snapshot snap.json \
  --apply-response flows/login.plan-response.json

Writes the prompt to disk. Paste it into any chat-style LLM, save the response to a file, and feed that file back with --apply-response. No API key, no CLI, no network calls from klera.

MCP


klera plan flows/login.flow.md --snapshot snap.json \
  --via-mcp 'pnpm exec klera mcp'

Spawns an MCP server and routes plan_flow through it. Editor agents (Claude Code, Cursor) call the same MCP tools directly so the round-trip happens inside the editor without leaving the surface you author in.

The _meta.model field on the cached IR records which transport produced the plan: "anthropic:<model-id>", "manual", "manual:claude-3-5-sonnet" (when you pass a custom modelTag), or "mcp:host" / "mcp:server". Triage and the HTML report viewer surface this in the report header.
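
For tooling that reads the cache, the tag can be split on the first colon. The sketch below covers only the formats listed above; the tag written by the local CLI transport is not listed, so it falls through to "unknown". `transportOf` is a hypothetical helper.

```typescript
type Transport = "api" | "manual" | "mcp" | "unknown";

// Splits a _meta.model tag into a transport and an optional detail.
function transportOf(modelTag: string): { transport: Transport; detail: string | null } {
  const sep = modelTag.indexOf(":");
  const prefix = sep === -1 ? modelTag : modelTag.slice(0, sep);
  const detail = sep === -1 ? null : modelTag.slice(sep + 1);
  switch (prefix) {
    case "anthropic":
      return { transport: "api", detail }; // detail is the model id
    case "manual":
      return { transport: "manual", detail }; // detail is the custom modelTag, if any
    case "mcp":
      return { transport: "mcp", detail }; // detail is "host" or "server"
    default:
      return { transport: "unknown", detail };
  }
}
```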

See planner transports for the full transport reference.

What `klera run` does with the cache

klera run flows/login.flow.md never calls the LLM. It loads the sibling .flow.json, validates it via Zod, and executes the IR. If the cache is missing the runner errors with the exact klera compile command needed to generate it.


klera run flows/login.flow.md
klera run flows/login.flow.md --strict      # CI mode
klera run flows/login.flow.md --watch       # iterate on prose; re-runs on save
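
The sibling-cache lookup amounts to swapping the file suffix. A minimal sketch, with hypothetical function names and an illustrative error message (the runner's real wording is not documented here):

```typescript
// Maps flows/login.flow.md to flows/login.flow.json.
function cachePathFor(flowPath: string): string {
  const suffix = ".flow.md";
  if (!flowPath.endsWith(suffix)) throw new Error(`Not a prose flow: ${flowPath}`);
  return flowPath.slice(0, -suffix.length) + ".flow.json";
}

// Illustrative message only; it names the compile command for the missing cache.
function missingCacheMessage(flowPath: string): string {
  return `No cache at ${cachePathFor(flowPath)}. Run: klera compile ${flowPath}`;
}
```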

Watch mode hooks Metro’s file watcher (or the @klera/metro-plugin) so saving the .flow.md triggers a debounced klera compile followed by a re-run against the same attached bridge. Iteration latency drops by roughly an order of magnitude compared with a cold run on every edit. See watch mode for details.

Optional steps and conditionals

Prose conditionals like “if X appears, do Y” compile to an optional IR step. The matcher evaluates the predicate against the runtime element graph and only runs the inner step when it matches:


{
  "optional": {
    "when": { "visible": { "testID": "whats-new-modal" } },
    "do": { "tap": { "testID": "whats-new-dismiss" } }
  }
}

Optional steps are flat — they cannot be nested inside another optional. Express compound conditionals as multiple sequential optionals.
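
As a sketch of the flatness rule, the check below rejects an optional whose inner step is itself an optional, and the sample data shows a compound conditional written as two sequential optionals. The `OptionalStep` type copies the shape of the JSON example above; `assertFlat` and the promo-banner testIDs are hypothetical.

```typescript
// Shape copied from the JSON example above; the real IR allows other
// predicates and inner steps.
type OptionalStep = {
  optional: {
    when: { visible: { testID: string } };
    do: Record<string, unknown>;
  };
};

// Rejects an optional whose inner step is itself an optional.
function assertFlat(step: OptionalStep): void {
  if ("optional" in step.optional.do) {
    throw new Error("Optional steps cannot be nested; use sequential optionals");
  }
}

// "If the What's New modal appears, dismiss it; if a promo banner appears,
// close it" as two sequential optionals. The promo testIDs are hypothetical.
const sequentialOptionals: OptionalStep[] = [
  {
    optional: {
      when: { visible: { testID: "whats-new-modal" } },
      do: { tap: { testID: "whats-new-dismiss" } },
    },
  },
  {
    optional: {
      when: { visible: { testID: "promo-banner" } },
      do: { tap: { testID: "promo-close" } },
    },
  },
];
sequentialOptionals.forEach(assertFlat);
```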

Next steps

YAML flows — the power-user escape hatch and what prose compiles into.
IR reference — every step kind the planner can emit.
Fixtures and secrets — committed test data and credential handling.
Planner transports — wiring up local CLIs, manual paste, MCP.