
Prose flows

.flow.md is the primary authoring surface. Describe the scenario the way you’d type it into a ticket. The planner compiles the prose to a deterministic IR and writes the result alongside as .flow.json; that cache is committed with the prose, and the runtime executes it like any hand-authored YAML flow.

    # Sign in and see today's notifications

    Sign in with the seeded test user, dismiss the onboarding modal, and assert that the home screen shows today's notifications. Take a visual snapshot called "home-after-login".

That is the whole flow file. No imports, no selectors, no setup boilerplate.

File structure

A .flow.md file has three parts; only the title is required.

    ---
    hints:
      effort: high
      preferRoles: ['button', 'link']
    fixtures:
      email: pm@example.com
    ---

    # Login smoke

    Type the email into the email field and tap Sign in. Wait for the welcome greeting.

Front-matter (optional)

YAML between --- fences. Two top-level keys are recognized, fixtures and hints:

  • fixtures — free-form key/value pairs surfaced to the planner. Useful for inline test data that does not deserve a separate fixtures/ file.
  • hints.model / hints.effort / hints.preferRoles — planner overrides. effort: high lets the planner think longer; preferRoles: ['button'] biases targets toward role-tagged elements.
  • Unknown keys are rejected at parse time.
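As a rough illustration of the "unknown keys are rejected" rule, the sketch below checks an already-parsed front-matter object. The function name, the error wording, and the choice to also reject unknown keys under hints are assumptions for illustration, not Klera's actual parser.

```typescript
// Hypothetical sketch: names and error text are illustrative only.
const TOP_LEVEL_KEYS: readonly string[] = ["fixtures", "hints"];
const HINT_KEYS: readonly string[] = ["model", "effort", "preferRoles"];

// Returns a list of problems; an empty list means the front-matter is accepted.
function validateFrontMatter(frontMatter: Record<string, unknown>): string[] {
  const errors: string[] = [];

  for (const key of Object.keys(frontMatter)) {
    if (TOP_LEVEL_KEYS.indexOf(key) === -1) {
      errors.push(`unknown front-matter key: ${key}`);
    }
  }

  const hints = frontMatter["hints"];
  if (hints !== undefined) {
    if (typeof hints !== "object" || hints === null || Array.isArray(hints)) {
      errors.push("hints must be a mapping");
    } else {
      // Assumption: unknown keys under hints are rejected like top-level ones.
      for (const key of Object.keys(hints)) {
        if (HINT_KEYS.indexOf(key) === -1) {
          errors.push(`unknown hints key: hints.${key}`);
        }
      }
    }
  }

  return errors;
}
```

The fixtures value is deliberately left unchecked, since the page describes it as free-form.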

Title

The first # Heading line. It becomes Flow.name in the IR.

Body

Everything after the title is handed to the LLM planner verbatim. Write prose; the planner figures out the steps.
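The three parts described above can be separated with a few lines of string handling. This is a minimal sketch under stated assumptions: splitFlowSource and FlowSource are hypothetical names, the front-matter is returned as raw YAML text rather than parsed, and trimming blank lines around the body is a convenience of the sketch.

```typescript
interface FlowSource {
  frontMatter: string | null; // raw YAML text, left unparsed in this sketch
  title: string;
  body: string;
}

function splitFlowSource(source: string): FlowSource {
  const lines = source.split(/\r?\n/);
  let index = 0;
  let frontMatter: string | null = null;

  // Optional front-matter: only recognized when the file opens with a --- fence.
  if (lines[0].trim() === "---") {
    let close = -1;
    for (let i = 1; i < lines.length; i++) {
      if (lines[i].trim() === "---") {
        close = i;
        break;
      }
    }
    if (close === -1) throw new Error("front-matter fence is not closed");
    frontMatter = lines.slice(1, close).join("\n");
    index = close + 1;
  }

  // The title is the first "# Heading" line after any front-matter.
  while (index < lines.length && !/^# \S/.test(lines[index])) index++;
  if (index >= lines.length) throw new Error("flow file has no # title");

  const title = lines[index].slice(2).trim();
  // Everything after the title is the prose body handed to the planner.
  const body = lines.slice(index + 1).join("\n").trim();
  return { frontMatter, title, body };
}
```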

How the planner compiles prose to IR

The planner takes three inputs: prose body, an element-graph snapshot of the screen the flow starts on, and the current planner version. It emits a SemanticPlan — the IR plus a _meta block recording how the plan was produced.

    prose body ─┐
    snapshot   ─┼──▶ planner ──▶ SemanticPlan { steps, _meta }
    version    ─┘                      │
                               flows/login.flow.json (committed)

Each prose sentence maps to one or more IR steps. “Tap Sign In” becomes a tap step; “if a What’s New modal appears, dismiss it” becomes an optional step (see IR reference). The planner only emits IR variants the runtime understands — it cannot invent step kinds.

Capturing the snapshot is a one-liner the CLI handles for you:

    klera plan flows/login.flow.md --snapshot snap.json

If snap.json does not exist, the CLI starts the runtime, captures the current screen, and writes the snapshot. Subsequent klera plan calls reuse it unless the screen has materially changed.

The .flow.json cache

Treat the cache like a lockfile. It is generated, committed, and reviewed alongside the prose change in the same PR — adopters never hand-edit it.

    • login.flow.md
    • login.flow.json

_meta carries fingerprints of every input the planner saw:

    {
      "_meta": {
        "model": "anthropic:claude-sonnet-4-7",
        "promptHash": "sha256:a4f9…",
        "snapshotHash": "sha256:1e2c…",
        "plannerVersion": "0.1.0",
        "combined": "sha256:7b09…",
        "fixturesUsed": ["users.regular"]
      },
      "steps": [
        { "tap": { "testID": "login-email" } },
        /* … */
      ]
    }

The combined hash is hash(prose + element_graph + planner_version). When any of the three changes, the cache is stale.
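A fingerprint of this kind might be computed as follows. This is a sketch, not Klera's implementation: the SHA-256 choice is inferred from the sha256: prefix in the _meta example, while the combinedHash name, the length-prefixed framing, and the assumption that the element graph is already serialized to a stable JSON string are all illustrative.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: the exact serialization Klera hashes is not documented here.
function combinedHash(
  prose: string,
  elementGraphJson: string,
  plannerVersion: string,
): string {
  const hash = createHash("sha256");
  for (const part of [prose, elementGraphJson, plannerVersion]) {
    // Length-prefix each input so ("ab", "c") and ("a", "bc") hash differently.
    hash.update(`${Buffer.byteLength(part, "utf8")}:`);
    hash.update(part, "utf8");
  }
  return `sha256:${hash.digest("hex")}`;
}
```

Because every input feeds the same digest, changing the prose, the snapshot, or the planner version yields a different value, which is what makes the comparison against _meta.combined a staleness check.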

Staleness detection

The CLI knows three states:

| State | Meaning |
| --- | --- |
| fresh | combined matches a recompute against the current prose + snapshot + planner version. |
| stale | At least one input changed. CI gates on this. |
| missing-combined | A pre-ADR-0054 cache. Treated as stale; regenerate once and the new field lands. |

    klera compile flows/login.flow.md --check         # exits 1 if stale, 0 if fresh
    klera compile flows/login.flow.md --force         # always regenerate
    klera compile flows/login.flow.md --diff out.md   # write a Markdown step-list diff
    klera compile --all --check                       # gate every flow at once (CI)
    klera compile --all                               # batch regenerate every flow

klera compile is the canonical way to bring a stale cache up to date. klera plan covers the same ground but is biased toward first-time generation; once you have a .flow.json committed, prefer compile.

CI typically runs klera compile --all --check as a PR gate. The optional klera ci scaffold flag compileMode: 'auto' instead regenerates stale caches and posts a Markdown step-list diff as a PR comment for review.
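The three states and the documented --check exit codes reduce to a small decision function. The sketch below assumes a recomputed combined hash is already available; classifyCache, checkExitCode, and the CacheMeta shape are hypothetical names used only for illustration.

```typescript
type CacheState = "fresh" | "stale" | "missing-combined";

interface CacheMeta {
  combined?: string; // absent on caches written before the field existed
}

function classifyCache(meta: CacheMeta, recomputedCombined: string): CacheState {
  if (meta.combined === undefined) return "missing-combined";
  return meta.combined === recomputedCombined ? "fresh" : "stale";
}

// Mirrors the documented --check behaviour: 0 if fresh, 1 otherwise.
// missing-combined is treated as stale, so it also exits 1.
function checkExitCode(state: CacheState): number {
  return state === "fresh" ? 0 : 1;
}
```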

Run-time drift recovery vs recompile

The cache fingerprints what the planner saw at compile time; the runtime sees what the screen actually looks like at run time. They can disagree.

| Symptom | Resolution |
| --- | --- |
| Selector drift (testID renamed, button moved a pixel) | Matcher self-heals via the strategy ladder. No replan, no recompile. |
| One-off optional surface (What’s New modal, A/B test) | Runtime replans the remaining steps in-memory. No cache rewrite. |
| Whole-screen redesign (the planner’s snapshot is wrong) | Recompile: delete snap.json, rerun klera compile --force. |
| Prose intent itself changed | Edit the prose, run klera compile, commit both files together. |

Runtime replanning is on by default and bounded — three rungs of recovery, and every replan attempt is recorded in the report’s matcher trace. Replans never rewrite the on-disk .flow.json; PR diffs stay deterministic.

--strict mode

    klera run flows/login.flow.md --strict

--strict disables runtime replanning entirely. Intent drift surfaces as a hard failure with the same matcher diagnostics a YAML flow produces. This is the expected mode for CI — the cache committed on the PR is what runs, full stop.

Runtime replanning is a debugging affordance for local iteration. CI should always run --strict. If --strict fails and the local non-strict run passes, you have a stale cache; run klera compile and commit the result.
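The difference between the default and --strict behaviour can be pictured as a bounded loop over an in-memory copy of the cached steps. Everything in this sketch is hypothetical: the Step and RunOptions shapes, the execute and replan callbacks, and the maxReplans bound are stand-ins, and the sketch does not model the matcher's strategy ladder or the actual recovery rungs.

```typescript
type Step = { name: string };
type StepResult = "ok" | "drift";

interface RunOptions {
  strict: boolean;
  maxReplans: number; // assumed bound; the real limit is not documented here
}

interface RunReport {
  passed: boolean;
  replans: number;
}

function runFlow(
  cachedSteps: readonly Step[],
  execute: (step: Step) => StepResult,
  replan: (remaining: readonly Step[]) => Step[],
  options: RunOptions,
): RunReport {
  // Work on an in-memory copy so the committed cache is never mutated.
  let queue = cachedSteps.slice();
  let replans = 0;

  while (queue.length > 0) {
    if (execute(queue[0]) === "ok") {
      queue = queue.slice(1);
      continue;
    }
    // Drift: strict mode fails immediately; otherwise replan within the bound.
    if (options.strict || replans >= options.maxReplans) {
      return { passed: false, replans };
    }
    queue = replan(queue);
    replans++;
  }
  return { passed: true, replans };
}
```

With strict set, the first drift ends the run, which matches the advice above: a pass without --strict and a failure with it points at a stale cache rather than a broken app.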

How the planner uses the snapshot

The element-graph snapshot is a JSON tree of every accessible node the runtime saw on the starting screen — testID, accessibilityLabel, role, text, frame, parent. The planner projects only IR-relevant fields (no internal id, no fiber bookkeeping) into the prompt so the LLM cannot cite handles that won’t exist at run time.

The snapshot has two jobs:

  1. Disambiguate references. “Tap the Sign In button” is one node; “tap the second Sign In link” is a different one. The snapshot tells the planner which is which.
  2. Anti-hallucination grounding. The planner is instructed to prefer testIDs and labels that appear in the snapshot. Targets that don’t appear get rejected by a semantic-check pass before the cache is written; the retry loop carries the rejection back to the LLM.

The snapshot itself participates in the cache key — change the screen substantially and the cache goes stale, even if the prose did not change.
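The grounding check described above amounts to testing each planned target against the snapshot before anything is written. This is a minimal sketch: the flat node list, the Target shape, and the function names are assumptions, and the real semantic-check pass may compare more fields than testID and accessibilityLabel.

```typescript
// Assumed projection of a snapshot node; only fields named on this page are used.
interface SnapshotNode {
  testID?: string;
  accessibilityLabel?: string;
  role?: string;
  text?: string;
}

interface Target {
  testID?: string;
  accessibilityLabel?: string;
}

// A target is grounded when some snapshot node carries the same testID or label.
function isGrounded(target: Target, snapshot: readonly SnapshotNode[]): boolean {
  return snapshot.some(
    (node) =>
      (target.testID !== undefined && node.testID === target.testID) ||
      (target.accessibilityLabel !== undefined &&
        node.accessibilityLabel === target.accessibilityLabel),
  );
}

// Targets returned here would be reported back to the planner for a retry.
function findUngroundedTargets(
  targets: readonly Target[],
  snapshot: readonly SnapshotNode[],
): Target[] {
  return targets.filter((target) => !isGrounded(target, snapshot));
}
```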

Planner transports

Four transports produce a bit-identical SemanticPlan cache. Adopters pick based on what auth they already have.

    klera plan flows/login.flow.md --snapshot snap.json

Default transport. Calls the Anthropic API directly. Needs ANTHROPIC_API_KEY in the environment. Deterministic, headless, ideal for CI.

The _meta.model field on the cached IR records which transport produced the plan: "anthropic:<model-id>", "manual", "manual:claude-3-5-sonnet" (when you pass a custom modelTag), or "mcp:host" / "mcp:server". Triage and the HTML report viewer surface this in the report header.
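Tooling that reads the cache could split the _meta.model value into its transport prefix and optional detail. The sketch below covers only the value shapes listed above; parseModelTag and ModelTag are hypothetical names, not part of Klera's API.

```typescript
type Transport = "anthropic" | "manual" | "mcp";

interface ModelTag {
  transport: Transport;
  detail: string | null; // model id, custom modelTag, or "host" / "server"
}

const TRANSPORTS: readonly Transport[] = ["anthropic", "manual", "mcp"];

function parseModelTag(model: string): ModelTag {
  const colon = model.indexOf(":");
  const prefix = colon === -1 ? model : model.slice(0, colon);
  const detail = colon === -1 ? null : model.slice(colon + 1);

  for (const transport of TRANSPORTS) {
    if (transport === prefix) return { transport, detail };
  }
  throw new Error(`unrecognized _meta.model value: ${model}`);
}
```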

See planner transports for the full transport reference.

What klera run does with the cache

klera run flows/login.flow.md never calls the LLM. It loads the sibling .flow.json, validates it via Zod, and executes the IR. If the cache is missing the runner errors with the exact klera compile command needed to generate it.

    klera run flows/login.flow.md
    klera run flows/login.flow.md --strict   # CI mode
    klera run flows/login.flow.md --watch    # iterate on prose; re-runs on save
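The sibling lookup that klera run performs can be sketched as a path rewrite plus a hint for the missing-cache case. The helper names and the message wording below are hypothetical; only the .flow.md to .flow.json pairing and the klera compile command come from this page.

```typescript
// Derive the sibling cache path: flows/login.flow.md -> flows/login.flow.json.
function cachePathFor(flowPath: string): string {
  const suffix = ".flow.md";
  if (flowPath.slice(-suffix.length) !== suffix) {
    throw new Error(`not a prose flow: ${flowPath}`);
  }
  return flowPath.slice(0, -suffix.length) + ".flow.json";
}

// Illustrative wording only; the runner's real error text is not shown on this page.
function missingCacheMessage(flowPath: string): string {
  return `No cache found at ${cachePathFor(flowPath)}. Run: klera compile ${flowPath}`;
}
```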

Watch mode hooks Metro’s file watcher (or the @klera/metro-plugin) so saving the .flow.md triggers a debounced klera compile followed by a re-run against the same attached bridge. Iteration latency drops by an order of magnitude vs cold-run-per-edit. See watch mode for details.

Optional steps and conditionals

Prose conditionals like “if X appears, do Y” compile to an optional IR step. The matcher evaluates the predicate against the runtime element graph and only runs the inner step when it matches:

    {
      "optional": {
        "when": { "visible": { "testID": "whats-new-modal" } },
        "do": { "tap": { "testID": "whats-new-dismiss" } }
      }
    }

Optional steps are flat — they cannot be nested inside another optional. Express compound conditionals as multiple sequential optionals.
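The flatness rule can be expressed in types by making the inner step an action rather than a general step. This sketch models only the step kinds shown on this page (tap and optional with a visible predicate); the type names are hypothetical, and the cookie-banner testIDs are invented for the example.

```typescript
type Target = { testID: string };

// Only the action kind shown on this page; the real IR defines more.
type ActionStep = { tap: Target };

type OptionalStep = {
  optional: {
    when: { visible: Target };
    do: ActionStep; // an action, not a Step, so optionals cannot nest
  };
};

type Step = ActionStep | OptionalStep;

function isOptional(step: Step): step is OptionalStep {
  return "optional" in step;
}

// A compound conditional written as two sequential optionals.
const steps: Step[] = [
  {
    optional: {
      when: { visible: { testID: "whats-new-modal" } },
      do: { tap: { testID: "whats-new-dismiss" } },
    },
  },
  {
    optional: {
      when: { visible: { testID: "cookie-banner" } }, // hypothetical testID
      do: { tap: { testID: "cookie-accept" } }, // hypothetical testID
    },
  },
  { tap: { testID: "login-email" } },
];
```

Writing an optional inside another optional's do field would fail type-checking here, which is the compile-time analogue of the rule above.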
