Auto-triage
Every failed run answers the first question: is this our bug?
When a flow fails, klera classifies the failure into one of four verdicts and ships PM-readable and engineer-readable prose alongside it. The classifier is deterministic; the prose is LLM-narrated; both run on every failed run with no flags to set.
The runtime tapped “Place order”, but the next screen never mounted. The element graph shows the button transitioning to disabled — no navigation event followed.
- Tap “Place order” and confirm the order receipt appears.
+ Pick a saved card, tap “Place order”, and confirm the order receipt appears.
The four verdicts
| Verdict | Meaning | What it tells you |
|---|---|---|
| regression | The matcher could not resolve the target and the planner could not find a working path either | Likely a real bug; klera surfaces a suspect commit window for the failed file(s) |
| drift | The matcher could not resolve the cached target, but the planner produced a different working step | The screen shifted enough to outrun the matcher; klera proposes the test update |
| flake | A waitForIdle gate in the same flow timed out and the planner agrees the cached IR is correct | Environment noise, not a product bug; rerun before opening a ticket |
| data | The step’s error string matches a known value-mismatch pattern (e.g. hasText mismatch) | The product is doing what was asked; the test fixture / seeded data has shifted |
The classifier picks one verdict per failed flow. Earlier rules
short-circuit later ones — data wins over a matcher-based call,
regression wins over a flake heuristic. The full rule order lives in
pickVerdict in packages/engine/src/triage.ts.
How the classifier decides
The classifier is pure. No LLM call. It reads three signals off the failed step:
- matcherTrace: every ladder rung the matcher probed, how many candidates each rung saw, and whether the ladder resolved (match, drift, or fail). Built by the matcher; lives on every step result. See self-healing matcher for the trace shape.
- Planner replan record: the engine optionally re-runs the planner against the failure-state element graph and the original prose. If the planner produces a different step at the failed index, that is the drift signal; if it produces an equivalent step or errors out, that is the regression signal. The replan never overwrites the cached IR; it is read-only input to the classifier.
- Error strings: the executor’s per-step handlers emit specific error shapes (hastext mismatch, expected notVisible). Those land the data verdict directly.
A few derived signals show up too: a waitForIdle step earlier in the
flow that timed out is the flake heuristic; a network-mock divergence
between the cached IR and the runtime call log is part of how the
planner replan decides whether to propose a different step. When the
classifier degrades — no replan available, no source links emitted in
production builds — it picks the conservative call. A matcher-fail
without a replan answer is presumed regression.
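The rule order described above can be sketched as a pure function. This is a minimal illustration, not the real pickVerdict: the FailedStepSignals shape, the pattern list, and the exact position of the flake rule are assumptions made for the example, and only the "data first, regression before flake, matcher-fail without a replan is regression" ordering comes from the text.

```typescript
type Verdict = "regression" | "drift" | "flake" | "data";

// Hypothetical input shape: the three signals plus the derived idle-timeout flag.
interface FailedStepSignals {
  errorMessage: string;
  matcherOutcome: "match" | "drift" | "fail";
  // Omitted when no replan was run.
  replan?: "different-step" | "equivalent-step" | "error";
  idleGateTimedOut: boolean;
}

// Example value-mismatch patterns taken from the text; the real list is not shown here.
const DATA_PATTERNS: RegExp[] = [/hastext mismatch/i, /expected notVisible/i];

function pickVerdictSketch(s: FailedStepSignals): Verdict {
  // Rule 1: a known value-mismatch error string wins over any matcher-based call.
  if (DATA_PATTERNS.some((p) => p.test(s.errorMessage))) return "data";

  if (s.matcherOutcome === "fail") {
    // Rule 2: the planner found a different working step, so the screen drifted.
    if (s.replan === "different-step") return "drift";
    // Rule 3: equivalent step, planner error, or no replan at all.
    // This is the conservative call and it is checked before the flake heuristic.
    return "regression";
  }

  // Rule 4: an idle gate timed out and the planner agrees with the cached IR.
  if (s.idleGateTimedOut && s.replan === "equivalent-step") return "flake";

  // Assumed default for anything the rules above do not cover.
  return "regression";
}
```

Because the function has no I/O and no LLM call, the same failed step always produces the same verdict, which is what makes the classifier safe to run on every failure.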
Worked example: button text changed
A flow asserts tap "Sign In". Engineering renames the button to
"Log in" without touching the test.
The matcher tries testID (rename, miss), accessibility-label
(miss), exact-text (no element with that text), and falls all the
way down to fuzzy-text — which finds "Log in" at score 0.78. If
self-healing is enabled the run passes with a drift annotation. If
the threshold isn’t met, the run fails. Either way the planner
re-runs against the post-rename element graph and emits
tap "Log in" instead.
Verdict: drift. The triage block ships:
- A PM narrative explaining that the button copy changed.
- An engineer narrative pointing at the matcher trace and the planner’s replan diff.
- A proposedSteps array: the IR the planner thinks the cache should hold next.
The triage card in the HTML report has a one-click “Open PR with this
fix” affordance that turns proposedSteps into a prose update.
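The ladder walk in this example can be illustrated with a small sketch. Everything here is hypothetical: the UiElement and Target shapes, the bigram-based similarity function (a stand-in, so it will not reproduce the 0.78 score quoted above), and the choice to label a fuzzy-text resolution as drift are assumptions, not klera's matcher.

```typescript
interface UiElement { id: string; testID?: string; label?: string; text?: string }
interface Target { testID?: string; label?: string; text?: string }
interface MatcherTrace {
  rungs: { rung: string; candidates: number }[];
  outcome: "match" | "drift" | "fail";
  matchedElementId?: string;
}

// Set of lower-cased character bigrams in a string.
function bigrams(s: string): Set<string> {
  const out = new Set<string>();
  const t = s.toLowerCase();
  for (let i = 0; i < t.length - 1; i++) out.add(t.slice(i, i + 2));
  return out;
}

// Dice coefficient over bigram sets, in [0, 1]. A stand-in scorer only.
function similarity(a: string, b: string): number {
  const x = bigrams(a);
  const y = bigrams(b);
  if (x.size === 0 || y.size === 0) return 0;
  let shared = 0;
  x.forEach((g) => { if (y.has(g)) shared++; });
  return (2 * shared) / (x.size + y.size);
}

function resolveTarget(target: Target, elements: UiElement[], fuzzyThreshold: number): MatcherTrace {
  // Rungs are probed top to bottom; the first rung with any candidate resolves.
  const ladder: [string, (e: UiElement) => boolean][] = [
    ["testID", (e) => target.testID !== undefined && e.testID === target.testID],
    ["accessibility-label", (e) => target.label !== undefined && e.label === target.label],
    ["exact-text", (e) => target.text !== undefined && e.text === target.text],
    ["fuzzy-text", (e) => target.text !== undefined && e.text !== undefined &&
      similarity(e.text, target.text) >= fuzzyThreshold],
  ];
  const rungs: MatcherTrace["rungs"] = [];
  for (const [rung, accepts] of ladder) {
    const hits = elements.filter(accepts);
    rungs.push({ rung, candidates: hits.length });
    if (hits.length > 0) {
      // Assumption: resolving only on the fuzzy rung is what gets annotated as drift.
      const outcome = rung === "fuzzy-text" ? "drift" : "match";
      // Simplification: take the first hit rather than the best-scoring one.
      return { rungs, outcome, matchedElementId: hits[0].id };
    }
  }
  return { rungs, outcome: "fail" };
}
```

With a renamed testID and changed copy, the first three rungs record zero candidates and the outcome depends entirely on whether the fuzzy score clears the threshold, which mirrors the pass-with-annotation versus fail split described above.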
Worked example: API contract changed
A flow does tap "Place order" then asserts visible "Order placed".
A backend deploy starts returning HTTP 500 for the order endpoint;
the in-app handler shows an error toast and disables the button.
The matcher resolves "Place order" cleanly. The tap fires. The
following assert for "Order placed" exhausts the ladder — the text
is genuinely not on screen. The planner re-runs against the
post-failure element graph; the order screen has not navigated, the
toast carries "Network error", and there is no obvious replanned
path. The planner returns error: "no working path".
Verdict: regression. The triage block ships:
- A PM narrative explaining what the user would see.
- An engineer narrative naming the failed assertion.
- A ranked suspectFiles list, derived from the elements involved in the matcher trace (the dev-only __source denormalisation; see failure evidence).
- A suspectCommit pointer: the most recent commit touching any of the suspect files within the last 200 commits.
The HTML report renders the suspect commit author, message, and SHA inline. The PR comment has a clickable link straight to that commit.
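The suspectCommit lookup described above amounts to a bounded scan over recent history. The sketch below works on an in-memory, newest-first commit list rather than invoking git; the Commit shape and the function name are hypothetical, and only the 200-commit window and the "most recent commit touching any suspect file" rule come from the text.

```typescript
interface Commit { sha: string; author: string; subject: string; files: string[] }

// history must be ordered newest-first, the way git log prints it by default.
function findSuspectCommit(
  history: Commit[],
  suspectFiles: string[],
  windowSize = 200,
): Commit | undefined {
  const suspects = new Set(suspectFiles);
  // Only the most recent windowSize commits are considered.
  for (const commit of history.slice(0, windowSize)) {
    if (commit.files.some((f) => suspects.has(f))) return commit;
  }
  // No commit in the window touched a suspect file.
  return undefined;
}
```

Bounding the scan keeps the lookup cheap on long histories, at the cost of returning nothing when the offending change is older than the window.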
Suspect-file ranking
The deterministic step uses the failed step’s sourceLinks. Each
linked node carries (elementId, fileName, lineNumber) from the
dev-only __source denormalisation that React Native’s Babel plugin
emits. The classifier walks the matcher trace’s matchedElementId and
candidateIds, looks each up in the failure-state snapshot, and
collects __source from the element itself plus its three nearest
ancestors. Composite components that wrap interactive primitives are
the typical suspects, so ancestors carry weight.
The LLM narrator may re-rank or trim the list. The deterministic list
is always populated when __source is available; it survives an LLM
outage as the fallback.
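As a rough illustration of the deterministic walk, the sketch below collects source locations from each traced element and up to three ancestors. The SnapshotNode shape, the parentId link, and the discovery-order output are assumptions for the example; the actual weighting klera applies to ancestors is not specified here.

```typescript
interface SourceLoc { fileName: string; lineNumber: number }
// Hypothetical snapshot node: source is absent when __source was not emitted.
interface SnapshotNode { id: string; parentId?: string; source?: SourceLoc }

function collectSuspectFiles(
  snapshot: Map<string, SnapshotNode>,
  elementIds: string[],
  maxAncestors = 3,
): SourceLoc[] {
  const seen = new Set<string>();
  const out: SourceLoc[] = [];
  for (const id of elementIds) {
    let node = snapshot.get(id);
    // depth 0 is the element itself; depths 1..maxAncestors are its ancestors.
    for (let depth = 0; node !== undefined && depth <= maxAncestors; depth++) {
      if (node.source) {
        const key = `${node.source.fileName}:${node.source.lineNumber}`;
        // De-duplicate so a shared wrapper component is listed once.
        if (!seen.has(key)) {
          seen.add(key);
          out.push(node.source);
        }
      }
      node = node.parentId !== undefined ? snapshot.get(node.parentId) : undefined;
    }
  }
  return out;
}
```

When no node carries a source location, as in a production build, the function returns an empty list, which matches the degraded case described earlier.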
The LLM narrator
After the classifier picks the verdict, klera invokes the planner LLM (Claude Sonnet 4.6 by default) with a structured prompt:
- The verdict.
- The flow name + failed step index.
- The cached IR step that ran.
- The replanned step the planner produced (drift case).
- The matcher trace, error message and details, source links, frames.
The narrator writes a tweet-length PM narrative (≤280 chars) and a
paragraph-length engineer narrative (≤800 chars), plus its own ranked
suspectFiles (≤5 entries). The schema is enforced by Zod — malformed
narrator output falls back to the deterministic suspect-file list and
short stub prose.
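The length and count limits above, together with the fallback behaviour, can be sketched without any dependency. This hand-written guard only stands in for the Zod schema, which is not shown on this page; the Narration shape, the function names, and the stub strings are hypothetical.

```typescript
interface SuspectFile { fileName: string; lineNumber: number; reason?: string }
interface Narration {
  pmNarrative: string;
  engineerNarrative: string;
  suspectFiles: SuspectFile[];
}

// Type guard enforcing the limits quoted in the text: 280 / 800 chars, 5 files.
function isValidNarration(raw: unknown): raw is Narration {
  if (typeof raw !== "object" || raw === null) return false;
  const r = raw as Record<string, unknown>;
  const pm = r["pmNarrative"];
  const eng = r["engineerNarrative"];
  const files = r["suspectFiles"];
  if (typeof pm !== "string" || pm.length > 280) return false;
  if (typeof eng !== "string" || eng.length > 800) return false;
  if (!Array.isArray(files) || files.length > 5) return false;
  return files.every(
    (f) =>
      typeof f === "object" && f !== null &&
      typeof (f as SuspectFile).fileName === "string" &&
      typeof (f as SuspectFile).lineNumber === "number",
  );
}

// Malformed narrator output falls back to stub prose plus the deterministic list.
function narrateOrFallback(raw: unknown, deterministicSuspects: SuspectFile[]): Narration {
  if (isValidNarration(raw)) return raw;
  return {
    pmNarrative: "Auto-triage narrative unavailable.", // placeholder stub text
    engineerNarrative: "Narrator output failed validation.", // placeholder stub text
    suspectFiles: deterministicSuspects,
  };
}
```

Validating at the boundary means the report renderer never has to handle an over-long or partially shaped narration.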
Escape hatches
The classifier and narrator are both opt-out:
- klera run --no-triage: skip the triage block entirely. The report still ships, just without the verdict / narrative / suspect files.
- KLERA_NO_TRIAGE=1: environment-variable form of the same flag. Useful when you want triage on locally but off in a one-off CI rerun job.
Graceful degradation without an API key
The classifier never needs an LLM. The narrator does. When
ANTHROPIC_API_KEY is unset (or the local-CLI planner transport is
unreachable), klera ships:
- The deterministic verdict.
- A short stub PM narrative ("Auto-triage narrative pending — wire ANTHROPIC_API_KEY for prose.").
- A short stub engineer narrative.
- The deterministic suspectFiles list.
- The suspectCommit pointer when the verdict is regression and git log succeeds.
The HTML report renders all of the above without a placeholder. Add
the API key (or wire claude / codex / gemini on PATH) and the
next failure ships full narratives.
Where the triage block lives
The triage block is part of the JSON report’s top-level shape:
{
"schemaVersion": 4,
"flow": { ... },
"steps": [ ... ],
"triage": {
"verdict": "regression",
"failedStepIndex": 4,
"pmNarrative": "Tapping Place order didn't navigate forward...",
"engineerNarrative": "Step 4 (assert visible 'Order placed')...",
"suspectFiles": [
{ "fileName": "packages/checkout/src/PlaceOrderButton.tsx", "lineNumber": 42, "reason": "..." }
],
"suspectCommit": {
"sha": "a1c4f29",
"author": "miyu",
"subject": "checkout: gate submit on payment-method validity"
}
}
}
klera report --html renders the triage block as the card you saw at
the top of this page. klera report --junit folds the verdict into the
JUnit <system-out> so it shows up in PR test panes.
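For tooling that consumes the JSON report directly, the triage block can be typed and summarised along these lines. The interface mirrors only the fields shown in the sample above; marking suspectFiles, suspectCommit, and the triage key itself as optional is an assumption based on the degradation and --no-triage behaviour described earlier, and summariseTriage is a hypothetical helper.

```typescript
interface TriageBlock {
  verdict: "regression" | "drift" | "flake" | "data";
  failedStepIndex: number;
  pmNarrative: string;
  engineerNarrative: string;
  suspectFiles?: { fileName: string; lineNumber: number; reason: string }[];
  suspectCommit?: { sha: string; author: string; subject: string };
}

// Produces a one-line summary suitable for a CI log line.
function summariseTriage(reportJson: string): string {
  const report = JSON.parse(reportJson) as { triage?: TriageBlock };
  const t = report.triage;
  // The block is absent when triage was skipped.
  if (!t) return "no triage block";
  const commit = t.suspectCommit
    ? ` (suspect commit ${t.suspectCommit.sha} by ${t.suspectCommit.author})`
    : "";
  return `${t.verdict} at step ${t.failedStepIndex}${commit}`;
}
```

The cast after JSON.parse performs no runtime validation, so a consumer that cannot trust the report source would want to check the fields before using them.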