
Observability

The per-run JSON report answers “why did this run fail?”. OpenTelemetry answers “what happened across runs over weeks?” — flake-rate trends, slow-step regressions, matcher-strategy drift, triage classification distribution. They answer different questions; the JSON report stays canonical per run, OTel adds aggregation.

One env var

export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_KEY
export OTEL_SERVICE_NAME=klera
klera run flows/welcome.flow.md

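That is the entire opt-in surface: in this example, the endpoint variable is what switches telemetry on, while the headers variable carries backend auth and OTEL_SERVICE_NAME names the service. Without an OTel env var, the emitter stays unconfigured, and adopters who do not want OTel pay nothing for it, since the SDK packages are lazy-required.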

You can also configure the endpoint in .klera/config.yaml:

telemetry:
  otlp:
    endpoint: http://localhost:4318
    serviceName: klera

Spans

Every klera run emits a tree of spans:

klera.flow  flow.name=welcome  (root, ~12s)
├─ klera.step  step.kind=tap[0]  (~340ms)
├─ klera.step  step.kind=type[1]
└─ klera.step  step.kind=assert[2]

Stable attributes (subset):

| Attribute | Span | Type | Meaning |
| --- | --- | --- | --- |
| klera.flow.name | klera.flow | string | Flow’s name field |
| klera.flow.steps | klera.flow | number | Step count |
| klera.flow.status | klera.flow | string | passed / failed |
| klera.flow.duration_ms | klera.flow | number | Wall-clock duration (ms) |
| klera.flow.drift_count | klera.flow | number | Matcher drifts during the run |
| klera.flow.replan_count | klera.flow | number | Planner replans |
| klera.flow.default_timeout_ms | klera.flow | number | Configured default timeout (ms) |
| klera.step.index | klera.step | number | Zero-based step index |
| klera.step.kind | klera.step | string | tap / type / assert / … |
| klera.step.status | klera.step | string | passed / failed |
| klera.step.duration_ms | klera.step | number | Wall-clock duration (ms) |

Span error status is set when a step fails; the message is the matcher / dispatch failure string with secrets redacted.
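To make the span tree and its error status concrete, here is a minimal sketch of how a runner could emit one `klera.flow` root with a `klera.step` child per executed step. `InMemoryTracer`, `RecordedSpan`, `Step`, and `runSteps` are hypothetical names, not the engine's `Tracer` interface; only the span names and attribute keys come from the table above, and stopping at the first failed step is an assumption.

```typescript
// Hypothetical stand-ins: these are NOT the engine's real Tracer/Span interfaces.
type AttrValue = string | number;

interface RecordedSpan {
  id: number;
  name: string;
  parentId?: number;
  attrs: Record<string, AttrValue>;
  error?: string;
}

// Keeps finished spans in memory so the resulting tree can be inspected.
class InMemoryTracer {
  readonly finished: RecordedSpan[] = [];
  private nextId = 0;

  startSpan(name: string, parentId?: number) {
    const rec: RecordedSpan = { id: this.nextId++, name, parentId, attrs: {} };
    return {
      id: rec.id,
      setAttribute: (key: string, value: AttrValue) => { rec.attrs[key] = value; },
      setError: (message: string) => { rec.error = message; },
      end: () => { this.finished.push(rec); },
    };
  }
}

// Hypothetical step shape: a kind plus an action that throws on failure.
interface Step {
  kind: string;
  run: () => void;
}

// Emits one klera.flow root span with one klera.step child per executed step.
function runSteps(tracer: InMemoryTracer, flowName: string, steps: Step[]): "passed" | "failed" {
  const flowStart = Date.now();
  const flow = tracer.startSpan("klera.flow");
  flow.setAttribute("klera.flow.name", flowName);
  flow.setAttribute("klera.flow.steps", steps.length);
  let status: "passed" | "failed" = "passed";

  for (let index = 0; index < steps.length; index++) {
    const step = steps[index];
    const stepStart = Date.now();
    const span = tracer.startSpan("klera.step", flow.id);
    span.setAttribute("klera.step.index", index);
    span.setAttribute("klera.step.kind", step.kind);
    try {
      step.run();
      span.setAttribute("klera.step.status", "passed");
    } catch (err) {
      status = "failed";
      span.setAttribute("klera.step.status", "failed");
      // Error status lives on the failing step span (the real engine redacts secrets first).
      span.setError(err instanceof Error ? err.message : String(err));
    } finally {
      span.setAttribute("klera.step.duration_ms", Date.now() - stepStart);
      span.end();
    }
    if (status === "failed") break; // assumption: the first failing step ends the run
  }

  flow.setAttribute("klera.flow.status", status);
  flow.setAttribute("klera.flow.duration_ms", Date.now() - flowStart);
  flow.end();
  return status;
}

const tracer = new InMemoryTracer();
const result = runSteps(tracer, "welcome", [
  { kind: "tap", run: () => {} },
  { kind: "assert", run: () => { throw new Error("no element matched"); } },
  { kind: "type", run: () => {} },
]);
```

The third step never starts, so the run yields three spans: the two executed steps, then the root, which ends last.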

Metrics

| Metric | Kind | Attributes |
| --- | --- | --- |
| klera.flows.total | counter | klera.flow.name, klera.flow.status |
| klera.steps.total | counter | klera.flow.name, klera.step.kind, klera.step.status |
| klera.step.duration | histogram | klera.flow.name, klera.step.kind |
| klera.matcher.strategy.attempts | counter | klera.flow.name, klera.step.kind, klera.matcher.strategy, …outcome |
| klera.visual_snapshot.diff_ratio | histogram | klera.flow.name, klera.visual_snapshot.id, klera.visual_snapshot.created |
| klera.triage.classifications | counter | klera.flow.name, klera.triage.verdict |
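The flow and step counters and the step-duration histogram can be modelled with a small in-memory meter. This is a sketch: `InMemoryMeter`, `seriesKey`, `recordRun`, and `StepResult` are hypothetical names, and deriving the flow status from the step statuses is an assumption. It shows the property the table relies on: each distinct attribute combination is its own series, which is what lets a backend group or filter by `klera.flow.name` or `klera.step.kind` later.

```typescript
// Hypothetical in-memory meter; the real engine's Meter interface may differ.
type Attrs = Record<string, string>;

// Canonical series key: instrument name plus attributes sorted by name.
function seriesKey(name: string, attrs: Attrs): string {
  const sorted = Object.keys(attrs).sort().map((k) => [k, attrs[k]]);
  return `${name}${JSON.stringify(sorted)}`;
}

class InMemoryMeter {
  readonly counters = new Map<string, number>();
  readonly histograms = new Map<string, number[]>();

  add(name: string, value: number, attrs: Attrs): void {
    const key = seriesKey(name, attrs);
    this.counters.set(key, (this.counters.get(key) ?? 0) + value);
  }

  record(name: string, value: number, attrs: Attrs): void {
    const key = seriesKey(name, attrs);
    const values = this.histograms.get(key) ?? [];
    values.push(value);
    this.histograms.set(key, values);
  }
}

interface StepResult {
  kind: string;
  status: "passed" | "failed";
  durationMs: number;
}

// Records klera.flows.total, klera.steps.total and klera.step.duration for one run.
function recordRun(meter: InMemoryMeter, flowName: string, steps: StepResult[]): void {
  const flowStatus = steps.some((s) => s.status === "failed") ? "failed" : "passed";
  meter.add("klera.flows.total", 1, { "klera.flow.name": flowName, "klera.flow.status": flowStatus });
  for (const s of steps) {
    meter.add("klera.steps.total", 1, {
      "klera.flow.name": flowName,
      "klera.step.kind": s.kind,
      "klera.step.status": s.status,
    });
    meter.record("klera.step.duration", s.durationMs, { "klera.flow.name": flowName, "klera.step.kind": s.kind });
  }
}

const meter = new InMemoryMeter();
recordRun(meter, "welcome", [
  { kind: "tap", status: "passed", durationMs: 340 },
  { kind: "assert", status: "failed", durationMs: 90 },
]);
recordRun(meter, "welcome", [
  { kind: "tap", status: "passed", durationMs: 310 },
  { kind: "assert", status: "passed", durationMs: 80 },
]);
```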

Operator questions map one-for-one to instruments:

  • Which flow has the highest flake rate? → group klera.flows.total by klera.flow.name and divide the count filtered to klera.flow.status="failed" by the unfiltered total.
  • Which step is the slowest? → percentile of klera.step.duration faceted by klera.step.kind.
  • Did drift events drop after we landed the fix? → klera.triage.classifications filtered to verdict="drift".
  • What fraction of taps still resolve via testID vs fall through to fuzzy text? → ratio of klera.matcher.strategy.attempts{strategy="testID"} to …strategy="fuzzy-text".
  • Is the onboarding screen drifting? → percentile of klera.visual_snapshot.diff_ratio filtered to id="onboarding-1".
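Turning the flow counter into a rate means dividing failed runs by all runs for each flow, so a flow that runs ten times as often does not look flakier just by volume. A sketch of that arithmetic, assuming each exported `klera.flows.total` point carries a cumulative count for one (`klera.flow.name`, `klera.flow.status`) series; `FlowCountPoint` and `rankByFailureRatio` are hypothetical names, and in practice the backend's query language does this for you.

```typescript
// Assumed shape of one exported klera.flows.total data point (cumulative count per series).
interface FlowCountPoint {
  flowName: string;
  status: "passed" | "failed";
  value: number;
}

// Returns [flowName, failedRuns / allRuns] pairs, highest ratio first.
function rankByFailureRatio(points: FlowCountPoint[]): [string, number][] {
  const totals = new Map<string, { failed: number; all: number }>();
  for (const p of points) {
    const t = totals.get(p.flowName) ?? { failed: 0, all: 0 };
    t.all += p.value;
    if (p.status === "failed") t.failed += p.value;
    totals.set(p.flowName, t);
  }
  const ranked: [string, number][] = [];
  totals.forEach((t, name) => ranked.push([name, t.all === 0 ? 0 : t.failed / t.all]));
  return ranked.sort((a, b) => b[1] - a[1]);
}

const ranked = rankByFailureRatio([
  { flowName: "welcome", status: "passed", value: 8 },
  { flowName: "welcome", status: "failed", value: 2 },
  { flowName: "checkout", status: "passed", value: 3 },
  { flowName: "checkout", status: "failed", value: 1 },
]);
// ranked: [["checkout", 0.25], ["welcome", 0.2]]
```

With raw failure counts alone, "welcome" (2 failures) would look worse than "checkout" (1 failure), even though its rate is lower.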

Logs — correlated to spans

When OTel is active, every klera log record carries trace_id and span_id in its data field, drawn from the active span. Backends that index those identifiers (Tempo with Loki, Honeycomb, Datadog APM combined with logs) automatically join log records to the spans that produced them — no custom collector logic.
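A minimal sketch of the correlation step, using Node's `AsyncLocalStorage` as a stand-in for OTel's active-span context. `activeSpan`, `correlate`, and the `LogRecord` shape are hypothetical, and the IDs are example values in W3C trace-context format; the point is only that the identifiers are read from whatever span is active when the log call happens.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Stand-in for the active span; a real setup reads this from OTel's context API.
interface ActiveSpan {
  traceId: string;
  spanId: string;
}

// Hypothetical log record: free-form data alongside the message.
interface LogRecord {
  message: string;
  data: Record<string, unknown>;
}

const activeSpan = new AsyncLocalStorage<ActiveSpan>();

// Copies trace_id / span_id from the active span (if any) into the record's data field.
function correlate(message: string, data: Record<string, unknown> = {}): LogRecord {
  const span = activeSpan.getStore();
  if (span === undefined) return { message, data };
  return { message, data: { ...data, trace_id: span.traceId, span_id: span.spanId } };
}

// Outside any span: no identifiers are attached.
const outside = correlate("bridge connected");

// Inside a span: the record picks up that span's identifiers.
const inside = activeSpan.run(
  { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", spanId: "00f067aa0ba902b7" },
  () => correlate("tap resolved", { strategy: "testID" }),
);
```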

Composition order

The pipeline is user → guard → redact → real SDK. Two consequences:

  • A faulty SDK can never abort a flow. The guard wrapper catches every exporter exception, logs it once, and returns control to the engine.
  • Registered secrets never reach an exporter. Every ${secret:KEY} value the redactor knows about is scrubbed from span attributes, metric labels, and log records before the OTel SDK ever sees them.
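The composition order can be sketched as two wrappers around an exporter-facing sink, with the guard outermost so it also catches anything the redactor or SDK throws. `Sink`, `redacting`, and `guarded` are hypothetical names; this sketch only scrubs attribute values, and it reads "logs it once" as "logs only the first failure", which is an assumption.

```typescript
// Hypothetical exporter-facing sink; stands in for the real OTel SDK.
interface Sink {
  emit(name: string, attrs: Record<string, string>): void;
}

// Redactor: replaces every registered secret value before the inner sink sees it.
function redacting(inner: Sink, secrets: string[]): Sink {
  const scrub = (value: string): string =>
    secrets.filter((s) => s !== "").reduce((acc, s) => acc.split(s).join("[REDACTED]"), value);
  return {
    emit(name, attrs) {
      const clean: Record<string, string> = {};
      for (const [key, value] of Object.entries(attrs)) clean[key] = scrub(value);
      inner.emit(name, clean);
    },
  };
}

// Guard: swallows any exception from the layers below; logs only the first one (assumption).
function guarded(inner: Sink, log: (message: string) => void): Sink {
  let logged = false;
  return {
    emit(name, attrs) {
      try {
        inner.emit(name, attrs);
      } catch (err) {
        if (!logged) {
          logged = true;
          log(`telemetry export failed: ${err instanceof Error ? err.message : String(err)}`);
        }
      }
    },
  };
}

// user -> guard -> redact -> real SDK
const exported: Record<string, string>[] = [];
const sdk: Sink = { emit: (_name, attrs) => { exported.push(attrs); } };
const logs: string[] = [];
const pipeline = guarded(redacting(sdk, ["s3cr3t-token"]), (m) => logs.push(m));
pipeline.emit("klera.step", { "klera.step.kind": "type", message: "typed s3cr3t-token into field" });

// A failing SDK never propagates an exception back to the caller.
const failingSdk: Sink = { emit: () => { throw new Error("export timeout"); } };
const broken = guarded(redacting(failingSdk, []), (m) => logs.push(m));
broken.emit("klera.step", {});
broken.emit("klera.step", {}); // second failure is swallowed silently
```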

What’s not emitted

The OTel egress is deliberately narrow:

  • No element-graph contents. Those carry user-visible text and are large; they live in the per-run JSON report only.
  • No prose flow bodies, no LLM prompt / response payloads. Planner spans carry mode + model + duration + result, never the bodies.
  • No fixture values, no resolved secrets. The redactor enforces this.
  • No screenshots, no diff PNGs. Image bytes are not OTel payloads; they live in the report.

CLI bootstrap activation

The CLI lazy-loads @opentelemetry/sdk-node only when it detects activation, checking these sources in priority order:

Per-signal endpoint env vars

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, OTEL_EXPORTER_OTLP_LOGS_ENDPOINT. Setting any one of them turns the bootstrap on.

Aggregate endpoint

OTEL_EXPORTER_OTLP_ENDPOINT. The most common opt-in.

Exporter selection

OTEL_TRACES_EXPORTER / OTEL_METRICS_EXPORTER set to anything other than none.

Config file

telemetry.otlp.endpoint in .klera/config.yaml.
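The activation order above can be expressed as one function that stops at the first match. This is a sketch under stated assumptions: `detectActivation`, `ActivationSource`, and the `KleraConfig` shape are hypothetical, blank values are treated as unset, and `none` is compared exactly. Only a non-undefined result would lead to the SDK import.

```typescript
// Hypothetical mirror of the activation order above; names are illustrative.
type ActivationSource = "per-signal-endpoint" | "aggregate-endpoint" | "exporter-selection" | "config-file";

// Only the config field this check needs, as an assumed shape of .klera/config.yaml.
interface KleraConfig {
  telemetry?: { otlp?: { endpoint?: string } };
}

type Env = Record<string, string | undefined>;

const PER_SIGNAL_ENDPOINTS = [
  "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
  "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
  "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT",
];

// Returns the first matching activation source, or undefined to skip loading the SDK.
function detectActivation(env: Env, config: KleraConfig): ActivationSource | undefined {
  const value = (name: string): string => (env[name] ?? "").trim();
  if (PER_SIGNAL_ENDPOINTS.some((name) => value(name) !== "")) return "per-signal-endpoint";
  if (value("OTEL_EXPORTER_OTLP_ENDPOINT") !== "") return "aggregate-endpoint";
  const exporterOn = (name: string): boolean => value(name) !== "" && value(name) !== "none";
  if (exporterOn("OTEL_TRACES_EXPORTER") || exporterOn("OTEL_METRICS_EXPORTER")) return "exporter-selection";
  if ((config.telemetry?.otlp?.endpoint ?? "") !== "") return "config-file";
  return undefined;
}
```

In a Node CLI, `env` would typically be `process.env`; passing a plain object keeps the check easy to test.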

When activated, the bootstrap:

  • Lazy-imports the SDK and the OTLP HTTP exporters for traces, metrics, and logs. Adopters who never opt in never load these modules: their static module graph stops at the engine’s type-only surface.
  • Wraps each OTel SDK instrument to satisfy the engine’s Tracer / Meter / Logger interfaces.
  • Registers the LoggerProvider with OTel’s global logsApi so adopter-injected instrumentation sees klera’s records.
  • Returns a shutdown() that flushes the trace, metric, and log pipelines under a 5s timeout.

The CLI’s run command invokes the bootstrap once per invocation, threads tracer + meter into runFlow, and calls shutdown() in a finally block so spans + metrics + logs flush even on flow failure.
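The flush-on-failure behaviour comes from pairing a bounded shutdown with `finally`. A sketch, assuming each provider exposes an async `shutdown()` similar to the ones on OTel's JS SDK providers; `Flushable`, `shutdownAll`, and `runWithTelemetry` are hypothetical names. Racing against a timer caps the wait at 5 s by default, and the `finally` block runs whether the flow resolves or throws.

```typescript
// Hypothetical provider shape: anything with an async flush-and-close.
interface Flushable {
  shutdown(): Promise<void>;
}

// Flushes every provider, but stops waiting after timeoutMs so a hung exporter cannot block exit.
async function shutdownAll(providers: Flushable[], timeoutMs = 5000): Promise<"flushed" | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  // allSettled: one failing provider does not stop the others from flushing.
  const flushed = Promise.allSettled(providers.map((p) => p.shutdown())).then(() => "flushed" as const);
  try {
    return await Promise.race([flushed, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Runs a flow and always flushes telemetry afterwards, even when the flow throws.
async function runWithTelemetry(run: () => Promise<void>, providers: Flushable[]): Promise<void> {
  try {
    await run();
  } finally {
    await shutdownAll(providers);
  }
}
```

The flow's own error still propagates to the caller after the flush, so a failed run keeps its non-zero outcome.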

Wiring to common backends

export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=$HONEYCOMB_KEY"
export OTEL_SERVICE_NAME=klera
klera run flows/welcome.flow.md

The endpoint variable is the only field that changes per backend. Headers carry auth where the backend requires it.

Backfilling historical reports

Adopters who wire OTel after they already have a JSON report archive can replay every report into the configured endpoint with klera report --otlp:

export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_KEY
for report in reports/*.json; do
  klera report "$report" --otlp
done

Each invocation emits the same klera.flow and klera.step spans and the same six instruments the live engine would have emitted. Replay is a pure projection of the JSON: no runtime, bridge, or device is required.

Two caveats:

  • Replay timestamps are synthesised — backends see relative timing faithfully but absolute timestamps reflect replay time, not the original run.
  • Replay is not idempotent. Running the loop twice ships every span and metric twice.
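A sketch of the projection, showing why relative timing survives while absolute time does not. The `Report` shape is a hypothetical slice (the real schema is on the Reports page), `projectReport` is an illustrative name, and laying steps end to end with no gaps is an assumption. The function keeps no record of what was already shipped, which mirrors the non-idempotency caveat.

```typescript
// Hypothetical slice of a JSON report; not the real report schema.
interface ReportStep {
  kind: string;
  status: "passed" | "failed";
  durationMs: number;
}

interface Report {
  flowName: string;
  steps: ReportStep[];
}

interface ReplayedSpan {
  name: string;
  startMs: number;
  endMs: number;
  attrs: Record<string, string | number>;
}

// Projects a report into spans anchored at replayStartMs: durations and ordering are
// preserved, but absolute timestamps reflect replay time, not the original run.
function projectReport(report: Report, replayStartMs: number): ReplayedSpan[] {
  const stepSpans: ReplayedSpan[] = [];
  let cursor = replayStartMs;
  report.steps.forEach((step, index) => {
    stepSpans.push({
      name: "klera.step",
      startMs: cursor,
      endMs: cursor + step.durationMs,
      attrs: {
        "klera.step.index": index,
        "klera.step.kind": step.kind,
        "klera.step.status": step.status,
        "klera.step.duration_ms": step.durationMs,
      },
    });
    cursor += step.durationMs; // assumption: steps ran back to back
  });
  const failed = report.steps.some((s) => s.status === "failed");
  const flowSpan: ReplayedSpan = {
    name: "klera.flow",
    startMs: replayStartMs,
    endMs: cursor,
    attrs: {
      "klera.flow.name": report.flowName,
      "klera.flow.steps": report.steps.length,
      "klera.flow.status": failed ? "failed" : "passed",
      "klera.flow.duration_ms": cursor - replayStartMs,
    },
  };
  return [flowSpan, ...stepSpans];
}

const spans = projectReport(
  {
    flowName: "welcome",
    steps: [
      { kind: "tap", status: "passed", durationMs: 340 },
      { kind: "assert", status: "failed", durationMs: 60 },
    ],
  },
  Date.now(),
);
```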

Programmatic adoption

The engine exports Tracer / Meter / Logger interfaces and their no-op / in-memory / guarded / redacting / correlating wrappers. Embeddings that do not speak OTel can implement the interfaces themselves and pass them via RunOptions:

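import { runFlow, type Tracer, type Meter, type Logger } from "@klera/engine";

// `flow` and `bridge` come from your own flow loader and device bridge.
const tracer: Tracer = { /* your own implementation */ };
const meter: Meter = { /* your own implementation */ };
const logger: Logger = { /* your own implementation */ };

await runFlow(flow, bridge, { tracer, meter, logger });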

The CLI’s OTel adapter is one such implementation; nothing about the engine’s contract favours OTel over alternatives.

See also

  • CI scaffolds — wire the JSON-report → JUnit → test-results pane pipeline.
  • Reports — the canonical per-run artefact.