Observability

The per-run JSON report answers “why did this run fail?”. OpenTelemetry answers “what happened across runs over weeks?” — flake-rate trends, slow-step regressions, matcher-strategy drift, triage classification distribution. They answer different questions; the JSON report stays canonical per run, OTel adds aggregation.

One env var


export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_KEY
export OTEL_SERVICE_NAME=klera
 
klera run flows/welcome.flow.md

That is the entire opt-in surface. Without an OTel env var or a telemetry entry in the config file, the emitter stays unconfigured. Adopters who do not want OTel pay zero install cost, since the SDK packages are lazy-required.

You can also configure the endpoint in .klera/config.yaml:


telemetry:
  otlp:
    endpoint: http://localhost:4318
    serviceName: klera

Spans

Every klera run emits a tree of spans:


klera.flow                 flow.name=welcome      (root, ~12s)
├─ klera.step              step.kind=tap[0]       (~340ms)
├─ klera.step              step.kind=type[1]
└─ klera.step              step.kind=assert[2]

Stable attributes (subset):

Attribute	Span	Type	Meaning
`klera.flow.name`	`klera.flow`	string	Flow’s `name` field
`klera.flow.steps`	`klera.flow`	number	Step count
`klera.flow.status`	`klera.flow`	string	`passed` / `failed`
`klera.flow.duration_ms`	`klera.flow`	number	Wall-clock
`klera.flow.drift_count`	`klera.flow`	number	Matcher drifts during the run
`klera.flow.replan_count`	`klera.flow`	number	Planner replans
`klera.flow.default_timeout_ms`	`klera.flow`	number	Configured timeout
`klera.step.index`	`klera.step`	number	Zero-based
`klera.step.kind`	`klera.step`	string	`tap` / `type` / `assert` / …
`klera.step.status`	`klera.step`	string	`passed` / `failed`
`klera.step.duration_ms`	`klera.step`	number	Wall-clock
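When consuming exported spans in your own tooling, the attribute table can be mirrored as types. The following is a minimal sketch: the names `FlowSpanAttributes`, `StepSpanAttributes`, and `stepTimeShare` are illustrative assumptions, not engine exports; only the attribute keys come from the table.

```typescript
// Hypothetical type names; only the attribute keys are taken from the table above.
interface FlowSpanAttributes {
  "klera.flow.name": string;
  "klera.flow.steps": number;
  "klera.flow.status": "passed" | "failed";
  "klera.flow.duration_ms": number;
  "klera.flow.drift_count": number;
  "klera.flow.replan_count": number;
  "klera.flow.default_timeout_ms": number;
}

interface StepSpanAttributes {
  "klera.step.index": number; // zero-based
  "klera.step.kind": string; // "tap", "type", "assert", ...
  "klera.step.status": "passed" | "failed";
  "klera.step.duration_ms": number;
}

// Illustrative helper: fraction of the flow's wall-clock time spent inside steps.
function stepTimeShare(
  flow: FlowSpanAttributes,
  steps: StepSpanAttributes[],
): number {
  const stepMs = steps.reduce((sum, s) => sum + s["klera.step.duration_ms"], 0);
  const flowMs = flow["klera.flow.duration_ms"];
  return flowMs === 0 ? 0 : stepMs / flowMs;
}
```

A low share would suggest time is going to work outside the steps themselves, which is one reason to keep both duration attributes side by side.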

Span error status is set when a step fails; the message is the matcher / dispatch failure string with secrets redacted.

Metrics

Metric	Kind	Attributes
`klera.flows.total`	counter	`klera.flow.name`, `klera.flow.status`
`klera.steps.total`	counter	`klera.flow.name`, `klera.step.kind`, `klera.step.status`
`klera.step.duration`	histogram	`klera.flow.name`, `klera.step.kind`
`klera.matcher.strategy.attempts`	counter	`klera.flow.name`, `klera.step.kind`, `klera.matcher.strategy`, `…outcome`
`klera.visual_snapshot.diff_ratio`	histogram	`klera.flow.name`, `klera.visual_snapshot.id`, `klera.visual_snapshot.created`
`klera.triage.classifications`	counter	`klera.flow.name`, `klera.triage.verdict`

Operator questions map one-for-one to instruments:

Which flow has the highest flake rate? → group klera.flows.total by klera.flow.name and compare the klera.flow.status="failed" count against the total for each flow.
Which step is the slowest? → percentile of klera.step.duration faceted by klera.step.kind.
Did drift events drop after we landed the fix? → klera.triage.classifications filtered to verdict="drift".
What fraction of taps still resolve via testID vs fall through to fuzzy text? → ratio of klera.matcher.strategy.attempts{strategy="testID"} to …strategy="fuzzy-text".
Is the onboarding screen drifting? → percentile of klera.visual_snapshot.diff_ratio filtered to id="onboarding-1".
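Most backends express these queries in their own query language. As a backend-neutral illustration of the first question, here is a sketch that turns `klera.flows.total` data points into a per-flow failure rate. The `FlowCountPoint` shape and the function name are assumptions made for this example, not an export format the engine defines.

```typescript
// One data point of the klera.flows.total counter (hypothetical shape).
interface FlowCountPoint {
  flowName: string; // klera.flow.name
  status: "passed" | "failed"; // klera.flow.status
  value: number; // counter value for this attribute combination
}

// Failed runs divided by all runs, per flow, sorted worst first.
function failureRateByFlow(points: FlowCountPoint[]): Array<[string, number]> {
  const totals = new Map<string, { failed: number; all: number }>();
  for (const p of points) {
    const t = totals.get(p.flowName) ?? { failed: 0, all: 0 };
    t.all += p.value;
    if (p.status === "failed") t.failed += p.value;
    totals.set(p.flowName, t);
  }
  return Array.from(totals.entries())
    .map(([name, t]): [string, number] => [
      name,
      t.all === 0 ? 0 : t.failed / t.all,
    ])
    .sort((a, b) => b[1] - a[1]);
}
```

Dividing by the total matters: a flow that runs ten times as often will show more failures in absolute terms even when its rate is lower.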

Logs — correlated to spans

When OTel is active, every klera log record carries trace_id and span_id in its data field, drawn from the active span. Backends that index those identifiers (Tempo with Loki, Honeycomb, Datadog APM combined with logs) automatically join log records to the spans that produced them — no custom collector logic.
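The correlation step can be pictured as a small pure function. This is a sketch under assumptions: the `LogRecord` and `ActiveSpan` shapes and the `correlate` name are hypothetical; the source only states that `trace_id` and `span_id` land in the record's data field.

```typescript
// Hypothetical shapes for illustration only.
interface ActiveSpan {
  traceId: string;
  spanId: string;
}

interface LogRecord {
  level: string;
  message: string;
  data: Record<string, unknown>;
}

// Copy the active span's identifiers into the record's data field.
// With no active span (OTel inactive), the record is returned unchanged.
function correlate(record: LogRecord, span: ActiveSpan | undefined): LogRecord {
  if (!span) return record;
  return {
    ...record,
    data: { ...record.data, trace_id: span.traceId, span_id: span.spanId },
  };
}
```

Returning a new record instead of mutating the input keeps the original log call site unaffected if a downstream layer fails.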

Composition order

The pipeline is user → guard → redact → real SDK. Two consequences:

A faulty SDK can never abort a flow. The guard wrapper catches every exporter exception, logs it once, and returns control to the engine.
Registered secrets never reach an exporter. Every ${secret:KEY} value the redactor knows about is scrubbed from span attributes, metric labels, and log records before the OTel SDK ever sees them.
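The two consequences follow from the order in which the wrappers are stacked. Here is a minimal sketch of that layering, assuming a hypothetical one-method `Sink` interface in place of the real tracer, meter, and logger surfaces; `guarded`, `redacting`, and `buildPipeline` are illustrative names, and the `[REDACTED]` placeholder is an assumption.

```typescript
// Hypothetical minimal sink standing in for a span / metric / log exporter.
type Attributes = Record<string, string | number | boolean>;
interface Sink {
  emit(name: string, attrs: Attributes): void;
}

// Replace every occurrence of each registered secret value.
function scrub(text: string, secrets: string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length > 0) out = out.split(secret).join("[REDACTED]");
  }
  return out;
}

// Redact layer: scrubs string attributes before the inner sink sees them.
function redacting(inner: Sink, secrets: string[]): Sink {
  return {
    emit(name, attrs) {
      const clean: Attributes = {};
      for (const [key, raw] of Object.entries(attrs)) {
        clean[key] = typeof raw === "string" ? scrub(raw, secrets) : raw;
      }
      inner.emit(name, clean);
    },
  };
}

// Guard layer: swallows downstream exceptions and logs only the first one.
function guarded(inner: Sink, log: (message: string) => void): Sink {
  let reported = false;
  return {
    emit(name, attrs) {
      try {
        inner.emit(name, attrs);
      } catch (err) {
        if (!reported) {
          reported = true;
          log(`telemetry sink failed: ${String(err)}`);
        }
      }
    },
  };
}

// user -> guard -> redact -> real SDK
function buildPipeline(
  sdk: Sink,
  secrets: string[],
  log: (message: string) => void,
): Sink {
  return guarded(redacting(sdk, secrets), log);
}
```

Putting the guard outermost means a bug in the redactor is contained as well, not only a failure in the exporter.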

What’s not emitted

The OTel egress is deliberately narrow:

No element-graph contents. Those carry user-visible text and are large; they live in the per-run JSON report only.
No prose flow bodies, no LLM prompt / response payloads. Planner spans carry mode + model + duration + result, never the bodies.
No fixture values, no resolved secrets. The redactor enforces this.
No screenshots, no diff PNGs. Image bytes are not OTel payloads; they live in the report.

CLI bootstrap activation

The CLI lazy-loads @opentelemetry/sdk-node only when activation is detected, in priority order:

Per-signal endpoint env vars

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, OTEL_EXPORTER_OTLP_LOGS_ENDPOINT. Any one set turns the bootstrap on.

Aggregate endpoint

OTEL_EXPORTER_OTLP_ENDPOINT. The most common opt-in.

Exporter selection

OTEL_TRACES_EXPORTER / OTEL_METRICS_EXPORTER set to anything other than none.

Config file

.klera/config.yaml telemetry.otlp.endpoint.
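The priority order above can be read as a first-match check. A sketch follows; the function name, the `Activation` labels, and the choice to treat empty strings as unset are assumptions for illustration, not the CLI's actual code.

```typescript
// Hypothetical names; the rules and their order follow the list above.
type Env = Record<string, string | undefined>;

interface KleraConfig {
  telemetry?: { otlp?: { endpoint?: string } };
}

type Activation =
  | "per-signal-endpoint"
  | "aggregate-endpoint"
  | "exporter-selection"
  | "config-file"
  | "inactive";

function detectActivation(env: Env, config: KleraConfig = {}): Activation {
  // Assumption: an empty or whitespace-only value counts as unset.
  const isSet = (key: string): boolean => (env[key] ?? "").trim() !== "";

  const perSignal = [
    "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
    "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT",
  ];
  if (perSignal.some(isSet)) return "per-signal-endpoint";

  if (isSet("OTEL_EXPORTER_OTLP_ENDPOINT")) return "aggregate-endpoint";

  // Any exporter value other than "none" turns the bootstrap on.
  const exporters = ["OTEL_TRACES_EXPORTER", "OTEL_METRICS_EXPORTER"];
  if (exporters.some((key) => isSet(key) && env[key] !== "none")) {
    return "exporter-selection";
  }

  if (config.telemetry?.otlp?.endpoint) return "config-file";
  return "inactive";
}
```

In a Node process this would be called with `process.env` and the parsed config file; only a result other than `"inactive"` would justify loading the SDK.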

When activated, the bootstrap:

Lazy-imports the SDK + OTLP HTTP exporters for traces, metrics, and logs. Adopters who never opt in keep a static module graph at the engine’s type-only surface.
Wraps each OTel SDK instrument to satisfy the engine’s Tracer / Meter / Logger interfaces.
Registers the LoggerProvider with OTel’s global logsApi so adopter-injected instrumentation sees klera’s records.
Returns a shutdown() that flushes the trace, metric, and log pipelines under a 5s timeout.

The CLI’s run command invokes the bootstrap once per invocation, threads tracer + meter into runFlow, and calls shutdown() in a finally block so spans + metrics + logs flush even on flow failure.
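The flush-in-finally pattern with a bounded wait can be sketched as follows. `flushWithTimeout` and `runWithTelemetry` are hypothetical names, and the three-way outcome is an assumption; the source specifies only the 5s ceiling and the `finally` placement.

```typescript
type FlushOutcome = "flushed" | "failed" | "timeout";

// Wait for the flush, but never longer than timeoutMs. Never rejects.
function flushWithTimeout(
  flush: () => Promise<void>,
  timeoutMs: number,
): Promise<FlushOutcome> {
  return new Promise<FlushOutcome>((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    // Promise.resolve().then(...) also captures a synchronous throw from flush.
    Promise.resolve()
      .then(flush)
      .then(
        () => {
          clearTimeout(timer);
          resolve("flushed");
        },
        () => {
          clearTimeout(timer);
          resolve("failed");
        },
      );
  });
}

// Run the flow, then flush telemetry whether the flow passed or threw.
async function runWithTelemetry<T>(
  run: () => Promise<T>,
  shutdown: () => Promise<void>,
  timeoutMs = 5000,
): Promise<T> {
  try {
    return await run();
  } finally {
    await flushWithTimeout(shutdown, timeoutMs);
  }
}
```

Because the flush helper never rejects, a slow or broken exporter cannot replace the flow's own error with a telemetry error.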

Wiring to common backends

Honeycomb


export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=$HONEYCOMB_KEY"
export OTEL_SERVICE_NAME=klera
klera run flows/welcome.flow.md

Grafana / Tempo


export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp-gateway-prod.grafana.net/otlp
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic $GRAFANA_TOKEN_B64"
export OTEL_SERVICE_NAME=klera
klera run flows/welcome.flow.md

Jaeger


# Local Jaeger all-in-one with OTLP enabled
docker run -d --name jaeger \
  -p 4318:4318 -p 16686:16686 \
  jaegertracing/all-in-one:latest
 
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_SERVICE_NAME=klera
klera run flows/welcome.flow.md
# Open http://localhost:16686 → service "klera"

Datadog


# Datadog Agent OTLP receiver — see DD docs for agent config
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
export OTEL_RESOURCE_ATTRIBUTES="deployment.environment=ci"
export OTEL_SERVICE_NAME=klera
klera run flows/welcome.flow.md

The endpoint variable is the main field that changes per backend; headers carry auth where the backend requires it.

Backfilling historical reports

Adopters who wire OTel after they already have a JSON report archive can replay every report into the configured endpoint with klera report --otlp:


export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_KEY
 
for report in reports/*.json; do
  klera report "$report" --otlp
done

Each invocation emits the same klera.flow + klera.step spans and the same six instruments the live engine would have. Replay is a pure projection of the JSON; no runtime, bridge, or device is required.

Two caveats:

Replay timestamps are synthesised — backends see relative timing faithfully but absolute timestamps reflect replay time, not the original run.
Replay is not idempotent. Running the loop twice ships every span and metric twice.
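Since the source describes no built-in deduplication, one way to guard against double shipping is a user-side ledger of reports that have already been replayed. The sketch below is only that: `planReplay` and the ledger are hypothetical helpers you would maintain yourself, not klera features.

```typescript
// User-side sketch; the ledger is something you persist yourself.
interface ReplayPlan {
  pending: string[]; // reports still to be replayed
  skipped: string[]; // reports already replayed or listed twice
}

function planReplay(
  reportPaths: string[],
  alreadyReplayed: ReadonlySet<string>,
): ReplayPlan {
  const pending: string[] = [];
  const skipped: string[] = [];
  const queued = new Set<string>();
  for (const path of reportPaths) {
    if (alreadyReplayed.has(path) || queued.has(path)) {
      skipped.push(path);
    } else {
      pending.push(path);
      queued.add(path);
    }
  }
  return { pending, skipped };
}
```

Each path in `pending` would then be passed to `klera report "$report" --otlp` and added to the ledger only after that invocation succeeds.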

Programmatic adoption

The engine exports Tracer / Meter / Logger interfaces and their no-op / in-memory / guarded / redacting / correlating wrappers. Embeddings that do not speak OTel can implement the interfaces themselves and pass them via RunOptions:


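import { runFlow, type Tracer, type Meter, type Logger } from "@klera/engine";

const tracer: Tracer = {
  /* your own implementation */
};
const meter: Meter = {
  /* your own implementation */
};
const logger: Logger = {
  /* your own implementation */
};

// `flow` and `bridge` come from your existing embedding setup.
await runFlow(flow, bridge, { tracer, meter, logger });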

The CLI’s OTel adapter is one such implementation; nothing about the engine’s contract favours OTel over alternatives.
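To make the idea of a non-OTel implementation concrete, here is a sketch of an in-memory tracer that records spans into an array. The `SketchTracer` and `SketchSpan` interfaces are invented for this example; the real `Tracer` interface exported by `@klera/engine` may have a different shape, so check the package's type declarations before adapting it.

```typescript
// Hypothetical interface shapes; the engine's real Tracer may differ.
type AttrValue = string | number | boolean;

interface SketchSpan {
  setAttribute(key: string, value: AttrValue): void;
  end(): void;
}

interface SketchTracer {
  startSpan(name: string): SketchSpan;
}

interface RecordedSpan {
  name: string;
  attributes: Record<string, AttrValue>;
  ended: boolean;
}

// Returns the tracer plus the array it records into, for later inspection.
function createInMemoryTracer(): {
  tracer: SketchTracer;
  spans: RecordedSpan[];
} {
  const spans: RecordedSpan[] = [];
  const tracer: SketchTracer = {
    startSpan(name) {
      const record: RecordedSpan = { name, attributes: {}, ended: false };
      spans.push(record);
      return {
        setAttribute(key, value) {
          record.attributes[key] = value;
        },
        end() {
          record.ended = true;
        },
      };
    },
  };
  return { tracer, spans };
}
```

An implementation like this can be useful in tests, where asserting on recorded spans is simpler than standing up a collector.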