IR reference
Every step keyword klera understands. The IR is defined as a Zod schema
in @klera/protocol/ir.ts; this page enumerates every variant with its
long-form payload, short-form sugar, a runnable example, and the
gotchas that bite in practice.
Steps are grouped by concern:
- Gestures —
tap,longPress,tapCoord,multiTap,pinch,swipe,scroll - Input —
type - Assertions —
assert,assertJS - Sync —
wait,waitForIdle - Device —
dismissAlert,dismissActionSheet,setLocation,setBiometric,setOrientation,openURL,setClipboard,dismissKeyboard,pressBack,grantPermission - Lifecycle —
relaunch,backgroundApp,foregroundApp - Network —
mockNetwork,unmockNetwork,assertNetworkCalled - Visual —
screenshot,visualSnapshot - Control flow —
optional
Targets
Every gesture, input, and assertion that resolves an element accepts a
target selector. The selector must specify at least one of testID,
text, or accessibilityLabel; the matcher walks the strategy ladder
in that order, then role + text (when role is set), then fuzzy text.
target: { testID: login-submit }
target: { text: Sign In }
target: { text: Sign In, role: button }
target: { accessibilityLabel: Submit form }
target: { testID: row, scope: 'todo-list' }role constrains the text/label rungs of the ladder; it’s not its own
rung. scope restricts the search to descendants of an element matched
by a free-form hint (typically a parent testID or text). String
short-form (tap: Sign In) is sugar for { text: "Sign In" }.
Gestures
tap
Synthesise a single tap on the resolved element.
- tap: Sign In # short form
- tap: { testID: login-submit } # long form
- tap: { text: Continue, role: button }Resolves through the matcher; self-heals through the strategy ladder. The most common step in any flow.
Gotcha: if two elements match the same selector, the matcher picks
the topmost one in document order. Disambiguate with scope or a
tighter testID.
longPress
Same shape as tap; the runtime holds the contact for a longer
duration so press-and-hold gesture recognisers fire.
- longPress: { testID: card-1 }tapCoord
Synthesise a tap at a raw view-coordinate point. Pixel coordinates
leak into the flow — use only when an element has no testID /
accessibility label and the matcher cannot help.
- tapCoord: { x: 200, y: 400 }
- tapCoord: { x: 200, y: 400, durationMs: 60 }durationMs defaults to 60 ms. iOS and Android both implement this via
the native driver’s coordinate-injection path.
multiTap
Two-to-five concurrent contact points delivered as a single UIEvent so
UITapGestureRecognizer with numberOfTouchesRequired > 1 sees them
as one gesture.
- multiTap:
points:
- { x: 100, y: 200 }
- { x: 300, y: 200 }
durationMs: 60pinch
Two-finger pinch / zoom. Two touch sequences interpolated linearly from
from → to over durationMs. Pinch when to is closer than from;
zoom when farther.
- pinch:
from: [{ x: 100, y: 400 }, { x: 300, y: 400 }]
to: [{ x: 50, y: 400 }, { x: 350, y: 400 }] # zoom (apart)
durationMs: 300swipe
Synthesise a directional swipe. Short form picks sensible
middle-of-screen defaults; long form takes explicit from / to and
optional duration.
- swipe: up # short form
- swipe: down
- swipe: # long form
from: [200, 700]
to: [200, 200]
durationMs: 300The native module clamps the synthetic points to the actual key window bounds, so the short form works on any device size.
scroll
Walk a scrollable container until a target appears, or for a fixed number of swipes in a direction.
- scroll: # walk down up to 10 times until the target appears
to: { testID: row-42 }
direction: down
maxSwipes: 10
- scroll: # blind scroll
direction: up
maxSwipes: 3direction defaults to down; maxSwipes defaults to 10 (max 50).
When to is set the executor stops as soon as the target resolves.
Input
type
Resolve into, focus it, and type value character-by-character. The
runtime synthesises onChangeText events so controlled inputs see each
keystroke.
- type:
into: { testID: login-email }
value: pm@example.com
- type:
into: search-box # short form for {text: "search-box"}
value: ${env:E2E_QUERY}Gotcha: for password fields, set secureTextEntry on the
React Native side. The runtime types the resolved value; the rendered
PNG captured during failure-evidence will leak the typed character if
the field is not secured. The Redactor scrubs string values, not
pixels — see fixtures and secrets.
Assertions
assert
Static assertions over the element graph. Three mutually-exclusive shapes — at least one must be set.
- assert: { visible: { testID: welcome-greeting } }
- assert: { visible: Welcome back } # short form for visible
- assert: { notVisible: { testID: spinner } }
- assert:
hasText:
target: { testID: welcome-greeting }
value: 'Welcome, pm@example.com'Assertions are read-only — they don’t drive the UI, they observe it.
hasText is exact match; substring assertions go through assertJS.
assertJS
Eval an expression in the runtime’s JS context via Hermes CDP. The
short form (assertJS: <string>) implies truthy; the long form
takes one comparator.
- assertJS: 'globalThis.featureFlags?.newCheckout === true'
- assertJS:
expression: 'globalThis.cartItems.length'
equals: 3
- assertJS:
expression: 'globalThis.user.email'
matches: '^[^@]+@example\\.com$'
- assertJS:
expression: 'document.title' # whatever Hermes can see
contains: 'klera'Comparators: equals, notEquals, matches (regex), contains
(substring/array), gt / lt / gte / lte, truthy. At most one
per step. timeoutMs defaults to 2000.
assertJS is the escape hatch when the element graph alone can’t
express the invariant — feature-flag state, in-memory store
contents, computed values. Prefer assert over assertJS when
both work; the matcher’s diagnostics are richer.
Sync
wait
Two shapes: pause for a fixed duration, or wait until an element appears (with a timeout).
- wait: { seconds: 2 } # blunt — last resort
- wait: # poll until visible
for: { testID: welcome-greeting }
timeoutSeconds: 5Prefer the element-visibility form; fixed sleeps cause flakes.
waitForIdle
Synchronise with the runtime’s idle gates. The runtime owns counters
for animations (Animated / LayoutAnimation / rAF) and network (wrapped
fetch / XHR / WebSocket); waitForIdle blocks until the requested
gates are quiet for the quiet window.
- waitForIdle: true # short form — both gates
- waitForIdle:
animations: true
network: true
timeoutMs: 10000 # default falls back to flow's defaultTimeoutMs
quietWindowMs: 100 # runtime defaultGotcha: waitForIdle: false is rejected. Opt out by simply
omitting the step. At least one gate must be enabled in the long form.
Device
dismissAlert
Dismiss the topmost native alert / modal by tapping the button whose
label matches. Special values accept and cancel resolve to “first
non-cancel button” and “the cancel button” respectively.
- dismissAlert: OK
- dismissAlert: accept
- dismissAlert: cancelRouted through the host-side DeviceDriver, not the in-app runtime —
native alerts live outside the React tree.
dismissActionSheet
Dismiss a UIKit ActionSheet (the sibling style of UIAlertController
used by ActionSheetIOS, share-sheet triggers, native pickers) by
tapping the button whose title matches. iOS-only.
- dismissActionSheet: SharesetLocation
Set the simulator’s reported GPS coordinates. Simulator-only; physical-device GPS forging is unsupported.
- setLocation: { lat: 37.7749, lon: -122.4194 }setBiometric
Set the simulator’s Touch ID / Face ID outcome.
- setBiometric: success # short form
- setBiometric: failure
- setBiometric: not-enrolled
- setBiometric: { outcome: success } # long formsetOrientation
Rotate the simulator / app to portrait or landscape.
- setOrientation: landscape
- setOrientation: portraitopenURL
Open a URL scheme / universal link. Triggers Linking listeners in the
app under test.
- openURL: myapp://deep/path
- openURL: https://example.com/share/abcsetClipboard
Write to the system pasteboard.
- setClipboard: 'copied value'dismissKeyboard
Dismiss the soft keyboard. Bare-key short form: the value is intentionally ignored, but YAML wants something there.
- dismissKeyboard: truepressBack
Fire Android’s hardware-back gesture. Android-only — the engine’s
composite rejects it on iOS with capability_unsupported.
- pressBack: truegrantPermission
Grant a runtime permission to the app under test. Android-only. The
enum is bounded; new values land in @klera/protocol so the
driver-side mapping table stays in lockstep.
- grantPermission: camera
- grantPermission: microphone
- grantPermission: location
- grantPermission: notificationsLifecycle
relaunch
Terminate and relaunch the app under test. Short form relaunch: true
is a clean restart; the long form takes optional launch args and a
URL delivered via openurl after launch.
- relaunch: true # clean restart
- relaunch:
args: ['--debug', '--region=eu']
- relaunch:
url: myapp://post-launch/pathbundleId is resolved at runtime from the connected runtime’s
RuntimeInfo.appId.
backgroundApp / foregroundApp
Send the app to background and bring it back. Both are bare-key short forms — the truthy literal keeps YAML well-formed.
- backgroundApp: true
- waitForIdle: { animations: true }
- foregroundApp: trueUseful for testing app-restoration behaviour, push-notification handlers, or session expiry.
Network
mockNetwork
Install one or more mocks for the rest of the flow (or until
unmockNetwork is called). Each spec matches by method + path,
then synthesises a response. Inline body and file-backed fixture
are mutually exclusive; supply neither and the runtime echoes {}.
- mockNetwork:
- method: GET
path: /api/notifications
status: 200
body: { notifications: [] }
- method: POST
path: /api/login
status: 200
fixture: fixtures/login-success.json
delayMs: 500
headers:
x-rate-limit-remaining: '99'method: '*' matches any method. path is exact-match in v1; glob
and regex broaden the matcher in a later wave.
unmockNetwork
Clear all installed mocks (true) or a targeted subset.
- unmockNetwork: true
- unmockNetwork:
- { method: GET, path: /api/notifications }assertNetworkCalled
Assert that a request matching the predicate was observed in the
runtime’s per-flow network log. At least one of method or path
must be supplied. count defaults to 1; supply 0 to assert no
matching request.
- assertNetworkCalled:
method: POST
path: /api/login
- assertNetworkCalled:
path: /api/analytics
count: 0 # assert nothing went out
- assertNetworkCalled:
method: POST
path: /api/orders
bodyContains: { sku: 'WIDGET-1' }
headersContain:
authorization: 'Bearer'See network mocking for the wider story.
Visual
screenshot
Capture the current app state to disk. Short form is sugar for the long form below; intermediate directories are created on demand.
- screenshot: home.png # short form
- screenshot:
path: artefacts/home.png
scale: 2 # optional Retina scaleUseful for debugging — the screenshot is not compared against anything.
For visual regression, use visualSnapshot.
visualSnapshot
Capture a screenshot and diff against a stored baseline. The first run when no baseline exists writes the captured PNG as the new baseline and passes; subsequent runs diff against it.
- visualSnapshot: home-after-login # short form
- visualSnapshot:
id: home-after-login
tolerance: 0.5 # max % of pixels allowed to differ; default 0.5
region: { testID: notification-list } # narrow the comparisonBaselines live under __baselines__/<flow>/<id>.png. On mismatch, the
report artefact dir gets the actual / baseline / diff PNG triplet. See
visual snapshots for the full treatment.
Control flow
optional
Conditional / branch step. Runs do only if the when.visible
predicate matches the current element graph; otherwise the step passes
with details indicating the predicate was not met.
- optional:
when: { visible: { testID: whats-new-modal } }
do: { tap: { testID: whats-new-dismiss } }The planner uses this to encode prose conditionals (“if a What’s New modal appears, dismiss it”) that resolve at runtime against the actual screen.
Gotcha: optional steps cannot be nested inside another optional. Express compound conditionals as multiple sequential optionals.
Editor autocomplete
The full IR ships as a JSON Schema export. Add the language-server directive at the top of any YAML flow and your editor gets autocomplete
- hover docs + inline validation for every step listed on this page:
# yaml-language-server: $schema=../.klera/flow.schema.jsonSee editor support for the end-to-end setup across VS Code, JetBrains, Neovim, Helix, and Sublime.