
Scripts

Scripts are the runnable code behind Qodex scenarios. When Qodex creates a scenario, it also creates a script that can run the test again without asking the LLM to recreate the steps.

What scripts do

Every scenario produces one script. That script is what runs when you click Run, when a schedule fires, when a webhook lands, or when CI invokes the suite. Scripts are standard test code, not opaque blobs.
  • UI scripts are Playwright Test specs (.spec.ts). They use the intent runner: each natural-language step is resolved via gpt-5-mini using a live accessibility snapshot, with a step cache for zero-LLM replays.
  • API scripts are Node.js HTTP runners (.test.ts). Steps are deterministic HTTP calls with assertions on status, headers, and JSON paths. No LLM on the replay path, ever.
Both shapes use environment variables. The same script can run against staging, production, or a preview deploy with no code change:
TARGET_URL=https://staging.app.com \
API_BASE_URL=https://staging.api.com \
AUTH_TOKEN=$STAGING_AUTH \
node scripts/auth-rejects-invalid-password.test.ts
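To make the API shape concrete, here is a minimal sketch of what one deterministic step could look like: read a required environment variable, then check a response's status, headers, and JSON paths. This is an illustration under stated assumptions, not the code Qodex generates. Every name in it (requireEnv, getPath, checkResponse, and the two interfaces) is hypothetical, and only the environment variable names come from the command above.

```typescript
// Sketch of one deterministic API step. All names here are hypothetical;
// generated scripts may be structured differently.

type Env = Record<string, string | undefined>;

interface ObservedResponse {
  status: number;
  headers: Record<string, string>; // header names lower-cased
  body: unknown; // parsed JSON
}

interface StepExpectation {
  status: number;
  headers?: Record<string, string>;
  jsonPaths?: Record<string, unknown>; // dot path -> expected primitive value
}

// Read a required variable so a misconfigured run fails before any request is sent.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

// Resolve a dot-separated path such as "error.code" against parsed JSON.
function getPath(body: unknown, path: string): unknown {
  let current: unknown = body;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Compare a response with the expectation and return one message per mismatch.
// Values are compared with ===, so this sketch only handles primitives.
function checkResponse(expected: StepExpectation, actual: ObservedResponse): string[] {
  const failures: string[] = [];
  if (actual.status !== expected.status) {
    failures.push(`status: expected ${expected.status}, got ${actual.status}`);
  }
  const headers = expected.headers ?? {};
  for (const name of Object.keys(headers)) {
    const got = actual.headers[name.toLowerCase()];
    if (got !== headers[name]) {
      failures.push(`header ${name}: expected ${headers[name]}, got ${got}`);
    }
  }
  const jsonPaths = expected.jsonPaths ?? {};
  for (const path of Object.keys(jsonPaths)) {
    const got = getPath(actual.body, path);
    if (got !== jsonPaths[path]) {
      failures.push(`${path}: expected ${JSON.stringify(jsonPaths[path])}, got ${JSON.stringify(got)}`);
    }
  }
  return failures;
}
```

In a script shaped like this, the base URL would come from requireEnv(process.env, "API_BASE_URL"), so the same step runs unchanged against any environment. Because the check is a plain comparison, a replay needs no LLM call.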

Where scripts live

In the hosted product, scripts are stored in Postgres and linked to their scenario through scenarios.script_id. At execution time they are materialized to a temp directory (.qodeclaw-scripts/). Postgres is the source of truth; the filesystem is only working scratch space. In self-hosted deploys you can sync scripts to a connected GitHub repo: the agent writes scenarios and scripts in the chat, and the sync pushes them to a branch.

Why scripts lower replay cost

The LLM authors the script once. After that, replays are parameterized and close to what a hand-written Playwright or HTTP test would do, so your nightly suite costs mostly Playwright or HTTP runtime, not OpenAI tokens. API replays never call the LLM. UI replays reuse cached steps but can still call gpt-5-mini on a cache miss for self-healing, which makes UI runs cheap but not always zero-cost. A fully deterministic UI runner is on the roadmap and would bring UI reruns to the same zero-LLM cost level as API reruns.
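The cost model for UI replays can be sketched as a small cache in front of the LLM resolver. This is an assumption-laden illustration, not Qodex's implementation: the StepCache class, its key (the step text), the locator string format, and the synchronous resolver are all hypothetical, chosen to keep the example short. A real resolver would be asynchronous and would call the model with the accessibility snapshot.

```typescript
// Hypothetical shapes: the real cache key and resolved-step format are not documented here.
interface ResolvedStep {
  locator: string; // something the runner can act on, e.g. a selector
}

// Stand-in for the LLM call: natural-language step + snapshot -> resolved step.
type Resolver = (step: string, snapshot: string) => ResolvedStep;

class StepCache {
  private readonly entries = new Map<string, ResolvedStep>();
  private readonly resolveWithLlm: Resolver;
  llmCalls = 0; // how many times the resolver was actually invoked

  constructor(resolveWithLlm: Resolver) {
    this.resolveWithLlm = resolveWithLlm;
  }

  resolve(step: string, snapshot: string): ResolvedStep {
    const cached = this.entries.get(step);
    if (cached !== undefined) return cached; // cache hit: no LLM call

    // Cache miss: pay for one resolver call, then remember the result.
    this.llmCalls += 1;
    const resolved = this.resolveWithLlm(step, snapshot);
    this.entries.set(step, resolved);
    return resolved;
  }

  // Drop an entry so the next resolve() for this step goes back to the resolver.
  invalidate(step: string): void {
    this.entries.delete(step);
  }
}
```

With a warm cache, every step is a lookup and the run costs only browser time. A miss, or an entry that was invalidated, costs exactly one resolver call for that step.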

Ejectability

Generated scripts are standard code. You can clone the repo, run them locally, edit them, and check them into git. There is no proprietary runtime or code-level lock-in. If you leave Qodex, your tests come with you.

When to use it

  • You want regression coverage that runs in CI at standard test cost.
  • You want to read and edit the generated test before promoting it.
  • You want to keep tests in git as part of your normal code review process.

When not to use it

  • The thing you want to test is a one-shot exploration. Use chat; let the agent decide.
  • You need to share state across scripts. Each script is independent and re-establishes its own setup.

On the roadmap

Planned: deterministic UI replay runner. Removes the LLM from the cached UI replay path, bringing UI replays to API-level zero-LLM cost. See product.md.
Planned: SARIF and JUnit XML export for CI integrations. Today scripts emit JSON. SARIF unlocks GitHub Code Scanning; JUnit unlocks standard CI test reporters.

Scenarios

The structured spec a script implements.

Findings

What a failed script produces when the failure is a real bug.

Run tests

The execution surface for scripts.

GitHub sync

Push scenarios and scripts to a connected repo.