This workflow exercises context-eval using the bundled fixture repository and a fake local agent. It is deterministic enough for onboarding, and running the demo requires no real external coding agents, hosted services, provider credentials, or network access.
The fixture agent is examples/fixture-repo/scripts/example_agent.py. It reads
the rendered prompt file and applies a small local code change that the fixture
tests validate.
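To make the fake-agent idea concrete, here is a minimal sketch of an agent with the same shape: it reads a prompt file and writes one fixed change. It is not the contents of example_agent.py. The function name `run_fake_agent`, the target file `greeting.py`, and the edit itself are all hypothetical.

```python
from pathlib import Path


def run_fake_agent(prompt_path: Path, repo_root: Path) -> Path:
    """Read a rendered prompt and apply one fixed, local code change.

    Hypothetical sketch: the target file and the edit are invented for
    illustration and are not taken from example_agent.py.
    """
    prompt = prompt_path.read_text(encoding="utf-8")
    if not prompt.strip():
        raise ValueError(f"prompt file is empty: {prompt_path}")

    # Assumed target file. A real fixture agent would edit whatever file
    # its fixture tests actually check.
    target = repo_root / "greeting.py"
    target.write_text(
        'def greet(name):\n    return f"Hello, {name}!"\n',
        encoding="utf-8",
    )
    return target
```

Because the change is fixed and local, repeated runs should produce the same file contents, which is the property that makes a fake agent useful for onboarding.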
From the repository root:
python -m pip install -e ".[dev]"
python examples/fixture-repo/setup_fixture_repo.py
The setup script initializes examples/fixture-repo as a local Git repository
on main if it has not already been initialized.
Validate the example config:
context-eval validate-config --config examples/basic/context-eval.yaml
Use strict validation when you want local Git refs and filename-safe task IDs checked before a run:
context-eval validate-config --strict --config examples/basic/context-eval.yaml
These checks do not run the fake agent, run validation commands, install dependencies, or create run workspaces.
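As a rough illustration of what a filename-safe task ID check could look like, the sketch below accepts only letters, digits, dots, underscores, and hyphens, and rejects IDs that start with a dot or hyphen. This rule is an assumption made for the example; the rule that `--strict` actually applies may differ, and `is_filename_safe` is a hypothetical helper.

```python
import re

# Assumed rule, not taken from context-eval: the ID starts with a letter,
# digit, or underscore and continues with letters, digits, '.', '_', or '-'.
_SAFE_ID = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]*")


def is_filename_safe(task_id: str) -> bool:
    """Return True if task_id matches the assumed filename-safe pattern."""
    return _SAFE_ID.fullmatch(task_id) is not None
```

Under this assumed rule, an ID such as `fix-greeting` passes, while `a/b`, `..`, and the empty string are rejected.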
Preview the run plan with a dry run:
context-eval run --config examples/basic/context-eval.yaml --dry-run
The dry run shows the planned task × variant matrix without creating run artifacts.
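The matrix is the cross product of tasks and variants: every task is planned once per variant. A small sketch of that idea follows. The variant names come from the example config description; the task ID `fixture-task` and the helper `plan_cases` are hypothetical.

```python
from itertools import product


def plan_cases(tasks, variants):
    """Return every (task, variant) pair, in task-major order."""
    return list(product(tasks, variants))


if __name__ == "__main__":
    tasks = ["fixture-task"]  # hypothetical task ID
    variants = ["baseline", "experiment"]
    for task, variant in plan_cases(tasks, variants):
        print(f"{task} x {variant}")
```

With one task and two variants, the plan contains two cases.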
Then run the example for real:
context-eval run --config examples/basic/context-eval.yaml
The example config compares the baseline and experiment context overlays
against one fixture task. The command prints the created run directory under
.context-eval/runs/<run-id>.
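If the printed path has scrolled out of view, a helper like the one below can locate the most recently modified run directory. It is a sketch that assumes each run is a direct subdirectory of .context-eval/runs and that modification time is a reasonable proxy for recency; `latest_run_dir` is a hypothetical name, not part of context-eval.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(runs_root: Path = Path(".context-eval/runs")) -> Optional[Path]:
    """Return the most recently modified subdirectory of runs_root, if any."""
    if not runs_root.is_dir():
        return None
    run_dirs = [p for p in runs_root.iterdir() if p.is_dir()]
    # Assumption: the newest modification time identifies the latest run.
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)
```

The path printed by the run command remains the authoritative source; this helper is only a convenience.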
Generated run directories, exports, static UI files, retained workspaces, and
logs stay under local .context-eval/ paths in this demo. They are learning
artifacts and should not be committed to the repository.
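One way to guard against committing these artifacts is to confirm that .gitignore lists the directory. The sketch below does a literal line match only; it does not implement full gitignore pattern semantics, and `ignores_context_eval` is a hypothetical helper. Whether the repository already ships such an entry is not stated here.

```python
from pathlib import Path

# Literal spellings that would ignore the directory at the repository root.
_ENTRIES = {".context-eval", ".context-eval/", "/.context-eval", "/.context-eval/"}


def ignores_context_eval(gitignore: Path) -> bool:
    """Return True if a .gitignore line literally names .context-eval."""
    if not gitignore.is_file():
        return False
    lines = gitignore.read_text(encoding="utf-8").splitlines()
    return any(line.strip() in _ENTRIES for line in lines)
```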
Replace <run-id> with the directory printed by the run command:
context-eval inspect-run .context-eval/runs/<run-id>
context-eval compare .context-eval/runs/<run-id>
Start with compare for variant-level observations and risk signals, then open
case artifacts when validation or patch details need review.
Export a run summary as CSV or JSON:
context-eval export .context-eval/runs/<run-id> --format csv --output .context-eval/demo-summary.csv
context-eval export .context-eval/runs/<run-id> --format json --output .context-eval/demo-summary.json
Exports are derived from local run artifacts. Missing telemetry remains empty in
CSV and null in compact JSON.
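When reading the exports downstream, it helps to treat an empty CSV cell and a JSON null as "not recorded" rather than as zero. The sketch below counts empty cells per CSV column and lists the paths of null values in parsed JSON. It makes no assumption about the export's column or key names; both helper names are hypothetical.

```python
import csv


def missing_csv_cells(csv_path):
    """Count empty cells per column in a CSV file that has a header row."""
    counts = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for col, val in row.items():
                # Skip overflow fields (col is None); count "" and absent cells.
                if col is not None and not val:
                    counts[col] = counts.get(col, 0) + 1
    return counts


def null_json_paths(value, prefix=""):
    """Return dotted paths of every null (None) in a parsed JSON value."""
    paths = []
    if value is None:
        paths.append(prefix or "$")
    elif isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            paths.extend(null_json_paths(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            paths.extend(null_json_paths(child, f"{prefix}[{index}]"))
    return paths
```

For the JSON export, pass the result of `json.load` to `null_json_paths`. Keeping missing values distinct from zero avoids reading absent telemetry as a measured result.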
Generate the static UI:
context-eval ui --config examples/basic/context-eval.yaml --run-dir .context-eval/runs/<run-id> --output .context-eval/demo-ui.html
The static UI is a self-contained HTML export. It can inspect config and run artifacts, but it does not save files, run validation commands, or start agents.
Use the loopback local app only when you explicitly want browser-based save, preflight, run, log, result, and export workflows:
context-eval app --workspace . --config examples/basic/context-eval.yaml
The local app is separate from the static UI. It runs on loopback and uses local files and artifacts. Like the other commands, it does not install coding agents or target repository dependencies.
The fixture output is useful for learning the artifact shape. It is not a benchmark result. The observations apply only to the fixture repository, the fixture task, the local fake agent command, the selected variants, and the validation command in the example config.