This spec defines the accepted first-class model for running context-eval against Codex CLI, Claude Code, traecli, and custom local coding agents through noninteractive commands.
The runtime supports one command-template agent through the legacy
agent.command shape. The profile model keeps that behavior and expands it into named agent
profiles so users can run the same task, context variant, and trial matrix
against multiple local agents without copying entire config files.
A user can define one or more local agent profiles. Each profile names a coding agent and provides a noninteractive command template such as:
agents:
codex:
kind: "codex-cli"
command: "codex exec -C {workspace} - < {prompt_file}"
timeout_minutes: 60
claude:
kind: "claude-code"
command: "claude -p {prompt_file}"
timeout_minutes: 60
trae:
kind: "traecli"
command: "traecli -p \"{prompt}\""
timeout_minutes: 60
coco:
kind: "coco"
command: "coco -y --query-timeout 10m --bash-tool-timeout 5m -p \"{prompt}\""
timeout_minutes: 60
context-eval prepares the workspace, prompt file, context overlays, validation commands, and artifact directory. The selected coding agent command runs noninteractively from the prepared workspace and may modify files. context-eval captures stdout, stderr, patch, diff stats, validation results, and optional telemetry. It never commits automatically.
Existing configs with a single agent field remain valid. The old shape:
agent:
name: "myAgent"
command: "myAgent -p {prompt_file}"
is treated as a single implicit profile. The new agents map is additive and
becomes the preferred form when more than one agent is configured.
Configs must not set both a top-level agent and agents unless a later
migration spec defines explicit precedence. The first implementation should keep
the failure mode clear instead of guessing which shape wins.
Each profile has:
name: derived from the map key and written to result rows as agent_name.kind: required for agents profiles and set to codex-cli, claude-code,
traecli, coco, or custom; legacy agent configs default to custom.command: the noninteractive command template.timeout_minutes: command timeout for the coding agent process.network: recorded in results as today; still not real network isolation.prompt_template: optional path resolved relative to the config file.telemetry: optional collector config, starting with none and json-file.The kind field is a profile classification for validation, presets, and UI
copy. It does not install agents and does not imply a hosted provider
integration.
The command template supports the existing variables:
{workspace}{prompt}{prompt_file}{task_id}{variant}{output_dir}{telemetry_file} when the JSON telemetry collector prepares oneThe template validator must reject unknown variables before any agent command runs. The UI must show the rendered command preview for a representative case with sensitive values avoided or clearly local.
Commands must be noninteractive. A profile that requires login prompts, editor prompts, or manual approval is invalid for unattended evaluation until the user changes the local agent configuration outside context-eval.
Built-in presets help users start, but they remain editable templates:
codex-cli: a Codex CLI noninteractive command template.claude-code: a Claude Code noninteractive command template.traecli: a traecli noninteractive command template such as
traecli -p "{prompt}".coco: a Coco noninteractive command template such as
coco -y --query-timeout 10m --bash-tool-timeout 5m -p "{prompt}". This
allows unattended local edits and shell commands through Coco’s own CLI
permission mode. The preset is a user-editable agents.coco.command
template, not a permission grant from context-eval. Users who want narrower
approval can replace -y with explicit --allowed-tool flags.custom: a user-supplied command such as my-agent --prompt {prompt_file}.The preset layer must not assume every machine has the agent installed. Preflight checks should verify executable availability only when requested by the user or when the local app is running an explicit preflight.
CLI users can request the same side-effect-free executable availability check with:
context-eval validate-config --strict --check-agents --config context-eval.yaml
--check-agents checks the first executable token from each configured command
template, including profile-map entries such as agents.trae.command. It does
not run agent commands, install coding agents, validate credentials, or call
hosted services.
When multiple profiles are selected, the case matrix becomes:
agent x task x variant x trial
Every result row records agent_name, task_id, variant, and trial_index.
Artifacts remain case-local, including prompt, logs, patch, workspace, and
optional telemetry files. Row ordering should remain deterministic by selected
agent, task, variant, and trial unless a later spec changes the ordering.
The CLI runs all configured profiles by default. `context-eval run –agent