context-eval is a local-first Context A/B Testing Framework for coding agents. It compares context variants under controlled local conditions and records the resulting artifacts for engineering review.
Use these docs when you want to understand the product boundary, run the fixture demo, inspect the architecture, or prepare the repository documentation site.
Agent-facing context changes are easy to ship and hard to evaluate. A new
AGENTS.md, local documentation bundle, DeepWiki export, skill, or rule set can
look useful while still making real coding-agent tasks slower, less stable, or
less correct.
context-eval keeps that question local and inspectable: hold the repository, task, command template, trials, and validation commands steady, then change the context variant and review the recorded artifacts.
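As a rough illustration of that controlled setup, the sketch below plans one run per variant and trial while every other condition stays identical. It is not context-eval's actual API or configuration format: the class names, field names, paths, and commands are all hypothetical and chosen only to mirror the terms used above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FixedConditions:
    """Everything held steady across variants (hypothetical field names)."""
    repository: str
    task: str
    command_template: str
    validation_commands: tuple[str, ...]
    trials: int


@dataclass(frozen=True)
class PlannedRun:
    """One trial of one context variant under the shared conditions."""
    variant: str
    trial: int
    conditions: FixedConditions


def plan_runs(conditions: FixedConditions, variants: list[str]) -> list[PlannedRun]:
    """Pair every context variant with every trial index under identical conditions."""
    return [
        PlannedRun(variant=v, trial=t, conditions=conditions)
        for v, t in product(variants, range(1, conditions.trials + 1))
    ]


conditions = FixedConditions(
    repository="./fixture-repo",          # hypothetical path
    task="fix-failing-test",              # hypothetical task id
    command_template="agent run {task}",  # hypothetical command
    validation_commands=("pytest -q",),   # hypothetical validation command
    trials=3,
)
runs = plan_runs(conditions, ["baseline", "agents-md"])
for run in runs:
    print(run.variant, run.trial)
```

Because the conditions object is frozen and shared by every planned run, the context variant is the only thing that differs between runs of the same trial index.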
Review report.md, results.jsonl, run_manifest.json, logs, patches, exports, and optional UI output before drawing conclusions.
Read the docs in this order for a first pass:
Maintainers preparing a GitHub Pages project site should also read Pages setup.
context-eval is local-first. It compares context variants such as AGENTS.md,
local docs, DeepWiki exports, skills, and rules against explicit local tasks and
validation commands.
The outputs are local observations, not absolute model rankings. The validation confidence boundary comes from project validation commands and human review, not from patch size or an LLM judge alone. Reporting is artifact-only: completed reports, exports, terminal summaries, and the static UI read recorded local artifacts.
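To make the artifact-only idea concrete, here is a minimal sketch that summarizes recorded results per variant from JSON Lines text. The record keys (variant, trial, validation_passed) and the sample records are assumptions made for illustration; the real results.jsonl schema may differ, so check a recorded file before adapting this.

```python
import json
from collections import defaultdict

# Illustrative records only; a real results.jsonl may use different keys.
sample_lines = [
    '{"variant": "baseline", "trial": 1, "validation_passed": true}',
    '{"variant": "baseline", "trial": 2, "validation_passed": false}',
    '{"variant": "agents-md", "trial": 1, "validation_passed": true}',
    '{"variant": "agents-md", "trial": 2, "validation_passed": true}',
]


def pass_counts(lines):
    """Count validation passes and total trials per variant from JSON Lines text."""
    counts = defaultdict(lambda: {"passed": 0, "total": 0})
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        entry = counts[record["variant"]]
        entry["total"] += 1
        if record["validation_passed"]:
            entry["passed"] += 1
    return dict(counts)


summary = pass_counts(sample_lines)
for variant, c in sorted(summary.items()):
    print(f"{variant}: {c['passed']}/{c['total']} trials passed validation")
```

The summary only restates what was recorded locally. With this few trials it is an observation about one task on one machine, not a ranking.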
context-eval is not a leaderboard, hosted service, provider billing tool, credential manager, automatic agent installer, or automatic target-repository commit workflow. The static UI is offline and export-only. The local app is an explicit loopback mode that runs on the user’s machine.