docs(benchmarks): publish Deliverable 3 pilot benchmark (#126)
Establishes docs/reference/benchmarks/ as the home for published
benchmark runs. Adds the v0.1.0-beta.6 pilot (30 UIC documents,
175 pages, $12.99 total cost, 235 issues graded by severity) as the first
entry: report, manifest, aggregate summary, flat per-document CSV,
and 30 qualitative review notes.
Generalizes scripts/batch_run.py to accept either a manifest file
(reproducible corpora) or a directory glob (ad-hoc runs); drops
the hardcoded UIC-specific paths. Adds docs/how-to/run-the-benchmark.md
for reproduction.
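A minimal sketch of the manifest-or-glob input selection described above,
for illustration only. It is not the actual scripts/batch_run.py code: the
--manifest and --glob flag names, the resolve_inputs helper, and the
one-path-per-line manifest format are all assumptions.

```python
import argparse
import glob
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI that takes exactly one input source (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Batch-run input selection sketch")
    source = parser.add_mutually_exclusive_group(required=True)
    # Assumed flag: a manifest pins the exact corpus for reproducible runs.
    source.add_argument("--manifest", type=Path, help="file listing one document path per line")
    # Assumed flag: a glob pattern picks up whatever matches, for ad-hoc runs.
    source.add_argument("--glob", dest="pattern", help="glob pattern such as 'docs/*.pdf'")
    return parser


def resolve_inputs(manifest, pattern):
    """Return the documents to process from a manifest or a glob pattern."""
    if manifest is not None:
        # Assumed manifest format: one path per line, relative to the manifest;
        # blank lines and lines starting with '#' are skipped.
        base = manifest.parent
        lines = manifest.read_text(encoding="utf-8").splitlines()
        entries = [line.strip() for line in lines]
        return [base / entry for entry in entries if entry and not entry.startswith("#")]
    # Sort glob results so ad-hoc runs process files in a stable order.
    return sorted(Path(p) for p in glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    args = build_parser().parse_args()
    for path in resolve_inputs(args.manifest, args.pattern):
        print(path)
```

In this sketch the manifest keeps the listed order, so a published corpus
is processed identically on every run, while glob matches are sorted
because filesystem order is not guaranteed.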
Links the Phase 2 improvement roadmap to #82 rather than duplicating it.
Closes #80
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>