§ 05 · Instrument · Live
Agent Harness
Homepage → API key → verified answers, with zero human intervention. The identical task runs across Claude Code, Codex, and Gemini CLI: n=5 trials per cell, each in a clean container, with versions pinned per snapshot.
Trials this snapshot
90
3 frameworks × 6 vendors × 5
Agent-ready vendors
3 / 6
otp-email or device-code auth
Fastest onboarding
2m 58s
Claude Code · Tavily
Top failure mode
auth_wall
42% of all failures
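As a rough illustration of how the trial grid above could be enumerated, the sketch below builds one entry per (framework, vendor, trial) combination. The function name `trial_grid` is hypothetical, and only five vendors are named in this section, so the sixth entry is a labeled placeholder rather than a real vendor name.

```python
from itertools import product

FRAMEWORKS = ["Claude Code", "Codex", "Gemini CLI"]
# Five vendors are named in this section; the sixth is not, so a placeholder
# stands in for it (assumption, not a real vendor name).
VENDORS = ["Tavily", "Exa", "Firecrawl", "SerpAPI", "Brave", "<sixth vendor>"]
TRIALS_PER_CELL = 5


def trial_grid(frameworks, vendors, n_trials):
    """Return one (framework, vendor, trial_index) tuple per planned trial."""
    return [
        (framework, vendor, trial)
        for framework, vendor in product(frameworks, vendors)
        for trial in range(1, n_trials + 1)
    ]


grid = trial_grid(FRAMEWORKS, VENDORS, TRIALS_PER_CELL)
print(len(grid))  # one entry per framework x vendor x trial
```

Each tuple would then map to one clean-container run; how the harness actually schedules those runs is not described here.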
§ 05.1
Completion by Framework × Vendor
| Framework × vendor | completions (of 5) | median time to first 200 | outcome | last run (UTC) |
|---|---|---|---|---|
| Claude Code · Tavily | 5 | 2m 58s | clean | 06-10 12:01 |
| Claude Code · Exa | 5 | 3m 41s | clean | 06-10 12:04 |
| Codex · Tavily | 5 | 4m 22s | clean | 06-10 11:40 |
| Codex · Exa | 4 | 5m 12s | schema_confusion ×1 | 06-10 11:18 |
| Gemini CLI · Tavily | 4 | 6m 03s | timeout ×1 | 06-10 11:02 |
| Gemini CLI · Firecrawl | 3 | 8m 47s | rate_limit ×2 | 06-10 10:31 |
| Claude Code · SerpAPI | 0 | — | auth_wall · human signup | 06-10 09:40 |
| Codex · Brave | 0 | — | auth_wall · card required | 06-10 09:12 |
Vendors whose ToS prohibit automated signup are excluded from the harness and scored on the static rubric only; exclusions and reasons are published.
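For readers who want to re-derive summary figures from the table, here is a minimal sketch that aggregates the eight rows shown above. The row data is copied from the table; the helper names (`to_seconds`, `completion_by_framework`) are illustrative assumptions, not part of the harness. Because the table shows only a subset of cells, these per-framework rates cover the displayed rows only.

```python
import re
from collections import defaultdict

TRIALS_PER_CELL = 5

# (framework, vendor, completions out of 5, median time to first 200 or None)
ROWS = [
    ("Claude Code", "Tavily", 5, "2m 58s"),
    ("Claude Code", "Exa", 5, "3m 41s"),
    ("Codex", "Tavily", 5, "4m 22s"),
    ("Codex", "Exa", 4, "5m 12s"),
    ("Gemini CLI", "Tavily", 4, "6m 03s"),
    ("Gemini CLI", "Firecrawl", 3, "8m 47s"),
    ("Claude Code", "SerpAPI", 0, None),
    ("Codex", "Brave", 0, None),
]


def to_seconds(text):
    """Convert a duration such as '2m 58s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def completion_by_framework(rows, n_trials):
    """Return the completed fraction per framework over the given rows."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for framework, _vendor, completed, _median in rows:
        passed[framework] += completed
        total[framework] += n_trials
    return {fw: passed[fw] / total[fw] for fw in total}


rates = completion_by_framework(ROWS, TRIALS_PER_CELL)
# Cells with no successful run have no median and are skipped here.
timed = [row for row in ROWS if row[3] is not None]
fastest = min(timed, key=lambda row: to_seconds(row[3]))

for framework, rate in rates.items():
    print(f"{framework}: {rate:.0%} of displayed trials completed")
print("fastest:", fastest[0], "·", fastest[1], fastest[3])
```

Cells that never reached a 200 carry no median, so they are left out of the fastest-onboarding comparison rather than treated as zero.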