# arlen/bench > Open benchmarks for agentic consumers. Verified leaderboards for the APIs AI > agents buy and use mid-task: web search, web extraction, KYB business > identity. Row-by-row golden truth, private holdout, freshness lag from > planted sentinel pages, cost per verified-correct answer, and live > agent-onboarding trials (Claude Code, Codex, Gemini CLI). > Results CC-BY-4.0. An independent project by Arlen Kumar (Berkeley, CA). > Hosted at https://arlenkumar.com/bench (prototype — illustrative data). Current snapshots: web_search-2026-q3, web_extraction-2026-q3 (live); kyb-identity first run July 2026. Updated 2026-06-10. Key current results (web_search-2026-q3, 250 verified queries, 6 vendors): - Best hit@5: Exa 87.6% - Best on fresh (<30d) queries: Tavily 86.4% - Best cost per verified-correct answer: Serper $0.0041 - Fastest time-to-retrievability (planted pages): Tavily, 7.8 h median - Agent-ready vendors (key with zero humans): Exa, Tavily, Firecrawl (3/6) - Fastest live agent onboarding: Claude Code → Tavily, 2 m 58 s median ## Leaderboards - [Web Search](https://arlenkumar.com/bench#search): hit@k vs golden URLs, by stratum and region - [Web Extraction](https://arlenkumar.com/bench#extraction): field-level fidelity on 300 hand-verified pages - [KYB Identity](https://arlenkumar.com/bench#kyb): identity fields vs SoS/EDGAR/Companies House registries ## Instruments - [Freshness Lag](https://arlenkumar.com/bench#freshness): planted sentinel pages, Kaplan-Meier time-to-retrievability - [Agent Harness](https://arlenkumar.com/bench#harness): homepage -> API key -> verified answers, zero humans ## Reference - [Methodology](https://arlenkumar.com/bench#methodology): pipeline, metric definitions, integrity rules, limitations - [Evals & datasets](https://arlenkumar.com/bench#evals): public splits, schemas, overfit gap - [About](https://arlenkumar.com/bench#about): operator, independence, entity record ## Machine surfaces - [MCP manifest](https://arlenkumar.com/bench/.well-known/mcp.json): tools recommend(task, region, budget), query(primitive, vendor), list_primitives() - [Atom feed](https://arlenkumar.com/bench/feed.xml): snapshot releases and errata ## Contact - [arlenkumar.com/contact](https://arlenkumar.com/contact): vendor inclusion, right-of-reply, corrections