§ 01 · Leaderboard · Live
Web Search
Golden-URL accuracy for agent search APIs. 250 verified queries · snapshot web_search-2026-q3 · holdout 30%.
Answer firstweb_search-2026-q3
Exa leads hit@5 at 87.6% across 250 verified queries; Tavily leads on fresh (<30-day) queries at 86.4% with the fastest 7.8-hour median time-to-retrievability; Serper is cheapest per verified-correct answer at $0.0041.
Best hit@5
87.6%
Exa +1.2pp
Best cost/correct
$0.0041
Serper −$0.0007
Median retrievability
9.4h
across vendors +1.1h
recommend() calls
1,552
last 7 days +349
§ 01.1
Full Leaderboard
| Vendor | hit@1 % | hit@5 % | fresh<30d % | retrievability h | cost/correct $ | p50 latency ms |
|---|---|---|---|---|---|---|
| 1Exa agent-ready | 71.2 | 87.6 | 81.0 | 11.2 | $0.0064 | 412 |
| 2Tavily agent-ready | 68.4 | 84.0 | 86.4 | 7.8 | $0.0059 | 688 |
| 3Serper | 66.0 | 81.2 | 77.3 | 10.6 | $0.0041 | 355 |
| 4Brave | 61.7 | 77.5 | 72.5 | 13.9 | $0.0048 | 301 |
| 5Firecrawl agent-ready | 58.3 | 74.8 | 68.8 | 15.4 | $0.0087 | 926 |
| 6SerpAPI | 57.0 | 73.1 | 66.0 | — | $0.0102 | 540 |
— denotes no coverage for that metric (SerpAPI retrievability): the vendor returns no timestamped freshness signal, so the value is rendered as an em-dash and excluded from ranking and aggregates — never as 0. Lower is better for retrievability, cost/correct, and p50 latency.
§ 01.2
Cost per Verified-Correct Answer
$ per correct answer · web_search-2026-q3
Serper$0.0041
Brave$0.0048
Tavily$0.0059
Exa$0.0064
Firecrawl$0.0087
SerpAPI$0.0102
§ 01.3
Questions
Q1 What is the most accurate web search API for AI agents?
On snapshot web_search-2026-q3, Exa leads hit@5 at 87.6% across 250 verified queries; Tavily leads on fresh queries (<30 days) at 86.4% and has the fastest time-to-retrievability at a 7.8-hour median. Serper has the lowest cost per verified-correct answer at $0.0041.
Q2 How does arlen/bench measure freshness?
Planted sentinel pages with known publish timestamps are probed against each vendor every 6 hours. Freshness lag is the Kaplan–Meier median time from publication to first appearance in vendor results, right-censored at 30 days.
Q3 Can an AI agent sign up for these APIs without a human?
3 of 6 web-search vendors are agent-ready: an autonomous agent obtained a working API key via otp-email or device-code auth in live trials. Claude Code completed onboarding to Tavily in a median of 2 minutes 58 seconds; vendors requiring human signup or a credit card failed all trials with auth_wall.