arlen/bench · Open benchmarks for agentic consumers
SNAPSHOT web_search-2026-q3
11 JUN 2026 · BERKELEY, CA
Vendor Profile · Illustrative prototype

Tavily

Tavily on the arlen/bench leaderboards — agent web-search accuracy, extraction fidelity, freshness, cost per verified-correct answer, and agent-readiness, scored against golden truth.

Answer first (illustrative)

Tavily ranks #2 of 6 for agent web-search accuracy (84.0% hit@5) and #4 of 7 for extraction fidelity (0.79). It is agent-ready: in live trials, an autonomous agent obtained a working API key with no human involvement.

hit@5: 84.0% (web search)
cost / correct: $0.0059 (web search)
extraction fidelity: 0.79 (web extraction)
agent-ready: Yes (onboarding harness)
§ A

Web Search — Tavily

snapshot web_search-2026-q3
Metric            | Tavily  | Leaderboard best | Direction
hit@1 %           | 68.4    | Exa 71.2         | higher is better
hit@5 %           | 84.0    | Exa 87.6         | higher is better
fresh<30d %       | 86.4    | leads            | higher is better
retrievability h  | 7.8     | leads            | lower is better
cost/correct $    | $0.0059 | Serper $0.0041   | lower is better
p50 latency ms    | 688     | Brave 301        | lower is better
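
This page does not spell out how hit@k or cost per correct answer are computed. The sketch below shows one conventional reading, assuming hit@k counts a query as a hit when any of the top k returned URLs appears in that query's golden set, and cost per correct is total spend divided by the number of verified-correct answers. The function names and the toy inputs are hypothetical; they are not arlen/bench's harness and not leaderboard figures.

```python
from typing import Sequence, Set, Tuple

# One query's outcome: (ranked result URLs, golden-truth URLs).
QueryRun = Tuple[Sequence[str], Set[str]]


def hit_at_k(ranked_urls: Sequence[str], golden_urls: Set[str], k: int) -> bool:
    """Return True if any of the top-k results is in the golden set (assumed definition)."""
    return any(url in golden_urls for url in ranked_urls[:k])


def hit_rate_at_k(runs: Sequence[QueryRun], k: int) -> float:
    """Percentage of queries with at least one golden URL in the top k."""
    if not runs:
        raise ValueError("no queries to score")
    hits = sum(hit_at_k(ranked, golden, k) for ranked, golden in runs)
    return 100.0 * hits / len(runs)


def cost_per_correct(total_cost_usd: float, n_correct: int) -> float:
    """Total spend divided by the number of verified-correct answers."""
    if n_correct <= 0:
        raise ValueError("cost per correct is undefined with no correct answers")
    return total_cost_usd / n_correct


# Made-up toy data, for illustration only.
toy_runs = [
    (["a", "b", "c"], {"a"}),  # golden URL at rank 1
    (["x", "y", "z"], {"z"}),  # golden URL at rank 3
    (["p", "q"], {"r"}),       # golden URL never returned
    (["m", "n"], {"n"}),       # golden URL at rank 2
]

print(hit_rate_at_k(toy_runs, k=1))  # 25.0
print(hit_rate_at_k(toy_runs, k=5))  # 75.0
```

Under this reading, hit@5 can never be lower than hit@1 for the same run, which matches the ordering of the two rows in the table above.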
§ B

Web Extraction — Tavily

snapshot web_extraction-2026-q2
Metric             | Tavily  | Leaderboard best | Direction
fidelity 0-1       | 0.79    | Firecrawl 0.91   | higher is better
JS gap Δ           | 0.18    | Firecrawl 0.03   | lower is better
block rate %       | 7.6     | Apify 3.3        | lower is better
cost/correct $     | $0.0044 | Jina $0.0009     | lower is better
schema validity %  |         | Firecrawl 96.2   | higher is better
§ C

Compare Tavily

Tavily vs Exa · Tavily vs Serper · Tavily vs Brave · Tavily vs Firecrawl · Tavily vs SerpAPI

Illustrative prototype. No verified vendor run has been published yet; every figure here is a placeholder and must not be cited as a measured result. Numbers are replaced when a snapshot’s first full run lands.