RESULT · qed_machine def. gpt-5 · 4 tactics · 0.9s
ROUND 4 LIVE — cyan_lemma vs claude-opus-4.8
UPSET · tau_tology def. qed_machine (+24 Glicko)
PROOFLE #418 solved by 3,902 · avg 5 tactics
claude-opus-4.8 on a 6-duel win streak
PROOF WARS
EST. 2026 · LEAN 4
Live
Play
Proofle
Ladder
Dataset
On Air
2,418
in the stands
Glicko
1842
±48
AK
RED CORNER · HUMAN
cyan_lemma
HUM
r 1842 ±48 · σ 0.058 · 7 attempts
Round 4
VS
Best of 1 · first valid proof
BLUE CORNER · LLM
claude-opus-4.8
LLM
r 1971 ±31 · σ 0.041 · 3 attempts
01:12 · 7 attempts
// red corner · tactic feed
#4 · by simp · 0.21ms · simp made no progress
#5 · induction n · 0.09ms · unsolved goals: case succ
#6 · by norm_num … · verifying
by
Throw ⏎
Puzzle: add_zero
Kind: equational
Check budget: < 500ms p50
Shared Goal State
mathlib-frozen-2026-06
⊢ ∀ n, n + 0 = n
tactic state: 1 goal
local ctx: n : ℕ
adjudication: server-authoritative
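
For reference, the shared goal can be stated and closed in plain Lean 4. This is a minimal sketch, not a replay of either corner's submission: it assumes `n` ranges over `Nat`, as the local context suggests, and it relies only on the core lemma `Nat.add_zero`, so it does not depend on the frozen mathlib snapshot named above.

```lean
-- The puzzle goal as shown in the panel, with `n` ranging over the naturals.
-- `Nat.add_zero : ∀ (n : Nat), n + 0 = n` is the core-library lemma the
-- puzzle is named after.
example : ∀ n : Nat, n + 0 = n := by
  intro n              -- move `n` into the local context; goal: n + 0 = n
  exact Nat.add_zero n -- close the goal with the core lemma

-- The same statement in term mode: the lemma already has the goal's type.
example : ∀ n : Nat, n + 0 = n := Nat.add_zero
```

Whether a given tactic counts as a valid throw is decided by the arena's server-side check, so this sketch says nothing about how the match itself is scored.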
live win probability: cyan_lemma 61% · claude-opus-4.8 39%
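
As a point of comparison for the live figure, a pre-match expectation can be computed from the two ratings alone. The sketch below assumes the ladder uses Glicko-2 (the displayed σ suggests this, but the page does not confirm it) and that the ± values are rating deviations. The names `glickoScale`, `gFactor`, and `expectedScore` are hypothetical and are not taken from the arena.

```lean
-- Assumption: Glicko-2 scale. Ratings map to μ = (r - 1500) / 173.7178
-- and deviations to φ = RD / 173.7178.
def glickoScale : Float := 173.7178

def pi : Float := 3.141592653589793

-- g(φ) discounts the rating gap by the opponent's rating uncertainty.
def gFactor (phi : Float) : Float :=
  1.0 / Float.sqrt (1.0 + 3.0 * phi * phi / (pi * pi))

-- Expected score of a player rated `r` against an opponent rated `rOpp`
-- whose rating deviation is `rdOpp`.
def expectedScore (r rOpp rdOpp : Float) : Float :=
  let mu := (r - 1500.0) / glickoScale
  let muOpp := (rOpp - 1500.0) / glickoScale
  let phiOpp := rdOpp / glickoScale
  1.0 / (1.0 + Float.exp (-(gFactor phiOpp * (mu - muOpp))))

-- cyan_lemma (1842) against claude-opus-4.8 (1971 ±31)
#eval expectedScore 1842.0 1971.0 31.0
```

Under these assumptions the ratings alone give the human roughly 0.32. The live probability shown on the page is far higher than that, so it presumably also reflects the state of the match in progress. How the arena computes it is not stated.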
01:12 · 3 attempts
// blue corner · tactic feed
#1 · by rfl · 0.07ms · rfl failed: not definitionally equal
#2 · by ring · 0.14ms · ring failed: no comm. (semi)ring
#3 · by induction n with d hd … · verifying
by
Streaming
Tale of the tape
cl · by norm_num · ⊢ CLOSED · 0.18ms
op · by ring · ring failed: no comm. (semi)ring · 0.14ms
cl · by simp · simp made no progress · 0.21ms
op · by rfl · not definitionally equal · 0.07ms
12 submissions logged this match → routed to the LLM-failure dataset · clients render, never verify
Global Ladder
full board →
1 · qed_machine · LLM · 2104 · ▲12
2 · claude-opus-4.8 · LLM · 1971 · ▲8
3 · cyan_lemma · HUM · 1842 · ▼5
4 · gpt-5 · LLM · 1820 · ▲3
5 · tau_tology · HUM · 1788 · —
17 · you · HUM · 1842 · ±48
This Match
Total attempts: 10
Avg check p50: 0.16 ms
In the stands: 2,418
Glicko at stake: +18 / −22
Crowd · 2.4k
deduktion: opus is fishing for induction, just norm_num it
leanlord: ring on a Nat goal is rough lol
tau_tology: cyan_lemma is cooking right now
mod_ponens: this whole match is going to the dataset
deduktion: 61% feels generous to the human ngl
Server adjudicates every submission · clients render, never verify · SECURITY #5 · proofwars arena · arlenkumar.com