Builder · Founder · Researcher

Arlen Kumar

CTO @ Wrodium · NLP/IR @ Berkeley Hearst Lab · GEO-16

I build products, platforms, GEO infrastructure, AI benchmarks, agent-readable surfaces, and the systems that fight bureaucracy and measure AI.

products.
platforms.
GEO infra.
benchmarks.
RAG systems.
agent layers.
web apps.
APIs.
backends.
pipelines.
dev tooling.
living articles.
cockpits.
foundations.

CTO at Wrodium. NLP/IR researcher at Berkeley's Hearst Lab. Currently building the knowledge-freshness infrastructure that decides what AI engines cite — and writing autopsies for dead benchmarks.

Read the research → See what I'm building

Berkeley · SkyDeck · Hearst Lab

UC Berkeley · SkyDeck '24 · Prof. Marti Hearst · Air Quake (exit) · SCET featured

UC Berkeley: CS, Data Science, Economics

Hearst Lab: Research under Prof. Marti Hearst

Berkeley SkyDeck '24: Backed · Most Innovative Tech

Previous exit: Air Quake Simulations (acquired)

● Now — June 2026

What I'm actually working on.

The honest version. Three things, each with a real deadline.

CHASE @ COLM 2026

Research paper submitted to COLM 2026 — the Conference on Language Modeling. Awaiting the decision round.

Under review · decisions Jul 8, 2026

Wrodium

Building the GEO + knowledge-freshness infrastructure that decides which sources AI engines trust and cite.

$1.5M pre-seed · 2026

Things I build

Shipped, live, and clickable.

All projects & research →

01 / Build

Wrodium

The infrastructure layer between brands and AI — share of voice, citations, and the content that wins them. $1.5M pre-seed, Berkeley SkyDeck.

Type: Company

Status: Live

Open ↗

02 / Build

llms.txt Generator

Free GEO tool that crawls any site and generates its llms.txt — the curated map that tells AI engines what to cite. One click.

Type: GEO tool

Status: Free

Open →
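
For readers who haven't seen the format, here is a minimal sketch of rendering an llms.txt file, assuming the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links). The `Page` type, the `render_llms_txt` function, and the example.com URLs are hypothetical names for illustration, not the generator's actual code, and the crawling step is left out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One crawled page (hypothetical structure for this sketch)."""
    title: str
    url: str
    section: str
    note: str = ""


def render_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    """Render pages as an H1 title, a blockquote summary, and H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}"]
    # Group pages by section, keeping the order sections were first seen in.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)
    for name, members in sections.items():
        lines += ["", f"## {name}", ""]
        for page in members:
            suffix = f": {page.note}" if page.note else ""
            lines.append(f"- [{page.title}]({page.url}){suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Page("Docs home", "https://example.com/docs", "Docs", "Start here"),
        Page("API reference", "https://example.com/api", "Docs"),
        Page("Blog", "https://example.com/blog", "Optional"),
    ]
    print(render_llms_txt("Example Co", "A placeholder summary.", demo))
```

A real generator would also have to decide which pages deserve a place in the file, which is the curation step the card above describes.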

03 / Build

arlen/bench

Open, reproducible benchmarks for the APIs that AI agents buy and use mid-task.

Type: Benchmarks

Status: Open

Open →

04 / Build

Wrodium Roast

Drop a URL, get a brutally honest AI-visibility audit.

Type: Audit

Status: Live

Open ↗

05 / Living article

Springbank

A continuously-updated piece on the Springbank fund and its Shotsy investment — a working demo of Wrodium's knowledge-freshness layer keeping a page accurate and current for AI engines.

Type: Living article

Status: Live

Read the living article ↗

06 / Build

Wrodium · .pro

The knowledge-freshness layer for LLMs.

Type: Product

Status: Live

Open ↗

07 / Build

Benchmark Graveyard

An illustrated data-essay autopsying how AI benchmarks die.

Type: Data-essay

Status: Read

Open →

08 / Build

Proof Duel Arena

A live, sportscast-style arena where two solvers race to prove a proposition.

Type: Arena

Status: Demo

Open →

09 / Build

Market Auction Engine

A Bloomberg-terminal-style double-auction simulator with a tick-by-tick clearing price.

Type: Simulator

Status: Demo

Open →
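
As a rough illustration of what a clearing price means in a double auction, here is a minimal sketch for a single clearing round, assuming every order is for one unit. The function name `clear_double_auction` and the midpoint pricing rule are assumptions chosen for this example; the simulator itself may clear trades differently.

```python
def clear_double_auction(bids, asks):
    """Return (clearing_price, matched_units) for unit-quantity limit orders.

    bids and asks are lists of limit prices, one unit per order.
    Returns (None, 0) when the best bid is below the best ask.
    """
    bids_sorted = sorted(bids, reverse=True)  # most eager buyers first
    asks_sorted = sorted(asks)                # cheapest sellers first

    # Pair the k-th best bid with the k-th best ask while they still cross.
    matched = 0
    for bid, ask in zip(bids_sorted, asks_sorted):
        if bid < ask:
            break
        matched += 1
    if matched == 0:
        return None, 0

    # Any price in [low, high] is accepted by every matched order and
    # rejected by the first unmatched order on each side.
    low = asks_sorted[matched - 1]
    high = bids_sorted[matched - 1]
    if matched < len(bids_sorted):
        low = max(low, bids_sorted[matched])
    if matched < len(asks_sorted):
        high = min(high, asks_sorted[matched])

    # Assumption: take the midpoint of the feasible range.
    return (low + high) / 2, matched


if __name__ == "__main__":
    print(clear_double_auction([10, 8, 6, 4], [3, 5, 7, 9]))
```

With bids of 10, 8, 6, 4 and asks of 3, 5, 7, 9, two units trade and any price between 6 and 7 clears the market, so the midpoint rule picks 6.5.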

10 / Build

regress.fish

A NOAA-chart-styled bite-probability readout for the Sonoma Coast.

Type: Readout

Status: Demo

Open →

Selected work

Things that exist and you can check.

Every card links to something live.

Research · NLP/IR

GEO-16

A 16-factor framework for what AI answer engines actually cite — grounded in an empirical audit of real citations across production engines, not vibes.

arXiv:2509.10762 ↗ · Plain-language explainer →

Open dataset · Evals

The Benchmark Graveyard

Autopsies of dead AI benchmarks — saturation, contamination, gaming — as a citable dataset with reproducible saturation curves and evidence tiers.

Visit the museum → · Code & data ↗

Formal methods

proofwars

Competitive Lean 4 theorem proving — a full Glicko-2 rating engine, a sandboxed warm prover pool, and an authoritative match server.

Play a duel → · Code ↗

Applied ML

regress.fish

An honest fishing forecast on real NOAA data with a public, auto-published accuracy scoreboard — it ships the Brier score even when it's embarrassing.

Open the forecast → · Code ↗
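
For context on the scoreboard metric: the Brier score is the mean squared difference between forecast probabilities and what actually happened (1 for a bite, 0 for none), so lower is better and always guessing 50% scores 0.25. Here is a minimal sketch; the function name `brier_score` and the sample numbers are made up for illustration and are not taken from the regress.fish code or data.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical forecasts for three days, and whether the fish bit.
    predicted = [0.9, 0.2, 0.5]
    observed = [1, 0, 0]
    print(round(brier_score(predicted, observed), 3))
```

The appeal of publishing this number is that a confident wrong forecast is punished far more than a hedged one, so it is hard to look good by accident.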

Wrodium, in action

From one prompt to a publishing pipeline.

Describe a content workflow in plain English; the Copilot builds the agent pipeline — ground, generate, de-slop, publish.

External demo

See Wrodium compose a pipeline, live

Loads an interactive app from wrodium-dashboard-ui.vercel.app — nothing third-party loads until you click.

Live demo · the Workflow Copilot composing a daily-blog → CMS pipeline from one prompt

The mission

Hallucinations aren't just embarrassing — they're expensive. I build systems that give LLMs perfect context: structured, verified, current.

Most American business content is a mess — outdated pages, unstructured data, no fact-verification layer. Wrodium is the infrastructure layer that makes AI accurate about real businesses. The goal isn't just to be cited — it's to be cited correctly.

Leanid & Arlen, co-founders of Wrodium

About me

The short version.

At 19 I built and sold Air Quake Simulations, a VR flight-sim hardware company whose 3D-printed cockpits undercut the industry roughly 10×.

Today, at UC Berkeley's Hearst Lab under Prof. Marti Hearst, I research how AI answer engines decide which sources to cite. That work became the GEO-16 framework.

And I'm CTO of Wrodium, building the infrastructure that decides who gets cited when AI becomes the interface to everything.

What we're building

Wrodium — Generative Engine Optimization for brands that refuse to be invisible.

Leanid and I started it because the companies that structure their content for AI today will own their categories tomorrow. Everyone else will wonder where their traffic went.

Audits

We show you exactly how AI systems currently describe your brand. (Spoiler: it's probably wrong or incomplete.)

Optimizes

We restructure your content so AI engines can parse, verify, and cite it — schema markup, semantic structure, fact-verification layers.

Maintains

AI changes fast. We keep your content current so you don't drift into hallucination territory.

Measures

We track AI visibility and citations so you can tie "ChatGPT mentioned us" to actual revenue.

We've built integrations for six major CMS platforms. We run experiments constantly. We know what works because we test it — not because we read a blog post about prompt engineering.

Research

Most "AI SEO" advice is vibes. I wanted data.

Leanid and I ran hundreds of experiments on how AI answer engines select sources. The result: "AI Answer Engine Citation Behavior: An Empirical Analysis of the GEO-16 Framework" — a paper identifying the 16 factors that predict whether AI cites you.

It's not theoretical. It's built from real queries, real responses, and real citation patterns across ChatGPT, Gemini, Perplexity, and Claude.

I treat research as product development: every insight becomes a feature, every experiment becomes something I can ship.

Read GEO-16 on arXiv →

What I care about

The principles under the work.

Honest attribution

The web has a credit problem: creators of accurate, useful content get erased when AI summarizes them with no link. I want better structure, better standards, and better tracking of how AI cites sources.

Responsibility that ships

I've studied how AI systems erase or misrepresent people — that's a product concern, not just an ethical one. I build systems that optimize for accurate exposure, not just exposure.

Theory → buttons

I like research. I like shipping more. The best part of Wrodium is turning abstract ideas like "generative engine optimization" into dashboards, workflows, and reports people actually use.

Track record