arXiv:2509.10762 · First author: UC Berkeley · Hearst Lab · CC BY 4.0

GEO-16: what makes AI search engines cite a page.

A 16-pillar benchmark for Generative Engine Optimization. We audited 18,635 AI citations from production AI search engines to learn which properties of a web page actually predict whether a model cites it — not whether it ranks.

TL;DR

AI search engines don't rank ten blue links — they read a handful of sources and cite a few. The unit of competition is the citation, not the rank. GEO-16 turns "be citable" into a measurable rubric: 16 pillars of page quality, each scored, combining into an overall citation-worthiness score G ∈ [0, 1].

Empirically, the pillars tied to metadata & freshness, semantic HTML, and structured data showed the strongest association with being cited, and overall page quality was itself a strong predictor of citation. In short: machine-readable, well-structured, and recently maintained pages get cited more.
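As one illustration of what a machine-readable, recently maintained page can expose, the sketch below builds a schema.org Article description as JSON-LD with explicit datePublished and dateModified fields. The property names are standard schema.org vocabulary; the helper name article_jsonld, the choice of fields, and all sample values are assumptions made for illustration, not the GEO-16 rubric.

```python
import json
from datetime import date


def article_jsonld(headline, url, authors, published, modified):
    """Build a schema.org Article description as a JSON-LD string.

    Hypothetical helper for illustration; the field selection is an
    assumption, not the GEO-16 scoring rubric.
    """
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    doc = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(doc, indent=2)


# Placeholder values only; the output would be embedded in a
# <script type="application/ld+json"> element in the page head.
snippet = article_jsonld(
    headline="Example guide",
    url="https://example.com/guide",
    authors=["Jane Doe"],
    published=date(2025, 1, 15),
    modified=date(2025, 6, 1),
)
print(snippet)
```

Keeping dateModified accurate whenever the page content changes is the part that speaks to freshness; the rest resolves what the page is and who wrote it.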

The framework

16 pillars, one score.

GEO-16 scores a page across sixteen pillars that group into a few families. The pillars marked below carried the strongest empirical association with citation in our audit — but the full, exact rubric and per-pillar weights live in the paper.

Metadata & freshness (strongest signal)
Recency markers, dated and versioned claims, last-modified signals, and maintenance cadence. Stale pages get displaced.

Semantic HTML (strongest signal)
Clean heading hierarchy, real landmarks, and answer-first prose a model can extract in one window.

Structured data (strongest signal)
schema.org / JSON-LD that resolves the entity and makes claims machine-checkable.

Content quality & specificity (pillar family)
Concrete statistics, direct claims, and quotable statements over hedged generalities.

Authority & trust (pillar family)
Attribution, provenance, and signals a model uses to decide a source is safe to repeat.

Machine accessibility (pillar family)
Crawlability for AI agents, robots/sitemap hygiene, and surfaces like llms.txt.

The six families above organize the sixteen pillars for readability; for the precise pillar list, scoring method, and weights, read the paper (§ methodology).
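The paper defines the actual pillar list, scoring scale, and weights. Purely to illustrate how sixteen per-pillar scores could combine into a single G in [0, 1], here is a minimal sketch that assumes each pillar is already scored in [0, 1] and takes a weighted mean. The equal default weights, the function name geo_score, and the pillar keys p01 to p16 are all hypothetical.

```python
def geo_score(pillar_scores, weights=None):
    """Combine per-pillar scores in [0, 1] into one score G in [0, 1].

    Assumption: a weighted arithmetic mean, equal weights by default.
    The real GEO-16 aggregation and weights are defined in the paper.
    """
    if not pillar_scores:
        raise ValueError("need at least one pillar score")
    if weights is None:
        weights = {name: 1.0 for name in pillar_scores}
    for name, score in pillar_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]")
        if weights[name] < 0:
            raise ValueError(f"weight for {name!r} is negative")
    total_weight = sum(weights[name] for name in pillar_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[name] * score for name, score in pillar_scores.items())
    return weighted / total_weight


# Hypothetical pillar keys: the first eight score 1.0, the rest 0.5.
scores = {f"p{i:02d}": (1.0 if i <= 8 else 0.5) for i in range(1, 17)}
g = geo_score(scores)
print(g)  # 0.75
```

With non-negative weights, a weighted mean of values in [0, 1] cannot leave that interval, which is why this form is a convenient stand-in for a bounded score.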

What we found

Key findings.

18,635
AI citations audited across production generative search engines — the empirical base for the pillar associations, not a lab simulation.
3 pillars
Metadata & Freshness, Semantic HTML, and Structured Data showed the strongest associations with citation. The web's machine-readable layer matters as much as the prose.
G score
Overall page quality — the combined G score — was itself a strong predictor of citation. Citation-worthiness is holistic, not a single trick.
Freshness
Recency markers and concrete statistics surfaced repeatedly as top predictors, which is the basis for the "knowledge freshness" infrastructure thesis behind Wrodium.

Why it matters

From SEO to GEO.

Classic SEO optimizes for ranking in a list of links. Generative engines collapse that list into an answer and cite a small set of sources. That changes the objective: instead of "rank #1," the goal is "be the source the model trusts enough to quote." GEO-16 is an attempt to make that objective auditable and explainable rather than folklore — a rubric you can score a page against and act on.

GEO-16 is also useful for evaluation: because each pillar is defined and the citation outcome is observed, it doubles as a measurement harness for tracking citation behavior across engines over time.
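At its simplest, a harness of this kind might log one record per audited page (engine, G score, cited or not) and compare citation rates above and below a G threshold for each engine. The sketch below assumes that record shape; the Observation type, the 0.7 threshold, the engine names, and the sample rows are made-up placeholders, not data or methodology from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    engine: str   # which generative engine produced the answer
    g: float      # page score in [0, 1]
    cited: bool   # whether the engine cited the page


def citation_rates(observations, threshold=0.7):
    """Per engine, return (rate for g >= threshold, rate for g < threshold).

    A rate is None when its bucket has no observations.
    """
    # Each bucket holds [cited_count, total_count].
    counts = defaultdict(lambda: {"high": [0, 0], "low": [0, 0]})
    for obs in observations:
        bucket = "high" if obs.g >= threshold else "low"
        counts[obs.engine][bucket][0] += int(obs.cited)
        counts[obs.engine][bucket][1] += 1

    def rate(pair):
        cited, total = pair
        return cited / total if total else None

    return {e: (rate(c["high"]), rate(c["low"])) for e, c in counts.items()}


# Placeholder rows for illustration only.
sample = [
    Observation("engine_a", 0.9, True),
    Observation("engine_a", 0.8, False),
    Observation("engine_a", 0.3, False),
    Observation("engine_b", 0.75, True),
]
rates = citation_rates(sample)
print(rates)
```

Re-running the same comparison on snapshots taken at different dates is one way the "over time" part could be tracked.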

Read & cite

The paper.

GEO-16: A Benchmark for Generative Engine Optimization — Arlen Kumar & Leanid Palkhouski, 2025. arXiv:2509.10762, CC BY 4.0.

@article{kumar2025geo16,
  title   = {GEO-16: A Benchmark for Generative Engine Optimization},
  author  = {Kumar, Arlen and Palkhouski, Leanid},
  journal = {arXiv preprint arXiv:2509.10762},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.10762}
}

Code and dataset release in progress. This page will link to the runnable artifacts as they are published.