GEO-16 — A Benchmark for How AI Search Engines Cite Sources

TL;DR

AI search engines don't rank ten blue links; they read a handful of sources and cite a few. The unit of competition is the citation, not the rank. GEO-16 turns "be citable" into a measurable rubric: 16 pillars of page quality, each scored and combined into an overall citation-worthiness score G ∈ [0, 1].

Empirically, the pillars tied to metadata & freshness, semantic HTML, and structured data showed the strongest association with being cited, and overall page quality was itself a strong predictor of citation. In short: machine-readable, well-structured, and recently maintained pages get cited more.

The framework

16 pillars, one score.

GEO-16 scores a page across sixteen pillars that group into a few families. The entries labeled "Strongest signal" below carried the strongest empirical association with citation in our audit; the full, exact rubric and per-pillar weights live in the paper.
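
To make "sixteen pillar scores, one G" concrete, here is a minimal sketch of one possible aggregation. It assumes each pillar is scored on a 0 to 3 scale and that all pillars are weighted equally; both choices are illustrative assumptions, not the paper's rubric, and `geo_score`, `MAX_PILLAR_SCORE`, and the placeholder pillar names are hypothetical.

```python
from typing import Mapping

MAX_PILLAR_SCORE = 3  # assumed per-pillar scale (0..3); the paper defines the real one


def geo_score(pillar_scores: Mapping[str, int], n_pillars: int = 16) -> float:
    """Return an illustrative G in [0, 1]: total pillar score over the maximum possible."""
    if len(pillar_scores) != n_pillars:
        raise ValueError(f"expected {n_pillars} pillar scores, got {len(pillar_scores)}")
    for name, score in pillar_scores.items():
        if not 0 <= score <= MAX_PILLAR_SCORE:
            raise ValueError(f"score for {name!r} out of range: {score}")
    # Equal weighting is an assumption made for this sketch only.
    return sum(pillar_scores.values()) / (n_pillars * MAX_PILLAR_SCORE)


# Hypothetical page: placeholder pillar names, every pillar scored 2 out of 3.
example = {f"pillar_{i}": 2 for i in range(1, 17)}
print(round(geo_score(example), 3))  # 0.667
```

Swapping in the paper's actual scale and weights would only change the constants and the final line of `geo_score`; the shape of the computation stays the same.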

Strongest signal

Metadata & freshness

Recency markers, dated and versioned claims, last-modified signals, and maintenance cadence. Stale pages get displaced.

Strongest signal

Semantic HTML

Clean heading hierarchy, real landmarks, answer-first prose a model can extract in one window.
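
One way to check the "clean heading hierarchy" part of this pillar is a small audit with Python's standard `html.parser`. The sketch below is not the GEO-16 scorer; the two rules it applies (exactly one `h1`, no skipped levels when descending) and the names `HeadingAudit` and `heading_issues` are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from typing import List


class HeadingAudit(HTMLParser):
    """Collect heading levels (h1..h6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "h1".."h6" is enough to match.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> List[str]:
    """Return human-readable problems found in the heading outline."""
    parser = HeadingAudit()
    parser.feed(html)
    levels = parser.levels
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one h1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # e.g. h1 followed directly by h3
            issues.append(f"heading jumps from h{prev} to h{cur}")
    return issues


print(heading_issues("<h1>Guide</h1><h3>Details</h3>"))
# ['heading jumps from h1 to h3']
```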

Strongest signal

Structured data

schema.org / JSON-LD that resolves the entity and makes claims machine-checkable.
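
As an illustration of what such markup can look like, the sketch below builds a minimal schema.org `Article` object and wraps it in a JSON-LD script tag. The property names (`headline`, `author`, `datePublished`, `dateModified`) are standard schema.org vocabulary; the helper `article_jsonld` and all values are hypothetical, and nothing here implies these specific fields are what GEO-16 scores.

```python
import json
from datetime import date


def article_jsonld(headline: str, author: str, published: date, modified: date) -> str:
    """Serialize a minimal schema.org Article as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        # ISO 8601 dates give machines an unambiguous recency signal.
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
snippet = article_jsonld("Example headline", "Jane Doe", date(2025, 1, 15), date(2025, 6, 1))
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

Keeping `dateModified` in the same object as the entity description ties this pillar to the metadata and freshness one: a single block states both what the page is and when it was last maintained.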

Pillar family

Content quality & specificity

Concrete statistics, direct claims, and quotable statements over hedged generalities.

Pillar family

Authority & trust

Attribution, provenance, and signals a model uses to decide a source is safe to repeat.

Pillar family

Machine accessibility

Crawlability for AI agents, robots/sitemap hygiene, and surfaces like llms.txt.
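
A quick robots hygiene check can be sketched with the standard library's `urllib.robotparser`. The robots.txt content, the URLs, and the helper `agent_can_fetch` below are made up for illustration, and `GPTBot` is used only as an example of an AI crawler user agent; this is an assumption about how one might test crawlability, not the audit's method.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is kept out of /private/,
# every other agent is allowed everywhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow:
"""


def agent_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(agent_can_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/private/report"))  # False
print(agent_can_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post"))       # True
```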

The six entries above organize the sixteen pillars for readability; for the precise pillar list, scoring method, and weights, read the paper's methodology section.

What we found

Key findings.

Field data

A large-scale audit of real AI citations across production generative search engines — the empirical base for the pillar associations, not a lab simulation.

3 pillars

Metadata & Freshness, Semantic HTML, and Structured Data showed the strongest associations with citation. The web's machine-readable layer matters as much as the prose.

G score

Overall page quality — the combined G score — was itself a strong predictor of citation. Citation-worthiness is holistic, not a single trick.

Freshness

Recency markers and concrete statistics surfaced repeatedly as top predictors — the basis for the "knowledge freshness" infrastructure thesis behind Wrodium.

Why it matters

From SEO to GEO.

Classic SEO optimizes for ranking in a list of links. Generative engines collapse that list into an answer and cite a small set of sources. That changes the objective: instead of "rank #1," the goal is "be the source the model trusts enough to quote." GEO-16 is an attempt to make that objective auditable and explainable rather than folklore — a rubric you can score a page against and act on.

It is also useful for evaluation: because each pillar is defined and the citation outcome is observed, GEO-16 doubles as a measurement harness for tracking citation behavior across engines over time.
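
The simplest version of such a harness pairs each page's G score with whether it was cited, then compares citation rates above and below a cutoff. The sketch below shows that comparison under stated assumptions: the `Observation` type, the `citation_rates` helper, the 0.6 cutoff, and every data point are invented for illustration and are not results from the audit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Observation:
    g: float      # page's overall G score in [0, 1]
    cited: bool   # whether an engine cited the page


def citation_rates(observations: Iterable[Observation], threshold: float) -> Tuple[float, float]:
    """Return (citation rate for g >= threshold, citation rate for g < threshold)."""
    obs = list(observations)
    high = [o.cited for o in obs if o.g >= threshold]
    low = [o.cited for o in obs if o.g < threshold]

    def rate(flags: List[bool]) -> float:
        # NaN marks an empty group instead of raising ZeroDivisionError.
        return sum(flags) / len(flags) if flags else float("nan")

    return rate(high), rate(low)


# Toy data, invented for this example; not measurements.
toy = [
    Observation(0.90, True), Observation(0.80, True),
    Observation(0.75, True), Observation(0.70, False),
    Observation(0.50, True), Observation(0.40, False),
    Observation(0.30, False), Observation(0.20, False),
]
print(citation_rates(toy, threshold=0.6))  # (0.75, 0.25)
```

Running the same comparison per engine and per time window is what would turn this from a one-off check into a way to track citation behavior over time.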

Read & cite

The paper.

GEO-16: A Benchmark for Generative Engine Optimization — Arlen Kumar & Leanid Palkhouski, 2025. arXiv:2509.10762, CC BY 4.0.

@article{kumar2025geo16,
  title   = {GEO-16: A Benchmark for Generative Engine Optimization},
  author  = {Kumar, Arlen and Palkhouski, Leanid},
  journal = {arXiv preprint arXiv:2509.10762},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.10762}
}

Code and dataset release is in progress; this page will link to the runnable artifacts as they are published.