
How LLMs decide what to cite (and how to be in that set)

Citations look magical from the outside. They're not. Here's what actually drives whether ChatGPT, Claude, and Perplexity link to your site — and the editorial moves that consistently work.

May 22, 2026 · Updated May 26, 2026 · b/cited · AEO

Every AI answer engine works roughly the same way:

1. Retrieval: for each query, the model (or a search subsystem) pulls candidate sources from a search index. This is the closest analog to traditional SEO ranking; pages that rank well organically tend to surface here.
2. Selection: from the candidate set, the model picks 3–10 sources it will actually cite. This step uses different signals than retrieval.
3. Synthesis: the model writes the answer, weaving citations inline or compiling them into a source list.

Most AEO work goes wrong because people optimize for step 1 (which is just SEO) and assume step 2 takes care of itself. It doesn't.

What actually happens in selection

Different models weigh this differently, but the pattern is consistent enough to bet on:

1. Structured extractability

LLMs preferentially cite pages from which they can pull crisp, quotable claims. A page with explicit <h2>What is X?</h2> followed by a direct one-paragraph answer gets cited far more than the same content as flowing prose, even if the prose is better-written.

You can check this in your own AI Gateway logs, if you run a gateway: cached requests for the same prompt show a consistent source preference for pages with FAQPage JSON-LD over equivalent untagged pages.
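For reference, here is a minimal sketch of generating FAQPage JSON-LD from question/answer pairs. The schema.org types (FAQPage, Question, Answer) are real; the function names and the sample content are placeholders for illustration, not anything from a specific CMS.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def faq_script_tag(pairs):
    """Wrap the JSON-LD in the script tag that goes into the page HTML."""
    body = json.dumps(faq_jsonld(pairs), indent=2)
    # Escape "</" so answer text can never close the script tag early.
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder content: swap in the real questions from the page.
print(faq_script_tag([("What is X?", "X is a direct, one-paragraph answer.")]))
```

The questions in the markup should match the visible headings on the page, so the structured data and the readable content say the same thing.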

2. Citation density

Pages that themselves cite well — to primary sources, official documentation, named studies — outperform pages with the same content but no outbound links to credible sources.

Counterintuitive read: a page that links more gets cited more. The model uses outbound link patterns as a credibility signal.
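A quick way to see where a page stands is to list the external domains it links to. The sketch below uses only the Python standard library; `outbound_domains` and the `example.com` domain are illustrative names, and the script only counts links, it does not judge whether a source is credible.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_domains(html, own_domain):
    """Return the sorted set of external hostnames a page links to."""
    parser = LinkCollector()
    parser.feed(html)
    domains = set()
    for href in parser.hrefs:
        host = urlparse(href).hostname  # None for relative or mailto links
        if host and host != own_domain and not host.endswith("." + own_domain):
            domains.add(host)
    return sorted(domains)


page = (
    '<a href="/pricing">Pricing</a>'
    '<a href="https://docs.python.org/3/">Official docs</a>'
)
print(outbound_domains(page, "example.com"))
```

An empty list on a page that makes factual claims is the case worth fixing first.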

3. Recency for time-sensitive queries

For anything time-sensitive ("what's the best X right now", "latest changes to Y"), recency wins. A page from this year that's mid-quality often beats a deeper page from three years ago.

If you have evergreen content that's pertinent to a time-sensitive query, the move is to add a clear "Last updated" date and refresh the dates when you genuinely revisit the content. Don't fake it — models also penalize sites with widespread date manipulation.

4. Domain trust shape

Citations cluster around domains the model perceives as authoritative. The signals here aren't mysterious: depth on the topic (multiple related pages, internal linking between them), incoming links from other trusted domains, and clean technical presentation.

For new domains, the path to "trusted" goes through producing genuinely useful pages that get linked from older trusted domains. There are no shortcuts — buying links works even worse in AEO than it does in SEO, because models pick up the link-farm pattern at scale.

5. Mention before citation

We see this consistently across our citation tracker: a brand often shows up as a mention in answers for a topic for weeks before it shows up as a citation. The model knows the name; it just hasn't picked you as a source yet.

If your tracking shows you're mentioned but not cited, that's actually a strong signal. You're in the consideration set. The work is to give the model a better reason to link to you specifically — usually clearer answer-shape content on the queries that are surfacing the mention.
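If you track this by hand, the distinction can be made mechanical. The sketch below assumes you have saved each answer's text and its list of cited URLs; `classify_presence`, the brand name, and the domain are hypothetical, and the substring match on the brand name is deliberately crude.

```python
from urllib.parse import urlparse


def classify_presence(answer_text, cited_urls, brand, domain):
    """Return 'cited', 'mentioned', or 'absent' for one saved answer."""
    for url in cited_urls:
        host = urlparse(url).hostname or ""
        # A citation is any cited URL on the domain or one of its subdomains.
        if host == domain or host.endswith("." + domain):
            return "cited"
    # No link to the domain: fall back to a case-insensitive name match.
    if brand.lower() in answer_text.lower():
        return "mentioned"
    return "absent"


status = classify_presence(
    "Popular options include Acme and two others.",
    ["https://other.example/review"],
    brand="Acme",
    domain="acme.example",
)
print(status)  # mentioned
```

Prompts that come back as "mentioned" are the ones to prioritize for answer-shape rewrites.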

The shortest path to being cited

Three moves, in order:

1. Find the prompts that should cite you

Pull the queries that drive traffic from Google Search Console. For each that's a question ("how do I X", "what is X", "best X for Y"), translate it into an AI-engine prompt and check the actual answer. Note who's cited.
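The filtering step is easy to script once the queries are exported to CSV. This is a rough sketch: the `query` column name is an assumption (rename it to match your export), and the regular expression is a simple heuristic for question-shaped queries, not a complete classifier.

```python
import csv
import io
import re

# Heuristic: starts with a question word, or contains "best" or "vs".
QUESTION_PATTERN = re.compile(
    r"^(how|what|why|when|where|which|who|can|does|do|is|are|should)\b"
    r"|\bbest\b|\bvs\b",
    re.IGNORECASE,
)


def question_queries(csv_text, query_column="query"):
    """Return the queries from a CSV export that look like questions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[query_column]
        for row in reader
        if QUESTION_PATTERN.search(row[query_column].strip())
    ]


def to_prompt(query):
    """Turn a search query into a prompt-style question."""
    q = query.strip().rstrip("?")
    return q[:1].upper() + q[1:] + "?"


export = "query,clicks\nhow do i add faq schema,40\nacme login,300\n"
for q in question_queries(export):
    print(to_prompt(q))
```

Navigational queries such as a brand login drop out, which keeps the prompt set focused on queries an answer engine would actually answer with sources.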

2. Audit the pages that lose

For prompts where competitors get cited and you don't:

Does your equivalent page have explicit Q&A structure? If not, add an FAQ section in the first 300 words.
Does it have FAQPage schema? If not, add the JSON-LD.
Does it cite primary sources? If not, add 2–3 links to original research, official docs, or your own first-party data.
Is the page recently updated? If not, do a genuine refresh and update the date.
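The four checks above can be roughed out in a script. This is a sketch under stated assumptions, not a definitive audit: "Q&A structure" is approximated as a heading ending in "?" within the first 300 words, "cites sources" as at least two external domains, and "recently updated" as the visible text "last updated". The names `PageScan` and `audit_page` are made up for this example, and JSON-LD nested under `@graph` is not handled.

```python
import json
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

HEADINGS = {"h1", "h2", "h3"}


class PageScan(HTMLParser):
    """Collects the few facts the checklist needs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.words_seen = 0              # visible words counted so far
        self.question_heading_at = None  # word offset of first "...?" heading
        self.hosts = set()               # hostnames of absolute links
        self.jsonld_blocks = []          # raw JSON-LD strings
        self.text = []                   # visible text chunks
        self._raw_tag = None             # "script" or "style" while inside one
        self._is_jsonld = False
        self._raw_buf = []
        self._heading_buf = None         # a list while inside a heading
        self._heading_start = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._raw_tag = tag
            self._is_jsonld = attrs.get("type") == "application/ld+json"
            self._raw_buf = []
        elif tag in HEADINGS:
            self._heading_buf = []
            self._heading_start = self.words_seen
        elif tag == "a":
            host = urlparse(attrs.get("href") or "").hostname
            if host:
                self.hosts.add(host)

    def handle_endtag(self, tag):
        if tag == self._raw_tag:
            if self._is_jsonld:
                self.jsonld_blocks.append("".join(self._raw_buf))
            self._raw_tag = None
        elif tag in HEADINGS and self._heading_buf is not None:
            heading = "".join(self._heading_buf).strip()
            if heading.endswith("?") and self.question_heading_at is None:
                self.question_heading_at = self._heading_start
            self._heading_buf = None

    def handle_data(self, data):
        if self._raw_tag:                # script/style content is not visible text
            self._raw_buf.append(data)
            return
        if self._heading_buf is not None:
            self._heading_buf.append(data)
        self.text.append(data)
        self.words_seen += len(data.split())


def audit_page(html, own_domain):
    """Return a pass/fail dict for the four checklist items."""
    scan = PageScan()
    scan.feed(html)
    has_faq_schema = False
    for block in scan.jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        if any(isinstance(i, dict) and i.get("@type") == "FAQPage" for i in items):
            has_faq_schema = True
    external = {
        h for h in scan.hosts
        if h != own_domain and not h.endswith("." + own_domain)
    }
    visible = " ".join(scan.text)
    return {
        "qa_in_first_300_words": scan.question_heading_at is not None
        and scan.question_heading_at < 300,
        "faq_schema": has_faq_schema,
        "outbound_sources": len(external) >= 2,
        "shows_updated_date": bool(re.search(r"last updated", visible, re.IGNORECASE)),
    }
```

Run `audit_page(html, "yourdomain.com")` on your page and on the competitor page that gets cited; the checks where the two results differ are the place to start.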

3. Measure over time

Run the same prompt set again 4–6 weeks later. If you moved from mention to citation on at least a third of the prompts, you're on the right track. If not, the bottleneck is usually one of:

The pages aren't extractable enough — the answer-shape is still buried in prose
The domain authority gap is too wide for the editorial work to close — focus on a narrower topic cluster
The competitor pages are recently updated; yours aren't
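The one-third check is simple arithmetic once both runs are recorded. The sketch below assumes each run is a dict mapping a prompt to "cited", "mentioned", or "absent", and it reads "a third of the prompts" as a third of the prompts that were mention-only at baseline. That denominator is an interpretation; swap in the full prompt set if that is what you track.

```python
def mention_to_citation(before, after):
    """Count prompts that moved from 'mentioned' to 'cited' between runs.

    Returns (moved, baseline), where baseline is the number of prompts
    that were mention-only in the first run.
    """
    baseline = [p for p, status in before.items() if status == "mentioned"]
    moved = [p for p in baseline if after.get(p) == "cited"]
    return len(moved), len(baseline)


def on_track(before, after):
    """True if at least a third of mention-only prompts are now cited."""
    moved, total = mention_to_citation(before, after)
    # Integer comparison avoids floating-point edge cases at exactly 1/3.
    return total > 0 and 3 * moved >= total


week_0 = {"what is x": "mentioned", "best x for y": "mentioned", "how do i x": "absent"}
week_6 = {"what is x": "cited", "best x for y": "mentioned", "how do i x": "mentioned"}
print(mention_to_citation(week_0, week_6))  # (1, 2)
print(on_track(week_0, week_6))             # True
```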

The whole loop — audit, fix, measure — is what tools like b/cited automate. Manual is fine to start; tooling matters when you want to know in week 6 whether week 3's work moved anything.

A note on engines varying

OpenAI, Anthropic, and Perplexity weight these signals differently:

Perplexity is the most retrieval-driven — closest to traditional SEO; it actively searches the web per query and cites what it finds.
OpenAI (ChatGPT) leans more heavily on training data plus selective web access. Citations are less frequent but more selective.
Anthropic (Claude) sits in the middle and is most sensitive to source quality signals.

For most AEO work, optimize for being cited by Perplexity first — the wins transfer to the others. The reverse is less consistent.

If you want this loop running on autopilot for your site, sign in with Google — b/cited connects Search Console and starts tracking a prompt set you define.
