Canonical URL is the version of a page you've designated as the authoritative one when multiple URLs serve the same or near-identical content. It's declared via a <link> tag in the HTML head:
<link rel="canonical" href="https://bcited.ai/glossary/canonical" />
When search engines crawl a duplicate (say, ?utm_source=... parameter variants, or https://www.bcited.ai/... vs https://bcited.ai/...), they read the canonical tag and consolidate ranking signals onto the canonical URL instead of splitting them.
Common canonical scenarios:
- Parameter variants —
?utm_*,?sort=, pagination — should canonicalize to the parameter-free base URL - www vs non-www — pick one, canonicalize the other
- HTTP vs HTTPS — HTTP canonicalizes to HTTPS (Cloudflare's "Always Use HTTPS" + HSTS handles this)
- Cross-domain syndication — a guest post on Medium can canonicalize back to your own domain's version
Why it matters for AEO
LLMs follow canonical signals when deciding which URL to cite. If your content lives at five URLs but only one is canonical, the LLM cites the canonical — even if it crawled the others. Get this wrong and you can be cited as the wrong URL (a parameter version, a syndicated copy, or an old subdomain that 301s but kept being indexed).
What b/cited does about it
The site readiness audit checks:
- Canonical tag presence (Indexing category warning if missing)
- Canonical pointing to a valid resolvable URL (warning if it 404s)
- Canonical matching the page's own URL when no duplicates exist (warning on self-canonical mismatch — common config bug)
- HTTP→HTTPS canonical alignment
These usually surface as low-severity findings but can be high-impact: a misconfigured canonical leaks ranking signal silently for months.