Clustering

The process of grouping a project's Google Search Console queries into topical buckets so authority, ownership, and editorial gaps can be measured at the topic level instead of one query at a time.

Also known as:Query ClusteringTopic Clustering

Clustering is the process that turns a flat list of ranking queries into a much smaller set of meaningful topics.

A typical project pulls 3,000-15,000 unique queries from 90 days of Google Search Console. Looking at any one of them in isolation is useless — you can't tell which represent the same underlying intent and which are genuinely different topics. Clustering is what makes that data actionable.

How b/cited does it

Every query gets embedded with OpenAI's text-embedding-3-small model into a 1536-dimension vector that captures its semantic meaning. Then a centroid-greedy algorithm running over Cloudflare Vectorize groups vectors that are close enough together (cosine similarity above 0.85) into the same cluster. Each cluster gets a human-readable name from GPT-4.1-mini.

The result: a project with 8,000 GSC queries becomes ~40-80 topical clusters the dashboard can present, score, and brief.

Why not k-means or HDBSCAN

Both work in theory. Centroid-greedy + Vectorize wins on three things: it runs in a single Worker request budget, it handles arbitrary cluster shapes (HDBSCAN's specialty), and the cluster centroid lookup at query time is O(log n) — the same pattern internal-link suggestion and brief generation use later in the pipeline.