Ingestion is the durable Cloudflare Workflow that runs every time a project syncs — on initial onboarding, when you hit Re-sync, and on the nightly cron.
It's the only path that writes ranking data into BCited. Every dashboard surface (clusters, authority, briefs, quick wins) reads from what ingestion produced.
The seven steps
- Claim — atomic project status transition from `ready` to `ingesting`, prevents double-runs.
- Fetch GSC — pulls the last 90 days of query-level data from Google Search Console via the read-only `webmasters.readonly` OAuth scope.
- Persist queries — batched D1 inserts; old vectors are purged before the new batch lands.
- Embed — OpenAI batch calls (parallelized 8-wide via AI Gateway) turn each query into a 1536-dim vector.
- Cluster + label — centroid-greedy clustering over Vectorize, then GPT-4.1-mini names each cluster.
- URL pass — second GSC pull at query × page granularity, used for ownership-status scoring.
- Authority — per-cluster weighted blend (ranking share, position strength, impression coverage, URL consolidation) produces the 0–100 authority score.
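The weighted blend in the Authority step can be sketched roughly as below. This is a minimal illustration, not the production formula: the weights, the assumption that each signal is already normalized to the 0..1 range, and the names `ClusterSignals`, `WEIGHTS`, and `authorityScore` are all hypothetical.

```typescript
// Hypothetical shape: the four signals named in the Authority step,
// each ASSUMED to be pre-normalized to the range [0, 1].
interface ClusterSignals {
  rankingShare: number;
  positionStrength: number;
  impressionCoverage: number;
  urlConsolidation: number;
}

// Illustrative weights only (they sum to 1). The real weights are not
// documented in this section.
const WEIGHTS: ClusterSignals = {
  rankingShare: 0.4,
  positionStrength: 0.3,
  impressionCoverage: 0.2,
  urlConsolidation: 0.1,
};

// Guard against out-of-range inputs so the result stays within 0..100.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Blend the four signals and scale to an integer 0..100 score.
function authorityScore(s: ClusterSignals): number {
  const blended =
    WEIGHTS.rankingShare * clamp01(s.rankingShare) +
    WEIGHTS.positionStrength * clamp01(s.positionStrength) +
    WEIGHTS.impressionCoverage * clamp01(s.impressionCoverage) +
    WEIGHTS.urlConsolidation * clamp01(s.urlConsolidation);
  return Math.round(blended * 100);
}

// Example: a cluster that is uniformly mid-strength on every signal.
console.log(
  authorityScore({
    rankingShare: 0.5,
    positionStrength: 0.5,
    impressionCoverage: 0.5,
    urlConsolidation: 0.5,
  }),
);
```

Because the weights sum to 1 and every input is clamped, the score cannot leave the 0–100 range regardless of how the individual signals are computed upstream.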
Total wall-clock time is usually 3–5 minutes per project, depending on query volume. Workflow steps are durable: a failure at step 5 retries from step 5, not from the top.
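To make the resume-from-the-failed-step behavior concrete, here is a small in-memory simulation. It is not the Cloudflare Workflows runtime (where steps are asynchronous and wrapped in `step.do(name, callback)`); the `StepRunner` class, the step names, and the `failAt` parameter are hypothetical stand-ins, and the sketch is synchronous for brevity.

```typescript
// In-memory stand-in for durable step state. A real Workflow persists
// completed step results for you; this Map only imitates that idea.
class StepRunner {
  private completed = new Map<string, unknown>();
  // Every attempt (successful or not) is recorded here for inspection.
  readonly executed: string[] = [];

  do<T>(name: string, fn: () => T): T {
    // A step that already succeeded is never re-run; its result is reused.
    if (this.completed.has(name)) {
      return this.completed.get(name) as T;
    }
    this.executed.push(name);
    const result = fn(); // may throw, in which case nothing is cached
    this.completed.set(name, result);
    return result;
  }
}

// Hypothetical names for the seven steps, in order.
const STEP_NAMES = [
  "claim",
  "fetch-gsc",
  "persist-queries",
  "embed",
  "cluster-label",
  "url-pass",
  "authority",
];

// Run every step in order; optionally simulate a failure in one of them.
function runIngestion(runner: StepRunner, failAt?: string): void {
  for (const name of STEP_NAMES) {
    runner.do(name, () => {
      if (name === failAt) {
        throw new Error(`simulated failure in ${name}`);
      }
      return `${name}:done`;
    });
  }
}

// First attempt fails at step 5 (cluster-label); the retry resumes there.
const runner = new StepRunner();
try {
  runIngestion(runner, "cluster-label");
} catch (err) {
  console.log((err as Error).message);
}
runIngestion(runner);
console.log(runner.executed.join(" > "));
```

In the retry, the first four steps return their cached results without executing again, so only `cluster-label`, `url-pass`, and `authority` do any work.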