AI Gateway is a Cloudflare service that sits between your application and LLM providers (OpenAI, Anthropic, Perplexity, Google Gemini, others). Your code calls the gateway URL instead of the provider's URL directly; the gateway forwards the request, caches eligible responses, logs every call, and surfaces analytics across providers.
The change is one URL swap — the OpenAI client points at:
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug}/openai
instead of https://api.openai.com/v1. Everything else stays identical.
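As a minimal sketch of that swap, the helper below builds the gateway base URL from the pattern above. The function name and the placeholder account ID and slug are hypothetical and only illustrate the shape of the URL.

```typescript
// Hypothetical helper: builds the AI Gateway base URL for OpenAI traffic.
// The path shape follows the URL pattern shown above.
function openAiGatewayBaseUrl(accountId: string, gatewaySlug: string): string {
  if (!accountId || !gatewaySlug) {
    throw new Error("accountId and gatewaySlug are required");
  }
  // Encode both segments so an unexpected character cannot alter the path.
  const account = encodeURIComponent(accountId);
  const slug = encodeURIComponent(gatewaySlug);
  return `https://gateway.ai.cloudflare.com/v1/${account}/${slug}/openai`;
}

// Placeholder values for illustration only.
const openAiBaseUrl = openAiGatewayBaseUrl("my-account-id", "my-gateway");
console.log(openAiBaseUrl);
```

With the official openai Node SDK, this string would typically be passed as the baseURL option when constructing the client, in place of the default https://api.openai.com/v1.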
What the gateway adds:
- Caching — identical requests return cached responses without re-paying the provider. Saves real money on repeated embeddings or deterministic prompts.
- Retry with backoff — provider hiccups (rate limits, transient 5xx) get automatic retries before failing.
- Cost + latency analytics — per-prompt, per-provider, per-day breakdown.
- Single auth point — provider API keys live in Cloudflare, not scattered across services.
- Rate limiting + budget caps — bounded spend if a workflow goes runaway.
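Caching can also be steered per request. The sketch below assumes the gateway honors the cf-aig-cache-ttl and cf-aig-skip-cache request headers described in Cloudflare's AI Gateway caching documentation. The helper name, the option shape, and the OpenAI-style bearer auth are hypothetical choices for illustration, so verify the header names against the current docs before relying on them.

```typescript
// Hypothetical options for per-request cache control.
interface CacheOptions {
  ttlSeconds?: number; // how long the gateway may serve a cached response
  skipCache?: boolean; // bypass the cache for this request
}

// Builds request headers for a call routed through the gateway.
// Assumption: header names follow Cloudflare's AI Gateway caching docs.
function gatewayHeaders(
  providerApiKey: string,
  cache: CacheOptions = {},
): Record<string, string> {
  const headers: Record<string, string> = {
    "Authorization": `Bearer ${providerApiKey}`,
    "Content-Type": "application/json",
  };
  if (cache.skipCache) {
    // Skipping takes precedence over any TTL.
    headers["cf-aig-skip-cache"] = "true";
  } else if (cache.ttlSeconds !== undefined) {
    headers["cf-aig-cache-ttl"] = String(cache.ttlSeconds);
  }
  return headers;
}

// Embedding requests are repeatable, so a long TTL is a plausible choice.
const embeddingHeaders = gatewayHeaders("sk-placeholder", { ttlSeconds: 86400 });
console.log(embeddingHeaders);
```

A long TTL suits embeddings and other deterministic prompts. Calls that must always reach the provider can set skipCache instead.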
Why it matters
b/cited makes a lot of provider calls per ingest:
- Hundreds of embedding calls (one per query × batches of 100)
- A handful of brief-generation calls (one per cluster being briefed)
- AEO citation runs across three providers per tracked prompt
Cached embeddings alone save more than the gateway costs at scale. The cross-provider analytics give us the cost-per-AEO-run breakdown, visible at dash.cloudflare.com/?to=/ai/aigateway.
What b/cited does with it
- One gateway slug per account; all three Workers (web, api, ingest) route through it
- AI_GATEWAY_SLUG is a Worker var, not a secret: the URL is public; only the provider API key in the upstream request is sensitive
- Anthropic + Perplexity routes use the same gateway via their respective /anthropic and /perplexity-ai suffixes
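The routing described above can be sketched as a single lookup from provider to path suffix. AI_GATEWAY_SLUG and the three suffixes come from this page. CF_ACCOUNT_ID, the Env shape, and the function name are assumptions made for illustration and are not taken from the actual Workers.

```typescript
// Hypothetical Worker bindings. AI_GATEWAY_SLUG is named in the text above;
// CF_ACCOUNT_ID is an assumed name for the account ID var.
interface Env {
  CF_ACCOUNT_ID: string;
  AI_GATEWAY_SLUG: string;
}

type Provider = "openai" | "anthropic" | "perplexity";

// Provider path suffixes as named in this document.
const PROVIDER_SUFFIX: Record<Provider, string> = {
  openai: "openai",
  anthropic: "anthropic",
  perplexity: "perplexity-ai",
};

// All providers share one gateway; only the final path segment differs.
function providerBaseUrl(env: Env, provider: Provider): string {
  const root = "https://gateway.ai.cloudflare.com/v1";
  return `${root}/${env.CF_ACCOUNT_ID}/${env.AI_GATEWAY_SLUG}/${PROVIDER_SUFFIX[provider]}`;
}

// Placeholder values for illustration only.
const exampleEnv: Env = { CF_ACCOUNT_ID: "my-account-id", AI_GATEWAY_SLUG: "my-gateway" };
for (const provider of Object.keys(PROVIDER_SUFFIX) as Provider[]) {
  console.log(provider, providerBaseUrl(exampleEnv, provider));
}
```

Keeping the slug in a plain var means each Worker can build these URLs without touching secret storage. The provider API key is the only value that needs secret handling.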