LLM consumption
Route and manage traffic to LLM providers through a unified AI gateway interface.
Consume services from LLM providers.
About
Learn about LLM gateway concepts, supported providers, and common enterprise challenges that …
Providers
Configure agentgateway for specific LLM providers like OpenAI, Anthropic, Gemini, and more.
Model aliasing
Configure global or provider-specific model aliases to reference models by user-friendly names.
API keys
Manage API keys for LLM provider authentication.
Virtual keys
Issue virtual keys: API keys with per-key token budgets and cost tracking.
Load balancing
Distribute requests across multiple LLM providers automatically using the Power of Two Choices (P2C) algorithm.
Model failover
Priority-based failover across LLM providers (automatic fallback when models fail or are …
Content-based routing
Route requests to different LLM backends based on request body content, such as the requested model …
Streaming
Stream responses from the LLM to the end user through agentgateway.
Multiple inference pools
Route inference requests to multiple InferencePools based on the model name in the request body.
OpenAI Realtime
Proxy OpenAI Realtime API WebSocket traffic and track token usage.
Function calling
Extend LLM capabilities with external tool functions for real-time data and API integration.
Guardrails
Protect LLM interactions with prompt guards that evaluate and filter requests and responses for …
Prompt enrichment
Manage system and user prompts to improve LLM output quality and consistency.
Prompt templates
Use static and dynamic prompt templates to customize LLM requests.
Request transformations
Dynamically compute and set LLM request fields using CEL expressions.
Budget and spend limits
Control LLM spending by enforcing token budget limits per API key or user.
Rate limiting for LLMs
Control LLM costs with token-based and request-based rate limits.
LLM cost tracking
Track and monitor LLM costs per request using token usage metrics.
CEL-based RBAC
Use CEL expressions to enforce role-based access control on AI resource requests.
Metrics and logs
View LLM-specific metrics and access logs for token usage and request monitoring.