LLM consumption

Route and manage traffic to LLM providers through a unified AI gateway interface.

About

Learn about LLM gateway concepts, supported providers, and common enterprise challenges that …

Providers

Configure agentgateway for specific LLM providers like OpenAI, Anthropic, Gemini, and more.

Model aliasing

Configure global or provider-specific model aliases to reference models by user-friendly names.

API keys

Manage API keys for LLM provider authentication.

Virtual keys

Issue API keys, also known as virtual keys, with per-key token budgets and cost tracking.

Load balancing

Automatically distribute requests across multiple LLM providers using the Power of Two Choices (P2C) algorithm.
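
As a rough illustration of the general Power of Two Choices idea (not agentgateway's actual implementation), the sketch below samples two backends at random and sends the request to the less loaded one. The function name `pick_backend`, the provider names, and the use of in-flight request counts as the load signal are all assumptions made for this example.

```python
import random


def pick_backend(in_flight, rng=random):
    """Return a backend name chosen with Power of Two Choices.

    in_flight maps a backend name to its current number of in-flight
    requests (an assumed load signal; a real gateway may use another).
    """
    names = list(in_flight)
    if not names:
        raise ValueError("no backends available")
    if len(names) == 1:
        return names[0]
    # Sample two distinct candidates, then keep the less loaded one.
    first, second = rng.sample(names, 2)
    return first if in_flight[first] <= in_flight[second] else second


# Hypothetical provider names and load counts.
load = {"openai": 12, "anthropic": 3, "gemini": 7}
chosen = pick_backend(load)
```

Comparing only two random candidates keeps the selection cheap while still steering traffic away from the busiest backend, which is never chosen when loads are distinct.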

Model failover

Priority-based failover across LLM providers (automatic fallback when models fail or are …

Content-based routing

Route requests to different LLM backends based on request body content, such as the requested model …

Streaming

Stream responses from the LLM to the end user through agentgateway.

Multiple inference pools

Route inference requests to multiple InferencePools based on the model name in the request body.

OpenAI Realtime

Proxy OpenAI Realtime API WebSocket traffic and track token usage.

Function calling

Extend LLM capabilities with external tool functions for real-time data and API integration.

Guardrails

Protect LLM interactions with prompt guards that evaluate and filter requests and responses for …

Prompt enrichment

Manage system and user prompts to improve LLM output quality and consistency.

Prompt templates

Use static and dynamic prompt templates to customize LLM requests.

Request transformations

Dynamically compute and set LLM request fields using CEL expressions.

Budget and spend limits

Control LLM spending by enforcing token budget limits per API key or user.

Rate limiting for LLMs

Control LLM costs with token-based and request-based rate limits.

LLM cost tracking

Track and monitor LLM costs per request using token usage metrics.
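
To make the relationship between token usage and cost concrete, here is a minimal sketch of the arithmetic. It assumes the provider bills prompt and completion tokens at separate per-million-token prices; the function name `request_cost` and the prices used are hypothetical and are not taken from agentgateway or any provider's price list.

```python
def request_cost(prompt_tokens, completion_tokens,
                 input_price_per_million, output_price_per_million):
    """Estimate the cost of one request from its token usage.

    Prices are expressed per one million tokens (an assumed billing
    model); the result is in the same currency as the prices.
    """
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (prompt_tokens * input_price_per_million
             + completion_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical usage: 1,000 prompt tokens and 500 completion tokens,
# priced at 2.00 and 8.00 per million tokens respectively.
cost = request_cost(1000, 500, 2.0, 8.0)
```

Summing this value across requests that share an API key or user is one way to arrive at per-key or per-user spend.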

CEL-based RBAC

Use CEL expressions to enforce role-based access control on AI resource requests.

Metrics and logs

View LLM-specific metrics and access logs for token usage and request monitoring.
