OpenAI Realtime

Code examples on this page have been automatically tested and verified.

Proxy OpenAI Realtime API traffic through agentgateway to get token usage tracking and observability for WebSocket-based interactions.

About

The OpenAI Realtime API uses WebSocket connections for low-latency, multimodal interactions. agentgateway can proxy these WebSocket connections and parse the response.done events to extract token usage data, including input tokens, output tokens, and cached token counts.
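The extraction step can be pictured with a small sketch: given the JSON payload of a server event, return the token counts only when the event is a `response.done`. This is an illustration of the event shape, not agentgateway's internal code; the `extract_usage` helper name is our own.

```python
import json

def extract_usage(event_text):
    """Return token usage from a Realtime server event, or None.

    Only response.done events carry the usage data.
    """
    event = json.loads(event_text)
    if event.get("type") != "response.done":
        return None
    usage = event["response"].get("usage", {})
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }

# A response.done event, trimmed to the fields that matter here:
done = (
    '{"type":"response.done","response":'
    '{"usage":{"total_tokens":225,"input_tokens":150,"output_tokens":75}}}'
)
print(extract_usage(done))
```

Any other event type, such as `session.created`, yields no usage data and is skipped.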

To enable token usage tracking, you must prevent the client and server from negotiating WebSocket frame compression. When the sec-websocket-extensions: permessage-deflate header is present, the WebSocket frames are compressed and agentgateway cannot parse the token usage data. Remove this header from the request so that frames remain uncompressed and parseable.

ℹ️ The Realtime route type supports token usage tracking and observability. Other LLM policies, such as prompt guards, prompt enrichment, and request-body rate limiting, are not supported for WebSocket traffic.

Before you begin

  1. Install and set up an agentgateway proxy.
  2. Set up access to OpenAI or an OpenAI API-compatible LLM provider.

Step 1: Add the Realtime route type

Verify that your OpenAI AgentgatewayBackend includes the Realtime route type in the policies.ai.routes map. The default behavior routes all traffic as Completions. You must explicitly add the Realtime route type for the /v1/realtime path.

If you already set up multiple endpoints, add the /v1/realtime path to your existing AgentgatewayBackend.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4
  policies:
    auth:
      secretRef:
        name: openai-secret
    ai:
      routes:
        "/v1/chat/completions": "Completions"
        "/v1/realtime": "Realtime"
        "*": "Passthrough"
EOF

Step 2: Remove the WebSocket compression header

Create an AgentgatewayPolicy resource that removes the sec-websocket-extensions header from requests to the OpenAI Realtime endpoint. This step prevents the client and server from negotiating permessage-deflate compression, which would make WebSocket frames unreadable for token tracking.

  1. Create the AgentgatewayPolicy to strip the header. Target the HTTPRoute section that handles the /v1/realtime path.

    kubectl apply -f- <<EOF
    apiVersion: agentgateway.dev/v1alpha1
    kind: AgentgatewayPolicy
    metadata:
      name: realtime-strip-websocket-extensions
      namespace: agentgateway-system
    spec:
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: openai
        sectionName: openai-realtime
      traffic:
        transformation:
          request:
            remove:
            - sec-websocket-extensions
    EOF
  2. Verify that the AgentgatewayPolicy is accepted.

    kubectl get agentgatewaypolicy realtime-strip-websocket-extensions -n agentgateway-system

Step 3: Send a Realtime request

Send a request to the OpenAI Realtime API through agentgateway using a WebSocket client. The Realtime API uses WebSocket connections, so standard HTTP tools like curl do not work. Use a WebSocket client such as websocat, wscat, or a custom application.

Connect to the agentgateway proxy URL with the model query parameter and send the following client events as JSON messages.

  1. Create a conversation item with a text message.

    {"type":"conversation.item.create","item":{"type":"message","role":"user","content":[{"type":"input_text","text":"Say hello in one word."}]}}
  2. Trigger a text response.

    {"type":"response.create","response":{"modalities":["text"]}}
  3. Look for a response.done event in the server output. This event contains the token usage data that agentgateway extracts for metrics.

    {"type":"response.done","response":{...,"usage":{"total_tokens":225,"input_tokens":150,"output_tokens":75}}}
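The exchange above can be sketched offline in Python: build the two client events, then scan the server stream until `response.done` arrives. The sample server events here are illustrative stand-ins; with a real connection you would send and receive these messages over a WebSocket client instead.

```python
import json

# The two client events from the steps above, as Python dicts.
create_item = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "Say hello in one word."}],
    },
}
create_response = {"type": "response.create", "response": {"modalities": ["text"]}}

# Serialized in send order.
outgoing = [json.dumps(create_item), json.dumps(create_response)]

# Illustrative server stream: read events until response.done arrives.
sample_stream = [
    '{"type":"response.created","response":{}}',
    '{"type":"response.done","response":'
    '{"usage":{"total_tokens":225,"input_tokens":150,"output_tokens":75}}}',
]
for raw in sample_stream:
    event = json.loads(raw)
    if event["type"] == "response.done":
        print(event["response"]["usage"])
        break
```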

Step 4: Verify token tracking

After the Realtime request completes, verify that agentgateway recorded the token usage metrics.

  1. Open the agentgateway metrics endpoint.
  2. Look for the agentgateway_gen_ai_client_token_usage metric. The metric includes labels for the token type (input or output) and the model used.
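The metrics endpoint serves Prometheus text format, so you can filter the exposition output for the metric name. The sample lines below are illustrative; the exact label names on the counter may differ in your agentgateway version.

```python
# Hypothetical scrape output, trimmed to the relevant metric:
sample_metrics = """\
# TYPE agentgateway_gen_ai_client_token_usage counter
agentgateway_gen_ai_client_token_usage{gen_ai_token_type="input",model="gpt-4"} 150
agentgateway_gen_ai_client_token_usage{gen_ai_token_type="output",model="gpt-4"} 75
"""

# Keep only the samples for the token-usage metric.
usage_lines = [
    line
    for line in sample_metrics.splitlines()
    if line.startswith("agentgateway_gen_ai_client_token_usage")
]
for line in usage_lines:
    print(line)
```

One sample per token type confirms that both input and output tokens from the `response.done` event were recorded.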

For more information about LLM metrics and observability, see LLM cost tracking.

Cleanup

You can remove the resources that you created in this guide.
kubectl delete AgentgatewayPolicy realtime-strip-websocket-extensions -n agentgateway-system