Request transformations
Use LLM request transformations to dynamically compute and set fields in LLM requests by using Common Expression Language (CEL) expressions. CEL is a simple expression language that is used throughout agentgateway to enable flexible configuration. CEL expressions can access the request context, JWT claims, and other variables to make dynamic decisions. Transformations let you enforce policies, such as capping token usage or conditionally modifying request parameters, without changing client code.
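For a taste of the syntax, the following snippet collects the CEL expressions that appear in this guide, annotated with CEL line comments:

```
// Cap a numeric request field at 10.
min(llmRequest.max_tokens, 10)

// Parse the request body as JSON and read its "model" field.
json(request.body).model

// The model name that the upstream provider reported.
llm.responseModel
```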
To learn more about CEL, see the official CEL documentation and language definition.
Before you begin
Configure LLM request transformations
Create an AgentgatewayPolicy resource to apply an LLM request transformation. The following example caps `max_tokens` to 10, regardless of what the client requests.

```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: cap-max-tokens
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: max_tokens
        expression: "min(llmRequest.max_tokens, 10)"
EOF
```

| Setting | Description |
| --- | --- |
| `backend.ai.transformations` | A list of LLM request field transformations. |
| `field` | The name of the LLM request field to set. Maximum 256 characters. |
| `expression` | A CEL expression that computes the value for the field. Use the `llmRequest` variable to access the original LLM request body. Maximum 16,384 characters. |

ℹ️ You can specify up to 64 transformations per policy. Transformations take priority over `overrides` for the same field. If an expression fails to evaluate, the field is silently removed from the request.

Thinking budget fields, such as `reasoning_effort` and `thinking_budget_tokens`, can also be set or capped by using transformations. This way, operators can enforce reasoning limits centrally without requiring client changes. For example, use `"field": "reasoning_effort"` with the expression `"medium"` to cap all requests to medium reasoning effort, regardless of what the client sends.

Send a request with `max_tokens` set to a value greater than 10. The transformation caps the value to 10 before the request reaches the LLM provider. Verify that the `completion_tokens` value in the response is 10 or fewer and that the `finish_reason` is set to `length`, which indicates that the response was capped.

```sh
curl "$INGRESS_GW_ADDRESS/v1/chat/completions" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "max_tokens": 5000,
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ]
  }' | jq
```

If you test locally through a port-forward, send the request to `localhost:8080` instead.

Example output:
```json
{
  "model": "gpt-3.5-turbo-0125",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    }
  },
  "choices": [
    {
      "message": {
        "content": "Once upon a time, in a small village nestled",
        "role": "assistant",
        "refusal": null,
        "annotations": []
      },
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  ...
}
```
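The reasoning limit mentioned earlier can be sketched the same way. The following is a minimal, untested example that assumes your provider accepts a `reasoning_effort` field; the policy name is illustrative:

```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: cap-reasoning-effort   # illustrative name
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      # Force every request to medium reasoning effort,
      # regardless of what the client sends.
      - field: reasoning_effort
        expression: '"medium"'
EOF
```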
Inject LLM model information as response headers
Use CEL expressions to inject LLM model information as response headers. This strategy is useful for detecting silent fallbacks, where a request is redirected to a different model without the client being notified. However, this setup might not be suitable for streaming responses.
Inject model headers from request and response bodies
Parse the `model` field from the incoming request body and the upstream response body by using `json()`, then inject the values as response headers. This configuration lets you compare which model was requested against which model actually responded.

- `json(request.body).model`: Reads the `model` field from the incoming request body.
- `json(response.body).model`: Reads the `model` field from the upstream response body.
Create an AgentgatewayPolicy resource that targets the OpenAI provider's HTTPRoute and injects the model fields as response headers.

```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: llm-model-headers
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  traffic:
    transformation:
      response:
        set:
        - name: x-requested-model
          value: 'string(json(request.body).model)'
        - name: x-actual-model
          value: 'string(json(response.body).model)'
EOF
```

Send a chat completion request through the gateway and inspect the response headers.
```sh
curl -vi "http://$INGRESS_GW_ADDRESS/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'
```

If you test locally through a port-forward, send the request to `localhost:8080` instead.

Example output:
```
< HTTP/1.1 200 OK
< content-type: application/json
< x-requested-model: gpt-4
< x-actual-model: gpt-4
...
```

Actual model values might differ slightly from the requested model, even when the same model serves the request, because some responses include a unique identifier as part of the model name. In these circumstances, you might use the `contains()` function to verify the match instead of comparing for strict equality.

When a fallback model handles the request, `x-actual-model` differs from `x-requested-model`:

```
< x-requested-model: gpt-4o
< x-actual-model: gpt-4o-mini
```
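The `contains()` check can be sketched as a header transformation. The header name is illustrative, and the exact model naming depends on your provider:

```yaml
traffic:
  transformation:
    response:
      set:
      # "true" when the responding model name contains the requested
      # model name, which tolerates versioned suffixes in the response.
      - name: x-model-match   # illustrative header name
        value: 'string(json(response.body).model).contains(string(json(request.body).model)) ? "true" : "false"'
```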
Detect fallback with the `llm` context variables
When the agentgateway proxy routes to an AI backend, the `llm` CEL context provides first-class variables that are parsed directly from the LLM protocol layer rather than from raw body strings:
- `llm.requestModel`: The model name from the original request.
- `llm.responseModel`: The model name that the upstream LLM provider reported in the response.
Use metadata to compute each value once and reference it by name. This setup avoids repeating the `default()` fallback expression in every header and keeps the `x-model-fallback` condition readable:
```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: llm-context-vars
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  traffic:
    transformation:
      response:
        metadata:
          requestedModel: 'default(llm.requestModel, string(json(request.body).model))'
          actualModel: 'default(llm.responseModel, string(json(response.body).model))'
        set:
        - name: x-requested-model
          value: metadata.requestedModel
        - name: x-actual-model
          value: metadata.actualModel
        - name: x-model-fallback
          value: 'metadata.requestedModel != metadata.actualModel ? "true" : "false"'
EOF
```

The `default()` fallback is written once per value rather than repeated in every header and in the comparison.
Cleanup
You can remove the resources that you created in this guide.

```sh
kubectl delete AgentgatewayPolicy cap-max-tokens -n agentgateway-system --ignore-not-found
kubectl delete AgentgatewayPolicy llm-model-headers -n agentgateway-system --ignore-not-found
kubectl delete AgentgatewayPolicy llm-context-vars -n agentgateway-system --ignore-not-found
kubectl delete httproute openai -n agentgateway-system --ignore-not-found
```