# Ollama
Configure Ollama to serve local models through agentgateway. Ollama runs models locally on your machine and exposes an OpenAI-compatible API that agentgateway can route to.
## Before you begin

- Install the agentgateway binary.
- Install Ollama.
- Make sure that you have at least one model pulled locally.

  ```shell
  ollama list
  ```

  If no models are listed, pull one.

  ```shell
  ollama pull llama3.2
  ```
## Configure agentgateway
Create a configuration file that routes requests to your local Ollama instance.
```yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
llm:
  port: 3000
  models:
    - name: "*"
      provider: openAI
      params:
        hostOverride: "localhost:11434"
```

Review the following table to understand this configuration.
| Setting | Description |
|---|---|
| `provider` | Set to `openAI` because Ollama exposes an OpenAI-compatible API. |
| `params.hostOverride` | Points to the Ollama server address. The default Ollama port is 11434. |
| `name: "*"` | Matches any model name, so clients can request any model that Ollama has pulled. |
Start agentgateway:

```shell
agentgateway -f config.yaml
```

## Test the configuration
Send a request to verify that agentgateway routes to Ollama. The model name in the request must match a model you have pulled with ollama pull.
```shell
curl http://localhost:3000/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello! Tell me about Ollama in one sentence."}
    ]
  }' | jq
```

Example output:
{"model":"llama3.2","usage":{"prompt_tokens":14,"completion_tokens":323,"total_tokens":337},"choices":[{"message":{"content":"<think>\nOkay, user just asked for a one-sentence explanation of Ollama. That's pretty concise and specific—no fluff allowed here. They're probably either testing my knowledge or genuinely need a quick definition without jargon overload.\n\nHmm, judging by the tone, they might be evaluating if I can give crisp technical explanations. Since they didn't specify their familiarity level, a neutral layman-to-engineer explanation would work best. \n\nOllama is known for being a wrapper around LLMs that handles infrastructure stuff invisibly to end users. But how to phrase it in one sentence without sounding like \"magic\"? Need to balance clarity and technical accuracy...\n\n*Brainstorming:*\nOption 1: Focus on what it does (local, multi-model inference) + benefit aspect\nOption 2: Contrast with alternatives (\"handles all the heavy lifting\")\nOption 3: Mention its approach-to-problem innovation\n\nGoing with Option 1 feels safest since non-engineers might struggle with \"wrapper\" terminology. Also emphasizing accessibility (\"you\") helps bridge technical and casual users. Should keep it under 50 characters for social media-readiness.\n\n...wait, is this going to feel too basic? No—simple beats vague when someone asks explicitly. Final polish: add asterisk as signal that I can expand explanation if they want more depth.\n</think>\nOllama lets you run large language models (LLMs) like Llama and Mistral on your device by handling all the infrastructure work for local multi-model inference.\n\n*(Let me know if you'd like a longer explanation!)*","role":"assistant"},"index":0,"finish_reason":"stop"}],"id":"chatcmpl-738","object":"chat.completion","created":1773934551,"system_fingerprint":"fp_ollama"}Troubleshooting
### Rate limit exceeded (429)
What’s happening:
The request returns a `rate limit exceeded` error, and the agentgateway logs show `endpoint=api.openai.com:443`.
Why it’s happening:
The `hostOverride` setting is not being applied, so agentgateway sends requests to the default OpenAI host (`api.openai.com`) instead of your local Ollama instance. Without a valid OpenAI API key, or when rate limits are exceeded, that API returns a 429 response.
How to fix it:
- Ensure that agentgateway is started with your config file explicitly.

  ```shell
  agentgateway -f /path/to/your/config.yaml
  ```

- In the config file, verify that `hostOverride` is nested under `params` with correct indentation, and that the value is a string.

  ```yaml
  llm:
    port: 3000
    models:
      - name: "*"
        provider: openAI
        params:
          hostOverride: "localhost:11434"
  ```
After a successful fix, the agentgateway logs show endpoint=localhost:11434 (or your override) instead of api.openai.com:443.
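The fallback behavior described in this section can be sketched as a simple rule: when no `hostOverride` is configured, the provider's default host is used. This is an illustration only, not agentgateway's actual source; `resolve_endpoint` is a hypothetical function name.

```python
# Illustration of the host-override fallback, not agentgateway's actual code.
from typing import Optional

DEFAULT_OPENAI_HOST = "api.openai.com:443"

def resolve_endpoint(host_override: Optional[str]) -> str:
    """Use hostOverride when configured; otherwise fall back to the provider default."""
    return host_override if host_override else DEFAULT_OPENAI_HOST

print(resolve_endpoint("localhost:11434"))  # override applied -> localhost:11434
print(resolve_endpoint(None))               # no override -> api.openai.com:443
```

This is why a missing or mis-indented `hostOverride` silently sends traffic to OpenAI's API rather than failing loudly.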
### Connection refused
What’s happening:
Requests to agentgateway return a 503 error or connection refused.
Why it’s happening:
Ollama is not running or is not listening on the expected port.
How to fix it:
Verify that Ollama is running.

```shell
curl http://localhost:11434/api/version
```

Example output:

```json
{"version":"0.11.8"}
```

If Ollama is not running, start it.

```shell
ollama serve
```
### Model not found
What’s happening:
The response returns a model not found error.
Why it’s happening:
The requested model has not been pulled to your local Ollama instance.
How to fix it:
- List available models.

  ```shell
  ollama list
  ```

- Pull the missing model.

  ```shell
  ollama pull llama3.2
  ```
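To script this check, you can parse the output of `ollama list`. The sketch below runs on a hardcoded sample of that output; the column layout and the `installed_models` helper are assumptions for illustration, and in practice you would capture the real output with `subprocess`.

```python
# Sketch: decide whether a model is available by parsing `ollama list` output.
# The sample output is hardcoded for illustration; in practice, capture it with
# subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout.
sample_output = """NAME            ID              SIZE      MODIFIED
llama3.2:latest a80c4f17acd5    2.0 GB    2 days ago
qwen3:latest    e4b5fd7f8af0    5.2 GB    5 days ago
"""

def installed_models(list_output: str) -> set:
    """Return model names (without tags) parsed from `ollama list` output."""
    models = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        if line.strip():
            name = line.split()[0]             # first column, e.g. "llama3.2:latest"
            models.add(name.split(":")[0])     # drop the tag
    return models

models = installed_models(sample_output)
print("llama3.2" in models)  # -> True
print("mistral" in models)   # -> False (would need `ollama pull mistral`)
```

A wrapper script could use this to pull missing models automatically before sending requests through agentgateway.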