# Ollama
Configure Ollama to serve local models through agentgateway. Ollama runs models locally on your machine and exposes an OpenAI-compatible API that agentgateway can route to.
## Before you begin

- Install the agentgateway binary.
- Install Ollama.
- Make sure that you have at least one model pulled locally.

  ```shell
  ollama list
  ```

  If no models are listed, pull one.

  ```shell
  ollama pull llama3.2
  ```
## Configure agentgateway
Create a configuration file that routes requests to your local Ollama instance.
```yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
llm:
  port: 3000
  models:
    - name: "*"
      provider: openAI
      params:
        hostOverride: "localhost:11434"
```

Review the following table to understand this configuration.
| Setting | Description |
|---|---|
| `provider` | Set to `openAI` because Ollama exposes an OpenAI-compatible API. |
| `params.hostOverride` | Points to the Ollama server address. The default Ollama port is 11434. |
| `name: "*"` | Matches any model name, so clients can request any model that Ollama has pulled. |
Start agentgateway:

```shell
agentgateway -f config.yaml
```

## Test the configuration
Send a request to verify that agentgateway routes to Ollama. The model name in the request must match a model you have pulled with ollama pull.
```shell
curl http://localhost:3000/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello! Tell me about Ollama in one sentence."}
    ]
  }' | jq
```

Example output:
{"model":"llama3.2","usage":{"prompt_tokens":14,"completion_tokens":323,"total_tokens":337},"choices":[{"message":{"content":"<think>\nOkay, user just asked for a one-sentence explanation of Ollama. That's pretty concise and specific—no fluff allowed here. They're probably either testing my knowledge or genuinely need a quick definition without jargon overload.\n\nHmm, judging by the tone, they might be evaluating if I can give crisp technical explanations. Since they didn't specify their familiarity level, a neutral layman-to-engineer explanation would work best. \n\nOllama is known for being a wrapper around LLMs that handles infrastructure stuff invisibly to end users. But how to phrase it in one sentence without sounding like \"magic\"? Need to balance clarity and technical accuracy...\n\n*Brainstorming:*\nOption 1: Focus on what it does (local, multi-model inference) + benefit aspect\nOption 2: Contrast with alternatives (\"handles all the heavy lifting\")\nOption 3: Mention its approach-to-problem innovation\n\nGoing with Option 1 feels safest since non-engineers might struggle with \"wrapper\" terminology. Also emphasizing accessibility (\"you\") helps bridge technical and casual users. Should keep it under 50 characters for social media-readiness.\n\n...wait, is this going to feel too basic? No—simple beats vague when someone asks explicitly. Final polish: add asterisk as signal that I can expand explanation if they want more depth.\n</think>\nOllama lets you run large language models (LLMs) like Llama and Mistral on your device by handling all the infrastructure work for local multi-model inference.\n\n*(Let me know if you'd like a longer explanation!)*","role":"assistant"},"index":0,"finish_reason":"stop"}],"id":"chatcmpl-738","object":"chat.completion","created":1773934551,"system_fingerprint":"fp_ollama"}Troubleshooting
### Rate limit exceeded (429)
What’s happening:
The request returns a `rate limit exceeded` error, and the agentgateway logs show `endpoint=api.openai.com:443`.
Why it’s happening:
The `hostOverride` setting is not being applied, so agentgateway sends requests to the default OpenAI host (`api.openai.com`) instead of your local Ollama instance. Without a valid OpenAI API key, or when rate limits are exceeded, that API returns a 429 response.
How to fix it:
- Ensure that agentgateway is started with your config file explicitly.

  ```shell
  agentgateway -f /path/to/your/config.yaml
  ```

- In the config file, verify that `hostOverride` is nested under `params` with correct indentation, and that the value is a string.

  ```yaml
  llm:
    port: 3000
    models:
      - name: "*"
        provider: openAI
        params:
          hostOverride: "localhost:11434"
  ```
After a successful fix, the agentgateway logs show endpoint=localhost:11434 (or your override) instead of api.openai.com:443.
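The fallback behavior described in this section can be sketched as a simple rule: when no `hostOverride` is configured, the provider's default host is used. This is an illustration only, not agentgateway's actual source; `resolve_endpoint` is a hypothetical function name.

```python
# Illustration of the host-override fallback, not agentgateway's actual code.
from typing import Optional

DEFAULT_OPENAI_HOST = "api.openai.com:443"

def resolve_endpoint(host_override: Optional[str]) -> str:
    """Use hostOverride when configured; otherwise fall back to the provider default."""
    return host_override if host_override else DEFAULT_OPENAI_HOST

print(resolve_endpoint("localhost:11434"))  # override applied -> localhost:11434
print(resolve_endpoint(None))               # no override -> api.openai.com:443
```

This is why a missing or mis-indented `hostOverride` silently sends traffic to OpenAI's API rather than failing loudly.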
### Connection refused
What’s happening:
Requests to agentgateway return a 503 error or connection refused.
Why it’s happening:
Ollama is not running or is not listening on the expected port.
How to fix it:
Verify that Ollama is running.

```shell
curl http://localhost:11434/api/version
```

Example output:

```json
{"version":"0.11.8"}
```

If Ollama is not running, start it.

```shell
ollama serve
```
### Model not found
What’s happening:
The response returns a model not found error.
Why it’s happening:
The requested model has not been pulled to your local Ollama instance.
How to fix it:
- List available models.

  ```shell
  ollama list
  ```

- Pull the missing model.

  ```shell
  ollama pull llama3.2
  ```
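To script this check, you can parse the output of `ollama list`. The sketch below runs on a hardcoded sample of that output; the column layout and the `installed_models` helper are assumptions for illustration, and in practice you would capture the real output with `subprocess`.

```python
# Sketch: decide whether a model is available by parsing `ollama list` output.
# The sample output is hardcoded for illustration; in practice, capture it with
# subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout.
sample_output = """NAME            ID              SIZE      MODIFIED
llama3.2:latest a80c4f17acd5    2.0 GB    2 days ago
qwen3:latest    e4b5fd7f8af0    5.2 GB    5 days ago
"""

def installed_models(list_output: str) -> set:
    """Return model names (without tags) parsed from `ollama list` output."""
    models = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        if line.strip():
            name = line.split()[0]             # first column, e.g. "llama3.2:latest"
            models.add(name.split(":")[0])     # drop the tag
    return models

models = installed_models(sample_output)
print("llama3.2" in models)  # -> True
print("mistral" in models)   # -> False (would need `ollama pull mistral`)
```

A wrapper script could use this to pull missing models automatically before sending requests through agentgateway.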