# Ollama-OpenAI Proxy

This is a Go-based proxy server that enables applications built for the Ollama API to interact seamlessly with an OpenAI-compatible endpoint. It translates and forwards requests and responses between the two APIs, applying custom transformations to model names and data formats.

**Note:** This is a pet project I use to forward requests to LiteLLM for use with Kerlig, which doesn't support custom OpenAI endpoints. As this is a personal project, there might be issues and rough edges. Contributions and feedback are welcome!
## Features
**Endpoint Proxying** (see the routing sketch after this list):
- **/v1/models & /v1/completions:** These endpoints are forwarded directly to the downstream OpenAI-compatible server.
- **/api/tags:** Queries the downstream `/v1/models` endpoint, transforms the response into the Ollama-style model list, and appends `:proxy` to model names if they don't already contain a colon.
- **/api/chat:** Rewrites the request to the downstream `/v1/chat/completions` endpoint. It intercepts and transforms streaming NDJSON responses from the OpenAI format into the expected Ollama format, including stripping any trailing `:proxy` from model names.
- **/api/pull and other unknown endpoints:** Forwarded to a local Ollama instance running on `127.0.0.1:11505`.
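
The sketch below shows one way this routing could be wired up with Go's standard `net/http` machinery. `newRouter`, `handleTags`, and `handleChat` are illustrative names, not necessarily the identifiers used in `ollama_proxy.go`.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Placeholders for the translation handlers described in the list above; the
// real implementations live in ollama_proxy.go.
var (
	handleTags = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... query /v1/models and write the Ollama-style tag list ...
	})
	handleChat = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... rewrite to /v1/chat/completions and transform the stream ...
	})
)

// newRouter maps each endpoint family to its handler.
func newRouter(target, localOllama *url.URL) http.Handler {
	downstream := httputil.NewSingleHostReverseProxy(target) // OpenAI-compatible server
	local := httputil.NewSingleHostReverseProxy(localOllama) // Ollama on 127.0.0.1:11505

	mux := http.NewServeMux()
	mux.Handle("/v1/models", downstream)      // forwarded directly
	mux.Handle("/v1/completions", downstream) // forwarded directly
	mux.Handle("/api/tags", handleTags)       // translated locally
	mux.Handle("/api/chat", handleChat)       // rewritten and transformed
	mux.Handle("/", local)                    // /api/pull and anything else
	return mux
}

func main() {
	target, _ := url.Parse("http://127.0.0.1:4000")
	local, _ := url.Parse("http://127.0.0.1:11505")
	log.Fatal(http.ListenAndServe(":11434", newRouter(target, local)))
}
```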

**Debug Logging:**

When running in debug mode, the proxy logs:
- Every incoming request.
- The outgoing `/api/chat` payload.
- Raw downstream streaming chunks and their transformed equivalents.

**Model Name Handling:**
- For `/api/tags`, if a model ID does not contain a colon, the proxy appends `:proxy` to the name.
- For other endpoints, any `:proxy` suffix in model names is stripped before forwarding (see the sketch below).
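
A minimal sketch of this suffix handling, using illustrative helper names rather than the actual functions in `ollama_proxy.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// addProxySuffix marks a downstream model ID for /api/tags responses.
// IDs that already contain a colon (e.g. "llama3:8b") are left untouched.
func addProxySuffix(id string) string {
	if strings.Contains(id, ":") {
		return id
	}
	return id + ":proxy"
}

// stripProxySuffix removes the marker before a request is forwarded to the
// downstream OpenAI-compatible server.
func stripProxySuffix(name string) string {
	return strings.TrimSuffix(name, ":proxy")
}

func main() {
	fmt.Println(addProxySuffix("gpt-4o"))         // gpt-4o:proxy
	fmt.Println(addProxySuffix("llama3:8b"))      // llama3:8b (unchanged)
	fmt.Println(stripProxySuffix("gpt-4o:proxy")) // gpt-4o
}
```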
## Prerequisites
- **Go 1.18+** installed.
- An OpenAI-compatible server endpoint (e.g., running on `http://127.0.0.1:4000`).
- *(Optional)* A local Ollama instance running on `127.0.0.1:11505` for endpoints not handled by the downstream server.
## Installation
Clone this repository:
```
git clone https://github.com/yourusername/ollama-openai-proxy.git
cd ollama-openai-proxy
```
Build the project:
```
go build -o proxy-server ollama_proxy.go
```
## Usage
Run the proxy server with the desired flags:
```
./proxy-server --listen=":11434" --target="http://127.0.0.1:4000" --api-key="YOUR_API_KEY" --debug
```
## Command-Line Flags
- `--listen`: The address and port the proxy server listens on (default `:11434`).
- `--target`: The base URL of the OpenAI-compatible downstream server (e.g., `http://127.0.0.1:4000`).
- `--api-key`: *(Optional)* The API key for the downstream server.
- `--debug`: Enable detailed debug logging for every request and response.
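
As a rough illustration, the flags above could be declared with Go's standard `flag` package as sketched here. Apart from the documented `:11434` default for `--listen`, the defaults and help text are assumptions; the real declarations in `ollama_proxy.go` may differ.

```go
package main

import "flag"

var (
	listen = flag.String("listen", ":11434", "address and port the proxy listens on")
	target = flag.String("target", "", "base URL of the OpenAI-compatible downstream server")
	apiKey = flag.String("api-key", "", "optional API key for the downstream server")
	debug  = flag.Bool("debug", false, "enable detailed request/response logging")
)

func main() {
	flag.Parse()
	// ... start the proxy here using *listen, *target, *apiKey, and *debug ...
}
```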
## How It Works
1. **Request Routing:**
   The proxy intercepts requests and routes them based on the endpoint:
   - Requests to `/v1/models` and `/v1/completions` are forwarded directly.
   - Requests to `/api/tags` are handled locally by querying `/v1/models` on the downstream server, transforming the JSON response, and appending `:proxy` where needed.
   - Requests to `/api/chat` are rewritten to `/v1/chat/completions`, with the payload and response processed to strip or add the `:proxy` suffix as appropriate.
   - All other endpoints are forwarded to the local Ollama instance.
2. **Response Transformation:**
   Streaming responses from the downstream `/v1/chat/completions` endpoint (in NDJSON format) are read line by line. Each chunk is parsed, transformed into the Ollama format, and streamed back to the client (see the sketch below).
3. **Logging:**
   With debug mode enabled, detailed logs of incoming requests, outgoing payloads, and both raw and transformed response chunks are printed.
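
For step 2, here is a simplified sketch of the chunk translation. It assumes the downstream sends one OpenAI `chat.completion.chunk` object per line; `transformStream`, `openAIChunk`, and `ollamaChunk` are illustrative names, and the real code also has to handle details omitted here (errors, any SSE `data:` prefixes, token counts in the final chunk, and the `:proxy` suffix on the model name).

```go
package main

import (
	"bufio"
	"encoding/json"
	"io"
	"os"
	"strings"
	"time"
)

// openAIChunk holds the fields of a streaming chat.completion.chunk that
// matter for the translation; the real struct may carry more fields.
type openAIChunk struct {
	Model   string `json:"model"`
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
		FinishReason *string `json:"finish_reason"`
	} `json:"choices"`
}

// ollamaChunk is the shape Ollama clients expect from a streaming /api/chat.
type ollamaChunk struct {
	Model     string `json:"model"`
	CreatedAt string `json:"created_at"`
	Message   struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"message"`
	Done bool `json:"done"`
}

// transformStream reads NDJSON chunks from the downstream response body and
// writes Ollama-style NDJSON chunks to the client.
func transformStream(downstream io.Reader, client io.Writer) error {
	scanner := bufio.NewScanner(downstream)
	enc := json.NewEncoder(client) // Encode appends a newline, producing NDJSON
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		var in openAIChunk
		if err := json.Unmarshal([]byte(line), &in); err != nil {
			continue // skip anything that is not a JSON chunk
		}
		var out ollamaChunk
		out.Model = in.Model
		out.CreatedAt = time.Now().UTC().Format(time.RFC3339)
		out.Message.Role = "assistant"
		if len(in.Choices) > 0 {
			out.Message.Content = in.Choices[0].Delta.Content
			out.Done = in.Choices[0].FinishReason != nil
		}
		if err := enc.Encode(out); err != nil {
			return err
		}
	}
	return scanner.Err()
}

// main feeds two example chunks through the transformer and prints the result.
func main() {
	sample := `{"model":"gpt-4o","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}
{"model":"gpt-4o","choices":[{"delta":{"content":""},"finish_reason":"stop"}]}
`
	_ = transformStream(strings.NewReader(sample), os.Stdout)
}
```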
## Contributing
Contributions are welcome! As this is a pet project, there may be rough edges and issues. Please feel free to open issues or submit pull requests for improvements and bug fixes.
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.