OpenAI-Compatible Endpoints

To call models hosted behind an OpenAI-compatible proxy, follow these three rules:

  1. For /chat/completions: Put openai/ in front of your model name, so LiteLLM knows you're trying to call an OpenAI /chat/completions endpoint.

  2. For /completions: Put text-completion-openai/ in front of your model name, so LiteLLM knows you're trying to call an OpenAI /completions endpoint.

  3. Do NOT append an endpoint path to the base URL, e.g. /v1/embeddings. LiteLLM uses the OpenAI client to make these calls, and the client adds the relevant endpoint path automatically.
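
The three rules above can be sketched as two small helpers. This is a minimal illustration, not part of LiteLLM: the names `prefixed_model` and `clean_api_base` are hypothetical, and the list of endpoint paths to strip is an assumption based on the endpoints mentioned on this page.

```python
# Hypothetical helpers (not LiteLLM APIs) that encode the three rules above.

# Rule 1 and rule 2: which prefix goes with which endpoint.
PREFIXES = {
    "/chat/completions": "openai/",
    "/completions": "text-completion-openai/",
}

# Rule 3: endpoint paths that the OpenAI client appends on its own.
# "/chat/completions" is listed first because it also ends with "/completions".
ENDPOINT_SUFFIXES = ("/chat/completions", "/completions", "/embeddings")


def prefixed_model(model: str, endpoint: str = "/chat/completions") -> str:
    """Add the routing prefix for the given endpoint unless it is already there."""
    prefix = PREFIXES[endpoint]
    return model if model.startswith(prefix) else prefix + model


def clean_api_base(api_base: str) -> str:
    """Drop a trailing slash and any endpoint path from the base URL."""
    base = api_base.rstrip("/")
    for suffix in ENDPOINT_SUFFIXES:
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return base


print(prefixed_model("mistral"))
print(prefixed_model("mistral", "/completions"))
print(clean_api_base("http://0.0.0.0:4000/v1/embeddings"))
```

The results can be passed as the model and api_base arguments in the usage examples on this page.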

Usage - completion

import litellm
import os

response = litellm.completion(
    model="openai/mistral",          # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_key="sk-1234",               # api key to your openai compatible endpoint
    api_base="http://0.0.0.0:4000",  # set API Base of your Custom OpenAI Endpoint
    messages=[
        {
            "role": "user",
            "content": "Hey, how's it going?",
        }
    ],
)
print(response)

Usage - embedding

import litellm
import os

response = litellm.embedding(
    model="openai/GPT-J",            # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_key="sk-1234",               # api key to your openai compatible endpoint
    api_base="http://0.0.0.0:4000",  # set API Base of your Custom OpenAI Endpoint
    input=["good morning from litellm"],
)
print(response)

Usage with LiteLLM Proxy Server

Here's how to call an OpenAI-Compatible Endpoint with the LiteLLM Proxy Server:

  1. Modify the config.yaml

    model_list:
      - model_name: my-model
        litellm_params:
          model: openai/<your-model-name>  # add openai/ prefix to route as OpenAI provider
          api_base: <model-api-base>       # add api base for OpenAI compatible provider
          api_key: api-key                 # api key to send to your model

    Note: If you see a Not Found error when testing, make sure your api_base ends with the /v1 suffix.

    Example: http://vllm-endpoint.xyz/v1

  2. Start the proxy

    $ litellm --config /path/to/config.yaml

  3. Send Request to LiteLLM Proxy Server

    import openai

    client = openai.OpenAI(
        api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
        base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
    )

    response = client.chat.completions.create(
        model="my-model",
        messages=[
            {
                "role": "user",
                "content": "what llm are you"
            }
        ],
    )

    print(response)
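
If the request in step 3 fails with a Not Found error, the note in step 1 suggests checking that api_base ends with /v1. A minimal sketch of that check follows. The helper name `ensure_v1` is hypothetical, and whether your server actually expects the /v1 suffix is an assumption to verify against the server you are calling.

```python
def ensure_v1(api_base: str) -> str:
    """Return api_base with a single trailing /v1 and no trailing slash.

    Hypothetical helper, not part of LiteLLM. It only normalizes the string;
    it does not contact the server.
    """
    base = api_base.rstrip("/")
    return base if base.endswith("/v1") else base + "/v1"


print(ensure_v1("http://vllm-endpoint.xyz"))
print(ensure_v1("http://vllm-endpoint.xyz/v1/"))
```

Both calls produce the example value from step 1, which can then be used as api_base in config.yaml.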

Advanced - Disable System Messages

Some models served through vLLM (e.g. gemma) don't support system messages. To have LiteLLM map system messages in those requests to 'user' messages, set the supports_system_message flag to False.

model_list:
  - model_name: my-custom-model
    litellm_params:
      model: openai/google/gemma
      api_base: http://my-custom-base
      api_key: ""
      supports_system_message: False # 👈 KEY CHANGE
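
To make the effect of this flag concrete, here is a minimal sketch of what mapping system messages to 'user' messages means for a request. The function `map_system_to_user` is hypothetical and only illustrates the idea. LiteLLM's internal handling may differ, for example in how it treats consecutive messages.

```python
from typing import Dict, List


def map_system_to_user(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a copy of messages with every 'system' role rewritten to 'user'.

    Illustration only: an assumption about the effect of
    supports_system_message: False, not LiteLLM's actual code.
    """
    return [
        {**m, "role": "user"} if m.get("role") == "system" else dict(m)
        for m in messages
    ]


original = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "what llm are you"},
]
mapped = map_system_to_user(original)
print([m["role"] for m in mapped])
```

The message contents are left unchanged and the input list is not modified. Only the role field changes.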