POST /v1/openai/chat/completions

OpenAI Chat Completions
curl --request POST \
  --url https://api.example.com/v1/openai/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {
      "content": "<string>",
      "tool_call_id": "<string>",
      "cache_control": {},
      "role": "tool"
    }
  ],
  "stream": false,
  "temperature": 1,
  "top_p": 1,
  "min_p": 0,
  "top_k": 0,
  "max_tokens": 500000,
  "stop": "<string>",
  "n": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "tools": [
    {
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {}
      },
      "cache_control": {},
      "type": "function"
    }
  ],
  "tool_choice": "<string>",
  "response_format": {
    "type": "text"
  },
  "repetition_penalty": 1,
  "user": "<string>",
  "seed": 4611686018427388000,
  "logprobs": true,
  "stream_options": {
    "include_usage": true,
    "continuous_usage_stats": false
  },
  "reasoning_effort": "low",
  "reasoning": {
    "effort": "low",
    "enabled": true
  },
  "prompt_cache_key": "<string>",
  "chat_template_kwargs": {}
}
'
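The same request can be assembled in Python; a minimal sketch using a small helper function (the base URL, token, model, and parameter values are placeholders taken from the example above, not defaults):

```python
# Sketch: build the pieces for POST /v1/openai/chat/completions.
# The token and base URL below are placeholders, not real credentials.

def build_chat_request(token, model, messages, **params):
    """Assemble URL, headers, and JSON body for the chat completions call."""
    url = "https://api.example.com/v1/openai/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": messages, **params}
    return url, headers, payload

url, headers, payload = build_chat_request(
    "<token>",
    "meta-llama/Llama-2-70b-chat-hf",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=256,
)
# Pass to any HTTP client, e.g. requests.post(url, headers=headers, json=payload)
```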
422 Validation Error

{
  "detail": [
    {
      "loc": [
        "<string>"
      ],
      "msg": "<string>",
      "type": "<string>"
    }
  ]
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

x-deepinfra-source
string | null
xi-api-key
string | null

Body

application/json
model
string
required

model name

Example:

"meta-llama/Llama-2-70b-chat-hf"

messages
(ChatCompletionToolMessage · object | ChatCompletionAssistantMessage · object | ChatCompletionUserMessage · object | ChatCompletionSystemMessage · object)[]
required

conversation messages following the pattern (user, assistant, tool)*, user, with at most one system message anywhere
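The pattern above can be sketched as a rough client-side check (an illustration only; the server's validation of role ordering may be stricter):

```python
def roles_match_pattern(messages):
    """Rough check: a mix of user/assistant/tool messages ending with a
    user message, plus at most one system message anywhere (sketch)."""
    roles = [m["role"] for m in messages]
    if roles.count("system") > 1:
        return False
    core = [r for r in roles if r != "system"]
    if not core or core[-1] != "user":
        return False
    return all(r in ("user", "assistant", "tool") for r in core)
```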

stream
boolean
default:false

whether to stream the output via SSE or return the full response
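With `stream` enabled the response arrives as SSE `data:` lines. A minimal sketch of extracting content deltas from such lines (the chunk shape shown is the OpenAI-style streaming format this endpoint emulates; the sample data is fabricated for illustration):

```python
import json

def parse_sse_deltas(raw_stream):
    """Yield content fragments from 'data: {...}' SSE lines (sketch)."""
    for line in raw_stream.splitlines():
        line = line.strip()
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                yield delta["content"]

sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n'
    'data: [DONE]\n'
)
# "".join(parse_sse_deltas(sample)) -> "Hello"
```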

temperature
number
default:1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Required range: 0 <= x <= 2
top_p
number
default:1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

Required range: x <= 1
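Conceptually, nucleus sampling keeps the smallest set of top tokens whose probabilities sum to at least top_p. A sketch of that filter over an explicit distribution (an illustration, not the server's implementation):

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of token indices whose cumulative probability
    reaches top_p (sketch of nucleus sampling; ties broken by order)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    return sorted(kept)

# probs over 4 tokens; top_p=0.9 keeps the top tokens covering 90% mass
nucleus_filter([0.5, 0.3, 0.15, 0.05], 0.9)  # -> [0, 1, 2]
```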
min_p
number
default:0

Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.

Required range: 0 <= x <= 1
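The min_p rule can be sketched directly: a token survives if its probability is at least min_p times the most likely token's probability (an illustration, not the server's implementation):

```python
def min_p_filter(probs, min_p):
    """Keep token indices whose probability is at least min_p times the
    maximum probability (sketch; min_p=0 disables the filter)."""
    if min_p <= 0:
        return list(range(len(probs)))
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

min_p_filter([0.6, 0.3, 0.05, 0.05], 0.1)  # threshold 0.06 -> [0, 1]
```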
top_k
integer
default:0

Sample only from the k most likely tokens. 0 disables top-k filtering.

Required range: x >= 0
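Top-k filtering can be sketched the same way: keep only the k highest-probability token indices (an illustration, not the server's implementation):

```python
def top_k_filter(probs, k):
    """Keep the k most probable token indices; k=0 means the filter is off."""
    if k <= 0:
        return list(range(len(probs)))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:k])

top_k_filter([0.1, 0.4, 0.2, 0.3], 2)  # -> [1, 3]
```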
max_tokens
integer | null

The maximum number of tokens to generate in the chat completion.

The total length of input tokens and generated tokens is limited by the model's context length. If explicitly set to None, it defaults to the model's max context length minus the input length, or 16384, whichever is smaller.

Required range: 0 <= x <= 1000000
stop

up to 16 sequences where the API will stop generating further tokens

n
integer
default:1

number of sequences to return

Required range: 1 <= x <= 4
presence_penalty
number
default:0

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2
frequency_penalty
number
default:0

Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.

Required range: -2 <= x <= 2
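Both penalties adjust token logits additively in the OpenAI-style scheme: presence_penalty is subtracted once for any token already seen, frequency_penalty once per occurrence. A sketch (not necessarily the server's exact implementation):

```python
from collections import Counter

def penalize_logits(logits, generated_tokens, presence_penalty, frequency_penalty):
    """Subtract presence_penalty once per previously seen token and
    frequency_penalty per occurrence (OpenAI-style additive scheme, sketch)."""
    counts = Counter(generated_tokens)
    return [
        logit - presence_penalty * (counts[i] > 0) - frequency_penalty * counts[i]
        for i, logit in enumerate(logits)
    ]

# Token 0 appeared twice, token 1 once, token 2 never.
penalize_logits([2.0, 1.0, 0.0], [0, 0, 1], presence_penalty=0.5, frequency_penalty=0.5)
# -> [0.5, 0.0, 0.0]
```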
tools
ChatTools · object[] | null

A list of tools the model may call. Currently, only functions are supported as a tool.

tool_choice

Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. required means the model must call a function. Specifying a particular tool object means the model must call that specific tool. none is the default when no tools are present; auto is the default if tools are present.

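A tools array together with a forced tool_choice can be sketched as the following payload fragment (the function name and JSON Schema are placeholders; the object form of tool_choice follows the OpenAI convention):

```python
# Hypothetical tool definition; "get_weather" and its schema are placeholders.
payload_fragment = {
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {  # JSON Schema for the function arguments
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Alternatives: "auto", "none", "required", or force this tool:
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```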
response_format
TextResponseFormat · object

The format of the response. Currently, only json is supported.

repetition_penalty
number
default:1

Alternative penalty for repetition, applied multiplicatively rather than additively (values > 1 penalize repetition, values < 1 encourage it)

Required range: 0.01 <= x <= 5
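The multiplicative scheme is commonly applied to logits of already-seen tokens: positive logits are divided by the penalty and negative ones multiplied, so both move toward being less likely. A sketch (not necessarily the server's exact implementation):

```python
def apply_repetition_penalty(logits, generated_tokens, penalty):
    """Divide positive logits (multiply negative ones) of previously seen
    tokens by penalty, per the common multiplicative scheme (sketch)."""
    seen = set(generated_tokens)
    out = []
    for i, logit in enumerate(logits):
        if i in seen:
            logit = logit / penalty if logit > 0 else logit * penalty
        out.append(logit)
    return out

apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0)
# -> [1.0, -2.0, 0.5]
```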
user
string | null

A unique identifier representing your end-user, which can help monitor and detect abuse. Avoid sending us any identifying information. We recommend hashing user identifiers.

seed
integer | null

Seed for random number generator. If not provided, a random seed is used. Determinism is not guaranteed.

Required range: -9223372036854775808 <= x <= 18446744073709551615
logprobs
boolean | null

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

stream_options
StreamOptions · object

streaming options

reasoning_effort
enum<string> | null

Constrains effort on reasoning for reasoning models. Currently supported values are none, low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Setting to none disables reasoning entirely if the model supports it.

Available options:
low,
medium,
high,
none
reasoning
ChatReasoningSettings · object

Reasoning configuration.

prompt_cache_key
string | null

A key to identify prompt cache for reuse across requests. If provided, the prompt will be cached and can be reused in subsequent requests with the same key.

chat_template_kwargs
Chat Template Kwargs · object

Chat template kwargs.

Response

Successful Response