Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
model name
"meta-llama/Llama-2-70b-chat-hf"
input prompt - a single string is currently supported
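Taken together, the header and fields above form a minimal completion request. A sketch in Python — the endpoint URL and token value are placeholders, not confirmed by this reference:

```python
import json

# Hypothetical endpoint URL; substitute the real completions endpoint.
API_URL = "https://api.example.com/v1/completions"
AUTH_TOKEN = "YOUR_AUTH_TOKEN"  # placeholder for your auth token

# Bearer authentication header of the form "Bearer <token>".
headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json",
}

# Request body: model name plus a single input prompt string.
body = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "prompt": "Once upon a time",
}

payload = json.dumps(body)
```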
The maximum number of tokens to generate in the completion.
The total length of input tokens and generated tokens is limited by the model's context length. If explicitly set to None, it will be the model's max context length minus input length or 16384, whichever is smaller.
Constraint: 0 < x <= 1000000.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Constraint: 0 <= x <= 2.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
Constraint: x <= 1.
Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
Constraint: 0 <= x <= 1.
Sample only from the k most likely tokens. 0 means off.
Constraint: x >= 0.
number of sequences to return
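The sampling controls described above — temperature, top_p (nucleus), min_p, and top_k — can be illustrated as filters over a token distribution. A minimal sketch of the standard formulations; the server's actual implementation may differ:

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over temperature-scaled logits: lower values sharpen the
    distribution (more deterministic), higher values flatten it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus sampling: keep the smallest set of most-likely tokens
    whose cumulative probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return sorted(kept)

def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times that of the
    most likely token; min_p = 0 disables the filter."""
    if min_p == 0:
        return list(range(len(probs)))
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens; k = 0 means off."""
    if k == 0:
        return list(range(len(probs)))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])
```

For probs = [0.5, 0.3, 0.15, 0.05], top_p_filter(probs, 0.1) keeps only the first token — i.e. 0.1 restricts sampling to the top 10% probability mass, as described above.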
Constraint: 1 <= x <= 4.
whether to stream the output via SSE or return the full response
return top tokens and their log-probabilities
return the prompt as part of the response
up to 16 sequences where the API will stop generating further tokens
Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Constraint: -2 <= x <= 2.
Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.
Constraint: -2 <= x <= 2.
The format of the response. Currently, only json is supported.
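The presence and frequency penalties described above are commonly implemented as additive logit adjustments. A hedged sketch of the widely used formulation — not necessarily this server's exact code:

```python
def penalize_logits(logits, counts, presence_penalty, frequency_penalty):
    """Adjust each token's logit using how often it has already appeared.

    frequency_penalty scales with the repeat count; presence_penalty is a
    flat one-time penalty applied once the token has been seen at all.
    Positive values discourage repetition; negative values encourage it.
    """
    adjusted = []
    for logit, count in zip(logits, counts):
        logit -= frequency_penalty * count            # grows with repeats
        logit -= presence_penalty * (1 if count > 0 else 0)  # flat, once seen
        adjusted.append(logit)
    return adjusted
```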
Alternative penalty for repetition, but multiplicative instead of additive (> 1 penalizes, < 1 encourages).
Constraint: 0.01 <= x <= 5.
A unique identifier representing your end-user, which can help monitor and detect abuse. Avoid sending us any identifying information. We recommend hashing user identifiers.
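The multiplicative repetition penalty described above can be sketched as follows (the CTRL-style formulation used by common inference engines; an assumption, not confirmed by this reference):

```python
def repetition_penalize(logits, seen, penalty):
    """Multiplicative repetition penalty over logits of previously seen
    tokens: positive logits are divided by `penalty` and negative logits
    multiplied by it, so penalty > 1 discourages repeats and penalty < 1
    encourages them.
    """
    out = []
    for i, logit in enumerate(logits):
        if i in seen:
            logit = logit / penalty if logit > 0 else logit * penalty
        out.append(logit)
    return out
```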
Seed for random number generator. If not provided, a random seed is used. Determinism is not guaranteed.
Constraint: -9223372036854776000 <= x < 18446744073709552000.
streaming options
List of token IDs that will stop generation when encountered
return tokens as token ids
A key to identify prompt cache for reuse across requests. If provided, the prompt will be cached and can be reused in subsequent requests with the same key.
Optional multi-modal data to pass alongside the prompt. Only supported for a small number of non-chat-native vision models. Images must be base64 data URIs (e.g. 'data:image/png;base64,...').
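Pulling the parameters above into one request body: a sketch where the field names follow common completions-API conventions (OpenAI/vLLM-style) and are assumptions — check the actual request schema before relying on them:

```python
import json

# Hypothetical field names for the parameters documented above; the
# comments restate each field's documented constraint.
request_body = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "prompt": "Write a haiku about the sea.",  # a single string
    "max_tokens": 256,           # 0 < x <= 1000000
    "temperature": 0.8,          # 0 <= x <= 2
    "top_p": 0.9,                # x <= 1
    "min_p": 0.0,                # 0 disables
    "top_k": 40,                 # 0 means off
    "n": 1,                      # 1 <= x <= 4
    "stream": False,             # SSE streaming off
    "stop": ["\n\n"],            # up to 16 stop sequences
    "presence_penalty": 0.0,     # -2 <= x <= 2
    "frequency_penalty": 0.0,    # -2 <= x <= 2
    "repetition_penalty": 1.1,   # 0.01 <= x <= 5
    "seed": 42,                  # determinism not guaranteed
}

encoded = json.dumps(request_body)
```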
Successful Response