The completions API is the legacy text-generation interface: you provide a raw prompt string and the model continues it. For most use cases, the Chat Completions API is simpler and is recommended instead.
The endpoint is:
POST https://api.deepinfra.com/v1/openai/completions
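Under the hood this is a plain JSON POST. A minimal request sketch with curl (the $DEEPINFRA_TOKEN placeholder stands for your API key; the prompt and stop strings match the DeepSeek-V3 example below):

```shell
curl -X POST "https://api.deepinfra.com/v1/openai/completions" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3",
    "prompt": "<|begin▁of▁sentence|><|User|>Hello!<|Assistant|>",
    "stop": ["<|end▁of▁sentence|>"],
    "max_tokens": 256
  }'
```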
This is an advanced API: you must know your model's exact prompt format, which differs from model to model. Check the API section on the model's page for the expected format.
Example
The example below uses deepseek-ai/DeepSeek-V3 with its prompt format:
from openai import OpenAI

openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = True  # or False

completion = openai.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    prompt="<|begin▁of▁sentence|><|User|>Hello!<|Assistant|>",
    stop=["<|end▁of▁sentence|>"],
    stream=stream,
)

if stream:
    for event in completion:
        if event.choices[0].finish_reason:
            # The final chunk carries the finish reason and token usage.
            print(event.choices[0].finish_reason,
                  event.usage.prompt_tokens,
                  event.usage.completion_tokens)
        else:
            print(event.choices[0].text, end="", flush=True)
else:
    print(completion.choices[0].text)
    print(completion.usage.prompt_tokens, completion.usage.completion_tokens)
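The raw prompt above interleaves the model's special tokens with the message text. A small helper to assemble such prompts from chat-style messages (a sketch assuming the DeepSeek-V3 token layout shown in the example; verify the tokens against the model's API page before relying on this):

```python
# Special tokens as shown in the example above (model-specific).
BOS = "<|begin▁of▁sentence|>"

def build_prompt(messages):
    """Turn [{'role': 'user'|'assistant', 'content': ...}] into a raw prompt,
    ending with <|Assistant|> so the model generates the next reply."""
    parts = [BOS]
    for m in messages:
        tag = "<|User|>" if m["role"] == "user" else "<|Assistant|>"
        parts.append(tag + m["content"])
    parts.append("<|Assistant|>")
    return "".join(parts)

print(build_prompt([{"role": "user", "content": "Hello!"}]))
```

For the single-turn case this reproduces the prompt string used in the example.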
Supported parameters
| Parameter | Notes |
|---|---|
| model | Model name, or MODEL_NAME:VERSION to pin a version |
| prompt | Raw prompt string in the model's expected format |
| max_tokens | Maximum number of tokens to generate |
| stream | Return partial results as they are generated |
| temperature | Sampling temperature; higher values produce more random output |
| top_p | Nucleus sampling: only tokens within the top_p probability mass are considered |
| stop | List of sequences at which generation stops |
| n | Number of completions to generate per prompt |
| echo | Include the prompt in the returned text |
| logprobs | Return log probabilities of the output tokens |
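With stream enabled, the completion text arrives in chunks and must be concatenated client-side, as the loop in the example above does inline. A sketch of that accumulation using stand-in event objects (illustrative only; real events come from the SDK):

```python
from types import SimpleNamespace

def collect_stream(events):
    """Concatenate streamed text chunks; return (text, finish_reason)."""
    text, finish = [], None
    for event in events:
        choice = event.choices[0]
        if choice.finish_reason:
            finish = choice.finish_reason
        else:
            text.append(choice.text)
    return "".join(text), finish

# Stand-in events mimicking the shape of the SDK's stream chunks.
fake = [
    SimpleNamespace(choices=[SimpleNamespace(text="Hi", finish_reason=None)]),
    SimpleNamespace(choices=[SimpleNamespace(text=" there", finish_reason=None)]),
    SimpleNamespace(choices=[SimpleNamespace(text="", finish_reason="stop")]),
]
print(collect_stream(fake))  # ('Hi there', 'stop')
```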
For every model, you can check its prompt format in the API section on its page.