The completions API is the legacy text generation interface: you provide a raw prompt string and the model continues it. For most use cases, the Chat Completions API is simpler and is recommended instead. The endpoint is:
Example
The example below uses deepseek-ai/DeepSeek-V3 with its prompt format:
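As a rough sketch of what a raw completion request looks like, the Python below uses only the standard library to build (but not send) the HTTP request. The endpoint URL `https://api.deepinfra.com/v1/openai/completions`, the Bearer-token `Authorization` header, and the prompt text are assumptions for illustration. The placeholder prompt does not follow DeepSeek-V3's actual prompt format.

```python
import json
import urllib.request

# Assumed OpenAI-compatible completions endpoint; verify against the endpoint above.
COMPLETIONS_URL = "https://api.deepinfra.com/v1/openai/completions"


def build_completion_request(api_token: str, prompt: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for a raw text completion."""
    body = {
        "model": "deepseek-ai/DeepSeek-V3",
        "prompt": prompt,  # raw string; the model simply continues it
        **params,          # optional fields such as max_tokens or temperature
    }
    return urllib.request.Request(
        COMPLETIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_completion_request("YOUR_API_TOKEN", "The capital of France is", max_tokens=16)
# Passing req to urllib.request.urlopen(req) would actually send it.
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.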
Supported parameters
| Parameter | Notes |
|---|---|
| model | Model name or MODEL_NAME:VERSION |
| prompt | Raw prompt string in the model’s expected format |
| max_tokens | Max tokens to generate. Defaults to the model’s max context length minus input length |
| stream | Stream output via SSE instead of returning the full response at once. Default: false |
| temperature | Sampling temperature between 0 and 2. Higher values produce more random output; lower values more deterministic. Default: 1.0 |
| top_p | Nucleus sampling threshold: only tokens comprising the top top_p probability mass are considered. Default: 1.0 |
| stop | Up to 4 sequences where the API will stop generating further tokens |
| n | Number of completion sequences to return. Default: 1 |
| echo | If true, the prompt is included at the start of the returned text |
| logprobs | Return log probabilities for the generated tokens |
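The limits and defaults in the table can be checked client-side before a request goes out. The helper below is a hypothetical sketch, not part of any DeepInfra SDK: it fills in the documented defaults for `stream`, `temperature`, `top_p`, and `n`, enforces the documented `temperature` range and four-sequence `stop` limit, and computes the documented `max_tokens` fallback from a context length and input token count that the caller must supply. Accepting `stop` as a bare string is an assumption, and the server may apply further rules not modeled here.

```python
from typing import Any

# Defaults taken from the parameter table.
DEFAULTS: dict[str, Any] = {"stream": False, "temperature": 1.0, "top_p": 1.0, "n": 1}
MAX_STOP_SEQUENCES = 4


def prepare_params(params: dict[str, Any], context_length: int, input_tokens: int) -> dict[str, Any]:
    """Hypothetical helper: apply table defaults and check documented limits."""
    out = {**DEFAULTS, **params}

    # temperature: documented range is 0 to 2.
    if not 0 <= out["temperature"] <= 2:
        raise ValueError("temperature must be between 0 and 2")

    # n: number of completion sequences; must be a positive integer.
    if not isinstance(out["n"], int) or out["n"] < 1:
        raise ValueError("n must be a positive integer")

    # stop: at most 4 sequences. Accepting a single string is an assumption.
    if "stop" in out:
        stops = [out["stop"]] if isinstance(out["stop"], str) else list(out["stop"])
        if len(stops) > MAX_STOP_SEQUENCES:
            raise ValueError("at most 4 stop sequences are allowed")
        out["stop"] = stops

    # max_tokens: documented default is context length minus input length.
    if "max_tokens" not in out:
        remaining = context_length - input_tokens
        if remaining <= 0:
            raise ValueError("prompt already fills the context window")
        out["max_tokens"] = remaining

    return out


params = prepare_params({"temperature": 0.2, "stop": "\n"}, context_length=8192, input_tokens=100)
```

Here `params["max_tokens"]` comes out as 8092 (8192 minus 100). Counting `input_tokens` requires the model's own tokenizer, which is outside the scope of this sketch.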