DeepInfra supports streaming responses via server-sent events (SSE), the same protocol as OpenAI. Set stream: true in your request to enable it.

Examples

import os

from openai import OpenAI

# Reads the API token from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.getenv("DEEPINFRA_TOKEN"),
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for event in stream:
    if event.choices[0].finish_reason:
        # The final chunk carries the finish reason and token usage.
        print(event.choices[0].finish_reason,
              event.usage.prompt_tokens,
              event.usage.completion_tokens)
    else:
        print(event.choices[0].delta.content or "", end="", flush=True)
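The loop above can also be wrapped in a small helper that accumulates the delta fragments into the full reply and returns the usage object from the final chunk. This is a sketch; collect_stream is a hypothetical name, not part of the OpenAI SDK:

```python
def collect_stream(stream):
    """Accumulate delta fragments from a chat completion stream.

    Returns (full_text, usage), where usage is taken from the final
    chunk (the one whose finish_reason is set). Hypothetical helper,
    not part of the SDK.
    """
    parts = []
    usage = None
    for event in stream:
        choice = event.choices[0]
        if choice.finish_reason:
            usage = event.usage
        elif choice.delta.content:
            parts.append(choice.delta.content)
    return "".join(parts), usage
```

It works on any iterable of chunk objects, so it can be unit-tested with stub events without hitting the API.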

SSE format

Each streamed chunk is a data: line containing a JSON object:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5}}

data: [DONE]
The final chunk before [DONE] contains usage information.
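If you consume the raw HTTP response instead of using the SDK, the SSE framing above can be parsed with a few lines of Python. A minimal sketch (parse_sse_chunks is a hypothetical helper) that extracts the JSON payload from each data: line and stops at the [DONE] sentinel:

```python
import json

def parse_sse_chunks(raw):
    """Parse the 'data:' lines of an SSE response body into JSON chunks.

    Stops at the '[DONE]' sentinel that terminates the stream.
    Hypothetical helper for illustration.
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks
```

Each returned chunk is a plain dict with the same shape as the examples above, so the content deltas live at chunk["choices"][0]["delta"].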

Notes

  • Streaming works for all supported models
  • Usage stats are available in the last chunk (when finish_reason is set)
  • The prompt_tokens and completion_tokens counts match what the same request would return without streaming