Reasoning Models - DeepInfra

Some models on DeepInfra support extended chain-of-thought reasoning — the model “thinks through” a problem step by step before producing a final answer. By default, reasoning models produce a reasoning trace alongside the response. You can control this behavior with the reasoning_effort parameter.

Supported models

Reasoning is available on models that support chain-of-thought, including:

deepseek-ai/DeepSeek-R1

Check the model catalog for the latest list.

Controlling reasoning effort

Use reasoning_effort to control how much reasoning the model performs. Higher effort means deeper thinking but more output tokens and higher latency.

import os

from openai import OpenAI

# Read the API token from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    extra_body={"reasoning_effort": "high"},
)

print(response.choices[0].message.content)
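
Because reasoning_effort travels inside extra_body, a misspelled value may not be caught on the client side. The sketch below shows one way to validate the value before sending a request. The helper build_chat_kwargs and the constant ALLOWED_EFFORTS are hypothetical names introduced for illustration; the allowed values are the ones listed under Supported parameters.

```python
# Values taken from the reasoning_effort description on this page.
ALLOWED_EFFORTS = ("none", "low", "medium", "high")


def build_chat_kwargs(model: str, prompt: str, effort: str) -> dict:
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper, not part of the OpenAI SDK or DeepInfra.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(
            f"reasoning_effort must be one of {ALLOWED_EFFORTS}, got {effort!r}"
        )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # reasoning_effort is passed through extra_body, as in the example above.
        "extra_body": {"reasoning_effort": effort},
    }


kwargs = build_chat_kwargs(
    "deepseek-ai/DeepSeek-R1",
    "Prove that the square root of 2 is irrational.",
    "high",
)
# With a configured client: response = client.chat.completions.create(**kwargs)
```

Building the arguments separately from the call keeps the validation testable without network access.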

Disabling reasoning

Set reasoning_effort to "none" to disable chain-of-thought entirely. The model then responds directly without a reasoning trace, which is faster and cheaper.

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"reasoning_effort": "none"},
)

The `reasoning` parameter

For more granular control, use the reasoning object instead of reasoning_effort:

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Solve this step by step: 15! / 13!"}],
    extra_body={
        "reasoning": {
            "effort": "medium",
            "enabled": True,
        }
    },
)

Setting "enabled" to false (False in Python) is equivalent to setting reasoning_effort to "none".
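
The equivalence between the two forms can be made concrete with a small client-side normalizer. This is a minimal sketch: effective_effort is a hypothetical helper that only mirrors the rule stated above, and it makes no claim about how the server resolves a request that sets both forms at once.

```python
from typing import Optional


def effective_effort(extra_body: dict) -> Optional[str]:
    """Return the effort a request body asks for, or None if unspecified.

    Hypothetical helper: the reasoning object is checked first, and
    "enabled": False is treated the same as reasoning_effort "none".
    """
    reasoning = extra_body.get("reasoning")
    if reasoning is not None:
        if reasoning.get("enabled") is False:
            return "none"
        return reasoning.get("effort")
    return extra_body.get("reasoning_effort")


a = effective_effort({"reasoning": {"enabled": False}})
b = effective_effort({"reasoning_effort": "none"})
c = effective_effort({"reasoning": {"effort": "medium", "enabled": True}})
```

Here a and b resolve to the same value, while c keeps the effort given in the reasoning object.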

When to use reasoning

Use case	Recommended setting
Math, logic, and code problems	`"high"` (default for reasoning models)
Multi-step analysis	`"medium"` or `"high"`
Simple Q&A, translation, summarization	`"none"`
Cost-sensitive workloads	`"none"` or `"low"`
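
One way to apply the table in application code is a lookup from task type to effort. The sketch below is illustrative only: the task labels, the pick_effort helper, and the "medium" fallback for unknown tasks are assumptions, while the settings themselves come from the table above.

```python
# Hypothetical task labels; the settings follow the table above.
RECOMMENDED_EFFORT = {
    "math": "high",
    "logic": "high",
    "code": "high",
    "multi_step_analysis": "medium",  # the table allows "medium" or "high"
    "simple_qa": "none",
    "translation": "none",
    "summarization": "none",
}


def pick_effort(task: str, cost_sensitive: bool = False) -> str:
    """Choose a reasoning_effort value for a task label."""
    # Falling back to "medium" for unknown tasks is an assumption.
    effort = RECOMMENDED_EFFORT.get(task, "medium")
    # Cost-sensitive workloads are capped at "low", per the table.
    if cost_sensitive and effort in ("medium", "high"):
        return "low"
    return effort


math_effort = pick_effort("math")
cheap_math_effort = pick_effort("math", cost_sensitive=True)
```

The returned string can be passed as the reasoning_effort value in extra_body.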

Supported parameters

Parameter	Type	Description
`reasoning_effort`	`string`	Controls reasoning depth: `"none"`, `"low"`, `"medium"`, `"high"`.
`reasoning`	`object`	Fine-grained reasoning config.
`reasoning.effort`	`string`	Same values as `reasoning_effort`.
`reasoning.enabled`	`boolean`	Explicitly enable or disable reasoning.

Notes

Reasoning tokens count toward output token billing
Disabling reasoning on a reasoning model makes it behave like a standard chat model
reasoning_effort: "none" is equivalent to reasoning: { enabled: false }
Not all models support reasoning — using these parameters on a non-reasoning model has no effect
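
Since reasoning tokens are billed as output tokens, a rough estimate shows how the trace affects cost. The sketch below is plain arithmetic: estimate_output_cost is a hypothetical helper, and the token counts and the price of 2.0 USD per million output tokens are placeholder values, not DeepInfra pricing.

```python
def estimate_output_cost(
    reasoning_tokens: int,
    answer_tokens: int,
    usd_per_million_output_tokens: float,
) -> float:
    """Estimate output cost when reasoning tokens are billed as output tokens."""
    billed_tokens = reasoning_tokens + answer_tokens
    return billed_tokens * usd_per_million_output_tokens / 1_000_000


# Placeholder numbers for illustration only.
cost_with_reasoning = estimate_output_cost(800, 200, 2.0)
cost_without_reasoning = estimate_output_cost(0, 200, 2.0)
```

With these placeholder inputs, the request with a reasoning trace bills five times as many output tokens as the same answer without one.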

