DeepInfra is an AI inference cloud that makes it simple to run the latest machine learning models at scale — LLMs, vision, embeddings, image generation, video generation, speech, and more.

Why DeepInfra

- Drop-in OpenAI replacement. Point your existing OpenAI SDK at https://api.deepinfra.com/v1/openai and your code works without changes. No migration required.
- Best price for open-source models. DeepInfra consistently offers the lowest prices for open-source model inference. You only pay per token: no idle GPU time, no minimums, no seat fees. DeepInfra is also the provider with the most models on OpenRouter.
- Always-fresh model catalog. DeepInfra is typically among the first providers to deploy a newly released model.
- Private deployments for compliance and customization. Need to run your own fine-tuned weights, or require data isolation? Deploy a dedicated instance on A100/H100/H200/B200/B300 with autoscaling and a private endpoint: competitive GPU pricing, deployable in just a few clicks.
- GPU clusters for training and full control. Rent a B200 or B300 cluster with SSH access and run whatever you want.
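Because the endpoint is OpenAI-compatible, you don't even need an SDK; a minimal sketch using only the Python standard library (assuming your key is in a `DEEPINFRA_TOKEN` environment variable) that builds the raw chat-completions request:

```python
import json
import os
import urllib.request

# Build a chat-completions request against the OpenAI-compatible endpoint.
# DEEPINFRA_TOKEN is assumed to hold your DeepInfra API key.
token = os.environ.get("DEEPINFRA_TOKEN", "<your-token>")
payload = {
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "https://api.deepinfra.com/v1/openai/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send once you have a valid token:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The request body is the same JSON schema the OpenAI API uses, which is what makes existing OpenAI clients work unmodified.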

Get started in 60 seconds

Quickstart

Make your first API call; all you need is an API key.

Quick example

import os

from openai import OpenAI

# Read your DeepInfra API key from the DEEPINFRA_TOKEN environment variable.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Get your API key from the Dashboard.