What you can do
LLMs & Chat
OpenAI-compatible API for 100+ LLMs. Swap your base URL, keep your code.
Vision & OCR
Multimodal models for visual understanding and document text extraction.
Embeddings & Reranking
State-of-the-art embedding and reranker models for search and RAG.
Image & Video Generation
FLUX, Stable Diffusion, text-to-video, and more.
Speech
Speech recognition (Whisper) and text-to-speech models.
Deploy Private Models
Run your own fine-tuned LLM on A100 / H100 / H200 / B200 / B300 with autoscaling.
Why DeepInfra
Drop-in OpenAI replacement. Point your existing OpenAI SDK to https://api.deepinfra.com/v1/openai and your code works without changes. No migration required.
Best price for open-source models. DeepInfra consistently offers the lowest prices for open-source model inference. You only pay per token — no idle GPU time, no minimums, no seat fees. DeepInfra is also the provider with the most models on OpenRouter.
Always-fresh model catalog. DeepInfra is typically among the first providers to deploy a newly released model.
Private deployments for compliance and customization. Need to run your own fine-tuned weights or require data isolation? Deploy a dedicated instance on A100/H100/H200/B200/B300 with autoscaling and a private endpoint — competitive GPU pricing, deployable in a few clicks.
GPU Clusters for training and full control. Rent a B200 or B300 cluster with SSH access and run whatever you want.
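The drop-in swap above can be sketched with just the Python standard library: the request body is standard OpenAI chat-completions JSON, and only the base URL changes. The model name and the `DEEPINFRA_API_KEY` environment variable are illustrative; an existing OpenAI SDK client works the same way by setting its `base_url`.

```python
import json
import os
import urllib.request

# Only this base URL differs from a stock OpenAI integration.
BASE_URL = "https://api.deepinfra.com/v1/openai"

# Standard OpenAI-style chat-completions payload.
body = json.dumps({
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # example model name
    "messages": [{"role": "user", "content": "Hello!"}],
}).encode()

req = urllib.request.Request(
    BASE_URL + "/chat/completions",
    data=body,
    headers={
        "Content-Type": "application/json",
        # Set DEEPINFRA_API_KEY in your environment before sending.
        "Authorization": f"Bearer {os.environ.get('DEEPINFRA_API_KEY', '')}",
    },
)

# Uncomment to actually send the request (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])

print(req.full_url)
```

Because the wire format is unchanged, the same swap works for embeddings, vision, and speech endpoints under the same base URL.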
Get started in 60 seconds
Quickstart
Make your first API call — no installation required.