Model categories
- Text generation / LLMs — Llama, DeepSeek, Mistral, Qwen, Gemma, and more
- Embeddings — Qwen3 Embedding, BAAI/bge, sentence-transformers, and more
- Rerankers — Cross-encoder rerankers for RAG pipelines
- Vision / multimodal — Qwen2.5-VL, Llama Vision, and more
- OCR — Specialized models for document text extraction
- Text to image — FLUX, Stable Diffusion, and more
- Text to video — Generate video clips from text prompts
- Text to speech — Convert text to natural-sounding audio
- Speech recognition — Whisper and other ASR models
Model pages
Each model has a dedicated page where you can:
- Try it out interactively
- See its API documentation
- Grab ready-to-use code examples
Private models
We also support deploying custom models on DeepInfra infrastructure. Run your own fine-tuned or trained-from-scratch LLM on dedicated A100/H100/H200/B200/B300 GPUs.
Specifying model versions
Some models have more than one version available. You can infer against a particular version using the {"model": "MODEL_NAME:VERSION", ...} format.
You can also infer against a deploy_id using {"model": "deploy_id:DEPLOY_ID", ...}. This is especially useful for Custom LLMs — you can start inferring before the deployment finishes and before you have the model name + version pair.
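As a minimal sketch of the two model-reference formats above, the helper below builds a chat-completion request body; the model names and the `chat_payload` helper are illustrative assumptions, not part of the DeepInfra API itself:

```python
import json

def chat_payload(model: str, prompt: str) -> dict:
    """Build a chat-completion request body for a specific model reference.

    `model` can be a plain name, a pinned "NAME:VERSION" pair, or a
    "deploy_id:DEPLOY_ID" reference to a (possibly still-deploying) deployment.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Pin a specific version (the version string here is hypothetical):
versioned = chat_payload("meta-llama/Meta-Llama-3-8B-Instruct:VERSION", "Hello!")

# Target a custom deployment directly, before a name:version pair exists:
by_deploy = chat_payload("deploy_id:DEPLOY_ID", "Hello!")

print(json.dumps(versioned, indent=2))
```

The same body shape works for both cases; only the `model` string changes, so switching from a deploy_id to the final name:version pair is a one-field edit.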