DeepInfra allows you to deploy your own models on dedicated infrastructure — your weights, your endpoint, your isolation.

Why run private models?

  • Compliance — data stays on dedicated infrastructure, not shared with other users
  • Custom weights — deploy fine-tuned or trained-from-scratch models
  • Predictable latency — no sharing with other users means consistent response times
  • Autoscaling — scale from 0 to many instances automatically based on load
  • Competitive GPU pricing — some of the lowest per-GPU-hour rates available, with no lock-in
  • Simple deployment — up and running in just a couple of clicks from the dashboard

What you can deploy

Private deployments come in three types: Custom LLM (your own model weights served on dedicated GPUs), LoRA (a fine-tuned adapter applied to a base model), and LoRA Image (an adapter for image-generation models).

GPU options

Private model deployments run on:
  • A100-80GB — proven workhorse for LLM inference, great value
  • H100-80GB — fast and widely supported
  • H200-141GB — large HBM3e memory, ideal for big models
  • B200-180GB — NVIDIA Blackwell, significantly faster for inference workloads
  • B300-288GB — latest NVIDIA Blackwell Ultra, highest performance available
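When choosing a GPU, the main constraint is whether the model's weights fit in memory. A rough back-of-envelope check, assuming fp16/bf16 weights (2 bytes per parameter) and reserving ~20% of VRAM for KV cache and activations (both figures are illustrative assumptions, not DeepInfra guidance):

```python
# Rough sizing check: will a model's weights fit on a single GPU?
# Assumes 2 bytes/parameter (fp16/bf16) and a 20% overhead reserve
# for KV cache and activations. Illustrative only.

def fits_on_gpu(params_billions: float, gpu_memory_gb: float,
                bytes_per_param: int = 2, overhead_fraction: float = 0.2) -> bool:
    weights_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
    usable_gb = gpu_memory_gb * (1 - overhead_fraction)
    return weights_gb <= usable_gb

print(fits_on_gpu(8, 80))     # 8B model, A100-80GB: 16 GB <= 64 GB -> True
print(fits_on_gpu(70, 80))    # 70B model, A100-80GB: 140 GB > 64 GB -> False
print(fits_on_gpu(70, 180))   # 70B model, B200-180GB: 140 GB <= 144 GB -> True
```

Models that fail this check on one GPU can still be deployed by sharding across multiple GPUs.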

Pricing model

Unlike shared inference (pay per token), private deployments are billed per GPU-hour. You pay for the time your GPUs are running, regardless of traffic.
Leaving a deployment running by mistake can rack up costs quickly: at an example rate of $2 per GPU-hour, forgetting to shut down a 2-GPU deployment over a weekend (64 hours) costs 2 × 64 × $2 ≈ $256. Always set spending limits in payment settings.
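The billing arithmetic above can be sketched in a few lines. The $2/GPU-hour rate is inferred from the ~$256 weekend figure; actual rates vary by GPU type:

```python
# GPU-hour billing: cost = GPUs running * hours running * hourly rate.
# Rate of $2/GPU-hour is an illustrative assumption.

def deployment_cost(num_gpus: int, hours: float, rate_per_gpu_hour: float) -> float:
    return num_gpus * hours * rate_per_gpu_hour

# The weekend-mistake example: 2 GPUs left up for 64 hours.
print(deployment_cost(2, 64, 2.00))  # -> 256.0
```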

Getting started

  1. Go to Dashboard → Deployments
  2. Click New Deployment
  3. Choose your deployment type (Custom LLM, LoRA, or LoRA Image)
  4. Fill in the configuration and deploy
See the specific guides for each deployment type.