Why run private models?
- Compliance — data stays on dedicated infrastructure, not shared with other users
- Custom weights — deploy fine-tuned or trained-from-scratch models
- Predictable latency — dedicated GPUs mean consistent response times, unaffected by other users' traffic
- Autoscaling — scale from 0 to many instances automatically based on load
- Competitive GPU pricing — some of the lowest per-GPU-hour rates available, with no lock-in
- Simple deployment — up and running in just a couple of clicks from the dashboard
What you can deploy
Custom LLMs
Deploy any Hugging Face LLM on A100/H100/H200/B200/B300 GPUs with the OpenAI-compatible API.
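Because the API is OpenAI-compatible, any existing chat-completions client works unchanged. A minimal sketch of the request body, assuming a hypothetical base URL and deployment name (substitute your own from the dashboard):

```python
import json

# Assumption: placeholder values -- your deployment's actual base URL and
# model identifier are shown in the dashboard after deployment.
BASE_URL = "https://api.example.com/v1"
payload = {
    "model": "my-org/my-finetuned-llm",   # the Hugging Face model you deployed
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)  # POST this to f"{BASE_URL}/chat/completions"
print(body[:20])
```

Any OpenAI SDK can be pointed at the deployment by overriding its base URL, since the request and response shapes follow the chat-completions format.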
LoRA Adapters
Deploy LoRA fine-tuned language models on top of supported base models.
LoRA Image Models
Deploy LoRA adapters for image generation from Civitai.
GPU options
Private model deployments run on:
- A100-80GB — proven workhorse for LLM inference, great value
- H100-80GB — fast and widely supported
- H200-141GB — large HBM3e memory, ideal for big models
- B200-180GB — NVIDIA Blackwell, significantly faster for inference workloads
- B300-288GB — latest NVIDIA Blackwell Ultra, highest performance available
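When picking a GPU, a rough back-of-the-envelope check is whether the model's weights fit in memory. A hedged sketch, assuming fp16/bf16 weights (2 bytes per parameter) and a rough 1.2× overhead factor for KV cache and activations (the factor is an assumption, not a measured value):

```python
def est_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model: weights * dtype size * overhead."""
    return params_billion * bytes_per_param * overhead

# Memory per GPU option, from the list above
gpus = {"A100-80GB": 80, "H100-80GB": 80, "H200-141GB": 141,
        "B200-180GB": 180, "B300-288GB": 288}

need = est_vram_gb(70)  # e.g. a 70B model in bf16 -> ~168 GB
for name, mem_gb in gpus.items():
    print(name, "fits" if need <= mem_gb else "needs multi-GPU")
```

By this estimate a 70B bf16 model needs a B200/B300 (or multiple smaller GPUs); quantized weights shrink the requirement accordingly.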
Pricing model
Unlike shared inference, which is billed per token, private deployments are billed per GPU-hour. You pay for the time your GPUs are running, regardless of traffic.
Getting started
- Go to Dashboard → Deployments
- Click New Deployment
- Choose your deployment type (Custom LLM, LoRA, or LoRA Image)
- Fill in the configuration and deploy
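The per-GPU-hour billing described above makes cost a simple product of rate, GPU count, and uptime. A quick estimate, using a hypothetical hourly rate (check the dashboard for actual pricing):

```python
def deployment_cost(gpu_hourly_rate: float, num_gpus: int, hours: float) -> float:
    # GPU-hour billing: cost depends on running time, not request volume
    return gpu_hourly_rate * num_gpus * hours

# Assumption: $2.50/GPU-hour is an illustrative rate, not a quoted price.
# 2 GPUs running a full ~730-hour month:
print(round(deployment_cost(2.50, 2, 730), 2))
```

Note that autoscaling to zero during idle periods directly reduces the `hours` term, which is the main cost lever for bursty workloads.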