GET /deploy/llm/presets
Deploy LLM Presets
curl --request GET \
  --url 'https://api.deepinfra.com/deploy/llm/presets?hf_repo_id=<string>' \
  --header 'Authorization: Bearer <token>'
[
  {
    "id": "<string>",
    "gpu_configs": [
      "<string>"
    ],
    "source": "deepinfra",
    "engine": "vllm",
    "standard_args": {
      "max_context_size": 5000000,
      "max_concurrent_requests": 512,
      "gpu_memory_fraction": 0.735,
      "max_prefill_tokens": 65792,
      "enable_prefix_caching": true
    },
    "label": ""
  }
]

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

xi-api-key
string | null

x-api-key
string | null

Query Parameters

hf_repo_id
string
required

gpu
enum<string> | null

Available options: L4-24GB, L40S-48GB, A100-80GB, H100-80GB, H200-141GB, B200-180GB, B300-270GB, RTXPRO6000-96GB, other

engine
string | null
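
The query parameters above could be assembled into a request roughly as follows. This is a minimal sketch using only the Python standard library, not an official client: the helper names `build_presets_url` and `fetch_presets` are hypothetical, and the client-side check of `gpu` against the listed options is an assumption about what is convenient, not documented server behavior.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.deepinfra.com/deploy/llm/presets"

# Values copied from the "Available options" list for the gpu parameter.
GPU_OPTIONS = {
    "L4-24GB", "L40S-48GB", "A100-80GB", "H100-80GB", "H200-141GB",
    "B200-180GB", "B300-270GB", "RTXPRO6000-96GB", "other",
}


def build_presets_url(hf_repo_id, gpu=None, engine=None):
    """Return the endpoint URL with the query string attached (hypothetical helper)."""
    if not hf_repo_id:
        raise ValueError("hf_repo_id is required")
    if gpu is not None and gpu not in GPU_OPTIONS:
        raise ValueError(f"unsupported gpu option: {gpu!r}")
    params = {"hf_repo_id": hf_repo_id}
    # Optional parameters are omitted entirely when not set.
    if gpu is not None:
        params["gpu"] = gpu
    if engine is not None:
        params["engine"] = engine
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_presets(token, hf_repo_id, gpu=None, engine=None, timeout=30):
    """Send the GET request with a Bearer token and decode the JSON body."""
    request = urllib.request.Request(
        build_presets_url(hf_repo_id, gpu, engine),
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `build_presets_url("my-org/my-model", gpu="H100-80GB")` percent-encodes the slash in the repository id; `my-org/my-model` is a placeholder, not a real repository.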

Response

Successful Response

id
string
required

Preset id.

gpu_configs
string[]
required

Allowed Nx hardware configs.

source
string
default:deepinfra

Source of this config (e.g. deepinfra).

engine
string
default:vllm

Inference engine the preset was tuned for.

standard_args
StandardArgs · object

Preset engine tuning knobs.

label
string
default:""

Short display name for the preset (e.g. "Throughput-optimized").
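
One way to consume the response is to map each array element onto a small record type that applies the documented defaults. The sketch below is illustrative only: the `Preset` dataclass and `parse_presets` function are hypothetical names, and `standard_args` is kept as a plain dictionary because its fields are only shown in the sample response, not specified here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Preset:
    # Required fields in the response schema.
    id: str
    gpu_configs: list
    # Optional fields, with the defaults listed in the schema.
    source: str = "deepinfra"
    engine: str = "vllm"
    standard_args: dict = field(default_factory=dict)
    label: str = ""


def parse_presets(payload):
    """Parse the JSON array returned by the endpoint into Preset records."""
    presets = []
    for item in json.loads(payload):
        presets.append(Preset(
            id=item["id"],                      # KeyError if a required field is missing
            gpu_configs=list(item["gpu_configs"]),
            source=item.get("source", "deepinfra"),
            engine=item.get("engine", "vllm"),
            standard_args=item.get("standard_args") or {},
            label=item.get("label", ""),
        ))
    return presets
```

Passing the body of the sample response shown earlier yields one `Preset` whose `standard_args["max_concurrent_requests"]` is 512; an element that carries only `id` and `gpu_configs` falls back to the `deepinfra` and `vllm` defaults.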