Deploy LLM Presets

curl --request GET \
  --url 'https://api.deepinfra.com/deploy/llm/presets?hf_repo_id=<string>' \
  --header 'Authorization: Bearer <token>'

[
  {
    "id": "<string>",
    "gpu_configs": [
      "<string>"
    ],
    "source": "deepinfra",
    "engine": "vllm",
    "standard_args": {
      "max_context_size": 5000000,
      "max_concurrent_requests": 512,
      "gpu_memory_fraction": 0.735,
      "max_prefill_tokens": 65792,
      "enable_prefix_caching": true
    },
    "label": ""
  }
]

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Headers

xi-api-key

string | null

x-api-key

string | null

Query Parameters

hf_repo_id

string

required

gpu

enum<string> | null

Available options: L4-24GB, L40S-48GB, A100-80GB, H100-80GB, H200-141GB, B200-180GB, B300-270GB, RTXPRO6000-96GB, other

engine

string | null
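
The request described above can be assembled as follows. This is a minimal sketch using only the Python standard library, not an official client: the helper name build_presets_request, the GPU_OPTIONS set, and the placeholder repository id are assumptions made for illustration. It assumes that optional parameters left as None can simply be omitted from the query string, and it builds the request without sending it.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.deepinfra.com/deploy/llm/presets"

# Values copied from the "Available options" list for the gpu parameter.
GPU_OPTIONS = {
    "L4-24GB", "L40S-48GB", "A100-80GB", "H100-80GB", "H200-141GB",
    "B200-180GB", "B300-270GB", "RTXPRO6000-96GB", "other",
}


def build_presets_request(
    token: str,
    hf_repo_id: str,
    gpu: Optional[str] = None,
    engine: Optional[str] = None,
) -> Request:
    """Build (but do not send) a GET request for the presets endpoint.

    Hypothetical helper: the name and the client-side checks are
    illustrative and not part of the documented API.
    """
    if not hf_repo_id:
        raise ValueError("hf_repo_id is required")
    if gpu is not None and gpu not in GPU_OPTIONS:
        raise ValueError(f"unsupported gpu value: {gpu!r}")

    # Only the required parameter is always sent; optional ones are
    # added when the caller supplies them (an assumption of this sketch).
    params = {"hf_repo_id": hf_repo_id}
    if gpu is not None:
        params["gpu"] = gpu
    if engine is not None:
        params["engine"] = engine

    url = f"{BASE_URL}?{urlencode(params)}"
    headers = {"Authorization": f"Bearer {token}"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder repository id.
    req = build_presets_request("<token>", "my-org/my-model", gpu="H100-80GB")
    print(req.get_method(), req.full_url)
```

To actually send the request, the returned object can be passed to urllib.request.urlopen. Checking gpu against the documented options before sending is a convenience choice here; the server remains the authority on which values it accepts.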

Response

Successful Response

id

string

required

Preset id.

gpu_configs

string[]

required

Allowed Nx hardware configs.

source

string

default:deepinfra

Source of this config (e.g. deepinfra).

engine

string

default:vllm

Inference engine the preset was tuned for.

standard_args

StandardArgs · object

Preset engine tuning knobs.

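Child attributes (as they appear in the example response): max_context_size, max_concurrent_requests, gpu_memory_fraction, max_prefill_tokens, enable_prefix_caching.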

label

string

default:""

Short display name for the preset (e.g. "Throughput-optimized").
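
The response fields above map naturally onto a small record type. The sketch below is one possible way to decode the JSON array, applying the documented defaults for source, engine, and label when a field is absent. The Preset class and parse_presets function are hypothetical names, and treating a missing or null standard_args as an empty dictionary is an assumption of this sketch, since the reference does not state a default for it.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Preset:
    """One element of the response array (illustrative type, not an SDK class)."""
    id: str                          # required
    gpu_configs: List[str]           # required
    source: str = "deepinfra"        # documented default
    engine: str = "vllm"             # documented default
    standard_args: Dict[str, Any] = field(default_factory=dict)
    label: str = ""                  # documented default


def parse_presets(body: str) -> List[Preset]:
    """Decode the JSON response body into a list of Preset records."""
    presets = []
    for item in json.loads(body):
        presets.append(
            Preset(
                # Required fields: a KeyError here signals a malformed response.
                id=item["id"],
                gpu_configs=list(item["gpu_configs"]),
                # Optional fields fall back to the documented defaults.
                source=item.get("source", "deepinfra"),
                engine=item.get("engine", "vllm"),
                # Assumption: a missing or null standard_args becomes {}.
                standard_args=item.get("standard_args") or {},
                label=item.get("label", ""),
            )
        )
    return presets


if __name__ == "__main__":
    # Placeholder payload shaped like the example response; values are made up.
    sample = '[{"id": "example-id", "gpu_configs": ["example-config"]}]'
    for preset in parse_presets(sample):
        print(preset.id, preset.engine, preset.standard_args)
```

Keeping standard_args as a plain dictionary avoids committing to a fixed set of tuning knobs; a stricter client could model it as its own type once the full StandardArgs schema is consulted.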
