Deploy Create LLM

POST /deploy/llm

Example request:
curl --request POST \
  --url https://api.example.com/deploy/llm \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model_name": "<string>",
  "gpu": "L4-24GB",
  "num_gpus": 1,
  "max_batch_size": 96,
  "hf": {
    "repo": "<string>",
    "revision": "<string>",
    "token": "<string>"
  },
  "base_model": "<string>",
  "settings": {
    "min_instances": 1,
    "max_instances": 1
  },
  "extra_args": [
    "<string>"
  ]
}
'

Example response:
{
  "deploy_id": "<string>",
  "model_name": "<string>",
  "version": "<string>",
  "task": "<string>",
  "status": "<string>",
  "fail_reason": "<string>",
  "created_at": "<string>",
  "updated_at": "<string>",
  "type": "legacy",
  "instances": {
    "running": 123,
    "pending": 123
  },
  "config": {
    "gpu": "L4-24GB",
    "num_gpus": 123,
    "max_batch_size": 123,
    "weights": {
      "repo": "<string>",
      "revision": "<string>",
      "token": "<string>"
    }
  },
  "settings": {
    "min_instances": 1,
    "max_instances": 1
  }
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
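The following is a minimal Python sketch that uses only the standard library and builds, but does not send, the same POST /deploy/llm request as the curl example. The token "my-token" and the model name "alice/my-llm" are placeholders, and the payload is trimmed to two fields for illustration.

```python
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "https://api.example.com/deploy/llm"

def build_deploy_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /deploy/llm request with Bearer auth."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # Bearer <token> form described above
            "Content-Type": "application/json",
        },
    )

# Placeholder token and model name, for illustration only.
req = build_deploy_request("my-token", {"model_name": "alice/my-llm", "gpu": "L4-24GB"})
print(req.get_header("Authorization"))  # Bearer my-token
```

To actually send the request, you would pass `req` to `urllib.request.urlopen` and read the JSON body from the response.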

Headers

xi-api-key
string | null

Body

application/json
model_name
string
required

Model name for DeepInfra, in username/model-name format
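A client-side sanity check can catch malformed names before a request is sent. The sketch below is hedged: it assumes the name is exactly one "/" between a non-empty username and model name made of letters, digits, ".", "_" or "-". That character set is a guess, not a rule stated by this API.

```python
import re

# Assumed shape: username/model-name, each part non-empty and limited to
# letters, digits, ".", "_" and "-". The API's real rules may differ.
MODEL_NAME_RE = re.compile(r"[A-Za-z0-9._-]+/[A-Za-z0-9._-]+")

def is_valid_model_name(name: str) -> bool:
    """Return True if name matches the assumed username/model-name shape."""
    return MODEL_NAME_RE.fullmatch(name) is not None

print(is_valid_model_name("alice/my-llm"))  # True
print(is_valid_model_name("my-llm"))        # False
```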

gpu
enum<string>
required

The type of GPU to run the deployment on

Available options:
L4-24GB,
L40S-48GB,
A100-80GB,
H100-80GB,
H200-141GB,
B200-180GB,
B300-270GB,
RTXPRO6000-96GB,
other
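The GPU options above can be mirrored as a constant so that a typo fails locally instead of at the API. In this sketch the option strings are copied from the list. The helper `gpu_memory_gb` assumes that the "-NNGB" suffix means per-GPU memory in gigabytes. The names suggest this, but the reference does not say so.

```python
from typing import Optional

# Values copied from the "Available options" list for the gpu field.
ALLOWED_GPUS = frozenset({
    "L4-24GB", "L40S-48GB", "A100-80GB", "H100-80GB", "H200-141GB",
    "B200-180GB", "B300-270GB", "RTXPRO6000-96GB", "other",
})

def check_gpu(gpu: str) -> str:
    """Raise ValueError if gpu is not one of the documented options."""
    if gpu not in ALLOWED_GPUS:
        raise ValueError(f"unsupported gpu {gpu!r}")
    return gpu

def gpu_memory_gb(gpu: str) -> Optional[int]:
    """Read per-GPU memory from the name suffix (assumed meaning of e.g. '-80GB')."""
    suffix = check_gpu(gpu).rsplit("-", 1)[-1]
    if not suffix.endswith("GB"):
        return None  # "other" carries no memory size
    return int(suffix[:-2])

print(gpu_memory_gb("H100-80GB"))  # 80
```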
num_gpus
integer
default:1

Number of GPUs used by one instance

Required range: 1 <= x <= 8
max_batch_size
integer
default:96

Maximum number of concurrent requests

Required range: 1 <= x <= 256
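The documented defaults and ranges for `num_gpus` and `max_batch_size` translate directly into a small check. The helper name `resource_fields` is hypothetical. It only enforces the ranges stated above and returns the two fields ready to merge into a request body.

```python
def resource_fields(num_gpus: int = 1, max_batch_size: int = 96) -> dict:
    """Return num_gpus and max_batch_size after checking the documented ranges."""
    if not 1 <= num_gpus <= 8:
        raise ValueError("num_gpus must be between 1 and 8")
    if not 1 <= max_batch_size <= 256:
        raise ValueError("max_batch_size must be between 1 and 256")
    return {"num_gpus": num_gpus, "max_batch_size": max_batch_size}

print(resource_fields())  # {'num_gpus': 1, 'max_batch_size': 96}
```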
hf
HFWeights · object

Hugging Face weights to deploy, given as repo, revision, and token (see the request example)
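Here is one way to model the `hf` object on the client side, sketched as a dataclass. The field names come from the request example. Two points are assumptions: that `revision` and `token` are optional, and that unset fields should be omitted rather than sent as null.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class HFWeights:
    """Client-side model of the hf object (repo, revision, token)."""
    repo: str
    revision: Optional[str] = None  # assumed optional
    token: Optional[str] = None     # assumed optional; only needed for private repos

    def to_json(self) -> dict:
        # Assumption: unset optional fields are omitted, not sent as null.
        return {k: v for k, v in asdict(self).items() if v is not None}

print(HFWeights("alice/my-llm", revision="main").to_json())
# {'repo': 'alice/my-llm', 'revision': 'main'}
```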
base_model
string | null

Base public model

settings
ScaleSettings · object

Scaling settings: min_instances and max_instances
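Here is a hedged sketch of the `settings` object as a dataclass. The defaults of 1 reuse the values in the request example, and they may not match the server's real defaults. The check that `min_instances` does not exceed `max_instances` is also an assumption about what the API accepts.

```python
from dataclasses import dataclass, asdict

@dataclass
class ScaleSettings:
    """Hypothetical client-side model of the settings object."""
    min_instances: int = 1  # value from the request example, assumed as default
    max_instances: int = 1  # value from the request example, assumed as default

    def __post_init__(self) -> None:
        # Assumption: the API rejects min_instances > max_instances.
        if self.min_instances > self.max_instances:
            raise ValueError("min_instances must not exceed max_instances")

print(asdict(ScaleSettings(min_instances=1, max_instances=3)))
# {'min_instances': 1, 'max_instances': 3}
```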
extra_args
string[] | null

Extra command-line arguments for custom deployments
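`extra_args` takes a list of strings, not a single string. If your arguments are written as a shell-style line, `shlex.split` converts them with quoting handled correctly. The flags shown are hypothetical placeholders, not real server options.

```python
import shlex

def to_extra_args(command_line: str) -> list[str]:
    """Split a shell-style string into the list of strings extra_args takes."""
    return shlex.split(command_line)

# "--example-flag" and "--label" are hypothetical flags used only for illustration.
args = to_extra_args('--example-flag 8192 --label "two words"')
print(args)  # ['--example-flag', '8192', '--label', 'two words']
```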

Response

Successful Response

deploy_id
string
required

Deployment ID

Example:

"fkj843kjh8"

model_name
string
required

Model ID from Hugging Face

Example:

"google/vit-base-patch16-224"

version
string
required

Model version

Example:

"d8b79b422843bd59d628bf25b01aded94a9ec1a9b917e69fe460df9ff39ec42b"

task
string
required

Task

Example:

"image-classification"

status
string
required

Deployment status

Example:

"deployed"

fail_reason
string
required

Failure reason

Example:

"Initialization failed"

created_at
string
required

Creation timestamp (ISO 8601)

Example:

"2021-08-27T17:19:21+00:00"

updated_at
string
required

Last update timestamp (ISO 8601)

Example:

"2021-08-27T17:19:21+00:00"
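The example timestamps include an explicit UTC offset, so Python's `datetime.fromisoformat` can parse them into timezone-aware values. The sketch below uses the example values above. Note that a trailing "Z" instead of "+00:00" is only accepted by `fromisoformat` from Python 3.11 onward.

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as created_at or updated_at."""
    return datetime.fromisoformat(value)

created = parse_timestamp("2021-08-27T17:19:21+00:00")
updated = parse_timestamp("2021-08-27T17:19:21+00:00")
print(updated - created)  # 0:00:00
```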

type
enum<string>
default:legacy
Available options:
legacy,
llm,
lora,
tts
instances
DeployInstances · object

Current number of running and pending instances

config
DeployLLMConfig · object

Immutable deployment configuration (GPU type, number of GPUs, maximum batch size, and weights)

settings
ScaleSettings · object

Scaling settings (minimum and maximum number of instances)
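As a sketch of how the response fields relate, the function below reduces a response body to a one-line summary built from `deploy_id`, `status`, `instances`, and `settings`. The sample body is trimmed and illustrative. Its instance counts are made up rather than taken from a real deployment.

```python
import json

def summarize_deployment(body: str) -> str:
    """Build a one-line summary from a deploy response body."""
    data = json.loads(body)
    instances = data.get("instances") or {}
    settings = data.get("settings") or {}
    return (
        f"{data['deploy_id']} [{data['status']}] "
        f"running={instances.get('running', 0)} pending={instances.get('pending', 0)} "
        f"min={settings.get('min_instances')} max={settings.get('max_instances')}"
    )

# Trimmed, illustrative response body (instance counts are made up).
sample = json.dumps({
    "deploy_id": "fkj843kjh8",
    "status": "deployed",
    "instances": {"running": 1, "pending": 0},
    "settings": {"min_instances": 1, "max_instances": 1},
})
print(summarize_deployment(sample))
# fkj843kjh8 [deployed] running=1 pending=0 min=1 max=1
```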