Deploy Create Llm - DeepInfra

curl --request POST \
  --url https://api.example.com/deploy/llm \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model_name": "<string>",
  "gpu": "L4-24GB",
  "num_gpus": 1,
  "max_batch_size": 96,
  "hf": {
    "repo": "<string>",
    "revision": "<string>",
    "token": "<string>"
  },
  "base_model": "<string>",
  "settings": {
    "min_instances": 1,
    "max_instances": 1
  },
  "extra_args": [
    "<string>"
  ]
}
'

{
  "deploy_id": "<string>",
  "model_name": "<string>",
  "version": "<string>",
  "task": "<string>",
  "status": "<string>",
  "fail_reason": "<string>",
  "created_at": "<string>",
  "updated_at": "<string>",
  "type": "legacy",
  "instances": {
    "running": 123,
    "pending": 123
  },
  "config": {
    "gpu": "L4-24GB",
    "num_gpus": 123,
    "max_batch_size": 123,
    "weights": {
      "repo": "<string>",
      "revision": "<string>",
      "token": "<string>"
    }
  },
  "settings": {
    "min_instances": 1,
    "max_instances": 1
  }
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
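
As a minimal sketch of the header format described above, the helper below builds the two headers used in the example request. The function name auth_headers is hypothetical and not part of the API.

```python
def auth_headers(token: str) -> dict:
    """Build request headers using the documented 'Bearer <token>' form."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        # Documented form: the word "Bearer", a space, then the token.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Keeping the token out of source code, for example by reading it from an environment variable of your choosing, avoids committing it by accident.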

Headers

xi-api-key

string | null

Body

application/json

model_name

string

required

Model name on DeepInfra, in username/model-name format

gpu

enum<string>

required

The type of GPU the deployment is running on

Available options: L4-24GB, L40S-48GB, A100-80GB, H100-80GB, H200-141GB, B200-180GB, B300-270GB, RTXPRO6000-96GB, other

num_gpus

integer

default:1

Number of GPUs used by one instance

Required range: 1 <= x <= 8

max_batch_size

integer

default:96

Maximum number of concurrent requests

Required range: 1 <= x <= 256

hf

HFWeights · object

Child attributes shown in the request example: repo, revision, token

base_model

string | null

Base public model

settings

ScaleSettings · object

Child attributes shown in the request example: min_instances, max_instances

extra_args

string[] | null

Extra command line arguments for custom deployments
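
The body fields above can be assembled and checked client-side before sending. The following is a sketch only: the GPU options and numeric ranges are copied from the field descriptions, while the function names build_deploy_payload and create_deployment are hypothetical, and omitting unset optional fields is an assumption about how the API treats them.

```python
import json
import urllib.request

# GPU options copied from the "gpu" field description.
GPU_OPTIONS = {
    "L4-24GB", "L40S-48GB", "A100-80GB", "H100-80GB", "H200-141GB",
    "B200-180GB", "B300-270GB", "RTXPRO6000-96GB", "other",
}


def build_deploy_payload(model_name, gpu, num_gpus=1, max_batch_size=96,
                         hf=None, base_model=None, settings=None,
                         extra_args=None):
    """Check the documented constraints and return the request body as a dict."""
    if gpu not in GPU_OPTIONS:
        raise ValueError(f"unsupported gpu: {gpu!r}")
    if not 1 <= num_gpus <= 8:
        raise ValueError("num_gpus must be between 1 and 8")
    if not 1 <= max_batch_size <= 256:
        raise ValueError("max_batch_size must be between 1 and 256")
    payload = {
        "model_name": model_name,
        "gpu": gpu,
        "num_gpus": num_gpus,
        "max_batch_size": max_batch_size,
    }
    # Assumption: optional fields can be left out when they are not set.
    optional = {"hf": hf, "base_model": base_model,
                "settings": settings, "extra_args": extra_args}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def create_deployment(url, token, payload):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, build_deploy_payload("username/model-name", "L4-24GB", settings={"min_instances": 1, "max_instances": 1}) produces a body that can be passed to create_deployment together with the endpoint URL from the example request.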

Response

Successful Response

deploy_id

string

required

Deploy Id

Example:

"fkj843kjh8"

model_name

string

required

Model ID from Hugging Face

Example:

"google/vit-base-patch16-224"

version

string

required

Model version

Example:

"d8b79b422843bd59d628bf25b01aded94a9ec1a9b917e69fe460df9ff39ec42b"

task

string

required

Task

Example:

"image-classification"

status

string

required

Status

Example:

"deployed"

fail_reason

string

required

Failure reason

Example:

"Initialization failed"

created_at

string

required

Created at

Example:

"2021-08-27T17:19:21+00:00"

updated_at

string

required

Updated at

Example:

"2021-08-27T17:19:21+00:00"

type

enum<string>

default:legacy

Available options: legacy, llm, lora, tts

instances

DeployInstances · object

Details about number of instances running right now

Child attributes shown in the response example: running, pending

config

DeployLLMConfig · object

Immutable deploy configuration

Child attributes shown in the response example: gpu, num_gpus, max_batch_size, weights

settings

ScaleSettings · object

Scale Settings

Child attributes shown in the response example: min_instances, max_instances
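
To illustrate how the response fields fit together, the sketch below turns a decoded response into a one-line summary. The function name summarize_deployment is hypothetical, and treating an empty fail_reason as "no failure" is an assumption, since the reference only gives an example of a non-empty value.

```python
def summarize_deployment(deploy: dict) -> str:
    """Return a short human-readable summary of a deployment response."""
    instances = deploy.get("instances") or {}
    running = instances.get("running", 0)
    pending = instances.get("pending", 0)
    line = (
        f"{deploy['deploy_id']} ({deploy['model_name']}): "
        f"{deploy['status']}, {running} running, {pending} pending"
    )
    # Assumption: an empty fail_reason means nothing has failed.
    if deploy.get("fail_reason"):
        line += f", failure: {deploy['fail_reason']}"
    return line
```

The required fields (deploy_id, model_name, status) are read directly so that a malformed response raises KeyError, while the optional instances object falls back to zero counts.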
