Batch Endpoints - DeepInfra

An OpenAI-compatible Batch API for submitting and managing asynchronous inference jobs. All endpoints are relative to:

https://api.deepinfra.com/v1/openai

These endpoints operate on the Batch object.

Create a batch

POST /batches

Creates and starts executing a batch job from an uploaded file of requests. Creating a batch requires the following parameters:

completion_window

The time window after which the batch expires. Currently only "24h" is supported.

endpoint

The endpoint to run the batch against. One of the batch-supported endpoints, currently /v1/chat/completions, /v1/completions, or /v1/embeddings. Must match the url used on every line of the input file.

input_file_id

The id of the uploaded input file (created with purpose="batch").

metadata (optional)

Up to 16 key–value string pairs, where each key is a string of up to 64 characters and each value is a string of up to 512 characters.

output_expires_after (optional)

Controls how long the output and error files remain available. An object with two fields:

anchor (optional) — must be "created_at". The expiry is measured from when the output file is created. Defaults to "created_at".
seconds (optional) — the number of seconds the file stays available after the anchor. An integer between 3600 (1 hour) and 2592000 (30 days). Defaults to 2592000 (30 days).

This endpoint returns a Batch object.

from openai import OpenAI

client = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

batch = client.batches.create(
    input_file_id="file_abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "nightly eval run"},
    output_expires_after={"anchor": "created_at", "seconds": 604800},
)
print(batch.id, batch.status)

{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1711471533,
  "in_progress_at": null,
  "expires_at": null,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": null,
  "metadata": {
    "description": "nightly eval run"
  },
  "model": null,
  "usage": null
}

Retrieve a batch

GET /batches/{batch_id}

Returns information about a specific batch. Retrieving a batch requires the following parameters:

batch_id

The id of the batch to retrieve.

This endpoint returns a Batch object.

batch = client.batches.retrieve("batch_abc123")
print(batch.id, batch.status)

{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "in_progress",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1711471533,
  "in_progress_at": 1711471538,
  "expires_at": null,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 100,
    "completed": 40,
    "failed": 1
  },
  "metadata": {
    "description": "nightly eval run"
  },
  "model": "deepseek-ai/DeepSeek-V3",
  "usage": {
    "input_tokens": 4800,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 3200,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 8000
  }
}

List batches

GET /batches

Listing batches requires the following parameters:

after (optional)

A pagination cursor. The returned list starts from the object right after the batch with this id. If omitted, the list starts from the first batch.

limit (optional)

An integer between 1 and 100. The returned list will have at most limit elements. Defaults to 20.

The returned object has the following fields:

Field	Type	Description
`object`	string	The object type, always `"list"`.
`data`	array	A list of Batch objects.
`first_id`	string	The `id` of the first batch in the list.
`last_id`	string	The `id` of the last batch in the list.
`has_more`	boolean	`true` if there are more batches after `last_id`.

batches = client.batches.list(limit=20)
for batch in batches.data:
    print(batch.id, batch.status)

{
  "object": "list",
  "data": [
    {
      "id": "batch_abc123",
      "object": "batch",
      "endpoint": "/v1/chat/completions",
      "errors": null,
      "input_file_id": "file_abc123",
      "completion_window": "24h",
      "status": "completed",
      "output_file_id": "file_out456",
      "error_file_id": null,
      "created_at": 1711471533,
      "in_progress_at": 1711471538,
      "expires_at": null,
      "finalizing_at": 1711475133,
      "completed_at": 1711475134,
      "failed_at": null,
      "expired_at": null,
      "cancelling_at": null,
      "cancelled_at": null,
      "request_counts": {
        "total": 100,
        "completed": 100,
        "failed": 0
      },
      "metadata": {
        "description": "nightly eval run"
      },
      "model": "deepseek-ai/DeepSeek-V3",
      "usage": {
        "input_tokens": 12000,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 8000,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 20000
      }
    },
    {
      "id": "batch_def456",
      "object": "batch",
      "endpoint": "/v1/embeddings",
      "errors": null,
      "input_file_id": "file_def456",
      "completion_window": "24h",
      "status": "in_progress",
      "output_file_id": null,
      "error_file_id": null,
      "created_at": 1711558000,
      "in_progress_at": 1711558005,
      "expires_at": null,
      "finalizing_at": null,
      "completed_at": null,
      "failed_at": null,
      "expired_at": null,
      "cancelling_at": null,
      "cancelled_at": null,
      "request_counts": {
        "total": 5000,
        "completed": 1200,
        "failed": 0
      },
      "metadata": null,
      "model": "Qwen/Qwen3-Embedding-8B",
      "usage": {
        "input_tokens": 60000,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 0,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 60000
      }
    }
  ],
  "first_id": "batch_abc123",
  "last_id": "batch_def456",
  "has_more": false
}

Cancel a batch

POST /batches/{batch_id}/cancel

Cancels a given batch. The batch moves to the cancelling status until it is finalized, at which point its status becomes cancelled. Requests that managed to finish are written to the output file, and the rest are written to the error file as cancelled. Cancelling a batch requires the following parameters:

batch_id

The id of the batch to cancel.

This endpoint returns a Batch object.

batch = client.batches.cancel("batch_abc123")
print(batch.status)

{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "cancelled",
  "output_file_id": "file_out456",
  "error_file_id": "file_err789",
  "created_at": 1711471533,
  "in_progress_at": 1711471538,
  "expires_at": null,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": 1711472000,
  "cancelled_at": 1711472050,
  "request_counts": {
    "total": 100,
    "completed": 40,
    "failed": 0
  },
  "metadata": {
    "description": "nightly eval run"
  },
  "model": "deepseek-ai/DeepSeek-V3",
  "usage": {
    "input_tokens": 4800,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 3200,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 8000
  }
}

​Create a batch

​Retrieve a batch

​List batches

​Cancel a batch

Create a batch

Retrieve a batch

List batches

Cancel a batch