An OpenAI-compatible Batch API for submitting and managing asynchronous inference jobs. All endpoints are relative to:
https://api.deepinfra.com/v1/openai
These endpoints operate on the Batch object.

Create a batch

POST /batches
Creates and starts executing a batch job from an uploaded file of requests. Creating a batch takes the following parameters:
completion_window: The time window after which the batch expires. Currently only "24h" is supported.
endpoint: The endpoint to run the batch against. One of the batch-supported endpoints, currently /v1/chat/completions, /v1/completions, or /v1/embeddings. Must match the url used on every line of the input file.
input_file_id: The id of the uploaded input file (created with purpose="batch").
metadata: Up to 16 key-value string pairs, where each key is a string of up to 64 characters and each value is a string of up to 512 characters.
output_expires_after: Controls how long the output and error files remain available. An object with two fields:
  • anchor (optional): must be "created_at". The expiry is measured from when the output file is created. Defaults to "created_at".
  • seconds (optional): the number of seconds the file stays available after the anchor. An integer between 3600 (1 hour) and 2592000 (30 days). Defaults to 2592000 (30 days).
This endpoint returns a Batch object.
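
The limits stated above for completion_window, metadata, and output_expires_after can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; the helper name validate_create_params is hypothetical and is not part of the API or of any SDK, and the server remains the authority on what it accepts.

```python
def validate_create_params(metadata=None, output_expires_after=None,
                           completion_window="24h"):
    """Return a list of problems found; an empty list means no problem was detected.

    Hypothetical client-side helper that mirrors the documented limits.
    """
    errors = []

    # Only "24h" is currently documented as supported.
    if completion_window != "24h":
        errors.append('completion_window must be "24h"')

    if metadata is not None:
        # At most 16 pairs, keys up to 64 characters, values up to 512.
        if len(metadata) > 16:
            errors.append("metadata has more than 16 pairs")
        for key, value in metadata.items():
            if not isinstance(key, str) or len(key) > 64:
                errors.append(f"metadata key {key!r} is not a string of at most 64 characters")
            if not isinstance(value, str) or len(value) > 512:
                errors.append(f"metadata value for {key!r} is not a string of at most 512 characters")

    if output_expires_after is not None:
        # Both fields are optional and fall back to the documented defaults.
        anchor = output_expires_after.get("anchor", "created_at")
        seconds = output_expires_after.get("seconds", 2592000)
        if anchor != "created_at":
            errors.append('output_expires_after.anchor must be "created_at"')
        if (isinstance(seconds, bool) or not isinstance(seconds, int)
                or not 3600 <= seconds <= 2592000):
            errors.append("output_expires_after.seconds must be an integer from 3600 to 2592000")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once.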
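import os

from openai import OpenAI

client = OpenAI(
    # Python does not expand "$DEEPINFRA_TOKEN" inside a string; read it from the environment.
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)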

batch = client.batches.create(
    input_file_id="file_abc123",
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"description": "nightly eval run"},
    output_expires_after={"anchor": "created_at", "seconds": 604800},
)
print(batch.id, batch.status)
{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "validating",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1711471533,
  "in_progress_at": null,
  "expires_at": null,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": null,
  "metadata": {
    "description": "nightly eval run"
  },
  "model": null,
  "usage": null
}
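
Because endpoint must match the url used on every line of the input file, it can help to check the file locally before uploading it. The sketch below assumes the input file is JSON Lines and that each line is a JSON object with a top-level "url" field; the function name mismatched_lines is hypothetical.

```python
import json


def mismatched_lines(jsonl_text: str, endpoint: str) -> list[int]:
    """Return the 1-based line numbers whose "url" differs from endpoint.

    Assumes one JSON object per non-empty line, each with a "url" field.
    """
    bad = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        request = json.loads(line)
        if request.get("url") != endpoint:
            bad.append(lineno)
    return bad
```

An empty result means every request line targets the same endpoint that will be passed when creating the batch.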

Retrieve a batch

GET /batches/{batch_id}
Returns information about a specific batch. Retrieving a batch requires the following parameter:
batch_id: The id of the batch to retrieve.
This endpoint returns a Batch object.
batch = client.batches.retrieve("batch_abc123")
print(batch.id, batch.status)
{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "in_progress",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1711471533,
  "in_progress_at": 1711471538,
  "expires_at": null,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 100,
    "completed": 40,
    "failed": 1
  },
  "metadata": {
    "description": "nightly eval run"
  },
  "model": "deepseek-ai/DeepSeek-V3",
  "usage": {
    "input_tokens": 4800,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 3200,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 8000
  }
}
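
Since batches run asynchronously, a common pattern is to retrieve the batch repeatedly until it reaches a final status. The following is a sketch only. The set of terminal statuses is an assumption inferred from the completed_at, failed_at, expired_at, and cancelled_at fields of the Batch object, and wait_for_batch, poll_seconds, and timeout_seconds are hypothetical names.

```python
import time

# Assumed terminal statuses, inferred from the Batch object's timestamp fields.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, poll_seconds=30.0,
                   timeout_seconds=86400.0, sleep=time.sleep):
    """Call retrieve(batch_id) until the batch reaches a terminal status.

    retrieve is any callable that returns an object with a .status attribute,
    for example client.batches.retrieve.
    """
    waited = 0.0
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        if waited >= timeout_seconds:
            raise TimeoutError(f"batch {batch_id} still {batch.status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the client from the earlier example, this could be called as wait_for_batch(client.batches.retrieve, "batch_abc123"). Passing the retrieve function in as an argument keeps the loop independent of the SDK.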

List batches

GET /batches
Returns a paginated list of batches. Listing batches accepts the following optional parameters:
after: A pagination cursor. The returned list starts from the object right after the batch with this id. If omitted, the list starts from the first batch.
limit: An integer between 1 and 100. The returned list will have at most limit elements. Defaults to 20.
The returned object has the following fields:
Field      Type      Description
object     string    The object type, always "list".
data       array     A list of Batch objects.
first_id   string    The id of the first batch in the list.
last_id    string    The id of the last batch in the list.
has_more   boolean   true if there are more batches after last_id.
batches = client.batches.list(limit=20)
for batch in batches.data:
    print(batch.id, batch.status)
{
  "object": "list",
  "data": [
    {
      "id": "batch_abc123",
      "object": "batch",
      "endpoint": "/v1/chat/completions",
      "errors": null,
      "input_file_id": "file_abc123",
      "completion_window": "24h",
      "status": "completed",
      "output_file_id": "file_out456",
      "error_file_id": null,
      "created_at": 1711471533,
      "in_progress_at": 1711471538,
      "expires_at": null,
      "finalizing_at": 1711475133,
      "completed_at": 1711475134,
      "failed_at": null,
      "expired_at": null,
      "cancelling_at": null,
      "cancelled_at": null,
      "request_counts": {
        "total": 100,
        "completed": 100,
        "failed": 0
      },
      "metadata": {
        "description": "nightly eval run"
      },
      "model": "deepseek-ai/DeepSeek-V3",
      "usage": {
        "input_tokens": 12000,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 8000,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 20000
      }
    },
    {
      "id": "batch_def456",
      "object": "batch",
      "endpoint": "/v1/embeddings",
      "errors": null,
      "input_file_id": "file_def456",
      "completion_window": "24h",
      "status": "in_progress",
      "output_file_id": null,
      "error_file_id": null,
      "created_at": 1711558000,
      "in_progress_at": 1711558005,
      "expires_at": null,
      "finalizing_at": null,
      "completed_at": null,
      "failed_at": null,
      "expired_at": null,
      "cancelling_at": null,
      "cancelled_at": null,
      "request_counts": {
        "total": 5000,
        "completed": 1200,
        "failed": 0
      },
      "metadata": null,
      "model": "Qwen/Qwen3-Embedding-8B",
      "usage": {
        "input_tokens": 60000,
        "input_tokens_details": {
          "cached_tokens": 0
        },
        "output_tokens": 0,
        "output_tokens_details": {
          "reasoning_tokens": 0
        },
        "total_tokens": 60000
      }
    }
  ],
  "first_id": "batch_abc123",
  "last_id": "batch_def456",
  "has_more": false
}
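
The has_more and last_id fields make it possible to walk through every page of results. The sketch below works on response bodies shaped like the JSON above and leaves the HTTP call to the caller. The names iter_batches and fetch_page are hypothetical, and the cursor is called after here on the assumption that the endpoint follows OpenAI's parameter naming.

```python
from typing import Callable, Iterator, Optional


def iter_batches(fetch_page: Callable[[Optional[str], int], dict],
                 limit: int = 100) -> Iterator[dict]:
    """Yield every batch across all pages.

    fetch_page(after, limit) must return one list response as a dict with
    "data", "last_id", and "has_more" keys. after is None for the first page.
    """
    after = None
    while True:
        page = fetch_page(after, limit)
        yield from page["data"]
        # Stop when the server reports no further pages, or on an empty page
        # to avoid looping forever on a malformed response.
        if not page["has_more"] or not page["data"]:
            return
        after = page["last_id"]
```

Each new page is requested with the previous page's last_id as the cursor, so no batch is fetched twice.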

Cancel a batch

POST /batches/{batch_id}/cancel
Cancels a given batch. The batch moves to the cancelling status until it is finalized, at which point its status becomes cancelled. Requests that finished before the cancellation are written to the output file, and the rest are written to the error file as cancelled. Cancelling a batch requires the following parameter:
batch_id: The id of the batch to cancel.
This endpoint returns a Batch object.
batch = client.batches.cancel("batch_abc123")
print(batch.status)
{
  "id": "batch_abc123",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "errors": null,
  "input_file_id": "file_abc123",
  "completion_window": "24h",
  "status": "cancelled",
  "output_file_id": "file_out456",
  "error_file_id": "file_err789",
  "created_at": 1711471533,
  "in_progress_at": 1711471538,
  "expires_at": null,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": 1711472000,
  "cancelled_at": 1711472050,
  "request_counts": {
    "total": 100,
    "completed": 40,
    "failed": 0
  },
  "metadata": {
    "description": "nightly eval run"
  },
  "model": "deepseek-ai/DeepSeek-V3",
  "usage": {
    "input_tokens": 4800,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 3200,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 8000
  }
}
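
After a cancellation, request_counts gives a rough idea of how many requests never ran. The sketch below assumes that every request is counted exactly once as completed, failed, or unfinished, which the source does not state explicitly; the function name unfinished_request_count is hypothetical.

```python
def unfinished_request_count(request_counts: dict) -> int:
    """Estimate how many requests were neither completed nor failed.

    Assumes total = completed + failed + unfinished for a cancelled batch.
    """
    return (request_counts["total"]
            - request_counts["completed"]
            - request_counts["failed"])
```

For the response above (total 100, completed 40, failed 0) this gives 60, the number of requests expected in the error file as cancelled under that assumption.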