base_url and api_key, plus using a model from our catalog.
Any OpenAI-compatible model available for real-time inference on a batch-supported endpoint can also be used in batch.
Supported endpoints
A batch runs all of its requests against a single endpoint. The supported endpoints are:/v1/chat/completions/v1/completions/v1/embeddings
Workflow
Step 1 — Prepare the input file
The input is a JSONL file with one request per line. Each line has acustom_id, the HTTP method (POST), the url (which must match the batch endpoint), and the request body — the exact JSON you’d send to that endpoint in real time.
custom_idmust be unique across all requests in a file. It’s how you match results back to requests, since output order isn’t guaranteed.methodmust bePOST.urlmust equal the batch’sendpoint— you can’t mix endpoints in one file.- All requests must use the same model.
- Use a DeepInfra model id (e.g.
deepseek-ai/DeepSeek-V3), not an OpenAI model name. bodymust match the request format of the corresponding endpoint — it’s the exact JSON you’d send to that endpoint in real time. See Chat Completions, Text Completions, or Embeddings.
Step 2 — Upload the file
Upload the JSONL file withpurpose="batch". In return you get a FileObject containing the id of the uploaded file. For more information on how to upload a file, see Create file.
Step 3 — Create the batch
Create the batch job from the uploaded input file, choosing theendpoint to run against. The batch starts executing as soon as it’s created. For the exact details of creating a batch request, see Create a batch.
Step 4 — Check status
A batch runs asynchronously, so after creating it you can check its status to know when the results are ready. Retrieve the batch periodically and watch itsstatus until it reaches a terminal state — completed, failed, expired, or cancelled. For details on retrieving a batch, see Retrieve a batch.
usage and request_counts fields when checking status.
Once the batch reaches a terminal state, the output and error files will be available, if they contain any information.
Step 5 — Download the results
Once the batch reaches a terminal state, you can get either theerrors field, or the output and error files (output_file_id and error_file_id) from the Batch object, depending on the state.
You can download the output and error files using the Files API.
custom_id from the input so you can match it back to your request. Successful lines have a response; failed lines have an error. The response has a body field with the same format that the real-time API would return. The error has code and message fields that better describe why the line failed.
Cancel a batch
You can cancel a batch at any point while its status is non-terminal. Cancelling a batch puts it in thecancelling status for some time, after which it moves to cancelled.
Pricing
Batch requests are billed at 20% less than the corresponding real-time price for the same model and endpoint. The discount is applied automatically — there’s nothing extra to configure.Rate limits
Batch requests and usage do not affect your real-time rate limits. There are additional batch-related rate limits:| Limit | Value |
|---|---|
| Requests (lines) per file | 50,000 |
| Input file size | 200 MB |
| Embedding inputs per file | 50,000 |
| Concurrent batches per user | 100 |