Webhooks are a feature of the DeepInfra Native API; they are not supported by the OpenAI-compatible API. With a webhook, you submit an inference request and receive the result through an HTTP callback instead of waiting for a synchronous response. This is useful for long-running requests and fire-and-forget workloads.

How it works

Add a webhook parameter containing your callback URL to the request body. The API responds immediately with the status "queued", then calls your webhook URL with the result once inference completes.

Text generation example

import { TextGeneration } from "deepinfra";

const client = new TextGeneration(
  "https://api.deepinfra.com/v1/inference/deepseek-ai/DeepSeek-V3",
  process.env.DEEPINFRA_TOKEN // read the API token from the environment
);

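const res = await client.generate({
  // This prompt follows the Llama 3 chat template. DeepSeek-V3 uses a different
  // template, so adapt the prompt and stop sequence to the model you call.
  input: "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  stop: ["<|eot_id|>"],
  // Callback URL that receives the result once inference completes
  webhook: "https://your-app.com/deepinfra-webhook"
});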

console.log(res.inference_status.status); // "queued"

Embeddings example

curl "https://api.deepinfra.com/v1/inference/Qwen/Qwen3-Embedding-8B" \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
   -d '{
     "inputs": ["I like chocolate"],
     "webhook": "https://your-app.com/deepinfra-webhook"
   }'
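If you call the Native API without the SDK, the curl request above translates directly to JavaScript. The following is a minimal sketch, assuming Node.js 18 or later for the built-in fetch; submitWithWebhook is a hypothetical helper name, not part of the deepinfra package. It simply adds the webhook field to whatever input body the model expects and returns the immediate acknowledgement.

```javascript
// Hypothetical helper (not part of the deepinfra package). Assumes Node.js 18+,
// where fetch is available globally. Mirrors the curl example on this page.
async function submitWithWebhook(modelUrl, token, body, webhookUrl) {
  const response = await fetch(modelUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    // The webhook URL travels in the JSON body next to the model inputs.
    body: JSON.stringify({ ...body, webhook: webhookUrl }),
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  // This response only acknowledges the job; the result arrives at the webhook later.
  return response.json();
}

// Usage (commented out so nothing is sent when this file is loaded):
// const ack = await submitWithWebhook(
//   "https://api.deepinfra.com/v1/inference/Qwen/Qwen3-Embedding-8B",
//   process.env.DEEPINFRA_TOKEN,
//   { inputs: ["I like chocolate"] },
//   "https://your-app.com/deepinfra-webhook"
// );
// console.log(ack.inference_status.status); // per this page, expected to be "queued"
```

The helper throws on a non-2xx response so that a rejected submission (for example, a bad token) is not mistaken for a queued job.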

Webhook payload

On success, your endpoint receives:
{
    "request_id": "R7X9fdlIaF5GlVisBAi5xR3E",
    "inference_status": {
        "status": "succeeded",
        "runtime_ms": 228,
        "cost": 0.0001140000022132881
    },
    "results": { ... }
}
On failure:
{
    "request_id": "RHNShFanUP5ExA8rzgyDWH88",
    "inference_status": {
        "status": "failed",
        "runtime_ms": 0,
        "cost": 0.0
    }
}
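A handler usually needs to tell these two payload shapes apart. The sketch below relies only on the fields shown in the two examples above and returns a small summary object. summarizeWebhookPayload is a hypothetical name, and because the structure of results depends on the model, it is passed through untouched.

```javascript
// Hypothetical helper: summarizes a webhook payload using only the fields
// shown in the success and failure examples on this page.
function summarizeWebhookPayload(payload) {
  if (!payload || typeof payload.request_id !== "string" || !payload.inference_status) {
    throw new Error("Unrecognized webhook payload");
  }
  const { request_id: requestId, inference_status: info } = payload;
  if (info.status === "succeeded") {
    // results has a model-specific shape, so it is returned as-is.
    return { requestId, ok: true, runtimeMs: info.runtime_ms, cost: info.cost, results: payload.results };
  }
  // The failure example carries no results, only the status, runtime, and cost.
  return { requestId, ok: false, status: info.status, cost: info.cost };
}
```

For example, passing the failure payload above returns { requestId: "RHNShFanUP5ExA8rzgyDWH88", ok: false, status: "failed", cost: 0 }.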

Retry behavior

DeepInfra makes a few retry attempts if your webhook endpoint responds with a 4xx or 5xx status code, so return a 2xx status once your endpoint has accepted the payload.
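Since deliveries answered with a 4xx or 5xx status are retried, an endpoint that fails after partly handling a payload may see the same result again. Below is a minimal sketch of a receiver using Node's built-in http module that acknowledges each delivery and skips duplicates. It assumes, without confirmation from this page, that the callback arrives as a JSON request body and that a retried delivery carries the same request_id. handleDelivery, createWebhookServer, and the in-memory seenRequestIds set are hypothetical names; the sketch does not check the HTTP method or path, and a production service would persist processed IDs rather than keep them in memory.

```javascript
const http = require("node:http");

// Hypothetical in-memory record of processed request IDs. A real service
// would persist these so that a restart does not forget them.
const seenRequestIds = new Set();

// Returns the HTTP status to send back. Assumption: a retried delivery
// carries the same request_id as the original attempt.
function handleDelivery(payload, onResult) {
  if (!payload || typeof payload.request_id !== "string") {
    return 400; // not a payload this handler recognizes
  }
  if (seenRequestIds.has(payload.request_id)) {
    return 200; // duplicate delivery: acknowledge without reprocessing
  }
  onResult(payload); // if this throws, the ID is not recorded, so a retry is processed again
  seenRequestIds.add(payload.request_id);
  return 200;
}

// Assumption: DeepInfra sends the payload as a JSON request body.
function createWebhookServer(onResult) {
  return http.createServer((req, res) => {
    let raw = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => {
      raw += chunk;
    });
    req.on("end", () => {
      let payload;
      try {
        payload = JSON.parse(raw);
      } catch {
        res.writeHead(400).end(); // malformed body
        return;
      }
      try {
        res.writeHead(handleDelivery(payload, onResult)).end();
      } catch (err) {
        console.error(err);
        res.writeHead(500).end(); // processing failed; a 5xx response prompts a retry
      }
    });
  });
}

// Usage (the port is arbitrary):
// createWebhookServer((p) => console.log(p.request_id, p.inference_status.status)).listen(8080);
```

Recording the ID only after onResult succeeds means a failed attempt is answered with a 5xx and processed again on retry, while a successful one is processed exactly once per running process.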