Whisper is OpenAI’s speech recognition model. Given an audio file, it returns the transcribed text with per-sentence (segment-level) timestamps. DeepInfra hosts multiple Whisper variants; browse all speech recognition models for the complete selection.

Models

| Model                              | Notes               |
| ---------------------------------- | ------------------- |
| openai/whisper-large               | Best accuracy       |
| openai/whisper-medium              | Balanced            |
| openai/whisper-small               | Fast                |
| openai/whisper-base                | Smallest            |
| openai/whisper-timestamped-medium  | Per-word timestamps |
By default, Whisper segments its output per sentence and timestamps each segment. The whisper-timestamped variants additionally provide per-word timestamps.
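As a sketch of how per-word output could be consumed: the helper below assumes each segment from a whisper-timestamped model carries a `words` array of `{ text, start, end }` objects (this shape is an assumption mirroring whisper-timestamped's usual JSON output, not confirmed by this page — check the model's documentation):

```javascript
// Illustrative helper, not part of the deepinfra SDK. The `words` field and
// its { text, start, end } shape are assumptions to verify against the model docs.
function listWords(segments) {
  return segments.flatMap((seg) =>
    (seg.words ?? []).map((w) => `${w.start.toFixed(2)}s ${w.text}`)
  );
}

// Example with a hand-written segment:
console.log(listWords([{ words: [{ text: "hello", start: 0.12, end: 0.4 }] }]));
```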

Example

import { AutomaticSpeechRecognition } from "deepinfra";
import path from "path";
import { fileURLToPath } from "url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Read the API token from the environment rather than hard-coding it.
const client = new AutomaticSpeechRecognition(
  "openai/whisper-large",
  process.env.DEEPINFRA_TOKEN
);

// Send the local audio file for transcription.
const response = await client.generate({
  audio: path.join(__dirname, "audio.mp3"),
});

console.log(response.text);
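Besides `text`, the response should carry the timestamped segments described above. A minimal sketch, assuming the response mirrors Whisper's standard output (a `segments` array with `start`, `end`, and `text` fields — an assumption to verify against the model page):

```javascript
// Illustrative formatter; the segment fields (start, end, text) are assumed
// to match Whisper's usual JSON output and are not confirmed by this page.
function formatSegments(segments) {
  return segments.map(
    (s) => `[${s.start.toFixed(2)} -> ${s.end.toFixed(2)}] ${s.text.trim()}`
  );
}

// With a real response this would be: formatSegments(response.segments).join("\n")
console.log(formatSegments([{ start: 0, end: 2.5, text: " Hello world." }]));
```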

Supported formats

  • mp3
  • wav
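Since only these two formats are listed, a quick client-side check lets you fail fast before uploading. This is a small hand-rolled guard, not part of the deepinfra SDK:

```javascript
// Hand-rolled helper (not part of the SDK): accept only the listed formats.
const SUPPORTED_FORMATS = new Set([".mp3", ".wav"]);

function isSupportedAudio(filename) {
  const dot = filename.lastIndexOf(".");
  return dot !== -1 && SUPPORTED_FORMATS.has(filename.slice(dot).toLowerCase());
}

console.log(isSupportedAudio("audio.mp3")); // true
console.log(isSupportedAudio("audio.ogg")); // false
```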

Additional parameters

Each Whisper variant supports additional parameters such as language and task (transcribe vs. translate); check the model’s documentation page for the full list.
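As a hedged sketch, assuming `generate` accepts these parameters alongside `audio` (the names come from the prose above, but the accepted values are assumptions — verify them on the model page), a French-to-English translation request might be built like this:

```javascript
// Sketch only: `language` and `task` are named on this page, but the exact
// accepted values are assumptions - check the model's documentation page.
const params = {
  audio: "audio.mp3",
  language: "fr",    // assumed: source-language hint
  task: "translate", // assumed: "transcribe" (default) or "translate"
};

// With the client from the example above, this would be passed as:
// const response = await client.generate(params);
console.log(Object.keys(params));
```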