A general-purpose speech recognition model by OpenAI.
Whisper is a general-purpose speech recognition model developed by OpenAI. It is a Transformer sequence-to-sequence model trained on a large, diverse audio dataset to perform several speech-processing tasks: multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, which allows a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
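As a rough illustration of the multitask format described above, the sketch below assembles the kind of special-token prefix the decoder is conditioned on. The token spellings follow those published with Whisper (`<|startoftranscript|>`, a language tag such as `<|en|>`, `<|transcribe|>` or `<|translate|>`, and `<|notimestamps|>`). The helper `build_task_prefix` is hypothetical: it works on plain strings rather than real token ids and is not part of the library.

```python
from typing import List


def build_task_prefix(language: str, task: str = "transcribe",
                      timestamps: bool = True) -> List[str]:
    """Return an illustrative special-token prefix as a list of strings.

    Hypothetical helper: the real tokenizer maps such tokens to integer ids.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")

    # Start marker, then the language tag, then the task specifier.
    prefix = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask the decoder to emit text without timestamp tokens.
        prefix.append("<|notimestamps|>")
    return prefix


if __name__ == "__main__":
    print(build_task_prefix("en"))
    print(build_task_prefix("fr", task="translate", timestamps=False))
```

Switching the task specifier or the language tag changes what the same decoder is asked to produce, which is how one model can cover several tasks.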
- Transcribing audio files to text
- Translating speech from other languages into English
- Identifying the language spoken in an audio file
Whisper can be used via the command line or within Python. For command-line usage, you can transcribe speech in audio files by specifying the audio file and model size. For Python usage, you can load the model and use the transcribe() method to process audio files.
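A minimal sketch of both entry points, assuming the `openai-whisper` package and `ffmpeg` are installed and that `audio.mp3` is a placeholder file name. `whisper.load_model` and `model.transcribe` are the library calls referred to above; `build_cli_command` and `transcribe_file` are hypothetical wrappers added only for illustration.

```python
from typing import List


def build_cli_command(audio_path: str, model_size: str = "base") -> List[str]:
    """Build the argument list for the `whisper` command-line tool."""
    return ["whisper", audio_path, "--model", model_size]


def transcribe_file(audio_path: str, model_size: str = "base") -> str:
    """Load a Whisper model and return the transcribed text."""
    import whisper  # imported lazily so build_cli_command works without it

    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"]


if __name__ == "__main__":
    # Equivalent shell command: whisper audio.mp3 --model base
    print(" ".join(build_cli_command("audio.mp3")))
    # Uncomment once a real audio file is available:
    # print(transcribe_file("audio.mp3"))
```

Passing `task="translate"` to `transcribe()` requests an English translation instead of a transcript, and the returned dictionary also includes a `language` entry with the detected language.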