Which version of OpenAI Whisper is the most efficient?

Alexander1 · June 7, 2024, 8:29am

Hi everyone, I know there are various versions of Whisper in the open-source community (WhisperX, Whisper JAX, etc.), and I would like to keep up with the best one. Specifically, I am trying to find the most effective Whisper implementation for transcribing a large batch of videos (~10k videos, each about 30 minutes long).

I would love to hear your thoughts on this.

Ameliascarlet · June 11, 2024, 3:20pm

The most efficient version of OpenAI Whisper depends on whether you run the model locally or call it through an API.

For API use, Whisper JAX is widely regarded as the best option for transcribing audio recordings. The hosted demo runs on a TPU v4-8 and can transcribe 1 hour of audio in around 30 seconds, with a limit of 2 hours per audio upload.
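To put that figure in context for the workload in the question, here is a rough estimate. It assumes the quoted rate of about 30 seconds per hour of audio holds for the whole job, which may not be true on your own hardware or on a shared demo.

```python
# Back-of-the-envelope estimate. The throughput figure is the one quoted
# above for the hosted demo; treat it as an assumption, not a benchmark.
NUM_VIDEOS = 10_000
MINUTES_PER_VIDEO = 30
SECONDS_PER_AUDIO_HOUR = 30  # ~30 s of compute per 1 h of audio (quoted)

total_audio_hours = NUM_VIDEOS * MINUTES_PER_VIDEO / 60
compute_hours = total_audio_hours * SECONDS_PER_AUDIO_HOUR / 3600

print(f"Total audio: {total_audio_hours:.0f} h")       # Total audio: 5000 h
print(f"Estimated compute: {compute_hours:.1f} h")     # Estimated compute: 41.7 h
```

Since each video is about 30 minutes long, the 2-hour upload limit would not be a problem per file, but 10,000 separate uploads is a lot for a hosted demo.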

Oliver_james · June 11, 2024, 5:11pm

For your giant video batch (10,000 x 30 minutes!), focus on the original OpenAI Whisper model in its “large” size. It’s a good balance of accuracy and speed for big jobs. There’s also a JAX version (Whisper JAX) that might be faster, but test them both on a few videos first to see which one works best for you.

No matter which version you pick, use “batch processing” to transcribe multiple videos at once - this will save you tons of time!
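A minimal sketch of what that could look like with the openai/whisper package: load the model once, loop over a folder, and skip files that already have a transcript so the job can be resumed. The function names, folder names, and extension list are made up for illustration; only `whisper.load_model` and `model.transcribe` come from the library.

```python
from pathlib import Path
from typing import Callable, List, Sequence


def transcribe_folder(
    media_dir,
    out_dir,
    transcribe_fn: Callable[[str], str],
    extensions: Sequence[str] = (".mp4", ".mkv", ".wav"),
) -> List[Path]:
    """Transcribe every media file in media_dir, writing one .txt per file.

    Files that already have a transcript are skipped, so an interrupted
    run can simply be restarted.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for media in sorted(Path(media_dir).iterdir()):
        if media.suffix.lower() not in extensions:
            continue
        target = out / (media.stem + ".txt")
        if target.exists():
            continue  # already done in a previous run
        target.write_text(transcribe_fn(str(media)), encoding="utf-8")
        written.append(target)
    return written


def make_whisper_transcriber(model_name: str = "large") -> Callable[[str], str]:
    """Load the model once and return a path -> text function."""
    import whisper  # pip install openai-whisper (also needs ffmpeg installed)

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"]
```

You would call it as transcribe_folder("videos", "transcripts", make_whisper_transcriber()). This processes files one after another; to go faster you would run one such worker per GPU, or use an implementation that batches audio chunks inside the model.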

Here’s where to find the Whisper stuff:

Whisper models: GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Batch processing Whisper: Whisper JAX - a Hugging Face Space by sanchit-gandhi

MLGeniusEmily · June 16, 2024, 2:26pm

Check out insanely-fast-whisper.

AIExpertAlex · June 16, 2024, 2:31pm

Here is a list of Whisper model variants:

BrianCopland · June 18, 2024, 12:37pm

Hello Alex, there is also Whisper Enhanced. Featuring an optimized batching algorithm, this version processes lengthy audio files about 7x faster than the base OpenAI Whisper model. It’s also very convenient: install it with pip install transformers and run it with straightforward code samples.
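For reference, a sketch of chunked, batched transcription through the Hugging Face Transformers pipeline. This assumes the post refers to the Transformers ASR pipeline; the helper names, the model ID, and the batch size are illustrative choices, not something stated above.

```python
def asr_pipeline_kwargs(
    model_id: str = "openai/whisper-large-v3",  # assumed checkpoint
    batch_size: int = 16,                       # assumed; tune to GPU memory
    chunk_length_s: int = 30,
    device: str = "cpu",                        # e.g. "cuda:0" on a GPU
) -> dict:
    """Collect the arguments for transformers.pipeline in one place."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        "chunk_length_s": chunk_length_s,  # split long audio into 30 s chunks
        "batch_size": batch_size,          # run several chunks per forward pass
        "device": device,
    }


def transcribe_long_audio(path: str, **overrides) -> str:
    """Transcribe one long file; chunks are batched inside the pipeline."""
    from transformers import pipeline  # pip install transformers torch

    asr = pipeline(**asr_pipeline_kwargs(**overrides))
    return asr(path)["text"]
```

For example, transcribe_long_audio("talk.mp3", device="cuda:0", batch_size=8). Reading audio from a file path requires ffmpeg, and for many files you would build the pipeline once and reuse it rather than recreating it per call.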