Whisper: the creators of ChatGPT create a revolutionary AI that transcribes audio to text

November 13, 2023

At OpenAI’s recent DevDay event, where the headlines went to GPT-4 Turbo and the ability to build custom GPTs, another launch went unnoticed by many: Whisper V3, a powerful artificial intelligence (AI) model that transcribes audio to text with remarkable efficiency. Although GPT-4 received most of the media attention, Whisper V3 has emerged as a revolutionary tool that is both free and open source.

Unlike its predecessors, Whisper V3 was trained on more than 1 million hours of labeled audio and more than 4 million hours of pseudo-labeled audio. This extensive training reduced errors by 10 to 20% relative to the previous version and brought the error rate for Spanish below 5%, making it one of the most accurate models for that language.

As OpenAI CEO Sam Altman explained, Whisper V3 stands out for its multitasking ability: it can recognize and translate audio in multiple languages and automatically identify language changes within a single conversation, a versatility that is unique in its field.

For now, OpenAI offers the tool in several sizes, from small versions that need around 1 GB of VRAM or less to the large model, which has 1.55 billion parameters and requires approximately 10 GB of VRAM. This range lets users adapt Whisper V3 to different applications and needs.
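To make the size trade-off concrete, here is a minimal sketch that picks the largest model fitting a given amount of VRAM. The figures are the approximate ones listed in the openai/whisper README; the helper name `pick_model` and the table layout are assumptions made for illustration, not part of any Whisper API.

```python
from typing import Optional

# Approximate figures from the model table in the openai/whisper README:
# (model name, parameters in millions, rough VRAM requirement in GB).
# Treat these as guidance only; real usage varies with hardware and settings.
MODEL_SIZES = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("large-v3", 1550, 10),
]


def pick_model(available_vram_gb: float) -> Optional[str]:
    """Return the largest listed model whose rough VRAM need fits, or None.

    Hypothetical helper: it only compares against the table above.
    """
    best = None
    for name, _params_millions, vram_gb in MODEL_SIZES:
        if vram_gb <= available_vram_gb:
            best = name  # the table is ordered smallest to largest
    return best
```

Under these assumed figures, a 6 GB card would map to `medium`, while a card with 12 GB could run the large model.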

In terms of usability, Whisper V3 is accessible through platforms such as Hugging Face and Replicate, and its source code is available on GitHub. Users can take advantage of the technology for free, which makes it a valuable tool for accurate transcription and translation.
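For readers who want to try it locally, the following is a rough sketch using the open-source `openai-whisper` Python package (installed with `pip install -U openai-whisper`, with `ffmpeg` available on the system). The helper names `build_options` and `transcribe_file` are hypothetical and exist only for this example; the `whisper.load_model` and `model.transcribe` calls come from the package itself.

```python
from typing import Any, Dict, Optional


def build_options(translate: bool = False,
                  language: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for model.transcribe() (hypothetical helper)."""
    # "translate" produces English text; "transcribe" keeps the spoken language.
    options: Dict[str, Any] = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # e.g. "es" for Spanish; leave unset to let Whisper detect the language.
        options["language"] = language
    return options


def transcribe_file(path: str, model_name: str = "large-v3",
                    translate: bool = False,
                    language: Optional[str] = None) -> str:
    """Transcribe (or translate) one audio file and return the text."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path, **build_options(translate, language))
    return result["text"]
```

A call such as `transcribe_file("interview.mp3", language="es")` would return the Spanish transcript, where `interview.mp3` is a placeholder file name; passing `translate=True` asks for an English translation instead.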

Despite not receiving the same attention as other developments at DevDay, Whisper V3 represents an effective and easy-to-deploy solution for a variety of applications, from simple transcription tasks to more complex functions in the field of voice assistance.

For those who have dealt with the limitations of free audio transcription tools, Whisper V3 is a significant advance that stands out for its reliability and effectiveness. Altman’s company aims to encourage software developers to adopt the tool, anticipating its integration into a variety of applications to improve the user experience in speech recognition and audio-to-text transcription.