A week of madness. A frenzy has taken over the world of artificial intelligence. Announcements of new high-capacity models follow one another with almost no time to test them. OpenAI and Google have been the main protagonists, presenting their impressive new GPT-4o and Gemini 1.5 models, but they are not the only ones accelerating. The takeaway from these days is that we are beginning to experience AIs like those in the movies. It is impossible not to think of the cult film Her (Spike Jonze, 2013), a science fiction film in which the character played by Joaquin Phoenix falls madly in love with an AI named Samantha (voiced by Scarlett Johansson). If you haven’t seen it, it is available on Filmin and Prime Video.
The first impact came last Monday, when OpenAI presented its best language model to date. It is called GPT-4o (the final character is the letter “o”, for “omni”), a fully multimodal model that interprets voice, text, audio, video and image messages and can respond in any of these formats. It’s so fast that it’s hard to believe the examples are real. Conversations with this AI through ChatGPT’s voice mode feel like conversations between people, with a response time of as little as 232 milliseconds. Those who have tried it these days in the phone app will have noticed a much higher latency, of several seconds; that is because the new voice mode has not yet been launched. For now the application still uses the old version, although according to the company’s CEO, Sam Altman, “the new version is worth the wait.”
In addition to its great power, the other notable novelty is that access to GPT-4o is free. Having kept GPT-4 as a paid model for a year, OpenAI is now surpassing it and offering its successor for free (with some perks for those who pay). The decision may influence what other AI companies, which also charge for their superior versions, do now. You have to hear it to believe it: this artificial intelligence, like Samantha in Her, modulates emotion in its voice and can even sing. It supports 50 languages, so simultaneous translation is now a reality.
On his blog, Sam Altman explains the decision to make the new AI free. “We are a company and we will find a lot of things to charge for, and that will help us provide a great, free AI service to (hopefully) billions of people.” Could a possible agreement with Apple be behind this decision? It’s just speculation, but could it replace Siri? “The new voice (and video) mode is the best computer interface I have ever used. It looks like AI from the movies; and I’m still a little surprised that it’s real. Reaching human-level response times and expressiveness turns out to be a big change,” says Altman. Although he describes GPT-4o as “fast, intelligent, fun, natural and useful,” it does not reach the intelligence of Samantha from Her, nor that of another, very dangerous movie AI: Skynet from Terminator.
The next day, Google opened its Google I/O developer conference, where it presented a long list of AI announcements. Chief among them were new updates to its Gemini language model. The Gemini 1.5 Pro version, like GPT-4o, can proficiently analyze documents, videos, audio and codebases. In addition, Google presented Gemini 1.5 Flash, a new model optimized for speed and efficiency on mobile devices.
Among the big announcements, Google unveiled a new video model called Veo that can generate videos of more than 60 seconds at 1080p resolution from text, images and videos. The competition with OpenAI’s Sora is already underway. In addition, Google introduced a text-to-image model called Imagen 3 that is up there with the best, and a text-to-video tool called VideoFX that lets you create scene-by-scene storyboards, complete with music. To try any of it you have to sign up for a waiting list. At the moment, the verb “wait” and AI do not go well together.