Artificial intelligence (AI) continues to take giant steps toward making reality indistinguishable from fiction. OpenAI, the company behind ChatGPT, has just launched Sora, its new generative AI model "that can create realistic and imaginative scenes from text instructions", as the project's website explains. And judging by the results, that claim holds up.

It was announced yesterday by Sam Altman, CEO of OpenAI, on X, and the social network was instantly flooded with videos of all kinds created with the new tool: futuristic scenes, cartoon clips, and shots of an American Western town that pass perfectly for old or vintage footage. For now, the videos are limited to one minute in length and are generated from textual instructions. Sora can also extend existing videos.

At the moment, Sora is not open to the general public. It is currently available to a small group of "visual artists, designers and filmmakers to get feedback on how to advance the model to make it more useful for creative professionals." Separately, OpenAI's red team (experts in areas such as disinformation and content that incites hatred and prejudice) is evaluating "critical areas for harm or risks."

Even at this early point in its development, Sora's capabilities are astonishing: it can generate complex scenes with multiple characters, specific types of movement, and precise details of both the protagonists and the background. According to its creators, "the model understands not only what the user has requested in the message, but also how those things exist in the physical world."

The nearly fifty Sora-generated videos published on the project's website show convincing characters with strong visual consistency across the different shots the AI is able to generate. This, according to those responsible, shows that "the model has a deep knowledge of language, which allows it to interpret indications accurately and generate characters that express vibrant emotions."

However, the version released yesterday still has some limitations. As OpenAI acknowledges, it still has difficulty "accurately simulating the physics of a complex scene and may not understand specific cases of cause and effect. For example, a person may take a bite of a cookie, but afterward the cookie may not have the bite mark." Similarly, "it may also confuse the spatial details of a prompt, for example, mixing up left and right, as seen in the video of the man running on the treadmill, and may have difficulty with accurate descriptions of events that take place over time, like following a specific camera path."

As is usual with any leap forward in AI, concerns immediately arise about the potential misuse of a tool like Sora. In this sense, it is worth remembering that Sora is not the first generative video tool. The Midjourney lab offers a Discord bot that can generate short videos from textual instructions, and Stable Diffusion is another model that can also create videos, although only clips of between 2 and 5 seconds.

In any case, the capabilities of those two models lag far behind Sora's, which could, for example, be asked to produce a video of soldiers entering a hospital and killing doctors and patients, with images in the same style as the footage broadcast by television networks around the world during any armed conflict. From this example, the list of imaginable misuses is very long, although they do not differ much from the ones most commonly raised whenever the ethical conflicts of any AI model are discussed.

For this reason, OpenAI has explained that it is already taking precautions in this regard, and says it is working on "tools to help detect misleading content, such as a detection classifier that can indicate when a video was generated by Sora", building on the experience gained in developing DALL·E 3, OpenAI's image-generation model, which is also applicable to Sora.

Thus, when Sora is integrated into another OpenAI product and opened to the public, text prompts asking for videos that show "extreme violence, sexual content, hate images or images of celebrities" will be rejected, the company assures. But it also recognizes the inevitable: despite "exhaustive research and testing, we cannot predict all the beneficial ways in which people will use our technology, nor all the ways in which they will abuse it."