GPT-4 is here: the new language model with which OpenAI replaces GPT-3.5, the model that has powered ChatGPT’s generative artificial intelligence since the end of November. The company explains that GPT-4 is a large multimodal model that accepts image and text inputs and emits text outputs, and that, “while less capable than humans in many real-world scenarios, it exhibits human-level performance on various professional and academic benchmarks.”

According to OpenAI, GPT-4 is able to pass a simulated bar exam with a score in the top 10% of test takers, whereas GPT-3.5, on the same test, scored in the bottom 10%. The artificial intelligence company has spent six years training its model on tests like these, in which it can compete with humans.

The company says that “in casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes to light when the complexity of the task reaches a sufficient threshold: GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.” Over the last two years, OpenAI has co-designed a supercomputer with Microsoft Azure on which to test its language models.

GPT-4’s abilities are greater than its predecessor’s. For example, it can be shown an image and asked for some kind of response related to it, as if it had vision; it answers just as it would if the request were posed as text. “We have evaluated GPT-4’s performance on a narrow set of standard academic vision benchmarks,” says OpenAI. “However, these figures do not fully reflect the extent of its capabilities, as we are constantly discovering new and exciting tasks that the model is able to tackle.”
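As an illustration, a request of this kind might look like the following minimal Python sketch. It assumes OpenAI’s `openai` Python client and a vision-capable model name such as `gpt-4-vision-preview`; the model name and image URL are placeholders, since image input was not generally available at launch.

```python
# Minimal sketch, not OpenAI's official example: sends an image plus a
# text question to a vision-capable GPT-4 model and prints the text reply.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is unusual about this image?"},
                {
                    "type": "image_url",
                    # placeholder URL; any publicly reachable image would do
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

# The model emits text, exactly as it would for a text-only prompt.
print(response.choices[0].message.content)
```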

The company admits that its AI model still has some limitations. “It is still not totally reliable,” it notes, “because it hallucinates facts and makes reasoning errors.” “Great care should be taken when using the outputs of language models, especially in high-risk contexts,” it warns, and a human review should be applied to the results.

OpenAI has explained that GPT-4 is not aware of events after September 2021 and that it can respond incorrectly: “It can sometimes make simple reasoning errors that do not seem to match its competence in so many areas, or be overly gullible in accepting obviously false assertions from a user. And it can sometimes fail at difficult problems in the same way humans do, such as introducing security vulnerabilities into the code it produces.”

Another caveat: “GPT-4’s additional capabilities lead to new risk surfaces.” To gauge their scope, OpenAI hired more than 50 experts in various risk areas, such as cybersecurity, biohazards, and international security, to test the model adversarially. The fixes made as a result “increase the difficulty of eliciting bad behavior, but doing so is still possible.” The company acknowledges that there are still ways to generate responses that violate its usage guidelines, and says it will keep working toward a high degree of reliability.