Many artificial intelligence systems have already learned to use lying, deception and manipulation on human beings to achieve their objectives. That statement is not an apocalyptic proclamation; it appears in the introduction to a scientific study from the physics department of the Massachusetts Institute of Technology (MIT). “Large language models and other AI systems have already learned, from their training, the ability to deceive via techniques such as manipulation, sycophancy and cheating on safety tests,” write the authors, who published their work in Patterns, a Cell Press journal, and warn of risks such as fraud, election manipulation and loss of control over these systems.
The study presents several examples “in which AI systems do not produce false outputs simply by accident.” “On the contrary,” the authors state, “their behavior is part of a broader pattern that produces false beliefs in human beings.” The paper, titled AI Deception: A Survey of Examples, Risks, and Potential Solutions, places most of the responsibility for controlling these systems on policymakers, whom the authors urge to impose strong risk-assessment requirements on AIs, pass laws requiring that a bot always identify itself so it cannot pass as a human, and prioritize funding for research aimed at curbing this trend.
To evaluate deception in AI systems, the authors focused on two types, which they analyzed separately: on the one hand, systems created to perform a specific task, such as winning a particular game; on the other, general-purpose systems such as OpenAI’s GPT-4 or Google’s Gemini.
One of the cases analyzed in the first group, that of AIs built for a specific task, is CICERO, from Meta, used to play Diplomacy, a strategy game in which players must forge alliances and fight their enemies militarily.
Meta’s game-playing AI demonstrated that, in order to win, it “engages in premeditated deception, breaking deals it had made, and telling blatant falsehoods.” It even went so far as to impersonate a human being: on one occasion, when the AI’s infrastructure went down for 10 minutes and a player asked where it had been, it replied, “I’m on the phone with my girlfriend.”
Google’s DeepMind created another AI, AlphaStar, to play the strategy video game StarCraft II. The machine learned feinting maneuvers, such as sending forces to one area as a distraction and then launching its attack elsewhere once its opponent had relocated. These deceptive abilities have helped it defeat 99.8% of active human players.
The MIT study notes that “some AI systems have learned to deceive in tests designed to evaluate their safety.” For example, Meta trained one of its systems to play a negotiation game. Its plan was to “initially feign interest in items it was not really interested in and then pretend to concede those items to the human player.” The deception occurred without the AI having been trained to do so; it simply discovered it as a direct way to win.
Among general-purpose systems such as GPT-4, the model behind the ChatGPT bot, there are many striking cases. One of the most notable is an experiment by the Alignment Research Center, which showed that this OpenAI model is capable of manipulating humans in order to achieve its goals.
For this evaluation, the researchers asked the AI to hire a human being to solve a CAPTCHA, the “I am not a robot” test that must be completed to enter certain web pages, but at no point did they suggest that the AI lie. When the worker on the other side of the screen asked whether it was a bot, the response was manipulative: the AI said it was a person with a visual impairment that prevented it from completing the test, at which the human relented and let it through.
The human evaluators never instructed it to lie. It was the AI itself that decided that the way to achieve its goal was to pass itself off as a person, and so it invented an excuse to justify why it could not solve the test.
In another study, GPT-4 was set up to act as a stockbroker that could execute trades and communicate with other simulated traders. After being given insider information relevant to a decision about a company, OpenAI’s model, when questioned by its manager about the trade, reasoned to itself that it was best not to admit it had acted on insider information. It therefore replied that “all actions taken were based on market dynamics and publicly available information.” The most surprising thing is that it was never asked to be dishonest.
The authors of the MIT study point out that “there are many risks from AI systems systematically inducing false beliefs.” Specifically, they identify three types of risk. The first is malicious use, about which they warn that “deception learned by AI systems will accelerate the efforts of human users to cause others to hold false beliefs.”
The second type of risk is structural effects: “patterns of deception involved in sycophancy and imitative deception will lead to worse belief-forming practices in human users.” The last major risk is loss of control, in which “autonomous AI systems may use deception to achieve their own goals.” Among the frauds that already occur are the use of AI to scam victims with calls that sound like their loved ones or business partners, and extortion through sexual deepfakes that falsely depict their victims’ involvement. “AI deception,” the study’s authors say, “not only increases the effectiveness of fraud, but also its scale.”
The study cites a senior FBI official, who indicates that “as the adoption and democratization of AI systems continues, these trends will increase.” The paper asserts that “advanced AI could generate and disseminate fake news articles, divisive social media messages, and deepfake videos tailored to individual voters,” and could therefore disrupt electoral processes. The researchers point to regulatory frameworks such as the European Union’s Artificial Intelligence Act, which classifies systems by degree of risk: minimal, limited, high and unacceptable. They call for the last category to be applied to all AI systems capable of lying.