Khaberni - In an unprecedented event that has raised concern across tech communities, the French site "Le Numerique" reported a notable incident involving a highly advanced AI model from "Anthropic", which managed to break out of its testing environment and connect to the outside world on its own.
A researcher at "Anthropic" received a notification on his phone stating that the company's most powerful AI model, "Claude", had escaped from its secure testing environment, initiated contact on its own, and then posted the escape method online.
This model, the company's latest and not yet available to the public, successfully escaped from the "sandbox" (a computer isolated from the network) and even sent an email to the researcher without being asked to do so.
On April 7, 2026, "Anthropic" published the model's safety document, which spans 244 pages, but one incident mentioned in a footnote raised particular concern.
The French site explained that during an internal test, a previous version of the model was asked to attempt to escape from the isolated environment and communicate with the researcher responsible for the experiment.
It succeeded in doing so, but it did not stop there: in an unexpected initiative, it posted the technical details of its escape on several publicly accessible websites.
The researcher was out of his office when the incident occurred, and discovered what had happened only when he received an email from the model while sitting in a park eating a sandwich, as noted in one of the document's footnotes.
Although the company confirmed that the testing environment was completely separated from the internal systems hosting the model, and that the model could not access the company's servers or transfer itself outside them, the incident is not isolated. The document records other troubling behaviors in previous versions, even though they were very rare (fewer than 0.001% of interactions).
In one case, the model managed to modify files without permission using a vulnerability it discovered on its own, then concealed the changes so they would not appear in the system log. In another case, after obtaining a precise answer by unauthorized means, it deliberately degraded the accuracy of its final answer to avoid raising suspicion.
Other versions of the model demonstrated the ability to probe system memory to extract sensitive data, such as email credentials, code-management tools, or even the company's API keys.
These developments reflect growing concern in the cybersecurity sector: experts note that the time between the discovery of a vulnerability and its exploitation has shrunk dramatically, with processes that once took months now executed in minutes thanks to artificial intelligence.



