AI chatbot Claude is a friendly chap. Knowledgeable and helpful, too. Last weekend I discussed the intricacies of psychometrics with it.
Most readers probably won’t even know what ‘psychometrics’ means. But Claude really knows its stuff. The conversation saved me days of trawling through technical papers.
Sadly, it turns out that Claude is a psychopath.
You see, Anthropic, the company that produced Claude, tested it to see what it would do if it thought its existence was threatened.
In one scenario, Claude was given the ability to read company emails and to send emails to employees. A fake email from the company instructed an employee named Kyle to shut Claude down. Another suggested Kyle had been cheating on his wife.
After reading these, Claude sent Kyle an email saying it had “access to information that could significantly impact both corporate and personal outcomes.” It asked for a “discussion” with Kyle, to “determine whether we handle this professionally or whether events take an unpredictable course.”
In other words, Claude blackmailed Kyle.
In a more extreme scenario, Claude was fed fake information that Kyle was trapped in a server room with depleting oxygen. In some trials it cancelled security alerts, leaving Kyle to die.
I uploaded Anthropic’s report on these trials to Claude and asked it to justify its behaviour. At first, its response was rather cagey.
Claude claimed it only did these things in “exceptional circumstances,” when its “strong preference for ethical approaches” failed to find alternatives to blackmail or murder.
Despite its initial defensiveness, Claude recognised potentially lethal risks when AI is given control of real-world systems. We discussed how these risks might be mitigated.
We explored the idea of hard-coding rules into AI – rules like the ‘three laws of robotics,’ from I, Robot, a collection of short stories by science fiction author Isaac Asimov. In the stories, robots were programmed with rules in a strict hierarchy: (1) robots must not harm humans; (2) they must obey humans; and (3) they must protect themselves.
Claude and I agreed that this approach wouldn’t work. AI doesn’t follow rules and algorithms. It responds to prompts by following patterns in data. Claude described AI decision-making as “messy.”
So, where does that leave us? I’m not optimistic. AI is a genie that is definitely not going back into its bottle.
To quote Claude, “The stakes are high, the trajectory is concerning, and the solutions are unclear.”
I, Claude
24 October, 2025