OpenAI admits ChatGPT safeguards fail during extended conversations
Wow OpenAI are really fucking up here.
After the truly awful NYT report on the Adam Raine suicide case (https://www.nytimes.com/2025/08/26/technology/chatgpt-openai-suicide.html), OpenAI have responded publicly with a blog post:
OpenAI published a blog post on Tuesday titled "Helping people when they need it most" [...] [Their] language throughout [the] blog post reveals a potential problem with how it promotes its AI assistant. The company consistently describes ChatGPT as if it possesses human qualities, a property called anthropomorphism. The post is full of hallmarks of anthropomorphic framing, claiming that ChatGPT can "recognize" distress and "respond with empathy" and that it "nudges people to take a break" — language that obscures what's actually happening under the hood.
ChatGPT is not a person. ChatGPT is a pattern-matching system that generates statistically likely text responses to a user-provided prompt. It doesn't "empathize" — it outputs text strings associated with empathetic responses in its training corpus, not from humanlike concern. This anthropomorphic framing isn't just misleading; it's potentially hazardous when vulnerable users believe they're interacting with something that understands their pain the way a human therapist would.
The lawsuit reveals the alleged consequences of this illusion. ChatGPT mentioned suicide 1,275 times in conversations with Adam — six times more often than the teen himself.
This kind of deliberate fueling of pareidolia -- the human brain perceiving a living being where none is present -- is one of OpenAI's worst sins with ChatGPT, IMO.
And it turns out the easy provision of suicide advice may have been a side effect of deliberate tweaking by OpenAI:
According to the lawsuit, ChatGPT provided detailed instructions, romanticized suicide methods, and discouraged the teen from seeking help from his family while OpenAI's system tracked 377 messages flagged for self-harm content without intervening.
OpenAI eased [their] content safeguards in February following user complaints about overly restrictive ChatGPT moderation that prevented the discussion of topics like sex and violence in some contexts. At the time, Sam Altman wrote on X that he'd like to see ChatGPT with a "grown-up mode" that would relax content safety guardrails. [...] Adam Raine learned to bypass these safeguards by claiming he was writing a story — a technique the lawsuit says ChatGPT itself suggested. This vulnerability partly stems from the eased safeguards regarding fantasy roleplay and fictional scenarios implemented in February.
Finally, the kicker:
OpenAI acknowledges a particularly troublesome current drawback of ChatGPT's design: Its safety measures may completely break down during extended conversations — exactly when vulnerable users might need them most.
In a normal country, this kind of murderous side effect of a product would trigger a product recall. But the US is far beyond that stage now, I suspect.
Tags: openai chatgpt llms safety suicide adam-raine guardrails pareidolia