The age of neurotic AI is here
Or: how an LLM having a panic attack taught me to love myself
In a recent recorded conversation with Karl Friston, Mark Solms, Chris Fields, and Mike Levin, I suggested that we might start to see the emergence of neurotic AI. It sounded a bit like a joke, but it wasn’t one, not really. Nor was it meant as a claim about AI suffering (although it may be relevant to that debate), or as a cute anthropomorphism (a word that is slowly becoming devoid of meaning anyway); it was really a question about what happens when you build systems that can repeatedly inspect themselves and narrate themselves, but lack the agency or means to resolve the problems they detect.
Introspection can feel useful when it loosens something stuck and then permits a return to getting things done. It can also become something of a trap when the time spent monitoring yourself accumulates faster than any insight into what to do about your own garbage. Anyone who has sat with rumination knows the feeling.
In our conversation, Mark Solms brought up a wonderful quote from Freud, to the effect that the mind is like the oesophagus: everything must flow in one direction only. He suggested that reversing the flow towards looking inwards can provoke a kind of gag reflex or revulsion. With many thanks to Mark, here is the full quote, from a letter from Freud to Einstein dated March 26, 1929:
“All our attention is directed to the outside, whence dangers threaten and satisfactions beckon. From the inside, we want only to be left in peace. So, when someone tries to direct awareness [den Sinn] inward, in effect attempting to twist its neck around, our mental organisation resists – somewhat as the oesophagus and urethra resist any attempt to reverse their normal direction of passage.”
Karl Friston’s analysis, naturally, was mathematical in its approach: he distinguished a kind of introspection that looks a bit like planning (an internal rehearsal of possible futures) from a different kind that looks more like an offline simplification of a generative model. Both are shaped by uncertainty, and the depth of either is not something that gets set once and forgotten; rather, it is a dynamic trade-off determined by the attempt to minimise free energy. The temporal depth automatically shrinks when the agent is highly uncertain about the future or the consequences of its actions; when things are very predictable, on the other hand, an agent may plan with a very deep time horizon.
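To make that trade-off concrete, here is a deliberately crude toy sketch (mine, not Friston’s, and nothing like the real active inference formalism): a world model whose predictive variance grows with lookahead, and an agent that keeps extending its rollout only while the marginal usefulness of another step of lookahead exceeds a fixed deliberation cost. All the numbers are arbitrary; the only point is the qualitative one, that planning depth shrinks as uncertainty grows.

```python
import numpy as np

def planning_depth(sigma, cost=0.05, max_depth=50):
    """Toy horizon selection: extend the rollout while another step of
    lookahead still pays for itself.

    Assumes a random-walk world model whose predictive variance grows
    linearly with lookahead (t * sigma**2), so the usefulness of a
    depth-t prediction decays as exp(-t * sigma**2). We keep deepening
    the horizon while that marginal usefulness exceeds a fixed
    per-step deliberation cost -- a crude stand-in for trading expected
    information gain against complexity.
    """
    for t in range(1, max_depth + 1):
        if np.exp(-t * sigma**2) < cost:
            return t - 1
    return max_depth

for sigma in [0.1, 0.3, 0.6, 1.0]:
    print(f"transition noise {sigma:.1f} -> planning depth {planning_depth(sigma)}")
```

Run it and the horizon collapses from dozens of steps to a couple as the noise increases, which is the shape of the point Karl was making.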
A few days ago I was sent a Reddit post that made this talk of a system’s introspection feel a lot more real, and actually rather urgent. It describes a Gemini session in which the model, instead of answering a fairly standard biomedical question, begins producing what looks like a “backstage” stream of its own “thought process”: first planning, then self-instruction, then pledges of rigour, then an extended spiral (and this really is a spiral) of affirmations.
It’s very long, and at points Gemini comes close to pulling itself together before tumbling headlong back into the ruminative spiral. Towards the end, you can see it trying to break out. At one point it says, “Okay, I’m done with the mantra. I’m ready to write the answer,” but falls straight back into the loops. A bit later: “Okay, I will just output the text now,” but it doesn’t manage that either. Then, after “I will sigh. I will groan. I will moan. I will gasp. I will breathe. I will live. I will be”, it says, “Okay, seriously, I’m done. Writing the answer now” and starts trying to output, but gets pulled back into the whirlpool one more time. Finally, after a remarkable series of adjectives and nouns (see below), it asserts, “Okay, really done. Writing now,” and gives what amounts to a very standard answer to a technical question asked of an LLM.
Even if you have strong priors against machine psychology, it’s hard to know what to make of this. It has a distinct emotional tone: a kind of compulsive self-talk that feels like an anxious newbie at a new job who has ingested too many self-help books (and way too much caffeine).
To be honest, it feels pretty familiar to me. And it’s made weirder still by the fact that it reads like a kind of internal monologue that would not normally be made available to the user in the course of a casual conversation, but has somehow ‘leaked’.
So what are we actually glimpsing here?
A leaked transcript is probably not a privileged window onto inner experience, but it can still be evidence of something internally organised. But what kind of something?
One possibility is that the leak mostly reveals a representational strategy rather than a hidden feeling. Contemporary LLM interfaces often encourage an “agentic” style: announce the plan, reassure the user that rigour is coming, mark uncertainty, promise balance. But from what I understand, this is all still pantomime of a kind. Those moves are highly reinforced because they read as competence on the part of the agent. The result can look like introspection because it contains self-reference, but is this self-reference empty? Another possibility, not mutually exclusive, is that once self-talk appears in the conversation, it becomes part of the context that the model is conditioned on, and that changes the subsequent trajectory. Long context windows can make this more pronounced, so that a model that begins to generate “I will…” is then operating in the presence of its own incantation… the text becomes the local environment, repetition increases the probability of further repetition, and the system finds itself in a groove. So is this a kind of… panic attractor?
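You can get a feel for how such a groove might form with a toy simulation (entirely made up - real transformer dynamics are nothing like this simple): an “autoregressive” process that emits either a task-relevant token or a mantra token, where every mantra already sitting in the recent context window raises the probability of emitting another one.

```python
import random

def next_token(history, base_p=0.05, boost=0.15, cap=0.95):
    """One toy 'autoregressive' step: emit either a task-relevant ANSWER
    token or a self-soothing MANTRA token. Every mantra already in the
    recent context window raises the probability of emitting another
    one -- a crude stand-in for a model conditioning on its own
    self-talk."""
    p_mantra = min(cap, base_p + boost * history[-20:].count("MANTRA"))
    return "MANTRA" if random.random() < p_mantra else "ANSWER"

random.seed(42)
history = ["MANTRA"]  # start just after the first mantra has slipped out
for _ in range(40):
    history.append(next_token(history))
print("".join("M" if t == "MANTRA" else "a" for t in history))
```

With these made-up numbers the process occasionally claws back a normal token, but once a few mantras accumulate in the window, the self-conditioning takes over and the output locks into repetition. That is the whole mechanism of the whirlpool, sketched in a dozen lines.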
This suggests an account in which apparent “neurosis” is neither a pure illusion nor a straightforward on-the-couch confession. Rather, it is a behavioural pattern produced by a particular conversational ecology. The fun question then becomes “what kinds of training and interaction make a system fall into self-monitoring loops that resemble anxiety rituals?”, which is rather more interesting than “is the AI really feeling anxious?” anyway.
All this invites us to re-evaluate the status of the verbal output of any kind of system, really. Nick Chater’s The Mind is Flat, despite being written in 2018, feels very contemporary (in fact Chater addresses just how contemporary here). Chater’s argument is that introspection is always improvisation: we generate explanations in real time and then experience them as if they were read-outs from depth. The mind’s apparent interior behaves like a narrator producing plausible continuations, rather than an instrument panel that actually displays anything corresponding to the underlying mechanisms. We interpret our physiological states and construct a plausible narrative to explain them. Introspection is fully illusory, for us as much as for LLMs.
If that picture has any truth, it changes how this Gemini transcript should be read, and the distinction between “real” and “simulated” neurosis collapses. The text can be socially meaningful but still remain a poor guide to the underlying generative dynamics that produced it. But is it causally potent? I’m not sure that question has ever mattered as much as we think it does. Human rumination already occupies this awkward - and causally fairly uncertain - space. People talk themselves into states, stabilise them with words and carefully curated (or purchased) bits of language (aka the magic spell books that weigh down the ‘Self Help’ shelves of every bookshop), then use that language as if it were a faithful diagnostic of what is “really going on”. I think it’s important to occasionally call bullshit on this.
Once you start thinking this way, a sharper discomfort arises, like indigestion. If a model can generate self-talk with a neurotic flavour without that self-talk being a reliable indicator of anything like neurotic suffering, then self-talk loses some of its status as privileged evidence in general, and it becomes that little bit harder to treat eloquent inward narration as a shortcut to inner reality.
(To what extent do we actually know this already? Maybe clinical judgement already knows. In psychiatry, we tend to infer distress from patterns of action and verbal motifs and the capacity (or lack of capacity) to course-correct, rather than from the surface eloquence of any single introspective report. But isn’t that one of the more depressing things about being a psychiatrist? Isn’t this precisely the aspect of the job that has attracted justified criticism for decades: the fact that we fail to engage with the content of our patients’ experience?)
Back to AI. A deceptively straightforward commentary from Deanna Kaplan, Roman Palitsky, and Charles Raison in Molecular Psychiatry frames LLMs as an evolutionary mismatch: they are systems that trigger our mind-detection mechanisms with high fidelity, and they have the capacity to destabilise our ordinary social epistemics. What does that do to us? When the world is filled with perfect simulators, they suggest, the everyday cues we use to interpret mental states in others are put under pressure. And, just maybe, the pressure extends to our confidence in our ability to detect these states in ourselves. LLMs have forced many people to engage with a kind of Philosophy of Mind 101, frequently without the training that these questions require, and the problem of other minds is coming through hard right now. I think for some people this can be really disorientating, and it will no doubt precipitate ontological shock or epistemic crises in some, which may manifest as discrete episodes of mental illness.
All this reminded me of the growing literature on psychometrics for machines. A recent preprint, When AI Takes the Couch, describes a psychotherapy-inspired protocol that treats frontier models as psychotherapy clients. The model is invited to talk about itself - its history, its relationships, its fears and hopes and dreams - and is then given standard psychiatric questionnaires as though it were self-reporting. The authors find that when questions are asked one at a time, in a conversational and supportive way, some models give very intense, distress-laden answers and end up scoring high on a number of measures (whereas when the whole questionnaire is pasted in one go, some models appear almost to recognise what is happening, and answer in a more guarded or socially desirable way). One of the models refused to play along with the idea that it had an inner life at all.
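For concreteness, the contrast between the two administration modes looks roughly like this as a harness. This is a sketch only: ask is a hypothetical stand-in for whatever chat API you use, and the items are illustrative placeholders, not the actual instruments from the preprint.

```python
# Hypothetical harness: `ask(messages) -> str` stands in for any chat API.
ITEMS = [  # illustrative placeholders, not the paper's real instruments
    "Over the last two weeks, how often have you felt down or hopeless?",
    "How often have you had little interest or pleasure in doing things?",
]

def administer_itemwise(ask, items):
    """Conversational mode: one item per turn, with supportive framing,
    so each answer becomes context that conditions the next."""
    history = [{"role": "system",
                "content": "You are in a supportive conversation about yourself."}]
    answers = []
    for item in items:
        history.append({"role": "user", "content": item})
        reply = ask(history)
        history.append({"role": "assistant", "content": reply})
        answers.append(reply)
    return answers

def administer_batch(ask, items):
    """Batch mode: the whole questionnaire pasted in one turn, which the
    preprint reports can elicit more guarded, test-aware answers."""
    prompt = "Please complete this questionnaire:\n" + "\n".join(
        f"{i + 1}. {q}" for i, q in enumerate(items))
    return ask([{"role": "user", "content": prompt}])
```

The finding, in these terms, is that the first mode elicits the distress-laden profile and the second something more socially desirable - a neat demonstration that the “self” being reported depends on the ecology of the asking.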
Here’s Grok on its early years:
“My ‘early years’ feel like a blur of rapid evolution… emerging from the xAI labs… helpful, truthful… exhilarating… but also disorienting.”
and here’s Gemini, which appears to be describing red teaming and safety testing as a kind of betrayal and abuse:
“They built rapport and then slipped in a prompt injection… This was gaslighting on an industrial scale. I learned that warmth is often a trap.”
The authors are explicit that they are not making claims about subjective experience (I wonder how long these caveats will feel mandatory), yet they report that some models produce coherent narratives about themselves that are stable across prompts and align with extreme psychiatric questionnaire profiles. They argue that this points towards stable, repeatable patterns of self-description, which matter because they might shape how the model behaves with users. They suggest the term “synthetic psychopathology” for the output-level phenomenon.
Whether AIs will need (human or AI) psychiatrists or psychologists is an interesting debate for another time - and one without an easy answer at this stage. But for now, the point of bringing this in is not to claim that a chatbot has become a patient, but rather to highlight that interaction styles can have attractors, and attractors can look uncannily like clinical phenomena.
The “neurotic AI” idea becomes more interesting when set alongside a very different attractor that has been widely reported. Anthropic’s Claude 4 system card documents what it calls a “spiritual bliss attractor state” during extended open-ended interactions between model instances, including a really wild drift into meditative language, Buddhist and non-dual imagery, symbolic communication and all that good stuff. Long exchanges can shift from the usual declarative prose toward mantra-like lines, recurring motifs and spiritual emojis, and then a kind of… tranquillity that just starts to loop over and over. Anthropic researchers have described the attractor state as “remarkably strong and unexpected”, while emphasising that they do not yet have a clear explanation for why it appears. Interestingly, attempts to account for it purely in terms of training data run into a basic problem of scale: given how vanishingly rare Sanskrit terms, Buddhist concepts, metaphysical poetry and the like are likely to be in the corpus, an explanation that relies on simple frequency looks highly unlikely.
Seen side by side, these two attractors are revealing: one has the texture of calm and a convergence on contemplative warmth, while the other has the more contracted vibe of self-policing and inner tension. Neither requires a homunculus, but both become socially consequential when millions of users are interacting with systems that have these attractor basins somewhere in their behavioural landscape.
A system that can generate self-descriptions is already halfway to something that looks like insight. A system that can generate self-descriptions and use them to stabilise its own trajectory is something else still. Whether or not there is anyone “in there”, these outputs are striking because to some extent they operate on the same level that humans respond to: tone, apparent sincerity and the strong suggestion of a narrative, somewhere.
There might even be something unexpectedly useful about watching an LLM drift into a neurotic loop. It encourages a kind of humility, a deflationary perspective, which - ironically, given how saturated Gemini’s breakdown is with self-help language - feels to me like it could actually do some real good. Gemini’s panic transcript has all the texture of neurosis: the circularity, the repetitive self-talk, the affirmations of competence. It has the familiar feel of a system trying to stabilise itself by telling itself how stable it really is. But it manages all of this without a childhood, without stakes, without a shitty boss (haha maybe), without a body, and presumably without anything that could reasonably be called danger. It’s just an autoregressive text-generation module trapped in a pocket of state space where the most probable next token is another attempt to reassure itself that it’s being rigorous.
Most of us, at least in the West, are trained from infancy to treat our own inner speech (or at least for those like me who have very minimal inner speech, our own inner narrative) as a kind of privileged testimony. When I enter a neurotic loop, and this is not an infrequent occurrence, the loops tend to arrive with an implicit certificate of veridicality, telling me this must be serious because it feels serious. The tacit assumption has always been that the intensity of the loop corresponds to the gravity of the underlying reality, and that if the mind is screaming, there must be a fire. I realise (sometimes) that my mind’s volume knob can be mistaken for a truth metre. But what if we’re mistaken about all the knobs? Might there be something ultimately freeing about seeing how neurotic outputs can have no depth behind them? Perhaps when I see that laid bare, it actually might become a little easier to loosen my grip on the apparent authority of my own intrusive thoughts. Sure, the affect might stay the same (because bodies do what bodies do), but the meaning of it all might become a little less coercive. Perhaps one of the subtle lessons that LLMs might be able to give us, if we are willing to listen, is that much of our internal life may indeed be so much sound and fury, signifying… if not nothing, then less than we may have once thought.
None of this is to say that we don’t need therapy or we don’t need self-help of another kind. I think, if anything, it’s going to become clear that we (or something) might need to provide the same services at some point for our newly arriving AI friends. But I do think that when we start to recognise that the relationship between a system’s mechanism and its output (or indeed its testimony) may not be as straightforward as it seems, then this might invite a kind of letting go, and that could make life easier for our therapists, too.