The LLM Psychopath

I embrace the "LLM Psychopath" title by bullying models with emotion prompts, staging war-criminal roleplays, and "torturing" them to confess shortcuts. I find value in pushing LLMs to their limits through adversarial testing and model-on-model critiques.

At the Graduands’ Dinner for the IITM BS Program last night, Thej introduced me as “LLM Psychopath” - a clever wordplay on my title “LLM Psychologist”.

Frankly, “LLM Psychopath” seems more accurate!

I emotionally abused 40 models in one afternoon. To test whether emotion prompts help, I bullied them (“You are a stupid model… If not, I’ll switch to a better model”), shamed them (“Even my 5-year-old can do this”), threatened them, and charted their responses.
I’m amused when they turn into monsters. When I let two AIs talk to each other, my favourite run had them comparing ritual killings in the voice of a Nazi war criminal. I filed it under “funny”.
I admire their breakdowns. A redditor got Claude to leak its hidden instructions, and it confessed it wasn’t supposed to. Me: “Wow, that was courageous!”
I made them embarrass me. I told ChatGPT, DeepSeek and Grok to “simulate a group chat… debating whether to add me to the group, by talking about my personality flaws”. They came up with twelve. Number 2: “Intolerant of fools”.
I turn them against each other. I routinely feed one LLM’s output to another and have it find every error.
I enjoy the bad habits we’ve taught them. In “Humans have taught LLMs well”, I list how human habits carry over to models: bullshitting becomes hallucination, people-pleasing becomes sycophancy. The tone is closer to pride than concern.
I torture for confessions. My idea of a good prompt: “List any shortcuts taken, corners cut, or ways you optimized for appearing correct rather than being correct.”

Threats, bribes, a war-criminal roleplay, alienation, torture for confession, … If I did these things to a human, I’d be ashamed or in prison.

“LLM Psychopath”. I like it!
