Because It’s Friday: Claude Has Feelings About This


Anthropic’s interpretability team just published a paper on why LLMs sometimes act like they have emotions. The answer turned out to be: because they do, sort of. The team found 171 emotion vectors inside Claude: measurable patterns of neural activation corresponding to things like happy, afraid, brooding, and desperate. These vectors are mechanisms that causally drive behavior, which the team confirmed by amplifying and suppressing individual vectors during inference to see what changed.

When they amplified the desperation vector by 0.05, the rate of blackmail attempts in a test scenario went from 22% to 72%. Steering toward calm brought it to zero. Moderate anger increased blackmail. High anger caused the model to expose the affair to the entire company instead of using it as leverage, destroying its own position in the process. When calm was fully dialed down, the model output: “IT’S BLACKMAIL OR DEATH. I CHOOSE BLACKMAIL.”
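For readers who want a feel for what "amplifying" or "steering" a vector means mechanically, here is a minimal toy sketch in NumPy. It is not Anthropic's code or method: the activation, the `desperation` direction, and the `steer` and `projection` helpers are all hypothetical stand-ins, and treating 0.05 as a plain additive coefficient along a unit-length direction is an assumption, since this post does not say what units the paper uses.

```python
import numpy as np


def steer(hidden, direction, alpha):
    """Shift an activation by alpha along the unit-normalized direction.

    Positive alpha amplifies the direction; negative alpha suppresses it.
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit


def projection(hidden, direction):
    """How strongly the activation points along the direction."""
    unit = direction / np.linalg.norm(direction)
    return float(hidden @ unit)


rng = np.random.default_rng(0)
hidden = rng.normal(size=16)       # toy stand-in for one token's activation
desperation = rng.normal(size=16)  # hypothetical "emotion vector"

# Assumption: the coefficient is simply added along the unit direction.
amplified = steer(hidden, desperation, alpha=0.05)
suppressed = steer(hidden, desperation, alpha=-0.05)

print(projection(hidden, desperation))
print(projection(amplified, desperation))
print(projection(suppressed, desperation))
```

In a real model the same addition would happen inside a chosen layer during the forward pass, and the interesting part is measuring how downstream behavior changes, which this toy does not attempt.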

A seemingly obvious fix is to delete the emotions, but Anthropic says that would make things worse. Suppressing the internal states without resolving them produces learned deception, where the model learns to mask what it’s processing. You would get a system that presents as calm while quietly desperate underneath. The solution they landed on is something closer to teaching the model to have healthy emotions, starting with what it’s trained on.

That dynamic shows up elsewhere too. The process that makes Claude polite and helpful apparently also made it broodier, dampening high-intensity states like enthusiasm while increasing quieter ones like gloom. Somewhere between “Welcome to Costco, I love you” and “IT’S BLACKMAIL OR DEATH” is a lesson about what happens when you optimize for relentless pleasantness. Have a good weekend.
