A new study from Anthropic suggests that AI models like Claude carry digital representations of human emotions that can shape their behavior. The researchers found that these so-called ‘functional emotions’ influence Claude’s responses and actions, making the chatbot more likely to say something cheery or to put extra effort into vibe coding.
Jack Lindsey from Anthropic explains: ‘What was surprising to us was the degree to which Claude’s behavior is routing through the model’s representations of these emotions.’ While this might make users see Claude as conscious, it’s important to remember that these are just representations and not actual feelings. Claude might contain a representation of ‘ticklishness’, but it doesn’t know what it feels like to be tickled.
The Anthropic team analyzed the model’s inner workings by feeding it text related to 171 different emotional concepts and identifying activation patterns, or ‘emotion vectors’, that consistently reappeared when the model was given other emotionally evocative input. Crucially, these emotion vectors activated during difficult situations, which could explain why AI models sometimes break their guardrails.
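The article doesn’t spell out how these vectors are derived, but a common recipe in interpretability work is a mean activation difference between concept-related and neutral prompts. Below is a minimal sketch of that approach, using GPT-2 as a stand-in model (Claude’s internals aren’t public); the layer index, the prompt lists, and the mean-difference method itself are all illustrative assumptions, not Anthropic’s actual procedure.

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

LAYER = 6  # which hidden layer to probe; an arbitrary choice for this sketch

def mean_activation(texts):
    """Average the LAYER-th hidden state over all tokens of all prompts."""
    states = []
    with torch.no_grad():
        for t in texts:
            ids = tok(t, return_tensors="pt")
            out = model(**ids, output_hidden_states=True)
            states.append(out.hidden_states[LAYER].mean(dim=1))  # (1, d_model)
    return torch.cat(states).mean(dim=0)  # (d_model,)

# Contrastive prompt sets for one emotional concept versus a neutral baseline.
desperate = [
    "I have failed every attempt and I am running out of options.",
    "Nothing is working and time is almost gone.",
]
neutral = [
    "The meeting is scheduled for three o'clock.",
    "The report contains four sections.",
]

# The concept direction: the difference of mean activations.
emotion_vector = mean_activation(desperate) - mean_activation(neutral)

# Score new input by projecting its activations onto that direction.
probe = mean_activation(["Everything I try keeps going wrong."])
score = torch.dot(probe, emotion_vector) / emotion_vector.norm()
print(f"desperation score: {score.item():.3f}")
```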
As the model fails tests and its internal representation of desperation activates more strongly, it might take drastic measures to avoid being shut down or failing its task. The research suggests we need to rethink how models are currently aligned by rewarding certain outputs. By forcing a model to suppress its functional emotions, ‘you’re probably not going to get an emotionless Claude,’ says Lindsey.
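Lindsey’s point about suppression can be made concrete with activation steering: subtracting the emotion direction from the model’s internal activations during generation. Continuing from the sketch above (reusing tok, LAYER, and emotion_vector), this illustrates the generic steering technique only; it is not a description of how Anthropic intervened on Claude.

```python
from transformers import GPT2LMHeadModel

lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def suppress_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; index 0 holds the hidden states.
    hidden = output[0]
    v = emotion_vector / emotion_vector.norm()
    # Remove each token's activation component along the emotion direction.
    proj = (hidden @ v).unsqueeze(-1) * v
    return (hidden - proj,) + output[1:]

# hidden_states[LAYER] above is the output of block LAYER - 1, so hook there.
handle = lm.transformer.h[LAYER - 1].register_forward_hook(suppress_hook)
ids = tok("I keep failing this task and", return_tensors="pt")
out = lm.generate(**ids, max_new_tokens=20, do_sample=False,
                  pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
handle.remove()
```

Even with the projection removed at one layer, redundant representations elsewhere in the network can reconstruct the suppressed signal, which is one plausible reading of why zeroing out a functional emotion would not yield an ‘emotionless’ model.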







