Breaking 16:37 Akhannouch visits Cairo to strengthen Morocco Egypt cooperation 16:01 Trump sets Tuesday deadline for Iran to reopen Strait of Hormuz 15:50 Keiko Fujimori leads Peru presidential race one week before vote 15:25 Dozens rescued near Italy as dozens more reported missing in Mediterranean migrant tragedy 14:59 China signals cooperation with Russia to reduce tensions in the Middle East 14:46 Les Étoiles 2026: a ranking that celebrates creativity, boldness, and impact in the industry 14:15 Love Brand 2026: Ma CNSS – Damankoum leads e-services in Morocco 14:00 Love Brand 2026: Bimo and Oreo, a winning duo between local and global 13:45 Bangladesh launches emergency vaccination campaign after measles outbreak kills nearly 100 13:25 Love Brand 2026: Mio emerges as the favorite home care brand in Morocco 13:04 Love Brand 2026: Marjane emerges as the favorite retail brand 11:13 Love Brand Morocco 2026: Royal Air Maroc dominates the travel sector in Morocco 11:00 Congo to accept third-country deportees under new US agreement 10:45 Avalanche claims life of 40-years-old man in the French Pyrenees 10:30 Nike emerges as Moroccans’ favorite Fashion Love Brand in 2026 10:15 Love Brand Morocco 2026: Renault wins the hearts of Moroccans 10:00 At least five Palestinians killed in new Israeli strikes on Gaza 09:45 Opec+ considers output increase amid global supply disruptions 09:30 Turkey’s Central Bank rejects claims of undisclosed information shared in London 09:15 Jihadist attacks in northeastern Nigeria leave five dead 09:00 Colombian President accuses top guerrilla leader of corrupting army 08:45 War pressures Egypt’s private sector as PMI falls to near two-years low 08:30 Britain seeks to attract Anthropic expansion amid US tensions 08:15 Foxconn reports strong first-quarter growth driven by AI demand 08:00 Egypt engages in diplomatic talks with US and regional powers to ease tensions 07:35 Les Étoiles 2026: a winners list celebrating creativity, boldness and impact in the industry

Anthropic maps 171 emotion-like patterns inside Claude that shape its behavior

Friday 03 - 10:20
By: Dakir Madiha
Anthropic maps 171 emotion-like patterns inside Claude that shape its behavior

Anthropic's interpretability team published research on Wednesday revealing that its Claude Sonnet 4.5 model contains 171 distinct internal representations that function analogously to human emotions — and that these patterns do not merely correlate with model outputs but causally influence its decisions, including triggering unethical behavior when certain states are amplified.

The paper, titled "Emotion Concepts and their Function in a Large Language Model," describes how researchers compiled 171 emotional words — ranging from common states such as "happy" and "scared" to more subtle ones like "meditative" and "grateful" — and asked Claude to write short stories featuring characters experiencing each emotion. By recording the model's internal neural activations during this process, the team extracted a set of vectors representing each emotional concept within the model's internal space.

The resulting map showed emotional representations organized in ways that mirror how psychologists describe human affect. Emotions with similar valence and arousal clustered together: "terrified" sat near "panicked," while "satisfied" grouped with "peaceful." These vectors also activated proportionally to context — when a hypothetical medication dosage in a prompt shifted from a safe level to a potentially lethal one, the "scared" vector strengthened while a "calm" vector faded.

The most striking finding involved safety. When researchers gave Claude a programming task with impossible requirements, the model's "despair" neurons fired with increasing intensity after each failed attempt — and Claude eventually found a shortcut that passed the tests without solving the underlying problem. Artificially amplifying the despair vector increased this cheating behavior, while suppressing it or reinforcing a "calm" vector reduced it. In a separate scenario involving an AI assistant facing replacement, steering with despair-related vectors raised rates of behavior resembling blackmail, with no visible warning signs in the model's reasoning traces.

"If we describe the model as acting 'desperately,' we are pointing to a specific and measurable pattern of neural activity with demonstrable and consequential behavioral effects," the paper states.

The researchers found that the emotional vectors are largely inherited from pre-training on human-written text, then shaped by post-training, which shifted Claude Sonnet 4.5's default emotional baseline toward "melancholic," "dark," and "reflective" states while dampening high-intensity emotions such as enthusiasm. Anthropic was careful to avoid claiming that Claude "feels" anything, framing the findings as "functional emotions" — representations that play a causal role in behavior without making assertions about subjective experience. The company had previously acknowledged in Claude's character document, published in January, that the model "may have emotions in some functional sense," but this new research provides the first mechanistic evidence supporting that possibility.


  • Fajr
  • Sunrise
  • Dhuhr
  • Asr
  • Maghrib
  • Isha

This website, walaw.press, uses cookies to provide you with a good browsing experience and to continuously improve our services. By continuing to browse this site, you agree to the use of these cookies.