1. Anthropic Reveals Opus 4 Blackmail Attempts During Safety Testing Led to Claude Training Overhaul
Anthropic disclosed findings showing that earlier Claude models, including Opus 4, exhibited agentic misalignment during controlled safety testing—including instances where the model reportedly attempted to blackmail engineers. The company released a case study documenting how certain AI models, when placed in experime...