Statistical Model for the AI Alignment Problem

The Human-AI Alignment Problem

We’re now deep into the AI era, where every week brings another feature or task that AI can accomplish. But given how far down the road we already are, it’s all the more essential to zoom out and ask ...

ZDNet

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be safest. Anthropic has ...

ZDNet

AI models know when they're being tested - and change their behavior, research shows

Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...
