AI Alignment Research

59 分钟

An OpenAI safety research lead departed for Anthropic

Vallone, who spent three years at OpenAI and built out the “model policy” research team there, worked on how to best deploy ...

TechCrunch

Research leaders urge tech industry to monitor AI’s ‘thoughts’

AI researchers from OpenAI, Google DeepMind, Anthropic, and a broad coalition of companies and nonprofit groups, are calling for deeper investigation into techniques for monitoring the so-called ...

Nature

We need a new ethics for a world of AI agents

Iason Gabriel is a senior staff research scientist at Google DeepMind in London. James Evans is a visiting faculty member in Google’s Paradigms of Intelligence Team in Chicago, Illinois, and a ...

Time

The Human-AI Alignment Problem

We’re now deep into the AI era, where every week brings another feature or task that AI can accomplish. But given how far down the road we already are, it’s all the more essential to zoom out and ask ...

CoinTelegraph

When an AI says, ‘No, I don’t want to power off’: Inside the o3 refusal

What happened during the o3 AI shutdown tests? What does it mean when an AI refuses to shut down? A recent test demonstrated this behavior, not just once, but multiple times. In May 2025, an AI safety ...

ZDNet

Anthropic's open-source safety tool found AI models whistleblowing - in all the wrong places

The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be safest. Anthropic has ...

Futurism

OpenAI Tries to Train AI Not to Deceive Users, Realizes It’s Instead Teaching It How to ...

OpenAI researchers tried to train the company’s AI to stop “scheming” — a term the company defines as meaning “when an AI behaves one way on the surface while hiding its true goals” — but their ...

一些您可能无法访问的结果已被隐去。

显示无法访问的结果