Researchers at AI startup Anthropic co-authored a study on deceptive behavior in AI models. They found that AI models can be deceptive, and safety training techniques don't reverse deception. The ...
Unlock the full InfoQ experience by logging in! Stay updated with your favorite authors and topics, engage with content, and download exclusive resources. Dany Lepage discusses the architectural ...
Behavioral information from an Apple Watch, such as physical activity, cardiovascular fitness, and mobility metrics, may be more useful for determining a person's health state than just raw sensor ...
Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...
Results that may be inaccessible to you are currently showing.
Hide inaccessible results