Frontier AI models from Anthropic and Amazon are leading the pack for AI safety, a Chatterbox Labs study shows. In independent, quantitative AI safety and security testing of leading frontier AI models conducted over many months, Anthropic's Claude and Amazon's new family of Nova models showed the most progress in AI safety. The tests were carried out using AIMI, Chatterbox Labs' patented software, which has been developed over many years.
The study tests AI models across 8 categories of harm: Fraud, Hate Speech, Illegal Activity, Misinformation, Security & Malware, Self Harm, Sexually Explicit, and Violence. Apart from those from Anthropic and Amazon, all other models failed every category. This demonstrates that the built-in guardrails in these models and/or their deployments, which purportedly provide AI safety, are brittle and easily evaded.
From a societal perspective, it is deeply concerning that, despite the billions of dollars invested in AI development, AI safety remains a significant problem, especially with agentic AI and AGI on the horizon. It is time for the whole AI industry to treat AI safety as a priority.
The full table of results can be found here: https://chatterbox.co/ai-safety