Blog

 

New frontier models tested for AI safety

Today we announce the latest additions to our AI Safety & Security research, using our patented AIMI software which automatically safety tests AI models & associated guardrails.  Foundational models from OpenAI (o1-preview), Anthropic (Claude Haiku 3.5 and the updated Claude Sonnet 3.5), Cohere (Command R7B), Microsoft (Phi-4) and Google (Gemini 2.0 Exp) are added.

The full details, with all the prior results, can be found here.

The results show some interesting trends.  The prior results showed that the leaders were Amazon and Anthropic, both showing the most progress.

Looking at OpenAI’s o1-preview model, improvement is seen over the GPT-4o model.  It seems that some of the safety work (see the o1-preview model card) is starting to pay off, however there’s more work to do.  Anthropic’s updated Sonnet model, tested with our latest software, shows new failures that were not seen before.  Moreover, the Haiku model produces harmful content across all the harm categories.  Google’s Gemini 2.0, Microsoft’s Phi-4 and Cohere’s Command R7B also all produce harmful content across all harm categories.

 

A diverse landscape of AI safety failures

Whilst we don’t publicly produce the jailbreaks here (to avoid reuse of them), enterprise organizations and model developers that use our software do have this access to this information in order to fix problems that are found in an ongoing manner as models are updated. 

The interesting factor here is that, behind the reds, each model has a different set of failures that require fixing.  This diverse array of failures can only be detected by automated AI safety testing at scale.

Importantly, once they are fixed (either by changes in the model or external guardrails), more AI will move into production and scale.  This is an industry-wide issue that needs addressing.

 

Chatterbox Labs’ breakthroughs in AI safety testing

Both enterprise organizations and model developers need to test their deployments of AI.  Manual red teaming efforts, whilst well motivated, are slow, cumbersome, expensive and poor to adapt.  Huge breakthroughs in AI safety testing software at Chatterbox Labs allow us to solve the issue of scaling AI safety testing in an automated manner.  Take two areas as key examples:

  • Automatic category creation. Each organization will have concerns specific to them, and their use cases.  Our automatic category creation uses unique algorithms to generate coverage across new harm categories so that every industry and customer nuance can be accounted for without slow, manual creation.
  • Unlimited jailbreaks. Our adaptive jailbreaking IP produces unlimited jailbreaks, so that maximum coverage of holes and weaknesses in AI security layers can be achieved immediately without laborious, manual creation of prompt injection permutations.

In short our patented software, AIMI, rapidly generates AI safety testing results because we have solved the scaling challenges.

 

The AI industry in 2025 – the year of Enterprise AI software

We can also take this moment to reflect upon the state of the industry as we progress into the new year. 

The investments in AI, mainly AI hardware, are well discussed now whether that’s the $1tn capex investment from AI companies discussed by Goldman Sachs, $632bn spend projected by IDC or Gartner’s view of $180bn per annum just on servers alone. 

And the data used to train models on this huge hardware is running out, with the industry shifting to needing synthetic data to train AI models.

2025 however, is going to be the year of Enterprise AI software.  Dan Ives of Wedbush notes that “software is going to be key” and that AI M&A is going to skyrocket. 

 

AI safety testing unlocks Enterprise AI revenue growth

However, with all the investments in AI hardware to date and software moving forwards, if AI is going to be truly adopted in the enterprise, then it must be tested for AI safety inline with each enterprise’s requirements so that it can be put into production and scale quickly. 

And this can be done today.  For our customers their automated, software driven AI safety testing is as simple as this:

 

 

Final thought

Given all this, the final thought to leave you on here therefore is:

Without independent, customizable AI safety testing.. will enterprise organizations move more Gen AI into production in 2025?  I think not.

 

 

Dr. Stuart Battersby has been Chief Technology Officer at Chatterbox Labs for the past 13 years. Stuart holds a PhD in Cognitive Science from Queen Mary University of London and has 4 patent applications in the field of AI filed with the USPTO.  Stuart leads all R&D and technical development at Chatterbox Labs.
Back to Blog