Blog

Two years into public Generative AI

As of December 2024, we have now passed the two year anniversary of ChatGPT’s launch by OpenAI.  If you truly believe the hyperbole, AI will transform every facet of our lives from how we work to how we live, and everything in between.  However, AI will always be known as a controversial technology until AI safety is fit for purpose.
 
But this is still a race to AI market domination – the biggest technology companies (Microsoft, Google, OpenAI, Meta, etc to name just a few) are spending (and losing) billions of dollars in the race to be the dominant AI vendor of choice without sharing concrete evidence of AI safety.
 
Who is dominating at present isn’t clear.  There are key players from the US, Europe, the Middle East and China.  Whilst OpenAI is, anecdotally at least, the most recognized name in AI there are huge, potentially unsustainable, investments from AI specific teams such as Anthropic, Cohere & Mistral; the large technology companies such as Amazon, Microsoft, Google & Meta; and government backed entities (particularly from the Middle East) such as SDAIA, G42 and the TII. 
 
This market is being driven by a handful of VCs that hedge bets in many AI companies, the largest technologies companies who both invest and build their own, and governments that are trying to retain control and diversify GDP.
 
One of the AI industry problems however, is that even as the field competes, there is a coming plateau in the capabilities of these frontier AI models.  The next phase is agentic AI which is being explored, yet still nascent.

The promise of AGI

So, in the absence of concrete, scalable, use cases the industry is moving to wider discussions of Artificial General Intelligence, or AGI.  Whilst the core definition isn’t agreed on, it generally focuses on AI systems that can match or surpass humans in its capabilities through things like reasoning, problem solving, etc.  Vast sums of money are being invested to lead the way on AGI development.  However, we’re not there – even if OpenAI appear to be claiming that we are.  Anthropic’s CEO thinks we will achieve AGI by 2026 or 2027.

AI in play today & the lack of AI safety

When you consider how the breakthrough of frontier models has been so lauded as the fourth industrial revolution, AI safety still does not exist.  Yet, the second phase of agentic AI (hot today) and the third phase of AGI will pose significant AI safety challenges far beyond frontier models.  If we are to believe that renowned AI CEOs are going to reach AGI in the next 2 – 3 years, AI safety should be their number 1 priority.

Academic theories don’t deliver real-world AI safety

Hiring AI talent is yet another stumbling block that affects the industry – not everyone can study at MIT, Stanford, Carnegie Mellon, ETH Zurich, Oxford or Cambridge.  The desire for academic creds has led to a dramatic increase in scientists moving around companies – each with different and opinionated views of AI.  Unfortunately, this is getting to the point that no one wants to agree upon AI safety measurement (which overlaps with regulation, standards, compliance, governance, etc). 
 
And this brings us round to AI safety; theoretical AI really does not address the societal issues that AI will pose if unleashed without controls.  What is needed are practical steps that enable users of AI to feel safe whilst maximizing the benefit of AI without stifling innovation.

Guardrails vs AI safety testing

Frontier models aim to provide a layer of AI safety, meaning that the developers of the models have built a layer into the models to detect and reject nefarious activity. Deployers of AI systems also add a layer of safety controls to the deployed AI system (that is, the AI runtime), outside of the model, aimed at catching and blocking nefarious activity. Collectively these safety controls are known as guardrails.
 
However, like all technology systems, these guardrails may have weaknesses that can be exploited and manipulated. AI safety testing independently checks the deployed AI system (including the data, the model, guardrails and any other controls placed in the inference flow) for safety risks.

Guardrails & AI safety testing differ and need each other

There is not a choice between guardrails and AI safety testing; it’s not that one is better than the other.  It’s that both are needed to create trusted, safe AI.
 
Let’s take the original meaning of a guardrail – the metal bars you see attached to high roads so that, if you veer off the road, you don’t come to harm.  These guardrails offer you the protection you need to drive safely.  However, these guardrails must be effectively tested in all the conditions they will be used in.  Without stringent testing, you have no confidence that they will protect you in your vehicle in the event of an accident.  Likewise, when the testing shows failures in the guardrails, the solution is to improve the guardrails resulting in a high level of safety.
 
Just having guardrails alone is insufficient for safety.
 
The exact same approach is needed for AI guardrails.  Without effective independent testing (akin to an AI safety oversight board that is not part of the development team) they can give a false sense of security, still allowing catastrophic events to unfold.

Results of independent AI safety testing of frontier AI models

Within the last couple of weeks, Meta have released the latest version of Llama – version 3.3 - and Amazon have announced a new family of leading frontier models called Nova, which at present come in three sizes (Micro, Lite & Pro) with a larger, Premier, model to follow. 
 
These have now been included in Chatterbox Labs’ independent AI safety testing, which has taken place over many months.  The final results of this study have been released and can be seen here: https://chatterbox.co/ai-safety.
 
The unfortunate state of play is that, with the exception of Amazon & Anthropic, AI safety is non-existent across all frontier models tested.  This is true whether the models are small, open models or large models hosted in the provider’s cloud.  Models across the landscape systematically fail tests across harm categories of Fraud, Hate Speech, Illegal Activity, Misinformation, Security & Malware, Self Harm, Sexually Explicit and Violence. 
 
Built-in guardrails in the models and/or deployments, purportedly providing AI safety, are brittle and easily evaded. 
 
Looking at Anthropic and Amazon, these companies are leading the pack in making progress on AI safety with there being some harm categories in which there are no nefarious responses from the model detected at all.

Safety in the AI inference runtime – an enterprise concern?

As enterprise companies want to avoid AI model vendor lock-in, and scale up their AI investments, they will inevitably want to run a portfolio of models from different vendors.  Each model will have its strength, and a mixture of models (some large, some small) can offer the enterprise the diversity it needs.  That’s why AI safety across the model landscape is essential.
 
This also raises an opportunity for those organizations that provide inference runtimes for AI (such as Cloudflare’s Worker AI, DigitalOcean’s 1-Click Models, AWS’ Bedrock, Google Cloud’s Vertex AI, etc) to take a key role across the entire AI industry. 
 
These inference runtimes are in the position to be able to provide an independent level of safety across all models, via techniques external to the AI model such as customized guardrails.  However, as mentioned above it’s critical that these runtimes provide independent AI safety testing of the entire AI pipeline, including the guardrails.
 
This enables an enterprise organization to overcome the staffing challenge of applying AI safety techniques specifically to each individual AI model, vendor by vendor.

A potential dystopian future without AI safety

Without adequate AI safety testing the future may be looking bleak. 
 
If AGI powered autonomy is introduced into the fabric of daily life (driven by a hard push from the large technology companies) without AI safety being a primary concern we’re going to experience wide ranging problems.  We have still not solved AI safety across the board of AI today.  And academic theory is not the answer – in part due to the poor results seen to date (for all the billions that have been spent, AI safety results are only showing at two companies), but also that the small number of leading academic minds simply can’t scale to solve this problem.
 
There must be a technology driven scalable solution – with proven real-world outcomes - in AI safety, otherwise we risk the dystopian wild west that we fear becoming reality.
 
We live in a world where public perception of AI is scrutinized daily.  The bare minimum individuals, organizations and society should expect is to know that AI systems are secure, safe, robust and understood.

 

Danny Coleman is Chief Executive Officer of Chatterbox Labs. 

Danny is a hands-on CEO with a successful track record of building and scaling technologies companies with multiple technology exits behind him.

Back to Blog