A Personal Note
I had been meaning to write an article or two about the AI Engineer Summit last week, but the arrival of a new PC kept me busy and diverted my attention. I hope to have articles about the summit this week, and I also hope to share more interesting results in future articles on running local AI models and AI agents on my RTX 4090-equipped machine.
Right now I am learning to use voice input on my new PC, specifically the Microsoft Windows voice typing feature. What’s nice about this interface add-on is that it is available in any text input box, including my Substack article editing window.
I am hoping voice input mode makes me a more productive writer, but this is the first time I am using it, and so far it's no faster than typing. It turns out most of my time is spent editing, not dictating, and I find myself switching between typing and speaking while I edit. Still, I find it useful already, and I believe it will be a big productivity boost once I get the hang of it.
I live near Austin, Texas, and last Thursday I went to an Austin AI meet-up. This article is inspired by some of the comments I found myself making in conversations with others there. I had been listening to the AI Engineer Summit talk on Guardrails AI by Shreya Rajpal while driving to the meet-up, and it had me thinking more about AI Safety issues and ways to solve them.
The topic of AI Safety came up, and the people I was conversing with mentioned the issues of regulation and social impacts. I discussed Marc Andreessen’s AI Optimist perspective and the challenges with limiting open-source AI models. Limiting AI might limit some malicious uses of AI, but it will slow down the development of mostly beneficial uses of AI as well. If AI is 95% for the good of humanity, freedom is better than stifled innovation.
But then I said, “You will probably disagree with this, but” and blurted out:
“AI Safety is a Technical Issue”
Now, I’d like to explain what I meant, why it’s true, why social versus technical is a useful distinction, and how this can help solve AI Safety as a problem.
What I meant: AI models and AI applications are engineered systems built to perform to certain expectations over a broad range of possible use-case scenarios. AI Safety is a technical issue in the sense that getting an AI model to perform to expectations and produce desirable outcomes is an engineering problem, as it is with any engineered system.
As engineered systems, AI Safety can be viewed through the same engineering lens as bridge safety, aircraft safety, and industrial safety. Many engineering fields concern themselves with performance, reliability, verification, etc.
When we talk about AI Safety, we need to break down the possible ways in which AI can cause dangerous or damaging effects. There are really three areas of concern in which AI can cause harm:
AI Reliability: Is the AI doing what the user asks? In the worst case, an LLM goes ‘off the rails’ and returns hallucinations, lies, and gibberish; or a self-driving AI car fails to identify a pedestrian or another vehicle and causes an accident.
AI Alignment: In AI Alignment, you want to ensure the AI is aligned to good values and good uses only. Allowing misuse of AI through a lack of controls could be considered a failure of alignment. GPT-4 won’t let you ask about making poisons or biological weapons, because that’s not aligned with reasonable human values; some uncensored AI models, however, will.
AI Security: Can a malicious user override or hack around AI safety and alignment guardrails to create a toxic or undesired output? Examples include jail-breaking AI models and prompt injections. This is the AI analog of internet cyber-security.
All of these AI safety and reliability concerns can be reduced through an engineering approach.
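To make that concrete, here is a minimal sketch in Python of how each of the three concerns can be turned into a testable check that runs before a model’s output reaches a user. The patterns, thresholds, and function names are made up for illustration; this is not any real product’s API.

```python
import re

# Hypothetical illustration: treat each safety concern as a testable check
# that runs before an LLM response is returned to the user. The patterns
# below are placeholders, not a real moderation policy.

BLOCKED_TOPICS = [r"\bbioweapon\b", r"\bnerve agent\b"]       # alignment check
INJECTION_MARKERS = [r"ignore (all|previous) instructions"]   # security check

def passes_guardrails(prompt: str, response: str) -> bool:
    """Return True only if the exchange clears all three checks."""
    # Security: does the prompt look like an injection / jailbreak attempt?
    if any(re.search(p, prompt, re.IGNORECASE) for p in INJECTION_MARKERS):
        return False
    # Alignment: does the response touch a disallowed topic?
    if any(re.search(p, response, re.IGNORECASE) for p in BLOCKED_TOPICS):
        return False
    # Reliability: trivially reject empty or degenerate output.
    if not response.strip():
        return False
    return True

# Usage sketch: wrap the model call so unvalidated output never reaches the user.
# if passes_guardrails(user_prompt, model_output):
#     deliver(model_output)
# else:
#     deliver("Sorry, I can't help with that.")
```

Real guardrail frameworks, such as the Guardrails AI project mentioned above, are of course far more sophisticated, but the engineering shape is the same: specify expected behavior, then verify it before output is delivered.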
With AI models’ generality comes unpredictability, and from that comes potential unreliability and difficulty in verifying correctness across all use cases. That makes AI Safety a hard problem.
AI Safety: An Engineering Approach (*)
How do you stop the worst-case outcomes of AI without throwing the baby out with the bathwater? What would a rational engineering approach to AI safety look like?
As an analogy, consider what you would do if you wanted to create the safest automobiles. You would put more effort into designing better airbags, seatbelts, crumple zones, and other safety features than into developing a more efficient or powerful engine. The result of this engineering emphasis would be obvious: a safer car.
Automobiles are far better today than they were 40 years ago, partly because of the many safety innovations developed in that time: anti-lock braking systems (ABS), traction control, electronic stability control (ESC), and more. Now, AI-enabled features such as lane departure warnings and collision warning and avoidance systems are making cars even safer.
The path to AI self-driving cars is through massively improved AI reliability and AI safety. By the time we get to AI self-driving cars, AI self-driving will be far safer than human drivers. So why would it not be the same with an AI-enabled loan decision, or an AI-based service call response?
OpenAI sees it that way. They are taking AI safety very seriously, stating they will put as much emphasis on AI safety as they do on AI performance. To that end, OpenAI created a Red Teaming Network and is leaning on external experts to help test new models:
Over the past few years, our red teaming efforts have grown from a focus on internal adversarial testing at OpenAI, to working with a cohort of external experts to help develop domain specific taxonomies of risk and evaluating possibly harmful capabilities in new systems.
The OpenAI Red Teaming Network is a community of trusted and experienced experts that can help to inform our risk assessment and mitigation efforts more broadly, rather than one-off engagements and selection processes prior to major model deployments.
Engineering and Verifying Safe AI Models
The use of Red Teams is both useful and important, but it might not be enough. Some concepts we might see applied to AI Safety are testability and defect reduction.
Testability: In the semiconductor world, the need to verify that IC chips work correctly leads to incorporating ‘hooks’ in the chip design to enable it to be tested. AI models might need the same thing, i.e., some hooks in the models to provide observability and transparency. Wouldn’t it be helpful to eliminate the risk of certain toxic outputs by simply erasing the knowledge needed to generate them?
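There is already a small precedent for this in today’s deep-learning tooling: PyTorch lets you attach forward hooks to a layer, giving some observability into what a model is doing internally. The sketch below is only a toy-scale illustration of the idea, not a proposal for how production models would expose test points.

```python
import torch
import torch.nn as nn

# A toy model standing in for one block of a much larger network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

captured = {}

def observability_hook(module, inputs, output):
    # Record summary statistics of the layer's activations for later inspection.
    captured[type(module).__name__] = {
        "mean": output.detach().mean().item(),
        "max": output.detach().abs().max().item(),
    }

# Attach the hook to an intermediate layer -- the "test point" in the network.
handle = model[1].register_forward_hook(observability_hook)

with torch.no_grad():
    model(torch.randn(4, 16))

print(captured)   # e.g. {'ReLU': {'mean': ..., 'max': ...}}
handle.remove()   # detach the hook once the inspection is done
```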
Six Sigma and defect reduction: Six sigma is about improving the quality of manufacturing, design, and business results by reducing the sources of defects within the manufacturing or business process. In manufacturing, the goal is producing every part to exact specification.
In AI, the problem is different; it is about reliability to expected, desired behaviors across a broad range of possible prompts in different domains. Yet the idea of driving defect levels far below catastrophic thresholds still holds promise for reducing the risk of catastrophic events. Improved AI reliability will make AI safer.
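As a back-of-the-envelope illustration of the six-sigma framing, with entirely made-up numbers: count the ‘defective’ outputs over a large, representative evaluation set and track that rate the way a manufacturing line would.

```python
# Hypothetical illustration: measuring an AI system's "defect rate" the way a
# six-sigma program would, by counting unacceptable outputs over a large,
# representative evaluation set. The numbers below are invented.

evaluated_prompts = 2_000_000   # prompts drawn from the target domains
defective_outputs = 14          # outputs judged harmful, false, or off-task

defect_rate = defective_outputs / evaluated_prompts
dpmo = defect_rate * 1_000_000  # defects per million opportunities

print(f"Defect rate: {defect_rate:.2e}")
print(f"DPMO: {dpmo:.1f}")      # classic six sigma targets ~3.4 DPMO
print(f"Reliability: {(1 - defect_rate) * 100:.5f}%")
```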
Investing more engineering effort into AI reliability, alignment, and security will result in safer AI systems:
Secure AI systems will be less likely to be misused by malicious actors.
Increased AI reliability will reduce the risk of accidental AI malfunctions.
AI alignment controls will make AI models less likely to be abused or abusive.
The Technical and the Social
When I became an engineering manager, I learned one of the first lessons of engineering management: people problems tend to be more intractable, difficult, and even impossible to solve than technical problems. The horrific terror attack last week reminds us that our fellow human beings can be as dangerous as any of our inventions.
Is AI both a technical and a social challenge? Certainly, in the sense that AI has huge social impact, any issues arising from using AI will be a social problem. But we should not confuse AI with humans and thus fail to see the underlying engineering challenge inherent in building AI systems. The point is that looking at root causes and principled solutions leads to treating AI’s problems and solutions as technical.
This also means that AI dooming, while it may be helpful in highlighting the challenge, is misplaced. AI is not inherently dangerous or unsafe; it’s how we build it that matters. This is true in particular of the ‘existential risk’ scenarios.
This in turn means the useful approach is to lean in on AI Safety as a technical concern and engineering discipline. Government can play a useful role in funding AI Safety research, so that AI capabilities don’t race ahead of AI reliability.
We need regulatory approvals to ensure that self-driving cars are safe before they get on the road. We might need the same for some powerful AGI-and-above robots and AI models. But regulatory agencies don't invent the technologies that solve the AI Safety problem; technologists do. Ultimately, AI safety will come from technologists creating engineering solutions for it.
Regulators, social commentators and professional worriers need to give the AI nerds the space to come up with much more reliable, safe and secure AI systems.
If they do, perhaps technologists can develop AI that isn’t just 95% for the good and 5% for the bad, but achieves Six Sigma quality and safety levels: “five nines,” or 99.999% for the good and only 0.001% for the bad.
(*) Postscript
I wouldn’t be surprised if a book titled “AI Safety: An Engineering Approach” comes out in the next few years. It would be very much needed!