AI as the new Operating System
AI is a new paradigm for software execution, user experience, and system management
Looking at LLMs as chatbots is the same as looking at early computers as calculators. We're seeing an emergence of a whole new computing paradigm, and it is very early. - Andrej Karpathy, OpenAI
AI as OS
Andrej Karpathy dropped a Deep Thought tweet on X recently that deserves some further elaboration and discussion:
“a more complete picture is emerging of LLMs not as a chatbot, but the kernel process of a new Operating System. E.g. today it orchestrates:
Input & Output across modalities (text, audio, vision)
Code interpreter, ability to write & run programs
Browser / internet access
Embeddings database for files and internal memory storage & retrieval”
Complexity Simplified with AI
All complex systems, including complex information systems, have a structure to them, an architecture. Complex information systems are built in layers, where higher-level patterns are supported by lower layers of specific implementations.
For the PC, and for computers in general, the lowest layer is the hardware, with an Operating System (OS) controlling the hardware, and an application layer that gains access to hardware capabilities through the OS. The OS acts as an abstraction layer that simplifies how applications are written to access hardware capabilities. The OS also manages how applications share system resources.
So what does it mean to describe the LLM as a “kernel process” of an OS? It means that the LLM acts as an orchestrator of other applications and tools, and as an abstraction layer through which those capabilities are accessed. Let’s go through those four orchestration capabilities: user interface, process execution, communications, and memory management.
LLM as User Interface
The LLM can handle input and output across modalities. With voice input and output via AI speech recognition and synthesis, speech-enabled LLMs can displace prior user interfaces. As those modalities expand, the term LLM is giving way to LMM - Large Multi-modal Model - a model that can interface across images, video, sound and speech, as well as text.
Massive menus of features in complex software apps can be replaced by a hands-free audio request and response, with the LLM as interpreter. The new speech interface to ChatGPT is the interface of the future. This will make these complex systems much easier to use.
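As a rough illustration, a minimal voice-driven interface might look like the sketch below, where transcribe, call_llm, and synthesize_speech are hypothetical placeholders for whichever speech recognition, LLM, and speech synthesis services are actually used:

```python
# Minimal sketch of a voice-driven LLM interface.
# transcribe, call_llm, and synthesize_speech are hypothetical placeholders.

def transcribe(audio_bytes: bytes) -> str:
    """Placeholder: send audio to a speech recognition model, return the text."""
    ...

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to an LLM, return its text response."""
    ...

def synthesize_speech(text: str) -> bytes:
    """Placeholder: convert text to audio with a speech synthesis model."""
    ...

def handle_voice_request(audio_bytes: bytes) -> bytes:
    # Speech in -> text -> LLM reasoning -> text -> speech out.
    user_text = transcribe(audio_bytes)
    reply_text = call_llm(user_text)
    return synthesize_speech(reply_text)
```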
LLM as Execution Engine
The LLM is a general execution engine that can give valid responses across a range of requests and queries. With the ability to generate code, a whole new paradigm becomes possible. The ChatGPT Code Interpreter plug-in gives it the ability to write and run programs. Thus, the LLM can construct a custom software component on the fly and run it to solve a problem.
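To make this concrete, here is a toy sketch of the pattern, not any specific product implementation: ask the model to write code for a task, then run the generated code. The call_llm function is a hypothetical placeholder, and real systems such as Code Interpreter execute the code in a sandbox rather than directly:

```python
# Toy sketch of "LLM as execution engine": ask the model to write code for a
# task, then run the generated code. call_llm is a hypothetical placeholder;
# real systems run the generated code in a sandbox, not directly via exec().

def call_llm(prompt: str) -> str:
    """Placeholder: return Python source code generated by an LLM."""
    ...

def solve_with_generated_code(task: str) -> dict:
    source = call_llm(
        f"Write Python code that solves: {task}\n"
        "Store the final answer in a variable named `result`."
    )
    namespace: dict = {}
    exec(source, namespace)  # run the freshly generated program
    return {"code": source, "result": namespace.get("result")}
```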
LLM as Communications Manager
Just as the LLM can control code execution, this can extend to controlling the browser and access to the internet. This is another example of plug-ins being made available to LLMs.
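A rough sketch of how internet access can be exposed to an LLM as a tool is below. The call_llm placeholder and the JSON tool-request format are illustrative assumptions, not any specific plug-in protocol:

```python
# Sketch of exposing internet access to an LLM as a tool it can invoke.
# call_llm and the JSON tool-request format are illustrative assumptions.
import json
import urllib.request

def call_llm(prompt: str) -> str:
    """Placeholder: returns either a final answer or a JSON tool request."""
    ...

def fetch_url(url: str) -> str:
    """Fetch a page and return (truncated) text for the model's context."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")[:4000]

def answer_with_browsing(question: str) -> str:
    prompt = question
    for _ in range(5):  # bounded tool-use loop
        reply = call_llm(prompt)
        try:
            request = json.loads(reply)  # e.g. {"tool": "browse", "url": "..."}
        except (json.JSONDecodeError, TypeError):
            return reply  # plain text is treated as the final answer
        page = fetch_url(request["url"])
        prompt = f"{question}\n\nFetched page content:\n{page}"
    return reply
```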
LLM as Memory Manager
Here the analogy might break down, as LLMs are not directing this part of the AI ecosystem. For example, in Retrieval Augmented Generation (RAG), the memory components, such as vector databases, and the queries on them are managed outside the LLM itself. AI agents and AI applications run LLM execution loops as a subroutine, feeding the relevant retrieved memory into the LLM context.
RAG is currently just one solution to the problem of giving models access to memory, and we may see a future where access to memory is baked into the AI model. More generally, though, it is useful to see each iterative execution of the LLM as a subroutine.
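A minimal sketch of the RAG pattern makes the point: the application, not the LLM, decides what memory reaches the model's context. The embed, VectorStore, and call_llm pieces are hypothetical placeholders for whatever embedding model, vector database, and LLM are used:

```python
# Minimal sketch of Retrieval Augmented Generation (RAG): the application, not
# the LLM, decides what memory goes into the context. embed, VectorStore, and
# call_llm are hypothetical placeholders.

def embed(text: str) -> list[float]:
    """Placeholder: return an embedding vector for the text."""
    ...

class VectorStore:
    def search(self, query_vector: list[float], top_k: int = 3) -> list[str]:
        """Placeholder: return the top_k most similar stored passages."""
        ...

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to an LLM, return its response."""
    ...

def rag_answer(question: str, store: VectorStore) -> str:
    passages = store.search(embed(question))  # retrieval happens outside the LLM
    context = "\n\n".join(passages)
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")
```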
AI Is The New Computing Paradigm
The Foundation AI model is the foundation of a new computing paradigm. Even though the “LLM as an OS kernel process” analogy only partly fits, it’s clear that the AI ecosystem as a whole fits the OS analogy: everything in the software stack is being disrupted by artificial intelligence, and some part of the AI stack touches every part of every software application.
AI is a new paradigm for software execution:
User interfaces are AI-driven; AI acts as the intermediary, performing speech recognition, reasoning on natural language input, and more.
Software components are built and executed on the fly by AI processes. This is already used in data analytics, but on-the-fly custom software could also solve specific reasoning tasks.
AI applications build major capabilities by calling LLMs as a subroutine to solve tasks and generate responses from prompts. For example, graphics editing tools will lean on generative AI for image manipulation and generation tasks.
As we have discussed in recent articles on Intel and NVidia, AI is remaking chips and the hardware stack: The GPU, the CPU, the PC, the data center.
AI is popping up in many software and technology company announcements these days, and it’s not just sprinkling AI marketing fairy dust on existing features, but adapting products and features to this new paradigm.
Another way to look at this shift is Software 3.0: Swyx has called this new paradigm of AI-driven software “Software 3.0,” where the LLM itself encodes the decisions that, in prior iterations of software, would be hard-coded into the software itself. We’ll discuss this concept further in a follow-up article.
The AI OS
Yet another way of looking at this is to ask how AI will change the Operating Systems we have. What can AI do to change the OS? What would an AI Operating System (AI OS) look like?
An operating system that integrates AI for computing and system management tasks would make the OS more adaptive, customizable, optimizable and flexible:
Intelligent Task and Resource Management: AI algorithms would efficiently prioritize tasks and manage system resources such as CPU, memory, and storage, based on the user’s needs.
Adaptive and Natural User Interface: The AI-driven user interface starts with natural language as its communication medium, either via chat or via speech-to-text input. AI-powered chatbots or virtual assistants would be an integral part of the system. Further, AI will personalize the user experience and adapt to individual user preferences and behavior, by learning from user interactions. AI is the new UX.
Intelligent System Maintenance and Security: AI would be used to enhance security measures, identifying and mitigating potential threats in real-time. AI might also predict hardware failures and initiate maintenance or backup procedures before system failures occur.
Data Analytics, Insights and Learning: AI-based data analytics could provide insights into user behavior, system performance, and other relevant metrics. From this, the OS would learn and adapt to changing user needs and system requirements, optimizing its performance over time.
ML and AI Integration: Last but not least, just as the traditional OS is a layer for serving up software applications, the AI OS would be a layer for serving up ML and AI models. This includes running ML frameworks and AI model inference tasks as part of the system. Managing AI-specific workloads would be a part of the AI OS role.
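Speculatively, the serving layer of such an AI OS might look something like the sketch below, with an LLM routing natural-language requests to installed models and inference running as a managed workload. All of the names here (call_llm, ModelRegistry, run_inference) are illustrative assumptions, not an existing API:

```python
# Speculative sketch of an AI OS serving layer: an LLM routes a natural-language
# request to an installed model, and inference runs as a managed workload.
# call_llm, ModelRegistry, and run_inference are illustrative assumptions.
import json

def call_llm(prompt: str) -> str:
    """Placeholder: LLM returns a JSON routing decision."""
    ...

class ModelRegistry:
    """Placeholder registry of installed AI models - an 'app store' for models."""
    def load(self, name: str):
        ...

def run_inference(model, payload: dict) -> dict:
    """Placeholder: schedule and run an inference job as a managed workload."""
    ...

def handle_request(user_request: str, registry: ModelRegistry) -> dict:
    decision = json.loads(call_llm(
        "Route this request. Reply as JSON with 'model' and 'payload': "
        + user_request
    ))
    model = registry.load(decision["model"])  # e.g. a vision or speech model
    return run_inference(model, decision["payload"])
```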
Autonomous Systems
In the short term, we are seeing generative AI as an added feature in the OS; for example, Microsoft is rolling out “Copilot” features in its Windows OS.
Longer term, as Foundation AI models advance and go multi-modal and embodied, the AI OS will converge with the software stack of advanced autonomous agents, blurring the lines between system, OS, and application. This autonomous AI OS will be the software stack for self-driving cars and autonomous robots.
Combining these into a future AI system takes us in the direction of ‘Jarvis’-style intelligent systems: they have a natural language interface that understands us and is most helpful to us; they use AI models to adapt and respond; they sense us and their environment; their embedded AI models tap into vast amounts of knowledge and memory, then reason on that and respond; they use iterative AI model loops with correction, planning, and ‘chain-of-thought’ to add reliability and complex behavior; they interact at the human level we most need, speaking with us and generating information and creations in many forms and modalities.
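The iterative model loop at the heart of such a system might look, very roughly, like the sketch below, with call_llm and run_tool as hypothetical placeholders; real agent frameworks add memory, error handling, and safety checks:

```python
# Rough sketch of an iterative plan-act-observe loop, with the LLM called
# repeatedly as a subroutine. call_llm and run_tool are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to an LLM, return its response."""
    ...

def run_tool(action: str) -> str:
    """Placeholder: execute an action (search, code, device control), return the observation."""
    ...

def agent_loop(goal: str, max_steps: int = 10) -> str:
    history = f"Goal: {goal}"
    for _ in range(max_steps):
        step = call_llm(
            f"{history}\nThink step by step, then output either "
            "ACTION: <action> or FINAL: <answer>."
        )
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        observation = run_tool(step.removeprefix("ACTION:").strip())
        history += f"\n{step}\nObservation: {observation}"
    return "No answer within the step budget."
```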
This ultimate AI Operating System lands us at AGI. AGI itself needs to be a complete autonomous agent system, and the whole system would be driven by multiple layers and types of AI capability, with OS and software application definitions all redefined in AI terms as both direct software and “AI Software 3.0.”
Postscript. The ‘AI PC’
We have to go back to the internet era to find a time when the PC changed as dramatically. Pre-internet, the core use case of the PC was running stand-alone software applications. In the internet era, the browser became the interface of choice, first to access information, but then later to run online software via Software-as-a-Service.
In the AI era, it’s intelligence on tap that will change things. While our definition of AGI is an autonomous system that could be embodied in a number of ways, there is and will be a place for the PC or laptop as a main intelligent device.
An AI-enabled PC will have the elements of the AI OS described above: natural language interfaces, adaptive and intelligent system management with AI, access to AI models via a kind of AI model “app store”, and a learning, self-improving system. It will have the hardware, such as neural inference engine chips, to efficiently run AI models and make it all work.