AI Agents for 2024 Pt1 - ChatDev
ChatDev uses a team of AI Agents to automate building software apps
AI Agents - An Update
What are the best AI Agents, tools and frameworks right now? What can AI Agents do for you?
While the idea of AI Agents as autonomous AI tools has been around for many years, they remained more idea than reality. GPT-4’s release in March 2023 opened the door to a very capable LLM foundation for autonomous AI agents.
Because an agent could repeatedly query GPT-4 to do all the underlying work, building an autonomous AI agent became much more accessible. Such an agent could do more than what might be possible inside a single GPT-4 prompt.
An ecosystem quickly developed in which many AI agent frameworks were published on GitHub, with several, such as AutoGPT, going viral. They typically shared the same conceptual architecture: planning and priority-setting of sub-tasks; execution of sub-tasks by calling an AI model API; and memory of the context and task stored in a database.
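To make that architecture concrete, here is a minimal sketch of the plan, execute and remember loop. It is an illustration only, not the code of AutoGPT or any other framework: `fake_model`, `plan` and `run_agent` are hypothetical names, the "model" is a stub that echoes its prompt, and a plain Python list stands in for the database.

```python
from collections import deque
from typing import Callable, Dict, List


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM API call (assumption): it just echoes the prompt."""
    return f"result for: {prompt}"


def plan(goal: str) -> List[str]:
    """Hypothetical planner; a real agent would ask the model to split the goal."""
    return [f"{goal} - step {i}" for i in (1, 2, 3)]


def run_agent(goal: str, model: Callable[[str], str] = fake_model) -> List[Dict[str, str]]:
    memory: List[Dict[str, str]] = []  # stands in for a database of past work
    queue = deque(plan(goal))          # planning and priority-setting of sub-tasks
    while queue:
        task = queue.popleft()         # take the next sub-task in priority order
        # Feed the two most recent results back in as context.
        context = "; ".join(entry["result"] for entry in memory[-2:])
        result = model(f"{task} | context: {context}")  # execution via the model
        memory.append({"task": task, "result": result})
    return memory


if __name__ == "__main__":
    for entry in run_agent("write a haiku"):
        print(entry["task"], "->", entry["result"])
```

Swapping `fake_model` for a function that calls a real model API would turn this into a very bare-bones agent; real frameworks add re-planning, tool use and persistent storage on top.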
We wrote about these AI agents last summer, in the articles “Using AI Agents pt 1: BabyAGI, AgentGPT and Beyond” and “Using AI Agents Pt 2: AutoGPT and SuperAGI.”
From the get-go, there was a lot of promise to AI agents, but they also had limitations in reliability. The goal of an AI agent is automating specific tasks, and the limiting factor in fully automating tasks is 100% reliable completion.
At the time, I mentioned:
Looking ahead to when the reliability kinks are worked out, more tools and plugins are brought into the AI agents, and the underlying AI models are improved, we can expect AI agents to address an ever-growing set of real tasks in the workplace and in our daily lives.
Since then, we have indeed seen many strides in tools, plug-ins, models and agent frameworks, bringing AI agents closer to being real tools for automation. New open-source AI models have gotten better, in particular for coding. AI Agent frameworks have added capabilities for planning, tool use, reflection and agent personas.
This is why I believe we will see these projects go from proof-of-concept to being very useful tools. Four new AI Agent frameworks have also been released that show more capabilities, which we will discuss in this series: ChatDev, AutoGen, TaskWeaver, and CrewAI.
ChatDev - Building AI Agents for Software Development
ChatDev is a multi-agent AI framework that can automate application software development, creating software from natural-language ideas and input.
ChatDev does this by using defined AI Agents with agent personas for the different tasks and roles needed to fully create and test the coding and documentation for a piece of software, including a Chief Executive Officer (CEO), Chief Product Officer (CPO), Chief Technology Officer (CTO), programmer, reviewer, tester, and art designer.
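The role-based idea can be sketched in a few lines. This is not ChatDev's actual code or API: the `Agent` class, the persona strings, the order of the hand-offs and the `stub_model` function are all assumptions made for illustration, and the stub only tags the message with the role that handled it instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    role: str
    persona: str  # hypothetical instructions that shape how this role behaves

    def respond(self, message: str, model: Callable[[str, str], str]) -> str:
        system_prompt = f"You are the {self.role}. {self.persona}"
        return model(system_prompt, message)


def stub_model(system_prompt: str, message: str) -> str:
    """Stand-in for an LLM call (assumption): appends the role name to the message."""
    role = system_prompt.split(".")[0].replace("You are the ", "")
    return f"{message} -> {role}"


# Roles taken from the article; personas and ordering are illustrative guesses.
TEAM = [
    Agent("CEO", "Decide what product to build"),
    Agent("CPO", "Turn the idea into product requirements"),
    Agent("CTO", "Choose the language and architecture"),
    Agent("Programmer", "Write the code"),
    Agent("Reviewer", "Review the code for bugs"),
    Agent("Tester", "Run the software and report failures"),
    Agent("Art designer", "Design the visual assets"),
]


def run_chain(idea: str, agents: List[Agent],
              model: Callable[[str, str], str] = stub_model) -> List[Tuple[str, str]]:
    transcript = []
    message = idea
    for agent in agents:
        # Each role receives the previous role's output as its input.
        message = agent.respond(message, model)
        transcript.append((agent.role, message))
    return transcript


if __name__ == "__main__":
    for role, output in run_chain("Gomoku game", TEAM):
        print(f"{role}: {output}")
```

The point of the sketch is the hand-off: every role sees the same task through its own persona, and the transcript records who did what.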
ChatDev is available to download from GitHub, and the setup of ChatDev isn’t difficult for those familiar with “git clone” commands and running applications.
I successfully ran it to develop the Gomoku (five in a row) game, one of its default games. One neat feature is that it is able to replay all that it did, so you can see how the agents - “CEO” and devs and testers, etc. - interact to create the game. Through this, you can see it invoke calls to the AI model (in this case GPT-3.5 Turbo) to create the game step by step.
I next asked ChatDev to make a Pomodoro timer app, and this time the app it created didn’t quite work. I took the code and asked ChatGPT (GPT-4) itself to debug the code and get it working, which it was able to do. I could also get ChatGPT to implement a similar, quite simple Pomodoro GUI app.
One lesson from this is that an AI agent is no stronger than the AI model it is based on; a single iteration of GPT-4 via the ChatGPT interface did better than multiple tries of GPT-3.5 in the ChatDev AI agent. You can use GPT-4 in ChatDev, and that will likely expand its capabilities and reduce errors, but it doesn’t look like a ChatDev AI agent team can produce more applications than a human prompting ChatGPT directly.
ChatDev is a great proof-of-concept that can generate some simple applications out-of-the-box, and has a great framework and structure for filling out a full software application, not just code, but documentation and more.
However, there is still a need for human guidance to get specifics right in more complex applications, where the human developer calls upon AI to assist while providing more direct guidance. The copilot approach is currently the right mode of using AI for real-world software development - human + AI beats either human or AI alone.
AI Agent Swarms
First-generation AI Agents built a wrapper around an LLM API to create a single agent. Second-generation AI Agent frameworks like ChatDev started to use multiple AI agents with different roles and personas to solve a task.
If one AI agent is an autonomous unit programmed to make decisions and perform tasks, then you can create different types of agents for different tasks and roles, each with specific skills and a particular job to do. You can complete more complex tasks by having specific AI agents with defined roles and skills do different sub-tasks and communicate with one another to complete the whole project.
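As a rough sketch of agents communicating to finish a task, the example below has a programmer agent and a reviewer agent exchange messages until the reviewer approves. Both agents are deterministic stubs rather than model calls, and the names `programmer`, `reviewer` and `collaborate` are hypothetical; no particular framework works exactly this way.

```python
from typing import List, Tuple


def programmer(task: str, feedback: str) -> str:
    """Stub programmer agent: revises its draft when it receives feedback."""
    draft = f"code for {task}"
    if feedback:
        draft += " (with input validation)"
    return draft


def reviewer(draft: str) -> str:
    """Stub reviewer agent: returns feedback, or an empty string to approve."""
    return "" if "input validation" in draft else "add input validation"


def collaborate(task: str, max_rounds: int = 3) -> Tuple[str, List[Tuple[str, str]]]:
    log: List[Tuple[str, str]] = []
    draft, feedback = "", ""
    for _ in range(max_rounds):
        draft = programmer(task, feedback)   # programmer acts on the last feedback
        log.append(("programmer", draft))
        feedback = reviewer(draft)           # reviewer replies to the programmer
        log.append(("reviewer", feedback or "approved"))
        if not feedback:                     # stop once the reviewer approves
            break
    return draft, log


if __name__ == "__main__":
    final, log = collaborate("Pomodoro timer")
    for speaker, message in log:
        print(f"{speaker}: {message}")
```

The `max_rounds` cap matters in practice: with real models in both roles there is no guarantee the reviewer ever approves, so the loop needs a way to stop.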
AI isn’t just replacing a single human role, but potentially automating whole work teams and their tasks. To do so, each AI agent is a member of a team, and that team - or ‘swarm’ - performs specific tasks and communicates just as a team of humans would to complete complex projects.
In follow-up articles, we will discuss AI agent frameworks - AutoGen and CrewAI - that pull together such teams or ‘swarms’ of AI agents to solve complex tasks.