AI Agents 2024 Pt 2 - Autogen Studio
Autogen, TaskWeaver, and Autogen Studio 2.0 to build AI Agent Swarms
A Swarm of AI Agents
Artificial Intelligence (AI) agents are software entities designed to perform tasks autonomously. We have written about AI agents in several articles and many Weekly AI updates, most recently in “AI Agents for 2024 Pt 1 - ChatDev,” which focused on the ChatDev AI agent framework. This article follows up on that and focuses on Autogen, a highly flexible and general multi-agent framework in which multiple agents converse with each other to solve tasks.
AI agents are improving at a fast pace, and there are many different projects, frameworks, and research efforts in this space, with new developments popping up every week. What sets Autogen apart as an AI agent framework is its flexibility and generality, making it useful for a wide range of AI agent use cases. It also benefits from being an open-source project, initiated and driven by Microsoft Research, with a strong community behind it.
Microsoft Research published their first Autogen paper, “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” in August 2023, and followed it up by releasing Autogen as a code repo on Github in September 2023. Autogen began as a spinoff of the FLAML (Fast Library for Automated Machine Learning) project.
Since then, they have actively maintained the Autogen Github repo and have a website with documentation as well as a Discord community. It’s a well-supported project in active development.
Autogen Framework Features
First-generation AI agents wrapped a single LLM API to build a single agent. Second-generation AI agent frameworks use multiple AI agents that collaborate to execute more complex tasks and workflows. AutoGen provides a multi-agent conversation framework as a high-level abstraction, so developers can build more complex multi-agent workflows and applications.
Key features and components in Autogen:
Models and LLM inference: Autogen relies on third-party API endpoints for LLM inference. It can work with any model (self-hosted or local) accessible through an inference server compatible with the OpenAI Chat Completions API. AutoGen also supports enhanced LLM inference APIs, which can be used to improve inference performance and reduce cost.
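To make the compatibility requirement concrete, here is a hedged sketch of the request shape any such inference server must accept. This is a Chat Completions-style JSON body built with the standard library only; the model name is an invented placeholder, not a real deployment.

```python
import json

# Hedged sketch: what "compatible with the OpenAI Chat Completions API"
# means in practice. Any inference server (hosted or local) that accepts
# a request body shaped like this can serve as an Autogen model backend.
# The model name below is a placeholder, not a real service.
def build_chat_request(model: str, user_message: str) -> str:
    """Build the JSON body for a Chat Completions-style request."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
    }
    return json.dumps(body)

request_body = build_chat_request("local-llama", "Summarize the AutoGen paper.")
```

Because local servers like those exposing an OpenAI-compatible endpoint speak this same shape, swapping models is mostly a matter of changing the endpoint URL and model string.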
Agent types: An agent is an autonomous unit designed to complete tasks and make decisions. In multi-agent systems, you can create different agents for different tasks and roles, each with specific skills and a persona: specialists useful within a broader context. One key feature of agents is that you can arm them with ‘skills,’ capabilities to execute code or run different tools.
Agent collaboration: Autogen supports “diverse conversation patterns” for complex workflows. The wide range of conversation patterns between various agent types enables a broad range of application patterns.
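The simplest of these conversation patterns is a two-agent back-and-forth: one agent proposes, the other reviews, until a turn limit or termination signal. Below is a hedged stdlib sketch of that control flow; the canned replies stand in for the LLM calls real Autogen agents would make.

```python
# Hedged sketch of a two-agent conversation loop, the simplest Autogen
# pattern. Real Autogen agents call an LLM for each reply; here the
# replies are canned stand-ins so the control flow is visible.
def run_two_agent_chat(task: str, max_turns: int = 4) -> list:
    """Alternate messages between a 'worker' and a 'critic' agent."""
    transcript = [("user", task)]
    for turn in range(max_turns):
        speaker = "worker" if turn % 2 == 0 else "critic"
        last_message = transcript[-1][1]
        # Stand-in for an LLM call: each agent responds to the last message.
        reply = f"{speaker} responding to: {last_message[:40]}"
        transcript.append((speaker, reply))
        if "TERMINATE" in reply:  # Autogen-style termination check
            break
    return transcript

chat = run_two_agent_chat("Write a sorting function")
```

Richer patterns (group chats, hierarchical delegation) generalize this loop by adding a speaker-selection step between turns.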
Taskweaver - Tools for AI Agents
Since the initial Autogen release, there have been several updates and releases, most notably Autogen Studio, a browser-based UI for Autogen, and now Autogen Studio 2.0. Microsoft Research has also released a related AI agent project called TaskWeaver. We will discuss Studio below, but first we will cover TaskWeaver’s features and architecture.
TaskWeaver is a “code-first agent framework” which brings tool use to agents and is well-suited for automating and executing data analytics tasks. TaskWeaver was introduced in a paper titled “TaskWeaver: A Code-First Agent Framework” in December 2023, and comes with its own Github repo and website.
How TaskWeaver works: The framework takes in a user request, which a planner LLM converts into code snippets, developing a plan and coordinating the execution of sub-tasks. It then runs code execution and data analytics sub-tasks in a stateful manner through a variety of plugins that it treats as callable functions. It uses self-reflection to review results and make corrections as needed to construct a final output. The architecture is shown below.
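The planner-plus-plugins loop can be sketched in a few lines of plain Python. This is a hedged illustration of the idea, not TaskWeaver's actual API: each sub-task in the plan names a plugin (an ordinary callable), and a shared state dict is threaded through the steps so later sub-tasks can build on earlier results.

```python
# Hedged sketch of TaskWeaver's execution idea: a planner produces
# sub-tasks, each sub-task maps to a plugin (a plain callable), and
# results accumulate in shared state across steps. The plugin names
# and the plan below are invented for illustration.
def load_data(state: dict) -> dict:
    state["rows"] = [3, 1, 2]  # stand-in for reading a real dataset
    return state

def sort_data(state: dict) -> dict:
    state["rows"] = sorted(state["rows"])
    return state

PLUGINS = {"load_data": load_data, "sort_data": sort_data}

def execute_plan(plan: list) -> dict:
    """Run each sub-task's plugin in order, threading state through."""
    state: dict = {}
    for step in plan:
        state = PLUGINS[step](state)
    return state

result = execute_plan(["load_data", "sort_data"])
```

The statefulness is the important part: because each plugin reads and writes the same session state, a data-analytics workflow can be decomposed into small, correctable steps.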
The key takeaway from TaskWeaver is that, by using code as the language of an LLM, anything you can express as a function call can become a skill or subroutine for an AI agent.
Using code as language makes any software subroutine, API or plug-in accessible to LLMs, so you can create more powerful AI agents by equipping them with code-driven skills. This capability is not unique to TaskWeaver; it’s a part of Autogen as well.
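To make “code as language” concrete, here is a hedged stdlib sketch: a snippet of generated code (the string stands in for LLM output) is executed in a namespace that exposes a registered function, which effectively turns that function into an agent skill. The function name is invented for illustration, and production frameworks run such code in a sandboxed executor rather than a bare exec().

```python
# Hedged sketch of "code as language": executing an LLM-generated code
# snippet in a namespace that exposes a registered function. The snippet
# string stands in for model output; real frameworks sandbox this
# execution rather than calling exec() directly.
def get_weather(city: str) -> str:
    """A registered function the generated code may call (toy skill)."""
    return f"Sunny in {city}"

llm_generated_code = "result = get_weather('Paris')"

namespace = {"get_weather": get_weather}
exec(llm_generated_code, namespace)  # the function call becomes a skill
answer = namespace["result"]
```

Any API client, database query, or plug-in you can wrap in a Python function becomes reachable the same way.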
Using Autogen Studio 2.0
The Autogen developers have helpfully created Autogen Studio, a browser-based interface for building and running Autogen AI agent swarms; it is the easiest and most accessible path to using Autogen.
They say Autogen Studio is “suitable for developers of all skill levels,” and it seems to live up to that; it’s not difficult to install and use. Getting Autogen Studio running on your local machine is quite simple, especially if you have any experience with Python. Using an Anaconda environment and Python 3.11, these commands will get you started on Linux; other systems are similar:
$ conda create -n autogenenv python=3.11
$ conda activate autogenenv
$ pip install autogenstudio
$ export OPENAI_API_KEY=sk-M8jHFmq4fLevG... #You need your OpenAI API key here
$ autogenstudio ui --port 8081
That last command launches the Autogen Studio web UI on port 8081, which you can then open in your browser.

Skills are tools you can give your AI agents.
The Studio interface has three top-level sections: Build, Playground, and Gallery. The Build section lets you build four things:
Skills - These are Python functions that an LLM-based agent can execute as plug-ins to get further information or complete a task. Studio comes pre-loaded with skills such as Arxiv search and generate_images.
Models - This sets up which LLMs you can access. For an AI model accessible via an API, you just set up the endpoint and API key.
Agents - An agent is defined as an LLM-based worker/assistant that can perform a set of tasks. You give it an LLM and a system message, define when to request human input, and give it the skills (plug-ins) it has access to.
Workflows - You create a workflow by setting up collaborations among agents and how they communicate with each other. The interface from the human to the workflow agents is a text interface, just like an LLM chatbot.
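Since Studio skills are just Python functions, writing a custom one is straightforward. Here is a hedged sketch of what such a skill might look like; the function name and behavior are invented examples, not one of Studio's bundled skills.

```python
# Hedged sketch of a custom Autogen Studio-style skill: a plain Python
# function with type hints and a docstring, which an agent can call as
# a plug-in. The name and behavior here are invented for illustration.
def word_count(text: str) -> int:
    """Count the words in a block of text.

    The docstring matters: the agent's LLM reads it to decide
    when and how to call the skill.
    """
    return len(text.split())

count = word_count("Autogen Studio makes agent skills easy")
```

In Studio you would paste a function like this into the Skills editor and then attach it to an agent in the Build section.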
Developing a full AI agent setup from scratch can be daunting, but once you hook an AI model API up to your agent, you can run the “General Agent Workflow” directly to try out the system.
The Playground section is where you run the workflows. I used this prompt to try out the General Workflow as a research gopher:
Please find Arxiv papers written in January 2024 on the topic of Artificial Intelligence with the word "Agent" in the title
It came back with answers that were older than the dates requested. It became clear the code for date-constrained Arxiv searches had a bug: the datetime defined in Python lacked a time-zone offset. So I tried to get that fixed with: “Please write the Python code to execute a search on Arxiv papers with specific publication date.”
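That bug is a common Python pitfall worth spelling out: Arxiv publication dates are offset-aware (they carry a `+00:00` UTC offset), and comparing an offset-aware datetime against a naive one raises a TypeError, so a naive cutoff date silently breaks the filter. A hedged sketch of the failure and the fix:

```python
from datetime import datetime, timezone

# Hedged sketch of the timezone bug described above: arxiv publication
# dates are offset-aware (e.g. 2024-02-06 09:11:20+00:00), so a naive
# cutoff datetime cannot be compared against them directly.
published = datetime(2024, 2, 6, 9, 11, 20, tzinfo=timezone.utc)

naive_cutoff = datetime(2024, 1, 1)                        # buggy: no tzinfo
aware_cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)   # the fix

try:
    _ = published >= naive_cutoff  # raises TypeError in Python 3
    comparison_failed = False
except TypeError:
    comparison_failed = True

is_recent = published >= aware_cutoff  # works once both sides are aware
```

The fix is simply to attach a timezone to the cutoff date before comparing.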
While I was expecting some Python code to plug into a corrected ‘skill,’ it actually corrected and executed the updated query itself, and returned a corrected result:
The code execution was successful, and it has returned two papers related to Artificial Intelligence and AI agents that were published in February 2024:
Title: A call for embodied AI
Published Date: 2024-02-06 09:11:20+00:00
Authors: Giuseppe Paolo, Jonas Gonzalez-Billandon, Balázs Kégl
Link: http://arxiv.org/pdf/2402.03824v1
Title: Toward Human-AI Alignment in Large-Scale Multi-Player Games
Published Date: 2024-02-05 22:55:33+00:00
Authors: Sugandha Sharma, Guy Davidson, Khimya Khetarpal, Anssi Kanervisto, Udit Arora, Katja Hofmann, Ida Momennejad
Link: http://arxiv.org/pdf/2402.03575v1
These papers match the criteria of being on the topic of Artificial Intelligence with relevance to AI agents and being published in February 2024. If you need more papers or further assistance, please let me know.
Impressively, the agent recognized the error, corrected it, reran, and got to success. The ability to correct errors and recover is vital to robust and reliable AI agents, so this was encouraging. Alas, other executions and commands were unable to recover from errors properly. It’s still hit-or-miss, but I was pleasantly surprised with the on-target responses.
The final step in using Studio: If you like a particular Playground interaction, you can publish it to the Gallery section for reference later.
Studio is easier and more accessible than using the Autogen Python library directly, but there is a tradeoff: some features in the Autogen library are not directly exposed in the Studio interface. For example, Studio only supports simple agent workflows.
Any developer or hacker can treat Autogen Studio like a kid’s chemistry set: just play around with it and see what happens. Run it with local models or try different models; build skills (via Python code); automate some typical computer tasks.
Summary
Autogen is a capable multi-agent AI framework that can be used to develop a broad range of agent-based applications and workflows. Autogen Studio is the smoothest open-source AI agent framework interface available right now, giving easy access to Autogen’s capabilities. The developers of Autogen Studio promise ongoing improvements.
AI agent frameworks still feel halfway to the full-blown productive workflow tools we want them to be, but they keep getting better. I’m encouraged that we are seeing agent reliability moving in the right direction.
If you are not a developer or just want a zero-code solution, fear not; such solutions are coming. In a followup, we will present a recently released ‘consumer-ready’ AI agent solution from MultiOn. If 2023 was the year of LLMs, 2024 will be the year of multi-modal AI models and AI agents.