AI Agents: Generate Outcomes – not Text

In the realm of technology, a major transformation is unfolding. Generative AI (GenAI) is no longer just about generating text; it is about generating actions and creating intelligent, autonomous agents. In the next one to two years, the way we interact with our devices will change radically. This article explores the evolving world of AI Agents and their promising future.

Beyond Text Generation: The Goal-Oriented GenAI

Lately, GenAI has been lauded for its ability to generate text and images. However, its true potential lies in generating outcomes, which is central to building autonomous agents. These agents use GenAI not just to converse but to perform tasks, make decisions, and interact with the world in meaningful ways. And an AI Agent can do more than execute a single task: it can execute complete workflows and generate outcomes.

An AI Agent can execute complete workflows and generate outcomes towards its goal!

Imagine not having to juggle different apps for different tasks. In the near future, you will simply talk to your device in everyday language and express your needs (see the Rabbit r1 for an early example). Based on the information you choose to share, the software will understand your life in detail, offer personalized outcomes, and take actions on your behalf. This AI-powered personal assistant, adept at natural language processing and task execution, is what we call an ‘agent’.

From Reasoning LLMs to Action-Oriented AI Agents

Today’s foundational Large Language Models (LLMs), such as GPT-4, are stepping stones. As their reasoning and planning capabilities grow, they are evolving into AI Agents. Some have developed into Large Action Models, which closely resemble LLMs but add a focus on action and interaction.

Defining the AI Agent

An AI Agent is an entity capable of perceiving its environment and taking action to achieve objectives. These agents are autonomous, possess sensory capabilities, make decisions, and can alter their environment.

An autonomous agent is a system that perceives its surroundings and acts over time to achieve its own goals, thus influencing its future interactions with the environment.
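To make the definition concrete, here is a minimal sketch of the perceive, decide, act cycle. It assumes a toy environment (a hypothetical `Room` with a temperature) and a rule-based decision instead of an LLM; the point is only that each action changes the environment, and therefore what the agent perceives next.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Toy environment (hypothetical): a room whose temperature the agent can change."""
    temperature: float

    def observe(self) -> float:
        return self.temperature

    def apply(self, action: str) -> None:
        # Acting alters the environment, which changes what the agent perceives next.
        if action == "heat":
            self.temperature += 1.0
        elif action == "cool":
            self.temperature -= 1.0


class ThermostatAgent:
    """Minimal goal-directed agent: perceive, decide, act, repeat."""

    def __init__(self, goal: float, tolerance: float = 0.5):
        self.goal = goal
        self.tolerance = tolerance

    def decide(self, perceived: float) -> str:
        # A rule stands in for the reasoning an LLM would do in a real agent.
        if perceived < self.goal - self.tolerance:
            return "heat"
        if perceived > self.goal + self.tolerance:
            return "cool"
        return "idle"

    def run(self, env: Room, max_steps: int = 20) -> list[str]:
        actions = []
        for _ in range(max_steps):
            action = self.decide(env.observe())  # perceive, then decide
            actions.append(action)
            if action == "idle":  # goal reached, stop acting
                break
            env.apply(action)  # act on the environment
        return actions
```

For example, `ThermostatAgent(goal=21.0).run(Room(18.0))` heats three times and then goes idle once the room is within tolerance of its goal.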

The Building Blocks of AI Agents

The core of an AI Agent is its architecture, which is typically organized into four modules: a profiling module that defines the agent’s role, a memory module that accumulates experience, a planning module that decomposes tasks, and an action module that executes decisions. In trying to mimic the human brain, these systems use an LLM as the brain itself and equip it with the skills to take action and generate outcomes.

An AI Agent has a personality and a goal. It has short- and long-term memory, and it can reason, plan, and make decisions towards its goal.
And it can interact with its environment using tools.
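As an illustration of how the profile (personality), goal, and memory could come together, here is a small sketch. The class `AgentMemory` and the function `build_prompt` are hypothetical names, not part of any framework. Short-term memory is a bounded `deque` that keeps only recent events, long-term memory keeps everything, and retrieval uses naive word overlap where a real system would more likely use embeddings.

```python
from collections import deque


class AgentMemory:
    """Sketch of short- and long-term memory (hypothetical design, not a specific framework)."""

    def __init__(self, short_term_size: int = 3):
        # Short-term memory: only the most recent events; older ones fall off automatically.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory: everything the agent has experienced.
        self.long_term: list[str] = []

    def remember(self, event: str) -> None:
        self.short_term.append(event)
        self.long_term.append(event)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Naive retrieval: rank long-term memories by the number of words shared with the query."""
        words = set(query.lower().split())
        scored = [(len(words & set(m.lower().split())), m) for m in self.long_term]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for _, m in scored[:k]]


def build_prompt(profile: str, goal: str, memory: AgentMemory, task: str) -> str:
    """Combine personality, goal, and memory into the context an LLM would reason over."""
    lines = [f"You are {profile}.", f"Your goal: {goal}.", "Recent events:"]
    lines += [f"- {e}" for e in memory.short_term]
    lines.append("Relevant past experience:")
    lines += [f"- {m}" for m in memory.recall(task)]
    lines.append(f"Next task: {task}")
    return "\n".join(lines)
```

The resulting prompt is what the planning step would send to the LLM; the bounded short-term memory keeps the context small, while long-term recall brings back only what looks relevant to the next task.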

The Action Module is what enables the Agent to actually take action; it typically consists of a set of available tools. Today these are purely digital tools, such as web browsing, API calls, or hacking tools from Kali Linux. But simply by opening up API access to industrial robots, agents can enter the physical world…
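A common way to build an action module is a registry of tools plus a dispatcher that executes the structured action the LLM chooses. The sketch below assumes the LLM answers with JSON of the form `{"tool": ..., "args": {...}}`. That format and the tools `add` and `fetch_url` are hypothetical; `fetch_url` is only a stub and makes no network request.

```python
import json
from typing import Callable

# Tool registry: name -> function. The tools registered below are hypothetical examples.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as a tool the agent may call."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("add")
def add(a: float, b: float) -> str:
    return str(a + b)


@tool("fetch_url")
def fetch_url(url: str) -> str:
    # Stub: a real tool would perform an HTTP request here.
    return f"<contents of {url}>"


def execute_action(action_json: str) -> str:
    """Dispatch an action chosen by the LLM, e.g. '{"tool": "add", "args": {"a": 2, "b": 3}}'."""
    action = json.loads(action_json)
    name = action.get("tool")
    if name not in TOOLS:
        # Errors are returned as observations so the agent can correct itself.
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**action.get("args", {}))
    except TypeError as exc:  # wrong or missing arguments
        return f"error: {exc}"
```

Returning errors as text instead of raising them lets the agent treat a failed tool call as just another observation. Restricting the agent to registered tools is also the natural place to limit what it is allowed to do in the world.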

Single-agent vs. Multi-agent Systems

The distinction between single-agent and multi-agent systems is significant. A single-agent system attempts to mimic a single human brain handling tasks. Multi-agent systems, in contrast, are akin to a team of brains. This can lead to social interactions between agents, and such systems often include correction agents that check and correct the outputs of another agent. For example, in code generation, one agent might be paired with another that runs the code, checks whether it works, and returns any errors for correction.
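The code-generation example can be sketched as a loop between two roles: a generator proposes code, and a checker runs it against a test and feeds any error back. In this sketch the generator is a hypothetical stub that returns canned attempts instead of calling an LLM, and the checker uses Python’s `exec`, which is only acceptable here because the code strings are our own; LLM-generated code should run in a sandbox.

```python
from typing import Callable, Optional


def make_stub_generator(candidates: list[str]) -> Callable[[str, Optional[str]], str]:
    """Hypothetical stand-in for an LLM coding agent: returns canned attempts in order.
    A real generator would send the task and the previous error message to an LLM."""
    attempts = iter(candidates)

    def generate(task: str, feedback: Optional[str]) -> str:
        return next(attempts)

    return generate


def check_code(code: str) -> Optional[str]:
    """Checker agent: runs the code plus a small test and returns an error message, or None."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # unsafe for untrusted code: use a sandbox in practice
        assert namespace["square"](4) == 16, "square(4) should be 16"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def generate_with_correction(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Alternate between generator and checker until the code passes or rounds run out."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = generate(task, feedback)
        feedback = check(code)
        if feedback is None:
            return code, round_no
    raise RuntimeError(f"no working code after {max_rounds} rounds; last error: {feedback}")
```

With a buggy first attempt (`return x * 2`) and a fixed second one (`return x * x`), the checker rejects the first with `AssertionError: square(4) should be 16`, and the loop returns the second attempt after two rounds. The round limit keeps the pair from correcting each other forever.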

A common challenge with LLMs is ‘hallucination’: the LLM gives a false answer that sounds plausible. I would argue that the human brain works the same way, so hallucination should not come as a surprise. Humans have many thoughts they (luckily) never say out loud, because they filter their thoughts before speaking. If we want to build great AI Agents, we have to build in such filters and correction agents, which in practice calls for multi-agent systems.

Notable Implementations of AI Agents

Among the many AI agents available, BabyAGI and AutoGPT stand out for their distinct approaches.

AutoGPT plans one task at a time, executes it, and creates the next task based on the results. This can sometimes lead to infinite loops. BabyAGI, on the other hand, begins by setting up an initial plan, which it maintains throughout execution by reacting to input from the environment and updating and reprioritising the plan. There are many other examples of AI Agents, in particular those well researched by Cheng et al. and by Wang et al.
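The BabyAGI-style approach can be sketched as a loop over a maintained plan: take the top task, execute it, add follow-up tasks derived from the result, and reprioritise the whole plan. This is a simplified illustration, not BabyAGI’s actual code; `execute`, `create_tasks`, and `prioritize` are hypothetical callables that would each be an LLM call in a real agent. The iteration cap is a simple guard against the infinite loops mentioned above.

```python
from collections import deque
from typing import Callable


def run_task_loop(
    objective: str,
    initial_plan: list[str],
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str], list[str]],
    prioritize: Callable[[list[str]], list[str]],
    max_iterations: int = 10,
) -> list[tuple[str, str]]:
    """BabyAGI-style loop (simplified sketch): keep a plan, execute the top task,
    add follow-up tasks from the result, then reprioritise the remaining plan."""
    plan = deque(initial_plan)
    done = []
    # Cap the number of iterations so a plan that keeps growing cannot run forever.
    for _ in range(max_iterations):
        if not plan:
            break
        task = plan.popleft()
        result = execute(objective, task)
        done.append((task, result))
        plan.extend(create_tasks(task, result))
        plan = deque(prioritize(list(plan)))
    return done
```

For example, with an initial plan of `["research", "book venue"]`, a `create_tasks` that adds `"write draft"` after `"research"`, and alphabetical sorting as a stand-in for prioritisation, the loop completes `research`, `book venue`, and `write draft` in that order and then stops because the plan is empty.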

LangChain is a remarkable project: an open-source framework that simplifies the construction of autonomous agents, making it easier for developers to build and customize their AI solutions. AutoGen is an easy-to-use framework from Microsoft for building multi-agent systems based on OpenAI LLMs.

The Risks Associated with AI Agents

Despite their potential, AI Agents pose risks, including data privacy concerns, ethical dilemmas, over-reliance, difficulty of control, unreliable behavior, and socio-economic impacts. As these agents become increasingly integrated into our lives, it is crucial to develop and deploy them responsibly and with awareness of their implications. The “autonomous” part of AI Agents is probably the scariest aspect of AI and the reason for numerous excellent books on the topic (I strongly recommend Life 3.0 by Max Tegmark here).

Please be careful: Autonomous Agents can be extremely dangerous!

Final Thoughts: A New Beginning

The journey of AI Agents is not ending; it is just beginning. As these agents become more sophisticated, their integration into our daily lives promises a future where technology is not just a tool but a personalized assistant, enhancing our lives in ways we have only begun to imagine. The era of AI Agents is here, and it is redefining how we interact with technology.