My definition of an AI Agent (originally shared on Twitter):
- takes directions (i.e. human-provided objective)
- has multiple tools it can use (e.g. call an API, retrieve context, etc.)
- autonomously reasons to decide how & when to use its tools to accomplish directions
*All 3 components must be present
Let's break down each component.
Directions
The objective or goal that the agent is tasked with accomplishing. This could be anything from a discrete task like "write a blog post" to an open-ended objective like "optimize a company's marketing strategy".
The important thing to note is that you're providing an objective, in the declarative sense. It's up to the agent to use its autonomous reasoning to decide how to accomplish the objective. You may provide guidelines or even specific requirements, but the agent must be able to reason through the best way to accomplish the objective.
Quick interruption to define imperative vs declarative (according to GPT-4o):
Imperative/Procedural
This is where you specify a sequence of steps or instructions. Each step is defined in order, like "do A, then B, then C," and the program follows that path exactly.
Declarative
This is where you specify what you want as an outcome without defining the specific steps to get there. Here, saying "I want C" implies that the system will figure out that A and B are needed to achieve C.
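To make the distinction concrete, here's a toy sketch in Python. The db object and the sales_2023 table are placeholders, borrowed from the example later in this post:

    # Imperative: "do A, then B, then C" -- every step is spelled out by the developer.
    def total_revenue_imperative(db):
        rows = db.execute("SELECT SUM(revenue) FROM sales_2023")  # A: run the query
        total = rows[0][0]                                        # B: extract the value
        return f"Total revenue last year: ${total:,}"             # C: format the answer

    # Declarative: "I want C" -- only the outcome is stated.
    # An agent decides which queries to run, and how to react if one fails.
    objective = "Calculate the company's total revenue for last year."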
I've primarily focused on building declarative agents. The most significant benefits of declarative agents are two-fold:
- The agent is constantly reasoning and reacting to the results of its tool calls, to give itself the best possible chance of accomplishing the objective.
- As the underlying model improves, the agent continually improves, and is even capable of accomplishing more complex objectives.
Autonomous Reasoning
Autonomous reasoning is the critical ingredient because it represents the difference between an agent and other types of LLM applications.
Specifically, autonomous reasoning refers to the ability of the agent to decide what to do next, at each step.
Let's consider an example to illustrate the difference between an autonomous reasoning agent and a series of prompts chained together in a sequence. We'll start with the example of prompt chaining:
Prompt Chaining
User: "Write a query to calculate the total revenue for the company last year."
LLM: "Sure! Here's a tool call with the query:
SELECT SUM(revenue) FROM sales WHERE date >= '2023-01-01' AND date <= '2023-12-31';"
Tool Result: "ERROR: Table sales does not exist. Did you mean sales_2023?"
LLM: "Unfortunately, the table sales does not exist.
I do not have a way to calculate the total revenue for the company last year.
You may need to use sales_2023 instead."
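In code, that chain is a fixed sequence the developer wires together. Here's a minimal sketch using the Anthropic Python SDK, where run_query is a hypothetical helper that executes SQL against your database:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def answer_with_chain(question: str, run_query) -> str:
        # Step 1: ask the model to draft a query
        draft = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": f"Write one SQL query to answer: {question}. Reply with SQL only."}],
        ).content[0].text

        # Step 2: execute it -- if it errors, the chain has no way to adapt
        result = run_query(draft)

        # Step 3: ask the model to explain the raw result (including any error)
        return client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            messages=[{"role": "user", "content": f"Explain this query result to the user: {result}"}],
        ).content[0].text

The step order is frozen at write time: when Step 2 returns the sales error, Step 3 can only report it, exactly as in the transcript above.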
Now, let's consider the same example with an autonomous reasoning agent:
Autonomous Reasoning Agent
User: "Write a query to calculate the total revenue for the company last year."
LLM: "Sure! Here's a tool call with the query:
SELECT SUM(revenue) FROM sales WHERE date >= '2023-01-01' AND date <= '2023-12-31';"
Tool Result: "ERROR: Table sales does not exist. Did you mean sales_2023?"
LLM: "I encountered an error with my query. I'll try using sales_2023 in my tool call instead:
SELECT SUM(revenue) FROM sales_2023 WHERE date >= '2023-01-01' AND date <= '2023-12-31';"
Tool Result: "{"SUM_REVENUE": 1000000}"
LLM: "The total revenue for the company last year was $1,000,000."
Reacting to tool call errors is just one example of autonomous reasoning, but it's a powerful one.
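For reference, here's roughly what the conversation state looks like in Anthropic's Messages format at the moment the agent decides to retry. The run_sql_query tool name and the id values are made up:

    messages = [
        {"role": "user", "content": "Write a query to calculate the total revenue for the company last year."},
        # The assistant's turn is a tool_use block rather than plain text
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": "toolu_01", "name": "run_sql_query",
             "input": {"query": "SELECT SUM(revenue) FROM sales WHERE date >= '2023-01-01' AND date <= '2023-12-31';"}},
        ]},
        # The error goes back in as a tool_result, so the next completion can react to it
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "toolu_01", "is_error": True,
             "content": "ERROR: Table sales does not exist. Did you mean sales_2023?"},
        ]},
    ]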
Tools
The obvious need for tools is that they allow the agent to take actions that go beyond the back-and-forth of a chat conversation. Without them, the agent can only talk.
However, tools are also critical to unlocking reasoning because they allow the agent to "think" over the course of multiple steps.
As next-token-prediction machines, LLMs are very much "boxed in" by their initial response. Calling a tool and then starting a new chat completion request with the results of the tool call is an opportunity to adjust, rethink, and problem solve about how to ultimately accomplish the objective.
There is a point of saturation though, where the agent has too many tools to choose between OR accomplishing the objective requires too many sequential tool calls.
With current state-of-the-art models like Claude 3.5 Sonnet or GPT-4o, I've found the practical maximums to be somewhere around 10 distinct tools and around 10 sequential steps per task before reliability degrades.
Your mileage will absolutely vary, based on the model, the complexity of the tool call arguments, and the nature of the objectives you want the agent to accomplish.
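For concreteness, here's what a single tool definition might look like in Anthropic's tool-use schema. The run_sql_query name and its parameters are invented for this example:

    sql_tool = {
        "name": "run_sql_query",
        "description": "Execute a read-only SQL query against the analytics database and return the rows.",
        "input_schema": {  # JSON Schema describing the tool call arguments
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The SQL query to execute."},
            },
            "required": ["query"],
        },
    }

    tools = [sql_tool]  # passed as the `tools` parameter in the loop below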
OK, so how does this work in practice?
The core concept of an autonomous reasoning agent in code is a while loop. It sounds stupid, but it's true.
The while loop is triggered by a user request (note: this could be an actual user message, a scheduled task, etc.) and kicks off a loop of:
- Calling the LLM with the current state of the conversation
- Reacting to the LLM's tool call (unless the LLM decides to stop and return a final response)
- Calling the appropriate tool with the tool call parameters
- Appending the tool call result to the conversation
- Restarting the loop with the new state of the conversation
Here's a simple example of how you might implement this in Python:
def run_agent(self, messages):
    steps = 0  # Initialize step counter
    while steps < self.max_steps:  # Don't run forever
        steps += 1  # Increment counter
        response = self.client.messages.create(
            model=self.model,
            max_tokens=8192,
            system=self.system_prompt,
            tools=self.tools,
            messages=messages,
        )
        print(response)  # Log the raw response for visibility
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            print(f"Final Response: {response.content}")
            break
        # Process tool uses and get results
        tool_results = self.process_tool_uses(response.content)
        # Append the tool results to messages
        messages.append({"role": "user", "content": tool_results})
    else:  # No break: the loop exhausted max_steps without a final response
        print(f"Warning: Agent reached maximum steps limit of {self.max_steps}")
    return {"messages": messages}
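The process_tool_uses helper isn't shown above. Here's a minimal sketch, assuming each tool is a plain Python function registered in a (hypothetical) self.tool_functions dict keyed by tool name:

    def process_tool_uses(self, content):
        tool_results = []
        for block in content:
            if block.type != "tool_use":
                continue
            try:
                # Look up and call the matching Python function
                output = self.tool_functions[block.name](**block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(output),
                })
            except Exception as e:
                # Return the error as a result so the agent can reason about it and retry
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"ERROR: {e}",
                    "is_error": True,
                })
        return tool_results

Returning errors as tool results, instead of raising them, is what gives the loop the sales_2023-style recovery shown earlier.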
Conclusion
Defining what is and isn't an agent is a bit of a moving target, but I'm optimistic this one will last for at least a little while.
Anthropic and OpenAI are both working on specific "Agent" apps that fit this definition, with general-purpose tools that allow the AI to browse and interact with your computer.
I'm excited to see where this goes, and would love to hear your own definitions of what an agent is.