This blog post is based on the AI Agents section of Humans of the Data Sphere Issue #6 with an extra interview with Sean Falconer at the end.
Two interesting blog posts about AI agents have caught my attention over the last few weeks.
Anthropic wrote Building Effective Agents.
Chip Huyen wrote Agents.
Ethan Mollick has also written some excellent blog posts recently.
In this post, I’ll explore what some of the leading experts in this area are saying about AI agents and the challenges ahead.
First, what is an AI agent?
At the most abstract level, Chip Huyen defines an agent in her Agents blog post:
An agent is anything that can perceive its environment and act upon that environment. Artificial Intelligence: A Modern Approach (1995) defines an agent as anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This means that an agent is characterized by the environment it operates in and the set of actions it can perform.
Taking actions is a defining characteristic of an AI agent compared to an LLM that only provides textual or graphical responses to prompts. Ethan Mollick notes that much of modern knowledge work is digital and therefore something an AI could plausibly do.
The digital world in which most knowledge work is done involves using a computer—navigating websites, filling forms, and completing transactions. Modern AI systems can now perform these same tasks, effectively automating what was previously human-only work. This capability extends beyond simple automation to include qualitative assessment and problem identification.
Anthropic offer another definition of an agent in their Building Effective Agents blog post:
"Agent" can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:
* Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
* Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
This seems like an important distinction to make. A workflow is a kind of static flow chart of branches and actions that constrain what the AI can do. It’s prescriptive, more predictable, but less flexible. A true AI agent, on the other hand, determines its own control flow, giving it the freedom to plan and execute flexibly, but that freedom comes with additional risk. Anthropic note that you should choose the simplest option possible, reaching for a workflow or an agent only when the added complexity is warranted:
…workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale.
…agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path.
In a practical sense, an AI agent is software like any other service that interacts with the world via APIs. The big difference between an agent and regular code is that, in order to satisfy a goal, an agent relies on an LLM to decide on control flow, interpret the results of external calls, and decide on next steps. LLMs by themselves have several limitations, such as being unable to take actions directly and having weak skills in areas such as mathematics. AI agents supplement LLMs with tools to augment their capabilities. Chip Huyen classifies the tools into three categories:
Depending on the agent’s environment, there are many possible tools. Here are three categories of tools that you might want to consider: knowledge augmentation (i.e., context construction), capability extension, and tools that let your agent act upon its environment.
Chip uses web browsing as the canonical example of knowledge augmentation, and math functions, calendars, timezone converters, and unit converters as canonical capability extensions.
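To make the three categories concrete, here is a minimal sketch of how an agent’s tool registry might be organized. The function names (`search_web`, `convert_units`, `send_email`) and the registry layout are illustrative assumptions, not taken from either post.

```python
# A minimal sketch of a tool registry grouped by Chip Huyen's three categories.
# The tool names and implementations are illustrative placeholders.

def search_web(query: str) -> str:
    """Knowledge augmentation: fetch external context for the LLM."""
    return f"(stub) top search results for: {query}"

def convert_units(value: float, from_unit: str, to_unit: str) -> float:
    """Capability extension: precise arithmetic the LLM is unreliable at."""
    rates = {("miles", "km"): 1.60934, ("km", "miles"): 0.621371}
    return value * rates[(from_unit, to_unit)]

def send_email(to: str, subject: str, body: str) -> str:
    """Acting on the environment: a side-effecting call the agent can make."""
    return f"(stub) email queued to {to!r} with subject {subject!r}"

TOOLS = {
    "knowledge_augmentation": {"search_web": search_web},
    "capability_extension": {"convert_units": convert_units},
    "environment_actions": {"send_email": send_email},
}

if __name__ == "__main__":
    print(TOOLS["capability_extension"]["convert_units"](10, "miles", "km"))
```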
To satisfy a goal, an agent must use a combination of the following (a minimal loop combining them is sketched after this list):
Effective planning and reasoning.
The agent makes a plan of steps it needs to perform in order to satisfy the goal.
Accurate tool selection and execution.
The LLM may need to make API calls to retrieve information and/or to make changes or take actions in the real world.
Self-reflection and evaluation.
At every step, the agent should reflect on what it has planned and the results it has received to ensure it is still doing the right thing.
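Put together, a bare-bones agent loop might look like the sketch below. This is a hypothetical outline, not a real framework: `call_llm` stands in for whatever model client you use, and the prompt formats and stopping conditions are assumptions made purely for illustration.

```python
# A minimal, hypothetical plan-act-reflect loop. `call_llm` stands in for any
# LLM client; the prompts and stopping logic are illustrative, not a framework.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (wire up your model client here)."""
    raise NotImplementedError

def run_agent(goal: str, tools: dict, max_steps: int = 10) -> str:
    history = []
    plan = call_llm(f"Goal: {goal}\nProduce a step-by-step plan.")           # planning
    for _ in range(max_steps):
        decision = call_llm(
            f"Goal: {goal}\nPlan: {plan}\nHistory: {history}\n"
            f"Available tools: {list(tools)}\n"
            "Reply with either FINISH: <answer> or CALL: <tool> <args>."
        )
        if decision.startswith("FINISH:"):                                    # self-evaluation says we're done
            return decision.removeprefix("FINISH:").strip()
        tool_name, _, args = decision.removeprefix("CALL:").strip().partition(" ")
        result = tools[tool_name](args)                                       # tool selection and execution
        history.append((decision, result))
        plan = call_llm(f"Given result {result!r}, revise the plan: {plan}")  # reflection / re-planning
    return "Gave up: step budget exhausted."
```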
Reviewing the challenges of AI agents
AI agents are stochastic systems that add a new flavor of risk compared to deterministic systems that can more readily be modeled and comprehensively tested. Many agent systems require multiple intermediate steps to satisfy a goal, and errors in each step can compound, producing side effects that may be challenging to anticipate. This is one of the defining characteristics of AI agents that the agent designer must account for. In fact, accounting for all the failure modes of an agent is where the steepest learning curve is found, as well as most of the development cost.
As a distributed systems engineer, I’m probably more on the paranoid side of the risk-awareness spectrum. Through that lens, I see all manner of challenges to overcome when building AI agents:
Effective Planning:
AI agents must create plans that align with their goals while adapting to dynamic environments and incomplete information. Self-reflection and being able to change course may be necessary.
Plans may need to be evaluated to ensure that they are feasible, efficient, and contextually appropriate. Typically, an agent will be decomposed into multiple parts where planning, execution, and evaluation are carried out by separate components that collaborate.
There are a number of things that can go wrong in the planning phase (a simple guard against one of them is sketched after this list):
The agent does not revise plans when new information contradicts initial assumptions.
The agent gets stuck in loops, revisiting the same steps repeatedly without making progress.
The agent sets inappropriate or harmful goals due to poorly defined prompts or objectives.
The agent achieves appropriate goals but creates harmful side effects in doing so.
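As referenced above, one cheap mitigation for the looping failure mode is a guard that caps the step budget and rejects repeated actions. The sketch below is a minimal illustration with hypothetical names, not a production safeguard.

```python
# A simple, illustrative guard against the "stuck in a loop" failure mode:
# cap the number of steps and refuse actions the agent has already tried
# with identical arguments. Names here are hypothetical.

class LoopGuard:
    def __init__(self, max_steps: int = 20):
        self.max_steps = max_steps
        self.steps_taken = 0
        self.seen_actions = set()

    def check(self, action: str, args: tuple) -> None:
        """Raise if the agent is looping or has exhausted its step budget."""
        self.steps_taken += 1
        if self.steps_taken > self.max_steps:
            raise RuntimeError("Step budget exhausted without reaching the goal")
        key = (action, args)
        if key in self.seen_actions:
            raise RuntimeError(f"Loop detected: {action}{args} already attempted")
        self.seen_actions.add(key)

# Usage: guard.check("search_web", ("latest GDP figures",)) before each step.
```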
Accurate Tool Selection and Usage:
Agents need to identify the correct tools (e.g., APIs, models) for a task and invoke them properly.
Common issues include (a defensive validation sketch follows this list):
Invoking the wrong tool, or failing to consider multiple tools or approaches, leading to suboptimal performance.
Providing incorrect or incomplete inputs, producing wrong or suboptimal results.
Hallucinating non-existent tools.
Failing to recognize when tool outputs indicate anomalies, errors, or limitations.
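The validation sketch referenced above: before executing a tool call proposed by the model, confirm that the tool exists and that the arguments match its signature. The registry format and error handling are assumptions for illustration only.

```python
# A defensive-validation sketch: check a proposed tool call against a registry
# of real callables before executing it. This guards against hallucinated tools
# and incorrect or incomplete inputs.

import inspect

def validate_and_call(tools: dict, tool_name: str, kwargs: dict):
    if tool_name not in tools:
        # Guards against hallucinated, non-existent tools.
        raise ValueError(f"Unknown tool {tool_name!r}; available: {sorted(tools)}")
    fn = tools[tool_name]
    sig = inspect.signature(fn)
    required = {
        name for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
    }
    missing = required - kwargs.keys()
    unexpected = kwargs.keys() - sig.parameters.keys()
    if missing or unexpected:
        # Guards against incorrect or incomplete inputs.
        raise ValueError(
            f"Bad arguments for {tool_name}: missing={missing}, unexpected={unexpected}"
        )
    return fn(**kwargs)
```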
Reasoning and Decision-Making:
Agents may struggle to interpret the results of their actions or external tool outputs.
Errors in reasoning can lead to invalid conclusions, impacting subsequent actions (the compounding of errors). Some reasoning errors may result from forgetting critical context or information needed to make accurate decisions.
Failure Modes in Execution:
Agents can fail to execute actions correctly, leading to unintended consequences. The first challenge is detecting when actions are executed incorrectly; the second is remediating them.
Difficulty handling edge cases or unexpected outcomes, or treating rare cases as if they were general cases.
Monitoring and auditing agents may also be challenging: not only detecting when things go wrong, but also detecting bias and justifying why certain actions were taken (a minimal audit-trail sketch follows below).
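The audit-trail sketch mentioned above: one simple starting point is to wrap every tool invocation so that each action is recorded with its inputs, outputs, and timestamp. This is a minimal in-memory illustration; a real system would persist these records and attach the agent’s stated reasoning.

```python
# A minimal audit-trail sketch: wrap every tool invocation so that each action
# is recorded with its inputs, output (or error), and timestamp. A real system
# would ship these records to durable storage; here they accumulate in memory.

import time

AUDIT_LOG: list[dict] = []

def audited(tool_name: str, fn):
    def wrapper(**kwargs):
        record = {"ts": time.time(), "tool": tool_name, "args": kwargs}
        try:
            record["result"] = fn(**kwargs)
            return record["result"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            AUDIT_LOG.append(record)
    return wrapper

# Later, AUDIT_LOG can be reviewed or exported for debugging, bias checks,
# and justifying why certain actions were taken.
```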
I could go on, but you get the idea. A lot of experimentation and iteration will go into AI agent development, and getting that last 20% of completeness and polish could be time-consuming. Chip Huyen covers much of this in her framing of AI agent development. The Anthropic post also steers you toward choosing the simplest agent, or no agent at all:
When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all.
…
Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short. When implementing agents, we try to follow three core principles:
* Maintain simplicity in your agent's design.
* Prioritize transparency by explicitly showing the agent’s planning steps.
* Carefully craft your agent-computer interface (ACI) through thorough tool documentation and testing.
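On that third principle, the model only “sees” a tool through its documentation, so the description has to carry the constraints a human colleague would otherwise infer. The sketch below shows what a carefully documented tool description might look like; the field layout is a generic JSON-Schema-style format and the `issue_refund` tool is a made-up example, not tied to any specific vendor’s API.

```python
# An illustration of thorough tool documentation: the description encodes when
# the tool should be used, its units, and its irreversibility, because the model
# has no other way to learn those constraints. Generic JSON-Schema-style layout.

refund_tool_spec = {
    "name": "issue_refund",
    "description": (
        "Issue a refund for a single order. Use only after the order has been "
        "located and the customer has confirmed the amount. Amounts are in the "
        "order's original currency. This action cannot be undone."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Exact order ID, e.g. 'ORD-12345'"},
            "amount": {"type": "number", "description": "Refund amount; must not exceed the order total"},
            "reason": {"type": "string", "description": "Short justification, kept for the audit log"},
        },
        "required": ["order_id", "amount", "reason"],
    },
}
```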
AI agents and agentic systems are an emerging practice, and I agree that 2025 will be the year of the AI agent, given the promise that AI agents hold and the rapid improvements in model capabilities.
However, with that said, I do have some serious concerns, and I believe there will be two constraining aspects of AI agents that present a challenge to widespread adoption:
Reliability. There are so many failure modes, and even the mitigations are usually run by LLMs and therefore have their own failure modes. Errors compound, and the detection and mitigation steps themselves may not be highly reliable.
Cost. Agents may require multiple reasoning steps using the more powerful models. All this pushes up the cost. With higher costs come higher demands for the value proposition. Of course with the arrival of DeepSeek v3, maybe 2025 will also be the year of the more efficient LLM.
Chip Huyen noted in her post:
Compared to non-agent use cases, agents typically require more powerful models for two reasons:
Compound mistakes: an agent often needs to perform multiple steps to accomplish a task, and the overall accuracy decreases as the number of steps increases. If the model’s accuracy is 95% per step, over 10 steps, the accuracy will drop to 60%, and over 100 steps, the accuracy will be only 0.6%.
Higher stakes: with access to tools, agents are capable of performing more impactful tasks, but any failure could have more severe consequences.
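A quick back-of-envelope check of the compounding figures quoted above: if each step succeeds independently with probability p, the chance that all n steps succeed is p raised to the power n, which gives roughly 60% over 10 steps and 0.6% over 100 steps at 95% per-step accuracy.

```python
# Verifying the compounding-error figures quoted above: if each step succeeds
# independently with probability p, the chance all n steps succeed is p ** n.
p = 0.95
for n in (10, 100):
    print(f"{n} steps at {p:.0%} per step -> {p ** n:.1%} end-to-end")
# 10 steps at 95% per step -> 59.9% end-to-end
# 100 steps at 95% per step -> 0.6% end-to-end
```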
As AI becomes more and more capable but with non-trivial associated costs, we may enter an age where cost efficiency is the primary factor in deciding when to use AI, when to use a human, or when not to do the task at all. If both AI and human workers can execute a digital task at similar levels of competence, then cost efficiency becomes the defining question. François Chollet made this point over the holiday period:
One very important thing to understand about the future: the economics of AI are about to change completely. We'll soon be in a world where you can turn test-time compute into competence -- for the first time in the history of software, marginal cost will become critical. Cost-efficiency will be the overarching measure guiding deployment decisions. How much are you willing to pay to solve X?
In this early phase, agents are likely best suited to narrow tasks that do not involve high-stakes actions such as bank transfers, costly purchases, or anything that cannot be undone without cost or negative impact. Ethan Mollick noted that:
Narrow agents are now a real product, rather than a future possibility. There are already many coding agents, and you can use experimental open-source agents that do scientific and financial research.
Narrow agents are specialized for a particular task, which means they are somewhat limited. That raises the question of whether we soon see generalist agents where you can just ask the AI anything and it will use a computer and the internet to do it. Simon Willison thinks not despite what Sam Altman has argued. We will learn more as the year progresses, but if general agentic systems work reliably and safely, that really will change things, as it allows smart AIs to take action in the world.
Interview with Sean Falconer: AI agent use cases and adoption challenges
What are the use cases for AI agents at this early stage? It seems both wide open for creativity but also somewhat limited due to the still immature practice of building agents and the current limitations of the models.
I’m not an AI expert, more of an observer of the space, so I asked someone immersed in the AI field, my colleague at Confluent, Sean Falconer (also a host of Software Engineering Daily and Software Huddle).
Q: What are some good use cases for AI agents at this early stage?
Sean’s Response
At this stage, the use cases for agents are both exciting and evolving, but it's still early days. Think of it as the early days of cars: useful, but we’re far from self-driving.
The abstractions offered by today’s frameworks might not be the right ones, and the dev tooling to support the full lifecycle of deployment, testing, and monitoring is not mature. There’s still a lot of put-your-finger-in-the-air-and-see-which-way-the-wind-is-blowing experimentation and iteration required to get something useful.
Despite this, I’m bullish on agents; the promise is huge, and even achieving a fraction of that promise is compelling.
I think the most effective use cases right now focus on augmenting human effort rather than replacing it. For example, with coding, although not necessarily fully agentic, companies are seeing that AI can boost task completion speed by 55% and improve code quality by 82%. And now the co-pilots of the world are supporting more complex task completion by leveraging agents.
Agents tend to shine in complex, multi-step workflows that are repetitive, resource-intensive, or too intricate for traditional automation. Think processes that are frustrating for humans but not disastrous if they go wrong. In industry, we’re already seeing agents in sales and marketing that research prospects, identify decision-makers, and draft personalized outreach, or in drug discovery, where they semi-automate filling out regulatory forms while humans verify responses.
There’s a ton of potential, but like self-driving cars, perfecting the last 20% to reach a fully automated state will take years. For now, agents excel at reducing grunt work and enhancing human productivity. The real magic happens when smart people work with smart AI.
Q: How should companies evaluate the need for an agent vs a workflow (as per the Anthropic post) or just simple prompts?
Sean’s Response
When it comes to deciding between a simple prompt, a workflow, or a full-blown agent, the key is to ask yourself, "Do I really need a bazooka to swat this fly?" As Knuth said, “premature optimization is the root of all evil,” and honestly, over-engineering is how you end up with solutions that are impressive but wildly unnecessary.
The best place to start is with the business value: ask yourself what you are actually trying to accomplish and how you will measure whether it’s working. Like most engineering, start with the simplest thing that gets the job done.
For example, I built a tool to help me draft LinkedIn posts about content I’ve worked on. With some clever prompt engineering, I can get a pretty decent draft. Sure, it’s not perfect; sometimes it sounds like an overly enthusiastic social media manager. But it’s a decent enough start that I can refine, it took very little effort, and it doesn’t cost a lot of tokens.
Now, could I make it fancier? Sure. I could build a complex workflow to pull in context from all my past posts, past articles, and personal anecdotes. I could go further and use an agentic pattern like reflection that iterates and refines the post to perfection.
But, is it worth the effort? Probably not. I’m just trying to get a post out there, not win a Pulitzer.
Zooming out beyond the question of what my GenAI need is, I think it’s important to ask yourself whether GenAI is even the right answer for what you want to achieve. Predictive ML and other forms of automation have served humans for decades. Over the last two years, we’ve lost sight of that. Sometimes, a simpler, faster, and cheaper approach does the trick just fine. It’s like what I see all the time in data engineering: people love their fancy pipelines, but let’s not forget, sometimes a simple script and a spreadsheet are all you really need.
Q: What are the main adoption challenges of AI agents in 2025?
Sean’s Response
I see the main adoption challenges for agents in 2025 as a mix of over-ambition, engineering headaches, and good old-fashioned data dysfunction.
First, there’s the temptation to do too much. Companies dive in headfirst, trying to build agents that can plan, reason, and bring them coffee, only to end up with bloated, overcomplicated systems that deliver meh results. It’s like trying to build a rocket to deliver pizzas: cool idea, but probably not worth the cost. It’s perhaps not as sexy, but starting small and focusing on specific, measurable goals is the way to go.
Then there’s the engineering challenge.
Programming and evaluating non-deterministic workflows requires a real shift in mindset for engineering teams. It’s not like traditional coding, where you write instructions and the machine does exactly what you say. With agents, you need patience, flexibility, and a willingness to deal with unexpected outcomes. You have to be ready to program around these limitations or only focus on non-customer facing workflow automation where some level of variance in quality and accuracy is acceptable.
And then, of course, there’s data: the Achilles’ heel of most AI projects.
Many companies still don’t have a clear picture of what data they have, who has access, or how to get value from it. If you’re struggling with data modeling and analytics, operationalizing data for AI is going to be a real uphill battle. Data is often locked away in silos, ridiculously expensive to move, and engineers spend their days wrestling with pipelines instead of solving actual problems.
GenAI and agents have enormous potential, but they also expose the cracks in your company’s foundation. If your organization doesn’t tackle these underlying issues, starting with data and engineering workflows, you’ll end up with flashy demos that don’t translate into real value.
[end of interview with Sean]
Wrap up
It will be fascinating to watch how agentic systems evolve as a category. I myself will be closely watching the space. As AI agents continue to evolve, the real test will be how they perform in production environments and whether they can deliver consistent business value. I have some concerns over the operating costs of agents but the continued frenetic pace of model development, including the surprisingly low-cost DeepSeek v3 model, means that it’s hard to predict what the cost profile will look like even 6 months from now. The coming year will bring valuable lessons from teams experimenting with different approaches, refining reliability, and balancing running costs against tangible returns. Observing these case studies will help separate hype from reality, revealing what works, what doesn’t, and where the biggest challenges lie.