Morocco - EBAN.ORG - Front Page - Yesterday 13:53
By Guido Jouret, Sophia Business Angels, Member.

LLMs can reach the places traditional IT can't

LLMs excel at taking in fuzzy inputs (documents, natural language, images, audio, and any other types of unstructured information). They can generate fuzzy outputs (other text, images, or audio). They're also increasingly good at producing structured output: for example, writing longer and longer computer code. The great thing about code is that it's verifiable: it either runs or it doesn't. If we have known-good outputs, we can even check that it runs correctly (at least in those use cases).

But even in the case of fuzzy output, LLMs can perform very useful work: ask an LLM to ingest your goals and savings and produce a recommended investment portfolio, and it'll do at least as well as any financial advisor. There's no "best" portfolio: it depends on your risk tolerance, the future is unknown, and other investment choices can provide a similar return over time. The output is fuzzy, but still incredibly useful.

Right now, LLMs inform decisions. Companies want to move beyond this to where they actually do the work. Generative AI = task; Agentic AI = job. To turn the LLM into an AI agent, we need to enable it to remember, plan, and act.

Embedding LLMs into AI agents

The LLM is primed with an initial prompt that describes its role. This lists the actions it can perform, the methods it should use, and possibly examples of inputs and outputs. The agent is triggered by a task (e.g. an inbound email, chat, or problem ticket). The agent solves the task taking into account the constraints and methods described in the role. The output of the agent is a set of decisions that translate into actions on systems outside the agent. These actions are implemented via API calls and can include routing the task to someone (or some other AI agent), editing/creating/deleting data, issuing payment, etc.
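That prompt-triggers-decisions-actions loop can be sketched in a few lines of Python. Everything here is hypothetical: `call_llm` stands in for whatever model API you use (it returns a canned decision so the sketch runs), and the action names are invented for illustration.

```python
import json

ROLE_PROMPT = """You are a customer-service agent.
Allowed actions: route_to_human, refund, reply.
Respond with a JSON list of actions, e.g.
[{"action": "reply", "body": "..."}]"""

def call_llm(system_prompt: str, task: str) -> str:
    # Placeholder for a real model API call (e.g. an HTTP request).
    # Returns a canned decision so the sketch is runnable.
    return json.dumps([{"action": "route_to_human", "team": "billing"}])

# Deterministic handlers: each decision becomes an API call on a system
# outside the agent (ticketing, payments, CRM, ...).
def route_to_human(team: str) -> str:
    return f"ticket routed to {team}"

HANDLERS = {"route_to_human": route_to_human}

def run_agent(task: str) -> list[str]:
    decisions = json.loads(call_llm(ROLE_PROMPT, task))
    return [HANDLERS[d.pop("action")](**d) for d in decisions]

print(run_agent("Customer asks why they were charged twice."))
```

The key design point is the split: the model only proposes decisions as structured data; the surrounding harness, not the model, performs the actual side effects.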
Using an LLM to plan your next ski vacation is very different from an AI agent that is assigned to solve millions of customer service issues. The superpower that LLMs have for understanding fuzzy inputs has a dark side: it makes them stochastic (see my previous article: https://www.linkedin.com/posts/gjouret_much-is-made-about-ai-killing-off-enterprise-activity-7425919667874996224-f6JT).

Producing a good agent runs into what I call the 80/20 rule of developing solutions using LLMs: the first prompt gets you 80% there; the next 80% of effort gets you the last 20%.

LLM solution development: "production ready takes time"

The "LLM solved this in one shot" story may be true for simple problems, but for anything complex, it's an urban myth. The issues developers see include:

- Whack-a-mole: fixing one problem creates new ones in areas that were working.
- Context rot: LLMs using more than 60% of the context window see sharp declines in performance.
- Leaky bucket: longer sessions result in contexts getting compressed and reloaded. After compaction, the new context loses details of the prior interactions. It's like carrying water in a leaky bucket.
- Limits of prompt refinement: better prompts can produce better output, but can never eliminate stochasticity entirely.
- LLMs are resource-hungry: solving complex problems consumes lots of tokens. Running lots of LLM instances makes it worse.

But here's the real problem: building an AI agent with only a general-purpose LLM is the wrong approach. The LLM is a jack-of-all-trades, not a specialist in the specific role you need it to play as an agent. It's also using a 100% non-deterministic approach to solve problems that may have components that can be solved much more accurately and efficiently. We need to apply Ashby's Law of Requisite Variety: a control system must match or exceed the range of states (variety) in the system or disturbances it regulates in order to maintain stability.
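One way to read Ashby's Law in practice: handle the states you can enumerate with deterministic code, and reserve the model's variety for the genuinely fuzzy remainder. A minimal sketch, where the order-ID pattern, the lookup, and the `ask_llm` fallback are all invented for illustration:

```python
import re

ORDER_ID = re.compile(r"\border\s+#?(\d{6})\b", re.IGNORECASE)

def lookup_order(order_id: str) -> str:
    # Deterministic path: a plain database/API lookup, no model involved.
    return f"status of order {order_id}: shipped"

def ask_llm(message: str) -> str:
    # Fuzzy path: placeholder for a real model call.
    return "LLM answer for: " + message

def handle(message: str) -> str:
    match = ORDER_ID.search(message)
    if match:                        # crunchy: solvable exactly and cheaply
        return lookup_order(match.group(1))
    return ask_llm(message)          # gooey: needs the model's variety

print(handle("Where is order #123456?"))
print(handle("I'm unhappy with how I was treated."))
```

Every message the regex catches never touches the model at all: zero tokens, zero stochasticity, and a response the agent can never get wrong.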
In other words, the AI agent should only be as fuzzy as required; anything more just adds unpredictability or cost.

An interesting example of this is a Claude Skill. Anthropic created this feature to enable the LLM to repeat tasks more readily. A skill is just a markdown file containing prose that follows a certain structure: purpose, workflow, input formats, how to perform certain actions, and examples. Sprinkled into the file, however, are snippets of Python code that can perform specific data-processing tasks. These are the deterministic (crunchy) bits in the stochastic (soft and gooey) cookie. This, then, is how we should design AI agents that are "just right."

The "goldilocks" approach to building AI agents: 4 phases

1. Tune the prompt

Iterate until the role + task prompts give you the best result you can manage with the LLM. We're training the LLM to solve the problem by applying neural networks to the entire problem. [Figure: Solving 100% via LLM]

2. Prompt reification

This involves doing what Claude does in building skills: making some fuzzy parts of the solution concrete, applying the LLM to the fuzzy inputs but transforming data and performing calculations with code. [Figure: Reified: mix of LLM + code]

The code can now be moved outside the LLM and accessed via the Model Context Protocol (MCP): the LLM will invoke these components via an API. We can maintain and evolve these external code bits independently of the rest of the AI agent.

3. LLM specialization with reinforcement learning

We've now created a partially deterministic agent. Those parts will run reliably and efficiently. If our starting point was a general-purpose LLM, we should now swap it out for an open-source LLM that we can specialize for the role we want the agent to play. Reinforcement learning (RL) can turn our mule into a racehorse. According to the company Adaptive ML, such specialized LLMs can cost 50-90% less than generalist LLMs.
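The reification idea of phase 2 can be sketched as pulling a calculation out of the prompt and registering it as a callable tool. This is not the real MCP SDK; the registry, the tool name, and the toy portfolio rule below are illustrative only, standing in for whatever deterministic code the agent needs.

```python
# A deterministic "crunchy bit": the kind of snippet a skill embeds as code
# instead of asking the model to do arithmetic in prose.
def portfolio_weights(risk_tolerance: float) -> dict[str, float]:
    """Map a 0-1 risk tolerance to simple stock/bond weights (toy rule)."""
    stocks = round(min(max(risk_tolerance, 0.0), 1.0), 2)
    return {"stocks": stocks, "bonds": round(1.0 - stocks, 2)}

# A minimal tool registry standing in for an MCP server: the LLM names a
# tool and its arguments; the host runs the code and returns the result.
TOOLS = {"portfolio_weights": portfolio_weights}

def invoke(tool: str, **kwargs):
    return TOOLS[tool](**kwargs)

print(invoke("portfolio_weights", risk_tolerance=0.7))
```

Because the calculation lives behind a named tool rather than inside the prompt, it can be versioned, tested, and evolved independently of the agent, exactly as the article suggests.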
Even better, they become more accurate (on the chosen tasks). [Figure: Using RL to specialize the LLM]

4. Add domain expertise to the LLM via RL

Now that we have a lean and mean LLM without any extra bits we don't need, we can use RL to further train the model by teaching it on proprietary data. This makes the AI agent even