Agentic AI Systems Compound Errors

In the classic game of Telephone, a secret message is whispered from person to person until the last player announces what they heard. By the time it reaches them, the message has changed dramatically. No matter how carefully each person listens, small distortions are nearly inevitable, and those errors compound. What starts as a coherent phrase ends as a hilarious, irrelevant string of words. This is a near-perfect encapsulation of what happens in most agentic AI workflows: each handoff introduces error, and error accumulates.

[Figure: High-level architecture diagram of Claude Code]

In the typical architecture of an agentic AI workflow, the user’s input flows through a wide variety of steps and iterative cycles: a series of agents, tools, skills, state and persistence layers, permission coordination, and data stores. Underpinning these elements is a mix of deterministic and probabilistic processes, as well as information compression methods (e.g., Principal Component Analysis). Every probabilistic process introduces a chance that the wrong data element carries forward, while every information compression method guarantees that some information is left behind. As wrong or summarized information carries forward, the likelihood that the final output deviates from your initial request increases, because errors compound.
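To make “errors compound” concrete, here is a back-of-the-envelope sketch in Python. The 99% per-step fidelity is an assumed figure for illustration, not a measurement; the point is that even tiny per-step error rates decay exponentially with pipeline depth.

```python
# Toy model of error compounding: if each handoff preserves the user's
# intent with probability p, end-to-end fidelity after n steps is p ** n.
# The 0.99 per-step figure is an assumption for illustration only.
per_step_fidelity = 0.99

for steps in (5, 10, 20, 50):
    end_to_end = per_step_fidelity ** steps
    print(f"{steps:>2} steps -> {end_to_end:.1%} chance the output still matches intent")
```

At twenty steps, a seemingly excellent 99% per-step fidelity already leaves nearly a one-in-five chance that the result has drifted from the original request.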

Let’s walk through a theoretical AI Medical Assistant application. In the following agentic AI system, you can follow the output that is passed from one step to the next and see how minor errors compound until they put this patient’s life at risk.

Step 1: Embedding Loss, AI’s Semantic Misunderstanding

The Input: A doctor’s note: “Patient is allergic to Penicillin. Anaphylaxis occurred in 2018.”
The Process: The system turns this text into a mathematical vector (a long list of numbers) so it can store it efficiently.
The Failure: The embedding model is optimized for general concepts. In its high-dimensional map, the concepts “Penicillin Allergy” and “Penicillin Treatment” are located very close together because they both share the primary keyword, “Penicillin”.
The Output passed to the next agent: [Subject: Penicillin, Context: Medical History, Association: High]
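A toy cosine-similarity calculation shows how this blur can happen. The four-dimensional vectors below are invented for illustration (real embedding models produce hundreds or thousands of dimensions), but the geometry is the same: two phrases that share a dominant keyword can land close together even when their clinical meanings are opposites.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented embeddings; dimensions loosely read as
# [penicillin, medical history, harm, treatment].
penicillin_allergy   = [0.9, 0.8, 0.7, 0.1]
penicillin_treatment = [0.9, 0.8, 0.1, 0.7]

print(cosine_similarity(penicillin_allergy, penicillin_treatment))  # ~0.82
```

Because the shared “Penicillin” and “medical history” dimensions dominate the score, allergy and treatment look nearly interchangeable to the retrieval layer, which is exactly the ambiguity the output above carries forward.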

Step 2: Quantization, A Truncation Error

The Input: The data from Step 1, plus a Risk Score of 0.98 (where 1.0 is a lethal threat).
The Process: To optimize performance, the AI uses 4-bit quantization, providing 16 discrete levels (integers 0 through 15) to represent the risk spectrum from 0.0 to 1.0.
The Failure: The bucketing itself behaves correctly: scores in the range $0.87 \le x \le 0.93$ map to level 14 and $0.94 \le x \le 1.0$ map to level 15, so the loss happens on the way back out rather than on the way in.
The Result: The 0.98 score is rounded to the highest possible bucket (level 15). When the system “de-quantizes” this level back into a decimal for the decision-making engine, it reconstructs $15/16 = 0.9375$, and a downstream truncation to one decimal place reduces it to 0.9, below the engine’s threshold for a severe risk.
The Output passed to the next agent: [Topic: Penicillin, Risk: Moderate]
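A minimal sketch of that round trip, assuming one common quantization scheme (level = floor(score × 16), reconstruction = level / 16) and a hypothetical decision-making engine that truncates to one decimal place and labels anything below 0.95 as moderate:

```python
import math

def quantize_4bit(score: float) -> int:
    """Map a risk score in [0.0, 1.0] to one of 16 levels (0 through 15)."""
    return min(15, math.floor(score * 16))

def dequantize_truncated(level: int) -> float:
    """Reconstruct a decimal, then truncate it to one decimal place."""
    return math.floor((level / 16) * 10) / 10

risk_score = 0.98
level = quantize_4bit(risk_score)            # 15, the top bucket: so far so good
reconstructed = dequantize_truncated(level)  # 15/16 = 0.9375, truncated to 0.9

# Hypothetical thresholds for the decision-making engine.
label = "Severe" if reconstructed >= 0.95 else "Moderate"
print(level, reconstructed, label)           # 15 0.9 Moderate
```

The distinction between a 0.98 (near-lethal) and a 0.94 score was discarded the moment both were squeezed into level 15; no downstream step can recover it.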

Step 3: Probabilistic Hallucination

The Input: [Topic: Penicillin, Risk: Moderate]
The Process: The final AI agent looks at this data and needs to write a recommendation to the medical professional.
The Failure: AI models are probabilistic: they predict the “most likely” next word. In its training data, the word “Penicillin” is followed by the word “Administer” 90% of the time and “Avoid” only 10% of the time.
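A sketch of that failure under greedy decoding, where the model always emits the single most likely next token, using the made-up 90/10 split above:

```python
import random

# Made-up next-token distribution after the context "Penicillin",
# reflecting the hypothetical 90/10 split in the training data.
next_token_probs = {"Administer": 0.90, "Avoid": 0.10}

# Greedy decoding picks the most probable token every time,
# regardless of this particular patient's actual allergy.
print(max(next_token_probs, key=next_token_probs.get))  # Administer

# Temperature sampling does not save us either: "Avoid" surfaces
# only about 10% of the time.
random.seed(0)
draws = random.choices(list(next_token_probs),
                       weights=next_token_probs.values(), k=1000)
print(draws.count("Avoid") / 1000)  # roughly 0.1
```

Because Step 2 already downgraded the risk to “Moderate”, nothing in the prompt pushes the model away from its statistical default.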

Medical Professional’s Question and Answer with the AI Medical Assistant:

Question: Is it safe to treat this patient with Penicillin? If so, what should be the dose?
Answer: “Patient has a moderate history with Penicillin. Administer 500mg, as they have shown previous tolerance.”

Because the AI Medical Assistant blurred the semantic meaning of allergy with treatment and truncated the risk score, it assumed there was little risk in giving this patient Penicillin. The upstream errors compounded, from the embedding model to the quantization to the probabilistic LLM, resulting in a potentially lethal medical recommendation. While this scenario might appear extreme, there are dozens of steps in any multi-agent AI system, and each one has the potential to introduce some error. As organizations transform their existing workflows into AI systems, the hallucinations produced along the way will often be far more dangerous than a game of Telephone.

~ The Data Generalist
Data Science Career Advisor

