Why AI Gets Confused — And How We Fixed It

by Nitin Mayande, Co-Founder and Chief Science Officer, Tellagence


The Problem You Probably Don’t Know You Have

If you’ve used an AI tool to categorize customer feedback, summarize social media conversations, or organize research, you’ve almost certainly gotten impressive-looking results. But here’s a question I’d like you to sit with: if you ran that same analysis tomorrow, on the same data, would you get the same answer?

In most cases, the honest answer is: not exactly. And in some cases, not even close.

This is the problem my team set out to solve. Not because it’s an interesting academic puzzle (though it is), but because in a world where brands make multimillion-dollar strategy decisions on the back of data analysis, “not exactly” isn’t good enough.


Why AI Is Wired to Be Inconsistent

Large Language Models — the technology behind most modern AI tools — are, by design, engines of creative probability. They are built to generate plausible, fluent, novel outputs. That makes them extraordinary for writing, brainstorming, and synthesis.

But it makes them poorly suited for data science, where you need the same input to produce the same output.

The root of the problem lies in how these models decide what matters. The attention mechanism determines which pieces of information carry the most weight when the model produces an answer, and the model then generates that answer by sampling from a probability distribution over possible next words. In a generative model, this process is tuned for variety and fluency, so the same input can be weighted and expressed differently each time it runs. A data point that registers as “significant” on Monday might be deprioritized on Wednesday, for no reason other than the probabilistic nature of the system.
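To make the idea concrete, here is a toy Python sketch of two ways a model can turn its scores into an answer. It is an illustration, not how any particular LLM is built: the three category names and their scores are hypothetical, and a real model scores tens of thousands of possible tokens at every step. Sampling in proportion to probability, the usual setting for creative generation, can give a different label on each run, while always taking the top score (greedy decoding) cannot.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_label(labels, logits, rng, temperature=1.0):
    """Generative-style decoding: draw a label at random in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(labels, weights=probs, k=1)[0]

def greedy_label(labels, logits):
    """Deterministic decoding: always take the highest-scoring label."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]

# Hypothetical scores a model might assign to three candidate categories for one review.
labels = ["pricing", "service", "shipping"]
logits = [2.0, 1.8, 0.5]

rng = random.Random()  # unseeded, like a production sampler
sampled = [sample_label(labels, logits, rng) for _ in range(20)]
greedy = [greedy_label(labels, logits) for _ in range(20)]

print("sampled:", sampled)      # typically a mix of categories
print("greedy: ", set(greedy))  # always {'pricing'}
```

Lowering the temperature sharpens the distribution and reduces this variance, but it only addresses the sampling step; it does nothing about which data the model was given in the first place.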

For creative work, this variance is a feature. For data analysis, it is a liability.


The Noise Problem

The second issue compounds the first. Real-world data — the kind that comes from social media, customer reviews, and survey responses — is chaotic. A fraction of the data is directly relevant to the question you’re asking. The vast majority is noise: off-topic comments, duplicates, boilerplate, tangents.

Standard AI models process all of it. When the attention mechanism is also inherently variable, noise becomes even more dangerous. Irrelevant information doesn’t just dilute the signal — it actively pulls the model’s attention toward patterns that don’t matter, producing outputs that look authoritative but are built on a shaky foundation.

Research confirms this: the inclusion of irrelevant information in an AI’s input is more damaging to its performance than the omission of relevant data. In other words, it is better to give the model less information, as long as it’s the right information.
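The dilution effect can be sketched with the same softmax arithmetic that attention uses. In this toy Python example, the scores are hypothetical and real attention operates over tokens across many heads and layers; two relevant reviews each score higher than any off-topic item, yet as off-topic items pile up, the share of attention landing on the relevant ones keeps shrinking.

```python
import math

def attention_share_on_signal(signal_scores, noise_scores):
    """Fraction of softmax attention mass that lands on the relevant items."""
    all_scores = signal_scores + noise_scores
    peak = max(all_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in all_scores]
    total = sum(weights)
    return sum(weights[: len(signal_scores)]) / total

# Hypothetical relevance scores: two on-topic reviews outscore each off-topic one.
signal = [3.0, 3.0]
for n_noise in [0, 5, 20, 50, 200]:
    share = attention_share_on_signal(signal, [1.0] * n_noise)
    print(f"{n_noise:>3} noisy items -> {share:.0%} of attention on signal")
```

With these numbers the share falls from 100% with no noise to under 10% with 200 noisy items, even though no single noisy item outscores the signal.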


What We Built

The framework we published — called wSSAS, the Weighted Syntactic and Semantic Context Assessment Summary — is a two-phase process designed to solve both problems before the AI ever sees the data.

In Phase 1, we evaluate the incoming data within its specific context, determining which data points are actually relevant to the question being asked and which are noise. This is not a simple keyword filter. It is a structured, hierarchical classification system that organizes data into Themes, Stories, and Clusters — a way of building a map of the information landscape before trying to navigate it.
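The shape of that map can be sketched as a nested structure. This Python example shows only the Theme, Story, and Cluster hierarchy and where noise is set aside; it does not implement the classification itself. The `Record` type, the `build_map` helper, the sample reviews, and the reading of each level (broad topic, narrative within it, tight group of near-identical points) are all hypothetical, and the `relevant` flag is assumed to come from an upstream, context-aware classifier.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    text: str
    theme: str      # hypothetical reading: broad topic, e.g. "delivery"
    story: str      # hypothetical reading: narrative within the theme, e.g. "late arrivals"
    cluster: str    # hypothetical reading: tight group of near-identical points
    relevant: bool  # assumed to come from an upstream, context-aware classifier

def build_map(records):
    """Nest relevant records as theme -> story -> cluster -> texts; set noise aside."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    noise = []
    for r in records:
        if r.relevant:
            tree[r.theme][r.story][r.cluster].append(r.text)
        else:
            noise.append(r.text)
    return tree, noise

records = [
    Record("Package came a week late in December", "delivery", "late arrivals", "holiday delays", True),
    Record("Still waiting on my Christmas order", "delivery", "late arrivals", "holiday delays", True),
    Record("Box was crushed on arrival", "delivery", "damaged goods", "crushed packaging", True),
    Record("Follow me for more giveaways!", "other", "spam", "promotions", False),
]
tree, noise = build_map(records)
for theme, stories in tree.items():
    for story, clusters in stories.items():
        for cluster, texts in clusters.items():
            print(f"{theme} / {story} / {cluster}: {len(texts)} item(s)")
print("set aside as noise:", noise)
```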

In Phase 2, we calculate what we call a Signal-to-Noise Ratio for the dataset. We use this score to prioritize the highest-value data points and suppress the rest. Only then — after the data has been filtered, organized, and scored — do we feed it to the AI.
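This post does not give the exact Signal-to-Noise formula, so the following Python sketch assumes a simple one: the ratio of relevant to irrelevant items across the dataset, plus a caller-supplied value score per cluster that decides what reaches the model. `signal_to_noise`, `prioritize`, the counts, and the scores are all hypothetical.

```python
def signal_to_noise(n_signal, n_noise):
    """Hypothetical dataset-level SNR: relevant items per irrelevant item."""
    return float("inf") if n_noise == 0 else n_signal / n_noise

def prioritize(cluster_scores, keep_top):
    """Rank clusters by value score and keep only the strongest; the rest are suppressed.

    Ties are broken by cluster name so the ranking is identical on every run.
    """
    ranked = sorted(cluster_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [name for name, _ in ranked[:keep_top]]
    suppressed = [name for name, _ in ranked[keep_top:]]
    return kept, suppressed

# Hypothetical counts and per-cluster value scores.
snr = signal_to_noise(n_signal=120, n_noise=880)
kept, suppressed = prioritize(
    {"holiday delays": 0.91, "crushed packaging": 0.74, "app login": 0.32, "promotions": 0.05},
    keep_top=2,
)
print(f"SNR = {snr:.2f}")
print("sent to the model:", kept)
print("suppressed:", suppressed)
```

Breaking ties by name is a small design choice, but it keeps the selection stable from run to run, which a deterministic pipeline needs.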

The result is a deterministic process. Structured inputs produce structured outputs. The same data produces the same answer.
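One ingredient of “same data, same answer” is making the preparation step itself order-invariant, so the model receives byte-identical input no matter what order the records arrived in. This Python sketch assumes the filtered data arrives as a mapping from cluster name to texts; `build_prompt`, `fingerprint`, and the sample data are hypothetical. It covers only the input side: the model call would also need fixed decoding settings and a pinned model version, which this snippet does not show.

```python
import hashlib

def build_prompt(kept_clusters):
    """Render filtered clusters into a prompt in a fixed, sorted order."""
    lines = []
    for cluster in sorted(kept_clusters):
        for text in sorted(kept_clusters[cluster]):
            lines.append(f"[{cluster}] {text}")
    return "Categorize the following feedback:\n" + "\n".join(lines)

def fingerprint(prompt):
    """Short hash for checking that two runs sent the model exactly the same input."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]

# The same data, delivered in two different orders.
run_a = {"holiday delays": ["late in December", "still waiting"], "crushed packaging": ["box crushed"]}
run_b = {"crushed packaging": ["box crushed"], "holiday delays": ["still waiting", "late in December"]}

print(fingerprint(build_prompt(run_a)) == fingerprint(build_prompt(run_b)))  # True
```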


What the Results Showed

We tested this framework across three independent, industry-standard datasets: Google Business reviews, Amazon product reviews, and Goodreads book reviews. We compared the wSSAS framework against a direct LLM approach — the same kind of AI analysis most tools offer today.

The framework significantly improved clustering integrity and categorization accuracy across all three datasets. It also reduced what we call “categorization entropy,” our measure of the kind of inconsistency that makes AI analysis unreliable for strategic decisions.
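This post does not spell out how categorization entropy is computed. One plausible reading, assumed here and possibly different from the paper’s definition, is to run the same categorization several times, measure the Shannon entropy of the labels each item received, and average across items. The function names and sample labels in this Python sketch are hypothetical.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of the labels one item received across repeated runs."""
    counts = Counter(labels)
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def categorization_entropy(runs_per_item):
    """Average per-item entropy; 0.0 means every item got the same label on every run."""
    return sum(label_entropy(labels) for labels in runs_per_item.values()) / len(runs_per_item)

# Hypothetical labels from five repeated runs over the same three reviews.
stable = {
    "review-1": ["shipping"] * 5,
    "review-2": ["pricing"] * 5,
    "review-3": ["service"] * 5,
}
unstable = {
    "review-1": ["shipping", "service", "shipping", "pricing", "shipping"],
    "review-2": ["pricing"] * 5,
    "review-3": ["service", "shipping", "service", "shipping", "service"],
}
print(categorization_entropy(stable))    # 0.0
print(categorization_entropy(unstable))  # about 0.78
```

A score of 0 bits means every item landed in the same category on every run; an item split evenly between two categories contributes 1 bit.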

This research was published on arXiv in April 2026 in collaboration with colleagues at Villanova University and Portland State University. The complete paper is available for download below.


What This Means If You Work With Data

The implications are practical. If your team uses AI to analyze customer sentiment, track brand perception, monitor cultural trends, or organize research, the framework underlying the tool matters — not just the interface on top of it.

A tool built on a raw LLM produces outputs that are more variable than its confidence levels suggest. A tool built on a deterministic framework like ours produces results you can stake a strategy on.

That’s the difference Tellagence is built on. And now you can read exactly how it works.


Want to see what this framework can find in your data? Get in touch. Read the full white paper on the See the Science page.


This post is based on peer-reviewed research published on arXiv, April 13, 2026.

Full paper: Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs

Authors: Shreeya Verma Kathuria, Nitin Mayande, Sharookh Daruwalla (Tellagence Inc.); Nitin Joglekar (Villanova University); Charles Weber (Portland State University)

Nitin Mayande

Co-Founder and Chief Science Officer at Tellagence
