Your Sentiment Tool is Guessing. Here’s the Math.
by Nitin Mayande, Co-Founder and Chief Science Officer, Tellagence
A Simple Test
Take any AI sentiment analysis tool you use today. Run it on a batch of customer reviews. Note the results. Then run it again tomorrow on the exact same data.
Did you get exactly the same scores? The same categorizations? The same breakdown of positive, negative, and neutral?
If you did not — and the research suggests you probably did not — then you have been making decisions on a foundation that shifts beneath you.
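The test above can be made concrete with a few lines of code. The following is a minimal sketch, assuming you have already exported the labels from two runs over the same reviews, in the same order. The function name `compare_runs` and the sample labels are hypothetical and exist only for illustration.

```python
from collections import Counter

def compare_runs(run_a, run_b):
    """Compare two label lists produced for the same reviews, in the same order."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("both runs must cover the same non-empty set of reviews")
    # Positions where the tool changed its mind between runs.
    changed = [i for i, (a, b) in enumerate(zip(run_a, run_b)) if a != b]
    return {
        "agreement": 1 - len(changed) / len(run_a),
        "changed_indices": changed,
        "breakdown_a": Counter(run_a),
        "breakdown_b": Counter(run_b),
    }

# Made-up labels for five reviews, purely for illustration.
monday = ["positive", "negative", "neutral", "positive", "negative"]
tuesday = ["positive", "neutral", "neutral", "positive", "negative"]

report = compare_runs(monday, tuesday)
print(report["agreement"], report["changed_indices"])
print(report["breakdown_a"], report["breakdown_b"])
```

An agreement rate below 1.0, or two breakdowns that differ, means the tool did not reproduce its own results on identical data.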
The Inconsistency Problem
Sentiment analysis is one of the most widely used applications of AI in marketing. It tells you how your customers feel about your brand, your competitors, and your campaigns. It is supposed to be the source of truth that informs strategy.
But there is a structural problem baked into every standard AI tool that offers sentiment analysis. These tools are built on Large Language Models: probabilistic engines that generate each output by sampling from weighted probabilities rather than computing a single fixed answer. The same model, given the same input, may therefore produce different outputs across runs. This is called stochasticity, and it is not a bug that will eventually be patched. It is how the architecture works.
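To illustrate what "weighted probabilities" means in practice, here is a toy sketch of sampling a sentiment label from a probability distribution. It is not how any particular tool is implemented. The scores, the `temperature` parameter, and the function names are assumptions chosen for illustration, and the greedy variant is included only as a contrast.

```python
import math
import random

LABELS = ["positive", "neutral", "negative"]

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_label(scores, rng, temperature=1.0):
    """Draw one label at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(LABELS, weights=probs, k=1)[0]

def greedy_label(scores):
    """Always pick the highest-scoring label (shown for contrast only)."""
    return LABELS[scores.index(max(scores))]

# Hypothetical scores for an ambiguous review: no label clearly dominates.
scores = [1.2, 1.0, 0.3]

rng = random.Random()  # unseeded, so each execution can differ
sampled = [sample_label(scores, rng) for _ in range(10)]
greedy = [greedy_label(scores) for _ in range(10)]

print("sampled:", sampled)
print("greedy: ", greedy)
```

When the scores are close, as they are here, repeated sampling returns a mix of labels for the same review, while the greedy list never changes.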
In creative applications, stochasticity is a feature. It is why AI can write you ten different versions of the same email. But when you are trying to determine whether consumer sentiment about your brand is improving or declining, creative variance is the last thing you want. If the same dataset yields a different sentiment score depending on when you run it, the output is not a measurement. It is a guess.
We Measured the Problem
My team at Tellagence spent considerable time not just identifying this problem theoretically, but measuring it empirically. We ran experiments across three industry-standard datasets — Amazon product reviews, Google Business reviews, and Goodreads book reviews — comparing standard LLM-based sentiment analysis against our own framework.
The results were clear. Standard direct-LLM approaches produced measurably inconsistent outputs across runs on identical data. The variance was not trivial. It was the kind of variance that, in a boardroom presentation, would lead to different strategic recommendations depending on which day the analysis was run.
That is not acceptable for enterprise-grade analytics. And yet it is the standard.
What Our Framework Does Differently
The SSAS framework — Syntactic and Semantic Context Assessment Summarization — approaches the problem differently. Rather than feeding raw, chaotic data directly to an LLM and hoping the attention mechanism lands in the right place, SSAS pre-processes the data in two structured phases.
First, it establishes context. Every data point is evaluated for its relevance to the specific question being asked, not in the abstract. The data is then organized hierarchically: Themes at the broadest level, Stories within each Theme, and Clusters within each Story. This is not clustering in the traditional sense. It is a structured map of meaning.
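One way to picture the hierarchy is as a nested data structure. The sketch below is an assumption about how Themes, Stories, and Clusters might be represented, not the framework's actual data model. The class fields, the helper `count_texts`, and the example labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Cluster:
    label: str
    texts: list[str] = field(default_factory=list)  # the individual data points

@dataclass
class Story:
    label: str
    clusters: list[Cluster] = field(default_factory=list)

@dataclass
class Theme:
    label: str
    stories: list[Story] = field(default_factory=list)

def count_texts(theme: Theme) -> int:
    """Count every data point that sits under a Theme."""
    return sum(len(c.texts) for s in theme.stories for c in s.clusters)

# Made-up example: one Theme, one Story, two Clusters.
theme = Theme(
    label="Delivery experience",
    stories=[
        Story(
            label="Late deliveries",
            clusters=[
                Cluster("Holiday delays", ["Arrived a week after Christmas."]),
                Cluster("Carrier issues", ["Courier lost it twice.", "No tracking updates."]),
            ],
        )
    ],
)

print(count_texts(theme))
```

Every data point has exactly one place in the structure, so the same input always maps to the same Theme, Story, and Cluster.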
Second, it removes noise. Using a Signal-to-Noise Ratio calculation, the framework identifies and suppresses data points that would dilute the LLM’s attention. Research has shown that irrelevant information is more damaging to AI performance than missing information — because it forces the model to spend its limited attention budget on patterns that do not matter. SSAS removes those patterns before the model ever sees them.
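The post does not spell out how the Signal-to-Noise Ratio is calculated, so the following is only a toy stand-in for the general idea of filtering before the model sees the data. It assumes a crude keyword-overlap relevance score and a fixed threshold. The functions `relevance`, `split_signal_noise`, and `signal_to_noise`, the threshold value, and the sample reviews are all hypothetical.

```python
import re

def relevance(text, question_terms):
    """Fraction of words in `text` that appear in the question's term set."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in question_terms for w in words) / len(words)

def split_signal_noise(texts, question_terms, threshold=0.1):
    """Separate data points that clear the relevance threshold from those that do not."""
    signal, noise = [], []
    for text in texts:
        bucket = signal if relevance(text, question_terms) >= threshold else noise
        bucket.append(text)
    return signal, noise

def signal_to_noise(signal, noise):
    """Ratio of relevant to irrelevant data points (assumed definition)."""
    return len(signal) / len(noise) if noise else float("inf")

# Hypothetical question: how do customers feel about battery life?
question_terms = {"battery", "charge", "charging"}
reviews = [
    "Battery drains fast and charging is slow",
    "Great seller, arrived on time",
    "The battery lasts two days",
    "My cat loves the box",
]

signal, noise = split_signal_noise(reviews, question_terms)
print(signal_to_noise(signal, noise))
print(signal)
```

Only the `signal` list would be passed on to the model. Because the filter involves no randomness, the same reviews and the same question always yield the same filtered input.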
The output is a sentiment prediction built on a stable, filtered, contextually organized foundation. Because the pre-processing is deterministic, the model receives the same structured input on every run, and that is what makes the outputs reproducible.
The Results
Our framework improved data quality by up to 30% compared to direct LLM approaches, across three independent datasets. That number represents a combination of noise reduction and improvement in the accuracy of the sentiment predictions themselves.
More importantly: the outputs were consistent. The same data, analyzed at different times, produced the same results. That is the standard that enterprise analytics demands. It is the standard that most tools currently available cannot meet.
This research was published on arXiv in April 2026 in partnership with researchers at Villanova University and Portland State University. The full methodology, experimental setup, and results are available in the paper linked below.
The Practical Implication
If you are using sentiment analysis to make strategic decisions — about campaigns, product development, brand positioning, competitive response — the consistency of that analysis is not a technical detail. It is the difference between insight and noise.
Discover is built on this framework. When it tells you that consumer sentiment around a cultural moment shifted three weeks before the trend hit the mainstream press, it is telling you that because the underlying analysis is stable enough to detect a signal that small. A system built on stochastic outputs cannot do that. Ours can.
Want to see what consistent AI-driven sentiment analysis looks like in practice? Get in touch. The full white paper is available on the See the Science page.
This post is based on research published as a preprint on arXiv, April 16, 2026.
Authors: Sharookh Daruwalla, Nitin Mayande, Shreeya Verma Kathuria (Tellagence Inc.); Nitin Joglekar (Villanova University); Charles Weber (Portland State University)

