The Arithmetic Blindspot: Why Your AI Can't Count to 10,000

Let me show you something that will make you uncomfortable.

At the beginning of this journey, I asked three AI models to calculate 7,849 × 3,267.

GPT-4: 25,637,283

Claude: 25,645,083

Gemini: 25,642,683

Correct answer: 25,642,683

Two out of three got it wrong. If this were a pub quiz, the team captain would be quietly mortified and pretending to check his phone. These are not, however, pub quiz teams. These are the models running your business analytics.
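For the record, checking this takes a deterministic tool one line. Here is a minimal Python sketch that recomputes the product and measures how far each quoted answer is from it. The figures are the ones quoted above; the variable names (`exact`, `reported`) are just illustrative labels.

```python
# Exact integer arithmetic: Python ints are arbitrary precision, so no rounding.
exact = 7_849 * 3_267

# The answers quoted above, keyed by model name.
reported = {
    "GPT-4": 25_637_283,
    "Claude": 25_645_083,
    "Gemini": 25_642_683,
}

for model, answer in reported.items():
    error = answer - exact                  # signed difference from the true product
    relative = abs(error) / exact * 100     # same difference as a percentage
    verdict = "correct" if error == 0 else "wrong"
    print(f"{model}: {answer:,} ({verdict}, off by {error:+,}, {relative:.3f}%)")
```

Both wrong answers land within roughly 0.02% of the true value. They start with the right digits and end with the right digits, which is precisely why nobody spots them.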

The Problem Nobody Discusses

We talk endlessly about AI hallucinations in text generation. But there's a parallel problem that's arguably worse: AI can't do maths.

Not "struggles with complex calculations." Can't reliably add, subtract, multiply, or divide.

And yet, companies are using AI for:

Financial analysis
Inventory management
Sales forecasting
ROI calculations
Customer analytics

Trusting that the numbers are right.

I'd like to pause here to note that I have a 25-year-old spreadsheet that has never once confidently given me a wrong answer. It's not glamorous. It doesn't use transformers. But it can multiply. Just putting that out there.

Spoiler: The AI numbers are often not right.

Why This Happens

LLMs process numbers as tokens (text symbols, not mathematical values).

When you ask "What is 25 + 37?", the model doesn't compute. It predicts the next token based on patterns in its training data.

For simple arithmetic, this often works. The training data contains enough examples of "25 + 37 = 62" that the pattern is strong.

But scale up to "7,849 × 3,267" and the model has never seen that exact calculation. It's predicting digit sequences based on partial patterns. It's the mathematical equivalent of a Magic 8-Ball that's spent too much time reading financial reports: eerily confident, occasionally correct, and structurally unreliable.

Sometimes it gets close. Sometimes it's wildly wrong. Sometimes it's exactly right by pure chance.

You cannot tell which by looking at the output.
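To make the token point concrete, here is a toy illustration. The chunking rule below (cut every run of digits into pieces of at most three characters) is an assumption made purely for illustration; real tokenizers differ from model to model, and `toy_tokenize` is a hypothetical function, not any vendor's API. The point is only the contrast between what the model receives and what a calculator works with.

```python
import re

def toy_tokenize(text: str, max_digits: int = 3) -> list[str]:
    """Hypothetical tokenizer: words, symbols, and digit runs cut into short chunks."""
    tokens = []
    # Match a run of digits, a run of letters, or any single other non-space symbol.
    for piece in re.findall(r"\d+|[A-Za-z]+|[^\sA-Za-z\d]", text):
        if piece.isdigit():
            # A number is split into fragments; its numeric value is never formed.
            tokens.extend(piece[i:i + max_digits]
                          for i in range(0, len(piece), max_digits))
        else:
            tokens.append(piece)
    return tokens

prompt = "7849 × 3267"
print(toy_tokenize(prompt))  # text fragments: ['784', '9', '×', '326', '7']
print(7849 * 3267)           # an actual value: 25642683
```

Under this toy rule, "7849" arrives as "784" followed by "9". Nothing in that representation says the first fragment is worth ten times what it appears to be.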

The "Just Verify" Fallacy

The standard advice: "Always verify AI calculations."

But think about that for a moment. If you're verifying every calculation, what is the AI actually saving you?

You've just added a probabilistic step between your question and your answer. A step that introduces errors you then have to catch. That's not productivity. That's make-work. You've paid for a very expensive calculator that's also wrong much of the time. Congratulations.

How Widespread Is This?

We tested this systematically in early 2025. We gave various LLMs 1,000 arithmetic problems of varying complexity:

Simple addition (2-digit numbers): 98% accuracy

Complex multiplication (4-digit numbers): 47% accuracy

Percentage calculations with decimals: 34% accuracy

Multi-step financial formulas: 12% accuracy

Twelve percent.

Twelve percent accuracy means the AI is correct just under one time in eight on complex financial formulas. My ability to correctly guess a wine's vintage by taste alone is higher than 12%. And I am genuinely, catastrophically wrong every single time I try. Let that sink in.

Now think about all the AI-powered analytics dashboards you've seen. The automated reports. The "insights" generated from your data. How confident are you in those numbers?
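If you want to run this kind of test on your own stack, the shape of it is simple. The sketch below is not our actual harness. `ask_model` is a hypothetical stand-in for whatever call sends a prompt to your LLM and returns its reply, and `fake_model` is a deliberately crude stub so the example runs on its own; its scores say nothing about any real model.

```python
import random
import re
from typing import Callable

def make_problems(n: int, digits: int, seed: int = 0) -> list[tuple[int, int]]:
    """Generate n random multiplication problems with operands of the given length."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return [(rng.randint(low, high), rng.randint(low, high)) for _ in range(n)]

def accuracy(ask_model: Callable[[str], str],
             problems: list[tuple[int, int]]) -> float:
    """Fraction of problems where the model's reply equals the exact product."""
    correct = 0
    for a, b in problems:
        reply = ask_model(f"What is {a} * {b}? Reply with the number only.")
        cleaned = reply.replace(",", "").strip()
        # Ground truth comes from the interpreter, never from the model.
        if cleaned.isdigit() and int(cleaned) == a * b:
            correct += 1
    return correct / len(problems)

def fake_model(prompt: str) -> str:
    """Stub for a real LLM call: right for small products, off by 100 otherwise."""
    a, b = (int(n) for n in re.findall(r"\d+", prompt))
    product = a * b
    return str(product if product < 10_000 else product + 100)

print(accuracy(fake_model, make_problems(1000, digits=2)))
print(accuracy(fake_model, make_problems(1000, digits=4)))
```

The design choice that matters is the comparison line: the expected answer is computed, not looked up and not asked for. Swap `fake_model` for your real client and you have a number you can put in front of whoever signed off on the dashboard.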

The Real-World Impact

A logistics client came to us frustrated. They'd implemented an AI system for route optimization. The AI would analyse delivery data and suggest optimal routes. Great in theory.

Except the distance calculations were consistently off by 10-20%. Sometimes more.

The AI was predicting distances based on pattern matching, not calculating them from coordinates. Think of it as asking for directions from someone who's never been to the city but has read extensively about it. Technically very confident. Practically, you'd end up in a field.

Routes that should have saved money were costing more. Delivery times were wrong. Customer satisfaction suffered. They were using AI to solve a maths problem. AI is bad at maths.
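For contrast, here is what calculating a distance from coordinates looks like. This is a minimal sketch using the haversine formula, which gives the great-circle (as the crow flies) distance between two points. It is not road distance and it is not the client's routing engine; the coordinates are approximate city-centre values for London and Manchester, chosen only for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points.
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

depot = (51.5074, -0.1278)  # approx. central London (illustrative)
drop = (53.4808, -2.2426)   # approx. central Manchester (illustrative)
print(f"{haversine_km(*depot, *drop):.1f} km")
```

Same inputs, same output, every time. A real routing system would call a road-network service rather than use straight lines, but either way the number is computed, not guessed.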

Our Solution: Stop Asking AI to Count

Here's our approach with AskDiana. When you ask: "What's our average sale value for Q3?"

Traditional AI: Attempts to calculate from data, probably gets it wrong.

AskDiana:

Understands the question (AI)
Generates SQL code to query the database (AI)
Executes the code (Computer)
Returns the result (Deterministic)
Explains the result in natural language (AI)

See the difference? We use AI for what it's good at: understanding language and generating code. We use computers for what they're good at: precise calculation.
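Here is a minimal sketch of the middle of that flow (generate the query, execute it, return the result), using Python's built-in sqlite3 module. This is an illustration, not AskDiana's code: the `sales` table, its columns, and the rows are invented, and the SQL string is hard-coded where a model-generated query would go.

```python
import sqlite3

# Illustrative in-memory database; amounts are stored in pence to stay in integers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, amount_pence INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2025-06-30", 99_900),   # Q2, must be excluded
        ("2025-07-14", 120_000),
        ("2025-08-02", 84_950),
        ("2025-09-29", 215_050),
        ("2025-10-01", 50_000),   # Q4, must be excluded
    ],
)

# In the flow described above, this string is what the model would produce.
generated_sql = """
    SELECT AVG(amount_pence), COUNT(*)
    FROM sales
    WHERE sold_on >= '2025-07-01' AND sold_on < '2025-10-01'
"""

# The database engine, not the language model, does the arithmetic.
average_pence, n_sales = conn.execute(generated_sql).fetchone()
print(f"Average Q3 sale: £{average_pence / 100:,.2f} across {n_sales} sales")
```

The model's job ends at writing the query. If the query is wrong, it is wrong in a way you can read, review, and rerun, which is a great deal more than you can say for a wrong number.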

I know. It sounds almost embarrassingly obvious when you say it out loud. And yet, the entire industry spent years trying to make LLMs better at arithmetic instead of simply not asking them to do arithmetic. Sometimes the engineering answer that worked in 1987 is still the right one.

The Architecture

When you ask AskDiana a question that requires calculation:

Intent Recognition: AI determines you need numerical data
Code Generation: AI generates executable code (SQL, Python, whatever's appropriate)
Code Execution: We run that code in a secure sandbox
Result Validation: Verify the output makes sense
Natural Language Response: AI explains the result

The numbers come from deterministic computation. The explanation comes from AI. The two things that are good at their respective jobs do their respective jobs. Revolutionary? No. Effective? Embarrassingly so.
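The stages above can be sketched end to end in a few dozen lines. Everything here is an assumption made for illustration: `generate_sql` is a placeholder for the model call, the `orders` table is invented, the explanation step is a plain template rather than an LLM, and `is_read_only` is a crude guard, not a real sandbox. Intent recognition is left out entirely.

```python
import math
import sqlite3

def generate_sql(question: str) -> str:
    """Placeholder for the code-generation step (an LLM call in practice)."""
    return "SELECT SUM(total) FROM orders"

def is_read_only(sql: str) -> bool:
    """Crude guard: accept a single SELECT statement only. Not a real sandbox."""
    stripped = sql.strip().rstrip(";")
    return stripped.upper().startswith("SELECT") and ";" not in stripped

def validate(value, low: float, high: float) -> float:
    """Result validation: reject missing, non-finite, or out-of-range results."""
    if value is None or not math.isfinite(value) or not low <= value <= high:
        raise ValueError(f"implausible result: {value!r}")
    return float(value)

def answer(question: str, conn: sqlite3.Connection) -> str:
    sql = generate_sql(question)                 # code generation (AI)
    if not is_read_only(sql):
        raise PermissionError("generated code rejected")
    (value,) = conn.execute(sql).fetchone()      # code execution (deterministic)
    value = validate(value, low=0.0, high=1e9)   # result validation
    return f"Total order value: {value:,.2f}"    # explanation (templated here)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(120.5,), (79.5,), (300.0,)])
print(answer("What is our total order value?", conn))
```

Note where the number is created: inside `conn.execute`, and nowhere else. The language model never sees a digit it is expected to produce.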

Why This Matters More Than Hallucinations

Hallucinations are obvious problems. When AI invents a citation, you can usually catch it with a quick search.

Bad arithmetic is insidious. The numbers look plausible. They're formatted correctly. They're in the right range. They just happen to be wrong. And you don't know it until something breaks:

Inventory's off
Forecasts miss
Budgets don't balance
Reports to the board are fiction

That last one I've seen with my own eyes. Nobody enjoys explaining to a board why their AI analytics confidently reported Q3 was up 23% when it was actually down 4%. It is, shall we say, a career-clarifying moment.

By then, you've made decisions based on bad data.

Real Deployment: Manufacturing

In April 2025, one of our manufacturing clients needed to analyse production efficiency across multiple facilities. The calculations were complex: utilisation rates, waste percentages, comparative performance, trend analysis.

With traditional AI: Results that looked right but were consistently 5-15% off. Enough to make genuinely terrible decisions with great confidence.

With AskDiana's code generation: Precise calculations, every time. Because computers are good at maths. This is not a new discovery.

They can now trust their analytics. Make informed decisions. Optimise based on real numbers, not hallucinated approximations. Turns out this is quite useful in manufacturing.
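To show how unglamorous the underlying arithmetic is, here is a small sketch of two of those metrics. The definitions are assumptions for illustration (utilisation as hours run over hours available, waste as units scrapped over units made), and the plants and figures are invented, not the client's data.

```python
from dataclasses import dataclass

@dataclass
class Facility:
    name: str
    hours_run: float
    hours_available: float
    units_made: int
    units_scrapped: int

    @property
    def utilisation_pct(self) -> float:
        """Share of available hours the line was actually running."""
        return 100 * self.hours_run / self.hours_available

    @property
    def waste_pct(self) -> float:
        """Share of units produced that were scrapped."""
        return 100 * self.units_scrapped / self.units_made

# Hypothetical monthly figures for two plants.
facilities = [
    Facility("Plant A", 612, 720, 48_000, 1_200),
    Facility("Plant B", 540, 720, 41_500, 1_660),
]

# Comparative performance: lowest waste first.
for f in sorted(facilities, key=lambda f: f.waste_pct):
    print(f"{f.name}: utilisation {f.utilisation_pct:.1f}%, waste {f.waste_pct:.1f}%")
```

Two divisions and a sort. The hard part was never the maths; it was getting from a question in plain English to the right two divisions.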

The Principle

Use tools for what they're designed for:

AI for language understanding
AI for code generation
Computers for calculation
AI for explanation

Don't ask the poet to be an accountant.

I learned this lesson early. In my first job, we had a programmer who was extraordinarily gifted at writing beautiful, elegant code, and absolutely catastrophic at tracking billable hours. We did not ask him to do the billing. The same principle applies here, except the stakes are somewhat higher than one person's expense report and the poetry is being charged at enterprise rates.

What This Means for Your Analytics

If you're using AI for business analytics, ask yourself:

Do you verify every calculation?
How do you know when the numbers are wrong?
What decisions are you making based on AI-generated numbers?
Can you afford to be 15% off?

If your analytics stack can't answer these questions, you might be flying blind. At speed. With a very confident co-pilot.

Next: How Genius2 brings it all together: hallucination elimination, privacy preservation, and reliable computation in one system.

Here is the bloody sacrifice to the gods of marketing, please click to save me from eternal damnation:

Try it out for free: askdiana.ai
