How We Reduced AI Agent Token Costs by 96.5%

Three real API calls. Real data. We published every number and the methodology so you can reproduce it yourself.

Figure: the same log-analysis task drops from 4,024 tokens to 141 when the work runs in code, a 96.5% reduction.

When an AI agent handles a complex task, the default approach is to send everything to the model: the raw data, the search results, the logs, all of it. The model reads it, processes it, and returns an answer. This works. It also gets expensive fast.

At 10,000 queries per day, the difference between sending 4,024 tokens and 141 tokens is not a footnote. It is the difference between $42,519 per year and $5,988 per year. That is a single architectural decision with a 610% ROI impact.

We built a platform around this idea. Then we tested it properly. What follows are three real tasks, each run as a live API call with and without the platform. No adjustments. No cherry-picking. Every number published.

Test setup and methodology

All three tests used the same model. We called the API directly. The results are verifiable. We are not asking you to trust our word. Run the same tests and check the numbers yourself.

Reproducibility note
These tests use real API calls on real data. The tasks rest on elementary operations (counting, arithmetic, and a web search), so the ground-truth answers are easy to check independently. Connect to our platform, run the same tasks, and you will see comparable results.

Test 1: When the model simply cannot do it

The first test was about capability, not token cost. We asked the model a question that requires live web access: retrieve current information about a real-world topic.

Test 01: Live web search task

Without platform: 30 input tokens
FAILED. The model returned: "I cannot search the web." Task not completed.

With platform: 5,228 input tokens
SUCCEEDED. Real 2026 data returned with sources. Accurate, current results summarised.

This is a capability win. The task was impossible without the platform. Token count is irrelevant here.

Token cost comparison is irrelevant when one path produces an answer and the other produces a refusal. Some tasks are simply not possible any other way.

Test 2: When cheaper is also more accurate

We gave the model two deterministic maths problems: find the largest prime under 10,000, and find the largest gap between consecutive primes up to 10,000. These have exact, verifiable correct answers.

Test 02: Mathematical accuracy task

Without platform: 773 total tokens
Reported 10,007 as a prime under 10,000. Wrong. 10,007 exceeds 10,000.
Said the largest prime gap was 20. Wrong. Correct answer is 36. A 44% error.

With platform: 540 total tokens
Code computed both answers exactly. Correct on all results.
30% fewer tokens than the incorrect attempt.

The platform was both cheaper and more accurate. Deterministic computation does not belong in a model context window.
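
To make the ground truth for this test easy to check, here is a minimal sketch of how both answers can be computed deterministically. It is not the platform's actual code; the function names `primes_up_to` and `largest_gap` are our own, and the approach is a plain Sieve of Eratosthenes.

```python
def primes_up_to(limit: int) -> list[int]:
    """Return all primes p with 2 <= p <= limit (Sieve of Eratosthenes)."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross off every multiple of n, starting at n * n.
            multiples = len(range(n * n, limit + 1, n))
            is_prime[n * n :: n] = bytes(multiples)
    return [n for n, flag in enumerate(is_prime) if flag]


def largest_gap(primes: list[int]) -> int:
    """Largest difference between consecutive entries of a sorted prime list."""
    return max(b - a for a, b in zip(primes, primes[1:]))


largest_prime_under_10k = primes_up_to(9_999)[-1]
largest_gap_up_to_10k = largest_gap(primes_up_to(10_000))

print(largest_prime_under_10k)  # 9973
print(largest_gap_up_to_10k)    # 36
```

The gap of 36 falls between 9,551 and 9,587. Both values take milliseconds to compute, which is the point of the test: the model only needs to receive two integers.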

Test 3: The 96.5% reduction

We gave the model a log file and asked it to count the number of errors. The correct answer is 41.

96.5% fewer input tokens. Same task. More accurate answer.

Test 03: Log file analysis task

Without platform: 4,024 input tokens
Entire log file loaded into model context.
Model reported 37 errors. Wrong. Correct answer is 41.

With platform: 141 input tokens
Code counted errors programmatically. Model received the answer only.
Result: 41 errors. Correct.

96.5% fewer input tokens. The platform ran code against the file. The model never read it.
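
The pattern behind this test can be sketched in a few lines. This is an illustration under assumptions, not the platform's implementation: the log format (`<timestamp> <LEVEL> <message>`), the sample lines, and the names `count_errors` and `build_model_payload` are all hypothetical.

```python
import re

# Assumed log format: "<timestamp> <LEVEL> <message>". Only lines whose
# level field is exactly ERROR are counted; the word "ERROR" appearing
# inside a message does not count.
ERROR_LINE = re.compile(r"^\S+\s+ERROR\b")


def count_errors(log_text: str) -> int:
    """Count ERROR-level lines without involving the model."""
    return sum(1 for line in log_text.splitlines() if ERROR_LINE.match(line))


def build_model_payload(log_text: str) -> str:
    """Build the short text the model sees in place of the raw log."""
    return f"The log file contains {count_errors(log_text)} ERROR entries."


# Made-up sample data for illustration only.
sample_log = "\n".join([
    "2026-01-05T10:00:00Z INFO service started",
    "2026-01-05T10:00:01Z ERROR connection refused",
    "2026-01-05T10:00:02Z WARN retrying after ERROR",
    "2026-01-05T10:00:03Z ERROR timeout",
])

print(build_model_payload(sample_log))
# The log file contains 2 ERROR entries.
```

The payload stays roughly the same size however long the log is, which is why input tokens stop scaling with file size.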

What this means at scale

| Volume | Without platform | With platform | Annual saving |
|---|---|---|---|
| 1,000 queries / day | $4,252 / yr | $599 / yr | $3,653 |
| 10,000 queries / day | $42,519 / yr | $5,988 / yr | $36,531 |

ROI at 10k / day: 610%
The 610% ROI means that for every dollar spent on the platform, you get $6.10 back in token savings. The saving accrues every month the workload runs.
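
The headline figures can be re-derived from the published numbers with simple arithmetic. The sketch below assumes ROI is the annual saving divided by the with-platform annual cost; that definition is our inference, chosen because it reproduces the 610% figure. The function names are ours.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


def annual_saving(without_platform: float, with_platform: float) -> float:
    """Yearly cost difference between the two paths."""
    return without_platform - with_platform


def roi_pct(without_platform: float, with_platform: float) -> float:
    """Assumed definition: saving divided by with-platform spend."""
    return annual_saving(without_platform, with_platform) / with_platform * 100


print(round(reduction_pct(4_024, 141), 1))  # 96.5
print(annual_saving(42_519, 5_988))         # 36531
print(round(roi_pct(42_519, 5_988)))        # 610
```

The same functions applied to the 1,000 queries per day row give a saving of $3,653 and, after rounding, the same 610% ROI.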

How it works

The platform sits between your AI and the data it would otherwise need to read. Before the model sees the task, the platform runs code, searches the web, queries databases, and commits to GitHub. The model gets the answer, not the raw material to compute it.

Code execution

Count records, process data, parse files. The model receives the result, not the file.

Web search

Retrieve current information before it reaches the model. Real data instead of a refusal.

Database queries

Filter, aggregate, and return exact rows. The model gets the result, not the whole table.

Cost caps built in

Every call is estimated before it runs. Calls over your ceiling are rejected by default.
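
As a rough illustration of the cost-cap idea, a pre-flight check could look like the sketch below. Everything here is hypothetical: the `CostGuard` class, the `CostCeilingExceeded` exception, and the per-token price are placeholders for illustration, not the platform's API or pricing.

```python
from dataclasses import dataclass


class CostCeilingExceeded(Exception):
    """Raised when a call's estimated cost is above the configured ceiling."""


@dataclass(frozen=True)
class CostGuard:
    price_per_1k_tokens: float  # hypothetical price in USD
    ceiling_usd: float          # maximum allowed cost per call

    def estimate(self, input_tokens: int) -> float:
        """Estimate the cost of a call before it runs."""
        return input_tokens / 1000 * self.price_per_1k_tokens

    def check(self, input_tokens: int) -> float:
        """Return the estimate, or reject the call if it exceeds the ceiling."""
        cost = self.estimate(input_tokens)
        if cost > self.ceiling_usd:
            raise CostCeilingExceeded(
                f"estimated ${cost:.4f} exceeds ceiling ${self.ceiling_usd:.4f}"
            )
        return cost


guard = CostGuard(price_per_1k_tokens=0.003, ceiling_usd=0.005)

guard.check(141)  # under the ceiling, so the call may proceed

try:
    guard.check(4_024)  # over the ceiling, so the call is rejected
except CostCeilingExceeded as exc:
    print(f"rejected: {exc}")
```

Rejecting by default means an oversized call fails loudly before any tokens are spent, instead of surfacing later as a line on the bill.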

One URL. Connect your agent once. Works with Claude, ChatGPT, Gemini, or any MCP-compatible model. Every new capability reaches your AI automatically on the next call.

Related reading

If you care about token cost, you probably also care about what the platform does to the code your agent ships. See AI Never Ships Broken Code for how every commit is confirmed to load before it lands.

Start free. Check the numbers yourself.

1,000 free calls per month. One URL. Any MCP-compatible AI agent.
