Infrastructure for AI agents

Your AI can think.
Give it hands.

UniversalBench connects your AI assistant to the tools and accounts you already use, including your own Google Search Console and Google Analytics, so it can read your data and take actions you approve.

Start free See it work

✓ 1,000 free actions a month. No card, no second screen.

96.5% fewer tokens on data tasks

1 URL connects any AI to your stack

UniversalBench

Audit my live site’s SEO, fix what you can, and publish it.

On it.

UniversalBench ✓

✓ Crawled 24 pages

✓ Found 11 SEO issues

✓ Rewrote titles and meta

✓ Published to your live site

Done. Live, and your traffic is already climbing.


🔒 search.google.com ● Live

Search Console · yoursite.com · Last 12 weeks

Organic traffic: 312 visits / mo ▲ 490%

Impressions: 12.4k

Avg position: 18.4

Avg CTR: 2.1%

SEO health

Top queries climbing

acme co: #14 → #3 ▲

acme pricing: #22 → #6 ▲

acme reviews: #17 → #4 ▲

✓ 11 issues fixed · 24 pages indexed

UniversalBench

Launch my bakery website. Sort the domain, hosting and email, and put it live.

On it.

UniversalBench ✓

✓ Registered sunrisebakery.com

✓ Pointed the DNS

✓ Deployed the live site

✓ Set up email

Done. You’re live, and your inbox is ready.


Domain · DNS · Hosting · Email

🔒 sunrisebakery.com ● Live

Setting up your site…

Sunrise Bakery · Menu · About · Order

🍞

Fresh every morning

Sourdough, pastries and coffee in the heart of town

🥐 Pastries

🥖 Breads

☕ Coffee

✉ hello@sunrisebakery.com · Inbox 1

Welcome to Sunrise Bakery. Your mailbox is live and ready to receive orders.

UniversalBench

The checkout crashes on an empty cart. Find the bug and push a fix to my repo.

On it.

UniversalBench ✓

✓ Cloned acme/checkout-api

✓ Found the bug in cart.js

✓ Wrote the fix and a test

✓ Pushed the commit to main

Done. The fix is on main and the tests pass.


cart.js ⎇ main

function cartTotal(items) {
-  return items.reduce((a, i) => a + i.price)
+  return items.reduce((a, i) => a + i.price, 0)
}

Pushed to main · a1f4c9d

fix: guard empty-cart total with a reduce seed

✓ lint ✓ tests ✓ build
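
The bug in this demo is a general one: a fold with no starting value has nothing to return for an empty list. Below is a minimal Python analog of the same pattern, assuming cart items are dicts with a `price` key. The function names and item shape are illustrative and are not taken from the repo shown in the demo.

```python
from functools import reduce


def cart_total_unseeded(items):
    # No initial value: reduce() raises TypeError on an empty list.
    return reduce(lambda acc, item: acc + item["price"], items)


def cart_total(items):
    # Seeding the fold with 0 makes the empty-cart total 0.
    return reduce(lambda acc, item: acc + item["price"], items, 0)


if __name__ == "__main__":
    try:
        cart_total_unseeded([])
    except TypeError as exc:
        print("unseeded fold failed:", exc)
    print(cart_total([]))
    print(cart_total([{"price": 4.5}, {"price": 3.0}]))
```

The seed also fixes the non-empty case, because the accumulator then starts as a number rather than as the first item.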

UniversalBench

300 new signups came in overnight. Add them to the customers table in my database.

On it.

UniversalBench ✓

✓ Connected to your Postgres

✓ Validated 300 rows

✓ Inserted into customers

✓ Updated the daily total

Done. 300 rows are live in your database.


customers ● postgres

rows: 1,240 + 300

id	email	plan	created
1541	ava@acme.co	pro	now
1542	noah@hey.io	team	now
1543	mia@studio.dev	pro	now
1544	leo@northwind.com	free	now

+296 more inserted

300 rows committed · transaction sealed

UniversalBench moves computation, validation, and execution into a controlled runtime before results ever reach the model. Token reduction, safety enforcement, and verified accuracy all follow from this one architectural decision.

The real problem

Your AI is guessing.
UB makes it compute.

Every enterprise team, every funded startup, every developer paying for Claude or GPT-4 faces the same silent problem: LLMs hallucinate on data, hit token limits on context, and can't execute real code. UniversalBench pre-computes every answer before it reaches your model. Facts. Not guesses.

Same prompt. Different path. Different outcome.

Prompt

Raw to LLM

Hallucinated answer

WRONG

Prompt

UniversalBench

Computed answer

CORRECT

⚡

UB pre-computes and sends only the result to your LLM. Up to 96.5% fewer tokens on data tasks.

Integration

Three steps.
Everything changes.

Sign up, get your key

30 seconds. No credit card. 1,000 free executions every month, enough to run a real benchmark against your current workflow.

ubk_a3f9c21e84d0...

Paste one URL into your AI

Claude, ChatGPT, Gemini, Cursor, or any MCP-compatible client. One URL in your integration settings. Two minutes of work.

universalbench-mcp.penantiaglobal.workers.dev/u/ubk_yourkey

Your AI gains real execution

Code runs. Data queries. Search verifies. LLM routing activates. Every result is pre-computed, so your model gets facts, not prompts.

web_search, code, db, github...

What becomes possible

Give Your AI
Real Infrastructure.

Connect one URL to your AI. It can execute code, query databases, search the web, call APIs, and process files before the answer reaches the model.

Building AI Coding Agents

Write code, run tests, validate syntax, and commit to GitHub. Every push is smoke-tested before it lands. Your agent ships working code, not broken patches.

Code execution

Building AI Research Systems

Search the live web, query databases, and run analysis scripts. Only the answer reaches the model, not thousands of raw rows. Up to 96.5% fewer tokens.

Web search + analysis

Building AI Automation Platforms

Trigger API calls, update records, and process files across your entire stack. One URL replaces custom integration for every tool your workflows already use.

API execution

These are the building blocks.
Not the building.

Every company connects UniversalBench for a different reason. The most valuable workflows are usually the ones nobody planned on day one.

Get free API key →

What UniversalBench does

Connect your tools, let your AI act on them.

UniversalBench connects your AI assistant to the tools and accounts you already use, so it can read your data and take real actions for you, only on your instruction. It works across many services.

As one example, you connect your own Google Search Console and Google Analytics, and your AI reads your search performance (queries, impressions, clicks, position) and your site traffic to diagnose issues and improve your site. At your instruction it can also submit sitemaps and request indexing.

We act only on accounts you own, only when your AI requests it. We never retain your Google data after returning it to your AI, and we never use it for advertising or to train any AI or ML model.

How it works

Two approaches to
AI tooling.

Exposing tools to the model and executing work inside a runtime are two different architectural choices. Here is what each one means in practice.

Traditional MCP Server	UniversalBench Runtime
Exposes tools for the AI to call and reason over	Executes Python, search, database, and API operations before returning results
Raw data and intermediate steps are often sent back to the model	Computation happens inside the runtime, returning only the final result
Model processes most of the workload inside the chat context	Heavy work is completed outside the model, reducing token usage by up to 96.5%
Individual tools and workflows are exposed through MCP interfaces	One runtime provides a unified execution layer across multiple capabilities
Safety depends largely on tool implementation and agent behavior	Runtime-enforced limits validate code, spending, and network access before execution
Teams build, host, monitor, and maintain MCP infrastructure	Connect a single MCP endpoint and use managed execution services

Traditional MCP servers expose tools to the model.
UniversalBench moves computation, validation, and execution into a controlled runtime before results ever reach the model.

Runtime Architecture

What runs
where.

Your AI sends one instruction. UniversalBench handles everything in between (auth, execution, and safety) before a single result token reaches the model.

Secrets Vault

Credentials stored encrypted in the Worker. Never passed to your AI, never logged.

Network Isolation

The Runtime cannot reach internal IPs or cloud metadata. Enforced at the network layer, not the prompt layer.

Execution Isolation

Every customer runs in its own sandboxed process. No shared state, no cross-customer access.

Your AI

Claude · GPT · Gemini · Any MCP AI

UB Worker

Auth · Billing · Rate Limit

UB Runtime

Python · Web · GitHub · DB · LLM

Your Tools & Data

GitHub · DB · APIs · Stripe · Slack

Free Setup Call

Get connected in 30 minutes.
I'll walk you through it.

We connect your AI assistant together, run a live execution, and make sure everything works. No technical knowledge needed.

Book a free setup call →

30 minutes · Video call · Free forever

Hard limits

Three hard limits every AI agent must obey

Giving AI real execution power should not require blind trust. UniversalBench enforces production safeguards before actions happen.

Code safety

AI never ships broken code

Every code push is validated before it lands.

Generated code ready

Syntax validation

Live URL smoke test

Deploy approved

Validation before commit
Optional smoke testing
Automatic rollback support
Production-safe deployments

Prevents invalid production changes
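
A syntax gate of this kind can be sketched in a few lines. The version below is an assumption about the general technique, not UniversalBench's implementation: it uses Python's standard `ast.parse` to reject source that does not parse, and the `validate_python_source` name is hypothetical.

```python
import ast
from typing import Optional


def validate_python_source(source: str, filename: str = "<generated>") -> Optional[str]:
    """Return None if the source parses, else a readable error message."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


if __name__ == "__main__":
    good = "def total(xs):\n    return sum(xs)\n"
    bad = "def total(xs):\n    return sum(xs\n"
    print(validate_python_source(good))  # parses cleanly
    print(validate_python_source(bad))   # reports the unclosed parenthesis
```

A check like this only proves the code parses. Behavior still needs tests or a smoke test, which is why the page lists those as separate steps.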

Cost guardrails

AI never burns your wallet

Every model call is budget checked before execution.

LLM request incoming

Cost estimated first

Budget check

Allowed and executed

Default ceiling	$0.50 / request
Hard platform cap	$50.00 / request
Over budget	Rejected before run
Surprise invoices	Impossible

Enforced before tokens are spent
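
The estimate-then-check flow above can be sketched as follows. This is a hedged illustration, not the platform's code: `ModelPrice`, `estimate_cost`, `check_budget`, and `OverBudget` are hypothetical names, and the worst-case estimate (full output allowance used) is an assumption. Only the $0.50 default and the $50.00 cap come from this page.

```python
from dataclasses import dataclass

DEFAULT_CEILING_USD = 0.50  # default per-request ceiling quoted above
PLATFORM_CAP_USD = 50.00    # hard platform cap quoted above


class OverBudget(Exception):
    """Raised before any tokens are spent."""


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def estimate_cost(price: ModelPrice, input_tokens: int, max_output_tokens: int) -> float:
    # Worst case: assume the model uses its full output allowance.
    return (input_tokens * price.input_per_million
            + max_output_tokens * price.output_per_million) / 1_000_000


def check_budget(price: ModelPrice, input_tokens: int, max_output_tokens: int,
                 ceiling_usd: float = DEFAULT_CEILING_USD) -> float:
    # A user-chosen ceiling can be raised, but never above the platform cap.
    ceiling = min(ceiling_usd, PLATFORM_CAP_USD)
    cost = estimate_cost(price, input_tokens, max_output_tokens)
    if cost > ceiling:
        raise OverBudget(f"estimated ${cost:.4f} exceeds ceiling ${ceiling:.2f}")
    return cost
```

The key design point is ordering: the check runs on an estimate, so a rejected request costs nothing.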

Network isolation

AI cannot reach your internal network

Every outbound request is inspected before execution.

AI agent makes HTTP call

Request intercepted

Private IP blocked

Public internet only

Private IP ranges blocked
Loopback addresses blocked
Link-local ranges blocked
Cloud metadata endpoints blocked

Protects internal systems by default
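
The address classes listed above map directly onto Python's standard `ipaddress` module. The sketch below is illustrative only: `is_blocked_address` is a hypothetical helper, and it handles literal IP addresses. A real enforcement layer would also have to check the address a hostname resolves to at connection time.

```python
import ipaddress


def is_blocked_address(host: str) -> bool:
    """True if a literal IP address falls in a range the page lists as blocked.

    Raises ValueError for anything that is not a literal IP address.
    """
    ip = ipaddress.ip_address(host)
    # The common cloud metadata address 169.254.169.254 is link-local,
    # so the link-local check covers it.
    return ip.is_private or ip.is_loopback or ip.is_link_local


if __name__ == "__main__":
    for candidate in ("10.0.0.5", "127.0.0.1", "169.254.169.254", "8.8.8.8"):
        print(candidate, "blocked" if is_blocked_address(candidate) else "allowed")
```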

Runtime architecture

Safety by runtime, not prompting

These protections are enforced by the runtime itself. They do not depend on prompt instructions, agent behavior, or model compliance.

AI Agent (Claude, ChatGPT, Gemini)

UniversalBench Runtime

Isolated Sandbox

External APIs & Web

Prompt-based guardrails	Runtime enforcement
AI asked not to break things	Broken deployments blocked before commit
AI asked not to overspend	Overspending is impossible
AI asked not to access internal systems	Internal systems unreachable by design

Secrets vault

Encrypted at rest. One-time setup. Auto-injected into every tool that needs them.

Network controls

Every outbound request inspected. Private IPs, loopback, metadata endpoints blocked.

Cost guardrails

Cost estimated before every LLM call. Requests over your ceiling rejected before tokens are spent.

Code validation

Syntax check, optional smoke test, and auto-rollback before any code reaches production.

Isolated execution

Each customer runs in a separate sandbox. No shared state, no cross-tenant access.

Verified Results

Three tests. Real Anthropic API.
Real tokens. Reproducible.

Run on 20 May 2026 against the live Anthropic API using claude-opus-4-7, the current Anthropic flagship. Test data is published below. The "true" answer in every test is elementary math, verifiable in Excel, R, bash grep, or any calculator. We invite you to reproduce these tests yourself.

Verifiable in Anthropic console · claude-opus-4-7 · 2026-05-20

TEST 01

Messy CSV Revenue Extraction

80 sales rows. Three different date formats. Missing and placeholder amounts. Question: total Q3 2025 revenue from EU customers.

Without UB: Opus 4.7 reasoned through every row and used 1,002 output tokens. It got the right answer ($30,111.05), but the long reasoning made the call expensive.

With UB: Python parsed and filtered. Sent Claude only the result. 34 output tokens. Correct.
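
As a rough idea of what "parsed and filtered" involves, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the column names (`date`, `region`, `amount`), the three date formats, and the sample rows are invented and are not the published test data.

```python
import csv
import io
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")  # assumed formats


def parse_date(text: str) -> Optional[date]:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text: str) -> Optional[Decimal]:
    try:
        value = Decimal(text.strip().replace(",", ""))
    except InvalidOperation:
        return None  # blank or placeholder such as "N/A"
    return value if value.is_finite() else None  # reject "NaN" placeholders


def q3_2025_eu_revenue(csv_text: str) -> Decimal:
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = parse_date(row["date"])
        amount = parse_amount(row["amount"])
        if day is None or amount is None:
            continue  # skip rows that cannot be interpreted
        if row["region"].strip() == "EU" and date(2025, 7, 1) <= day <= date(2025, 9, 30):
            total += amount
    return total


SAMPLE = (
    "date,region,amount\n"
    "2025-07-14,EU,100.50\n"
    "15/08/2025,EU,200.00\n"
    "30.09.2025,EU,N/A\n"
    "01.10.2025,EU,999.00\n"
    "2025-08-01,US,50.00\n"
    '02/09/2025,EU,"1,000.25"\n'
)

if __name__ == "__main__":
    print(q3_2025_eu_revenue(SAMPLE))
```

Only the final total would be passed to the model, which is where the input-token reduction comes from.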

Right answer, 25x cheaper output

Input tokens, Opus 4.7

Without UB

2,971

With UB

101

Input token reduction

96.6%

TEST 02

Statistics over 200 Transactions

200 amounts. Question: how many are above $5,000, what is the median, what is the standard deviation.

Without UB: Opus 4.7 spent 5,000 output tokens summing line by line. It reached about row 50 of 200, ran out of budget, and never returned an answer.

With UB: Python computed in milliseconds. 84 output tokens. All three numbers exact.
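
The three numbers in this test are one-liners in Python's standard `statistics` module. The sketch below is an illustration under assumptions: `summarize` and `format_for_model` are hypothetical names, and it uses the sample standard deviation (`statistics.stdev`) because the page does not say whether the test used the sample or population formula.

```python
import statistics
from typing import Dict, Sequence


def summarize(amounts: Sequence[float], threshold: float = 5000) -> Dict[str, float]:
    return {
        "count_above": sum(1 for a in amounts if a > threshold),
        "median": statistics.median(amounts),
        "stdev": statistics.stdev(amounts),  # sample standard deviation (assumed)
    }


def format_for_model(summary: Dict[str, float]) -> str:
    # The short string below is all the model would need to see.
    return (f"{summary['count_above']} amounts above threshold; "
            f"median {summary['median']:.2f}; stdev {summary['stdev']:.2f}")


if __name__ == "__main__":
    print(format_for_model(summarize([1000, 6000, 7000, 2000])))
```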

Task incomplete vs instantly correct

Output tokens spent

Without UB

5,000 (hit budget)

With UB

84

Outcome

Never finished vs. done

TEST 03

Server Log Error Counting

500-line server log with realistic mix of INFO, WARN, ERROR. Question: how many ERROR-level lines.

Without UB: Opus 4.7 said 96. True answer is 80. Off by 16. Confident wrong number.

With UB: Python counted. 80. Correct.
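
Counting matching lines is a deterministic operation, which is the point of this test. A minimal sketch, assuming the level appears as a literal `[ERROR]` marker on each line (the sample log lines are invented for illustration):

```python
def count_error_lines(log_text: str, marker: str = "[ERROR]") -> int:
    """Count lines that contain the marker at least once."""
    return sum(1 for line in log_text.splitlines() if marker in line)


SAMPLE_LOG = "\n".join([
    "2026-05-20T10:00:00Z [INFO] service started",
    "2026-05-20T10:00:01Z [WARN] cache miss rate high",
    "2026-05-20T10:00:02Z [ERROR] upstream timeout",
    "2026-05-20T10:00:03Z [INFO] retrying",
    "2026-05-20T10:00:04Z [ERROR] upstream timeout",
])

if __name__ == "__main__":
    print(count_error_lines(SAMPLE_LOG))
```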

20% wrong vs correct, 500x fewer input tokens

Input tokens, Opus 4.7

Without UB

20,226

With UB

Input token reduction

99.8%

Methodology, plain and verifiable

Model: claude-opus-4-7. Anthropic API direct. Input pricing $15 per million tokens, output $75 per million tokens. Each "with UB" call sends a Python-computed answer to Claude instead of raw data. Token counts pulled from the Anthropic API response usage field.

The "true" answer in every test is elementary mathematics, not a Python opinion. Sum, count, median, standard deviation, and "lines containing [ERROR]" are unambiguous mathematical operations. They produce the same result in Excel, R, MATLAB, bash grep, or any pocket calculator. Test data and reproducer recipes are saved in our published reproducibility receipts. Run the math in whatever tool you trust most. The number will match.

Token reduction depends on the workflow. Small queries save less. Bulk data tasks like the three above save 95% to 99.8%. The "up to 96.5%" claim is a conservative reference from our original public test. At Opus pricing, Test 03 alone saves roughly $40,000 per year at 1,000 queries per day, or up to $400,000 at 10,000 queries per day. Customers pay UB $0.008 per call, so the math is verifiable for their own volume.
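
To run this math for your own volume, a small calculator is enough. The sketch below is a hedged illustration: `annual_net_savings` is a hypothetical helper, it counts input tokens only, and the example call uses round illustrative numbers rather than the measurements from the tests above. The $0.008 per-call fee is the price quoted on this page.

```python
def annual_net_savings(
    tokens_without: int,
    tokens_with: int,
    input_price_per_million: float,
    queries_per_day: int,
    ub_fee_per_call: float = 0.008,  # per-call price quoted on this page
    days_per_year: int = 365,
) -> float:
    """Yearly input-token savings in USD, net of the per-call UB fee."""
    saved_per_query = (tokens_without - tokens_with) * input_price_per_million / 1_000_000
    return (saved_per_query - ub_fee_per_call) * queries_per_day * days_per_year


if __name__ == "__main__":
    # Illustrative inputs: 10,000 tokens reduced to 100, at $10 per 1M input tokens.
    print(round(annual_net_savings(10_000, 100, 10.0, 1_000), 2))
```

Substitute the token counts from your own API usage field and your model's current price to get a figure you can check.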

Common questions

The things people ask first.

Which AIs work with UniversalBench?

Claude, ChatGPT, Gemini, Cursor, and any MCP-compatible client. One URL works across all of them. Switching AI providers does not require re-integrating UB.

Is the 96.5% token reduction real?

Yes, on bulk data tasks. We have measured reductions of 95% to 99.8% across the three live tests above, all run against the current Anthropic flagship and reproducible from published data. Tasks that send small inputs to your AI will not see this scale of saving. The big savings show up when your AI is reading large datasets, log files, CSVs, or anything where Python can pre-process and send a one-number answer.

What happens after the 1,000 free calls?

You stop, or you top up your wallet from $5 and keep going at $0.008 per call. Credits roll over. No subscription, no auto-renewal, no surprise charges.

Is my data sent to other AI providers?

Only what your AI explicitly asks UniversalBench to send. Credentials you save go into an encrypted vault scoped to your account. Your data is not used for training. Your AI cannot reach your internal network by default.

What if a validation blocks something I want to push?

You see the exact reason. You fix it and try again. The validation is mandatory because that is what makes the safety claim real, but the error is always visible and actionable.

Can I raise the cost ceiling on LLM calls?

Yes. The default ceiling is $0.50 per call. You can raise it up to $50. The cap stays on. You control its size.

How do I cancel?

Stop topping up. There is no subscription. Unused credits get refunded if you ask.

Pricing

Start free.
Pay only for what runs.

All prices in USD

Execution

$0.008 / call

1,000 calls/month free

Code & Bash
GitHub read & push
Files & parallel threads
Secrets vault

Web Search

$0.01 / search

100 searches/month free

Live results from the web, cited and structured. Billed per query, separate from your execution credits.

LLM Routing

from $0.0001 / 1K tokens

billed per token, per model

LLM Pricing

Per 1M tokens. Your AI picks the model per task.

Model	Input / 1M	Output / 1M

Database

Free

with your credentials

Connect any PostgreSQL-compatible database once via your vault. Read, write and search from any tool.

Hosted database

Coming soon

🛡️

Zero-risk guarantee: sign up, use it, and if you don't see measurable ROI, we refund you. No questions, no forms. That's how confident we are in the numbers.

Three promises. One URL.

AI that never ships broken code,
never burns your wallet,
never reaches your internal network.

Built into every call by default. One URL into any MCP-compatible AI. Free to start.

Get your free API key

No credit card. 1,000 free calls every month.


Welcome, your account


YOUR PERSONAL MCP URL ENCRYPTED · PRIVATE

universalbench-mcp.penantiaglobal.workers.dev/u/ubk_•••••••••••••••••••••••

🔒 Your key is embedded in this URL and masked by default. Paste the whole URL into any MCP client (Claude Desktop, Cursor, etc) in one step. No separate header configuration needed.

YOUR API KEY RAW KEY · KEEP SECRET

ubk_•••••••••••••••••••••••••••••••

🛡 Use this when an MCP host asks specifically for an API key, not a URL. Most clients (Claude Desktop, Cursor) want the URL above instead. Rotate the key if you suspect exposure; rotating invalidates the current URL and key.

Click to verify your AI client can reach UB

Billing & Usage

Wallet

$0.00

0 paid calls available

Free this month

resets first of month

Top up wallet

Top up from $5 to $500 per transaction. All amounts in USD. Paid call count shown above updates after payment.

$0.008 USD per execution after your 1,000 free calls each month. No subscription. Funds never expire.

Executions

1,000 free this month

Tokens saved

—

run your first call

Charged this month

$0.00

free tier first, then $0.008 per call

Top tool

—

no calls yet

Executions, last 14 days

Executions

Savings

Your tools

Web Search

Live web search from your prompts.

OFF

Code Execution

Python and Bash.

ACTIVE

LLM Routing

Route prompts to other LLMs from within UB.

OFF

Database

Add your database credentials.

CONFIGURE

GitHub

Add a GitHub access token.

CONFIGURE


Quick start: connect Claude in 2 minutes

Open Claude → Settings → Integrations → Add MCP server

Click Copy URL above and paste it as the MCP server URL. Your key is already embedded, so there is no separate field to fill.

Ask Claude: "search the web for latest AI infrastructure news". It works immediately.

External capabilities Live data and actions your agent can use · billed to your wallet

Give your agent live data and real world actions. Switch one on and your agent can use it on every call, billed only for what it uses.

Why are these off by default?

Most capabilities are flat-rate and run on UB infrastructure. Web Search and LLM Routing use external services billed per call.

These are off until you opt in. When enabled, the per-call cost is debited from your wallet on top of the standard per-call execution fee.

Set a daily cap below to hard-stop spending.

Capabilities

Web Search

Search the live web from your prompts. 100 free searches per month included. After that, usage is debited from your wallet automatically.

LLM Routing OFF

Route prompts to other LLMs from within UB. Billed per call from your wallet.

Daily safety cap

Hard ceiling on combined external-capability spend per UTC day. External calls block once the cap is hit and reset the next UTC day.
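
A cap that resets per UTC day can be sketched with a counter keyed on the UTC date. This is an illustration of the idea only, with a hypothetical `DailyCap` class held in memory. A production version would need persistent, concurrency-safe storage.

```python
from datetime import date, datetime, timezone
from typing import Optional


class DailyCap:
    """Tracks combined spend per UTC day and refuses calls past the cap."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self._day: Optional[date] = None
        self._spent = 0.0

    def try_spend(self, cost_usd: float, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self._day:
            # First call of a new UTC day: reset the running total.
            self._day, self._spent = today, 0.0
        if self._spent + cost_usd > self.cap_usd:
            return False  # blocked; nothing is recorded
        self._spent += cost_usd
        return True
```

Passing `now` explicitly keeps the reset logic testable without waiting for midnight UTC.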

USD per day

External capabilities spend this month

Web Search

$0.00

external usage cost, current month

LLM Routing

$0.00

external usage cost, current month
