Infrastructure for AI agents

Your AI can think.
Give it hands.

UniversalBench connects your AI assistant to the tools and accounts you already use, including your own Google Search Console and Google Analytics, so it can read your data and take actions you approve.

1,000 free actions a month. No card, no second screen.
96.5% fewer tokens on data tasks
1 URL connects any AI to your stack
UniversalBench
Audit my live site’s SEO, fix what you can, and publish it.
On it.
UniversalBench
Crawled 24 pages
Found 11 SEO issues
Rewrote titles and meta
Published to your live site
Done. Live, and your traffic is already climbing.
⚙ Opus 4.8 Thinking
🔒 search.google.com · ● Live
Search Console · yoursite.com · Last 12 weeks
Organic traffic: 312 visits / mo (▲ 490%)
Impressions: 12.4k
Avg position: 18.4
Avg CTR: 2.1%
SEO health: 38
Top queries climbing
acme co: #14 → #3 ▲
acme pricing: #22 → #6 ▲
acme reviews: #17 → #4 ▲
✓ 11 issues fixed · 24 pages indexed
UniversalBench
Launch my bakery website. Sort the domain, hosting and email, and put it live.
On it.
UniversalBench
Registered sunrisebakery.com
Pointed the DNS
Deployed the live site
Set up email
Done. You’re live, and your inbox is ready.
⚙ Opus 4.8 Thinking
Domain · DNS · Hosting · Email
🔒 sunrisebakery.com · ● Live
Setting up your site…
Sunrise Bakery · Menu · About · Order
🍞
Fresh every morning
Sourdough, pastries and coffee in the heart of town
🥐 Pastries
🥖 Breads
Coffee
✉ hello@sunrisebakery.com · Inbox 1
Welcome to Sunrise Bakery. Your mailbox is live and ready to receive orders.
UniversalBench
The checkout crashes on an empty cart. Find the bug and push a fix to my repo.
On it.
UniversalBench
Cloned acme/checkout-api
Found the bug in cart.js
Wrote the fix and a test
Pushed the commit to main
Done. The fix is on main and the tests pass.
⚙ Opus 4.8 Thinking
cart.js · ⎇ main
function cartTotal(items) {
-  return items.reduce((a, i) => a + i.price)
+  return items.reduce((a, i) => a + i.price, 0)
}
Pushed to main · a1f4c9d
fix: guard empty-cart total with a reduce seed
✓ lint · ✓ tests · ✓ build
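The fix in the panel above comes down to giving the reduction a starting value. Here is a minimal Python sketch of the same idea using `functools.reduce`, which also fails on an empty sequence when no initial value is supplied. The function names and the shape of the cart items are hypothetical, chosen only for illustration.

```python
from functools import reduce

def cart_total_unseeded(items):
    # No initial value: reduce raises TypeError on an empty list, and the
    # first item itself becomes the accumulator, so non-empty carts are wrong too.
    return reduce(lambda acc, item: acc + item["price"], items)

def cart_total(items):
    # Seeding with 0 makes the empty cart a valid input that totals 0.
    return reduce(lambda acc, item: acc + item["price"], items, 0)

def crashes_on_empty():
    # Helper that reports whether the unseeded version fails on an empty cart.
    try:
        cart_total_unseeded([])
    except TypeError:
        return True
    return False
```

A test for the empty cart, like the one the commit message implies, is what keeps this class of bug from coming back.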
UniversalBench
300 new signups came in overnight. Add them to the customers table in my database.
On it.
UniversalBench
Connected to your Postgres
Validated 300 rows
Inserted into customers
Updated the daily total
Done. 300 rows are live in your database.
⚙ Opus 4.8 Thinking
customers · ● postgres
rows: 1,240 +300
id    email              plan  created
1541  ava@acme.co        pro   now
1542  noah@hey.io        team  now
1543  mia@studio.dev     pro   now
1544  leo@northwind.com  free  now
+296 more inserted
300 rows committed · transaction sealed
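A validate-then-insert flow like the one above can be sketched in a few lines. This is a minimal illustration, not UniversalBench's implementation: it uses Python's built-in `sqlite3` as a stand-in for Postgres, and the table layout, the allowed plan names, and the `insert_signups` helper are all assumptions.

```python
import sqlite3

ALLOWED_PLANS = {"free", "pro", "team"}  # assumed plan names

def insert_signups(conn, signups):
    """Validate every row first, then insert them all in one transaction."""
    valid = [(s["email"], s["plan"]) for s in signups
             if "@" in s.get("email", "") and s.get("plan") in ALLOWED_PLANS]
    if len(valid) != len(signups):
        raise ValueError("validation failed; nothing inserted")
    with conn:  # commits on success, rolls back if any insert raises
        conn.executemany(
            "INSERT INTO customers (email, plan) VALUES (?, ?)", valid)
    return len(valid)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT UNIQUE, plan TEXT)")
inserted = insert_signups(conn, [
    {"email": "ava@acme.co", "plan": "pro"},
    {"email": "noah@hey.io", "plan": "team"},
])
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Validating the whole batch before opening the transaction means a single bad row stops the run with nothing written, rather than leaving a partial import.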

UniversalBench moves computation, validation, and execution into a controlled runtime before results ever reach the model. Token reduction, safety enforcement, and verified accuracy all follow from this one architectural decision.

The real problem
Your AI is guessing.
UB makes it compute.

Every enterprise team, every funded startup, every developer paying for Claude or GPT-4 faces the same silent problem: LLMs hallucinate on data, run out of context on large inputs, and can't execute real code on their own. UniversalBench pre-computes every answer before it reaches your model. Facts. Not guesses.

Same prompt. Different path. Different outcome.
Prompt → Raw to LLM → Hallucinated answer (WRONG)
Prompt → UniversalBench → Computed answer (CORRECT)
UB pre-computes and sends only the result to your LLM. Up to 96.5% fewer tokens on data tasks.
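To make the two paths concrete, here is a minimal sketch of the difference between pasting raw rows into a prompt and sending a pre-computed result. The data, the function names, and the use of character count as a rough proxy for tokens are all assumptions for illustration; real token counts depend on the model's tokenizer.

```python
import json
import statistics

def raw_prompt(rows, question):
    # Baseline path: every row is pasted into the prompt for the model to read.
    return question + "\n" + json.dumps(rows)

def precomputed_prompt(rows, question):
    # Runtime path: compute first, then send only the small result.
    amounts = [r["amount"] for r in rows]
    summary = {
        "count": len(amounts),
        "total": sum(amounts),
        "median": statistics.median(amounts),
    }
    return question + "\nComputed result: " + json.dumps(summary)

# Illustrative data only: 500 synthetic rows.
rows = [{"id": i, "amount": 100 + i} for i in range(500)]
question = "What is the total and median amount?"
raw_len = len(raw_prompt(rows, question))
small_len = len(precomputed_prompt(rows, question))
```

The size of the second prompt stays roughly constant as the dataset grows, which is where the savings on bulk data tasks come from.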
Integration
Three steps.
Everything changes.
01

Sign up, get your key

30 seconds. No credit card. 1,000 free executions every month, enough to run a real benchmark against your current workflow.

ubk_a3f9c21e84d0...
02

Paste one URL into your AI

Claude, ChatGPT, Gemini, Cursor, or any MCP-compatible client. One URL in your integration settings. Two minutes of work.

universalbench-mcp.penantiaglobal.workers.dev/u/ubk_yourkey
03

Your AI gains real execution

Code runs. Data queries. Search verifies. LLM routing activates. Every result is pre-computed, so your model gets facts, not prompts.

web_search, code, db, github...
What becomes possible

Give Your AI
Real Infrastructure.

Connect one URL to your AI. It can execute code, query databases, search the web, call APIs, and process files before the answer reaches the model.

Building AI Coding Agents
Write code, run tests, validate syntax, and commit to GitHub. Every push is smoke-tested before it lands. Your agent ships working code, not broken patches.
Code execution
Building AI Research Systems
Search the live web, query databases, and run analysis scripts. Only the answer reaches the model, not thousands of raw rows. Up to 96.5% fewer tokens.
Web search + analysis
Building AI Automation Platforms
Trigger API calls, update records, and process files across your entire stack. One URL replaces custom integration for every tool your workflows already use.
API execution
These are the building blocks.
Not the building.

Every company connects UniversalBench for a different reason. The most valuable workflows are usually the ones nobody planned on day one.

Get free API key →
What UniversalBench does

Connect your tools, let your AI act on them.

UniversalBench connects your AI assistant to the tools and accounts you already use, so it can read your data and take real actions for you, only on your instruction. It works across many services.

As one example, you connect your own Google Search Console and Google Analytics, and your AI reads your search performance (queries, impressions, clicks, position) and your site traffic to diagnose issues and improve your site. At your instruction it can also submit sitemaps and request indexing.

We act only on accounts you own, only when your AI requests it. We never retain your Google data after returning it to your AI, and we never use it for advertising or to train any AI or ML model.

How it works

Two approaches to
AI tooling.

Exposing tools to the model and executing work inside a runtime are two different architectural choices. Here is what each one means in practice.

Traditional MCP Server | UniversalBench Runtime
Exposes tools for the AI to call and reason over | Executes Python, search, database, and API operations before returning results
Raw data and intermediate steps are often sent back to the model | Computation happens inside the runtime, returning only the final result
Model processes most of the workload inside the chat context | Heavy work is completed outside the model, reducing token usage by up to 96.5%
Individual tools and workflows are exposed through MCP interfaces | One runtime provides a unified execution layer across multiple capabilities
Safety depends largely on tool implementation and agent behavior | Runtime-enforced limits validate code, spending, and network access before execution
Teams build, host, monitor, and maintain MCP infrastructure | Connect a single MCP endpoint and use managed execution services

Traditional MCP servers expose tools to the model.
UniversalBench moves computation, validation, and execution into a controlled runtime before results ever reach the model.

Runtime Architecture

What runs
where.

Your AI sends one instruction. UniversalBench handles everything in between (auth, execution, and safety) before a single result token reaches the model.

Secrets Vault
Credentials stored encrypted in the Worker. Never passed to your AI, never logged.
Network Isolation
The Runtime cannot reach internal IPs or cloud metadata. Enforced at the network layer, not the prompt layer.
Execution Isolation
Every customer runs in its own sandboxed process. No shared state, no cross-customer access.
Your AI (Claude · GPT · Gemini · Any MCP AI)
→ UB Worker (Auth · Billing · Rate Limit)
→ UB Runtime (Python · Web · GitHub · DB · LLM)
→ Your Tools & Data (GitHub · DB · APIs · Stripe · Slack)
Free Setup Call

Get connected in 30 minutes.
I'll walk you through it.

We connect your AI assistant together, run a live execution, and make sure everything works. No technical knowledge needed.

Book a free setup call →

30 minutes  ·  Video call  ·  Free forever

Hard limits
Three hard limits every AI agent must obey
Giving AI real execution power should not require blind trust. UniversalBench enforces production safeguards before actions happen.
Code safety
AI never ships broken code
Every code push is validated before it lands.
Generated code ready → Syntax validation → Live URL smoke test → Deploy approved
  • Validation before commit
  • Optional smoke testing
  • Automatic rollback support
  • Production-safe deployments
Prevents invalid production changes
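As an illustration of the syntax-validation step, here is a minimal sketch of a pre-commit gate for Python files. It is an assumption about how such a check could look, not the UniversalBench implementation; `validate_python` and `gate_commit` are hypothetical names. It uses the standard library's `ast.parse`, which parses source text without executing it.

```python
import ast

def validate_python(source):
    """Return (ok, reason). Parsing only; the code is never executed."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return False, f"line {err.lineno}: {err.msg}"
    return True, "syntax ok"

def gate_commit(files):
    # files maps path -> source text; block the push on the first failure.
    for path, source in files.items():
        ok, reason = validate_python(source)
        if not ok:
            return {"approved": False, "file": path, "reason": reason}
    return {"approved": True}
```

A syntax check only proves the file parses. That is why the flow above pairs it with a smoke test against the live URL before a deploy is approved.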
Cost guardrails
AI never burns your wallet
Every model call is budget checked before execution.
LLM request incoming → Cost estimated first → Budget check → Allowed and executed
Default ceiling: $0.50 / request
Hard platform cap: $50.00 / request
Over budget: Rejected before run
Surprise invoices: Impossible
Enforced before tokens are spent
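A budget check of this kind can be sketched as a worst-case estimate followed by a comparison against the ceiling. The $0.50 default and $50.00 cap come from the text above; the estimation formula, the function names, and the `OverBudget` exception are assumptions for illustration.

```python
DEFAULT_CEILING_USD = 0.50   # default per-request ceiling from the text
PLATFORM_CAP_USD = 50.00     # hard platform cap from the text

class OverBudget(Exception):
    """Raised when a request's estimated cost exceeds the ceiling."""

def estimate_cost(input_tokens, max_output_tokens, in_price_per_m, out_price_per_m):
    # Worst case: assume the model uses its full output allowance.
    return (input_tokens * in_price_per_m
            + max_output_tokens * out_price_per_m) / 1_000_000

def check_budget(estimated_usd, ceiling_usd=DEFAULT_CEILING_USD):
    # A user-chosen ceiling can be raised, but never above the platform cap.
    ceiling = min(ceiling_usd, PLATFORM_CAP_USD)
    if estimated_usd > ceiling:
        raise OverBudget(
            f"estimated ${estimated_usd:.4f} exceeds ceiling ${ceiling:.2f}")
    return estimated_usd
```

Because the check runs on the estimate, a rejected request costs nothing: the model is never called.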
Network isolation
AI cannot reach your internal network
Every outbound request is inspected before execution.
AI agent makes HTTP call → Request intercepted → Private IP blocked → Public internet only
  • Private IP ranges blocked
  • Loopback addresses blocked
  • Link-local ranges blocked
  • Cloud metadata endpoints blocked
Protects internal systems by default
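The address classes listed above map directly onto checks in Python's standard `ipaddress` module. The sketch below shows the classification only; `is_blocked` is a hypothetical helper, and how UniversalBench enforces the rule at the network layer is not described in this page.

```python
import ipaddress

def is_blocked(ip_text):
    """True if the address is private, loopback, or link-local."""
    ip = ipaddress.ip_address(ip_text)
    # 169.254.169.254, the usual cloud metadata address, is link-local.
    return ip.is_private or ip.is_loopback or ip.is_link_local

checks = {
    "10.0.0.5": is_blocked("10.0.0.5"),                # private range
    "127.0.0.1": is_blocked("127.0.0.1"),              # loopback
    "169.254.169.254": is_blocked("169.254.169.254"),  # metadata endpoint
    "8.8.8.8": is_blocked("8.8.8.8"),                  # public address
}
```

A real guard would have to apply this check to the address a hostname actually resolves to at connection time, not only to literal IPs in the URL.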
Runtime architecture
Safety by runtime, not prompting
These protections are enforced by the runtime itself. They do not depend on prompt instructions, agent behavior, or model compliance.
AI Agent (Claude, ChatGPT, Gemini) → UniversalBench Runtime → Isolated Sandbox → External APIs & Web
Prompt-based guardrails | Runtime enforcement
AI asked not to break things | Broken deployments blocked before commit
AI asked not to overspend | Overspending is impossible
AI asked not to access internal systems | Internal systems unreachable by design
Secrets vault
Encrypted at rest. One-time setup. Auto-injected into every tool that needs them.
Network controls
Every outbound request inspected. Private IPs, loopback, metadata endpoints blocked.
Cost guardrails
Cost estimated before every LLM call. Requests over your ceiling rejected before tokens are spent.
Code validation
Syntax check, optional smoke test, and auto-rollback before any code reaches production.
Isolated execution
Each customer runs in a separate sandbox. No shared state, no cross-tenant access.
Verified Results
Three tests. Real Anthropic API.
Real tokens. Reproducible.

Run on 20 May 2026 against the live Anthropic API using claude-opus-4-7, the current Anthropic flagship. Test data is published below. The "true" answer in every test is elementary math, verifiable in Excel, R, bash grep, or any calculator. We invite you to reproduce these tests yourself.

Verifiable in Anthropic console · claude-opus-4-7 · 2026-05-20
TEST 01
Messy CSV Revenue Extraction
80 sales rows. Three different date formats. Missing and placeholder amounts. Question: total Q3 2025 revenue from EU customers.
Without UB: Opus 4.7 reasoned through every row and used 1,002 output tokens. It got the right answer ($30,111.05), but the long reasoning made the call expensive.
With UB: Python parsed and filtered. Sent Claude only the result. 34 output tokens. Correct.
Right answer, about 29x fewer output tokens
Input tokens, Opus 4.7
Without UB: 2,971
With UB: 101
Input token reduction: 96.6%
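The kind of parsing and filtering Test 01 describes can be sketched as follows. This is not the published test data or reproducer: the sample rows, column names, and the three date formats are assumptions made up for illustration.

```python
import csv
import io
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")  # assumed formats

def parse_date(text):
    # Try each known format in turn; unknown formats yield None.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def parse_amount(text):
    try:
        return float(text.replace("$", "").replace(",", ""))
    except ValueError:
        return None  # blank or a placeholder such as "N/A"

def q3_2025_eu_revenue(csv_text):
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        day, amount = parse_date(row["date"]), parse_amount(row["amount"])
        if day is None or amount is None or row["region"] != "EU":
            continue
        if day.year == 2025 and 7 <= day.month <= 9:
            total += amount
    return round(total, 2)

# Illustrative rows only, not the benchmark file.
SAMPLE = """date,region,amount
2025-07-14,EU,"1,200.50"
15/08/2025,EU,$300.00
"Sep 02, 2025",EU,N/A
2025-09-30,US,999.00
2025-10-01,EU,50.00
"""
```

On the sample, only the first two rows count: the third has a placeholder amount, the fourth is not an EU customer, and the fifth falls in Q4.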
TEST 02
Statistics over 200 Transactions
200 amounts. Question: how many are above $5,000, what is the median, what is the standard deviation.
Without UB: Opus 4.7 spent 5,000 output tokens summing line by line. Got to about row 50 of 200. Ran out of budget. Never returned an answer.
With UB: Python computed in milliseconds. 84 output tokens. All three numbers exact.
Task incomplete vs instantly correct
Output tokens spent
Without UB: 5,000 (hit budget)
With UB: 84
Outcome: never finished vs. done
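The three statistics in Test 02 are one-liners with Python's standard `statistics` module. The sketch below uses made-up amounts, and it assumes the sample standard deviation; the test description does not say whether sample or population deviation was used, so swap in `statistics.pstdev` if you need the population figure.

```python
import statistics

def summarize(amounts, threshold=5000):
    return {
        "above_threshold": sum(1 for a in amounts if a > threshold),
        "median": statistics.median(amounts),
        "stdev": statistics.stdev(amounts),  # sample standard deviation
    }

# Illustrative amounts only, not the benchmark data.
result = summarize([1000, 4000, 6000, 9000])
```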
TEST 03
Server Log Error Counting
500-line server log with realistic mix of INFO, WARN, ERROR. Question: how many ERROR-level lines.
Without UB: Opus 4.7 said 96. True answer is 80. Off by 16. Confident wrong number.
With UB: Python counted. 80. Correct.
20% wrong vs correct, about 420x fewer input tokens
Input tokens, Opus 4.7
Without UB: 20,226
With UB: 48
Input token reduction: 99.8%
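Counting log lines by level is the simplest of the three tasks. A minimal sketch follows, assuming the level appears as a bracketed tag such as `[ERROR]`; the sample log lines are invented for illustration.

```python
def count_level(log_text, level="ERROR"):
    # Count lines carrying the bracketed level tag, e.g. "[ERROR]".
    tag = f"[{level}]"
    return sum(1 for line in log_text.splitlines() if tag in line)

# Illustrative log only, not the benchmark file.
SAMPLE_LOG = """2026-05-20 10:00:01 [INFO] service started
2026-05-20 10:00:02 [ERROR] database timeout
2026-05-20 10:00:03 [WARN] retrying after error
2026-05-20 10:00:04 [ERROR] retry failed
"""
error_count = count_level(SAMPLE_LOG)
```

Matching on the bracketed tag rather than the bare word keeps a WARN line that merely mentions "error" out of the count.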
Methodology, plain and verifiable

Model: claude-opus-4-7. Anthropic API direct. Input pricing $15 per million tokens, output $75 per million tokens. Each "with UB" call sends a Python-computed answer to Claude instead of raw data. Token counts pulled from the Anthropic API response usage field.

The "true" answer in every test is elementary mathematics, not a Python opinion. Sum, count, median, standard deviation, and "lines containing [ERROR]" are unambiguous mathematical operations. They produce the same result in Excel, R, MATLAB, bash grep, or any pocket calculator. Test data and reproducer recipes are saved in our published reproducibility receipts. Run the math in whatever tool you trust most. The number will match.

Token reduction depends on the workflow. Small queries save less. Bulk data tasks like the three above save 95% to 99.8%. The "up to 96.5%" claim is a conservative reference from our original public test. At Opus pricing, Test 03 alone saves roughly $40,000 per year at 1,000 queries per day, or up to $400,000 at 10,000 queries per day. Customers pay UB $0.008 per call, so the math is verifiable for their own volume.
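To check the arithmetic for your own volume, the reduction percentages and an annual savings estimate can be computed as below. This is a sketch under stated assumptions: it counts input-token savings only, subtracts the $0.008 per-call fee, and takes the per-million-token price as a parameter because the annual figure is very sensitive to it. The function names are hypothetical.

```python
def reduction_pct(tokens_before, tokens_after):
    # Percentage of tokens removed, rounded to one decimal place.
    return round(100 * (tokens_before - tokens_after) / tokens_before, 1)

def annual_input_savings(tokens_before, tokens_after, price_per_m_usd,
                         queries_per_day, ub_fee_per_call=0.008):
    # Input-token savings per query, net of the per-call fee, over 365 days.
    saved_per_query = (tokens_before - tokens_after) * price_per_m_usd / 1_000_000
    return (saved_per_query - ub_fee_per_call) * queries_per_day * 365

test_01 = reduction_pct(2971, 101)    # input tokens reported for Test 01
test_03 = reduction_pct(20226, 48)    # input tokens reported for Test 03
```

Pass the input price from your own invoice and your own daily query count to `annual_input_savings` to get a figure that applies to your workload.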

Common questions
The things people ask first.
Which AIs work with UniversalBench?

Claude, ChatGPT, Gemini, Cursor, and any MCP-compatible client. One URL works across all of them. Switching AI providers does not require re-integrating UB.

Is the 96.5% token reduction real?

Yes, on bulk data tasks. We have measured reductions of 95% to 99.8% across the three live tests above, all run against the current Anthropic flagship and reproducible from published data. Tasks that send small inputs to your AI will not see this scale of saving. The big savings show up when your AI is reading large datasets, log files, CSVs, or anything where Python can pre-process and send a one-number answer.

What happens after the 1,000 free calls?

You stop, or you top up your wallet from $5 and keep going at $0.008 per call. Credits roll over. No subscription, no auto-renewal, no surprise charges.
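The pay-as-you-go arithmetic is small enough to sketch directly. The 1,000 free calls and the $0.008 price come from the text; the helper names are hypothetical, and web search and LLM routing charges, which are billed separately, are left out.

```python
FREE_CALLS_PER_MONTH = 1000
PRICE_PER_CALL_USD = 0.008

def monthly_charge(calls):
    # Only calls beyond the free monthly allowance are billed.
    paid_calls = max(0, calls - FREE_CALLS_PER_MONTH)
    return round(paid_calls * PRICE_PER_CALL_USD, 2)

def paid_calls_for(top_up_usd):
    # How many paid calls a wallet top-up covers.
    return int(round(top_up_usd / PRICE_PER_CALL_USD))
```

For example, the minimum $5 top-up covers 625 paid calls on top of the free allowance.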

Is my data sent to other AI providers?

Only what your AI explicitly asks UniversalBench to send. Credentials you save go into an encrypted vault scoped to your account. Your data is not used for training. Your AI cannot reach your internal network by default.

What if a validation blocks something I want to push?

You see the exact reason. You fix it and try again. The validation is mandatory because that is what makes the safety claim real, but the error is always visible and actionable.

Can I raise the cost ceiling on LLM calls?

Yes. The default ceiling is $0.50 per call. You can raise it up to $50. The cap stays on. You control its size.

How do I cancel?

Stop topping up. There is no subscription. Unused credits get refunded if you ask.

Pricing
Start free.
Pay only for what runs.
All prices in USD
Web Search
$0.01 / search
100 searches/month free

Live results from the web, cited and structured. Billed per query, separate from your execution credits.

LLM Routing
from $0.0001 / 1K tokens
billed per token, per model
LLM Pricing
Per 1M tokens. Your AI picks the model per task.
Model | Input /1M | Output /1M
Database
Free
with your credentials

Connect any PostgreSQL-compatible database once via your vault. Read, write and search from any tool.

Hosted database
Coming soon
🛡️
Zero-risk guarantee: sign up, use it, and if you don't see measurable ROI, we refund you. No questions, no forms. That's how confident we are in the numbers.
Three promises. One URL.
AI that never ships broken code,
never burns your wallet,
never reaches your internal network.

Built into every call by default. One URL into any MCP-compatible AI. Free to start.

Get your free API key

No credit card. 1,000 free calls every month.

Get your
free API key.

1,000 free executions/month. No credit card. Under 2 minutes to connect.

Free forever · No credit card · Cancel anytime
YOUR PERSONAL MCP URL (ENCRYPTED · PRIVATE)
universalbench-mcp.penantiaglobal.workers.dev/u/ubk_•••••••••••••••••••••••
🔒 Your key is embedded in this URL and masked by default. Paste the whole URL into any MCP client (Claude Desktop, Cursor, etc.) in one step. No separate header configuration needed.
YOUR API KEY (RAW KEY · KEEP SECRET)
ubk_•••••••••••••••••••••••••••••••
🛡 Use this when an MCP host asks specifically for an API key, not a URL. Most clients (Claude Desktop, Cursor) want the URL above instead. Rotate the key if you suspect exposure; rotating invalidates the current URL and key.
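Because the key is part of the URL path, building and masking the URL are simple string operations. The sketch below assumes the endpoint is served over HTTPS (the page shows only the host and path) and that keys always start with `ubk_`; both helper names are hypothetical.

```python
# Assumption: HTTPS scheme; the source shows the host and path only.
BASE_URL = "https://universalbench-mcp.penantiaglobal.workers.dev/u/"

def mcp_url(api_key):
    # The personal MCP URL is the base path followed by the raw key.
    if not api_key.startswith("ubk_"):
        raise ValueError("expected a key starting with 'ubk_'")
    return BASE_URL + api_key

def masked(api_key, visible=4):
    # Keep only the prefix readable and replace the rest with bullets.
    return api_key[:visible] + "•" * (len(api_key) - visible)
```

Anyone who holds the full URL also holds the key, so treat the URL itself as a secret.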
Billing & Usage
Wallet: $0.00 · 0 paid calls available
Free this month: resets first of month
Top up wallet
Top up from $5 to $500 per transaction. All amounts in USD. Paid call count shown above updates after payment.
$0.008 USD per execution after your 1,000 free calls each month. No subscription. Funds never expire.
Executions: 0 (1,000 free this month)
Tokens saved: run your first call
Charged this month: $0.00 (free tier first, then $0.008 per call)
Top tool: no calls yet
Executions, last 14 days (chart: Executions · Savings)
Your tools
Web Search: Live web search from your prompts. (OFF)
Code Execution: Python and Bash. (ACTIVE)
LLM Routing: Route prompts to other LLMs from within UB. (OFF)
Database: Add your database credentials. (CONFIGURE)
GitHub: Add a GitHub access token. (CONFIGURE)
Email: Coming soon. (SOON)
Quick start: connect Claude in 2 minutes
1. Open Claude → Settings → Integrations → Add MCP server
2. Click Copy URL above and paste it as the MCP server URL. Your key is already embedded, so there is no separate field to fill.
3. Ask Claude: "search the web for latest AI infrastructure news". It works immediately.