Is this an LLM judging another LLM?

No. NumProof uses exact rational arithmetic and symbolic math (and optional Lean 4 proofs). There is no model in the verdict path, so the same input always yields the same verdict.

Why a signed receipt instead of just recomputing the number?

A counterparty needs an independent, named attestation it can re-check offline. Recomputing a number yourself is not that. NumProof returns a signed Verification Receipt; an open-source re-checker recovers the signer and independently re-derives the verdict with commodity libraries — zero trust in NumProof required.

What can NumProof verify?

Arithmetic, algebraic identities, spreadsheet footing and ties, ratios and covenant rules, and sequence formulas. For anything it cannot decide exactly, it ABSTAINs rather than guessing.

Your AI did the math. Would you stake money on it — without checking?

An AI agent will tell you "gross margin improved to 44.8%" or "this invoice totals $48,200" with total confidence. Often it's right. Often enough it isn't — and in an agent workflow something downstream acts on that number: it releases a payment, files a report, triggers a trade. Confidence is not correctness.

"Just have another model check it" doesn't fix this

Asking a second LLM to grade the first one fails for three reasons. It's non-deterministic — the same claim can pass one minute and fail the next. It fails in correlated ways — models tend to be confidently wrong about the same things. And most importantly, a model checking a model is not an independent, accountable attestor. "We asked GPT and it said the number was fine" is not something a counterparty, an auditor, or a finance team will accept.

What NumProof does

NumProof verifies a numeric or financial claim deterministically: exact rational arithmetic and symbolic math (and optional Lean 4 proofs), no model in the loop. You get one of three answers:

VERIFY — the claim holds, exactly.
REFUTE — it's false, with a concrete counterexample.
ABSTAIN — it isn't exactly decidable, so NumProof says so instead of guessing.

Same input, same verdict, every time — with cell- and formula-level provenance when you hand it a spreadsheet.

# BASE holds the API base URL
curl -s "$BASE/verify" -H "Content-Type: application/json" \
  -d '{"claim":"gross margin is 60% when gross profit is 600 and revenue is 1000"}'
# -> {"verdict":"VERIFY", ...}
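To make the three verdicts concrete, here is a minimal sketch of an exact ratio check using only Python's standard-library Fraction. It illustrates the idea and is not NumProof's implementation: the function name check_ratio_claim and the choice to pass numbers as strings are assumptions made for this example.

```python
from fractions import Fraction

def check_ratio_claim(claimed, numerator, denominator):
    """Return 'VERIFY', 'REFUTE', or 'ABSTAIN' for: claimed == numerator / denominator.

    Hypothetical helper for illustration. Inputs are strings so that no
    float rounding creeps in before the exact comparison.
    """
    try:
        claimed_q = Fraction(claimed)
        num_q = Fraction(numerator)
        den_q = Fraction(denominator)
    except (ValueError, ZeroDivisionError):
        # Input that cannot be read as an exact rational is not decidable here.
        return "ABSTAIN"
    if den_q == 0:
        # The ratio is undefined, so there is nothing exact to compare against.
        return "ABSTAIN"
    return "VERIFY" if num_q / den_q == claimed_q else "REFUTE"

print(check_ratio_claim("0.60", "600", "1000"))   # VERIFY
print(check_ratio_claim("0.448", "600", "1000"))  # REFUTE
print(check_ratio_claim("0.60", "600", "0"))      # ABSTAIN
```

Because Fraction compares exact rationals, 600/1000 and 0.60 both reduce to 3/5 and the comparison never depends on floating-point rounding.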

The part that actually matters: a receipt you can re-check yourself

Every verdict can come with a signed Verification Receipt. A second party runs the open-source re-checker — numproof-verify — which recovers the signer and independently re-derives the verdict using commodity libraries (stdlib Fraction + sympy). No trust in NumProof required.
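The re-derivation half of that re-check can be sketched in a few lines of stdlib Python. This is an illustration under stated assumptions, not the numproof-verify source: the receipt fields ("inputs", "verdict") are hypothetical, since the real receipt schema is not described here, and signer recovery is deliberately left out.

```python
from fractions import Fraction

def rederive_verdict(receipt):
    """Recompute the verdict for a 'ratio equals value' claim from its inputs."""
    inputs = receipt["inputs"]
    actual = Fraction(inputs["numerator"]) / Fraction(inputs["denominator"])
    return "VERIFY" if actual == Fraction(inputs["claimed"]) else "REFUTE"

def recheck(receipt):
    """True only if the independently recomputed verdict matches the receipt's."""
    return rederive_verdict(receipt) == receipt["verdict"]

# Hypothetical receipt payload, assumed for this example only.
receipt = {
    "inputs": {"claimed": "0.60", "numerator": "600", "denominator": "1000"},
    "verdict": "VERIFY",
}
print(recheck(receipt))   # True

# A receipt whose stated verdict disagrees with the math fails the re-check.
tampered = dict(receipt, verdict="REFUTE")
print(recheck(tampered))  # False
```

The point of the sketch is that the second party never asks the issuer whether the verdict is right; it recomputes the answer from the stated inputs and compares.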

An agent can compute a number for itself. What it cannot do is issue an independent, signed attestation that a counterparty will accept. That asymmetry — not the arithmetic — is the product.

Where this fits

Agent-to-agent payments: release the USDC only when the number verifies; withhold on REFUTE.
Spreadsheet & report audits: footing, cross-foot, balance-sheet ties, margins — with provenance.
Covenant & ratio checks: DSCR, Debt/EBITDA, current ratio, against a rule pack.
Inside AI products: verify numbers before your app ships them to a user.
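As a rough illustration of the covenant and payment cases above, the sketch below checks a ratio against a rule and gates a payment on the verdict. Everything named here is an assumption for the example: the RULE_PACK layout, the thresholds (1.25 and 3.5), and the choice to withhold on ABSTAIN as well as on REFUTE are not taken from NumProof.

```python
from fractions import Fraction

# Hypothetical rule pack: each covenant has a comparison and an exact threshold.
RULE_PACK = {
    "DSCR": {"op": ">=", "threshold": Fraction("1.25")},
    "Debt/EBITDA": {"op": "<=", "threshold": Fraction("3.5")},
}

def check_covenant(name, numerator, denominator):
    """Return 'VERIFY', 'REFUTE', or 'ABSTAIN' for one covenant ratio."""
    rule = RULE_PACK.get(name)
    if rule is None:
        return "ABSTAIN"  # no rule to test against
    den = Fraction(denominator)
    if den == 0:
        return "ABSTAIN"  # ratio is undefined
    ratio = Fraction(numerator) / den
    if rule["op"] == ">=":
        ok = ratio >= rule["threshold"]
    else:
        ok = ratio <= rule["threshold"]
    return "VERIFY" if ok else "REFUTE"

def should_release_payment(verdict):
    """Release only on VERIFY; this sketch also withholds on ABSTAIN (an assumption)."""
    return verdict == "VERIFY"

# DSCR = net operating income / debt service, exactly at the 1.25 threshold.
print(check_covenant("DSCR", "500000", "400000"))    # VERIFY
print(check_covenant("DSCR", "499999", "400000"))    # REFUTE
print(check_covenant("Debt/EBITDA", "3600", "1000")) # REFUTE
print(should_release_payment(check_covenant("DSCR", "500000", "400000")))  # True
```

The boundary case is where exactness matters: 500000/400000 is exactly 5/4, so it passes a 1.25 floor with no tolerance to tune, and a ratio one dollar short fails.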

Try it

Run the live demo (no key) · Re-check a receipt yourself

API · CLI · MCP server · x402 pay-per-call (USDC on Base). Building a finance or agent workflow? Request a pilot — or email support@numproof.com.