June 5, 2026 / 3 min read

You can't measure Tokens per Outcome without the right denominator

AI token spend needs a better denominator than requests, tickets, or merged PRs: the business decisions those tokens helped move.

tokens per outcome, AI token spend, agentic AI governance

Donella Meadows's Thinking in Systems gives teams two useful feedback loops: reinforcing and balancing.

A reinforcing loop is the engine. Output feeds back as input, so each cycle amplifies the direction of change.

A balancing loop is the governor. It measures the gap between where the system is and a target, then pushes back, like a thermostat or the spinning brass governor on a steam engine that throttles steam flow as speed rises.
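To make the contrast concrete, here is a minimal sketch of both loop types. The function names, the 50% gain, and the 50% correction rate are illustrative assumptions, not figures from Meadows.

```python
def reinforcing_loop(start: float, gain: float, steps: int) -> list[float]:
    """Each cycle feeds the output back as input, so the value compounds."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] * (1 + gain))
    return values


def balancing_loop(start: float, target: float, correction: float, steps: int) -> list[float]:
    """Each cycle measures the gap to the target and closes a fraction of it."""
    values = [start]
    for _ in range(steps):
        gap = target - values[-1]
        values.append(values[-1] + correction * gap)
    return values


growth = reinforcing_loop(start=100.0, gain=0.5, steps=3)
settle = balancing_loop(start=100.0, target=20.0, correction=0.5, steps=3)
print(growth)  # [100.0, 150.0, 225.0, 337.5]
print(settle)  # [100.0, 60.0, 40.0, 30.0]
```

The first series keeps accelerating away from where it started. The second converges on its target, and where it settles depends entirely on which target was chosen.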

Right now, three such engines are driving your AI usage bill.

Three engines are running your AI bill

Engine one: agentic compounding. Agents spawn sub-calls. Every turn re-sends a growing context window. Cost rises with task depth. Agent output becomes another agent's input, creating multiple autonomous loops that compound cost.
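A rough sketch of why cost rises with task depth, assuming every turn re-sends the full context and nothing is cached or summarized. The token counts and the helper name `cumulative_input_tokens` are hypothetical; real billing depends on the provider and on any prompt caching in use.

```python
def cumulative_input_tokens(turns: int, base_context: int, tokens_added_per_turn: int) -> int:
    """Total input tokens billed when each turn re-sends the whole context so far."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context                   # the full context is sent again on this turn
        context += tokens_added_per_turn   # outputs and tool results extend the context
    return total


shallow = cumulative_input_tokens(turns=10, base_context=2_000, tokens_added_per_turn=1_000)
deep = cumulative_input_tokens(turns=20, base_context=2_000, tokens_added_per_turn=1_000)
print(shallow, deep)  # 65000 230000
```

Under these assumptions, doubling the number of turns more than triples the input tokens, because the re-sent context grows on every turn.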

Engine two: the budget loop. Successful AI features attract more budget, staffing, and executive attention. That funds more AI features and normalizes more token use over time.

Engine three: Jevons paradox. As tokens get cheaper or more efficient, new use cases become viable. If demand expands faster than unit cost falls, total token consumption, and possibly total spend, can rise.
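A small arithmetic sketch of that condition, using invented prices and volumes purely for illustration: the unit price halves while usage triples.

```python
def total_spend(price_per_million_tokens: float, tokens_in_millions: float) -> float:
    """Spend is unit price times volume; both numbers here are made up."""
    return price_per_million_tokens * tokens_in_millions


before = total_spend(price_per_million_tokens=10.0, tokens_in_millions=500.0)
after = total_spend(price_per_million_tokens=5.0, tokens_in_millions=1_500.0)
print(before, after)  # 5000.0 7500.0
```

Tokens became 50% cheaper, yet the bill grew by 50%. Had usage merely doubled, spend would have stayed flat. The paradox only applies when demand grows faster than the price falls.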

The broader measure

Meadows's rule of thumb is that reinforcing engines amplify until something stops them. In June 2026, TechCrunch reported that Uber had exhausted its annual AI-coding-tool budget in the first four months of the year, then set a $1,500 monthly token-spending cap per employee per agentic coding tool, with an exception process.

The issue is not that caps are irrational. The issue is that caps become the setpoint. They curb creativity exactly where companies need exploration.

The better setpoint is Tokens per Accepted Outcome. But most teams do not know how to set and measure outcomes across human and AI-agent teams.

You frame the question. The agent answers inside your frame. The agreement reads as independent confirmation. The frames narrow.

A dashboard that says tokens per request completed, tokens per ticket closed, or tokens per PR merged will paint a narrow picture. It may even reward the wrong behavior.

"Pay attention to what is important, not just what is quantifiable." - Donella Meadows, Dancing with Systems

The wrong denominator is an easy proxy that stands in for the real purpose.

The missing denominator

An example from Thinking in Systems, consistent with behavioral economics, points toward the solution. In a group of otherwise identical Dutch houses, the homes with the electricity meter in the front hall used roughly a third less power than the homes with the meter in the basement.

A visible number, in the right unit, changed behavior by itself.

So the real question is not only: How many tokens did this take?

The real question is: What key decisions did those tokens help make, and did those decisions move the business?

That is the missing denominator.

Tokens per request tells you the cost of interaction. Tokens per ticket tells you the cost of throughput. Tokens per PR tells you the cost of shipped artifacts.

Tokens per Outcome asks whether the spend moved a decision that mattered.

That number is harder to instrument. It requires teams to define accepted outcomes, connect agent work to the decisions behind those outcomes, and inspect whether those decisions changed customer, revenue, risk, or operating behavior.
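One way to picture that instrumentation is the sketch below. It assumes each agent run can be tagged with the decision it supported and that the team keeps its own list of accepted outcomes. The `AgentRun` record, the decision names, and the token counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRun:
    tokens: int
    merged_pr: bool
    decision_id: Optional[str]  # hypothetical link to the decision this work supported


# Hypothetical: decisions the team reviewed and accepted as real outcomes.
accepted_decisions = {"pricing-change"}

runs = [
    AgentRun(tokens=40_000, merged_pr=True, decision_id="pricing-change"),
    AgentRun(tokens=60_000, merged_pr=True, decision_id=None),
    AgentRun(tokens=20_000, merged_pr=True, decision_id="abandoned-experiment"),
    AgentRun(tokens=80_000, merged_pr=False, decision_id="pricing-change"),
]

total_tokens = sum(run.tokens for run in runs)
merged_prs = sum(1 for run in runs if run.merged_pr)
# Count distinct accepted decisions that at least one run contributed to.
accepted_outcomes = len({run.decision_id for run in runs} & accepted_decisions)

tokens_per_pr = total_tokens / merged_prs
# Avoid dividing by zero when no outcome has been accepted yet.
tokens_per_accepted_outcome = total_tokens / accepted_outcomes if accepted_outcomes else None
print(round(tokens_per_pr), tokens_per_accepted_outcome)  # 66667 200000.0
```

In this toy data, three merged PRs make the spend look efficient, while only one accepted decision sits behind all 200,000 tokens. Same spend, different denominator, different story.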

But that is the work. Without the right denominator, AI finance turns into throttling. With the right denominator, token spend becomes a signal for business learning.

Bring decision intelligence into your product judgement.

Ask The W is opening enterprise access for teams coordinating strategy, execution, and AI-agent work.
