AI Unit Economics: Burn Rate & Technical Insolvency

You are not building software anymore. You are operating a metered intelligence system where every interaction carries cost, every request consumes compute, and most teams cannot tell you what that cost actually is. The problem is not a lack of surface-level visibility into spend. The problem is that the system generating the cost is neither understood nor controlled.

The Structural Shift Most Teams Underestimate

For two decades, SaaS operated on a stable model:

build once
scale users
marginal cost approaches zero

That model produced high-margin businesses because cost did not scale with usage in any meaningful way.

AI changes that completely. Your system now operates with:

variable cost per interaction
expanding compute per workflow
non-linear cost scaling as complexity increases

If you attach this to a fixed pricing model, your margin compresses as usage grows. This is not a strategy problem. It is a structural constraint.
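A minimal sketch of that compression, assuming a flat monthly price per seat and a constant cost per interaction. The function name and every number here are hypothetical, chosen only to show the shape of the curve.

```python
def gross_margin(price_per_seat: float, interactions: int, cost_per_interaction: float) -> float:
    """Gross margin fraction for one seat over one billing period."""
    compute_cost = interactions * cost_per_interaction
    return (price_per_seat - compute_cost) / price_per_seat


# Hypothetical inputs: $50 flat monthly price, $0.04 of compute per interaction.
for interactions in (100, 500, 1000, 1500):
    margin = gross_margin(50.0, interactions, 0.04)
    print(f"{interactions:>5} interactions -> margin {margin:.0%}")
```

Under these assumed inputs the price never moves, so every additional interaction comes straight out of margin, and past 1,250 interactions per seat the margin is negative.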

The Two Cost Systems You Are Running

Most teams measure what is visible:

API calls
token usage
infrastructure spend

This is incomplete.

There is a second system operating underneath:

System-driven cost:

retries caused by low-confidence outputs
prompt expansion to enforce reliability
repeated context loading across interactions
multi-step orchestration chains
fallback model execution

This cost is not driven by user demand. It is driven by system behavior. It scales faster than user demand and is significantly harder to control.
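One way to make the second system visible is to tag every model call with the reason it happened, then total the two buckets separately. This is a sketch under assumptions: the reason labels and the record shape below are hypothetical, not a standard schema.

```python
from collections import defaultdict

# Hypothetical reason labels for calls the system made on its own initiative.
SYSTEM_DRIVEN_REASONS = {
    "retry",
    "prompt_expansion",
    "context_reload",
    "orchestration_step",
    "fallback",
}


def split_cost(calls):
    """Total cost by driver. Each call is a (reason, cost_usd) pair."""
    totals = defaultdict(float)
    for reason, cost_usd in calls:
        bucket = "system" if reason in SYSTEM_DRIVEN_REASONS else "user"
        totals[bucket] += cost_usd
    return dict(totals)


# Hypothetical log: one user request that triggered a retry and a fallback.
calls = [("user_request", 0.010), ("retry", 0.010), ("fallback", 0.030)]
print(split_cost(calls))
```

In this made-up log the user asked for one thing, and the system spent four times the user-driven cost getting there.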

Cost of Predictivity

To make AI usable, teams attempt to force consistency onto a probabilistic system.

That effort introduces a new cost layer:

Cost of Predictivity = Total system cost required to achieve acceptable reliability

It is driven by:

additional tokens for instruction clarity
repeated calls for validation
layered prompts for consistency
guardrails that increase compute overhead

Cost does not scale with usage alone. It scales with the level of reliability your system requires.
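A rough per-interaction breakdown, assuming the four drivers above can be priced separately. The parameter names and figures are hypothetical; the point is only that the reliability layer can cost more than the bare call it wraps.

```python
def cost_of_predictivity(
    base_call_cost: float,
    instruction_overhead: float,
    validation_calls: int,
    cost_per_validation: float,
    guardrail_cost: float,
):
    """Return (total cost per interaction, share of it spent on reliability)."""
    reliability_cost = (
        instruction_overhead
        + validation_calls * cost_per_validation
        + guardrail_cost
    )
    total = base_call_cost + reliability_cost
    return total, reliability_cost / total


# Hypothetical figures in USD per interaction.
total, share = cost_of_predictivity(0.010, 0.004, 2, 0.003, 0.002)
print(f"total ${total:.3f}, reliability share {share:.0%}")
```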

Why Cost Compounds Instead of Scaling

Uncontrolled systems follow a consistent pattern:

Output variability creates rework
Rework increases compute usage
Increased compute introduces latency
Latency triggers retries
Retries compound cost

This creates a feedback loop. Cost does not grow linearly. It compounds.
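The non-linearity shows up even in the simplest retry model. Assume, for illustration only, that each attempt fails independently with the same probability and the system retries until it succeeds. Expected attempts then follow a geometric series, 1 / (1 - failure rate). Real systems are messier, but the curve has the same shape.

```python
def expected_attempts(failure_rate: float) -> float:
    """Expected calls per successful output under unlimited, independent retries."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


# Doubling the failure rate more than doubles the retry overhead.
for rate in (0.1, 0.2, 0.4, 0.8):
    print(f"failure rate {rate:.0%} -> {expected_attempts(rate):.2f} calls per success")
```

If added latency also pushes the failure rate up (timeouts, for example), each step along this curve feeds the next one.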

The Real Problem: No Control System

The issue is not the model. The issue is the absence of enforced constraints.

Most systems lack:

bounded execution rules
deterministic output formats
pre-execution validation
cost-aware orchestration

Without these, the system expands until it breaks the economics it operates within.
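Two of those constraints, a deterministic output format and a bounded execution rule, can be sketched in a few lines. Everything named here is an assumption: `generate` stands in for whatever function returns raw model text, and the required keys are a made-up schema.

```python
import json

# Hypothetical schema: the only output shape the system will accept.
REQUIRED_KEYS = {"intent", "confidence"}


def parse_structured(raw: str):
    """Return the parsed object if it matches the schema, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


def call_with_bounds(generate, max_attempts: int = 2):
    """Call `generate` at most `max_attempts` times; never retry past the bound."""
    for attempt in range(1, max_attempts + 1):
        parsed = parse_structured(generate())
        if parsed is not None:
            return parsed, attempt
    return None, max_attempts
```

The worst-case cost of this step is now known before it runs: `max_attempts` calls, no more.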

The Only Architecture That Holds

To operate AI systems at scale, you need enforced control layers:

Persistent Memory: eliminates redundant context injection and reduces repeated compute
Structured Inference: constrains outputs and reduces variability
Admissibility Controls: validate actions before execution and block invalid operations
Accountability Layer: records every action for auditability and system correction

Without these, you are not running a system. You are running an experiment.
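A minimal sketch of two of those layers working together, admissibility checks and an accountability log, with a spend ceiling added as an assumed extra. The class and method names are hypothetical, and `action` and `is_admissible` are placeholder callables supplied by the caller.

```python
import time


class BudgetExceeded(Exception):
    """Raised when an action would push spend past the configured ceiling."""


class ControlledExecutor:
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0
        self.audit_log = []  # accountability: one record per decision

    def execute(self, name, action, estimated_cost_usd, is_admissible):
        # Admissibility: validate before spending anything.
        if not is_admissible():
            self._record(name, "rejected", 0.0)
            return None
        # Execution boundary: refuse work that would break the budget.
        if self.spent_usd + estimated_cost_usd > self.max_cost_usd:
            self._record(name, "over_budget", 0.0)
            raise BudgetExceeded(name)
        result = action()
        self.spent_usd += estimated_cost_usd
        self._record(name, "executed", estimated_cost_usd)
        return result

    def _record(self, name, outcome, cost_usd):
        self.audit_log.append(
            {"ts": time.time(), "action": name, "outcome": outcome, "cost_usd": cost_usd}
        )
```

Rejected and over-budget actions cost nothing and still leave a record, which is what makes later correction possible.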

The Metric That Forces Reality

You need a single metric that reflects cost, scale, and reliability:

Technical Insolvency Date

Defined as the point where:

cost per interaction exceeds recoverable value
the system can no longer sustain its own economics
correction requires structural change, not optimization

This is driven by:

cost per interaction
interaction volume
system inefficiency
pricing constraints
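A simple projection of that date, assuming a flat price per user, usage per user that grows at a steady monthly rate, and cost per interaction that drifts upward as inefficiency accumulates. The function name, the growth model, and the numbers are all hypothetical. Under a flat price, monthly cost per user exceeding the price is the same condition as cost per interaction exceeding the value recovered per interaction.

```python
def technical_insolvency_month(
    price_per_user: float,
    interactions_per_user: float,
    cost_per_interaction: float,
    monthly_usage_growth: float,
    monthly_cost_growth: float,
    horizon_months: int = 120,
):
    """First month (0 = today) in which cost per user exceeds price, or None."""
    for month in range(horizon_months + 1):
        cost_per_user = interactions_per_user * cost_per_interaction
        if cost_per_user > price_per_user:
            return month
        interactions_per_user *= 1.0 + monthly_usage_growth
        cost_per_interaction *= 1.0 + monthly_cost_growth
    return None  # no crossover inside the horizon


# Hypothetical inputs: $50 plan, 400 interactions/month at $0.04 each,
# usage growing 5% a month, cost per interaction growing 3% a month.
print(technical_insolvency_month(50.0, 400, 0.04, 0.05, 0.03))
```

With these assumed inputs the account starts at $16 of cost against $50 of revenue and still crosses over in a little more than a year.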

What Most Teams Get Wrong

Failure does not appear immediately. It accumulates.

It looks like:

increasing engagement
growing usage
expanding feature sets

While underneath:

cost per interaction rises
system inefficiency compounds
margin compresses

By the time it appears in financial reporting, it is already structural.

What To Do Now

You do not need more features. You need control.

Start here:

quantify cost per interaction
isolate system-driven cost vs user-driven cost
measure retries and failure-induced compute
map cost to revenue per interaction
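These four measurements can come from the same call log. The sketch below assumes each logged call records an interaction id, token counts, and a retry flag; the field names and the per-token prices are hypothetical placeholders, so substitute your provider's actual rates.

```python
from collections import defaultdict

# Hypothetical prices in USD per token.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000


def call_cost(call) -> float:
    return (
        call["input_tokens"] * PRICE_PER_INPUT_TOKEN
        + call["output_tokens"] * PRICE_PER_OUTPUT_TOKEN
    )


def summarize(calls, revenue_per_interaction: float):
    """Cost per interaction, share of spend caused by retries, and margin."""
    if not calls:
        raise ValueError("no calls to summarize")
    per_interaction = defaultdict(float)
    retry_cost = 0.0
    for call in calls:
        cost = call_cost(call)
        per_interaction[call["interaction_id"]] += cost
        if call["is_retry"]:
            retry_cost += cost
    total = sum(per_interaction.values())
    average = total / len(per_interaction)
    return {
        "cost_per_interaction": average,
        "retry_share": retry_cost / total,
        "margin_per_interaction": revenue_per_interaction - average,
    }
```

A negative `margin_per_interaction` means each interaction costs more than it recovers at today's numbers.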

Then:

introduce control layers
constrain system behavior
enforce execution boundaries

Action

Run your numbers: https://www.richardewing.io/tools/pdi/

Identify your Technical Insolvency Date. Then decide whether you are operating a product or a liability.

Richard Ewing
The Product Economist
Operator of The AI Economist
