AI was supposed to make software engineering cheaper. Instead, a fresh batch of numbers suggests many companies are paying twice: once for the model output and again to clean up after it. Efficiency, sure.

Data cited this week shows as much as 82% of enterprise AI engineering spend is getting burned before code reaches usable production systems. The losses are concentrated in three buckets: bug fixing, rewrites of AI-generated code, and the review delays that pile up when teams have to verify what the machine produced. That is less "move fast" and more "pay later, repeatedly." [1]

The numbers behind the waste

The sharpest figure comes from Entelligence AI, which surveyed 2,444 companies on the economics of AI-assisted engineering. Its breakdown is blunt: for every $1 spent on AI tokens, about $0.44 goes to fixing bugs, $0.27 is lost rewriting generated code, and another $0.11 disappears into review and merge delays. [2]

That adds up to 82 cents on the dollar not translating into production-ready output. The core point is not that AI tools are useless. It is that the gross cost of generating code is only the opening invoice. The expensive part comes when engineers have to test, correct, refactor, and finally approve work that looked fast at the prompt stage.
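The 82-cent figure is simple arithmetic, and it is easy to check. The minimal Python sketch below uses the per-dollar amounts from the Entelligence AI breakdown and works in whole cents to avoid floating-point rounding. The bucket names and the helper function are labels chosen for this illustration. They are not the survey's own terminology.

```python
# Per-dollar loss buckets from the Entelligence AI breakdown, in cents.
# The dictionary keys are illustrative labels, not the survey's field names.
LOSSES_PER_DOLLAR_CENTS = {
    "bug_fixing": 44,
    "rewrites": 27,
    "review_and_merge_delays": 11,
}


def waste_summary(losses_cents: dict[str, int], dollar_cents: int = 100) -> tuple[int, int]:
    """Return (cents lost, cents left over) per dollar of token spend."""
    lost = sum(losses_cents.values())
    return lost, dollar_cents - lost


lost, remaining = waste_summary(LOSSES_PER_DOLLAR_CENTS)
print(f"lost: {lost} cents, remaining: {remaining} cents per $1")
```

Under the survey's framing, only about 18 cents of each token dollar turns into production-ready output.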

A separate 2026 report from Lightrun points to the same problem from the reliability side. It found that 43% of AI-generated code still needed manual debugging in production even after passing quality checks. That is the sort of stat that ruins a lot of slide decks. [3]

Why AI coding economics look worse in practice

The cost issue is not just buggy code. It is workflow distortion.

AI coding tools can increase output volume faster than they improve output quality. That creates downstream congestion in code review, security checks, and integration testing. Engineering managers then get hit with a familiar problem dressed up as innovation: too much low-confidence code entering the pipeline at once.

That matters because review time is skilled labor, not free compute. If senior engineers spend hours validating brittle code suggestions, the apparent savings from token-based generation can evaporate. Rewrites are even worse. A bug fix preserves some value. A rewrite is an admission that the first pass mostly belonged in the trash.

The hidden cost is trust. If developers cannot rely on generated output, they slow down to inspect more aggressively. That undercuts the main sales pitch of AI copilots, which is speed with acceptable quality. [4]

Production remains the hard boundary

The production environment is where optimistic internal demos meet consequences. Lightrun's findings suggest that passing pre-deployment checks does not guarantee resilience once software is exposed to live traffic, real data, and edge cases. That gap is especially expensive in regulated or always-on systems, where post-release failures carry direct financial and reputational costs.

Recent examples in crypto and adjacent tech have already shown the issue. AI-generated code may accelerate iteration, but when reliability slips, teams end up paying in rollbacks, patch cycles, and manual intervention. Faster output is not the same as finished software.

Oracle's AI infrastructure bet adds another layer of risk

These waste figures land awkwardly beside the infrastructure race now underwriting the AI boom. Oracle, one of the big corporate beneficiaries of AI demand, has reportedly accumulated roughly $108 billion in total debt while also raising another $50 billion in 2026 through debt and equity to expand data center capacity.

That would be easier to shrug off if the demand base looked diversified and financially bulletproof. It does not. More than $300 billion of Oracle's $553 billion backlog is reportedly tied to OpenAI. OpenAI, meanwhile, reportedly lost about $14 billion last year. [1]

Concentration risk is doing a lot of work here. If one customer accounts for more than half the backlog, the "AI infrastructure supercycle" story starts to depend on a fairly narrow set of assumptions staying intact: sustained model demand, continued capital access, and customers willing to keep absorbing very expensive compute.

Oracle's negative free cash flow, near $13 billion, only tightens the balancing act. The company is effectively making a high-conviction wager that AI demand will grow into its cost structure. Maybe it will. Leverage has a way of looking brilliant until it does not.
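The backlog figures above put a rough number on that concentration. The short Python sketch below assumes the OpenAI-linked portion is exactly $300 billion. The reporting says "more than," so the computed share is a lower bound.

```python
# Reported figures, in billions of US dollars.
# "More than $300 billion" is treated as exactly 300, so the share is a lower bound.
openai_backlog_bn = 300
total_backlog_bn = 553

share = openai_backlog_bn / total_backlog_bn
other_customers_max_bn = total_backlog_bn - openai_backlog_bn

print(f"OpenAI share of backlog: at least {share:.1%}")
print(f"All other customers combined: at most ${other_customers_max_bn} billion")
```

Even on this conservative reading, a single customer accounts for roughly 54% of the backlog.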

Labor is being repriced around AI competence

The third signal in this story is organizational, not technical. Companies are no longer treating AI as an optional productivity layer. They are starting to fold it into how workers are measured.

OKX has linked employee performance reviews to AI proficiency, reflecting a broader shift in which firms increasingly expect staff to work effectively with AI tools. The subtext is obvious: if agents and copilots can automate routine work, employees who add little beyond routine work become easier to identify.

That does not mean AI replaces everyone. It does mean the bar for "useful" is moving upward. Workers are being judged not just on output, but on how well they can direct, verify, and integrate machine-generated output. The winners in that environment are not the people who blindly use AI the most. They are the ones who can extract value from it without flooding the system with errors.

More AI use does not equal more AI value

This is the part many AI-first corporate narratives skip. Adoption is easy to announce. Productive adoption is much harder to prove.

If 82% of spend disappears before production, then usage metrics alone are close to meaningless. A team can generate more code, more tickets, and more model calls while creating less net business value. The interesting metric is not prompt volume. It is how much verified, maintainable software ships without dragging senior engineers into endless repair work.

Why it matters

The bigger story is that AI economics are maturing from hype to accounting. Early gains were often measured in speed at the keyboard. Now the market is starting to measure total system cost: fixes, delays, infrastructure debt, and labor reshuffling included.

That is a healthier lens. It also makes the current wave of AI claims easier to test. If a tool cuts development time, great. If it also increases downstream debugging, review bottlenecks, and rewrite rates, the savings may be fictional.

For crypto firms, exchanges, and infrastructure providers, this matters more than usual. They operate in environments where software errors can become financial losses very quickly. Shipping low-confidence code because a model produced it cheaply is not a growth hack. It is an expensive habit.

What to watch next

Watch for a shift away from raw AI adoption metrics and toward production-quality benchmarks. The useful signals will be code rollback rates, post-deployment incident counts, review-cycle time, and how much generated code survives without major rewrites.
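None of these benchmarks is exotic, and each reduces to a simple ratio over per-change records. The Python sketch below is hypothetical throughout. The `GeneratedChange` record, its fields, and the sample data are invented for illustration and do not come from any cited report or specific tool. It shows how a team might compute the four signals named above, in place of prompt counts.

```python
from dataclasses import dataclass


@dataclass
class GeneratedChange:
    """One AI-assisted change. All fields are hypothetical tracking data."""
    rolled_back: bool
    incidents_after_deploy: int
    review_hours: float
    major_rewrite: bool


def production_quality(changes: list[GeneratedChange]) -> dict[str, float]:
    """Compute production-quality ratios over a batch of generated changes."""
    n = len(changes)
    if n == 0:
        raise ValueError("need at least one change")
    return {
        "rollback_rate": sum(c.rolled_back for c in changes) / n,
        "incidents_per_change": sum(c.incidents_after_deploy for c in changes) / n,
        "mean_review_hours": sum(c.review_hours for c in changes) / n,
        # Share of generated changes that shipped without a major rewrite.
        "survival_rate": sum(not c.major_rewrite for c in changes) / n,
    }


# Made-up sample batch, for illustration only.
sample = [
    GeneratedChange(rolled_back=False, incidents_after_deploy=0, review_hours=1.5, major_rewrite=False),
    GeneratedChange(rolled_back=True, incidents_after_deploy=2, review_hours=4.0, major_rewrite=True),
    GeneratedChange(rolled_back=False, incidents_after_deploy=1, review_hours=2.5, major_rewrite=False),
    GeneratedChange(rolled_back=False, incidents_after_deploy=0, review_hours=2.0, major_rewrite=True),
]
print(production_quality(sample))
```

A team that doubles its prompt volume while its survival rate falls and its review hours climb is shipping more code, not more value.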

Also watch whether infrastructure spending keeps outrunning demonstrated returns. If enterprise buyers continue absorbing high AI costs while most of the engineering value leaks out before production, the market will eventually ask a rude but fair question: what exactly are we financing?

For now, the clean takeaway is simple. AI can save time. Many companies are still spending that time fixing what AI just made.