Everything is a Funnel

From Strategy to Solvency: Revisiting the AI Factory Framework

Summer of 2025 feels like an aeon ago in the AI era (remember when NVIDIA hit $4T?).

Back then, I wrote a piece arguing enterprises should build AI Factories. Think of a factory in the AI era as a system that produces intelligence at scale. The core thesis was that data infrastructure is the highest leverage investment for driving down the cost of that intelligence. I have some confidence that the framework held up. The economics, however, did not.

The market is still digesting a set of brutal physical realities that are putting pressure on the AI Factory model. If you are building an internal AI platform today, you are not deploying software. You are managing a heavy industrial supply chain. And it is subject to some uncompromising constraints.

  1. Compute is no longer the sole bottleneck. The physical chokepoints surrounding compute are grinding projects down: high-bandwidth memory (HBM), advanced packaging (like TSMC's CoWoS), and massive enterprise SSDs all sit upstream of compute and help send compute prices vertical.
  2. Frontier labs are warping the market. Anthropic and OpenAI are selling tokens below cost, subsidizing generalized intelligence with private capital to capture market share.
  3. The power grid has the final say. The multi-year queue for data center power permitting means physical infrastructure is lagging far behind corporate ambition. Local politics will likely throw a wrench into building and distributing power for AI.

The result: enterprises are paying more per token (either via API markup or by buying their own GPUs at market rates), with less volume to amortize costs. And unlike the frontier labs, they can't bet on future efficiency gains or scale to make the math work.
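The amortization point is easy to see with a back-of-envelope calculation. Here is a minimal sketch; every number in it is an illustrative assumption, not a quote, spec, or benchmark:

```python
# Back-of-envelope: cost per million tokens for self-hosted inference.
# All figures below are illustrative assumptions, not measured numbers.

def cost_per_million_tokens(
    gpu_capex: float,           # purchase price per GPU (USD)
    amortization_years: float,  # depreciation window
    power_cost_per_year: float, # electricity + cooling per GPU per year (USD)
    tokens_per_second: float,   # sustained throughput per GPU
    utilization: float,         # fraction of wall-clock time serving traffic
) -> float:
    seconds_per_year = 365 * 24 * 3600
    yearly_cost = gpu_capex / amortization_years + power_cost_per_year
    yearly_tokens = tokens_per_second * seconds_per_year * utilization
    return yearly_cost / yearly_tokens * 1e6

# Same hardware, very different unit economics once volume drops:
busy = cost_per_million_tokens(30_000, 3, 4_000, 2_500, 0.7)
idle = cost_per_million_tokens(30_000, 3, 4_000, 2_500, 0.1)
print(f"busy: ${busy:.2f}/Mtok, idle: ${idle:.2f}/Mtok")
```

The hardware cost is fixed; only the token volume moves. Cut utilization from 70% to 10% and the cost per token rises sevenfold, which is exactly the squeeze on enterprises that lack frontier-lab traffic.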

The question for enterprise AI architects is no longer, "How do we build better factories?" It is, "Which workflows can survive the unit economics of factory-scale infrastructure?"

If you are evaluating enterprise AI economics in 2026, here is a sketch of a menu for consideration. Of course, this world moves fast, the turns are hairpins, and I could be very wrong. Caveat emptor, etc.

  1. Don't assume you can optimize your way to profitability. The frontier labs have better models, more scale, and access to private capital at levels that look like funny money. And yet they're still selling tokens below cost. If Anthropic can't make the unit economics work yet, an internal enterprise AI platform won't either. The bet they're making is that scale and efficiency gains will eventually close the gap. Enterprise architects do not have the same runway, and many have to contend with public markets.
  2. Focus on high-value, narrow use cases. The era of "let's try AI on everything" is over. When every GPU is spoken for, experimentation has a real opportunity cost. Compute spend on exploration is compute not spent on something that's working. The teams I've seen get this right pick one or two use cases where AI is directly tied to revenue or cost reduction, prove them out, and only then expand. It's an easier sell to budget overlords, and their CFOs will thank them immensely. The ones that struggle are still running broad "AI strategy" initiatives with no clear line to value.
  3. Question the tooling stack. I spent nearly two years marketing data infrastructure tooling, so I say this with some self-awareness: every piece of the stack has a cost, and not just the license fee. Redundant vector databases, data versioning, experiment tracking, model registries: they all consume compute, engineering attention, and integration time. In a world where GPU capacity is fully allocated, the question is not "Is this tool useful?" It is, "Does this tool improve $/token enough to justify its overhead?" Some will. Some won't. Be honest about which is which.
  4. Watch memory, not just compute. The GPU shortage gets the headlines, but HBM is the quieter constraint that's actually throttling projects. You can provision all the compute you want, but if you can't feed the GPU data fast enough, you're paying for idle cycles. SK Hynix, Micron, and Samsung can't keep pace with demand, and that's not changing soon. The cost of basic data storage for training and RAG is skyrocketing. If your team is capacity planning, memory throughput deserves as much attention as GPU count.
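The memory point has a concrete shape: autoregressive decode has to stream every model weight from memory for each generated token, so a roofline-style ceiling on per-request throughput is just bandwidth divided by model size, regardless of how many FLOPs the chip advertises. A rough sketch with illustrative numbers (single request, no batching; batching amortizes weight reads across requests and raises the aggregate ceiling):

```python
# Sketch: why memory bandwidth, not FLOPs, caps decode throughput.
# Numbers are illustrative assumptions, not vendor specs.

def decode_tokens_per_second(model_params_b: float, bytes_per_param: float,
                             hbm_bandwidth_gbs: float) -> float:
    """Roofline bound: each token requires one full read of the weights."""
    model_bytes = model_params_b * 1e9 * bytes_per_param
    return hbm_bandwidth_gbs * 1e9 / model_bytes

# A 70B-parameter model in fp16 on a hypothetical 3,300 GB/s part:
print(decode_tokens_per_second(70, 2, 3300))  # ~23.6 tokens/sec ceiling
```

Double the GPU count without more bandwidth per chip and this number doesn't move, which is why capacity planning on GPU count alone overstates what you'll actually get.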

The AI Factory framework still works for understanding the system. But the economic reality has shifted from "build it and optimize later" to "prove it pays before you build."

That's the difference seven months makes in a market defined by unprecedented scarcity.