A 248-GPU NVIDIA B200 cluster run on-premises for three years costs roughly one-third of the equivalent hyperscaler spend, according to a 2026 Olds Research benchmark. Stretch the horizon to five years and ownership drops to one-fifth or one-sixth of cloud cost.
And yet most teams that act on that headline number will lose money. The AI compute cost equation in 2026 hinges on a single variable that almost nobody estimates honestly: sustained utilization.
The short version: above roughly 70% sustained GPU utilization for 24+ months, building or buying capacity beats leasing it, with breakeven landing in 14 to 24 months per an IDC/Intel 2025 brief. Below 40% utilization, or for any bursty workload, leasing wins through the full 3-year window.
TL;DR: Hyperscaler capex is forecast to hit roughly $725B in 2026, keeping rental GPU supply abundant and rates competitive. That makes lease the right default for inference and exploration. Buy (appliances, NVAIE licensing, reserved neocloud capacity) wins for steady mid-scale training. Build only makes sense above $30M capex with a real platform team and 70%+ utilization. Most organizations should run all three.
Key takeaways:
- Utilization is the whole ballgame: above 70% sustained, owning GPUs costs a third of renting them over three years; below 40%, the cloud wins every time.
- B200 silicon runs $30,000 to $40,000 per GPU in cluster orders; a GB200 NVL72 rack runs $2M to $3M, pushing true "build" toward a $30M+ minimum.
- Neoclouds undercut hyperscaler on-demand rates by 15 to 35% (Lambda H100 at $3.99/GPU-hr vs. $8 to $12 at AWS), but hyperscaler reserved discounts close most of the gap.
- Personnel is the most underpriced line item in build plans: a minimum viable platform team costs $1.5M to $3.5M per year.
- Egress fees and idle capacity quietly inflate cloud bills by 30 to 50%; analyst surveys put cloud AI waste at 25 to 40% of spend.
What drives AI compute cost in 2026?
Two structural shifts changed the math since 2024. First, NVIDIA's Blackwell generation is sold primarily as a rack-scale product. The GB200 NVL72 ships as a 72-GPU rack reported in the $2M to $3M range by trade press, which raises the realistic entry point for a custom build to a $30M+ capex envelope.
Second, the rental market is oversupplied relative to 2023. Hyperscaler AI capex ran about $315B in 2025 and is forecast around $725B for 2026, with IDC reporting AI infrastructure spending on track to eclipse $1 trillion cumulatively by 2029. Gartner separately projects AI-optimized IaaS as the next growth engine for AI infrastructure.
All that capacity keeps per-GPU-hour rates competitive, which softens the case for owning hardware at moderate utilization.
When does building your own AI infrastructure pay off?
Build wins when you can keep expensive silicon busy for years. The IDC/Intel brief puts training-cluster breakeven at 14 to 24 months for workloads sustaining over 70% utilization. Inference fleets, which idle far more, often take over 4 years to break even, which is why almost nobody should build for inference alone.
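The breakeven logic can be sketched in a few lines of arithmetic. This is a minimal sketch, not the IDC/Intel model: it assumes cloud is billed only for the GPU-hours actually used, that owned costs are fixed regardless of load, and that every input in the example (capex, monthly opex, the $8/GPU-hr rate) is a hypothetical placeholder rather than a figure from the brief.

```python
from typing import Optional

HOURS_PER_MONTH = 730.0  # average hours in a month (8760 / 12)


def breakeven_months(
    capex: float,
    owned_opex_per_month: float,
    gpus: int,
    cloud_rate_per_gpu_hr: float,
    utilization: float,
) -> Optional[float]:
    """Months until avoided cloud spend repays the capex.

    Returns None when owning never pays back, i.e. when the monthly
    cloud bill at this utilization is no larger than owned opex.
    """
    cloud_per_month = gpus * HOURS_PER_MONTH * utilization * cloud_rate_per_gpu_hr
    monthly_saving = cloud_per_month - owned_opex_per_month
    if monthly_saving <= 0:
        return None
    return capex / monthly_saving


# Hypothetical inputs: $12M capex, $150K/month to run, 248 GPUs, $8/GPU-hr cloud rate.
for utilization in (0.10, 0.40, 0.70):
    months = breakeven_months(12_000_000, 150_000, 248, 8.0, utilization)
    print(f"utilization {utilization:.0%}: breakeven = {months}")
```

The shape matters more than the specific output: breakeven stretches quickly as utilization falls, and below some threshold it never arrives at all.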
The cost stack for a representative 248-GPU B200 cluster: $7.5M to $10M in GPU silicon, another 8 to 12% for the InfiniBand or Spectrum-X fabric, $1.5M to $3M for 1 to 2 PB of parallel-filesystem storage, and electricity at roughly $6M to $9M per year for a 10 MW load at PUE 1.20. Total 3-year TCO lands in the $80M to $160M range once the facility and staff are included.
The equivalent cloud spend at sustained utilization runs $250M to $500M, paid monthly with no capex.
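The hardware portion of that cost stack can be reproduced from the per-unit figures quoted in this article. A rough sketch, with one labeled assumption: the 8 to 12% fabric figure is applied to GPU silicon cost, which the text does not state explicitly. Power, facility, and staff are deliberately left out.

```python
GPUS = 248


def hardware_subtotal(gpu_price: float, fabric_share: float,
                      storage_cost: float, gpus: int = GPUS) -> float:
    """GPU silicon + network fabric + storage, in dollars."""
    silicon = gpus * gpu_price
    # Assumption: the fabric percentage is taken against GPU silicon cost.
    fabric = silicon * fabric_share
    return silicon + fabric + storage_cost


# Low end: $30K per GPU, 8% fabric, $1.5M storage.
low = hardware_subtotal(30_000, 0.08, 1_500_000)
# High end: $40K per GPU, 12% fabric, $3M storage.
high = hardware_subtotal(40_000, 0.12, 3_000_000)

print(f"Hardware subtotal: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
```

Under these assumptions the subtotal comes to roughly $9.5M to $14.1M, which suggests most of the 3-year TCO sits in the lines the sketch omits: facility, power, and staff.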
The line item that sinks build plans is people. A minimum viable AI platform team of 5 to 10 engineers costs $1.5M to $3.5M annually before overhead, with senior MLOps and distributed-systems engineers clearing $250K to $400K base.
Mature platform orgs run $5M to $15M per year. Any build model priced at average compensation is materially understated.
Build also demands patience. Power-availability lead times in Northern Virginia, Phoenix, and Frankfurt have stretched to 2 to 4 years for new high-density sites. The famous counterexample, xAI's Colossus, stood up 100,000 H100s in 122 days, but that was a multi-billion-dollar program with sovereign-scale urgency.
Saudi Arabia's HUMAIN partnership with NVIDIA shows the same profile: build ROI in 2026 is strategic positioning over 3 to 10 years rather than a near-term payback.
What does the buy path actually get you?
Buy compresses deployment from quarters to weeks while keeping hardware under your control. Turnkey appliances (Dell PowerEdge XE9712, HPE Cray XD670, Supermicro liquid-cooled racks, NVIDIA DGX B200) deploy in 6 to 12 weeks versus 6 to 18 months for a custom build, with payback typically inside 12 to 24 months at 50 to 70% utilization.
The software layer is where the recurring cost hides. NVIDIA AI Enterprise is licensed per GPU, with the Essentials tier widely reported at $4,500/GPU/year (NVIDIA confirms the model in its licensing guide but does not publish the dollar figure).
On a 248-GPU cluster that is about $1.1M per year, or $3.3M over a 3-year subscription. The 5-year subscription drops the effective rate to roughly $3,600/GPU/year.
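The licensing arithmetic is simple enough to check directly. A small sketch using the reported (not NVIDIA-published) per-GPU rates above; the function name is illustrative only.

```python
GPUS = 248
ESSENTIALS_PER_GPU_YEAR = 4_500  # widely reported rate, not published by NVIDIA
FIVE_YEAR_PER_GPU_YEAR = 3_600   # approximate effective rate on a 5-year term


def license_cost(gpus: int, per_gpu_year: int, years: int) -> int:
    """Total subscription cost in dollars for a per-GPU, per-year license."""
    return gpus * per_gpu_year * years


annual = license_cost(GPUS, ESSENTIALS_PER_GPU_YEAR, 1)      # 1,116,000
three_year = license_cost(GPUS, ESSENTIALS_PER_GPU_YEAR, 3)  # 3,348,000
five_year = license_cost(GPUS, FIVE_YEAR_PER_GPU_YEAR, 5)    # 4,464,000
# What five years would cost at the annual rate, for comparison.
five_year_at_annual_rate = license_cost(GPUS, ESSENTIALS_PER_GPU_YEAR, 5)

print(f"Annual: ${annual:,}")
print(f"3-year: ${three_year:,}")
print(f"5-year: ${five_year:,} (vs ${five_year_at_annual_rate:,} at the annual rate)")
```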
The buy category also includes reserved capacity at the neoclouds. CoreWeave, which reported a $30.1B backlog in June 2025 and counts OpenAI as an $11.9B five-year customer, offers 12-month reserved H100s at $4.93/GPU-hr (20% off on-demand), with multi-year committed deals reported as low as $1.79/GPU-hr for very large customers. That is the most-disclosed buy-side ROI in the market: fast access to tens of thousands of GPUs without owning any of them.
How cheap is leasing AI compute, really?
On-demand list prices vary by a factor of three for the same silicon. Lambda lists H100 SXM at $3.99/GPU-hr and B200 at $6.69. CoreWeave lists H100 at $6.16 and B200 at $8.60. The hyperscalers sit at roughly $8 to $12/GPU-hr for H100 on AWS p5 instances, Azure ND H100 v5, and GCP A3.
Reserved pricing narrows the spread. Hyperscaler 1-year commitments discount 30 to 40% and 3-year commitments can exceed 50%, while Google's TPU line undercuts equivalent GPU rates by 30 to 60% for inference at scale (TPU v5e on-demand is widely reported at $1.20 to $1.50/chip-hr).
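To see how far reserved discounts close the gap, apply the quoted discount bands to the $8 to $12 hyperscaler H100 range. This is a sketch under a simplifying assumption: it pairs the lowest list rate with the deepest discount and the highest list rate with the shallowest, which gives the widest plausible band rather than any provider's actual quote.

```python
def discounted_rate(on_demand: float, discount: float) -> float:
    """Effective $/GPU-hr after a fractional commitment discount."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return on_demand * (1 - discount)


LAMBDA_H100_ON_DEMAND = 3.99  # neocloud list price quoted above

# 1-year commitments: 30 to 40% off an $8 to $12 on-demand rate.
one_year_low = discounted_rate(8.0, 0.40)    # about 4.80
one_year_high = discounted_rate(12.0, 0.30)  # about 8.40

# 3-year commitments: discounts "can exceed 50%", so 50% is used as the marker.
three_year_low = discounted_rate(8.0, 0.50)    # 4.00
three_year_high = discounted_rate(12.0, 0.50)  # 6.00

print(f"1-year reserved: ${one_year_low:.2f} to ${one_year_high:.2f}/GPU-hr")
print(f"3-year reserved: ${three_year_low:.2f} to ${three_year_high:.2f}/GPU-hr")
print(f"Lambda on-demand: ${LAMBDA_H100_ON_DEMAND:.2f}/GPU-hr")
```

Only the deepest 3-year hyperscaler discounts approach the neocloud on-demand rate, and they require a commitment the on-demand rate does not.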
Then there are the costs the pricing page doesn't show. Internet egress runs $0.08 to $0.12/GB at all three hyperscalers and can add 5 to 15% to the bill for data-heavy workloads; Lambda and CoreWeave both advertise zero egress fees.
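A quick way to sanity-check the egress line is to price it against your own traffic. The sketch below uses the $0.08 to $0.12/GB range above; the 100 TB monthly volume and the $100K compute bill are hypothetical placeholders, not benchmark figures.

```python
def egress_cost(gb_out: float, rate_per_gb: float) -> float:
    """Monthly internet egress charge in dollars."""
    return gb_out * rate_per_gb


def egress_share(compute_bill: float, gb_out: float, rate_per_gb: float) -> float:
    """Egress as a fraction of the combined compute-plus-egress bill."""
    egress = egress_cost(gb_out, rate_per_gb)
    return egress / (compute_bill + egress)


MONTHLY_GB_OUT = 100_000        # hypothetical: 100 TB leaving the cloud per month
MONTHLY_COMPUTE_BILL = 100_000  # hypothetical: $100K/month of GPU spend

for rate in (0.08, 0.12):
    cost = egress_cost(MONTHLY_GB_OUT, rate)
    share = egress_share(MONTHLY_COMPUTE_BILL, MONTHLY_GB_OUT, rate)
    print(f"${rate:.2f}/GB: ${cost:,.0f} egress, {share:.1%} of the bill")
```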
Analyst surveys put 25 to 40% of cloud AI spend on idle or oversized instances. Per-token model APIs are priced nearly identically across clouds: Claude Sonnet lists at $3/$15 per million input/output tokens in Anthropic's pricing docs, with matching rates on AWS Bedrock. The lock-in there is feature and catalog depth rather than rate.
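Per-token pricing turns into a budget line once you plug in request sizes. A minimal sketch using the $3/$15 per-million rates quoted above; the 2,000-input, 500-output request shape and the one-million-request monthly volume are hypothetical.

```python
INPUT_PER_MILLION = 3.00    # $ per 1M input tokens (rate quoted above)
OUTPUT_PER_MILLION = 15.00  # $ per 1M output tokens (rate quoted above)


def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float = INPUT_PER_MILLION,
                 out_per_m: float = OUTPUT_PER_MILLION) -> float:
    """Dollar cost of one API call at per-million-token rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m


# Hypothetical request shape: 2,000 tokens in, 500 tokens out.
per_request = request_cost(2_000, 500)
# Hypothetical volume: one million such requests per month.
monthly = per_request * 1_000_000

print(f"Per request: ${per_request:.4f}")
print(f"Per month at 1M requests: ${monthly:,.0f}")
```

Output tokens cost five times as much as input tokens at these rates, so response length tends to move the bill more than prompt length does.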
One ROI number circulating widely deserves a caveat: the Forrester Total Economic Impact study claiming 368% three-year ROI for Microsoft Foundry was commissioned by Microsoft. Treat it as marketing evidence, useful directionally but unfit for your spreadsheet.
Build vs. Buy vs. Lease: the comparison
| Dimension | Build | Buy | Lease |
|---|---|---|---|
| Upfront capital | $30M+ | $1M to $10M | None |
| Time to production | 6 to 18 months | 6 to 12 weeks | Days to weeks |
| Breakeven vs. cloud | 14 to 24 months at >70% utilization | 12 to 24 months | N/A (pure opex) |
| Platform team required | 5 to 15+ FTE | 1 to 5 FTE or co-managed | None |
| Best workload fit | Frontier training, hard sovereignty | Steady mid-scale training, regulated | Inference, exploration, burst |
| Main hidden cost | Personnel, power lead times | Vendor reference-architecture lock-in | Egress, idle waste, token bills |
How should you actually decide?
Score each workload, never the whole company. The cleanest 2026 pattern is a three-layer portfolio: hyperscaler lease (including model APIs) for inference and exploration, bought appliances or reserved neocloud capacity for steady training, and custom build reserved for frontier-scale training above roughly 5,000 sustained GPUs over 24+ months.
Sovereignty can override the economics. EU AI Act obligations, HIPAA, FedRAMP High, and DORA push the most sensitive workloads toward build or appliance deployments, though Microsoft's European sovereign cloud offerings and the equivalent AWS and Google programs now make in-region lease viable for many regulated cases.
And watch the lock-in you already have. Standardizing on CUDA, NVLink, and NIM creates a portability moat regardless of where the hardware sits. An enterprise running NVAIE on-prem plus Bedrock plus Azure OpenAI is committed to NVIDIA's stack and two hyperscaler catalogs at once.
Open-weight models and serving stacks like vLLM and Triton are the practical hedge.
What this means for you
Start by leasing and measuring. Run real workloads on Bedrock, Azure OpenAI, or Vertex AI for 60 to 90 days and capture actual utilization, latency, egress, and per-inference cost. Those four numbers settle the AI cost optimization debate better than any benchmark report.
If sustained utilization clears 50%, price a reserved neocloud contract or an appliance with NVAIE against your current cloud bill; the payback math usually favors buying within 12 to 24 months. Only open the build conversation if you can clear 70% utilization for two years, fund $30M+, and staff a real platform team.
Then re-score annually. GPU generations, reserved discount curves, and per-token prices all move faster than a 3-year TCO model assumes, and the answer that was right in 2025 is already stale.
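The per-workload scoring described in this section can be written down as a small rule set. This is one possible encoding, not a definitive policy: the thresholds (50% and 70% utilization, 24 months, $30M capex) come from the text above, while the `Workload` type, the `recommend` function, and the choice to treat "clears" as greater-than-or-equal are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    sustained_utilization: float  # measured fraction, 0.0 to 1.0
    months_sustained: int         # how long that utilization is expected to hold
    bursty: bool = False


def recommend(w: Workload, capex_budget: float, has_platform_team: bool) -> str:
    """Return 'lease', 'buy', or 'build' for a single workload."""
    # Bursty or lightly used workloads stay on leased capacity.
    if w.bursty or w.sustained_utilization < 0.50:
        return "lease"
    # Build needs all four conditions: utilization, duration, capital, staff.
    if (w.sustained_utilization >= 0.70
            and w.months_sustained >= 24
            and capex_budget >= 30_000_000
            and has_platform_team):
        return "build"
    # Otherwise price an appliance or reserved neocloud contract.
    return "buy"


# Hypothetical portfolio scored against a $50M budget with a platform team.
portfolio = [
    Workload("chat-inference", 0.25, 36, bursty=True),
    Workload("fine-tuning", 0.60, 18),
    Workload("pretraining", 0.85, 30),
]
for w in portfolio:
    print(f"{w.name}: {recommend(w, 50_000_000, True)}")
```

Re-running the same function each year with fresh utilization measurements is the code equivalent of the annual re-score.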
Sources
- Gartner: AI-optimized IaaS poised to become the next AI infrastructure growth engine
- IDC: AI infrastructure spending caps historic year, on track to eclipse $1 trillion
- IDC/Intel brief: Balancing datacenter and cloud investments for AI
- Olds Research on-prem vs. cloud AI TCO benchmark
- NVIDIA AI Enterprise licensing guide
- CoreWeave cloud pricing
- Lambda AI cloud pricing
- AWS EC2 on-demand pricing
- Amazon Bedrock pricing
- Anthropic Claude API pricing
- Google Cloud TPU pricing
- Azure ND H100 v5 series documentation
- TechSpot: NVIDIA Blackwell server cabinets estimated at $2M to $3M
- Microsoft Foundry Forrester TEI study (Microsoft-commissioned)
- HUMAIN and NVIDIA strategic partnership announcement
