Neocloud GPU Economics Are Cheap, Fragile, and Winning Anyway

GPU rental prices have collapsed 64-85% below hyperscalers, but the debt and utilization math underneath is brutal.

June 26, 2026 · 13 min read
neocloud GPU economics, GPU cloud pricing, AI compute market
In March 2025, CoreWeave IPO'd at a $23 billion valuation on the back of GPUs bought with borrowed money. Fourteen months later it joined the Nasdaq-100, having booked $5.13 billion in 2025 revenue and a contracted backlog that hit $99.4 billion by March 2026.

It also lost $1.167 billion on a GAAP basis that year. That tension, cheap rental pricing financed by fragile debt, is the whole story of neocloud GPU economics in 2026.

Neocloud GPU economics describes how pure-play GPU rental providers like CoreWeave, Lambda, Crusoe, and IREN price compute 30 to 85 percent below hyperscalers while staying solvent on debt-financed capacity that only pays back above 60 percent cluster utilization. For procurement leaders, the practical move is hybrid: neocloud for training where unit cost dominates, hyperscaler for latency-sensitive serving where SLA and compliance dominate, with a two-vendor minimum to hedge counterparty risk.

TL;DR

  • Neoclouds charge 30 to 85 percent less per GPU-hour than AWS, Azure, or GCP for identical NVIDIA silicon as of June 2026.
  • The cheap price rests on debt: CoreWeave carries $21+ billion in GPU-backed loans and posted a $740 million Q1 2026 net loss despite a 56 percent EBITDA margin.
  • Profitability hinges on one variable, utilization. Below 60 percent, clusters lose money; above 80 percent, margins expand fast.
  • The H100 has fallen 64 to 75 percent from its 2023 peak near $8/hr, and AWS broke a 20-year declining-price pattern with a 15 percent H200 hike in January 2026.
  • Best practice is hybrid procurement: neocloud for training and batch, hyperscaler for production serving and regulated workloads.

What is a neocloud, and why did it become a category?

SemiAnalysis coined the working definition in late 2024: neoclouds are "a new breed of cloud compute provider focused on offering GPU compute rental," without the breadth of general-purpose hyperscaler services. The category generated over $25 billion in revenue in 2025, and Gartner predicted in June 2026 that neoclouds will capture 20 percent of the $267 billion AI cloud market by 2030.

The competitive set is concentrated. CoreWeave, Lambda, Crusoe, IREN, and Voltage Park collectively hold an estimated 10 to 20 percent of AI cloud revenue, while AWS, Azure, and GCP still control roughly 70 percent of public GPU capacity.

Voltage Park exited standalone status in January 2026 by merging with Lightning AI, a signal that the long tail of small neoclouds is already consolidating.

CoreWeave is the reference case. It reached $5.13 billion in 2025 revenue faster than any cloud company in history, and its backlog grew from $66.8 billion at end-2025 to $99.4 billion by March 2026, anchored by take-or-pay contracts with Microsoft, OpenAI, Meta, and Anthropic.

How much does GPU rental actually cost in June 2026?

The per-GPU-hour market exhibits a 4.7x to 9.1x spread between the cheapest and most expensive providers for identical hardware. The H100 has fallen 64 to 75 percent from its 2023 peak near $8/hr, driven by hyperscaler oversupply and neocloud capacity expansion.

H100 80GB SXM on-demand, Q2 2026

Provider     | Type        | On-demand | Notes
Voltage Park | Neocloud    | $1.99/hr  | Raised ~40% in 5 months
Crusoe       | Neocloud    | $3.90/hr  | 8x node $17.36/hr
Lambda       | Neocloud    | $3.99/hr  | Transparent list pricing
CoreWeave    | Neocloud    | $6.16/hr  | HGX H100 8-GPU node
AWS          | Hyperscaler | $6.88/hr  | P5.4xlarge post-June 2025 cut
Google Cloud | Hyperscaler | $10.98/hr | A3 High GPU
Azure        | Hyperscaler | $12.29/hr | ND96isr H100 v5

The cheapest-to-most-expensive spread for H100 is 6.2x, from Voltage Park at $1.99 to Azure at $12.29. The H200 picture is more extreme: Lambda and Crusoe list $4.29/hr, while Azure charges $44.52/hr and AWS lists H200 Capacity Blocks at $39.80/hr after a roughly 15 percent increase in January 2026 that broke a 20-year pattern of declining cloud prices.
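The spread and discount figures can be recomputed directly from the listed prices. The sketch below is a minimal illustration: the prices are copied from the H100 table above, and the helper names `price_spread` and `discount_vs` are hypothetical, chosen for this example only.

```python
# On-demand H100 prices in $/GPU-hr, copied from the table above.
H100_PRICES = {
    "Voltage Park": 1.99,
    "Crusoe": 3.90,
    "Lambda": 3.99,
    "CoreWeave": 6.16,
    "AWS": 6.88,
    "Google Cloud": 10.98,
    "Azure": 12.29,
}


def price_spread(prices):
    """Return (cheapest provider, priciest provider, price ratio)."""
    cheapest = min(prices, key=prices.get)
    priciest = max(prices, key=prices.get)
    return cheapest, priciest, prices[priciest] / prices[cheapest]


def discount_vs(prices, provider, baseline):
    """Fractional saving of `provider` relative to `baseline` (0.42 = 42% cheaper)."""
    return 1 - prices[provider] / prices[baseline]


if __name__ == "__main__":
    low, high, ratio = price_spread(H100_PRICES)
    print(f"{low} -> {high}: {ratio:.1f}x")
    print(f"Lambda vs AWS: {discount_vs(H100_PRICES, 'Lambda', 'AWS'):.0%} cheaper")
```

The same two helpers work for any GPU model; only the price dictionary changes.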

[Figure: H100 80GB on-demand $/GPU-hr by provider, Q2 2026]

The B200 sits in a transitional band of $5 to $7/hr versus $2 to $4/hr for H100. Because the B200 delivers roughly 2.3x the tokens-per-second of H100 on memory-bound inference, its 2.2x hourly premium works out to about 4 percent cheaper per million tokens (2.2 / 2.3 ≈ 0.96) despite the higher absolute rate.
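The per-token comparison above reduces to dividing the hourly price ratio by the throughput ratio. The following is a minimal sketch, assuming throughput scales as quoted and both GPUs are kept equally busy. The function names are illustrative and not taken from any library.

```python
def cost_per_million_tokens(price_per_hour, tokens_per_second):
    """Dollar cost to generate one million tokens on a fully busy GPU."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000


def relative_token_cost(price_ratio, throughput_ratio):
    """Per-token cost of the newer GPU relative to the older one.

    Values below 1.0 mean the newer GPU is cheaper per token even though
    its hourly rate is higher.
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput_ratio must be positive")
    return price_ratio / throughput_ratio


if __name__ == "__main__":
    # Ratios quoted in the paragraph above: 2.2x price, 2.3x throughput.
    rel = relative_token_cost(2.2, 2.3)
    print(f"relative per-token cost: {rel:.3f}")
```

Absolute tokens-per-second figures depend heavily on model, batch size, and serving stack, so `cost_per_million_tokens` is best fed with throughput measured on your own workload.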

How do neoclouds make money on prices that low?

Three revenue structures underpin the model. On-demand rental is the base layer, typically $3.90 to $6.16/hr for H100 neocloud on-demand. Reserved and volume commitments knock 15 to 30 percent off for multi-month or multi-year deals.

The structural foundation is take-or-pay long-duration contracts of three to six years, signed with hyperscalers or frontier labs, with upfront or milestone payments the neocloud uses to buy and deploy GPUs.

CoreWeave's $99.4 billion backlog is almost entirely take-or-pay commitments from Microsoft, OpenAI, Meta, and Anthropic. That backlog is the financial engine funding procurement and data-center buildout.

The cost stack and the utilization cliff

A typical 8x H100 node costs $222,000 to $383,000 in upfront capex, with $115,000 to $270,000 in annual operating expenses per node. Power runs 25 to 40 percent of opex; a $0.05/kWh electricity differential swings annual margins by $15,000 to $30,000 per node.

Depreciation alone runs $0.55 to $1.15 per GPU-hour before any operating expense is paid. GPUs are typically depreciated over four to six years even though new generations arrive every 18 to 24 months, so the accounting life outlasts the hardware's competitive life.
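Straight-line depreciation per GPU-hour can be approximated from node capex and service life. This sketch assumes zero residual value and eight GPUs per node. The `utilization` parameter is an added assumption that shows how idle hours raise the effective figure. The results land in the same neighborhood as the range quoted above but will not match it exactly, since the assumptions behind that range are not stated.

```python
HOURS_PER_YEAR = 8760


def depreciation_per_gpu_hour(node_capex, life_years, gpus_per_node=8, utilization=1.0):
    """Straight-line depreciation spread over billable GPU-hours.

    Assumes zero residual value at end of life. `utilization` is the
    fraction of hours actually rented (1.0 = every hour is billed).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    billable_hours = gpus_per_node * HOURS_PER_YEAR * life_years * utilization
    return node_capex / billable_hours


if __name__ == "__main__":
    # Capex bounds from the text: $222,000 to $383,000 per 8x H100 node.
    for capex, years in [(222_000, 6), (222_000, 4), (383_000, 6), (383_000, 4)]:
        rate = depreciation_per_gpu_hour(capex, years)
        print(f"capex ${capex:,} over {years}y: ${rate:.2f}/GPU-hr")
```

Shortening the assumed life or lowering utilization raises the per-hour figure quickly, which is why the depreciation schedule matters as much as the purchase price.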

Realistic gross margins on GPU rental land around 14 to 16 percent after labor, power, and full depreciation. CoreWeave's 56 percent adjusted EBITDA margin in Q1 2026, per Sacra, coexists with a $740 million net loss that quarter because EBITDA excludes depreciation and the substantial interest expense on its debt.

Utilization is the single variable that decides whether the model works.

Utilization | Financial outcome
0 to 60%    | Revenue below fixed costs, operating losses
60 to 75%   | Breakeven or thin margins, survival zone
75 to 80%   | Adequate margins, sustainable operation
80 to 95%   | Strong margins, premium pricing available
Above 95%   | Operator is leaving money on the table
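A rough breakeven utilization can be derived from the cost figures in this section. The sketch below is illustrative only: it treats opex as fully fixed (independent of utilization), uses straight-line depreciation with no residual value, and ignores interest expense. All three are simplifying assumptions, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def breakeven_utilization(node_capex, annual_opex, price_per_gpu_hour,
                          gpus_per_node=8, life_years=5):
    """Fraction of GPU-hours that must be sold to cover annual fixed costs."""
    annual_cost = node_capex / life_years + annual_opex
    max_revenue = price_per_gpu_hour * gpus_per_node * HOURS_PER_YEAR
    return annual_cost / max_revenue


def operating_margin(utilization, node_capex, annual_opex, price_per_gpu_hour,
                     gpus_per_node=8, life_years=5):
    """Margin after depreciation and opex at a given utilization (negative = loss)."""
    if utilization <= 0:
        raise ValueError("utilization must be positive")
    annual_cost = node_capex / life_years + annual_opex
    revenue = utilization * price_per_gpu_hour * gpus_per_node * HOURS_PER_YEAR
    return (revenue - annual_cost) / revenue


if __name__ == "__main__":
    # Low-end inputs from the text: $222k capex, $115k opex, $3.90/hr, 6-year life.
    args = dict(node_capex=222_000, annual_opex=115_000,
                price_per_gpu_hour=3.90, life_years=6)
    print(f"breakeven utilization: {breakeven_utilization(**args):.1%}")
    for u in (0.5, 0.6, 0.8, 0.95):
        print(f"utilization {u:.0%}: margin {operating_margin(u, **args):+.1%}")
```

With low-end cost inputs the breakeven lands in the mid-50s percent; higher capex, shorter depreciation lives, or added interest expense push it up, which is consistent with the bands in the table.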

Hyperscalers can absorb utilization swings because they spread fixed costs across CPU, storage, networking, and managed services. A neocloud has no other revenue line. Every demand dip goes straight to the bottom line.

Why are GPU spot prices so volatile?

Operators typically run three tiers: on-demand at full price, reserved at a 30 to 60 percent discount for one-to-three-month commitments, and spot at up to 50 percent below on-demand, with preemption on less than two hours' notice. Spot fits fault-tolerant batch workloads like training with checkpointing. It is wrong for latency-sensitive serving.
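Whether spot actually saves money depends on how much work is lost to preemption. The sketch below is a simple model under stated assumptions: `lost_work_fraction` is a hypothetical input representing the share of paid spot hours that produce no useful progress (recomputation since the last checkpoint, restart overhead). It is not a figure from any provider.

```python
def effective_spot_price(on_demand_price, spot_discount, lost_work_fraction):
    """Spot price per hour of useful work, after accounting for preemption waste.

    spot_discount: fraction below on-demand (0.5 = 50 percent cheaper).
    lost_work_fraction: share of paid hours wasted by preemptions, in [0, 1).
    """
    if not 0 <= lost_work_fraction < 1:
        raise ValueError("lost_work_fraction must be in [0, 1)")
    spot_price = on_demand_price * (1 - spot_discount)
    return spot_price / (1 - lost_work_fraction)


def spot_is_cheaper(on_demand_price, spot_discount, lost_work_fraction):
    """True when spot still beats on-demand per hour of useful work."""
    return effective_spot_price(on_demand_price, spot_discount,
                                lost_work_fraction) < on_demand_price


if __name__ == "__main__":
    # Example: $3.99/hr on-demand, spot at 50 percent off, 10 percent of hours wasted.
    print(f"${effective_spot_price(3.99, 0.5, 0.10):.2f} per useful hour")
```

Under this model spot stops paying off once the wasted share of hours reaches the discount itself, so frequent checkpointing is what keeps a deep discount meaningful.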

Spot volatility comes from five drivers. Supply constraints shrink the float when CoreWeave, Lambda, and Nebius lock multi-year contracts. Frontier lab demand bursts from OpenAI, Anthropic, Meta, and xAI tighten residual supply.

Hardware generation transitions, H100 to H200 to B200 to GB200 to GB300/Rubin, drop prior-generation spot prices sharply, and GB300/Rubin announcements in 2026 have already depressed B200 spot. Power availability in tight markets like Northern Virginia, Phoenix, and Dublin is now the binding constraint on AI cloud expansion, not GPU supply.

And aggregators like Vast.ai and RunPod smooth volatility for buyers but take 20 to 30 percent commission from operators.

What does the hyperscaler premium actually buy?

The 30 to 85 percent neocloud price advantage is real but incomplete, per McKinsey. The hyperscaler premium buys four things pure GPU rental cannot match.

Managed ML platforms like SageMaker, Azure ML, and Vertex AI eliminate the need for dedicated ML infrastructure engineering teams, a cost invisible in per-GPU-hour comparisons. Compliance breadth is the bigger deal: hyperscalers hold FedRAMP High, PCI-DSS Level 1, HIPAA-eligibility, ITAR, and DoD IL4/IL5 across government regions, while neoclouds typically top out at SOC 2 Type II and ISO 27001.

For healthcare, financial services, or government workloads, hyperscalers are effectively the only viable option without custom compliance engineering.

Global networking is the third pillar. Hyperscalers run 30-plus regions with private backbones like AWS EFA at 400Gbps, Direct Connect, and Cloud WAN. Neoclouds typically operate one to three regions, and cross-region data egress at $0.08 to $0.12/GB compounds fast for multi-region inference.

Reliability is the fourth: GCP Compute Engine guarantees 99.99% monthly uptime for zonal persistent disk instances, and AWS EC2 matches that for Multi-AZ. CoreWeave claims 99.9%, with 99.99% on some configurations.
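Uptime percentages are easier to compare when converted to allowed downtime. The conversion below assumes a 30-day month; actual SLA documents define their own measurement windows and exclusions, so treat the output as an order-of-magnitude guide.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted per period at a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days * 24 * 60
    return (1 - uptime_percent / 100) * total_minutes


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        print(f"{sla}% uptime: {allowed_downtime_minutes(sla):.1f} min/month")
```

The gap between 99.9 and 99.99 percent is a factor of ten in permitted downtime: roughly 43 minutes a month versus roughly 4.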

For organizations already in a hyperscaler ecosystem, enterprise credits and committed-use discounts up to 72 percent off on-demand can narrow the effective GPU cost gap to 15 to 30 percent. Still significant, but not the headline 2x to 6x gap.

The debt underneath the cheap prices

CoreWeave's growth has been debt-fueled. Its financing strategy used GPUs as collateral for large loans, producing roughly $11.2 billion in initial debt and total liabilities approaching $29 billion.

Q1 2025 interest payments alone totaled about $264 million. In June 2026 CoreWeave completed the AI sector's first euro junk-bond deal, and NVIDIA invested $2 billion in January 2026 to help it add 5 gigawatts of capacity.

The company guided $31 to $35 billion in capex for 2026.

Three structural risks follow. Debt-fueled expansion is sensitive to utilization drops or customer defaults; if Microsoft, OpenAI, or Meta renegotiates, debt service becomes a cash-flow crisis immediately. GPU generations turn over every 18 to 24 months, so hardware bought at peak pricing may carry near-zero residual value two years later.

And overcapacity is a live scenario: if demand growth slows because of model efficiency gains or a training market plateau, hundreds of billions in annual capex could crash spot prices and compress operator margins industry-wide.

Customer concentration is the biggest unhedged version of this risk. Microsoft alone accounted for roughly 62 to 67 percent of CoreWeave's FY2025 revenue. Microsoft's partial withdrawal from some CoreWeave agreements ahead of the IPO in early 2025, and its decision not to exercise a nearly $12 billion option that OpenAI then picked up, showed that even committed partners adjust.

A workload-by-workload procurement framework

Workload                                  | Primary recommendation
Large-scale pre-training                  | Neocloud, 40 to 60 percent cheaper and faster to scale
Fine-tuning (LoRA, QLoRA, SFT)            | Neocloud for cost-sensitive; hyperscaler for integrated workflows
Batch inference (async, offline)          | Neocloud spot, 40 to 70 percent cheaper with acceptable fault tolerance
Burst / spike workloads                   | Neocloud spot if fault-tolerant; hyperscaler reserved if SLA-critical
Real-time / latency-sensitive serving     | Hyperscaler, multi-AZ failover and managed inference outweigh unit cost
Regulated workloads (HIPAA, FedRAMP, PCI) | Hyperscaler, unless neocloud holds the specific certification
Hybrid training plus serving              | Neocloud for training, hyperscaler for serving

A few decision rules sharpen this. If GPU availability is blocking your team and a hyperscaler quote is six-plus months out, choose a neocloud; speed-to-compute dominates during frontier training cycles.

If your workload is training or fine-tuning and you can self-manage orchestration, choose a neocloud and capture 30 to 60 percent savings. If you operate in a regulated industry, choose a hyperscaler.

If you have steady predictable demand at scale, negotiate take-or-pay contracts with the neocloud giants for better unit economics than hyperscaler multi-year commits.
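The table and decision rules above can be encoded as a small rule function for triaging workloads. This is a sketch, not a definitive policy: the `Workload` fields, the `recommend` function, and the order in which rules are checked are all assumptions made for illustration, and borderline cases still need human judgment.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    kind: str                          # "pretraining", "fine_tuning", "batch_inference", "realtime_serving"
    regulated: bool = False            # HIPAA, FedRAMP, PCI, and similar
    neocloud_certified: bool = False   # neocloud holds the specific certification needed
    fault_tolerant: bool = True        # can survive preemption (checkpointing, retries)
    can_self_manage: bool = True       # team can run its own orchestration
    hyperscaler_wait_months: float = 0.0  # quoted lead time for hyperscaler capacity


def recommend(w: Workload) -> str:
    """Return a provider class for a workload, checking the strictest rules first."""
    if w.regulated and not w.neocloud_certified:
        return "hyperscaler"
    if w.kind == "realtime_serving":
        return "hyperscaler"
    if w.hyperscaler_wait_months >= 6:
        return "neocloud"              # speed-to-compute dominates
    if w.kind == "batch_inference":
        return "neocloud spot" if w.fault_tolerant else "hyperscaler"
    if w.kind in ("pretraining", "fine_tuning"):
        return "neocloud" if w.can_self_manage else "hyperscaler"
    return "hyperscaler"               # conservative default for unknown kinds


if __name__ == "__main__":
    print(recommend(Workload("pretraining")))
    print(recommend(Workload("fine_tuning", regulated=True)))
```

Compliance is checked before anything else because it is the one constraint in the table that price cannot offset.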

Due diligence checklist for evaluating a neocloud

  • Power and cooling: Where is the cluster hosted? PPA contracted? N+1 redundancy?
  • Networking: InfiniBand NDR or HDR? Rail-optimized or fat-tree topology?
  • Hardware generation and age: What GPU types, ratios, and vintage?
  • Customer concentration: If public, read the 10-K. If private, ask top-3 customer revenue share.
  • Backlog visibility: Multi-year contracted revenue and counterparty identities.
  • Financial health: Debt structure, interest coverage, runway, capex plans.
  • Compliance certifications: SOC 2 Type II, ISO 27001, HIPAA, FedRAMP if relevant.
  • SLA terms: Uptime commitments, remediation credits, support tiers.
  • Data egress costs: Per-GB charges and contractual limits.
  • Exit and portability: Is egress free? What is migration cost to another provider?

What this means for you

Diversify. Never make a single neocloud your sole AI compute provider. Customer concentration at the operator level, with Microsoft at roughly 62 to 67 percent of CoreWeave's FY2025 revenue, creates counterparty risk that propagates to your SLA. A two-vendor minimum, either one neocloud plus one hyperscaler or two neoclouds, is the prudent floor.

Model your expected utilization before signing any take-or-pay contract. If your demand is bursty or uncertain, stay on on-demand or spot and avoid multi-year lock-in: you pay for committed capacity whether or not you use it, which is the buyer-side version of the sub-60-percent utilization problem that sinks operators. If your demand is steady and predictable at scale, take-or-pay with CoreWeave, IREN, or Crusoe beats hyperscaler committed-use on unit economics.
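For a buyer, the breakeven on a commitment follows directly from the discount: committed hours are paid for whether used or not, so the effective price per used hour rises as utilization falls. The sketch below assumes a flat percentage discount off on-demand and no resale or rollover of unused hours. The function names are hypothetical.

```python
def commit_breakeven_utilization(discount):
    """Share of committed hours you must use for the commit to match on-demand cost."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def effective_price_per_used_hour(on_demand_price, discount, utilization):
    """Committed price divided by the fraction of committed hours actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return on_demand_price * (1 - discount) / utilization


if __name__ == "__main__":
    # Discount range quoted earlier for reserved and volume commitments.
    for d in (0.15, 0.30):
        print(f"{d:.0%} discount: need {commit_breakeven_utilization(d):.0%} utilization")
```

At the 15 to 30 percent discounts cited earlier for reserved capacity, a commitment only pays off if you expect to use roughly 70 to 85 percent of the hours you sign for.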

Watch the generation cadence. NVIDIA ships a new GPU generation roughly every 18 to 24 months, and prior-generation spot prices drop sharply on each transition. Hardware you buy or commit to today may be commercially obsolete inside two years, so weigh contract length against the depreciation curve, not just the headline hourly rate.

Frequently asked questions

What is a neocloud and how does it differ from a hyperscaler?

A neocloud is a pure-play GPU rental provider like CoreWeave, Lambda, or Crusoe that sells GPU compute without the managed services, compliance, or global networking stack of AWS, Azure, or Google Cloud. Neoclouds charge 30-85% less per GPU-hour but offer narrower certifications and fewer regions.

How much does an H200 GPU cost to rent in 2026?

As of June 2026, H200 141GB on-demand pricing ranges from $4.29/hr at Lambda and Crusoe to $44.52/hr on Azure, a roughly 10x spread. AWS lists H200 Capacity Blocks at $39.80/hr after a 15% January 2026 increase.

Is CoreWeave profitable?

CoreWeave reported a 56% adjusted EBITDA margin in Q1 2026 but remains GAAP-unprofitable, with a $740 million net loss that quarter and $1.167 billion net loss for FY2025, driven by GPU-backed debt service and depreciation on a fast-obsoleting hardware base.

When should you choose a neocloud over a hyperscaler?

Choose a neocloud for training, fine-tuning, and fault-tolerant batch inference where unit economics dominate and you can self-manage orchestration. Choose a hyperscaler for latency-sensitive production serving, regulated workloads, and multi-region reliability. The emerging best practice is hybrid: neocloud for training and batch, hyperscaler for production serving.

What utilization rate do GPU clusters need to be profitable?

Below 60% utilization most GPU clusters lose money because revenue fails to cover fixed depreciation and operating costs. The breakeven zone is 60-75%, adequate margins begin around 75-80%, and strong margins require 80% or higher utilization.