Layer hub · Chips

AI chips

Every Gen α AI article in the Chips layer — GPUs, custom AI silicon, inference hardware, accelerators, HBM and NVLink, TPUs, neoclouds and bare-metal compute, and the silicon-vs-GPU cost math. 8 pieces, organized by the same five-layer taxonomy that tags each article.

Who this is for: chip procurement leads, hardware engineers, and infrastructure buyers evaluating GPUs, custom AI silicon, and inference hardware. You are comparing Blackwell, MI400, and Trainium2 against real inference TCO, sizing HBM and NVLink for your workloads, weighing neocloud and bare-metal capacity against owned silicon, and owning the silicon-vs-GPU decision at the board level — and you need analysis that maps to those procurement decisions, not vendor spec sheets.

How this layer is organized

Gen α AI sorts its coverage into five layers of the AI stack — Energy, Chips, Infrastructure, Models, and Applications — using a computed taxonomy applied to every article at render time. This hub collects every piece the taxonomy classifies into the Chips layer: GPUs and custom AI silicon, inference hardware and accelerators, HBM and NVLink, TPUs, neoclouds and bare-metal compute, the silicon-vs-GPU cost math, and inference TCO by accelerator. Chips is the second-highest-commercial-priority layer in that taxonomy — after Infrastructure — which is why it gets a dedicated hub.

The article list and the count above are computed at render time from the same taxonomy rules in taxonomy.js that tag each article — there is no hand-curated selection and no traffic or popularity ranking behind the order. Pillars surface first, then pieces sort by editorial quality and recency. If a piece is missing, the taxonomy rules did not classify it here; the rules are iteratively refined.
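As a rough illustration of the render-time flow described above, here is a minimal Python sketch. It is not the contents of taxonomy.js, which this page does not show. The keyword list, the `Article` fields, the `quality` score, and the tie-breaking order (quality first, then recency) are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword rules; the real rules live in taxonomy.js and are not shown here.
CHIPS_KEYWORDS = ("gpu", "silicon", "accelerator", "hbm", "nvlink", "tpu", "neocloud", "bare-metal")


@dataclass
class Article:
    title: str
    published: date
    quality: float          # hypothetical editorial-quality score, higher is better
    is_pillar: bool = False


def in_chips_layer(article: Article) -> bool:
    """Classify by keyword match on the title (an assumed, simplified rule)."""
    title = article.title.lower()
    return any(keyword in title for keyword in CHIPS_KEYWORDS)


def chips_hub(articles: list[Article]) -> list[Article]:
    """Filter to the Chips layer, then order: pillars first, then quality, then recency."""
    matched = [a for a in articles if in_chips_layer(a)]
    return sorted(
        matched,
        key=lambda a: (not a.is_pillar, -a.quality, -a.published.toordinal()),
    )


# Made-up sample data, only to exercise the ordering.
articles = [
    Article("GPU leasing basics", date(2026, 6, 12), quality=0.7),
    Article("Custom silicon pillar guide", date(2026, 6, 1), quality=0.6, is_pillar=True),
    Article("Prompt design tips", date(2026, 6, 20), quality=0.9),
    Article("HBM sizing notes", date(2026, 6, 20), quality=0.7),
]
hub = chips_hub(articles)
print(len(hub), [a.title for a in hub])
```

In this sketch both the list and the count come from the same filtered result, so they cannot drift apart, and an article that matches no rule never appears, however well it scores.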

The Chips library

8 articles in this layer. The grid below renders every one of them.

Chips · 8 pieces · GPUs, custom AI silicon, inference hardware, accelerators, HBM/NVLink, TPUs, neoclouds, bare-metal, silicon-vs-GPU cost math
AI Economics

AI Inference Hardware Has a New Cost Bottleneck

The Nvidia question is now a workload-matching problem: memory bandwidth, utilization, and latency SLOs decide the real inference bill.

10 min · June 23, 2026
When Self Hosted Open Models Beat the API Route · AI Economics

Self Hosted Open Models Win After This Cost Cliff

Self-hosting is now a workload decision: privacy, latency, volume, and ops capacity decide more than ideology.

11 min · June 22, 2026
KV Cache Compression Is How Long Context Gets Cheap · AI Economics

KV Cache Compression Is the New Inference Lever

The highest-leverage serving work in 2026 is no longer just faster kernels; it is shrinking the cache that long-context models reread on every decode step.

11 min · June 21, 2026
AI Frontiers

AI Inference TCO 2026: Tokens Beat FLOPS

Blackwell, MI400, and Trainium are competing on delivered tokens per watt, software maturity, and power envelopes, not spec-sheet peak math.

11 min · June 20, 2026
AI Frontiers

vLLM vs SGLang: Pick by Workload Shape

Throughput charts hide the real decision: prefix reuse, structured generation, hardware, and operations determine the right open-source inference server.

11 min · June 20, 2026
AI Economics

Custom AI Silicon Inference Cost Is Now Board-Level

The chip choice only pays off when you model tokens, utilization, memory, power, software drag, and cloud lock-in as one system.

11 min · June 20, 2026
AI Frontiers

The AI Stack Is Fracturing. Here's What Builders Do Now

Compute geopolitics turned frontier models into jurisdictional products. Here's the architecture that survives the next directive.

12 min · June 18, 2026
AI Compute Cost Optimization: Build vs. Buy vs. Lease in 2026 · AI Economics

AI Compute Cost in 2026: Build vs. Buy vs. Lease, by the Numbers

Owning GPUs at high utilization can cost a third of renting them, but the breakeven math punishes anyone who guesses wrong about their workload.

10 min · June 12, 2026

Work with us on chips

Sponsor this coverage

This hub sits in high buyer-intent territory — readers are mid-decision on GPUs, custom AI silicon, and inference hardware, weighing TCO, memory bandwidth, and capacity against real workloads. If you build chip or accelerator products — GPUs, custom silicon, inference hardware, neocloud capacity — and want to reach these buyers with clearly labeled, editorially independent sponsorship, talk to us. No fabricated audience metrics; we share real analytics with serious sponsors.

View sponsor inventory →

Need a chip-selection decision, not a list?

If you are stuck choosing between GPUs, custom AI silicon, and inference hardware against real constraints — TCO budgets, memory-bandwidth ceilings, capacity and lead-time risk, workload shape — a focused advisory session can resolve it. Bring your shortlist, your workload, and your constraints; we hand you a written, prioritized chip-selection and inference-hardware recommendation.

Book an advisory session →

Go deeper on chips

We are building a fuller, constraint-driven framework for AI chip decisions: GPU vs. custom-silicon selection, inference TCO by accelerator, HBM and NVLink sizing, and neocloud vs. bare-metal vs. owned capacity. It is delivered through the biweekly Gen α AI briefing. No spam, unsubscribe anytime.

Get the framework →