Gen α AI · Field notes for AI builders

Depth over hype, for people who bet on AI.

Evidence-first analysis of agentic systems, model evaluation, and the economics of AI software. We read the system card, find the primary source, and tell you what actually changed — and what didn't.

Evidence over vibes · Depth over volume · Honest about uncertainty
170 · Deep dives published
9 · Evergreen pillar guides
Biweekly · The field briefing
Editor’s picks
New here? These are the pieces we’d hand you first.
The latest
Fresh analysis, published continuously. The full archive lives in the rail and the pillars below.
Clinical AI's Real Attack Surface Is the EHR Integration, Not the Model · Security & Safety

A Clinical Scribe Fell to Three Prompts. The VA Scaled It to 130 Sites

The Heidi Health NEXUS jailbreak proved safety lives in a text layer the model will gladly rewrite, and the VA just multiplied that risk across 130 facilities.

12 min · June 28, 2026
AI Economics

OpenAI's Jalapeño Chip Is an Inference Hedge, Not a Nvidia Killer

The first OpenAI custom AI chip keeps the API intact, which is the part earlier custom-silicon efforts got wrong.

12 min · June 28, 2026
How to Debug an AI Agent Incident: A Postmortem Playbook · Agents & Harnesses

Your AI Agent Went Rogue on Friday. Here's the Postmortem

A blameless, SRE-style framework for the five failure modes traditional incident response was never built to handle.

17 min · June 28, 2026
HIPAA, GDPR, and the EU AI Act: One Stack, Three Frameworks, Five Weeks · Security & Safety

Five Weeks Until EU AI Act High-Risk Day. Is Your Stack Ready?

The August 2, 2026 high-risk deadline stacks three compliance regimes onto a single AI product. Here's how to satisfy them simultaneously.

11 min · June 28, 2026
Why Your LLM Judge Needs a Cohen's Kappa Before It Ships · Model Evaluation

LLM-as-Judge Reliability: The Cohen's Kappa Every Production Eval Needs

Static benchmarks are saturated; the binding constraint on shipping LLM products is now judge reliability over time, templates, and human labels.

12 min · June 28, 2026
Why Memory Bandwidth, Not Compute, Now Sets LLM Inference Cost · Memory & Context

Why Memory Bandwidth, Not Compute, Is the LLM Inference Bottleneck

Compute grew ~80x in a decade while bandwidth grew ~17x, and the KV cache turns every decoded token into a memory fetch.

12 min · June 28, 2026
Government-Gated AI: Who Now Decides What Frontier Models You Can Run · AI Frontiers

Government-Gated AI: Who Decides What Models You Can Run

The BIS has turned frontier model deployment into a licensed activity, and the Anthropic Mythos and GPT-5.6 arcs show the new rules of access.

10 min · June 28, 2026
Voice AI Under 500ms: Latency Architecture for Agents · AI Economics

Voice AI Under 500ms: The Latency Budget That Decides Who Ships

Sub-500ms round trips are the line between a voice agent people prefer and one they hang up on; here's the architecture that gets you there.

12 min · June 27, 2026
How to Evaluate LLM Agents in Production When Benchmarks Skip Safety · Model Evaluation

15 Agent Benchmarks, Zero Safety Scores. Here's the Fix.

A systematic review found no leading agent benchmark integrates safety scoring, so production teams must build their own evaluation loop.

12 min · June 27, 2026
LPU vs GPU Inference: What Groq's Numbers Actually Settle · Model Evaluation

LPU vs GPU Inference: Groq's 70% Latency Win, Decoded

The bifurcation debate is over on paper and messy in production; here is the practitioner's read on cost, latency, and routing.

12 min · June 27, 2026
92% of Teams Blew Their AI Budget. Here's the AI FinOps Fix · AI Economics

92% Blew Their AI Budget. AI FinOps Is the Fix

Token bills are running 2-5x over plan. Treat inference spend as an engineering problem and the math pays back in weeks.

11 min · June 27, 2026
The US vs Them: Fable Off, GPT-5.6 Gated · Models & Releases

The "US" vs Them: Fable Off, GPT-5.6 Gated

Washington switched Fable off for the planet, then opened GPT-5.6 to twenty vetted U.S. orgs. Frontier AI access is now a sovereignty variable.

18 min · June 26, 2026
Explore the pillars
Nine durable guides that organize everything we publish.
AI Tools · 16 pieces

AI Coding Tools in 2026: The Power-User Field Guide

The gap between demo and production is the harness you build around the model, not the…

Search & GEO · 11 pieces

Generative Engine Optimization: How to Earn AI Citations

Search is becoming synthesis. If ChatGPT, Perplexity, and Google's AI Overviews don't cite…

Agents & Harnesses · 20 pieces

Agent Harness Engineering and Agentic Loops: 2026 Field Guide

Execution loops, externalized state, and verification gates now matter more than raw model…

AI Economics · 17 pieces

AI Coding Agent Economics: Real ROI and Cost per Pull Request

Frontier labs now ship more AI-written code than human-written code, but the viral ROI…

Model Evaluation · 25 pieces

Evaluating AI Models and Agents: The 2026 Field Guide

Why static leaderboards lost authority, and how to build an eval program that survives…

Memory & Context · 16 pieces

Context Engineering for AI Agents: Memory, RAG & MCP

Why the context window, not the prompt, is the real bottleneck, and how to engineer…

Security & Safety · 11 pieces

Securing AI Agents and LLM Apps: The 2026 Threat Model

Why indirect prompt injection, tool-mediated exfiltration, and rogue agents now define LLM…

Models & Releases · 16 pieces

AI Models 2026: The Mid-Year Frontier and Open-Weight Map

How the open-weight cluster closed the gap, why reasoning became the default, and which of…

AI Frontiers · 38 pieces

AI Frontiers 2026: Diffusion Models, Multimodal AI & More

A practitioner's map of frontier AI in mid-2026, where independent measurement finally…
