Deep report · Research Desk

Yes, AI shipped it — Cursor ~$4B ARR, Lovable $500M, 95% on SWE-bench

Ask HN: Has anyone built anything useful using AI?

By GenAlphAI Research Desk · July 3, 2026 · Fact-checked

The short answer: Yes, AI shipped a lot of useful things, and many of the receipts are dated within the last 90 days. Cursor climbed from $100M ARR (Jan 2025) to roughly $3B ARR in early 2026 and a ~$4B annualized run-rate by June 2026. On June 16, 2026, SpaceX agreed to acquire it at a $60B valuation and place it under its xAI subsidiary (a full acquisition, not an acquihire; xAI had held a right-to-acquire announced April 21, 2026) [1][28]. Stockholm-based vibe-coding tool Lovable crossed $500M ARR by mid-2026 with a ~146-person team (as of 2026-07-03) [2][3][14]. Frontier coding agents now hit 95.0% on SWE-bench Verified (Claude Fable 5, shipped June 9, 2026) [29][30]. Lone founders are shipping production SaaS to seven figures inside a quarter (Base44 sold to Wix for ~$80M cash after six months, June 2025 [31]; Pieter Levels posts $3M+/year from a stack of solo AI products [researcher notes]). The honest counter-evidence (failed pilots, hallucinating agents, the MIT "95% fail" claim) dates from 2024-2026 as well, and it doesn't override the headline.

What's current (2026-07-03 snapshot)

Models and tools shipping right now, with the news from the last 30 days marked.

Product | Latest shipping version | Released | Notes / source
OpenAI GPT-5.5 | Default frontier, $5/M in · $30/M out | Apr 23-24, 2026 | Successor to GPT-5 family; GPT-5.6 only in preview (as of 2026-07-03) [4][32]
OpenAI GPT-5.6 Sol | Preview only | May 2026 | First "next-gen" public preview [4] (unverified)
Anthropic Claude Opus 4.8 | Opus 4.8 (3× cheaper fast mode) | May 28, 2026 | Same price as 4.7; $5/$25 per M (as of 2026-07-03) [6][33]
Anthropic Claude Sonnet 5 | New cheaper agent model | Jun 30, 2026 | Replaces Sonnet 4.6 (Feb 17, 2026); intro price $2/$10 per M through Aug 31, then $3/$15 [7][34]
Anthropic Claude Fable 5 | First Mythos-line release to public, $10/$50 per M tokens | Jun 9, 2026 | Free on paid plans Jun 9–22; 95.0% SWE-bench Verified [29][30]
Google Gemini 3.5 Flash | Cheap workhorse, 1M context default | May 19, 2026 | $1.50/$9.00 per M tokens (3× the old Gemini 3 Flash); ~55% on SWE-bench Pro (as of 2026-07-03) [8][35]
DeepSeek V4 Preview | Cheapest frontier (≈ $0.14/$0.28) | Apr 24, 2026 | Dropped May 2026 price cuts, not just V4 [researcher notes] (unverified)
Cursor 3.9 | Added Cursor Mobile (iOS) and Remote Control across paid plans | Jun 29, 2026 | Composer 2.5 model shipped May 18, 2026
GitHub Copilot Max | New $100/mo tier; usage-based AI Credits | May 12, 2026 (Max), Jun 1, 2026 (credits) | GPT-5.5, Claude Opus 4.7 in Pro+; Claude Haiku 4.5 free tier [9]
OpenAI Codex CLI | "Sites" web hosting + "Annotations" in-place editing | Jun 2, 2026 | ~5M weekly users, ~20% non-developers [researcher notes]
Anthropic Claude Code | Sonnet 5 / Opus 4.8 backend, web/desktop/Slack | Jun 30, 2026 runtime | Major policy refresh Apr 4, 2026 [7][10]
Cognition Devin Desktop / Windsurf | Windsurf merged in; Devin v3.3.18 | Jun 23, 2026 | Cognition valued at $10.2B (Sep 8, 2025) post-Windsurf deal [11]
Replit Agent 4 | "Design Freely / Build Together" redesign | May 2026 | Built on Claude; 35M+ devs [12]

Two things changed in the past month that practitioners should weigh heavily: Claude Sonnet 5 shipped June 30, 2026 as a cheaper way to run agents (~3 days ago), and Claude Fable 5 shipped June 9, 2026 as the first public Mythos-class model, setting a new SWE-bench Verified high of 95.00% [30][34]. The two did not ship on consecutive days; Fable 5 preceded Sonnet 5 by three weeks. On the corporate side, SpaceX agreed to acquire Cursor at a $60B valuation, announced June 16, 2026 [28]. Anything older than three months at this point, including Opus 4.6, Sonnet 4.6, GPT-4.1 and Claude 3-era models, is historical only.

Coding agents crossed 90% on SWE-bench — and the leaderboards are quietly contaminated

As of 2026-07-03, the standardised SWE-bench Verified leaderboard orders the frontier roughly like this: Claude Fable 5 at 95.00%, Claude Opus 4.8 at 88.60%, GPT-5.5 at ~82.60%, Claude Opus 4.7 at ~82.00%, Gemini 3.5 Flash at ~78.80% (the Fable 5 and Opus 4.8 figures are vendor-confirmed; the rest are leaderboard estimates) [29][30]. On Terminal-Bench 2.1 (verified), Codex CLI + GPT-5.5 and Claude Code + Fable 5 both post low-to-high-80s scores depending on the harness and run date [researcher notes] (unverified; independent reports put Fable 5 as high as 88.0% on Terminal-Bench 2.1). Multi-SWE-bench Java (a stricter cross-language test) is still led by IBM's open-source iSWE-Agent for Java at ~33%, far below the SWE-bench Verified peak, which makes it the better signal of how far production engineering agents really are from solved [researcher notes].

Two cautions before you quote those numbers in a deck:

  1. Contamination is a real problem. Recent independent work (Hao Wang, "How We Broke Top AI Agent Benchmarks", 2026; Joye Mang, Qiuyang Mang, "We Scored 100% on AI Benchmarks Without Solving a Single Problem") documents ways agent submissions can pass SWE-bench-style tests without the model doing the actual work — by memorising patch patterns from pre-training corpora. Treat any score as a ceiling, not a guarantee. (Paper attributions unverified.)
  2. Independent reruns matter. Simon Willison's bash-only mini-swe-agent reruns tend to compress vendor-reported spreads on the harder re-run [13]. If a model claims a +6-point lead on SWE-bench over its sibling and the independent rerun shows +0.8 points, the vendor's setup is what produced the headline, not the engineering.
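
The arithmetic behind the second caution can be sketched in a few lines of Python. This is a minimal illustration using the +6 and +0.8 figures from the example above; the function name `spread_compression` is hypothetical, and the inputs are placeholders rather than real leaderboard results.

```python
def spread_compression(vendor_delta: float, rerun_delta: float) -> float:
    """Fraction of a vendor-reported lead that disappears on an independent rerun.

    Both arguments are score differences (in percentage points) between a
    model and its sibling on the same benchmark.
    """
    if vendor_delta <= 0:
        raise ValueError("vendor_delta must be positive")
    return 1.0 - rerun_delta / vendor_delta


# Illustrative figures from the text: +6 claimed, +0.8 on the rerun.
compression = spread_compression(6.0, 0.8)
print(f"{compression:.0%} of the claimed lead did not survive the rerun")
```

With these inputs, about 87% of the claimed lead is attributable to something other than the model, which is why the rerun figure is the safer one to quote.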

For practitioners: Opus 4.8 / Sonnet 5 is the default coding stack in mid-2026, and Codex CLI + GPT-5.5 is a strong terminal-loop pairing. The leaderboard peak says Claude Fable 5; the price-to-capability curve says Sonnet 5 (intro $2/$10 per M through Aug 31, 2026) is the better everyday default [34].
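
To make the price-to-capability point concrete, here is a minimal Python sketch that prices one workload across the models in the table above. The per-million-token prices are the ones quoted in this report; the workload size (2M input tokens, 400k output tokens) and the function name `workload_cost` are assumptions chosen for illustration only.

```python
# Prices in USD per million tokens (input, output), as quoted in this report.
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 5 (list)": (3.00, 15.00),
    "Claude Sonnet 5 (intro)": (2.00, 10.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
}

# Hypothetical agent workload; adjust to your own token mix.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 400_000


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload for the given model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Print models from cheapest to most expensive for this workload.
for model in sorted(PRICES, key=lambda m: workload_cost(m, INPUT_TOKENS, OUTPUT_TOKENS)):
    cost = workload_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model:<26} ${cost:.2f}")
```

Under this assumed token mix, Fable 5 comes to $40 against $8 for Sonnet 5 at intro pricing and $12 at list pricing, a 5× gap at intro rates. A workload with a different input-to-output ratio will shift the ranking among the mid-priced models.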

The revenue stack: AI-native products with real ARR, named in the last 12 months

The most concrete answer to "did anyone build anything useful" is the trail of dollar disclosures published since November 2025.

Company | Latest disclosed ARR | Date | Source class
Cursor (Anysphere) | ~$3B ARR early 2026, ~$4B annualized by Jun 2026; $60B SpaceX acquisition | Jun 16, 2026 | TechCrunch + coverage of the deal [1][28]
Replit | $525M (Sacra est); $9B valuation | Apr 2026 | Sacra profile [researcher notes]
Lovable | $500M ARR; $6.6B valuation (Dec 2025), ~$12B raise in talks Jun 2026 | 2026 | BI + Sacra + Forbes [2][14][36]
Perplexity AI | ~$450–650M ARR (Sacra est); $20B valuation | Jan 2026 | Sacra [15] (range; figure varies by source)
ElevenLabs | $500M ARR (Sacra est); $11B val | Apr 2026 | Sacra / TechCrunch [16]
Harvey (legal) | $300M ARR (Sacra est); 50% of Am Law 100 | May 2026 | Sacra / TechCrunch [17]
Glean (enterprise search) | $300M ARR, first-party | May 28, 2026 | Glean / TechCrunch [18]
Sierra (CX agents) | $165M ARR; 40%+ of Fortune 50 | May 2026 | Bret Taylor / TechCrunch [19]
Suno (music) | $300M ARR, 2M paid subs | Feb 2026 | Unite.AI / Sacra [20]
Mistral AI | $400M ARR (Sacra est) | Jan 2026 | Sacra [21]
Midjourney | $500M CY2025 revenue, no VC | 2025 | TechRT / Sacra [22]
Bolt.new (StackBlitz) | $40M ARR 5 months post-launch | Mar 2025 | Lenny's Newsletter [23]
Decagon (CX) | $35M ARR; valuation $4.5B | Jan 2026 | Forbes / Sacra [24]

The numbers cluster around a few patterns worth pulling out:

  • Cursor's growth is the canonical "useful AI" story. $100M ARR in Jan 2025, $500M by mid-2025, ~$3B by early 2026, ~$4B annualized by June 2026, ending in a $60B SpaceX acquisition [1][28].
  • Lovable is the canonical "useful AI" story for indie builders. $0 → $100M ARR in ~8 months (which Osika claims is faster than any prior software company), $200M by Nov 2025, and ~$500M ARR by mid-2026, with only ~146 employees (as of 2026-07-03) [2][3][14][25].
  • Sierra and Glean are the canonical "enterprise AI" ARR stories. Bret Taylor disclosed Sierra at $165M with 40%+ of the Fortune 50; Arvind Jain disclosed Glean at $300M, repositioned from "AI for work" to "AI budget reduction" [18][19].
  • Sacra figures are analyst estimates, not GAAP. Treat them as upper-range estimates rather than audited revenue. Bret Taylor's Sierra figure and Glean's are first-party; the Perplexity figure in particular ranges from ~$450M to ~$656M across sources.

So if you've been on Hacker News asking whether anyone built anything useful — yes, and the receipts include ~$500M ARR out of Stockholm with a ~146-person team.
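
The per-head arithmetic behind that claim can be sketched in a few lines of Python. The inputs are the approximate figures quoted in this report (Lovable at ~$500M ARR with ~146 people; Cursor at ~$4B annualized with ~300 employees, up from $100M ARR in Jan 2025). The function names are illustrative, and ARR per employee is a rough efficiency proxy, not a metric either company reports.

```python
# (approximate ARR in USD, approximate headcount), as cited in this report.
COMPANIES = {
    "Lovable": (500e6, 146),
    "Cursor (Anysphere)": (4e9, 300),
}


def arr_per_employee(arr_usd: float, headcount: int) -> float:
    """Annual recurring revenue divided by headcount, in USD."""
    return arr_usd / headcount


def growth_multiple(start_arr: float, end_arr: float) -> float:
    """How many times larger the ending ARR is than the starting ARR."""
    return end_arr / start_arr


for name, (arr, headcount) in COMPANIES.items():
    print(f"{name}: ~${arr_per_employee(arr, headcount) / 1e6:.1f}M ARR per employee")

# Cursor: $100M ARR (Jan 2025) to ~$4B annualized (Jun 2026).
cursor_multiple = growth_multiple(100e6, 4e9)
print(f"Cursor growth multiple: {cursor_multiple:.0f}x")
```

On these approximate inputs, Lovable works out to roughly $3.4M of ARR per employee and Cursor to roughly $13.3M, with Cursor's ARR growing 40× over the period.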

Solo founders and YC: ~95% of W25 was AI

The "vibe coding" shift Andrej Karpathy named in early 2025 now describes most of YC's batch composition. Garry Tan and CNBC coverage peg Y Combinator's W25 batch at ~95% AI-native startups [researcher notes] (unverified). W26 numbers circulating this quarter look similar [researcher notes] (unverified).

Named outcomes:

  • Pieter Levels runs Nomad List, RemoteOK, PhotoAI, and a handful of vibe-coded side projects; his public X posts in 2025-2026 put his solo revenue at $3M+/year [researcher notes].
  • Maor Shlomo built Base44 with a small team (eight employees) and ~6 months of work; Wix acquired Base44 for ~$80M cash in June 2025, with additional milestone-based payouts reported since [31].
  • Cursor's Anysphere started as a small YC S23 team and grew to ~300 employees before its $60B SpaceX acquisition [28].
  • Lovable (Anton Osika), based in Stockholm, reached unicorn status roughly 8 months after launch, was valued at $6.6B (Dec 2025), and is reportedly raising at ~$12B as of June 2026 [3][25][36].

If you want the most uncomfortable version of the headline: a developer with good taste and a Claude subscription can credibly target seven-figure ARR in a single quarter.

Enterprise ROI is real — but only where the data loop is tight

The strongest named enterprise deployments in the past 12 months are the ones with disclosed hours-saved numbers.

  • BBVA + OpenAI: a Dec 12, 2025 first-party announcement cited ~120,000 employees using ChatGPT Enterprise, ~3 hours per week saved, and high daily engagement (widely reported; specific figures from researcher notes).
  • Microsoft: Satya Nadella's earnings commentary reiterated that a large share of new code at Microsoft is now AI-assisted, with internal productivity disclosures across the FY26 cycle [27].
  • Shopify: Tobi Lütke's shop-wide AI memo (public 2024-2025) and follow-on calls note merchant-side AI integration growing faster than legacy SaaS seats; specific ARR attribution is hard to pin down.
  • Salesforce Agentforce: Agentforce 360 has reported strong customer growth in 2025-2026; specific per-customer ROI numbers are not first-party public.
  • Klarna: Sebastian Siemiatkowski's 2024 claim that AI does the work of 700 customer-service agents was widely cited, then walked back in 2025 as hybrid human + AI became the actual operating model. A reminder that ROI ≠ labor displacement.
  • IBM Watsonx at BBVA (separate from the OpenAI announcement): tens of thousands of employees reportedly using Watsonx for document automation, with large productivity-savings targets flagged in earnings commentary [researcher notes].

The pattern is unflattering: ROI is real where the use case is high-volume, low-judgment, and internally observable (code, search, tier-1 customer service, document summarisation). It is much weaker where judgment matters, and most "AI replaced N workers" press releases from 2024 had quietly become "AI augments humans" by 2025.

The counterpoint is real too — and it doesn't change the headline

The honest failures since 2024:

  • MIT Project NANDA's August 2025 study ("The GenAI Divide: State of AI in Business 2025") found ~95% of enterprise generative-AI pilots were failing to deliver measurable P&L impact, drawing on ~300 public deployments, 52 case studies, and ~150 leadership interviews (as of 2026-07-03) [37]. The study has been contested by some MIT Sloan faculty as overly pessimistic.
  • McKinsey's State of AI surveys (2024 and 2025) consistently report that only ~20% of organisations attribute material EBIT impact to gen AI.
  • The Air Canada tribunal chatbot case (BC Civil Resolution Tribunal, 2024) held the airline liable for misinformation its support chatbot gave a passenger about bereavement fares [researcher notes].
  • Klarna's 2025 reversal — described above.
  • McDonald's/IBM drive-thru AI was rolled back in mid-2024 after viral failure videos [researcher notes].
  • Apple Intelligence features promised at WWDC 2024/2025 slipped repeatedly; the personalised Siri overhaul was delayed into 2026 [researcher notes].
  • NYC "MyCity" chatbot (built on Microsoft Azure OpenAI) was caught giving illegal advice in 2024 [researcher notes].

These are real. They are also largely tasks where the data loop is missing: high-stakes one-off tasks, novel legal situations, low-volume edge cases. The places AI is shipping useful things in 2026 are the high-volume, low-judgment tasks above.

What you should do

If you're an AI engineer or startup founder in July 2026:

  1. Default to Claude Sonnet 5 + Opus 4.8 for production coding agents. Opus is the ceiling; Sonnet 5 (intro $2/$10 per M through Aug 31, 2026) is the price-aware default. GPT-5.5 + Codex CLI is a strong terminal-loop pairing if you want OpenAI's stack [4][34].
  2. Use Cursor Composer 2.5 if you're shipping an IDE-style agent product; use Claude Code's agent SDK if you're shipping a multi-step background agent. Both are top-of-leaderboard [10].
  3. Watch for contamination evidence when you see a benchmark beat. Independent reruns compress vendor spreads; treat leaderboard peaks as ceilings.
  4. Build solo, or very small, if your category is vibe-coded SaaS. Lovable's ~$500M ARR at ~146 people, Cursor's multi-billion ARR, and Base44's $80M exit show that headcount no longer has to scale with revenue. The constraint is taste + distribution, not headcount [2][31].
  5. For enterprise sales, lead with the time-saved number. BBVA's "~3 hours/week" framing is the purchase-order clincher. Avoid the "replaces workers" framing; it aged badly across 2025.
  6. Plan for the MIT 95% failure mode. The biggest gap between pilots that ship and pilots that stall is integration (data access, permissions, eval loop), not model choice [37]. The winners in late 2026 look less like a prompt and more like a vertically integrated agent with a real evaluation loop.
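
One way to read "a real evaluation loop" in the last item is a fixed set of test cases that every agent change must pass before release. The sketch below is a minimal Python illustration under that assumption. `EvalCase`, `pass_rate`, `should_ship`, the 0.9 threshold, and the toy agent are all hypothetical; the toy agent stands in for a real model call, and the lambda checks stand in for real acceptance criteria.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def pass_rate(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run the agent on every case and return the fraction that pass."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = sum(1 for case in cases if case.check(agent(case.prompt)))
    return passed / len(cases)


def should_ship(agent: Callable[[str], str], cases: list[EvalCase],
                threshold: float = 0.9) -> bool:
    """Gate a release on the pass rate meeting an assumed threshold."""
    return pass_rate(agent, cases) >= threshold


# Stand-in agent for illustration only; a real one would call a model API.
def toy_agent(prompt: str) -> str:
    return prompt.upper()


CASES = [
    EvalCase("refund policy", lambda out: "REFUND" in out),
    EvalCase("reset password", lambda out: "PASSWORD" in out),
    EvalCase("cancel order", lambda out: "invoice" in out),  # deliberately failing case
]

rate = pass_rate(toy_agent, CASES)
print(f"pass rate: {rate:.2f}, ship: {should_ship(toy_agent, CASES)}")
```

The toy agent passes two of three cases, so the gate blocks the release. The design point is that the cases and threshold live outside the prompt, so a model swap or prompt edit is judged against the same bar.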

The direct answer to the HN question: yes, AI shipped useful things — at the scale of billions in ARR per product, in some cases — and the useful ones are almost all backed by an evaluation loop and a tight data context, not by the model alone.

References

[1] Cursor's Anysphere nabs $9.9B valuation, soars past $500M ARR (TechCrunch): https://techcrunch.com/2025/06/05/cursors-anysphere-nabs-9-9b-valuation-soars-past-500m-arr/
[2] Lovable Just Hit $400 Million in ARR (Business Insider): https://www.businessinsider.com/lovables-hit-400-million-arr-doubling-in-a-few-months-2026-3
[3] Lovable Hits $200 Million ARR (Bloomberg): https://www.bloomberg.com/news/articles/2025-11-18/lovable-hits-200-million-arr-and-raising-funds-above-6-billion-valuation
[4] Introducing GPT-5.5 (OpenAI): https://openai.com/index/introducing-gpt-5-5/
[6] Introducing Claude Opus 4.8 (Anthropic): https://www.anthropic.com/news/claude-opus-4-8
[7] Simon Willison on Claude: https://simonwillison.net/tags/claude/
[8] Gemini 3.5 (Google DeepMind): https://deepmind.google/models/gemini/
[9] GitHub Copilot individual plans: Max plan & flex allotments (GitHub Blog): https://github.blog/news-insights/company-news/github-copilot-individual-plans-introducing-flex-allotments-in-pro-and-pro-and-a-new-max-plan/
[10] Hosting the Agent SDK (Claude Code Docs): https://code.claude.com/docs/en/agent-sdk/hosting
[11] Cognition's acquisition of Windsurf (Cognition): https://cognition.ai/blog/windsurf
[12] Replit Agent 4 (Replit): https://replit.com/agent4
[13] Introducing Claude Sonnet 4.6 (Simon Willison): https://simonwillison.net/2026/feb/17/claude-sonnet-46/
[14] Lovable revenue, funding & growth rate (Sacra): https://sacra.com/c/lovable/
[15] How Perplexity hits $656M ARR (Sacra): https://sacra.com/research/how-perplexity-hits-656m-arr/
[16] ElevenLabs revenue, valuation & funding (Sacra): https://sacra.com/c/elevenlabs/
[17] Harvey revenue, valuation & funding (Sacra): https://sacra.com/c/harvey/
[18] Glean's top line crosses $300M (TechCrunch): https://techcrunch.com/2026/05/28/gleans-top-line-crosses-300m-as-ai-budget-cutting-becomes-its-major-selling-point/
[19] Sierra raises $950M (TechCrunch): https://techcrunch.com/2026/05/04/sierra-raises-950m-as-the-race-to-own-enterprise-ai-gets-serious/
[20] Suno Reaches 2 Million Paid Subscribers and $300M ARR (Unite.AI): https://www.unite.ai/suno-reaches-2-million-paid-subscribers-and-300m-arr-in-two-years/
[21] Mistral revenue, funding & news (Sacra): https://sacra.com/c/mistral/
[22] Midjourney revenue, funding & news (Sacra): https://sacra.com/c/midjourney/
[23] Inside Bolt: ~$40m ARR in 5 months (Lenny's Newsletter): https://www.lennysnewsletter.com/p/inside-bolt-eric-simons
[24] AI Agent Startup Decagon Triples Valuation To $4.5 Billion (Forbes): https://www.forbes.com/sites/alexyork/2026/02/06/ai-agent-startup-decagon-triples-valuation-to-45-billion/
[25] Lovable becomes a unicorn with $200M Series A (TechCrunch): https://techcrunch.com/2025/07/17/lovable-becomes-a-unicorn-with-200m-series-a-just-8-months-after-launch/
[27] Microsoft FY2026 Q2 Earnings Call (Microsoft): https://www.microsoft.com/en-us/investor/events/fy-2026/earnings-fy-2026-q2
[28] SpaceX acquires Cursor for $60 billion (Techzine): https://www.techzine.eu/news/devops/142197/spacex-acquires-cursor-for-60-billion/
[29] Claude Fable 5 & Mythos 5 benchmarks (Vellum): https://www.vellum.ai/blog/claude-fable-5-and-mythos-5-benchmarks-explained
[30] Introducing Claude Fable 5 and Claude Mythos 5 (Claude Platform Docs): https://platform.claude.com/docs/en/about-claude/models/introducing-claude-fable-5-and-claude-mythos-5
[31] 6-month-old, solo-owned vibe coder Base44 sells to Wix for $80M cash (TechCrunch): https://techcrunch.com/2025/06/18/6-month-old-solo-owned-vibe-coder-base44-sells-to-wix-for-80m-cash/
[32] OpenAI API Pricing July 2026 (aipricing.guru): https://www.aipricing.guru/openai-pricing/
[33] Anthropic releases Opus 4.8 with new 'dynamic workflow' tool (TechCrunch): https://techcrunch.com/2026/05/28/anthropic-releases-opus-4-8-with-new-dynamic-workflow-tool/
[34] Anthropic launches Claude Sonnet 5 as a cheaper way to run agents (TechCrunch): https://techcrunch.com/2026/06/30/anthropic-launches-claude-sonnet-5-as-a-cheaper-way-to-run-agents/
[35] Gemini 3.5: frontier intelligence with action (Google): https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/
[36] AI Coding Startup Lovable In Talks To Raise At A $12 Billion Valuation (Forbes): https://www.forbes.com/sites/rashishrivastava/2026/06/05/ai-coding-startup-lovable-in-talks-to-raise-funding-at-a-12-billion-valuation/
[37] MIT report: 95% of generative AI pilots at companies are failing (Fortune): https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/

Verification notes

  • Claude Fable 5 release date corrected: June 9, 2026, not July 1, 2026. Multiple sources including Anthropic's platform docs confirm the June 9 ship date (free on paid plans June 9–22). The "just shipped Jul 1" and "Sonnet 5 → Fable 5 on consecutive days" narrative was false and was rewritten; Fable 5 actually preceded Sonnet 5 by three weeks. Price ($10/$50 per M = 2× Opus 4.8) and 95.0% SWE-bench Verified both confirmed [29][30].
  • Gemini 3.5 Flash pricing corrected: $1.50/$9.00 per M, not $0.50/$1.50. The older Gemini 3 Flash was priced at $0.50/$3; 3.5 Flash tripled both input and output pricing at I/O (May 19, 2026). Also flagged that its SWE-bench standing is weaker than stated (~55% on the harder SWE-bench Pro) [8][35].
  • Lovable team size corrected: ~146 employees, not ~50. $500M ARR and $6.6B (Dec 2025) valuation confirmed; added the ~$12B raise reportedly in talks as of June 2026 [14][36].
  • Cursor framing corrected: the June 16, 2026 event is a full $60B SpaceX acquisition (Cursor placed under xAI), not a "$60B acquihire." Added ~$4B annualized run-rate by June 2026 alongside the ~$3B early-2026 ARR [28].
  • Verified and left largely intact: GPT-5.5 ($5/$30, Apr 23-24, 2026), Opus 4.8 (May 28, 2026, same price as 4.7, 3× cheaper fast mode), Sonnet 5 (June 30, 2026, intro $2/$10 then $3/$15), Base44→Wix ~$80M cash (June 2025), and the MIT NANDA "95% of pilots fail" August 2025 study — all confirmed against primary/authoritative sources.
  • Flagged as unverified/soft: GPT-5.6 Sol preview, DeepSeek V4 pricing, Terminal-Bench 2.1 exact standings (independent reports put Fable 5 as high as 88.0%), the benchmark-contamination paper attributions, the YC W25/W26 percentages, and the Perplexity ARR figure (sources range ~$450–656M). These were marked rather than asserted.