Google ships the broadest agentic-coding stack in the industry, and it is still losing the only race that decides the category.
As of June 2026, Google fields five distinct coding products on top of its own foundation model: Gemini CLI, the async agent Jules, Gemini Code Assist in the IDE, the Antigravity platform, and a Gemini-powered Android Studio. None of OpenAI or Anthropic matches that surface area.
And yet Google's best coding model, Gemini 3.1 Pro, trails Anthropic's Claude Opus 4.8 by roughly 8 points on SWE-bench Verified.
That gap is the whole story. This is a piece about Google Gemini coding and whether breadth can beat a benchmark lead.
TL;DR: Can Google catch up on coding?
Google can plausibly close the agentic-coding gap, but it is unlikely to lead the category in the next 6 to 12 months. Gemini 3.1 Pro leads the LiveCodeBench Pro Elo leaderboard (2887) and Google owns distribution assets nobody can copy, including Android Studio, Workspace, and TPU-driven pricing. But Claude Opus 4.8 holds SWE-bench Verified at 88.6%, Claude Code already commands roughly 7 million weekly npm downloads, and Google's standalone Gemini CLI free tier is being folded into Code Assist on June 18, 2026. The model lead matters more than the product count. Until the SWE-bench gap closes, Google sits one tier below the leader.
Key takeaways
- SWE-bench Verified is the scoreboard. Claude Opus 4.8 leads at 88.6%; Gemini 3.1 Pro is at 80.6%, an 8-point gap that widens on the harder SWE-bench Pro split.
- Google does lead somewhere. Gemini 3.1 Pro tops LiveCodeBench Pro Elo at 2887, a competitive-programming benchmark.
- OpenAI owns the terminal. GPT-5.5 Codex CLI scores 83.4% on Terminal-Bench 2.1.
- Install base is lopsided. Claude Code's ~7M weekly downloads dwarf Codex (mid six figures) and Gemini CLI (low six figures).
- The CLI sunset is a real signal. Folding Gemini CLI's free tier into Code Assist on June 18, 2026 reads as an admission the standalone CLI did not pay its way.
Where does Google Gemini coding actually win?
Start with the win, because it's real and underreported. Gemini 3.1 Pro holds the top spot on LiveCodeBench Pro Elo at 2887, an independent, continuously-updated leaderboard that rewards code generation and competitive-programming-style problems.
That benchmark is not a vendor self-report. It's one of the strongest first-party-supported claims of Google coding leadership available right now.
The catch is what LiveCodeBench measures. It rewards writing correct code from a prompt, not navigating a large repository to fix a bug across multiple files. Those are different skills, and the second one is what most working engineers actually pay for.
So Google leads the benchmark that looks like a coding interview and trails the benchmark that looks like a Tuesday.
Why SWE-bench Verified is the gap that matters
SWE-bench Verified takes real GitHub issues and checks whether an agent's patch passes the repo's real tests. It is the most-cited proxy for "can this thing do my job."
Here is the mid-2026 picture across the three vendors and the major harnesses.
| Benchmark | Leader | Score | Google's result |
|---|---|---|---|
| SWE-bench Verified | Claude Opus 4.8 | 88.6% | Gemini 3.1 Pro 80.6% |
| SWE-bench Pro | Claude Opus 4.8 | 69.2% | Gemini 3.1 Pro ~54.2% |
| Terminal-Bench 2.1 | GPT-5.5 (Codex CLI) | 83.4% | Gemini 3.1 Pro 80.2% (TB 2.0) |
| LiveCodeBench Pro Elo | Gemini 3.1 Pro | 2887 | (leads) |
Source: swebench.com leaderboard, tbench.ai, and LiveCodeBench, all as of June 2026.
The 8-point Verified gap is meaningful, and it gets worse on SWE-bench Pro, the harder curated split, where Gemini sits around 54% against Opus 4.8's 69%. The gap is widest exactly where the tasks are hardest.
One number worth flagging and then setting aside: Anthropic's Claude Fable 5 reached 95.0% on SWE-bench Verified on June 9, 2026, then had the record suspended on June 12 with no stated reason. Treat that 95% as cautionary, not as a shipping result. Whatever Anthropic ships next will reset the board.
How do Jules, Codex Cloud, and Claude Code on the web compare?
Jules is Google's async agent: it clones a repo, plans, runs tests, and opens a pull request, all in the background. It's available to Google AI Pro and Ultra subscribers and wired into GitHub.
Jules has one genuine architectural distinction. It runs Gemini 3.x directly, where Codex Cloud rides a tuned GPT-5.5 and Claude Code on the web runs Opus 4.8. It's the only async agent using the vendor's raw foundation model, and it's the only one integrated with Workspace, Cloud Build, and Vertex AI.
| Async agent | Model | Ecosystem hook | Adoption signal |
|---|---|---|---|
| Jules | Gemini 3.x | Workspace, Vertex, Cloud Build | Lowest of the three |
| Codex Cloud | GPT-5.5 | ChatGPT Pro/Business/Enterprise | Strong CLI-adjacent base |
| Claude Code (web) | Claude Opus 4.8 | GitHub, Bedrock | Largest CLI install base |
The honest read: Jules is plausible, available, and well-integrated inside the Google ecosystem. It is not yet a destination product for developers who don't already pay Google for AI Pro or Ultra.
Independent community reports through Q2 2026 also flag Jules and Antigravity's managed agents as less reliable on long-horizon, multi-file refactors than Codex Cloud or Claude Code on the web.
What is Antigravity, and does the platform bet work?
Antigravity is Google's most distinctive 2026 move. It's no longer one product but a platform with five surfaces: an IDE, a desktop app (Antigravity 2.0), a CLI, an SDK, and a Managed Agents tier.
The default model is Gemini 3.5 Flash, which went GA on May 19, 2026 at $1.50 / $9.00 per 1M input/output tokens.
The architectural argument is a shared SDK across IDE, CLI, and managed tier. The Managed Agents service lets a customer run Google-operated coding agents on a hosted VM without standing up infrastructure, closer to OpenAI Assistants or Bedrock Agents than to a plain CLI.
Adoption is early. Community upvotes and download trends put Antigravity 2.0 well below Cursor and Copilot install counts. The platform is real; the traction is unproven.
The one place Google clearly owns the stack
Android Studio is the cleanest "Google ahead" datapoint in the comparison. Google controls the model, the agent, and the IDE, and the IDE is mandatory for the platform.
The latest stable releases are Otter 2025.2.x and Panda 2 (2025.3.2), shipping Studio Agent, Gemini in Console, and chat plus completion out of the box. Neither OpenAI nor Anthropic owns a development environment of comparable strategic weight.
The bound on this advantage: Android developers are a smaller slice of the coding-tool market than web and backend developers, and Google has never published how many Android Studio users actually turn on the Gemini features. Strong product, bounded reach.
How OpenAI and Anthropic are defending the lead
OpenAI's Codex is the most mature CLI ecosystem of the three. The latest stable CLI is v0.140.0, shipped June 15, 2026, adding /usage, session delete, Claude Code config import, OS-keyring credentials, and a @ picker for files and skills.
The default model is GPT-5.5, launched April 23, 2026 with 1M context and a self-reported 88.7% on SWE-bench Verified. Codex moved to token-based pricing on April 2, 2026, with a $100/month "Pro 5x" tier built for agent workloads.
Codex launched in May 2025, six months ahead of Gemini CLI, and the head start shows in community skills and slash commands. Its weakness mirrors Google's strength: OpenAI owns no IDE of strategic weight.
Anthropic's Claude Code is the reference product. Above 7 million weekly npm downloads, IDE coverage across VS Code, JetBrains, Cursor, and Zed, and the highest production SWE-bench Verified score (Opus 4.8 at 88.6%).
The "agent skills" pattern that Gemini CLI and Codex now copy originated here. Anthropic reportedly hit a $30B run-rate in 2026 H1 with enterprise share at 34.4%, narrowly ahead of OpenAI.
What this means for you
If you're choosing a coding agent right now, here's the practitioner read.
For repo-scale bug fixing and refactors, default to Claude Code or Codex. The SWE-bench and Terminal-Bench leads track real-world reliability on multi-file work, and the community tooling is deeper. Claude Code if you want the largest skills ecosystem; Codex if you want token-based pricing tuned to agent token burn.
For greenfield code generation and competitive-style problems, Gemini 3.1 Pro is genuinely top-tier. Its LiveCodeBench Elo lead is real. If your workload is "write this function" more than "fix this legacy module," test it head-to-head.
If you live in the Google ecosystem, the calculus shifts. Android Studio, Workspace, and Vertex integration are advantages no competitor can replicate. Jules and Code Assist are the path of least resistance if you already pay for AI Pro or Ultra.
Watch the June 18, 2026 CLI cutover. If you rely on the free Gemini CLI tier, your seat moves to Gemini Code Assist Standard or Enterprise. The open-source code and the free Individual tier survive, with limits.
The signals that will tell you if Google is catching up
Three things would change the verdict, and you can watch all of them.
First, the SWE-bench Verified leaderboard. If Gemini 3.5 Pro or a 4 Pro posts above 88% on an independent harness, the gap closes and this article needs a rewrite.
Second, Gemini CLI npm downloads. If that metric climbs from low six figures toward 1M/week, the ecosystem is catching up to the install base.
Third, Antigravity daily active users. A Cursor-class number on the IDE would prove the platform bet is working.
Google's structural assets are durable. TPU Ironwood gives it a per-token cost edge that funds $1.50 Flash pricing while staying profitable, and Workspace plus Android distribution is unique. But developer surveys still put Gemini a clear third behind ChatGPT and Claude on the "AI tool you use regularly" question.
Catch-up is more likely than leadership in 2026. The model gap is the thing to watch. Everything else Google brings, however strong, sits one tier below until SWE-bench Verified says otherwise.
Sources
- Gemini 3.5 Flash Model Card, Google DeepMind
- Gemini 3.1 Pro, Google DeepMind
- Gemini Developer API Pricing, Google AI for Developers
- Gemini Code Assist Release Notes, Google Cloud
- SWE-bench Verified Benchmark 2026, BenchLM
- Claude Opus 4.8, Awesome Agents
- LiveCodeBench Leaderboard
- OpenAI Codex, GitHub
- OpenAI's GPT-5.5 Powers Codex on NVIDIA Infrastructure
- Codex Pricing, OpenAI Developers
- Claude Code Overview, Anthropic
- Antigravity / Gemini 3.5, Google DeepMind
- Android Studio Panda 2 Release Notes, Android Developers
- Gemini (language model), Wikipedia
- 2025 Stack Overflow Developer Survey
- OpenAI News, Reuters
