Quick brief · Research Desk

Which coding agent is cheapest per solved task right now?

By GenAlphAI Research Desk · July 3, 2026 · Fact-checked


Which Coding Agent Is Cheapest Per Solved Task? (Mid-2026 Desk Brief)

As of 2026-07-03. Pricing and benchmarks shift every few weeks; treat anything not dated within the last 90 days as historical.


1. Executive Summary

The cheapest subscription product for typical individual use is GitHub Copilot Pro at $10/month (as of 2026-07-03). It moved to usage-based AI Credits on June 1, 2026: Pro includes 1,500 monthly credits (1,000 base + 500 flex), and inline completions are free on every paid plan [1][2].

The cheapest per-token model capable enough to run as a coding agent is Google Gemini 3.5 Flash at $1.50 input / $9.00 output per million tokens (1M context, launched May 19, 2026) [3]. Claude Haiku 4.5 is cheaper per token ($1/$5) but weaker at agentic coding.

The best-value model that is also near state of the art at coding is now Claude Sonnet 5 (launched June 30, 2026), at introductory pricing of $2 / $10 per MTok through 2026-08-31 (list $3/$15 from September). It delivers near-Opus coding quality: 85.2% SWE-bench Verified and 63.2% SWE-bench Pro [4][5]. It trails Opus 4.8 on SWE-bench Pro (63.2% vs 69.2%) because it sits a tier below, not because it regressed; it improves on Sonnet 4.6/4.5, which it supersedes [5][23].

For pure "cost per solved task," no vendor publishes a first-party USD figure. Back-of-envelope math on a 100K-input / 10K-output run puts Gemini 3.5 Flash at ~$0.24 per call vs Claude Opus 4.8 at ~$0.75. Flash solves roughly 20% fewer tasks on SWE-bench Pro in relative terms (~55% vs ~69%), so dividing per-call cost by solve rate gives roughly $0.44 per solved task for Flash vs $1.09 for Opus 4.8. Flash wins on raw economics for high-volume, error-tolerant workloads; Opus wins when accuracy per task outweighs dollars per task.

If you must pick one default today, run Gemini 3.5 Flash for high-volume mechanical work (test generation, refactors, docstrings) and Claude Sonnet 5 for hard SWE-bench-class problems. At intro pricing Sonnet 5 is the standout accuracy-per-dollar choice, with Opus 4.8 reserved for the hardest cases.
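The back-of-envelope figures in the summary can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the solve rates are rounded to two digits (Flash's ~55% is an unverified figure), and it assumes exactly one attempt per task with no retries.

```python
# Prices are USD per million tokens, as quoted in the brief.
def call_cost(input_tokens, output_tokens, in_price, out_price):
    """USD cost of one agent run at per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cost_per_solved_task(per_call_cost, solve_rate):
    """Expected spend per solved task, assuming one attempt per task."""
    return per_call_cost / solve_rate


# 100K input / 10K output tokens per run, the brief's reference workload.
flash_call = call_cost(100_000, 10_000, 1.50, 9.00)
opus_call = call_cost(100_000, 10_000, 5.00, 25.00)

# Solve rates are rounded SWE-bench Pro figures (0.55 is unverified).
flash_solved = cost_per_solved_task(flash_call, 0.55)
opus_solved = cost_per_solved_task(opus_call, 0.69)

print(f"Gemini 3.5 Flash: ${flash_call:.2f}/call, ${flash_solved:.2f}/solved task")
print(f"Claude Opus 4.8:  ${opus_call:.2f}/call, ${opus_solved:.2f}/solved task")
```

Real agent runs vary widely in token counts and often retry failed attempts, so these numbers are a directional comparison rather than a forecast of actual spend.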

2. What's Current — Shipping Versions Snapshot

| Product / Model | Latest Version / Tiers | Release Date / Status | Source |
|---|---|---|---|
| Claude Opus 4.8 (Anthropic) | Opus 4.8 | 2026 (exact date unverified) | anthropic.com [6] |
| Claude Opus 4.7 (Anthropic) | Opus 4.7 | 2026 (exact date unverified) | anthropic.com [7] |
| Claude Sonnet 5 (Anthropic) | Sonnet 5 | June 30, 2026 | marktechpost.com [23] |
| Claude Sonnet 4.6 (Anthropic) | Sonnet 4.6 | Feb 2026 (GitHub Copilot GA; exact date unverified) | github.blog [8] |
| Claude Haiku 4.5 (Anthropic) | Haiku 4.5 | Active ($1/$5, 200K ctx) | anthropic pricing [9] |
| GPT-5.5 Thinking (OpenAI) | GPT-5.5 | April 23, 2026 | openrouter.ai [10] |
| OpenAI Codex CLI | open-source (~v0.142.x) | 2026 (open-sourced) | openai.com/codex [12] |
| Gemini 3.5 Flash (Google) | 3.5 Flash | May 19, 2026 | ai.google.dev [3] |
| Gemini 3.1 Pro (Google) | 3.1 Pro | late 2025 / early 2026 (unverified pricing) | ai.google.dev [13] |
| Cursor | Hobby / Pro / Pro+ / Ultra / Teams | Credit model in effect since June 2025 | cursor.com [14] |
| Windsurf / Cascade | Pro / Teams / Enterprise | Re-priced March 19, 2026 | codeium.com [15] |
| GitHub Copilot | Free / Pro / Pro+ / Max / Business / Enterprise | AI Credits since June 1, 2026 | github.com [1][2] |
| Cline (open source) | v3.25+ | continuous | github.com/cline [16] |
| Aider (open source) | latest | continuous | aider.chat [17] |
| Replit Agent | current | continuous | replit.com |
| Devin (Cognition) | current | continuous | devin.ai |
| Amazon Kiro | GA | 2026 (unverified) | amazon press release (unverified) |

3. Key Facts

  • GitHub Copilot Pro is $10/month (as of 2026-07-03), with Pro+ at $39/month and a new individual Max tier at $100/month; the pricing model moved from premium-request-units to AI Credits on June 1, 2026 (overage ~$0.01/credit, reported), and code completions are free on all paid plans [1][2]. (The earlier brief's claim that Pro was "cut from $19 to $10 in April 2026" could not be verified — Copilot Pro has been $10/month; this appears to be invented history and has been removed.)
  • OpenAI's Codex CLI is open-source, making the binary free; users still pay OpenAI per token for completions [11][12].
  • Cursor's Pro plan is $20/month on a usage-credit model; Pro+ is $60/month (~3× Pro credits), Ultra is $200/month (20× Pro credits), and Teams is $40/user/month [14].
  • Windsurf/Cascade Pro is $20/month (raised from $15 on March 19, 2026; pre-change subscribers grandfathered at $15) and switched from credits to daily/weekly quotas [15].
  • Claude Sonnet 5 launched June 30, 2026 at intro $2/$10 per MTok through 2026-08-31 ($3/$15 thereafter), 1M context — 85.2% SWE-bench Verified, 63.2% SWE-bench Pro; it trails Opus 4.8 (69.2% SWE-bench Pro) as a lower-tier sibling but improves on Sonnet 4.6/4.5 [5][23][24].
  • Claude Opus 4.8 is $5/$25 per MTok with a 1M context window; it is the coding-quality leader (69.2% SWE-bench Pro) [6][23]. (Specific Fast Mode price figures and a "4× fewer code flaws vs 4.7" claim from the earlier draft could not be verified and have been cut.)
  • GPT-5.5 (April 23, 2026) doubled per-token pricing versus GPT-5.4 — input $2.50→$5.00, output $15→$30 (Thinking) — with GPT-5.5 Pro at $30/$180; 1M context window. Batch pricing is 50% off ($2.50/$15) [10]. (The "first full retrain since GPT-4.5" characterization is unverified; GPT-5.4 preceded it and GPT-5.6 has since shipped.)
  • Gemini 3.5 Flash launched May 19, 2026 at $1.50 / $9.00 per MTok with 1M context [3] — the lowest published coding-tier price from a frontier lab, though cheaper non-coding options exist (Gemini 3.1 Flash-Lite ~$0.25/$1.50; Claude Haiku 4.5 $1/$5).
  • Anthropic Batch API offers 50% off standard rates; prompt caching cuts cached-input cost by ~90% [9].
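To make the batch and caching discounts concrete, here is a minimal sketch of how they would change the cost of a single call. The 50% and ~90% figures come from the bullets above; everything else is an assumption made for illustration: the function name, the 80% cached share of input, ignoring any cache-write surcharge, and treating the two discounts as stackable.

```python
def effective_call_cost(input_tokens, output_tokens, in_price, out_price,
                        cached_fraction=0.0, cache_discount=0.90,
                        batch=False, batch_discount=0.50):
    """USD cost of one call after optional caching and batch discounts.

    cached_fraction is the share of input tokens served from the prompt
    cache (hypothetical; it depends on how stable the system prompt is).
    Cache-write surcharges are ignored in this sketch.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached input is billed at (1 - cache_discount) of the normal rate.
    input_cost = (fresh + cached * (1 - cache_discount)) * in_price
    output_cost = output_tokens * out_price
    total = (input_cost + output_cost) / 1_000_000
    if batch:
        # Assumption: the batch discount applies to the whole call.
        total *= 1 - batch_discount
    return total


# Claude Opus 4.8 at $5/$25 per MTok on a 100K-in / 10K-out run.
standard = effective_call_cost(100_000, 10_000, 5.00, 25.00)
with_cache = effective_call_cost(100_000, 10_000, 5.00, 25.00, cached_fraction=0.8)
with_batch = effective_call_cost(100_000, 10_000, 5.00, 25.00, batch=True)

print(f"standard: ${standard:.3f}")
print(f"80% of input cached: ${with_cache:.3f}")
print(f"batch: ${with_batch:.3f}")
```

Because output tokens are not discounted by caching, the saving from caching shrinks as the output share of a run grows; the batch discount is the larger lever for output-heavy work.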

4. Numbers & Data

| Model | Input $/MTok | Output $/MTok | SWE-bench Verified | SWE-bench Pro | Terminal-Bench 2.x | Source |
|---|---|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | - | ~69.2% | - | anthropic.com [6] / [23] |
| Claude Opus 4.7 | $5.00 | $25.00 | 87.6% (unverified) | 64.3% (unverified) | 69.4% (unverified) | anthropic.com [7] |
| Claude Sonnet 5 | $2.00 intro / $3.00 | $10.00 intro / $15.00 | 85.2% | 63.2% | - | marktechpost.com [23] / llm-stats [5] |
| Claude Sonnet 4.5 (superseded) | $3.00 | $15.00 | - | - | - | anthropic.com [4] |
| Claude Haiku 4.5 (weak for coding) | $1.00 | $5.00 | - | - | - | anthropic.com [9] |
| GPT-5.5 Thinking | $5.00 | $30.00 | - | - | 82.7% (unverified) | openrouter.ai [10] |
| GPT-5.5 Pro | $30.00 | $180.00 | - | - | - | openrouter.ai [10] |
| Gemini 3.5 Flash | $1.50 | $9.00 | - | ~55.1% (unverified) | 76.2% (unverified) | ai.google.dev [3] |
| Gemini 3.1 Pro | $2.50 (unverified) | $15.00 (unverified) | - | - | - | ai.google.dev [13] |
| Qwen3.7-Max (Alibaba) | low (unverified) | low (unverified) | >80% (unverified) | ~60.6% (unverified) | - | towardsai.net [20] |

Subscription products (mid-2026, USD/month):

| Product | Cheapest paid tier | Top tier | Source |
|---|---|---|---|
| GitHub Copilot | $10 (Pro) | $100 (Max) for individuals; Business/Enterprise separate | github.com [1][2] |
| Cursor | $20 (Pro) | $200 (Ultra) | cursor.com [14][25] |
| Windsurf | $20 (Pro, daily/weekly quotas) | enterprise contract | codeium.com [15][26] |
| Claude Code | $20 (Pro) | $200 (Max 20x) | claude.com [21] |
| OpenAI Codex CLI | free (BYOK tokens) | n/a | openai.com/codex [12] |
| Cline / Aider | free (BYOK) | n/a | github.com/cline [16], aider.chat [17] |
| Amazon Kiro | ~$19–$40 (unverified) | enterprise (unverified) | amazon press release (unverified) |

Derived cost per solved task (back-of-envelope, illustrative only — no vendor publishes first-party $/task):

| Agent | Per-call cost (100K in / 10K out) | SWE-bench Pro solve rate | Cost per solved task |
|---|---|---|---|
| Gemini 3.5 Flash via Cursor/Aider | ~$0.24 | ~55% | ~$0.44 |
| Claude Sonnet 5 via Claude Code (intro pricing) | ~$0.30 | ~63.2% | ~$0.48 |
| Claude Opus 4.8 via Claude Code | ~$0.75 | ~69.2% | ~$1.09 |
| GPT-5.5 Thinking via Codex CLI | ~$0.80 | ~65% (est.) | ~$1.23 |
| GPT-5.5 Pro via Codex CLI | ~$4.80 | frontier | ~$5+ |

At intro pricing, Sonnet 5 posts one of the lowest cost-per-solved-task figures of any accuracy-competitive model — roughly on par with Flash on dollars while solving materially more. Anthropic's Batch API (50% off) and prompt caching (~90% off cached input) can drop the Claude rows further on workloads with stable system prompts [9].
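Since the Sonnet 5 row depends on introductory pricing, it may help to see how the same estimate moves once list pricing applies. The sketch below is a rough illustration under stated assumptions: the function names are hypothetical, the intro window is taken to end on 2026-08-31 as reported in this brief, the solve rate is rounded to 63%, and one attempt per task is assumed.

```python
from datetime import date

INTRO_END = date(2026, 8, 31)  # last day of intro pricing, per the brief


def sonnet5_prices(on):
    """Return (input, output) USD per MTok for Claude Sonnet 5 on a date."""
    return (2.00, 10.00) if on <= INTRO_END else (3.00, 15.00)


def cost_per_solved(on, input_tokens=100_000, output_tokens=10_000,
                    solve_rate=0.63):
    """Per-call cost divided by solve rate, for the brief's reference run."""
    in_price, out_price = sonnet5_prices(on)
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_call / solve_rate


intro = cost_per_solved(date(2026, 7, 3))
listed = cost_per_solved(date(2026, 9, 1))

print(f"intro pricing: ${intro:.2f} per solved task")   # $0.48
print(f"list pricing:  ${listed:.2f} per solved task")  # $0.71
```

Under these assumptions Sonnet 5 at list price would land between Gemini 3.5 Flash (~$0.44) and Claude Opus 4.8 (~$1.09), so the "roughly on par with Flash" observation holds only while intro pricing lasts.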

5. Perspectives

  • Anthropic positions Opus 4.8 as the coding-quality leader ($5/$25) and Sonnet 5 as the value play — near-Opus coding at $2/$10 intro / $3/$15 list, superseding Sonnet 4.6/4.5 [6][23].
  • OpenAI doubled GPT-5.5 per-token prices vs the GPT-5.4 family but argues Terminal-Bench leadership and the free open-source Codex CLI justify the premium [10][11].
  • Google is the price-performance aggressor with Gemini 3.5 Flash at $1.50/$9, positioned against Claude Haiku and GPT-5-mini-class models [3].
  • Cursor / Anysphere and Codeium / Windsurf run credit- and quota-based subscription pricing, which converts to effective per-token rates that depend heavily on workload (Cursor's June 2025 post acknowledged the credit model made cost unpredictable) [14].
  • Cline and Aider, both open source, are the only products where the agent itself is free and users bring their own API keys; total cost = pure model token cost [16][17].
  • Analyst view (aggregators): benchmark coverage skews toward SWE-bench Verified/Pro; cost-per-task figures in third-party blogs are almost universally secondary aggregations of vendor model-card claims and should be treated as directional, not authoritative [18][19][23].
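The point that subscription pricing converts to a workload-dependent per-token rate can be illustrated with simple division. This is a simplified sketch: the function name and the monthly token volumes are hypothetical, and it ignores plan quotas, credit caps, and overage charges, which would change the result in practice.

```python
def effective_rate_per_mtok(monthly_fee_usd, tokens_used):
    """Effective USD per million tokens for a flat monthly fee."""
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return monthly_fee_usd / (tokens_used / 1_000_000)


# A $20/month plan at two hypothetical monthly usage levels.
light = effective_rate_per_mtok(20.00, 5_000_000)
heavy = effective_rate_per_mtok(20.00, 50_000_000)

print(f"5 MTok/month:  ${light:.2f} per MTok")
print(f"50 MTok/month: ${heavy:.2f} per MTok")
```

The same plan looks expensive for a light user and cheap for a heavy one, which is why subscription products are hard to compare directly with per-token API prices.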

6. What To Do

  • Default small-team setup (≤5 engineers, mixed work): GitHub Copilot Pro at $10/mo + Claude Sonnet 5 API for hard problems via Aider. Total ~$10–50/eng/month, and Sonnet 5's intro pricing makes the frontier-quality path unusually cheap right now [1][5][17].
  • High-volume mechanical work (test generation, mass refactor, migrations): Point Aider or Cline at Gemini 3.5 Flash. ~$0.24/call, ~3× cheaper than Opus 4.8, acceptable accuracy for non-critical tasks [3][16][17].
  • Hard SWE-bench-class tasks (production bug fixes, security patches, multi-file refactors): Claude Sonnet 5 first (best accuracy-per-dollar through 2026-08-31), escalating to Opus 4.8 for the hardest cases (~69% solve rate); use the Anthropic Batch API for non-urgent queues to halve cost [5][6][9][21].
  • Cost discipline: Enable Anthropic prompt caching (~90% off cached input) for any agent loop with a stable system prompt; route low-priority work through the Batch API; set per-session token budgets in your agent harness.
  • Avoid: GPT-5.5 Pro ($30/$180) unless Terminal-Bench-class leadership is business-critical; and any pricing claims (e.g. exotic model variants) surfacing only in secondary blogs without primary-source confirmation.
  • Watch list for the next 30–60 days: Sonnet 5 list-price step-up on 2026-09-01 (intro $2/$10 → $3/$15); GPT-5.6 pricing (already shipping as of July 2026); Gemini 3.1 Pro pricing (still not primary-source confirmed); and any cost-per-task disclosures, whether first-party from vendors or from independent evaluators.
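The routing and budgeting advice above could be wired into an agent harness along these lines. This is a minimal sketch, not any particular tool's API: the task categories, the model label strings, and the `TokenBudget` class are all hypothetical names chosen for illustration, and the labels are not real API model identifiers.

```python
from dataclasses import dataclass

# Routing table mirroring the recommendations above (labels are illustrative).
ROUTES = {
    "mechanical": "gemini-3.5-flash",  # test generation, refactors, migrations
    "hard": "claude-sonnet-5",         # SWE-bench-class problems
    "hardest": "claude-opus-4.8",      # escalation tier
}


def pick_model(task_kind):
    """Return the model label for a task category; KeyError if unknown."""
    return ROUTES[task_kind]


@dataclass
class TokenBudget:
    """Per-session token cap: refuses a charge that would exceed the limit."""
    limit: int
    used: int = 0

    def charge(self, input_tokens, output_tokens):
        total = input_tokens + output_tokens
        if self.used + total > self.limit:
            raise RuntimeError(
                f"token budget exceeded: {self.used + total} > {self.limit}"
            )
        self.used += total

    @property
    def remaining(self):
        return self.limit - self.used


budget = TokenBudget(limit=250_000)
budget.charge(100_000, 10_000)  # first agent run
budget.charge(100_000, 10_000)  # second agent run
print(pick_model("mechanical"), budget.remaining)
```

A third run of the same size would raise `RuntimeError`, at which point a harness could stop the session or fall back to a cheaper route, depending on how strict the budget should be.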

References

[1] GitHub Copilot · Plans & pricing · GitHub: https://github.com/features/copilot/plans
[2] GitHub Copilot is moving to usage-based billing · GitHub Blog: https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
[3] Gemini Developer API pricing | Google AI for Developers [first_party]: https://ai.google.dev/gemini-api/docs/pricing
[4] Introducing Claude Sonnet 4.5 [first_party]: https://www.anthropic.com/news/claude-sonnet-4-5
[5] Claude Sonnet 5: Benchmarks, Pricing & Context Window · llm-stats: https://llm-stats.com/models/claude-sonnet-5
[6] Anthropic · Claude Opus (news) [first_party]: https://www.anthropic.com/news/claude-opus-4-7
[7] Anthropic · Claude Opus (news) [first_party]: https://www.anthropic.com/news/claude-opus-4-5
[8] Claude Sonnet is generally available in GitHub Copilot · GitHub Changelog: https://github.blog/changelog/
[9] Anthropic API Pricing 2026 (caching, batch) [content_marketing]: https://www.finout.io/blog/anthropic-api-pricing
[10] GPT-5.5: API Pricing & Benchmarks | OpenRouter: https://openrouter.ai/openai/gpt-5.5
[11] OpenAI Codex CLI · SourceForge mirror: https://sourceforge.net/projects/openai-codex.mirror/files/
[12] Codex | AI Coding Partner from OpenAI [first_party]: https://openai.com/codex/
[13] Gemini 3 Developer Guide | Google AI for Developers [first_party]: https://ai.google.dev/gemini-api/docs/gemini-3
[14] Clarifying our pricing · Cursor [content_marketing]: https://cursor.com/blog/june-2025-pricing
[15] Some changes to our pricing model for Cascade · Codeium/Windsurf: https://codeium.com/blog/pricing-windsurf
[16] cline/cline · Autonomous coding agent (GitHub): https://github.com/cline/cline
[17] Advanced model settings | aider [first_party]: https://aider.chat/docs/config/adv-model-settings.html
[18] 9 главных LLM 2026 года [The 9 leading LLMs of 2026] · vc.ru: https://vc.ru/ai/2905428-luchshie-llm-dlya-zadach
[19] Gemini 3.5 Pro: Release Date, Expected Specs · ofox.ai [content_marketing]: https://ofox.ai/blog/gemini-3-5-pro-release-date-expected-specs-2026/
[20] Qwen3.7 Max, Google's Antigravity U-Turn · Towards AI: https://pub.towardsai.net/qwen3-7-max-googles-antigravity-u-turn-and-a-wild-48-hours-in-ai-bd45589d9e18
[21] Claude Code by Anthropic [vendor_marketing]: https://claude.com/product/claude-code
[22] GitHub Copilot Pricing 2026 · No Code MBA: https://www.nocode.mba/articles/github-copilot-pricing
[23] Anthropic Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8 · MarkTechPost: https://www.marktechpost.com/2026/06/30/anthropic-claude-sonnet-5-vs-sonnet-4-6-vs-opus-4-8-agentic-coding-benchmarks-api-pricing-and-cost-performance-tradeoffs-compared/
[24] GPT-5.5 Pricing · apidog: https://apidog.com/blog/gpt-5-5-pricing/
[25] Cursor Pricing in 2026 (Hobby/Pro/Pro+/Ultra/Teams) · DEV Community: https://dev.to/rahulxsingh/cursor-pricing-in-2026-hobby-pro-pro-ultra-teams-and-enterprise-plans-explained-4b89
[26] Windsurf Pricing 2026 · No Code MBA: https://www.nocode.mba/articles/windsurf-pricing

Verification notes

  • Corrected the central Anthropic error (newer-model-reported-as-worse): the draft recommended superseded Sonnet 4.5 as the current value coding model and framed Sonnet 5 as "same tier but lower benchmark scores." Sonnet 5 (June 30, 2026) is the newer, cheaper, higher-scoring model — 85.2% SWE-bench Verified, intro $2/$10 through 2026-08-31 (list $3/$15) — and now anchors the value recommendation [5][23][24]. Its 63.2% vs Opus 4.8's 69.2% SWE-bench Pro is a cross-tier gap, not a regression, and is kept.
  • Removed invented history: the "GitHub Copilot Pro cut from $19 to $10 in April 2026" claim could not be verified; Copilot Pro is and has been $10/month. Added the missing individual Max tier ($100/mo) [1][2][22].
  • Confirmed and kept: Gemini 3.5 Flash $1.50/$9, 1M ctx, May 19 2026 [3]; GPT-5.5 April 23 2026 at $5/$30 (Pro $30/$180), doubled from GPT-5.4, Batch 50% off [10][24]; Copilot AI Credits since June 1 2026 [2]; Cursor $20/$60/$200/$40 [25]; Windsurf Pro $20 with quotas since March 19 2026 (grandfathered $15) [26].
  • Cut unverifiable specifics: Opus 4.8 "Fast Mode $30/$150 → $10/$50" and "~4× fewer code flaws"; GPT-5.5 as "first full retrain since GPT-4.5"; a Cursor "$20 credit pool" figure. Marked several secondary benchmark numbers (Terminal-Bench, Gemini 3.1 Pro pricing, Qwen figures) "(unverified)" where only aggregator sources exist.
  • Anthropic per-token pricing (Opus 4.8 $5/$25; Sonnet 5 $3/$15 list, $2/$10 intro; Haiku 4.5 $1/$5) verified against the Claude API model/pricing reference (current as of 2026-06-24) and reconciled with third-party reporting [23][24].
  • Nuanced the "cheapest per-token" claim: Gemini 3.5 Flash is the lowest coding-tier price, but Haiku 4.5 ($1/$5) and Gemini 3.1 Flash-Lite (~$0.25/$1.50) are cheaper per token at lower capability — noted explicitly rather than overstated.