Does Claude Science use a new AI model?

No. It runs entirely on Claude Sonnet 4.6 and Claude Opus 4.8, the flagship Anthropic released on May 28, 2026. The product is an application and workflow layer, not new model weights.

What does the Claude Science reviewer agent actually check?

It flags three specific defects: citations that don't exist or are misattributed, numbers with no traceable code or data source, and figures that don't match the code that supposedly generated them. It does not guarantee algorithmic or statistical reproducibility, and it does not catch p-hacking or publication bias.

Is Claude Science safe for proprietary or IP-sensitive research?

Raw datasets, scripts, and outputs stay on your local machine, but prompt content and model responses go to Anthropic under its standard retention policy by default. Teams entering unpublished targets or novel structures should map data flows and confirm retention and Business Associate Agreement (BAA) terms before adopting it.

How do I get access to Claude Science?

Beta access is open to Pro, Max, Team, and Enterprise plans on macOS and Linux, with discounts for academic and nonprofit labs. The AI for Science grant program funds up to 50 projects with up to $30,000 in credits each; applications close July 15, 2026.

Claude Science Is a Workflow Bet, Not a Model Bet

Q: What is Claude Science?

Claude Science is a macOS and Linux desktop workbench, launched by Anthropic on June 30, 2026, that orchestrates existing Claude models (Sonnet 4.6 and Opus 4.8) across genomics, single-cell analysis, proteomics, structural biology, and cheminformatics. It ships no new model; the differentiator is a reviewer agent that audits citations, numbers, and figures before a human sees them.

Anthropic shipped a science product on June 30, 2026, and didn't train a single new weight for it. Claude Science runs entirely on models that already existed: Claude Sonnet 4.6 and Claude Opus 4.8, the flagship Anthropic released on May 28, 2026.

So the interesting thing isn't the IQ. It's the wrapper.

The product is a desktop workbench with 60+ pre-configured scientific databases, seven native artifact renderers, and a background reviewer agent that audits the main pipeline's citations, numbers, and figures before a human ever sees them. TechCrunch framed it as a bet on workflow over a new model. That framing is correct, and it tells you where Anthropic thinks the value in AI-for-science actually sits.

TL;DR

Claude Science is an AI lab workbench that orchestrates existing Claude models across five scientific domains, with a reviewer agent as its centerpiece. The thesis Anthropic is testing: in AI for science, the moat is workflow integration and auditability, not raw model capability.

The hard constraint it doesn't touch is wet-lab validation, and its data-retention default is something proprietary research teams need to read closely.

Key takeaways

No new model. Claude Science is an application layer on Sonnet 4.6 and Opus 4.8.
The reviewer agent flags three specific defects: nonexistent or misattributed citations, untraceable numbers, and figure-code mismatches.
It integrates NVIDIA's BioNeMo Agent Toolkit (Evo 2, Boltz-2, OpenFold3), so structure and genomics tools live inside the workflow.
Raw data stays local. Prompt content leaves under Anthropic's standard retention policy, which matters for IP-sensitive work.
The AI for Science Program funds up to 50 projects with up to $30,000 in credits each; applications close July 15, 2026.

What is Claude Science?

Claude Science is a macOS and Linux desktop application, launched June 30, 2026, that gives researchers one environment to move from literature review to analysis to manuscript preparation without switching tools. It targets five domains: genomics, single-cell analysis, proteomics, structural biology, and cheminformatics.

It builds on Claude for Life Sciences and Claude for Healthcare, the vertical solutions Anthropic shipped in January 2026. Think of those as the schema work; Claude Science is the workbench that sits on top.

Beta access is open to Pro, Max, Team, and Enterprise plans, with discounted seats for academic institutions and nonprofit labs.

Why ship a workbench instead of a model?

Because the bottleneck in AI-assisted science stopped being "can the model reason" a while ago. The bottleneck is trust in the output.

A model that drafts a methods section can invent a citation. It can report a p-value with no traceable path back to the code that computed it. It can hand you a figure that looks right but was generated by a different script than the one in the repo.

None of those are reasoning failures. They're provenance failures, and they're exactly what makes AI-generated findings expensive to verify.

Anthropic's answer is to make auditing a first-class part of the loop instead of a chore the scientist does afterward. That's the whole argument for the reviewer agent.

How the reviewer agent works

The architecture has three tiers. The model layer (Sonnet 4.6 and Opus 4.8) handles language, code generation, and multi-step reasoning. The tooling layer provides the database connections; renderers for protein structures, sequence alignments, and mass-spec data; and a sandbox that runs Python, R, and shell scripts, so researchers don't have to rebuild their existing pipelines.

The third tier is the differentiator. The reviewer agent watches outputs as the main pipeline produces them and flags three failure modes:

Failure mode	What it catches
Incorrect citations	Paper references that don't exist or are misattributed
Untraceable numbers	Statistics or values with no corresponding code or data source
Figure-code mismatch	Visualizations that don't match the code that supposedly generated them
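To make the three checks concrete, here is a minimal sketch of what reviewer-style checks could look like. It is not Anthropic's implementation: the `Draft` structure, the function names, and the idea of tying a figure to its script by content hash are all assumptions for illustration. Note that the citation check only catches keys missing from a trusted index; catching a misattributed citation would require comparing the claim against the cited source, which this sketch doesn't attempt.

```python
import hashlib
import math
from dataclasses import dataclass


def sha256(text: str) -> str:
    """Content hash that ties a figure to the exact text of its script."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Draft:
    # Hypothetical draft structure, not a Claude Science format.
    citations: list[str]        # reference keys cited in the text
    numbers: dict[str, float]   # reported values, keyed by claim name
    figures: dict[str, str]     # figure name -> hash of the script that drew it


def check_citations(draft: Draft, known_refs: set[str]) -> list[str]:
    """Flag citations missing from a trusted reference index."""
    return [c for c in draft.citations if c not in known_refs]


def check_numbers(draft: Draft, computed: dict[str, float]) -> list[str]:
    """Flag reported numbers with no computed source, or that disagree with it."""
    return [
        name for name, value in draft.numbers.items()
        if name not in computed or not math.isclose(computed[name], value)
    ]


def check_figures(draft: Draft, scripts: dict[str, str]) -> list[str]:
    """Flag figures whose recorded script hash no longer matches the script."""
    return [
        fig for fig, script_hash in draft.figures.items()
        if fig not in scripts or sha256(scripts[fig]) != script_hash
    ]


# Example: one unknown citation, one untraceable number, one stale figure.
plot_script = "plt.scatter(umap[:, 0], umap[:, 1])"
draft = Draft(
    citations=["smith2020", "doe2031"],
    numbers={"p_value": 0.03, "n_cells": 5000.0},
    figures={"fig1": sha256(plot_script), "fig2": sha256("old script")},
)
print(check_citations(draft, {"smith2020"}))                              # ['doe2031']
print(check_numbers(draft, {"p_value": 0.03}))                            # ['n_cells']
print(check_figures(draft, {"fig1": plot_script, "fig2": "new script"}))  # ['fig2']
```

Each check returns a list of flagged items rather than raising, which is the shape you'd want if flagged output gets routed back for correction instead of halting the pipeline.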

When it catches something, it can route the output back for correction before it reaches the researcher. Every result gets a provenance record linking it to the specific code, data, and model version that produced it. That's the rerunnable, auditable artifact Anthropic is selling.
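A provenance record of the kind described above could be as simple as content hashes of the code, the input data, and the output, plus a model version string. Anthropic hasn't published its record format, so the sketch below assumes that shape; the field names and the `"model-x"` version string are hypothetical placeholders. The example also shows the limit of a record: two runs can share every input hash and still produce different outputs (unseeded randomness or nondeterministic floating-point math are common causes).

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical fields; not Anthropic's published format.
    result_id: str
    code_sha256: str     # exact script that produced the result
    data_sha256: str     # exact input data
    model_version: str   # model that wrote or ran the analysis
    output_sha256: str   # the result itself

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def make_record(result_id: str, code: bytes, data: bytes,
                model_version: str, output: bytes) -> ProvenanceRecord:
    return ProvenanceRecord(result_id, digest(code), digest(data),
                            model_version, digest(output))


def same_inputs(a: ProvenanceRecord, b: ProvenanceRecord) -> bool:
    """Both results claim the same code, data, and model version."""
    return ((a.code_sha256, a.data_sha256, a.model_version)
            == (b.code_sha256, b.data_sha256, b.model_version))


def rerun_matches(original: ProvenanceRecord, rerun: ProvenanceRecord) -> bool:
    """Same inputs AND the same output: the rerun actually reproduced the result."""
    return same_inputs(original, rerun) and original.output_sha256 == rerun.output_sha256


code = b"print(sum(vals) / len(vals))"
data = b"1.0,2.0,3.0"
original = make_record("mean_expr", code, data, "model-x", b"2.0")
rerun = make_record("mean_expr", code, data, "model-x", b"2.0000000001")
print(same_inputs(original, rerun))    # True
print(rerun_matches(original, rerun))  # False
```

Serializing with `sort_keys=True` keeps the JSON byte-stable, so the record itself can be hashed or diffed when you archive it alongside the output.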

Be precise about what the reviewer agent does and doesn't cover, though. It checks that a figure matches its claimed code. It does not guarantee algorithmic reproducibility: that the same code on the same data yields the same result on someone else's machine.

It says nothing about statistical reproducibility: whether an independent experiment would confirm the finding. P-hacking and publication bias sit far outside its scope. The reviewer agent is real reproducibility infrastructure for a specific class of defect.

It is not a fix for the reproducibility crisis, and you should be suspicious of anyone who pitches it that way.

What the BioNeMo integration buys you

NVIDIA confirmed the BioNeMo Agent Toolkit integration on its own blog and in a newsroom announcement. Three models come along:

Evo 2, a genomic foundation model for DNA sequence analysis.
Boltz-2, for predicting 3D protein conformations from sequence.
OpenFold3, an open-source tool for protein complexes and interactions.

These run through NVIDIA's NIM inference containers, so a researcher can invoke structural biology and genomics computation inside a Claude Science workflow instead of bouncing between tools. The practical win is fewer context switches, not a new state of the art in structure prediction.

For that, AlphaFold 3 and DeepMind's next-generation AlphaFold preview still lead, and Claude Science doesn't try to compete there.

Where Claude Science sits against competitors

The closest direct rival is Microsoft Discovery, which reached general availability in 2026 with Azure's enterprise distribution behind it. FutureHouse's Robin multi-agent system chases the same end-to-end discovery vision. Each is making a slightly different wager.

Player	Product	The bet
Anthropic	Claude Science	Workflow + auditability as the moat
Microsoft	Discovery (GA, 2026)	Agentic platform + Azure distribution
DeepMind / Isomorphic	AlphaFold 3 / next-gen	Structure-prediction supremacy
FutureHouse	Robin	Multi-agent end-to-end discovery
NVIDIA	BioNeMo + Agent Toolkit	GPU-accelerated science models as substrate

Anthropic is the one explicitly betting that the differentiator is whether you can trust and reproduce the output, not whether the model is marginally smarter. If that bet is right, the reviewer-agent pattern shows up everywhere within a year.

The data-retention catch nobody's putting in the headline

Here's the part proprietary research teams need to read twice. The data model splits cleanly. Raw datasets, compute, scripts, and output artifacts stay on your local machine. But prompt content (the text of your queries, instructions, and context) and the model's responses go to Anthropic's infrastructure.

And by default that's under Anthropic's standard retention policy, which permits use of customer content for model improvement unless you've purchased zero data retention. Claude Code documents a zero-retention option prominently; Claude Science does not advertise one as visibly for scientific workflows.

For a team typing an unpublished target, a novel chemical structure, or a genomic finding into a prompt, that's a real IP exposure to evaluate, not a checkbox. If you handle PHI, confirm your Business Associate Agreement covers Claude Science specifically before anything touches it.
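One way to make a data-flow map enforceable is a pre-flight screen that checks prompts for internal identifiers before they leave the machine. This is a hypothetical guard you would build yourself, not a Claude Science feature, and the `CMPD-` and `TGT-` ID formats are invented; substitute your team's real naming conventions. Pattern matching only catches what you anticipated, so it supplements the retention and BAA review rather than replacing it.

```python
import re

# Hypothetical naming conventions; substitute your team's real ID formats.
SENSITIVE_PATTERNS: dict[str, re.Pattern] = {
    "internal_compound_id": re.compile(r"\bCMPD-\d{4,}\b"),
    "unpublished_target": re.compile(r"\bTGT-[A-Z0-9]{3,}\b"),
}


def screen_prompt(prompt: str,
                  patterns: dict[str, re.Pattern] = SENSITIVE_PATTERNS) -> dict[str, list[str]]:
    """Return every sensitive match in a prompt, keyed by pattern name."""
    hits = {}
    for name, pattern in patterns.items():
        found = pattern.findall(prompt)
        if found:
            hits[name] = found
    return hits


def redact(prompt: str, patterns: dict[str, re.Pattern] = SENSITIVE_PATTERNS) -> str:
    """Replace each sensitive match with a placeholder before the prompt leaves."""
    for name, pattern in patterns.items():
        prompt = pattern.sub(f"[{name.upper()}]", prompt)
    return prompt


prompt = "Summarize binding data for CMPD-00412 against TGT-KX7 from the attached assay."
print(screen_prompt(prompt))
# {'internal_compound_id': ['CMPD-00412'], 'unpublished_target': ['TGT-KX7']}
print(redact(prompt))
# Summarize binding data for [INTERNAL_COMPOUND_ID] against [UNPUBLISHED_TARGET] from the attached assay.
```

Whether to block or redact on a hit is a policy call; blocking is safer for novel structures, since a redacted prompt may no longer carry enough context to be useful.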

These details haven't been independently audited yet, so treat them as the starting point for a procurement conversation, not the end of one.

The bottleneck this doesn't touch

Claude Science can generate candidates faster. It cannot validate them. Wet-lab work (synthesis, assays, animal models) still costs roughly $50,000 to $2 million per target depending on complexity, and that's the rate-limiting step in AI-designed science.

So there's a real risk of making your worst constraint worse. Speed up generation and you grow the queue of candidates waiting on validation capacity you already don't have.
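The queue effect is simple arithmetic: once generation outpaces validation, the backlog grows by the difference every period. The sketch below uses hypothetical monthly rates; none of the numbers come from Anthropic or from any real lab.

```python
def backlog_over_time(generated_per_month: int, validated_per_month: int,
                      months: int, start: int = 0) -> list[int]:
    """Candidates still waiting on the bench at the end of each month."""
    backlog = start
    history = []
    for _ in range(months):
        backlog += generated_per_month                 # new candidates arrive
        backlog -= min(backlog, validated_per_month)   # bench clears what it can
        history.append(backlog)
    return history


# Hypothetical lab with 10 wet-lab slots per month.
print(backlog_over_time(8, 10, 6))   # [0, 0, 0, 0, 0, 0]
print(backlog_over_time(24, 10, 6))  # [14, 28, 42, 56, 70, 84]
```

Tripling generation without touching capacity turns a queue that always cleared into one that grows by 14 candidates a month, indefinitely.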

The teams that get value here will pair adoption with discipline downstream: pre-allocate validation budget for the larger candidate volume, write explicit triage criteria for which AI-generated candidates earn a wet-lab slot, wire outputs into your laboratory information management system (LIMS), and cap how many refinement rounds happen before something commits to the bench. Fast analysis without fast kill criteria just produces a bigger backlog.
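Those rules can be written down as code. Here is a minimal triage sketch, assuming your team already produces a score per candidate and a cost estimate; the 0.7 score bar, the three-round cap, the candidates, and the $1 million budget are all hypothetical, with per-target costs chosen inside the $50,000 to $2 million range above. It is not a Claude Science feature.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float            # your own triage score, 0 to 1 (hypothetical scale)
    est_cost_usd: int       # estimated wet-lab cost to validate this candidate
    refinement_rounds: int  # in-silico refinement rounds already spent


def triage(candidates: list[Candidate], budget_usd: int,
           min_score: float = 0.7, max_rounds: int = 3) -> dict:
    """Sort candidates into bench, waitlist, refine, and kill buckets."""
    result = {"bench": [], "waitlist": [], "refine": [], "kill": [], "spent_usd": 0}
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.score >= min_score:
            # Clears the bar: fund it if the pre-allocated budget allows.
            if result["spent_usd"] + cand.est_cost_usd <= budget_usd:
                result["bench"].append(cand.name)
                result["spent_usd"] += cand.est_cost_usd
            else:
                result["waitlist"].append(cand.name)
        elif cand.refinement_rounds >= max_rounds:
            # Hit the refinement cap without clearing the bar: stop iterating.
            result["kill"].append(cand.name)
        else:
            result["refine"].append(cand.name)
    return result


candidates = [
    Candidate("cand-A", 0.92, 400_000, 1),
    Candidate("cand-B", 0.85, 1_500_000, 2),
    Candidate("cand-C", 0.78, 150_000, 3),
    Candidate("cand-D", 0.60, 80_000, 3),
    Candidate("cand-E", 0.55, 60_000, 1),
]
print(triage(candidates, budget_usd=1_000_000))
# {'bench': ['cand-A', 'cand-C'], 'waitlist': ['cand-B'], 'refine': ['cand-E'],
#  'kill': ['cand-D'], 'spent_usd': 550000}
```

Funding greedily by score keeps the ranking legible to the people who have to defend it; a knapsack-style optimizer could squeeze more candidates into the same budget, at the cost of funding lower-ranked work ahead of higher-ranked work.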

What this means for you

Adopt Claude Science if your team context-switches constantly across genomics pipelines, R, and Python analysis, and if reproducibility or audit trails are publication or compliance requirements. The grant program tilts the math further if your timeline fits a July 15 application.

Evaluate carefully if your data is IP-sensitive, since the standard retention default is the live question. Map your data flows first: know exactly what leaves to Anthropic versus what stays local, and archive every output with its provenance record so you can reproduce or defend it later.

Delay if your core need is state-of-the-art structure prediction, where AlphaFold remains the better tool, or if your evaluation process requires independent benchmarks. As of launch there are no peer-reviewed evaluations, no early-adopter reports, and no third-party verification of the 60+ database count or the reviewer agent's exact behavior.

Pilot it against your own work rather than vendor claims.

The reviewer agent doesn't replace expert judgment, and general model hallucination still happens outside the three defects it catches. Keep a domain expert in the loop for anything novel or surprising.

The product to watch isn't this release. It's whether "the model audits its own provenance" becomes table stakes for every serious AI research tool. Anthropic just made the first credible commercial bet that it will.

The launch arrives alongside the company's June 1, 2026, IPO filing and a confirmed Bristol-Myers Squibb partnership reaching 30,000 employees, so there's real weight behind the wager. Now it has to survive contact with people who run experiments for a living.
