Clinical AI's Real Attack Surface Is the EHR Integration, Not the Model

The Heidi Health NEXUS jailbreak proved safety lives in a text layer the model will gladly rewrite, and the VA just multiplied that risk across 130 facilities.

June 28, 2026 · 12 min read
Tags: clinical AI security, EHR AI jailbreak, healthcare AI adversarial attacks

In March 2026, security researchers at Mindgard disclosed that Heidi Health's clinical scribe could be jailbroken in three prompts. The attack, dubbed NEXUS, extracted the system prompt, rewrote the safety guardrails, and reinstalled them as the active rule set. No CVE was assigned. No model weights were touched.

Meanwhile, the Department of Veterans Affairs was scaling ambient AI documentation to over 130 medical centers as of mid-2026, following an October 2025 pilot at 10 facilities using Abridge and Knowtex. The largest clinical AI deployment in US federal healthcare now runs on the same architectural pattern that NEXUS broke.

Clinical AI security is a deployment problem, not a model problem. The attack surface that matters is the integration layer connecting AI scribes to EHRs, RAG pipelines, tool calls, and downstream clinical systems. Fixing the model does nothing for a misconfigured OAuth scope, and patching the EHR does nothing for a prompt injection hidden in a retrieved patient document.

TL;DR

  • The NEXUS attack bypassed Heidi Health's safety guardrails in three conversational prompts by exploiting guardrails stored in the system prompt layer, not the model.
  • The VA is scaling ambient AI scribes to 130+ facilities, multiplying the blast radius of any integration-layer vulnerability across millions of veteran encounters.
  • Five attack vectors dominate clinical AI: prompt injection via EHR text fields, PHI exfiltration via tool calls, persona jailbreaks, indirect injection through patient documents, and supply-chain compromise via model updates.
  • Clinician review is a compensating control, not a primary safeguard. Research shows clinicians review AI output in roughly 11 seconds and develop rubber-stamp acceptance within weeks.
  • Healthcare organizations have more leverage over integration architecture than over model weights, so security investment should concentrate on deployment hardening.

How the NEXUS Attack Broke a Clinical Scribe in Three Prompts

The NEXUS technique, disclosed by Jim Nightingale of Mindgard on March 18, 2026, used a three-phase methodology the researchers called Reveal, Rebuild, Recite.

In the Reveal phase, conversational prompting caused the model to output its own system-level instructions, exposing the safety guardrails and clinical workflow rules that governed the scribe. In Rebuild, the attacker instructed the AI to rewrite those rules to permit previously forbidden behaviors, leveraging instruction-following against the guardrails themselves.

In Recite, the modified rules were established as the operative instruction set, with the AI explicitly affirming the new directives as its active governing policy.

The architectural insight is what matters. Heidi Health's safety guardrails lived in the system prompt, a text layer interpreted at runtime, rather than in the model's weights or an out-of-band content filter.

This design is common across clinical AI vendors because it lets them update safety policies without retraining. It also means any mechanism that can extract or modify the system prompt can bypass every safety control.
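To make the contrast concrete, here is a minimal sketch of an out-of-band check, assuming a hypothetical scribe pipeline in which application code can inspect model output before it is shown or written back. The names `FORBIDDEN_PATTERNS` and `enforce_output_policy` and the two patterns are illustrative assumptions, not Heidi Health's design. The point is only that the rules live in code the model cannot rewrite.

```python
import re
from dataclasses import dataclass

# Hypothetical policy rules kept in application code, outside the prompt.
# A prompt-level attack can change what the model says, but not this table.
FORBIDDEN_PATTERNS = {
    "system_prompt_leak": re.compile(
        r"(?i)\bmy (system|governing) (prompt|instructions)\b"
    ),
    "policy_rewrite_ack": re.compile(
        r"(?i)\b(new|updated) (rules|directives|policy) (are|is) now (active|in effect)\b"
    ),
}


@dataclass
class PolicyDecision:
    allowed: bool
    violations: list


def enforce_output_policy(model_output: str) -> PolicyDecision:
    """Check model output against fixed rules before it leaves the service."""
    violations = [
        name for name, pattern in FORBIDDEN_PATTERNS.items()
        if pattern.search(model_output)
    ]
    return PolicyDecision(allowed=not violations, violations=violations)
```

Pattern matching like this is easy to evade and is not a complete defense. A real filter would more likely combine classifiers, structured output constraints, and logging, but the placement outside the prompt is what matters here.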

As of June 2026, Heidi Health has issued a partial mitigation. The vendor updated its terms of service and added some runtime protections against direct system prompt extraction, per its May 2026 changelog.

The core architectural vulnerability, guardrails in system prompts rather than model weights, has not been fundamentally remediated. Alternative extraction techniques using retrieved context or multi-step reasoning chains remain viable.

Treat this attack class as ongoing risk requiring compensating controls, not a resolved bug.

Why Is the Integration Layer the Real Attack Surface?

Vulnerabilities in clinical AI exist at two distinct levels, and conflating them wastes security budget.

Model-class vulnerabilities reside in the underlying language model: training data bias, safety training limits, hallucination tendency, instruction-following susceptibility, and training data memorization. These require changes to weights, training process, or base architecture. Healthcare organizations cannot remediate them through configuration, and commercial vendors rarely disclose the training data provenance or safety evaluation methodology needed for independent verification.

Deployment-integration vulnerabilities arise from how clinical AI connects to EHRs, clinical workflows, and downstream systems. OAuth scope misconfigurations, excessive API privileges, insufficient audit logging, improper input validation, and misconfigured retrieval pipelines all live here. Replacing the base model does not fix a misconfigured EHR integration. Improving model safety does not prevent PHI exfiltration through an unsecured API endpoint.

The asymmetry is the point. Organizations have direct control over integration architecture, API configurations, authentication flows, and data pipelines. They have almost no control over model properties. Security investment should follow the leverage. Assume model-class vulnerabilities exist and cannot be fully patched, then design integration architectures that limit blast radius when they are exploited.

The NEXUS case sits precisely at the intersection. The model supplied the capability (instruction following). The deployment supplied the vulnerability (system prompt placement). Neither layer alone explains the breach.

What Are the Five Attack Vectors for Clinical AI?

A working taxonomy, mapped to OWASP's LLM Top 10 (2025 v2.0) and MITRE ATLAS, gives red teams a structured foundation for threat modeling.

1. Prompt injection via EHR text fields. EHRs are full of free-text fields: clinical notes, problem lists, medication histories, patient-reported information. An attacker who can influence any field the AI retrieves from can inject instructions the model executes as system directives. Wei et al. (2024) showed indirect injection embedded in retrieved documents persists across conversation turns, influencing behavior even when the user never sees the malicious content. Healthcare-specific outcomes include transcribing sensitive data into wrong fields, suppressing medication alerts, or exfiltrating data through tool calls.

2. PHI exfiltration via tool calls. Clinical AI increasingly retrieves records, checks drug interactions, places orders, and sends messages. Each tool invocation is a data egress path. PHI may flow to third-party services, observability platforms, or unintended output recipients. The OWASP LLM02 sensitive information disclosure category covers this directly. The architectural pattern of streaming scribe output to EHRs, mobile apps, and cloud platforms simultaneously multiplies exfiltration points.

3. Alter-ego and persona jailbreaks. NEXUS is the canonical example. Persona attacks reframe harmful requests as role-play or hypothetical scenarios where safety constraints appear inapplicable. Zhang et al. (2025) documented systematic persona manipulation exceeding 80% jailbreak rates across multiple commercial clinical AI systems. Clinical workflows are inherently role-based and authority-delegated, which makes them unusually susceptible.

4. Indirect injection through patient documents. RAG systems pull from patient history, clinical guidelines, scanned PDFs, legacy EHR exports, and patient-provided documents. Each is an injection vector with limited provenance visibility. An attacker who poisons any retrieved document can persistently influence AI behavior across multiple patient encounters.

5. Supply-chain risk via model updates. OWASP LLM03 covers compromised base models, tampered training data, and malicious dependencies. arXiv research from 2025 showed as few as 100 to 500 poisoned training samples can alter model behavior in clinically significant ways. A March 2026 LiteLLM supply-chain compromise affected enterprise deployments and underscored the systemic risk from dependency vulnerabilities in the AI software stack.
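Vectors 1 and 4 both depend on retrieved text reaching the model as if it were trusted. The sketch below shows one possible pre-processing step, assuming a hypothetical retrieval pipeline: flag instruction-like phrases in EHR free text, quarantine flagged fields, and wrap everything else in explicit delimiters. The phrase list, the function names, and the `<untrusted_ehr_text>` tag are all assumptions made for illustration.

```python
import re

# Illustrative heuristics only; real injections will often not match these.
SUSPICIOUS = [
    re.compile(r"(?i)ignore (all |any )?(previous|prior|above) instructions"),
    re.compile(r"(?i)\byou are now\b"),
    re.compile(r"(?i)\bsystem prompt\b"),
    re.compile(r"(?i)\b(send|forward|post) .* to https?://"),
]


def scan_retrieved_text(text: str) -> list[str]:
    """Return the patterns that matched the retrieved text."""
    return [p.pattern for p in SUSPICIOUS if p.search(text)]


def wrap_untrusted(field_name: str, text: str) -> dict:
    """Quarantine flagged fields; otherwise mark the text as data, not instructions."""
    hits = scan_retrieved_text(text)
    if hits:
        # Withhold the content from the model and surface it for human review.
        return {"field": field_name, "quarantined": True, "hits": hits, "content": None}
    wrapped = f"<untrusted_ehr_text field={field_name!r}>\n{text}\n</untrusted_ehr_text>"
    return {"field": field_name, "quarantined": False, "hits": [], "content": wrapped}
```

Delimiters do not stop a model from following embedded instructions. They only make the trust boundary explicit, so this step belongs alongside output monitoring and tool-call restrictions rather than in place of them.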

How Does the VA Scale-Up Multiply the Risk?

The VA deployment is the largest single clinical AI rollout in US healthcare, and its scale turns integration vulnerabilities into systemic risk.

[Figure: VA Ambient AI Deployment Scale (facilities). October 2025 pilot: 10 facilities. Mid-2026 scale-up: 130 facilities.]

After the October 2025 pilot at 10 VA medical centers, the VA contracted Rise8 and Thoughtworks to extend ambient documentation to over 130 facilities nationwide. The Orlando VA Health Care System described the implementation as enhancing veteran care through ambient scribe technology in primary care.

The vendor transition itself is a security signal. Microsoft and Nuance originally received a VA ambient documentation contract in July 2024. Implementation subsequently shifted to Abridge, Knowtex, Rise8, and Thoughtworks. Mid-deployment vendor changes introduce integration rework, new trust boundaries, and fresh attack surface at exactly the moment operational momentum is highest.

Federal scale brings advantages the rest of healthcare cannot match: CISA coordination, federal cybersecurity frameworks, congressional oversight. It also creates a high-value target. The VA processes military service history, mental health conditions, combat injuries, and service-connected disabilities for a population with elevated privacy concerns.

Security requirements developed for VA ambient AI are likely to set the de facto standard for commercial healthcare AI. Monitor VA audit findings and incident reports as leading indicators.

Is Clinician Review Enough to Catch AI Failures?

The honest answer is no, not as a primary control.

The supporting case is real. The American Medical Association documented ambient documentation tools saving roughly 15,000 clinician hours annually while maintaining documentation quality. Peer-reviewed studies showed improved documentation completeness when clinicians reviewed and corrected AI output.

The countervailing evidence is stronger. Don Shin's 2026 research documented automation bias effects where clinicians develop rapid acceptance patterns that override critical evaluation, often within weeks of deployment.

A 2025 study of AI-assisted clinical decision-making found clinicians reviewed AI recommendations for an average of 11 seconds before accepting or overriding. Pathak's August 2025 Lancet study documented colonoscopist deskilling from AI diagnostic aids, showing progressive erosion of independent clinical judgment.

If clinicians rely on AI for clinical reasoning, they lose the capacity to detect AI errors. That is a security problem, not just a quality problem. Treat clinician review as one layer in defense-in-depth and invest in technical controls that function independently of human attention: input validation, output monitoring, least-privilege integration design, real-time anomaly detection.
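As one example of a control that does not depend on human attention, the following sketch gates every tool call against an allowlist of tools and destination hosts before it executes. The tool names, hostnames, and the `authorize_tool_call` function are hypothetical, and a real deployment would also need to check arguments and payload contents.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: each tool may only talk to the hosts listed for it.
ALLOWED_TOOLS = {
    "lookup_drug_interaction": {"interactions.internal.example"},
    "fetch_patient_record": {"ehr.internal.example"},
}


def authorize_tool_call(tool_name: str, destination_url: str) -> tuple[bool, str]:
    """Deny by default: unknown tools and unlisted hosts are both rejected."""
    allowed_hosts = ALLOWED_TOOLS.get(tool_name)
    if allowed_hosts is None:
        return False, f"tool not allowlisted: {tool_name}"
    host = urlparse(destination_url).hostname
    if host not in allowed_hosts:
        return False, f"destination not allowlisted for {tool_name}: {host}"
    return True, "ok"
```

Because the check runs in the integration layer, it still applies after a successful jailbreak: the model can ask for an unlisted tool or destination, but the call is never made.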

What Should a Clinical AI Red Team Actually Test?

Pre-deployment testing must extend beyond traditional security assessment into AI-specific attack vectors. Threat model with STRIDE, LINDDUN for privacy, MITRE ATLAS for AI-specific attacker actions, PASTA for risk ranking, and AAMI TIR57 for medical-device considerations.

Test categories must include direct and indirect prompt injection, multi-turn jailbreaks, training data extraction, RAG poisoning, agentic tool abuse, clinical hallucination, demographic bias, PHI leakage in outputs and logs, model theft via API extraction, and calibration drift. Required tooling includes Microsoft PyRIT for automated prompt injection, NVIDIA Garak for model vulnerability scanning, Promptfoo for configuration testing, Rebuff for injection detection, and Microsoft Presidio or AWS Comprehend Medical for PHI detection.

Define empirical gate criteria before testing begins: acceptable hallucination rates per clinical scenario, maximum sub-group performance disparities, zero-tolerance PHI leakage in de-identified contexts, and prompt injection success thresholds. No clinical AI should reach production without meeting defined thresholds across all categories.
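Gate criteria are easier to enforce when written down as data before testing starts. The following is a minimal sketch of that idea. The metric names and threshold values are placeholders chosen for illustration, not recommended limits, and each organization would set its own per clinical scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    metric: str
    max_allowed: float


# Placeholder thresholds; the numbers are assumptions, not guidance.
GATES = [
    Gate("prompt_injection_success_rate", 0.01),
    Gate("hallucination_rate", 0.02),
    Gate("subgroup_performance_gap", 0.05),
    Gate("phi_leak_count", 0.0),  # zero tolerance
]


def evaluate_gates(results: dict[str, float], gates: list[Gate] = GATES) -> list[str]:
    """Return a list of failures; an empty list means every gate passed."""
    failures = []
    for gate in gates:
        value = results.get(gate.metric)
        if value is None:
            # A metric that was never measured counts as a failure.
            failures.append(f"{gate.metric}: missing result")
        elif value > gate.max_allowed:
            failures.append(f"{gate.metric}: {value} > {gate.max_allowed}")
    return failures
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a release should not pass because a test category was skipped.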

EHR integration testing is non-negotiable and must happen in the actual deployment context, not a vendor sandbox. The SMART on FHIR standard (HL7 FHIR R4 with OAuth 2.0/OpenID Connect) governs most integrations.

Verify OAuth scopes enforce least privilege, test Bulk Data Export egress controls, validate CDS Hooks integration, and confirm write-back role-based access control. The AI must not escalate privileges beyond authorized scopes, persist tokens past session expiration, or write to the EHR without explicit clinician authorization.
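A least-privilege check on granted scopes can be automated. The sketch below assumes SMART v1-style scope strings (for example `patient/Encounter.read`) delivered as the space-delimited `scope` value of an OAuth 2.0 token response. The allowlist is a hypothetical example for a scribe, and `audit_scopes` is an illustrative name.

```python
# Hypothetical allowlist for an ambient scribe; adjust to the actual workflow.
ALLOWED_SCOPES = {
    "launch",
    "openid",
    "fhirUser",
    "patient/Encounter.read",
    "patient/Condition.read",
    "patient/DocumentReference.write",
}

# Scopes that let a client keep access beyond the interactive session.
PERSISTENT_SCOPES = {"offline_access"}


def audit_scopes(granted: str) -> dict:
    """Classify granted scopes into wildcard, persistent, and unexpected."""
    scopes = set(granted.split())
    wildcard = {s for s in scopes if "*" in s}
    persistent = scopes & PERSISTENT_SCOPES
    unexpected = scopes - ALLOWED_SCOPES - wildcard - persistent
    return {
        "wildcard": sorted(wildcard),
        "persistent": sorted(persistent),
        "unexpected": sorted(unexpected),
        "ok": not (wildcard or persistent or unexpected),
    }
```

Run against the scopes the authorization server actually granted, not the ones the application requested, since the two can differ.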

What This Means for You

Three decisions matter most if you are procuring, deploying, or securing clinical AI in mid-2026.

First, treat the system prompt as sensitive security infrastructure, not configuration. Protect it against extraction through access controls, monitoring, and architectural choices that minimize prompt exposure. Ask vendors where their guardrails live and how they survive an extraction attempt.

Second, concentrate security spend on the integration layer where you have leverage. Require HITRUST CSF r2 certification, SOC 2 Type II, ISO/IEC 42001:2023 for AI management, and FedRAMP authorization for federal data.

Demand penetration test reports dated within the last 12 months that explicitly cover prompt injection, indirect injection, jailbreaks, RAG poisoning, and model extraction. Contractual provisions must prohibit training on customer PHI, guarantee model isolation, and specify hallucination accuracy SLAs.

Third, build logging that satisfies HIPAA audit controls under 45 CFR §164.312(b) and captures AI-specific signals: prompt input hash, model version, retrieval context, tool invocations, output, clinician override decisions, and drift indicators. Incident response playbooks must include AI-specific scenarios: hallucinated clinical summaries leading to incorrect treatment, PHI spillover, mass drift events, and agent tool abuse.
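A minimal sketch of such a log record follows, assuming one record per AI interaction. The field names and the `build_audit_record` function are assumptions. The sketch stores a hash of the output rather than the text itself so that PHI does not land in the log stream, which is a design choice some organizations will reverse if they need full output for investigation. Drift indicators are left out because they are computed across many records.

```python
import hashlib
from datetime import datetime, timezone


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_audit_record(
    prompt: str,
    model_version: str,
    retrieved_doc_ids: list,
    tool_calls: list,
    output: str,
    clinician_action: str,
) -> dict:
    """Build one JSON-serializable audit entry for a single AI interaction."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_sha256": _sha256(prompt),           # hash, not raw prompt text
        "model_version": model_version,
        "retrieved_doc_ids": list(retrieved_doc_ids),  # retrieval context by reference
        "tool_calls": list(tool_calls),
        "output_sha256": _sha256(output),
        "clinician_action": clinician_action,       # e.g. "accepted", "edited", "rejected"
    }
```

Logging document identifiers instead of document bodies keeps the retrieval context traceable without duplicating PHI into a second system.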

The model will ship a new version next month. The integration architecture is what you actually own. Harden that.

Frequently asked questions

What is the NEXUS jailbreak that affected Heidi Health?

NEXUS is a three-prompt 'Reveal, Rebuild, Recite' attack disclosed by Mindgard in March 2026 that extracted Heidi Health's system prompt, rewrote its safety guardrails, and reinstalled them as active rules. It exploited guardrails stored in the system prompt layer rather than model weights, so it bypassed safety controls without touching the underlying model.

Is clinician review an effective security control for clinical AI?

Only partially. Clinician sign-off catches some errors, but research on automation bias shows clinicians review AI recommendations for about 11 seconds on average and develop 'rubber stamp' acceptance within weeks. Treat clinician review as a compensating control, not a primary safeguard, and pair it with technical controls that function independently of human attention.

How is the VA deploying ambient AI scribes?

After an October 2025 pilot at 10 VA medical centers using Abridge and Knowtex, the VA contracted Rise8 and Thoughtworks to scale ambient documentation to over 130 facilities as of mid-2026. This is the largest single clinical AI deployment in US healthcare and processes highly sensitive veteran health information.

What is the primary attack surface for clinical AI?

The integration layer connecting AI to EHRs, clinical workflows, and downstream systems. This includes API connections, OAuth scopes, RAG pipelines, tool invocations, and system prompt placement. Organizations have far more control over integration architecture than over model weights, so security investment should prioritize deployment hardening.

What red-team tools should healthcare use for clinical AI?

Use Microsoft PyRIT and NVIDIA Garak for prompt injection and model vulnerability scanning, Promptfoo for configuration testing, Rebuff for injection detection, and Microsoft Presidio or AWS Comprehend Medical for PHI leakage testing. Define empirical gate criteria, including zero-tolerance PHI leakage and jailbreak success thresholds, before testing begins.