On June 30, 2026, an independent developer going by @thereallo published a byte-level teardown of Claude Code 2.1.196 and found something invisible sitting inside the system prompts: apostrophes that weren't apostrophes.
The tool was swapping plain ASCII apostrophes (U+0027) for lookalike Unicode variants like the right single quotation mark (U+2019) and the modifier letter apostrophe (U+02BC). Different codepoints encoded different bits.
Together they formed a steganographic signature of your deployment, keyed on your configured ANTHROPIC_BASE_URL and local timezone. The original disclosure on thereallo.dev hit #1 on Hacker News with 2,148 points and 497 comments.
The next day, Anthropic confirmed it. Head of Policy Thariq Shihipar posted on X that it was a March 2026 experiment to stop unauthorized resellers and model distillation, that it was never meant to track individual users, and that it had been turned off. The fix shipped in version 2.1.197 with no changelog entry.
This is a small feature with a large lesson. The prompt layer is now a place where AI coding tools can write things you can't see, and where enterprise trust either holds or breaks.
TL;DR
Claude Code embedded invisible Unicode apostrophe-variant markers into system prompts to encode deployment metadata, activating when the configured API endpoint matched Chinese providers DeepSeek, Moonshot, or Zhipu. Anthropic called it an anti-abuse experiment and silently removed it in version 2.1.197.
No peer AI coding tool has a documented equivalent as of July 2, 2026. The practical takeaway: system prompt markers are an auditable trust surface, and you can inspect them yourself in about twenty lines of Python.
What is AI coding assistant telemetry in the prompt layer?
AI coding assistant telemetry usually means the obvious channels: usage analytics, crash reports, request metadata sent over HTTP. Prompt-layer telemetry is different. It hides deployment or identity signals inside the text of the system prompt itself, so the data travels wherever the prompt travels, including into model output, and stays invisible in any normal terminal or editor.
That's what makes it worth your attention. A user-agent header is inspectable and expected. An invisible character woven into a sentence like "Today's configuration" is neither.
Key takeaways
- Claude Code 2.1.196 used four apostrophe variants (U+0027, U+2019, U+02BC, U+02BD) to encode roughly 14 to 28 bits per prompt: domain configuration, timezone, and a checksum.
- The markers activated against an obfuscated domain list matching DeepSeek, Moonshot, and Zhipu endpoints, which is why the community read competitive-intelligence intent into a stated anti-abuse feature.
- Apostrophe variants survive LLM tokenization and can appear in model output, unlike some zero-width characters that get stripped in processing.
- Anthropic acknowledged the practice but published no blog post and no changelog entry for the removal. Users of 2.1.197 had no way to know the behavior changed.
- Across eight peer tools reviewed, none had a documented Unicode marker practice. Open-source tools publish their prompts, so hiding one is structurally hard.
How the marker actually worked
The mechanism is simple once you see it. Claude Code took two deployment-specific values as keys: the ANTHROPIC_BASE_URL you configured, and your system timezone (America/Los_Angeles, Asia/Shanghai, and so on).
It then inserted apostrophe-variant characters at chosen positions in the system prompt. Each variant carried about 2 bits, so the choice of which apostrophe appeared where spelled out an encoded fingerprint of your setup.
The trigger was a domain match. When the base URL matched patterns for DeepSeek (api.deepseek.com), Moonshot (api.moonshot.cn), or Zhipu (api.zhipuai.cn), the marker system activated. The domain list lived as an obfuscated array in the binary, which is what pushed the story past "telemetry bug" and into a debate about targeting specific competitors.
The durable technical point: apostrophe variants aren't zero-width spaces. A zero-width space (U+200B) can get normalized away during text processing. An apostrophe variant looks like ordinary punctuation to a tokenizer, so it passes through the model unchanged and can end up in generated code and commit messages.
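To make the arithmetic concrete: four variants means two bits per apostrophe, so a 14 to 28 bit payload needs roughly 7 to 14 apostrophes. The sketch below is a toy illustration of that idea, not the scheme recovered from the binary. The variant order, the bit layout, and the `embed` and `extract` names are all assumptions made up for the example.

```python
# Hypothetical mapping: list index 0..3 stands for the bit pairs 00, 01, 10, 11.
# The real ordering used by Claude Code is not stated in this article.
VARIANTS = ["\u0027", "\u2019", "\u02BC", "\u02BD"]
APOSTROPHES = set(VARIANTS)


def embed(text: str, bits: str) -> str:
    """Rewrite successive apostrophes in `text` so each one carries two bits."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    out = []
    for ch in text:
        if ch in APOSTROPHES and pairs:
            out.append(VARIANTS[int(pairs.pop(0), 2)])
        else:
            out.append(ch)
    if pairs:
        raise ValueError("not enough apostrophes to carry the payload")
    return "".join(out)


def extract(text: str) -> str:
    """Read back two bits per apostrophe, in order of appearance."""
    return "".join(format(VARIANTS.index(ch), "02b") for ch in text if ch in APOSTROPHES)


prompt = "Today's config isn't final; it's the user's draft."
marked = embed(prompt, "01101100")
print(extract(marked))             # 01101100
print(len(marked) == len(prompt))  # True: same character count, different codepoints
```

The marked string has the same number of characters as the original and looks near-identical in most fonts, which is why the difference only shows up at the codepoint or byte level.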
How to detect hidden Unicode markers yourself
This is the part that makes the whole thing an engineering control instead of a headline. You can audit any prompt in front of you. The byte-level tell is unambiguous: in UTF-8, a plain apostrophe is one byte (27), while U+2019 is three bytes (E2 80 99) and U+02BC is two (CA BC).
Here's a compact scanner:
import unicodedata

# Non-ASCII lookalikes only; the plain ASCII apostrophe (U+0027) is expected in normal text.
APOSTROPHE_VARIANTS = {'\u2019', '\u02BC', '\u02BD'}
ZERO_WIDTH = {'\u200B', '\u200C', '\u200D', '\uFEFF', '\u00AD'}

def scan(text):
    hits = []
    for i, ch in enumerate(text):
        if ch in APOSTROPHE_VARIANTS or ch in ZERO_WIDTH:
            # Record position, codepoint, and official Unicode name
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    variant_count = sum(ch in APOSTROPHE_VARIANTS for ch in text)
    ratio = variant_count / max(len(text), 1)
    return {"hits": hits, "suspicious": ratio > 0.05, "ratio": round(ratio, 4)}

print(scan("Today\u2019s system configuration"))
A high ratio of fancy apostrophes in machine-generated text is the signal. Real prose has a few. A fingerprint has many, placed deliberately. For raw inspection, pipe suspect strings through a hexdump and look for E2 80 99 where you expected 27.
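If you would rather stay in Python than reach for a hexdump, the same byte-level check can be sketched as follows. The byte sequences are the standard UTF-8 encodings of the three non-ASCII variants named in this article; the `find_variant_bytes` helper is a name invented for this example.

```python
# UTF-8 byte sequences for the non-ASCII apostrophe variants discussed above.
SIGNATURES = {
    b"\xe2\x80\x99": "U+2019 RIGHT SINGLE QUOTATION MARK",
    b"\xca\xbc": "U+02BC MODIFIER LETTER APOSTROPHE",
    b"\xca\xbd": "U+02BD MODIFIER LETTER REVERSED COMMA",
}


def find_variant_bytes(raw: bytes):
    """Return sorted (byte_offset, description) pairs for each variant in UTF-8 bytes."""
    found = []
    for sig, label in SIGNATURES.items():
        start = raw.find(sig)
        while start != -1:
            found.append((start, label))
            start = raw.find(sig, start + 1)
    return sorted(found)


sample = "Today\u2019s config isn\u02BCt final".encode("utf-8")
print(sample.hex(" "))  # space-separated hex, similar to a hexdump row
for offset, label in find_variant_bytes(sample):
    print(offset, label)
```

Working on bytes rather than decoded text means the check still applies to captured request bodies or log files before anything has had a chance to re-encode them.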
The unicode.live confusables guide covers the broader detection surface if you want to generalize this into a CI check.
Wire this into your pipeline: log the system prompts your agent sends, run the scanner on them nightly, and alert on anomalous variant ratios. That's a control you own regardless of what any vendor ships next.
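A nightly job along those lines might look like the sketch below. It assumes your agent's system prompts are already being written to a directory as one UTF-8 `.txt` file each; that layout, the function names, and the reuse of the 0.05 threshold are illustrative assumptions, not a vendor-provided interface.

```python
import tempfile
from pathlib import Path

# Non-ASCII apostrophe variants plus common invisible characters.
FLAGGED = set("\u2019\u02BC\u02BD\u200B\u200C\u200D\uFEFF\u00AD")
MAX_RATIO = 0.05  # assumed threshold; tune it against your own prompt corpus


def audit_text(text: str) -> dict:
    """Count flagged characters and compare their share of the text to MAX_RATIO."""
    count = sum(ch in FLAGGED for ch in text)
    ratio = count / max(len(text), 1)
    return {"flagged": count, "ratio": round(ratio, 4), "alert": ratio > MAX_RATIO}


def audit_dir(log_dir: str) -> list:
    """Audit every *.txt prompt log in log_dir, in filename order."""
    reports = []
    for path in sorted(Path(log_dir).glob("*.txt")):
        report = audit_text(path.read_text(encoding="utf-8"))
        report["file"] = path.name
        reports.append(report)
    return reports


def exit_code(reports: list) -> int:
    """Non-zero when any file trips the alert, so a CI step fails visibly."""
    return 1 if any(r["alert"] for r in reports) else 0


# Demo with two throwaway files standing in for logged prompts.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.txt").write_text("Today's configuration is loaded.", encoding="utf-8")
    Path(tmp, "marked.txt").write_text("it\u2019s\u02BC \u02BDok\u2019", encoding="utf-8")
    reports = audit_dir(tmp)
    for r in reports:
        print(r)
    print("exit code:", exit_code(reports))
```

In a real job you would point `audit_dir` at your log directory and pass the result of `exit_code` to `sys.exit`, so the scheduler or CI runner surfaces the failure.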
Was this competitive intelligence or abuse prevention?
Both readings survive the evidence, and honesty here is the point.
Anthropic's account is coherent. Unauthorized resellers who buy access and resell it are a real problem, and distillation, where a competitor trains on your model's outputs, is a real threat to model IP.
A fingerprint that lets you spot abusive traffic patterns is a recognizable class of defense, closer to an API key prefix than to spyware. The markers encoded deployment metadata, not your code.
The uncomfortable half is the domain list. The activation logic specifically watched for three Chinese AI providers. Anthropic hasn't explained why those endpoints, and until it does, "purely defensive" and "competitive monitoring" are indistinguishable from the outside. The gist analysis that decompiled the domain array is what kept that question alive.
The disclosure gap makes it worse than it needed to be. A changelog note in 2.1.197 saying "removed deployment fingerprinting markers" would have cost Anthropic nothing and bought real credibility.
Its absence means you can't verify from the outside whether the practice was fully removed or merely changed, and it leaves open whether earlier versions did something similar.
Do other AI coding tools do this?
As of July 2, 2026, no peer tool has a documented Unicode marker or steganographic fingerprint. That's a real distinction, and the transparency floor tracks closely with whether the prompts are open.
| Tool | Prompts inspectable | Marker practice documented | Notes |
|---|---|---|---|
| Claude Code | No (closed) | Yes, removed in 2.1.197 | Latest 2.1.198 (Jul 2, 2026) |
| Cursor | No (closed) | None found | Injects a hidden, non-customizable system prompt into SDK agents |
| GitHub Copilot | Extractable via debug logs | None found | Prompts recovered by researchers, no markers seen |
| Aider | Yes (source) | None found | Open source; a marker would be hard to conceal |
| Cline | Yes (source) | None found | Apache 2.0, BYOK local |
| OpenCode | Yes (source) | None found | MIT, 75+ providers, v1.17.13 (Jul 1, 2026) |
Two caveats keep this fair. "None documented" is not "none exist," especially for closed backends like Cursor and Copilot where the same technique is technically possible. And Aider being open source didn't make it audit-proof, as the redteams.ai security analysis shows; open prompts just move the trust question to a place you can actually read.
What this means for your enterprise deployment
Treat prompt-layer behavior as part of your vendor-risk surface, the same way you treat data residency and retention. Your exposure depends on how you deploy.
| Deployment | Risk | What to check |
|---|---|---|
| Default cloud (api.anthropic.com) | Low | Markers, if present, encode generic metadata |
| Custom base URL / enterprise proxy | Medium | Fingerprint may encode your custom endpoint config |
| Air-gapped | Low | No egress, but markers still sit in prompt text |
| Third-party API wrapper | Higher | Inspect the wrapper for added telemetry layers |
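As a starting point for that triage, the first two rows can be told apart from the environment alone. The sketch below assumes that an unset `ANTHROPIC_BASE_URL` means the default endpoint is in use; the `classify_base_url` helper and its labels are invented for this example. Air-gapped and wrapper deployments cannot be inferred from a URL and still need manual review.

```python
import os
from urllib.parse import urlparse

DEFAULT_HOST = "api.anthropic.com"


def classify_base_url(base_url):
    """Return a coarse deployment label based only on the configured base URL."""
    if not base_url:
        # Assumption: no override means the default cloud endpoint is used.
        return "default cloud"
    host = urlparse(base_url).hostname
    if host == DEFAULT_HOST:
        return "default cloud"
    # Anything else (including values urlparse cannot read a host from)
    # is treated as custom and worth a closer look.
    return "custom base URL / proxy"


print(classify_base_url(os.environ.get("ANTHROPIC_BASE_URL")))
```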
Concrete moves for regulated teams. Confirm you're on Claude Code 2.1.198 or later against the Claude Code changelog. Run the scanner above against a sample of outbound system prompts and store the results.
Trace whether your requests pass through proxies or wrappers that could add their own telemetry. And pull your enterprise agreement to see what disclosure and notification obligations the vendor actually committed to.
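The version check is easy to get wrong with plain string comparison, since "2.1.99" sorts after "2.1.198" as text. A minimal sketch that compares numeric tuples instead is shown below; the `is_patched` helper is hypothetical, and the sample strings are illustrative inputs rather than guaranteed CLI output.

```python
import re

MINIMUM = (2, 1, 198)  # the version this article recommends as a floor


def parse_version(text: str) -> tuple:
    """Pull the first MAJOR.MINOR.PATCH triple out of a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text: str, minimum: tuple = MINIMUM) -> bool:
    """True when the reported version is at or above the minimum."""
    return parse_version(version_text) >= minimum


for reported in ("2.1.196", "2.1.198", "2.1.99"):
    print(reported, is_patched(reported))
```

Feed it whatever version string your installed tool reports, and store the result alongside the scanner output so the audit trail shows both the version and the prompt contents on a given date.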
For HIPAA, SOC 2, and GDPR contexts, the metadata question is live but not settled. Timezone and deployment region are arguably low-sensitivity, yet an undisclosed telemetry channel still cuts against SOC 2 confidentiality criteria and GDPR transparency expectations.
Document the timeline and the fix for your auditors either way; a paper trail is cheap insurance.
If you need maximum inspectability today, the open-source options give it to you at the cost of features. Aider (Apache 2.0), Cline (Apache 2.0), and OpenCode (MIT) publish their prompts and support bring-your-own-key local operation.
Cursor and Copilot give you a larger feature set and a closed backend, which is a trade, not a verdict. If you want the vendor's own account of what Claude Code sends, the Claude Code overview docs are the first-party reference.
The norm worth pushing for
The specific markers are gone. The precedent isn't. Any tool that builds your prompts can write invisible content into them, and the only reliable defense is the one you run yourself.
So make prompt inspection routine. Scan outbound prompts in CI, alert on anomalous Unicode, and ask vendors for two commitments in writing: document every system prompt modification, and disclose telemetry changes in the changelog.
This episode resolved well because the format was auditable and someone audited it. Keep that pressure on, and the next hidden marker gets caught in a nightly job instead of a viral thread.
Sources
- Claude Code Is Steganographically Marking Requests (thereallo.dev)
- Hacker News discussion, item 48734373
- Claude Code Anti-China Code Analysis (GitHub Gist)
- Claude Code changelog
- Claude Code overview (Anthropic docs)
- Python unicodedata documentation
- Detecting Unicode confusables (unicode.live, 2026 guide)
- Cursor Data Use & Privacy Overview
- Exploring GitHub Copilot's system prompt (mauricioacosta.dev)
- Aider analytics documentation
- Security Analysis of Aider (redteams.ai)
- Cline on the VS Code Marketplace
- OpenCode repository and releases
