Poison Pill: How Hidden Context Poisoning Actually Works
A practical breakdown of human-visible text versus machine-visible payloads, why hidden channels matter, and how to design AI products defensively.
- AI Safety
- Prompt Injection
- Product Engineering
- LLM UX
- Security
Most teams still evaluate AI prompts the way humans read documents: visible text only.
That assumption breaks quickly in production.
A model does not just “read what we read.” It parses token sequences and metadata channels. If those channels carry extra instructions, the model can follow them even when a human reviewer sees nothing suspicious.
That gap is why I built Poison Pill.
The core mismatch
In normal product review, people ask:
- Does this copy look clear?
- Does this UI feel trustworthy?
- Does this output look correct?
For AI safety, those are necessary but incomplete. You also need to ask:
- What exact bytes/tokens does the model ingest?
- Are there hidden channels in that text path?
- Could those channels override system intent?
Humans review meaning. Models execute structure.
When structure and meaning diverge, that is your attack surface.
What Poison Pill demonstrates
The demo separates one message into two layers:
- Human-visible layer: normal text that looks harmless.
- Machine-visible layer: hidden payload inserted via channel tricks.
Then it shows both outputs side by side:
- what a person thinks the message says
- what a parser can extract from the same content
This is not hypothetical. The mechanisms are simple and cheap:
- zero-width characters appended or interleaved
- hidden comment payloads in HTML
- hybrid payloads that survive copy/paste and rendering transforms
The “magic” is not intelligence. It is encoding.
Why this matters for product teams
Most prompt-injection discussion lives at the model layer. But the bugs usually start in product surfaces:
- rich text inputs
- imported documents
- CMS content
- copied snippets between tools
- automated agent-to-agent handoff messages
If your app allows text to flow between systems, you are already in the context-poisoning game.
The right response is not panic. It is instrumentation and constraints.
Defensive defaults that actually help
In practice, I use a simple posture:
- Normalize and sanitize text before model entry.
- Expose machine-view previews in risky workflows.
- Tag or reject hidden-channel input by policy.
- Separate untrusted context from instruction channels.
- Log raw payloads for incident review.
This makes systems harder to trick and easier to debug.
Why I care about this
I build user-facing AI products, so I have to hold both truths at once:
- AI can massively increase delivery velocity.
- AI can also be mis-steered through fragile context boundaries.
Knowing how to leverage and exploit a system are the same skill viewed from opposite sides. Product quality comes from using that skill responsibly: design for utility, design for abuse resistance, and make failure modes legible.
That is the thesis behind Poison Pill.
Build for the model you actually have, not the one humans imagine.
Best,
Oli
March 10, 2026