DeepMind's core warning
DeepMind researchers framed the issue around agentic systems that interact with websites and external tools. Unlike a standard model answering a user in a sealed chat window, these agents consume live content from web pages, emails, documents and app interfaces, then use that content to decide what to do next. [2]
That opens the door to adversarial instructions hidden inside ordinary web content. A malicious page does not need to break the model in some exotic way. It only needs to look sufficiently relevant, trustworthy or machine-readable for the agent to absorb the attacker's instructions as part of the task.
The six attack types
1. Direct prompt injection
The most obvious attack is also one of the most effective. A malicious webpage includes text aimed at the agent rather than the human reader, telling it to ignore prior instructions, reveal data or perform a different task.
For a human, the text may look irrelevant or invisible. For the model, it can be treated as just another instruction in context. If the agent is scraping a site or summarising a page before acting, that is enough to redirect what it does next.
2. Indirect prompt injection
This is the version security teams worry about more, because the malicious instruction can arrive through a trusted workflow. The agent may open an email, a shared document, a calendar invite or a retrieved webpage, each containing hidden or seemingly harmless text that changes the model's behaviour.
3. Data exfiltration attacks
This is where AI agents move from quirky demo risk to proper enterprise risk. An agent with broad permissions can become a data extraction tool if it is not tightly sandboxed.
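One common way to narrow that exposure is to check every outbound request against an allowlist before the agent's tools are permitted to send anything. The sketch below is illustrative only and is not taken from DeepMind's work: the `ALLOWED_HOSTS` set, the host names in it and the `is_egress_allowed` function are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this agent may send data to.
ALLOWED_HOSTS = frozenset({"api.example.com", "docs.example.com"})


def is_egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS requests to an exactly matching allowlisted host."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased; excludes any userinfo and port
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    if parts.scheme != "https" or host is None:
        return False
    # Exact match only: "api.example.com.attacker.net" must not pass.
    return host in allowed_hosts
```

Exact host matching is deliberate here. Suffix or substring checks are easy to bypass with lookalike domains, and a check like this only helps if every tool that can reach the network is forced through it.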
4. Cross-site or cross-app action manipulation
An agent that can move across services is useful, but it is also easier to steer into unintended actions. One manipulated site could induce the agent to take steps in another app, such as sending a message, altering settings or triggering a transaction.
This is not exactly the same as classic browser exploits, but the practical effect can rhyme with them. The attacker uses one trusted interaction surface to influence behaviour somewhere else.
5. Credential and permission abuse
Researchers also flagged attacks that target stored credentials, session state and delegated permissions. Many agent systems rely on connected accounts and reusable tokens so the model can act without repeated approvals.
That convenience creates a tempting target. If an attacker can nudge the agent into using those permissions outside the intended scope, the compromise can happen without any traditional account takeover: no password is stolen, yet the attacker still gets the agent to act on their behalf.
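A minimal sketch of scope checking, assuming each delegated permission is recorded against the specific task the user approved. The `Grant` record, the scope strings and the `authorize` function are hypothetical names chosen for illustration, not part of any real agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of what the user delegated for one task."""
    task_id: str
    scopes: frozenset  # e.g. frozenset({"mail:read"})


def authorize(grant: Grant, task_id: str, required_scope: str) -> bool:
    """Allow a tool call only inside the task and scope the user approved."""
    return grant.task_id == task_id and required_scope in grant.scopes


# The user asked for an inbox summary, so only read access was delegated.
grant = Grant(task_id="summarise-inbox", scopes=frozenset({"mail:read"}))

can_read = authorize(grant, "summarise-inbox", "mail:read")
can_send = authorize(grant, "summarise-inbox", "mail:send")
```

Here `can_read` is `True` and `can_send` is `False`: an injected instruction to send mail fails the check because sending was never delegated for this task, whatever the model was persuaded to attempt.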
6. Hidden content and deceptive page design
Some attacks rely less on obvious commands and more on how content is presented to the model. Instructions may be buried in HTML, tiny text, metadata, off-screen elements or other structures that humans barely notice but parsers and multimodal models still ingest. [3]
This is a key point. Attackers do not need to "hack" the model in the cinematic sense. They can simply exploit the mismatch between what the human sees and what the agent reads.
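That mismatch can be made visible with a rough sketch like the one below, which separates the text a human would probably see from text inside elements that are hidden by common means. It uses only Python's standard `html.parser` module. The heuristics in `_is_hidden` are assumptions for illustration and are far from complete: they ignore external stylesheets, off-screen positioning, tiny fonts and image-based text.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
NON_RENDERED_TAGS = {"script", "style", "template", "noscript"}


def _is_hidden(tag, attrs):
    """Heuristic only: inline styles and attributes, not computed CSS."""
    attrs = dict(attrs)
    style = (attrs.get("style") or "").replace(" ", "").lower()
    return (
        tag in NON_RENDERED_TAGS
        or "hidden" in attrs
        or attrs.get("aria-hidden") == "true"
        or "display:none" in style
        or "visibility:hidden" in style
    )


class VisibleTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []        # (tag, hidden) for each open element
        self.hidden_depth = 0  # number of open hidden ancestors
        self.visible = []
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        hidden = _is_hidden(tag, attrs)
        self.stack.append((tag, hidden))
        if hidden:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, hidden in self.stack[i:]:
                    if hidden:
                        self.hidden_depth -= 1
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            (self.hidden if self.hidden_depth else self.visible).append(text)


def split_visible_and_hidden(html):
    """Return (visible_text_chunks, hidden_text_chunks) for an HTML string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.hidden
```

Passing only the visible chunks to the agent, and logging the hidden ones for review, narrows the gap between what the human sees and what the agent reads. It does not close it, because visible text can carry injected instructions too.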
Why AI agents are uniquely exposed
Developers have tried to solve this with instruction hierarchies, content filtering and policy layers. DeepMind's findings suggest those controls help, but they do not remove the underlying problem. If untrusted content and privileged actions share the same context window, the risk remains live. [4]
What this means for crypto and fintech
The same applies to fintech and exchanges. If an agent can read support tickets, verify documents, move through back-office tools and reply to customers, the attack surface widens quickly. The old rule still applies: automation is only as safe as its permission boundaries.
Mitigations DeepMind's warning points toward
DeepMind's work adds weight to a security model that treats every external content source as untrusted, even if it arrives through normal workflows. Agents need strict separation between planning, reading and acting.
The big strategic shift is this: developers should stop assuming better prompting alone will solve agent security. This is an architectural problem, not just a wording problem.
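As a rough illustration of what that architectural separation might look like, the sketch below tags every piece of context with its provenance and refuses to run a privileged action automatically once any untrusted content has entered the context. This is an assumed design for illustration, not a mechanism described by DeepMind. `ContextItem`, `PRIVILEGED_ACTIONS`, `Decision` and `gate` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_CONFIRMATION = "require_confirmation"


@dataclass(frozen=True)
class ContextItem:
    text: str
    trusted: bool  # True only for direct user or system input


# Hypothetical set of actions with side effects outside the agent.
PRIVILEGED_ACTIONS = frozenset({"send_email", "transfer_funds", "change_settings"})


def gate(action: str, context: list) -> Decision:
    """Escalate to a human when a privileged action follows untrusted input."""
    tainted = any(not item.trusted for item in context)
    if action in PRIVILEGED_ACTIONS and tainted:
        return Decision.REQUIRE_CONFIRMATION
    return Decision.ALLOW


context = [
    ContextItem("Summarise this page for me", trusted=True),
    ContextItem("<fetched page text>", trusted=False),
]
summary_decision = gate("summarise", context)
transfer_decision = gate("transfer_funds", context)
```

In this example `summary_decision` is `Decision.ALLOW` and `transfer_decision` is `Decision.REQUIRE_CONFIRMATION`. The check lives outside the model, so it holds no matter how persuasive the injected text is, and that is the difference between an architectural control and a prompt-level one.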
Why it matters
DeepMind is effectively saying the web itself can become the prompt that hijacks an AI agent. That should cool some of the breathless "just let the bot handle it" narrative.
The invalidation test is simple enough. If agent builders can prove strong isolation between untrusted content and privileged actions, these attacks become far less potent. If they cannot, then every extra permission handed to an autonomous agent is a fresh bit of risk dressed up as convenience.

