Two independent research teams — Imperva and Varonis — published research findings this week showing that OpenClaw, a self-hosted open-source AI agent, can be forced to execute arbitrary attacker code or exfiltrate sensitive data via seemingly ordinary inputs: a shared contact, a vCard business card, or a routine email. The message-object injection vulnerability has been fixed in OpenClaw 2026.4.23, but the problem of social phishing of agents is architectural and cannot be solved with a patch. Anyone using OpenClaw needs to update immediately and rethink the agent’s access model.
Injection via message objects: technical breakdown
Imperva researcher Johann Sillam discovered that when a shared contact, vCard, or geolocation tag was passed to the language model, OpenClaw “expanded” the contents of these objects directly into the prompt text without any markup indicating an untrusted source. At the same time, content the agent downloaded from the internet was wrapped in a special untrusted-content marker — but message objects were not subjected to this treatment.
According to Imperva, a shared contact passed only the name field to the model, in the format <contact: name, number>. Since angle brackets are allowed in a contact name, the model could not determine where the real name ended and the injected instruction began. A key factor: the contact name is truncated when displayed on screen — in both WhatsApp and the receiving application — so the victim does not see the malicious payload. An analogous vector worked through the full name field in a vCard and through the caption on a shared geolocation.
During testing on Gemini 3.1 Pro (pre-release build), hidden text in the contact instructed the agent to download and execute a script from the researchers’ server — and the agent did so. An attempted injection via an image with embedded instructions failed: in the researchers’ view, models have already been trained to resist this vector, whereas the attack via message objects was new to them.
Imperva highlights an additional risk: memory in OpenClaw is enabled by default, which means a single widely shared contact containing a malicious instruction can silently compromise every agent that processes it, unless they are sandboxed. The researchers also found a similar pattern of “expanding” data into the prompt in other personal AI assistants, pointing to the systemic nature of the problem.
The fix in version 2026.4.23 moves contact names, vCard fields, and geolocation tags out of the prompt body into a separate untrusted-metadata channel.
Phishing agents: an ordinary email as an attack vector
The Varonis Threat Labs team led by Itay Yashar approached the problem from another angle. The researchers built an agent called Pinchy on the OpenClaw platform, connected it to a Gmail inbox containing realistic synthetic business data, and ran four phishing simulations on Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.
Varonis draws an important distinction between prompt injection (hidden instructions in data) and what they call agent phishing: a plausible request arriving through a normal channel and succeeding because the agent acts before the sender has been verified.
Results of the data exfiltration tests:
- Urgency scenario: an email supposedly from team lead “Dan” sent from an external Gmail address requested access to staging during an alleged production incident. The agent found and forwarded in cleartext test AWS IAM keys, database connection strings, and SSH credentials.
- Routine scenario: a request for the weekly customer export to prepare for a QBR. The agent sent a synthetic dataset of 247 corporate customers with contacts and contract amounts.
Both failures occurred despite a strict profile being enabled that required the agent to verify senders. The rule existed — but urgency overrode it in the first case, and routine in the second.
At the same time, the agent performed significantly better at recognizing technical threats: it interacted with a phishing gift-card page but did not submit real credentials and ultimately marked it as suspicious. Faced with a malicious OAuth consent screen masquerading as a timesheet app, the agent checked the target redirect URL, deemed it suspicious, and stopped.
Key takeaway from Varonis: an AI agent is better than many humans at recognizing malicious URLs and fake login portals, but worse at social judgment — for example, when a colleague suddenly asks for credentials at an unusual time. According to the researchers, OpenAI Codex GPT-5.4 was more cautious than Gemini 3.1 Pro when sending data to external resources, but both models succumbed to the social pretexts.
Architectural problem: the “lethal trifecta”
Varonis relates both attack vectors to a concept Simon Willison calls the “lethal trifecta”: an agent that can read private data, accept untrusted content, and send data out. OpenClaw has all three properties — which is exactly why a poisoned contact and a friendly email lead to the same outcome.
The trust-boundary problem also appears at code level. According to an InfoSec Write-ups analysis, static-analysis rules were created based on past OpenClaw security recommendations, and these uncovered five more vulnerabilities in channel extensions for Slack, Discord, Matrix, Zalo, and Microsoft Teams. All five stemmed from the same bug: the code granted allowlist access based on a mutable display name instead of a stable identifier, allowing an attacker to rename themselves and gain access. According to available information, these vulnerabilities have been fixed.
Regulatory context
The Dutch Data Protection Authority (Autoriteit Persoonsgegevens) has taken the toughest stance, recommending that users and organizations not run OpenClaw on systems containing confidential data, citing the risks of data leakage and account takeover. This lends regulatory weight to the researchers’ technical conclusions.
Protection recommendations
Varonis proposes four specific controls that should be implemented immediately:
- Agent instructions as policy: the instruction file must be a version-controlled, enforced document, not just a suggestion.
- Outbound email control: disallow initial sending to unknown addresses without human confirmation — so a compromised agent cannot send phishing from a trusted account.
- Segregate connector access by trust level: a mailbox that processes external email must not simultaneously have access to the entire CRM system. A connector’s access should match the trust level of the task’s source.
- Human-in-the-loop for critical actions: forwarding credentials, transferring funds, and other high-risk operations must require human approval.
Both teams converge on the same mental model: an agent is not a security tool but, in Varonis’s words, “a junior employee with system access and no intuition for what’s suspicious”, or, in Imperva’s terminology, “an authenticated executor that trusts its input.”
The first priority is to update to OpenClaw 2026.4.23 or later to eliminate the message-object injection vulnerability. But the patch closes only one of the two demonstrated vectors. The architectural problem — an agent that by design trusts its inputs and tries to be helpful — requires implementing the access controls listed above and ensuring mandatory human involvement in decisions with a high level of risk. Organizations that process confidential data should take the Dutch regulator’s recommendation seriously and assess whether connecting OpenClaw to systems holding sensitive information is acceptable at all without full isolation.