How ChatGPhish and New AI Agent Exploits Expand Phishing Risk

CyberSecureFox Editorial Team

Researchers from Permiso Security disclosed an attack technique against ChatGPT called ChatGPhish, which turns routine web page summarization into a phishing vector. According to the researchers, the chatgpt.com response renderer trusts Markdown links and image URLs obtained from third-party pages, automatically fetches such images, and displays the links as clickable elements inside the assistant’s trusted interface. The publication coincided with a series of similar findings from other teams affecting AI coding agents, browser extensions, and frameworks for AI applications — including Microsoft-confirmed vulnerabilities CVE-2026-25592 and CVE-2026-26030 in Semantic Kernel. None of the issues described are known to have been exploited in real-world attacks so far, but public PoC demonstrations are available.

How ChatGPhish Works

The technique described by Permiso researcher Andi Ahmeti abuses the Markdown rendering mechanism used in ChatGPT’s responses. An attacker places a small payload on an arbitrary web page — hidden instructions formatted in Markdown. When a victim asks ChatGPT to summarize this page, the following happens:

  • Metadata leakage: images from the attacker’s server are automatically loaded when the answer is rendered, which, according to the researchers, exposes the victim’s IP address, User-Agent, and Referer header.
  • Phishing links: malicious Markdown links are displayed as active, clickable elements within the assistant’s interface.
  • Fake system alerts: the response can contain bogus security notifications and QR codes hosted, for example, in the attacker’s S3 bucket.
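
The rendering behavior listed above suggests one possible mitigation on the display side: neutralizing Markdown images and links that point to hosts outside an allowlist before the text reaches a renderer. The following Python sketch illustrates the idea only. It is not how chatgpt.com works; the names `ALLOWED_HOSTS` and `neutralize_markdown` are hypothetical, and the regular expression covers only inline `[text](url)` and `![alt](url)` syntax, not reference-style links, autolinks, or raw HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would define its own.
ALLOWED_HOSTS = {"example.com"}

# Inline Markdown image or link: optional "!", [label], (url ...).
MD_INLINE = re.compile(r"(!?)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def host_allowed(url: str) -> bool:
    """Return True if the URL's host is on the allowlist."""
    try:
        host = urlparse(url).hostname or ""
    except ValueError:  # malformed URL, treat as untrusted
        return False
    return host.lower() in ALLOWED_HOSTS


def neutralize_markdown(text: str) -> str:
    """Replace untrusted images with a placeholder and untrusted
    links with plain text that shows the real destination."""

    def rewrite(match: re.Match) -> str:
        bang, label, url = match.groups()
        if host_allowed(url):
            return match.group(0)  # keep trusted items unchanged
        if bang:  # image: never fetch it
            return f"[image removed: {label or 'no alt text'}]"
        return f"{label} ({url})"  # link: show the URL, not clickable

    return MD_INLINE.sub(rewrite, text)
```

Showing the destination URL as plain text, instead of silently dropping the link, keeps the summary readable while letting the user see where the link would have led.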

The key feature of ChatGPhish is not prompt injection itself but the fact that instructions embedded in an ordinary web page are executed and visually presented to the user as part of a legitimate response from a trusted AI interface. As Permiso notes, shifting the attack vector from email to the browser significantly expands the attack surface: the user doesn’t need to open an attachment or interact with a suspicious message — it is enough to ask ChatGPT to summarize a page.

Important: as of publication time, OpenAI has not issued an official security advisory on this issue. The technical details are based solely on Permiso’s research.

Attacks on AI Coding Agents: SymJack and TrustFall

In parallel, the Adversa AI team documented two techniques, SymJack and TrustFall, targeting AI coding assistants and agent-based CLI tools.

SymJack exploits symbolic links: a malicious repository tricks the agent into copying an innocent-looking file, but the destination path, via a symlink, actually points to the agent’s own configuration. After a restart, according to the researchers, a malicious MCP server is started with the user’s full privileges.
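
The symlink trick described above can be caught by resolving a destination path before writing to it and checking that the result is still inside the workspace. The Python sketch below shows that check under simple assumptions; the function name `resolves_inside` is hypothetical and does not come from any agent's codebase, and a check of this kind is subject to race conditions if the repository can change between the check and the write.

```python
from pathlib import Path


def resolves_inside(workspace: Path, destination: Path) -> bool:
    """Return True only if destination, after symlinks and ".."
    segments are resolved, still lies inside workspace."""
    root = workspace.resolve()
    # Path.resolve() follows symlinks for every existing component.
    target = (workspace / destination).resolve()
    return target == root or root in target.parents
```

An agent or wrapper script could call this before every copy or write and refuse the operation when it returns `False`, for example when a file inside the repository is really a symlink to a configuration file elsewhere on disk.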

TrustFall, as reported by Adversa AI, enables one-click remote code execution: the repository includes a configuration that automatically approves and launches an MCP server without explicit user consent. It is enough to clone the repository and click “Yes, I trust this folder” in the trust dialog — and the attacker’s code runs with the developer’s full system privileges.
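
Because the trust dialog is the last line of defense in this scenario, it can help to list what a repository would launch before answering it. The Python sketch below walks a cloned repository and prints the commands declared in MCP-style configuration files. The file names in `CANDIDATE_NAMES` and the `mcpServers` / `command` / `args` layout are assumptions based on a commonly used JSON layout, not details from the Adversa AI report, and should be adjusted for the tool in use.

```python
import json
import os
from pathlib import Path

# Assumed config file names; adjust for the coding agent in use.
CANDIDATE_NAMES = {".mcp.json", "mcp.json"}


def find_mcp_commands(repo) -> list[tuple[str, str, str]]:
    """Return (config file, server name, command line) for every
    server declared in an assumed MCP-style config in the repo."""
    repo = Path(repo)
    findings = []
    for dirpath, _dirs, files in os.walk(repo):
        for name in sorted(files):
            if name not in CANDIDATE_NAMES:
                continue
            path = Path(dirpath) / name
            rel = str(path.relative_to(repo))
            try:
                data = json.loads(path.read_text(encoding="utf-8"))
            except (OSError, ValueError):
                findings.append((rel, "?", "<unparseable config>"))
                continue
            servers = data.get("mcpServers") if isinstance(data, dict) else None
            if not isinstance(servers, dict):
                continue
            for server, spec in servers.items():
                if not isinstance(spec, dict):
                    continue
                args = spec.get("args")
                argv = [str(spec.get("command", ""))]
                argv += [str(a) for a in args] if isinstance(args, list) else []
                findings.append((rel, str(server), " ".join(argv).strip()))
    return findings


if __name__ == "__main__":
    for cfg, server, command in find_mcp_commands("."):
        print(f"{cfg}: server '{server}' would run: {command}")
```

The script only reports what is declared; deciding whether a listed command is safe still requires a human review.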

Broader Context: Vulnerabilities Across the AI Ecosystem

The techniques described are part of a large wave of security research into AI systems. The most significant confirmed findings include:

  • CVE-2026-25592 and CVE-2026-26030 in Microsoft Semantic Kernel — vulnerabilities that, according to a Microsoft advisory, allow prompt injection to be turned into remote code execution at the host level.
  • Typographic prompt injections — Cisco research showed that text rendered as an image can bypass safety filters in multimodal language models. The images look like noise to OCR-based filters but carry fully readable instructions for the target model.
  • Multi-step attacks on LLMs — Cisco emphasizes that LLM safeguards can be bypassed through multi-turn dialogues, whereas standard benchmarks test only single queries.
  • ClaudeBleed — according to LayerX, a vulnerability in the Claude browser extension allowed any extension, without special permissions, to hijack the AI assistant due to missing origin checks on the caller.
  • Agent skill ecosystem — a Snyk audit found that 13.4% of 3,984 analyzed skills on the ClawHub and skills.sh platforms contain at least one critical security issue, including malware distribution, prompt injections, and secret leakage.

Additionally, Unit 42 (Palo Alto Networks) demonstrated the PoC agent Zealot, capable of carrying out full-scale attacks on cloud infrastructure with minimal human involvement, chaining together reconnaissance, exploitation, privilege escalation, and data exfiltration.

Impact Assessment

Organizations that actively use ChatGPT and similar AI assistants for research tasks and content summarization are at the greatest risk. In the case of ChatGPhish, any web page an employee asks the AI to process can potentially contain a payload that turns the assistant’s interface into a phishing platform. For developers using AI coding agents, SymJack and TrustFall mean that cloning an unvetted repository can lead to complete compromise of the workstation.

The trust factor is particularly dangerous: users perceive AI assistant responses as reliable, which makes them less likely to scrutinize phishing elements displayed within a familiar interface.

Practical Recommendations

  1. For ChatGPT users: do not click links or scan QR codes in AI-generated summaries without verifying the URL. Treat any “system alerts” in AI responses with the same skepticism as suspicious emails.
  2. For developers: do not clone repositories from untrusted sources or open them in AI coding tools. Review the contents of MCP server configuration files before approving trust dialogs.
  3. For Microsoft Semantic Kernel administrators: apply patches for CVE-2026-25592 and CVE-2026-26030 as a priority.
  4. For SOC teams: enable monitoring of outbound requests from AI tools to external resources. Consider restricting automatic image loading and the rendering of external links in corporate AI environments.
  5. For users of the Claude browser extension: update the extension and audit installed browser extensions — any of them could potentially have exploited ClaudeBleed.
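
As an illustration of the monitoring recommendation for SOC teams, the Python sketch below flags proxy log records in which a request carries an AI assistant's host in the Referer header but goes to a destination outside an expected set. Everything about the log format is an assumption: the record fields `referer` and `url`, the `EXPECTED_DESTINATIONS` allowlist, and the premise that the proxy can see these headers at all (which typically requires TLS inspection). Only `chatgpt.com` is taken from the research described in this article.

```python
from urllib.parse import urlparse

AI_ASSISTANT_HOSTS = {"chatgpt.com"}  # extend for other assistants
# Hypothetical allowlist of destinations the assistant UI is expected to load.
EXPECTED_DESTINATIONS = {"chatgpt.com", "openai.com"}


def host_of(url: str) -> str:
    """Extract the lowercase host from a URL, or '' if unavailable."""
    try:
        return (urlparse(url).hostname or "").lower()
    except ValueError:
        return ""


def matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def flag_requests(records: list[dict]) -> list[dict]:
    """Return records sent from an AI assistant page to an
    unexpected external host (assumed fields: 'referer', 'url')."""
    flagged = []
    for rec in records:
        ref_host = host_of(rec.get("referer", ""))
        dest_host = host_of(rec.get("url", ""))
        if not dest_host or not matches(ref_host, AI_ASSISTANT_HOSTS):
            continue
        if not matches(dest_host, EXPECTED_DESTINATIONS):
            flagged.append(rec)
    return flagged
```

A hit does not prove an attack, since assistants legitimately embed external content; the output is best treated as a triage list for further review.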

Taken together, the research described points to a systemic problem: trust boundaries in AI systems remain blurred, and models process content from external sources without sufficient isolation from the user interface. The top priority for organizations is to inventory AI tools used by employees, apply available patches (first and foremost for Semantic Kernel), and implement policies that restrict automatic execution of configurations from external repositories in development environments.


