Google’s Threat Intelligence Group (GTIG) is tracking a notable shift in adversary tradecraft: threat actors are embedding large language models (LLMs) directly into the malware’s runtime, enabling self-modifying code that adapts mid-execution. This approach aims to frustrate signature-based detection and static analysis while accelerating iteration speed and lowering development costs for attackers.
AI-powered, self-modifying malware: what PromptFlux demonstrates
GTIG highlights an experimental VBScript dropper dubbed PromptFlux. The sample achieves persistence by saving modified copies into Windows startup locations, spreads via removable media and network shares, and—critically—integrates with the Gemini API. By issuing prompts to the model, PromptFlux “regenerates” obfuscated code on the fly, attempting to break static signatures and rule-based heuristics (MITRE ATT&CK T1027, T1059).
“Thinking Robot”: Gemini-driven evasion tactics on a timer
The most novel component, Thinking Robot, periodically queries Gemini for new antivirus-evasion techniques. Its machine-readable prompts suggest the authors are pursuing a metamorphic script: unlike classic polymorphism, which repacks code without changing logic, metamorphism restructures the code itself. This complicates pattern matching and template-based detection, and raises the bar for reverse engineering.
Google’s response and current attribution status
GTIG assesses PromptFlux as early-stage with limited functionality, but indicative of where AI-enabled threats are heading. Google has blocked the sample’s access to the Gemini API and dismantled associated infrastructure. Attribution remains unresolved; no specific threat group has been publicly named at this time.
Real-world LLM abuse: examples across multiple threat actors
Beyond experiments, GTIG documents operational misuse of AI by state-aligned groups. China-nexus APT41 reportedly used a model to improve the OSSTUN C2 framework and to select obfuscation libraries. Another Chinese actor posed as a CTF participant to bypass LLM safety filters and solicit technical details on exploit development—evidence that social pretexting now targets AI systems themselves.
Iran-linked MuddyCoast (UNC3313) masqueraded as a student and leveraged Gemini to assist with malware development and debugging; operators inadvertently exposed command-and-control endpoints and keys, illustrating the operational security pitfalls of AI-assisted workflows. APT42 employed LLMs to craft phishing lures and stood up a “data-processing agent” that translates natural language into SQL queries to harvest personal data, streamlining post-compromise collection.
North Korea’s Masan (UNC1069) used Gemini to support cryptocurrency theft schemes, multilingual phishing, and deepfake-based lures. Pukchong (UNC4899) leveraged AI to accelerate exploit preparation against edge devices and browsers. In all cited instances, Google disabled implicated accounts and hardened model protections. These observations align with broader industry reporting that nation-state operators are experimenting with LLMs across reconnaissance, development, and influence operations (see Microsoft and Mandiant 2024 reporting; ENISA Threat Landscape 2023).
Criminal LLM marketplaces and the attacker “productivity” pitch
GTIG also notes growth in LLM-as-a-service for cybercrime on English- and Russian-language forums. Offerings span deepfake generation, malware development assistants, phishing toolkits, and exploit helpers. Vendors mirror the marketing of legitimate platforms—advertising “process efficiency,” free tiers, and paid API access. As GTIG’s technical lead Billy Leonard underscores, guardrails on mainstream platforms are pushing some actors to unregulated, “no-rules” models, reducing the barrier to entry for less experienced adversaries.
Defensive implications: prioritize behavior, integrity, and AI-assisted detection
LLM integration in attack chains enables rapid variant churn and broader TTP diversity. Defenders should enhance monitoring of outbound traffic to LLM/API endpoints, enforce script integrity controls (e.g., code signing, file integrity monitoring), apply allowlists for autostart locations (ATT&CK T1547), and restrict write access to removable media where business need is absent. Emphasize behavioral analytics to flag anomalous script execution, dynamic code generation, and unexpected network calls from interpreters.
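As a minimal illustration of the behavior-first approach, the Python sketch below flags script interpreters that open outbound connections to generative-AI API endpoints. The telemetry schema, interpreter list, and endpoint list are assumptions for illustration; in practice the events would come from EDR, DNS, or proxy logs, the logic would be expressed in the SIEM’s own query language, and the rule would be tuned against sanctioned automation that legitimately calls these APIs.

```python
# Minimal sketch: flag script interpreters contacting generative-AI API endpoints.
# The telemetry schema, interpreter list, and endpoint list are illustrative assumptions.
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class NetEvent:
    host: str          # source host name
    process: str       # image name of the process that made the connection
    dest_domain: str   # destination domain, e.g. from DNS or proxy telemetry

# Script interpreters that rarely need direct LLM API access (assumption).
SCRIPT_INTERPRETERS = {"wscript.exe", "cscript.exe", "powershell.exe", "mshta.exe"}

# Example LLM/API endpoints to watch; extend per environment (assumption).
LLM_API_DOMAINS = {
    "generativelanguage.googleapis.com",
    "api.openai.com",
    "api.anthropic.com",
}

def flag_llm_calls_from_interpreters(events: Iterable[NetEvent]) -> List[NetEvent]:
    """Return events where a script interpreter contacted a known LLM API endpoint."""
    return [
        e for e in events
        if e.process.lower() in SCRIPT_INTERPRETERS
        and e.dest_domain.lower() in LLM_API_DOMAINS
    ]

if __name__ == "__main__":
    sample = [
        NetEvent("ws-042", "wscript.exe", "generativelanguage.googleapis.com"),
        NetEvent("ws-042", "chrome.exe", "api.openai.com"),
    ]
    for hit in flag_llm_calls_from_interpreters(sample):
        print(f"ALERT: {hit.process} on {hit.host} contacted {hit.dest_domain}")
```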
Investing in defensive AI can accelerate correlation across telemetry, speed up de-obfuscation and analysis, and help craft countermeasures at scale. Close coordination with AI providers, including rapid key revocation, API abuse detection, and model policy updates, reduces the attacker’s window of opportunity, as the PromptFlux takedown illustrates.
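As a complementary, equally hedged sketch of defender-side AI assistance, the snippet below hands an obfuscated script to an analyst-facing model for a high-level summary before manual reverse engineering. The call_llm helper is a hypothetical placeholder; wire it to whichever approved model API the SOC uses, and treat the output as a lead, not a verdict.

```python
# Minimal sketch of defender-side, AI-assisted triage of a suspicious script.
# call_llm() is a hypothetical placeholder for whatever approved model API the
# SOC uses; the prompt wording and truncation limit are illustrative assumptions.

TRIAGE_PROMPT = (
    "You are assisting a malware analyst. Summarize, at a high level, what the "
    "following script appears to do (persistence, network activity, obfuscation). "
    "Do not execute, extend, or complete the code.\n\n{sample}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for the organization's approved LLM client (assumption)."""
    return "[model response would appear here]"

def triage_script(sample_text: str) -> str:
    """Request an analyst-readable summary of a suspicious script from the model."""
    # Truncate large samples so the prompt stays within typical context limits.
    return call_llm(TRIAGE_PROMPT.format(sample=sample_text[:20000]))

if __name__ == "__main__":
    sample = "Dim s: s = Chr(104) & Chr(116) & Chr(116) & Chr(112)  ' ...payload elided"
    print(triage_script(sample))
```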
AI-augmented malware is at an early but accelerating stage. Organizations can reduce risk by combining least-privilege controls for scripting engines, transparent governance of internal LLM integrations, targeted user training against AI-powered phishing, and behavior-first detection. Cross-industry intelligence sharing and steady pressure on illicit LLM marketplaces will be essential to constraining this evolving threat.