How Bleeding Llama and Ollama Windows Auto-Update Lead to Critical RCE


CyberSecureFox Editorial Team

Ollama, one of the most popular platforms for running large language models locally, has faced two classes of critical issues at once: an unauthenticated process memory leak (CVE-2026-7482, Bleeding Llama, CVSS 9.1) and a related chain of vulnerabilities in the Windows client's update mechanism (CVE-2026-42248, CVE-2026-42249, CVSS 7.7) that enables persistent code execution on logon. This affects hundreds of thousands of servers and workstations where Ollama is integrated with corporate services and development tools. The required response: immediate version upgrades, instance isolation, and disabling auto-update for the Windows client until patches are released.

Technical details: from memory leak to persistent RCE

Bleeding Llama (CVE-2026-7482): memory leak via GGUF models

The vulnerability CVE-2026-7482 is described in the CVE.org database as an out-of-bounds heap read in the GGUF loader in Ollama versions before 0.17.1. The entry appears in the official CVE record and is mirrored in the NVD. The affected code is in the files fs/ggml/gguf.go and server/quantization.go, in the WriteTo() function.

The essence of the issue:

  • The /api/create interface accepts a GGUF-format model file supplied by the client.
  • The GGUF file carries per-tensor metadata (offset, size, shape).
  • Ollama trusts the declared tensor size and shape without correlating them with the file’s actual length.
  • When quantizing the model tensor in WriteTo(), it performs a read past the bounds of the allocated heap buffer.
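
The missing check is simple to state: before any tensor is read, the loader must verify that the declared extent actually fits inside the file. A minimal sketch of that validation in Python (the field names are illustrative, not the actual GGUF structures or Ollama's code):

```python
# Hypothetical illustration of the missing bounds check: before trusting
# tensor metadata from an untrusted GGUF file, confirm that the declared
# region lies entirely within the file's data section.
from dataclasses import dataclass

@dataclass
class TensorInfo:
    offset: int    # start of tensor data, relative to the data section
    n_bytes: int   # size implied by the declared shape and element type

def tensor_in_bounds(info: TensorInfo, data_section_len: int) -> bool:
    """Return True only if [offset, offset + n_bytes) fits in the buffer."""
    if info.offset < 0 or info.n_bytes < 0:
        return False
    return info.offset + info.n_bytes <= data_section_len
```

A tensor whose declared shape implies more bytes than the file contains fails this check; the vulnerable code path instead trusted the metadata and read past the end of the heap buffer during quantization.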

The root cause is the use of the unsafe package in Go when working with GGUF binary structures. The Go documentation explicitly states that the unsafe package bypasses memory safety guarantees, and in this case it resulted in the ability to read extra bytes from the process memory.

Researchers from Cyera demonstrated a three-step exploitation chain:

  1. Uploading a specially crafted GGUF file with an inflated tensor shape to a network-accessible Ollama server via HTTP POST.
  2. Triggering model creation via /api/create, which performs the out-of-bounds read and embeds the leaked memory fragments in the resulting model artifact.
  3. Exporting the resulting model to an attacker-controlled external registry via /api/push.
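
The chain leaves a recognizable trace: a POST to /api/create followed, from the same source, by a call to /api/push. A rough sketch of flagging that sequence in parsed access-log records (the record format here is an assumption for illustration, not Ollama's actual log output):

```python
# Flag source IPs that call /api/create and then /api/push within a time
# window, the request pair characteristic of the exfiltration chain.
# Each record is assumed pre-parsed as: (timestamp_s, source_ip, method, path).

def suspicious_sources(records, window_s=3600):
    last_create = {}   # source_ip -> timestamp of most recent /api/create
    flagged = set()
    for ts, src, method, path in sorted(records):
        if method == "POST" and path.startswith("/api/create"):
            last_create[src] = ts
        elif path.startswith("/api/push"):
            t0 = last_create.get(src)
            if t0 is not None and ts - t0 <= window_s:
                flagged.add(src)   # create followed by push: investigate
    return flagged
```

Sources returned by this filter warrant the compromise-assessment steps described in the guidance section below.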

Critical protection-relevant characteristics:

  • Exploitation does not require authentication — the Ollama REST interface has no built-in authentication by default.
  • The attacker controls the GGUF content, but the extracted data are arbitrary bytes from the Ollama process memory, not limited to the file boundaries.
  • The project source code is open and available on GitHub, which simplifies exploit development.

In practice, this turns any Ollama instance exposed to external networks into a source of data leakage:

  • environment variables (including cloud provider keys, CI/CD tokens, etc.);
  • LLM system prompts and confidential instructions;
  • code and data processed by tools integrated with Ollama (for example, development tools);
  • fragments of conversations of other users served by the same process.

CVE-2026-42248 and CVE-2026-42249: Ollama auto-update for Windows as a vector for persistent RCE

The second group of vulnerabilities is related to the update mechanism of the Ollama client for Windows. The research was published by the Striga team on their Striga Research site, with coordinated disclosure handled by CERT Polska.

The Windows client:

  • starts automatically on user logon via the Windows startup folder;
  • runs locally on 127.0.0.1:11434 and periodically calls /api/update to check for updates;
  • when an update is found, downloads and installs a binary file on the next launch.

Two defects were identified:

  • CVE-2026-42248 (CVSS 7.7) — lack of update signature verification: unlike the macOS version, the Windows client does not verify the cryptographic signature of the binary before installation.
  • CVE-2026-42249 (CVSS 7.7) — path traversal: the path to the local directory for placing the installer is constructed directly from the HTTP headers of the update server’s response without normalization or validation.

If an attacker can control the update server accessible to the Ollama client (by redirecting OLLAMA_UPDATE_URL to a local HTTP server, compromising the real server, or tampering with network traffic), the exploitation chain looks like this:

  1. The client requests an update via /api/update.
  2. The attacker returns an “update” — an arbitrary executable file — and headers that specify a path outside the expected directory (for example, into the Windows startup folder).
  3. The client saves the file to the specified path without signature verification and without later cleaning up unsigned files.
  4. On the next logon, Windows automatically launches the written executable with the user’s privileges.
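
The fix for CVE-2026-42248 is equally standard: verify the downloaded binary against trusted material before writing it anywhere executable. A minimal sketch using a pinned SHA-256 digest; this is a simplified stand-in for the code-signature verification the macOS client performs, not Ollama's actual updater logic:

```python
import hashlib
import hmac

def verify_update(payload: bytes, expected_sha256: str) -> bool:
    """Accept an update payload only if its SHA-256 matches a pinned digest.

    In a real updater the trusted digest (or a full code signature) must
    come from a channel independent of the server that delivered the
    payload; otherwise the check verifies nothing.
    """
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison avoids leaking match position via timing.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

Had the Windows client enforced such a check, an attacker-controlled update server could redirect the download but not substitute the executable.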

According to CERT Polska, Ollama for Windows versions from 0.12.10 through 0.22.0 inclusive are vulnerable, and the vulnerability remained unpatched at the time the recommendations were published. Temporary measures include disabling automatic updates and removing the Ollama shortcut from the startup folder.

The lack of integrity verification alone enables one-off code execution in the context of the update; adding path traversal turns this into persistent code execution on each user logon as long as the malicious file remains in the startup folder.

Impact assessment and risk prioritization

Bleeding Llama (CVE-2026-7482) is the primary threat for organizations that:

  • have Ollama servers accessible from the internet or from broad internal networks without strict segmentation;
  • process sensitive data via Ollama: source code, contracts, internal documents, customer data;
  • integrate Ollama with additional tools (for example, code analysis tools) whose results are passed to the model and stored in memory.

If no action is taken, consequences may include:

  • mass compromise of keys and tokens followed by intrusions into other systems;
  • leakage of confidential prompts and business logic underpinning internal LLM services;
  • exposure of user and customer data flowing through conversations with the model.

CVE-2026-42248/CVE-2026-42249 pose elevated risk for developer and analyst workstations where Ollama for Windows is used as a local interface to models and has access to:

  • SSH keys and repository tokens;
  • browser and corporate VPN credentials;
  • project working directories.

The chain gives an attacker persistent execution of malicious code with the current user’s rights — enough to install spyware, steal secrets, and move laterally across the infrastructure. At the same time, the only “off switch” is deleting the file from the startup folder; the logical flaws in the update component remain.

Importantly, the two vectors complement each other: the server-side vulnerability exposes secrets and context, while the client-side one provides a reliable foothold on user systems. Conceptually, scenarios are possible where data leaked via Bleeding Llama (for example, proxy configurations or update parameters) simplify subsequent exploitation of the Windows client.

Practical guidance and checking for exposure

Servers and containers: protection against Bleeding Llama

  1. Immediately update Ollama to version 0.17.1 or higher.
    Compare the running version (for example, via ollama --version or a GET request to /api/version) against the 0.17.1 release in the official Ollama GitHub repository. All instances below this version should be considered vulnerable.
  2. Restrict network access to the Ollama API.
    • Close direct internet access to all Ollama interfaces.
    • Allow access only via VPN, an API gateway, or a reverse proxy with authentication.
    • Tightly segment internal networks: development and test instances should not be accessible from user segments.
  3. Deploy an authenticating proxy or API gateway.
    Since the Ollama REST API has no built-in authentication, every instance should be fronted by a component that verifies credentials, tokens, or client certificates.
  4. Analyze logs and assess possible compromise.
    • Search logs for HTTP requests to /api/create with large bodies and atypical sources, especially when followed by /api/push calls to external registries.
    • If an instance was externally accessible and ran a vulnerable version, assume that all secrets in the Ollama process memory may have leaked and rotate keys, tokens, and passwords.
  5. Revisit your model upload policy.
    Do not allow anonymous or untrusted users to upload their own GGUF models to servers that process sensitive data.
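
The version check in step 1 can be automated fleet-wide. A small sketch that queries an instance's /api/version endpoint (a route in the Ollama REST API) and compares the result against the patched release; the parsing assumes plain MAJOR.MINOR.PATCH version strings:

```python
import json
import urllib.request

PATCHED = (0, 17, 1)  # first release with the Bleeding Llama fix

def parse_version(v: str) -> tuple:
    """Turn a version string like '0.17.1' or 'v0.17.1-rc1' into a tuple."""
    core = v.split("-")[0].lstrip("v")
    return tuple(int(part) for part in core.split(".")[:3])

def is_vulnerable(version_string: str) -> bool:
    return parse_version(version_string) < PATCHED

def check_instance(base_url: str) -> bool:
    """Query GET {base_url}/api/version; True means the host needs patching."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        version = json.load(resp)["version"]
    return is_vulnerable(version)
```

Running check_instance against an inventory of known Ollama hosts gives a quick patch-status report; any host that answers /api/version without authentication is, by the same token, a host whose API surface deserves the network restrictions from step 2.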

Windows workstations: temporary measures against RCE via updates

Until an official fix for the Ollama Windows client is available, it is recommended to:

  1. Disable automatic updates.
    Follow the instructions from CERT Polska: turn off the AutoUpdateEnabled option and do not use the OLLAMA_UPDATE_URL variable to redirect to non-standard or unsecured (HTTP) servers.
  2. Remove uncontrolled autostart.
    Delete the Ollama shortcut from the startup folder:
    %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup. This does not eliminate the vulnerabilities but deprives the attacker of a convenient persistent launch point.
  3. Check for suspicious files in the startup folder.
    Inventory executable files in the startup folder and compare them to the expected set of programs; unknown or recently added files require separate analysis.
  4. Strengthen perimeter control around updates.
    • Ensure that traffic to the Ollama update server passes through a controlled proxy and cannot be tampered with.
    • Block the use of unencrypted HTTP for downloading updates at the proxy or firewall level.
  5. Monitor process activity.
    Configure endpoint protection tools to track launches of unknown executables from the startup folder and executables spawned by the Ollama process.

For infrastructures where Ollama runs both on servers and on Windows workstations, prioritize as follows: first upgrade server instances to version 0.17.1+ while segmenting the network, then centrally disable auto-update for the Windows client and clean the startup folders on workstations.

