GPU RowHammer Attacks: GPUBreach, GDDRHammer and GeForge Threaten Cloud AI Security

CyberSecureFox

Recent academic research has shown that modern high‑performance graphics processing units (GPUs) are vulnerable to a new class of RowHammer attacks on GDDR6 memory. These attacks can not only corrupt data stored in video memory but, in certain scenarios, also lead to full privilege escalation and complete takeover of the host system. The most notable techniques are described in works dubbed GPUBreach, GDDRHammer and GeForge.

What RowHammer is and why GPUs are now at risk

RowHammer is a well‑known DRAM reliability issue where repeatedly accessing a single memory row (hammering) causes electrical interference that flips bits in adjacent rows. These bit flips break the fundamental assumption of memory isolation that operating systems and sandboxing mechanisms rely on.
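The hammering pattern can be sketched with a toy model. This is a pure software simulation for illustration only: real bit flips are an analog charge‑leakage effect inside DRAM cells, and the row count, flip threshold and flipped bit positions below are invented.

```python
# Toy model of RowHammer disturbance (illustrative only): repeatedly
# activating aggressor rows disturbs their physical neighbors until
# charge "leaks" and a stored 1 decays to 0.

FLIP_THRESHOLD = 50_000  # hypothetical activations before a neighbor flips

class ToyDram:
    def __init__(self, rows: int = 8, bits_per_row: int = 16):
        self.rows = [(1 << bits_per_row) - 1] * rows  # every cell charged (1)
        self.disturbance = [0] * rows
        self.flips = [0] * rows

    def activate(self, row: int) -> None:
        """Opening a row electrically disturbs its physical neighbors."""
        for victim in (row - 1, row + 1):
            if 0 <= victim < len(self.rows):
                self.disturbance[victim] += 1
                if self.disturbance[victim] >= FLIP_THRESHOLD:
                    self.disturbance[victim] = 0
                    self.rows[victim] &= ~(1 << self.flips[victim])  # 1 -> 0
                    self.flips[victim] += 1

dram = ToyDram()
# Double-sided hammering: rows 2 and 4 are aggressors, row 3 is the victim.
for _ in range(FLIP_THRESHOLD):
    dram.activate(2)
    dram.activate(4)

print(f"victim row 3:  {dram.rows[3]:016b}")  # bits leaked to 0
print(f"distant row 6: {dram.rows[6]:016b}")  # unaffected
```

The victim row, disturbed from both sides, crosses the threshold twice as fast as a row with a single aggressor neighbor, which is why double‑sided hammering is the preferred pattern in real attacks.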

For years it was assumed that GPU architectures, error‑correcting code (ECC) memory and aggressive DRAM refresh policies made practical GPU RowHammer attacks unlikely. That view changed with the GPUHammer work, published in 2025, which demonstrated the first practical RowHammer attack on NVIDIA GPUs with GDDR6. By orchestrating massive parallel hammering, the researchers were able to induce controlled bit flips and degrade the accuracy of machine‑learning models running on the GPU by up to 80%.
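A single bit flip can do this kind of damage because of how floating‑point model weights are encoded. The sketch below reproduces the general effect (it is not code from the GPUHammer paper, and the weight value is arbitrary): flipping the most significant exponent bit of an IEEE‑754 float32 changes its magnitude by dozens of orders of magnitude.

```python
# One RowHammer bit flip in a float32 model weight: flipping the top
# exponent bit (bit 30) turns a small weight into an astronomically
# large one, which then propagates through the network's activations.
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit in the float32 representation of `value`."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

weight = 0.5
corrupted = flip_bit(weight, 30)  # bit 30 = most significant exponent bit
print(weight, "->", corrupted)    # 0.5 -> ~1.7e38
```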

The newer attacks – GPUBreach, GDDRHammer and GeForge – go significantly further. Their goal is not just to sabotage computations, but to compromise GPU page tables, gain arbitrary memory access and in some cases escalate privileges to the CPU kernel.

GPUBreach: from GDDR6 bit flips to root access

Corrupting GPU page tables via GDDR6 RowHammer

GPUBreach shows that RowHammer‑induced bit flips in GDDR6 can be used to modify critical GPU memory‑management structures – specifically, GPU page table entries (PTEs). A previously unprivileged process can then obtain arbitrary read/write access to GPU memory.

Page table entries define which physical memory regions are mapped into a specific GPU context. By precisely flipping bits inside these entries, an attacker can remap pages, expand their accessible address space and bypass the logical isolation enforced by the GPU driver.
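To see why a single flipped bit is enough, consider a minimal sketch of a page table entry. The field layout below (a valid bit, a read‑only bit and a 4 KiB‑aligned frame number) is a generic textbook format, not the real, undocumented NVIDIA GPU PTE encoding:

```python
# Simplified PTE model showing why one RowHammer bit flip defeats
# memory isolation. Field layout is hypothetical, for illustration only.

PAGE_SHIFT = 12       # 4 KiB pages
VALID_BIT = 1 << 0
RO_BIT = 1 << 1

def make_pte(frame: int, read_only: bool) -> int:
    return (frame << PAGE_SHIFT) | (RO_BIT if read_only else 0) | VALID_BIT

def decode(pte: int) -> tuple[int, bool]:
    """Return (physical frame number, read_only flag)."""
    return pte >> PAGE_SHIFT, bool(pte & RO_BIT)

pte = make_pte(frame=0x1234, read_only=True)

# One flip in the frame-number field remaps the page to another frame ...
remapped = pte ^ (1 << (PAGE_SHIFT + 8))
print(hex(decode(remapped)[0]))   # frame 0x1234 -> 0x1334

# ... and one flip of the permission bit silently makes the page writable.
writable = pte ^ RO_BIT
print(decode(writable)[1])        # read_only is now False
```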

IOMMU bypass and kernel‑level privilege escalation

One of the most critical aspects of GPUBreach is that it works even with a fully enabled IOMMU (Input‑Output Memory Management Unit), which is designed to isolate devices and prevent direct memory access (DMA) attacks.

Once GPU page tables are corrupted, the compromised GPU can use DMA to access system RAM regions that are still legitimately allowed by the IOMMU, such as buffers used by the NVIDIA driver itself. By corrupting trusted driver data structures, the attacker can trigger controlled memory‑safety violations in the kernel, obtaining an arbitrary write primitive in kernel space. This leads directly to privilege escalation to root and execution of a shell with kernel‑level rights.

Stealing cryptographic keys and attacking AI models

Beyond privilege escalation, GPUBreach enables exfiltration of cryptographic secrets, including keys handled by libraries such as NVIDIA cuPQC for post‑quantum cryptography. It also allows targeted degradation of machine‑learning models running on the GPU by selectively corrupting model parameters or intermediate data. As a result, the attack affects both the confidentiality and integrity of workloads accelerated by GPUs.

Impact on cloud AI, multi‑tenant GPUs and HPC environments

The implications for cloud AI platforms, multi‑tenant GPU environments and high‑performance computing (HPC) clusters are particularly serious. In these settings, GPUs are frequently shared between workloads belonging to different customers or projects, making robust isolation essential.

If an attacker rents or otherwise gains access to a shared GPU instance in the cloud, a successful GPUBreach‑style attack could potentially:

  • read and modify other tenants’ data, models or intermediate results residing on the same GPU;
  • access sensitive areas of host memory, including encryption keys and credentials;
  • pivot from a single compromised GPU node to the broader cloud or HPC management plane.

Comparing GPUBreach, GDDRHammer and GeForge

All three research efforts exploit GDDR6 RowHammer to corrupt GPU page tables, but they differ in attack goals, preconditions and impact.

GDDRHammer focuses on modifying the aperture field inside GPU PTEs. This allows an unprivileged CUDA kernel to read and write the entire CPU physical memory region mapped for the GPU. The main impact is broad host memory access, but not necessarily a full, reliable path to root privileges.
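The aperture idea can be illustrated with a toy model: a small field inside the PTE selects which physical address space the translation targets. The 2‑bit encoding and bit positions below are invented for illustration; the real NVIDIA PTE layout is not public.

```python
# Toy "aperture" field: a hypothetical 2-bit PTE field that selects
# whether the mapping targets local video memory or host system memory.
# A single bit flip in this field redirects the mapping.

APERTURE_SHIFT = 1
APERTURE_MASK = 0b11 << APERTURE_SHIFT
APERTURES = {0b00: "VIDEO_MEMORY", 0b10: "SYSTEM_MEMORY"}

def aperture_of(pte: int) -> str:
    return APERTURES.get((pte & APERTURE_MASK) >> APERTURE_SHIFT, "RESERVED")

pte = 0x1234000 | (0b00 << APERTURE_SHIFT) | 1   # valid PTE aimed at VRAM
print(aperture_of(pte))                          # VIDEO_MEMORY

hammered = pte ^ (0b10 << APERTURE_SHIFT)        # one bit flip in the field
print(aperture_of(hammered))                     # SYSTEM_MEMORY
```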

GeForge targets the last‑level page directory (PD0) to alter address translation and achieve arbitrary access to both GPU and host memory. Its major limitation is the requirement to disable the IOMMU, which substantially reduces its practicality in well‑hardened data‑center or enterprise deployments.

GPUBreach stands out because it:

  • operates with IOMMU enabled by corrupting trusted driver state rather than bypassing the IOMMU directly;
  • provides not only arbitrary memory access, but reliable CPU privilege escalation to root;
  • combines data exfiltration, ML model sabotage and full system compromise in a single attack chain.

Mitigations and limitations of current defenses

An obvious short‑term mitigation is enabling ECC on GPUs wherever available. However, prior work such as ECCploit and ECC.fail has demonstrated that ECC is not a complete defense against RowHammer: if enough bits in a word flip simultaneously, standard ECC schemes may fail silently and allow data corruption to go undetected.
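The silent‑failure mode can be demonstrated with a textbook SECDED (single‑error‑correct, double‑error‑detect) Hamming(8,4) code. Real DRAM uses wider codes (e.g. 72/64), but the mechanism is the same: one flip is corrected and two are detected, yet three simultaneous flips can alias to a "correctable" single‑bit error, so the decoder returns wrong data while reporting success.

```python
# SECDED Hamming(8,4): codeword index 0 is the overall parity bit p0,
# indices 1..7 are standard Hamming positions (3, 5, 6, 7 carry data).

def encode(d: int) -> list[int]:
    """Encode 4 data bits into an 8-bit SECDED codeword."""
    d1, d2, d3, d4 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1, d & 1
    cw = [0, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]
    cw[0] = sum(cw[1:]) % 2          # overall parity bit p0
    return cw

def decode(cw: list[int]) -> tuple[str, int]:
    s = 0
    for pos in range(1, 8):          # Hamming syndrome
        if cw[pos]:
            s ^= pos
    parity_ok = sum(cw) % 2 == 0
    if s and parity_ok:
        return "uncorrectable", -1   # double-bit error: detected, data lost
    if s:
        cw = cw[:]
        cw[s] ^= 1                   # "correct" the bit at position s
    data = (cw[3] << 3) | (cw[5] << 2) | (cw[6] << 1) | cw[7]
    return ("corrected" if s else "ok"), data

word = encode(0b1011)

one = word[:]; one[3] ^= 1                              # single flip
two = word[:]; two[3] ^= 1; two[5] ^= 1                 # double flip
tri = word[:]; tri[3] ^= 1; tri[5] ^= 1; tri[7] ^= 1    # triple flip

print(decode(one))   # ('corrected', 11): ECC recovers the data
print(decode(two))   # ('uncorrectable', -1): ECC detects, cannot fix
print(decode(tri))   # reports 'corrected' but returns the WRONG data
```

In the triple‑flip case the syndrome points at an innocent bit, the decoder "repairs" it and signals a successful single‑bit correction, which is exactly the silent‑corruption scenario that makes ECC an incomplete RowHammer defense.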

On desktop and mobile GPUs where ECC is typically not supported, there are currently no widely deployed, robust defenses against GPU RowHammer attacks. In data centers and cloud infrastructures, several additional measures are advisable:

  • reducing or eliminating hard multi‑tenancy on the same physical GPU, or using stronger hardware‑level partitioning mechanisms;
  • keeping GPU drivers and firmware up to date and rapidly applying vendor security patches;
  • monitoring for abnormal memory‑access patterns, such as aggressive hammering of specific rows;
  • adding software and hardware integrity checks for critical structures like GPU page tables and driver control data.

These new GPU RowHammer attacks highlight that dedicated accelerators are no longer peripheral to security architecture; they are core elements of the attack surface. Organizations relying on GPUs for AI, cryptography or large‑scale computation should reassess their threat models, evaluate exposure to multi‑tenant GPU risks and plan a transition to architectures and drivers hardened against memory manipulation. Proactively integrating GPU security into infrastructure design today significantly reduces the likelihood that the next generation of RowHammer attacks will turn high‑performance accelerators into a primary entry point for complete system compromise.
