Critical Apache Tika Vulnerability CVE-2025-66516 Allows XXE Attacks via PDF XFA Forms

CyberSecureFox 🦊

A new critical vulnerability in Apache Tika, tracked as CVE-2025-66516, has been disclosed with the maximum CVSS score of 10.0. The flaw impacts the way Tika processes PDF documents containing XFA forms and opens the door to XML External Entity (XXE) injection, creating a high risk of data exfiltration and potential server-side remote code execution.

Apache Tika vulnerability overview and risk for enterprise systems

Apache Tika is widely used as both a service and a library to detect file types and extract text and metadata from a broad range of formats, including PDF, Office documents, archives, and more. Because Tika often sits at the core of DLP platforms, search engines, e-discovery tools, email gateways, and document management systems, a single parsing vulnerability can affect many enterprise and cloud environments at once.

According to the published advisory, CVE-2025-66516 affects the following components and versions on all platforms:

• tika-core – versions 1.13 through 3.2.1 (inclusive);

• tika-parser-pdf-module (listed in the advisory as tika-pdf-module) – versions 2.0.0 through 3.2.1 (inclusive);

• tika-parsers – versions 1.13 through 1.28.5 (inclusive).

The vulnerability is triggered when a specially crafted PDF file containing XFA content is submitted for processing. Due to improper configuration of the XML parser in the processing chain, an attacker can define and exploit external entities, which is the basis of an XXE attack.

What is XXE injection and why CVE-2025-66516 is so dangerous

XML External Entity (XXE) injection is a class of vulnerabilities that occurs when an XML parser trusts external entities declared inside an XML document. If protections are not enforced, the parser may be tricked into:

• Reading local files on the server (for example configuration files, secrets, keys, or tokens);

• Issuing internal network requests (Server-Side Request Forgery, SSRF) to services not exposed to the internet;

• In some scenarios, contributing to remote code execution or denial-of-service conditions.
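The lookup behavior described above can be observed without touching any real file. The following is a minimal sketch, not the Tika code path: it uses the JDK's built-in JAXP parser with default settings and a recording EntityResolver that returns empty content instead of fetching anything. The class name XxeLookupDemo, the payload, and the /etc/passwd target are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Apache Tika.
class XxeLookupDemo {
    // Illustrative payload: the DOCTYPE declares an external entity
    // and the document body references it.
    static final String PAYLOAD =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE data [ <!ENTITY xxe SYSTEM \"file:///etc/passwd\"> ]>\n"
      + "<data>&xxe;</data>";

    // Parses the XML with an unhardened factory and returns every external
    // system identifier the parser asked the resolver to load.
    static List<String> requestedUris(String xml) {
        List<String> requested = new ArrayList<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                requested.add(systemId);
                // Hand back empty content so the real resource is never read.
                return new InputSource(new StringReader(""));
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return requested;
    }

    public static void main(String[] args) {
        // Lists the URIs an unhardened parser tried to dereference.
        System.out.println(requestedUris(PAYLOAD));
    }
}
```

The point of the sketch is that the attacker, not the application, chooses the URI the parser asks for. Without the recording resolver, an unhardened parser would attempt to read that location itself.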

XXE risks are well documented by OWASP: the 2017 edition of the OWASP Top 10 listed XXE as its own category (A4), and the 2021 edition folds it into Security Misconfiguration (A05). Services that routinely handle files uploaded by users, such as document converters, indexing pipelines, antivirus and mail gateways, are especially exposed. Because Apache Tika is frequently embedded in exactly these types of workflows, CVE-2025-66516 is particularly severe from an enterprise security perspective.

Link to CVE-2025-54988 and issues in the initial patch

The new Apache Tika vulnerability CVE-2025-66516 is closely related to a previously disclosed flaw, CVE-2025-54988, which had a CVSS score of 8.4 and was initially addressed in August 2025. The latest disclosure clarifies and expands the true attack surface and the set of affected modules.

The earlier advisory named tika-parser-pdf-module, which is the entry point for the attack. The defect itself, and the code changes that fix it, reside in the central tika-core library. This mismatch created a dangerous situation: administrators who upgraded only the PDF parser module, but did not also update tika-core to version 3.2.2 or later, remained vulnerable despite believing the issue was fixed.

The confusion was compounded by an error in the original security bulletin. It did not clearly state that in the Tika 1.x line the PDFParser class was shipped as part of the org.apache.tika:tika-parsers artifact. As a result, a broader set of systems using legacy Tika branches and the tika-parsers artifact, without migration to the newer modular structure, were exposed for longer than expected.

Fixed versions and practical guidance for securing Apache Tika

The Apache Tika project has released updated Maven packages that remediate CVE-2025-66516 for all known affected configurations. Organizations are strongly advised to upgrade to at least the following versions:

• tika-core 3.2.2 or newer;

• tika-parser-pdf-module 3.2.2 or newer;

• tika-parsers 2.0.0 or newer (for environments still relying on the historical artifact that included PDFParser).
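When auditing a dependency list, the affected ranges from the advisory can be encoded in a small check. The sketch below is a simplified illustration: the class TikaVersionCheck and its methods are hypothetical helpers, and the comparison only looks at the leading digits of each dot-separated part, so pre-release qualifiers such as "-BETA" are ignored.

```java
// Hypothetical audit helper; not part of Apache Tika or any build tool.
class TikaVersionCheck {
    // Leading digits of one version part, e.g. "0-BETA" -> 0, "28" -> 28.
    static int leadingInt(String part) {
        int end = 0;
        while (end < part.length() && Character.isDigit(part.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(part.substring(0, end));
    }

    // Compares dotted versions numerically; missing parts count as 0.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? leadingInt(x[i]) : 0;
            int yi = i < y.length ? leadingInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean inRange(String version, String low, String high) {
        return compare(version, low) >= 0 && compare(version, high) <= 0;
    }

    // Affected ranges as stated in the advisory (inclusive on both ends).
    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                return inRange(version, "1.13", "3.2.1");
            case "tika-parser-pdf-module":
                return inRange(version, "2.0.0", "3.2.1");
            case "tika-parsers":
                return inRange(version, "1.13", "1.28.5");
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAffected("tika-core", "3.2.1"));
        System.out.println(isAffected("tika-core", "3.2.2"));
    }
}
```

A check like this is only a first pass. The resolved dependency tree from the build tool remains the authoritative source, because tika-core is often pulled in transitively at a different version than the parser modules.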

During remediation, it is important to:

• Perform a full dependency inventory, checking both direct and transitive dependencies in Maven, Gradle, or other build tools;

• Ensure that all services using Apache Tika—indexing services, file-processing microservices, ETL pipelines—run with consistent versions of tika-core and the parser modules;

• Configure XML parsers according to OWASP recommendations, disabling external entities and DTD processing wherever possible, even after patching.
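For the last point, a minimal sketch of a hardened JAXP factory in the spirit of the OWASP XXE prevention guidance is shown below. It applies to XML parsers that the application creates itself and does not reconfigure the parser inside Tika, so it complements the upgrade rather than replacing it. The class HardenedXml and its method names are hypothetical. The feature URIs are the ones understood by the JDK's built-in Xerces-based parser; other parser implementations may reject some of them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of Apache Tika.
class HardenedXml {
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Refuse any document that carries a DOCTYPE declaration.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case DOCTYPEs must be allowed later.
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        return f;
    }

    // Returns true if the hardened parser accepts the document, false if it
    // rejects it (DOCTYPE present, or the XML is otherwise not well-formed).
    static boolean accepts(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException rejected) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String payload = "<!DOCTYPE d [ <!ENTITY x SYSTEM \"file:///etc/passwd\"> ]><d>&x;</d>";
        System.out.println(accepts("<d>ok</d>"));
        System.out.println(accepts(payload));
    }
}
```

Disallowing DOCTYPE declarations outright is the strictest option. Where legitimate documents need a DTD, the remaining feature flags still keep external entities from being resolved.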

Strategic recommendations for organizations handling sensitive data

Organizations processing regulated or sensitive information—such as financial institutions, government agencies, healthcare providers, and legal services—should consider establishing or reinforcing a centralized vulnerability management process. This includes regular dependency scanning (e.g., using Software Composition Analysis tools), continuous monitoring of new CVE disclosures, and automated regression testing following component upgrades.

Given the maximum criticality of CVE-2025-66516, unpatched Apache Tika deployments present a direct path to compromising server file systems and internal services through a seemingly harmless operation like “extract text from an uploaded PDF”.

To reduce the likelihood and impact of successful attacks, it is advisable not only to install the recommended Tika versions, but also to revisit the overall architecture for handling untrusted files: isolate file processing in dedicated containers or sandboxes, restrict access to the file system and internal network, and deploy monitoring for abnormal outbound and internal requests. Timely action on these measures can significantly lower the risk that a critical flaw in a popular library becomes the starting point of a major cybersecurity incident.
