Vulnerabilities | December 16, 2025 | 4 min read

Maximum Severity Apache Tika Flaw Threatens Document Pipelines

CVE-2025-66516 is a CVSS 10.0 XXE injection vulnerability in Apache Tika affecting Solr, Elasticsearch, and countless document processing systems.

Marcus Chen

A maximum-severity vulnerability in Apache Tika, the ubiquitous document parsing library, allows attackers to exploit XML External Entity (XXE) injection to access sensitive internal resources. With a CVSS score of 10.0, CVE-2025-66516 affects countless organizations running document processing pipelines—many of whom may not realize they're using Tika.

Vulnerability Overview

  • CVE ID: CVE-2025-66516
  • CVSS Score: 10.0 (Maximum Severity)
  • Vulnerability Type: XML External Entity (XXE) Injection
  • Affected Components: tika-core, tika-parsers, tika-parser-pdf-module

The flaw stems from insecure XML parsing in Apache Tika's handling of XFA (XML Forms Architecture) content within PDF files. By constructing a PDF with specially crafted XFA payloads, an attacker can force Tika to resolve external XML entities—a classic XXE attack vector that should have been blocked.

Technical Details

According to Security Affairs, the attack works as follows:

  1. Payload Delivery: The attacker creates a PDF with embedded malicious XFA content
  2. Processing Trigger: The PDF is submitted to any system that uses Tika for document parsing
  3. XXE Exploitation: Tika processes the XFA and resolves the external XML entities it declares
  4. Data Access: The resolved entities expose sensitive internal resources to the attacker

XXE Attack Capabilities

Successful XXE exploitation can enable:

  • File Disclosure: Reading arbitrary files from the server (e.g., /etc/passwd, configuration files, credentials)
  • SSRF Attacks: Making requests to internal services
  • Port Scanning: Enumerating internal network services
  • Denial of Service: Resource exhaustion through recursive entity expansion
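
To make the attack class concrete, here is a minimal Python sketch of a common XXE countermeasure: refuse any XML that declares a DOCTYPE, because external entities can only be defined inside one. This is not Tika's code or its actual fix. The names `reject_doctype` and `DoctypeForbidden` are hypothetical, and the sample payload is a generic textbook XXE document, not the XFA content used against Tika.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an XML document declares a DOCTYPE (hypothetical helper)."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Parse xml_bytes and raise DoctypeForbidden if a DOCTYPE is declared."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Entities are declared inside the DOCTYPE, so aborting here means
        # no entity is ever defined, let alone resolved.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


# Generic textbook XXE payload (illustrative only, not the Tika exploit).
XXE_SAMPLE = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE form [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<form>&xxe;</form>"
)
```

Calling `reject_doctype(XXE_SAMPLE)` raises `DoctypeForbidden`, while a plain document such as `b"<form/>"` parses without error. Rejecting DOCTYPEs outright is stricter than disabling only external entities, which is a reasonable trade-off for untrusted input that has no legitimate need for a DTD.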

Affected Versions

Component                 Vulnerable Versions
tika-core                 1.13 through 3.2.1
tika-parsers              1.13 up to, but not including, 2.0.0
tika-parser-pdf-module    2.0.0 through 3.2.1
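
The version ranges in the table above can be encoded as a small lookup. The following Python sketch assumes plain dotted numeric versions (a suffix such as -BETA would raise a ValueError) and takes its ranges only from that table. `VULNERABLE_RANGES`, `parse_version`, and `is_vulnerable` are hypothetical names introduced for illustration.

```python
# Ranges copied from the table above: (lowest vulnerable, upper bound,
# whether the upper bound itself is vulnerable).
VULNERABLE_RANGES = {
    "tika-core": ("1.13", "3.2.1", True),
    "tika-parsers": ("1.13", "2.0.0", False),
    "tika-parser-pdf-module": ("2.0.0", "3.2.1", True),
}


def parse_version(text: str) -> tuple:
    """Turn '3.2.1' into (3, 2, 1), padding short versions with zeros."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_vulnerable(component: str, version: str) -> bool:
    """Return True if the component version falls inside its listed range."""
    low, high, high_inclusive = VULNERABLE_RANGES[component]
    current = parse_version(version)
    if current < parse_version(low):
        return False
    if high_inclusive:
        return current <= parse_version(high)
    return current < parse_version(high)
```

For example, `is_vulnerable("tika-core", "3.2.1")` is true and `is_vulnerable("tika-core", "3.2.2")` is false. A deployment should be checked component by component, since a patched PDF module does not make up for an unpatched tika-core.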

Patched Versions

Organizations must upgrade to:

  • tika-core: 3.2.2 or later
  • tika-parsers: 2.0.0 or later (move off the vulnerable 1.x line), alongside the tika-core upgrade
  • tika-parser-pdf-module: 3.2.2 or later

Critical Warning: The advisory explicitly notes that updating only the PDF parser module without upgrading tika-core leaves systems vulnerable, as the root vulnerability exists in tika-core itself.

Why This Is Worse Than It Looks

Apache Tika is everywhere—often invisibly. It's the document parsing engine behind:

  • Search Platforms: Apache Solr, Elasticsearch
  • Content Management: Document indexing and preview generation
  • Compliance Tools: Document scanning and classification systems
  • Email Gateways: Attachment analysis
  • Data Loss Prevention: Content inspection engines
  • eDiscovery Platforms: Legal document processing

Many organizations run Tika without knowing it. If you use any system that extracts text from documents—PDFs, Office files, images with OCR—there's a reasonable chance Tika is involved.

Attack Scenarios

Scenario 1: Public Upload Form

A web application accepts PDF uploads for processing (job applications, document submissions, support tickets). An attacker uploads a malicious PDF, which triggers XXE when the backend parses it.

Scenario 2: Email Processing

An organization's email gateway uses Tika to scan attachments. An attacker sends an email with a weaponized PDF attachment, compromising the email processing infrastructure when the gateway parses it.

Scenario 3: Search Index Poisoning

An enterprise search platform indexes documents from shared drives. An attacker places a malicious PDF in an accessible location, compromising the indexing service when it processes the file.

Scenario 4: CI/CD Pipeline

A build pipeline uses Tika for documentation generation or artifact processing. A malicious PDF committed to the repository compromises the build infrastructure when the pipeline parses it.

Immediate Actions

1. Inventory Tika Usage

Search your infrastructure for Tika dependencies:

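# Maven projects: search every pom.xml below the current directory
grep -rn --include="pom.xml" "tika" .

# Gradle projects: covers both Groovy and Kotlin build scripts
grep -rn --include="build.gradle" --include="build.gradle.kts" "tika" .

# Running JVMs: look for Tika jars in each Java process's system properties
# (run as the user that owns the JVM, otherwise jcmd cannot attach)
for pid in $(pgrep java); do
  jcmd "$pid" VM.system_properties 2>/dev/null | grep -i tika
done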
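
Build manifests can miss Tika jars that are already deployed inside application servers or vendor products. As a complement to the commands above, the following Python sketch walks a directory tree and reports jar files named like tika-<module>-<version>.jar. It assumes standard Maven artifact file names with numeric versions, so it will not see Tika classes shaded into a fat jar or snapshot builds. `find_tika_jars` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches e.g. tika-core-3.2.1.jar -> ("tika-core", "3.2.1").
JAR_NAME = re.compile(r"^(tika-[a-z0-9-]+?)-(\d+(?:\.\d+)*)\.jar$")


def find_tika_jars(root):
    """Return sorted (artifact, version, path) tuples for Tika jars under root."""
    hits = []
    for path in Path(root).rglob("tika-*.jar"):
        match = JAR_NAME.match(path.name)
        if match:
            hits.append((match.group(1), match.group(2), path))
    return sorted(hits)
```

A typical use is to loop over `find_tika_jars("/opt")` (the path is only an example) and compare each reported version against the affected ranges.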

2. Upgrade Immediately

Update all Tika components to version 3.2.2 or later. Remember: partial updates leave you vulnerable.

3. Review Document Processing

Identify all entry points where external documents are processed:

  • File upload forms
  • Email processing
  • Automated document ingestion
  • API endpoints accepting documents

4. Implement Defense in Depth

Even after patching:

  • Validate and sanitize uploaded files
  • Run document processing in sandboxed environments
  • Limit network access from document processing services
  • Monitor for unusual file access patterns
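
As one possible form of upload validation, the Python sketch below flags PDFs whose raw bytes mention the /XFA key, which a PDF's AcroForm dictionary uses to reference XFA form data. This is a triage heuristic under stated assumptions, not a security control: the key can sit inside a compressed object stream or be written with name escapes, so a "no-xfa-seen" result does not prove the file is safe. `screen_pdf` and its return labels are hypothetical.

```python
import re

# /XFA followed by a non-name character, e.g. "/XFA 5 0 R" or "/XFA[".
XFA_KEY = re.compile(rb"/XFA\b")


def screen_pdf(data: bytes) -> str:
    """Classify raw upload bytes as 'not-a-pdf', 'xfa-present' or 'no-xfa-seen'."""
    # The PDF header is expected near the start of the file.
    if b"%PDF-" not in data[:1024]:
        return "not-a-pdf"
    if XFA_KEY.search(data):
        return "xfa-present"
    return "no-xfa-seen"
```

Files classified as "xfa-present" could be quarantined or routed to a stricter sandbox, since most document pipelines have little legitimate need for XFA forms.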

Detection

Watch for indicators of XXE exploitation:

  • Unusual outbound connections from document processing services
  • Access to sensitive files (/etc/passwd, configuration files)
  • DNS queries for attacker-controlled domains from processing infrastructure
  • Error logs showing XML parsing failures with external entity references
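
The log-based indicators above can be approximated with a simple pattern scan. In this Python sketch the regular expression and the function name `flag_xxe_indicators` are assumptions chosen for illustration; real log formats vary, so the patterns would need tuning and will produce false positives.

```python
import re

# Strings that commonly accompany XXE attempts in parser error output.
SUSPICIOUS = re.compile(
    r"<!ENTITY|<!DOCTYPE|\bSYSTEM\s+[\"']|file://|/etc/passwd",
    re.IGNORECASE,
)


def flag_xxe_indicators(lines):
    """Return the log lines that contain a suspicious XXE-related string."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]
```

The function accepts any iterable of strings, so an open log file can be passed directly, for example `flag_xxe_indicators(open(path))` where `path` points at the document service's log.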

Organizations should inventory their Tika usage and prioritize patching. The ubiquity of this library means the attack surface is vast.
