Maximum Severity Apache Tika Flaw Threatens Document Pipelines
CVE-2025-66516 is a CVSS 10.0 XXE injection vulnerability in Apache Tika affecting Solr, Elasticsearch, and countless document processing systems.
A maximum-severity vulnerability in Apache Tika, the ubiquitous document parsing library, lets attackers use XML External Entity (XXE) injection to read sensitive internal resources. Rated CVSS 10.0, CVE-2025-66516 affects any organization running a vulnerable Tika release in a document processing pipeline, and many of them may not realize they're using Tika.
Vulnerability Overview
- CVE ID: CVE-2025-66516
- CVSS Score: 10.0 (Maximum Severity)
- Vulnerability Type: XML External Entity (XXE) Injection
- Affected Components: tika-core, tika-parsers, tika-parser-pdf-module
The flaw stems from insecure XML parsing in Apache Tika's handling of XFA (XML Forms Architecture) content within PDF files. By constructing a PDF with specially crafted XFA payloads, an attacker can force Tika to resolve external XML entities—a classic XXE attack vector that should have been blocked.
Technical Details
According to Security Affairs, the attack works as follows:
- Payload Delivery: Attacker creates a PDF with embedded malicious XFA content
- Processing Trigger: The PDF is submitted to any system using Tika for document parsing
- XXE Exploitation: Tika processes the XFA, triggering external XML entity resolution
- Data Access: Attacker gains access to sensitive internal resources
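To make the XXE step concrete: an external entity is declared in a DOCTYPE with a SYSTEM or PUBLIC identifier, for example `<!ENTITY xxe SYSTEM "file:///etc/passwd">`, and is later expanded wherever `&xxe;` appears. The sketch below is a rough screening heuristic for XML that has already been extracted as plain text. The function name `has_external_entity` is hypothetical, and the check will not see XFA content that is still compressed inside a PDF stream.

```shell
#!/bin/sh
# Heuristic sketch (not a parser): flag XML text that declares an external entity.
# has_external_entity is a hypothetical helper name, not part of Tika.

# has_external_entity FILE: succeeds if FILE contains an <!ENTITY ...> declaration
# that uses a SYSTEM or PUBLIC identifier (general or parameter entity).
has_external_entity() {
  # Join lines first so declarations split across lines still match.
  tr '\n' ' ' < "$1" |
    grep -Eq '<!ENTITY[[:space:]]+(%[[:space:]]+)?[^[:space:]>]+[[:space:]]+(SYSTEM|PUBLIC)[[:space:]]'
}
```

A call such as `has_external_entity extracted.xml && echo "external entity declared"` could feed a quarantine step. A match only means the document asks for external resolution; whether anything is fetched depends on how the parser is configured.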
XXE Attack Capabilities
Successful XXE exploitation can enable:
- File Disclosure: Reading arbitrary files from the server (e.g., /etc/passwd, configuration files, credentials)
- SSRF Attacks: Making requests to internal services
- Port Scanning: Enumerating internal network services
- Denial of Service: Resource exhaustion through recursive entity expansion
Affected Versions
| Component | Vulnerable Versions |
|---|---|
| tika-core | 1.13 through 3.2.1 |
| tika-parsers | 1.13 up to, but not including, 2.0.0 |
| tika-parser-pdf-module | 2.0.0 through 3.2.1 |
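The ranges in the table can be turned into a quick check. The following is a minimal sketch, assuming `sort -V` is available (GNU coreutils or BusyBox) and that versions are plain dotted numbers; pre-release suffixes are not handled. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a Tika artifact version against the vulnerable ranges above.
# Assumes `sort -V`; helper names are illustrative only.

# version_le A B: succeeds when A <= B in version order.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# version_lt A B: succeeds when A < B in version order.
version_lt() {
  [ "$1" != "$2" ] && version_le "$1" "$2"
}

# tika_status ARTIFACT VERSION: prints vulnerable, not-listed, or unknown-artifact.
tika_status() {
  case "$1" in
    tika-core)              lo=1.13;  hi=3.2.1; inclusive=yes ;;
    tika-parsers)           lo=1.13;  hi=2.0.0; inclusive=no ;;
    tika-parser-pdf-module) lo=2.0.0; hi=3.2.1; inclusive=yes ;;
    *) echo unknown-artifact; return 0 ;;
  esac
  # Below the lower bound: outside the listed range.
  if ! version_le "$lo" "$2"; then echo not-listed; return 0; fi
  if [ "$inclusive" = yes ] && version_le "$2" "$hi"; then
    echo vulnerable
  elif [ "$inclusive" = no ] && version_lt "$2" "$hi"; then
    echo vulnerable
  else
    echo not-listed
  fi
}
```

For example, `tika_status tika-core 3.2.1` prints `vulnerable` and `tika_status tika-core 3.2.2` prints `not-listed`. A `not-listed` result for the PDF module says nothing about the tika-core version deployed beside it, so each artifact should be checked separately.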
Patched Versions
Organizations must upgrade to:
- tika-core: 3.2.2 or later
- tika-parsers: the 1.x line is affected; move to a current release built on tika-core 3.2.2 or later
- tika-parser-pdf-module: 3.2.2 or later, upgraded together with tika-core
Critical Warning: The advisory explicitly notes that updating only the PDF parser module without upgrading tika-core leaves systems vulnerable, as the root vulnerability exists in tika-core itself.
Why This Is Worse Than It Looks
Apache Tika is everywhere—often invisibly. It's the document parsing engine behind:
- Search Platforms: Apache Solr, Elasticsearch
- Content Management: Document indexing and preview generation
- Compliance Tools: Document scanning and classification systems
- Email Gateways: Attachment analysis
- Data Loss Prevention: Content inspection engines
- eDiscovery Platforms: Legal document processing
Many organizations run Tika without knowing it. If you use any system that extracts text from documents—PDFs, Office files, images with OCR—there's a reasonable chance Tika is involved.
Attack Scenarios
Scenario 1: Public Upload Form
A web application accepts PDF uploads for processing (job applications, document submissions, support tickets). An attacker uploads a malicious PDF, which triggers XXE when the backend parses it.
Scenario 2: Email Processing
An organization's email gateway uses Tika to scan attachments. An attacker sends an email with a weaponized PDF attachment, exposing the email processing infrastructure to XXE.
Scenario 3: Search Index Poisoning
An enterprise search platform indexes documents from shared drives. An attacker places a malicious PDF in an accessible location, and the indexing service is exploited when it processes the file.
Scenario 4: CI/CD Pipeline
A build pipeline uses Tika for documentation generation or artifact processing. A malicious PDF committed to the repository exposes the build infrastructure when the pipeline parses it.
Immediate Actions
1. Inventory Tika Usage
Search your infrastructure for Tika dependencies:
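# Maven projects: build files that mention Tika directly
grep -rli --include=pom.xml "tika" .
# Gradle projects (Groovy and Kotlin DSL)
grep -rli --include=build.gradle --include=build.gradle.kts "tika" .
# Transitive dependencies (run inside a Maven project)
mvn dependency:tree -Dincludes=org.apache.tika
# Running Java processes whose system properties (including the classpath) mention Tika
for pid in $(pgrep java); do
  jcmd "$pid" VM.system_properties 2>/dev/null | grep -qi tika && echo "PID $pid references Tika"
done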
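Build files only show declared dependencies. A complementary check is to look for Tika JARs already deployed on disk. The sketch below assumes the conventional `artifact-version.jar` file naming; `list_tika_jars` is a hypothetical helper, and it will not find Tika classes bundled inside shaded JARs, WARs, or container image layers that are not mounted.

```shell
#!/bin/sh
# Sketch: list Tika JARs under a directory as "artifact version path".
# Assumes file names follow the usual artifact-version.jar convention.

# list_tika_jars [DIR]: scans DIR (default: current directory).
list_tika_jars() {
  find "${1:-.}" -type f -name 'tika-*.jar' 2>/dev/null | sort |
    while IFS= read -r path; do
      name="$(basename "$path")"
      # Split "tika-core-3.2.1.jar" into "tika-core 3.2.1".
      parsed="$(printf '%s\n' "$name" |
        sed -n 's/^\(tika-[a-z-]*\)-\([0-9][0-9A-Za-z.-]*\)\.jar$/\1 \2/p')"
      if [ -n "$parsed" ]; then
        printf '%s %s\n' "$parsed" "$path"
      fi
    done
}
```

Running `list_tika_jars /opt` on each host or unpacked image gives a flat list that can be compared against the affected version ranges.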
2. Upgrade Immediately
Update all Tika components to version 3.2.2 or later. Remember: partial updates leave you vulnerable.
3. Review Document Processing
Identify all entry points where external documents are processed:
- File upload forms
- Email processing
- Automated document ingestion
- API endpoints accepting documents
4. Implement Defense in Depth
Even after patching:
- Validate and sanitize uploaded files
- Run document processing in sandboxed environments
- Limit network access from document processing services
- Monitor for unusual file access patterns
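As one illustration of upload validation, the sketch below checks the PDF magic bytes and flags files whose raw bytes contain an `/XFA` key so they can be routed to stricter handling. This is a coarse heuristic under stated assumptions: the function names and the three result labels are hypothetical, and the `/XFA` key can sit inside a compressed object stream or be written with escaped characters, in which case a raw byte search will miss it. It complements patching and does not replace it.

```shell
#!/bin/sh
# Sketch: coarse pre-filter for uploaded files before they reach the parser.
# Helper names and result labels are illustrative only.

# looks_like_pdf FILE: succeeds if FILE starts with the "%PDF-" magic bytes.
looks_like_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# mentions_xfa FILE: succeeds if the raw bytes contain an /XFA key.
# Misses keys hidden in compressed object streams or written with escapes.
mentions_xfa() {
  LC_ALL=C grep -a -q '/XFA' "$1"
}

# triage_upload FILE: prints reject-not-pdf, quarantine-xfa, or pass.
triage_upload() {
  if ! looks_like_pdf "$1"; then
    echo reject-not-pdf
  elif mentions_xfa "$1"; then
    echo quarantine-xfa
  else
    echo pass
  fi
}
```

A `pass` result only means the cheap checks found nothing, so the file should still be parsed by a patched Tika inside a sandbox with restricted network access.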
Detection
Watch for indicators of XXE exploitation:
- Unusual outbound connections from document processing services
- Access to sensitive files (/etc/passwd, configuration files)
- DNS queries for attacker-controlled domains from processing infrastructure
- Error logs showing XML parsing failures with external entity references
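The last indicator can be approximated with a log search. The sketch below assumes plain-text application logs; the patterns are generic XML terms chosen for illustration, not exact messages emitted by Tika, and `scan_log_for_xxe` is a hypothetical name. Expect false positives and tune the patterns to your own log format.

```shell
#!/bin/sh
# Sketch: surface log lines that mention DOCTYPE or external entity handling.
# Patterns are illustrative guesses, not known Tika log messages.

# scan_log_for_xxe FILE: prints matching lines prefixed with their line number.
# Exit status is non-zero when nothing matches (standard grep behavior).
scan_log_for_xxe() {
  grep -n -E 'DOCTYPE|<!ENTITY|[Ee]xternal[[:space:]]+([Ee]ntity|DTD)' "$1"
}
```

Hits from this search are most useful when correlated with the other indicators, such as outbound connections or DNS lookups from the same host at the same time.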
Resources
Organizations should inventory their Tika usage and prioritize patching. The ubiquity of this library means the attack surface is vast.