Maximum Severity Apache Tika Flaw Threatens Document Pipelines
CVE-2025-66516 is a CVSS 10.0 XXE injection vulnerability in Apache Tika affecting Solr, Elasticsearch, and countless document processing systems.
A maximum-severity vulnerability in Apache Tika, the widely used document parsing library, lets attackers use XML External Entity (XXE) injection to access sensitive internal resources. With a CVSS score of 10.0, CVE-2025-66516 affects any organization running Tika in a document processing pipeline, and many of them may not realize they depend on it.
Vulnerability Overview
- CVE ID: CVE-2025-66516
- CVSS Score: 10.0 (Maximum Severity)
- Vulnerability Type: XML External Entity (XXE) Injection
- Affected Components: tika-core, tika-parsers, tika-parser-pdf-module
The flaw stems from insecure XML parsing in Apache Tika's handling of XFA (XML Forms Architecture) content within PDF files. By constructing a PDF with specially crafted XFA payloads, an attacker can force Tika to resolve external XML entities—a classic XXE attack vector that should have been blocked.
Technical Details
According to Security Affairs, the attack works as follows:
- Payload Delivery: Attacker creates a PDF with embedded malicious XFA content
- Processing Trigger: The PDF is submitted to any system using Tika for document parsing
- XXE Exploitation: Tika processes the XFA, triggering external XML entity resolution
- Data Access: Attacker gains access to sensitive internal resources
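To make the XXE step concrete, the sketch below shows the generic, textbook shape of an external entity declaration together with a rough text check for it. It is not the Tika exploit and not Tika's own code: the function name `declares_external_entity` and the sample XML are illustrative assumptions, and the check is a pattern match, not an XML parser.

```shell
# Succeeds (exit 0) when the XML on stdin contains an <!ENTITY ...>
# declaration that uses SYSTEM or PUBLIC, i.e. an external entity.
# Heuristic text match only; it does not parse or validate XML.
declares_external_entity() {
    tr '\n' ' ' | grep -Eq '<!ENTITY[^>]*[[:space:]](SYSTEM|PUBLIC)[[:space:]]'
}

# Generic XXE shape (illustrative sample, not taken from a real PDF):
sample='<?xml version="1.0"?>
<!DOCTYPE form [ <!ENTITY ext SYSTEM "file:///etc/hostname"> ]>
<form><field>&ext;</field></form>'

if printf '%s' "$sample" | declares_external_entity; then
    echo "external entity declared"
fi
```

A parser that resolves `&ext;` would substitute the contents of the referenced resource into the document, which is why hardened parsers refuse DOCTYPE declarations or external entities outright.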
XXE Attack Capabilities
Successful XXE exploitation can enable:
- File Disclosure: Reading arbitrary files from the server (e.g., /etc/passwd, configuration files, credentials)
- SSRF Attacks: Making requests to internal services
- Port Scanning: Enumerating internal network services
- Denial of Service: Resource exhaustion through recursive entity expansion
Affected Versions
| Component | Vulnerable Versions |
|---|---|
| tika-core | 1.13 through 3.2.1 |
| tika-parsers | 1.13 up to, but not including, 2.0.0 |
| tika-parser-pdf-module | 2.0.0 through 3.2.1 |
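The ranges in the table can be turned into a quick check. The following is a minimal sketch that assumes plain dotted numeric versions (suffixes such as `-BETA` are not handled); `ver_le` and `is_vulnerable` are hypothetical helper names, not part of any Tika tooling.

```shell
# ver_le A B: succeeds when dotted numeric version A <= version B.
# Compares up to three numeric components using a numeric sort.
ver_le() {
    [ "$1" = "$2" ] && return 0
    first=$(printf '%s\n%s\n' "$1" "$2" | sort -t . -k 1,1n -k 2,2n -k 3,3n | head -n 1)
    [ "$first" = "$1" ]
}

# is_vulnerable ARTIFACT VERSION: succeeds when VERSION falls inside the
# affected range listed in the table above for ARTIFACT.
is_vulnerable() {
    case $1 in
        tika-core)              ver_le 1.13 "$2"  && ver_le "$2" 3.2.1 ;;
        tika-parser-pdf-module) ver_le 2.0.0 "$2" && ver_le "$2" 3.2.1 ;;
        tika-parsers)           ver_le 1.13 "$2"  && ! ver_le 2.0.0 "$2" ;;
        *)                      return 1 ;;
    esac
}

for v in 1.12 2.9.2 3.2.1 3.2.2; do
    if is_vulnerable tika-core "$v"; then
        echo "tika-core $v: in affected range"
    else
        echo "tika-core $v: not in affected range"
    fi
done
```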
Patched Versions
Organizations must upgrade to:
- tika-core: 3.2.2 or later
- tika-parsers: Latest patched version
- tika-parser-pdf-module: Latest patched version
Critical Warning: The advisory explicitly notes that updating only the PDF parser module without upgrading tika-core leaves systems vulnerable, as the root vulnerability exists in tika-core itself.
Why This Is Worse Than It Looks
Apache Tika is everywhere—often invisibly. It's the document parsing engine behind:
- Search Platforms: Apache Solr, Elasticsearch
- Content Management: Document indexing and preview generation
- Compliance Tools: Document scanning and classification systems
- Email Gateways: Attachment analysis
- Data Loss Prevention: Content inspection engines
- eDiscovery Platforms: Legal document processing
Many organizations run Tika without knowing it. If you use any system that extracts text from documents—PDFs, Office files, images with OCR—there's a reasonable chance Tika is involved.
Attack Scenarios
Scenario 1: Public Upload Form
A web application accepts PDF uploads for processing (job applications, document submissions, support tickets). An attacker uploads a malicious PDF, which triggers XXE when the backend processes it.
Scenario 2: Email Processing
An organization's email gateway uses Tika to scan attachments. An attacker sends an email with a weaponized PDF attachment, compromising the email processing infrastructure.
Scenario 3: Search Index Poisoning
An enterprise search platform indexes documents from shared drives. An attacker places a malicious PDF in an accessible location, compromising the indexing service when it processes the file.
Scenario 4: CI/CD Pipeline
A build pipeline uses Tika for documentation generation or artifact processing. A malicious PDF in the repository compromises the build infrastructure.
Immediate Actions
1. Inventory Tika Usage
Search your infrastructure for Tika dependencies:
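# Maven projects (lists every pom.xml under the current directory that mentions Tika)
grep -rli --include="pom.xml" "tika" .
# Gradle projects (Groovy and Kotlin DSL build files)
grep -rli --include="build.gradle" --include="build.gradle.kts" "tika" .
# Check running Java processes (jcmd needs a PID; the classpath is part of the system properties)
for pid in $(pgrep java); do jcmd "$pid" VM.system_properties 2>/dev/null | grep -i tika; done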
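Build files only cover source trees, so it can also help to look at deployed JAR files. The sketch below assumes the usual Maven naming convention `artifact-version.jar`; the function name `list_tika_jars` is hypothetical, and Tika classes bundled inside shaded or "fat" JARs will not show up by file name.

```shell
# list_tika_jars [DIR]: prints "artifact<TAB>version<TAB>path" for each
# Tika JAR found under DIR (default: current directory).
# The version is taken as the text after the last hyphen in the file name,
# so qualifiers such as "-BETA" are not handled.
list_tika_jars() {
    find "${1:-.}" -type f \( -name 'tika-core-*.jar' \
        -o -name 'tika-parsers-*.jar' \
        -o -name 'tika-parser-pdf-module-*.jar' \) 2>/dev/null |
    while IFS= read -r jar; do
        base=$(basename "$jar" .jar)
        printf '%s\t%s\t%s\n' "${base%-*}" "${base##*-}" "$jar"
    done
}

# Example (the path is a placeholder): list_tika_jars /opt/myapp
```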
2. Upgrade Immediately
Update all Tika components to version 3.2.2 or later. Remember: partial updates leave you vulnerable.
3. Review Document Processing
Identify all entry points where external documents are processed:
- File upload forms
- Email processing
- Automated document ingestion
- API endpoints accepting documents
4. Implement Defense in Depth
Even after patching:
- Validate and sanitize uploaded files
- Run document processing in sandboxed environments
- Limit network access from document processing services
- Monitor for unusual file access patterns
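As one possible file-validation step, uploads can be screened for XFA content before they reach the parser. The sketch below is a heuristic under stated assumptions: it looks for the `/XFA` key that a PDF's AcroForm dictionary uses to reference XFA data, and it can miss PDFs where that key sits inside a compressed object stream. The function name `pdf_mentions_xfa` is hypothetical, and a clean result is not proof that a file is safe.

```shell
# pdf_mentions_xfa FILE: succeeds when the raw bytes of FILE contain the
# literal key "/XFA". -a treats the binary file as text; LC_ALL=C avoids
# locale-dependent handling of non-UTF-8 bytes.
pdf_mentions_xfa() {
    LC_ALL=C grep -aq -- '/XFA' "$1"
}

# Example use in an upload handler (paths are placeholders):
#   if pdf_mentions_xfa "$upload"; then mv "$upload" /var/quarantine/; fi
```

Quarantining flagged files for review, rather than rejecting them silently, keeps legitimate XFA forms recoverable while the patch is rolled out.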
Detection
Watch for indicators of XXE exploitation:
- Unusual outbound connections from document processing services
- Access to sensitive files (/etc/passwd, configuration files)
- DNS queries for attacker-controlled domains from processing infrastructure
- Error logs showing XML parsing failures with external entity references
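The last indicator can be approximated with a simple log filter. This is a rough sketch: the function name `scan_for_xxe_indicators`, the patterns, and the sample log lines are all assumptions for illustration, and real log formats will need their own patterns.

```shell
# Reads log text on stdin and prints, with line numbers, every line that
# mentions a DOCTYPE or ENTITY declaration, the JAXP "disallow-doctype-decl"
# feature name, or a SYSTEM identifier pointing at a URL scheme.
scan_for_xxe_indicators() {
    grep -Ein '<!DOCTYPE|<!ENTITY|disallow-doctype-decl|SYSTEM[[:space:]]+"(file|https?|ftp|jar):'
}

# Hypothetical log excerpt (invented for this example):
sample_log='10:00:01 INFO  parsed invoice.pdf in 120 ms
10:00:02 WARN  XML parse error: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true
10:00:03 ERROR failed to resolve <!ENTITY ext SYSTEM "http://203.0.113.7/x">'

printf '%s\n' "$sample_log" | scan_for_xxe_indicators
```

Matches from a document processing service are worth correlating with the outbound connection and DNS indicators listed above.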
Resources
Organizations should inventory their Tika usage and prioritize patching. The ubiquity of this library means the attack surface is vast.