Ollama Flaw Leaks AI Server Memory to Attackers — 300K at Risk
Critical CVE-2026-7482 vulnerability in Ollama's GGUF model loader lets remote attackers extract API keys, prompts, and conversation data from 300,000+ exposed servers.
A critical vulnerability in Ollama, the popular open-source framework for running large language models locally, could expose sensitive data from more than 300,000 internet-facing servers. The flaw, discovered by security researchers at Cyera and codenamed "Bleeding Llama," allows unauthenticated attackers to read arbitrary heap memory from vulnerable instances.
CVE-2026-7482 carries a CVSS score of 9.1 (Critical) and affects all Ollama versions before 0.17.1. Organizations running self-hosted AI infrastructure should patch immediately—the vulnerability requires no authentication and can be exploited remotely with a single malicious model file.
How the Attack Works
The vulnerability exists in Ollama's handling of GGUF (GPT-Generated Unified Format) files, the standard format for storing and distributing quantized language models. When processing a crafted GGUF file, Ollama's model loader fails to validate tensor offset and size values against the file's actual length.
During quantization, in fs/ggml/gguf.go and server/quantization.go, the server reads past the end of the allocated heap buffer, a classic out-of-bounds read. Go would normally stop such a read with a bounds-check panic, but Go's unsafe package enables operations that bypass the language's built-in memory safety guarantees, turning what should be a runtime error into an exploitable information disclosure.
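The missing validation can be sketched in a few lines. The Python below is only an illustration of the general rule, not Ollama's code (which is written in Go) and not its patch. The function name tensor_in_bounds and its parameters are hypothetical. The idea is that a tensor's declared offset and size must describe a byte range that lies entirely inside the file.

```python
def tensor_in_bounds(offset: int, size: int, file_len: int) -> bool:
    """Return True only if the byte range [offset, offset + size) fits in the file.

    Hypothetical helper for illustration; not taken from Ollama's source.
    """
    if offset < 0 or size < 0:
        return False
    # Python integers cannot overflow. In a fixed-width language, comparing
    # size against (file_len - offset) avoids wraparound in offset + size.
    return offset <= file_len and size <= file_len - offset


# A 1 KiB file cannot hold a tensor that claims 4 KiB of data.
print(tensor_in_bounds(offset=512, size=256, file_len=1024))   # True
print(tensor_in_bounds(offset=512, size=4096, file_len=1024))  # False
```

A loader that rejects any file failing such a check before reading tensor data never reaches the out-of-bounds read.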
An attacker exploiting this flaw would:
- Upload a specially crafted GGUF file with inflated tensor dimensions via HTTP POST
- Trigger the vulnerable code path using the /api/create endpoint during model creation
- Exfiltrate leaked heap memory through the /api/push endpoint to an attacker-controlled registry
The leaked data could include environment variables, API keys for connected services, system prompts that organizations want to keep confidential, and conversation data from concurrent users.
Why This Matters
This vulnerability highlights the security challenges of self-hosting AI infrastructure. Unlike centralized API services, where the provider manages security, self-hosted deployments put that burden on organizations that may lack dedicated security teams.
The 300,000+ exposed Ollama instances identified by researchers represent a significant attack surface. Many of these deployments likely handle sensitive internal data—companies use local LLMs precisely because they want to keep prompts and responses private rather than sending them to external providers.
We covered similar concerns about exposed AI infrastructure recently when SANS researchers documented widespread scanning of self-hosted model endpoints. The Bleeding Llama vulnerability demonstrates that running models locally doesn't automatically mean running them securely.
The attack also mirrors techniques seen in the GreyNoise report on AI infrastructure attacks, where researchers tracked over 91,000 attack attempts targeting LLM deployments. As more organizations adopt local AI solutions, these systems become increasingly attractive targets.
Affected Versions and Patch
The vulnerability affects Ollama versions from 0.1.0 through 0.17.0. Cyera's disclosure notes that the fix arrived in version 0.17.1, released on May 10, 2026.
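For teams tracking many hosts, the version range above can be turned into a quick check. This is a minimal sketch that assumes plain dotted version strings such as "0.17.0". The names parse_version, is_affected, and FIRST_FIXED are hypothetical, and pre-release suffixes (for example "-rc1") are not handled and would raise a ValueError.

```python
FIRST_FIXED = (0, 17, 1)  # first patched release, per the disclosure


def parse_version(text: str) -> tuple:
    """Parse a plain dotted version such as '0.17.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_affected(version: str) -> bool:
    """Return True if the version is older than the first fixed release."""
    return parse_version(version) < FIRST_FIXED


for v in ("0.9.5", "0.17.0", "0.17.1", "0.18.0"):
    print(v, "affected" if is_affected(v) else "patched")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.9.5" would sort after "0.17.1" and be wrongly reported as patched.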
Organizations should:
- Upgrade immediately to Ollama 0.17.1 or later
- Audit network exposure — Ollama instances should not be directly accessible from the internet without authentication
- Deploy authentication proxies or API gateways in front of Ollama endpoints
- Review logs for suspicious /api/create or /api/push requests from unknown sources
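The log review step can be partly automated. The sketch below assumes access-log lines in a Gin-style layout (status, latency, client IP, method, quoted path, separated by "|"). The sample lines, the trusted network ranges, and the function name suspicious_requests are all assumptions for illustration and should be adapted to the actual log format and network layout.

```python
import ipaddress
import re

# Assumed layout: ... | <client ip> | <METHOD> "<path>"
LINE_RE = re.compile(
    r'\|\s*(?P<ip>[0-9a-fA-F:.]+)\s*\|\s*(?P<method>[A-Z]+)\s+"(?P<path>[^"]+)"'
)
WATCHED_PATHS = ("/api/create", "/api/push")
# Hypothetical trusted ranges; replace with your own.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def suspicious_requests(lines):
    """Return (ip, method, path) for watched endpoints hit from untrusted IPs."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an access-log line in the assumed layout
        path = match["path"].split("?", 1)[0]  # drop any query string
        if path not in WATCHED_PATHS:
            continue
        try:
            client = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # field did not contain a valid IP address
        if not any(client in net for net in TRUSTED_NETWORKS):
            hits.append((str(client), match["method"], path))
    return hits


sample = [
    '[GIN] 2026/05/12 - 10:15:02 | 200 |   1.2ms |   203.0.113.7 | POST     "/api/create"',
    '[GIN] 2026/05/12 - 10:15:09 | 200 |   3.4ms |      10.0.0.5 | POST     "/api/push"',
    '[GIN] 2026/05/12 - 10:16:00 | 200 |   0.8ms |   203.0.113.7 | GET      "/api/tags"',
]
print(suspicious_requests(sample))  # [('203.0.113.7', 'POST', '/api/create')]
```

A hit is only a lead, not proof of exploitation: it flags requests worth correlating with outbound traffic to unfamiliar registries.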
The Broader AI Security Challenge
Bleeding Llama joins a growing list of vulnerabilities in AI/ML tooling discovered this year. The recent SGLang GGUF vulnerability demonstrated that similar file format parsing flaws exist across multiple LLM serving frameworks. Supply chain attacks against Hugging Face repositories and npm packages targeting AI workflows show that attackers are systematically probing the AI development ecosystem.
Security teams deploying local AI infrastructure should treat these systems with the same rigor applied to any internet-facing service: network segmentation, authentication requirements, regular patching, and monitoring for anomalous access patterns.