PROBABLYPWNED
Vulnerabilities | May 17, 2026 | 3 min read

Ollama Flaw Leaks AI Server Memory to Attackers — 300K at Risk

Critical CVE-2026-7482 vulnerability in Ollama's GGUF model loader lets remote attackers extract API keys, prompts, and conversation data from 300,000+ exposed servers.

Marcus Chen

A critical vulnerability in Ollama, the popular open-source framework for running large language models locally, could expose sensitive data from more than 300,000 internet-facing servers. The flaw, discovered by security researchers at Cyera and codenamed "Bleeding Llama," allows unauthenticated attackers to read arbitrary heap memory from vulnerable instances.

CVE-2026-7482 carries a CVSS score of 9.1 (Critical) and affects all Ollama versions before 0.17.1. Organizations running self-hosted AI infrastructure should patch immediately—the vulnerability requires no authentication and can be exploited remotely with a single malicious model file.

How the Attack Works

The vulnerability exists in Ollama's handling of GGUF (GPT-Generated Unified Format) files, the standard format for storing and distributing quantized language models. When processing a crafted GGUF file, Ollama's model loader fails to validate tensor offset and size values against the file's actual length.

During quantization, in code paths in fs/ggml/gguf.go and server/quantization.go, the server reads past the end of the allocated heap buffer, a classic out-of-bounds read. Because this code relies on Go's unsafe package, which bypasses the language's built-in memory safety guarantees, what would otherwise fail as a bounds-check error becomes an exploitable information disclosure.
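The kind of check that the loader skips can be sketched in a few lines. This is a minimal illustration in Python, not Ollama's actual code or its patch; the function name `tensor_in_bounds` and its parameters are hypothetical.

```python
def tensor_in_bounds(offset: int, size: int, file_len: int) -> bool:
    """Return True only if the byte range [offset, offset + size) lies inside the file.

    Hypothetical helper: a parser would call this for every tensor entry
    before reading any tensor data.
    """
    # Negative values can only come from a corrupt or malicious header.
    if offset < 0 or size < 0:
        return False
    # Compare against the remaining bytes instead of computing offset + size,
    # so the same logic stays safe in languages with fixed-width integers.
    return offset <= file_len and size <= file_len - offset
```

A header that declares a 10-byte tensor at offset 5 of a 10-byte file fails this check, so a parser using it would reject the file instead of reading whatever sits next to the buffer in memory.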

An attacker exploiting this flaw would:

  1. Upload a specially crafted GGUF file with inflated tensor dimensions via HTTP POST
  2. Trigger the vulnerable code path using the /api/create endpoint during model creation
  3. Exfiltrate leaked heap memory through the /api/push endpoint to an attacker-controlled registry

The leaked data could include environment variables, API keys for connected services, system prompts that organizations want to keep confidential, and conversation data from concurrent users.

Why This Matters

This vulnerability highlights the security challenges of running AI infrastructure at the edge. Unlike centralized API services where providers manage security, self-hosted deployments put the burden on organizations that may lack dedicated security teams.

The 300,000+ exposed Ollama instances identified by researchers represent a significant attack surface. Many of these deployments likely handle sensitive internal data—companies use local LLMs precisely because they want to keep prompts and responses private rather than sending them to external providers.

We covered similar concerns about exposed AI infrastructure recently when SANS researchers documented widespread scanning of self-hosted model endpoints. The Bleeding Llama vulnerability demonstrates that running models locally doesn't automatically mean running them securely.

The attack also mirrors techniques seen in the GreyNoise report on AI infrastructure attacks, where researchers tracked over 91,000 attack attempts targeting LLM deployments. As more organizations adopt local AI solutions, these systems become increasingly attractive targets.

Affected Versions and Patch

The vulnerability affects Ollama versions from 0.1.0 through 0.17.0. Cyera's disclosure notes that the fix arrived in version 0.17.1, released on May 10, 2026.
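A quick way to check an instance against that version boundary is sketched below. It assumes the server answers on Ollama's default port 11434 and exposes its version at /api/version as JSON with a "version" field; the helper names are hypothetical, and pre-release suffixes are ignored, so a release candidate of 0.17.1 would be reported as fixed even though it may not contain the final patch.

```python
import json
import urllib.request

FIXED = (0, 17, 1)  # first patched release, per the disclosure cited above


def parse_version(text: str) -> tuple:
    """Turn '0.17.0' or 'v0.17.1-rc1' into a tuple of ints such as (0, 17, 0)."""
    # Drop a leading 'v' and any pre-release or build suffix (assumption:
    # suffixes are not compared, only the numeric core).
    core = text.strip().lstrip("v").split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def is_vulnerable(version: str) -> bool:
    """True if the version sorts before the first patched release."""
    return parse_version(version) < FIXED


def fetch_version(base_url: str = "http://localhost:11434") -> str:
    """Ask a running instance for its version (assumed endpoint and port)."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]
```

Calling `is_vulnerable(fetch_version())` on each host gives a simple yes or no for an inventory sweep; only `fetch_version` touches the network.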

Organizations should:

  1. Upgrade immediately to Ollama 0.17.1 or later
  2. Audit network exposure — Ollama instances should not be directly accessible from the internet without authentication
  3. Deploy authentication proxies or API gateways in front of Ollama endpoints
  4. Review logs for suspicious /api/create or /api/push requests from unknown sources
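The log review in step 4 could start from something like the sketch below. It assumes access-log lines in a Gin-style layout (status, latency, client IP, method, quoted path, separated by "|"); adjust the pattern to whatever your deployment actually writes. The function name and the default trusted networks are hypothetical, and behind a reverse proxy the logged address may be the proxy's rather than the client's.

```python
import ipaddress
import re

# Assumed line shape:
# [GIN] 2026/05/12 - 10:15:02 | 200 | 1.234s | 203.0.113.7 | POST "/api/create"
LINE = re.compile(
    r'\|\s*(?P<ip>[0-9a-fA-F:.]+)\s*\|\s*(?P<method>[A-Z]+)\s+"(?P<path>[^"]+)"'
)
WATCHED = {"/api/create", "/api/push"}


def suspicious_requests(lines, trusted_networks=("127.0.0.0/8", "::1/128")):
    """Return (ip, method, path) for watched endpoints hit from untrusted addresses."""
    trusted = [ipaddress.ip_network(net) for net in trusted_networks]
    hits = []
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue
        path = match["path"].split("?")[0]  # ignore any query string
        if path not in WATCHED:
            continue
        try:
            ip = ipaddress.ip_address(match["ip"])
        except ValueError:
            continue  # not a parseable address, skip the line
        if not any(ip in net for net in trusted):
            hits.append((str(ip), match["method"], path))
    return hits
```

Pass it an open log file, for example `suspicious_requests(open("server.log"))` with a path of your own, and extend `trusted_networks` with the ranges your internal clients use.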

The Broader AI Security Challenge

Bleeding Llama joins a growing list of vulnerabilities in AI/ML tooling discovered this year. The recent SGLang GGUF vulnerability demonstrated that similar file-format parsing flaws exist across multiple LLM serving frameworks. Supply chain attacks against Hugging Face repositories and npm packages targeting AI workflows show that attackers are systematically probing the AI development ecosystem.

Security teams deploying local AI infrastructure should treat these systems with the same rigor applied to any internet-facing service: network segmentation, authentication requirements, regular patching, and monitoring for anomalous access patterns.
