Ollama Flaw Leaks AI Server Memory to Attackers — 300K at Risk

A critical vulnerability in Ollama, the popular open-source framework for running large language models locally, could expose sensitive data from more than 300,000 internet-facing servers. The flaw, discovered by security researchers at Cyera and codenamed "Bleeding Llama," allows unauthenticated attackers to read arbitrary heap memory from vulnerable instances.

CVE-2026-7482 carries a CVSS score of 9.1 (Critical) and affects all Ollama versions before 0.17.1. Organizations running self-hosted AI infrastructure should patch immediately—the vulnerability requires no authentication and can be exploited remotely with a single malicious model file.

How the Attack Works

The vulnerability exists in Ollama's handling of GGUF (GPT-Generated Unified Format) files, the standard format for storing and distributing quantized language models. When processing a crafted GGUF file, Ollama's model loader fails to validate tensor offset and size values against the file's actual length.
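The missing check is easy to picture. The sketch below is not Ollama's code or its patch; it is a minimal illustration, assuming a hypothetical `tensorInfo` type and a `validateTensor` helper, of how a loader could confirm that a tensor's declared offset and size fit inside the file before reading anything.

```go
package main

import (
	"errors"
	"fmt"
)

// tensorInfo is a hypothetical, simplified view of one GGUF tensor entry.
// The field names are illustrative and are not Ollama's actual types.
type tensorInfo struct {
	Name   string
	Offset uint64 // byte offset of the tensor data, relative to the data section
	Size   uint64 // declared byte length of the tensor data
}

var errTensorOutOfBounds = errors.New("tensor range exceeds file length")

// validateTensor checks that the tensor's byte range lies inside a file of
// fileLen bytes whose data section begins at dataStart. It compares by
// subtraction rather than addition so attacker-chosen values cannot wrap
// around uint64 and slip past the check.
func validateTensor(t tensorInfo, dataStart, fileLen uint64) error {
	if dataStart > fileLen {
		return fmt.Errorf("%s: %w", t.Name, errTensorOutOfBounds)
	}
	avail := fileLen - dataStart
	if t.Offset > avail || t.Size > avail-t.Offset {
		return fmt.Errorf("%s: %w", t.Name, errTensorOutOfBounds)
	}
	return nil
}

func main() {
	// Made-up values: a 1024-byte file whose data section starts at byte 128.
	honest := tensorInfo{Name: "small", Offset: 0, Size: 512}
	inflated := tensorInfo{Name: "inflated", Offset: 256, Size: 1 << 40}
	fmt.Println(validateTensor(honest, 128, 1024))
	fmt.Println(validateTensor(inflated, 128, 1024))
}
```

Rejecting the file at parse time, before any tensor data is touched, keeps the declared sizes from ever reaching code that reads memory.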

During the quantization process in fs/ggml/gguf.go and server/quantization.go, the server reads past the allocated heap buffer, a classic out-of-bounds read. Go normally stops such a read with a bounds-check panic, but the unsafe package enables operations that bypass the language's built-in memory safety guarantees, turning what should be a bounds-checking error into an exploitable information disclosure.
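For contrast, here is what the safe path looks like in ordinary Go. This is an illustrative sketch, not code from Ollama: the `readRange` helper is hypothetical, and it shows only the bounds-checked behavior that pointer arithmetic through unsafe skips.

```go
package main

import "fmt"

// readRange returns buf[off:off+n] using ordinary, bounds-checked slicing.
// The third index caps the result at len(buf), so the check is made against
// the slice length rather than its (possibly larger) capacity. An invalid
// range makes the runtime panic; the deferred recover converts that panic
// into an error instead of crashing the program.
func readRange(buf []byte, off, n int) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			out, err = nil, fmt.Errorf("rejected read: %v", r)
		}
	}()
	return buf[off : off+n : len(buf)], nil
}

func main() {
	heap := []byte("model-bytes")

	out, err := readRange(heap, 0, 5)
	fmt.Printf("%q %v\n", out, err)

	// A request for far more bytes than the buffer holds is refused.
	out, err = readRange(heap, 6, 4096)
	fmt.Printf("%q %v\n", out, err)
}
```

A read performed through raw pointers has no equivalent check, so an oversized length simply returns whatever bytes happen to sit next to the buffer.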

An attacker exploiting this flaw would:

1. Upload a specially crafted GGUF file with inflated tensor dimensions via HTTP POST
2. Trigger the vulnerable code path using the /api/create endpoint during model creation
3. Exfiltrate leaked heap memory through the /api/push endpoint to an attacker-controlled registry

The leaked data could include environment variables, API keys for connected services, system prompts that organizations want to keep confidential, and conversation data from concurrent users.

Why This Matters

This vulnerability highlights the security challenges of running AI infrastructure at the edge. Unlike centralized API services where providers manage security, self-hosted deployments put the burden on organizations that may lack dedicated security teams.

The 300,000+ exposed Ollama instances identified by researchers represent a significant attack surface. Many of these deployments likely handle sensitive internal data—companies use local LLMs precisely because they want to keep prompts and responses private rather than sending them to external providers.

We covered similar concerns about exposed AI infrastructure recently when SANS researchers documented widespread scanning of self-hosted model endpoints. The Bleeding Llama vulnerability demonstrates that running models locally doesn't automatically mean running them securely.

The attack also mirrors techniques seen in the GreyNoise report on AI infrastructure attacks, where researchers tracked over 91,000 attack attempts targeting LLM deployments. As more organizations adopt local AI solutions, these systems become increasingly attractive targets.

Affected Versions and Patch

The vulnerability affects Ollama versions from 0.1.0 through 0.17.0. Cyera's disclosure notes that the fix arrived in version 0.17.1, released on May 10, 2026.
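Teams with many hosts may want to check versions programmatically. The following is a rough sketch under stated assumptions: the `parseVersion` and `isVulnerable` helpers are hypothetical, the version string is assumed to be a plain three-part number, and the only fact taken from the disclosure is that 0.17.1 is the first fixed release.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedVersion is the first patched release named in the article.
var fixedVersion = [3]int{0, 17, 1}

// parseVersion converts "0.17.0" or "v0.17.0-rc1" into [major, minor, patch].
// Pre-release and build suffixes are dropped, which is a simplification:
// a pre-release of the fixed version is treated like the fixed version.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version format %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("bad version component %q", p)
		}
		v[i] = n
	}
	return v, nil
}

// isVulnerable reports whether version sorts before fixedVersion.
// Components are compared numerically, so 0.9.6 sorts before 0.17.1.
func isVulnerable(version string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != fixedVersion[i] {
			return v[i] < fixedVersion[i], nil
		}
	}
	return false, nil // equal to the fixed release
}

func main() {
	for _, ver := range []string{"0.17.0", "0.17.1", "0.9.6"} {
		vuln, err := isVulnerable(ver)
		fmt.Println(ver, vuln, err)
	}
}
```

The input string would typically come from whatever inventory a team already keeps, or from the version the installed server reports about itself; that lookup is left out here on purpose.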

Organizations should:

- Upgrade immediately to Ollama 0.17.1 or later
- Audit network exposure: Ollama instances should not be directly accessible from the internet without authentication
- Deploy authentication proxies or API gateways in front of Ollama endpoints
- Review logs for suspicious /api/create or /api/push requests from unknown sources

The Broader AI Security Challenge

Bleeding Llama joins a growing list of vulnerabilities in AI/ML tooling discovered this year. The recent SGLang GGUF vulnerability demonstrated that similar file format parsing flaws exist across multiple LLM serving frameworks. Supply chain attacks against Hugging Face repositories and npm packages targeting AI workflows show attackers are systematically probing the AI development ecosystem.

Security teams deploying local AI infrastructure should treat these systems with the same rigor applied to any internet-facing service: network segmentation, authentication requirements, regular patching, and monitoring for anomalous access patterns.
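As one concrete form of the authentication requirement, a small gateway can refuse any request that lacks a shared token before it reaches the model server. The sketch below is a minimal illustration, not a hardened product: `requireBearer` and `probe` are hypothetical names, the upstream is a stub standing in for the real backend, and token storage, TLS, and rate limiting are out of scope.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireBearer rejects any request whose Authorization header does not carry
// the expected bearer token. The comparison is constant-time, and an empty
// configured token never matches, so a misconfiguration fails closed.
func requireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		header := r.Header.Get("Authorization")
		got := strings.TrimPrefix(header, "Bearer ")
		valid := token != "" &&
			strings.HasPrefix(header, "Bearer ") &&
			subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
		if !valid {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// probe sends one in-memory request through the guarded handler and returns
// the HTTP status code. The stub upstream stands in for the real backend.
func probe(token, authHeader string) int {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodPost, "/api/create", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	requireBearer(token, upstream).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe("example-token", ""))                     // no credentials
	fmt.Println(probe("example-token", "Bearer wrong-token"))   // bad credentials
	fmt.Println(probe("example-token", "Bearer example-token")) // accepted
}
```

In a real deployment the stub would likely be replaced by a reverse proxy such as the standard library's `httputil.NewSingleHostReverseProxy`, with the model server itself bound to a loopback or private address so the gateway is the only route in.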
