PROBABLYPWNED
Announcements · February 23, 2026 · 4 min read

Cloudflare's 6-Hour Outage Traced to API Query Bug

During Cloudflare's February 20 outage, a misinterpreted API query caused roughly 25% of BYOIP customer prefixes to be withdrawn. 1,100 prefixes went offline for over six hours.

ProbablyPwned Team

A bug in Cloudflare's automated cleanup system caused a six-hour global service disruption on February 20, 2026, after an API query was misinterpreted as a command to delete all customer IP prefixes instead of just those scheduled for removal.

Cloudflare's post-mortem details how a seemingly minor code change cascaded into one of their more significant recent incidents.

What Went Wrong

The outage stemmed from a change to how Cloudflare's network manages IP addresses onboarded through their Bring Your Own IP (BYOIP) service—a feature that lets customers route their own IP space through Cloudflare's network.

An automated cleanup subtask was designed to identify BYOIP prefixes marked as "pending delete" and remove them. The problem: the API query passed the pending_delete parameter with an empty value rather than a specific filter.

The API server interpreted that empty string as "return all BYOIP prefixes" rather than "return only those marked for deletion." The system then systematically queued every returned prefix for deletion, including those actively serving customer traffic.
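A hypothetical sketch of this failure mode (our illustration, not Cloudflare's actual code): a query handler that uses truthiness to decide whether a filter was supplied, so an empty string falls through to the unfiltered path.

```python
# Hypothetical illustration of the empty-value wildcard bug.
# Sample data; statuses and field names are assumptions.
PREFIXES = [
    {"cidr": "203.0.113.0/24", "status": "active"},
    {"cidr": "198.51.100.0/24", "status": "pending_delete"},
]

def query_prefixes(pending_delete=None):
    """Buggy handler: treats an empty-string filter as 'no filter at all'."""
    if not pending_delete:        # "" and None both fall through here
        return PREFIXES           # wildcard: every prefix is returned
    wanted = pending_delete == "true"
    return [p for p in PREFIXES
            if (p["status"] == "pending_delete") == wanted]

# The cleanup task meant to ask only for pending-delete prefixes,
# but passed the parameter with an empty value:
to_delete = query_prefixes(pending_delete="")
print(len(to_delete))  # 2 -- every prefix, active ones included
```

The one-character difference between `pending_delete=""` and `pending_delete="true"` is the difference between deleting everything and deleting only what was scheduled.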

Impact Scope

Of Cloudflare's 4,306 total BYOIP prefixes, 1,100 (roughly 25%) were withdrawn from the internet via BGP between 17:56 and 18:46 UTC.

Services affected included:

  • Core CDN and Security Services — customer traffic couldn't reach Cloudflare
  • Spectrum applications — proxying failed entirely
  • Dedicated Egress — both Gateway and CDN variants using BYOIP
  • Magic Transit — connection timeouts and failures
  • 1.1.1.1 website — returned HTTP 403 "Edge IP Restricted" errors

The total incident duration was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their pre-incident state.

Timeline

Time (UTC)       Event
Feb 5, 21:53     Buggy code merged to production
Feb 20, 17:46    Deployment completed
Feb 20, 17:56    Impact begins
Feb 20, 18:46    Issue identified; subtask terminated
Feb 20, 19:19    Customer self-remediation available
Feb 20, 20:30    Automated restoration begins
Feb 20, 23:03    All prefixes restored

The fifteen-day gap between code merge and deployment meant the bug sat dormant after passing initial review and testing, surfacing only once the change actually reached production.

Root Cause Analysis

This incident illustrates a classic API design pitfall: ambiguous parameter interpretation. When pending_delete was passed without a value, the API treated an empty string as a wildcard query rather than returning an error or empty result.

The failure mode is particularly dangerous because it inverts expected behavior. Instead of failing safely (doing nothing when input is malformed), the system failed maximally (operating on all records).
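A fail-closed version of the same handler (a minimal sketch of the safe design, not Cloudflare's implementation): a required filter that is missing, empty, or malformed produces an error, never a wildcard.

```python
# Fail-closed parameter validation: malformed input means "do nothing",
# not "do everything". Field names are assumptions for illustration.
def query_prefixes_safe(prefixes, pending_delete=None):
    if pending_delete is None:
        raise ValueError("pending_delete filter is required on this endpoint")
    if pending_delete not in ("true", "false"):
        # An empty string lands here: reject rather than match all records.
        raise ValueError(f"invalid pending_delete value: {pending_delete!r}")
    wanted = pending_delete == "true"
    return [p for p in prefixes
            if (p["status"] == "pending_delete") == wanted]
```

With this shape, the buggy call `query_prefixes_safe(prefixes, pending_delete="")` would have raised an error in testing instead of queuing 1,100 live prefixes for deletion.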

Similar API interpretation issues have caused problems elsewhere. We've documented how authentication bypass vulnerabilities often stem from edge cases in parameter handling—though in this case the consequence was availability rather than security.

Cloudflare's Remediation

Beyond immediate restoration, Cloudflare outlined several changes:

  1. API schema standardization — stricter validation to prevent ambiguous parameter interpretation
  2. State separation — separating operational state from customer configuration data
  3. Health-mediated rollback — automatic rollback when deployments cause service degradation
  4. Circuit breaker implementation — detecting and halting rapid large-scale changes
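Item 4 can be sketched in a few lines (our illustration of the idea, not Cloudflare's design): before handing a batch to a destructive worker, compare its size against the whole fleet and refuse anything suspiciously large.

```python
# Minimal circuit-breaker sketch for bulk destructive operations.
# The 1% threshold is an assumed policy value, not a Cloudflare figure.
def guarded_delete(candidates, total, max_fraction=0.01):
    """Abort any deletion batch that exceeds max_fraction of all records."""
    if total and len(candidates) / total > max_fraction:
        raise RuntimeError(
            f"circuit breaker tripped: {len(candidates)}/{total} records "
            f"queued for deletion exceeds {max_fraction:.0%} threshold"
        )
    return candidates  # small batch: safe to hand to the deletion worker
```

Against the incident's numbers, a batch of 1,100 out of 4,306 prefixes (roughly 25%) would trip any plausible threshold immediately, halting the cleanup before BGP withdrawals began.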

The company noted that customers could restore their own prefixes via the dashboard while automated restoration was underway—a self-service option that reduced impact duration for those who caught the issue early.

Broader Implications

BYOIP customers tend to be larger enterprises with specific requirements around IP reputation, geolocation, or regulatory compliance. A six-hour withdrawal of their prefixes means downstream services—anything relying on those IPs—would have experienced failures.

For organizations depending on Cloudflare's BYOIP service:

  1. Monitor BGP advertisements independently for your prefixes
  2. Maintain out-of-band communication for incident coordination
  3. Document rollback procedures that don't depend on affected services
  4. Test disaster recovery assuming CDN unavailability
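The first recommendation above reduces to simple set logic once you have an observed view of your announcements (for example, parsed from a route-collector or looking-glass feed; the data source is assumed here, only the comparison is shown):

```python
# Sketch: flag expected prefixes with no covering BGP announcement.
# `observed` would come from an external feed in practice.
import ipaddress

def missing_prefixes(expected, observed):
    """Return expected prefixes not covered by any observed announcement."""
    observed_nets = [ipaddress.ip_network(p) for p in observed]
    missing = []
    for p in expected:
        net = ipaddress.ip_network(p)
        if not any(net == o or net.subnet_of(o) for o in observed_nets):
            missing.append(p)
    return missing

expected = ["203.0.113.0/24", "198.51.100.0/24"]
observed = ["203.0.113.0/24"]                 # one prefix withdrawn
print(missing_prefixes(expected, observed))   # ['198.51.100.0/24']
```

Running a check like this from outside the provider's network is what makes it useful: during this incident, affected customers with independent monitoring could have detected the withdrawals within minutes rather than waiting on status pages.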

Infrastructure providers have had a rough stretch. This incident follows the broader pattern of platform reliability concerns that organizations need to account for in their resilience planning. No provider is immune to cascading failures, and defense-in-depth means planning for provider-level outages, not just individual service issues.

Cloudflare's transparency in publishing detailed post-mortems is commendable—it lets the industry learn from these incidents. The lesson here: when automating destructive operations, empty inputs should fail closed, not open.
