PROBABLYPWNED
Announcements · February 23, 2026 · 4 min read

Cloudflare's 6-Hour Outage Traced to API Query Bug

During Cloudflare's February 20 outage, a misinterpreted API query caused roughly 25% of BYOIP customer prefixes to be withdrawn. 1,100 prefixes went offline for over six hours.

ProbablyPwned Team

A bug in Cloudflare's automated cleanup system caused a six-hour global service disruption on February 20, 2026, after an API query was misinterpreted as a command to delete all customer IP prefixes instead of just those scheduled for removal.

Cloudflare's post-mortem details how a seemingly minor code change cascaded into one of their more significant recent incidents.

What Went Wrong

The outage stemmed from a change to how Cloudflare's network manages IP addresses onboarded through their Bring Your Own IP (BYOIP) service—a feature that lets customers route their own IP space through Cloudflare's network.

An automated cleanup subtask was designed to identify BYOIP prefixes marked as "pending delete" and remove them. The problem: the API query passed the pending_delete parameter with an empty value rather than a specific filter.

The API server interpreted that empty string as "return all BYOIP prefixes" rather than "return only those marked for deletion." The system then systematically queued every returned prefix for deletion, including those actively serving customer traffic.
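A hypothetical sketch of this failure mode (our illustration, not Cloudflare's actual code): a query handler that uses truthiness to decide whether a filter was supplied, so an empty string falls through to the unfiltered path.

```python
# Hypothetical illustration of the empty-value wildcard bug.
# Sample data; statuses and field names are assumptions.
PREFIXES = [
    {"cidr": "203.0.113.0/24", "status": "active"},
    {"cidr": "198.51.100.0/24", "status": "pending_delete"},
]

def query_prefixes(pending_delete=None):
    """Buggy handler: treats an empty-string filter as 'no filter at all'."""
    if not pending_delete:        # "" and None both fall through here
        return PREFIXES           # wildcard: every prefix is returned
    wanted = pending_delete == "true"
    return [p for p in PREFIXES
            if (p["status"] == "pending_delete") == wanted]

# The cleanup task meant to ask only for pending-delete prefixes,
# but passed the parameter with an empty value:
to_delete = query_prefixes(pending_delete="")
print(len(to_delete))  # 2 -- every prefix, active ones included
```

The one-character difference between `pending_delete=""` and `pending_delete="true"` is the difference between deleting everything and deleting only what was scheduled.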

Impact Scope

Of Cloudflare's 4,306 total BYOIP prefixes, 1,100 (roughly 25%) were withdrawn from the internet via BGP between 17:56 and 18:46 UTC.

Services affected included:

  • Core CDN and Security Services — customer traffic couldn't reach Cloudflare
  • Spectrum applications — proxying failed entirely
  • Dedicated Egress — both Gateway and CDN variants using BYOIP
  • Magic Transit — connection timeouts and failures
  • 1.1.1.1 website — returned HTTP 403 "Edge IP Restricted" errors

The total incident duration was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their pre-incident state.

Timeline

Time (UTC)       Event
Feb 5, 21:53     Buggy code merged to production
Feb 20, 17:46    Deployment completed
Feb 20, 17:56    Impact begins
Feb 20, 18:46    Issue identified; subtask terminated
Feb 20, 19:19    Customer self-remediation available
Feb 20, 20:30    Automated restoration begins
Feb 20, 23:03    All prefixes restored

The fifteen-day gap between code merge and deployment meant the bug sat dormant after passing initial review and testing, surfacing only once the change actually reached production.

Root Cause Analysis

This incident illustrates a classic API design pitfall: ambiguous parameter interpretation. When pending_delete was passed without a value, the API treated an empty string as a wildcard query rather than returning an error or empty result.

The failure mode is particularly dangerous because it inverts expected behavior. Instead of failing safely (doing nothing when input is malformed), the system failed maximally (operating on all records).
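A fail-closed version of the same handler (a minimal sketch of the safe design, not Cloudflare's implementation): a required filter that is missing, empty, or malformed produces an error, never a wildcard.

```python
# Fail-closed parameter validation: malformed input means "do nothing",
# not "do everything". Field names are assumptions for illustration.
def query_prefixes_safe(prefixes, pending_delete=None):
    if pending_delete is None:
        raise ValueError("pending_delete filter is required on this endpoint")
    if pending_delete not in ("true", "false"):
        # An empty string lands here: reject rather than match all records.
        raise ValueError(f"invalid pending_delete value: {pending_delete!r}")
    wanted = pending_delete == "true"
    return [p for p in prefixes
            if (p["status"] == "pending_delete") == wanted]
```

With this shape, the buggy call `query_prefixes_safe(prefixes, pending_delete="")` would have raised an error in testing instead of queuing 1,100 live prefixes for deletion.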

Similar API interpretation issues have caused problems elsewhere. We've documented how authentication bypass vulnerabilities often stem from edge cases in parameter handling—though in this case the consequence was availability rather than security.

Cloudflare's Remediation

Beyond immediate restoration, Cloudflare outlined several changes:

  1. API schema standardization — stricter validation to prevent ambiguous parameter interpretation
  2. State separation — separating operational state from customer configuration data
  3. Health-mediated rollback — automatic rollback when deployments cause service degradation
  4. Circuit breaker implementation — detecting and halting rapid large-scale changes
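Item 4 can be sketched in a few lines (our illustration of the idea, not Cloudflare's design): before handing a batch to a destructive worker, compare its size against the whole fleet and refuse anything suspiciously large.

```python
# Minimal circuit-breaker sketch for bulk destructive operations.
# The 1% threshold is an assumed policy value, not a Cloudflare figure.
def guarded_delete(candidates, total, max_fraction=0.01):
    """Abort any deletion batch that exceeds max_fraction of all records."""
    if total and len(candidates) / total > max_fraction:
        raise RuntimeError(
            f"circuit breaker tripped: {len(candidates)}/{total} records "
            f"queued for deletion exceeds {max_fraction:.0%} threshold"
        )
    return candidates  # small batch: safe to hand to the deletion worker
```

Against the incident's numbers, a batch of 1,100 out of 4,306 prefixes (roughly 25%) would trip any plausible threshold immediately, halting the cleanup before BGP withdrawals began.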

The company noted that customers could restore their own prefixes via the dashboard while automated restoration was underway—a self-service option that reduced impact duration for those who caught the issue early.

Broader Implications

BYOIP customers tend to be larger enterprises with specific requirements around IP reputation, geolocation, or regulatory compliance. A six-hour withdrawal of their prefixes means downstream services—anything relying on those IPs—would have experienced failures.

For organizations depending on Cloudflare's BYOIP service:

  1. Monitor BGP advertisements independently for your prefixes
  2. Maintain out-of-band communication for incident coordination
  3. Document rollback procedures that don't depend on affected services
  4. Test disaster recovery assuming CDN unavailability
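The first recommendation above reduces to simple set logic once you have an observed view of your announcements (for example, parsed from a route-collector or looking-glass feed; the data source is assumed here, only the comparison is shown):

```python
# Sketch: flag expected prefixes with no covering BGP announcement.
# `observed` would come from an external feed in practice.
import ipaddress

def missing_prefixes(expected, observed):
    """Return expected prefixes not covered by any observed announcement."""
    observed_nets = [ipaddress.ip_network(p) for p in observed]
    missing = []
    for p in expected:
        net = ipaddress.ip_network(p)
        if not any(net == o or net.subnet_of(o) for o in observed_nets):
            missing.append(p)
    return missing

expected = ["203.0.113.0/24", "198.51.100.0/24"]
observed = ["203.0.113.0/24"]                 # one prefix withdrawn
print(missing_prefixes(expected, observed))   # ['198.51.100.0/24']
```

Running a check like this from outside the provider's network is what makes it useful: during this incident, affected customers with independent monitoring could have detected the withdrawals within minutes rather than waiting on status pages.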

Infrastructure providers have had a rough stretch. This incident follows the broader pattern of platform reliability concerns that organizations need to account for in their resilience planning. No provider is immune to cascading failures, and defense-in-depth means planning for provider-level outages, not just individual service issues.

Cloudflare's transparency in publishing detailed post-mortems is commendable—it lets the industry learn from these incidents. The lesson here: when automating destructive operations, empty inputs should fail closed, not open.
