Cloudflare's 6-Hour Outage Traced to API Query Bug
Cloudflare's February 20 outage withdrew roughly 25% of BYOIP customer prefixes after an API query was misinterpreted, taking 1,100 prefixes offline for more than six hours.
A bug in Cloudflare's automated cleanup system caused a six-hour global service disruption on February 20, 2026, after an API query was misinterpreted as a command to delete all customer IP prefixes instead of just those scheduled for removal.
Cloudflare's post-mortem details how a seemingly minor code change cascaded into one of their more significant recent incidents.
What Went Wrong
The outage stemmed from a change to how Cloudflare's network manages IP addresses onboarded through their Bring Your Own IP (BYOIP) service—a feature that lets customers route their own IP space through Cloudflare's network.
An automated cleanup subtask was designed to identify BYOIP prefixes marked as "pending delete" and remove them. The problem: the API query passed the `pending_delete` parameter with an empty value rather than a specific filter.
The API server interpreted that empty string as "return all BYOIP prefixes" rather than "return only those marked for deletion." The system then systematically queued every returned prefix for deletion, including those actively serving customer traffic.
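The failure mode can be reconstructed in a short sketch. The names and data below are illustrative, not Cloudflare's actual implementation; the point is how an empty filter value falls through a truthiness check and becomes a wildcard:

```python
# Hypothetical reconstruction of the query bug -- names and data are
# invented for illustration, not taken from Cloudflare's codebase.

def list_prefixes(prefixes, pending_delete=None):
    """Return BYOIP prefixes matching the pending_delete filter.

    Bug: an empty-string filter value is falsy, so it lands in the
    same branch as "no filter at all" and every prefix is returned.
    """
    if not pending_delete:          # "" and None both end up here
        return list(prefixes)       # wildcard: ALL prefixes returned
    want = (pending_delete == "true")
    return [p for p in prefixes if p["pending_delete"] is want]

prefixes = [
    {"cidr": "198.51.100.0/24", "pending_delete": False},  # serving live traffic
    {"cidr": "203.0.113.0/24",  "pending_delete": True},   # scheduled for removal
]

# The cleanup subtask meant to ask for pending_delete=true,
# but the parameter arrived empty -- so it got everything back.
to_delete = list_prefixes(prefixes, pending_delete="")
print([p["cidr"] for p in to_delete])  # includes the live prefix
```

With the intended filter (`pending_delete="true"`) only the second prefix would have been returned; with the empty value, both were queued for deletion.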
Impact Scope
Of Cloudflare's 4,306 total BYOIP prefixes, 1,100 (roughly 25%) were withdrawn from the internet via BGP between 17:56 and 18:46 UTC.
Services affected included:
- Core CDN and Security Services — customer traffic couldn't reach Cloudflare
- Spectrum applications — proxying failed entirely
- Dedicated Egress — both Gateway and CDN variants using BYOIP
- Magic Transit — connection timeouts and failures
- 1.1.1.1 website — returned HTTP 403 "Edge IP Restricted" errors
The total incident duration was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their pre-incident state.
Timeline
| Time (UTC) | Event |
|---|---|
| Feb 5, 21:53 | Buggy code merged to production |
| Feb 20, 17:46 | Deployment completed |
| Feb 20, 17:56 | Impact begins |
| Feb 20, 18:46 | Issue identified; subtask terminated |
| Feb 20, 19:19 | Customer self-remediation available |
| Feb 20, 20:30 | Automated restoration begins |
| Feb 20, 23:03 | All prefixes restored |
The fifteen-day gap between code merge and deployment meant the faulty code lay dormant: the bug only surfaced once the deployment completed, long after the change had passed initial review and testing.
Root Cause Analysis
This incident illustrates a classic API design pitfall: ambiguous parameter interpretation. When `pending_delete` was passed without a value, the API treated the empty string as a wildcard query rather than returning an error or an empty result.
The failure mode is particularly dangerous because it inverts expected behavior. Instead of failing safely (doing nothing when input is malformed), the system failed maximally (operating on all records).
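Failing closed can be sketched as a handler that treats an absent filter and an empty filter differently: absent may be a deliberate "list everything" request, but an empty or malformed value is always an error, never a wildcard. Again, this is an illustrative sketch, not Cloudflare's code:

```python
def list_prefixes_safe(prefixes, pending_delete=None):
    """Fail closed: an absent filter is an explicit choice by the
    caller, but an empty or malformed value is rejected outright
    rather than silently widened into "match everything"."""
    if pending_delete is None:
        return list(prefixes)   # caller explicitly asked for all records
    if pending_delete not in ("true", "false"):
        raise ValueError(f"invalid pending_delete filter: {pending_delete!r}")
    want = (pending_delete == "true")
    return [p for p in prefixes if p["pending_delete"] is want]
```

Under this contract, the cleanup subtask's empty parameter would have produced a `ValueError` and deleted nothing, turning a six-hour outage into a failed cron run.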
Similar API interpretation issues have caused problems elsewhere. We've documented how authentication bypass vulnerabilities often stem from edge cases in parameter handling—though in this case the consequence was availability rather than security.
Cloudflare's Remediation
Beyond immediate restoration, Cloudflare outlined several changes:
- API schema standardization — stricter validation to prevent ambiguous parameter interpretation
- State separation — separating operational state from customer configuration data
- Health-mediated rollback — automatic rollback when deployments cause service degradation
- Circuit breaker implementation — detecting and halting rapid large-scale changes
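A circuit breaker of the kind described in the last bullet can be as simple as a blast-radius check before a destructive batch runs. The class, threshold, and numbers below are invented for illustration; Cloudflare has not published its implementation:

```python
class BlastRadiusBreaker:
    """Refuse to proceed when a batch operation would touch more than
    a set fraction of all records. The 5% threshold is an invented
    example, not a value from Cloudflare's post-mortem."""

    def __init__(self, max_fraction=0.05):
        self.max_fraction = max_fraction

    def check(self, batch_size, total):
        if total > 0 and batch_size / total > self.max_fraction:
            raise RuntimeError(
                f"circuit open: batch of {batch_size}/{total} records "
                f"exceeds {self.max_fraction:.0%} safety threshold"
            )

breaker = BlastRadiusBreaker(max_fraction=0.05)
breaker.check(batch_size=12, total=4306)      # ~0.3% of prefixes: proceeds
# breaker.check(batch_size=1100, total=4306)  # ~25.5%: raises, halting the cleanup
```

A deletion queue of 1,100 out of 4,306 prefixes would have tripped even a generous threshold before the first BGP withdrawal went out.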
The company noted that customers could restore their own prefixes via the dashboard while automated restoration was underway—a self-service option that reduced impact duration for those who caught the issue early.
Broader Implications
BYOIP customers tend to be larger enterprises with specific requirements around IP reputation, geolocation, or regulatory compliance. A six-hour withdrawal of their prefixes means downstream services—anything relying on those IPs—would have experienced failures.
For organizations depending on Cloudflare's BYOIP service:
- Monitor BGP advertisements independently for your prefixes
- Maintain out-of-band communication for incident coordination
- Document rollback procedures that don't depend on affected services
- Test disaster recovery assuming CDN unavailability
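Independent BGP monitoring can be built on public route-collector data, for example from RIPE's collectors. The sketch below shows only the decision logic; fetching the per-prefix visibility counts is left out, and the function name and thresholds are assumptions for illustration:

```python
def prefix_withdrawn(peers_seeing, total_peers, min_visibility=0.5):
    """Flag a prefix as withdrawn when fewer than min_visibility of
    route-collector peers still carry a route for it.

    peers_seeing / total_peers would come from public route-collector
    data (e.g. RIPE's collectors); the 0.5 threshold is illustrative.
    """
    if total_peers <= 0:
        raise ValueError("no peer data available")
    return (peers_seeing / total_peers) < min_visibility

# Example: a prefix normally seen by ~300 of 320 peers suddenly drops.
assert prefix_withdrawn(peers_seeing=12, total_peers=320) is True
assert prefix_withdrawn(peers_seeing=301, total_peers=320) is False
```

Polling a check like this from outside Cloudflare's network gives an alert that does not depend on the affected provider's own dashboards or status page.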
Infrastructure providers have had a rough stretch, and this incident fits a broader pattern of platform reliability failures that organizations need to account for in their resilience planning. No provider is immune to cascading failures, and defense-in-depth means planning for provider-level outages, not just individual service issues.
Cloudflare's transparency in publishing detailed post-mortems is commendable—it lets the industry learn from these incidents. The lesson here: when automating destructive operations, empty inputs should fail closed, not open.