Cloudflare's 6-Hour Outage Traced to API Query Bug
Cloudflare's February 20 outage withdrew roughly 25% of BYOIP customer prefixes after an automated cleanup task misinterpreted an API query. About 1,100 prefixes went offline for more than six hours.
A bug in Cloudflare's automated cleanup system caused a six-hour global service disruption on February 20, 2026, after an API query was misinterpreted as a command to delete all customer IP prefixes instead of just those scheduled for removal.
Cloudflare's post-mortem details how a seemingly minor code change cascaded into one of its most significant recent incidents.
What Went Wrong
The outage stemmed from a change to how Cloudflare's network manages IP addresses onboarded through their Bring Your Own IP (BYOIP) service—a feature that lets customers route their own IP space through Cloudflare's network.
An automated cleanup subtask was designed to identify BYOIP prefixes marked as "pending delete" and remove them. The problem: the API query passed the pending_delete parameter with an empty value rather than a specific filter.
The API server interpreted that empty string as "return all BYOIP prefixes" rather than "return only those marked for deletion." The system then systematically queued every returned prefix for deletion, including those actively serving customer traffic.
Impact Scope
Of Cloudflare's 4,306 total BYOIP prefixes, 1,100 (roughly 25%) were withdrawn from the internet via BGP between 17:56 and 18:46 UTC.
Services affected included:
- Core CDN and Security Services — customer traffic couldn't reach Cloudflare
- Spectrum applications — proxying failed entirely
- Dedicated Egress — both Gateway and CDN variants using BYOIP
- Magic Transit — connection timeouts and failures
- 1.1.1.1 website — returned HTTP 403 "Edge IP Restricted" errors
The total incident duration was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their pre-incident state.
Timeline
| Time (UTC) | Event |
|---|---|
| Feb 5, 21:53 | Buggy code merged |
| Feb 20, 17:46 | Deployment completed |
| Feb 20, 17:56 | Impact begins |
| Feb 20, 18:46 | Issue identified; subtask terminated |
| Feb 20, 19:19 | Customer self-remediation available |
| Feb 20, 20:30 | Automated restoration begins |
| Feb 20, 23:03 | All prefixes restored |
The fifteen-day gap between code merge and deployment meant the bug sat latent through initial testing and only surfaced once the change reached production.
Root Cause Analysis
This incident illustrates a classic API design pitfall: ambiguous parameter interpretation. When pending_delete was passed without a value, the API treated an empty string as a wildcard query rather than returning an error or empty result.
The failure mode is particularly dangerous because it inverts expected behavior. Instead of failing safely (doing nothing when input is malformed), the system failed maximally (operating on all records).
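A fail-closed version of the same lookup makes the contrast concrete. This is a hedged sketch under the assumption of a string-valued filter, not Cloudflare's actual validation: any value other than an explicit `true`/`false` is rejected, so a missing or empty filter can never widen a destructive operation's scope.

```python
# Fail-closed sketch (illustrative): on a destructive path, an empty or
# malformed filter raises an error instead of becoming a wildcard.
def select_for_deletion(prefixes: list[dict], pending_delete: str) -> list[dict]:
    """Return only prefixes explicitly marked pending_delete=true."""
    if pending_delete not in ("true", "false"):
        raise ValueError(
            f"pending_delete must be 'true' or 'false', got {pending_delete!r}"
        )
    want = pending_delete == "true"
    return [p for p in prefixes if p["pending_delete"] is want]

prefixes = [
    {"cidr": "203.0.113.0/24", "pending_delete": True},
    {"cidr": "198.51.100.0/24", "pending_delete": False},
]
print(select_for_deletion(prefixes, "true"))  # only the marked prefix
# select_for_deletion(prefixes, "")  ->  ValueError; nothing is deleted
```

With this shape, the cleanup task's malformed query would have failed loudly at the first call rather than quietly enumerating every prefix.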
Similar API interpretation issues have caused problems elsewhere. We've documented how authentication bypass vulnerabilities often stem from edge cases in parameter handling—though in this case the consequence was availability rather than security.
Cloudflare's Remediation
Beyond immediate restoration, Cloudflare outlined several changes:
- API schema standardization — stricter validation to prevent ambiguous parameter interpretation
- State separation — separating operational state from customer configuration data
- Health-mediated rollback — automatic rollback when deployments cause service degradation
- Circuit breaker implementation — detecting and halting rapid large-scale changes
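The circuit-breaker idea from the list above can be sketched simply. The threshold and class below are hypothetical, not Cloudflare's implementation: a bulk job that tries to touch more records than a preset ceiling gets halted mid-run instead of completing.

```python
# Hypothetical circuit breaker for bulk destructive operations: trips when
# a run exceeds a preset ceiling. The threshold value is illustrative.
class DeletionCircuitBreaker:
    def __init__(self, max_deletions: int):
        self.max_deletions = max_deletions
        self.count = 0
        self.tripped = False

    def allow(self) -> bool:
        """Permit one more deletion, or trip if the ceiling is reached."""
        if self.tripped:
            return False
        if self.count >= self.max_deletions:
            self.tripped = True      # halt and leave a signal for operators
            return False
        self.count += 1
        return True

breaker = DeletionCircuitBreaker(max_deletions=50)
queued = [f"prefix-{i}" for i in range(1100)]   # scale of the real incident
deleted = [p for p in queued if breaker.allow()]
print(len(deleted), breaker.tripped)  # 50 True — 1,050 prefixes spared
```

Even a crude ceiling like this bounds the blast radius: a "delete everything" bug removes tens of records before tripping, not 1,100.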
The company noted that customers could restore their own prefixes via the dashboard while automated restoration was underway—a self-service option that reduced impact duration for those who caught the issue early.
Broader Implications
BYOIP customers tend to be larger enterprises with specific requirements around IP reputation, geolocation, or regulatory compliance. A six-hour withdrawal of their prefixes means downstream services—anything relying on those IPs—would have experienced failures.
For organizations depending on Cloudflare's BYOIP service:
- Monitor BGP advertisements independently for your prefixes
- Maintain out-of-band communication for incident coordination
- Document rollback procedures that don't depend on affected services
- Test disaster recovery assuming CDN unavailability
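The first item above, independent BGP monitoring, can be as simple as comparing expected prefixes against what route collectors currently see. The data shape here (prefix mapped to a peer count) is a simplified stand-in, not any specific looking-glass API; a real monitor would poll a route-collector feed and adapt the parsing.

```python
# Hedged sketch of an independent prefix-visibility check. The peer-count
# mapping is an assumed, simplified snapshot format.
EXPECTED = ["203.0.113.0/24", "198.51.100.0/24"]

def withdrawn_prefixes(peer_counts: dict[str, int],
                       expected: list[str],
                       min_peers: int = 10) -> list[str]:
    """Flag expected prefixes visible to too few collector peers, or none."""
    return [p for p in expected if peer_counts.get(p, 0) < min_peers]

# Simulated snapshot: one prefix has effectively vanished from the table.
snapshot = {"203.0.113.0/24": 240, "198.51.100.0/24": 0}
print(withdrawn_prefixes(snapshot, EXPECTED))  # ['198.51.100.0/24']
```

Alerting on this signal from outside the provider's own infrastructure would have flagged the withdrawals within minutes of the 17:56 UTC impact start, independent of Cloudflare's status page.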
Infrastructure providers have had a rough stretch. This incident follows the broader pattern of platform reliability concerns that organizations need to account for in their resilience planning. No provider is immune to cascading failures, and defense-in-depth means planning for provider-level outages, not just individual service issues.
Cloudflare's transparency in publishing detailed post-mortems is commendable—it lets the industry learn from these incidents. The lesson here: when automating destructive operations, empty inputs should fail closed, not open.