Cloudflare's 6-Hour Outage Traced to API Query Bug
Cloudflare's February 20 outage withdrew roughly 25% of BYOIP customer prefixes after an automated cleanup task misinterpreted an API query. 1,100 prefixes went offline for more than six hours.
A bug in Cloudflare's automated cleanup system caused a six-hour global service disruption on February 20, 2026, after an API query was misinterpreted as a command to delete all customer IP prefixes instead of just those scheduled for removal.
Cloudflare's post-mortem details how a seemingly minor code change cascaded into one of its more significant recent incidents.
What Went Wrong
The outage stemmed from a change to how Cloudflare's network manages IP addresses onboarded through their Bring Your Own IP (BYOIP) service—a feature that lets customers route their own IP space through Cloudflare's network.
An automated cleanup subtask was designed to identify BYOIP prefixes marked as "pending delete" and remove them. The problem: the API query passed the pending_delete parameter with an empty value rather than a specific filter.
The API server interpreted that empty string as "return all BYOIP prefixes" rather than "return only those marked for deletion." The system then systematically queued every returned prefix for deletion, including those actively serving customer traffic.
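The failure mode can be illustrated with a minimal sketch (the function and data names here are hypothetical; Cloudflare has not published its internal API code). Treating an empty filter value as "no filter" turns a targeted query into a match-everything query:

```python
# Minimal sketch of the ambiguous-filter bug (hypothetical API, not
# Cloudflare's actual code). An empty filter value silently widens
# the query to every record instead of matching nothing.

PREFIXES = [
    {"cidr": "203.0.113.0/24", "pending_delete": True},
    {"cidr": "198.51.100.0/24", "pending_delete": False},  # live traffic
    {"cidr": "192.0.2.0/24", "pending_delete": False},     # live traffic
]

def query_prefixes(pending_delete: str):
    """Buggy handler: treats an empty string as 'no filter applied'."""
    if pending_delete == "":          # the dangerous branch
        return PREFIXES               # returns ALL prefixes
    want = pending_delete == "true"
    return [p for p in PREFIXES if p["pending_delete"] is want]

# Intended call: only prefixes marked for deletion.
assert len(query_prefixes("true")) == 1

# The call the cleanup subtask effectively made: parameter sent with
# an empty value, so every prefix comes back and is queued for deletion.
assert len(query_prefixes("")) == 3
```

The two assertions show the inversion: the caller meant "one prefix," the server answered "all three."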
Impact Scope
Of Cloudflare's 4,306 total BYOIP prefixes, 1,100 (roughly 25%) were withdrawn from the internet via BGP between 17:56 and 18:46 UTC.
Services affected included:
- Core CDN and Security Services — customer traffic couldn't reach Cloudflare
- Spectrum applications — proxying failed entirely
- Dedicated Egress — both Gateway and CDN variants using BYOIP
- Magic Transit — connection timeouts and failures
- 1.1.1.1 website — returned HTTP 403 "Edge IP Restricted" errors
The total incident duration was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their pre-incident state.
Timeline
| Time (UTC) | Event |
|---|---|
| Feb 5, 21:53 | Buggy code merged to production |
| Feb 20, 17:46 | Deployment completed |
| Feb 20, 17:56 | Impact begins |
| Feb 20, 18:46 | Issue identified; subtask terminated |
| Feb 20, 19:19 | Customer self-remediation available |
| Feb 20, 20:30 | Automated restoration begins |
| Feb 20, 23:03 | All prefixes restored |
The fifteen-day gap between code merge and deployment meant the bug lay dormant through initial testing and only surfaced once the change reached production.
Root Cause Analysis
This incident illustrates a classic API design pitfall: ambiguous parameter interpretation. When pending_delete was passed without a value, the API treated an empty string as a wildcard query rather than returning an error or empty result.
The failure mode is particularly dangerous because it inverts expected behavior. Instead of failing safely (doing nothing when input is malformed), the system failed maximally (operating on all records).
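One way to get the safe behavior is to validate filter values before the query runs, so a malformed parameter produces an error rather than a wildcard. A minimal fail-closed sketch, with hypothetical names:

```python
# Fail-closed parameter handling (hypothetical names): an empty or
# unrecognized filter value raises instead of widening the query.

class BadFilter(ValueError):
    """Raised when a filter parameter is malformed."""

def parse_pending_delete(raw: str) -> bool:
    """Accept only explicit 'true'/'false'; reject everything else."""
    if raw == "true":
        return True
    if raw == "false":
        return False
    raise BadFilter(f"pending_delete must be 'true' or 'false', got {raw!r}")

# Explicit values parse normally.
assert parse_pending_delete("true") is True
assert parse_pending_delete("false") is False

# The empty value that triggered the outage now fails closed:
# the cleanup job gets an error and deletes nothing.
try:
    parse_pending_delete("")
except BadFilter:
    pass
else:
    raise AssertionError("empty filter should have been rejected")
```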
Similar API interpretation issues have caused problems elsewhere. We've documented how authentication bypass vulnerabilities often stem from edge cases in parameter handling—though in this case the consequence was availability rather than security.
Cloudflare's Remediation
Beyond immediate restoration, Cloudflare outlined several changes:
- API schema standardization — stricter validation to prevent ambiguous parameter interpretation
- State separation — separating operational state from customer configuration data
- Health-mediated rollback — automatic rollback when deployments cause service degradation
- Circuit breaker implementation — detecting and halting rapid large-scale changes
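The circuit-breaker idea can be sketched as a guard that refuses to run a destructive batch once it touches more than a set fraction of the fleet. The threshold and names below are illustrative assumptions, not Cloudflare's actual design:

```python
# Illustrative circuit breaker for destructive batch operations
# (threshold and names are assumptions, not Cloudflare's design).

TOTAL_PREFIXES = 4306
MAX_DELETE_FRACTION = 0.01  # trip if one run touches >1% of the fleet

class CircuitOpen(RuntimeError):
    """Raised when a batch is suspiciously large."""

def delete_prefixes(candidates, total=TOTAL_PREFIXES,
                    max_fraction=MAX_DELETE_FRACTION):
    """Delete candidates, unless the batch size trips the breaker."""
    if len(candidates) > total * max_fraction:
        raise CircuitOpen(
            f"refusing to delete {len(candidates)} of {total} prefixes"
        )
    return [f"deleted {c}" for c in candidates]

# A normal cleanup run passes.
assert delete_prefixes(["203.0.113.0/24"]) == ["deleted 203.0.113.0/24"]

# A run the size of the actual incident (1,100 prefixes) trips the breaker
# instead of withdrawing a quarter of the fleet.
try:
    delete_prefixes([f"prefix-{i}" for i in range(1100)])
except CircuitOpen:
    pass
else:
    raise AssertionError("oversized batch should have tripped the breaker")
```

The design choice here is that the guard compares batch size to fleet size at the last moment before execution, so it catches oversized batches regardless of which upstream bug produced them.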
The company noted that customers could restore their own prefixes via the dashboard while automated restoration was underway—a self-service option that reduced impact duration for those who caught the issue early.
Broader Implications
BYOIP customers tend to be larger enterprises with specific requirements around IP reputation, geolocation, or regulatory compliance. A six-hour withdrawal of their prefixes means downstream services—anything relying on those IPs—would have experienced failures.
For organizations depending on Cloudflare's BYOIP service:
- Monitor BGP advertisements independently for your prefixes
- Maintain out-of-band communication for incident coordination
- Document rollback procedures that don't depend on affected services
- Test disaster recovery assuming CDN unavailability
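The first item above — independent BGP monitoring — can be as simple as diffing the prefixes you expect to see announced against what an external vantage point reports. A sketch with placeholder data; in practice the observed set would come from a source outside your CDN provider, such as a public route collector:

```python
# Sketch of an independent BGP visibility check (placeholder data;
# a real check would populate `observed` from an external BGP data
# source such as a public route collector, not from this script).

EXPECTED = {"203.0.113.0/24", "198.51.100.0/24", "192.0.2.0/24"}

def missing_prefixes(expected: set, observed: set) -> set:
    """Prefixes we advertise that the outside world no longer sees."""
    return expected - observed

# Healthy state: every expected prefix is visible.
assert missing_prefixes(EXPECTED, set(EXPECTED)) == set()

# During a withdrawal event, alert on exactly what disappeared.
observed_now = {"198.51.100.0/24"}
gone = missing_prefixes(EXPECTED, observed_now)
assert gone == {"203.0.113.0/24", "192.0.2.0/24"}
```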
Infrastructure providers have had a rough stretch. This incident follows the broader pattern of platform reliability concerns that organizations need to account for in their resilience planning. No provider is immune to cascading failures, and defense-in-depth means planning for provider-level outages, not just individual service issues.
Cloudflare's transparency in publishing detailed post-mortems is commendable—it lets the industry learn from these incidents. The lesson here: when automating destructive operations, empty inputs should fail closed, not open.