PROBABLYPWNED
Announcements | February 9, 2026 | 5 min read

How Cisco IT Cut Incidents 25% With Unified Observability

Cisco IT unified fragmented monitoring tools into a centralized observability platform, achieving zero major network incidents and 45% faster detection and resolution using Splunk, ThousandEyes, and AI automation.

ProbablyPwned Team

Cisco IT manages over 100,000 endpoints across a global infrastructure that never sleeps. Until recently, that scale came with a painful operational reality: fragmented monitoring tools created visibility gaps that turned minor issues into major incidents. The company just published a detailed breakdown of how it transformed its observability strategy, and the results are striking. Major incidents dropped 25% year-over-year, major network incidents fell to zero, and mean time to detect and resolve problems improved by 45%.

The transformation took 18 months and required rethinking everything about how Cisco monitors its own infrastructure. What emerged is a playbook for enterprises struggling with the same problem: too much data, not enough insight.

Why Fragmented Observability Breaks at Scale

Cisco IT's problems weren't unique. The team was running separate monitoring tools for network performance, infrastructure health, application behavior, and security events. Each system collected telemetry in its own format, stored data in isolated silos, and generated alerts without understanding what other tools were seeing. The result was alert fatigue—too many signals, not enough context.

"After the outage, we knew we had to rethink everything," Cisco Senior Director Chuck Churchill said in the company's report. That outage was the breaking point. When incidents span siloed systems, engineers waste time correlating logs from different tools instead of fixing the actual problem. These are the same visibility challenges that allowed the AsyncOS zero-day compromise to evade detection for weeks. Cisco IT was collecting roughly a tenth of the network telemetry it needed, and the visibility gaps were showing up as recurring incidents: three to four major network failures every quarter.

This problem is accelerating across the industry. Enterprise observability challenges in 2026 are driven by fragmented telemetry and inconsistent context. Each tool tells one story—logs, metrics, and traces rarely align. Organizations end up paying for massive data ingestion without the ability to make effective use of what they collect. Cisco IT decided to break that cycle.

Three Pillars: Network, Platform, Service Operations

Cisco IT's observability transformation rests on three operational pillars:

  1. Network observability - Comprehensive visibility into internal networks and third-party provider infrastructure
  2. Platform and data observability - Centralized monitoring across data centers, cloud environments, and hybrid infrastructure
  3. Service operations - Unified issue detection and resolution using enriched telemetry data

At the center of this architecture sits Splunk Cloud Platform, which aggregates telemetry from network devices, infrastructure components, and applications into a single operational dashboard. Instead of jumping between tools, engineers now see unified visualizations that correlate events across the entire stack.

ThousandEyes provides end-to-end network visibility, extending monitoring into external environments like public internet paths and cloud services. This is the kind of visibility that matters when a SaaS provider's routing problem starts affecting internal application performance—ThousandEyes captures the issue before users start opening tickets.

The third critical component is a centralized Configuration Management Database (CMDB) that serves as the single source of truth for every IT asset. When an alert fires, the system knows exactly what's affected, who owns it, and what dependencies exist—context that turns raw alerts into actionable incidents.
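
To make the enrichment step concrete, here is a minimal Python sketch of looking an alert up in a CMDB and attaching ownership and downstream impact. Cisco's report does not publish its schema, so the ConfigItem fields, the sample inventory, and the enrich function are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigItem:
    """One CMDB record. Field names are illustrative assumptions."""
    ci_id: str
    owner: str
    depends_on: tuple[str, ...] = ()


# Hypothetical inventory: billing-api needs db-01, which needs core-sw-01.
CMDB = {
    "core-sw-01": ConfigItem("core-sw-01", owner="network-ops"),
    "db-01": ConfigItem("db-01", owner="data-platform", depends_on=("core-sw-01",)),
    "billing-api": ConfigItem("billing-api", owner="payments", depends_on=("db-01",)),
}


def downstream_of(ci_id: str, cmdb: dict[str, ConfigItem]) -> set[str]:
    """Return every item that depends on ci_id, directly or transitively."""
    affected: set[str] = set()
    frontier = [ci_id]
    while frontier:
        current = frontier.pop()
        for item in cmdb.values():
            if current in item.depends_on and item.ci_id not in affected:
                affected.add(item.ci_id)
                frontier.append(item.ci_id)
    affected.discard(ci_id)  # guard against dependency cycles
    return affected


def enrich(alert: dict, cmdb: dict[str, ConfigItem]) -> dict:
    """Attach owner and affected dependents to a raw alert."""
    item = cmdb.get(alert["ci_id"])
    if item is None:
        # Unknown asset: keep the alert but flag the missing context.
        return {**alert, "owner": None, "affected": []}
    return {
        **alert,
        "owner": item.owner,
        "affected": sorted(downstream_of(item.ci_id, cmdb)),
    }


incident = enrich({"ci_id": "core-sw-01", "message": "link down"}, CMDB)
```

In this toy inventory, a failure on the core switch is routed to the network-ops owner and flagged as affecting both the database and the billing API, which is the kind of context a raw alert lacks.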

AI Operations: Handling 4 Million Alerts Per Day

The real breakthrough came from applying AI to automate event analysis. Cisco IT now processes roughly 4 million alerts per day from its monitoring infrastructure. Without automation, that volume would be paralyzing. Instead, AI-driven operations handle 99.998% of those alerts autonomously—filtering noise, correlating related events, and escalating only the incidents that require human intervention.
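
A quick back-of-the-envelope check shows what that percentage means for the people on call. The inputs are the article's rounded figures, so the result is an approximation rather than a number from Cisco's report.

```python
from decimal import Decimal

alerts_per_day = 4_000_000          # "roughly 4 million" in the article
automated_pct = Decimal("99.998")   # share handled without a human

# Decimal keeps the subtraction exact; floats would leave rounding residue.
escalated_per_day = int(alerts_per_day * (Decimal(100) - automated_pct) / 100)
escalated_per_hour = escalated_per_day / 24

print(escalated_per_day)             # 80
print(round(escalated_per_hour, 1))  # 3.3
```

On those figures, about 80 alerts a day, or three to four an hour, reach a human.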

That automation is the difference between drowning in alerts and running a responsive operation. Engineers now monitor 10 times more network telemetry than before the transformation, but they see fewer irrelevant notifications because the system understands which signals actually matter.
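
The filter, correlate, and escalate flow can be sketched in a few lines of Python. The report does not describe Cisco's models, so simple rules stand in for the AI here, and the alert fields, severity names, and five-minute window are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed severity scale; real tools use their own levels.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def triage(alerts, window_seconds=300, escalate_at="critical"):
    """Return one incident per correlated group that needs a human."""
    # 1. Filter noise: informational alerts never page anyone.
    actionable = [a for a in alerts if SEVERITY_RANK[a["severity"]] > 0]

    # 2. Correlate: group alerts for the same asset that fall in the
    #    same fixed time bucket (a simplification of a sliding window).
    groups = defaultdict(list)
    for alert in actionable:
        bucket = alert["ts"] // window_seconds
        groups[(alert["ci_id"], bucket)].append(alert)

    # 3. Escalate: surface a group only if its worst alert meets the bar.
    threshold = SEVERITY_RANK[escalate_at]
    incidents = []
    for (ci_id, _bucket), members in sorted(groups.items()):
        worst = max(members, key=lambda a: SEVERITY_RANK[a["severity"]])
        if SEVERITY_RANK[worst["severity"]] >= threshold:
            incidents.append({
                "ci_id": ci_id,
                "severity": worst["severity"],
                "alert_count": len(members),
            })
    return incidents


sample = [
    {"ci_id": "core-sw-01", "severity": "critical", "ts": 10},
    {"ci_id": "core-sw-01", "severity": "warning", "ts": 40},
    {"ci_id": "db-01", "severity": "info", "ts": 50},
    {"ci_id": "db-01", "severity": "warning", "ts": 60},
]
incidents = triage(sample)
```

Four raw alerts collapse into a single incident for the core switch; the lone database warning is grouped but stays below the escalation bar.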

The results speak for themselves. Over 18 months, Cisco IT achieved:

  • Zero major network incidents (down from 3-4 per quarter)
  • 25% reduction in major incidents year-over-year
  • 45% faster mean time to detect and resolve issues
  • 20% decrease in change-related incidents
  • 4x greater visibility across infrastructure

Why This Matters for Enterprise IT

Cisco IT's transformation mirrors what Cisco showcased at Black Hat Europe 2025, where the company operated the event's Network Operations Center using production-ready XDR integrations. The ability to unify telemetry from disparate sources—whether it's Corelight NDR feeds or Palo Alto Networks firewalls—is the same observability challenge that Cisco IT solved internally.

The shift toward unified observability isn't optional for enterprises managing hybrid cloud infrastructure. Distributed architectures generate far more telemetry as they scale. Without a structured approach to aggregating, normalizing, and analyzing that data, organizations face escalating costs and diminishing returns. Industry forecasts projected that by 2026, 50% of organizations with distributed architectures would adopt advanced observability platforms, up from just 20% in 2024.

Cisco IT's experience also highlights the importance of OpenTelemetry as a standard for telemetry collection. Proprietary data formats create vendor lock-in and complicate integration. OpenTelemetry provides a common language for logs, metrics, and traces, making it easier to build unified observability pipelines across multi-vendor environments.
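
A small sketch shows why a shared record shape helps. It is plain Python, not the OpenTelemetry SDK: the LogRecord class only borrows field names and severity numbers from OpenTelemetry's log data model, and the two source formats (network-tool and app-tool) are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

# Lowest number of each severity range in OpenTelemetry's log data model.
SEVERITY_NUMBER = {"INFO": 9, "WARN": 13, "ERROR": 17}


@dataclass
class LogRecord:
    """Common shape that every source is mapped into."""
    time_unix_nano: int
    severity_text: str
    body: str
    resource: dict[str, str]
    attributes: dict[str, str] = field(default_factory=dict)

    @property
    def severity_number(self) -> int:
        return SEVERITY_NUMBER[self.severity_text]


def from_network_tool(event: dict) -> LogRecord:
    """Map a hypothetical syslog-style event (epoch seconds, short levels)."""
    level = {"info": "INFO", "warn": "WARN", "err": "ERROR"}[event["level"]]
    return LogRecord(
        time_unix_nano=event["epoch"] * 1_000_000_000,
        severity_text=level,
        body=event["msg"],
        resource={"host.name": event["device"]},
        attributes={"source": "network-tool"},
    )


def from_app_tool(event: dict) -> LogRecord:
    """Map a hypothetical JSON app log (ISO 8601 timestamp, long levels)."""
    ts = datetime.fromisoformat(event["timestamp"])
    level = {"INFO": "INFO", "WARNING": "WARN", "ERROR": "ERROR"}[event["severity"]]
    return LogRecord(
        time_unix_nano=int(ts.timestamp()) * 1_000_000_000,  # whole seconds only
        severity_text=level,
        body=event["message"],
        resource={"service.name": event["service"]},
        attributes={"source": "app-tool"},
    )


records = [
    from_app_tool({"timestamp": "2023-11-14T22:13:20+00:00", "severity": "WARNING",
                   "service": "billing-api", "message": "slow query"}),
    from_network_tool({"epoch": 1_700_000_000, "level": "err",
                       "device": "core-sw-01", "msg": "link down"}),
]
# Once normalized, records from different tools can be sorted and compared.
worst_first = sorted(records, key=lambda r: r.severity_number, reverse=True)
```

After normalization, the two events share a timestamp format and a severity scale, so a single query can line them up, which is the property the article credits to a common telemetry standard.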

The Human Side of Technical Transformation

Technology alone won't solve observability problems. Cisco Director Mark Hutchins emphasized that "this journey is about changing mindsets as much as deploying technology." Teams that spent years working in isolated tool silos had to adopt new workflows centered around shared data and collaborative problem-solving.

Organizations considering similar transformations should recognize that observability isn't just a monitoring upgrade—it's an operational culture shift. Engineers need to trust centralized dashboards, automation must earn credibility through consistent performance, and leadership has to commit to long-term investment in platform consolidation.

For enterprises wrestling with fragmented monitoring tools, Cisco IT's transformation offers a practical roadmap: consolidate telemetry into a unified platform, apply AI to filter noise and correlate events, and extend visibility into every layer of the infrastructure stack. The payoff isn't just fewer incidents—it's the operational resilience to handle rapid change without compromising service availability.
