Tools · February 4, 2026 · 4 min read

Cisco's ACE Framework Cuts LLM Token Costs by Up to 90%

Analytics Context Engineering addresses three failure modes when LLMs process machine data, delivering dramatic token savings and accuracy gains.

David Okonkwo

Cisco published details on Analytics Context Engineering (ACE), a framework designed to help large language models process machine-generated data without choking on verbose log formats or hallucinating through long numeric sequences. The approach targets observability and telemetry workloads where traditional prompt engineering falls short.

The framework stems from work on Cisco's Deep Network Model, the purpose-built LLM that powers the company's AgenticOps platform. Engineers found that feeding raw observability data directly into prompts created predictable failures—problems ACE now addresses through what the team calls "hybrid data views."

The Problem with Raw Machine Data

LLMs struggle with machine data in three distinct ways. First, token explosion: verbose nested JSON structures from logs and metrics fragment the context window, leaving little room for actual reasoning. Second, context rot: critical signals get buried when models process large payloads, leading to missed anomalies or incomplete analysis. Third, numeric reasoning weakness: models perform poorly when asked to work through long sequences of categorical data or timestamps.
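To make the first failure mode concrete, here is a small illustration. The record and the character counts (a crude proxy for tokens) are invented for this sketch, not taken from Cisco's benchmarks: pretty-printed JSON repeats its structure for every record, while a columnar view pays for the field names once.

    import json

    records = [{"resource": {"service": "checkout", "host": f"ip-10-0-4-{i}"},
                "metric": {"name": "http.server.duration", "unit": "ms", "value": 100 + i}}
               for i in range(50)]

    # Pretty-printed JSON repeats keys and braces for every record ...
    verbose = "\n".join(json.dumps(r, indent=2) for r in records)

    # ... while a columnar view pays for the dotted-path header exactly once.
    header = "resource.service,resource.host,metric.name,metric.unit,metric.value"
    rows = "\n".join(f"checkout,ip-10-0-4-{i},http.server.duration,ms,{100 + i}"
                     for i in range(50))
    columnar = header + "\n" + rows

    print(len(verbose), "chars vs", len(columnar))  # structural overhead dominates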

These failures matter because AI agents are increasingly handling operational tasks. When we covered tools for monitoring OpenClaw AI agent activity, the underlying concern was visibility into what autonomous systems actually do. ACE tackles the complementary challenge—ensuring those agents can reason effectively about the telemetry they consume.

How ACE Works

The framework operates on a simple philosophy: "Everything is a file; some are databases." Rather than cramming raw data payloads into prompts, ACE stores them externally and provides the LLM with processed views optimized for specific tasks.

Four components make up the architecture (a code sketch follows the list):

Virtual file system - Maps observability APIs to file-like interfaces, allowing the LLM to reference data without embedding it entirely in context.

Preprocessor - Converts raw prompts into hybrid data views before the LLM sees them. This stage handles the heavy lifting of data transformation.

Datastore - Maintains full-fidelity original data for query-based access. The LLM can request specific slices when needed.

Processor loop - Enriches LLM outputs through conditional queries, allowing iterative refinement without reloading entire datasets.
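Cisco has not published ACE's interfaces, but the flow across the four components can be sketched in a few lines of Python. Every name below (Datastore, vfs, preprocess, processor_loop) is hypothetical:

    import sqlite3

    class Datastore:
        """Full-fidelity store: the LLM asks for slices instead of receiving payloads."""
        def __init__(self):
            self.db = sqlite3.connect(":memory:")
            self.db.execute("CREATE TABLE logs (ts TEXT, level TEXT, msg TEXT)")
        def load(self, records):
            self.db.executemany("INSERT INTO logs VALUES (?, ?, ?)", records)
        def query(self, sql):
            return self.db.execute(sql).fetchall()

    store = Datastore()
    store.load([("09:12:01", "INFO", "request ok"),
                ("09:12:02", "ERROR", "upstream timeout")])

    # Virtual file system: observability APIs exposed behind file-like paths.
    vfs = {"/telemetry/logs": lambda: store.query("SELECT * FROM logs")}

    def preprocess(task):
        """Preprocessor: build the compact view that actually enters the prompt."""
        if task == "anomaly":
            return store.query("SELECT level, COUNT(*) FROM logs GROUP BY level")
        return vfs["/telemetry/logs"]()[:5]

    def processor_loop(llm_output):
        """Processor loop: run the follow-up query the model's output asks for."""
        if "need_errors" in llm_output:
            return store.query("SELECT * FROM logs WHERE level = 'ERROR'")
        return None

    print(preprocess("anomaly"))          # compact view for the prompt
    print(processor_loop("need_errors"))  # enrichment without reloading the dataset

In this shape, only the outputs of preprocess and processor_loop ever enter the model's context; the raw records stay in the datastore.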

The hybrid views come in two flavors. Columnar representation flattens nested JSON into dotted paths for analytics tasks like anomaly detection. Row-oriented representation preserves record boundaries using modified TF-IDF ranking based on query relevance—surfacing the most pertinent entries while maintaining context.
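Both views are straightforward to picture in code. The sketch below is illustrative only: flatten and rank_rows are invented names, and the TF-IDF weighting is the plain textbook version rather than Cisco's modified one.

    import math
    from collections import Counter

    def flatten(obj, prefix=""):
        """Columnar view: flatten nested JSON into dotted paths, e.g. resource.host.name."""
        out = {}
        for key, val in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(val, dict):
                out.update(flatten(val, path))
            else:
                out[path] = val
        return out

    def rank_rows(rows, query, top_k=2):
        """Row-oriented view: score whole records against the query with TF-IDF
        so the most pertinent entries surface, record boundaries intact."""
        docs = [row.lower().split() for row in rows]
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        def score(terms):
            tf = Counter(terms)
            return sum(tf[t] * math.log(n / df[t])
                       for t in query.lower().split() if t in tf)
        return sorted(rows, key=lambda r: score(r.lower().split()), reverse=True)[:top_k]

    print(flatten({"resource": {"host": {"name": "ip-10-0-4-17"}}, "level": "ERROR"}))
    print(rank_rows(["db timeout on checkout", "health check ok", "checkout latency spike"],
                    "checkout timeout"))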

Measured Results

Cisco reports significant improvements across benchmark tasks:

Task                   Token Reduction   Accuracy Change
Slot filling           53%               42 errors corrected (500 tests)
Anomaly detection      44%               +25% accuracy
Line chart rendering   87%               Quality score: 0.410 → 0.786

The 87% token reduction for chart rendering stands out. That task previously loaded entire time-series datasets into prompts—ACE now provides summarized views with the option to query underlying data points selectively.
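A rough sketch of that pattern, assuming a hypothetical summarize/drill-down split (neither function name comes from Cisco): per-bucket statistics go into the prompt, and raw points are returned only for a requested window.

    import statistics

    def summarize(series, buckets=4):
        """Compress a long series into per-bucket stats instead of raw points."""
        size = max(1, len(series) // buckets)
        chunks = [series[i:i + size] for i in range(0, len(series), size)]
        return [{"min": min(c), "max": max(c), "mean": round(statistics.mean(c), 1)}
                for c in chunks]

    def slice_points(series, start, end):
        """Selective drill-down: raw points only for the window the model asks about."""
        return series[start:end]

    cpu = [41, 43, 44, 97, 95, 42, 40, 44]  # raw telemetry stays out of the prompt
    print(summarize(cpu))                    # compact view the model reasons over
    print(slice_points(cpu, 2, 6))           # follow-up query around the spike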

Why Context Engineering Matters Now

Gartner recently recommended making context engineering a core enterprise capability, suggesting new roles like Context Engineers and Context Architects within AI teams. The shift reflects hard lessons from production deployments where token costs dominate operating budgets.

Mezmo, another vendor in this space, claims its context engineering approach reduces token consumption from 500K to roughly 27K per incident. The pattern across vendors is consistent: the era of stuffing everything into prompts is ending.

This connects to broader security considerations around AI agent deployments. Frameworks that reduce token usage also reduce the attack surface for prompt injection—less data in the context window means fewer opportunities to embed malicious instructions. Organizations running AI agents against security-sensitive telemetry should pay attention to how data flows through their LLM pipelines.

Production Integration

ACE was initially developed for Cisco's Deep Network Model but ships as a standalone service supporting other LLM providers. It integrates with Cisco AI Canvas for runbook reasoning and observability workflows.

The framework deliberately restricts complexity. The Cisco team notes that "simple SQL is what you need" for most machine data contexts, drawing inspiration from database hybrid transactional/analytical processing (HTAP) principles. Tool access is limited to SQL and Bash operations—enough to query and filter data without introducing unnecessary risk.
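In practice that means ordinary aggregation queries rather than exotic operators. A minimal sketch, assuming an invented spans table (the post names no schemas):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE spans (service TEXT, status TEXT, duration_ms REAL)")
    db.executemany("INSERT INTO spans VALUES (?, ?, ?)", [
        ("checkout", "ok", 120.0),
        ("checkout", "error", 2300.0),
        ("search", "ok", 35.0),
    ])

    # The kind of query an ACE-style processor loop might issue on the model's behalf.
    for row in db.execute("""
        SELECT service, COUNT(*) AS errors, AVG(duration_ms) AS avg_ms
        FROM spans WHERE status = 'error'
        GROUP BY service
    """):
        print(row)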

For enterprises operating network-adjacent AI systems, the approach maps well to existing infrastructure. ThousandEyes, AppDynamics, and NetFlow data all flow through standardized pipelines that ACE can intercept and process.

What This Means for Security Teams

Context engineering isn't just an optimization technique. When AI agents process observability data at scale, the quality of their reasoning directly impacts incident response effectiveness. An agent that hallucinates through log analysis or misses critical indicators because they're buried in a verbose payload becomes a liability rather than an asset.

ACE represents one vendor's answer to a problem the entire industry faces. Whether through Cisco's implementation or competing approaches, the discipline of treating context as a first-class architectural concern is becoming table stakes for production AI deployments.
