## TL;DR
Security teams didn’t get SIEM wrong. The environment changed around it.
Centralized SIEMs require data to be ingested and indexed before it can be analyzed. That works for recent data, but becomes operationally complex at scale. Federated security analytics takes a different approach — queries run against data in its original location rather than requiring centralized ingestion.
The architectural difference is simple: centralized systems move data to compute, federated systems move compute to data.
The result: broader coverage with less operational overhead, no rip-and-replace of the platforms you’ve already invested in, and a unified data layer ready for AI-driven security operations.
Security teams have always worked with the tools available to them.
The decisions made over the last thirty years were not wrong. They were rational responses to the environment at the time. But environments change, and the architecture that served a SOC well in 2005 is now one of the biggest constraints on what modern security teams can actually do.
To understand where we are, it helps to understand how we got here.
## How SIEM Architecture Evolved Over Time
For the past two decades, security analytics platforms have relied on a single assumption: if you want to analyze security data, you first move it into one place.
Every major SIEM follows this model. Splunk, Elastic, Sentinel. Different implementations, same underlying design. Ingest, index, query locally.
The first generation was really about compliance. Centralize everything, store it, prove it exists when an auditor asks. Detection was secondary. But once logs became searchable, the SOC started to mature around that capability, and the category grew with it.
So did the data. And somewhere around 2010, cost became a security decision. What went into the SIEM and what didn’t was increasingly a financial call. That tension has never gone away.
## Why SIEM Struggles With Modern Data Growth
Then cloud happened. Then SaaS. Then containers. Then EDR at scale. The data problem didn’t grow incrementally. It detonated. Security telemetry now spans multiple clouds, SaaS platforms, endpoints, and data lakes, and moving all of it into a single analytics system introduces real constraints: network transfer costs, ingestion overhead, parsing and data normalization work, and growing operational complexity.
Architectures built for a world with walls were now running in an estate that had none. And with data volumes growing faster than budgets, teams needed a way to stay in control without tearing everything down. Telemetry pipelines became the answer.
## Why Telemetry Pipelines Don’t Solve the SIEM Data Problem
Tools like Cribl and Databahn gave teams back control. Route this, filter that, drop the noise before it hits the system of analysis, send the rest to object storage and retrieve it later.
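To make the pattern concrete, here’s a minimal sketch of the routing decision these tools encode — not Cribl’s or Databahn’s actual configuration language, just the shape of it, with illustrative event fields and destinations:

```python
# Hypothetical routing logic in the shape of a telemetry pipeline rule.
# Event fields, IDs, and destinations are illustrative, not any vendor's schema.

NOISY_EVENT_IDS = {"4662", "5156"}   # example: chatty Windows event IDs

def route(event: dict) -> str:
    """Decide where a single event goes before it reaches the SIEM."""
    if event.get("event_id") in NOISY_EVENT_IDS:
        return "drop"                # filtered out entirely
    if event.get("severity", "low") in ("high", "critical"):
        return "siem"                # full-cost ingestion for high-value events
    return "object_storage"         # cheap retention; rehydrate later if needed

events = [
    {"event_id": "4625", "severity": "high"},   # failed logon: goes to the SIEM
    {"event_id": "4662", "severity": "low"},    # noisy audit event: dropped
    {"event_id": "1001", "severity": "low"},    # everything else: object storage
]
for e in events:
    print(e["event_id"], "->", route(e))
```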
Connecting on-prem, cloud, and distributed environments into a single collection layer is genuinely hard, and pipelines solve that problem well.
But plumbing is not architecture. Pipelines reduce the cost of centralization without questioning it. Data still flows to one place. Complexity still accumulates at the center.
And then you hit the harder question: what do you actually filter?
There are no universal rules. What matters depends on your business, your threat model, your risk tolerance. Getting that right takes time and experience most teams are still building.
So the underlying problem doesn’t disappear. It just gets deferred.
## Why AI in the SOC Requires Complete Data Access
Now we’re in a different moment.
Generative AI and agentic workflows are changing what a SOC can do. Detections that used to require specialist engineering can now be generated with AI. Investigations that took hours can run in minutes. But there’s a limit that enthusiasm doesn’t fix.
AI is only as good as the data it can access. If half your telemetry is filtered out, sitting in cold storage, or never ingested in the first place, AI hits a boundary. The moment that happens, the workflow stops being autonomous. This is exactly why earlier attempts at AI in security felt constrained. The models were never the issue. The environment was.
AI needs complete visibility to be effective. Most architectures weren’t designed for that.
## How Vega Approaches Federated Security Analytics
Federated security analytics starts with a different assumption: the data is already where it needs to be.
Most organizations are already storing more data than their SIEM ever sees. Cloud logs, endpoint telemetry, security findings. It’s all there, often in object storage because that’s the only place it’s economically viable. The problem isn’t that the data doesn’t exist. It’s that the platform built to analyze it can’t reach it without pulling it through a high-cost ingestion process.
Federation flips that.
Compute moves to the data. The data stays where it is. Your SIEM, your cloud logs, your object storage, your endpoint tools, even multiple SIEMs and data lakes across environments. Everything becomes queryable without re-ingestion or duplication. And the tools your team has already invested in don’t get replaced. They get connected.
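To make query-in-place concrete, here’s a minimal sketch using DuckDB to scan Parquet logs directly in S3 — a generic illustration of compute moving to data, not Vega’s engine; the bucket path, schema, and threshold are hypothetical:

```python
import duckdb

# Compute-to-data illustration: the Parquet files never leave S3; only the
# query and its result rows move. Bucket path and columns are hypothetical.
# Assumes AWS credentials are available in the environment.
con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

rows = con.execute("""
    SELECT user_name, source_ip, count(*) AS failures
    FROM read_parquet('s3://example-security-logs/auth/*.parquet')
    WHERE outcome = 'failure'
    GROUP BY user_name, source_ip
    HAVING count(*) > 50          -- surface brute-force candidates
""").fetchall()

for user, ip, failures in rows:
    print(f"{user} from {ip}: {failures} failed logins")
```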
When a new data source needs to be onboarded, LLM-powered parsers mean that what once took weeks of normalization work now happens the same day. The barrier to bringing data into scope drops to almost nothing, which means the data available to both human analysts and AI agents continuously expands without integration bottlenecks.
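For a sense of what that onboarding step produces, here’s the kind of small normalization function an LLM might generate for a new source — the raw log format and field names below are invented for illustration:

```python
import re

# Hypothetical raw format for a new source:
#   2024-06-01T12:03:44Z host=fw-edge-01 action=deny src=10.1.2.3 dst=8.8.8.8
LINE = re.compile(
    r"(?P<ts>\S+)\s+host=(?P<host>\S+)\s+action=(?P<action>\S+)"
    r"\s+src=(?P<src>\S+)\s+dst=(?P<dst>\S+)"
)

def parse(raw: str) -> dict | None:
    """Map one raw line to normalized field names (the schema is illustrative)."""
    m = LINE.match(raw)
    if not m:
        return None
    return {
        "timestamp": m["ts"],
        "host": m["host"],
        "action": m["action"],
        "source_ip": m["src"],
        "dest_ip": m["dst"],
    }

print(parse("2024-06-01T12:03:44Z host=fw-edge-01 action=deny src=10.1.2.3 dst=8.8.8.8"))
```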
| | Centralized SIEM | Federated Security Analytics (Vega) |
|---|---|---|
| Data movement | All telemetry must be ingested into a single cluster | Queries run where data already lives; only results travel |
| Egress costs | At $0.08–$0.15/GB for cross-cloud/region transfers, moving 5–10 TB daily adds up to $150K–$500K per year | Near zero; queries return rows, not petabytes |
| Storage economics | SSD for hot data; frozen tier is functionally offline | Object storage is the primary tier; ~12 months of hot, searchable data at a fraction of the cost |
| Cold start performance | First queries against historical data suffer cache misses and frozen tier retrieval delays | No cold start penalty; object storage data is always queryable |
| Operational overhead | Manual tuning: index lifecycle policies, shard counts, node sizing, tier management | Fully managed: no shard math, no tier tuning, no schema planning |
| Compute model | Same compute handles indexing and querying; contention under load | Indexing and search run on independent, elastic compute |
| Data source scope | Single cluster; data must be centralized first | Federates across S3, GCS, Splunk, Sentinel, Elastic, Snowflake, Databricks, and more |
| Time to value | Significant upfront configuration and schema planning | Minimal config; rapid onboarding of new data sources |
| Best for | Small data volumes with heavy join requirements | Detection engineering, threat hunting, wide scans, aggregations, and historical depth |
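The egress figure in the table is plain arithmetic. A quick sketch, using the table’s own assumptions (5–10 TB per day at $0.08–$0.15 per GB, taking 1 TB = 1,000 GB):

```python
# Reproduce the table's egress estimate from its stated assumptions.
GB_PER_TB = 1_000
DAYS_PER_YEAR = 365

for tb_per_day, rate_per_gb in [(5, 0.08), (10, 0.15)]:
    annual = tb_per_day * GB_PER_TB * rate_per_gb * DAYS_PER_YEAR
    print(f"{tb_per_day} TB/day at ${rate_per_gb}/GB = ${annual:,.0f}/year")
```

That comes out to roughly $146K per year at the low end and $548K at the high end — in line with the range cited above.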
## Centralized vs Federated Analytics in a Real Investigation
A credential is compromised. An analyst needs to understand the blast radius. They start in the SIEM and authentication logs come back quickly. Then the questions expand. What happened on the endpoint? What cloud APIs were called? What systems were touched?
This is where the investigation fragments.
Some data needs restoring. Some was filtered. Some never made it into the SIEM. The analyst waits, pivots, or makes a call based on incomplete information. That’s not a failure of the analyst. It’s a consequence of the architecture.
In a federated model, the same investigation runs as a single query across all sources simultaneously. No rehydration. No waiting. No missing context. The scope of the investigation is defined by the question, not by what happened to be ingested.
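As a sketch of what “a single query across all sources” might look like, here’s the blast-radius question expressed against a hypothetical federated endpoint — the client class, source names, and schema are all invented for illustration, not Vega’s actual API:

```python
from dataclasses import dataclass

@dataclass
class FederatedClient:
    """Hypothetical client for a federated query endpoint (illustrative only)."""
    endpoint: str

    def query(self, sql: str) -> list[dict]:
        # A real implementation would fan the query out to each source and
        # merge the result rows; stubbed here so the sketch is self-contained.
        print(f"[{self.endpoint}] executing federated query")
        return []

client = FederatedClient(endpoint="https://federation.example.internal")

# One question, every source: SIEM auth logs, EDR telemetry, cloud audit trail.
blast_radius = client.query("""
    SELECT 'siem'  AS source, event_time, host, action
    FROM siem.auth_logs        WHERE user_name = 'compromised.user'
    UNION ALL
    SELECT 'edr'   AS source, event_time, host, action
    FROM edr.process_events    WHERE user_name = 'compromised.user'
    UNION ALL
    SELECT 'cloud' AS source, event_time, host, action
    FROM cloud.audit_api_calls WHERE actor = 'compromised.user'
    ORDER BY event_time
""")
```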
The difference isn’t just speed. It’s coverage.
Most organizations already collect the data they need. They just can’t use all of it. Centralized architectures create a gap between what is collected and what is actually queryable, and that gap shows up as blind spots, slower investigations, and reduced confidence in detection.
Federation closes that gap.
## How to Evaluate Your Security Data Architecture Today
Most teams are not starting from scratch. They’ve invested years into their platforms, pipelines, and processes, and those decisions made sense at the time. The question is whether those systems are still working the way you need them to.
How much of your telemetry is actually queryable right now? How often do investigations stall because data lives somewhere your platform can’t reach? Does your architecture support what AI workflows actually require? The answers usually point in the same direction.
The data is already distributed. It has been for years. Architecture that attempts to force everything back into one place is working against that reality. Federated security analytics works with it. You don’t need to replace everything you’ve built. You just need to stop asking your data to come to you.
If you’re exploring what this looks like in practice, that’s exactly the problem Vega is built to solve.