Realm.Security – Security Fabric Made Simple

Realm.Security Rolls Out AI-Ready Security Data for the Modern SOC Ahead of RSA Conference
https://realm.security/realm-security-rolls-out-ai-ready-security-data-for-the-modern-soc/ | Mon, 16 Mar 2026

Realm.Security launches Data Enrichments and Privacy Guard — injecting real-time threat context into security pipelines and automating PII redaction to keep SOC teams faster, leaner, and compliance-ready.

The post Realm.Security Rolls Out AI-Ready Security Data for the Modern SOC Ahead of RSA Conference appeared first on Realm.Security.


Enrichment capabilities add real-time context to security data pipelines, route telemetry to modern data lakes like Hydrolix, and automate governance of sensitive data

Boston, MA – March 16, 2026 – Realm.Security, the company pioneering the industry's first AI-native Security Data Pipeline Platform (SDPP), today announced Realm Data Enrichments, a partnership with Hydrolix, and expanded Privacy Guard capabilities, giving security teams vendor-neutral control over their SOC data to accelerate detection, investigation, and compliance. Realm Data Enrichments is a new AI-powered capability that injects contextual intelligence directly into telemetry in the security data pipeline. Meanwhile, new capabilities within Privacy Guard enable privacy data discovery and automated redaction of sensitive fields before telemetry reaches downstream systems.

"The modern SOC is not defined by a single tool, but by the quality and intelligence embedded in your data that feeds your tools," said Pete Martin, co-founder and CEO of Realm.Security. "By pairing real-time threat intelligence with environmental context at the source, we empower security teams to reclaim control over their data, ensuring it is secure, enriched, and accessible across the entire security ecosystem. Both human and AI analysts can now identify and stop bad actors the moment data hits the stack, rather than waiting for slow, post-ingestion lookups.”

Detect Faster with Realm Data Enrichments

Realm Data Enrichment - telemetry in the security data pipeline.

Realm Data Enrichments inject contextual intelligence directly into telemetry as it flows through the security data pipeline. Without enrichment, security analysts are forced to perform manual lookups or build enrichment workflows inside SIEM or data lake environments to determine the source of suspicious activity. Realm eliminates that delay by augmenting logs with contextual metadata before the data reaches downstream systems.

For example, IP addresses can automatically be enriched with geographic location, ISP ownership, and network intelligence using datasets from providers such as MaxMind and IPinfo. An IP can also be flagged against threat intelligence, delivering a detection before the data is even queried and indexed by the SIEM and streamlining detection triage. This enables both human analysts and AI-driven SOC agents to investigate threats more quickly and accurately.
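As a rough sketch of what in-pipeline IP enrichment looks like, the snippet below appends geo, ISP, and threat-intel context to an event before it reaches the SIEM. The lookup tables are hypothetical stand-ins for MaxMind/IPinfo datasets and a threat feed; Realm's actual enrichment engine is not shown.

```python
# Hypothetical local datasets standing in for MaxMind/IPinfo feeds
# and a threat-intelligence list (illustrative values only).
GEO_DB = {"203.0.113.7": {"country": "NL", "isp": "ExampleNet"}}
THREAT_INTEL = {"203.0.113.7"}  # known-bad IPs

def enrich_event(event: dict) -> dict:
    """Append geo, ISP, and threat-intel context in the pipeline,
    so no post-ingestion lookup is needed downstream."""
    ip = event.get("src_ip")
    enriched = dict(event)
    geo = GEO_DB.get(ip)
    if geo:
        enriched["src_geo"] = geo  # appended in a standardized field
    enriched["src_threat_match"] = ip in THREAT_INTEL
    return enriched

event = {"src_ip": "203.0.113.7", "action": "login_failed"}
print(enrich_event(event)["src_threat_match"])  # True
```

Because the enrichment is appended once, in a consistent field layout, every downstream tool sees the same schema instead of repeating the lookup itself.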

These enrichments also extend Realm's partnership with Hydrolix, enabling enriched telemetry to flow into modern security data lakes designed for large-scale investigation and long-term threat hunting. The joint solution provides a high-performance, AI-ready data fabric. By combining Realm's intelligent security data engine with Hydrolix's high-velocity, global-scale real-time data platform, organizations can unlock their archival data, modernize their security telemetry pipeline, and gain the sub-second visibility the modern SOC requires.

"Teaming with Realm.Security was a natural response to what our mutual customers are telling us," said Rob Malnati, Head of Corporate Development at Hydrolix. "As enterprises disaggregate their SIEM, they need solutions that solve the two defining challenges of AI-ready security data: retaining the vast telemetry required to train and inform agents, while keeping storage and data management costs in check. Together, we address both — and we go further. SOC responders no longer have to chase root causes across siloed systems and dashboards. We deliver one view with actionable insights in seconds, so security teams can move from detection to mitigation at the speed threats demand."

Key benefits of Realm Data Enrichments:

  • Accelerated investigations: Analysts can immediately understand where an event originated and who owns the infrastructure involved.
  • Reduced compute costs: Enriching data once in the pipeline eliminates repeated lookup queries and JOIN operations in SIEM or data lake environments.
  • Consistent data schemas: Enrichment data is appended in standardized formats, simplifying cross-tool analytics and dashboards.


Unlocking Restricted Data with Privacy Guard

Realm Privacy Guard – Privacy Data Discovery

    Privacy Guard now enables organizations to safely ingest security data that may contain sensitive fields such as personally identifiable information (PII). It provides privacy data discovery and automated redaction directly within the security data pipeline, allowing organizations to identify and manage sensitive data according to regulatory frameworks such as GDPR, HIPAA, CCPA, and PCI-DSS. Rather than requiring security teams to manually track every potential source of sensitive data, it enables them to define the frameworks and regional requirements they adhere to, automatically applying policies across incoming telemetry. As a result, they ingest more security data with confidence while ensuring downstream analytics platforms, SIEMs, and AI systems do not receive unnecessary sensitive data.

    Key capabilities of Realm Privacy Guard include:

  • AI-powered discovery and redaction: Automatically identifies sensitive patterns and recommends redaction policies.
  • SIEM-safe masking: Sensitive values are replaced without breaking log schemas or detection rules.
  • Centralized governance controls: Security teams can manage redaction policies across pipeline destinations from a unified interface.
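The "SIEM-safe masking" capability above can be sketched minimally: sensitive values are replaced in place while field names and log structure stay intact, so detection rules keyed on those fields keep working. The patterns below are illustrative, not Privacy Guard's actual discovery models.

```python
import re

# Illustrative PII patterns; a real system would use broader discovery.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(log: dict) -> dict:
    """Mask PII in string values while preserving the log schema,
    so downstream SIEM rules and dashboards are unaffected."""
    masked = {}
    for key, value in log.items():
        if isinstance(value, str):
            value = EMAIL.sub("<EMAIL_REDACTED>", value)
            value = SSN.sub("<SSN_REDACTED>", value)
        masked[key] = value
    return masked

print(redact({"user": "alice@example.com", "msg": "SSN 123-45-6789 seen"}))
# {'user': '<EMAIL_REDACTED>', 'msg': 'SSN <SSN_REDACTED> seen'}
```

Note that only values change; keys and nesting do not, which is what keeps masking "SIEM-safe."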

Meet Realm at RSA Conference

Realm.Security executives will be onsite at RSA Conference 2026 to demonstrate Data Enrichments, the joint offering with Hydrolix, Privacy Guard, and the company's broader AI-native Security Data Pipeline Platform. Security leaders can schedule a meeting with the team during the event.


3 Data-Based Shifts Defining AI-Native Cybersecurity Stacks
https://realm.security/3-data-based-shifts-defining-ai-native-cybersecurity-stacks/ | Thu, 05 Mar 2026

The SOC is changing faster than you realize. Discover the 3 data-based shifts defining the AI-native cybersecurity era: from upstream detection to AI-powered triage and enrichment.

    The post 3 Data-Based Shifts Defining AI-Native Cybersecurity Stacks appeared first on Realm.Security.

    ]]>

The Security Operations Center (SOC) in modern cybersecurity stacks is changing faster than most people realize. Not in some distant future, but right now. The way security teams detect, investigate, and respond to threats looks very different than it did a decade ago. AI is, of course, a major driver of this transformation.

This doesn’t mean bolting AI features onto existing tools. The entire architecture of the SOC needs to be rebuilt around what AI does well and how it works alongside human cybersecurity analysts. Detection is moving upstream into data pipelines. The SIEM is evolving into a tier-two investigation platform. AI agents are taking over the early triage and enrichment work that has consumed analyst time for years. According to a Team8 survey of CISOs, 67% of organizations have already deployed AI agents, with another 23% planning to do so this year.

    The teams that understand we’re now working in a new AI-native cybersecurity era and build the right foundation will see dramatic reductions in detection time, fewer false positives, and SOC analysts who can finally focus on complex investigations instead of data gathering.

For security and data leaders, AI-native cybersecurity stacks are among the most demanding real-time data environments in the enterprise. These stacks test whether data pipelines can support low-latency decisioning, whether schemas are consistent enough for automation, and whether governance models can accommodate machine-driven access without slowing response. In that sense, the SOC has become a proving ground for modern data architecture.

    Here are the three shifts making that possible.

    1. Detection Moves Into the Security Data Pipeline, SIEM Use Evolves

    The SIEM has been the center of gravity in every SOC for years. It ingests everything, stores everything, and gets queried constantly to find threats. That model is breaking down.

    The problem is too much data. Querying a SIEM for indicators of compromise can take one to five hours based on data volume. That's too slow when attacks move in minutes.

    The first major shift: detection is moving upstream into the data pipeline itself. Organizations must now match against known Indicators of Compromise (IOCs) during ingestion, before data even gets stored. This cuts detection time dramatically and stops the constant re-querying of massive datasets.
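A minimal sketch of what in-stream IOC matching looks like is below. The IOC set is a placeholder for a real threat-intelligence feed, and the field names are assumptions for illustration.

```python
# Placeholder IOC feed; in practice this would be refreshed from
# threat-intelligence sources and held in a fast in-memory structure.
KNOWN_IOCS = {
    "ip": {"198.51.100.23"},
    "sha256": {"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},
}

def check_ioc(event: dict) -> list[str]:
    """Return the IOC types this event matches, evaluated during
    ingestion rather than by re-querying the SIEM afterward."""
    hits = []
    if event.get("dst_ip") in KNOWN_IOCS["ip"]:
        hits.append("ip")
    if event.get("file_hash") in KNOWN_IOCS["sha256"]:
        hits.append("sha256")
    return hits

# A matching event raises a detection before the SIEM ever indexes it.
print(check_ioc({"dst_ip": "198.51.100.23"}))  # ['ip']
```

The point of the pattern is the timing: a set-membership check at ingest costs microseconds, versus hours of SIEM query time after storage.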

    The SIEM isn’t being put out to pasture. Instead, it's evolving into a platform for tier two and tier three investigation. Analysts work on validated alerts that have already been enriched and contextualized by upstream systems. The noise gets filtered out before it reaches them.

This transition coincides with the decoupling of storage from detection. Organizations are moving raw telemetry into data lakes and archival systems built for machine-driven access. While SIEMs weren't designed to be queried by AI agents, data lakes are. They support high-velocity API calls, cost less at scale, and give AI agents direct access without clogging primary detection infrastructure.

    The new architecture: detection happens upstream, enriched data flows to the SIEM for deeper investigation, and a secondary storage layer handles AI queries.

    2. AI Agents Take Over Early Triage and Leverage Enrichment

    Once the data foundation is in place, agentic workflows take over the grunt work that eats up the first 20 minutes of every investigation. Traditionally, analysts spend most of their time gathering context—pulling logs, checking authentication events, looking up user details. It's slow and tedious. In an AI-native SOC, that work happens automatically. When an alert fires, AI agents pull relevant historical data from the data lake, correlate it with other events, and enrich it with metadata before a human ever sees it.

    The analyst opens a ticket with everything they need to make a decision. The impact is measurable. Gurucul's 2025 Pulse of the AI SOC report found that 60% of organizations using AI SOC tools have cut investigation time by at least 25%.

    This breaks down silos that have always slowed incident response. Take impossible travel alerts. In the past, an analyst would email HR to ask if the employee is traveling, then wait days for a response. In an AI-native SOC, the enrichment engine checks the employee's calendar and email automatically. No waiting, no manual coordination.

    The same applies to other critical contexts. Is the employee on a performance plan? Are they a contractor with unusual permissions? Did they trigger a DLP spike after being passed over for promotion? AI makes this information available immediately instead of forcing analysts to chase it across departments.

    When a true positive hits the SIEM, the AI pipeline automatically queries the data lake for observables. The system does it and sends results back in seconds. This is the move from query-driven to context-delivered. Machines handle speed and scale. Humans focus on complex reasoning and high-impact remediation.
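The "context-delivered" flow described above can be sketched as a pre-triage assembly step. The lookup functions here are hypothetical stubs standing in for data-lake queries and calendar/HR checks; a real agent would call live systems.

```python
# Hypothetical stubs for the systems an enrichment agent would query.
def query_data_lake(user: str) -> list[dict]:
    return [{"event": "login", "geo": "US"}]

def check_calendar(user: str) -> dict:
    return {"traveling": True, "location": "DE"}

def build_ticket_context(alert: dict) -> dict:
    """Assemble the context an analyst needs before the ticket opens:
    historical observables plus the data that resolves common alerts
    like impossible travel."""
    user = alert["user"]
    return {
        "alert": alert,
        "history": query_data_lake(user),  # observables from the lake
        "travel": check_calendar(user),    # no emailing HR and waiting
    }

ctx = build_ticket_context({"user": "bob", "type": "impossible_travel"})
```

The analyst then opens a ticket that already answers "is this person traveling?" instead of spending the first 20 minutes gathering that context.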

    3. AI Democratizes Data Engineering & Levels Playing Field

For the past decade, Fortune 500 companies had a major leg up. They had the budget to invest heavily in data engineering teams, sometimes 10 or more people cleaning, normalizing, and structuring security telemetry. Smaller organizations couldn't afford that. They were stuck with messy data and slower response times while attackers kept accelerating.

AI changes that equation. Mid-market teams can now adopt data cleaning and enrichment systems that used to require expensive custom engineering. AI democratizes strong data hygiene and puts smaller teams on more equal footing.

    This matters because log volume continues to surge. Employees are using AI systems for tasks they used to do through Google search, and every AI interaction generates logs—authentication events, usage data, metadata. All of it flows into the SOC. Without AI-native filtering, log volume balloons to unsustainable levels.

In this sense, AI levels the playing field against adversaries who are leveraging their own AI to exploit vulnerabilities in these extremely noisy environments. AI-enabled upstream detection and machine-guided triage in the data pipeline let analysts respond to threats more quickly even as telemetry volumes explode. The modern SOC is now defined by data quality and architecture, not the number of tools or analysts. Clean telemetry, upstream detection, and agentic enrichment have become table stakes.

    Organizations that adopt this approach will see dramatic reductions in mean time to detection, fewer false positives, and a level of automation that gives human analysts the space to actually do their best work.

    What This Means for Security Teams

The AI-native SOC isn't theoretical; it's already emerging. It has become the blueprint for keeping pace with attackers using their own AI enhancements.

    If you're a CISO, the question isn't whether to adopt AI. It's whether your data foundation is ready to support it. The teams pulling ahead are the ones investing in data pipelines that detect in real time, secondary storage that AI can query efficiently, and enrichment systems that deliver context automatically.

    With these three shifts in place, the SOC becomes faster, smarter, and more capable than ever – and human analysts finally get to do their best work.


Engineering for the Inevitable: Managing Downstream Failures in Security Data Pipelines
https://realm.security/engineering-for-the-inevitable-managing-downstream-failures-in-security-data-pipelines/ | Wed, 18 Feb 2026

Learn how to prevent the roughly 50% of detection failures caused by log delivery chain issues. Master persistent queuing, schema drift mitigation, and automated recovery for zero-data-loss SOC operations.

    The post Engineering for the Inevitable: Managing Downstream Failures in Security Data Pipelines appeared first on Realm.Security.

    ]]>

    Today, data pipelines do more than move information. They play a key role in security. A Security Operations Center depends on a steady, high-quality stream of telemetry from different systems to tools like a SIEM or data lake.

    Cloud-native setups have made pipelines more fragile. Now, they depend on downstream systems that can have unexpected outages, changes, or cryptographic updates. Research shows that about half of detection rule failures come from issues in the log delivery chain, not advanced attacker methods.

    The Taxonomy of Downstream Fragility: Beyond Simple Outages

    To build resilient systems, engineers need to understand the different ways downstream destinations can fail. While a full outage is obvious, less visible "soft failures" can quietly weaken detection over time.

    Regional Outages and SaaS Unavailability: Centralized security platforms are at risk from regional cloud disruptions, like the frequent issues in AWS us-east-1. When a destination stops accepting data, standard point-to-point models often have cascading failures. Source-side buffers fill up quickly, which leads to dropped logs.

    API Quota Exhaustion and Throttling: SaaS-based tools set strict API rate limits to keep multi-tenant systems stable. Failures happen when telemetry volume goes over these limits, often during "bursty" events like a DDoS attack. This can cause "partial ingestion," where important context for correlation is missing.

    Performance Bottlenecks and Resource Skew: Even when a service is technically "up," it can still fail because of resource limits. Out-of-Memory errors can happen when processing very large files or from "data skew," where one partition has too much data. This causes significant lag in security alerts.

    The Silent Failure of Schema Drift: If a developer changes a field name, such as from source_ip to src_address, the SIEM might still receive data but fill those fields with null values. This can make it seem like everything is working, while detection rules quietly stop working.
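A simple guard against this kind of silent failure is to track null rates for the fields your detection rules depend on. The sketch below assumes you can enumerate those required fields; thresholds and field names are illustrative.

```python
# Fields that detection rules depend on (illustrative).
REQUIRED_FIELDS = {"source_ip", "user", "action"}

def drift_report(batch: list[dict]) -> dict:
    """Return the fraction of events in this batch missing each
    required field. A sudden jump to 1.0 signals schema drift even
    though ingestion volume still looks healthy."""
    total = len(batch)
    missing = {f: 0 for f in REQUIRED_FIELDS}
    for event in batch:
        for field in REQUIRED_FIELDS:
            if event.get(field) is None:
                missing[field] += 1
    return {f: count / total for f, count in missing.items()}

# After a rename to src_address, source_ip is 100% missing.
batch = [{"src_address": "10.0.0.1", "user": "a", "action": "deny"}] * 50
print(drift_report(batch)["source_ip"])  # 1.0
```

Alerting on this ratio turns an invisible detection outage into an ordinary operational alarm.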

    The Operational and Compliance Impact

    Missing logs are more than a technical problem. They are a major compliance and forensic risk.

    Forensic Blind Spots: During a major outage, which is when visibility is needed most, configuration changes and failover events generate critical logs. If these are lost, analysts cannot trace tactics like privilege escalation or identify persistence, as seen in the Microsoft September 2024 logging incident.

    Audit Deficiencies: SOC 2 Type 2 reports check how well controls work over time. If there is an ingestion gap during an audit window, the organization cannot prove its security controls were effective. This leads to formal audit deficiencies.

    Regulatory Violations: PCI DSS 4.0 Requirement 10 explicitly focuses on the integrity and availability of audit logs. A failure that results in dropped logs is a direct violation of these requirements.

    Regional Infrastructure Disruptions: Lessons from the AWS us-east-1 Outage

    The risk of downstream failure becomes real during regional cloud disruptions. For example, on October 20, a major AWS outage in the us-east-1 region made Splunk Cloud unavailable for about four hours. Organizations using standard point-to-point ingestion often face cascading failures, full buffers, and permanent loss of logs created during the downtime.

    Feed Availability - Realm.Security

    A real-world analysis of this event illustrates how a resilient security data pipeline maintains continuity:

    Persistent Queuing: When the Splunk Cloud failure was detected, the pipeline did not stop ingestion or pause pipelines. Instead, it switched to buffering, writing the incoming log stream to a persistent, disk-backed queue.

    Decoupled Ingestion: Data ingestion from all connected sources continued as normal during the four-hour window. This architectural isolation ensures that "blind spots" do not appear during major outages and failover events, which is when infrastructure visibility is most important for security.

    Automated Catch-Up Dynamics: When Splunk Cloud came back online, the fabric automatically detected it and started a catch-up process. Data was delivered "as fast as Splunk could safely accept it," using an output capacity higher than the normal ingestion rate.

    Zero Manual Intervention: The entire recovery was "hands-free," requiring no restarts, manual data re-plays, or "babysitting" from operators.

    This event provides a production proof point: while outages in modern cloud environments are inevitable, data loss and operational disruption are not. By architecting for failure, organizations ensure that full observability is restored with complete historical data intact.
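The buffer-and-catch-up dynamic described above can be modeled in a few lines. This is a toy simulation with invented rates, not Realm's implementation: while the destination is down, events accumulate in a durable queue; on recovery, the drain rate exceeds the ingest rate so the backlog clears.

```python
import collections

queue = collections.deque()   # stands in for a disk-backed queue
delivered = []
destination_up = False

INGEST_RATE, DRAIN_RATE = 10, 25  # events per tick; drain > ingest

for tick in range(8):
    if tick == 4:
        destination_up = True  # outage ends after four ticks
    # Ingestion never pauses, even while the destination is down.
    queue.extend(f"evt-{tick}-{i}" for i in range(INGEST_RATE))
    if destination_up:
        # Catch-up: deliver as fast as the destination safely accepts.
        for _ in range(min(DRAIN_RATE, len(queue))):
            delivered.append(queue.popleft())

print(len(delivered), len(queue))  # 80 0 -> no events lost
```

The key invariant is that output capacity exceeds steady-state ingest; without that headroom, the backlog never drains and recovery stalls.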

    In this walkthrough, we show how the Fabric view provides real-time visibility into feed health and what happens automatically when a destination stops accepting data.

    What to Evaluate in Your Own Pipeline

    To move from reactive "firefighting" to resilient observability, security engineers should audit their infrastructure against these principles:

    Durable Buffering: Does your pipeline rely on memory-constrained buffers that saturate in minutes, or does it utilize disk-backed persistent queuing to survive multi-hour outages?

    Backoff Intelligence: Does your system handle HTTP 429 errors with automated backoff and jitter, or does it risk "blackholing" data?

    Schema Evolution: Do you have a strategy, such as the Expand-Contract pattern, to manage field renames without breaking downstream detection rules?

    Cryptographic Agility: Does your secrets management support overlapping key versions for zero-downtime rotation?

    Catch-Up Capacity: Is your pipeline's output capacity significantly higher than its normal ingestion rate to allow for rapid recovery?
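The "backoff intelligence" item above is often implemented as exponential backoff with full jitter: on each HTTP 429, wait a random amount up to an exponentially growing cap, so many senders don't retry in lockstep and re-overload the destination. Parameters below are illustrative.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: random delay in [0, min(cap, base * 2^attempt)].
    Randomizing the whole window decorrelates retries across senders."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

for attempt in range(5):
    print(f"retry {attempt}: sleep up to {min(60.0, 2.0 ** attempt)}s "
          f"-> {backoff_delay(attempt):.2f}s")
```

The alternative, retrying on a fixed schedule, tends to "blackhole" data: every sender hammers the recovering destination at the same instant and the 429s never stop.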

    Ready to eliminate security blind spots?

    Learn more about how Realm’s Security Data Pipeline provides the persistent queuing and automated recovery needed to ensure zero data loss for your enterprise.


Jeff Kraemer’s Cyber Journey: The Cyber Roundtable Episode 1
https://realm.security/jeff-kraemers-cyber-journey-the-cyber-roundtable-episode-1/ | Wed, 04 Feb 2026

We are launching The Cyber Roundtable, a conversation series with the cybersecurity leaders who have shaped this industry. In Episode 1 with Realm Co-Founder Jeff Kraemer, we explore the parallels between the fear and loathing of the early cloud era and the current AI wave, and why the future of security operations belongs to automated solutions, not complex toolkits.

    The post Jeff Kraemer’s Cyber Journey: The Cyber Roundtable Episode 1 appeared first on Realm.Security.

    ]]>

I've had a front row seat for the last 15 years on the peaks and valleys of the cyber industry. I have seen the ebbs and flows and everything in between.

    To really understand where we’re headed, we need to hear from the people building these systems: the investors, the operators, and the founders.

    That’s why we’re launching The Cyber Roundtable: Security Evolutions.

    This series features conversations with cybersecurity leaders who have shaped the industry. We’ll talk about their career paths, tough lessons, and what it takes to lead security teams through constant change.

    In our first episode, I sat down with Jeff Kraemer, the Co-Founder and CTO of Realm.Security.

    Jeff has been doing this since before most of us had email. He cut his teeth at DEC in the 90s, building embedded systems.

    He shifted from networking to network security at Raptor Systems, back when firewalls were just starting to appear. He then worked on endpoint security at Okina and Cisco, and later helped pioneer EDR at Confer and Carbon Black.

    Jeff has seen every major inflection point.

    We discussed how the early days of the cloud are similar to where we are now with AI. Back then, people were afraid to put security data in the cloud. Now, it’s essential for any enterprise.

    He believes AI is following the same path. There’s skepticism now, but its usefulness will eventually win people over.

    We also talked about his time away from building security products, when Jeff took on a CISO role at a legal tech startup.

    That experience changed the way he approaches building software.

    He realized that CISOs aren’t losing sleep over finding a better tool. They’re more concerned about compliance and the high cost of processing data.

    That change in perspective drives us today. Jeff explains the difference between selling a toolkit and selling a solution. Toolkits work well for the top 5% of teams with unlimited resources.

    Everyone else just wants their problems solved.

    It was a great conversation about the real challenges of engineering and the lessons learned along the way.

    Check it out.


How Realm Data Haven Solves Long-Term Log Storage and Fast Resupply for SOC Teams
https://realm.security/how-realm-data-haven-solves-long-term-log-storage-and-fast-resupply-for-soc-teams-2/ | Wed, 21 Jan 2026

In this blog, we show how Realm Data Haven fixes SIEM log archiving pain. Get zero-touch, long-term security data storage with fast IOC and observable resupply for SOC teams.

    The post How Realm Data Haven Solves Long-Term Log Storage and Fast Resupply for SOC Teams appeared first on Realm.Security.

    ]]>

    SIEMs were built for detection, not decades of storage. Yet most teams now use them as long-term archives. The result is high cost, slow retrieval, and pressure to drop visibility. Realm Data Haven fixes the root problem by separating detection from retention.

    How SIEMs Became Expensive, Accidental Archives

    Security teams did not choose to turn SIEMs into archives. Regulations forced longer retention. Breaches now take an average of 241 days to detect. Cloud sprawl drove massive log growth. The fastest option became dumping everything into the SIEM. The outcome is premium pricing tied to low-value data. Only 35 percent of stored SIEM data provides real detection value. Self-managed cloud tiering promised savings but created custom integrations, week-long resupply delays, and brittle workflows. Many teams abandoned it and stayed locked into high SIEM storage bills.

    Real-Time Detection and Long-Term Storage Need Different Architectures

    Real-time detection and long-term retention impose different technical demands. Detection needs fast ingestion, correlation, and search. Retention requires low-cost storage, robust controls, and rapid retrieval without query engineering.

    Forcing both into the same platform breaks both workflows. Realm treats them as separate but connected layers inside the Security Data Pipeline Platform. Data Haven serves as the long-term storage and resupply layer. Realm Focus handles filtering and routing for real-time detection.

    Introducing Realm Data Haven

    Realm Data Haven removes storage pain from your SOC. Data moves to secure archive storage with zero configuration. Retrieval happens through normalized IOCs and observables. No custom query language. No cloud scripting. No waiting weeks for results. Data Haven keeps your archive usable without loading your SIEM with cost and noise.

    How Data Haven Works

    Zero Configuration Onboarding
    Every data source connected to Realm routes into Data Haven automatically. No storage setup. No routing rules. No manual tuning.

    Zero Configuration Onboarding - Realm Data Haven

    Normalization on Ingestion
    Logs from firewall, endpoint, identity, and cloud are normalized at ingest. When resupply happens, analysts receive structured data ready for use across tools.
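Ingest-time normalization can be pictured as a field-mapping pass per source: vendor-specific names are rewritten into one common schema so resupplied data is immediately usable. The mappings below are invented for illustration, not Realm's actual schema.

```python
# Illustrative per-source field maps; a real system would maintain
# these for every connected vendor format.
FIELD_MAPS = {
    "vendor_fw": {"srcip": "src_ip", "dstip": "dst_ip", "act": "action"},
    "vendor_idp": {"actor": "user", "outcome": "action"},
}

def normalize(source: str, raw: dict) -> dict:
    """Rewrite vendor-specific field names into the common schema;
    unmapped fields pass through unchanged."""
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(k, k): v for k, v in raw.items()}

print(normalize("vendor_fw", {"srcip": "10.0.0.5", "act": "deny"}))
# {'src_ip': '10.0.0.5', 'action': 'deny'}
```

Doing this once at ingest means every later resupply, regardless of original vendor, hands analysts the same structured shape.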

    Configuring a Resupply Destination
    Teams mark a destination as resupply eligible. Realm keeps production feeds separate from resupplied data to maintain clean workflows.

    IOC and Time-Range Guided Retrieval
    Instead of regex or vendor-specific syntax, teams retrieve archived data by username, hash, email, IP address, hostname, process name, URL, time window, and source.

    IOC and Time-Range Guided Retrieval - Realm Data Haven

    Two Resupply Types
    IOC and Observable Resupply for threat hunting and investigations across recent history.
    Archival Resupply for long-term compliance and forensic reviews across the full archive.

    Two Resupply Types - Realm Data Haven

    Confirmation Workflow
    Realm calculates the resupply size before transfer. Teams approve before data moves. No surprise charges. No accidental floods.

    Realm calculates the resupply size before transfer - Realm Data Haven

    Why This Matters for the SOC

    Data Haven removes the operational friction that has plagued security investigations for years. The impact manifests across multiple operational dimensions.

    Accelerated Investigations
    When an analyst needs historical context—whether tracing lateral movement, investigating a delayed alert, or responding to a newly discovered IOC—Data Haven eliminates the multi-week wait that characterizes legacy archival solutions. Resupply happens in hours, not weeks. Analysts stay in the flow of investigation rather than waiting on data retrieval tickets to be fulfilled.

    No More Query Language Barriers
    Security analysts shouldn't need to be data engineers. Data Haven's guided retrieval using normalized IOCs means junior analysts can retrieve archived data with the same ease as senior investigators. There's no need to master complex query languages, understand source-specific log formats, or write regex patterns. The interface guides you through what's possible and surfaces only relevant options.

    Prevents SIEM Overload
    By routing comprehensive historical logs to Data Haven instead of forcing everything into the SIEM, security teams regain control over SIEM costs and performance. The SIEM can focus on what it does best—real-time detection and correlation—while Data Haven handles what it does best: cost-effective, long-term retention with rapid resupply.

    Enables Smarter Data Strategy
    When paired with Realm Focus, Data Haven completes the end-to-end lifecycle of intelligent security data management. Focus filters low-value telemetry before it reaches your SIEM, reducing costs and noise. Data Haven ensures that filtered data isn't discarded—it's archived, normalized, and retrievable when investigations demand it. This combination allows security teams to be aggressive with filtering, confident that archived data remains accessible if needed.

    Supports Proactive Threat Hunting
    Threat hunting often requires analyzing historical patterns that aren't visible in real-time data alone. Data Haven's rapid resupply and normalized observables enable hunters to pull relevant historical data for analysis without waiting on manual retrieval processes or overburdening the SIEM with retrospective queries.

    Ready to Stop Using Your SIEM as an Archive?

Realm Data Haven delivers true operational efficiency by automating the mundane, so your team can focus entirely on threat hunting and response. Zero-touch archiving, simplified retrieval, and a purpose-built design for security workflows: finally, an archival solution that works the way your SOC operates.

    Learn how Realm Data Haven simplifies your security data lifecycle and pairs with Realm Focus to complete your intelligent data strategy.


Embracing Uncertainty with AI Agents: Vulnerability Assessment using Pydantic AI
https://realm.security/embracing-uncertainty-with-ai-agents-vulnerability-assessment-using-pydantic-ai/ | Thu, 08 Jan 2026

In this blog, we show how union-type structured output allows AI agents to handle uncertain outcomes, critical for auditable and accurate vulnerability triage.

    The post Embracing Uncertainty with AI Agents: Vulnerability Assessment using Pydantic AI appeared first on Realm.Security.

    ]]>

    TLDR: We show union-type structured output allows AI agents to handle uncertain outcomes, critical for auditable and accurate vulnerability triage.

    Dependency scanners are a critical part of identifying software vulnerabilities. However, they have a problem – vulnerability overload. Which 5 vulnerabilities are most important out of the 487 that the dependency scanner just flagged? Given limited resources, this is a major challenge for modern security teams.

We show in a practical example that AI Agents can analyze vulnerability context automatically and help security experts prioritize the most important findings. The source code and an example vulnerable Python package are available in the realm-security/agent-union-type repo on GitHub.

    Vulnerability triage illustrates a core problem faced by AI/ML Engineers building agents – the agent needs to handle uncertainty gracefully when it does not have enough information. A vulnerability agent provides value by:

    1) Analyzing the vulnerability in depth to extract detailed information
    2) Knowing when to raise the findings to the security team for review

    When using structured output to capture the detailed information, a single response schema has a major flaw. In the case of insufficient context, the agent is forced to fill out each detailed field – resulting in hallucinated fields when necessary to match the format. Hallucinations at this level can mask real security threats that need to be reviewed, and also cause alert fatigue by passing incorrect assertions on to the security team.

    Vulnerability Agent handling insufficient context
    Figure 1: Vulnerability Agent handling insufficient context: A) a single structured output forces hallucinations due to detailed required fields that don’t match the situation. B) Union-type structured output enables the Agent to admit uncertainty, and provide clear visibility downstream.

    With typed agentic frameworks, like Pydantic AI, a union-type allows the agent to select the appropriate response to handle the situation at hand. We show this simple technique greatly improves the accuracy of agentic systems, and unlocks observability that security teams need to audit and make the best use of AI.

    Vulnerability Overload

Modern applications are built on hundreds of dependencies that make up the software supply chain. Although an open-source package may have only a few direct dependencies, its transitive (indirect) dependencies represent a much larger attack surface. A 2024 study found that the average package in the npm and PyPI ecosystems had approximately 3 direct dependencies, with 32 transitive dependencies for npm and 37 for PyPI. Since this can scale to hundreds or thousands of packages across an enterprise codebase, CISOs need to prioritize the most critical vulnerabilities to address first.

CVSS scores provide a valuable representation of the criticality of a vulnerability, but they don’t tell the full story. A CVSS 9.8 vulnerability in an unused code path isn’t as high a priority to address as a CVSS 6.5 vulnerability in user input handling code. Context matters for understanding whether a vulnerable function or code path is relevant to the application at hand.
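To make the point concrete, here is a toy scoring sketch: the weighting scheme and function are illustrative only, not part of CVSS or any scanner, but they show how context can reorder raw scores.

```python
def contextual_priority(cvss: float, reachable: bool, handles_user_input: bool) -> float:
    """Toy contextual score: down-weight vulnerabilities in unreachable code
    and boost those in user input handling paths. Weights are illustrative."""
    score = cvss
    if not reachable:
        score *= 0.3  # unused code path: sharply reduce urgency
    if handles_user_input:
        score *= 1.3  # attacker-controlled input: raise urgency
    return min(score, 10.0)
```

Under these illustrative weights, a CVSS 9.8 issue in dead code (9.8 × 0.3 ≈ 2.9) ranks below a CVSS 6.5 issue in input handling code (6.5 × 1.3 ≈ 8.5), matching the intuition above.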

Manually triaging each vulnerability is often unrealistic for modern security teams given limited resources. Both the scale of transitive dependencies and the frequency of updates drive the challenge of managing mean time to patch (MTTP). What if an AI agent could analyze the vulnerability context, understand the specific application architecture, and prioritize vulnerabilities based on actual exploitability? Most importantly, can the agent gracefully handle the situation where it doesn’t have enough information?

    Gracefully Handling Uncertainty

For security decisions, false confidence is more dangerous than admitting a situation requires further review. CVE-2017-5638, the Apache Struts vulnerability exploited in the Equifax breach, resulted in the exposure of personal information for 148 million Americans. The stakes are high to make the right decision.

    Traditional ML models produce probability predictions that can help quantify when a model is not confident in the outcome – for example, 67% likelihood of a critical vs. non-critical vulnerability.

The wide variety of text generated by GenAI systems makes it difficult to gauge agent confidence from natural language directly. Structured output solves this problem by restricting the LLM response to a provided schema.

    A critical vulnerability analysis should include:

    1) CVE identifier that tracks the vulnerability and exploit (e.g. CVE-2021-44228)
    2) Details about the package name and version
    3) CVSS score to contextualize the severity
    4) Detailed explanation of how the CVE is exploitable in the specific application code
    5) Priority assessment identifying the urgency of patching

The following Pydantic schema expresses this required information and sets specific constraints that allow validation of the schema.

    				
    					
    class CriticalVulnerability(BaseModel):
        """A software vulnerability that requires immediate action."""
    
        cve_id: str = Field(description="CVE identifier (e.g. CVE-2021-44228)")
        package_name: str = Field(description="Affected package name")
        current_version: str = Field(description="Currently installed version")
        fixed_version: str = Field(description="Version that patches the vulnerability")
    
        severity_score: float = Field(
            ge=7.0, le=10.0, description="CVSS 3.1 base score (only 7.0+ flagged as critical)"
        )
    
        exploitability_reason: str = Field(
            description=(
                "Detailed explanation of why this CVE is exploitable in this "
                "specific application context. Must reference actual code paths, "
                "exposed APIs, or attack vectors."
            )
        )
    
        remediation_priority: int = Field(
            ge=1, le=5, description=(
                "Urgency: 1=Patch immediately, 2=This week, 3=This month, 4-5=Next cycle"
            )
        )
    
        public_exploit_available: bool = Field(
            description="Whether exploit code is publicly available"
        )
    
        business_impact: str = Field(
            description=(
                "Potential impact if exploited (e.g. 'Customer data exfiltration', 'RCE "
                "on production servers')"
            )
        )
    
    				
    			

    Each field within the CriticalVulnerability class drives the LLM to generate appropriate information in an actionable format. For example, the remediation_priority field identifies the urgency of patching by forcing the LLM to generate integers from 1 to 5 that represent:

    ● patch immediately
    ● this week
    ● this month
    ● next cycle

The exploitability_reason field provides a detailed explanation, which gives helpful context to a human reviewer or during an audit. This detailed breakdown is critical for making the analysis actionable for humans and downstream systems alike.
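Downstream automation can consume the validated integers deterministically. A minimal sketch, where the label strings are taken from the Field description above but the mapping function itself is illustrative:

```python
# Map the validated 1-5 remediation_priority integers to SLA labels
# matching the Field description ("1=Patch immediately, ... 4-5=Next cycle").
PRIORITY_SLA = {
    1: "Patch immediately",
    2: "This week",
    3: "This month",
    4: "Next cycle",
    5: "Next cycle",
}

def sla_for(priority: int) -> str:
    """Return the patching SLA label for a remediation_priority value."""
    if priority not in PRIORITY_SLA:
        raise ValueError(f"remediation_priority out of range: {priority}")
    return PRIORITY_SLA[priority]
```

Because the schema constrains the field to 1-5, the lookup can never miss for validated output; the ValueError guards only unvalidated callers.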

    With Pydantic AI, we can use the CriticalVulnerability class to restrict the output of the agent. We use a list type to allow the agent to respond with multiple vulnerabilities given the scan input.

    				
    					vuln_agent = Agent(
        MODEL_ID,
        output_type=list[CriticalVulnerability],
        system_prompt=VULN_AGENT_SYSTEM_PROMPT,
    )
    				
    			

    Providing the output type exposes the schema to the LLM. The description argument of the Field classes in the Pydantic schema instructs the LLM, simplifying the system prompt. When we call the agent with the relevant information, Pydantic AI will retry the request automatically if the LLM fails to comply with the structured output requirements. The library also uses smart approaches to increase the likelihood of the LLM successfully formatting the data, such as using tool calling APIs and decomposing lists into multiple tool calls.

    Using a single output type is similar to having a single conditional in an if-then statement. The “else” cases are not included, so the program is unable to handle them. In our example, the agent is unable to express the case when it has insufficient information. Forcing the agent to respond without sufficient information has a high probability of hallucinations, or producing an empty result that acts as a false negative.

    Union-type structured output
    Figure 2: Union-type structured output allows the agent to respond effectively when unable to assess the situation. This enables Expert Review by the security team, and a detailed Audit Trail of the entire assessment through OpenTelemetry (OTEL) observability.

    Union-type outputs allow the agent to select between multiple output schemas, based on the most appropriate schema for the response. We can introduce a UnableToAssess class to allow the agent to express this situation, using the output_type of list[CriticalVulnerability] | UnableToAssess. Although a small and simple change to make, this provides a critical outlet for the agent when dealing with insufficient information.

    				
    					vuln_agent = Agent(
        MODEL_ID,
        output_type=list[CriticalVulnerability] | UnableToAssess,
        system_prompt=VULN_AGENT_SYSTEM_PROMPT,
    )
    				
    			

    The schema for the UnableToAssess case provides additional benefits. Including a justification field requires the LLM to produce a detailed text response explaining the reason why the context was insufficient for an assessment. This rich context enables human security experts to quickly understand the situation upon escalation.

    				
    					
    class UnableToAssess(BaseModel):
        """Cannot determine exploitability with confidence; requires manual security review."""
        justification: str = Field(
            description=(
                "Detailed explanation of why automated assessment failed. "
                "Examples: 'Cannot determine if vulnerable function is called', "
                "'Insufficient information about network exposure', "
            "'Conflicting information in CVE database and package changelog'"
            )
        )
    
        flagged_cves: list[str] = Field(
            description="CVE IDs that need manual review"
        )
    
        recommended_action: str = Field(
            description=(
                "Specific next steps for the security team. "
                "Examples: 'Run dynamic analysis to test exploit', "
                "'Review network firewall rules', "
                "'Contact package maintainer for clarification'"
            )
        )
    
        uncertainty_category: str = Field(
            description=(
                "Type of uncertainty: 'insufficient_context', 'ambiguous_cve_data', "
                "'complex_dependency_chain', 'missing_exploit_details'"
            )
        )
    				
    			

    The additional fields provide detailed content that empower human reviewers or downstream systems to take appropriate action. For example, the recommended_action field gives a specific recommendation based on the information available, that can help orient a security expert to the situation.

    When the LLM does not have sufficient information, a security expert gets the detailed vulnerability analysis from UnableToAssess to consider for next steps. There are multiple ways to escalate to the human-in-the-loop, including Slack notifications, emails, and database entries that are exposed with a UI. For the purpose of this demonstration, we focus on the backend observability that can be gained using OpenTelemetry (OTEL).
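One such escalation path can be sketched with a plain review queue standing in for Slack or a ticketing system. The UnableToAssess fields mirror the schema above; the routing function and queue are illustrative.

```python
from dataclasses import dataclass

@dataclass
class UnableToAssess:
    justification: str
    flagged_cves: list[str]
    recommended_action: str

def escalate(result: object, review_queue: list[str]) -> bool:
    """Route uncertain assessments to a human review queue.

    Returns True when an escalation was recorded. In production the queue
    could be a Slack webhook, an email sender, or a ticket-system client.
    """
    if isinstance(result, UnableToAssess):
        for cve in result.flagged_cves:
            review_queue.append(f"{cve}: {result.recommended_action}")
        return True
    return False  # confident assessments flow to the patching workflow
```

The isinstance check on the union output is all that is needed to branch between the human-in-the-loop path and the automated path.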

    Observability Layer

    The union-type pattern allows for clear conditional handling of the different cases that the agent encounters. Using OTEL, we can instrument our agentic code to report on the occurrence of each case, allowing a top-level view of the agent’s actions. This helps build a clear audit trail that the security team can use to understand which vulnerabilities were identified as critical, and when the agent needed further review.

    Pydantic AI has excellent support for OTEL due to their integration with Logfire. It’s also possible to use this tracing directly with a different OTEL monitoring system, such as Honeycomb, Langfuse, or Jaeger. From this, CISOs and their team can build high-level dashboards that reflect the agent’s work. Critically, this allows CISOs to report on vulnerabilities identified and MTTP improvements, with detailed visibility on the occurrences that require escalation to their team. These metrics also fuel iterating and improving the agent itself.
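The dashboard metrics can be derived from the emitted telemetry. A hedged sketch of the aggregation step, assuming each exported span record has been reduced to a dict with an "outcome" key of either "assessed" or "uncertain" (that shape is an assumption for illustration, not an OTEL export format):

```python
from collections import Counter

def outcome_metrics(span_records: list[dict]) -> dict:
    """Aggregate per-run outcome records into dashboard-ready counts.

    Assumes each record has an 'outcome' key: 'assessed' for completed
    triage, 'uncertain' for runs escalated to the security team.
    """
    counts = Counter(r["outcome"] for r in span_records)
    total = sum(counts.values())
    return {
        "total_runs": total,
        "uncertain_runs": counts["uncertain"],
        "escalation_rate": counts["uncertain"] / total if total else 0.0,
    }
```

Tracking the escalation rate over time is one way to quantify whether prompt or schema changes are actually improving the agent.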

    The output_validator decorator provides a useful hook to instrument the agent for OTEL visibility. Based on the type of structured output, the code delegates to an appropriate handler.

    				
    					@vuln_agent.output_validator
    async def validate_and_instrument(
        ctx: RunContext[SecurityContext],
        output: list[CriticalVulnerability] | UnableToAssess,
    ) -> list[CriticalVulnerability] | UnableToAssess:
        """Validate the agent's output and send telemetry for monitoring."""
    
        if isinstance(output, UnableToAssess):
        await log_unable_to_assess(ctx, output)
    
            # Consider adding:
            # - Create a Jira ticket for manual review
            # - Send a Slack notification to the security team
            # - Update a vulnerability management dashboard
        else:
        await log_critical_vulnerabilities(ctx, output)
        return output
    				
    			

    In the case of UnableToAssess, an OTEL span is created with the structured output fields, and a warning log is generated. The spans help establish the connection to the overall tracing of the agentic behavior. Beyond logging, a number of other integrations can be added here. For example, a Slack message can be sent to the Security team to ensure they have rapid awareness of the situation.

    				
    					async def log_unable_to_assess(
        ctx: RunContext[SecurityContext],
        output: UnableToAssess
    ) -> None:
        """Log UnableToAssess outcome with OTEL span and log."""
        with logfire.span(
            "vulnerability_assessment_uncertain",
            level="warning",
            justification=output.justification,
            flagged_cves=output.flagged_cves,
            uncertainty_category=output.uncertainty_category,
            application_type=ctx.deps.application_type,
            internet_facing=ctx.deps.internet_facing,
            runtime_environment=ctx.deps.runtime_environment,
        ) as span:
            span.set_attribute("recommended_action", output.recommended_action)
            span.set_attribute("uncertainty_event", True)
            span.set_attribute("requires_manual_review", True)
            span.set_attribute("cve_count", len(output.flagged_cves))
    
        logfire.warn(
            "Agent unable to assess vulnerability risk -- manual review required",
            cve_count=len(output.flagged_cves),
            justification_summary=output.justification[:200],
            uncertainty_type=output.uncertainty_category,
        )
    
    				
    			

When the AI agent has sufficient information and outputs list[CriticalVulnerability], the spans and logs focus on providing context on the appropriate patching actions that are required.

    				
    					async def log_critical_vulnerabilities(
        ctx: RunContext[SecurityContext],
    output: list[CriticalVulnerability]
    ) -> None:
        """Log successful analysis outcome with OTEL span and log."""
        critical_count = len(output)
        high_priority_count = sum(1 for v in output if v.remediation_priority <= 2)
    
        with logfire.span(
            "vulnerability_assessment_complete",
            level="info" if critical_count == 0 else "warning",
            critical_vuln_count=critical_count,
            high_priority_count=high_priority_count,
            application_type=ctx.deps.application_type,
        ) as span:
            span.set_attribute("assessment_successful", True)
    
            # Log each critical vulnerability for tracking
            for vuln in output:
                logfire.info(
                    f"Critical vulnerability identified: {vuln.cve_id}",
                    cve_id=vuln.cve_id,
                    package=vuln.package_name,
                    cvss_score=vuln.severity_score,
                    priority=vuln.remediation_priority,
                    public_exploit=vuln.public_exploit_available,
                    exploitability=vuln.exploitability_reason[:200],
                    business_impact=vuln.business_impact,
                )
    
                # Alert on immediate priority vulnerabilities
                if vuln.remediation_priority == 1:
                    logfire.error(
                        "URGENT: Critical vulnerability requires immediate patching",
                        cve_id=vuln.cve_id,
                        package=vuln.package_name,
                        business_impact=vuln.business_impact,
                    )
    				
    			

This detailed logging provides a comprehensive audit trail of the actions taken by the agent, and clearly separates the cases where the agent is uncertain how to proceed from those where it can provide a confident recommendation. This example illustrates how easily the union-type pattern extends to taking deterministic action through downstream systems.

    Conclusion

    By embracing uncertain outcomes using union-type structured output, we allow AI agents to handle complex situations without resorting to hallucination. We’ve shown that simply adding an UnableToAssess path allows agents to fail safely when assessing software vulnerabilities, transforming a potential oversight into a specific request for expert review.

By coupling this pattern with OTEL observability, security teams gain a transparent audit trail. They can distinguish clearly between confident automated triage and cases requiring human intervention. This approach turns agent uncertainty from a liability into a documented, actionable workflow, ensuring that high-stakes vulnerability management remains both scalable and secure. For CISOs, we see this level of auditability as a core security requirement for deploying AI Agents. For AI/ML Engineers, this telemetry is equally important, ensuring the systems can be improved continuously based on real system behavior.

    The post Embracing Uncertainty with AI Agents: Vulnerability Assessment using Pydantic AI appeared first on Realm.Security.

    ]]>
    Realm.Security Caps 2025 with $2M Strategic Investment from Presidio Ventures https://realm.security/realm-security-caps-2025-with-2m-strategic-investment-from-presidio-ventures/ Thu, 11 Dec 2025 14:06:25 +0000 https://realm.security/?p=9090 Realm.Security, the company pioneering the industry’s first AI-native Security Data Pipeline Platform (SDPP), announced a $2M strategic investment from Presidio Ventures, the corporate venture arm of Japanese-headquartered Sumitomo Corporation.

    The post Realm.Security Caps 2025 with $2M Strategic Investment from Presidio Ventures appeared first on Realm.Security.

    ]]>

    Emerging leader in security data pipelines has raised $22M in 15 months, expanded headcount 250%, and helped CISOs slash six figures in SIEM ingestion costs

    Boston, MA – December 11, 2025 – Realm.Security, the company pioneering the industry’s first AI-native Security Data Pipeline Platform (SDPP), today announced a $2M strategic investment from Presidio Ventures, the corporate venture arm of Japanese-headquartered Sumitomo Corporation. A global integrated trading company with a history spanning 400 years, Sumitomo operates numerous subsidiaries around the world, including SCSK, a leading global system integrator, providing managed IT and cybersecurity services to more than 8,000 enterprise customers. Realm will use the investment and access to Sumitomo’s network to accelerate its Asia-Pacific expansion through channel partners and further strengthen its executive team.

    “We’re excited to bring on Presidio Ventures as a strategic investor as we meet rising global demand for our platform from security teams who are drowning in data noise and facing spiraling storage and compute costs,” said Pete Martin, co-founder and CEO of Realm.Security. “This investment caps an impressive first full calendar year for Realm, and we believe Sumitomo’s deep connections will help us enter the Asia-Pacific market and scale quickly in 2026. The region is now the third-largest cybersecurity market globally, and Japan is a particularly compelling entry point for our channel-led strategy, given that many enterprises there outsource their Security Operation Centers.”

    “Realm is using AI to give every enterprise security team the ability to manage, filter, and normalize security data in ways that, until recently, were only possible for Fortune 1000 companies with dedicated cybersecurity data engineering teams,” said Ross Leav of Presidio Ventures. “We’re excited to join their mission to democratize this capability as the cybersecurity industry faces a data crisis. At the same time, the industry also faces a human resource crisis. Especially in Japan where talented cyber professionals are becoming an increasingly scarce resource, enterprises need to find ways to do more with less. This is a perfect opportunity to employ AI, both to meet the challenge of data overload but also to enable cyber professionals to focus their attention on the most critical threats. We believe Realm can be an essential tool for modernizing our own security operations services while at the same time improving outcomes for the many customers who rely on us.”

The strategic investment follows a $15M Series A round for Realm in October, led by Jump Capital and including participation from Glasswing Ventures and Accomplice. Other recent milestones include Realm being recognized as an “Emerging Leader” in the SDPP category in analyst firm SACR’s “The Rise of Security Data Pipeline Platforms as a Control Plane for the SOC” report, as well as receiving a BostInno 2025 Fire Award as one of the 50 companies having a major impact on the local startup ecosystem. Most notably, Vensure Employer Solutions, a 10,000-person benefits and payroll provider, recently reported using Realm’s platform to cut firewall log volumes by 83%, saving $250,000 annually.

    To fuel its momentum and global expansion, Realm.Security is continuing to scale its team, increasing headcount by 250% in 2025. Recent leadership appointments include Holly Cappello as Chief Revenue Officer and Colin Jermain as Vice President of Data Science. Cappello brings more than a decade of cybersecurity sales experience, most recently serving as SVP of Global Sales at Cado Security, which was acquired by Darktrace. She previously held senior sales roles at Menlo Security and Carbon Black. Jermain joins from SecurityScorecard, where he was Senior Director of Data Science, and previously served in multiple senior data science roles at Vectra AI. These appointments underscore Realm’s continued investment in revenue, data, and AI leadership as it enters its next phase of growth.

About Realm.Security

Realm.Security helps security teams cut costs and improve outcomes by transforming how they manage and route security data. Headquartered in Boston, the company was founded by industry veterans with decades of experience defending against evolving threats. Realm’s AI-native Security Data Pipeline Platform is radically simple to deploy and operate, embedding artificial intelligence across the platform to deliver faster, smarter outcomes without the manual overhead.

About Presidio Ventures

Presidio Ventures is the corporate venture capital arm of Sumitomo Corporation, one of the world's leading integrated trading companies. With offices in Silicon Valley, Boston, New York, London and Los Angeles, it has invested in more than 200 companies, providing both financial backing and business development expertise to help startups scale globally. www.presidio-ventures.com

    The post Realm.Security Caps 2025 with $2M Strategic Investment from Presidio Ventures appeared first on Realm.Security.

    ]]>
    We Raised $15M to Build the Future of Security Data https://realm.security/we-raised-15m-to-build-the-future-of-security-data/ Wed, 08 Oct 2025 12:44:29 +0000 https://realm.security/?p=8748 We’re excited to share that Realm Security has raised a $15M Series A, just 12 months after our $5M seed round. We wouldn’t be here without our customers, our team, and our partners. Thank you for believing in what we’re building and for pushing us to make security data smarter, faster, and more useful every day.

    The post We Raised $15M to Build the Future of Security Data appeared first on Realm.Security.

    ]]>
    Realm.Security - Team Photo Sept. 2025

    Security teams deserve better. We’re building it.

    We’re excited to share that Realm Security has raised a $15M Series A, just 12 months after our $5M seed round. We wouldn’t be here without our customers, our team, and our partners. Thank you for believing in what we’re building and for pushing us to make security data smarter, faster, and more useful every day.

    What this means for you

    This funding helps us move faster on the things that matter most to security teams:

    More product innovation. We’re rolling out new modules quickly, each designed to make your data pipeline smarter, cleaner, and more actionable. You’ll see Realm continue to evolve with AI-powered features that cut noise and surface the signals that matter most.

    More integrations. Security stacks are big and messy. We’re making Realm plug in seamlessly with the tools you already use, so it feels natural in your workflow instead of like another system to manage.

    More efficiency for the SOC. Clean, structured data is the foundation for AI-driven security operations. Realm is becoming that foundation, so teams can cut costs, respond faster, and stay focused on defending, not plumbing.

    For current customers, this means faster time-to-value and more automation without hiring more people or paying for professional services. For future customers, it means a platform that grows with you — your tools, your threats, your environment.

    The industry is at a turning point. Teams that fix their data strategy now will be ready for an AI-driven future. Teams that don’t will keep paying more and getting less.

    Where it all began

    Two years ago, we kept hearing the same thing from CISOs: budgets were out of control, not because of people or tools, but because of data.

    Raw logs were flooding SIEMs, creating more noise than clarity. Teams were left with impossible choices — overspend or miss critical threats. Legacy pipelines weren’t helping. They moved logs from one place to another but didn’t add any intelligence.

    We started Realm because security data should be an asset, not a liability. It should help teams focus, not bury them. And it should never force a tradeoff between protection and cost.

    From the start, we’ve been security-first. My co-founders, Jeff and Sanket, brought deep experience in security operations and scalable architectures. I brought a background in growing companies through seven exits, including two IPOs. Together we set out to build the first independent AI-native telemetry pipeline built specifically for security teams.

    How AI is the difference

    Traditional pipelines treat all data the same. Realm doesn’t. We embed AI and machine learning throughout the pipeline, analyzing and structuring telemetry in real time so SOC teams only see what matters in their environment.

    Being AI-native means Realm replaces the endless manual work of writing rules and reconfiguring pipelines. With machine learning and large language models that understand security context, the platform automates filtering, configuration, routing decisions, and more. As tools, data, and threats evolve, Realm evolves with them. This takes professional services off the table and lets SOC teams stay focused where it counts — on defense.

    The results speak for themselves:

    Fast deployment. Customers are live in about seven days.

    Massive savings. Vensure Employer Solutions reduced firewall log volumes by 83 percent, saving $250,000 a year.

    Always current. Realm adapts automatically as tools, sources, and threats change, so you don’t have to babysit it.

    As Dwayne Smith, CISO at Vensure, said: “Realm saves us a significant amount of operational budget, which can be repurposed for other strategic priorities. It is a game-changer for budget-constrained security teams.”

    In a market dominated by consolidation, Realm is proud to be the only independent AI-native security data pipeline. Built for security. Built to scale. Built to last.

    Turning feedback into features

    Our customers are the reason Realm works the way it does. They’re why we build fast and why our roadmap keeps expanding. Over the past year, we’ve shipped new modules and integrations at record speed, giving teams more ways to reduce noise and get value from their data.

    Every feature starts with a real-world problem and ends with something that works in production.

    We’re also grateful to our advisors, who’ve helped us avoid missteps and move faster with their guidance on both technical architecture and go-to-market strategy.

    Backed by believers who know their stuff

    We’re lucky to have investors who understand both the pain security teams are facing and the future we’re working toward. This round was led by Jump Capital, with Glasswing Ventures and Accomplice joining in. They bring experience, perspective, and support that will help us keep moving quickly.

    Be the game-changer in your security team

    Clean, structured data is the foundation. Without it, the promise of AI in security will never scale. That’s why customers adopting Realm today are setting themselves up for what comes next.

    If you’re a security leader who wants to see what an AI-native pipeline can do, try Realm free for 30 days. Up to 500 GB a day, no strings attached. You’ll be up and running in days. Grab a demo here.

    If you’re an engineer, go-to-market leader, or customer success pro who wants to help build the future of security data, we’re hiring. We want to hear from you.

    The post We Raised $15M to Build the Future of Security Data appeared first on Realm.Security.

    ]]>
    Realm.Security Redefines Security Data Pipelines with AI, Raises $15M to Accelerate Next-Gen SOC Operations https://realm.security/realm-security-redefines-security-data-pipelines-with-ai-raises-15m-to-accelerate-next-gen-soc-operations-2/ Wed, 08 Oct 2025 12:18:43 +0000 https://realm.security/?p=8734 Realm.Security, the company pioneering an AI-native Security Data Pipeline Platform (SDPP), today announced a $15 million Series A funding round led by Jump Capital, with participation from Glasswing Ventures and Accomplice.

    The post Realm.Security Redefines Security Data Pipelines with AI, Raises $15M to Accelerate Next-Gen SOC Operations appeared first on Realm.Security.

    ]]>

    As enterprises battle unsustainable SIEM costs and overwhelming log volumes, Realm’s AI-native pipeline platform sets a new deployment benchmark: deploy in days, cut data volumes by up to 83%, and save millions.

    Boston, MA – October 8, 2025 – Realm.Security, the company pioneering an AI-native Security Data Pipeline Platform (SDPP), today announced a $15 million Series A funding round led by Jump Capital, with participation from Glasswing Ventures and Accomplice. The financing will accelerate product development and market expansion as enterprises face a mounting crisis: security operations teams drowning in data noise and spiraling security information and event management (SIEM) costs.

    “Security data has become one of the most expensive and complex problems in enterprise IT,” said Pete Martin, CEO of Realm.Security. “Realm exists to solve that problem at the root. By building the industry’s first AI-native Security Data Pipeline, we’re giving CISOs and SOCs clean, structured data they can trust that is fast, efficient, and radically more cost-effective.”

    The Data Crisis in Security

    Security Operations Centers (SOCs) process billions of daily events. Legacy pipelines simply shuttle raw data logs between tools, leaving analysts buried in irrelevant telemetry and budgets consumed by SIEM ingestion costs. According to the 2025 SANS SOC Survey, 42% of SOCs dump all incoming data into a SIEM, often without a retrieval or management plan. This practice drives up costs and leaves teams without a clear path to manage or analyze the data. Chief Information Security Officers (CISOs) face an impossible tradeoff: pay more, or see their teams overwhelmed.

    Realm.Security changes this dynamic. The platform embeds artificial intelligence throughout the pipeline, applying real-time analysis and filtering so SOCs only ingest the data that matters. The result is faster investigation, lower costs, and more resilient security operations.

    Measurable Results in Days

    Enterprises using Realm.Security are seeing immediate, quantifiable impact. Deployment times across the enterprise average just seven days, compared to months with legacy solutions.

    Vensure Employer Solutions, a 10,000-person benefits and payroll provider, cut firewall log volumes by 83%, saving $250,000 annually.

    “Realm’s Data Filtering module (Realm Focus) allows us to remove data that would never be needed for detection or investigation,” said Dwayne Smith, Senior Vice President of Information Security and Global Chief Information Security Officer at Vensure Employer Solutions. “Realm saves us a significant amount of operational budget, which can be repurposed for other strategic priorities. It’s a game-changer for budget-constrained security teams.”

    Realm automatically adapts to changing security tools and data sources. Being AI-native, Realm replaces manual rule-writing and constant reconfiguration by security teams: leveraging machine learning and large language models that understand security context, it automates filtering rule creation, configuration, routing decisions, and much more. As tools, data, and threats evolve, Realm evolves with them, eliminating the costly professional services that traditional pipeline implementations require and keeping SOCs focused on defense rather than data plumbing.

    “Realm.Security is helping us run a more efficient SOC,” added Aaron Weismann, Chief Information Security Officer of Main Line Health. “It provides trustworthy data we can depend on, which lowers the overhead of managing data and creates more space for our team to focus on advancing our security posture.”

    Investor Confidence in a New Category

    “Realm’s AI-native approach sets a new benchmark: fast deployment, measurable savings, and a roadmap toward enabling the next generation of AI-driven SOC operations,” said Saaya Pal, Partner at Jump Capital. “We see Realm becoming the trusted foundation for how enterprises manage their security data.”

    Realm.Security, Inc.’s Security Data Pipeline Platform (SDPP) is available today. To learn more about Realm.Security, request a demo at realm.security/request-a-demo.

    About Realm.Security
    Realm.Security helps security teams cut costs and improve outcomes by transforming how they manage and route security data. Headquartered in Boston, the company was founded by industry veterans with decades of experience defending against evolving threats.

    Realm’s AI-native Security Data Pipeline Platform is radically simple to deploy and operate, embedding artificial intelligence across the platform to deliver faster, smarter outcomes without the manual overhead.

    About Jump Capital
    Jump Capital is a founder-focused, early-stage venture firm investing in fintech, application software, and infrastructure software. Investing out of its $350 million seventh fund, Jump backs visionary founders solving complex, high-impact problems across industries.

    Founded and led by seasoned operators, Jump Capital brings deep experience, curiosity, and a hands-on approach to every partnership. The firm is committed to being the investor that founders call when it matters most, combining capital with strategic guidance, operational expertise, and long-term support. Learn more at www.jumpcap.com.

    About Glasswing Ventures
    Glasswing Ventures is a first-capital-in venture capital firm dedicated to investing in startups applying AI and frontier technology to enterprise and cybersecurity markets. The Boston-based firm was founded by visionary partners with decades of experience in these markets, a disciplined investment approach, and a strong track record of industry-leading returns.

    Glasswing leverages its deep domain expertise and world-leading advisory councils to invest in exceptional founders who transform markets and revolutionize industries. Visit Glasswing Ventures for more information.

    About Accomplice
    Based in Boston, MA, Accomplice has been part of the origin story of AngelList, Carbon Black, CoinList, DraftKings, FalconX, Integral Ad Science, Hopper, HQO, Near, Orchard, Patreon, PillPack, Recorded Future, SecurityScorecard, Veracode, WorkHuman, WHOOP, ZOE, Zoopla, and more. Accomplice is behind the Spearhead operator-angel movement and Accomplice Blockchain, and is also an anchor LP behind numerous solo GP funds, including Archetype, Vinyl, Vibe, and Wolfhead.

    The post Realm.Security Redefines Security Data Pipelines with AI, Raises $15M to Accelerate Next-Gen SOC Operations appeared first on Realm.Security.

    ]]>
    How SIEMs Became Accidental Archives https://realm.security/how-siems-became-accidental-archives/ Wed, 01 Oct 2025 14:30:43 +0000 https://realm.security/?p=8642 SIEMs have become expensive archives storing years of logs at premium prices, with only 35% of data delivering real threat detection value. Learn how separating real-time security from long-term retention can cut costs while improving investigation speed and compliance.

    The post How SIEMs Became Accidental Archives appeared first on Realm.Security.

    ]]>

    TL;DR: SIEMs have become expensive, inefficient archives due to regulatory requirements, long breach detection times (avg. 241 days), and infrastructure sprawl. The problem: organizations pay premium SIEM pricing to store massive volumes of low-value logs (only 35% deliver real threat detection value), while exponential log growth creates budget overruns. Self-managed archives on cloud storage may seem cheaper, but they require painful custom integrations, complex resupply processes (2-4 weeks), and are often abandoned.

    The solution: Route only high-value, actionable data to your SIEM for real-time detection. Archive everything else in a purpose-built system like Realm's Data Haven that handles compliance automatically, normalizes data on ingestion, and enables seamless resupply through guided retrieval (no custom queries or regex). This separates real-time security operations from long-term retention, cutting costs while maintaining investigative capability without the 2-4 week wait times.

    The transformation of SIEMs from security tools to de facto archives didn't happen overnight. Several converging forces pushed organizations down this costly path.

    Regulatory Complexity Drives Retention: The expansion of data protection regulations fundamentally changed the game. HIPAA demands six years of retention for healthcare organizations, Sarbanes-Oxley (SOX) requires seven years for financial records, and PCI DSS mandates at least one year with three months readily available. Faced with this regulatory maze, organizations defaulted to storing everything in their existing SIEM infrastructure rather than architecting purpose-built retention strategies.
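    The mandates above can be made concrete as a small retention policy. A minimal sketch, assuming the figures cited in this post (verify against the actual regulation text); the dictionary and function names are illustrative, not any real product's API:

    ```python
    # Retention mandates named above, encoded so a pipeline can pick the
    # strictest window that applies to a given data source. Day counts are
    # the approximations cited in this post, not legal advice.
    RETENTION_DAYS = {
        "HIPAA": 6 * 365,    # six years for healthcare organizations
        "SOX": 7 * 365,      # seven years for financial records
        "PCI_DSS": 365,      # at least one year (three months readily available)
    }

    def required_retention(regulations):
        """Return the longest retention window (in days) among the
        regulations that apply to this data source."""
        return max(RETENTION_DAYS[r] for r in regulations)

    # A payroll provider subject to both SOX and PCI DSS must keep seven years:
    print(required_retention(["PCI_DSS", "SOX"]))  # prints 2555
    ```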

    The 200-Day Breach Reality: Modern cyber threats, particularly Advanced Persistent Threats (APTs), can lurk undetected in networks for extended periods. According to the 2025 Cost of a Data Breach Report, enterprise organizations take an average of 241 days to identify and contain a data breach. This sobering reality prompted security teams to retain logs for longer periods. Against that 241-day average, a six-month retention window would capture only around half of breaches before their logs age out.
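    A back-of-the-envelope check of that "around half" figure. This assumes time-to-detect is roughly exponentially distributed with a mean of 241 days — an assumption on our part, since the report gives only the mean — in which case the share of breaches identified within a six-month (~182-day) window is:

    ```python
    import math

    # Fraction of breaches detected within the retention window, assuming an
    # exponential time-to-detect distribution (a modeling assumption).
    mean_days = 241     # average time to identify and contain, per the report
    window_days = 182   # a six-month retention window
    fraction_detected = 1 - math.exp(-window_days / mean_days)
    print(f"{fraction_detected:.0%}")  # prints "53%"
    ```

    In other words, roughly half of breaches would still be undetected when their logs expire, which is why teams default to retaining more.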

    SOC Consolidation and Infrastructure Sprawl: As organizations consolidated their security operations centers, SIEMs naturally became the repository for all security-related data. Simultaneously, the proliferation of cloud services, IoT devices, microservices, and remote work environments dramatically increased both the volume and variety of security data. This made centralized logging through a single SIEM platform attractive, as it eliminated the need for security teams to build and manage multiple custom data pipelines for each source. Without a dedicated data platform, creating and managing a separate archive can seem daunting and cumbersome, making the singular SIEM solution appear to be the "easy button."

    The Hidden Costs of Using a SIEM as an Archive

    The financial burden of using SIEMs for long-term retention represents one of the most underestimated challenges in modern cybersecurity economics.

    Sky-High Prices for GBs and TBs: SIEM platforms have been notoriously expensive for decades, primarily because their pricing models tie expenses directly to data volume. According to the SANS 2025 SOC Survey, 42% of SOCs dump all incoming data into a SIEM, often without a retrieval or management plan. For enterprises ingesting terabytes daily, costs can reach millions annually. Driving costs higher still, many SIEM vendors charge a premium for longer retention periods on top of base volume pricing, making long-term storage "financially unsustainable." The practice of storing all logs for years on these platforms exacerbates this financial burden.

    The Exponential Growth Problem: Log volumes continue to skyrocket year-over-year, continuously pushing organizations into higher pricing tiers and license overages. Survey data from Wasabi indicates that 62% of organizations exceeded their budgeted cloud storage spending in 2024, compared to 53% in 2023. This exponential growth compounds the pricing problem, creating budget overruns that force uncomfortable trade-offs between comprehensive logging and fiscal responsibility. In some cases, this leads to risky decisions like intentionally excluding entire data sources, such as firewall logs, to manage costs. This creates significant infrastructure gaps, hindering the SIEM's ability to make effective correlations and leaving organizations with security blind spots.

    The Resupply Nightmare: While many organizations recognize the high cost of using a SIEM as a long-term archive, the alternative of self-managing a separate data storage solution is often just as painful. Some have attempted to save money by moving older logs to cheaper storage tiers on platforms like AWS, GCP, or Azure. However, this creates a new set of challenges that can make the process more trouble than it's worth.

    Without a purpose-built solution, interoperability between a self-managed archive and the SIEM is cumbersome. Security teams must build and maintain their own connectors and write custom queries to pull data back in, all with little to no guidance. This process is complex and time-consuming, and it's difficult to pinpoint the exact data needed for an investigation, leading to situations where too much or too little data is resupplied. The resupply process itself can be expensive and take weeks.

    For some organizations, the wait time for archived data to become queryable can be anywhere from 2 to 4 weeks. This reality, coupled with the hidden costs of data retrieval and complex billing structures, often leads security teams to simply skip the self-managed archive and revert to using the SIEM as their de facto data warehouse, despite the financial burden.

    The Value Paradox: Here's the kicker: a 2025 Red Canary Survey finds that only 35% of data stored in legacy SIEMs delivers tangible value for threat detection. Organizations are paying premium prices to store massive volumes of noisy, low-value data that generates false positives and wastes analyst time. Meanwhile, the truly actionable security data gets buried in this haystack of historical logs.

    Realm's Perspective: A Smarter Approach to Security Data

    Realm believes that SOC teams should focus on what truly matters: actionable data in the SIEM for real-time threat detection. Everything else, the noisy, low-value data needed for compliance and long-term investigation, should be routed to a separate, cost-effective, and structured archive.

    Realm’s Data Haven module offers a fundamentally different approach to this problem, transforming the archive from a cold, inaccessible repository into an active, analyst-friendly resource.

    Realm enables you to:

    Resupply Without Worry: With Realm, you can confidently filter and enrich data before it ever hits your SIEM. This allows you to route only the most valuable, security-relevant data for real-time detection, while the rest is archived. You don't have to worry about data gaps or incomplete investigations, as Realm is ready to resupply layers of context upon request from the archive.

    Seamless Data Resupply: When a deeper investigation requires bringing archived data back into the SIEM environment, Realm handles the heavy lifting. You don't have to learn a new query language or write complex regular expressions. Instead, Realm guides your retrieval by key elements of the data, such as IOCs, time ranges, and source products, without you having to write a single line of code or complex query. The data is automatically normalized across different tool formats, ensuring that retrieved logs integrate seamlessly with existing security workflows. This eliminates the compatibility and complexity issues that plague traditional archive solutions, making resupply a fast and straightforward process.

    Normalize Data on Ingestion: All tools provide data in different formats, but Realm normalizes and structures archived logs from different sources. This ensures that when the data is retrieved, it's immediately usable for forensic analysis, eliminating the need for analysts to manually parse and clean raw logs.
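    The filter-route-normalize pattern those three capabilities describe can be sketched in a few lines. Everything here is a hypothetical illustration under our own assumptions — the field names, the severity-based routing heuristic, and the destinations are not Realm's actual interface:

    ```python
    # Minimal sketch of route-and-normalize on ingestion: map each source's
    # log into one common shape, then send only high-value events to the
    # SIEM and everything else to the archive. All names are hypothetical.
    def normalize(raw: dict, source: str) -> dict:
        """Normalize a source-specific log into a common schema on ingestion,
        so archived data is immediately usable when retrieved."""
        return {
            "source": source,
            "timestamp": raw.get("ts") or raw.get("eventTime"),
            "host": raw.get("host") or raw.get("computer_name"),
            "severity": (raw.get("severity") or "info").lower(),
            "raw": raw,  # keep the original for forensic completeness
        }

    def route(event: dict) -> str:
        """Route only actionable telemetry to the SIEM; archive the rest."""
        return "siem" if event["severity"] in ("high", "critical") else "archive"

    event = normalize({"ts": "2025-10-01T14:30:43Z", "host": "fw-01",
                       "severity": "HIGH"}, source="firewall")
    print(route(event))  # prints "siem"
    ```

    The design point is that normalization happens once, on the way in, so both destinations hold the same schema and resupplied logs need no manual parsing later.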

    Let Your SIEM Do What It Does Best

    Your SIEM's primary purpose is to deliver real-time security. Its value is not in storing every log for years but in providing instant, actionable insights. By separating real-time data from long-term archives, you can optimize both functions without compromise.

    This approach also fundamentally changes the painful resupply process. With Realm, an analyst no longer has to submit a ticket and wait "2-4 weeks before the data can be queried." Instead of a costly, time-consuming process, resupply becomes a seamless operation: the analyst supplies a feed, a timeframe, and a machine name, and Realm resupplies the SIEM, pulling only the small subset of the dataset the investigation needs from the archive. That data is already structured, normalized, and actionable, making it immediately ready for queries.
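    That feed-plus-timeframe-plus-machine request can be pictured as a tiny selection predicate over the normalized archive. A hedged sketch only — the request shape and field names are our assumptions, not Realm's real interface:

    ```python
    # Hypothetical guided-resupply request: the analyst names a feed, a
    # timeframe, and a machine; the pipeline selects just the matching
    # archived events to send back into the SIEM.
    from dataclasses import dataclass

    @dataclass
    class ResupplyRequest:
        feed: str    # source product, e.g. "firewall"
        start: str   # ISO-8601 timeframe bounds (lexicographic comparison
        end: str     # works because the format is fixed-width)
        host: str    # machine of interest

    def matches(req: ResupplyRequest, event: dict) -> bool:
        """True only for archived events the investigation needs, so a small
        subset — not the whole archive — re-enters the SIEM."""
        return (event["source"] == req.feed
                and req.start <= event["timestamp"] <= req.end
                and event["host"] == req.host)
    ```

    Because the archive was normalized on ingestion, this predicate works the same way regardless of which tool originally produced the log.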

    Realm's approach cuts costs, reduces the noise in your SIEM, and makes compliance effortless, all while empowering your security team to investigate faster and smarter. To learn more, schedule a demo of Realm.Security.

    The post How SIEMs Became Accidental Archives appeared first on Realm.Security.

    ]]>