OpsMill https://opsmill.com/ Infrastructure Data Management Platform

An Introduction to Infrahub Sync (With FAQs) https://opsmill.com/blog/infrahub-sync-faqs/ Tue, 10 Mar 2026

Infrahub Sync unifies infrastructure data across the org to give you one trusted source of data to power your automation and AI.

Your network devices are tracked in NetBox, while circuit information sits in another system. ServiceNow has your assets. IP Fabric knows what's actually running. Somewhere in there, someone has a spreadsheet that's supposed to tie it all together.

Sound familiar?

If it does, it's because that's how most infrastructure teams work: with data scattered across a bunch of different platforms, tools, and data stores. Each system holds part of the story but none of them talk to each other. So when you need to configure a device, generate a report, or troubleshoot an outage, you're stuck manually aggregating data from multiple sources.

Data fragmentation bogs down every project with time-consuming data detective work. Even worse, data silos also break automation, create security gaps, and leave you constantly questioning which system actually has the right information.

Infrahub Sync tackles this problem head-on. It's an automated data synchronization tool that unifies infrastructure data across the organization, without forcing you to rip and replace the tools your teams already rely on.

graphic representation of Infrahub Sync

Infrahub Sync in a nutshell

Infrahub Sync is a Python CLI tool that automatically synchronizes data between infrastructure platforms like NetBox, Nautobot, IP Fabric, Peering Manager, and Infrahub.

Instead of writing custom scripts or maintaining manual update processes, you define your data flows in a simple YAML configuration file. Infrahub Sync handles the rest, using intelligent diff calculation to move only what's changed between systems.

The tool solves a specific problem: keeping infrastructure data consistent across the multiple systems you need to run your network. It doesn't try to replace your existing tools. It makes them work together.

Built on the open-source DiffSync library, Infrahub Sync provides idempotent synchronization. That means you can run it repeatedly without worrying about duplicate data or unintended changes.
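The diff-then-apply pattern behind that guarantee can be sketched in a few lines of plain Python. This is a deliberately simplified illustration of the idea, not DiffSync's actual API:

```python
def diff(source, destination):
    """Keys whose values must change for destination to match source."""
    return {k: v for k, v in source.items() if destination.get(k) != v}

def sync(source, destination):
    changes = diff(source, destination)
    destination.update(changes)  # apply only what differs
    return changes

src = {"sw1": "10.0.0.1", "sw2": "10.0.0.2"}
dst = {"sw1": "10.0.0.1"}
first = sync(src, dst)   # moves only sw2, which was missing
second = sync(src, dst)  # empty: re-running changes nothing
```

Because the second run finds nothing to move, repeating the sync is safe. That is the essence of idempotency.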

It tracks data lineage so you always know where information originated. And it integrates with Infrahub's branching model, letting you test synchronizations before applying them to production.

Why data silos break infrastructure automation

When data lives in disconnected systems, three things suffer: speed, reliability, and trust.

As an example, when new devices are deployed, the network ops team needs those devices added to their systems for monitoring and logging. But if the handover process between teams relies on manual updates, devices get missed. Missed devices aren't monitored. Security scans don't run and access policies aren't applied.

Configuration generation is another example where scattered data can hurt. You're pulling device models from one system, IP addressing from another, and service information from a third. If any of that data is stale or missing, your automation fails. Or worse, it succeeds with the wrong configuration.

Business intelligence becomes nearly impossible with siloed data. You're often stuck aggregating data manually when you need to answer basic questions about circuit costs, device utilization, or service deployments. The people who need that information most—finance, leadership, security teams—can't get it without help from someone who knows which system(s) to check.

The traditional workarounds don't scale. You can document sync processes in Word files or runbooks that engineers are supposed to follow. But people are people: They skip steps, they forget steps, or they ignore the docs completely. And since no one tracks which system should be authoritative for each piece of data, even if the syncs are done, the conflicts pile up.

Custom scripts fill some gaps but they can be brittle. When a vendor changes its API or your schema evolves, those scripts tend to break. Now you're maintaining integration code instead of building automation.

How Infrahub Sync works

Infrahub Sync replaces manual processes and custom scripts with a configuration-driven approach.

You define three things in a YAML file: your source system, your destination system, and how fields map between them. The tool generates the underlying Python adapters automatically, so you don't need to write integration code.

The workflow breaks into three commands:

  • Generate creates DiffSync adapters and models based on your configuration. You run this whenever you update your config file.
  • Diff compares the source and destination, showing you exactly what will change before you apply anything. This preview step is critical when you're testing a new sync or troubleshooting data inconsistencies.
  • Sync applies the changes, moving only what's different between the two systems. Because it's idempotent, running the same sync twice won't create duplicates or cause drift.

Field mappings can be one-to-one (e.g., device name to device name) or transformed using Jinja2 expressions. This ability to transform data during a sync solves the headache of formats that don't align perfectly between systems. For example, you might need to lowercase hostnames, combine multiple fields into one, or convert a float to an integer so the data lines up.
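As a rough illustration of such transforms, with plain Python standing in for the Jinja2 expressions and field names invented for the example:

```python
def transform(record):
    """Reshape a source record into the destination's expected fields."""
    return {
        "name": record["hostname"].lower(),                # normalize case
        "location": f"{record['site']}-{record['rack']}",  # combine two fields
        "speed": int(record["speed_mbps"]),                # float to integer
    }

result = transform(
    {"hostname": "CORE-SW01", "site": "NYC1", "rack": "R12", "speed_mbps": 10000.0}
)
```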

Using filters, you control what data gets synchronized. You can, for example, sync only network devices (excluding PDUs and patch panels), only devices in specific regions, or only objects that meet particular criteria. This keeps your destination system clean and focused on the data you actually need.
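A filter is conceptually just a predicate applied to each source object. This hypothetical sketch (invented roles and regions) keeps only switches and routers in one region:

```python
def should_sync(obj, roles=("router", "switch"), regions=("emea",)):
    """Keep only objects whose role and region both match the filter."""
    return obj["role"] in roles and obj["region"] in regions

inventory = [
    {"name": "sw1", "role": "switch", "region": "emea"},
    {"name": "pdu1", "role": "pdu", "region": "emea"},    # excluded: wrong role
    {"name": "sw2", "role": "switch", "region": "apac"},  # excluded: wrong region
]
synced = [obj for obj in inventory if should_sync(obj)]
```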

Relationships between objects are preserved through the sync process. When you synchronize devices, their connections to sites, racks, device types, and tags come along. Infrahub Sync respects the order you define, ensuring parent objects exist before children that reference them.
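Conceptually, ordered syncing just means processing parent kinds before the child kinds that reference them. A simplified sketch, with invented object kinds:

```python
order = ["site", "rack", "device"]  # parents first
objects = [
    {"kind": "device", "name": "sw1"},
    {"kind": "site", "name": "nyc1"},
    {"kind": "rack", "name": "r12"},
]

created = []
for kind in order:
    for obj in objects:
        if obj["kind"] == kind:
            created.append(obj["name"])  # a parent always exists before its children
```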

Data lineage (aka metadata) captures where each piece of information originated. When you're looking at a device in Infrahub that was synced from NetBox, you can see that source explicitly. Metadata is helpful when troubleshooting discrepancies or determining which system should be authoritative.

Data syncs through Infrahub Sync are branch-aware. That means you can review and test syncs in Infrahub branches before moving any data to production. This adds a safety layer that manual syncs and direct API updates can't provide.

Supported platforms and adapters

Infrahub Sync currently supports synchronization with NetBox, Nautobot, IP Fabric, Peering Manager, LibreNMS, Observium, Slurp'it, Prometheus, Cisco ACI, and generic REST APIs. You can also build local adapters for systems that don't have built-in support.

The direction of the sync depends on the adapter. Some support bidirectional flows (you can sync to and from Infrahub), while others are currently one-way. NetBox to Infrahub is one-way, for example. Peering Manager supports bidirectional synchronization.

The most common patterns we see:

  • NetBox or Nautobot to Infrahub for brownfield migrations. Teams bring existing device, circuit, and IPAM data into Infrahub while keeping their current tools operational during the transition.
  • Slurp'it or IP Fabric to Infrahub for network discovery. These tools crawl your live infrastructure and generate device inventories. Syncing that discovered data into Infrahub gives you a starting point for modeling your network.
  • Infrahub to Peering Manager for BGP automation. You define peering policies and relationships in Infrahub, then push that intent to Peering Manager for configuration generation.
  • Multiple sources to Infrahub for unified data. You might pull device inventory from one tool, circuit data from another, and business service information from a third. Infrahub becomes the aggregation point where all that context comes together.

Common use cases for Infrahub Sync

  • Brownfield data migration: You've decided to adopt Infrahub, but you already have years of device, rack, and circuit data in NetBox. Infrahub Sync lets you bring that data over without manual exports and imports. You maintain your NetBox instance while gradually expanding your data model in Infrahub. As you build out new schemas for services or business logic, the existing infrastructure data is already there, properly synchronized.
  • Mergers and acquisitions: When companies merge, you're typically dealing with multiple network inventories built on different tools and methodologies. Infrahub Sync lets you aggregate these disparate inventories into a single view without forcing immediate migration away from existing tools. You can reconcile conflicting data, identify gaps, and gradually build a unified source of truth while maintaining operational continuity during the transition.
  • Multi-system aggregation: Your device inventory lives in a CMDB. IP addressing is managed in an IPAM tool. Monitoring data comes from IP Fabric or LibreNMS. Each system serves a purpose, but automation needs data from all of them. Infrahub Sync pulls the relevant pieces into Infrahub, creating a unified view without forcing you to migrate away from existing tools. You maintain clear data lineage so everyone knows which system owns which data.
  • Real-time operational state: You're managing configuration intent in Infrahub but operational state lives in your monitoring and observability tools. Infrahub Sync can bring device status, interface states, or performance metrics into Infrahub periodically. Your automation can then make decisions based on current network conditions, not just intended state. This matters for workflows like automated remediation or capacity planning.
  • Service provider automation: If you're running an ISP or managing infrastructure as a service, customer information lives in business systems while network data lives in technical tools. Infrahub Sync can bridge that gap, pulling customer records or service definitions into Infrahub where they can be connected to peering sessions, circuits, and device configurations. This creates the context you need for accurate service provisioning and billing.

Infrahub Sync FAQs

Does Infrahub Sync replace my existing tools?
No. Infrahub Sync is designed to aggregate and synchronize data between systems, not replace them. You can continue using NetBox for DCIM, IP Fabric for network discovery, or whatever tools your teams depend on. Sync just keeps the data consistent across platforms.
Can I schedule automated syncs?
Yes. Infrahub Sync is a CLI tool, so you can run it from cron, CI/CD pipelines, or any orchestration system. Many teams schedule regular syncs (hourly or daily) to keep systems in sync, or trigger syncs via webhooks when data changes in the source system.
Do I need to sync all my data?
No. You can use filters in your schema mapping to sync only what you need. You might sync only network devices (not PDUs), only devices in certain regions, or only objects that match specific criteria. This keeps your destination clean and reduces sync time.
How do I handle data conflicts between systems?
You define which system is authoritative for each data element in your synchronization configuration. Data flows from source to destination based on your mapping. If you have two systems that both update the same field, you'll need to choose which one should be the source for that sync.

Ready to eliminate your data silos?

How to Time Travel on Your Network With a Temporal Graph https://opsmill.com/blog/temporal-graph-infrastructure-data/ Mon, 09 Mar 2026

Temporal graphs capture how things change over time, turning your infrastructure data into a living history you can query, diff, and trust.

If you've ever tried to debug a network outage with only a snapshot of the current configuration, you know the frustration. You can see how things are, but not how they got there.

Most systems that manage infrastructure data keep only the latest version of the truth. Yesterday's routes, last week's VLAN definitions, or the previous ACL are nowhere to be seen.

A temporal graph changes that. It makes time part of your data model, letting you move backward and forward through your infrastructure's history.

Instead of overwriting what used to be true, a temporal graph records every change and connects it to the state that came before. That small shift—from replacing to recording—unlocks an entirely different way to manage infrastructure.

The problem with "now" in data management

Most databases are built for immediacy. When a switch configuration or device attribute changes, the old value is overwritten.

From a database perspective, that's efficient. Keeping fewer records and less data means a smaller database size over time.

But from the perspective of an automation engineer, it's a liability. Deleting data creates gaps in the record that make managing and automating infrastructure harder.

When a rollout breaks connectivity, you can't reconstruct the exact configuration that was working yesterday. When the network drifts from design intent, you see the current state but can't query what the network looked like at any earlier point in time.

Infrastructure isn't static, yet our typical data models pretend it is. Temporal graphs bring network change back into the data picture.

What a temporal graph really does

Think of a knowledge graph as a map showing every device, service, and relationship in your environment.

A temporal graph adds a timeline to that map. Every node and relationship knows not only what it connects to, but when that connection was valid.

temporal graph timestamped nodes

That means you can query your infrastructure as it existed at any moment:

- "Show me the topology before we deployed version 2 of the automation pipeline."
- "Highlight every interface that changed between Tuesday and Thursday."

Instead of relying on change logs or diffs, the history lives in the data itself. You're storing both the facts and their evolution.
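A toy model makes the idea concrete. Suppose every attribute keeps its full (timestamp, value) history instead of only the latest value. This is a deliberately minimal sketch of the principle, not how any particular graph database stores things:

```python
import bisect

# Every change appends a (timestamp, value) pair; nothing is overwritten.
history = {("sw1", "vlan"): [(1, 10), (5, 20), (9, 30)]}

def as_of(entity, attr, t):
    """Return the value that was true at time t, or None if not yet set."""
    versions = history[(entity, attr)]
    i = bisect.bisect_right([ts for ts, _ in versions], t) - 1
    return versions[i][1] if i >= 0 else None

as_of("sw1", "vlan", 6)  # 20: the VLAN that was configured at time 6
```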

Temporal graph vs. change logs

A change log is sometimes confused with a temporal graph, but the two are fundamentally different and solve different problems.

A change log records events: what changed, when, and by whom. It's a sequential record of operations like inserts, updates, and deletes, and it's genuinely useful for auditing. You can scroll back through it and see that a BGP neighbor was removed on Wednesday, or that a VLAN definition was updated last month.

But a change log doesn't give you a picture. If you want to know what your network looked like last Tuesday, you can't query that directly. You'd have to start from some earlier snapshot, collect every change that happened between then and Tuesday, apply them in order, and reconstruct the state yourself.

A temporal graph does something different. Instead of just storing the operations, it stores the resulting state at every point in time as directly queryable data. So you can ask, "What did my network look like on Tuesday at 2pm?" and get a complete, immediate answer.

The distinction between a temporal graph and a change log makes a big difference in operational flow. With a temporal graph, debugging a production issue or validating a rollback starts with a single query. There's no reconstruction or piecing together required. That kind of investigative ease and speed is critical for managing network automation at any kind of scale.
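The reconstruction cost is easy to see in miniature. With only a change log, answering "what was true at time t" means replaying every event up to t. The log format here is hypothetical, for illustration only:

```python
log = [  # ordered events: (timestamp, operation, key, value)
    (1, "set", "vlan", 10),
    (5, "set", "vlan", 20),
    (9, "delete", "vlan", None),
]

def state_at(log, t):
    """Rebuild state by replaying every event up to time t."""
    state = {}
    for ts, op, key, value in log:
        if ts > t:
            break
        if op == "set":
            state[key] = value
        else:
            state.pop(key, None)
    return state
```

A temporal graph skips the replay entirely: the state at t is already stored and indexed, so the same question is a single lookup.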

The power of immutability

A temporal graph is built on a simple but radical idea: immutability. Immutable data is data that never changes and never dies. Once it's written, it's stored forever.

When something changes—a route update, a device replacement, a schema tweak—the new information is added to the graph as a separate, time-stamped node. The old version remains untouched.

Immutability gives infrastructure engineers three major advantages.

First, it makes the record trustworthy. You know that yesterday's data is exactly what was true yesterday, unedited by later changes.

Second, it makes cause and effect traceable. You can follow a configuration's path from its creation through every revision.

Third, it stops teams from tripping over each other. When past states are immutable, parallel updates can coexist safely without overwriting each other's work.

Multiple timelines at once

Immutability is what makes a temporal graph possible. Instead of overwriting state, every change is preserved as part of a continuous timeline.

Infrahub extends that model further, into what's called a multi-temporal graph, where multiple independent timelines can exist simultaneously.

This multi-temporal capability means Infrahub can create independent and isolated branches without having to copy the entire data set into each branch. The branches share the same immutable baseline and only record what diverges.

This makes branches in Infrahub fast to create and lightweight to maintain, meaning you can have hundreds of branches open simultaneously without any performance lag.
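The copy-on-write idea behind lightweight branches can be sketched like this. It illustrates the principle only, not Infrahub's internals:

```python
baseline = {"sw1": {"vlan": 10}, "sw2": {"vlan": 20}}  # shared, immutable

class Branch:
    def __init__(self, base):
        self.base = base
        self.delta = {}  # only the divergence lives in the branch

    def set(self, node, attrs):
        self.delta[node] = attrs  # the baseline is never touched

    def get(self, node):
        return self.delta.get(node, self.base.get(node))

b = Branch(baseline)
b.set("sw1", {"vlan": 99})
b.get("sw1")  # read from the branch's delta
b.get("sw2")  # read straight from the shared baseline
```

Creating a hundred branches costs a hundred empty deltas, not a hundred copies of the data set.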

Why temporal graphs matter

Temporal graphs have the power to change how automation teams operate.

At a practical level, they deliver the kind of traceability engineers have wanted for years: the ability to query exactly what your infrastructure looked like at any point in the past.

Because each version is immutable, you can test automation workflows, model upgrades, or validate schema changes against historical states. Rollbacks are precise and fast: revert to any previous state, not a backup from days ago.

But the deeper value is perspective. Over time, your data becomes a living record of the network's evolution, explaining not just where you are but how you got there.

nervous automation user looking at a black box

Instead of a black box of data that no one wants to touch or use, you've got a trusted data foundation that can power any downstream automations or AI workflows.

Temporal graphs turn infrastructure state into institutional memory, a repository of knowledge that strengthens with every update instead of fading over time.

Temporal graphs in practice for automation

Infrahub takes the temporal graph and makes it fully operational for infrastructure data.

Every update in Infrahub is stored as a discrete, immutable value on the infrastructure graph. Instead of overwriting data, each change is appended to the timeline, so you can query the exact state of your infrastructure at any point in the past.

Infrahub also captures data lineage, also known as metadata, about what changed, when, and by whom. This gives you a complete audit trail that doesn't exist in a standard temporal graph implementation.

By the way, the graph model makes this kind of metadata naturally expressible. Rather than adding new tables and joins to capture context about a change, you simply add nodes. Who made the change, what triggered it, and how it relates to other changes in the graph are all first-class data, stored the same way everything else is.

Because Infrahub supports Git-like workflows, the history is built in natively, not bolted on. You branch at the data layer, not in code or configuration files.

One team can model a new topology while another tests automation logic, each in its own branch, all sharing the same underlying truth. When ready, you merge those branches just as you would in Git, preserving the full record of what changed, who changed it, and when.

temporal graph enables time travel time picker in Infrahub

Temporal awareness also powers Infrahub's diff view. Because every change is recorded with a timestamp, you can compare the state of your infrastructure at any two points in time. Combined with Infrahub's schema-driven design, this makes change auditable, reversible, and safe.

Controlling time is the ultimate power

Infrastructure management has always been about control: controlling change, risk, and complexity. Temporal graphs add one more kind of control: the control of time.

By preserving every state instead of overwriting it, you gain the power to understand your infrastructure not just as it is, but as it was and how it got there.

Infrahub builds that power directly into its core with an immutable, temporal knowledge graph. Every change, every branch, every merge keeps the story intact.

Don't overwrite your infrastructure history. Let it guide you instead.

Knowledge Graphs for Infrastructure Data Explained https://opsmill.com/blog/knowledge-graph-for-infrastructure-explained/ Mon, 09 Mar 2026

Learn why a knowledge graph model, with its focus on understanding relationships, is the most powerful way to manage infrastructure data.

Network automation has typically lagged behind in data management innovations.

We’ve wrestled with spreadsheets, fought with source of truth tools, cursed CMDBs, and banged our heads against 5,000-line JSON files.

Yet the answer to our data woes has been hiding in plain sight the whole time. All we needed to do was look at other industries to find interesting and workable data management techniques that are easily applied to networking.

For more than a decade, industries as varied as retail, pharmaceuticals, and financial services have been using a powerful data model called a knowledge graph to work with densely interconnected data.

For the last 8 years, I’ve been championing the idea of a knowledge graph for infrastructure data as well. Early on, reactions to my message were mostly confusion: “Do you mean a source of truth?”

But the times they are a-changin’, as Bob Dylan famously said. Every day I hear more and more practitioners and vendors in infrastructure management talk about knowledge graphs.

This shift is extremely encouraging to see because I’m as convinced as ever that a knowledge graph is the only way to effectively manage infrastructure data and automation.

To understand knowledge graphs, it also helps to understand a handful of closely related but distinct ideas such as schema, ontology, and semantics, as well as the differences between graph databases and relational databases.

In this post, I’ll walk through the definitions of each of those key concepts, show you how they interact and relate to each other, and why, when they come together, they’re so powerful for managing infrastructure data.

What is a knowledge graph?

A knowledge graph is a data model that represents entities and the relationships between them as nodes and edges.

Each node represents a thing—like a router, service, or VLAN—and each edge represents how those things connect. But the real power comes from the fact that those relationships carry meaning.

In other words, a knowledge graph doesn't just record that two objects are linked. It encodes why they are linked and how that link matters. This makes it a foundation for reasoning, inference, and automation.
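In its most stripped-down form, that structure is just typed nodes plus edges labeled with a relationship type. A minimal sketch with hypothetical names throughout:

```python
nodes = {"r1": "router", "eth0": "interface", "vlan10": "vlan"}
edges = [
    ("eth0", "belongs_to", "r1"),     # the label says *why* they're linked
    ("eth0", "member_of", "vlan10"),
]

def related(node, rel=None):
    """Follow outgoing edges, optionally filtered by relationship type."""
    return [dst for src, r, dst in edges if src == node and (rel is None or r == rel)]

related("eth0", "member_of")  # follows only the VLAN-membership edge
```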

graph database example: nodes and edges

A related idea is the temporal graph, which adds time as a dimension. This lets you see not only how your infrastructure is connected, but how it was connected last week or last year. It's critical for auditing, change analysis, and version control.

The traditional alternative to a knowledge graph data model is the relational data model. A relational data model, just like the relational databases that carry its name, represents data in tables and rows instead of nodes and edges.

What is a graph database?

Even though the terms knowledge graph and graph database both contain the word graph, they are not interchangeable terms.

A knowledge graph is a data model. A graph database is a storage and query engine.

A graph database stores the nodes and edges of a knowledge graph natively, making relationships first-class citizens. That means navigating connections is simple, fast, and scalable, whether it's five devices or five million. And adding a new node type or relationship doesn't require restructuring any existing data.

If you've seen me demo the behind-the-scenes database in Infrahub, you've seen this in action. When I open a topology view and query all the relationships from a single object, I'm not pulling from a table join. I'm navigating a living model that understands the data's intent.

This is a big leap from relational databases, where data is stored in rows and columns, and relationships have to be constructed by adding fields to each table to connect them.

To be clear, it's entirely possible to store a knowledge graph data model in a relational database but it will be prone to clunkiness because of the table-based storage. A graph database, in contrast, is specifically suited to storing knowledge graphs.

For infrastructure data, that matters. Networks are inherently relational: devices connect to interfaces, interfaces belong to systems, systems serve applications, and those applications depend on business logic. Graph databases map that reality natively.

knowledge graph in graph database, relational model in relational database

What is a schema?

A schema is the blueprint for how data is organized and described within a system. It defines the types of entities that exist, their properties, and how they relate to each other.

In a relational database, the schema will specify things like tables, columns, and data types. In a graph database, the schema defines node types, relationships, and the attributes they can hold.

One of the reasons graph databases are so flexible is that the schema can evolve naturally over time. You don't need to predefine every possible field or table. You can start simply and add new node types or relationships as your understanding of the domain grows.

What is ontology?

Ontology defines and categorizes the things in a system so both people and machines can understand their relationships.

While the schema defines how data is organized, the ontology goes further by defining what those entities mean and how they should relate. It describes the categories of things that exist in a system and the logic that connects them.

ontology data rules in a rulebook

For example, in an infrastructure ontology, you might define that "every interface belongs to a device" or "a VLAN is part of a network segment."

And now you might realize that ontology is another word for what we in infrastructure automation call business logic.
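Because ontology rules are explicit, they can be checked mechanically. A toy validator for the rule "every interface belongs to a device" might look like this (invented structures, for illustration only):

```python
nodes = {"r1": "device", "eth0": "interface", "lo0": "interface"}
edges = [("eth0", "belongs_to", "r1")]  # lo0 is missing its parent

def violations(nodes, edges):
    """Flag interfaces that break the 'belongs to a device' rule."""
    errors = []
    for name, kind in nodes.items():
        if kind == "interface":
            parents = [d for s, r, d in edges if s == name and r == "belongs_to"]
            if not parents:
                errors.append(f"{name}: interface has no parent device")
    return errors
```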

What is semantics?

Semantics is the layer that tells the system what relationships mean once they've been defined by the ontology.

If the ontology sets the rules for what types of entities and connections can exist, semantics defines how to interpret those connections in context. Put another way, ontology is the rulebook, semantics is the understanding.

Where ontology might define that a device can connect to a service, semantics tell you that a device powering a service implies dependency, or that one interface backing up another represents redundancy.

semantics adds meaning to ontology

This interpretive layer is what allows the system (and by extension, AI or automation engines) to reason about cause and effect. It's how the graph knows that if Device A powers Service B, then taking A offline would impact B.

Semantics turn the graph from a structure into logic. It can infer new relationships, detect anomalies, and simulate change impacts. Without semantics, you have connections. With semantics, you have understanding.
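That reasoning is, at bottom, a traversal guided by what a relationship means. If "powers" implies dependency, then the impact of taking a node offline is everything reachable along "powers" edges. A minimal sketch with invented names:

```python
edges = [("devA", "powers", "svcB"), ("svcB", "powers", "svcC")]

def impact(node):
    """Everything downstream of node along dependency-implying edges."""
    hit, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, rel, dst in edges:
            if src == current and rel == "powers" and dst not in hit:
                hit.add(dst)
                stack.append(dst)
    return hit

impact("devA")  # taking devA offline would impact svcB and, transitively, svcC
```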

Anatomy of a knowledge graph

Here's how these related data concepts come together in practice.

knowledge graph and data layers diagram

The knowledge graph is the outermost concept. It's the data model that frames everything. The graph database is the storage and query engine that makes the model practical to work with.

Inside the database are the components that define, constrain, and bring meaning to the data. The schema defines the data structure, the ontology specifies the rules for how data relates, and semantics layer on to define what those relationships mean.

One important nuance: the ontology doesn't have to live inside the schema, the way we've shown it in the diagram. In fact, in most systems, it doesn't.

When schema and ontology are separate, we call that a general-purpose schema. The schema handles structure, constraints, and object-level integrity but no more. The ontology or business logic is embedded in application code, where it's implicit and invisible. This makes the ontology difficult for anyone else to read or reason about.

A domain-specific schema pulls the ontology into the schema itself, making the business logic explicit and accessible for adding further validation to the data.

Infrahub uses a domain-specific schema. When you define your schema in Infrahub, you're also defining your ontology.

The knowledge graph advantage for infrastructure

For most of the automation industry's history, infrastructure data has lived in silos—one tool for IP address management, another for cable management, another for device inventory. Reconciling those datasets is a constant exercise in frustration, and any question that crosses system boundaries is nearly impossible to answer reliably.

Solving that problem starts with bringing all the data into one place. But infrastructure data sets are huge. They're also interconnected in complex ways.

A relational model built on rows and columns will struggle to represent that web of relationships effectively, especially at scale.

The knowledge graph's node/edge structure, however, is purpose-built for capturing entities and the relationships between them, without losing relationships, forcing data into shapes it doesn't fit, or bogging down at scale.

For these reasons, a knowledge graph is really the only practical data model for representing infrastructure data.

Once you add a graph database to store the data, and a domain-specific schema to shape and validate the data, you now have something that has never existed before: a system that understands your infrastructure data from end to end.

Knowledge graphs are AI-ready

This combination is also, it turns out, exactly what AI needs to succeed.

AI agents are only as useful as the data they can access and reason about. Give an AI agent a collection of spreadsheets or a web of disconnected tools, and it will struggle to answer questions that cross boundaries because the connections simply aren't there to follow.

A knowledge graph changes that. Because the data lives in a single model, and relationships are explicit and meaningful, an AI agent can traverse them to understand context, trace dependencies, and reason about impact. The graph database makes that traversal fast and reliable at scale. And a domain-specific schema ensures the AI is reasoning about accurate, consistent data.

This is why knowledge graphs have become the data model of choice to enable AI across industries, and why building your infrastructure automation on a graph puts you one step ahead.

Explore the knowledge graph of your infrastructure

Other industries learned the power of knowledge graphs years ago. Now it's time for infrastructure automation to reap those same benefits.

A knowledge graph alone is only the starting point. What makes it transformative for infrastructure is what you build on top of it: a graph database that stores and traverses relationships natively, and a domain-specific schema that encodes the rules of your domain explicitly.

That's the combination Infrahub is built on. The result is an infrastructure platform that doesn't just store your data but understands it. A platform that can tell you not only what your infrastructure looks like today, but how it's connected, what depends on what, and what the downstream impact of any change will be.

If you're ready to see what that looks like in practice, download the Infrahub Community edition from GitHub to start building your own infrastructure knowledge graph. Or contact our sales team to learn more.

The post Knowledge Graphs for Infrastructure Data Explained appeared first on OpsMill.

]]>
AI Is the New Compiler https://opsmill.com/blog/ai-new-compiler/ Thu, 26 Feb 2026 21:25:16 +0000 https://opsmill.com/?p=1766 What is a compiler? Discover how these abstraction engines translate high-level code to machine instructions and how AI fits into this evolution.

The post AI Is the New Compiler appeared first on OpsMill.

]]>
Fun fact about me: I didn’t start out studying computer science. Instead, my bachelor’s degree was in electronics.

Part of the electronics curriculum (I’m dating myself here!) was learning assembly. Assembly is the lowest-level programming language humans can reasonably write. It consists of cryptic instructions like MOV AX, 0x1234 that directly manipulate a computer's processor and memory. I can tell you from experience it is brutally difficult to learn and read.

Today, almost no one learns assembly, not even in electronics programs. It's been completely abstracted away. That abstraction happened through compilers, tools that take human-friendly code and transform it into machine-executable instructions.

A Python developer writes total = sum(numbers), and the toolchain handles the tedious work of translating that into the hundreds of low-level instructions needed to make it happen.

example of assembly vs the same query in Python
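You can actually watch one step of this lowering happen. CPython doesn't emit assembly directly; it compiles your source into bytecode that its virtual machine executes, and the standard library's dis module lets you inspect that lower-level form:

```python
import dis

def total(numbers):
    return sum(numbers)

# Disassemble to CPython bytecode: even this one-liner lowers to several
# stack-machine instructions (load sum, load numbers, call it, return).
dis.dis(total)
```

Running this prints an instruction listing whose exact opcodes vary between Python versions, which is rather the point: the layer below is free to change without your code noticing.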

Now we're watching the same pattern repeat, just one level higher. AI is the new compiler.

What is a compiler?

To understand where we're headed, it helps to understand what compilers actually do. They're essentially abstraction engines that let humans work at higher levels of thinking.

When you write code in a compiled language, it goes through several transformation stages (CPython takes a similar but shorter path, compiling your source to bytecode that its virtual machine executes):

  • Your source code becomes an intermediate representation (IR)
  • That IR gets optimized and transformed
  • It becomes assembly language
  • Finally, it becomes machine code that executes on your processor

Each part of this transformation serves a specific purpose:

Portability: Your Python code runs on Intel x86, Apple Silicon, and ARM processors without modification. The compiler handles architecture-specific details.

Optimization: Tools like LLVM apply dozens of optimization passes at the IR level, like dead code elimination, loop unrolling, and constant folding. These are far easier to do on structured code than on raw instructions.

Debugging: When something breaks, you see a Python stack trace instead of raw memory addresses. Each layer preserves enough information to help you understand what went wrong.

Composability: You can import libraries, use frameworks, and build on existing code. Each layer maintains interfaces that let software components work together.

This is how software has always evolved. React's JSX compiles to JavaScript, C compiles to assembly. Each generation of tools creates a higher level of abstraction, letting developers focus on what they want to build rather than how the machine executes it.

When agents write the code

Developer Wes McKinney recently pointed out that, until now, programming languages have been optimized for human readability and ergonomics. But if AI agents write most of the code in the future, what should languages optimize for?

McKinney’s answer, which I completely agree with: compilation speed, portability, and performance. Human-friendliness becomes less critical in the intermediate layers.

This changes everything. Compilers freed us from worrying about registers and memory addresses. Now AI, acting as a new compiler layer, is freeing us from worrying about syntax, algorithms, and implementation details. The stack now looks like this:

Current compiler layers with AI on top

You describe what you want in natural language. AI "compiles" that into code. Traditional compilers handle the rest. It's the same fundamental concept that's driven programming for decades: raise the level of abstraction so humans can work at the level of intent rather than implementation.

The evolving language stack

Will this current four-layer stack collapse as AI takes over code generation?

The intuitive answer is yes. If AI generates code and machines execute it, why keep a human-readable layer in between?

We could skip Python and JavaScript entirely and have AI generate an optimized intermediate representation directly, something like LLVM IR or WebAssembly. This bytecode would be designed purely for machines: dense, fast to compile, and portable across architectures.

But I don't think that's how it will play out. And I'm not sure we'd want it to.

There's real value in keeping abstraction layers for AI to build on, and programming languages are no exception. They represent decades of accumulated thinking about how to structure computation, handle errors, and compose complex systems. That's a foundation worth preserving.

The more likely future is that programming languages persist, but evolve away from human readability. Today's languages were optimized for humans to read and write. Tomorrow's may be optimized for AI agents to generate and compilers to consume, prioritizing fast compilation, portability, and performance over legibility.

These new intermediate languages might be as incomprehensible to most developers as assembly code is today. And that's fine.

Just as Python developers don't need to understand assembly, tomorrow's builders won't need to understand the code their AI compiler generates. Humans will continue to interact at the natural language level, and the layers below will increasingly become the domain of machines.

Could we skip even further and go straight from natural language to machine code?

Probably not. That would mean the AI compiler needs to understand every hardware architecture deeply, re-"compile" for every different processor, and replicate decades of optimization work that tools like LLVM provide. It would lose portability, optimization infrastructure, and debuggability. Those are all benefits that make intermediate representations valuable.

So my prediction is that the language stack won't get smaller, but it will change significantly.

Plus ça change (aka we’ve been here before)

Every generation of developers has learned to trust the compiler. We’ve been defining at higher levels what we want to build, and we trust that the layer below handles the details correctly.

Assembly programmers had to trust C compilers. C programmers had to trust garbage collectors and interpreters. Now we're learning to trust AI as a compiler to take our natural language specifications and generate correct, efficient code.*

Programming has always been about translating human intent into machine execution. With AI, we’re adding one more layer to an age-old model to make that translation more natural.

When I was writing assembly code in electronics school, I had to think in terms of registers, memory addresses, and jump instructions. Today's developers think in terms of functions, objects, and data structures. My prediction is that tomorrow's builders will think in terms of outcomes, behaviors, and user experiences. And the AI compiler will handle the rest.

*Of course, we’ll need some time and evolution for AI to get to a place where we can implicitly trust it to do that compilation correctly. 😉

The post AI Is the New Compiler appeared first on OpsMill.

]]>
How Eurofiber Cut Service Deployment from 5 Days to 15 Minutes https://opsmill.com/blog/eurofiber-infrahub-case-study/ Mon, 23 Feb 2026 17:56:25 +0000 https://opsmill.com/?p=1860 Case study: Learn how Eurofiber reduced service deployment from 5 days to 15 minutes with Infrahub driving automation.

The post How Eurofiber Cut Service Deployment from 5 Days to 15 Minutes appeared first on OpsMill.

]]>
When Eurofiber Cloud Infra rebuilt an entire cloud from scratch, they faced a daunting challenge: how to quickly inventory everything from network backbone to virtual machines while laying the foundations for consistent service automation.

For a service provider like Eurofiber, which offers infrastructure-as-a-service, virtualization, and data center solutions across France, speed is revenue. The faster they can provision new services, the happier the customers and the faster they can start billing.

The challenge: A long chain to value

Eurofiber's infrastructure services span an unusually long value chain. They operate everything from the fiber connections between sites to the data centers themselves, with all the networking equipment, server clusters, and virtual environments delivered to customers. Each new service deployment touches multiple layers.

When considering tools to manage the complexity of this chain, Senior Cloud Architect Cédric Grard explains, “We needed flexibility to adapt to our use cases—and our use cases may change rapidly. On top of that, we had a strong desire to automate everything possible."

Option 1: Integrate multiple tools

Initially, Grard and his team considered stitching together three separate tools: NetBox customized with plugins, a legacy DCIM solution, and their existing Terraform and Ansible automation. Three different systems meant three licenses, three maintenance burdens, and the inevitable data silos and integration fragility that come with tools "that aren’t necessarily designed to work together very well."

With this setup, Grard realized his team would “end up spending more time making sure the automation process works instead of spending that time actually deploying infrastructure for customers.”

Option 2: Build for flexibility

When Grard discovered Infrahub, the core concepts immediately resonated. "It was exactly what we’d been searching for.”

In place of NetBox and the pricey DCIM, Eurofiber implemented Infrahub and used its native integrations with Terraform and Ansible to build consistent automated workflows. Now, Grard says:

"Infrahub is our source of truth, what we consider to be the desired state of everything that’s inventoried within it. It’s the entry point for any new service creation and the authoritative source of information for the entire infrastructure."

The tools previously being considered were limited in their ability to model data outside of core networking devices. They would have required a lot of customization to make work, and even then would still have forced Eurofiber to adapt their usage to tool limitations.

In stark contrast, Infrahub's flexible schema lets the team structure their entire technical infrastructure (including networking devices, servers, and virtual machines) plus their service layer (like connections to customer contracts) exactly how they need. “The model can be entirely modified in every way,” Grard notes.

The Git-native architecture has proven equally transformative. The branching system means team members can work on different changes simultaneously without stepping on each other's toes. Says Grard:

"What I like the most in Infrahub is the fact that it's code-based through Git... It's really a game changer for me that we’re able to design stuff without having to freeze production.”

The numbers that convinced leadership

Explaining technical architecture to executives can be challenging, but the Infrahub proof of concept spoke for itself. Working with the OpsMill team, Eurofiber simulated a typical deployment scenario both with and without Infrahub automation:

"Five days to deploy with the old fashioned way, and with the Infrahub automation process in place, it fell down to 15 minutes or so. That's the kind of thing that talks to everybody."

Today, when Eurofiber needs to provision customer infrastructure, they create a branch in Infrahub, define the platform and virtual machines, have it reviewed, and merge. A GitLab CI/CD pipeline automatically triggers, using OpenTofu and Ansible to provision based on the defined state. The same system manages the full lifecycle, from upgrades and changes to decommissioning.
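As a rough sketch of what a pipeline like that can look like (the stage names, container image, and playbook below are hypothetical illustrations, not Eurofiber's actual configuration):

```yaml
# Hypothetical .gitlab-ci.yml triggered after an Infrahub branch is merged.
stages:
  - plan
  - deploy

plan:
  stage: plan
  image: ghcr.io/opentofu/opentofu:latest
  script:
    - tofu init
    - tofu plan -out=plan.tfplan     # plan against the desired state defined in Infrahub
  artifacts:
    paths: [plan.tfplan]

deploy:
  stage: deploy
  script:
    - tofu apply plan.tfplan         # provision platform and virtual machines
    - ansible-playbook provision.yml # post-provisioning configuration
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```

The key design point is that the pipeline consumes the reviewed, merged state from Infrahub rather than hand-edited variable files, so the same flow covers upgrades, changes, and decommissioning.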

The cornerstone of an automation system

With Infrahub, Eurofiber has found value beyond initial provisioning as well. The company uses Oxidized to back up configurations for all their network equipment, from switches and routers to firewalls and load balancers. Previously, a backup required manually updating static inventory files every time equipment was added or removed.

Now, they've implemented an artifact generator in Infrahub that dynamically produces device lists and connects those to Oxidized with a simple webhook. "We just have to add or remove equipment in Infrahub, and we can forget about the configuration backup because it's taken care of no matter what."
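For a sense of what consuming such a dynamically generated device list can look like, here is a hypothetical Oxidized source configuration (the endpoint URL and field mapping are invented for illustration; consult the Oxidized documentation for the exact http-source options):

```yaml
# Hypothetical Oxidized config pulling its inventory from an Infrahub artifact
# instead of a manually maintained static file.
source:
  default: http
  http:
    url: https://infrahub.example.com/api/artifact/oxidized-devices
    map:
      name: hostname     # device name field in the generated list
      model: platform    # maps to the Oxidized model to use
```

With a setup along these lines, adding or removing equipment in Infrahub is the only step required; the backup inventory follows automatically.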

Grard's advice to other infrastructure engineers?

"Make Infrahub the cornerstone of your automation system. It’s much more than an inventory or simple source of truth. Its true power lies in its automation capabilities.”

 


 

Download case study in PDF version

Want to chat with our sales team about what Infrahub can do in your environment? Book a personalized demo.

The post How Eurofiber Cut Service Deployment from 5 Days to 15 Minutes appeared first on OpsMill.

]]>
Infrahub Profiles Manage Infrastructure and Network Standards at Scale https://opsmill.com/blog/network-standards-scale-infrahub-profiles/ Tue, 17 Feb 2026 16:28:40 +0000 https://opsmill.com/?p=1850 Learn how Infrahub Profiles uses persistent object connections to propagate standards changes across your entire network automatically.

The post Infrahub Profiles Manage Infrastructure and Network Standards at Scale appeared first on OpsMill.

]]>
Network and infrastructure teams face the persistent challenge of maintaining consistent configurations across thousands of devices while standards are continuously evolving.

Configuration drift is common. Devices start to diverge from documented standards and exceptions pile up without documentation. Teams lose track of which configurations are intentional versus accidental.

When things break, troubleshooting turns into guesswork, inconsistent policies create security vulnerabilities, and compliance audits become weeks-long digs.

Traditional approaches, such as Git-based workflows, database frameworks, and vendor tools, all treat standards as one-time snapshots. Once infrastructure is deployed, evolving those standards means writing custom scripts or manually updating thousands of objects. Each approach creates more opportunities for drift.

Infrahub Profiles eliminate drift by design, maintaining persistent connections between standards and infrastructure.

When a Profile is changed, those changes propagate to your infrastructure automatically—no scripts, no manual updates, no missed devices. Exceptions are documented and stay discoverable through audit trails and change management processes.

The result: infrastructure that matches documented standards, evolves cleanly as requirements change, and provides complete audit trails showing what changed, when, why, and who approved it. Less configuration drift means fewer issues.

Does this sound familiar?

Consider a common scenario: A team manages a data center with hundreds of network devices. They've designed standard configurations for DNS servers, NTP servers, and syslog targets for each region. Their initial spine-leaf design includes specific port allocations for uplinks and downlinks, and some ports are reserved as spares.

Six months later, capacity planning shows they need to activate those spare ports across the entire deployment. But things have changed, and those network standards they wrote six months ago are no longer valid.

In the meantime, exceptions have already been made to the standards as they previously existed. What comes next is a pile of error-prone manual updates, and exceptions to the templates that must be manually checked against earlier sets of exceptions and overrides. If something gets missed, a deployment could break spectacularly.

How traditional approaches to managing network standards fall short

Template-based approaches (such as Git + Ansible or Terraform)

  • What they do well: Initial deployment through templates and variables.
  • The limitation: Templates are snapshots, not living standards. Updates require manually re-executing automation across all infrastructure.
  • The pain: You end up running through 20 different files to find and modify the template. You don't have a common place to see and maintain it. And there's no visibility into why a value has its specific configuration. Is it following the standard or does it need to be different? This lack of traceability makes troubleshooting and auditing extremely difficult.
  • Result: On top of all the manual work to update the template, you end up with exceptions to standards captured in a dozen override files with no history, context, or reason why they exist.

Database frameworks

  • What they do well: Track infrastructure state as structured, queryable data.
  • The limitation: Once objects are created, the connection to their template is severed.
  • The pain: Updating the build script has no effect on existing objects. You have to write custom bulk-update scripts repeatedly. What's more, standard data becomes centralized in ways that are hard for external systems to consume, limiting integration capabilities.
  • Result: Network standards are static artifacts, not living governance.

Configuration context systems

  • What they do well: Hierarchical configuration inheritance (global → regional → site).
  • The limitation: Configuration contexts are essentially free-form JSON data structures that don't integrate well with existing data in the system. They're not easily queryable through APIs and lack the capability to understand inheritance models or track where specific values originated.
  • The pain: Exception management becomes difficult because there's no easy way to identify where exceptions have been made or understand the reasoning behind them. You lose visibility into whether a configuration follows the standard or represents a necessary deviation.
  • Result: Without proper metadata and traceability, it's a struggle to maintain governance as infrastructure scales.

The root of the problem is merging the physical and logical

At the core of the limitations found in these approaches is a conceptual problem. They merge two distinct concerns that should remain separate: the physical and the logical.

The physical layout represents the unchanging reality of the hardware, e.g., this switch model has 48 ports. The logical utilization covers how that hardware is being used, e.g., ports 1-24 are server-facing, 25-48 are uplinks, 45-48 are spare.

When the physical and logical are merged in a single template or device type, changing the logical design requires touching every physical instance. Templates can't distinguish between "this never changes" and "this evolves with our design."

The result is infrastructure design that's frozen at creation time and can't dynamically evolve.

Infrahub Profiles are living network standards

Infrahub Profiles solve the challenge of managing infrastructure and network standards at scale by maintaining a persistent, intelligent connection between standards and the objects that inherit from them. Profiles separate the static physical definition from the dynamic logical layer, allowing infrastructure designs to evolve throughout their entire lifecycle.

Here are four crucial ways in which Profiles are different from traditional approaches to managing network standards:

  • Persistent lifecycle connection: This is the crucial difference. When an object is created using an Infrahub Profile, that relationship is maintained forever. If you change the Profile, the changes propagate automatically to all associated objects. Think of it like the relationship between a class and its instances in object-oriented programming. When you update the class definition, all instances reflect the change. But individual instances can still override specific methods when needed.
  • Separation of physical and logical: Infrahub separates the physical, defined in templates, from the logical, defined in Profiles. This separation means you can use the same physical device template in different contexts (spine vs. leaf vs. access) by applying different Profiles. When a design evolves, you update the Profile, not thousands of individual objects.
  • Implementation-agnostic data model: Profiles work at the schema level, not the rendered configuration level. For example, a network engineer defines DNS servers for US-East region once. Infrahub Transformations render the configs appropriately for IOS, NXOS, Junos, Linux—whatever the infrastructure uses. This flexibility is essential for modern multi-vendor environments. You shouldn't need three different "definitions" of the same logical standard just because they run on three different vendor platforms.
  • Exception management as a first-class concept: Real-world infrastructure always has exceptions, such as a legacy system that needs a different DNS or a compliance zone with additional security requirements. Profiles make exceptions visible and governable. Example: You override a Profile value at the object level. The Infrahub UI clearly shows "this value overrides the Profile." The branch-based workflow requires documenting why the override was made. The approval workflow can enforce a review before the exception is accepted. At any later date, the business justification for each deviation is still visible.

These Profile features are what enable automation at scale. You define the settings that need to be applied not to individual devices but to the logical roles those devices fill. Your automation no longer operates widget by widget.

The alternative to Profiles isn't just a lot of work. It's also infrastructure that can't evolve gracefully. To maintain network standards at scale you need to separate what doesn't change (physical layout) from what evolves (logical design), and make standards persistent, exceptions visible, and evolution automatic. That's what Profiles deliver.

Infrahub Profiles make the difference between version 1.0 of your automation stack that doesn't scale, and battle-hardened automation that handles real-world complexity.

Working with Infrahub Profiles

Define your standards

Create a Profile for any object type in your schema. For a regional data center, a team might define:

  • DNS servers: 10.1.1.1, 10.1.1.2
  • NTP servers: 10.1.2.1, 10.1.2.2
  • Syslog target: 10.1.3.1
  • Default SNMP community settings

Apply the standards

The most common application method is to assign Profiles to object templates. When devices are created from that template, they automatically inherit the Profile's values. But you can also apply Profiles directly to existing objects, individually or in bulk.

Create a modular, hierarchical composition

Apply multiple Profiles to the same object with priority-based resolution:

  • Global Profile (priority 100): Company-wide security settings
  • Regional Profile (priority 200): US-East DNS, NTP, logging
  • Site Profile (priority 300): Site-specific overrides

The highest priority Profile wins for any given attribute. This creates a natural organizational hierarchy without duplication.
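The resolution logic can be sketched in a few lines of Python. This is an illustrative model only, not the Infrahub implementation: it follows the semantics described above, where the highest-priority Profile wins per attribute and explicit object-level values override any Profile.

```python
def resolve_attributes(profiles, explicit=None):
    """profiles: list of (priority, values) pairs; higher priority wins.
    explicit: object-level overrides that beat any Profile."""
    merged = {}
    # Apply lowest priority first so higher priorities overwrite on conflict.
    for _, values in sorted(profiles, key=lambda p: p[0]):
        merged.update(values)
    merged.update(explicit or {})
    return merged

result = resolve_attributes([
    (100, {"snmp_community": "corp-ro", "dns": ["10.0.0.1"]}),   # global
    (200, {"dns": ["10.1.1.1", "10.1.1.2"], "ntp": ["10.1.2.1"]}),  # regional
    (300, {"ntp": ["10.9.2.1"]}),                                # site
])
# → dns comes from the regional Profile, ntp from the site Profile,
#   and snmp_community falls through from the global Profile.
```

Note what falls out of this model for free: an attribute untouched by a higher-priority Profile simply inherits from below, so there is no duplication across the hierarchy.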

Enjoy full visibility

Every attribute in the Infrahub UI shows its source. Examples:

  • Inherited from Profile: US-East-DC-Standard
  • Set by Template: Arista-7050-Spine
  • Explicitly configured (overrides Profile)

When troubleshooting, you can immediately understand where a configuration came from and whether it's standard or an exception.

Integrate change management

Profile changes go through Infrahub's branch-based workflow:

  1. Create a branch.
  2. Modify the Profile.
  3. Preview exactly which objects will be affected.
  4. Submit for review.
  5. After approval, merge. Changes propagate automatically.

This workflow enables a complete audit trail and preview before impact. There are no moments of, "Oops, I just changed 1,000 devices!"

Real-world use cases for Infrahub Profiles

  • Standardizing regional infrastructure: Create a Profile for each region that defines DNS, NTP, syslog, and AAA servers, then assign it to device templates for that region. Every new device automatically inherits the regional standards. When it's time to migrate to new NTP infrastructure, updating a single Profile updates hundreds of devices, no scripting required.
  • Evolving network designs: Consider an initial spine-leaf design that reserves ports 45-48 on each leaf switch as spares. When network growth requires activating those ports, simply update the interface Profile from spare to server-facing. The change propagates across the entire deployment without custom scripts or manual updates.
  • Managing interface standards: Create role-based Profiles for different interface types. Example: Server-Port-Standard with 802.1X, port-security, and specific VLANs; User-Port-Standard with different authentication settings; and Uplink-Port-Standard for inter-switch links. When enabling 802.1X across a wired network, updating the Profiles updates thousands of ports automatically.
  • Compliance and audit: When auditors need proof of when security settings were applied and by whom, Profiles, combined with Infrahub's change management, provide complete audit trails. You can generate reports showing which devices follow standards, which have documented exceptions, and the approval history for each deviation—turning weeks of audit preparation into hours.

What's ahead for Infrahub Profiles

The current implementation requires manual Profile assignment. We're actively developing capabilities for automatic Profile application based on object characteristics. For example:

  • Auto-apply Profiles when a device is assigned to a specific region
  • Auto-apply interface Profiles when an interface role changes
  • Bulk Profile application to existing objects matching specific criteria

These enhancements will enable even more powerful workflows, where you can define standards once and let Infrahub ensure they're applied consistently as your infrastructure grows and changes.

Ready to get started with Infrahub Profiles?

To get a quick feel for how Profiles work, think about your most common configuration patterns—those DNS servers, NTP settings, or interface configurations that are repeated across dozens or hundreds of objects.

Pick one of those common standards and define a Profile for it in Infrahub. Next, apply that Profile to existing infrastructure objects.

Congratulations! You're now managing that pattern as a standard instead of individual configurations.

To dig deeper:

The post Infrahub Profiles Manage Infrastructure and Network Standards at Scale appeared first on OpsMill.

]]>
An Introduction to the Infrahub Schema (With FAQs) https://opsmill.com/blog/infrahub-schema-faqs/ Thu, 12 Feb 2026 15:32:24 +0000 https://opsmill.com/?p=1834 Learn how the Infrahub schema enables both strict validation and flexible evolution to power reliable infrastructure automation.

The post An Introduction to the Infrahub Schema (With FAQs) appeared first on OpsMill.

]]>
Infrastructure teams face two critical problems with traditional source of truth and data management tools.

First, most schemas are too generic. They try to accommodate every possible use case so most fields end up being optional and validation becomes meaningless.

Second, most schemas are too rigid. They're locked deep in the database core, and often tightly coupled to the platform itself, so changes require heavy migrations and risk breaking everything downstream.

These challenges create real pain for infrastructure teams trying to build automation. You can't trust data that might be incomplete, and you can't adapt quickly when every schema change is a high-stakes operation.

The Infrahub schema is both well-defined and flexible. You get strict validation where you need it and the ability to evolve your schema alongside your infrastructure without breaking integrations.

Schemas in a nutshell

A schema defines how your data is organized. It specifies what types of objects exist, what attributes those objects have, and how they relate to each other.

The schema provides structure and integrity for your data. It enforces rules like "every device must have a name" or "an interface can only connect to one device." This enables validation, powers query engines, and helps you understand what data you have at any given time.

For more on schemas and how they work, see An Automation Engineer's Guide to Understanding Data Schemas.

The problem with generic schemas

When a schema tries to fit multiple use cases, data chaos generally ensues.

First, you'll have a lot of fields you never use, but you'll also need to create a lot of custom fields to make the data useful and relevant for your organization.

Second, when the schema tries to accommodate all the use cases, everything becomes optional. Take NetBox's device model as an example: the name field is optional. This rightly drives people crazy. How do you have a device without a name?

The schema becomes permissive to handle edge cases but you lose first-level validation.

The impact cascades through your automation. You can't trust that a device has a name so you write defensive code everywhere, checking for optional fields and edge cases. Your automation becomes brittle because the schema doesn't enforce the rules you actually need.

The problem with rigid schemas

Schemas in traditional source of truth tools are very hard to change for two reasons.

First, these tools are built on relational databases that store data in tables. When you make a schema change, you're modifying how data is fundamentally stored and queried in those tables, which makes the operation very intensive.

Second, in tools like NetBox and Nautobot, the schema is tightly coupled to the platform itself. That means you can't just modify a device model without potentially breaking the application and everything downstream, including plugins, APIs, and integrations that depend on the schema structure.

The combination of database constraints and tight architectural coupling makes schema evolution risky. Even small changes require careful planning and coordination.

Why flexible, well-defined schemas matter for automation

The combination of a schema that's both too generic and too rigid is particularly painful. You get a schema that doesn't validate what you need it to validate, and you can't update it without major operational risk.

Infrastructure doesn't stand still. Every new service, device type, or business requirement needs schema changes. If you can't extend your schema easily, you can't keep up with business demands.

But extension isn't enough. You also need schemas that are specific to your use cases. A router needs different mandatory fields than a patch panel, and a managed Wi-Fi service needs different attributes than a point-to-point circuit.

Service-level modeling makes this especially critical. If you're managing infrastructure as logical services rather than individual technical elements, you need custom schemas that reflect your business. No two organizations have identical service definitions, so flexibility isn't a luxury, it's a requirement.

Well-defined schemas are also critical for AI. Agents need clear, documented structures to understand and interact with your data. If every field is optional because the schema tries to serve 10 different scenarios, agents can't reason about what's required or what relationships actually mean.

You need a schema that's both flexible and well-defined.

How the schema works in Infrahub

In Infrahub, you define your schema as a YAML file. You describe the objects you want, their attributes, and the relationships between them.

Infrahub reads that schema and automatically generates the UI, API, and version control capabilities. You don't write additional code.

Every schema includes three components:

  • Structure defines your object types and their attributes.
  • Relationships specify how objects connect to each other (one-to-one, one-to-many, many-to-many).
  • Constraints set validation rules, required fields, and referential integrity.

Infrahub enforces your constraints. If you define that a device must have a name, Infrahub won't let you create a device without one. If you specify that an interface can only belong to one device, Infrahub blocks adding the interface to a second device. You get proper business logic enforcement.
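As a concrete sketch, a schema fragment for a device with a mandatory, unique name and a one-to-many relationship to interfaces might look like the following. The attribute kinds and namespace here are illustrative and deliberately simplified; see the Infrahub documentation for the full schema format.

```yaml
# Illustrative schema sketch (simplified): one node type with attributes,
# a relationship, and constraints the platform will enforce.
version: "1.0"
nodes:
  - name: Device
    namespace: Infra
    attributes:
      - name: name
        kind: Text
        unique: true          # constraint: no two devices share a name
      - name: description
        kind: Text
        optional: true        # optional by choice, not by default
    relationships:
      - name: interfaces
        peer: InfraInterface
        cardinality: many     # one device, many interfaces
```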

But the schema isn't locked in the database core. Infrahub is built on a graph database, where the schema is decoupled from the platform. This means you can update the schema at any time without heavy migration operations or application restarts.

More importantly, Infrahub exposes artifacts (prepared configuration files, JSON payloads, Terraform configs) to your automation stack rather than raw schema data. This decoupling means integrated tools don't break when you evolve your schema.

Infrahub uses polymorphism to enable well-defined schemas

Polymorphism sounds complex but the idea is simple, and it solves a practical problem you've probably encountered many times: objects in the same category aren't always identical.

In many source of truth platforms, there's a generic "device" object that represents routers, switches, firewalls, patch panels, and servers all at once. The data model has to accommodate everything so most fields end up being optional.

But a router should have a management IP address and a patch panel shouldn't. If the model treats them both as generic devices, there's no way to enforce that rule. Unless you use polymorphism, like Infrahub does.

Polymorphism lets you define a base type (like Device), then create specialized types that inherit from it (like Router or PatchPanel). Each specialized type can have its own mandatory fields and relationships while still maintaining a simple relationship from other objects.

diagram illustrating polymorphism and inheritance with network objects

Polymorphism makes your data model more accurate and your automation more reliable. Instead of writing defensive code that checks whether optional fields exist, you define what's mandatory for each type. The schema enforces your business logic.

This is particularly important for design-driven automation, where you're modeling high-level services and business intent rather than just low-level configuration details. Different service types need different attributes, and polymorphism gives you a clean way to handle that.
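Expressed in schema terms, the pattern might look like this sketch: a generic carries the shared fields, and each specialized node inherits from it while adding its own mandatory attributes. The names and attribute kinds below are illustrative, not copied from any shipped schema.

```yaml
# Illustrative sketch of polymorphism via generics and inheritance.
generics:
  - name: GenericDevice
    namespace: Infra
    attributes:
      - name: name
        kind: Text
        unique: true          # every device, whatever its type, needs a name
nodes:
  - name: Router
    namespace: Infra
    inherit_from: ["InfraGenericDevice"]
    attributes:
      - name: management_ip   # mandatory for routers...
        kind: IPHost
  - name: PatchPanel
    namespace: Infra
    inherit_from: ["InfraGenericDevice"]
    attributes:
      - name: rack_units      # ...while patch panels care about physical size
        kind: Number
```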

Infrahub's graph database enables schema flexibility

Infrahub is built on a graph database. This choice of database matters more than you might think for the schema.

Relational databases like Postgres or MySQL organize data in tables. The schema is defined at the heart of the database, in what you might call kernel space if you're borrowing from Linux terminology. This creates rigidity since any schema changes modify the core structure of the data model.

Graph databases don't use tables. Instead, they use a much more flexible model where new objects and relationships can be added without changing anything about the existing data. The schema exists and it's enforced but it lives in user space rather than being baked into the database core.

nodes and edges in a graph database
Information is represented as nodes and edges in a graph database

For Infrahub, this architectural choice is fundamental. If the goal is having a schema that can evolve alongside your infrastructure, a relational database works against you. A graph database gives you the flexibility you need to extend or evolve your schema without sacrificing the validation that ensures data integrity.

Infrahub's version-controlled schemas allow safe collaboration

Version control isn't just for configuration files! It's for your entire infrastructure state, including the schema that defines how that state is structured. In Infrahub, schemas can be different on every branch.

Want to add a new device type with specific attributes? In a traditional system, you'd need to plan the change carefully, coordinate with everyone using the database, and hope you didn't break anything when you applied it to production.

In Infrahub, you simply create a branch. You modify the schema in that branch. You test it, validate it, make sure everything works as expected. When you're ready, you open a proposed change to merge your schema modifications into the main branch.

Infrahub applies the necessary migrations in your branch first. When you merge, those same migrations are applied to main. If something goes wrong, your production instance is never affected because the changes stay isolated in the branch.

This removes fear from schema evolution. Schema modifications become routine rather than high-stakes operations. It also means you can model just your core objects for a quick start, then easily add more over time.

Infrahub schema FAQs

Do I need to define the schema for my entire infrastructure to get started?
No, the opposite is true! We recommend you start small. Start with the core objects you need to automate right now. Define those with the attributes and relationships that matter for your immediate use cases, then expand as requirements grow. Because Infrahub decouples the schema from your automation stack, schema changes don't break downstream integrations.
Is defining a schema in Infrahub a complex task?
Your schema is defined in Infrahub as a YAML file. So you need to know basic YAML syntax but the learning curve is gentle. YAML is human-readable and structured around indentation. If you've worked with configuration files before, you'll pick it up quickly.
Do I have to start from scratch in defining my schema?
No! The Infrahub Schema Library includes templates you can use as starting points (for example, a VLAN schema) and tweak to make them yours. We also provide a wide range of extensions to cover more advanced use cases.
What's the relationship between schema and GraphQL in Infrahub?
Infrahub automatically generates a GraphQL API based on your schema. That means you don't write API code. Once you've defined your schema in Infrahub, you can use GraphQL to query your data, create and update objects, and navigate relationships.
How does Infrahub enforce schema constraints?
Infrahub validates all data operations against the schema in real time. If you try to create an object without a required field, the operation fails. If you try to create a relationship that violates cardinality constraints (like adding a second parent when the schema specifies one-to-one), Infrahub rejects it.

This happens at the API level, the UI level, and internally within Infrahub. There's no way to bypass schema validation. This is how Infrahub maintains data integrity even as schemas evolve.

Can I import or migrate schemas from other tools?
Infrahub doesn't have automatic importers for schemas from other tools but you can manually define schemas that match your existing data structures. If you're migrating from NetBox, Nautobot, or another source of truth, you'd define an Infrahub schema that represents the same objects and relationships. The data migration itself happens separately from schema definition. Infrahub Sync supports that data migration.

Ready to build with the Infrahub schema?

If you're ready to explore how the Infrahub schema can help solve your infrastructure data management challenges, here are some next steps:

  • Get detailed guidance on defining schemas, working with polymorphism, and managing schema evolution in the Infrahub documentation
  • Experiment with example schemas and see how Infrahub auto-generates UI and API capabilities in the Infrahub sandbox.
  • Get a walkthrough of the Infrahub schema in our technical livestream recording.
  • Talk with the OpsMill team about your specific use cases and schema requirements by booking a demo.

The post An Introduction to the Infrahub Schema (With FAQs) appeared first on OpsMill.

]]>
How to Build and Run Generators to Automate Network Provisioning https://opsmill.com/blog/automated-network-provisioning-workflows-generators/ Wed, 11 Feb 2026 14:44:17 +0000 https://opsmill.com/?p=1812 Learn how to build reusable workflows with Infrahub Generators to automatically provision new services, sites & devices.

The post How to Build and Run Generators to Automate Network Provisioning appeared first on OpsMill.

]]>
How often do you find yourself repeating the same steps when deploying something in your network?

Every time you need to add a new site, onboard a device, or provision a service, you end up creating the same set of objects, connecting them, and allocating resources. The steps are predictable, but you still have to do them manually each time.

This is exactly the problem Infrahub Generators are designed to solve. Instead of repeating the same work over and over, you define the logic once and reuse it whenever needed. You tell Infrahub what objects to create, how they should relate to each other, and what data to use.

The next time you need to deploy something, the Generator runs and creates the required resources automatically. You no longer have to worry about missing a step or doing things differently each time.

What we'll cover

In this post, we'll start with the basics of how Generators work. We'll look at the key components that make up a Generator and how they fit together. Then we'll walk through a simple example of automated network provisioning to see Generators in action.

This post is based on Infrahub v1.7.0, but you can follow along if you're on a different version. This post assumes you're somewhat familiar with Infrahub, the Python SDK, and the CLI tool infrahubctl.

If you're new to Infrahub, don't worry. You can still follow along. We'll keep the example simple, so it's easy to understand and follow.

How do Infrahub Generators work?

An Infrahub Generator is a piece of Python code that takes a high-level request and translates it into a full technical implementation. Spinning up a new site, provisioning a service or onboarding a customer, the Generator combines your input with the logic you define and creates all the necessary objects automatically.

illustration of generator workflow to provision a new network

For example, let's say you need to provision a dedicated Internet circuit for a customer. You may need to allocate a VLAN from your pool, create a BGP session with the customer's ASN, assign interface IPs, configure the edge router and so on.

Doing this manually, you'd have to touch multiple components and create each object by hand. With a Generator, you capture that entire workflow once.

The next time a customer orders a dedicated circuit, you just create the service request, and the Generator manages the rest. This could even be triggered automatically through a self-service portal or a service catalog, removing any manual steps entirely.

There are four main components that make up a Generator. We'll cover them in detail with examples as we go through this post.

  • The generator definition specifies what your Generator does. This is where you name your Generator, point to your Python file, and define which group it targets.
  • The group defines which objects trigger your Generator. When you assign objects to a Generator's target group, those objects become inputs for automation.
  • A GraphQL query defines the input data your Generator needs. Infrahub executes your query and passes the results to your Generator.
  • Business logic is where you define the rules for what gets created. This is the Python code that uses Infrahub's SDK to create objects, update relationships, and allocate resources.

When you run the Generator, Infrahub loops through each object in the target group, runs your query to fetch the relevant data, and then executes your Python logic to create the new objects.

You can also trigger a Generator individually for a specific object, either manually or via event triggers. With Generators, you define the logic once. The next time you need to deploy something, you just create the trigger object and run the Generator. Everything else is managed automatically.

Generators are designed to be idempotent, meaning we can run them multiple times without creating duplicates or inconsistencies. When a Generator runs, it updates existing objects as needed and removes anything that's no longer required. However, this depends on how we write the Python code. If we break idempotency in our logic, the Generator may produce unexpected results.
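The identifier-based pattern can be illustrated in miniature with plain Python: an upsert keyed on a stable identifier changes state the first time and is a no-op on every later run. This is a conceptual sketch of the idea, not the SDK's actual implementation.

```python
# Miniature model of idempotent allocation: the identifier, not the call
# count, determines what exists. Running the "generator" twice yields the
# same state as running it once.
def allocate(pool: dict, identifier: str, value: str) -> str:
    """Return the existing allocation for identifier, or create one."""
    return pool.setdefault(identifier, value)

pool: dict[str, str] = {}
first = allocate(pool, "London", "10.128.0.0/16")
second = allocate(pool, "London", "10.129.0.0/16")  # no-op: already allocated

print(first, second)  # both calls return 10.128.0.0/16
```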

Example scenario for automated network provisioning

Imagine you work at Otter Bank, a small but rapidly growing bank. You and your team have been tasked with rolling out 100 new branch offices over the next six months. To deliver this project efficiently, you want to take a cookie-cutter approach and use the same type of hardware across all locations.

The design can change per site; one branch could have two ISPs, another might have just one. We might have a different number of switches or firewalls depending on the size of the office. But at the core, each branch typically follows the same design patterns. The steps are predictable, and we should have an automated way to provision them instead of doing everything manually each time.

To keep the example simple, we'll look at how to provision a new site with a parent prefix, two subnets, and a pair of firewalls with allocated management IPs. Once you understand how this works, you can expand the logic to handle more complex scenarios.

Our goal is that when we need to provision a new network site, we simply create an Infrahub branch (similar to a Git branch) and add the site we want. That's it. The Generator takes care of everything else.

Based on the logic we define, the Generator allocates a /16 parent prefix from Otter Bank's IP pool, creates two /24 subnets inside it for management and user traffic, provisions two firewalls, and assigns management IPs to each firewall. All of this happens automatically without any manual intervention.

For Otter Bank, this means you can focus on the physical rollout while the Generator handles all the IP allocations and device provisioning. Rolling out 100 sites in six months no longer means repeating the same steps 100 times. We just create the sites and let the Generator do the heavy lifting.

Set up the baseline for the Generator

First, we need a running Infrahub instance. We've covered Infrahub installation in multiple posts, so we're not going to go over the basics here. The Infrahub documentation has guidance if you need it.

You'll also need to install the infrahubctl CLI tool. This is a command-line utility that lets you interact with Infrahub programmatically.

uv add 'infrahub-sdk[all]' 
source .venv/bin/activate 

export INFRAHUB_ADDRESS=http://localhost:8000 
export INFRAHUB_API_TOKEN="06438eb2-8019-4776-878c-0941b1f1d1ec"

Next, we will import some schemas. Infrahub has a Schema Library that provides pre-built schemas for common use cases. For this example, we'll import the base schema and a minimal location schema.

git clone https://github.com/opsmill/schema-library.git

infrahubctl schema load schema-library/base/ 
infrahubctl schema load schema-library/extensions/location_minimal/

Next, we'll create a supernet for all the branch sites. The idea is that we allocate a /12, and each site will get its own /16 from this parent pool. You can create this using a number of methods, but let's just do it from the Infrahub GUI by navigating to IPAM. For this example, we'll create 10.128.0.0/12 as our supernet.

created prefix in generator workflow

Create a group for the Generator

Next, we need to create a group. As we covered earlier, the Generator runs against objects that are members of a specific group. Let's create a group called branch_office and later add our sites to this group.

You can create the group in multiple ways. To create it via the GUI, navigate to Object Management and then Groups. From here, select Standard Group as the Kind, give it a Label and leave everything else at the defaults.

creating a group in infrahub generator workflow

You can also use infrahubctl to load it from a YAML file rather than creating the group via the GUI. To do that, create a file (objects/01_groups.yml) with the following content:

---
apiVersion: infrahub.app/v1
kind: Object
spec: 
  kind: CoreStandardGroup
  data:
    - name: branch_office

Then run the following command to load it:

infrahubctl object load objects/01_groups.yml

Create a resource pool for the Generator

Next, and finally for the baseline configuration, we need to create a resource pool in Infrahub's Resource Manager. The Resource Manager can automatically allocate resources from a pool. In our case, we want it to allocate the next available /16 prefix from the parent /12 pool whenever we provision a new network site.

When our Generator asks for a new prefix, the Resource Manager finds the next available /16 from 10.128.0.0/12 and allocates it.

To create a new pool, navigate to Object Management, then Resource Manager, and create a new IP prefix pool. We'll name it branch_office_16, set the default prefix length to 16, and add the 10.128.0.0/12 prefix as the resource.

main prefix pool created in infrahub generator workflow

Create a GraphQL query

For the Generator to do its job, it needs information about the target object it's operating on. The GraphQL query provides exactly that, fetching the relevant input details the Generator requires.

In our case, when the Generator runs for a site, it needs to know the site's name and shortname. The name is used to create descriptions and name the prefixes and pools. The shortname is used for naming the firewall. Without this query, the Generator would have no information about the site it's provisioning.

query SiteQuery($shortname: String!) {
  LocationSite(shortname__value: $shortname) {
    edges {
      node {
        __typename
        id
        name {
          value
        }
        shortname {
          value
        }
      }
    }
  }
}

The query (queries/SiteQuery.gql) takes the shortname as a parameter and returns the site's id, name, and shortname. This data is then passed to the Generator class, which uses it to create all the necessary objects.
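To make that data flow concrete, here's roughly the shape of the response and how the values would be pulled out. The payload below is a hand-written example for shortname LDN, not captured output; with convert_query_response enabled, the Generator actually receives these as node objects rather than a raw dict.

```python
# Hypothetical query result, shaped like a GraphQL edges/node response.
response = {
    "LocationSite": {
        "edges": [
            {"node": {"__typename": "LocationSite", "id": "1234",
                      "name": {"value": "London"},
                      "shortname": {"value": "LDN"}}}
        ]
    }
}

# Walk the edges/node wrapper to reach the site's attributes.
node = response["LocationSite"]["edges"][0]["node"]
print(node["name"]["value"], node["shortname"]["value"])  # London LDN
```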

Create the .infrahub.yml configuration file

So we created a GraphQL query, and we'll also work on the Generator class shortly. But before that, let's look at the configuration file that ties everything together.

The .infrahub.yml file is a central manifest that Infrahub uses to understand how all the pieces fit together. It tells Infrahub where to find the GraphQL query, where the Generator class lives, what group to target, and how to pass parameters between them.

Think of it as the glue that connects your query, your Generator logic, and your target objects. This file is also used for other purposes, like Transformations and checks. For now, we'll start the .infrahub.yml file with the query we created in the previous section.

---

queries:
  - name: BranchSiteQuery
    file_path: queries/SiteQuery.gql

So far, we've created the .infrahub.yml file and the GraphQL query. Here's how our directory looks at this point. (We're working in a directory called infrahub_generator.)

infrahub_generator/
├── .infrahub.yml
└── queries/
    └── SiteQuery.gql

Build the Generator

Before we dive into the Generator code, let's first create the site we want to provision. Start by creating a new Infrahub branch. We'll call it new_london_site.

creating a new branch in infrahub generator workflow

Navigate to Location and create a new site. Set the name to London and the shortname to LDN. The most important thing here is to add the site to the branch_office group we created earlier. Without this, the Generator won't pick up the site.

You can also automate this step using event rules and actions. For example, you can define a trigger rule that automatically adds a site to the branch_office group whenever a new site is created.

creating a new site in the infrahub generator workflow

Now the fun part: working on the Generator class. This is where we define the logic for what gets created when we provision a new site. Here's what we want the Generator to do:

  1. Allocate the next available /16 prefix from the top-level IP prefix pool (10.128.0.0/12) we created earlier.
  2. Create a new IP prefix pool from that /16 so it can allocate /24 subnets from it.
  3. Allocate two /24 subnets from the pool, one for Management and one for User subnets.
  4. Create an IP address pool from the Management subnet.
  5. Reserve the first two available IPs from the Management pool.
  6. Provision two firewall devices and allocate two more management IPs for them.

The point of creating these pools is to make the allocation dynamic. We want to avoid creating anything manually or hardcoding values.

For example, if we have 10.128.0.0/16 allocated to a site, we could hardcode 10.128.0.0/24 for Management and 10.128.1.0/24 for User. But instead of doing that, we create an IP prefix pool from the /16 using the Resource Manager and then ask it to give us the next two available /24s. The pool handles the allocation for us.
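Conceptually, the pool is carving the next available /24s out of the site's /16 in order, which you can see in miniature with Python's standard ipaddress module:

```python
from ipaddress import ip_network

# Conceptual view of what the Resource Manager does for one site:
# take the site's /16 and hand out /24 subnets in order of availability.
site_prefix = ip_network("10.128.0.0/16")
subnets = site_prefix.subnets(new_prefix=24)  # generator of /24s

management = next(subnets)  # 10.128.0.0/24
user = next(subnets)        # 10.128.1.0/24
print(management, user)
```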

infrahub generator prefix tree

The same goes for management IPs. Rather than manually assigning 10.128.0.3 to the first firewall and 10.128.0.4 to the second, we create an IP address pool from the Management subnet and let it allocate the next available addresses. This way, everything stays dynamic and consistent, no matter how many sites we provision.

Now, let's look at the Generator itself. Don't worry too much about the code. It's simpler than it looks, and we'll walk through it step by step.

generators/BranchSite.py

from __future__ import annotations
from infrahub_sdk.generator import InfrahubGenerator

DEFAULT_RESOURCE_POOL = "branch_office_16"

SITE_PREFIXES = [
    { "name": "Management Prefix", "status": "active", "member_type": "address", "prefix_length": 24, "pool": True},
    { "name": "User Prefix", "status": "active", "member_type": "address", "prefix_length": 24, "pool": False}
]

class BranchGenerator(InfrahubGenerator):
    async def generate(self, data: dict) -> None:
        location = self.nodes[0]
        resource_manager = await self.client.get("CoreIPPrefixPool", name__value=DEFAULT_RESOURCE_POOL)
        
        parent_prefix = await self.client.allocate_next_ip_prefix(
            resource_pool=resource_manager,
            identifier=location.name.value,
            data={
                "description": f"{location.name.value} Office Parent Prefix",
                "location": location.id,
                "status": "reserved",
                "role": "supernet",
                "member_type": "prefix"
            }
        )

        branch_office_pool = await self.client.create(
            kind="CoreIPPrefixPool",
            name=f"{location.name.value} Branch Office Pool",
            default_prefix_length=24,
            default_prefix_type="IpamPrefix",
            ip_namespace={"hfid": ['default']},
            resources=[parent_prefix]
        )

        await branch_office_pool.save(allow_upsert=True)
        
        for prefix in SITE_PREFIXES:
            office_subnet = await self.client.allocate_next_ip_prefix(
                resource_pool=branch_office_pool,
                identifier=f"{location.name.value} {prefix['name']}",
                prefix_length=prefix["prefix_length"],
                data={
                    "description": f"{location.name.value} {prefix['name']}",
                    "location": location.id,
                    "status": prefix["status"],
                    "member_type": prefix["member_type"],
                    "prefix_length": prefix["prefix_length"]
                }
            )
            if prefix["pool"]:
                ip_pool = await self.client.create(
                    kind="CoreIPAddressPool",
                    name=f"{location.name.value} {prefix['name']} Pool",
                    default_address_type="IpamIPAddress",
                    default_prefix_length=32,
                    resources=[office_subnet],
                    ip_namespace={"hfid": ['default']},
                )
                await ip_pool.save(allow_upsert=True)

                # Reserve the first 2 IP addresses for gateway and network devices
                for i in range(2):
                    await self.client.allocate_next_ip_address(
                        resource_pool=ip_pool,
                        identifier=f"Reserved for {prefix['name']} {i+1}",
                        data={"description": f"Reserved IP {i+1} for {prefix['name']}"}
                    )
        
        # Creating two firewalls
        management_pool = await self.client.get("CoreIPAddressPool", name__value=f"{location.name.value} Management Prefix Pool")

        for i in range(1, 3):
            device = await self.client.create(
                kind='DcimDevice',
                name=f"{location.shortname.value}-FW-{i:02d}",
                status='active',
                device_type={'hfid': 'PA-440'},
                platform={'hfid': 'PAN-OS'},
                primary_address=management_pool,
                location=location.id,
            )
            await device.save(allow_upsert=True)

We start by importing the InfrahubGenerator class from the SDK. This is the base class that the generator inherits from. We also define a couple of constants at the top.

DEFAULT_RESOURCE_POOL is the name of the pool we created in the Resource Manager earlier (the top-level /12 prefix). We could also take this further by adding a relationship to the pool on the site object itself, making the Generator more flexible by allowing different sites to use different resource pools.

Next, SITE_PREFIXES is a list that defines the subnets we want to create for each site. Each entry specifies the name, status, prefix length, and whether we should create an IP address pool from it.

The BranchGenerator class inherits from InfrahubGenerator and implements the generate method. This is the method that Infrahub calls when running the Generator. It receives a data parameter which contains the result of the GraphQL query.

The first thing we do inside the generate method is access the target object. When the Generator runs, it populates self.nodes with the objects returned by the query. Since we're querying a single site, we grab the first item with self.nodes[0]. This gives us access to the site's attributes like name and shortname. Note that the GraphQL query must include id and __typename fields for this to work.

Next, we fetch the resource pool using self.client.get(). This retrieves the CoreIPPrefixPool we created earlier, called branch_office_16. With the resource pool in hand, we call allocate_next_ip_prefix() to get the next available /16 from the pool. We pass in an identifier (the site name) and additional data like the description, location, status, and role. When we specify an identifier, we can run allocate_next_ip_prefix() multiple times, and it will not allocate additional prefixes. The identifier ensures idempotency.

Once we have the parent /16 prefix for the site, we create a new IP prefix pool specific to this branch office. This pool will be used to allocate the /24 subnets. We use self.client.create() to create the pool object. We then loop through the SITE_PREFIXES list to create the two /24 subnets. For each entry, we allocate a prefix from the pool we just created. If the entry has pool set to True, we also create an IP address pool from that subnet. In our case, only the Management subnet gets an IP address pool. We then reserve the first two IPs from the Management pool by calling allocate_next_ip_address().

Finally, we provision the two firewall devices. Instead of manually allocating an IP address and then assigning it to the device, we can simply pass the Management pool directly to the primary_address field. Infrahub is smart enough to allocate the next available IP from the pool automatically. The device names use the site's shortname, so for London, they would be LDN-FW-01 and LDN-FW-02.

Update the .infrahub.yml file

Now that our Generator is ready, we can update the .infrahub.yml file to include the Generator definition. We add a generator_definitions section that tells Infrahub everything it needs to know about our Generator.

---

queries:
  - name: BranchSiteQuery
    file_path: queries/SiteQuery.gql

generator_definitions:
  - name: generate-branch-site
    file_path: generators/BranchSite.py
    query: BranchSiteQuery
    targets: branch_office
    parameters:
      shortname: shortname__value
    class_name: BranchGenerator
    convert_query_response: true

The name gives the Generator a unique identifier. The file_path points to the Python file containing our Generator class. The query references the GraphQL query we defined earlier. The targets specify which group the Generator should run against, in our case branch_office.

The parameters section maps the query variable to the object attribute. Here we're saying that the shortname parameter in our GraphQL query should be populated with the shortname__value from the target object. This is how Infrahub knows to pass the site's shortname when running the query.

The class_name tells Infrahub which class to instantiate from the Python file. Finally, convert_query_response set to true means the query response will be automatically converted into node objects, which is why we can access self.nodes in our Generator class. Here's how our directory structure looks now with all the pieces in place.

infrahub_generator/
├── .infrahub.yml
├── generators/
│   └── BranchSite.py
└── queries/
    └── SiteQuery.gql

Test and verify the Generator

Now let's test running the Generator and see it in action. There are multiple ways to run the Generator. Let's start with the infrahubctl command.

infrahubctl generator generate-branch-site shortname=LDN --branch new_london_site

This command tells Infrahub to run the Generator called generate-branch-site. We pass in the shortname LDN, which gets passed to our GraphQL query to fetch the London site. The --branch flag specifies which Infrahub branch to run against. Remember, we created the new_london_site branch earlier and added the London site to it.

When you run this command, the Generator kicks off and creates all the objects we defined in our logic. It allocates the /16 prefix from the parent pool, creates the branch office pool, allocates the two /24 subnets, creates the Management IP address pool, reserves the first two IPs, and provisions the two firewalls with their management IPs. All of this happens automatically based on the logic we captured in the Generator class.

resources created in the infrahub generator workflow

ip addresses in the infrahub generator workflow

And that's it. With a single command, we provisioned an entire branch office. All we had to do was create the site and let the Generator manage the rest.

Please note that we hardcoded the platform and device type in the Generator for the sake of simplicity. Ideally, you want to pass these as inputs or build logic to determine the right platform and device type based on the site requirements.
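One way to remove that hardcoding, sketched with made-up size tiers and model names, is to keep a mapping from site attributes to hardware and look the device type up instead of fixing it in the code. The tiers and models below are hypothetical.

```python
# Hypothetical lookup: pick the firewall model from a site "size" attribute
# instead of hardcoding the same device type for every branch.
SITE_TIERS = {
    "small": {"device_type": "PA-440", "firewall_count": 2},
    "large": {"device_type": "PA-1410", "firewall_count": 2},
}

def firewall_plan(site_size: str) -> dict:
    """Return the hardware plan for a site, defaulting to the small tier."""
    return SITE_TIERS.get(site_size, SITE_TIERS["small"])

plan = firewall_plan("large")
print(plan["device_type"])  # PA-1410
```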

Think about what we just achieved. We allocated prefixes, created pools, reserved IPs, and provisioned firewalls without manually touching any of those objects. If you need to add more logic, like creating interfaces, adding more devices, setting up upstream ISP connections, or defining cabling, you just add it to the Generator. The next time you spin up a site, all of that gets created automatically.

Running the Generator in the Infrahub GUI

So far we've kicked off the Generator from the CLI using infrahubctl, but there are other ways to do this. You can put the current directory structure, with all its files, into a Git repository and then add that repository to Infrahub. Infrahub will import the Generator and queries automatically, and from there you can trigger the Generator with a single click in the GUI.

To do that, I created a Git repository in GitLab (it can be any Git provider), created an authentication token, added the token to Infrahub credentials, and then added the Git repository. You can add a Git repository to Infrahub by navigating to Integrations and then Git Repositories. (The Infrahub documentation has more detail about working with Git repositories if you need it.)

[Image: Git repository in the Infrahub Generator workflow]

Now, if you open the Git repository we just added, you can see the Queries and the Generators listed. If you also have Transformations or Checks defined in your repository, they'll show up here as well.

[Image: contents of the Git repository in the Infrahub Generator workflow]

Now let's create a new Infrahub branch and add a new site called New York, with the shortname NY, to that branch. Make sure to add the site to the branch_office group.

Then navigate to Actions, then Generator Definitions, and select the Generator we imported in the previous step. You'll see a Run button at the top right. Click on it and under Select target nodes, select the New York location we just created.

[Image: Run Generator button in the Infrahub GUI]

Finally, click Run Generator, and the Generator will kick off and create all the resources for the new site.

[Image: new firewalls created in the Infrahub Generator workflow]

We can also run Generators by enabling events and actions. This allows the Generator to run automatically as soon as we create a site, without needing to manually run it.

Running Generators as part of Proposed Changes

There's yet another way to kick off a Generator: using Proposed Changes. A Proposed Change is similar to a pull request in Git. It lets you review changes made in a branch before merging them into the main branch. You can see what was added, modified, or deleted, and run checks and validations before the changes go live.

Using Proposed Changes, you can create an Infrahub branch, add a new site, then create a Proposed Change. The Generator will run automatically as part of the pipeline without you having to do anything. In the Tasks tab, you can see the Generator being executed along with other checks and validations.

[Image: Proposed Change workflow with the Infrahub Generator]

[Image: site prefixes added in the Infrahub Generator workflow]

Building from here

That's it for this post. We covered what Generators are, how they work, and walked through a simple example of automated network provisioning for new branch offices. The key takeaway is that Generators let you capture repetitive workflows once and reuse them whenever needed.

This was a simple example to get you started, but you can extend the logic to handle much more complex scenarios: add interfaces, configure routing, define cabling; the possibilities are endless. As always, start small, experiment, and build from there.

The post How to Build and Run Generators to Automate Network Provisioning appeared first on OpsMill.

An Introduction to Infrahub’s GraphQL Query Engine (With FAQs)
https://opsmill.com/blog/graphql-infrahub-faqs/ | Tue, 10 Feb 2026 20:42:08 +0000
Discover how Infrahub's GraphQL query engine streamlines infrastructure automation and lays a solid foundation for enabling AI.

If you've spent time building network automation, you know the drill. You need to generate a configuration file, so you write a script that makes an API call to get device data.

Then another call for interface details. Then another for IP addresses. Then you realize you need VLAN info, so that's another call. Before long, you're stitching together 20+ API responses just to populate a single Jinja template.

Each call adds latency, and each response includes fields you don't need. Your code becomes a maze of loops and data transformations, and when requirements change, you're debugging integration logic instead of solving infrastructure problems.

The fundamental issue is that traditional REST APIs force you to work on their terms. You hit predefined endpoints that return fixed data structures. If you need related data, you make another call. If you need to filter or transform that data, you do it in your application code.

Infrahub takes a different approach with GraphQL. Instead of chasing data across multiple endpoints, you describe exactly what you need in one request and get back precisely that structure.

For infrastructure automation, where you're constantly pulling interconnected data about devices, interfaces, IP addresses, and services, this is a huge advantage.

GraphQL in a nutshell

GraphQL is a query language, first developed at Facebook, that lets you specify the exact structure of the data you want to receive.

Instead of hitting multiple REST endpoints that each return fixed data structures, you send one request to a single GraphQL endpoint and get back precisely what you asked for.

Why Infrahub chose GraphQL as its query engine

Infrahub is built on a graph database, which means your infrastructure data is stored as interconnected nodes and relationships: devices connect to interfaces, interfaces have IP addresses, IP addresses belong to prefixes, and so on.

GraphQL was designed to query this kind of complex, interconnected data efficiently. That makes it a natural fit for infrastructure, where relationships matter just as much as individual objects.

What makes it particularly powerful in Infrahub is that the GraphQL API is dynamically generated based on your schema.

When you define your infrastructure data model in Infrahub, the system automatically creates corresponding GraphQL queries and mutations for you. You don't ever have to write API code or maintain endpoints. You define your schema, and Infrahub handles the rest.

If you add a custom object type—say, a new kind of network service or a specific device role—it's immediately available through GraphQL. Your custom data model gets a custom API, with no extra work required.

This schema-driven approach also means you get built-in documentation and type safety. Every query you write is validated against your schema, so you catch errors before they reach production.

The GraphQL advantage for AI

GraphQL's schema-driven design turns out to be particularly valuable as we move into an era where AI agents need to interact with infrastructure data.

The key advantage is that GraphQL provides machine-readable documentation through its schema. An AI agent can introspect a GraphQL API to discover what data is available, what types to expect, and how objects relate to each other, all without human guidance.

With a GraphQL interface, AI can discover available data and relationships automatically, curate the exact data slices it needs, and construct precise queries without over-fetching.
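
Introspection is part of the GraphQL specification itself, so any client (human or AI) can ask the server to describe its own schema without any Infrahub-specific knowledge. A minimal introspection query looks like this:

```graphql
{
  __schema {
    queryType {
      name
    }
    types {
      name
      kind
    }
  }
}
```

The response lists every type the API exposes, which an agent can then walk to discover fields and relationships.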

How the GraphQL query engine works in Infrahub

Unlike REST APIs that expose multiple endpoints for different resources, GraphQL uses a single endpoint: /graphql. Everything goes through that one entry point.

The GraphQL schema defines what queries are possible and what data types to expect. Because Infrahub generates this schema from your data model, you always know what's available and what structure your responses will have.

When you write a GraphQL query, you specify exactly which fields you want. Here's a simple example:

{
  InfraDevice {
    edges {
      node {
        name {
          value
        }
      }
    }
  }
}

This query retrieves just the names of your devices. Nothing more. If you also need interface information, you add it to the same query:

{
  InfraDevice {
    edges {
      node {
        name {
          value
        }
        interfaces {
          edges {
            node {
              name {
                value
              }
              status {
                value
              }
            }
          }
        }
      }
    }
  }
}

Now you're traversing relationships—from devices to their interfaces—all in one request. You could keep going deeper, adding IP addresses, connected neighbors, or any other related data your schema defines.

GraphQL supports two main types of operations:

  • Queries read data. They're what you use when you need to extract information for reports, dashboards, or configuration generation.
  • Mutations change data. They handle creating, updating, or deleting objects.

The structure of your response always matches the structure of your query, which makes parsing straightforward and predictable.
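
As a sketch of what a mutation looks like, here's a create operation in Infrahub's typical naming style. Treat InfraDeviceCreate and its fields as illustrative: the exact mutation names and input fields are generated from your own schema.

```graphql
mutation {
  InfraDeviceCreate(
    data: {
      name: { value: "lon-fw-03" }
    }
  ) {
    ok
    object {
      id
    }
  }
}
```

The response tells you whether the operation succeeded (ok) and returns the new object's id, which you can feed into follow-up queries.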

Common use cases for GraphQL in Infrahub

The GraphQL query engine shines when you need to gather complex, interconnected data quickly. Here are some scenarios where it makes a real difference.

  • Configuration generation: Efficiency gains are really obvious here. Generating a device configuration with a REST API means making separate calls for each bit of data you want, and getting back lots of data you don't need. With GraphQL, you write one query that requests all the data your Jinja template needs. You get exactly those fields in a single response, which you can feed directly into your template. It's faster to execute and much easier to maintain.
  • Topology mapping: GraphQL makes it straightforward to traverse network relationships. You can query a device, follow its interface connections to neighboring devices, and map out your topology in a single request. This is useful for building network diagrams, validating connectivity, or understanding impact before making changes.
  • Compliance and validation: Need to find all interfaces with a specific status? Or check whether every router has a loopback address configured? GraphQL filters let you target exactly what you're looking for across hundreds or thousands of objects. You can write queries that check for compliance with your standards and return only the exceptions that need attention.
  • Integration with CI/CD pipelines: When your automation pipeline needs infrastructure data to generate artifacts, GraphQL reduces the API overhead. Fewer calls mean faster pipelines and less complexity in your integration code. Since the GraphQL schema includes type information, it's also easier to validate that your pipeline is requesting the correct data.
  • Audit and troubleshooting: Infrahub's GraphQL API lets you query not just data but also metadata: who changed a value, when, and why. You can trace configuration values back to their source, whether that's a standard, an override, or an imported value from another system. This makes troubleshooting faster and gives you a clear audit trail for compliance.
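
As an example of the compliance case, a filtered query might look like the sketch below. The status__value filter follows Infrahub's attribute-filter convention; adjust the kind and attribute names to your own schema.

```graphql
{
  InfraInterface(status__value: "active") {
    count
    edges {
      node {
        name {
          value
        }
      }
    }
  }
}
```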

GraphQL query engine FAQs

Is GraphQL in Infrahub different from standard GraphQL?
Yes. The core concepts of queries, mutations, variables, and fragments work the same way, but the syntax for specific operations differs slightly because Infrahub uses its own conventions based on how the schema is structured. The Infrahub documentation provides examples and syntax guidance.
Do I need to know GraphQL to use Infrahub?
Not to get started. Infrahub provides a GraphQL playground in its UI with auto-complete and built-in documentation. You can explore your schema, build queries visually, and see the results immediately. Once you're comfortable, you can copy those queries into scripts for automation.
Can I use GraphQL with Python, Ansible, or Terraform?
Yes. GraphQL queries are sent via HTTP POST, so any tool that can make HTTP requests can use them. Infrahub also provides a Python SDK that simplifies querying and handling responses.
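Because it's just HTTP POST, even a few lines of standard-library Python are enough to talk to the endpoint. The URL and the X-INFRAHUB-KEY auth header below are assumptions to adjust for your deployment:

```python
# Minimal GraphQL-over-HTTP helper using only the Python standard library.
import json
import urllib.request


def build_graphql_payload(query, variables=None):
    """A GraphQL request is just a JSON body with 'query' and optional 'variables'."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return payload


def run_query(url, query, token, variables=None):
    """POST a query to a GraphQL endpoint and return the decoded JSON response."""
    req = urllib.request.Request(
        url,  # e.g. "http://localhost:8000/graphql" -- placeholder
        data=json.dumps(build_graphql_payload(query, variables)).encode(),
        headers={
            "Content-Type": "application/json",
            "X-INFRAHUB-KEY": token,  # assumption: swap for your auth scheme
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Building a payload without sending it:
payload = build_graphql_payload(
    "query ($shortname: String!) { __typename }",
    {"shortname": "LDN"},
)
print(payload["variables"]["shortname"])  # -> LDN
```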
Is the GraphQL endpoint exposed publicly?
Typically no. GraphQL endpoints are usually internal-facing. Since clients define their own queries, an unsecured public endpoint could allow expensive or malicious queries. You should protect it with authentication and network access controls.

Ready to try GraphQL in Infrahub?

How to Know If Your NSoT Has Boundary Issues
https://opsmill.com/blog/nsot-boundary-issues/ | Mon, 09 Feb 2026 18:59:56 +0000
5 signs your network source of truth has boundary issues. A field guide to diagnosing unhealthy intent data relationships.


aka A Field Guide to Unhealthy Intent Data Relationships

Your network source of truth (NSoT) is supposed to bring order, consistency, and clarity to your automation. But if your platform explodes when you change a schema, requires a week of testing for a patch release, or leaves you stuck on a 2021 version, you may be in a toxic relationship.

In other words, your NSoT has boundary issues.

As in human relationships, weak boundaries in intent data systems can lead to emotional burnout, broken trust, failed initiatives, and automation stagnation.

5 signs your NSoT has boundary issues


1. Everything is connected. And not in a good way.

A schema tweak breaks a plugin. That plugin powers a config template. That template drives an orchestration workflow. Suddenly, changing a single field can reroute your entire deployment pipeline.

📋 Diagnosis: Enmeshment. You have no separation of concerns.

[Image: bulletin board with notes for platform, schema, APIs, and plugins, all connected by a snarl of red thread]


2. Your NSoT needs constant reassurance.

A patch update requires a month of testing. A minor version bump becomes a cross-team event. You’re scheduling therapy sessions for your CI pipeline.

📋 Diagnosis: Insecure attachment to the schema.

[Image: worried database texting messages like, "Should we test again?"]


3. Any change turns into a fight.

Need to support a new site type or vendor? Great. But to do it, you have to reverse-engineer your own tooling stack and hope nothing breaks downstream.

📋 Diagnosis: Boundary collapse between schema and application logic.

[Image: engineer making a simple change request; angry NSoT with arms crossed says NO]


4. You’re stuck in the past.

You haven’t upgraded your NSoT in years because you’re afraid everything will break. And your fears are correct. Your plugins depend on undocumented schema internals written by someone who left 18 months ago.

📋 Diagnosis: Version trauma. You’re emotionally (and technically) frozen.

[Image: NSoT database covered in frost and icicles, frozen in time]


5. You and others around you are losing trust in the data.

The schema is so customized (and so fragile) that engineers start bypassing it. Intent data starts moving to other systems.

📋 Diagnosis: Communication breakdown. Your source of truth is no longer true.

[Image: sad database character wearing a "source of truth" sign with question marks spray-painted across it]


What healthy boundaries look like

Boundaries are how systems grow without becoming brittle. A healthy, maintainable NSoT platform should include:

👍🏼 Decoupled architecture: Schema, application logic, and rendered outputs are separated with clear, versioned contracts.

👍🏼 Composable extensions: Plugins don’t rely on internal structures and can evolve independently.

👍🏼 Stable upgrade paths: New platform versions don’t require rewriting your automation.

👍🏼 Schema independence: Your data model isn’t hostage to the product version.

👍🏼 Trustworthy data: Engineers rely on the NSoT as the system of record, not as an afterthought.

A platform that respects boundaries gives you velocity without drama.

If your NSoT makes you feel like every change is a crisis, it’s not you—it’s the architecture. Choose platforms built for change, not control.

Your infrastructure deserves a healthy relationship.
