Francesco Del Re’s Blog (engineering87.github.io)

Cell‑Based Architecture: pragmatic isolation at scale
Francesco Del Re · 2026-02-18

Large systems fail for predictable reasons: shared bottlenecks, shared state, and shared deployment units. Even when services are “small”, the operational failure domain often remains one big bucket. A cell-based architecture breaks that bucket into smaller ones. The workload is replicated into isolated cells and traffic is partitioned across them, so failures and rollouts can be managed cell by cell.

This article keeps the idea simple and practical. You’ll get a clear definition, reasons to use (and not use) the approach, and a generic .NET reference you can adapt to almost any context.

What a “cell” really is

A cell is not “a microservice” and it’s not “a region”. A cell is a self‑sufficient slice of your workload that can serve a subset of traffic on its own. Think of it as a small copy of the system’s operational shape: API entry points, background workers if you have them, a cache if you need it, and (this is the part people often underestimate) its own state or a clearly separated state partition. The cell should be something you can scale, deploy, throttle, and even drain out of rotation without breaking the global system.

A typical view looks like this:

flowchart LR
  U[Clients] --> R[Cell Router / Gateway]
  R --> C0[Cell 00: API + Workers + Cache + Data partition]
  R --> C1[Cell 01: API + Workers + Cache + Data partition]
  R --> C2[Cell 02: API + Workers + Cache + Data partition]
  R --> CX[...]

The router decides which cell receives a request. The decision is usually driven by a partition key: tenant id, account id, region, cohort, or any other stable attribute that matches how your traffic naturally splits.

What cell‑based architecture is not

It helps to be explicit about what CBA doesn’t magically solve. It isn’t a synonym for microservices. You can have dozens of microservices and still have one shared database and one shared failure domain. It also isn’t automatically “multi‑region”; cells can live inside a single region if your main goal is isolation rather than geo redundancy. And it isn’t Domain‑Driven Design. DDD partitions by meaning, ownership, and business language. Cells partition by operational isolation and blast radius. They can coexist nicely, but they answer different questions.

Why teams adopt it

The main value of CBA is not theoretical elegance. It’s everyday operational safety. When you isolate the workload into cells, you get failure containment almost for free. If cell‑03 is having trouble (maybe a bad deployment, maybe a hot tenant, maybe a misbehaving downstream dependency), you can reduce or remove traffic to that one cell while the rest keeps working. Capacity may go down, but the system doesn’t collapse into a single global incident.

Deployments become calmer as well. Instead of releasing to the whole fleet, you release to a single “canary cell”, observe, and then promote. This sounds like canary deployments in general, but the difference is that a cell gives you a bounded unit where not only the API binaries change: the local cache, queues, workers, and data partition change and behave together. You are testing a full slice of reality.

Finally, scaling becomes more predictable. If load spikes are localized, which is often true in multi‑tenant systems, you can scale the impacted cells without resizing everything else, which is healthier for cost and for operational focus.

When it’s worth the cost

Cell‑based architecture is most compelling when your workload already “wants” to be partitioned. If you serve multiple tenants or customer groups with different behaviors, CBA gives you a clean way to prevent one from hurting the others. If you operate in a regulated or mission‑critical environment where a broad outage is unacceptable, cells provide an isolation boundary that is easy to reason about in incident response. And if deployments are routinely stressful because a change can take down too much at once, the canary‑cell rollout model is a concrete way to reduce risk without slowing delivery. In other words, CBA shines when you can answer two questions with confidence: “what is my partition key?” and “can I keep state isolated per partition?”

When you should not use it

CBA is not cheap. The independence you gain is paid for in duplication.

If your system is small enough that failures are manageable with simpler patterns, cells are likely overkill. If your traffic does not partition naturally, routing becomes arbitrary, and you’ll spend your time arguing about why a request should go to cell‑04 rather than cell‑07. If you cannot isolate state and you’re not ready to change that, cells won’t deliver their promise; you’ll end up with replicated stateless tiers all depending on the same stateful bottleneck.

A good practical check is this: if your “cells” share a database in a way that allows a global lock, a global schema migration, or a global saturation to take everything down, then you don’t have cell isolation yet. You have replicas.

The part that makes or breaks the design: state

People often start from routing and deployments because they’re visible and exciting. The real architecture work is state. The cleanest model is database per cell, because it gives you true isolation and straightforward failure boundaries. The more common compromise is a shard per cell: same database technology, but strict partitioning so that each cell owns its slice. Some teams start with a shared database and a strong partitioning discipline as a transition phase, but you should treat that as a temporary step. Shared state tends to reintroduce shared fate over time. State isolation also implies a mindset shift: cross‑cell synchronous calls should be rare. When information needs to flow between partitions, asynchronous integration (events, outbox/inbox patterns, projections) keeps the isolation boundary intact.
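As a sketch of that asynchronous flow, here is a minimal outbox contract and publisher loop. The type names (`OutboxMessage`, `IOutboxStore`, `OutboxPublisher`) are illustrative, not a specific library’s API; the store would typically be a table in the cell’s own database, written in the same transaction as the domain change.

```csharp
using Microsoft.Extensions.Hosting;

// An outgoing event recorded alongside the domain write, in the same local transaction.
public sealed record OutboxMessage(
    Guid Id,
    string Topic,                    // e.g. "tenant.updated"
    string Payload,                  // serialized event
    DateTimeOffset CreatedAtUtc);

public interface IOutboxStore
{
    // Called inside the same transaction as the domain write.
    Task EnqueueAsync(OutboxMessage message, CancellationToken ct);

    // Called by the background publisher.
    Task<IReadOnlyList<OutboxMessage>> GetUnpublishedAsync(int batchSize, CancellationToken ct);
    Task MarkPublishedAsync(Guid id, CancellationToken ct);
}

// Background worker that drains the outbox and hands messages to a broker-specific publish delegate.
public sealed class OutboxPublisher : BackgroundService
{
    private readonly IOutboxStore _store;
    private readonly Func<OutboxMessage, CancellationToken, Task> _publish;

    public OutboxPublisher(IOutboxStore store, Func<OutboxMessage, CancellationToken, Task> publish) =>
        (_store, _publish) = (store, publish);

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            foreach (var msg in await _store.GetUnpublishedAsync(100, ct))
            {
                await _publish(msg, ct);                 // delivery is at-least-once
                await _store.MarkPublishedAsync(msg.Id, ct);
            }
            await Task.Delay(TimeSpan.FromSeconds(1), ct);
        }
    }
}
```

Because publishing can repeat after a crash between `_publish` and `MarkPublishedAsync`, consumers on the other side must be idempotent.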

A generic .NET reference: the shape of the solution

This section stays deliberately use‑case‑agnostic. Whether you run on Kubernetes, VMs, or managed services, the logical responsibilities remain the same. You’ll typically have a cell router at the edge and a replicated cell workload behind it. A control plane is optional at the beginning, but it becomes valuable once you need rebalancing and operational automation. In .NET, a pragmatic router can be built with YARP (Yet Another Reverse Proxy). YARP is a production‑grade reverse proxy that supports routing, transforms, health checks, and dynamic configuration.

The basic idea is simple:

  1. determine a partition key for the request (for example, X-Tenant-Id)
  2. compute or look up a cell id
  3. route to the cell’s backend cluster
  4. make telemetry and failure handling cell‑aware

Routing strategy: hashing first, mapping later

There are two common ways to assign cells. A deterministic hash is the easiest place to start. It requires no lookup store and it gives you stable placement across restarts. The trade‑off is rebalancing: moving a tenant between cells becomes a project. A lookup table (tenant → cell) is more flexible. It enables hot‑tenant isolation and controlled migrations, but it introduces a dependency you must make highly available and fast. In practice, teams often start with hashing and later add a mapping layer once operational needs demand it. The reference below uses hashing because it’s the smallest thing that can work.
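To illustrate that migration path, here is a minimal resolver sketch that consults an optional mapping table first and falls back to deterministic hashing. The `ICellMapStore` interface is hypothetical; the hashing mirrors the middleware shown later.

```csharp
using System.Security.Cryptography;
using System.Text;

// Hypothetical lookup store: returns a cell id for a partition key, or null if unmapped.
public interface ICellMapStore
{
    string? TryGetCell(string partitionKey);
}

public sealed class CellResolver
{
    private readonly ICellMapStore? _map; // optional; start without it
    private readonly int _cellCount;

    public CellResolver(int cellCount, ICellMapStore? map = null) =>
        (_cellCount, _map) = (cellCount, map);

    public string Resolve(string partitionKey)
    {
        // 1) An explicit mapping wins: this is what enables hot-tenant isolation and migrations.
        var mapped = _map?.TryGetCell(partitionKey);
        if (mapped is not null) return mapped;

        // 2) Otherwise fall back to stable hashing.
        var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(partitionKey));
        var idx = (BitConverter.ToInt32(bytes, 0) & int.MaxValue) % _cellCount;
        return $"cell-{idx:00}";
    }
}
```

The nice property of this shape is that adding the mapping layer later is non-breaking: every unmapped tenant keeps its hashed placement, and moving a tenant means adding one row to the map.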

Building the Cell Router in .NET with YARP

Minimal proxy configuration

Here is a minimal appsettings.json excerpt that defines one cluster per cell. The addresses are placeholders; in Kubernetes they could be service DNS names, elsewhere they can be regular endpoints.

{
  "ReverseProxy": {
    "Routes": {
      "all": {
        "ClusterId": "byCell",
        "Match": { "Path": "/{**catch-all}" }
      }
    },
    "Clusters": {
      "cell-00": {
        "Destinations": { "d1": { "Address": "http://cell-00-api/" } }
      },
      "cell-01": {
        "Destinations": { "d1": { "Address": "http://cell-01-api/" } }
      }
    }
  }
}

For a large number of cells you will not want to hardcode routes and clusters. You’ll generate them from a config store or service discovery. But starting simple helps you validate the pattern quickly.
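As a sketch of that generation step, assuming YARP’s in-memory configuration support (`LoadFromMemory`), you can build one route and one cluster per cell programmatically. The addresses are illustrative (for example, Kubernetes service DNS names):

```csharp
using Yarp.ReverseProxy.Configuration;

// Generate one route + one cluster per cell instead of hardcoding JSON.
static (IReadOnlyList<RouteConfig> Routes, IReadOnlyList<ClusterConfig> Clusters) BuildCellConfig(int cellCount)
{
    var routes = new List<RouteConfig>();
    var clusters = new List<ClusterConfig>();

    for (var i = 0; i < cellCount; i++)
    {
        var cellId = $"cell-{i:00}";

        routes.Add(new RouteConfig
        {
            RouteId = cellId,
            ClusterId = cellId,
            Match = new RouteMatch
            {
                Path = "/{**catch-all}",
                // Match the X-Cell-Id header injected by the assignment middleware.
                Headers = new[] { new RouteHeader { Name = "X-Cell-Id", Values = new[] { cellId } } }
            }
        });

        clusters.Add(new ClusterConfig
        {
            ClusterId = cellId,
            Destinations = new Dictionary<string, DestinationConfig>
            {
                ["d1"] = new() { Address = $"http://{cellId}-api/" }
            }
        });
    }

    return (routes, clusters);
}

// Wiring (sketch):
// var (routes, clusters) = BuildCellConfig(10);
// builder.Services.AddReverseProxy().LoadFromMemory(routes, clusters);
```

The same builder can later read its cell list from a config store or service discovery instead of a loop counter.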

Assigning a cell id in middleware

The router needs to compute a cell for each request. The snippet below reads a partition key from X-Tenant-Id, hashes it, and injects X-Cell-Id. This is intentionally generic: replace the header with whatever key makes sense in your system.

using System.Security.Cryptography;
using System.Text;

public sealed class CellAssignmentMiddleware
{
    private readonly RequestDelegate _next;
    private readonly int _cellCount;

    public CellAssignmentMiddleware(RequestDelegate next, int cellCount) =>
        (_next, _cellCount) = (next, cellCount);

    public async Task Invoke(HttpContext ctx)
    {
        var key = ctx.Request.Headers["X-Tenant-Id"].ToString();

        if (string.IsNullOrWhiteSpace(key))
        {
            ctx.Response.StatusCode = StatusCodes.Status400BadRequest;
            await ctx.Response.WriteAsync("Missing X-Tenant-Id.");
            return;
        }

        var cellId = ComputeCellId(key, _cellCount); // "cell-03"
        ctx.Items["cell.id"] = cellId;
        ctx.Request.Headers["X-Cell-Id"] = cellId;

        await _next(ctx);
    }

    private static string ComputeCellId(string key, int cellCount)
    {
        var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(key));
        var value = BitConverter.ToInt32(bytes, 0) & int.MaxValue;
        var idx = value % cellCount;
        return $"cell-{idx:00}";
    }
}

Routing to the correct cell

A straightforward technique is to have one route per cell that matches on X-Cell-Id. It is verbose but it is extremely easy to understand and debug. Later, once you have a control plane, you can switch to dynamic routing.

{
  "ReverseProxy": {
    "Routes": {
      "cell00": {
        "ClusterId": "cell-00",
        "Match": {
          "Path": "/{**catch-all}",
          "Headers": [{ "Name": "X-Cell-Id", "Values": ["cell-00"] }]
        }
      },
      "cell01": {
        "ClusterId": "cell-01",
        "Match": {
          "Path": "/{**catch-all}",
          "Headers": [{ "Name": "X-Cell-Id", "Values": ["cell-01"] }]
        }
      }
    }
  }
}

Wiring the router

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();

app.UseMiddleware<CellAssignmentMiddleware>(10); // cellCount
app.MapReverseProxy();

app.Run();

This is enough to demonstrate the core behavior: a request enters the router, gets assigned a cell, and is forwarded to the correct backend.

What changes inside the cell workload

Your APIs and workers inside a cell should feel like normal .NET services, with one important constraint: they must be cell‑aware for state and telemetry. If you run on Kubernetes, the simplest approach is to inject CELL_ID as an environment variable. On VMs, it can come from configuration. Either way, treat the cell id as part of the service identity. From there, select the correct data partition. The cleanest pattern is to resolve connection settings from a small “cell context” service. Whether that context picks a different connection string, schema, or shard key depends on your storage strategy, but the principle is the same: a cell must not quietly drift into shared state.
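A minimal version of that “cell context” might look like the sketch below. The names (`CellContext`, the `CELL_ID` variable, the per-cell connection string keys) follow the conventions described above but are otherwise illustrative.

```csharp
using Microsoft.Extensions.Configuration;

// The cell id is part of the service identity; storage settings are resolved from it.
public sealed class CellContext
{
    public string CellId { get; }

    public CellContext(IConfiguration config) =>
        CellId = Environment.GetEnvironmentVariable("CELL_ID")  // Kubernetes: injected env var
                 ?? config["Cell:Id"]                           // VMs: plain configuration
                 ?? throw new InvalidOperationException("CELL_ID is required.");

    // One connection string per cell, e.g. "ConnectionStrings:cell-03".
    public string ResolveConnectionString(IConfiguration config) =>
        config.GetConnectionString(CellId)
        ?? throw new InvalidOperationException($"No connection string for {CellId}.");
}

// Registration (sketch):
// builder.Services.AddSingleton<CellContext>();
```

Failing fast when the cell id or its connection string is missing is deliberate: a cell that silently falls back to a shared default is exactly the drift this section warns about.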

Making observability cell‑aware (and why it matters)

Once you introduce cells, incident response becomes “which cell is sick?”. That only works if your telemetry carries cell.id consistently. In .NET, OpenTelemetry is a practical default. Add the cell id as a resource attribute at startup, so every trace and metric inherits it. The exporters and backend are your choice; the key is that cell identity becomes a first‑class dimension.

using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var cellId = Environment.GetEnvironmentVariable("CELL_ID") ?? "cell-unknown";

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("sample.cell.api")
        .AddAttributes(new Dictionary<string, object> { ["cell.id"] = cellId }))
    .WithTracing(t => t
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter());

Even if you do not adopt OpenTelemetry today, keep the idea: every log line, trace, metric, and alert should know which cell it belongs to.

Rollouts that don’t spike your blood pressure

The most satisfying “cell moment” is the day you stop doing big‑bang releases. A simple operational loop is enough. Deploy the new version to one canary cell. Route a small cohort to it (or only internal traffic). Watch the basics: error rate, latency, saturation, and any business signals that matter. If it looks healthy, promote to the next cells in waves. If it doesn’t look healthy, drain the canary cell and roll back without impacting global traffic. You still have work to do, but you’ve turned a platform‑wide incident into a contained issue.

A short note on DDD

It’s common to worry that cells “fight” Domain‑Driven Design. In practice they don’t, as long as you don’t try to force one partitioning dimension to do everything. DDD tells you how to structure business ownership and boundaries. Cells tell you how to contain operational failures. Sometimes a cell will contain multiple bounded contexts because the partition key is tenant or region rather than domain. Other times a single bounded context might be replicated across cells. That is fine. The design becomes confusing only when you pretend these are the same concept.

What “good” looks like in practice

You know your cell‑based architecture is real when a broken cell behaves like a capacity issue, not like a correctness issue. You can drain it, keep the rest serving, and your operational tools can tell you, within minutes, what went wrong and where. If you can’t confidently drain a cell, or if draining a cell breaks global behavior, you’re not done yet. The main value of CBA is not the diagram; it’s the operational control it gives you on a bad day.

Conclusion

Once the basic pattern works, you’ll almost certainly want two upgrades. First, introduce a mapping layer so you can move tenants between cells without changing hash functions. Second, automate the lifecycle: provisioning a new cell should be infrastructure‑as‑code, not a weekend project. But don’t start there. Start with a single router, a handful of cells, strict state separation, and cell‑aware telemetry. If those four pieces hold, you’ll have a platform you can grow with confidence.

Medallion Architecture for PDND Interoperability Data in Public Administration
Francesco Del Re · 2026-01-18

(Bronze → Silver → Gold as a path to progressively higher data quality)

When you work with PDND-based interoperability, it is tempting to think that the job ends when the e-service exchange is “correct” from a protocol standpoint: authentication works, the request reaches the provider, and a payload comes back. In reality, this is only the starting point.

What matters in Public Administration scenarios is what happens after the exchange: can the data be trusted, compared across time, reused by multiple processes, and evolved without breaking everything downstream? If interoperability is the highway, data quality is what determines whether the traffic is safe and sustainable.

This is why I want to adopt the Medallion Architecture as a first-class pattern for PDND data exchanges. Medallion (often described as Bronze → Silver → Gold) is not a technology choice; it is an operating model for improving data quality in stages. Instead of attempting to produce a “perfect” dataset immediately (which usually leads to brittle pipelines or hidden assumptions), Medallion establishes a controlled progression where each layer has a clear contract and a clear responsibility.

In practice, this layered approach solves three recurring problems that are particularly common in interoperability ecosystems:

First, traceability and replayability: PDND exchanges happen between multiple parties and evolve over time. When a contract changes, or when an issue is discovered months later, you need to be able to go back to the original exchanged payload and reprocess it with new rules. Without this, every correction becomes a one-off patch and historical data becomes inconsistent.

Second, semantic stability: even when providers follow the same interoperability rules, payload semantics can vary. Code sets can drift, optional fields can be missing, different versions can coexist, and interpretations can differ between organizations. Medallion gives you a place to define a stable internal meaning (Silver) that downstream systems can rely on, independently from how external payloads evolve.

Third, reusability at scale: once PDND data is available, it tends to be consumed by many different use-cases: reporting, controls, workflow automation, downstream services, auditing, and analytics. If every consumer cleans and normalizes data on its own, you end up with duplicated logic, conflicting numbers, and fragile processes. Medallion addresses this by centralizing data quality and publishing consumption-ready products (Gold) with explicit contracts.

The key insight is simple:

Medallion Architecture is a maturity path for interoperability data.
It turns “data exchanged” into “data that multiple systems can trust”, while keeping evolution manageable.

This post keeps examples intentionally generic and focuses on how to interpret Bronze/Silver/Gold specifically in a PDND context, using high-level .NET design patterns to express the contracts between layers.

Why Medallion is a good fit for PDND interoperability

Interoperability data naturally changes over time. Payloads can be refined, fields can be added, and semantics can be clarified. Without structure, these changes quickly force downstream consumers to either break or to implement their own ad-hoc interpretation logic, which leads to inconsistent results and fragile processes.

Medallion Architecture addresses this by giving you three well-defined “quality gates”:

Bronze preserves what was exchanged so you can always replay and reprocess. Silver standardizes and validates meaning through a canonical internal contract. Gold then packages that trusted meaning into consumption-ready data products aligned to real operational and business needs. The result is an interoperability pipeline that becomes more reliable as the ecosystem grows.

Interpreting Bronze/Silver/Gold for PDND exchanges

Bronze: “as exchanged” (interoperability truth)

The Bronze layer captures the PDND exchange as it happened, staying as close as possible to the source. Its purpose is not to improve the data; it is to preserve evidence and enable replay.

In practice, Bronze stores the original payload (as exchanged) together with minimal technical metadata to keep it traceable across time and systems. A useful design choice here is to avoid long method signatures and “flat” records; instead, model the exchange as an envelope that carries both metadata and payload. This keeps contracts stable when metadata evolves (new identifiers, environment markers, schema hashes, etc.) without forcing changes across the codebase.

public sealed record PdndExchangeMetadata(
    string CorrelationId,
    DateTimeOffset ReceivedAtUtc,
    string EServiceId,
    string ProviderOrganization,
    string ContractVersion
);

public sealed record PdndBronzeExchange(
    PdndExchangeMetadata Meta,
    string Payload // raw payload as exchanged (e.g., JSON/XML as string)
);

A helpful mental model is simple: Bronze is the truth you can always return to, especially when rules change or issues are discovered later.

Silver: “conformed and validated” (quality and standardization)

Silver is where “as exchanged” data becomes a canonical internal representation with stable semantics. This is the layer where data quality is actively increased and formalized.

The central concept is the canonical contract: your internal model that preserves meaning even when external payload versions evolve. In Silver you enforce schema and types, normalize formats and code sets, validate both syntax and semantics, and deal explicitly with idempotency/deduplication in a deterministic way. Silver is also where you make data quality visible by quarantining invalid or incomplete records instead of silently dropping them.

To keep the design clean and signatures stable, a useful approach is to pass cross-cutting concerns (reference data, policies, environment-specific rules) through a processing context, rather than adding parameters to every method. This makes the pipeline easier to evolve and test.

public sealed record ProcessingContext(
    string Environment,
    IReferenceData ReferenceData,
    IPolicySet Policies
);

public sealed record ValidationResult(bool IsValid, string? Reason);

public interface ICanonicalMapper<in TIn, out TOut>
{
    TOut Map(TIn input, ProcessingContext ctx);
}

public interface IValidator<in T>
{
    ValidationResult Validate(T item, ProcessingContext ctx);
}

With those contracts in place, your Silver pipeline becomes readable and explicit about the quality boundary it enforces. Importantly, it stays stable even when you add new rules or enrichments, because those evolve inside ProcessingContext rather than in method signatures.

public sealed class SilverPipeline<TBronze, TSilver>
{
    private readonly ICanonicalMapper<TBronze, TSilver> _mapper;
    private readonly IValidator<TSilver> _validator;

    public SilverPipeline(
        ICanonicalMapper<TBronze, TSilver> mapper,
        IValidator<TSilver> validator)
    {
        _mapper = mapper;
        _validator = validator;
    }

    public (TSilver? Valid, TSilver? Invalid, string? Reason) Process(
        TBronze bronze,
        ProcessingContext ctx)
    {
        var canonical = _mapper.Map(bronze, ctx);
        var result = _validator.Validate(canonical, ctx);

        return result.IsValid
            ? (canonical, default, default)
            : (default, canonical, result.Reason);
    }
}
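To make the contract concrete, here is a hypothetical wiring of the pipeline with a trivial mapper and validator. The `PdndSilverRecord` shape and the hardcoded status are illustrative only; real code would parse and normalize the Bronze payload.

```csharp
public sealed record PdndSilverRecord(string CorrelationId, string EServiceId, string NormalizedStatus);

public sealed class SampleMapper : ICanonicalMapper<PdndBronzeExchange, PdndSilverRecord>
{
    public PdndSilverRecord Map(PdndBronzeExchange input, ProcessingContext ctx) =>
        // Illustrative: carries metadata over; real code parses input.Payload
        // and normalizes code sets via ctx.ReferenceData.
        new(input.Meta.CorrelationId, input.Meta.EServiceId, NormalizedStatus: "ACTIVE");
}

public sealed class SampleValidator : IValidator<PdndSilverRecord>
{
    public ValidationResult Validate(PdndSilverRecord item, ProcessingContext ctx) =>
        string.IsNullOrWhiteSpace(item.CorrelationId)
            ? new ValidationResult(false, "Missing correlation id")
            : new ValidationResult(true, null);
}

// Usage (sketch):
// var pipeline = new SilverPipeline<PdndBronzeExchange, PdndSilverRecord>(
//     new SampleMapper(), new SampleValidator());
// var (valid, invalid, reason) = pipeline.Process(bronze, ctx);
// if (invalid is not null) { /* quarantine with reason; never drop silently */ }
```

Note how the invalid branch is handled explicitly: quarantine is an output of the pipeline, not an exception path.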

Gold: “domain-ready” (data products and consumption contracts)

Gold takes validated Silver data and turns it into consumption-oriented datasets, often referred to as data products. Gold does not necessarily make the data “more correct” than Silver; it makes it more usable for a specific audience and use-case.

In Gold you typically apply controlled enrichments (for example by joining reference data), define domain projections that match how processes and services consume the information, and introduce derived attributes or aggregates when they add clarity and value. The key architectural choice is to avoid a single “catch-all” dataset and instead publish targeted products with explicit contracts.

Here again, the same “clean signature” rule applies: keep your projector interfaces simple and push cross-cutting concerns into the context.

public interface IGoldProjector<in TSilver, out TGold>
{
    TGold Project(TSilver silver, ProcessingContext ctx);
}

public sealed record GoldDataProductRow(
    string BusinessKey,
    DateOnly BusinessDay,
    string Category,
    string ProviderOrganization,
    string EServiceId
    // + fields shaped for a specific consumer/use-case
);
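As an illustration of a projector, here is a hedged sketch. The `SilverExchange` shape is hypothetical (defined only for this example), and the field choices show the typical moves: projection, a derived attribute, and a placeholder where reference-data enrichment would happen.

```csharp
// Hypothetical Silver shape, for this example only.
public sealed record SilverExchange(string CorrelationId, string EServiceId, DateTimeOffset OccurredAtUtc);

public sealed class ReportingProjector : IGoldProjector<SilverExchange, GoldDataProductRow>
{
    public GoldDataProductRow Project(SilverExchange silver, ProcessingContext ctx) =>
        new(
            BusinessKey: silver.CorrelationId,
            BusinessDay: DateOnly.FromDateTime(silver.OccurredAtUtc.UtcDateTime), // derived attribute
            Category: "DEFAULT",                // real code: enrich via ctx.ReferenceData
            ProviderOrganization: "UNKNOWN",    // carried from a fuller Silver model
            EServiceId: silver.EServiceId);
}
```

Each consumer-facing product gets its own projector, which is what keeps Gold a set of targeted contracts rather than one catch-all dataset.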

The key PDND benefit: quality increases without breaking interoperability

A practical advantage of Medallion in PDND scenarios is that it allows exchanges to evolve safely. External contracts can change, but Bronze preserves the original payload for replay. Silver stabilizes internal meaning through canonical mapping and validation. Gold delivers consumption-ready data products whose contracts you control.

This prevents a common interoperability anti-pattern: each downstream consumer interprets the same PDND payload differently, re-implementing cleaning rules and code mappings in inconsistent ways. Over time that leads to fragile processes and conflicting numbers. Medallion keeps meaning centralized, explicit, and versionable.

Operational principles that make it work

Treat Bronze as immutable evidence. Ensure Silver transformations are deterministic so reprocessing remains trustworthy. Treat external versions as explicit contracts and keep your canonical representation stable. Make quarantine a normal part of the pipeline, because visibility is how quality improves. Finally, design Gold as a set of purpose-driven data products rather than a single generic dataset.

Conclusion

In PDND interoperability, the exchange is the beginning, not the end. Medallion Architecture provides a clean and repeatable path to convert “data exchanged across entities” into “data that multiple systems can trust”:

  • Bronze preserves the truth of the exchange
  • Silver stabilizes meaning through conformity and validation
  • Gold delivers domain-ready data products aligned to real consumption needs
Interoperability in Enterprise Systems: Integration Patterns for Public Administration and Legacy Landscapes
Francesco Del Re · 2025-12-23

Interoperability isn’t “we exposed an API”.

In enterprise systems, especially in Public Administration (PA), interoperability means exchanging information that stays correct, interpretable, and auditable over time. That “over time” part is where most projects break: models drift, regulations evolve, suppliers change, legacy systems stay put, and the same business concept ends up represented in five different ways across five different databases.

This is a pragmatic walkthrough of what actually works in that environment. We’ll cover Anti-Corruption Layers, Integration Layers, message brokers, normalization, and a few reliability patterns that keep integrations from collapsing under real-world conditions. No vendor talk, no “just adopt microservices”, and no pretending that legacy doesn’t exist.

Interoperability is not integration

Integration is the act of connecting systems: sending requests, receiving responses, delivering messages, moving files. Interoperability is what happens after that connection exists: whether both parties understand the same data in the same way, whether they can evolve without breaking each other, and whether you can reconstruct the truth when something goes wrong.

In PA this gets harder because integrations are institutional. You’re not linking one “service” to another; you’re linking organizations with different standards, responsibilities, and timelines. And traceability is not negotiable. When a citizen’s case is impacted, “the system replied 200 OK” is rarely a meaningful answer. You need to know what changed, why, and which source triggered it.

A simple way to frame interoperability is this: it’s not about connectivity, it’s about shared meaning + controlled evolution + auditability.

The legacy reality check: why naive approaches fail

Most interoperability failures are predictable because the same shortcuts keep happening. Point-to-point growth is the classic one. It starts as “just one integration”, then becomes six, then becomes thirty. Each endpoint carries slightly different assumptions, each transformation is implemented differently, and eventually nobody can explain what the system really believes about a piece of data.

Another shortcut is extracting analytics directly from operational databases. The first report works fine, then volumes grow, queries get heavier, indexes get tuned for reporting instead of operations, and the transactional workload slows down. At that point you choose between degrading operational performance or killing analytics that stakeholders now depend on.

Then there’s the retry trap. It’s easy to implement “retry on error” and call it resilience. But blind retries can re-apply a command that no longer matches the current state. In legacy-heavy environments this is common: downstream systems aren’t idempotent, upstream systems don’t know the real state, and you introduce subtle misalignments that appear weeks later.

Finally, investigations often fail because history is missing. If you can’t reconstruct “what exactly happened to this record”, every incident becomes guesswork. In PA, that’s not only operational pain, it’s also governance pain.

Here are the shortcuts that usually show up together:

  • Point-to-point sprawl and inconsistent transformations
  • Operational DB used for analytics, causing performance regressions
  • Blind retries that ignore current state and create silent inconsistencies
  • Missing history/audit, making root cause analysis nearly impossible

These failures aren’t caused by bad tools. They happen when interoperability is treated as wiring rather than architecture.

Integration Layer vs Anti-Corruption Layer (ACL): two different responsibilities

Two concepts are frequently confused: the Integration Layer and the Anti-Corruption Layer (ACL). They both sit between systems, but their goals are not the same.

The Integration Layer is about mechanics. It standardizes how you connect: protocols, authentication, routing, throttling, canonical headers, consistent error handling. It’s where you centralize the boring but essential concerns that otherwise get re-implemented everywhere.

The ACL is about meaning. It protects your internal model from external chaos. Legacy systems and external organizations often encode business concepts in ways that are convenient for them, not for you. Status codes like 7 or 9 might make sense historically, but they’re not a stable language for your domain. An ACL translates external representations into internal concepts and enforces your invariants.

This is where teams often underestimate the work. They implement DTO mapping and call it an ACL. But real ACL work is semantic. It’s deciding what an external field actually means in your domain, how to handle missing or contradictory information, and how to evolve without letting external changes ripple into your core services.

A useful rule of thumb is simple: your core domain should never “speak legacy”. If you find yourself introducing external codes, external lifecycle states, or external quirks inside your core model, you’re skipping the ACL and you’ll pay for it later.

A practical mental model:

  • Integration Layer: “How do we connect?”
  • ACL: “What does this mean in our domain?”
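
To make the distinction tangible, here is a small ACL translation sketch. The numeric codes echo the 7 and 9 mentioned above but are illustrative legacy values, not a real standard; the type names are hypothetical.

```csharp
// Domain language: what the status means for us, not how legacy encodes it.
public enum CaseStatus { Open, Suspended, Closed, Unknown }

public static class LegacyCaseStatusTranslator
{
    public static CaseStatus FromLegacyCode(string? code) => code switch
    {
        "1" => CaseStatus.Open,
        "7" => CaseStatus.Suspended,  // illustrative: historically "on hold"
        "9" => CaseStatus.Closed,     // illustrative: historically "archived"
        _   => CaseStatus.Unknown     // surface the gap; don't guess silently
    };
}
```

The `Unknown` branch is the point: an ACL makes missing or contradictory external information explicit instead of letting it leak into the core model as a guess.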

Normalization: fix meaning, not just formats

Normalization is often treated as “format cleanup”: date formats, trimming strings, validating identifiers, ensuring encoding consistency. That’s necessary, but it’s not sufficient.

In interoperability projects, the harder part is semantic normalization. The same field can represent different meanings depending on the source, the time period, or the business process that generated it. Two systems may both expose a concept called “status”, but one encodes operational status while the other encodes legal eligibility. You can normalize formats all day and still exchange wrong information.

That’s why interoperability needs data contracts, not just endpoints. A contract makes expectations explicit: schema, semantics, versioning, validation rules, and compatibility commitments. In long-lived PA ecosystems, contracts are the difference between stable evolution and perpetual firefighting because they let you change one side without breaking the other, and they give you a shared reference when disputes arise.

What good contracts typically include:

  • A clear schema (even a simple one is better than none)
  • Semantic notes (what fields mean, not just their type)
  • Versioning rules and backward compatibility expectations
  • Validation rules and error semantics (what is rejected, what is tolerated)

Treat contracts as products: documented, versioned, tested, and owned. Otherwise interoperability becomes folklore.
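To make this concrete, a contract can start as small as a versioned record whose semantics are written down next to each field. The shape below is a sketch; every name, field, and the version tag are assumptions for illustration.

```csharp
using System;

// Illustrative contract shape: field names and the version tag are assumptions.
/// <summary>Benefit request exchanged across organizations (contract v2).</summary>
public sealed record BenefitRequestV2(
    Guid RequestId,        // semantic: stable identifier, never reused
    string FiscalCode,     // semantic: citizen tax code, validated at ingestion
    DateOnly SubmittedOn,  // semantic: legal submission date, not ingestion time
    string SchemaVersion = "2.0");
```

Even this tiny contract encodes the four items above: a schema, semantic notes, an explicit version, and a natural place to attach validation rules.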

The message broker: boring, but it saves projects

When organizations integrate, availability and timing rarely align. That’s why async-first architectures are so effective in PA and legacy landscapes. A message broker introduces decoupling: producers don’t need consumers to be online right now, and consumers can process at their own pace. But a broker doesn’t magically solve reliability. It forces you to be explicit about things synchronous calls often hide: retry strategies, poison messages, ordering assumptions, and what at-least-once delivery means for your business operations. This is also where the classic “blind retry” problem shows up again. If a failed transaction is retried without considering the current state, you can create contradictions. The same command that was correct yesterday might be incorrect today because new information arrived or the citizen’s case changed. Resilience that ignores state is just chaos with better logging. A broker helps because it enables a controlled approach: dead-letter queues, delayed retries, backpressure, replay, and fan-out patterns. It turns integrations into pipelines rather than fragile call chains.

A few broker-related concerns you should always decide upfront (explicitly):

  • How retries work (transient vs permanent failures)
  • How you handle poison messages (DLQ)
  • How you achieve idempotency and deduplication downstream
  • Whether you require ordering, and how you enforce it

Reliability patterns that keep integrations survivable

Once you accept that distributed systems fail in annoying ways, a few patterns become non-negotiable.

Transactional Outbox is the first. If a service updates its database and must publish an event, you want both actions to be consistent. Writing state to the DB and sending a message in the same “logical transaction” is harder than it looks. Without an outbox, you’ll eventually hit scenarios where the DB commit succeeds but message publishing fails (or vice versa). The outbox approach writes the outgoing event into a table in the same DB transaction and publishes it asynchronously. Not glamorous, but one of the best reliability trades you can make.
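The core of the outbox is a single database transaction that contains both the business write and the event row. The sketch below uses Dapper-style helpers (`ExecuteAsync`); table and column names are illustrative, and any data access layer works the same way.

```csharp
using System;
using System.Data.Common;
using System.Text.Json;
using System.Threading.Tasks;
using Dapper; // illustrative; any data access approach applies equally

public class OrderService
{
    // Table and column names are hypothetical.
    public async Task PlaceOrderAsync(Guid orderId, decimal total, DbConnection conn)
    {
        await using var tx = await conn.BeginTransactionAsync();

        // 1. Business state change
        await conn.ExecuteAsync(
            "INSERT INTO orders (id, total) VALUES (@Id, @Total)",
            new { Id = orderId, Total = total }, tx);

        // 2. Outgoing event written in the SAME transaction
        var payload = JsonSerializer.Serialize(new { OrderId = orderId, Type = "OrderPlaced" });
        await conn.ExecuteAsync(
            "INSERT INTO outbox (id, payload, published) VALUES (@Id, @Payload, 0)",
            new { Id = Guid.NewGuid(), Payload = payload }, tx);

        await tx.CommitAsync();
        // A separate relay reads unpublished outbox rows and publishes them to the broker.
    }
}
```

Either both rows commit or neither does, so the relay can publish with at-least-once semantics and the "DB committed but publish failed" gap disappears.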

Idempotency and deduplication are the second. Duplicates happen. Replays happen. Recovery produces duplicates by design. Consumers must safely process the same message multiple times without breaking state. This usually means idempotency keys, dedup stores, and “upsert-like” semantics. It also means designing commands/events so that replaying them is safe.
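A minimal sketch of the consumer side: a dedup store remembers processed message ids so replays become no-ops. The in-memory set here is purely illustrative; in production the store must be durable (a database table or cache) and shared across consumer instances.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: in production, replace the in-memory set with a durable store.
public sealed class IdempotentHandler
{
    private readonly HashSet<Guid> _processed = new();

    public bool TryHandle(Guid messageId, Action handle)
    {
        if (!_processed.Add(messageId))
            return false; // duplicate: already processed, safely ignored

        handle(); // real processing runs once per message id
        return true;
    }
}
```

Combined with upsert-like writes, this makes redelivery and replay safe by construction rather than by luck.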

Finally, for long-lived cross-system workflows you often need a Saga or process manager. PA workflows can span multiple domains and last days or weeks. Modeling them as a chain of synchronous calls is fragile and makes recovery painful. A process manager maintains state, correlates events, handles timeouts, and defines compensations. It’s the difference between a controlled workflow and a pile of retries.

If you want a quick checklist, these are the patterns most often missing in broken integrations:

  • Transactional Outbox (reliable publishing)
  • Idempotency + dedup (safe consumption)
  • DLQ strategy (controlled failure handling)
  • Correlation IDs (end-to-end traceability)
  • Process manager for long workflows (controlled orchestration)

Observability and auditability: the PA baseline

In many private contexts, you can survive with basic logging and an incident ticket. In PA contexts, auditability is a baseline requirement. You often need to prove what happened, not just fix it. That means correlation IDs across boundaries, structured logs, and distributed tracing for integration paths. It also means designing an audit trail that answers uncomfortable questions: who changed what, when, why, and based on which source. This becomes critical when the same “record” is influenced by multiple inbound integrations. At the same time, auditability must respect privacy. You can’t just log everything. You need tokenization, masking, and retention policies. The goal is to keep integrations observable and reconstructable without turning logs into a liability.

Analytics: stop querying operational databases

One of the most consistent anti-patterns in enterprise systems is “analytics on production databases”. It usually starts with good intentions and ends with performance problems and fragile tuning. When volumes grow, operational and analytical workloads compete, and tuning becomes a zero-sum game. A healthier approach is workload isolation. Build read models, projections, marts, or any other dedicated layer for analytics. Populate it incrementally, ideally near-real-time, and let it evolve independently from the operational model. This is where “Data-as-a-Product” becomes practical: curated datasets with ownership, documentation, and expectations. Near-real-time analytics isn’t about buying a platform. It’s about not mixing responsibilities. When operations and analytics are separated, both become more reliable.

A pragmatic playbook (no big-bang needed)

Interoperability improvements don’t require rewriting everything. They require choosing the right battles and sequencing changes. Start by identifying the critical exchange flows: high volume, high failure impact, or high business sensitivity. Formalize contracts for those flows and introduce an ACL where semantics are unstable. Move heavy exchange toward asynchronous messaging where coupling and availability are hurting you, and add idempotency and recovery mechanisms before you scale. If you publish events, introduce an outbox. If reporting is killing the operational DB, isolate analytics with projections. Then instrument the whole thing with correlation IDs and audit trails so you can operate it with confidence.

A simple sequence that works surprisingly often is:

  • Stabilize semantics with contracts and ACLs
  • Stabilize reliability with outbox, idempotency, and DLQ
  • Stabilize performance by isolating analytics workloads
  • Stabilize operations with observability and audit trails

Small steps. Measurable wins. Repeat.

Conclusion

Interoperability is semantic stability over time, not simply a connection. In PA and legacy landscapes, architecture must assume drift, partial failure, and long-lived processes. The Integration Layer standardizes mechanics; the ACL protects meaning. Brokers, idempotency, and outboxes make integrations resilient and evolvable. Normalization must include semantics, not just formats. Analytics belongs outside operational databases. Observability and auditability are first-class requirements, not optional extras.

If you treat interoperability as an architecture discipline rather than wiring, you stop fighting fires and start shipping reliable change.

]]>
Francesco Del Re
Benchmarking in .NET: Measuring What Really Matters2025-11-02T00:00:00+00:002025-11-02T00:00:00+00:00https://engineering87.github.io/2025/11/02/benchmarking-dotnetPerformance issues rarely arise from intentionally bad code; they usually stem from assumptions. Developers tend to believe they know what’s fast and what’s not, but modern .NET runtimes are too complex for intuition alone. JIT optimizations, memory pressure, and CPU caches can produce vastly different outcomes than you expect. Without measurement, it’s impossible to know where time or memory is actually spent.

BenchmarkDotNet solves this problem. It’s a straightforward yet extremely precise framework that lets developers measure the execution time and memory consumption of small sections of .NET code in a statistically rigorous way. BenchmarkDotNet takes the noise and guesswork out of measuring performance, giving you trustworthy, repeatable results.

Why Benchmarking Is Essential

In most projects, performance issues appear late, usually when the system scales or users increase. Running benchmarks early in development helps you:

  • Detect regressions after refactoring
  • Choose between two implementations based on real data
  • Understand how algorithms or libraries behave as input grows
  • Build realistic expectations about performance and scalability

Benchmarking isn’t only for low-level optimization; it’s a decision-making tool.

Introducing BenchmarkDotNet

BenchmarkDotNet is available as a standard NuGet package:

dotnet add package BenchmarkDotNet

Once added, you can define small benchmark classes decorated with attributes that describe what to measure and how. The library handles warmup runs, multiple iterations, and reports average performance, deviation, and allocations. It’s the same tool used by the .NET runtime team, which makes it a solid foundation for your own analysis.

A Practical Example: Sorting Algorithms

Let’s benchmark two sorting approaches to see BenchmarkDotNet in action:

  • Array.Sort(): the optimized built-in algorithm used in .NET
  • A simple bubble sort: easy to understand but inefficient
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Order;
using BenchmarkDotNet.Running;
using System.Linq;

public class Config : ManualConfig
{
    public Config()
    {
        AddJob(Job
            .Default
            .WithWarmupCount(3)
            .WithIterationCount(8)
            .WithLaunchCount(1));
        AddColumn(RankColumn.Arabic);              // rank column (1, 2, 3, ...)
        WithOrderer(new DefaultOrderer(SummaryOrderPolicy.FastestToSlowest));
    }
}

[MemoryDiagnoser]
[Config(typeof(Config))]
public class SortingBenchmarks
{
    private int[] data;

    [Params(1_000, 10_000, 100_000)] // reduce 100_000 if runs take too long
    public int Size;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(42);
        data = Enumerable.Range(0, Size).Select(_ => random.Next()).ToArray();
    }

    [Benchmark(Baseline = true)]
    public int[] ArraySort()
    {
        var copy = (int[])data.Clone();
        Array.Sort(copy);
        return copy;
    }

    [Benchmark]
    public int[] BubbleSort()
    {
        var copy = (int[])data.Clone();
        for (int i = 0; i < copy.Length - 1; i++)
            for (int j = 0; j < copy.Length - i - 1; j++)
                if (copy[j] > copy[j + 1])
                    (copy[j], copy[j + 1]) = (copy[j + 1], copy[j]);
        return copy;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<SortingBenchmarks>();
}

Run it in Release mode to get accurate results:

dotnet run -c Release

Interpreting the Results

BenchmarkDotNet will execute both methods under controlled conditions and output a table like this:

Method     | Size    | Mean (ms)  | Error (ms) | StdDev (ms) | Allocated
ArraySort  | 1,000   |      0.100 |      0.001 |       0.001 | ~3.9 KB
BubbleSort | 1,000   |      3.000 |      0.060 |       0.045 | ~3.9 KB
ArraySort  | 10,000  |      1.329 |      0.013 |       0.011 | ~39.1 KB
BubbleSort | 10,000  |    300.000 |      6.000 |       4.500 | ~39.1 KB
ArraySort  | 100,000 |     16.610 |      0.166 |       0.133 | ~390.6 KB
BubbleSort | 100,000 |  30000.000 |    600.000 |     450.000 | ~390.6 KB

These results are consistent with the theoretical complexity of each algorithm:

  • Array.Sort() operates in roughly O(n log n) time, so its cost scales moderately as input size increases
  • BubbleSort is O(n²), so its cost grows quadratically and it quickly becomes impractical even for mid-sized arrays

Memory allocations are identical because both methods clone the input array before sorting. BenchmarkDotNet quantifies the performance gap with precise numbers: it’s not an assumption or an estimate, it’s a reproducible measurement. This ability to measure, compare, and reason about performance with hard data is what makes the tool so powerful for .NET developers and architects.

Expanding the Analysis

BenchmarkDotNet can do much more than measure time. By adding attributes like [MemoryDiagnoser] or [Params], you can:

  • Track memory allocations and garbage collections
  • Vary input sizes to observe scalability
  • Compare multiple methods across different runtimes
  • Export results as Markdown, CSV, or JSON for reporting or CI integration

For larger projects, you can use the BenchmarkSwitcher API to run multiple benchmark classes in the same session.
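As a sketch, a switcher-based entry point (replacing the single-class `Main` shown earlier) lets you select benchmarks from the command line:

```csharp
using BenchmarkDotNet.Running;

public class Program
{
    // Discovers all benchmark classes in the assembly and lets you filter
    // from the command line, e.g.:
    //   dotnet run -c Release -- --filter *SortingBenchmarks*
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}
```

This keeps one benchmark project per solution while still allowing targeted runs in CI or local investigations.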

Integrating Benchmarks into the Development Process

Benchmarks are most useful when they’re not treated as one-off experiments. Running them occasionally can be insightful, but the real value comes when you make benchmarking part of your regular development and delivery workflow. In most teams, performance tends to drift over time: a new feature might introduce an extra allocation, a refactor might change an algorithm’s complexity, a library update might slow things down under load. These changes are rarely visible through functional tests alone. Integrating benchmarks ensures that performance regressions are detected as early as functional bugs.

  1. Run Benchmarks Alongside Unit Tests

The first step is to treat benchmarks like any other form of validation. You can keep a Benchmarks project inside your solution, right next to your test projects. Developers can run it locally when they modify critical code paths, for example, data serialization, parsing, caching, or algorithms. BenchmarkDotNet projects build and run just like unit tests, but instead of pass/fail, they output numeric results. A good practice is to check those results into version control, so you have a historical view of how performance evolves.

  2. Automate Benchmark Runs in CI/CD

Once the local workflow is stable, automate it. You can configure your CI pipeline (like GitHub Actions, Azure DevOps, or Jenkins) to:

  • Run benchmarks in Release mode as part of nightly or pre-release builds
  • Export results in JSON or CSV format using BenchmarkDotNet’s built-in exporters
  • Compare current results with previous builds to detect performance regressions
  • Fail the build if a key benchmark exceeds a defined threshold

This transforms performance testing into a continuous quality signal, rather than a last-minute audit before release.

  3. Establish Baselines

Benchmarks are meaningful only when compared against something. Define a baseline version of your application, perhaps the latest stable release or a known good commit, and store its benchmark results. Subsequent runs can be compared against that baseline to show whether a change improved or degraded performance. BenchmarkDotNet supports this directly through the [Baseline] attribute and relative comparison columns, but you can also manage it externally with exported data.

  4. Monitor Performance Trends Over Time

Beyond catching regressions, benchmarks can provide trend visibility. By tracking the same set of benchmarks across releases, you can observe how the system’s performance evolves, which areas are improving, which are degrading, and how architectural decisions impact real performance. You can even use tools like Power BI, Grafana, or Excel to visualize historical data exported from BenchmarkDotNet runs. This makes it easier to justify optimization work and demonstrate progress to non technical stakeholders.

  5. Keep Benchmarks Targeted and Maintainable

Benchmarks should be small, isolated, and purposeful. Focus on code paths where performance matters, like loops, serialization, parsing, or algorithmic components. Avoid full end-to-end scenarios that mix CPU and I/O, which are better suited for load testing tools like NBomber or k6. Keep each benchmark reproducible and independent. The goal is to measure computation, not environment noise.

  6. Educate the Team

Finally, share the results with the team. BenchmarkDotNet produces readable Markdown reports that can be published automatically in your documentation or wiki. Use them during sprint reviews or retrospectives to highlight improvements or issues. Encouraging developers to interpret and discuss these metrics builds a shared understanding of performance, helping teams write faster and more predictable code.

Best Practices

  • Always benchmark in Release mode. Debug builds disable compiler optimizations
  • Avoid I/O operations in benchmarks; focus on CPU and memory behavior
  • Keep benchmarks deterministic; avoid randomness or shared state between runs
  • Run benchmarks on a ‘quiet’ machine to reduce external interference

Conclusion

BenchmarkDotNet gives developers a reliable, repeatable way to see how .NET code actually performs. It replaces assumptions with data, and guesses with measurement. That means fewer surprises about performance in production and more confidence in architectural decisions. You will often be surprised by the results, which is exactly why it is worthwhile to benchmark.

]]>
Francesco Del Re
Designing High-Quality Public APIs2025-10-12T00:00:00+00:002025-10-12T00:00:00+00:00https://engineering87.github.io/2025/10/12/high-quality-apiPublic APIs serve as the foundation of today’s software ecosystems. APIs connect services, allow integrations, and empower companies to create value well beyond their own codebases. When an API is designed well, it becomes invisible. It feels intuitive to use, is predictable to extend, and is stable enough to trust. Conversely, when an API is designed poorly, it causes developers to be confused, breaks compatibility, and costs teams time and credibility.

Designing a public API is more than just an engineering exercise, it is an exercise in communication. Every endpoint, every field and every error message communicates with another developer. That conversation needs to be clear, consistent and respectful of their time.

Why API Design Matters

When an organization publishes an API, it creates a contract. From that moment onward, every modification of that contract has a consequence. Bad design often shows itself months later, when teams discover that a tiny change to an internal model breaks dozens of clients. At that stage versioning becomes messy, and the team spends more time maintaining compatibility than creating features. Good design prevents this. A well-defined API minimizes ambiguity, reduces support overhead, and provides a reliable base for growth. More importantly, it protects developers from unintended coupling to internal details. The purpose of a public API is to expose capabilities, not implementation details. The moment consumers depend on the shape of your database tables or your internal class names, you have lost control of your own evolution.

Start with the Consumer

Every API starts with a fundamental question: who will use it, and what outcomes do they need to achieve? Left unchecked, APIs tend to become reflections of the product’s internal layout rather than of the needs of the user. Thoughtful design inverts that: instead of exposing raw data models, it exposes actions, workflows, and outcomes. APIs designed from that perspective are typically smaller, built on clearer concepts, and easier to maintain. Empathy is the first principle of API design. When we consider how another developer will read our documentation, understand the names we chose, and recover from an error, we design a better interface.

Strive for Simplicity and Predictability

Simplicity is not the absence of features; it is the practice of exposing only what is necessary. A well-designed API feels natural and intuitive because it behaves the way the developer expects. When you follow standard conventions, a consistent naming pattern for your resources, a consistent response format, and a meaningful use of HTTP verbs, developers spend less time learning and more time building. Predictability is what makes an API feel elegant: once a developer understands how one piece works, they should be able to predict how the rest will work. That predictability creates confidence and shortens the learning curve significantly. The best APIs have a personality: they are opinionated enough to be clear, but flexible enough to accommodate multiple use cases.

.NET Tips

The best APIs in .NET often share consistent middleware pipelines for exception handling, validation, and response shaping, which can be achieved with tools like FluentValidation and global exception filters.
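As a small sketch of the validation side, a FluentValidation validator pairs a DTO with explicit, reusable rules. The DTO and field names here are hypothetical:

```csharp
using FluentValidation;

// Hypothetical DTO and validator; field names are illustrative.
public record CreateOrderRequest(string Email, decimal Amount);

public class CreateOrderRequestValidator : AbstractValidator<CreateOrderRequest>
{
    public CreateOrderRequestValidator()
    {
        RuleFor(x => x.Email).NotEmpty().EmailAddress();
        RuleFor(x => x.Amount).GreaterThan(0);
    }
}
```

Centralizing rules like this keeps validation identical across endpoints, so consumers see one consistent error shape instead of per-controller quirks.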

Control What You Expose

When you publish an API, it’s akin to opening a window into your system. The wider you open it, the more you unveil, and the harder it becomes to modify the interior. A stable public interface should never leak internal identifiers, domain objects, or temporary fields. DTOs act as protective layers, giving you the flexibility to change internal logic without breaking clients. This boundary between internal and external models is what gives you freedom: it permits you to refactor, optimize, and change behind the scenes while appearing stable on the outside.

.NET Tips

Libraries such as AutoMapper or Mapster simplify the mapping between domain models and public representations, ensuring your internal code can evolve independently.
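A minimal AutoMapper sketch illustrates the boundary: the DTO exposes only what the contract needs, and internal fields never cross it. The types and field names below are hypothetical:

```csharp
using AutoMapper;

// Illustrative types: the DTO exposes only what the contract needs.
public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public string InternalNotes { get; set; } = ""; // never leaves the service
}

public record OrderDto(int Id, decimal Total);

public static class OrderMapping
{
    private static readonly IMapper Mapper = new MapperConfiguration(
        cfg => cfg.CreateMap<Order, OrderDto>()).CreateMapper();

    public static OrderDto ToDto(Order order) => Mapper.Map<OrderDto>(order);
}
```

Because `OrderDto` is the only shape clients ever see, `Order` can gain, rename, or drop internal fields without a breaking change.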

Design for Evolution

Regardless of how well you think through your API design, you will have to change it. The trick is to change it without losing trust. Preserving backward compatibility shows respect for your consumers: when adding new fields, don’t change the name or the type of an existing one. If you are forced to make breaking changes, introduce new versions and communicate them early and clearly. Versioning is not just a technical strategy but also a product policy. Assume that each version has a lifecycle and that older versions will eventually need deprecation; deprecate them gradually, with clear timelines rather than abrupt cutoffs. Giving developers time to migrate builds trust and stability, and confidence encourages adoption.

.NET Tips

In .NET, API versioning can be managed elegantly using the Asp.Versioning library. It integrates seamlessly with ASP.NET Core, supporting URL-based (/v1/orders) or header-based (api-version: 2.0) strategies. Combined with Swashbuckle.AspNetCore, you can generate versioned Swagger documentation automatically.
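A sketch of the Asp.Versioning setup in Program.cs might look like the following; the specific defaults and header name are illustrative choices:

```csharp
// Sketch of Asp.Versioning setup in an ASP.NET Core Program.cs.
using Asp.Versioning;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddApiVersioning(options =>
{
    options.DefaultApiVersion = new ApiVersion(1, 0);
    options.AssumeDefaultVersionWhenUnspecified = true;
    options.ReportApiVersions = true; // advertises supported versions in response headers
    options.ApiVersionReader = ApiVersionReader.Combine(
        new UrlSegmentApiVersionReader(),            // /v1/orders
        new HeaderApiVersionReader("api-version"));  // api-version: 2.0
});
```

Combining both readers lets existing URL-based clients keep working while newer clients negotiate versions through headers.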

Documentation as a First-Class Feature

An API with no documentation is equivalent to a library without a catalog of books. The simpler that you can enable discovery about how your API works, the more developers will use your API correctly. Effective documentation tells a narrative, how do you authenticate, how do you perform standard operations, what an error means, or what an acceptable response looks like. While useful, detailed lists of all parameters don’t carry much weight compared to useful examples. Interactive documentation, such as explorers based on OpenAPI encourages a process of learning through experimentation. Developers can see how the API behaves or what a response looks like without needing to write a line of code. This reduces the barrier to entry while identifying inconsistencies very quickly.

.NET Tips

In the .NET ecosystem, Swashbuckle.AspNetCore and NSwag are the go-to tools for generating OpenAPI documentation automatically from your controllers or Minimal APIs. They can include authentication headers, example payloads, and even allow developers to test requests directly in the browser.

Security by Design

Security must be built in from the beginning, not added on as a patch. Secure communications, authentication and authorization, and input validation must be applied at every layer. Protect your API from abuse with rate limiting, and be prepared to monitor for abuse and anomalies. Protecting your users’ data and your own systems is part of security too: a careless error message can expose internal information just as easily as a missing access check. Treat every interaction with your API as if it came from a completely unknown source, because eventually it will.

.NET Tips

Protect your API from abuse using Microsoft.AspNetCore.RateLimiting, available natively from .NET 7, or the community package AspNetCoreRateLimit for more granular configuration. Always validate inputs: libraries like FluentValidation can enforce strong typing and consistent validation across DTOs.
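A sketch of the built-in fixed-window limiter (.NET 7+); the policy name and the numeric limits below are illustrative:

```csharp
// Fixed-window rate limiting sketch; policy name and limits are illustrative.
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.AddFixedWindowLimiter("api", limiter =>
    {
        limiter.PermitLimit = 100;                // max requests...
        limiter.Window = TimeSpan.FromMinutes(1); // ...per one-minute window
        limiter.QueueLimit = 0;                   // reject instead of queueing
    });
});

var app = builder.Build();
app.UseRateLimiter();
```

Endpoints opt in to the named policy (for example via `RequireRateLimiting("api")` on a route), so sensitive routes can carry stricter limits than public ones.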

Communicating Through Errors

Errors are not defects in the design; they are part of the conversation between your API and its users. A vague 500 Internal Server Error tells you nothing. An informative, structured response explaining what went wrong and how to fix it converts frustration into trust. For example, when a request fails because of a missing parameter, returning a descriptive message like “Field email is required”, together with a precise error code, conveys respect for the developer’s time. The quality of your errors reflects your attitude toward your users.

.NET Tips

You can handle errors gracefully in ASP.NET Core using ProblemDetails responses, a standardized format supported by the framework out of the box. Combine this with FluentValidation to return consistent validation errors like:

{
  "errors": {
    "email": ["The 'email' field is required."]
  },
  "type": "https://httpstatuses.com/400",
  "title": "Bad Request"
}
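Wiring this up can be as small as the sketch below, using the framework’s built-in ProblemDetails support (available from .NET 7; the parameterless exception handler shown here assumes ProblemDetails services are registered):

```csharp
// Sketch: built-in ProblemDetails wiring in an ASP.NET Core Program.cs.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddProblemDetails();

var app = builder.Build();
app.UseExceptionHandler(); // unhandled exceptions become RFC 7807 responses
app.UseStatusCodePages();  // bare status codes also get a ProblemDetails body
```

With this in place, every failure path shares one machine-readable shape, so clients can parse errors instead of scraping messages.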

Performance and Scalability

A public API should respond well to growth. Efficiency shouldn’t depend solely on hardware: design the API carefully. Features like pagination, caching, and asynchronous operations help you provide fast, consistent responses. Sometimes the easiest optimization is to expose less: smaller data payloads, fewer endpoints, fewer round trips. Scalability is also defined in terms of resilience: when throughput spikes or downstream systems are unhealthy, your API should degrade gracefully and communicate clearly instead of silently timing out requests.

.NET Tips

When performance is critical, use Response Caching Middleware or distributed caching solutions like StackExchange.Redis. For high traffic scenarios, consider gRPC for inter service communication, it offers excellent performance for binary payloads and is fully supported in ASP.NET Core.
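A minimal caching sketch: the middleware honors cache directives that endpoints emit, and a controller action opts in with an attribute. The duration and action shown are illustrative:

```csharp
// Sketch: response caching middleware plus a per-action cache directive.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
builder.Services.AddResponseCaching();

var app = builder.Build();
app.UseResponseCaching(); // honors Cache-Control headers set by endpoints
app.MapControllers();

// On a controller action:
// [ResponseCache(Duration = 60, Location = ResponseCacheLocation.Any)]
// public IActionResult GetProducts() => Ok(products);
```

Caching read-heavy endpoints like this trims both latency and load before any distributed cache is needed.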

Conclusion

Creating APIs for public use is an art and a responsibility. It requires clarity of thought, the ability to empathize with developers, and discipline in practice. A good API is not one that shows every feature of a system, but one that demonstrates intent clearly and can change without friction. When design is intentional, the API is more than a technical interface: it is a shared language between systems and individuals. Developers grow to trust it, to build on it, and to create value with it in ways you may never expect. In time that trust becomes an ecosystem: integrations, tools, and applications that increase the value of what your product can do.

If it’s a good API, it disappears into the background of great software. It just works.

]]>
Francesco Del Re
Building Resilient Distributed Systems with MassTransit in .NET2025-09-07T00:00:00+00:002025-09-07T00:00:00+00:00https://engineering87.github.io/2025/09/07/mass-transitIn today’s software landscape, applications rarely live in isolation. Modern systems need to talk to each other: services exchange data, workflows span multiple applications, and events need to propagate across boundaries. But this interconnectedness comes with challenges:

  • How can we avoid brittle point-to-point integrations?
  • How can we ensure messages aren’t lost when something goes down?
  • How can we scale without reinventing the wheel each time?

This is where MassTransit, a free, open-source distributed application framework for .NET, steps in.

Why MassTransit?

At its core, MassTransit abstracts away the complexity of messaging systems (RabbitMQ, Kafka or Azure Service Bus) and lets developers focus on business logic instead of boilerplate. With MassTransit you get:

  • Decoupling: services don’t need to know about each other’s implementation details.
  • Reliability: messages are queued and retried automatically.
  • Scalability: horizontal scaling becomes straightforward since multiple consumers can process from the same queue.
  • Patterns out of the box: publish/subscribe, request/response, sagas for long-running workflows.

It’s like getting a well-tested messaging toolkit built on top of proven transport engines.
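Besides publish/subscribe, the request/response pattern is also available out of the box via `IRequestClient<T>`. The sketch below uses hypothetical message types; registration (for example `x.AddRequestClient<CheckOrderStatus>()`) is assumed to be done in the bus configuration:

```csharp
using System;
using System.Threading.Tasks;
using MassTransit;

// Illustrative message contracts; the names are assumptions.
public record CheckOrderStatus(Guid OrderId);
public record OrderStatusResult(Guid OrderId, string Status);

public class StatusService
{
    private readonly IRequestClient<CheckOrderStatus> _client;

    public StatusService(IRequestClient<CheckOrderStatus> client) => _client = client;

    public async Task<string> GetStatusAsync(Guid orderId)
    {
        // Sends the request and awaits the correlated response (with a default timeout).
        var response = await _client.GetResponse<OrderStatusResult>(
            new CheckOrderStatus(orderId));
        return response.Message.Status;
    }
}
```

The client handles correlation and timeouts for you, so synchronous-style queries still travel over the same decoupled transport.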

A Simple Example in .NET

Let’s start with a basic scenario: a service that publishes an EventSubmitted event, and another service that processes it.

Define a Message Contract

public record EventSubmitted(Guid EventId, string Body);

Configure MassTransit with RabbitMQ

services.AddMassTransit(x =>
{
    x.AddConsumer<EventSubmittedConsumer>();

    x.UsingRabbitMq((context, cfg) =>
    {
        cfg.Host("rabbitmq://localhost");
        cfg.ConfigureEndpoints(context);
    });
});

Implement a Consumer

public class EventSubmittedConsumer : IConsumer<EventSubmitted>
{
    public async Task Consume(ConsumeContext<EventSubmitted> context)
    {
        Console.WriteLine($"Processing event {context.Message.EventId} for {context.Message.Body}");
        // Business logic goes here
    }
}

Publish an Event

await bus.Publish(new EventSubmitted(
    Guid.NewGuid(),
    "BODY_HERE"
));

So, the publishing service doesn’t care which component listens. The consumer picks up the event, processes it, and everything is safely routed via RabbitMQ.

Beyond Basics: Sagas and Workflows

In many public sector scenarios, processes do not fit within a single request/response cycle. Consider a citizen requesting a benefit certificate: the system must verify the documents first, conduct a background check, and only then make a decision. Each of those steps can be performed by separate services and may take time. When such workflows are managed by hand, services tend to grow into large tangles of conditional logic, retries, and fragile state. MassTransit addresses this complexity with sagas. A saga is a state machine that is aware of the lifecycle of a request and responds to events like “Documents Verified” or “Background Check Complete”. The durability and simplicity of sagas is what makes them so useful: the current state is persisted in a database, and if the system restarts while a request is still open, the process continues as though it never stopped. Developers do not need to worry about lifecycle management. They only need to define the events of interest, a class to hold the request’s current state, and a state machine that describes how events transition the workflow. MassTransit orchestrates the correlation, persistence, and reliability so you can focus on the business rules.

MassTransit handles all the difficult work behind the curtain:

  • it deals with identification of each workflow instance as well as correlation with the right messages
  • it saves your state in the database of your choice (SQL Server, PostgreSQL, MongoDB, or any other)
  • it also takes care of concurrency and retries so you can concentrate on business logic modeling rather than plumbing code.

Developers can model a complex business process with a few lines of configuration and a state machine definition. Rather than painstakingly writing thousands of lines of orchestration code, you simply specify how the process should transition, and MassTransit makes it reliable, distributed, and resilient by design. Sagas are extremely beneficial in domains where processes involve multiple asynchronous steps and no part of the process can be lost, such as e-commerce, finance, telecom, and healthcare. By adopting sagas, you make your workflows explicit, auditable, and easy to extend over time: they turn haphazard, ad-hoc orchestration into a lucid, event-driven flow, and in .NET MassTransit makes adopting the pattern surprisingly easy.

Sample SAGA with MassTransit in .NET

Scenario: a citizen submits a request for a family benefits certificate. The process spans multiple back-office steps: document verification, background checks, and final approval. Each step is asynchronous and possibly handled by different services.

Messages (events/commands)

public record BenefitRequestSubmitted(Guid RequestId, string CitizenFiscalCode);
public record DocumentsVerified(Guid RequestId, bool Passed, string? Notes = null);
public record BackgroundCheckCompleted(Guid RequestId, bool Passed, string? Notes = null);
public record RequestApproved(Guid RequestId);
public record RequestRejected(Guid RequestId, string Reason);

// Commands sent to the back-office services. They are distinct from the events
// above so the saga never re-publishes the very event that triggered it.
public record VerifyDocuments(Guid RequestId, string CitizenFiscalCode);
public record RunBackgroundCheck(Guid RequestId);

Saga state

using MassTransit;

public class BenefitRequestState : SagaStateMachineInstance
{
    public Guid CorrelationId { get; set; }      // MassTransit saga key
    public string CurrentState { get; set; } = default!;
    public string CitizenFiscalCode { get; set; } = default!;
    public bool? DocsOk { get; set; }
    public bool? BackgroundOk { get; set; }
    public string? LastNotes { get; set; }
    public DateTime StartedAtUtc { get; set; }
}

State machine (orchestration)

using MassTransit;

public class BenefitRequestStateMachine : MassTransitStateMachine<BenefitRequestState>
{
    public State Submitted { get; private set; }
    public State Checking { get; private set; }
    public State Completed { get; private set; }

    public Event<BenefitRequestSubmitted> RequestSubmitted { get; private set; }
    public Event<DocumentsVerified> DocsVerified { get; private set; }
    public Event<BackgroundCheckCompleted> BgcCompleted { get; private set; }

    public BenefitRequestStateMachine()
    {
        InstanceState(x => x.CurrentState);

        Event(() => RequestSubmitted, x => x.CorrelateById(c => c.Message.RequestId));
        Event(() => DocsVerified,     x => x.CorrelateById(c => c.Message.RequestId));
        Event(() => BgcCompleted,     x => x.CorrelateById(c => c.Message.RequestId));

        Initially(
            When(RequestSubmitted)
                .Then(ctx =>
                {
                    ctx.Instance.CitizenFiscalCode = ctx.Data.CitizenFiscalCode;
                    ctx.Instance.StartedAtUtc = DateTime.UtcNow;
                })
                .TransitionTo(Submitted)
                // kick off the first back-office step (document verification)
                .Publish(ctx => new VerifyDocuments(
                    ctx.Instance.CorrelationId, ctx.Instance.CitizenFiscalCode))
        );

        During(Submitted,
            When(DocsVerified)
                .Then(ctx =>
                {
                    ctx.Instance.DocsOk = ctx.Data.Passed;
                    ctx.Instance.LastNotes = ctx.Data.Notes;
                })
                .IfElse(ctx => ctx.Data.Passed,
                    thenBinder => thenBinder.TransitionTo(Checking)
                        // request the background check only if the documents are ok
                        .Publish(ctx => new RunBackgroundCheck(ctx.Instance.CorrelationId)),
                    elseBinder => elseBinder.Finalize().Publish(ctx =>
                        new RequestRejected(ctx.Instance.CorrelationId, "Document verification failed")))
        );

        During(Checking,
            When(BgcCompleted)
                .Then(ctx =>
                {
                    ctx.Instance.BackgroundOk = ctx.Data.Passed;
                    ctx.Instance.LastNotes = ctx.Data.Notes;
                })
                .IfElse(ctx => ctx.Data.Passed && ctx.Instance.DocsOk == true,
                    thenBinder => thenBinder.TransitionTo(Completed).Publish(ctx =>
                        new RequestApproved(ctx.Instance.CorrelationId)),
                    elseBinder => elseBinder.Finalize().Publish(ctx =>
                        new RequestRejected(ctx.Instance.CorrelationId, "Background check failed")))
        );

        // Finalized sagas (rejections here) are removed from storage; approved
        // instances remain in the Completed state for auditing.
        SetCompletedWhenFinalized();
    }
}

EF Core persistence & MassTransit config

using MassTransit;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders; // for EntityTypeBuilder<>

public class BenefitSagaDbContext : SagaDbContext
{
    public BenefitSagaDbContext(DbContextOptions options) : base(options) { }

    protected override IEnumerable<ISagaClassMap> Configurations
        => new[] { new BenefitRequestStateMap() };
}

public class BenefitRequestStateMap : SagaClassMap<BenefitRequestState>
{
    protected override void Configure(EntityTypeBuilder<BenefitRequestState> entity, ModelBuilder model)
    {
        entity.Property(x => x.CurrentState);
        entity.Property(x => x.CitizenFiscalCode);
        entity.Property(x => x.DocsOk);
        entity.Property(x => x.BackgroundOk);
        entity.Property(x => x.LastNotes);
        entity.Property(x => x.StartedAtUtc);
    }
}

builder.Services.AddDbContext<BenefitSagaDbContext>(opt =>
    opt.UseSqlServer(builder.Configuration.GetConnectionString("SagaDb")));

builder.Services.AddMassTransit(x =>
{
    x.AddSagaStateMachine<BenefitRequestStateMachine, BenefitRequestState>()
     .EntityFrameworkRepository(r =>
     {
         r.ConcurrencyMode = ConcurrencyMode.Pessimistic;
         r.ExistingDbContext<BenefitSagaDbContext>(); // reuses the DbContext registered above
     });

    x.UsingRabbitMq((ctx, cfg) =>
    {
        cfg.ConfigureEndpoints(ctx);
    });
});

Kicking off the workflow

Anywhere in your system (like an API endpoint), publish the initial event:

[HttpPost("requests")]
public async Task<IActionResult> Submit([FromServices] IBus bus, [FromBody] string fiscalCode)
{
    var id = Guid.NewGuid();
    await bus.Publish(new BenefitRequestSubmitted(id, fiscalCode));
    return Accepted(new { requestId = id });
}

Where MassTransit Shines

  • Microservices architectures: where decoupling is essential.
  • High-throughput systems: thanks to message batching, retries, and concurrency control.
  • Event-driven applications: publish/subscribe patterns for real-time updates.
  • Legacy modernization: gradually replacing point-to-point integrations with robust messaging.

Conclusion

MassTransit lets .NET teams build resilient, scalable, and decoupled systems while staying above the level of low-level messaging details. Instead of wrestling with queues, exchanges, and retry loops, teams can focus on creating business value. If your system suffers from tight coupling, message loss, or integration spaghetti, MassTransit is worth a close look. With a small amount of configuration, you can move your architecture into the event-driven world.

]]>
Francesco Del Re
TemporalCollections: High-Performance, Thread-Safe Temporal Data Structures for .NET2025-08-24T00:00:00+00:002025-08-24T00:00:00+00:00https://engineering87.github.io/2025/08/24/temporal-collectionTemporalCollections is a personal open-source project I created to address a recurring challenge I encountered while building distributed systems: the lack of ready-to-use, time-aware data structures that are both thread-safe and optimized for time-based querying and pruning. It’s a .NET library that extends familiar collections (queues, stacks, sets, dictionaries, and more) with native temporal semantics. Every item is automatically timestamped on insertion, enabling:

  • Fast time-range queries
  • Deterministic aging and pruning
  • Accurate temporal analytics

All of this comes with a unified query API and built-in thread-safety, making it easy to reason about data in time-sensitive applications. TemporalCollections is ideal for scenarios like event streaming, sliding-window analytics, telemetry buffers, rate limiting, session tracking, and caches with expiry: any situation where time is a first-class concern.

👉 GitHub repository

Why Temporal Collections?

Time is a first-class dimension in many systems:

  • Event streams & observability: ingest items at high throughput and answer questions like “what happened in the last N seconds/minutes?”
  • Sliding-window analytics: compute rolling aggregates (counts, percentiles) on recent data only.
  • Caches & sessions: expire stale entries or prune by age.
  • Temporal state tracking: maintain the evolution of values over time (latest, earliest, before/after a point).

While you could bolt timestamps onto standard collections, you would still need to solve ordering, race-free timestamp assignment, range queries, pruning, and concurrency consistently across multiple data structures. TemporalCollections addresses these concerns out-of-the-box with a monotonic timestamp guarantee and a common query surface.

Core Design Principles

  • Temporal semantics: items carry precise insertion timestamps.
  • Thread-safety: operations are safe in multi-threaded environments.
  • Time-based querying: efficient retrieval by time windows.
  • Efficient cleanup: prune older items without long global locks.

Monotonic Timestamp

Temporal collections only make sense if time behaves. In practice, though, system clocks don’t always cooperate: multiple calls to UtcNow within the same tick can return identical values; NTP can move the clock backwards; and highly concurrent code can interleave operations so tightly that two insertions appear to occur at the same instant. If timestamps aren’t strictly ordered, time-window queries become flaky (GetInRange may miss or double count items on boundaries) and age-based pruning (RemoveOlderThan) isn’t deterministic. To keep temporal behavior predictable, the library assigns a monotonic timestamp to every insertion: each generated value is guaranteed to be strictly greater than the one before it within the same process. If the clock doesn’t advance between two reads, we simply step forward by one tick and move on.
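The technique can be sketched in a few lines. This is a minimal illustration of the idea, not the library's actual code: a compare-and-swap loop guarantees each caller a value strictly greater than the last one issued, even under contention or a backwards clock step.

```csharp
using System;
using System.Threading;

// Minimal sketch of a monotonic timestamp source (illustrative, not library code).
// If UtcNow has not advanced past the last issued value, step forward by one tick.
public static class MonotonicClock
{
    private static long _lastTicks;

    public static DateTime GetTimestamp()
    {
        while (true)
        {
            long last = Interlocked.Read(ref _lastTicks);
            long now = DateTime.UtcNow.Ticks;
            // Clock stalled or moved backwards: bump past the last issued value.
            long next = now > last ? now : last + 1;
            if (Interlocked.CompareExchange(ref _lastTicks, next, last) == last)
                return new DateTime(next, DateTimeKind.Utc);
        }
    }
}
```

With this shape, two consecutive calls, on the same thread or on different threads, never observe equal or decreasing values within the process.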

Deterministic boundary rules

  • GetInRange(from, to) is inclusive on both ends.
  • RemoveOlderThan(cutoff) removes Timestamp < cutoff (keeps >= cutoff).
  • GetBefore(time) is strictly <; GetAfter(time) is strictly >.
  • GetLatest() / GetEarliest() return extremes or null when empty.

These rules make window math predictable and prevent off-by-one bugs.
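To make the boundary semantics concrete, here is a self-contained illustration using plain tick values in place of real timestamps (it mirrors the rules above; it is not library code):

```csharp
using System;
using System.Linq;

long[] ticks = { 1, 5, 10 };   // stand-ins for item timestamps
long from = 5, to = 10, cutoff = 5;

// GetInRange(from, to): inclusive on both ends -> { 5, 10 }
var inRange = ticks.Where(t => t >= from && t <= to).ToArray();

// RemoveOlderThan(cutoff): keeps Timestamp >= cutoff -> { 5, 10 }
var kept = ticks.Where(t => t >= cutoff).ToArray();

// GetBefore(time): strictly <  -> { 1 }
var before = ticks.Where(t => t < from).ToArray();

// GetAfter(time): strictly >  -> { 10 }
var after = ticks.Where(t => t > from).ToArray();
```

Note how the item at exactly tick 5 is counted by the inclusive range query and kept by the pruning rule, but excluded by both strict queries; that is the off-by-one trap the rules eliminate.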

Snapshot semantics

Enumerations return a stable snapshot at call time, preserving determinism under concurrency.

The Core Abstraction: TemporalItem<T>

All collections store TemporalItem<T>, a lightweight wrapper that pairs an immutable value with a timestamp (DateTimeOffset) representing the insertion moment. Timestamps are strictly increasing even under bursty or concurrent insertions: if UtcNow would produce a non-increasing value (precision limits / clock granularity), the library atomically increments by a tick to maintain order and uniqueness. This yields deterministic chronology without races across threads.

A Unified Temporal Query Surface: ITimeQueryable<T>

Every structure implements ITimeQueryable<T>, exposing consistent operations:

  • GetInRange(from, to): enumerate items in an inclusive time window.
  • RemoveOlderThan(cutoff): age/prune items strictly older than cutoff.
  • CountInRange(from, to): count items in a window.
  • GetTimeSpan(): time span covered by the collection (latest−earliest).
  • RemoveRange(from, to): delete items in a window.
  • GetLatest() / GetEarliest(): fast access to extremes.
  • GetBefore(time) / GetAfter(time): query by relative time.
  • CountSince(from): rolling counts.
  • GetNearest(time): nearest neighbor by timestamp.

This interface makes code collection-agnostic: you can prototype with a queue and later swap to a sorted structure or an interval tree without rewriting queries.
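Because every structure implements the same surface, window logic can be written once against the interface. The sketch below uses a trimmed-down local stand-in for `ITimeQueryable<T>` (only `CountInRange`) plus a toy implementation, purely for illustration; the real interface and collections live in the package:

```csharp
using System;
using System.Linq;

// Trimmed-down local stand-in for the package's ITimeQueryable<T> (illustration only).
public interface ITimeQueryable<T>
{
    int CountInRange(DateTime from, DateTime to);
}

public static class WindowMath
{
    // Written once against the interface; works for a queue today,
    // a sorted list or an interval tree tomorrow.
    public static int CountInLast<T>(ITimeQueryable<T> source, TimeSpan window)
    {
        var now = DateTime.UtcNow;
        return source.CountInRange(now - window, now);
    }
}

// A toy implementation backed by an in-memory timestamp array.
public class FakeTemporal : ITimeQueryable<string>
{
    public DateTime[] Stamps = Array.Empty<DateTime>();
    public int CountInRange(DateTime from, DateTime to)
        => Stamps.Count(t => t >= from && t <= to);
}
```

Swapping `FakeTemporal` for any real temporal collection leaves `WindowMath` untouched, which is exactly the point of the shared interface.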

Provided Data Structures

  • TemporalQueue
    • ✅ Use when you need a thread-safe FIFO queue with time-based retrieval and cleanup.
    • ❌ Avoid if you need priority ordering or random access.
  • TemporalStack
    • ✅ Use when you want a thread-safe LIFO stack with timestamp tracking and time-range queries.
    • ❌ Avoid if you require fast arbitrary removal or frequent sorting by timestamp.
  • TemporalSet
    • ✅ Use for unique timestamped items with efficient membership checks and time-based removal.
    • ❌ Avoid if you need ordering or priority queues.
  • TemporalSlidingWindowSet
    • ✅ Use when you need to automatically retain only recent items within a fixed time window.
    • ❌ Avoid if your window size is highly dynamic or if you need sorted access.
  • TemporalSortedList
    • ✅ Use for a sorted-by-timestamp collection with efficient binary-search range queries.
    • ❌ Avoid if insertion frequency is very high (O(n) inserts).
  • TemporalPriorityQueue
    • ✅ Use when you need priority-based ordering with timestamp-aware dequeueing.
    • ❌ Avoid if you only need FIFO/LIFO semantics without priorities.
  • TemporalIntervalTree
    • ✅ Use for efficient interval overlap queries and session windows.
    • ❌ Avoid if your data are single points rather than intervals.
  • TemporalDictionary<TKey, TValue>
    • ✅ Use for key-based access combined with global time-range queries.
    • ❌ Avoid if you require a fully ordered view or range queries strictly sorted by timestamp.
  • TemporalCircularBuffer
    • ✅ Use for a fixed-size ring buffer that overwrites the oldest items.
    • ❌ Avoid if you need unbounded storage or complex queries.

🚀 Getting Started with TemporalCollections

This section shows how to install and use TemporalCollections in your .NET projects with simple examples.

Installation

dotnet add package TemporalCollections

Basic usage

TemporalQueue

using System;
using System.Linq;
using TemporalCollections.Collections;

var queue = new TemporalQueue<string>();

// Enqueue items (timestamps are assigned automatically)
queue.Enqueue("event-1");
queue.Enqueue("event-2");

// Peek oldest (does not remove)
var oldest = queue.Peek();
Console.WriteLine($"Oldest: {oldest.Value} @ {oldest.Timestamp}");

// Dequeue oldest (removes)
var dequeued = queue.Dequeue();
Console.WriteLine($"Dequeued: {dequeued.Value} @ {dequeued.Timestamp}");

// Query by time range (inclusive)
var from = DateTime.UtcNow.AddMinutes(-5);
var to   = DateTime.UtcNow;
var inRange = queue.GetInRange(from, to);
foreach (var item in inRange)
{
    Console.WriteLine($"In range: {item.Value} @ {item.Timestamp}");
}

TemporalSet

using System;
using TemporalCollections.Collections;

var set = new TemporalSet<int>();

set.Add(1);
set.Add(2);
set.Add(2);

Console.WriteLine(set.Contains(1));

// Remove older than a cutoff
var cutoff = DateTime.UtcNow.AddMinutes(-10);
set.RemoveOlderThan(cutoff);

// Snapshot of all items ordered by timestamp
var items = set.GetItems();

TemporalDictionary<TKey, TValue>

using System;
using System.Linq;
using TemporalCollections.Collections;

var dict = new TemporalDictionary<string, string>();

dict.Add("user:1", "login");
dict.Add("user:2", "logout");
dict.Add("user:1", "refresh");

// Range query across all keys
var from = DateTime.UtcNow.AddMinutes(-1);
var to   = DateTime.UtcNow.AddMinutes(1);
var all = dict.GetInRange(from, to);

// Range query for a specific key
var user1 = dict.GetInRange("user:1", from, to);

// Compute span covered by all events
var span = dict.GetTimeSpan();
Console.WriteLine($"Span: {span}");

// Remove a time window across all keys
dict.RemoveRange(from, to);

TemporalStack

using System;
using System.Linq;
using TemporalCollections.Collections;

var stack = new TemporalStack<string>();

// Push (timestamps assigned automatically, monotonic UTC)
stack.Push("first");
stack.Push("second");

// Peek last pushed (does not remove)
var top = stack.Peek();
Console.WriteLine($"Top: {top.Value} @ {top.Timestamp}");

// Pop last pushed (removes)
var popped = stack.Pop();
Console.WriteLine($"Popped: {popped.Value}");

// Time range query (inclusive)
var from = DateTime.UtcNow.AddMinutes(-5);
var to   = DateTime.UtcNow;
var items = stack.GetInRange(from, to).OrderBy(i => i.Timestamp);

// Remove older than cutoff
var cutoff = DateTime.UtcNow.AddMinutes(-10);
stack.RemoveOlderThan(cutoff);

TemporalSlidingWindowSet

using System;
using System.Linq;
using TemporalCollections.Collections;

var window = TimeSpan.FromMinutes(10);
var swSet = new TemporalSlidingWindowSet<string>(window);

// Add unique items (insertion timestamp recorded)
swSet.Add("A");
swSet.Add("B");

// Periodically expire items older than the window
swSet.RemoveExpired();

// Snapshot (ordered by timestamp)
var snapshot = swSet.GetItems().ToList();

// Query by time range
var from = DateTime.UtcNow.AddMinutes(-5);
var to   = DateTime.UtcNow;
var inRange = swSet.GetInRange(from, to);

// Manual cleanup by cutoff (if needed)
swSet.RemoveOlderThan(DateTime.UtcNow.AddMinutes(-30));

TemporalSortedList

using System;
using System.Linq;
using TemporalCollections.Collections;

var list = new TemporalSortedList<int>();

// Add items (kept sorted by timestamp internally)
list.Add(10);
list.Add(20);
list.Add(30);

// Fast range query via binary search (inclusive)
var from = DateTime.UtcNow.AddSeconds(-30);
var to   = DateTime.UtcNow;
var inRange = list.GetInRange(from, to);

// Before / After helpers
var before = list.GetBefore(DateTime.UtcNow);
var after  = list.GetAfter(DateTime.UtcNow.AddSeconds(-5));

// Housekeeping
list.RemoveOlderThan(DateTime.UtcNow.AddMinutes(-1));
Console.WriteLine($"Span: {list.GetTimeSpan()}");

TemporalPriorityQueue<TPriority, TValue>

using System;
using System.Linq;
using TemporalCollections.Collections;

var pq = new TemporalPriorityQueue<int, string>();

// Enqueue with explicit priority (lower number = higher priority)
pq.Enqueue("high", priority: 1);
pq.Enqueue("low",  priority: 10);

// TryPeek (does not remove)
if (pq.TryPeek(out var next))
{
    Console.WriteLine($"Peek: {next}");
}

// TryDequeue (removes highest-priority; stable by insertion time)
while (pq.TryDequeue(out var val))
{
    Console.WriteLine($"Dequeued: {val}");
}

// Time-based queries are also available
var from = DateTime.UtcNow.AddMinutes(-5);
var to   = DateTime.UtcNow;
var items = pq.GetInRange(from, to);

Console.WriteLine($"Count in range: {pq.CountInRange(from, to)}");

TemporalCircularBuffer

using System;
using System.Linq;
using TemporalCollections.Collections;

// Fixed-capacity ring buffer; overwrites oldest when full
var buf = new TemporalCircularBuffer<string>(capacity: 3);

buf.Add("A");
buf.Add("B");
buf.Add("C");
buf.Add("D"); // Overwrites "A"

// Snapshot (oldest -> newest)
var snapshot = buf.GetSnapshot();
foreach (var it in snapshot)
{
    Console.WriteLine($"{it.Value} @ {it.Timestamp}");
}

// Range queries
var from = DateTime.UtcNow.AddMinutes(-5);
var to   = DateTime.UtcNow;
var inRange = buf.GetInRange(from, to);

// Remove a time window
buf.RemoveRange(from, to);

// Cleanup by cutoff (keeps >= cutoff)
buf.RemoveOlderThan(DateTime.UtcNow.AddMinutes(-1));

TemporalIntervalTree

using System;
using System.Linq;
using TemporalCollections.Collections;

var tree = new TemporalIntervalTree<string>();

var now = DateTime.UtcNow;
tree.Insert(now, now.AddMinutes(10), "session:A");
tree.Insert(now.AddMinutes(5), now.AddMinutes(15), "session:B");

// Overlap query (values only)
var overlapValues = tree.Query(now.AddMinutes(7), now.AddMinutes(12));
// Overlap query (with timestamps = interval starts)
var overlapItems  = tree.GetInRange(now.AddMinutes(7), now.AddMinutes(12));

Console.WriteLine($"Overlaps: {string.Join(", ", overlapValues)}");

// Remove intervals that ended before a cutoff
tree.RemoveOlderThan(now.AddMinutes(9));

Threading Model & Big-O Cheatsheet

All collections are thread-safe. Locking granularity and common operations (amortized):

Collection | Locking | Add/Push | Range Query | RemoveOlderThan
TemporalQueue | single lock around a queue snapshot | O(1) | O(n) | O(k) from head
TemporalStack | single lock; drain & rebuild for window ops | O(1) | O(n) | O(n)
TemporalSet | lock-free dict + per-bucket ops | O(1) avg | O(n) | O(n)
TemporalSortedList | single lock; binary search for ranges | O(n) insert | O(log n + m) | O(k)
TemporalPriorityQueue | single lock; SortedSet by (priority, timestamp) | O(log n) | O(n) | O(n)
TemporalIntervalTree | single lock; interval overlap pruning | O(log n) avg | O(log n + m) | O(n)
TemporalDictionary | concurrent dict + per-list lock | O(1) avg | O(n) | O(n)
TemporalCircularBuffer | single lock; ring overwrite | O(1) | O(n) | O(n)

n = items, m = matches, k = removed.

Benchmark Results

Measured with BenchmarkDotNet, the results paint a consistent picture:

  • Insert-heavy pipelines with periodic age-off
    TemporalQueue and TemporalCircularBuffer deliver the lowest median insert times (constant-time appends) and predictable pruning
    (head-first for the queue, overwrite for the ring).

  • Frequent, wide time-window queries over large datasets
    TemporalSortedList (binary-search boundaries) and TemporalIntervalTree (overlap index) offer the best query latency,
    at the cost of more expensive inserts—especially for the sorted list.

  • Middle ground
    TemporalSet and TemporalSlidingWindowSet show good insertion behavior and simple maintenance,
    but range scans are linear compared to indexed structures.

  • Priority-aware processing
    TemporalPriorityQueue optimizes for priority-based dequeue, so time-range scans and pruning are comparatively slower.

  • Per-key histories + global time queries
    TemporalDictionary<TKey,TValue> is a balanced option when you need per-key histories together with global time queries,
    while TemporalStack mirrors the queue on inserts but pays linear costs on range queries and pruning.

For exact median timings, environment details, and methodology, see the full report:

👉 Full Benchmark report

Conclusion

TemporalCollections offers a pragmatic, production-minded approach to managing time-aware data in .NET: you get consistent timestamps, a unified query API, and a portfolio of structures optimized for different temporal needs. Start simple with a queue or sliding window set; when your workload demands it, switch to a sorted or interval-based structure, without changing how you query by time.

]]>
Francesco Del Re
The Pragmatic Power of Modular Monoliths in .NET2025-08-11T00:00:00+00:002025-08-11T00:00:00+00:00https://engineering87.github.io/2025/08/11/modular-monolithA modular monolith is a hybrid architecture: it keeps the single deployable unit of a traditional monolith while enforcing solid module boundaries, as microservices do. Technically the application remains a single deployable unit, but internally it is made up of distinct, encapsulated components, each providing a specific domain or business capability.

What Is a Modular Monolith?

The modular monolith is an architectural compromise: it retains the simplicity of a traditional monolithic application while imposing clear module boundaries, similar to microservices. The software is deployed as a single unit, but it is built from well-encapsulated components, each covering a distinct domain or business function.

Why Choose a Modular Monolith?

Often referred to as the Goldilocks architecture, it provides just the right balance of flexibility and simplicity. You avoid the operational overhead of microservices (service orchestration, network latency, etc.) while gaining maintainability and scalability over a tightly coupled monolith. Other compelling advantages:

  • Simplified deployment and operations
  • Easier code refactoring and ownership boundaries
  • A natural evolution path toward microservices if needed

Core Architecture Principles

  1. Encapsulation through Modules Every module should encapsulate its own domain logic, data access, and potentially its own database context. Modules interact with one another only via abstractions; guarding against cross-module leakage keeps coupling low and cohesion high.

  2. Vertical Slices + Domain-Driven Design Structuring your code around features (Vertical Slice Architecture) keeps the implementation close to its business use cases. Combined with Clean or Onion Architecture layers, a domain-centered design also helps establish clearer dependencies and enforce isolation.

  3. Shared vs Isolated Data Even when multiple modules share a database, architectural discipline should prevent cross-boundary access. Each module should interact only with its own aggregates and schemas; this can be enforced through internal visibility or architectural boundaries.

Hands-On Example in C#

In a modular monolith, each module is self-contained and exposes only a public API (interfaces, commands, events) to other modules, while internal classes remain inaccessible. Imagine a .NET solution structured like this:

MyApp.Web               // Entry point
MyApp.Modules.Orders    // Orders domain
MyApp.Modules.Inventory // Inventory domain
MyApp.SharedKernel      // Shared contracts
MyApp.Modules.Orders.Tests
MyApp.Modules.Inventory.Tests

Inventory Module:

// SharedKernel/Contracts/IInventoryService.cs
public interface IInventoryService
{
    bool AdjustStock(int productId, int quantity);
}

// Modules.Inventory/InventoryService.cs (internal: not visible outside the module)
internal class InventoryService : IInventoryService
{
    public bool AdjustStock(int productId, int quantity)
    {
        // adjust stock levels here (simplified)
        return true;
    }
}

Public API registration:

Modules expose public interfaces for cross-module communication.

public static IServiceCollection AddInventoryModule(this IServiceCollection services)
{
    services.AddScoped<IInventoryService, InventoryService>();
    return services;
}

For example, OrderService could depend on an IInventoryService defined in a shared contract, promoting decoupling. Use the internal modifier to enforce module boundaries. Some frameworks, like Ardalis.Modulith (https://github.com/ardalis/modulith), even automate boundary checking via tests.

Usage in Orders module:

public class OrderService
{
    private readonly IInventoryService _inventory;
    public OrderService(IInventoryService inventory) => _inventory = inventory;

    public bool PlaceOrder(int productId, int qty) =>
        _inventory.AdjustStock(productId, -qty);
}

The key is that modules never depend on each other’s internal classes, only on public contracts. This keeps boundaries clear, reduces coupling, and makes future refactoring or extraction into microservices easier.
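Beyond the frameworks mentioned above, a lightweight guard against boundary erosion can live in an ordinary unit test: scan a module assembly's references and fail if it leans on another module directly. This is a hedged sketch; the assembly name used in the example is illustrative and matches the solution layout shown earlier.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class BoundaryChecks
{
    // True if 'module' carries no reference to the forbidden assembly.
    // Catches direct project references; it cannot see reflection-based access.
    public static bool RespectsBoundary(Assembly module, string forbiddenAssemblyName)
        => module.GetReferencedAssemblies()
                 .All(a => !string.Equals(a.Name, forbiddenAssemblyName,
                                          StringComparison.OrdinalIgnoreCase));
}
```

In a test, you might assert, for instance, that the Orders module never references MyApp.Modules.Inventory, only the SharedKernel contracts.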

When to Favor a Modular Monolith?

This style shines when:

  • You operate with small to mid-sized teams
  • Your delivery velocity matters more than complex scalability
  • You later want the flexibility to extract modules as independent microservices without rewriting everything

Trade-offs to Consider

  • Despite logical segmentation, failure in one module can still bring down the whole application
  • Scaling remains vertical, unless you extract modules later
  • Discipline required: without review, modules can devolve into a tangled codebase

Conclusion

Modular monoliths are a clever compromise. They are easier to deal with than truly distributed systems, while still yielding structure, maintainability, and testability. They also fit naturally with .NET tooling and provide a migration path to microservices if that is the long-term goal.

]]>
Francesco Del Re
Hexagonal Architecture with .NET: Designing for Testability and Adaptability2025-07-19T00:00:00+00:002025-07-19T00:00:00+00:00https://engineering87.github.io/2025/07/19/exagonal-architectureIn an era where software must evolve rapidly, integrate seamlessly, and remain robust over time, how we architect our systems matters more than ever. Among the architectural styles that have proven successful in enabling maintainability, testability, and flexibility is Hexagonal Architecture, also known as Ports and Adapters. In this article, we’ll explore Hexagonal Architecture in .NET applications: the thinking behind the pattern, why it matters, and how you can apply it to build systems that are easier to test, maintain, and grow. We’ll conclude with a full working example in C# on .NET.

What Is Hexagonal Architecture?

Hexagonal Architecture, introduced by Alistair Cockburn, is an approach to software design that aims to isolate the core business logic from external concerns like databases, web APIs, UIs and messaging systems.

Key Concepts:

  • Application Core: Contains your business logic and use cases. It has no dependencies on infrastructure or external systems.
  • Ports: Interfaces defined by the core to communicate with the outside world.
  • Adapters: Implementations of ports that bridge external systems to the application core.

Benefits:

  • Improved testability (the core logic can be unit tested in isolation).
  • Enhanced adaptability (easy to swap infrastructure technologies).
  • Clear separation of concerns and better dependency management.

When and Why Use Hexagonal Architecture in .NET?

Software development in the world of .NET is often quick off the blocks: ASP.NET Core gives us ready-to-use templates, Entity Framework simplifies data access, and integrating third-party services is just a package away. However, as expectations shift and the codebase grows, tightly coupled code becomes a drag rather than a benefit. This is where Hexagonal Architecture really shines. Instead of structuring your app around frameworks and technology, you structure it around your domain logic, the essence of your application that actually matters. By isolating the core business rules from external dependencies, Hexagonal Architecture helps .NET developers create systems that are more maintainable, testable, and adaptable over time. You’re no longer tied to a specific database, UI, or even framework; those are just adapters that can be swapped out with minimal impact. Here’s why this approach can be a game-changer for .NET applications:

  • Testability: You can test your business logic in isolation without touching web servers, databases, or file systems. This means faster, more reliable unit tests.
  • Technology Independence: You can switch from Entity Framework to Dapper, or from REST to gRPC, without changing your business logic.
  • UI Flexibility: You can present your application as a REST API, a Blazor frontend, or even a CLI. The underlying core is the same.
  • Clean Evolution: You can migrate legacy systems incrementally, one adapter at a time, without touching the domain.

In the real world, this structure is very powerful in:

  • Domain-Driven Design (DDD) situations, where your domain logic is rich and needs to be kept apart from noisy infrastructure code.
  • Microservices, where each service benefits from having a clear boundary between its core behavior and the way it communicates.
  • Legacy modernization efforts, where wrapping and replacing infrastructure components piecemeal is paramount.

Overall, Hexagonal Architecture does not slow your development process down; it ensures that the speed you gain today won’t come at the expense of maintainability tomorrow.

Project Structure Overview

Let’s implement a simple Webinar Management system following the hexagonal approach.

Domain: WebinarControl.Core

  • Domain Models
  • Interfaces (Ports)
  • Use Cases

Application Layer: WebinarControl.Application

  • Services that coordinate use cases

Adapters:

  • WebinarControl.Infrastructure: Database Adapter (EF Core)
  • WebinarControl.WebApi: REST API Adapter (ASP.NET Core)

Step-by-Step Implementation

  1. Define the Domain (Core) Project: WebinarControl.Core
// Models/Webinar.cs
namespace WebinarControl.Core.Models
{
    public class Webinar
    {
        public Guid Id { get; set; } = Guid.NewGuid();
        public string Title { get; set; } = string.Empty;
        public string Speaker { get; set; } = string.Empty;
        public DateTime ScheduledAt { get; set; }
    }
}

// Ports/IWebinarRepository.cs
using WebinarControl.Core.Models;

namespace WebinarControl.Core.Ports
{
    public interface IWebinarRepository
    {
        Task<Webinar> GetByIdAsync(Guid id);
        Task<IEnumerable<Webinar>> GetAllAsync();
        Task AddAsync(Webinar webinar);
    }
}

// Ports/IScheduleWebinarUseCase.cs
using WebinarControl.Core.Models;

namespace WebinarControl.Core.Ports
{
    public interface IScheduleWebinarUseCase
    {
        Task<Webinar> ScheduleAsync(string title, string speaker, DateTime scheduledAt);
    }
}
  2. Implement the Application Logic (Project: WebinarControl.Application)
// Services/ScheduleWebinarService.cs
using WebinarControl.Core.Models;
using WebinarControl.Core.Ports;

namespace WebinarControl.Application.Services
{
    public class ScheduleWebinarService : IScheduleWebinarUseCase
    {
        private readonly IWebinarRepository _repository;

        public ScheduleWebinarService(IWebinarRepository repository)
        {
            _repository = repository;
        }

        public async Task<Webinar> ScheduleAsync(string title, string speaker, DateTime scheduledAt)
        {
            var webinar = new Webinar { Title = title, Speaker = speaker, ScheduledAt = scheduledAt };
            await _repository.AddAsync(webinar);
            return webinar;
        }
    }
}
  3. Implement the Infrastructure Adapter (EF Core) (Project: WebinarControl.Infrastructure)
// EF/WebinarDbContext.cs
using Microsoft.EntityFrameworkCore;
using WebinarControl.Core.Models;

namespace WebinarControl.Infrastructure.EF
{
    public class WebinarDbContext : DbContext
    {
        public DbSet<Webinar> Webinars => Set<Webinar>();

        public WebinarDbContext(DbContextOptions<WebinarDbContext> options)
            : base(options) { }
    }
}

// Repositories/WebinarRepository.cs
using Microsoft.EntityFrameworkCore;
using WebinarControl.Core.Models;
using WebinarControl.Core.Ports;

namespace WebinarControl.Infrastructure.Repositories
{
    public class WebinarRepository : IWebinarRepository
    {
        private readonly WebinarDbContext _context;

        public WebinarRepository(WebinarDbContext context)
        {
            _context = context;
        }

        public async Task AddAsync(Webinar webinar)
        {
            _context.Webinars.Add(webinar);
            await _context.SaveChangesAsync();
        }

        public async Task<IEnumerable<Webinar>> GetAllAsync()
        {
            return await _context.Webinars.ToListAsync();
        }

        public async Task<Webinar> GetByIdAsync(Guid id)
        {
            return await _context.Webinars.FindAsync(id);
        }
    }
}
  4. Implement the Web API Adapter (Project: WebinarControl.WebApi)
// Program.cs
using Microsoft.EntityFrameworkCore;
using WebinarControl.Core.Ports;
using WebinarControl.Application.Services;
using WebinarControl.Infrastructure.EF;
using WebinarControl.Infrastructure.Repositories;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddDbContext<WebinarDbContext>(opt =>
    opt.UseInMemoryDatabase("WebinarDb"));

builder.Services.AddScoped<IWebinarRepository, WebinarRepository>();
builder.Services.AddScoped<IScheduleWebinarUseCase, ScheduleWebinarService>();

builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

app.UseSwagger();
app.UseSwaggerUI();

app.MapPost("/webinars", async (string title, string speaker, DateTime scheduledAt, IScheduleWebinarUseCase useCase) =>
{
    var webinar = await useCase.ScheduleAsync(title, speaker, scheduledAt);
    return Results.Created($"/webinars/{webinar.Id}", webinar);
});

app.Run();
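The Program.cs above only wires up the POST route. Read operations can be surfaced the same way; here is a sketch (not part of the original sample) that injects the repository port directly for brevity, although a stricter hexagonal design would wrap these reads in a dedicated query use case. These calls would sit alongside the MapPost call, before app.Run():

// Read endpoints built on the same outbound port (sketch).
app.MapGet("/webinars", async (IWebinarRepository repository) =>
    Results.Ok(await repository.GetAllAsync()));

app.MapGet("/webinars/{id:guid}", async (Guid id, IWebinarRepository repository) =>
    await repository.GetByIdAsync(id) is { } webinar
        ? Results.Ok(webinar)
        : Results.NotFound());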
  5. Testing the Core
// UnitTests/ScheduleWebinarServiceTests.cs
using Moq;
using WebinarControl.Core.Ports;
using WebinarControl.Application.Services;
using Xunit;

public class ScheduleWebinarServiceTests
{
    [Fact]
    public async Task ScheduleAsync_ShouldCreateWebinarWithGivenDetails()
    {
        var repoMock = new Mock<IWebinarRepository>();
        var service = new ScheduleWebinarService(repoMock.Object);

        var scheduledAt = DateTime.UtcNow.AddDays(1);
        var webinar = await service.ScheduleAsync(".NET Hexagonal", "F. Del Re", scheduledAt);

        Assert.Equal(".NET Hexagonal", webinar.Title);
        Assert.Equal("F. Del Re", webinar.Speaker);
        Assert.Equal(scheduledAt, webinar.ScheduledAt);
        repoMock.Verify(r => r.AddAsync(It.IsAny<Webinar>()), Times.Once);
    }
}
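If you prefer fakes over a mocking library, the same behavior can be verified with a hand-rolled in-memory implementation of the outbound port. A sketch, not part of the original sample:

// A minimal in-memory implementation of the port; useful when you
// prefer fakes over mocks for testing the core.
public class InMemoryWebinarRepository : IWebinarRepository
{
    public List<Webinar> Webinars { get; } = new();

    public Task AddAsync(Webinar webinar)
    {
        Webinars.Add(webinar);
        return Task.CompletedTask;
    }

    public Task<IEnumerable<Webinar>> GetAllAsync() =>
        Task.FromResult<IEnumerable<Webinar>>(Webinars);

    public Task<Webinar> GetByIdAsync(Guid id) =>
        Task.FromResult(Webinars.First(w => w.Id == id));
}

[Fact]
public async Task ScheduleAsync_ShouldPersistWebinar()
{
    var repository = new InMemoryWebinarRepository();
    var service = new ScheduleWebinarService(repository);

    await service.ScheduleAsync(".NET Hexagonal", "F. Del Re", DateTime.UtcNow.AddDays(1));

    Assert.Single(repository.Webinars);
}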

Potential Limitations and Challenges

While Hexagonal Architecture offers numerous benefits, especially around testability, separation of concerns, and adaptability, it’s important to understand that it comes with trade-offs.

  • Initial Complexity and Overhead: When starting a new project, Hexagonal Architecture introduces abstractions and layers that may feel premature or heavy-weight for small or prototype applications. You might be crafting multiple projects, interfaces, and dependency injections before you see a single feature working.

  • Over-Abstraction: Too much abstraction can lead to boilerplate code and cognitive overhead, especially when interfaces are created for every single service, even when there’s only a single implementation. For small teams or codebases, this slows down development rather than speeding it up.

  • Learning Curve for Development Teams: Developers new to the pattern may struggle to find their way around or contribute to the codebase. Terms like ports, adapters, and the separation of inbound and outbound interfaces require a shift in thinking from traditional layered architectures.

Conclusion

Hexagonal Architecture is a strong paradigm for .NET developers wishing to write maintainable, testable, and adaptable systems. By keeping your domain logic and infrastructure concerns well separated, you not only decouple components, you allow your application to evolve independently of technology choices. With .NET and C#’s evolving features, it’s easier than ever to write well-architected systems that stand the test of time.

Francesco Del Re
Health Checks in Microservices with C#: Readiness, Liveness, and Startup Probes

2025-06-15 · https://engineering87.github.io/2025/06/15/health-checks

In today’s microservices architectures, particularly those running in Kubernetes, application health checks are essential for reliability, observability, and straightforward scalability. Kubernetes offers liveness, readiness, and startup probes, which help the platform understand and manage the life cycle of application containers. This article examines what these probes are, how they are intended to be used, and how to implement them effectively in C# applications using ASP.NET Core.

What Are Probes?

In a microservices context, and specifically in a Kubernetes environment, health probes are ways for the platform to track the status of each container and take relevant action when something goes wrong. Health probes help ensure high availability and are a key part of managing the orchestration of services because they deliver information to Kubernetes about your application’s internal state. There are three types of probes: liveness probe, readiness probe, and startup probe.

Liveness Probe

The liveness probe determines whether your application is still alive. It answers a simple question: is the application running, or is it deadlocked or stuck? If the liveness probe keeps failing, Kubernetes treats the container as broken and restarts it automatically. This is useful in cases where the app has stopped processing due to some internal failure, but the process has not crashed. Liveness checks are usually simple and fast, just enough to determine that the core application loop has not gone down. A properly configured liveness probe prevents long-running but non-live containers from staying in production, improving overall resilience.

Readiness Probe

The readiness probe checks whether a container is ready to serve requests. The container may be alive (as determined by the liveness probe) but still not ready for a variety of reasons: it may be initializing, waiting for configuration, or establishing a database connection. In that case, Kubernetes drops the Pod from the Service endpoint list until the readiness probe succeeds again. The container is not restarted; it is simply held back from serving traffic. The readiness check is especially important during deployments, rolling updates, and restarts, to ensure that only containers that are fully ready take on load.

Startup Probe

The startup probe is intended for applications that take a long time to initialize. It runs during startup, and while it is failing, Kubernetes won’t run the liveness or readiness probes. This is particularly valuable for legacy systems or services with long bootstrapping processes. The startup probe avoids the case where the liveness probe prematurely marks the container as failed before the application is even ready, causing it to restart. Once the startup probe has succeeded, Kubernetes starts running the regular readiness and liveness probes.

Implementing Health Checks in C# with ASP.NET Core

Health check functionality is built into the ASP.NET Core framework and does not require any additional packages.

Step 1: Add the Health Check Middleware

In your Program.cs or Startup.cs, register health checks:

builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy())
    .AddCheck<DatabaseHealthCheck>("database");

You can create custom checks by implementing the IHealthCheck interface, which contains a single CheckHealthAsync method:

using Microsoft.Data.SqlClient; // requires the Microsoft.Data.SqlClient package
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class DatabaseHealthCheck : IHealthCheck
{
    private readonly IConfiguration _config;

    public DatabaseHealthCheck(IConfiguration config)
    {
        _config = config;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        var connectionString = _config["Data:DefaultConnection"];

        using var connection = new SqlConnection(connectionString);

        try
        {
            await connection.OpenAsync(cancellationToken);

            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database connection failed", ex);
        }
    }
}

Step 2: Configure Endpoints

Map the health check endpoints in Program.cs:

app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = (check) => check.Name == "self"
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = (check) => check.Name == "database"
});
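By default these endpoints return a plain-text body (“Healthy”, “Degraded”, or “Unhealthy”). If your monitoring stack expects structured output, the HealthCheckOptions.ResponseWriter delegate lets you serialize the full report. A minimal sketch:

using System.Text.Json;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Name == "database",
    ResponseWriter = async (context, report) =>
    {
        context.Response.ContentType = "application/json";

        // Serialize the overall status plus each individual check result.
        var payload = JsonSerializer.Serialize(new
        {
            status = report.Status.ToString(),
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                description = e.Value.Description
            })
        });

        await context.Response.WriteAsync(payload);
    }
});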

Creating Composite Health Checks

In certain cases, it’s useful to aggregate multiple health checks under a single, composite health check. This is particularly helpful when you want to expose a higher-level abstraction like StorageHealth, which internally evaluates the health of, say, a database, a blob storage, and a file system. Here’s how you can implement a composite health check by composing multiple IHealthCheck instances:

public class StorageHealthCheck : IHealthCheck
{
    private readonly IEnumerable<IHealthCheck> _checks;

    public StorageHealthCheck(IEnumerable<IHealthCheck> checks)
    {
        _checks = checks;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        var results = await Task.WhenAll(_checks.Select(c =>
            c.CheckHealthAsync(context, cancellationToken)));

        if (results.Any(r => r.Status == HealthStatus.Unhealthy))
        {
            return HealthCheckResult.Unhealthy("One or more storage checks failed");
        }

        if (results.Any(r => r.Status == HealthStatus.Degraded))
        {
            return HealthCheckResult.Degraded();
        }

        return HealthCheckResult.Healthy();
    }
}

You can register it like this:

builder.Services.AddHealthChecks()
    .AddCheck<StorageHealthCheck>("storage_health");

When to Use a Composite Class

  • You need custom aggregation logic.
  • You want to encapsulate a domain-specific grouping rather than just a tag-based one.
  • You want to reuse the group check across multiple probes or services.
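When none of these apply, the built-in tag mechanism is usually enough. A sketch of the tag-based alternative (BlobStorageHealthCheck is a hypothetical second check, not implemented in this article):

builder.Services.AddHealthChecks()
    .AddCheck<DatabaseHealthCheck>("database", tags: new[] { "storage" })
    .AddCheck<BlobStorageHealthCheck>("blob", tags: new[] { "storage" });

// The probe endpoint filters by tag instead of delegating to a composite class.
app.MapHealthChecks("/health/storage", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("storage")
});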

Configure Probes in Kubernetes

Example configuration for Kubernetes deployment.yaml:

livenessProbe:
  httpGet:
    path: /health/live
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /health/live
    port: 80
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 30

Explanation:

  • initialDelaySeconds: Time to wait after the container starts before probing.
  • periodSeconds: How often to perform the check.
  • failureThreshold: Number of failed checks before taking action.
  • timeoutSeconds: Timeout for each probe request.

Note that the startup probe above gives the application up to failureThreshold × periodSeconds = 30 × 10 = 300 seconds to start before Kubernetes gives up and restarts the container.

Best Practices

  • Use /health/ready to include checks for dependencies like databases, caches, etc.
  • Use /health/live to ensure your app is running, even if not fully operational.
  • Separate concerns clearly: make your liveness probe simple and fast.
  • Use startupProbe for apps that need extra time to initialize.
  • Ensure health check endpoints are lightweight and fast to avoid resource strain.

Conclusion

Health probes are a vital part of robust microservices. By using ASP.NET Core’s health check system together with Kubernetes probes, you can ensure your services behave reliably and scale appropriately. Correctly implementing liveness, readiness, and startup probes reduces downtime and improves observability.

Francesco Del Re