CodeOpinion

Coding Isn’t the Hard Part

Derek Comartin — Wed, 18 Mar 2026 13:31:17 +0000

I keep seeing posts pushing back on the idea that coding isn’t the hard part. And I get why. A lot of the disagreement comes down to what people mean by coding.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

But in the world I work in, coding usually is not the hard part.

I’m talking about line of business and enterprise apps. Order management, healthcare, insurance, logistics, and similar systems. In those kinds of systems, the real difficulty is usually not writing code. That is not to say building software is easy, because it is not. But if you understand your tools, your language, your libraries, your frameworks, and you have a solid foundation, the coding itself is often the more mechanical part.

That is because these systems usually are not algorithmically complex. Some domains absolutely are, but most of the systems I’m talking about are more workflow complex than algorithmically complex. Once you have a good foundation and you know the tooling you are working with, building the system often becomes a matter of assembling pieces. You keep adding on to what is already there. In that sense, the implementation can start to feel routine.

The hard part is figuring out what to build

What is not routine is figuring out what needs to be built in the first place.

That is where the real complexity shows up. What events occur in the system? What triggers them? What business rules apply? What data has to remain consistent? What edge cases exist? How does the process actually work from end to end?

Those are the questions that make this hard.

For line of business systems, the best developers I know understand business well. They know how to break down workflows, decompose a problem, and understand how things move through a system. You are not just writing code. You are trying to understand how a business operates and then model that in software.

Why boundaries are difficult

That is why I keep saying that defining boundaries is one of the most important things you can do, and also one of the most difficult.

There are techniques that can help, like event storming, but it is still hard to take a large system, break it into smaller parts, and decide where responsibilities belong. That is where most of the real design work is.

Part of the disagreement around this topic is probably just different definitions of coding. One reply I saw said that applying a good design to an existing system is coding. And if that is your definition, then we are not that far apart. Because I am talking about system design and implementation together. That work is difficult. Building line of business systems absolutely has complexity.

Workflow complexity is different from technical complexity

I use messaging and event driven architecture examples a lot because they make this easy to see.

If you are building a workflow based system, you may have to deal with idempotency, retries, backoffs, dead letter queues, concurrency, claim check patterns, and all kinds of technical concerns. Those things are difficult. In many cases they are more difficult than implementing a specific business step in a workflow.

But even there, the hardest part usually is not the code for a single step. It is understanding the workflow, modeling it correctly, and knowing how it can evolve when the business process changes or when your original assumptions turn out to be incomplete.

A simple shipment example

Take a shipment workflow as a simple example.

At a high level, you might say a shipment is dispatched, a driver arrives for pickup, the shipment is loaded, the driver departs, arrives at the destination, and completes delivery.

That sounds straightforward when you say it like that. But real systems are rarely that simple.

Maybe one truck is handling multiple shipments.

Maybe a pickup becomes unavailable. Maybe the shipment is delayed and there is no point in sending the driver. This is called a dry-run.

Maybe parts of the process branch depending on the customer, the carrier, or the type of delivery.

So now what are you modeling? Is it one workflow or several? Where do those boundaries exist? What belongs together and what should be separate?

That is the hard part.

Writing the code for each step is usually the easier part once you actually understand the model.

Once the model is clear, implementation becomes mechanical

That is why methods like event storming are useful. They help you focus on the events, actions, side effects, users, and different perspectives before you jump into code.

You want to understand the workflow first. You want to understand how smaller workflows fit into larger ones and how they cross boundaries. That work can be done with business people long before you start writing implementation code.

Once you understand it well enough, the implementation often starts to feel templated.

View the code on Gist.

You have probably felt this if you work in .NET and use a messaging framework. Good frameworks handle a lot of the technical heavy lifting for you. They deal with plumbing like the outbox, inbox, logging, database concerns, and idempotency so you can focus on the specific behavior you need to implement.

That is a good thing. But it also makes the point pretty clear.

At that stage, a lot of the work becomes filling in the blanks. The workflow has already been defined. The messages already exist. The handlers are just implementing the behavior you already modeled.

That does not mean there are no hard technical problems

There absolutely are hard technical challenges.

Scaling problems, data consistency issues, infrastructure concerns, deployment issues — those are all real. I like those problems. But those are often architectural issues around the shape and growth of the system, not questions about the business workflow itself.

They are different kinds of difficulty.

A lot of the pushback on “coding isn’t the hard part” comes from people saying that translating ideas into precise, working systems requires deep knowledge and experience. I agree with that completely. But I still separate understanding the business and designing the system from the actual implementation work.

Those are closely related skills, but they are not the same skill.

What the best developers actually do well

The best developers I know are not just technically capable. They understand the business domain. They can decompose a system. They know how to reason about workflows and boundaries.

Yes, they are also good with their tools and can handle concurrency, messaging, and technical complexity. But that is not what makes them stand out most.

What makes them stand out is that they can answer the hard design questions.

Where do responsibilities belong? Who owns the data? What capabilities belong in which part of the system? How are boundaries crossed? How coupled are different parts of the system? What kind of data coupling, timing coupling, or deployment coupling exists?

Those are technical questions, but they are also design questions. They sit above the level of just writing code.

So is coding the hard part?

So when I say coding isn’t the hard part, I am not saying building software is easy.

I am saying the hardest part of building business systems is usually understanding the business, modeling workflows, defining boundaries, and designing a system that can actually support the way the business works.

Once you have done that well, the coding often becomes the easier part.

Follow @CodeOpinion

Join CodeOpinon!
Developer-level members of my Patreon or YouTube channel get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

Vertical Slices doesn’t mean “Share Nothing”

Read Replicas Are NOT CQRS (Stop Confusing This)

Keynote: CTRL+SHIFT+(BUILD) PAUSE

The post Coding Isn’t the Hard Part appeared first on CodeOpinion.

Vertical Slices doesn’t mean “Share Nothing”

Derek Comartin — Fri, 06 Mar 2026 14:43:38 +0000

How do you share code between vertical slices? Vertical slices are supposed to be share nothing, right? Wrong. It is not about share nothing. It is about sharing the right things and avoiding sharing the wrong things. That is really the point.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://youtu.be/ZiZoR4cQhGE

Vertical Slice Architecture doesn't mean “Share Nothing” (https://youtu.be/ZiZoR4cQhGE)

Boundaries

If you have watched my videos before, you probably know I talk a lot about boundaries. A vertical slice is not that different from a logical boundary. What matters here is that a vertical slice defines a boundary around a use case.

That is the lens I want you to look through, because once you do that, the question of sharing becomes a lot clearer.

A Shipment Workflow Is a Good Example

A good example is a shipment. It is a workflow.

You have different actions that happen along the way that make up a life cycle from beginning to end. Think about ordering something online. It gets dispatched, which means the order is ready to be picked up at the warehouse. The carrier arrives at the shipper. Then the freight is loaded onto the truck. Then it departs. Then it arrives at the destination. Then it is delivered.

That is the workflow.

Now the natural question is this: is the vertical slice the whole workflow, or is each step its own slice?

Because if each step is a slice, how do you share between them?

A Vertical Slice Can Be One Step In the Workflow

Each step in that workflow can be a vertical slice.

You could model the entire workflow as one slice. Sometimes that might be fine. But often, each step can be its own slice because the workflow can change. It can deviate based on the actions that occur.

Take the same shipment example. The order gets dispatched, the vehicle is on the way to the warehouse, it arrives there, and then finds out the order was cancelled. There is nothing to pick up anymore.

That is a different use case.

In shipping, that might be called a dry run.

How do you implement that? It is just another vertical slice. It is part of the workflow, but it is also a deviation from that workflow.

That gets us back to the original question. What can you share between those vertical slices that are part of the same workflow?

There Are Two Different Kinds of Sharing

The first kind of sharing is technical infrastructure and plumbing.

Things like error and result types, logging, tracing, authorization helpers, messaging support, outbox primitives, event bus abstractions, and small utility code. That kind of stuff is normal to share. Some slices will use it. Some will not.

A slice gets to decide what dependencies it takes on and what tactical approach it uses. That is part of the slice owning its implementation.

But that leads to the more important question: what does the slice actually own?

A Vertical Slice Owns Its Data and Behavior

A vertical slice owns its data. That use case owns the data it needs and how that data is persisted.

It also owns the dependencies it takes on. It chooses the tactical patterns it wants to use for that specific use case.

That is important because people hear that and then assume everything must be completely isolated. But that is not really true, especially when several slices are part of the same workflow.

Shared State Is Not the Problem

In the shipment example, what you really have is a state machine. You have state transitions across the life cycle, from dispatched all the way to delivered.

So yes, there is shared state.

That does not mean there is shared ownership of everything.

View the code on Gist.

Imagine a shipment with state like status, dispatched at, arrived at, pallets loaded, and emptied at. If that was mapped to a table, each piece of that state is owned by the slice responsible for that action. The dispatch slice changes the dispatch related state. The arrive slice changes the arrival related state. The loaded slice changes the loaded related state.

Each slice owns the behavior around its part of that workflow.

This Still Applies With Event Sourcing

You can think about the exact same idea with event sourcing.

View the code on Gist.

Instead of changing columns in a table, you are appending events to a stream. Dispatched. Arrived. Loaded. Emptied.

Same concept.

Each use case owns the behavior that produces those events. It owns the data tied to that behavior. It owns where that data lives, whether that is in a table or in an event stream.

That can still all live together. You are still sharing an aggregate. You are still sharing a concern because there are invariants around that workflow.

That is not bad sharing.

Sharing an Aggregate Is Not Bad Sharing

An aggregate is a consistency boundary. You need that consistency boundary around the state.

A slice is a use case.

So if you have several use cases related to the same underlying model, that can be shared. If two slices share validation because both operate on the same domain model, that can be shared too.

At the same time, you can have other slices that are not part of that workflow at all and use a completely different model. That is fine too.

The point is not that every slice must have its own isolated universe. The point is understanding what actually belongs together.

Different Slices Can Use the Same Domain Model

Another way to visualize this is by looking at what each slice does from top to bottom.

One slice might be invoked by an HTTP API. It has application code and a data model under it. Another slice might be invoked by a message or event. It has different infrastructure, different application logic, but still uses the same underlying domain model as the HTTP slice.

The entry point is different. The infrastructure is different. The application code is different.

That does not mean the domain model cannot be shared.

And then you might have another use case that is not related at all, even if it lives in the same broader boundary. That one may have a completely separate model.

Again, the point is that slices define their own dependencies and their own behavior. But that does not mean they cannot share anything.

Where Sharing Becomes a Problem

The problem starts when you begin sharing things that have no business being shared.

In the shipment example, I am talking specifically about the workflow and the shipment life cycle from beginning to end. Nothing in that example has anything to do with compliance, customer support, customer tracking, or what was actually ordered.

Those are separate concerns.

The trap people fall into is that they start sharing and coupling things they should not. The model becomes unfocused. That is how you end up with one giant Shipment object for your whole system.

That is where you get into trouble.

Do not share one god object.

A Vertical Slice Is a Logical Boundary, Not a Physical One

This part is really important.

People often talk about vertical slices in the context of code organization, and that is useful. But a vertical slice is not a physical boundary. It is a logical boundary.

That means it does not have to live in one C#, Java, or TypeScript file. It does not have to live in one project.

If you have a mobile app deployed separately to iOS or Android, and it is dealing with specific actions as part of a use case, that can still be part of the slice. If the same use case is invoked through an HTTP API, that is also part of the slice.

The slice is the logical boundary around the use case. It is not just a folder structure.

Good Sharing Versus Bad Sharing

Good sharing is when vertical slices are operating on the same underlying thing as part of a workflow, a life cycle, or a set of common invariants.

Bad sharing is when a change unrelated to one vertical slice affects another slice unexpectedly.

That is when you are sharing things that have unrelated reasons to change.

That is the distinction.

Do Not Share Domain Meaning

Put another way, do not share domain meaning.

In the shipment workflow, dispatched, arrived, and loaded are use case specific. Dispatch is its own thing. No other feature should be doing something related to dispatch unless it actually owns that behavior.

If dispatch publishes events or changes dispatch related state, that should happen there. If there is state related to dispatching, that slice should own it.

You are not sharing that ownership.

Could you still share an underlying domain model that handles the workflow transitions? Absolutely.

But ownership of behavior still matters.

Focus on Actions, Not Just Data

Hopefully one thing stands out in this example. When I describe vertical slices and use cases, I am describing actions. I am not starting with data.

That matters.

And that is the real issue underneath all of this.

This Is Really About Coupling

When we talk about sharing, what are we really talking about?

Coupling.

That is what this usually comes down to.

If you understand what you are coupling to between vertical slices and use cases, you can manage it. If several slices depend on the same underlying domain model because they are part of the same workflow, that can be perfectly fine.

If every vertical slice can touch any piece of data and change state anywhere, then yes, you are going to have a problem.

At the end of the day, this is about managing coupling.

Share the Right Things

Vertical slices are not about sharing nothing.

They are about sharing the right things.

Technical concerns and plumbing? Sure.

Shared invariants as part of the same workflow? Sure.

A shared aggregate when several use cases are part of the same consistency boundary? Sure.

What you want to avoid is coupling things together that do not belong together.

That is the difference between good sharing and bad sharing.

Follow @CodeOpinion

Read Replicas Are NOT CQRS (Stop Confusing This)

Your Idempotent Code Is Lying To You

How Many Microservices Do I Need?

The post Vertical Slices doesn’t mean “Share Nothing” appeared first on CodeOpinion.

Read Replicas Are NOT CQRS (Stop Confusing This)

Derek Comartin — Wed, 18 Feb 2026 14:36:02 +0000

What’s overengineering? Is the outbox pattern, CQRS, and event sourcing overengineering? Some would say yes. The issue is: what’s your definition? Because if you have that wrong, then you’re making the wrong trade offs.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://youtu.be/HzqrTlmA0oc

Read Replicas Are NOT CQRS (Stop Confusing This) (https://youtu.be/HzqrTlmA0oc)

“The outbox pattern is only used in finance applications where consistency is a must. Otherwise, it’s just overengineering.”

Not exactly.

“CQRS is overengineering and rarely used even at very high scale companies. One master DB for writes and a bunch of replica DBs for reads are sufficient.”

No. And it has nothing to do with scaling.

“Event sourcing, another overengineering term, but in reality, most production systems do not implement strict event sourcing as described in books and system design articles. In the practical world, only current state is stored in the primary DB and events and business metrics are stored in an analytics DB.”

The giveaway that this is wrong is the discussion of business metrics related to event sourcing.

In the “practical world”, I’ll give some examples where event sourcing is natural.

Let’s go through them one by one, explain what they are, and when you should be using them.

The Outbox Pattern

Is it about finance? It has nothing to do with finance. Is it about consistency? Yes, that part is correct. It’s really a solution to a dual write problem.

Here’s the dual write problem.

You have your application. Some action gets invoked. You persist a state change in your system.

That’s the first write. The second write is you need to publish an event and write a message to a message broker so other parts of your system know it occurred. That’s the second write.

Here’s the issue. It fails in between. So you do the state change. Everything passes. Everything is saved. Transaction is good. But then you fail to publish the message to your message broker. Now you’re inconsistent. Your state change happened, but the event never got published.

Is it a big deal if you fail to publish that event? It depends what you’re using the event for, and what downstream services care about. If it’s best effort metrics or analytics, it might not be a big deal.

If it’s part of a workflow, it can be a much bigger deal. You want that consistency, and that’s where the outbox comes in.

So how do you solve the dual write problem? Like most problems, don’t have it in the first place.

To solve the dual write problem, we’re going to have a single write. That means you persist your state to your database and within the same transaction you persist the message to an outbox table in that same database.

Separately, you have a publisher that queries the outbox table, pulls messages that need to be published, and pushes them to your message broker. If it succeeds, it reaches back to the database and marks the message as completed or deletes it from the outbox table.

If there’s a failure, you retry. You haven’t lost any messages you wanted to publish.

So is the outbox pattern overengineering? It totally depends on your use case.

If you’re using events as a statement of fact that something occurred within your system and other parts of your system need to know it happened, then it’s probably not overengineering.

If you’re using events as best effort analytics and it’s totally fine if some events aren’t published because nobody depends on them, and lost messages are fine, then yes, it’s overengineering.

One side note: if you’re using a messaging library, it probably already supports the outbox pattern.

CQRS

“CQRS is overengineering and rarely used even at very high scale companies. One master DB for writes and a bunch of replica DBs for reads are sufficient.”

This is confusing two things entirely. It’s talking about scaling at the read write database level, when in reality this is about your application design.

CQRS literally stands for Command Query Responsibility Segregation. Commands change state. Queries read state.

That has nothing to do with databases. One database, two databases, whatever the case may be.

This is about having two different code paths for different responsibilities.

But since scaling was brought up, especially on the query side, that’s the angle I want to tackle. In a lot of query heavy systems, you often have to do a lot of composition.

That composition could be to a single database, multiple databases, a cache, whatever. But you’re making multiple calls to different places to compose data together to return to a client.

Because a lot of systems experience this, people create views or materialized views so you’re not doing all of that composition at runtime.

Instead, you have a separate table, a view, a different collection, a different object, something that represents what’s optimized for a specific query.

Example: an order and line items.

Maybe instead of joining tables and calculating totals on every request, you have a view that does it.

Or you have a materialized view that’s persisted and updated every time there’s a state change to an order.

So when you make a state change, your command updates your write side. Maybe that’s a relational database with normalized tables. And because you have a materialized view, you update that too. That could be in the same transaction. Then when a query comes in, you read directly from the materialized view.

This is all about optimizing reads or writes.

In my example, it’s optimizing reads, using a materialized view.

It doesn’t need to be that at all. It could be a relational database, a document store, a single table, a collection, some object that already contains what you need.

The point is you have different code paths, so you have options.

You could still have your query side do composition and your command side use the exact same database, the exact same schema, and update what it needs to update.

You just have the option to do different solutions if you have different code paths.

So is CQRS overengineering? Not really. You’re likely already doing it in some capacity because you already have different paths for reads and writes.

Where this gets conflated is when you start thinking about it purely from a scaling perspective. If you’re doing a lot of composition and you add read replicas, that’s fine.

But here’s the question.

Are your read replicas eventually consistent?

Because that plays a part in the complexity you’re adding by just adding read replicas. If you want pre computation because you want materialized views to optimize the query side, that’s a strategy if you need to optimize.

Event Sourcing

“In the practical world, only current state is stored in the primary DB and events and business metrics are stored in an analytics DB.”

We’re talking about different things here. Events are facts. What event sourcing is doing is taking those facts and making them the point of truth.

Then you take that point of truth, that series of events, and you can derive current state or any shape of data from any point in time.

Let’s use a practical example because there are a lot of domains that naturally have events. You can just see them. A stream of things that occur.

Here’s a shipment.

You persist these as a stream of events for the unique thing you’re tracking.

Shipment 123 has its own series of events. Another shipment has a different series of events. Those event streams are the point of truth.

You can derive current state from them.

It has nothing to do with analytics, but you can use them for analytics because just like current state, you can turn them into any shape you want.

So if you have an event stream, you can transform it any way you want.

Maybe you transform it into a relational table so analysts can write SQL like “select all shipments dispatched on a particular day”. Or maybe you transform it into a document shape that’s optimized for an application query.

That’s the point.

Your source of truth becomes an append only log of business facts, events. Your state is derived from those events.

A lot of the issues I read about people having with event sourcing are twofold. First, they’re not actually doing event sourcing. They have an event log, but it isn’t the point of truth. Their real database is still current state, and the event log is just “extra”. Or they’re using events as a communication mechanism with other services like a broker, which is a different thing.

Second, there’s a huge difference between facts and CRUD. “Shipment created” is not an event. That’s CRUD. “Order dispatched” is an event. Something happened.

“Shipment modified” is not an event. “Shipment loaded”, “shipment arrived”, “shipment delivered”, those are events.

Is event sourcing overengineering? It can be if all you view your system as is CRUD, and that’s how you build systems.

But there are a lot of domains where, once you start seeing it, you naturally see a series of events and it becomes obvious that’s where event sourcing fits.

The Real Point About Overengineering

Everything has trade-offs.

If you do not understand a concept, you won’t be able to understand what those trade-offs are, because you don’t even know what they are.

Follow @CodeOpinion

Your Idempotent Code Is Lying To You

You Can’t Future-Proof Software Architecture

How Many Microservices Do I Need?

The post Read Replicas Are NOT CQRS (Stop Confusing This) appeared first on CodeOpinion.

Your Idempotent Code Is Lying To You

Derek Comartin — Wed, 04 Feb 2026 13:56:35 +0000

You have some code that handles placing an order. This could be an HTTP API or a message handler. You made it idempotent. You added a unique constraint on some kind of message ID.

And somehow… you still end up double charging the customer’s credit card.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://youtu.be/5_DyX3PnKxg

Your Idempotent Code Is Lying To You (https://youtu.be/5_DyX3PnKxg)

Idempotent

You did everything right. You have idempotency. You have an inbox table and a unique constraint on that message ID. Your handler should be exactly once, right? Wrong.

And it’s because of this call to our payment gateway outside of our database transaction.

Our database can tell us whether we processed the message, but it doesn’t stop us from double charging our customer.

So let’s talk about why this happens, how concurrency can make it worse, and some solutions.

The happy path: idempotency with internal state

Let’s say you have an HTTP API where you might get multiple requests from the same user. Or this could be a message handler from a message broker where a message can be delivered more than once.

What that looks like is:

A request comes in

We take a message ID and persist it to our inbox table

Because we have a unique constraint, if it wasn’t there, everything is fine

Then that exact same message (or HTTP request) comes back in. Guess what? It’s already there. So our request fails.

This is the happy path. It’s idempotency inside the database.

As long as you’re only updating internal state, this works. But it gets a lot more complicated once you start calling something external… like a payment gateway.

Why the payment gateway breaks “exactly once”

In the happy path, all we do is:

start a transaction

add our message ID to the inbox

persist state

if it was a duplicate, we throw when we save or commit

Simple.

But we also need to make the call to the payment gateway. And this is where the issues start.

It might seem like a good idea to call the payment gateway immediately. Maybe you get a transaction ID back, some receipt, something you can persist in your database to mark the order as paid.

But here’s the problem: That payment gateway call is outside of our transaction. Outside of that unique constraint.

View the code on Gist.

So now this can happen:

First request comes in. We charge the customer. We save our inbox record. We persist our state. All good.

Second request comes in (same message, same request) We charge the customer again. Then we hit the database and fail because of the unique constraint when we save and commit.

So our database protected our internal state. But it didn’t protect the external side effect.

Concurrency makes this easier to see

If you’re looking at this from an HTTP API perspective, you can have two identical requests come in at the same time.

The load balancer sends them to two different instances.

They both hit the payment gateway at roughly the same time.

One of them wins the unique constraint. The other fails on commit.

But both potentially charged the customer. At the start I said concurrency can make it worse. A different way to think about it is: concurrency makes it easier to reproduce and prove you have the issue.

There’s no magic distributed transaction coming

One “solution” is a distributed transaction. The reality is you’re not going to get one.

That inbox table protects your internal state, but the moment you cross that network boundary to the payment gateway, all bets are off. You kind of went from exactly once to at least once.

But we can design for it. There are a few approaches here. It’s not one magic fix. It’s usually a combination depending on your situation.

1) The third-party service supports idempotency

Your third-party service might support idempotent requests.

A good example is Stripe. It supports an idempotency key that you pass in the header to make idempotent requests.

So you decide what the key is for that specific business operation. If the same request happens more than once, you send the same key again.

Now it becomes idempotent on the payment provider side too.

Side note: if you’re creating an HTTP API, support idempotent requests. Your clients will love you. If you don’t, they’re the ones who have to deal with the rest of this stuff.

2) Lookup by a reference ID before charging

If the provider doesn’t support idempotency keys, another option is a lookup against some kind of reference.

View the code on Gist.

In my example, I have a payment gateway and I want to know: is this invoice paid?

My reference at this point is the order ID. So the flow becomes:

Ask the gateway: “Is order 123 already paid?”

If not, charge it

You can still have race conditions here. This can still be an issue. But it may be good enough, especially if the third party has a unique constraint on something like your order ID.

3) Serialize the operation per business key (locking)

If lookup isn’t enough, another solution is serializing the operation by a granular business key.

View the code on Gist.

What I’m talking about here is locking.

You’re basically creating a distributed lock. That might be:

using your database and row locking

using Redis for locking

something else entirely

In my example, “granular business key” might be per order. That means only one payment attempt for that order runs at a time. If I can’t acquire the lock, I retry, or return something that tells the caller to retry. Now at any point, we only execute one at a time for that order, charge the customer once, and release the lock.

Caveats

The trade-off is throughput. That’s why the business key has to be granular. If you lock too broadly, you slow everything down.

Also, timeouts matter. If the payment gateway times out, that does not mean it failed. It might have actually succeeded. So even with locking, you still need to think about what “timeout” really means.

4) Separate internal state from external calls (inbox/outbox)

Another option is inbox/outbox and splitting internal state from the external call.

View the code on Gist.

What does that mean?

We get the order

We mark it as payment pending

We add a message to our outbox saying “charge payment”

We save all of that together in the same transaction

Then separately, a processor reads the outbox and performs the external call to the payment gateway.

If the provider supports idempotency keys, that outbox processor uses them. And then if it succeeds, we mark the order as paid. This doesn’t magically fix double charging. You could still double charge.

What it does give you is better internal consistency and better failure handling, because message handlers let you handle retries, backoff, timeouts, and errors differently than your main request path.

That’s one of the reasons I love messaging.

5) Reconcile and compensate

This generally always happens: you need reconciliation and compensating actions.

There’s nothing wrong with this. If something realizes “uh-oh, we double charged”, you void or refund.

This can be a workflow step, or a periodic reconciliation process that compares your system with the third-party system. You’re not failing as an engineer because you need compensation. This is just reality when you’re dealing with external systems.

Putting it all together

You have a lot of options, and it’s often a mix:

Use an inbox table to dedupe incoming messages

Use a unique message ID to prevent reprocessing internal state

Use an outbox pattern so your intent to call an external system is persisted with your internal state

If the provider supports idempotency keys, use them

If not, can you lookup by a reference ID to see if the action already happened?

If acceptable, can you serialize with a distributed lock by a granular business key like order ID?

And when those don’t cover everything, reconcile and compensate

Ultimately, I don’t think the goal is “exactly once” as in “the operation only ever happens exactly once.”

A better goal is designing a system that’s effectively once — it behaves correctly even when you deal with race conditions, concurrency, and timeouts that are outside of your control.

Follow @CodeOpinion

You Can’t Future-Proof Software Architecture

Context Is the Bottleneck in Software Development

Watch Out for Superficial Invariants

The post Your Idempotent Code Is Lying To You appeared first on CodeOpinion.

You Can’t Future-Proof Software Architecture

Derek Comartin — Tue, 20 Jan 2026 23:19:58 +0000

“Future proof your architecture” sounds good. But the reality is you can’t future-proof Software Architecture. When you really think about it, the future is just what’s breaking assumptions. You can’t really future-proof that.

What you can do is contain changes so they don’t ripple through your system.

Where people go wrong is trying to future-proof with abstractions everywhere. What you really want to be doing is controlling the blast radius, meaning controlling where change goes.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://youtu.be/mIKOqtZA7ak

You Can’t Future-Proof Software (But You Can Contain Change) (https://youtu.be/mIKOqtZA7ak)

I’m going to explain this using a thread by Aaron and elaborate on some of the things he’s pointing out.

He posted:

I posted a lot of bangers about SDK bin’s terrible software choices and how it generally made life unpleasant for us. So I wanted to detail how we’re addressing the dumbest and worst design choices in the codebase.

So the first one: what did our dev do when we needed to renew an annual subscription? Modify the subscription created date and reset the renewal reminder hard coded as N months from the creation date.

Now you might be thinking, “That’s ridiculous. I would never do that.”

But it underlines why things change.

The Unknown Is Usually Boring Stuff That Stops Being Boring

In the context of future proofing, the unknown usually lies in things like:

pricing rules changing

renewals and billing schedules changing

tax changes

refunds and how those show up

partial payments

a new payment provider, or a second payment provider/gateway

workflow that used to be simple, becoming not so simple

That’s the real problem. Early on, when you’re building a system, you can have a lot of assumptions about the unknown.

What kills you is early decisions that leak everywhere in your codebase and cause coupling.

You have assumptions. You make decisions. Those decisions leak everywhere. Now you’re coupled. And that coupling is going to cause a lot of pain later when you try to make change.

You know you have this problem because when a change comes in, you have to touch all these things:

the UI

the “domain”

persistence

some random shared helpers and utils

reporting

background jobs

and my favorite: three or five or a dozen other places you didn’t even know existed

You’ll often hear, “Well, we have a very large system and it’s very complex.”

In the context of what I’m talking about, that’s not complexity. That’s coupling.

Stripe Isn’t the Problem. Leaking Stripe Everywhere Is.

In Aaron’s case, he’s feeling the pain of everything being coupled so tightly to Stripe that it’s taken a mini Manhattan project to move off it.

Related, sure, but fundamentally separate concerns. So the assumptions and unknowns causing pain here are exactly what I said at the beginning:

the idea of a renewal wasn’t a first class concept

moving off Stripe becomes a disaster

conflating invoices and payments creates more pain

That “Created Date Renewal Hack” Is a Symptom

Here’s the type of thing that happens when the Stripe assumption leaks into your system.

View the code on Gist.

You end up with leaked information in your subscription, and who knows where else, like:

Stripe customer ID

Stripe invoice ID

And then the only real concept you had was a created date time.

But because you needed renewals after the fact, you didn’t model it.

So that created date turns into a hack. You “renew” by pretending it started again. You overwrite the created date and reset the reminder hard coded off that date.

This is also one of those situations where if the business actually knew you were overwriting this data, you’re potentially losing a lot of valuable info. Audit info. What actually happened. When did it renew. What was the history.

If you talk to someone non technical in the business, whether they care about that, they’ll probably say yes.

Your Data Model Isn’t Your Domain Model

This is where I’ll say something you’ve heard me say before:

Your data model isn’t your domain model.

How you persist data, what your schema looks like, that’s not your domain model.

If you look at your model and think, “This doesn’t really express the domain,” then yeah, you probably don’t have a domain model. You have a data model. A bucket of data. Getters and setters.

Bonus tip: it’s also not your resource model.

You have an HTTP API. Those are different things.

What you return to clients isn’t your underlying schema and isn’t your domain model. It’s a representation of what you want to show to clients.

So What’s the Fix? Not Interfaces Everywhere.

So what’s the fix?

In the Stripe example, it was coupled everywhere.

Is the fix immediately to jump to interfaces and put them everywhere? Use the adapter pattern everywhere all the time?

No.

It’s what I said at the beginning: controlling the blast radius. When you make a change, it should be localized to one particular place.

With Stripe specifically, it depends how coupled you are:

Is the Stripe customer ID leaked into your database?

Are other clients using it because you exposed it via your HTTP API?

Do other libraries use it?

Are you using the Stripe SDK in 10 places, or 200?

If you have 200 usages, and it’s all through the UI, the domain layer, persistence, reporting, background jobs, and everywhere else… you don’t have a blast radius.

You have a disaster.

Separate Concepts: Invoice vs Payment

Another simple example is the invoice vs payment problem.

View the code on Gist.

They were treated as one concept and it was a disaster.

They don’t need to be. They should be separate things. You can have an invoice that has nothing to do with Stripe. It’s just an invoice. No Stripe internals. It’s your concept. Then payments are a separate concept.

Now you can apply payments to an invoice. Partial payments? Fine. Refunds? That has nothing to do with invoices. That has everything to do with payments.

Separate concern. Easier to support. Easier to change.

Watch Your Nouns: Third Party Vocabulary Leaks

Pro tip: when you’re using third party services heavily, especially if they matter a lot to you, the nomenclature from that third party starts leaking into the core of your system.

You’ve got to be careful there.

You want your product’s nouns and verbiage to be yours, not the third party’s.

A Concrete Blast Radius Example

Here’s what “controlling the blast radius” can look like when paying an invoice.

View the code on Gist.

You fetch the invoice from the database.

You create a payment. Separate concept. You call Stripe to charge the account. If it succeeds, you mark the payment as succeeded. If it fails, you mark it as failed.

Then you save your database changes.

Yes, you’re going to have reconciliation, because if something fails on your side but the charge actually went through, you deal with that after the fact.

But the point is this: You’re controlling the blast radius of where you deal with Stripe. It’s concrete. It’s real. But it’s contained.

If you need to change payment providers, you change it where you have that capability exposed. It’s not coupled everywhere.

How People Make It Worse

This is where people take the right problem and make it worse.

Before you say, “Let’s make an IPaymentProvider used across the whole system” — congratulations, you just created shared coupling.

Or “Let’s build a generic billing framework” — no, you created a framework specific to one implementation that isn’t generic at all.

“Let’s reuse this shared library across services or slices” — no, what you created is a distributed monolith starter kit.

Designing for the Unknown

So how do you design and architect for the unknown?

It’s not about trying to future-proof Software Architecture. It’s about containing the blast radius.

For the things that often change — rules, workflows, integrations — if you segregate them, you can change them without it affecting your entire system.

Where things go wrong, like the Stripe example, is leaking internal information throughout the system. Then if it changes… now what?

Because you didn’t localize it. It permeated everywhere, and the blast radius is huge.

Follow @CodeOpinion

Context Is the Bottleneck in Software Development

Why “Microservices” Debates Miss the Point

Watch Out for Superficial Invariants

The post You Can’t Future-Proof Software Architecture appeared first on CodeOpinion.

Context Is the Bottleneck in Software Development

Derek Comartin — Mon, 05 Jan 2026 21:44:22 +0000

Software development context is the real bottleneck, not writing code. AI can generate code fast, but without context, boundaries, and language, you get coupling and brittle systems.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=Px2LM-dkPuQ

AI Makes Code Cheap. That’s Why Design Matters More (https://www.youtube.com/watch?v=Px2LM-dkPuQ)

With AI, I think people are taking a leap that is fundamentally wrong. It is not about producing cheap code. I do not think that has ever been the bottleneck. The bottleneck has been context. If you have watched enough of my videos, you probably know my slogan is context is king. And context is probably more important now than ever.

It is not about what syntax or folder structure your source code looks like. It is the context of why it does what it does. Why did you write the code, or the AI write the code, given your instructions? What are we optimizing for? What constraints did we have? What are the invariants and the things that can never happen? Probably most importantly, what tradeoffs did we intentionally accept? What decisions were made, and why?

AI can provide the implementation. It can write all the code. But it needs context. It needs to understand how to make the tradeoffs. Giving the instructions I see online like “write clean code” or “DRY code” is the most useless instruction for actually developing a good design.

“Design does not matter anymore” is the trap

I understand the rebuttal people have. “Well, it does not even matter anymore about design. Because if AI can read the code and can write the code, and therefore change the code, it does not matter. It does not matter what type of folder structure you have, organization, it does not matter at all about the design because AI can just handle it.”

But that is a trap.

The pain has never been writing code. It is about making behavioral changes safely.

Coupling is still the thing you have to manage

I stumbled upon a post that basically said the way the code looks should be irrelevant. What matters is the end result. On the surface, I get it. But I think it is naive.

When people talk about “how the code looks” they might be thinking structure, syntax, whatever. But to go beyond that, yes, it matters how it looks, because coupling still needs to be managed. Coupling is arguably the thing that when we are writing software we need to handle the most. If you want a long lived system that can evolve and change, you need to manage coupling.

If AI makes producing code cheap, guess what is also going to be cheap. Creating coupling. Creating a rat’s nest turd pile of coupling. What people call a big ball of mud. Something that is hard to change.

Everybody can relate to this. You make a change to one part of your system and it affects another part of the system unintentionally. Why does that happen? Coupling.

And you might be thinking, “But AI knows everything.” It does not.

“But it is going to know all the coupling.” Sure. But you already know all the coupling right now as well. And you still have this problem. So what is going to be different with AI?

If you are using a statically typed language, say in a monolith, you can find usages. You can run tools to know what your coupling is between different boundaries, or how your system runs. You can already know this, and you still end up with a turd pile that is brittle and hard to change.

So I do not think the answer in the era of AI is to ignore design. It is likely the opposite. It is providing a design that is built on constraints and context.

And that gets us to the real question. Where does that context live?

Where context actually lives

Context lives in the structure. It lives in the dependencies. It lives in the boundaries. And boundaries are still more important than ever because of coupling. All these foundational things you are doing now, even with AI in the mix, are still relevant.

So how do you capture the design and context in your system? First, you have to be explicit in the domain and the language you are using. I preach this so much. It is the opposite of CRUD, and the language now is more important than ever.

Use the context to understand what the domain is. Let me give you an example in logistics and shipping. What are the use cases? What are the things you do as part of the workflow?

You can dispatch an order. When a vehicle arrives at the shipper to pick up the freight, you arrive. When you pick up the freight, that is the load. When you leave and you are on route to delivery, that is depart. When you unload the freight or deliver it, that is empty.

These are explicit. It is the exact opposite of CRUD. CRUD provides no context. All the context is living in your end user’s head, because that is the workflow of how they interact with your system. If you have create shipment, update shipment, update stop… what is this? What does the system even do?

You would not be able to tell me what the workflow is, because the workflow is in somebody’s head. You are just recording current state. And if we are talking about current state and how you are recording state, it tells you how the system is now. But it gives you no context about how it got that way.

Events give you the story, not just the state

This is where events fit so naturally. There is a big difference between “shipment created” or “shipment updated.” What does that even mean? Versus being explicit about actions and commands.

An order was dispatched. It arrived. Loaded. Departed. Delivered.

These are explicit behaviors of your system. That language is the story of the domain. One of the most underrated places context lives is in language in the code. It tells you what the system does, and how it does it.

Not all language is created equal. You can be using terms, especially in a large system, that mean different things depending on the boundary. That is why boundaries matter. Boundaries preserve context and control coupling between them. The language inside that boundary encodes the intent of what it does. Events preserve intent by capturing what happened, and why.

Boundaries keep your concepts from being smashed together

When I talk about boundaries, I mean the different parts of the system that have their own context. In the logistics example, you might have sales, rating, orders, ordering. Visibility to customers about the status of their order. Execution, dispatch, tracking the vehicle and the path through delivery. Auditing. Communications with the customer. Billing and approvals, making sure documents are in place so you can invoice, pay carriers, pay drivers, whoever is executing the shipment.

Each one has its own context. And here is a really good example of what happens when you do not respect that. A system coupled everything so tightly to Stripe that it took a miniature Manhattan project to move off of it.

Using Stripe as a credit card processor is fine. But conflating Stripe payments with invoices is a mistake because they are not the same thing. Payments are money sent. Invoices are amounts owed for services and taxes and everything else. They are fundamentally separate concerns. Different concepts.

That is design. That is context.

AI can write code. It guesses your intent.

Design has always been important. But I am hoping now people see the value in capturing context within your design.

Create boundaries with specific context. Use the domain language explicitly. Make the concepts, the actions, the events, and the reasons why things happen, obvious in your code. Because CRUD only gets you so far. With CRUD, workflow and concepts are in the end user’s head. They are not in your system.

And if you are using AI to generate code, it does not have that workflow context unless you put it there. And of course, manage coupling between boundaries. Because AI producing code faster just means you can build a big ball of mud quicker if you are not managing coupling.

Follow @CodeOpinion

Why “Microservices” Debates Miss the Point

Aggregates in DDD: Model Rules, Not Relationships

Watch Out for Superficial Invariants

The post Context Is the Bottleneck in Software Development appeared first on CodeOpinion.

Why “Microservices” Debates Miss the Point

Derek Comartin — Tue, 16 Dec 2025 22:28:29 +0000

DHH had a take on microservices in small teams that is getting a lot of attention. And while I agree with what he’s pointing out, all of these types of conversations miss what actually matters. This is not about microservices or a monolith or small teams.

Now what’s implied here is microservices is much more difficult to understand the full context. I agree, given how most people think of microservices. You can think, well, I got all these services and yeah, I don’t know how anything happens end to end, and what service interacts with what service. Yes, that’s a problem.

It’s a problem, but not the root cause.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=xYyLNkTky80

Why “Microservices” Debates Miss the Point (https://www.youtube.com/watch?v=xYyLNkTky80)

The Root Problem Is Coupling

The root of the problem is coupling.

So if you have a high degree of coupling, let’s say we’re talking about a monolith here, yes, you’d be able to kind of navigate this a little bit better. Try to understand how each different part of your system is interacting with a different part. And yes, it’ll be much more difficult if all of a sudden these are all microservices and now you’ve introduced a network boundary.

Microservices is a physical architecture choice. That’s what you’re choosing when you introduce it. You’re introducing network boundaries.

But regardless if you have a monolith or microservices, whether you’re a small team or not, the key is to define logical boundaries.

Logical Boundaries vs Physical Boundaries

There’s a difference between logical boundaries and physical boundaries.

Half the issue here is that microservices define and force you to be a one to one.

Meaning, what we defined as a logical boundary of service A, B, and C, they likely end up with their own source repository. Even if it’s a monorepo, you have your own source that’s specific for that logical boundary, which guess what, gets built and turned into some type of deployable, whether it be some executable, a container, whatever, some unit of deployment.

We’ve turned everything into a one to one to one.

That can be different when you often think about a monolith, or what people would classify as a modular monolith. You have all these different logical boundaries within your monolith, within the same source codebase, that gets turned into a single deployable unit.

You can build a monolith with strong logical boundaries. You could be doing the same with microservices.

On the flip side, you can build an absolute turd pile of a monolith because you have weak boundaries, or none at all. Same goes with microservices.

What We Are Really Arguing About

What we’re really arguing about here with microservices is whether the cost of introducing a network boundary is worth it.

And he points out that cost.

“Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement.”

I don’t think that list is exaggerated at all. It’s a lot of complexity and has a high cost.

So the question is, do you get enough value from being able to deploy independently for the cost. This is about a trade off.

He continues with,

“One bug now requires a multi service autopsy. A feature release becomes a coordinated exercise across artificial borders you invented for no reason.”

Hang on there.

You just have a high degree of coupling. Artificial borders, absolutely you want borders. Should they be artificial. No. They should be cohesive around the capabilities of your system.

If you have a high degree of coupling, that’s your problem. That’s not just some random thing. It wasn’t invented. You created this. You created the coupling.

Whether you have microservices, is it going to be much more difficult to debug and troubleshoot because of that network boundary. Absolutely. I’m not disputing that.

But the root cause here is because of all the coupling, which directly relates to the comment, “You don’t deploy anymore. You synchronize a fleet.” No, that’s because of coupling.

More specifically, what people feel the pain of is temporal coupling.

If you were in your monolith and you had the same type of degree of coupling, you might not feel as much pain, but that coupling is still there and the pain is still there. It’s just hidden.

When you introduce that network boundary, it just exposed it. Because now you have all the distributed nature of HTTP, gRPC, whatever, however you’re distributing over the network. Retries, latency, it’s just exposing it all of a sudden. But the mess was already there.

“You Are Forced to Define APIs Before You Understand Your Own Business”

Here’s what I think is one of the most important parts of this post.

“You are forced to define APIs before you understand your own business.”

If you’re starting to build a system and you don’t really understand yet what the domain is, what the business is, I always say defining logical boundaries or services are one of the most important things to do, but one of the most difficult things to do.

You really need to understand the domain and how the interactions are going to work, because you do not want a high degree of coupling. You want your logical boundaries to be as autonomous as you possibly can be.

They’re often little workflows, a part of bigger workflows. There shouldn’t be a mess of coupling between boundaries.

Typically that happens because you’re more focused on the technical aspect than you are about the actual business behaviors and capabilities of your system.

So while I agree that jumping into microservices and defining network boundaries immediately, that’s going to be much more difficult because it’s harder to refactor. I think everybody can agree on that.

So yes, being in a monolith first, when you don’t understand and you’re trying to mold what the logical boundaries are, yes, it’s going to be easier because it’s easier to refactor.

Which gets to what I like to call the loosely coupled monolith.

The Loosely Coupled Monolith

If we think about three different logical boundaries that have contracts, things like messages or potentially interfaces, implementation tests, we can see with my database here, maybe I have one database instance, but within that I have schemas that are specifically owned by a logical boundary.

It’s not a free for all of any logical boundary accessing data from another.

More specifically, what happens then is all your interactions, because of workflows, are done asynchronously via messaging, if you can.

That way we can see, if I’m in a monolith, I have all three deployed together. There’s absolutely nothing stopping you from carving one of them off and making it individually deployable because maybe it has a different cadence of what you want to release. The others can be separate.

You start it off as a monolith. You discover what your boundaries are. And because you might have the need and enough value to make it independently deployable, you can.

So, as long as again, the trade off and the cost is worth it. But that’s specifically because you need something independently deployable, possibly scalable.

“Monoliths Don’t Scale” Is Not Real

“The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore.”

I agree.

And the simplest example of this is with the web queue worker pattern.

Going back to when I said logical isn’t physical, you can have more than one entry point, or one executable, or one deployable unit, even in your monolith.

In my example here, I have one that’s our HTTP API, could be sitting behind a load balancer, and we’re scaling that out.

But I also have the exact same codebase, but instead its entry point is actually listening to a queue, a message broker, an event log, and performing work asynchronously.

Now, on our database side, you could scale that up. You could scale that out depending on what type of database you’re using, or introducing read replicas.

But there’s so many different ways that you can scale a monolith.

Jumping to independent deployability isn’t necessarily the first thing you need to do for scale.

Stop Making This About Microservices vs Monoliths

Now, while I agree with a lot of what he wrote, I think it’s kind of silly that we’re still even talking about this this way.

This isn’t “microservices good” or “microservices bad” in a small team or whatever context. How about we start talking about the actual underlying issues here.

Adding physical boundaries has a cost. That’s what he was describing. Is the cost worth it? Well, you need to understand what the actual trade-offs are and what the value is.

I think we need to get totally beyond this, because fundamentally, at the root of almost all of this is poor design and poor coupling.

Even if you decided, I’m going to go all in on microservices, and let’s say you lived in an existing system and you knew what those logical boundaries should be, if you designed it correctly, you would not experience the pain of “I have to navigate all these different services to understand this end to end flow and I don’t get this context.”

You wouldn’t have that problem because your services are contained to actually what they do. They’re a part of a workflow. Are they part of a larger workflow. Yes. Would you have all this temporal coupling everywhere like a spaghetti hot distributed mess? No, you wouldn’t.

We’re talking about what people are implementing and how they’re doing it poorly as being like, let’s not do this because people are doing it poorly.

That’s not the case.

Manage coupling and understand when that network boundary is worth it.

Follow @CodeOpinion

Loosely Coupled Monolith

Aggregates in DDD: Model Rules, Not Relationships

All Our Aggregates Are Wrong

The post Why “Microservices” Debates Miss the Point appeared first on CodeOpinion.

Aggregates in DDD: Model Rules, Not Relationships

Derek Comartin — Mon, 08 Dec 2025 19:21:17 +0000

In a recent video I did about Domain-Driven Design Misconceptions, there was a comment that turned into a great thread that I want to highlight. Specifically, somebody left a comment about their problem with Aggregates in DDD.

Their example: if you have a chat, it has millions of messages. If you have a user, it has millions of friends, etc. It’s impossible to make an aggregate big enough to load into memory and enforce invariants.

So the example I’m going to use in this post is the rule: a group chat cannot have more than 100,000 members.

The assumption here is that aggregates need to hold all the information. They need to know about all the users. But that’s not what aggregates are for!

I’m going to show four different options for how you can model this. One of them is not using an aggregate at all. And, of course, the trade-offs with each approach.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=ZvY5fREVyZk

You don’t need an Aggregate in DDD. Model Rules, Not Relationships (https://www.youtube.com/watch?v=ZvY5fREVyZk)

The Common Starting Point (and the Trap)

View the code on Gist.

So this is how people often start with aggregates in DDD, which is directly what that comment was talking about. Say we have a GroupChat class. This is our aggregate. We’re defining our max number of members as 100,000. And then we have this list, this collection of all the members, all the users associated to this group chat.

Now, this user could itself be pretty heavy in terms of username, email address, a bunch of other information, and maybe some relationships with it.

Then, for our method to add a new member, all we’re doing is checking to make sure we’re not exceeding 100,000, and then we throw.

This is where people start. But here’s the problem with it.

It may feel intuitive, but it’s a trap. It’s a trap because you’re querying and pulling all that data from your database into memory to enforce a very simple rule.

The big mistake here is: we’re modeling relationships, not the rules.

We’re building up this object graph rather than modeling behaviors.

Option 1: Store Only the Count

View the code on Gist.

An alternative is to just record the number of members of the group chat. That’s actually the rule we’re trying to enforce. We don’t need to know who is associated to the group chat. We don’t need to know which users, just the total number so we can enforce the rule.

The obvious benefit is we solved the problem: we don’t have to load all those users into memory. This is going to be very fast.

The trade-off is if you do need to track which users are part of which group, you’ll have to model that separately.

Option 2: Enforce the Rule Above the Aggregate

View the code on Gist.

Another option, if you feel storing a count is too risky because it could get out of sync, and you’re already recording which users are associated to which group, is to push the invariant up a layer, above the aggregate, into some type of application request or application layer.

Here I’m using some kind of read model or projection to get the number of users. Because it’s a projection, it could be stale. That’s the trade-off. Then we enforce the invariant there. If we pass, we add the user to the group chat.

A fair argument here is: “Well, really? We have some aggregates enforcing invariants, some application or service layer enforcing invariants, everything scattered everywhere.” But reality is: you have to enforce rules where you can do so reliably, not where it always feels clean and tidy in some centralized place. That’s not reality.

An aggregate can only enforce a rule if it has all the data it needs. And often your application or service layer isn’t just a pass-through. It shouldn’t be. It’s doing orchestration, gathering information and deciding whether a command should be executed.

Option 3: No Aggregate At All (Transaction Script)

View the code on Gist.

This might sound surprising, but you don’t actually need an aggregate at all. Sometimes I advocate for using transaction scripts when they fit best.

That’s what I’m doing here: start a transaction. Set the right isolation level. Interact with the database. Do a SELECT COUNT(*). That’s going to be very fast with the right index. Lock if needed. Check the invariant. Insert the new record. Commit the transaction.

Simple.

Sometimes a simple problem just needs a simple solution, and a transaction script is very valid.

The trade-off here is if you’re in a domain with a lot of complexity and a lot of rules, this can get out of hand and hard to manage.

Option 4: Model Rules, Not Relationships

View the code on Gist.

Another option I mentioned earlier is: stop focusing on relationships and focus on the actual rule.

What makes us say the group chat is the one that needs to enforce the rule? Maybe there’s actually the concept of group membership, and group chat is about handling messages. These have different responsibilities.

That’s really what I want to emphasize: you don’t need one model to rule them all. You can enforce something in one place and something else somewhere else. You can have a group membership component enforcing whether you can join, and group chat is just about messages.

There are all kinds of approaches you can take, and they all have different trade-offs. Given the rule and how you’re modeling, pick what fits. It does not need to be an aggregate just because dogma says so.

Maybe it’s a transaction script. Maybe it’s an aggregate. Use what fits best.

When you’re modeling something like the group chat example, start with the rule. Ask yourself: Where can I reliably and efficiently enforce this rule?
Not: “How can I convert this schema into my object model?”

Too long didn’t read/watch: model rules, not relationships.

Follow @CodeOpinion

Minimal APIs, CQRS, DDD… Or Just Use Controllers?

Domain-Driven Design Misconceptions

All Our Aggregates Are Wrong

The post Aggregates in DDD: Model Rules, Not Relationships appeared first on CodeOpinion.

Domain-Driven Design Misconceptions

Derek Comartin — Mon, 01 Dec 2025 22:51:15 +0000

Domain-Driven Design misconceptions often come from treating DDD like a checklist of patterns. Have you ever looked into Domain-Driven Design and thought, “Wow, this is totally overkill”? Well, you’re not alone. And I kind of agree, but not for the reasons you might think.

I say you’re not alone because of this meme I did a video about that keeps giving. Somebody replied, “Learning .NET DDD sent me back to learning MVC. It’s so stressful.” I was kind of confused by this, and then somebody else was as well, saying, “What is the connection from DDD to MVC? It’s design patterns, I think.”

And there’s the smoking gun.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=YTydbtFGfdk

Why Domain-Driven Design Feels So Complicated (And How to Fix It) (https://www.youtube.com/watch?v=YTydbtFGfdk)

I had a feeling this was going to happen, although I thought maybe, hey, it’s 2025 and we’ve gotten past this now. But clearly not, because a lot of people still think it’s about a checklist. A checklist of patterns you have to apply rather than it simply being a matter of understanding the domain, understanding the business, and modeling it.

The Code We’ve All Seen

Tell me you haven’t run into this.

View the code on Gist.

I’m using the example of a shipment. We have this UpdateShipmentStatus command where we take a shipment ID and what the status is. That’s probably invoked from some MVC controller or endpoint.

Then we have this handler that’s invoked where we pass that command in. What are we doing here? Oh, there’s a repository where we’re getting the shipment. Then we call UpdateStatus and save it.

Let’s take a look at what UpdateStatus does.

Almost nothing. Really just changing the property.

Tell me you haven’t seen this before.

I’ll give you another example you can probably relate to.

Let’s say we have a Customer aggregate. It’s the aggregate root. It’s a domain entity. It has relationships to the order history, the addresses, maybe when you’re looking at this aggregate it also publishes domain events.

Pretty impressive, right?
It’s using all the DDD lingo. You’re looking at this code. It has the relationships. Sounds great.

Not really.

Because it’s not capturing any business logic. Any behavior at all. What it’s really doing is just capturing structure of data. That’s it.

Domain design is not about design patterns. But a lot of people think it is. Which ties back to why people think it’s complicated. They think they have a checklist of patterns, entities, aggregates, value objects, repositories, shared kernel, all these things they read about, thinking, “I need all this stuff to apply DDD.”

And guess what?
You probably aren’t in a domain that even warrants it.

That’s why it seems complicated. Like my code example that needed none of those patterns.

It’s not about design patterns. It’s about the language you use within that domain, the workflows involved, the business logic, and the domain rules.

It’s Called Domain Driven Design

What I’m about to say may sound ridiculous, but take a step back for a second.

Domain Driven Design.
Not pattern driven design.
Not aggregate repository driven design.
Domain. Driven. Design.

It’s in the title. What do you think you’d actually be focusing on?
Probably the domain.

Domain-Driven Design misconceptions have a lot to do with the content published online. To be fair, a lot of the content published around DDD is getting. It doesn’t focus only on the tactical. It talks about the strategic, the stuff I’m talking about: bounded contexts, ubiquitous language, subdomains, context maps. All that.

But people latch onto the tactical. They see entities, aggregates, value objects, and want to disregard everything else and just focus on patterns.

Where DDD Actually Shines

Where DDD shines, in my opinion, is around complexity of a domain, specifically workflows.

Let me give you a simple example in the shipment world.

There’s a whole workflow and lifecycle that a shipment might go through. There’s a business process:

The order is dispatched.

The truck or vehicle doing the pickup arrives at the shipper.

The freight is loaded onto the vehicle.

It departs from the shipper.

It arrives at the consignee.

It’s unloaded and delivered.

I’m simplifying this, but you can imagine multiple shipments for multiple pickups with multiple deliveries, where things are split, part of the freight goes here, another part goes there. It can get very complicated. But there’s a lifecycle. There’s a workflow.

This is where DDD shines.

And you’ll notice in that workflow, I was using language that’s very domain-specific that anyone in it would understand. It wasn’t “update shipment,” like my initial code example where I was just setting a status. It was specific about events and actions that actually occurred.

That’s what DDD is for: capturing decisions, transitions, rules.

So, if I’m using something like an aggregate, that’s what I’m using it for. I have a shipment where I’m making sure it’s always in a valid state because there are state transitions it has to go through.

View the code on Gist.

I’m checking things like:
If I want to arrive at a particular stop, are all previous stops departed? If I’m trying to arrive but something earlier isn’t valid yet, that’s wrong.

Same thing on pickup. Is this the stop you should be departing from? Are there others before it that aren’t in the right state?

We’re making sure we’re always in a valid state.

Think of any example you have in a complex domain. I’m simplifying this, but you get the idea. You have logic tied to workflow, tied to state, making sure it’s always valid.

Where the Value Really Is

I put a lot of value in the strategic aspect of domain design, the language, the modeling.

What does that mean in code?
That’s where the tactical stuff comes in.
Entities, value objects, aggregates. They’re tools. They’re not the starting point.

There is value in the patterns people describe in DDD. Absolutely. But they’re a means to an end.

They’re not where you begin.

And this is why people think DDD is complicated, they’re starting from the tactical and trying to apply that to parts of the system that don’t need it at all.

This is typical. In my shipment example, the shipping workflow might be very complicated, but other parts of your system are simple. They support that complexity. Customer management, CRM, things like that, very basic.

Do they need all the tactical patterns?
No.

Can you still be explicit about language? Sure. But if it’s just CRUD, and it really is just data-driven structures, with no behavior or workflow, that’s fine.

So back to the original example of updating a shipment status. Does that make sense in a shipping system? Probably not.

But maybe if the whole system is just capturing status and nothing more. Maybe the workflow lives in someone’s head, not in software. If that’s all you’re doing, CRUD, why do we have a command, a handler, a repository, a domain entity that’s really just a data bucket with a setter?

That’s not DDD. That’s just applying patterns for the sake of patterns.

The tactical patterns make sense when they actually solve the problem you have. But that only happens once you understand the domain. It can’t be the other way around.

Talk to People First. Patterns Later.

Domain-Driven Design Misconceptions come from starting with patterns. Patterns are not bad. They’re just the wrong starting point.

Start by talking to people within the domain. Model workflows. Use the business’s language.

Then reach for things like entities and value objects.

Follow @CodeOpinion

Minimal APIs, CQRS, DDD… Or Just Use Controllers?

Double Dispatch in DDD

Distributed Domain-Driven Design, a No-Nonsense Implementation Guide

The post Domain-Driven Design Misconceptions appeared first on CodeOpinion.

Minimal APIs, CQRS, DDD… Or Just Use Controllers?

Derek Comartin — Mon, 24 Nov 2025 22:14:32 +0000

You’ve probably seen this meme floating around, and it’s funny. Why? Because there’s some truth to it. At one end, we just have MVC controllers. At the very opposite end, simply MVC controllers again. But there in the middle lies all the abstraction, libraries, tools, etc. The list goes on.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=FR64isPs5Vs

Minimal APIs, CQRS, DDD… Or Just Use Controllers? (https://www.youtube.com/watch?v=FR64isPs5Vs)

So, who’s right? Well, like most things in software, it depends. But the answer “it depends” is ridiculous unless you say it depends on… which I’m going to do.

I’m going to take a really simple example that has some abstractions in it, and I’m going to take that to something purely concrete. Along the way, I’m going to explain the trade-offs. This is less about being a dumb or smart developer. The middle isn’t necessarily even bad. It’s about design choices and trade-offs. This isn’t to say that one is better than the other. It’s to show that your decisions affect things like testing, extensibility, coupling, and, as always, that context matters because context is king.

I’m going to jump into my code example, but first I’d like to thank Current for sponsoring this video. Current’s an event-native data platform that feeds real-time business-critical data with historical context and fine-grain streams from origination to destination, enhancing data analytics and AI outcomes. For more on Current, check out the link in the description.

Starting Point

View the code on Gist.

Here’s a pretty simple example. We have an OrderController, and what we’re really going to look at and rejig a little bit here is this MyOrders method.

So this particular route, right now what it’s doing is using MediatR to make a request to GetMyOrders, and what it’s doing is passing in the identity name of the user that’s logged into the system.

If we look at that request, the GetMyOrders request for MediatR and the handler for that request.

First, it’s injecting an IReadRepository. It’s using that by also specifying a specification. The specification is basically adding some where clause and other LINQ behavior to Entity Framework Core, the ORM that we’re using. But that’s all backed by this repository. So we have the specification that’s doing the filtering. We’re getting our list of orders out, and then we’re doing some transformation of that into this OrderViewModel.

So right now we’re using MVC, MediatR, a specification, a repository, and Entity Framework Core.

Let’s start ripping this apart and replacing some of the abstractions with concrete implementations, and then talk about the trade-offs.

Removing the Specification and Repository

View the code on Gist.

First, I’m going to remove the specification and repository entirely.

I’ll just get rid of those and use the DbContext directly. I’m replacing this with the actual query needed. I’m doing exactly the same thing. We end up with the same result.

So what are the implications of doing this? What’s the value of that specification? What’s its utility? What do we lose by removing it?

Its value and purpose were to capture that precondition—the filtering of our username when passing it to the repository—and it was also doing the eager loading of line items so you don’t accidentally forget to do that. That was its purpose in this context. Again, context is king.

How many usages did it have? One. Is it worth creating that indirection for one usage? In my opinion, absolutely not. But again, this is a sample to illustrate things. If you had this used in 20 different places and always needed that same filter, yes, it’s worth capturing that explicitly and giving it a valuable, meaningful name for that use case.

With the repository, because it was using the specification, we kind of had to use both together. Otherwise, we’d get too much data and filter stuff in memory. That makes no sense. So they went hand in hand.

Is there value in the specification and repository? Absolutely. It depends on what you’re capturing and how many usages you have.

You might say, “I hate this. I don’t like this at all. I want to abstract my use of my ORM because I may change data access. I may change my ORM.” Sure. Same question: how many usages do you have of the ORM for orders? Do you have a thousand or do you have ten?

If you need to change ten usages of your ORM, is an abstraction valuable? In my opinion, no. Change the ten usages. Do you have a thousand usages? Then maybe it’s worth it.

I’d also pose the question: why do you have a thousand usages? Could be entirely valid. But this all comes down to your degree of coupling and also testing.

You may say, “That repository and specification were easier to test than using EF directly.” True. Maybe not. I don’t agree that this isn’t testable. To me it is equally testable to fake out this particular data set just as it would be to fake out that repository.

So the abstractions you’re creating or using—do they have value?

Let’s keep dismantling because this next part brings up another great trade-off.

Removing MediatR

Let’s get rid of MediatR. Do we need it? What’s the value? What are the trade-offs?

View the code on Gist.

I’m going to take all the contents of the handler and put them right inline in the controller.

When I do that, immediately something should stand out: we don’t have the username. Where was the username coming from? MVC. This type is only accessible in the controller. We were using that to pass into our request. Now we have to have it directly inside our application code.

Now we’re muddling the water between application code and framework code. MVC is about HTTP. Now we have application code that is really tied to the web framework. Before we had an application request. Now we really have a web request.

We’re coupling directly from MVC into our application code. Does that matter?

Does it matter that you’re building a web app that returns HTML or JSON? It’s a web app. It’s built on top of HTTP. There are no other entry points. Or do you need other entry points?

What I mean is this: MVC could be one entry point. Minimal APIs could be one entry point. But you might have another. For example, if you’re using something like a web + queue worker pattern, your MVC could place messages on a queue, and separately you have a worker executing tasks. Same codebase. Could be deployed as two separate units or together. But two different entry points.

If that’s the case, you don’t want your application code directly in an MVC controller.

So the problem here is you’re coupling your application logic to ASP.NET Core MVC. Is that a problem? No. If all you ever need is HTTP and that’s what your app is, that’s totally fine.

Could you still use something like MediatR when you have multiple routes calling the same command or query? Yes. That’s an applicable use case.

Controllers aren’t the problem. Minimal APIs with CQS, AutoMapper, MediatR, FastEndpoints, vertical slices, DDD, all this stuff, none of that is the problem.

The problem is not understanding the degree of coupling you have to the tooling you’re using in the given context and whether it has value.

What I’m Actually Trying to Convey

A few points:

Frameworks (MVC, Minimal APIs, etc) are entry points. They aren’t architecture.

Your architecture is composed of many different architectural styles.

Indirection adds flexibility, but it also adds complexity.

Direct coupling isn’t inherently bad.

You don’t need an abstraction for everything unless something is actually limiting you.

Use abstractions when they solve real problems, not just because everyone says to abstract everything.

Design around your application’s needs, not someone else’s patterns.

Follow @CodeOpinion

Clean Up Bloated CQRS Handlers

Double Dispatch in DDD

How Many Microservices Do I Need?

The post Minimal APIs, CQRS, DDD… Or Just Use Controllers? appeared first on CodeOpinion.

Clean Up Bloated CQRS Handlers

Derek Comartin — Tue, 11 Nov 2025 20:13:00 +0000

We’ve all had bloated CQRS handlers. You open up a command, query, or event handler, and it’s a bloated mess. It’s a nightmare of code. There’s validation, authorization, state changes, side effects, logging, it’s a mess to maintain and it’s really hard to test.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=oxs46k1fLTA

Clean Up Bloated CQRS Handlers with Domain Logic & Pipelines (https://www.youtube.com/watch?v=oxs46k1fLTA)

The Bloated Handler

Now, mind you, this is a very simple example, but you’ll get the gist because there are a lot of concerns here. This example is dispatching a shipment, basically, a package.

Here’s what that might look like:

View the code on Gist.

Mind you, in the real world, you could probably imagine this being hundreds of lines long with all kinds of validation, state transitions, and other logic, but you get the gist. There’s a lot going on here. This can often be pretty typical of most CQRS handlers that contain validation, state changes, and other concerns such as email and event publishing in this example.

Step 1: Move Logic into the Domain

View the code on Gist.

I still have my shipment logic here, but instead of doing that validation to make sure the status was in a ready state and then changing the state, I moved that all into our shipment.

I created a Dispatch method where I just moved that logic into it.

Quick Note on Indirection

Now, hold up a minute here, because you might have watched some of my other blogs/videos where I harp on indirection.

I’m not suggesting you do everywhere. Do this when, you have another place that uses the exact same state transition. Put that logic in a central place so that you always know you’re in a valid state.

Don’t add indirection for no good reason.

Step 2: Creating a Pipeline (Russian Doll Pattern)

Having made that disclaimer, let’s go to step two, creating a pipeline so you can execute small, simple tasks that are part of your flow.

This is often referred to as the Russian Doll pattern.

If you’re familiar with ASP.NET Core Middleware, it’s the exact same idea, except this is scoped down to a single application request, like a specific use case.

That’s exactly what I’ve done in code, broken it apart to create a pipeline.

View the code on Gist.

I’m not going to show all the trivial code for executing or defining a pipeline, you’re likely already using tooling if you’re working with commands, queries, or event handlers. The tooling you’re using might already support this, so check the documentation.

The important part is that I have a context.

This context is passed through my pipeline from one step to another.

Handling Side Effects

Now, I used to have logic for sending the email directly in here, but we can actually do that as part of the event instead.

That’s not even part of the pipeline, just completely asynchronous.

If we’re using some event-driven architecture, whether in-process or not, I can handle that event separately when the shipment is dispatched to notify the customer.

Benefits of This Approach

So now we’ve broken apart that original handler that had a lot of concerns into small steps, each calling the next.

And remember, because this is a Russian doll, when we call that last next, there’s nothing left to call, it returns. Then, the previous step resumes, which in our case saves the shipment to the database.

Now, everything has trade-offs.

One of the first benefits you’ll notice here is that it’s way easier to test because you’re testing a single step. You don’t have one big handler with seven dependencies. Instead, you have a small step that might have one or two dependencies that you can fake or mock easily.

That makes testing each part a lot simpler.

Another benefit is that it’s composable.

You might have certain steps that you want in every pipeline.

You might’ve noticed in the example that maybe you’d want to use the outbox pattern so that events aren’t published until after the database transaction commits. That’s a perfect fit here.

The Downside

Now, the downside is indirection, and that’s my biggest complaint about a lot of software.

If you look at a call stack, it can be a layered, nested mess. This pattern does add that.

But, like everything, there are trade-offs.

If you have complicated workflows and a handler with a ton of dependencies and hundreds or thousands of lines of code, there’s a benefit to breaking it apart like this.

It’s always about trade-offs.

Follow @CodeOpinion

Is CQRS Complicated?

Goodbye, long procedural code! Fix it with workflows

Managing Success and Growing Pains

The post Clean Up Bloated CQRS Handlers appeared first on CodeOpinion.

Double Dispatch in DDD

Derek Comartin — Wed, 29 Oct 2025 15:25:23 +0000

What’s Double Dispatch? Well, before we get to what it is, there is a common belief in domain-driven design that you want to keep your domain pure, meaning no dependencies, no services, no distractions. I get it because you do not want that core logic coupled to infrastructure concerns like database calls. You want it to be deterministic because you want it to be testable.

But somewhere along the way this advice turned into dogma that you cannot inject behavior into your domain. I am going to challenge that. If you are modeling your domain and capturing behavior, you can inject behavior into your domain using double dispatch. Used correctly, you can write expressive, testable code in the form of policies and specifications. It might actually be the most DDD thing you can do.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=K9vEDkq9GIw

Double Dispatch in DDD: When Injecting Dependencies Makes Sense (https://www.youtube.com/watch?v=K9vEDkq9GIw)

Bad example

View the code on Gist.

To start, imagine a simple Shipment. We inject a system clock and have one method, isLate. The behavior is simple. We have a parameter of expected delivery and we compare it with the system clock. If now is greater than expected delivery then it is late.

This is straightforward, but the rule is hardcoded on the Shipment. If the rule needs to vary, the Shipment is the wrong place to embed all those variations.

Refactor using double dispatch

Double dispatch is when an object does not act on its own data. Instead, it delegates the decision to another object. In DDD we can use policies and specifications to model that delegation.

View the code on Gist.

For example, create an interface IDeliveryTimingPolicy. Implement two policies. The first is StandardDeliveryTiming which has an isLate method. Instead of Shipment pulling the current time itself, the policy gets passed a DateTime that represents now. The policy then compares now to the Shipment delivery date to determine lateness.

The second is BufferDeliveryTiming. It takes a buffer value, maybe minutes, hours, or days. When checking lateness it compares now to the Shipment delivery date plus the buffer. So if a delivery date was right now and the buffer is 30 minutes, the shipment is not late until after those 30 minutes.

Here is where the double dispatch happens. When I call shipment.isLate(policy), I pass the policy into the Shipment. That policy then receives the Shipment as a parameter and makes the determination. The Shipment delegates the decision, and the policy acts using the Shipment data. Shipment still owns the question of whether it is late, but it delegates the mechanics of the rule.

This allows the domain to remain expressive and testable. The rule itself lives in the policy. The domain is the entry point for the decision, but the policy defines the rule.

Testing is deterministic

View the code on Gist.

Testing remains straightforward because everything is deterministic. You create the Shipment and you create the policy with a specified now value. For example, create a Shipment with a delivery date of yesterday and test it with the StandardDeliveryTiming policy where now is today. It is late.

Or create a Shipment with a delivery date of 15 minutes ago and test it with a BufferDeliveryTiming policy that adds a 30 minute buffer and uses the current time. It is not late because of the buffer. Both tests are deterministic. You set up the Shipment and the policy and assert the outcome.

Specifications and collections of rules

Double dispatch applies to more than just timing policies. Consider shipment readiness. Define a ShipmentReadinessRule interface with isSatisfiedBy and pass the Shipment into it.

HasValidDestination, checking that the Shipment has a destination.

AllPackagesPacked, iterating through packages to ensure each is marked packed.

NotAlreadyShipped, checking the Shipment status.

View the code on Gist.

Now create a CanShip method that takes an enumerable collection of these rules and verifies they all pass. The Shipment still controls whether it can ship, but it delegates each individual rule to a specification object.

Configurable rules in multi tenant systems

In a multi tenant SaaS application these rules are often configurable. How do you decide which rules to pass in? Typically you load configuration from storage and build the rule set at the application layer. For example, get the Shipment, determine the customer, fetch that customer’s configured rules, build the specifications and pass them to CanShip.

View the code on Gist.

Testing follows the same pattern. Build the Shipment and the rules and assert the results. Because everything is passed in as explicit objects with explicit inputs, tests are deterministic and clear.

Addressing the dogma

Is injecting something into your domain via constructor injection or as an argument to a domain method always terrible? No. Not when you are passing domain behavior and domain concepts. You are not passing a database or a logger. You are passing policies and specifications, things that belong to the problem space.

At the core, the real reason people avoid dependencies is coupling. What are you coupling to? Are you coupling to domain concepts or infrastructure concerns? There is a big difference. Inject domain concepts when it makes sense. The domain can delegate parts of the decision making without losing ownership.

Do not inject anything into your domain

If that is the mantra you were taught, consider this. The policy you pass in is a domain rule. You are putting behavior back into the domain, not moving it out. The Shipment remains the entry point and still owns the decision. It just asks a domain concept for help deciding how to apply the rule.

Double Dispatch

Double dispatch is a simple and powerful way to keep your domain expressive and testable while allowing behavior to be injected in a controlled way. Use policies and specifications when those behaviors are part of the problem space. Avoid injecting infrastructure concerns into the domain. Consider what you are coupling to and keep your domain the owner of the decision, even when it delegates.

Follow @CodeOpinion

DDD is just giving a about your Domain

The WORST Domain Modeling Mistakes!

Careful How You Inject Those Dependencies

The post Double Dispatch in DDD appeared first on CodeOpinion.

Authorization: Domain or Application Layer?

Derek Comartin — Wed, 22 Oct 2025 15:36:47 +0000

I’m diving into a super common question that’s really important: where should your authorization live? Should it live within your domain or your application layer? I am going to show some real world code examples and some simple guidelines so you can keep your software architecture consistent and avoid authorization code scattered everywhere.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=0TpejBzN-xw

Authorization: Domain or Application Layer? (https://www.youtube.com/watch?v=0TpejBzN-xw)

Authentication versus Authorization

I want to make the distinction because a lot of people mix these up. Authentication is who are you. That is not what we are talking about. What we are talking about is authorization, which is what you are allowed to do. I want to set that in again: what you are allowed to do. That phrasing will guide the guidelines and code samples I’m about to present.

Think of a typical flow. A client makes a request to our application and provides some identity for the client making that request. From there we perform authorization to determine whether the action being attempted is allowed. If it passes, we then hit our database or perform the business operation.

The Question: Domain or Application Layer?

When people start modeling their domain and using domain driven design, they ask: I previously had authorization in one place, but now some thing feels like a business rule and should live in my domain. So I am going to make the application call into the domain, fetch the domain object or aggregate, and let the domain do authorization. Which is it? Should the logic be inside the domain or a level up in the application layer?

What are you allowed to do?

Bank Account Example

View the code on Gist.

Imagine a bank account in the domain with a withdraw method. Inside that withdraw method we have logic like: if the current user making the request is not the owner of the account then throw. Should that logic live here within the domain or should it be in the application layer?

Here is a guideline I use. If the domain needs to know business entities, values, or state transitions, then that logic should stay inside the domain model. If the domain needs to know who is making the request, meaning identity, roles, claims, then those concerns should be handled outside of your domain model in your application layer.

View the code on Gist.

In the bank account example our domain does not need to care about identity. The domain needs to care about business rules like you can not withdraw more than the balance. That is a business rule and belongs in the domain. The identity portion can be removed from the domain and placed up in the application layer, where you check whether the requester is allowed to act on that account. Once that check passes, call into the domain to perform the withdrawal.

Vacation Request Example

View the code on Gist.

Now consider a vacation request. We have an Employee which specifies whether the employee is a manager or not. We have a VacationRequest and an approve method where we pass in the approver. Inside approve there is logic like: if the approver is not a manager then throw.

This is fine inside the domain because an employee is a business concept and the rule that the approver must be a manager is a business rule. This is not leaking information about who is making the request. This is not authorization in the sense of verifying identity and permissions. It is validating behavior. Put another way, it is not about who can access a system feature, it is about what the business considers valid behavior.

Shipment and Dispatcher Example

View the code on Gist.

Consider a dispatcher who takes orders and assigns a vehicle and driver to perform deliveries. The dispatcher has an assigned region. A shipment has a region. When assigning a driver to a shipment you might have a business rule like: if the dispatcher assigned region does not match the shipment region then throw. That is business logic. The domain cares about whether that association is valid in the context of the business process.

View the code on Gist.

On the other hand, authorization is about who can do the assignment in the system. Perhaps only users with the role dispatch manager may assign a dispatcher to a shipment. That is a security check at the application layer. For example, you might implement a policy in ASP.NET Core that requires a dispatch manager role. When the request comes in, check that the caller has that role. If they do, proceed to the domain operation. If you let the domain know about the dispatch manager as a user, the domain drifts away from modeling the business capability and into handling access control concerns.

Guidelines to Decide Where Authorization Belongs

If a rule controls who can access a system feature or perform an operation, treat it as security and handle it in the application layer.

If a rule controls how the business process works and what the business considers valid behavior, implement it as business logic in the domain layer.

If the logic depends on identity, roles, or claims, consider keeping that logic outside the domain and enforcing it in the application layer.

If the logic depends on business entities, values, or state transitions, keep it inside the domain model.

Why Leaking Identity into the Domain Causes Problems

View the code on Gist.

Here is an example that explains why leaking identity into the domain is problematic. Imagine a User with a resetPassword method that sets a temporary password. Inside resetPassword we check if the user making the request is the same user as the user object, and if not, we throw. That seems reasonable at first glance, but it breaks reuse.

What happens if an admin needs to reset a password for someone else? What happens if a service or another entry point needs to create a user on behalf of another? If the domain insists that resetPassword can only be invoked by the same user, you can no longer reuse that method for other legitimate flows. The real concern is who is performing the action, and that belongs on the application layer where the caller and entry point are known.

Authorization is one of those things that seems simple until it is not. You add checks and scatter them across your domain, and then it becomes a gray area of what is an actual business rule and what is authorization tied to the caller.

Authorization: Domain or Application Layer?

Keep a clear distinction between business rules and access control. If it is about what the business considers valid behavior, let the domain own it. If it is about who can perform an operation based on identity, roles, or claims, put that in the application layer. Following these guidelines will help you avoid authorization logic scattered throughout your domain and will keep your domain model focused on modeling the business capabilities.

Follow @CodeOpinion

Build better HTTP APIs

The WORST Domain Modeling Mistakes!

Welcome to the (State) Machine

The post Authorization: Domain or Application Layer? appeared first on CodeOpinion.

Your API Errors Suck (Here’s How to Fix Them)

Derek Comartin — Tue, 07 Oct 2025 21:44:31 +0000

I’ve been using an HTTP API as a consumer and how ti deals with it’s HTTP API Errors is terrible. Tt’s returning back a 200 OK. And in the body of the response, it has an error property that has a user facing message.

You might be thinking, if you just return a 400 status code, all problems are solved. Well, not really, because ultimately you want some structure of data and a good developer experience so the developers can handle those errors, because often times you need to surface that to your end user.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=XXagEUbrIc8

Your API Errors Suck (Here’s How to Fix Them) (https://www.youtube.com/watch?v=XXagEUbrIc8)

Error or Success?

It really comes down to being explicit about the success or failure of a request. I have consumed a lot of APIs, likely older systems, that do exactly this: on failure they return a status property that dictates success or failure and they always have a message property that is kind of user facing. The catch is they still return HTTP 200.

Here is what I often see. On failure you get a body like this with a status and message and maybe some other metadata, and on success you get a body with that same structure plus some data.

They both return 200, and those properties always exist.

On error:

View the code on Gist.

On success:

View the code on Gist.

I can always deserialize knowing those properties are present and then decide what to do next.

If you prefer to use HTTP status codes properly, I am all for that. But if you’re going to return 200 and be explicit in the response body, that is okay too. All I really care about is having an explicit indication of success and failure.

Response Body: Why structure matters

The main problem with letting the payload dictate whether it’s an error is tooling and deserialization. Depending on the libraries or languages you use, nullable properties, optional fields, and missing fields can get messy. It makes the developer experience worse.

What you want is a combination of two things:

Human readable information so someone debugging or an end user can understand the problem.

Machine readable information so clients can programmaticly react to specific error conditions at runtime.

When failures occur, you often want to do something specific based on the error. For example, you might automatically truncate a value and retry, or you might map a specific error code to a friendly message for the user. To do that reliably, the API needs to provide structured error details that your client can depend on.

Example: QuickBooks Online

View the code on Gist.

A good real world example is QuickBooks Online and how they deal with API errors. Their API always uses a particular response structure that includes a fault. The response shows a fault type and, more importantly, a very specific error code you can look up in documentation, as well as a message and details. They return this in a 200 response for many error conditions. They do return 400 if the request body itself is malformed or has syntax issues.

Here is why this works well:

The fault type gives you a high level classification, such as validation fault.

The specific error code tells you exactly what failed, for example length exceeded for the first name.

The details provide contextual information that helps both debugging and forming better user messages.

Because of the specific error codes, when deserializing you can decide to do things like truncate and retry, show tailored UI messages, or take other programmatic actions. That is good developer experience: both human readable and machine readable.

Problem Details (RFC 9457)

You might be yelling, “Can we just not use a standard? Why does everybody have to create their own error responses?” The good news is many frameworks already support a standard called Problem Details. It has been around for a long time and if you are using any modern web framework, it probably supports it. I just don’t see it enough used in public APIs that I consume.

Problem Details is defined in RFC 9457. It covers everything I have been talking about: being explicit, providing machine readable information for runtime handling, and giving human readable fields to help with debugging and user messaging.

View the code on Gist.

The standard response has fields like:

type: a URI that identifies the error type. This can also point to documentation about that error type.

title: a short, human readable summary of the problem.

status: the HTTP status code that applies to this problem.

detail: human readable details specific to this occurrence of the problem.

Your type could be something like invalid parameters. The type URI can link to documentation that explains the expected response shape, for example a collection of key value pairs that tell which fields are invalid. That makes it machine readable because your client can expect a consistent structure when it sees that type, and human readable because the title and detail explain what happened.

API Errors

As an API consumer and producer, focus on being explicit about success and failure. When failures happen, separate what is machine readable from what is human readable. Use the machine readable fields to drive runtime behavior and use the human readable fields to help developers and end users understand the problem.

Problem Details is one solution that gives you both. It does have some issues if you’ve used it, and many teams implement their own structure. I’m not entirely against homegrown formats as long as documentation is thorough and clients can reliably parse and react to errors.

API Errors do not need to be hard. Stop hiding errors inside a 200 response without structure. Provide explicit success and failure indicators, give machine readable codes for runtime handling, and include human readable messages for debugging and user messaging.

Follow @CodeOpinion

Build better HTTP APIs

HTMX: What’s Old is New Again

How to (and how not to) design REST APIs

The post Your API Errors Suck (Here’s How to Fix Them) appeared first on CodeOpinion.

CRUD-Sourcing is why Your Event Streams Are Bloated

Derek Comartin — Wed, 24 Sep 2025 21:08:52 +0000

I see the same two issues come up over and over with event sourcing — they cause a lot of pain and they shouldn’t have to. Most of the pain stems from bad modeling, specifically because of CRUD-Sourcing. In this post I want to show why long event streams usually mean you’re modeling the wrong events, and how to fix that so your streams have natural starts and ends.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=clFujW5KKLw

Stop CRUD-Sourcing: Why Your Event Streams Are Bloated and Broken (https://www.youtube.com/watch?v=clFujW5KKLw)

The common problem: long event streams

Look at any discussion about event sourcing and you’ll find the same question: what do I do if my event stream never ends? I get it. I use a warehouse example in some videos where you receive product, ship product, and the events just keep coming. A bank account is the same — deposits and withdrawals forever unless the account is closed. That endless stream feels like a problem.

If you’re familiar with event sourcing you might be yelling SNAPSHOTTING. Yes, snapshots are an optimization when streams get long, and I’ll link to more on snapshots at the end. But snapshotting is not the first thing I’d jump to. Before optimizing, ask: do the streams really need to be this long?

Most things have a life cycle

Most domain concepts you model actually have a life cycle — a beginning, a work in progress, and an end. A simple example is a support ticket. A ticket is opened, moves to pending as you work it, there are interactions, it gets resolved, and then closed if there are no further interactions. That gives you a clear start and end.

Even the warehouse and bank account examples have life cycles. You start receiving a product, do many operations, and eventually you discontinue the product. You open an account, have many transactions, and eventually you close it. The key is to find the natural checkpoints in those life cycles and treat them as boundaries.

Stock count as a natural checkpoint

In the warehouse example, think about a stream that starts when you first receive a product at quantity 10. You receive five more the next day, so you have 15. Then you ship six, so quantity on hand becomes 9. Later you do a physical stock count and discover one is damaged or lost, so you adjust to 8.

In the physical goods world, the real source of truth is what is actually in the warehouse, not what the system says. That stock count and adjustment is a natural checkpoint — it marks the end of one stream and the start of the next.

Cold data vs hot data and periods

Think about cold and hot data. Cold data is historical, rarely accessed, almost archival. Hot data is recent and accessed frequently for reads and writes. Many systems naturally partition by this idea.

A good real world example is accounting. Accounting operates on annual cycles. The chart of accounts itself is not the stream; instead you model the accounting period. The period gives you a bounded start and end for the transactions that belong to it. Use those natural boundaries when you can.

A common trap: CRUD-sourcing

One big reason people end up with long, unbounded event streams is they model events as CRUD changes, not as workflow. They capture property changes like “stop changed” without capturing what actually happened or why it happened. This is what I call CRUD sourcing or property sourcing.

When your events are driven by change data capture tools and look like raw data deltas, you lose the workflow meaning. An event that says “stop change” tells you nothing about why it changed. You have to infer the reason. In contrast, workflow events explain what happened and why.

Compare:

Stop changed — purely technical, ambiguous, no domain meaning

Arrived shipper — specific, meaningful to the shipping domain; we know the vehicle arrived at the shipper to pick up freight

Workflow events give you the life cycle you want.

For a shipment you might have events like dispatched, arrived shipper, loaded, departed, arrived, concented. Those workflow events define a beginning and an end for that shipment’s stream. They are not a long series of undifferentiated property changes.

How to keep streams manageable

To avoid long event streams, focus on modeling the business processes you actually care about. Capture workflow events rather than raw property deltas. That often produces streams that are naturally bounded by life cycle events or by explicit periods.

If you still have high volume streams, there are several options:

Make streams bounded by time or by business period — it could be a year, a day, or even an hour depending on your domain.

Use natural checkpoints — stock counts, account closings, order completion — to mark the end of one stream and the start of the next.

Use snapshotting as an optimization only after you’ve modeled events around workflow and periods. Snapshots speed up reads but don’t fix bad modeling.

When to snapshot

Snapshotting is a valid optimization for long streams, but it’s not the first thing to reach for. Ask whether your stream needs to be long in the first place. If you modeled workflow events and used boundaries and you still need better performance, then snapshotting is appropriate.

CRUD-Sourcing

In short: stop CRUD-sourcing. Model the why, not just the what. Look for life cycles, natural checkpoints, and appropriate periods to bound streams. Use snapshots as an optimization, not as a band aid for bad modeling.

Follow @CodeOpinion

Snapshots in Event Sourcing for Rehydrating Aggregates

Event Sourcing Tips: Do’s and Don’ts

Testing your Domain when Event Sourcing

The post CRUD-Sourcing is why Your Event Streams Are Bloated appeared first on CodeOpinion.

Regex for Email Validation? Think Again!

Derek Comartin — Mon, 15 Sep 2025 19:48:44 +0000

I ran into a nightmare of an issue recently because a service I use changed their email validation and decided my address wasn’t valid. In this post I want to walk through what happened, why simple regex for email validation often causes problems, and what you should do instead if you need to know whether an email actually exists.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=M6itpAfbvas

Regex for Email Validation? Think Again! (https://www.youtube.com/watch?v=M6itpAfbvas)

The problem: plus-addressing and overzealous validation

If you use Gmail or many other providers, you might be familiar with labels (sometimes called plus-addressing). You can specify your account name, then a plus, then a label. For example, my actual mailbox is [email protected], but I also use addresses like [email protected] so I can filter or have unique addresses for services. There’s no difference in delivery — they all go to the same mailbox — but they are distinct addresses.

A service I used must have changed their application (probably something in their database), and I got an email requesting that I do a “forgot password” flow to reset it. The problem was: when I tried to enter my email address, their new validation wouldn’t even let me submit it because of the plus.

I tried a couple of online validation tools out of curiosity to see what they were doing as regex for email validation and saw the same thing: “Enter an email address. Nope, doesn’t work with the plus. Remove the plus, it validates.” Not great. This is more than just a bad regular expression — it’s conflating format validity with whether a mailbox actually exists.

What is a valid email address (format)?

“A valid email address is as follows: just the local part, an at symbol, then the domain. It’s not more complicated than that.”

To expand that a bit in plain terms:

The local part is everything before the @. It can include letters, numbers, dots, and certain special characters — including the plus sign used for labels.

The @ symbol separates the local part from the domain.

The domain is everything after the @ — a valid hostname (and often requires an MX record for mail delivery).

For simple format validation, you don’t need a monstrous regex that tries to account for every possible nuance. At its core, it’s just local-part@domain. Overly strict regexes often reject perfectly valid addresses like plus-addressed emails.

Does a valid-looking email actually exist?

Format is only half the battle. An address that looks valid might not exist for several reasons:

The domain doesn’t exist.

There is no MX record for the domain.

The domain accepts mail but the specific local part (the mailbox) doesn’t exist.

The mailbox exists but is disabled, full, or otherwise unreachable.

Option 1 — Send a verification email

A naive but straightforward approach is to send a verification email with a one-time code or confirmation link. If the user receives it and can enter the code, you know the address routes to a mailbox they control.

But there are trade-offs. If the verification email bounces, you learn the address is invalid — which is good — but outbound mail systems like AWS SES track bounce and complaint rates. If you send lots of messages that bounce, you can quickly harm your sending reputation. I believe SES treats a bounce rate around 2% as the beginning of warning territory, so you can’t be careless with high-volume verification attempts.

Most providers have bounce and complaint hooks and offer suppression lists. With SES you can:

Receive notifications for bounces and complaints.

Maintain your own suppression list to avoid re-sending to known-bad addresses.

Use the provider’s suppression features to keep your reputation clean.

Option 2 — Mailbox validation services

If you need to know whether an email actually exists but you can’t or don’t want to send verification emails (because of bounce rate or reputation concerns), use a mailbox validation service.

These services do the heavy lifting for you. Typical checks include:

Validating the domain exists.

Looking up MX records for the domain.

Connecting to the domain’s SMTP server and, where possible, checking whether the specific mailbox exists.

Returning a probabilistic result or verdict indicating whether the mailbox likely exists.

With that result you can decide whether to accept the address, prompt the user, or proceed to send a verification message if you still want to confirm ownership after the mailbox is likely valid.

Putting it together: what should you do?

If you’re only validating format, keep it simple: validate the local part, the @ symbol, and the domain. Don’t reject valid constructs like plus-addressing with an overzealous regex.

If you also need to know that an address exists, decide which trade-offs you accept:
- Send a verification email and handle bounces and suppression lists carefully (watch your provider’s thresholds).
- Or use a mailbox validation service to check existence without sending mail, then optionally send a verification email for final ownership confirmation.

Be mindful of customer experience: don’t create friction by rejecting addresses that are valid or commonly used for filtering (like plus-addresses).

Final thoughts

This whole post/video was spurred by my personal experience and hours spent trying to access an account because a service made a simple email validation change. A small change in validation caused a big headache for me and for other customers who rely on plus-addressing.

Follow @CodeOpinion

Why Separate Databases? Explaining Like You’re Five

Loosely Coupled Monolith

Composing Data from Multiple Services

The post Regex for Email Validation? Think Again! appeared first on CodeOpinion.

Composing Data from Multiple Services

Derek Comartin — Thu, 04 Sep 2025 13:47:04 +0000

One of the most common questions I get is how to compose data when different services each own their own data. You might have product details owned by the catalog service, pricing owned by sales, reviews owned by a reviews service, shipping information somewhere else, and order counts somewhere else. How do you get all that data together so you can render a UI or generate a report?

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=EUWqaa86bOg

Disparate Data: Should You Fetch in Real-Time or Cache for Speed? (https://www.youtube.com/watch?v=EUWqaa86bOg)

The Problem: Where do you do the composition?

Developers want things to be simple. When a request comes in, they’d like to reach into their own local database and have all the data available within their service boundary. No cross-service calls at runtime, no network latency, no failure modes.

But in a distributed system, you can’t just assume a single source has everything. The data is everywhere, so by default you end up making multiple calls: catalog for the name and description, sales for pricing, reviews for ratings, shipping for delivery windows, inventory for quantity on hand. That runtime composition adds complexity and latency.

Option 1: Compose at Runtime (call other services)

The straightforward option is to do exactly that composition at request time. Your service or an intermediate layer calls each owning service and aggregates the responses. That keeps the data fresh, because you query the source of truth at the moment you need it, but it comes with trade-offs: increased latency, more failure modes, and tighter coupling across services during a request.

Option 2: Pre-compute and Cache (event-driven read models)

Another approach is to pre-compute the shape of the data you need ahead of time so you don’t have to assemble it at runtime. One common pattern is to use events to notify interested services that something changed, and have those services update their own read models into the shape they need to answer requests quickly.

To illustrate, imagine inventory events for a particular SKU. We receive 10 items, then receive 5 more, then ship 6, then an inventory count reveals a box of 50. If you process those events as they occur, you can maintain a current state like “quantity on hand = 59” without recomputing from the raw events at request time. You don’t need to replay the event stream every time a UI needs the quantity; the read model already reflects the current value.

When an event like “product received” is published to a message broker, other services can consume it and update their local data. The catalog service might be listening to warehouse events and change a document field like availability from “out of stock” to “in stock.” That lets the UI display a meaningful delivery promise without calling the warehouse service at request time. Essentially you’re changing the shape of the data ahead of time so the composition becomes simple: query your local database and return the pre-assembled document.

You’re changing the shape of the data. You’re not doing all this at runtime—you’re pre-computing so that when a request comes in you already have what you need.

Trade-offs: Staleness, Incorrectness, and Reconciliation

This pre-computation model gets you back to the simple mode developers want, but it comes with costs. The most obvious trade-off is stale data. By keeping a local cached copy of another service’s data, that copy may lag behind the source.

Worse than stale is incorrect. If an event never publishes or a consumer fails to process an event, the read model can be wrong. Handling that requires extra complexity: periodic reconciliation or checkpoints where you query the source service directly to ensure your cached copy matches the source. You can use events as a change set to keep things up to date and still periodically reconcile from the authoritative source.

It’s not binary. You don’t have to choose pre-computation for everything or runtime composition for everything. Consider the nature of the data. Pre-computation works well when data is not highly volatile or when you’re dealing with a finalized state at the end of a lifecycle—think reporting or data that is unlikely to change after a certain point. For highly transactional or very volatile data, runtime composition may be more appropriate depending on volume and latency requirements.

Practical Guidance

In my experience, a hybrid approach is often the best path. Use pre-computed read models for data that benefits from fast access and is relatively stable. Use runtime composition for volatile, strongly consistent data where freshness matters more than latency. Plan for reconciliation when you cache other services’ data and be explicit about the consistency and staleness guarantees you provide to clients.

Remember that regardless of approach, coupling remains: you still need to know how to assemble the pieces. The decisions are about where that composition happens and what trade-offs you’re willing to accept around latency, complexity, and correctness.

Composing data from disparate services is one of the most common questions I get. There’s no silver bullet—runtime composition gives freshness at the cost of latency and more failure surface, and pre-computation gives speed at the cost of potential staleness and extra complexity for reconciliation. Choose the approach that fits the volatility and lifecycle of your data, and don’t be afraid to mix patterns where it makes sense.

Follow @CodeOpinion

Loosely Coupled Monolith

The Challenge of Microservices: UI Composition

“I NEED data from another service!”… Do you really?

The post Composing Data from Multiple Services appeared first on CodeOpinion.

Why Separate Databases? Explaining Like You’re Five

Derek Comartin — Thu, 21 Aug 2025 18:16:33 +0000

I want to give you three different examples and reasons why you might want to separate customers and orders into different databases. The person who asked the question left out a lot of nuance and context, so I am going straight to the point. You will find all three examples boil down to the same underlying reason.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=F2wea-D86xM

Why Separate Databases? Explaining Like You're Five (https://www.youtube.com/watch?v=F2wea-D86xM)

1. Third party systems: sometimes the data isn’t even in the same system

One reason to separate customers and orders is that they might not live in the same system. If you’re building e-commerce, you might think, “I want customers and orders together behind my relational database so I can join the data and print an invoice or show an admin a grid of orders and customer details.” That sounds reasonable. But the real world often looks different.

We might decide the core value is sales, and instead of building a CRM ourselves, we buy one. We integrate with a CRM like Salesforce or a purpose-built CRM. In that case, customer data lives in an external system that is better suited for marketing, loyalty, support, and other customer interactions. Orders live in our order system. They are not in the same database.

So yes, customers and orders can be separate because the best tool for one part of the business is a third party system. We integrate with it instead of building it ourselves.

2. Large systems: database coupling becomes a maintenance nightmare

Another reason is that you do not want to end up with a giant database schema that is a turd pile and is really hard to change. The asker assumed keeping customers and orders together in one database would be easier because you could avoid multiple calls and joins. That may be true for simple systems. But context matters. If your system is large and highly coupled, sharing a single database makes change painful.

Imagine a massive schema with clients making all kinds of queries against it. Orders have a foreign key to customer. Customer has email. Now suppose you want to change how you represent email addresses, maybe allow multiple emails per customer. If that column is referenced in dozens, hundreds of places across multiple services or projects, you cannot simply change it. You will break integrations you do not even know exist.

Integration at the database level is hard to evolve. The degree of coupling matters. If you have a small system with a dozen or two dozen places in a single codebase, refactor and change is doable. If you have many parts, multiple repositories, or other systems reading from the same schema, it is a rats nest.

Also make the distinction between reads and writes. Much of the argument for keeping data together is about read composition. For example, showing an invoice with customer name and order details makes it convenient to have both in one place. But on the write side you might be persisting events. Your event store contains the series of events that produced state, and you might not store customer data inside that event store.

What solves this is a read model or projection built specifically for that read use case. You can project the data into the shape you need for invoices and admin grids. The read model can have customer information and order details combined for convenience, while the underlying write model and event store remain separate. This gives you the performance and simplicity for reads without forcing write side coupling.

3. Business alignment: different contexts, different behaviors

The root reason separate databases comes back to is business alignment. The original question was framed from a technology first perspective, which is the wrong starting point. Ask whether your architecture aligns with business needs or whether it imposes constraints on the business.

In a large system, what you do with customers in the context of orders is very different than what you do with customers in the context of marketing, support, or loyalty. A customer in the order context might represent billing information, shipping addresses, and order history. A customer in the marketing context might include segmentation, campaign preferences, and multiple contact points. They are the same person but they are treated differently. They have different behaviors and capabilities.

That difference in behavior and responsibility is why you might model them in separate databases or services. Each boundary owns its own data and its own rules. They may refer to the same customer by an identifier, but how the data is used, evolved, and scaled is different. Separation helps manage coupling and focus on cohesion within each bounded context.

So, when is separation justified?

If you rely on a third-party system for customer data, separate the stores.

If your system is large and evolving, and schema coupling prevents safe change, separate the data to reduce cross-system impact.

If business domains treat the same entity differently, align your data boundaries to those business boundaries.

If, however, your system is small, low coupling, and you can change things in a single codebase without breaking other teams, keep them together. Do not add complexity for no reason. Context is king.

Manage coupling. Focus on cohesion.

All three reasons for separate databases point back to the same thing: manage coupling and align your architecture with the business. Yes, separating customers and orders adds complexity, but you only add that complexity when your system needs it. If you have a simple system, keep things simple. If you are in a large system, consider separate databases to protect your ability to change and scale.

Follow @CodeOpinion

Loosely Coupled Monolith

Greenfield Project? Start HERE!

Microservices Don’t Share Data

The post Why Separate Databases? Explaining Like You’re Five appeared first on CodeOpinion.

Loosely Coupled Monolith – Software Architecture – 2025 Edition

Derek Comartin — Thu, 07 Aug 2025 16:52:29 +0000

Over five years ago, back in 2020, I posted a series of blog posts and videos outlining what the Loosely Coupled Monolith is. I was recently tagged in a post saying they read those original posts and moved forward with the concept.

In this article/video, I want to share with you the core ideas behind the Loosely Coupled Monolith, focusing on three key points: cohesion, managing coupling, and the realization that your logical boundaries aren’t your physical boundaries. We’ll circle back to these points at the end, and I think they’ll really make you rethink the last decade or so of this microservices vs monolith debate.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=GOIYREEANEM

Loosely Coupled Monolith – Software Architecture – 2025 Edition (https://www.youtube.com/watch?v=GOIYREEANEM)

Focusing on Cohesion

The first two points—focusing on cohesion and managing coupling—go hand in hand. If you’re working in a system or trying to build one that’s hard to change or easy to introduce bugs into, it’s likely because you have a high degree of coupling and low cohesion. That’s exactly what we want to avoid.

When breaking apart a big system, or better yet, not producing a big system in the first place, think about it like this: instead of building one big pile of poop, you want to build lots of little piles of poop.

The reality is, not everything in your system is going to be perfect. Some parts will be great, others not so much, but the goal is to break your system into logical boundaries.

What Is a Logical Boundary?

A logical boundary is simply a grouping of functionality or capabilities within your system. Not all parts of your system are created equal. Usually, you have a core part where your real value lies—those end-user capabilities that matter most—and then other essential parts that support that core.

Let me give you an example from transportation in the illustration above:

Recruitment boundary: This includes concepts like driver, vehicle compliance, and driver’s license.

Dispatch boundary: This involves shipment, vehicle, driver, and position.

Now, a common mistake developers make is focusing too much on entities. For example, “vehicle” exists in both recruitment and dispatch, but it’s not the same vehicle concept. In recruitment, the vehicle is about compliance—insurance, registration, etc.—while in dispatch, the vehicle is tied to executing a shipment: arriving, loading, unloading, and so on.

This shows why a single model doesn’t rule them all. Instead, you want to focus on the capabilities of your system when defining logical boundaries. On the dispatch side, that means workflows like dispatching an order, tracking positions en route, and managing delivery. On recruitment, it’s about compliance and certifications. These are very different concerns, even if they share some entities by name.

Managing Coupling

Once you’ve defined your logical boundaries by grouping capabilities cohesively, they still need to interact. So, how do you manage coupling between those boundaries?

From a development perspective, you can think of a logical boundary as having three parts, which might be called projects, modules, or packages depending on your platform:

Contracts: This is your public API—an interface, delegate, function definition, or schema defining messages. It’s the slowest form of coupling your platform supports.

Implementation: The actual code that executes the logic behind those contracts.

Data ownership: Each logical boundary owns its own data. This doesn’t mean each must have its own database instance, but it should have its own schema or tables it exclusively reads and writes.

To manage coupling, implementations should never reference other implementations directly. Instead, implementations reference contracts—the public APIs. This way, you avoid tight coupling to internal details and make boundaries more maintainable.

Also, boundaries should never directly access each other’s data stores. They must communicate through the public API (contracts), not by querying or updating another boundary’s database tables. This keeps coupling low and boundaries well encapsulated.

The Loosely Coupled Part

So where does the “loosely coupled” part come in? It’s through messaging. Messaging helps remove the temporal aspect of coupling. Two logical boundaries can be coupled because they need to communicate, but they don’t have to execute at the same time.

For example, you can have a message broker where one boundary publishes an event, and other boundaries consume it asynchronously. This could be a message queue, event bus, or even database-driven messaging depending on your use case.

Using messaging means you’re not tightly coupling implementations at runtime. Instead, you’re coupling schemas (message contracts) and asynchronously processing events. All of this can happen inside the same monolith code base.

Scaling and Deployment

With multiple logical boundaries in the same codebase, you can still scale and deploy differently. Imagine three logical boundaries grouped into the same code base, but with two different entry points:

An HTTP API for web requests

A worker process consuming messages

Both entry points are part of the same code base but are built and deployed separately.

This lets you scale web traffic and message processing independently. For example, scale out the HTTP API behind a load balancer and scale workers separately based on message volume.

Logical Boundaries Aren’t Physical Boundaries

The key realization is that logical boundaries are not the same as physical boundaries. Too often, we get stuck thinking that a logical boundary must have its own source code repository, build artifact, or container. It doesn’t have to be that way.

You can:

Have a single source code repo containing many logical boundaries

Build multiple executables or containers from that repo (e.g., one for HTTP API, one for workers)

Deploy logical boundaries separately or combined, depending on scaling needs

Mix and match whatever works best. Recognizing this opens up a lot of possibilities and removes many of the limitations traditionally associated with microservices or monolith debates.

Loosely Coupled Monolith

The Loosely Coupled Monolith is about grouping functionality into logical boundaries with high cohesion, managing coupling through contracts and messaging, and realizing that logical boundaries don’t have to map 1:1 to physical deployment boundaries.

Thinking this way changes how you approach software architecture and design. It frees you from pointless debates about microservices vs monoliths and instead focuses on what really matters: building flexible, scalable systems that hold up over time.

Follow @CodeOpinion

Context is King: Finding Service Boundaries

Greenfield Project? Start HERE!

What Starbucks Can Teach Us about Scalability

The post Loosely Coupled Monolith – Software Architecture – 2025 Edition appeared first on CodeOpinion.

Database Migration Strategies

Derek Comartin — Fri, 25 Jul 2025 14:55:42 +0000

Database migrations gone wrong. You deploy a new feature in your app and suddenly half of it breaks. You start digging through your logs and realize it’s a database schema change that’s causing all the issues. Now you’re scrambling, wondering if you can roll back your code changes. Nope, the damage is done. You have to roll forward and fix everything as fast as possible. This whole mess could be managed better if you handle your database migration correctly and understand that backwards compatibility is key.

YouTube

Check out my YouTube channel, where I post all kinds of content on Software Architecture & Design, including this video showing everything in this post.

https://www.youtube.com/watch?v=_lAjBOXhrXY

Database Migration Strategies & Flyway Example (https://www.youtube.com/watch?v=_lAjBOXhrXY)

Keeping Your App Code and Database in Sync

The goal is to keep your app code and database schema in sync, meaning your app understands exactly what the database schema looks like. So when you need a schema change, the typical approach is to make the schema change first, then deploy your new app code that matches that schema and understands it.

Now, I say it’s typical to make the schema change first, but that really depends on the type of database you’re using. For example, if you’re using a relational database and need to add a column, you have options like making it nullable or giving it a default value. If you do that, your existing application doesn’t care about the new column, it just works the way it did before.

So if you deploy your new schema changes and the current version of your app is fine with them, then when you deploy the new version of your app, if something goes wrong that’s unrelated to the schema change, like a bug in your code, you can roll back. Because you were already running on that schema, everything stays stable.

Backwards Compatibility in Database Migrations

Let’s say you’ve deployed your schema changes and your app is fully in sync. But what if the column you added was nullable, and you actually want it to be NOT NULL? You can’t just switch it immediately, so you might create a backfill script to populate the column with real values.

Then, as a next step, you make another schema change that marks the column as NOT NULL and backfills the data in the same transaction. Your code changes can now assume the column is never null and stop dealing with null values altogether.

This process doesn’t always happen in a single step. Sometimes it’s a multi-step cycle where you first make a backwards-compatible change, stabilize, and then make further changes to clean things up.

Applying These Concepts to Event Streams and Document Stores

If you’re using an event stream or event store, the same principle applies. There’s no rigid schema like in a relational database, it’s all defined in code. So you make your changes backwards compatible by allowing new fields to be nullable or by upcasting events to fill in missing data at runtime.

For example, if you have an order shipment event and want to start capturing the carrier (UPS, USPS, FedEx, etc.), you’d add a nullable carrier ID field. Your code has to handle the case where that field might or might not be there when deserializing events.

The same goes for document stores. Old documents might not have the new property, so your code needs to treat it as nullable or optionally present. You may or may not want to backfill the data, but keeping it backwards compatible lets you deploy without breaking existing functionality.

Why Backwards Compatibility Matters in Scaled-Out Environments

This becomes especially important when you’re running multiple instances of your app and doing rolling deployments. Imagine you deploy a new version of your app to one instance, but other instances are still running the old version. During this transition, you have both old and new versions running against your new schema.

Until all instances are updated, your schema changes and app code must be backwards compatible to avoid breaking functionality.

When to Make Schema Changes During Deployment

We know it’s often best to make schema changes before deploying new app code, but when exactly that happens depends on your database type and the changes you’re making.

With event streams or event stores, you often don’t have a schema change per se, it’s all about deploying backwards-compatible code that can handle existing and new event formats, possibly with upcasting.

For relational databases, one approach is to have your app make schema changes at startup before handling requests. This means your deployment includes deploying the app, which then runs the migration on startup.

The downside is in scaled-out environments: multiple instances might try to apply the same schema change concurrently, and every app startup has to check if the schema is up to date, which adds overhead.

Depending on your context, how often you deploy, how many instances you run, this overhead may or may not be acceptable. You might even use health checks to prevent the app from starting if the schema isn’t at the expected version.

Another approach is to separate schema changes from app deployment. You run your database migrations independently as a distinct step in your deployment pipeline. Once the schema change succeeds, you deploy your app code.

This approach works regardless of database type. For example, if you want to transform or migrate event streams, you’d do it as part of your deployment pipeline, not tied directly to app startup.

Using Flyway for Database Migrations

For tooling, I’ve been using Flyway for over 10 years. It’s served all my needs related to relational database changes.

Here’s a quick example of steps:

First, initialize Flyway and set the baseline of your current schema using the Flyway CLI.

Running the baseline command creates a schema history table that tracks which migration scripts have been applied.

Next, create a new migration script, for example, to add a new column to a table.

Run the migrate command to apply the migration script as part of your deployment process.

Flyway records the migration in the schema history table and updates your database schema.

When developing locally, you make your schema changes and create migration scripts that will be run during deployment. The key to database migrations is maintaining backwards compatibility.

Backwards Compatibility is the Key

Database migrations can be a multi-step process. You start by making non-breaking changes that are backwards compatible, then later do cleanup steps like backfilling data or tightening constraints.

The fundamental idea is to look closely at your code and schema changes and ensure backwards compatibility at every step. This allows you to deploy new versions safely, roll back if needed, and keep your app running smoothly throughout the process.

Follow @CodeOpinion

Shared Database between Services? Maybe!

Database Coupling: How to FIX a Spaghetti Code System

What does Idempotent Mean

The post Database Migration Strategies appeared first on CodeOpinion.

CodeOpinion

Coding Isn’t the Hard Part

YouTube

The hard part is figuring out what to build

Why boundaries are difficult

Workflow complexity is different from technical complexity

A simple shipment example

Once the model is clear, implementation becomes mechanical

That does not mean there are no hard technical problems

What the best developers actually do well

So is coding the hard part?

Related Links

Vertical Slices doesn’t mean “Share Nothing”

YouTube

Boundaries

A Shipment Workflow Is a Good Example

A Vertical Slice Can Be One Step In the Workflow

There Are Two Different Kinds of Sharing

A Vertical Slice Owns Its Data and Behavior

Shared State Is Not the Problem

This Still Applies With Event Sourcing

Sharing an Aggregate Is Not Bad Sharing

Different Slices Can Use the Same Domain Model

Where Sharing Becomes a Problem

A Vertical Slice Is a Logical Boundary, Not a Physical One

Good Sharing Versus Bad Sharing

Do Not Share Domain Meaning

Focus on Actions, Not Just Data

This Is Really About Coupling

Share the Right Things

Related Links

Read Replicas Are NOT CQRS (Stop Confusing This)

YouTube

The Outbox Pattern

CQRS

Event Sourcing

The Real Point About Overengineering

Related Links

Your Idempotent Code Is Lying To You

YouTube

Idempotent

The happy path: idempotency with internal state

Why the payment gateway breaks “exactly once”

Concurrency makes this easier to see

There’s no magic distributed transaction coming

1) The third-party service supports idempotency

2) Lookup by a reference ID before charging

3) Serialize the operation per business key (locking)

Caveats

4) Separate internal state from external calls (inbox/outbox)

5) Reconcile and compensate

Putting it all together

Related Links

You Can’t Future-Proof Software Architecture

YouTube

The Unknown Is Usually Boring Stuff That Stops Being Boring

Stripe Isn’t the Problem. Leaking Stripe Everywhere Is.

That “Created Date Renewal Hack” Is a Symptom

Your Data Model Isn’t Your Domain Model

So What’s the Fix? Not Interfaces Everywhere.

Separate Concepts: Invoice vs Payment

Watch Your Nouns: Third Party Vocabulary Leaks

A Concrete Blast Radius Example

How People Make It Worse

Designing for the Unknown

Related Links

Context Is the Bottleneck in Software Development

YouTube

“Design does not matter anymore” is the trap

Coupling is still the thing you have to manage

Where context actually lives

Events give you the story, not just the state

Boundaries keep your concepts from being smashed together

AI can write code. It guesses your intent.

Related Links

Why “Microservices” Debates Miss the Point

YouTube

The Root Problem Is Coupling

Logical Boundaries vs Physical Boundaries

What We Are Really Arguing About