Is a repository layer over sqlc over-engineering or necessary for scale? I’m building a notification engine in Go using sqlc for the DB layer. Do you just inject *db.Queries into your services, or do you find the abstraction of a repository layer worth the extra code?
I attempted to answer it there, and the gist was correct. But I wrote it in a hurry, so the example and the explanation could be better. I’m capturing it properly here.
Call it repository or whatever you want, the name doesn’t matter. The point is that your
business logic should be oblivious to the persistence layer. Doesn’t matter if it’s sqlc,
raw database/sql, or gorm. If your service functions call sqlc queries directly, your
core logic is coupled to your database code. That makes it harder to test in isolation and
harder to swap out later.
Put a small interface between your business code and your storage code. The business side defines what it needs, the storage side satisfies it, and they live in separate packages.
Say you’re building a service that manages books. Start with the domain type and the storage interface:
// bookstore/bookstore.go
type Book struct {
ID int64
Title string
}
type BookStore interface {
Get(ctx context.Context, id int64) (Book, error)
Create(ctx context.Context, b Book) (int64, error)
}
The service depends only on that interface:
// bookstore/service.go
type Service struct {
store BookStore
}
func NewService(s BookStore) *Service {
return &Service{store: s}
}
func (s *Service) RegisterBook(
ctx context.Context, title string) (Book, error) {
b := Book{Title: title}
id, err := s.store.Create(ctx, b)
if err != nil {
return Book{}, err
}
b.ID = id
return b, nil
}
func (s *Service) GetBook(ctx context.Context, id int64) (Book, error) {
return s.store.Get(ctx, id)
}
RegisterBook doesn’t know about SQL, sqlc, or Postgres. It builds a Book, asks the store
to persist it, and gets an ID back.
The concrete implementation goes in a separate package. This is where sqlc-generated code would live:
// postgres/store.go
type Store struct{ db *sql.DB }
func NewStore(db *sql.DB) *Store { return &Store{db: db} }
func (s *Store) Get(ctx context.Context, id int64) (bookstore.Book, error) {
// sqlc query or raw sql, doesn't matter
// ...
}
func (s *Store) Create(
ctx context.Context, b bookstore.Book) (int64, error) {
// INSERT INTO books (title) VALUES ($1) RETURNING id
// ...
}
Wire it up at startup:
// cmd/main.go
store := postgres.NewStore(db)
svc := bookstore.NewService(store)
In tests, swap in a fake that satisfies the same interface:
// bookstore/service_test.go
var _ BookStore = (*memStore)(nil)
type memStore struct {
mu sync.Mutex
data map[int64]Book
next int64
}
func (m *memStore) Get(
ctx context.Context, id int64) (Book, error) {
m.mu.Lock()
defer m.mu.Unlock()
b, ok := m.data[id]
if !ok {
return Book{}, fmt.Errorf("book %d not found", id)
}
return b, nil
}
func (m *memStore) Create(
ctx context.Context, b Book) (int64, error) {
m.mu.Lock()
defer m.mu.Unlock()
m.next++
b.ID = m.next
m.data[b.ID] = b
return b.ID, nil
}
Now the test reads exactly like production code, minus Postgres:
// bookstore/service_test.go
func TestRegisterBook(t *testing.T) {
store := &memStore{data: make(map[int64]Book)}
svc := NewService(store)
b, err := svc.RegisterBook(context.Background(), "DDIA")
if err != nil {
t.Fatal(err)
}
if b.ID == 0 {
t.Fatal("expected non-zero ID")
}
if b.Title != "DDIA" {
t.Fatalf("got title %q, want DDIA", b.Title)
}
}
Same service code, no database needed. The test exercises RegisterBook without touching
SQL. If the storage layer changes tomorrow, the service and its tests stay the same.
etcd is a distributed key-value store where the server and client communicate over gRPC. But
if you’ve only ever used clientv3 and never peeked into the internals, you wouldn’t know
that. You call resp, err := client.Put(ctx, "key", "value") and get back a *PutResponse.
It feels like a regular Go library. The fact that gRPC and protobuf are involved is an
implementation detail that the client wrapper keeps away from you.
I’ve been building a few gRPC services at work lately, and I keep running into the same
question: what API do the users of my client library see? The server ships as a binary. The
client ships as a Go package that other teams go get. If I hand them the raw generated
gRPC stubs, they have to import my protobuf types, manage gRPC connections, configure TLS,
and parse codes.NotFound from google.golang.org/grpc/status. That’s a lot of protocol
plumbing for someone who just wants to consume my service.
This post walks through wrapping a generated gRPC client behind a higher-level Go API, following the same pattern etcd uses. The idea is to hand users a wrapper client that abstracts away the generated one.
I’ll use a small in-memory KV store as the running example.
kv/
├── api/
│ ├── kv.proto # service definition
│ ├── kv.pb.go # generated message types
│ └── kv_grpc.pb.go # generated client and server stubs
├── client/
│ └── client.go # the wrapper (what users import)
├── server/
│ └── main.go # the server binary
└── go.mod
api/ holds the proto and generated code. server/ is a binary you deploy. client/ is
the library you ship. Other teams add it to their go.mod and never touch proto types
directly.
The KV store has three RPCs: put, get, and delete.
// api/kv.proto
syntax = "proto3";
package kvpb;
option go_package = "example.com/kv/api";
service KV {
rpc Put(PutRequest) returns (PutResponse);
rpc Get(GetRequest) returns (GetResponse);
rpc Delete(DeleteRequest) returns (DeleteResponse);
}
message PutRequest { string key = 1; bytes value = 2; }
message PutResponse {}
message GetRequest { string key = 1; }
message GetResponse { bytes value = 1; optional bool found = 2; }
message DeleteRequest { string key = 1; }
message DeleteResponse {}
GetResponse uses optional bool found because proto3 normally can’t distinguish “field is
zero” from “field was never set.” The optional keyword generates a pointer in Go, which
lets callers tell a missing key apart from an empty value.
Running protoc on this generates a client interface and a server stub. The client side
looks like this:
// api/kv_grpc.pb.go (generated)
type KVClient interface {
Put(ctx context.Context, in *PutRequest,
opts ...grpc.CallOption) (*PutResponse, error)
Get(ctx context.Context, in *GetRequest,
opts ...grpc.CallOption) (*GetResponse, error)
Delete(ctx context.Context, in *DeleteRequest,
opts ...grpc.CallOption) (*DeleteResponse, error)
}
Every method takes a context.Context, a protobuf request struct, and variadic
grpc.CallOptions, and returns a protobuf response plus an error. Anyone calling the
service has to import protobuf types, construct request structs like &api.PutRequest{},
and understand gRPC call options, even for a simple “get this key” call.
The server implements the other side with an in-memory map. What we care about for the
wrapper is that it returns a gRPC NOT_FOUND status when a key doesn’t exist. The wrapper
translates that into a Go sentinel error. Here’s the server code:
// server/main.go
type server struct {
kvpb.UnimplementedKVServer
data map[string][]byte
}
func (s *server) Get(
ctx context.Context, r *kvpb.GetRequest,
) (*kvpb.GetResponse, error) {
v, ok := s.data[r.Key]
if !ok {
return nil, status.Errorf(
codes.NotFound, "key %q", r.Key)
}
return &kvpb.GetResponse{
Value: v, Found: proto.Bool(true),
}, nil
}
// Put and Delete follow the same shape.
The server embeds UnimplementedKVServer, the standard gRPC pattern. It provides no-op
implementations for all RPCs so the code compiles even before you’ve written the real logic.
The Get method checks the map and returns codes.NotFound when the key isn’t there. This
is the status code the wrapper will catch and turn into a Go error. I’ve elided Put and
Delete since they follow the same structure.
Without a wrapper, callers use the generated KVClient directly. Pay attention to the
imports:
// example/main.go (raw usage without wrapper)
import (
"context"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"example.com/kv/api"
)
// ...
conn, err := grpc.NewClient("localhost:9090",
grpc.WithTransportCredentials(insecure.NewCredentials()))
// ...
kv := api.NewKVClient(conn)
_, err = kv.Put(ctx, &api.PutRequest{
Key: "greeting", Value: []byte("hello"),
})
Three imports just to put a key. The caller manages the gRPC connection, constructs
&api.PutRequest{} structs for every call, and has to parse gRPC status codes to check if a
key exists. For internal code where everyone knows gRPC, this is fine. For a library you
ship to other teams, it’s a lot of ceremony.
This is the API we actually want to give our users. Same sequence as before (put a key, get it back, handle a missing key) but without any gRPC or protobuf leaking through:
// example/main.go (with the wrapper)
import "example.com/kv/client"
// ...
c, err := client.New("localhost:9090")
// ...
defer c.Close()
err = c.Put(ctx, "greeting", []byte("hello"))
val, err := c.Get(ctx, "greeting")
_, err = c.Get(ctx, "missing")
if errors.Is(err, client.ErrNotFound) { ... }
One import instead of three. No gRPC or protobuf packages in sight. Put takes a string and
a byte slice. Get returns []byte. Missing keys come back as client.ErrNotFound,
checked with errors.Is like any other Go error. The caller doesn’t need to know that gRPC
is involved at all.
Note
Callers never have to build an api.PutRequest, call grpc.NewClient, configure TLS, or
check codes.NotFound. They pass strings and byte slices, get Go errors back, and the
wrapper handles the rest.
The rest of this post builds the wrapper that turns the generated KVClient from the
previous section into this API.
The client/ package is the only thing users import. It hides the generated api.KVClient
behind a struct and re-exposes the same operations using plain Go types. The whole wrapper
lives in a single file (client/client.go).
The wrapper starts with a sentinel error and a testable interface:
// client/client.go
var ErrNotFound = errors.New("key not found")
type KV interface {
Put(ctx context.Context, key string, value []byte) error
Get(ctx context.Context, key string) ([]byte, error)
Delete(ctx context.Context, key string) error
}
ErrNotFound replaces the gRPC NOT_FOUND status code. Callers check it with errors.Is
and never import google.golang.org/grpc/codes.
Client implements KV, and KV uses only standard Go types instead of protobuf or gRPC
types. This is intentionally a producer-side interface: we define it in the same package as
Client because we know the full set of operations the service supports and we want to
offer a ready-made contract for consumers. Other packages that depend on your client can
accept a KV in their function signatures and swap in a simple in-memory fake during tests
without spinning up a gRPC server or importing any gRPC packages.
Important
KV is a producer-side interface. I wrote about when these make sense in Revisiting
interface segregation in Go.
Then the struct and constructor:
type Client struct {
conn *grpc.ClientConn
kv api.KVClient
}
func New(addr string, opts ...grpc.DialOption) (*Client, error) {
if len(opts) == 0 {
opts = []grpc.DialOption{
grpc.WithTransportCredentials(insecure.NewCredentials()),
}
}
conn, err := grpc.NewClient(addr, opts...)
if err != nil {
return nil, fmt.Errorf("connecting to %s: %v", addr, err)
}
return &Client{conn: conn, kv: api.NewKVClient(conn)}, nil
}
func (c *Client) Close() error { return c.conn.Close() }
Client holds the gRPC connection and the generated api.KVClient as unexported fields.
Note that api.KVClient is an interface, not a concrete struct. The gRPC codegen doesn’t
expose the actual client struct at all; you get back a KVClient interface from
api.NewKVClient(conn). We store it as a regular field rather than embedding it. If you
embedded the api.KVClient interface, all its methods like
Put(ctx, *PutRequest, ...CallOption) would be promoted onto Client directly, and callers
could bypass the wrapper to make raw gRPC calls.
Warning
Don’t embed the generated client interface. Keep it as a private field so the only way to talk to the server is through the wrapper methods.
New creates the gRPC connection and builds the generated client from it. The variadic
grpc.DialOption lets callers pass custom TLS, keepalive, or interceptor config. If they
pass nothing, the default is insecure credentials for local dev. The retries section below
shows what a production setup looks like.
With the types in place, we can look at the wrapper methods. Get shows the pattern all
three follow:
func (c *Client) Get(ctx context.Context, key string) ([]byte, error) {
resp, err := c.kv.Get(ctx, &api.GetRequest{Key: key})
if err != nil {
if s, ok := status.FromError(err); ok &&
s.Code() == codes.NotFound {
return nil, ErrNotFound
}
return nil, fmt.Errorf(
"getting key %s: %v", key, err)
}
return resp.Value, nil
}
// Put and Delete follow the same shape.
Each wrapper method follows the same pattern: take the caller’s Go arguments, build the protobuf request internally, call the generated client, and return plain Go types.
Pay attention to the error handling. When the server returns NOT_FOUND, we catch that gRPC
status and convert it to our own ErrNotFound sentinel so callers can check it with
errors.Is instead of parsing gRPC status codes themselves. For everything else, we wrap
with %v instead of %w. If we used %w, callers could unwrap the error with errors.As
and reach the underlying gRPC status types, which would re-couple them to gRPC internals and
defeat the whole point of having a wrapper. I wrote about this tradeoff in Go errors: to
wrap or not to wrap?.
Since the wrapper owns the grpc.NewClient call, it can bake in retries and observability
without the caller knowing. gRPC interceptors work like HTTP middleware. They wrap every RPC
with extra logic (logging, retries, metrics) without changing the handler code. You register
them as dial options when creating the connection:
// client/client.go (production version of New)
func New(addr string, opts ...grpc.DialOption) (*Client, error) {
defaults := []grpc.DialOption{
grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),
grpc.WithChainUnaryInterceptor(
grpc_retry.UnaryClientInterceptor(
grpc_retry.WithMax(3),
grpc_retry.WithBackoff(
grpc_retry.BackoffExponential(100*time.Millisecond),
),
),
grpcprom.UnaryClientInterceptor,
),
}
opts = append(defaults, opts...)
// ... rest is the same
}
grpc_retry from go-grpc-middleware retries failed RPCs with exponential backoff.
grpcprom records latency histograms and error rates. Same client.New, same c.Put, but
now with retries and metrics baked in. Callers who need to override the defaults can pass
their own dial options. This is useful in tests where you might want insecure credentials or
no retries.
The full code is on GitHub. Install the server and run the example:
go install github.com/rednafi/examples/wrapping-grpc-client/server@latest
server &
go install github.com/rednafi/examples/wrapping-grpc-client/example@latest
example
Running the example prints:
put greeting=hello
get greeting=hello
get missing: not found (expected)
deleted greeting
get greeting after delete: not found (expected)
Or add the client library to your own project:
go get github.com/rednafi/examples/wrapping-grpc-client/client@latest
I was mainly looking for pointers on how to organize protobuf definitions, wire up server-side metrics and interceptors, and build ergonomic client wrappers. The default answer here is often “go read the Docker or Kubernetes codebase.” But both of those are huge and take a long time to get familiar with.
Then I found etcd. It’s used by Kubernetes’ control plane for storing configs in a consistent manner. It exposes a small set of well-defined gRPC endpoints to interact with the storage layer. The core services are defined in a single rpc.proto file:
service KV {
rpc Range(RangeRequest) returns (RangeResponse);
rpc Put(PutRequest) returns (PutResponse);
rpc DeleteRange(DeleteRangeRequest) returns (DeleteRangeResponse);
rpc Txn(TxnRequest) returns (TxnResponse);
rpc Compact(CompactionRequest) returns (CompactionResponse);
}
// ...
The full file also defines Watch, Lease, Cluster, Maintenance, and Auth services.
Grokking that file and the surrounding api directory is a good way to learn how to organize
your protobufs and generated code. Some other things I picked up:
- Proto definitions live under api/, separated into subpackages like etcdserverpb, mvccpb, authpb. Generated Go code lives alongside the proto files.
- The RPC handler implementations live under server/etcdserver/api/v3rpc. key.go implements the KV service (Range, Put, DeleteRange, Txn, Compact), and the other services follow the same pattern in watch.go, lease.go, member.go, maintenance.go, auth.go.
- grpc.go shows how to assemble a gRPC server with chained unary and stream interceptors using go-grpc-middleware.
- Server-side Prometheus metrics are wired in grpc.go via grpc_prometheus.ServerMetrics interceptors. It optionally enables latency histograms when the metric type is extensive.
- metrics.go defines custom Prometheus counters and histograms on top of the standard gRPC ones, things like etcd_network_client_grpc_sent_bytes_total and watch stream durations.
- interceptor.go handles logging. newLogUnaryInterceptor logs request/response sizes at warn level when latency exceeds a threshold.
- The client has no built-in metrics. The clientv3 README says you can wire up go-grpc-prometheus yourself, but the library doesn’t do it for you.
- retry_interceptor.go implements client-side retry with backoff, safe retry classification for read-only vs mutation RPCs, and auth token refresh on failure.
- The clientv3 package wraps the generated gRPC client behind a nicer Go API. It’s a good reference if you’re building an ergonomic client on top of raw protobuf types.
- If you’re a distributed systems nerd: etcd uses Raft for consensus, and that part of the codebase is its own rabbit hole.
This has become my go-to whenever I’m wiring up another gRPC service at work. I’ve gotten comfortable enough with it over the last few weeks that I can point people to specific files when we need to make decisions.
There’s no consensus, and the answer changes depending on the kind of application you’re writing. The Go 1.13 blog already covers the mechanics and offers some guidance, but I wanted to collect more evidence of what people are actually doing in the open and share what’s worked for me.
Here’s a function that places an order by calling into a few different packages:
func placeOrder(ctx context.Context, req OrderReq) error {
user, err := users.Get(ctx, req.UserID)
if err != nil {
return err
}
err = inventory.Reserve(ctx, req.ItemID, req.Qty)
if err != nil {
return err
}
err = payments.Charge(ctx, user.PaymentID, req.Total)
if err != nil {
return err
}
return saveOrder(ctx, user.ID, req.ItemID)
}
All four calls can fail with connection refused. When one of them does, your log says:
connection refused
Which call? No idea. You grep the codebase, add temporary logging, narrow it down. In a service with dozens of dependencies, debugging this trail of errors can turn into a huge time sink.
One obvious fix is to wrap the error at every return site:
user, err := users.Get(ctx, req.UserID)
if err != nil {
return fmt.Errorf("getting user %s: %w", req.UserID, err)
}
err = inventory.Reserve(ctx, req.ItemID, req.Qty)
if err != nil {
return fmt.Errorf("reserving stock for %s: %w", req.ItemID, err)
}
Now the log says:
reserving stock for item-123: connection refused
That tells you exactly which call failed and which item it was for.
Dave Cheney advocated for this in his 2016 talk Don’t just check errors. His pkg/errors
library introduced errors.Wrap, which adds a message and a stack trace at the point where
the error occurs. The idea is that each function knows what operation it was attempting, and
that context is lost if you don’t capture it immediately.
CockroachDB takes this further. They use cockroachdb/errors, a drop-in replacement for the
stdlib errors package that captures a stack trace at every wrap site:
// cockroachdb style: stack trace at every wrap
if err := r.validateCmd(ctx, cmd); err != nil {
return errors.Wrap(err, "validating command")
}
if err := r.stage(ctx, cmd); err != nil {
return errors.Wrap(err, "staging command")
}
The Terraform AWS provider does the same thing with fmt.Errorf("...: %w", err) at every
layer. Their contributor guidelines mandate a consistent format for all resource
operations:
// terraform-provider-aws style
output, err := conn.CreateVpc(ctx, input)
if err != nil {
return fmt.Errorf("creating EC2 VPC: %w", err)
}
d.SetId(aws.ToString(output.Vpc.VpcId))
if _, err := WaitVPCAvailable(ctx, conn, d.Id()); err != nil {
return fmt.Errorf(
"waiting for EC2 VPC (%s) available: %w",
d.Id(), err,
)
}
The wrapcheck linter codifies this as a rule. It doesn’t flag every bare return err,
only errors that originated from a different package:
func placeOrder(ctx context.Context, req OrderReq) error {
// users.Get is in another package: wrapcheck flags
user, err := users.Get(ctx, req.UserID)
if err != nil {
return err // not wrapped: linter warning
}
// validate is in the same package: wrapcheck allows
err = validate(req)
if err != nil {
return err // fine, same package
}
// ...
}
The reasoning is that when an error crosses a package boundary, the receiving code is the last place that knows what it was trying to do. Within a package, the caller already has that context.
For many cases, wrapping everything is the right default:
The risk of overwrapping, especially in my private code, is much lower than the risk of underwrapping when the service crashes and you get
io.EOF.
But wrapping has costs that only show up as the codebase grows.
When every layer wraps, your error messages become nested chains:
placing order: reserving stock for item-123:
checking warehouse: querying database:
connection refused
Four layers of context for one connection refused. The middle layers (checking warehouse
and querying database) don’t add a warehouse ID or a query. They just restate the call
chain.
It also makes the error string fragile. It changes whenever someone renames an
intermediate function or refactors the call chain. If you had an alert matching on
checking warehouse: querying database: connection refused, it breaks the moment someone
renames checkWarehouse to checkStock. The same root cause (connection refused) wrapped
through different code paths produces different error strings, making it hard to aggregate
them in your logging dashboard.
Jay Conrod’s error handling guidelines address this:
Each function is responsible for including its own values in the error message, except for arguments passed to the function that returned the wrapped error.
In other words, if os.Open already puts the file path in its error, your wrapper shouldn’t
add the path again:
// redundant: the path appears twice
return fmt.Errorf("opening %s: %w", path, err)
// open /etc/app.yaml: opening /etc/app.yaml: permission denied
// better: add what you were doing, not what Open already said
return fmt.Errorf("reading config: %w", err)
// reading config: open /etc/app.yaml: permission denied
The Google Go Style Guide says the same:
When adding information to errors, avoid redundant information that the underlying error already provides.
You should still wrap, but only when you’re adding information - a user ID, an item ID, the name of the external service you were calling.
Important
If a function is just passing through a call to another function within the same package, the wrapper is noise.
%w creates contracts you didn’t mean to
%w in fmt.Errorf creates an error chain that callers can traverse with errors.Is and
errors.As. That means the wrapped error becomes part of your function’s API surface.
The Go 1.13 blog uses sql.ErrNoRows to illustrate this. Say your LookupUser function
calls database/sql internally:
func LookupUser(ctx context.Context, id string) (*User, error) {
row := db.QueryRowContext(ctx, "SELECT ...", id)
var u User
if err := row.Scan(&u.Name, &u.Email); err != nil {
return nil, fmt.Errorf(
"looking up user %s: %w", id, err,
)
}
return &u, nil
}
Because of %w, callers can now do errors.Is(err, sql.ErrNoRows) to check whether the
user wasn’t found. That works until you switch from database/sql to an ORM, or put a cache
in front of the query. The callers matching on sql.ErrNoRows silently break.
The Go 1.13 blog is explicit about this:
Wrapping an error makes that error part of your API. If you don’t want to commit to supporting that error as part of your API in the future, you shouldn’t wrap the error.
The Error Values FAQ makes the same point:
Callers can depend on the type and value of the error you’re wrapping, so changing that error can now break them. […] At that point, you must always return sql.ErrTxDone if you don’t want to break your clients, even if you switch to a different database package.
Same thing with typed errors. If your repository wraps a pgconn.PgError with %w, callers
can unwrap through to the Postgres error code:
if pgErr, ok := errors.AsType[*pgconn.PgError](err); ok {
log.Println(pgErr.Code) // e.g. "23505" (unique violation)
}
When you migrate to MySQL or put a cache in front of the database, those callers silently break.
The Google Go Style Guide notes that %w is appropriate when your package’s API
guarantees that certain underlying errors can be unwrapped and checked by callers. If you
don’t want to make that guarantee, use %v.
Important
%w makes the wrapped error part of your function’s API. Callers can errors.Is and
errors.As through it, which means they can start depending on the inner error type. If
you later change that inner error (swap databases, add a cache layer), those callers break.
Use %w only when you intend to expose the inner error.
%v as the conservative default
%v adds the same context text (the human reading the log sees the identical message) but
severs the error chain. No caller can errors.Is or errors.As through it:
// %w: callers can errors.Is(err, sql.ErrNoRows)
return fmt.Errorf("getting user %s: %w", id, err)
// %v: same message text, but the chain is severed
return fmt.Errorf("getting user %s: %v", id, err)
Both produce the same log output. But with %v, you’re free to swap the database later
without breaking callers who were depending on the inner error type.
At system boundaries, the Google Go Style Guide recommends translating rather than wrapping:
At points where your system interacts with external systems like RPC, IPC, or storage, it’s often better to translate domain-specific errors into a standardized error space (e.g., gRPC status codes) rather than simply wrapping the raw underlying error with
%w.
Say your repository layer talks to Postgres via pgx. Wrapping with %w exposes pgx
errors to callers:
func (r *UserRepo) Get(ctx context.Context, id string) (*User, error) {
row := r.db.QueryRow(ctx, "SELECT ...", id)
if err := row.Scan(&u.Name, &u.Email); err != nil {
return nil, fmt.Errorf("getting user %s: %w", id, err)
}
return &u, nil
}
Now any caller can errors.Is(err, pgx.ErrNoRows), tying them to your database driver.
Translating means mapping the storage error into your own domain before it crosses the
boundary:
var ErrNotFound = errors.New("not found")
func (r *UserRepo) Get(ctx context.Context, id string) (*User, error) {
row := r.db.QueryRow(ctx, "SELECT ...", id)
if err := row.Scan(&u.Name, &u.Email); err != nil {
if errors.Is(err, pgx.ErrNoRows) {
return nil, ErrNotFound
}
return nil, fmt.Errorf("getting user %s: %v", id, err)
}
return &u, nil
}
Callers check errors.Is(err, ErrNotFound) - which is yours - instead of
errors.Is(err, pgx.ErrNoRows). When you swap from Postgres to MySQL, callers don’t
break. And at system boundaries, consider translating entirely instead of wrapping.
The standard library also uses sentinel errors and custom error types alongside %w and
%v.
Packages like io define sentinel errors - package-level variables that callers check with
errors.Is. The io package defines EOF and returns it from Read when there’s no more
data:
// definition
var EOF = errors.New("EOF")
// inside a Reader implementation
func (r *myReader) Read(p []byte) (int, error) {
if r.pos >= len(r.data) {
return 0, io.EOF
}
// ...
}
A caller uses the sentinel to distinguish “end of input” from a real failure:
n, err := reader.Read(buf)
if errors.Is(err, io.EOF) {
// done reading, not an error
break
}
if err != nil {
return err
}
Sentinels work when the caller only needs to know which failure occurred. When callers
need structured metadata - not just identity - the stdlib uses custom error types. os.Open
defines a *fs.PathError struct and returns it with the operation name, file path, and
underlying syscall error as struct fields:
// definition in the fs package
type PathError struct {
Op string // "open", "read", "write"
Path string // the file path
Err error // the underlying syscall error
}
func (e *PathError) Unwrap() error { return e.Err }
// inside os.Open
func Open(name string) (*File, error) {
// ...
return nil, &PathError{Op: "open", Path: name, Err: err}
}
Because PathError implements Unwrap(), errors.Is(err, fs.ErrNotExist) works through
the chain. But unlike fmt.Errorf wrapping, the context is in typed struct fields. A caller
can extract those fields to decide what to do:
f, err := os.Open("/etc/app.yaml")
if err != nil {
if pathErr, ok := errors.AsType[*fs.PathError](err); ok {
// pathErr.Op is "open", pathErr.Path is "/etc/app.yaml"
// pathErr.Err is the syscall error (e.g. ENOENT)
log.Printf(
"%s failed on %s: %v",
pathErr.Op, pathErr.Path, pathErr.Err,
)
}
return err
}
net.OpError follows the same pattern with Op, Net, Source, Addr, and Err fields. The
package controls exactly what’s exposed via Unwrap(), and callers get structured metadata
they can act on programmatically.
The stdlib also uses fmt.Errorf with both %w and %v, and the database/sql package
shows why the choice matters. Rows.Scan wraps scanner errors with %w:
return fmt.Errorf(
`sql: Scan error on column index %d, name %q: %w`,
i, rs.rowsi.Columns()[i], err,
)
Before Go 1.16, Rows.Scan used %v here, which severed the chain. Custom Scanner implementations returning sentinel errors couldn’t be inspected with
errors.Is by callers. Issue #38099 fixed this by switching to %w. But in the same
package, internal type conversion errors use %v because the underlying strconv parse
error is an implementation detail callers don’t need to inspect:
return fmt.Errorf(
"converting driver.Value type %T (%q) to a %s: %v",
src, s, dv.Kind(), err,
)
The database/sql migration from %v to %w was safe because it only exposed more to
callers. Going the other direction would break callers who started depending on errors.Is.
Important
Going from %v to %w is a backwards-compatible change (it exposes more to callers).
Going from %w to %v is a breaking change (callers who relied on errors.Is or
errors.As through the chain will stop working). When in doubt, start with %v.
Kubernetes went through a similar migration. They historically used %v for most wrapping,
which meant errors.As couldn’t traverse the chain. Issue #123234 tracked the codebase-
wide migration from %v to %w, acknowledging that %v may still be preferred in some
places “to abstract the implementation details” but that such cases should be rare.
For most application code, fmt.Errorf with %w or %v is enough. Custom error types
like PathError make more sense in libraries and shared packages where callers need
structured metadata. But wrapping isn’t the only way to attach context to an error.
Dave Cheney is the person who created pkg/errors and popularized error wrapping in Go. He
eventually walked away from his own advice. In 2021, when looking for new maintainers for
pkg/errors, he wrote:
I no longer use this package, in fact I no longer wrap errors.
His reasoning was that structured logging can carry the debugging context that wrapping was meant to provide. Compare the two approaches. With wrapping, you bake the context into the error string:
err = inventory.Reserve(ctx, req.ItemID, req.Qty)
if err != nil {
    return fmt.Errorf(
        "reserving stock for %s: %w", req.ItemID, err,
    )
}
The log line looks like:
reserving stock for item-123: connection refused
With structured logging, you keep the error value clean and attach the context as separate key-value fields:
err = inventory.Reserve(ctx, req.ItemID, req.Qty)
if err != nil {
    slog.Error("reserve stock failed",
        "item_id", req.ItemID,
        "err", err,
    )
    return err
}
The log line looks like:
level=ERROR msg="reserve stock failed"
item_id=item-123 err="connection refused"
The same information is there, but in structured fields that your logging dashboard can
index, filter, and aggregate on. The error value itself stays as connection refused
without a chain of prefixes.
The tradeoff is that structured logging requires a logging pipeline that can query on fields.
If all you have is grep on a log file, the wrapping version is easier to work with.
Note
Structured logging and wrapping aren’t mutually exclusive. You can wrap at package
boundaries for the error string and log with slog at the handler for request-scoped
context (user IDs, request IDs, trace IDs). The handler example in the Services section
below does both.
So how do you actually decide? It depends on what you’re building. Marcel van Lohuizen from the Go team described his own approach:
I do and don’t… If I wanna have context, I wrap it. If I create a new error, I wrap it. But sometimes you’re not really adding too much information, and then I don’t. So it depends on the situation.
For libraries, be conservative. The Google style guide applies most directly here because you're shipping
an API contract. Use %v by default so you don’t accidentally expose implementation
details. Use %w only when you intentionally want callers to inspect the inner error, and
document that you’re doing so.
A library that wraps with %w ties its callers to its dependencies. If v2 switches from
pgx to database/sql, every caller doing errors.Is(err, pgconn.something) breaks. Use
%v by default, and define your own sentinels when callers need to branch on the error:
var ErrNotFound = errors.New("item not found")

func (c *Client) Fetch(ctx context.Context, id string) (*Item, error) {
    resp, err := c.http.Get(ctx, c.url+"/items/"+id)
    if err != nil {
        if isNotFound(err) {
            return nil, ErrNotFound
        }
        return nil, fmt.Errorf("fetching item %s: %v", id, err)
    }
    // ...
}
Callers check errors.Is(err, ErrNotFound) - which is yours - without being coupled to
your HTTP client. Same pattern as the UserRepo translation example earlier.
In CLI tools, wrap freely with %w. The call stack is shallow, the error message is the user-facing
output, and nobody is calling errors.Is on your CLI’s errors. Maximum context helps the
human reading the terminal:
func run() error {
    cfg, err := loadConfig(cfgPath)
    if err != nil {
        return fmt.Errorf("loading config %s: %w", cfgPath, err)
    }
    conn, err := connect(cfg.DatabaseURL)
    if err != nil {
        return fmt.Errorf("connecting to database: %w", err)
    }
    return migrate(conn)
}
The user sees:
loading config /etc/app.yaml:
open /etc/app.yaml: permission denied
In my experience, services are where it’s the hardest to give a formulaic answer to this. You have structured logging and distributed tracing, but you also have deep call stacks and many dependencies.
The approach I’ve landed on: wrap at package boundaries with context about what you were
trying to do. Use %w within your own codebase where callers should be able to inspect the
inner error. Use %v when the error crosses a system boundary (RPCs, database calls,
third-party APIs). Skip wrapping for same-package calls.
Here’s the placeOrder function from the beginning, rewritten:
func placeOrder(ctx context.Context, req OrderReq) error {
    user, err := users.Get(ctx, req.UserID) // (1)
    if err != nil {
        return fmt.Errorf("getting user %s: %w", req.UserID, err)
    }
    err = inventory.Reserve(ctx, req.ItemID, req.Qty) // (2)
    if err != nil {
        return fmt.Errorf("reserving stock for %s: %w", req.ItemID, err)
    }
    err = payments.Charge(ctx, user.PaymentID, req.Total) // (3)
    if err != nil {
        return fmt.Errorf("charging payment: %w", err)
    }
    return saveOrder(ctx, user.ID, req.ItemID) // (4)
}
1. users.Get is in another package: wrap with the user ID
2. inventory.Reserve is in another package: wrap with the item ID
3. payments.Charge is in another package: wrap with the operation name

At the handler, use %v to translate into the external domain without exposing internals:
func handlePlaceOrder(
    ctx context.Context, req *pb.OrderReq,
) (*pb.OrderResp, error) {
    err := placeOrder(ctx, fromProto(req))
    if err != nil {
        slog.Error("placing order",
            "user_id", req.UserId,
            "item_id", req.ItemId,
            "err", err,
        )
        // %v: context for humans, no chain for callers
        return nil, status.Errorf(codes.Internal, "placing order: %v", err)
    }
    return &pb.OrderResp{}, nil
}
The handler logs the full error with request context for debugging, then returns a gRPC
status with %v so the caller gets a useful message without being able to errors.Is
through to your database driver.
There’s no consensus on how much to wrap, and I don’t think there needs to be. Here’s what I do:
- Same-package calls: return err. The caller already has context.
- Package boundaries: fmt.Errorf("doing X: %w", err) with identifying info (user IDs, item IDs, file paths). The wrapcheck linter can enforce this automatically. Only wrap when you’re adding information the inner error doesn’t already carry.
- System boundaries: %v for the fallback path.
- Libraries: %v by default. Own sentinels (ErrNotFound, ErrConflict) for cases callers need to inspect. %w only when you intentionally want callers to unwrap, and document that you’re doing so.
- CLIs: %w everywhere. The error message is the user-facing output.
- Services: slog at the handler level for request-scoped context, so the error value doesn’t need to carry all of that.

The usual way to guard shared state in Go is to put a sync.Mutex next to the fields it protects:
var (
    mu      sync.Mutex
    counter int
)

mu.Lock()
counter++
mu.Unlock()
This works, but nothing enforces it. The compiler won’t stop you from accessing counter
without holding the lock. Forget to lock in one spot and you have a data race. One way to
make this safer is to bundle the value and its mutex into a small generic wrapper that only
exposes locked access through methods:
type Locked[T any] struct {
    mu sync.Mutex
    v  T
}

func NewLocked[T any](initial T) *Locked[T] {
    return &Locked[T]{v: initial}
}

func (l *Locked[T]) Get() T {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.v
}

func (l *Locked[T]) Set(v T) {
    l.mu.Lock()
    defer l.mu.Unlock()
    l.v = v
}
You keep mu and v unexported, pass around *Locked[T], and callers use Get to read
and Set to write:
counter := NewLocked(0)
counter.Set(42)
fmt.Println(counter.Get()) // 42
Now callers can’t touch the underlying value without going through the lock. This doesn’t prevent misuse within the same package, but it makes unprotected access from other packages impossible.
This works fine when you’re replacing the value wholesale - just call counter.Set(42) and
move on. But when your mutation depends on the current value, Get and Set can race
against each other.
Say you want to increment the counter instead of replacing it. You’d have to do:
v := counter.Get()
v++
counter.Set(v)
Each individual call is safe - Get holds the lock while reading, Set holds it while
writing. But the three calls together aren’t atomic. Between Get and Set, another
goroutine can modify the value, and your increment overwrites theirs. That’s the classic
lost-update bug.
It gets worse with compound state. Say the wrapper holds a struct:
type State struct {
    Count int
    Name  string
}

state := NewLocked(State{})
And you want to conditionally update both fields:
s := state.Get()
if s.Count < 10 {
    s.Count++
    s.Name = fmt.Sprintf("item-%d", s.Count)
}
state.Set(s)
Same problem. Get returns a copy, you mutate the copy, then Set writes it back. If
another goroutine modified state between those two calls, your write clobbers it.
Important
The race detector (go test -race) won’t catch this. It detects data races - two
goroutines accessing the same memory without synchronization. Here, every Get and Set
properly acquires the mutex, so each individual access is synchronized. The bug is a
logical race (lost update), not a data race. The race detector sees nothing wrong.
You can prove this with a simple test. Ten goroutines each increment the counter 1000 times, so the final value should be 10000:
func TestSetValue(t *testing.T) {
    counter := NewLocked(0)

    var wg sync.WaitGroup
    for range 10 {
        wg.Go(func() {
            for range 1000 {
                v := counter.Get()
                v++
                counter.Set(v)
            }
        })
    }
    wg.Wait()

    got := counter.Get()
    if got != 10000 {
        t.Errorf("got %d, want 10000 (lost %d updates)", got, 10000-got)
    }
}
Running go test -race produces no race warnings, but the test fails:
=== RUN TestSetValue
locked_test.go:30: got 1855, want 10000 (lost 8145 updates)
--- FAIL: TestSetValue (0.02s)
The race detector is silent. The updates are just gone.
Instead of taking a value, have Set take a function:
func (l *Locked[T]) Set(f func(*T)) {
    l.mu.Lock()
    defer l.mu.Unlock()
    f(&l.v)
}
Now the counter increment becomes:
counter.Set(func(v *int) {
    *v++
})
And the compound mutation:
state.Set(func(s *State) {
    if s.Count < 10 {
        s.Count++
        s.Name = fmt.Sprintf("item-%d", s.Count)
    }
})
The lock is held for the entire closure. There’s no gap between reading and writing, so no other goroutine can interfere. Both fields update together or not at all.
The function takes a pointer to T rather than a value of T for two reasons. First, it
lets you mutate the state in place instead of working on a copy. Second, if T is a large
struct, passing a pointer avoids copying the whole thing into the closure on every call.
Go’s database/sql package has an internal withLock helper that follows the same pattern:
// withLock runs while holding lk.
func withLock(lk sync.Locker, fn func()) {
    lk.Lock()
    defer lk.Unlock() // in case fn panics
    fn()
}
It’s used throughout database/sql to serialize access to the underlying driver connection.
For example, when pinging a connection:
if pinger, ok := dc.ci.(driver.Pinger); ok {
    withLock(dc, func() {
        err = pinger.Ping(ctx)
    })
}
Or when preparing a statement:
withLock(dc, func() {
    si, err = ctxDriverPrepare(ctx, dc.ci, query)
})
Or committing a transaction:
withLock(tx.dc, func() {
    err = tx.txi.Commit()
})
There are about 18 call sites in sql.go alone. In those snippets, dc is a
*driverConn - the struct that wraps a database driver connection. It embeds sync.Mutex
directly, so it satisfies sync.Locker and can be passed straight to withLock.
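The embedding trick is easy to replicate. Here's a sketch with a hypothetical conn struct: because sync.Mutex is embedded, *conn picks up Lock and Unlock methods and satisfies sync.Locker without any extra code:

```go
package main

import (
	"fmt"
	"sync"
)

// conn embeds sync.Mutex the way *driverConn does, so *conn
// satisfies sync.Locker directly.
type conn struct {
	sync.Mutex
	queries int
}

// withLock is the same helper pattern as in database/sql.
func withLock(lk sync.Locker, fn func()) {
	lk.Lock()
	defer lk.Unlock()
	fn()
}

func main() {
	c := &conn{}
	withLock(c, func() { c.queries++ }) // *conn passed straight in as a Locker
	fmt.Println(c.queries)              // 1
}
```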
Note
withLock accepts sync.Locker instead of *sync.Mutex, so it also works with the read
side of an RWMutex:
withLock(rs.closemu.RLocker(), func() {
    doClose, ok = rs.nextLocked()
})
Here rs.closemu is a sync.RWMutex, and .RLocker() returns a sync.Locker that
acquires the read lock. The same withLock function handles both cases.
In 2021, twmb filed proposal #49563 to add a Mutex.Locked(func()) method to the standard
library:
func (m *Mutex) Locked(fn func()) {
    m.Lock()
    defer m.Unlock()
    fn()
}
The idea was that if sync.Mutex had this method natively, you wouldn’t need to write a
wrapper at all for simple cases - you’d just call mu.Locked(fn) directly. It also
eliminates forgotten unlocks and guards against panics leaving the mutex locked. esote
pointed out that database/sql already had an internal version of this - the same
withLock helper we saw earlier.
zephyrtronium raised the sync.Locker point:
I think there are advantages to making this a function that takes a Locker rather than a method on Mutex. This would allow using it with either end of an RWMutex, or another custom Locker.
rsc declined it on philosophical grounds:
In general we try not to have two different ways to do something, and for better or worse we have the current idioms.
The more interesting pushback came from bcmills, who argued the proposal didn’t go far enough. With generics arriving, he wanted something that also prevents unguarded access to the protected data, not just forgotten unlocks:
Now that we have generics on the way, I would rather see us move in a direction that also eliminates unlocked-access bugs, not just incrementally update
Mutex for forgotten-defer bugs.
He sketched out what that could look like:
type Synchronized[T any] struct {
    mu  Mutex
    val T
}

func (s *Synchronized[T]) Do(fn func(*T)) {
    s.mu.Lock()
    defer s.mu.Unlock()
    fn(&s.val)
}
This is essentially the Locked[T] wrapper from the beginning of this post. The proposal
was declined, but bcmills’ suggestion is the direction the community ended up going
anyway, just outside the standard library.
Tailscale’s syncs package has a MutexValue[T] type that follows this direction:
type MutexValue[T any] struct {
    mu sync.Mutex
    v  T
}

func (m *MutexValue[T]) WithLock(f func(p *T)) {
    m.mu.Lock()
    defer m.mu.Unlock()
    f(&m.v)
}

func (m *MutexValue[T]) Load() T {
    m.mu.Lock()
    defer m.mu.Unlock()
    return m.v
}

func (m *MutexValue[T]) Store(v T) {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.v = v
}
They provide both Store for simple replacements and WithLock for compound mutations.
When you need to read-modify-write, you go through WithLock so the lock covers the whole
operation.
If T is small and you only ever replace the whole value without reading it first, a plain
Set works. A boolean flag that gets toggled from one place, a config value that gets
swapped wholesale - those are fine.
But most state doesn’t stay that simple. You start with a single integer, it becomes a
struct with three fields, and now you need to update two of them based on the third. At that
point, Set(func(*T)) is the only safe option.
Important
The proposal benchmarks showed about 35% overhead for the closure-based approach (14.65
ns/op vs 10.82 ns/op for direct lock/unlock) due to closures and defer not being
inlineable. In practice this rarely matters. If your critical section does any real work,
the lock overhead dominates.
Anyone who has operated a Go service has stared at context canceled and
context deadline exceeded errors. These errors usually tell you that a context was
canceled, but not exactly why. In a typical client-server scenario, the reason could be any
of the following:
- The request’s own timeout fired
- A parent context’s deadline expired
- The client disconnected
- A graceful shutdown began
- Some code called cancel() explicitly

Go 1.20 and 1.21 added cause-tracking functions to the context package that fix this, but
there’s a subtlety with WithTimeoutCause that most examples skip.
there’s a subtlety with WithTimeoutCause that most examples skip.
Here’s a function that processes an order by calling three services under a shared 5-second timeout:
func processOrder(ctx context.Context, orderID string) error {
    ctx, cancel := context.WithTimeout(ctx, 5*time.Second) // (1)
    defer cancel() // (2)

    if err := checkInventory(ctx, orderID); err != nil {
        return err // (3)
    }
    if err := chargePayment(ctx, orderID); err != nil {
        return err
    }
    return shipOrder(ctx, orderID)
}
When a context gets canceled, the underlying reason is either context.Canceled or
context.DeadlineExceeded. Libraries wrap these in their own types (*url.Error for
net/http, gRPC status codes for grpc), but errors.Is still matches the sentinel.
So if checkInventory makes an HTTP call and the client disconnects while it’s in flight,
the error that bubbles all the way up is:
context canceled
If the 5-second timeout fires while chargePayment is waiting on a slow payment gateway:
context deadline exceeded
Two sentinel errors. No reason, no origin, nothing. The caller of processOrder has no idea
what actually happened.
You’d think wrapping the error helps:
if err := checkInventory(ctx, orderID); err != nil {
    return fmt.Errorf("checking inventory for %s: %w", orderID, err)
}
Now the log says:
checking inventory for ord-123: context canceled
Better. You know it happened during the inventory check. But you still don’t know why the context was canceled. Was it the 5-second timeout? A parent context’s deadline? The client hanging up? A graceful shutdown signal? The error doesn’t say.
Without the cause, you can’t tell whether to retry, alert, or ignore, and your logs don’t give on-call enough to triage.
When this happens in production, you end up scanning logs for other errors around the same timestamp, hoping something nearby gives you a clue. If the logs don’t help, you trace the context from where it was created, through every function that receives it, looking for cancel calls and timeouts. In a small service this takes a few minutes. In a larger codebase with middleware, interceptors, and nested timeouts, it can take a lot longer.
This has been a known pain point in the Go community for years. Bryan C. Mills noted this in issue #26356 back in 2018:
I’ve seen this sort of issue crop up several times now. I wonder if
context.Context should record a bit of caller information… Then we could add a debugging hook to interrogate why a particular context.Context was cancelled.
On proposal #51365, which eventually led to the cause APIs, bullgare described the production experience:
I had a case when on production I got random “context canceled” log messages. And in the case like that you don’t even know where to dig and how to investigate it further. Or how to reproduce it on a local machine.
That proposal led to the cause APIs that shipped in Go 1.20.
context.WithCancelCause gives you a CancelCauseFunc that takes an error instead of a
plain CancelFunc. Here’s the same processOrder rewritten to use it:
func processOrder(ctx context.Context, orderID string) error {
    ctx, cancel := context.WithCancelCause(ctx)
    defer cancel(nil) // (1)

    if err := checkInventory(ctx, orderID); err != nil {
        cancel(fmt.Errorf(
            "order %s: inventory check failed: %w", orderID, err,
        )) // (2)
        return err
    }
    if err := chargePayment(ctx, orderID); err != nil {
        cancel(fmt.Errorf(
            "order %s: payment failed: %w", orderID, err,
        ))
        return err
    }
    return shipOrder(ctx, orderID)
}
1. defer cancel(nil) as the default sets the cause to context.Canceled
2. The failure paths record a specific cause, wrapping the original error with %w

Now you can read the cause with context.Cause(ctx). If checkInventory fails because of a
connection error, the cause comes back as:
order ord-123: inventory check failed: connection refused
Instead of just context canceled. You know it was the inventory check, you know it was a
connection error, and because the original error is wrapped with %w, the full error chain
is preserved for programmatic inspection.
The first call to cancel wins. Once a cause is recorded, subsequent calls are no-ops. So
defer cancel(nil) only takes effect if nothing else canceled the context first. This means
the most specific cancel, the one closest to the actual failure, is what gets recorded. If
checkInventory sets a cause and then defer cancel(nil) runs on the way out, the
inventory cause is preserved.
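First-cancel-wins is easy to see in a few lines (the error message here is illustrative):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// demo cancels the same context twice and returns the recorded cause.
func demo() error {
	ctx, cancel := context.WithCancelCause(context.Background())

	cancel(errors.New("inventory check failed")) // first cancel records the cause
	cancel(nil)                                  // no-op: a cause is already set

	return context.Cause(ctx)
}

func main() {
	fmt.Println(demo()) // inventory check failed
}
```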
context.Cause is a standalone function rather than a method on Context because Go’s
compatibility promise means the Context interface can’t add new methods. Err() will
always return nil, Canceled, or DeadlineExceeded. If you call context.Cause on a
context that wasn’t created with one of the cause-aware functions, it returns whatever
ctx.Err() returns. On an uncanceled context, it returns nil.
This handles explicit cancellation, but the function still has no timeout. The original
version used WithTimeout for the 5-second deadline. To label that timeout with a cause, Go
1.21 added WithTimeoutCause:
ctx, cancel := context.WithTimeoutCause(
    ctx,
    5*time.Second,
    fmt.Errorf("order %s: 5s processing timeout exceeded", orderID),
)
defer cancel()
When the timer fires, context.Cause(ctx) returns the custom error instead of a bare
context.DeadlineExceeded. There’s also WithDeadlineCause, which is the same thing but
takes an absolute time.Time. If all you need is a label on the timeout path,
WithTimeoutCause works. But there’s a subtlety in how it interacts with defer cancel()
that can silently discard your cause.
WithTimeoutCause returns (Context, CancelFunc), not (Context, CancelCauseFunc). The
cancel function you get back doesn’t accept an error argument. Proposal #56661 defined it
this way explicitly:
func WithTimeoutCause(
    parent Context, timeout time.Duration, cause error,
) (Context, CancelFunc)
Think about what happens when processOrder finishes normally in 100ms, well before the
5-second timeout:
ctx, cancel := context.WithTimeoutCause(
    ctx,
    5*time.Second,
    fmt.Errorf("order %s: 5s timeout exceeded", orderID),
)
defer cancel() // (1)
// ... returns in 100ms ...
1. cancel() fires on return, before the timer

If the timer fires first (the function ran too long), the context is canceled with
DeadlineExceeded and context.Cause(ctx) returns your custom message. That path works
correctly.
But if the function returns first, which is the common case, defer cancel() fires. Since
it’s a plain CancelFunc, it can’t take a cause argument. The Go source shows what it does
internally:
return c, func() { c.cancel(true, Canceled, nil) }
It passes Canceled with a nil cause. Your custom cause only gets recorded when the
internal timer fires. On the normal return path, the cause is just context.Canceled.
This isn’t a bug. WithTimeoutCause is a new function, so it could have returned
CancelCauseFunc. The Go team chose not to. rsc explained the reasoning when closing
proposal #51365:
WithDeadlineCause and WithTimeoutCause require you to say ahead of time what the cause will be when the timer goes off, and then that cause is used in place of the generic DeadlineExceeded. The cancel functions they return are plain CancelFuncs (with no user-specified cause), not CancelCauseFuncs, the reasoning being that the cancel on one of these is typically just for cleanup and/or to signal teardown that doesn’t look at the cause anyway.
He also acknowledged that this creates a subtle distinction between the two APIs:
That distinction makes sense, but it makes
WithDeadlineCause and WithTimeoutCause different in an important, subtle way from WithCancelCause. We missed that in the discussion…
So WithTimeoutCause only carries the custom cause when the timeout actually fires. On the
normal return path and on any explicit cancellation path, defer cancel() discards it. If
you have a middleware that logs context.Cause(ctx) for every request, it’ll see
context.Canceled instead of something useful on the most common path.
The way around this is to skip WithTimeoutCause and wire the timer yourself using
WithCancelCause. Since there’s only one CancelCauseFunc, every path goes through the
same door, and first-cancel-wins handles the rest. Here’s processOrder one more time:
func processOrder(ctx context.Context, orderID string) error {
    ctx, cancel := context.WithCancelCause(ctx) // (1)
    defer cancel(errors.New("processOrder completed")) // (2)

    timer := time.AfterFunc(5*time.Second, func() {
        cancel(fmt.Errorf("order %s: 5s timeout exceeded", orderID)) // (3)
    })
    defer timer.Stop() // (4)

    if err := checkInventory(ctx, orderID); err != nil {
        cancel(fmt.Errorf(
            "order %s: inventory check failed: %w", orderID, err,
        ))
        return err
    }
    if err := chargePayment(ctx, orderID); err != nil {
        cancel(fmt.Errorf("order %s: payment failed: %w", orderID, err))
        return err
    }
    return shipOrder(ctx, orderID)
}
1. One CancelCauseFunc for everything
2. The completion cause for the normal return path
3. The timer records the timeout cause
4. Stop the timer so it can’t fire after return

Three possible paths, one cancel function. If the timer fires, context.Cause(ctx) returns:
order ord-123: 5s timeout exceeded
If checkInventory fails with a connection error:
order ord-123: inventory check failed: connection refused
On normal completion:
processOrder completed
This is actually what the stdlib does internally; WithDeadline uses time.AfterFunc under
the hood.
The trade-off is that ctx.Err() always returns context.Canceled, never
context.DeadlineExceeded, because you’re using WithCancelCause instead of WithTimeout.
ctx.Deadline() also returns the zero value, which matters if downstream code or frameworks
use it to propagate deadlines (gRPC, for example, sends the deadline across service
boundaries via ctx.Deadline()). If downstream code branches on
errors.Is(err, context.DeadlineExceeded), that check won’t match either.
If downstream code relies on errors.Is(err, context.DeadlineExceeded) to distinguish
timeouts from explicit cancellations, stack a WithCancelCause on top of a
WithTimeoutCause:
func processOrder(ctx context.Context, orderID string) error {
    ctx, cancelCause := context.WithCancelCause(ctx) // (1)
    ctx, cancelTimeout := context.WithTimeoutCause( // (2)
        ctx,
        5*time.Second,
        fmt.Errorf("order %s: 5s timeout exceeded", orderID),
    )
    defer cancelTimeout() // (3)
    defer cancelCause(errors.New("processOrder completed")) // (4)

    if err := checkInventory(ctx, orderID); err != nil {
        cancelCause(fmt.Errorf(
            "order %s: inventory check failed: %w", orderID, err,
        ))
        return err
    }
    if err := chargePayment(ctx, orderID); err != nil {
        cancelCause(fmt.Errorf(
            "order %s: payment failed: %w", orderID, err,
        ))
        return err
    }
    return shipOrder(ctx, orderID)
}
When the timeout fires, the inner context gets canceled with DeadlineExceeded and the
custom cause. errors.Is(ctx.Err(), context.DeadlineExceeded) works as expected. On the
error path, cancelCause(specificErr) cancels the outer context, which propagates to the
inner. On normal completion, cancelCause(errors.New("processOrder completed")) runs first because of
LIFO defer ordering, canceling the outer and propagating to the inner. Then
cancelTimeout() finds the inner already canceled and does nothing.
Note
Notice the defer ordering. cancelCause must be deferred after cancelTimeout so it
runs before it (LIFO). If you reverse them, cancelTimeout() cancels the inner context
with context.Canceled before cancelCause gets a chance to set a meaningful cause.
One subtlety: after line (2), ctx points to the inner context. If you call
context.Cause(ctx) on it after a cancelCause(specificErr) call, you’ll see
context.Canceled (propagated from the outer), not the specific error. The specific cause
lives on the outer context. In practice this doesn’t matter because the caller inspects the
returned error, not context.Cause, but it’s worth knowing if you add logging inside
processOrder itself.
The manual timer pattern is simpler and covers most cases. This stacked approach is for when
downstream code specifically relies on errors.Is(err, context.DeadlineExceeded).
context.Cause returns an error, so the full errors.Is and errors.As machinery works
on it. Since the cause in processOrder wraps the original error with %w, you can unwrap
through it to reach the underlying error.
If checkInventory failed because the inventory service refused the connection, the cause
is "order ord-123: inventory check failed: connection refused", and the wrapped error is a
*net.OpError. You can pull it out:
cause := context.Cause(ctx)

var netErr *net.OpError
if errors.As(cause, &netErr) {
    // The inventory service is unreachable.
    slog.Error("network failure",
        "op", netErr.Op,
        "addr", netErr.Addr,
    )
}
errors.Is works the same way. If the timer cause had wrapped context.DeadlineExceeded
(e.g., with fmt.Errorf("order timeout: %w", context.DeadlineExceeded)), you could check
for it:
if errors.Is(context.Cause(ctx), context.DeadlineExceeded) {
    // A timeout fired; maybe adjust the deadline or retry.
}
For logging, ctx.Err() and context.Cause(ctx) serve different purposes. ctx.Err()
gives you the category (cancellation or timeout), and context.Cause(ctx) gives you the
specific reason. Keeping them as separate structured log fields makes them easy to query:
if ctx.Err() != nil {
    slog.Error("request failed",
        "err", ctx.Err(),
        "cause", context.Cause(ctx),
    )
}
That produces:
level=ERROR msg="request failed" err="context deadline exceeded"
cause="order ord-123: 5s timeout exceeded"
A useful pattern is wrapping the request context with WithCancelCause at the middleware
level so every handler downstream gets automatic cause tracking. The cancel function is
stashed in the context via WithValue so handlers can pull it out and set a specific cause:
type cancelCauseKey struct{}

func withCause(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ctx, cancel := context.WithCancelCause(r.Context()) // (1)
        defer cancel(errors.New("request completed")) // (2)

        ctx = context.WithValue(ctx, cancelCauseKey{}, cancel) // (3)
        next.ServeHTTP(w, r.WithContext(ctx))

        if ctx.Err() != nil { // (4)
            slog.Error("request context canceled",
                "method", r.Method,
                "path", r.URL.Path,
                "err", ctx.Err(),
                "cause", context.Cause(ctx),
            )
        }
    })
}
1. Wrap the request context with WithCancelCause
2. The default cause on normal completion
3. Stash the cancel function so handlers can set a specific cause
4. Logged after the handler returns; defer cancel(...) hasn’t run yet at this point

Any handler can pull the cancel function out and set a cause:
func handleOrder(w http.ResponseWriter, r *http.Request) {
    cancel, _ := r.Context().
        Value(cancelCauseKey{}).(context.CancelCauseFunc)

    if err := processOrder(r.Context()); err != nil {
        cancel(fmt.Errorf("order processing failed: %w", err))
        http.Error(w, "order failed", http.StatusInternalServerError)
        return
    }
    // ...
}
First cancel wins, so the most specific reason is what shows up in the middleware log.
streamingfast/substreams uses this approach in production, storing a CancelCauseFunc in
the request context so worker pools downstream can cancel with a specific error.
One thing to know: the stdlib’s HTTP server and most third-party libraries cancel contexts
without setting a cause, since they predate Go 1.20. If a client disconnects,
context.Cause(ctx) will return context.Canceled, not a custom error. The cause APIs are
most useful for reasons set by your own code.
Most of the time, WithCancelCause is all you need. It covers explicit cancellation with a
specific reason, and context.Cause gives you a way to read it back. If you also need a
timeout, WithTimeoutCause labels the deadline path without extra wiring. The gotcha is
that defer cancel() on the normal return path discards the cause, so if you need causes on
every path, including normal completion, the manual timer pattern fills that gap. The
stacked approach on top of that is for when downstream code also needs DeadlineExceeded.
The cause APIs have seen steady adoption since Go 1.20. golang.org/x/sync/errgroup uses
WithCancelCause internally since v0.3.0, so context.Cause(ctx) on an errgroup-canceled
context returns the actual goroutine error. docker cli uses it to distinguish OS signals
from normal cancellation. kubernetes cluster-api migrated its codebase to the *Cause
variants. gRPC-Go had a proposal to use it for distinguishing client disconnects from gRPC
timeouts and connection closures.
A go statement just launches a goroutine and walks away. There’s no scope
that waits for it, no automatic cancellation if the parent dies, no built-in way to collect
its errors.
This post looks at where the idea of structured concurrency comes from, what it looks like
in Python and Kotlin, and how you get the same behavior in Go using errgroup, WaitGroup,
and context.
In 1968, Dijkstra wrote a letter to the editor of Communications of the ACM titled Go To
Statement Considered Harmful. His core argument was that unrestricted use of goto made
programs nearly impossible to reason about:
The unbridled use of the go to statement has as an immediate consequence that it becomes terribly hard to find a meaningful set of coordinates in which to describe the process progress.
Structured programming replaced goto with scoped constructs like if, while, and
functions. The key insight was that control flow should be lexically scoped: you can look at
a block of code and know where it starts, where it ends, and that everything in between
finishes before execution moves on.
The same problem showed up later in concurrent programming.
Spawning a thread or goroutine that outlives its parent is the concurrency equivalent of
goto. The spawned work escapes the scope that created it, and now you have to reason about
lifetimes that cross boundaries.
Martin Sustrik, creator of ZeroMQ, coined the term “structured concurrency” in his Structured Concurrency blog post. He framed the idea as an extension of how block lifetimes work in structured programming:
Structured concurrency prevents lifetime of green thread B launched by green thread A to exceed lifetime of A.
Eric Niebler later expanded on Sustrik’s idea, tying it directly to how function calls work in sequential code:
“Structured concurrency” refers to a way to structure async computations so that child operations are guaranteed to complete before their parents, just the way a function is guaranteed to complete before its caller.
– Eric Niebler, Structured Concurrency (Niebler)
Nathaniel J. Smith (NJS) took this further in his essay Notes on structured concurrency:
That’s right: go statements are a form of goto statement.
NJS’s broader point was that spawning a background task breaks function abstraction the same
way goto does. Once a function can spawn work that outlives it, the caller can no longer
reason about when the function’s effects are complete:
Every time our control splits into multiple concurrent paths, we want to make sure that they join up again.
Structured concurrency boils down to a few rules: every concurrent task runs inside a scope; the scope doesn’t exit until all of its children have finished; cancellation flows down from the scope to its children; and errors flow up from the children to the scope.
This essay prompted Go proposal #29011, filed by smurfix, which proposed adding structured concurrency to Go. NJS participated in the discussion and made a point that stuck with me:
Right now you can structure things this way in Go, but it’s way more cumbersome than just typing go myfunc(), so Go ends up encouraging the “unstructured” style.
– Nathaniel J. Smith, Go proposal #29011
The proposal was eventually closed. Before getting into Go’s approach, it helps to see what structured concurrency actually looks like in practice across the three languages.
Python 3.11 introduced asyncio.TaskGroup as the structured concurrency primitive. Here’s an example that runs three tasks concurrently, where one of them fails:
import asyncio
async def fetch(url: str, should_fail: bool = False) -> str:
await asyncio.sleep(0.1) # (1)
if should_fail:
raise ValueError(f"failed to fetch {url}")
return f"fetched {url}"
async def main() -> None:
try:
async with asyncio.TaskGroup() as tg: # (2)
tg.create_task(fetch("/api/users")) # (3)
tg.create_task(fetch("/api/orders", should_fail=True))
tg.create_task(fetch("/api/products"))
except* ValueError as eg: # (4)
for exc in eg.exceptions:
print(f"caught: {exc}")
finally:
print("cleanup runs no matter what") # (5)
asyncio.run(main())
Here:
(1) await is a cancellation point; the runtime can interrupt the coroutine here
(2) async with creates a scope that waits for all tasks to finish
(3) create_task starts a task that belongs to the group
(4) except* catches the ValueErrors collected into an ExceptionGroup
(5) finally runs regardless of success or failure
The thing that makes this work is that await expressions are cancellation points. When the group decides to cancel, the runtime delivers that cancellation at the next await in each running coroutine.
Kotlin has had structured concurrency since kotlinx.coroutines 0.26. The equivalent construct is coroutineScope. Here’s the same scenario with three tasks and one failure:
import kotlinx.coroutines.*
suspend fun fetch(url: String, shouldFail: Boolean = false): String {
delay(100) // (1)
if (shouldFail) throw IllegalStateException("failed to fetch $url")
return "fetched $url"
}
suspend fun main() {
try {
coroutineScope { // (2)
launch { fetch("/api/users") } // (3)
launch { fetch("/api/orders", shouldFail = true) }
launch { fetch("/api/products") }
}
} catch (e: IllegalStateException) { // (4)
println("caught: ${e.message}")
} finally {
println("cleanup runs no matter what") // (5)
}
}
Here:
(1) delay is a suspension point where cancellation can be delivered
(2) coroutineScope waits for all children and cancels siblings if one fails
(3) launch starts a coroutine tied to this scope
(4) catch receives the exception once the scope has cancelled its children
(5) finally runs as expected
Like Python’s await, Kotlin’s suspending functions (delay, channel operations, etc.) are cancellation points. When the scope cancels, the runtime delivers a CancellationException at the next suspension point in each running coroutine.
Kotlin also has supervisorScope, which is the variant where siblings keep running when one fails. We’ll see the Go equivalent of that shortly.
Go’s go statement is unstructured. When you write go func() { ... }(), the runtime
spawns a background goroutine and immediately moves on. The calling function doesn’t wait
for it, doesn’t get notified when it finishes, and has no way to cancel it. Unless you
explicitly synchronize with something like a WaitGroup or a channel, that goroutine can
outlive the function that spawned it. There’s no built-in scope that ties their lifetimes
together.
But you can compose the same patterns using channels, sync.WaitGroup, context, and
errgroup from x/sync.
This is Go’s equivalent of TaskGroup and coroutineScope. Same scenario: three tasks, one
fails, siblings get cancelled:
func run() error {
g, ctx := errgroup.WithContext(context.Background()) // (1)
g.Go(func() error { // (2)
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(100 * time.Millisecond):
fmt.Println("fetched /api/users")
return nil
}
})
g.Go(func() error { // (3)
return fmt.Errorf("failed to fetch /api/orders")
})
g.Go(func() error { // (4)
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(100 * time.Millisecond):
fmt.Println("fetched /api/products")
return nil
}
})
err := g.Wait() // (5)
fmt.Println("cleanup runs no matter what")
return err
}
Here:
(1) errgroup.WithContext returns a group plus a context that is cancelled as soon as any goroutine returns an error
(2) this goroutine races its work against ctx.Done()
(3) this goroutine fails immediately, which cancels the shared context
(4) the other sibling observes the cancellation via ctx.Done()
(5) Wait blocks until all goroutines finish and returns the first non-nil error
Notice how the Go version requires each goroutine to explicitly check ctx.Done(). In Python and Kotlin, the runtime handles that at await/suspension points. In Go, you wire it in yourself.
This is Go’s equivalent of Kotlin’s supervisorScope. Siblings keep running regardless of
individual failures:
func run() []error {
var (
wg sync.WaitGroup
mu sync.Mutex
errs []error
)
urls := []string{"/api/users", "/api/orders", "/api/products"}
for _, url := range urls {
wg.Go(func() { // (1)
time.Sleep(100 * time.Millisecond)
if url == "/api/orders" {
mu.Lock()
err := fmt.Errorf("failed to fetch %s", url)
errs = append(errs, err) // (2)
mu.Unlock()
return
}
fmt.Printf("fetched %s\n", url)
})
}
wg.Wait() // (3)
fmt.Println("cleanup runs no matter what")
return errs
}
Here:
(1) wg.Go launches a goroutine and handles Add/Done internally (Go 1.25+)
(2) errors are appended under a mutex because several goroutines may fail concurrently
(3) Wait blocks until all goroutines finish
Those two examples cover Go’s equivalents of the structured patterns in Python and Kotlin. But the code looks noticeably different, and the reason comes down to how these runtimes handle concurrent execution.
The fundamental difference between the Python/Kotlin approach and Go’s approach comes down to how cancellation gets delivered.
In Python, async def functions are coroutines. They run on a single-threaded event loop
and yield control at every await. In Kotlin, suspend functions are coroutines. They run
on cooperative dispatchers (which can be backed by thread pools) and yield at every
suspension point. Both languages have colored functions (async/suspend) - the “color”
means async functions can only be called from other async functions, which lets the runtime
track every point where a coroutine can yield. These yield points are also cancellation
points, so when a scope cancels, the runtime delivers the cancellation at the next such
point.
Go’s goroutines aren’t coroutines. They’re functions running on a preemptive scheduler
backed by OS threads. The runtime multiplexes goroutines onto OS threads and can preempt
them, but it has no knowledge of application-level cancellation. There’s no concept of a
“suspension point” where the runtime can inject a cancellation signal. A goroutine doing
CPU-bound work will keep running even if its context was cancelled. The goroutine has to
check ctx.Done() explicitly via a select statement.
Here’s the cooperative pattern in Go:
func worker(ctx context.Context) error {
for {
select {
case <-ctx.Done(): // (1)
return ctx.Err()
default:
doUnitOfWork() // (2)
}
}
}
And here’s a goroutine that ignores cancellation:
func busyWorker(ctx context.Context) {
for {
// CPU-bound work, never checks ctx.Done()
heavyComputation()
}
}
This goroutine will keep running until the process exits, regardless of whether its context was cancelled.
Python and Kotlin workers also need to cooperate for cancellation to actually work. If a
coroutine does CPU-bound work without hitting an await or a suspension point, the runtime
can’t interrupt it either.
In Python, a non-cooperative worker looks like this:
async def stubborn_worker() -> None:
while True:
heavy_computation() # (1)
Here:
(1) the loop never hits an await anywhere, so the runtime never gets a chance to deliver cancellation
To make it cooperative, you insert an explicit cancellation check:
async def cooperative_worker() -> None:
while True:
await asyncio.sleep(0) # (1)
heavy_computation()
Here:
(1) await asyncio.sleep(0) yields control back to the event loop, giving it a chance to cancel this coroutine
In Kotlin, the same situation looks like this:
suspend fun stubbornWorker() {
while (true) {
heavyComputation() // (1)
}
}
To fix this, use coroutineContext.ensureActive() to check whether the coroutine’s scope
has been cancelled:
suspend fun cooperativeWorker() {
while (true) {
coroutineContext.ensureActive() // (1)
heavyComputation()
}
}
Here:
(1) ensureActive throws CancellationException if the scope has been cancelled
This isn’t too different from what Go does with ctx.Done(). In all three languages, a
tight loop doing CPU-bound work won’t cancel unless the worker explicitly checks. The
difference is that in Python and Kotlin, most standard library functions (asyncio.sleep,
delay, channel operations) are cancellation points by default, so you hit them naturally
in typical code.
Go’s concurrency model is built on CSP (Communicating Sequential Processes). Goroutines
communicate via channels, not via structured scopes. The go statement is deliberately
low-level. It gives you a concurrent execution unit and gets out of your way.
Python and Kotlin start from the structured side and require you to opt out. Python’s
asyncio.create_task outside a group, or Kotlin’s supervisorScope, are the escape
hatches. Go starts from the unstructured side and requires you to opt in. errgroup and
WaitGroup are how you add structure. Different design priorities lead to different
defaults.
Go proposal #29011 was closed after Ian Lance Taylor pointed out the practical problem:
I think these ideas are definitely interesting. But your specific suggestion would break essentially all existing Go code, so that is a non-starter.
In a later comment, he acknowledged that there are good ideas in the space but argued for improving existing primitives rather than changing the language:
There are likely good ideas in the area of structured concurrency that we can do better at, in the language or the standard library or both.
NJS also noted that structured concurrency helps with error propagation, because when a
goroutine exits with an error, there is somewhere to propagate that error to. That’s a real
shortcoming of the current model. The response from the Go team was that errgroup,
context, and WaitGroup already provide the building blocks, and language-level changes
weren’t justified given the cost.
There’s also a Trio forum discussion on Go’s situation. NJS was cautious about overstating
the benefits, noting that structured concurrency wouldn’t have prevented about a quarter of
the concurrency bugs in a study on real-world Go bugs they examined (classic race
conditions). But he pointed out that some of the hardest-to-understand bugs involved
standard library modules that spawned surprising background goroutines. That couldn’t happen
in a language with truly scoped concurrency. He also observed that all mistakes in using
Go’s WaitGroup API seemed like they’d be trivially prevented by structured concurrency.
If you’re writing Go and want structured concurrency, there are a few practices that help. The core idea is:
Never start a goroutine without knowing when it will stop.
– Dave Cheney, Practical Go
Here are some concrete ways to follow that:
Know the lifetime of every goroutine you spawn. Before writing go func(), you should
be able to answer: what signals this goroutine to stop, and what waits for it to finish?
If you can’t answer both, the goroutine’s lifetime is unknown and it can leak.
Use go func() sparingly. A bare go func() { ... }() sends a goroutine into the
background with no handle to wait on it or cancel it. Prefer launching goroutines through
errgroup or behind a WaitGroup so something always owns their lifetime.
Let the caller decide concurrency. If you’re writing a library function, return a result instead of spawning a goroutine internally. Let the caller choose how to run it concurrently. This keeps goroutine lifetimes visible at the call site.
Pass context down, check it inside. Accept context.Context as the first parameter
and check ctx.Done() in long-running loops or blocking operations. This is how the
caller communicates “I don’t need this anymore.”
Here’s what a well-structured goroutine launch pattern looks like:
func processItems(ctx context.Context, items []string) error {
g, ctx := errgroup.WithContext(ctx) // (1)
for _, item := range items {
g.Go(func() error { // (2)
select {
case <-ctx.Done():
return ctx.Err()
default:
return handle(ctx, item) // (3)
}
})
}
return g.Wait() // (4)
}
Here:
(1) derive the group and a child context from the caller’s context
(2) every goroutine is launched through the group, so Wait knows about it
(3) handle receives ctx and can observe cancellation
(4) Wait joins every goroutine and returns the first error
Every goroutine has a clear owner and exit condition. If any task fails, the context cancels and the others observe it on their next check.
Since Go doesn’t enforce structured concurrency at the language level, it’s possible to leak goroutines or miss cancellation signals. I wrote about one common case in early return and goroutine leak.
There are a few tools that help catch these issues:
- goleak can be wired into TestMain. It checks that no goroutines are still running when your tests finish. It’s useful for catching the “forgot to cancel” class of bugs, which is the most common way unstructured goroutines cause trouble.
- The race detector (go test -race) catches data races between goroutines. It won’t catch leaks, but unstructured goroutines with unclear lifetimes are more likely to race because their access to shared state is harder to reason about.
- testing/synctest (Go 1.25) lets you test time-dependent concurrent code on a fake clock, eliminating the time.Sleep calls that make tests slow and flaky.
- The goroutine leak profiler in runtime/pprof. It uses the garbage collector’s reachability analysis to find goroutines permanently blocked on synchronization primitives that no runnable goroutine can reach. Unlike goleak, which only works in tests, this profile can be collected from a running program via /debug/pprof/goroutineleak, making it useful for finding leaks in production.
If you’re coming from languages like Python or Kotlin, Go’s concurrency can feel overly
verbose, and it is. Wiring up errgroup, checking ctx.Done() in every goroutine, guarding
shared state with a mutex around a WaitGroup; that’s a lot of ceremony for something the
other languages hand you for free.
But as covered earlier, the concurrency paradigms are fundamentally different. Python and Kotlin’s cooperative runtimes can own the cancellation because they own the scheduling. Go’s preemptive scheduler doesn’t know what your goroutine is doing or when it should stop. That’s why cancellation is your job.
The same structured patterns are all achievable in Go. You just build them yourself out of
errgroup, WaitGroup, context, and channels. That gives you more control over goroutine
lifetimes, but it also means more surface area for bugs. Forget a ctx.Done() check and a
goroutine leaks. Misuse a WaitGroup and you deadlock. The study on real-world Go bugs
found 171 concurrency bugs across projects like Docker and Kubernetes, with more than half
caused by Go-specific issues around message passing and goroutine management.
There are frameworks that generate those kind of fakes, and one of them is called GoMock… they’re fine, but I find that on balance, the handwritten fakes tend to be easier to reason about and clearer to sort of see what’s going on. But I’m not an enterprise Go programmer. Maybe people do need that, so I don’t know, but that’s my advice.
– Andrew Gerrand, Testing Techniques (46:44)
No shade against mocking libraries like gomock or mockery. I use them all the time, both at work and outside. But one thing I’ve noticed is that generating mocks often leads to poorly designed tests and increases onboarding time for a codebase.
Also, since almost no one writes tests by hand anymore and instead generates them with LLMs, the situation gets more dire. These ghosts often pull in all kinds of third-party libraries to mock your code, simply because they were trained on a lot of hastily written examples on the web.
So the idea of this post isn’t to discourage using mocking libraries. Rather, it’s to show that even if your codebase already has a mocking library in the dependency chain, not all of your tests need to depend on it. Below are a few cases where I tend not to use any mocking library and instead leverage the constructs that Go gives us.
This does require some extra song and dance with the language, but in return, we gain more control over our tests and reduce the chance of encountering spooky action at a distance.
Say you have a function that creates a database handle:
func OpenDB(user, pass, host, dbName string) (*sql.DB, error) {
dsn := fmt.Sprintf("%s:%s@tcp(%s)/%s", user, pass, host, dbName)
return sql.Open("mysql", dsn)
}
The problem is that sql.Open hands the DSN directly to the driver. When you call
OpenDB("admin", "secret", "db.internal", "orders"), the function formats the DSN string
and hands it to the MySQL driver. You can’t intercept that call, you can’t control what it
returns, and you probably don’t want unit tests leaning on a real driver (or a real MySQL
instance) just to verify DSN formatting.
The fix is to make the database opener injectable:
type SQLOpenFunc func(driver, dsn string) (*sql.DB, error) // (1)
func OpenDB(
user, pass, host, dbName string, openFn SQLOpenFunc, // (2)
) (*sql.DB, error) {
dsn := fmt.Sprintf("%s:%s@tcp(%s)/%s", user, pass, host, dbName)
return openFn("mysql", dsn) // (3)
}
Here:
(1) SQLOpenFunc matches sql.Open’s signature
(2) the opener is injected as a parameter
(3) the function calls openFn instead of calling sql.Open directly
In production, pass the real sql.Open:
func main() {
db, err := OpenDB(
"admin", "secret", "db.internal", "orders", sql.Open, // (1)
)
// ...
}
Here:
(1) sql.Open is passed as the last argument - no wrapper needed
In tests, pass a fake that captures what was passed or returns canned values:
func TestOpenDB(t *testing.T) {
var got string
fakeOpen := func(driver, dsn string) (*sql.DB, error) {
got = dsn // (1) capture what was passed
return nil, nil
}
OpenDB(
"admin", "secret", "db.internal", "orders", fakeOpen, // (2)
)
want := "admin:secret@tcp(db.internal)/orders"
if got != want {
t.Errorf("got %q, want %q", got, want)
}
}
Here:
(1) the fake records the DSN it was handed
(2) the fake is injected in place of sql.Open
This pattern works for any function dependency - UUID generators, random number sources, file openers. Functions are first-class values in Go, so you can pass them around like any other value.
The downside is that parameter lists can grow quickly. If OpenDB also needed a logger, a
metrics client, and a config loader, the signature becomes unwieldy. When you find yourself
passing more than two or three function dependencies, consider grouping them into a struct
with an interface - see Mocking a method on a type.
Sometimes you inherit code where refactoring the function signature isn’t practical. Maybe it’s called from dozens of places, or it’s part of a public API you can’t change:
func PublishOrderCreated(
ctx context.Context, brokers []string, id string) error {
w := &kafka.Writer{
Addr: kafka.TCP(brokers...), Topic: "order-events",
}
defer w.Close()
return w.WriteMessages(ctx, kafka.Message{Key: []byte(id)})
}
The Kafka writer is instantiated directly inside the function. There’s no seam to inject a fake without touching every call site. If this function is called from 50 places in your codebase, changing its signature means updating all 50.
One workaround is a package-level variable that points to the constructor:
type kafkaWriter interface { // (1)
WriteMessages(context.Context, ...kafka.Message) error
Close() error
}
var newWriter = func(brokers []string) kafkaWriter { // (2)
return &kafka.Writer{
Addr: kafka.TCP(brokers...), Topic: "order-events",
}
}
func PublishOrderCreated(
ctx context.Context, brokers []string, id string,
) error {
w := newWriter(brokers) // (3)
defer w.Close()
return w.WriteMessages(ctx, kafka.Message{Key: []byte(id)})
}
Here:
(1) a small interface covering only what we use from kafka.Writer
(2) a package-level variable holds the constructor
(3) the function calls newWriter instead of constructing a kafka.Writer
Production code doesn’t change - it calls PublishOrderCreated exactly as before, and the
default newWriter creates real Kafka writers.
Tests swap it out:
type fakeWriter struct {
key []byte
}
func (f *fakeWriter) WriteMessages(
_ context.Context, msgs ...kafka.Message) error {
if len(msgs) > 0 {
f.key = msgs[0].Key // (1)
}
return nil
}
func (f *fakeWriter) Close() error { return nil }
func TestPublishOrderCreated(t *testing.T) {
orig := newWriter
t.Cleanup(func() { newWriter = orig }) // (2) restore after test
fake := &fakeWriter{}
newWriter = func([]string) kafkaWriter { // (3)
return fake
}
PublishOrderCreated(
t.Context(), []string{"kafka:9092"}, "ord-1",
)
if got := string(fake.key); got != "ord-1" { // (4)
t.Errorf("got %q, want %q", got, "ord-1")
}
}
Here:
(1) the fake records the message key it was given
(2) t.Cleanup ensures the original is restored even if the test fails
(3) the replacement returns the fake as a kafkaWriter, matching the variable’s type
(4) the test asserts on what the fake captured
This works, but be aware of the costs. Tests that mutate package state can’t run in
parallel - they’d stomp on each other’s fakes. If you’re writing tests from an external
package (package events_test), the variable must be exported, which pollutes your public
API.
Prefer the function parameter pattern or the interface pattern over monkey patching. Reserve this technique for legacy code where changing signatures would be too disruptive.
This is a pattern you’ll see all the time in services that integrate with third-party APIs.
Here’s a payment service that charges customers through Stripe (this uses the newer
stripe.Client API, which is the recommended shape in recent stripe-go versions):
func (s *Service) ChargeCustomer(
ctx context.Context, custID string, cents int64) (string, error) {
intent, err := s.client.V1PaymentIntents.Create(ctx,
&stripe.PaymentIntentCreateParams{
Amount: stripe.Int64(cents),
Currency: stripe.String("usd"),
Customer: stripe.String(custID),
},
)
if err != nil {
return "", err
}
return intent.ID, nil
}
Testing this hits the real Stripe API. That’s slow, requires live credentials, and in
production mode charges actual money. The problem is that s.client is a *stripe.Client
from the SDK - there’s no way to swap it for a fake without introducing a seam.
The solution is to introduce an interface that describes what you need:
type PaymentIntentCreator interface { // (1)
Create(
context.Context, *stripe.PaymentIntentCreateParams,
) (*stripe.PaymentIntent, error)
}
type Service struct {
intents PaymentIntentCreator // (2)
}
func (s *Service) ChargeCustomer(
ctx context.Context, custID string, cents int64) (string, error) {
intent, err := s.intents.Create(ctx, // (3)
&stripe.PaymentIntentCreateParams{
Amount: stripe.Int64(cents),
Currency: stripe.String("usd"),
Customer: stripe.String(custID),
})
if err != nil {
return "", err
}
return intent.ID, nil
}
Here:
(1) PaymentIntentCreator describes the single operation the service needs
(2) the service depends on the interface, not on a *stripe.Client
(3) ChargeCustomer calls through the interface
In production, inject the real Stripe service client:
func main() {
sc := stripe.NewClient("sk_test_...")
svc := &Service{intents: sc.V1PaymentIntents} // (1)
// ...
}
Here:
(1) sc.V1PaymentIntents satisfies PaymentIntentCreator (it has a Create method with the right signature)
In tests, you pass a fake that returns canned values:
type fakeIntents struct {
id string // (1)
}
func (f *fakeIntents) Create(
context.Context, *stripe.PaymentIntentCreateParams,
) (*stripe.PaymentIntent, error) {
return &stripe.PaymentIntent{ID: f.id}, nil // (2)
}
func TestChargeCustomer(t *testing.T) {
fake := &fakeIntents{id: "pi_123"} // (3)
svc := &Service{intents: fake}
id, _ := svc.ChargeCustomer(t.Context(), "cus_abc", 5000)
// assert id == "pi_123"
}
Here:
(1) the fake holds the ID it should hand back
(2) Create returns a canned *stripe.PaymentIntent without touching the network
(3) the test injects the fake into the service
The service doesn’t know or care whether it’s talking to Stripe or a test fake. This is the most common mocking pattern in Go - define an interface for your dependency, accept it in your constructor, and swap implementations at runtime.
But what happens when the SDK surface area is huge and your code only needs one operation? That’s where the next pattern comes in.
The previous pattern works well when you control the interface. But AWS SDK clients have
dozens of methods. The DynamoDB client has over 40 operations - GetItem, PutItem,
Query, Scan, BatchGetItem, and so on. If you write tests against a dependency that
exposes the whole surface area, your fakes become annoying fast.
The solution is to define a minimal interface on the consumer side:
type itemGetter interface { // (1)
GetItem(context.Context, *dynamodb.GetItemInput,
...func(*dynamodb.Options)) (*dynamodb.GetItemOutput, error)
}
func GetUserByID(
ctx context.Context, client itemGetter, id string) (*User, error) {
out, err := client.GetItem(ctx, &dynamodb.GetItemInput{ // (2)
TableName: aws.String("users"),
Key: map[string]types.AttributeValue{
"pk": &types.AttributeValueMemberS{Value: id},
},
})
// ...
}
Here:
(1) itemGetter is a one-method interface defined on the consumer side
(2) GetUserByID only calls GetItem, so that’s all the interface requires
In production, pass the real DynamoDB client - it satisfies itemGetter because it has a
GetItem method. Go interfaces are satisfied implicitly:
func main() {
client := dynamodb.NewFromConfig(cfg)
user, err := GetUserByID(ctx, client, "user-123") // (1)
// ...
}
Here:
(1) the real client satisfies itemGetter automatically - no adapter or wrapper needed thanks to implicit interface satisfaction
In tests, you only implement the one method you need:
type fakeItemGetter struct {
item map[string]types.AttributeValue // (1)
}
func (f *fakeItemGetter) GetItem(context.Context, *dynamodb.GetItemInput,
...func(*dynamodb.Options)) (*dynamodb.GetItemOutput, error) {
return &dynamodb.GetItemOutput{Item: f.item}, nil // (2)
}
func TestGetUserByID(t *testing.T) {
fake := &fakeItemGetter{
item: map[string]types.AttributeValue{
"email": &types.AttributeValueMemberS{Value: "[email protected]"},
},
}
user, _ := GetUserByID(t.Context(), fake, "u-1") // (3)
// assert user.Email == "[email protected]"
}
Here:
(1) the fake carries the item it should return
(2) GetItem hands back that canned item
(3) the fake is passed where the real client would go
This is the Interface Segregation Principle in action - clients shouldn’t be forced to depend on methods they don’t use.
But this approach has limits. If you have 20 functions each using different DynamoDB operations, you’d end up with 20 tiny interfaces. And sometimes you’re stuck with a preexisting interface type that has more methods than you want. That’s where struct embedding helps.
Sometimes you can’t define your own minimal interface. Maybe a library insists on a specific interface type, and it’s bigger than what your test cares about.
The AWS SDK v2’s S3 upload manager is a good example. manager.NewUploader takes a client
interface that supports both single-part uploads and multipart uploads. If your test is
exercising the single-part path and you only want to intercept PutObject, implementing the
multipart methods just to satisfy the interface is pure busywork.
Go’s struct embedding provides an escape hatch. Here’s the production code:
func UploadReport(
ctx context.Context, client manager.UploadAPIClient, // (1)
bucket, key string, body io.Reader,
) error {
up := manager.NewUploader(client)
_, err := up.Upload(ctx, &s3.PutObjectInput{
Bucket: aws.String(bucket),
Key: aws.String(key),
Body: body,
})
return err
}
Here:
(1) the function accepts manager’s UploadAPIClient interface - a large interface with many methods
In tests, embed the interface in your fake and override only what you need:
type fakeS3 struct {
manager.UploadAPIClient // (1)
gotKey string
gotBody []byte
}
func (f *fakeS3) PutObject(
_ context.Context, in *s3.PutObjectInput, _ ...func(*s3.Options),
) (*s3.PutObjectOutput, error) {
if in.Key != nil {
f.gotKey = *in.Key // (2)
}
if in.Body != nil {
f.gotBody, _ = io.ReadAll(in.Body)
}
return &s3.PutObjectOutput{}, nil
}
func TestUploadReport(t *testing.T) {
fake := &fakeS3{}
err := UploadReport(
t.Context(),
fake, // (3)
"my-bucket",
"reports/q1.csv",
bytes.NewReader([]byte("hi")), // (4)
)
if err != nil {
t.Fatal(err)
}
if fake.gotKey != "reports/q1.csv" {
t.Errorf("got %q, want %q", fake.gotKey, "reports/q1.csv")
}
}
Here:
(1) embedding the UploadAPIClient interface lets fakeS3 satisfy it without implementing every method
(2) the PutObject override captures the key and body
(3) the fake goes where the real client would
(4) a small in-memory body keeps the upload on the single-part PutObject path
The embedded interface value is nil, so any method you don’t override will panic if called. This is a feature, not a bug. If your code accidentally triggers multipart and calls CreateMultipartUpload, the test crashes immediately, and you learn that your test setup (or your assumptions) is wrong.
For interfaces with a single method, there’s an even more compact approach. Say you have middleware that validates authentication tokens:
type ctxKey string
const userIDKey ctxKey = "userID"
type TokenValidator interface { // (1)
Validate(token string) (userID string, err error)
}
func RequireAuth(v TokenValidator, next http.Handler) http.Handler { // (2)
fn := func(w http.ResponseWriter, r *http.Request) {
userID, err := v.Validate(r.Header.Get("Authorization"))
if err != nil {
http.Error(w, "unauthorized", 401)
return
}
ctx := context.WithValue(r.Context(), userIDKey, userID)
next.ServeHTTP(w, r.WithContext(ctx))
}
return http.HandlerFunc(fn)
}
Here:
(1) TokenValidator is a single-method interface for validating tokens
(2) the middleware depends only on that interface
You could write a fake struct with a Validate method, but Go lets you define a function
type that satisfies the interface:
type TokenValidatorFunc func(string) (string, error) // (1)
func (f TokenValidatorFunc) Validate(token string) (string, error) {
return f(token) // (2)
}
Here:
(1) a function type whose signature matches the interface’s single method
(2) Validate simply calls the function value itself
This is the same pattern the standard library uses with http.HandlerFunc. Now tests can
pass inline functions:
func TestRequireAuth(t *testing.T) {
v := TokenValidatorFunc(func(token string) (string, error) {
if token == "Bearer valid" {
return "user-123", nil // (1)
}
return "", errors.New("invalid")
})
next := http.HandlerFunc(func(http.ResponseWriter, *http.Request) {})
handler := RequireAuth(v, next) // (2)
req := httptest.NewRequest("GET", "/protected", nil)
req.Header.Set("Authorization", "Bearer valid")
rec := httptest.NewRecorder()
handler.ServeHTTP(rec, req)
// assert rec.Code == http.StatusOK
}
Here:
(1) the inline function returns a canned user ID for the valid token
(2) the function value, converted to TokenValidatorFunc, satisfies the TokenValidator interface
No extra struct definitions cluttering up your test file.
When your code makes HTTP requests to external services, the net/http/httptest package
provides a test server that runs on localhost. Say you have a client that fetches exchange
rates:
func (c *Client) GetRate(from, to string) (float64, error) {
url := c.baseURL + "/latest?base=" + from + "&symbols=" + to
resp, err := c.httpClient.Get(url)
if err != nil {
return 0, err
}
defer resp.Body.Close()
// decode JSON, return rate...
}
In production, c.baseURL points to the real API. Testing against it is problematic - it’s
slow, requires credentials, returns different values each time, and might rate-limit your
CI.
The httptest.Server spins up a real HTTP server on localhost:
func TestGetRate(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc( // (1)
func(w http.ResponseWriter, r *http.Request) {
fmt.Fprint(w, `{"base":"USD","rates":{"EUR":0.92}}`)
},
))
defer srv.Close() // (2)
client := NewClient(srv.URL, "key") // (3)
rate, _ := client.GetRate("USD", "EUR")
// assert rate == 0.92
}
Here:
(1) httptest.NewServer starts a real HTTP server on a random localhost port, backed by your handler
(2) defer srv.Close() shuts it down when the test ends
(3) the client is pointed at srv.URL instead of the real API
Your code makes real HTTP calls over TCP, but they never leave the machine. You can return different responses for different scenarios - rate limits, malformed JSON, network errors - whatever you need to test.
This is essentially the same technique as Mocking a function - we’re just applying it to
time.Now. Code that depends on the current time is tricky to test:
func IsExpired(expiresAt time.Time) bool {
return time.Now().After(expiresAt)
}
Every call to time.Now() returns a different value. You can’t write a reliable test
because the result depends on when the test runs.
Make the clock injectable:
type Clock func() time.Time // (1)
func IsExpired(expiresAt time.Time, clock Clock) bool { // (2)
return clock().After(expiresAt)
}
Here:
(1) Clock is a function type matching time.Now’s signature
(2) the clock is injected as a parameter
In production, pass time.Now:
func main() {
expired := IsExpired(token.ExpiresAt, time.Now) // (1)
// ...
}
Here:
(1) pass the time.Now function itself - it satisfies the Clock type
In tests, pass a function that returns a fixed time:
func TestIsExpired(t *testing.T) {
expiry := time.Date(2025, 6, 15, 12, 0, 0, 0, time.UTC)
before := func() time.Time { return expiry.Add(-time.Hour) } // (1)
after := func() time.Time { return expiry.Add(time.Hour) } // (2)
if IsExpired(expiry, before) {
t.Error("should not be expired")
}
if !IsExpired(expiry, after) {
t.Error("should be expired")
}
}
Here:
1. a clock frozen one hour before the expiry
2. a clock frozen one hour after the expiry
For code that uses time.Sleep, timers, or tickers, Go 1.25’s testing/synctest provides a
fake clock that advances automatically when goroutines in the bubble are durably blocked:
func TestPeriodicFlush(t *testing.T) {
synctest.Test(t, func(t *testing.T) { // (1)
count := 0
go func() {
ticker := time.NewTicker(10 * time.Second) // (2)
defer ticker.Stop()
for range ticker.C {
count++
if count >= 3 {
return
}
}
}()
time.Sleep(35 * time.Second) // (3)
synctest.Wait() // (4)
// assert count == 3
})
}
Here:
1. synctest.Test runs the function in an isolated bubble with fake time starting at 2000-01-01
2. the ticker inside the bubble runs on the fake clock
3. time.Sleep inside the bubble uses fake time; time advances when goroutines are durably blocked, so this returns instantly after the ticker fires 3 times
4. synctest.Wait is a synchronization point; it blocks until the other goroutines in the bubble are durably blocked or finished

Inside synctest.Test, the framework intercepts time operations. The test completes instantly rather than waiting for real time to pass.
These are the most common ones where I typically avoid opting for mocking libraries. But there are cases when I still like to generate mocks for an interface. One example that comes to mind is testing gRPC servers. I’m sure I’m forgetting some other cases where I regularly use mocking libraries.
The point is not to discourage the use of mocking libraries or to make a general statement that “all mocking libraries are bad.” It’s that these mocking libraries have costs associated with them. Code generation is fun, but it’s one extra step that you have to teach someone who’s onboarding to your codebase.
Also, if you’re using LLMs to generate tests, you may want to write some tests manually to give the tool a sense of how you want your tests written, so it doesn’t pull in the universe just to mock something that can be mocked natively using Go constructs.
For more on why handwritten fakes often beat generated mocks, see Test state, not interactions.
A common question that shows up during a migration is, “How do we make sure the new system behaves exactly like the old one, minus the icky parts?” Another one is, “How do we build the new system while the old one keeps changing without disrupting the business?”
There’s no universal playbook. It depends on how gnarly the old system is, how ambitious the new system is, and how much risk the business can stomach. After going through a few of these migrations, I realized one approach keeps showing up. So I’ll expand on it here.
The idea is that you shadow a slice of production traffic to the new system. The old system keeps serving real users. A copy of that same traffic is forwarded to the new system along with the old system’s response. The new system runs the same business logic and compares its outputs with the old one. The entire point is to make the new system return the exact same answer the old one would have, for the same inputs and the same state.
At the start, you don’t rip out bad behavior or ship new features. Everything is about output parity. Once the systems line up and the new one has processed enough real traffic to earn some trust, you start sending actual user traffic to it. If something blows up, you roll back. If it behaves as expected, you push more traffic. Eventually the old system gets to ride off into the sunset.
This workflow is typically known as shadow testing or tap and compare testing.
Say we have a Python service with a handful of read and write endpoints the business depends on. It’s been around for a while, and different teams have patched it over the years. Some of the logic does what it does for reasons nobody remembers anymore. It still works, but it’s getting harder to maintain. Also, the business wants a tighter SLO. So the team decides to rewrite it in Go.
To keep the scope tight, I’m only talking about HTTP read and write endpoints on the main request path. The same applies to gRPC, minus the transport details. I’m ignoring everything else: message queues, background workers, async job processing, analytics pipelines, and other side channels that also need migrating.
During shadow testing, the Python service stays on the main request path. All real user traffic still goes to the Python service. A proxy or load balancer sitting in front of it forwards requests as usual, gets an answer back, and returns that answer to the user.
That same proxy also emits tap events. Each tap event contains a copy of the request and the canonical response the Python service sent to the user. Those tap events go to the Go service on a shadow path. From the outside world, nothing has changed. Clients talk to Python, and Python talks to the live production database.
The Go service never serves real users during this phase. It only sees tap events. For each event, it reconstructs the request, runs its version of the logic against a separate datastore, and compares its outputs with the Python response recorded in the event. The Python response is always the source of truth.
The Go service has its own datastore, usually a snapshot or replica of production that’s been detached so it can be written freely. This is the sister datastore. The Go service only talks to it for reads and writes. It never touches the real production DB. The sister datastore is close enough to show real-world behavior but isolated enough that nothing breaks.
With this setup in place, you spend time fixing differences. If the Python service returns a specific payload shape or some quirky value, the Go service has to match it. If Python gets a bug fix or a new feature, you update Go. You keep doing this until shadow traffic stops producing mismatches. Then you start thinking about cutover.
Reads don’t change anything in the database, so they are easier to start with.
On the main path, a user sends a request. The proxy forwards it to the Python service as usual. The Python service reads from the real database, builds a response, and returns it to the caller.
While that is happening, the proxy also constructs a tap event. At minimum, this event contains a copy of the original request (method, URL, headers, body) and the canonical response (status, headers, body) the Python service returned to the user.
The proxy sends this tap event to the Go service via an internal HTTP or RPC endpoint. Alternatively, it can publish the event to a Kafka stream, where a consumer eventually forwards it to the internal tap endpoint.
The important thing is that the tap event captures the exact input and output of the Python service as seen by the real user.
A typical read path during tap compare: the user talks to the proxy, the proxy forwards the request to the Python service, Python reads the production database and answers the user, and a tap event carrying that request and response flows to the Go service, which reads its sister datastore.
From the Go service’s point of view, a tap event is just structured data. A simple shape might look like this on the wire:
{
"request": {
"method": "GET",
"url": "/users/123?verbose=true",
"headers": { "...": ["..."] },
"body": "..."
},
"python_response": {
"status": 200,
"headers": { "...": ["..."] },
"body": "{ \"id\": \"123\", \"name\": \"Alice\" }"
}
}
The Go side reconstructs the request, runs its own logic against the sister datastore, and
compares its answer with python_response. No extra call back into Python. No race between
a second read and the response that already went to the user.
On the Go side, a handler for a read tap event might look like this:
type TapRequest struct {
Method string `json:"method"`
URL string `json:"url"`
Headers map[string][]string `json:"headers"`
Body []byte `json:"body"`
}
type TapResponse struct {
Status int `json:"status"`
Headers map[string][]string `json:"headers"`
Body []byte `json:"body"`
}
type TapEvent struct {
Request TapRequest `json:"request"`
PythonResponse TapResponse `json:"python_response"`
}
func TapHandleGetUser(w http.ResponseWriter, r *http.Request) {
// This endpoint is internal only.
// It receives tap events from the proxy, not real user traffic.
var tap TapEvent
if err := json.NewDecoder(r.Body).Decode(&tap); err != nil {
http.Error(w, "bad tap payload", http.StatusBadRequest)
return
}
// Rebuild something close to the original HTTP request.
reqURL, err := url.Parse(tap.Request.URL)
if err != nil {
http.Error(w, "bad url", http.StatusBadRequest)
return
}
// Body is a one-shot stream, so buffer it for reuse.
bodyBytes := append([]byte(nil), tap.Request.Body...)
goReq := &http.Request{
Method: tap.Request.Method,
URL: reqURL,
Header: http.Header(tap.Request.Headers),
Body: io.NopCloser(bytes.NewReader(bodyBytes)),
}
// Go service: run candidate logic against sister datastore.
goResp, goErr := goUserService.GetUser(r.Context(), goReq)
if goErr != nil {
log.Printf("go candidate error: %v", goErr)
}
// Normalize and compare off the main response path.
// The real user already got python_response.
go func() {
normalizedPython := normalizeHTTP(tap.PythonResponse)
normalizedGo := normalizeHTTP(goResp)
if !deepEqual(normalizedPython, normalizedGo) {
log.Printf(
"read mismatch: url=%s python=%v go=%v",
tap.Request.URL,
normalizedPython,
normalizedGo,
)
}
}()
// Optional debugging response for whoever is calling the tap
// endpoint.
w.WriteHeader(http.StatusNoContent)
}
A few things to notice: the endpoint is internal-only and never sees real user traffic, the comparison runs asynchronously off the response path, the Python response recorded in the tap event is always the source of truth, and the handler answers 204 because nobody depends on its output.
When the read diffs drop to zero (or near zero) against live traffic, you can trust the Go implementation matches the Python one.
Write endpoints change state, so they are harder to migrate.
On the main path, only the Python service is allowed to mutate production state.
A typical write on the main path: the user sends the request, the proxy forwards it to the Python service, Python validates it and mutates the production database, and the response goes back to the user.
That path is the only one touching production. The Go service must not write to the live production database or trigger real external side effects; its writes go exclusively to the sister datastore.
For writes, the tap event pushed by the proxy looks quite similar to reads:
{
"request": {
"method": "POST",
"url": "/users",
"headers": { "...": ["..."] },
"body": "{ \"email\": \"[email protected]\", \"name\": \"Alice\" }"
},
"python_response": {
"status": 201,
"headers": { "...": ["..."] },
"body": "{ \"id\": \"123\", \"email\": \"[email protected]\" }"
}
}
The write path during tap compare mirrors the read path: Python handles the real mutation and answers the user, while the tap event drives a shadow write against the sister datastore.
On the Go side, the write tap handler follows the same pattern as reads but has more corner cases to think through.
A shadow write handler might look like this:
type UserInput struct {
Email string `json:"email"`
Name string `json:"name"`
// ... other fields
}
type User struct {
ID string `json:"id"`
Email string `json:"email"`
Name string `json:"name"`
CreatedAt time.Time `json:"created_at"`
// ... other fields
}
func TapHandleCreateUser(w http.ResponseWriter, r *http.Request) {
// Internal only. Receives tap events for CreateUser.
var tap TapEvent
if err := json.NewDecoder(r.Body).Decode(&tap); err != nil {
http.Error(w, "bad tap payload", http.StatusBadRequest)
return
}
// Decode the original request body once.
var input UserInput
if err := json.Unmarshal(tap.Request.Body, &input); err != nil {
log.Printf("bad original json: %v", err)
return
}
// The Python response is canonical: this is what the real user saw.
pyUser, err := decodePythonUser(tap.PythonResponse)
if err != nil {
log.Printf("bad python response: %v", err)
return
}
// Run the Go write path against the sister datastore.
// This must never talk to the live production DB.
goUser, goErr := goUserService.CreateUserInSisterStore(
r.Context(), input,
)
if goErr != nil {
log.Printf("go candidate write error: %v", goErr)
}
// Compare results asynchronously.
go func() {
normalizedPython := normalizeUser(pyUser)
normalizedGo := normalizeUser(goUser)
if !compareUsers(normalizedPython, normalizedGo) {
log.Printf(
"write mismatch: email=%s python=%v go=%v",
normalizedPython.Email,
normalizedPython,
normalizedGo,
)
}
}()
w.WriteHeader(http.StatusNoContent)
}
You are comparing how each system transforms the same request into a domain object and response. You are not trying to drive the Python service a second time. You are not trying to rebuild the Python result from scratch against changed state.
But with this setup, the write path has several corner cases to think through.
Uniqueness checks, conditional updates, and other validations that depend on database state are sensitive to timing. The Python write runs against the actual production state at the moment the main request hits. The Go write runs against whatever state exists in the sister datastore when the tap event arrives.
If the sister datastore is a snapshot that is not continuously replicated, it will drift almost immediately. Even with streaming replication, there may be short lags. That means the state the Go service writes against can differ from the state Python saw, so uniqueness checks and conditional updates may legitimately disagree for the same request.
You should expect some write comparisons to be noisy because of state drift and treat those separately from genuine logic differences, rather than treating every diff as a bug.
The important thing is: when you see a mismatch, you can decide whether it is a real logic difference or just the sister store living in a slightly different universe for that request.
Real systems don’t get one clean write per user action. You get retries, duplicates, and concurrent updates.
On the main path, you might have a client retrying a timed-out request, a duplicate submission from a double click, or two concurrent updates racing on the same record.
Your Python service probably already has a story for this, such as idempotency keys, version checks, or last-write-wins semantics. The tap path needs to reflect what actually happened, not an idealized story.
Because the tap event is constructed from the real request and real response at the proxy, it naturally honors whatever the Python service did. If a retry was coalesced into a single write under an idempotency key, you will see a single successful response in the tap stream. If the second retry was rejected as a conflict, you will see that error. The Go service just needs to implement the same semantics against the sister datastore.
What still bites you is ordering. Tap events may arrive at the Go service a little out of order relative to how mutations hit production. If two writes race, Python might process them in order A, B while the tap messages arrive as B, A. The sister datastore will then experience a different sequence of state changes than production did, which can yield legitimate differences in final state.
You can’t fully eliminate this. What you can do is implement the same idempotency and conflict-resolution semantics on both sides (so replayed calls to CreateUser behave the same), and focus comparisons on per-request behavior rather than on multi-request history until you are comfortable with the noise.
Writes often have external side effects: emails, payment gateways, cache invalidations, search indexing, analytics.
The tap path isolates database writes by using the sister datastore, but that is not enough on its own. You have to run the Go service in a mode where those side-effectful calls are either disabled or mocked.
The usual pattern is to run the Go service in a shadow mode where the side-effectful clients are replaced with no-op or recording fakes.
You want the code paths that decide “should we send a welcome email” or “should we charge this card” to run, because they influence the domain model and response shape. You don’t want the actual email to go out or the real payment provider to be hit twice.
On the Python side, you don’t need dry runs or special write endpoints. The real main path already did the work, and the tap event gives you the results. The only thing the Python service might need for tap compare is a dedicated read endpoint that returns a normalized view of state if you want to sample post-write state directly. That read endpoint must not cause extra writes or side effects.
Tap compare tells you whether the new system produces the same responses and state changes as the old one, for the real traffic and real state you actually observed.
It doesn’t guarantee correctness for inputs you never saw, behavior under full production load, or parity once the two datastores have drifted apart.
The right way to think about it is: tap compare lets you align the new system with the old one for the traffic you actually have, under the state and timing conditions you actually experienced. It shrinks the unknowns before you put the new system in front of real users.
The Tap* handlers are test-only. They will never be promoted to production. They exist to
validate the domain logic, not to serve users. The 204 No Content response makes this
clear.
Here’s how the pieces fit together:
The domain logic lives in methods on goUserService that take a context and input and return a response. This is the code you’re actually testing.
Both tap and production handlers call the same domain logic. The difference is what happens to the result. Tap handlers compare and throw away. Production handlers serialize and return.
A production handler might look like this:
func HandleGetUser(w http.ResponseWriter, r *http.Request) {
resp, err := goUserService.GetUser(r.Context(), r)
if err != nil {
writeError(w, err)
return
}
writeHTTP(w, resp)
}
During tap compare, TapHandleGetUser feeds the same inputs into goUserService.GetUser
and compares resp against the Python response. Meanwhile, HandleGetUser exists but isn’t
on the main path yet. It might serve staging traffic or a canary behind a flag.
Once the diffs drop to zero, you have evidence goUserService.GetUser works correctly. At
that point, you route real traffic to HandleGetUser. The domain logic has already been
validated. The production handler just wires it to HTTP.
Once the production handlers have started to serve real traffic, you can remove the tap tests. The Tap* prefix makes them easy to find, and what remains are the plain production handlers like HandleGetUser and HandleCreateUser.
A few things worth calling out beyond what the write section already covers: generated values like timestamps, IDs, and numeric formatting will differ between the two implementations, and a difference like 10.0 vs 10 doesn’t matter. Normalize or ignore these fields.
Typically, you don’t have to build all the plumbing by hand. Proxies like Envoy, NGINX, and HAProxy, or a service mesh like Istio, can help you mirror traffic, capture tap events, and feed them into a shadow service. I left out tool-specific workflows so that the core concept doesn’t get obscured.
Tap compare doesn’t remove all the risk from a migration, but it moves a lot of it into a
place you can see: mismatched payloads, noisy writes, and gaps in business logic. Once those
are understood, switching over to the new service is less of a big bang and more of a boring
configuration change, followed by trimming a pile of Tap* code you no longer need.
A man with a watch knows what time it is. A man with two watches is never sure.
Take this example:
func validate(input string) (bool, error) {
// Validation check 1
if input == "" {
return false, nil
}
// Validation check 2
if isCorrupted(input) {
return false, nil
}
// System check
if err := checkUpstream(); err != nil {
return false, err
}
return true, nil
}
This function returns two signals: a boolean to indicate if the string is valid, and an error to explain any problem the function might run into.
The issue is that these two signals are independent. Put together, they produce four possible combinations:
- true, nil: The input is valid and the function encountered no issues. This is the only obvious mode.
- false, nil: Implies the function didn’t hit a system error but the input was invalid. However, in many codebases, this combination is accidentally used to hide real errors that were swallowed.
- true, err: A contradiction. The function claims success and failure at the same time.
- false, err: Looks like a clean failure, but it creates a priority trap. The Go convention dictates you must check the error first. If a caller checks the boolean first, they might see false and treat a major system crash as a simple validation failure.

In this specific case, we never return true, err, but the caller doesn’t know that. They have to read the code to understand which subset of the possible combinations the function actually uses.
For lack of a better term, I call this splintered failure modes. It is one of the cases that the adage make illegal state unrepresentable aims to prevent.
In our case, validate encodes the success/failure state in two places. These two signals
can disagree. The boolean tries to express validity, and the error tries to express system
failure, yet both attempt to answer the same question: did this succeed?
When combinations like false, nil or true, err appear, the caller needs to know how to
reconcile the conflicting states.
We fix the ambiguity by removing the boolean status flag entirely.
In this refactored version, the error assumes total responsibility for the function’s
state (success vs. failure). The first return value becomes purely the payload.
The caller checks one place and one place only: the error.
// We return the data (string), not a flag (bool)
func validate(input string) (string, error) {
if input == "" {
return "", fmt.Errorf("input cannot be empty")
}
if isCorrupted(input) {
return "", fmt.Errorf("input is corrupted")
}
if err := checkUpstream(); err != nil {
return "", err
}
// If we are here, the data is valid
return input, nil
}
This makes the call site trivial because the state is no longer split. If the error is non-nil, the operation failed. If it is nil, the operation succeeded.
Sometimes the caller of a function needs to take different actions depending on the type of an error. In that case, just knowing whether a function succeeded or failed isn’t enough.
Removing the boolean removes the ambiguity, but it introduces a new question: How do we distinguish between “validation error” and “system failure”?
Previously, the boolean represented validation outcome (valid/invalid), and the error
represented the system failures (crash/upstream). Now that we have consolidated everything
into error, we need a way to differentiate the kind of failure without re-introducing a
second return value.
We can use sentinel errors to encode multiple failure modes into one error variable. The
error return value remains the single source of truth for “did it fail?”, but the
content of that error tells us “how it failed.”
var (
// Domain/Logic failures
ErrEmpty = errors.New("input cannot be empty")
ErrCorrupted = errors.New("input is corrupted")
// System/Mechanical failures
ErrSystem = errors.New("system failure")
)
func validate(input string) (string, error) {
if input == "" {
return "", ErrEmpty
}
if isCorrupted(input) {
return "", ErrCorrupted
}
if err := checkUpstream(); err != nil {
// We could return err directly, or wrap it
return "", ErrSystem
}
return input, nil
}
We have unified the failure state (it is always just an error), but we haven’t lost the
granularity. The caller can now use errors.Is to switch between the failure modes:
val, err := validate(userData)
if err != nil {
switch {
case errors.Is(err, ErrEmpty):
// Handle logic failure 1 (e.g. prompt user)
return
case errors.Is(err, ErrCorrupted):
// Handle logic failure 2 (e.g. reject payload)
return
case errors.Is(err, ErrSystem):
// Handle system failure (e.g. alert ops team)
log.Fatal(err)
default:
log.Fatal(err)
}
}
If sentinels aren’t enough (for example, if you need to know which field failed
validation), you can use error types. This allows the single error value to carry
structured metadata while still adhering to the standard error interface.
Here, we map both “Empty” and “Corrupted” to a ValidationError type, while leaving system
errors as standard errors.
type ValidationError struct {
Field string
Reason string
}
func (e *ValidationError) Error() string {
return fmt.Sprintf("invalid %s: %s", e.Field, e.Reason)
}
func validate(input string) (string, error) {
if input == "" {
return "", &ValidationError{Field: "input", Reason: "empty"}
}
if isCorrupted(input) {
return "", &ValidationError{Field: "input", Reason: "corrupted"}
}
if err := checkUpstream(); err != nil {
return "", err
}
return input, nil
}
The caller can then use errors.As to inspect the failure mode in detail:
val, err := validate(userData)
if err != nil {
var vErr *ValidationError
// Check if the error is a logical ValidationError
if errors.As(err, &vErr) {
fmt.Printf("Validation failed on %s: %s", vErr.Field, vErr.Reason)
return
}
// If not, it is a system failure
log.Fatal(err)
}
By sticking to the error value as the single indicator of failure, we eliminate the “two
watches” paradox. Whether the failure is a simple validation error or a catastrophic system
crash, all the failure modes are encapsulated inside the single error value itself.
Run the real command. It invokes the actual binary that creates the subprocess and asserts against the output. However, that makes tests slow and tied to the environment. You have to make sure the same binary exists and behaves the same everywhere, which is harder than it sounds.
Fake it. Mock the subprocess to keep tests fast and isolated. The problem is that the fake version doesn’t behave like a real process. It won’t fail, write to stderr, or exit with a non-zero code. That makes it hard to trust the result, and over time the mock can drift away from what the real command actually does.
Re-exec. I discovered this neat trick while watching Mitchell Hashimoto’s Advanced Testing with Go talk. In fact, it originated in the stdlib os/exec test suite. With re-exec, your test binary spawns a new subprocess that runs itself again. Inside that subprocess, the code emulates the behavior of the real command. The parent process then interacts with this subprocess exactly as it would with a real command. In short:
This setup makes re-exec a middle ground between mocking and invoking the actual subprocess.
The first two paths are well-trodden, so let’s look closer at the third one. Here’s how it works: the test binary re-executes itself with a -test.run flag that selects a single helper test, and an environment variable tells that helper to emulate the target command instead of running the normal test suite.
You still get a real subprocess, but the behavior of your original binary invocation is emulated inside it. So you don’t invoke the original command. Observe:
// /cmd/echo/main.go
package main
import (
"os/exec"
)
// RunEcho executes the system "echo" command with the provided message
// and returns the command's output.
func RunEcho(msg string) (string, error) {
cmd := exec.Command("echo", msg)
out, err := cmd.Output()
return string(out), err
}
RunEcho invokes the system’s echo binary with some argument and returns the output. Now
let’s test it using the re-exec trick:
// /cmd/echo/main_test.go
package main
import (
"fmt"
"os"
"os/exec"
"testing"
)
// TestEchoHelper runs when the binary is re-executed with
// GO_WANT_HELPER_PROCESS=1. It prints its argument and exits,
// emulating "echo".
func TestEchoHelper(t *testing.T) {
if os.Getenv("GO_WANT_HELPER_PROCESS") != "1" {
return
}
fmt.Print(os.Args[len(os.Args)-1])
os.Exit(0)
}
func TestRunEcho(t *testing.T) {
// Spawn the same test binary as a subprocess instead of calling the
// real "echo". This runs only the TestEchoHelper test in a subprocess
// which emulates the behavior of "echo"
cmd := exec.Command(
os.Args[0],
"-test.run=TestEchoHelper",
"--",
"hello",
)
cmd.Env = append(os.Environ(), "GO_WANT_HELPER_PROCESS=1")
out, err := cmd.Output()
if err != nil {
t.Fatal(err)
}
if string(out) != "hello" {
t.Fatalf("got %q, want %q", out, "hello")
}
}
TestRunEcho creates a command that re-runs the same test binary (os.Args[0]) as a
subprocess via the exec.Command. The -test.run=TestEchoHelper flag tells Go’s test
runner to execute only the TestEchoHelper function inside that new process. The "--"
marks the end of the test runner’s own flags, and everything after it ("hello") becomes an
argument available to the helper process in os.Args.
When this subprocess starts, it sees that the environment variable
GO_WANT_HELPER_PROCESS=1 is set. That tells it to behave like a helper instead of running
the full test suite. The TestEchoHelper function then prints its last argument ("hello")
to standard output and exits. In other words, we’re emulating echo inside
TestEchoHelper. This part is intentionally kept simple, but you can do all kinds of things
here to emulate the actual echo command. In real tests, this will also include different
failure modes.
From the parent process’s perspective, it looks just like running /bin/echo hello, except
everything is happening within the Go test binary. The subprocess is real, but its behavior
is entirely controlled by the test.
You might find it strange that the actual RunEcho function isn’t called anywhere. That’s
on purpose. The goal of this example is not to test production logic, but to show how to
emulate and control subprocesses inside a test environment. The production function here
doesn’t contain any logic beyond calling exec.Command, so there’s nothing meaningful to
verify yet.
In real code, typically, you’d split subprocess management into two parts: one that spawns the process and another that handles its output and errors. The handler is where the bulk of your logic should live. This way, the subprocess handling code can be tested in isolation without tying it to a real subprocess.
Consider this example where the production code invokes the git switch mybranch command.
The RunGitSwitch command calls the git binary with the appropriate arguments and passes
the *exec.Cmd pointer to the handleGitSwitch function. This handler function has the
bulk of the logic that interacts with the git subprocess.
// path: /cmd/git/main.go
package main
import (
"os/exec"
)
// handleGitSwitch runs a command and returns its output and error.
func handleGitSwitch(cmd *exec.Cmd) (string, error) {
out, err := cmd.CombinedOutput()
return string(out), err
}
// RunGitSwitch constructs the subprocess to run "git switch".
func RunGitSwitch(branch string) (string, error) {
cmd := exec.Command("git", "switch", branch)
return handleGitSwitch(cmd)
}
And the corresponding test:
// path: /cmd/git/main_test.go
package main
import (
"fmt"
"os"
"os/exec"
"testing"
)
// TestGitSwitchHelper acts as the fake "git switch" subprocess.
func TestGitSwitchHelper(t *testing.T) {
if os.Getenv("GO_WANT_HELPER_PROCESS") != "1" {
return
}
// Emulate "git switch" output.
fmt.Printf("Switched to branch '%s'\n", os.Args[len(os.Args)-1])
os.Exit(0)
}
func TestGitSwitch(t *testing.T) {
cmd := exec.Command(
os.Args[0],
"-test.run=TestGitSwitchHelper", "--", "feature-branch",
)
cmd.Env = append(os.Environ(), "GO_WANT_HELPER_PROCESS=1")
// This time we're invoking the production handler.
out, err := handleGitSwitch(cmd)
if err != nil {
t.Fatal(err)
}
want := "Switched to branch 'feature-branch'\n"
if out != want {
t.Fatalf("got %q, want %q", out, want)
}
}
In this test, the subprocess behavior (git switch) is emulated by TestGitSwitchHelper.
The helper prints predictable output that mimics the real command, but the subprocess itself
is still a separate process spawned by the parent test.
What’s under test here is handleGitSwitch, which manages subprocess execution, reads its
output, and handles errors. The subprocess is fake in behavior but real in execution, which
means the I/O boundaries are still exercised.
This separation between subprocess creation and handling keeps tests focused and repeatable. You can emulate different subprocess outcomes, such as errors or unexpected output, while keeping the process interaction logic untouched.
Still, I’ve found that principles like SOLID, despite their OO origin, can be useful guides when thinking about design in Go.
Recently, while chatting with a few colleagues new to Go, I noticed that some of them had spontaneously rediscovered the Interface Segregation Principle (the “I” in SOLID) without even realizing it. The benefits were obvious, but without a shared vocabulary, it was harder to talk about and generalize the idea.
So I wanted to revisit ISP in the context of Go and show how small interfaces, implicit implementation, and consumer-defined contracts make interface segregation feel natural and lead to code that’s easier to test and maintain.
Clients should not be forced to depend on methods they do not use.
– Robert C. Martin (SOLID, interface segregation principle)
Or, put simply: your code shouldn’t accept anything it doesn’t use.
Consider this example:
type FileStorage struct{}
func (FileStorage) Save(data []byte) error {
fmt.Println("Saving data to disk...")
return nil
}
func (FileStorage) Load(id string) ([]byte, error) {
fmt.Println("Loading data from disk...")
return []byte("data"), nil
}
FileStorage has two methods: Save and Load. Now suppose you write a function that only
needs to save data:
func Backup(fs FileStorage, data []byte) error {
return fs.Save(data)
}
This works, but there are a few problems hiding here.
Backup takes a FileStorage directly, so it only works with that type. If you later want
to back up to memory, a network location, or an encrypted store, you’ll need to rewrite the
function. Because it depends on a concrete type, your tests have to use FileStorage too,
which might involve disk I/O or other side effects you don’t want in unit tests. And from
the function signature, it’s not obvious what part of FileStorage the function actually
uses.
Instead of depending on a specific type, we can depend on an abstraction. In Go, you can achieve that through an interface. So let’s define one:
type Storage interface {
Save(data []byte) error
Load(id string) ([]byte, error)
}
Now Backup can take a Storage instead:
func Backup(store Storage, data []byte) error {
return store.Save(data)
}
Backup now depends on behavior, not implementation. You can plug in anything that
satisfies Storage: something that writes to disk, to memory, or even to a remote service.
And FileStorage still works without any change.
You can also test it with a fake:
type FakeStorage struct{}
func (FakeStorage) Save(data []byte) error { return nil }
func (FakeStorage) Load(id string) ([]byte, error) { return nil, nil }
func TestBackup(t *testing.T) {
fake := FakeStorage{}
err := Backup(fake, []byte("test-data"))
if err != nil {
t.Fatal(err)
}
}
That’s a step forward. It fixes the coupling issue and makes the tests free of side effects.
However, there’s still one issue: Backup only calls Save, yet the Storage interface
includes both Save and Load. If Storage later gains more methods, every fake must grow
too, even if those methods aren’t used. That’s exactly what the ISP warns against.
The above interface is too broad. So let’s narrow it to match what the function actually needs:
type Saver interface {
Save(data []byte) error
}
Then update the function:
func Backup(s Saver, data []byte) error {
return s.Save(data)
}
Now the intent is clear. Backup only depends on Save. A test double can just implement
that one method:
type FakeSaver struct{}
func (FakeSaver) Save(data []byte) error { return nil }
func TestBackup(t *testing.T) {
fake := FakeSaver{}
err := Backup(fake, []byte("test-data"))
if err != nil {
t.Fatal(err)
}
}
The original FileStorage still works fine:
fs := FileStorage{}
_ = Backup(fs, []byte("backup-data"))
Go’s implicit interface satisfaction makes this less ceremonious. Any type with a Save
method automatically satisfies Saver.
This pattern reflects a broader Go convention: define small interfaces on the consumer side, close to the code that uses them. The consumer knows what subset of behavior it needs and can define a minimal contract for it. If you define the interface on the producer side instead, every consumer is forced to depend on that definition. A single change to the producer’s interface can ripple across your codebase unnecessarily.
From Go code review comments:
Go interfaces generally belong in the package that uses values of the interface type, not the package that implements those values. The implementing package should return concrete (usually pointer or struct) types: that way, new methods can be added to implementations without requiring extensive refactoring.
This isn’t a strict rule. The standard library defines producer-side interfaces like
io.Reader and io.Writer, which is fine because they’re stable and general-purpose. But
for application code, interfaces usually exist in only two places: production code and
tests. Keeping them near the consumer reduces coupling between multiple packages and keeps
the code easier to evolve.
You’ll see this same idea pop up all the time. Take the AWS SDK, for example. It’s tempting to define a big S3 client interface and use it everywhere:
type S3Client interface {
PutObject(
ctx context.Context,
input *s3.PutObjectInput,
opts ...func(*s3.Options)) (*s3.PutObjectOutput, error)
GetObject(
ctx context.Context,
input *s3.GetObjectInput,
opts ...func(*s3.Options)) (*s3.GetObjectOutput, error)
ListObjectsV2(
ctx context.Context,
input *s3.ListObjectsV2Input,
opts ...func(*s3.Options)) (*s3.ListObjectsV2Output, error)
// ...and many more
}
Depending on such a large interface couples your code to far more than it uses. Any change or addition to this interface can ripple through your code and tests for no good reason.
For example, if your code uploads files, it only needs the PutObject method:
func UploadReport(
ctx context.Context, client S3Client, data []byte,
) error {
_, err := client.PutObject(
ctx,
&s3.PutObjectInput{
Bucket: aws.String("reports"),
Key: aws.String("daily.csv"),
Body: bytes.NewReader(data),
},
)
return err
}
But accepting the full S3Client here ties UploadReport to an interface that’s too broad.
A fake must implement all the methods just to satisfy it.
It’s better to define a small, consumer-side interface that captures only the operations you need. This is exactly what the AWS SDK doc recommends for testing.
To support mocking, use Go interfaces instead of concrete service client, paginators, and waiter types, such as s3.Client. This allows your application to use patterns like dependency injection to test your application logic.
Similar to what we’ve seen before, you can define a single method interface:
type Uploader interface {
PutObject(
ctx context.Context,
input *s3.PutObjectInput,
opts ...func(*s3.Options)) (*s3.PutObjectOutput, error)
}
And then use it in the function:
func UploadReport(ctx context.Context, u Uploader, data []byte) error {
_, err := u.PutObject(
ctx,
&s3.PutObjectInput{
Bucket: aws.String("reports"),
Key: aws.String("daily.csv"),
Body: bytes.NewReader(data),
},
)
return err
}
The intent is obvious: this function uploads data and depends only on PutObject. The fake
for tests is now tiny:
type FakeUploader struct{}
func (FakeUploader) PutObject(
_ context.Context,
_ *s3.PutObjectInput,
_ ...func(*s3.Options)) (*s3.PutObjectOutput, error) {
return &s3.PutObjectOutput{}, nil
}
If we distill the workflow as a general rule of thumb, it’d look like this:
Insert a seam between two tightly coupled components by placing a consumer-side interface that exposes only the methods the caller invokes.
Fin!
Besides cancellation and deadlines, the context package can also
carry request-scoped values across API boundaries and processes.
There are only two public API constructs associated with context values:
func WithValue(parent Context, key, val any) Context
func (c Context) Value(key any) any
WithValue can take any comparable value as both the key and the value. The key defines how
the stored value is identified, and the value can be any data you want to pass through the
call chain.
Value, on the other hand, also returns any, which means the compiler cannot infer the
concrete type at compile time. To use the returned data safely, you must perform a type
assertion.
A naive workflow to store and retrieve values in a context looks like this:
ctx := context.Background()
// Store some value against a key
ctx = context.WithValue(ctx, "userID", 42)
// Retrieve the value
v := ctx.Value("userID")
// Value returns any, so you need a type assertion
id, ok := v.(int)
if !ok {
fmt.Println("unexpected type")
}
fmt.Println(id) // 42
WithValue returns a new context that wraps the parent. Value walks up the chain of
contexts and returns the first matching key it finds. Since the return type is any, a type
assertion is required to recover the original type. Without the ok check, a mismatch would
cause a panic.
The issue with this setup is that it risks collision. If another package sets a value against the same key, one overwrites the other:
package main
import (
"context"
"fmt"
)
func main() {
ctx := context.WithValue(context.Background(), "key", "from-main")
ctx = foo(ctx)
fmt.Println(ctx.Value("key")) // from-foo
}
func foo(ctx context.Context) context.Context {
// Accidentally reuse the same key in another package
return context.WithValue(ctx, "key", "from-foo")
}
The first value becomes inaccessible because WithValue returns a new derived context that
shadows parent values with the same key. The original value still exists in the parent
context but is unreachable through the reassigned variable.
To understand why this collision occurs, you need to know how Go compares interface values.
When you assign a value to an interface{} (or any), Go boxes that value into an internal
representation made up of two machine words: one points to the type information, and the
other points to the underlying data.
For example:
var a any = "key"
var b any = "key"
fmt.Println(a == b) // true
Each boxed interface here stores two things: a pointer to the type string and a pointer to
the data "key". Since both type and data pointers match, the comparison returns true.
WithValue stores both the key and the value as any. When you later call Value, Go
compares the boxed key you pass in with those stored in the context chain. If two different
packages use the same built-in key type and data, like both passing "key" as a string,
their boxed representations look identical. Go sees them as equal, and the most recent value
shadows the earlier one.
If you want to learn more about how interfaces are represented and compared, Russ Cox’s post on Go interface internals explains it in detail with pretty pictures.
The fix is to make sure the keys have unique types so their boxed representations differ. If you define a custom type, the type pointer changes even if the data looks the same. For example:
type userKey string
var a any = userKey("key")
var b any = "key"
fmt.Println(a == b) // false
Even though the underlying value is "key", the two interfaces now hold different type
information, so Go considers them unequal. That difference in type identity is what prevents
collisions.
The context documentation gives this advice:
The provided key must be comparable and should not be of type string or any other built-in type to avoid collisions between packages using context. Users of WithValue should define their own types for keys. To avoid allocating when assigning to an interface{}, context keys often have concrete type struct{}. Alternatively, exported context key variables' static type should be a pointer or interface.
In short:
- Keys must be comparable and should not be a built-in type (string, int, struct, pointer, etc.)
- Define your own unexported key types instead
- Prefer struct{} keys to avoid allocation when stored as any
- Give exported key variables a pointer or interface static type

Here’s how defining a unique key type prevents collisions:
type userIDKey string
// Store value
ctx := context.WithValue(context.Background(), userIDKey("id"), 42)
// Retrieve value
id := ctx.Value(userIDKey("id"))
fmt.Println(id) // 42
Even if another package uses the string "id", the key types differ, so they cannot
collide.
To avoid an allocation when WithValue boxes the key into an any, you can
define an empty struct key. Unlike strings or integers, which may allocate when boxed into an
interface, a zero-sized struct occupies no memory and needs no allocation:
type key struct{}
// Store value
ctx := context.WithValue(context.Background(), key{}, "value")
// Retrieve value
v := ctx.Value(key{})
fmt.Println(v) // value
Empty structs are ideal for local, unexported keys. They are unique by type and add no overhead.
Alternatively, exported keys can use pointers, which also avoid allocation and guarantee uniqueness. When a pointer is boxed into an interface, no data copy occurs because the interface just holds the pointer reference. Pointers are also ideal for keys that need to be shared across packages.
type userIDKey struct {
name string
}
// Struct pointer as key
var UserIDKey = &userIDKey{"user-id"}
// Store value. No allocation here since userIDKey is a pointer
// to a struct
ctx := context.WithValue(context.Background(), UserIDKey, 42)
// Retrieve value
id := ctx.Value(UserIDKey)
fmt.Println(id) // 42
Here, UserIDKey points to a unique struct instance, so equality checks work by pointer
identity. The name field exists only for debugging. This avoids allocation and ensures
exported keys remain unique even when shared between packages.
When exposing context values across APIs, you can approach it in two ways depending on how much control and safety you want to give your users.
You can export the key itself and let users interact with it freely:
type APIKey string
// Allow the other packages to directly use this key
var APIKeyContextKey = APIKey("api-key")
// Store value. An allocation will occur since the key is of type string
ctx := context.WithValue(context.Background(), APIKeyContextKey, "secret")
// Retrieve value
v := ctx.Value(APIKeyContextKey).(string) // caller must do this assertion
fmt.Println(v) // secret
When you export the key directly, the caller gains direct access, but they also must:
- use the correct key (and key type) when storing and retrieving values
- perform the type assertion themselves and handle a possible mismatch
The net/http package uses this approach for some of its exported context keys:
type contextKey struct {
name string
}
// Notice the exported keys
var (
ServerContextKey = &contextKey{"http-server"}
LocalAddrContextKey = &contextKey{"local-addr"}
)
Each variable points to a distinct struct, making them unique by pointer identity.
The serve_test.go file uses these keys like this:
ctx := context.WithValue(
context.Background(), http.ServerContextKey, srv,
)
// Type assertion to recover the concrete type
srv2, ok := ctx.Value(http.ServerContextKey).(*http.Server)
if ok {
fmt.Println(srv == srv2) // true
}
The server value is stored in the context and later retrieved using the same pointer key. The user must perform a type assertion and handle it safely.
The other approach is to hide the key and provide accessor functions to set and retrieve values. This removes the need for users to remember the right key type or perform type assertions manually.
// Define a private key type to avoid collisions
type contextKey struct {
name string
}
// Define the key
var userIDKey = &contextKey{"user-id"}
// Public accessor to store a value to ctx
func WithUserID(ctx context.Context, id int) context.Context {
// No allocation here since userIDKey is a pointer to a struct
return context.WithValue(ctx, userIDKey, id)
}
// Public accessor to fetch a value from ctx
func UserIDFromContext(ctx context.Context) (int, bool) {
v, ok := ctx.Value(userIDKey).(int)
return v, ok
}
// Store value
ctx := WithUserID(context.Background(), 42)
// Retrieve value
id, ok := UserIDFromContext(ctx)
if ok {
fmt.Println(id) // 42
} else {
fmt.Println("no user ID found in context")
}
This approach centralizes how values are stored and retrieved from the context. It ensures the correct key and type are always used, preventing collisions and runtime panics. It also keeps the calling code shorter since your API users won’t need to repeat type assertions everywhere.
WithX / XFromContext accessors appear throughout the Go standard library:
func WithClientTrace(
ctx context.Context, trace *ClientTrace,
) context.Context
func ContextClientTrace(ctx context.Context) *ClientTrace
func WithLabels(ctx context.Context, labels LabelSet) context.Context
func Labels(ctx context.Context) LabelSet
You can find similar examples outside of the stdlib. For instance, the OpenTelemetry Go SDK follows the same model:
func ContextWithSpan(parent context.Context, span Span) context.Context
func SpanFromContext(ctx context.Context) Span
This technique standardizes how values are passed across APIs, eliminates redundant type assertions, and prevents key misuse across packages.
I usually use a pointer to a struct as a key and expose accessor functions when building user-facing APIs. Otherwise, in services, I often define empty struct keys and expose them publicly to avoid the ceremony around accessor functions.
When it comes to organizing tests, Go’s testing library only gives you a few
options. I think that’s a great thing because there are fewer details to remember and fewer
things to onboard people to. However, during code reviews, I often see people contravene a
few common conventions around test organization, especially those who are new to the
language.
If we distill the most common questions that come up when organizing tests, they are:
- Where should test files live, and which package should they declare?
- When should tests share the package under test, and when should they use a separate _test package?
- Where do examples, benchmarks, and fuzz tests go?
- Where should integration and end-to-end tests live?
To answer these, let’s consider a simple test subject.
Let’s define a small app called myapp that contains a single package mypkg. It has a
Greet function that returns a greeting message as a string. We’ll use this throughout the
discussion and evolve the directory structure as needed.
myapp/
└── mypkg/
├── greet.go
└── greet_test.go
Here’s how greet.go looks:
// greet.go
package mypkg
func Greet(name string) string {
if name == "" {
return "Hello, stranger"
}
return "Hello, " + name
}
Most Go tests live next to the code they verify. These are called in-package tests, and they share the same package name as the code under test. This setup gives them access to unexported functions and variables, making them ideal for unit tests that target specific internal logic.
// greet_test.go
package mypkg // The test file lives under `mypkg`
import "testing"
func TestGreet(t *testing.T) {
got := Greet("Go") // The test can access mypkg deps without an import
want := "Hello, Go"
if got != want {
t.Fatalf("Greet() = %q, want %q", got, want)
}
}
The structure stays the same:
myapp/
└── mypkg/
├── greet.go # under package mypkg
└── greet_test.go # under package mypkg
These are your bread-and-butter unit tests. You can run them with go test ./..., and
they’ll have full access to unexported details in the package.
The Go documentation explains it as:
The test file can be in the same package as the one being tested. If the test file is in the same package, it may refer to unexported identifiers within the package.
This approach is called white-box testing. Your test code has full access to the package
internals, allowing you to test them directly when needed. For example, if there’s an
unexported function in greet.go, the test in greet_test.go can call it directly.
Following the test pyramid, most tests in your system should be written this way.
Sometimes you want to verify that your package behaves correctly from the outside. At this point, you’re not concerned with its internals and just want to confirm that the public API works as intended.
Go makes this possible by letting you write tests under a package name that ends with
_test. This creates a separate test package that lives alongside the package under test.
For example:
// greet_external_test.go
package mypkg_test // Note the package definition
import (
"testing"
"myapp/mypkg" // Explicitly import the SUT package
)
func TestGreetExternal(t *testing.T) {
got := mypkg.Greet("External")
want := "Hello, External"
if got != want {
t.Fatalf("unexpected output: got %q, want %q", got, want)
}
}
Your directory now includes both internal and external tests:
myapp/
└── mypkg/
├── greet.go # under package mypkg
├── greet_test.go # under package mypkg
└── greet_external_test.go # under package mypkg_test
In this setup, the mypkg directory can only contain the mypkg and mypkg_test packages.
The go tool recognizes the _test suffix and disallows any other package name in the same
directory.
A key detail is that the Go test harness doesn’t compile the tests of mypkg_test together
with those of mypkg. They are built as two separate packages: one containing the package
code and its in-package tests, and another containing the external tests, which links
against the compiled mypkg archive just like any other importing package and sees only its
exported identifiers. Both are then linked into a single test binary per directory. You can
find more about this process in the Go documentation on how tests are run.
This structure is particularly useful for validating public contracts and ensuring that refactors don’t break exported APIs.
As noted in the official testing package docs:
If the file is in a separate
_testpackage, the package being tested must be imported explicitly, and only its exported identifiers may be used. This is known as “black-box” testing.
It’s a neat way to test your package from the outside without moving your tests into a separate directory tree. You can find examples of this style in net/http, context, and errors.
Go’s testing tool treats examples, benchmarks, and fuzz tests as first-class test functions.
They use the same go test command as your regular unit tests and usually live in the same
package. This makes them part of the same discovery and execution process but with different
entry points.
Here’s how all three can coexist in the same package:
// greet_test.go
package mypkg // same package as the unit tests
import (
"fmt"
"testing"
)
// ... other unit tests
func ExampleGreet() {
fmt.Println(Greet("Alice"))
// Output: Hello, Alice
}
func BenchmarkGreet(b *testing.B) {
for b.Loop() {
Greet("Go")
}
}
func FuzzGreet(f *testing.F) {
f.Add("Bob")
f.Fuzz(func(t *testing.T, name string) {
Greet(name)
})
}
This setup doesn’t change your layout:
myapp/
└── mypkg/
├── greet.go # under package mypkg
└── greet_test.go # under package mypkg
If you prefer to separate these test types, you can move them into their own file while keeping them in the same package:
myapp/
└── mypkg/
├── greet.go # under package mypkg
├── greet_test.go # under package mypkg
└── greet_bench_fuzz_example_test.go # under package mypkg
In this layout, greet_bench_fuzz_example_test.go houses the benchmarks, fuzz tests, and
examples; note that the file name must end in _test.go for the go tool to treat it as a
test file. All files still declare the same package mypkg. These are regular unit tests
with specialized entry points. See how packages like encoding/json or html organize
their fuzz tests.
It’s not a strict rule to keep them in the same package. You can also put them in a _test
package. The sort package, for example, keeps its examples in sort_test.
As mentioned in the testing docs, benchmarks are discovered and executed with the -bench
flag, and fuzz tests with the -fuzz flag.
When your project grows into multiple packages, you’ll want to verify that everything works together, not just in isolation. That’s where integration and end-to-end tests come in. They typically live outside the package tree because they often span multiple packages or processes.
myapp/
├── mypkg/
│ ├── greet.go # under package mypkg
│ └── greet_test.go # under package mypkg
└── integration/
└── greet_integration_test.go # under package integration
Here’s what one might look like:
package integration
import (
"testing"
"myapp/mypkg" // Explicitly import the SUT pkg to use its deps
)
func TestGreetFlow(t *testing.T) {
got := mypkg.Greet("Integration")
want := "Hello, Integration"
if got != want {
t.Fatalf("unexpected output: got %q, want %q", got, want)
}
}
Integration tests import real packages and test their interactions. They can spin up servers, connect to databases, or coordinate subsystems. An integration test package is just like any other package: to use code from another package, it must import it explicitly.
You’ll see this pattern in kubernetes, which has a test directory with subpackages like
integration and e2e.
Having a top-level package for testing only makes sense if you’re testing multiple packages.
Otherwise, if you’re writing integration or functional tests for a single package, you can
still nest the tests under the SUT package. In this case, integration tests for mypkg can
be tucked away under mypkg/test.
The general rule of thumb is:
- Keep unit tests next to the code, in the same package.
- Put black-box tests in a _test package in the same directory.
- Keep examples, benchmarks, and fuzz tests alongside the unit tests, moving them to _test if needed.
- Put integration and end-to-end tests in their own top-level directory when they span multiple packages.

The following tree attempts to capture the full picture:
myapp/
├── mypkg/
│ ├── greet.go # mypkg - production code
│ ├── greet_test.go # mypkg - unit & white-box tests
│ ├── greet_external_test.go # mypkg_test - black-box tests
│   ├── greet_bench_fuzz_example_test.go # mypkg - examples, benchmarks, fuzz
└── integration/
└── greet_integration_test.go # integration or e2e tests
With t.Run, you can nest tests, assign names to cases, and let the runner execute work in
parallel by calling t.Parallel from subtests if needed.
For small suites, a flat set of t.Run calls is usually enough. That’s where I tend to
begin. As the suite grows, your setup and teardown requirements may demand subtest grouping.
There are multiple ways to handle that.
One option is to group subtests using nested t.Run. However, since t.Run supports
arbitrary nesting, it’s easy to create tests that are hard to read and reason about,
especially when each group has its own setup and teardown. When you add calls to
t.Parallel, it can also become unclear which groups of tests run sequentially and which
run in parallel.
This is all a bit hand wavy without examples. We’ll start with the simplest possible subtest grouping and work our way up. Coming up with examples that make the point while still fitting in a blog is tricky, so you’ll have to bear with my toy examples and use a bit of imagination.
Let’s say we’re writing tests for a calculator that, for the sake of argument, can only do addition and multiplication. Instead of going for table-driven tests, we’ll split the tests for addition and multiplication into two groups using subtests. The reason being, let’s say addition and multiplication need different kinds of setup and teardown for some reason.
I know I’m reaching, but bear with me. I’d rather make the point without dragging in mocks, databases, or testcontainers and getting lost in details. You’ll find a similar setup in real codebases wherever you talk to a database and the read and write paths have separate test lifecycles.
If we didn’t need different setup and teardown for the two groups, the simplest way to test a system would be through a set of table-driven tests:
func TestCalc(t *testing.T) {
// Common setup and teardown
tests := []struct {
name string
got int
want int
}{
{"1+1=2", 1 + 1, 2},
{"2+3=5", 2 + 3, 5},
{"2*2=4", 2 * 2, 4},
{"3*3=9", 3 * 3, 9},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if tt.got != tt.want {
t.Fatalf("got %d, want %d", tt.got, tt.want)
}
})
}
}
Running the tests returns:
--- PASS: TestCalc (0.00s)
--- PASS: TestCalc/1+1=2 (0.00s)
--- PASS: TestCalc/2+3=5 (0.00s)
--- PASS: TestCalc/2*2=4 (0.00s)
--- PASS: TestCalc/3*3=9 (0.00s)
PASS
Unrolling the tests would give you this. The following is equivalent to the above test suite:
func TestCalc(t *testing.T) {
// Common setup and teardown
// Addition
t.Run("1+1=2", func(t *testing.T) {
if 1+1 != 2 {
t.Fatal("want 2")
}
})
t.Run("2+3=5", func(t *testing.T) {
if 2+3 != 5 {
t.Fatal("want 5")
}
})
// Multiplication
t.Run("2*2=4", func(t *testing.T) {
if 2*2 != 4 {
t.Fatal("want 4")
}
})
t.Run("3*3=9", func(t *testing.T) {
if 3*3 != 9 {
t.Fatal("want 9")
}
})
}
Observe that all the subtests live at the same level. The names of the tests are the indicator of which function of the calculator they’re testing. But this obviously doesn’t allow us to have separate lifecycles for the addition and multiplication groups. There’s no grouping as of now.
Grouping with t.Run when lifecycle diverges
To allow different setup and teardown for addition and multiplication, we can introduce
grouping by nesting the subtests via t.Run. Notice:
func TestCalc(t *testing.T) {
// Common setup and teardown
t.Run("addition", func(t *testing.T) {
// addition-specific setup
defer func() {
// addition-specific teardown
}()
t.Run("1+1=2", func(t *testing.T) {
if 1+1 != 2 {
t.Fatal("want 2")
}
})
t.Run("2+3=5", func(t *testing.T) {
if 2+3 != 5 {
t.Fatal("want 5")
}
})
})
t.Run("multiplication", func(t *testing.T) {
// multiplication-specific setup
defer func() {
// multiplication-specific teardown
}()
t.Run("2*2=4", func(t *testing.T) {
if 2*2 != 4 {
t.Fatal("want 4")
}
})
t.Run("3*3=9", func(t *testing.T) {
if 3*3 != 9 {
t.Fatal("want 9")
}
})
})
}
In this case, you can run the common setup and teardown in the top-level test function and the groups can have their own lifecycle operations alongside. Introducing the group also allows us to name them properly and they show up when we run the tests:
--- PASS: TestCalc (0.00s)
--- PASS: TestCalc/addition (0.00s)
--- PASS: TestCalc/addition/1+1=2 (0.00s)
--- PASS: TestCalc/addition/2+3=5 (0.00s)
--- PASS: TestCalc/multiplication (0.00s)
--- PASS: TestCalc/multiplication/2*2=4 (0.00s)
--- PASS: TestCalc/multiplication/3*3=9 (0.00s)
PASS
From the output it’s clear which subtests belong to which group. This setup also allows you
to run the groups in parallel by calling t.Parallel in each group.
func TestCalc(t *testing.T) {
// Common setup and teardown
t.Run("addition", func(t *testing.T) {
t.Parallel()
})
t.Run("multiplication", func(t *testing.T) {
t.Parallel()
})
}
Starting with flat subtests and nesting them one extra level with t.Run should suffice in
the majority of cases. Readability of your tests usually starts hurting when you need to
introduce any additional nesting.
I almost always frown when I encounter more than two degrees of nesting in a test suite. On
top of that, if your overly nested subtests start calling t.Parallel then it’s quite
difficult to reason about the test execution flow. Plus, maintaining the lifecycles of the
nested subgroups can get out of hand pretty quickly.
But even when you’re grouping subtests with two degrees of nesting, if the individual test logic starts getting longer, that might start hurting readability. Named functions for the subtests can help here in most cases.
We can rewrite the subtest grouping example of the previous section by extracting subtests into two group-specific functions like this:
func TestCalc(t *testing.T) {
// Common setup and teardown
t.Run("addition", addgroup)
t.Run("multiplication", multgroup)
}
func addgroup(t *testing.T) {
// addition-specific setup
defer func() {
// addition-specific teardown
}()
t.Run("1+1=2", func(t *testing.T) {
if 1+1 != 2 {
t.Fatal("want 2")
}
})
t.Run("2+3=5", func(t *testing.T) {
if 2+3 != 5 {
t.Fatal("want 5")
}
})
}
func multgroup(t *testing.T) {
// multiplication-specific setup
defer func() {
// multiplication-specific teardown
}()
t.Run("2*2=4", func(t *testing.T) {
if 2*2 != 4 {
t.Fatal("want 4")
}
})
t.Run("3*3=9", func(t *testing.T) {
if 3*3 != 9 {
t.Fatal("want 9")
}
})
}
All we did here is extract the groups into their own functions. Other than that this test is
identical to the previous two-degree subtest grouping. You can call t.Parallel from the
subgroup functions:
func TestCalc(t *testing.T) {
// Common setup and teardown
// ...
}
func addgroup(t *testing.T) {
// Run the group in parallel
t.Parallel()
}
func multgroup(t *testing.T) {
// Run the group in parallel
t.Parallel()
}
Or you can bring the t.Parallel at the top-level test function:
func TestCalc(t *testing.T) {
// Common setup and teardown
t.Run("addition", func(t *testing.T) {
t.Parallel()
addgroup(t) // addgroup doesn't have t.Parallel
})
t.Run("multiplication", func(t *testing.T) {
t.Parallel()
multgroup(t) // multgroup doesn't have t.Parallel
})
}
That’s all there is to it. But some people don’t like the manual wiring that we needed to do
in the top-level TestCalc function. Also, in a larger codebase, you’ll need some
discipline to make sure the pattern is followed by others extending the code.
So often people want the subtest groups to be automatically discovered without having to manually wire them in the main test function. While I’m not a big fan of automagical group discovery, I got curious about it nonetheless. The grpc-go project has a group discovery function that does this.
If we were writing tests inside the grpc-go repository, we could lean on its small helper
package, internal/grpctest, which reflects over a value you pass in, discovers methods
whose names start with Test, and runs each of those as a subtest. Crucially, the helper
also runs setup before and teardown after each discovered test method, which gives you a
clear spot for per-group lifecycle work. The public surface is tiny: RunSubTests(t, x)
plus a default hook carrier Tester that you embed to get Setup and Teardown.
Here is our same calculator suite in that style, as if we were adding tests inside grpc-go:
// NOTE: This import path only works inside the grpc-go repo family.
// External modules cannot import google.golang.org/grpc/internal/*.
package calc
import (
"testing"
"google.golang.org/grpc/internal/grpctest"
)
// CalcSuite: embed grpctest.Tester so we get Setup and Teardown hooks.
// The runner will discover TestAddition and TestMultiplication below.
type CalcSuite struct{ grpctest.Tester }
// TestAddition is discovered because the name starts with "Test".
func (CalcSuite) TestAddition(t *testing.T) {
// addition-specific setup and teardown for this group
defer func() {
// tear down addition fixtures
}()
t.Run("1+1=2", func(t *testing.T) {
if 1+1 != 2 {
t.Fatal("want 2")
}
})
t.Run("2+3=5", func(t *testing.T) {
if 2+3 != 5 {
t.Fatal("want 5")
}
})
}
// A second discovered group.
func (CalcSuite) TestMultiplication(t *testing.T) {
// multiplication-specific setup and teardown for this group
defer func() {
// tear down multiplication fixtures
}()
t.Run("2*2=4", func(t *testing.T) {
if 2*2 != 4 {
t.Fatal("want 4")
}
})
t.Run("3*3=9", func(t *testing.T) {
// call t.Parallel() here if overlapping is safe
if 3*3 != 9 {
t.Fatal("want 9")
}
})
}
// Top-level entry that "go test" sees.
// RunSubTests reflects over CalcSuite,
// then runs Setup, the test method, then Teardown.
func TestCalc(t *testing.T) {
grpctest.RunSubTests(t, CalcSuite{})
}
Outside grpc-go you can’t import google.golang.org/grpc/internal/grpctest because it lives
under an internal/ path. Go’s visibility rule only allows packages within that module tree
to use it. If you want the subtest discoverer, there’s nothing stopping you from blatantly
copying the code. It’s only a few dozen lines and devoid of any dependencies other than the
leak checker. You can drop the file in your tests, remove the leak checker code if you don’t
need that, adjust the import paths, and start using RunSubTests. To avoid repetition, I’ll
leave that as an exercise for the reader.
Another thing to point out is that grpctest.RunSubTests doesn’t change the standard
scheduler; you still opt into concurrency with t.Parallel() where it is safe.
If you like automatic subgroup discovery but want something you can use outside grpc-go, two common options are testify’s suite and Bloomberg’s go-testgroup. Both let you organize tests into named groups and keep per-group setup/teardown close to the cases.
Testify models a suite as a struct with Test* methods and gives you s.Run for subtests
and assertion helpers.
package calc
import (
"testing"
"github.com/stretchr/testify/suite"
)
type CalcSuite struct{ suite.Suite }
func (s *CalcSuite) TestAddition() {
s.Run("1+1=2", func() { s.Equal(2, 1+1) })
s.Run("2+3=5", func() { s.Equal(5, 2+3) })
}
func (s *CalcSuite) TestMultiplication() {
s.Run("2*2=4", func() { s.Equal(4, 2*2) })
s.Run("3*3=9", func() { s.Equal(9, 3*3) })
}
func TestCalc(t *testing.T) {
suite.Run(t, new(CalcSuite))
}
One limitation is that the suite runner doesn’t support using t.Parallel to run the
suite methods (TestAddition, TestMultiplication) in parallel. Bloomberg’s go-testgroup
allows you to do that.
Bloomberg’s library also groups by methods, but passes a *testgroup.T and provides two
runners so you can choose serial or parallel execution at the group level.
package calc
import (
"testing"
"github.com/bloomberg/go-testgroup"
)
type CalcGroup struct{}
func (g *CalcGroup) Addition(t *testgroup.T) {
t.Run("1+1=2", func(t *testgroup.T) { t.Equal(2, 1+1) })
t.Run("2+3=5", func(t *testgroup.T) { t.Equal(5, 2+3) })
}
func (g *CalcGroup) Multiplication(t *testgroup.T) {
t.Run("2*2=4", func(t *testgroup.T) { t.Equal(4, 2*2) })
t.Run("3*3=9", func(t *testgroup.T) { t.Equal(9, 3*3) })
}
func TestCalcSerial(t *testing.T) {
testgroup.RunSerially(t, &CalcGroup{})
}
// Or run in parallel.
// Don't call t.Parallel inside methods
func TestCalcParallel(t *testing.T) {
testgroup.RunInParallel(t, &CalcGroup{})
}
RunInParallel handles group-level parallelism for you and documents not to mix in your own
t.Parallel inside those methods.
While there are multiple ways to organize subtest groups, I try to keep them flat for as
long as possible. When grouping becomes necessary, I gradually add a single extra level of
nesting with t.Run.
In larger tests, extracting groups into their own named functions improves readability and maintainability quite a bit. I almost never use reflection-based wiring because that’s one extra bit of code to carry around.
I also tend to eschew pulling in third-party test suites unless I am already working in a codebase that uses them. Tools like testify or go-testgroup require you to define a struct and attach tests to it. I prefer to keep tests as standalone functions. In addition, testing frameworks often develop into mini-languages of their own, which makes onboarding harder. Notice how different the APIs of testify suite and go-testgroup are despite doing pretty much the same thing.
In my experience, even in large codebases, a bit of discipline is usually enough to get by with manual subtest grouping.
While structure often influences architecture and vice versa, this distinction is important. This post is strictly about application structure and not library structure. Library structure is often driven by different design pressures than its app counterpart. There are a ton of canonical examples of good library structure in the stdlib, but it’s app structure where things get a bit more muddy.
At work, I not only write Go in a distributed system environment but also review potential candidates’ assignments in the hiring pipeline. While there is no objectively right or wrong way to structure an app, I do see a common pitfall in candidates’ submissions that is usually frowned upon in a Go application.
App structure should be driven by what it does and not what it’s built with. Let the domain guide the structure, not technology or the current language-specific zeitgeist.
Ben Johnson’s Standard Package Layout is a good reference for this. He points out why approaches like monolithic packages, Rails style layouts, or grouping by module don’t fit well in Go. Then he lays out a map where the root package holds domain types, dependencies are grouped in separate packages, and the main package wires everything together.
While Ben’s post is focused on what you should be doing, I want to keep this discussion a bit more open-ended and just talk about one bad pattern that you probably should avoid. The rest of the app structure is subjective and should be driven by requirements. Use your judgement.
The mistake I often see is people making a bunch of generically named packages like
models, controllers, handlers and stuffing everything there. App structure like the
following is quite common:
mystore/
├── controllers/
│ ├── order_controller.go
│ └── user_controller.go
├── models/
│ ├── order.go
│ └── user.go
├── handlers/
│ ├── http_handler.go
│ └── webhook_handler.go
└── main.go
In Go there’s no file-level separation, only package-level separation. That means everything
under models, like order and user, lives in the same namespace. The same is true for
controllers and handlers.
Once you put multiple business domains under a generic umbrella, you tie them together. This might make sense in a language like Python where file names are prefixed in the fully qualified import path. In Python you’d import them as follows:
# Identifiers live in the order namespace
from mystore.models import order
# Identifiers live in the http_handler namespace
from mystore.handlers import http_handler
But in Go the import path becomes this:
// Identifiers from order.go and user.go
// all live in the same namespace
import "mystore/models"
// Identifiers from http_handler.go & webhook_handler.go
// all live in the same namespace
import "mystore/handlers"
There is no file level delineation in Go. If you put different domains under the same
models directory, there is no indication at import time what domain a model belongs to.
The only clue is the identifier name. This isn’t ideal when you want clear separation
between domains.
In Go, packages define your bounded context, not files within a package. Domains should be delineated by top level packages, not by file names.
For your top level business logic, you want package level separation between domains. Order
logic should live in order, user logic should live in user. These packages will be
imported in many places throughout the app, and keeping them separate keeps dependencies
clear.
It could look like this:
mystore/
├── order/ <-- business logic related to the order domain
│ ├── order.go
│ └── service.go
├── user/ <-- business logic related to the user domain
│ ├── user.go
│ └── service.go
└── cmd/ <-- wire everything here
└── mystore/
└── main.go
Each domain owns its own logic and optional adapters. If you need to find order related
code, you go to order. If you need user code, you go to user. Nothing is smooshed
together under a generic bucket.
The details around how you layer your app can differ based on requirements, but the important point is that your top level directories shouldn’t just be generic buckets containing all domains. That makes navigation harder. A better approach is letting the domain guide the structure and only layering in technology when it matters.
You can place your transport concerns alongside the top level packages. A top level http
package can hold handlers that import service functions from the domain packages. You can
put all handlers under http or split them into http/order and http/user. Both are
valid choices. If you put all handlers under http, that’s fine because they are usually
imported in one place where you wire routes. The same is true for database adapters. You can
put them all under postgres or split them into postgres/order and postgres/user. Both
patterns are acceptable. The key difference is that domains need package level separation,
while technology packages can be grouped because they are only wired at the edge.
mystore/
├── order/
│ ├── order.go
│ └── service.go
├── user/
│ ├── user.go
│ └── service.go
├── http/ <-- lumping all the handlers here is fine
│ ├── order_handler.go
│ └── user_handler.go
├── postgres/ <-- this is fine, but you can create sub pkgs too
│ ├── order_repo.go
│ └── user_repo.go
└── cmd/
└── server/
└── main.go
But depending on the complexity of your app, this is also absolutely fine:
mystore/
├── order/
│ ├── order.go
│ └── service.go
├── user/
│ ├── user.go
│ └── service.go
├── http/ <-- handlers are split by domain here
│ ├── order/
│ │ └── handler.go
│ └── user/
│ └── handler.go
├── postgres/ <-- repos are split by domain here
│ ├── order/
│ │ └── repo.go
│ └── user/
│ └── repo.go
└── cmd/
└── server/
└── main.go
The rule of thumb is that top level domains should never import anything from technology
folders like http or postgres. Instead, http and postgres should always import from
domain packages. You can add a linter to enforce this rule, but because http and postgres
already import the domains, Go’s ban on import cycles means the compiler stops a domain
package from importing them back.
+-----------+ +-----------+
| order | | user |
+-----------+ +-----------+
^ ^
| |
+------------------------------+
| http postgres |
+------------------------------+
^
|
+---------+
| cmd |
+---------+
Domains sit at the top. Technology packages depend on them, never the other way around. The
cmd package wires everything together. This keeps the graph simple and keeps domains
independent.
Astute readers might notice that I have left out any discussion around the internal
directory. This is intentional. Depending on your requirements, you might opt in for an
internal directory or not. This isn’t important for our discussion. The main point I
wanted to emphasize is that technology or architecture patterns shouldn’t guide your app
structure. It should be based on something more persistent and nothing is more persistent
than your application’s domain.
My colleague Matthias Doepmann recently fired a shot at AI-generated tests that don’t validate the behavior of the System Under Test (SUT) but instead create needless ceremony around internal implementations. At best, these tests give a shallow illusion of confidence in the system’s correctness while breaking at the smallest change. At worst, they remain green even when the SUT’s behavior changes.
In practice, they add maintenance overhead and drag down code reviews. The frustration in that post wasn’t about violating some abstract testing philosophy. It came from having to wade through countless implementation-checking tests churned out by LLMs across components of a real, large-scale distributed system.
I think the problem persists for three reasons:
The general theme when writing unit tests should be checking the behavior of the system, not the scaffolding of its implementation. It doesn’t matter which method called which, how many times, or with what arguments.
What matters is: if you give the SUT some input, does it return the expected output? In a stateful system, does the input cause the system to mutate some persistence layer in the expected way? That persistence layer doesn’t always need to be a real database; it could be an in-memory buffer.
In scenarios where your code invokes external systems, it is more useful to test your system with canned responses from upstream calls rather than testing which method is being called.
The salient point is: test outcomes, not implementation details. As the book Software Engineering at Google puts it: test state, not interactions:
With state testing, you observe the system itself to see what it looks like after invoking with it. With interaction testing, you instead check that the system took an expected sequence of actions on its collaborators in response to invoking it. Many tests will perform a combination of state and interaction validation.
And the guidance that follows:
By far the most important way to ensure this is to write tests that invoke the system being tested in the same way its users would; that is, make calls against its public API rather than its implementation details. If tests work the same way as the system’s users, by definition, change that breaks a test might also break a user.
I think the first step in the right direction is to accept that LLMs can’t substitute for thought. The first few critical tests in your system shouldn’t be written by LLMs, and you must vet the tests churned out by the genie that wants to leap. Next up: you can often get away without a mocking library, and skipping it more often than not improves the quality and maintainability of your tests.
Mocking libraries come with their own idiosyncratic syntax and workflows. On most occasions, handwritten fakes are better than mocks. I’ll use Go to make my point here because that’s what I write the most these days, but the lesson applies to other languages too.
Consider a simple UserService that depends on a DB interface. Its job is to delegate
user creation to the database and return any error to the caller:
// usersvc/usersvc.go
package usersvc
import "errors"
var ErrDuplicate = errors.New("duplicate user")
type DB interface {
InsertUser(name string) error
ListUsers() []string
}
type UserService struct {
db DB
}
func NewUserService(db DB) *UserService {
return &UserService{db: db}
}
// Baseline behavior: delegate to DB and surface errors to callers.
func (s *UserService) CreateUser(name string) error {
return s.db.InsertUser(name)
}
A mocking tool such as mockery can generate a mock implementation of the DB interface.
The generated code records calls and arguments so that tests can later assert whether the
expected interactions happened:
// usersvc/mocks/mock_db.go
// generated by:
// mockery --name=DB --dir=usersvc --output=usersvc/mocks \
// --outpkg=mocks --with-expecter
// simplified to remove unnecessary details
package mocks
import "github.com/stretchr/testify/mock"
type MockDB struct{ mock.Mock }
func (m *MockDB) InsertUser(name string) error {
args := m.Called(name)
return args.Error(0)
}
func (m *MockDB) ListUsers() []string {
args := m.Called()
return args.Get(0).([]string)
}
Using this mock, a test can be written to check that CreateUser interacts with the
dependency in the expected way:
// usersvc/usersvc_mock_test.go
package usersvc_test
import (
"testing"
"github.com/stretchr/testify/require"
"example.com/app/usersvc"
"example.com/app/usersvc/mocks"
)
func TestUserService_CreateUser(t *testing.T) {
db := mocks.NewMockDB(t)
svc := usersvc.NewUserService(db)
// Exact interaction expected.
db.EXPECT().InsertUser("alice").Return(nil).Once()
// Exercise public API.
err := svc.CreateUser("alice")
require.NoError(t, err)
// Verify the interaction occurred.
db.AssertExpectations(t)
}
This works mechanically, but it breaks down in practice:
It checks the collaborator call, not the result
A useful test would assert that “alice” was actually added or that a duplicate error was
returned. This one only verifies that InsertUser("alice") was invoked once.
It breaks on harmless refactors
If the database method is renamed while keeping the same semantics, callers see no difference but the test fails:
// usersvc/usersvc.go (harmless refactor, behavior unchanged)
package usersvc
type DB interface {
UpsertUser(name string) error // was InsertUser
ListUsers() []string
}
func (s *UserService) CreateUser(name string) error {
return s.db.UpsertUser(name) // same public behavior
}
The mock-based test either no longer compiles or needs rewiring, even though the public behavior didn’t change.
And worse, it survives real bugs
If an error is accidentally swallowed, callers get the wrong signal but the test still passes:
// usersvc/usersvc.go (buggy refactor: behavior changed)
package usersvc
func (s *UserService) CreateUser(name string) error {
_ = s.db.InsertUser(name) // ignore error by mistake
return nil // callers think it succeeded
}
A real DB or an in-memory fake would raise a constraint error that should propagate. The mock test goes green anyway because it only checked the call path.
The common thread is that mocks lock tests to implementation details. They don’t protect the behavior that real users rely on.
A better approach is to keep the same interface but back it with a handwritten fake. The fake encodes the domain rules you care about, and tests can focus on outcomes instead of verifying which collaborator methods were called.
Here, we’re writing the fake implementation of the DB interface by hand instead of
generating it with a mocking tool like mockery.
// usersvc/usersvc_fake_test.go
package usersvc_test
import "example.com/app/usersvc"
type FakeDB struct {
seen map[string]struct{}
order []string
}
func NewFakeDB() *FakeDB {
return &FakeDB{seen: make(map[string]struct{})}
}
func (f *FakeDB) InsertUser(name string) error {
if _, ok := f.seen[name]; ok {
return usersvc.ErrDuplicate
}
f.seen[name] = struct{}{}
f.order = append(f.order, name)
return nil
}
func (f *FakeDB) ListUsers() []string {
out := make([]string, len(f.order))
copy(out, f.order)
return out
}
Tests with the fake read like a statement of expected behavior:
// usersvc/usersvc_fake_test.go
package usersvc_test
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"example.com/app/usersvc"
)
func TestUserService_CreateUser(t *testing.T) {
db := NewFakeDB()
svc := usersvc.NewUserService(db)
require.NoError(t, svc.CreateUser("alice"))
assert.Equal(t, []string{"alice"}, db.ListUsers()) // outcome observed
}
func TestUserService_CreateUser_DuplicateSurfaces(t *testing.T) {
db := NewFakeDB()
svc := usersvc.NewUserService(db)
require.NoError(t, svc.CreateUser("alice"))
err := svc.CreateUser("alice")
require.ErrorIs(t, err, usersvc.ErrDuplicate) // behavior enforced
assert.Equal(t, []string{"alice"}, db.ListUsers()) // state unchanged
}
This avoids the fragility of mocks. The tests survive harmless refactors, fail when behavior changes, and stay readable without a mocking DSL.
The cost is maintaining the fake as the interface evolves. In practice, that’s still easier than constantly updating brittle mock expectations and occasionally dealing with the mock library’s lengthy migration workflow.
Sometimes the right move is to test against a real database running in a container. That is still state testing, just at a higher fidelity. The tradeoff is speed: you get stronger confidence in behavior, but the tests run slower.
Most of the time, handwritten in-memory fakes are what you need, and most tests should stick to those. When you do need the same behavior you would see in production, tools like testcontainers let you spin up databases, queues, or caches inside containers. Your tests can then call the SUT normally, with its configuration pointing at the containerized service, just as production code would connect to a production resource.
This is not a rally against using LLMs for tests. But the seed tests, the first handful that set the standard, need to come from you. They define what correctness means in your system and give the ensuing tests a model to follow. If you hand that job to an LLM, you give up the chance to shape how the rest of the suite grows.
This isn’t to disparage mocking libraries either. But I have seen people armed with overzealous LLMs and mocks wreak havoc on a test suite and then unironically ask reviewers to review the mess. Instead of validating behavior, the suite fills up with fragile interaction checks that break on refactors and stay green through real bugs.
More often than not, you can skip mocking libraries and rely on handwritten fakes that check the behavior of the SUT instead of its interactions. The next person who needs to read and extend your tests might thank you for that.
The pattern usually looks like this: spawn one goroutine per task, have each send its result to its own unbuffered channel, then receive the results one by one.
The trap is the early return. With an unbuffered channel, a send blocks until a receiver is ready. If you return before reading from the remaining channels, the goroutines writing to them block forever. That’s a goroutine leak.
Here’s how the bug appears in a tiny example: one worker intentionally fails, causing the
main goroutine to bail early. That early return skips the receive from ch2, leaving the
sender on ch2 stuck.
type result struct{ err error }
func Example() error {
ch1 := make(chan result) // unbuffered
ch2 := make(chan result) // unbuffered
// Simulate a failing worker by sending an error into ch1.
// This is intentional to trigger the early return below.
go func() { ch1 <- result{err: fmt.Errorf("oops")} }()
// Simulate a successful worker that will try to send into ch2.
go func() { ch2 <- result{err: nil} }()
// Receive the first result.
res1 := <-ch1
if res1.err != nil {
// We return right away because of the error.
// Because we never read from ch2, the goroutine sending to ch2
// is now blocked forever on its send. That goroutine leaks.
return res1.err
}
// This receive is skipped on the error path above.
res2 := <-ch2
if res2.err != nil {
return res2.err
}
return nil
}
One simple fix is to make sure you always read from both channels before you decide what to do. This guarantees that every send has a matching receive and no goroutine gets stuck:
func ExampleDrain() error {
ch1 := make(chan result)
ch2 := make(chan result)
go func() { ch1 <- result{err: fmt.Errorf("oops")} }() // same failure
go func() { ch2 <- result{err: nil} }() // same success
// Always receive both. Both sends now complete.
res1 := <-ch1
res2 := <-ch2
if res1.err != nil {
return res1.err
}
if res2.err != nil {
return res2.err
}
return nil
}
This is safe but it means you always wait for both workers even when the first one already failed and the second result is irrelevant. If you want to return early without leaking, another option is to use buffered channels so the producers don’t block on send. A buffer of size one is enough for this pattern.
func ExampleBuffered() error {
ch1 := make(chan result, 1) // buffered so sends do not block
ch2 := make(chan result, 1)
go func() { ch1 <- result{err: fmt.Errorf("oops")} }() // failure
go func() { ch2 <- result{err: nil} }() // success
// Receive the first result and decide.
res1 := <-ch1
if res1.err != nil {
// Safe to return early. The send to ch2 already completed
// into its buffer even though we have not read it yet.
return res1.err
}
// Still read ch2 to consume its buffered value
res2 := <-ch2
if res2.err != nil {
return res2.err
}
return nil
}
Buffered channels remove the blocked send, but they also make it easier to forget that a second result exists at all. If that second value carries data you must process, you should still receive it. If it is truly fire and forget, buffering is fine.
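To see the mechanism in isolation: a send into a channel with spare buffer capacity completes even though nobody has received yet, which is exactly why the early return above is safe. A minimal sketch:

```go
package main

import "fmt"

func main() {
	done := make(chan struct{})
	ch := make(chan int, 1) // capacity 1

	go func() {
		ch <- 42 // completes immediately into the buffer
		close(done)
	}()

	<-done // the sender finished even though nothing has read ch yet
	fmt.Println(<-ch) // 42
}
```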
Often the cleanest approach is to drop the channel plumbing when you only need to run tasks and aggregate errors. The errgroup package lets each goroutine return an error while the group does the waiting. There is nothing to forget to receive, so there is nothing to leak.
import (
"fmt"
"golang.org/x/sync/errgroup"
)
func ExampleErrgroup() error {
var g errgroup.Group
// Task 1 fails and returns an error.
g.Go(func() error {
return fmt.Errorf("oops")
})
// Task 2 succeeds.
g.Go(func() error {
return nil
})
// Wait waits for both tasks and returns the first error, if any.
return g.Wait()
}
Sometimes you also want peers to stop once one task fails. errgroup.WithContext gives you
a context that gets canceled as soon as any task returns an error. You pass that context
into your workers and have them check ctx.Done() so they can exit quickly.
import (
"context"
"fmt"
"time"
"golang.org/x/sync/errgroup"
)
func ExampleErrgroupWithContext() error {
// When any task returns an error, ctx is canceled.
g, ctx := errgroup.WithContext(context.Background())
// Task 1 fails quickly to simulate an early error.
g.Go(func() error {
return fmt.Errorf("oops")
})
// Task 2 is long running but cooperates with cancellation.
g.Go(func() error {
for {
select {
case <-ctx.Done():
// Exits because Task 1 failed and canceled the context.
return ctx.Err()
default:
time.Sleep(10 * time.Millisecond)
}
}
})
return g.Wait()
}
At this point it is natural to ask if tools can catch the original bug for you. go vet
cannot. Vet is static analysis that runs at build time. Whether a send blocks depends on
runtime control flow and timing. Vet cannot prove that the function returns before a
particular receive in a general way, so it doesn’t flag this pattern.
go test -race cannot either. The race detector detects unsynchronized concurrent memory
access. A goroutine stuck on a channel send isn’t a data race. You may see a test hang until
timeout, but the tool won’t point to a leaking goroutine.
You can turn this into a failing test with goleak from Uber. goleak fails if goroutines
are still alive when a test ends. It snapshots all goroutines via the runtime, filters out
the standard background ones, and reports the rest. Wire it into a test that triggers the
early return and you will see the blocked sender’s stack in the output.
Here is a test that leaks and fails:
package example_test
import (
"fmt"
"testing"
"go.uber.org/goleak"
)
type result struct{ err error }
func buggyEarlyReturn() error {
ch1 := make(chan result)
ch2 := make(chan result)
// Force the early-return path by sending an error on ch1.
go func() { ch1 <- result{err: fmt.Errorf("oops")} }()
// This send will block forever on the failing path
// because nobody receives ch2.
go func() { ch2 <- result{err: nil} }()
r1 := <-ch1
if r1.err != nil {
return r1.err // leak: ch2 sender is stuck
}
<-ch2 // drain ch2 on the success path
return nil
}
func TestBuggyLeaks(t *testing.T) {
// fails if any goroutines are stuck at test end
defer goleak.VerifyNone(t)
_ = buggyEarlyReturn()
}
This test fails and prints the goroutine stack stuck in the send to ch2.
=== RUN TestBuggyLeaks
main_test.go:34: found unexpected goroutines:
[Goroutine 24 in state chan send,
with thing.buggyEarlyReturn.func2 on top of the stack:
thing.buggyEarlyReturn.func2()
.../main_test.go:20 +0x28
created by thing.buggyEarlyReturn in goroutine 22
.../main_test.go:20 +0xc0
]
--- FAIL: TestBuggyLeaks (0.44s)
FAIL
exit status 1
If you switch the implementation to a fixed version, the test passes. For example, the draining fix:
func fixedDrain() error {
ch1 := make(chan result)
ch2 := make(chan result)
go func() { ch1 <- result{err: fmt.Errorf("oops")} }()
go func() { ch2 <- result{err: nil} }()
r1 := <-ch1
r2 := <-ch2
if r1.err != nil {
return r1.err
}
if r2.err != nil {
return r2.err
}
return nil
}
func TestFixedNoLeaks(t *testing.T) {
defer goleak.VerifyNone(t)
_ = fixedDrain()
}
If you prefer suite wide enforcement, add goleak to your TestMain. This way your entire
test run fails if any test leaks goroutines.
package main
import (
"testing"
"go.uber.org/goleak"
)
func TestMain(m *testing.M) {
// VerifyTestMain wraps the whole test run
// and fails if any goroutines are left behind.
goleak.VerifyTestMain(m)
}
If you start goroutines that send on channels, think carefully about early returns. An unbuffered send waits for a receive, and if you return before that receive happens, you’ve leaked a goroutine.
You can avoid this by receiving from every channel before returning, by using buffered channels so sends complete without a receiver, or by using errgroup, with or without context, so tasks return errors and cooperate on cancellation.
Add goleak to your tests so leaks surface early during development.
By lifecycle I mean the usual setup and teardown hooks or fixtures that are common in other languages. I think this is a good thing because you don’t need to pick up many different framework-specific workflows for something so fundamental.
Go gives you enough hooks to handle this with less ceremony. But it can still be tricky to figure out the right conventions for setup and teardown that don’t look odd to other Gophers, especially if you haven’t written Go for a while. This text explores some common ways to do lifecycle management in your Go tests.
Before we cover multiple testing scenarios, it’s useful to understand how Go’s test harness actually runs your tests.
When you type go test, Go doesn’t interpret test files directly. It collects all the
_test.go files in a package, compiles them together with the rest of the package, and
produces a temporary binary. That binary contains both your code and your tests, along with
a small harness that drives them. The harness then runs the binary and reports results.
From the “go test” command doc:
“go test” automates testing the packages named by the import paths. […] recompiles each package along with any files with names matching the file pattern “*_test.go”.
Inside each package, the harness looks for test functions. A function qualifies if it has the form:
func TestXxx(t *testing.T)
where Xxx starts with an uppercase letter. There are no annotations or decorators, just
naming convention. Functions that don’t match this signature are ignored.
By default, the harness runs tests sequentially. If you want concurrency, you can opt in at
the test level. Calling t.Parallel() inside a test signals that this test may run
alongside others in the same package that also call t.Parallel(). Tests that don’t opt in
remain strictly ordered.
Every package with tests produces its own binary, and those binaries are run independently. There is no global suite that links packages together, so setup and teardown only exist inside one package’s process. If you have ten packages containing tests, you get ten binaries, each with its own lifecycle.
For example:
project/
├── go.mod
├── db/
│ ├── db.go
│ └── db_test.go
└── api/
├── api.go
└── api_test.go
Running go test ./... produces two binaries: one for db and one for api. Each binary
bundles the package code and its tests, and each binary runs on its own. The harness
aggregates the results and prints a combined report, but execution itself is confined to the
package.
It is important to note that there is no file-level scope. All _test.go files in a package
are merged into a single binary, so there is no way to run setup once per file. Similarly,
there is no cross-package scope. Go does not let you set up once for all tests in a module
or tear down after the last package finishes. If you need orchestration across packages, it
has to happen outside of go test, for example in a shell script or a CI pipeline step.
With this background, we can now look at the lifecycle hooks Go does provide. They apply at three levels: per test function, per group of subtests, and per package.
Typically you need to perform setup and teardown before and after each test function, each group of subtests, or the entire package’s test run.
The smallest scope is the test function itself. You create resources at the start of the
test and clean them up when it ends. This pattern is common when you want each test to run
against a fresh state with no leakage from other tests. The idiomatic way in Go is to wrap
the setup in a helper and register the cleanup with t.Cleanup.
type TestDB struct{}
// newTestDB sets up a fresh database for a single test
func newTestDB(t *testing.T) *TestDB {
t.Helper()
db := &TestDB{}
// cleanup tied to the function scope
t.Cleanup(func() {
db.Close()
})
return db
}
func (db *TestDB) Close() {}
func (db *TestDB) Insert(k, v string) error { return nil }
func (db *TestDB) Query(k string) (string, error) { return "value", nil }
func TestInsert(t *testing.T) {
db := newTestDB(t) // new DB created for this test only
if err := db.Insert("foo", "bar"); err != nil {
t.Fatalf("insert failed: %v", err)
}
}
In this example, TestInsert gets its own new database. The cleanup registered with
t.Cleanup makes sure the database is closed when the test finishes. The resource is never
shared with other tests, which gives you strong isolation. The downside is that if your
setup is expensive, it will run before and after every test function, which can slow things
down.
The next scope is a group of subtests. Instead of repeating setup for every test, you create the resource once in the parent test and share it with the children. Teardown runs when the parent finishes. This works well when you want to test a flow of operations against the same shared state.
func TestUserFlow(t *testing.T) {
// new DB created once for this group
// t.Cleanup() gets called after all the subtests finish and
// the parent returns
db := newTestDB(t)
t.Run("insert user", func(t *testing.T) {
if err := db.Insert("user:1", "alice"); err != nil {
t.Fatal(err)
}
})
t.Run("query user", func(t *testing.T) {
val, err := db.Query("user:1")
if err != nil {
t.Fatal(err)
}
if val != "alice" {
t.Fatalf("expected alice, got %s", val)
}
})
}
Here both subtests share the same database, and the cleanup runs once when TestUserFlow
ends. This is useful when your tests need to act on shared state, like inserting a record
and then querying it. The trade-off is that the tests are no longer fully independent, and
if one subtest leaves the database in a bad state, others may fail in unexpected ways.
The broadest scope is the package. If you define TestMain, the test harness calls it
instead of running the tests directly. You can perform setup, run all the tests, and then
perform teardown. This allows you to reuse an expensive resource across all tests in the
package.
var globalDB *TestDB
func TestMain(m *testing.M) {
globalDB = &TestDB{} // setup once for the entire package
code := m.Run()
globalDB.Close() // teardown after all tests
os.Exit(code)
}
func TestGlobalInsert(t *testing.T) {
if err := globalDB.Insert("k", "v"); err != nil {
t.Fatal(err)
}
}
Here the database is created once and reused by all tests in the package. The teardown runs when everything is finished. This can make your tests run much faster if setup is expensive, but you pay for it in global (package wide) state. If one test mutates the shared resource in an unexpected way, other tests may start failing, and debugging those failures can be difficult.
Also, remember that your setup and teardown are still package bound: each package can have its own TestMain. Reasoning about their order can get out of hand quickly, so make sure your tests never depend on the order of TestMain execution. Treat these like init functions and use them sparingly.
These three scopes are not mutually exclusive. You can combine them when you need different
levels of control. A typical pattern is to have TestMain start a package-wide service,
create a shared schema or fixture in a parent test for a group of related subtests, and then
still use per-test setup inside individual subtests for fine-grained isolation. Each call to
newTestDB creates a fresh database, so using it at different levels produces different
resources with different lifetimes.
func TestOrders(t *testing.T) {
schema := newTestDB(t) // group-level DB shared across subtests
t.Run("create order", func(t *testing.T) {
db := newTestDB(t) // per-test DB, fresh for this subtest only
db.Insert("order:1", "widget")
})
t.Run("query order", func(t *testing.T) {
// uses the group-level DB, so the state persists across subtests
schema.Insert("order:1", "widget")
val, _ := schema.Query("order:1")
if val != "widget" {
t.Fatalf("expected widget, got %s", val)
}
})
}
In this example, TestMain could be running a package-wide database server. The parent test
TestOrders sets up a schema that is shared across its subtests. Inside, one subtest spins
up its own per-test database to work in isolation, while another uses the shared schema to
test how state persists across operations.
The combination of package, group, and function scopes gives you flexibility: reuse expensive resources when you need to, and isolate state when correctness depends on it. However, combining scopes can be hard to reason about when you have many different subtests under a single parent that are also interacting with some global state. I tend to avoid this whenever possible.
Most of your setup and teardown should happen at the function level. That gives you the strongest isolation and keeps each test self-contained.
The next most useful pattern is at the subtest group level, where you create a resource once in a parent test and let its children share it. Cleanup runs when the parent finishes, which makes sense when you really do want that shared state.
Package-level setup through TestMain should be rare. It is tempting when setup is
expensive, but global state is the fastest way to end up with brittle tests. Mixing
different scopes is possible, but usually creates more confusion than clarity, so reach for
it only when you have no better option.
Usually in Go, people make a package called external or http and stash the logic of
communicating with external services there. Then the business logic depends on the
external package to invoke the RPC call. This is already better than directly making RPC
calls inside your service functions, as that would make these two separate concerns
(business logic and external-service wrangling) tightly coupled. Testing these concerns in
isolation, therefore, would be a lot harder.
While this is a fairly common practice, I was looking for a canonical name for this pattern to talk about it in a less hand-wavy way. Turns out Martin Fowler wrote a blog post on it a few moons ago, and he calls it the Gateway pattern. He explores the philosophy in more detail and gives some examples in JS. However, I thought that Gophers could benefit from a few examples to showcase how it translates to Go. Plus, I wanted to reify the following axiom:
High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details. Details should depend on abstractions.
– Dependency inversion principle (D in SOLID), Uncle Bob
In this scenario, our business logic in the order package is the high-level module and
external is the low-level module, as the latter concerns itself with transport details.
Inside external, we could communicate with the external dependencies via either HTTP or
gRPC. But that’s an implementation detail and shouldn’t make any difference to the
high-level order package.
order will communicate with external via a common interface. This is how we satisfy the
“both should depend on abstractions” part of the ethos.
Our app layout looks like this:
yourapp/
├── cmd/ # wire up the deps
│ └── main.go
├── order/ # business logic in the service functions
│ ├── service.go
│ └── service_test.go
├── external/ # code to communicate with external deps
│ └── stripe/
│ ├── gateway.go
│ ├── mock_gateway.go
│ └── gateway_test.go
└── go.mod / go.sum
Let’s walk through the flow from the bottom up. Think of it as walking back from the edge to the core, in Alistair Cockburn’s Hexagonal Architecture lingo, where the edge represents the transport logic and the core the business concerns.
The Stripe implementation lives in external/stripe/gateway.go. For simplicity’s sake,
we’re pretending to call the Stripe API over HTTP, but this could be a gRPC call to another
service.
// external/stripe/gateway.go
package stripe
import "fmt"
type StripeGateway struct {
APIKey string
}
func NewStripeGateway(apiKey string) *StripeGateway {
return &StripeGateway{APIKey: apiKey}
}
// Handle all the details of making HTTP calls to the Stripe service here.
func (s *StripeGateway) Charge(
amount int64, currency string, source string) (string, error) {
fmt.Printf(
"[Stripe] Charging %d %s to card %s\n",
amount, currency, source,
)
return "txn_live_123", nil
}
// Make another HTTP call to the Stripe service to perform a refund.
func (s *StripeGateway) Refund(transactionID string) error {
fmt.Printf("[Stripe] Refunding transaction %s\n", transactionID)
return nil
}
Notice that the stripe package handles the details of communicating with the Stripe
endpoint, but it doesn’t export any interface for the higher-level module to use. This is
intentional.
In Go, the general advice is that the consumer should define the interface they want, not the provider.
Go interfaces generally belong in the package that uses values of the interface type, not the package that implements those values.
– Go code review comments
That gives the consumer full control over what it wants to depend on, and nothing more. You don’t accidentally couple your code to a bloated interface just because the implementation provided one. You define exactly the shape you need and mock that in your tests.
Clients should not be forced to depend on methods they do not use.
– Interface segregation principle (I in SOLID), Uncle Bob
So, in the order package, we define a tiny private interface that reflects the use case.
// order/service.go
package order
// The order service only requires the Charge method of a payment gateway.
// So we define a tiny interface here on the consumer side rather
// than on the producer side
type paymentGateway interface {
Charge(amount int64, currency string, source string) (string, error)
}
type Service struct {
gateway paymentGateway
}
// Pass the Stripe implementation of paymentGateway at runtime here.
func NewService(gateway paymentGateway) *Service {
return &Service{gateway: gateway}
}
// In production, this calls .Charge on the Stripe implementation.
// During tests, it calls .Charge on a mock gateway.
func (s *Service) Checkout(amount int64, source string) error {
_, err := s.gateway.Charge(amount, "USD", source)
return err
}
The order service doesn’t know or care which implementation of the gateway it’s using to
perform some action. It just knows it can call Charge on the provided gateway type. It
doesn’t need to care about the Refund method on the Stripe gateway implementation. Also,
the paymentGateway interface is bound to the order package, so we’re not polluting the
API surface with a bunch of tiny interfaces.
Now, when testing the service logic, you just need to write a tiny mock implementation of
paymentGateway and pass it to order.Service. You don’t need to reach into the
external/stripe package or wire up anything complicated. You can place the fake right next
to your service test. Since interface implementations in Go are implicitly satisfied,
everything just works without much fuss.
// order/service_test.go
package order_test
import (
"testing"
"yourapp/order"
)
type mockGateway struct {
calledAmount int64
calledSource string
}
func (m *mockGateway) Charge(
amount int64, currency, source string) (string, error) {
m.calledAmount = amount
m.calledSource = source
return "txn_mock", nil
}
func TestCheckoutCallsCharge(t *testing.T) {
mock := &mockGateway{}
svc := order.NewService(mock)
err := svc.Checkout(1000, "test_source_abc")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if mock.calledAmount != 1000 {
t.Errorf("expected amount 1000, got %d", mock.calledAmount)
}
if mock.calledSource != "test_source_abc" {
t.Errorf("want source test_source_abc, got %s",
mock.calledSource)
}
}
The test is focused only on what matters: Does the service call Charge with the correct
arguments? We’re not testing Stripe here. That’s its own concern.
You can still write tests for the Stripe client if you want. You’d do that in
external/stripe/gateway_test.go.
// external/stripe/gateway_test.go
package stripe_test
import (
"testing"
"yourapp/external/stripe"
)
func TestStripeGateway_Charge(t *testing.T) {
gw := stripe.NewStripeGateway("dummy-key")
txn, err := gw.Charge(1000, "USD", "tok_abc")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if txn == "" {
t.Fatal("expected transaction ID, got empty string")
}
}
Finally, everything is wired together in cmd/main.go.
// cmd/main.go
package main
import (
"yourapp/external/stripe"
"yourapp/order"
)
func main() {
stripeGw := stripe.NewStripeGateway("live-api-key")
// Passing the real Stripe gateway to the order service.
orderSvc := order.NewService(stripeGw)
_ = orderSvc.Checkout(5000, "tok_live_card_xyz")
}
It’s also common to call gateways “client.” Some people prefer that name. However, I think client is way overloaded, which makes it hard to discuss the pattern clearly. There’s the HTTP client, the gRPC client, and then your own client that wraps these. It gets confusing fast. I prefer “gateway,” as Martin Fowler used in his original text.
In Go context, the core idea is that a service function uses a locally defined gateway interface to communicate with external gateway providers. This way, the service and the external providers are unaware of each other’s existence and can be tested independently.
So where does X come from?
I like to rely on Go’s standard tooling so that integration and snapshot tests can live right beside ordinary unit tests. Because I usually run these heavier tests in testcontainers, I don’t always want them running while I’m iterating on a feature or chasing a bug. So I need a way to enable them selectively.
To fetch the X and conditionally run some tests, you’ll typically see three approaches:
- Build tags – guard optional test files with //go:build tags so they’re only compiled when the tag is passed.
- Environment variables – have the test check a variable (e.g. RUN_INTEGRATION=1) and skip itself if it’s absent.
- Custom go test flags (my preferred approach) – define your own flags so you can run, for example, go test -run Integration -integration.
Build tags are special comments you place at the top of a .go file to tell Go to include that file only when certain tags are set during the build. This is how they typically look:
//go:build snapshot
package main
import "testing"
func TestSnapshot(t *testing.T) {
t.Log("running snapshot")
}
This file will only be compiled and included when you run:
go test -tags=snapshot
If you don’t pass the tag, the file is skipped entirely during the build. Go won’t even see the test.
The upside is that it gives you a clean separation. You can group slow tests or environment-dependent tests into their own files. But the downsides add up quickly.
First, there’s no way to discover which tags are used without grepping through the codebase. Go itself won’t tell you. go help test doesn’t mention them, and there’s no built-in list or summary. You have to rely solely on documentation.
Second, build tags are applied per file, not per package. That means if even one test in a file is guarded by a tag, the entire file is excluded unless the tag is passed. This makes it difficult to mix optional and always-on tests in the same file.
And third, once you have more than a couple of tags, managing them becomes guesswork. You end up running things like:
go test -tags=slow,mock,external
But you no longer remember what each one does or what combinations are safe. There’s no validation. It gets messy fast.
Environment variables let you control test behavior at runtime. You don’t need to recompile anything, and you can pass them inline when running tests.
Here’s a typical example:
import "os"
func TestSnapshot(t *testing.T) {
if os.Getenv("SNAPSHOT") != "1" {
t.Skip("set SNAPSHOT=1 to run this test")
}
t.Log("running snapshot")
}
You run it like:
SNAPSHOT=1 go test -v
This is more dynamic than build tags. You don’t have to split tests into separate files, and you don’t have to rebuild with special flags. More importantly, the test itself can detect when the environment variable is missing and tell you what to do. It can skip itself and print a message like “set SNAPSHOT=1 to run this test.” That feedback loop is helpful.
But the discovery problem remains. There’s no built-in way to ask, “what environment variables does this test suite support?” You still have to read the code to find out.
It can get worse if the check is buried deep in a helper. Maybe some setup logic does:
if os.Getenv("SNAPSHOT") == "1" {
useRealService()
}
Now the test runs, but the behavior changes silently based on the environment. Nothing in the test output tells you that the envvar was involved. You may not even realize that you’re running in a different mode.
And just like with build tags, there’s no central registry. No docs or summary. You can only hope someone left a good comment or wrote it down somewhere.
The cleanest and most discoverable way to control optional test behavior in Go is by
defining your own test flags. They’re typed, explicit, and work well with Go’s built-in
tooling. Instead of toggling tests with magic file-level build tags or invisible environment
variables, you can wire up test configuration using the flag package, just like any other
Go binary.
There are two common approaches for defining test flags:
- Declare them at the package level and parse them in TestMain.
- Declare them in a file-level init().
Both approaches register the flag in the global flag set, so every test in the package can
see the value once parsing has happened. The trade-off is indirection versus locality:
TestMain centralizes all flags in one place, while file-level init() keeps each flag
next to the code that cares about it.
Here’s how it looks with TestMain:
package snapshot_test
import (
"flag"
"os"
"testing"
)
var snapshot = flag.Bool("snapshot", false, "run snapshot tests")
func TestMain(m *testing.M) {
flag.Parse()
os.Exit(m.Run())
}
func TestSnapshot(t *testing.T) {
if !*snapshot {
t.Skip("pass -snapshot to run this test")
}
t.Log("running snapshot")
}
And here’s the equivalent using init() to keep everything in the same file:
package snapshot_test
import (
"flag"
"testing"
)
var snapshot bool
func init() {
flag.BoolVar(&snapshot, "snapshot", false, "run snapshot tests")
}
func TestSnapshot(t *testing.T) {
if !snapshot {
t.Skip("pass -snapshot to run this test")
}
t.Log("running snapshot")
}
Once you’ve defined a flag, you run the snapshot tests like this:
go test -v -snapshot
You can also list all the flags using:
go test -v -args -h
This prints all registered flags, including your own:
-snapshot
run snapshot tests
-test.v
verbose: print all tests as they are run.
-test.run
run only those tests and examples matching the regular expression.
# ...
A detail about names: built-in flags show up in the help output with a test. prefix
(-test.v, -test.run, -test.timeout), yet you pass them without that prefix (-v,
-run, -timeout) while running tests. The Go tool strips test. for you. Custom flags
don’t get this treatment. Whatever string you register is the exact string you must pass. If
you register snapshot you run:
go test -snapshot
If you register test.snapshot you must run:
go test -test.snapshot
There is no automatic collapsing just because the name starts with test..
The flag -args lets you pass additional arguments to the test binary. When the binary sees
-h after -args, it prints every flag and exits. No tests run, though the binary is
built. That one command exposes the full configuration surface of your tests.
If you namespace your flags like this:
flag.BoolVar(&snapshot, "custom.snapshot", false, "run snapshot tests")
Then you can grep for them:
go test -v -args -h | grep custom
Define the global flags in TestMain when several files need the same switches or when you
have package-wide setup (containers, databases, global mocks). Define flags in init() when
a switch is relevant to one test file and you want the declaration right next to the logic
it controls. I usually prefer file-level flags that don’t depend on any global magic.
Either way, the flag lives in code, is easy to grep, appears in -h, and tells everyone
exactly what it controls. The only downside I can think of with this approach is that,
similar to the environment variable technique, you’ll have to check for the flag in every
test and make a decision. But in practice, I prefer the flexibility over the all-or-nothing
approach with build tags.
I think flags are the best way to configure your apps and tools. Even when environment
variables are involved, I often map them to flags for documentation purposes. The goal is to
give users a single -h command they can run to see all available options for tuning
behavior. Tests are no exception. I was quite happy to find out that Peter Bourgon conveyed
the same sentiment in this seminal 2018 blog post.
Dependency Injection is a 25-dollar term for a 5-cent concept.
– James Shore
DI basically means passing values into a constructor instead of creating them inside it. That’s really it. Observe:
type server struct {
db DB
}
// NewServer constructs a server instance
func NewServer() *server {
db := DB{} // The dependency is created here
return &server{db: db}
}
Here, NewServer creates its own DB. Instead, to inject the dependency, build DB
elsewhere and pass it in as a constructor parameter:
func NewServer(db DB) *server {
return &server{db: db}
}
Now the constructor no longer decides how a database is built; it simply receives one.
In Go, DI is often done using interfaces. You collate the behavior you care about in an
interface, and then provide different concrete implementations for different contexts. In
production, you pass a real implementation of DB. In unit tests, you pass a fake
implementation that behaves the same way from the caller’s perspective but avoids real
database calls.
Here’s how that looks:
// behaviour we care about
type DB interface {
Get(id string) (string, error)
Save(id, value string) error
}
type server struct{ db DB }
// NewServer accepts a DB implementation and passes it to server
func NewServer(db DB) *server { return &server{db: db} }
A real implementation of DB might look like this:
type RealDB struct{ url string }
func NewDB(url string) *RealDB { return &RealDB{url: url} }
func (r *RealDB) Get(id string) (string, error) {
// pretend we hit Postgres
return "real value", nil
}
func (r *RealDB) Save(id, value string) error { return nil }
And a fake implementation for unit tests might be:
type FakeDB struct{ data map[string]string }
func NewFake() *FakeDB { return &FakeDB{data: map[string]string{}} }
func (f *FakeDB) Get(id string) (string, error) {
return f.data[id], nil
}
func (f *FakeDB) Save(id, value string) error {
f.data[id] = value
return nil
}
Use the fake in unit tests like so:
func TestServerGet(t *testing.T) {
fake := NewFake()
_ = fake.Save("42", "fake")
srv := NewServer(fake)
val, _ := srv.db.Get("42")
if val != "fake" {
t.Fatalf("want fake, got %s", val)
}
}
The compiler guarantees both RealDB and FakeDB satisfy DB, and during tests, we can
swap out the implementations without much ceremony.
Once NewServer grows half a dozen dependencies, wiring them by hand can feel noisy. That’s
when a DI framework starts looking tempting.
With Uber’s dig, you register each constructor as a provider. Provide takes a
function, uses reflection to inspect its parameters and return type, and adds it as a node
in an internal dependency graph. Nothing is executed yet. Things only run when you call
.Invoke() on the container.
But that reflection-driven magic is also where the pain starts. As your graph grows, it gets harder to tell which constructor feeds which. Some constructors take one parameter, some take three. There’s no single place you can glance at to understand the wiring. It’s all figured out inside the container at runtime.
Let the container figure it out!
– every DI framework ever
func BuildContainer() *dig.Container {
c := dig.New()
// Each Provide call teaches dig about one node in the graph.
c.Provide(NewConfig) // produces *Config
c.Provide(NewDB) // wants *Config, produces *DB
c.Provide(NewRepo) // wants *DB, produces *Repo
c.Provide(NewFlagClient) // produces *FlagClient
c.Provide(NewService) // wants *Repo, *FlagClient, produces *Service
c.Provide(NewServer) // wants *Service, produces *server
return c
}
func main() {
// Invoke starts the graph; dig sorts and calls constructors
if err := BuildContainer().Invoke(
func(s *server) { s.Run() }); err != nil {
panic(err)
}
}
Now try commenting out NewFlagClient. The code still compiles. There’s no error until
runtime, when dig fails to construct NewService due to a missing dependency. And the error
message you get?
dig invoke failed: could not build arguments for function
main.main.func1 (prog.go:87)
: failed to build *main.Server
: could not build arguments for function main.NewServer (prog.go:65)
: failed to build *main.Service: missing dependencies for function
main.NewService (prog.go:55)
: missing type: *main.FlagClient
That’s five stack frames deep, far from where the problem started. Now you’re digging through dig’s internals to reconstruct the graph in your head.
Google’s wire takes a different approach: it shifts the graph-building to code
generation. You collect your constructors in a wire.NewSet, call wire.Build, and the
generator writes a wire_gen.go that wires everything up explicitly.
var serverSet = wire.NewSet(
NewConfig,
NewDB,
NewRepo,
NewFlagClient, // comment out to see Wire complain
NewService,
NewServer,
)
func InitializeServer() (*server, error) {
wire.Build(serverSet)
return nil, nil // replaced by generated code
}
Comment out NewFlagClient and Wire fails earlier - during generation:
wire: ../../service/wire.go:13:2: cannot find dependency for *flags.Client
It’s better than dig’s runtime panic, but still comes with its own headaches:
- You must remember to re-run go generate ./... whenever constructor signatures change.
- You have to learn Wire’s vocabulary: wire.NewSet, wire.Build, build tags, and sentinel rules. And if you ever switch to something different like dig, you’ll need to learn a completely different set of concepts: Provide, Invoke, scopes, named values, etc.
While DI frameworks tend to use vocabularies like provider or container to give you a sense of familiarity, they still reinvent the API surface every time. Switching between them means relearning a new mental model.
So the promise of “just register your providers and forget about wiring” ends up trading clear, compile-time control for either reflection or hidden generator logic - and yet another abstraction layer you have to debug.
In Go, you can just wire your own dependencies manually. Like this:
func main() {
cfg := NewConfig()
db := NewDB(cfg.DSN)
repo := NewRepo(db)
flags := NewFlagClient(cfg.FlagURL)
svc := NewService(repo, flags, cfg.APIKey)
srv := NewServer(svc, cfg.ListenAddr)
srv.Run()
}
Longer? Yes. But:
- The call order is the dependency graph.
- Errors are handled right where they happen.
- If a constructor changes, the compiler points straight at every broken call:
./main.go:33:39: not enough arguments in call to NewService
have (*Repo, *FlagClient)
want (*Repo, *FlagClient, string)
No reflection, no generated code, no global state. Go type-checks the dependency graph early and loudly, exactly how it should be. And also, it doesn’t confuse your LSP, so your IDE keeps on being useful.
If main() really grows unwieldy, split your code:
func buildInfra(cfg *Config) (*DB, *FlagClient, error) {
// ...
}
func buildService(cfg *Config) (*Service, error) {
db, flags, err := buildInfra(cfg)
if err != nil { return nil, err }
return NewService(NewRepo(db), flags, cfg.APIKey), nil
}
func main() {
cfg := NewConfig()
svc, err := buildService(cfg)
if err != nil { log.Fatal(err) }
NewServer(svc, cfg.ListenAddr).Run()
}
Each helper is a regular function that anyone can skim without reading a framework manual. Also, you usually build all of your dependencies in one place, and it’s really not a big deal if your builder function takes 20 parameters and constructs everything. Just put each parameter on its own line and use gofumpt to format the code so it stays readable.
Other languages lean on containers because constructors often can’t be overloaded and compile times hurt. Go already gives you fast builds, implicitly satisfied interfaces, and plain functions for wiring.
A DI framework often fixes problems Go already solved and trades away readability to do it.
The most magical thing about Go is how little magic it allows.
– Some Gopher on Reddit
It’s tempting to make a blanket statement saying that you should never pick up a DI framework, but context matters here.
I was watching Uber’s GopherCon talk on Go at scale and how their DI framework Fx (which uses dig underneath) lets them achieve consistency at scale. If you’re Uber and have all the observability tools in place to get around the downsides, the trade-off can be worth it.
Also, if you’re working in a codebase that’s already leveraging a framework and it works well, then it doesn’t make sense to refactor it without any incentives.
Or, you’re writing one of those languages where using a DI framework is the norm, and you’ll be called a weirdo if you try to reinvent the wheel there.
However, in my experience, even in organizations that maintain a substantial number of Go repos, DI frameworks add more confusion than they’re worth. If your experience is otherwise, I’d love to be proven wrong.
The post got a fair bit of discussion going around the web. You might find it interesting.
Take this example: passing a sync.WaitGroup by value will break things in subtle ways:
func f(wg sync.WaitGroup) {
// ... do something with the waitgroup
}
func main() {
var wg sync.WaitGroup
f(wg) // oops! wg is getting copied here!
}
sync.WaitGroup lets you wait for multiple goroutines to finish some work. Under the hood,
it’s a struct with methods like Add, Done, and Wait to sync concurrently running
goroutines.
That snippet compiles fine but leads to buggy behavior because we’re copying the lock
instead of referencing it in the f function.
Luckily, go vet catches it. If you run vet on that code, you’ll get a warning like this:
f passes lock by value: sync.WaitGroup contains sync.noCopy
call of f copies lock value: sync.WaitGroup contains sync.noCopy
This means we’re passing wg by value when we should be passing a reference. Here’s the
fix:
func f(wg *sync.WaitGroup) { // pass by reference
// ... do something with the waitgroup
}
func main() {
var wg sync.WaitGroup
f(&wg) // pass a pointer to wg
}
Since this kind of incorrect copy doesn’t throw a compile-time error, if you skip go vet,
you might never catch it. Another reason to always vet your code.
I was curious how the Go toolchain enforces this. The clue is in the vet warning:
call of f copies lock value: sync.WaitGroup contains sync.noCopy
So the sync.noCopy struct inside sync.WaitGroup is doing something to alert go vet
when you pass it by value.
Looking at the implementation of sync.WaitGroup, you’ll see:
type WaitGroup struct {
noCopy noCopy
state atomic.Uint64
sema uint32
}
Then I traced the definition of noCopy in sync/cond.go:
// noCopy may be added to structs which must not be copied
// after the first use.
// Note that it must not be embedded, due to the Lock and Unlock methods.
type noCopy struct{}
// Lock is a no-op used by -copylocks checker from `go vet`.
func (*noCopy) Lock() {}
func (*noCopy) Unlock() {}
Just having those no-op Lock and Unlock methods on noCopy is enough. This implements
the Locker interface. Then if you put that struct inside another one, go vet will flag
cases where you try to copy the outer struct.
Also, note the comment: don’t embed noCopy. Include it explicitly. Embedding would
expose Lock and Unlock on the outer struct, which you probably don’t want.
The Go toolchain enforces this with the copylocks checker, which is part of go vet. You can run it on its own with go vet -copylocks ./.... It looks for value copies of any
struct that nests a struct with Lock and Unlock methods. It doesn’t matter what those
methods do, just having them is enough.
When vet runs, it walks the AST and applies the checker on assignments, function calls,
return values, struct literals, range loops, channel sends, basically anywhere values can
get copied. If it sees you copying a struct with noCopy, it yells.
Interestingly, if you define noCopy as anything other than a struct and implement the
Locker interface, vet ignores that. I tested this on Go 1.24:
type noCopy int // this is valid but vet doesn't get triggered
func (*noCopy) Lock() {}
func (*noCopy) Unlock() {}
This doesn’t trigger vet. It only works when noCopy is a struct. The reason is that vet
takes a shortcut in the copylock checker when deciding whether to trigger the warning.
Currently, it explicitly looks for a struct that satisfies the Locker interface and
ignores any other type even if it implements the interface.
You’ll see this in other parts of the sync package too. sync.Mutex uses the same trick:
type Mutex struct {
_ noCopy
mu isync.Mutex
}
Same with sync.Once:
type Once struct {
done uint32
m Mutex
noCopy noCopy
}
Here’s a complete example of abusing -copylocks to prevent copying our own struct:
type Svc struct{ _ noCopy }
type noCopy struct{}
func (*noCopy) Lock() {}
func (*noCopy) Unlock() {}
func main() {
var svc Svc
s := svc // go vet flags this assignment copy
fmt.Println(s) // and this call, which copies svc again
}
Running go vet on this gives:
assignment copies lock value to s: play.Svc contains play.noCopy
call of fmt.Println copies lock value: play.Svc contains play.noCopy
Someone on Reddit asked me what actually triggers the copylock checker in go vet - is it
the struct’s literal name noCopy or the fact that it implements the Locker interface?
The name noCopy isn’t special. You can call it whatever you want. As long as it implements
the Locker interface, go vet will complain if the surrounding struct gets copied. See
this Go Playground snippet.
Go 1.24 introduced a new tool directive that makes it easier to manage your project’s tooling.
I used to rely on Make targets to install and run tools like stringer, mockgen, and
linters like gofumpt, goimports, staticcheck, and errcheck. Problem is, these
installations were global, and they’d often clash between projects.
Another big issue was frequent version mismatch. I ran into cases where people were formatting the same codebase differently because they had different versions of the tools installed. Then CI would yell at everyone because it was always installing the latest version of the tools before running them. Chaos!
tools.go convention
To avoid this mess, the Go community came up with a convention where you’d pin your tool
versions in a tools.go file. I’ve written about omitting dev dependencies before. But
the gist is, you’d have a tools.go file in your root directory that imports the tooling
and assigns them to _:
//go:build tools
// tools.go
package tools
import (
_ "github.com/golangci/golangci-lint/cmd/golangci-lint"
_ "mvdan.cc/gofumpt"
)
Since these dependencies aren’t used directly in the codebase, the //go:build tools
directive ensures they’re excluded from the main build.
Then running go mod tidy keeps things clean and includes these dev dependencies in the
go.mod and go.sum files.
This works, but it always felt a bit clunky. You end up polluting your main go.mod with
tooling-only dependencies. And sometimes, transitive dependencies of those tools clash with
your app’s dependencies.
The new tool directive in Go 1.24 solves some of the tools.go pain points.
tool directive
With Go 1.24, you can now add tooling with the -tool flag when using go get:
go get -tool github.com/golangci/golangci-lint/cmd/golangci-lint@latest
This adds the dependency to your go.mod like this:
module github.com/rednafi/foo
go 1.24.2
tool github.com/golangci/golangci-lint/cmd/golangci-lint
// ... other transitive dependencies
Notice the tool directive clearly separates these from regular module dependencies.
Then you can run the tool with:
go tool golangci-lint run ./...
One thing to keep in mind: the first time you run a tool this way, it might take a second - Go needs to compile it before running if it isn’t already compiled. After that, it’s cached, so subsequent runs are fast.
go generate?
This also plays nicely with go generate. I’ve started replacing direct tool calls with
go tool, so contributors don’t need to install tools globally. Just run go generate and
you’re done:
//go:generate go tool stringer -type=MyEnum
No further setup needed, no path issues, and it’s always using the version you pinned.
That said, one thing still bugs me: go get -tool adds these dev tools to the main go.mod
file. That means your application and dev dependencies are still mixed together. Same
problem the tools.go hack had.
There’s no built-in way to avoid this yet. So your options are:
1. Accept the mixed dev dependencies in your main go.mod file.
2. Use a separate tools module to isolate your tooling. A bit clunky, but doable.
I went with the second option.
My layout looks like this:
.
├── go.mod
├── go.sum
└── tools
└── go.mod
Then I install tools like this:
cd tools
go get -tool github.com/golangci/golangci-lint/cmd/golangci-lint@latest
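After that, tools/go.mod might look something like this (module path and contents are illustrative):

```
module github.com/rednafi/foo/tools

go 1.24.2

tool github.com/golangci/golangci-lint/cmd/golangci-lint

// a require block with the tool's transitive dependencies follows,
// isolated from the application's go.mod
```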
And run them from the root directory as follows:
go tool -modfile tools/go.mod golangci-lint run ./...
The go tool command supports a -modfile flag that you can use to specify where to pull
the tool version from. I really wish go get supported -modfile too - that way you
wouldn’t need to manage the dependencies in such a wonky manner. This was close to being
perfect. Well, maybe in a future release.
Another limitation is that it only works with tools written in Go. So if you’re using stuff
like eslint, prettier, or jq, you’re on your own. But for most of my projects, the dev
tooling is written in Go anyway, so this setup has been working okay.
Functions that print are easier to test when they accept an io.Writer and
write to it instead. However, it’s common to encounter functions like this:
func frobnicate() {
fmt.Println("do something")
}
This would be easier to test if frobnicate would ask for a writer to write to. For
instance:
func frobnicate(w io.Writer) {
fmt.Fprintln(w, "do something")
}
You could pass os.Stdout to frobnicate explicitly to write to the console:
func main() {
frobnicate(os.Stdout)
}
This behaves exactly the same way as the first version of frobnicate.
During test, instead of os.Stdout, you’d just pass a bytes.Buffer and assert its content
as follows:
func TestFrobnicate(t *testing.T) {
// Create a buffer to capture the output
var buf bytes.Buffer
// Call the function with the buffer
frobnicate(&buf)
// Check if the output is as expected
expected := "do something\n"
if buf.String() != expected {
t.Errorf("Expected %q, got %q", expected, buf.String())
}
}
This is all good. But many functions or methods that emit logs just do that directly to
stdout. So we want to test the first version of frobnicate without making any changes to
it.
I found this neat pattern to test functions that write to stdout without accepting a writer.
The idea is to write a helper function named captureStdout that looks like this:
// captureStdout replaces os.Stdout with a buffer and returns it.
func captureStdout(f func()) string {
old := os.Stdout
r, w, _ := os.Pipe()
os.Stdout = w
f() // run the function that writes to stdout
_ = w.Close()
var buf bytes.Buffer
_, _ = io.Copy(&buf, r)
os.Stdout = old
return buf.String()
}
Here’s what’s happening under the hood:
We use os.Pipe() to create a pipe: a connected pair of file descriptors - a reader (r)
and a writer (w). Think of it like a temporary tunnel. Whatever we write to w, we can
read back from r. Since both are just files as far as Go is concerned, we can temporarily
replace os.Stdout with the writer end of the pipe:
os.Stdout = w
This means anything printed to stdout during the function run actually goes into our pipe. After the function runs, we close the writer to signal that we’re done writing, then read from the reader into a buffer and restore the original stdout.
Now we can test frobnicate without touching its implementation:
func TestFrobnicate(t *testing.T) {
output := captureStdout(func() {
frobnicate()
})
expected := "do something\n"
if output != expected {
t.Errorf("Expected %q, got %q", expected, output)
}
}
No need to refactor frobnicate. This works great for quick tests when you don’t control
the code or just want to assert some printed output.
The above version of captureStdout works fine for simple cases. But in practice, functions
might also write to stderr, especially if they’re using Go’s log package or if a panic
happens. For example, this would not be captured by the simple captureStdout helper:
log.Println("something went wrong")
Even though it looks like a normal print statement, log writes to stderr by default. So
if you want to catch that output too, or generally capture everything that’s printed to the
console during a function call, we need to upgrade our helper a bit. I found this example
from immudb’s captureOutput helper.
Here’s a more complete version:
// captureOut captures both stdout and stderr.
func captureOut(f func()) string {
// Create a pipe to capture stdout
custReader, custWriter, err := os.Pipe()
if err != nil {
panic(err)
}
// Save the original stdout and stderr to restore later
origStdout := os.Stdout
origStderr := os.Stderr
// Restore stdout and stderr when done
defer func() {
os.Stdout = origStdout
os.Stderr = origStderr
}()
// Set the stdout and stderr to the pipe
os.Stdout, os.Stderr = custWriter, custWriter
log.SetOutput(custWriter)
// Create a channel to read the output from the pipe
out := make(chan string)
// Goroutine reads from pipe and sends output to channel
var wg sync.WaitGroup
wg.Add(1)
go func() {
var buf bytes.Buffer
wg.Done()
io.Copy(&buf, custReader)
out <- buf.String()
}()
wg.Wait()
// Call the function that writes to stdout
f()
// Close the writer to signal that we're done
_ = custWriter.Close()
// Wait for the goroutine to finish reading from the pipe
return <-out
}
This version does a few more things:
Captures everything: It redirects both os.Stdout and os.Stderr to ensure all
standard output streams are captured. It also explicitly redirects the standard log
package’s output, since log holds its own writer reference and isn’t affected by
reassigning the os.Stderr variable.
Prevents deadlocks: Output is read concurrently in a separate goroutine. This is
crucial because if f generates more output than the internal pipe buffer can hold,
writing would block without a concurrent reader, causing a deadlock.
Ensures reader readiness: A sync.WaitGroup guarantees the reading goroutine is active
before f starts executing. This prevents a potential race condition where initial output
could be lost if f writes before the reader is ready.
Guarantees cleanup: Using defer, the original os.Stdout and os.Stderr are always
restored, even if f panics. This prevents the function from permanently altering the
program’s standard output streams.
You’d use captureOut the same way as the naive captureStdout. This version is safer and
more complete, and works well when you’re testing CLI commands, log-heavy code, or anything
that might write to the terminal in unexpected ways.
It’s not a replacement for writing functions that accept io.Writer, but when you’re
dealing with existing code or want to quickly assert on terminal output, it gets the job
done.
You can’t run the teardown inside the helper itself because the test still needs the setup.
For example, in the following case, the helper runs its teardown immediately:
func TestFoo(t *testing.T) {
helper(t)
// Test logic here: resources may already be cleaned up!
}
func helper(t *testing.T) {
t.Helper()
// Setup code here.
// Teardown: this deferred cleanup runs when helper returns,
// not at the end of the test.
defer func() {
// Clean up something.
}()
}
When helper is called, it defers its teardown - which executes at the end of the helper
function, not the test. But the test logic still depends on whatever the helper set up. So
this approach doesn’t work.
The next working option is to move the teardown logic into the test itself:
func TestFoo(t *testing.T) {
helper(t)
// Run the teardown of helper.
defer func() {
// Clean up something.
}()
// Test logic here.
}
func helper(t *testing.T) {
t.Helper()
// Setup code here.
// No teardown here; we move it to the caller.
}
This works fine if you have only one helper. But with multiple helpers, it quickly becomes messy - you now have to manage multiple teardown calls manually, like this:
func TestFoo(t *testing.T) {
helper1(t)
helper2(t)
defer func() {
// Clean up helper2.
}()
defer func() {
// Clean up helper1.
}()
// Test logic here.
}
You also need to be careful with the order: defer statements are executed in LIFO
(last-in, first-out) order. So if teardown order matters, this can be a problem. Ideally,
your tests shouldn’t depend on teardown order - but sometimes they do.
So rather than manually handling cleanup inside the test, have helpers return a teardown
function that the test can defer itself. Here’s how:
func TestFoo(t *testing.T) {
teardown1 := helper1(t)
defer teardown1()
teardown2 := helper2(t)
defer teardown2()
// Test logic here.
}
func helper1(t *testing.T) func() {
t.Helper()
// Setup code here.
// Maybe create a temp dir, start a mock server, etc.
return func() {
// Teardown code here.
}
}
func helper2(t *testing.T) func() {
t.Helper()
// Setup code here.
return func() {
// Teardown code here.
}
}
Each helper is self-contained: it sets something up and returns a function to clean up
whatever resource it has spun up. The test controls when teardown happens by calling the
cleanup function at the appropriate time. Another benefit is that the returned teardown
closure has access to the local variables of the helper. So func() can access the helper’s
*testing.T without us having to pass it explicitly as a parameter.
Here’s how I’ve been using this pattern.
The setupTempFile helper creates a temporary file, writes some content to it, and returns
the file name along with a teardown function that removes the file.
func setupTempFile(t *testing.T, content string) (string, func()) {
t.Helper()
tmpFile, err := os.CreateTemp("", "temp-*.txt")
if err != nil {
t.Fatalf("failed to create temp file: %v", err)
}
if _, err := tmpFile.WriteString(content); err != nil {
t.Fatalf("failed to write to temp file: %v", err)
}
tmpFile.Close()
return tmpFile.Name(), func() {
if err := os.Remove(tmpFile.Name()); err != nil {
t.Errorf("failed to remove temp file %s: %v",
tmpFile.Name(), err)
} else {
t.Logf("cleaned up temp file: %s", tmpFile.Name())
}
}
}
In the main test:
func TestReadFile(t *testing.T) {
path, cleanup := setupTempFile(t, "hello world")
defer cleanup()
data, err := os.ReadFile(path)
if err != nil {
t.Fatalf("failed to read file: %v", err)
}
t.Logf("file contents: %s", data)
}
Running the test displays:
=== RUN TestReadFile
prog_test.go:18: file contents: hello world
prog_test.go:38: cleaned up temp file: /tmp/temp-30176446.txt
--- PASS: TestReadFile (0.00s)
PASS
Sometimes you want to test code that makes HTTP calls. Here’s a helper that starts an in-memory mock server and returns its URL and a cleanup function that shuts it down:
func setupMockServer(t *testing.T) (string, func()) {
t.Helper()
handler := http.HandlerFunc(
func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("mock response"))
},
)
server := httptest.NewServer(handler)
return server.URL, func() {
server.Close()
t.Log("mock server shut down")
}
}
And in the test:
func TestHTTPRequest(t *testing.T) {
url, cleanup := setupMockServer(t)
defer cleanup()
resp, err := http.Get(url)
if err != nil {
t.Fatalf("failed to make HTTP request: %v", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
t.Logf("response body: %s", body)
}
Running the test prints:
=== RUN TestHTTPRequest
prog_test.go:34: response body: mock response
prog_test.go:20: mock server shut down
--- PASS: TestHTTPRequest (0.00s)
PASS
In tests that hit a real (or test) database, you often need to create and drop tables. Here’s a helper that sets up a test table and returns a teardown function to drop it:
func setupTestTable(t *testing.T, db *sql.DB) func() {
t.Helper()
query := `CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY,
name TEXT
)`
_, err := db.Exec(query)
if err != nil {
t.Fatalf("failed to create table: %v", err)
}
return func() {
_, err := db.Exec(`DROP TABLE IF EXISTS users`)
if err != nil {
t.Errorf("failed to drop table: %v", err)
} else {
t.Log("dropped test table")
}
}
}
And the test:
func TestInsertUser(t *testing.T) {
db := getTestDB(t) // Opens test DB; defined elsewhere
cleanup := setupTestTable(t, db)
defer cleanup()
_, err := db.Exec(`INSERT INTO users (name) VALUES (?)`, "Alice")
if err != nil {
t.Fatalf("failed to insert user: %v", err)
}
}
P.S. I learned about this after the blog went live.
Go 1.14 added the t.Cleanup() method, which lets you avoid returning the teardown closures
from helper functions altogether. It also runs the cleanup logic in the correct order
(LIFO). So, you could rewrite the first example in this post as follows:
func TestFoo(t *testing.T) {
// The testing package will ensure that the cleanup runs at the end of
// this test function.
helper(t)
// Test logic here.
}
func helper(t *testing.T) {
t.Helper()
// We register the teardown logic with t.Cleanup().
t.Cleanup(func() {
// Teardown logic here.
})
}
Now the testing package will handle calling the cleanup logic in the correct order. You
can add multiple teardown functions like this:
t.Cleanup(func() {})
t.Cleanup(func() {})
The functions will run in LIFO order. Similarly, the database setup example can be rewritten like this:
func setupTestTable(t *testing.T, db *sql.DB) {
t.Helper()
// Logic as before.
// Instead of returning the teardown function, we register
// it with t.Cleanup().
t.Cleanup(func() {
_, err := db.Exec(`DROP TABLE IF EXISTS users`)
if err != nil {
t.Errorf("failed to drop table: %v", err)
} else {
t.Log("dropped test table")
}
})
}
Then the helper function is used like this:
func TestInsertUser(t *testing.T) {
db := getTestDB(t) // Opens a test DB connection; defined elsewhere.
// This sets up the DB, and t.Cleanup will execute the teardown
// logic once this test function finishes.
setupTestTable(t, db)
// Rest of the test logic.
}
Fin!
Go has always let you implement sort.Interface to sort the elements in a slice. Later, Go
1.8 introduced sort.Slice to reduce boilerplate with inline comparison functions. Most
recently, Go 1.21 brought generic sorting via the slices package, which offers a concise
syntax and compile-time type safety.
These days, I mostly use the generic sorting syntax, but I wanted to document all three approaches for posterity.
The oldest technique is based on sort.Interface. You create a custom type that wraps your
slice and implement three methods - Len, Less, and Swap - to satisfy the interface.
Then you pass this custom type to sort.Sort().
The following example defines an IntSlice type. Passing an IntSlice to sort.Sort
arranges its integers in ascending order:
import (
"fmt"
"sort"
)
// Define a custom IntSlice so that we can implement the sort.Interface
type IntSlice []int
// Len, Less, Swap are required to conform to sort.Interface
func (s IntSlice) Len() int { return len(s) }
func (s IntSlice) Less(i, j int) bool { return s[i] < s[j] }
func (s IntSlice) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func main() {
nums := IntSlice{4, 1, 3, 2}
sort.Sort(nums)
fmt.Println(nums) // [1 2 3 4]
}
To reverse the order, invert the comparison in the Less method and define a new type:
import (
"fmt"
"sort"
)
// Define a custom IntSlice for descending order sorting.
type DescIntSlice []int
func (s DescIntSlice) Len() int { return len(s) }
// Inverted comparison for descending order
func (s DescIntSlice) Less(i, j int) bool { return s[i] > s[j] }
func (s DescIntSlice) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func main() {
nums := DescIntSlice{4, 1, 3, 2}
sort.Sort(nums)
fmt.Println(nums) // [4 3 2 1]
}
Just reversing the order requires you to define a separate type and implement the three methods again!
Luckily, for the basic types, the sort package provides sort.IntSlice,
sort.Float64Slice, and sort.StringSlice - which already implement sort.Interface. So
you don’t have to do the above for sorting a slice of primitive elements. Instead, you can
do this:
ints := sort.IntSlice{4, 1, 3, 2}
floats := sort.Float64Slice{3.1, 2.7, 5.0}
strings := sort.StringSlice{"banana", "apple", "cherry"}
sort.Sort(ints) // ints: [1 2 3 4]
sort.Sort(floats) // floats: [2.7 3.1 5]
sort.Sort(strings) // strings: [apple banana cherry]
To reverse the order, you can use sort.Reverse as follows:
sort.Sort(sort.Reverse(ints)) // ints: [4 3 2 1]
sort.Sort(sort.Reverse(floats)) // floats: [5 3.1 2.7]
sort.Sort(sort.Reverse(strings)) // strings: [cherry banana apple]
However, if you’re dealing with a slice of structs, then you do have to implement
sort.Interface manually. Here, we sort by the Age field in ascending order:
import (
"fmt"
"sort"
)
type User struct {
Name string
Age int
}
type ByAge []User
func (s ByAge) Len() int { return len(s) }
func (s ByAge) Less(i, j int) bool { return s[i].Age < s[j].Age }
func (s ByAge) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func main() {
users := ByAge{
{"Alice", 32},
{"Bob", 27},
{"Carol", 40},
}
sort.Sort(users)
fmt.Println(users) // [{Bob 27} {Alice 32} {Carol 40}]
}
We can leverage sort.Reverse to reverse the order:
sort.Sort(sort.Reverse(users)) // [{Carol 40} {Alice 32} {Bob 27}]
Although sort.Interface can handle just about any sorting logic, you must create a new
custom type (or significantly modify an existing one) each time you want to sort a different
slice or the same slice in a different way. It’s powerful but verbose, and can be cumbersome
to maintain if you have many different sorts in your code.
Go 1.8 introduced sort.Slice to minimize the amount of boilerplate needed for sorting.
Instead of creating a new type and implementing three methods, you provide an inline
comparison function that receives the two indices you’re comparing.
Here’s a simple example that sorts floats in ascending order:
import (
"fmt"
"sort"
)
func main() {
floats := []float64{2.5, 0.1, 3.9, 1.2}
sort.Slice(floats, func(i, j int) bool {
return floats[i] < floats[j]
})
fmt.Println(floats) // [0.1 1.2 2.5 3.9]
}
Inverting the comparison sorts them in descending order:
import (
"fmt"
"sort"
)
func main() {
floats := []float64{2.5, 0.1, 3.9, 1.2}
sort.Slice(floats, func(i, j int) bool {
return floats[i] > floats[j] // Reverse the comp
})
fmt.Println(floats) // [3.9 2.5 1.2 0.1]
}
For structs, the inline comparator can access struct fields:
import (
"fmt"
"sort"
)
type User struct {
Name string
Age int
}
func main() {
users := []User{
{"Alice", 32},
{"Bob", 27},
{"Carol", 40},
}
sort.Slice(users, func(i, j int) bool {
return users[i].Age < users[j].Age
})
fmt.Println(users) // [{Bob 27} {Alice 32} {Carol 40}]
}
Switching > for < will reverse the sort:
import (
"fmt"
"sort"
)
type User struct {
Name string
Age int
}
func main() {
users := []User{
{"Alice", 32},
{"Bob", 27},
{"Carol", 40},
}
sort.Slice(users, func(i, j int) bool {
return users[i].Age > users[j].Age
})
fmt.Println(users) // [{Carol 40} {Alice 32} {Bob 27}]
}
While sort.Slice is much simpler than sort.Interface, it’s still not strictly type-safe:
the slice parameter is defined as an interface{}, and you provide a comparator that uses
indices. Go won’t necessarily stop you from doing something incorrect in the comparison at
compile time.
For example, this code compiles but will panic at runtime because other is referenced
inside the comparator of a different slice ints, and the indices i or j can go out of
bounds in other:
import (
"fmt"
"sort"
)
func main() {
ints := []int{3, 1, 2}
other := []int{10, 20}
sort.Slice(ints, func(i, j int) bool {
// Using 'other' here compiles, but i or j might be out of range.
return other[i] < other[j]
})
fmt.Println(ints)
}
You won’t find out you’ve made a mistake until runtime, when a panic occurs. There is no
compiler-enforced guarantee that the func(i, j int) bool actually compares two values of
the intended slice.
Note: In sort.Slice, the comparison function parameters i and j are indices.
Inside the function, you must reference slice[i] and slice[j] to get the actual elements
being compared.
Go 1.21 introduced the slices package, which provides generic sorting functions. These new
functions combine the convenience of sort.Slice with the ability to detect type errors at
compile time. For basic numeric or string slices that satisfy Go’s “ordered” constraints,
you can just call slices.Sort. For more complex or custom sorting, slices.SortFunc
accepts a comparator function that returns an integer (negative if a < b, zero if they’re
equal, and positive if a > b).
When you’re dealing with basic types like int, float64, or string, you can sort them
immediately using slices.Sort, which arranges them in ascending order:
import (
"fmt"
"slices"
)
func main() {
ints := []int{4, 1, 3, 2}
floats := []float64{2.5, 0.1, 3.9, 1.2}
slices.Sort(ints)
slices.Sort(floats)
fmt.Println(ints) // [1 2 3 4]
fmt.Println(floats) // [0.1 1.2 2.5 3.9]
}
For descending order, you can use slices.SortFunc and invert the usual comparison:
import (
"fmt"
"slices"
)
func main() {
ints := []int{4, 1, 3, 2}
floats := []float64{2.5, 0.1, 3.9, 1.2}
slices.SortFunc(ints, func(a, b int) int {
switch {
case a > b:
return -1
case a < b:
return 1
default:
return 0
}
})
slices.SortFunc(floats, func(a, b float64) int {
switch {
case a > b:
return -1
case a < b:
return 1
default:
return 0
}
})
fmt.Println(ints) // [4 3 2 1]
fmt.Println(floats) // [3.9 2.5 1.2 0.1]
}
When dealing with more complex structures, you can define precisely how two elements should be compared:
import (
"fmt"
"slices"
)
type User struct {
Name string
Age int
}
func main() {
users := []User{
{"Alice", 32},
{"Bob", 27},
{"Carol", 40},
}
slices.SortFunc(users, func(a, b User) int {
return a.Age - b.Age
})
fmt.Println(users) // [{Bob 27} {Alice 32} {Carol 40}]
}
To reverse the order, invert the numerical comparison:
import (
"fmt"
"slices"
)
type User struct {
Name string
Age int
}
func main() {
users := []User{
{"Alice", 32},
{"Bob", 27},
{"Carol", 40},
}
slices.SortFunc(users, func(a, b User) int {
switch {
case a.Age > b.Age:
return -1
case a.Age < b.Age:
return 1
default:
return 0
}
})
fmt.Println(users) // [{Carol 40} {Alice 32} {Bob 27}]
}
Note: Unlike sort.Slice, which passes indices to the comparison function,
slices.SortFunc passes the actual elements (a and b) to your comparator. Moreover,
the comparator must return an int (negative, zero, or positive), rather than a boolean.
One of the major benefits of the slices package is compile-time type safety, which you
don’t get with sort.Sort or sort.Slice. Those older APIs use interface{} parameters or
index-based comparators and don’t strictly verify that your comparator operates on the right
types.
As shown previously, you can accidentally reference a different slice in the comparator and
your code will compile but crash at runtime. By contrast, slices.Sort and
slices.SortFunc are fully generic. The compiler enforces that you pass a slice of a valid
type (e.g., []int, []string, or a custom struct slice), and that your comparator’s
signature matches the element type. This means you get errors at compile time instead of at
runtime.
For instance, if you attempt to pass an array instead of a slice:
import "slices"
func main() {
arr := [4]int{10, 20, 30, 40}
// compile-time error: cannot use arr (type [4]int) as []int
slices.Sort(arr)
}
Go will refuse to compile this code because arr is not a slice. Similarly, if your
comparator for slices.SortFunc returns a type other than int, the compiler will produce
an error. This helps you detect mistakes immediately, rather than discovering them at
runtime.
For a practical illustration, consider sorting a slice by a case-insensitive string field:
import (
"fmt"
"slices"
"strings"
)
type Animal struct {
Name string
Species string
}
func main() {
animals := []Animal{
{"Bob", "Giraffe"},
{"alice", "Zebra"},
{"Dave", "Elephant"},
}
// Sort by Name, ignoring case
slices.SortFunc(animals, func(a, b Animal) int {
aLower := strings.ToLower(a.Name)
bLower := strings.ToLower(b.Name)
switch {
case aLower < bLower:
return -1
case aLower > bLower:
return 1
default:
return 0
}
})
fmt.Println(animals)
// Output: [{alice Zebra} {Bob Giraffe} {Dave Elephant}]
}
Because your comparator expects an Animal for both a and b, you can’t accidentally
compare two different types or reference the wrong fields without hitting a compile-time
error.
Nil checks on interfaces in Go can surprise you: you expect the comparison to evaluate to true but get false instead.
Many moons ago, Russ Cox wrote a fantastic post on Go interface internals that clarified my confusion. This post is a distillation of my exploration of interfaces and nil comparisons.
Roughly speaking, an interface in Go has three components: a static type (what the
compiler sees), a dynamic type, and a dynamic value (what the interface holds at runtime).
For example:
var n any // The static type of n is any (interface{})
n = 1 // Upon assignment, the dynamic type becomes int
// And the dynamic value becomes 1
Here, the static type of n is any, which tells the compiler what operations are allowed
on the variable. In the case of any, any operation is allowed. When we assign 1 to n,
it adopts the dynamic type int and the dynamic value 1.
Internally, every interface value is implemented as a two-word structure: a pointer to
type information for the dynamic type, and a data word.
This data word might directly contain the value if it’s small enough, or it might hold a
pointer to the actual data. Note that this internal representation is distinct from the
interface’s declared or “static” type - the type you wrote in the code (any in the example
above). At runtime, what gets stored is only the pair of dynamic type and dynamic value.
Here’s a crude diagram:
+-----------------------+
| Interface |
+-----------------------+
| Pointer to type info | ---> [Dynamic type descriptor]
+-----------------------+
| Data | ---> [Dynamic value or pointer to the value]
+-----------------------+
Nil comparisons can be tricky because an interface value is considered nil only when both its dynamic type and dynamic value are nil. A few examples.
var p *int // p is a nil pointer of type *int
if p == nil {
fmt.Println("p is nil")
}
// Output: p is nil
Here, p is a pointer to an int and is explicitly nil, so the comparison works as expected.
This doesn’t have anything to do with explicit interfaces, but it’s important to demo basic
nil comparison to understand how comparisons work with interfaces.
var r io.Reader // The static type of r is io.Reader
r = nil // The dynamic type is nil
// The dynamic value is nil
// Since both the dynamic type and value evaluate to nil, r == nil is true
if r == nil {
fmt.Println("r is nil")
}
// Output: r is nil
In this case, r is directly set to nil. Since both the dynamic type and the dynamic value
are nil, the interface compares equal to nil.
var b *bytes.Buffer // b is a nil pointer of type *bytes.Buffer
var r io.Reader = b // The static type of r is io.Reader.
// The dynamic type of r is *bytes.Buffer.
// The dynamic value of r is nil.
// Although b is nil, r != nil because r holds type info (*bytes.Buffer).
if r == nil {
fmt.Println("r is nil")
} else {
fmt.Println("r is not nil")
}
// Output: r is not nil
Even though b is nil, assigning it to the interface variable r gives r a non-nil
dynamic type (*bytes.Buffer) with a nil dynamic value. Since r still holds type
information, r == nil returns false, even though the underlying value is nil.
When comparing an interface variable, Go checks both the dynamic type and the value. The variable evaluates to nil only if both are nil.
In cases where an interface variable might hold a nil pointer, we’ve seen that comparing the interface directly to nil may not yield the expected result.
A type assertion can help extract the underlying value so that you can perform a more reliable nil check. This approach is especially useful when you know the expected underlying type.
Below, we define a simple type myReader that implements the Read method to satisfy the
io.Reader interface.
type myReader struct{}

func (mr *myReader) Read(p []byte) (int, error) {
	return 0, nil
}
Now, consider the following example:
var mr *myReader     // mr is a nil pointer of type *myReader
var r io.Reader = mr // The static type of r is io.Reader
                     // The dynamic type of r is *myReader
                     // The dynamic value of r is nil
// Use a type assertion to extract the underlying *myReader value.
if underlying, ok := r.(*myReader); ok && underlying == nil {
	fmt.Println("r holds a nil pointer")
} else {
	fmt.Println("r does not hold a nil pointer")
}
// Output: r holds a nil pointer
Here, we assert that r holds a value of type *myReader. If the assertion succeeds
(indicated by ok being true) and the underlying value is nil, we can conclude that
the interface variable holds a nil pointer - even though the interface itself is not nil due
to its dynamic type.
This type assertion trick only works when you know the underlying type of the interface value. If the type might vary, consider using the reflect package to examine the underlying value.
The following function introspects any variable and checks whether it’s nil:
func isNil(i any) bool {
	if i == nil {
		return true
	}
	// Arrays are not nilable, so we skip reflect.Array.
	switch reflect.TypeOf(i).Kind() {
	case reflect.Ptr,
		reflect.Map,
		reflect.Chan,
		reflect.Slice,
		reflect.Func:
		return reflect.ValueOf(i).IsNil()
	}
	return false
}
The switch on .Kind() is necessary because calling reflect.ValueOf().IsNil() on a value
whose kind isn’t nilable (an int or a struct, for example) panics.
Calling this function on any value, including an interface, reliably checks whether it’s nil.
Fin!
The canonical way to modify HTTP requests and responses in Go is the middleware pattern:
you wrap handlers in functions, for example a middleware that intercepts requests to
/special to serve a custom response.
However, I often find the indirections introduced by this pattern a bit hard to read and debug. I recently came across the embedded delegation pattern while browsing Gin’s HTTP router source code. Here, I explore both patterns and explain why I usually start with delegation whenever I need to modify HTTP requests in my Go services.
Here’s an example where the logging middleware records each request, and the special
middleware intercepts requests to /special:
package main

import (
	"log"
	"net/http"
)

// loggingMiddleware logs incoming requests.
func loggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Println("Middleware: received request for", r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

// specialMiddleware intercepts requests for "/special" and handles them.
func specialMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/special" {
			w.Write([]byte("Special middleware handling request"))
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, world!"))
	})

	// Chain: logging wraps special handling, so logging runs first.
	handler := loggingMiddleware(specialMiddleware(mux))
	http.ListenAndServe(":8080", handler)
}
In this setup, every incoming request first passes through the logging middleware, which
logs the request details, and then through the special middleware, which checks for the
/special route. We’re effectively stacking the middleware functions.
If you hit the server with:
curl localhost:8080/
curl localhost:8080/special
the server logs will look like this:
2025/03/06 21:24:44 Middleware: received request for /
2025/03/06 21:24:47 Middleware: received request for /special
Stacking middleware functions like middleware3(middleware2(middleware1(mux))) can get
messy when you have many of them. That’s why people usually write a wrapper function to
apply the middlewares to the mux:
func applyMiddleware(
	handler http.Handler,
	middlewares ...func(http.Handler) http.Handler) http.Handler {
	// Apply middlewares in reverse order so the first one
	// listed becomes the outermost wrapper.
	for i := len(middlewares) - 1; i >= 0; i-- {
		handler = middlewares[i](handler)
	}
	return handler
}
applyMiddleware takes an http.Handler and a variadic list of middleware functions
(...func(http.Handler) http.Handler). It loops over the middleware in reverse order so
each one wraps the next properly. This avoids deep nesting like
middleware3(middleware2(middleware1(mux))) and keeps the middleware chain tidy.
You’d then use it like this:
func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, world!"))
	})

	// specialMiddleware wraps the mux first, so loggingMiddleware
	// ends up outermost and runs first.
	handler := applyMiddleware(mux, loggingMiddleware, specialMiddleware)
	http.ListenAndServe(":8080", handler)
}
This behaves just like the manual middleware stacking, but it’s a bit cleaner.
While this is the canonical way to handle request-response modifications in Go, it can sometimes be hard to reason about, especially when debugging or dealing with many middleware layers.
There’s another way to achieve the same result without dealing with a soup of nested functions. The next section talks about that.
Embedded delegation (or the delegation pattern) means you embed the standard HTTP
multiplexer inside your own struct and override its ServeHTTP method.
It’s a bit like inheritance - overriding a method in a subclass to add extra functionality and then delegating the call to the original method. Although Go doesn’t have a class hierarchy, you can still delegate responsibilities to the embedded type’s method.
The following example implements the same behavior - logging every request and intercepting
the /special route - directly within a custom mux:
package main

import (
	"log"
	"net/http"
)

// CustomMux embeds http.ServeMux to override ServeHTTP.
type CustomMux struct {
	*http.ServeMux
}

// ServeHTTP logs the request and intercepts "/special" before
// delegating to the embedded mux.
func (cm *CustomMux) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Log all requests.
	log.Println("CustomMux: received request for", r.URL.Path)

	// Handle "/special" differently.
	if r.URL.Path == "/special" {
		w.Write([]byte("Special handling in CustomMux"))
		return
	}
	cm.ServeMux.ServeHTTP(w, r)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, world!"))
	})

	// Wrap the standard mux with our custom delegation.
	customMux := &CustomMux{ServeMux: mux}
	http.ListenAndServe(":8080", customMux)
}
In this example, the custom mux centralizes both logging and special-case route handling
within one ServeHTTP method. This approach cuts out the extra function calls in a
middleware chain and can simplify tracking the request flow. I find it a bit easier on the
eyes too.
If you have a bunch of extra functionality to add inside cm.ServeHTTP, you can wrap them
in utility functions like this:
// logRequest logs incoming HTTP requests.
func logRequest(r *http.Request) {
	log.Println("CustomMux: received request for", r.URL.Path)
}

// handleSpecialRequest handles requests to "/special"
// and returns true if handled.
func handleSpecialRequest(w http.ResponseWriter, r *http.Request) bool {
	if r.URL.Path != "/special" {
		return false // Not handled, continue processing.
	}
	w.Write([]byte("Special handling in CustomMux"))
	return true // Handled; no further processing needed.
}
Then, simply call these functions inside your cm.ServeHTTP method:
func (cm *CustomMux) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	logRequest(r)
	if handleSpecialRequest(w, r) {
		return
	}
	cm.ServeMux.ServeHTTP(w, r)
}
This keeps all the request modifications in a single ServeHTTP method.
You can also mix both techniques. For example, you might use direct delegation for special route handling and then wrap the resulting handler with middleware for logging. Here’s how a hybrid solution might look:
package main

import (
	"log"
	"net/http"
)

// CustomMux embeds http.ServeMux and intercepts "/special".
type CustomMux struct {
	*http.ServeMux
}

// ServeHTTP intercepts "/special" and delegates other routes.
func (cm *CustomMux) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.URL.Path == "/special" {
		w.Write([]byte("Special handling in CustomMux"))
		return
	}
	cm.ServeMux.ServeHTTP(w, r)
}

// loggingMiddleware logs incoming requests.
func loggingMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Println("Middleware: received request for", r.URL.Path)
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, world!"))
	})

	// Use direct delegation for special routing.
	customMux := &CustomMux{ServeMux: mux}

	// Wrap the custom mux with logging middleware.
	handler := loggingMiddleware(customMux)
	http.ListenAndServe(":8080", handler)
}
In this hybrid approach, the specialized behavior (intercepting the /special path) is
handled via direct delegation, while logging stays modular as middleware. This gives you the
best of both worlds.
I usually start with the embedded delegation and gradually introduce the middleware pattern if I need it later. It’s easier to adopt the middleware pattern if you start with delegation than the other way around.
When I first started writing Go, I found the design of io.Reader a bit odd:
type Reader interface {
	Read(p []byte) (n int, err error)
}
Why take a byte slice and write data into it? Wouldn’t it be simpler to create the slice
inside Read, load the data, and return it instead?
// Hypothetical; what I *thought* it should be
Read() (p []byte, err error)
This felt more intuitive to me - you call Read, and it gives you a slice filled with data,
no need to pass anything.
I found out why it’s designed this way while watching this excellent GopherCon Singapore talk on understanding allocations by Jacob Walker. It mainly boils down to two reasons.
If Read created and returned a new slice every time, the memory would always end up on the
heap.
Heap allocations are slower because they require garbage collection, while stack allocations
are faster since they are freed automatically when a function returns. By taking a
caller-provided slice, Read lets the caller control memory and reuse buffers, keeping them
on the stack whenever possible.
This matters a lot when reading large amounts of data. If each Read call created a new
slice, you’d constantly be allocating memory, leading to more work for the garbage
collector. Instead, the caller can allocate a buffer once and reuse it across multiple
reads:
buf := make([]byte, 4096) // Single allocation
n, err := reader.Read(buf) // Read into existing buffer
Go’s escape analysis tool (go build -gcflags=-m) can confirm this. If Read returned a
new slice, the tool would likely show:
buf escapes to heap
meaning Go has to allocate it dynamically. By reusing a preallocated slice, we avoid
unnecessary heap allocations - but only if the buffer is small enough to fit on the stack.
How small? Only the compiler knows, and you shouldn’t depend on it; use the escape
analysis tool to check. Most of the time, though, you don’t need to worry about this at all.
The second issue is correctness. When reading from a stream, you usually call Read
multiple times to get all the data. If Read returned a fresh slice every time, you’d have
no control over memory usage across calls. Worse, you couldn’t efficiently handle partial
reads, making buffer management unpredictable.
With the hypothetical version of Read, every call would allocate a new slice. If you
needed to read a large stream of data, you’d have to manually piece everything together
using append, like this:
var allData []byte
for {
	buf, err := reader.Read() // New allocation every call
	if err != nil {
		break
	}
	allData = append(allData, buf...) // Growing slice, more allocs
}
process(allData)
This is a mess. Every time append runs out of space, Go will have to allocate a larger
slice and copy the existing data over, piling on unnecessary GC pressure.
By contrast, io.Reader’s actual design avoids this problem:
buf := make([]byte, 4096) // Allocate once
for {
	n, err := reader.Read(buf)
	if err != nil {
		break
	}
	process(buf[:n])
}
This avoids unnecessary allocations and produces less garbage for the GC to clean up.