EmilRex 2025-05-23T15:10:41+00:00 https://emilrex.com Emil Christensen On Tool Minimalism 2025-05-23T00:00:00+00:00 https://emilrex.com/2025/05/23/on-tool-minimalism <p>Reflecting on tools for development and broader organizational use often leads me to ‘tool minimalism.’ This approach fundamentally considers the ratio of value to complexity in one’s choices, aiming for high value with manageable complexity. A higher ratio is better, and pursuing it is the core of this minimalist stance.</p> <p>Value here isn’t primarily monetary (though savings can be a byproduct of simpler systems) but rather utility: problems solved, efficiency gained, or breadth of useful features. Complexity includes learning curve, cognitive load from context switching, integration efforts, and maintenance. Each added tool can increase this complexity, often with diminishing returns—an 80/20 dynamic where seeking that last bit of functionality can disproportionately complicate things, undermining a minimalist approach.</p> <p>This value-vs-complexity principle, while not a formula, is a conceptual model for tool minimalism. It combats tool sprawl by encouraging conscious tradeoffs—like sacrificing niche features for a simpler, integrated toolkit. The goal isn’t a vendor’s “walled garden,” but a “manageable garden” of well-chosen, interoperable tools. This post muses on a few tools that seem to fit this philosophy.</p> <h2 id="github">GitHub</h2> <p>GitHub excels at version control, its primary role. Its true value for tool minimalism, however, lies in integrating project management (Issues) and CI/CD (Actions). This consolidation, useful even for non-engineers, reduces separate tooling and centralizes discussions. 
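</p> <p>As a small illustration of the CI/CD side, an Actions workflow is just a YAML file checked into the repository alongside the code; the workflow name and steps below are an illustrative sketch, not a prescribed setup:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code># .github/workflows/ci.yml (illustrative)
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes the project exposes a test command; swap in your own.
      - run: make test
</code></pre></div></div> <p>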
Keeping these functions with the codebase lessens friction and system complexity, offering a favorable balance of value and complexity and covering much of the development lifecycle and project oversight in one place.</p> <h2 id="google-workspace">Google Workspace</h2> <p>Google Workspace also exemplifies this. Beyond email, its suite (Drive, Docs, Sheets) offers a unified system for documents, storage, collaboration, and user management. For a company, this can be powerful, establishing core infrastructure (accounts, email, file sharing, productivity tools) under one domain. This integration is efficient. The connection to GCP is also practical, as GCP is a relatively straightforward cloud platform for more advanced needs.</p> <h2 id="supabase">Supabase</h2> <p>Supabase builds on PostgreSQL (a solid relational database choice) by layering commonly needed backend features—authentication, real-time subscriptions, storage—on top. This significantly cuts initial project setup time and effort.</p> <h2 id="modal">Modal</h2> <p>Modal specializes in simplifying cloud function and application deployment/hosting—feeling like the serverless ideal. For Python scripts, ML models, or smaller web services, it abstracts much underlying infrastructure. This developer experience focus means less server management and more coding.</p> <h2 id="closing-thoughts">Closing Thoughts</h2> <p>These tools exemplify a principle I value: a favorable ratio of value to complexity. Adopting such tools reflects a minimalist approach, prioritizing a high return of utility and efficiency for the investment in learning and integration. 
This often means strategically limiting systems, a tradeoff that, in my view, pays dividends in maintainability, ease of use, and lower operational overhead.</p> PoC vs MVP 2024-11-14T00:00:00+00:00 https://emilrex.com/2024/11/14/poc-vs-mvp <p>I’ve noticed that engineers — myself included — often conflate Proof of Concept (PoC) with Minimum Viable Product (MVP), but these serve fundamentally different purposes in the development process.</p> <p>A PoC is fundamentally about code. It’s the process of writing enough code to demonstrate that a technical approach is feasible. The code quality doesn’t matter, the user experience is irrelevant, and the implementation may be completely discarded afterward. What matters is proving that something can be built. Engineers are naturally drawn to PoCs because they align with our inclination to solve technical problems.</p> <p>An MVP, by contrast, is about demonstrating value to users. The technical implementation is secondary and may even be partially simulated or manually operated. The goal isn’t to prove that something works — it’s to validate that users actually need and want the solution. An MVP might involve technical shortcuts or manual processes behind the scenes, as long as it allows users to experience the core value proposition.</p> <p>This distinction matters because it influences where we focus our efforts. When building a PoC, we focus purely on technical validation. When creating an MVP, we focus on user value validation. The confusion between these two concepts often leads us to over-engineer MVPs or create PoCs that aren’t actually useful.</p> <p>The essence of this difference is simple: a PoC proves you can build something, while an MVP proves you should build it. 
Understanding this helps teams focus their efforts appropriately at each stage of product development.</p> Advanced Docker Compose Features 2024-07-09T00:00:00+00:00 https://emilrex.com/2024/07/09/advanced-docker-compose-features <p>I recently set up a Prefect environment using Docker Compose and wanted to document some of the slightly more advanced features I used. This setup deploys a PostgreSQL database, Prefect server (API and front end), and two workers - one for running flows as subprocesses and another for running them as Docker containers.</p> <p>While this configuration is specific to Prefect, the overall structure and many of the techniques are widely applicable. The pattern of bringing up a database alongside one or more application services is common in many Docker Compose stacks. As such, the advanced features explored here can be valuable in a variety of containerized deployments, from web applications to microservices architectures.</p> <p>The full Docker Compose file, along with the rest of the code, is available in <a href="https://github.com/EmilRex/prefect-docker-compose">this repo</a>.</p> <p>Here are the key Docker Compose features I found particularly useful:</p> <h3 id="1-yaml-anchors-and-aliases">1. 
YAML Anchors and Aliases</h3> <p>YAML anchors allow reusing common configuration across multiple services:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">x-prefect-common</span><span class="pi">:</span> <span class="nl">&amp;prefect-common</span> <span class="na">build</span><span class="pi">:</span> <span class="na">context</span><span class="pi">:</span> <span class="s">.</span> <span class="na">networks</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">prefect</span> <span class="na">restart</span><span class="pi">:</span> <span class="s">always</span> <span class="na">services</span><span class="pi">:</span> <span class="na">server</span><span class="pi">:</span> <span class="na">&lt;&lt;</span><span class="pi">:</span> <span class="nv">*prefect-common</span> <span class="na">command</span><span class="pi">:</span> <span class="s">prefect server start --host server --port </span><span class="m">4200</span> <span class="c1"># Additional server-specific config</span> </code></pre></div></div> <p>The <code class="language-plaintext highlighter-rouge">&amp;prefect-common</code> anchor defines a set of common configurations. The <code class="language-plaintext highlighter-rouge">&lt;&lt;: *prefect-common</code> syntax then merges these common configs into each service that needs them. This keeps the compose file DRY and easier to maintain.</p> <h3 id="2-health-checks">2. 
Health Checks</h3> <p>The compose file defines detailed health checks for services:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">healthcheck</span><span class="pi">:</span> <span class="na">test</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">CMD"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">curl"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">-f"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">http://server:4200/api/health"</span><span class="pi">]</span> <span class="na">start_interval</span><span class="pi">:</span> <span class="s">5s</span> <span class="na">start_period</span><span class="pi">:</span> <span class="s">15s</span> </code></pre></div></div> <p>The <code class="language-plaintext highlighter-rouge">start_interval</code> specifies a shorter interval between health checks during the initial startup period, so the service is marked healthy sooner. The <code class="language-plaintext highlighter-rouge">start_period</code> defines a grace period during which failing checks don’t count against the container’s health. Together they allow more nuanced control over service initialization, balancing quick startup with sufficient time for services to initialize.</p> <h3 id="3-startup-order">3. 
Startup Order</h3> <p>Health checks are used in combination with <code class="language-plaintext highlighter-rouge">depends_on</code> to ensure proper service startup order:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">docker-worker</span><span class="pi">:</span> <span class="na">&lt;&lt;</span><span class="pi">:</span> <span class="nv">*prefect-common</span> <span class="na">depends_on</span><span class="pi">:</span> <span class="na">server</span><span class="pi">:</span> <span class="na">condition</span><span class="pi">:</span> <span class="s">service_healthy</span> </code></pre></div></div> <p>This configuration ensures that the <code class="language-plaintext highlighter-rouge">docker-worker</code> service only starts after the <code class="language-plaintext highlighter-rouge">server</code> service is not just running, but actually healthy according to its defined health check.</p> <h3 id="4-docker-in-docker">4. Docker-in-Docker</h3> <p>The Docker worker service includes a specific mount to enable interaction with the host’s Docker daemon:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="s">/var/run/docker.sock:/var/run/docker.sock</span> </code></pre></div></div> <p>This mount allows the worker to call the Docker daemon on the host system, enabling it to spin up new containers for flow runs. This is crucial for the Docker worker’s ability to create and manage containers for Prefect flows.</p> Rediscovering Foreign Keys in DWs 2024-07-06T00:00:00+00:00 https://emilrex.com/2024/07/06/rediscovering-foreign-keys-in-dws <p>Foreign keys are standard in transactional databases but rarely used in data warehouses. 
This isn’t just about database design - it’s a missed chance to improve data discovery and use.</p> <p>In data warehouses, foreign keys matter for metadata, not constraints. Unlike transactional databases where it’s typical for a single team to manage the schema, data warehouses mix data from many sources with different structures. The connections between these datasets exist, but they’re not explicit. They’re just in the minds of data engineers.</p> <p>This presents two challenges. First, finding data connections takes too long. Second, you can’t automate anything based on these hidden relationships. Take joining Stripe data with internal data. There’s probably a shared customer ID column, but you have to know it exists. I’ve spent hours searching table schemas for the right join key, sometimes finding that the ID I thought was right (<code class="language-plaintext highlighter-rouge">customer_id</code>) only works sometimes, and we actually join on something else (<code class="language-plaintext highlighter-rouge">subscription_id</code>).</p> <p>With proper foreign keys defined, you could see these connections quickly. I’d love to see a network graph of a data warehouse with tables as nodes and foreign keys as edges. It could make for a great user experience when joining tables.</p> <p>But how do we add foreign keys to existing data warehouses? The system’s there - most support foreign keys in the schema. But how do we find the actual relationships? Should we manually encode them, like refs in dbt? Should we discover them based on metadata, then have humans check and approve?</p> <p>Where should this information live? In an existing data catalog? As a standalone tool? Should dbt get involved? Could a tool like Fivetran propagate foreign keys as part of ETL? Probably not, as we need a whole-warehouse approach, not a source-by-source one. 
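</p> <p>The metadata-driven, human-in-the-loop idea can be sketched in a few lines. The schema and the naming heuristic below (matching <code class="language-plaintext highlighter-rouge">&lt;table&gt;_id</code> columns to other tables’ <code class="language-plaintext highlighter-rouge">id</code> columns) are illustrative assumptions, not a real warehouse; a real version would read column metadata from the warehouse’s information schema:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch of metadata-based foreign key discovery for human review.
# Table names, column names, and the heuristic are illustrative.

def suggest_foreign_keys(schema):
    """schema: dict of table name -> list of column names.
    Returns (from_table, from_column, to_table, to_column) candidates."""
    candidates = []
    for table, columns in schema.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            base = column[: -len("_id")]  # "customer_id" -> "customer"
            # Try the bare name and a naive plural as the target table.
            for target in (base, base + "s"):
                if target in schema and target != table and "id" in schema[target]:
                    candidates.append((table, column, target, "id"))
    return candidates

schema = {
    "customers": ["id", "email"],
    "subscriptions": ["id", "customer_id", "plan"],
    "invoices": ["id", "customer_id", "subscription_id"],
}

for candidate in suggest_foreign_keys(schema):
    print(candidate)  # e.g. ('subscriptions', 'customer_id', 'customers', 'id')
</code></pre></div></div> <p>Each suggestion would go to a human for approval before being written back to the warehouse as a foreign key.</p> <p>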
After all, the whole point is to link disparate sources with no knowledge of one another.</p> <p>I think there’s room for an open-source tool for curating foreign keys. It could run as a service, regularly check for schema changes, suggest relationships for humans to approve, and write them back to the data warehouse.</p> <p>There’s also potential to use these encoded foreign keys to help write SQL, but that feels like an entirely separate tool.</p> <p>The bigger question remains: how do we seed foreign keys in all the data warehouses that don’t have them now? It’s a challenge, but solving it could greatly improve how we work with data warehouses.</p> <p><em>I’ve had these thoughts about foreign keys in data warehouses for some years now, but getting them into a coherent written form always felt daunting. With the help of LLMs, I was finally able to put this post together (in a way that felt easy enough). This piece was co-written by an AI assistant, Claude. It’s a blend of my ideas and experiences with Claude’s ability to organize and articulate.</em></p> Hello World 2022-02-05T00:00:00+00:00 https://emilrex.com/2022/02/05/hello-world <p>Hello World! Mighty empty in here…</p>