Your Codebase Has a String Problem — And It’s Costing You at Scale

A string can hold anything — a name, a UUID, a JSON blob, a typo, a SQL injection payload. That’s exactly the problem. When business concepts like TransactionId or EmailAddress live as raw strings, the type system becomes useless and every function turns into a defensive wall of manual checks. Primitive Obsession is what happens when developers use primitive types — strings, integers, booleans — to represent domain concepts that deserve their own structure.


TL;DR: Quick Takeaways

  • A raw string parameter bypasses the compiler entirely — swapping arguments causes silent data corruption, not a build error
  • Value Objects eliminate validation duplication by centralizing rules in one constructor, called once at the system boundary
  • Magic status strings like "pending" are typo-prone and non-exhaustive — discriminated unions give the compiler full visibility over state transitions
  • Parsing strings inside hot loops allocates redundant objects — parse once at the entry point, pass structured types through business logic

The Type System You’re Bypassing Every Time You Use a Raw String

When a function accepts string, it accepts everything. From the compiler’s perspective, "", "pizza", and "[email protected]" are identical — they’re all just string. This creates what DDD practitioners call a semantic void: a parameter that could mean anything forces every caller to assume responsibility for correctness. In large codebases, that assumption breaks down fast. Validation logic gets copy-pasted across Auth, Billing, and Notifications. When the rules change — a new email TLD, a different UUID format — you’re hunting through twenty files instead of changing one constructor.

Validation Hell: When Every Function Becomes a Policeman

The most visible symptom of Primitive Obsession is validation scattered across the codebase. Every function that accepts a raw string email has to verify it — or trust that the caller already did. Neither option scales. The fix is moving validation responsibility into the type itself: if an EmailAddress object exists, it’s already valid. You don’t check it again.

// BAD: validation duplicated at every call site
function registerUser(email: string, countryCode: string) {
    if (!email.includes('@') || email.length < 5) throw new Error("Invalid email");
    if (countryCode.length !== 2) throw new Error("Invalid ISO code");
    db.save({ email, country: countryCode.toUpperCase() });
}

// GOOD: type constructor is the single validation point
class EmailAddress {
    constructor(private readonly val: string) {
        if (!val.match(/^[^\s@]+@[^\s@]+\.[^\s@]+$/))
            throw new Error("Invalid email format");
    }
    get value() { return this.val.toLowerCase(); }
}

function registerUser(email: EmailAddress, country: CountryCode) {
    db.save({ email: email.value, country: country.iso });
}

The second version has zero defensive checks inside registerUser. The EmailAddress constructor runs once — at the system boundary — and from that point the type is a guarantee. If validation rules change, you update one class. Every consumer gets the fix automatically.

The Argument Swap Trap: Silent Corruption at Runtime

Three string parameters in one function signature are an invitation to silent data corruption. A function like assignProject(userId, projectId, managerId) — all UUIDs, all strings — will compile and run just fine when arguments are swapped. No exception, no stack trace. The database quietly stores corrupted relationships and the logs look clean. This category of bug survives code review, unit tests, and QA because nothing technically breaks — it just links the wrong data.

// BAD: compiler can't distinguish three identical string types
function transferFunds(fromId: string, toId: string, amount: string, currencyId: string) {
    // swapping fromId and toId compiles fine — and destroys data silently
}
transferFunds(targetAccount, sourceAccount, "500", "USD"); // wrong order, no error

// GOOD: branded types make argument swaps a compile error
type AccountId = string & { readonly __brand: "AccountId" };
type Money = { amount: number; currency: "USD" | "EUR" | "GBP" };

function transferFunds(from: AccountId, to: AccountId, amount: Money) {
    // passing a UserId where AccountId is expected: build fails immediately
}

Branded types in TypeScript have zero runtime cost — they’re erased during compilation. The __brand property exists only in the type system. What you get is compiler-enforced argument ordering with no performance overhead. On the JVM side, Kotlin’s value class achieves the same result with inline optimization.
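One detail the snippet above leaves implicit is how a branded value gets created in the first place, since a plain string is not assignable to AccountId without a cast. A minimal sketch of a "smart constructor" that confines the cast to one validated spot (the helper name asAccountId and the UUID check are assumptions for illustration, not part of the example above):

```typescript
type AccountId = string & { readonly __brand: "AccountId" };

// Hypothetical smart constructor: the only place the cast is allowed
function asAccountId(raw: string): AccountId {
    // Loose UUID shape check, assumed for this sketch
    if (!/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(raw))
        throw new Error("Invalid account id: " + raw);
    return raw as AccountId; // cast is safe here: validation just ran
}

const from = asAccountId("3f2c8a1e-0000-4000-8000-000000000000");
// const bad: AccountId = "raw string"; // compile error without the cast
```

Because the cast lives in exactly one function, the rest of the codebase can only obtain an AccountId through validation — the brand is unforgeable by convention, even though it costs nothing at runtime.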


State Management Without Magic Strings

Status fields are where Primitive Obsession gets genuinely dangerous. Using "pending", "active", "deleted" as raw strings means the compiler has no idea which values are valid — or whether your checks are exhaustive. A typo like "pnding" doesn’t throw an error; it silently fails to match any branch and falls through to fallback logic that was never intended to run. In order processing, payment flows, or anything with state machines, this pattern produces the most confusing production incidents: no exception thrown, no obvious error, just wrong behavior in a code path that shouldn’t have been reached.

Discriminated Unions: Exhaustive State by Design

The fix for magic status strings isn’t adding more null checks — it’s making invalid states unrepresentable. Discriminated unions give the compiler full visibility over every possible state, and TypeScript’s exhaustive checking will surface a compile error the moment you add a new status without handling it everywhere. This is the kind of safety that scales: adding "refunded" to OrderStatus immediately breaks every switch statement that doesn’t handle it.

// BAD: arbitrary strings, no exhaustiveness, typos are silent
function processOrder(order: any) {
    if (order.status === "paid") shipOrder(order);
    else if (order.status === "pnding") wait(order); // typo — never matches, pending orders are silently skipped
}

// GOOD: discriminated union with exhaustive switch
type OrderStatus = "paid" | "pending" | "shipped" | "refunded";

function processOrder(order: { status: OrderStatus }) {
    switch (order.status) {
        case "paid":     return ship(order);
        case "pending":  return wait(order);
        case "shipped":  return track(order);
        case "refunded": return notify(order);
        default: {
            // exhaustiveness check: assigning to never fails to compile
            // the moment a new status is added without a case here
            const unreachable: never = order.status;
            return unreachable;
        }
    }
}

The union type narrows automatically inside each case branch — order.status is literally "paid" inside the first case, not just string. This means you can add properties specific to each state variant without casting, and the compiler tracks correctness through every branch.
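The real payoff of per-variant narrowing shows up when each state carries different data. A short sketch (the Order shape and field names here are illustrative assumptions, not taken from the example above):

```typescript
// Each variant carries its own data; the status field discriminates
type Order =
    | { status: "paid"; paymentId: string }
    | { status: "refunded"; refundReason: string }
    | { status: "pending" };

function describe(order: Order): string {
    switch (order.status) {
        case "paid":
            return `paid via ${order.paymentId}`;      // paymentId visible only in this branch
        case "refunded":
            return `refunded: ${order.refundReason}`;  // refundReason visible only here
        case "pending":
            return "awaiting payment";
    }
}
```

Accessing order.paymentId outside the "paid" branch is a compile error — the variant-specific field simply does not exist on the other union members.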

Performance and Security: The Hidden Costs of Stringly-Typed Code

Beyond correctness, Primitive Obsession has measurable runtime costs that appear at scale. In V8 (Node.js) and the JVM, strings are heap-allocated objects. Parsing the same date string or UUID repeatedly inside a high-frequency loop doesn’t just add CPU cycles — it generates garbage that the GC has to collect. A loop over 50,000 records that creates a new Date() from a string field on every iteration generates 50,000 short-lived objects. Replace that with a structured type parsed once at the entry point and the allocation disappears entirely.


Parse Once at the Boundary, Pass Types Through Logic

The architectural rule is simple: strings are for transport. API responses, database rows, and config files all use strings because they cross serialization boundaries. The moment data enters your business logic, it should be a structured type — and stay that way. Every utility function that re-parses a string it received as a parameter is a symptom of a missing domain object.

// BAD: re-parsing the same string in every utility function
function getDay(dateStr: string)  { return new Date(dateStr).getDay(); }
function getYear(dateStr: string) { return new Date(dateStr).getFullYear(); }
// 50,000 records × 2 functions = 100,000 Date allocations per batch

// GOOD: parse once at the system boundary, pass the object
class DomainDate {
    private constructor(private readonly d: Date) {}
    static parse(raw: string): DomainDate {
        const d = new Date(raw);
        if (isNaN(d.getTime())) throw new Error("Invalid date: " + raw);
        return new DomainDate(d);
    }
    get day()  { return this.d.getDay(); }
    get year() { return this.d.getFullYear(); }
}

const date = DomainDate.parse(input); // one allocation, one parse
getInfo(date); // no strings, no re-parsing, no GC pressure

At 50,000 records this pattern cuts the batch from 100,000 Date allocations to 50,000: one parse per record instead of one per utility call. The GC pressure difference is measurable in production profilers — especially in Node.js environments where heap allocation rate directly affects latency percentiles.

Security: Strings Don’t Know If They’re Tainted

Security vulnerabilities follow the same pattern as correctness bugs: when everything is a string, there’s no way to distinguish a safe constant from a user-supplied payload. SQL injection and XSS attacks happen at the exact moment raw input strings are concatenated into sensitive contexts. A typed system that distinguishes SanitizedInput from RawUserInput makes this mistake structurally impossible — the build fails before the injection vector exists.

// BAD: user input treated as a safe string fragment
const query = "SELECT * FROM users WHERE username = '" + req.body.user + "'";
db.rawExecute(query); // classic SQL injection vector

// GOOD: sanitized type prevents unsafe concatenation
class SafeId {
    private constructor(readonly value: string) {}
    static from(raw: unknown): SafeId {
        if (typeof raw !== 'string' || !/^[a-z0-9_-]+$/i.test(raw))
            throw new Error("Invalid identifier");
        return new SafeId(raw);
    }
}

const safeId = SafeId.from(req.body.user);
db.execute("SELECT * FROM users WHERE id = ?", [safeId.value]);

The SafeId constructor enforces a whitelist at parse time. Anything that doesn’t match the pattern throws before reaching the query. The parameterized query handles escaping. Two layers of defense, both enforced by types — not by developer discipline on every call site.

Value Objects: The Architectural Pattern That Fixes All of This

Every fix described above is an instance of the same pattern: the Value Object. A Value Object is a domain concept represented as an immutable object whose equality is based on its value, not its identity. It encapsulates both the data and the rules that govern it. Money isn’t a number — it’s an amount and a currency that knows how to add itself to another Money of the same currency and throw if they differ. EmailAddress isn’t a string — it’s a validated, normalized email that exists only if it passed construction.
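The Money behavior described above can be sketched as a minimal Value Object (a sketch only; rounding, precision, and a full currency list are omitted):

```typescript
// Minimal Money Value Object: immutable fields, currency-safe addition,
// equality based on value rather than reference identity
class Money {
    constructor(
        readonly amount: number,
        readonly currency: "USD" | "EUR" | "GBP",
    ) {}

    add(other: Money): Money {
        if (other.currency !== this.currency)
            throw new Error(`Cannot add ${other.currency} to ${this.currency}`);
        return new Money(this.amount + other.amount, this.currency);
    }

    equals(other: Money): boolean {
        return this.amount === other.amount && this.currency === other.currency;
    }
}
```

Note that add returns a new Money rather than mutating either operand — immutability is what lets two instances with the same amount and currency be treated as interchangeable values.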

| Approach     | Validation location | Argument swap safety     | Refactor cost           | Compiler support    |
|--------------|---------------------|--------------------------|-------------------------|---------------------|
| Raw string   | Every call site     | None — silent corruption | High — scattered checks | Zero                |
| Branded type | Cast point only     | Compile error on swap    | Low — type aliases      | Full (TS/Kotlin)    |
| Value Object | Constructor once    | Compile error on swap    | Minimal — one class     | Full + domain logic |

Applying Value Objects at the System Boundary

The transition from stringly-typed code to Value Objects doesn’t require a full rewrite. The pattern is always the same: identify where strings enter the system (API handlers, DB mappers, config parsers), replace the raw string with a Value Object constructor at that point, and let types propagate inward. Everything downstream gets a validated, structured object — never a raw string that might be anything.

// Step 1: identify the boundary — API handler receives raw input
app.post('/transfer', (req, res) => {
    // Step 2: construct Value Objects at the entry point
    const from     = AccountId.from(req.body.fromId);
    const to       = AccountId.from(req.body.toId);
    const amount   = Money.of(req.body.amount, req.body.currency);

    // Step 3: business logic receives types, not strings
    transferService.execute(from, to, amount);
    res.json({ ok: true });
});

// transferService.execute never sees a string — only validated domain types
function execute(from: AccountId, to: AccountId, amount: Money): void {
    ledger.debit(from, amount);
    ledger.credit(to, amount);
}

The boundary pattern means your domain logic is permanently isolated from raw input. If the API changes its field names, you update one mapper. If validation rules change, you update one constructor. The rest of the codebase is unaffected because it never touched a string in the first place.


FAQ

What is Primitive Obsession and why does it matter in production code?

Primitive Obsession is the code smell where domain concepts — email addresses, currency amounts, user identifiers — are represented as raw primitive types like string or number instead of dedicated types. It matters in production because it systematically bypasses the compiler: argument swaps, validation gaps, and typo-prone status strings all become runtime failures instead of build errors. At scale, the maintenance cost compounds — validation logic gets duplicated across dozens of files, and a single rule change requires hunting through the entire codebase.


Are Value Objects a performance overhead compared to raw strings?

In most runtimes the overhead is negligible or zero. Branded types in TypeScript are fully erased at compile time — they add no runtime cost whatsoever. Kotlin’s value class is inlined by the compiler and generates the same bytecode as a primitive. Regular class-based Value Objects do add one object allocation per instance, but that allocation happens once at the boundary — not on every function call. Compared to re-parsing strings repeatedly inside business logic, Value Objects typically reduce allocations, not increase them.

How is a Value Object different from a regular DTO or data class?

A DTO is a passive data container — it holds fields with no rules attached. A Value Object encapsulates both data and the invariants that must always hold true for that data. An EmailAddress Value Object can never be constructed with an invalid value; a DTO with an email: string field can hold anything. Value Object equality is based on value — two Money(100, "USD") instances are equal regardless of reference identity. DTOs typically use reference equality by default.

What’s the right place to validate input — constructor or factory method?

Both are valid patterns with different trade-offs. A throwing constructor is simpler and ensures the object can never exist in an invalid state — any failure is immediate and explicit. A factory method (like EmailAddress.tryParse()) can return null or a Result type instead of throwing, which is more ergonomic in contexts where invalid input is expected (form validation, file parsing). For domain objects that should always be valid, throwing constructors are the idiomatic choice. For boundary parsing where failure is routine, factory methods with typed error returns are cleaner.
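The factory-method variant can be sketched like this (the ParseResult shape is an assumption for illustration; any Result/Either type works the same way):

```typescript
// Hypothetical Result type: success carries the value, failure carries a message
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

class EmailAddress {
    private constructor(readonly value: string) {}

    // Boundary-friendly factory: invalid input is an expected outcome, not an exception
    static tryParse(raw: string): ParseResult<EmailAddress> {
        if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(raw))
            return { ok: false, error: "Invalid email format" };
        return { ok: true, value: new EmailAddress(raw.toLowerCase()) };
    }
}

const parsed = EmailAddress.tryParse("[email protected]");
if (parsed.ok) {
    console.log(parsed.value.value); // safe: the union is narrowed to success here
}
```

Because the constructor stays private, both styles preserve the core invariant: an EmailAddress instance cannot exist without passing validation.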

How do discriminated unions compare to enums for status fields?

Enums are better for closed, stable sets of values — HTTP methods, day-of-week, fixed configuration options. Discriminated unions in TypeScript shine when each variant carries different data: a paid order has a paymentId, a refunded order has a refundReason, a pending order has neither. Unions let you attach variant-specific fields and the compiler narrows the type inside each case branch automatically. For pure status flags with no associated data, enums are simpler. For state machines with per-state data, discriminated unions are structurally more expressive.

Can this pattern be applied incrementally to an existing codebase?

Yes — and incremental adoption is the only realistic approach for most production systems. Start with the highest-risk parameters: function signatures with multiple string arguments of the same conceptual type (user IDs, account IDs), status fields that drive branching logic, and any string that gets parsed more than once. Introduce Value Objects at the API and database boundaries first, then let types propagate inward naturally as you touch adjacent code. You don’t need to rewrite everything — even replacing three string parameters with typed wrappers in a critical payment flow eliminates an entire class of silent corruption bugs from that path.
