Programming – Code by Tom https://codebytom.blog

Why “quick” code reviews are hurting you and your team https://codebytom.blog/2026/01/why-quick-code-reviews-are-hurting-you-and-your-team/ Thu, 15 Jan 2026 10:18:25 +0000

We’ve all been there: a notification pops up, you glance at the code, it looks “fine enough,” and you hit Approve.

I recently caught myself giving a rushed approval, and it forced me to realize that these “rubber-stamp” reviews aren’t just a personal lapse; they are a systemic risk. As AI tools increasingly assist in writing our code, the ability to perform deep, intentional code reviews is becoming one of the most critical skills in an engineer’s toolkit.

We are entering a world where the volume of code being produced is exploding, thanks to LLMs. While AI is great at generating syntax, it often misses the nuance of edge cases, architectural consistency, and long-term maintainability.

This is where you set yourself apart. In the age of AI, the “coder” is common, but the “discerning reviewer” is rare. If you treat reviews as a chore to be cleared, you are essentially outsourcing your team’s quality to an algorithm. Taking the time to truly understand the why behind every line of code is what transforms you from a code-writer into a high-level engineer.

Reviewing code is a competitive advantage

Ultimately, a code review is one of the best ways for a team to grow. It’s where trade-offs are discussed, different approaches are compared, and senior-level thinking is shared.

In a world saturated with AI-generated code, the engineers who take the time to deeply understand, test, and critique code are the ones who will greatly benefit themselves, the team and the product.

Tracking Record Divergence with Content Hashing https://codebytom.blog/2025/10/tracking-record-divergence-with-content-hashing/ Fri, 24 Oct 2025 07:52:52 +0000

When users duplicate records in a database, tracking whether copies have diverged from their source becomes challenging. A content hash provides a simple mechanism to detect when duplicates no longer match their parent.

The Problem

Consider an email_templates table:

CREATE TABLE email_templates (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  subject VARCHAR(255),
  body TEXT,
  footer TEXT,
  hash VARCHAR(64),
  parent_id INT NULL
);
SQL

User A creates a template (id: 1, parent_id: null). Then User B duplicates this template (id: 2, parent_id: 1). Both records are identical.

User A then updates their template’s subject to “Welcome aboard!”. The parent_id on User B’s record still points to id: 1, but the records no longer match since the subjects are different.

Content Hashing

A hash function converts variable-length input into a fixed-length string. SHA-256 produces a 64-character hexadecimal output. Identical inputs always produce identical hashes; even a single-character change produces a completely different output.

const crypto = require('crypto');

function generateHash(data) {
  return crypto
    .createHash('sha256')
    .update(JSON.stringify(data))
    .digest('hex');
}
JavaScript

Implementation

Whenever a user creates or updates a record, you take all (or some) of that record’s data, produce a hash from it, and store the hash against the record.

async function updateEmailTemplate(id, updates) {
  const hashData = {
    subject: updates.subject,
    body: updates.body,
    footer: updates.footer
  };
  
  const hash = generateHash(hashData);
  
  await db.query(
    'UPDATE email_templates SET subject = ?, body = ?, footer = ?, hash = ? WHERE id = ?',
    [updates.subject, updates.body, updates.footer, hash, id]
  );
}
JavaScript

Detecting Divergence

In our example, the two records are linked through id and parent_id, so we can determine whether they have diverged by comparing their hashes. Since User A updated the subject, their hash will now differ from User B’s copy.

async function hasDiverged(recordId) {
  // Assumes db.query resolves to a single row object, for brevity.
  const record = await db.query(
    'SELECT hash, parent_id FROM email_templates WHERE id = ?',
    [recordId]
  );
  
  if (!record.parent_id) {
    return false; // Not a duplicate
  }
  
  const parent = await db.query(
    'SELECT hash FROM email_templates WHERE id = ?',
    [record.parent_id]
  );
  
  return record.hash !== parent.hash;
}
JavaScript

Considerations

  • Select which fields to include in the hash. Exclude timestamps, usage counts, or other metadata that shouldn’t trigger divergence detection.
  • Store the hash in the database rather than computing it on demand to avoid performance penalties when checking multiple records.
  • For large text fields, consider hashing normalized versions (trimmed, lowercase) to avoid detecting inconsequential changes.

The hash comparison provides O(1) divergence detection without field-by-field comparisons or storing complete record history.

An overview of basic cryptographic concepts https://codebytom.blog/2025/08/an-overview-of-basic-cryptographic-concepts/ Wed, 20 Aug 2025 18:12:24 +0000

I’ve been learning a little more about cryptographic fundamentals, and this post documents my exploration. I’m focusing on the practical use cases, underlying mechanisms, and trade-offs that influence implementation decisions. In future posts, I plan to dive deeper into implementing some of the concepts below.

Hashing and salting

Hashing is basically a one-way function that takes any input and spits out a fixed-size string of characters. Think of it like a meat grinder: you can put a steak in and get minced/ground beef out, but you can’t reverse it and turn the mince back into a steak. The same input always produces the same output, but even tiny changes to the input create completely different results.

Software uses hashing all over the place. The big one is password storage: instead of keeping actual passwords in your database (which would be a nightmare if someone broke in), you store the hash. When a user logs in, you hash their input and compare it to what’s stored. Hashing also gets used for file integrity checks, cache keys, and digital signatures.

Here’s where salting comes in. Without salt, identical passwords create identical hashes, which makes life easy for attackers: they can build massive lookup tables, called rainbow tables, of common passwords and their hashes. Salt is just random data you add to each password before hashing it. Every user gets their own unique salt, so even if two people use “password123”, their hashes look completely different because different salt values were used to hash them.

The trick with salt is that you need to store it alongside the hash in your database. When someone tries to log in, you grab both the stored hash and the stored salt, then use that same salt to hash their input password. If the result matches what’s in the database, they’re good to go. The salt doesn’t need to be secret; it just needs to be unique per user and stored permanently so you can use it again during verification.

Digital signatures

Digital signatures use asymmetric cryptography to bind a message to its creator through mathematical proof. The process begins when a sender generates a hash of their message. This hash gets encrypted with the sender’s private key. The resulting signature attaches to the original message for transmission.

Verification reverses this process. The recipient hashes the received message using the same algorithm, then decrypts the signature using the sender’s public key. If the decrypted hash matches the computed hash, the signature is validated. This proves the message originated from the private key holder and remains unmodified during transmission.

Digital signatures don’t hide data. They prove origin and detect tampering. For confidentiality, you need separate encryption.

Public-key cryptography

Public-key cryptography operates on asymmetric key pairs generated through mathematical algorithms. The algorithms create two keys where data encrypted with one can only be decrypted with the other. This mathematical relationship forms a ‘trapdoor function’, computationally easy in one direction but practically impossible to reverse without the private key.

The encryption process depends on which key initiates the operation. When encrypting data for confidentiality, you use the recipient’s public key, ensuring only they can decrypt with their private key. For digital signatures (described above), you encrypt a hash of your message with your private key, allowing anyone to verify authenticity using your public key. This dual functionality addresses both secrecy and authentication requirements.

The difference between symmetric and asymmetric encryption

Symmetric encryption uses a single shared key for both encryption and decryption, requiring secure key distribution between parties but offering fast performance. Asymmetric encryption uses mathematically related key pairs where the public key encrypts data that only the corresponding private key can decrypt, eliminating the key distribution problem but at the cost of significantly slower performance.

Most practical systems combine both approaches: asymmetric algorithms establish a shared symmetric key. This hybrid model leverages the security benefits of asymmetric cryptography for key exchange while maintaining the performance advantages of symmetric encryption.

The Marathon of Software Engineering https://codebytom.blog/2025/04/the-marathon-of-software-engineering/ Fri, 18 Apr 2025 07:37:03 +0000

My brother is running an ultra-marathon today, stepping up to the starting line with minimal training. As I think about what leads someone to that start line, I’m struck by how much it mirrors the world of software engineering. The race itself, the day of the event, isn’t the real story. It’s the culmination of months of unseen effort: waking up in the dark, braving cold weather, training alone while others sleep or relax, and balancing family life with an intense schedule. That’s where the true grit lives.

There are parallels in software: when a polished product or solution launches, it’s a celebration, a finish line. But the real work happens long before, in the quiet moments no one sees. It’s the late nights or long days debugging a stubborn bug, the countless iterations of trying and failing and then trying again, and the collaborative grind of aligning a team toward a single goal. These are the miles logged in the dark, the ones that don’t make it to the highlight reel.

Just like an endurance athlete, a software engineer thrives on overcoming hurdles. It’s in the messy, challenging process of solving complex problems, navigating trade-offs, and learning from failures that the work really comes alive. The satisfaction doesn’t come from the final product alone but from the harmony of a team working together, iterating, and pushing through setbacks.

Watching my brother tackle his ultra-marathon today, I’m reminded that the real reward in any tough endeavor, whether running miles or building software, is the grind that leads you to the day of the event. The unseen effort, the teamwork, and the problem-solving make the finish line worth crossing.

A Performant Way To Batch Delete Transient Data in WordPress https://codebytom.blog/2025/04/a-performant-way-to-batch-delete-transient-data-in-wordpress/ Tue, 01 Apr 2025 08:00:00 +0000

As a developer working with WordPress and WooCommerce, I recently had the chance to tackle a small but meaningful optimization. Transients in WordPress are a handy way to store temporary data. The standard approach to bulk deletion requires looping through a list and calling `delete_transient()` for each one individually, which, while functional, isn’t as efficient as it could be. I saw an opportunity to improve this process and submitted a pull request to WooCommerce to introduce a new function: `_wc_delete_transients()`.

You can check out the full details in the pull request here: [WooCommerce PR #55931](https://github.com/woocommerce/woocommerce/pull/55931).

The idea behind `_wc_delete_transients()` is simple, rather than deleting transients one by one, it handles them in a single operation whenever possible, leveraging a direct database query. This approach cuts down on overhead and speeds things up significantly, especially when you’re working with hundreds of transients at a time. Of course, it’s designed to play nicely with WordPress’s architecture: it falls back to individual deletions if an external object cache is in use and includes safeguards like chunking large batches to avoid hitting database limits.

To see how it performs, I ran some tests, deleting 500 transients using the traditional loop with `delete_transient()` and comparing the results with `_wc_delete_transients()`. The results showed that `_wc_delete_transients()` is over 100% more performant, or put another way, more than twice as fast. While these numbers come from a controlled test, they suggest a solid performance boost for WooCommerce stores that need to clear out transients in bulk, like during cache purges or updates.

I kept the implementation cautious and practical. It respects WordPress’s caching mechanisms, updates the `alloptions` cache when needed, and includes error handling to log any issues that might pop up. It’s marked as an internal function, meaning it’s intended for WooCommerce’s core use rather than direct public access, but I hope it proves useful behind the scenes.

This was a modest project, but it’s satisfying to see how a targeted tweak can make a difference. For WooCommerce developers and store owners, faster transient deletion could mean quicker maintenance tasks and a slightly snappier experience overall.

Here is the full implementation:
/**
 * Delete multiple transients in a single operation.
 *
 * IMPORTANT: This is a private function (internal use ONLY).
 *
 * This function efficiently deletes multiple transients at once, using a direct
 * database query when possible for better performance.
 *
 * @internal
 *
 * @since 9.8.0
 * @param array $transients Array of transient names to delete (without the '_transient_' prefix).
 * @return bool True on success, false on failure.
 */
function _wc_delete_transients( $transients ) {
	global $wpdb;

	if ( empty( $transients ) || ! is_array( $transients ) ) {
		return false;
	}

	// If using external object cache, delete each transient individually.
	if ( wp_using_ext_object_cache() ) {
		foreach ( $transients as $transient ) {
			delete_transient( $transient );
		}
		return true;
	} else {
		// For database storage, create a list of transient option names.
		$transient_names = array();
		foreach ( $transients as $transient ) {
			$transient_names[] = '_transient_' . $transient;
			$transient_names[] = '_transient_timeout_' . $transient;
		}

		// Limit the number of items in a single query to avoid exceeding database query parameter limits.
		if ( count( $transients ) > 199 ) {
			// Process in smaller chunks to reduce memory usage.
			$chunks  = array_chunk( $transients, 100 );
			$success = true;

			foreach ( $chunks as $chunk ) {
				$result = _wc_delete_transients( $chunk );
				if ( ! $result ) {
					$success = false;
				}
				// Force garbage collection after each chunk to free memory.
				gc_collect_cycles();
			}

			return $success;
		}

		try {
			// Before deleting, get the list of options to clear from cache.
			// Since we already have the option names we could skip this step but this mirrors WP's delete_option functionality.
			// It also allows us to only delete the options we know exist.
			$options_to_clear = array();
			if ( ! wp_installing() ) {
				$options_to_clear = $wpdb->get_col(
					$wpdb->prepare(
						'SELECT option_name FROM ' . $wpdb->options . ' WHERE option_name IN ( ' . implode( ', ', array_fill( 0, count( $transient_names ), '%s' ) ) . ' )',
						$transient_names
					)
				);
			}

			if ( empty( $options_to_clear ) ) {
				// If there are no options to clear, return true immediately.
				return true;
			}

			// Use a single query for better performance.
			$wpdb->query(
				$wpdb->prepare(
					'DELETE FROM ' . $wpdb->options . ' WHERE option_name IN ( ' . implode( ', ', array_fill( 0, count( $options_to_clear ), '%s' ) ) . ' )',
					$options_to_clear
				)
			);

			// Let's clear our options data from the cache.
			// We can batch delete if available, introduced in WP 6.0.0.
			if ( ! wp_installing() ) {
				if ( function_exists( 'wp_cache_delete_multiple' ) ) {
					wp_cache_delete_multiple( $options_to_clear, 'options' );
				} else {
					foreach ( $options_to_clear as $option_name ) {
						wp_cache_delete( $option_name, 'options' );
					}
				}

				// Also update alloptions cache if needed.
				// This is required to prevent phantom transients from being returned.
				$alloptions         = wp_load_alloptions( true );
				$updated_alloptions = false;

				if ( is_array( $alloptions ) ) {
					foreach ( $options_to_clear as $option_name ) {
						if ( isset( $alloptions[ $option_name ] ) ) {
							unset( $alloptions[ $option_name ] );
							$updated_alloptions = true;
						}
					}

					if ( $updated_alloptions ) {
						wp_cache_set( 'alloptions', $alloptions, 'options' );
					}
				}
			}

			return true;
		} catch ( Exception $e ) {
			wc_get_logger()->error(
				sprintf( 'Exception when deleting transients: %s', $e->getMessage() ),
				array( 'source' => '_wc_delete_transients' )
			);
			return false;
		}
	}
}
PHP
Why You Should Start Writing DocBlock Comments More in Your Codebase https://codebytom.blog/2025/03/why-you-should-start-writing-doc-block-comments-in-your-codebase/ Wed, 26 Mar 2025 09:00:00 +0000

In our current codebases, we’ve leaned on static type systems or naming conventions to document our functions. And that’s not a bad start; they give us the basics: what goes in, what comes out, and the general shape of things. But as our project grows and we rely more on AI-powered tools, I’ve been thinking about what those approaches don’t cover. They don’t tell us the why, the context, the edge cases, or the quirks. That’s where doc blocks, those structured, natural-language notes above our functions, come in. They’re especially vital for making our codebase work seamlessly with large language models (LLMs). That said, not every function needs a doc block, and when you do write them, they don’t need to spell out the “what” behind the code. You and tools like LLMs can already read that from the code itself. Here’s why we should make them a habit.

When to Use Them (and When Not To)

Doc blocks are great, but they’re not mandatory for every piece of code. If a function’s name, parameters, and implementation are self-explanatory, like an add($a, $b) that just returns $a + $b, adding a doc block is overkill. Modern IDEs, and even LLMs, can read that code and tell you exactly what it does without extra help. Save doc blocks for cases where the purpose or context isn’t obvious: think complex logic, edge cases, or business-specific rules. The goal is clarity, not clutter.
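For illustration, here’s the kind of doc block that adds nothing, because it only restates what the signature and body already say (a hypothetical example):

```javascript
/**
 * Calculates the total with tax.
 *
 * @param subtotal - The subtotal amount
 * @param taxRate - The tax rate
 * @returns The total with tax
 */
function calculateTotalWithTax(subtotal, taxRate) {
  return subtotal * (1 + taxRate);
}
```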

And when you do write them, don’t waste time repeating the “what.” The example above just echoes what the function name and code already say. You don’t need a doc block to tell you it calculates a total with tax; code and AI can figure that out. Instead, focus on the “why.” Why does this function exist? What’s the bigger picture? Here’s a better take:

/**
 * Checks if a user has exceeded their daily login attempts.
 * Caps are enforced per calendar day to prevent brute-force attacks.
 * 
 * @param attempts - Number of login attempts so far
 * @param maxAttempts - Maximum allowed attempts (default: 5)
 * @returns True if the limit is exceeded, false otherwise
 */
function isLoginLimitExceeded(attempts: number, maxAttempts: number = 5): boolean {
  return attempts > maxAttempts;
}
JavaScript

The doc block for isLoginLimitExceeded is helpful because it goes beyond the simple code (attempts > maxAttempts) to explain why the function exists: it enforces a security cap on daily login attempts to prevent brute-force attacks. This context, including the daily scope, isn’t obvious from the code alone, making the doc block valuable for understanding intent and avoiding misuse, all while staying concise and not repeating what the code already shows.

We’re not just writing code for ourselves anymore. Context is king.

Beyond AI: The Human Factor

Even setting LLMs aside, doc blocks make our lives easier. Types won’t tell you why we chose one approach over another, or what trade-offs we accepted. They won’t warn you about that one edge case which crashed production. Doc blocks do.

A Small Investment, A Big Payoff

The best bit is that it’s really cheap to do, especially since LLMs can handle most of the heavy lifting. Tools like Cursor can generate solid doc block drafts based on our code and a quick prompt, leaving us to just tweak or approve them. That tiny upfront effort pays off fast. Better comments mean better AI assistance, fewer misunderstandings, and a codebase that’s more maintainable for humans and machines alike. Let’s start adding them where it makes sense especially in complex or critical functions.

QuickSort: The Clever Sorting Algorithm https://codebytom.blog/2025/03/quicksort-the-clever-sorting-algorithm/ Tue, 11 Mar 2025 09:05:04 +0000

Sorting data is a common task in programming, and I’ve learned that QuickSort is one of the most efficient ways to do it. I’ve recently been learning about this algorithm: what makes it so efficient, when to use it, and whether it’s practical.

What QuickSort Does

QuickSort takes an unordered list like [5, 2, 8, 1], and sorts it into order, like [1, 2, 5, 8]. It’s a “divide and conquer” method:

1. Pick a “pivot” (say, the first number, which is 5).
2. Split the list: numbers smaller than the pivot go left, bigger ones go right.
3. Repeat on each half until everything’s sorted.

Here’s a simple JavaScript version:

/**
 * Sorts an array in ascending order using the quicksort algorithm.
 * The function modifies the input array in-place and uses an optimized
 * approach by sorting the smaller partition first to reduce stack space.
 *
 * @param {Array<number>} arr - The array to be sorted
 * @param {number} [low=0] - The starting index of the partition to sort
 * @param {number} [high=arr.length-1] - The ending index of the partition to sort
 * @returns {Array<number>} The sorted array
 */
function quickSort(arr, low = 0, high = arr.length - 1) {
    while (low < high) {
        const pivotIndex = partition(arr, low, high);
        // Sort the smaller partition first to optimize stack space
        if (pivotIndex - low < high - pivotIndex) {
            quickSort(arr, low, pivotIndex);
            low = pivotIndex + 1;
        } else {
            quickSort(arr, pivotIndex + 1, high);
            high = pivotIndex;
        }
    }
    return arr;
}

/**
 * Partitions an array around a pivot element for the quicksort algorithm.
 * Uses the first element as the pivot and rearranges elements so that all
 * values less than the pivot are on the left and all values greater are on
 * the right.
 *
 * @param {Array<number>} arr - The array to partition
 * @param {number} low - The starting index of the partition
 * @param {number} high - The ending index of the partition
 * @returns {number} The final position of the pivot element
 */
function partition(arr, low, high) {
    const pivot = arr[low];
    let i = low - 1;
    let j = high + 1;
    while (true) {
        do { i++; } while (arr[i] < pivot);
        do { j--; } while (arr[j] > pivot);
        if (i >= j) return j;
        [arr[i], arr[j]] = [arr[j], arr[i]];
    }
}
JavaScript

It picks 5, moves 2 and 1 left, 8 right, then sorts each side. Done.

How It Performs

QuickSort is known for being fast on most lists you throw at it. It sorts in-place, reusing the original list without needing extra space, which keeps memory use low. While it can degrade to quadratic time in the worst case (unlucky pivot choices on certain inputs), it’s usually quick and efficient in practice.

QuickSort vs Native Implementations

My exploration of QuickSort implementations in PHP and JavaScript reveals a clear performance gap compared to native sorting solutions. In PHP, benchmarks on an array of 10,000 items showed the native sort() running 85% faster than our PHP implementation (1.2ms vs 8.1ms). JavaScript followed a similar trend, with Array.sort() on an array of 500 items running 68% faster than our JavaScript implementation (0.6ms vs 1.9ms).

My Thoughts

Is writing a custom QuickSort worth it? Generally, no: if speed is the priority, native functions appear unbeatable. However, implementing QuickSort offers educational value and flexibility for specific use cases. For production code, stick with the native solution unless you have a compelling reason to roll your own.

Understanding the Halting Problem https://codebytom.blog/2025/02/understanding-the-halting-problem/ Mon, 17 Feb 2025 10:45:16 +0000

Recently, I’ve been learning about something fascinating in computer science called the Halting Problem. Don’t worry if you’re not a programmer; I’ll be explaining this in the only way I know how: in simple terms.

The Time & Paradox Problem

Here’s the fundamental issue: How can you predict if something will go on forever… without waiting forever to check?

Imagine you’re watching a program run, and you want to know if it will ever stop. If it does stop, great! You have your answer. But if it keeps running… how long do you wait? 1 minute? 1 hour? 1 year? Maybe it would have stopped if you had waited just one more second.

This creates a catch-22: To know for certain if a program will run forever, you’d need to watch it forever. But if you watch it forever, you’ll never be able to tell anyone the answer.

But it gets even trickier: even if we could somehow look into the future, we still couldn’t build a perfect predictor, because we could always build a program designed to do the opposite of what our predictor says. For example, our program could:

  1. Ask the Predictor whether it (our program) will halt.
  2. If the Predictor says “Yes”, run forever.
  3. If the Predictor says “No”, stop immediately.

This is the paradox, and it proves that it’s impossible to create a program that can always correctly determine whether other programs will halt.
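The three steps above can be sketched in code. Suppose, purely hypothetically, that a perfect halts() predictor existed; we could then build a program that contradicts whatever it answers:

```javascript
// Hypothetical sketch: `halts(program)` stands in for a perfect predictor
// (which, as argued above, cannot exist).
function makeTroublemaker(halts) {
  function troublemaker() {
    if (halts(troublemaker)) {
      while (true) {} // predictor said "it halts", so run forever
    }
    return 'halted';  // predictor said "it runs forever", so stop immediately
  }
  return troublemaker;
}

// Whichever answer `halts` gives about `troublemaker`, it is wrong:
const t = makeTroublemaker(() => false); // a predictor claiming "runs forever"...
console.log(t());                        // ...yet the program halts immediately
```

No implementation of halts can escape this trap, which is exactly the contradiction described above.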

Why This Matters in Real Life

After learning about this, I was curious to know how it impacted my everyday life as a programmer. Had I used tools that were limited because of it? Or considered this problem before knowing its actual name? It turns out:

  1. This is why your computer can’t always tell if you’re stuck in an infinite loop.
  2. It’s why programming tools (static code analysis etc) can only give you warnings about possible endless loops, but can’t be 100% certain.
  3. It’s why cloud services need to set time limits on executions instead of waiting to see if they’ll finish.
  4. It’s why we force-quit programs that might be stuck.

The Bigger Picture

Sometimes knowing what’s impossible is just as valuable as knowing what’s possible. What fascinated me most was realising that there are things computers fundamentally cannot do, no matter how powerful they become. It’s not a limitation of today’s technology; it’s a mathematical impossibility, like trying to find the end of infinity.

Script to Convert CSV to Github Issues https://codebytom.blog/2025/01/script-to-convert-csv-to-github-issues/ Tue, 21 Jan 2025 16:06:46 +0000

Inspired by wanting to die at the thought of creating multiple issues via GitHub’s UI.

I wanted to share a new script I’ve put together that should make issue creation a bit easier. If you’ve ever needed to create multiple GitHub issues at once (like when planning a new feature or documenting a batch of bugs), this should save you some time.

What it does

This is a simple script that creates multiple GitHub issues from a CSV file. You just need to prepare a CSV with your issue details (title, description, and optional labels), and the script will create them all at once.
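As an illustration, such a CSV might look like the following (hypothetical column names; check the repository’s README for the exact format the script expects):

```csv
title,description,labels
"Fix checkout crash","Checkout throws a 500 when the cart is empty","bug,priority-high"
"Add dark mode","Support a dark colour scheme in settings","enhancement"
```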

Where to find it

https://github.com/tjcafferkey/gh-issue-creator

What’s next

If it proves useful, some ideas for what’s next:

  • Assignee support
  • Project board support
  • Better markdown support, perhaps via a UI? (you can already use some markdown, I think, but it would require some escaping etc.)
LLMs Will Supercharge Open Source Projects Like WordPress and WooCommerce https://codebytom.blog/2024/12/llms-will-supercharge-open-source-projects-like-wordpress-and-woocommerce/ Wed, 18 Dec 2024 13:18:03 +0000

Open source software has always thrived on community knowledge and collaboration. Now, with the emergence of Large Language Models (LLMs), we’re entering a new era where these AI systems can act as force multipliers for open source projects, particularly those with rich ecosystems like WordPress and WooCommerce.

The Power of Collective Knowledge

What makes LLMs particularly powerful for open source projects is their ability to understand and synthesize vast amounts of information from multiple sources:

  • Public code repositories
  • Developer documentation
  • Community forums and discussions
  • Stack Overflow questions and answers
  • Blog posts and tutorials
  • GitHub issues and pull requests

This collective knowledge, accumulated over years of community contributions, gives LLMs a comprehensive understanding of both official APIs and real-world usage patterns.

Beyond Official Documentation

One of the most fascinating aspects of LLMs is their ability to understand not just the “official” way of doing things, but also the creative solutions developers have discovered in the wild. For WordPress and WooCommerce specifically, this includes:

  • Common workarounds for missing API functionality
  • Unofficial but widely-used hooks and filters
  • Creative solutions for extending core functionality
  • Integration patterns with popular plugins and themes
  • Performance optimization techniques

This knowledge extends far beyond what any single developer or team could maintain in traditional documentation.

The Open Source Advantage

This is where open source projects have a significant advantage over proprietary software. In closed systems, LLMs can only learn from public documentation and limited external discussions. But with open source projects, they can understand:

  1. The complete codebase and its evolution over time
  2. Internal discussions about design decisions
  3. Bug reports and their resolutions
  4. Community-driven feature requests
  5. Real-world implementation examples

Democratized Development

The combination of LLMs and open source creates a powerful feedback loop: as more developers contribute solutions, LLMs become better at suggesting improvements, which in turn helps more developers contribute effectively. This democratization of development knowledge means:

  • Newer developers can contribute meaningful code faster
  • Experienced developers can tackle more complex challenges
  • The community can focus on innovation rather than reinventing wheels
  • Solutions can be implemented and tested more rapidly

For platforms like WordPress and WooCommerce, this accelerated development cycle means staying ahead of proprietary alternatives in ways that weren’t possible before. Features that might have taken months to implement can now be completed in weeks. Bug fixes that required extensive debugging can be resolved in hours. And most importantly, the entire community benefits from this collective intelligence in real-time.

The open nature of these projects means that every solution, every optimization, and every creative workaround becomes part of the shared knowledge base that LLMs can learn from and suggest to others. This creates an unprecedented advantage that proprietary software simply cannot match – one that will continue to grow stronger as AI technology evolves.

For developers and organizations choosing between open source and proprietary solutions, this emerging advantage isn’t just about cost or flexibility anymore – it’s about tapping into an AI-powered ecosystem that’s becoming exponentially more efficient at solving problems and driving innovation.
