Something Something Programming

AI Reproduction of Lin’s Busy Beaver Proof

2026-01-16T00:00:00+00:00

Does anybody care about “look what AI can do” posts anymore? We’re running out of places to move the goalposts to, but expectations are adjusting quickly. What would have been a huge breakthrough a year ago is a yawn today. Given that AI is solving open math problems, this little piece of news might not seem very exciting, but I figured I would share it anyway. Namely: ChatGPT is able to reproduce Shen Lin’s 1963 Busy Beaver proof.

The Busy Beaver problem asks: what is the longest that a Turing machine program of N states can run before halting when started on the blank tape? This problem is uncomputable, which is to say that there is no general method for determining BB(N) for all values of N. Even proving Busy Beaver for small values of N is difficult. The problem was first posed in 1962, but the value of BB(5) was not proved until 2024.

Proving that BB(N) = K requires enumerating all N-state Turing machine programs and proving of each one that it either halts with K steps or does not halt at all. This is difficult because 1) the number of programs to check increases exponentially with N and 2) individual program complexity increases in some unspecified-but-substantial quantity with N. In other words, the basic structure of a Busy Beaver proof is a conjunction of individual program proofs, and as N grows the length of the conjunction explodes and the proofs get all harder.

Lin’s proof that BB(3) = 21 works as follows. First, observe that we can prune the search space substantially by normalizing the first program instruction to A0:1RB (a limited form of Brady’s algorithm), leaving 80,000 or so programs to check. Next, run all of these for 21 steps, since it is known by explicit witness that BB(3) ≥ 21. Then run all remaining programs for 50 steps and check for rudimentary looping behavior: either getting stuck in place going back and forth or getting stuck doing the same thing moving off to the left or right. As it happens almost all 3-state non-halting programs end up in this condition, now known as Lin recurrence. This leaves exactly 40 holdouts that Lin claimed to have analyzed by hand and verified not to halt.

This is all laid out in Lin’s dissertation. No code is provided; all methods and algorithms are described in English. It has been argued in the past that is not a very precise way of doing things, since it makes the result difficult to reproduce. On the other hand, I doubt including a bunch of IBM 7090 machine code would have helped to elucidate anything. Besides, English is a good-enough method of describing algorithms, and I can attest that I was personally able to reproduce the results myself. Which is to say, I wrote some code (in Python) according to Lin’s descriptions and that code was able to reproduce the exact same 40 holdouts listed.

Reproduction of results in this case was possible, but not so easy. For one thing, the recurrence algorithm is just a minefield of off-by-one errors, and it is tricky to get it right. For another thing, it requires reading through the whole text and figuring out the author’s ad-hoc conventions for Turing machine program notation, which are different from what is used today. Lin identifies programs using 8-digit octal “serial numbers” (I guess so it could fit within a “machine word”), so even reading this holdout list requires a decoding procedure. Overall, the whole thing took some doing. Wouldn’t it be nice if the drudgery could be automated?

Yes, well, ChatGPT can do it now. Just give it the PDF along with some prompts: “discuss problem, methods, and results” and “want to reproduce result. produce single c file containing whole pipeline. program enumeration, pruning, etc according to lin’s methods”. It didn’t get it right on the first try or the second, but by the third try it did indeed produce a single C file that runs enumeration and filtering, ending up with the correct 40 holdouts. It even got the serial number scheme right (although initially it got the nybbles backwards). The whole thing took about a half hour.

I’ll include the code in full below, if anyone wants to verify it. But first, a challenge / prediction. Lin’s proof of BB(3) is ultimately pretty straightforward. The recent proof of BB(5) is quite a bit more involved, with a variety of separate decider components. I believe it is now or will soon be possible to implement the whole thing from start to finish in a single C file just from the PDF. Other languages could be used. Lean would be an especially good choice. This might be a fun challenge for anyone looking to work on an AI project.

Anyway, here is the C code that implements Lin’s proof. I didn’t write any of it. Take a look with fair-minded skepticism.

// lin_bb3.c
// Reproduce Shen Lin (1963) BB-3 (3-card binary) normalized search pipeline.
//
// This single-file program implements:
//   1) Lin's normalized enumeration: 4 lots x 12^4 = 82,944 machines.
//      Fixed lines: Card1-0 = 112, and the unique stop-line = 110.
//   2) Discard machines that stop in <= 21 shifts (recording scores).
//   3) Lin's "obvious" pruning rules for some lots.
//   4) Lin's PARTIAL RECURRENCE routine (36-bit tape word, start square at bit 18)
//      exactly as described in Chapter III.
//   5) Print remaining "holdouts" in standard TM program notation:
//         A0 A1  B0 B1  C0 C1
//      e.g. 1RB 1RH  0LC 0RA  1LA 1LB
//
// Build:
//   gcc -O3 -std=c11 -Wall -Wextra lin_bb3.c -o lin_bb3
// Run:
//   ./lin_bb3
//
// References (from Lin dissertation PDF):
// - Normalization to 82,944 and lots, stop<=21 phase: see Chapter III. fileciteturn16file2L13-L22
// - Partial recurrence routine formulas and 50-shift bound, spill check: fileciteturn16file3L1-L20
// - Barrier intuition: compare tape between barriers / drifting recurrence: fileciteturn16file0L11-L15

#include 
#include 
#include 
#include 

#define NUM_LINES 6
#define NUM_LOTS 4
#define MAX_STOP_SCAN 21
#define MAX_REC_SHIFTS 50

// Full tape for accurate scoring in the <=21 shift scan
#define TAPE_SIZE 4096
#define TAPE_MID  (TAPE_SIZE/2)

// 36-bit tape word used by Lin's recurrence routine
#define WORD_BITS 36
#define START_BIT 18          // starting square at bit 18 fileciteturn16file0L18-L19
#define DEV_LIMIT 17          // spill when |deviation| > 17 fileciteturn16file3L11-L14
#define WORD_MASK ((uint64_t)((1ULL<
// --- Bit-numbering conventions (compile-time tunable) ---
// Lin's text uses expressions like "T shifted left 18 + D bits" fileciteturn16file3L1-L3.
// Different machines/notations may number bits MSB->LSB or LSB->MSB.
// To reproduce Lin's counts, we keep the mapping explicit:
//   - deviation d corresponds to bit position BITPOS(d) within the 36-bit word.
//   - the "shift left" operation used in comparisons is SHIFT36(word, k).

#ifndef BITPOS
// Default: bit index = START_BIT + deviation (so D=0 is bit 18)
#define BITPOS(d) (START_BIT + (d))
#endif

#ifndef SHIFT36
// Default: C-style logical left shift within 36-bit width
#define SHIFT36(word, k) (shl36((word), (k)))
#endif

#ifndef PRINT_HOLDOUTS
#define PRINT_HOLDOUTS 1
#endif

#ifndef SHIFT36
#define SHIFT36(w, k) shift_left36((w), (k))
#endif

static inline int line_index(int card /*1..3*/, int sym /*0/1*/) {
    return (card-1)*2 + sym;
}

// Lin's 4-bit line encoding: [p][s][c1][c0]
static inline uint8_t enc_line(uint8_t p, uint8_t s, uint8_t c) {
    return (uint8_t)((p<<3) | (s<<2) | (c & 3));
}
static inline uint8_t get_p(uint8_t w) { return (w>>3)&1; }
static inline uint8_t get_s(uint8_t w) { return (w>>2)&1; }
static inline uint8_t get_c(uint8_t w) { return w & 3; }

// 12 possible non-stop cases for a free line: p∈{0,1}, s∈{0,1}, c∈{1,2,3}
static void gen_12_cases(uint8_t cases12[12]) {
    int t = 0;
    for (uint8_t p=0;p<=1;p++) {
        for (uint8_t s=0;s<=1;s++) {
            for (uint8_t c=1;c<=3;c++) {
                cases12[t++] = enc_line(p,s,c);
            }
        }
    }
}

// Build normalized machine for a given lot, with 4 free lines
static void build_machine_for_lot(int lot, const uint8_t free4[4], uint8_t out[NUM_LINES]) {
    // initialize with a placeholder non-stop line
    for (int i=0;i<NUM_LINES;i++) out[i] = enc_line(0,0,1);

    // fixed Card1-0 line = 112
    out[0] = enc_line(1,1,2);

    // determine stop-line index per lot
    // Lot1: Card1-1
    // Lot2: Card2-1
    // Lot3: Card3-0
    // Lot4: Card3-1
    int stop_idx = -1;
    if (lot == 1) stop_idx = 1;
    if (lot == 2) stop_idx = 3;
    if (lot == 3) stop_idx = 4;
    if (lot == 4) stop_idx = 5;

    // stop-line fixed to 110
    out[stop_idx] = enc_line(1,1,0);

    // assign remaining 4 lines in deterministic order:
    // all indices except 0 and stop_idx
    int k = 0;
    for (int i=0;i<NUM_LINES;i++) {
        if (i==0 || i==stop_idx) continue;
        out[i] = free4[k++];
    }
}

// Lin's "obvious" pruning rules
// Lot1: discard if no call to Card1 appears in Cards 2 and 3
// Lots3&4: discard if no call to Card3 appears in Cards 1 and 2
// (as stated in Chapter III results discussion) fileciteturn10file0L365-L366
static int prune_obvious(int lot, const uint8_t lines[NUM_LINES]) {
    if (lot == 1) {
        // among Card2/3 lines (idx2..5), check any c==1
        for (int i=2;i<=5;i++) if (get_c(lines[i]) == 1) return 0;
        return 1;
    }
    if (lot == 3 || lot == 4) {
        // among Card1-1 (idx1) and Card2 lines (idx2,idx3), check any c==3
        if (get_c(lines[1]) == 3) return 0;
        if (get_c(lines[2]) == 3) return 0;
        if (get_c(lines[3]) == 3) return 0;
        return 1;
    }
    return 0;
}

// --- TM program notation printer ---
// Card1=A, Card2=B, Card3=C, stop=H
static char state_letter(uint8_t c) {
    if (c==0) return 'H';
    if (c==1) return 'A';
    if (c==2) return 'B';
    return 'C';
}

static void line_to_tm(uint8_t w, char out[4]) {
    // out: e.g. "1RB"
    out[0] = (char)('0' + get_p(w));
    out[1] = get_s(w) ? 'R' : 'L';
    out[2] = state_letter(get_c(w));
    out[3] = '\0';
}

static inline uint32_t lin_serial24_from_lines(const uint8_t L[6]) {
    // L order must be: A0 A1 B0 B1 C0 C1
    return ((uint32_t)(L[0] & 0xF) << 20) |
           ((uint32_t)(L[1] & 0xF) << 16) |
           ((uint32_t)(L[2] & 0xF) << 12) |
           ((uint32_t)(L[3] & 0xF) <<  8) |
           ((uint32_t)(L[4] & 0xF) <<  4) |
           ((uint32_t)(L[5] & 0xF) <<  0);
}

static inline void print_lin_serial_octal_from_lines(const uint8_t L[6]) {
    printf("%08o", lin_serial24_from_lines(L));
}

static void print_machine_tm(const uint8_t lines[NUM_LINES]) {
    printf("Serial=");
    print_lin_serial_octal_from_lines(lines);
    printf("  ");
    char a0[4], a1[4], b0[4], b1[4], c0[4], c1[4];
    line_to_tm(lines[0], a0);
    line_to_tm(lines[1], a1);
    line_to_tm(lines[2], b0);
    line_to_tm(lines[3], b1);
    line_to_tm(lines[4], c0);
    line_to_tm(lines[5], c1);
    printf("%s %s  %s %s  %s %s", a0,a1,b0,b1,c0,c1);
}

// --- Phase 1: run machine up to 21 shifts, accurate score on full tape ---
// STOP line halts after executing its print+shift (Lin fixes stop-line to 110).
// Score = number of 1s on tape at stop.
static int tape_score(const uint8_t *tape, int min_i, int max_i) {
    int s = 0;
    for (int i=min_i;i<=max_i;i++) s += (tape[i] != 0);
    return s;
}

typedef struct {
    int stopped; // 1 if stopped within bound
    int shifts;
    int score;
} StopScanResult;

static StopScanResult run_stop_scan_21(const uint8_t lines[NUM_LINES]) {
    uint8_t tape[TAPE_SIZE];
    memset(tape, 0, sizeof(tape));
    int head = TAPE_MID;
    int card = 1;
    int minTape = head, maxTape = head;

    for (int s=1; s<=MAX_STOP_SCAN; s++) {
        int scanned = tape[head] & 1;
        uint8_t w = lines[line_index(card, scanned)];

        // execute print
        tape[head] = get_p(w);
        if (head < minTape) minTape = head;
        if (head > maxTape) maxTape = head;

        // execute shift
        if (get_s(w)) head++; else head--;
        if (head < 0 || head >= TAPE_SIZE) {
            StopScanResult r = {0, s, 0};
            return r;
        }

        // stop?
        if (get_c(w) == 0) {
            StopScanResult r;
            r.stopped = 1;
            r.shifts = s;
            r.score = tape_score(tape, minTape, maxTape);
            return r;
        }

        card = get_c(w);
    }

    StopScanResult r = {0, MAX_STOP_SCAN, 0};
    return r;
}

// --- Lin's 36-bit recurrence routine implementation ---

// 36-bit shift-left with zero fill, keeping 36-bit width
static inline uint64_t shl36(uint64_t x, int k) {
    if (k <= 0) return x & WORD_MASK;
    if (k >= WORD_BITS) return 0ULL;
    return (x << k) & WORD_MASK;
}

// 36-bit shift-right with zero fill (for alternative bit-numbering conventions)
static inline uint64_t shr36(uint64_t x, int k) {
    if (k <= 0) return x & WORD_MASK;
    if (k >= WORD_BITS) return 0ULL;
    return (x >> k) & WORD_MASK;
}

static inline int bit_at(uint64_t T, int dev) {
    // return tape bit at square with deviation dev (within [-DEV_LIMIT..DEV_LIMIT])
    int bp = BITPOS(dev);
    if (bp < 0 || bp >= WORD_BITS) return 0; // outside tracked word treated as 0
    return (int)((T >> bp) & 1ULL);
}

// Compare tape segments for Lin's barrier recurrence logic (see discussion preceding routine)
// For Dq < D (current head is to the right): compare the portion of tape to the right of the
// left barrier (minimum deviation Dmin) with the earlier pattern shifted by delta = D - Dq.
static int compare_right_of_left_barrier(uint64_t Tq, uint64_t T, int Dmin, int delta) {
    // Compare dev in [Dmin .. DEV_LIMIT - delta] : Tq[dev] == T[dev + delta]
    int start = Dmin;
    int end = DEV_LIMIT - delta;
    if (end < start) return 0;
    for (int dev = start; dev <= end; dev++) {
        if (bit_at(Tq, dev) != bit_at(T, dev + delta)) return 0;
    }
    return 1;
}

// For Dq > D (current head is to the left): compare the portion of tape to the left of the
// right barrier (maximum deviation Dmax) with the earlier pattern shifted by delta = D - Dq (negative).
static int compare_left_of_right_barrier(uint64_t Tq, uint64_t T, int Dmax, int delta) {
    // delta < 0. Compare dev in [(-DEV_LIMIT - delta) .. Dmax] : Tq[dev] == T[dev + delta]
    int start = -DEV_LIMIT - delta;
    if (start < -DEV_LIMIT) start = -DEV_LIMIT;
    int end = Dmax;
    if (end < start) return 0;
    for (int dev = start; dev <= end; dev++) {
        if (bit_at(Tq, dev) != bit_at(T, dev + delta)) return 0;
    }
    return 1;
}

static inline uint64_t mask_range_bits(int lo, int hi) {
    // inclusive, within [0..35]
    if (lo < 0) lo = 0;
    if (hi > WORD_BITS-1) hi = WORD_BITS-1;
    if (hi < lo) return 0ULL;
    int len = hi - lo + 1;
    if (len >= WORD_BITS) return WORD_MASK;
    uint64_t m = (len == 64) ? ~0ULL : ((1ULL << len) - 1ULL);
    return (m << lo) & WORD_MASK;
}

typedef struct {
    uint64_t T;
    int S;
    int D;
} TBEntry;

typedef enum {
    REC_LOOPED = 1,
    REC_NO_RECURRENCE = 0,
    REC_SPILL = -1,
    REC_STOPPED = 2
} RecResult;

// Compute min/max deviation between shifts Sq and s inclusive
static inline void dev_minmax(const int dev[], int Sq, int s, int *outMin, int *outMax) {
    int mn = dev[Sq];
    int mx = dev[Sq];
    for (int k=Sq; k<=s; k++) {
        if (dev[k] < mn) mn = dev[k];
        if (dev[k] > mx) mx = dev[k];
    }
    *outMin = mn;
    *outMax = mx;
}

// Lin recurrence routine: run up to 50 shifts looking for partial recurrence.
// Returns:
//  - REC_LOOPED if recurrence detected => discard never-stopper
//  - REC_NO_RECURRENCE if none within 50 => holdout
//  - REC_SPILL if |deviation|>17 => holdout (spilled beyond 36-bit word)
//  - REC_STOPPED if it stops (should not happen if SH(3)=21)
static RecResult run_lin_recurrence_50(const uint8_t lines[NUM_LINES]) {
    // tape word bits: bit(BITPOS(D)) corresponds to square at deviation D
    uint64_t T = 0ULL;
    int D = 0;      // deviation of head relative to starting square
    int card = 1;

    // deviation history (after each shift) dev[s] = D
    int dev[MAX_REC_SHIFTS+1];
    dev[0] = 0;

    // Tape tables TB[i][j], i=1..3, j=0..1
    TBEntry tb[4][2][MAX_REC_SHIFTS+1];
    int tbCount[4][2];
    memset(tbCount, 0, sizeof(tbCount));

    // We begin before shift 1 with all-0 tape; scanned digit at start is 0.

    for (int s=1; s<=MAX_REC_SHIFTS; s++) {
        // scanned symbol at current head (deviation D)
        if (D < -DEV_LIMIT || D > DEV_LIMIT) {
            return REC_SPILL;
        }
        int bitpos = BITPOS(D);
        int scanned = (int)((T >> bitpos) & 1ULL);

        // execute current instruction
        uint8_t w = lines[line_index(card, scanned)];
        uint8_t p = get_p(w);
        uint8_t sh = get_s(w);
        uint8_t c = get_c(w);

        // print: set bit at current deviation
        if (p) T |= (1ULL << bitpos);
        else   T &= ~(1ULL << bitpos);

        // shift head
        if (sh) D++; else D--;

        // stop?
        if (c == 0) {
            dev[s] = D;
            return REC_STOPPED;
        }

        // call next card
        card = (int)c;

        // spill check (after shift)
        dev[s] = D;
        if (D < -DEV_LIMIT || D > DEV_LIMIT) {
            return REC_SPILL;
        }

        // scanned digit after shift, used to index TB[card][j]
        int bitpos2 = BITPOS(D);
        int j = (int)((T >> bitpos2) & 1ULL);

        // insert into tape table TB[card][j]
        int cnt = tbCount[card][j];

        // if table nonempty, test against previous entries
        for (int q=0; q<cnt; q++) {
            TBEntry *e = &tb[card][j][q];
            uint64_t Tq = e->T;
            int Sq = e->S;
            int Dq = e->D;

            if (Dq < D) {
                // Dq < D: find Dmin between Sq and s, then compare shifted words
                int Dmin, Dmax;
                dev_minmax(dev, Sq, s, &Dmin, &Dmax);

                // Tq shifted left 18 + Dq bits
                // T  shifted left 18 + Dmin + D - Dq bits
                // (Lin: "Tq is shifted left 18 + Dq bits and T shifted left 18 + Dmin + D - Dq bits")
                // fileciteturn16file3L1-L3
                // Lin's OCR scan truncates the symbol after "18 +" in some copies.
                // The barrier-based derivation implies shifting relative to the barrier
                // (minimum deviation) rather than the earlier endpoint deviation.
                int delta = D - Dq;
                if (compare_right_of_left_barrier(Tq, T, Dmin, delta)) {
                    return REC_LOOPED;
                }

            } else if (Dq > D) {
                // symmetric when Dq > D
                int Dmin, Dmax;
                dev_minmax(dev, Sq, s, &Dmin, &Dmax);

                // symmetric right-barrier analogue: use Dmax instead of Dmin
                // (Lin: "Symmetrical procedure hold when Dq > D") fileciteturn16file3L8
                int delta = D - Dq; // negative
                if (compare_left_of_right_barrier(Tq, T, Dmax, delta)) {
                    return REC_LOOPED;
                }

            } else {
                // Dq == D: use both barriers (mask compare between barriers)
                // Lin: "If Dq = D, both Dmax and Dmin are determined and Tq and T
                // are compared from bits ... to ... by the use of a mask." fileciteturn16file3L9-L10
                int Dmin, Dmax;
                dev_minmax(dev, Sq, s, &Dmin, &Dmax);

                int lo = BITPOS(Dmin);
                int hi = BITPOS(Dmax);
                uint64_t m = mask_range_bits(lo, hi);
                if ( (Tq & m) == (T & m) ) {
                    return REC_LOOPED;
                }
            }
        }

        // no recurrence found; append entry
        tb[card][j][cnt].T = T;
        tb[card][j][cnt].S = s;
        tb[card][j][cnt].D = D;
        tbCount[card][j] = cnt + 1;

        // continue to next shift
    }

    // no recurrence after 50 shifts => holdout fileciteturn16file3L18-L20
    return REC_NO_RECURRENCE;
}

int main(void) {
    uint8_t cases12[12];
    gen_12_cases(cases12);

    int total = 0;
    int stoppers = 0;
    int bestScore = -1;
    int bestScoreShifts = 0;
    uint8_t bestScoreMachine[NUM_LINES];
    int bestShifts = -1;
    int bestShiftsScore = 0;
    uint8_t bestShiftMachine[NUM_LINES];

    int candidates = 0;
    int obviousPruned = 0;
    int recLooped = 0;
    int holdouts = 0;
    int spilled = 0;
    int stoppedBeyond21 = 0;

    printf("Lin BB-3 normalized enumeration: 4 lots x 12^4 = 82,944 machines\n");
    printf("Phase 1: discard machines that stop in <= %d shifts\n", MAX_STOP_SCAN);

    for (int lot=1; lot<=NUM_LOTS; lot++) {
        int lotTotal=0, lotStop=0, lotCand=0, lotPrune=0, lotHold=0;

        for (int a=0;a<12;a++)
        for (int b=0;b<12;b++)
        for (int c=0;c<12;c++)
        for (int d=0;d<12;d++) {
            uint8_t free4[4] = {cases12[a], cases12[b], cases12[c], cases12[d]};
            uint8_t m[NUM_LINES];
            build_machine_for_lot(lot, free4, m);

            total++; lotTotal++;

            StopScanResult r = run_stop_scan_21(m);
            if (r.stopped) {
                stoppers++; lotStop++;

                if (r.score > bestScore) {
                    bestScore = r.score;
                    bestScoreShifts = r.shifts;
                    memcpy(bestScoreMachine, m, NUM_LINES);
                }
                if (r.shifts > bestShifts) {
                    bestShifts = r.shifts;
                    bestShiftsScore = r.score;
                    memcpy(bestShiftMachine, m, NUM_LINES);
                }

                // Lin printed champions score>=6 or shifts>=20
                if (r.score >= 6 || r.shifts >= 20) {
                    printf("HALTED  stop@%2d score=%d lot=%d :: ", r.shifts, r.score, lot);
                    print_machine_tm(m);
                    printf("\n");
                }
                continue;
            }

            // Not stopped within 21 shifts
            candidates++; lotCand++;

            if (prune_obvious(lot, m)) {
                obviousPruned++; lotPrune++;
                continue;
            }

            // Lin recurrence routine
            RecResult rr = run_lin_recurrence_50(m);
            if (rr == REC_LOOPED) {
                recLooped++;
            } else if (rr == REC_STOPPED) {
                stoppedBeyond21++;
                // if this ever triggers, SH(3) > 21 (contradicts Lin)
                printf("WARNING: stopper beyond 21 shifts lot=%d :: ", lot);
                print_machine_tm(m);
                printf("\n");
            } else {
                // holdout or spill
                holdouts++; lotHold++;
                if (rr == REC_SPILL) spilled++;

                if (PRINT_HOLDOUTS) {
                    printf("HOLDOUT lot=%d (%s) :: ", lot, (rr==REC_SPILL)?"spill":"no-recurrence");
                    print_machine_tm(m);
                    printf("\n");
                }
            }
        }

        printf("Lot %d: total=%d stoppers<=21=%d candidates=%d pruned=%d holdouts=%d\n",
               lot, lotTotal, lotStop, lotCand, lotPrune, lotHold);
    }

    printf("\n=== SUMMARY ===\n");
    printf("Machines enumerated: %d (expected 82944)\n", total);
    printf("Stoppers (<=21 shifts): %d (Lin reports 26073)\n", stoppers);
    printf("Candidates after 21 shifts: %d\n", candidates);
    printf("Obvious pruned: %d\n", obviousPruned);
    printf("Recurrence-discarded (looped): %d\n", recLooped);
    printf("Holdouts remaining: %d (Lin reports 40)\n", holdouts);
    printf("  of which spills: %d\n", spilled);
    printf("Stopped beyond 21 (should be 0): %d\n", stoppedBeyond21);

    printf("\nBest score observed: %d (expected Sigma(3)=6)\n", bestScore);
    if (bestScore >= 0) {
        printf("  achieved at %d shifts by: ", bestScoreShifts);
        print_machine_tm(bestScoreMachine);
        printf("\n");
    }

    printf("\nMax shifts among stoppers observed: %d (expected SH(3)=21)\n", bestShifts);
    if (bestShifts >= 0) {
        printf("  score at max shifts: %d, machine: ", bestShiftsScore);
        print_machine_tm(bestShiftMachine);
        printf("\n");
    }

    return 0;
}

Running out of places to move the goalposts to

2025-12-31T00:00:00+00:00

The history of artificial intelligence is full of moved goalposts. For example, chess was long thought to be a holy grail of intelligence. A solution to chess would be the solution to intelligence itself. But then chess got solved, and it didn’t seem like intelligence had been achieved. It turned out that chess wasn’t all that important after all, and intelligence was still out of reach. Goalposts moved.

Moving the goalposts is often an accusation. It is perceived as dishonest, or a form of cheating. But it is an important part of science. Sometimes you find that the goalposts really are in the wrong place, and they need to be moved.

Where should the AI goalposts be moved now? That is becoming an increasingly difficult question.

When ChatGPT came to prominence in 2023, it was clear that the so-called “Turing test” had more or less been passed, and so the goalposts needed to be moved again. LLM technology at that time had a lot of obvious shortcomings, so there were lots of places to move the goalposts to. Lots of comments of the form “LLMs can never be intelligent because they can’t do X”, where X is some ad-hoc goal that has never been discussed before.

Some of these ad-hoc goals became popular memes. Look, ChatGPT can’t tell how many r’s are in the word “strawberry”, therefore it will never be truly intelligent. Some people thought this was a showstopper / mic drop / big deal. It wasn’t a big deal though. First, because it was a goal that nobody had ever cared about before. Second, because ChatGPT soon became able to do it. Whoops. Time to move the goalposts again.

For a while I had my own personal ad-hoc benchmark. A complicated coding task related to the Busy Beaver problem. [Warning: technical details ahead.] Check out the pseudocode from this paper for what is known as the “closed position set” method (CPS). Basically there is a todo pile and a seen pile; items from the todo pile are popped, processed, and added to seen; and when the todo pile is finished, the seen pile is recycled back into todo, repeating all this until nothing new is added to seen. This is all accomplished by nested while loops. If you hear that and think, that sounds like it will involve a catastrophic amount of deep data structure cloning, well, you’re right. I implemented the algorithm as-is in Rust and it was indeed catastrophically slow. I wanted to change it to track what would need to be updated after an item was processed. But it’s tricky to get it right, and a lot of work, and I never quite managed it.

I failed at it, but I took some comfort in the fact that ChatGPT couldn’t get it either. Over and over I tried to goad it into giving me working code, and it never did. I returned to it as new models came out. Tried it with DeepSeek too, nothing ever got it right. It can never be truly intelligent if it can’t figure this out.

Except, recently it did get it right. I think it was with “ChatGPT 5.1 Thinking”. I said, here’s some code, figure out how to eliminate the catastrophic cloning. And within a minute it came back with working code. Just like that. Substantially faster and totally correct according to my elaborate test suite. And then I pressed a little further and it came up with a bunch of other performance optimizations. Oh. Wow. Time to move the goalposts again. Uh, where to?

All that’s left is “AI can never come up with an original artistic / scientific idea”. Except, this is not true anymore. AI-generated music is really starting to hit, and of course original, compelling images have been old hat for a while now. What about original scientific ideas? Yes, that’s starting to happen too.

I’ll give another Busy Beaver example. The Busy Beaver problem is uncomputable, in the sense that there is no algorithm that can solve it outright. There is a variation known as the Beeping Busy Beaver. This problem is super-uncomputable, meaning that it would still be uncomputable even if there were a solution to the regular-uncomputable Busy Beaver. It is impossible to exaggerate how difficult this problem is from a theoretical perspective, and it is also incredibly difficult practically, even on small instances. There are various techniques for dealing with small instances of the Busy Beaver problem, and nobody has any idea how to get them to apply to the Beeping Busy Beaver.

Well, “ChatGPT 5.2 Thinking” was actually able to move the needle slightly. It proposed modifying the CPS method to maintain a state transition graph as the closed position set is constructed; then analyzing the graph afterwards to verify certain liveness conditions. TBH, I don’t understand the details very well. But the code mostly works. It has a good true positive rate and a surprisingly low false positive rate. Not perfect, but it is literally the best idea I have ever heard for how to deal with this problem. It is certainly not something that is just regurgitated from the training set. It is a full-blown original idea.

So given all this, where should the goalposts be moved to next? I don’t know if I’m competent to tell. AI seems to have caught up to my own intelligence even in those narrow domains where I have some expertise. What is there left that AI can’t do that I would be able to verify? But even ignoring that, what is the point of trying to move the goalposts anymore? AI capabilities are improving at such an incredible pace that people don’t even realize that the goalposts need to be moved again since the last time they moved them. Perhaps the time has come to stop moving the goalposts and simply conclude that artificial intelligence really has been achieved.

Recursive Type Definitions in Rust

2025-10-18T00:00:00+00:00

I am pleased to announce that I was able to add a new feature to the Rust Clippy linter. Namely: the use_self lint will now notify that the Self keyword can be used in recursive type definitions. This feature is now officially available in the Nightly release. Hip hip hooray.

Recursive type definitions? Self keyword? Let’s look at some examples. First, consider the humble linked list, the nodes of which contain some data and possibly also a link to another node:

struct LinkedList {
    data: u8,
    link: Option<Box<LinkedList>>,
}

The link field refers back to the object’s own type, so this is a recursive type definition. In this case, the data field contains a u8. Except, actually, the requirements just changed. Now the data needs to be generic. Okay, just a little change to make:

struct LinkedList<T> {
    data: T,
    link: Option<Box<LinkedList<T>>>,
}

Obviously the data field was updated, and the struct had to be made generic accordingly. But on top of that, the recursive field also had to be updated. Kind of annoying, but it should be fine going forward. Except, no, wait, the requirements changed again. Now instead of an owned value, we’re going to take a reference. Gotta update with a lifetime now:

struct LinkedList<'t, T> {
    data: &'t T,
    link: Option<Box<LinkedList<'t, T>>>,
}

Okay I am starting not to like this. Every time the struct gets updated, the recursive field gets updated too. Lifetime soup. There’s gotta be a better way. Oh hey, there is. Clippy says:

error: unnecessary structure name repetition
  |
  |     link: Option>>,
  |                      ^^^^^^^^^^^^^^^^^ help: use the applicable keyword: `Self`

Instead of explicitly referring to the object by name, the Self keyword can be used instead:

struct LinkedList<'t, T> {
    data: &'t T,
    link: Option<Box<Self>>,
}

Well would you look at that. Now the type definition can undergo further changes and the recursive field can be left undisturbed.

Self is not just for struct. It is also for enum. For example, the definition of a basic Lisp language:

enum Expr<NumType, SymType> {
    Number(NumType),
    Symbol(SymType),
    Define(SymType, Box<Self>),
    Call(Box<Self>, Vec<Self>),
    Lambda(Vec<SymType>, Box<Self>),
    If(Box<Self>, Box<Self>, Box<Self>),
}

The Rust language reference also says that union can be recursive. But unions are already unsafe, so I didn’t implement this new feature for them. Recursive unions are not just unsafe, but exotically unsafe, and probably shouldn’t be messed with.

It might be argued that using Self in type definitions is not idiomatic. Certainly it is not very common. But to me that just means it is a good language feature that is poorly publicized. I myself didn’t learn about it until I had to update a recursive struct to be generic. I was annoyed that the use_self lint hadn’t alread told me that Self could be used there. Hence the new feature.

But still, what if you have recursive type definitions and you enable the opt-in use_self lint and you really, really do not want to use Self? Well there is something for you too. Just add recursive-self-in-type-definitions = false to your Clippy configuration file and you won’t have to hear about it.

Discussion Questions

Have you used Self in recursive type definitions?
Were you aware that this is even possible?
Have you used Self in impl blocks?
Does the new check have any false negatives?
Does the new check have any false positives?
Are there any uses for recursive unions?

AI-generated music is really starting to hit

2025-10-07T00:00:00+00:00

I have to admit it’s getting better
A little better all the time

– Lennon-McCartney

Back in 2021, somebody released an AI-generated album in the style of the Beatles. Well, maybe “album” is too strong a word, since it was only fifteen minutes long. Also it wasn’t really “music” in the traditional sense. I don’t know how to describe it exactly. Some sort of music-like audio document? It was all garbled and weird, definitely not something you would just put on and listen to like music.

Still though, I was blown away. Despite everything, it really sounded like the Beatles. This was back before ChatGPT blew up, before AI came to mainstream attention. It was a major wow moment for me. I had had the usual skepticism: computers will never be able to make music, computers will never be able to do this or that, etc. Even thought it was just a crude proof of concept, this “album” (or whatever you want to call it) changed my mind. It made me think: this technology is real, and even if today’s AI-generated “music” is gibberish, tomorrow’s may not be.

Now it is 2025, and full-blown real music is being produced by AI. No need for scare quotes – it is actual, listenable music. And I don’t just mean formulaic genre junk like EDM, muzak, lofi beats, etc. I mean genuinely novel, artistically interesting music. And “here today” I would like to highlight a particular example: The Beach Boys Sing The Beatles.

🚩 Listen! (Youtube) 🚩

I don’t think the correct vocabulary or conceptual framework has yet been developed to discuss a work like this, so bear with me. Basically this is an album in the style of the Beach Boys where all of the lyrics are from Beatles songs. The songs are not covers of Beach Boys songs and they are not covers of Beatles songs – they are “original” Beach Boys-like songs with seemingly random snippets of recognizable Beatles lyrics.

I want to emphasize a few points.

First, this album is amazing. I cannot stop listening to it. It is so good. It is sometimes claimed that “real music” must, by definition, be produced by humans. Well, if this isn’t real music, what have I been whistling to myself all week? Am I some kind of cretin who cannot identify music? Or is it possible, on the other hand, that the human-creation criterion is just a pointless restriction on the definition of music?

Second, we have these expressions “in the style of the Beach Boys” or “Beach Boys-like”. These crummy circumlocutions really sell it short: this album sounds just like the Beach Boys. And not just cornball striped-shirt surfboards-and-cars Beach Boys. I’m talking about late-60s Brian-at-his-peak fire-helmet Beach Boys. The whole gang is there: Carl, Mike, everyone. And they all sound great. “Carl” in particular has some moments that give me chills, matching his most soulful Wild Honey vocals.

Third, it is difficult to figure out where to start discussing a work like this. For one thing, it is not obvious whether it even is a “work”, or what kind of “work” it is. As I said before, I don’t think the vocabulary or conceptual framework exists to discuss this cogently. The technology is just too new and too dazzling, and traditional theories of art interpretation and criticism have not caught up.

But I would say on its face there are two way to discuss it. The first is as what it actually is, namely an AI-generated album from 2025. Who made it? How exactly was it made? To what extent was there human involvement in its production? I don’t know the answers to any of these questions.

The last question is especially important. In my own experience dealing in AI-generated stuff (images, code, etc), I find that there is a lot of garbage to sort through. It’s rarely the case that you get what you want on the first try. Maybe you get something close to what you want, but not quite. The prompt-evaluate-retry loop can go on for a while. So while the output is “AI-generated”, there may be a substantial human-influenced selection bias in what the final work ends up actually being.

Digging a little further into the specific content of the album, I am most interested to know how the particular Beatles lyrics were chosen. By human? By AI? It is, to put it mildly, a “very strange” selection. If you are able to listen to it and identify all the songs used, I would call you a Beatles expert.

Anyway, that is the first way to discuss this album. The second way to discuss it is as an alternative history work. It is sometimes claimed (again, by definition) that the work is not actually what is important about art; what is important instead is the process that led to the creation of the work.

In this case, it is easy to imagine a backstory that could have led to the creation of this album. Consider: The Beatles broke up in 1969. Brian Wilson, who had for years been driven by friendly competition with the Beatles, was inspired to take a bunch of random lines from Beatles songs and set them to new music. The album was released in 1970 to a combination of puzzlement and acclaim.

This is an eminently plausible scenario. It didn’t happen, but it very well might have. And in that case, what would be the place of BBSB in the Beach Boys canon? How would it be understood in relation to Pet Sounds and Smile? How would the Beatles themselves have responded? Sure, BBSB doesn’t have a real historical context, but that doesn’t mean that these counterfactuals can’t be pondered.

(Things probably would have gone better for the Beach Boys if they had released this album. It’s a lot better than what actually did happen: Brian went nuts, and the rest of the band, lacking any direction or vision, spent the early 70s releasing a string of dull, lifeless, forgettable albums.)

Listen, I could go on all day about this album. In fact, I have been doing that – my family and friends are sick of hearing about it. So I’ll wrap this up with a takeaway message: AI-generated music is here and it’s real. People are still saying that it’s nothing but hype, it’s just a big scam, it will always sound janky and unnatural and weird, it will never be “truly musical”, etc. If you believe any of this, I strongly encourage you to listen to The Beach Boys Sing The Beatles and reassess.

Discussion Questions

Did you listen to BBSB? If so, what did you think?
Which Beatles lyrics were you able to identify? Which were you unable to identify?
Are there any obvious AI-artifacts? Anything that clearly sounds bad, off, unmusical, etc?
What are the “inspirations” behind the album? Are there any specific moments from the album that are clearly ripped off from existing songs?
To what extent is BBSB an “original” musical work? To what extent does it manage to sidestep the problem of “originality” by openly relying on preexisting Beach Boys sounds and Beatles lyrics?
If the Beach Boys actually had released BBSB, to what extent would it have been “original”?
Other than its lyrics, does music consist of anything apart from its pitches, rhythms, timbres, etc?

The Shape of a Turing Machine

2025-09-30T00:00:00+00:00

Recently I found an interesting Turing machine program. When started on a blank tape, it runs for more than 10¹⁵⁶⁵ steps before halting. Here it is¹:

What’s special is that this is, as far as anyone knows, the longest that a Turing machine program of eight instructions can run.

Now, this might sound familiar to you if you’ve heard of the Busy Beaver game. The goal of that game is to find the longest running Turing machine of a given length. However, program length has traditionally been measured by number of states and colors, rather than total number of instructions. So we say, for example, that a program is 5-state 2-color (5x2), or 2-state 4-color (2x4), etc. We call a program’s particular number of states and colors its shape.

A fully specified N-state K-color halting program has NK - 1 instructions (-1 because reaching an undefined instruction is what it means to halt), and so NxK programs have the same number of instructions as KxN. It is therefore natural to wonder whether there might be some relationship between the two classes, or whether there might be a way to convert an NxK program to KxN form, or whether the two categories ought to be somehow grouped together.

There is a point of view that says this approach is nonsensical because it ignores the real semantic difference between states and colors. In fact states and colors correspond to fundamental control flow constructs, namely jumping and branching. An NxK program has N jump targets, each of which executes a K-way branch.

For example, consider the 4x2 Busy Beaver champion program, discovered by Allen Brady in 1964, and the 2x4 champion, discovered by Shawn and Terry Ligocki in 2005:

    +-----+-----+        +-----+-----+-----+-----+
4x2 |  0  |  1  |    2x4 |  0  |  1  |  2  |  3  |
+---+-----+-----+    +---+-----+-----+-----+-----+
| A | 1RB | 1LB |    | A | 1RB | 2LA | 1RA | 1RA |
+---+-----+-----+    +---+-----+-----+-----+-----+
| B | 1LA | 0LC |    | B | 1LB | 1LA | 3RB | --- |
+---+-----+-----+    +---+-----+-----+-----+-----+
| C | --- | 1LD |
+---+-----+-----+
| D | 1RD | 0RA |
+---+-----+-----+

The difference in shape between these programs is visually manifest, and it becomes especially pronounced when they are transliterated into C:

main() {  /* 4x2 */
 A:
  switch (READ) {
    case 0: { PRINT(1); RIGHT; goto B; }
    case 1: { PRINT(1); LEFT;  goto B; }
  }

 B:
  switch (READ) {
    case 0: { PRINT(1); LEFT;  goto A; }
    case 1: { PRINT(0); LEFT;  goto C; }
  }

 C:
  switch (READ) {
    case 0: { return; }
    case 1: { PRINT(1); LEFT;  goto D; }
  }

 D:
  switch (READ) {
    case 0: { PRINT(1); RIGHT; goto D; }
    case 1: { PRINT(0); RIGHT; goto A; }
  }
}

main() {  /* 2x4 */
 A:
  switch (READ) {
    case 0: { PRINT(1); RIGHT; goto B; }
    case 1: { PRINT(2); LEFT;  goto A; }
    case 2: { PRINT(1); RIGHT; goto A; }
    case 3: { PRINT(1); RIGHT; goto A; }
  }

 B:
  switch (READ) {
    case 0: { PRINT(1);  LEFT; goto B; }
    case 1: { PRINT(1);  LEFT; goto A; }
    case 2: { PRINT(3); RIGHT; goto B; }
    case 3: { return; }
  }
}

There is no apriori reason to believe that it should be possible to somehow convert back and forth between jump targets and switches, and so we might say that it is ridiculous to compare NxK programs with KxN. They are apples and oranges.

But wait, not so fast. Actually there is a way to commensurate NxK and KxN programs: simply interpret them both as max(N,K) x max(N,K) programs! To use the examples above, this means interpeting 4x2 and 2x4 programs as woefully underspecified 4x4 programs that happen not to avail themselves of all available states and colors:

    +-----+-----+-----+-----+        +-----+-----+-----+-----+
    |  0  |  1  |  2  |  3  |        |  0  |  1  |  2  |  3  |
+---+-----+-----+-----+-----+    +---+-----+-----+-----+-----+
| A | 1RB | 1LB | --- | --- |    | A | 1RB | 2LA | 1RA | 1RA |
+---+-----+-----+-----+-----+    +---+-----+-----+-----+-----+
| B | 1LA | 0LC | --- | --- |    | B | 1LB | 1LA | 3RB | --- |
+---+-----+-----+-----+-----+    +---+-----+-----+-----+-----+
| C | --- | 1LD | --- | --- |    | C | --- | --- | --- | --- |
+---+-----+-----+-----+-----+    +---+-----+-----+-----+-----+
| D | 1RD | 0RA | --- | --- |    | D | --- | --- | --- | --- |
+---+-----+-----+-----+-----+    +---+-----+-----+-----+-----+

Now we have a straightforward apples-to-apples comparison between two 4x4 programs each with seven instructions defined. And the results are stark: the 4x2 champ runs for 106 steps before halting, while the 2x4 champ runs for 3,932,963 steps. Wow! This suggests that, in some sense, colors are more powerful than states.

But is it really so simple? Notice that these traditional shape-based programs appear artificially constrained in this new context. Neither of them use the diagonal, for example, and they both have a boxy look. Are there any 7-instruction programs that run longer? If so, what are their shapes? In general, what about N-instruction programs?

This question is known as Instruction-Limited Busy Beaver, or BBi(n) for short. It was first proposed by pseudonymous Internet denizen MrBrain in July 2025. Brain and Shawn were quickly able to establish some definitive early values:

n	BBi(n)	Shape	Notes
3	5	2x2	BB(2,2)
4	16	3x2
5	37	2x3	BB(2,3)
6	123	2x4
7	3932963	2x4	BB(2,4)

These early results were somewhat disappointing. The hope was that BBi search would turn up some exotic new program shapes. Instead, it confirmed only what was already believed: that colors are more powerful than states. So much more powerful, it would seem, that states were really just a drag, and that the best strategy was to minimize state use entirely.

So I was very pleased when on 26 July 2025 I found a new BBi(8) champ. Not just because of the thrill of discovery, but also because the program had an unexpected shape: 3x4. States may have some use after all! And thus this entry was added to the results table:

n	BBi(n)	Shape	Notes
8	10¹⁵⁶⁵ <	3x4	🎉

Discussion Questions

How many states and colors can an N-instruction program use?
What is the best method for measuring program length?
What other evidence is there that colors are more powerful than states? Is there any counter-evidence?
How does BBi relate to Brady’s algorithm?
How does program shape relate to the Spaghetti Code Conjecture?
Recently the true value of BB(5,2) was proved to be 47,176,869. The champion 5x2 program uses 9 instructions. By the lights of BBi, is this a good score?
Recently mxdys discovered a 6x2 program that runs for pentationally many steps before halting, with a score > 2 ↑↑↑ 5. That program uses 11 instructions. How would you expect this to compare to the true value of BBi(11)?
There are off-by-one discrepancies between the step counts used in this post and those reported elsewhere. Why?
Consider the function SHAPE(n) that returns the shape of the BBi(n) champion. Is this function computable? What about its rate of growth?

Open Problems

Find a better BBi(8) score, or else prove the current champion.
BB(2,5) > 10 ↑↑ 4. Find a better BBi(9) score, or prove that value.

Footnotes

¹ In plain text:

    +-----+-----+-----+-----+
    |  0  |  1  |  2  |  3  |
+---+-----+-----+-----+-----+
| A | 1RB | 1LA | --- | --- |
+---+-----+-----+-----+-----+
| B | 1RC | 3LB | 1RB | --- |
+---+-----+-----+-----+-----+
| C | 2LA | 2LC | --- | 0LC |
+---+-----+-----+-----+-----+

Performance Hacks for Brady’s Algorithm

2025-07-15T00:00:00+00:00

In the Busy Beaver game we ask: what is the longest that a Turing machine program of N states and K colors can run before halting when started on the blank tape? Answering this question requires enumerating every such program and checking whether or not it halts. But there is a big problem: there are just too many to check. The number of programs grows multiple-exponentially with N and K, O(nk^nk). Yikes!

Brady’s algorithm is an enumeration technique that allays this situation somewhat. It is based on two observations. First, we know that the Turing machine programs will be run from the blank tape. This constrains the possible execution paths. An arbitary program may have instructions that are simply unreachable in these circumstances, and there is no need to consider such programs. Second, some programs are isomorphic duplicates of each other, differing only in having their states or colors rearranged. There is no need to consider these duplicates, and only one program from an isomorphic group will need to be considered.

So the algorithm goes like this. Start on the blank tape with a program whose only instruction is A0:1RB. Then run it until an undefined instruction is reached. Then enumerate all possible instructions, pursuant to the following restriction: a new state can only be used if all prior states have been used. For example, state D cannot be used until state C has been used, and state E cannot be used until state D has been used, etc. And likewise for colors. Then for each such instruction, create an extension of the program with that instruction inserted and recursively continue the procedure. This ensures that only programs with actually reachable and meaningfully distinct instructions are generated.

It’s a cool algorithm, and a dramatic improvement over naive program generation. But even still, there are an awful lot of programs to generate, and running the algorithm can take quite a long time. So it is very important to pay attention to fine implementation details and take advantage of low-level performance hacks wherever possible. Small gains add up!

For some context, we will consider a real-world, used-in-anger, known-good implementation of Brady’s algorithm written by Shawn and Terry Ligocki and offer a few suggestions to make it faster. These are the sorts of changes that apply generically; basically any implementation of the algorithm will deal with these same issues. (Hopefully it goes without saying, but nothing here should be construed as negative or critical. This is fine code that has already proved its worth.)

There is some set-up to get the whole apparatus going. We will ignore all of that and jump straight into the action:

class TM_Enum:
    def set_trans(self, *, state_in, symbol_in, symbol_out, dir_out, state_out): ...

    def enum_children(self, state_in, symbol_in):
        max_state = 0
        max_symbol = 0
        num_def_trans = 0

        for state in range(self.tm.num_states):
            for symbol in range(self.tm.num_symbols):
                trans = self.tm.get_trans_object(state_in=state, symbol_in=symbol)
                if trans.condition != Turing_Machine.UNDEFINED:
                    num_def_trans += 1
                    max_state = max(max_state, trans.state_out)
                    max_symbol = max(max_symbol, trans.symbol_out)

        num_states = min(self.tm.num_states, max_state + 2)
        num_symbols = min(self.tm.num_symbols, max_symbol + 2)

        if num_def_trans < self.max_transitions:
            for state_out in range(num_states):
                for symbol_out in range(num_symbols):
                    for dir_out in range(2):
                        new_tm_enum = copy.deepcopy(self)

                        new_tm_enum.set_trans(
                            state_in=state_in,
                            symbol_in=symbol_in,
                            symbol_out=symbol_out,
                            dir_out=dir_out,
                            state_out=state_out,
                        )

                        yield new_tm_enum

The outline of the procedure is clear: at the branch point, determine the available instructions based on the combination of already-used states and colors and maximum possible states and colors, then create extensions from them. There are three easy ways to improve this.

Pass on used-parameter information from parent to child.

At the start of the branch, the program stops to check how many and which instructions it has used so far. But the parameters of the child program can be derived from the parameters of the parent program plus the extension instruction, so really the program should already know this information about itself. If each node keeps track of its parameter information and passes it on to its extensions, the parameter recalculation can be skipped entirely.

Pre-calculate available instructions.

Given the available parameters, the available instructions are generated on the fly every time. But in practice the maximum available parameters are never all that large. So it is much faster to generate a table of all possible available instructions just once up front. Then the branching program can hold a reference to that table and index in with available parameters as needed. This will look something like:

avail_instrs: list[Instruction] = self.table[avail_states][avail_colors]

Then at branch-time, obtaining available instructions is just a fetch operation, no generation required.

Re-use the existing program.

With the instruction table approach, extension creation looks like this:

for instr in avail_instrs:
    new_tm_enum = copy.deepcopy(self)
    new_tm_enum.set_trans(instr)  # or whatever
    yield new_tm_enum

We ran our program until it reached an undefined instruction, and now we are at the branch point, and we create one extended program for each available instruction. Well, what happens to the program object we were just running? Currently it gets thrown in the trash. But it is perfectly good and can continue to be used. And since the instructions are all there together, it is easy to accomplish this with some list manipulation:

*rest_instrs, last_instr = avail_instrs

for instr in rest_instrs:
    new_tm_enum = copy.deepcopy(self)
    new_tm_enum.set_trans(instr)
    yield new_tm_enum

self.set_trans(last_instr)
yield self

This saves one deepcopy call per branch and also reduces the amount of garbage that must be collected.

Busy Beaver Backwards

2025-07-03T00:00:00+00:00

In the classic Busy Beaver game we ask: what is the longest that a Turing machine program of N states and K colors can run before halting when started on the blank tape? The basic approach to solving this problem is to generate a list of candidate programs, then subject each program to a sequence of deciders, where a decider is a function that takes a program as input and returns a result of type Option. This result is interpreted as follows:

Some(true): the program provably halts
Some(false): the program provably does not halt
None: haltingness could not be determined

Proving non-haltingness means refuting the possibility of halting, usually by showing that the program’s halt conditions are unreachable.

One of the fundamental methods for refuting haltingness is backward reasoning. The idea is to start with a program’s halt conditions and work backwards, reconstructing possible paths that could have reached it. If it can be shown that there are no valid paths, then the program’s haltingness is refuted.

Here is a simple example:

    +-----+-----+
    |  0  |  1  |
+---+-----+-----+
| A | 1RB | 0LA |
+---+-----+-----+
| B | 1LA | --- |
+---+-----+-----+

This program halts if it scans a 1 while in state B. Other than the scanned 1, the tape contents don’t matter. In other words, the halt configuration is:

B1 | ? [1] ?

The goal now is to figure out the previous configuration. There is only one instruction that reaches state B, and that’s A0 : 1RB. The machine must have been in state A scanning a 0, and since that instruction moves right, that 0 must have been to the left of the current head. The tape contents are unknown other than the current scanned 1, so it is consistent that a 0 be at that spot. The previous configuration must therefore have been:

A0 | ? [0] 1 ?

Repeating the process, there are two instructions that lead to state A: A1 : 0LA and B0 : 1LA. Both of these instructions go to the left, so they must have come from the right. A1 : 0LA writes a 0, but the cell to the right of the scan contains a 1. So the A1 instruction is impossible. The B0 instruction is consistent with the tape contents, so we move on to the next configuration:

B0 | ? 0 [0] ?

As before, the only possible instruction that could reach this is A0 : 1RB. But that instruction writes a 1, while the cell to the left has a 0. So the instruction is impossible. There are no other configurations to consider, so we can conclusively say that this program cannot halt.

The full sequence of configurations looks like this:

1 | B1 | ? [1] ?

2 | A0 | ? [0] 1 ?

3 | B0 | ? 0 [0] ?

We call this a backward refutation of length 3 and width 1.

Here is a more complicated example:

    +-----+-----+
    |  0  |  1  |
+---+-----+-----+
| A | 1RB | 0LA |
+---+-----+-----+
| B | 0RC | 1RC |
+---+-----+-----+
| C | 1LA | --- |
+---+-----+-----+

As before, we start with the single halting configuration:

C1 | ? [1] ?

How was this configuration reached? This time there are two possibilities: B0 : 0RC and B1 : 1RC. Both instructions are consistent with the tape contents, so both must be considered:

B0 | ? [0] 1 ?
B1 | ? [1] 1 ?

The same process must now be repeated for both of these branches. Here is the full sequence of configurations:

 1 | C1 | ? [1] ?

 2 | B0 | ? [0] 1 ?
 2 | B1 | ? [1] 1 ?

 3 | A0 | ? [0] 0 1 ?
 3 | A0 | ? [0] 1^2 ?

 4 | A1 | ? 0 [1] 1 ?
 4 | C0 | ? 0 [0] 1 ?

 5 | C0 | ? 0 1 [0] ?
 5 | B0 | ? [0] 0 1 ?

 6 | B1 | ? 0 [1] 0 ?
 6 | A0 | ? [0] 0^2 1 ?

 7 | A1 | ? 0 [1] 0 1 ?

 8 | A1 | ? 0 1 [1] 1 ?

 9 | C0 | ? 0 1^2 [0] ?

10 | B1 | ? 0 1 [1] 0 ?

11 | A0 | ? 0 [0] 1 0 ?

12 | C0 | ? 0^2 [0] 0 ?

13 | B0 | ? 0 [0] 0^2 ?

This is a backward refutation of length 13 and width 2 – width 2 because that is the maximum number of configurations at any step.

In these examples, we have seen a 2-state 2-color program with a refutation of length 2 and a 3-state 2-color program with a refutation of length 13. Are there any longer ones? Perhaps you can see where this is going. We can ask the general Busy Beaver Backward question: among backward-refutable programs of N states and K colors, what is the length of the longest refutation?

(What would be a good name for this function? BBBack? I want my BBBack, BBBack, BBBack, …)

I will claim tentatively that these values are in fact the winners: the longest 2/2 refutation has length 2 and the longest 3/2 refutation has length 13. I don’t have a proof, although whatever the true values are, they are certainly provable.

Here are the best values that I have been able to find, along with their witnessing programs:

States	Colors	Program	Refutation Length
2	2	`1RB0LA_1LA---`	3
2	3	`1RB1RB---_0LB2RB1LA`	8
3	2	`1RB0LA_0RC1RC_1LA---`	13
2	4	`1RB0RA3LA2RB_2LA---2RB3LA`	17
		`1RB1LA---3RB_2LA3RB0LB1LA`	17
		`1RB1LA---3RB_2LB3RB0LA1LA`	17
2	5	`1RB4RB---1RB2RB_2LB3LA3RB0LA1LA`	41
		`1RB3RA0RB0LA2RB_2LA---4LA---3LA`	41
4	2	`1RB0RB_1RC1LD_1LA---_0LD0RA`	46
		`1RB0LA_0RC1RC_1RD1LA_1LB---`	46
3	3	`1RB0LA0RB_2RC1RC1LA_1LA2LA---`	50
		7 others (8-way tie)	50
5	2	`1RB0LA_0RC1RC_1RD1LA_0RE1LB_1LC---`	115
		`1RB0RB_1RC1LE_1RD1LA_1LB---_0LE0RA`	115

I would be very interested to know if these values can be beaten. Alternatively, if there is a bug in my backward reasoner and any of the values are illegitimate, I would be very interested to know that too.

A trend that shows up in this data is that longer refutations correlate with more states and fewer colors. This is because more colors means exponentially more backward branching possibilities, and this tends to foil the backward reasoning method. I interpret this as yet more evidence that colors are more powerful than states.

Questions and Exercises

Verify the claimed BBBack values, or find better ones, or show that they are illegitimate.
How can the backward reasoning method be used to prove haltingness?
A similar question is: among backward-refutable programs of N states and K colors, what is the width of the widest refutation? Find the best values for this function and exhibit their witnessing programs.
Is BBBack computable? Why or why not?
Backward reasoning can be used to refute haltingness, but it can be used for other conditions as well. Use backward reasoning to show that the following programs cannot erase the tape. How many steps do they take?

1RB0RD_1LC0LA_0RA1LB_1RE0LB_0LB1RD
1RB0RD_1LC0LA_0RA1LB_1RE0LB_1LE1RD
1RB0LC_1LC0RD_0RE1LA_0LA1RD_0RB1LB
1RB0RB_1RC1RA_1LC0LD_0RA0LE_1LD1LE
1RB1RD_1LB0LC_0RD0LE_1RA0RA_1LC1LE

What a Difference a Faster Hash Makes

2025-05-27T00:00:00+00:00

I have some Rust code that does a bunch of pure computation. A lot of it. To solve basically an elaborate combinatorial problem.

Wanted to speed it up. Profiled with flamegraph.

When you do performance profiling, what you really want to find are hot spots. Some low-hanging fruit. You want to look at the profile and see one big anomolous entry. “We’re spending X% of total CPU time doing what now???”. Look at the code and find there is some critical point where something infelicitous is being done. Then fix it for a big-time speed-up. That’s the dream.

And lucky for me that’s exactly what I found. According to the graph, I was spending 23.25% of total CPU time on the function ::write. What? Yeah, that’s just time spent going back and forth with std::collections::{HashMap, HashSet}. A really stupid amount of effort to spend on a basic administrative task.

Replaced the builtin hash collections with an external package, ahash::{AHashMap, AHashSet}. Boy oh boy, what a difference it made. The big computational task went from taking 67 minutes to 55 minutes. That’s an 18% improvent for a diff of only a few lines. Hooray.

Why is std::collections::HashMap so much slower than ahash::AHashMap? Well, the first thing to note is that ahash is a really hard word to type. I have never typed it correctly on the first try. It always comes out as ahahs or ashash, etc. That doesn’t have anything to do with speed, it’s just something I wanted to complain about.

The builtin hash procedure is slower because it attempts to provide some security. It wants to make it as difficult as possible to guess a key from a value. That is a great feature for a web server or CLI tool. But it is totally useless for solving combinatorial problems. There is no need to make anything difficult to guess because there is no untrusted input. It is the computational equivalent of installing external locks on the internal doors in a home. Stop doing that and it goes a lot faster.

There are other hash collection implementations. Some of them are specialized for certain types of data. For example if the data is all numbers then you will want one hash procedure, but a different one if you are hashing compound objects.

A good way to make it easy to experiment with different hashers is to use import aliases. What you want to avoid is having to modify every callsite. For example:

use std::collections::{HashMap, HashSet};

let mut a = HashMap::new();  // gotta change this
let mut b = HashMap::new();  // gotta change this
let mut c = HashSet::new();  // gotta change this
let mut d = HashSet::new();  // gotta change this

Instead, I like to refer to the collections by more generic names, like Dict and Set.

use std::collections::{HashMap as Dict, HashSet as Set};

let mut a = Dict::new();
let mut b = Dict::new();
let mut c = Set::new();
let mut d = Set::new();

Swapping in a different hasher is only a matter of changing the import to use ahash::{AHashMap as Dict, AHashSet as Set}.

This technique has the added benefit / drawback of making the code look more like Python.

BBB(3, 3) > 10 ↑↑ 6

2025-03-24T00:00:00+00:00

The other day I found a cool new 3-state 3-color Turing machine program. Here it is:

	０	１	２
Ａ	１ＲＢ	０ＬＢ	２ＬＡ
Ｂ	１ＬＡ	０ＲＣ	０ＬＢ
Ｃ	２ＲＣ	２ＲＢ	０ＬＣ

When started on the blank tape, this program runs for more than 10 ↑↑ 6 steps before terminating.

Now, you may notice that this program has no halt instruction and therefore obviously can never halt. And given that it can never halt, you may wonder what I mean when I say that it “terminates”.

Observe that the program contains the instruction C0 -> 2RC. That is, if the machine is in state C and scanning a blank cell (0), then it will remain in state C and move right. We are starting the program from the blank tape, so there are only ever finitely many marks on the tape. So if the program should ever reach state C with the tape head to the right of all the marks, then it is clear that it will get stuck in instruction C0 forever. And indeed, this is exactly what happens – it ends up in this configuration:

C0 | 2^Q 0 [0]

That is, there is a 2-block of length Q, followed by a blank cell, and the machine is scanning another blank cell and is in state C. It is obvious that no meaningful computation can occur after this point, so we may as well just end the run there. This circumstance is known as spinning out.

Spinning out is the simplest possible behavior that a non-halting program can exhibit.

Spinning out is also an instance of a more general behavior known as quasihalting. Whereas halting means that all states become unreachable, quasihalting means that some states become unreachable. In the specific case of spinning out, all states but one become unreachable. (Indeed, all instructions but one become unreachable).

The classic Busy Beaver question (BB) asks: what is the longest that a Turing machine program of N states and K colors, when started on the blank tape, can run before halting? The program here cannot halt and so is obviously not a candidate for BB. However, the Beeping Busy Beaver question (BBB) is just the same as BB, except that it asks for quasihalting instead of halting. This is program does quasihalt, and in fact it is the new BBB(3, 3) champion! And it is now known that BBB(3, 3) > 10 ↑↑ 6.

How can such a simple program generate such a huge number? Actually, although the number is too huge to be written out in full, it is simple to specify. I said earlier that the final tape configuration reached contained a block of length Q. Here is the precise definition of Q:

2 ** (4 + (2 ** (4 + (2 ** (4 + (2 ** (4 + (2 ** (4 + (2 ** 20))))))))))

This is a big number, but ultimately it is just a power of 2. The program achieves this by implementing a simple additive rule, then using that additive rule to implement a multiplicative rule, then applying that multiplicative rule repeatedly. This is exactly what one might expect based on the repetitive structure of Q. Calculating these rules is not terribly complex, but it does require some real math.

A few notes:

Running a program for tetrationally many steps cannot be done directly. It requires a fast-forwarding, algebra-aware inductive prover simulator. But for such a simulator, this program runs extremely quickly: termination is reached in only a few hundred steps.
The Spaghetti Code Conjecture says that Busy Beaver programs ought to be complicated, ill-structured, or otherwise “spaghetti code”. This program, however, has a fairly clean structure. It has three states, but two of those states do not communicate with each other: state A only reaches itself and state B, and likewise state C only reaches itself and state B. State B therefore acts as some sort of dispatch node, and this fact can be gleaned simply by looking at the program text. So this program is weak evidence that maybe the Spaghetti Code Conjecture is false.
The previous BBB(3, 3) champion was found by Shawn Ligocki back in February 2022. That program quasihalts after around 10⁶² steps, so it is “just” exponential, rather than tetrational like this new one. When announcing that discovery, he said “I don’t think I’ll find any more without some more clever searching.” But I didn’t come up with any particularly novel search strategy – it was just standard Brady’s algorithm. So why didn’t Shawn find this one? I think it was simply a matter of being in the right place at the wrong time. He was the first person to find a tetrational program, but that didn’t happen until May 2022, a few months after his BBB(3, 3) search. After that he overhauled his simulator to handle tetrational numbers, but I suppose he didn’t go back to the 3-3 space after that. If he had, he probably would have found it. (My own simulator is partially based on Shawn’s. I would say it is approximately 1/3 directly similar, 1/3 distinct, and 1/3 convergently similar.)

Finally, here are the latest results for BB / BBB.

States	Colors	BB	BBB
3	2	21	55
2	3	38	59
4	2	107	≥ 32,779,478
2	4	3,932,964	> 10²⁴
5	2	47,176,870	> 10¹⁴⁰⁰⁶
3	3	> 10¹⁸	> 10 ↑↑ 6
2	5	> 10 ↑↑ 4	…
6	2	> 10 ↑↑ 15	…

Proven values are stated exactly; the rest are lower bounds. Some values are provably difficult to prove. In the case 2-state 5-color and 6-state 2-color, there is no BBB result better than the best known BB result.

🚨 OPEN PROBLEM ALERT 🚨

Find values of BBB(6, 2) or BBB(2, 5) better than their BB counterparts.

An ideology-induced bug in Mypy

2024-09-10T00:00:00+00:00

Mypy is a typechecker for Python. It’s not the official typechecker for Python. There is no official typechecker. But Mypy seems to be the official-est. I use it, and it’s mostly pretty great.

Mypy has a bug. Big deal, lots of software has bugs. But this bug seems to have been deliberately chosen on the basis of some misguided code ideology. I think the ideology ought to be discarded and the bug ought to be fixed.

Before describing the bug, I would like to speak about static typing in Python. Python is renowned for how freeing it feels. You can write some code and run it, just like that. Static typing, on the other hand, is often associated with the feeling of arbitrary restrictions. Why does the compiler keep complaining, just let me run my code! So it is sometimes thought that static typechecking runs counter to the spirit of Python.

But static typing remains totally optional. Everyone is free to write Python without declaring types and free to run it without checking anything. Of course, the freedom to run code without typechecking is a lot like the freedom to ride in a car without a seatbelt. The freedom to encounter runtime type errors, so liberating!

No, I’m just kidding (somewhat). Freedom really is a valuable aspect of the Python experience. Users don’t want to be burdened with doing a bunch of paperwork before they can try something out. At the same time, some users would prefer to know about type errors before runtime, especially in already-existing Python codebases. Optional, incremental typechecking is a great way to balance freedom and correctness in Python.¹

Freedom is important in Python, get it? We’ll come back to this later. Okay, now on to the bug. Consider this code:

from __future__ import annotations

from random import randint

class WhatIsIt:
    def __new__(cls) -> int | WhatIsIt:
        if randint(0, 1):
            return object.__new__(cls)
        else:
            return 5

def check(x: WhatIsIt) -> None:
    assert isinstance(x, WhatIsIt)

x = WhatIsIt()

check(x)

What happens when check(x) is called? The function asserts that its argument is an instance of WhatIsIt. So if variable x is not a WhatIsIt, an AssertionError will be raised; otherwise, nothing will happen.

That variable x – what is it? Its value comes from the WhatIsIt constructor, so it must be a WhatIsIt, right?

Well, no. That constructor – WhatIsIt.__new__ – usually returns an instance of WhatIsIt, but occasionally it returns an int. Notice that this is explicitly annotated in its return type: int | WhatIsIt.

According to its type annotations, the function check expects a WhatIsIt argument. So the call check(x) is a type error, since x could be an int. But Mypy doesn’t say anything about that. Instead, it raises a different warning:

error: "__new__" must return a class instance (got "int | WhatIsIt")  [misc]

It says that the __new__ constructor “must” return a class instance. “Must” is a funny word, straddling the distinction between “is” and “ought”. In this case, the “is” interpretation of “must” is literally false: it just simply is not the case that a constructor must return an instance of its class. As the example here shows, a constructor very much can return something else. So the “must” here seems to mean “ought”, as in “__new__ ought to return a class instance”.

This is just an opinion. It’s a fine opinion to hold, and if a linter warned about this, there would be no problem. But the job of a typechecker is not to give opinions. A typechecker has just one job: analyze the types and warn about inconsistencies.

Okay, I guess Mypy is oddly opinionated about the practice of returning something other than a class instance from a class constructor. Just disable the warning then:

class WhatIsIt:
    def __new__(cls) -> int | WhatIsIt:  # type: ignore[misc]
        ...

After this change, Mypy reports: Success: no issues found in 1 source file. But this is a false negative! There is a type error sitting right there! Apparently Mypy is so committed to its constructor-instance ideology that it refuses to do any further typechecking, even when the constructor is clearly and correctly annotated. This is a full-blown type-inference bug, and it ought to be fixed.

There is an opposing point of view that says: the obvious thing for a constructor to do is to return an instance, and in fact that is what is actually done in practically all cases, and doing otherwise violates an overwhelmingly common assumption. But this argument itself violates an even more important tenent, namely Pythonic freedom.

Here is the reality of the situation: the __new__ constructor can return anything. Regardless of what it “should” return, Python allows for writing class constructors that can return whatever. That is the freedom of Python, and it is exactly why the language is so great. There is no good reason why this freedom should not be accomodated to as great an extant as possible.

Discussion Questions

Have you ever written a __new__ constructor to return something other than a class instance? Did this lead to any confusion?
Wait, what is __new__? Is that the same as __init__?
How do other languages deal with constructors returning objects of different types?
Python allows users to write and run code quickly. This often comes at the expense of all sorts of runtime errors. Is this actually a good trade-off?
Is static typing in Python a good idea?

Footnotes

¹ There is an argument against typechecking in Python that says typing is inappropriate because Python is a “scripting language”. But as far as I can tell, “scripting language” just means a language without static types. So this argument is patently circular and therefore very stupid.