Build Your Own Stack-Based Bytecode Virtual Machine in C

Ever wonder how Python's CPython, the JVM, or the Lua VM actually execute code under the hood? In this article, we build a stack-based bytecode virtual machine from scratch in C.

You've heard the word "virtual machine" thrown around a thousand times. The Java Virtual Machine. The CPython VM that powers Python. The Lua VM. The WebAssembly runtime. They're everywhere. But do you actually know how they work?

I mean really work — not the hand-wavy "it interprets bytecode" explanation, but the actual mechanics. The stack. The instruction pointer. The dispatch loop. The thing doing the work.

I didn't. And then I decided to build one.

This article is about implementing a stack-based bytecode virtual machine from scratch in C. We're going to define a real instruction set, implement a constant pool, write an execution loop, and run actual bytecode through it. By the end, you'll have a working VM that can execute two instructions — and, more importantly, you'll understand the foundation that makes every other instruction possible.

Fair warning: this is Part 1. We've got 17 instructions total to implement. We're covering the foundation today, and the rest in follow-up articles. But the foundation is the hard part. Once you understand the loop, everything else is just more cases in a switch statement.

What Even Is a Stack-Based Bytecode VM?

Two concepts you need to lock in before writing a single line of code.

First: the stack. A stack is a list of slots that can hold values — numbers in our case. You can add values to it (push), remove values from it (pop), and access the top of it. That's it. It's dead simple, and it's one of the most fundamental data structures in computing. You've already seen one — the JavaScript call stack is a stack. The Java heap and stack that the JVM manages? Same idea. The stack is everywhere.

The key constraint: you can only directly interact with the top. This feels limiting at first. We'll get there.

Second: bytecode. Instead of executing source code directly (like a scripting interpreter would), a bytecode VM compiles source code down into a compact binary representation — bytecode — and then interprets that. Each byte is an instruction (or an argument to one). The VM reads them one at a time and figures out what to do. This is how CPython works. It's how the JVM works. It's how the Lua VM works.

Put them together and you have a stack-based bytecode virtual machine: a program that reads bytecode instructions and uses a stack to do all its work — no registers, no complex addressing modes. Push operands, execute an instruction, the result lands on the stack. Clean, portable, and surprisingly powerful.

Now let's build it.

Define Your Instruction Set

The first thing any VM needs is an instruction set — the vocabulary of operations it knows how to perform. Every opcode is just a number. We use C preprocessor macros to give them readable names:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define PUSH    0x00
#define PUSHP   0x01
#define POP     0x02
#define ADD     0x03
#define SUB     0x04
#define MUL     0x05
#define DIV     0x06
#define JMP     0x07
#define JNE     0x08
#define JE      0x09
#define JG      0x0A
#define DPRINT  0x0B
#define SPRINT  0x0C
#define DPRINTST 0x0D
#define DUP     0x0E
#define SWAP    0x0F
#define END     0x10

17 instructions total. That's our entire universe for now. Let's break down each include before moving on, because they each earn their place here.

stdio.h — standard I/O. We need this for printf to handle error messages, debug output, and printing strings from the VM. Every debug print in the execution loop runs through this.

stdlib.h — standard library. Static arrays for the stack and constant pool live here. It's also your expansion point if you later want dynamic memory allocation.

stdint.h — this one matters most. When you're doing low-level work like building a VM, you need to guarantee integer sizes. int32_t is always 32 bits. int8_t is always 8 bits. No surprises across platforms. Without this, bytecode handling gets sketchy fast.

stdbool.h — gives us bool, true, and false. We use a debug boolean flag throughout the VM to toggle verbose output without recompiling. Makes the code much more readable than throwing int flags everywhere.

The Constant Pool and Helper Functions

Here's a problem: our stack holds integers. What about strings? What about integers larger than 128? (That limit is a design choice we'll explain shortly.)

The answer is a constant pool — a shared storage area for values the stack can't hold directly. Think of it as the VM's read-only data segment.

typedef struct {
    int location; // index in the pool
    int type;
    union {
        char* Svalue;
        int   Ivalue;
    };
} const_t;

The union here is key. A const_t entry holds either a string (Svalue) or an integer (Ivalue), not both — and the type field tells us which. This keeps entries compact without wasting memory on a full-size struct for every possible value.

Now our push and pop helpers:

bool debug = false; // set to true for verbose execution traces

int push(int32_t* stack, int* sp, int value) {
    stack[*sp] = value;
    return (*sp)++;
}

int pop(int32_t* stack, int* sp) {
    if (*sp > 0) {
        (*sp)--;
        return stack[*sp];
    } else {
        printf("\e[0;31mError: Stack underflow\e[0;37m\n");
        if (debug) {
            for (int i = 0; i < *sp + 1; i++) {
                printf("%d ", stack[i]);
            }
        }
        return -1;
    }
}

A few things worth noting here:

push takes a pointer to the stack and a pointer to the stack pointer (sp). We do this so the same functions can work with multiple stacks if we ever need them — good API design even at this scale.

pop checks for stack underflow before doing anything. If sp is already at 0, there's nothing to pop. We print an error in red (\e[0;31m), dump the current stack state if debug mode is on, and return -1 as a sentinel. This kind of defensive check is what separates a toy VM from one that gives you useful errors when your bytecode is wrong.

Initialize the Stack and Constant Pool

Now we wire everything together with global state:

int32_t stack[256];
int sp = 0;
const_t pool[128];

The stack is a fixed-size array of 256 int32_t values. The stack pointer sp starts at 0 — the bottom. The constant pool holds up to 128 entries.

This is also the reason PUSH has a 128-value ceiling I mentioned earlier: we limit each PUSH argument to a single signed byte (int8_t), which maxes out at 127. If you need to put a bigger integer on the stack, you define it in the pool and use PUSHP (push from pool) instead. It's a deliberate design constraint that simplifies the instruction encoding at the cost of flexibility for large immediate values.

One thing I learned the hard way while building this: a stack with only top-of-stack access turns out to be insufficient for real programs. I discovered this when I tried implementing a Fibonacci number counter and it simply couldn't be done. After some research, I found out why — the VM was functionally equivalent to a Pushdown Automaton, and pushdown automata can't compute Fibonacci sequences. They're not Turing complete.

That's why we have SWAP in the instruction set. SWAP lets you exchange the top of the stack with an element at an arbitrary depth — giving you access to any stack slot, not just the top. Some VMs solve this differently by adding registers. We kept it purely stack-based and used SWAP instead. Either works; this was the simpler path.

The Execution Loop

This is the heart of the VM. Everything else is scaffolding around this.

void exec(int32_t* stack, int* sp, const_t* pool, int8_t* code, size_t size) {
    int pc = 0; // program counter

    if (debug)
        printf("Code is: %hhd, size is %lu\n", *code, sizeof(code));

    while (pc < size) {
        int opcode = code[pc];

        if (debug) {
            printf("Opcode: %d | PC: %d | SP: %d\n", opcode, pc, *sp);
            for (int i = 0; i < *sp; i++) {
                printf("%d ", stack[i]);
            }
            printf("\n");
        }

        if (*sp > 256) {
            printf("Stack overflow: SP=%d at opcode %d\n", *sp, opcode);
            return;
        }

        switch (opcode) {
            case SPRINT: {
                int b1 = code[++pc];
                int b2 = code[++pc];
                int idx = (b1 << 8) | b2;
                const_t entry = pool[idx];
                printf("%s\n", entry.Svalue);
                pc++;
                break;
            }
            case END: {
                return;
            }
            default: {
                printf("\e[0;31mUnrecognized instruction at PC %d\e[0;37m\n", pc);
                pc++;
                break;
            }
        }
    }
}

Let's walk through this carefully, because there's a lot happening.

int pc = 0 — the program counter. This is an index into the code array. It tells the VM which instruction to execute next. Every time we process an instruction, we advance pc. When pc >= size, the program is done.

while (pc < size) — the main dispatch loop. This runs forever (until END or until we fall off the end of the bytecode). Each iteration: read opcode, dispatch, advance.

int opcode = code[pc] — read the current byte as an opcode. This is always the first thing we do. The opcode tells us which case in the switch to execute.

The debug block — prints the current opcode, program counter, and stack pointer on every iteration, then dumps the entire stack. This is invaluable when your bytecode is misbehaving. Toggle it with debug = true and you get a full execution trace.

Stack overflow checkif (*sp > 256) catches runaway pushes before they corrupt memory. We allocated 256 slots; writing past that is undefined behavior in C and will corrupt whatever lives next to the array in memory. Check early, fail loudly.

case SPRINT — our first real instruction. SPRINT takes a 2-byte argument (16-bit index) pointing to a string in the constant pool:

  • code[++pc] — advance pc and read the high byte of the index
  • code[++pc] — advance pc again and read the low byte
  • (b1 << 8) | b2 — reconstruct the 16-bit index from two bytes. b1 is the high byte, shifted left 8 positions; b2 is the low byte. OR them together and you've got your index.
  • Fetch the pool entry, print its string, advance pc once more to land on the next instruction.

Why two bytes for the index? Because a single byte only gives us 256 pool entries maximum. Two bytes gives us 65,536 entries — more than enough headroom.

case END — terminates execution. Clean and simple.

default — instead of crashing on an unknown opcode, we print an error and advance pc. This makes the VM fault-resilient: bad instructions are logged but don't bring down the whole program. In practice this also means you could craft bytecode that does wildly different things on a second pass through the loop — an interesting property for anyone who wants to build self-modifying code experiments.

Why does exec take size_t size? Because when you pass a C array to a function, it decays to a pointer to its first element. Calling sizeof() on that pointer gives you the size of a pointer (8 bytes on 64-bit systems) — not the size of the array. We have to pass the size explicitly. This is one of C's classic gotchas.

The Main Function

Now let's wire it all together.

int main(int argc, char *argv[]) {
    // Populate the constant pool
    pool[0] = (const_t){0, 0, .Svalue = "Welcome to the Virtual Machine!"};

    // Define a simple program
    int8_t code[] = {
        SPRINT, 0x00, 0x00,
        END,
    };

    if (debug)
        printf("Size of code is %zu\n", sizeof(code));

    exec(stack, &sp, pool, code, sizeof(code));
    return 0;
}

The bytecode here is four bytes:

  • SPRINT (0x0C) — print a string from the pool
  • 0x00, 0x00 — pool index 0 (high byte 0, low byte 0)
  • END (0x10) — stop

When exec runs, it reads SPRINT, fetches the two index bytes, looks up pool[0], prints "Welcome to the Virtual Machine!", then reads END and returns. Done.

Building and Running

Compile with GCC:

gcc -o vm main.c
./vm

You should see:

Welcome to the Virtual Machine!

That's it. A working VM executing real bytecode. Two instructions — but they're the two instructions that prove the entire architecture works.

What's Missing (And What's Next)

We've got the shell. The loop, the stack, the constant pool, the instruction decoder — all of it is in place. The remaining 15 instructions are just more cases in the switch statement, each implementing specific behavior:

  • PUSH / PUSHP / POP — stack manipulation
  • ADD / SUB / MUL / DIV — arithmetic
  • JMP / JNE / JE / JG — jumps and branching (this is where control flow comes from)
  • DPRINT / DPRINTST — print integers and stack state
  • DUP — duplicate the top of the stack
  • SWAP — swap top with another stack element (our Turing-completeness fix)

The jump instructions are where things get genuinely interesting — they're what allows loops and conditionals, and they're also where most VM bugs live. We'll dig into all of them in Part 2.

The Full Code

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

#define PUSH     0x00
#define PUSHP    0x01
#define POP      0x02
#define ADD      0x03
#define SUB      0x04
#define MUL      0x05
#define DIV      0x06
#define JMP      0x07
#define JNE      0x08
#define JE       0x09
#define JG       0x0A
#define DPRINT   0x0B
#define SPRINT   0x0C
#define DPRINTST 0x0D
#define DUP      0x0E
#define SWAP     0x0F
#define END      0x10

bool debug = false;

typedef struct {
    int location;
    int type;
    union {
        char* Svalue;
        int   Ivalue;
    };
} const_t;

int32_t stack[256];
int sp = 0;
const_t pool[128];

int push(int32_t* stack, int* sp, int value) {
    stack[*sp] = value;
    return (*sp)++;
}

int pop(int32_t* stack, int* sp) {
    if (*sp > 0) {
        (*sp)--;
        return stack[*sp];
    } else {
        printf("\e[0;31mError: Stack underflow\e[0;37m\n");
        if (debug) {
            for (int i = 0; i < *sp + 1; i++) {
                printf("%d ", stack[i]);
            }
        }
        return -1;
    }
}

void exec(int32_t* stack, int* sp, const_t* pool, int8_t* code, size_t size) {
    int pc = 0;

    if (debug)
        printf("Code is: %hhd, size is %lu\n", *code, sizeof(code));

    while (pc < size) {
        int opcode = code[pc];

        if (debug) {
            printf("Opcode: %d | PC: %d | SP: %d\n", opcode, pc, *sp);
            for (int i = 0; i < *sp; i++) {
                printf("%d ", stack[i]);
            }
            printf("\n");
        }

        if (*sp > 256) {
            printf("Stack overflow: SP=%d at opcode %d\n", *sp, opcode);
            return;
        }

        switch (opcode) {
            case SPRINT: {
                int b1 = code[++pc];
                int b2 = code[++pc];
                int idx = (b1 << 8) | b2;
                const_t entry = pool[idx];
                printf("%s\n", entry.Svalue);
                pc++;
                break;
            }
            case END: {
                return;
            }
            default: {
                printf("\e[0;31mUnrecognized instruction at PC %d\e[0;37m\n", pc);
                pc++;
                break;
            }
        }
    }
}

int main(int argc, char *argv[]) {
    pool[0] = (const_t){0, 0, .Svalue = "Welcome to the Virtual Machine!"};

    int8_t code[] = {
        SPRINT, 0x00, 0x00,
        END,
    };

    if (debug)
        printf("Size of code is %zu\n", sizeof(code));

    exec(stack, &sp, pool, code, sizeof(code));
    return 0;
}

The Conclusion

A stack-based bytecode VM is one of those things that looks intimidating from the outside and obvious once you've built one. The core is dead simple: a loop, an array, a program counter, and a switch statement. Everything else — arithmetic, jumps, function calls, string handling — is just more cases.

The real lessons here aren't specific to VMs. They show up everywhere in low-level systems work: pass sizes explicitly when arrays decay to pointers, check your boundaries before you blow past them, make debug output a first-class citizen from day one, and choose your constraints deliberately (like our 128-value push limit) rather than discovering them as bugs.

Part 2 will add arithmetic, stack manipulation, and the jump instructions — which is where this VM becomes genuinely useful. The jump instructions are also where most of the interesting bugs live, so that should be a good time.

Compile it. Run it. Tweak the bytecode. Break it. That's how this stuff actually sticks.

Read more