GoDis compiles Go source code to Dis bytecode for execution on the Inferno OS Dis virtual machine. It translates Go's SSA intermediate representation directly into Dis VM instructions, mapping Go's concurrency primitives, type system, and memory model onto Dis's runtime facilities.
This document covers the compiler's architecture, the translation strategy, every critical pattern discovered during development, and every bug encountered and fixed. It is intended as both a developer reference and a research record.
- Motivation
- Architecture Overview
- Build and Test
- Compilation Pipeline
- The Dis Virtual Machine
- Go-to-Dis Translation Strategy
- Frame Layout and Memory Model
- Type System Mapping
- Instruction Lowering
- Interface Dispatch
- Closures and Higher-Order Functions
- Channels and Concurrency
- Exception Handling (panic/recover)
- Standard Library Interception
- Multi-Package Compilation
- Critical Patterns
- Bug Log
- Test Suite
- Project Statistics
- Status and Limitations
The Dis VM is the execution engine of Inferno OS, a distributed operating system descended from Plan 9. Dis is a register-based, garbage-collected virtual machine with native support for concurrency (channels, spawn), module linking, and type-safe memory management via pointer maps.
Inferno's native language is Limbo, but Limbo has a small community and limited tooling. Go shares deep ancestry with Limbo and Inferno through Rob Pike and the Bell Labs lineage. Both languages have goroutines/channels, garbage collection, module systems, and similar type philosophies.
GoDis exploits this shared lineage. Rather than writing a Go runtime for Dis, we map Go primitives directly onto Dis VM features:
| Go Feature | Dis Equivalent |
|---|---|
| `go f()` | SPAWN |
| `chan T` | NEWC / SEND / RECV |
| `select` | ALT / NBALT |
| Garbage collection | Dis reference counting + pointer maps |
| `string` | Dis string type (ADDC, SLICEC, INDC) |
| `[]T` | Dis arrays (NEWA, LENA, INDEXA) |
| `map[K]V` | Limbo-compatible ADT (future: native Dis) |
| `panic/recover` | Dis exception handler tables (RAISE) |
| Module imports | Dis LDT (Loader Dispatch Table) |
This "adapt to the runtime" strategy means compiled Go programs are first-class Dis citizens: they can be spawned by Limbo programs, share channels across language boundaries, and participate in Inferno's namespace and security model.
tools/godis/
├── compiler/                   # Core compiler (12,237 lines)
│   ├── compiler.go             # Orchestrator: parse, SSA, link, emit (1,748 lines)
│   ├── lower.go                # SSA → Dis instruction lowering (7,019 lines)
│   ├── types.go                # Go → Dis type mapping
│   ├── frame.go                # Stack frame slot allocator
│   ├── builtins.go             # Sys module function signatures
│   └── compiler_test.go        # E2E test suite (3,056 lines)
├── dis/                        # Dis bytecode library (1,994 lines)
│   ├── opcode.go               # 62+ VM opcode definitions
│   ├── inst.go                 # Instruction representation
│   ├── encode.go               # Binary serialization
│   ├── decode.go               # Binary deserialization
│   ├── module.go               # Module structure
│   ├── data.go                 # Data section (strings, constants)
│   ├── typedesc.go             # Type descriptor format
│   └── dis_test.go             # Round-trip tests
├── cmd/
│   ├── godis/main.go           # CLI compiler tool
│   ├── debug/main.go           # Dis bytecode inspector
│   └── ssadump/main.go         # SSA IR dump tool
└── testdata/                   # 172 test programs
    ├── hello.go ... switch.go  # Single-file feature tests
    ├── tier6_*.go              # Coverage tier tests
    ├── bench/                  # Performance benchmarks
    ├── chain/                  # Multi-package import chain
    ├── multipkg/               # Multi-package tests
    └── sharedtype/             # Cross-package type sharing
Dependencies: Go 1.24+, golang.org/x/tools (SSA construction).
cd tools/godis
# Build everything
go build ./...
# Run all tests
go test ./... -count=1
# Compile a Go program to Dis
go run ./cmd/godis/ testdata/hello.go
# Run the compiled program on Inferno's emulator
# (from the infernode project root)
./emu/Linux/o.emu -r. /tools/godis/hello.dis
# Inspect compiled bytecode
go run ./cmd/debug/ hello.dis
# Dump SSA IR for a Go file
go run ./cmd/ssadump/ testdata/hello.go

Go source files (.go)
│
▼
┌──────────────────┐
│ go/parser │ Parse to AST
│ go/types │ Type-check
└────────┬─────────┘
│
▼
┌──────────────────┐
│ x/tools/go/ssa │ Build SSA IR
│ ssautil.Packages │ (with optimizations)
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Compiler │
│ ├─ scanClosures │ Discover closures, method values
│ ├─ collectTypes │ Build type tag registry
│ ├─ collectMethods│ Map interface dispatch tables
│ └─ collectInits │ Find init() functions
└────────┬─────────┘
│
▼
┌──────────────────┐
│ funcLowerer │ Per-function lowering:
│ ├─ allocateSlots │ Frame layout for params, locals
│ ├─ lowerBlock │ SSA block → instruction sequence
│ ├─ patchBranches │ Resolve forward references
│ └─ emitHandler │ Exception tables (if recover)
└────────┬─────────┘
│
▼
┌──────────────────┐
│ ModuleData │ Assemble: instructions, type
│ ├─ TypeDescs │ descriptors, data section,
│ ├─ Links/LDT │ module links, handler tables
│ └─ Encode() │ Serialize to binary .dis
└──────────────────┘
1. Parsing and Type Checking. Standard Go toolchain. We use
go/parser.ParseFile and go/types.Config.Check. A custom stubImporter
provides synthetic type information for packages that don't exist in the host Go
environment but have Dis equivalents (inferno/sys, fmt, strings, math,
strconv, errors, os, sort, sync, time, log, io).
2. SSA Construction. golang.org/x/tools/go/ssa builds the SSA form with
ssa.InstantiateGenerics | ssa.SanityCheckFunctions. Every Go function becomes a
sequence of ssa.BasicBlocks containing typed SSA instructions. This is our
primary IR — we never build our own.
3. Pre-compilation Analysis. Before lowering any function, the compiler scans all SSA functions to:
- Discover closures (`MakeClosure` → inner function mapping)
- Discover bound method wrappers (`$bound` functions synthesized by SSA)
- Allocate type tags for interface dispatch
- Build method dispatch tables (`ifaceDispatch`)
- Collect `init#N` functions for startup sequencing
4. Per-function Lowering. Each SSA function is lowered independently by a
funcLowerer. Frame slots are allocated for parameters, locals, and temporaries.
SSA instructions are translated to Dis instructions one basic block at a time.
Branch targets are recorded as patches and resolved after all blocks are emitted.
5. Module Assembly. All lowered functions are concatenated into a single
instruction stream (Dis modules are flat). Type descriptors, the data section
(string/float literals, globals), module links (Sys imports), and exception
handler tables are assembled into a ModuleData and serialized to binary.
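The branch patching in step 4 can be sketched as follows. This is a minimal model, not GoDis's actual code: the `inst` and `patch` types and the `lowerBlocks` function are hypothetical names, and each block body is reduced to a single placeholder instruction so that only the patch-then-resolve mechanism is visible.

```go
package main

import "fmt"

// inst is a simplified instruction: an opcode plus a branch target PC.
// Illustrative only; this is not the real dis.Inst type.
type inst struct {
	op     string
	target int // absolute PC, -1 until patched
}

// patch records that the jump at pc must land on the start of block.
type patch struct {
	pc    int
	block int
}

// lowerBlocks emits blocks in order. jumps[b] lists the successor blocks
// that block b jumps to. Targets are unknown while emitting, so each jump
// is recorded as a patch and resolved once every block start is known.
func lowerBlocks(jumps [][]int) []inst {
	var code []inst
	var patches []patch
	starts := make([]int, len(jumps))
	for b, succs := range jumps {
		starts[b] = len(code)
		code = append(code, inst{op: "NOP", target: -1}) // stand-in for the block body
		for _, s := range succs {
			patches = append(patches, patch{pc: len(code), block: s})
			code = append(code, inst{op: "JMP", target: -1})
		}
	}
	for _, p := range patches {
		code[p.pc].target = starts[p.block]
	}
	return code
}

func main() {
	// Block 0 jumps forward to block 2; block 1 jumps back to block 0.
	for pc, in := range lowerBlocks([][]int{{2}, {0}, {}}) {
		fmt.Println(pc, in.op, in.target)
	}
}
```

Backward jumps could be resolved immediately, but routing every branch through the patch list keeps the emitter uniform.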
Dis is a register-based VM with the following characteristics:
- Three-operand instructions: `OP src, mid, dst` where `dst = mid OP src` for arithmetic. Note the operand order: it is the opposite of most architectures.
- Memory spaces: FP (frame pointer, local variables), MP (module data, globals/constants), immediate.
- Typed instructions: Separate opcodes for words (W), big integers (L), floats (F), pointers (P), bytes (B), strings (C).
- Reference counting GC: Pointer assignments (`MOVP`) automatically adjust reference counts. Type descriptors contain pointer maps so the VM knows which frame slots and heap objects contain pointers.
- H = nil: The constant `H` (`(void*)(-1)`, 0xFFFFFFFFFFFFFFFF on 64-bit) represents nil/uninitialized for all pointer types.
- Concurrency primitives: `SPAWN` (create thread), `NEWC` (create channel), `SEND`/`RECV` (synchronous channel ops), `ALT`/`NBALT` (select).
- Exception handling: Handler tables map PC ranges to exception handlers. `RAISE` throws a string exception.
- Module linking: The Loader Dispatch Table (LDT) enables calling functions in other modules (e.g., the Sys built-in module).
The core insight is "adapt Go to Dis runtime" — rather than implementing Go's runtime semantics on top of Dis, we identify the closest Dis primitive for each Go feature and generate code that uses it natively.
- Goroutines → SPAWN. Go's `go f(args)` compiles to `SPAWN` with a new frame. The Dis VM handles scheduling.
- Channels → NEWC/SEND/RECV. Go channels map directly to Dis channels. Buffered channels use NEWC's buffer size operand.
- Strings → Dis strings. Go strings are Dis strings. Concatenation is ADDC, slicing is SLICEC, indexing is INDC.
- Slices → Dis arrays. Go slices compile to Dis arrays created with NEWA. Length is LENA, indexing is INDW/INDB.
- Select → ALT/NBALT. Go's select statement compiles to Dis's ALT (blocking) or NBALT (non-blocking with default).
- Panic → RAISE. Go's panic compiles to Dis RAISE with a string exception.
- Module imports → LDT. Calls to `inferno/sys` functions use Dis's module linking via IMFRAME/IMCALL with LDT indices.
- Interfaces — Dis has no native interfaces. We implement tagged dispatch using two-word values (type tag + data) and BEQW dispatch chains.
- Maps — Dis has no native hash maps. We use wrapper structs with sorted arrays and binary search (future: native Dis table type).
- Closures — Dis has no closures. We allocate heap structs containing free variables and a function tag, with dispatch chains at call sites.
- Defer — Dis has no defer. We inline deferred calls at every return point in LIFO order, with exception handlers for panic paths.
- Recover — Dis exception handlers + a module-data bridge pattern (handler writes exception to global, deferred closure reads it).
- Standard library — Intercepted and inlined.
`fmt.Sprintf` becomes a sequence of CVTWC/ADDC operations. `strings.Contains` becomes a SLICEC loop.
Every Dis function call allocates a frame. GoDis uses a fixed header followed by locals and temporaries:
Offset Slot Purpose
────── ──── ───────
0 REGLINK Return address (managed by VM)
8 REGFRAME Caller's frame pointer
16 REGMOD Module pointer
24 REGTYP Type descriptor pointer
32 REGRET Return value pointer
40 STemp Scratch temporary (word)
48 RTemp Scratch temporary (real/float)
56 DTemp Scratch temporary (word)
64+ Locals Parameters, variables, temporaries
MaxTemp = 64 bytes. Arguments to called functions start at offset 64 in the callee's frame.
The Frame struct tracks slot allocation with a pointer bitmap:
- `AllocWord(name)`: 8-byte non-pointer slot. Used for integers, booleans, floats, and addresses computed by LEA.
- `AllocPointer(name)`: 8-byte pointer slot. GC-traced. Used for strings, arrays, channels, heap objects.
- `AllocReal(name)`: 8-byte float slot. Used for float64 data items in MP.
- `AllocTemp(isPtr)`: Unnamed temporary.
- `allocPtrTemp()`: Pointer temp with H-initialization (emits `MOVW $(-1)`).
Every frame and heap object has a type descriptor that tells the GC which offsets contain pointers. The descriptor encodes:
- `size`: total size in bytes
- `map`: bitmap where 1 = pointer at that word offset
This is critical for correctness. If a pointer slot is not marked in the type descriptor, the GC won't trace it and the object may be prematurely freed. If a non-pointer slot is marked as a pointer, the GC will try to adjust a reference count on garbage data, causing crashes.
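To make the bitmap concrete, here is a minimal sketch of building a pointer map from a list of pointer slot offsets. The `pointerMap` function is a hypothetical helper, not part of GoDis. The bit order (the high bit of the first byte describes the word at offset 0) is an assumption about the descriptor encoding and should be checked against the Dis specification.

```go
package main

import "fmt"

// pointerMap returns a bitmap with one bit per 8-byte word of an object
// of the given size. ptrOffsets lists the byte offsets of pointer slots.
// Assumption: bit 0x80 of byte 0 corresponds to the word at offset 0.
func pointerMap(size int, ptrOffsets []int) []byte {
	words := (size + 7) / 8
	m := make([]byte, (words+7)/8)
	for _, off := range ptrOffsets {
		w := off / 8
		m[w/8] |= byte(0x80) >> uint(w%8)
	}
	return m
}

func main() {
	// A 24-byte object with a single pointer at offset 0.
	fmt.Printf("%08b\n", pointerMap(24, []int{0}))
	// An 80-byte frame whose only pointers are the two slots at 64 and 72.
	fmt.Printf("%08b\n", pointerMap(80, []int{64, 72}))
}
```

Marking a slot here that actually holds an integer or a LEA result is exactly the "non-pointer marked as pointer" failure described above.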
Dis represents nil as H = (void*)(-1) = 0xFFFFFFFFFFFFFFFF. This is NOT zero.
Every pointer slot in a new frame or heap object is initialized to H by the VM
(if the type descriptor marks it as a pointer). Non-pointer slots are NOT
initialized — they contain whatever was previously on the stack.
This asymmetry is a persistent source of bugs. See Bug Log.
type DisType struct {
Size int32 // Bytes needed
IsPtr bool // GC-tracked pointer?
}

| Go Type | Dis Size | IsPtr | Notes |
|---|---|---|---|
| `int`, `int64`, `uint64` | 8 | No | WORD (64-bit on Dis) |
| `int32`, `int16`, `int8` | 8 | No | Widened to WORD |
| `bool` | 8 | No | WORD (0 or 1) |
| `float64` | 8 | No | Dis REAL |
| `string` | 8 | Yes | Dis string (ref-counted) |
| `*T` (pointer) | 8 | Yes | Heap pointer |
| `[]T` (slice) | 8 | Yes | Dis array |
| `map[K]V` | 8 | Yes | Wrapper struct pointer |
| `chan T` | 8 | Yes | Wrapper struct pointer |
| `func(...)` | 8 | Yes | Closure struct pointer |
| `interface{}` | 16 | No | 2 WORDs: [tag, value] |
| `struct{...}` | N*8 | Mixed | Consecutive slots per field |
| `error` | 16 | No | Interface (2 WORDs) |
Interfaces are two consecutive WORDs, not a pointer:
Offset 0: type tag (WORD) — integer identifying the concrete type
Offset 8: value (WORD) — the concrete value or pointer to heap data
The type tag is allocated by AllocTypeTag(typeName) and is unique per concrete
type. Tag 0 means nil interface. The value word holds the concrete value directly
for scalar types, or a pointer to heap-allocated data for structs.
On ARM64 Dis, both int (WORD) and big (LONG) are 64-bit. The CVTWL/CVTLW
instructions are just copies, NOT 32-bit truncation operations. This is a
difference from 32-bit Dis where WORD=4, LONG=8.
The funcLowerer translates each SSA instruction to one or more Dis instructions.
The lowering is implemented as a large switch on SSA instruction type in
lower.go (6,703 lines).
Dis instructions have three operand slots: src, mid, dst. Each can be:
- `Imm(v)`: immediate constant
- `FP(off)`: frame pointer + offset (locals)
- `MP(off)`: module pointer + offset (globals, constants)
- `FPInd(base, off)`: indirect through FP (heap object fields)
- `MPInd(base, off)`: indirect through MP
Critical: Operand{Mode: 0} is AMP (absolute MP), NOT "no operand". The
"no operand" mode is AXXX=3. Always use the Inst0(), Inst1(), Inst2()
helpers to construct instructions with the correct number of operands.
For arithmetic: OP src, mid, dst means dst = mid OP src.
This is the opposite of what you might expect. For a - b:
SUBW b, a, result # result = a - b (dst = mid - src)
For non-commutative operations (SUB, DIV, MOD), the operands must be swapped
from the natural Go order. The compiler's emitArith handles this.
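The swap can be illustrated with a small model. The `inst` type, this `emitArith`, and the `eval` interpreter below are simplified stand-ins written for this example; the real `emitArith` in `lower.go` works on SSA values and Dis operands rather than on register names.

```go
package main

import "fmt"

// inst is a toy three-operand instruction using register names as operands.
type inst struct{ op, src, mid, dst string }

// emitArith lowers the Go expression dst = x OP y.
// Dis computes dst = mid OP src, so x goes in mid and y goes in src.
func emitArith(op, x, y, dst string) inst {
	return inst{op: op, src: y, mid: x, dst: dst}
}

// eval executes one instruction with Dis operand semantics.
func eval(in inst, regs map[string]int64) {
	s, m := regs[in.src], regs[in.mid]
	switch in.op {
	case "ADDW":
		regs[in.dst] = m + s
	case "SUBW":
		regs[in.dst] = m - s
	case "DIVW":
		regs[in.dst] = m / s
	}
}

func main() {
	regs := map[string]int64{"a": 10, "b": 3}
	in := emitArith("SUBW", "a", "b", "r") // Go: r = a - b
	eval(in, regs)
	fmt.Printf("%s %s, %s, %s => r = %d\n", in.op, in.src, in.mid, in.dst, regs["r"])
}
```

For commutative operations the order does not affect the result, which is why a swapped-operand bug tends to surface only in SUB, DIV, and MOD.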
For conditional branches: BOP src, mid, dst means "if src OP mid, goto dst".
BLTW x, y, target # if x < y, goto target
BGEW i, len, done # if i >= len, goto done
The first operand is the tested value. This is a frequent source of bugs —
swapping BGEW FP(i), FP(len), done to BGEW FP(len), FP(i), done silently
inverts the condition.
For Dis comparison instructions used by the JIT:
CMP src, mid tests src OP mid. The CMP operand order must match the branch
condition. Getting this backwards silently inverts all comparisons.
Go interfaces are implemented as tagged two-word values with dispatch chains. This avoids virtual method tables (which Dis doesn't support) in favor of inline type-tag switching.
type Compiler struct {
typeTagMap map[string]int32 // "main.Dog" → 1, "main.Cat" → 2
typeTagNext int32 // next available tag
}

Tags are allocated during pre-compilation analysis when concrete types are discovered implementing interface methods.
var a Animal = Dog{name: "Rex"}

Compiles to:
MOVW $tag_Dog, FP(iface+0) # store type tag
MOVW FP(dog_val), FP(iface+8) # store value (or LEA for structs)
d, ok := a.(Dog)

Compiles to:
BEQW $tag_Dog, FP(iface+0), $match # check tag
MOVW $0, FP(ok) # mismatch: ok = false
JMP $end
match:
MOVW FP(iface+8), FP(d) # extract value
MOVW $1, FP(ok) # ok = true
end:
Without comma-ok, a mismatch raises a panic instead.
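The two assertion forms can be modeled in plain Go. The sketch below is illustrative only: `iface`, the tag constants, and both helper functions are hypothetical, and the value word is modeled as an `int64` rather than a frame slot.

```go
package main

import "fmt"

// iface models the two-word interface value: a type tag and a value word.
type iface struct {
	tag   int64
	value int64
}

// Hypothetical tags; 0 is reserved for the nil interface.
const (
	tagNil int64 = 0
	tagDog int64 = 1
	tagCat int64 = 2
)

// assertCommaOk mirrors the BEQW sequence: compare the tag, then either
// extract the value word or report failure with a zero value.
func assertCommaOk(i iface, want int64) (int64, bool) {
	if i.tag == want {
		return i.value, true
	}
	return 0, false
}

// assert mirrors the form without comma-ok: a mismatch panics.
func assert(i iface, want int64) int64 {
	v, ok := assertCommaOk(i, want)
	if !ok {
		panic("type assertion failed")
	}
	return v
}

func main() {
	a := iface{tag: tagDog, value: 42}
	fmt.Println(assertCommaOk(a, tagDog))
	fmt.Println(assertCommaOk(a, tagCat))
	fmt.Println(assertCommaOk(iface{}, tagDog)) // nil interface never matches
	fmt.Println(assert(a, tagDog))
}
```

Because tag 0 is never assigned to a concrete type, a correctly zeroed nil interface fails every assertion, which is the behavior Go requires.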
For a single implementing type, the call is direct. For multiple types:
a.Speak() // a is Animal, could be Dog or Cat

Compiles to a BEQW dispatch chain:
BEQW $tag_Dog, FP(iface+0), $call_dog
BEQW $tag_Cat, FP(iface+0), $call_cat
RAISE "unknown type"
call_dog:
IFRAME ...
# load receiver from iface+8
ICALL dog_speak
JMP $exit
call_cat:
IFRAME ...
# load receiver from iface+8
ICALL cat_speak
JMP $exit
exit:
Type switches compile to sequential tag comparisons, same as multi-type assert.
The error interface gets special treatment. errors.New("msg") creates a
tagged interface with tag=errorString and value=the string itself (not a pointer
to a struct). The Error() method is synthetic — it just returns the value word
directly, since the value IS the error string.
Dis has no native closure support. GoDis allocates heap structs for closures:
Offset 0: function tag (WORD) — identifies which function this closure calls
Offset 8: free var 0 — captured variable
Offset 16: free var 1 — captured variable
...
The function tag is critical for dynamic dispatch — when a closure is passed as a value and called through a variable, the tag identifies which function to call.
adder := func(x int) int { return x + base }

Compiles to:
INEW $closure_td # allocate closure struct
MOVW $tag_anon1, FPInd(closure, 0) # store function tag
MOVW FP(base), FPInd(closure, 8) # capture 'base'
When calling a closure through a variable (higher-order function):
func apply(f func(int) int, x int) int { return f(x) }

The compiler emits a BEQW chain over all closures with matching signatures:
MOVW FPInd(f, 0), FP(tag) # read function tag
BEQW $tag_anon1, FP(tag), $call1 # is it closure 1?
BEQW $tag_anon2, FP(tag), $call2 # is it closure 2?
RAISE "unknown function"
call1:
IFRAME $frame_anon1
MOVP FP(f), MaxTemp+0(fp) # pass closure ptr (for free vars)
MOVW FP(x), MaxTemp+8(fp) # pass argument
ICALL anon1
JMP $exit
...
closureSignaturesMatch() compares Go-level signatures (parameter types, return
types) to determine which closures could be called through a given variable. The
comparison uses Go's types.Signature and excludes hidden parameters.
When a named function (not a closure) is used as a value:
f := myFunc

materializeFuncValue() wraps it in a tag-only 8-byte closure struct (no free
variables, just the tag). This ensures all function values have the same
representation at call sites.
a := Adder{base: 10}
f := a.Add // method value

Go SSA synthesizes (*Adder).Add$bound, a wrapper that captures the receiver.
These $bound functions are not regular package members or anonymous functions.
After scanClosures, we iterate the closure map to discover them and add them to
the compilation set.
Every chan T is a heap-allocated wrapper struct:
Offset 0: rawCh (PTR) — the actual Dis channel
Offset 8: closed (WORD) — 0 = open, 1 = closed
Offset 16: cap (WORD) — buffer capacity
Type descriptor: 24 bytes, pointer at offset 0.
This wrapper is necessary because Dis channels have no native close semantics.
The closed flag is checked on send (panic if closed) and on receive (drain
buffer then return zero value).
| Go | Dis |
|---|---|
| `make(chan T)` | NEWC + INEW wrapper |
| `make(chan T, n)` | NEWC with buffer size in mid operand |
| `ch <- v` | Check closed flag, then SEND |
| `<-ch` | If open: RECV. If closed: NBALT to drain, zero if empty |
| `v, ok := <-ch` | Same as above but set ok=false when closed+empty |
| `close(ch)` | MOVW $1 to closed flag at wrapper offset 8 |
| `cap(ch)` | Read wrapper offset 16 |
| `for range ch` | CommaOk receive, exit when ok=false |
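The wrapper semantics in the table can be modeled in Go itself. This is a sketch under simplifying assumptions: a native Go channel stands in for the raw Dis channel, a non-blocking `select` stands in for NBALT, and the model is single-goroutine with no synchronization on the closed flag. The type and method names are hypothetical.

```go
package main

import "fmt"

// chanWrapper mirrors the three-field wrapper: raw channel, closed flag,
// and capacity.
type chanWrapper struct {
	raw    chan int
	closed bool
	cap    int
}

func newChan(n int) *chanWrapper {
	return &chanWrapper{raw: make(chan int, n), cap: n}
}

// send checks the closed flag first, then performs the raw send.
func (c *chanWrapper) send(v int) {
	if c.closed {
		panic("send on closed channel")
	}
	c.raw <- v
}

// recv blocks while the channel is open. Once closed, it drains any
// buffered values without blocking and then reports ok=false.
func (c *chanWrapper) recv() (int, bool) {
	if !c.closed {
		return <-c.raw, true
	}
	select {
	case v := <-c.raw:
		return v, true
	default:
		return 0, false
	}
}

// close only sets the flag; the raw channel is never closed.
func (c *chanWrapper) close() { c.closed = true }

func main() {
	ch := newChan(2)
	ch.send(1)
	ch.send(2)
	ch.close()
	for { // equivalent of: for v := range ch
		v, ok := ch.recv()
		if !ok {
			break
		}
		fmt.Println(v)
	}
}
```

Since `close` only flips a flag, a receiver that is already blocked in the raw receive is not woken, which matches the limitation described for goroutines blocked on RECV.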
Go's select compiles to Dis ALT (blocking, no default) or NBALT
(non-blocking, has default). The ALT instruction takes a descriptor encoding
which channels to wait on and which direction (send/receive).
Goroutines blocked on RECV when close() is called from another goroutine will
NOT be unblocked. This is a Dis VM limitation — the VM's channel implementation
has no close-notification mechanism. The closed flag is only checked at the
next receive attempt.
panic(v) compiles to:
# Convert v to string if needed (CVTWC for int, etc.)
RAISE FP(str) # or RAISE MP(str) for string constants
recover() uses Dis exception handler tables. The mechanism is complex because
Go's recover only works inside deferred functions, which are closures.
The module-data bridge pattern:
- The enclosing function has a Dis exception handler table entry covering its body. When an exception occurs, the VM jumps to the handler PC.
- The handler stores the exception string to a global MP slot (`excGlobal`).
- The handler then executes deferred closures.
- Inside a deferred closure, `recover()` reads from `excGlobal(mp)`.
- If non-nil, recover returns the value as a tagged interface (tag=errorString, value=the exception string) and zeros the global.
Handler table entry format:
{eoff, pc1, pc2, descID=-1, ne=0, wildPC}
- `eoff`: offset in frame where VM stores exception string
- `pc1`, `pc2`: PC range covered by handler
- `descID=-1`: no type descriptor (exception is a string)
- `ne=0`: number of named exceptions (0 = wildcard only)
- `wildPC`: PC to jump to on any exception
Zero-divide check: ARM64's sdiv returns 0 on divide-by-zero (no trap).
The compiler emits an explicit check before every integer DIVW/MODW:
BNEW divisor, $0, $skip
RAISE "zero divide"
skip:
DIVW ...
GoDis does not link Go's standard library. Instead, it intercepts calls to known packages and inlines equivalent Dis instruction sequences.
| Go Function | Implementation |
|---|---|
| `fmt.Sprintf(fmt, args...)` | Parse format string at compile time, emit inline ops per verb |
| `fmt.Printf(fmt, args...)` | Sprintf + sys.print |
| `fmt.Println(args...)` | Trace varargs, emit print per element with spaces + newline |
| `fmt.Errorf(fmt, args...)` | Sprintf + wrap as tagged error interface |
Sprintf verb implementation:
- `%d`, `%v` (int) → CVTWC (integer to decimal string)
- `%s` → pass-through
- `%c` → INSC (rune to single-character string)
- `%x` → hex loop (ANDW + SHRW + lookup table)
- `%f`, `%g` → CVTFC (float to string)
- `%t` → branch on bool, emit "true"/"false"
- `%q` → ADDC with quote characters
- `%p` → CVTWC with "0x" prefix
- `%b` → binary loop (ANDW + SHRW)
- `%o` → octal loop
- `%%` → literal "%"
- Width/precision padding → LENC + ADDC loop
Multiple format segments are concatenated with ADDC (string concatenation).
Vararg tracing: fmt.Println and fmt.Sprintf receive arguments as
[]interface{}. The compiler traces the SSA data flow backwards:
Slice → Alloc → IndexAddr → Store → MakeInterface → original value.
This recovers the original typed values so we can emit type-specific print code.
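Compile-time format parsing can be sketched as splitting the format string into literal and verb segments, each of which then maps to one inline operation. The `segment` type, `parseFormat`, and `opFor` below are hypothetical names for this illustration. The sketch ignores width, precision, and flags, and covers only a few verbs from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// segment is either a literal chunk (verb == 0) or a single format verb.
type segment struct {
	literal string
	verb    byte
}

// parseFormat splits a format string into literal and verb segments.
// "%%" is folded into the surrounding literal text.
func parseFormat(format string) []segment {
	var segs []segment
	var lit strings.Builder
	flush := func() {
		if lit.Len() > 0 {
			segs = append(segs, segment{literal: lit.String()})
			lit.Reset()
		}
	}
	for i := 0; i < len(format); i++ {
		c := format[i]
		if c != '%' || i+1 >= len(format) {
			lit.WriteByte(c)
			continue
		}
		i++ // look at the verb character
		if format[i] == '%' {
			lit.WriteByte('%')
			continue
		}
		flush()
		segs = append(segs, segment{verb: format[i]})
	}
	flush()
	return segs
}

// opFor names the operation the text associates with a verb.
func opFor(verb byte) string {
	switch verb {
	case 'd':
		return "CVTWC"
	case 's':
		return "pass-through"
	case 'c':
		return "INSC"
	case 'f', 'g':
		return "CVTFC"
	}
	return "other"
}

func main() {
	for _, s := range parseFormat("x=%d, s=%s%%") {
		if s.verb == 0 {
			fmt.Printf("literal %q\n", s.literal)
		} else {
			fmt.Printf("verb %%%c -> %s\n", s.verb, opFor(s.verb))
		}
	}
}
```

Each segment yields one string, and the results are joined with ADDC as described above.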
| Function | Implementation |
|---|---|
| `strings.Contains(s, sub)` | SLICEC + BEQC loop |
| `strings.HasPrefix(s, pre)` | SLICEC + BEQC |
| `strings.HasSuffix(s, suf)` | LENC + SLICEC + BEQC |
| `strings.Index(s, sub)` | SLICEC scan loop |
| `strings.TrimSpace(s)` | INDC loop from both ends, check whitespace, SLICEC |
| `strings.Split(s, sep)` | Two-pass: count occurrences, NEWA, fill with SLICEC |
| `strings.Join(elems, sep)` | Loop: INDW element, MOVP deref, ADDC with sep |
| `strings.Replace(s, old, new, n)` | Scan + SLICEC + ADDC rebuild |
| `strings.ToUpper(s)` / `ToLower(s)` | INDC loop, INSC rebuild with ±32 |
| `strings.Repeat(s, n)` | ADDC loop |
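As an illustration of the "SLICEC + BEQC loop" shape, the following Go function performs the same scan that the inlined `strings.Contains` would: take a window of `len(sub)` at each position and compare it for equality. The function name is hypothetical, and it indexes by byte, whereas Dis string operations index by character.

```go
package main

import "fmt"

// contains slides a window of len(sub) across s and compares each window
// with sub. The slice corresponds to SLICEC and the comparison to BEQC.
func contains(s, sub string) bool {
	n, m := len(s), len(sub)
	for i := 0; i+m <= n; i++ {
		if s[i:i+m] == sub {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(contains("inferno", "fern"))
	fmt.Println(contains("go", "gopher"))
	fmt.Println(contains("dis", ""))
}
```

`HasPrefix`, `HasSuffix`, and `Index` follow the same pattern with a fixed window position or a returned index instead of a boolean.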
| Function | Implementation |
|---|---|
| `math.Abs(x)` | MOVF + BGEF + NEGF (conditional negation) |
| `math.Sqrt(x)` | Newton's method: 15 iterations, unrolled |
| `math.Min(x, y)` | MOVF + BLTF branch |
| `math.Max(x, y)` | MOVF + BGTF branch |
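The `math.Sqrt` entry can be sketched as a fixed-count Newton iteration. The initial guess (`x` itself) and the early return for non-positive input are assumptions made for this example; the source states only that 15 unrolled iterations are used. The loop is written rolled up here for brevity.

```go
package main

import (
	"fmt"
	"math"
)

// sqrtNewton runs 15 Newton steps g = (g + x/g) / 2.
// Assumptions: the initial guess is x, and non-positive input returns 0
// (Go's math.Sqrt returns NaN for negative input instead).
func sqrtNewton(x float64) float64 {
	if x <= 0 {
		return 0
	}
	g := x
	for i := 0; i < 15; i++ {
		g = 0.5 * (g + x/g)
	}
	return g
}

func main() {
	for _, x := range []float64{0.25, 2, 100} {
		fmt.Printf("x=%v newton=%v diff=%g\n", x, sqrtNewton(x), math.Abs(sqrtNewton(x)-math.Sqrt(x)))
	}
}
```

With a fixed iteration count and this starting guess, accuracy degrades for very large inputs, because each early step only roughly halves the estimate.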
| Function | Implementation |
|---|---|
| `strconv.Itoa(i)` | CVTWC |
| `strconv.Atoi(s)` | CVTCW (with error interface return) |
| `strconv.FormatInt(i, base)` | CVTWC (base 10), loop for base 2/8/16 |
| Package | Functions | Implementation |
|---|---|---|
| `errors` | `New(msg)` | Tagged interface: tag=errorString, value=string |
| `os` | `Exit(code)` | RET |
| `sort` | `Ints`, `Strings`, `IntsAreSorted` | Inline insertion sort |
| `sync` | `Mutex`, `WaitGroup`, `Once` | Channel-based stubs |
| `time` | `After`, `Duration.Milliseconds`, `Time.Sub` | sys.sleep wrapper |
| `log` | `Println`, `Fatal` | sys.print + optional exit |
| `io` | `Reader`, `Writer`, `EOF` | Type stubs |
Direct Dis module calls via LDT:
| Function | Dis Call | Notes |
|---|---|---|
| `sys.fildes(n)` | IMFRAME + IMCALL | Returns file descriptor |
| `sys.fprint(fd, fmt, args...)` | IFRAME + IMCALL | Variadic (custom TD) |
| `sys.print(fmt, args...)` | IFRAME + IMCALL | Variadic |
| `sys.sleep(ms)` | IMFRAME + IMCALL | |
| `sys.millisec()` | IMFRAME + IMCALL | Returns int |
| `sys.open(path, mode)` | IMFRAME + IMCALL | Returns FD |
| `sys.read(fd, buf, n)` | IMFRAME + IMCALL | Returns count |
| `sys.write(fd, buf, n)` | IMFRAME + IMCALL | Returns count |
| `sys.create(path, mode, perm)` | IMFRAME + IMCALL | Returns FD |
| `sys.seek(fd, off, whence)` | IMFRAME + IMCALL | |
| `sys.bind(src, dst, flags)` | IMFRAME + IMCALL | Namespace binding |
| `sys.chdir(path)` | IMFRAME + IMCALL | |
| `sys.remove(path)` | IMFRAME + IMCALL | |
| `sys.pipe(fds)` | IMFRAME + IMCALL | |
| `sys.dup(old, new)` | IMFRAME + IMCALL | |
| `sys.pctl(flags, movefd)` | IMFRAME + IMCALL | Process control |
import "mathutil" // resolved from baseDir/mathutil/*.go

The localImporter falls through from stubImporter. It reads all .go files
in baseDir/pkg/, parses them, and provides type information. All packages are
inlined into a single .dis file — there is no Dis inter-module linking for
local packages.
Global variables from imported packages are prefixed with the package path to avoid collisions:
main.counter → MP offset 0
mathutil.counter → MP offset 8
Local packages can import other local packages. The localImporter resolves
transitively: if main imports mid and mid imports base, all three are
compiled and inlined.
Cross-package struct creation and return works correctly. A struct defined in
package geom can be created in main and the fields are laid out consistently
because both packages see the same types.Struct from the type checker.
These patterns were discovered through debugging and are essential for correct code generation. Each represents a class of bug that is easy to reintroduce.
// WRONG — Mode 0 is AMP (absolute MP addressing)
inst := dis.Inst{Op: "MOVW", Dst: operand}
// RIGHT — use helpers that set unused operands to AXXX
inst := dis.Inst1("MOVW", dst)

Operand{Mode: 0} encodes as AMP (address mode 0), which means "absolute
address in module data." Using a zero-value operand when you mean "no operand"
will cause the VM to read from a garbage MP address.
When computing an address with LEA (Load Effective Address), the result is a raw address into the stack or module data. It is NOT a heap pointer. Storing it in a pointer-typed slot will confuse the GC.
// Stack addresses, MP addresses, FieldAddr, IndexAddr results:
slot := frame.AllocWord(name) // NOT AllocPointer

Interior pointers (field addresses, array element addresses) are also non-GC words.
SSA phi nodes are eliminated by inserting MOV instructions at the end of each predecessor block (before the terminator). If multiple phis exist, their moves must not interfere — the compiler handles this by emitting all moves before any branch.
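The interference problem is the classic parallel-copy problem: a phi move may overwrite a value that a later move still needs to read. The sketch below uses the most conservative solution, staging every source in a fresh temporary before writing any destination. The `move` type and both functions are hypothetical, and this is a simplification: GoDis is described as using temps only where moves actually conflict.

```go
package main

import "fmt"

// move copies the value of src into dst.
type move struct{ dst, src string }

// sequentialize converts a set of parallel phi moves into an ordered list
// that is safe to execute one at a time. Every source is first copied to
// its own temporary, so no destination write can clobber a pending read.
func sequentialize(moves []move) []move {
	out := make([]move, 0, 2*len(moves))
	for i, m := range moves {
		out = append(out, move{dst: fmt.Sprintf("tmp%d", i), src: m.src})
	}
	for i, m := range moves {
		out = append(out, move{dst: m.dst, src: fmt.Sprintf("tmp%d", i)})
	}
	return out
}

// run executes the moves in order against a register file.
func run(seq []move, regs map[string]int) {
	for _, m := range seq {
		regs[m.dst] = regs[m.src]
	}
}

func main() {
	// Two phis that swap x and y on a loop back edge.
	swap := []move{{dst: "x", src: "y"}, {dst: "y", src: "x"}}

	naive := map[string]int{"x": 1, "y": 2}
	run(swap, naive) // executed inline: the second move reads the new x
	fmt.Println("naive:", naive["x"], naive["y"])

	safe := map[string]int{"x": 1, "y": 2}
	run(sequentialize(swap), safe)
	fmt.Println("staged:", safe["x"], safe["y"])
}
```

The naive run loses the original value of `x`, which is the failure mode recorded in the Bug Log for multiple phi nodes.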
allocPtrTemp() emits a MOVW $(-1) (H-initialization) at the current
compilation point. If used inside a loop body, the MOVW runs every iteration,
stomping valid pointer values without calling destroy — causing reference count
leaks.
Fix: For loop-body pointer temporaries where the underlying data is kept
alive by another reference, use AllocWord + MOVW instead (no refcount
management).
SLICEC src, mid, dst → dst = dst[src:mid]
Where src=start index, mid=end index. The destination string is ALSO the input string — SLICEC modifies in place (well, replaces the string reference).
- INDW is for array element addressing: `INDW arr, addr, idx` → addr = &arr[idx]
- INDC is for string character extraction: `INDC str, idx, dst` → dst = rune at str[idx]
Using INDW on a string causes a nil dereference because a string is not an array object.
INDW src=array, mid=resultAddr, dst=index
The mid operand gets the address, NOT dst. This is counterintuitive given that most instructions put results in dst.
Functions with frame size 0 (like print, fprint) are variadic in Dis. They
require IFRAME with a custom call-site type descriptor, not IMFRAME with the
standard function TD.
When materializing a nil interface constant, both words must be explicitly set to 0:
MOVW $0, FP(iface+0) # tag = 0
MOVW $0, FP(iface+8) # value = 0
Non-pointer frame slots are NOT zero-initialized by the VM, so relying on default values will read garbage.
func constOperand(c *ssa.Const) Operand {
	if c.Value == nil {
		if isPointerType(c.Type()) {
			return Imm(-1) // H (nil pointer)
		}
		return Imm(0) // zero value
	}
	// ... non-nil constants are not shown in this excerpt
}

Pointer nil is H (-1), not 0. Getting this wrong breaks nil comparisons for slices, maps, channels, and function values.
LENA on a nil (H) array crashes the VM. Before computing len(s) or
cap(s), check for nil:
MOVW $0, FP(dst) # default: length = 0
BEQW FP(slice), $(-1), $skip # if slice == H, skip
LENA FP(slice), FP(dst) # safe: slice is non-nil
skip:
Same pattern for append(nil, ...).
Go's println(true) prints the string "true", not "1". The compiler emits:
MOVP MP("false"), FP(tmp)
BEQW FP(val), $0, $skip
MOVP MP("true"), FP(tmp)
skip:
# print FP(tmp)
BGEW src, mid, dst means "if src >= mid goto dst". The first operand is
always the tested value. Swapping operands silently inverts the condition.
Every bug encountered during development, in chronological order. Each entry describes the symptom, root cause, and fix.
Symptom: Random crashes when instructions had fewer than 3 operands.
Cause: Default Operand{} has Mode=0, which is AMP (absolute MP), not
"no operand" (AXXX=3).
Fix: Inst0(), Inst1(), Inst2() helpers that set unused operands to
AXXX mode.
Symptom: a - b computed b - a.
Cause: Dis three-operand format is dst = mid OP src, so SUB requires
swapping the Go-order operands.
Fix: emitArith swaps operands for non-commutative ops.
Symptom: GC corruption after FieldAddr/IndexAddr operations.
Cause: LEA results (stack/MP addresses) stored in pointer-typed slots.
The GC tried to trace them as heap pointers.
Fix: All LEA destinations use AllocWord.
Symptom: Wrong values after if/else branches with multiple phi nodes.
Cause: Phi elimination MOVs were inserted inline, so later MOVs could read
values already overwritten by earlier MOVs.
Fix: Emit all MOVs at end of predecessor blocks, using temps for conflicts.
Symptom: alloc:D2B: addr in free blk — heap corruption during cleanup.
Cause: lowerChangeType used MOVW for ALL types, including pointers
(channels, slices). This bypassed reference counting.
Fix: Use MOVP for pointer types to maintain GC reference counts.
Symptom: Closures read wrong captured values.
Cause: Free variable loads started at offset 0 of the closure struct,
overlapping the function tag.
Fix: emitFreeVarLoads() starts at offset 8 (skip tag word).
Symptom: "unknown function" panic when calling method values.
Cause: $bound wrapper functions synthesized by SSA are not package
members or anonymous functions — they weren't discovered.
Fix: After scanClosures, iterate closureMap to find unseen inner
functions and add to compilation set.
Symptom: Nil dereference in emitStringToRuneSlice.
Cause: Used INDW (array element addr) on a string. Strings are not arrays.
Also had wrong operand order.
Fix: Use INDC src, FP(idx), FP(runeSlot).
Symptom: *Node == nil comparisons always false for BST/linked list code.
Cause: constOperand(nil:*T) returned Imm(0). Dis nil is H = -1.
Fix: Return Imm(-1) for pointer types.
Symptom: make([]int, n) contained garbage values.
Cause: NEWA's initarray skips types with np==0 (no pointers),
leaving non-pointer elements uninitialized.
Fix: Emit explicit zero-init loops (emitArrayZeroInit /
emitArrayZeroInitDynamic) after NEWA.
Symptom: VM crash on len(nil_slice).
Cause: LENA on H (nil) dereferences invalid memory.
Fix: Emit nil check before LENA. Constant nil → emit MOVW $0 directly.
Symptom: Memory growth in long-running programs with loops.
Cause: allocPtrTemp() emits H-init MOVW at compilation point. In loops,
this stomps valid pointers without destroy → refcount never decremented.
Fix: Use AllocWord + manual MOVW for loop-body temps where data is kept
alive by other references.
Symptom: Type assertions on nil interfaces matched random types.
Cause: Non-pointer frame slots contain stack garbage. Nil interface
materialization didn't zero the tag word.
Fix: Explicit MOVW $0 for both tag and value words.
Symptom: Multi-field struct returns had garbage in fields after the first.
Cause: lowerReturn only copied one word to REGRET for struct types.
Fix: Copy ALL struct fields to REGRET offsets. Also fixed slotOf to
call allocStructFields() for struct-typed SSA values.
Symptom: Wrong values for globals when two packages had same-named vars.
Cause: Globals from all packages shared the same namespace.
Fix: Prefix with package path: pkgPath.varName.
Symptom: uint64(a) < uint64(b) gave wrong results for large values.
Cause: Dis has only signed comparisons (BLTW = signed less-than).
Fix: XOR both operands with sign bit (0x8000000000000000) before
comparison, flipping the sign so unsigned order maps to signed order.
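The sign-bit trick can be checked directly in Go. The function name below is hypothetical. The signed `<` on `int64` stands in for BLTW, and the XOR corresponds to the extra instruction emitted before the comparison.

```go
package main

import "fmt"

// signBit is the top bit of a 64-bit word.
const signBit = uint64(1) << 63

// unsignedLess computes a < b for unsigned values using only a signed
// comparison. XOR with the sign bit maps the unsigned range onto the
// signed range while preserving order.
func unsignedLess(a, b uint64) bool {
	return int64(a^signBit) < int64(b^signBit)
}

func main() {
	a, b := uint64(1)<<63, uint64(1)
	// A plain signed comparison sees a as negative and gets this wrong.
	fmt.Println("signed compare:   ", int64(a) < int64(b))
	fmt.Println("sign-flip compare:", unsignedLess(a, b))
	fmt.Println("native unsigned:  ", a < b)
}
```

The same transformation applies to the other ordered comparisons; equality tests need no adjustment.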
Symptom: "unhandled SSA instruction: *ssa.Field" panic.
Cause: *ssa.Field (direct struct field extraction) was not implemented.
Only *ssa.FieldAddr (field address) was handled.
Fix: Implement lowerField: copy from struct base + field offset to
destination slot.
Symptom: "unsupported function: fmt.Printf" error.
Cause: Only fmt.Sprintf and fmt.Println were intercepted.
Fix: Implement fmt.Printf as Sprintf + sys.print.
Symptom: VM crash on panic(42) (integer argument).
Cause: RAISE requires a string operand. Non-string panic values weren't
converted.
Fix: Emit CVTWC (int→string) before RAISE for non-string panic arguments.
Symptom: strconv.Atoi returned non-nil error on success.
Cause: Nil error was materialized as MOVW $(-1) (H) for both words.
But nil error is a nil interface = 2 zero words, not H.
Fix: Nil error = MOVW $0, tag; MOVW $0, value.
Symptom: println(true) outputs "1".
Cause: Bool was printed as integer.
Fix: Emit conditional branch to select "true"/"false" string before print.
Symptom: Nil comparisons for non-pointer-looking types failed.
Cause: constOperand only checked *types.Pointer for nil→H mapping.
Slices, maps, channels, and functions are also pointers in Dis.
Fix: Check all reference types: Pointer, Slice, Map, Chan, Signature.
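A sketch of the widened check using the standard `go/types` package. `nilImmediate` and `sampleTypes` are hypothetical helpers; the real `constOperand` handles more cases. Interface types are deliberately excluded here because, per the nil-error fix above, a nil interface is two zero words rather than H.

```go
package main

import (
	"fmt"
	"go/types"
)

// nilImmediate returns the immediate for a typed nil constant:
// -1 (Dis H) for types represented as Dis pointers, 0 otherwise.
// Hypothetical helper for illustration.
func nilImmediate(t types.Type) int64 {
	switch t.Underlying().(type) {
	case *types.Pointer, *types.Slice, *types.Map, *types.Chan, *types.Signature:
		return -1
	}
	return 0
}

// sampleTypes builds a few types to exercise nilImmediate.
func sampleTypes() map[string]types.Type {
	intT := types.Typ[types.Int]
	return map[string]types.Type{
		"pointer": types.NewPointer(intT),
		"slice":   types.NewSlice(intT),
		"map":     types.NewMap(intT, intT),
		"chan":    types.NewChan(types.SendRecv, intT),
		"func":    types.NewSignatureType(nil, nil, nil, nil, nil, false),
		"int":     intT,
	}
}

func main() {
	ts := sampleTypes()
	for _, name := range []string{"pointer", "slice", "map", "chan", "func", "int"} {
		fmt.Printf("%-8s %d\n", name, nilImmediate(ts[name]))
	}
}
```

Switching on `Underlying()` means named types such as `type Nodes []*Node` are classified by their representation, not their name.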
Symptom: for i, v := range slice skipped last element.
Cause: Loop exit condition used BGTW (>) instead of BGEW (>=) for
the length comparison.
Fix: Use BGEW FP(i), FP(len), done.
Symptom: Converting string to []rune crashed.
Cause: Used INDW (array addressing) to read string characters instead
of INDC (string character extraction).
Fix: Use INDC with correct operand order: INDC str, idx, dst.
Symptom: Infinite loop in for-range.
Cause: BGEW FP(len), FP(i), done means "if len >= i goto done" — this
is almost always true (exits immediately or never).
Fix: BGEW FP(i), FP(len), done — "if i >= len goto done".
Symptom: Nil dereference after type asserting empty struct in CommaOk form.
Programs with d, ok := iface.(EmptyStruct) crash when the else branch runs.
Cause: allocStructFields() returns baseSlot which defaults to 0 when the
struct has no fields. Offset 0 in the frame is REGLINK (return address). Writing
the extracted value to 0(fp) corrupts the return address, causing a nil
dereference on function return.
Fix: Allocate a dummy word slot for empty structs in allocStructFields()
so the returned offset is always >= MaxTemp (64), never in the register area.
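The allocation rule can be sketched with a toy frame allocator. Everything here is illustrative: the `frame` type is hypothetical, and the 8-byte word size is an assumption made only so the offsets are concrete. The value 64 for MaxTemp comes from the fix description.

```go
package main

import "fmt"

const (
	wordSize = 8  // assumed slot size in bytes; illustrative only
	maxTemp  = 64 // first frame offset past the register area
)

// frame hands out word slots starting just past the register area.
type frame struct{ next int }

func newFrame() *frame { return &frame{next: maxTemp} }

func (f *frame) allocWord() int {
	off := f.next
	f.next += wordSize
	return off
}

// allocStructFields reserves one word per field and returns the base offset.
// An empty struct still gets a dummy word so the base is never 0 (REGLINK).
func (f *frame) allocStructFields(numFields int) int {
	if numFields == 0 {
		return f.allocWord() // dummy slot
	}
	base := f.allocWord()
	for i := 1; i < numFields; i++ {
		f.allocWord()
	}
	return base
}

func main() {
	f := newFrame()
	fmt.Println("empty struct base:", f.allocStructFields(0))
	fmt.Println("two-field base:   ", f.allocStructFields(2))
}
```

The dummy slot costs one word per empty-struct value but guarantees that any write through the returned offset stays out of the register area.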
Symptom: v, ok := x.(string) segfaults when the assertion fails. The
non-match path returned v = 0, but pointer-typed zero values in Dis must be
H (-1). Setting a string slot to 0 causes a GC fault when accessed.
Cause: lowerTypeAssert emitted MOVW $0, FP(dst) for all types on the
non-match path. For pointer types (string, slice, etc.), the Dis zero value is
H = -1, not 0.
Fix: Check dt.IsPtr and emit Imm(-1) for pointer types.
Symptom: v, ok := <-ch returns ok=true after close(ch) when the buffer
is empty. Expected ok=false.
Cause: close() injects a phantom zero value into the buffer via NBALT to
wake blocked receivers. A subsequent commaOk receive on the closed path picks up
this phantom value and reports ok=true because it can't distinguish phantom
zeros from real buffered values.
Fix: Added a buffered value count field at offset 24 in the channel wrapper
(expanded from 24 to 32 bytes). lowerSend increments the count; close() does
not. In emitCloseAwareRecv and lowerChanNext, the closed path checks the count
after NBALT succeeds: count > 0 means a real value (ok=true, decrement count);
count == 0 means a phantom zero (ok=false).
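The counting scheme can be modeled in a few lines of Go. This is a single-goroutine sketch with hypothetical names (`chanModel`, its fields and methods); it ignores blocking and the real wrapper layout and only shows how the count separates real values from the phantom zero.

```go
package main

import "fmt"

// chanModel is a simplified model of the channel wrapper.
// Field names are illustrative, not the compiler's actual layout.
type chanModel struct {
	buf    []int // values visible to a receiver, including any phantom zero
	count  int   // number of real values placed by send
	closed bool
}

// send models lowerSend: enqueue and bump the real-value count.
func (c *chanModel) send(v int) {
	c.buf = append(c.buf, v)
	c.count++
}

// close injects a phantom zero to wake a receiver but leaves count alone.
func (c *chanModel) close() {
	c.closed = true
	c.buf = append(c.buf, 0)
}

// recv models the closed-path comma-ok receive.
// A real receive on an empty open channel would block; the model returns.
func (c *chanModel) recv() (v int, ok bool) {
	if len(c.buf) == 0 {
		return 0, false
	}
	v, c.buf = c.buf[0], c.buf[1:]
	if c.count > 0 {
		c.count-- // a real buffered value
		return v, true
	}
	return 0, false // phantom zero injected by close
}

func main() {
	c := &chanModel{}
	c.send(7)
	c.close()
	fmt.Println(c.recv()) // real value
	fmt.Println(c.recv()) // phantom zero
}
```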
TestE2EPrograms in compiler_test.go compiles each .go file in testdata/,
runs it on the Inferno emulator, and compares stdout to expected output.
type testCase struct {
file string
expected string
}
tests := []testCase{
{"hello.go", "hello, infernode\n"},
{"loop.go", "10\n45\n"},
// ... 170+ more
}

The test harness uses context.WithTimeout to kill the emulator after 10
seconds (Inferno's emu doesn't always exit cleanly).
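The timeout pattern looks roughly like the sketch below. `runWithTimeout` is a hypothetical helper, and `echo` stands in for the emulator invocation, whose exact command line is not shown in this document. The example assumes a Unix-like system where `echo` is available as an executable.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout runs a command and returns its stdout. The process is
// killed if it has not exited when the timeout expires.
func runWithTimeout(timeout time.Duration, name string, args ...string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	out, err := exec.CommandContext(ctx, name, args...).Output()
	if ctx.Err() == context.DeadlineExceeded {
		return string(out), fmt.Errorf("%s timed out after %v", name, timeout)
	}
	return string(out), err
}

func main() {
	// Stand-in command; the real harness launches the Inferno emulator.
	out, err := runWithTimeout(10*time.Second, "echo", "hello, infernode")
	fmt.Printf("stdout=%q err=%v\n", out, err)
}
```

The harness can then compare the returned stdout against the `expected` field of each test case and treat a timeout as a failure.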
| Category | Count | Description |
|---|---|---|
| Core language | ~40 | Variables, loops, conditionals, functions, methods |
| Data structures | ~20 | Arrays, slices, maps, structs, strings |
| Concurrency | ~15 | Goroutines, channels, select, buffered channels |
| Closures/HOF | ~10 | Closures, higher-order functions, method values |
| Error handling | ~10 | Panic, recover, defer, error interface |
| Type system | ~15 | Interfaces, type assert, type switch, embedding |
| Stdlib | ~15 | fmt, strings, strconv, math, sort, time |
| Real programs | ~22 | Quicksort, sieve, BST, pipeline, calculator, etc. |
| Tier 6 | 18 | Named types, closures, bit ops, nested structs |
| Lang completeness | 8 | &^, goto, labeled break, fallthrough, type aliases, struct embed, chan commaOk, 3-index slice |
| Multi-package | 4 | Multi-file, multi-pkg, chain imports, shared types |
| Benchmarks | 16 | Go vs Limbo performance comparison |

- selectrecv.go, map_range.go: non-deterministic output (goroutine ordering)
- sys*.go: require Inferno-specific file system

cd tools/godis
go test ./compiler/ -count=1 -timeout 120s # all E2E tests
go test ./compiler/ -run TestE2EPrograms -count=1 # single-file tests only
go test ./compiler/ -run TestE2EMultiPackage # multi-package tests
go test ./dis/ -count=1 # bytecode round-trip tests

| Metric | Value |
|---|---|
| Total lines of code | ~14,200 |
| Compiler core (excl. tests) | ~9,200 |
| Largest file (lower.go) | 7,019 lines |
| Test code | ~3,500 |
| Dis bytecode library | ~2,000 |
| CLI tools | ~250 |
| E2E test programs | 172+ |
| Multi-package test scenarios | 4 |
| Benchmark programs | 16 |
| Supported Go features | Tiers 1-7 (see Status) |
| Supported Sys functions | 15 |
| Intercepted stdlib packages | 14 (incl. embed, unsafe, math/cmplx) |
| Bugs found and fixed | 28 |
| VM opcodes used | 62+ |
| External dependencies | 1 (golang.org/x/tools) |
Tier 1 — Core Language:
Variables, constants (const/iota), arithmetic, comparisons, loops (for,
for range), conditionals (if/else, switch), functions, multiple return
values, methods (value and pointer receivers), recursive functions.
Tier 2 — Data Structures:
Arrays, slices (make, append, copy, cap, sub-slicing), strings (indexing,
slicing, concatenation, []byte conversion, []rune conversion), structs
(nested, embedded), maps (string and int keys), pointers and heap allocation.
Tier 3 — Concurrency:
Goroutines (go), channels (unbuffered, buffered, directional), select
(blocking, non-blocking with default), channel close, for range over channels,
cap(ch).
Tier 4 — Advanced Features:
Closures (with captured variables), higher-order functions, method values,
defer (including defer with closures), panic/recover, interfaces (single
and multiple dispatch, type assertion, type switch, comma-ok, empty interface),
error interface, init() functions.
Tier 5 — Standard Library:
fmt (Sprintf, Printf, Println, Errorf with 10+ format verbs), strings
(11 functions), strconv (Itoa, Atoi, FormatInt), math (Abs, Sqrt, Min, Max),
errors (New), os (Exit), sort (Ints, Strings), sync (Mutex, WaitGroup,
Once), time (After, Duration), log (Println, Fatal), io (Reader, Writer).
Tier 6 — Additional Coverage: Named type methods, struct embedding, composite interfaces, type assertion with comma-ok, slices/maps of structs, recursive tree structures, named returns, range with index, bit operations, defer with closure captures, directional channels.
Tier 7 — Language Completeness:
Generics (monomorphization via ssa.InstantiateGenerics), complex numbers
(complex64/complex128 with full arithmetic), go:embed (compile-time file
embedding), &^ bit-clear operator, goto, labeled break/continue,
fallthrough, 3-index slicing (a[lo:hi:max]), named return values, type
aliases (type X = Y), string ↔ []rune conversions, method values
(x.Method as closure), struct embedding with promoted methods, multi-value
channel receive (v, ok := <-ch), unsafe.Sizeof.
Systematic probing confirmed that the following features work through SSA
desugaring (no compiler changes needed): goto, labeled break/continue,
fallthrough, 3-index slicing, named return values, type aliases, method
values, struct embedding with promoted methods. The following required explicit
compiler implementation: &^ operator, complex numbers, generics, go:embed,
v, ok := <-ch close detection, type assertion comma-ok pointer zero values.
- Incomplete goroutine unblock on close. Goroutines blocked on RECV are not reliably woken when the channel is closed from another goroutine. Close injects a phantom zero to wake one blocked receiver, but this is best-effort and any further blocked receivers stay blocked.
- No native maps. Maps use sorted-array wrappers, not hash tables.
- Limited float formatting. %f/%g use Dis CVTFC without precision control.
- No reflection. The reflect package is not supported.
- No cgo. Cannot call C functions.
- Single-binary output. All packages are inlined into one .dis file; no incremental/separate compilation.
- No zeroing of non-pointer stack slots. The compiler relies on the VM's frame initialization for pointer slots; non-pointer slots may contain garbage from previous calls.
- Standard library is stub-only. The 14 intercepted stdlib packages provide type signatures for compilation, but their implementations are inlined as Dis instruction sequences rather than compiled from the full Go standard library.
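To make the "No native maps" item concrete, here is a minimal sketch of a sorted-array map in Go. It is not the compiler's wrapper: the `sortedMap` type, the parallel-slice layout, and the use of binary search are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedMap keeps keys in ascending order with values in a parallel slice.
// Hypothetical model of a sorted-array map wrapper.
type sortedMap struct {
	keys []string
	vals []int
}

// find returns the position where k is or would be stored.
func (m *sortedMap) find(k string) (int, bool) {
	i := sort.SearchStrings(m.keys, k)
	return i, i < len(m.keys) && m.keys[i] == k
}

// set updates an existing key or inserts a new one in sorted position.
func (m *sortedMap) set(k string, v int) {
	i, ok := m.find(k)
	if ok {
		m.vals[i] = v
		return
	}
	m.keys = append(m.keys, "")
	copy(m.keys[i+1:], m.keys[i:]) // shift the tail right by one
	m.keys[i] = k
	m.vals = append(m.vals, 0)
	copy(m.vals[i+1:], m.vals[i:])
	m.vals[i] = v
}

func (m *sortedMap) get(k string) (int, bool) {
	i, ok := m.find(k)
	if !ok {
		return 0, false
	}
	return m.vals[i], true
}

func main() {
	m := &sortedMap{}
	m.set("b", 2)
	m.set("a", 1)
	m.set("c", 3)
	fmt.Println(m.keys, m.vals)
	fmt.Println(m.get("a"))
}
```

In a layout like this, lookups cost O(log n) but each insert shifts the tail of the array, which is the trade-off against a hash table.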