This repository contains the source code for my work for CS142b at UCI, taught by Michael Franz. This was one of the most enjoyable classes I've taken at UCI with the amount of freedom given to students and difficulty of the task making for an incredibly interesting project.
Within this repository are three directories: tiny, warmup1, and warmup2. The warmup1 and warmup2 directories contain two small warm-up assignments completed at the beginning of the quarter, which the Tiny project was extended from, and so I will not discuss these in detail, but the source code is available and (hopefully) self-explanatory.
The Tiny compiler, implemented as CS142b's main project, is a simple SSA (Single Static Assignment) optimizing compiler with support for graph generation, register allocation, register spilling, and arm64 (macos and linux) assembly code generation. Here's an example of a simple use for this program:
main
var a, b;
function add(x, y);
{
return x + y;
};
{
let a <- call InputNum();
let b <- call InputNum();
call OutputNum(call add(a, b));
call OutputNewLine();
}.
And to run:
$ cargo run -- run test.tiny
...
Running compiled binary (build took 48.308417ms):
> 5
> 6
11Running the Tiny compiler itself should only require a modern version of Rust and Cargo (Tiny was developed with 1.78.0), and cargo test or cargo run -- within the tiny/ directory.
However, unless you're running a device on Linux or Mac arm64 architecture, you will be unable to execute the compiled programs as Tiny only currently supports linux/arm64 and mac/arm64. To run programs generated by the Tiny compiler, please ensure Docker is installed and can virtualize container architectures (preferably QEMU). Then, build and run the Docker container within the tiny/ directory:
docker build . --platform linux/arm64 -t tinyAnd run:
docker run --rm -it -v $(pwd):/usr/src/tiny --platform linux/arm64 tiny [TINY_ARGS (try --help)]To run tests, replace the entrypoint (you can also use bash for an interactive shell):
docker run --rm -it -v $(pwd):/usr/src/tiny --platform linux/arm64 --entrypoint cargo tiny testThe Tiny compiler, when installed with cargo install and invoked as tiny, or invoked through cargo with cargo run -- provides the following CLI interface for building and running Tiny programs:
tiny 0.1.0 by Phillip Cutter
An optimizing compiler for the Tiny language
Usage: tiny [OPTIONS] <COMMAND>
Commands:
build
run
help Print this message or the help of the given subcommand(s)
Options:
--target <TARGET> Platform to generate assembly for [default: auto] [possible values: auto, linux-arm64, apple-arm64]
--disable-strict Disable strict parsing mode (allows for negative numbers)
--disable-cse Perform common enable subexpression elimination
--num-registers <NUM_REGISTERS> Number of registers allowed, min of 3, max of 32 [default: 5]
--graph <filename> Graph output file
--enable-dom-labels Use domination labels in DOT graph visualization
--disable-asm-build Disable assembly compilation
-h, --help Print help
-V, --version Print version
Currently the Tiny compiler has the following features:
- Tokenizer and parser to generate an abstract syntax tree
- Simple verification rules to ensure program is semantically correct
- Negative numbers are supported when using the
--disable-strictCLI flag
- IR (internal representation) generation of SSA instructions for optimizations
- This step automatically performs common subexpression elimination (including commutative operations) and copy propagation
- Optional IR graph generation in DOT graph language
- Generated using
--dot graph_file.dotCLI flag
- Generated using
- Register allocation and spilling for an arbitrary number of available registers
- This works for all tested programs with as few as 3 registers, but uses 5 by default
- The algorithm does not optimize for value moves (across basic block boundaries) or how frequently a value is used within a program when spilling
- This is performed via graph coloring and spilling values to the stack, and coalescing is automatically performed due to the SSA IR architecture
- Assembly source code generation for Apple and Linux ARM64 CPU architectures
- The CLI and tests will automatically build and run Tiny programs end-to-end in order to test the expected output for given inputs
This is a non-exhaustive list of improvements to be made or features to be added, that I would like to address, but have not yet:
- Language Grammar & Parser
- Support for comments with
//or#syntax - Support for string literals for enhanced IO
- More conventional function definitions, ex:
def my_func(int a) -> int { ... } - Improved error messages with line numbers for program verification
- Support for comments with
- Language Runtime
- Support for more types, such as strings, bools, floats, and arrays
- Support for more built-ins:
OutputString,ReadFile, etc. - Support for importing other Tiny source code files
- Support for linking against other language to leverage libraries
- For example, adding FFI for the C, Rust, or C languages would allow Tiny to leverage those languages' libraries
- Register Allocator
- Improvements/bugfixes to support all programs with at least three registers, and possibly just two
- Weighting/scoring functions for variables to spill variables that are used least frequently
- Intelligent register assignment to minimize
MOVinstructions when merging values for aPhiIR instruction
- Assembly Generation
- Support for more target architectures:
wasm,linux/amd64,mac/amd64,windows/amd64 - Support for Cranelift or LLVM IRs to offload Assembly generation + more optimizations
- This could replace core pieces of the compiler, such as the register allocator and intermediate optimizations, but this could be an interesting exercise to dramatically expand the number of target architectures
- Support for more target architectures:
The Tiny language grammar is provided here for reference, though I recommend also viewing test programs in the tiny/tests/programs/ directory to understand what the language can do. Please note that this grammar was created by Michael Franz, thus I do not claim ownership of this grammar nor is it under any license.
letter = "a" | "b" | … | "z" .
digit = "0" | "1" | … | "9" .
relOp = "==" | "!=" | "<" | "<=" | ">" | ">=" .
ident = letter {letter | digit} .
number = digit {digit} .
factor = ident | number | "(" expression ")" | funcCall .
term = factor { ("*" | "/") factor} .
expression = term {("+" | "-") term} .
relation = expression relOp expression .
assignment = "let" ident "<-" expression .
funcCall = "call" ident ["(" [expression { "," expression } ] ")" ] .
ifStatement = "if" relation "then" statSequence [ "else" statSequence ] "fi" .
whileStatement = "while" relation "do" StatSequence "od".
returnStatement = "return" [ expression ] .
statement = assignment | funcCall | ifStatement | whileStatement | returnStatement .
statSequence = statement { ";" statement } [ ";" ] .
varDecl = "var" indent { "," ident } ";" .
funcDecl = [ "void" ] "function" ident formalParam ";" funcBody ";" .
formalParam = "(" [ident { "," ident }] ")" .
funcBody = [ varDecl ] "{" [ statSequence ] "}" .
computation = "main" [ varDecl ] { funcDecl } "{" statSequence "}" "." .The language also supports the following predefined functions:
InputNum()- Returns an integer from the user's input (standard in)OutputNum(int)- Outputs an integer (standard out)OutputNewLine()- Outputs a new line (\nto standard out)
Please note that within this repository are testing programs included from the msof/2024-cs142b GitHub repository created by other fellow students in the class. These are included in order to be automatically tested against when running, but I do not claim to have created or own these test programs. Thank you to these students for their contributions to the class, as they have helped me significantly to improve my compiler. More detail can be found in this README file.
Along these lines, please note that certain files in this repository, specifically those located in the tiny/tests/programs/msof/, are not covered by the repository's MIT license. These files are provided without a license, and all rights are reserved by their respective owners.