Much ado thereabouts
An experiment in hygienic macro expansion.
This project explores hygienic macro expansion in a small lambda calculus language. The core idea is to use syntactic closures — tagging syntax with the environment it was born in — to ensure that macros don't accidentally capture or clobber variable names from the expansion site.
For a walkthrough of the approach and how it evolved check out the accompanying blog post: Scrubbing Up.
We begin with a simple language supporting lambdas, definitions, and syntax
transformation definitions. Plain definitions and lambdas introduce bindings
into the current environment. def-syn definitions however introduce a hygienic
pattern expansion.
form = (lam (SYM ... ) <form>...) ; Lambda definition
| (def SYM <form>) ; Value definition (binding)
| (def-syn id <rule>...) ; Syntax mutation definition (macro binding)
| (quot <form>...) ; Value quotation
| (stx <form>...) ; Syntax quotation
| (<form>...) ; Application. Evaluates a function value.
| atom ; Leaf literal
atom = NUMBER | SYM
rule = (<form> <form>) ; Transformer definition. Binds syntax to the
; first form (the pattern), and expands it
; in the template form.
Programs are transformed initially by the parser from a string into a concrete syntax tree. This tree is then "illuminated" to add syntactic context before macro expansion. We use this source provenance to enforce hygiene with syntactic closures.
Once expansion is complete the macro-free source text can be bound into an
output format ready for use. We aim to perform this in a single bind pass
that takes the illuminated Stx tree and recursively walks it expanding
each node as it goes and then resolving the resulting macro-free syntax into
a Bound program with syntax constructs applied and variables assigned to
their storage locations.
We try to follow the terminology from Bawden & Rees, with a few additions:
- A
nameis a token used to "name" something. This is ourSYMin the grammar. - A
keywordis anameused to introduce a syntactic construct. The keywords supported in our language are:lam,def,def-syn,quot, andstx. - An
identifieris anameused to refer to a variable. - A
variableis the storage location for a specific value. The sameidentifiermay refer to differentvariables depending on the environment. - A
macrois a syntax transformer. - A
syntactic constructis an item with meaning at syntax expansion time, such as akeywordormacro.
When binding there are two "environments" to consider: syntactic and value.
-
The syntactic environment contains the mapping of
namestovariableskeywords, andmacros. It is used during theexpandsection of thebindpass to recognise references tosyntactic constructs, and distinguish the hygienically correctvariablethat anidentifierrefers to. -
The value environment maps variables to storage locations for their values. It is used to resolve the locations that values should be read or written from.
In this toy language our locations are simple named suffixes of the variable name. e.g. an identifier
xmay refer to the storage locationx.1orx.2etc. depending on scope. A production compiler would need to assign these to local variable slots, argument indexes, capture environment locations, or globals.
We have three main representations of the program:
-
CST - The concrete syntax, represented as a Red/Green tree with a typed layer defined in
Syntax. This is a lossless representation of the input of the program. -
STX Tree - The "Illuminated" tree. This is produced by the
illuminatepass and acts as the input and output format for macro expansion inExpand.At this stage we replace some of the sugar from the CST. In a full production compiler this is where quote expressions would be transformed into quote forms amongst other simplifications. The idea here is to have a uniform representation of the syntax for the later passes to work upon.
-
Bound Tree - Resolved representation of the program structure. Bound trees no longer think of the program as a "soup of forms" and instead impose some structure on the items within it.
Although the expansion phase has some understanding of bindings in order to expand compile time elements it is up to the
Bindpass to resolve these references down to canonical storage locations.