Daniel's Blog

Reasons to Love the Field of Programming Languages

Wed, 31 Dec 2025 00:00:00 +0000

I work at HPE on the Chapel Programming Language. Recently, another HPE person asked me:

So, you work on the programming language. What’s next for you?

This caught me off-guard because I hadn’t even conceived of moving on. I don’t want to move on, because I love the field of programming languages. In addition, I have come to think there is something in PL for everyone, from theorists to developers to laypeople. So, in that spirit, I am writing this list as a non-exhaustive survey that holds the dual purpose of explaining my personal infatuation with PL, and providing others with ways to engage with PL that align with their existing interests. I try to provide rationale for each claim, but you can just read the reasons themselves and skip the rest.

My general thesis goes something like this: programming languages are a unique mix of the inherently human and social and the deeply mathematical, a mix that often remains firmly grounded in the practical, low-level realities of our hardware.

Personally, I find all of these properties equally important, but we have to start somewhere. Let’s begin with the human aspect of programming languages.

Human Aspects of PL

Programs must be written for people to read, and only incidentally for machines to execute.

— Abelson & Sussman, Structure and Interpretation of Computer Programs.

As we learn more about the other creatures that inhabit our world, we discover that they are similar to us in ways that we didn’t expect. However, our language is unique to us. It gives us the ability to go far beyond the simple sharing of information: we communicate abstract concepts, social dynamics, stories. In my view, storytelling is our birthright more so than anything else.

I think this has always been reflected in the broader discipline of programming. Code should always tell a story, I’ve heard throughout my education and career. It should explain itself. In paradigms such as literate programming, we explicitly mix prose and code. Notebook technologies like Jupyter intersperse computation with explanations thereof.

Reason 1: programming languages provide the foundation of expressing human thought and stories through code.

From flowery prose to clinical report, human expression takes a wide variety of forms. The need to vary our descriptions is also well-served by the diversity of PL paradigms. From stateful transformations in languages like Python and C++, through pure and immutable functions in Haskell and Lean, to fully declarative statements-of-fact in Nix, various languages have evolved to support the many ways in which we wish to describe our world and our needs.

Reason 2: diverse programming languages enable different perspectives and ways of storytelling, allowing us choice in how to express our thoughts and solve our problems.

Those human thoughts of ours are not fundamentally grounded in logic, mathematics, or anything else. They are a product of millennia of evolution through natural selection, of adaptation to ever-changing conditions. Our cognition is limited, rife with blind spots, and partial to the subject matter at hand. We lean on objects, actors, contracts, and more as helpful, mammal-compatible analogies. I find this to be beautiful; here is something we can really call ours.

Reason 3: programming languages imbue the universe’s fundamental rules of computation with humanity’s identity and idiosyncrasies. They carve out a home for us within impersonal reality.

Storytelling (and, more generally, writing) is not just about communicating with others. Writing helps us clarify our own thoughts and think more deeply. In his 1979 Turing Award lecture, Notation as a Tool of Thought, Kenneth Iverson, the creator of APL, highlighted ways in which programming languages, with their notation, can help express patterns and facilitate thinking.

Throughout computing history, programming languages built abstractions that — together with advances in hardware — made it possible to create ever more complex software. Dijkstra’s structured programming crystallized the familiar patterns of if/else and while out of a sea of control flow. Structures and objects partitioned data and state into bundles that could be reasoned about, or put out of mind when irrelevant. Recently, I dare say that notions of ownership and lifetimes popularized by Rust have clarified how we think about memory.

Reason 4: programming languages combat complexity, and give us tools to think and reason about unwieldy and difficult problems.

The fight against complexity occurs on more battlegrounds than PL design alone. Besides its syntax and semantics, a programming language is comprised of its surrounding tooling: its interpreter or compiler, perhaps its package manager or even its editor. Language designers and developers take great care to improve the quality of error messages, to provide convenient editor tooling, and to build powerful package managers like Yarn. Thus, in each language project, there is room for folks who, even if they are not particularly interested in grammars or semantics, care about the user experience.

Reason 5: programming languages provide numerous opportunities for thoughtful forays into the realms of User Experience and Human-Computer Interaction.

I hope you agree, by this point, that programming languages are fundamentally tethered to the human. Like any human endeavor, then, they don’t exist in isolation. To speak a language, one usually wants a partner who understands and speaks that same language. Likely, one wants a whole community, topics to talk about, or even a set of shared beliefs or mythologies. This desire maps onto the realm of programming languages. When using a particular PL, you want to talk to others about your code, implement established design patterns, use existing libraries.

I mentioned mythologies earlier. In some ways, language communities do more than share know-how about writing code. In many cases, I think language communities rally around ideals embodied by their language. The most obvious example seems to be Rust. From what I’ve seen, the Rust community believes in language design that protects its users from the pitfalls of low-level programming. The Go community believes in radical simplicity. Julia actively incorporates contributions from diverse research projects into an interoperable set of scientific packages.

Reason 6: programming languages are complex collaborative social projects that have the power to champion innovative ideas within the field of computer science.

So far, I’ve presented interpretations of the field of PL as tools for expression and thought, human harbor to the universe’s ocean, and collaborative social projects. These interpretations coexist and superimpose, but they are only a fraction of the whole. What has kept me enamored with PL is that it blends these human aspects with a mathematical ground truth, through fundamental connections to computation and mathematics.

The Mathematics of PL

Like buses: you wait two thousand years for a definition of “effectively calculable”, and then three come along at once.

— Philip Wadler, Propositions as Types

There are two foundations, lambda calculus and Turing machines, that underpin most modern PLs. Among the “famous” computational models, the abstract notion of the Turing machine is the one most similar to the von Neumann architecture. Through bottom-up organization of “control unit instructions” into “structured programs” into the imperative high-level languages of today, we can trace the influence of Turing machines in C++, Python, Java, and many others. At the same time, and running on the same hardware, functional programming languages like Haskell represent a chain of succession from the lambda calculus, embellished today with types and numerous other niceties. These two lineages are inseparably linked: they have been mathematically proven to be equivalent. They are two worlds coexisting.

The two foundations have a crucial property in common: they are descriptions of what can be computed. Both were developed initially as mathematical formalisms. They are rooted not only in pragmatic concerns of “what can I do with these transistors?”, but in the deeper questions of “what can be done with a computer?”.

Reason 7: general-purpose programming languages are built on foundations of computation, and wield the power to compute anything we consider “effectively computable at all”.

Because of these mathematical beginnings, we have long had precise and powerful ways to talk about what code written in a particular language means. This is the domain of semantics. Instead of relying on reference implementations of languages (CPython for Python, rustc for Rust) or on textual specifications, we can explicitly map constructs in languages to mathematical objects (denotational semantics) or define rules for (abstractly) executing them (operational semantics).

To be honest, the precise and mathematical nature of these tools is, for me, justification enough to love them. However, precise semantics for languages have real advantages. For one, they allow us to compare programs’ real behavior with what we expect, giving us a “ground truth” when trying to fix bugs or evolve the language. For another, they allow us to confidently make optimizations: if you can prove that a transformation won’t affect a program’s behavior, but will make it faster, you can safely use it. Finally, the discipline of formalizing programming language semantics usually entails boiling them down to their most essential components. Stripping the syntax sugar helps clarify how complex combinations of features should behave together.

Some of these techniques bear a noticeable resemblance to the study of semantics in linguistics. Given our preceding discussion on the humanity of programming languages, perhaps that’s not too surprising.

Reason 8: programming languages can be precisely formalized, giving exact, mathematical descriptions of how they should work.

In talking about how programs behave, we run into an important limitation of reasoning about Turing machines and lambda calculus, stated precisely in Rice’s theorem: all non-trivial semantic properties of programs (termination, throwing errors) are undecidable. There will always be programs that elude not only human analysis, but algorithmic understanding.

It is in the context of this constraint that I like to think about type systems. The beauty of type systems, to me, is in how they tame the impossible. Depending on the design of a type system, a well-typed program may well be guaranteed not to produce any errors, or produce only the “expected” sort of errors. By constructing reasonable approximations of program behavior, type systems allow us to verify that programs are well-behaved in spite of Rice’s theorem. Much of the time, too, we can do so in a way that is straightforward for humans to understand and machines to execute.

Reason 9: in the face of the fundamentally impossible, type systems pragmatically grant us confidence in our programs for surprisingly little conceptual cost.

At first, type systems look like engineering formalisms. That may well be the original intention, but in our invention of type systems, we have actually completed a quadrant of a deeper connection: the Curry-Howard isomorphism. Propositions, in the logical sense, correspond one-to-one with types of programs, and proofs of these propositions correspond to programs that have the matching type.

This is an incredibly deep connection. In adding parametric polymorphism to a type system (think Java generics, or C++ templates without specialization), we augment the corresponding logic with the “for all x” ($\forall x$) quantifier. Restrict the copying of values in a way similar to Rust, and you get an affine logic, capable of reasoning about resources and their use. In languages like Agda with dependent types, you get a system powerful enough to serve as a foundation for mathematics. Suddenly, you can write code and mathematically prove properties about that code in the same language. I’ve done this in my work with formally-verified static program analysis.
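To make the correspondence a little more tangible, here is a minimal sketch in Lean (the names `modusPonens`, `andSwap`, and `identity` are made up for this illustration). Each declaration can be read both as a program with a type and as a proof of a proposition.

```lean
-- Implication is a function type: a proof of `p → q` turns a proof of `p`
-- into a proof of `q`, so modus ponens is just function application.
theorem modusPonens (p q : Prop) (hp : p) (hpq : p → q) : q :=
  hpq hp

-- Conjunction is a pair type: swapping the two components proves `q ∧ p`.
theorem andSwap (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.right, h.left⟩

-- A polymorphic function works for every type `α`, mirroring "for all".
def identity (α : Type) (x : α) : α :=
  x
```

If the type checker accepts these definitions, the corresponding propositions are proven; no separate proof tool is involved.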

This connection proves appealing even from the perspective of “regular” mathematics. We have developed established engineering practices for writing code: review, deployment, documentation. What if we could use the same techniques for doing mathematics? What if, through the deep connection of programming languages to logic, we could turn mathematics into a computer-verified, collaborative endeavor? I therefore present:

Reason 10: type systems for programming languages deeply correspond to logic, allowing us to mathematically prove properties about code, using code, and to advance mathematics through the practices of software engineering.

Bonus meta-reason to love the mathy side of PL!

In addition to the theoretical depth, I also find great enjoyment in the way that PL is practiced. Here more than elsewhere, creativity and artfulness come into play. In PL, inference rules are a lingua franca through which the formalisms I’ve mentioned above are expressed and shared. They are such a central tool in the field that I’ve developed a system for exploring them interactively on this blog.

For me personally, inference rules spark joy. They are a concise and elegant way to do much of the formal heavy-lifting I described in this section; we use them for operational semantics, type systems, and sometimes more. When navigating the variety and complexity of the many languages and type systems out there, we can count on inference rules to take us directly to what we need to know. This same variety naturally demands flexibility in how rules are constructed, and what notation is used. Though this can sometimes be troublesome (one paper I’ve seen describes 27 different ways of writing the simple operation of substitution in the literature!), it also creates opportunities for novel and elegant ways of formalizing PL.
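For readers who have not seen one, here is a small example of the notation. It is the standard textbook typing rule for function application in the simply typed lambda calculus, not a rule taken from any particular system discussed here. The premises sit above the line and the conclusion below it: if $e_1$ is a function from $\tau_1$ to $\tau_2$ and $e_2$ has type $\tau_1$ (both under the typing context $\Gamma$), then applying $e_1$ to $e_2$ has type $\tau_2$.

$$
\frac{\Gamma \vdash e_1 : \tau_1 \to \tau_2 \qquad \Gamma \vdash e_2 : \tau_1}
     {\Gamma \vdash e_1\; e_2 : \tau_2}
$$

An operational semantics uses the same shape; only the judgment changes, from "has type" to something like "steps to".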

Bonus Reason: the field of programming languages has a standard technique for expressing its formalisms, which precisely highlights core concepts and leaves room for creative expression and elegance.

I know that mathematics is a polarizing subject. Often, I find myself torn between wanting precision and eschewing overzealous formalism. The cusp between the two is probably determined by my own tolerance for abstraction. Regardless of how much abstraction you are interested in learning about, PL has another dimension, close to the ground: more often than not, our languages need to execute on real hardware.

Pragmatics of PL

Your perfectly-designed language can be completely useless if there is no way to execute it [note: Technically, there are languages that don't care if you execute them at all. Many programs in theorem-proving languages like Agda and Rocq exist only to be type-checked. So, you could nitpick this claim; or, you could take it more generally: your language can be useless if there's no way to make it efficiently do what it's been made to do. ] efficiently. Thus, the field of PL subsumes not only the theoretical foundations of languages and their human-centric design; it includes also their realization as software.

The overall point of this section is that there is much depth to the techniques involved in bringing a programming language to life. If you are a tinkerer or engineer at heart, you will never run out of avenues of exploration. The reasons are all framed from this perspective.

One fascinating aspect of programming languages is the “direction” from which they have grown. On one side, you have languages that came together from the need to control and describe hardware. I’d say that this is the case for C and C++, Fortran, and others. More often than not, these languages are compiled to machine code. Still subject to human constraints, these languages often evolve more user-facing features as time goes on. On the other side, you have languages that were developed to enable people to write software, and only later faced the constraints of actually working efficiently. These are languages like Python, Ruby, and JavaScript. These languages are often interpreted (executed by a dedicated program), frequently with the help of techniques such as just-in-time compilation. There is no one-size-fits-all way to execute a language, and as a result,

Reason 11: the techniques of executing programming languages are varied and rich. From compilation, to JIT, to interpretation, the field has many sub-disciplines, each with its own know-how and tricks.

At the same time, someone whose goal is to actually develop a compiler likely doesn’t want to develop everything from scratch. To do so would be a daunting task, especially if you want the compiler to run beyond the confines of a personal machine. CPU architectures and operating system differences are hard for any individual to keep up with. Fortunately, we have a gargantuan ongoing effort in the field: the LLVM Project. LLVM spans numerous architectures and targets, and has become a common back-end for languages like C++ (via Clang), Swift, and Rust. LLVM helps share and distribute the load of keeping up with the ongoing march of architectures and OSes. It also provides a shared playground upon which to experiment with language implementations, optimizations, and more.

Reason 12: large projects like LLVM enable language designers to lean on decades of precedent to develop a compiler for their language.

Though LLVM is powerful, it does not automatically grant languages implemented with it good performance. In fact, no other tool does. To make a language run fast requires a deep understanding of the language itself, the hardware upon which it runs, and the tools used to execute it. That is a big ask! Modern computers are extraordinarily complex. Techniques such as out-of-order execution, caching, and speculative execution are constantly at play. This means that any program is subject to hard-to-predict and often unintuitive effects. On top of that, depending on your language’s capabilities, performance work can often entail working with additional hardware, such as GPUs and NICs, which have their own distinct performance characteristics. This applies both to compiled and interpreted languages. Therefore, I give you:

Reason 13: improving the performance of a programming language is rife with opportunities to engage with low-level details of the hardware and operating system.

In the mathematics section, we talked about how constructing correct optimizations requires an understanding of the language’s semantics. It was one of the practical uses for having a mathematical definition of a language. Reason 13 is where that comes in, but the synthesis is not automatic. In fact, a discipline sits in-between defining how a language behaves and optimizing programs: program analysis. Algorithms that analyze properties of programs such as reaching definitions enable optimizations such as loop-invariant code motion, which can have very significant performance impact. At the same time, for an analysis to be correct, it must be grounded in the program’s mathematical semantics. There are many fascinating techniques in this discipline, including ones that use lattice theory.
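As a rough illustration of what loop-invariant code motion does, here is a hand-written before-and-after sketch in C++. The function names `fillNaive` and `fillHoisted` are invented for this example, and a real compiler performs the transformation on its intermediate representation rather than on source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before: `scale * offset` is recomputed on every iteration, although neither
// operand is assigned inside the loop.
inline std::vector<int> fillNaive(std::size_t count, int scale, int offset) {
    std::vector<int> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<int>(i) + scale * offset;
    }
    return out;
}

// After: the only definitions of `scale` and `offset` that reach the loop body
// come from outside the loop, so the product is loop-invariant and can be
// computed once, ahead of the loop.
inline std::vector<int> fillHoisted(std::size_t count, int scale, int offset) {
    std::vector<int> out(count);
    const int invariant = scale * offset;
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<int>(i) + invariant;
    }
    return out;
}

// The transformation is only valid because both versions agree on every input.
inline void checkSameResults() {
    assert(fillNaive(5, 3, 4) == fillHoisted(5, 3, 4));
}
```

The reaching-definitions analysis is what justifies the comment in `fillHoisted`: it establishes that no assignment inside the loop can change the operands.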

Reason 14: the sub-discipline of program analysis serves as a grounded application of PL theory to PL practice, enabling numerous optimizations and transformations.

The programs your compiler generates are software, and, as we just saw, may need to be tweaked for performance. But the compiler and/or interpreter is itself a piece of software, and its own performance matters too. Today’s language implementations are subject to demands that hadn’t been there historically. For instance, language implementations are used to build language servers, which enable editors to give users deeper insights into their code. Today, a language implementation may be called upon on every keystroke to provide a typing user with live updates. This has led to the introduction of techniques like the query architecture (see also salsa) to avoid redundant work and re-use intermediate results. New language implementations like that of Carbon are exploring alternative representations of programs in memory. In short,

Reason 15: language implementations are themselves pieces of software, subject to unique constraints and requiring careful and innovative engineering.

Conclusion

I’ve now given a tour of ways in which I found the PL field compelling, organized across three broad categories. There is just one more reason I’d like to share.

I was 16 years old when I got involved with the world of programming languages and compilers. Though I made efforts to learn about it through literature (the Dragon Book, and Modern Compiler Design), I simply didn’t have the background to find these resources accessible. However, all was not lost. The PL community online has been, and still is, a vibrant and enthusiastic place. I have found it to be welcoming of folks with backgrounds spanning complete beginners and experts alike. Back then, it gave me accessible introductions to anything I wanted. Now, every week I see new articles go by that challenge my intuitions, teach me new things, or take PL ideas to absurd and humorous extremes. So, my final reason:

Reason 16: the programming languages community is full of brilliant, kind, welcoming and enthusiastic people, who dedicate much of their time to spreading the joy of the field.

I ❤️ the programming languages community.

Chapel's Runtime Types as an Interesting Alternative to Dependent Types

Sun, 02 Mar 2025 22:52:01 -0800

One day, when I was in graduate school, the Programming Languages research group was in a pub for a little gathering. Amidst beers, fries, and overpriced sandwiches, the professor and I were talking about dependent types. Speaking loosely and imprecisely, these are types that are somehow constructed from values in a language, like numbers.

For example, in C++, std::array is a dependent type. An instantiation of the type array, like array<string, 3> is constructed from the type of its elements (here, string) and a value representing the number of elements (here, 3). This is in contrast with types like std::vector, which only depends on a type (e.g., vector<string> would be a dynamically-sized collection of strings).
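To make the contrast concrete, here is a minimal C++ sketch. The alias names and the `exampleUsage` function are made up for illustration. The length of a `std::array` can be checked from the type alone at compile time, while the length of a `std::vector` is only known once the program runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical aliases, for illustration only.
using ThreeStrings = std::array<std::string, 3>;
using FourStrings  = std::array<std::string, 4>;
using ManyStrings  = std::vector<std::string>;

// The length is available from the type alone, without creating a value.
static_assert(std::tuple_size<ThreeStrings>::value == 3,
              "the length is part of the type");

// Arrays of different lengths are different, incompatible types.
static_assert(!std::is_same<ThreeStrings, FourStrings>::value,
              "3 and 4 elements give distinct types");

// A vector's type says nothing about its length: the same type can hold
// two elements now and three a moment later.
inline void exampleUsage() {
    ManyStrings names{"ada", "grace"};
    assert(names.size() == 2);
    names.push_back("barbara");
    assert(names.size() == 3);
}
```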

I was extolling the virtues of general dependent types, like you might find in Idris or Agda: more precise function signatures! The Curry-Howard isomorphism! [note: The Curry-Howard isomorphism is a common theme on this blog. I've written about it myself, but you can also take a look at the Wikipedia page. ] The professor was skeptical. He had been excited about dependent types in the past, but nowadays he felt over them. They were cool, he said, but there are few practical uses. In fact, he posed a challenge:

Give me one good reason to use dependent types in practice that doesn’t involve keeping track of bounds for lists and matrices!

This challenge alludes to fixed-length lists – vectors – which are one of the first dependently-typed data structures one learns about. Matrices are effectively vectors-of-vectors. In fact, even in giving my introductory example above, I demonstrated the C++ equivalent of a fixed-length list, retroactively supporting the professor’s point.

It’s not particularly important to write down how I addressed the challenge; suffice it to say that the notion resonated with some of the other students present in the pub. In the midst of practical development, how much of dependent types’ power can you leverage, and how much power do you pay for but never use?

A second round of beers arrived. The argument was left largely unresolved, and conversation flowed to other topics. Eventually, I graduated, and started working on the Chapel language team (I also write on the team’s blog).

When I started looking at Chapel programs, I could not believe my eyes…

A Taste of Chapel’s Array Types

Here’s a simple Chapel program that creates an array of 10 integers.

var A: [0..9] int;

Do you see the similarity to the std::array example above? Of course, the syntax is quite different, but in essence I think the resemblance is uncanny. Let’s mangle the type a bit — producing invalid Chapel programs — just for the sake of demonstration.

var B: array(0..9, int); // first, strip the syntax sugar
var C: array(int, 0..9); // swap the order of the arguments to match C++

Only one difference remains: in C++, arrays are always indexed from zero. Thus, writing array<int, 10> would implicitly create an array whose indices start with 0 and end in 9. In Chapel, array indices can start at values other than zero (it happens to be useful for elegantly writing numerical programs), so the type explicitly specifies a lower and an upper bound. Other than that, though, the two types look very similar.

In general, Chapel arrays have a domain, typically stored in variables like D. The domain of A above is {0..9}. This domain is part of the array’s type.

Before I move on, I’d like to pause and state a premise that is crucial for the rest of this post: I think knowing the size of a data structure, like std::array or Chapel’s [0..9] int, is valuable. If this premise were not true, there’d be no reason to prefer std::array to std::vector, or care that Chapel has indexed arrays. However, having this information can help in numerous ways, such as:

Enforcing compatible array shapes. For instance, the following Chapel code would require the two arrays passed to function doSomething to have the same size.
```
proc doSomething(people: [?D] person, data: [D] personInfo) {}
```
Similarly, we can enforce the fact that an input to a function has the same shape as the output:
```
proc transform(input: [?D] int): [D] string;
```
Consistency in generics. Suppose you have a generic function that declares a new variable of a given type, and just returns it:
```
proc defaultValue(type argType) {
  var x: argType;
  return x;
}
```
Code like this exists in “real” Chapel software, by the way — the example is not contrived. By including the bounds etc. into the array type, we can ensure that x is appropriately allocated. Then, defaultValue([1,2,3].type) would return an array of three default-initialized integers.
Eliding boundary checking. Boundary checking is useful for safety, since it ensures that programs don’t read or write past the end of allocated memory. However, bounds checking is also slow. Consider the following function that sums two arrays:
```
proc sumElementwise(A: [?D] int, B: [D] int) {
  var C: [D] int;
  for idx in D do
    C[idx] = A[idx] + B[idx];
  return C;
}
```
Since arrays A, B, and C have the same domain D, we don’t need to do bounds checking when accessing any of their elements. I don’t believe this is currently an optimization in Chapel, but it’s certainly on the table.
Documentation. Including the size of the array as part of type signature clarifies the intent of the code being written. For instance, in the following function:
```
proc sendEmails(numEmails: int, destinationAddrs: [1..numEmails] address) { /* ... */ }
```
It’s clear from the type of destinationAddrs that there ought to be exactly as many destinationAddrs as the number of emails that should be sent.
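For comparison, the shape-matching and bounds-checking points above have a rough C++ counterpart when the length is known at compile time. The sketch below is only an analogy to the Chapel `[?D]` / `[D]` pattern, and `exampleUsage` is a made-up name: both parameters share the template parameter `N`, so a call only compiles when the two arrays have the same length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Both parameters share the template parameter N, so the compiler rejects
// any call whose arguments have different lengths.
template <std::size_t N>
std::array<int, N> sumElementwise(const std::array<int, N>& a,
                                  const std::array<int, N>& b) {
    std::array<int, N> c{};
    for (std::size_t i = 0; i < N; ++i) {
        c[i] = a[i] + b[i];  // i < N holds for all three arrays by construction
    }
    return c;
}

inline void exampleUsage() {
    std::array<int, 3> a{1, 2, 3};
    std::array<int, 3> b{10, 20, 30};
    std::array<int, 3> c = sumElementwise(a, b);
    assert(c[0] == 11 && c[1] == 22 && c[2] == 33);

    // std::array<int, 4> d{};
    // sumElementwise(a, d);  // would not compile: N cannot be both 3 and 4
}
```

Unlike the Chapel version, this only works when `N` is a compile-time constant.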

Okay, recap: C++ has std::array, which is a dependently-typed container that represents an array with a fixed number of elements. Chapel has something similar. I think these types are valuable.

At this point, it sort of looks like I’m impressed with Chapel for copying a C++ feature from 2011. Not so! As I played with Chapel programs more and more, arrays miraculously supported patterns that I knew I couldn’t write in C++. The underlying foundation of Chapel’s array types is quite unlike any other. Before we get to that, though, let’s take a look at how dependent types are normally used (by us mere mortal software engineers).

Difficulties with Dependent Types

Let’s start by looking at a simple operation on fixed-length lists: reversing them. Ignoring details like ownership and copying, one might write a reverse function for “regular” lists that looks like this:

std::vector<int> reverse(std::vector<int>);

This function is not general: it won’t help us reverse lists of strings, for instance. The “easy fix” is to replace int with some kind of placeholder that can be replaced with any type.

std::vector<T> reverse(std::vector<T>);

You can try compiling this code, but you will immediately run into an error. What the heck is T? Normally, when we name a variable, function, or type (e.g., by writing vector, reverse), we are referring to its declaration somewhere else. At this time, T is not declared anywhere. It just “appears” in our function’s type. To fix this, we add a declaration for T by turning reverse into a template:

template <typename T>
std::vector<T> reverse(std::vector<T>);

The new reverse above takes two arguments: a type and a list of values of that type. So, to really call this reverse, we need to feed the type of our list’s elements into it. This is normally done automatically (in C++ and otherwise) but under the hood, invocations might look like this:

reverse<int>({1,2,3});               // produces 3, 2, 1
reverse<string>({"world", "hello"}); // produces "hello", "world"
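For a version that compiles, here is a minimal sketch. The function body is an assumption (the text above only gives the declaration), and `exampleCalls` is a made-up name. It shows one call with the type argument written out and one where the compiler deduces it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical implementation of the `reverse` template declared above:
// builds a new vector by walking the input back to front.
template <typename T>
std::vector<T> reverse(std::vector<T> v) {
    return std::vector<T>(v.rbegin(), v.rend());
}

inline void exampleCalls() {
    // Explicit type argument: the braced list is converted to std::vector<int>.
    std::vector<int> a = reverse<int>({1, 2, 3});
    assert((a == std::vector<int>{3, 2, 1}));

    // Deduced type argument: T = std::string is inferred from the argument.
    std::vector<std::string> words{"world", "hello"};
    std::vector<std::string> b = reverse(words);
    assert((b == std::vector<std::string>{"hello", "world"}));
}
```

One C++-specific wrinkle: deduction works from a `std::vector` value, but not from a bare braced list like `{1, 2, 3}`, which is why the first call spells out `<int>`.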

This is basically what we have to do to write reverse on std::array, which includes an additional parameter that encodes its length. We might start with the following (using n as a placeholder for length, and observing that reversing an array doesn’t change its length):

std::array<T, n> reverse(std::array<T, n>);

Once again, to make this compile, we need to add template parameters for T and n.

template <typename T, size_t n>
std::array<T, n> reverse(std::array<T, n>);
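Here is one way the declaration above might be filled in. The body and the `exampleCalls` function are assumptions made for illustration. The length `n` is carried through the signature, so the result is guaranteed by the type to be as long as the input.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical body for the declaration above. The parameter is taken by
// value, so reversing it in place does not affect the caller's array.
template <typename T, std::size_t n>
std::array<T, n> reverse(std::array<T, n> values) {
    std::reverse(values.begin(), values.end());
    return values;
}

inline void exampleCalls() {
    std::array<int, 3> numbers{1, 2, 3};
    // T = int and n = 3 are both deduced from the argument's type.
    std::array<int, 3> flipped = reverse(numbers);
    assert((flipped == std::array<int, 3>{3, 2, 1}));

    // The same kind of call with the template arguments spelled out.
    std::array<std::string, 2> words{"world", "hello"};
    auto greeting = reverse<std::string, 2>(words);
    assert(greeting[0] == "hello" && greeting[1] == "world");
}
```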

Now, you might be asking…

This section is titled "Difficulties with Dependent Types". What's the difficulty?

Well, here’s the kicker. C++ templates are a compile-time mechanism. As a result, arguments to a template (like T and n) must be known when the program is being compiled. This, in turn, means the following program doesn’t work: [note: The observant reader might have noticed that one of the Chapel programs we saw above, sendEmails, does something similar. The numEmails argument is used in the type of the destinationAddrs parameter. That program is valid Chapel. ]

void buildArray(size_t len) {
  std::array<int, len> myArray;
  // do something with myArray
}

You can’t use these known-length types like std::array with any length that is not known at compile-time. But that’s a lot of things! If you’re reading from an input file, chances are, you don’t know how big that file is. If you’re writing a web server, you likely don’t know the length of the HTTP requests. With every setting a user can tweak when running your code, you sacrifice the ability to use templated types.
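The usual C++ fallback is sketched below, reusing the name `buildArray` from the failing example. The `dot` helper is invented for illustration. Switching to `std::vector` makes the code compile with a runtime length, but the length disappears from the type, so any requirement about it has to be checked while the program runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// `len` is an ordinary runtime value, so it cannot be a template argument.
// A std::vector accepts it, but the length is no longer part of the type.
inline std::vector<int> buildArray(std::size_t len) {
    std::vector<int> myArray(len);  // len zero-initialized elements
    for (std::size_t i = 0; i < len; ++i) {
        myArray[i] = static_cast<int>(i);
    }
    return myArray;
}

// Nothing in these types says the two vectors must have equal length,
// so the requirement is checked at runtime instead of by the compiler.
inline int dot(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += a[i] * b[i];
    }
    return total;
}
```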

Also, how do you return a std::array? If the size of the returned array is known in advance, you just list that size:

std::array<int, 10> createArray();

If the size is not known at compile-time, you might want to do something like the following — using an argument n in the type of the returned array — but it would not compile:

auto computeNNumbers(size_t n) -> std::array<int, n>; // not valid C++

Moreover, you actually can’t use createArray to figure out the required array size, and then return an array that big, even if in the end you only used compile-time-only computations in the body of createArray. What you would need is to provide a “bundle” of a value and a type that is somehow built from that value.

// magic_pair is invented syntax, will not even remotely work
auto createArray() -> magic_pair<size_t size, std::array<int, size>>;

This pair contains a size (suppose it’s known at compilation time for the purposes of appeasing C++) as well as an array that uses that size as its template argument. This is not real C++ – not even close – but such pairs are a well-known concept. They are known as dependent pairs, or, if you’re trying to impress people, $\Sigma$-types. In Idris, you could write createArray like this:

createArray : () -> (n : Nat ** Vec n Int)

There are languages out there – that are not C++, alas – that support dependent pairs, and as a result make it more convenient to use types that depend on values. Not only that, but a lot of these languages do not force dependent types to be determined at compile-time. You could write that coveted readArrayFromFile function:

readArrayFromFile : String -> IO (n : Nat ** Vec n String)

Don’t mind IO; in pure languages like Idris, this type is a necessity when reading data in and sending it out. The key is that readArrayFromFile produces, at runtime, a pair of n, which is the size of the resulting array, and a Vec of that many Strings (e.g., one string per line of the file).

Dependent pairs are cool and very general. However, the end result of types with bounds which are not determined at compile-time is that you’re required to use dependent pairs. Thus, you must always carry the array’s length together with the array itself.
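For contrast, the closest that everyday C++ gets to readArrayFromFile is a plain struct that carries the length next to the data. The following is a rough sketch, not a dependent pair: SizedLines, readLines, and readLinesFromString are hypothetical names, and nothing in the type system ties n to lines. That link is only a convention, which the code can check at runtime at best.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A runtime "bundle" of a length and a vector of that many strings.
// Unlike (n : Nat ** Vec n String), the compiler does not know that
// n and lines are related.
struct SizedLines {
    std::size_t n;
    std::vector<std::string> lines;
};

// Read one string per line from any input stream (a file, a pipe, ...).
SizedLines readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    const std::size_t n = lines.size();
    SizedLines result{n, std::move(lines)};
    // The invariant lives in our heads, so the best we can do is check it.
    assert(result.n == result.lines.size());
    return result;
}

// Convenience wrapper so the sketch can be tried without a real file.
SizedLines readLinesFromString(const std::string& text) {
    std::istringstream in(text);
    return readLines(in);
}
```

In Idris, a mismatch between the two halves of the pair would be a type error; here it would be a bug that only the assertion can catch.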

The bottom line is this:

In true dependently typed languages, a type that depends on a value (like Vec in Idris) lists that value in its type. When this value is listed by referring to an identifier — like n in Vec n String above — this identifier has to be defined somewhere, too. This necessitates dependent pairs, in which the first element is used syntactically as the “definition point” of a type-level value. For example, in the following piece of code:
```
(n : Nat ** Vec n String)
```
The n : Nat part of the pair serves both to say that the first element is a natural number, and to introduce a variable n that refers to this number so that the second type (Vec n String) can refer to it.

A lot of the time, you end up carrying this extra value (bound to n above) with your type.
In more mainstream languages, things are even more restricted: dependently typed values are a compile-time property, and thus, cannot be used with runtime values like data read from a file, arguments passed in to a function, etc.

Hiding Runtime Values from the Type

Let’s try to think of ways to make things more convenient. First of all, as we saw, in Idris, it’s possible to use runtime values in types. Not only that, but Idris is a compiled language, so presumably we can compile dependently typed programs with runtime-enabled dependent types. The trick is to forget some information: turn a vector Vec n String into two values (the size of the vector and the vector itself), and forget – for the purposes of generating code – that they’re related. Whenever you pass in a Vec n String, you can compile that similarly to how you’d compile passing in a Nat and List String. Since the program has already been type checked, you can be assured that you don’t encounter cases when the size and the actual vector are mismatched, or anything else of that nature.

Additionally, you don’t always need the length of the vector at all. In a good chunk of Idris code, the size arguments are only used to ensure type correctness and rule out impossible cases; they are never accessed at runtime. As a result, you can erase the size of the vector altogether. In fact, Idris 2 leans on Quantitative Type Theory to make erasure easier.

At this point, one way or another, we’ve “entangled” the vector with a value representing its size:

When a vector of some (unknown, but fixed) length needs to be produced from a function, we use dependent pairs.
Even in other cases, when compiling, we end up treating a vector as a length value and the vector itself.

Generally speaking, a good language design practice is to hide extraneous complexity, and to remove as much boilerplate as possible. If the size value of a vector is always joined at the hip with the vector, can we avoid having to explicitly write it?

This is pretty much exactly what Chapel does. It allows explicitly writing the domain of an array as part of its type, but doesn’t require it. When you do write it (re-using my original snippet above):

var A: [0..9] int;

What you are really doing is creating a value (the range 0..9), and entangling it with the type of A. This is very similar to what a language like Idris would do under the hood to compile a Vec, though it’s not quite the same.

At the same time, you can write code that omits the bounds altogether:

proc processArray(A: [] int): int;
proc createArray(): [] int;

In all of these examples, there is an implicit runtime value (the bounds) that is associated with the array’s type. However, we are never forced to explicitly thread through or include a size. Where reasoning about them is not necessary, Chapel’s domains are hidden away. Chapel refers to the implicitly present value associated with an array type as its runtime type.

I hinted earlier that things are not quite the same in this representation as they are in my simplified model of Idris. In Idris, as I mentioned earlier, the values corresponding to vectors’ indices can be erased if they are not used. In Chapel, this is not the case — a domain always exists at runtime. At the surface level, this means that you may pay for more than what you use. However, domains enable a number of interesting patterns of array code. We’ll get to that in a moment; first, I want to address a question that may be on your mind:

At this point, this looks just like keeping a .length field as part of the array value. Most languages do this. What's the difference between this and Chapel's approach?

This is a fair question. The key difference is that the length exists even if an array does not. The following is valid Chapel code (re-using the defaultValue snippet above):

proc defaultValue(type argType) {
  var x: argType;
  return x;
}

proc doSomething() {
  type MyArray = [1..10] int;
  var A = defaultValue(MyArray);
}

Here, we created an array A with the right size (10 integer elements) without having another existing array as a reference. This might seem like a contrived example (I could’ve just as well written var A: [1..10] int), but the distinction is incredibly helpful for generic programming. Here’s a piece of code from the Chapel standard library, which implements a part of Chapel’s reduction support:


From ChapelReduce.chpl, around line 146

    inline proc identity {
      var x: chpl__sumType(eltType); return x;
    }

Identity elements are important when performing operations like sums and products, for many reasons. For one, they tell you what the sum (e.g.) should be when there are no elements at all. For another, they can be used as an initial value for an accumulator. In Chapel, when you are performing a reduction, there is a good chance you will need several accumulators — one for each thread performing a part of the reduction.

That identity function looks almost like defaultValue! Since it builds the identity element from the type, and since the type includes the array’s dimensions, summing an array-of-arrays, even if it’s empty, will produce the correct output.

type Coordinate = [1..3] real;

var Empty: [0..<0] Coordinate;
writeln(+ reduce Empty); // sum up an empty list of coordinates
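For comparison, here is a rough C++ analogue of this behaviour. The identity, sumAll, and demoEmptySum helpers are hypothetical, and this is not how Chapel implements reductions. Because std::array<double, 3> has its length in the type, a value-initialized accumulator is already a correctly sized zero coordinate, so summing an empty list still yields three zeros. A std::vector<double> built the same way would simply be empty, since its length is not part of its type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Coordinate = std::array<double, 3>;

// Build the identity element for sums purely from the type, in the
// spirit of defaultValue: value-initialization zeroes every element.
template <typename T>
T identity() {
    return T{};
}

// Element-wise sum of a (possibly empty) list of coordinates.
Coordinate sumAll(const std::vector<Coordinate>& coords) {
    Coordinate total = identity<Coordinate>();
    for (const Coordinate& c : coords) {
        for (std::size_t i = 0; i < total.size(); i++) {
            total[i] += c[i];
        }
    }
    return total;
}

void demoEmptySum() {
    // Summing nothing still gives a three-element zero coordinate.
    const Coordinate zero = sumAll({});
    assert(zero[0] == 0.0 && zero[1] == 0.0 && zero[2] == 0.0);
    // The same trick fails for std::vector: its identity has no elements.
    assert(identity<std::vector<double>>().empty());
}
```

The difference from Chapel is that the 3 here must be a compile-time constant, whereas the bounds of Coordinate in the Chapel snippet could just as well have been computed at runtime.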

As I mentioned before, having the domain be part of the type can also enable indexing optimizations — without any need for interprocedural analysis — in functions like sumElementwise:

proc sumElementwise(A: [?D] int, B: [D] int) {
  var C: [D] int;
  for idx in D do
    C[idx] = A[idx] + B[idx];
}

The C++ equivalent of this function – using vectors to enable arbitrary-size lists of numbers read from user input, and .at to enable bounds checks — does not include enough information for this optimization to be possible.

void sumElementwise(std::vector<int> A, std::vector<int> B) {
  std::vector<int> C(A.size());

  for (size_t i = 0; i < A.size(); i++) {
    C.at(i) = A.at(i) + B.at(i);
  }
}

All in all, this makes for a very interesting mix of features:

Chapel arrays have their bounds as part of types, like std::array in C++ and Vec in Idris. This enables all the benefits I’ve described above.
The bounds don’t have to be known at compile-time, like all dependent types in Idris. This means you can read arrays from files (e.g.) and still reason about their bounds as part of the type system.
Domain information can be hidden when it’s not used, and does not require explicit additional work like template parameters or dependent pairs.

Most curiously, runtime types only extend to arrays and domains. In that sense, they are not a general purpose replacement for dependent types. Rather, they make arrays and domains special, and single out the exact case my professor was talking about in the introduction. Although at times I’ve twisted Chapel’s type system in unconventional ways to simulate dependent types, rarely have I felt a need for them while programming in Chapel. In that sense — and in the “practical software engineering” domain — I may have been proven wrong.

Pitfalls of Runtime Types

Should all languages do things the way Chapel does? I don’t think so. Like most features, runtime types like those in Chapel are a language design tradeoff. Though I’ve covered their motivation and semantics, perhaps I should mention the downsides.

The greatest downside is that, generally speaking, types are not always a compile-time property. We saw this earlier with MyArray:

type MyArray = [1..10] int;

Here, the domain of MyArray (one-dimensional with bounds 1..10) is a runtime value. It has an execution-time cost. [note: The execution-time cost is, of course, modulo dead code elimination etc. If my snippet made up the entire program being compiled, the end result would likely do nothing, since MyArray isn't used anywhere. ] Moreover, types that serve as arguments to functions (like argType for defaultValue), or as their return values (like the result of chpl__sumType) also have an execution-time backing. This is quite different from most compiled languages. For instance, in C++, templates are “stamped out” when the program is compiled. A function with a typename T template parameter called with type int, in terms of generated code, is always the same as a function where you search-and-replaced T with int. This is called monomorphization, by the way. In Chapel, however, if the function is instantiated with an array type, it will have an additional parameter, which represents the runtime component of the array’s type.
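The contrast can be sketched in C++. This is an illustration only: Domain, defaultValue, and defaultArray are hypothetical names, and this is not the code the Chapel compiler generates. The template below is stamped out once per type and needs no extra arguments. To mimic a type with a runtime component, the second function has to receive that component (here, the bounds) as an ordinary runtime parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Monomorphization: defaultValue<int> compiles to the same code as a
// hand-written function with T replaced by int.
template <typename T>
T defaultValue() {
    return T{};
}

// A stand-in for the runtime part of a type like [1..10] int.
struct Domain {
    int low;
    int high;  // inclusive
};

// A "type" with a runtime component needs that component passed in
// as a value before a default array can be built.
std::vector<int> defaultArray(const Domain& dom) {
    assert(dom.low <= dom.high + 1);  // allow empty domains like 1..0
    const std::size_t length = static_cast<std::size_t>(dom.high - dom.low + 1);
    return std::vector<int>(length);  // zero-initialized elements
}
```

Calling defaultArray(Domain{1, 10}) plays the role of defaultValue(MyArray) above: the extra argument is exactly the part of the type that only exists at runtime.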

The fact that types are runtime entities means that compile-time type checking is insufficient. Take, for instance, the above sendEmails function:

proc sendEmails(numEmails: int, destinationAddrs: [1..numEmails] address) { /* ... */ }

Since numEmails is a runtime value (it’s a regular argument!), we can’t ensure at compile-time that a value of some array matches the [1..numEmails] address type. As a result, Chapel defers bounds checking to when the sendEmails function is invoked.

This leads to some interesting performance considerations. Take two Chapel records (similar to structs in C++) that simply wrap a value. In one of them, we provide an explicit type for the field, and in the other, we leave the field type generic.

record R1 { var field: [1..10] int; }
record R2 { var field; }

var A = [1,2,3,4,5,6,7,8,9,10];
var r1 = new R1(A);
var r2 = new R2(A);

In a conversation with a coworker, I learned that these are not the same. That’s because the record R1 explicitly specifies a type for field. Since the type has a runtime component, the constructor of R1 will actually perform a runtime check to ensure that the argument has 10 elements. R2 will not do this, since there isn’t any other type to check against.
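A loose C++ analogy may help. The types below are hypothetical, and the real check is generated by the Chapel compiler and may look nothing like this. A wrapper whose field has a fixed expected size must compare the incoming array against that size when it is constructed, while a fully generic wrapper just stores whatever it is given.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Like R1: the field's "type" promises exactly 10 elements, so the
// constructor must check the argument at runtime.
struct R1 {
    std::vector<int> field;
    explicit R1(std::vector<int> a) : field(std::move(a)) {
        if (field.size() != 10) {
            throw std::length_error("R1 expects exactly 10 elements");
        }
    }
};

// Like R2: the field is generic, so there is nothing to check against.
template <typename T>
struct R2 {
    T field;
    explicit R2(T a) : field(std::move(a)) {}
};

// Helper for demonstration: does constructing an R1 from a succeed?
bool fitsR1(const std::vector<int>& a) {
    try {
        R1 r(a);
        (void)r;  // only the constructor's check matters here
        return true;
    } catch (const std::length_error&) {
        return false;
    }
}

void demoRecords() {
    const std::vector<int> ten(10, 0);
    const std::vector<int> three(3, 0);
    assert(fitsR1(ten));
    assert(!fitsR1(three));  // rejected by the runtime check
    assert(R2<std::vector<int>>(three).field.size() == 3);  // stored as-is
}
```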

Of course, the mere existence of an additional runtime component is a performance consideration. To ensure that Chapel programs perform as well as possible, the Chapel standard library attempts to avoid using runtime components wherever possible. This leads to a distinction between a “static type” (known at compile-time) and a “dynamic type” (requiring a runtime value). The chpl__sumType function we saw mentioned above uses static components of types, because we don’t want each call to + reduce to attempt to run a number of extraneous runtime queries.

Conclusion

Though runtime types are not a silver bullet, I find them to be an elegant middle-ground solution to the problem of tracking array bounds. They enable optimizations, generic programming, and more, without the complexity of a fully dependently-typed language. They are also quite unlike anything I’ve seen in any other language.

What’s more, this post only scratches the surface of what’s possible using arrays and domains. Besides encoding array bounds, domains include information about how an array is distributed across several nodes (see the distributions primer), and how it’s stored in memory (see the sparse computations section of the recent 2.3 release announcement). In general, they are a very flavorful component to Chapel’s “special sauce” as a language for parallel computing.

You can read more about arrays and domains in the corresponding primer.

Implementing and Verifying "Static Program Analysis" in Agda, Part 9: Verifying the Forward Analysis

Wed, 25 Dec 2024 19:00:00 -0800

In the previous post, we put together a number of powerful pieces of machinery to construct a sign analysis. However, we still haven’t verified that this analysis produces correct results. For the most part, we already have the tools required to demonstrate correctness; the most important one is the validity of our CFGs relative to the semantics of the little language.

High-Level Algorithm

We’ll keep working with the sign lattice as an example, keeping in mind how what we do generalizes to any lattice $L$ describing a variable’s state. The general shape of our argument will be as follows, where I’ve numbered the assumptions or aspects that we have yet to provide.

Our fixed-point analysis from the previous section gave us a result $r$ that satisfies the following equation:
$$ r = \text{update}(\text{join}(r)) $$
Above, $\text{join}$ applies the predecessor-combining function from the previous post to each state (corresponding to joinAll in Agda), and $\text{update}$ performs one round of abstract interpretation.
Because of the correspondence of our semantics and CFGs, each program evaluation in the form $\rho, s \Rightarrow \rho'$ corresponds to a path through the Control Flow Graph. Along the path, each node contains simple statements, which correspond to intermediate steps in evaluating the program. These will also be in the form $\rho_1, b \Rightarrow \rho_2$.
We will proceed iteratively, stepping through the trace one basic block at a time. At each node in the graph:
- We will assume that the beginning state (the variables in $\rho_1$) is correctly described (1) by one of the predecessors of the current node. Since joining represents "or" (2), that is the same as saying that $\text{join}(r)$ contains an accurate description of $\rho_1$.
- Because the abstract interpretation function preserves accurate descriptions (3), if $\text{join}(r)$ contains an accurate description of $\rho_1$, then applying our abstract interpretation function via $\text{update}$ should result in a map that contains an accurate description of $\rho_2$. In other words, $\text{update}(\text{join}(r))$ describes $\rho_2$ at the current block. By the equation above (4), that’s the same as saying $r$ describes $\rho_2$ at the current block.
- Since the trace is a path through a graph, there must be an edge from the current basic block to the next. This means that the current basic block is a predecessor of the next one. From the previous point, we know that $\rho_2$ is accurately described by this predecessor, fulfilling our earlier assumption and allowing us to continue iteration.

So, what are the missing pieces?

1. We need to define what it means for a lattice (like our sign lattice) to “correctly describe” what happens when evaluating a program for real. For example, the $+$ in sign analysis describes values that are bigger than zero, and a map like {x:+} states that x can only take on positive values.
2. We’ve seen before that the $(\sqcup)$ operator models disjunction (“A or B”), but that was only an informal observation; we’ll need to specify it precisely.
3. Each analysis provides an abstract interpretation eval function. However, until now, nothing has formally constrained this function; we could return $+$ in every case, even though that would not be accurate. We will need, for each analysis, a proof that its eval preserves accurate descriptions.
4. The equalities between our lattice elements are actually equivalences, which helps us use simpler representations of data structures. Thus, even in statements of the fixed point algorithm, our final result is a value $a$ such that $a \approx f(a)$. We need to prove that our notion of equivalent lattice elements plays nicely with correctness.

Let’s start with the first bullet point.

A Formal Definition of Correctness

When a variable is mapped to a particular sign (like { "x": + }), what that really says is that the value of x is greater than zero. Recalling from the post about our language’s semantics that we use the symbol $\rho$ to represent mappings of variables to their values, we might write this claim as:

$$ \rho(\texttt{x}) > 0 $$

This is a good start, but it’s a little awkward defining the meaning of “plus” by referring to the context in which it’s used (the { "x": ... } portion of the expression above). Instead, let’s associate with each sign (like $+$) a predicate: a function that takes a value, and makes a claim about that value (“this is positive”):

$$ \llbracket + \rrbracket\ v = v > 0 $$

The notation above is a little weird unless you, like me, have a background in Programming Language Theory (❤️). This comes from denotational semantics; generally, one writes:

$$ \llbracket \text{thing} \rrbracket = \text{the meaning of the thing} $$

Where $\llbracket \cdot \rrbracket$ is really a function (we call it the semantic function) that maps things to their meaning. Then, the above equation is similar to the more familiar $f(x) = x+1$: function and arguments on the left, definition on the right. When the “meaning of the thing” is itself a function, we could write it explicitly using lambda-notation:

$$ \llbracket \text{thing} \rrbracket = \lambda x.\ \text{body of the function} $$

Or, we could use the Haskell style and write the new variable on the left of the equality:

$$ \llbracket \text{thing} \rrbracket\ x = \text{body of the function} $$

That is precisely what I’m doing above with $\llbracket + \rrbracket$. With this in mind, we could define the entire semantic function for the sign lattice as follows:

$$ \llbracket + \rrbracket\ v = v\ \texttt{>}\ 0 \\ \llbracket 0 \rrbracket\ v = v\ \texttt{=}\ 0 \\ \llbracket - \rrbracket\ v = v\ \texttt{<}\ 0 \\ \llbracket \top \rrbracket\ v = \text{true} \\ \llbracket \bot \rrbracket\ v = \text{false} $$

In Agda, the integer type already distinguishes between the “negative natural” and “positive natural” cases, which made it possible to define the semantic function without using inequalities. [note: Reasoning about inequalities is painful, sometimes requiring a number of lemmas to arrive at a result that is intuitively obvious. Coq has a powerful tactic called lia that automatically solves systems of inequalities, and I use it liberally. However, lacking such a tactic in Agda, I would like to avoid inequalities if they are not needed. ]

From Sign.agda, lines 114 through 119

⟦_⟧ᵍ : SignLattice → Value → Set
⟦_⟧ᵍ ⊥ᵍ _ = ⊥
⟦_⟧ᵍ ⊤ᵍ _ = ⊤
⟦_⟧ᵍ [ + ]ᵍ v = Σ ℕ (λ n → v ≡ ↑ᶻ (+_ (suc n)))
⟦_⟧ᵍ [ 0ˢ ]ᵍ v = v ≡ ↑ᶻ (+_ zero)
⟦_⟧ᵍ [ - ]ᵍ v = Σ ℕ (λ n → v ≡ ↑ᶻ -[1+ n ])

Notably, $\llbracket \top \rrbracket\ v$ always holds, and $\llbracket \bot \rrbracket\ v$ never does. In general, we will always need to define a semantic function for whatever lattice we are choosing for our analysis.
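As a plain executable counterpart to the equations, the semantic function can be sketched as an ordinary predicate. This is a simplification under my own assumptions: the names Sign and describes are invented, values are plain int, and the function returns a boolean rather than a proposition, so it can be tested but proves nothing.

```cpp
#include <cassert>

// The five elements of the sign lattice.
enum class Sign { Bottom, Minus, Zero, Plus, Top };

// The semantic function: does sign s accurately describe value v?
bool describes(Sign s, int v) {
    switch (s) {
        case Sign::Plus:   return v > 0;
        case Sign::Zero:   return v == 0;
        case Sign::Minus:  return v < 0;
        case Sign::Top:    return true;   // holds for every value
        case Sign::Bottom: return false;  // holds for no value
    }
    return false;  // unreachable for valid Sign values
}

void demoDescribes() {
    assert(describes(Sign::Plus, 7));
    assert(!describes(Sign::Plus, 0));
    assert(describes(Sign::Top, -3));
    assert(!describes(Sign::Bottom, 0));
}
```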

It’s important to remember from the previous post that the sign lattice (or, more generally, our lattice $L$) is only a component of the lattice we use to instantiate the analysis. We at least need to define what it means for the $\text{Variable} \to \text{Sign}$ portion of that lattice to be correct. This way, we’ll have correctness criteria for each key (CFG node) in the top-level $\text{Info}$ lattice. Since a map from variables to their sign characterizes not a single value $v$ but a whole environment $\rho$, something like this is a good start:

$$ \llbracket \texttt{\{} x_1: s_1, ..., x_n: s_n \texttt{\}} \rrbracket\ \rho = \llbracket s_1 \rrbracket\ \rho(x_1)\ \text{and}\ ...\ \text{and}\ \llbracket s_n \rrbracket\ \rho(x_n) $$

As a concrete example, we might get:

$$ \llbracket \texttt{\{} \texttt{x}: +, \texttt{y}: - \texttt{\}} \rrbracket\ \rho = \rho(\texttt{x})\ \texttt{>}\ 0\ \text{and}\ \rho(\texttt{y})\ \texttt{<}\ 0 $$

This is pretty good, but not quite right. For instance, the initial state of the program — before running the analysis — assigns $\bot$ to each element. This is true because our fixed-point algorithm starts with the least element of the lattice. But even for a single-variable map {x: ⊥ }, the semantic function above would give:

$$ \llbracket \texttt{\{} \texttt{x}: \bot \texttt{\}} \rrbracket\ \rho = \text{false} $$

That’s clearly not right: our initial state should be possible, lest the entire proof be just a convoluted ex falso!

There is another tricky aspect of our analysis, which is primarily defined using the join ($\sqcup$) operator. Observe the following example:

// initial state: { x: ⊥ }
if b {
  x = 1; // state: { x: + }
} else {
  // state unchanged: { x: ⊥ }
}
// state: { x: + } ⊔ { x: ⊥ } = { x: + }

Notice that in the final state, the sign of x is +, even though when b is false, the variable is never set. In a simple language like ours, without variable declaration points, this is probably the best we could hope for. The crucial observation, though, is that the oddness only comes into play with variables that are not set. In the “initial state” case, none of the variables have been modified; in the else case of the conditional, x was never assigned to. We can thus relax our condition to an if-then: if a variable is in our environment $\rho$, then the variable-sign lattice’s interpretation accurately describes it.

$$ \begin{array}{ccc} \llbracket \texttt{\{} x_1: s_1, ..., x_n: s_n \texttt{\}} \rrbracket\ \rho & = & & \textbf{if}\ x_1 \in \rho\ \textbf{then}\ \llbracket s_1 \rrbracket\ \rho(x_1)\ \\ & & \text{and} & ... \\ & & \text{and} & \textbf{if}\ x_n \in \rho\ \textbf{then}\ \llbracket s_n \rrbracket\ \rho(x_n) \end{array} $$

The first “weird” case now results in the following:

$$ \llbracket \texttt{\{} \texttt{x}: \bot \texttt{\}} \rrbracket\ \rho = \textbf{if}\ \texttt{x} \in \rho\ \textbf{then}\ \text{false} $$

Which is just another way of saying:

$$ \llbracket \texttt{\{} \texttt{x}: \bot \texttt{\}} \rrbracket\ \rho = \texttt{x} \notin \rho $$

In the second case, the interpretation also results in a true statement:

$$ \llbracket \texttt{\{} \texttt{x}: + \texttt{\}} \rrbracket\ \rho = \textbf{if}\ \texttt{x} \in \rho\ \textbf{then}\ \rho(\texttt{x})\ \texttt{>}\ 0 $$

In Agda, I encode the fact that a verified analysis needs a semantic function $\llbracket\cdot\rrbracket$ for its element lattice $L$ by taking such a function as an argument called ⟦_⟧ˡ:

From Forward.agda, lines 246 through 253

module WithInterpretation (latticeInterpretationˡ : LatticeInterpretation isLatticeˡ) where
    open LatticeInterpretation latticeInterpretationˡ
        using ()
        renaming
            ( ⟦_⟧ to ⟦_⟧ˡ
            ; ⟦⟧-respects-≈ to ⟦⟧ˡ-respects-≈ˡ
            ; ⟦⟧-⊔-∨ to ⟦⟧ˡ-⊔ˡ-∨
            )

I then define the semantic function for the variable-sign lattice in the following way, which eschews the “…” notation in favor of a more Agda-compatible (and equivalent) form:

From Forward.agda, lines 255 through 256

⟦_⟧ᵛ : VariableValues → Env → Set
⟦_⟧ᵛ vs ρ = ∀ {k l} → (k , l) ∈ᵛ vs → ∀ {v} → (k , v) Language.∈ ρ → ⟦ l ⟧ˡ v

The above reads roughly as follows:

For every variable k and sign [or, more generally, lattice element] l in the variable map lattice, if k is in the environment ρ, then it satisfies the predicate given by the semantic function applied to l.
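The same reading can be sketched as a boolean check in C++. This is only an approximation of the Agda definition: VariableSigns, Env, and describesEnv are hypothetical names, values are plain int, and the result is a runtime answer rather than a proof. The important part is the "if the variable is in the environment" guard, which makes the all-⊥ initial state acceptable for an empty environment.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Sign { Bottom, Minus, Zero, Plus, Top };

// Semantic function for a single sign.
bool describes(Sign s, int v) {
    switch (s) {
        case Sign::Plus:   return v > 0;
        case Sign::Zero:   return v == 0;
        case Sign::Minus:  return v < 0;
        case Sign::Top:    return true;
        case Sign::Bottom: return false;
    }
    return false;  // unreachable for valid Sign values
}

using VariableSigns = std::map<std::string, Sign>;
using Env = std::map<std::string, int>;

// For every (k, l) in vs: if k has a value v in rho, then l must
// describe v. Variables missing from rho impose no constraint.
bool describesEnv(const VariableSigns& vs, const Env& rho) {
    for (const auto& entry : vs) {
        const auto found = rho.find(entry.first);
        if (found != rho.end() && !describes(entry.second, found->second)) {
            return false;
        }
    }
    return true;
}

void demoDescribesEnv() {
    const VariableSigns initial{{"x", Sign::Bottom}};
    assert(describesEnv(initial, Env{}));           // x is not set yet
    assert(!describesEnv(initial, Env{{"x", 1}}));  // bottom describes nothing
}
```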

Let’s recap: we have defined a semantic function for our sign lattice, and noted that to define a verified analysis, we always need such a semantic function. We then showed how to construct a semantic function for a whole variable map (of type $\text{Variable} \to \text{Sign}$, or $\text{Variable}\to L$ in general). We also wrote some Agda code doing all this. As a result, we have filled in the missing piece for property 1.

However, the way that we brought in the semantic function in the Agda code above hints that there’s more to be discussed. What’s latticeInterpretationˡ? In answering that question, we’ll provide evidence for property 2 and property 4.

Properties of the Semantic Function

As we briefly saw earlier, we loosened the notion of equality to that of equivalences, which made it possible to ignore things like the ordering of key-value pairs in maps. That’s great and all, but nothing is stopping us from defining semantic functions that violate our equivalence! Supposing $a \approx f(a)$, as far as Agda is concerned, even though $a$ and $f(a)$ are “equivalent”, $\llbracket a \rrbracket$ and $\llbracket f(a) \rrbracket$ may be totally different. For a semantic function to be correct, it must produce the same predicate for equivalent elements of lattice $L$. That’s missing piece 4.

Another property of semantic functions that we will need to formalize is that $(\sqcup)$ represents disjunction. This comes into play when we reason about the correctness of predecessors in a Control Flow Graph. Recall that during the last step of processing a given node, when we are trying to move on to the next node in the trace, we have knowledge that the current node’s variable map accurately describes the intermediate environment. In other words, $\llbracket \textit{vs}_i \rrbracket\ \rho_2$ holds, where $\textit{vs}_i$ is the variable map for the current node. We can generalize this knowledge a little, and get:

$$ \llbracket \textit{vs}_1 \rrbracket\ \rho_2\ \text{or}\ ...\ \text{or}\ \llbracket \textit{vs}_n \rrbracket\ \rho_2 $$

However, the assumption that we need to hold when moving on to a new node is in terms of $\textit{JOIN}$, which combines all the predecessors’ maps $\textit{vs}_1, ..., \textit{vs}_n$ using $(\sqcup)$. Thus, we will need to be in a world where the following claim is true:

$$ \llbracket \textit{vs}_1 \sqcup ... \sqcup \textit{vs}_n \rrbracket\ \rho_2 $$

To get from one to the other, we will need to rely explicitly on the fact that $(\sqcup)$ encodes “or”. It’s not necessary for the forward analysis, but a similar property ought to hold for $(\sqcap)$ and “and”. This constraint provides missing piece 2.

I defined a new data type that bundles a semantic function with proofs of the properties in this section; that’s precisely what latticeInterpretationˡ is:

From Semantics.agda, lines 66 through 73

record LatticeInterpretation {l} {L : Set l} {_≈_ : L → L → Set l}
                             {_⊔_ : L → L → L} {_⊓_ : L → L → L}
                             (isLattice : IsLattice L _≈_ _⊔_ _⊓_) : Set (lsuc l) where
    field
        ⟦_⟧ : L → Value → Set
        ⟦⟧-respects-≈ : ∀ {l₁ l₂ : L} → l₁ ≈ l₂ → ⟦ l₁ ⟧ ⇒ ⟦ l₂ ⟧
        ⟦⟧-⊔-∨ : ∀ {l₁ l₂ : L} → (⟦ l₁ ⟧ ∨ ⟦ l₂ ⟧) ⇒ ⟦ l₁ ⊔ l₂ ⟧
        ⟦⟧-⊓-∧ : ∀ {l₁ l₂ : L} → (⟦ l₁ ⟧ ∧ ⟦ l₂ ⟧) ⇒ ⟦ l₁ ⊓ l₂ ⟧

In short, to leverage the framework for verified analysis, you would need to provide a semantic function that interacts properly with ≈ and ∨.
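To get a feel for the ⟦⟧-⊔-∨ field, here is a brute-force C++ check of the same property for signs. It is a sketch under assumptions: join is my reconstruction of the usual flat sign lattice (distinct non-⊥ signs join to ⊤), the names are invented, and testing a finite range of integers is of course not a proof the way the Agda record field is.

```cpp
#include <cassert>

enum class Sign { Bottom, Minus, Zero, Plus, Top };

// Semantic function for a single sign.
bool describes(Sign s, int v) {
    switch (s) {
        case Sign::Plus:   return v > 0;
        case Sign::Zero:   return v == 0;
        case Sign::Minus:  return v < 0;
        case Sign::Top:    return true;
        case Sign::Bottom: return false;
    }
    return false;  // unreachable for valid Sign values
}

// Least upper bound, assuming a flat lattice: bottom is the identity,
// equal signs stay put, and any two distinct signs go to top.
Sign join(Sign a, Sign b) {
    if (a == Sign::Bottom) return b;
    if (b == Sign::Bottom) return a;
    if (a == b) return a;
    return Sign::Top;
}

// Check that (a describes v) or (b describes v) implies that
// join(a, b) describes v, for all signs and all v in [-bound, bound].
bool joinModelsOr(int bound) {
    const Sign all[] = {Sign::Bottom, Sign::Minus, Sign::Zero,
                        Sign::Plus, Sign::Top};
    for (Sign a : all) {
        for (Sign b : all) {
            for (int v = -bound; v <= bound; v++) {
                const bool either = describes(a, v) || describes(b, v);
                if (either && !describes(join(a, b), v)) {
                    return false;
                }
            }
        }
    }
    return true;
}

void demoJoin() {
    assert(join(Sign::Plus, Sign::Bottom) == Sign::Plus);
    assert(joinModelsOr(5));
}
```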

Correctness of the Evaluator

All that’s left is the last missing piece, 3, which requires that eval matches the semantics of our language. Recall the signature of eval:

From Forward.agda, line 166

module WithEvaluator (eval : Expr → VariableValues → L)

It operates on expressions and variable maps, which themselves associate a sign (or, generally, an element of lattice $L$), with each variable. The “real” evaluation judgement, on the other hand, is in the form $\rho, e \Downarrow v$, and reads “expression $e$ in environment $\rho$ evaluates to value $v$”. In Agda:

From Semantics.agda, line 27

data _,_⇒ᵉ_ : Env → Expr → Value → Set where

Let’s line up the types of eval and the judgement. I’ll swap the order of arguments for eval to make the correspondence easier to see:

$$ \begin{array}{ccccccccc} \text{eval} & : & (\text{Variable} \to \text{Sign}) & \to & \text{Expr} & \to & \text{Sign} \\ \cdot,\cdot\Downarrow\cdot & : & (\text{Variable} \to \text{Value}) & \to & \text{Expr} & \to & \text{Value} & \to & \text{Set} \\ & & \underbrace{\phantom{(\text{Variable} \to \text{Value})}}_{\text{environment-like inputs}} & & & & \underbrace{\phantom{Value}}_{\text{value-like outputs}} \end{array} $$

Squinting a little, it’s almost like the signature of eval is the signature for the evaluation judgement, but it forgets a few details (the exact values of the variables) in favor of abstractions (their signs). To show that eval behaves correctly, we’ll want to prove that this forgetful correspondence holds.

Concretely, for any expression $e$, take some environment $\rho$, and “forget” the exact values, getting a sign map $\textit{vs}$. Now, evaluate the expression to some value $v$ using the semantics, and also, compute the expression’s expected sign $s$ using eval. The sign should be the same as forgetting $v$’s exact value. Mathematically,

$$ \forall e, \rho, v, \textit{vs}.\ \textbf{if}\ \llbracket\textit{vs}\rrbracket \rho\ \text{and}\ \rho, e \Downarrow v\ \textbf{then}\ \llbracket \text{eval}\ \textit{vs}\ e\rrbracket v $$
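For intuition, the claim can be tested by brute force on a single kind of expression, the sum of two variables. The C++ below is a sketch with invented names: plusSign is reconstructed from the cases of the plus-valid proof shown further down rather than copied from the Agda source, and checking a finite range of integers only tests the property, it does not prove it.

```cpp
#include <cassert>

enum class Sign { Bottom, Minus, Zero, Plus, Top };

// Semantic function for a single sign.
bool describes(Sign s, int v) {
    switch (s) {
        case Sign::Plus:   return v > 0;
        case Sign::Zero:   return v == 0;
        case Sign::Minus:  return v < 0;
        case Sign::Top:    return true;
        case Sign::Bottom: return false;
    }
    return false;  // unreachable for valid Sign values
}

// Abstract addition on signs (an assumed reconstruction).
Sign plusSign(Sign a, Sign b) {
    if (a == Sign::Bottom || b == Sign::Bottom) return Sign::Bottom;
    if (a == Sign::Top || b == Sign::Top) return Sign::Top;
    if (a == Sign::Zero) return b;  // 0 + s = s
    if (b == Sign::Zero) return a;  // s + 0 = s
    if (a == b) return a;           // (+) + (+) = (+), (-) + (-) = (-)
    return Sign::Top;               // (+) + (-) could be anything
}

// If a describes x and b describes y, then plusSign(a, b) must
// describe x + y. Checked for all signs and all x, y in [-bound, bound].
bool plusIsSound(int bound) {
    const Sign all[] = {Sign::Bottom, Sign::Minus, Sign::Zero,
                        Sign::Plus, Sign::Top};
    for (Sign a : all) {
        for (Sign b : all) {
            for (int x = -bound; x <= bound; x++) {
                for (int y = -bound; y <= bound; y++) {
                    if (describes(a, x) && describes(b, y) &&
                        !describes(plusSign(a, b), x + y)) {
                        return false;
                    }
                }
            }
        }
    }
    return true;
}

void demoPlus() {
    assert(plusSign(Sign::Plus, Sign::Zero) == Sign::Plus);
    assert(plusIsSound(6));
}
```

An unsound plusSign, such as one returning + for every input, would make plusIsSound return false; that is exactly the kind of mistake missing piece 3 is meant to rule out.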

In Agda:

From Forward.agda, lines 286 through 287

InterpretationValid : Set
InterpretationValid = ∀ {vs ρ e v} → ρ , e ⇒ᵉ v → ⟦ vs ⟧ᵛ ρ → ⟦ eval e vs ⟧ˡ v

For a concrete analysis, we need to prove the above claim. In the case of sign analysis, this boils down to a rather cumbersome proof by cases. The proofs are long, so feel free to skim them.

Proof of correctness for plus:

From Sign.agda, lines 237 through 258

plus-valid : ∀ {g₁ g₂} {z₁ z₂} → ⟦ g₁ ⟧ᵍ (↑ᶻ z₁) → ⟦ g₂ ⟧ᵍ (↑ᶻ z₂) → ⟦ plus g₁ g₂ ⟧ᵍ (↑ᶻ (z₁ Int.+ z₂))
plus-valid {⊥ᵍ} {_} ⊥ _ = ⊥
plus-valid {[ + ]ᵍ} {⊥ᵍ} _ ⊥ = ⊥
plus-valid {[ - ]ᵍ} {⊥ᵍ} _ ⊥ = ⊥
plus-valid {[ 0ˢ ]ᵍ} {⊥ᵍ} _ ⊥ = ⊥
plus-valid {⊤ᵍ} {⊥ᵍ} _ ⊥ = ⊥
plus-valid {⊤ᵍ} {[ + ]ᵍ} _ _ = tt
plus-valid {⊤ᵍ} {[ - ]ᵍ} _ _ = tt
plus-valid {⊤ᵍ} {[ 0ˢ ]ᵍ} _ _ = tt
plus-valid {⊤ᵍ} {⊤ᵍ} _ _ = tt
plus-valid {[ + ]ᵍ} {[ + ]ᵍ} (n₁ , refl) (n₂ , refl) = (_ , refl)
plus-valid {[ + ]ᵍ} {[ - ]ᵍ} _ _ = tt
plus-valid {[ + ]ᵍ} {[ 0ˢ ]ᵍ} (n₁ , refl) refl = (_ , refl)
plus-valid {[ + ]ᵍ} {⊤ᵍ} _ _ = tt
plus-valid {[ - ]ᵍ} {[ + ]ᵍ} _ _ = tt
plus-valid {[ - ]ᵍ} {[ - ]ᵍ} (n₁ , refl) (n₂ , refl) = (_ , refl)
plus-valid {[ - ]ᵍ} {[ 0ˢ ]ᵍ} (n₁ , refl) refl = (_ , refl)
plus-valid {[ - ]ᵍ} {⊤ᵍ} _ _ = tt
plus-valid {[ 0ˢ ]ᵍ} {[ + ]ᵍ} refl (n₂ , refl) = (_ , refl)
plus-valid {[ 0ˢ ]ᵍ} {[ - ]ᵍ} refl (n₂ , refl) = (_ , refl)
plus-valid {[ 0ˢ ]ᵍ} {[ 0ˢ ]ᵍ} refl refl = refl
plus-valid {[ 0ˢ ]ᵍ} {⊤ᵍ} _ _ = tt

Proof of correctness for minus:

From Sign.agda, lines 261 through 282

minus-valid : ∀ {g₁ g₂} {z₁ z₂} → ⟦ g₁ ⟧ᵍ (↑ᶻ z₁) → ⟦ g₂ ⟧ᵍ (↑ᶻ z₂) → ⟦ minus g₁ g₂ ⟧ᵍ (↑ᶻ (z₁ Int.- z₂))
minus-valid {⊥ᵍ} {_} ⊥ _ = ⊥
minus-valid {[ + ]ᵍ} {⊥ᵍ} _ ⊥ = ⊥
minus-valid {[ - ]ᵍ} {⊥ᵍ} _ ⊥ = ⊥
minus-valid {[ 0ˢ ]ᵍ} {⊥ᵍ} _ ⊥ = ⊥
minus-valid {⊤ᵍ} {⊥ᵍ} _ ⊥ = ⊥
minus-valid {⊤ᵍ} {[ + ]ᵍ} _ _ = tt
minus-valid {⊤ᵍ} {[ - ]ᵍ} _ _ = tt
minus-valid {⊤ᵍ} {[ 0ˢ ]ᵍ} _ _ = tt
minus-valid {⊤ᵍ} {⊤ᵍ} _ _ = tt
minus-valid {[ + ]ᵍ} {[ + ]ᵍ} _ _ = tt
minus-valid {[ + ]ᵍ} {[ - ]ᵍ} (n₁ , refl) (n₂ , refl) = (_ , refl)
minus-valid {[ + ]ᵍ} {[ 0ˢ ]ᵍ} (n₁ , refl) refl = (_ , refl)
minus-valid {[ + ]ᵍ} {⊤ᵍ} _ _ = tt
minus-valid {[ - ]ᵍ} {[ + ]ᵍ} (n₁ , refl) (n₂ , refl) = (_ , refl)
minus-valid {[ - ]ᵍ} {[ - ]ᵍ} _ _ = tt
minus-valid {[ - ]ᵍ} {[ 0ˢ ]ᵍ} (n₁ , refl) refl = (_ , refl)
minus-valid {[ - ]ᵍ} {⊤ᵍ} _ _ = tt
minus-valid {[ 0ˢ ]ᵍ} {[ + ]ᵍ} refl (n₂ , refl) = (_ , refl)
minus-valid {[ 0ˢ ]ᵍ} {[ - ]ᵍ} refl (n₂ , refl) = (_ , refl)
minus-valid {[ 0ˢ ]ᵍ} {[ 0ˢ ]ᵍ} refl refl = refl
minus-valid {[ 0ˢ ]ᵍ} {⊤ᵍ} _ _ = tt

From Sign.agda, lines 284 through 294

eval-Valid : InterpretationValid
eval-Valid (⇒ᵉ-+ ρ e₁ e₂ z₁ z₂ ρ,e₁⇒z₁ ρ,e₂⇒z₂) ⟦vs⟧ρ =
    plus-valid (eval-Valid ρ,e₁⇒z₁ ⟦vs⟧ρ) (eval-Valid ρ,e₂⇒z₂ ⟦vs⟧ρ)
eval-Valid (⇒ᵉ-- ρ e₁ e₂ z₁ z₂ ρ,e₁⇒z₁ ρ,e₂⇒z₂) ⟦vs⟧ρ =
    minus-valid (eval-Valid ρ,e₁⇒z₁ ⟦vs⟧ρ) (eval-Valid ρ,e₂⇒z₂ ⟦vs⟧ρ)
eval-Valid {vs} (⇒ᵉ-Var ρ x v x,v∈ρ) ⟦vs⟧ρ
    with ∈k-decᵛ x (proj₁ (proj₁ vs))
...   | yes x∈kvs = ⟦vs⟧ρ (proj₂ (locateᵛ {x} {vs} x∈kvs)) x,v∈ρ
...   | no x∉kvs = tt
eval-Valid (⇒ᵉ-ℕ ρ 0) _ = refl
eval-Valid (⇒ᵉ-ℕ ρ (suc n')) _ = (n' , refl)
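
The case analysis above can also be spot-checked by brute force outside of Agda. Below is a rough Python sketch, not part of the Agda development. The string encoding of signs and the names describes, plus, minus, negate, and is_valid are assumptions made for illustration, and minus is derived from plus by flipping the sign of its second argument instead of being written out as a table. It tests the same property as plus-valid and minus-valid, but only over a small range of integers, so it is a sanity check and not a proof.

```python
from itertools import product

# Hypothetical encoding of the five sign-lattice elements as strings.
BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "top"
SIGNS = [BOT, NEG, ZERO, POS, TOP]


def describes(sign, z):
    """Interpretation of a sign as a predicate on a concrete integer z."""
    if sign == BOT:
        return False          # bottom describes no value at all
    if sign == TOP:
        return True           # top describes every value
    return {NEG: z < 0, ZERO: z == 0, POS: z > 0}[sign]


def plus(s1, s2):
    """Abstract addition on signs."""
    if BOT in (s1, s2):
        return BOT
    if TOP in (s1, s2):
        return TOP
    if s1 == ZERO:
        return s2
    if s2 == ZERO:
        return s1
    return s1 if s1 == s2 else TOP   # (+)+(+) = +, (-)+(-) = -, mixed = top


def negate(s):
    """Flip + and -, leaving bot, 0, and top unchanged."""
    return {NEG: POS, POS: NEG}.get(s, s)


def minus(s1, s2):
    """Abstract subtraction: a - b behaves like a + (-b) on signs."""
    return plus(s1, negate(s2))


def is_valid(abstract_op, concrete_op, values=range(-3, 4)):
    """If s1 describes z1 and s2 describes z2, the abstract result must
    describe the concrete result."""
    return all(
        describes(abstract_op(s1, s2), concrete_op(z1, z2))
        for s1, s2 in product(SIGNS, SIGNS)
        for z1, z2 in product(values, values)
        if describes(s1, z1) and describes(s2, z2)
    )


print(is_valid(plus, lambda a, b: a + b))
print(is_valid(minus, lambda a, b: a - b))
```

Both checks succeed for this table. An unsound operator, such as one that always answers +, fails the same check, which is the kind of mistake the Agda proof rules out for all integers at once.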

This completes our last missing piece, 3. All that’s left is to put everything together.

Proving The Analysis Correct

Lifting Expression Evaluation Correctness to Statements

The individual analyses (like the sign analysis) provide only an evaluation function for expressions, and thus only have to prove correctness of that function. However, our language is made up of statements, with judgements in the form $\rho, s \Rightarrow \rho'$. Now that we’ve shown (or assumed) that eval behaves correctly when evaluating expressions, we should show that this correctness extends to evaluating statements, which in the forward analysis implementation is handled by the updateVariablesFromStmt function.

The property we need to show looks very similar to the property for eval:

$$ \forall b, \rho, \rho', \textit{vs}.\ \textbf{if}\ \llbracket\textit{vs}\rrbracket \rho\ \text{and}\ \rho, b \Rightarrow \rho'\ \textbf{then}\ \llbracket \text{updateVariablesFromStmt}\ \textit{vs}\ b\rrbracket \rho' $$

In Agda:

From Forward.agda, line 291

updateVariablesFromStmt-matches : ∀ {bs vs ρ₁ ρ₂} → ρ₁ , bs ⇒ᵇ ρ₂ → ⟦ vs ⟧ᵛ ρ₁ → ⟦ updateVariablesFromStmt bs vs ⟧ᵛ ρ₂

The proof is straightforward, and relies on the semantics of the map update. Specifically, in the case of an assignment statement $x \leftarrow e$, all we do is store the new sign computed from $e$ into the map at $x$. To prove the correctness of the entire final environment $\rho'$, there are two cases to consider:

A variable in question is the newly-updated $x$. In this case, since eval produces correct signs, the variable clearly has the correct sign. This is the first highlighted chunk in the below code.
A variable in question is different from $x$. In this case, its value in the environment $\rho'$ should be the same as it was prior, and its sign in the updated variable map is the same as it was in the original. Since the original map correctly described the original environment, we know the sign is correct. This is the second highlighted chunk in the below code.

The corresponding Agda proof is as follows:

From Forward.agda, lines 291 through 305

updateVariablesFromStmt-matches : ∀ {bs vs ρ₁ ρ₂} → ρ₁ , bs ⇒ᵇ ρ₂ → ⟦ vs ⟧ᵛ ρ₁ → ⟦ updateVariablesFromStmt bs vs ⟧ᵛ ρ₂
updateVariablesFromStmt-matches {_} {vs} {ρ₁} {ρ₁} (⇒ᵇ-noop ρ₁) ⟦vs⟧ρ₁ = ⟦vs⟧ρ₁
updateVariablesFromStmt-matches {_} {vs} {ρ₁} {_} (⇒ᵇ-← ρ₁ k e v ρ,e⇒v) ⟦vs⟧ρ₁ {k'} {l} k',l∈vs' {v'} k',v'∈ρ₂
    with k ≟ˢ k' | k',v'∈ρ₂
...   | yes refl | here _ v _
        rewrite updateVariablesFromExpression-k∈ks-≡ k e {l = vs} (Any.here refl) k',l∈vs' =
        interpretationValidˡ ρ,e⇒v ⟦vs⟧ρ₁
...   | yes k≡k' | there _ _ _ _ _ k'≢k _ = ⊥-elim (k'≢k (sym k≡k'))
...   | no k≢k' | here _ _ _ = ⊥-elim (k≢k' refl)
...   | no k≢k'  | there _ _ _ _ _ _ k',v'∈ρ₁ =
        let
            k'∉[k] = (λ { (Any.here refl) → k≢k' refl })
            k',l∈vs = updateVariablesFromExpression-k∉ks-backward k e {l = vs} k'∉[k] k',l∈vs'
        in
            ⟦vs⟧ρ₁ k',l∈vs k',v'∈ρ₁

From this, it follows with relative ease that each basic block in the lattice, when evaluated, produces an environment that matches the prediction of our forward analysis.

From Forward.agda, line 318

updateAll-matches : ∀ {s sv ρ₁ ρ₂} → ρ₁ , (code s) ⇒ᵇˢ ρ₂ → ⟦ variablesAt s sv ⟧ᵛ ρ₁ → ⟦ variablesAt s (updateAll sv) ⟧ᵛ ρ₂

Walking the Trace

Finally, we get to the meat of the proof, which follows the outline. First, let’s take a look at stepTrace, which implements the second bullet in our iterative procedure. I’ll show the code, then we can discuss it in detail.

From Forward.agda, lines 324 through 342

stepTrace : ∀ {s₁ ρ₁ ρ₂} → ⟦ joinForKey s₁ result ⟧ᵛ ρ₁ → ρ₁ , (code s₁) ⇒ᵇˢ ρ₂ → ⟦ variablesAt s₁ result ⟧ᵛ ρ₂
stepTrace {s₁} {ρ₁} {ρ₂} ⟦joinForKey-s₁⟧ρ₁ ρ₁,bss⇒ρ₂ =
        let
            -- I'd use rewrite, but Agda gets a memory overflow (?!).
            ⟦joinAll-result⟧ρ₁ =
                subst (λ vs → ⟦ vs ⟧ᵛ ρ₁)
                      (sym (variablesAt-joinAll s₁ result))
                      ⟦joinForKey-s₁⟧ρ₁
            ⟦analyze-result⟧ρ₂ =
                updateAll-matches {sv = joinAll result}
                                  ρ₁,bss⇒ρ₂ ⟦joinAll-result⟧ρ₁
            analyze-result≈result =
                ≈ᵐ-sym {result} {updateAll (joinAll result)}
                       result≈analyze-result
            analyze-s₁≈s₁ =
                variablesAt-≈ s₁ (updateAll (joinAll result))
                                 result (analyze-result≈result)
        in
            ⟦⟧ᵛ-respects-≈ᵛ {variablesAt s₁ (updateAll (joinAll result))} {variablesAt s₁ result} (analyze-s₁≈s₁) ρ₂ ⟦analyze-result⟧ρ₂

The first let-bound variable, ⟦joinAll-result⟧ρ₁, is an intermediate result, which I was forced to introduce because rewrite caused Agda to allocate ~100GB of memory. It simply makes use of the fact that joinAll, the function that performs predecessor joining for each node in the CFG, sets every key of the map accordingly.

The second let-bound variable, ⟦analyze-result⟧ρ₂, steps through a given node’s basic block and leverages our proof of statement-correctness to validate that the final environment ρ₂ matches the prediction of the analyzer.

The last two let-bound variables apply the equation we wrote above:

$$ r = \text{update}(\text{join}(r)) $$

Recall that analyze is the combination of update and join:

From Forward.agda, lines 226 through 227

analyze : StateVariables → StateVariables
analyze = updateAll ∘ joinAll

Finally, the in portion of the code uses ⟦⟧ᵛ-respects-≈ᵛ, a proof of property 4, to produce the final claim in terms of the result map.

Knowing how to step, we can finally walk the entire trace, implementing the iterative process:

From Forward.agda, lines 344 through 357

walkTrace : ∀ {s₁ s₂ ρ₁ ρ₂} → ⟦ joinForKey s₁ result ⟧ᵛ ρ₁ → Trace {graph} s₁ s₂ ρ₁ ρ₂ → ⟦ variablesAt s₂ result ⟧ᵛ ρ₂
walkTrace {s₁} {s₁} {ρ₁} {ρ₂} ⟦joinForKey-s₁⟧ρ₁ (Trace-single ρ₁,bss⇒ρ₂) =
    stepTrace {s₁} {ρ₁} {ρ₂} ⟦joinForKey-s₁⟧ρ₁ ρ₁,bss⇒ρ₂
walkTrace {s₁} {s₂} {ρ₁} {ρ₂} ⟦joinForKey-s₁⟧ρ₁ (Trace-edge {ρ₂ = ρ} {idx₂ = s} ρ₁,bss⇒ρ s₁→s₂ tr) =
    let
        ⟦result-s₁⟧ρ =
            stepTrace {s₁} {ρ₁} {ρ} ⟦joinForKey-s₁⟧ρ₁ ρ₁,bss⇒ρ
        s₁∈incomingStates =
            []-∈ result (edge⇒incoming s₁→s₂)
                        (variablesAt-∈ s₁ result)
        ⟦joinForKey-s⟧ρ =
            ⟦⟧ᵛ-foldr ⟦result-s₁⟧ρ s₁∈incomingStates
    in
        walkTrace ⟦joinForKey-s⟧ρ tr

The first step — assuming that one of the predecessors of the current node satisfies the initial environment ρ₁ — is captured by the presence of the argument ⟦joinForKey-s₁⟧ρ₁. We expect the calling code to provide a proof of that.

The second step, in both cases, is implemented using stepTrace, as we saw above. That results in a proof that at the end of the current basic block, the final environment ρ₂ is accurately described.

From there, we move on to the third iterative step, if necessary. The sub-expression edge⇒incoming s₁→s₂ validates that, since we have an edge from the current node to the next, we are listed as a predecessor. This, in turn, means that we are included in the list of states-to-join for the $\textit{JOIN}$ function. That fact is stored in s₁∈incomingStates. Finally, relying on property 2, we construct an assumption fit for a recursive invocation of walkTrace, and move on to the next CFG node. The foldr here is motivated by the fact that “summation” using $(\sqcup)$ is a fold.

When the function terminates, what we have is a proof that the final program state is accurately described by the results of our program analysis. All that’s left is to kick off the walk. To do that, observe that the initial state has no predecessors (how could it, if it’s at the beginning of the program?). That, in turn, means that this state maps every variable to the bottom element. Such a variable configuration only permits the empty environment $\rho = \varnothing$. If the program evaluation starts in an empty environment, we have the assumption needed to kick off the iteration.

From Forward.agda, lines 359 through 366

joinForKey-initialState-⊥ᵛ : joinForKey initialState result ≡ ⊥ᵛ
joinForKey-initialState-⊥ᵛ = cong (λ ins → foldr _⊔ᵛ_ ⊥ᵛ (result [ ins ])) initialState-pred-∅

⟦joinAll-initialState⟧ᵛ∅ : ⟦ joinForKey initialState result ⟧ᵛ []
⟦joinAll-initialState⟧ᵛ∅ = subst (λ vs → ⟦ vs ⟧ᵛ []) (sym joinForKey-initialState-⊥ᵛ) ⟦⊥ᵛ⟧ᵛ∅

analyze-correct : ∀ {ρ : Env} → [] , rootStmt ⇒ˢ ρ → ⟦ variablesAt finalState result ⟧ᵛ ρ
analyze-correct {ρ} ∅,s⇒ρ = walkTrace {initialState} {finalState} {[]} {ρ} ⟦joinAll-initialState⟧ᵛ∅ (trace ∅,s⇒ρ)

Take a look at the highlighted line in the above code block in particular. It states precisely what we were hoping to see: that, when evaluating a program, the final state when it terminates is accurately described by the result of our static program analysis at the finalState in the CFG. We have done it!
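
As an informal companion to analyze-correct, the whole pipeline can be mimicked on a tiny example. The Python sketch below is my own illustration, not the Agda implementation: the four-node CFG, the tuple encoding of expressions, and every function name in it are made up for this purpose. It iterates one bulk round of "join predecessors, then run each block" until nothing changes, and then checks that two concrete runs end in environments described by the result at the final node.

```python
from functools import reduce

BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "top"


def join(a, b):
    """Least upper bound of two signs."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def plus(s1, s2):
    if BOT in (s1, s2):
        return BOT
    if TOP in (s1, s2):
        return TOP
    if ZERO in (s1, s2):
        return s2 if s1 == ZERO else s1
    return s1 if s1 == s2 else TOP


def describes(sign, z):
    if sign in (BOT, TOP):
        return sign == TOP
    return {NEG: z < 0, ZERO: z == 0, POS: z > 0}[sign]


# Expressions: natural literals, variable names, or ("+", a, b).
def eval_sign(e, vs):
    if isinstance(e, int):
        return ZERO if e == 0 else POS
    if isinstance(e, str):
        return vs[e]
    return plus(eval_sign(e[1], vs), eval_sign(e[2], vs))


def eval_int(e, env):
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    return eval_int(e[1], env) + eval_int(e[2], env)


# A made-up CFG: node 0 branches to 1 or 2, and both flow into node 3.
VARS = ["x", "y"]
CODE = {0: [("x", 1)], 1: [("y", 0)], 2: [("y", ("+", "x", "x"))], 3: [("x", ("+", "x", 1))]}
PREDS = {0: [], 1: [0], 2: [0], 3: [1, 2]}
BOTTOM = {v: BOT for v in VARS}


def join_for_key(node, result):
    maps = [result[p] for p in PREDS[node]]
    return reduce(lambda m1, m2: {v: join(m1[v], m2[v]) for v in VARS}, maps, BOTTOM)


def run_block(node, vs):
    for x, e in CODE[node]:
        vs = {**vs, x: eval_sign(e, vs)}
    return vs


def analyze(result):
    """One bulk round: join predecessors for every node, then run every block."""
    return {n: run_block(n, join_for_key(n, result)) for n in CODE}


result = {n: BOTTOM for n in CODE}
while True:                             # iterate until r = analyze(r)
    updated = analyze(result)
    if updated == result:
        break
    result = updated


def run_trace(path):
    env = {}                            # evaluation starts in the empty environment
    for node in path:
        for x, e in CODE[node]:
            env = {**env, x: eval_int(e, env)}
    return env


for path in ([0, 1, 3], [0, 2, 3]):
    env = run_trace(path)
    print(path, env, all(describes(result[3][v], z) for v, z in env.items()))
```

In this example the analysis concludes that x is positive at the end of node 3, while y is only known to be "anything", since it is zero on one branch and positive on the other. Both traces end in environments that fit that description. The Agda theorem says this holds for every program and every terminating evaluation, not just the two paths tried here.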

Future Work

It took a lot of machinery to get where we are, but there are still lots of things to do.

Correctness beyond the final state: the statement we’ve arrived at only shows that the final state of the program matches the results of the analysis. In fact, the property holds for all intermediate states, too. The only snag is that it’s more difficult to state such a claim.

To do something like that, we probably need a notion of “incomplete evaluations” of our language, which run our program but stop at some point before the end. A full execution would be a special case of such an “incomplete evaluation” that stops in the final state. Then, we could restate analyze-correct in terms of partial evaluations, which would strengthen it.
A more robust language and evaluation process: we noted above that our join-based analysis is a little bit weird, particularly in the cases of uninitialized variables. There are ways to adjust our language (e.g., introducing variable declaration points) and analysis functions (e.g., only allowing assignment for declared variables) to reduce the weirdness somewhat. They just lead to a more complicated language.
A more general correctness condition: converting lattice elements into predicates on values gets us far. However, some types of analyses make claims about more than the current values of variables. For instance, live variable analysis checks if a variable’s current value is going to be used in the future. Such an analysis can help guide register (re)allocation. To talk about future uses of a variable, the predicate will need to be formulated in terms of the entire evaluation proof tree. This opens a whole can of worms that I haven’t begun to examine.

Now that I’m done writing up my code so far, I will start exploring these various avenues of work. In the meantime, though, thanks for reading!

Implementing and Verifying "Static Program Analysis" in Agda, Part 8: Forward Analysis

Sun, 01 Dec 2024 15:09:07 -0800

In the previous post, I showed that the Control Flow Graphs we built of our programs match how they are really executed. This means that we can rely on these graphs to compute program information. In this post, we finally get to compute that information. Here’s a quick bit of paraphrasing from last time that provides a summary of our approach:

We will construct a finite-height lattice. Every single element of this lattice will contain information about each variable at each node in the Control Flow Graph.
We will then define a monotonic function that updates this information using the structure encoded in the CFG’s edges and nodes.
Then, using the fixed-point algorithm, we will find the least fixed point of this function in the lattice, which will give us a precise description of all program variables at all points in the program.
Because we have just validated our CFGs to be faithful to the language’s semantics, we’ll be able to prove that our algorithm produces accurate results.

Let’s jump right into it!

Choosing a Lattice

For a lot of this series, we have been talking about lattices, particularly lattices of finite height. These structures represent things we know about the program, and provide operators like $(\sqcup)$ and $(\sqcap)$ that help us combine such knowledge.

The forward analysis code I present here will work with any finite-height lattice, with the additional constraint that equivalence of lattice elements is decidable. This constraint comes from the implementation of the fixed-point algorithm, in which we routinely check if a function’s output is the same as its input.

From Forward.agda, lines 4 through 8

module Analysis.Forward
    {L : Set} {h}
    {_≈ˡ_ : L → L → Set} {_⊔ˡ_ : L → L → L} {_⊓ˡ_ : L → L → L}
    (isFiniteHeightLatticeˡ : IsFiniteHeightLattice L h _≈ˡ_ _⊔ˡ_ _⊓ˡ_)
    (≈ˡ-dec : IsDecidable _≈ˡ_) where

The finite-height lattice L is intended to describe the state of a single variable. One example of a lattice that can be used as L is our sign lattice. We’ve been using the sign lattice in our examples from the very beginning, and we will stick with it for the purposes of this explanation. However, this lattice alone does not describe our program, since it only talks about a single sign; programs have lots of variables, all of which can have different signs! So, we might go one step further and define a map lattice from variables to their signs:

$$ \text{Variable} \to \text{Sign} $$

We have seen that we can turn any lattice $L$ into a map lattice $A \to L$, for any type of keys $A$. In this case, we will define $A \triangleq \text{Variable}$, and $L \triangleq \text{Sign}$. The sign lattice has a finite height, and I’ve proven that, as long as we pick a finite set of keys, map lattices $A \to L$ have a finite height if $L$ has a finite height. Since a program’s text is finite, $\text{Variable}$ is a finite set, and we have ourselves a finite-height lattice $\text{Variable} \to \text{Sign}$.

We’re on the right track, but even the lattice we have so far is not sufficient. That’s because variables have different signs at different points in the program! You might initialize a variable with x = 1, making it positive, and then go on to compute some arbitrary function using loops and conditionals. For each variable, we need to keep track of its sign at various points in the code. When we defined Control Flow Graphs, we split our programs into sequences of statements that are guaranteed to execute together: basic blocks. For our analysis, we’ll keep per-variable information for each basic block in the program. Since basic blocks are nodes in the Control Flow Graph of our program, our whole lattice will be as follows:

$$ \text{Info} \triangleq \text{NodeId} \to (\text{Variable} \to \text{Sign}) $$

We follow the same logic we just did for the variable-sign lattice; since $\text{Variable} \to \text{Sign}$ is a lattice of finite height, and since $\text{NodeId}$ is a finite set, the whole $\text{Info}$ map will be a lattice with a finite height.
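
The layering of map lattices is easy to picture with plain dictionaries. The following Python sketch is only an illustration (the string encoding of signs and the helper names join_sign, join_map, join_vars, and join_info are assumptions, and dictionaries stand in for the finite maps from the Agda code): the join of two maps over the same keys is computed key by key, and the same helper is reused once for $\text{Variable} \to \text{Sign}$ and once more for $\text{Info}$.

```python
BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "top"


def join_sign(a, b):
    """Least upper bound of two signs."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def join_map(join_value, m1, m2):
    """Pointwise join of two maps over the same fixed, finite key set."""
    assert m1.keys() == m2.keys()
    return {k: join_value(m1[k], m2[k]) for k in m1}


def join_vars(m1, m2):
    """Join in the Variable -> Sign lattice."""
    return join_map(join_sign, m1, m2)


def join_info(i1, i2):
    """Join in the NodeId -> (Variable -> Sign) lattice."""
    return join_map(join_vars, i1, i2)


info1 = {0: {"x": POS, "y": BOT}, 1: {"x": POS, "y": ZERO}}
info2 = {0: {"x": NEG, "y": BOT}, 1: {"x": POS, "y": POS}}
print(join_info(info1, info2))
```

The key sets never change during the join, which mirrors the requirement that the set of keys be fixed and finite for the height argument to go through.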

Notice that both the sets of $\text{Variable}$ and $\text{NodeId}$ depend on the program in question. The lattice we use is slightly different for each input program! We can use Agda’s parameterized modules to automatically parameterize all our functions over programs:

From Forward.agda, lines 36 through 37

module WithProg (prog : Program) where
    open Program prog

Now, let’s make the informal descriptions above into code, by instantiating our map lattice modules. First, I invoked the code for the smaller variable-sign lattice. This ended up being quite long, mostly because I renamed the names I brought into scope. I will collapse the relevant code block; suffice to say that I used the suffix v (e.g., renaming _⊔_ to _⊔ᵛ_) for properties and operators to do with variable-sign maps (in Agda: VariableValuesFiniteMap).

Module uses for variable-sign maps:

From Forward.agda, lines 41 through 82

    module VariableValuesFiniteMap = Lattice.FiniteValueMap.WithKeys _≟ˢ_ isLatticeˡ vars
    open VariableValuesFiniteMap
        using ()
        renaming
            ( FiniteMap to VariableValues
            ; isLattice to isLatticeᵛ
            ; _≈_ to _≈ᵛ_
            ; _⊔_ to _⊔ᵛ_
            ; _≼_ to _≼ᵛ_
            ; ≈₂-dec⇒≈-dec to ≈ˡ-dec⇒≈ᵛ-dec
            ; _∈_ to _∈ᵛ_
            ; _∈k_ to _∈kᵛ_
            ; _updating_via_ to _updatingᵛ_via_
            ; locate to locateᵛ
            ; m₁≼m₂⇒m₁[k]≼m₂[k] to m₁≼m₂⇒m₁[k]ᵛ≼m₂[k]ᵛ
            ; ∈k-dec to ∈k-decᵛ
            ; all-equal-keys to all-equal-keysᵛ
            )
        public
    open IsLattice isLatticeᵛ
        using ()
        renaming
            ( ⊔-Monotonicˡ to ⊔ᵛ-Monotonicˡ
            ; ⊔-Monotonicʳ to ⊔ᵛ-Monotonicʳ
            ; ⊔-idemp to ⊔ᵛ-idemp
            )
    open Lattice.FiniteValueMap.IterProdIsomorphism _≟ˢ_ isLatticeˡ
        using ()
        renaming
            ( Provenance-union to Provenance-unionᵐ
            )
    open Lattice.FiniteValueMap.IterProdIsomorphism.WithUniqueKeysAndFixedHeight _≟ˢ_ isLatticeˡ vars-Unique ≈ˡ-dec _ fixedHeightˡ
        using ()
        renaming
            ( isFiniteHeightLattice to isFiniteHeightLatticeᵛ
            ; ⊥-contains-bottoms to ⊥ᵛ-contains-bottoms
            )

    ≈ᵛ-dec = ≈ˡ-dec⇒≈ᵛ-dec ≈ˡ-dec
    joinSemilatticeᵛ = IsFiniteHeightLattice.joinSemilattice isFiniteHeightLatticeᵛ
    fixedHeightᵛ = IsFiniteHeightLattice.fixedHeight isFiniteHeightLatticeᵛ
    ⊥ᵛ = Chain.Height.⊥ fixedHeightᵛ

I then used this lattice as an argument to the map module again, to construct the top-level $\text{Info}$ lattice (in Agda: StateVariablesFiniteMap). This also required a fair bit of code, most of it to do with renaming.

Module uses for the top-level lattice:

From Forward.agda, lines 85 through 112

    module StateVariablesFiniteMap = Lattice.FiniteValueMap.WithKeys _≟_ isLatticeᵛ states
    open StateVariablesFiniteMap
        using (_[_]; []-∈; m₁≼m₂⇒m₁[ks]≼m₂[ks]; m₁≈m₂⇒k∈m₁⇒k∈km₂⇒v₁≈v₂)
        renaming
            ( FiniteMap to StateVariables
            ; isLattice to isLatticeᵐ
            ; _≈_ to _≈ᵐ_
            ; _∈_ to _∈ᵐ_
            ; _∈k_ to _∈kᵐ_
            ; locate to locateᵐ
            ; _≼_ to _≼ᵐ_
            ; ≈₂-dec⇒≈-dec to ≈ᵛ-dec⇒≈ᵐ-dec
            ; m₁≼m₂⇒m₁[k]≼m₂[k] to m₁≼m₂⇒m₁[k]ᵐ≼m₂[k]ᵐ
            )
        public
    open Lattice.FiniteValueMap.IterProdIsomorphism.WithUniqueKeysAndFixedHeight _≟_ isLatticeᵛ states-Unique ≈ᵛ-dec _ fixedHeightᵛ
        using ()
        renaming
            ( isFiniteHeightLattice to isFiniteHeightLatticeᵐ
            )
    open IsFiniteHeightLattice isFiniteHeightLatticeᵐ
        using ()
        renaming
            ( ≈-sym to ≈ᵐ-sym
            )

    ≈ᵐ-dec = ≈ᵛ-dec⇒≈ᵐ-dec ≈ᵛ-dec
    fixedHeightᵐ = IsFiniteHeightLattice.fixedHeight isFiniteHeightLatticeᵐ

Constructing a Monotone Function

We now have a lattice in hand; the next step is to define a function over this lattice. For us to be able to use the fixed-point algorithm on this function, it will need to be monotonic.

Our goal with static analysis is to compute information about our program; that’s what we want the function to do. When the lattice we’re using is the sign lattice, we’re trying to determine the signs of each of the variables in various parts of the program. How do we go about this?

Each piece of code in the program might change a variable’s sign. For instance, if x has sign $0$, and we run the statement x = x - 1, the sign of x will be $-$. If we have an expression y + z, we can use the signs of y and z to compute the sign of the whole thing. This is a form of abstract interpretation, in which we almost-run the program, but forget some details (e.g., the exact values of x, y, and z, leaving only their signs). The exact details of how this partial evaluation is done are analysis-specific; in general, we simply require an analysis to provide an evaluator. We will define an evaluator for the sign lattice below.

From Forward.agda, lines 166 through 167

    module WithEvaluator (eval : Expr → VariableValues → L)
                         (eval-Mono : ∀ (e : Expr) → Monotonic _≼ᵛ_ _≼ˡ_ (eval e)) where
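
To give a feel for what the two module parameters ask of an analysis, here is a hedged Python sketch of an evaluator for signs together with a brute-force check of its monotonicity. None of this is the Agda code: the expression encoding (natural literals, variable names, and ('+', a, b) tuples) and the names leq, eval_sign, and is_monotonic are assumptions for illustration, and the check enumerates all sign maps for a couple of variables instead of proving anything.

```python
from itertools import product

BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "top"
SIGNS = [BOT, NEG, ZERO, POS, TOP]


def leq(a, b):
    """Order on signs: bot is below everything, top is above everything."""
    return a == BOT or b == TOP or a == b


def plus(s1, s2):
    if BOT in (s1, s2):
        return BOT
    if TOP in (s1, s2):
        return TOP
    if ZERO in (s1, s2):
        return s2 if s1 == ZERO else s1
    return s1 if s1 == s2 else TOP


def eval_sign(e, vs):
    """Abstract evaluator: natural literals, variable names, or ('+', a, b)."""
    if isinstance(e, int):
        return ZERO if e == 0 else POS
    if isinstance(e, str):
        return vs[e]
    return plus(eval_sign(e[1], vs), eval_sign(e[2], vs))


def is_monotonic(evaluate, variables):
    """Check that vs1 <= vs2 implies evaluate(vs1) <= evaluate(vs2),
    over every pair of sign maps for the given variables."""
    maps = [dict(zip(variables, signs)) for signs in product(SIGNS, repeat=len(variables))]
    return all(
        leq(evaluate(m1), evaluate(m2))
        for m1, m2 in product(maps, maps)
        if all(leq(m1[v], m2[v]) for v in variables)
    )


expr = ("+", "x", ("+", "y", 1))
print(is_monotonic(lambda vs: eval_sign(expr, vs), ["x", "y"]))
```

An evaluator that answers + for an unknown (bottom) input but bottom for a known one would fail this check, because learning more about the input would make its answer less informative in the wrong direction.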

From this, we know how each statement and basic block will change variables in the function. But we have described the process as “if a variable has sign X, it becomes sign Y”. How do we know what sign a variable has before the code runs? Fortunately, the Control Flow Graph tells us exactly what code could be executed before any given basic block. Recall that edges in the graph describe all possible jumps that could occur; thus, for any node, the incoming edges describe all possible blocks that can precede it. This is why we spent all that time defining the predecessors function.

We proceed as follows: for any given node, find its predecessors. By accessing our $\text{Info}$ map for each predecessor, we can determine our current best guess of variable signs at that point, in the form of a $\text{Variable} \to \text{Sign}$ map (more generally, $\text{Variable} \to L$ map in an arbitrary analysis). We know that any of these predecessors could’ve been the previous point of execution; if a variable x has sign $+$ in one predecessor and $-$ in another, it can be either one or the other when we start executing the current block. Early on, we saw that the $(\sqcup)$ operator models disjunction (“A or B”). So, we apply $(\sqcup)$ to the variable-sign maps of all predecessors. The reference Static Program Analysis text calls this operation $\text{JOIN}$:

$$ \textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket $$

The Agda implementation uses a foldr:

From Forward.agda, lines 139 through 140

    joinForKey : State → StateVariables → VariableValues
    joinForKey k states = foldr _⊔ᵛ_ ⊥ᵛ (states [ incoming k ])

Computing the “combined incoming states” for any node is a monotonic function. This follows from the monotonicity of $(\sqcup)$ — in both arguments — and the definition of foldr.
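
For readers who prefer something executable, the same fold can be sketched in Python. This is an illustration with made-up data, not the Agda definition: the diamond-shaped CFG, the per-node states, and the helper names are assumptions. Python’s functools.reduce folds from the left while the Agda code uses foldr, which makes no difference to the result here because the join is associative and commutative.

```python
from functools import reduce

BOT, NEG, ZERO, POS, TOP = "bot", "-", "0", "+", "top"


def join_sign(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def join_vars(m1, m2):
    """Pointwise join of two variable-sign maps with the same keys."""
    return {v: join_sign(m1[v], m2[v]) for v in m1}


def join_for_key(node, states, incoming, bottom):
    """JOIN(node): fold the join over the states of all predecessors of node."""
    return reduce(join_vars, [states[p] for p in incoming[node]], bottom)


# A made-up diamond-shaped CFG and some made-up per-node states.
incoming = {"entry": [], "then": ["entry"], "else": ["entry"], "exit": ["then", "else"]}
bottom = {"x": BOT, "y": BOT}
states = {
    "entry": {"x": ZERO, "y": BOT},
    "then": {"x": POS, "y": POS},
    "else": {"x": NEG, "y": POS},
    "exit": bottom,
}
print(join_for_key("entry", states, incoming, bottom))
print(join_for_key("exit", states, incoming, bottom))
```

The entry node has no predecessors, so its combined incoming state is the bottom map. At the exit node, x is positive on one side and negative on the other, so the join reports top for x, while y stays positive because both predecessors agree.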

The general proof:

From Lattice.agda, lines 143 through 151

    foldr-Mono : ∀ (l₁ l₂ : List A) (f : A → B → B) (b₁ b₂ : B) →
                 Pairwise _≼₁_ l₁ l₂ → b₁ ≼₂ b₂ →
                 (∀ b → Monotonic _≼₁_ _≼₂_ (λ a → f a b)) →
                 (∀ a → Monotonic _≼₂_ _≼₂_ (f a)) →
                 foldr f b₁ l₁ ≼₂ foldr f b₂ l₂
    foldr-Mono List.[] List.[] f b₁ b₂ _ b₁≼b₂ _ _ = b₁≼b₂
    foldr-Mono (x ∷ xs) (y ∷ ys) f b₁ b₂ (x≼y ∷ xs≼ys) b₁≼b₂ f-Mono₁ f-Mono₂ =
        ≼₂-trans (f-Mono₁ (foldr f b₁ xs) x≼y)
                 (f-Mono₂ y (foldr-Mono xs ys f b₁ b₂ xs≼ys b₁≼b₂ f-Mono₁ f-Mono₂))

From this, we can formally state that $\text{JOIN}$ is monotonic. Note that the input and output lattices are different: the input lattice is the lattice of variable states at each block, and the output lattice is a single variable-sign map, representing the combined preceding state at a given node.

From Forward.agda, lines 145 through 149

    joinForKey-Mono : ∀ (k : State) → Monotonic _≼ᵐ_ _≼ᵛ_ (joinForKey k)
    joinForKey-Mono k {fm₁} {fm₂} fm₁≼fm₂ =
        foldr-Mono joinSemilatticeᵛ joinSemilatticeᵛ (fm₁ [ incoming k ]) (fm₂ [ incoming k ]) _⊔ᵛ_ ⊥ᵛ ⊥ᵛ
                   (m₁≼m₂⇒m₁[ks]≼m₂[ks] fm₁ fm₂ (incoming k) fm₁≼fm₂)
                   (⊔ᵛ-idemp ⊥ᵛ) ⊔ᵛ-Monotonicʳ ⊔ᵛ-Monotonicˡ

Above, the m₁≼m₂⇒m₁[ks]≼m₂[ks] lemma states that for two maps with the same keys, where one map is less than another, all the values for any subset of keys ks are pairwise less than each other (i.e. m₁[k]≼m₂[k], and m₁[l]≼m₂[l], etc.). This follows from the definition of “less than” for maps.

So those are the two pieces: first, join all the preceding states, then use the abstract interpretation function. I opted to do both of these in bulk:

Take an initial $\text{Info}$ map, and update every basic block’s entry to be the join of its predecessors.
In the new joined map, each key now contains the variable state at the beginning of the block; so, apply the abstract interpretation function via eval to each key, computing the state at the end of the block.

I chose to do these in bulk because this way, after each application of the function, we have updated each block with exactly one round of information. The alternative — which is specified in the reference text — is to update one key at a time. The difference there is that updates to later keys might be “tainted” by updates to keys that came before them. This is probably fine (and perhaps more efficient, in that it “moves faster”), but it’s harder to reason about.

Generalized Update

To implement bulk assignment, I needed to implement the source text’s Exercise 4.26:

Exercise 4.26: Recall that $f[a \leftarrow x]$ denotes the function that is identical to $f$ except that it maps $a$ to $x$. Assume $f : L_1 \to (A \to L_2)$ and $g : L_1 \to L_2$ are monotone functions where $L_1$ and $L_2$ are lattices and $A$ is a set, and let $a \in A$. (Note that the codomain of $f$ is a map lattice.)

Show that the function $h : L_1 \to (A \to L_2)$ defined by $h(x) = f(x)[a \leftarrow g(x)]$ is monotone.

In fact, I generalized this statement to update several keys at once, as follows:

$$ h(x) = f(x)[a_1 \leftarrow g(a_1, x),\ ...,\ a_n \leftarrow g(a_n, x)] $$

I called this operation “generalized update”.

At first, the exercise may not obviously correspond to the bulk operation I’ve described. Particularly confusing is the fact that it has two lattices, $L_1$ and $L_2$. In fact, the exercise results in a very general theorem; we can exploit a more concrete version of the theorem by setting $L_1 \triangleq A \to L_2$, resulting in an overall signature for $f$ and $h$:

$$ f : (A \to L_2) \to (A \to L_2) $$

In other words, if we give the entire operation in Exercise 4.26 a type, it would look like this:

$$ \text{ex}_{4.26} : \underbrace{K}_{\text{value of}\ a} \to \underbrace{(\text{Map} \to V)}_{\text{updater}} \to \underbrace{\text{Map} \to \text{Map}}_{f} \to \underbrace{\text{Map} \to \text{Map}}_{h} $$

That’s still more general than we need. This version allows us to modify any map-to-map function by updating a certain key in that function’s output. If we just want to update keys (as we do for the purposes of static analysis), we can recover a simpler version by setting $f \triangleq id$, which results in an updater $h(x) = x[a \leftarrow g(x)]$, and a signature for the exercise:

$$ \text{ex}_{4.26} : \underbrace{K}_{\text{value of}\ a} \to \underbrace{(\text{Map} \to V)}_{\text{updater}\ g} \to \underbrace{\text{Map}}_{\text{old map}} \to \underbrace{\text{Map}}_{\text{updated map}} $$

This looks just like Haskell’s Data.Map.adjust function, except that it can take the entire map into consideration when updating a key.

My generalized version takes in a list of keys to update, and makes the updater accept a key so that its behavior can be specialized for each entry it changes. The sketch of the implementation is in the _updating_via_ function from the Map module, and its helper transform. Here, I collapse its definition, since it’s not particularly important.

Definition of transform:

From Map.agda, lines 926 through 931

    transform : List (A × B) → List A → (A → B) → List (A × B)
    transform [] _ _ = []
    transform ((k , v) ∷ xs) ks f
        with k∈-dec k ks
    ...   | yes _ = (k , f k) ∷ transform xs ks f
    ...   | no _ = (k , v) ∷ transform xs ks f
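
The behavior of the generalized update is perhaps easiest to see on ordinary dictionaries. The Python sketch below is an approximation for illustration only: updating_via and total are names I picked here, the dictionary stands in for the Agda finite map, and the updater receives both the key and the whole old map, matching $g(a_i, x)$ in the formula above.

```python
def updating_via(m, keys, updater):
    """Generalized update: for each key in `keys` that is present in `m`,
    replace its value with updater(key, m); leave all other entries alone."""
    return {k: (updater(k, m) if k in keys else v) for k, v in m.items()}


def total(key, old_map):
    """An updater that ignores the key and looks at the entire old map."""
    return sum(old_map.values())


m = {"a": 1, "b": 2, "c": 3}
print(updating_via(m, ["a", "c"], total))
print(updating_via(m, ["z"], total))
```

Every updated entry is computed from the old map, so updating "a" does not influence the new value at "c". Keys that are listed but absent from the map are simply ignored, as in transform, which only walks over existing entries.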

The proof of monotonicity — which is the solution to the exercise — is actually quite complicated. I will omit its description, and show it here in another collapsed block.

Proof of monotonicity of $h$:

From Map.agda, lines 1042 through 1105

        f'-Monotonic : Monotonic _≼ˡ_ _≼_ f'
        f'-Monotonic {l₁} {l₂} l₁≼l₂ = (f'l₁f'l₂⊆f'l₂ , f'l₂⊆f'l₁f'l₂)
            where
                fl₁fl₂⊆fl₂ = proj₁ (f-Monotonic l₁≼l₂)
                fl₂⊆fl₁fl₂ = proj₂ (f-Monotonic l₁≼l₂)

                f'l₁f'l₂⊆f'l₂ : ((f' l₁) ⊔ (f' l₂)) ⊆ f' l₂
                f'l₁f'l₂⊆f'l₂ k v k,v∈f'l₁f'l₂
                    with Expr-Provenance-≡ ((` (f' l₁)) ∪ (` (f' l₂))) k,v∈f'l₁f'l₂
                ...   | in₁ (single k,v∈f'l₁) k∉kf'l₂ =
                        let
                            k∈kfl₁ = updating-via-∈k-backward (f l₁) ks (updater l₁) (forget k,v∈f'l₁)
                            k∈kfl₁fl₂ = union-preserves-∈k₁ {l₁ = proj₁ (f l₁)} {l₂ = proj₁ (f l₂)} k∈kfl₁
                            (v' , k,v'∈fl₁l₂) = locate {m = (f l₁ ⊔ f l₂)} k∈kfl₁fl₂
                            (v'' , (v'≈v'' , k,v''∈fl₂)) = fl₁fl₂⊆fl₂ k v' k,v'∈fl₁l₂
                            k∈kf'l₂ = updating-via-∈k-forward (f l₂) ks (updater l₂) (forget k,v''∈fl₂)
                        in
                            ⊥-elim (k∉kf'l₂ k∈kf'l₂)
                ...   | in₂ k∉kf'l₁ (single k,v'∈f'l₂) =
                        (v , (IsLattice.≈-refl lB , k,v'∈f'l₂))
                ...   | bothᵘ (single {v₁} k,v₁∈f'l₁) (single {v₂} k,v₂∈f'l₂)
                        with k∈-dec k ks
                ...       | yes k∈ks
                            with refl ← updating-via-k∈ks-≡ (f l₁) (updater l₁) k∈ks k,v₁∈f'l₁
                            with refl ← updating-via-k∈ks-≡ (f l₂) (updater l₂) k∈ks k,v₂∈f'l₂ =
                            (updater l₂ k , (g-Monotonicʳ k l₁≼l₂ , k,v₂∈f'l₂))
                ...       | no k∉ks =
                                let
                                    k,v₁∈fl₁ = updating-via-k∉ks-backward (f l₁) (updater l₁) k∉ks k,v₁∈f'l₁
                                    k,v₂∈fl₂ = updating-via-k∉ks-backward (f l₂) (updater l₂) k∉ks k,v₂∈f'l₂
                                    k,v₁v₂∈fl₁fl₂ = ⊔-combines {m₁ = f l₁} {m₂ = f l₂} k,v₁∈fl₁ k,v₂∈fl₂
                                    (v' , (v'≈v₁v₂ , k,v'∈fl₂)) = fl₁fl₂⊆fl₂ k _ k,v₁v₂∈fl₁fl₂
                                    k,v'∈f'l₂ = updating-via-k∉ks-forward (f l₂) (updater l₂) k∉ks k,v'∈fl₂
                                in
                                    (v' , (v'≈v₁v₂ , k,v'∈f'l₂))

                f'l₂⊆f'l₁f'l₂ : f' l₂ ⊆ ((f' l₁) ⊔ (f' l₂))
                f'l₂⊆f'l₁f'l₂ k v k,v∈f'l₂
                    with k∈kfl₂ ← updating-via-∈k-backward (f l₂) ks (updater l₂) (forget k,v∈f'l₂)
                    with (v' , k,v'∈fl₂) ← locate {m = f l₂} k∈kfl₂
                    with (v'' , (v'≈v'' , k,v''∈fl₁fl₂)) ← fl₂⊆fl₁fl₂ k v' k,v'∈fl₂
                    with Expr-Provenance-≡ ((` (f l₁)) ∪ (` (f l₂))) k,v''∈fl₁fl₂
                ...   | in₁ (single k,v''∈fl₁) k∉kfl₂ = ⊥-elim (k∉kfl₂ k∈kfl₂)
                ...   | in₂ k∉kfl₁ (single k,v''∈fl₂) =
                        let
                            k∉kf'l₁ = updating-via-∉k-forward (f l₁) ks (updater l₁) k∉kfl₁
                        in
                            (v , (IsLattice.≈-refl lB , union-preserves-∈₂ k∉kf'l₁ k,v∈f'l₂))
                ...   | bothᵘ (single {v₁} k,v₁∈fl₁) (single {v₂} k,v₂∈fl₂)
                        with k∈-dec k ks
                ...       | yes k∈ks with refl ← updating-via-k∈ks-≡ (f l₂) (updater l₂) k∈ks k,v∈f'l₂ =
                            let
                                k,uv₁∈f'l₁ = updating-via-k∈ks-forward (f l₁) (updater l₁) k∈ks (forget k,v₁∈fl₁)
                                k,uv₂∈f'l₂ = updating-via-k∈ks-forward (f l₂) (updater l₂) k∈ks (forget k,v₂∈fl₂)
                                k,uv₁uv₂∈f'l₁f'l₂ = ⊔-combines {m₁ = f' l₁} {m₂ = f' l₂} k,uv₁∈f'l₁ k,uv₂∈f'l₂
                            in
                                (updater l₁ k ⊔₂ updater l₂ k , (IsLattice.≈-sym lB (g-Monotonicʳ k l₁≼l₂) , k,uv₁uv₂∈f'l₁f'l₂))
                ...       | no k∉ks
                            with k,v₁∈f'l₁ ← updating-via-k∉ks-forward (f l₁) (updater l₁) k∉ks k,v₁∈fl₁
                            with k,v₂∈f'l₂ ← updating-via-k∉ks-forward (f l₂) (updater l₂) k∉ks k,v₂∈fl₂
                            with k,v₁v₂∈f'l₁f'l₂ ← ⊔-combines {m₁ = f' l₁} {m₂ = f' l₂} k,v₁∈f'l₁ k,v₂∈f'l₂
                            with refl ← Map-functional {m = f' l₂} k,v∈f'l₂ k,v₂∈f'l₂
                            with refl ← Map-functional {m = f l₂} k,v'∈fl₂ k,v₂∈fl₂ =
                            (v₁ ⊔₂ v , (v'≈v'' , k,v₁v₂∈f'l₁f'l₂))

Given a proof of the exercise, all that’s left is to instantiate the theorem with the argument I described. Specifically:

$L_1 \triangleq \text{Info} \triangleq \text{NodeId} \to (\text{Variable} \to \text{Sign})$
$L_2 \triangleq \text{Variable} \to \text{Sign} $
$A \triangleq \text{NodeId}$
$f \triangleq \text{id} \triangleq x \mapsto x$
$g(k, m) = \text{JOIN}(k, m)$

In the equation for $g$, I explicitly insert the map $m$ instead of leaving it implicit as the textbook does. In Agda, this instantiation for joining all predecessors looks like this (using states as the list of keys to update, indicating that we should update every key):

From Forward.agda, lines 152 through 157

    open StateVariablesFiniteMap.GeneralizedUpdate states isLatticeᵐ (λ x → x) (λ a₁≼a₂ → a₁≼a₂) joinForKey joinForKey-Mono states
        renaming
            ( f' to joinAll
            ; f'-Monotonic to joinAll-Mono
            ; f'-k∈ks-≡ to joinAll-k∈ks-≡
            )

And the one for evaluating all programs looks like this:

From Forward.agda, lines 215 through 220

        open StateVariablesFiniteMap.GeneralizedUpdate states isLatticeᵐ (λ x → x) (λ a₁≼a₂ → a₁≼a₂) updateVariablesForState updateVariablesForState-Monoʳ states
            renaming
                ( f' to updateAll
                ; f'-Monotonic to updateAll-Mono
                ; f'-k∈ks-≡ to updateAll-k∈ks-≡
                )

Actually, we haven’t yet seen updateVariablesFromStmt. This is a function that we can define using the user-provided abstract interpretation eval. Specifically, it handles the job of updating the sign of a variable once it has been assigned to (or doing nothing if the statement is a no-op).

From Forward.agda, lines 191 through 193

        updateVariablesFromStmt : BasicStmt → VariableValues → VariableValues
        updateVariablesFromStmt (k ← e) vs = updateVariablesFromExpression k e vs
        updateVariablesFromStmt noop vs = vs

The updateVariablesFromExpression function is also new, and it is yet another map update, which changes the sign of a variable k to be the one we get from running eval on the assigned expression. Map updates are instances of the generalized update; this time, the updater $g$ is eval. The exercise requires the updater to be monotonic, which constrains the user-provided evaluation function to be monotonic too.

From Forward.agda, lines 173 through 181

        private module _ (k : String) (e : Expr) where
            open VariableValuesFiniteMap.GeneralizedUpdate vars isLatticeᵛ (λ x → x) (λ a₁≼a₂ → a₁≼a₂) (λ _ → eval e) (λ _ {vs₁} {vs₂} vs₁≼vs₂ → eval-Mono e {vs₁} {vs₂} vs₁≼vs₂) (k ∷ [])
                renaming
                    ( f' to updateVariablesFromExpression
                    ; f'-Monotonic to updateVariablesFromExpression-Mono
                    ; f'-k∈ks-≡ to updateVariablesFromExpression-k∈ks-≡
                    ; f'-k∉ks-backward to updateVariablesFromExpression-k∉ks-backward
                    )
                public

We finally write the analyze function as the composition of the two bulk updates:

From Forward.agda, lines 226 through 232

        analyze : StateVariables → StateVariables
        analyze = updateAll ∘ joinAll

        analyze-Mono : Monotonic _≼ᵐ_ _≼ᵐ_ analyze
        analyze-Mono {sv₁} {sv₂} sv₁≼sv₂ =
            updateAll-Mono {joinAll sv₁} {joinAll sv₂}
                           (joinAll-Mono {sv₁} {sv₂} sv₁≼sv₂)
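
For readers who want to experiment with the shape of this construction outside of Agda, here is a minimal Python sketch of the generalized update and of the composition used for analyze. Everything in it is an illustrative assumption rather than the Agda code: the names generalized_update, join_for_key and update_for_state are hypothetical, plain integers ordered by <= stand in for the lattice, and the sketch assumes that an update never adds new keys.

```python
from typing import Callable, Dict, Hashable, Iterable, TypeVar

L = TypeVar("L")
V = TypeVar("V")


def generalized_update(
    f: Callable[[L], Dict[Hashable, V]],
    g: Callable[[Hashable, L], V],
    ks: Iterable[Hashable],
) -> Callable[[L], Dict[Hashable, V]]:
    """Build f', where f'(l) is f(l) with every key k in ks remapped to g(k, l)."""
    keys = list(ks)

    def f_prime(l: L) -> Dict[Hashable, V]:
        updated = dict(f(l))  # copy so the caller's map is not mutated
        for k in keys:
            if k in updated:  # assumption: updates never add new keys
                updated[k] = g(k, l)
        return updated

    return f_prime


# Toy instantiation: three states, values ordered by <=, join is max.
PREDECESSORS = {0: [], 1: [0], 2: [0, 1]}
GENERATED = {0: 1, 1: 0, 2: 0}


def join_for_key(k, m):
    # Join the values of all predecessors of state k (0 plays the role of bottom).
    return max((m[p] for p in PREDECESSORS[k]), default=0)


def update_for_state(k, m):
    # Stand-in for "evaluate the code at state k".
    return max(m[k], GENERATED[k])


identity = lambda m: m
join_all = generalized_update(identity, join_for_key, PREDECESSORS)
update_all = generalized_update(identity, update_for_state, PREDECESSORS)


def analyze(m):
    return update_all(join_all(m))  # mirrors updateAll ∘ joinAll
```

Note that g receives the original input l, not the partially updated map, so the order in which keys are visited does not matter.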

Instantiating with the Sign Lattice

Thus far, I’ve been talking about the sign lattice throughout, but implementing the Agda code in terms of a general lattice L and evaluation function eval. In order to actually run the Agda code, we do need to provide an eval function, which implements the logic we used above, in which a zero-sign variable $x$ minus one was determined to be negative. For binary operators specifically, I’ve used the tables provided in the textbook; here they are:

Cayley tables for abstract interpretation of plus and minus

These are pretty much common sense:

A positive plus a positive is still positive, so $+\ \hat{+}\ + = +$
A positive plus any sign could be any sign still, so $+\ \hat{+}\ \top = \top$
Any sign plus “impossible” is impossible, so $\top\ \hat{+}\ \bot = \bot$.
etc.

The Agda encoding for the plus function is as follows, and the one for minus is similar.

From Sign.agda, lines 76 through 94

plus : SignLattice → SignLattice → SignLattice
plus ⊥ᵍ _ = ⊥ᵍ
plus _ ⊥ᵍ = ⊥ᵍ
plus ⊤ᵍ _ = ⊤ᵍ
plus _ ⊤ᵍ = ⊤ᵍ
plus [ + ]ᵍ [ + ]ᵍ = [ + ]ᵍ
plus [ + ]ᵍ [ - ]ᵍ = ⊤ᵍ
plus [ + ]ᵍ [ 0ˢ ]ᵍ = [ + ]ᵍ
plus [ - ]ᵍ [ + ]ᵍ = ⊤ᵍ
plus [ - ]ᵍ [ - ]ᵍ = [ - ]ᵍ
plus [ - ]ᵍ [ 0ˢ ]ᵍ = [ - ]ᵍ
plus [ 0ˢ ]ᵍ [ + ]ᵍ = [ + ]ᵍ
plus [ 0ˢ ]ᵍ [ - ]ᵍ = [ - ]ᵍ
plus [ 0ˢ ]ᵍ [ 0ˢ ]ᵍ = [ 0ˢ ]ᵍ

-- this is incredibly tedious: 125 cases per monotonicity proof, and tactics
-- are hard. postulate for now.
postulate plus-Monoˡ : ∀ (s₂ : SignLattice) → Monotonic _≼ᵍ_ _≼ᵍ_ (λ s₁ → plus s₁ s₂)
postulate plus-Monoʳ : ∀ (s₁ : SignLattice) → Monotonic _≼ᵍ_ _≼ᵍ_ (plus s₁)

As the comment in the block says, it would be incredibly tedious to verify the monotonicity of these tables, since you would have to consider roughly 125 cases per argument: for each (fixed) sign $s$ and two other signs $s_1 \le s_2$, we’d need to show that $s\ \hat{+}\ s_1 \le s\ \hat{+}\ s_2$. I therefore commit the faux pas of using postulate. Fortunately, the proof of monotonicity is not used for the execution of the program, so we will get away with this, barring any meddling kids.
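
Outside of Agda, the same case analysis can be done by brute force. The following Python sketch is not part of the Agda development; it re-encodes the plus table from the listing above over a five-element lattice (strings stand in for the constructors, and the helper names are hypothetical) and enumerates every triple of elements. This gives some confidence in the postulate, though it is of course not a proof that Agda accepts.

```python
from itertools import product

BOT, TOP = "⊥", "⊤"
ELEMENTS = [BOT, "+", "-", "0", TOP]

# Entries for two known signs, copied from the Agda definition of plus.
PLUS_TABLE = {
    ("+", "+"): "+", ("+", "-"): TOP, ("+", "0"): "+",
    ("-", "+"): TOP, ("-", "-"): "-", ("-", "0"): "-",
    ("0", "+"): "+", ("0", "-"): "-", ("0", "0"): "0",
}


def plus(a, b):
    if a == BOT or b == BOT:
        return BOT
    if a == TOP or b == TOP:
        return TOP
    return PLUS_TABLE[(a, b)]


def leq(a, b):
    """Ordering of the flat lattice: ⊥ below everything, ⊤ above everything."""
    return a == BOT or b == TOP or a == b


def monotonicity_violations(op):
    """Check all 5 * 5 * 5 = 125 triples (s, s1, s2) for both arguments."""
    bad = []
    for s, s1, s2 in product(ELEMENTS, repeat=3):
        if not leq(s1, s2):
            continue  # nothing to check when s1 and s2 are not ordered
        if not leq(op(s, s1), op(s, s2)):
            bad.append(("right", s, s1, s2))
        if not leq(op(s1, s), op(s2, s)):
            bad.append(("left", s, s1, s2))
    return bad


print(monotonicity_violations(plus))
```

An empty list means no counterexample was found for either argument.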

From this, all that’s left is to show that for any expression e, the evaluation function:

$$ \text{eval} : \text{Expr} \to (\text{Variable} \to \text{Sign}) \to \text{Sign} $$

is monotonic. It’s defined straightforwardly and very much like an evaluator / interpreter, suggesting that “abstract interpretation” is the correct term here.

From Sign.agda, lines 176 through 184

    eval : ∀ (e : Expr) → VariableValues → SignLattice
    eval (e₁ + e₂) vs = plus (eval e₁ vs) (eval e₂ vs)
    eval (e₁ - e₂) vs = minus (eval e₁ vs) (eval e₂ vs)
    eval (` k) vs
        with ∈k-decᵛ k (proj₁ (proj₁ vs))
    ...   | yes k∈vs = proj₁ (locateᵛ {k} {vs} k∈vs)
    ...   | no _ = ⊤ᵍ
    eval (# 0) _ = [ 0ˢ ]ᵍ
    eval (# (suc n')) _ = [ + ]ᵍ

Though it won’t happen, it was easier to just handle the case where there’s an undefined variable; I give it “any sign”. Otherwise, the function simply consults the sign tables for + or -, as well as the known signs of the variables. For natural number literals, it assigns 0 the “zero” sign, and any other natural number the “$+$”.

To prove monotonicity, we need to consider two variable maps (one less than the other), and show that the abstract interpretation respects that ordering. This boils down to the fact that the plus and minus tables are monotonic in both arguments (thus, if their sub-expressions are evaluated monotonically given an environment, then so is the whole addition or subtraction), and to the fact that for two maps m₁ ≼ m₂, the values at corresponding keys are similarly ordered: m₁[k] ≼ m₂[k]. We saw that above.

(Click to expand the proof that the evaluation function for signs is monotonic)

From Sign.agda, lines 186 through 223

    eval-Mono : ∀ (e : Expr) → Monotonic _≼ᵛ_ _≼ᵍ_ (eval e)
    eval-Mono (e₁ + e₂) {vs₁} {vs₂} vs₁≼vs₂ =
        let
            -- TODO: can this be done with less boilerplate?
            g₁vs₁ = eval e₁ vs₁
            g₂vs₁ = eval e₂ vs₁
            g₁vs₂ = eval e₁ vs₂
            g₂vs₂ = eval e₂ vs₂
        in
            ≼ᵍ-trans
                {plus g₁vs₁ g₂vs₁} {plus g₁vs₂ g₂vs₁} {plus g₁vs₂ g₂vs₂}
                (plus-Monoˡ g₂vs₁ {g₁vs₁} {g₁vs₂} (eval-Mono e₁ {vs₁} {vs₂} vs₁≼vs₂))
                (plus-Monoʳ g₁vs₂ {g₂vs₁} {g₂vs₂} (eval-Mono e₂ {vs₁} {vs₂} vs₁≼vs₂))
    eval-Mono (e₁ - e₂) {vs₁} {vs₂} vs₁≼vs₂ =
        let
            -- TODO: here too -- can this be done with less boilerplate?
            g₁vs₁ = eval e₁ vs₁
            g₂vs₁ = eval e₂ vs₁
            g₁vs₂ = eval e₁ vs₂
            g₂vs₂ = eval e₂ vs₂
        in
            ≼ᵍ-trans
                {minus g₁vs₁ g₂vs₁} {minus g₁vs₂ g₂vs₁} {minus g₁vs₂ g₂vs₂}
                (minus-Monoˡ g₂vs₁ {g₁vs₁} {g₁vs₂} (eval-Mono e₁ {vs₁} {vs₂} vs₁≼vs₂))
                (minus-Monoʳ g₁vs₂ {g₂vs₁} {g₂vs₂} (eval-Mono e₂ {vs₁} {vs₂} vs₁≼vs₂))
    eval-Mono (` k) {vs₁@((kvs₁ , _) , _)} {vs₂@((kvs₂ , _), _)} vs₁≼vs₂
        with ∈k-decᵛ k kvs₁ | ∈k-decᵛ k kvs₂
    ...   | yes k∈kvs₁ | yes k∈kvs₂ =
            let
                (v₁ , k,v₁∈vs₁) = locateᵛ {k} {vs₁} k∈kvs₁
                (v₂ , k,v₂∈vs₂) = locateᵛ {k} {vs₂} k∈kvs₂
            in
                m₁≼m₂⇒m₁[k]ᵛ≼m₂[k]ᵛ vs₁ vs₂ vs₁≼vs₂ k,v₁∈vs₁ k,v₂∈vs₂
    ...   | yes k∈kvs₁ | no k∉kvs₂ = ⊥-elim (k∉kvs₂ (subst (λ l → k ∈ˡ l) (all-equal-keysᵛ vs₁ vs₂) k∈kvs₁))
    ...   | no k∉kvs₁ | yes k∈kvs₂ = ⊥-elim (k∉kvs₁ (subst (λ l → k ∈ˡ l) (all-equal-keysᵛ vs₂ vs₁) k∈kvs₂))
    ...   | no k∉kvs₁ | no k∉kvs₂ = IsLattice.≈-refl isLatticeᵍ
    eval-Mono (# 0) _ = ≈ᵍ-refl
    eval-Mono (# (suc n')) _ = ≈ᵍ-refl

That’s all we need. With this, I just instantiate the Forward module we have been working with, and make use of the result. I also used a show function (which I defined) to stringify that output.

From Sign.agda, lines 225 through 229

    module ForwardWithEval = ForwardWithProg.WithEvaluator eval eval-Mono
    open ForwardWithEval using (result)

    -- For debugging purposes, print out the result.
    output = show result

But wait, result? We haven’t seen a result just yet. That’s the last piece, and it involves finally making use of the fixed-point algorithm.

Invoking the Fixed Point Algorithm

Our $\text{Info}$ lattice is of finite height, and the function we have defined is monotonic (by virtue of being constructed only from map updates, which are monotonic by Exercise 4.26, and from function composition, which preserves monotonicity). We can therefore apply the fixed-point algorithm, and compute the least fixed point:

From Forward.agda, lines 235 through 238

        open import Fixedpoint ≈ᵐ-dec isFiniteHeightLatticeᵐ analyze (λ {m₁} {m₂} m₁≼m₂ → analyze-Mono {m₁} {m₂} m₁≼m₂)
            using ()
            renaming (aᶠ to result; aᶠ≈faᶠ to result≈analyze-result)
            public

With this, result is the output of our forward analysis!

In a Main.agda file, I invoked this analysis on a sample program:

testCode : Stmt
testCode =
    ⟨ "zero" ← (# 0) ⟩ then
    ⟨ "pos" ← ((` "zero") Expr.+ (# 1)) ⟩ then
    ⟨ "neg" ← ((` "zero") Expr.- (# 1)) ⟩ then
    ⟨ "unknown" ← ((` "pos") Expr.+ (` "neg")) ⟩

testProgram : Program
testProgram = record
    { rootStmt = testCode
    }

open WithProg testProgram using (output; analyze-correct)

main = run {0ℓ} (putStrLn output)

The result is verbose, since it shows variable signs for each statement in the program. However, the key is the last basic block, which shows the variables at the end of the program. It reads:

{"neg" ↦ -, "pos" ↦ +, "unknown" ↦ ⊤, "zero" ↦ 0, }
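
To see where these signs come from, here is a small Python re-creation of the whole pipeline for this particular program. It is a sketch under several assumptions, not a port of the Agda code: each assignment gets its own CFG node (the real graph may group statements differently), expressions are encoded as tuples, the minus table is my own reading of "similar to plus", and all names are hypothetical.

```python
BOT, TOP = "⊥", "⊤"

PLUS = {("+", "+"): "+", ("+", "-"): TOP, ("+", "0"): "+",
        ("-", "+"): TOP, ("-", "-"): "-", ("-", "0"): "-",
        ("0", "+"): "+", ("0", "-"): "-", ("0", "0"): "0"}
# Assumed table for minus; the post only shows the one for plus.
MINUS = {("+", "+"): TOP, ("+", "-"): "+", ("+", "0"): "+",
         ("-", "+"): "-", ("-", "-"): TOP, ("-", "0"): "-",
         ("0", "+"): "-", ("0", "-"): "+", ("0", "0"): "0"}


def lift(table):
    """Extend a table on known signs to the lattice with ⊥ and ⊤."""
    def op(a, b):
        if BOT in (a, b):
            return BOT
        if TOP in (a, b):
            return TOP
        return table[(a, b)]
    return op


plus, minus = lift(PLUS), lift(MINUS)


def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def eval_sign(expr, env):
    tag = expr[0]
    if tag == "num":
        return "0" if expr[1] == 0 else "+"
    if tag == "var":
        return env.get(expr[1], TOP)  # undefined variables get "any sign"
    left, right = eval_sign(expr[1], env), eval_sign(expr[2], env)
    return plus(left, right) if tag == "add" else minus(left, right)


# One node per assignment: (assigned variable, expression).
PROGRAM = [
    ("zero", ("num", 0)),
    ("pos", ("add", ("var", "zero"), ("num", 1))),
    ("neg", ("sub", ("var", "zero"), ("num", 1))),
    ("unknown", ("add", ("var", "pos"), ("var", "neg"))),
]
PREDS = {0: [], 1: [0], 2: [1], 3: [2]}
VARS = [name for name, _ in PROGRAM]


def analyze(state):
    """Join all predecessors of each node, then apply the node's assignment."""
    new_state = {}
    for node, (var, expr) in enumerate(PROGRAM):
        joined = {v: BOT for v in VARS}
        for p in PREDS[node]:
            joined = {v: join(joined[v], state[p][v]) for v in VARS}
        new_state[node] = {**joined, var: eval_sign(expr, joined)}
    return new_state


def fixed_point(f, start):
    """Iterate f from start until the value stops changing."""
    current = start
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


bottom = {node: {v: BOT for v in VARS} for node in PREDS}
result = fixed_point(analyze, bottom)
print(result[3])
```

Starting from the bottom element, each iteration pushes information one node further down the program, so the last node only settles after a few rounds.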

Verifying the Analysis

We now have a general framework for running forward analyses: you provide an abstract interpretation function for expressions, as well as a proof that this function is monotonic, and you get an Agda function that takes a program and tells you the variable states at every point. If your abstract interpretation function is for determining the signs of expressions, the final result is an analysis that determines all possible signs for all variables, anywhere in the code. It’s pretty easy to instantiate this framework with another type of forward analysis — in fact, by switching the plus function to one that uses AboveBelow ℤ, rather than AboveBelow Sign:

plus : ConstLattice → ConstLattice → ConstLattice
plus ⊥ᶜ _ = ⊥ᶜ
plus _ ⊥ᶜ = ⊥ᶜ
plus ⊤ᶜ _ = ⊤ᶜ
plus _ ⊤ᶜ = ⊤ᶜ
plus [ z₁ ]ᶜ [ z₂ ]ᶜ = [ z₁ Int.+ z₂ ]ᶜ

we can define a constant-propagation analysis.

{"neg" ↦ -1, "pos" ↦ 1, "unknown" ↦ 0, "zero" ↦ 0, }
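
As a rough illustration of how little changes between the two analyses, here is the constant-propagation operator in Python, applied by hand to the four assignments of the test program. The string markers for ⊥ and ⊤, the lift helper, and the minus_const operator are assumptions made for this sketch; the post only shows plus.

```python
import operator

BOT, TOP = "⊥", "⊤"


def lift(int_op):
    """Lift an operation on integers to the lattice of constants with ⊥ and ⊤."""
    def op(a, b):
        if a == BOT or b == BOT:
            return BOT
        if a == TOP or b == TOP:
            return TOP
        return int_op(a, b)
    return op


plus_const = lift(operator.add)
minus_const = lift(operator.sub)  # assumed counterpart of plus

# The test program, evaluated statement by statement.
zero = 0
pos = plus_const(zero, 1)
neg = minus_const(zero, 1)
unknown = plus_const(pos, neg)
print({"neg": neg, "pos": pos, "unknown": unknown, "zero": zero})
```

Unlike the sign analysis, this one can tell that pos + neg is exactly 0, because the lattice elements carry the values themselves.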

However, we haven’t proved our analysis correct, and we haven’t yet made use of the CFG-semantics equivalence that we proved in the previous section. I was hoping to get to it in this post, but there was just too much to cover. So, I will get to that in the next post, where we will make use of the remaining machinery to demonstrate that the output of our analyzer matches reality.

Implementing and Verifying "Static Program Analysis" in Agda, Part 7: Connecting Semantics and Control Flow Graphs

Thu, 28 Nov 2024 20:32:00 -0700

In the previous two posts, I covered two ways of looking at programs in my little toy language:

In part 5, I covered the formal semantics of the programming language. These are precise rules that describe how programs are executed. These serve as the source of truth for what each statement and expression does.

Because they are the source of truth, they capture all information about how programs are executed. To determine that a program starts in one environment and ends in another (getting a judgement $\rho_1, s \Rightarrow \rho_2$), we need to actually run the program. In fact, our Agda definitions encoding the semantics actually produce proof trees, which contain every single step of the program’s execution.
In part 6, I covered Control Flow Graphs (CFGs), which in short arranged code into a structure that represents how execution moves from one statement or expression to the next.

Unlike the semantics, CFGs do not capture a program’s entire execution; they merely contain the possible orders in which statements can be evaluated. Instead of capturing the exact number of iterations performed by a while loop, they encode repetition as cycles in the graph. Because they are missing some information, they’re more of an approximation of a program’s behavior.

Our analyses operate on CFGs, but it is our semantics that actually determine how a program behaves. In order for our analyses to be able to produce correct results, we need to make sure that there isn’t a disconnect between the approximation and the truth. In the previous post, I stated the property I will use to establish the connection between the two perspectives:

For each possible execution of a program according to its semantics, there exists a corresponding path through the graph.

By ensuring this property, we will guarantee that our Control Flow Graphs account for anything that might happen. Thus, a correct analysis built on top of the graphs will produce results that match reality.

Traces: Paths Through a Graph

A CFG contains each “basic” statement in our program, by definition; when we’re executing the program, we are therefore running code in one of the CFG’s nodes. When we switch from one node to another, there ought to be an edge between the two, since edges in the CFG encode possible control flow. We keep doing this until the program terminates (if ever).

Now, I said that there “ought to be edges” in the graph that correspond to our program’s execution. Moreover, the endpoints of these edges have to line up, since we can only switch which basic block / node we’re executing by following an edge. As a result, if our CFG is correct, then for every program execution, there is a path between the CFG’s nodes that matches the statements that we were executing.

Take the following program and CFG from the previous post as an example.

x = 2;
while x {
  x = x - 1;
}
y = x;

We start by executing x = 2, which is the top node in the CFG. Then, we execute the condition of the loop, x. This condition is in the second node from the top; fortunately, there exists an edge between x = 2 and x that allows for this possibility. Once we have computed x, we know that it’s nonzero, and therefore we proceed to the loop body. This is the statement x = x - 1, contained in the bottom left node in the CFG. There is once again an edge between x and that node; so far, so good. Once we’re done executing the statement, we go back to the top of the loop again, following the edge back to the middle node. We then execute the condition, loop body, and condition again. At that point we have reduced x to zero, so the condition produces a falsey value. We exit the loop and execute y = x, which is allowed by the edge from the middle node to the bottom right node.
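
The walk just described can be replayed mechanically. In the Python sketch below, the numbering of the four nodes and the edge set are my own hypothetical encoding of the pictured CFG; the code runs the example program, records which node each step belongs to, and checks that consecutive nodes are always joined by an edge.

```python
# Hypothetical numbering of the four CFG nodes described above.
NODES = {0: "x = 2", 1: "x", 2: "x = x - 1", 3: "y = x"}
EDGES = {(0, 1), (1, 2), (2, 1), (1, 3)}


def run_and_record():
    """Execute the example program, recording the CFG node of each step."""
    visited = []
    env = {}
    visited.append(0)
    env["x"] = 2
    visited.append(1)  # evaluate the loop condition
    while env["x"] != 0:
        visited.append(2)
        env["x"] = env["x"] - 1
        visited.append(1)  # re-evaluate the condition
    visited.append(3)
    env["y"] = env["x"]
    return visited, env


def is_path(nodes, edges):
    """True when every consecutive pair of nodes is joined by an edge."""
    return all((a, b) in edges for a, b in zip(nodes, nodes[1:]))


visited, env = run_and_record()
print([NODES[i] for i in visited], is_path(visited, EDGES))
```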

We will want to show that every possible execution of the program (e.g., with different variable assignments) corresponds to a path in the CFG. If one doesn’t, then our program can do something that our CFG doesn’t account for, which means that our analyses will not be correct.

I will define a Trace datatype, which will be an embellished path through the graph. At its core, a path is simply a list of indices together with edges that connect them. Viewed another way, it’s a list of edges, where each edge’s endpoint is the next edge’s starting point. We want to make illegal states unrepresentable, and therefore use the type system to assert that the edges are compatible. The easiest way to do this is by making our Trace indexed by its start and end points. An empty trace, containing no edges, will start and end in the same node; the :: equivalent for the trace will allow prepending one edge, starting at node i1 and ending in i2, to another trace which starts in i2 and ends in some arbitrary i3. Here’s an initial stab at that:

module _ {g : Graph} where
    open Graph g using (Index; edges; inputs; outputs)

    data Trace : Index → Index → Set where
        Trace-single : ∀ {idx : Index} → Trace idx idx
        Trace-edge : ∀ {idx₁ idx₂ idx₃ : Index} →
                     (idx₁ , idx₂) ∈ edges →
                     Trace idx₂ idx₃ → Trace idx₁ idx₃

This isn’t enough, though. Suppose you had a function that takes an evaluation judgement and produces a trace, resulting in a signature like this:

buildCfg-sufficient : ∀ {s : Stmt} {ρ₁ ρ₂ : Env} → ρ₁ , s ⇒ˢ ρ₂ →
                      let g = buildCfg s
                      in Σ (Index g × Index g) (λ (idx₁ , idx₂) → Trace {g} idx₁ idx₂)

What’s stopping this function from returning any trace through the graph, including one that doesn’t even include the statements in our program s? We need to narrow the type somewhat to require that the nodes it visits have some relation to the program execution in question.

We could do this by indexing the Trace data type by a list of statements that we expect it to match, and requiring that for each constructor, the statements of the starting node be at the front of that list. We could compute the list of executed statements in order using a recursive function on the _,_⇒ˢ_ data type. [note: I mentioned earlier that our encoding of the semantics is actually defining a proof tree, which includes every step of the computation. That's why we can write a function that takes the proof tree and extracts the executed statements. ]

That would work, but it loses a bit of information. The execution judgement contains not only each statement that was evaluated, but also the environments before and after evaluating it. Keeping those around will be useful: eventually, we’d like to state the invariant that at every CFG node, the results of our analysis match the current program environment. Thus, instead of indexing simply by the statements of code, I chose to index my Trace by the starting and ending environment, and to require it to contain evaluation judgements for each node’s code. The judgements include the statements that were evaluated, which we can match against the code in the CFG node. However, they also assert that the environments before and after are connected by that code in the language’s formal semantics. The resulting definition is as follows:

From Traces.agda, lines 10 through 18

module _ {g : Graph} where
    open Graph g using (Index; edges; inputs; outputs)

    data Trace : Index → Index → Env → Env → Set where
        Trace-single : ∀ {ρ₁ ρ₂ : Env} {idx : Index} →
                       ρ₁ , (g [ idx ]) ⇒ᵇˢ ρ₂ → Trace idx idx ρ₁ ρ₂
        Trace-edge : ∀ {ρ₁ ρ₂ ρ₃ : Env} {idx₁ idx₂ idx₃ : Index} →
                     ρ₁ , (g [ idx₁ ]) ⇒ᵇˢ ρ₂ → (idx₁ , idx₂) ∈ edges →
                     Trace idx₂ idx₃ ρ₂ ρ₃ → Trace idx₁ idx₃ ρ₁ ρ₃

The g [ idx ] and g [ idx₁ ] represent accessing the basic block code at indices idx and idx₁ in graph g.
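
The Agda type rules out ill-formed traces by construction. A dynamically checked analogue may help build intuition: the Python sketch below validates a list of steps after the fact. The representation is an assumption made for illustration: a step is a triple of node index, environment before, and environment after; each node's code is modeled as a function on environments; and the condition node is assumed to leave the environment unchanged.

```python
from typing import Callable, Dict, List, Set, Tuple

Env = Dict[str, int]
Step = Tuple[int, Env, Env]  # (node index, environment before, environment after)


def is_valid_trace(
    steps: List[Step],
    code: Dict[int, Callable[[Env], Env]],
    edges: Set[Tuple[int, int]],
) -> bool:
    if not steps:
        return False  # even Trace-single carries one evaluation judgement
    # Each step must agree with the code stored at its node.
    for idx, before, after in steps:
        if code[idx](before) != after:
            return False
    # Consecutive steps must follow an edge and share an environment.
    for (i1, _, after1), (i2, before2, _) in zip(steps, steps[1:]):
        if (i1, i2) not in edges or after1 != before2:
            return False
    return True


# Hypothetical encoding of the example program's four nodes.
CODE = {
    0: lambda env: {**env, "x": 2},
    1: lambda env: dict(env),
    2: lambda env: {**env, "x": env["x"] - 1},
    3: lambda env: {**env, "y": env["x"]},
}
EDGES = {(0, 1), (1, 2), (2, 1), (1, 3)}

good = [
    (0, {}, {"x": 2}),
    (1, {"x": 2}, {"x": 2}),
    (2, {"x": 2}, {"x": 1}),
    (1, {"x": 1}, {"x": 1}),
    (2, {"x": 1}, {"x": 0}),
    (1, {"x": 0}, {"x": 0}),
    (3, {"x": 0}, {"x": 0, "y": 0}),
]
broken = [good[0], good[2]]  # skips the condition node: no edge from 0 to 2

print(is_valid_trace(good, CODE, EDGES), is_valid_trace(broken, CODE, EDGES))
```

In Agda, the broken list could not even be written down, because there would be no proof of edge membership to supply to Trace-edge.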

Trace Preservation by Graph Operations

Our proofs of trace existence will have the same “shape” as the functions that build the graph. To prove the trace property, we’ll assume that evaluations of sub-statements correspond to traces in the sub-graphs, and use that to prove that the full statements have corresponding traces in the full graph. We built up graphs by combining sub-graphs for sub-statements, using _∙_ (overlaying two graphs), _↦_ (sequencing two graphs) and loop (creating a zero-or-more loop in the graph). Thus, to make the jump from sub-graphs to full graphs, we’ll need to prove that traces persist through overlaying, sequencing, and looping.

Take _∙_, for instance; we want to show that if a trace exists in the left operand of overlaying, it also exists in the final graph. This leads to the following statement and proof:

From Properties.agda, lines 88 through 97

Trace-∙ˡ : ∀ {g₁ g₂ : Graph} {idx₁ idx₂ : Graph.Index g₁} {ρ₁ ρ₂ : Env} →
           Trace {g₁} idx₁ idx₂ ρ₁ ρ₂ →
           Trace {g₁ ∙ g₂} (idx₁ Fin.↑ˡ Graph.size g₂) (idx₂ Fin.↑ˡ Graph.size g₂) ρ₁ ρ₂
Trace-∙ˡ {g₁} {g₂} {idx₁} {idx₁} (Trace-single ρ₁⇒ρ₂)
    rewrite sym (lookup-++ˡ (Graph.nodes g₁) (Graph.nodes g₂) idx₁) =
    Trace-single ρ₁⇒ρ₂
Trace-∙ˡ {g₁} {g₂} {idx₁} (Trace-edge ρ₁⇒ρ idx₁→idx tr')
    rewrite sym (lookup-++ˡ (Graph.nodes g₁) (Graph.nodes g₂) idx₁) =
    Trace-edge ρ₁⇒ρ (ListMemProp.∈-++⁺ˡ (x∈xs⇒fx∈fxs (_↑ˡ Graph.size g₂) idx₁→idx))
                    (Trace-∙ˡ tr')

There are some details there to discuss.

First, we have to change the indices of the returned Trace. That’s because they start out as indices into the graph g₁, but become indices into the graph g₁ ∙ g₂. To take care of this re-indexing, we have to make use of the ↑ˡ operators, which I described in this section of the previous post.
Next, in either case, we need to show that the new index acquired via ↑ˡ returns the same basic block in the new graph as the old index returned in the original graph. Fortunately, the Agda standard library provides a proof of this, lookup-++ˡ. The resulting equality is the following:
```
g₁ [ idx₁ ] ≡ (g₁ ∙ g₂) [ idx₁ ↑ˡ Graph.size g₂ ]
```
This allows us to use the evaluation judgement in each constructor for traces in the output of the function.
Lastly, in the Trace-edge case, we have to additionally return a proof that the edge used by the trace still exists in the output graph. This follows from the fact that we include the edges from g₁ after re-indexing them.
```
    ; edges = (Graph.edges g₁ ↑ˡᵉ Graph.size g₂) List.++
              (Graph.size g₁ ↑ʳᵉ Graph.edges g₂)
```
The ↑ˡᵉ function is just a list map with ↑ˡ. Thus, if an edge (a pair of indices) is in the original list (Graph.edges g₁), as is evidenced by idx₁→idx, then its re-indexing is in the mapped list. To show this, I use the utility lemma x∈xs⇒fx∈fxs. The mapped list is the left-hand side of a List.++ operator, so I additionally use the lemma ∈-++⁺ˡ that shows membership is preserved by list concatenation.
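
The three points above can be checked on a small example in Python. This is only an untyped analogue with hypothetical names: indices are plain integers, so the counterpart of ↑ˡ leaves the number unchanged (in Agda it only changes the bound in the Fin type), while the counterpart of ↑ʳ adds the size of the left graph. Inputs and outputs are omitted for brevity.

```python
from typing import List, NamedTuple, Tuple


class Graph(NamedTuple):
    nodes: List[List[str]]        # one basic block (list of statements) per node
    edges: List[Tuple[int, int]]


def raise_left(idx: int, extra: int) -> int:
    """Counterpart of ↑ˡ: widen the index range by `extra`, keeping the value."""
    return idx


def raise_right(offset: int, idx: int) -> int:
    """Counterpart of ↑ʳ: shift the index past `offset` earlier nodes."""
    return offset + idx


def overlay(g1: Graph, g2: Graph) -> Graph:
    n1, n2 = len(g1.nodes), len(g2.nodes)
    return Graph(
        nodes=g1.nodes + g2.nodes,
        edges=[(raise_left(a, n2), raise_left(b, n2)) for a, b in g1.edges]
        + [(raise_right(n1, a), raise_right(n1, b)) for a, b in g2.edges],
    )


def is_path(indices: List[int], g: Graph) -> bool:
    return all((a, b) in g.edges for a, b in zip(indices, indices[1:]))


g1 = Graph(nodes=[["a = 1"], ["b = a"]], edges=[(0, 1)])
g2 = Graph(nodes=[["c = 2"], ["d = c"]], edges=[(0, 1)])
both = overlay(g1, g2)

left_path = [raise_left(i, len(g2.nodes)) for i in [0, 1]]
right_path = [raise_right(len(g1.nodes), i) for i in [0, 1]]
print(both.edges, is_path(left_path, both), is_path(right_path, both))
```

The check that both.nodes[i] equals g1.nodes[i] for indices of g1 is the runtime shadow of lookup-++ˡ.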

The proof of Trace-∙ʳ, the same property but for the right-hand operand g₂, is very similar, as are the proofs for sequencing. I give their statements, but not their proofs, below.

From Properties.agda, lines 99 through 101

Trace-∙ʳ : ∀ {g₁ g₂ : Graph} {idx₁ idx₂ : Graph.Index g₂} {ρ₁ ρ₂ : Env} →
           Trace {g₂} idx₁ idx₂ ρ₁ ρ₂ →
           Trace {g₁ ∙ g₂} (Graph.size g₁ Fin.↑ʳ idx₁) (Graph.size g₁ Fin.↑ʳ idx₂) ρ₁ ρ₂

From Properties.agda, lines 139 through 141

Trace-↦ˡ : ∀ {g₁ g₂ : Graph} {idx₁ idx₂ : Graph.Index g₁} {ρ₁ ρ₂ : Env} →
           Trace {g₁} idx₁ idx₂ ρ₁ ρ₂ →
           Trace {g₁ ↦ g₂} (idx₁ Fin.↑ˡ Graph.size g₂) (idx₂ Fin.↑ˡ Graph.size g₂) ρ₁ ρ₂

From Properties.agda, lines 150 through 152

Trace-↦ʳ : ∀ {g₁ g₂ : Graph} {idx₁ idx₂ : Graph.Index g₂} {ρ₁ ρ₂ : Env} →
           Trace {g₂} idx₁ idx₂ ρ₁ ρ₂ →
           Trace {g₁ ↦ g₂} (Graph.size g₁ Fin.↑ʳ idx₁) (Graph.size g₁ Fin.↑ʳ idx₂) ρ₁ ρ₂

From Properties.agda, lines 175 through 176

Trace-loop : ∀ {g : Graph} {idx₁ idx₂ : Graph.Index g} {ρ₁ ρ₂ : Env} →
             Trace {g} idx₁ idx₂ ρ₁ ρ₂ → Trace {loop g} (2 Fin.↑ʳ idx₁) (2 Fin.↑ʳ idx₂) ρ₁ ρ₂

Preserving traces is unfortunately not quite enough. The thing that we’re missing is looping: the same sub-graph can be re-traversed several times as part of execution, which suggests that we ought to be able to combine multiple traces through a loop graph into one. Using our earlier concrete example, we might have traces for evaluating x then x = x - 1 with the variable x being mapped first to 2 and then to 1. These traces occur back-to-back, so we will put them together into a single trace. To prove some properties about this, I’ll define a more precise type of trace.

End-To-End Traces

The key way that traces through a loop graph are combined is through the back-edges. Specifically, our loop graphs have edges from each of the output nodes to each of the input nodes. Thus, if we have two paths, both starting at the beginning of the graph and ending at the end, we know that the first path’s end has an edge to the second path’s beginning. This is enough to combine them.

This logic doesn’t work if one of the paths ends in the middle of the graph, and not on one of the outputs. That’s because there is no guarantee that there is a connecting edge.

To make things easier, I defined a new data type of “end-to-end” traces, whose first nodes are one of the graph’s inputs, and whose last nodes are one of the graph’s outputs.

From Traces.agda, lines 27 through 36

    record EndToEndTrace (ρ₁ ρ₂ : Env) : Set where
        constructor MkEndToEndTrace
        field
            idx₁ : Index
            idx₁∈inputs : idx₁ ∈ inputs

            idx₂ : Index
            idx₂∈outputs : idx₂ ∈ outputs

            trace : Trace idx₁ idx₂ ρ₁ ρ₂

We can trivially lift the proofs from the previous section to end-to-end traces. For example, here’s the lifted version of the first property we proved:

From Properties.agda, lines 110 through 121

EndToEndTrace-∙ˡ : ∀ {g₁ g₂ : Graph} {ρ₁ ρ₂ : Env} →
                   EndToEndTrace {g₁} ρ₁ ρ₂ →
                   EndToEndTrace {g₁ ∙ g₂} ρ₁ ρ₂
EndToEndTrace-∙ˡ {g₁} {g₂} etr = record
    { idx₁ = EndToEndTrace.idx₁ etr Fin.↑ˡ Graph.size g₂
    ; idx₁∈inputs = ListMemProp.∈-++⁺ˡ (x∈xs⇒fx∈fxs (Fin._↑ˡ Graph.size g₂)
                                                    (EndToEndTrace.idx₁∈inputs etr))
    ; idx₂ = EndToEndTrace.idx₂ etr Fin.↑ˡ Graph.size g₂
    ; idx₂∈outputs = ListMemProp.∈-++⁺ˡ (x∈xs⇒fx∈fxs (Fin._↑ˡ Graph.size g₂)
                                                    (EndToEndTrace.idx₂∈outputs etr))
    ; trace = Trace-∙ˡ (EndToEndTrace.trace etr)
    }

The other lifted properties are similar.

For looping, the proofs get far more tedious, because of just how many sources of edges there are in the output graph — they span four lines:

From Graphs.agda, lines 84 through 94

loop : Graph → Graph
loop g = record
    { size = 2 Nat.+ Graph.size g
    ; nodes = [] ∷ [] ∷ Graph.nodes g
    ; edges = (2 ↑ʳᵉ Graph.edges g) List.++
              List.map (zero ,_) (2 ↑ʳⁱ Graph.inputs g) List.++
              List.map (_, suc zero) (2 ↑ʳⁱ Graph.outputs g) List.++
              ((suc zero , zero) ∷ (zero , suc zero) ∷ [])
    ; inputs = zero ∷ []
    ; outputs = (suc zero) ∷ []
    }

I therefore made use of two helper lemmas. The first is about list membership under concatenation. Simply put, if you concatenate a bunch of lists, and one of them (l) contains some element x, then the concatenation contains x too.

From Utils.agda, lines 82 through 85

concat-∈ : ∀ {a} {A : Set a} {x : A} {l : List A} {ls : List (List A)} →
           x ∈ l → l ∈ ls → x ∈ foldr _++_ [] ls
concat-∈ x∈l (here refl) = ListMemProp.∈-++⁺ˡ x∈l
concat-∈ {ls = l' ∷ ls'} x∈l (there l∈ls') = ListMemProp.∈-++⁺ʳ l' (concat-∈ x∈l l∈ls')

I then specialized this lemma for concatenated groups of edges.

From Properties.agda, lines 162 through 172

loop-edge-groups : ∀ (g : Graph) → List (List (Graph.Edge (loop g)))
loop-edge-groups g =
    (2 ↑ʳᵉ Graph.edges g) ∷
    (List.map (zero ,_) (2 ↑ʳⁱ Graph.inputs g)) ∷
    (List.map (_, suc zero) (2 ↑ʳⁱ Graph.outputs g)) ∷
    ((suc zero , zero) ∷ (zero , suc zero) ∷ []) ∷
    []

loop-edge-help : ∀ (g : Graph) {l : List (Graph.Edge (loop g))} {e : Graph.Edge (loop g)} →
                 e ListMem.∈ l → l ListMem.∈ loop-edge-groups g →
                 e ListMem.∈ Graph.edges (loop g)

Now we can finally prove end-to-end properties of loop graphs. The simplest one is that they allow the code within them to be entirely bypassed (as when the loop body is evaluated zero times). I called this EndToEndTrace-loop⁰. The “input” node of the loop graph is index zero, while the “output” node of the loop graph is index suc zero. Thus, the key step is to show that an edge between these two indices exists:

From Properties.agda, lines 227 through 240

EndToEndTrace-loop⁰ : ∀ {g : Graph} {ρ : Env} →
                      EndToEndTrace {loop g} ρ ρ
EndToEndTrace-loop⁰ {g} {ρ} =
    let
        zero→suc = loop-edge-help g (there (here refl))
                                    (there (there (there (here refl))))
    in
        record
            { idx₁ = zero
            ; idx₁∈inputs = here refl
            ; idx₂ = suc zero
            ; idx₂∈outputs = here refl
            ; trace = Trace-single [] ++⟨ zero→suc ⟩ Trace-single []
            }

The only remaining novelty is the trace field of the returned EndToEndTrace. It uses the trace concatenation operation ++⟨_⟩. This operator allows concatenating two traces, which start and end at distinct nodes, as long as there’s an edge that connects them:

From Traces.agda, lines 21 through 25

    _++⟨_⟩_ : ∀ {idx₁ idx₂ idx₃ idx₄ : Index} {ρ₁ ρ₂ ρ₃ : Env} →
           Trace idx₁ idx₂ ρ₁ ρ₂ → (idx₂ , idx₃) ∈ edges →
           Trace idx₃ idx₄ ρ₂ ρ₃ → Trace idx₁ idx₄ ρ₁ ρ₃
    _++⟨_⟩_ (Trace-single ρ₁⇒ρ₂) idx₂→idx₃ tr = Trace-edge ρ₁⇒ρ₂ idx₂→idx₃ tr
    _++⟨_⟩_ (Trace-edge ρ₁⇒ρ₂ idx₁→idx' tr') idx₂→idx₃ tr = Trace-edge ρ₁⇒ρ₂ idx₁→idx' (tr' ++⟨ idx₂→idx₃ ⟩ tr)

The expression on line 239 of Properties.agda is simply the single-edge trace constructed from the edge 0 -> 1 that connects the start and end nodes of the loop graph. Both of those nodes are empty, so no code is evaluated in that case.

The proof for combining several traces through a loop follows a very similar pattern. However, instead of constructing a single-edge trace as we did above, it concatenates two traces from its arguments. Also, instead of using the edge from the first node to the last, it instead uses an edge from the last to the first, as I described at the very beginning of this section.

From Properties.agda, lines 209 through 225

EndToEndTrace-loop² : ∀ {g : Graph} {ρ₁ ρ₂ ρ₃ : Env} →
                      EndToEndTrace {loop g} ρ₁ ρ₂ →
                      EndToEndTrace {loop g} ρ₂ ρ₃ →
                      EndToEndTrace {loop g} ρ₁ ρ₃
EndToEndTrace-loop² {g} (MkEndToEndTrace zero (here refl) (suc zero) (here refl) tr₁)
                        (MkEndToEndTrace zero (here refl) (suc zero) (here refl) tr₂) =
    let
        suc→zero = loop-edge-help g (here refl)
                                    (there (there (there (here refl))))
    in
        record
            { idx₁ = zero
            ; idx₁∈inputs = here refl
            ; idx₂ = suc zero
            ; idx₂∈outputs = here refl
            ; trace = tr₁ ++⟨ suc→zero ⟩ tr₂
            }

Proof of Sufficiency

We now have all the pieces to show each execution of our program has a corresponding trace through a graph. Here is the whole proof:

From Properties.agda, lines 281 through 296

buildCfg-sufficient : ∀ {s : Stmt} {ρ₁ ρ₂ : Env} → ρ₁ , s ⇒ˢ ρ₂ →
                      EndToEndTrace {buildCfg s} ρ₁ ρ₂
buildCfg-sufficient (⇒ˢ-⟨⟩ ρ₁ ρ₂ bs ρ₁,bs⇒ρ₂) =
    EndToEndTrace-singleton (ρ₁,bs⇒ρ₂ ∷ [])
buildCfg-sufficient (⇒ˢ-then ρ₁ ρ₂ ρ₃ s₁ s₂ ρ₁,s₁⇒ρ₂ ρ₂,s₂⇒ρ₃) =
    buildCfg-sufficient ρ₁,s₁⇒ρ₂ ++ buildCfg-sufficient ρ₂,s₂⇒ρ₃
buildCfg-sufficient (⇒ˢ-if-true ρ₁ ρ₂ _ _ s₁ s₂ _ _ ρ₁,s₁⇒ρ₂) =
    EndToEndTrace-∙ˡ (buildCfg-sufficient ρ₁,s₁⇒ρ₂)
buildCfg-sufficient (⇒ˢ-if-false ρ₁ ρ₂ _ s₁ s₂ _ ρ₁,s₂⇒ρ₂) =
    EndToEndTrace-∙ʳ {buildCfg s₁} (buildCfg-sufficient ρ₁,s₂⇒ρ₂)
buildCfg-sufficient (⇒ˢ-while-true ρ₁ ρ₂ ρ₃ _ _ s _ _ ρ₁,s⇒ρ₂ ρ₂,ws⇒ρ₃) =
    EndToEndTrace-loop² {buildCfg s}
                        (EndToEndTrace-loop {buildCfg s} (buildCfg-sufficient ρ₁,s⇒ρ₂))
                        (buildCfg-sufficient ρ₂,ws⇒ρ₃)
buildCfg-sufficient (⇒ˢ-while-false ρ _ s _) =
    EndToEndTrace-loop⁰ {buildCfg s} {ρ}

We proceed by checking what inference rule was used to execute a particular statement, [note: Precisely, we proceed by induction on the derivation of $\rho_1, s \Rightarrow \rho_2$. ] because that’s what tells us what the program did in that particular moment.

When executing a basic statement, we know that we constructed a singleton graph that contains one node with that statement. Thus, we can trivially construct a single-step trace without any edges.
When executing a sequence of statements, we have two induction hypotheses. These state that the sub-graphs we construct for the first and second statement have the trace property. We also have two evaluation judgements (one for each statement), which means that we can apply that property to get traces. The buildCfg function sequences the two graphs, and we can sequence the two traces through them, resulting in a trace through the final output.
For both the then and else cases of evaluating an if statement, we observe that buildCfg overlays the sub-graphs of the two branches using _∙_. We also know that the two sub-graphs have the trace property.
- In the then case, since we have an evaluation judgement for s₁ (in variable ρ₁,s₁⇒ρ₂), we conclude that there’s a correct trace through the then sub-graph. Since that graph is the left operand of _∙_, we use EndToEndTrace-∙ˡ to show that the trace is preserved in the full graph.
- In the else case things are symmetric. We are evaluating s₂, with a judgement given by ρ₁,s₂⇒ρ₂. We use that to conclude that there’s a trace through the graph built from s₂. Since this sub-graph is the right operand of _∙_, we use EndToEndTrace-∙ʳ to show that it’s preserved in the full graph.
For the true case of while, we have two evaluation judgements: one for the body and one for the loop again, this time in a new environment. They are stored in ρ₁,s⇒ρ₂ and ρ₂,ws⇒ρ₃, respectively. The statement being evaluated by ρ₂,ws⇒ρ₃ is actually the exact same statement that’s being evaluated at the top level of the proof. Thus, we can use EndToEndTrace-loop², which sequences two traces through the same graph.

We also use EndToEndTrace-loop to lift the trace through buildCfg s into a trace through buildCfg (while e s).
For the false case of the while, we don’t execute any instructions, and finish evaluating right away. This corresponds to the do-nothing trace, which we have established exists using EndToEndTrace-loop⁰.
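
The case analysis above can also be exercised by testing rather than proof. The following is a rough Python sketch under several assumptions: statements are encoded as tuples, conditions and right-hand sides are Python callables, node contents are omitted, and every function name is made up for the illustration. `run` executes a statement and builds a node path the same way the proof builds a trace; `is_end_to_end` then checks that the path starts at an input, ends at an output, and only follows edges. Passing on one program proves nothing in general, which is exactly what the Agda proof adds.

```python
from itertools import product

# A graph is (size, edges, inputs, outputs); node contents are omitted.
def singleton():
    return (1, set(), [0], [0])

def seq(g1, g2):
    n1, e1, i1, o1 = g1
    n2, e2, i2, o2 = g2
    shifted = {(a + n1, b + n1) for a, b in e2}
    bridge = set(product(o1, [i + n1 for i in i2]))
    return (n1 + n2, e1 | shifted | bridge, i1, [o + n1 for o in o2])

def overlay(g1, g2):
    n1, e1, i1, o1 = g1
    n2, e2, i2, o2 = g2
    shifted = {(a + n1, b + n1) for a, b in e2}
    return (n1 + n2, e1 | shifted,
            i1 + [i + n1 for i in i2], o1 + [o + n1 for o in o2])

def loop(g):
    n, e, i, o = g
    edges = {(a + 2, b + 2) for a, b in e}
    edges |= {(0, x + 2) for x in i} | {(x + 2, 1) for x in o}
    edges |= {(1, 0), (0, 1)}
    return (n + 2, edges, [0], [1])

def build_cfg(s):
    tag = s[0]
    if tag == "assign":
        return singleton()
    if tag == "seq":
        return seq(build_cfg(s[1]), build_cfg(s[2]))
    if tag == "if":
        return overlay(build_cfg(s[2]), build_cfg(s[3]))
    return loop(build_cfg(s[2]))  # "while"

def run(s, env):
    """Execute s; return (final env, node path through build_cfg(s))."""
    tag = s[0]
    if tag == "assign":
        return {**env, s[1]: s[2](env)}, [0]
    if tag == "seq":
        n1 = build_cfg(s[1])[0]
        env, p1 = run(s[1], env)
        env, p2 = run(s[2], env)
        return env, p1 + [i + n1 for i in p2]
    if tag == "if":
        if s[1](env) != 0:
            return run(s[2], env)          # left operand: indices unchanged
        n1 = build_cfg(s[2])[0]
        env, p = run(s[3], env)
        return env, [i + n1 for i in p]    # right operand: indices shifted
    if s[1](env) == 0:                     # "while", condition false
        return env, [0, 1]                 # the do-nothing trace
    env, body = run(s[2], env)             # one iteration of the body...
    env, rest = run(s, env)                # ...then the same loop again
    return env, [0] + [i + 2 for i in body] + [1] + rest

def is_end_to_end(g, path):
    _, edges, inputs, outputs = g
    steps = zip(path, path[1:])
    return (path[0] in inputs and path[-1] in outputs
            and all(step in edges for step in steps))

# x = 3; while x { x = x - 1 }; y = x
prog = ("seq", ("assign", "x", lambda e: 3),
        ("seq", ("while", lambda e: e["x"],
                 ("assign", "x", lambda e: e["x"] - 1)),
         ("assign", "y", lambda e: e["x"])))

final_env, path = run(prog, {})
```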

That’s it! We have now validated that the Control Flow Graphs we construct match the semantics of the programming language, which makes them a good input to our static program analyses. We can finally start writing those!

Defining and Verifying Static Program Analyses

We have all the pieces we need to define a formally-verified forward analysis:

We have used the framework of lattices to encode the precision of program analysis outputs. Smaller elements in a lattice are more specific, meaning more useful information.
We have implemented the fixed-point algorithm, which finds the smallest solutions to equations in the form $f(x) = x$ for monotonic functions over lattices. By defining our analysis as such a function, we can apply the algorithm to find the most precise steady-state description of our program.
We have defined how our programs are executed, which is crucial for defining “correctness”.

Here’s how these pieces will fit together. We will construct a finite-height lattice. Every single element of this lattice will contain information about each variable at each node in the Control Flow Graph. We will then define a monotonic function that updates this information using the structure encoded in the CFG’s edges and nodes. Then, using the fixed-point algorithm, we will find the least fixed point of this function, which will give us a precise description of all program variables at all points in the program. Because we have just validated our CFGs to be faithful to the language’s semantics, we’ll be able to prove that our algorithm produces accurate results.
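
As a reminder of what the fixed-point step looks like operationally, here is a minimal Python sketch. It is only an illustration: the lattice (subsets of {0, ..., 4} ordered by inclusion) and the function `step` are toy assumptions, not the analysis itself. Starting from the bottom element and applying a monotonic function repeatedly must terminate on a finite-height lattice, and the value it stops at is the least fixed point.

```python
def least_fixed_point(f, bottom):
    """Iterate f from the bottom element until the value stops changing."""
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Toy monotonic function: always add 0, and add n + 1 for every n < 4
# that is already in the set.
def step(s):
    return s | {0} | {n + 1 for n in s if n < 4}

result = least_fixed_point(step, frozenset())
```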

The next post or two will be the last stretch; I hope to see you there!

Implementing and Verifying "Static Program Analysis" in Agda, Part 6: Control Flow Graphs

Wed, 27 Nov 2024 16:26:42 -0700

In the previous section, I’ve given a formal definition of the programming language that I’ve been trying to analyze. This formal definition serves as the “ground truth” for how our little imperative programs are executed; however, program analyses (especially in practice) seldom take the formal semantics as input. Instead, they focus on more pragmatic program representations from the world of compilers. One such representation is the Control Flow Graph (CFG). That’s what I want to discuss in this post.

Let’s start by building some informal intuition. CFGs are pretty much what their name suggests: they are a type of graph; their edges show how execution might jump from one piece of code to another (how control might flow).

For example, take the below program.

x = ...;
if x {
  x = 1;
} else {
  x = 0;
}
y = x;

The CFG might look like this:

Here, the initialization of x with ..., as well as the if condition (just x), are guaranteed to execute one after another, so they occupy a single node. From there, depending on the condition, the control flow can jump to one of the branches of the if statement: the “then” branch if the condition is truthy, and the “else” branch if the condition is falsy. As a result, there are two arrows coming out of the initial node. Once either branch is executed, control always jumps to the code right after the if statement (the y = x). Thus, both the x = 1 and x = 0 nodes have a single arrow to the y = x node.
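
The graph described above can be written down as plain data. This is a small Python sketch; the node numbering and the `successors` helper are choices made for the illustration only.

```python
# Node labels are the code in each basic block.
nodes = ["x = ...; if x", "x = 1", "x = 0", "y = x"]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]

def successors(i):
    """Nodes that control can jump to directly from node i."""
    return [dst for src, dst in edges if src == i]
```

Here `successors(0)` contains both branches, while each branch has the `y = x` node as its only successor.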

As another example, if you had a loop:

x = ...;
while x {
  x = x - 1;
}
y = x;

The CFG would look like this:

Here, the condition of the loop (x) is not always guaranteed to execute together with the code that initializes x. That’s because the condition of the loop is checked after every iteration, whereas the code before the loop is executed only once. As a result, x = ... and x occupy distinct CFG nodes. From there, the control flow can proceed in two different ways, depending on the value of x. If x is truthy, the program will proceed to the loop body (decrementing x). If x is falsy, the program will skip the loop body altogether, and go to the code right after the loop (y = x). This is indicated by the two arrows going out of the x node. After executing the body, we return to the condition of the loop to see if we need to run another iteration. Because of this, the decrementing node has an arrow back to the loop condition.
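
The same description can be checked mechanically. In this Python sketch (node numbering and helper names are assumptions for the illustration), `path_for` lists the nodes visited when the body runs a given number of times, and `follows_edges` confirms that every step of such a path is an arrow in the graph.

```python
nodes = ["x = ...", "x", "x = x - 1", "y = x"]
edges = {(0, 1), (1, 2), (2, 1), (1, 3)}

def path_for(iterations):
    """Node sequence visited when the loop body runs `iterations` times."""
    path = [0, 1]
    for _ in range(iterations):
        path += [2, 1]
    return path + [3]

def follows_edges(path):
    return all(step in edges for step in zip(path, path[1:]))
```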

Now, let’s be a bit more precise. Control Flow Graphs are defined as follows:

The nodes are basic blocks. Paraphrasing Wikipedia’s definition, a basic block is a piece of code that has only one entry point and one exit point.

The one-entry-point rule means that it’s not possible to jump into the middle of the basic block, executing only half of its instructions. The execution of a basic block always begins at the top. Symmetrically, the one-exit-point rule means that you can’t jump away to other code, skipping some instructions. The execution of a basic block always ends at the bottom.

As a result of these constraints, when running a basic block, you are guaranteed to execute every instruction in exactly the order they occur in, and execute each instruction exactly once.
The edges are jumps between basic blocks. We’ve already seen how if and while statements introduce these jumps.

Basic blocks can only be made of code that doesn’t jump (otherwise, we violate the single-exit-point policy). In the previous post, we defined exactly this kind of code as simple statements. So, in our control flow graph, nodes will be sequences of simple statements.

Control Flow Graphs in Agda

Basic Definition

At an abstract level, it’s easy to say “it’s just a graph where X is Y” about anything. It’s much harder to give a precise definition of such a graph, particularly if you want to rule out invalid graphs (e.g., ones with edges pointing nowhere). In Agda, I chose to represent a CFG with two lists: one of nodes, and one of edges. Each node is simply a list of BasicStmts, as I described in a preceding paragraph. An edge is simply a pair of numbers, each number encoding the index of the node connected by the edge.

Here’s where it gets a little complicated. I don’t want to use plain natural numbers for indices, because that means you can easily introduce “broken” edges. For example, what if you have 4 nodes, and you have an edge (5, 5)? To avoid this, I picked the finite natural numbers represented by Fin as endpoints for edges.

data Fin : ℕ → Set where
  zero : Fin (suc n)
  suc  : (i : Fin n) → Fin (suc n)

Specifically, Fin n is the type of natural numbers less than n. Following this definition, Fin 3 represents the numbers 0, 1 and 2. These are represented using the same constructors as Nat: zero and suc. The type of zero is Fin (suc n) for any n; this makes sense because zero is less than any number plus one. For suc, the bound n of the input i is incremented by one, leading to another suc n in the final type. This makes sense because if i < n, then i + 1 < n + 1. I’ve previously explained this data type in another post on this site.
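
For readers more comfortable outside of Agda, here is a loose Python analogue. It is a sketch only: Python cannot track the bound in the type, so the `Fin` class below (a name reused purely for illustration) stores the bound next to the value and checks it at run time, whereas Agda rules out bad indices before the program runs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fin:
    """A natural number `value` known to be less than `bound`."""
    value: int
    bound: int

    def __post_init__(self):
        if not 0 <= self.value < self.bound:
            raise ValueError(f"{self.value} is not a member of Fin {self.bound}")

def zero(n):
    """zero : Fin (suc n)"""
    return Fin(0, n + 1)

def suc(i):
    """suc : Fin n -> Fin (suc n)"""
    return Fin(i.value + 1, i.bound + 1)

# The three inhabitants of Fin 3.
fin3 = [zero(2), suc(zero(1)), suc(suc(zero(0)))]
```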

Here’s my definition of Graphs written using Fin:

From Graphs.agda, lines 24 through 39

record Graph : Set where
    constructor MkGraph
    field
        size : ℕ

    Index : Set
    Index = Fin size

    Edge : Set
    Edge = Index × Index

    field
        nodes : Vec (List BasicStmt) size
        edges : List Edge
        inputs : List Index
        outputs : List Index

I explicitly used a size field, which determines how many nodes are in the graph, and serves as the upper bound for the edge indices. From there, an index Index into the node list is just a natural number less than size, [note: There are size natural numbers less than size: 0, 1, ..., size - 1. ] and an edge is just a pair of indices. The graph then contains a vector (exact-length list) nodes of all the basic blocks, and then a list of edges edges.

There are two fields here that I have not yet said anything about: inputs and outputs. When we have a complete CFG for our programs, these fields are totally unnecessary. However, as we are building the CFG, these will come in handy, by telling us how to stitch together smaller sub-graphs that we’ve already built. Let’s talk about that next.
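
To see what the Fin-based indices buy us, compare with a rough Python counterpart of the record. This is a sketch with assumed names; since plain integers carry no bound, the invalid-edge check that Agda performs statically has to be written by hand and only fires at run time.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    nodes: list                                   # one list of basic statements per node
    edges: list = field(default_factory=list)     # pairs of node indices
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    @property
    def size(self):
        return len(self.nodes)

    def __post_init__(self):
        # In Agda, `Fin size` makes these checks unnecessary.
        used = [i for e in self.edges for i in e] + self.inputs + self.outputs
        for i in used:
            if not 0 <= i < self.size:
                raise ValueError(f"index {i} out of range for {self.size} nodes")

ok = Graph(nodes=[["x = 1"], ["y = x"]], edges=[(0, 1)], inputs=[0], outputs=[1])
```

Constructing a four-node graph with the edge `(5, 5)` raises a `ValueError` here; in the Agda version, such a graph cannot even be written down.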

Combining Graphs

Suppose you’re building a CFG for a program in the following form:

code1;
code2;

Where code1 and code2 are arbitrary pieces of code, which could include statements, loops, and pretty much anything else. Besides the fact that they occur one after another, these pieces of code are unrelated, and we can build CFGs for each one of them independently. However, the fact that code1 and code2 are in sequence means that the full control flow graph for the above program should have edges going from the nodes in code1 to the nodes in code2. Of course, not every node in code1 should have such edges: that would mean that after executing any “basic” sequence of instructions, you could suddenly decide to skip the rest of code1 and move on to executing code2.

Thus, we need to be more precise about what edges we need to insert; we want to insert edges between the “final” nodes in code1 (where control ends up after code1 is finished executing) and the “initial” nodes in code2 (where control would begin once we started executing code2). Those are the outputs and inputs, respectively. When stitching together sequenced control graphs, we will connect each of the outputs of one to each of the inputs of the other.

This is defined by the operation g₁ ↦ g₂, which sequences two graphs g₁ and g₂:

From Graphs.agda, lines 72 through 83

_↦_ : Graph → Graph → Graph
_↦_ g₁ g₂ = record
    { size = Graph.size g₁ Nat.+ Graph.size g₂
    ; nodes = Graph.nodes g₁ ++ Graph.nodes g₂
    ; edges = (Graph.edges g₁ ↑ˡᵉ Graph.size g₂) List.++
              (Graph.size g₁ ↑ʳᵉ Graph.edges g₂) List.++
              (List.cartesianProduct (Graph.outputs g₁ ↑ˡⁱ Graph.size g₂)
                                     (Graph.size g₁ ↑ʳⁱ Graph.inputs g₂))
    ; inputs = Graph.inputs g₁ ↑ˡⁱ Graph.size g₂
    ; outputs = Graph.size g₁ ↑ʳⁱ Graph.outputs g₂
    }

The definition starts out pretty innocuous, but gets a bit complicated by the end. The sum of the numbers of nodes in the two operands becomes the new graph size, and the nodes from the two graphs are all included in the result. Then, the definitions start making use of various operators like ↑ˡᵉ and ↑ʳᵉ; these deserve an explanation.

The tricky thing is that when we’re concatenating lists of nodes, we are changing some of the indices of the elements within. For instance, in the lists [x] and [y], the indices of both x and y are 0; however, in the concatenated list [x, y], the index of x is still 0, but the index of y is 1. More generally, when we concatenate two lists l1 and l2, the indices into l1 remain unchanged, whereas the indices into l2 are shifted by length l1.

Actually, that’s not all there is to it. The values of the indices into the left list don’t change, but their types do! They start as Fin (length l1), but for the whole list, these same indices will have type Fin (length l1 + length l2).

To help deal with this, Agda provides the operators ↑ˡ and ↑ʳ that implement this re-indexing and re-typing. The former implements “re-indexing on the left” – given an index into the left list l1, it changes its type by adding the other list’s length to it, but keeps the index value itself unchanged. The latter implements “re-indexing on the right” – given an index into the right list l2, it adds the length of the first list to it (shifting it), and does the same to its type.

The definition leads to the following equations:

l1 : Vec A n
l2 : Vec A m

idx1 : Fin n -- index into l1
idx2 : Fin m -- index into l2

l1 [ idx1 ] ≡ (l1 ++ l2) [ idx1 ↑ˡ m ]
l2 [ idx2 ] ≡ (l1 ++ l2) [ n ↑ʳ idx2 ]

The operators used in the definition above are just versions of the same re-indexing operators. The ↑ˡᵉ operator applies ↑ˡ to all the (e)dges in a graph, and the ↑ˡⁱ applies it to all the (i)ndices in a list (like inputs and outputs).
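
The two equations can be spot-checked on ordinary lists. In this Python sketch, `raise_left` and `raise_right` are stand-ins (names assumed here) for ↑ˡ and ↑ʳ; because Python integers have no bound in their type, the left version has nothing to do at run time.

```python
def raise_left(idx, m):
    """idx ↑ˡ m: the value is unchanged; only the bound grows (from n to n + m)."""
    return idx

def raise_right(n, idx):
    """n ↑ʳ idx: shift the index past the n elements of the left list."""
    return n + idx

l1 = ["a", "b", "c"]   # n = 3
l2 = ["x", "y"]        # m = 2
both = l1 + l2
```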

Given these definitions, hopefully the intent with the rest of the definition is not too hard to see. The edges in the new graph come from three places: the graphs g₁ and g₂, and from creating a new edge from each of the outputs of g₁ to each of the inputs of g₂. We keep the inputs of g₁ as the inputs of the whole graph (since g₁ comes first), and symmetrically we keep the outputs of g₂. Of course, we do have to re-index them to keep them pointing at the right nodes.
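
Stripped of the type-level bookkeeping, sequencing amounts to the following Python sketch. It is an approximation with assumed names: graphs are plain dictionaries, and indices are unchecked integers rather than Fins.

```python
from itertools import product

def shift(edges, n):
    return [(a + n, b + n) for a, b in edges]

def sequence(g1, g2):
    """Python stand-in for g₁ ↦ g₂."""
    n1 = len(g1["nodes"])
    return {
        "nodes": g1["nodes"] + g2["nodes"],
        "edges": g1["edges"] + shift(g2["edges"], n1)
                 + list(product(g1["outputs"], [i + n1 for i in g2["inputs"]])),
        "inputs": list(g1["inputs"]),
        "outputs": [o + n1 for o in g2["outputs"]],
    }

def singleton(stmts):
    return {"nodes": [stmts], "edges": [], "inputs": [0], "outputs": [0]}

g = sequence(singleton(["x = 1"]),
             sequence(singleton(["y = 2"]), singleton(["z = 3"])))
```

The result `g` is a three-node chain: its only edges are 0 → 1 and 1 → 2, its input is node 0, and its output is node 2.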

Another operation we will need is “overlaying” two graphs: this will be like placing them in parallel, without adding jumps between the two. We use this operation when combining the sub-CFGs of the “then” and “else” branches of an if/else, which both follow the condition, and both proceed to the code after the conditional.

From Graphs.agda, lines 59 through 70

_∙_ : Graph → Graph → Graph
_∙_ g₁ g₂ = record
    { size = Graph.size g₁ Nat.+ Graph.size g₂
    ; nodes = Graph.nodes g₁ ++ Graph.nodes g₂
    ; edges = (Graph.edges g₁ ↑ˡᵉ Graph.size g₂) List.++
              (Graph.size g₁ ↑ʳᵉ Graph.edges g₂)
    ; inputs = (Graph.inputs g₁ ↑ˡⁱ Graph.size g₂) List.++
               (Graph.size g₁ ↑ʳⁱ Graph.inputs g₂)
    ; outputs = (Graph.outputs g₁ ↑ˡⁱ Graph.size g₂) List.++
                (Graph.size g₁ ↑ʳⁱ Graph.outputs g₂)
    }

Everything here is just concatenation; we pool together the nodes, edges, inputs, and outputs, and the main source of complexity is the re-indexing.
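
As a quick illustration, here is the same operation as a Python sketch (dictionary-based graphs and the name `overlay` are assumptions made for the example, with plain integers in place of Fins):

```python
def overlay(g1, g2):
    """Python stand-in for g₁ ∙ g₂: concatenate everything, shifting g2's indices."""
    n1 = len(g1["nodes"])
    up = lambda idxs: [i + n1 for i in idxs]
    return {
        "nodes": g1["nodes"] + g2["nodes"],
        "edges": g1["edges"] + [(a + n1, b + n1) for a, b in g2["edges"]],
        "inputs": g1["inputs"] + up(g2["inputs"]),
        "outputs": g1["outputs"] + up(g2["outputs"]),
    }

then_branch = {"nodes": [["x = 1"]], "edges": [], "inputs": [0], "outputs": [0]}
else_branch = {"nodes": [["x = 0"]], "edges": [], "inputs": [0], "outputs": [0]}
both = overlay(then_branch, else_branch)
```

Both nodes of `both` are inputs and outputs at once, and no edge connects them, which is what “in parallel” means here.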

The one last operation, which we will use for while loops, is looping. This operation simply connects the outputs of a graph back to its inputs (allowing looping), and also allows the body to be skipped. This is slightly different from the graph for while loops I showed above; the reason for that is that I currently don’t include the conditional expressions in my CFG. This is a limitation that I will address in future work.

From Graphs.agda, lines 85 through 95

loop : Graph → Graph
loop g = record
    { size = 2 Nat.+ Graph.size g
    ; nodes = [] ∷ [] ∷ Graph.nodes g
    ; edges = (2 ↑ʳᵉ Graph.edges g) List.++
              List.map (zero ,_) (2 ↑ʳⁱ Graph.inputs g) List.++
              List.map (_, suc zero) (2 ↑ʳⁱ Graph.outputs g) List.++
              ((suc zero , zero) ∷ (zero , suc zero) ∷ [])
    ; inputs = zero ∷ []
    ; outputs = (suc zero) ∷ []
    }
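
It may help to see the edges this produces for a concrete body. The Python sketch below mirrors the definition with dictionary-based graphs (an assumption of the example, as is the helper name `up`) and applies it to a one-node body.

```python
def loop(g):
    """Python stand-in for `loop g`; node 0 is the new input, node 1 the new output."""
    up = lambda i: i + 2
    return {
        "nodes": [[], []] + g["nodes"],
        "edges": [(up(a), up(b)) for a, b in g["edges"]]
                 + [(0, up(i)) for i in g["inputs"]]
                 + [(up(o), 1) for o in g["outputs"]]
                 + [(1, 0), (0, 1)],
        "inputs": [0],
        "outputs": [1],
    }

body = {"nodes": [["x = x - 1"]], "edges": [], "inputs": [0], "outputs": [0]}
looped = loop(body)
```

For this body, the edges are 0 → 2 (enter the body), 2 → 1 (leave it), 1 → 0 (go around again), and 0 → 1 (skip the body entirely).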

Given these three operations, I construct Control Flow Graphs as follows, where singleton creates a new CFG node with the given list of simple statements:

From Graphs.agda, lines 122 through 126

buildCfg : Stmt → Graph
buildCfg ⟨ bs₁ ⟩ = singleton (bs₁ ∷ [])
buildCfg (s₁ then s₂) = buildCfg s₁ ↦ buildCfg s₂
buildCfg (if _ then s₁ else s₂) = buildCfg s₁ ∙ buildCfg s₂
buildCfg (while _ repeat s) = loop (buildCfg s)

Throughout this, I’ve liberally included empty CFG nodes wherever it was convenient. This is a departure from the formal definition I gave above, but it makes things much simpler.

Additional Functions

To integrate Control Flow Graphs into our lattice-based program analyses, we’ll need to do a couple of things. First, upon reading the reference Static Program Analysis text, one sees a lot of quantification over the predecessors or successors of a given CFG node. For example, the following equation is from Chapter 5:

$$ \textit{JOIN}(v) = \bigsqcup_{w \in \textit{pred}(v)} \llbracket w \rrbracket $$

To compute the $\textit{JOIN}$ function (which we have not covered yet) for a given CFG node, we need to iterate over all of its predecessors, and combine their static information using $\sqcup$, which I first explained several posts ago. To be able to iterate over them, we need to be able to retrieve the predecessors of a node from a graph!
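
To give a feel for the equation before it is covered properly, here is a hedged Python sketch. It assumes a particular, simplified lattice that is not necessarily the one used later in the series: each node’s state maps variables to a set of possible signs, and $\sqcup$ is pointwise set union. The names `join_states`, `pred`, and `state` are invented for the example.

```python
def join_states(a, b):
    """Pointwise least upper bound of two variable -> set-of-signs maps."""
    return {v: a.get(v, set()) | b.get(v, set()) for v in a.keys() | b.keys()}

def JOIN(v, pred, state):
    """Combine the states of all predecessors of node v."""
    result = {}
    for w in pred(v):
        result = join_states(result, state[w])
    return result

# The if/else CFG from earlier: node 3 (y = x) has predecessors
# 1 (x = 1) and 2 (x = 0).
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
pred = lambda v: [src for src, dst in edges if dst == v]
state = {1: {"x": {"+"}}, 2: {"x": {"0"}}}
joined = JOIN(3, pred, state)
```

At the `y = x` node, the combined information says that x may be positive or zero, since either branch could have run.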

Our encoding does not make computing the predecessors particularly easy; to check if two nodes are connected, we need to check if an Index-Index pair corresponding to the nodes is present in the edges list. To this end, we need to be able to compare edges for equality. Fortunately, it’s relatively straightforward to show that our edges can be compared in such a way; after all, they are just pairs of Fins, and Fins and products support these comparisons.

From Graphs.agda, lines 149 through 152

module _ (g : Graph) where
    open import Data.Product.Properties as ProdProp using ()
    private _≟_ = ProdProp.≡-dec (FinProp._≟_ {Graph.size g})
                                 (FinProp._≟_ {Graph.size g})

Next, if we can compare edges for equality, we can check if an edge is in a list. Agda provides a built-in function for this:

From Graphs.agda, line 154

    open import Data.List.Membership.DecPropositional (_≟_) using (_∈?_)

To find the predecessors of a particular node, we go through all the nodes in the graph and check whether there’s an edge from each of them to the current one. This is preferable to simply iterating over the edges because we may have duplicates in that list (why not?).

From Graphs.agda, lines 165 through 166

    predecessors : (Graph.Index g) → List (Graph.Index g)
    predecessors idx = List.filter (λ idx' → (idx' , idx) ∈? (Graph.edges g)) indices

Above, indices is a list of all the node identifiers in the graph. Since the graph has size nodes, the indices of all these nodes are simply the values 0, 1, …, size - 1. I defined a special function finValues to compute this list, together with a proof that this list is unique.
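
The difference between filtering the indices and iterating over the edges shows up as soon as an edge is repeated. A small Python sketch (function and variable names are assumptions for the illustration):

```python
def predecessors(idx, size, edges):
    """Filter the list of all indices, keeping those with an edge into idx."""
    indices = range(size)
    return [i for i in indices if (i, idx) in edges]

# The edge (0, 1) appears twice in the list.
edges = [(0, 1), (0, 1), (2, 1)]
by_filtering = predecessors(1, 3, edges)
by_edges = [src for src, dst in edges if dst == 1]
```

Filtering the (duplicate-free) index list reports node 0 once, whereas walking the edge list reports it twice.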

From Graphs.agda, lines 127 through 143

private
    z≢sf : ∀ {n : ℕ} (f : Fin n) → ¬ (zero ≡ suc f)
    z≢sf f ()

    z≢mapsfs : ∀ {n : ℕ} (fs : List (Fin n)) → All (λ sf → ¬ zero ≡ sf) (List.map suc fs)
    z≢mapsfs [] = []
    z≢mapsfs (f ∷ fs') = z≢sf f ∷ z≢mapsfs fs'

    finValues : ∀ (n : ℕ) → Σ (List (Fin n)) Unique
    finValues 0 = ([] , Utils.empty)
    finValues (suc n') =
        let
            (inds' , unids') = finValues n'
        in
            ( zero ∷ List.map suc inds'
            , push (z≢mapsfs inds') (Unique-map suc suc-injective unids')
            )

Another important property of finValues is that each node identifier is present in the list, so that computations written by traversing the node list do not “miss” nodes.

From Graphs.agda, lines 145 through 147

    finValues-complete : ∀ (n : ℕ) (f : Fin n) → f ListMem.∈ (proj₁ (finValues n))
    finValues-complete (suc n') zero = RelAny.here refl
    finValues-complete (suc n') (suc f') = RelAny.there (x∈xs⇒fx∈fxs suc (finValues-complete n' f'))
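
The recursion behind finValues is easy to mirror in Python, which gives a quick way to see both properties on a concrete size. This is only a sketch: it checks uniqueness and completeness for one value of n, where the Agda code proves them for all n.

```python
def fin_values(n):
    """All indices below n, built like finValues: zero, then suc of the rest."""
    if n == 0:
        return []
    return [0] + [i + 1 for i in fin_values(n - 1)]

values = fin_values(5)
is_unique = len(set(values)) == len(values)
is_complete = all(i in values for i in range(5))
```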

We can specialize these definitions for a particular graph g:

From Graphs.agda, lines 156 through 163

    indices : List (Graph.Index g)
    indices = proj₁ (finValues (Graph.size g))

    indices-complete : ∀ (idx : (Graph.Index g)) → idx ListMem.∈ indices
    indices-complete = finValues-complete (Graph.size g)

    indices-Unique : Unique indices
    indices-Unique = proj₂ (finValues (Graph.size g))

To recap, we now have:

A way to build control flow graphs from programs
A list (unique’d and complete) of all nodes in the control flow graph so that we can iterate over them when the algorithm demands.
A ‘predecessors’ function, which will be used by our static program analyses, implemented as an iteration over the list of nodes.

All that’s left is to connect our predecessors function to edges in the graph. The following definitions say that when an edge is in the graph, the starting node is listed as a predecessor of the ending node, and vice versa.

From Graphs.agda, lines 168 through 177

    edge⇒predecessor : ∀ {idx₁ idx₂ : Graph.Index g} → (idx₁ , idx₂) ListMem.∈ (Graph.edges g) →
                    idx₁ ListMem.∈ (predecessors idx₂)
    edge⇒predecessor {idx₁} {idx₂} idx₁,idx₂∈es =
        ∈-filter⁺ (λ idx' → (idx' , idx₂) ∈? (Graph.edges g))
                  (indices-complete idx₁) idx₁,idx₂∈es

    predecessor⇒edge : ∀ {idx₁ idx₂ : Graph.Index g} → idx₁ ListMem.∈ (predecessors idx₂) →
                       (idx₁ , idx₂) ListMem.∈ (Graph.edges g)
    predecessor⇒edge {idx₁} {idx₂} idx₁∈pred =
        proj₂ (∈-filter⁻ (λ idx' → (idx' , idx₂) ∈? (Graph.edges g)) {v = idx₁} {xs = indices} idx₁∈pred )
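
On a small graph, the two directions can be checked exhaustively. The Python sketch below uses an arbitrary example graph and assumed names; it tests every pair of indices, which is a sanity check rather than a proof.

```python
from itertools import product

size = 4
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 3)]

def predecessors(idx):
    return [i for i in range(size) if (i, idx) in edges]

# (a, b) is an edge exactly when a is a predecessor of b.
agree = all(((a, b) in edges) == (a in predecessors(b))
            for a, b in product(range(size), repeat=2))
```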

Connecting Two Distinct Representations

I’ve described Control Flow Graphs as a compiler-centric representation of the program. Unlike the formal semantics from the previous post, CFGs do not reason about the dynamic behavior of the code. Instead, they capture the possible paths that execution can take through the instructions. In that sense, they are more of an approximation of what the program will do. This is good: because of Rice’s theorem, we can’t do anything other than approximating without running the program.

However, an incorrect approximation is of no use at all. Since the CFGs we build will be the core data type used by our program analyses, it’s important that they are an accurate, if incomplete, representation. Specifically, because most of our analyses reason about possible outcomes — we report what sign each variable could have, for instance — it’s important that we don’t accidentally omit cases that can happen in practice from our CFGs. Formally, this means that for each possible execution of a program according to its semantics, there exists a corresponding path through the graph. [note: The converse is desirable too: that the graph has only paths that correspond to possible executions of the program. One graph that violates this property is the strongly-connected graph of all basic blocks in a program. Analyzing such a graph would give us an overly-conservative estimation; since anything can happen, most of our answers will likely be too general to be of any use. If, on the other hand, only the necessary graph connections exist, we can be more precise.

However, proving this converse property (or even stating it precisely) is much harder, because our graphs are somewhat conservative already. There exist programs in which the condition of an if-statement is always evaluated to false, but our graphs always have edges for both the "then" and "else" cases. Determining whether a condition is always false (e.g.) is undecidable thanks to Rice's theorem (again), so we can't rule it out. Instead, we could broaden "all possible executions" to "all possible executions where branching conditions can produce arbitrary results", but this is something else entirely.

For the time being, I will leave this converse property aside. As a result, our approximations might be "too careful". However, they will at the very least be sound. ]

In the next post, I will prove that this property holds for the graphs shown here and the formal semantics I defined earlier. I hope to see you there!

Implementing and Verifying "Static Program Analysis" in Agda, Part 5: Our Programming Language

Sun, 03 Nov 2024 17:50:27 -0800

In the previous several posts, I’ve formalized the notion of lattices, which are an essential ingredient to formalizing the analyses in Anders Møller’s lecture notes. However, there can be no program analysis without a program to analyze! In this post, I will define the (very simple) language that we will be analyzing. An essential aspect of the language is its semantics, which simply speaking explains what each feature of the language does. At the end of the previous article, I gave the following inference rule which defined (partially) how the if-else statement in the language works.

$$ \frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2} {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2} $$

Like I mentioned then, this rule reads as follows:

If the condition of an if-else statement evaluates to a nonzero value, then to evaluate the statement, you evaluate its then branch.

Another similar (but, crucially, not equivalent) rule is the following:

$$ \frac{\rho_1, e \Downarrow z \quad z = 1 \quad \rho_1,s_1 \Downarrow \rho_2} {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2} $$

This time, the English interpretation of the rule is as follows:

If the condition of an if-else statement evaluates to one, then to evaluate the statement, you evaluate its then branch.

These rules are certainly not equivalent. For instance, the former allows the “then” branch to be executed when the condition is 2; however, in the latter, the value of the conditional must be 1. If our analysis were intelligent (our first few will not be), then this difference would change its output when determining the signs of the following program:

x = 2
if x {
  y = - 1
} else {
  y = 1
}

Using the first, more “relaxed” rule, the condition would be considered “true”, and the sign of y would be -. On the other hand, using the second, “stricter” rule, the sign of y would be +. I stress that in this case, I am showing a flow-sensitive analysis (one that can understand control flow and make more specific predictions); for our simplest analyses, we will not be aiming for flow-sensitivity. There is plenty of work to do even then.
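
The difference between the two rules is easy to observe by running both. In this Python sketch, the function names are invented, and one assumption goes beyond the rules shown: any condition value that does not select the “then” branch is taken to select the “else” branch.

```python
def if_relaxed(env, cond, then_branch, else_branch):
    """First rule: any nonzero condition selects the then branch."""
    return then_branch(env) if cond(env) != 0 else else_branch(env)

def if_strict(env, cond, then_branch, else_branch):
    """Second rule: only the value 1 selects the then branch."""
    return then_branch(env) if cond(env) == 1 else else_branch(env)

# x = 2; if x { y = -1 } else { y = 1 }
env = {"x": 2}
cond = lambda e: e["x"]
then_branch = lambda e: {**e, "y": -1}
else_branch = lambda e: {**e, "y": 1}

relaxed = if_relaxed(env, cond, then_branch, else_branch)
strict = if_strict(env, cond, then_branch, else_branch)
```

Under the relaxed rule y ends up negative; under the strict rule it ends up positive.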

The point of showing these two distinct rules is that we need to be very precise about how the language will behave, because our analyses depend on that behavior.

Let’s not get ahead of ourselves, though. I’ve motivated the need for semantics, but there is much groundwork to be laid before we delve into the precise rules of our language. After all, to define the language’s semantics, we need to have a language.

The Syntax of Our Simple Language

I’ve shown a couple of examples of our language now, and there won’t be that much more to it. We can start with expressions: things that evaluate to something. Some examples of expressions are 1, x, and 2-(x+y). For our specific language, the precise set of possible expressions can be given by the following Context-Free Grammar:

$$ \begin{array}{rcll} e & ::= & x & \text{(variables)} \\ & | & z & \text{(integer literals)} \\ & | & e + e & \text{(addition)} \\ & | & e - e & \text{(subtraction)} \end{array} $$

The above can be read as follows:

An expression $e$ is one of the following things:

Some variable $x$ [importantly $x$ is a placeholder for any variable, which could be x or y in our program code; specifically, $x$ is a metavariable.]

Some integer $z$ [once again, $z$ can be any integer, like 1, -42, etc.].

The addition of two other expressions [which could themselves be additions etc.].

The subtraction of two other expressions [which could also themselves be additions, subtractions, etc.].

Since expressions can be nested within other expressions — which is necessary to allow complicated code like 2-(x+y) above — they form a tree. Each node is one of the elements of the grammar above (variable, addition, etc.). If a node contains sub-expressions (like addition and subtraction do), then these sub-expressions form sub-trees of the given node. This data structure is called an Abstract Syntax Tree.

Notably, though 2-(x+y) has parentheses, our grammar above does not include them as a case. The reason for this is that the structure of an abstract syntax tree is sufficient to encode the order in which the operations should be evaluated. Since I lack a nice way of drawing ASTs, I will use an ASCII drawing to show an example.

Expression: 2 - (x+y)
    (-)
   /   \
  2    (+)
      /   \
     x     y


Expression: (2-x) + y
       (+)
      /   \
    (-)    y
   /   \
  2     x

Above, in the first AST, (+) is a child of the (-) node, which means that it’s a sub-expression. As a result, that subexpression is evaluated first, before evaluating (-), and so, the AST represents 2-(x+y). In the other example, (-) is a child of (+), and is therefore evaluated first. The resulting association encoded by that AST is (2-x)+y.

To an Agda programmer, the one-of-four-things definition above should read quite similarly to the definition of an algebraic data type. Indeed, this is how we can encode the abstract syntax tree of expressions:

From Base.agda, lines 12 through 16

data Expr : Set where
    _+_ : Expr → Expr → Expr
    _-_ : Expr → Expr → Expr
    `_ : String → Expr
    #_ : ℕ → Expr

The only departure from the grammar above is that I had to invent constructors for the variable and integer cases, since Agda doesn’t support implicit coercions. This adds a little bit of extra overhead, requiring, for example, that we write numbers as # 42 instead of 42.
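For readers who don't use Agda, the same tree structure can be sketched in Python. This is only an illustration: the class names `Var`, `Lit`, `Add`, and `Sub`, and the `show` helper, are my own and do not appear in the Agda code. The point is that the two ASTs drawn earlier are distinct values, even though neither one stores any parentheses.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str          # a variable such as x or y


@dataclass(frozen=True)
class Lit:
    value: int         # an integer literal


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Sub:
    left: "Expr"
    right: "Expr"


# An expression is one of four things, mirroring the grammar.
Expr = Union[Var, Lit, Add, Sub]


def show(e: Expr) -> str:
    """Render a tree as fully parenthesized text (hypothetical helper)."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lit):
        return str(e.value)
    op = "+" if isinstance(e, Add) else "-"
    return f"({show(e.left)} {op} {show(e.right)})"


first = Sub(Lit(2), Add(Var("x"), Var("y")))    # the tree for 2 - (x+y)
second = Add(Sub(Lit(2), Var("x")), Var("y"))   # the tree for (2-x) + y

print(show(first))
print(show(second))
```

The parentheses only reappear when the tree is printed; the nesting of constructors carries the grouping.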

Having defined expressions, the next thing on the menu is statements. Unlike expressions, which just produce values, statements “do something”; an example of a statement might be the following Python line:

print("Hello, world!")

The print function doesn’t produce any value, but it does perform an action; it prints its argument to the console!

For the formalization, it turns out to be convenient to separate “simple” statements from “complex” ones. Pragmatically speaking, the difference between the “simple” and the “complex” is control flow; simple statements are guaranteed to always execute without any decisions or jumps. The reason for this will become clearer in subsequent posts; I will foreshadow a bit by saying that consecutive simple statements can be placed into a single basic block.

The following is a group of three simple statements:

x = 1
y = x + 2
noop

These will always be executed in the same order, exactly once. Here, noop is a convenient type of statement that simply does nothing.

On the other hand, the following statement is not simple:

while x {
  x = x - 1
}

It’s not simple because it makes decisions about how the code should be executed; if x is nonzero, it will execute the statement in the body of the loop (x = x - 1) and then check the condition again. Otherwise, it will skip that statement and carry on with subsequent code.

I first define simple statements using the BasicStmt type:

From Base.agda, lines 18 through 20

data BasicStmt : Set where
    _←_ : String → Expr → BasicStmt
    noop : BasicStmt

Complex statements are just called Stmt; they include loops, conditionals and sequences — $s_1\ \text{then}\ s_2$ [note: The standard notation for sequencing in imperative languages is $s_1; s_2$. However, Agda gives special meaning to the semicolon, and I couldn't find any passable symbolic alternatives. ] is a sequence where $s_2$ is evaluated after $s_1$. Complex statements subsume simple statements, which I model using the constructor ⟨_⟩.

From Base.agda, lines 25 through 29

data Stmt : Set where
    ⟨_⟩ : BasicStmt → Stmt
    _then_ : Stmt → Stmt → Stmt
    if_then_else_ : Expr → Stmt → Stmt → Stmt
    while_repeat_ : Expr → Stmt → Stmt

For an example of using this encoding, take the following simple program:

var = 1
if var {
  x = 1
}

The Agda version is:

From Main.agda, lines 27 through 34

testCodeCond₂ : Stmt
testCodeCond₂ =
    ⟨ "var" ← (# 1) ⟩ then
    if (` "var") then (
        ⟨ "x" ← (# 1) ⟩
    ) else (
        ⟨ noop ⟩
    )

Notice how we used noop to express the fact that the else branch of the conditional does nothing.

The Semantics of Our Language

We now have all the language constructs that I’ll be showing off, because those are all the concepts that I’ve formalized. What’s left is to define how they behave. We will do this using a logical tool called inference rules. I’ve written about them a number of times; they’re ubiquitous, particularly in the sorts of things I like to explore on this site. The section on inference rules from my Advent of Code series is pretty relevant, and the notation section from a post in my compiler series says much the same thing; I won’t be re-describing them here.

There are three pieces which demand semantics: expressions, simple statements, and non-simple statements. The semantics of each of the three requires the semantics of the items that precede it. We will therefore start with expressions.

Expressions

The trickiest thing about expressions is that the value of an expression depends on the “context”: x+1 can evaluate to 43 if x is 42, or it can evaluate to 0 if x is -1. To evaluate an expression, we will therefore need to assign values to all of the variables in that expression. A mapping that assigns values to variables is typically called an environment. We will write $\varnothing$ for “empty environment”, and $\{\texttt{x} \mapsto 42, \texttt{y} \mapsto -1 \}$ for an environment that maps the variable $\texttt{x}$ to 42, and the variable $\texttt{y}$ to -1.

Now, a bit of notation. We will use the letter $\rho$ to represent environments (and if several environments are involved, we will occasionally number them as $\rho_1$, $\rho_2$, etc.). We will use the letter $e$ to stand for expressions, and the letter $v$ to stand for values. Finally, we’ll write $\rho, e \Downarrow v$ to say that “in an environment $\rho$, expression $e$ evaluates to value $v$”. Our two previous examples of evaluating x+1 can thus be written as follows:

$$ \{ \texttt{x} \mapsto 42 \}, \texttt{x}+1 \Downarrow 43 \\ \{ \texttt{x} \mapsto -1 \}, \texttt{x}+1 \Downarrow 0 \\ $$

Now, on to the actual rules for how to evaluate expressions. Most simply, integer literals like 1 just evaluate to themselves.

$$ \frac{n \in \text{Int}}{\rho, n \Downarrow n} $$

Note that the letter $\rho$ is completely unused in the above rule. That’s because no matter what values variables have, a number still evaluates to the same value. As we’ve already established, the same is not true for a variable like $\texttt{x}$. To evaluate such a variable, we need to retrieve the value it’s mapped to in the current environment, which we will write as $\rho(\texttt{x})$. This gives the following inference rule:

$$ \frac{\rho(x) = v}{\rho, x \Downarrow v} $$

All that’s left is to define addition and subtraction. For an expression in the form $e_1+e_2$, we first need to evaluate the two subexpressions $e_1$ and $e_2$, and then add the two resulting numbers. As a result, the addition rule includes two additional premises, one for evaluating each summand.

$$ \frac {\rho, e_1 \Downarrow v_1 \quad \rho, e_2 \Downarrow v_2 \quad v_1 + v_2 = v} {\rho, e_1+e_2 \Downarrow v} $$

The subtraction rule is similar. Below, I’ve configured an instance of Bergamot to interpret these exact rules. Try typing various expressions like 1, 1+1, etc. into the input box below to see them evaluate. If you click the “Full Proof Tree” button, you can also view the exact rules that were used in computing a particular value. The variables x, y, and z are pre-defined for your convenience.

The Agda equivalent of this looks very similar to the rules themselves. I use ⇒ᵉ instead of $\Downarrow$, and there’s a little bit of tedium with wrapping integers into a new Value type. I also used a (partial) relation (x, v) ∈ ρ instead of explicitly defining a way to access an environment, since it is conceivable for a user to attempt to access a variable that has not been assigned. Aside from these notational changes, the structure of each of the constructors of the evaluation data type matches the inference rules I showed above.

From Semantics.agda, lines 27 through 35

data _,_⇒ᵉ_ : Env → Expr → Value → Set where
    ⇒ᵉ-ℕ : ∀ (ρ : Env) (n : ℕ) → ρ , (# n) ⇒ᵉ (↑ᶻ (+ n))
    ⇒ᵉ-Var : ∀ (ρ : Env) (x : String) (v : Value) → (x , v) ∈ ρ → ρ , (` x) ⇒ᵉ v
    ⇒ᵉ-+ : ∀ (ρ : Env) (e₁ e₂ : Expr) (z₁ z₂ : ℤ) →
           ρ , e₁ ⇒ᵉ (↑ᶻ z₁) → ρ , e₂ ⇒ᵉ (↑ᶻ z₂) →
           ρ , (e₁ + e₂) ⇒ᵉ (↑ᶻ (z₁ +ᶻ z₂))
    ⇒ᵉ-- : ∀ (ρ : Env) (e₁ e₂ : Expr) (z₁ z₂ : ℤ) →
           ρ , e₁ ⇒ᵉ (↑ᶻ z₁) → ρ , e₂ ⇒ᵉ (↑ᶻ z₂) →
           ρ , (e₁ - e₂) ⇒ᵉ (↑ᶻ (z₁ -ᶻ z₂))
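
The inference rules for expressions can also be read as a recursive evaluator. The following Python sketch is an illustration only, not a translation of the Agda code: it assumes expressions are encoded as tagged tuples (`("lit", n)`, `("var", name)`, `("add", e1, e2)`, `("sub", e1, e2)`), a representation I made up for brevity, and it models "no derivation exists" by returning `None`, which plays the role of the partial relation (x, v) ∈ ρ.

```python
def eval_expr(env, e):
    """Return v such that env, e ⇓ v, or None when no rule applies."""
    tag = e[0]
    if tag == "lit":
        # Literal rule: the environment is unused.
        return e[1]
    if tag == "var":
        # Variable rule: the premise ρ(x) = v may fail for unassigned variables.
        return env.get(e[1])
    if tag in ("add", "sub"):
        # Two premises, one per operand, evaluated in the same environment.
        v1 = eval_expr(env, e[1])
        v2 = eval_expr(env, e[2])
        if v1 is None or v2 is None:
            return None
        return v1 + v2 if tag == "add" else v1 - v2
    raise ValueError(f"unknown expression tag: {tag!r}")


x_plus_one = ("add", ("var", "x"), ("lit", 1))

print(eval_expr({"x": 42}, x_plus_one))
print(eval_expr({"x": -1}, x_plus_one))
print(eval_expr({}, x_plus_one))
```

The first two calls reproduce the two example judgments for x+1; the third has no derivation because x is unassigned.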

Simple Statements

The main difference in formalizing statements (both simple and “normal”) is that they modify the environment. If x has one value, writing x = x + 1 will certainly change that value. On the other hand, statements don’t produce values. So, we will be writing claims like $\rho_1 , \textit{bs} \Rightarrow \rho_2$ to say that the basic statement $\textit{bs}$, when starting in environment $\rho_1$, will produce environment $\rho_2$. Here’s an example:

$$ \{ \texttt{x} \mapsto 42, \texttt{y} \mapsto 17 \}, \ \texttt{x = x - \text{1}} \Rightarrow \{ \texttt{x} \mapsto 41, \texttt{y} \mapsto 17 \} $$

Here, we subtracted one from a variable with value 42, leaving it with a new value of 41.

There are two basic statements, and one of them quite literally does nothing. The inference rule for noop is very simple:

$$ \rho,\ \texttt{noop} \Rightarrow \rho $$

For the assignment rule, we need to know how to evaluate the expression on the right side of the equal sign. This is why we needed to define the semantics of expressions first. Given those, the evaluation rule for assignment is as follows, with $\rho[x \mapsto v]$ meaning “the environment $\rho$ but mapping the variable $x$ to value $v$”.

$$ \frac {\rho, e \Downarrow v} {\rho, x = e \Rightarrow \rho[x \mapsto v]} $$

Those are actually all the rules we need, and below, I am once again configuring a Bergamot instance, this time with simple statements. Try out noop or some sort of variable assignment, like x = x + 1.

The Agda implementation is once again just a data type with constructors-for-rules. This time they also look quite similar to the rules I’ve shown up until now, though I continue to explicitly quantify over variables like ρ.

From Semantics.agda, lines 37 through 40

data _,_⇒ᵇ_ : Env → BasicStmt → Env → Set where
    ⇒ᵇ-noop : ∀ (ρ : Env) → ρ , noop ⇒ᵇ ρ
    ⇒ᵇ-← : ∀ (ρ : Env) (x : String) (e : Expr) (v : Value) →
           ρ , e ⇒ᵉ v → ρ , (x ← e) ⇒ᵇ ((x , v) List.∷ ρ)
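
As a small illustration of $\rho[x \mapsto v]$, here is a Python sketch of the two basic-statement rules. It makes two simplifying assumptions that are mine, not the Agda code's: environments are Python dictionaries, and the right-hand side of an assignment is passed as a function from environment to value, standing in for the expression judgment. The thing to notice is that the rules relate an old environment to a new one; the old one is not mutated.

```python
def exec_basic(env, stmt):
    """Return the environment produced by a basic statement."""
    if stmt[0] == "noop":
        # noop rule: the output environment is the input environment.
        return env
    if stmt[0] == "assign":
        _, name, rhs = stmt
        value = rhs(env)              # premise: ρ, e ⇓ v
        return {**env, name: value}   # conclusion: ρ[x ↦ v], built as a new dict
    raise ValueError(f"unknown basic statement: {stmt[0]!r}")


rho1 = {"x": 42, "y": 17}
decrement_x = ("assign", "x", lambda env: env["x"] - 1)   # x = x - 1

rho2 = exec_basic(rho1, decrement_x)
print(rho1)
print(rho2)
```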

Statements

Let’s work on non-simple statements next. The easiest rule to define is probably sequencing. When we use then (or ;) to combine two statements, what we actually want is to execute the first statement, which may change variables, and then execute the second statement while keeping the changes from the first. This means there are three environments: $\rho_1$ for the initial state before either statement is executed, $\rho_2$ for the state between executing the first and second statement, and $\rho_3$ for the final state after both are done executing. This leads to the following rule:

$$ \frac { \rho_1, s_1 \Rightarrow \rho_2 \quad \rho_2, s_2 \Rightarrow \rho_3 } { \rho_1, s_1; s_2 \Rightarrow \rho_3 } $$

We will actually need two rules to evaluate the conditional statement: one for when the condition evaluates to “true”, and one for when the condition evaluates to “false”. Only, I never specified booleans as being part of the language, which means that we will need to come up with what “false” and “true” are. I will take my cue from C++ and use zero as “false”, and any other number as “true”.

If the condition of an if-else statement is “true” (nonzero), then the effect of executing the if-else should be the same as executing the “then” part of the statement, while completely ignoring the “else” part.

$$ \frac { \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1, s_1 \Rightarrow \rho_2} { \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 } $$

Notice that in the above rule, we used the evaluation judgement $\rho_1, e \Downarrow v$ to evaluate the expression that serves as the condition. We then had an additional premise that requires the truthiness of the resulting value $v$. The rule for evaluating a conditional whose condition is “false” is very similar.

$$ \frac { \rho_1 , e \Downarrow v \quad v = 0 \quad \rho_1, s_2 \Rightarrow \rho_2} { \rho_1, \textbf{if}\ e\ \{ s_1 \}\ \textbf{else}\ \{ s_2 \}\ \Rightarrow \rho_2 } $$

Now that we have rules for conditional statements, it will be surprisingly easy to define the rules for while loops. A while loop will also have two rules, one for when its condition is truthy and one for when it’s falsey. However, unlike a conditional, in the “false” case a while loop will do nothing, leaving the environment unchanged:

$$ \frac { \rho_1 , e \Downarrow v \quad v = 0 } { \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_1 } $$

The trickiest rule is for when the condition of a while loop is true. We evaluate the body once, starting in environment $\rho_1$ and finishing in $\rho_2$, but then we’re not done. In fact, we have to go back to the top, and check the condition again, starting over. As a result, we include another premise, that tells us that evaluating the loop starting at $\rho_2$, we eventually end in state $\rho_3$. This encodes the “rest of the iterations” in addition to the one we just performed. The environment $\rho_3$ is our final state, so that’s what we use in the rule’s conclusion.

$$ \frac { \rho_1 , e \Downarrow v \quad v \neq 0 \quad \rho_1 , s \Rightarrow \rho_2 \quad \rho_2 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 } { \rho_1 , \textbf{while}\ e\ \{ s \}\ \Rightarrow \rho_3 } $$

And that’s it! We have now seen every rule that defines the little object language I’ve been using for my Agda work. Below is a Bergamot widget that implements these rules. Try the following program, which computes the xth power of two, and stores it in y:

x = 5; y = 1; while (x) { y = y + y; x = x - 1 }

As with all the other rules we’ve seen, the mathematical notation above can be directly translated into Agda:

From Semantics.agda, lines 47 through 64

data _,_⇒ˢ_ : Env → Stmt → Env → Set where
    ⇒ˢ-⟨⟩ : ∀ (ρ₁ ρ₂ : Env) (bs : BasicStmt) →
            ρ₁ , bs ⇒ᵇ ρ₂ → ρ₁ , ⟨ bs ⟩ ⇒ˢ ρ₂
    ⇒ˢ-then : ∀ (ρ₁ ρ₂ ρ₃ : Env) (s₁ s₂ : Stmt) →
              ρ₁ , s₁ ⇒ˢ ρ₂ → ρ₂ , s₂ ⇒ˢ ρ₃ →
              ρ₁ , (s₁ then s₂) ⇒ˢ ρ₃
    ⇒ˢ-if-true : ∀ (ρ₁ ρ₂ : Env) (e : Expr) (z : ℤ) (s₁ s₂ : Stmt) →
        ρ₁ , e ⇒ᵉ (↑ᶻ z) → ¬ z ≡ (+ 0) → ρ₁ , s₁ ⇒ˢ ρ₂ →
        ρ₁ , (if e then s₁ else s₂) ⇒ˢ ρ₂
    ⇒ˢ-if-false : ∀ (ρ₁ ρ₂ : Env) (e : Expr) (s₁ s₂ : Stmt) →
        ρ₁ , e ⇒ᵉ (↑ᶻ (+ 0)) → ρ₁ , s₂ ⇒ˢ ρ₂ →
        ρ₁ , (if e then s₁ else s₂) ⇒ˢ ρ₂
    ⇒ˢ-while-true : ∀ (ρ₁ ρ₂ ρ₃ : Env) (e : Expr) (z : ℤ) (s : Stmt) →
        ρ₁ , e ⇒ᵉ (↑ᶻ z) → ¬ z ≡ (+ 0) → ρ₁ , s ⇒ˢ ρ₂ → ρ₂ , (while e repeat s) ⇒ˢ ρ₃ →
        ρ₁ , (while e repeat s) ⇒ˢ ρ₃
    ⇒ˢ-while-false : ∀ (ρ : Env) (e : Expr) (s : Stmt) →
        ρ , e ⇒ᵉ (↑ᶻ (+ 0)) →
        ρ , (while e repeat s) ⇒ˢ ρ
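
To check the rules against the power-of-two program, here is a compact Python interpreter sketch. It is an illustration under my own assumptions rather than a rendering of the Agda definitions: programs are tagged tuples, environments are dictionaries, basic statements are folded into the same function as complex ones, and an unassigned variable raises `KeyError` instead of lacking a derivation.

```python
def eval_expr(env, e):
    tag = e[0]
    if tag == "lit":
        return e[1]
    if tag == "var":
        return env[e[1]]
    left, right = eval_expr(env, e[1]), eval_expr(env, e[2])
    return left + right if tag == "add" else left - right


def exec_stmt(env, s):
    """Return the final environment after running statement s from env."""
    tag = s[0]
    if tag == "noop":
        return env
    if tag == "assign":
        _, name, e = s
        return {**env, name: eval_expr(env, e)}
    if tag == "seq":
        _, s1, s2 = s
        env2 = exec_stmt(env, s1)          # ρ1, s1 ⇒ ρ2
        return exec_stmt(env2, s2)         # ρ2, s2 ⇒ ρ3
    if tag == "if":
        _, cond, s1, s2 = s
        return exec_stmt(env, s1 if eval_expr(env, cond) != 0 else s2)
    if tag == "while":
        _, cond, body = s
        if eval_expr(env, cond) == 0:      # "false" rule: environment unchanged
            return env
        env2 = exec_stmt(env, body)        # one iteration of the body
        return exec_stmt(env2, s)          # premise for the rest of the iterations
    raise ValueError(f"unknown statement tag: {tag!r}")


# x = 5; y = 1; while (x) { y = y + y; x = x - 1 }
loop_body = ("seq",
             ("assign", "y", ("add", ("var", "y"), ("var", "y"))),
             ("assign", "x", ("sub", ("var", "x"), ("lit", 1))))
program = ("seq", ("assign", "x", ("lit", 5)),
           ("seq", ("assign", "y", ("lit", 1)),
            ("while", ("var", "x"), loop_body)))

final_env = exec_stmt({}, program)
print(final_env)
```

The recursive call in the `while` branch corresponds to the last premise of the “true” rule: it runs the same loop again, starting from the environment left behind by the body.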

Semantics as Ground Truth

Prior to this post, we had been talking about using lattices and monotone functions for program analysis. The key problem with using this framework to define analyses is that there are many monotone functions that produce complete nonsense; their output is, at best, unrelated to the program they’re supposed to analyze. We don’t want to write such functions, since having incorrect information about the programs in question is unhelpful.

What does it mean for a function to produce correct information, though? In the context of sign analysis, it would mean that if we say a variable x is +, then evaluating the program will leave us in a state in which x is positive. The semantics we defined in this post give us the “evaluating the program” piece. They establish what the programs actually do, and we can use this ground truth when checking that our analyses are correct. In subsequent posts, I will prove the exact property I informally stated above: for the program analyses we define, things they “claim” about our program will match what actually happens when executing the program using our semantics.

A piece of the puzzle still remains: how are we going to use the monotone functions we’ve been talking so much about? We need to figure out what to feed to our analyses before we can prove their correctness.

I have an answer to that question: we will be using control flow graphs (CFGs). These are another program representation, one that’s more commonly found in compilers. I will show what they look like in the next post. I hope to see you there!

Implementing and Verifying "Static Program Analysis" in Agda, Part 4: The Fixed-Point Algorithm

Sun, 03 Nov 2024 17:50:26 -0800

In the previous post we looked at lattices of finite height, which are a crucial ingredient to our static analyses. In this post, I will describe the specific algorithm that makes use of these lattices; this algorithm will be at the core of this series.

Lattice-based static analyses tend to operate by iteratively combining facts from the program into new ones. For instance, when analyzing y = 1 + 2, we take the (trivial) facts that the numbers one and two are positive, and combine them into the knowledge that y is positive as well. If another line of code reads x = y + 1, we then apply our new knowledge of y to determine the sign of x, too. Combining facts in this manner gives us more information, which we can then continue to apply to learn more about the program.

A static program analyzer, however, is a very practical thing. Although in mathematics we may allow ourselves to delve into infinite algorithms, we have no such luxury while trying to, say, compile some code. As a result, after a certain point, we need to stop our iterative (re)combination of facts. In an ideal world, that point would be when we know we have found out everything we could about the program. A corollary to that would be that this point must be guaranteed to eventually occur, lest we keep looking for it indefinitely.

The fixed-point algorithm does this for us. If we describe our analysis as a monotonic function over a finite-height lattice, this algorithm gives us a surefire way to find out facts about our program that constitute “complete” information that can’t be re-inspected to find out more. The algorithm is guaranteed to terminate, which means that we will not get stuck in an infinite loop.

The Algorithm

Take a lattice $L$ and a monotonic function $f$. We’ve talked about monotonicity before, but it’s easy to re-state. Specifically, a function is monotonic if the following rule holds true:

$$ \textbf{if}\ a \le b\ \textbf{then}\ f(a) \le f(b) $$

Recall that the less-than relation on lattices in our case encodes specificity. In particular, if elements of our lattice describe our program, then smaller elements should provide more precise descriptions (where “x is positive” is more precise than “x has any sign”, for example). Viewed through this lens, monotonicity means that more specific inputs produce more specific outputs. That seems reasonable.
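
On a small lattice, monotonicity can be checked by brute force. The Python sketch below assumes the five-element sign lattice (⊥, +, -, 0, ⊤) in which ⊥ is below everything and ⊤ is above everything; the string names and the two example functions `negate` and `scramble` are hypothetical and chosen only for illustration.

```python
SIGNS = ("bot", "+", "-", "0", "top")


def leq(a, b):
    """The lattice order: bot is least, top is greatest, signs are incomparable."""
    return a == b or a == "bot" or b == "top"


def is_monotone(f):
    """Check: if a <= b then f(a) <= f(b), for every pair of elements."""
    return all(leq(f(a), f(b)) for a in SIGNS for b in SIGNS if leq(a, b))


def negate(s):
    # Flips + and -, leaves everything else alone.
    return {"+": "-", "-": "+"}.get(s, s)


def scramble(s):
    # Sends the most specific element to the least specific one.
    return "top" if s == "bot" else s


print(is_monotone(negate))
print(is_monotone(scramble))
```

`scramble` fails the check because ⊥ ≤ + holds while ⊤ ≤ + does not: a more specific input produced a less specific output.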

Now, let’s start with the least element of our lattice, denoted $\bot$. A lattice of finite height is guaranteed to have such an element. If it didn’t, we could always extend chains by tacking on a smaller element to their bottom, and then the lattice wouldn’t have a finite height anymore.

Now, apply $f$ to $\bot$ to get $f(\bot)$. Since $\bot$ is the least element, it must be true that $\bot \le f(\bot)$. Now, if it’s “less than or equal”, is it “less than”, or is it “equal”? If it’s the latter, we have $\bot = f(\bot)$. This means we’ve found a fixed point: given our input $\bot$, our analysis $f$ produced no new information, and we’re done. Otherwise, we are not done, but we know that $\bot < f(\bot)$, which will be helpful shortly.

Continuing the “less than” case, we can apply $f$ again, this time to $f(\bot)$. This gives us $f(f(\bot))$. Since $f$ is monotonic and $\bot \le f(\bot)$, we know also that $f(\bot) \le f(f(\bot))$. Again, ask “which is it?”, and as before, if $f(\bot) = f(f(\bot))$, we have found a fixed point. Otherwise, we know that $f(\bot) < f(f(\bot))$.

We can keep doing this. Notice that with each step, we are either done (having found a fixed point) or we have a new inequality in our hands. We can arrange the ones we’ve seen so far into a chain:

$$ \bot < f(\bot) < f(f(\bot)) < ... $$

Each time we fail to find a fixed point, we add one element to our chain, growing it. But if our lattice $L$ has a finite height, that means eventually this process will have to stop; the chain can’t grow forever. Eventually, we will have to find a value such that $v = f(v)$. Thus, our algorithm is guaranteed to terminate, and give a fixed point.
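
The iteration described above is short enough to sketch in Python. The example lattice here is my own choice, not one from this series: subsets of graph nodes ordered by inclusion, with the empty set as $\bot$, and a monotone function `step` that adds a start node plus the successors of everything already reached. The graph `EDGES` and all names are hypothetical.

```python
EDGES = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}


def step(reached):
    """Monotone function: the start node 'a' plus successors of reached nodes."""
    out = {"a"}
    for node in reached:
        out |= EDGES[node]
    return frozenset(out)


def iterate_to_fixed_point(f, bottom):
    """Apply f repeatedly from bottom; return the whole chain that was built."""
    chain = [bottom]
    while True:
        nxt = f(chain[-1])
        if nxt == chain[-1]:     # v = f(v): a fixed point
            return chain
        chain.append(nxt)        # otherwise the chain grows by one element


chain = iterate_to_fixed_point(step, frozenset())
for element in chain:
    print(sorted(element))
```

Each element of the returned chain is a strict superset of the one before it, and the last element is the fixed point. The `while True` loop is only safe because the lattice has finite height; Python will not check that for us.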

I implemented the iterative process of applying $f$ using a recursive function. Agda has a termination checker, to which the logic above (which proves that iteration will eventually finish) is not at all obvious. The trick to getting it to work was to use a notion of “gas”: an always-decreasing value that serves as one of the function’s arguments. Since the value is always decreasing in size, the termination checker is satisfied.

This works by observing that we already have a rough idea of the maximum number of times our function will recurse; that would be the height of the lattice. After that, we would be building an impossibly long chain. So, we’ll give the function a “budget” of that many iterations, plus one more. Since the chain increases once each time the budget shrinks (indicating recursion), running out of our “gas” will mean that we built an impossibly long chain — it will provably never happen.

In all, the recursive function is as follows:

From Fixedpoint.agda, lines 53 through 64

    doStep : ∀ (g hᶜ : ℕ) (a₁ a₂ : A) (c : ChainA.Chain a₁ a₂ hᶜ) (g+hᶜ≡h : g + hᶜ ≡ suc h) (a₂≼fa₂ : a₂ ≼ f a₂) → Σ A (λ a → a ≈ f a)
    doStep 0 hᶜ a₁ a₂ c g+hᶜ≡sh a₂≼fa₂ rewrite g+hᶜ≡sh = ⊥-elim (ChainA.Bounded-suc-n boundedᴬ c)
    doStep (suc g') hᶜ a₁ a₂ c g+hᶜ≡sh a₂≼fa₂ rewrite sym (+-suc g' hᶜ)
        with ≈-dec a₂ (f a₂)
    ...   | yes a₂≈fa₂ = (a₂ , a₂≈fa₂)
    ...   | no  a₂̷≈fa₂ = doStep g' (suc hᶜ) a₁ (f a₂) c' g+hᶜ≡sh (Monotonicᶠ a₂≼fa₂)
                where
                    a₂≺fa₂ : a₂ ≺ f a₂
                    a₂≺fa₂ = (a₂≼fa₂ , a₂̷≈fa₂)

                    c' : ChainA.Chain a₁ (f a₂) (suc hᶜ)
                    c' rewrite +-comm 1 hᶜ = ChainA.concat c (ChainA.step a₂≺fa₂ ≈-refl (ChainA.done (≈-refl {f a₂})))

The first case handles running out of gas, arguing by bottom-elimination (contradiction). The second case follows the algorithm I’ve described pretty closely; it applies $f$ to an existing value, checks if the result is equal (equivalent) to the original, and if it isn’t, it grows the existing chain of elements and invokes the step function recursively with the grown chain and less gas.

The recursive function implements a single “step” of the process (applying f, comparing for equality, returning the fixed point if one was found). All that’s left is to kick off the process using $\bot$. This is what fix does:

From Fixedpoint.agda, lines 66 through 67

    fix : Σ A (λ a → a ≈ f a)
    fix = doStep (suc h) 0 ⊥ᴬ ⊥ᴬ (ChainA.done ≈-refl) (+-comm (suc h) 0) (⊥ᴬ≼ (f ⊥ᴬ))

This function is responsible for providing gas to doStep; as I mentioned above, it provides just a bit more gas than the maximum-length chain, which means that if the gas is exhausted, we’ve certainly arrived at a contradiction. It also provides an initial chain onto which doStep will keep tacking on new inequalities as it finds them. Since we haven’t found any yet, this is the single-element chain of $\bot$. The last thing it does is set up the recursion invariant (that the sum of the gas and the chain length is constant) and provide a proof that $\bot \le f(\bot)$. This function always returns a fixed point.
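
The gas budget can be mimicked in Python with a bounded loop. This is a loose analogue and not the Agda proof: Python cannot rule out the out-of-gas case statically, so the sketch raises an error where Agda derives a contradiction. The example lattice (the integers 0 through 3 under ≤, whose longest chain has three comparisons) and the function `bump` are assumptions made for illustration.

```python
def fix_with_gas(f, bottom, height):
    """Find a fixed point of f using at most height + 1 equality checks."""
    a = bottom
    for _ in range(height + 1):      # the budget: h + 1 units of gas
        fa = f(a)
        if fa == a:
            return a
        a = fa                       # the chain grew by one comparison
    # Reaching this point would mean a chain longer than the lattice allows.
    raise RuntimeError("out of gas: the claimed height was too small")


def bump(n):
    return min(n + 1, 3)             # monotone on 0..3, with fixed point 3


result = fix_with_gas(bump, 0, 3)
print(result)

try:
    fix_with_gas(bump, 0, 2)         # deliberately understate the height
    ran_out = False
except RuntimeError:
    ran_out = True
print(ran_out)
```

With the true height of 3, the fourth check succeeds and the budget is exactly enough. Understating the height as 2 exhausts the gas, which is the situation the Agda version proves impossible when the bound is correct.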

Least Fixed Point

Functions can have many fixed points. Take the identity function that simply returns its argument unchanged; this function has a fixed point for every element in its domain, since, for example, $\text{id}(1) = 1$, $\text{id}(2) = 2$, etc. The fixed point found by our algorithm above is somewhat special among the possible fixed points of $f$: it is the least fixed point of the function. Call our fixed point $a$; if there’s another point $b$ such that $b=f(b)$, then the fixed point we found must be less than or equal to $b$ (that is, $a \le b$). This is important given our interpretation of “less than” as “more specific”: the fixedpoint algorithm produces the most specific possible information about our program given the rules of our analysis.

The proof is simple; suppose that it took $k$ iterations of calling $f$ to arrive at our fixed point. This gives us:

$$ a = \underbrace{f(f(...(f(}_{k\ \text{times}}\bot)))) = f^k(\bot) $$

Now, take our other fixed point $b$. Since $\bot$ is the least element of the lattice, we have $\bot \le b$.

$$ \begin{array}{ccccccccr} & & \bot & \le & & & b & \quad \implies & \text{(monotonicity of}\ f \text{)}\\ & & f(\bot) & \le & f(b) & = & b & \quad \implies & \text{(} b\ \text{is a fixed point, monotonicity of}\ f \text{)}\\ & & f^2(\bot) & \le & f(b) & = & b & \quad \implies & \text{(} b\ \text{is a fixed point, monotonicity of}\ f \text{)}\\ \\ & & \vdots & & \vdots & & \vdots & & \\ \\ a & = & f^k(\bot) & \le & f(b) & = & b & \end{array} $$

Because of the monotonicity of $f$, each time we apply it, it preserves the less-than relationship that started with $\bot \le b$. Doing that $k$ times, we verify that $a$ is our least fixed point.
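
The least-fixed-point claim can be sanity-checked by enumeration on a small lattice. The Python sketch below uses subsets of {1, 2, 3} ordered by inclusion and a monotone function `f` of my own invention; it computes the fixed point by iterating from the empty set, lists every fixed point of `f`, and compares them.

```python
from itertools import combinations

UNIVERSE = (1, 2, 3)
ELEMENTS = [frozenset(c)
            for r in range(len(UNIVERSE) + 1)
            for c in combinations(UNIVERSE, r)]


def f(s):
    """Monotone: always adds 1, and adds 2 once 1 is present."""
    out = set(s) | {1}
    if 1 in s:
        out.add(2)
    return frozenset(out)


def fix(func, bottom):
    a = bottom
    while func(a) != a:
        a = func(a)
    return a


a = fix(f, frozenset())
fixed_points = [b for b in ELEMENTS if f(b) == b]

print(sorted(a))
print([sorted(b) for b in fixed_points])
# For frozensets, <= is the subset test, which is this lattice's order.
print(all(a <= b for b in fixed_points))
```

Here `f` has two fixed points, {1, 2} and {1, 2, 3}, and iteration from the bottom element finds the smaller one.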

To convince Agda of this proof, we once again get in an argument with the termination checker, which ends the same way it did last time: with us using the notion of ‘gas’ to ensure that the repeated application of $f$ eventually ends. Since we’re interested in verifying that doStep produces the least fixed point, we formulate the proof in terms of doStep applied to various arguments.

From Fixedpoint.agda, lines 76 through 84

    stepPreservesLess : ∀ (g hᶜ : ℕ) (a₁ a₂ b : A) (b≈fb : b ≈ f b) (a₂≼a : a₂ ≼ b)
                     (c : ChainA.Chain a₁ a₂ hᶜ) (g+hᶜ≡h : g + hᶜ ≡ suc h)
                     (a₂≼fa₂ : a₂ ≼ f a₂) →
                     proj₁ (doStep g hᶜ a₁ a₂ c g+hᶜ≡h a₂≼fa₂) ≼ b
    stepPreservesLess 0 _ _ _ _ _ _ c g+hᶜ≡sh _ rewrite g+hᶜ≡sh = ⊥-elim (ChainA.Bounded-suc-n boundedᴬ c)
    stepPreservesLess (suc g') hᶜ a₁ a₂ b b≈fb a₂≼b c g+hᶜ≡sh a₂≼fa₂ rewrite sym (+-suc g' hᶜ)
        with ≈-dec a₂ (f a₂)
    ...   | yes _ = a₂≼b
    ...   | no  _ = stepPreservesLess g' _ _ _ b b≈fb (≼-cong ≈-refl (≈-sym b≈fb) (Monotonicᶠ a₂≼b)) _ _ _

As with doStep, this function takes as arguments the amount of gas g and a partially-built chain c, which gets appended to for each failed equality comparison. In addition, however, this function takes another arbitrary fixed point b, which is greater than or equal to the current input to doStep (which is a value $f^i(\bot)$ for some $i$). It then proves that when doStep terminates (which will be with a value in the form $f^k(\bot)$), this value will still be less than or equal to b. Since it is a proof about doStep, stepPreservesLess proceeds by the same case analysis as its subject, and has a very similar (albeit simpler) structure. In short, though, it encodes the relatively informal proof I gave above.

Just like with doStep, I define a helper function for stepPreservesLess that kicks off its recursive invocations.

From Fixedpoint.agda, lines 86 through 87

aᶠ≼ : ∀ (a : A) → a ≈ f a → aᶠ ≼ a
aᶠ≼ a a≈fa = stepPreservesLess (suc h) 0 ⊥ᴬ ⊥ᴬ a a≈fa (⊥ᴬ≼ a) (ChainA.done ≈-refl) (+-comm (suc h) 0) (⊥ᴬ≼ (f ⊥ᴬ))

Above, aᶠ is the output of fix:

From Fixedpoint.agda, lines 69 through 70

aᶠ : A
aᶠ = proj₁ fix

What is a Program?

With the fixed point algorithm in hand, we have all the tools we need to define static program analyses:

We’ve created a collection of “lattice builders”, which allow us to combine various lattice building blocks into more complicated structures; these structures are advanced enough to represent the information about our programs.
We’ve figured out a way (our fixed point algorithm) to repeatedly apply an inference function to our programs and eventually produce results. This algorithm requires some additional properties from our lattices.
We’ve proven that our lattice builders create lattices with these properties, making it possible to use them to construct functions fit for our fixed point algorithm.

All that’s left is to start defining monotonic functions over lattices! Except, what are we analyzing? We’ve focused a fair bit on the theory of lattices, but we haven’t yet defined even a tiny piece of the language whose programs we will be analyzing. We will start with programs like this:

x = 42
y = 1
if y {
  x = -1;
} else {
  x = -2;
}

We will need to model these programs in Agda by describing them as trees (Abstract Syntax Trees, to be precise). We will also need to specify how to evaluate these programs (provide the semantics of our language). We will use big-step (also known as “natural”) operational semantics to do so; here’s an example rule:

$$ \frac{\rho_1, e \Downarrow z \quad \neg (z = 0) \quad \rho_1,s_1 \Downarrow \rho_2} {\rho_1, \textbf{if}\ e\ \textbf{then}\ s_1\ \textbf{else}\ s_2\ \Downarrow\ \rho_2} $$

The above reads:

If the condition of an if-else statement evaluates to a nonzero value, then to evaluate the statement, you evaluate its then branch.

In the next post, we’ll talk more about how these rules work, and define the remainder of them to give our language life. See you then!

Implementing and Verifying "Static Program Analysis" in Agda, Part 3: Lattices of Finite Height

Thu, 08 Aug 2024 17:29:00 -0700

In the previous post, I introduced the class of finite-height lattices: lattices where chains made from elements and the less-than operator < can only be so long. As a first example, natural numbers form a lattice, but they are not a finite-height lattice; the following chain can be made infinitely long:

$$ 0 < 1 < 2 < ... $$

There isn’t a “biggest natural number”! On the other hand, we’ve seen that our sign lattice has a finite height; the longest chain we can make is three elements long; I showed one such chain (there are many chains of three elements) in the previous post, but here it is again:

$$ \bot < + < \top $$

It’s also true that the Cartesian product lattice $L_1 \times L_2$ has a finite height, as long as $L_1$ and $L_2$ are themselves finite-height lattices. In the specific case where both $L_1$ and $L_2$ are the sign lattice ($L_1 = L_2 = \text{Sign} $) we can observe that the longest chains have five elements. The following is one example:

$$ (\bot, \bot) < (\bot, +) < (\bot, \top) < (+, \top) < (\top, \top) $$
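
The chain lengths quoted above can be confirmed by exhaustive search. The following Python sketch assumes the five-element sign lattice (⊥, +, -, 0, ⊤) and the componentwise order on pairs; the helper `longest_chain` is hypothetical and counts elements in a chain, matching the prose here rather than the comparison-counting convention used in the Agda code.

```python
from functools import lru_cache
from itertools import product

SIGNS = ("bot", "+", "-", "0", "top")


def sign_leq(a, b):
    return a == b or a == "bot" or b == "top"


def pair_leq(p, q):
    """Componentwise order on the Cartesian product."""
    return sign_leq(p[0], q[0]) and sign_leq(p[1], q[1])


def longest_chain(elements, leq):
    """Number of elements in the longest strictly increasing chain."""
    elements = list(elements)

    @lru_cache(maxsize=None)
    def longest_from(a):
        # 1 for a itself, plus the best chain starting at any strictly larger b.
        above = (longest_from(b) for b in elements if a != b and leq(a, b))
        return 1 + max(above, default=0)

    return max(longest_from(a) for a in elements)


print(longest_chain(SIGNS, sign_leq))
print(longest_chain(product(SIGNS, SIGNS), pair_leq))
```

The search only terminates because both lattices are finite; for the product with the natural numbers discussed next, no such enumeration is possible.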

The fact that $L_1$ and $L_2$ are themselves finite-height lattices is important; if either one of them is not, we can easily construct an infinite chain of the products. If we allowed $L_2$ to be natural numbers, we’d end up with infinite chains like this one:

$$ (\bot, 0) < (\bot, 1) < (\bot, 2) < ... $$

Another lattice that has a finite height under certain conditions is the map lattice. The “under certain conditions” part is important; we can easily construct an infinite chain of map lattice elements in general:

$$ \{ a : 1 \} < \{ a : 1, b : 1 \} < \{ a: 1, b: 1, c: 1 \} < ... $$

As long as we have infinitely many keys to choose from, we can always keep adding new keys to make bigger and bigger maps. But if we fix the keys in the map (say, we use only a and b) then suddenly our heights are once again fixed. In fact, for the two keys I just picked, one longest chain is remarkably similar to the product chain above.

$$ \{a: \bot, b: \bot\} < \{a: \bot, b: +\} < \{a: \bot, b: \top\} < \{a: +, b: \top\} < \{a: \top, b: \top\} $$

The class of finite-height lattices is important for static program analysis, because it ensures that our analyses don’t take infinite time. Though there’s an intuitive connection (“finite lattices mean finite execution”), the details of why the former is needed for the latter are nuanced. We’ll talk about them in a subsequent post.

In the meantime, let’s dig deeper into the notion of finite height, and the Agda proofs of the properties I’ve introduced thus far.

Formalizing Finite Height

The formalization I settled on is quite similar to the informal description: a lattice has a finite height $h$ if the longest chain of elements compared by $(<)$ has length exactly $h$. There’s only a slight complication: we allow for equivalent-but-not-equal elements in lattices. For instance, for a map lattice, we don’t care about the order of the keys: so long as two maps relate the same set of keys to the same respective values, we will consider them equal. This, however, is beyond the notion of Agda’s propositional equality (_≡_). Thus, we need to generalize the definition of a chain to support equivalences. I parameterize the Chain module in my code by an equivalence relation, as well as the comparison relation R, which we will set to < for our chains. The equivalence relation _≈_ and the ordering relation R/< are expected to play together nicely (if a < b, and a is equivalent to c, then it should be the case that c < b).

From Chain.agda, lines 3 through 7

module Chain {a} {A : Set a}
             (_≈_ : A → A → Set a)
             (≈-equiv : IsEquivalence A _≈_)
             (_R_ : A → A → Set a)
             (R-≈-cong : ∀ {a₁ a₁' a₂ a₂'} → a₁ ≈ a₁' → a₂ ≈ a₂' → a₁ R a₂ → a₁' R a₂') where

From there, the definition of the Chain data type is much like the definition of a vector from Data.Vec, but indexed by the endpoints, and containing witnesses of R/< between its elements. The indexing allows for representing the type of chains between particular lattice elements, and serves to ensure concatenation and other operations don’t merge disparate chains.

From Chain.agda, lines 19 through 21

    data Chain : A → A → ℕ → Set a where
        done : ∀ {a a' : A} → a ≈ a' → Chain a a' 0
        step : ∀ {a₁ a₂ a₂' a₃ : A} {n : ℕ} → a₁ R a₂ → a₂ ≈ a₂' → Chain a₂' a₃ n → Chain a₁ a₃ (suc n)

In the done case, we create a single-element chain, which has no comparisons. In this case, the chain starts and stops at the same element (where “the same” is modulo our equivalence). The step case prepends a new comparison a₁ < a₂ to an existing chain; once again, we allow for the existing chain to start with a different-but-equivalent element a₂'.
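To see the two constructors in action, here is a minimal, self-contained sketch that specializes the definition to natural numbers. It is not the module from Chain.agda: it assumes the Agda standard library, fixes the equivalence to propositional equality _≡_ and the relation R to _<_ on ℕ, and the name zeroOneTwo is purely illustrative.

```agda
open import Data.Nat using (ℕ; suc; _<_; s≤s; z≤n)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- A simplified copy of the post's Chain, with _≈_ := _≡_ and R := _<_ on ℕ.
data Chain : ℕ → ℕ → ℕ → Set where
  done : ∀ {a a' : ℕ} → a ≡ a' → Chain a a' 0
  step : ∀ {a₁ a₂ a₂' a₃ n : ℕ} → a₁ < a₂ → a₂ ≡ a₂' → Chain a₂' a₃ n → Chain a₁ a₃ (suc n)

-- The chain 0 < 1 < 2 contains two comparisons, so its length index is 2.
-- In the standard library, m < n unfolds to suc m ≤ n, hence the s≤s / z≤n proofs.
zeroOneTwo : Chain 0 2 2
zeroOneTwo = step (s≤s z≤n) refl (step (s≤s (s≤s z≤n)) refl (done refl))
```

Each step contributes one comparison and one suc to the length index, while done contributes none; that is why a chain through three elements has length two.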

With that definition in hand, I define what it means for a type of chains between elements of the lattice A to be bounded by a certain height; simply put, all chains must have length less than or equal to the bound.

From Chain.agda, lines 38 through 39

    Bounded : ℕ → Set a
    Bounded bound = ∀ {a₁ a₂ : A} {n : ℕ} → Chain a₁ a₂ n → n ≤ bound

Though Bounded specifies a bound on the length of chains, it doesn’t specify the (lowest) bound. Specifically, if chains can have a length of at most three, they are bounded by 3, by 30, and by 300. To claim a lowest bound (which would be the maximum length of a chain in the lattice), we need to show that a chain of that length actually exists (otherwise, we could take the previous natural number, and it would be a bound as well). Thus, I define the Height predicate to require that a chain of the desired height exists, and that this height bounds the length of all other chains.

From Chain.agda, lines 47 through 53

    record Height (height : ℕ) : Set a where
        field
            ⊥ : A
            ⊤ : A

            longestChain : Chain ⊥ ⊤ height
            bounded : Bounded height
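One consequence of pairing a witness chain with a bound is that the height, when it exists, is unique. The following is an informal derivation, not something taken from the code; the subscripts on the field names are my own notation for "the field of the Height h record" and "the field of the Height h' record".

$$ \begin{array}{rcl} \text{longestChain}_{h} : \text{Chain}\ \bot\ \top\ h \ \text{ and } \ \text{bounded}_{h'} : \text{Bounded}\ h' & \implies & h \leq h' \\ \text{longestChain}_{h'} : \text{Chain}\ \bot'\ \top'\ h' \ \text{ and } \ \text{bounded}_{h} : \text{Bounded}\ h & \implies & h' \leq h \\ h \leq h' \ \text{ and } \ h' \leq h & \implies & h = h' \end{array} $$

In other words, Bounded alone admits many numbers, but Height admits only one.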

Finally, for a lattice to have a finite height, the type of chains formed by using its less-than operator needs to have that height (satisfy the Height h predicate). To avoid having to thread through the equivalence relation, congruence proof, and more, I define a specialized predicate for lattices specifically. I do so as a “method” in my IsLattice record.

From Lattice.agda, lines 183 through 210

record IsLattice {a} (A : Set a)
    (_≈_ : A → A → Set a)
    (_⊔_ : A → A → A)
    (_⊓_ : A → A → A) : Set a where

    field
        joinSemilattice : IsSemilattice A _≈_ _⊔_
        meetSemilattice : IsSemilattice A _≈_ _⊓_

        absorb-⊔-⊓ : (x y : A) → (x ⊔ (x ⊓ y)) ≈ x
        absorb-⊓-⊔ : (x y : A) → (x ⊓ (x ⊔ y)) ≈ x

    open IsSemilattice joinSemilattice public
    open IsSemilattice meetSemilattice public using () renaming
        ( ⊔-assoc to ⊓-assoc
        ; ⊔-comm to ⊓-comm
        ; ⊔-idemp to ⊓-idemp
        ; ⊔-Monotonicˡ to ⊓-Monotonicˡ
        ; ⊔-Monotonicʳ to ⊓-Monotonicʳ
        ; ≈-⊔-cong to ≈-⊓-cong
        ; _≼_ to _≽_
        ; _≺_ to _≻_
        ; ≼-refl to ≽-refl
        ; ≼-trans to ≽-trans
        )

    FixedHeight : ∀ (h : ℕ) → Set a
    FixedHeight h = Chain.Height (_≈_) ≈-equiv _≺_ ≺-cong h

Thus, bringing the operators and other definitions of IsLattice into scope will also bring in the FixedHeight predicate.

Fixed Height of the “Above-Below” Lattice

We’ve already seen intuitive evidence that the sign lattice — which is an instance of the “above-below” lattice — has a fixed height. The reason is simple: we extended a set of incomparable elements with a single element that’s greater, and a single element that’s lower. We can’t make chains out of incomparable elements (since we can’t compare them using <); thus, we can only have one < from the new least element, and one < from the new greatest element.

The proof is a bit tedious, but not all that complicated. First, a few auxiliary helpers; feel free to read only the type signatures. They specify, respectively:

That the bottom element $\bot$ of the above-below lattice is less than any concrete value from the underlying set. For instance, in the sign lattice case, $\bot < +$.
That $\bot$ is the only element satisfying the first property; that is, any value strictly less than an element of the underlying set must be $\bot$.
That the top element $\top$ of the above-below lattice is greater than any concrete value of the underlying set. This is the dual of the first property.
That, much like the bottom element is the only value strictly less than elements of the underlying set, the top element is the only value strictly greater.

From AboveBelow.agda, lines 315 through 335

    ⊥≺[x] : ∀ (x : A) → ⊥ ≺ [ x ]
    ⊥≺[x] x = (≈-refl , λ ())

    x≺[y]⇒x≡⊥ : ∀ (x : AboveBelow) (y : A) → x ≺ [ y ] → x ≡ ⊥
    x≺[y]⇒x≡⊥  x y ((x⊔[y]≈[y]) , x̷≈[y]) with x
    ... | ⊥ = refl
    ... | ⊤ with () ← x⊔[y]≈[y]
    ... | [ b ] with ≈₁-dec b y
    ...     | yes b≈y = ⊥-elim (x̷≈[y] (≈-lift b≈y))
    ...     | no _ with () ← x⊔[y]≈[y]

    [x]≺⊤ : ∀ (x : A) → [ x ] ≺ ⊤
    [x]≺⊤ x rewrite x⊔⊤≡⊤ [ x ] = (≈-⊤-⊤ , λ ())

    [x]≺y⇒y≡⊤ : ∀ (x : A) (y : AboveBelow) → [ x ] ≺ y → y ≡ ⊤
    [x]≺y⇒y≡⊤ x y ([x]⊔y≈y , [x]̷≈y) with y
    ... | ⊥ with () ← [x]⊔y≈y
    ... | ⊤ = refl
    ... | [ a ] with ≈₁-dec x a
    ...     | yes x≈a = ⊥-elim ([x]̷≈y (≈-lift x≈a))
    ...     | no _  with () ← [x]⊔y≈y

From there, we can construct an instance of the longest chain. Actually, there’s a bit of a hang-up: what if the underlying set is empty? Concretely, what if there were no signs? Then we could only construct a chain with one comparison: $\bot < \top$. Instead of adding logic to conditionally specify the length, I simply require that the set is populated by requiring a witness:

From AboveBelow.agda, line 85

module Plain (x : A) where

I use this witness to construct the two-< chain.

From AboveBelow.agda, lines 339 through 340

    longestChain : Chain ⊥ ⊤ 2
    longestChain = step (⊥≺[x] x) ≈-refl (step ([x]≺⊤ x) ≈-⊤-⊤ (done ≈-⊤-⊤))

The proof that the length of two — in terms of comparisons — is the bound of all chains of AboveBelow elements requires systematically rejecting all longer chains. Informally, suppose you have a chain of three or more comparisons.

If it starts with $\top$, you can’t add any more elements since that’s the greatest element (contradiction).
If you start with an element of the underlying set, you could add another element, but it has to be the top element; after that, you can’t add any more (contradiction).
If you start with $\bot$, you could arrive at a chain of two comparisons, but you can’t go beyond that (in three cases, each leading to contradictions).

From AboveBelow.agda, lines 342 through 355

    ¬-Chain-⊤ : ∀ {ab : AboveBelow} {n : ℕ} → ¬ Chain ⊤ ab (suc n)
    ¬-Chain-⊤ {x} (step (⊤⊔x≈x , ⊤̷≈x) _ _) rewrite ⊤⊔x≡⊤ x = ⊥-elim (⊤̷≈x ⊤⊔x≈x)

    isLongest : ∀ {ab₁ ab₂ : AboveBelow} {n : ℕ} → Chain ab₁ ab₂ n → n ≤ 2
    isLongest (done _) = z≤n
    isLongest (step _ _ (done _)) = s≤s z≤n
    isLongest (step _ _ (step _ _ (done _))) = s≤s (s≤s z≤n)
    isLongest {⊤} c@(step _ _ _) = ⊥-elim (¬-Chain-⊤ c)
    isLongest {[ x ]} (step {_} {y} [x]≺y y≈y' c@(step _ _ _))
        rewrite [x]≺y⇒y≡⊤ x y [x]≺y with ≈-⊤-⊤ ← y≈y' = ⊥-elim (¬-Chain-⊤ c)
    isLongest {⊥} (step {_} {⊥} (_ , ⊥̷≈⊥) _ _) = ⊥-elim (⊥̷≈⊥ ≈-⊥-⊥)
    isLongest {⊥} (step {_} {⊤} _ ≈-⊤-⊤ c@(step _ _ _)) = ⊥-elim (¬-Chain-⊤ c)
    isLongest {⊥} (step {_} {[ x ]} _ (≈-lift _) (step [x]≺y y≈z c@(step _ _ _)))
        rewrite [x]≺y⇒y≡⊤ _ _ [x]≺y with ≈-⊤-⊤ ← y≈z = ⊥-elim (¬-Chain-⊤ c)

Thus, the above-below lattice has a height of two comparisons (or alternatively, three elements).

From AboveBelow.agda, lines 357 through 363

    fixedHeight : IsLattice.FixedHeight isLattice 2
    fixedHeight = record
        { ⊥ = ⊥
        ; ⊤ = ⊤
        ; longestChain = longestChain
        ; bounded = isLongest
        }

And that’s it.

Fixed Height of the Product Lattice

Now, for something less tedious. We saw above that for a product lattice to have a finite height, its constituent lattices must have a finite height. The proof was by contradiction (by constructing an infinitely long product chain given a single infinite lattice). As a result, we’ll focus this section on products of two finite-height lattices A and B. Additionally, for the proofs in this section, I require element equivalence to be decidable.

From Prod.agda, lines 115 through 117

module _ (≈₁-dec : IsDecidable _≈₁_) (≈₂-dec : IsDecidable _≈₂_)
         (h₁ h₂ : ℕ)
         (fhA : FixedHeight₁ h₁) (fhB : FixedHeight₂ h₂) where

Let’s think about how we might go about constructing the longest chain in a product lattice. Let’s start with some arbitrary element $p_1 = (a_1, b_1)$. We need to find another value that isn’t equal to $p_1$, because we’re building chains of the less-than operator $(<)$, and not the less-than-or-equal operator $(\leq)$. As a result, we need to change either the first component, the second component, or both. If we’re building “to the right” (adding bigger elements), the new components would need to be bigger. Suppose then that we came up with $a_2$ and $b_2$, with $a_1 < a_2$ and $b_1 < b_2$. We could then create a length-one chain:

$$ (a_1, b_1) < (a_2, b_2) $$

That works, but we can construct an even longer chain by increasing only one element at a time:

$$ (a_1, b_1) < (a_1, b_2) < (a_2, b_2) $$

We can apply this logic every time; the conclusion is that when building up a chain, we need to increase one element at a time. Then, how many times can we increase an element? Well, if lattice A has a height of two (comparisons), then we can take its lowest element, and increase it twice. Similarly, if lattice B has a height of three, starting at its lowest element, we can increase it three times. In all, when building a chain of A × B, we can increase an element five times. Generally, the number of < in the product chain is the sum of the numbers of < in the chains of A and B.

This gives us a recipe for constructing the longest chain in the product lattice: take the longest chains of A and B, and start with the product of their lowest elements. Then, increase the elements one at a time according to the chains. The simplest way to do that might be to increase by all elements of the A chain, and then by all of the elements of the B chain (or the other way around). That’s the strategy I took when constructing the $\text{Sign} \times \text{Sign}$ chain above.
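As a concrete instance of this recipe, take the heights from the previous paragraph: suppose the longest chain in A is $a_0 < a_1 < a_2$ (two comparisons) and the longest chain in B is $b_0 < b_1 < b_2 < b_3$ (three comparisons). The element names are placeholders of my own. Walking the A chain first and then the B chain gives:

$$ (a_0, b_0) < (a_1, b_0) < (a_2, b_0) < (a_2, b_1) < (a_2, b_2) < (a_2, b_3) $$

The resulting chain has $2 + 3 = 5$ comparisons, and the point at which we switch from increasing the first component to increasing the second is $(a_2, b_0)$: the greatest element of A paired with the least element of B.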

To formalize this notion, a few lemmas. First, given two chains where one starts with the same element another ends with, we can combine them into one long chain.

From Chain.agda, lines 31 through 33

    concat : ∀ {a₁ a₂ a₃ : A} {n₁ n₂ : ℕ} → Chain a₁ a₂ n₁ → Chain a₂ a₃ n₂ → Chain a₁ a₃ (n₁ + n₂)
    concat (done a₁≈a₂) a₂a₃ = Chain-≈-cong₁ (≈-sym a₁≈a₂) a₂a₃
    concat (step a₁Ra a≈a' a'a₂) a₂a₃ = step a₁Ra a≈a' (concat a'a₂ a₂a₃)

More interestingly, given a chain of comparisons in one lattice, we are able to lift it into a chain in another lattice by applying a function to each element. This function must be monotonic, because it must not be able to reverse $a < b$ such that $f(b) < f(a)$. Moreover, this function should be injective, because if $f(a) = f(b)$, then a chain $a < b$ might be collapsed into $f(a) \not< f(a)$, changing its length. Finally, the function needs to produce equivalent outputs when given equivalent inputs. The result is the following lemma:

From Lattice.agda, lines 226 through 247

module ChainMapping {a b} {A : Set a} {B : Set b}
    {_≈₁_ : A → A → Set a} {_≈₂_ : B → B → Set b}
    {_⊔₁_ : A → A → A} {_⊔₂_ : B → B → B}
    (slA : IsSemilattice A _≈₁_ _⊔₁_) (slB : IsSemilattice B _≈₂_ _⊔₂_) where

    open IsSemilattice slA renaming (_≼_ to _≼₁_; _≺_ to _≺₁_; ≈-equiv to ≈₁-equiv; ≺-cong to ≺₁-cong)
    open IsSemilattice slB renaming (_≼_ to _≼₂_; _≺_ to _≺₂_; ≈-equiv to ≈₂-equiv; ≺-cong to ≺₂-cong)

    open Chain _≈₁_ ≈₁-equiv _≺₁_ ≺₁-cong using () renaming (Chain to Chain₁; step to step₁; done to done₁)
    open Chain _≈₂_ ≈₂-equiv _≺₂_ ≺₂-cong using () renaming (Chain to Chain₂; step to step₂; done to done₂)

    Chain-map : ∀ (f : A → B) →
                Monotonic _≼₁_ _≼₂_ f →
                Injective _≈₁_ _≈₂_ f →
                f Preserves _≈₁_ ⟶  _≈₂_ →
                ∀ {a₁ a₂ : A} {n : ℕ} → Chain₁ a₁ a₂ n → Chain₂ (f a₁) (f a₂) n
    Chain-map f Monotonicᶠ Injectiveᶠ Preservesᶠ (done₁ a₁≈a₂) =
        done₂ (Preservesᶠ a₁≈a₂)
    Chain-map f Monotonicᶠ Injectiveᶠ Preservesᶠ (step₁ (a₁≼₁a , a₁̷≈₁a) a≈₁a' a'a₂) =
        let fa₁≺₂fa = (Monotonicᶠ a₁≼₁a , λ fa₁≈₂fa → a₁̷≈₁a (Injectiveᶠ fa₁≈₂fa))
            fa≈fa' = Preservesᶠ a≈₁a'
        in step₂ fa₁≺₂fa fa≈fa' (Chain-map f Monotonicᶠ Injectiveᶠ Preservesᶠ a'a₂)
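The step case is where monotonicity and injectivity each do their part. Reading the strict comparison as a pair (a non-strict comparison together with a proof of non-equivalence, which is how the code pattern-matches on it), the argument can be sketched as follows:

$$ \begin{array}{rcl} a_1 \prec a & \iff & a_1 \preceq a \ \text{ and } \ a_1 \not\approx a \\ a_1 \preceq a & \implies & f(a_1) \preceq f(a) \quad (\text{monotonicity}) \\ a_1 \not\approx a & \implies & f(a_1) \not\approx f(a) \quad (\text{contrapositive of injectivity}) \\ f(a_1) \preceq f(a) \ \text{ and } \ f(a_1) \not\approx f(a) & \iff & f(a_1) \prec f(a) \end{array} $$

Since every comparison in the original chain yields exactly one comparison in the lifted chain, the length index n is unchanged.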

Given this, and two lattices of finite height, we construct the full product chain by lifting the A chain into the product via $a \mapsto (a, \bot_2)$, lifting the B chain into the product via $b \mapsto (\top_1, b)$, and concatenating the results. This works because the first chain ends with $(\top_1, \bot_2)$, and the second starts with it.

From Prod.agda, lines 169 through 171

      ; longestChain = concat
            (ChainMapping₁.Chain-map (λ a → (a , ⊥₂)) (∙,b-Monotonic _) proj₁ (∙,b-Preserves-≈₁ _) longestChain₁)
            (ChainMapping₂.Chain-map (λ b → (⊤₁ , b)) (a,∙-Monotonic _) proj₂ (a,∙-Preserves-≈₂ _) longestChain₂)

This gets us the longest chain; what remains is to prove that this chain’s length is the bound of all other chains. To do so, we need to work in the opposite direction; given a chain in the product lattice, we need to somehow reduce it to chains in lattices A and B, and leverage their finite height to complete the proof.

The key idea is that for every two consecutive elements in the product lattice chain, we know that at least one of their components must’ve increased. This increase had to come either from elements in lattice A or in lattice B. We can thus stick this increase into an A-chain or a B-chain, increasing its length. Since one of the chains grows with every consecutive pair, the number of consecutive pairs can’t exceed the combined lengths of the A and B chains.

I implement this idea as an unzip function, which takes a product chain and produces two chains made from its increases. By the logic we’ve described, the combined length of the two chains has to bound the main one’s. I give the signature below, and will put the implementation in a collapsible detail block. One last detail is that the need to decide which chain to grow (and thus which element has increased) is what introduces the need for decidable equality.

From Prod.agda, line 149

        unzip : ∀ {a₁ a₂ : A} {b₁ b₂ : B} {n : ℕ} → Chain (a₁ , b₁) (a₂ , b₂) n → Σ (ℕ × ℕ) (λ (n₁ , n₂) → ((Chain₁ a₁ a₂ n₁ × Chain₂ b₁ b₂ n₂) × (n ≤ n₁ + n₂)))
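To make the decomposition concrete, here is a hypothetical three-comparison product chain (the element names are placeholders) alongside the two chains that unzip would extract from it:

$$ \begin{array}{rl} \text{product chain } (n = 3): & (a_1, b_1) < (a_2, b_1) < (a_2, b_2) < (a_3, b_3) \\ \text{A chain } (n_1 = 2): & a_1 < a_2 < a_3 \\ \text{B chain } (n_2 = 2): & b_1 < b_2 < b_3 \\ \text{bound}: & 3 \leq 2 + 2 \end{array} $$

The first comparison grows only the A chain, the second grows only the B chain, and the third grows both. That last case is why the signature promises $n \leq n_1 + n_2$ rather than an equality.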

Implementation of unzip:

From Prod.agda, lines 149 through 163

        unzip : ∀ {a₁ a₂ : A} {b₁ b₂ : B} {n : ℕ} → Chain (a₁ , b₁) (a₂ , b₂) n → Σ (ℕ × ℕ) (λ (n₁ , n₂) → ((Chain₁ a₁ a₂ n₁ × Chain₂ b₁ b₂ n₂) × (n ≤ n₁ + n₂)))
        unzip (done (a₁≈a₂ , b₁≈b₂)) = ((0 , 0) , ((done₁ a₁≈a₂ , done₂ b₁≈b₂) , ≤-refl))
        unzip {a₁} {a₂} {b₁} {b₂} {n} (step {(a₁ , b₁)} {(a , b)} ((a₁≼a , b₁≼b) , a₁b₁̷≈ab) (a≈a' , b≈b') a'b'a₂b₂)
            with ≈₁-dec a₁ a | ≈₂-dec b₁ b | unzip a'b'a₂b₂
        ...   | yes a₁≈a | yes b₁≈b | ((n₁ , n₂) , ((c₁ , c₂) , n≤n₁+n₂)) = ⊥-elim (a₁b₁̷≈ab (a₁≈a , b₁≈b))
        ...   | no a₁̷≈a  | yes b₁≈b | ((n₁ , n₂) , ((c₁ , c₂) , n≤n₁+n₂)) =
                ((suc n₁ , n₂) , ((step₁ (a₁≼a , a₁̷≈a) a≈a' c₁ , Chain₂-≈-cong₁ (≈₂-sym (≈₂-trans b₁≈b b≈b')) c₂), +-monoʳ-≤ 1 (n≤n₁+n₂)))
        ...   | yes a₁≈a | no b₁̷≈b | ((n₁ , n₂) , ((c₁ , c₂) , n≤n₁+n₂)) =
                ((n₁ , suc n₂) , ( (Chain₁-≈-cong₁ (≈₁-sym (≈₁-trans a₁≈a a≈a')) c₁ , step₂ (b₁≼b , b₁̷≈b) b≈b' c₂)
                                 , subst (n ≤_) (sym (+-suc n₁ n₂)) (+-monoʳ-≤ 1 n≤n₁+n₂)
                                 ))
        ...   | no a₁̷≈a  | no b₁̷≈b | ((n₁ , n₂) , ((c₁ , c₂) , n≤n₁+n₂)) =
                ((suc n₁ , suc n₂) , ( (step₁ (a₁≼a , a₁̷≈a) a≈a' c₁ , step₂ (b₁≼b , b₁̷≈b) b≈b' c₂)
                                     , m≤n⇒m≤o+n 1 (subst (n ≤_) (sym (+-suc n₁ n₂)) (+-monoʳ-≤ 1 n≤n₁+n₂))
                                     ))

Having decomposed the product chain into constituent chains, we simply combine the facts that they have to be bounded by the height of the A and B lattices, as well as the fact that they bound the combined chain.

From Prod.agda, lines 165 through 175

    fixedHeight : IsLattice.FixedHeight isLattice (h₁ + h₂)
    fixedHeight = record
      { ⊥ = (⊥₁ , ⊥₂)
      ; ⊤ = (⊤₁ , ⊤₂)
      ; longestChain = concat
            (ChainMapping₁.Chain-map (λ a → (a , ⊥₂)) (∙,b-Monotonic _) proj₁ (∙,b-Preserves-≈₁ _) longestChain₁)
            (ChainMapping₂.Chain-map (λ b → (⊤₁ , b)) (a,∙-Monotonic _) proj₂ (a,∙-Preserves-≈₂ _) longestChain₂)
      ; bounded = λ a₁b₁a₂b₂ →
            let ((n₁ , n₂) , ((a₁a₂ , b₁b₂) , n≤n₁+n₂)) = unzip a₁b₁a₂b₂
            in ≤-trans n≤n₁+n₂ (+-mono-≤ (bounded₁ a₁a₂) (bounded₂ b₁b₂))
      }
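Spelled out, the bounded field chains two inequalities together. The first comes from unzip; the second comes from adding up the individual bounds on the A and B chains (bounded₁ and bounded₂ in the code):

$$ n \ \leq\ n_1 + n_2 \ \leq\ h_1 + h_2 $$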

This completes the proof!

Iterated Products

The product lattice allows us to combine finite height lattices into a new finite height lattice. From there, we can use this newly created lattice as a component of yet another product lattice. For instance, if we had $L_1 \times L_2$, we can take a product of that with $L_1$ again, and get $L_1 \times (L_1 \times L_2)$. Since this also creates a finite-height lattice, we can repeat this process, and keep taking a product with $L_1$, creating:

$$ \overbrace{L_1 \times ... \times L_1}^{n\ \text{times}} \times L_2. $$

I call this the iterated product lattice. Its significance will become clear shortly; in the meantime, let’s prove that it is indeed a lattice (of finite height). To create an iterated product lattice, we still need two constituent lattices as input.

From IterProd.agda, lines 7 through 11

module Lattice.IterProd {a} {A B : Set a}
    (_≈₁_ : A → A → Set a) (_≈₂_ : B → B → Set a)
    (_⊔₁_ : A → A → A) (_⊔₂_ : B → B → B)
    (_⊓₁_ : A → A → A) (_⊓₂_ : B → B → B)
    (lA : IsLattice A _≈₁_ _⊔₁_ _⊓₁_) (lB : IsLattice B _≈₂_ _⊔₂_ _⊓₂_) where

From IterProd.agda, lines 23 through 24

IterProd : ℕ → Set a
IterProd k = iterate k (λ t → A × t) B

At a high level, the proof goes by induction on the number of applications of the product. There’s just one trick. I’d like to build up an isLattice instance even if A and B are not finite-height. That’s because in that case, the iterated product is still a lattice, just not one with a finite height. On the other hand, the isFiniteHeightLattice proof requires the isLattice proof. Since we’re building up by induction, that means that at every recursive invocation of the function, we need to get the “partial” lattice instance and give it to the “partial” finite height lattice instance. When I implemented the inductive proof for isLattice independently from the (more specific) inductive proof of isFiniteHeightLattice, Agda could not unify the two isLattice instances (the “actual” one and the one that serves as witness for isFiniteHeightLattice). This led to some trouble and inconvenience, and so, I thought it best to build the two up together.

To build up the lattice instance and, if possible, the finite height instance, I needed to allow for the constituent lattices being either finite or infinite. I supported this by defining a helper type:

From IterProd.agda, lines 40 through 55

    record RequiredForFixedHeight : Set (lsuc a) where
        field
            ≈₁-dec : IsDecidable _≈₁_
            ≈₂-dec : IsDecidable _≈₂_
            h₁ h₂ : ℕ
            fhA : FixedHeight₁ h₁
            fhB : FixedHeight₂ h₂

        ⊥₁ : A
        ⊥₁ = Height.⊥ fhA

        ⊥₂ : B
        ⊥₂ = Height.⊥ fhB

        ⊥k : ∀ (k : ℕ) → IterProd k
        ⊥k = build ⊥₁ ⊥₂

Then, I defined the “everything at once” type, which, instead of a field for the proof of finite height, has a field that constructs this proof if the necessary additional information is present.

From IterProd.agda, lines 57 through 76

    record IsFiniteHeightWithBotAndDecEq {A : Set a} {_≈_ : A → A → Set a} {_⊔_ : A → A → A} {_⊓_ : A → A → A} (isLattice : IsLattice A _≈_ _⊔_ _⊓_) (⊥ : A) : Set (lsuc a) where
        field
            height : ℕ
            fixedHeight : IsLattice.FixedHeight isLattice height
            ≈-dec : IsDecidable _≈_

            ⊥-correct : Height.⊥ fixedHeight ≡ ⊥

    record Everything (k : ℕ) : Set (lsuc a) where
        T = IterProd k

        field
            _≈_ : T → T → Set a
            _⊔_ : T → T → T
            _⊓_ : T → T → T

            isLattice : IsLattice T _≈_ _⊔_ _⊓_
            isFiniteHeightIfSupported :
                ∀ (req : RequiredForFixedHeight) →
                IsFiniteHeightWithBotAndDecEq isLattice (RequiredForFixedHeight.⊥k req k)

Finally, the proof by induction. It’s actually relatively long, so I’ll include it as a collapsible block.

The inductive proof:

From IterProd.agda, lines 78 through 120

    everything : ∀ (k : ℕ) → Everything k
    everything 0 = record
        { _≈_ = _≈₂_
        ; _⊔_ = _⊔₂_
        ; _⊓_ = _⊓₂_
        ; isLattice = lB
        ; isFiniteHeightIfSupported = λ req → record
            { height = RequiredForFixedHeight.h₂ req
            ; fixedHeight = RequiredForFixedHeight.fhB req
            ; ≈-dec = RequiredForFixedHeight.≈₂-dec req
            ; ⊥-correct = refl
            }
        }
    everything (suc k') = record
        { _≈_ = P._≈_
        ; _⊔_ = P._⊔_
        ; _⊓_ = P._⊓_
        ; isLattice = P.isLattice
        ; isFiniteHeightIfSupported = λ req →
            let
                fhlRest = Everything.isFiniteHeightIfSupported everythingRest req
            in
                record
                    { height = (RequiredForFixedHeight.h₁ req) + IsFiniteHeightWithBotAndDecEq.height fhlRest
                    ; fixedHeight =
                        P.fixedHeight
                        (RequiredForFixedHeight.≈₁-dec req) (IsFiniteHeightWithBotAndDecEq.≈-dec fhlRest)
                        (RequiredForFixedHeight.h₁ req) (IsFiniteHeightWithBotAndDecEq.height fhlRest)
                        (RequiredForFixedHeight.fhA req) (IsFiniteHeightWithBotAndDecEq.fixedHeight fhlRest)
                    ; ≈-dec = P.≈-dec (RequiredForFixedHeight.≈₁-dec req) (IsFiniteHeightWithBotAndDecEq.≈-dec fhlRest)
                    ; ⊥-correct =
                        cong ((Height.⊥ (RequiredForFixedHeight.fhA req)) ,_)
                             (IsFiniteHeightWithBotAndDecEq.⊥-correct fhlRest)
                    }
        }
        where
            everythingRest = everything k'

            import Lattice.Prod
                _≈₁_ (Everything._≈_ everythingRest)
                _⊔₁_ (Everything._⊔_ everythingRest)
                _⊓₁_ (Everything._⊓_ everythingRest)
                lA  (Everything.isLattice everythingRest) as P
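Reading the height fields off the two cases of everything gives a simple recurrence. Writing $\text{height}(k)$ for the height of IterProd k (my notation, not a name from the code), and assuming the requirements in RequiredForFixedHeight are met:

$$ \begin{array}{rcl} \text{height}(0) & = & h_2 \\ \text{height}(k + 1) & = & h_1 + \text{height}(k) \\ \text{height}(k) & = & k \cdot h_1 + h_2 \end{array} $$

The closed form on the last line follows by unfolding the recurrence $k$ times; it matches the intuition from the product lattice, where each additional factor of A contributes $h_1$ more comparisons.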

Fixed Height of the Map Lattice

We saw above that we can make a map lattice have a finite height if we fix its keys. How does this work? Well, if the keys are always the same, we can think of such a map as just a tuple, with as many elements as there are keys.

$$ \begin{array}{ccccc} \{ & a: 1, & b: 2, & c: 3 & \} \\ & & \iff & & \\ ( & 1, & 2, & 3 & ) \end{array} $$

This is why I introduced iterated products earlier; we can use them to construct the second lattice in the example above. I’ll take one departure from that example, though: I’ll “pad” the tuples with an additional unit element at the end. The unit type (denoted $\top$), which has only a single element, forms a finite height lattice trivially; I prove this in an appendix below. Using this padding helps reduce the number of special cases; without the padding, the tuple definition might be something like the following:

$$ \text{tup}(A, k) = \begin{cases} \top & k = 0 \\ A & k = 1 \\ A \times \text{tup}(A, k - 1) & k > 1 \end{cases} $$

On the other hand, if we were to allow the extra padding, we could drop the definition down to:

$$ \text{tup}(A, k) = \text{iterate}(t \mapsto A \times t, k, \top) = \begin{cases} \top & k = 0 \\ A \times \text{tup}(A, k - 1) & k > 0 \end{cases} $$
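The padded definition translates almost directly into Agda. The following is a minimal, self-contained sketch rather than the code from IterProd.agda: it assumes the Agda standard library, and the names tup and abc are illustrative only.

```agda
open import Data.Nat using (ℕ; zero; suc)
open import Data.Product using (_×_; _,_)
open import Data.Unit using (⊤; tt)

-- Padded tuples: k copies of A, followed by a single unit element.
tup : Set → ℕ → Set
tup A zero    = ⊤
tup A (suc k) = A × tup A k

-- The map { a: 1, b: 2, c: 3 } with its keys forgotten, plus the tt padding.
abc : tup ℕ 3
abc = (1 , (2 , (3 , tt)))
```

The trailing tt carries no information; it only exists so that the empty tuple and the one-element tuple don't need separate treatment.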

And so, we drop from three to two cases, which means less proof work for us. The tough part is to prove that the two representations of maps (the key-value list and the iterated product) are equivalent. We will not have much trouble proving that they’re both lattices (we did that last time, for both products and maps). Instead, what we need to do is prove that the height of one lattice is the same as the height of the other. We prove this by providing something like an isomorphism: a pair of functions that convert between the two representations, and preserve the properties and relationships (such as $(\sqcup)$) of lattice elements. In fact, the list of the conversion functions’ properties is quite extensive:

From Isomorphism.agda, lines 22 through 33

module TransportFiniteHeight
         {a b : Level} {A : Set a} {B : Set b}
         {_≈₁_ : A → A → Set a} {_≈₂_ : B → B → Set b}
         {_⊔₁_ : A → A → A} {_⊔₂_ : B → B → B}
         {_⊓₁_ : A → A → A} {_⊓₂_ : B → B → B}
         {height : ℕ}
         (fhlA : IsFiniteHeightLattice A height _≈₁_ _⊔₁_ _⊓₁_) (lB : IsLattice B _≈₂_ _⊔₂_ _⊓₂_)
         {f : A → B} {g : B → A}
         (f-preserves-≈₁ : f Preserves _≈₁_ ⟶  _≈₂_) (g-preserves-≈₂ : g Preserves _≈₂_ ⟶  _≈₁_)
         (f-⊔-distr : ∀ (a₁ a₂ : A) → f (a₁ ⊔₁ a₂) ≈₂ ((f a₁) ⊔₂ (f a₂)))
         (g-⊔-distr : ∀ (b₁ b₂ : B) → g (b₁ ⊔₂ b₂) ≈₁ ((g b₁) ⊔₁ (g b₂)))
         (inverseˡ : IsInverseˡ _≈₁_ _≈₂_ f g) (inverseʳ : IsInverseʳ _≈₁_ _≈₂_ f g) where

First, the functions must preserve our definition of equivalence. Thus, if we convert two equivalent elements from the list representation to the tuple representation, the resulting tuples should be equivalent as well. The reverse must be true, too.
Second, the functions must preserve the binary operations — see also the definition of a homomorphism. Specifically, if $f$ is a conversion function, then the following should hold:
$$ f(a \sqcup b) \approx f(a) \sqcup f(b) $$
For the purposes of proving that equivalent maps have finite heights, it turns out that this property need only hold for the join operator $(\sqcup)$.
Finally, the functions must be inverses of each other. If you convert a list to a tuple, and then the tuple back into a list, the resulting value should be equivalent to what we started with. In fact, they need to be both “left” and “right” inverses, so that both $f(g(x))\approx x$ and $g(f(x)) \approx x$.

Given this, the high-level proof is in two parts:

Proving that a chain of the same height exists in the second (e.g., tuple) lattice: To do this, we want to take the longest chain in the first (e.g. key-value list) lattice, and convert it into a chain in the second. The mechanism for this is not too hard to imagine: we just take the original chain, and apply the conversion function to each element.

Intuitively, this works because of the structure-preserving properties we required above. For instance (recall the definition of $(\leq)$ given by Lars Hupel, which in brief is $a \leq b \triangleq a \sqcup b = b$):
$$ \begin{array}{rcr} a \leq b & \iff & (\text{definition of less than})\\ a \sqcup b \approx b & \implies & (\text{conversions preserve equivalence}) \\ f(a \sqcup b) \approx f(b) & \implies & (\text{conversions distribute over binary operations}) \\ f(a) \sqcup f(b) \approx f(b) & \iff & (\text{definition of less than}) \\ f(a) \leq f(b) \end{array} $$
Proving that longer chains can’t exist in the second (e.g., tuple) lattice: we’ve already seen the mechanism to port a chain from one lattice to another lattice, and we can use this same mechanism (but switching directions) to go in reverse. If we do that, we can take a chain of questionable length in the tuple lattice, port it back to the key-value map, and use the (already known) fact that its chains are bounded to conclude the same thing about the tuple chain.

As you can tell, the chain porting mechanism is doing the heavy lifting here. It’s relatively easy to implement given the conditions we’ve set on conversion functions, in both directions:

From Isomorphism.agda, lines 52 through 64

        f-preserves-̷≈ : f Preserves (λ x y → ¬ x ≈₁ y) ⟶  (λ x y → ¬ x ≈₂ y)
        f-preserves-̷≈ x̷≈y = λ fx≈fy → x̷≈y (f-Injective fx≈fy)

        g-preserves-̷≈ : g Preserves (λ x y → ¬ x ≈₂ y) ⟶  (λ x y → ¬ x ≈₁ y)
        g-preserves-̷≈ x̷≈y = λ gx≈gy → x̷≈y (g-Injective gx≈gy)

        portChain₁ : ∀ {a₁ a₂ : A} {h : ℕ} → Chain₁ a₁ a₂ h → Chain₂ (f a₁) (f a₂) h
        portChain₁ (done₁ a₁≈a₂) = done₂ (f-preserves-≈₁ a₁≈a₂)
        portChain₁ (step₁ {a₁} {a₂} (a₁≼a₂ , a₁̷≈a₂) a₂≈a₂' c) = step₂ (≈₂-trans (≈₂-sym (f-⊔-distr a₁ a₂)) (f-preserves-≈₁ a₁≼a₂) , f-preserves-̷≈ a₁̷≈a₂) (f-preserves-≈₁ a₂≈a₂') (portChain₁ c)

        portChain₂ : ∀ {b₁ b₂ : B} {h : ℕ} → Chain₂ b₁ b₂ h → Chain₁ (g b₁) (g b₂) h
        portChain₂ (done₂ a₂≈a₁) = done₁ (g-preserves-≈₂ a₂≈a₁)
        portChain₂ (step₂ {b₁} {b₂} (b₁≼b₂ , b₁̷≈b₂) b₂≈b₂' c) = step₁ (≈₁-trans (≈₁-sym (g-⊔-distr b₁ b₂)) (g-preserves-≈₂ b₁≼b₂) , g-preserves-̷≈ b₁̷≈b₂) (g-preserves-≈₂ b₂≈b₂') (portChain₂ c)
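The f-Injective and g-Injective facts used above are not among the module’s parameters, so they have to be derived. The proof in Isomorphism.agda isn’t shown here; one way to recover injectivity of $f$ from the listed properties is sketched below (the argument for $g$ is symmetric):

$$ \begin{array}{rcl} f(x) \approx_2 f(y) & \implies & g(f(x)) \approx_1 g(f(y)) \quad (g \text{ preserves equivalence}) \\ & \implies & x \approx_1 y \quad (\text{since } g(f(x)) \approx_1 x \text{ and } g(f(y)) \approx_1 y) \end{array} $$

This is why the module asks for both inverse properties: one is used for the injectivity of $f$, and the other for the injectivity of $g$.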

With that, we can prove the second lattice’s finite height:

From Isomorphism.agda, lines 66 through 80

    isFiniteHeightLattice : IsFiniteHeightLattice B height _≈₂_ _⊔₂_ _⊓₂_
    isFiniteHeightLattice =
        let
            open Chain.Height (IsFiniteHeightLattice.fixedHeight fhlA)
                using ()
                renaming (⊥ to ⊥₁; ⊤ to ⊤₁; bounded to bounded₁; longestChain to c)
        in record
            { isLattice = lB
            ; fixedHeight = record
                { ⊥ = f ⊥₁
                ; ⊤ = f ⊤₁
                ; longestChain = portChain₁ c
                ; bounded = λ c' → bounded₁ (portChain₂ c')
                }
            }

The conversion functions are also not too difficult to define. I give them below, but I refrain from showing proofs of the more involved properties (such as the fact that from and to are inverses, preserve equivalence, and distribute over join) here. You can view them by clicking the link at the top of the code block below.

From FiniteValueMap.agda, lines 68 through 85

    from : ∀ {ks : List A} → FiniteMap ks → IterProd (length ks)
    from {[]} (([] , _) , _) = tt
    from {k ∷ ks'} (((k' , v) ∷ fm' , push _ uks') , refl) =
        (v , from ((fm' , uks'), refl))

    to : ∀ {ks : List A} → Unique ks → IterProd (length ks) → FiniteMap ks
    to {[]} _ ⊤ = (([] , empty) , refl)
    to {k ∷ ks'} (push k≢ks' uks') (v , rest) =
        let
            ((fm' , ufm') , fm'≡ks') = to uks' rest

            -- This would be easier if we pattern matched on the equality proof
            -- to get refl, but that makes it harder to reason about 'to' when
            -- the arguments are not known to be refl.
            k≢fm' = subst (λ ks → All (λ k' → ¬ k ≡ k') ks) (sym fm'≡ks') k≢ks'
            kvs≡ks = cong (k ∷_) fm'≡ks'
        in
            (((k , v) ∷ fm' , push k≢fm' ufm') , kvs≡ks)

Above, FiniteMap ks is the type of maps whose keys are fixed to ks, defined as follows:

From FiniteMap.agda, lines 58 through 60

module WithKeys (ks : List A) where
    FiniteMap : Set (a ⊔ℓ b)
    FiniteMap = Σ Map (λ m → Map.keys m ≡ ks)
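To make the shape of from and to a bit more concrete, here is a rough Python sketch of the same round trip. It is only an illustration under a few assumptions: the names from_map and to_map are hypothetical, Python's empty tuple () stands in for tt, and the uniqueness and key-equality proofs that the Agda version carries around are dropped entirely.

```python
def from_map(pairs):
    """Turn a key-value list into right-nested pairs of values, ending in ()."""
    if not pairs:
        return ()  # plays the role of tt
    (_key, value), rest = pairs[0], pairs[1:]
    return (value, from_map(rest))


def to_map(keys, nested):
    """Rebuild the key-value list from the fixed key list and the nested values."""
    if not keys:
        return []
    value, rest = nested
    return [(keys[0], value)] + to_map(keys[1:], rest)


ks = ["x", "y"]
fm = [("x", "+"), ("y", "-")]
print(from_map(fm))              # ('+', ('-', ()))
print(to_map(ks, from_map(fm)))  # [('x', '+'), ('y', '-')]
```

Because the keys are fixed ahead of time, the nested tuple only needs to remember the values; the keys are supplied again when converting back.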

Proving the remaining properties (which, as I mentioned, I omit from the main body of the post) is sufficient to apply the isomorphism, which proves that maps with a fixed, finite list of keys are of a finite height.

Using the Finite Height Property

Lattices having a finite height is a crucial property for the sorts of static program analyses I’ve been working to implement. We can create functions that traverse “up” through the lattice, creating larger values each time. If these lattices are of a finite height, then the static analysis functions can only traverse “so high”. Under certain conditions, this guarantees that our static analysis will eventually terminate with a fixed point. Pragmatically, this is a state in which running our analysis does not yield any more information.

The way that the fixed point is found is called the fixed point algorithm. We’ll talk more about this in the next post.
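In the meantime, here is a naive Python sketch of the idea, not necessarily the algorithm from the next post. It assumes a toy lattice (sets of graph nodes ordered by inclusion, with the empty set as bottom) and a hypothetical monotone step function that adds node 0 and every successor of an already-reached node. EDGES, step, and fixed_point are made-up names for illustration.

```python
def fixed_point(f, bottom):
    """Apply f repeatedly, starting at bottom, until the value stops changing."""
    current = bottom
    while True:
        following = f(current)
        if following == current:
            return current
        current = following


# A small made-up graph: node 4 points at node 0, but nothing points at node 4.
EDGES = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}


def step(reached):
    """Monotone: the output always contains the input, plus node 0 and successors."""
    successors = {m for n in reached for m in EDGES[n]}
    return frozenset({0}) | reached | successors


print(sorted(fixed_point(step, frozenset())))  # [0, 1, 2, 3]
```

Each iteration either grows the set or leaves it alone, and a set of nodes drawn from a five-node graph can only grow five times, so the loop must stop. That bound is exactly what finite height provides.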

Appendix: The Unit Lattice

The unit lattice is a relatively boring one. I use the built-in unit type in Agda, which (perhaps a bit confusingly) is represented using the symbol ⊤. It only has a single constructor, tt.

From Unit.agda, lines 6 through 7

open import Data.Unit using (⊤; tt) public
open import Data.Unit.Properties using (_≟_; ≡-setoid)

The equivalence for the unit type is just propositional equality (we have no need to identify unequal values of ⊤, since there is only one value).

From Unit.agda, lines 17 through 25

_≈_ : ⊤ → ⊤ → Set
_≈_ = _≡_

≈-equiv : IsEquivalence ⊤ _≈_
≈-equiv = record
    { ≈-refl = refl
    ; ≈-sym = sym
    ; ≈-trans = trans
    }

Both the join $(\sqcup)$ and meet $(\sqcap)$ operations are trivially defined; in both cases, they simply take two tts and produce a new tt. Mathematically, one might write this as $(\text{tt}, \text{tt}) \mapsto \text{tt}$. In Agda:

From Unit.agda, lines 30 through 34

_⊔_ : ⊤ → ⊤ → ⊤
tt ⊔ tt = tt

_⊓_ : ⊤ → ⊤ → ⊤
tt ⊓ tt = tt

These operations are trivially associative, commutative, and idempotent.

From Unit.agda, lines 39 through 46

⊔-assoc : (x y z : ⊤) → ((x ⊔ y) ⊔ z) ≈ (x ⊔ (y ⊔ z))
⊔-assoc tt tt tt = Eq.refl

⊔-comm : (x y : ⊤) → (x ⊔ y) ≈ (y ⊔ x)
⊔-comm tt tt = Eq.refl

⊔-idemp : (x : ⊤) → (x ⊔ x) ≈ x
⊔-idemp tt = Eq.refl

That’s sufficient for them to be semilattices:

From Unit.agda, lines 48 through 54

isJoinSemilattice : IsSemilattice ⊤ _≈_ _⊔_
isJoinSemilattice = record
    { ≈-equiv = ≈-equiv
    ; ≈-⊔-cong = ≈-⊔-cong
    ; ⊔-assoc = ⊔-assoc
    ; ⊔-comm  = ⊔-comm
    ; ⊔-idemp = ⊔-idemp
    }

The absorption laws are also trivially satisfied, which means that the unit type forms a lattice.

From Unit.agda, lines 78 through 90

absorb-⊔-⊓ : (x y : ⊤) → (x ⊔ (x ⊓ y)) ≈ x
absorb-⊔-⊓ tt tt = Eq.refl

absorb-⊓-⊔ : (x y : ⊤) → (x ⊓ (x ⊔ y)) ≈ x
absorb-⊓-⊔ tt tt = Eq.refl

isLattice : IsLattice ⊤ _≈_ _⊔_ _⊓_
isLattice = record
    { joinSemilattice = isJoinSemilattice
    ; meetSemilattice = isMeetSemilattice
    ; absorb-⊔-⊓ = absorb-⊔-⊓
    ; absorb-⊓-⊔ = absorb-⊓-⊔
    }

Since there’s only one element, it’s not really possible to have chains that contain any more than one value. As a result, the height (in comparisons) of the unit lattice is zero.

From Unit.agda, lines 102 through 117

private
    longestChain : Chain tt tt 0
    longestChain = done refl

    isLongest : ∀ {t₁ t₂ : ⊤} {n : ℕ} → Chain t₁ t₂ n → n ≤ 0
    isLongest {tt} {tt} (step (tt⊔tt≈tt , tt̷≈tt) _ _) = ⊥-elim (tt̷≈tt refl)
    isLongest (done _) = z≤n

fixedHeight : IsLattice.FixedHeight isLattice 0
fixedHeight = record
    { ⊥ = tt
    ; ⊤ = tt
    ; longestChain = longestChain
    ; bounded = isLongest
    }

Implementing and Verifying "Static Program Analysis" in Agda, Part 2: Combining Lattices

Thu, 08 Aug 2024 16:40:00 -0700

In the previous post, I wrote about how lattices arise when tracking, comparing and combining static information about programs. I then showed two simple lattices: the natural numbers, and the (parameterized) “above-below” lattice, which modified an arbitrary set with “bottom” and “top” elements ($\bot$ and $\top$ respectively). One instance of the “above-below” lattice was the sign lattice, which could be used to reason about the signs (positive, negative, or zero) of variables in a program.

At the end of that post, I introduced a source of complexity: the “full” lattices that we want to use for the program analysis aren’t signs or numbers, but maps of states and variables to lattice-based descriptions. The full lattice for sign analysis might look something like this:

$$ \text{Info} \triangleq \text{ProgramStates} \to (\text{Variables} \to \text{Sign}) $$

Thus, we have to compare and find least upper bounds (for example) of not just signs, but maps! Proving the various lattice laws for signs was not too challenging, but for a two-level map like $\text{Info}$ above, we’d need to do a lot more work. We need tools to build up such complicated lattices.

The way to do this, it turns out, is by using simpler lattices as building blocks. To start with, let’s take a look at a very simple way of combining lattices into a new one: taking the Cartesian product.

The Cartesian Product Lattice

Suppose you have two lattices $L_1$ and $L_2$. As I covered in the previous post, each lattice comes equipped with a “least upper bound” operator $(\sqcup)$ and a “greatest lower bound” operator $(\sqcap)$. Since we now have two lattices, let’s use numerical suffixes to disambiguate between the operators of the first and second lattice: $(\sqcup_1)$ will be the LUB operator of the first lattice $L_1$, and $(\sqcup_2)$ of the second lattice $L_2$, and so on.

Then, let’s take the Cartesian product of the elements of $L_1$ and $L_2$; mathematically, we’ll write this as $L_1 \times L_2$, and in Agda, we can just use the standard Data.Product module. Then, I’ll define the lattice as another parameterized module. Since both $L_1$ and $L_2$ are lattices, this parameterized module will require IsLattice instances for both types:

From Prod.agda, lines 1 through 7

open import Lattice

module Lattice.Prod {a b} {A : Set a} {B : Set b}
    (_≈₁_ : A → A → Set a) (_≈₂_ : B → B → Set b)
    (_⊔₁_ : A → A → A) (_⊔₂_ : B → B → B)
    (_⊓₁_ : A → A → A) (_⊓₂_ : B → B → B)
    (lA : IsLattice A _≈₁_ _⊔₁_ _⊓₁_) (lB : IsLattice B _≈₂_ _⊔₂_ _⊓₂_) where

Elements of $L_1 \times L_2$ are in the form $(l_1, l_2)$, where $l_1 \in L_1$ and $l_2 \in L_2$. Knowing that, let’s define what it means for two such elements to be equal. Recall that we opted for a custom equivalence relation instead of definitional equality to allow similar elements to be considered equal; we’ll have to define a similar relation for our new product lattice. That’s easy enough: we have an equality predicate _≈₁_ that checks if an element of $L_1$ is equal to another, and we have _≈₂_ that does the same for $L_2$. It’s reasonable to say that pairs of elements are equal if their respective first and second elements are equal:

$$ (l_1, l_2) \approx (j_1, j_2) \iff l_1 \approx_1 j_1 \land l_2 \approx_2 j_2 $$

In Agda:

From Prod.agda, lines 39 through 40

_≈_ : A × B → A × B → Set (a ⊔ℓ b)
(a₁ , b₁) ≈ (a₂ , b₂) = (a₁ ≈₁ a₂) × (b₁ ≈₂ b₂)

Verifying that this relation has the properties of an equivalence relation boils down to the fact that _≈₁_ and _≈₂_ are themselves equivalence relations.

From Prod.agda, lines 42 through 48

≈-equiv : IsEquivalence (A × B) _≈_
≈-equiv = record
    { ≈-refl = λ {p} → (≈₁-refl , ≈₂-refl)
    ; ≈-sym = λ {p₁} {p₂} (a₁≈a₂ , b₁≈b₂) → (≈₁-sym a₁≈a₂ , ≈₂-sym b₁≈b₂)
    ; ≈-trans = λ {p₁} {p₂} {p₃} (a₁≈a₂ , b₁≈b₂) (a₂≈a₃ , b₂≈b₃) →
        ( ≈₁-trans a₁≈a₂ a₂≈a₃ , ≈₂-trans b₁≈b₂ b₂≈b₃ )
    }

Defining $(\sqcup)$ and $(\sqcap)$ by simply applying the corresponding operators from $L_1$ and $L_2$ seems quite natural as well.

$$ (l_1, l_2) \sqcup (j_1, j_2) \triangleq (l_1 \sqcup_1 j_1, l_2 \sqcup_2 j_2) \\ (l_1, l_2) \sqcap (j_1, j_2) \triangleq (l_1 \sqcap_1 j_1, l_2 \sqcap_2 j_2) $$

As an example, consider the product lattice $\text{Sign}\times\text{Sign}$, which is made up of pairs of signs that we talked about in the previous post. Two elements of this lattice are $(+, +)$ and $(+, -)$. Here’s how the $(\sqcup)$ operation is evaluated on them:

$$ (+, +) \sqcup (+, -) = (+ \sqcup + , + \sqcup -) = (+ , \top) $$

In Agda, the definition is written very similarly to its mathematical form:

From Prod.agda, lines 50 through 54

_⊔_ : A × B → A × B → A × B
(a₁ , b₁) ⊔ (a₂ , b₂) = (a₁ ⊔₁ a₂ , b₁ ⊔₂ b₂)

_⊓_ : A × B → A × B → A × B
(a₁ , b₁) ⊓ (a₂ , b₂) = (a₁ ⊓₁ a₂ , b₁ ⊓₂ b₂)
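For readers who would rather poke at this interactively, here is a small Python sketch of the same component-wise operations, applied to the $\text{Sign}\times\text{Sign}$ example from above. The encoding of signs as strings and the function names are assumptions made for this illustration; they are not taken from the Agda code.

```python
BOT, TOP = "⊥", "⊤"


def sign_leq(a, b):
    """a <= b in the sign lattice: ⊥ is below everything, ⊤ is above everything."""
    return a == b or a == BOT or b == TOP


def sign_join(a, b):
    if sign_leq(a, b):
        return b
    if sign_leq(b, a):
        return a
    return TOP  # two distinct signs: only ⊤ is above both


def sign_meet(a, b):
    if sign_leq(a, b):
        return a
    if sign_leq(b, a):
        return b
    return BOT  # two distinct signs: only ⊥ is below both


def prod_join(p, q, join1=sign_join, join2=sign_join):
    """Join of pairs: join the first components and the second components separately."""
    (a1, b1), (a2, b2) = p, q
    return (join1(a1, a2), join2(b1, b2))


def prod_meet(p, q, meet1=sign_meet, meet2=sign_meet):
    (a1, b1), (a2, b2) = p, q
    return (meet1(a1, a2), meet2(b1, b2))


print(prod_join(("+", "+"), ("+", "-")))  # ('+', '⊤')
print(prod_meet(("+", "+"), ("+", "-")))  # ('+', '⊥')
```

The product functions never inspect the components themselves; they only delegate to the operators of the two underlying lattices, which is why their laws can be proven component by component.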

All that’s left is to prove the various (semi)lattice properties. Intuitively, we can see that since the “combined” operator _⊔_ just independently applies the element operators _⊔₁_ and _⊔₂_, as long as they are idempotent, commutative, and associative, so is the “combined” operator itself. Moreover, the proofs that _⊔_ and _⊓_ form semilattices are identical up to replacing $(\sqcup)$ with $(\sqcap)$. Thus, in Agda, we can write the code once, parameterizing it by the binary operators involved (and proofs that these operators obey the semilattice laws).

From Prod.agda, lines 56 through 82

private module ProdIsSemilattice (f₁ : A → A → A) (f₂ : B → B → B) (sA : IsSemilattice A _≈₁_ f₁) (sB : IsSemilattice B _≈₂_ f₂) where
    isSemilattice : IsSemilattice (A × B) _≈_ (λ (a₁ , b₁) (a₂ , b₂) → (f₁ a₁ a₂ , f₂ b₁ b₂))
    isSemilattice = record
        { ≈-equiv = ≈-equiv
        ; ≈-⊔-cong = λ (a₁≈a₂ , b₁≈b₂) (a₃≈a₄ , b₃≈b₄) →
            ( IsSemilattice.≈-⊔-cong sA a₁≈a₂ a₃≈a₄
            , IsSemilattice.≈-⊔-cong sB b₁≈b₂ b₃≈b₄
            )
        ; ⊔-assoc = λ (a₁ , b₁) (a₂ , b₂) (a₃ , b₃) →
            ( IsSemilattice.⊔-assoc sA a₁ a₂ a₃
            , IsSemilattice.⊔-assoc sB b₁ b₂ b₃
            )
        ; ⊔-comm = λ (a₁ , b₁) (a₂ , b₂) →
            ( IsSemilattice.⊔-comm sA a₁ a₂
            , IsSemilattice.⊔-comm sB b₁ b₂
            )
        ; ⊔-idemp = λ (a , b) →
            ( IsSemilattice.⊔-idemp sA a
            , IsSemilattice.⊔-idemp sB b
            )
        }

isJoinSemilattice : IsSemilattice (A × B) _≈_ _⊔_
isJoinSemilattice = ProdIsSemilattice.isSemilattice _⊔₁_ _⊔₂_ joinSemilattice₁ joinSemilattice₂

isMeetSemilattice : IsSemilattice (A × B) _≈_ _⊓_
isMeetSemilattice = ProdIsSemilattice.isSemilattice _⊓₁_ _⊓₂_ meetSemilattice₁ meetSemilattice₂

Above, I used f₁ to stand for “either _⊔₁_ or _⊓₁_”, and similarly f₂ for “either _⊔₂_ or _⊓₂_”. Much like the semilattice properties, proving lattice properties boils down to applying the lattice properties of $L_1$ and $L_2$ to individual components.

From Prod.agda, lines 84 through 96

isLattice : IsLattice (A × B) _≈_ _⊔_ _⊓_
isLattice = record
    { joinSemilattice = isJoinSemilattice
    ; meetSemilattice = isMeetSemilattice
    ; absorb-⊔-⊓ = λ (a₁ , b₁) (a₂ , b₂) →
        ( IsLattice.absorb-⊔-⊓ lA a₁ a₂
        , IsLattice.absorb-⊔-⊓ lB b₁ b₂
        )
    ; absorb-⊓-⊔ = λ (a₁ , b₁) (a₂ , b₂) →
        ( IsLattice.absorb-⊓-⊔ lA a₁ a₂
        , IsLattice.absorb-⊓-⊔ lB b₁ b₂
        )
    }

This concludes the definition of the product lattice, which is made up of two other lattices. If we have a type of analysis that can be expressed as a pair of two signs, [note: Perhaps the signs are the smallest and largest possible values of a variable. ] for example, we won’t have to do all the work of proving the (semi)lattice properties of those pairs. In fact, we can build up even bigger data structures. By taking a product twice, like $L_1 \times (L_2 \times L_3)$, we can construct a lattice of 3-tuples. Any of the lattices involved in that product can itself be a product; we can therefore create lattices out of arbitrary bundles of data, so long as the smallest pieces that make up the bundles are themselves lattices.

Products will come in very handy a bit later in this series. For now though, our goal is to create another type of lattice: the map lattice. We will take the same approach we did with products: assuming the elements of the map are lattices, we’ll prove that the map itself is a lattice. Then, just like we could put products inside products when building up lattices, we’ll be able to put a map inside a map. This will allow us to represent the $\text{Info}$ lattice, which is a map of maps.

The Map Lattice

The Theory

When I say “map”, what I really mean is something that associates keys with values, like dictionaries in Python. This data structure need not have a value for every possible key; a very precise author might call such a map a “partial map”. We might have a map whose value (in Python-ish notation) is { "x": +, "y": - }. Such a map states that the sign of the variable x is +, and the sign of variable y is -. Another possible map is { "y": +, "z": - }; this one states that the sign of y is +, and the sign of another variable z is -.

Let’s start thinking about what sorts of lattices our maps will be. The thing that motivated our introduction of lattices was comparing pieces of information by “specificity”, so let’s try to figure out how to compare maps. For that, we can begin small, by looking at singleton maps. If we have {"x": +} and {"x": ⊤}, which one of them is smaller? Well, we have previously established that + is more specific than (and thus less than) ⊤. Thus, it shouldn’t be too much of a stretch to say that for singleton maps of the same key, the one with the smaller value is smaller.

Now, what about a pair of singleton maps like {"x": +} and {"y": ⊤}? Among these two, each contains some information that the other does not. Although the value of y is larger than the value of x, it describes a different key, so it seems wrong to use that to call the y-singleton “larger”. Let’s call these maps incomparable, then. More generally, if we have two maps and each one has a key that the other doesn’t, we can’t compare them.

If only one map has a unique key, though, things are different. Take for instance {"x": +} and {"x": +, "y": +}. Are they really incomparable? The values at the keys that the two maps do share can be compared (+ <= +, because they’re equal).

All of the above leads to the following conventional definition, which I find easier to further motivate using $(\sqcup)$ and $(\sqcap)$ (and do so below).

A map m1 is less than or equal to another map m2 (m1 <= m2) if for every key k that has a value in m1, the key also has a value in m2, and m1[k] <= m2[k].

That definition matches our intuitions so far. The only key in {"x": +} is x; this key is also in {"x": ⊤} (check) and + < ⊤ (check). On the other hand, both {"x": +} and {"y": ⊤} have a key that the other doesn’t, so the definition above is not satisfied. Finally, for {"x": +} and {"x": +, "y": +}, the only key in the former is also present in the latter, and + <= +; the definition is satisfied.
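The three comparisons above can be replayed with a short Python sketch of the definition. Here maps are plain dictionaries and signs are strings; map_leq and sign_leq are hypothetical names chosen for this illustration.

```python
def sign_leq(a, b):
    """a <= b in the sign lattice: ⊥ is below everything, ⊤ is above everything."""
    return a == b or a == "⊥" or b == "⊤"


def map_leq(m1, m2, leq=sign_leq):
    """m1 <= m2 if every key of m1 is also in m2, with a value at least as large."""
    return all(k in m2 and leq(v, m2[k]) for k, v in m1.items())


print(map_leq({"x": "+"}, {"x": "⊤"}))            # True
print(map_leq({"x": "+"}, {"y": "⊤"}))            # False
print(map_leq({"y": "⊤"}, {"x": "+"}))            # False
print(map_leq({"x": "+"}, {"x": "+", "y": "+"}))  # True
```

The second and third calls both return False, which is what it means for the two singleton maps to be incomparable: neither is less than or equal to the other.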

Next, we need to define the $(\sqcup)$ and $(\sqcap)$ operators that match our definition of “less than or equal”. Let’s start with $(\sqcup)$. For two maps $m_1$ and $m_2$, the join of those two maps, $m_1 \sqcup m_2$ should be greater than or equal to both; in other words, both sub-maps should be less than or equal to the join.

Our newly-introduced condition for “less than or equal” requires that each key in the smaller map be present in the larger one; as a result, $m_1 \sqcup m_2$ should contain all the keys in $m_1$ and all the keys in $m_2$. So, we could just take the union of the two maps: copy values from both into the result. Only, what happens if both $m_1$ and $m_2$ have a value mapped to a particular key $k$? The values in the two maps could be distinct, and they might even be incomparable. This is where the second part of the condition kicks in: the value in the combination of the maps needs to be bigger than the value in either sub-map. We already know how to get a value that’s bigger than two other values: we use a join on the values!

Thus, define $m_1 \sqcup m_2$ as a map that has all the keys from $m_1$ and $m_2$, where the value at a particular key is given as follows:

$$ (m_1 \sqcup m_2)[k] = \begin{cases} m_1[k] \sqcup m_2[k] & k \in m_1, k \in m_2 \\ m_1[k] & k \in m_1, k \notin m_2 \\ m_2[k] & k \notin m_1, k \in m_2 \end{cases} $$

If you’re familiar with set theory, this operation is like an extension of the union operator $(\cup)$ [note: There are, of course, other ways to extend the "union" operation to maps. Haskell, for instance, defines it in a "left-biased" way (preferring the elements from the left operand of the operation when duplicates are encountered).

However, having a "join" operation $(\sqcup)$ that's defined on the values stored in the map gives us an extra tool to work with. As a result, I would argue that our extension, given such an operator, is the most natural. ] to maps. In fact, this begins to motivate the choice to use $(\sqcup)$ to denote this operation. A further bit of motivation is this: we’ve already seen that the $(\sqcup)$ and $(\sqcap)$ operators correspond to “or” and “and”. The elements in the union of two sets are precisely those that are in one set or the other. Thus, using union here fits our notion of how the $(\sqcup)$ operator behaves.

Now, let’s take a look at the $(\sqcap)$ operator. For two maps $m_1$ and $m_2$, the meet of those two maps, $m_1 \sqcap m_2$ should be less than or equal to both. Our definition above requires that each key of the smaller map is present in the larger map; for the combination of two maps to be smaller than both, we must ensure that it only has keys present in both maps. To combine the elements from the two maps, we can use the $(\sqcap)$ operator on values.

$$ (m_1 \sqcap m_2)[k] = m_1[k] \sqcap m_2[k] $$

Turning once again to set theory, we can think of this operation like the extension of the intersection operator $(\cap)$ to maps. This can be motivated in the same way as the union operation above; the $(\sqcap)$ operator combines lattice elements in such a way that the result represents both of them, and intersections of sets contain elements that are in both sets.
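As a quick sanity check of the two definitions, here is a Python sketch that uses dictionaries for maps and the sign lattice for values. The function names and the string encoding of signs are assumptions made for illustration only.

```python
def sign_join(a, b):
    if a == b or a == "⊥":
        return b
    if b == "⊥":
        return a
    return "⊤"  # distinct signs, or one side is already ⊤


def sign_meet(a, b):
    if a == b or a == "⊤":
        return b
    if b == "⊤":
        return a
    return "⊥"  # distinct signs, or one side is already ⊥


def map_join(m1, m2, join=sign_join):
    """Keep every key from either map; join the values of keys present in both."""
    result = dict(m1)
    for k, v in m2.items():
        result[k] = join(result[k], v) if k in result else v
    return result


def map_meet(m1, m2, meet=sign_meet):
    """Keep only the keys present in both maps; meet their values."""
    return {k: meet(v, m2[k]) for k, v in m1.items() if k in m2}


m1 = {"x": "+", "y": "-"}
m2 = {"y": "+", "z": "-"}
print(map_join(m1, m2))  # {'x': '+', 'y': '⊤', 'z': '-'}
print(map_meet(m1, m2))  # {'y': '⊥'}
```

The shared key y is the only place where the value lattice gets involved: its two signs disagree, so the join climbs to ⊤ and the meet drops to ⊥.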

Now we have the two binary operators and the comparison function in hand. There’s just one detail we’re missing: what it means for two maps to be equivalent. Here, once again we take our cue from set theory: two sets are said to be equal when each one is a subset of the other. Mathematically, we can write this as follows:

$$ m_1 \approx m_2 \triangleq m_1 \subseteq m_2 \land m_1 \supseteq m_2 $$

I might as well show you the Agda definition of this, since it’s a word-for-word transliteration:

From Map.agda, lines 530 through 531

_≈_ : Map → Map → Set (a ⊔ℓ b)
_≈_ m₁ m₂ = m₁ ⊆ m₂ × m₂ ⊆ m₁

Defining equivalence more abstractly this way helps avoid concerns about the precise implementation of our maps.

Okay, but we haven’t actually defined what it means for one map to be a subset of another. My definition is as follows: if $m_1 \subseteq m_2$, that is, if $m_1$ is a subset of $m_2$, then every key in $m_1$ is also present in $m_2$, and they are mapped to the same value. My first stab at a mathematical definition of this is the following:

$$ m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow (k, v) \in m_2 $$

Only there’s a slight complication; remember that our values themselves come from a lattice, and that this lattice might use its own equivalence operator $(\approx)$ to group similar elements. One example where this is important is our now-familiar “map of maps” scenario: the values stored in the “outer” map are themselves maps, and we don’t want the order of the keys or other menial details of the inner maps to influence whether the outer maps are equal. Thus, we settle for a more robust definition of $m_1 \subseteq m_2$ that allows $m_1$ to have different-but-equivalent values from those in $m_2$.

$$ m_1 \subseteq m_2 \triangleq \forall k, v.\ (k, v) \in m_1 \Rightarrow \exists v'.\ v \approx v' \land (k, v') \in m_2 $$

In Agda, the core of my definition is once again very close:

From Map.agda, lines 98 through 99

    subset m₁ m₂ = ∀ (k : A) (v : B) → (k , v) ∈ m₁ →
                   Σ B (λ v' → v ≈₂ v' × ((k , v') ∈ m₂))
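The “map of maps” scenario is easy to try out in Python. The sketch below represents maps as lists of key-value pairs and takes the value equivalence as a parameter; subset and equivalent are hypothetical names mirroring the definitions above, not a translation of the Agda module.

```python
from operator import eq


def subset(m1, m2, value_eq=eq):
    """Every (k, v) in m1 has a partner (k, v') in m2 with v equivalent to v'."""
    return all(
        any(k == k2 and value_eq(v, v2) for k2, v2 in m2)
        for k, v in m1
    )


def equivalent(m1, m2, value_eq=eq):
    """Two maps are equivalent when each one is a subset of the other."""
    return subset(m1, m2, value_eq) and subset(m2, m1, value_eq)


# The same inner map, written with its keys in two different orders.
inner_a = [("x", "+"), ("y", "-")]
inner_b = [("y", "-"), ("x", "+")]
outer_a = [("state1", inner_a)]
outer_b = [("state1", inner_b)]

print(inner_a == inner_b)                                # False
print(equivalent(inner_a, inner_b))                      # True
print(equivalent(outer_a, outer_b))                      # False
print(equivalent(outer_a, outer_b, value_eq=equivalent)) # True
```

With plain equality on values, the outer maps are considered different just because the inner lists are ordered differently. Passing the map equivalence itself as value_eq plays the role of $(\approx)$ on the values and fixes that.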

The Implementation

Now it’s time to show you how I implemented the Map lattice. I chose to represent maps using a list of key-value pairs, along with a condition that the keys are unique (non-repeating). I chose this definition because it was simple to implement, and because it makes it possible to iterate over the keys of a map. That last property is useful if we use the maps to later represent sets (which I did). Moreover, lists of key-value pairs are easy to serialize and write to disk. This isn’t hugely important for my immediate static program analysis needs, but it might be nice in the future. The requirement that the keys are unique prevents the map from being a multi-map (which might have several values associated with a particular key).

My Map module is parameterized by the key and value types (A and B respectively), and requires some additional properties to be satisfied by these types.

From Map.agda, lines 6 through 10

module Lattice.Map {a b : Level} {A : Set a} {B : Set b}
    {_≈₂_ : B → B → Set b}
    {_⊔₂_ : B → B → B} {_⊓₂_ : B → B → B}
    (≡-dec-A : Decidable (_≡_ {a} {A}))
    (lB : IsLattice B _≈₂_ _⊔₂_ _⊓₂_) where

For A, the key property is the decidability of equality: there should be a way to compare keys for equality. This is important for all sorts of map operations. For example, when inserting a new value into a map, we need to decide if its key is already present (so that we know to override the existing value), but if we can’t check if two keys are equal, we can’t see if it’s already there.

The values of the map (represented by B) are expected to be lattices, so we require them to provide the lattice operations $(\sqcup)$ and $(\sqcap)$, as well as the equivalence relation $(\approx)$ and the proof of the lattice properties (the IsLattice instance lB). To distinguish the lattice operations on B from the ones we’ll be defining on the map itself (you might’ve noticed that there’s a bit of overloading going on in this post), I’ve suffixed them with the subscript 2. My convention is to use the subscript corresponding to the number of the type parameter. Here, A is “first” and B is “second”, so the operators on B get 2.

From there, I define the map as a pair; the first component is the list of key-value pairs, and the second is the proof that all the keys in the list occur only once.

From Map.agda, lines 480 through 481

Map : Set (a ⊔ℓ b)
Map = Σ (List (A × B)) (λ l → Unique (ImplKeys.keys l))

Now, to implement union and intersection. For the most part, the proofs deal just with the first component of the map: the key-value pairs. For union, the key operation is “insert-or-combine”. We can think of merging two maps as inserting all the keys from one map (arbitrarily, the “left”) into the other (the “right”). If a key is not in the “left” map, insertion won’t do anything to its prior value in the “right” map; similarly, if a key is not in the “right” map, then it should appear unchanged in the final result after insertion. Finally, if a key is inserted into the “right” map, but already has a value there, then the two values need to be combined using _⊔₂_. This leads to the following definition of insert on key-value pair lists:

From Map.agda, lines 114 through 118

    insert : A → B → List (A × B) → List (A × B)
    insert k v [] = (k , v) ∷ []
    insert k v (x@(k' , v') ∷ xs) with ≡-dec-A k k'
    ...                             | yes _ = (k' , f v v') ∷ xs
    ...                             | no _ = x ∷ insert k v xs

Above, f is just a stand-in for _⊔₂_ (making the definition a tiny bit more general). For each element in the “right” key-value list, we check if its key matches the one we’re inserting; if it does, we have to combine the values, and there’s no need to recurse into the rest of the list. If on the other hand the key doesn’t match, we move on to the next element of the list. If we run out of elements, we know that the key we’re inserting wasn’t in the “right” map, so we insert it as-is.

The union operation is just about inserting every pair from one map into another.

From Map.agda, lines 120 through 121

    union : List (A × B) → List (A × B) → List (A × B)
    union m₁ m₂ = foldr insert m₂ m₁

Here, I defined my own version of foldr which unpacks the pairs, for convenience:

From Map.agda, lines 110 through 112

        foldr : ∀ {c} {C : Set c} → (A → B → C → C) -> C -> List (A × B) -> C
        foldr f b [] = b
        foldr f b ((k , v) ∷ xs) = f k v (foldr f b xs)
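To see how insert and union behave on actual data, here is a loose Python transcription of the three definitions above, operating on lists of key-value pairs. It is a sketch rather than a faithful port: the combining function f is passed explicitly, sign_join is an assumed encoding of the join on signs, and no uniqueness proof is tracked.

```python
def sign_join(a, b):
    if a == b or a == "⊥":
        return b
    if b == "⊥":
        return a
    return "⊤"


def insert(f, k, v, pairs):
    """Insert-or-combine: combine with f if k is present, append (k, v) otherwise."""
    if not pairs:
        return [(k, v)]
    (k2, v2), rest = pairs[0], pairs[1:]
    if k == k2:
        return [(k2, f(v, v2))] + rest  # found the key: no need to look further
    return [(k2, v2)] + insert(f, k, v, rest)


def union(f, m1, m2):
    """Mirror of `foldr insert m2 m1`: the last pair of m1 is inserted first."""
    result = m2
    for k, v in reversed(m1):
        result = insert(f, k, v, result)
    return result


left = [("x", "+"), ("y", "-")]
right = [("y", "+"), ("z", "-")]
print(union(sign_join, left, right))
# [('y', '⊤'), ('z', '-'), ('x', '+')]
```

Keys that were already in the “right” list keep their positions, and keys that only occur in the “left” list are appended at the end. The resulting order differs from either input, which is one reason the equivalence on maps is defined through subsets rather than list equality.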

For intersection, we do something similar; however, since only elements in both maps should be in the final output, if our “insertion” doesn’t find an existing key, it should just fall through; this can be achieved by defining a version of insert whose base case simply throws away the input. Of course, this function should also use _⊓₂_ instead of _⊔₂_; below, though, I again use a general function f to provide a more general definition. I called this version of the function update.

From Map.agda, lines 295 through 299

    update : A → B → List (A × B) → List (A × B)
    update k v [] = []
    update k v ((k' , v') ∷ xs) with ≡-dec-A k k'
    ...                            | yes _ = (k' , f v v') ∷ xs
    ...                            | no _ = (k' , v') ∷ update k v xs

Just changing insert to update is not enough. While calling update with all keys from m1 on m2 would forget all keys unique to m1, it would still leave behind the only-in-m2 keys. To get rid of these, I defined another function, restrict, that drops all keys in its second argument that aren’t present in its first argument.

From Map.agda, lines 304 through 308

    restrict : List (A × B) → List (A × B) → List (A × B)
    restrict l [] = []
    restrict l ((k' , v') ∷ xs) with ∈k-dec k' l
    ...                           | yes _ = (k' , v') ∷ restrict l xs
    ...                           | no _ = restrict l xs

Altogether, intersection is defined as follows, where updates just calls update for every key-value pair in its first argument.

From Map.agda, lines 310 through 311

    intersect : List (A × B) → List (A × B) → List (A × B)
    intersect l₁ l₂ = restrict l₁ (updates l₁ l₂)
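The two-phase structure of intersection (first combine values, then drop stray keys) can be seen in the following Python sketch. As before, it works on lists of key-value pairs with an explicit combining function. The definition of updates is not shown in this post, so the version here, which folds update over the first list, is an assumption.

```python
def sign_meet(a, b):
    if a == b or a == "⊤":
        return b
    if b == "⊤":
        return a
    return "⊥"


def update(f, k, v, pairs):
    """Combine v into the pair with key k, if any; never add a new pair.

    Keys are assumed unique, so at most one pair can match.
    """
    return [(k2, f(v, v2)) if k == k2 else (k2, v2) for k2, v2 in pairs]


def updates(f, m1, m2):
    """Assumed definition: call update on m2 for every pair of m1."""
    result = m2
    for k, v in reversed(m1):
        result = update(f, k, v, result)
    return result


def restrict(m1, m2):
    """Drop the pairs of m2 whose keys do not occur in m1."""
    keys1 = [k for k, _ in m1]
    return [(k, v) for k, v in m2 if k in keys1]


def intersect(f, m1, m2):
    return restrict(m1, updates(f, m1, m2))


left = [("x", "+"), ("y", "-")]
right = [("y", "+"), ("z", "-")]
print(updates(sign_meet, left, right))    # [('y', '⊥'), ('z', '-')]
print(intersect(sign_meet, left, right))  # [('y', '⊥')]
```

The intermediate result still contains z, the key that only the “right” list has; restrict is what removes it.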

The next hurdle is all the proofs about these implementations. I will leave the details of the proofs either as appendices or as links to other posts on this site.

The first key property is that the insertion, union, update, and intersection operations all preserve uniqueness of keys; the proofs for this are in the appendix on uniqueness of keys at the end of this post. The next set of properties consists of the lattice laws for union and intersection. The proofs of those proceed by cases; to prove that $(\sqcup)$ is commutative, we reason that if $(k , v) \in m_1 \sqcup m_2$, then $k$ must be either in $m_1$ only, in $m_2$ only, or in both; for each of these three possible cases, we can show that $(k , v)$ must be the same in $m_2 \sqcup m_1$. Things get even more tedious for proofs of associativity, since there are 7 cases to consider; I describe the strategy I used for such proofs in my article about the “Expression” pattern in Agda.

Additional Properties of Lattices

The product and map lattices are the two pulling the most weight in my implementation of program analyses. However, there’s an additional property that they have: if the lattices they are made of have a finite height, then so do products and map lattices themselves. A lattice having a finite height means that we can only line up so many elements using the less-than operator <. For instance, the natural numbers are not a finite-height lattice; we can create the infinite chain:

$$ 0 < 1 < 2 < ... $$

On the other hand, our sign lattice is of finite height; the longest chains we can make have three elements and two < signs. Here’s one:

$$ \bot < + < \top $$

As a result of this, pairs of signs also have a finite height; the longest chains we can make have five elements and four < signs. An example: [note: Notice that the elements in the example progress the same way as the ones in the single-sign chain. This is no accident; the longest chains in the pair lattice can be constructed from longest chains of its element lattices. The length of the product lattice chain, counted by the number of "less than" signs, is the sum of the lengths of the element chains. ]

$$ (\bot, \bot) < (\bot, +) < (\bot, \top) < (+, \top) < (\top, \top) $$

The same is true for maps, under certain conditions.
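The two heights quoted above (2 for signs, 4 for pairs of signs) can be double-checked by brute force. The Python sketch below assumes the five-element sign lattice (⊥, +, -, 0, ⊤) and simply searches for the longest strictly increasing chain; height and the other names are made up for this illustration.

```python
from functools import lru_cache
from itertools import product

SIGNS = ("⊥", "+", "-", "0", "⊤")
PAIRS = tuple(product(SIGNS, repeat=2))


def sign_leq(a, b):
    return a == b or a == "⊥" or b == "⊤"


def pair_leq(p, q):
    """Pairs are compared component by component."""
    return sign_leq(p[0], q[0]) and sign_leq(p[1], q[1])


def height(elements, leq):
    """Number of < signs in the longest chain that can be built from elements."""

    @lru_cache(maxsize=None)
    def longest_from(x):
        # Try every strictly larger element as the next link of the chain.
        return max(
            (1 + longest_from(y) for y in elements if x != y and leq(x, y)),
            default=0,
        )

    return max(longest_from(x) for x in elements)


print(height(SIGNS, sign_leq))  # 2
print(height(PAIRS, pair_leq))  # 4
```

The second number is the sum of the heights of the two component lattices, matching the observation in the note above.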

The finite-height property is crucial to lattice-based static program analysis; we’ll talk about it in more detail in the next post of this series.

Appendix: Proof of Uniqueness of Keys

I will provide sketches of the proofs here, and omit the implementations of my lemmas. Click on the link in the code block headers to jump to their implementation on my Git server.

First, note that if we’re inserting a key that’s already in a list, then the keys of that list are unchanged.

From Map.agda, lines 123 through 124

    insert-keys-∈ : ∀ {k : A} {v : B} {l : List (A × B)} →
                    k ∈k l → keys l ≡ keys (insert k v l)

On the other hand, if we’re inserting a new key, it ends up at the end, and the rest of the keys are unchanged.

From Map.agda, lines 134 through 135

    insert-keys-∉ : ∀ {k : A} {v : B} {l : List (A × B)} →
                    ¬ (k ∈k l) → (keys l ++ (k ∷ [])) ≡ keys (insert k v l)

Then, for any given key-value pair, the key either is or isn’t in the list we’re inserting it into. If it is, then the list of keys ends up unchanged, and remains unique if it was already unique. On the other hand, if it’s not in the list, then it ends up at the end; adding a new element to the end of a unique list produces another unique list. Thus, in either case, the final keys are unique.

From Map.agda, lines 143 through 148

    insert-preserves-Unique : ∀ {k : A} {v : B} {l : List (A × B)}
                              → Unique (keys l) → Unique (keys (insert k v l))
    insert-preserves-Unique {k} {v} {l} u
        with (∈k-dec k l)
    ...   | yes k∈kl rewrite insert-keys-∈ {v = v} k∈kl = u
    ...   | no k∉kl rewrite sym (insert-keys-∉ {v = v} k∉kl) = Unique-append k∉kl u

By induction, we can then prove that calling insert many times as we do in union preserves uniqueness too. Here, insert-preserves-Unique serves as the inductive step.

From Map.agda, lines 164 through 168

    union-preserves-Unique : ∀ (l₁ l₂ : List (A × B)) →
                             Unique (keys l₂) → Unique (keys (union l₁ l₂))
    union-preserves-Unique [] l₂ u₂ = u₂
    union-preserves-Unique ((k₁ , v₁) ∷ xs₁) l₂ u₂ =
        insert-preserves-Unique (union-preserves-Unique xs₁ l₂ u₂)
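None of this replaces the proofs, but the statements are easy to spot-check on small inputs. The Python sketch below re-implements insert and union on key-value lists (an informal transcription, with max on integers standing in for the combining function) and exhaustively tests the two key lemmas and the uniqueness of union over a three-key universe. All names are hypothetical.

```python
from itertools import permutations, product


def insert(f, k, v, pairs):
    if not pairs:
        return [(k, v)]
    (k2, v2), rest = pairs[0], pairs[1:]
    if k == k2:
        return [(k2, f(v, v2))] + rest
    return [(k2, v2)] + insert(f, k, v, rest)


def union(f, m1, m2):
    result = m2
    for k, v in reversed(m1):
        result = insert(f, k, v, result)
    return result


def keys(pairs):
    return [k for k, _ in pairs]


def is_unique(xs):
    return len(set(xs)) == len(xs)


def check_uniqueness(universe=("x", "y", "z")):
    sizes = range(len(universe) + 1)
    # Every list of distinct keys, in every order.
    unique_lists = [list(ks) for n in sizes for ks in permutations(universe, n)]
    # Every list of keys, duplicates allowed: union only needs l2 to be unique.
    any_lists = [list(ks) for n in sizes for ks in product(universe, repeat=n)]
    for ks in unique_lists:
        pairs = [(k, 0) for k in ks]
        for k in universe:
            after = keys(insert(max, k, 1, pairs))
            # Present key: keys unchanged. New key: appended at the end.
            assert after == (ks if k in ks else ks + [k])
        for ks1 in any_lists:
            merged = union(max, [(k, 1) for k in ks1], pairs)
            assert is_unique(keys(merged))
    return True


print(check_uniqueness())  # True
```

Note that the first argument of union is allowed to contain duplicate keys in this check; that matches the Agda statement, which only asks for the second list to be unique.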

For update, things are simple; it doesn’t change the keys of the argument list at all, since it only modifies, and doesn’t add new pairs. This is captured by the update-keys property:

From Map.agda, lines 313 through 314

    update-keys : ∀ {k : A} {v : B} {l : List (A × B)} →
                  keys l ≡ keys (update k v l)

If the keys don’t change, they obviously remain unique.

From Map.agda, lines 328 through 330

    update-preserves-Unique : ∀ {k : A} {v : B} {l : List (A × B)} →
                              Unique (keys l) → Unique (keys (update k v l ))
    update-preserves-Unique {k} {v} {l} u rewrite update-keys {k} {v} {l} = u

For restrict, we note that it only ever removes keys; as a result, if a key was not in the input to restrict, then it won’t be in its output, either.

From Map.agda, lines 337 through 338

    restrict-preserves-k≢ : ∀ {k : A} {l₁ l₂ : List (A × B)} →
                            All (λ k' → ¬ k ≡ k') (keys l₂) → All (λ k' → ¬ k ≡ k') (keys (restrict l₁ l₂))

As a result, for each key of the list being restricted, we either drop it (which does not damage uniqueness) or we keep it; since we only remove keys, and since the keys were originally unique, the key we kept won’t conflict with any of the other final keys.

From Map.agda, lines 345 through 351

    restrict-preserves-Unique : ∀ {l₁ l₂ : List (A × B)} →
                                Unique (keys l₂) → Unique (keys (restrict l₁ l₂))
    restrict-preserves-Unique {l₁} {[]} _ = Utils.empty
    restrict-preserves-Unique {l₁} {(k , v) ∷ xs} (push k≢xs uxs)
        with ∈k-dec k l₁
    ...   | yes _ = push (restrict-preserves-k≢ k≢xs) (restrict-preserves-Unique uxs)
    ...   | no _ = restrict-preserves-Unique uxs

Since both update and restrict preserve uniqueness, so does intersect:

From Map.agda, lines 353 through 355

    intersect-preserves-Unique : ∀ {l₁ l₂ : List (A × B)} →
                                 Unique (keys l₂) → Unique (keys (intersect l₁ l₂))
    intersect-preserves-Unique {l₁} u = restrict-preserves-Unique (updates-preserve-Unique {l₁} u)

Untitled Short Story

Thu, 01 Aug 2024 20:31:18 -0700

I’m losing my edge to the art-school Brooklynites in little jackets and
borrowed nostalgia for the unremembered Eighties

The Everpresent Void was first discovered at a children’s birthday party.

Among the laughter and alluring warbling of an arcade, a party was preparing to take their seats at a worn table. The food was french fries, mediocre cheese pizza, and hamburgers; the sort of diet that would be frowned upon at home, and as a result was now highly coveted. The main event, however, turned out to be the self-serve soda machine. It provided unlimited amounts of sugary beverages at the slightest provocation, which was evidenced by the sticky layer of dried drinks that covered the table.

It was an unusual sight: such machines were still somewhat rare in those days. Soon, the children were drunk on Coca-Cola and power. Cups were filled, emptied, spilled, dropped on the floor, and used as musical instruments all while the group crowded around the soda dispenser. The birthday girl soon found a new dimension along which the machine could be abused. One cup needed not contain a single drink.

This new discovery reignited the drinking frenzy. Sensible combinations soon gave way to outrageous mixes. Drinks were paired up, tripled, quadrupled. Soon, everyone was rushing to mix every flavor together, telling stories of a chemical reaction that would occur when they were combined with precise proportions. No such reaction came.

The children were not satisfied with this conclusion. They continued their search for the missing ingredient. Having exhausted the products of the soda machine, they had to broaden their horizons. Ketchup and mustard were the first additions to their repertoire. The boys made shows of tasting and being revolted by their mustard-root-cola, while the girls squealed with disapproval and laughter. Having still failed to perform their act of alchemy, the kids braved yet further frontiers, dropping leftover pieces of cheese and torn fragments of napkins into their cup-cauldrons.

Then, it worked.

When one of the children looked back at his cup, having been distracted by another’s exaggerated gagging, he found it to contain a uniformly black fluid. This intrigued the boy; he went to prod it with his fork, but it never reached the side of the cup. Startled, he dropped the utensil, and watched it sink out of sight. This too was intriguing: the fork was noticeably longer than the container.

The others soon crowded around him to examine what was later understood to be the first instance of the Everpresent Void. They dropped straws, arcade tickets, cheap toys (purchased with arcade tickets), and coins into the cup, all of which disappeared without a sound. The boy found himself at the center of attention, and took great pleasure in recounting his latest recipe. Soon, the Void was replicated in the cups of everyone at the party.

During the first week after that incident, teachers and janitors had a particularly difficult time. Various quantities of the Void were smuggled into schools. When the staff caught on to peculiarly black water bottles, smugglers switched to more creative techniques involving Ziploc bags and photographic film tubes. Like crystals around an impurity, crowded islands formed at lunch tables with Void at their centers. The featureless and endless substance drew all attention away from civil wars, derivatives, and Steinbeck novels.

Only, the Void was not entirely featureless and endless. As kids spent entire lunch breaks gazing into the darkness of the fluid, some thought they saw something. As more took on the arduous task of sitting perfectly still and staring into space, it became clear that this was no mere trick of the mind.

With time, a light show emerged from the emptiness of the Void. It was not unlike pressing down on one’s eyes: colorful particles swirled in the darkness forming spirals and fractals. These gradually changed colors, appearing at times red-and-beige, at times blue-and-green, and everything in-between.

The display was polarizing. Swaths of children, though initially enthralled by the mysterious properties of the Void, were not sufficiently re-captured by some flashing colors. In the later parts of the week, they would leave lunch halls early to study, practice, or socialize. There were, they thought, better, more normal things to do. A minority, however, only grew more enamored with their philosopher’s stones.

Like alchemists of the past, many of the remaining experimenters had a tendency to obsess. Even as the world — with its track meets, birthday parties, and dances — went on around them, they continued their close observation of the mysterious substance. The Void proved worthy of this sustained attention. The patterns that swirled in its depths were not entirely random: they responded, reluctantly and sluggishly, to the observer’s mind. Anger and frustration tended to produce redder hues; sadness manifested as a snotty shade of green. Focusing on a particular color made it more likely to appear, as well. Following its own peculiar kind of intuition, the Void responded faster when more individuals were present.

Other promising avenues of research also grew in popularity over the following days and weeks. The precise recipe for the Void was not, it turned out, very strict. Though soda and fast food remained a constant fixture, the precise ingredients could be substituted for alternates. Pieces of swiss cheese worked just as well as cheddar. A fragment of a turkey patty replaced the traditional 100% Angus beef in a pinch. The resulting substance was as opaque and inscrutable as ever.

Following much trial and error, adolescent adventurers mapped the frontiers of Void synthesis. Though the full specification is not particularly relevant, of note was the requirement for the base to be made of a mixture of sodas, and another for the final concoction to contain at least two sandwich ingredients. Orange juice, though sweet and liquid, did not catalyze the reaction, but Orange Fanta did, even if it was completely flat.

If the properties hitherto described were the only notable aspects of the Void, it would have been a mere curiosity. Late-night History Channel shows might have shown it alongside theories of ancient aliens or telepathy, filling inattentive viewers’ dimly lit homes with comfortable background noise. The substance, however, had one final, crucial aspect. The discovery was made – as is often the case – in the midst of conflict.

The two parties could not be more different. One group consisted of the boys from Mr. Thompson’s physics class. They were skinny, bespectacled, and dressed in graphic t-shirts and jeans. The other group was made of the boys from Mrs. Leonard’s biology class; they were skinny, bespectacled, and dressed in graphic t-shirts and jeans. Naturally, the two factions were sworn enemies.

One rainy West Coast day, the two groups were engaging in customary lunch-break Void-viewing. By then, participants in the activity were relegated to the floors of a hallway in the back of the school, their sustained interest in staring-into-space taking a toll on their social standing. They were making use of advanced techniques; by then, experts were able to influence the Void not only to change color, but to form into shimmering images. The Thompsonians were constructing an image of a Christmas tree; the Leonardese were working on The Funniest Image to Ever Exist (something phallic).

Neither group was having much luck. The tree’s trunk was consistently much too short, and its bottom too curvy. The phallus, conversely, was quite sharp and somehow sickly looking. Each side mocked the other relentlessly. It took the insult-hurlers from the two camps several back-and-forth trips to realize the images they were ridiculing and the images their friends were conjuring were one and the same.

The Void was interconnected. By repeating a specific and precise recipe, one could reliably come back over and over to the same “place” in the infinite blackness. Painstakingly painted pictures persisted into the next day, and anyone with the same recipe could tune in to see. Enthusiasts rushed to claim recipes most affordable on modest allowances. Registries of locations and ingredients were posted on bulletin boards, written in bathroom stalls, tossed around as crumpled paper balls in class. Even those who had previously shunned the alchemists were drawn back into the fold by the promise of conjuring images of their own for others to see.

It was not until weeks later that the first glimpses of a post-Void future revealed themselves, to no one’s attention or particular interest. A groggy, not-yet-caffeinated Mrs. Leonard walked into class one day to find a sea of red fabric. Nearly every girl in the morning section showed up to class that day wearing a red dress. Not a single one of the students could provide a concrete method by which they chose the day’s wardrobe; feelings, whims, and even coin-flipping were cited as reasons for wearing the outfit. What’s more, the same happened in Mr. Thompson’s class, and in a number of scattered schools throughout the country.

Being a scientist at heart, and rejecting wholeheartedly the possibility of coincidence or paranormal involvement, Mrs. Leonard spent the rest of the day distractedly overcorrecting for her earlier lack of coffee. A satisfying answer eluded her, and she came home jittery and defeated. Walking past her son’s room she noted that he was indulging in his habitual Void-gazing. In the depths of the pitch-black bowl, she glimpsed swirls of that very same shade of red.

The mechanism that precipitated the red-dress curiosity was not all that sinister. The Void was not unlike an ocean, absorbing and releasing heat, mitigating changes in its environment. After a day in which red was prevalent in the collective thoughts of Void-viewers, the color dissipated like heat through the dark realm, was stirred somehow by the convection of its hidden currents, and re-entered the minds of practitioners in its altered form. They averted their gazes and went to put on red clothes.

There was another way in which the Void resembled an ocean. Locations within it drifted through the darkness like rafts. Each day, some recipes would move closer or further apart. Whenever others occupied a nearby place in that ocean, their thoughts echoed in the silences between one’s own. Void-voyagers, their eyes directed at customary, comforting blackness, would encounter each other there, often without knowing. With each encounter, they left unnoticeable imprints upon one another.

If the Thompsonians or Leonardese were chemists rather than physicists and biologists, and if they were more inclined towards introspection or calm, deliberate thought, they might have observed this gradual exchange, and seen in it the physical process of diffusion, with its particles and collisions. They might have thought of drops of dye in water, swirling in beautiful patterns until finally there were no recognizable shapes, nothing to see at all except a gentle haze of red in an Erlenmeyer flask. The final stage of diffusion was uniformity.

The viewers of the Void were as much connected to each other as they were disconnected from the rest of the world. They were distant, sitting in exile at lunch, looking for hours on end with mild expressions into their bowls of inky soup, talking about their latest journeys with each other on the phone. They spoke in references to landmarks in The Ocean, to happenings in their shared dimension. At the same time, they knew little of parties and dances, and they seldom — if ever — went to cheer for their peers in athletic events. It was hard for others to hold a conversation with them.

It was not only children that sought to explore the Everpresent Void; adult interest in the substance was growing. Grownups heard about it from their children, their students, or news reports. They could understand the desire to be seen. More than that, though, they saw potential in the Void.

The substance was unprecedented. It was one thing to call someone on the phone, or come to their door; it was something else altogether to create a scene, in three crisp (with practice) dimensions, and to set it adrift in the populous sea of black. The Void was an invaluable tool for promotional material, advertising, and even storefronts. Adults adept at Void manipulation found themselves employed with various large companies, and sometimes even created their own. Though preconceptions about them — largely negative, and often substantiated — successfully made the jump across the age gap, these men and women became sought after and well-rewarded for their talents.

With adult influence, of course, came adult concerns. Though recipes for their windows into the other world remained permanent reminders of that first day at the arcade, older practitioners were too used to thinking about elections, taxes, and mortgages. What’s worse, the presidential candidates, tax collectors, and banks did not stop thinking about them. As months went by, it became more and more common to see blue donkeys and red elephants as motifs in the Void. These drops of color swirled with all others, splashing from raft to raft on their predetermined path to joining the haze.

Among some practitioners, there was a growing sense that the Everpresent Void was alive. It didn’t speak, or think, or breathe. Sometimes, though, its movements and currents were too deliberate to be mere chance. The connections that it made, the glimpses of nearby islands that viewers saw in the corners of their eyes, must’ve been chosen on purpose; chosen to entice. The Void wanted to be seen. It spoke to its sailors in echoes of others’ words, it showed them films whose frames were others’ images. Encountering another voyager with his nearby recipe, it was insufficient to simply extricate his thoughts from one’s own; it was also necessary to determine why he was sent as the Void’s emissary.

The echoes or films were not sent to convey a message. The Void was not aware of human logic or values, or even of the physical reality outside of its own darkness. It was indifferent to such things, and continued to behave according to some incomprehensible laws. Nevertheless, somewhere near the core of these laws was the desire to command human attention.

Nature did not bestow upon humanity the mechanisms to defend against something as otherworldly as the Void. The stories they learned each day were spoken by a chorus of voices, so loud and numerous that it seemed the whole world was speaking to them. In truth, however, each human argument sounded within that ocean’s surf — as did its refutation. Each voyager heard fragments they were used to hearing, stories they wanted to learn. Though the Void reflected no light, staring at it was looking into an endless mirror.

Through this process, the modern-day alchemists’ demeanor began to resemble their ancient counterparts’ mercury-induced insanity. They spoke in baffling absolutisms. Their language, already rich with Void-specific jargon, grew further removed from the words spoken still in coffee shops and bars. Anger and anxiety attracted attention, and so they were angry and anxious, exploding at times at seemingly innocuous occurrences. Sometimes, as with the red-dress incident, hundreds of alchemists were compelled to eat a certain food, or dress in a certain outfit. They swayed like kelp with the invisible waves of the Void.

Concurrently, the Void’s influence grew, its versatility and power proving impossible to surrender. More and more learned to create viewports into the blackness. As they did, the prevalence of madness increased. Soon everyone knew somebody affected by the lunacy. The victims remained entirely human; they kept their fond memories, their endearing mannerisms, their connections. The Void reflected their light like a carnival mirror and amplified thoughts, but it could not cultivate that which was not already there. There was nothing to do but to stand by them and hope that with time, their features would recede back into their former shape.

Years later, on a chill November night, a weary Mrs. Leonard re-entered her home, one lit house in an entire city of houses that were dark. Her son was away at college now, and her husband out with friends, leaving her to collapse onto the couch and revisit the events of the day. Her students had done well on their exams, and were rewarded with a day off; in class, they watched a nature documentary. The subject was the Amazon rainforest, and among its inhabitants were the leaf-cutter ants.

Mrs. Leonard thought about the ants. Day after day they scoured the rainforest, collecting leaves to feed to a fungal garden in their colony. Day after day, the fungus emitted chemicals that diffused from the garden, swirling in the air currents that permeated the rest of the colony. Sensing changes, the ants altered their routes to look for different sources of food.

There was, she thought, a first day to all of this, even a first moment. Before that moment, they were just ants, going about their day as all their relatives do to this day. Then, perhaps, a worker returned accompanied by a spore, and changed the course of the colony’s history.

Outside, it was storming. In the dark, roads were discernible only through streaking reflections of stoplights in puddles. Rain drummed with increasing urgency against the house’s windows; larger drops left craters in the waterscape forming on the glass. The resulting texture was not unlike mycelium.

Mrs. Leonard wondered whether the first of the leaf-cutter ants were wary of the transformation occurring around them, whether the incursion of hyphae into their familiar tunnels concerned them. Something new was beginning to live alongside them, something decidedly un-ant-like. It thought nothing of workers, queens, or larvae, and it was unconcerned with conquest, hunting, or foraging. The fungus only swelled more with each day, entangling itself deeper into their lives. Were the ants really not afraid of this thing? It certainly scared her.

It was over 45 million years ago, the narrator of the documentary had said, that the colonies likely began their new mutualistic way of life. That November night, the ants were still there, tending to their gardens. The fungus nourished their larvae in exchange for their protection. Perhaps, in some way neither she nor the ants could understand, that symbiosis warded off dangers that their competitors succumbed to.

Upon returning home, Mr. Leonard found his wife still on the couch, embers of a dying fireplace casting playful shadows across her face. He had no way of knowing, but she was dreaming of gathering leaves in a warm, humid jungle.

Implementing and Verifying "Static Program Analysis" in Agda, Part 1: Lattices

Sat, 06 Jul 2024 17:37:43 -0700

This is the first post in a series on static program analysis in Agda. See the introduction for a little bit more context.

The goal of this post is to motivate the algebraic structure called a lattice. Lattices have broad applications [note: See, for instance, Lars Hupel's excellent introduction to CRDTs which uses lattices for Conflict-Free Replicated Data Types. CRDTs can be used to implement peer-to-peer distributed systems. ] beyond static program analysis, so the work in this post is interesting in its own right. However, for the purposes of this series, I’m most interested in lattices as an encoding of program information when performing analysis. To start motivating lattices in that context, I’ll need to start with monotone frameworks.

Monotone Frameworks

The key notion for monotone frameworks is the “specificity” of information. Take, for instance, an analyzer that tries to figure out if a variable is positive, negative, or equal to zero (this is called a sign analysis, and we’ll be using this example a lot). Of course, the variable could be “none of the above” – perhaps if it was initialized from user input, which would allow both positive and negative numbers. Such an analyzer might return +, -, 0, or unknown for any given variable. These outputs are not created equal: if a variable has sign +, we know more about it than if the sign is unknown: we’ve ruled out negative numbers as possible values!

Specificity is important to us because we want our analyses to be as precise as possible. It would be valid for a program analysis to just return unknown for everything, but it wouldn’t be very useful. Thus, we want to rank possible outputs, and try to pick the most specific one. The convention [note: I say convention, because it doesn't actually matter if we represent more specific values as "larger" or "smaller". Given a lattice with a particular order written as <, we can flip the sign in all relations (turning a < b into a > b), and get back another lattice. This lattice will have the same properties (more precisely, the properties will be dual). So we shouldn't fret about picking a direction for "what's less than what". ] seems to be to make more specific things “smaller” [note: Admittedly, it's a little bit odd to say that something which is "more" than something else is actually smaller. The intuition that I favor is that something that's more specific describes fewer objects: there are fewer white horses than horses, so "white horse" is more specific than "horse". The direction of < can be thought of as comparing the number of objects.

Note that this is only an intuition; there are equally many positive and negative numbers, but we will not group them together in our order. ] , and less specific things “larger”. Coming back to our previous example, we’d write + < unknown, since + is more specific. Of course, the exact things we’re trying to rank depend on the sort of analysis we’re trying to perform. Since I introduced sign analysis, we’re ranking signs like + and -. For other analyses, the elements will be different. The comparison, however, will be a permanent fixture.

Suppose now that we have some program analysis, and we’re feeding it some input information. Perhaps we’re giving it the signs of variables x and y, and hoping for it to give us the sign of a third variable z. It would be very unfortunate if, when given more specific information, the analysis would return a less specific output! The more you know going in, the more you should know coming out. Similarly, when given less specific / vaguer information, the analysis shouldn’t produce a more specific answer – how could it do that? This leads us to come up with the following rule:

$$ \textbf{if}\ \text{input}_1 \le \text{input}_2, \textbf{then}\ \text{analyze}(\text{input}_1) \le \text{analyze}(\text{input}_2) $$

In mathematics, such a property is called monotonicity. We say that “analyze” is a monotonic function. This property gives its name to monotone frameworks. For our purposes, this property means that being more specific “pays off”: better information in means better information out. In Agda, we can encode monotonicity as follows:

From Lattice.agda, lines 17 through 21

module _ {a b} {A : Set a} {B : Set b}
    (_≼₁_ : A → A → Set a) (_≼₂_ : B → B → Set b) where

    Monotonic : (A → B) → Set (a ⊔ℓ b)
    Monotonic f = ∀ {a₁ a₂ : A} → a₁ ≼₁ a₂ → f a₁ ≼₂ f a₂

Note that above, I defined Monotonic on an arbitrary function, whose outputs might be of a different type than its inputs. This will come in handy later.
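
For a concrete instance, the sketch below repeats the definition and checks it against a familiar function: the successor on natural numbers, ordered by ≤. The module name and `suc-monotonic` are illustrative names that do not appear in Lattice.agda, and the example assumes a recent agda-stdlib.

```agda
module MonotonicSketch where

open import Level using () renaming (_⊔_ to _⊔ℓ_)
open import Data.Nat using (ℕ; suc; _≤_; s≤s)

module _ {a b} {A : Set a} {B : Set b}
    (_≼₁_ : A → A → Set a) (_≼₂_ : B → B → Set b) where

    Monotonic : (A → B) → Set (a ⊔ℓ b)
    Monotonic f = ∀ {a₁ a₂ : A} → a₁ ≼₁ a₂ → f a₁ ≼₂ f a₂

-- If a₁ ≤ a₂, then suc a₁ ≤ suc a₂; this is exactly the s≤s constructor.
suc-monotonic : Monotonic _≤_ _≤_ suc
suc-monotonic a₁≤a₂ = s≤s a₁≤a₂
```

Here both orders happen to be the same relation on the same type, but nothing in the definition requires that.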

The order < of our elements and the monotonicity of our analysis are useful to us for another reason: they help gauge and limit, in a roundabout way, how much work might be left for our analysis to do. This matters because we don’t want to allow analyses that can take forever to finish – that’s a little too long for a pragmatic tool used by people.

The key observation – which I will describe in detail in a later post – is that a monotonic analysis, in a way, “climbs upwards” through an order. As we continue using this analysis to refine information over and over, its results get less and less specific. [note: It is not a bad thing for our results to get less specific over time, because our initial information is probably incomplete. If you've only seen German shepherds in your life, that might be your picture of what a dog is like. If you then come across a chihuahua, your initial definition of "dog" would certainly not accommodate it. To allow for both German shepherds and chihuahuas, you'd have to loosen the definition of "dog". This new definition would be less specific, but it would be more accurate. ] If we add an additional ingredient, and say that the order has a fixed height, we can deduce that the analysis will eventually stop producing additional information: either it will keep “climbing”, and reach the top (thus having to stop), or it will stop on its own before reaching the top. This is the essence of the fixed-point algorithm, which in Agda-like pseudocode can be stated as follows:

module _ (IsFiniteHeight A ≺)
         (f : A → A)
         (Monotonicᶠ : Monotonic _≼_ _≼_ f) where
    -- There exists a point...
    aᶠ : A

    -- Such that applying the monotonic function doesn't change the result.
    aᶠ≈faᶠ : aᶠ ≈ f aᶠ

Moreover, the value we’ll get out of the fixed point algorithm will be the least fixed point. For us, this means that the result will be “the most specific result possible”.

From Fixedpoint.agda, line 86

aᶠ≼ : ∀ (a : A) → a ≈ f a → aᶠ ≼ a
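
To make the “climbing” concrete, here is a minimal executable sketch. It is not the algorithm from Fixedpoint.agda: instead of a proof of finite height, it takes a natural number of “fuel” that bounds the number of steps, and instead of a general equivalence it takes a boolean equality test. The names `fix`, `step`, and `fix-step` are made up for this example, which assumes a recent agda-stdlib.

```agda
module FixpointSketch where

open import Data.Nat using (ℕ; zero; suc; _⊓_; _≡ᵇ_)
open import Data.Bool using (Bool; if_then_else_)
open import Data.Maybe using (Maybe; just; nothing)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

module _ {A : Set} (_==_ : A → A → Bool) (f : A → A) where

    -- Apply f until the value stops changing. The first argument is fuel:
    -- an upper bound on the number of steps, standing in for the height
    -- of the order. Running out of fuel yields nothing.
    fix : ℕ → A → Maybe A
    fix zero    a = nothing
    fix (suc n) a = if (f a == a) then just a else fix n (f a)

-- A monotonic function on the chain 0 ≤ 1 ≤ 2 ≤ 3: step up, but never past 3.
step : ℕ → ℕ
step n = suc n ⊓ 3

-- Starting from the bottom (0), iteration climbs to 3 and stays there.
fix-step : fix _≡ᵇ_ step 10 0 ≡ just 3
fix-step = refl
```

The fuel is a shortcut. With a finite-height proof in hand, there is no need for `Maybe`, because the height itself guarantees that the climb ends.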

The above explanation omits a lot of details, but it’s a start. To get more precise, we must drill down into several aspects of what I’ve said so far. The first of them is, how can we compare program information using an order?

Lattices

Let’s start with a question: when it comes to our specificity-based order, is - less than, greater than, or equal to +? Surely it’s not less specific; knowing that a number is negative doesn’t give you less information than knowing that it is positive. Similarly, it’s not any more specific, for the same reason. You could consider it equally specific, but that doesn’t seem quite right either; the information is different, so comparing specificity feels apples-to-oranges. On the other hand, both + and - are clearly more specific than unknown.

The solution to this conundrum is to simply refuse to compare certain elements: + is neither less than, greater than, nor equal to -, but + < unknown and - < unknown. Such an ordering is called a partial order.
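
One way to write such an order down directly is as an inductive relation. The sketch below is illustrative rather than taken from the series' code: the constructor names (`plus`, `minus`, `zero`, `unknown`) and the relation `_≤ˢ_` are assumptions, and the relation is the non-strict version of the order (every sign is related to itself). It assumes a recent agda-stdlib.

```agda
module SignOrderSketch where

open import Relation.Nullary using (¬_)

data Sign : Set where
    plus minus zero unknown : Sign

-- s₁ ≤ˢ s₂ reads "s₁ is at least as specific as s₂".
data _≤ˢ_ : Sign → Sign → Set where
    ≤ˢ-refl    : ∀ {s} → s ≤ˢ s
    ≤ˢ-unknown : ∀ {s} → s ≤ˢ unknown

plus≤unknown : plus ≤ˢ unknown
plus≤unknown = ≤ˢ-unknown

-- Neither constructor can relate plus and minus, in either direction.
plus≰minus : ¬ (plus ≤ˢ minus)
plus≰minus ()

minus≰plus : ¬ (minus ≤ˢ plus)
minus≰plus ()
```

The last two definitions are what makes the order partial: some pairs of elements are simply not related, and Agda can confirm it.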

Next, another question. Suppose that the user writes code like this:

if someCondition {
  x = exprA;
} else {
  x = exprB;
}
y = x;

If exprA has sign s1, and exprB has sign s2, what’s the sign of y? It’s not necessarily s1 or s2, since they might not match: s1 could be +, and s2 could be -, and using either + or - for y would be incorrect. We’re looking for something that can encompass both s1 and s2. Necessarily, it would be either equally specific or less specific than either s1 or s2: there isn’t any new information coming in about x, and since we don’t know which branch is taken, we stand to lose a little bit of info. However, our goal is always to maximize specificity, since more specific signs give us more information about our program.

This gives us the following constraints. Since the combined sign s has to be equally or less specific than both s1 and s2, we have s1 <= s and s2 <= s. However, we want to pick s such that it’s at least as specific as any other “combined sign” candidate. Thus, if there’s another sign t, with s1 <= t and s2 <= t, then t must be equally or less specific than s: s <= t.
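
Written symbolically (a restatement of the paragraph above, with $\le$ standing for the specificity order), the combined sign $s$ must satisfy:

$$ s_1 \le s, \qquad s_2 \le s, \qquad \textbf{for all}\ t:\ \textbf{if}\ s_1 \le t\ \text{and}\ s_2 \le t,\ \textbf{then}\ s \le t $$

As a quick check, take $s_1 = +$ and $s_2 = -$. The only sign that sits above both is unknown, so $s$ must be unknown. If instead $s_1 = s_2 = +$, both + and unknown sit above them, and + is the smaller of the two candidates, so $s$ is +.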

At first, the above constraints might seem quite complicated. We can interpret them in more familiar territory by looking at numbers instead of signs. If we have two numbers n1 and n2, what is the smallest number that’s at least as big as both n1 and n2? Why, the maximum of the two, of course!

There is a reason why I used the constraints above instead of just saying “maximum”. For numbers, max(a,b) is either a or b. However, we saw earlier that neither + nor - works as the sign for y in our program. Moreover, we agreed above that our order is partial: how can we pick “the bigger of two elements” if neither is bigger than the other? So max itself doesn’t quite work; instead, we require a similar function for our signs. We call this function “least upper bound”, since it is the “least (most specific) element that’s greater (less specific) than or equal to both s1 and s2”. Conventionally, this function is written as $a \sqcup b$ (or in our case, $s_1 \sqcup s_2$). The result $a \sqcup b$ is also called the join of $a$ and $b$. We can define it for our signs so far using the following Cayley table.

$$ \begin{array}{c|cccc} \sqcup & - & 0 & + & ? \\ \hline - & - & ? & ? & ? \\ 0 & ? & 0 & ? & ? \\ + & ? & ? & + & ? \\ ? & ? & ? & ? & ? \\ \end{array} $$

By using the above table, we can see that $(+\ \sqcup\ -)\ =\ ?$ (aka unknown). This is correct; given the four signs we’re working with, that’s the most we can say. Let’s explore the analogy to the max function a little bit more, by observing that this function has certain properties:

max(a, a) = a. The maximum of a number and itself is just that number. Mathematically, this property is called idempotence. Note that by inspecting the diagonal of the above table, we can confirm that our $(\sqcup)$ function is idempotent.
max(a, b) = max(b, a). If you’re taking the maximum of two numbers, it doesn’t matter which one you consider first. This property is called commutativity. Note that if you mirror the table along the diagonal, it doesn’t change; this shows that our $(\sqcup)$ function is commutative.
max(a, max(b, c)) = max(max(a, b), c). When you have three numbers, and you’re determining the maximum value, it doesn’t matter which pair of numbers you compare first. This property is called associativity. You can use the table above to verify that $(\sqcup)$ is associative, too.
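
The table and two of these properties can be checked mechanically. The following is a small sketch under assumed names (`Sign`, `plus`, `minus`, `zero`, `unknown`, and the module name are all chosen for this example and are not the series' actual definitions); it assumes a recent agda-stdlib. Associativity works the same way, by case analysis, but takes 64 cases, so it is left out here.

```agda
module SignJoinSketch where

open import Relation.Binary.PropositionalEquality using (_≡_; refl)

data Sign : Set where
    plus minus zero unknown : Sign

-- The Cayley table: equal signs join to themselves, anything else to unknown.
_⊔_ : Sign → Sign → Sign
plus  ⊔ plus  = plus
minus ⊔ minus = minus
zero  ⊔ zero  = zero
_     ⊔ _     = unknown

plus⊔minus : (plus ⊔ minus) ≡ unknown
plus⊔minus = refl

-- The diagonal of the table.
⊔-idemp : (x : Sign) → (x ⊔ x) ≡ x
⊔-idemp plus    = refl
⊔-idemp minus   = refl
⊔-idemp zero    = refl
⊔-idemp unknown = refl

-- Mirroring the table along the diagonal: one case per cell.
⊔-comm : (x y : Sign) → (x ⊔ y) ≡ (y ⊔ x)
⊔-comm plus    plus    = refl
⊔-comm plus    minus   = refl
⊔-comm plus    zero    = refl
⊔-comm plus    unknown = refl
⊔-comm minus   plus    = refl
⊔-comm minus   minus   = refl
⊔-comm minus   zero    = refl
⊔-comm minus   unknown = refl
⊔-comm zero    plus    = refl
⊔-comm zero    minus   = refl
⊔-comm zero    zero    = refl
⊔-comm zero    unknown = refl
⊔-comm unknown plus    = refl
⊔-comm unknown minus   = refl
⊔-comm unknown zero    = refl
⊔-comm unknown unknown = refl
```

Every case is `refl` because, once both arguments are concrete signs, each side computes to the same table entry.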

A set that has a binary operation (like max or $(\sqcup)$) that satisfies the above properties is called a semilattice. In Agda, we can write this definition roughly as follows:

record IsSemilattice {a} (A : Set a) (_⊔_ : A → A → A) : Set a where
    field
        ⊔-assoc : (x y z : A) → ((x ⊔ y) ⊔ z) ≡ (x ⊔ (y ⊔ z))
        ⊔-comm : (x y : A) → (x ⊔ y) ≡ (y ⊔ x)
        ⊔-idemp : (x : A) → (x ⊔ x) ≡ x

Note that this is an example of the “Is Something” pattern. It turns out to be convenient, however, to not require propositional equality (≡). For instance, we might model sets as lists. Propositional equality would force us to consider lists with the same elements but a different order to be unequal. Instead, we parameterize our definition of IsSemilattice by a binary relation _≈_, which we ask to be an equivalence relation.

From Lattice.agda, lines 23 through 39

record IsSemilattice {a} (A : Set a)
    (_≈_ : A → A → Set a)
    (_⊔_ : A → A → A) : Set a where

    _≼_ : A → A → Set a
    a ≼ b = (a ⊔ b) ≈ b

    _≺_ : A → A → Set a
    a ≺ b = (a ≼ b) × (¬ a ≈ b)

    field
        ≈-equiv : IsEquivalence A _≈_
        ≈-⊔-cong : ∀ {a₁ a₂ a₃ a₄} → a₁ ≈ a₂ → a₃ ≈ a₄ → (a₁ ⊔ a₃) ≈ (a₂ ⊔ a₄)

        ⊔-assoc : (x y z : A) → ((x ⊔ y) ⊔ z) ≈ (x ⊔ (y ⊔ z))
        ⊔-comm : (x y : A) → (x ⊔ y) ≈ (y ⊔ x)
        ⊔-idemp : (x : A) → (x ⊔ x) ≈ x

Notice that the above code also provides – but doesn’t require – _≼_ and _≺_. That’s because a least-upper-bound operation encodes an order: intuitively, if max(a, b) = b, then b must be larger than a. Lars Hupel’s CRDT series includes an explanation of how the ordering operator and the “least upper bound” function can be constructed from one another.
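To make the "join encodes an order" idea concrete, here is a small Python sketch that mirrors the definitions of _≼_ and _≺_ from the record above. It uses Python's == in place of the relation _≈_, and the function names (leq_from_join, strictly_less, sign_join) are hypothetical.

```python
def leq_from_join(join, a, b):
    """a ≼ b holds exactly when joining a into b leaves b unchanged."""
    return join(a, b) == b

def strictly_less(join, a, b):
    """a ≺ b: a ≼ b, and a is not equal to b."""
    return leq_from_join(join, a, b) and a != b

def sign_join(a, b):
    """Join on the four signs, with "?" standing for unknown (assumed encoding)."""
    return a if a == b else "?"

print(leq_from_join(max, 3, 7))            # True, because max(3, 7) == 7
print(leq_from_join(max, 7, 3))            # False
print(leq_from_join(sign_join, "+", "?"))  # True: positive is below unknown
print(leq_from_join(sign_join, "+", "-"))  # False: the two signs are incomparable
```

The last line shows why this is only a partial order: neither "+" ≼ "-" nor "-" ≼ "+" holds.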

As it turns out, the min function has very similar properties to max: it’s idempotent, commutative, and associative. For a partial order like ours, the analog to min is “greatest lower bound”, or “the largest value that’s smaller than both inputs”. Such a function is denoted as $a\sqcap b$, and often called the “meet” of $a$ and $b$. As for what it means, where $s_1 \sqcup s_2$ means “combine two signs where you don’t know which one will be used” (like in an if/else), $s_1 \sqcap s_2$ means “combine two signs where you know both of them to be true”. [note: If you're familiar with Boolean algebra, this might look a little bit familiar to you. In fact, the symbol for "and" on booleans is $\land$. Similarly, the symbol for "or" is $\lor$. So, $s_1 \sqcup s_2$ means "the sign is $s_1$ or $s_2$", or "(the sign is $s_1$) $\lor$ (the sign is $s_2$)". Similarly, $s_1 \sqcap s_2$ means "(the sign is $s_1$) $\land$ (the sign is $s_2$)". Don't these symbols look similar? In fact, booleans with $(\lor)$ and $(\land)$ satisfy the semilattice laws we've been discussing, and together form a lattice (which I'm building up to in the main body of the text). The same is true for the set union and intersection operations, $(\cup)$ and $(\cap)$. ] For example, $(+\ \sqcap\ ?)\ =\ +$, because a variable that’s both “any sign” and “positive” must be positive.

There’s just one hiccup: what’s the greatest lower bound of + and -? It needs to be a value that’s less than both of them, but so far, we don’t have such a value. Intuitively, this value should be called something like impossible, because a number that’s both positive and negative doesn’t exist. So, let’s extend our analyzer to have a new impossible value. In fact, it turns out that this “impossible” value is the least element of our set (we added it to be the lower bound of + and co., which in turn are less than unknown). Similarly, unknown is the largest element of our set, since it’s greater than + and co., and transitively greater than impossible. In mathematics, it’s not uncommon to define the least element as $\bot$ (read “bottom”), and the greatest element as $\top$ (read “top”). With that in mind, the following are the updated Cayley tables for our operations.

$$ \begin{array}{c|ccccc} \sqcup & - & 0 & + & \top & \bot \\ \hline - & - & \top & \top & \top & - \\ 0 & \top & 0 & \top & \top & 0 \\ + & \top & \top & + & \top & + \\ \top & \top & \top & \top & \top & \top \\ \bot & - & 0 & + & \top & \bot \\ \end{array} \qquad \begin{array}{c|ccccc} \sqcap & - & 0 & + & \top & \bot \\ \hline - & - & \bot & \bot & - & \bot \\ 0 & \bot & 0 & \bot & 0 & \bot \\ + & \bot & \bot & + & + & \bot \\ \top & - & 0 & + & \top & \bot \\ \bot & \bot & \bot & \bot & \bot & \bot \\ \end{array} $$

So, it turns out that our set of possible signs is a semilattice in two ways. And if “semi” means “half”, do two “semi”s make a whole? Indeed they do!

A lattice is made up of two semilattices. The operations of these two semilattices, however, must satisfy some additional properties. Let’s examine the properties in the context of min and max as we have before. They are usually called the absorption laws:

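max(a, min(a, b)) = a. Either a is no bigger than b, or a is bigger than b; so if you take both the maximum and the minimum of a and b, one of the two operations will return a. In the first case, the minimum is a, and max(a, a) = a; in the second, the minimum is b, and max(a, b) = a.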
min(a, max(a, b)) = a. The reason for this one is the same as the reason above.
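As a quick sanity check of the two absorption laws, here is a Python sketch that tests them exhaustively on a small range of natural numbers. The helper absorption_holds is a made-up name, and testing on range(10) is of course evidence, not a proof.

```python
from itertools import product

def absorption_holds(join, meet, elems):
    """Check both absorption laws for every pair drawn from elems."""
    pairs = list(product(elems, repeat=2))
    return (all(join(a, meet(a, b)) == a for a, b in pairs)
            and all(meet(a, join(a, b)) == a for a, b in pairs))

print(absorption_holds(max, min, range(10)))  # True
print(absorption_holds(max, max, range(10)))  # False: max does not absorb itself
```

The second call shows that the laws really do constrain how the two operations relate: pairing max with itself breaks them as soon as b is bigger than a.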

In Agda, we can therefore write a lattice as follows:

From Lattice.agda, lines 183 through 193

record IsLattice {a} (A : Set a)
    (_≈_ : A → A → Set a)
    (_⊔_ : A → A → A)
    (_⊓_ : A → A → A) : Set a where

    field
        joinSemilattice : IsSemilattice A _≈_ _⊔_
        meetSemilattice : IsSemilattice A _≈_ _⊓_

        absorb-⊔-⊓ : (x y : A) → (x ⊔ (x ⊓ y)) ≈ x
        absorb-⊓-⊔ : (x y : A) → (x ⊓ (x ⊔ y)) ≈ x

Concrete Examples

Natural Numbers

Since we’ve been talking about min and max as motivators for properties of $(\sqcap)$ and $(\sqcup)$, it might not be all that surprising that natural numbers form a lattice with min and max as the two binary operations. In fact, the Agda standard library writes min as _⊓_ and max as _⊔_! We can make use of the already-proven properties of these operators to easily define IsLattice for natural numbers. Notice that since we’re not doing anything clever, like considering lists up to reordering, there’s no reason not to use propositional equality ≡ for our equivalence relation.

From Nat.agda, lines 1 through 45

module Lattice.Nat where

open import Equivalence
open import Lattice
open import Relation.Binary.PropositionalEquality using (_≡_; refl; sym; trans)
open import Data.Nat using (ℕ; _⊔_; _⊓_; _≤_)
open import Data.Nat.Properties using 
    ( ⊔-assoc; ⊔-comm; ⊔-idem
    ; ⊓-assoc; ⊓-comm; ⊓-idem
    ; ⊓-mono-≤; ⊔-mono-≤
    ; m≤n⇒m≤o⊔n; m≤n⇒m⊓o≤n; ≤-refl; ≤-antisym
    )

private
    ≡-⊔-cong : ∀ {a₁ a₂ a₃ a₄} → a₁ ≡ a₂ → a₃ ≡ a₄ → (a₁ ⊔ a₃) ≡ (a₂ ⊔ a₄)
    ≡-⊔-cong a₁≡a₂ a₃≡a₄ rewrite a₁≡a₂ rewrite a₃≡a₄ = refl

    ≡-⊓-cong : ∀ {a₁ a₂ a₃ a₄} → a₁ ≡ a₂ → a₃ ≡ a₄ → (a₁ ⊓ a₃) ≡ (a₂ ⊓ a₄)
    ≡-⊓-cong a₁≡a₂ a₃≡a₄ rewrite a₁≡a₂ rewrite a₃≡a₄ = refl

isMaxSemilattice : IsSemilattice ℕ _≡_ _⊔_
isMaxSemilattice = record
    { ≈-equiv = record
        { ≈-refl = refl
        ; ≈-sym = sym
        ; ≈-trans = trans
        }
    ; ≈-⊔-cong = ≡-⊔-cong
    ; ⊔-assoc = ⊔-assoc
    ; ⊔-comm = ⊔-comm
    ; ⊔-idemp = ⊔-idem
    }

isMinSemilattice : IsSemilattice ℕ _≡_ _⊓_
isMinSemilattice = record
    { ≈-equiv = record
        { ≈-refl = refl
        ; ≈-sym = sym
        ; ≈-trans = trans
        }
    ; ≈-⊔-cong = ≡-⊓-cong
    ; ⊔-assoc = ⊓-assoc
    ; ⊔-comm = ⊓-comm
    ; ⊔-idemp = ⊓-idem
    }

The definition for the lattice instance itself is pretty similar; I’ll omit it here to avoid taking up a lot of vertical space, but you can find it on lines 47 through 83 of my Lattice.Nat module.

The “Above-Below” Lattice

It’s not too hard to implement our sign lattice in Agda. However, we can do it in a somewhat general way. As it turns out, extending an existing set, such as $\{+, -, 0\}$, with a “bottom” and “top” element (to be used when taking the least upper bound and greatest lower bound) is quite common and useful. For instance, if we were to do constant propagation (simplifying 7+4 to 11), we would probably do something similar, using the set of integers $\mathbb{Z}$ instead of the plus-zero-minus set.

The general definition is as follows. Take some original set $S$ (like our 3-element set of signs), and extend it with new “top” and “bottom” elements ($\top$ and $\bot$). Then, define $(\sqcup)$ as follows:

$$ x_1 \sqcup x_2 = \begin{cases} \top & x_1 = \top\ \text{or}\ x_2 = \top \\ \top & x_1, x_2 \in S, x_1 \neq x_2 \\ x_1 & x_1, x_2 \in S, x_1 = x_2 \\ x_1 & x_2 = \bot \\ x_2 & x_1 = \bot \end{cases} $$

In other words, $\top$ overrules anything that it’s combined with. In math terms, it’s the absorbing element of the lattice. On the other hand, $\bot$ gets overruled by anything it’s combined with. In math terms, that’s an identity element. Finally, when combining two elements that aren’t $\top$ or $\bot$ (which would otherwise be covered by the prior sentences), combining an element with itself leaves it unchanged (upholding idempotence), while combining two unequal elements results in $\top$. That last part matches the way we defined “least upper bound” earlier.

The intuition is as follows: the $(\sqcup)$ operator is like an “or”. Then, “anything or positive” means “anything”; same with “anything or negative”, etc. On the other hand, “impossible or positive” means positive, since one of those cases will never happen. Finally, in the absence of additional elements, the most we can say about “positive or negative” is “any sign”; of course, “positive or positive” is the same as “positive”.

The “greatest lower bound” operator is defined by effectively swapping top and bottom.

$$ x_1 \sqcap x_2 = \begin{cases} \bot & x_1 = \bot\ \text{or}\ x_2 = \bot \\ \bot & x_1, x_2 \in S, x_1 \neq x_2 \\ x_1 & x_1, x_2 \in S, x_1 = x_2 \\ x_1 & x_2 = \top \\ x_2 & x_1 = \top \end{cases} $$

For this operator, $\bot$ is the absorbing element, and $\top$ is the identity element. The intuition here is not too different: if $(\sqcap)$ is like an “and”, then “impossible and positive” can’t happen; same with “impossible and negative”, and so on. On the other hand, “anything and positive” clearly means positive. Finally, “negative and positive” can’t happen (again, there is no number that’s both positive and negative), and “positive and positive” is just “positive”.
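The two case definitions translate almost directly into code. Below is a minimal Python sketch of them, independent of the Agda implementation discussed in this post. It assumes $\top$ and $\bot$ are encoded as the strings "⊤" and "⊥", and that the underlying set never contains those two strings; the names TOP, BOT, join and meet are chosen here for illustration.

```python
from itertools import product

TOP, BOT = "⊤", "⊥"  # assumed encodings; the underlying set must not contain them

def join(x, y):
    """Least upper bound: TOP absorbs, BOT is the identity."""
    if x == TOP or y == TOP:
        return TOP
    if x == BOT:
        return y
    if y == BOT:
        return x
    return x if x == y else TOP

def meet(x, y):
    """Greatest lower bound: BOT absorbs, TOP is the identity."""
    if x == BOT or y == BOT:
        return BOT
    if x == TOP:
        return y
    if y == TOP:
        return x
    return x if x == y else BOT

SIGNS = ["-", "0", "+", TOP, BOT]
print(join("+", "-"), meet("+", "-"))  # prints ⊤ ⊥
print(join(BOT, "+"), meet(TOP, "+"))  # prints + +
print(all(join(a, meet(a, b)) == a and meet(a, join(a, b)) == a
          for a, b in product(SIGNS, repeat=2)))  # prints True (absorption laws)
```

Nothing in join or meet is specific to signs: calling join(7, 4) gives "⊤" and join(11, 11) gives 11, which is the constant-propagation flavor mentioned earlier.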

What properties of the underlying set did we use to get this to work? The only thing we needed is to be able to check and see if two elements are equal or not; this is called decidable equality. Since that’s the only thing we used, this means that we can define an “above/below” lattice like this for any type for which we can check if two elements are equal. In Agda, I encoded this using a parameterized module:

From AboveBelow.agda, lines 5 through 8

module Lattice.AboveBelow {a} (A : Set a)
                          (_≈₁_ : A → A → Set a)
                          (≈₁-equiv : IsEquivalence A _≈₁_)
                          (≈₁-dec : IsDecidable _≈₁_) where

From there, I defined the actual data type as follows:

From AboveBelow.agda, lines 23 through 26

data AboveBelow : Set a where
    ⊥ : AboveBelow
    ⊤ : AboveBelow
    [_] : A → AboveBelow

From there, I defined the $(\sqcup)$ and $(\sqcap)$ operations almost exactly as in the mathematical equations above (the cases were re-ordered to improve Agda’s reduction behavior). Here’s the former:

From AboveBelow.agda, lines 86 through 93

    _⊔_ : AboveBelow → AboveBelow → AboveBelow
    ⊥ ⊔ x = x
    ⊤ ⊔ x = ⊤
    [ x ] ⊔ [ y ] with ≈₁-dec x y
    ...   | yes _ = [ x ]
    ...   | no  _ = ⊤
    x ⊔ ⊥ = x
    x ⊔ ⊤ = ⊤

And here’s the latter:

From AboveBelow.agda, lines 181 through 188

    _⊓_ : AboveBelow → AboveBelow → AboveBelow
    ⊥ ⊓ x = ⊥
    ⊤ ⊓ x = x
    [ x ] ⊓ [ y ] with ≈₁-dec x y
    ...   | yes _ = [ x ]
    ...   | no  _ = ⊥
    x ⊓ ⊥ = ⊥
    x ⊓ ⊤ = x

The proofs of the lattice properties are straightforward and proceed by simple case analysis. Unfortunately, Agda doesn’t quite seem to evaluate the binary operator in every context that I would expect it to, which has led me to define some helper lemmas such as the following:

From AboveBelow.agda, lines 95 through 96

    ⊤⊔x≡⊤ : ∀ (x : AboveBelow) → ⊤ ⊔ x ≡ ⊤
    ⊤⊔x≡⊤ _ = refl

As a sample, here’s a proof of commutativity of $(\sqcup)$:

From AboveBelow.agda, lines 158 through 165

    ⊔-comm : ∀ (ab₁ ab₂ : AboveBelow) → (ab₁ ⊔ ab₂) ≈ (ab₂ ⊔ ab₁)
    ⊔-comm ⊤ x rewrite x⊔⊤≡⊤ x = ≈-refl
    ⊔-comm ⊥ x rewrite x⊔⊥≡x x = ≈-refl
    ⊔-comm x ⊤ rewrite x⊔⊤≡⊤ x = ≈-refl
    ⊔-comm x ⊥ rewrite x⊔⊥≡x x = ≈-refl
    ⊔-comm [ x₁ ] [ x₂ ] with ≈₁-dec x₁ x₂
    ... | yes x₁≈x₂ rewrite x≈y⇒[x]⊔[y]≡[x] (≈₁-sym x₁≈x₂) = ≈-lift x₁≈x₂
    ... | no  x₁̷≈x₂ rewrite x̷≈y⇒[x]⊔[y]≡⊤ (x₁̷≈x₂ ∘ ≈₁-sym) = ≈-⊤-⊤

The details of the rest of the proofs can be found in the AboveBelow.agda file.

To recover the sign lattice we’ve been talking about all along, it’s sufficient to define a sign data type:

From Sign.agda, lines 19 through 22

data Sign : Set where
    + : Sign
    - : Sign
    0ˢ : Sign

Then, prove decidable equality on it (effectively defining a comparison function), and instantiate the AboveBelow module:

From Sign.agda, lines 34 through 47

-- g for siGn; s is used for strings and i is not very descriptive.
_≟ᵍ_ : IsDecidable (_≡_ {_} {Sign})
_≟ᵍ_ + + = yes refl
_≟ᵍ_ + - = no (λ ())
_≟ᵍ_ + 0ˢ = no (λ ())
_≟ᵍ_ - + = no (λ ())
_≟ᵍ_ - - = yes refl
_≟ᵍ_ - 0ˢ = no (λ ())
_≟ᵍ_ 0ˢ + = no (λ ())
_≟ᵍ_ 0ˢ - = no (λ ())
_≟ᵍ_ 0ˢ 0ˢ = yes refl

-- embellish 'sign' with a top and bottom element.
open import Lattice.AboveBelow Sign _≡_ (record { ≈-refl = refl; ≈-sym = sym; ≈-trans = trans }) _≟ᵍ_ as AB

From Simple Lattices to Complex Ones

Natural numbers and signs alone are cool enough, but they will not be sufficient to write program analyzers. That’s because when we’re writing an analyzer, we don’t just care about one variable: we care about all of them! An initial guess might be to say that when analyzing a program, we really need several signs: one for each variable. This might be reminiscent of a map. So, when we compare specificity, we’ll really be comparing the specificity of maps. Even that, though, is not enough. The reason is that variables might have different signs at different points in the program! A single map would not be able to capture that sort of nuance, so what we really need is a map associating states with another map, which in turn associates variables with their signs.

Mathematically, we might write this as:

$$ \text{Info} \triangleq \text{ProgramStates} \to (\text{Variables} \to \text{Sign}) $$
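To get a feel for the shape of this type, here is a small Python sketch using nested dictionaries. The encodings are assumptions made for illustration only: program states are numbered with integers, variables are named by strings, and signs are written as strings.

```python
from typing import Dict

Sign = str                  # "+", "-", "0", "⊤", or "⊥" (assumed encoding)
VarSigns = Dict[str, Sign]  # Variables -> Sign
Info = Dict[int, VarSigns]  # ProgramStates -> (Variables -> Sign)

# A hypothetical analysis result for a two-state program.
info: Info = {
    0: {"x": "+", "y": "⊤"},  # at state 0, nothing is known about y yet
    1: {"x": "+", "y": "-"},  # at state 1, y is known to be negative
}

def sign_at(info: Info, state: int, var: str) -> Sign:
    """Look up what is known about a variable at a given program state."""
    return info[state][var]

print(sign_at(info, 0, "y"), sign_at(info, 1, "y"))  # prints ⊤ -
```

The same variable has different signs at different states, which is exactly the nuance a single map could not capture.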

That’s a big step up in complexity. We now have a doubly-nested map structure instead of just a sign, and we need to compare such maps in order to gauge their specificity and advance our analyses. But where do we even start with maps, and how do we define the $(\sqcup)$ and $(\sqcap)$ operations?

The solution turns out to be to define ways in which simpler lattices (like our sign) can be combined and transformed to define more complex lattices. We’ll move on to that in the next post of this series.

Implementing and Verifying "Static Program Analysis" in Agda, Part 0: Intro

Sat, 06 Jul 2024 17:37:42 -0700

Some years ago, when the Programming Languages research group at Oregon State University was discussing what to read, the Static Program Analysis lecture notes came up. The group didn’t end up reading the lecture notes, but I did. As I was going through them, I noticed that they were quite rigorous: the first several chapters cover a little bit of lattice theory, and the subsequent analyses – and the descriptions thereof – are quite precise. When I went to implement the algorithms in the textbook, I realized that just writing them down would not be enough. After all, the textbook also proves several properties of the lattice-based analyses, which would be lost in translation if I were to just write C++ or Haskell.

At the same time, I noticed that lots of recent papers in programming language theory were formalizing their results in Agda. Having played with dependent types before, I was excited to try it out. Thus began my journey to formalize (the first few chapters of) Static Program Analysis in Agda.

In all, I built a framework for static analyses, based on a tool called monotone functions. This framework can be used to implement and reason about many different analyses (currently only a certain class called forward analyses, but that’s not a hard limitation). Recently, I’ve proven the correctness of the algorithms my framework produces. Having reached this milestone, I’d like to pause and talk about what I’ve done.

In subsequent posts in this series, I will describe what I have so far. It’s not perfect, and some work is yet to be done; however, getting to this point was no joke, and I think it’s worth discussing. In all, I’d like to cover the following major topics, spending a couple of posts on each:

Lattices: the analyses I’m reasoning about use an algebraic structure called a lattice. This structure has certain properties that make it amenable to describing degrees of “knowledge” about a program. In lattice-based static program analysis, the various elements of the lattice represent different facts or properties that we know about the program in question; operations on the lattice help us combine these facts and reason about them. I write about this in Part 1: Lattices.

Interestingly, lattices can be made by combining other lattices in certain ways. We can therefore use simpler lattices as building blocks to create more complex ones, all while preserving the algebraic structure that we need for program analysis. I write about this in Part 2: Combining Lattices.
The Fixed-Point Algorithm: to analyze a program, we use information that we already know to compute additional information. For instance, we might use the fact that 1 is positive to compute the fact that 1+1 is positive as well. Using that information, we can determine the sign of (1+1)+1, and so on. In practice, this is often done by calling some kind of “analyze” function over and over, each time getting closer to an accurate characterization of the program’s behavior. When the output of “analyze” stops changing, we know we’ve found as much as we can find, and stop.

What does it mean for the output to stop changing? Roughly, that’s when the following equation holds: knownInfo = analyze(knownInfo). In mathematics, this is known as a fixed point. To enable computing fixed points, we focus on a specific kind of lattices: those with a finite height. I talk about what this means in Part 3: Lattices of Finite Height.

Even if we restrict our attention to lattices of finite height, not all functions have fixed points; however, certain types of functions on lattices always do. The fixed-point algorithm is a way to compute these points, and we will use this to drive our analyses. I talk about this in Part 4: The Fixed-Point Algorithm.
Correctness: putting together the work on lattices and the fixed-point algorithm, we can implement a static program analyzer in Agda. However, it’s not hard to write an “analyze” function that has a fixed point but produces an incorrect result. Thus, the next step is to prove that the results of our analyzer accurately describe the program in question.

The interesting aspect of this step is that our program analyzer works on control-flow graphs (CFGs), which are a relatively compiler-centric representation of programs. On the other hand, what the language actually does is defined by its semantics, which is not at all compiler-centric. We need to connect these two, showing that the CFGs we produce “make sense” for our language, and that given CFGs that make sense, our analysis produces results that match the language’s execution. To do so, I write about the language and its semantics in Part 5: Our Programming Language, then about building control flow graphs for the language in Part 6: Control Flow Graphs. I then write about combining these two representations in Part 7: Connecting Semantics and Control Flow Graphs.
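The "call analyze over and over until the output stops changing" idea from the fixed-point item above can be sketched in a few lines of Python. This is a toy illustration, not the verified Agda algorithm: the function names, the dictionary encoding of facts, and the max_steps safety cap are all assumptions made here (the real guarantee of termination comes from the lattice structure, not from a step limit).

```python
def fixed_point(analyze, start, max_steps=1000):
    """Apply analyze repeatedly until knownInfo == analyze(knownInfo)."""
    known = start
    for _ in range(max_steps):
        updated = analyze(known)
        if updated == known:
            return known
        known = updated
    raise RuntimeError("no fixed point reached within max_steps")

def analyze(known):
    """Toy step over three expressions: e0 = 1, e1 = e0 + 1, e2 = e1 + 1.

    1 is positive, and adding 1 to a positive number gives a positive number.
    """
    updated = dict(known)
    updated["e0"] = "+"
    for k in range(2):
        if known.get(f"e{k}") == "+":
            updated[f"e{k + 1}"] = "+"
    return updated

result = fixed_point(analyze, {})
print(result)  # prints {'e0': '+', 'e1': '+', 'e2': '+'}
```

Each call to analyze learns at most one new fact from the previous round, so the loop needs a few rounds before another call changes nothing.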

Here are the posts that I’ve written so far for this series:

Microfeatures I Love in Blogs and Personal Websites

Sun, 23 Jun 2024 11:03:10 -0700

Some time ago, Hillel Wayne published an article titled Microfeatures I’d like to see in more languages. In this article, he described three kinds of features in programming languages: fundamental features, deeply engrained features, and nice-to-have convenience features. Hillel’s premise was that language designers tend to focus on the first two; however, because the convenience features are relatively low-overhead, it’s easier for them to jump between projects, and they provide a quality-of-life increase.

I’ve been running a blog for a while — some of the oldest posts I’ve found (which are no longer reflected on this site due to their low quality) were from 2015. In this time, I’ve been on the lookout for ways to improve the site, and I’ve seen quite a few little things that are nice to use, but relatively easy to implement. They don’t really make or break a website; the absence of such features might be noticed, but will not cause any disruption for the reader. On the other hand, their presence serves as a QoL enhancement. I find these to be analogous to Hillel’s notion of “microfeatures”. If you’re interested in adding something to your site, consider browsing this menu to see if anything resonates!

One last thing is that this post is not necessarily about microfeatures I’d like every blog or personal website to have. Some ideas I present here are only well-suited to certain types of content and certain written voices. They need not be applied indiscriminately.

With that, let’s get started!

Sidenotes

Gwern is, in my view, the king of sidenotes. Gwern’s writing makes very heavy use of them (at least based on the articles that I’ve read). This is where I originally got inspiration for my own implementation in Hugo. Check out the page on hydrocephalus for an example; here’s what a piece of that page looks like on my end at the time of writing:

A screenshot of Gwern’s page on hydrocephalus

Sidenotes are nice because they allow for diversions without interrupting the main article’s flow. You can provide additional details for the curious reader, or (as Gwern does) use the sidenotes for citing studies or sources. In either case, the reading experience is significantly more pleasant than footnotes, for which you typically have to go to the bottom of the page, and then return to the top.

Another reason I called Gwern the “king of sidenotes” is this page on sidenotes. There, Gwern documents numerous approaches to this feature, mostly inspired by Tufte CSS. The page is very thorough — it even includes a link to my own work, as unknown as it may be! I would recommend checking it out if you are interested in enhancing your site with sidenotes.

Tables of Contents

Not all personal sites include tables of contents (TOCs), but they are nice. They serve two purposes:

Seeing at a glance what the post will be about, in the form of headings.
Being able to navigate to an interesting part of the page without having to scroll.

Static site generators (I myself use Hugo) are typically able to generate TOCs automatically, since they are already generating the HTML and know what headings they are inserting into the page. For instance, Hugo has TableOfContents. I suspect the same is true for other existing website technologies.
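To illustrate why generating a TOC is cheap once the headings are known, here is a rough Python sketch that collects headings from an HTML string and prints an indented outline. It is not how Hugo implements TableOfContents; the class and function names are made up, and only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collects (level, id, text) for every h1..h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # (level, id, text parts) while inside a heading

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._current = (int(tag[1]), dict(attrs).get("id"), [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None

def table_of_contents(html):
    """Render the collected headings as an indented plain-text outline."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    lines = []
    for level, anchor, text in collector.headings:
        indent = "  " * (level - 1)
        suffix = f" (#{anchor})" if anchor else ""
        lines.append(f"{indent}- {text}{suffix}")
    return "\n".join(lines)

page = '<h1 id="intro">Intro</h1><p>text</p><h2 id="sidenotes">Sidenotes</h2>'
print(table_of_contents(page))
```

For the sample page, this prints "- Intro (#intro)" followed by an indented "- Sidenotes (#sidenotes)".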

Despite this, I actually had to look relatively long to find sites I frequent that have TOCs to show off as examples here. The first one I came across — after Gwern’s, whose site will be mentioned plenty in this post anyway — is Faster than Lime. Take this post on Rust’s Futures; this is what the top of it looks like at the time of writing:

A screenshot of the table of contents on Faster than Lime

The quality and value of TOCs certainly depend on the sections within the page itself (and on whether or not the page has sections at all!), but in my opinion, the benefits to navigation become apparent even for relatively simple pages.

As an honorable mention, I’d like to show Lars Hupel’s site. The pages on the site don’t — as far as I can tell — have internal tables of contents. However, pages that are part of a series — such as the introduction to CRDTs — have tables of contents that span the entire series.

A screenshot of the table of contents on Lars Hupel’s site

I also find this very nice, though it does miss out on headings within a page.

Bonus: Showing Page Progress

I’ve mentioned that tables of contents can communicate the structure of the page. However, they do so from the outset, before you’ve started reading. In their “base form”, the reader stops benefiting from tables of contents once they’ve started reading. [note: That is, of course, unless they jump back to the top of the post and find the table of contents again. ]

If you want to show progress while the reader is somewhere in the middle of a page, you could use a page progress bar. I’ve noticed one while reading Quanta Magazine; it looks like this (recording my scrolling through the most recent article at the time of writing).

The progress bar on a Quanta Magazine article

One immediate thought is that this is completely superseded by the regular browser scroll bar that’s ever-present at the side of the page. However, the scroll bar could be deceiving. If your page has a comments section, the comments could make the page look dauntingly long. Similarly, references to other pages and general “footer material” count towards the scroll bar, but would not count towards the progress bar.
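The difference between the two indicators is just arithmetic over which part of the page counts. Here is a small Python sketch of that arithmetic; all of the parameter names and pixel values are hypothetical, and a real page would compute this in the browser.

```python
def article_progress(scroll_top, viewport_height, article_top, article_height):
    """Fraction of the article (not the whole page) above the viewport's bottom edge."""
    seen = scroll_top + viewport_height - article_top
    return max(0.0, min(1.0, seen / article_height))

def scrollbar_fraction(scroll_top, viewport_height, page_height):
    """What the browser scroll bar shows: position within the entire page."""
    return scroll_top / (page_height - viewport_height)

# A 10000px page whose article occupies only pixels 200 through 4200;
# the rest is comments and footer material.
print(article_progress(1400, 800, 200, 4000))            # prints 0.5
print(round(scrollbar_fraction(1400, 800, 10000), 2))    # prints 0.15
```

In this made-up layout, the reader is halfway through the article while the scroll bar suggests they have barely started.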

Combining the two, you could imagine an always-visible table of contents that highlights the current section you’re in. With such a feature, you can always see where you are (including a rough estimate of how far into the page you’ve scrolled), and at the same time see how the current section integrates into the broader structure. I’ve seen this done before, but could not find a site off the top of my head that implements the feature; as a fallback, here’s the CSS tricks tutorial that shows how to implement a dynamic table of contents, and a recording of me scrolling through it:

The table of contents from a CSS Tricks demo

Easily Linkable Headings

How can you link a particular section of a page to your friend? There’s a well-defined mechanism to do this in HTML: you can use the ID of a particular HTML element, and add it as #some-id to the end of a link to the page. The link then takes the user to that particular HTML element. I can do this, for instance, to link to the sidenotes section above.

How does one discover the ID of the part of the page that they want to link to? The ID is not a “visual” property; it’s not displayed to the user, and is rather a detail of HTML itself. Thus, on any given page, even if every element has a unique, linkable ID, I can’t make use of it without going into Inspect Element and trying to find the ID in the HTML tree.

The simple solution is to make the elements that you want to be easily “linkable” into links to themselves! Then, the user can right-click the element in question (probably the heading) and click Copy Link. Much easier! To demonstrate a similar idea, here is a link to this paragraph itself. You can now use the context menu to Copy Link, put it in your browser, and voilà — you’re right back here!
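As a sketch of what "a heading that links to itself" might look like in generated HTML, here is a short Python example. The slugify rule (lowercase, with runs of non-alphanumeric characters turned into hyphens) is an assumption made for illustration; real site generators each have their own ID scheme.

```python
import re
from html import escape

def slugify(title):
    """Turn a heading into an ID: lowercase, non-alphanumeric runs become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def linkable_heading(title, level=2):
    """Render a heading whose text is a link to the heading's own #id."""
    anchor = slugify(title)
    return (f'<h{level} id="{anchor}">'
            f'<a href="#{anchor}">{escape(title)}</a>'
            f'</h{level}>')

print(slugify("Bonus: Showing Page Progress"))  # prints bonus-showing-page-progress
print(linkable_heading("Easily Linkable Headings"))
```

Right-clicking the rendered heading and choosing Copy Link then yields the page URL with #easily-linkable-headings appended.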

As with tables of contents, many website technologies provide most of the tooling to add support for this feature. Relatively often I come across pages that have unique IDs for each header, but no clickable links! I end up having to use Inspect Element to find the anchor points.

A variation on this idea — if you don’t want to make the entire heading or title a link — is to include alongside it (before or after) a clickable element that is a link to that title. You can click that element to retrieve link information, instead (and the icon additionally tells you that this is possible). Hugo’s documentation does this: here’s a screenshot of an arbitrary page.

A title and paragraph from the Hugo documentation

Grouping Series of Posts

Some authors like to write at length on a particular topic; to get the content out to readers faster (and to make the resulting pages less daunting), it makes sense to break a single topic up into a series. The easiest way to do this is to just… publish several articles, possibly with related names, and link them to each other. Done!

With a little more effort, though, the series-reading and series-writing experience could be nicer. Instead of manually inserting links, you could configure your website to automatically add a “next” and “previous” button to pages in a given series. You could also give an overview of a particular series and create a “navigation hub” for it.
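Computing the "next" and "previous" buttons is mostly a matter of knowing the order of posts in a series. A minimal Python sketch, assuming a series is just an ordered list of post titles (the function name and the list encoding are made up here):

```python
def series_navigation(posts, current):
    """Return the (previous, next) posts around current, or None at either end."""
    index = posts.index(current)
    previous = posts[index - 1] if index > 0 else None
    following = posts[index + 1] if index + 1 < len(posts) else None
    return previous, following

series = [
    "Part 1: Lattices",
    "Part 2: Combining Lattices",
    "Part 3: Lattices of Finite Height",
]
print(series_navigation(series, "Part 2: Combining Lattices"))
print(series_navigation(series, "Part 1: Lattices"))
```

The first call returns both neighbors; the second returns None for the previous post, which a template could use to hide the "previous" button.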

As an example, the Chapel language blog has navigation buttons. Here’s a screenshot from a post in the Advent of Code series:

Series navigation buttons on a Chapel blog post

I’ve mentioned this in the section on tables of contents, but Lars Hupel’s site has tables of contents that link between series. I’m not sure if it’s automatically generated or hand-written, but it’s definitely nice.

A screenshot of the table of contents on Lars Hupel’s site

Dialogues

I first came across dialogues on Xe Iaso’s site, but I think I see them used most often in posts on Faster than Lime. As an example, here’s a little dialogue on a post about Rust’s futures. At the time of writing, it looks like this:

A dialogue with “cool bear” on Faster than Lime

Using dialogues — even for technical writing — is not a particularly novel idea. I know I’ve seen it in a textbook before; probably this part of Operating Systems: Three Easy Pieces. It can help ask questions from a less-experienced point of view, and therefore possibly voice concerns that a reader might themselves be having. And of course — as with “cool bear” and Xe Iaso’s many characters — it can change the tone and make the page a bit more fun.

Code Blocks with Origin

This one was recommended to me by a reader, and so I’ll be talking about my page specifically!

When I was writing about making a compiler, a reader emailed me and pointed out that they were getting lost in the various code blocks. My page displayed the code that I was writing about, but the project had grown beyond a single file. As a result, I’d be making changes midway through one file at one moment, and another file the next. This prompted me to add decorators to my code blocks that look something like this:

From patterns.rb, lines 3 through 8

def sum_digits(n)
  while n > 9
    n = n.to_s.chars.map(&:to_i).sum
  end
  n
end

The decorator says what file the code is from, as well as what lines are being presented. If you click the file name, the decorator links to my Gitea instance, allowing you to read the code in context.
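The core of such a decorator is small: slice the requested lines out of the file and put a caption on top. Here is a Python sketch of that idea; it is not the actual Hugo setup behind this site, the function name is hypothetical, and the link to the hosted repository is left out.

```python
def code_with_origin(path, source, first, last):
    """Return lines first..last (1-indexed, inclusive) of source under an origin caption."""
    lines = source.splitlines()[first - 1:last]
    header = f"From {path}, lines {first} through {last}"
    return header + "\n\n" + "\n".join(lines)

source = "# helpers\n\ndef sum_digits(n)\n  n\nend\n"
print(code_with_origin("patterns.rb", source, 3, 5))
```

This prints the caption "From patterns.rb, lines 3 through 5", a blank line, and then the three selected lines.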

Though it’s not quite the same (in particular, it’s unfortunately missing links), the Crafting Interpreters online book does something similar. It describes changes to the code in words next to the changed code itself, like “added after MyStruct”. Here’s a screenshot of the page on local variables at the time of writing.

Location notes on code in Crafting Interpreters

I think it looks quite elegant, and in some ways — specifically in the verbal descriptions of what each change does — might be superior to my approach.

It’s not quite the same thing, but GitHub Gists can help approximate this feature. A Gist could contain multiple files, and each file can be individually embedded into your page. Hugo in particular has built-in support for Gists (and I’ve snagged that link using the docs’ easily linkable headings); I suspect that other website engines have some form of support as well. At the time of writing, an embedded Gist looks something like this:

Code embedded in Hugo documentation using a GitHub Gist

Clicking list.html takes you to the source code of the file.

Bonus: Code Blocks with Clickable Links

If we’re going for fancy code blocks, another fancy feature is provided by the Agda programming language. Agda can generate HTML code blocks in which every symbol (like a variable, record name, function name) is linked to where it is defined. So if you’re reading the code, and wonder “what the heck is x?”, you can just click it to see how it’s defined.

It’s not simple to integrate Agda’s plain HTML output into an existing webpage, but some projects do that. I took a stab at it in my post about integrating it with Hugo. I wager this would be even harder for other languages. However, it leads to nice results; my go-to is Programming Languages Foundations in Agda. The online book introduces various concepts from Programming Language Theory, and each code block that it shows is fully linked. This makes it possible to jump around the page like so:

Navigating code blocks on a page from PLFA

Markers for External Links

Some sites I’ve seen mark links that go to a different domain with a little icon. If you’ve read this far, you’ve likely noticed that my site does the same. Another good example of this, even though the CSS is a little rough at the time of writing, is James’ Coffee Blog ☕. I’ve taken the (small) liberty to adjust the color of the icon, which I suspect is buggy in my browser.

An external link on James’ blog

Some websites (~~this one included~~) also make such links open in a new tab automatically. That way, you tend to not lose the original article by clicking through one of its references.

Bonus: Different Markers for Different Destinations

Gwern’s website takes this idea further, by changing the icon for external links depending on the destination. For instance, links to Wikipedia articles are stylized with a little “W”, links to Haskell.org are stylized using a lambda ($\lambda$), and links to .zip files have a little archive icon. There are more; ~~I’ve found the link processing code on GitHub, and even the list of websites that get their own icons.~~ I could not find a verbal description, though.

Edit: Gwern has pointed out that the links I provided go to obsolete code. The link processing functionality is documented in comments here and the link icon rules are here. A non-code list of icons exists too.

Now for some pictures. Here are a ton of links from the “About” page!

Links to Wikipedia on Gwern’s site

A link to Haskell.org on Gwern’s site

Links to zip files on Gwern’s site

Bonus: Link Preview

Gwern’s website has no shortage of cool ideas. Among them is showing link previews on hover. When hovering over a link, the site displays a popup window that contains a view into that page. I suspect that this view is also archived somehow, so that it retains a view into the page that matches it at the time of writing.

To be perfectly honest, I found this feature a little jarring at first. As I would try to click links, I would get surprised by an additional overlay. However, as I spent more time browsing the site, I grew quite accustomed to the previews. I would hover over a link to see the first paragraph and thus get a short synopsis. This worked really well in tandem with per-destination marker icons; I could tell at a glance whether a link was worth hovering over.

Here’s what it looks like:

Hovering over a link on Gwern’s site

RSS Feeds

RSS is a feed standard that allows sites to publish updates. Blogs in particular can make use of RSS to notify readers of updates. RSS feeds are processed by a feed reader, which is a program that polls a website’s index.xml file (or other similar files) and reads it to detect new content. If you opt in to full-text RSS feeds, users can read the entire post from within their reader.

RSS makes it easier to keep up with your site. Rather than having to check in on every author whose content I enjoy on the internet, I can add their feed URL to my list, and have my feed reader automatically aggregate all updates for me to read. It’s kind of like a social media or news feed, except that I control what’s shown to me, and authors of the blogs I follow don’t need to create accounts and explicitly share their work on social media!

I don’t have any particular website to show off in this section; instead I’ll show you a list of websites that I’m following in my feed reader of choice. You might notice that a lot of these websites are listed here as inspiration for other microfeatures.

A screenshot of my Feedbin list

Links to Other Sites

This feature I first noticed on Drew DeVault’s blog. Every page on Drew’s blog, at the bottom, has a section titled “Articles from blogs I read”. For instance, on a sample post, at the time of writing, I’m seeing the following footer:

Links to other blogs from Drew DeVault’s blog

As indicated in the image, Drew’s site in particular uses a program called openring, which is based on RSS feeds (another microfeature I love). However, how the site finds such articles (statically like openring, or on page load using some JavaScript) isn’t hugely important to me. What’s important is that you’re promoting other content creators whose work you enjoy, which is the ethos of my favorite slice of the internet.

Conclusion + Anything Else?

Those are all the microfeatures that I could think of in a single sitting. I hope that you have been inspired to integrate features like these into your own site, or at the very least that you think doing so would be a good idea.

This list isn’t exhaustive. I’ve probably missed some good microfeatures! If you can think of such a feature, let me know; my email address is linked in the footer of this article.

Thank you for reading, and cheers!

Integrating Agda's HTML Output with Hugo

Thu, 30 May 2024 00:29:26 -0700

One of my favorite things about Agda is its clickable HTML pages. If you don’t know what they are, they’re pages like Data.List.Properties; they just give the code from a particular Agda file, but make every identifier clickable. Then, if you see some variable or function that you don’t know, you can just click it and jump right to it! It makes exploring the documentation a lot smoother. I’ve found that these HTML pages provide all the information I need for writing proofs.

Recently, I’ve been writing a fair bit about Agda; mostly about the patterns that I’ve learned about, such as the “is something” pattern and the “deeply embedded expression” trick. I’ve found myself wanting to click on definitions in my own code blocks; recently, I got this working, and I wanted to share how I did it, in case someone else wants to integrate Agda into their own static website. Though my stack is based on Hugo, the general idea should work with any other static site generator.

TL;DR and Demo

I wrote a script to transfer links from an Agda HTML file into Hugo’s HTML output, making it possible to embellish “plain” Hugo output with Agda’s ‘go-to-definition links’. It looks like this. Here’s an Agda code block defining an ‘expression’ data type, from a project of mine:

From Map.agda, lines 543 through 546

data Expr : Set (a ⊔ℓ b) where
    `_ : Map → Expr
    _∪_ : Expr → Expr → Expr
    _∩_ : Expr → Expr → Expr

And here’s the denotational semantics for that expression:

From Map.agda, lines 586 through 589

⟦_⟧ : Expr -> Map
⟦ ` m ⟧ = m
⟦ e₁ ∪ e₂ ⟧ = ⟦ e₁ ⟧ ⊔ ⟦ e₂ ⟧
⟦ e₁ ∩ e₂ ⟧ = ⟦ e₁ ⟧ ⊓ ⟦ e₂ ⟧

Notice that you can click Expr, _∪_, ⟦, etc.! All of this integrates with my existing Hugo site, and only required a little bit of additional metadata to make it work. The conversion is implemented as a Ruby script; this script transfers the link structure from an Agda-generated documentation HTML file onto lightly-annotated Hugo code blocks.

To use the script, your Hugo theme (or your Markdown content) must annotate the code blocks with several properties:

data-agda-block, which marks code that needs to be processed.
data-file-path, which tells the script what Agda file provided the code in the block, and therefore what Agda HTML file should be searched for links.
data-first-line and data-last-line, which tell the script what section of the Agda HTML file should be searched for said links.

Given this – and a couple of other assumptions, such as that all Agda projects are in a code/<project> folder – the script post-processes the HTML files automatically. Right now, the solution is pretty tailored to my site and workflow, but the core of the script – the piece that transfers links from an Agda HTML file into a syntax-highlighted Hugo HTML block – should be fairly reusable.

Now, the details.

The Constraints

The goal was simple: to allow the code blocks on my Hugo-generated site to have links that take the user to the definition of a given symbol. Specifically, if the symbol occurs somewhere on the same blog page, the link should take the user there (and not to a regular Module.html file). That way, the reader can not only get to the code that they want to see, but also have a chance to read the surrounding prose in properly-rendered Markdown.

Next, unlike standard “literate Agda” files, my blog posts are not single .agda files with Markdown in comments. Rather, I use regular Hugo Markdown, and present portions of an existing project, weaving together many files, and showing the fragments out of order. So, my tool needs to support links that come from distinct modules, in any order.

Additionally, I’ve recently been writing a whole series about an Agda project of mine; in this series, I gradually build up to the final product, explaining one or two modules at a time. I would expect that links on pages in this series could jump to other pages in the same series: if I cover module A in part 1, then write A.f in part 2, clicking on A – and maybe f – should take the reader back to the first part’s page; once again, this would help provide them with the surrounding explanation.

Finally, I wanted the Agda code to appear exactly the same as any other code on my site, including the Hugo-provided syntax highlighting and theme. This ruled out just copy-pasting pieces of the Agda-generated HTML in place of code blocks on my page (and redirecting the links). Though it was not a hard requirement, I also hoped to include Agda code in the same manner that I include all other code: my codelines shortcode. In brief, the codelines shortcode creates a syntax-highlighted code block, as well as a surrounding “context” that says what file the code is from, which lines are listed, and where to find the full code (e.g., on my Git server). It looks something like this:

From Base.agda, lines 12 through 20

data Expr : Set where
    _+_ : Expr → Expr → Expr
    _-_ : Expr → Expr → Expr
    `_ : String → Expr
    #_ : ℕ → Expr

data BasicStmt : Set where
    _←_ : String → Expr → BasicStmt
    noop : BasicStmt

In summary:

I want to create cross-links between symbols in Agda blocks in a blog post.
These code blocks could include code from disjoint files, and be out of order.
Code blocks among a whole series of posts should be cross-linked too.
The code blocks should be syntax highlighted the same way as the rest of the code on the site.
Ideally, I should be able to use my regular method for referencing code.

I’ve hit all of these requirements; now it’s time to dig into how I got there.

Implementation

Processing Agda’s HTML Output

It’s pretty much a no-go to try to resolve Agda from Hugo, or perform some sort of “heuristic” to detect cross-links. Agda is a very complex programming language, and Hugo’s templating engine, though powerful, is just not up to this task. Fortunately, Agda has support for HTML output using the --html flag. As a build step, I can invoke Agda on files that are referenced by my blog, and generate HTML. This would decidedly slow down the site build process, but it would guarantee accurate link information.

On the other hand, to satisfy the 4th constraint, I need to somehow mimic – or keep – the format of Hugo’s existing HTML output. The easiest way to do this without worrying about breaking changes and version incompatibility is to actually use the existing syntax-highlighted HTML, and annotate it with links as I discover them. Effectively, what I need to do is a “link transfer”: I need to identify regions of code that are highlighted in Agda’s HTML, find those regions in Hugo’s HTML output, and mark them with links. In addition, I’ll need to fix up the links themselves: the HTML output assumes that each Agda file is its own HTML page, but this is ruled out by the second constraint of mine.

As a little visualization, the overall problem looks something like this:

-- Agda's HTML output (blocks of 't' are links):
-- |tttttt| |tttt|  |t|  |t| |ttttt|
    module   ModX  ( x  : T ) where
-- |tttttt| |tt|t|  |t|  |t| |ttttt|
-- Hugo's HTML output (blocks of 't' are syntax highlighting spans)

Both Agda and Hugo output a preformatted code block, decorated with various inline HTML that indicates information (token color for Hugo; symbol IDs and links in Agda). However, Agda and Hugo do not use the same process to create this decorated output; it’s entirely possible – and not uncommon – for Hugo and Agda to produce misaligned HTML nodes. In my diagram above, this is reflected as ModX being considered a single token by Agda, but two tokens (Mod and X) by the syntax highlighter. As a result, it’s difficult to naively iterate the two HTML formats in parallel.

What I ended up doing is translating Agda’s HTML output into offsets and data about the code block’s plain text – the source code being decorated. Both the Agda and Hugo HTML describe the same code; thus, the plain text is the common denominator between the two.
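To make the “common denominator” idea concrete, here is a minimal sketch in Ruby. It assumes each tool’s output has already been flattened into a list of text tokens; the token_ranges helper and both token lists are made up for illustration, whereas the real script works on HTML nodes.

```ruby
# Hypothetical helper: turn a list of text tokens into [from, to) offsets
# into the plain text that the tokens spell out when concatenated.
def token_ranges(tokens)
  offset = 0
  tokens.map do |token|
    range = { :text => token, :from => offset, :to => offset + token.length }
    offset += token.length
    range
  end
end

# The same source line, tokenized two different ways (made-up tokens).
agda_tokens = ["module", " ", "ModX", " ", "where"]
hugo_tokens = ["module", " ", "Mod", "X", " ", "where"]

# The node boundaries differ, but the underlying plain text does not.
puts agda_tokens.join == hugo_tokens.join

# Offsets into the plain text let us relate the two tokenizations.
agda_modx = token_ranges(agda_tokens).find { |r| r[:text] == "ModX" }
hugo_pieces = token_ranges(hugo_tokens).select do |r|
  r[:from] >= agda_modx[:from] && r[:to] <= agda_modx[:to]
end
p agda_modx
p hugo_pieces.map { |r| r[:text] }
```

Since both token lists spell out the same string, an offset computed from one of them is meaningful in the other, even when the token boundaries disagree.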

I wrote a Ruby script to extract the decorations from the Agda output; here it is in slightly abridged form. You can find the original agda.rb file here.

require "nokogiri"

# Traverse the preformatted Agda block in the given Agda HTML file
# and find which textual ranges have IDs and links to other ranges.
# Store this information indexed by line number, line => links[]
def process_agda_html_file(file)
  document = Nokogiri::HTML.parse(File.open(file))
  pre_code = document.css("pre.Agda")[0]

  # The traversal is postorder; we always visit children before their
  # parents, and we visit leaves in sequence.
  line_infos = []
  offset = 0 # Column index within the current Agda source code line
  line = 1
  pre_code.traverse do |at|
    # Text nodes are always leaves; visiting a new leaf means we've advanced
    # in the text by the length of that text. However, if there are newlines
    # -- since this is a preformatted block -- we also advanced by a line.
    # At this time, do not support links that span multiple lines, but
    # Agda doesn't produce those either.
    if at.text?
      if at.content.include? "\n"
        raise "no support for links with newlines inside" if at.parent.name != "pre"

        # Increase the line and track the final offset. Written as a loop
        # in case we eventually want to add some handling for the pieces
        # sandwiched between newlines.
        at.content.split("\n", -1).each_with_index do |bit, idx|
          line += 1 unless idx == 0
          offset = bit.length
        end
      else
        # It's not a newline node. Just adjust the offset within the plain text.
        offset += at.content.length
      end
    elsif at.name == "a"
      # Agda emits both links and things-to-link-to as 'a' nodes.

      line_info = line_infos.fetch(line) { line_infos[line] = [] }
      href = at.attribute("href")
      id = at.attribute("id")
      if href or id
        new_node = { :from => offset-at.content.length, :to => offset }
        new_node[:href] = href if href
        new_node[:id] = id if id

        line_info << new_node
      end
    end
  end
  return line_infos
end

This script takes an Agda HTML file and returns a map in which each line of the Agda source code is associated with a list of ranges; the ranges indicate links or places that can be linked to. For example, for the ModX example above, the script might produce:

3 => [
  { :from => 3, :to => 9, :id => "..." },       # Agda creates <a> nodes even for keywords.
  { :from => 12, :to => 16, :id => "ModX-id" }, # The IDs Agda generates aren't usually this nice.
  { :from => 20, :to => 21, :id => "x-id" },
]
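The line and offset bookkeeping from the traversal can be tried in isolation. The sketch below is a simplified stand-in: the advance helper is hypothetical and works on plain strings rather than Nokogiri nodes. It shows why split is given a limit of -1. Without it, Ruby drops trailing empty strings, and a piece of text ending in a newline would not reset the offset to zero.

```ruby
# Hypothetical helper: given the current position and a piece of text,
# return the position after that text. Lines are 1-based; the offset is
# the column within the current line.
def advance(line, offset, text)
  return [line, offset + text.length] unless text.include?("\n")

  # The -1 limit keeps trailing empty strings, so "ab\n" splits into
  # ["ab", ""] and the offset resets to 0 on the new line.
  text.split("\n", -1).each_with_index do |bit, idx|
    line += 1 unless idx == 0
    offset = bit.length
  end
  [line, offset]
end

# Feed in a few made-up text pieces and watch the position change.
position = [1, 0]
["module", " ", "ModX", " where\n", "  x", "\n\n"].each do |text|
  position = advance(*position, text)
  p position
end
```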

Modifying Hugo’s HTML

Given such line information, the next step is to transfer it onto existing Hugo HTML files. Within a file, I’ve made my codelines shortcode emit custom attributes that can be used to find syntax-highlighted Agda code. The chief such attribute is data-agda-block; my script traverses all elements with this attribute.

def process_source_file(file, document)
  # Process each highlight group that's been marked as an Agda file.
  document.css('div[data-agda-block]').each do |t|
    # ...

To figure out which Agda HTML file to use, and which lines to search for links, the script also expects some additional attributes.

    # ...
    first_line, last_line = nil, nil

    if first_line_attr = t.attribute("data-first-line")
      first_line = first_line_attr.to_s.to_i
    end
    if last_line_attr = t.attribute("data-last-line")
      last_line = last_line_attr.to_s.to_i
    end

    if first_line and last_line
      line_range = first_line..last_line
    else
      # no line number attributes = the code block contains the whole file
      line_range = 1..
    end

    full_path = t.attribute("data-file-path").to_s
    # ...

At this point, the Agda file could be in some nested directory, like A/B/C/File.agda. However, the project root – the place where Agda modules are compiled from – could be any one of the folders A, B, or C. Thus, the fully qualified module name for File.agda could be File, C.File, B.C.File, or A.B.C.File. Since Agda’s HTML output produces files named after the fully qualified module name, the script needs to guess what the module file is. This is where some conventions come in play: I keep my code in folders directly nested within a top-level code directory; thus, I’ll have folders project1 or project2 inside code, and those will always be project roots. As a result, I guess that the first directory relative to code should be discarded, while the rest should be included in the path. The only exception to this is Git submodules: if an Agda file is included using a submodule, the root directory of the submodule is considered the Agda project root. My Hugo theme indicates the submodule using an additional data-base-path attribute; in all, that leads to the following logic:

    # ...
    full_path_dirs = Pathname(full_path).each_filename.to_a
    base_path = t.attribute("data-base-path").to_s
    base_dir_depth = 0
    if base_path.empty?
      # No submodules were used. Assume code/<X> is the root.
      # The path of the file is given relative to `code`, so need
      # to strip only the one outermost directory.
      base_dir_depth = 1
      base_path = full_path_dirs[0]
    else
      # The code is in a submodule. Assume that the base path / submodule
      # root is the Agda module root, ignore all folders before that.
      base_path_dirs = Pathname(base_path).each_filename.to_a
      base_dir_depth = base_path_dirs.length
    end
    # ...

With that, the script determines the actual HTML file path — by assuming that there’s an html folder in the same place as the Agda project root — and runs the above process_agda_html_file:

    # ...
    dirs_in_base = full_path_dirs[base_dir_depth..-1]
    html_file = dirs_in_base.join(".").gsub(/\.agda$/, ".html")
    html_path = File.join(["code", base_path, "html", html_file])

    agda_info = process_agda_html_file(html_path)
    # ...
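Put together, the path guessing above can be exercised on its own. This is a sketch under the conventions just described; the agda_html_path helper and the example paths are hypothetical, and only the steps shown in the fragments are reproduced.

```ruby
require "pathname"

# Hypothetical helper combining the steps above. `full_path` is relative
# to the `code` directory; `base_path` is empty unless a submodule is used.
def agda_html_path(full_path, base_path = "")
  full_path_dirs = Pathname(full_path).each_filename.to_a
  if base_path.empty?
    # No submodule: code/<project> is the Agda project root.
    base_dir_depth = 1
    base_path = full_path_dirs[0]
  else
    # Submodule: its root directory is the Agda project root.
    base_dir_depth = Pathname(base_path).each_filename.to_a.length
  end

  # A/B/File.agda under the root becomes the module name A.B.File.
  dirs_in_base = full_path_dirs[base_dir_depth..-1]
  html_file = dirs_in_base.join(".").gsub(/\.agda$/, ".html")
  File.join("code", base_path, "html", html_file)
end

# Made-up example paths: one regular project, one submodule.
puts agda_html_path("project1/Lattice/Map.agda")
puts agda_html_path("project2/vendor/lib/Data/Thing.agda", "project2/vendor/lib")
```

With these inputs, the first call yields code/project1/html/Lattice.Map.html, and the second yields code/project2/vendor/lib/html/Data.Thing.html.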

The next step is specific to the output of Hugo’s syntax highlighter, Chroma. When line numbers are enabled – and they are on my site – Chroma generates a table that, at some point, contains a bunch of span HTML nodes, each with the line class. Each such span corresponds to a single line of output; naturally, the first one contains the code from first_line, the second from first_line + 1, and so on until last_line. This is quite convenient, because it saves the headache of counting newlines the way that the Agda processing code above has to.

For each line of syntax-highlighted code, the script retrieves the corresponding list of links that were collected from the Agda HTML file.

    # ...
    lines = t.css("pre.chroma code[data-lang] .line")
    lines.zip(line_range).each do |line, line_no|
      line_info = agda_info[line_no]
      next unless line_info

      # ...

The subsequent traversal – which picks out the plain text of the Agda file as reasoned above – is very similar to the previous one. Here too there’s an offset variable, which gets incremented by the length of each new plain text piece. Since we know the lines match up to spans, there’s no need to count newlines.

      # ...
      offset = 0
      line.traverse do |lt|
        if lt.text?
          content = lt.content
          new_offset = offset + content.length

          # ...

At this point, we have a line number, and an offset within that line number that describes the portion of the text under consideration. We can traverse all the links for the line, and find ones that mark a piece of text somewhere in this range. For the time being – since inserting overlapping spans is quite complicated – I require the links to lie entirely within a particular plain text region. As a result, if Chroma splits a single Agda identifier into several tokens, it will not be linked. For now, this seems like the most conservative and safe approach.

          # ...
          matching_links = line_info.filter do |link|
            link[:from] >= offset and link[:to] <= new_offset
          end
          # ...
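To see the effect of this containment rule, consider a small standalone sketch. The link ranges and text pieces below are made up for the line `module ModX ( x : T ) where`, assuming the highlighter splits ModX in two but keeps x whole.

```ruby
# Links for one line, in the same shape as the per-line link lists above.
links = [
  { :from => 7, :to => 11, :id => "ModX-id" },
  { :from => 14, :to => 15, :id => "x-id" },
]

# Hypothetical text pieces for the same line, as [from, to) offsets:
# "module", " ", "Mod", "X", " ( ", "x".
pieces = [[0, 6], [6, 7], [7, 10], [10, 11], [11, 14], [14, 15]]

# Keep only the links that lie entirely within a single text piece.
linked = pieces.flat_map do |offset, new_offset|
  links.filter do |link|
    link[:from] >= offset and link[:to] <= new_offset
  end
end

p linked.map { |link| link[:id] }
```

Only x-id survives: the range for ModX straddles two pieces, so no single piece contains it, and the identifier is left unlinked.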

All that’s left is to slice up the plain text fragment into a bunch of HTML pieces: the substrings that are links will turn into a HTML nodes, while the substrings that are “in between” the links will be left over as plain text nodes. The code to do so is relatively verbose, but not all that complicated.

          replace_with = []
          replace_offset = 0
          matching_links.each do |match|
            # The link's range is an offset from the beginning of the line,
            # but the text piece we're splitting up might be partway into
            # the line. Convert the link coordinates to piece-relative ones.
            relative_from = match[:from] - offset
            relative_to = match[:to] - offset

            # If the previous link ended some time before the new link
            # began (or if the current link is the first one, and is not
            # at the beginning), ensure that the plain text "in between"
            # is kept.
            replace_with << content[replace_offset...relative_from]

            tag = (match.include? :href) ? 'a' : 'span'
            new_node = Nokogiri::XML::Node.new(tag, document)
            if match.include? :href
              # For nodes with links, note what they're referring to, so
              # we can adjust their hrefs when we assign global IDs.
              href = match[:href].to_s
              new_node['href'] = note_used_href file, new_node, href
            end
            if match.include? :id
              # For nodes with IDs visible in the current Hugo file, we'll
              # want to redirect links that previously go to other Agda
              # module HTML files. So, note the ID that we want to redirect,
              # and pick a new unique ID to replace it with.
              id = match[:id].to_s
              new_node['id'] = note_defined_href file, "#{html_file}##{id}"
            end
            new_node.content = content[relative_from...relative_to]

            replace_with << new_node
            replace_offset = relative_to
          end
          replace_with << content[replace_offset..-1]
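Because the loop above builds Nokogiri nodes and records hrefs, it is hard to run in isolation. The following simplified stand-in keeps only the slicing arithmetic; the slice_piece helper is hypothetical, and links become [tag, text] pairs instead of real nodes.

```ruby
# Hypothetical, simplified version of the slicing loop: links become
# [tag, text] pairs, and everything in between stays a plain string.
def slice_piece(content, offset, matching_links)
  replace_with = []
  replace_offset = 0
  matching_links.each do |match|
    # Convert line-relative link coordinates to piece-relative ones.
    relative_from = match[:from] - offset
    relative_to = match[:to] - offset

    # Keep the plain text between the previous link and this one.
    replace_with << content[replace_offset...relative_from]

    tag = (match.include? :href) ? "a" : "span"
    replace_with << [tag, content[relative_from...relative_to]]
    replace_offset = relative_to
  end
  # Keep whatever plain text follows the last link.
  replace_with << content[replace_offset..-1]
end

# A made-up text piece that starts at column 4 and contains two links.
pieces = slice_piece("f x = g x", 4, [
  { :from => 4, :to => 5, :id => "f-id" },
  { :from => 10, :to => 11, :href => "Other.html#42" },
])
p pieces
```

Note that when a link sits at the very start of the piece, the result begins with an empty string; this mirrors the loop above, which pushes the “in between” text unconditionally.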

There’s a little bit of a subtlety in the above code: specifically, I use the note_used_href and note_defined_href methods. These are important for rewriting links. Like I mentioned earlier, Agda’s HTML output assumes that each source file should produce a single HTML file – named after its qualified module – and creates links accordingly. However, my blog posts interweave multiple source files. Some links that would’ve jumped to a different file must now point to an internal identifier within the page. Another important aspect of the transformation is that, since I’m pulling HTML nodes from distinct files, it’s not guaranteed that each of them will have a unique id attribute. After all, Agda just assigns sequential numbers to each node that it generates; it would only take, e.g., including the first line from two distinct modules to end up with two nodes with id="1".

The solution is then twofold:

Track all the nodes referencing a particular href (made up of an HTML file and a numerical identifier, like File.html#123). When we pick new IDs – thus guaranteeing their uniqueness – we’ll visit all the nodes that refer to the old ID and HTML file, and update their href.
Track all existing Agda HTML IDs that we’re inserting. If we transfer an <a id="1234"> onto the Hugo content, we know we’ll need to pick a new ID for it (since 1234 need not be unique), and that we’ll need to redirect the other links to that new ID as the previous bullet describes.

Here’s how these two methods work:

def note_defined_href(file, href)
  file_hrefs = @local_seen_hrefs.fetch(file) do
    @local_seen_hrefs[file] = {}
  end

  uniq_id = file_hrefs.fetch(href) do
    new_id = "agda-unique-ident-#{@id_counter}"
    @id_counter += 1
    file_hrefs[href] = new_id
  end

  unless @global_seen_hrefs.include? href
    @global_seen_hrefs[href] = { :file => file, :id => uniq_id }
  end

  return uniq_id
end

def note_used_href(file, node, href)
  ref_list = @nodes_referencing_href.fetch(href) { @nodes_referencing_href[href] = [] }
  ref_list << { :file => file, :node => node }
  return href
end

Note that they use instance variables: these are methods on a FileGroup class. I’ve omitted the various classes I’ve declared from the above code for brevity, but here it makes sense to mention them. Like I mentioned earlier, you can view the complete code here.

Interestingly, note_defined_href makes use of two global maps: @local_seen_hrefs and @global_seen_hrefs. This helps satisfy the third constraint above, which is linking between code defined in the same series. The logic is as follows: when rewriting a link to a new HTML file and ID, if the code we’re trying to link to exists on the current page, we should link to that. Otherwise, if the code we’re trying to link to was presented in a different part of the series, then we should link to that other part. So, we consult the “local” map for hrefs that will be rewritten to HTML nodes in the current file, and as a fallback, consult the “global” map for hrefs that were introduced in other parts. The note_defined_href populates both maps, and is “biased” towards the first occurrence of a piece of code: if posts A and B define a function f, and post C only references f, then that link will go to post A’s definition, which came earlier.

The other method, note_used_href, is simpler. It just appends to a list of Nokogiri HTML nodes that reference a given href. We keep track of the file in which the reference occurred so we can be sure to consult the right sub-map of @local_seen_hrefs when checking for in-page rewrites.
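The “first occurrence wins” behavior can be checked with a trimmed-down sketch. The TinyFileGroup class below is a hypothetical stand-in for FileGroup that keeps only the two maps and the counter; the file names and href are made up.

```ruby
# Hypothetical, trimmed-down stand-in for the FileGroup class.
class TinyFileGroup
  attr_reader :local_seen_hrefs, :global_seen_hrefs

  def initialize
    @local_seen_hrefs = {}
    @global_seen_hrefs = {}
    @id_counter = 0
  end

  def note_defined_href(file, href)
    file_hrefs = (@local_seen_hrefs[file] ||= {})
    # Each file gets its own fresh ID for the definition it contains.
    uniq_id = file_hrefs.fetch(href) do
      new_id = "agda-unique-ident-#{@id_counter}"
      @id_counter += 1
      file_hrefs[href] = new_id
    end
    # Only the first post to define an href is recorded globally.
    @global_seen_hrefs[href] ||= { :file => file, :id => uniq_id }
    uniq_id
  end
end

# Two posts in a series both show the same definition.
group = TinyFileGroup.new
group.note_defined_href("part1/index.html", "Map.html#10")
group.note_defined_href("part2/index.html", "Map.html#10")

p group.global_seen_hrefs["Map.html#10"]
p group.local_seen_hrefs["part2/index.html"]
```

The global map keeps pointing at part 1, while part 2 still gets its own local ID, so links within part 2 stay on that page.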

After running process_source_file on all Hugo HTML files within a particular series, the following holds true:

We have inserted span or a nodes wherever Agda’s original output had nodes with id or href attributes. This is with the exception of the case where Hugo’s inline HTML doesn’t “line up” with Agda’s inline HTML, which I’ve only found to happen when the leading character of an identifier is a digit.
We have picked new IDs for each HTML node we inserted that had an ID, noting them both globally and for the current file. We noted their original href value (in the form File.html#123) and that it should be transformed into our globally-unique identifiers, in the form agda-unique-ident-1234.
For each HTML node we inserted that links to another, we noted the href of the reference (also in the form File.html#123).

Now, all that’s left is to redirect the hrefs of the nodes we inserted from their old values to the new ones. I do this by iterating over @nodes_referencing_href, which contains every link we inserted.

def cross_link_files
  @nodes_referencing_href.each do |href, references|
    references.each do |reference|
      file = reference[:file]
      node = reference[:node]

      local_targets = @local_seen_hrefs[file]
      if local_targets.include? href
        # A code block in this file provides this href, create a local link.
        node['href'] = "##{local_targets[href]}"
      elsif @global_seen_hrefs.include? href
        # A code block in this series, but not in this file, defines
        # this href. Create a cross-file link.
        target = @global_seen_hrefs[href]
        other_file = target[:file]
        id = target[:id]

        relpath = Pathname.new(other_file).dirname.relative_path_from(Pathname.new(file).dirname)
        node['href'] = "#{relpath}##{id}"
      else
        # No definitions in any blog page. For now, just delete the anchor.
        node.replace node.content
      end
    end
  end
end

Notice that for the time being, I simply remove links to Agda definitions that didn’t occur in the Hugo post. Ideally, this would link to the plain, non-blog documentation page generated by Agda; however, this requires either hosting those documentation pages, or expecting the Agda standard library HTML pages to remain stable and hosted at a fixed URL. Neither was simple enough to do, so I opted for the conservative “just don’t insert links” approach.
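For the cross-file case, the relative link is built from the directories of the two output files. Here is a minimal sketch of that computation on its own, assuming each post is rendered as a directory containing an index.html; the paths and the ID are hypothetical.

```ruby
require "pathname"

# Hypothetical output paths for two posts in the same series.
file = "public/blog/series_part2/index.html"
other_file = "public/blog/series_part1/index.html"

# Path from the directory of the linking post to that of the target post.
relpath = Pathname.new(other_file).dirname.relative_path_from(Pathname.new(file).dirname)
link = "#{relpath}#agda-unique-ident-12"
puts link
```

Here relpath comes out as ../series_part1, so the resulting href points at the other post’s directory, followed by the fragment for the rewritten ID.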

And that’s all of the approach that I wanted to show off today! There are other details, like finding posts in the same series (I achieve this with a meta element) and invoking agda --html on the necessary source files (my build-agda-html.rb script is how I personally do this), but I don’t think it’s all that valuable to describe them here.

Unfortunately, the additional metadata I had my theme insert makes it harder for others to use this approach out of the box. However, I hope that by sharing my experience, others who write Agda and post about it might be able to get a similar solution working. And of course, it’s always fun to write about a recent project or endeavor.

Happy (dependently typed) programming and blogging!

The "Deeply Embedded Expression" Trick in Agda

Mon, 11 Mar 2024 14:25:52 -0700

I’ve been working on a relatively large Agda project for a few months now, and I’d like to think that I’ve become quite proficient. Recently, I came up with a little trick to help simplify some of my proofs, and it seems like this trick might have broader applications.

In my head, I call this trick ‘Deeply Embedded Expressions’. Before I introduce it, let me explain the part of my work that motivated developing the trick.

Proofs about Map Operations

A part of my Agda project is the formalization of simple key-value maps. I model key-value maps as lists of key-value pairs. On top of this, I implement two operations: join and meet, which in my code are denoted using ⊔ and ⊓. When “joining” two maps, you create a new map that has the keys from both input ones. If a key is only present in one of the input maps, then the new “joined” map has the same value for that key as the original. On the other hand, if the key is present in both maps, then its value in the new map is the result of “joining” the original values. The “meet” operation is similar, except instead of taking keys from either map, the result only has keys that were present in both maps, “meeting” their values. In a way, “join” and “meet” are similar to set union and intersection — but they also operate on the values in the map.
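As an informal illustration of these two operations (not the Agda definitions), here is a sketch using Ruby hashes. It assumes the values are integers whose “join” is max and whose “meet” is min; that choice is made up for the example, since the Agda code is generic over the value type.

```ruby
# Illustrative only: maps are Ruby hashes, and values are integers
# whose "join" is max and whose "meet" is min (an assumption made here).
def join(m1, m2)
  # Keys from either map; values present in both are joined.
  m1.merge(m2) { |_key, v1, v2| [v1, v2].max }
end

def meet(m1, m2)
  # Only keys present in both maps; their values are met.
  shared = m1.keys & m2.keys
  shared.to_h { |key| [key, [m1[key], m2[key]].min] }
end

m1 = { "x" => 1, "y" => 5 }
m2 = { "y" => 3, "z" => 7 }

p join(m1, m2)
p meet(m1, m2)
```

The join has the keys x, y, and z, with y mapped to the join of 5 and 3; the meet has only y, mapped to the meet of the same two values.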

Given these operations, I need to prove certain properties about them. The most inconvenient to prove is probably associativity:

From Map.agda, line 752

⊔-assoc : ∀ (m₁ m₂ m₃ : Map) → ((m₁ ⊔ m₂) ⊔ m₃) ≈ (m₁ ⊔ (m₂ ⊔ m₃))

This property is, in turn, proven using two ‘subset’ relations on maps, defined in the usual way.

From Map.agda, line 755

        ⊔-assoc₁ : ((m₁ ⊔ m₂) ⊔ m₃) ⊆ (m₁ ⊔ (m₂ ⊔ m₃))

From Map.agda, line 774

        ⊔-assoc₂ : (m₁ ⊔ (m₂ ⊔ m₃)) ⊆ ((m₁ ⊔ m₂) ⊔ m₃)

The reason this property is so inconvenient to prove is that there are a lot of cases to consider. That’s because your claim, in words, is something like:

Suppose a key-value pair k , v is present in (m₁ ⊔ m₂) ⊔ m₃. Show that k , v is also in m₁ ⊔ (m₂ ⊔ m₃).

The only thing you can really do with k , v is figure out how it got into the three-way union map: did it come from m₁, m₂, or m₃, or perhaps several of them? The essence of the proof boils down to repeated uses of the fact that for a key to be in the union, it must be in at least one of the two maps. You end up with witnesses, repeated application of the same lemmas, lots of let-expressions or where clauses. It’s relatively tedious and, what’s more frustrating, driven entirely by the structure of the map operations. It seems like one shouldn’t have to mimic that structure using boilerplate lemmas. So I started looking at other ways.

Case Analysis using GADTs

A “proof by cases” in a dependently typed language like Agda usually brings to mind pattern matching. So, here’s an idea: what if for each expression involving ⊔ and ⊓, we had some kind of data type, and that data type had exactly as many inhabitants as there are cases to analyze? A data type corresponding to m₁ ⊔ m₂ might have three cases, and the one for (m₁ ⊔ m₂) ⊔ m₃ might have seven. Each case would contain the information necessary to perform the proof.
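
One way to see where the three and the seven come from, assuming the expression only joins $n$ base maps: a key in the result must have come from some non-empty subset of those maps, so the number of cases is

$$ \sum_{i=1}^{n} \binom{n}{i} = 2^n - 1, \qquad 2^2 - 1 = 3, \qquad 2^3 - 1 = 7. $$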

A data type whose “shape” depends on an expression in the way I described above is said to be indexed by that expression. In Agda, GADTs are used to create indexed types. My initial attempt was something like this:

data Provenance (k : A) : B → Map → Set (a ⊔ℓ b) where
    single : ∀ {v : B} {m : Map} → (k , v) ∈ m → Provenance k v m
    in₁ : ∀ {v : B} {m₁ m₂ : Map} → Provenance k v m₁ → ¬ k ∈k m₂ → Provenance k v (m₁ ⊔ m₂)
    in₂ : ∀ {v : B} {m₁ m₂ : Map} → ¬ k ∈k m₁ → Provenance k v m₂ → Provenance k v (m₁ ⊔ m₂)
    bothᵘ : ∀ {v₁ v₂ : B} {m₁ m₂ : Map} → Provenance k v₁ m₁ → Provenance k v₂ m₂ → Provenance k (v₁ ⊔₂ v₂) (m₁ ⊔ m₂)
    bothⁱ : ∀ {v₁ v₂ : B} {m₁ m₂ : Map} → Provenance k v₁ m₁ → Provenance k v₂ m₂ → Provenance k (v₁ ⊓₂ v₂) (m₁ ⊓ m₂)

I was planning on a proof of associativity (in one direction) that looked something like the following — pattern matching on cases from the new Provenance type.

⊔-assoc₁ : ((m₁ ⊔ m₂) ⊔ m₃) ⊆ (m₁ ⊔ (m₂ ⊔ m₃))
⊔-assoc₁ k v k,v∈m₁₂m₃
    with get-Provenance k,v∈m₁₂m₃
...   | in₂ k∉km₁₂ (single v∈m₃) = ...
...   | in₁ (in₂ k∉km₁ (single v∈m₂)) k∉km₃ = ...
...   | bothᵘ (in₂ k∉km₁ (single {v₂} v₂∈m₂)) (single {v₃} v₃∈m₃) = ...
...   | in₁ (in₁ (single v₁∈m₁) k∉km₂) k∉km₃ = ...
...   | bothᵘ (in₁ (single {v₁} v₁∈m₁) k∉km₂) (single {v₃} v₃∈m₃) = ...
...   | in₁ (bothᵘ (single {v₁} v₁∈m₁) (single {v₂} v₂∈m₂)) k∉km₃ = ...
...   | bothᵘ (bothᵘ (single {v₁} v₁∈m₁) (single {v₂} v₂∈m₂)) (single {v₃} v₃∈m₃) = ...

However, this doesn’t work. Agda has trouble figuring out which cases of the Provenance GADT are allowed, and which aren’t. Is m₁ ⊔ m₂ a single map, fit for the single case, or should it be broken up into more cases like in₁ and in₂? In general, is some expression of type Map the “bottom” of our recursion, or should it be analyzed further?

The above hints at what’s wrong. The mistake here is requiring Agda to infer the shape of our “join” and “meet” expressions from arbitrary terms. The set of expressions that we want to reason about is much more restricted – each expression is always built from three components: “meet”, “join”, and the base-case maps being combined using these operations.

Defining an Expression Data Type

If you’re like me, and have spent years of your life around programming language theory and domain specific languages (DSLs), the last sentence of the previous section may be ringing a bell. In fact, it’s eerily similar to how we describe recursive grammars:

An expression of interest is one of:

A map

The “join” of two expressions

The “meet” of two expressions

Mathematically, we might write this as follows:

$$ \begin{array}{rcll} e & ::= & m & \text{(maps)} \\ & | & e \sqcup e & \text{(join)} \\ & | & e \sqcap e & \text{(meet)} \end{array} $$

And in Agda,

From Map.agda, lines 543 through 546

data Expr : Set (a ⊔ℓ b) where
    `_ : Map → Expr
    _∪_ : Expr → Expr → Expr
    _∩_ : Expr → Expr → Expr

In the code, I used the set union and intersection operators to avoid overloading the ⊔ and ⊓ more than they already are.

We have just defined a very small expression language. In computer science, a language is called deeply embedded if a data type (or class hierarchy, or other ’explicit’ representation) is defined for its syntax in the host language (Agda, in our case). This is in contrast to a shallow embedding, in which expressions in the (new) language are just expressions in the host language.

In this sense, our Expr is deeply embedded – we defined a new container for it, and _∪_ is a distinct entity from _⊔_. Our first attempt was a shallow embedding. That fell through because the Agda language is much broader than our expression language, which makes case analysis very difficult.

An obvious thing to do with an expression is to evaluate it. This will be important for our proofs, because it will establish a connection between expressions (created via Expr) and actual Agda objects that we need to reason about at the end of the day. The notation $\llbracket e \rrbracket$ is commonly used in PL circles for evaluation (it comes from Denotational Semantics). Thus, my Agda evaluation function is written as follows:

From Map.agda, lines 586 through 589

⟦_⟧ : Expr -> Map
⟦ ` m ⟧ = m
⟦ e₁ ∪ e₂ ⟧ = ⟦ e₁ ⟧ ⊔ ⟦ e₂ ⟧
⟦ e₁ ∩ e₂ ⟧ = ⟦ e₁ ⟧ ⊓ ⟦ e₂ ⟧
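
The same shape can be sketched in C++ to show what the deep embedding buys. Everything here is illustrative rather than a port of the Agda code: the `Map` alias, `max`/`min` as value-level join and meet, and the helper names `lit`, `unite`, `intersect`, `eval`, and `leaves` are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Map = std::map<std::string, int>;  // hypothetical stand-in for Map

Map join(const Map& a, const Map& b) {   // value-level join assumed to be max
    Map out = a;
    for (const auto& [k, v] : b) {
        auto [it, inserted] = out.emplace(k, v);
        if (!inserted) it->second = std::max(it->second, v);
    }
    return out;
}

Map meet(const Map& a, const Map& b) {   // value-level meet assumed to be min
    Map out;
    for (const auto& [k, v] : a) {
        auto it = b.find(k);
        if (it != b.end()) out.emplace(k, std::min(v, it->second));
    }
    return out;
}

// Deep embedding: the syntax of an expression is plain data.
struct Expr {
    enum class Kind { Base, Union, Intersect };
    Kind kind;
    Map base;                        // meaningful only for Kind::Base
    std::shared_ptr<Expr> lhs, rhs;  // meaningful only for Union / Intersect
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr lit(Map m) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Base, std::move(m), nullptr, nullptr});
}
ExprPtr unite(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Union, {}, std::move(l), std::move(r)});
}
ExprPtr intersect(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Intersect, {}, std::move(l), std::move(r)});
}

// The evaluation function: lower the syntax back to an ordinary map.
Map eval(const Expr& e) {
    switch (e.kind) {
        case Expr::Kind::Base:      return e.base;
        case Expr::Kind::Union:     return join(eval(*e.lhs), eval(*e.rhs));
        case Expr::Kind::Intersect: return meet(eval(*e.lhs), eval(*e.rhs));
    }
    return {};  // not reached; every Kind is handled above
}

// Because the syntax is data, we can also inspect its structure.
std::size_t leaves(const Expr& e) {
    return e.kind == Expr::Kind::Base ? 1 : leaves(*e.lhs) + leaves(*e.rhs);
}
```

A function like `leaves` has no counterpart in a shallow embedding: once two maps have been joined, the result is just another map, and the information about how it was built is gone.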

On top of this, here is my actual implementation of the Provenance data type. This time, it’s indexed by expressions in Expr, which makes it much easier to pattern match on instances:

From Map.agda, lines 591 through 596

data Provenance (k : A) : B → Expr → Set (a ⊔ℓ b) where
    single : ∀ {v : B} {m : Map} → (k , v) ∈ m → Provenance k v (` m)
    in₁ : ∀ {v : B} {e₁ e₂ : Expr} → Provenance k v e₁ → ¬ k ∈k ⟦ e₂ ⟧ → Provenance k v (e₁ ∪ e₂)
    in₂ : ∀ {v : B} {e₁ e₂ : Expr} → ¬ k ∈k ⟦ e₁ ⟧ → Provenance k v e₂ → Provenance k v (e₁ ∪ e₂)
    bothᵘ : ∀ {v₁ v₂ : B} {e₁ e₂ : Expr} → Provenance k v₁ e₁ → Provenance k v₂ e₂ → Provenance k (v₁ ⊔₂ v₂) (e₁ ∪ e₂)
    bothⁱ : ∀ {v₁ v₂ : B} {e₁ e₂ : Expr} → Provenance k v₁ e₁ → Provenance k v₂ e₂ → Provenance k (v₁ ⊓₂ v₂) (e₁ ∩ e₂)

Note that we have to use the evaluation function to be able to use operators such as ∈. That’s because these are still defined on maps, and not expressions.

With this, I was able to write my proof in the way that I had hoped. It has the exact form of my previous sketch-of-proof.

From Map.agda, lines 755 through 773

        ⊔-assoc₁ : ((m₁ ⊔ m₂) ⊔ m₃) ⊆ (m₁ ⊔ (m₂ ⊔ m₃))
        ⊔-assoc₁ k v k,v∈m₁₂m₃
            with Expr-Provenance-≡ (((` m₁) ∪ (` m₂)) ∪ (` m₃)) k,v∈m₁₂m₃
        ...   | in₂ k∉ke₁₂ (single {v₃} v₃∈e₃) =
                let (k∉ke₁ , k∉ke₂) = I⊔.∉-union-∉-either {l₁ = l₁} {l₂ = l₂} k∉ke₁₂
                in (v₃ , (≈₂-refl , I⊔.union-preserves-∈₂ k∉ke₁ (I⊔.union-preserves-∈₂ k∉ke₂ v₃∈e₃)))
        ...   | in₁ (in₂ k∉ke₁ (single {v₂} v₂∈e₂)) k∉ke₃ =
                (v₂ , (≈₂-refl , I⊔.union-preserves-∈₂ k∉ke₁ (I⊔.union-preserves-∈₁ u₂ v₂∈e₂ k∉ke₃)))
        ...   | bothᵘ (in₂ k∉ke₁ (single {v₂} v₂∈e₂)) (single {v₃} v₃∈e₃) =
                (v₂ ⊔₂ v₃ , (≈₂-refl , I⊔.union-preserves-∈₂ k∉ke₁  (I⊔.union-combines u₂ u₃ v₂∈e₂ v₃∈e₃)))
        ...   | in₁ (in₁ (single {v₁} v₁∈e₁) k∉ke₂) k∉ke₃ =
                (v₁ , (≈₂-refl , I⊔.union-preserves-∈₁ u₁ v₁∈e₁ (I⊔.union-preserves-∉ k∉ke₂ k∉ke₃)))
        ...   | bothᵘ (in₁ (single {v₁} v₁∈e₁) k∉ke₂) (single {v₃} v₃∈e₃) =
                (v₁ ⊔₂ v₃ , (≈₂-refl , I⊔.union-combines u₁ (I⊔.union-preserves-Unique l₂ l₃ u₃) v₁∈e₁ (I⊔.union-preserves-∈₂ k∉ke₂ v₃∈e₃)))
        ...   | in₁ (bothᵘ (single {v₁} v₁∈e₁) (single {v₂} v₂∈e₂)) k∉ke₃ =
                (v₁ ⊔₂ v₂ , (≈₂-refl , I⊔.union-combines u₁ (I⊔.union-preserves-Unique l₂ l₃ u₃) v₁∈e₁ (I⊔.union-preserves-∈₁ u₂ v₂∈e₂ k∉ke₃)))
        ...   | bothᵘ (bothᵘ (single {v₁} v₁∈e₁) (single {v₂} v₂∈e₂)) (single {v₃} v₃∈e₃) =
                (v₁ ⊔₂ (v₂ ⊔₂ v₃) , (⊔₂-assoc v₁ v₂ v₃ , I⊔.union-combines u₁ (I⊔.union-preserves-Unique l₂ l₃ u₃) v₁∈e₁ (I⊔.union-combines u₂ u₃ v₂∈e₂ v₃∈e₃)))

The General Trick

So far, I’ve presented a problem I faced in my Agda proof and a solution for that problem. However, it may not be clear how useful the trick is beyond this narrow case that I’ve encountered. The way I see it, the “deeply embedded expression” trick is applicable whenever you have data that is constructed from some fixed set of cases, and when proofs about that data need to follow the structure of these cases. Thus, examples include:

Proofs about the origin of keys in a map (this one): the “data” is the key-value map that is being analyzed. The enumeration of cases for this map is driven by the structure of the “join” and “meet” operations used to build the map.
Automatic derivation of function properties: suppose you’re interested in working with continuous functions. You also know that the addition, subtraction, and multiplication of two functions preserves continuity. Of course, the constant function $x \mapsto c$ and the identity function $x \mapsto x$ are continuous too. You may define an expression data type that has cases for these operations. Then, your evaluation function could transform the expression into a plain function, and a proof on the structure of the expression can be used to verify the resulting function’s continuity.
Proof search for algebraic expressions: suppose that you wanted to automatically find solutions for certain algebraic (in)equalities. Instead of using some sort of reflection mechanism to inspect terms and determine how constraints should be solved, you might represent the set of operations in your equation system as cases in a data type. You can then use regular Agda code to manipulate terms; an evaluation function can then be used to recover the equations in Agda, together with witnesses justifying the solution.

There are some pretty clear commonalities among the examples above, which are the ingredients to this trick:

The expression: you create a new expression data type that encodes all the operations (and base cases) on your data. In my example, this is the Expr data type.
The evaluation function: you provide a way to lower the expression you’ve defined back into a regular Agda term. This connects your (abstract) operations to their interpretation in Agda. In my example, this is the ⟦_⟧ function.
The proofs: you write proofs that consider only the fixed set of cases encoded by the data type (Expr), but state properties about the evaluated expression. In my example, this is Provenance and the Expr-Provenance function. Specifically, the Provenance data type connects expressions and the terms they evaluate to, because it is indexed by expressions, but contains data in the form k ∈k ⟦ e₂ ⟧.

Conclusion

I’ll be the first to admit that this trick is quite situational, and may not be as far-reaching as the “Is Something” pattern I wrote about before, which seems to occur far more in the wild. However, there have now been two times when I personally reached for this trick, which seems to suggest that it may be useful to someone else.

I hope you’ve found this useful. Happy (dependently typed) programming!

Bergamot: Exploring Programming Language Inference Rules

Fri, 22 Dec 2023 18:16:44 -0800

Inference Rules and the Study of Programming Languages

In this post, I will talk about inference rules, particularly in the field of programming language theory. The first question to get out of the way is “what on earth is an inference rule?”. The answer is simple: an inference rule is just a way of writing “if … then …”. When writing an inference rule, we write the “if” stuff above a line, and the “then” stuff below the line. Really, that’s all there is to it. I’ll steal an example from another one of my posts on the blog – here’s an inference rule:

$$ \frac {\text{I'm allergic to cats} \quad \text{My friend has a cat}} {\text{I will not visit my friend very much}} $$

We can read this as “if I’m allergic to cats, and my friend has a cat, then I will not visit my friend very much”.

In the field of programming languages, inference rules are everywhere. Practically any paper I read has a table that looks something like this:

Inference rules from Logarithm and program testing by Kuen-Bang Hou (Favonia) and Zhuyang Wang

And I, for one, love it! They’re a precise and concise way to describe static and dynamic behavior of programs. I might’ve written this elsewhere on the blog, but whenever I read a paper, my eyes search for the rules first and foremost.

But to those just starting their PL journey, inference rules can be quite cryptic – I know they were to me! The first level of difficulty is the symbols: we have lots of Greek ($\Gamma$ and $\Delta$ for environments, $\tau$ and perhaps $\sigma$ for types), and the occasional mathematical symbol (the “entails” symbol $\vdash$ is the most common, but for operational semantics we can have $\leadsto$ and $\Downarrow$). If you don’t know what they mean, or if you’re still getting used to them, symbols in judgements are difficult enough to parse.

The second level of difficulty is making sense of the individual rules: although they tend to not be too bad, for some languages even making sense of one rule can be challenging. The following rule from the Calculus of Inductive Constructions is a doozy, for instance.

The match inference rule from Introduction to the Calculus of Inductive Constructions by Christine Paulin-Mohring

Just look at the metavariables! We have $\textit{pars}$, $t_1$ through $t_p$, $x_1$ through $x_n$, plain $x$, and at least two other sets of variables. Not only this, but the rule requires at least some familiarity with GADTs to understand completely.

The third level is making sense of how the rules work together. In my programming languages class in college, a familiar question was:

The Hindley-Milner type system supports let-polymorphism only. What is it about the rules that implies let-polymorphism, and not any other kind of polymorphism?

If you don’t know the answer, or the question doesn’t make sense to you, don’t worry about it – suffice to say that whole systems of inference rules exhibit certain behaviors, and it takes familiarity with several rules to spot these behaviors.

Seeing What Works and What Doesn’t

Maybe I’m just a tinker-y sort of person, but for me, teaching inference rules just by showing them is not really enough. For instance, let me show you two ways of writing the following (informal) rule:

When adding two operands, if both operands are strings, then the result of adding them is also a string.

There’s a right way to write this inference rule, and there is a wrong way. Let me show you both, and try to explain the two. First, here’s the wrong way:

$$ \cfrac {x : \text{string} \in \Gamma \quad y : \text{string} \in \Gamma} {\Gamma \vdash x + y : \text{string}} $$

This says that the type of adding two variables of type string is still string. Here, $\Gamma$ is a context, which keeps track of which variable has what type. Writing $x : \text{string} \in \Gamma$ is the same as saying “we know the variable x has type string”. The whole rule reads,

If the variables x and y both have type string, then the result of adding these two variables, x+y, also has type string.

The trouble with this rule is that it only works when adding two variables. But x+x is not itself a variable, so the rule wouldn’t work for an expression like (x+x)+(y+y). The proper way of writing the rule is, then, something like this:

$$ \cfrac {\Gamma \vdash e_1 : \text{string} \quad \Gamma \vdash e_2 : \text{string}} {\Gamma \vdash e_1 + e_2 : \text{string}} $$

This rule says:

If the two subexpressions e1 and e2 both have type string, then the result of adding these two subexpressions, e1+e2, also has type string.

Much better! We can apply this rule recursively: to get the type of (x+x)+(y+y), we consider (x+x) and (y+y) as two subexpressions, and go on to compute their types first. We can then break (x+x) into two subexpressions (x and x), and determine their type separately. Supposing that the variables x and y indeed have the type string, this tells us that (x+x) and (y+y) are both string, and therefore that the whole of (x+x)+(y+y) is a string.
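
The difference between the two rules can also be sketched as two small checkers in C++. This is an illustration under assumptions, not Bergamot: the `Term` representation, the helper names, and the variable-lookup step in the second checker (which plays the role of a separate variable rule) are all hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical mini-language: a term is either a variable or a sum.
struct Term {
    std::string var;                 // variable name (unused for sums)
    std::shared_ptr<Term> lhs, rhs;  // both set for a sum, both null for a variable
};
using TermPtr = std::shared_ptr<Term>;
using Env = std::map<std::string, std::string>;  // Gamma: variable -> type name

TermPtr mkVar(std::string name) {
    return std::make_shared<Term>(Term{std::move(name), nullptr, nullptr});
}
TermPtr mkPlus(TermPtr l, TermPtr r) {
    return std::make_shared<Term>(Term{"", std::move(l), std::move(r)});
}

// Looks a variable up in Gamma and checks that it is a string.
bool isStringVar(const Env& gamma, const Term& t) {
    if (t.lhs) return false;  // a sum is not a variable
    auto it = gamma.find(t.var);
    return it != gamma.end() && it->second == "string";
}

// The wrong rule: both operands must themselves be variables found in Gamma.
bool stringByOldRule(const Env& gamma, const Term& t) {
    return t.lhs && isStringVar(gamma, *t.lhs) && isStringVar(gamma, *t.rhs);
}

// The right rule: operands are arbitrary subexpressions, checked recursively.
// Assumption: variables are typed by looking them up in Gamma.
bool stringByNewRule(const Env& gamma, const Term& t) {
    if (!t.lhs) return isStringVar(gamma, t);
    return stringByNewRule(gamma, *t.lhs) && stringByNewRule(gamma, *t.rhs);
}
```

With x and y both bound to string, the old rule accepts x+x but rejects (x+x)+(y+y), because the operands of the outer sum are not variables; the new rule accepts both.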

What I’d really like to do is type the program in question and have the computer figure out whether my rules accept or reject this program. With my new rules, perhaps I’d get something like this:

Verifying the (x+x)+(y+y) expression using the good rule

To fully understand how the rule works to check a big expression like the above sum, I’d need to see the recursion we applied a couple of paragraphs ago. My ideal tool would display this too. For simplicity, I’ll just show the output for (1+1)+(1+1), sidestepping variables and using numbers instead. This just saves a little bit of space and visual noise.

Verifying the (1+1)+(1+1) expression using the good rule

On the other hand, since the sum of two xs and two ys doesn’t work with my old rules, maybe I wouldn’t get a valid type at all:

Verifying (unsuccessfully) the (x+x)+(y+y) expression using the old rule

More generally, I want to be able to write down some inference rules, and apply them to some programs. This way, I can see what works and what doesn’t, and when it works, which rules were used for what purposes. I also want to be able to try tweaking, removing, or adding inference rules, to see what breaks.

This brings me to the project that I’m trying to show off in this post: Bergamot!

Introducing Bergamot

A certain class of programming languages lends itself particularly well to writing inference rules and applying them to programs: logic programming. The most famous example of a logic programming language is Prolog. In logic programming languages like Prolog, we can write rules describing when certain statements should hold. The simplest rule I could write is a fact. Perhaps I’d like to say that the number 42 is a “good” number:

good(42).

Perhaps I’d then like to say that adding two good numbers together creates another good number.

good(N) :- good(A), good(B), N is A+B.

The above can be read as:

the number N is good if the numbers A and B are good, and N is the sum of A and B.

I can then ask Prolog to give me some good numbers:

?- good(X).

Prompting Prolog a few times, I get:

X = 42
X = 84
X = 126
X = 168

It’s not a huge leap from this to programming language type rules. Perhaps instead of something being “good”, we can say that it has type string. Of course, adding two strings together, as we’ve established, creates another string. In Prolog:

/* A string literal like "hello" has type string */
type(_, strlit(_), string).
/* Adding two string expressions has type string */
type(Env, plus(X, Y), string) :- type(Env, X, string), type(Env, Y, string).

That’s almost identical to our inference rules above, except that it’s written using code instead of mathematical notation! If we could just take these Prolog rules and display them as inference rules, we’d be able to “have our cake” (draw pretty inference rules like in the papers) and “eat it too” (run our rules on the computer against various programs).

This is where it gets a teensy bit hairy. It’s not that easy to embed a Prolog engine into the browser; alternatives that I’ve surveyed are either poorly documented, hard to extend, or both. Furthermore, for studying what each rule was used for, it’s nice to be able to see a proof tree: a tree made up from the rules that we used to arrive at a particular answer. Prolog engines are excellent at applying rules and finding answers, but they don’t usually provide a way to get all the rules that were used, making it harder to get proof trees.

Thus, Bergamot is a new, tiny programming language that I threw together in a couple of days. It comes as a JavaScript-based widget, and can be embedded into web pages like this one to provide an interactive way to write and explore proof trees. Here’s a screenshot of what all of that looks like:

A screenshot of a Bergamot widget with some type rules

The components of Bergamot are:

The programming language, as stated above. This language is a very simple, unification-based logic programming language.
A rule rendering system, which takes Prolog-like rules written in Bergamot and converts them into pretty LaTeX inference rules.
An Elm-based widget that you can embed into your web page, which accepts Bergamot rules and an input expression, and applies the rules to produce a result (or a whole proof tree!).

Much like in Prolog, we can write Bergamot rules that describe when certain things are true. Unlike Prolog, Bergamot requires each rule to have a name. This is common practice in programming languages literature (when we talk about rules in papers, we like to be able to refer to them by name). Below are some sample Bergamot rules, corresponding to the first few inference rules in the above screenshot.

TNumber @ type(?Gamma, intlit(?n), number) <-;
TString @ type(?Gamma, strlit(?s), string) <-;
TVar @ type(?Gamma, var(?x), ?tau) <- inenv(?x, ?tau, ?Gamma);
TPlusI @ type(?Gamma, plus(?e_1, ?e_2), number) <-
    type(?Gamma, ?e_1, number), type(?Gamma, ?e_2, number);
TPlusS @ type(?Gamma, plus(?e_1, ?e_2), string) <-
    type(?Gamma, ?e_1, string), type(?Gamma, ?e_2, string);

Unlike Prolog, where “variables” are anything that starts with a capital letter, in Bergamot, variables are things that start with the special ? symbol. Also, Prolog’s :- has been replaced with an arrow symbol <-, for reverse implication. These are both purely syntactic differences.

Demo

If you want to play around with it, here’s an embedded Bergamot widget with some rules pre-programmed in. [note: Actually, one of the rules is incorrect to my knowledge. Can you spot it? Hint: is \x : number. \x: string. x+1 well-typed? What does Bergamot report? Can you see why? ] It has two modes:

Language Term: accepts a rather simple programming language to typecheck. Try 1+1, fst((1,2)), or maybe (\x : number. x) 42.
Query: it accepts Bergamot expressions to query, similarly to Prolog; try type(empty, ?e, tpair(number, string)) to search for expressions that have the type “a pair of a number and a string”.

Rendering Bergamot with Bergamot

There’s something to be said about the conversion between Bergamot’s rules, encoded as plain text, and pretty LaTeX-based inference rules that the users see. Crucially, we don’t want to hardcode how any particular Bergamot expression is rendered. For one, this is a losing battle: we can’t possibly keep up with all the notation that people use in PL literature, and even if we focused ourselves on only “beginner” notation, there wouldn’t be one way to do it! Different PL papers and texts use slightly different variations of notation. For instance, I render my pairs as $(a, b)$, but the very first screenshot in this post demonstrates a PL paper that writes pairs as $\langle a, b \rangle$. Neither way (as far as I know!) is right or wrong. But if we hardcode one, we lose the ability to draw the other.

More broadly, one aspect about writing PL rules is that we control the notation. We are free to define shorthands, symbols, and anything else that would make reading our rules easier for others. As an example, a paper from POPL22 about programming language semantics with garbage collection used a literal trash symbol in their rules:

A rule that uses a trashcan icon as notation, from A separation logic for heap space under garbage collection by Jean-Marie Madiot and François Pottier

Thus, what I want to do is encourage the (responsible) introduction of new notation. This can only be done if Bergamot itself supports custom notation.

When thinking about how I’d like to implement this custom notation, I was imagining some sort of templated rule engine, that would define how each term in a Bergamot program can be converted to its LaTeX variant. But then I realized: Bergamot is already a rule engine! Instead of inventing yet another language or format for defining LaTeX pretty printing, I could just use Bergamot. This turned out to work quite nicely – the “Presentation Rules” tab in the demo above should open a text editor with Bergamot rules that handle the conversion of Bergamot notation into LaTeX. Here are some example rules:

LatexPlus @ latex(plus(?e_1, ?e_2), ?l) <- latex(?e_1, ?l_1), latex(?e_2, ?l_2), join([?l_1, " + ", ?l_2], ?l);
LatexPair @ latex(pair(?e_1, ?e_2), ?l) <- latex(?e_1, ?l_1), latex(?e_2, ?l_2), join(["(", ?l_1, ",  ", ?l_2, ")"], ?l);

If we change the LatexPair to the following, we can make all pairs render using angle brackets:

LatexPair @ latex(pair(?e_1, ?e_2), ?l) <- latex(?e_1, ?l_1), latex(?e_2, ?l_2), join(["\\langle", ?l_1, ",  ", ?l_2, "\\rangle"], ?l);

The LaTeX output when angle brackets are used in the rule instead of parentheses.

You can write rules about arbitrary Bergamot terms for rendering; thus, you can invent completely new notation for absolutely anything.

Next Steps

I hope to use Bergamot to write a series of articles about type systems. By providing an interactive widget, I hope to make it possible for users to do exercises: writing variations of inference rules, or even tweaking the notation, and checking them against sets of programs to make sure that they work. Of course, I also hope that Bergamot can be used to explore why an existing set of inference rules (such as Hindley-Milner) works. Stay tuned for those!

My Favorite C++ Pattern: X Macros

Sat, 14 Oct 2023 15:38:17 -0700

When I first joined the Chapel team, one pattern used in its C++-based compiler made a strong impression on me. Since then, I’ve used the pattern many more times, and have been very satisfied with how it turned out. However, it feels like the pattern is relatively unknown, so I thought I’d show it off, and some of its applications in the Chapel compiler. I’ve slightly tweaked a lot of the snippets I directly show in this article for the sake of simpler presentation; I’ve included links to the original code (available on GitHub) if you want to see the unabridged version.

Broadly speaking, the “X Macros” pattern is about generating code. If you have a lot of repetitive code to write (declaring many variables or classes, performing many very similar actions, etc.), this pattern can save a lot of time, lead to much more maintainable code, and reduce the effort required to add more code.

I will introduce the pattern in its simplest form with my first example: interning strings.

Application 1: String Interning

The Chapel compiler interns a lot of its strings. This way, it can reduce the memory footprint of keeping identifiers in memory (every string "x" is actually the same string) and make for much faster equality comparisons (you can just perform a pointer comparison!). Generally, a Context class is used to manage interning state. A new interned string can be constructed using the context object in the following manner:

UniqueString::get(ctxPtr, "the string");

Effectively, this performs a search of the currently existing unique strings. If one with the content ("the string" in this case) doesn’t exist, it’s created and registered with the Context. Otherwise, the existing string is returned. Some strings, however, occur a lot in the compiler, to the point that it would be inefficient to perform the whole “find-or-create” operation every time. One example is the "this" string, which is an identifier with a lot of special behavior in the language (much like this in languages such as Java). To support such frequent flier strings, the compiler initializes them once, and creates a variable per-string that can be accessed to get that string’s value.

There’s that repetitive code. Defining a brand new variable for each string, of which there are around 100 at the time of writing, is a lot of boilerplate. There are also at least two places where code needs to be added: once in the declaration of the variables, once in the code that initializes them. [note: A third use in the compiler is actually a variadic template defined over character arrays. The template is defined and specialized in such a way that you can refer to a variable by its string contents (i.e., you can write USTR("the string") instead of theStringVariable). ] It would be very easy to accidentally modify the former but not the latter, especially for developers not familiar with how these “common strings” are implemented.

This is where the X Macros come in. If you look around the compiler source code, there’s a header file that looks something like the following:

From all-global-strings.h, around line 31

X(align          , "align")
X(atomic         , "atomic")
X(bool_          , "bool")
X(borrow         , "borrow")
X(borrowed       , "borrowed")
X(by             , "by")
X(bytes          , "bytes")
// A lot more of these...

What’s this X thing? That right there is the essence of the pattern: the macro X isn’t defined in the header! Effectively, all-global-strings.h is just a list, and we can “iterate” over this list to generate some code for each one of its elements, in as many places as we want. What I mean by this is that we can then write code like the following:

From global-strings.h, around line 76

    struct GlobalStrings {
#define X(field, str) UniqueString field;
#include "all-global-strings.h"
#undef X
    };

In this case, we define the macro X to ignore the value of the string (we’re just declaring it here), and create a new UniqueString variable declaration. Since the declaration is inside the GlobalStrings struct, this ends up creating a field. Just like that, we’ve declared a class with over 100 fields. Initialization is equally simple:

From Context.cpp, around line 49

    GlobalStrings globalStrings;
    Context rootContext;

    static void initGlobalStrings() {
#define X(field, str) globalStrings.field = UniqueString::get(&rootContext, str);
#include "chpl/framework/all-global-strings.h"
#undef X
    }

With this, we’ve completely automated the code for both declaring and initializing all 100 of our unique strings. Adding a new string doesn’t require a developer to know all of the places where this is implemented: just by modifying the all-global-strings.h header with a new call to X, they can add both a new variable and code to initialize it. Pretty robust!
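
If you’d like to try the pattern without setting up a separate header, here is a single-file sketch. It uses a common variant in which the list is itself a macro that takes the “X” macro as an argument; that variant, the three-entry list, the use of `std::string` in place of `UniqueString`, and the names `COMMON_STRINGS`, `makeGlobalStrings`, and `fieldNames` are assumptions for illustration, not how the Chapel compiler does it.

```cpp
#include <string>
#include <vector>

// Hypothetical list of common strings. In the header-based form of the
// pattern, this list would live in its own file and be #included repeatedly.
#define COMMON_STRINGS(X) \
    X(align, "align")     \
    X(atomic, "atomic")   \
    X(bool_, "bool")

// Expansion 1: declare one field per list entry (the string value is ignored).
struct GlobalStrings {
#define DECLARE_FIELD(field, str) std::string field;
    COMMON_STRINGS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Expansion 2: initialize every field from the same list.
GlobalStrings makeGlobalStrings() {
    GlobalStrings gs;
#define INIT_FIELD(field, str) gs.field = str;
    COMMON_STRINGS(INIT_FIELD)
#undef INIT_FIELD
    return gs;
}

// Expansion 3: collect the field names using the preprocessor's # operator.
std::vector<std::string> fieldNames() {
    return {
#define FIELD_NAME(field, str) #field,
        COMMON_STRINGS(FIELD_NAME)
#undef FIELD_NAME
    };
}
```

Adding a fourth entry to `COMMON_STRINGS` updates the struct, the initializer, and the name list at once, which is the property the header-based version relies on.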

Application 2: AST Class Hierarchy

Although the interned strings are an excellent first example, they weren’t the first usage of X Macros that I encountered in the Chapel compiler. Beyond strings, the compiler uses X Macros to represent the whole class hierarchy of abstract syntax tree (AST) nodes that it uses. Here, the code is actually a bit more complicated; the class hierarchy isn’t a list like the strings were; it is itself a tree. To represent such a structure, we need more than a single X macro; the compiler went with AST_NODE, AST_BEGIN_SUBCLASSES, and AST_END_SUBCLASSES. Here’s what that looks like:

From uast-classes-list.h, around line 96

  // Other AST nodes above...

  AST_BEGIN_SUBCLASSES(Loop)
      AST_NODE(DoWhile)
      AST_NODE(While)

    AST_BEGIN_SUBCLASSES(IndexableLoop)
      AST_NODE(BracketLoop)
      AST_NODE(Coforall)
      AST_NODE(For)
      AST_NODE(Forall)
      AST_NODE(Foreach)
    AST_END_SUBCLASSES(IndexableLoop)

  AST_END_SUBCLASSES(Loop)

  // Other AST nodes below...

The class hierarchy defined in this header, called uast-classes-list.h, is used for a lot of things, both in the compiler itself and in some libraries that use the compiler. I’ll go through the use cases in turn.

Tags and Dynamic Casting

First, to deal with a general absence of RTTI, the hierarchy header is used to declare a “tag” enum. Each AST node has a tag matching its class; this allows us to inspect the AST and perform safe casts similar to dynamic_cast. Note that for parent classes (defined via AST_BEGIN_SUBCLASSES), we actually end up creating two tags: one START_... and one END_.... The reason for this will become clear in a moment.

From AstTag.h, around line 36

enum AstTag {
#define AST_NODE(NAME) NAME ,
#define AST_BEGIN_SUBCLASSES(NAME) START_##NAME ,
#define AST_END_SUBCLASSES(NAME) END_##NAME ,
#include "chpl/uast/uast-classes-list.h"
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES
  NUM_AST_TAGS,
  AST_TAG_UNKNOWN
};

The above snippet makes AstTag contain elements such as DoWhile, While, START_Loop, and END_Loop. For convenience, we also add a couple of other elements: NUM_AST_TAGS, which is automatically assigned the number of tags we generated, [note: This is because C++ assigns integer values to enum elements sequentially, starting at zero. ] and a generic “unknown tag” value.

Having generated the enum elements in this way, we can write query functions. This way, the API consumer can write isLoop(tag) instead of manually performing a comparison. Code generation here is actually split into two distinct forms of “is bla” methods: those for concrete AST nodes (DoWhile, While) and those for abstract base classes (Loop). The reason for this is simple: only an AstTag::DoWhile represents a do-while loop, but both DoWhile and While are instances of Loop. So, isLoop should return true for both.

This is where the START_... and END_... enum elements come in. Reading the header file top-to-bottom, we first end up generating START_Loop, then DoWhile and While, and then END_Loop. Since C++ assigns integer values to enum elements sequentially, to check if a tag “extends” a base class, it’s sufficient to check if its value is greater than the START token, and smaller than the END token – this means it was declared within the matching pair of AST_BEGIN_SUBCLASSES and AST_END_SUBCLASSES.

From AstTag.h, around line 59

// define is___ for leaf and regular nodes
// (not yet for abstract parent classes)
#define AST_NODE(NAME) \
  static inline bool is##NAME(AstTag tag) { \
    return tag == NAME; \
  }
#define AST_BEGIN_SUBCLASSES(NAME)
#define AST_END_SUBCLASSES(NAME)
// Apply the above macros to uast-classes-list.h
#include "chpl/uast/uast-classes-list.h"
// clear the macros
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES

// define is___ for abstract parent classes
#define AST_NODE(NAME)
#define AST_BEGIN_SUBCLASSES(NAME) \
  static inline bool is##NAME(AstTag tag) { \
    return START_##NAME < tag && tag < END_##NAME; \
  }
#define AST_END_SUBCLASSES(NAME)
// Apply the above macros to uast-classes-list.h
#include "chpl/uast/uast-classes-list.h"
// clear the macros
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES

These helpers are quite convenient. Here are a few examples of what we end up with:

isFor(AstTag::For)             // Returns true; a 'for' loop is indeed a 'for' loop.
isIndexableLoop(AstTag::For)   // Returns true; a 'for' loop is "indexable" ('for i in ...')
isLoop(AstTag::For)            // Returns true; a 'for' loop is a loop.
isFor(AstTag::While)           // Returns false; a 'while' loop is not a 'for' loop.
isIndexableLoop(AstTag::While) // Returns false; a 'while' loop uses a boolean condition, not an index
isLoop(AstTag::While)          // Returns true; a 'while' loop is a loop.
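If you'd like to try the range trick in isolation, the following is a single-file sketch under a few assumptions: AST_CLASSES is a made-up list macro standing in for uast-classes-list.h, it is trimmed to a handful of loop nodes, and the helper macro names are mine rather than the compiler's.

```cpp
#include <cassert>

// Hypothetical, trimmed stand-in for uast-classes-list.h.
#define AST_CLASSES(NODE, BEGIN, END) \
  BEGIN(Loop)                         \
    NODE(DoWhile)                     \
    NODE(While)                       \
    BEGIN(IndexableLoop)              \
      NODE(For)                       \
      NODE(Forall)                    \
    END(IndexableLoop)                \
  END(Loop)

// Tags are numbered in list order: START_Loop = 0, DoWhile = 1, and so on.
#define TAG_NODE(NAME) NAME,
#define TAG_BEGIN(NAME) START_##NAME,
#define TAG_END(NAME) END_##NAME,
enum AstTag { AST_CLASSES(TAG_NODE, TAG_BEGIN, TAG_END) NUM_AST_TAGS };

// Leaves compare for equality; parents check the START_/END_ range.
#define NOTHING(NAME)
#define IS_LEAF(NAME) \
  constexpr bool is##NAME(AstTag tag) { return tag == NAME; }
#define IS_PARENT(NAME) \
  constexpr bool is##NAME(AstTag tag) { \
    return START_##NAME < tag && tag < END_##NAME; \
  }
AST_CLASSES(IS_LEAF, IS_PARENT, NOTHING)

inline void demo() {
  assert(isFor(For) && isIndexableLoop(For) && isLoop(For));
  assert(!isFor(While) && !isIndexableLoop(While) && isLoop(While));
  // The count includes the four START_/END_ markers, not just the four nodes.
  assert(NUM_AST_TAGS == 8);
}
```

One thing the sketch makes visible: the marker tags themselves are never "inside" their own range, so isLoop(START_Loop) is false.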

On the top-level AST node class, we generate isWhateverNode and toWhateverNode for each AST subclass. Thus, user code is able to inspect the AST and perform (checked) casts using plain methods. I omit isWhateverNode here for brevity (its definition is very simple), and include only toWhateverNode.

From AstNode.h, around line 313

  #define AST_TO(NAME) \
    const NAME * to##NAME() const { \
      return this->is##NAME() ? (const NAME *)this : nullptr; \
    } \
    NAME * to##NAME() { \
      return this->is##NAME() ? (NAME *)this : nullptr; \
    }
  #define AST_NODE(NAME) AST_TO(NAME)
  #define AST_LEAF(NAME) AST_TO(NAME)
  #define AST_BEGIN_SUBCLASSES(NAME) AST_TO(NAME)
  #define AST_END_SUBCLASSES(NAME)
  // Apply the above macros to uast-classes-list.h
  #include "chpl/uast/uast-classes-list.h"
  // clear the macros
  #undef AST_NODE
  #undef AST_LEAF
  #undef AST_BEGIN_SUBCLASSES
  #undef AST_END_SUBCLASSES
  #undef AST_TO

These methods are used heavily in the compiler. For example, here’s a completely random snippet of code I pulled out:

From Resolver.cpp, around line 1161

  if (auto var = decl->toVarLikeDecl()) {
    // Figure out variable type based upon:
    //  * the type in the variable declaration
    //  * the initialization expression in the variable declaration
    //  * the initialization expression from split-init

    auto typeExpr = var->typeExpression();
    auto initExpr = var->initExpression();

    if (auto var = decl->toVariable())
      if (var->isField())
        isField = true;

Thus, developers adding new AST nodes are not required to manually implement the isWhatever, toWhatever, and other functions. This and a fair bit of other AST functionality (which I will cover in the next subsection) is automatically generated using X Macros.
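For a feel of how the pieces fit together end to end, here is a self-contained sketch of generated isWhatever and toWhatever methods on a toy hierarchy. The list macro AST_CLASSES, the Comment node, and the helper macro names are assumptions of mine; unlike the compiler's version, the casts are defined after the classes so that static_cast can be used.

```cpp
#include <cassert>

// Hypothetical stand-in for uast-classes-list.h.
#define AST_CLASSES(NODE, BEGIN, END) \
  BEGIN(Loop)                         \
    NODE(While)                       \
    NODE(For)                         \
  END(Loop)                           \
  NODE(Comment)
#define NOTHING(NAME)

namespace asttags {
#define TAG_NODE(NAME) NAME,
#define TAG_BEGIN(NAME) START_##NAME,
#define TAG_END(NAME) END_##NAME,
enum AstTag { AST_CLASSES(TAG_NODE, TAG_BEGIN, TAG_END) NUM_AST_TAGS };
}  // namespace asttags

// Forward-declare every class named in the list.
#define FORWARD_DECLARE(NAME) struct NAME;
AST_CLASSES(FORWARD_DECLARE, FORWARD_DECLARE, NOTHING)

struct AstNode {
  explicit AstNode(asttags::AstTag tag) : tag_(tag) {}

#define IS_LEAF(NAME) \
  bool is##NAME() const { return tag_ == asttags::NAME; }
#define IS_PARENT(NAME) \
  bool is##NAME() const { \
    return asttags::START_##NAME < tag_ && tag_ < asttags::END_##NAME; \
  }
  AST_CLASSES(IS_LEAF, IS_PARENT, NOTHING)

#define DECLARE_TO(NAME) const NAME* to##NAME() const;
  AST_CLASSES(DECLARE_TO, DECLARE_TO, NOTHING)

 private:
  asttags::AstTag tag_;
};

// The classes are hand-written; each constructor must pass the matching tag.
struct Loop : AstNode { using AstNode::AstNode; };
struct While : Loop { While() : Loop(asttags::While) {} };
struct For : Loop { For() : Loop(asttags::For) {} };
struct Comment : AstNode { Comment() : AstNode(asttags::Comment) {} };

// Checked casts: nullptr when the tag says the node is something else.
#define DEFINE_TO(NAME) \
  inline const NAME* AstNode::to##NAME() const { \
    return is##NAME() ? static_cast<const NAME*>(this) : nullptr; \
  }
AST_CLASSES(DEFINE_TO, DEFINE_TO, NOTHING)

inline void demo() {
  For loop;
  const AstNode* node = &loop;
  assert(node->toLoop() != nullptr && node->toFor() != nullptr);
  assert(node->toWhile() == nullptr && node->toComment() == nullptr);
}
```

Note that the generated code trusts the tag completely: if a constructor passed the wrong tag, the casts would silently misbehave.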

You haven't actually shown how the AST node classes are declared, only the tags. It seems implausible that they be generated using this same strategy - doesn't each AST node have its own different methods and implementation code?

You're right. The AST node classes are defined "as usual", and their constructors must explicitly set their tag field to the corresponding AstTag value. It's also on the person defining the new class to extend the node that they promise to extend in uast-classes-list.h.

This seems like an opportunity for bugs. Nothing is stopping a developer from returning the wrong tag, which would break the auto-casting behavior.

Yes, it's not bulletproof. Just recently, a team member found a bug in which a node was listed to inherit from AstNode, but actually inherited from NamedDecl. The toNamedDecl method would not have worked on it, even though it inherited from the class.

Still, this pattern provides the Chapel compiler with a lot of value; I will show more use cases in the next subsection, as promised.

The Visitor Pattern without Double Dispatch

The Visitor Pattern is very important in general, but it’s beyond ubiquitous for us compiler developers. It helps avoid bloating AST node classes with methods and state required for the various operations we perform on them. It also often saves us from writing AST traversal code.

Essentially, rather than adding each new operation (e.g. convert to string, compute the type, assign IDs) as methods on each AST node class, we extract this code into a per-operation visitor. This visitor is a class that has methods implementing the custom behavior on the AST nodes. A visit(WhileLoop*) method might be used to perform the operation on ‘while’ loops, and visit(ForLoop*) might do the same for ‘for’ loops. The AST nodes themselves only have a traverse method that accepts a visitor, whatever it may be, and calls the appropriate visit methods. This way, the AST node implementations remain simple and relatively stable.

As a very simple example, suppose you wanted to count the number of loops used in a program for an unspecified reason. You could add a countLoops method, but then you’ve introduced a method to the AST node API for what might be a one-time, throwaway operation. With the visitor pattern, you don’t need to do that; you can just create a new class:

struct MyVisitor {
    int count = 0;

    void visit(const Loop*) { count += 1; }
    void visit(const AstNode*) { /* do nothing for other nodes */ }
};

int countLoops(const AstNode* root) {
    MyVisitor visitor;
    root->traverse(visitor);
    return visitor.count;
}

The traverse method is a nice API, isn’t it? It’s very easy to add operations that work on your syntax trees, without modifying them. There is still an important open question, though: how does traverse know to call the right visit function?

If traverse were only defined on AstNode*, and it simply called visit(this), we’d always end up calling the AstNode version of the visit function. This is because C++ doesn’t perform dynamic dispatch based on the types of method arguments. [note: Obviously, C++ has the ability to pick the right method based on the runtime type of the receiver: that's just virtual functions and vtables. ] Statically, the call clearly accepts an AstNode, and nothing more specific. The compiler therefore picks that version of the visit method.
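A tiny experiment shows the effect. The classes and the OverloadProbe name below are made up for this sketch; the point is only that the overload is chosen from the static type of the argument.

```cpp
#include <cassert>

struct AstNode { virtual ~AstNode() = default; };
struct Loop : AstNode {};

// Records which overload was picked.
struct OverloadProbe {
  int loopCalls = 0;
  int nodeCalls = 0;
  void visit(const Loop*) { loopCalls += 1; }
  void visit(const AstNode*) { nodeCalls += 1; }
};

inline OverloadProbe probe() {
  Loop loop;
  const AstNode* asBase = &loop;  // dynamic type: Loop, static type: AstNode
  OverloadProbe p;
  p.visit(asBase);  // picks visit(const AstNode*), despite pointing at a Loop
  p.visit(&loop);   // picks visit(const Loop*): here the static type is Loop
  assert(p.nodeCalls == 1 && p.loopCalls == 1);
  return p;
}
```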

The “traditional” way to solve this problem in a language like C++ or Java is called double dispatch. Using our example as reference, this involves making each AST node class have its own traverse method. This way, calls to visit(this) have more specific type information, and are resolved to the appropriate overload. But that’s more boilerplate code: each new AST node will need to have a virtual traverse method that looks something like this:

void MyNode::traverse(Visitor& v) {
  v.visit(this);
}

It would also require all visitors to extend from Visitor. So now you have:

Boilerplate code on every AST node that looks the same but needs to be duplicated
A parent Visitor class that must have a visit method for each AST node in the language (so that children can override it).
To make it easier to write code like our MyVisitor above, the visit methods in the Visitor must be written such that visit(ChildNode*) calls visit(ParentNode*) by default. Otherwise, the Loop overload wouldn’t have been called by the DoWhile overload (e.g.).

So there’s a fair bit of tedious boilerplate, and more code to manually modify when adding an AST node: you have to go and adjust the Visitor class with a new visit stub.
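For comparison, here is roughly what the traditional double-dispatch setup looks like on a three-class toy hierarchy. This is a sketch of the general technique, not code from the Chapel compiler; all names are illustrative.

```cpp
#include <cassert>

struct AstNode;
struct Loop;
struct DoWhile;

// The base visitor needs one virtual visit per node class.
struct Visitor {
  virtual ~Visitor() = default;
  virtual void visit(const AstNode*) {}
  virtual void visit(const Loop* n);     // defaults to the AstNode overload
  virtual void visit(const DoWhile* n);  // defaults to the Loop overload
};

// Every node class repeats the same traverse body; only the type of
// `this` differs, which is what selects the right overload.
struct AstNode {
  virtual ~AstNode() = default;
  virtual void traverse(Visitor& v) const { v.visit(this); }
};
struct Loop : AstNode {
  void traverse(Visitor& v) const override { v.visit(this); }
};
struct DoWhile : Loop {
  void traverse(Visitor& v) const override { v.visit(this); }
};

// Child-to-parent defaults, defined once the classes are complete.
inline void Visitor::visit(const Loop* n) {
  visit(static_cast<const AstNode*>(n));
}
inline void Visitor::visit(const DoWhile* n) {
  visit(static_cast<const Loop*>(n));
}

struct LoopCounter : Visitor {
  int count = 0;
  using Visitor::visit;  // keep the other overloads visible
  void visit(const Loop*) override { count += 1; }
};

inline int countOneDoWhile() {
  DoWhile loop;
  const AstNode* node = &loop;
  LoopCounter counter;
  node->traverse(counter);  // DoWhile overload -> Loop overload -> count
  assert(counter.count == 1);
  return counter.count;
}
```

Even at this size, each new node class costs a traverse override, a visit declaration, and a default definition.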

The reason all of this is necessary is that everyone (myself included) agrees that code like the following is generally a bad idea:

struct AstNode {
  void traverse(Visitor& visitor) {
    if (auto forLoop = toForLoop()) {
      visitor.visit(forLoop);
    } else if (auto whileLoop = toWhileLoop()) {
      visitor.visit(whileLoop);
    } else {
      // 100 more lines like this...
    }
  }
};

After all, what happens when you add a new AST node? You’d still have to modify this list, and since everything still extends Visitor, you’d still need to add a new visit stub there. But what if there were no base class? Instead, what if traverse were a template?

struct AstNode {
  template <typename VisitorType>
  void traverse(VisitorType& visitor) {
    if (auto forLoop = toForLoop()) {
      visitor.visit(forLoop);
    } else if (auto whileLoop = toWhileLoop()) {
      visitor.visit(whileLoop);
    } else {
      // 100 more lines like this...
    }
  }
};

Note that this wouldn’t be possible to write in C++ if traverse were a virtual method; have you ever heard of a virtual template? With code like this, the VisitorType wouldn’t need to define every overload, as long as it had a version for AstNode. Furthermore, C++’s regular overload resolution rules would take care of calling the Loop overload if a more specific one for DoWhile didn’t exist.

The only problem that remains is that of having a 100-line if-else (which could be a switch to little aesthetic benefit). But this is exactly where the X Macro pattern shines again! We already have a list of all AST node classes, and the code for invoking them is nearly identical. Thus, the Chapel compiler has a doDispatch function (used by traverse) that looks like this:

From AstNode.h, around line 377

    static void doDispatch(const AstNode* ast, Visitor& v) {

      switch (ast->tag()) {
        #define CONVERT(NAME) \
          case chpl::uast::asttags::NAME: \
          { \
            v.visit((const chpl::uast::NAME*) ast); \
            return; \
          }

        #define IGNORE(NAME) \
          case chpl::uast::asttags::NAME: \
          { \
            CHPL_ASSERT(false && "this code should never be run"); \
          }

        #define AST_NODE(NAME) CONVERT(NAME)
        #define AST_BEGIN_SUBCLASSES(NAME) IGNORE(START_##NAME)
        #define AST_END_SUBCLASSES(NAME) IGNORE(END_##NAME)

        #include "chpl/uast/uast-classes-list.h"

        IGNORE(NUM_AST_TAGS)
        IGNORE(AST_TAG_UNKNOWN)

        #undef AST_NODE
        #undef AST_BEGIN_SUBCLASSES
        #undef AST_END_SUBCLASSES
        #undef CONVERT
        #undef IGNORE
      }

      CHPL_ASSERT(false && "this code should never be run");
    }

And that’s it. We have automatically generated the traversal code, allowing us to use the visitor pattern in what I think is a very elegant way. Assuming a developer adding a new AST node updates the uast-classes-list.h header, the traversal logic will be auto-modified to properly handle the new node.
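Here is a single-file sketch that puts the template and the generated switch together. As before, AST_CLASSES is a made-up list macro standing in for uast-classes-list.h, and the toy traverse only dispatches on the node itself (the toy nodes have no children to recurse into).

```cpp
#include <cassert>

// Hypothetical stand-in for uast-classes-list.h.
#define AST_CLASSES(NODE, BEGIN, END) \
  NODE(Comment)                       \
  BEGIN(Loop)                         \
    NODE(DoWhile)                     \
    NODE(While)                       \
  END(Loop)

namespace asttags {
#define TAG_NODE(NAME) NAME,
#define TAG_BEGIN(NAME) START_##NAME,
#define TAG_END(NAME) END_##NAME,
enum AstTag { AST_CLASSES(TAG_NODE, TAG_BEGIN, TAG_END) NUM_AST_TAGS };
}  // namespace asttags

struct AstNode {
  explicit AstNode(asttags::AstTag tag) : tag_(tag) {}
  template <typename VisitorType>
  void traverse(VisitorType& v) const;  // defined after the node classes
 private:
  asttags::AstTag tag_;
};
struct Comment : AstNode { Comment() : AstNode(asttags::Comment) {} };
struct Loop : AstNode { using AstNode::AstNode; };
struct DoWhile : Loop { DoWhile() : Loop(asttags::DoWhile) {} };
struct While : Loop { While() : Loop(asttags::While) {} };

template <typename VisitorType>
void AstNode::traverse(VisitorType& v) const {
  switch (tag_) {
    // One generated case per concrete node: cast, then let overload
    // resolution pick the most specific visit the visitor provides.
#define DISPATCH(NAME) \
    case asttags::NAME: v.visit(static_cast<const NAME*>(this)); return;
#define SKIP(NAME)
    AST_CLASSES(DISPATCH, SKIP, SKIP)
    default: return;  // START_/END_ markers never tag a real node
  }
}

// No base class, and no overloads for DoWhile or While.
struct LoopCounter {
  int count = 0;
  void visit(const Loop*) { count += 1; }
  void visit(const AstNode*) {}
};

inline int countLoops() {
  DoWhile a;
  While b;
  Comment c;
  const AstNode* nodes[] = {&a, &b, &c};
  LoopCounter counter;
  for (const AstNode* n : nodes) n->traverse(counter);
  assert(counter.count == 2);
  return counter.count;
}
```

The visitor never mentions DoWhile or While, yet both are counted: the generated cast gives the call a specific static type, and the Loop overload is the best match.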

Generating a Python Class Hierarchy

This is a fun one. For a while, in my spare time, I was working on Python bindings for Chapel. These bindings are oriented towards developing language tooling: it feels much easier to write a language linter, auto-formatter, or maybe even a language server in Python rather than in C++. It’s definitely much easier to use Python to develop throwaway scripts that work with Chapel programs, which is something that developers on the Chapel team tend to do quite often.

I decided I wanted the Python AST node class hierarchy to match the C++ version. This is convenient for many reasons, including being able to wrap methods on parent AST nodes and have them be available through child AST nodes and having isinstance work properly. It’s also advantageous from the point of view of conceptual simplicity. However, I very much did not want to write CPython API code to define the many AST node classes that are available in the Chapel language.

Once again, the uast-classes-list.h header came into play here. With little effort, I was able to auto-generate PyTypeObjects for each AST node in the class hierarchy:

From chapel.cpp, around line 563

#define DEFINE_PY_TYPE_FOR(NAME, TAG, FLAGS)\
  PyTypeObject NAME##Type = { \
    PyVarObject_HEAD_INIT(NULL, 0) \
    .tp_name = #NAME, \
    .tp_basicsize = sizeof(NAME##Object), \
    .tp_itemsize = 0, \
    .tp_flags = FLAGS, \
    .tp_doc = PyDoc_STR("A Chapel " #NAME " AST node"), \
    .tp_methods = (PyMethodDef*) PerNodeInfo<TAG>::methods, \
    .tp_base = parentTypeFor(TAG), \
    .tp_init = (initproc) NAME##Object_init, \
    .tp_new = PyType_GenericNew, \
  };

#define AST_NODE(NAME) DEFINE_PY_TYPE_FOR(NAME, chpl::uast::asttags::NAME, Py_TPFLAGS_DEFAULT)
#define AST_BEGIN_SUBCLASSES(NAME) DEFINE_PY_TYPE_FOR(NAME, chpl::uast::asttags::START_##NAME, Py_TPFLAGS_BASETYPE)
#define AST_END_SUBCLASSES(NAME)
#include "chpl/uast/uast-classes-list.h"
#undef AST_NODE
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES

You may have noticed that I snuck templates into the code above. The motivation there is to avoid writing out the (usually empty) Python method table for every single AST node. In particular, I have a template that, by default, provides an empty method table, which can be specialized per node to add methods when necessary. This detail is useful for application 3 below, but not necessary to understand the use of X Macros here.

I used the same < and > trick to generate the parentTypeFor each tag:

From chapel.cpp, around line 157

static PyTypeObject* parentTypeFor(chpl::uast::asttags::AstTag tag) {
#define AST_NODE(NAME)
#define AST_LEAF(NAME)
#define AST_BEGIN_SUBCLASSES(NAME)
#define AST_END_SUBCLASSES(NAME) \
  if (tag > chpl::uast::asttags::START_##NAME && tag < chpl::uast::asttags::END_##NAME) { \
    return &NAME##Type; \
  }
#include "chpl/uast/uast-classes-list.h"
#undef AST_NODE
#undef AST_LEAF
#undef AST_BEGIN_SUBCLASSES
#undef AST_END_SUBCLASSES
  return &AstNodeType;
}

A few more invocations of the uast-classes-list.h macro, and I had a working class hierarchy. I didn’t explicitly mention any AST node at all; all was derived from the Chapel compiler header. This also meant that as the language changed and the AST class hierarchy developed, the Python bindings’ code would not need to be updated. As long as it was compiled with an up-to-date version of the header, the hierarchy would match that present within the language.
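One detail worth spelling out is why a chain of range checks finds the closest parent rather than some distant ancestor. The sketch below mimics the lookup with plain strings instead of PyTypeObjects so that it compiles without the CPython headers; AST_CLASSES and parentNameFor are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for uast-classes-list.h.
#define AST_CLASSES(NODE, BEGIN, END) \
  NODE(Comment)                       \
  BEGIN(Loop)                         \
    NODE(While)                       \
    BEGIN(IndexableLoop)              \
      NODE(For)                       \
    END(IndexableLoop)                \
  END(Loop)

#define TAG_NODE(NAME) NAME,
#define TAG_BEGIN(NAME) START_##NAME,
#define TAG_END(NAME) END_##NAME,
enum AstTag { AST_CLASSES(TAG_NODE, TAG_BEGIN, TAG_END) NUM_AST_TAGS };

// Checks are emitted at each END marker. Ranges that contain a given tag
// are nested, and an inner range always ends before the outer one does,
// so the first check that matches is the closest parent.
#define NOTHING(NAME)
#define PARENT_CHECK(NAME) \
  if (START_##NAME < tag && tag < END_##NAME) return #NAME;

inline std::string parentNameFor(AstTag tag) {
  AST_CLASSES(NOTHING, NOTHING, PARENT_CHECK)
  return "AstNode";  // not inside any range: a direct child of the root
}

inline void demo() {
  assert(parentNameFor(For) == "IndexableLoop");
  assert(parentNameFor(While) == "Loop");
  // A parent class is looked up through its START_ tag, which is not
  // inside its own range, so it finds the enclosing class instead.
  assert(parentNameFor(START_IndexableLoop) == "Loop");
  assert(parentNameFor(START_Loop) == "AstNode");
}
```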

This allows for code like the following to be written in Python:

def print_decls(mod):
    """
    Print all the things declared in this Chapel module.
    """
    for child in mod:
        if isinstance(child, NamedDecl):
            print(child.name())

Application 3: CPython Method Tables and Getters

The Chapel Python bindings use the X Macro pattern another time, actually. Like I mentioned earlier, I use template specialization to reduce the amount of boilerplate code required for declaring Python objects. In particular, there’s a general method table declared as follows:

From chapel.cpp, around line 541

template <chpl::uast::asttags::AstTag tag>
struct PerNodeInfo {
  static constexpr PyMethodDef methods[] = {
    {NULL, NULL, 0, NULL}  /* Sentinel */
  };
};

Then, when I need to add methods, I use template specialization by writing something like the following:

template <>
struct PerNodeInfo<TheAstTag> {
  static constexpr PyMethodDef methods[] = {
    {"method_name", TheNode_method_name, METH_NOARGS, "Documentation string"},
    // ... more like the above ...
    {NULL, NULL, 0, NULL}  /* Sentinel */
  };
};

When reviewing a PR that added more methods to the Python bindings (by defining new TheNode_method_name functions and then including them in the method table), I noticed that the developer had added some methods but forgotten to put them into the respective table, leaving them unusable by the Python client code. This came with the additional observation that there was a moderate amount of duplication when declaring the C++ functions and then listing them in the table. The name (method_name in the code) occurred many times.

The developer who opened the PR suggested using X Macros to combine the information (declaration of function and its use in the corresponding method table) into a single list. This led to the following header file:

From method-tables.h, around line 323

CLASS_BEGIN(FnCall)
  METHOD_PROTOTYPE(FnCall, actuals, "Get the actuals of this FnCall node")
  PLAIN_GETTER(FnCall, used_square_brackets, "Check if this FnCall was made using square brackets",
               "b", return node->callUsedSquareBrackets())
CLASS_END(FnCall)

The PLAIN_GETTER macro in this case is used to define trivial getters (precluding the need for handling the Python-object-to-AST-node conversion, and other CPython-specific things), whereas the METHOD_PROTOTYPE is used to refer to methods that needed explicit implementations. With this, the method tables are generated as follows:

From chapel.cpp, around line 548

#define CLASS_BEGIN(TAG) \
  template <> \
  struct PerNodeInfo<chpl::uast::asttags::TAG> { \
    static constexpr PyMethodDef methods[] = {
#define CLASS_END(TAG) \
      {NULL, NULL, 0, NULL}  /* Sentinel */ \
    }; \
  };
#define PLAIN_GETTER(NODE, NAME, DOCSTR, TYPESTR, BODY) \
  {#NAME, NODE##Object_##NAME, METH_NOARGS, DOCSTR},
#define METHOD_PROTOTYPE(NODE, NAME, DOCSTR) \
  {#NAME, NODE##Object_##NAME, METH_NOARGS, DOCSTR},
#include "method-tables.h"

The CLASS_BEGIN generates the initial template <> header and the code up to the opening curly brace of the table definition. Then, for each method, PLAIN_GETTER and METHOD_PROTOTYPE generate the relevant entries. Finally, CLASS_END inserts the sentinel and the closing curly brace.

Another invocation of the macros in method-tables.h is used to generate the implementations of “plain getters”; this is boilerplate that I won’t get into here, since it’s pretty CPython-specific.
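To show the two-expansion idea without any CPython machinery, here is a heavily simplified sketch. MethodDef is a made-up stand-in for PyMethodDef, the FnCall struct and its fields are invented, and the macros take fewer arguments than the real ones; only the shape (one list, one expansion for functions, one for the table) mirrors the approach described above.

```cpp
#include <cassert>
#include <cstring>

struct FnCall { bool usedSquareBrackets; int numActuals; };  // toy node

// Simplified stand-in for PyMethodDef.
struct MethodDef {
  const char* name;
  int (*fn)(const FnCall&);
  const char* doc;
};

// Hypothetical stand-in for method-tables.h: the single source of truth.
#define FNCALL_METHODS(PLAIN_GETTER, METHOD_PROTOTYPE)                  \
  METHOD_PROTOTYPE(FnCall, num_actuals, "Count the actuals")            \
  PLAIN_GETTER(FnCall, used_square_brackets, "Were brackets used?",     \
               node.usedSquareBrackets ? 1 : 0)

// Expansion 1: plain getters get a full definition; other methods only
// get a declaration and must be implemented by hand.
#define DEFINE_GETTER(NODE, NAME, DOC, BODY) \
  static int NODE##Object_##NAME(const NODE& node) { return BODY; }
#define DECLARE_METHOD(NODE, NAME, DOC) \
  static int NODE##Object_##NAME(const NODE& node);
FNCALL_METHODS(DEFINE_GETTER, DECLARE_METHOD)

// The hand-written implementation for the declared method.
static int FnCallObject_num_actuals(const FnCall& node) {
  return node.numActuals;
}

// Expansion 2: every listed method lands in the table automatically.
#define GETTER_ENTRY(NODE, NAME, DOC, BODY) {#NAME, NODE##Object_##NAME, DOC},
#define METHOD_ENTRY(NODE, NAME, DOC) {#NAME, NODE##Object_##NAME, DOC},
static const MethodDef fnCallMethods[] = {
  FNCALL_METHODS(GETTER_ENTRY, METHOD_ENTRY)
  {nullptr, nullptr, nullptr}  // sentinel
};

static const MethodDef* findMethod(const char* name) {
  for (const MethodDef* m = fnCallMethods; m->name != nullptr; ++m) {
    if (std::strcmp(m->name, name) == 0) return m;
  }
  return nullptr;
}

static void demo() {
  FnCall call{true, 3};
  assert(findMethod("used_square_brackets")->fn(call) == 1);
  assert(findMethod("num_actuals")->fn(call) == 3);
}
```

With this shape, a function that is listed but never defined fails to link or compile, so a method can no longer be written and then left out of the table.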

Discussion

I’ve presented three applications of the pattern, in an order that happens to be from least to most “extreme”. It’s possible that some of these are over the line for using macros, especially for those who think of macros as unfortunate remnants of C++’s past. However, I think that what I’ve shown demonstrates the versatility of the X Macro pattern – feel free to apply it to the degree that you find appropriate.

The thing I like the most about this pattern is that the header files read quite nicely: you end up with a very declarative “scaffold” of what’s going on. The uast-classes-list.h makes for an excellent and fairly readable reference of all the AST nodes in the Chapel compiler. The method-tables.h header provides a fairly concise summary of what methods are available on what (Python) AST node.

Of course, this approach is not without its drawbacks. Drawback zero is the heavy use of macros: to the best of my knowledge, modern C++ tends to discourage the usage of macros in favor of C++-specific features. Of course, this “pure C++” preference is applicable to variable degrees in different use cases and code bases; because of this, I won’t count macros as (too much of) a drawback.

The more significant downside is that this approach introduces a lot of dependencies between source files. Any time the header changes, anything that uses any part of the code generated by the header must be recompiled. Thus, if you’re generating classes, changing any one class will “taint” any code that uses any of the generated classes. In the Chapel compiler, touching the AST class hierarchy requires a recompilation of all the AST nodes, and any compiler code that uses the AST nodes (a lot). This is because each AST node needs access to the AstTag enum, and that enum is generated from the hierarchy header.

That’s all I have for today! Thanks for reading. I hope you got something useful for your day-to-day programming out of this.

The "Is Something" Pattern in Agda

Thu, 31 Aug 2023 22:15:34 -0700

Agda is a functional programming language with a relatively Haskell-like syntax and feature set, so coming into it, I relied on my past experiences with Haskell to get things done. However, the languages are sufficiently different to leave room for useful design patterns in Agda that can’t be brought over from Haskell, because they don’t exist there. One such pattern will be the focus of this post; it’s relatively simple, but I came across it by reading the standard library code. My hope is that by writing it down here, I can save someone the trouble of recognizing it and understanding its purpose. The pattern is “unique” to Agda (in the sense that it isn’t present in Haskell) because it relies on dependent types.

In my head, I call this the IsSomething pattern. Before I introduce it, let me try to provide some motivation. I should say that this may not be the only motivation for this pattern; it’s just how I arrived at seeing its value.

Suppose you wanted to define a type class for “a type that has an associative binary operation”. In Haskell, this is the famous Semigroup class. Here’s a definition I lifted from the Haskell docs:

class Semigroup a where
  (<>) :: a -> a -> a
  a <> b = sconcat (a :| [ b ])

It says that a type a is a semigroup if it has a binary operation, which Haskell calls (<>). The language isn’t expressive enough to encode the associative property of this binary operation, but we won’t hold it against Haskell: not every language needs dependent types or SMT-backed refinement types. If we translated this definition into Agda (and encoded the associativity constraint), we’d end up with something like this:

From example.agda, lines 9 through 13

    record Semigroup (A : Set a) : Set a where
        field
            _∙_ : A → A → A

            isAssociative : ∀ (a₁ a₂ a₃ : A) → a₁ ∙ (a₂ ∙ a₃) ≡ (a₁ ∙ a₂) ∙ a₃

So far, so good. Now, let’s also encode a more specific sort of type-with-binary-operation: one where the operation is associative as before, but also has an identity element. In Haskell, we can write this as:

class Semigroup a => Monoid a where
    mempty :: a

This brings in all the requirements of Semigroup, with one additional one: an element mempty, which is intended to be the aforementioned identity element for (<>). Once again, we can’t encode the “identity element” property; I say this only to explain the lack of any additional code in the preceding snippet.

In Agda, there isn’t really a special syntax for “superclass”; we just use a field. The “transliterated” implementation is as follows:

From example.agda, lines 15 through 24

    record Monoid (A : Set a) : Set a where
        field semigroup : Semigroup A

        open Semigroup semigroup public

        field
            zero : A

            isIdentityLeft : ∀ (a : A) → zero ∙ a ≡ a
            isIdentityRight : ∀ (a : A) → a ∙ zero ≡ a

This code might require a little bit of explanation. Like I said, the base class is brought in as a field, semigroup. Then, every field of semigroup is also made available within Monoid, as well as to users of Monoid, by using an open public directive. The subsequent fields mimic the Haskell definition amended with proofs of identity.

We get our first sign of awkwardness here. We can’t refer to the binary operation very easily; it’s nested inside of semigroup, and we have to access its fields to get ahold of (∙). It’s not too bad at all – it just cost us an extra line. However, the bookkeeping of what-operation-is-where gets frustrating quickly.

I will demonstrate the frustrations in one final example. I will admit to it being contrived: I am trying to avoid introducing too many definitions and concepts just for the sake of a motivating case. Suppose you are trying to specify a type in which the binary operation has two properties (e.g. it’s a monoid and something else). Since the only two type classes I have so far are Monoid and Semigroup, I will use those; note that in this particular instance, using both is a contrivance, since the former contains the latter.

From example.agda, lines 26 through 32

    record ContrivedExample (A : Set a) : Set a where
        field
            -- first property
            monoid : Monoid A

            -- second property; Semigroup is a stand-in.
            semigroup : Semigroup A

However, there’s a problem: nothing in the above definition ensures that the binary operations of the two fields are the same! As far as Agda is concerned (as one would quickly come to realize by trying a few proofs with the code), the two operations are completely separate. One could perhaps add an equality constraint:

From example.agda, lines 26 through 34

    record ContrivedExample (A : Set a) : Set a where
        field
            -- first property
            monoid : Monoid A

            -- second property; Semigroup is a stand-in.
            semigroup : Semigroup A

            operationsEqual : Monoid._∙_ monoid ≡ Semigroup._∙_ semigroup

However, this will get tedious quickly. Proofs will need to leverage rewrites (via the rewrite keyword, or via cong) to change one of the binary operations into the other. As you build up more and more complex algebraic structures, in which the various operations are related in nontrivial ways, you start to look for other approaches. That’s where the IsSomething pattern comes in.

The `IsSomething` Pattern: Parameterizing By Operations

The pain point of the original approach is data flow. The way it’s written, data (operations, elements, etc.) flows from the fields of a record to the record itself: Monoid has to read the (∙) operation from Semigroup. The more fields you add, the more reading and reconciliation you have to do. It would be better if the data flowed the other direction: from Monoid to Semigroup. Monoid could say, “here’s a binary operation; it must satisfy these constraints, in addition to having an identity element”. To provide the binary operation to a field, we use type application; this would look something like this:

From example.agda, line 42

    isSemigroup : IsSemigroup _∙_

Here’s the part that’s not possible in Haskell: we have a record, called IsSemigroup, that’s parameterized by a value – the binary operation! This new record is quite similar to our original Semigroup, except that it doesn’t need a field for (∙): it gets that from outside. Note the additional parameter in the record header:

From example.agda, lines 37 through 38

    record IsSemigroup {A : Set a} (_∙_ : A → A → A) : Set a where
        field isAssociative : ∀ (a₁ a₂ a₃ : A) → a₁ ∙ (a₂ ∙ a₃) ≡ (a₁ ∙ a₂) ∙ a₃

We can define an IsMonoid similarly:

From example.agda, lines 40 through 47

    record IsMonoid {A : Set a} (zero : A) (_∙_ : A → A → A) : Set a where
        field
            isSemigroup : IsSemigroup _∙_

            isIdentityLeft : ∀ (a : A) → zero ∙ a ≡ a
            isIdentityRight : ∀ (a : A) → a ∙ zero ≡ a

        open IsSemigroup isSemigroup public

We want to make an “is” version for each algebraic property; this way, if we want to use “monoid” as part of some other structure, we can pass it the required binary operation the same way we passed it to IsSemigroup. Finally, the contrived motivating example from above becomes:

From example.agda, lines 49 through 55

    record IsContrivedExample {A : Set a} (zero : A) (_∙_ : A → A → A) : Set a where
        field
            -- first property
            monoid : IsMonoid zero _∙_

            -- second property; Semigroup is a stand-in.
            semigroup : IsSemigroup _∙_

Since we passed the same operation to both IsMonoid and IsSemigroup, we know that we really do have a single operation with both properties, no strange equality witnesses or anything necessary.

Of course, these new records are not quite equivalent to our original ones. They need to be passed a binary operation; a “complete” package should include the binary operation in addition to its properties encoded as IsSemigroup or IsMonoid. Such a complete package would be more-or-less equivalent to our original Semigroup and Monoid instances. Here’s what that would look like:

From example.agda, lines 57 through 66

    record Semigroup (A : Set a) : Set a where
        field
            _∙_ : A → A → A
            isSemigroup : IsSemigroup _∙_

    record Monoid (A : Set a) : Set a where
        field
            zero : A
            _∙_ : A → A → A
            isMonoid : IsMonoid zero _∙_

Agda calls records that include both the operation and its IsSomething record bundles (see Algebra.Bundles, for example). Notice that the bundles don’t rely on other bundles to define properties; that would lead right back to the “bottom-up” data flow in which a parent record has to access the operations and values stored in its fields. However, bundles do sometimes “contain” (via a definition, not a field) smaller bundles, in case, for example, you need only a semigroup, but you have a monoid.

Bonus: Using Parameterized Modules to Avoid Repetitive Arguments

One annoying thing about our definitions above is that we had to accept our binary operation, and sometimes the zero element, as an argument to each one, and to thread it through to all the fields that require it. Agda has a nice mechanism to help alleviate some of this repetition: parameterized modules. We can define a whole module that accepts the binary operation as an argument; it will be implicitly passed as an argument to all of the definitions within. Thus, our entire IsMonoid, IsSemigroup, and IsContrivedExample code could look like this:

From example.agda, lines 68 through 87

module ThirdAttempt {A : Set a} (_∙_ : A → A → A) where
    record IsSemigroup : Set a where
        field isAssociative : ∀ (a₁ a₂ a₃ : A) → a₁ ∙ (a₂ ∙ a₃) ≡ (a₁ ∙ a₂) ∙ a₃

    record IsMonoid (zero : A) : Set a where
        field
            isSemigroup : IsSemigroup

            isIdentityLeft : ∀ (a : A) → zero ∙ a ≡ a
            isIdentityRight : ∀ (a : A) → a ∙ zero ≡ a

        open IsSemigroup isSemigroup public

    record IsContrivedExample (zero : A) : Set a where
        field
            -- first property
            monoid : IsMonoid zero

            -- second property; Semigroup is a stand-in.
            semigroup : IsSemigroup

The more IsSomething records you declare, the more effective this trick becomes.

Conclusion

That’s all I have! The pattern I’ve described shows up all over the Agda standard library; the example that made me come across it was the Algebra.Structures module. I hope you find it useful.

Happy (dependently typed) programming!

Proving My Compiler Code Incorrect With Alloy

Sun, 04 Jun 2023 21:56:00 -0700

Disclaimer: though “my compiler code” makes for a fun title, I do not claim exclusive credit for any of the C++ code in the Chapel compiler that I mention in this post. The code is “mine” in the sense that I was debugging changes I was making, and perhaps also in the sense that I was working with it.

I work as a compiler developer on the Chapel team. Recently, while thinking through a change to some code, I caught myself making wishes: “if only I could have a computer check this property for me”. Having at some point seen Hillel Wayne’s post about the release of Alloy 6, I thought I’d give it a go. In this post, I describe my experience applying Alloy to a real part of the Chapel compiler. I’d never touched Alloy before this, so be warned: this is what I came up with on my own attempt, and I may well be doing something fairly silly by the standards of “real” Alloy users.

The Problem at Hand

One of the things that a language like Chapel has to do is called resolution, which is the process of figuring out what each identifier, like x, refers to, and what its type is. Even the first part of that is pretty complicated, what with public and private variables, methods (which can be declared outside of their receiver type in Chapel), and more…

Scope resolution in Chapel is further complicated by the fact that the same scope might need to be searched multiple times, in different contexts. Let me start with a few examples to illustrate what I mean. Here’s the first program:

module M {
    class C {}

    // A regular procedure (not a method)
    proc foo() {}

    // A method on C.
    proc C.foo() {}

    // Another method on C.
    proc C.doSomething() {
        foo();
    }
}

If you don’t know Chapel (and you probably don’t!) this program already merits a fair bit of explanation. The next three paragraphs explain the language features used in the program above; feel free to skip them if you already know Chapel.

A module in Chapel (declared via a module keyword) is just a collection of definitions. Such definitions could include variables, methods, classes and more. Putting them in a module helps group them.

A class in Chapel (declared via a class keyword) is much like a class in object oriented languages. The class C that we’re creating on line 2 doesn’t have any fields or methods – at least not yet. We will, however, add methods to it using Chapel’s secondary method mechanism (more on that in a moment).

The proc keyword is used to create functions and methods. On line 5, we create a procedure called foo that does nothing. On line 8, because we write C.foo instead of just foo, we’re actually creating a method on the class C we declared earlier. This method does nothing too. Notice that although declaring classes in Chapel works about the same as declaring classes in other languages, it’s fairly unusual to be able to declare a class method (like the foo on line 8 in this case) outside of the class C { ... } section of code. This is part of the reason that Chapel method resolution is complicated (methods can be declared anywhere!). The only other language that I know of that supports this feature is Kotlin with its extension function mechanism, but it’s possible that other languages have similar functionality.

The interesting part of the snippet is the body of the doSomething method. It has a call to foo: but which foo is it referring to? There are two: the regular procedure (non-method) foo, declared on line 5, and the method C.foo declared on line 8. In Chapel, the rules dictate that when such a situation arises, and a fitting method is found, the method is preferred to the non-method. In the rewritten version of the Chapel compiler, titled Dyno, this disambiguation is achieved by first searching the scopes visible from the class C for methods only. In this particular example, the two scopes searched will be:

The inside of class C. The class itself doesn’t define any methods, so nothing is found.
The module in which C is defined (M in this case). This module does have a method, the one on line 8, so that one is returned.

Only if methods are not found are non-methods considered. In this situation, the search order will be as follows:

The inside of C.doSomething will be searched. doSomething doesn’t declare anything, so the search will come up empty.
The module in which C.doSomething is defined (M again) will be searched. This time, both methods and non-methods will be considered. Since we’re considering a hypothetical situation in which the method C.foo isn’t there (otherwise it would’ve been found earlier), the only thing that will be found will be the non-method foo.

Notice that we’ve already had to search the module M twice, looking for different things each time. First, we were looking for only methods, but later, we were looking for anything. However, this isn’t as complicated as things can get. The simplifying aspect of this program is that both doSomething and C are defined inside the module M, and therefore have access to its private methods and procedures. If we extracted C.doSomething into its own separate module, the program would look like this.

module M1 {
    class C {}

    // A regular procedure (not a method)
    proc foo() {}

    // A method on C.
    proc C.foo() {}
}
module M2 {
    use super.M1;

    // Another method on C.
    proc C.doSomething() {
        foo();
    }
}

Since doSomething is now in another module, it can’t just access the foos from M1 willy-nilly. There are a few ways to get the things that were declared in another module out and make use of them. I opted for a use statement, which, in its simplest form, just brings all the declarations inside the used module into the current scope. Thus, the use statement on line 11 would bring all things declared in M1 into the scope inside M2. There’s a catch, though. Since M2 is not declared inside M1, a use statement will not be able to bring in private symbols from M1 (they’re private for a reason!). So, this time, when searching the scope for M1, we will have to search only for public symbols. That’s another, different way of searching M1. So far, we’ve seen three:

Search M1 for any symbol.
Search M1 for methods only.
Search M1 for public symbols only.

Dyno introduces more ways to search within a scope, including combinations of search types, such as looking only for public methods. To represent the various search configurations, the Dyno team came up with using a bitfield of flags, each of which indicated a necessary condition for a symbol to be returned. A bitfield with flags set for two properties (like “public” and “method”) requires that both such properties be found on each symbol that’s returned from a scope. This led to C++ code along the lines of:

auto allPublicSymbols = Flags::PUBLIC;
auto allPublicMethods = Flags::PUBLIC | Flags::METHOD;

It also turned out to be convenient to add negative versions of each flag (NOT_PUBLIC for private symbols, NOT_METHOD for regular old procedures and other definitions, and so on). So, some other possible flag combinations include:

auto allNonMethods = Flags::NOT_METHOD;
auto privateMethods = Flags::NOT_PUBLIC | Flags::METHOD;

Given these flags, there are some situations in which checking a scope a second time is redundant, in that it is guaranteed to find no additional symbols. For instance, if you search a scope for all public symbols, and then subsequently search for all public methods, you will only find duplicates – after all, all public methods are public symbols. Most generally, this occurs when a second search has all the flags from a previous search, and maybe more. In math lingo, if the set of flags checked the first time is a subset of the set of flags checked the second time, the second search is guaranteed not to find anything new.

In Dyno, we like to avoid additional work when we can. To do so, we track which scopes have already been searched, and avoid searching them again. Since what comes up from a search depends on the flags, we store the flags alongside the scopes we’ve checked. If we find that the previously-checked bitfield is a subset of the current bitfield, we just skip the search.
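To make the subset check concrete, here’s a minimal sketch in C++. It is not Dyno’s actual code: the Flags alias and the isRedundantSearch helper are hypothetical names, and the exact bit values only matter in that each flag occupies its own bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Dyno's flags; each flag occupies exactly one bit.
using Flags = std::uint32_t;
constexpr Flags PUBLIC = 1;
constexpr Flags METHOD = 64;

// True if every flag required by `previous` is also required by `current`.
// `current` is then at least as restrictive as `previous`, so searching the
// same scope again cannot find anything that the earlier search missed.
constexpr bool isRedundantSearch(Flags previous, Flags current) {
  return (previous & current) == previous;
}

void demoRedundantSearch() {
  // "All public symbols", then "all public methods": the second is redundant.
  assert(isRedundantSearch(PUBLIC, PUBLIC | METHOD));
  // The other way around, public non-methods would still be new.
  assert(!isRedundantSearch(PUBLIC | METHOD, PUBLIC));
}
```

With one bit per flag, “the previous flags are a subset of the current flags” boils down to a single mask-and-compare.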

But then, what if it isn’t a subset? Another concern here is avoiding duplicate results (it’s easier to check for duplicate definitions if you know a symbol is only returned from a search once). So, another feature of Dyno’s scope search is an additional bitfield of what to exclude, which we set to be the previous search’s filter. So if the current search is supposed to look for symbols matching description $A$, and the previous search already looked for symbols matching description $B$, then really we do a search for $A \land \lnot B$ (that is, $A$ and not $B$).

Hold on, why do you need a whole other bitfield? There are already negated versions of each flag available. Can't you just add those to the filter? Good question. The difference is a little bit tricky. If we just negated each flag, we'd turn an expression like $A \land B$ into $\lnot A \land \lnot B$. However, according to De Morgan's laws, the proper negation of $A \land B$ is $\lnot A \lor \lnot B$ (notice the use of "or" instead of "and"). On the other hand, using an "exclude" bitfield negates the whole conjunction, rather than the individual flags, and so gives us the result we need.
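Here’s a small sketch of that difference in C++. It rests on an assumption that I should state loudly: I’m describing each symbol by a bitfield in which exactly one flag of every positive/negative pair is set. The matches and passesExclude helpers are hypothetical names for illustration, not Dyno’s API.

```cpp
#include <cassert>
#include <cstdint>

using Flags = std::uint32_t;
constexpr Flags PUBLIC = 1, NOT_PUBLIC = 2, METHOD = 64, NOT_METHOD = 128;

// Assumption: a symbol carries exactly one flag of each positive/negative pair.
constexpr Flags PRIVATE_METHOD = NOT_PUBLIC | METHOD;
constexpr Flags PUBLIC_METHOD = PUBLIC | METHOD;

// A filter is a conjunction: the symbol must carry every flag in the filter.
constexpr bool matches(Flags symbol, Flags filter) {
  return (symbol & filter) == filter;
}

// Negating the whole conjunction: the symbol passes if it fails at least one
// conjunct, i.e. !(A && B) == (!A || !B). Note that an empty `exclude` would
// reject every symbol; this sketch assumes a non-empty exclude bitfield.
constexpr bool passesExclude(Flags symbol, Flags exclude) {
  return !matches(symbol, exclude);
}

void demoExclude() {
  const Flags previous = PUBLIC | METHOD;         // earlier search: public methods
  const Flags flipped = NOT_PUBLIC | NOT_METHOD;  // each flag negated on its own
  // A private method was not found by the earlier search, so it must survive.
  assert(passesExclude(PRIVATE_METHOD, previous));
  // Flipping individual flags asks for "private AND non-method" and loses it.
  assert(!matches(PRIVATE_METHOD, flipped));
  // A public method was already found, so the exclude bitfield filters it out.
  assert(!passesExclude(PUBLIC_METHOD, previous));
}
```

The private method is the interesting case: it fails only one of the two conjuncts, which is enough for “not (public and method)”, but not enough for “not public and not method”.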

One last thing: what happens if there were two previous searches? What we need is to somehow combine the two filters into one. Taking a cue from a previous example, in which “public” was followed by “public methods”, we can observe that since the second search has additional flags, it’s more restrictive, and thus guaranteed not to find anything new. So we try to create the least restrictive bitfield possible, by taking an intersection of the flags used.

Actually, that last point is not quite correct in every possible case (taking the intersection is not always the right thing to do). However, running the code through our test suite, we did not notice any cases in which it misbehaved. So, noting the potential issue in a comment, we moved on to other things.

That is, until I decided that it was time to add another possible flag to the bitfield. At that point, sitting and trying to reason about the possible cases, I realized that it would be much nicer to describe this mathematically, and have a model checker generate outlandish scenarios for me.

Modeling Flags and Bitsets in Alloy

Flags are represented on the C++ side as an enum (with custom indexing so as to make each flag be exactly one bit). I checked, and it looked like Alloy had an enum feature, too! I started off by making an enum of the flags I wanted to play with.

From DynoAlloy.als, line 1

enum Flag {Method, MethodOrField, Public}

We haven’t seen the MethodOrField flag, but it’s an important one. It turns out that it’s much more common to look for anything that could be part of a class, rather than just its methods. This flag is itself an “or” of two properties (something being a method and something being a class field). Note that this is not the same as having two flags, Method and Field, and always including them together (because that would be an “and”, not an “or”).

Notice also that the list of flags doesn’t include the negative versions. Since the negative versions are one-for-one with the positive ones, I instead chose to represent bitfields as simply two sets: one set of “positive” flags, in which the presence of e.g. Method indicates that the METHOD flag was set, and one set of “negative” flags, in which the presence of Method indicates that NOT_METHOD was set. This way, I’m guaranteed that there’s a positive and negative version of each flag, automatically. Here’s how I wrote that in Alloy.

From DynoAlloy.als, lines 6 through 9

sig Bitfield {
    , positiveFlags: set Flag
    , negativeFlags: set Flag
}

This definition (a signature in Alloy terms) specifies what a bitfield is like, but not any operations on it. My next order of business is to define some common functionality on bitfields. Alloy is all about relations and predicates, so for all of these, I had to effectively write something that checks if some condition holds for some arguments. This might seem abstract; as an example, here’s bitfieldEmpty, which checks that a bitfield has no flags set.

From DynoAlloy.als, lines 26 through 28

pred bitfieldEmpty[b: Bitfield] {
    #b.positiveFlags = 0 and #b.negativeFlags = 0
}

The # operator in Alloy is used to check the size of a set. So, to check if a bitfield is empty, I simply check if there are neither positive nor negative flags. Probably the most unusual aspect of this piece of code is that equality is written as =, as opposed to == like in most common languages. This is because, like I said, Alloy is all about relations and predicates, and not at all about imperative manipulation of data. So, there’s no need to reserve = for assignment.

The next step from here is a predicate that accepts two arguments, bitfieldEqual. As its name suggests, this predicate accepts two bitfields, and makes sure they have exactly the same flags set.

From DynoAlloy.als, lines 30 through 32

pred bitfieldEqual[b1: Bitfield, b2: Bitfield] {
    b1.positiveFlags = b2.positiveFlags and b1.negativeFlags = b2.negativeFlags
}

So far, this has been pretty similar to just writing boolean functions in a language like C++. However, the similarity is only superficial. An easy way to see that is to try to determine the intersection of two bitfields – that’s the operation we will have to model, since the Dyno implementation uses & to combine filter sets. In a language like C++, you might write a function like the following, in which you accept two bitfield arguments and return a new bitfield.

Bitfield intersection(Bitfield b1, Bitfield b2) { /* ... */ }

However, in Alloy, you can’t create a new bitfield, nor return something from a pred that isn’t a boolean. Instead, you describe how the inputs will be related to the output. So, to model a binary function, you end up with a three-parameter predicate: two inputs, and one output. But how does the output of a bitfield intersection connect to the two operands being intersected? Well, its two flag sets will be intersections of the flag sets of the inputs!

From DynoAlloy.als, lines 34 through 37

pred bitfieldIntersection[b1: Bitfield, b2: Bitfield, b3: Bitfield] {
    b3.positiveFlags = b1.positiveFlags & b2.positiveFlags
    b3.negativeFlags = b1.negativeFlags & b2.negativeFlags
}
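For comparison, here’s a sketch of both styles side by side in C++. The Bitfield struct (two std::set members) and the intersect helper are my own stand-ins for illustration; the C++ side of Dyno uses raw bits, not sets.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

enum class Flag { Method, MethodOrField, Public };

struct Bitfield {
  std::set<Flag> positiveFlags;
  std::set<Flag> negativeFlags;
};

// Helper: plain set intersection.
std::set<Flag> intersect(const std::set<Flag>& a, const std::set<Flag>& b) {
  std::set<Flag> out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.begin()));
  return out;
}

// Function style: build and return a new bitfield.
Bitfield intersection(const Bitfield& b1, const Bitfield& b2) {
  return Bitfield{intersect(b1.positiveFlags, b2.positiveFlags),
                  intersect(b1.negativeFlags, b2.negativeFlags)};
}

// Relation style, like the Alloy predicate: nothing is built. We only answer
// whether b3 is related to b1 and b2 in the required way.
bool bitfieldIntersection(const Bitfield& b1, const Bitfield& b2,
                          const Bitfield& b3) {
  return b3.positiveFlags == intersect(b1.positiveFlags, b2.positiveFlags) &&
         b3.negativeFlags == intersect(b1.negativeFlags, b2.negativeFlags);
}

void demoIntersection() {
  const Bitfield publicMethods{{Flag::Public, Flag::Method}, {}};
  const Bitfield publicNonMethods{{Flag::Public}, {Flag::Method}};
  const Bitfield result = intersection(publicMethods, publicNonMethods);
  // The function's output is exactly the value the relation accepts.
  assert(bitfieldIntersection(publicMethods, publicNonMethods, result));
  assert(!bitfieldIntersection(publicMethods, publicNonMethods, publicMethods));
}
```

The relational version never constructs anything, which is what lets a tool like Alloy go looking for a b3 that satisfies it.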

Next, let’s talk about what flags do. They are used to include and exclude symbols based on certain properties. One property is being a method: a METHOD flag requires this property, whereas a NOT_METHOD flag ensures that a symbol does not have it. Another property is being a public definition: if a symbol isn’t public, it’ll be ignored by searches with the PUBLIC flag set. Just like a bitfield can have multiple flags, a symbol can have multiple properties (e.g., a public method). Unlike our bitfields, though, we won’t be modeling symbols as having both positive and negative properties. That is to say, we won’t have a “not public” property: the absence of the “public” property will be enough to make something private. Here’s the Alloy definition for everything I just said:

From DynoAlloy.als, lines 59 through 63

enum Property { PMethod, PField, PPublic }

sig Symbol {
    properties: set Property
}

Now, we can specify how flags in a bitfield relate to properties on a symbol. We can do so by saying which flags match which properties. The Method flag, for instance, will be satisfied by the PMethod property. The MethodOrField flag is more lenient, and will be satisfied by either PMethod or PField. Here’s a predicate flagMatchesProperty that encodes all the flag-property combinations:

From DynoAlloy.als, lines 65 through 69

pred flagMatchesProperty[flag: Flag, property: Property] {
    (flag = Method and property = PMethod) or
    (flag = MethodOrField and (property = PMethod or property = PField)) or
    (flag = Public and property = PPublic)
}

A bitfield matching a symbol is a little bit more complicated. Said informally, the condition for a bitfield matching a symbol is twofold:

Every single positive flag, like METHOD, must be satisfied by a property on the symbol.
None of the negative flags, like NOT_METHOD, may be satisfied by a property on the symbol (that is to say, if Method is in the negative flags set, then the symbol must not have the PMethod property). It is more convenient to formulate this – equivalently – as follows: for each negative flag, there must not be a property that satisfies it.

Each of the above two conditions translates quite literally into Alloy:

From DynoAlloy.als, lines 71 through 74

pred bitfieldMatchesProperties[bitfield: Bitfield, symbol: Symbol] {
    all flag: bitfield.positiveFlags | some property: symbol.properties | flagMatchesProperty[flag, property]
    all flag: bitfield.negativeFlags | no property: symbol.properties | flagMatchesProperty[flag, property]
}

We can read line 73 as “for each flag in a bitfield’s positive flags, there must be some property in the symbol that matches it”. Similarly, line 74 can be read out loud as “for each flag in the negative flags, no property in the symbol must match it”.
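If the quantifiers feel unfamiliar, it may help to see the same two conditions spelled out in C++. This is only a sketch for comparison: the structs below are hypothetical mirrors of the Alloy signatures, and all, some and no become std::all_of, std::any_of and std::none_of.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

enum class Flag { Method, MethodOrField, Public };
enum class Property { PMethod, PField, PPublic };

struct Bitfield {
  std::set<Flag> positiveFlags;
  std::set<Flag> negativeFlags;
};

struct Symbol {
  std::set<Property> properties;
};

bool flagMatchesProperty(Flag flag, Property property) {
  switch (flag) {
    case Flag::Method:
      return property == Property::PMethod;
    case Flag::MethodOrField:
      return property == Property::PMethod || property == Property::PField;
    case Flag::Public:
      return property == Property::PPublic;
  }
  return false;
}

bool bitfieldMatchesProperties(const Bitfield& bitfield, const Symbol& symbol) {
  const auto& props = symbol.properties;
  // Does at least one property of the symbol satisfy this flag?
  auto satisfied = [&](Flag flag) {
    return std::any_of(props.begin(), props.end(), [&](Property p) {
      return flagMatchesProperty(flag, p);
    });
  };
  // "all flag: positiveFlags | some property | flagMatchesProperty[...]"
  const bool positivesOk = std::all_of(bitfield.positiveFlags.begin(),
                                       bitfield.positiveFlags.end(), satisfied);
  // "all flag: negativeFlags | no property | flagMatchesProperty[...]"
  const bool negativesOk = std::none_of(bitfield.negativeFlags.begin(),
                                        bitfield.negativeFlags.end(), satisfied);
  return positivesOk && negativesOk;
}

void demoMatching() {
  const Symbol publicMethod{{Property::PPublic, Property::PMethod}};
  const Bitfield publicMethods{{Flag::Public, Flag::Method}, {}};
  const Bitfield privateOnly{{}, {Flag::Public}};
  assert(bitfieldMatchesProperties(publicMethods, publicMethod));
  assert(!bitfieldMatchesProperties(privateOnly, publicMethod));
}
```

Note that “no negative flag is satisfied” and “for each negative flag, no property satisfies it” are the same check, which is why the second condition can be written with a single std::none_of.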

We’ve written a fair bit of Alloy. If you’re anything like me, you might be getting a bit twitchy: how do we even check that any of this works? For this, we’ll need to run our model. We will give Alloy a claim, and ask it to find a situation in which that claim holds true. The simplest claim is “there exists a bitfield”.

From DynoAlloy.als, lines 76 through 78

bitfieldExists: run {
    some Bitfield
}

Executing this model yields a pretty interesting bitfield: one in which every single flag is set – both the positive and negative versions.

Alloy’s output satisfying “a bit field exists”

That’s a little bit ridiculous: this bitfield will never match anything! You can’t be and not be a method at the same time, for instance. For a more interesting example, let’s ask for a bitfield that matches some symbol.

From DynoAlloy.als, lines 80 through 82

matchingBitfieldExists: run {
    some bitfield : Bitfield, symbol : Symbol | bitfieldMatchesProperties[bitfield, symbol]
}

The output here is pretty interesting too. Alloy finds a symbol and a bitfield that matches it, but they’re both empty. In effect, it said: “if you don’t specify any filters, any private definition will match”. Fair enough, of course, but a curious departure from the previous maximalist “put in all the flags!” approach.

Alloy’s output satisfying “a bit field that matches a symbol exists”

Let’s try to nudge it towards a more interesting case. I’m going to ask for a filter with one positive and one negative flag, and a symbol with two properties.

From DynoAlloy.als, lines 84 through 91

matchingBitfieldExists2: run {
    some bitfield : Bitfield, symbol : Symbol {
        #bitfield.positiveFlags = 1
        #bitfield.negativeFlags = 1
        #symbol.properties = 2
        bitfieldMatchesProperties[bitfield, symbol]
    }
}

The results are more interesting this time: we get a filter for private methods, and a private symbol that was… both a field and a method?

Alloy’s spiced up output satisfying “a bit field that matches a symbol exists”

We never told Alloy that a symbol can’t be both a field and a method. It had no idea what the flags meant, just that they exist. To let Alloy know what we do – that the two properties are incompatible – we can use a fact. To me, the most natural way of phrasing this is “there is never a symbol that has both the method and field properties”. Alas, Alloy doesn’t have a never keyword; it only has always. So I opt instead for an alternative formulation: “there are always zero symbols that are both methods and fields”. In Alloy, the claim looks like this:

From DynoAlloy.als, lines 93 through 98

fact "method and field are incompatible" {
    always no symbol: Symbol | {
        PMethod in symbol.properties and PField in symbol.properties
    }
}

Re-running the example program with this fact, Alloy spits out a filter for public non-method symbols, and a symbol that’s a public field. Public fields also aren’t a thing in Chapel (all fields in a class are publicly readable in the current version of the language). Perhaps it’s time for another fact.

From DynoAlloy.als, lines 99 through 103

fact "public and field are incompatible" {
    always no symbol: Symbol | {
        PPublic in symbol.properties and PField in symbol.properties
    }
}

But now, Alloy fails to come up with anything at all. That makes sense: by restricting the search to a symbol with two properties, and making PField incompatible with the other two possible properties, we’ve guaranteed that our symbol would be a public method. But then, we also required a negative flag in the filter; however, all the flags in the list match a public method, so making any of them negative would guarantee that our symbol would not be found. Let’s change the example up a bit to only ask for positive flags.

From DynoAlloy.als, lines 105 through 111

matchingBitfieldExists3: run {
    some bitfield : Bitfield, symbol : Symbol {
        #bitfield.positiveFlags = 2
        #symbol.properties = 2
        bitfieldMatchesProperties[bitfield, symbol]
    }
}

This time, Alloy gives us a symbol that’s a public method, and a filter that only looks for public methods. Fair enough.

Alloy’s spiced up output satisfying “a bit field that matches a symbol exists”

Exploring Possible Search Configurations

So now we have a description of filters and symbols in scopes. The next thing on the itinerary is modeling how the filters (include and exclude) are configured during scope search in Dyno. For this, let’s take a look at the C++ code in Dyno.

I’ll be using the branch that I was working on at the time of trying to apply Alloy. First, here’s the code in C++ that defines the various flags I’d be working with (though I’ve omitted flags that are not currently used in the implementation).

From scope-types.h, around line 45

  enum {
    /** Public */
    PUBLIC = 1,
    /** Not public (aka private) */
    NOT_PUBLIC = 2,
    /** A method or field declaration */
    METHOD_FIELD = 4,
    /** Something other than (a method or field declaration) */
    NOT_METHOD_FIELD = 8,
    /** A method declaration */
    METHOD = 64,
    /** Something other than a method declaration */
    NOT_METHOD = 128,
  };

These are the flags that we model using a Bitfield: PUBLIC, METHOD_FIELD, and METHOD are modeled using positiveFlags, and NOT_PUBLIC, NOT_METHOD_FIELD, and NOT_METHOD are modeled using negativeFlags. There are a lot of flags here, and it’s not hard to imagine that some combination of these flags will cause problems in our system (particularly when we know it’s an approximation). However, the flags aren’t used arbitrarily; in fact, it wasn’t too hard to track down the most important place in the code where bitfields are built.

From scope-queries.cpp, around line 914

  IdAndFlags::Flags curFilter = 0;
  /* ... some unrelated code ... */
  if (skipPrivateVisibilities) {
    curFilter |= IdAndFlags::PUBLIC;
  }
  if (onlyMethodsFields) {
    curFilter |= IdAndFlags::METHOD_FIELD;
  } else if (!includeMethods && receiverScopes.empty()) {
    curFilter |= IdAndFlags::NOT_METHOD;
  }

The above code converts the current search parameters into Bitfield flags. For instance, if a use statement is being processed that doesn’t have access to private fields, skipPrivateVisibilities will be set. On the other hand, if the calling code didn’t explicitly ask for methods, and if there’s no method receiver, then the last condition will be true. These various conditions are converted into bits and applied to curFilter. Then, curFilter is used for looking up symbols in a scope.

It’s not too hard to model this by just looking at the code, and enumerating the possibilities. The first if statement can either be true or false, and then the subsequent if-else chain creates three possibilities in each case: either METHOD_FIELD is set, or NOT_METHOD, or nothing.
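That enumeration can be sketched directly in C++ by wrapping the snippet in a function and feeding it every combination of inputs. The buildFilter and possibleFilters names are hypothetical, and nonMethodsOnly is my stand-in for the whole condition !includeMethods && receiverScopes.empty().

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <set>

using Flags = std::uint32_t;
constexpr Flags PUBLIC = 1, METHOD_FIELD = 4, NOT_METHOD = 128;

// Mirrors the if / else-if chain from the snippet above. `nonMethodsOnly` is a
// hypothetical stand-in for `!includeMethods && receiverScopes.empty()`.
Flags buildFilter(bool skipPrivateVisibilities, bool onlyMethodsFields,
                  bool nonMethodsOnly) {
  Flags curFilter = 0;
  if (skipPrivateVisibilities) {
    curFilter |= PUBLIC;
  }
  if (onlyMethodsFields) {
    curFilter |= METHOD_FIELD;
  } else if (nonMethodsOnly) {
    curFilter |= NOT_METHOD;
  }
  return curFilter;
}

// Try every combination of inputs and collect the distinct results.
std::set<Flags> possibleFilters() {
  std::set<Flags> result;
  for (bool skipPrivate : {false, true}) {
    for (bool onlyMethodsFields : {false, true}) {
      for (bool nonMethodsOnly : {false, true}) {
        result.insert(
            buildFilter(skipPrivate, onlyMethodsFields, nonMethodsOnly));
      }
    }
  }
  return result;
}

void demoPossibleFilters() {
  // Two outcomes for the first if, times three for the if / else-if chain.
  assert(possibleFilters().size() == 6);
}
```

Eight input combinations collapse into six distinct filters, because nonMethodsOnly is ignored whenever onlyMethodsFields is set.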

However, I envisioned this condition to possibly grow in complexity as more search configurations became necessary (indeed, the NOT_METHOD option was an addition in my new branch). I therefore chose to model the possible Bitfield values more faithfully, by mimicking the imperative C++ code.

Wait, something sounds off. Just earlier, you said Alloy "is not at all about imperative manipulation of data". But now, we're going to mimic plain imperative C++ code? Alloy the programming language is still not imperative. However, we can model imperative behavior in Alloy. The way I see it, doing so requires us to venture a tiny bit into the realm of semantics for programming languages, in particular for imperative languages. This "venture" is very minimal though, and you really don't need to know much about semantics to understand it. Alright. How does one model imperative behavior in Alloy? On to that next.

The essential piece of insight to modeling an imperative language, though it sounds a little bit tautological, is that statements are all about manipulating state. For example, state could be the value of a variable. If you start with the variable x storing the number 6, and then execute the statement x = x * 7, the final value of x will be 42. Thus, state has changed. To put this in terms Alloy would understand – relations and sets – a statement connects (relates) states before it’s executed to states after it’s executed. In our particular example, the connection would be between the state x = 6 and the state x = 42. In the case of adding the PUBLIC flag to curFilter, as on line 917 in the above code block, we could state the relationship as follows:

addBitfieldFlag[bitfieldBefore, bitfieldAfter, Public]

The above code states that bitfieldAfter (the state after line 917) is the same Bitfield as bitfieldBefore (the state before line 917), except that the Public flag has been added to it.

Things are a little more complicated when it comes to modeling the whole if-statement on line 916. If we wanted to be very precise, we’d need to encode the other variables (such as skipPrivateVisibilities), how they’re set, and what values are possible. However, for the sake of keeping the scope of this model manageable for the time being, I’m content to do something simpler – that is, acknowledge that the code on line 917 may or may not run. If it does run, our previous addBitfieldFlag will be the correct restriction on the before and after states. However, if it doesn’t, the state shouldn’t change at all. Therefore, we can model lines 916 through 918 as follows (notice the or):

addBitfieldFlag[bitfieldBefore, bitfieldAfter, Public] or
bitfieldEqual[bitfieldBefore, bitfieldAfter]

The next thing to note is that there are two if statements one after another. The state “after” the first statement is one and the same as the state “before” the second statement. Using arrows to represent the “before-after” relationship created by each statement, we can visualize the whole situation as follows:

$$ \text{initial state} \xRightarrow{\text{first statement}} \text{middle state} \xRightarrow{\text{second statement}} \text{final state} $$

We’ll write our Alloy code to match:

/* First if statement */
addBitfieldFlag[bitfieldBefore, bitfieldMiddle, Public] or
bitfieldEqual[bitfieldBefore, bitfieldMiddle]

/* ... something connecting bitfieldMiddle and bitfieldAfter ... */

From here, we can handle the second if/else chain in the same way we did the first if-statement: by making all three outcomes of the chain be possible, and creating an or of all of them.

/* First if statement */
addBitfieldFlag[bitfieldBefore, bitfieldMiddle, Public] or
bitfieldEqual[bitfieldBefore, bitfieldMiddle]

/* Second if statement */
addBitfieldFlag[bitfieldMiddle, bitfieldAfter, MethodOrField] or
addBitfieldFlagNeg[bitfieldMiddle, bitfieldAfter, Method] or
bitfieldEqual[bitfieldMiddle, bitfieldAfter]

So that helps model the relevant Dyno code. However, what we really want is an Alloy predicate that classifies possible outcomes of the piece of code: is a particular combination of flags possible or not? Here’s the piece of Alloy that does so:

From DynoAlloy.als, lines 113 through 132

pred possibleState[filterState: FilterState] {
    some initialState: FilterState {
        // Each lookup in scope starts with empty filter flags
        bitfieldEmpty[initialState.curFilter]

        // The intermediate states (bitfieldMiddle) are used for sequencing of operations.
        some bitfieldMiddle : Bitfield {
            // Add "Public" depending on skipPrivateVisibilities
            addBitfieldFlag[initialState.curFilter, bitfieldMiddle, Public] or
            bitfieldEqual[initialState.curFilter, bitfieldMiddle]

            // If it's a method receiver, add method or field restriction
            addBitfieldFlag[bitfieldMiddle, filterState.curFilter, MethodOrField] or
            // if it's not a receiver, filter to non-methods (could be overridden)
            // addBitfieldFlagNeg[bitfieldMiddle, filterState.curFilter, Method] or
            // Maybe methods are not being curFilterd but it's not a receiver, so no change.
            bitfieldEqual[bitfieldMiddle, filterState.curFilter]
        }
    }
}

The FilterState on the first line (and elsewhere, really), is new. I’m trying to be explicit about the state in this particular computation. Its definition is very simple: currently, the only state we care about is the Bitfield corresponding to curFilter in the C++ code above.

From DynoAlloy.als, lines 12 through 14

sig FilterState {
    , curFilter: Bitfield
}

There’s not much more to the predicate. It says, in English, that a state filterState is possible if, starting from an empty initial state initialState, the model of our C++ code can end up with its particular set of flags in the curFilter bitfield.
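To see the “statements relate states” idea outside of Alloy, here’s a brute-force C++ sketch of the same shape: each statement is a boolean relation between a before and an after state, and the some bitfieldMiddle quantifier becomes a loop over candidate middle states. Everything here is hypothetical scaffolding (including the ALL_FLAGS bound and the choice to represent state as raw bits), and it follows the three-way if/else chain from the C++ code, including the NOT_METHOD branch.

```cpp
#include <cassert>
#include <cstdint>

using Flags = std::uint32_t;
constexpr Flags PUBLIC = 1, METHOD_FIELD = 4, NOT_METHOD = 128;
// Hypothetical bound: every bitfield that fits in the low eight bits.
constexpr Flags ALL_FLAGS = 255;

// A statement relates a "before" state to an "after" state.
constexpr bool addFlag(Flags before, Flags after, Flags flag) {
  return after == (before | flag);
}
constexpr bool unchanged(Flags before, Flags after) { return before == after; }

// Is `candidate` a filter that the modeled code could end up with?
bool possibleState(Flags candidate) {
  const Flags initial = 0;  // each lookup starts with empty filter flags
  // "some bitfieldMiddle": search for a middle state that links both steps.
  for (Flags middle = 0; middle <= ALL_FLAGS; ++middle) {
    const bool firstIf =
        addFlag(initial, middle, PUBLIC) || unchanged(initial, middle);
    const bool secondIf = addFlag(middle, candidate, METHOD_FIELD) ||
                          addFlag(middle, candidate, NOT_METHOD) ||
                          unchanged(middle, candidate);
    if (firstIf && secondIf) return true;
  }
  return false;
}

void demoPossibleState() {
  assert(possibleState(PUBLIC | METHOD_FIELD));       // both branches taken
  assert(!possibleState(METHOD_FIELD | NOT_METHOD));  // if / else-if exclude each other
}
```

The loop is what Alloy does for us behind the scenes, except that Alloy searches far more cleverly than by trying every value in turn.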

Modeling Search State

Next, I needed to model the behavior that I described earlier: searching for $A \land \lnot B$, and taking the intersection of past searches when running subsequent searches.

Dyno implemented this roughly as follows:

It kept a mapping of (searched scope → search bitfield). Initially, this mapping was empty.
When a scope was searched for the first time, its curFilter / search bitfield was stored into the mapping.
When a scope was searched after that, the previously-stored flags in the mapping were excluded (that’s the $A\land\lnot B$ behavior), and the bitfield in the mapping was updated to be the intersection of curFilter and the stored flags.
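The three steps above can be sketched in C++ roughly like this. To be clear, this is my own approximation and not the code in Dyno: ScopeId, SearchPlan and planSearch are hypothetical names, and the sketch leaves out the “skip the search if it’s redundant” check.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

using Flags = std::uint32_t;
using ScopeId = int;  // hypothetical stand-in for however a scope is identified
constexpr Flags PUBLIC = 1, METHOD = 64;

struct SearchPlan {
  Flags include;                 // what this search looks for
  std::optional<Flags> exclude;  // what earlier searches covered, if any
};

SearchPlan planSearch(std::unordered_map<ScopeId, Flags>& searched,
                      ScopeId scope, Flags curFilter) {
  auto it = searched.find(scope);
  if (it == searched.end()) {
    // First visit: remember the filter; there is nothing to exclude yet.
    searched.emplace(scope, curFilter);
    return SearchPlan{curFilter, std::nullopt};
  }
  // Later visit: exclude what was stored, then store the intersection.
  const Flags previous = it->second;
  it->second = previous & curFilter;
  return SearchPlan{curFilter, previous};
}

void demoPlanSearch() {
  std::unordered_map<ScopeId, Flags> searched;  // initially empty
  const SearchPlan first = planSearch(searched, 0, PUBLIC);
  assert(!first.exclude.has_value());
  const SearchPlan second = planSearch(searched, 0, PUBLIC | METHOD);
  assert(second.exclude.has_value() && *second.exclude == PUBLIC);
  assert(searched.at(0) == PUBLIC);  // PUBLIC & (PUBLIC | METHOD)
}
```

The std::optional plays the role of “not searched yet”: an absent map entry and an empty exclude filter both mean that no earlier search has touched the scope.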

We’ll simplify the model by doing away with the mapping, and considering only a single scope that is searched many times. We’ll represent the stored flags as a field found, which will be one of two things: either a Bitfield representing the previously-stored search configuration, or a NotSet sentinel value, representing a scope that hasn’t been searched yet. The Alloy code:

From DynoAlloy.als, lines 21 through 23


one sig SearchState {
    , var found: Bitfield + NotSet
}

The NotSet sentinel value is defined in a very simple way:

From DynoAlloy.als, line 17

one sig NotSet {}

Both of these signatures use a new keyword, one. This keyword means that there’s exactly one instance each of NotSet and SearchState in our model. This is in contrast to a signature like Bitfield, which allows multiple bitfields to exist at the same time. I ended up with a pretty simple predicate that implemented the “store if not set, intersect if set” behavior in Alloy:

From DynoAlloy.als, lines 147 through 150

pred updateOrSet[toSet: Bitfield + NotSet, setTo: FilterState] {
    (toSet in NotSet and toSet' = setTo.curFilter) or
    (toSet not in NotSet and update[toSet, setTo])
}

If you look closely, this predicate uses a feature of Alloy we haven’t really seen: its ability to reason about time by dipping into temporal logic. Notice that the predicate is written not just in terms of toSet, but also toSet'. The tick (which I personally read as “prime”) indicates that what we’re talking about is not the current value of toSet, but its value at the next moment in time.

The first line of the predicate represents the second item from the list above: if a scope hasn’t been searched before (represented by the present value of toSet being NotSet) the future value (represented by toSet') is just the current filter / bitfield. The second line handles the third item from the list, updating a previously-set filter based on new flags. I defined an additional predicate to help with this:

From DynoAlloy.als, lines 138 through 140


pred update[toSet: Bitfield + NotSet, setTo: FilterState] {
    toSet' in Bitfield and bitfieldIntersection[toSet, setTo.curFilter, toSet']
}

What this predicate says is that at the next moment, the value of toSet will be equal to its present value intersected with curFilter. I also had to specify that the future value of toSet will still be a Bitfield after the step, and will not revert to NotSet.

With the updateOrSet predicate in hand, we can actually specify how our model will evolve. To do so, we first need to specify the initial conditions. In particular, our scope will start out not having been searched; its flags will be NotSet.


Next, we must specify that our SearchState changes in a very particular way: each step, the code invokes a search, and the state is modified to record that the search occurred. Each search is described via curFilter in a filterState. We want to ensure that curFilter is a reasonable filter (that is, it’s a combination of flags that can actually arise in the C++ program). To ensure this, we can use the possibleState predicate from earlier. From there, the updateOrSet predicate can be used to specify that this step’s curFilter is saved, either as-is (if no searches occurred previously) or as an intersection (if this is not the first search). The whole fact corresponding to this is below:

From DynoAlloy.als, lines 161 through 175

fact step {
    always {
        // Model that a new doLookupInScope could've occurred, with any combination of flags.
        all searchState: SearchState {
            some fs: FilterState {
                // This is a possible combination of lookup flags
                possibleState[fs]

                // If a search has been performed before, take the intersection; otherwise,
                // just insert the current filter flags.
                updateOrSet[searchState.found, fs]
            }
        }
    }
}
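
For readers who don't speak Alloy, the way found evolves over time can be approximated in Haskell. This is only a sketch under a few assumptions: `Maybe` stands in for the `Bitfield + NotSet` union, a filter is a plain bitmask, and the list of searches is given up front (in the Alloy fact, the solver is free to pick any possible filter at each step). The names `updateOrSet` and `foundOverTime` are mine.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

type Filter = Word8

-- Nothing plays the role of NotSet; Just holds the stored bitfield.
updateOrSet :: Maybe Filter -> Filter -> Maybe Filter
updateOrSet Nothing      cur = Just cur             -- first search: store as-is
updateOrSet (Just found) cur = Just (found .&. cur) -- later searches: intersect

-- Every successive value of `found`, starting from the unsearched state.
foundOverTime :: [Filter] -> [Maybe Filter]
foundOverTime = scanl updateOrSet Nothing
```

For example, `foundOverTime [6, 3, 5]` evaluates to `[Nothing, Just 6, Just 2, Just 0]`: each list element corresponds to one "moment" in the Alloy model.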

Asking for Counterexamples

As we’ve already seen, Alloy works by finding examples: combinations of various variables that match our requirements. It won’t be sufficient to ask Alloy for an example of our code doing what we expect: if the code malfunctions nine times out of ten, Alloy will still find us the one case in which it works. It won’t tell us much.

Instead, we have to ask it to find a counterexample: a case which does not work. If Alloy succeeds in finding such an example, the code we’re modeling has a bug. Of course, to make all this work, you need to know what to ask. There’s no way to tell Alloy, “find me a bug” – we need to be more specific. I had to focus on bugs I was most worried about.

If the stored combination of flags (in found) evolves into a bad configuration, things can go wrong in two ways. The first is that we will somehow exclude symbols from the lookup that shouldn’t have been excluded. In other words, can past searches break future searches?

I came up with the following Alloy (counter)example to model this situation. It’s a little bit long; there are comments there to explain what it does, and I’ll go through it below.

From DynoAlloy.als, lines 177 through 202

counterexampleNotFound: run {
    all searchState: SearchState {
        // a way that subsequent results of searching will miss things.
        eventually some symbol: Symbol,
                        fs: FilterState, fsBroken: FilterState,
                        exclude1: Bitfield, exclude2: Bitfield {
            // Some search (fs) will cause a transition / modification of the search state...
            possibleState[fs]
            updateOrSet[searchState.found, fs]
            excludeBitfield[searchState.found, exclude1]
            // Such that a later, valid search... (fsBroken)
            possibleState[fsBroken]
            excludeBitfield[searchState.found', exclude2]

            // Will allow for a symbol ...
            // ... that are left out of the original search...
            not bitfieldMatchesProperties[searchState.found, symbol]
            // ... and out of the current search
            not (bitfieldMatchesProperties[fs.curFilter, symbol] and not bitfieldMatchesProperties[exclude1, symbol])
            // But would be matched by the broken search...
            bitfieldMatchesProperties[fsBroken.curFilter, symbol]
            // ... to not be matched by a search with the new state:
            not (bitfieldMatchesProperties[fsBroken.curFilter, symbol] and not bitfieldMatchesProperties[exclude2, symbol])
        }
    }
}

This example asks that at some point in time, things “go wrong”. In particular, will there be a symbol (symbol) that hasn’t been found yet, such that a search for a particular filter (fs) will break the system, making a subsequent search fsBroken not find symbol even though it should have?

The possibleState, updateOrSet, and excludeBitfield lines encode the fact that a search occurred for fs. This must be a valid search, and the search state must be modified appropriately. Furthermore, at the time this search takes place, to make the $\lnot B$ portion of the algorithm work, the bitfield exclude1 will be set based on the previous search state.

The next two lines, possibleState and excludeBitfield, set the stage for the broken search: fsBroken is another valid search, and at the time it happens, the bitfield exclude2 is set based on previous search state. Since fsBroken occurs after fs, its “previous search state” is actually the state after fs, so we use found' instead of found.

Finally, the subsequent four lines of code describe the issue: the symbol in question has not been found before fs, and nor will it be found by fs. That means thus far, it hasn’t been reported to the user. Therefore, if the symbol matches fsBroken, it ought to be reported: we haven’t seen it yet, and here we’re being asked for something matching the symbol’s description! However, as per the last line of code, searching for fsBroken together with the appropriate set of exclude flags, we still don’t find symbol. That’s a problem!

Unfortunately, Alloy finds a model that satisfies this constraint. There are a lot of moving parts, so the output is a bit difficult to read. I did my best to clean it up by turning off some arrows. Our system is spanning multiple “moments” in time, so a single picture won’t describe the bug entirely. Here’s the diagram Alloy outputs for the first state:

Figure representing the initial state according to Alloy

We can get a lot out of this figure. First, the symbol-to-be-lost is a private method (it doesn’t have the PPublic property, and it does have the PMethod property). Also, Alloy immediately gives away what fs and fsBroken will be: eventually, when the user searches for all non-methods (negativeFlags: Method are the giveaway there), their subsequent search for anything will fail to come up with our private method, even though it should. To gather more details about this broken case, we can look at the state that follows the initial one.

Figure representing the second state according to Alloy

The main difference is that found has changed from NotSet (because no searches occurred) to FilterState1. This indicates that the first search was for all Public symbols (which our method is not). There is only one more state after this:

Figure representing the final state according to Alloy

In the above diagram, found has changed once again, this time to an empty bitfield. This is a valid behavior for our system. Recall that fs was a search for non-methods, and that the intersection of NOT_METHOD and PUBLIC is empty. Thus, found will be set to the empty bitfield, which (incorrectly) indicates that all symbols have been searched for! After this, any search would fail: fsBroken doesn’t have any flags set, and still, nothing is reported.
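
The three-search scenario can be replayed by hand in a small Haskell sketch. It rests on assumptions that I want to state plainly: a filter is a pair of required (positive) and forbidden (negative) flags, "intersection" keeps only the flags common to both filters, and there are just two flags. None of the names below come from Dyno or from the Alloy model.

```haskell
import Data.List (intersect)

data Flag = Public | Method deriving (Eq, Show)

-- A filter requires some properties (positive) and forbids others (negative).
data Filter = Filter { positive :: [Flag], negative :: [Flag] }
  deriving (Eq, Show)

-- A symbol is described by the properties it has.
type Symbol = [Flag]

matches :: Filter -> Symbol -> Bool
matches f sym =
  all (`elem` sym) (positive f) && not (any (`elem` sym) (negative f))

-- Assumed meaning of "intersection": keep only flags common to both filters.
intersectFilters :: Filter -> Filter -> Filter
intersectFilters a b =
  Filter (positive a `intersect` positive b) (negative a `intersect` negative b)

-- One search: the symbol is reported if it matches the current filter and is
-- not excluded by the stored state. The stored state is then updated.
search :: Maybe Filter -> Filter -> Symbol -> (Bool, Maybe Filter)
search found cur sym = (reported, Just found')
  where
    excluded = maybe False (`matches` sym) found
    reported = matches cur sym && not excluded
    found'   = maybe cur (`intersectFilters` cur) found

-- PUBLIC, then NOT_METHOD, then "anything", run against a private method.
counterexample :: [Bool]
counterexample = go Nothing [publicOnly, notMethod, anything]
  where
    privateMethod = [Method]
    publicOnly    = Filter [Public] []
    notMethod     = Filter [] [Method]
    anything      = Filter [] []
    go _ [] = []
    go found (f : fs) =
      let (reported, found') = search found f privateMethod
      in reported : go found' fs
```

Under these assumptions, `counterexample` is `[False, False, False]`: the private method is never reported, even though the last search places no restrictions at all. The culprit is the empty stored filter, which matches (and therefore excludes) every symbol.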

Now, this doesn’t definitively prove the compiler is broken: it’s possible that there isn’t a situation in which three searches like this (PUBLIC, then NOT_METHOD, then anything) will occur in practice. However, this gave the “motif” for reproducing the bug. All I had to do was find a real-life case that matched the counterexample.

It was a little easier to find a reproducer for a similar counterexample, actually. By inspection, I noticed that the same bug would occur if the second search was for METHOD_OR_FIELD, and not for NOT_METHOD. I was able to come up with a (fairly convoluted) example of Chapel code that triggered the issue. I include it here as a curiosity; there’s no need to understand how exactly it works.

module TopLevel {
  module XContainerUser {
    public use TopLevel.XContainer; // Will search for public, to no avail.
  }
  module XContainer {
    private var x: int;
    record R {} // R is in the same scope as x so it won't set public
    module MethodHaver {
      use TopLevel.XContainerUser;
      use TopLevel.XContainer;
      proc R.foo() {
        var y = x;
      }
    }
  }
}

Alas, the two-bitfield system is not just an approximation; it malfunctions in practice. I submitted a PR to fix the issue.

Search as a Polynomial

Mon, 22 May 2023 21:39:00 -0700

I read a really neat paper some time ago, and I’ve been wanting to write about it ever since. The paper is called Algebras for Weighted Search, and it is a tad too deep to dive into in a blog article – readers of ICFP are rarely the target audience on this site. However, one particular insight I gleaned from the paper merits additional discussion and demonstration. I’m going to do that here.

In particular, the paper pointed out a connection between polynomials and a general concept of search. In the context of the paper, “search” simply referred to a way of finding various solutions to some problem, perhaps like “what are the ways of getting from one place to another?”. In this case, a search would be a computation that explores the space of possible routes.

That all sounds very abstract, so let’s start with a concrete example. Suppose that you’re trying to get from city A to city B, and then from city B to city C. Also suppose that your trips are measured in one-hour intervals (maybe you round trip lengths, turning 2:45 into 3 hours), and that trips of equal duration are considered equivalent (“as long as it gets me there!”). Now, I give you a list of possible routes from city A to city B, and another list of possible routes from city B to city C, grouped by their length. Given these two lists, what are the possible routes from A to C?

Let’s make this even more concrete, and start with some actual lists of routes. Maybe there are two routes from A to B that take two hours each, and one “quick” trip that takes only an hour. On top of this, there’s one three-hour trip from B to C, and one two-hour trip. Given these building blocks, the list of possible trips from A to C is as follows.

Two two-hour trips from A to B, followed up by the three-hour trip from B to C.
Two two-hour trips from A to B, followed by the shorter two-hour trip from B to C.
One one-hour trip from A to B, followed by the three-hour trip from B to C.
One one-hour trip from A to B, followed by the shorter two-hour trip from B to C.

In the above, to figure out the various ways of getting from A to C, we had to examine all pairings of A-to-B routes with B-to-C routes. But then, multiple pairings end up having the same total length: the second and third bullet points both describe trips that take four hours. Thus, to give our final report, we need to “combine like terms” - add up the trips from the two matching bullet points, ending up with a total of three four-hour trips.

Does this feel a little bit familiar? To me, this bears a rather striking resemblance to an operation we’ve seen in high school algebra class: we’re multiplying two binomials! Here’s the corresponding multiplication:

$$ \left(2x^2 + x\right)\left(x^3+x^2\right) = 2x^5 + 2x^4 + x^4 + x^3 = \underline{2x^5+3x^4+x^3} $$

It’s not just binomials that correspond to our combining paths between cities. We can represent any combination of trips of various lengths as a polynomial. Each term $ax^n$ represents $a$ trips of length $n$. As we just saw, multiplying two polynomials corresponds to “sequencing” the trips they represent – matching each trip in one with each of the trips in the other, and totaling them up.

What about adding polynomials, what does that correspond to? The answer there is actually quite simple: if two polynomials both represent (distinct) lists of trips from A to B, then adding them just combines the list. If I know one trip that takes two hours ($x^2$) and someone else knows a shortcut ($x$), then we can combine that knowledge ($x^2+x$).
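
If you'd like to see the bookkeeping spelled out, here's a minimal Haskell sketch. The representation (a list of coefficients, lowest power first) and the names `addPoly` and `mulPoly` are my own choices for illustration.

```haskell
-- A polynomial as a list of coefficients, lowest power first:
-- [0, 1, 2] stands for x + 2x^2.
type Poly = [Int]

-- Adding polynomials merges the two lists of trips, term by term.
addPoly :: Poly -> Poly -> Poly
addPoly (p : ps) (q : qs) = (p + q) : addPoly ps qs
addPoly ps [] = ps
addPoly [] qs = qs

-- Multiplying pairs every trip in one list with every trip in the other.
-- Prepending a 0 shifts a polynomial up by one power of x.
mulPoly :: Poly -> Poly -> Poly
mulPoly [] _ = []
mulPoly (p : ps) qs = addPoly (map (p *) qs) (0 : mulPoly ps qs)

aToB, bToC :: Poly
aToB = [0, 1, 2]     -- x + 2x^2
bToC = [0, 0, 1, 1]  -- x^2 + x^3
```

Here, `mulPoly aToB bToC` gives `[0, 0, 0, 1, 3, 2]`, which reads as $x^3 + 3x^4 + 2x^5$: one three-hour trip, three four-hour trips, and two five-hour trips.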

Reader: Wait a moment. Sure, we learned about polynomials in algebra class: they're functions! You put in a number for $x$, and get another number out. But you haven't done that, and in fact you haven't even mentioned functions at all. What's going on?

Daniel: In this article (and in the paper it's based on), polynomials are viewed in a more general way than you might be used to. The point isn't to think of them as defining functions on numbers, but to make use of their "shape": a sum of certain powers of $x$, like $ax^n+bx^m+...$

Reader: So we won't be plugging numbers in, or trying to graph the polynomials in this section?

Daniel: That's right, we won't be. The sort of thing we're doing here is a bit closer to abstract algebra than to high school math. Don't worry if you're not familiar with the subject, though: I'm trying to explain everything from first principles.

Well, it’s a neat little thing that tracking trips corresponds to adding and multiplying polynomials like that. We can push this observation a bit further, though. Since our trick relies on multiplying two polynomials, we’ll need to better understand what that multiplication needs in order to behave as we expect. In particular, we’ll need to know what the “bare minimum” is for working with polynomials: what arithmetic properties must we bring to the table? Let’s take a look at that next.

Polynomials over Semirings

Let’s watch what happens when we multiply two binomials, paying really close attention to the operations we’re performing. The following (concrete) example should do.

$$ \begin{aligned} & (x+1)(1-x)\\ =\ & (x+1)1+(x+1)(-x)\\ =\ & x+1-x^2-x \\ =\ & x-x+1-x^2 \\ =\ & 1-x^2 \end{aligned} $$

The first thing we do is distribute the multiplication over the addition, on the left. We then do that again, on the right this time. After this, we finally get some terms, but they aren’t properly grouped together; an $x$ is at the front, and a $-x$ is at the very back. We use the fact that addition is commutative ($a+b=b+a$) and associative ($a+(b+c)=(a+b)+c$) to rearrange the equation, grouping the $x$ and its negation together. This gives us $(1-1)x=0x=0$. That last step is important: we’ve used the fact that multiplication by zero gives zero. Another important property (though we didn’t use it here) is that multiplication has to be associative, too.

So, what if we didn’t use numbers, but rather any thing with two operations, one kind of like $(\times)$ and one kind of like $(+)$?

Reader: Here, it seems like you're saying that in the polynomials we've seen so far, it's numbers themselves that need to be commutative, associative, etc.

Daniel: That's right, I am saying that. We need the $(+)$ and $(\times)$ operations on numbers to follow the laws I laid out above.

Reader: Okay, but in your equations above, it's not just numbers that were moved around using commutativity and associativity: it was variables, like $x$. Just earlier you said that we're thinking of the polynomials in terms of their "shape", and not as functions. If that's the case, why are we allowed to blur the lines between polynomial and number like that?

Daniel: Good question. If you want to get really precise, in the abstract view, adding numbers is not quite the same as adding polynomials. Because of this, saying that addition commutes for numbers does not immediately tell us that it commutes for something like $x$. However, also in the abstract view, we define how addition and multiplication on polynomials work using addition and multiplication on numbers. Thus, properties of numbers make their way into properties of polynomials.

As I was saying, what if we used some kind of thing other than numbers, together with notions of what it means to “add” and “multiply” this thing? As long as these operations satisfy the properties we have used so far, we should be able to create polynomials using them, and do this same sort of “combining paths” we did earlier. Before we get to that, let me just say that “things with addition and multiplication that work in the way we described” have an established name in math - they’re called semirings.

A semiring is a set equipped with two operations, one called “multiplicative” (and thus carrying the symbol $\times$) and one called “additive” (and thus written as $+$). Both of these operations need to have an “identity element”. The identity element for multiplication is usually written as $1$, [note: And I do mean "written as": a semiring need not be over numbers. We could define one over graphs, sets, and many other things! Nevertheless, because most of us learn the properties of addition and multiplication much earlier than we learn about other more "esoteric" things, using numbers to stand for special elements seems to help use intuition. ] and the identity element for addition is written as $0$. Furthermore, a few equations hold. I’ll present them in groups. First, multiplication is associative and multiplying by $1$ does nothing; in mathematical terms, the set forms a monoid with multiplication and $1$. $$ \begin{array}{cl} (a\times b)\times c = a\times(b\times c) & \text{(multiplication associative)}\\ 1\times a = a = a \times 1 & \text{(1 is multiplicative identity)}\\ \end{array} $$

Similarly, addition is associative and adding $0$ does nothing. Addition must also be commutative; in other words, the set forms a commutative monoid with addition and $0$. $$ \begin{array}{cl} (a+b)+c = a+(b+c) & \text{(addition associative)}\\ 0+a = a = a+0 & \text{(0 is additive identity)}\\ a+b = b+a & \text{(addition is commutative)}\\ \end{array} $$

Finally, a few equations determine how addition and multiplication interact. $$ \begin{array}{cl} 0\times a = 0 = a \times 0 & \text{(annihilation)}\\ a\times(b+c) = a\times b + a\times c & \text{(left distribution)}\\ (a+b)\times c = a\times c + b\times c & \text{(right distribution)}\\ \end{array} $$

That’s it, we’ve defined a semiring. First, notice that numbers do indeed form a semiring; all the equations above should be quite familiar from algebra class. When using polynomials with numbers to do our city path finding, we end up tracking how many different ways there are to get from one place to another in a particular number of hours. There are, however, other semirings we can use that yield interesting results, even though we continue to add and multiply polynomials.

One last thing before we look at other semirings: given a semiring $R$, the polynomials using that $R$, and written in terms of the variable $x$, are denoted as $R[x]$.
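
In Haskell, one way to capture this definition is with a type class. What follows is a sketch rather than an established library API: the class `Semiring`, its operators `<+>` and `<.>`, and the `Poly` wrapper are all names I'm inventing here. Polynomials store their coefficients lowest power first. Note that the compiler can't check the semiring laws for us; they remain an obligation for whoever writes an instance.

```haskell
infixl 6 <+>
infixl 7 <.>

-- A hypothetical Semiring class; instances are expected to obey the laws above.
class Semiring a where
  zero, one :: a
  (<+>), (<.>) :: a -> a -> a

instance Semiring Int where
  zero = 0
  one = 1
  (<+>) = (+)
  (<.>) = (*)

-- R[x]: coefficients drawn from any semiring r, lowest power first.
newtype Poly r = Poly [r] deriving (Eq, Show)

-- Polynomials over a semiring form a semiring themselves. Equality here is
-- structural, so trailing zero coefficients are not normalized away.
instance Semiring r => Semiring (Poly r) where
  zero = Poly []
  one = Poly [one]
  Poly ps <+> Poly qs = Poly (merge ps qs)
    where
      merge (a : as') (b : bs') = (a <+> b) : merge as' bs'
      merge as' [] = as'
      merge [] bs' = bs'
  Poly [] <.> _ = Poly []
  Poly (p : ps) <.> Poly qs =
    Poly (map (p <.>) qs) <+> shift (Poly ps <.> Poly qs)
    where
      shift (Poly cs) = Poly (zero : cs)  -- multiply by x
```

With this in place, `Poly [0, 1, 2] <.> Poly [0, 0, 1, 1] :: Poly Int` evaluates to `Poly [0, 0, 0, 1, 3, 2]`, matching the city example. Only the coefficient instance needs to change to get different kinds of "search".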

The Semiring of Booleans, $\mathbb{B}$

Alright, it’s time for our first non-number example. It will be a simple one, though - booleans (that’s right, true and false from your favorite programming language!) form a semiring. In this case, addition is the “or” operation (aka ||), in which the result is true if either operand is true, and false otherwise.

$$ \begin{array}{c} \text{true} + b = \text{true}\\ b + \text{true} = \text{true}\\ \text{false} + \text{false} = \text{false} \end{array} $$

For addition, the identity element – our $0$ – is $\text{false}$.

Correspondingly, multiplication is the “and” operation (aka &&), in which the result is false if either operand is false, and true otherwise.

$$ \begin{array}{c} \text{false} \times b = \text{false}\\ b \times \text{false} = \text{false}\\ \text{true} \times \text{true} = \text{true} \end{array} $$

For multiplication, the identity element – the $1$ – is $\text{true}$.

It’s not hard to see that both operations are commutative - the first and second equations for addition, for instance, can be combined to get $\text{true}+b=b+\text{true}$, and the third equation clearly shows commutativity when both operands are false. The other properties are easy enough to verify by simple case analysis (there are 8 cases to consider). The set of booleans is usually denoted as $\mathbb{B}$, which means polynomials using booleans are denoted by $\mathbb{B}[x]$.

Let’s try some examples. We can’t count how many ways there are to get from A to B in a certain number of hours anymore: booleans aren’t numbers! Instead, what we can do is track whether or not there is a way to get from A to B in a certain number of hours (call it $n$). If we can, we write that as $\text{true}\ x^n = 1x^n = x^n$. If we can’t, we write that as $\text{false}\ x^n = 0x^n = 0$. The polynomials corresponding to our introductory problem are $x^2+x^1$ and $x^3+x^2$. Multiplying them out gives:

$$ (x^2+x^1)(x^3+x^2) = x^5 + x^4 + x^4 + x^3 = x^5 + x^4 + x^3 $$

And that’s right; if it’s possible to get from A to B in either two hours or one hour, and then from B to C in either three hours or two hours, then it’s possible to get from A to C in either five, four, or three hours. In a way, polynomials like this give us less information than our original ones [note: In fact, we can construct a semiring homomorphism (kind of like a ring homomorphism, but for semirings) from $\mathbb{N}[x]$ to $\mathbb{B}[x]$ as follows: $$ \sum_{i=0}^n a_ix^i \mapsto \sum_{i=0}^n \text{clamp}(a_i)x^i $$ Where the $\text{clamp}$ function checks if its argument is non-zero. In the case of city path search, $\text{clamp}$ asks the question "are there any routes at all?". $$ \text{clamp}(n) = \begin{cases} \text{false} & n = 0 \\ \text{true} & n > 0 \end{cases} $$ We can't construct the inverse of the above homomorphism (a mapping that would undo our clamping, and take polynomials in $\mathbb{B}[x]$ to $\mathbb{N}[x]$). This fact gives us a more "mathematical" confirmation that we lost information, rather than gained it, by switching to boolean polynomials: we can always recover a boolean polynomial from the natural number one, but not the other way around. ] (which were $\mathbb{N}[x]$, polynomials over natural numbers $\mathbb{N} = \{ 0, 1, 2, ... \}$), so it’s unclear why we’d prefer them. However, we’re just warming up - there are more interesting semirings for us to consider!
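
Here's a small Haskell sketch of boolean polynomials, along with a spot check of the clamping map on our running example. The helper `mulWith` is a name I made up: it multiplies coefficient lists (lowest power first) using whatever "addition", "multiplication", and additive identity it is handed.

```haskell
import Numeric.Natural (Natural)

-- Multiply two coefficient lists using the supplied semiring operations.
-- The coefficient of x^k combines every product p_i * q_j with i + j == k.
mulWith :: (a -> a -> a) -> (a -> a -> a) -> a -> [a] -> [a] -> [a]
mulWith plus times zero ps qs =
  [ foldr plus zero
      [ p `times` q | (i, p) <- zip [0 ..] ps, (j, q) <- zip [0 ..] qs, i + j == k ]
  | not (null ps || null qs), k <- [0 .. length ps + length qs - 2] ]

-- Polynomials over the booleans: "or" is addition, "and" is multiplication.
mulBool :: [Bool] -> [Bool] -> [Bool]
mulBool = mulWith (||) (&&) False

-- Polynomials over the natural numbers, for comparison.
mulNat :: [Natural] -> [Natural] -> [Natural]
mulNat = mulWith (+) (*) 0

-- "Are there any routes at all?"
clamp :: Natural -> Bool
clamp = (> 0)

aToB, bToC :: [Natural]
aToB = [0, 1, 2]     -- x + 2x^2
bToC = [0, 0, 1, 1]  -- x^2 + x^3
```

Clamping before or after multiplying gives the same answer here: both `map clamp (mulNat aToB bToC)` and `mulBool (map clamp aToB) (map clamp bToC)` come out to `[False, False, False, True, True, True]`, that is, $x^3 + x^4 + x^5$. That is one instance of what it means for clamping to be a homomorphism; a single example is of course not a proof.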

The Semiring of Sets of Paths, $\mathcal{P}(\Pi)$

Until now, we explicitly said that “all paths of the same length are equivalent”. If we’re giving directions, though, we might benefit from knowing not just that there is a way, but what roads that way is made up of!

To this end, we define the set of paths, $\Pi$. This set will consist of the empty path (which we will denote $\circ$, why not?), street names (e.g. $\text{Mullholland Dr.}$ or $\text{Sunset Blvd.}$), and concatenations of paths, written using $\rightarrow$. For instance, a path that first takes us on $\text{Highway}$ and then on $\text{Exit 4b}$ will be written as:

$$ \text{Highway}\rightarrow\text{Exit 4b} $$

Furthermore, it’s not too much of a stretch to say that adding an empty path to the front or the back of another path doesn’t change it. If we use the letter $\pi$ to denote a path, this means the following equation:

$$ \circ \rightarrow \pi = \pi = \pi \rightarrow \circ $$

So those are paths. [note: Actually, if you clicked through the monoid link earlier, you might be interested to know that paths as defined here form a monoid with concatenation $\rightarrow$ and the empty path $\circ$ as a unit. ] Paths alone, though, aren’t enough for our polynomials; we’re tracking different ways to get from one place to another. This is an excellent use case for sets!

Our next semiring will be that of sets of paths. Some example elements of this semiring are $\varnothing$, also known as the empty set, $\{\circ\}$, the set containing only the empty path, and the set containing a path via the highway, and another path via the suburbs:

$$ \{\text{Highway}\rightarrow\text{Exit 4b}, \text{Suburb Rd.}\} $$

So what are the addition and multiplication on sets of paths? Addition is the easier one: it’s just the union of sets (the “triangle equal sign” symbol means “defined as”):

$$ A + B \triangleq A \cup B $$

It’s well known (and not hard to verify) that set union is commutative and associative. The additive identity $0$ is simply the empty set $\varnothing$. Intuitively, adding “no paths” to another set of paths doesn’t add anything, and thus leaves that other set unchanged.

Multiplication is a little bit more interesting, and uses the path concatenation operation we defined earlier. We will use this operation to describe path sequencing; given two sets of paths, $A$ and $B$, we’ll create a new set of paths consisting of each path from $A$ concatenated with each path from $B$:

$$ A \times B \triangleq \{ a \rightarrow b\ |\ a \in A, b \in B \} $$

The fact that this definition of multiplication on sets is associative relies on the associativity of path concatenation; if path concatenation weren’t associative, the second equality below would not hold.

$$ \begin{array}{rcl} A \times (B \times C) & = & \{ a \rightarrow (b \rightarrow c)\ |\ a \in A, b \in B, c \in C \} \\ & \stackrel{?}{=} & \{ (a \rightarrow b) \rightarrow c \ |\ a \in A, b \in B, c \in C \} \\ & = & (A \times B) \times C \end{array} $$

What’s the multiplicative identity? Well, since multiplication concatenates all the combinations of paths from two sets, we could try making a set of elements that don’t do anything when concatenating. Sound familiar? It should, that’s $\circ$, the empty path element! We thus define our multiplicative identity as $\{\circ\}$, and verify that it is indeed the identity:

$$ \begin{gathered} \{\circ\} \times A = \{ \circ \rightarrow a\ |\ a \in A \} = \{ a \ |\ a \in A \} = A \\ A \times \{\circ\}= \{ a\rightarrow \circ \ |\ a \in A \} = \{ a \ |\ a \in A \} = A \end{gathered} $$

It’s not too difficult to verify the annihilation and distribution laws for sets of paths, either; I won’t do that here, though. Finally, let’s take a look at an example. Like before, we’ll try to make one that corresponds to our introductory description of paths from A to B and from B to C. Now we need to be a little bit creative, and come up with names for all these different roads between our hypothetical cities. Let’s say that $\text{Highway A}$ and $\text{Highway B}$ are the two paths from A to B that take two hours each, and then $\text{Shortcut}$ is the path that takes one hour. As for paths from B to C, let’s just call them $\text{Long}$ for the three-hour path, and $\text{Short}$ for the two-hour path. Our two polynomials are then:

$$ \begin{array}{rcl} P_1 & = & \{\text{Highway A}, \text{Highway B}\}x^2 + \{\text{Shortcut}\}x \\ P_2 & = & \{\text{Long}\}x^3 + \{\text{Short}\}x^2 \end{array} $$

Multiplying them gives: $$ \begin{array}{rl} & \{\text{Highway A} \rightarrow \text{Long}, \text{Highway B} \rightarrow \text{Long}\}x^5\\ + & \{\text{Highway A} \rightarrow \text{Short}, \text{Highway B} \rightarrow \text{Short}, \text{Shortcut} \rightarrow \text{Long}\}x^4\\ + & \{\text{Shortcut} \rightarrow \text{Short}\}x^3 \end{array} $$

This resulting polynomial gives us all the paths from city A to city C, grouped by their length!
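
The same computation can be sketched in Haskell. I'm assuming a particular encoding here: a path is a list of road names (so the empty path $\circ$ is the empty list and $\rightarrow$ is list concatenation), and sets of paths use `Data.Set` from the containers package. The names `plusP`, `timesP`, and `mulPaths` are hypothetical.

```haskell
import qualified Data.Set as Set

-- A path is a list of road names; the empty list is the empty path.
type Path = [String]
type Paths = Set.Set Path

-- Addition is set union; multiplication concatenates every pairing of paths.
plusP, timesP :: Paths -> Paths -> Paths
plusP = Set.union
timesP as bs = Set.fromList [ a ++ b | a <- Set.toList as, b <- Set.toList bs ]

zeroP, oneP :: Paths
zeroP = Set.empty         -- no paths at all
oneP = Set.singleton []   -- just the empty path

-- Polynomial multiplication, coefficients listed lowest power first.
mulPaths :: [Paths] -> [Paths] -> [Paths]
mulPaths ps qs =
  [ foldr plusP zeroP
      [ p `timesP` q | (i, p) <- zip [0 ..] ps, (j, q) <- zip [0 ..] qs, i + j == k ]
  | not (null ps || null qs), k <- [0 .. length ps + length qs - 2] ]

p1, p2 :: [Paths]
p1 = [zeroP, Set.fromList [["Shortcut"]], Set.fromList [["Highway A"], ["Highway B"]]]
p2 = [zeroP, zeroP, Set.fromList [["Short"]], Set.fromList [["Long"]]]
```

The coefficient of $x^4$ in `mulPaths p1 p2` (the element at index 4) contains exactly the three four-hour paths from the product above, and the coefficients of $x^0$ through $x^2$ are empty sets.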

The Tropical Semiring, $\mathbb{R}$

I only have one last semiring left to show you. It’s a fun semiring though, as even its name might suggest: we’ll take a look at a tropical semiring.

In this semiring, we go back to numbers; particularly, real numbers (e.g., $1.34$, $163$, $e$, that kind of thing). We even use addition – sort of. In the tropical semiring, addition serves as the multiplicative operation! This is even confusing to write, so I’m going to switch up notation; in the rest of this section, I’ll use $\otimes$ to represent the multiplicative operation in semirings, and $\oplus$ to represent the additive one. The symbols $\times$ and $+$ will be used to represent the regular operations on real numbers. With that, the operations on our tropical semiring over real numbers are defined as follows:

$$ \begin{array}{rcl} x \otimes y & \triangleq & x + y\\ x \oplus y & \triangleq & \min(x,y) \end{array} $$

What is this new semiring good for? How about this: suppose that in addition to the duration of the trip, you’d like to track the distance you must travel for each route (shorter routes do sometimes have more traffic!). Let’s watch what happens when we add and multiply polynomials over this semiring. When we add terms with the same power but different coefficients, like $ax\oplus bx$, we end up with a term $\min(a,b)x$. In other words, for each trip duration, we pick the shortest distance. When we multiply two terms, like $ax\otimes bx$, we get $(a+b)x^2$; in other words, when sequencing two trips, we add up the distances to get the combined distance (while the powers of $x$ add up the durations), just like we’d expect.

We can, of course, come up with a polynomial to match our initial example. Say that the trips from A to B are represented by $2.0x^2\oplus1.5x$ (the shortest two-hour trip is $2$ units of distance long, and the one-hour trip is $1.5$ units long), and that the trips from B to C are represented by $4.0x^3\oplus1.0x^2$. Multiplying the two polynomials out gives:

$$ \begin{array}{rcl} (2.0x^2\oplus1.5x)(4.0x^3\oplus1.0x^2) & = & 6.0x^5 \oplus \min(2.0+1.0, 1.5+4.0)x^4 \oplus 2.5x^3 \\ & = & 6.0x^5 \oplus 3.0x^4 \oplus 2.5x^3 \end{array} $$

The only time we used the additive operation in this case was to pick between two trips of equal duration but different length (a two-hour trip from A to B followed by a two-hour trip from B to C, or a one-hour trip from A to B followed by a three-hour trip from B to C). The first trip wins out, since it requires only $3.0$ units of distance.
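
To double-check the arithmetic, here's a Haskell sketch of tropical polynomial multiplication. One detail is my addition rather than something spelled out above: a missing term needs a coefficient that acts as the identity for $\min$, and the usual choice is positive infinity, which `Double` can represent. The names `infinity` and `mulTropical` are made up for this example.

```haskell
-- Positive infinity: the identity for min, used for "no trip of this duration".
infinity :: Double
infinity = 1 / 0

-- Tropical polynomial multiplication, coefficients lowest power first:
-- the semiring's "plus" is min, and its "times" is ordinary addition.
mulTropical :: [Double] -> [Double] -> [Double]
mulTropical ps qs =
  [ foldr min infinity
      [ p + q | (i, p) <- zip [0 ..] ps, (j, q) <- zip [0 ..] qs, i + j == k ]
  | not (null ps || null qs), k <- [0 .. length ps + length qs - 2] ]

aToB, bToC :: [Double]
aToB = [infinity, 1.5, 2.0]            -- 1.5x (+) 2.0x^2
bToC = [infinity, infinity, 1.0, 4.0]  -- 1.0x^2 (+) 4.0x^3
```

The last three coefficients of `mulTropical aToB bToC` are `2.5`, `3.0`, and `6.0` (for $x^3$, $x^4$, and $x^5$ respectively), while the first three stay at infinity because no trip from A to C takes less than three hours.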

Anything but Routes

So far, all I’ve done can be reduced to variations on a theme: keeping track of some aspects of a trip between cities, using polynomials for structure. However, that’s just the beginning. This sort of trick can be made even more powerful by further relaxing the notion of a “polynomial”. By doing so, we can make our polynomials represent arbitrary effects (in the computer science sense – things like errors, logging to a console, storing and accessing information from a database). Relying for just a little longer on our example of journeys between cities, we might be able to represent trips with random variation (traffic can be unpredictable!), or maybe cities where you will get stuck. But the point isn’t routes: the same approach can be used to represent traversing a binary tree, performing Prolog-like proof search, or evaluating a non-deterministic program. The sky’s the limit!

Unfortunately, doing so would require even more background and buildup, which I just don’t have space for in this article. I’ll save these things for next time, though – stay tuned!

Generalizing Folds in Haskell

Fri, 22 Apr 2022 12:19:22 -0700

Have you encountered Haskell’s foldr function? Did you know that you can use it to express any function on a list? What’s more, there’s a way to derive similar functions for a large class of data types in Haskell. [note: Specifically, this is the class of inductive types. ] This is precisely the focus of this post. Before we get into the details, it’s good to review the underlying concepts in a more familiar setting: functions.

Recursive Functions

Let’s start off with a little bit of a warmup, and take a look at a simple recursive function: length. Here’s a straightforward definition:

From Cata.hs, lines 4 through 6

length :: [a] -> Int
length [] = 0
length (_:xs) = 1 + length xs

Haskell is nice because it allows for clean definitions of recursive functions; length can just reference itself in its definition, and everything works out in the end. In the underlying lambda calculus, though, a function definition doesn’t come with a name – you only get anonymous functions via the lambda abstraction. There’s no way for such functions to just refer to themselves by their name in their body. But the lambda calculus is Turing complete, so something is making recursive definitions possible.

The trick is to rewrite your recursive function in such a way that instead of calling itself by its name (which, with anonymous functions, is hard to come by), it receives a reference to itself as an argument. As a concrete example:

From Cata.hs, lines 8 through 10

lengthF :: ([a] -> Int) -> [a] -> Int
lengthF rec [] = 0
lengthF rec (_:xs) = 1 + rec xs

This new function can easily be anonymous; if we enable the LambdaCase extension, we can write it using only lambda functions as:

From Cata.hs, lines 12 through 14

lengthF' = \rec -> \case
    [] -> 0
    _:xs -> 1 + rec xs

This function is not equivalent to length, however. It expects “itself”, or a function which has type [a] -> Int, to be passed in as the first argument. Once fed this rec argument, though, lengthF returns a length function. Let’s try feeding it something, then!

lengthF _something

But if lengthF produces a length function when given this something, why can’t we feed this newly-produced length function back to it?

lengthF (lengthF _something)

And again:

lengthF (lengthF (lengthF _something))
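
To see what each extra application buys us, here is a small sketch. The names stuck and approx are hypothetical; stuck plays the role of _something and simply fails if it is ever called.

```haskell
lengthF :: ([a] -> Int) -> [a] -> Int
lengthF _   []     = 0
lengthF rec (_:xs) = 1 + rec xs

-- Hypothetical stand-in for _something: a "length function" that gives up.
stuck :: [a] -> Int
stuck _ = error "ran out of unrolled lengthF applications"

-- approx n is lengthF applied n times on top of stuck.
approx :: Int -> [a] -> Int
approx n = iterate lengthF stuck !! n
```

With three applications, approx 3 [10, 20] evaluates to 2, but approx 3 [10, 20, 30] reaches stuck and fails: n applications of lengthF can only measure lists with fewer than n elements.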

If we kept going with this process infinitely, we’d eventually have what we need:

$$ \text{length} = \text{lengthF}(\text{lengthF}(\text{lengthF}(...))) $$

But hey, the stuff inside the first set of parentheses is still an infinite sequence of applications of the function $\text{lengthF}$, and we have just defined this to be $\text{length}$. Thus, we can rewrite the above equation as:

$$ \text{length} = \text{lengthF}(\text{length}) $$

What we have just discovered is that the actual function that we want, length, is a fixed point of the non-recursive function lengthF. Fortunately, Haskell comes with a function that can find such a fixed point. It’s defined like this:

From Cata.hs, line 16

fix f = let x = f x in x

This definition is as declarative as can be; fix returns the $x$ such that $x = f(x)$. With this, we can finally write:

From Cata.hs, line 18

length' = fix lengthF

Loading up the file in GHCi, and running the above function, we get exactly the expected results.

ghci> Main.length' [1,2,3]
3
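
The same recipe is not specific to length. As a minimal sketch, here is a hypothetical factF (not part of Cata.hs) turned into a factorial function, this time using the fix exported by Data.Function rather than our own definition:

```haskell
import Data.Function (fix)

-- Hypothetical non-recursive "step" function for factorial:
-- it receives the function to use for the recursive call as rec.
factF :: (Integer -> Integer) -> Integer -> Integer
factF _   0 = 1
factF rec n = n * rec (n - 1)

-- Tie the knot: fact is a fixed point of factF.
fact :: Integer -> Integer
fact = fix factF
```

Here, fact 5 evaluates to 120. Note that factF itself never mentions fact or factF in its body; all the recursion comes from fix.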

You may be dissatisfied with the way we handled fix here; we went through and pretended that we didn’t have recursive function definitions, but then used a recursive let-expression in the body of fix! This is a valid criticism, so I’d like to briefly talk about how fix is used in the context of the lambda calculus.

In the untyped lambda calculus, we can just define a term that behaves like fix does. The most common definition is the $Y$ combinator, defined as follows:

$$ Y = \lambda f. (\lambda x. f (x x)) (\lambda x. f (x x )) $$

When applied to a function, this combinator goes through the following evaluation steps:

$$ Y f = f (Y f) = f (f (Y f)) =\ ... $$

This is the exact sort of infinite series of function applications that we saw above with $\text{lengthF}$.

Recursive Data Types

We have now seen how we can rewrite a recursive function as a fixed point of some non-recursive function. Another cool thing we can do, though, is to transform recursive data types in a similar manner! Let’s start with something pretty simple.

From Cata.hs, line 20

data MyList = MyNil | MyCons Int MyList

Just like we did with functions, we can extract the recursive occurrences of MyList into a parameter.

From Cata.hs, line 21

data MyListF a = MyNilF | MyConsF Int a

Just like lengthF, MyListF isn’t really a list. We can’t write a function sum :: MyListF -> Int. MyListF requires something as an argument, and once given that, produces a type of integer lists. Once again, let’s try feeding it:

MyListF a

From the definition, we can clearly see that a is where the “rest of the list” is in the original MyList. So, let’s try filling a with a list that we can get out of MyListF:

MyListF (MyListF a)

And again:

MyListF (MyListF (MyListF a))

Much like we used a fix function to turn our lengthF into length, we need a data type, which we’ll call Fix (and which has been implemented before). Here’s the definition:

From Cata.hs, line 23

newtype Fix f = Fix { unFix :: f (Fix f) }

Looking past the constructors and accessors, we might write the above in pseudo-Haskell as follows:

newtype Fix f = f (Fix f)

This is just like the lambda calculus $Y$ combinator above! Unfortunately, we do have to deal with the cruft induced by the constructors here. Thus, to write down the list [1,2,3] using MyListF, we’d have to produce the following:

From Cata.hs, lines 25 through 26

testList :: Fix MyListF
testList = Fix (MyConsF 1 (Fix (MyConsF 2 (Fix (MyConsF 3 (Fix MyNilF))))))

This is actually done in practice when using some approaches to help address the expression problem; however, it’s quite unpleasant to write code in this way, so we’ll set it aside.
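
Before moving on, here is a minimal sketch of how the nesting could be produced mechanically instead of typed by hand. The helpers toFix and fromFix are hypothetical names of my own, not part of Cata.hs.

```haskell
data MyListF a = MyNilF | MyConsF Int a

newtype Fix f = Fix { unFix :: f (Fix f) }

-- Hypothetical helper: wrap each cons cell of a built-in list in Fix.
toFix :: [Int] -> Fix MyListF
toFix = foldr (\x rest -> Fix (MyConsF x rest)) (Fix MyNilF)

-- Hypothetical helper: peel the Fix constructors off again.
fromFix :: Fix MyListF -> [Int]
fromFix (Fix MyNilF) = []
fromFix (Fix (MyConsF x rest)) = x : fromFix rest
```

With these, toFix [1,2,3] builds the same value as the hand-written testList, and fromFix (toFix [1,2,3]) gives back [1,2,3].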

Let’s go back to our infinite chain of type applications. We’ve seen a similar pattern before, with $\text{length}$ and $\text{lengthF}$. Just like we did then, it seems like we might be able to write something like the following:

$$ \begin{aligned} & \text{MyList} = \text{MyListF}(\text{MyListF}(\text{MyListF}(...))) \\ \Leftrightarrow\ & \text{MyList} = \text{MyListF}(\text{MyList}) \end{aligned} $$

In something like Haskell, though, the above is not quite true. MyListF is a non-recursive data type, with a different set of constructors to MyList; they aren’t really equal. Instead of equality, though, we use the next-best thing: isomorphism.

$$ \text{MyList} \cong \text{MyListF}(\text{MyList}) $$

Two types are isomorphic when there exists a pair of functions, $f$ and $g$, that take you from one type to the other (and vice versa), such that applying $f$ after $g$, or $g$ after $f$, gets you right back where you started. That is, $f$ and $g$ need to be each other’s inverses. [note: Let's take a look at the types of Fix and unFix, by the way. Suppose that we did define MyList to be Fix MyListF. Let's specialize the f type parameter of Fix to MyListF for a moment, and check:

In one direction, Fix :: MyListF MyList -> MyList
And in the other, unFix :: MyList -> MyListF MyList

The two mutual inverses $f$ and $g$ fall out of the definition of the Fix data type! If we didn't have to deal with the constructor cruft, this would be more ergonomic than writing our own myIn and myOut functions. ]

For our specific case, let’s call the two functions myOut and myIn (I’m matching the naming in this paper). They are not hard to define:

From Cata.hs, lines 28 through 34

myOut :: MyList -> MyListF MyList
myOut MyNil = MyNilF
myOut (MyCons i xs) = MyConsF i xs

myIn :: MyListF MyList -> MyList 
myIn MyNilF = MyNil
myIn (MyConsF i xs) = MyCons i xs

By the way, when a data type is a fixed point of some other, non-recursive type constructor, this second type constructor is called a base functor. We can verify that MyListF is a functor by providing an instance (which is rather straightforward):

From Cata.hs, lines 36 through 38

instance Functor MyListF where
    fmap f MyNilF = MyNilF
    fmap f (MyConsF i a) = MyConsF i (f a)

Recursive Functions with Base Functors

One neat thing you can do with a base functor is define recursive functions on the actual data type!

Let’s go back to the very basics. When we write recursive functions, we try to think of it as solving a problem, assuming that we are given solutions to the sub-problems that make it up. In the more specific case of recursive functions on data types, we think of it as performing a given operation, assuming that we know how to perform this operation on the smaller pieces of the data structure. Some quick examples:

When writing a sum function on a list, we assume we know how to find the sum of the list’s tail (sum xs), and add to it the current element (x+). Of course, if we’re looking at a part of a data structure that’s not recursive, we don’t need to perform any work on its constituent pieces.
```
sum [] = 0
sum (x:xs) = x + sum xs
```
When writing a function to invert a binary tree, we assume that we can invert the left and right children of a non-leaf node. We might write:
```
invert Leaf = Leaf
invert (Node l r) = Node (invert r) (invert l)
```

What does this have to do with base functors? Well, recall how we arrived at MyListF from MyList: we replaced every occurrence of MyList in the definition with a type parameter a. Let me reiterate: wherever we had a sub-list in our definition, we replaced it with a. The a in MyListF marks the locations where we would have to use recursion if we were to define a function on MyList.

What if instead of a stand-in for the list type (as it was until now), we use a to represent the result of the recursive call on that sub-list? To finish computing the sum of the list, then, the following would suffice:

From Cata.hs, lines 40 through 42

mySumF :: MyListF Int -> Int
mySumF MyNilF = 0
mySumF (MyConsF i rest) = i + rest

Actually, this is enough to define the whole sum function. First things first, let’s use myOut to unpack one level of the MyList type:

From Cata.hs, line 28

myOut :: MyList -> MyListF MyList

We know that MyListF is a functor; we can thus use fmap mySum to compute the sum of the remaining list:

fmap mySum :: MyListF MyList -> MyListF Int

Finally, we can use our mySumF to handle the last addition:

From Cata.hs, line 40

mySumF :: MyListF Int -> Int

Let’s put all of these together:

From Cata.hs, lines 44 through 45

mySum :: MyList -> Int
mySum = mySumF . fmap mySum . myOut

Notice, though, that the exact same approach would work for any function with type:

MyListF a -> a

We can thus write a generalized version of mySum that, instead of using mySumF, uses some arbitrary function f with the aforementioned type:

From Cata.hs, lines 47 through 48

myCata :: (MyListF a -> a) -> MyList -> a
myCata f = f . fmap (myCata f) . myOut

Let’s use myCata to write a few other functions:

From Cata.hs, lines 50 through 60

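myLength = myCata $ \case
    MyNilF -> 0
    MyConsF _ l -> 1 + l

myMax = myCata $ \case
    MyNilF -> minBound -- identity for max, so the empty tail never wins
    MyConsF x y -> max x y

myMin = myCata $ \case
    MyNilF -> maxBound -- identity for min; a base case of 0 would cap every result at 0
    MyConsF x y -> min x y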
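
The result of myCata doesn’t have to be a number, either. Here is a self-contained sketch with two more functions; myToList, myAny, and example are hypothetical names of my own, and the rest repeats the definitions from above so that the snippet loads by itself.

```haskell
{-# LANGUAGE LambdaCase #-}

data MyList = MyNil | MyCons Int MyList
data MyListF a = MyNilF | MyConsF Int a

instance Functor MyListF where
    fmap _ MyNilF = MyNilF
    fmap f (MyConsF i a) = MyConsF i (f a)

myOut :: MyList -> MyListF MyList
myOut MyNil = MyNilF
myOut (MyCons i xs) = MyConsF i xs

myCata :: (MyListF a -> a) -> MyList -> a
myCata f = f . fmap (myCata f) . myOut

-- Hypothetical helper: convert to a built-in list.
-- In the MyConsF case, xs is the already-converted tail.
myToList :: MyList -> [Int]
myToList = myCata $ \case
    MyNilF -> []
    MyConsF x xs -> x : xs

-- Hypothetical helper: does any element satisfy the predicate?
-- In the MyConsF case, found is the answer for the tail.
myAny :: (Int -> Bool) -> MyList -> Bool
myAny p = myCata $ \case
    MyNilF -> False
    MyConsF x found -> p x || found

example :: MyList
example = MyCons 1 (MyCons 2 (MyCons 3 MyNil))
```

Here, myToList example evaluates to [1,2,3], and myAny even example evaluates to True. In both cases, the second field of MyConsF holds the result of the recursive call, not a sub-list.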

It’s just a `foldr`!

When you write a function with the type MyListF a -> a, you are actually providing two things: a “base case” element of type a, for when you match MyNilF, and a “combining function” with type Int -> a -> a, for when you match MyConsF. We can thus define:

From Cata.hs, lines 64 through 66

pack :: a -> (Int -> a -> a) -> MyListF a -> a
pack b f MyNilF = b
pack b f (MyConsF x y) = f x y

We could also go in the opposite direction, by writing:

From Cata.hs, lines 68 through 69

unpack :: (MyListF a -> a) -> (a, Int -> a -> a)
unpack f = (f MyNilF, \i a -> f (MyConsF i a))

Hey, what was it that we said about types with two functions between them, which are inverses of each other? That’s right, MyListF a -> a and (a, Int -> a -> a) are isomorphic. The function myCata, and the “traditional” definition of foldr are equivalent!
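
To make the correspondence concrete, here is a sketch that builds a foldr-shaped function for MyList out of myCata and pack. The names myFoldr and fromList are hypothetical; everything else repeats earlier definitions so the snippet stands alone.

```haskell
data MyList = MyNil | MyCons Int MyList
data MyListF a = MyNilF | MyConsF Int a

instance Functor MyListF where
    fmap _ MyNilF = MyNilF
    fmap f (MyConsF i a) = MyConsF i (f a)

myOut :: MyList -> MyListF MyList
myOut MyNil = MyNilF
myOut (MyCons i xs) = MyConsF i xs

myCata :: (MyListF a -> a) -> MyList -> a
myCata f = f . fmap (myCata f) . myOut

pack :: a -> (Int -> a -> a) -> MyListF a -> a
pack b _ MyNilF = b
pack _ f (MyConsF x y) = f x y

-- Hypothetical name: a fold for MyList with the same argument
-- order as the Prelude's foldr (combining function, then base case).
myFoldr :: (Int -> a -> a) -> a -> MyList -> a
myFoldr f b = myCata (pack b f)

-- Hypothetical helper for building test inputs.
fromList :: [Int] -> MyList
fromList = foldr MyCons MyNil
```

For instance, myFoldr (+) 0 (fromList [1,2,3]) and foldr (+) 0 [1,2,3] both evaluate to 6.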

Base Functors for All!

We’ve been playing with MyList for a while now, but it’s kind of getting boring: it’s just a list of integers! Furthermore, we’re not really getting anything out of this new “generalization” procedure – foldr is part of the standard library, and we’ve just reinvented the wheel.

But you see, we haven’t quite. This is because, while we’ve only been working with MyListF, the base functor for MyList, our approach works for any recursive data type, provided an out function. Let’s define a type class, Cata, which pairs a data type a with its base functor f, and specifies how to “unpack” a:

From Cata.hs, lines 71 through 72

class Functor f => Cata a f where
    out :: a -> f a

We can now provide a more generic version of our myCata, one that works for all types with a base functor:

From Cata.hs, lines 74 through 75

cata :: Cata a f => (f b -> b) -> a -> b
cata f = f . fmap (cata f) . out

Clearly, MyList and MyListF are one instance of this type class:

From Cata.hs, lines 77 through 78

instance Cata MyList MyListF where
    out = myOut

We can also write a base functor for Haskell’s built-in list type, [a]:

From Cata.hs, lines 80 through 84

data ListF a b = Nil | Cons a b deriving Functor

instance Cata [a] (ListF a) where
    out [] = Nil
    out (x:xs) = Cons x xs

We can use our cata function for regular lists to define a generic sum:

From Cata.hs, lines 86 through 89

sum :: Num a => [a] -> a
sum = cata $ \case
    Nil -> 0
    Cons x xs -> x + xs

It works perfectly:

ghci> Main.sum [1,2,3]
6
ghci> Main.sum [1,2,3.0]
6.0
ghci> Main.sum [1,2,3.0,-1]
5.0
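
Nothing about cata is specific to lists. As a minimal sketch, here is a hypothetical Peano-style natural number type (Nat, NatF, toInt, and add are my own names, not from Cata.hs), together with the language extensions I’d expect a two-parameter class like Cata to need:

```haskell
{-# LANGUAGE DeriveFunctor, FlexibleInstances, LambdaCase, MultiParamTypeClasses #-}

class Functor f => Cata a f where
    out :: a -> f a

cata :: Cata a f => (f b -> b) -> a -> b
cata f = f . fmap (cata f) . out

-- Hypothetical example type: natural numbers, and their base functor.
data Nat = Zero | Succ Nat
data NatF r = ZeroF | SuccF r deriving Functor

instance Cata Nat NatF where
    out Zero = ZeroF
    out (Succ n) = SuccF n

-- Count the Succ constructors.
toInt :: Nat -> Int
toInt = cata $ \case
    ZeroF -> 0
    SuccF n -> n + 1

-- Addition: rebuild the first number, but put the second
-- number where its Zero used to be.
add :: Nat -> Nat -> Nat
add m n = cata (\case { ZeroF -> n; SuccF r -> Succ r }) m
```

With this, toInt (add (Succ (Succ Zero)) (Succ Zero)) evaluates to 3.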

What about binary trees, which served as our second example of a recursive data structure? We can do that, too:

From Cata.hs, lines 91 through 96

data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf deriving (Show, Foldable)
data BinaryTreeF a b = NodeF a b b | LeafF deriving Functor

instance Cata (BinaryTree a) (BinaryTreeF a) where
    out (Node a l r) = NodeF a l r
    out Leaf = LeafF

Given this, here’s an implementation of that invert function we mentioned earlier:

From Cata.hs, lines 98 through 101

invert :: BinaryTree a -> BinaryTree a
invert = cata $ \case
    LeafF -> Leaf
    NodeF a l r -> Node a r l
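
As a quick sanity check, here is a self-contained sketch that pairs invert with a hypothetical inorder function (also written with cata) and an example tree of my own. Inverting a tree should reverse its in-order listing.

```haskell
{-# LANGUAGE DeriveFunctor, FlexibleInstances, LambdaCase, MultiParamTypeClasses #-}

class Functor f => Cata a f where
    out :: a -> f a

cata :: Cata a f => (f b -> b) -> a -> b
cata f = f . fmap (cata f) . out

data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf deriving (Eq, Show)
data BinaryTreeF a b = NodeF a b b | LeafF deriving Functor

instance Cata (BinaryTree a) (BinaryTreeF a) where
    out (Node a l r) = NodeF a l r
    out Leaf = LeafF

invert :: BinaryTree a -> BinaryTree a
invert = cata $ \case
    LeafF -> Leaf
    NodeF a l r -> Node a r l

-- Hypothetical helper: left subtree, then the node, then the right subtree.
-- Here l and r are the already-computed listings of the children.
inorder :: BinaryTree a -> [a]
inorder = cata $ \case
    LeafF -> []
    NodeF a l r -> l ++ [a] ++ r

example :: BinaryTree Int
example = Node 2 (Node 1 Leaf Leaf) (Node 3 Leaf Leaf)
```

Here, inorder example is [1,2,3], while inorder (invert example) is [3,2,1].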

Degenerate Cases

Actually, the data types we consider don’t have to be recursive. We can apply the same procedure of replacing recursive occurrences in a data type’s definition with a new type parameter to Maybe; the only difference is that now the new parameter will not be used!

From Cata.hs, lines 103 through 107

data MaybeF a b = NothingF | JustF a deriving Functor

instance Cata (Maybe a) (MaybeF a) where
    out Nothing = NothingF
    out (Just x) = JustF x

And then we can define a function on Maybe using cata:

From Cata.hs, lines 109 through 112

getOrDefault :: a -> Maybe a -> a
getOrDefault d = cata $ \case
    NothingF -> d
    JustF a -> a

This isn’t really useful, since we’re still pattern matching on a type that looks identical to Maybe itself. There is one reason that I bring it up, though. Remember how foldr was equivalent to cata for MyList, because defining a function MyListF a -> a was the same as providing a base case a and a “combining function” Int -> a -> a? Well, defining a function MaybeF x a -> a is the same as providing a base case a (for NothingF) and a handler for the contained value, x -> a. So we might imagine the foldr function for Maybe to have type:

maybeFold :: a -> (x -> a) -> Maybe x -> a

This is exactly the function maybe from Data.Maybe! Hopefully you can follow a similar process in your head to arrive at “fold” functions for Either and Bool. Indeed, there are functions that correspond to these data types in the Haskell standard library, named either and bool. Much like foldr can be used to represent any function on lists, maybe, either, and bool can be used to represent any function on their corresponding data types. I think that’s neat.
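
For Either, the process might look something like the following sketch. The names EitherF and eitherFold are hypothetical; the point is only that the resulting function has the same shape as the Prelude’s either.

```haskell
{-# LANGUAGE DeriveFunctor, FlexibleInstances, LambdaCase, MultiParamTypeClasses #-}

class Functor f => Cata a f where
    out :: a -> f a

cata :: Cata a f => (f b -> b) -> a -> b
cata f = f . fmap (cata f) . out

-- Hypothetical base functor for Either. Like MaybeF, it never
-- uses its last type parameter, since Either is not recursive.
data EitherF a b r = LeftF a | RightF b deriving Functor

instance Cata (Either a b) (EitherF a b) where
    out (Left a) = LeftF a
    out (Right b) = RightF b

-- One handler per constructor, just like the Prelude's either.
eitherFold :: (a -> c) -> (b -> c) -> Either a b -> c
eitherFold l r = cata $ \case
    LeftF a -> l a
    RightF b -> r b
```

For example, eitherFold length (+ 1) (Left "abc") evaluates to 3, the same as either length (+ 1) (Left "abc").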

What About `Foldable`?

If you’ve been around the Haskell ecosystem, you may know the Foldable type class. Isn’t this exactly what we’ve been working towards here? No, not at all. Take a look at how the documentation describes Data.Foldable:

The Foldable class represents data structures that can be reduced to a summary value one element at a time.

One at a time, huh? Take a look at the signature of foldMap, which is sufficient for an instance of Foldable:

foldMap :: Monoid m => (a -> m) -> t a -> m

A Monoid is just a type with an associative binary operation that has an identity element. Then, foldMap simply visits the data structure in order, converts each element to a monoid value using the function it was given, and combines these values with the binary operation. Alas, this function is not enough to be able to implement something like inverting a binary tree; there are different configurations of binary tree that, when visited in-order, result in the same sequence of elements. For example:

ghci> fold (Node "Hello" Leaf (Node ", " Leaf (Node "World!" Leaf Leaf)))
"Hello, World!"
ghci> fold (Node "Hello" (Node ", " Leaf Leaf) (Node "World!" Leaf Leaf))
"Hello, World!"

As far as fold (which is just foldMap id) is concerned, the two trees are equivalent. They are very much not equivalent for the purposes of inversion! Thus, whereas Foldable helps us work with list-like data types, the Cata type class lets us express any function on a recursive data type similarly to how we’d do it with foldr and lists.
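
The same observation can be phrased as a small self-contained sketch. The names rightLeaning, balanced, sameElements, and sameShape are my own; the trees are the two from the GHCi session above.

```haskell
{-# LANGUAGE DeriveFoldable #-}

import Data.Foldable (toList)

data BinaryTree a = Node a (BinaryTree a) (BinaryTree a) | Leaf
    deriving (Eq, Show, Foldable)

rightLeaning, balanced :: BinaryTree String
rightLeaning = Node "Hello" Leaf (Node ", " Leaf (Node "World!" Leaf Leaf))
balanced     = Node "Hello" (Node ", " Leaf Leaf) (Node "World!" Leaf Leaf)

-- Foldable only exposes the sequence of elements...
sameElements :: Bool
sameElements = toList rightLeaning == toList balanced

-- ...while the derived Eq instance compares the actual structure.
sameShape :: Bool
sameShape = rightLeaning == balanced
```

Here, sameElements is True and sameShape is False: anything written purely in terms of Foldable cannot tell these two trees apart.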

Catamorphisms

Why is the type class called Cata, and the function cata? Well, a function that performs a computation by recursively visiting the data structure is called a catamorphism. Indeed, foldr f b, for a function f and a “base value” b, is an example of a list catamorphism. It’s a fancy word, and there are some fancier descriptions of what it is, especially when you step into category theory (check out the Wikipedia entry if you want to know what I mean). However, for our purposes, a catamorphism is just a generalization of foldr from lists to any data type!

Declaratively Deploying Multiple Blog Versions with NixOS and Flakes

Sun, 10 Apr 2022 00:24:58 -0700

Prologue

You can skip this section if you’d like.

For the last few days, I’ve been stuck inside of my room due to some kind of cold or flu, which may or may not be COVID™. [note: The results of the PCR test are pending at the time of writing. ] In seeming correspondence with the progression of my cold, a thought occurred in the back of my mind: “Your blog deployment is kind of a mess”. On the first day, when I felt only a small tingling in my throat, I waved that thought away pretty easily. On the second day, feeling unwell and staying in bed, I couldn’t help but start to look up Nix documentation. And finally, on the third day, between coughing fits and overconsumption of oral analgesic, I got to work.

In short, this post is the closest thing I’ve written to a fever dream.

The Constraints

I run several versions of this site. The first is, of course, the “production” version, hosted at the time of writing on danilafe.com and containing the articles that I would like to share with the world. The second is a version of this site on which drafts are displayed - this way, I can share posts with my friends before they are published, get feedback, and even just re-read what I wrote from any device that has an internet connection. The third is the Russian version of my blog. It’s rather empty, because translation is hard work, so it only exists so far as another “draft” website.

Currently, only my main site is behind HTTPS. However, I would like for it to be possible to adjust this, and possibly even switch my hosts without changing any of the code that actually builds my blog.

I wanted to be able to represent all of this complexity in my NixOS configuration file, and that’s what this post is about!

Why Flakes

I decided to use Nix flakes to manage my configuration. But what is it that made me do so? Well, two things:

Adding custom packages. The Nix code for my blog provides a package / derivation for each version of my website, and I want to use these packages in my configuration.nix. Adding custom packages is typically done using overlays; however, how should my system configuration get my overlay Nix expression? I would like to be able to separate my build-the-blog code from my describe-the-server code, and so I need a clean way to let my system access the former from the latter. Flakes solve this issue by letting me specify a blog flake, and pull it in as one of the system configuration’s inputs.
Versioning. My process for deploying new versions of the site prior to flakes boiled down to fetching the latest commit from the master branch of my blog repository, and updating the default.nix file with the corresponding hash. This way, I could reliably fetch the version of my site that I wanted published. Flakes do the same thing: the flake.lock file contains the hashes of the Git-based dependencies of a flake, and thus prevents builds from accidentally pulling in something else. However, unlike my approach, which relies on custom scripts and extra tools such as jq, the locking mechanism used by flakes is provided with standard Nix tooling. Using flakes also guarantees that my build process won’t break with updates to Hugo or Ruby, since the nixpkgs version is stored in flake.lock, too.

The Final Result

Here’s the relevant section of my configuration:

From configuration.nix, lines 42 through 59

  services.danilafe-blog = {
    enable = true;
    challengePath = "/var/www/challenges";
    sites = [
      (builders.english {
        ssl = true;
        host = "danilafe.com";
      })
      (builders.english {
        drafts = true;
        host = "drafts.danilafe.com";
      })
      (builders.russian {
        drafts = true;
        host = "drafts.ru.danilafe.com";
      })
    ];
  };

I really like how this turned out for three reasons. First, it’s very clear from the configuration what I want from my server: three virtual hosts, one with HTTPS, one with drafts, and one with drafts and in Russian. Second, there’s plenty of code reuse. I’m using two builder functions, english and russian, but under the hood, the exact same code is being used to run Hugo and perform all the necessary post-processing. Finally, all of this can be used pretty much immediately given my blog flake, which reduces the amount of glue code I have to write.

Getting There

A Derivation Builder

As I mentioned earlier, I need to generate multiple versions of my blog. All of these use pretty much the same build process – run Hugo on the Markdown files, then do some post-processing (in particular, convert the LaTeX in the resulting pages into MathML and nice-looking HTML). I didn’t want to write this logic multiple times, so I settled for a function that takes some settings, and returns a derivation:

From lib.nix, lines 6 through 21

  website = settings: stdenv.mkDerivation {
    inherit (settings) src ssl host;
    name = "blog-static";
    version = settings.src.rev;
    urlSub =
      let
        regexEscape = lib.escape [ "/" "(" ")" "[" "]" "+" "*" "\\" ];
      in
        with settings.replaceUrl; "s/${regexEscape from}/${regexEscape to}/g";
    publicPath = settings.path;
    extraFlags = (if settings.drafts then " -D " else "") + settings.extraFlags;
    builder = ./build/builder.sh;
    buildInputs = [
      hugo katex-html
    ];
  };

There are a few things here:

On line 7, the settings src, ssl, and host are inherited into the derivation. The src setting provides a handle on the source code of the blog. I haven’t had much time to test and fine-tune the changes enabling multi-language support on the site, so they reside on a separate branch. It’s up to the caller to specify which version of the source code should be used for building. The host and ssl settings are interesting because they don’t actually matter for the derivation itself – they just aren’t used in the builder. However, attributes given to a derivation are accessible from “outside”, and these settings will play a role later.
Lines 10 through 14 deal with setting the base URL of the site. Hugo does not know how to interpret the --baseURL option when a blog has multiple languages. What this means is that in the end, it is impossible to configure the base URL used in links from the command line. I need to apply some manual changes to the configuration file. It’s necessary to adjust the base URL because each version of my website is hosted in a different place: the default (English) website is hosted on danilafe.com, the version with drafts on drafts.danilafe.com, and so on. However, the configuration file only knows one base URL per language, and so it doesn’t know when or when not to use the drafts. prefix. The urlSub variable is used in the builder.
On line 15, the publicPath variable is set; while single-language Hugo puts all the generated HTML into the public folder, the multi-language configuration places it into public/[language-code]. Thus, depending on the configuration, the builder needs to look in a different place for final output.

This new website function is general enough to represent all my blog versions, but it’s too low-level. Do I really want to specify the publicPath each time I want to describe a version of the site? What about settings.replaceUrl, or the source code? Just like I would in any garden variety language, I defined two helper functions:

From lib.nix, lines 25 through 48

    english = settings: website {
      inherit (settings) host;
      ssl = settings.ssl or false;
      drafts = settings.drafts or false;
      src = blog-source;
      path = ".";
      extraFlags = "--config=config.toml,config-gen.toml";
      replaceUrl = {
        from = "https://danilafe.com";
        to = wrapHost (settings.ssl or false) settings.host;
      };
    };
    russian = settings: website {
      inherit (settings) host;
      ssl = settings.ssl or false;
      drafts = settings.drafts or false;
      src = blog-source-localized;
      path = "ru";
      extraFlags = "";
      replaceUrl = {
        from = "https://ru.danilafe.com";
        to = wrapHost (settings.ssl or false) settings.host;
      };
    };

Both of these simply make a call to the website function (and thus return derivations), but they make some decisions for the caller, and provide a nicer interface by allowing attributes to be omitted. Specifically, by default, a site version is assumed to be HTTP-only, and to contain non-draft articles. Furthermore, since each function corresponds to a language, there’s no need for the caller to provide a blog version, and thus also the output path, or even to specify the “from” part of replaceUrl. The wrapHost function, not included in the snippet, simply adds http or https to the host parameter, which does not otherwise include this information. These functions can now be called to describe different versions of my site:

# Default version, hosted on the main site and using HTTPS
english {
    ssl = true;
    host = "danilafe.com";
}

# English draft version, hosted on draft domain and not using HTTPS.
english {
    drafts = true;
    host = "drafts.danilafe.com";
}

# Russian draft version, hosted on draft (russian) domain, and not using HTTPS.
russian {
    drafts = true;
    host = "drafts.ru.danilafe.com";
}

Configuring Nginx

The above functions are already a pretty big win (in my opinion) when it comes to describing my blog. However, by themselves, they aren’t quite enough to clean up my system configuration: for each of these blog versions, I’d need to add an Nginx virtualHosts entry where I’d pass in the corresponding host (like danilafe.com or drafts.danilafe.com), configure SSL, and so on. At one point, too, all paths in /var were by default mounted as read-only by NixOS, which meant that it was necessary to tell systemd that /var/www/challenges should be writeable so that the SSL certificate for the site could be properly renewed. Overall, this was a lot of detail that I didn’t want front-and-center in my server configuration.

However, with the additional “ghost” attributes, my derivations already contain most of the information required to configure Nginx. The virtual host, for instance, is the same as replaceUrl.to (since I’d want the Nginx virtual host for a blog version to handle links within that version). The ssl ghost parameter corresponds precisely to whether or not a virtual host will need SSL (and thus ACME, and thus the systemd setting). For each derivation built using website, I can access the attributes like ssl or host to generate the corresponding piece of the Nginx configuration.

To make this really nice, I wanted all of this to be “just another section of my configuration file”. That is, I wanted to control my site deployment via regular old attributes in configuration.nix. To this end, I needed a module. Xe recently wrote about NixOS modules in flakes, and what I do here is very similar. In essence, a module has two bits:

The options, which specify what kind of attributes this module understands. The most common option is enable, which tells a module that it should apply its configuration changes.
The configuration, which consists of the various system settings that this module will itself set. These typically depend on the options.

In short, a module describes the sort of options it will accept, and then provides a way to convert these newly-described options into changes to the system configuration. It may help if I showed you the concrete options that my newly-created blog module provides:

From module.nix, lines 32 through 43

    options.services.danilafe-blog = {
      enable = mkEnableOption "Daniel's blog service";
      sites = mkOption {
        type = types.listOf types.package;
        default = [];
        description = "List of versions of this blog that should be enabled.";
      };
      challengePath = mkOption {
        type = types.str;
        description = "The location for ACME challenges.";
      };
    };

There are three options here:

enable, a boolean-valued input that determines whether or not the module should make any changes to the system configuration at all.
sites, which, as written in the code, accepts a list of derivations. These derivations correspond to the various versions of my site that should be served to the outside world.
challengePath, a string to configure where ACME will place files during automatic SSL renewal.

Now, while these are the only three options the user will need to set, the changes to the system configuration are quite involved. For instance, for each site (derivation) in the sites list, the resulting configuration needs to have a virtualHost in the services.nginx namespace. To this end, I defined a function that accepts a site derivation and produces the necessary settings:

From module.nix, lines 7 through 19

  virtualHost = package:
    {
      virtualHosts."${package.host}" = mkMerge [
        {
          root = package;
        }
        (mkIf (sslForSite package) {
          addSSL = true;
          enableACME = true;
          acmeRoot = cfg.challengePath;
        })
      ];
    };

Each virtual host always has a root option (where Nginx should look for HTML files), but only those sites for which SSL is enabled need to specify addSSL, enableACME, and acmeRoot. All the virtual hosts are assembled into a single list (below, cfg refers to the options that the user provided to the module, as specified above).

From module.nix, line 28

  virtualHosts = map virtualHost cfg.sites;

If the enable option is set, we enable Nginx, and provide it with a list of all of the virtual hosts we generated. Below, config (not to be confused with cfg) is the namespace for the module’s configuration.

From module.nix, lines 45 through 51

    config.services.nginx = mkIf cfg.enable (mkMerge (virtualHosts ++ [
      {
        # Always enable nginx.
        enable = true;
        recommendedGzipSettings = true;
      }
    ]));

In a similar manner to this, I generate a list of systemd services which are used to configure the challenge path to be writeable. Click the module.nix link above to check out the full file.

Creating a Flake

We now have two “things” that handle the deployment of the blog: the builder functions english and russian, which help describe various blog versions, and the NixOS module that configures the server’s Nginx to serve said versions. We now want to expose these to the NixOS system configuration, which describes the entire server. This is where flakes finally come in. Yannik Sander wrote up a pretty comprehensive explanation of how their blog is deployed using flakes, which I often consulted while getting started – check it out if you are looking for more details.

In brief, a Nix flake has inputs and outputs. Inputs can be other flakes or source files that the flake needs access to, and outputs are simply Nix expressions that the flake provides.

The nice thing about flakes’ inputs is that they can reference other flakes via Git. This means that, should I write a flake for my blog (as I am about to do), I will be able to reference its Git URL in another flake, and Nix will automatically clone and import it. This helps achieve the “adding custom packages” goal, since I can now easily write Nix expressions and reference them from my system configuration.

Importantly, flakes track the versions of their inputs in a flake.lock file; this means that, unless explicitly told to do otherwise, they will use the same version of their inputs. This achieves the versioning goal for my blog, too, since now it will pull the pre-defined commit from Git until I tell it to fetch the updated site. In addition to pinning the version of my blog, though, the flake also locks down the version of nixpkgs itself. This means that the same packages will be used in the build process, instead of those found on the host system at the time. This has the nice effect of preventing updates to dependencies from breaking the build; it’s a nice step towards purity and reproducibility.

Let’s take a look at the inputs of my blog flake:

From flake.nix, lines 2 through 19

  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs";
    flake-utils.url = "github:numtide/flake-utils";
    katex-html.url = "git+https://dev.danilafe.com/Nix-Configs/katex-html";
    blog-source = {
      flake = false;
      url = "https://dev.danilafe.com/Web-Projects/blog-static.git";
      type = "git";
      submodules = true;
    };
    blog-source-localized = {
      flake = false;
      url = "https://dev.danilafe.com/Web-Projects/blog-static.git";
      ref = "localization";
      type = "git";
      submodules = true;
    };
  };

Two of these inputs are my blog source code, pulled from its usual Git host. They are marked as flake = false (my blog is just a Hugo project!), and both require submodules to be fetched. One of them is set to the localization branch, once again because localization is not yet stabilized and thus not merged into my blog’s master branch. The other three inputs are flakes, one of which is just nixpkgs. The flake-utils flake provides some convenient functions for writing other flakes, and katex-html is my own creation, a KaTeX-to-HTML conversion script that I use to post-process the blog.

So what outputs should this flake provide? Well, we’ve already defined a NixOS module for the blog, and we’d like our flake to expose this module to the world. But the module alone is not enough; its configuration requires a list of packages created using our builders. Where does one procure such a list? The caller will need access to the builders themselves. To make all of this work, I ended up with the following expression for my outputs:

From flake.nix, lines 21 through 34

  outputs = { self, blog-source, blog-source-localized, nixpkgs, flake-utils, katex-html }:
    let
      buildersFor = system: import ./lib.nix {
        inherit blog-source blog-source-localized;
        pkgs = import nixpkgs { inherit system; };
        katex-html = katex-html.defaultPackage.${system};
      };
    in
      {
        inherit buildersFor;
        nixosModule = (import ./module.nix);
      } // flake-utils.lib.eachDefaultSystem (system: {
          defaultPackage = (buildersFor system).english { host = "danilafe.com"; };
      });

The flake output schema provides a standard option for exposing modules, nixosModule. Then, exposing my module.nix file from the flake is simply a matter of importing it, as on line 31. There is, however, no standard way of exposing a function. The good news is that any attribute defined on a flake is accessible from code that imports that flake. Thus, I simply added a buildersFor function, which fetches the nixpkgs collection and the KaTeX conversion script (katex-html) for a given system, and feeds them to the file that defines the english and russian builders. This buildersFor function also provides the builders with the two different blog sources they reference, blog-source and blog-source-localized.

The system parameter to buildersFor is necessary because the set of packages from nixpkgs depends on it. Thus, if the builders use any packages from the collection (they do), they must know which system to pull packages for. This is a common pattern in flakes: the packages attribute is typically a system-to-package mapping, too.

Finally, the last little bit on lines 32 through 34 defines a default package for the flake. This is the package that is built if a user runs nix build .#. This isn’t strictly necessary for my purposes, but it’s nice to be able to test that the builders still work by running a test build. The eachDefaultSystem function generates a defaultPackage attribute for each of the “default” systems, so that the package is buildable on more than just my server architecture.

And that’s it for the blog flake! I simply push it to Git, and move on to actually using it from elsewhere.

Using the Module

In my server configuration (which is, itself, a flake), I simply list my blog-static-flake as one of the inputs:

From flake.nix, line 4

    blog.url = "git+https://dev.danilafe.com/DanilaFe/blog-static-flake";

Then, in the modules attribute, I include blog.nixosModule, making NixOS aware of its options and configuration. The final little piece is to provide the english and russian builders to the system configuration; this can be done using the specialArgs attribute. The whole flake.nix file is pretty short:

From flake.nix, entire file

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
    blog.url = "git+https://dev.danilafe.com/DanilaFe/blog-static-flake";
  };
  outputs = { self, nixpkgs, blog }:
    let
      system = "x86_64-linux";
      builders = blog.buildersFor system;
    in
      {
        nixosConfigurations.nixos-droplet-v2 = nixpkgs.lib.nixosSystem {
          inherit system;
          specialArgs = { inherit system builders; };
          modules = [ ./configuration.nix blog.nixosModule ];
        };
      };
}

Finally, in configuration.nix, taking builders as one of the inputs, I write what you saw above:

From configuration.nix, lines 42 through 59

  services.danilafe-blog = {
    enable = true;
    challengePath = "/var/www/challenges";
    sites = [
      (builders.english {
        ssl = true;
        host = "danilafe.com";
      })
      (builders.english {
        drafts = true;
        host = "drafts.danilafe.com";
      })
      (builders.russian {
        drafts = true;
        host = "drafts.ru.danilafe.com";
      })
    ];
  };

Wrapping Up

So there you have it, a flake-based multi-version blog deployment written in a declarative style. You can check out both my system configuration flake and my blog flake on my Git server. If you want more, check out the articles by Xe and Yannik linked above. Thanks for reading!

Digit Sum Patterns and Modular Arithmetic

Thu, 30 Dec 2021 15:42:40 -0800

When I was in elementary school, our class was briefly visited by our school’s headmaster. He was there for a demonstration, probably intended to get us to practice our multiplication tables. “Pick a number”, he said, “And I’ll teach you how to draw a pattern from it.”

The procedure was rather simple:

Pick a number between 2 and 8 (inclusive).
Start generating positive multiples of this number. If you picked 8, your multiples would be 8, 16, 24, and so on.
If a multiple is more than one digit long, sum its digits. For instance, for 16, write 1+6=7. If the digits add up to a number that’s still more than 1 digit long, add up the digits of that number (and so on).
Start drawing on a grid. For each resulting number, draw that many squares in one direction, and then “turn”. Using 8 as our example, we could draw 8 up, 7 to the right, 6 down, 5 to the left, and so on.
As soon as you come back to where you started (“And that will always happen”, said my headmaster), you’re done. You should have drawn a pretty pattern!

Sticking with our example of 8, the pattern you’d end up with would be something like this:

Pattern generated by the number 8.

Before we go any further, let’s observe that it’s not too hard to write code to do this. For instance, the “add digits” algorithm can be naively written by turning the number into a string (17 becomes "17"), splitting that string into characters ("17" becomes ["1", "7"]), turning each of these characters back into a number (the array becomes [1, 7]) and then computing the sum of the array, leaving 8.

From patterns.rb, lines 3 through 8

def sum_digits(n)
  while n > 9
    n = n.to_s.chars.map(&:to_i).sum
  end
  n
end

We may now encode the “drawing” logic. At any point, there’s a “direction” we’re going - which I’ll denote by the Ruby symbols :top, :bottom, :left, and :right. Each step, we take the current x,y coordinates (our position on the grid), and shift them by n in a particular direction dir. We also return the new direction alongside the new coordinates.

From patterns.rb, lines 10 through 21

def step(x, y, n, dir)
  case dir
  when :top
    return [x,y+n,:right]
  when :right
    return [x+n,y,:bottom]
  when :bottom
    return [x,y-n,:left]
  when :left
    return [x-n,y,:top]
  end
end

The top-level algorithm is captured by the following code, which produces a list of coordinates in the order that you’d visit them.

From patterns.rb, lines 23 through 35

def run_number(number)
  counter = 1
  x, y, dir = 0, 0, :top
  line_stack = [[0,0]]

  loop do
    x, y, dir = step(x,y, sum_digits(counter*number), dir)
    line_stack << [x,y]
    counter += 1
    break if x == 0 && y == 0
  end
  return make_svg(line_stack)
end

I will omit the code for generating SVGs from the body of the article – you can always find the complete source code in this blog’s Git repo (or by clicking the link in the code block above). Let’s run the code on a few other numbers. Here’s one for 4, for instance:

Pattern generated by the number 4.

And one more for 2, which I don’t find as pretty.

Pattern generated by the number 2.

It really does always work out! Young me was amazed, though I would often run out of space on my grid paper to complete the pattern, or miscount the length of my lines partway in. It was only recently that I started thinking about why it works, and I think I figured it out. Let’s take a look!

Is a number divisible by 3?

You might find the whole “add up the digits of a number” thing familiar, and for good reason: it’s one way to check if a number is divisible by 3. The quick summary of this result is,

If the sum of the digits of a number is divisible by 3, then so is the whole number.

For example, the sum of the digits of 72 is 9, which is divisible by 3; 72 itself is correspondingly also divisible by 3, since 24*3=72. On the other hand, the sum of the digits of 82 is 10, which is not divisible by 3; 82 isn’t divisible by 3 either (it’s one more than 81, which is divisible by 3).
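If you’d rather not take the rule on faith, it’s easy to check by brute force. Here’s a minimal sketch that compares the digit-sum test against a direct remainder check. The helper names digit_sum and tests_agree?, and the range of 1 through 10,000, are assumptions made for illustration; they are not part of patterns.rb.

```ruby
# Sum the decimal digits of a positive integer once (no repetition).
def digit_sum(n)
  n.to_s.chars.map(&:to_i).sum
end

# True when the digit-sum test and the direct test agree about
# divisibility by 3 for the number n.
def tests_agree?(n)
  (n % 3 == 0) == (digit_sum(n) % 3 == 0)
end

# Collect every number in the range where the two tests disagree.
mismatches = (1..10_000).reject { |n| tests_agree?(n) }

puts "72: digit sum #{digit_sum(72)}, remainder #{72 % 3}"
puts "82: digit sum #{digit_sum(82)}, remainder #{82 % 3}"
puts "numbers where the two tests disagree: #{mismatches.length}"
```

A check like this is not a proof, of course; it only tells us that the rule holds for the numbers we tried.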

Why does this work? Let’s talk remainders.

If a number doesn’t cleanly divide another (we’re sticking to integers here), what’s left behind is the remainder. For instance, dividing 7 by 3 leaves us with a remainder 1. On the other hand, if the remainder is zero, then that means that our dividend is divisible by the divisor (what a mouthful). In mathematics, we typically use $a|b$ to say $a$ divides $b$, or, as we have seen above, that the remainder of dividing $b$ by $a$ is zero.

Working with remainders actually comes up pretty frequently in discrete math. A well-known example I’m aware of is the RSA algorithm, which works with remainders resulting from dividing by a product of two large prime numbers. But what’s a good way to write, in numbers and symbols, the claim that “dividing $b$ by $a$ leaves remainder $r$”? Well, we know that dividing yields a quotient (possibly zero) and a remainder (also possibly zero). Let’s call the quotient $q$. Then, we know that when dividing $b$ by $a$ we have: [note: It's important to point out that for the equation in question to represent division with quotient $q$ and remainder $r$, it must be that $r$ is less than $a$. Otherwise, you could write $r = s + a$ for some non-negative $s$, and end up with $$ \begin{aligned} & b = qa + r \\ \Rightarrow\ & b = qa + (s + a) \\ \Rightarrow\ & b = (q+1)a + s \end{aligned} $$ In plain English, if $r$ is at least $a$ after you've divided, you haven't taken out "as much $a$ from your dividend as you could", and the actual quotient is larger than $q$. ]

$$ \begin{aligned} & b = qa + r \\ \Rightarrow\ & b-r = qa \\ \end{aligned} $$

We only really care about the remainder here, not the quotient, since it’s the remainder that determines if something is divisible or not. From the form of the second equation, we can deduce that $b-r$ is divisible by $a$ (it’s literally equal to $a$ times $q$, so it must be divisible). Thus, we can write:

$$ a|(b-r) $$

There’s another notation for this type of statement, though. To say that the difference between two numbers is divisible by a third number, we write:

$$ b \equiv r\ (\text{mod}\ a) $$

Some things that seem like they would work from this “equation-like” notation do, indeed, work. For instance, we can “add two equations” (I’ll omit the proof here; jump down to this section to see how it works):

$$ \textbf{if}\ a \equiv b\ (\text{mod}\ k)\ \textbf{and}\ c \equiv d\ (\text{mod}\ k),\ \textbf{then}\ a+c \equiv b+d\ (\text{mod}\ k). $$

Multiplying both sides by the same number (call it $n$) also works (once again, you can find the proof in this section below).

$$ \textbf{if}\ a \equiv b\ (\text{mod}\ k),\ \textbf{then}\ na \equiv nb\ (\text{mod}\ k). $$
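Before leaning on these two rules, we can at least sanity-check them numerically. The sketch below is only an illustration: congruent? and count_failures are hypothetical helpers, and the small ranges of values and moduli are arbitrary choices. It tries every combination and counts how often either rule fails.

```ruby
# a is congruent to b (mod k) exactly when k divides the difference a - b.
def congruent?(a, b, k)
  ((a - b) % k).zero?
end

# Count the counterexamples to both rules over a small range of inputs.
def count_failures(values, moduli)
  failures = 0
  moduli.each do |k|
    values.product(values).each do |a, b|
      next unless congruent?(a, b, k)

      # Rule 2: multiplying both sides by the same number n.
      values.each { |n| failures += 1 unless congruent?(n * a, n * b, k) }

      # Rule 1: adding a second congruence, c congruent to d (mod k).
      values.product(values).each do |c, d|
        next unless congruent?(c, d, k)
        failures += 1 unless congruent?(a + c, b + d, k)
      end
    end
  end
  failures
end

puts "counterexamples found: #{count_failures((0..11).to_a, 2..6)}"
```

The definition of congruent? is a direct translation of the “difference is divisible” reading of the notation.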

Ok, that’s a lot of notation and other stuff. Let’s talk specifics. Of particular interest is the number 10, since our number system is base ten (the value of a digit is multiplied by 10 for every place it moves to the left). The remainder of 10 when dividing by 3 is 1. Thus, we have:

$$ 10 \equiv 1\ (\text{mod}\ 3) $$

From this, we can deduce that multiplying by 10, when it comes to remainders from dividing by 3, is the same as multiplying by 1. We can clearly see this by multiplying both sides by $n$. In our notation:

$$ 10n \equiv n\ (\text{mod}\ 3) $$

But wait, there’s more. Take any power of ten, be it a hundred, a thousand, or a million. Multiplying by that number is also equivalent to multiplying by 1!

$$ 10^kn = 10\times10\times...\times 10n \equiv n\ (\text{mod}\ 3) $$

We can put this to good use. Let’s take a large number that’s divisible by 3. This number will be made of multiple digits, like $d_2d_1d_0$. Note that I do not mean multiplication here, but specifically that each $d_i$ is a number between 0 and 9 in a particular place in the number – it’s a digit. Now, we can write:

$$ \begin{aligned} 0 &\equiv d_2d_1d_0 \\ & = 100d_2 + 10d_1 + d_0 \\ & \equiv d_2 + d_1 + d_0 \end{aligned} $$

We have just found that $d_2+d_1+d_0 \equiv 0\ (\text{mod}\ 3)$, or that the sum of the digits is also divisible by 3. The logic we use works in the other direction, too: if the sum of the digits is divisible, then so is the actual number.

There’s only one property of the number 3 we used for this reasoning: that $10 \equiv 1\ (\text{mod}\ 3)$. But it so happens that there’s another number that has this property: 9. This means that to check if a number is divisible by nine, we can also check if the sum of the digits is divisible by 9. Try it on 18, 27, 81, and 198.
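To see that property in action, here is a small sketch (the helper names power_of_ten_remainders and digit_sum are made up for this example) that prints the remainders of the first few powers of ten, and then tries the four numbers suggested above.

```ruby
# Remainders of 1, 10, 100, ... when dividing by the given divisor.
def power_of_ten_remainders(divisor, count)
  (0...count).map { |k| 10**k % divisor }
end

# Sum the decimal digits of a positive integer once.
def digit_sum(n)
  n.to_s.chars.map(&:to_i).sum
end

puts "mod 3: #{power_of_ten_remainders(3, 7).inspect}"
puts "mod 9: #{power_of_ten_remainders(9, 7).inspect}"

[18, 27, 81, 198].each do |n|
  puts "#{n}: digit sum #{digit_sum(n)}, remainder mod 9 is #{n % 9}"
end
```

Swapping in a divisor like 7 gives remainders other than 1 (already $10 \equiv 3\ (\text{mod}\ 7)$), which is why the digit-sum trick doesn’t carry over to it.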

Here’s the main takeaway: summing the digits in the way described by my headmaster is the same as figuring out the remainder of the number from dividing by 9. Well, almost. The difference is the case of multiples of 9, such as 9 itself: the remainder there is 0, but the digit sum is 9, and it’s 9 that we use to draw our line. We can actually try just using 0. Here’s the updated sum_digits code:

def sum_digits(n)
  n % 9
end

The results are similarly cool:

Pattern generated by the number 8.

Pattern generated by the number 4.

Pattern generated by the number 2.

Sequences of Remainders

So now we know what the digit-summing algorithm is really doing. But that algorithm isn’t all there is to it! We’re repeatedly applying this algorithm over and over to multiples of another number. How does this work, and why does it always loop around? Why don’t we ever spiral farther and farther from the center?

First, let’s take a closer look at our sequence of multiples. Suppose we’re working with multiples of some number $n$. Let’s write $a_i$ for the $i$th multiple. Then, we end up with:

$$ \begin{aligned} a_1 &= n \\ a_2 &= 2n \\ a_3 &= 3n \\ a_4 &= 4n \\ ... \\ a_i &= in \end{aligned} $$

This is actually called an arithmetic sequence; for each multiple, the number increases by $n$.

Here’s a first seemingly trivial point: at some time, the remainder of $a_i$ will repeat. There are only so many remainders when dividing by nine: specifically, the only possible remainders are the numbers 0 through 8. We can invoke the pigeonhole principle and say that after 9 multiples, we will have to have looped. Another way of seeing this is as follows:

$$ \begin{aligned} & 9 \equiv 0\ (\text{mod}\ 9) \\ \Rightarrow\ & 9n \equiv 0\ (\text{mod}\ 9) \\ \Rightarrow\ & 10n \equiv n\ (\text{mod}\ 9) \\ \end{aligned} $$

The 10th multiple is equivalent to $n$, and will thus have the same remainder. The looping may happen earlier: the simplest case is if we pick 9 as our $n$, in which case the remainder will always be 0.
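Here’s a quick way to look at these sequences directly. This is just a sketch with a hypothetical remainders helper; it lists the remainders of the first ten multiples of each $n$ when dividing by 9, so that the first and tenth entries can be compared.

```ruby
# Remainders of the first `count` multiples of n when dividing by d.
def remainders(n, d, count)
  (1..count).map { |i| (i * n) % d }
end

(2..9).each do |n|
  puts "n = #{n}: #{remainders(n, 9, 10).join(', ')}"
end
```

For every row, the last number matches the first; for some rows (like $n = 3$) the repetition shows up sooner.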

Repeating remainders alone do not guarantee that we will return to the center. The repeating sequence 1,2,3,4 will certainly cause a spiral. The reason is that, if we start facing “up”, we will always move up 1 and down 3 after four steps, leaving us 2 steps below where we started. Next, the cycle will repeat, and since turning four times leaves us facing “up” again, we’ll end up getting further away. Here’s a picture that captures this behavior:

Spiral generated by the number 1 with divisor 4.

And here’s one more where the cycle repeats after 8 steps instead of 4. You can see that it also leads to a spiral:

Spiral generated by the number 1 with divisor 8.
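The drift in these two spirals can also be computed without drawing anything. The following sketch assumes the same turning order as the step function (up, right, down, left); the DIRECTIONS table and the positions_after_passes helper are illustrative names, not code from patterns.rb. It records where we stand after each full pass through a repeating sequence.

```ruby
# Unit vectors for up, right, down, and left, in the order we turn.
DIRECTIONS = [[0, 1], [1, 0], [0, -1], [-1, 0]].freeze

# Walk through `lengths` over and over, turning after every step, and
# record where we stand at the end of each full pass through the list.
def positions_after_passes(lengths, passes)
  x, y, dir = 0, 0, 0
  (1..passes).map do
    lengths.each do |n|
      dx, dy = DIRECTIONS[dir]
      x += dx * n
      y += dy * n
      dir = (dir + 1) % 4
    end
    [x, y]
  end
end

puts positions_after_passes([1, 2, 3, 4], 3).inspect
puts positions_after_passes((1..8).to_a, 3).inspect
```

Each pass shifts us by the same amount (two squares left and two down for the first sequence), so the end of a pass never lands back on the starting point.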

From this, we can devise a simple condition to prevent spiraling – the length of the sequence before it repeats cannot be a multiple of 4. This way, whenever the cycle restarts, it will do so in a different direction: backwards, turned once to the left, or turned once to the right. Clearly repeating the sequence backwards is guaranteed to take us back to the start. The same is true for the left and right-turn sequences, though it’s less obvious. If drawing our sequence once left us turned to the right, drawing our sequence twice will leave us turned more to the right. On a grid, two right turns are the same as turning around. The third repetition will then undo the effects of the first one (since we’re facing backwards now), and the fourth will undo the effects of the second.
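We can watch this argument play out on a few made-up sequences. In the sketch below, the step lengths 1, 2, ..., k stand in for a cycle of length $k$; this is an assumption chosen for illustration rather than a sequence of actual remainders. The hypothetical passes_to_origin helper counts how many full repetitions it takes to get back to the start.

```ruby
# Unit vectors for up, right, down, and left, in the order we turn.
DIRECTIONS = [[0, 1], [1, 0], [0, -1], [-1, 0]].freeze

# Repeat the list of step lengths, turning after every step, and report
# after how many full passes we are back at the origin. Returns nil if
# that has not happened within max_passes.
def passes_to_origin(lengths, max_passes)
  x, y, dir = 0, 0, 0
  (1..max_passes).each do |pass|
    lengths.each do |n|
      dx, dy = DIRECTIONS[dir]
      x += dx * n
      y += dy * n
      dir = (dir + 1) % 4
    end
    return pass if x.zero? && y.zero?
  end
  nil
end

(1..8).each do |k|
  result = passes_to_origin((1..k).to_a, 8)
  message = result ? "back at the start after #{result} passes" : "still drifting after 8 passes"
  puts "cycle length #{k}: #{message}"
end
```

Lengths that leave us facing backwards need only two passes, lengths that leave us turned left or right need four, and the multiples of 4 keep drifting.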

There is an exception to this multiple-of-4 rule: if a sequence makes it back to the origin right before it starts over. In that case, even if it’s facing the very same direction it started with, all is well – things are just like when it first started, and the cycle repeats. I haven’t found a sequence that does this, so for our purposes, we’ll stick with avoiding multiples of 4.

Okay, so we want to avoid cycles with lengths divisible by four. What does it mean for a cycle to be of length k? It effectively means the following:

$$ \begin{aligned} & a_{k+1} \equiv a_1\ (\text{mod}\ 9) \\ \Rightarrow\ & (k+1)n \equiv n\ (\text{mod}\ 9) \\ \Rightarrow\ & kn \equiv 0\ (\text{mod}\ 9) \\ \end{aligned} $$

If we could divide both sides by $k$, we could go one more step:

$$ n \equiv 0\ (\text{mod}\ 9) \\ $$

That is, $n$ would be divisible by 9! This would contradict our choice of $n$ to be between 2 and 8. What went wrong? Turns out, it’s that last step: we can’t always divide by $k$. Some values of $k$ are special, and it’s only those values that can serve as cycle lengths without causing a contradiction. So, what are they?

They’re values that have a common factor with 9 (an incomplete explanation is in this section below). There are many numbers that have a common factor with 9; 3, 6, 9, 12, and so on. However, those can’t all serve as cycle lengths: as we said, cycles can’t get longer than 9. This leaves us with 3, 6, and 9 as possible cycle lengths, none of which are divisible by 4. We’ve eliminated the possibility of spirals!

Generalizing to Arbitrary Divisors

The trick was easily executable on paper because there’s an easy way to compute the remainder of a number when dividing by 9 (adding up the digits). However, we have a computer, and we don’t need to fall back on such cool-but-complicated techniques. To replicate our original behavior, we can just write:

def sum_digits(n)
  x = n % 9
  x == 0 ? 9 : x
end
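As a sanity check, here’s a sketch that compares this remainder-based version against the original repeated digit summing. Both functions are renamed (sum_digits_repeated and sum_digits_remainder) so that they can live side by side, and versions_agree? is a hypothetical helper; the limit of 100,000 is arbitrary.

```ruby
# The original approach: keep summing digits until one digit remains.
def sum_digits_repeated(n)
  while n > 9
    n = n.to_s.chars.map(&:to_i).sum
  end
  n
end

# The shortcut: take the remainder from dividing by 9, using 9 instead of 0.
def sum_digits_remainder(n)
  x = n % 9
  x == 0 ? 9 : x
end

# True when the two versions agree on every number from 1 up to limit.
def versions_agree?(limit)
  (1..limit).all? { |n| sum_digits_repeated(n) == sum_digits_remainder(n) }
end

puts "versions agree up to 100000: #{versions_agree?(100_000)}"
```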

But now, we can change the 9 to something else. There are some numbers we’d like to avoid - specifically, we want to avoid those numbers that would allow for cycles of length 4 (or of a length divisible by 4). If we didn’t avoid them, we might run into infinite loops, where our pencil might end up moving further and further from the center.

Actually, let’s revisit that. When we were playing with cycles of length $k$ while dividing by 9, we noted that the only possible values of $k$ are those that share a common factor with 9, specifically 3, 6 and 9. But that’s not quite as strong as it could be: try as you might, you will not find a cycle of length 6 when dividing by 9. The same is true if we pick 6 instead of 9, and try to find a cycle of length 4. Even though 4 does have a common factor with 6, and thus is not ruled out as a valid cycle by our previous condition, we don’t find any cycles of length 4.

So what is it that really determines if there can be cycles or not?

Let’s do some more playing around. What are the actual cycle lengths when we divide by 9? For all but two numbers, the cycle lengths are 9. The two special numbers are 6 and 3, and they end up with a cycle length of 3. From this, we can say that the cycle length seems to depend on whether or not our $n$ has any common factors with the divisor.
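This kind of playing around is easy to automate. The sketch below uses a hypothetical cycle_length helper that simply searches for the smallest $k$ with $kn \equiv 0$; it prints the cycle lengths for the divisor 9, and for the divisor 6 mentioned earlier.

```ruby
# Smallest k >= 1 such that k * n leaves remainder 0 when divided by d.
# The search always succeeds, because k = d works.
def cycle_length(n, d)
  (1..d).find { |k| (k * n) % d == 0 }
end

puts "dividing by 9:"
(2..8).each { |n| puts "  n = #{n}: cycle length #{cycle_length(n, 9)}" }

puts "dividing by 6:"
(1..5).each { |n| puts "  n = #{n}: cycle length #{cycle_length(n, 6)}" }
```

Neither list contains the lengths we were worried about: there is no 6 when dividing by 9, and no 4 when dividing by 6.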

Let’s explore this some more with a different divisor, say 12. We will find that 8 has a cycle length of 3, 7 has a cycle length of 12, and 9 has a cycle length of 4. What’s happening here? To see, let’s divide 12 by these cycle lengths. For 8, we get (12/3) = 4. For 7, this works out to 1. For 9, it works out to 3. These new numbers, 4, 1, and 3, are actually the greatest common factors of 8, 7, and 9 with 12, respectively. The greatest common factor of two numbers is the largest number that divides them both. We thus write down our guess for the length of a cycle:

$$ k = \frac{d}{\text{gcd}(d,n)} $$

Where $d$ is our divisor, which has been 9 until just recently, and $\text{gcd}(d,n)$ is the greatest common factor of $d$ and $n$. This equation is in agreement with our experiment for $d = 9$, too. Why might this be? Recall that sequences with period $k$ imply the following congruence:

$$ kn \equiv 0\ (\text{mod}\ d) $$

Here I’ve replaced 9 with $d$, since we’re trying to make it work for any divisor, not just 9. Now, suppose the greatest common divisor of $n$ and $d$ is some number $f$. Then, since this number divides $n$ and $d$, we can write $n=fm$ for some $m$, and $d=fg$ for some $g$. We can rewrite our congruence as follows:

$$ kfm \equiv 0\ (\text{mod}\ fg) $$

We can simplify this a little bit. Recall that what this congruence really means is that the difference of $kfm$ and $0$, which is just $kfm$, is divisible by $fg$:

$$ fg|kfm $$

But if $fg$ divides $kfm$, it must be that $g$ divides $km$! This, in turn, means we can write:

$$ g|km $$

Can we distill this statement even further? It turns out that we can. Remember that we got $g$ and $m$ by dividing $d$ and $n$ by their greatest common factor, $f$. This, in turn, means that $g$ and $m$ have no more common factors that aren’t equal to 1 (see this section below). From this, in turn, we can deduce that $m$ is not relevant to $g$ dividing $km$, and we get:

$$ g|k $$

That is, we get that $k$ must be divisible by $g$. Recall that we got $g$ by dividing $d$ by $f$, which is our largest common factor – aka $\text{gcd}(d,n)$. We can thus write:

$$ \frac{d}{\text{gcd}(d,n)}|k $$

Let’s stop and appreciate this result. We have found a condition that is required for a sequence of remainders from dividing by $d$ (which was 9 in the original problem) to repeat after $k$ numbers. Furthermore, all of our steps can be performed in reverse, which means that if a $k$ matches this condition, we can work backwards and determine that a sequence of numbers has to repeat after $k$ steps.

Multiple $k$s will match this condition, and that’s not surprising. If a sequence repeats after 5 steps, it also repeats after 10, 15, and so on. We’re interested in the first time our sequences repeat after taking any steps, which means we have to pick the smallest possible non-zero value of $k$. The smallest number divisible by $d/\text{gcd}(d,n)$ is $d/\text{gcd}(d,n)$ itself. We thus confirm our hypothesis:

$$ k = \frac{d}{\text{gcd}(d,n)} $$
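The formula is simple enough to test against a brute-force search. In this sketch, cycle_length and formula_failures are hypothetical helpers, and the upper bound of 60 on the divisor is an arbitrary choice; Ruby’s built-in Integer#gcd does the rest.

```ruby
# Smallest k >= 1 such that k * n leaves remainder 0 when divided by d.
def cycle_length(n, d)
  (1..d).find { |k| (k * n) % d == 0 }
end

# Every pair [d, n] with 1 <= n < d <= max_divisor for which the
# brute-force cycle length differs from d / gcd(d, n).
def formula_failures(max_divisor)
  failures = []
  (2..max_divisor).each do |d|
    (1...d).each do |n|
      failures << [d, n] unless cycle_length(n, d) == d / d.gcd(n)
    end
  end
  failures
end

[8, 7, 9].each do |n|
  puts "n = #{n}, d = 12: searched #{cycle_length(n, 12)}, formula #{12 / 12.gcd(n)}"
end
puts "pairs where the formula is wrong: #{formula_failures(60).length}"
```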

Lastly, recall that our patterns would spiral away from the center whenever a $k$ is a multiple of 4. Now that we know what $k$ is, we can restate this as “$d/\text{gcd}(d,n)$ is divisible by 4”. But if we pick $n=d-1$, the greatest common factor has to be $1$ (see this section below), so we can even further simplify this to “$d$ is divisible by 4”. Thus, we can state simply that any divisor divisible by 4 is off-limits, as it will induce spirals. For example, pick $d=4$. Running our algorithm for $n=d-1=3$, [note: Did you catch that? From our work above, we didn't just find a condition that would prevent spirals; we also found the precise number that would result in a spiral if this condition were violated! This is because our proof is constructive: instead of just claiming the existence of a thing, it also shows how to get that thing. Our proof in the earlier section (which claimed that the divisor 9 would never create spirals) went by contradiction, which was not constructive. Repeating that proof for a general $d$ wouldn't have told us the specific numbers that would spiral.

This is the reason that direct proofs tend to be preferred over proofs by contradiction. ] we indeed find an infinite spiral:

Spiral generated by the number 3 with divisor 4.

Let’s try again. Pick $d=8$; then, for $n=d-1=7$, we also get a spiral:

Spiral generated by the number 7 with divisor 8.
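If you want to hunt for spirals without generating SVGs, a sketch along the following lines may help. It is not the code from patterns.rb: step_length and steps_to_close are hypothetical helpers, the divisor is passed in explicitly, and the cap on the number of steps is an assumption that stands in for “forever”.

```ruby
# Unit vectors for up, right, down, and left, in the order we turn.
DIRECTIONS = [[0, 1], [1, 0], [0, -1], [-1, 0]].freeze

# Length of the i-th line: the remainder of i * n from dividing by d,
# with d itself used in place of a remainder of 0.
def step_length(i, n, d)
  r = (i * n) % d
  r.zero? ? d : r
end

# Number of lines drawn before first returning to the origin, or nil if
# that does not happen within max_steps lines.
def steps_to_close(n, d, max_steps)
  x, y = 0, 0
  (1..max_steps).each do |i|
    dx, dy = DIRECTIONS[(i - 1) % 4]
    len = step_length(i, n, d)
    x += dx * len
    y += dy * len
    return i if x.zero? && y.zero?
  end
  nil
end

[[8, 9], [3, 4], [7, 8]].each do |n, d|
  result = steps_to_close(n, d, 400)
  puts "n = #{n}, d = #{d}: #{result ? "closes after #{result} lines" : 'no return within 400 lines'}"
end
```

A nil result on its own only means “not yet”; it’s the reasoning above that tells us the $d=4$ and $d=8$ cases will never come back.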

A poem comes to mind:

Turning and turning in the widening gyre

The falcon cannot hear the falconer;

Fortunately, there are plenty of numbers that are not divisible by four, and we can pick any of them! I’ll pick primes for good measure. Here are a few good ones from using 13 (which corresponds to summing digits of base-14 numbers):

Pattern generated by the number 8 in base 14.

Pattern generated by the number 4 in base 14.

Here’s one from dividing by 17 (base-18 numbers).

Pattern generated by the number 5 in base 18.

Finally, base-30:

Pattern generated by the number 2 in base 30.

Pattern generated by the number 6 in base 30.

Generalizing to Arbitrary Numbers of Directions

What if we didn’t turn 90 degrees each time? What, if, instead, we turned 120 degrees (so that turning 3 times, not 4, would leave you facing the same direction you started)? We can pretty easily do that, too. Let’s call this number of turns $c$. Up until now, we had $c=4$.

First, let’s update our condition. Before, we had “$d$ cannot be divisible by 4”. Now, we aren’t constraining ourselves to only 4, but rather using a generic variable $c$. We then end up with “$d$ cannot be divisible by $c$”. For instance, suppose we kept our divisor as 9 for the time being, but started turning 3 times instead of 4. This violates our divisibility condition, and we once again end up with a spiral:

Pattern generated by the number 8 in base 10 while turning 3 times.

If, on the other hand, we pick $d=8$ and $c=3$, we get patterns for all numbers just like we hoped. Here’s one such pattern:

Pattern generated by the number 7 in base 9 while turning 3 times.

Hold on a moment; it’s actually not so obvious why our condition still works. When we just turned on a grid, things were simple. As long as we didn’t end up facing the same way we started, we would eventually perform the exact same motions in reverse. The same is not true when turning 120 degrees, like we suggested. Here’s an animated circle showing all of the turns we would make:

Orientations when turning 120 degrees

We never quite do the exact opposite of any one of our movements. So then, will we come back to the origin anyway? Well, let’s start simple. Suppose we always turn by exactly one 120-degree increment (we might end up turning more or less, just like we may end up turning left, right, or back in the 90 degree case). Each time you face a particular direction, after performing a cycle, you will have moved some distance away from where you started, and turned 120 degrees. If you then repeat the cycle, you will once again move by the same offset as before, but this time the offset will be rotated 120 degrees, and you will have rotated a total of 240 degrees. Finally, performing the cycle a third time, you’ll have moved by the same offset (rotated 240 degrees).

If you overlay each offset such that their starting points overlap, they will look very similar to that circle above. And now, here’s the beauty: you can arrange these rotated offsets into a triangle:

Triangle formed by three 120-degree turns.

As long as you rotate by the same amount each time (and you will, since the cycle length determines how many times you turn, and the cycle length never changes), you can do so for any number of directions. For instance, here’s a similar visualization in which there are 5 possible directions, and where each turn is consequently 72 degrees:

Pentagon formed by five 72-degree turns.

Each of these polygon shapes forms a loop. If you walk along its sides, you will eventually end up exactly where you started. This confirms that if you end up making one turn at the end of each cycle, you will eventually end up right where you started.
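The polygon argument can be checked numerically by treating an offset as a complex number, so that a turn becomes multiplication by a number of length 1. The sketch below is only an illustration: total_displacement is a hypothetical helper, the offset of 3 + 1i is arbitrary, and the rotation happens to go counterclockwise (the direction doesn’t matter for whether the polygon closes).

```ruby
# Add up `c` copies of the same offset, where each copy is rotated by one
# more turn of 360/c degrees than the previous one.
def total_displacement(offset, c)
  turn = Complex.polar(1, 2 * Math::PI / c)
  (0...c).sum(Complex(0, 0)) { |j| offset * turn**j }
end

offset = Complex(3, 1) # an arbitrary offset picked for illustration
[3, 4, 5].each do |c|
  distance = total_displacement(offset, c).abs.round(9)
  puts "#{c} directions: distance from the start is #{distance}"
end
```

Because this uses floating-point numbers, the distances come out as zero only up to rounding, which is why the sketch rounds before printing.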

Things aren’t always as simple as making a single turn, though. Let’s go back to the version of the problem in which we have 3 possible directions, and think about what would happen if we turned by 240 degrees at a time: 2 turns instead of 1?

Even though we first turn a whole 240 degrees, the second time we turn we “overshoot” our initial bearing, and end up at 120 degrees compared to it. As soon as we turn 240 more degrees (turning the third time), we end up back at 0. In short, even though we “visited” each bearing in a different order, we visited them all, and exactly once at that. Here’s a visualization:

Orientations when turning 120 degrees, twice at a time

Note that even though in the above picture it looks like we’re just turning left instead of right, that’s not the case; a single turn of 240 degrees is more than half the circle, so our second bearing ends up on the left side of the circle even though we turn right.

Just to make sure we really see what’s happening, let’s try this when there are 5 possible directions, and when we still make two turns (now of 72 degrees each):

Orientations when turning 72 degrees, twice at a time
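
The two pictures above can also be reproduced with a few lines of code. This is only a sketch: the helper `bearings` is a hypothetical name, and it assumes the number of directions divides 360 evenly.

```python
def bearings(directions, turns_per_step):
    """List the bearings (in degrees) visited before facing the start again."""
    step = 360 // directions  # smallest possible turn; assumes directions divides 360
    visited = []
    bearing = 0
    while True:
        visited.append(bearing)
        bearing = (bearing + turns_per_step * step) % 360
        if bearing == 0:
            return visited

print(bearings(3, 2))  # [0, 240, 120]
print(bearings(5, 2))  # [0, 144, 288, 72, 216]
```

In both cases every bearing shows up exactly once, just in a different order than if we made a single turn at a time.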

Let’s try to put some mathematical backing to this “visited them all” idea, and turning in general. First, observe that as soon as we turn 360 degrees, it’s as good as not turning at all - we end up facing up again. If we turned 480 degrees (that is, two turns of 240 degrees each), the first 360 can be safely ignored, since it puts us where we started; only the 120 degrees that remain are needed to figure out our final bearing. In short, the final direction we’re facing is the remainder from dividing by 360. We already know how to formulate this using modular arithmetic: if we turn $t$ degrees $k$ times, and end up at final bearing (remainder) $b$, this is captured by:

$$ kt \equiv b\ (\text{mod}\ 360) $$

Of course, if we end up facing the same way we started, we get the familiar equivalence:

$$ kt \equiv 0\ (\text{mod}\ 360) $$

Even though the variables in this equivalence mean different things now than they did last time we saw it, the mathematical properties remain the same. For instance, we can say that after $360/\text{gcd}(360, t)$ turns, we’ll end up facing the way that we started.
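
As a quick sanity check of that formula, here is a brute-force sketch that finds the smallest $k$ directly and compares it against $360/\text{gcd}(360, t)$. The function name `turns_until_facing_start` is made up for this example, and $t$ is assumed to be a positive whole number of degrees.

```python
from math import gcd

def turns_until_facing_start(t):
    """Smallest k > 0 with k*t divisible by 360, found by brute force."""
    k = 1
    while (k * t) % 360 != 0:
        k += 1
    return k

for t in (72, 120, 144, 240):
    assert turns_until_facing_start(t) == 360 // gcd(360, t)
    print(t, turns_until_facing_start(t))
```

For these four angles the loop reports 5, 3, 5, and 3 turns respectively, matching the pictures above.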

So far, so good. What I don’t like about this, though, is that we have all of these numbers of degrees all over our equations: 72 degrees, 144 degrees, and so forth. However, something like 73 degrees (if there are five possible directions) is just not a valid bearing, and nor is 71. We have so many possible degrees (360 of them, to be exact), but we’re only using a handful! That’s wasteful. Instead, observe that for $c$ possible turns, the smallest possible turn angle is $360/c$. Let’s call this angle $\theta$ (theta). Now, notice that we always turn in multiples of $\theta$: a single turn moves us $\theta$ degrees, two turns move us $2\theta$ degrees, and so on. If we define $r$ to be the number of turns that we find ourselves rotated by after a single cycle, we have $t=r\theta$, and our turning equation can be written as:

$$ kr\theta \equiv 0\ (\text{mod}\ c\theta) $$

Now, once again, recall that the above equivalence is just notation for the following:

$$ \begin{aligned} & c\theta|kr\theta \\ \Leftrightarrow\ & c|kr \end{aligned} $$

And finally, observing that $kr=kr-0$, we have:

$$ kr \equiv 0\ (\text{mod}\ c) $$

This equivalence says the same thing as our earlier one; however, instead of being in terms of degrees, it’s in terms of the number of turns $c$ and the turns-per-cycle $r$. Now, recall once again that the smallest number of steps $k>0$ for which this equivalence holds is $k = c/\text{gcd}(c,r)$.

We’re close now: we have a sequence of $k$ steps that will lead us back to the beginning. What’s left is to show that these $k$ steps are evenly distributed throughout our circle, which is the key property that makes it possible for us to make a polygon out of them (and thus end up back where we started).

To show this, say that we have a greatest common divisor $f=\text{gcd}(c,r)$, and that $c=fe$ and $r=fs$. We can once again “divide through” by $f$, and get:

$$ ks \equiv 0\ (\text{mod}\ e) $$

Now, we know that $\text{gcd}(e,s)=1$ (see this section below), and thus:

$$ k = e/\text{gcd}(e,s) = e $$

That is, our cycle will repeat after $e$ remainders. But wait, we’ve only got $e$ possible remainders: the numbers $0$ through $e-1$! Thus, for a cycle to repeat after $e$ remainders, all possible remainders must occur. For a concrete example, take $e=5$; our remainders will be the set $\{0,1,2,3,4\}$. Now, let’s “multiply back through” by $f$:

$$ kfs \equiv 0\ (\text{mod}\ fe) $$

We still have $e$ possible remainders, but this time they are multiplied by $f$. For example, taking $e$ to once again be equal to $5$, we have the set of possible remainders $\{0, f, 2f, 3f, 4f\}$. The important bit is that these remainders are all evenly spaced, and that space between them is $f=\text{gcd}(c,r)$.
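
To see the even spacing concretely, here is a small sketch that counts bearings in turns rather than degrees, and collects every bearing we reach by repeatedly turning $r$ out of $c$. The name `visited_turns` and the sample pairs are assumptions for illustration only.

```python
from math import gcd

def visited_turns(c, r):
    """Bearings, counted in turns, reached by repeatedly turning r out of c."""
    seen = []
    bearing = 0
    while bearing not in seen:
        seen.append(bearing)
        bearing = (bearing + r) % c
    return sorted(seen)

for c, r in [(12, 8), (12, 7), (10, 4), (5, 2)]:
    f = gcd(c, r)
    # The visited bearings are exactly the multiples of f below c.
    assert visited_turns(c, r) == list(range(0, c, f))
    print(c, r, visited_turns(c, r))
```

For example, with $c=12$ and $r=8$ we have $f=4$, and the bearings visited are $0$, $4$, and $8$: three of them, spaced $4$ turns apart.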

Let’s recap: we have confirmed that for $c$ possible turns (4 in our original formulation), and $r$ turns at a time, we will always loop after $k=c/\text{gcd}(c,r)$ steps, evenly spaced out at $\text{gcd}(c,r)$ turns. No specific properties from $c$ or $r$ are needed for this to work. Finally, recall from the previous section that $r$ is zero (and thus, our pattern breaks down) whenever the divisor $d$ (9 in our original formulation) is itself divisible by $c$. And so, as long as we pick a system with $c$ possible directions and divisor $d$, we will always loop back and create a pattern as long as $c\nmid d$ ($c$ does not divide $d$).

Let’s try it out! There are a few pictures below. When reading the captions, keep in mind that the base is one more than the divisor (we started with numbers in the usual base 10, but divided by 9).

Pattern generated by the number 1 in base 8 while turning 5 times.

Pattern generated by the number 3 in base 5 while turning 7 times.

Pattern generated by the number 3 in base 12 while turning 6 times.

Pattern generated by the number 2 in base 12 while turning 7 times.

Conclusion

Today we peeked under the hood of a neat mathematical trick that was shown to me by my headmaster over 10 years ago now. Studying what it was that made this trick work led us to play with the underlying mathematics some more, and extend the trick to more situations (and prettier patterns). I hope you found this as interesting as I did!

By the way, the kind of math that we did in this article is most closely categorized as number theory. Check it out if you’re interested!

Finally, a huge thank you to Arthur for checking my math, helping me with proofs, and proofreading the article.

All that remains are some proofs I omitted from the original article since they were taking up a lot of space (and were interrupting the flow of the explanation). They are listed below.

Referenced Proofs

Adding Two Congruences

Claim: If for some numbers $a$, $b$, $c$, $d$, and $k$, we have $a \equiv b\ (\text{mod}\ k)$ and $c \equiv d\ (\text{mod}\ k)$, then it’s also true that $a+c \equiv b+d\ (\text{mod}\ k)$.

Proof: By definition, we have $k|(a-b)$ and $k|(c-d)$. This, in turn, means that for some $i$ and $j$, $a-b=ik$ and $c-d=jk$. Add both sides to get:

$$ \begin{aligned} & (a-b)+(c-d) = ik+jk \\ \Rightarrow\ & (a+c)-(b+d) = (i+j)k \\ \Rightarrow\ & k\ |\left[(a+c)-(b+d)\right]\\ \Rightarrow\ & a+c \equiv b+d\ (\text{mod}\ k) \end{aligned} $$

$\blacksquare$

Multiplying Both Sides of a Congruence

Claim: If for some numbers $a$, $b$, $n$ and $k$, we have $a \equiv b\ (\text{mod}\ k)$ then we also have that $an \equiv bn\ (\text{mod}\ k)$.

Proof: By definition, we have $k|(a-b)$. Since multiplying $a-b$ by $n$ cannot make it not divisible by $k$, we also have $k|\left[n(a-b)\right]$. Distributing $n$, we have $k|(na-nb)$. By definition, this means $na\equiv nb\ (\text{mod}\ k)$.

$\blacksquare$
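
Neither of the two proofs above needs a computer, but it can be reassuring to test the claims on small numbers. The sketch below does so by brute force; the helper `congruent` and the chosen ranges are hypothetical, and passing the checks is of course not a proof.

```python
from itertools import product

def congruent(a, b, k):
    """a ≡ b (mod k), using the definition k | (a - b)."""
    return (a - b) % k == 0

checked = 0
for k in (2, 5, 9):
    for a, b, c, d in product(range(-6, 7), repeat=4):
        if congruent(a, b, k) and congruent(c, d, k):
            assert congruent(a + c, b + d, k)      # adding two congruences
            for n in range(-3, 4):
                assert congruent(a * n, b * n, k)  # multiplying both sides by n
            checked += 1
print("checked", checked, "cases")
```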

Claim: A number $k$ is only invertible (can be divided by) in $\text{mod}\ d$ if $k$ and $d$ share no common factors (except 1).

Proof: Write $\text{gcd}(k,d)$ for the greatest common divisor of $k$ and $d$. Another important fact (not proven here, but see something like this), is that if $\text{gcd}(k,d) = r$, then the smallest possible positive number that can be made by adding and subtracting $k$s and $d$s is $r$. That is, for some $i$ and $j$, the smallest possible positive value of $ik + jd$ is $r$.

Now, note that $d \equiv 0\ (\text{mod}\ d)$. Multiplying both sides by $j$, we get $jd\equiv 0\ (\text{mod}\ d)$. This, in turn, means that the smallest possible positive value of $ik+jd \equiv ik$ is $r$. If $r$ is bigger than 1 (i.e., if $k$ and $d$ have common factors), then we can’t pick $i$ such that $ik\equiv1$, since we know that $r>1$ is the least possible value we can make. There is therefore no multiplicative inverse to $k$. Alternatively worded, we cannot divide by $k$.

$\blacksquare$
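
Here is a small brute-force sketch of the same claim: it searches for a multiplicative inverse of $k$ in $\text{mod}\ d$ and prints the result next to $\text{gcd}(k,d)$. The function `inverse_mod` is a hypothetical helper written for this example, and the choice $d=9$ is just a convenient sample.

```python
from math import gcd

def inverse_mod(k, d):
    """Return some i with i*k ≡ 1 (mod d), or None if no such i exists."""
    for i in range(1, d):
        if (i * k) % d == 1:
            return i
    return None

d = 9
for k in range(1, d):
    print(k, gcd(k, d), inverse_mod(k, d))
```

With $d=9$, the search comes up empty exactly for $k=3$ and $k=6$, the two values that share a factor with 9. The search also finds an inverse whenever the greatest common divisor is 1, which is the converse of the claim; that direction is true as well, though the proof above doesn’t cover it.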

Numbers Divided by Their $\text{gcd}$ Have No Common Factors

Claim: For any two numbers $a$ and $b$ and their largest common factor $f$, if $a=fc$ and $b=fd$, then $c$ and $d$ have no common factors other than 1 (i.e., $\text{gcd}(c,d)=1$).

Proof: Suppose that $c$ and $d$ do have a common factor, $e\neq1$. In that case, we have $c=ei$ and $d=ej$ for some $i$ and $j$. Then, we have $a=fei$, and $b=fej$. From this, it’s clear that both $a$ and $b$ are divisible by $fe$. Since $e$ is greater than $1$, $fe$ is greater than $f$. But our assumptions state that $f$ is the greatest common divisor of $a$ and $b$! We have arrived at a contradiction.

Thus, $c$ and $d$ cannot have a common factor other than 1.

$\blacksquare$
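
As with the other proofs, a quick numerical check can’t replace the argument, but it’s a cheap way to catch mistakes. The sketch below divides pairs of small positive numbers by their greatest common divisor and confirms that what’s left shares no factors; `divide_by_gcd` is a made-up helper name.

```python
from math import gcd

def divide_by_gcd(a, b):
    """Return (c, d) such that a = f*c and b = f*d, where f = gcd(a, b)."""
    f = gcd(a, b)
    return a // f, b // f

for a in range(1, 60):
    for b in range(1, 60):
        c, d = divide_by_gcd(a, b)
        assert gcd(c, d) == 1

print(divide_by_gcd(12, 8))  # (3, 2)
```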

Divisors of $n$ and $n-1$

Claim: For any $n$, $\text{gcd}(n,n-1)=1$. That is, $n$ and $n-1$ share no common divisors.

Proof: Suppose some number $f$ divides both $n$ and $n-1$. In that case, we can write $n=af$, and $(n-1)=bf$ for some $a$ and $b$. Subtracting one equation from the other:

$$ 1 = (a-b)f $$

But this means that 1 is divisible by $f$! That’s only possible if $f=1$. Thus, the only number that divides $n$ and $n-1$ is 1; that’s our greatest common factor.

$\blacksquare$
