Consider this silly code:
trait MyTrait {
fn foo(&self);
}
struct S1;
impl MyTrait for S1 {
fn foo(&self) {
println!("S1::foo()");
}
}
fn call_foo<T>(t: &T) where T: MyTrait {
t.foo();
}
fn main() {
let s1 = S1{};
call_foo(&s1);
}
This seems fine so far.
Now, let’s suppose we have a collection of MyTraits, like this:
// Previous code not shown.
struct S2;
impl MyTrait for S2 {
fn foo(&self) {
println!("S2::foo()");
}
}
fn main() {
let v: Vec<&dyn MyTrait> = vec![&S1{}, &S2{}];
for x in v {
call_foo(x);
}
}
This produces this compilation error:
Compiling playground v0.0.1 (/playground)
error[E0277]: the size for values of type `dyn MyTrait` cannot be known at compilation time
--> src/main.rs:28:18
|
28 | call_foo(x);
| -------- ^ doesn't have a size known at compile-time
| |
| required by a bound introduced by this call
|
= help: the trait `Sized` is not implemented for `dyn MyTrait`
The problem is that Rust generics are monomorphized: the compiler stamps out a separate copy of call_foo for each concrete T, and every type parameter carries an implicit Sized bound. Monomorphization is not supported for trait objects because dyn MyTrait is unsized, so it can never satisfy that bound.
call_foo is a colored function.
The code doesn’t compile because trait objects are the wrong color.
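Concretely, the implicit Sized bound named in the error is the blocker: relaxing it with ?Sized lets the same generic accept dyn MyTrait. A minimal sketch, with foo returning a string instead of printing (a liberty taken so the result is easy to check):

```rust
trait MyTrait {
    fn foo(&self) -> &'static str;
}

struct S1;
struct S2;

impl MyTrait for S1 {
    fn foo(&self) -> &'static str { "S1::foo()" }
}
impl MyTrait for S2 {
    fn foo(&self) -> &'static str { "S2::foo()" }
}

// `?Sized` removes the implicit `Sized` bound, so `T` may be `dyn MyTrait`.
fn call_foo<T: MyTrait + ?Sized>(t: &T) -> &'static str {
    t.foo()
}

fn main() {
    let v: Vec<&dyn MyTrait> = vec![&S1, &S2];
    for x in v {
        println!("{}", call_foo(x)); // prints S1::foo(), then S2::foo()
    }
}
```

Of course, this only helps when you control the generic function; as the next example shows, that isn't always the case.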
Here’s a real example: The Rust bindings for interacting with the Z3
theorem prover have a trait
z3::ast::Ast to represent terms, constants, and expressions. As
you’re building a theory, you may want to maintain a vector of your
constants in a Vec<Box<dyn z3::ast::Ast>>. Once Z3 has constructed a
model that satisfies your theory, you’ll probably want to query the model for
the values of constants via the method pub fn get_const_interp<T: Ast<'ctx>>(&self, ast: &T) -> Option<T>.
Well, you just shot your foot off. You can’t call this method on a trait object, so now you need to redo the work you just did. And the new code is going to be a whole lot uglier.
In contrast to the orthodox Rust opinion, we should prefer to use trait objects unless we explicitly need to combine multiple trait bounds or dynamic dispatch is a performance issue. Here’s what I mean:
// Previous code not shown.
fn call_foo(x: &dyn MyTrait) {
x.foo();
}
fn main() {
let v: Vec<&dyn MyTrait> = vec![&S1{}, &S2{}];
for x in v {
call_foo(x);
}
}
Note that this trait object is general enough to work with many data
structures. For example, we can still use a Box with this implementation:
// Previous code not shown.
fn main() {
let v2: Vec<std::boxed::Box<dyn MyTrait>> = vec![std::boxed::Box::new(S1{}),
std::boxed::Box::new(S2{})];
for x in &v2 {
call_foo(x.as_ref());
}
call_foo(v2[0].as_ref());
}
And, of course, we can still use call_foo on a specific instance:
// Previous code not shown.
fn main() {
let s = S1{};
call_foo(&s);
}
You should just always implement your traits for trait objects:
// Previous code not shown.
impl MyTrait for &dyn MyTrait {
fn foo(&self) {
(**self).foo();
}
}
fn call_foo<T>(x: &T) where T: MyTrait {
x.foo();
}
fn main() {
let v: Vec<&dyn MyTrait> = vec![&S1{}, &S2{}];
for x in v {
call_foo(&x);
}
}
Note that this code also works on other kinds of trait objects:
// Previous code not shown.
fn main() {
let v2: Vec<std::boxed::Box<dyn MyTrait>> = vec![std::boxed::Box::new(S1{}),
std::boxed::Box::new(S2{})];
for x in &v2 {
call_foo(&x.as_ref());
}
call_foo(&v2[0].as_ref());
let v3: Vec<std::rc::Rc<dyn MyTrait>> = vec![std::rc::Rc::new(S1{}),
std::rc::Rc::new(S2{})];
for x in &v3 {
call_foo(&x.as_ref());
}
call_foo(&v3[0].as_ref());
}
If you create a trait, then you must be the one that implements it for trait objects. Per the coherence rule, a trait can only be implemented for a type by the crate that defines the trait or the crate that defines the type.
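Since the defining crate is the only place this impl can live, it may as well be a single blanket impl over references, which covers &dyn MyTrait and plain &S1 alike. A sketch, again with foo returning a string for testability:

```rust
trait MyTrait {
    fn foo(&self) -> &'static str;
}

struct S1;

impl MyTrait for S1 {
    fn foo(&self) -> &'static str { "S1::foo()" }
}

// Blanket impl: a reference to any MyTrait (sized or not) is itself a MyTrait.
// Coherence permits this here because this crate defines MyTrait.
impl<'a, T: MyTrait + ?Sized> MyTrait for &'a T {
    fn foo(&self) -> &'static str { (**self).foo() }
}

fn call_foo<T: MyTrait>(t: &T) -> &'static str {
    t.foo()
}

fn main() {
    let d: &dyn MyTrait = &S1;
    println!("{}", call_foo(&d)); // T = &dyn MyTrait, resolved via the blanket impl
}
```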
There’s a lot of code in the wild that shares the same pain-point as the Z3 example I mentioned. It shouldn’t be difficult to use generics. Effective Rust does explain the reasons for the current design rather well, but I feel this is an area that could be improved.
Programs communicate – whether with other programs or humans. Software developers write programs with a protocol in mind. Sometimes there’s documentation for the protocol. But there’s no mechanism that keeps implementation and documentation in sync. Bugs occur when protocols diverge.
Many of us already use type systems. But naive approaches to typing
fall short of guaranteeing that an implementation speaks a
protocol. For example: Suppose two threads T1 and T2 communicate
over a channel chan. T1 and T2 play a guessing game. T1
guesses a number (int) and T2 informs T1 if the guess is right
(bool). We might type chan as Chan<std::variant<int,
bool>>. This isn’t helpful, though. If T1 sends a bool the
program should not compile, yet it does.
Session types are a tool that solves this problem. This post discusses an implementation of session types in C++. You’ll learn more about how you can use session types to specify protocols. You’ll also see some features in C++ (concepts and template meta-programming) you might not know how to use today.
All code is available on GitHub.
Instead of two threads playing a guessing game, let’s make a game for humans. First, the computer generates a random number between 1 and 100. Second, the computer prompts the user to guess the number. Then, the user enters a guess. Next, the computer evaluates the user’s guess. If the guess is correct then the program sends a congratulatory message and exits. If the guess is wrong then the program asks the user if they give up. The user keeps guessing the generated number until they get it right or give up.
This listing shows how we might specify this protocol with session types:
template <HasDual P>
using QueryUserProtocol = Send<std::string, Recv<int, P>>;
using KeepPlayingProtocol = Send<std::string, Recv<std::string, Var<Z>>>;
using ExitUserLost = Send<std::string, Send<int, Send<std::string, Send<std::ostream&(std::ostream&), Z>>>>;
using ExitUserWon = Send<std::string, Send<std::ostream&(std::ostream&), Z>>;
using ExitProtocol = Choose<ExitUserLost, ExitUserWon>;
using GuessingGameProtocol =
Rec<Choose<QueryUserProtocol<Choose<KeepPlayingProtocol, Var<Z>>>,
ExitProtocol>>;
Let’s unpack:
- Rec<P> introduces a recursive protocol. It allows the protocol to repeat itself using Var.
- Choose<P1, P2> allows the implementation to make a choice between protocols P1 and P2. Choose<QueryUserProtocol<...>, ExitProtocol> represents a choice between asking the user for another guess and terminating.
- Send<T1, P> represents that the implementation sends a value of type T1 then executes the protocol P. Similarly, Recv receives.
- Var<N> accepts a natural number – either Z or Succ<M> – and returns to the recursive environment N levels out.

Here’s what an implementation of this protocol might look like:
int main() {
std::default_random_engine generator;
generator.seed(time(nullptr));
std::uniform_int_distribution distribution(1, 100);
const auto the_number = distribution(generator);
auto keep_going = true;
auto guess = 0;
Chan<GuessingGameProtocol, decltype(&std::cin), decltype(&std::cout)> chan(&std::cin, &std::cout);
while (keep_going) {
auto c1 = chan.enter().choose1();
auto c2 = c1 << "Guess: ";
auto c3 = c2 >> guess;
keep_going = guess != the_number;
if (keep_going) {
auto c4 = c3.choose1();
auto c5 = c4 << "Incorrect. Keep playing? (y/n) ";
std::string response;
auto c6 = c5 >> response;
keep_going = response != "n";
chan = c6.ret();
} else {
chan = c3.choose2().ret();
}
}
if (guess != the_number) {
auto ce = chan.enter().choose2().choose1();
ce << "You lose. I was thinking of " << the_number << "." << std::endl;
} else {
auto ce = chan.enter().choose2().choose2();
ce << "You win!" << std::endl;
}
}
Some explanations are in order:
- The Chan type represents a session typed communication channel. It encapsulates some other input and output mechanisms. In this case, cin and cout.
- We drive a Chan by calling methods. Following a method call, it is illegal to reuse the Chan – doing so triggers a run time error. Operations return new channels that speak the proper protocol.
- chan.enter() enters a recursive context.
- Chan<Choose<P1, P2>>::choose1() returns a channel that speaks P1. Chan<Choose<P1, P2>>::choose2() returns a channel that speaks P2.
- Chan<Recv<T, P>>::operator>>(T &t) reads a value from the channel’s input stream into t. It returns a channel that speaks P. operator<<(const T &t) behaves similarly.
- Chan<Var<N>>::ret() returns a channel that speaks the Nth recursive protocol defined in the original type.

Combined, this provides a stronger guarantee than what we had before: Programs always send the right shaped data for the protocol, or send nothing.
When two threads communicate over a channel it’s important that they
speak the same protocol. Our intuition tells us that every Send<T, ...>
should have a corresponding Recv<T, ...>, etc. We call this
duality. We desire that our type system only allow two threads to
communicate over the channel if they are each other’s duals.
This next listing shows part of an implementation of a program with two
threads: T1 and T2. T1 sends a value to T2, who responds with
that value doubled.
#include <cstdio>
#include <iostream>
#include <memory>
#include "sesstypes.hh"
#include "concurrentmedium.hh"
using Protocol = Rec<Send<int, Recv<int, Var<Z>>>>;
void log(const std::string &tname, const std::string &action, int val) {
printf("%s %s %d\n", tname.c_str(), action.c_str(), val);
}
void log(const std::string &tname, const std::string &action) {
printf("%s %s\n", tname.c_str(), action.c_str());
}
struct {
template <typename CommunicationMedium>
void operator()(Chan<Protocol, CommunicationMedium, CommunicationMedium> chan) {
auto c = chan.enter();
for (int i = 0; i < 5; i++) {
auto c1 = c << i;
log("T1", "sent", i);
int val;
auto c2 = c1 >> val;
log("T1", "received", val);
c = c2.ret().enter();
}
log("T1", "done", -1);
}
} t1;
// t2, the dual of t1 (it doubles each value it receives), is defined similarly; not shown.
int main() {
auto chan = std::make_shared<ConcurrentMedium<ProtocolTypes<Protocol>>>();
auto threads = connect<Protocol>(t1, t2, chan);
threads.first.join();
threads.second.join();
}
Critically, we are only allowed to call connect<Protocol>(t1, t2, chan) if
t2 is the dual of t1. This requirement is enforced at compile time.
Now that we have a better idea about what session types are, let’s see how they are implemented.
Duality is critical to our concurrent motivating example. The idea
that a type has a dual can be captured using a
concept. Concepts
are named boolean predicates that restrict template parameters.
Take the definition of the Recv type:
template <typename T, HasDual P>
struct Recv {
using dual = Send<T, typename P::dual>;
};
Recv defines dual as its opposite, Send. Since Recv requires
that the protocol P has a dual, we constrain P to types where
HasDual evaluates to true.
Here’s the implementation of HasDual:
template <typename T>
concept HasDual = requires { typename T::dual; };
This introduces another new feature of C++: The requires
expression. requires { typename T::dual; } evaluates to true if
typename T::dual compiles. Otherwise, it evaluates to false. (By
the way, it’s illegal for a requires expression to always fail to
compile.)
Concepts are great because they improve compiler error messages. We’ve all seen the error vomit C++ compilers produce when template expansion fails. Concepts eliminate much of the noise to help us debug.
Remember that Var uses a natural number to decide how many levels of
recursion to return from. Let’s see how our natural numbers are implemented.
Here’s a naive way to implement natural numbers:
struct Z {};
template <typename T>
struct Succ {};
This definition allows us to write real natural numbers like
Succ<Succ<Z>>. The problem is that it also allows us to write things
that aren’t natural numbers, like Succ<int>. Given that this post is
about radical type checking, we should not be satisfied with this.
Instead, we use template metaprogramming to enforce that a type is a natural number. There are two ways for a type to be a natural number:

- It is Z.
- It is Succ<M> and M is a natural number.

Here’s how we define a concept IsNat to check that a type is a natural number:
template <typename T>
struct IsNatImpl : std::false_type {};
template <>
struct IsNatImpl<Z> : std::true_type {};
template <typename M>
struct IsNatImpl<Succ<M>>
: std::conditional_t<
IsNatImpl<M>::value,
std::true_type,
std::false_type
> {};
template <typename T>
concept IsNat = IsNatImpl<T>::value;
The type_traits header provides std::true_type and
std::false_type as canonical representations of true and false
at the type level. The default implementation of IsNatImpl inherits
from false_type, so its value member is false. The Z
specialization inherits from true_type, so its value member is
true.
The last specialization is kind of tricky. conditional_t<Condition, A, B>
is A when Condition is true and B otherwise. So we recursively check that
IsNatImpl<M>::value is true. If so, then Succ<M> is a natural number,
and so we inherit from true_type.
This lets us write a more correct version of natural numbers:
template <typename T>
struct Succ;
// Code for IsNat.
template <>
struct Succ<Z> {};
template <IsNat M>
struct Succ<M> {};
Here we discuss the implementation of the Chan type.
Since recursion is the hardest thing we have to support, we’ll describe it first. It has far-reaching implications.
The idea is to represent a channel as a Chan<Protocol, IT, OT, E>.
Protocol is the protocol type. For example, Recv<int, Send<int, Z>>.
IT and OT are the types of the underlying input and output mechanisms.
E (for environment) is kind of like a stack. Here’s what I mean:
template <HasDual P, typename IT, typename OT, typename E>
class Chan<Rec<P>, IT, OT, E> : ChanBase<IT, OT> {
public:
using ChanBase<IT, OT>::ChanBase;
Chan<P, IT, OT, std::pair<Rec<P>, E>> enter() {
// Implementation not shown.
}
};
So, Chan is specialized on recursive protocols. It provides only one method, enter.
This makes it impossible to try to read from a recursive protocol, for example.
The enter method for a protocol Rec<P> pushes P onto a stack. Since this all occurs in
the type system, we represent the stack as a std::pair.
This allows us to define Var<N>, which pops N levels from the environment:
template <HasDual P, typename IT, typename OT, typename E>
class Chan<Var<Z>, IT, OT, std::pair<P, E>> : ChanBase<IT, OT> {
public:
using ChanBase<IT, OT>::ChanBase;
Chan<P, IT, OT, E> ret() {
// Implementation not shown.
}
};
template <typename T, HasDual P, typename IT, typename OT, typename E>
class Chan<Var<Succ<T>>, IT, OT, std::pair<P, E>> : ChanBase<IT, OT> {
public:
using ChanBase<IT, OT>::ChanBase;
Chan<Var<T>, IT, OT, E> ret() {
// Implementation not shown.
}
};
This is sort of recursive. In the base case, ret returns a channel whose
protocol is the top of the environment stack. Otherwise, for Var<N>, ret returns a channel
that also speaks Var. Only this time, it’s Var<N - 1>.
Chan is specialized for all of the types with duals. For example, here’s Chan<Recv<...>, ...>:
template <typename T, HasDual P, typename IT, typename OT, typename E>
class Chan<Recv<T, P>, IT, OT, E> : ChanBase<IT, OT> {
public:
using ChanBase<IT, OT>::ChanBase;
Chan<P, IT, OT, E> operator>>(T &t) {
if (ChanBase<IT, OT>::used) {
throw ChannelReusedError();
}
ChanBase<IT, OT>::used = true;
(*ChanBase<IT, OT>::input) >> t;
return Chan<P, IT, OT, E>(ChanBase<IT, OT>::input, ChanBase<IT, OT>::output);
}
};
Since it’s specialized, the only thing we can do with a Chan<Recv<...>> is
use operator>>. This prevents a large number of mistakes – we can’t send
an integer at an unexpected time, for example.
The second motivating example uses ConcurrentMedium to create a
Chan, instead of cin and cout. This allows two threads to
communicate over a channel. This section describes the design of ConcurrentMedium.
We store writes in a std::variant, a type-safe union. So, the type
ConcurrentMedium<std::variant<int, std::string>> can communicate
values of type int or std::string.
This listing shows this implementation:
template <typename... Ts>
class ConcurrentMedium<std::variant<Ts...>> {
public:
ConcurrentMedium()
: was_read(true), writers_waiting(0), readers_waiting(0) {}
template <typename T>
ConcurrentMedium& operator<<(const T &value) {
std::unique_lock held_lock(lock);
while (!was_read) {
// Needs to be in a while loop to ignore "spurious wakeups".
// https://en.cppreference.com/w/cpp/thread/condition_variable/wait
writers_waiting++;
writer_cv.wait(held_lock);
writers_waiting--;
}
data = value;
was_read = false;
write_source = std::this_thread::get_id();
if (readers_waiting > 0) {
reader_cv.notify_one();
}
return *this;
}
template <typename T>
ConcurrentMedium& operator>>(T &datum) {
std::unique_lock held_lock(lock);
while (write_source == std::this_thread::get_id() || was_read) {
readers_waiting++;
reader_cv.wait(held_lock);
readers_waiting--;
}
datum = std::get<T>(data);
was_read = true;
if (writers_waiting > 0) {
writer_cv.notify_one();
}
return *this;
}
private:
std::mutex lock;
int readers_waiting;
std::condition_variable reader_cv;
int writers_waiting;
std::condition_variable writer_cv;
std::variant<Ts...> data;
std::thread::id write_source;
bool was_read;
};
You may notice a small problem with operator>> and operator<<:
They accept any type T, but we are only able to read/write T if
it is part of the variant.
The way we’re going to solve this problem is to create a concept
AssignableToVariant<T, V> that is true whenever T can be written
to the variant V. AssignableToVariant is written by using a
template meta-program called OneOf. Here are the implementations:
template <typename T, typename V>
struct OneOf : public std::false_type {};
template <typename T, typename... Ts>
struct OneOf<T, std::variant<Ts...>> : public std::conditional_t<
(std::is_same_v<T, Ts> || ...),
std::true_type,
std::false_type
>
{};
template <typename T, typename V>
concept AssignableToVariant = OneOf<T, V>::value;
This is similar to IsNatImpl. The syntax (std::is_same_v<T, Ts> || ...)
is called a fold expression.
It essentially rewrites the original expression into
(std::is_same_v<T, Ts[0]> || ... || std::is_same_v<T, Ts[N]>),
although Ts[0] is not real syntax.
These are the updated signatures for operator<< and operator>>:
template <AssignableToVariant<std::variant<Ts...>> T>
ConcurrentMedium& operator<<(const T &value);
template <AssignableToVariant<std::variant<Ts...>> T>
ConcurrentMedium& operator>>(T &value);
ConcurrentMedium is hard to use. If we have a protocol Send<int, Recv<std::string, ...>>,
it is time-consuming and error-prone to keep writing ConcurrentMedium<std::variant<int, std::string, ...>>.
Plus, we have to exert effort to keep the variant and the protocol in sync.
To solve this problem, we’ll create another template meta-program called ProtocolTypes.
ProtocolTypes<Send<int, Recv<std::string, ...>>> automatically creates a std::variant<int, std::string, ...>.
Here’s the implementation:
template <typename Variant, typename T>
struct ProtocolTypesImpl;
template <typename... Ts, typename T, HasDual P>
struct ProtocolTypesImpl<std::variant<Ts...>, Recv<T, P>> {
using type = typename ProtocolTypesImpl<std::variant<T, Ts...>, P>::type;
};
template <typename... Ts>
struct ProtocolTypesImpl<std::variant<Ts...>, Z> {
using type = std::variant<Z, Ts...>;
};
template <typename... Ts, typename T, HasDual P>
struct ProtocolTypesImpl<std::variant<Ts...>, Send<T, P>> {
using type = typename ProtocolTypesImpl<std::variant<T, Ts...>, P>::type;
};
template <typename... Ts, HasDual P>
struct ProtocolTypesImpl<std::variant<Ts...>, Rec<P>> {
using type = typename ProtocolTypesImpl<std::variant<Ts...>, P>::type;
};
template <typename... Ts, IsNat N>
struct ProtocolTypesImpl<std::variant<Ts...>, Var<N>> {
using type = std::variant<Ts...>;
};
template <HasDual P>
using ProtocolTypes = ProtocolTypesImpl<std::variant<>, P>::type;
Of course, there’s a small problem with this implementation. Namely,
if we have a protocol Send<int, Recv<int, Z>>, we create
a std::variant<int, int>. Then, std::get<int>(data) is illegal because
the type int does not uniquely index the variant. We need all types to be
unique.
Once again, we use a template meta-program to implement this idea:
template <typename T, typename... Ts>
struct Unique : std::type_identity<T> {};
template <typename... Ts, typename U, typename... Us>
struct Unique<std::variant<Ts...>, U, Us...>
: std::conditional_t<(std::is_same_v<U, Ts> || ...),
Unique<std::variant<Ts...>, Us...>,
Unique<std::variant<Ts..., U>, Us...>> {};
template <typename T>
struct MakeUniqueVariantImpl;
template <typename... Ts>
struct MakeUniqueVariantImpl<std::variant<Ts...>> {
using type = typename Unique<std::variant<>, Ts...>::type;
};
template <typename T>
using MakeUniqueVariant = typename MakeUniqueVariantImpl<T>::type;
And we revise ProtocolTypes:
template <HasDual P>
using ProtocolTypes = MakeUniqueVariant<typename ProtocolTypesImpl<std::variant<>, P>::type>;
Now we can easily type a ConcurrentMedium: ConcurrentMedium<ProtocolTypes<Protocol>>.
There are (at least) two important ways this approach is not sound:
This approach does help us describe communications between exactly two entities. But here are some scenarios that this specific approach doesn’t help:
This writing has three goals. First, I want to showcase how far formal methods have come. Second, there is not a lot of material discussing how to use formal methods, and particularly F*. I hope others are able to learn from my mistakes, and newcomers can pick up some proof-engineering strategies. Finally, I want to draw attention to some current pain-points for the sake of improving current formal methods research.
The F* tutorial has an editor you can interact with in your browser. You can follow along with these examples there, without downloading any additional software.
Contents:
F* is a complex language, and I am but a journeyman. The purpose of this section is only to familiarize you, gentle reader, with enough F* to broadly understand this post’s verification efforts. If you are interested in learning more, check out the F* tutorial.
F* is inspired by ML languages. You can define simple functions like this:
let double (x: int) : int
= x + x
This just defines a function called double that accepts an int as
a parameter, and returns an int. Note that in F* int refers to a
mathematical integer, not a fixed-size integer as in C. This means
that the value of x can be arbitrarily large (or small).
Note that we may want to define double like:
let double (x: int) : int
= x * 2
But this simple definition won’t work because * is reserved by F*
for constructing tuples. F* tells us this fact with an informative
error message:
(Error 189) Expected expression of type "Type"; got expression "x" of
type "Prims.int"
Instead, we have to import a definition that
redefines * to refer to multiplication. We do this by opening a
module. This definition works:
open FStar.Mul
let double (x: int) : int
= x * 2
In a dependently typed programming language, types are permitted to
depend on values. Let’s consider the double example:
let double (x: int) : (result: int{result = x + x})
= x * 2
We changed the return type of double to (result: int{result = x +
x}). This is called a refinement type. This is a dependent type
because the type depends on the value of x (as well as the return
value of double). Note that there is nothing special about the name
result – we just needed a name to refer to the return value of
double in the refinement type. Any name would work.
Interestingly, notice that x + x is not syntactically the same as
x * 2. F* is aware of the semantics of the * operator and the +
operator, and automatically proved that x * 2 = x + x. This
highlights the power of F*: Many facts can be proven with little
effort.
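To see refinements reject bad code as well as accept good code, here is one more tiny example of my own in the same style; F* checks the refinement in both branches automatically:

```fstar
let abs_val (x: int) : (r: int{r >= 0})
  = if x >= 0 then x else -x
```

If we weakened the else branch to return x, the SMT solver could no longer prove r >= 0 and type checking would fail.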
In F*, assert statements check that a condition is true at
proof-time (i.e., before the code runs). This is done by proving the
condition asserted. Here is a simple example:
let _ = assert (true)
Of course, the proposition true is always provable (true is
true).
Here’s an example of a proposition that cannot be proved:
let _ = assert (false)
This produces this error message:
(Error 19) assertion failed; The SMT solver could not prove the query. Use --query_stats for more details.
Of course false cannot be proved (false is never true).
These examples are rather boring. Let’s consider an example that uses more interesting pieces of logic:
let _ = assert (forall (x: nat) (y: nat) .
y >= x ==>
(exists (z: nat) .
y = x + z))
In more familiar logic, we’d write this as $\forall x, y . y >= x \implies \exists z . y = x + z$.
But if we try to verify our assertion with F*, it fails:
(Error 19) assertion failed; The SMT solver could not prove the query. Use --query_stats for more details.
Under the hood, F* uses the Z3 SMT solver to perform proofs. While Z3 is powerful, no theorem prover can automatically prove all theorems. Z3 appears stuck here. Let’s try adding hints to help Z3 get unstuck:
open FStar.Mul
open FStar.Tactics
let _ = assert (forall (x: nat) (y: nat) .
y >= x ==>
(exists (z: nat) .
y = x + z))
by (
let x = forall_intro () in
let y = forall_intro () in
let imp = implies_intro () in
witness (`(`#y - `#x));
dump "after witness"
)
We provide hints by using tactics, which are programs that manipulate proofs. Every proof has one or more goals: statements that we need to show are true. Tactics use known facts to simplify goals. This example shows a few tactics:
- forall_intro introduces the first variable quantified by forall to the set of known facts (i.e., the variable exists and has the specified type). As a really simple example, forall_intro transforms a goal like $\vdash \forall (x: \mathbb{N}) . x = x$ into $(x : \mathbb{N}) \vdash x = x$.
- implies_intro adds the antecedent of an implication to the set of facts known to the theorem prover. To prove $\Gamma \vdash a \implies b$, it is sufficient to show $\Gamma, a \vdash b$.
- witness helps us manipulate existential quantifiers. witness adds a term that shows an object with a given property exists. Here, our witness to the existential quantifier is y - x.
- dump is an extremely useful tactic. It shows the current goals that need to be proved.

dump shows us this message:
Goal 1/2:
(x: Prims.nat), (x'0: Prims.nat), (_: x'0 >= x) |- _ : Prims.squash (x'0 - x >= 0 == true)
Goal 2/2:
(x: Prims.nat), (x'0: Prims.nat), (_: x'0 >= x) |- _ : Prims.squash
(x'0 = x + (x'0 - x))
If you read these goals for a second, they should seem obviously true. F* is quite easy to use: If you think something is obvious, just stop talking and see if F* completes the proof:
open FStar.Mul
open FStar.Tactics
let _ = assert (forall (x: nat) (y: nat) .
y >= x ==>
(exists (z: nat) .
y = x + z))
by (
let x = forall_intro () in
let y = forall_intro () in
let imp = implies_intro () in
witness (`(`#y - `#x))
)
In this case, it does.
The problem we’re going to solve and verify is the Capacity to
Ship Packages within D
Days problem. You’re
given weights (an array of positive numbers representing the weights of
items), and days (the maximum number of days you have to ship all
the items). These items must be loaded onto a ship with a capacity of
capacity. The challenge is to find the smallest value of capacity
so that the number of days required to ship the items is less than or
equal to days. Check out the LeetCode description for more details.
Clearly, the minimum capacity that might work is the maximum element
of weights. For, if the capacity were any smaller, it would be
impossible to ship the largest item. The largest capacity we should
consider is the sum of the item weights. Any larger capacity is
superfluous, since a ship with this capacity can already ship all the
items in 1 day. The correct capacity is therefore somewhere in the
range $[maximum\_element~ weights,~ sum~ weights]$.
The naive approach is to simply check every weight in this range. But this number could be quite large – for instance, when the number of items is large but the maximum weight is small. A smarter approach is to use binary search to find the correct capacity.
To be frank, I find getting the bounds of binary search right a little tricky. For tricky loop bounds, I craft loop invariants to help me write the code. Let $min\_elt$ denote the smallest capacity that could possibly ship the items, and $max\_elt$ denote the largest capacity we should consider. We will maintain two key invariants:
$\forall x . x < min\_elt \implies days\_to\_ship~ weights~ x > days$.
$\forall x . x >= max\_elt \implies days\_to\_ship~ weights~ x <= days$.
Under these invariants, when $min\_elt = max\_elt$, the correct capacity to return is $min\_elt$ (or, equivalently, $max\_elt$).
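Before writing the F* version, it can help to sanity-check the algorithm in an ordinary language. Here is a small Python prototype (my own sketch, not from the post) of the greedy day count and the binary search that maintains the two invariants above:

```python
def days_to_ship(weights, capacity):
    """Greedy day count; assumes capacity >= max(weights)."""
    days, remaining = 1, capacity
    for w in weights:
        if w <= remaining:
            remaining -= w
        else:
            days += 1                # start a new day for this item
            remaining = capacity - w
    return days

def ship_within_days(weights, days):
    """Smallest capacity that ships everything within `days` days."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if days_to_ship(weights, mid) > days:
            lo = mid + 1             # mid is too small: first invariant
        else:
            hi = mid                 # mid works: second invariant
    return lo

print(ship_within_days([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # -> 15
```

The F* code below follows the same shape, but with the invariants encoded in refinement types instead of comments.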
Let’s start by computing the number of days it takes to ship items
with weights weights given a capacity capacity. We’ll represent
weights as a non-empty list of natural numbers. F*
already provides a theory of lists, so we’ll use that.
module Capacity
open FStar.List
open FStar.List.Tot
open FStar.Tactics
let weight_list = (l:list nat{not (isEmpty l)})
The syntax list nat describes a list of natural numbers. We use a
refinement type to specify that the list is non-empty.
Here’s a function definition that returns the number of days it takes to ship some items:
let rec max_elt (l: weight_list) : nat =
match l with
| [x] -> x
| (x::xs) ->
let max' = max_elt xs in
if x >= max' then x
else max'
let rec days_to_ship' (weights: weight_list)
(capacity: nat{capacity >= (max_elt weights)})
(current_cap: nat)
: (x: nat{x >= 1})
=
match weights with
| [x] ->
if x <= current_cap then 1
else 2
| (x::xs) ->
if x <= current_cap then
days_to_ship' xs capacity (current_cap - x)
else
1 + (days_to_ship' xs capacity capacity)
let days_to_ship (weights: weight_list)
(capacity: nat{capacity >= (max_elt weights)})
: (x: nat{x >= 1})
= days_to_ship' weights capacity capacity
A few notes about these functions:
- Recursive functions are defined with the let rec syntax.
- The match syntax performs pattern-matching. Inside of max_elt, [x] matches a list containing exactly one item. The second match case (x::xs) matches an item consed onto any list. Note that these patterns are exhaustive since a weight list is non-empty. Also note that F* verifies this exhaustivity for us, automatically.
- Note the refinement type on the capacity parameter. This is applying our earlier argument: the minimum capacity we can use to ship the items is the maximum weight of the items.

Here’s the implementation of our solution function in F*:
let nat_sum (a: nat) (b: nat) : nat = a + b
let sum_of_weights (weights: weight_list) : nat =
List.Tot.fold_left nat_sum (hd weights) (tl weights)
let rec lemma_sum_of_weights_is_gte_max (weights: weight_list) :
Lemma (ensures (sum_of_weights weights) >= max_elt weights)
=
match weights with
| [w] -> ()
| (x::xs) ->
FStar.List.Tot.Properties.fold_left_monoid nat_sum 0 xs;
lemma_sum_of_weights_is_gte_max xs
let min_bound (weights: weight_list) : nat = max_elt weights
let max_bound (weights: weight_list) : nat = sum_of_weights weights
// Returns the minimum capacity necessary to ship all the items in `days` days.
// Note that we have to specify that we decrease the difference between max_cap and min_cap.
let rec ship_within_days' (weights: weight_list)
(days: nat{days > 0})
(min_cap: nat{min_cap >= min_bound weights})
(max_cap: nat{max_cap >= min_cap})
: Tot (n:nat{n >= min_cap /\ n <= max_cap}) (decreases max_cap - min_cap)
=
if min_cap = max_cap then
min_cap
else
let middle_cap = (min_cap + max_cap) / 2 in
let total_days = days_to_ship weights middle_cap in
if total_days > days then
ship_within_days' weights days (middle_cap + 1) max_cap
else
ship_within_days' weights days min_cap middle_cap
let ship_within_days (weights: weight_list) (days: nat{days > 0})
: (n:nat{n >= min_bound weights /\ n <= max_bound weights})
= lemma_sum_of_weights_is_gte_max weights;
ship_within_days' weights
days
(max_elt weights)
(sum_of_weights weights)
The heart of the implementation is ship_within_days', so we’ll start
there. This is a fairly simple binary search implementation. Again,
we’re just maintaining the 2 invariants discussed in the Solution
Design subsection. Try to go through the logic and see
why those invariants are maintained.
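For intuition, here is the same binary search sketched in Python. This is illustrative only: `days_to_ship` here is a stand-in greedy simulation of the shipping process, not the F* definition from earlier in the post.

```python
def days_to_ship(weights, capacity):
    # Stand-in greedy simulation: load items in order; start a new day
    # when the next item would overflow the capacity.
    # Assumes capacity >= max(weights).
    days, load = 1, 0
    for w in weights:
        if load + w > capacity:
            days, load = days + 1, 0
        load += w
    return days

def ship_within_days(weights, days):
    # Invariants from the Solution Design: every capacity below lo needs
    # more than `days` days; every capacity at or above hi fits in `days`.
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if days_to_ship(weights, mid) > days:
            lo = mid + 1
        else:
            hi = mid
    return lo

print(ship_within_days([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # 15
```

Note how each branch of the `if` preserves one of the two invariants, exactly as in the F* code.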
The first bit of new syntax we’ll discuss is the return type of
ship_within_days'. It returns Tot (n:nat{n >= min_cap /\ n <=
max_cap}) (decreases max_cap - min_cap). In F*, all functions must
be total – meaning they must terminate. So, really, the type of double from
earlier is
val double (x: int) : Tot int
But F* conveniently writes the Tot for us. Unfortunately, F* doesn’t know
why the function ship_within_days' terminates. We must explain it:
max_cap - min_cap decreases on every recursive call. F* can verify that this
statement is true, and then accepts our function as terminating. If we
delete (decreases max_cap - min_cap) from our code, F* produces
this error:
Could not prove termination of this recursive call; The SMT solver could not prove the query. Use --query_stats for more details.
This is our cue to add the decreases expression.
Our primary solution function is ship_within_days. There’s one bit
of magic in it: The application of the lemma
lemma_sum_of_weights_is_gte_max. This is required because we used a
refinement type for max_cap that requires max_cap >=
min_cap. Unfortunately, F* cannot automatically prove that
(sum_of_weights weights) >= (max_elt weights), so type checking
fails if we delete the application of the lemma:
Subtyping check failed; expected type max_cap: Prims.nat{max_cap >= max_elt weights}; got type Prims.nat; The SMT solver could not prove the query.
In general, F* cannot automatically prove propositions that require induction. But once we apply the lemma, F* can easily verify that the types are correct.
Now, let’s discuss our max_bound implementation for a moment. As we
mentioned in the Solution Design, the maximum bound on the
weights is just the sum of all weights. To sum the weights, we use the standard
fold_left function that should be familiar to functional
programmers. Note that we cannot write sum_of_weights like this:
// Error
let sum_of_weights (weights: weight_list) : nat =
List.Tot.fold_left (+) (hd weights) (tl weights)
This is because the type of + is int -> int -> int. While nat is
a subtype of int, F*’s type-checking algorithm does not deduce that applying
a function of type int -> int -> int will produce a nat. To solve this
problem, we explicitly define nat_sum.
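As an aside, both the fold-based sum and the fact the lemma establishes are easy to state concretely (a Python sketch, not F*):

```python
from functools import reduce

def nat_sum(a, b):
    # Mirrors the F* nat_sum: addition on non-negative integers.
    return a + b

def sum_of_weights(weights):
    # Mirrors List.Tot.fold_left nat_sum (hd weights) (tl weights).
    return reduce(nat_sum, weights[1:], weights[0])

ws = [3, 1, 4, 1, 5]
assert sum_of_weights(ws) == 14
# The fact lemma_sum_of_weights_is_gte_max proves: the sum dominates the max.
assert sum_of_weights(ws) >= max(ws)
```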
Finally, lemma_sum_of_weights_is_gte_max proceeds by induction. We
use the Lemma (...) type because the function is a proof. In the
case where there is exactly 1 item in the list, we produce the value
(). This term has the type unit. In F*, the type Lemma (ensures
(sum_of_weights weights) >= max_elt weights) is really just a synonym
for the type u:unit{(sum_of_weights weights) >= max_elt
weights}. So, F* will automatically try (and succeed!) to show our
lemma is true.
In the case when there is more than 1 item in the list, we first apply
FStar.List.Tot.Properties.fold_left_monoid. This
establishes the fact that folding nat_sum over x::xs equals x plus the
fold of nat_sum over xs. The
following line (lemma_sum_of_weights_is_gte_max xs) convinces F*
that the lemma holds by induction. As an exercise: Look at the lemma
fold_left_monoid provides and consider why we didn’t use this
definition:
// Error
let sum_of_weights (weights: weight_list) : nat =
List.Tot.fold_left nat_sum 0 weights
There are two facts we want to prove:
- Our solution can ship all the items within days days.
- No smaller capacity can ship all the items within days days. I.e., our solution is minimal.

In fact, these statements are direct consequences of the 2 invariants we constructed in our design subsection. So, let’s start by writing these invariants in F*:
let min_bound_invariant (weights: weight_list)
(cap: nat{cap >= min_bound weights})
(days: nat{days > 0})
= forall (x : nat) . x >= min_bound weights /\ x < cap ==> days_to_ship weights x > days
let max_bound_invariant (weights: weight_list)
(cap: nat{cap >= min_bound weights})
(days: nat{days > 0})
= forall (x : nat) . x >= cap ==> days_to_ship weights x <= days
Let’s also define the concept of minimality:
let is_minimal (w: weight_list) (c: nat{c >= min_bound w}) (days: nat{days > 0}) =
c = min_bound w \/ (c > min_bound w /\ days_to_ship w (c - 1) > days)
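Before diving into the proof, we can sanity-check these definitions numerically outside F*. In this Python sketch, `days_to_ship` is a hypothetical stand-in greedy simulation, not the F* definition:

```python
def days_to_ship(weights, capacity):
    # Stand-in greedy simulation of the shipping process.
    days, load = 1, 0
    for w in weights:
        if load + w > capacity:
            days, load = days + 1, 0
        load += w
    return days

def min_bound_invariant(weights, cap, days):
    # Every capacity below cap (but at least the min bound) needs too many days.
    return all(days_to_ship(weights, x) > days
               for x in range(max(weights), cap))

def max_bound_invariant(weights, cap, days):
    # cap itself ships in time (by monotonicity, so does anything above it).
    return days_to_ship(weights, cap) <= days

ws, days = [1, 2, 3, 4, 5], 3
# Capacity 6 is the minimal answer for this example: both invariants hold there.
assert min_bound_invariant(ws, 6, days)
assert max_bound_invariant(ws, 6, days)
```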
The proof follows from induction. We’ll start by drawing the outline of the proof, then fill in details until it is complete. To start the proof:
let rec lemma_ship_within_days'_ships_within_days (weights: weight_list)
(days: nat{days > 0})
(min_cap: nat{min_cap >= min_bound weights})
(max_cap: nat{max_cap >= min_cap})
: Lemma
(requires min_bound_invariant weights min_cap days /\
max_bound_invariant weights max_cap days)
(ensures (days_to_ship weights (ship_within_days' weights days min_cap max_cap)) <= days /\
is_minimal weights (ship_within_days' weights days min_cap max_cap) days)
(decreases max_cap - min_cap)
=
if min_cap = max_cap then
()
else
admit ()
Notice the new requires component of the Lemma type. The requires
and ensures clauses of Lemma are
preconditions and
postconditions
respectively. Our strategy is to require that our 2 invariants hold at
each call to lemma_ship_within_days'_ships_within_days. Then, it
is obvious that the postconditions hold. Indeed: Notice that F*
automatically finds a proof when min_cap = max_cap. On the other
hand, we use admit () in the else branch. F* programs that
contain admit () aren’t proofs at all - admit () forces F* to
accept the current goals as true (even if they are false). However, it’s
invaluable when building proofs.
Let’s zoom in further by applying the definition of ship_within_days
in the else branch:
// Error
let rec lemma_ship_within_days'_ships_within_days (weights: weight_list)
(days: nat{days > 0})
(min_cap: nat{min_cap >= min_bound weights})
(max_cap: nat{max_cap >= min_cap})
: Lemma
(requires min_bound_invariant weights min_cap days /\
max_bound_invariant weights max_cap days)
(ensures (days_to_ship weights (ship_within_days' weights days min_cap max_cap)) <= days /\
is_minimal weights (ship_within_days' weights days min_cap max_cap) days)
(decreases max_cap - min_cap)
=
if min_cap = max_cap then
()
else
let middle_cap = (min_cap + max_cap) / 2 in
let total_days = days_to_ship weights middle_cap in
if total_days > days then (
lemma_ship_within_days'_ships_within_days weights days (middle_cap + 1) max_cap
) else (
admit ()
)
Unfortunately, verification fails at this point:
(Error 19) assertion failed; The SMT solver could not prove the query. Use --query_stats for more details.
Frankly, this error message is pretty awful. Hopefully, it is clear
that if lemma_ship_within_days'_ships_within_days can be applied
in the body of if total_days > days then the postcondition holds. This
should lead us to suspect that the problem is that F* cannot prove that
the preconditions of lemma_ship_within_days'_ships_within_days hold
at this point. Let’s add a temporary assert statement to check on the
precondition:
// Error
let rec lemma_ship_within_days'_ships_within_days (weights: weight_list)
(days: nat{days > 0})
(min_cap: nat{min_cap >= min_bound weights})
(max_cap: nat{max_cap >= min_cap})
: Lemma
(requires min_bound_invariant weights min_cap days /\
max_bound_invariant weights max_cap days)
(ensures (days_to_ship weights (ship_within_days' weights days min_cap max_cap)) <= days /\
is_minimal weights (ship_within_days' weights days min_cap max_cap) days)
(decreases max_cap - min_cap)
=
if min_cap = max_cap then
()
else
let middle_cap = (min_cap + max_cap) / 2 in
let total_days = days_to_ship weights middle_cap in
if total_days > days then (
assert (min_bound_invariant weights (middle_cap + 1) days);
lemma_ship_within_days'_ships_within_days weights days (middle_cap + 1) max_cap
) else (
admit ()
)
F* still prints an assertion failed error, but now it points to the
line checking the precondition. So, we know that the problem is that
F* cannot prove min_bound_invariant on (middle_cap + 1). We know
that max_bound_invariant must continue to hold, since max_cap is unchanged.
Observe that min_bound_invariant holds because days_to_ship is
decreasing: If we decrease the capacity, we will increase the days to
ship, and the branch condition total_days > days has already shown that
we cannot ship at the capacity middle_cap. We just need to show F*
these facts are true:
let rec lemma_days_to_ship_is_decreasing'' (weights: weight_list)
(cap: nat{cap >= (max_elt weights)})
(ccap: nat{ccap <= cap})
(ccap1: nat{ccap1 > ccap /\ ccap1 <= cap + 1})
: Lemma (ensures days_to_ship' weights (cap + 1) ccap1 <= (days_to_ship' weights cap ccap))
=
match weights with
| [w] -> ()
| x::xs ->
if x <= ccap && x <= ccap1 then
lemma_days_to_ship_is_decreasing'' xs cap (ccap - x) (ccap1 - x)
else if x > ccap && x <= ccap1 then
lemma_days_to_ship_is_decreasing' xs cap cap (ccap1 - x)
else if x > ccap && x >= ccap1 then
lemma_days_to_ship_is_decreasing'' xs cap cap (cap + 1)
and lemma_days_to_ship_is_decreasing' (weights: weight_list)
(cap: nat{cap >= (max_elt weights)})
(ccap: nat{ccap <= cap})
(ccap1: nat{ccap1 <= cap + 1})
: Lemma (ensures days_to_ship' weights (cap + 1) ccap1 <= 1 + (days_to_ship' weights cap ccap))
=
match weights with
| [w] -> ()
| x::xs ->
if x <= ccap && x <= ccap1 then
lemma_days_to_ship_is_decreasing' xs cap (ccap - x) (ccap1 - x)
else if x > ccap && x > ccap1 then
lemma_days_to_ship_is_decreasing' xs cap cap (cap + 1)
else if x > ccap && x <= ccap1 then
lemma_days_to_ship_is_decreasing' xs cap cap (ccap1 - x)
else
// I.e., x <= ccap && x > ccap1
lemma_days_to_ship_is_decreasing'' xs cap (ccap - x) (cap + 1)
let lemma_days_to_ship_is_decreasing (weights: weight_list)
(cap: nat{cap >= (max_elt weights)})
(c_cap: nat{c_cap <= cap})
: Lemma (ensures days_to_ship' weights (cap + 1) (c_cap + 1) <= days_to_ship' weights cap c_cap)
=
lemma_days_to_ship_is_decreasing'' weights cap c_cap (c_cap + 1)
Despite the mutually recursive proof, this is a simple argument. The theorem
that we are primarily interested in is
lemma_days_to_ship_is_decreasing''. This follows from
induction. There is a wrinkle, though: In the else if x > ccap && x
<= ccap1 branch, the preconditions of
lemma_days_to_ship_is_decreasing'' are no longer met. So, we use
the mutually recursive lemma to show that days_to_ship' weights (cap + 1) ccap1 <= 1 +
(days_to_ship' weights cap ccap). Then, since days_to_ship' weights
cap ccap = 1 + days_to_ship' xs cap cap, F* is automatically
able to cancel the 1s and prove our theorem. A similar argument
applies to lemma_days_to_ship_is_decreasing'.
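The monotonicity fact these lemmas capture, that more capacity never costs more days, is easy to check empirically (a Python sketch with a stand-in greedy `days_to_ship`, not the F* definition):

```python
def days_to_ship(weights, capacity):
    # Stand-in greedy simulation (the post's real definition is in F*).
    days, load = 1, 0
    for w in weights:
        if load + w > capacity:
            days, load = days + 1, 0
        load += w
    return days

ws = [2, 7, 1, 8, 2, 8]
for cap in range(max(ws), sum(ws)):
    # Increasing the capacity by one never increases the days to ship.
    assert days_to_ship(ws, cap + 1) <= days_to_ship(ws, cap)
```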
But even armed with this theorem, F* still can’t prove the precondition. Try it. We’ll have to go even further:
let lemma_days_to_ship_is_decreasing_full (weights: weight_list) (cap: nat{cap >= (max_elt weights)})
: Lemma (ensures days_to_ship weights (cap + 1) <= days_to_ship weights cap)
=
lemma_days_to_ship_is_decreasing weights cap cap
let rec lemma_days_to_ship_is_decreasing2 (weights: weight_list) (c: nat{c >= min_bound weights})
: Lemma (ensures (forall (x : nat) . x >= min_bound weights /\ x < c ==>
days_to_ship weights x >= days_to_ship weights c))
= if c > min_bound weights then (
lemma_days_to_ship_is_decreasing_full weights (c - 1);
lemma_days_to_ship_is_decreasing2 weights (c - 1)
)
let rec lemma_ship_within_days'_ships_within_days (weights: weight_list)
(days: nat{days > 0})
(min_cap: nat{min_cap >= min_bound weights})
(max_cap: nat{max_cap >= min_cap})
: Lemma
(requires min_bound_invariant weights min_cap days /\
max_bound_invariant weights max_cap days)
(ensures (days_to_ship weights (ship_within_days' weights days min_cap max_cap)) <= days /\
is_minimal weights (ship_within_days' weights days min_cap max_cap) days)
(decreases max_cap - min_cap)
=
if min_cap = max_cap then
()
else
let middle_cap = (min_cap + max_cap) / 2 in
let total_days = days_to_ship weights middle_cap in
if total_days > days then (
lemma_days_to_ship_is_decreasing2 weights middle_cap;
lemma_ship_within_days'_ships_within_days weights days (middle_cap + 1) max_cap
) else (
admit ()
)
As you might guess, F* has a similar problem with the
max_bound_invariant. The problem is that the invariant requires
all capacities greater than or equal to max_cap to ship in less than or equal
to days days, but our decreasing lemma only applies to max_cap + 1. Our
proof strategy is to use induction to extend our original decreasing lemma to show
$\forall k \in \mathbb{N}.\ days\_to\_ship~weights~(capacity + k) \leq
days\_to\_ship~weights~capacity$.
This argument convinces F*:
let rec lemma_days_to_ship_is_decreasing3'' (w: weight_list) (c : nat{c >= min_bound w}) (k: nat)
: Lemma (ensures days_to_ship w (c + k) <= days_to_ship w c)
=
if k = 0 then ()
else (
lemma_days_to_ship_is_decreasing_full w (c + k - 1);
lemma_days_to_ship_is_decreasing3'' w c (k - 1)
)
let lemma_days_to_ship_is_decreasing3' (w: weight_list) (c : nat{c >= min_bound w})
: Lemma (ensures forall (k : nat) . days_to_ship w (c + k) <= days_to_ship w c)
=
assert (forall (w: weight_list) (c: nat{c >= min_bound w}) (k : nat) .
days_to_ship w (c + k) <= days_to_ship w c)
by (
let w = forall_intros () in
mapply (`lemma_days_to_ship_is_decreasing3'' )
)
let lemma_add_definition (c:nat)
: Lemma (ensures (forall (x : nat) . x >= c ==> (exists (k : nat) . x = k + c)))
=
assert (forall (x : nat) . x >= c ==> x - c >= 0 /\ x - c + c = x)
let lemma_days_to_ship_is_decreasing3 (weights: weight_list) (c: nat{c >= min_bound weights})
: Lemma (ensures forall (x : nat) . x >= c ==> days_to_ship weights x <= days_to_ship weights c)
=
lemma_days_to_ship_is_decreasing3' weights c;
lemma_add_definition c
let rec lemma_ship_within_days'_ships_within_days (weights: weight_list)
(days: nat{days > 0})
(min_cap: nat{min_cap >= min_bound weights})
(max_cap: nat{max_cap >= min_cap})
: Lemma
(requires min_bound_invariant weights min_cap days /\
max_bound_invariant weights max_cap days)
(ensures (days_to_ship weights (ship_within_days' weights days min_cap max_cap)) <= days /\
is_minimal weights (ship_within_days' weights days min_cap max_cap) days)
(decreases max_cap - min_cap)
=
if min_cap = max_cap then
()
else
let middle_cap = (min_cap + max_cap) / 2 in
let total_days = days_to_ship weights middle_cap in
if total_days > days then (
lemma_days_to_ship_is_decreasing2 weights middle_cap;
lemma_ship_within_days'_ships_within_days weights days (middle_cap + 1) max_cap
) else (
lemma_days_to_ship_is_decreasing3 weights middle_cap;
lemma_ship_within_days'_ships_within_days weights days min_cap middle_cap
)
As an exercise, demonstrate that the min_bound_invariant and
max_bound_invariant hold under the initial conditions set by
ship_within_days.
F* has an amazing Emacs
mode. It uses unicode
symbols to make identifiers like forall and exists render as the
appropriate logic symbols. It also allows you to verify code as you
work inside of Emacs itself. Finally, it provides error squiggles.
F* can automatically find many proofs, more so than similar tools that I’ve experimented with (e.g., Coq and Isabelle). In that sense, F* seems easier to adopt than more mainstream tools.
Error messages are bad. From my experience using Z3, this is because Z3 does not generate very good unsatisfiable cores. To expand: You provide Z3 a set of logical formulae. Z3 attempts to find an interpretation (i.e., a mapping of variables to values) that satisfies the formulae. When Z3 determines that no such interpretation exists, the formulae are unsatisfiable. For the sake of error reporting, you might be interested in why the formulae are unsatisfiable: What is the smallest subset of formulae that is still unsatisfiable? Such a subset is called an unsatisfiable core.
Unfortunately, things are not so simple: computing minimal unsatisfiable cores is expensive, and the cores Z3 returns are not guaranteed to be minimal. Meanwhile, tools that use Z3 have to somehow manage the relationship between Z3 variables and their own semantic domain. This adds to the challenge of making good error messages with Z3.
Z3 is sensitive to a lot more than you may expect. A common idiom in F* is to test if adding a lemma helps you with a proof, like so:
let lemma_a (x: unit) : Lemma (ensures some_formula) =
admit ()
let lemma_b (x: unit) : Lemma (ensures some_formula) =
// Other lemmas not shown.
lemma_a ();
()
Here, lemma_b uses lemma_a in its proof. Now, assume that Z3 is
able to find a proof of lemma_b. So, we proceed to prove
lemma_a. Very rarely, I have noticed that changing the proof of
lemma_a causes Z3 to no longer be able to prove
lemma_b. Obviously, this is surprising because lemma_b does
not logically depend on the specific proof of lemma_a.
Documentation and examples are also lacking. There are not a lot of high quality educational resources available today.
I found F* to be immensely usable. While error messages are not the best, this is really a limitation of the underlying SMT solver. From experience, Z3’s unsatisfiable cores are complex to handle, and moving back and forth between the high-level language F* provides and SMT is challenging. But this is definitely an area that needs improvement.
The ecosystem of F* is young. The resources I’ve used are:
- FStar.List.Tot.Properties.fold_left_monoid.

I hope that this post has inspired you to give F* a try.
Inspired by a recent post where the author used ChatGPT as a virtual machine, I wanted to learn if ChatGPT can be a useful LISP interpreter. To my surprise, ChatGPT understands LISP remarkably well.
Figure 1: Initial prompt and basic LISP functions.
Figure 1 shows the initial prompt I used. It’s very similar to the prompt in Building A Virtual Machine inside ChatGPT. We see a few interesting facts already:
- NIL evaluates to NIL.
- ChatGPT correctly handles (eq (car nil) nil).

Figure 2: CONS‘ing and SETFs.
In LISP, we construct a CONS cell that contains two pointers (called CAR and CDR) with the CONS function. Continuing on to Figure 2, it seems like ChatGPT is aware of how CONS works. LISP also allows us to modify the value stored in a place with the SETF macro. If the first argument to the SETF macro is a symbol (e.g., my-list), then SETF modifies the symbol table to associate the symbol name with the value of the 2nd argument. ChatGPT seems aware of how SETF behaves. The first line of Figure 3 shows that ChatGPT can remember the state of the symbol table.
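As a rough analogy (purely illustrative Python, not LISP), a CONS cell is a mutable pair, and SETF on the place (car my-list) mutates the first slot:

```python
class Cons:
    """A CONS cell: two pointers, CAR and CDR."""
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

# (setf my-list (cons 1 nil)) -- bind a symbol to a one-element list.
my_list = Cons(1, None)
# (setf (car my-list) 42) -- setf targets a place, not just a symbol.
my_list.car = 42
assert my_list.car == 42 and my_list.cdr is None
```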
Figure 3: Recursive Functions of Symbolic Expressions.
Figure 3 shows the definition of a function named f. Here, f computes the factorial of a number. This might seem challenging, since f is a recursive function. But ChatGPT evaluates the function without any problems. Figure 4 shows f applied to a larger, more challenging input. Once again, ChatGPT correctly evaluates the expression.
Figure 4: The persistence of memory.
Let’s see if ChatGPT still recalls the association between my-list and (42) we introduced in our symbol table. Figure 4 shows the results of evaluating (setf (car my-list) 42). We see that:
- setf works on arbitrary places, not just symbol names.
- ChatGPT still associates my-list with a list containing a single element.

Let’s try another challenge: The Y Combinator. I used this implementation. Figure 5 shows the results.
To my surprise, ChatGPT understands the function definition and correctly evaluates it. This is particularly challenging, since it shows:
- ChatGPT handles FUNCALL and understands what it means to be a LISP-2.

Figure 5: The Y Combinator.
I am very surprised how well ChatGPT handles the task of interpreting LISP code. I am very curious if ChatGPT actually understands the source code, or if it has seen enough examples that it can blindly regurgitate results it has memorized. Since LISP has very simple semantics, it’s a great tool for studying the extent of a large language model’s ability to understand and interpret source code.
At this past year’s ASE, there was a really interesting paper called AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models. One conclusion of this paper is that large language models deeply understand syntax trees. I wonder if we can somehow decide whether large language models understand a language’s operational semantics?
I encountered a fun programming puzzle recently:
You are given a description of a two-lane road in which two strings, L1 and L2, represent the first and the second lane. Each lane consists of N segments of equal length.
The K-th segment of the first lane is represented by L1[K], and the K-th segment of the second lane is represented by L2[K], where “.” denotes a smooth segment of road, and “x” denotes a segment that contains potholes.
Cars can drive over segments with potholes, but it is uncomfortable for passengers. Therefore, a project to repair as many potholes as possible was submitted. At most one contiguous region of each lane may be repaired at a time. The region is closed to traffic while it is under repair.
How many road segments with potholes can be repaired given that the road must be kept open?
For example, if L1 = “..xx.x.” and L2 = “x.x.x..”, the maximum number of potholes we can repair is 4. See Figure 1 for an explanation.
Figure 1: Visualization of the example. Segments without potholes are shown as empty boxes. Segments with potholes are shown as gray boxes. Contiguous regions under repair are highlighted orange. The arrows indicate a path through the road.
This problem has two key requirements:
- At most one contiguous region of each lane may be repaired at a time, and that region is closed to traffic.
- The road must be kept open: a vehicle must always be able to travel from one end to the other.
Figure 2: L1 = “..xx...” and L2 = “x....x.”. The solution shown here is not allowed, since L2’s repair regions are not contiguous.
There are two important observations about the problem. First, a vehicle must be able to travel the road by changing lanes at most once. I give an argument for this point in the next paragraph. Second, no repair can occur at the segment where the vehicle changes lanes. This is because both lanes must be open for the vehicle to change lanes.
A proof by contradiction shows the vehicle can change lanes at most once in an allowed solution. First, assume without loss of generality that a vehicle starts in L1, and changes lanes twice at segments i and j. A repair must occur in the region [0, i-1] in L2, otherwise the vehicle could have started in L2. A repair must start at segment j in L2, otherwise the vehicle need not change lanes back. But the regions [0, i-1] and [j, …] are not contiguous. So, the solution is not allowed. We conclude that a vehicle can change lanes at most once.
Since the vehicle can only change lanes once, we only need to find (1) the segment to change lanes, and (2) the starting lane. Let’s start by characterizing the segment where the vehicle changes lanes. Suppose the vehicle starts in L1. Call the ideal segment to change lanes C. The sum of potholes in L1 in region [C+1, …] and L2 in region […, C-1] is maximal. This is because, since the vehicle doesn’t start in L2, we can repair all segments in L2 until C. The same argument applies to L1 after C.
We can compute C in $O(n)$ time, where n is the number of segments. Maintain two arrays of length n, $avoided_{L1}$ and $avoided_{L2}$. Let $avoided_{L1}[i]$ denote the number of potholes avoided in L1[i+1, …] if the vehicle changes lanes from L1 to L2 at segment i. Similarly, $avoided_{L2}[i]$ denotes the number of potholes avoided in L2[0, i-1] if the vehicle changes lanes from L1 to L2 at segment i. So, $avoided_{L1}$ stores the partial sums of the number of potholes in L1 counting from the end. Meanwhile, $avoided_{L2}$ stores the partial sums of the number of potholes in L2 counting from the start. Computing C is simple: $C = \underset{0 \leq c < n}{\text{argmax}}(avoided_{L1}[c] + avoided_{L2}[c]).$
Finding the starting lane $L$ is also easy. Let $F(A)$ denote the value of $C$ for a vehicle that starts in lane $A$. Then, $L = \underset{l \in \left\{ L1,~ L2 \right\} }{\text{argmax}}(F(l))$.
This solution has a runtime of $O(n)$, since computing $C$ takes $O(n)$ time. Memory usage is $O(n)$, since we create the extra arrays $avoided_{L1}$ and $avoided_{L2}$ to store partial sums.
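Since the search space is tiny, we can sanity-check this reasoning with a brute-force sweep over the starting lane and the lane-change segment (a Python sketch; the function name is ad hoc):

```python
def max_repairable(l1: str, l2: str) -> int:
    # If the car starts in lane a and changes to lane b at segment c, we may
    # repair b[0:c] and a[c+1:]; segment c itself must stay open in both lanes.
    n = len(l1)
    best = 0
    for a, b in ((l1, l2), (l2, l1)):
        for c in range(n):
            best = max(best, b[:c].count('x') + a[c + 1:].count('x'))
    return best

print(max_repairable("..xx.x.", "x.x.x.."))  # 4, matching the worked example
```

This $O(n^2)$ check agrees with the $O(n)$ partial-sum solution on the example from the problem statement.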
import enum
from typing import List

class PotholeState(enum.Enum):
POTHOLE = 1
CLEAN = 2
_STR_TO_STATE = {
'.': PotholeState.CLEAN,
'x': PotholeState.POTHOLE,
}
def read_lanes(l1: str, l2: str) -> List[List[PotholeState]]:
return [[s1, s2] for s1, s2
in zip(
[_STR_TO_STATE[c] for c in l1],
[_STR_TO_STATE[c] for c in l2],
)]
def _max_repairable_helper(l1: List[PotholeState], l2: List[PotholeState]) -> int:
l1_avoided_potholes = [0] * len(l1)
l2_avoided_potholes = [0] * len(l2)
for i in range(len(l1) - 2, -1, -1):
l1_avoided_potholes[i] = l1_avoided_potholes[i + 1]
if l1[i+1] == PotholeState.POTHOLE:
l1_avoided_potholes[i] += 1
for i in range(1, len(l2)):
l2_avoided_potholes[i] = l2_avoided_potholes[i - 1]
if l2[i - 1] == PotholeState.POTHOLE:
l2_avoided_potholes[i] += 1
return max([l1_avoided_potholes[i] + l2_avoided_potholes[i] for i in range(len(l1))])
def max_repairable_segments(road: List[List[PotholeState]]) -> int:
lane1 = [road[i][0] for i in range(len(road))]
lane2 = [road[i][1] for i in range(len(road))]
return max(
_max_repairable_helper(lane1, lane2),
_max_repairable_helper(lane2, lane1),
)