Marco Paviotti, Lecturer in Computer Science (mpaviotti.github.io)

So you want to get a PhD.. (2026-02-15)

While I think this is a great idea, let me tell you: this is not a degree for the faint-hearted.

šŸŽ“ What a PhD Really Is

A PhD (Doctor of Philosophy) is a research degree focused on creating new knowledge, not just learning existing knowledge.

It is radically different from any other university degree.
As a PhD student you will not just study books and pass exams; you will focus on doing independent, original research, publishing and presenting your findings, and finally writing and defending a doctoral thesis.

In other words, you will spend a lot of time reading, writing, creating knowledge and disseminating it. Essentially, having a PhD demonstrates that you possess deep knowledge of your own specific field and can engage with other international experts as a peer.

If it still sounds easy to you, here are some facts. A PhD is the highest academic degree awarded by universities — held by roughly 2% of the world’s population (I took this number from Google, don’t ask!). Based on calculations I’ve made at a university I worked for, failure rates for PhD students can be as high as 40%.

The reasons why people don’t make it to the end of the line vary from person to person, but common mistakes include underestimating the amount of work required, supervisor and/or topic mismatch, and not understanding what the requirements for getting a PhD actually are.

🌟 It can be the best moment of your life… or the worst šŸ“!

A PhD can become one of your biggest life achievements and one of the most intellectually rewarding periods of your life, provided that you can make consistent progress, tolerate uncertainty, and reach the finish line.

Otherwise, it can become a long tunnel with no visible exit, a slow erosion of confidence, and a prolonged period of stress and doubt.

You might think I’m trying to scare you — and to some extent, that’s true — but it’s better to be honest than to lead you into a trap. Yes, it can feel like a trap, because once you begin, you’re committing to three or four years of your life, possibly more.

It’s truly a make-or-break journey. If halfway through you realize it’s not right for you, you face a difficult decision: acknowledge that you’ve lost a significant amount of time and quit, or keep going for several more years without knowing whether you’re going to make it.

That choice can be heartbreaking — for you and for us.

šŸ”„ I have a PhD in ā€œMaking Mistakesā€

One of the most precious life lessons I learned from my PhD is that

Failure happens when you try something new

If anyone has ever mocked you for failing at something, don’t worry: those who are never wrong have never tried anything challenging. I personally think I have failed in my life more than I have succeeded. In a sense, I have a PhD in making mistakes, and I am not ashamed of saying it.

However, failure is not a side effect; it is the mechanism. If you cannot tolerate being wrong, uncertain, or behind for long periods, a PhD will feel unbearable.

Doing research means:

  • Submitting papers that get rejected āŒ
  • Running experiments that fail šŸ’„
  • Writing code that doesn’t work šŸ›
  • Reading things you don’t understand (yet) 🤯

But, šŸ’Ŗ if you can treat failure as data, confusion as progress, and rejection as part of life, then a PhD is for you. In other words, if failure challenges you and drives you forward, and if not knowing things excites you rather than frustrates you, then this path is for you.


🧠 The harsh truth of modern academia

Sorry for saying this, but the harsh truth is that undergraduate degrees are getting easier and easier to attain, while PhDs remain as demanding as ever.

During an undergraduate degree, you are safeguarded: material is prepared for you, grades follow standardized metrics, and admission almost guarantees eventual graduation. A PhD, on the other hand, does not follow these rules: there is no guarantee of success.

There is also an uncomfortable truth about modern academia that we (as a society) will need to address at some point: The AI factor.

Don’t get me wrong, it is not a sin to use AI to restructure text, but I would be very careful about using it for doing rigorous work.

The problem is that the unscrupulous use of AI often leaves undergraduates less trained to work independently and to push themselves rigorously. Not only does AI merely regurgitate existing knowledge without producing anything new, but the knowledge it regurgitates is often wrong and sloppy. I recently spent days trying to debug AI slop and at some point I gave up, because the AI would produce wrong math faster than I could disprove it.

This is also known as the bullshit asymmetry principle (a valuable piece of wisdom, by the way). The principle is as follows:

The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.

Read more about it here.


šŸ”± The most important thing of a PhD

The other most important thing I’ve learned when doing a PhD is that relationships are not built on magic or illusions — they are built on mutual benefit.šŸ¤

I give you what you need, if you give me something that I need āš–ļø

Anyone who ignores this fundamental balance risks becoming toxic. ā˜ ļøšŸ§Ŗāš ļø

This basic life lesson is at the core of the PhD program, and therefore I’d say that the most important thing in your PhD is

The relationship with your supervisor.

It is crucial that you find a supervisor whose research interests align with yours, someone you feel comfortable working with and someone whose mentoring style matches your needs.

When looking for a PhD, my suggestion is to state upfront what kind of topics you like and what kind of supervision you want. Don’t negotiate on it. Ultimately, it is your PhD.

šŸ¤ The 2-Player Game

In my opinion, a PhD is essentially a two-player game between the student and the supervisor.

Crucially though, if this relationship breaks, the one who suffers the most is the student. šŸ›‹ļøšŸ’” While the supervisor is responsible for you, they will not lose their job if you don’t get your PhD; carrying the work forward is ultimately up to you.

In many European systems in particular, the supervisor may be the single most important factor in your PhD experience.

A bad or toxic supervisor can derail even a strong student, while a good supervisor can elevate an average project into an excellent one. Equally, a good supervisor can do little if the student is not willing to contribute.

šŸŒ Moving abroad

If you are not a native English speaker, I strongly support the idea of doing a PhD in an English-speaking country or similar (like the Netherlands, Canada or Scandinavia).

For a handful of reasons:

  • these countries have strong track records of supporting research šŸ”¬
  • you will hone your English abilities šŸ‡¬šŸ‡§
  • you will learn to interact with a wide variety of different people šŸ§‘ā€šŸ¤ā€šŸ§‘

In other words, moving abroad will strengthen your PhD considerably in many different ways.

Countries like Denmark or the Netherlands often have far more funding for PhD students, they treat students more fairly and… they pay more. It is overall a better environment, one that allows everyone to thrive.

Moreover, to get a good PhD you need to be able to collaborate with different people, as good ideas tend to arise in environments with a high level of diversity. This is, in my opinion, simply because we work in a very specialised and narrow field, so finding the right people locally is extremely hard.
Hence it is extremely important that you learn to communicate and collaborate with all kinds of people regardless of their culture, ethnicity and so on.

At the same time, doing a PhD while living abroad, immersing yourself in different cultures and learning a new language can be a daunting task.
Some people experience significant culture shock while pursuing their PhD, which can take a heavy toll on their mental health. I would encourage anyone who feels overwhelmed to seek support: many universities provide resources (sometimes limited, but still helpful), or it can be beneficial to speak with a professional.

Make sure you like the country you are moving to before accepting a PhD offer.


āœ… Takeaways

Explaining what a PhD is is not an easy task: it is a challenging and unpredictable journey.
Everyone’s experience is different, so in the end, no one can truly tell you what a PhD is — you have to experience it yourself, fail, learn, and try again.

To anyone who is considering doing a PhD, I’d suggest choosing

  • a topic that makes you tick 🤘
  • a supervisor who is willing to supervise you šŸ¤
  • a country that you like

During times of self-doubt, take care of yourself, don’t beat yourself up and avoid overindulging in pubs or nightclubs šŸ». Get a good night’s sleep šŸ›Œ and try again tomorrow.

(Do as I say, not as I did).


Recursion as an Effect (2025-12-28)

In one of my previous posts I showed that any theory featuring general recursion is inconsistent when viewed as a logical system, which inevitably leads to the idea that all definable functions in such a theory should be total (or productive).

However, losing Turing-completeness could be somewhat problematic for some, but it can be addressed in several ways. One of these, and possibly the most popular, is to isolate recursion into a monad, effectively regarding recursion as an effect. We discuss several lifting monads which are fit for this purpose.

Domain-Theoretic Liftings

The domain-theoretic approach to non-termination is to model computations as maps between sets with an additional element. Thus we define a lifting operation which takes a set and adds an element to it

\[M A = A + 1\]

which is the set of computations that either return an element of type \(A\) or do not terminate.

Of course, without proper restrictions on the functions that we can apply to it, this monad allows one to ā€œdecide non-terminationā€: one can write a function \(f : M A \to \{\textbf{True}, \textbf{False}\}\) which returns \(\textbf{True}\) if the program does not terminate and \(\textbf{False}\) otherwise. This clearly is not what we are trying to model.
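The problem can be made concrete in Haskell (a sketch; the names are mine): modelling \(M A = A + 1\) naively as `Maybe` makes the discriminator trivially definable.

```haskell
-- In the naive set-based lifting (Haskell's Maybe, ignoring laziness),
-- nothing stops us from "deciding non-termination" by pattern matching:
decide :: Maybe a -> Bool
decide Nothing  = True   -- "the computation diverges"
decide (Just _) = False  -- "the computation terminates"
```

This is exactly the map that the order-theoretic (and, later, metric and guarded) structure is designed to rule out.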

To avoid this problem, in domain theory a set \(A\) is endowed with a complete partial order (\(\sqsubseteq\)) in which non-termination is modelled as the least element (\(\bot\)). The operation \(A \mapsto A_\bot\) which adds a least element to a CPO is called the lifting of a CPO.
Moreover, functions have to satisfy a continuity condition, that is, they must preserve least upper bounds of \(\omega\)-chains:

\[f(\bigsqcup_{i\in \omega} d_i) = \bigsqcup_{i \in \omega} f(d_i)\]

Essentially, this means that the value of \(f\) at the limit of a chain of approximations can be computed locally from its values on the approximants. One consequence of this fact is that \(f\) is monotonic: it preserves the order of the CPO. One feature of this category is that every continuous map \(A_\bot \xrightarrow{\text{cont}} A_\bot\) has a least fixed point, given by the Kleene Fixed-Point Theorem:

\[\text{fix}(f) = \bigsqcup_{n \in \omega} f^n(\bot)\]

which is given by the least upper bound of an \(\omega\)-chain

\[\bot \sqsubseteq f(\bot) \sqsubseteq f^2(\bot) \sqsubseteq \dots \sqsubseteq f^n(\bot) \sqsubseteq \dots\]
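The iteration can be sketched in Haskell (my own illustrative names): we compute the finite approximants \(f^n(\bot)\) of the least fixed point, with `error` playing the role of \(\bot\).

```haskell
-- n-th Kleene approximant f^n(bottom) of a functional f
approx :: Int -> ((Int -> Integer) -> (Int -> Integer)) -> (Int -> Integer)
approx 0 _ = \_ -> error "bottom"   -- the everywhere-undefined map
approx n f = f (approx (n - 1) f)

-- the functional whose least fixed point is factorial
facF :: (Int -> Integer) -> (Int -> Integer)
facF rec n = if n == 0 then 1 else toInteger n * rec (n - 1)
```

`approx n facF` is defined on inputs below `n`; the least fixed point is the "limit" of these ever-more-defined approximants.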

To go back to our original problem: since \(\bot \sqsubseteq a\) for all \(a \in A\), we cannot define a continuous map \(A_\bot \to 2_\bot\) such as the one above, because in the codomain the elements \(\textbf{True}\) and \(\textbf{False}\) are not related.

Remark. When formalising mathematics in a proof assistant, the expert distinguishes two approaches:

  1. implementing all the theory inside the prover’s logic, or
  2. creating a new synthetic language whose structure is interpreted inside the mathematical theory we want to work with

The second approach is the one, for example, used in HoTT, where types are certain topological spaces and functions are continuous.

The problem with formalising domain theory is that it becomes more complicated when the proof assistant is based on type theory; in particular, a type is not really a set.

On the other hand, doing things synthetically would mean that recursion is somewhat spread across the whole language. What I mean by this is that since every continuous function has a fixed-point then non-termination can happen at every type making the internal language of this category effectively an inconsistent language when viewed as a logic. Hence the need for treating recursion as an effect.

The Coinductive Lifting (Capretta)

One solution proposed by Capretta is to take the coinductive solution to the following domain equation

\[D A \cong A + D A\]

In other words, \(DA\) is the set coinductively generated by the constructors \(\text{now} : A \to D A\) and \(\text{delay} : DA \to DA\). Intuitively, \(\text{now}(x)\) is a terminating computation which returns an element \(x \in A\) in \(0\) steps, while \(\text{delay}(c)\) takes a computation \(c \in DA\) and delays it by adding one computational step to it. For example, \(\text{delay}(\text{delay}(\text{delay}(\text{now}(10))))\) is a computation which returns the number \(10\) in three steps.
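The coinductive lifting can be sketched in Haskell, with laziness standing in for coinduction (a sketch; names are mine):

```haskell
-- Capretta's delay monad, with Haskell's laziness playing coinduction
data Delay a = Now a | Delay (Delay a)

-- a computation returning 10 in three steps
three :: Delay Int
three = Delay (Delay (Delay (Now 10)))

-- the divergent computation: infinitely many delays
bot :: Delay a
bot = Delay bot

-- observe a computation with a step budget: the only safe way to "run" it
run :: Int -> Delay a -> Maybe a
run _ (Now a)   = Just a
run 0 (Delay _) = Nothing        -- out of fuel: cannot tell if it diverges
run n (Delay d) = run (n - 1) d
```

Note that `run` only ever gives a semi-decision: with any finite budget, `bot` is indistinguishable from a computation that merely takes longer.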

Remark. \(D\) can be given the structure of an \(\omega\)-CPO with \(\bot\).

First, the non-terminating computation \(\bot\) can be defined coinductively as

\[\bot = \text{delay}(\bot)\]

which is clearly a productive definition. Intuitively, \(\bot\) corresponds to the never-ending stream of delays:

\[\bot = \text{delay}(\text{delay}(\text{delay}\dots))\]

Clearly, we cannot produce a function which discriminates between a terminating computation and a non-terminating one. Capretta proves that \(D\) is a domain (up to bisimilarity): he defines a partial order \(\sqsubseteq_D\) on \(D\) which leads to a notion of least upper bounds for \(\omega\)-chains, written \(\bigsqcup_{n\in \omega} d_n\) for

\[d_0 \sqsubseteq_D d_1 \sqsubseteq_D d_2 \sqsubseteq_D \dots \sqsubseteq_D d_n \sqsubseteq_D \dots\]

then it can be proven that every continuous function on \(D\) has a fixed-point similarly to the construction in domain theory.

Considerations. Now that recursion is isolated into an effect we have solved one problem. However, programming with this monad in practice is far from easy, as one has to

  1. prove that each program on \(DA\) one defines is a continuous function,
  2. work with a coinductive bisimilarity relation rather than equality, and
  3. ensure productivity of definitions.

Metric Lifting Monad (Martin Escardó)

Escardó’s metric lifting models partiality using metric spaces rather than coinduction, but the idea is not that different from Capretta’s. The metric lifting of a set \(A\), written \(LA\), is defined as

\[LA = (A \times \mathbb{N}) \cup \{\infty\}\]

together with a distance function \(d : LA \times LA \to [0, \infty]\) where equal computations have distance \(0\); a terminating computation \((a,k)\) and a non-terminating one have distance \((1/2)^k\); and distinct terminating computations \((a,k)\) and \((b,l)\) have distance \((1/2)^{\min(k,l)}\). Intuitively, \((a,k)\) is a computation which returns \(a\) in \(k\) steps and \(\infty\) is the divergent computation.

Remark. \(LA\) is a complete bounded ultrametric space.

The unit of the monad \(LA\) is defined by \(\eta_A(a) = (a,0)\) and the delay operation is defined by

\[\delta_A(a,n) = (a, n + 1) \qquad \delta_A(\infty) = \infty\]
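The carrier, unit, delay and distance can be sketched in Haskell (names are mine; `Double` crudely stands in for \([0,\infty]\)):

```haskell
-- the metric lifting: Fin a k returns a in k steps; Inf diverges
data L a = Fin a Integer | Inf deriving (Eq, Show)

eta :: a -> L a
eta a = Fin a 0

delta :: L a -> L a
delta (Fin a n) = Fin a (n + 1)
delta Inf       = Inf

-- the ultrametric distance described above
dist :: Eq a => L a -> L a -> Double
dist x y | x == y          = 0
dist (Fin _ k) (Fin _ l)   = 0.5 ^^ min k l
dist (Fin _ k) Inf         = 0.5 ^^ k
dist Inf       (Fin _ k)   = 0.5 ^^ k
dist Inf       Inf         = 0
```

One can check directly that `delta` is contractive with factor \(1/2\): delaying both arguments halves their distance, which is what makes `fix` of \(\delta_A \circ f\) well-defined below.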

In metric spaces terminology, a function is non-expansive if it does not expand the space relative to a distance function \(d\), but possibly contracts it:

\[d(f(x), f(y)) \le d(x,y)\]

On the other hand, a contractive map is a map which contracts the space:

\[d(f(x), f(y)) \le c \cdot d(x,y)\]

for a certain \(c < 1\). At this point it is possible to define a fixed-point operator for all contractive maps

\[\text{fix} : (LA \to LA) \to LA\]

which sends every non-expansive map \(f\) to the fixed-point of \(\delta_A \circ f\), which is contractive because \(\delta_A\) is contractive. At this point the divergent computation can be defined as

\[\bot_A = \text{fix}(id_{LA})\]

Considerations. This approach does not suffer from the use of coinduction, but it still requires the programmer to prove that functions are non-expansive.

Guarded Lifting (Atkey & McBride)

The coinductive lifting monad suffers from productivity and equality issues, while both the coinductive and metric liftings need additional structure on the maps defined on them to work properly with fixed-points.

In guarded type theory, however, maps are always non-expansive and contractiveness is enforced at the type level. In particular, a contractive map is a function of type \(\triangleright X \to X\), and such maps always have fixed-points, at all types \(X\):

\[\text{fix}_g : (\triangleright X \to X) \to X\]

sending a map \(f : (\triangleright X \to X)\) to the unique fixed-point of \(f \circ \text{next}\). The guarded lifting is defined as the unique solution to the domain equation

\[L_g A = A + \triangleright L_g A\]

There is an obvious unit of the monad \(\eta_A : A \to L_g A\) and a delay map of type

\[\delta_A : \triangleright L_g A \to L_g A\]

Conceptually, this monad can be seen as Capretta’s lifting monad with an explicit notion of time or delay built into the type theory. At this point the divergent computation \(\bot_A : L_{g} A\) is defined as the guarded fixed-point of \(\delta\):

\[\bot_A = \text{fix}_g (\delta_A)\]

Now we can check from the fixed-point property that \(\bot_A = \delta_A (\text{next}(\bot_A))\). Here, the term \(\delta_A \circ \text{next}\) corresponds to the delay operation which adds one step to the computation.

Conclusion

The Synthetic Approach. What I personally found truly amazing about the guarded lifting is that this monad is truly synthetic. There is no need for additional structure as in Capretta’s lifting, no need for checking continuity or non-expansiveness of maps. Furthermore, using the model of guarded type theory one can show that (in a certain sense) it corresponds to Martin’s metric lifting on one side and to Capretta’s monad on the other. I will probably need another post to explain this point.

Intensionality. To be honest, the only problem arising from the use of guarded recursion is unfortunately that computations are modelled intensionally: two computations that return the same output given the same input are not necessarily equal if they take a different number of steps to terminate. This is an issue that has to be solved, once again, by quotienting the monad, which is another problem entirely.

Nevertheless, these problems also arise in coinductive and metric approaches. At present, the only extensional model of general recursion I am aware of is based on domain theory.

Consistency. Naturally, one might wonder why we need guarded recursion if domain theory already lets us model all of this extensionally. The answer is that, while domain theory is extremely powerful for modelling recursion extensionally, it does not yield a logically consistent model suitable for type theory. As noted in the introduction, this makes domain-theoretic models ill-suited as foundations for type-theoretic languages, where logical soundness is essential.

On Lax Monoidal Functors (2025-12-19)

What is the difference between

  • a lax monoidal functor
  • a monoid in a Day-monoidal category
  • a morphism of lax-algebras for the free monoid 2-monad, and
  • a codistributive law with the tensor product?

Well, none. Let’s see why.

To keep this post as concise as humanly possible I will assume knowledge of (symmetric) monoidal categories, Kan extensions and enriched categories.

We show informally the following proposition.

Proposition. Let \((\mathcal{C}, \otimes_{\mathcal{C}}, I_{\mathcal{C}})\) be a small monoidal closed category enriched in a monoidal closed category \((\mathcal{D}, \otimes_{\mathcal{D}}, I_{\mathcal{D}})\) and let \(F : \mathcal{C} \to \mathcal{D}\) be a functor. The following statements for \(F\) are equivalent:

  1. It is a lax monoidal functor
  2. It is a monoid in the monoidal category \(([\mathcal{C}, \mathcal{D}], \otimes_\text{Day}, y(I_{\mathcal{C}}))\)
  3. It is a homomorphism of pseudo algebras for the free monoid 2-monad
  4. It carries an \(\mathbb{N}\)-indexed family of (co)distributive laws

    \[\text{Nat}(\otimes^{n}_{\mathcal{D}} \circ F^{n}, F \circ \otimes^{n}_{\mathcal{C}})\]

    where \(\otimes^{n}_{\mathcal{C}} : \mathcal{C}^{n} \to \mathcal{C}\) and \(\otimes^{n}_{\mathcal{D}} : \mathcal{D}^{n} \to \mathcal{D}\) are the iterated tensors

Let us assume the hypothesis of the proposition.

Proof (sketch). (1) \(\Leftrightarrow\) (2).

A lax monoidal functor is a functor which lax-preserves the monoidal structure of \(\mathcal{C}\) that is, there is a morphism

\[u : I_{\mathcal{D}} \to F I_{\mathcal{C}}\]

and a family of morphisms

\[\circledast_{X,Y} : F X \otimes_{\mathcal{D}} F Y \to F (X \otimes_{\mathcal{C}} Y)\]

indexed by \(X,Y\) and natural therein, subject to some coherence conditions.
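Haskell programmers already know lax monoidal endofunctors on \((\textbf{Hask}, (,), ())\): they are exactly the "monoidal" presentation of `Applicative`. A sketch (the class and instance below use my own names):

```haskell
-- unit and (<.>) are the lax monoidal structure maps u and (āŠ›):
--   unit  plays  u : I -> F I          (here I = ())
--   (<.>) plays  āŠ› : F X āŠ— F Y -> F (X āŠ— Y)   (here āŠ— = (,))
class Functor f => Monoidal f where
  unit  :: f ()
  (<.>) :: f a -> f b -> f (a, b)

-- Maybe is lax monoidal: pairing succeeds iff both components do
instance Monoidal Maybe where
  unit              = Just ()
  Just a <.> Just b = Just (a, b)
  _      <.> _      = Nothing
```

The coherence conditions mentioned above correspond to the `Applicative` laws in this presentation.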

On the other hand, the Day convolution provides a natural way to define a monoidal structure on the category of functors. In other words, the task is to turn the category of functors \([\mathcal{C}, \mathcal{D}]\) into a monoidal category by equipping it with a tensor product and a unit. Hence, for two functors \(F, G : \mathcal{C} \to \mathcal{D}\) the Day convolution \(\otimes_\text{Day}\) is defined as follows:

\[\begin{align*} (F \otimes_\text{Day} G) C & := \int^{X,Y \in \mathcal{C}} \mathcal{C}(X \otimes_{\mathcal{C}} Y, C) \otimes_{\mathcal{D}} F X \otimes_{\mathcal{D}} G Y\\ & = \text{Lan}_{\otimes_{\mathcal{C}}}(\otimes_{\mathcal{D}} \circ (F \times G)) \end{align*}\]

while the unit of \([\mathcal{C}, \mathcal{D}]\) is given by the Yoneda embedding applied to the unit \(I_\mathcal{C}\) that is \(y(I_\mathcal{C}) = \mathcal{C}(I_\mathcal{C},-)\).

Now, a monoid in \(([\mathcal{C}, \mathcal{D}], \otimes_\text{Day}, y(I))\) is called a Day-monoid. This is a functor \(F : \mathcal{C} \to \mathcal{D}\) together with a unit and multiplication map.

  • The unit map \(\eta : y(I) \to F\) is obtained from the unit of the lax monoidal functor (and vice versa) via the enriched Yoneda lemma
\[\text{Nat}(y(I), F) \cong \mathcal{D}(I_{\mathcal{D}}, F(I_{\mathcal{C}}))\]
  • The multiplication map \(\mu : F \otimes_{\text{Day}} F \to F\) is obtained from \(\circledast\) (and vice versa) by using the adjunction \(\text{Lan}_J \dashv - \circ J\) as follows
\[\text{Nat}(\text{Lan}_{\otimes_\mathcal{C}}(\otimes_\mathcal{D} \circ F \times F), F) \cong \text{Nat}(\otimes_\mathcal{D} \circ F \times F, F \circ \otimes_\mathcal{C})\]

It remains to prove that the laws of the unit and multiplication of the monoid imply the lax monoidal properties of \(u\) and \(\circledast\) (left as exercise to the reader).

\((2) \Leftrightarrow (3)\).

This is a rather easy statement which generalises the free monoid construction to 2-categories.

In particular, the cheapest way of turning a set \(A\) into a monoid is to take the set of words over \(A\), namely \(A^*\). This is the free monoid over \(A\), where the empty word is the unit and concatenation is the multiplication. The construction \((-)^*\) is the free monoid monad on \(\textbf{Set}\), and its Eilenberg-Moore algebras carry exactly the algebraic structure of monoids.
In particular, the category of Eilenberg-Moore algebras is equivalent to the category of monoids

\[\textbf{Set}^{(-)^*} \simeq \textbf{Mon}\]

Similarly, given a category \(\mathcal{C}\), the cheapest way of turning it into a monoid in \(\textbf{Cat}\) (a monoidal category) is to send \(\mathcal{C}\) to the category of finite sequences of objects \((A_1, \dots, A_n)\) and componentwise sequences of morphisms in \(\mathcal{C}\). In other words, \(T\) is the free monoid 2-monad on \(\textbf{Cat}\) defined as the \(\mathbb{N}\)-indexed coproduct of the powers \(\mathcal{C}^n\), that is

\[T\mathcal{C} = \sum_{n : \mathbb{N}} \mathcal{C}^{n}\]

Now, similarly to what happens in the 1-category case, we have the following equivalence

\[\textbf{Cat}^T \simeq 2\text{-Mon}\]

where \(\textbf{Cat}^T\) is the 2-category of algebras for a 2-monad \(T\) and \(T\)-algebra homomorphisms and \(2\)-Mon is the 2-category of monoidal categories and monoidal functors (monoids in \(\textbf{Cat}\)). Hence (pseudo) \(T\)-algebra homomorphisms are (lax) monoidal functors.

\((3 \Leftrightarrow 4)\).

Clearly, if \(T\) is the free monoid 2-monad, an algebra for \(T\) is a map

\[a : \sum_{n : \mathbb{N}} \mathcal{C}^n \to \mathcal{C}\]

The previous point states that this is a monoidal category where \(A \otimes_\mathcal{C} B := a (A,B)\) and \(I_\mathcal{C} = a()\), thus \(a\) sends \((A_1, \dots, A_n)\) to \(A_1 \otimes_\mathcal{C} \dots \otimes_\mathcal{C} A_n\).

A lax monoidal functor \(F\) is a lax \(T\)-algebra homomorphism, thus it has to satisfy

\[F(A_1) \otimes_\mathcal{D} \dots \otimes_\mathcal{D} F(A_n) \to F(A_1 \otimes_\mathcal{C} \dots \otimes_\mathcal{C} A_n)\]

which is defined at all \(n\) and \(A_i\), hence it is a (co)distributive law

\[\text{Nat}(\otimes^{n}_{\mathcal{D}} \circ F^{n}, F \circ \otimes^{n}_{\mathcal{C}})\]
Bisimulations, Equality and Traces (2023-10-09)

Strong bisimulation for CCS is the preferred equivalence method in concurrency because it relates fewer programs than trace equality. However, the reality is that, in the right setting, strong bisimulation and trace equality ought to be regarded as equivalent; this is the essence behind coinductive types in proof assistants like Isabelle. So what is going on here?

One example of this fact is when considering CCS with the choice operator. In this language we can define a process \(P\) and a process \(Q\) as follows

\[P = \text{pay}.(\text{coffee}. 0 + \text{tea}. 0)\] \[Q = (\text{pay}.\text{coffee}.0 + \text{pay}.\text{tea}. 0)\]

Now the trace semantics of the CCS processes can be defined by a function

\[[\![ \cdot ]\!] : \text{CCS} \to \mathcal{P}_\text{fin}(\text{Str } L)\]

where \(L\) is the finite set of actions and, for a generic set \(A\), the set \(\text{Str } A = 1 + A \times \text{Str }A\) is the set of possibly finite streams over a set \(A\).

For the processes above we have that the semantics of \(P\) is \([\![P]\!] = \{\text{pay}.\text{coffee}, \text{pay}.\text{tea}\}\) and the semantics of \(Q\) is \([\![ Q ]\!] = \{\text{pay}.\text{coffee}, \text{pay}.\text{tea}\}\) and thus the trace semantics of \(P\) and \(Q\) indicate that these processes should be equal.

However, consider the relation \(P\) simulates \(Q\) which is stated as

\[P \lesssim Q \Leftrightarrow \forall a, P'. \text{ if } P \xrightarrow{a} P' \text{ then } \exists Q'. Q \xrightarrow{a} Q' \text{ s.t. } P' \lesssim Q'\]

Now the bisimulation relation can be defined as \(P \approx Q \Leftrightarrow P \lesssim Q \text{ and } Q \lesssim P\).

The above example is a standard one in concurrency theory: it shows that bisimulation can distinguish processes that trace semantics would regard as equal, and that is why bisimulations turn out to be more useful relations for comparing processes.

Using the example above we can prove that \(Q \lesssim P\). Let’s define half-evaluated processes as

\[P' = \text{coffee}. 0 + \text{tea}. 0\] \[P'_{1} = \text{coffee}. 0\] \[P'_{2} = \text{tea}. 0\] \[Q_{1} = \text{pay}.\text{coffee}.0\] \[Q_{2} = \text{pay}.\text{tea}.0\] \[Q'_{1} = \text{coffee}.0\] \[Q'_{2} = \text{tea}.0\]

Now for all transitions of \(Q\) we have to show that \(P\) simulates them. The first one is \(Q \xrightarrow{\text{pay}} Q'_{1}\). Obviously \(P \xrightarrow{\text{pay}} P'\), and so now we have to show that \(Q'_{1} \lesssim P'\), which clearly holds. This works similarly if \(Q\) decides to take the other route and produce tea in the end.

All right, but \(P \lesssim Q\) does not work. This is because once \(P\) makes the transition \(P \xrightarrow{\text{pay}} P'\) we are forced to select which branch of \(Q\) simulates this behaviour, and no matter which one we choose we get stuck. Say \(Q \xrightarrow{\text{pay}} Q'_{1}\); then we have to show \(P' \lesssim Q'_{1}\), but this does not hold because \(P'\) can make two different transitions while \(Q'_1\) can only make one.
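The whole argument can be machine-checked on finite processes. Here is a small Haskell sketch (my own encoding): a process is identified with its finite tree of labelled transitions, and `sim` is the simulation relation defined above.

```haskell
-- finite processes as their lists of outgoing labelled transitions
newtype Proc = Proc [(String, Proc)]

nil :: Proc
nil = Proc []

-- P = pay.(coffee.0 + tea.0),  Q = pay.coffee.0 + pay.tea.0
p, q :: Proc
p = Proc [("pay", Proc [("coffee", nil), ("tea", nil)])]
q = Proc [("pay", Proc [("coffee", nil)]), ("pay", Proc [("tea", nil)])]

-- sim a b: every transition of a is matched by some transition of b
sim :: Proc -> Proc -> Bool
sim (Proc as) (Proc bs) =
  and [ or [ l == l' && sim a' b' | (l', b') <- bs ] | (l, a') <- as ]
```

As argued above, `sim q p` holds while `sim p q` fails, so \(P\) and \(Q\) are not bisimilar.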

CoRecursion Schemes and Traces

Consider now the unfold function, which takes a seed function and produces a trace by running the seed at each step:

unfold :: (x -> (L, x)) -> x -> Str L
unfold seed x = let (l, x') = seed x in l : unfold seed x'

Notice that the seed function \(X \to L \times X\) can be viewed as a Labeled Transition System (LTS) where the set of states is \(X\) and the function is the function implementing the transitions.

It is a very well-known fact that unfold is a fully abstract map, in the sense that if we consider the notion of bisimilarity above and set \([\![ \cdot ]\!]\) to be unfold seed, then we have the following theorem.

Full abstraction \(\text{ for all } t_{1}, t_{2}, t_{1} \approx t_{2} \Leftrightarrow [\![ t_{1} ]\!] = [\![ t_{2}]\!]\).

This is also backed by the fact that when programming in proof assistants like Agda – since coinductive data types are not really final coalgebras – it is common practice to just add the following axiom to the type theory.

Axiom \(\text{ for all } (s_{1}, s_{2} : \text{Str } L). s_{1} \approx s_{2} \to s_{1} = s_{2}\).

Even more so, in some proof assistants like Isabelle coinductive data types are real final coalgebras and so the above axiom is actually a true fact in the prover’s logic.

Notice that the other direction is obvious, and thus the axiom implies that bisimilarity is logically equivalent to equality.

So why does bisimulation in the above example not correspond to equality?

The reason is that the shape of behaviours for CCS+choice is not \(BX = L \times X\) but \(BX = \mathcal{P}_\text{fin}(L \times X)\).

In fact, the seed function describing the LTS of CCS+choice has the following type

 
opsem :: CCS ->  [(L, CCS)]

where we use lists [-] as a (rough) implementation of finite powersets.

At this point the LTS for CCS+choice can be defined roughly like this

...
opsem (p :+: q) = opsem p ++ opsem q  -- choice: union of the transition sets

And now the corresponding unfold on this LTS will yield a fully abstract semantics

unfold opsem :: CCS -> Trees L 

where \(\text{Trees}\; L = \mathcal{P}_\text{fin} (L \times \text{Trees}\; L)\).
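The generalised unfold for this behaviour functor can be sketched in Haskell (names are mine), replacing streams by finitely-branching trees:

```haskell
-- trees over labels l: Trees l = P_fin (l x Trees l), with lists for P_fin
newtype Tree l = Node [(l, Tree l)] deriving (Eq, Show)

-- the coalgebra map into trees: unfold for the functor [(l, -)]
unfoldT :: (x -> [(l, x)]) -> x -> Tree l
unfoldT seed x = Node [ (l, unfoldT seed x') | (l, x') <- seed x ]

-- a toy seed: count down from n, emitting one transition per step
step :: Int -> [(Int, Int)]
step 0 = []
step n = [(n, n - 1)]
```

For instance, `unfoldT step 2` produces the linear tree `Node [(2, Node [(1, Node [])])]`, and replacing `step` with the `opsem` of CCS+choice produces genuinely branching trees.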

The mini Yoneda lemma for Type Theorists (2023-09-09)

I have managed to teach the Yoneda lemma to students who knew very little about category theory; here’s how you do it.

Say that you want to do denotational semantics for a simply typed calculus with a unary constructor \(\textsf{R}\) which has the following typing rule

\[\frac{\Gamma \vdash t : A}{\Gamma \vdash \textsf{R}(t) : B}\]

The task is to give a semantic interpretation \([\![ \cdot ]\!]\) for the language by induction on the typing judgment \(\Gamma \vdash t : A\) such that terms are interpreted as morphisms \([\![\Gamma ]\!] \xrightarrow{[\![ t ]\!]} [\![ A ]\!]\), assuming of course that \([\![ \cdot ]\!]\) is also defined separately for contexts and types.

To interpret the rule above we do induction on the typing judgment. Thus we assume there exists a morphism \([\![ \Gamma ]\!] \xrightarrow{[\![ t ]\!]} [\![ A ]\!]\) and we construct a morphism \([\![ \Gamma ]\!] \xrightarrow{[\![ \textsf{R}(t) ]\!] } [\![ B ]\!]\).

For simplicity we drop the semantic brackets, writing, for example, \(A\) for the interpretation \([\![ A ]\!]\), \(t : \Gamma \to A\) for the interpretation of \(t\), and so on.

Back to the problem we are trying to solve. It can sometimes be quite tricky to figure out what the semantics of \(\textsf{R}(t)\) should be, since some plumbing is needed to pass the context around. A particular instantiation of the Yoneda lemma states that given a morphism \(t : \Gamma \to A\) and a morphism \(R : A \to B\) there is a canonical way to construct a morphism \(\Gamma \xrightarrow{R(t)} B\).

To show this we instantiate the contravariant Yoneda lemma by setting \(F = \mathbb{C}(-, B)\). Then for all objects \(A : \mathbb{C}^{\text{op}}\) we have

\[\mathbb{C}(A, B) \cong \mathrm{Nat}(\mathbb{C}(-, A),\, \mathbb{C}(-, B))\]

Let \(R : A \to B\) be the interpretation of \(\textsf{R}\). Then one side of the isomorphism is \(\phi (R, t) = F(t)(R) = \mathbb{C}(t, B)(R) = R \circ t\). In other words, the interpretation of \(\textsf{R}(t)\) is simply \(R \circ t\).
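As a Haskell sketch of this instance of the Yoneda lemma (the names phi and psi are ours): a morphism \(R : A \to B\) corresponds exactly to a natural family of maps \(\mathbb{C}(-, A) \to \mathbb{C}(-, B)\), and the interpretation of \(\textsf{R}(t)\) is obtained by postcomposition.

```haskell
{-# LANGUAGE RankNTypes #-}

-- One direction of the iso: postcomposition. Applied to t :: g -> a,
-- this is exactly the interpretation of R(t), i.e. r . t.
phi :: (a -> b) -> (forall x. (x -> a) -> (x -> b))
phi r t = r . t

-- The other direction: instantiate the family at x = a and feed it id.
psi :: (forall x. (x -> a) -> (x -> b)) -> (a -> b)
psi alpha = alpha id
```

Note that psi (phi r) = r . id = r, which is one half of the isomorphism stated above.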

]]>
Marco Paviotti[email protected]
CCCs and the complete models of STLC2023-03-16T13:14:21+00:002023-03-16T13:14:21+00:00https://mpaviotti.github.io/posts/2023/03/CCC-modelsCartesian closed categories are not regarded as complete models of the Simply Typed \(\lambda\)-calculus in the traditional sense. Let’s see why.

Assume \(\Lambda_X\) is the set of closed well-typed STLC (Simply Typed \(\lambda\)-calculus) terms. Clearly, STLC can be interpreted into any Cartesian Closed category (CCC) by defining an interpretation function \([\![\cdot]\!] : \Lambda_X \to \mathcal{C}\) such that for any term \(t \in \Lambda_X\) , \([\![t]\!] \in \mathcal{C}(1, [\![\sigma]\!])\) where \(\sigma\) is the type of \(t\). We will only consider well-typed interpretations here. Moreover, it can be proved that the interpretation function is sound and complete. The completeness statement reads as follows. For all terms \(t_1\) and \(t_2\),

\[t_1 \equiv_{\beta\eta} t_2 \text{ iff } [\![t_1]\!] = [\![t_2]\!]\]

where the \((\Rightarrow)\) direction is soundness whereas \((\Leftarrow)\) is completeness of the interpretation.

This statement is certainly true. If two terms are \(\beta\eta\) equivalent they are equal in the model, i.e. the semantics is agnostic to \(\beta\eta\)-step reductions. Conversely, all equations that hold for any two STLC-denotable terms also hold in the syntax.

However, completeness of a model is a slightly different statement:

\[t_1 =_{\beta\eta} t_2 \text{ iff for all } [\![ \cdot ]\!] : \Lambda_X \to \mathcal{C}, [\![t_1]\!] = [\![t_2]\!]\]

This one states that, for a fixed category \(\mathcal{C}\), \(\beta\eta\)-equivalence between two terms holds if and only if the two terms are equal under every possible interpretation.

In this sense, CCCs are not complete models. The counterexample is given by a preorder category \(\mathcal{P}\) with CCC structure. A preorder, viewed as a category, has at most one morphism (\(\sqsubseteq\)) between any two objects. If this category has a greatest element \(\top\), binary meets (\(\wedge\)) and Heyting implications (\(\to\)), then \(\mathcal{P}\) is a CCC.

Now the problem is that every (well-typed) interpretation sends two programs of the same type to morphisms with the same domain and codomain, and since the category is thin any two such morphisms are equal. For example, consider the projection maps out of the product, \(x \wedge x \xrightarrow{\pi_1} x\) and \(x \wedge x \xrightarrow{\pi_2} x\), whose domains and codomains coincide. In \(\mathcal{P}\) these two are the same map, i.e. \(\pi_1 = \pi_2\).

Now the right-hand side of the completeness statement is satisfied, since for all well-typed interpretations \([\![\cdot]\!]\) we have \([\![\pi_1]\!] = [\![\pi_2]\!]\). However, the projections \(\pi_1\) and \(\pi_2\) in the syntax are definitely not \(\beta\eta\)-equivalent.

I refer the reader to the original paper for more details.

]]>
Marco Paviotti[email protected]
The Axiom of Choice in Type Theory2022-11-25T13:14:21+00:002022-11-25T13:14:21+00:00https://mpaviotti.github.io/posts/2022/11/axiom-of-choiceThe Axiom of Choice (AC) is an axiom that states that the product of a family of non-empty sets is itself non-empty. This is a rather controversial axiom amongst mathematicians but in type theory this axiom is provable within the logic.

First off, I do not consider myself an expert on set theory, but after having conversations of this kind with mathematicians and computer scientists I found that there are some misconceptions around this axiom and the reasons why it is needed.

For example, as you will see, it is indeed true that the axiom of choice is connected with the existential quantifier; it is not true, however, that the reason we cannot pick an element out of the existential is that the logic is classical.

In my mind there are two problems: the first is that

the existential quantifier does not ensure there exists one element with a particular property in the domain of discourse

and the second is that

we would need to create an infinite proof that uses Existential Instantiation for each element of the indexing set

However, in order to fully understand what is going on we need to be more precise. So let us first recall what the axiom of choice is.

The axiom of choice (AC)

The original formulation of the AC is the following.

Given a set \(X\) and a family of non-empty sets \(\{A_x\}_{x \in X}\) over \(X\), the infinite product of these sets, namely \(\Pi_{x \in X}. A_{x}\), is non-empty.

For the record, the infinite product is defined as follows

\[\Pi_{x \in X}. A_{x} = \{ f : X \to \bigcup_{x \in X} A_{x} \mid \forall x \in X.\; f(x) \in A_{x} \}\]

However, this statement is a little bit more packed than we would like it to be. An equivalent statement is skolemization.

Skolemization (Sk)

Skolemization is what allows one to turn an existentially quantified formula into a function. Formally, skolemization is the following statement

Given a relation \(R \subseteq X \times Y\), if \(\forall x \in X. \exists y \in Y. R(x,y)\) then \(\exists f : X \to Y. \forall x \in X. R (x, f(x))\)

The AC is equivalent to Skolemization; a full discussion of this fact can be found here.

For proving that Sk \(\Rightarrow\) AC, for a family of sets \(\{A_{x}\}_{x \in X}\), we define a relation \(R(x,y) = y \in A_{x}\). For the other direction we assume a relation \(R \subseteq X \times Y\) and then we construct the family of sets \(\{A_{x}\}_{x \in X}\) such that each \(A_{x} = \{ y \mid y \in Y \text{ and } R(x,y)\}\).

The existential

Set theory is a first-order logic together with a set of axioms (9 of them exactly including the AC) postulating the existence of certain sets. Besides the propositional fragment of first-order logic there is also the predicate fragment formed by universal quantification (\(\forall\)) and existential quantification (\(\exists\)).

The Existential Instantiation rule states that if we know there exists an \(x\) that satisfies the property \(P\) and we can construct a proof from a fresh \(t\) that satisfies that property to a proposition \(R\) then we can obtain \(R\)

\[\frac{\exists x. P \qquad t,\, P[t/x] \vdash R}{R}\]

with \(t\) free for \(x\) in \(P\).

So here we have to treat \(t\) carefully in that it is a fresh \(t\) that satisfies \(P\), but ā€œwe do not know what it is!ā€.

The reason why I put this sentence in quotes is that this is the explanation many people would use. However, to me the real reason is that we do not know how many other elements with such a property exist in the universe. There is certainly one, but there may be more.

The problem with producing a choice function

To prove Sk we assume \(\forall x \in X. \exists y \in Y. R(x, y)\) and then prove \(\exists f : X \to Y. \forall x \in X. R (x , f (x))\). Here \(f : X \to Y\) really means a relation \(f \subseteq X \times Y\) that is a function, i.e. for all \(x \in X\) there exists exactly one \(y \in Y\) such that \((x,y) \in f\).

Now first we try to construct this relation \(f\). A first naive attempt is to use the axiom of comprehension as follows

\[f = \{(x, y) \mid x \in X \wedge y \in Y \wedge R(x, y)\}\]

The problem is that \(f\) is clearly not a function, since there may be more than one \(y\) per \(x\) in \(R\). Notice that the above statement is very similar to the one where we include the existential

\[f = \{(x, y) \mid x \in X \wedge \exists y'. y = y' \wedge R(x, y)\}\]

But this does not change much from before since we know there exists at least one \(y\) per every \(x\) but we do not know how many. Clearly, we can prove that for all \(x \in X\) we have \(R(x, f(x))\), however, we cannot prove that \(f\) is a function. In particular, that for each \(x \in X\) we have a unique \(y \in Y\) we map \(x\) to.

Now the question is, couldn’t we just have picked one \(y\) for each \(x\)?

We could do this if we were able to use Existential Instantiation for each \(x \in X\). If \(X\) were finite then we could certainly do that: we pick an \(n \in \mathbb{N}\) and assume \(X = \{x_0, x_1, \dots, x_n \}\).
Now we can construct a set of pairs \((x_i, y_i)_{i\in \{0,\dots,n\}}\) such that every \((x_i, y_i) \in R\) by repeatedly using Existential Instantiation. Once the set is created we can assign \(f\) to it

\[f = \{(x_0,y_0), (x_1,y_1), \dots, (x_n, y_n)\}\]
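In code, the finite case is indeed unproblematic. Here is a toy sketch (the name chooseFinite is ours), representing the relation \(R\) extensionally by attaching to each \(x\) its non-empty list of witnesses \(y\):

```haskell
-- Finite choice: for each x, just pick the first witness.
-- The relation R is represented as an association list from each x
-- to the (non-empty) list of y's related to it.
chooseFinite :: [(x, [y])] -> [(x, y)]
chooseFinite r = [ (x, y) | (x, y : _) <- r ]
```

This works precisely because we can make one concrete pick per entry of a finite list; it is this "one pick per element" step that has no counterpart when the indexing set is infinite.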

However, when \(X\) is not finite, we cannot simply write down the set by hand. Instead we would have to write a formula and then use set comprehension. However, there is no (finite) formula of the form

\[(x_0,y_0) \in R \wedge (x_1,y_1) \in R \wedge \dots \wedge (x_n, y_n) \in R \wedge \dots\]

This is because formulas and proofs in set theory are finite, whereas the one above is an infinite formula, which would moreover need a (potentially) infinite number of applications of the Existential Instantiation rule.

Conclusions

Hopefully this untangles some confusion around the axiom of choice.

On the other hand, AC is derivable in Type Theory simply because we have access to the proof that for every \(x\) there exists a \(y\) such that \(R(x,y)\): inhabitants of the dependent product \(\forall\) are functions already, so the choice function can be read off directly.

See the code below.

-- assuming: open import Data.Product using (Ī£; _,_; projā‚; projā‚‚)
choice : āˆ€ (A B : Set) → āˆ€ (R : A → B → Set)
       → (āˆ€ (x : A) → Σ B (λ y → R x y))
       → Ī£ (A → B) (Ī» f → āˆ€ x → R x (f x))
choice A B R r = (Ī» x → proj₁ (r x)) , (Ī» x → projā‚‚ (r x))

If you have any comments about this please feel free to drop me an email or something; I would be very happy to know more (especially if I said something wrong).


]]>
Marco Paviotti[email protected]
Inconsistencies in Cartesian Closed Categories with fixed-points2022-11-10T13:14:21+00:002022-11-10T13:14:21+00:00https://mpaviotti.github.io/posts/2022/11/fixed-points-CCSAny Cartesian Closed Category (CCC) with an initial object and a fixed-point operator is trivial. Essentially this means that in languages like (e.g.) Haskell the empty type is not actually empty as it contains the non-terminating computation. Perhaps this is obvious, but here’s the categorical explanation.

Here the word trivial means that every object \(A\) in the category is isomorphic to the terminal object \(1\).

To do this proof we make use of the fixed-point operator, which exists at all types.

We know that for all endomaps \(f : A \to A\) in the category there exists a map \(\text{fix}_{f} : 1 \to A\) such that \(f \circ \text{fix}_{f} = \text{fix}_{f}\). Thus, we can use the unique endomap on the initial object, namely the identity map \(id_{0}: 0 \to 0\), to get a map \(\text{fix}_{id_{0}} : 1 \to 0\). But now, because \(0\) is initial (and \(1\) is terminal), we also have a unique map into the terminal object, namely \(! : 0 \to 1\). It is easy to see that \(\text{fix}_{id_{0}}\) and \(!\) are inverses to each other, hence they form an isomorphism \(0 \cong 1\). In particular, \(\text{fix}_{id_{0}} \circ ! : 0 \to 0\) is \(id_{0}\) by initiality and \(! \circ \text{fix}_{id_{0}} : 1 \to 1\) is \(id_{1}\) by terminality.

Now we compute as follows. For every object \(A\) in the category \(1 \cong 0 \cong 0 \times A \cong 1 \times A \cong A\) and the proof is concluded.

This result was also shown to hold in the case when, instead of the initial object, we postulate a natural numbers object \(\mathbb{N}\).

A natural question to ask now is:

is every model of PCF trivial?

To answer this question we take as a model of PCF the category of Scott domains. This category has pointed directed-complete partial orders (dcppos) as objects and continuous functions as arrows (I am following Thomas Streicher’s book here to avoid any misunderstanding).

Now, we would like to prove that this category is cartesian closed (which we know), has a fixed-point map (which it has) and that it has an initial object. However,

there is no initial object in the category of Scott domains

This is because if this category had an initial object \(0\) it would have at least a bottom element \(\bot_0\). Notice that the subset \(\{\bot_0\}\) is indeed directed and its supremum \(\bigsqcup \{\bot_0\}\) is \(\bot_0\) itself. Now if we take any other dcppo \(X\), a continuous function \(f : 0 \to X\) that maps \(\bot_{0}\) to any element \(x \in X\) will satisfy the equation

\[f \left(\bigsqcup \{\bot_0\}\right) = \bigsqcup f \{\bot_0\}\]

because, for any \(x \in X\) we choose for \(f(\bot_0)\) (even the bottom element), \(\bigsqcup f \{\bot_0\} = \bigsqcup \{x\} = x\).

The only way this category could have an initial object is if the arrows in the category were strict, namely if they preserved \(\bot\) elements; but, as we have seen, continuous functions do not necessarily preserve them.

Is it just a coincidence that Scott’s model is not trivial?

Not really, because if it were trivial it would break computational adequacy, which is the statement that for every pair of well-typed terms in the language \(\Gamma \vdash t : A\) and \(\Gamma \vdash t' : A\)

if \([\![ t ]\!] = [\![ t' ]\!]\) then \(t \approx t'\)

where \(\approx\) is contextual equivalence of programs.

But if the model were trivial then all pairs of PCF-denotable terms (pairs of maps into something isomorphic to \(1\)) would be equal (by terminality) and therefore operationally equivalent.

What does this all mean for the Haskell programmer?

Well, nothing, because Haskell does not have a formal model.

But let’s say we make a big leap and take the fragment of Haskell consisting of ā€œinductive data typesā€ and recursion. Now I can craft a program that resembles what I just said above

{-# LANGUAGE GADTs #-}

-- the "empty" type: no constructors
data Empty where

data Unit = One ()

-- the fixed-point combinator, available at every type
y :: (a -> a) -> a
y f = f (y f)

empty :: Empty -> Empty
empty x = x

-- a mock "equational step" operator, used only to write the
-- fixed-point reasoning inline; it simply returns its left argument
(===) :: a -> a -> a
x === _ = x

-- an inhabitant of Unit -> Empty, obtained as the fixed point of id
endoEmpty :: Unit -> Empty
endoEmpty = y id === id (y id) -- by the fixed-point property y f = f (y f)

Is this a problem? No, because y id is the non-terminating computation: in other words, endoEmpty sends the unit element to \(\bot\). But since Haskell functions need not be strict, I can also send the \(\bot\) element of Empty to One (). So this map is not an isomorphism.
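To make the last point concrete, here is the non-strict map going back (repeating the toy definitions so the snippet stands alone; the name toUnit is ours):

```haskell
{-# LANGUAGE GADTs #-}

data Empty where

data Unit = One ()

-- Not strict: it never inspects its argument, so it maps the bottom
-- element of Empty to One (). Hence Empty -> Unit is inhabited by a
-- non-constant-bottom map, and the two directions do not compose to
-- identities: there is no isomorphism Empty ≅ Unit.
toUnit :: Empty -> Unit
toUnit _ = One ()
```

Because toUnit ignores its argument, even toUnit applied to a diverging computation evaluates to One ().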

Conclusions

This is probably a very convoluted way of saying

There is no initial object (or natural numbers object) in PCF (or other ā€œPCF-likeā€ languages like Haskell)

This is because Empty actually contains the bottom element \(\bot\). For the same reason, if we now consider System F with a polymorphic fixed-point operator and define the \(0\) object by setting

\[0 = \forall x . x\]

This object actually has an inhabitant: the non-terminating computation. Thus, it is not the initial object.

]]>
Marco Paviotti[email protected]