Aidan Ewart


The λ\lambda-Calculus, in Set Theory, in Coq

I've recently been reading about Peter Aczel's interpretation of 'Constructive Zermelo-Fraenkel Set Theory' (CZF) in Martin-Löfs Type Theory (MLTT). This got me thinking about the inverse interpretation; how would one go about creating a model for a simple type theory within a set theory like CZF? I resolved to experiment with constructing models of various logics in set theory; however, to do this, I first needed a set theory to experiment in.

Axiomatic Set Theory

CZF is an 'axiomatic' set theory, meaning that one may only 'construct' sets by using the particular rules, called axioms, which make up the theory. The reason for this is twofold; firstly, the axiomisation of the set theory avoids the possibility of subtle inconsistencies which can invalidate proofs - the commonly-cited example is Russell's paradox.

Secondly, we can limit the axioms of our system to those that are constructive, meaning that each axiom (abstractly) represents a 'computation' or 'algorithm' of some sort; such a system of axioms is called a constructive theory. This has the interesting property that proofs in constructive theories can be thought of as descriptions of computations: for example, a proof of ABA \to B can be thought of as an algorithm for constructing a proof of BB given a proof of AA.

Using non-constructive theories, such as Zermelo-Fraenkel Set Theory (ZF), mathematicians can prove results which seem counterintuitive, such as the Banach-Tarski paradox (which relies on the non-constructive 'axiom of choice'). The distinction between constructive and non-constructive axioms and theories is mostly philosophical, as some non-constructive axioms (such as the law of the excluded middle or the powerset) may seem perfectly intuitive on the surface.

Most set theories are but an extension of an 'underlying language' of logic (classical logic for ZF, intuonistic logic for CZF) with the aforementioned axioms for set construction. This underlying language usually includes the normal logical operators and quantifiers, and a statement in this language is called a predicate. Luckily for us, many logics have already been implemented as programs for computers. This means we need not check our proofs manually; the program automatically checks the soundness of our reasoning for us!

These systems are called 'automated theorem provers' and are quite a mature technology; I will extend one, Coq, with axioms for the construction of sets. The actual set theory I will use is called 'Intuonistic Zermelo-Fraenkel Set Theory' (IZF); as the name suggests, it is similar to CZF, but is slightly 'less' constructive. It uses the following axioms:

I describe the process of creating a model (an 'encoding') for this set theory in Coq in addendum I. You can view the axiomisation of IZF in Coq here.

Somewhat amazingly, using only these axioms, one is able to 'construct' much of modern mathematics, as we shall soon see!

Set-Theoretic Constructions

As a first example construction, let us consider the Cartesian product of two sets A×BA \times B. For any sets AA and BB, A×BA \times B should have the property that for any xAx \in A and yBy \in B, x,yA×B\langle x, y \rangle \in A \times B, where x,y\langle x, y \rangle denotes the ordered pair of xx and yy, constructed in the usual way as {{x},{x,y}}\{\{x\}, \{x, y\}\}.

For a given xAx \in A, we can construct the set {x,y:yB}\{\langle x, y \rangle : \forall y \in B\} by the axiom of replacement and a predicate ϕ(y,z)\phi(y,z) defined as z=x,yz = \langle x, y \rangle. Clearly, this predicate is functional. Call this process f(x)f(x), and we can then apply replacement again, this time using ff, to generate the indexed set (zx)xA(z_x)_{x \in A}. Taking xAzx\bigcup_{x \in A}z_x, and we have the Cartesian product A×BA \times B:

Definition cartesian_product (A B: set) :=
  ⋃(replacement (fun x (_: x ∈ A) => replacement (fun y (_: y ∈ B) => ⟨x,y⟩))).

Notation "A × B" := (cartesian_product A B) (at level 60, right associativity).
Lemma is_cartesian_product : forall A B x y, x ∈ A -> y ∈ B -> ⟨x,y⟩ ∈ A × B.

The disjoint union, or coproduct, of an indexed set (xi)iI(x_i)_{i \in I} is similar to the union of (xi)iI(x_i)_{i \in I}, except every element is 'tagged' with the index of the set which it is a member of. For example, for an yxjy \in x_j for some jIj \in I, then j,yiIxi\langle j, y \rangle \in \bigsqcup_{i \in I} x_i, where \bigsqcup is the disjoint union operation.

The disjoint union of an indexed set (xi)iI(x_i)_{i \in I} can be constructed using the axioms of IZF, similarly to the construction of the Cartesian product. First, for a given jIj \in I, take the replacement of xjx_j with f(x)=j,xf(x) = \langle j, x \rangle, and call this operation g(j)g(j). The disjoint union is then the union of the replacement of gg over the elements of II. In set notation, this is {{j,y:yxj}:jI}\bigcup\{\{\langle j, y \rangle : \forall y \in x_j\} : \forall j \in I\}, which is easily represented in Coq (representing an indexed set (xi)iI(x_i)_{i \in I} as a function forall i, i ∈ I -> set):

Definition disjoint_union {I} (f: forall i, i ∈ I -> set) :=
  ⋃(replacement (fun j H => replacement (fun y (_: y ∈ f j H)  => ⟨j,y⟩))).

Lemma is_disjoint_union :
  forall {S f x y}
         (p: x ∈ S),
         y ∈ f x p -> ⟨x,y⟩ ∈ disjoint_union f.

That derivation is almost identical to our derivation for the Cartesian product! In fact, the Cartesian product is definitionally equal to the case of the disjoint union where xjx_j is always BB for any jAj \in A:

Lemma cartesian_product_is_extreme_disjoint_union :
  forall {A B}, A × B ≡ disjoint_union (fun x (_: x ∈ A) => B).
Proof.
  intros. reflexivity.
Qed.

This is a pretty good example of how a constructive approach can allow you to discover nuances about the properties of operations that would remain undiscovered otherwise.

We can also define a binary version of the union and disjoint union operations:

Definition union2 x y :=
  ⋃(x +> (y +> ∅)).
Notation "x ∪ y" := (union2 x y) (at level 60).

Lemma union2_prop : forall {A B x},
  x ∈ A ∪ B <-> x ∈ A \/ x ∈ B.

Definition left x := ⟨∅,x⟩.
Definition right x := ⟨Suc ∅,x⟩.

Definition disjoint_union2 x y.
Notation "x + y" := (disjoint_union2 x y) (at level 60).

Lemma disjoint_union2_prop : forall {A B x},
  x ∈ A + B <-> (exists a, x ≡ left a /\ a ∈ A) \/ (exists b, x ≡ right b /\ b ∈ B).

We can derive a stronger form of our axiom of infinity by separating out the elements of \infty that satisfy the inductive property. The inductive property, which is written Ind(x)\mathrm{Ind}(x), says that 'if ϕ()\phi(\emptyset) and for all ϕ(a)\phi(a) we can derive ϕ(Suc(a))\phi(\mathrm{Suc}(a)), then we can derive ϕ(x)\phi(x) for any property ϕ\phi':

Ind(x)=ϕ,(ϕ()a,ϕ(a)ϕ(Suc(a)))ϕ(x)\mathrm{Ind}(x) = \forall \phi, (\phi(\emptyset) \wedge \forall a, \phi(a) \to \phi(\mathrm{Suc}(a))) \to \phi(x)

If for any xx we have Ind(x)\mathrm{Ind}(x), then we say that xx is in the standard model of arithmetic - this is pretty much the same as saying xx can be expressed by repeating the Suc\mathrm{Suc} operation over the empty set a finite number of times.

It is important to note that the inductive property cannot be written in 'first-order' logic where only quantification over sets is allowed; in order to express the inductive property correctly, we must quantify over all propositional functions ϕ\phi, which requires 'second-order' logic. It is possible to construct the standard model of arithmetic in ZF using only first-order logic, but not express the inductive property in it.

We can construct the standard model of arithmetic (which we call ω\omega) by selecting the elements xx \in \infty that have the inductive property:

ω={x:x,Ind(x)}\omega = \{x : \forall x \in \infty, \mathrm{Ind}(x)\}

As, by definition, all elements of ω\omega have the inductive property, we can prove that one can perform mathematical induction over ω\omega itself; if, for any property ϕ\phi, both ϕ()\phi(\emptyset) and n,ϕ(n)ϕ(Suc(n))\forall n, \phi(n) \to \phi(\mathrm{Suc}(n)) hold, then ϕ(x)\phi(x) holds for all xωx \in \omega. This makes ω\omega so useful for further constructions that it is sometimes even stated axiomatically, as the axiom of strong infinity:

ω,ϕ,ϕ()(n,ϕ(n)ϕ(Suc(n)))xω,ϕ(x)\exists \omega, \forall \phi, \phi(\emptyset) \wedge (\forall n, \phi(n) \to \phi(\mathrm{Suc}(n))) \to \forall x \in \omega, \phi(x)

The exponential set ABA^B, of functions f:BAf : B \to A has a pretty obvious construction; all functions BAB \to A are subsets of B×AB \times A, and so to construct ABA^B we can simply restrict P(B×A)\mathcal{P}(B \times A) to only those subsets RR which are functional (for all xBx \in B, if x,yR\langle x, y \rangle \in R for a yAy \in A, then yy is unique) and left-total (for all xBx \in B, there exists a yAy \in A such that x,yR\langle x, y \rangle \in R), using separation:

Definition functional (R: set) :=
  forall {a b}, ⟨a,b⟩ ∈ R -> forall {c}, ⟨a,c⟩ ∈ R -> b ≡ c.

Definition left_total (R D: set) :=
  forall {a}, a ∈ D -> exists {b}, ⟨a,b⟩ ∈ R.

Definition exponential (A B: set) :=
  separation (fun R => functional R /\ left_total R B) (powerset (B × A)).

Lemma member_exp_function :
  forall {A B R}, R ∈ exponential A B -> functional R /\ left_total R B.

From the exponential set and the principle of \in-induction, we can actually derive some useful and surprising tools for constructing more complex sets, such as the principle of \in-recursion. The principle of \in-recursion states, that for every formula of the underlying language ϕ(x,y)\phi(x,y) such that x,!y,ϕ(x,y)\forall x, \exists ! y, \phi(x,y), then there exists a formula χ(x,y)\chi(x,y) such that χ(x,y)ϕ({b:axχ(a,b)},y)\chi(x,y) \leftrightarrow \phi(\{b : a \in x \wedge \chi(a,b)\},y). Expressed as a function, this can be written as:

G,F,F(x)=G({F(y):yx})\forall G, \exists F, F(x) = G(\{F(y) : y \in x\})

However, in order to prove \in-recursion, we must first prove that the 'transitive closure' of any set xx is itself a set. The transitive closure of any set, TC(x)TC(x), is defined by the properties yx,yTC(x)\forall y \in x, y \in TC(x) and yTC(x),zy,zTC(x)\forall y \in TC(x), \forall z \in y, z \in TC(x) - that is, if for any zz and xx there is a chain of the form z...xz \in ... \in x, then zTC(x)z \in TC(x). Notice that xTC(x)x \subset TC(x). This is provable inside our model using \in-induction and collection:

Lemma TC : forall x, exists TCx,
  (forall y, y ∈ x -> y ∈ TCx)
  /\ (forall y, y ∈ TCx -> forall z, z ∈ y -> z ∈ TCx).

Using the transitive closure, we can prove the existence of \in-recursion as follows:

For any formula ϕ(x,y)\phi(x,y) such that x,!yY,ϕ(x,y)\forall x, \exists ! y \in Y, \phi(x,y), and a set XX which we are recursing over, consider elements of the set:

{FYTC(X):xTC(X),ϕ({F(a):ax},F(x))}\{F \in Y^{TC(X)} : \forall x \in TC(X), \phi(\{F(a) : a \in x\},F(x))\}

Call this set F\mathbb{F}. We can prove that F\mathbb{F} contains a unique element, meaning that fgF,xTC(X),f(x)=g(x)\forall f g \in \mathbb{F}, \forall x \in TC(X), f(x) = g(x), by \in-induction over xx. In the inductive case, we need to prove that f(a)=g(a)f(a) = g(a) given ba,f(b)=g(b)\forall b \in a, f(b) = g(b). From this, it is obvious that {f(b):ba}={g(b):ba}\{f(b) : b \in a\} = \{g(b) : b \in a\}. As ϕ\phi relates this to a unique set cc, then f(a)=c=g(a)f(a) = c = g(a).

Taking f=Ff = \bigcup \mathbb{F}, by the uniqueness property above, ff will be the function we want, provided we can prove that F\mathbb{F} is inhabited. dom(f)=\mathrm{dom}(f) = \emptyset iff F\mathbb{F} is uninhabited, so proving TC(X)dom(f)TC(X) \subseteq \mathrm{dom}(f) suffices. Once again, \in-induction comes to our aid! Using the predicate vTC(X)w,v,wf\forall v \in TC(X) \to \exists w, \langle v,w \rangle \in f, the inductive case becomes:

a,(ba,bTC(X)c,b,cf)aTC(X)d,a,df\forall a, (\forall b \in a, b \in TC(X) \to \exists c, \langle b, c \rangle \in f) \to a \in TC(X) \to \exists d, \langle a, d \rangle \in f

The assumption aTC(X)a \in TC(X) proves that ba,bTC(X)\forall b \in a, b \in TC(X) by the property of the transitive closure. Therefore, we can use replacement to construct the set {f(b):ba}\{f(b) : b \in a\}, for which there exists a unique set dd such that ϕ({f(b):ba},d)\phi(\{f(b) : b \in a\},d). By the construction of ff, f(a)f(a) is exactly the set dd, and so a,df\langle a,d \rangle \in f. From this, vTC(X)w,v,wf\forall v \in TC(X) \to \exists w, \langle v,w \rangle \in f, i.e. TC(x)dom(f)TC(x) \subseteq \mathrm{dom}(f), thereby proving the existence of ff - for a given XX. Whenever we use \in-recursion over a set SS, we can just set X=SX = S, meaning we can safely ignore the added complexity of XX from now on.

Formalizing the reasoning above is somewhat out of the scope of this project, so I just add an axiom implementing the functionality of \in-recursion:

Axiom ϵ_rec : (forall x, (forall y, y ∈ x -> set) -> set) -> set -> set.
Axiom ϵ_rec_prop : forall {F x},
  ϵ_rec F x ≡ F x (fun y _ => ϵ_rec F y).

This is the last axiom I will add to the system; everything else is derived from the Coq model described in addendum I.

Inductively Defined Sets

We now have enough theory to begin constructing some interesting sets!

A set is 'inductively defined' if all members of the set can be finitely described in terms of operations on other members of that set - a classic example is the set of natural numbers:

You can see that the set of λ\lambda-calculus terms is also inductively defined - each term is either a constant or 'base case' (00 for the natural numbers, or variables for λ\lambda-terms) or a composition of terms already in the set (n+1n+1 in the example above, abstraction and application for λ\lambda-terms). We can formalise this notion of inductive definability by introducing the concept of a 'description' of an inductively defined set.

The description of a set XX is a class function Γ\Gamma such that XX is the 'least fixed point' of Γ\Gamma - that is, XX is the smallest set such that Γ(X)=X\Gamma(X) = X.

As an example, consider ΓN(X)=X+1\Gamma_\mathbb{N}(X) = X + 1, the description of the natural numbers. If 11 is the set containing the single element \bullet, then the least fixed point of ΓN\Gamma_\mathbb{N} must contain right()\mathrm{right}(\bullet); repeating this process, we see that it must also contain left(right())\mathrm{left}(\mathrm{right}(\bullet)), as well as left(left(right()))\mathrm{left}(\mathrm{left}(\mathrm{right}(\bullet))), and so on. If we define 00 as right()\mathrm{right}(\bullet) and n+1n+1 as left(n)\mathrm{left}(n), then the similarities with the inductive definition we gave above become clear.

It is often possible to gain an understanding of what the least fixed point of a given set will 'look' like by looking at its description. If we consider only 'polynomial' descriptions - descriptions in the form a+b×x+c×x×x+a + b \times x + c \times x \times x + \dots, where a,b,c,a,b,c,\dots are independent of XX - then each term of the polynomial - a×xna \times x^n - is an operation combining nn other elements of xx (and an element of aa) to construct a new element of xx. When n=0n = 0, the 'operations' are constant; these are the base cases of the inductive definition.

For example, we can write the description for the terms of the λ\lambda-calculus as Γλ(X)=vars+vars×X+X×X\Gamma_\lambda(X) = \mathrm{vars} + \mathrm{vars} \times X + X \times X, given a set of variables vars\mathrm{vars}. The constant term vars\mathrm{vars} is the base case - we can construct a λ\lambda-term from a variable. The second term states that we can construct a λ\lambda-term from another λ\lambda-term and a variable - this is the abstraction (λv.M)(\lambda v. M). Finally, with two λ\lambda-terms - X×XX \times X - we can construct (MN)(M N), which is itself a λ\lambda-term.

We can 'isolate' the base cases of any polynomial Γ\Gamma by taking Γ()\Gamma(\emptyset); any other varying term must be in the form a×na \times \emptyset^n for n>0n > 0. Since \emptyset has no members, then the Cartesian product of any set with \emptyset must also have no members. As these constant terms are always 'included' in Γ(X)\Gamma(X) for all XX, we have Γ()Γ(X)\Gamma(\emptyset) \subseteq \Gamma(X). We can use this property to algorithmically construct the least fixed point for Γ\Gamma.

The least fixed point of any Γ\Gamma should contain all base cases - Γ()\Gamma(\emptyset) - all terms constructed from the base cases - Γ(Γ())\Gamma(\Gamma(\emptyset)) - and so on. This gives us a candidate for the least fixed point Γ()Γ(Γ())\emptyset \cup \Gamma(\emptyset) \cup \Gamma(\Gamma(\emptyset)) \cup \dots, alternatively expressed as i=0Γi()\bigcup_{i=0}^\infty \Gamma^i(\emptyset). In order to prove that this actually is a fixed point of Γ\Gamma, we need to prove:

Γ(i=0Γi())=i=0Γi()\Gamma(\bigcup_{i=0}^\infty \Gamma^i(\emptyset)) = \bigcup_{i=0}^\infty \Gamma^i(\emptyset)

The two operators used in polynomial descriptions are both distributive over the binary union operator, meaning that a×(bc)=(a×b)(a×c)a \times (b \cup c) = (a \times b) \cup (a \times c) and a+(bc)=(a+b)(a+c)a + (b \cup c) = (a + b) \cup (a + c). This in turn means that polynomial equations are distributive over the binary union: Γ(ab)=Γ(a)Γ(b)\Gamma(a \cup b) = \Gamma(a) \cup \Gamma(b). We now have:

Γ(i=0Γi())=i=0Γ(Γi())=i=1Γi()\Gamma(\bigcup_{i=0}^\infty \Gamma^i(\emptyset)) = \bigcup_{i=0}^\infty \Gamma(\Gamma^i(\emptyset)) = \bigcup_{i=1}^\infty \Gamma^i(\emptyset)

The property that Γ()Γ(X)\Gamma(\emptyset) \subseteq \Gamma(X) for all XX means that Γ()Γ(X)=Γ(X)\Gamma(\emptyset) \cup \Gamma(X) = \Gamma(X); this applies to the case above as well:

Γ(i=0Γi())=Γ()i=1Γi()=i=0Γi()\Gamma(\bigcup_{i=0}^\infty \Gamma^i(\emptyset)) = \Gamma(\emptyset) \cup \bigcup_{i=1}^\infty \Gamma^i(\emptyset) = \bigcup_{i=0}^\infty \Gamma^i(\emptyset)

We can use \in-recursion to construct infinite sets like Γ()Γ(Γ())\Gamma(\emptyset) \cup \Gamma(\Gamma(\emptyset)) \cup \dots. I define a function ϵ_iter to iterate a function Γ\Gamma over a set recursively:

Definition ϵ_iter Γ := ϵ_rec (fun _ f => ⋃(replacement (fun x H => Γ(f x H)))).

Lemma ϵ_iter_prop : forall {Γ X},
  ϵ_iter Γ X = ⋃ (replacement (fun Y _ => Γ (ϵ_iter Γ Y))).

We can then define the least fixed point of any polynomial as that polynomial iterated over ω\omega. As ω\omega contains arbitrarily-nested sets, then ϵ_iter Γ ω must contain arbitrary iterations of Γ\Gamma. We can, for instance, use this to construct naturals and binary trees (trees where every non-leaf node has two children).

Definition lfp Γ := ϵ_iter Γ ω.

Definition Γ_nat X := X + (Suc ∅).
Definition zero := right ∅.
Definition suc n := left n.

Definition Γ_tree X := (X × X) + (Suc ∅).
Definition leaf := right ∅.
Definition branch l r := left ⟨l,r⟩.

Definition nats := lfp Γ_nat.
Definition trees := lfp Γ_tree.

We can then use the same methodology as before to prove that nats and trees are fixed points for their respective signatures, and use this fact to prove properties expected of them:

Lemma nat_lfp : Γ_nat nats ≡ nats.

Lemma zero_in_nats : zero ∈ nats.
Lemma suc_in_nats : forall {n}, n ∈ nats -> suc n ∈ nats.

Lemma tree_lfp : Γ_tree trees ≡ trees.

Lemma leaf_in_trees : leaf ∈ trees.
Lemma branch_in_trees : forall {l r},
  l ∈ trees -> r ∈ trees -> branch l r ∈ trees.

This approach generalises quite well and so can be used to implement the syntax of the λ\lambda-calculus.

Terms of the λ\lambda-Calculus

For an explanation of the λ\lambda-calculus see addendum II. I'll go ahead and assume you already know what it is now, so be warned!

To keep complexity to a minimum, I will create a model of the λ\lambda-calculus using de Brujin indices instead of the normal variables. Problems can arise when normalising λ\lambda-terms meaning you have to rename bound variables in order to avoid 'name conflicts'. Consider the evaluation of ((λy.(λx.y))x)((\lambda y. (\lambda x. y)) x). Naively performing a β\beta-reduction on this term yields (λx.x)(\lambda x. x), which is probably not what was intended - the free xx has suddenly become bound.

To avoid this issue, de Brujin instead used natural numbers to represent variables, and removed variables from abstractions altogether; when a natural number variable occurs in an expression, the 'value' of the variable directly refers to the abstraction to which it is bound. Specifically, a variable nn has nn other abstractions between it and the abstraction to which it is bound. For example, using de Brujin indices, the expression (λx.x)(\lambda x. x) becomes (λ.0)(\lambda. 0) and the expression (λx.(λy.xy))(\lambda x. (\lambda y. x y)) becomes (λ.(λ.10))(\lambda. (\lambda. 1 0)).

Unfortunately, de Brujin indices are not an entirely free lunch. When substituting using de Brujin indices, we need to increment free variables in the expression being substituted every time we pass a λ\lambda-abstraction, so they do not become bound mistakenly. For example, we would not like ((λ.(λ.1))0)((\lambda. (\lambda. 1)) 0) to β\beta-reduce to (λ.0)(\lambda. 0); instead we have to 'lift' the free variables in the expression 00 to reduce it to (λ.1)(\lambda. 1).

λ\lambda-terms using de Brujin indices (λdB\lambda_{dB}-terms) the following syntax:

From this we can derive ΓλdB(X)=ω+X+X×X\Gamma_{\lambda_{dB}}(X) = \omega + X + X \times X as a description for terms of the λdB\lambda_{dB}-calculus:

Definition Γ_lam X := ω + X + X × X.

Definition var n := left (left n).
Definition lam m := left (right n).
Definition app m n := right ⟨m,n⟩.

Since Γ_lam is a polynomial function, we can create the set of terms in exactly the same way as we did for nats and trees above:

Definition terms := lfp Γ_lam.

Lemma lam_lfp : Γ_lam terms = terms.
Lemma var_term : forall {n}, n ∈ ω -> var n ∈ terms.
Lemma lam_term : forall {m}, m ∈ terms -> lam m ∈ terms.
Lemma app_term : forall {m n},
  m ∈ terms -> n ∈ terms -> app m n ∈ terms.

We can use this to construct, for example, the identity function (λ.0)(\lambda. 0) and prove that it is a term:

Definition id := lam (var ∅).

Lemma id_term : id ∈ terms.

Inductively Defined Functions

Similarly to how we can use inductive definitions to define sets, we can use inductive definitions to describe and construct recursive functions. Consider the recursive function for converting from our inductive definition of the naturals to ω\omega:

f(right())=f(left(x))=Suc(f(x))\begin{gathered} f(\mathrm{right}(\bullet)) = \emptyset \\ f(\mathrm{left}(x)) = \mathrm{Suc}(f(x)) \end{gathered}

To construct a description Γf\Gamma_f for this function (such that Γf(f)=f\Gamma_f(f) = f), notice that the f(right())=f(\mathrm{right}(\bullet)) = \emptyset case is non-recursive; therefore, right(),\langle \mathrm{right}(\bullet), \emptyset \rangle should be a base case and an element of Γf()\Gamma_f(\emptyset). Then, for the recursive case, with every iteration of Γf\Gamma_f, we can 'add' a new mapping left(x),Suc(y)\langle \mathrm{left}(x), \mathrm{Suc}(y) \rangle to Γf(F)\Gamma_f(F), provided that F(x)=yF(x) = y; given the set-theoretic encoding of functions, this is the same as saying x,yF\langle x, y \rangle \in F. Iterating Γf\Gamma_f like we did for inductive definitions should then cover all cases of ff.

Unlike in a normal inductive definition, we do not want to be able to discriminate between the base and recursive cases of ff, so we should use the union instead of the disjoint union:

Γf(F)={right(),}{left(x),Suc(y):xN,yω,x,yF}f=i=0Γfi()\begin{gathered} \Gamma_f(F) = \{ \langle \mathrm{right}(\bullet), \emptyset \rangle \} \cup \{ \langle \mathrm{left}(x), \mathrm{Suc}(y) \rangle : \forall x \in \mathbb{N}, \forall y \in \omega, \langle x, y \rangle \in F \} \\ f = \bigcup_{i = 0}^\infty \Gamma_f^i(\emptyset) \end{gathered}

Essentially, given an equation for a recursive function rr, one can express the 'term' of the description corresponding to that equation by replacing each recursive application r(a)r(a) in the right-hand-side of the equation with an element xx of img(r)\mathrm{img}(r); a case may then be 'added' to the function if a,x\langle a, x \rangle is already a member of the function. We can write definitions for (both total and partial) general recursive functions using this method - a general recursive function being one that you can program an algorithm to solve. For a nontrivial example, see addendum III.

An Evaluator for the λ\lambda-Calculus

With the ability to create arbitrary recursive functions we can finally encode an evaluator for the λ\lambda-calculus within the set theory. I will be translating a minimal Haskell evaluator for λdB\lambda_{dB}, with each Haskell function corresponding to a set-theoretic one.

I wanted to keep my evaluator as simple as possible, so I opted to write a 'call-by-name weak-head-normal-form' (CBV WHNF) evaluator; 'call-by-name' means that a function's argument is not evaluated to normal form before being substituted into the function's body, while 'weak-head-normal-form' means that any abstraction (λ.b)(\lambda . b) is in normal form, even if bb is not.

First off, the Haskell implementation defines a function, lift, to handle the lifting of free variables in de Brujin-indexed λ\lambda-terms as described above. I have assigned new variables to each function call to simplify the translation process, and better display the similarities between the Haskell and IZF implementations:

data Term
    = Var Int
    | Lam Term
    | App Term Term

lift :: (Int, Term) -> Term
lift (i, Var v) | v < i = Var v
lift (i, Var v) | v >= i = Var (v+1)
lift (i, Lam b) =
    let c = lift (i+1, b) in Lam c
lift (i, App f a) =
    let g = lift (i, f)
        b = lift (i, a)
    in App g b

On ordinals (such as the members of ω\omega), the << relation corresponds to \in, while the \le relation corresponds to \subseteq. Using this, we can write descriptions for the above function, using ω\omega instead of Int:

Γlift(X)={i,v,v:viω,vi}{i,v,Suc(v):viω,iv}{i,(λ.b),(λ.c):bcterms,iω,Suc(i),b,cX}{i,(fa),(gb):fgabterms,iω,i,f,gXi,a,bX}\begin{aligned} \Gamma_\mathrm{lift}(X) = &\{\langle \langle i, v \rangle, v \rangle : \forall v i \in \omega, v \in i\} \\ \cup &\{\langle \langle i, v \rangle, \mathrm{Suc}(v) \rangle : \forall v i \in \omega, i \subseteq v\} \\ \cup &\{\langle \langle i, (\lambda . b) \rangle, (\lambda . c) \rangle: \forall b c \in \mathrm{terms}, \forall i \in \omega, \langle \langle \mathrm{Suc}(i), b \rangle, c \rangle \in X\} \\ \cup &\{\langle \langle i, (f a) \rangle, (g b) \rangle : \forall f g a b \in \mathrm{terms}, \forall i \in \omega, \langle \langle i, f \rangle, g \rangle \in X \wedge \langle \langle i, a \rangle, b \rangle \in X\} \end{aligned}

We can now use this description to construct the function itself:

lift=i=0Γlifti()\mathrm{lift} = \bigcup_{i = 0}^\infty \Gamma_\mathrm{lift}^i(\emptyset)

To express this in our Coq model of set theory, I attempt to replicate the usual set-builder notation shown above. Unfortunately, the notation {x | P} is already used in Coq, so I have to make do with a textual form; for x, y, z ∈ S, exp is simply the repeated application of the axioms of replacement on exp, introducing x, y and z as variables bound within the replacement. The expression given P, exp is a strange one; if a proof of P cannot be provided, it is the empty set. Otherwise, it is the set exp. I use this to emulate the conditional part of set-builder notation:

Notation "'for' x , .. , y ∈ X , Z" :=
  (⋃ (@replacement X (fun x _ => .. (⋃ (@replacement X (fun y _ => Z))) ..)))
    (at level 200, x closed binder, y closed binder).
Notation "'given' P , Z" := (⋃ (selection (fun _ => P) (Z+>∅))) (at level 200, P at level 200).

Definition liftΓ X :=
  (for x, y ∈ ω, given y ∈ x, ⟨⟨x,var y⟩,var y⟩+>∅)
    ∪ (for x, y ∈ ω, given x ⊆ y, ⟨⟨x,var y⟩,var (Suc y)⟩+>∅)
    ∪ (for b, c ∈ terms, for x ∈ ω,
        given ⟨⟨Suc x,b⟩,c⟩ ∈ X,
        ⟨⟨x,lam b⟩,lam c⟩+>∅)
    ∪ (for f, a, g, b ∈ terms, for x ∈ ω,
        given ⟨⟨x,f⟩,g⟩ ∈ X,
        given ⟨⟨x,a⟩,b⟩ ∈ X,
        ⟨⟨x,app f a⟩,app g b⟩+>∅).

Definition lift := lfp liftΓ.

We can use the same process to construct the evaluator's subst function:

subst :: (Int, Term, Term) -> Term
subst (i, st, Var v) | i == v = st
subst (i, st, Var v) | i /= v = Var v
subst (i, st, App f a) =
    let g = subst (i, st, f)
        b = subst (i, st, a)
    in App g b
subst (i, st, Lam b) =
    let c = subst (i+1, lift (0, st), b)
    in Lam c

When we want to 'call' an already-defined function ff from within our set theory, we simply need to check that i,of\langle i, o \rangle \in f for some idom(f)i \in \mathrm{dom}(f), and oimg(f)o \in \mathrm{img}(f), which can be seen in the lam branch of the subst function:

Definition substΓ X :=
  (for v ∈ ω, for st ∈ terms, ⟨⟨⟨v,st⟩,var v⟩,st⟩+>∅)
    ∪ (for i, v ∈ ω, for st ∈ terms, given i <> v,
        ⟨⟨⟨i,st⟩,var v⟩,var v⟩+>∅)
    ∪ (for i ∈ ω, for st, f, g, a, b ∈ terms,
        given ⟨⟨⟨i,st⟩,f⟩,g⟩ ∈ X,
        given ⟨⟨⟨i,st⟩,a⟩,b⟩ ∈ X,
        ⟨⟨⟨i,st⟩,app f a⟩,app g b⟩+>∅)
    ∪ (for i ∈ ω, for st, st', b, c ∈ terms,
        given ⟨⟨∅,st⟩,st'⟩ ∈ lift,
        given ⟨⟨⟨Suc i,st'⟩,b⟩,c⟩ ∈ X,
        ⟨⟨⟨i,st⟩,lam b⟩,lam c⟩+>∅).

Definition subst := lfp substΓ.

The CBN WHNF evaluation function has only two cases; Lam b is already in normal form, so evaluates to itself, while App f a first evaluates f to an abstraction Lam b, before substituting a in b. A free variable has no computation rules and is never evaluated. This means that the eval function has only two cases and is extremely simple:

eval :: Term -> Term
eval (Lam b) = Lam b
eval (App f a) =
    let (Lam b) = eval f
        sb = subst (0, a, b)
        vb = eval sb
    in vb

This is easily translated into IZF:

Definition evalΓ X :=
  (for b ∈ terms, ⟨lam b,lam b⟩+>∅)
    ∪ (for f, a, b, sb, vb ∈ terms,
        given ⟨f,lam b⟩ ∈ X,
        given ⟨⟨⟨∅,a⟩,b⟩,sb⟩ ∈ subst,
        given ⟨sb,vb⟩ ∈ X,
        ⟨app f a,vb⟩+>∅).

Definition eval := lfp evalΓ.

Now that we have an eval function encoded within our set theory, we can finally try it out! Perhaps the simplest evaluation that this system can express is ((λ.0)(λ.0))((\lambda. 0) (\lambda. 0)), the identity function applied to itself, which simply evaluates to (λ.0)(\lambda. 0). To prove that ((λ.0)(λ.0)),(λ.0)eval\langle ((\lambda. 0) (\lambda. 0)), (\lambda. 0) \rangle \in \mathrm{eval} (i.e. that eval((λ.0)(λ.0))=(λ.0)\mathrm{eval}((\lambda. 0) (\lambda. 0)) = (\lambda. 0)) we first need to prove that eval(λ.0)=(λ.0)\mathrm{eval}(\lambda. 0) = (\lambda. 0). This comes from the first part of our definition of Γeval\Gamma_\mathrm{eval}:

Lemma eval_lam : forall {b}, ⟨lam b,lam b⟩ ∈ eval.
Lemma eval_id : ⟨id,id⟩ ∈ eval.

We can also prove that (λ.0)(\lambda. 0) substituted into the variable 00 (which is the body of id\mathrm{id}) is just (λ.0)(\lambda. 0), and that (λ.0)(\lambda. 0) is constant under the lift operation:

Lemma id_lift : ⟨⟨∅,id⟩,id⟩ ∈ lift.
Lemma id_subst : ⟨⟨⟨∅,id⟩,id⟩,id⟩ ∈ subst.

Using these intermediate proofs, we can prove that ((λ.0)(λ.0))((\lambda. 0) (\lambda. 0)) evaluates to (λ.0)(\lambda. 0) in our model!

Lemma eval_id_id : ⟨app id id,id⟩ ∈ eval.

Verifying the Evaluator

We can prove that our evaluator is correct by proving that it is equivalent to a 'trusted' Coq implementation of the λ\lambda-calculus. First, then, we need to create such an implementation. We'll start with λ\lambda-terms using de Brujin indicies:

Inductive lambda_term : Set :=
| Lam : lambda_term -> lambda_term
| Var : nat -> lambda_term
| App : lambda_term -> lambda_term -> lambda_term.

We can then define the lifting function using an inductively defined relation:

Inductive lift_lambda_term (n: nat) :
  lambda_term -> lambda_term -> Prop :=
| lift_Var_lt (v: nat) :
  v < n -> lift_lambda_term n (Var v) (Var v)
| lift_Var_ge (v: nat) :
  v >= n -> lift_lambda_term n (Var v) (Var (v + 1))
| lift_Lam (a b: lambda_term) :
  lift_lambda_term (n + 1) a b -> lift_lambda_term n (Lam a) (Lam b)
| lift_App (f g x y: lambda_term) :
  lift_lambda_term n f g
  -> lift_lambda_term n x y
  -> lift_lambda_term (App f x) (App g y).

Substitution and evaluation can then be implemented similarly:

Inductive subst_lambda_term (x: nat) (m: lambda_term) :
  lambda_term -> lambda_term -> Prop :=
| subst_Var_eq :
  subst_lambda_term x m (Var x) m
| subst_Var_neq (y: nat) :
  x <> y -> subst_lambda_term x m (Var y) (Var y)
| subst_Lam (o p: lambda_term) :
  lift_lambda_term 0 m o
  -> subst_lambda_term (x + 1) o p q
  -> subst_lambda_term x (Lam p) (Lam q)
| subst_App (n o p q: lambda_term) :
  subst_lambda_term x m n o
  -> subst_lambda_term x m p q
  -> subst_lambda_term x m (App n p) (App o q).

Inductive eval_lambda_term :
  lambda_term -> lambda_term -> Prop :=
| eval_Lam (a: lambda_term) :
  eval_lambda_term (Lam a) (Lam a)
| eval_App (f a b sb vb: lambda_term) :
  eval_lambda_term f (Lam b)
  -> subst_lambda_term 0 b a sb
  -> eval_lambda_term sb vb
  -> eval_lambda_term (App f a) vb.

In order to prove some sort of equivalence between the these different models of the λ\lambda-calculus, we need to be able to convert between lambda_term and set. However, not every set is a valid λ\lambda-term, so we make do with a one-way conversion:

Fixpoint nat_to_set (n: nat) : set :=
match n with
| 0 => zero
| S x => suc (nat_to_set x)
end.

Fixpoint lambda_term_to_set (l: lambda_term) : set :=
match l with
| Lam b => lam (lambda_term_to_set b)
| App f a => app (lambda_term_to_set f) (lambda_term_to_set a)
| Var x => var (nat_to_set x)
end.

The two models are equivalent if any λ\lambda-term aa evaluates to bb in the type-theoretic model if and only if aa evaluates to bb in the set-theoretic model. The 'only if' direction is the easiest to prove:

Lemma all_evaluations_hold :
  forall (a b: lambda_term),
    eval_lambda_term a b
    -> ⟨lambda_term_to_set a, lambda_term_to_set b⟩ ∈ eval.

When writing the 'if' direction, however, we run into a problem: the irreversable nature of the lambda_term_to_set function means we cannot simply flip the arrows of the above proposition. Doing so would mean that the set-theoretic model could still include evaluations that do not correspond to any λ\lambda-term. Instead we prove the following proposition:

Lemma only_evaluations_hold :
  forall (a b : set),
    ⟨a, b⟩ ∈ eval
    -> exists (x y: lambda_term),
      eval_lambda_term x y
      /\ lambda_term_to_set x = a
      /\ lambda_term_to_set y /\ b.

And with that we have proved the set-theoretic model of the λ\lambda-calculus correct!

It would be nice to prove \in-recursion correct within the Coq model of IZF, but that would require some considerable proof automation on my part; even the proofs for simple evaluations were quite long! In any case, this was fun to mess around with, and served as a nice introduction to constructive set theory for me, which I hadn't encountered previously.