Mikrokosmos
Programming on the λ-calculus


This is a tutorial on how to program in the untyped and simply-typed λ-calculus using the Mikrokosmos interpreter. It is divided into seven chapters of progressive difficulty.

  1. The λ-calculus: describes the language.
  2. Boolean logic: implements logical gates.
  3. Numbers: implements arithmetical operators and comparisons.
  4. Combinatory logic: describes the SKI combinators.
  5. Data structures: implements lists and binary trees.
  6. Recursion: describes fixpoint operators and recursion.
  7. Types: describes the simply-typed λ-calculus and its logical interpretation.

1 The λ-calculus

1.1 The untyped λ-calculus

The untyped λ-calculus (sometimes simply called the λ-calculus) is a formal system that is also a model of computation (in the same sense that Turing machines are, for instance). In fact, it is Turing-complete, that is, it has the property that every computable function, every function you could write in any programming language, can be written in the untyped λ-calculus. It was introduced in the 1930s by Alonzo Church, who was looking for a formal system able to express the notion of computability.

Its syntax, however, is very simple; any expression is either:

  • a variable, \(x,y,z,\dots\) which may appear free or as a placeholder when defining a function;
  • a lambda abstraction \(\lambda x. -\), which can be interpreted as a function taking \(x\) as an argument and returning what is written under \(-\), which in turn can depend on \(x\);
  • or an application of two expressions \((\lambda x. M)\ N\) which applies the function \((\lambda x. M)\) to the argument \(N\). When a function is applied, all the occurrences of \(x\) in \(M\) will be substituted by \(N\). This process is called a β-reduction, and it captures our usual notion of function application.

That is, in λ-calculus any function \(f\) can be written as \(\lambda x.f(x)\), and it can be applied to an argument and β-reduced as

\[ (\lambda x.f(x))\ y \longrightarrow f(y). \]

For example, the function that duplicates an element could be written as \((\lambda x.\ 2 \cdot x)\), and it would be applied to \(3\) as

\[ (\lambda x.\ 2 \cdot x)\ 3 \longrightarrow 2 \cdot 3 \longrightarrow 6. \]

In this chapter, however, we will restrict ourselves to very basic functions which do not involve arithmetic. How to define natural numbers, addition or multiplication will be topics for the next chapters.

In this tutorial we will use a didactic λ-calculus interpreter, Mikrokosmos, which can be used directly from the browser. Every time you find a Mikrokosmos code snippet like the ones below, you can execute it by pressing evaluate.

1.2 A first example

In Mikrokosmos, it is possible to use the backslash character \ as a λ to make the input easier. If you happen to have a Greek keyboard configured, however, feel free to write λ! Mikrokosmos accepts both syntaxes. For instance, the nested expression \(\lambda x.\lambda y.x\) can be written as \x.\y.x.

Our first example defines the identity function \(\lambda x. x\); which is a function that takes an argument \(x\) and returns it unchanged. Our first statement just after this definition applies this function to itself, and the result is again the identity. Keep in mind during this tutorial that it is perfectly possible to apply functions to other functions or to themselves; in fact, this is one of the core ideas of λ-calculus.
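
Written as a Mikrokosmos snippet, this first example reads:

    id = \x.x
    id id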

In its output, Mikrokosmos shows the evaluated term and possible names for it. We will learn what those names mean later. In the previous example, the evaluation of id id to id has followed this chain of β-reductions and equivalences

\[ id\ id = (\lambda x.x)(\lambda y.y) \longrightarrow (\lambda y.y) = id. \]

Note that the specific name of a variable is irrelevant. They act only as placeholders. A perfectly valid definition of id would be id = \a.a or id = \b.b.

1.3 Some more examples

Function application associates to the left, and parentheses can be omitted when they are not necessary; that is, f g h must be read as (f(g))(h) instead of f(g(h)). This goes against the usual mathematical convention, which always writes parentheses around function arguments, that is, we write \[ f\ x,\quad\text{ instead of }\quad f(x); \]

and \[ f\ x\ y,\quad\text{ instead of }\quad (f(x))(y),\ \text{ or }\ f(x,y); \]

but this makes it less verbose to write multiple argument functions such as the constant function, which takes two arguments and returns the first one.
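
In Mikrokosmos, the constant function reads as follows; applied to two arguments, it returns the first one:

    const = \x.\y.x
    const id const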

Note that, in λ-calculus, a "function taking two arguments" and "a function that takes an argument and returns a function taking the second" are conceptually the same. This equivalence is called currying in honor of Haskell Curry, the logician who developed this idea, previously discovered by Schönfinkel and Frege.

This is why we can see every function with two arguments as a function taking only one argument and returning again a function taking the second one. For example, the discard = const id function is a function that takes only one argument, discards it, and outputs id. This is called a partial application of a function.
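
For instance:

    discard = const id
    discard const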

A more useful example of a function taking functions as arguments is function composition, which takes two functions and returns a new one created by applying the two sequentially. This corresponds to the usual mathematical function composition \(f \circ g\).
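
A standard definition would be:

    compose = \f.\g.\x.f (g x)
    compose id id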

1.4 Exercises

Exercise 1.1: Think about what the result of evaluating the following expressions should be, and then check it with the interpreter. Beware that the last one is a bit more difficult than the other two.

  • compose const id
  • compose id const
  • compose const const

Exercise 1.2: Write the lambda terms for

  • function application, a function taking a function \(f\) and a term \(x\) as arguments and returning \(f(x)\); and
  • flip, a function that takes a function on two arguments \(f\) and returns a \(g\) such that \(g(x,y) = g\ x\ y = f\ y\ x = f(y,x)\). In other words, \(\mathsf{flip}\ f\ x\ y = f\ y\ x\).

And test them on the interpreter.

2 Boolean logic

2.1 Booleans

We will start our programming journey in λ-calculus by encoding boolean logic. In boolean logic, we have two elements, true and false, and some logical gates such as

  • the and gate, which outputs true if and only if its two inputs are true.
  • the or gate, which outputs true if any of its inputs is true, or even if both are true.
  • the not gate, which outputs true if and only if its input is false.

For example,

\[ \mathsf{not}\ (\mathsf{or}\ \mathsf{true}\ \mathsf{false}) = \mathsf{not}\ \mathsf{true} = \mathsf{false}. \]

Our intuition on what it means to be a truth value is that truth values can distinguish between two values (true or false) or two branches of a program (if ... then ... else ...). We are going to use this intuition to write an encoding of boolean values based on their ability to choose between two branches. Maybe surprisingly, this encoding will also be useful for writing the usual boolean logic gates.

Here, a truth value is a function on two elements that chooses one of them.

  • \(\mathsf{true}\ a\ b = a\)
  • \(\mathsf{false}\ a\ b = b\)

This is called the Church encoding of the booleans, as it was first described by Alonzo Church. This idea of defining a data structure based not on its content but on how it can be used will appear again later, when dealing with more complex structures. The following are examples of how true and false work.
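
In Mikrokosmos:

    true  = \a.\b.a
    false = \a.\b.b
    true id const
    false id const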

In particular, true is exactly the same lambda term as const. This is a very common phenomenon in λ-calculus, where the same function can be interpreted differently in different contexts. Mikrokosmos allows giving multiple names to the same term.

2.2 If-else

The advantage of this way of encoding the boolean values is that they can be easily used in combination with other λ-terms. In particular, the way to encode an if-else is almost trivial: it is already encoded in the λ-terms!

If we really want to write an if-else function, it will be, quite literally, the trivial one.
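
A sketch of such an ifelse:

    ifelse = \b.b
    ifelse true id const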

2.3 Logic gates

Usual operations on booleans can also be defined in this encoding, and they will be surprisingly easy to construct if we think of booleans as functions choosing from two terms.
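
One standard definition of the and gate is:

    and = \p.\q.p q p
    and true false
    and true true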

Exercise 2.1: Explain why this definition of the and gate works. Hint: think what happens when the first argument is true. What happens if it is false?

The or gate can be defined in a similar way.
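
For instance:

    or = \p.\q.p p q
    or false true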

And finally, the negation operator is just a way of interchanging the two truth values.
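
One possible definition:

    not = \b.b false true
    not true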

2.4 Exercises and more logical gates

The boolean implication operator also works as a boolean gate; it can be defined as

\[(a \to b) \equiv (\neg a) \vee b,\]

that is, the implication is true whenever the premise is false or the conclusion is true.
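
Following that equation (implies is only a suggested name):

    implies = \a.\b.or (not a) b
    implies true false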

Exercise 2.2: Compute the complete operation table for the implication using the previous definition.

Exercise 2.3: Compute the following boolean clauses using lambda calculus

  • \((\mathsf{true} \vee \mathsf{false})\to \mathsf{false}\).
  • \(\mathsf{false} \to (\mathsf{false} \to \mathsf{false})\)
  • \((\neg \mathsf{false} \wedge \neg \mathsf{true}) \to \mathsf{true}\)

Exercise 2.4: Define the xor gate as a lambda term. The xor of two boolean values must return true if and only if exactly one of them is true. Check its truth table too. Hint: you may want to use the already defined `not`.

3 Numbers

3.1 Peano and the natural numbers

In the 19th century, Giuseppe Peano gave a definition of the natural numbers and an axiomatic theory for them based on just two constructors.

  • The zero is a natural number, written as Z.
  • The successor of a natural number is a natural number, written as S.

In these terms, the usual natural numbers will be

\[ Z,\ S\ Z,\ S\ (S\ Z),\ S\ (S\ (S\ Z)),\ \dots \]

The question now is how we can encode them in λ-calculus. We cannot simply write the two constructors in λ-calculus, but we can make the natural numbers depend on any two given constructors. This is again the same idea we used when we worked with boolean values: we do not care about the structure itself, but about how it can be used later.

Under this interpretation, a number \(n\) is really a function taking a function \(s\) as an argument and applying it \(n\) times over a second argument \(z\). Interpret the following two evaluations.
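
As a sketch, with illustrative names for zero and the successor, and reusing not and true from the previous chapter:

    zero  = \s.\z.z
    succ  = \n.\s.\z.s (n s z)
    two   = succ (succ zero)
    three = succ two
    two not true
    three not true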

Exercise 3.1: Define a function that takes a natural number and returns true if and only if the number is even. Hint: you may want to think of the given number as a function.

3.2 Addition and multiplication

The encoding of the addition and multiplication of natural numbers will profit from the interpretation of numbers as functions. We are actually encoding naturals as their induction principle: we can define a function by defining how it acts on zero and a successor.

The double function simply replaces the successor with the composition of the successor function with itself.
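
A sketch, reusing compose from the first chapter:

    double = \n.\s.\z.n (compose s s) z
    double two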

Exercise 3.2: Define a triple function.

We are now going to define addition using this same principle. It takes a successor and a zero, computes the first number as n s z, and then uses the result as the zero in the interpretation of the second one.
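
A definition along these lines:

    plus = \n.\m.\s.\z.m s (n s z)
    plus two three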

Exercise 3.3: How would you define multiplication? Keep in mind that you can use a number as a function. Keep also in mind the previous exercises on double and triple. Spoilers below! You may want to try this exercise before continuing.

There are many possible ways of defining multiplication. Some of them use the repeated application of plus to a number, but we are going to define multiplication in a way similar to how we defined double previously: we are going to interpret the successor as the n-fold application of the successor.
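
One such definition:

    mult = \n.\m.\s.\z.n (m s) z
    mult two three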

3.3 The predecessor function

How can we compute the predecessor of a number? We have not encoded negative numbers, so it can be a function returning zero when asked for the predecessor of zero. It is an insightful exercise to try to define it by yourself, but please, do not get too obsessed with it. The solution is certainly not easy.

The solution is the term that follows this paragraph. You probably expected something easier! Stephen Kleene, who was a student of Alonzo Church, first discovered how to write a predecessor in lambda calculus while at the dentist. This discovery made Church start thinking that every intuitively computable function could be computed using lambda calculus, that is, that the notions of lambda-computable function and intuitively computable function coincide.
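
Written with the same variable names used in Exercise 3.5 below, the standard term is:

    pred = \n.\f.\x.n (\g.\h.h (g f)) (\u.x) (\u.u)
    pred three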

But why does something like this even work? The main idea is to create a function that can be applied to a pair of numbers \(n\) times, in such a way that the first application yields the pair \((1,0)\) and any subsequent application acts as a successor on both numbers. We finally take only the second component. A detailed derivation of this function can be found here; we will also study it in Exercise 3.5.

Exercise 3.4: Use the predecessor function to define the minus function. It should return the difference between two numbers. It should return zero whenever the first number is smaller than the second.

Exercise 3.5: Let's look again at the definition of pred. The core of the definition is the subexpression \g.(\h.h (g f)), which we will call inc f. It is a sort of reversed composition. Study what happens when we compose inc f with itself and reimplement pred using inc.

3.4 Predicates on natural numbers

This encoding even allows us to write predicates on natural numbers. The first predicate will be a function distinguishing a non-zero number from a zero. It will be used later to build more complex ones.

It is built by applying the function const false n times to the constant true. Only if it is applied zero times will it return true.
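
Calling it iszero (the name is only a suggestion):

    iszero = \n.n (const false) true
    iszero zero
    iszero two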

Using this predicate, we can build eq and leq, corresponding to \(==\) and \(\leq\).
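
Assuming a minus function as in Exercise 3.4, a sketch of both predicates is:

    leq = \m.\n.iszero (minus m n)
    eq  = \m.\n.and (leq m n) (leq n m)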

4 Combinatory logic

4.1 SKI combinators

Combinatory logic provides a notation for λ-terms that is independent of bound variables. Every λ-expression can be written in terms of three combinators, \(S,K,I\), which are defined as

  • \(I = \lambda x.x\), the identity function;
  • \(K = \lambda x.\lambda y.x\), the constant function;
  • \(S = \lambda x.\lambda y.\lambda z. x z (y z)\), a generalized application.

The first one, the identity, can be also written as a function of \(S\) and \(K\).
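
Using lowercase names for the combinators, we can check this directly:

    s = \x.\y.\z.x z (y z)
    k = \x.\y.x
    s k k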

The interesting property of this particular set of combinators is that every other lambda expression can be written in terms of them. We can see how a particular lambda expression is written in SKI calculus by turning on the ski mode of the interpreter.
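
Assuming the mode is toggled with the :ski command, this looks like:

    :ski on
    \x.\y.x
    \x.x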

Exercise 4.1: How are Church-encoded numerals represented with SKI combinators? Compute the first four or five numbers and try to come up with the general rule.

4.2 Schönfinkel combinators

Schönfinkel and Curry defined different but equivalent sets of combinators. Some of the other classical combinators are

  • the interchange of arguments, C;
  • function composition, B;
  • and duplication of an argument, W;

and they are defined as follows.
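
Again with lowercase names:

    c = \f.\x.\y.f y x
    b = \f.\g.\x.f (g x)
    w = \f.\x.f x x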

The Y combinator is related to recursion, and we will discuss it in a later chapter.

5 Data structures

5.1 Tuples

Tuples can be easily defined from boolean logic. The main idea is that applying a tuple to a function results in applying the function to the tuple's two components.

\[ \mathtt{tuple}(a,b)(f) \equiv f\ a\ b \]

With this idea, tuples and their two projections are defined as follows.
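
Using first and second as illustrative names for the projections:

    tuple  = \a.\b.\f.f a b
    first  = \p.p true
    second = \p.p false
    first (tuple id const)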

As seen in the example, we use true and false to select the first or the second argument to the function. It is possible to use the same idea to apply other functions.

5.2 Lists I: nil and cons

Data structures such as lists or binary trees can be represented using the same principle we used to build naturals and booleans. We need two constructors to represent a list: a nil, signaling the end of the list, and a cons, attaching an element to the front of a list. A list would then be something like this

\[ \mathtt{cons}\ 1\ (\mathtt{cons}\ 2\ (\mathtt{cons}\ 3\ \mathtt{nil})).\]

As we did with natural numbers, we are going to write a representation that is independent of the constructors; they are going to be passed as arguments. We need

  • nil, the empty list;
  • cons, a function taking an element (head) and a list (tail) and returning a new list.
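
A standard sketch of this encoding in Mikrokosmos:

    nil  = \c.\n.n
    cons = \h.\t.\c.\n.c h (t c n)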

This interpretation makes it easier to write folding functions for lists. We can define a function on a list by simply giving an interpretation for the nil and a binary function as an interpretation for the cons.

It is useful to encode this principle into a function called fold. We are going to define a summation \(\Sigma\) function and a list product \(\Pi\) function on lists.
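
A sketch, reusing plus, mult and the numerals from the previous chapter (sum and prod are illustrative names):

    fold = \c.\n.\l.l c n
    sum  = fold plus zero
    prod = fold mult (succ zero)
    sum (cons two (cons three nil))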

Exercise 5.1: Write the any and all functions. They are functions that can be applied over lists of booleans.

  • all returns true if the list is made up only of trues.
  • any returns true if there is at least one true on the list.

You may want to use the fold function.

5.3 Lists II: map and filter

Map, filter and fold are the most famous examples of higher-order functions on lists and a common example of the power of functional programming, which has its roots in lambda calculus.

  • The map function applies a function f to every element on a list.
  • The filter function removes the elements of the list that do not satisfy a given predicate. It "filters" the list, leaving only elements that satisfy the predicate.

We are going to implement these functions using our previously defined fold.
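
For map, a sketch would be:

    map = \f.fold (\h.\t.cons (f h) t) nil
    map not (cons true (cons false nil))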

Exercise 5.3: Write functions

  • doubling the value of each number on a list.
  • negating each value of a list of booleans.

Hint: you don't need to use lambda abstractions in these definitions.

Filter can be defined by using a boolean at each step to decide whether to return a list keeping the head or to return the tail, ignoring the head.
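
A sketch following that idea, together with a small example:

    filter = \p.fold (\h.\t.(p h) (cons h t) t) nil
    filter iszero (cons zero (cons two nil))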

Exercise 5.4: Write a function that, given any list, returns a list containing only the even numbers on the list.

5.4 Binary trees

Lists have been defined using two constructors, and trees will be defined using the same technique. The only difference with lists is that the cons constructor is replaced by a node constructor, which takes a label and two trees as arguments. That is, a binary tree is

  • an empty tree.
  • a node, containing a label, a left subtree, and a right subtree.

Defining functions using a fold-like combinator is again very simple thanks to the chosen representation. The function interpreting node will now act on three arguments: the label, the left subtree and the right subtree.
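
A sketch of this encoding, with illustrative names, and a small example adding up the labels of a tree:

    emptytree = \f.\e.e
    node      = \x.\l.\r.\f.\e.f x (l f e) (r f e)
    node two emptytree (node three emptytree emptytree) (\x.\l.\r.plus x (plus l r)) zero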

6 Recursion

6.1 Fixpoints

A fixpoint combinator is a higher-order function that, given a function \(f\), solves the equation

\[ x = f\ x \]

for \(x\), meaning that, if \(\mathtt{fix}\ f\) is the fixpoint of \(f\), the following sequence of equations holds

\[ \mathtt{fix}\ f = f (\mathtt{fix}\ f) = f ( f (\mathtt{fix}\ f)) = f ( f ( f (\mathtt{fix}\ f))) = \dots \]

The first surprising fact is that such a combinator actually exists. In the following example, we define the fix combinator, which we will use later to define recursive functions. The problem is that it cannot be evaluated into a closed form without arguments, so we have to delay the evaluation of the expression when we bind it. To do this, we use the Mikrokosmos != operator, which binds an expression to a variable without simplifying it first.
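
One way to write it, using the standard shape of the Y combinator and the != binding:

    fix != (\f.(\x.f (x x)) (\x.f (x x)))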

This `fix` operator (more commonly called the Y combinator) allows us to use the function we are defining inside its own definition. The function being defined is passed as the first argument to the argument of fix, as in f = fix (\f. ...). It is important to notice that recursive functions, even if they work, cannot be evaluated on their own without entering an infinite chain of β-reductions.

In Mikrokosmos, we need the != operator when defining recursive functions to prevent them from expanding.

Our first example is the factorial function.
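
A sketch, reusing iszero, mult, pred and the numerals defined before:

    fact != fix (\f.\n.iszero n (succ zero) (mult n (f (pred n))))
    fact three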

The cost of computing a factorial grows very quickly, and the lambda calculus (and, particularly, this encoding of natural numbers) was not designed to be efficient. fact 6 will surely be too much for the interpreter.

As a last example, we are going to define Fibonacci numbers.
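
One possible definition, with both fib 0 and fib 1 equal to one:

    fib != fix (\f.\n.iszero n (succ zero) (iszero (pred n) (succ zero) (plus (f (pred n)) (f (pred (pred n))))))
    fib three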

6.2 Evaluation

The order in which evaluation is performed is crucial in determining whether an expression will eventually terminate. Mikrokosmos evaluates every expression from left to right, that is, the arguments of a function are not evaluated until they are actually used inside the function. This is not the most efficient strategy: if the same argument appears twice in the body of the function, it will be evaluated twice! But it prevents some expressions from entering an infinite loop.

For instance, fix is a non-terminating term (it diverges); but if it sits in the branch of an ifelse that is not chosen, it will not be evaluated at all.
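
As a small sketch of this behaviour (any diverging term would do):

    ifelse true id fix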

If false 2 fix is evaluated, however, Mikrokosmos will enter an infinite loop. Normally, your browser will allow you to stop the JavaScript execution and reload the page.

Other examples of Mikrokosmos dealing with non-terminating terms include infinite lists, as in the following examples, where we take the first element of an infinite list without having to evaluate it completely, or compare a finite number with the infinite number arising as the fixpoint of the successor function.

7 Types

7.1 Simply typed λ-calculus

Until now, we have been talking about untyped lambda calculus, but we are now going to deal with the simply-typed lambda calculus. The main differences are that

  • every term has a type;
  • only a subset of the lambda expressions can be written in simply-typed lambda calculus, the typable ones;
  • every term normalizes, that is, every computation finishes;
  • as a consequence, it is not Turing-complete.

The command :types on activates types. Types are displayed with every lambda expression, but certain lambda expressions which cannot be typed cannot be used anymore. The fix operator is an example.
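
For instance, after turning types on, every result is displayed together with its inferred type:

    :types on
    \x.x
    \x.\y.x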

A type is written using type variables and arrows, where A -> B represents the type of a function from A to B. Currying also works with types, and a multi-argument function is written as A -> B -> C. The interpreter will always try to infer the most general type; that is, it is preferable to have A -> B rather than the particular case A -> C -> D in which B happens to be C -> D.

Exercise 7.1: The and function from the library is not typeable. Think about why, and write a similar and function that is typeable.

7.2 Propositions as types

What types are inhabited? It is easy to find an expression of the type A -> A, but it seems that there is no expression of type A -> B. We can reason that any expression of that type should be able to transform any given input type into any desired output type, and such an expression should not be possible.

The rules of lambda calculus mirror the rules of intuitionistic propositional logic; this means that a type will be inhabited if and only if the type, reading arrows as logical implications, is a theorem of intuitionistic propositional logic. That is, the axioms of intuitionistic propositional logic are

  • every proposition implies itself, A -> A.
  • we can discard any assumption to arrive at a conclusion A -> B -> A.
  • an assumption can be used multiple times to arrive at intermediate conclusions, (A -> B -> C) -> (A -> B) -> A -> C.

And these are precisely the types of the SKI combinators. As any lambda expression can be written in terms of these combinators, every lambda expression of a type is actually a proof of the proposition the type represents.

We can define some logical connectives using only the implication. For example, the negation of a proposition \(A\) would be a function taking \(A\) and returning an element of any given type. As we discussed earlier, this should be impossible, so the existence of a function T -> B, where B is a free type variable, is evidence that the type T is not inhabited.

For example, we can write a proof of the modus ponens by presenting an inhabitant of the type \(A \to (A \to B) \to B\), where A and B are free type variables.
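
One such inhabitant (the name is only illustrative):

    mp = \a.\f.f a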

7.3 Products, unions and logic

Mikrokosmos supports product, union, unit and void types.

  • The product type A × B is the type of the pairs of elements where the first one is of type A and the second one is of type B.
  • The union type A + B is a type that can contain an element of type A or an element of type B. It is a labeled disjoint union of the two types.
  • The unit type ⊤ has only one element, called unit.
  • The void type ⊥ has no elements. Its utility will be discussed later.

They can be used with the following typed constructors

Constructor   Type                                  Description
(-,-)         A → B → A × B                         Pairs
fst           (A × B) → A                           First projection
snd           (A × B) → B                           Second projection
inl           A → A + B                             First inclusion
inr           B → A + B                             Second inclusion
caseof        (A + B) → (A → C) → (B → C) → C       Case analysis
unit          ⊤                                     Unit element
abort         ⊥ → A                                 Empty function
absurd        ⊥ → ⊥                                 Empty identity

The following are examples of the use of typed constructors.
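
For instance, a few small sketches:

    fst (unit, id)
    snd (unit, id)
    caseof (inl unit) (\x.x) (\y.y)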

As an example, we could use the type ⊤ + ⊤ to write a new version of the booleans. This new version will use caseof as its ifelse.
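
A sketch of this new encoding, with illustrative names:

    btrue   = inl unit
    bfalse  = inr unit
    bifelse = \b.\t.\e.caseof b (\x.t) (\y.e)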

7.4 Conjunctions, disjunctions and proofs

These types complete the correspondence between intuitionistic logic and lambda calculus. A type is inhabited if and only if its proposition is provable.

Description           Type    Proposition   Description
Product type          A × B   A ∧ B         Logical conjunction
Disjoint union type   A + B   A ∨ B         Logical disjunction
Unit type             ⊤       ⊤             True proposition
Empty type            ⊥       ⊥             False proposition
Function to empty     A → ⊥   ¬A            Logical negation

Following this correspondence, we are now able to write more complex proofs on intuitionistic logic. For example, the following is a proof of the trivial fact that

\[ A \wedge B \to B \wedge A; \]

and, at the same time, it is a function swapping the two terms of a pair.
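
In Mikrokosmos, such a term is:

    swap = \p.(snd p, fst p)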

We had defined the negation of a type as a function from that type to the empty type. This makes sense: only if the type is not inhabited is it possible to write such a function. The abort function is useful if we want to use the principle of explosion, that is, from falsity, everything follows. As an example, we will prove

\[ A \wedge \neg A \to B \]

and one of the De Morgan laws

\[ \neg (A \vee B) \to \neg A \wedge \neg B \]

as the following terms. Note the use of abort and absurd!
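
Two such terms could be (the names are only illustrative):

    explosion = \p.abort (snd p (fst p))
    demorgan  = \f.(\a.absurd (f (inl a)), \b.absurd (f (inr b)))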

The characteristic difference between classical and intuitionistic logic is that the law of excluded middle, \(A \vee \neg A\), and double negation elimination, \(\neg \neg A \to A\), are not provable in intuitionistic logic. It is not possible to find an expression of type A + (A → ⊥), which would correspond to \(A \vee \neg A\).

It is possible, however, to prove \(\neg \neg (A \vee \neg A)\).
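
For instance (again with an illustrative name):

    nnlem = \f.f (inr (\a.f (inl a)))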

Now that we are writing proofs, it is a good time to introduce a new feature of the interpreter. Mikrokosmos can show you the derivation tree of any of your proofs with @@. For example, we will prove that

\[ A \vee B \to B \vee A \]

as follows.
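
One term of this type (the name is only illustrative); its derivation tree can then be requested with @@:

    orswap = \c.caseof c inr inl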

7.5 Curry-Howard

The Curry-Howard isomorphism states not only the correspondence between types and propositions and between terms and proofs, but also a correspondence between term evaluation and proof simplification. That is, if a term corresponds to a proof, the fully evaluated term corresponds to a fully simplified proof.

Every simply typed λ-calculus term can be understood as a derivation tree of the proposition it proves. Mikrokosmos allows the user to check the derivation tree of a given λ-term using @, and it can also simplify the derivation using @@, as we did before. We will use our previously defined function/proof swap as an example.

In the first case, the function is not fully simplified, and a more complex derivation tree arises from it. The simplification of the term corresponds directly to a simplification of the derivation tree.

Author: Mario Román (github)

Created: 2018-07-27 Fri 17:55
