# Mikrokosmos

Programming on the λ-calculus

## Table of Contents

This is a tutorial on how to program in the untyped and simply-typed
λ-calculus using the **Mikrokosmos** interpreter. It is divided
into seven chapters of progressive difficulty.

- The λ-calculus: describes the language.
- Boolean logic: implements logical gates.
- Numbers: implements arithmetical operators and comparisons.
- Combinatory logic: describes the SKI combinators.
- Data structures: implements lists and binary trees.
- Recursion: describes fixpoint operators and recursion.
- Types: describes the simply-typed λ-calculus and its logical interpretation.

## 1 The λ-calculus

### 1.1 The untyped λ-calculus

The **untyped λ-calculus** (sometimes called **λ-calculus**) is a formal
system that is also a model of computation (in the same sense that
Turing machines are, for instance). In fact, it is Turing-complete,
that is, it has the property that every computable function, every
function you could write in any programming language, could be written
on the untyped λ-calculus. It was introduced in the 1930s by Alonzo
Church, who was looking for a formal system able to express the notion
of computability.

Its syntax, however, is very simple; any expression is either:

- a **variable**, \(x,y,z,\dots\), which may appear free or as a placeholder when defining a function;
- a **lambda abstraction**, \(\lambda x. -\), which can be interpreted as a function taking \(x\) as an argument and returning what is written under \(-\), which in turn can depend on \(x\);
- or an **application** of two expressions, \((\lambda x. M)\ N\), which applies the function \((\lambda x. M)\) to the argument \(N\). When a function is applied, all the occurrences of \(x\) in \(M\) will be substituted by \(N\). This process is called a **β-reduction**, and it captures our usual notion of function application.

That is, in λ-calculus any function \(f\) can be written as \(\lambda x.f(x)\), and it can be applied to an argument and β-reduced as

\[ (\lambda x.f(x))\ y \longrightarrow f(y). \]

For example, the function that duplicates an element could be written as \((\lambda x.\ 2 \cdot x)\), and it would be applied to \(3\) as

\[ (\lambda x.\ 2 \cdot x)\ 3 \longrightarrow 2 \cdot 3 \longrightarrow 6. \]

In this chapter, however, we will restrict ourselves to very basic functions which do not involve arithmetic. How to define natural numbers, addition or multiplication will be topics for the next chapters.

In this tutorial we will use a didactic λ-calculus interpreter,
Mikrokosmos, which can be used directly from the browser. Every time
you find a Mikrokosmos code snippet, you can
execute it by pressing **evaluate**.

### 1.2 A first example

In Mikrokosmos, it is possible to use the backslash character `\`
as a λ to make the input easier. If you happen to have a Greek keyboard
configured, however, feel free to write `λ`! Mikrokosmos accepts both
syntaxes. For instance, the nested expression \(\lambda x.\lambda y.x\)
can be written as `\x.\y.x`.

Our first example defines the **identity function** \(\lambda x. x\);
which is a function that takes an argument \(x\) and returns it
unchanged. Our first statement just after this definition applies this
function to itself, and the result is again the identity. Keep in
mind during this tutorial that it is perfectly possible to apply
functions to other functions or to themselves; in fact, this is one of
the core ideas of λ-calculus.
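The interactive snippet is not reproduced here; a sketch of that first example in Mikrokosmos-style syntax, following the standard definition (`#` marks a comment):

```
id = \x.x    # the identity function
id id        # applying id to itself evaluates back to id
```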

In its output, Mikrokosmos shows the evaluated term and possible names
for it. We will learn what those names mean later. In the previous
example, the evaluation of `id id` to `id` has followed this chain of
β-reductions and equivalences

\[ id\ id = (\lambda x.x)(\lambda y.y) \longrightarrow (\lambda y.y) = id. \]

Note that the specific name of a variable is irrelevant. They act only
as placeholders. A perfectly valid definition of `id` would be
`id = \a.a` or `id = \b.b`.

### 1.3 Some more examples

Function application is **left-associative** and parentheses can be
omitted when they are not necessary; that is, `f g h` must be read as
`(f(g))(h)` instead of `f(g(h))`. This goes against the usual
mathematical convention, which always writes parentheses on functions,
that is, we write

\[ f\ x,\quad\text{ instead of }\quad f(x); \]

and

\[ f\ x\ y,\quad\text{ instead of }\quad (f(x))(y),\ \text{ or }\ f(x,y); \]

but this makes it less verbose to write multiple-argument functions
such as the **constant** function, which takes two arguments and
returns the first one.
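A sketch of the constant function, following the standard Church encoding:

```
const = \x.\y.x   # takes two arguments and returns the first one
```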

Note that, in λ-calculus, a *"function taking two arguments"* and
*"a function that takes an argument and returns a function taking the second"*
are conceptually the same. This equivalence is called
**currying** in honor of Haskell Curry, the logician who developed this
idea, previously discovered by Schönfinkel and Frege.

This is why we can see every function with two arguments as a function
taking only one argument and returning again a function taking the
second one. For example, the `discard = const id` function
takes only one argument, discards it, and outputs `id`. This is called
a **partial application** of a function.

A more useful example of function taking functions as arguments is the
**function composition**, which takes two functions and returns a new
one created by applying the two sequentially. This corresponds to the
usual mathematical function composition \(f \circ g\).
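Composition can be sketched as the usual λ-term (Mikrokosmos-style syntax assumed):

```
compose = \f.\g.\x.f (g x)   # apply g first, then f
compose id id                # composing id with id gives id again
```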

### 1.4 Exercises

**Exercise 1.1:** Think what should be the result of evaluating the
following expressions and then check it with the interpreter. Beware
that the last one is a bit more difficult than the other two.

- `compose const id`
- `compose id const`
- `compose const const`

**Exercise 1.2:** Write the lambda terms for

- *function application*, a function taking a function \(f\) and a term \(x\) as arguments and returning \(f(x)\); and
- *flip*, a function that takes a function on two arguments \(f\) and returns a \(g\) such that \(g(x,y) = g\ x\ y = f\ y\ x = f(y,x)\). In other words, \(\mathsf{flip}\ f\ x\ y = f\ y\ x\).

And test them on the interpreter.

## 2 Boolean logic

### 2.1 Booleans

We will start our programming journey on the λ-calculus by encoding
boolean logic. In boolean logic, we have two elements, `true` and `false`,
and some logical gates such as

- the **and** gate, which outputs true if and only if its two inputs are true;
- the **or** gate, which outputs true if any of its inputs is true, or even if *both* are true;
- the **not** gate, which outputs true if and only if its input is false.

For example,

\[ \mathsf{not}\ (\mathsf{or}\ \mathsf{true}\ \mathsf{false}) = \mathsf{not}\ \mathsf{true} = \mathsf{false}. \]

Our intuition on what it means to be a truth value is that truth values
can distinguish between two values (**true** or **false**) or two branches of
a program `(if ... then ... else ...)`. We are going to use this
intuition to write an encoding of boolean values based on their
ability to choose between two branches. Perhaps surprisingly, this
encoding will also be useful to write the usual boolean logic gates.

Here, a truth value is a function on two elements that chooses one of them.

- \(\mathsf{true}\ a\ b = a\)
- \(\mathsf{false}\ a\ b = b\)

This is called the **Church encoding** of the booleans, as it was
described by Alonzo Church for the first time. This idea of defining a
data structure based not on its content but on how it can be used will
appear later, when dealing with more complex structures. The following
are examples on how false and true work.
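The Church booleans can be sketched as follows (standard encoding):

```
true  = \a.\b.a   # chooses the first branch
false = \a.\b.b   # chooses the second branch
```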

In particular, `true` is exactly the same lambda term as `const`. This is
a very common phenomenon in λ-calculus, where the same function
can be interpreted differently in different contexts. Mikrokosmos
allows giving multiple names to the same term.

### 2.2 If-else

The advantage of this way of encoding the boolean values is that they can be easily used in combination with other λ-terms. In particular, the way to encode an if-else is almost trivial: it is already encoded on the λ-terms!

If we really want to write an `if-else` function, it will be, quite
literally, the trivial one.
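A sketch of that trivial term, with an example assuming the earlier definitions of `true`, `id` and `const`:

```
ifelse = \b.b            # a Church boolean already chooses a branch by itself
ifelse true id const     # evaluates to id, the first branch
```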

### 2.3 Logic gates

Usual operations on booleans can be defined too on this encoding and they will be surprisingly easy to construct if we think of booleans as functions choosing from two terms.

**Exercise 2.1:** Explain why this definition of the `and` gate works.
*Hint: think about what happens when the first argument is true. What happens if it is `false`?*

The `or` gate can be defined in a similar way.

And finally, the negation operator is only a way of interchanging the two truth values.
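The three gates, sketched with the standard Church-encoded definitions:

```
and = \p.\q.p q false    # if p is true the result is q; otherwise false
or  = \p.\q.p true q     # if p is true the result is true; otherwise q
not = \b.\x.\y.b y x     # swaps the two branches of a boolean
```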

### 2.4 Exercises and more logical gates

The boolean implication operator also works as a logic gate; it can be defined as

\[(a \to b) \equiv (\neg a) \vee b,\]

that is, the implication is true if both are true or if the premise is false.

**Exercise 2.2:** Compute the complete operation table for the implication
using the previous definition.

**Exercise 2.3:** Compute the following boolean clauses using lambda calculus

- \((\mathsf{true} \vee \mathsf{false})\to \mathsf{false}\).
- \(\mathsf{false} \to (\mathsf{false} \to \mathsf{false})\)
- \((\neg \mathsf{false} \wedge \neg \mathsf{true}) \to \mathsf{true}\)

**Exercise 2.4:** Define the *xor* gate as a lambda term. The *xor* of
two boolean values must return true if and only if **exactly one** of
them is true. Check its logic table as well.
*Hint: you may want to use the already defined `not`.*

## 3 Numbers

### 3.1 Peano and the natural numbers

In the 19th century, Giuseppe Peano gave a definition of the natural numbers and an axiomatic theory of them based on only two constructors.

- Zero is a natural number, written as \(Z\).
- The successor of a natural number is a natural number, written as \(S\).

In these terms, the usual natural numbers will be

\[ Z,\ S\ Z,\ S\ (S\ Z),\ S\ (S\ (S\ Z)),\ \dots \]

The question now is how we can encode them in the λ-calculus. We cannot simply add the two constructors to the λ-calculus, but we can make the natural numbers depend on any two given constructors. This is again the same idea we used when we worked with boolean values: we do not care about the structure itself, but about how it can be used later.

Under this interpretation, a number \(n\) is really a function taking a function \(a\) as an argument and applying it \(n\) times over the argument \(b\). Interpret the following two evaluations.
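A sketch of the Church numerals and two evaluations to interpret (standard encoding; the name `zero` is illustrative):

```
zero = \s.\z.z               # apply the successor zero times
succ = \n.\s.\z.s (n s z)    # one more application of s
succ zero                    # one application: \s.\z.s z
succ (succ zero)             # two applications: \s.\z.s (s z)
```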

**Exercise 3.1:** Define a function that takes a natural number and
returns true if and only if the number is even.
*Hint: you may want to think of the given number as a function.*

### 3.2 Addition and multiplication

The encoding of the addition and multiplication of natural numbers will profit from the interpretation of numbers as functions. We are actually encoding naturals as their induction principle: we can define a function by defining how it acts on zero and a successor.

The `double` function will simply replace the successor with the
composition of the successor function with itself.
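A sketch of `double` under this idea:

```
double = \n.\s.\z.n (\x.s (s x)) z   # replace each successor by two successors
```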

**Exercise 3.2**: Define a `triple` function.

We are now going to define **addition** using this same principle. It
takes a successor and a zero, computes the first number as `n s z`,
and then uses it as the zero in the interpretation of the second one.
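A sketch of addition following that description:

```
plus = \n.\m.\s.\z.m s (n s z)   # n s z is used as the zero of m
```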

**Exercise 3.3:** How would you define multiplication? Keep in mind that
you can use a number as a function. Keep also in mind the previous
exercises on `double` and `triple`. *Spoilers below!
You may want to try this exercise before continuing.*

There are many possible ways of defining multiplication. Some of them
use the repeated application of `plus` to a number; but we are going
to define **multiplication** in a way that is similar to how we defined
`double` previously. We are going to interpret the successor as the
n-fold application of the successor.
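Multiplication, sketched with the successor interpreted as an n-fold successor:

```
mult = \n.\m.\s.\z.m (n s) z   # each successor of m counts n times
```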

### 3.3 The predecessor function

How to compute the predecessor of a number? We have not encoded negative numbers, so it could be a function returning zero whenever it tries to get the predecessor of zero. It is an insightful exercise to try to define it by yourself, but please, do not get too obsessed with it. The solution is certainly not easy.

The solution is the term that follows this paragraph. You probably expected something easier! Stephen Kleene, who was a student of Alonzo Church, discovered for the first time how to write a predecessor on lambda calculus while at the dentist. This discovery made Church start thinking that every intuitively computable function could be computed using lambda calculus, that is, that the notions of lambda-computable function and intuitively computable function would coincide.
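This is Kleene's predecessor, in its standard form:

```
pred = \n.\f.\x.n (\g.\h.h (g f)) (\u.x) (\u.u)   # shifts a pair of values n times
```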

But why does something like this even work? The main idea is to create
a function that can be applied to a pair of numbers \(n\) times and in
such a way that the first application renders \(1,0\), and any
subsequent application acts as a successor on both numbers. We finally
take only the second one. A detailed derivation of this function can
be found here; but we will also study it in the **Exercise 3.5**.

**Exercise 3.4:** Use the predecessor function to define the `minus`
function. It should return the difference between two numbers, and
zero whenever the first number is smaller than the second.

**Exercise 3.5:** Let's look again at the definition of `pred`.
The core of the definition is the subexpression `\g.(\h.h (g f))`,
which we will call `inc f`. It is a sort of reversed composition.
Study what happens when we compose `inc f` with itself and reimplement
`pred` using `inc`.

### 3.4 Predicates on natural numbers

This encoding even allows us to write predicates on natural numbers. The first predicate will be a function distinguishing a non-zero number from zero. It will be used later to build more complex ones.

It is built by applying a `const false` function `n` times to a `true`
constant. Only if it is applied `0` times will it return a true value.
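The zero-test predicate, sketched with the earlier definitions of `const`, `true` and `false`:

```
iszero = \n.n (const false) true   # any application of the step yields false
```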

Using this predicate, we can build `eq` and `leq`, corresponding to \(==\) and \(\leq\).
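Sketches of both, assuming the `minus` function from Exercise 3.4 and the gates from chapter 2:

```
leq = \m.\n.iszero (minus m n)       # m <= n iff the truncated difference is zero
eq  = \m.\n.and (leq m n) (leq n m)  # equal iff each is <= the other
```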

## 4 Combinatory logic

### 4.1 SKI combinators

Combinatory logic provides a notation for λ-terms independent from quantified variables. Every λ-expression can be written in terms of three combinators, \(S,K,I\), which are defined as

- \(I = \lambda x.x\), the identity function;
- \(K = \lambda x.\lambda y.x\), the constant function;
- \(S = \lambda x.\lambda y.\lambda z. x z (y z)\), a generalized application.

The first one, the identity, can also be written as a function of \(S\) and \(K\).
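The standard derivation:

```
I = S K K   # S K K x -> K x (K x) -> x
```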

The interesting property of this particular set of combinators is that
every other lambda expression can be written in terms of them. We can
see how a particular lambda expression is written in SKI calculus by
turning on the **ski** mode of the interpreter.

**Exercise 4.1:** How are Church-encoded numerals represented with SKI
combinators? Compute the first four or five numbers and try to come up
with the general rule.

### 4.2 Schönfinkel combinators

Schönfinkel and Church defined different but equivalent sets of combinators. Some of the other classical combinators are

- argument interchange, `C`;
- function composition, `B`;
- and duplication of an argument, `W`;

and they are defined as follows.
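Sketched definitions of these classical combinators:

```
B = \f.\g.\x.f (g x)   # composition
C = \f.\x.\y.f y x     # interchange the two arguments
W = \f.\x.f x x        # duplicate the argument
```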

The Y combinator is related to recursion and we will discuss it on the following chapters.

## 5 Data structures

### 5.1 Tuples

Tuples can be easily defined from boolean logic. The main idea is that the application of a tuple to a function results in the application of the function to its two components.

\[ \mathtt{tuple}(a,b)(f) \equiv f\ a\ b \]

With this idea, tuples and their two projections are defined as follows.
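A sketch of tuples and projections, assuming the Church booleans from chapter 2 (the projection names are illustrative):

```
tuple  = \a.\b.\f.f a b   # a pair waits for a function consuming both components
first  = \t.t true        # select the first component
second = \t.t false       # select the second component
```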

As seen in the example, we use `true` and `false` to select the first
or the second argument of the function. It is possible to use the same
idea to apply other functions.

### 5.2 Lists I: nil and cons

Data structures such as lists or binary trees can be represented using
the same principle we used to build naturals and booleans. We need
two constructors to represent a list: a `nil`, signaling the end of
the list, and a `cons`, joining an element to the head of the list. A
list would be something like this

\[ \mathtt{cons}\ 1\ (\mathtt{cons}\ 2\ (\mathtt{cons}\ 3\ \mathtt{nil})).\]

As we did with natural numbers, we are going to write a representation independent from the constructors; they are going to be passed as arguments. We need

- `nil`, the empty list;
- `cons`, a function taking an element (head) and a list (tail) and returning a new list.
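The standard Church encoding of these two constructors can be sketched as:

```
nil  = \c.\n.n                   # the empty list
cons = \h.\t.\c.\n.c h (t c n)   # h joined to the head of the list t
```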

This interpretation makes it easier to write folding functions for
lists. We can define a function on a list by simply giving an
interpretation for the `nil` and a binary function as an interpretation
for the `cons`.

It is useful to encode this principle into a function called
`fold`. We are going to define a summation \(\Sigma\) function and a
list product \(\Pi\) function on lists.
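Sketches of these, assuming the numerals and arithmetic from chapter 3:

```
fold = \c.\n.\l.l c n         # a Church-encoded list is its own fold
sum  = fold plus zero         # interpret cons as plus, nil as zero
prod = fold mult (succ zero)  # interpret cons as mult, nil as one
```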

**Exercise 5.1:** Write the `any` and `all` functions. They are
functions that can be applied over lists of booleans.

- `all` returns true if the list is made up only of *trues*.
- `any` returns true if there is at least one *true* on the list.

You may want to use the `fold` function.

### 5.3 Lists II: map and filter

Map, filter and fold are the most famous examples of higher order functions on lists and a common example of the power of functional programming, which has its roots on lambda calculus.

- The **map** function applies a function `f` to every element of a list.
- The **filter** function removes the elements of the list that do not satisfy a given predicate. It "filters" the list, leaving only elements that satisfy the predicate.

We are going to implement these functions using our previously defined
`fold`.
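A sketch of `map` via `fold`, assuming the earlier `cons` and `nil`:

```
map = \f.fold (\h.\t.cons (f h) t) nil   # rebuild the list, applying f to each head
```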

**Exercise 5.3:** Write functions

- doubling the value of each number on a list.
- negating each value of a list of booleans.

*Hint: you don't need to use lambda abstractions on these*
*definitions.*

**Filter** can be defined using a boolean to decide at each step whether
to return a list with a head or to return the tail, ignoring the head,
like this
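A sketch along those lines, using the predicate's value on the head as the deciding boolean:

```
filter = \p.fold (\h.\t.(p h) (cons h t) t) nil   # keep h only when p h is true
```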

**Exercise 5.4:** Write a function that, given any list, returns a list
containing only the even numbers on the list.

### 5.4 Binary trees

Lists have been defined using two constructors and trees will be
defined using the same technique. The only difference with lists is
that the `cons`

constructor is going to be replaced by a `node`

constructor, which takes two trees as arguments. That is, a binary
tree is

- an empty tree.
- a node, containing a label, a left subtree, and a right subtree.

Defining functions using a fold-like combinator is again very simple due to the chosen representation. We are going to need also a variant of the usual function acting on three arguments, the label, the right node and the left node.
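A sketch of this tree encoding (constructor names and argument order are illustrative):

```
empty = \f.\e.e                             # the empty tree
node  = \x.\l.\r.\f.\e.f x (l f e) (r f e)  # a label and two already-folded subtrees
```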

## 6 Recursion

### 6.1 Fixpoints

A fixpoint combinator is a higher-order function that, given a function \(f\), solves the equation

\[ x = f\ x \]

for \(x\), meaning that, if \(\mathtt{fix}\ f\) is the fixpoint of \(f\), the following sequence of equations holds

\[ \mathtt{fix}\ f = f (\mathtt{fix}\ f) = f ( f (\mathtt{fix}\ f)) = f ( f ( f (\mathtt{fix}\ f))) = \dots \]

The first surprising fact is that such a combinator actually exists.
In the following example, we define the `fix` combinator, which we
will use later to define recursive functions. The problem it has is
that it cannot be evaluated without arguments into a closed form, so
we have to delay the evaluation of the expression when we bind it. To
do this, we use the Mikrokosmos `!=` operator, which binds an
expression to a variable **without** simplifying it first.
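The standard Y combinator, bound with the `!=` operator just described:

```
fix != (\f.(\x.f (x x)) (\x.f (x x)))   # != delays evaluation of the binding
```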

This `fix` operator (more commonly called the Y combinator)
allows us to use the function we are defining inside its own
definition. The function will be passed as the first argument to the
argument of fix, as in `f = fix (\f. ...)`. It is important to notice
that recursive functions, even if they work, cannot be evaluated alone
without entering an infinite β-reduction loop.

In Mikrokosmos, we need the `!=` operator when defining recursive
functions to prevent them from expanding.

Our first example is the **factorial** function.
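A sketch of `fact`, assuming `fix` as above and the arithmetic and predicates from chapter 3:

```
fact != fix (\f.\n.iszero n (succ zero) (mult n (f (pred n))))
# the boolean iszero n selects the base case or the recursive branch
```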

The cost of computing a factorial grows very quickly, and the
lambda calculus (and particularly this encoding of natural numbers)
was not designed to be efficient. `fact 6` will surely be too much for
the interpreter.

As a last example, we are going to define **Fibonacci** numbers.
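A sketch, assuming `leq`, `plus` and `pred` from chapter 3:

```
fib != fix (\f.\n.leq n (succ zero) n (plus (f (pred n)) (f (pred (pred n)))))
# for n <= 1, fib n = n; otherwise the sum of the two previous terms
```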

### 6.2 Evaluation

The order in which evaluation is performed is crucial to determine whether an expression will eventually terminate. Mikrokosmos evaluates expressions in normal order, from the outside in; that is, the arguments of a function are not evaluated until they are actually used inside the function. This is not the most efficient strategy: if the same argument appears twice in the body of the function, it will be evaluated twice! But it prevents some expressions from entering an infinite loop.

For instance, `fix` is a non-terminating term (it **diverges**); but if
it is used inside an `ifelse` statement, it will not be evaluated at
all.
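For instance, assuming the Church booleans and `fix` bound with `!=`:

```
true 2 fix   # fix sits in the discarded branch and is never reduced; evaluates to 2
```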

If `false 2 fix` is evaluated, however, Mikrokosmos will enter an
infinite loop. Normally, your browser will allow you to stop the
Javascript script and reload the page.

Other examples of Mikrokosmos dealing with non-terminating functions include infinite lists, as in the following examples, where we take the first element of an infinite list without having to evaluate it completely, or compare an infinite number, arising as the fixpoint of the successor function, with a finite number.

## 7 Types

### 7.1 Simply typed λ-calculus

Until now, we have been talking about untyped lambda calculus, but we are now going to deal with the simply-typed lambda calculus. The main differences are that

- every term has a type;
- only a subset of the lambda expressions can be written in simply-typed lambda calculus, the typable ones;
- every term normalizes, that is, every computation finishes;
- as a consequence, it is not Turing-complete.

The command `:types on` activates types. Types are displayed with
every lambda expression, but certain lambda expressions which cannot
be typed can no longer be used. The `fix` operator is an example.

A type is written as a set of type variables and arrows, where `A -> B`
represents the type of a function from `A` to `B`. Currying
also works with types, and a multiargument function must be written as
`A -> B -> C`. The interpreter will always try to infer the
**most general type**; that is, it is preferable to have `A -> B` than the
particular case `A -> C -> D`, where `B` happens to be `C -> D`.

**Exercise 7.1:** The library `and` function is not typeable. Think why, and
write a similar typeable `and` function.

### 7.2 Propositions as types

What types are inhabited? It is easy to find an expression of type
`A -> A`, but it seems that there is no expression of type `A -> B`.
We can reason that any expression of that type should be able to
transform any given input type into any desired output type, and that
such an expression should not be possible.

The rules of lambda calculus are similar to the rules of intuitionistic propositional logic; this means that a type will be inhabited if and only if the type, reading arrows as logical implications, is a tautology of propositional logic. That is, the axioms of intuitionistic propositional logic are

- every proposition implies itself, `A -> A`;
- we can discard any assumption to arrive at a conclusion, `A -> B -> A`;
- an assumption can be used multiple times to arrive at intermediate conclusions, `(A -> B -> C) -> (A -> B) -> A -> C`.

And these are precisely the types of the SKI combinators. As any lambda
expression can be written in terms of these combinators, every lambda
expression of a type is actually a **proof** of the proposition the
type represents.

We can define some logical connectives using only the implication. For
example, the negation of a proposition \(A\) would be a function taking
\(A\) and returning any given type. As we discussed earlier, this should
be impossible, so the existence of a function `T -> B`, where `B` is a
free variable, should be a proof that the type `T` is not inhabited.

For example, we can write a proof of the **modus ponens** by presenting
an inhabitant of the type \(A \to (A \to B) \to B\), where A and B are
free type variables.
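A standard proof term for modus ponens:

```
mp = \a.\f.f a   # given a : A and f : A -> B, produce f a : B
```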

### 7.3 Products, unions and logic

Mikrokosmos supports **product**, **union**, **unit** and **void** types.

- The *product* type `A × B` is the type of pairs of elements where the first one is of type `A` and the second one is of type `B`.
- The *union* type `A + B` is a type that can contain an element of type `A` or an element of type `B`. It is a labeled disjoint union of the two types.
- The *unit* type `⊤` has only one element, called `unit` or `★`.
- The *void* type `⊥` has no elements. Its utility will be discussed later.

They can be used with the following typed constructors.

| Constructor | Type | Description |
|---|---|---|
| `(-,-)` | `A → B → A × B` | Pairs |
| `fst` | `(A × B) → A` | First projection |
| `snd` | `(A × B) → B` | Second projection |
| `inl` | `A → A + B` | First inclusion |
| `inr` | `B → A + B` | Second inclusion |
| `caseof` | `(A + B) → (A → C) → (B → C) → C` | Case analysis |
| `unit` | `⊤` | Unit element |
| `abort` | `⊥ → A` | Empty function |
| `absurd` | `⊥ → ⊥` | Empty identity |

The following are examples of the use of typed constructors.

As an example, we could use the type `⊤ + ⊤` to write a new version
of the booleans. This new version will use `caseof` as its `ifelse`.
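A sketch of these booleans (the names are illustrative), using the constructors from the table above:

```
btrue  = inl unit   # one of the two elements of ⊤ + ⊤
bfalse = inr unit   # the other one
bif = \b.\x.\y.caseof b (\u.x) (\u.y)   # caseof plays the role of ifelse
```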

### 7.4 Conjunctions, disjunctions and proofs

These types complete the correspondence between intuitionistic logic and lambda calculus. A type is inhabited if and only if its proposition is provable.

| Description | Type | Proposition | Description |
|---|---|---|---|
| Product type | `A × B` | `A ∧ B` | Logical conjunction |
| Disjoint union type | `A + B` | `A ∨ B` | Logical disjunction |
| Unit type | `⊤` | `⊤` | True proposition |
| Empty type | `⊥` | `⊥` | False proposition |
| Function to empty | `A → ⊥` | `¬A` | Logical negation |

Following this correspondence, we are now able to write more complex proofs on intuitionistic logic. For example, the following is a proof of the trivial fact that

\[ A \wedge B \to B \wedge A; \]

and, at the same time, it is a function swapping the two terms of a pair.
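The swapping term, sketched with the pair constructors from the table:

```
swap = \p.(snd p, fst p)   # also a proof of A ∧ B → B ∧ A
```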

We had defined the negation of a type as a function from that type
to the empty type. This makes sense: only if the type is not inhabited
is it possible to write such an empty function. The `abort` function is
useful when we want to use the principle of explosion; that is, from
falsity, everything follows. As an example, we will prove

\[ A \wedge \neg A \to B \]

and one of the De Morgan laws

\[ \neg (A \vee B) \to \neg A \wedge \neg B \]

as the following terms.
*Note the use of abort and absurd as □ and ■!*

The characteristic difference between classical and intuitionistic logic
is that \(A \vee \neg A\) (the law of excluded middle, LEM) and
\(\neg \neg A \to A\) are not provable in intuitionistic logic. It is not
possible to find an expression of type `A + (A → ⊥)`,
which would correspond to \(A \vee \neg A\).

It is possible, however, to prove \(\neg \neg (A \vee \neg A)\).
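A standard inhabitant of \(\neg\neg(A \vee \neg A)\), that is, of type `((A + (A → ⊥)) → ⊥) → ⊥`:

```
\f.f (inr (\a.f (inl a)))   # feed the hypothetical refutation its own evidence
```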

Now that we are writing proofs, it is a good time to introduce a new
feature of the interpreter. Mikrokosmos can show you the derivation
tree of any of your proofs with `@@`. For example, we will prove
that

\[ A \vee B \to B \vee A \]

as
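a standard case-analysis term on the union:

```
\c.caseof c (\a.inr a) (\b.inl b)   # swap the two sides of the union
```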

### 7.5 Curry-Howard

The Curry-Howard isomorphism not only states the correspondence between types and propositions and between terms and proofs; but a full isomorphism with respect to term evaluation and proof simplification. That is, if a term corresponds to a proof, the fully evaluated term corresponds to a fully simplified proof.

Every simply typed λ-calculus term can be understood as a
derivation tree of the proposition it is proving. Mikrokosmos allows the user
to check the derivation tree of a given λ-term using `@`. And
it can also simplify the derivation using `@@`, as we did before. We
will use our previously defined function/proof `swap` as an example.

In the first case, the function is not fully simplified, and a more complex derivation tree arises from it. The simplification of the term corresponds directly to a simplification of the derivation tree.