Today is a theory
day. We will take a look at the mathematical precursor to Scheme,
the *lambda calculus*.

The lambda calculus
was invented by Alonzo Church in the 1930s to study the interaction of *functional
abstraction* and *function application* from an abstract, purely mathematical point
of view.

It was one of many
related systems that were proposed in the late 1920s and 1930s. This was an
exciting time for theoretical computer science, even though computers had not
yet
been invented. There were many great mathematicians such as Kurt Gödel, Alan
Turing, and Emil Post who were trying to understand the notion of *computation*.
They certainly understood intuitively what an algorithm was, but they were
lacking a formal mathematical definition. They came up with a number of
different systems for capturing the general idea of computation:

- Turing machines -- Turing
- µ-recursive functions -- Gödel
- rewrite systems -- Post
- the lambda calculus -- Church
- combinatory logic -- Schönfinkel, Curry

These systems all
looked very different, but what was amazing was that they all turned out to
be *computationally equivalent* in the sense that each could encode and
simulate the others. Church thought that this was too striking to be mere
coincidence, and declared that they had found the correct definition of the
notion of computability: it was the common spirit embodied in a different way
by each one of these formalisms. This declaration became known as *Church's
thesis*. Nowadays we have a better understanding of computability, and
one could add any modern general-purpose programming language such as Java, C,
or Scheme (suitably abstracted) to this list, but Church and his colleagues did
not have that luxury.

The lambda calculus is like Scheme except much simpler.

- *Everything* is a function. There are no other primitive types---no integers, strings, cons objects, Booleans, etc. If you want these things, you must encode them using functions.
- No state or side effects. It is purely functional. Thus we can think exclusively in terms of the substitution model.
- The order of evaluation is irrelevant. Scheme imposes a certain evaluation order, namely *eager evaluation*---evaluate the arguments before applying the function. The lambda calculus does not specify an evaluation order.
- No special forms (this is related to the last bullet). Special forms in Scheme are just expressions that are evaluated using a different evaluation order than the default. As mentioned, the evaluation order is irrelevant in the lambda calculus.
- Only unary (one-argument) functions. No thunks or functions of more than one argument.

*Lambda terms*
E, F, G, ... are defined inductively:

- **variables:** any variable x, y, z, ... is a lambda term;
- **application:** if E and F are lambda terms, then EF is a lambda term;
- **abstraction:** if E is a lambda term and x is a variable, then lx.E is a lambda term.

Notes:

- This is the basis of the inductive definition of lambda terms.
- The expression EF denotes application of E (considered as a function) to F (considered as an input). This would be written `(E F)` in Scheme. Think of this as an expression with an invisible operator "apply" between the E and F, like + or *. Parentheses are used for readability only, so they do not have the same significance as in Scheme. The apply operator is not associative, so (EF)G is not the same as E(FG) in general. The expression EFG written without parentheses should be read as (EF)G.
- The expression lx.E would be written `(lambda (x) E)` in Scheme. The application operator binds tighter than the abstraction operator, so lx.lx.xy should be read as lx.(lx.(xy)).

Computation in the
lambda calculus takes place by *substitution*. There are two main
rules:

**a-reduction (renaming bound variables):** lx.E => ly.E[x/y], where E[x/y] denotes the result of replacing all free occurrences of x in E by y, provided y would not be captured by a ly already in E. Example:

OK: lx.ly.xy => lz.ly.zy (rename x to z, nothing bad happened)

not OK: lx.ly.xy => ly.ly.yy (rename x to y, y was captured)

**b-reduction (substitution rule):** (lx.E F) => E[x/F], where E[x/F] denotes the result of replacing all free occurrences of x in E by F. Before doing this, bound variables in E are renamed by a-reduction if necessary to avoid capturing free variables of F. Example:

OK: (lx.ly.xy lz.z) => ly.(lz.z y)

not OK: (lx.ly.xy lz.yz) => ly.(lz.yz y) (y was captured)

but if we rename the bound y first, it's OK:

(lx.ly.xy lz.yz) => (lx.la.xa lz.yz) => la.(lz.yz a)
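The two rules can be sketched as a hypothetical mini-implementation in Python (not part of the notes; the term representation and helper names are ours). Terms are tagged tuples, `subst` computes E[x/F] with the a-renaming step built in, and `beta` performs one b-reduction at the root.

```python
# Terms are tagged tuples:
#   ('var', x)      -- a variable
#   ('app', E, F)   -- application EF
#   ('abs', x, E)   -- abstraction lx.E

def free_vars(t):
    """The set of free variables of a term."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'app':
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}         # lx.E binds x

_counter = [0]
def fresh(avoid):
    """A variable name not in `avoid`, for a-renaming."""
    while True:
        _counter[0] += 1
        name = 'v%d' % _counter[0]
        if name not in avoid:
            return name

def subst(t, x, s):
    """E[x/s]: replace free occurrences of x in t by s, a-renaming bound
    variables first so that no free variable of s is captured."""
    if t[0] == 'var':
        return s if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                              # x is rebound here: stop
        return t
    if y in free_vars(s):                   # would capture a free y in s
        z = fresh(free_vars(s) | free_vars(body) | {x})
        body, y = subst(body, y, ('var', z)), z
    return ('abs', y, subst(body, x, s))

def beta(t):
    """One b-step at the root: (lx.E F) => E[x/F]."""
    assert t[0] == 'app' and t[1][0] == 'abs'
    _, x, body = t[1]
    return subst(body, x, t[2])

# The "not OK" example: (lx.ly.xy lz.yz) -- the bound y is renamed first,
# giving la.(lz.yz a) up to the choice of fresh name.
t = ('app',
     ('abs', 'x', ('abs', 'y', ('app', ('var', 'x'), ('var', 'y')))),
     ('abs', 'z', ('app', ('var', 'y'), ('var', 'z'))))
r = beta(t)
assert r[0] == 'abs' and r[1] != 'y'    # bound y was renamed
assert free_vars(r) == {'y'}            # the free y of the argument survives
```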

Examples in Scheme:

- `(lambda (x) (+ x 3))` is equivalent to `(lambda (y) (+ y 3))`.
- This is just the substitution rule: `((lambda (x) (+ x 3)) 7) => (+ 7 3)`.

An a- or b-reduction step can be performed at any time to any subterm of a lambda term. Write E => F if E goes to F in some finite number of a- or b-reduction steps.

Scheme specifies an order of evaluation, but the lambda calculus does not. It
turns out that it does not matter because of the *Church-Rosser theorem*:
If E
=> F_{1} and E =>
F_{2}, then there always exists a G such that
F_{1} => G and
F_{2} => G.

```
        E
       / \
      /   \
     /     \
    /       \
   F1        F2
    \       /
     \     /
      \   /
       \ /
        G
```

So if you diverge by applying rules to different parts of the term, you can always get back together again by applying more rules. Example:

```
      ((lambda (x) (+ x x)) (* 2 3))
            /              \
       lazy/                \eager
          /                  \
         /                    \
(+ (* 2 3) (* 2 3))    ((lambda (x) (+ x x)) 6)
         \                    /
          \                  /
           \                /
            \              /
              (+ 6 6)
```

Terms are
equivalent if they have a common reduct; that is,
E_{1}
== E_{2} if there exists an F such that E_{1}
=> F and E_{2} =>
F.

```
   E1    E2
    \    /
     \  /
      \/
      F
```

The
relation == is an equivalence relation on terms. To show transitivity, we use the
Church-Rosser property: if E_{1} == E_{2} and E_{2}
== E_{3}, then there exist F_{1} and F_{2} such that

```
   E1    E2    E3
    \    /\    /
     \  /  \  /
      \/    \/
      F1    F2
```

By the
Church-Rosser property, there exists a G such that F_{1} =>
G and F_{2} => G:

```
   E1    E2    E3
    \    /\    /
     \  /  \  /
      F1    F2
       \    /
        \  /
         \/
         G
```

which proves that E_{1}
== E_{3}.

All functions are
unary in the lambda calculus. Functions of higher arity are simulated by *currying*.
A function of two arguments is simulated by a function of one argument that
returns a function of one argument that returns the result. That is, we
would simulate

`(lambda (x y) E)`

in curried form by

`(lambda (x) (lambda (y) E)) = lx.ly.E`

Instead of applying the function to two arguments, e.g.

`(f A B)`

we apply it to the first argument, then apply the result to the second argument:

`((f A) B)`
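Currying carries over directly to any language with first-class functions. A small illustration in Python (the function `add` is our own example, not from the notes):

```python
# A two-argument function simulated by nested one-argument functions.
def add(x):
    return lambda y: x + y

# (f A B) becomes ((f A) B):
assert add(3)(4) == 7
# Partial application comes for free:
add3 = add(3)
assert add3(10) == 13
```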

The lambda calculus is powerful enough to encode other data types and operations.

Define

#t =def=
lx.ly.x

#f =def= lx.ly.y

In other words, #t and #f are two-argument functions that just return the first and second argument, respectively. Then we can take "if" to be the identity function!

if =def= lx.x

(if #t
B C) = ((if #t) B C) =>
(#t B C) = ((lx.ly.x
B) C)

=> (ly.B C) =>
B

(if #f
B C) = ((if #f) B C) =>
(#f B C) = ((lx.ly.y
B) C)

=> (ly.y C) =>
C.

We can define other Boolean functions:

and =def=
lx.ly.xyx

or =def=
lx.ly.xxy

Then for example

(and #t #f) => (#t #f #t) = (lx.ly.x #f #t) => #f.
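These Boolean encodings can be transcribed into Python lambdas (a sketch; the uppercase names are ours). Eager evaluation is harmless here because the "branches" are already-evaluated values.

```python
TRUE  = lambda x: lambda y: x        # #t = lx.ly.x
FALSE = lambda x: lambda y: y        # #f = lx.ly.y
IF    = lambda x: x                  # if = lx.x

AND = lambda x: lambda y: x(y)(x)    # and = lx.ly.xyx
OR  = lambda x: lambda y: x(x)(y)    # or  = lx.ly.xxy

assert IF(TRUE)('B')('C') == 'B'     # (if #t B C) => B
assert IF(FALSE)('B')('C') == 'C'    # (if #f B C) => C
assert AND(TRUE)(FALSE) is FALSE     # (and #t #f) => #f
assert OR(FALSE)(TRUE) is TRUE
```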

In Scheme, all cons does is package up two things in such a way that they can be extracted again by car and cdr. To code this in the lambda calculus, we can take two terms A and B and produce a function that will apply another function to them.

cons =def= lx.ly.lc.cxy

Thus if we apply this to A and B we get

(cons A B) => lc.(c A B)

Now we can extract A and B again by applying this function to #t and #f, respectively:

(lc.(c
A B) #t) => (#t A B) = (lx.ly.x
A B) => A

(lc.(c A B) #f) => (#f A B) = (lx.ly.y
A B) => B

so we should define

car =def=
ld.(d #t)

cdr =def=
ld.(d #f).
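The pairing encoding works the same way in Python lambdas (a sketch; the uppercase names are ours):

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y

CONS = lambda x: lambda y: lambda c: c(x)(y)   # cons = lx.ly.lc.cxy
CAR  = lambda d: d(TRUE)                       # car  = ld.(d #t)
CDR  = lambda d: d(FALSE)                      # cdr  = ld.(d #f)

p = CONS('A')('B')                             # (cons A B) => lc.(c A B)
assert CAR(p) == 'A'
assert CDR(p) == 'B'
```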

If we define the null list object appropriately, we can get a null? test:

null =def=
lx.#t

null? =def= ld.(d
lx.ly.#f).

Then

(null? null) = (ld.(d lx.ly.#f) lx.#t) => (lx.#t lx.ly.#f) => #t

(null? (cons A
B)) = (ld.(d lx.ly.#f)
lc.(c A B))

=> (lc.(c A B) lx.ly.#f)

=> (lx.ly.#f A B)

=>
#f.
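The null object and null? test, again in Python lambdas (a sketch; names are ours):

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
CONS  = lambda x: lambda y: lambda c: c(x)(y)

NULL  = lambda x: TRUE                             # null  = lx.#t
NULLP = lambda d: d(lambda x: lambda y: FALSE)     # null? = ld.(d lx.ly.#f)

assert NULLP(NULL) is TRUE
assert NULLP(CONS('A')('B')) is FALSE
```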

In Scheme,
numbers are a built-in primitive type. In the lambda calculus, we have to
encode them as lambda terms. Church came up with a neat way to do this and his encoding is
called the *Church numerals*. The idea is to encode the number n as a lambda
term n representing a function
that takes another function f as input and produces the n-fold composition of f with
itself.

0
=def= lf.lx.x

1 =def= lf.lx.fx

2 =def= lf.lx.f(fx)

3 =def= lf.lx.f(f(fx))

...

n =def=
lf.lx.f(...(f(f(fx)))...)

Note that 0
is just #f after a-conversion. Let's write f^{n}x
as an abbreviation for the term f(...(f(f(fx)))...). Then

n
=def= lf.lx.f^{n}x.

This is nice, but in order to do anything with it, we have to encode arithmetic. Let's start with the simplest operation: successor. We would like to define a lambda term add1 that when applied to any n produces n+1. Note that

nfx
=> f^{n}x,

so

f (nfx)
=> (f f^{n}x) = f^{n+1}x.

Lambda abstracting, we get

lf.lx.f(nfx)
=> lf.lx.f^{n+1}x = n+1.

Therefore we want our successor function to produce lf.lx.f(nfx) from n, so we define

add1 =def= ln.lf.lx.f(nfx).
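Church numerals and the successor can also be run in Python lambdas (a sketch; the names and the `to_int` decoder are ours, added just to inspect results):

```python
ZERO = lambda f: lambda x: x                      # 0 = lf.lx.x
ADD1 = lambda n: lambda f: lambda x: f(n(f)(x))   # add1 = ln.lf.lx.f(nfx)

def to_int(n):
    """Decode n by applying it to the ordinary successor on Python ints."""
    return n(lambda k: k + 1)(0)

ONE, TWO = ADD1(ZERO), ADD1(ADD1(ZERO))
assert to_int(ZERO) == 0
assert to_int(ONE) == 1
assert to_int(ADD1(TWO)) == 3
```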

How do we test for 0?

zero? =def= ln.(n lz.#f #t).

Then

(zero? 0)
= (ln.(n lz.#f
#t) lf.lx.x)

=> (lf.lx.x
lz.#f #t) => (lx.x #t)
=> #t

(zero? n+1)
= (ln.(n lz.#f
#t) lf.lx.f^{n+1}x)

=> (lf.lx.f^{n+1}x
lz.#f #t) => (lx.(lz.#f)^{n+1}x
#t)

=>
((lz.#f)^{n+1
}#t) => #f.
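The zero? test, transcribed into Python lambdas (a sketch; names are ours):

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
ZERO  = lambda f: lambda x: x
ADD1  = lambda n: lambda f: lambda x: f(n(f)(x))

ZEROP = lambda n: n(lambda z: FALSE)(TRUE)    # zero? = ln.(n lz.#f #t)

assert ZEROP(ZERO) is TRUE
assert ZEROP(ADD1(ZERO)) is FALSE
assert ZEROP(ADD1(ADD1(ZERO))) is FALSE
```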

Addition is just iterated successor. Note that

(n
add1) => lx.(add1^{n} x)

thus

(n
add1 m) =>
(lx.(add1^{n} x) m)
=> (add1^{n} m)
=> m+n.

This suggests we define

+ =def= ln.lm.(n add1 m).

Multiplication can be defined by iterated addition and exponentiation by iterated multiplication in a similar fashion.
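Addition as iterated successor, and multiplication as iterated addition in the same style, in Python lambdas (a sketch; names and the `to_int` decoder are ours):

```python
ZERO = lambda f: lambda x: x
ADD1 = lambda n: lambda f: lambda x: f(n(f)(x))
PLUS = lambda n: lambda m: n(ADD1)(m)            # + = ln.lm.(n add1 m)
MULT = lambda n: lambda m: n(PLUS(m))(ZERO)      # apply (+ m) n times to 0

def to_int(n):
    return n(lambda k: k + 1)(0)

TWO, THREE = ADD1(ADD1(ZERO)), ADD1(ADD1(ADD1(ZERO)))
assert to_int(PLUS(TWO)(THREE)) == 5
assert to_int(MULT(TWO)(THREE)) == 6
```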

Subtraction is a little more difficult, but it can be done. Define inductively

[] =def= null

[n n-1 n-2 ... 2 1 0] =def= (cons n [n-1 n-2 ... 2 1 0])

So [...] represents a proper list in the usual Scheme sense under our encoding. Our approach is to come up with a term F such that

nF[0] => [n n-1 n-2 ... 2 1 0].

Then we will be able to take

sub1 =def= ln.(if (zero? n) 0 (car (cdr (nF[0])))).

Note

nF[0]
=> (F^{n} [0])

so this will work provided for all n we have

(F [n n-1 n-2 ... 2 1 0]) => [n+1 n n-1 n-2 ... 2 1 0].

This can be achieved by defining

F =def= lx.(cons (add1 (car x)) x).
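The predecessor via this list trick, assembled from the earlier encodings in Python lambdas (a sketch; names are ours). Python is eager, so both branches of IF are computed; on n = 0 the "else" branch yields a junk value that IF simply discards.

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
IF    = lambda x: x
CONS  = lambda x: lambda y: lambda c: c(x)(y)
CAR   = lambda d: d(TRUE)
CDR   = lambda d: d(FALSE)
NULL  = lambda x: TRUE
ZERO  = lambda f: lambda x: x
ADD1  = lambda n: lambda f: lambda x: f(n(f)(x))
ZEROP = lambda n: n(lambda z: FALSE)(TRUE)

F = lambda x: CONS(ADD1(CAR(x)))(x)     # F = lx.(cons (add1 (car x)) x)
LIST0 = CONS(ZERO)(NULL)                # [0]
SUB1 = lambda n: IF(ZEROP(n))(ZERO)(CAR(CDR(n(F)(LIST0))))

def to_int(n):
    return n(lambda k: k + 1)(0)

THREE = ADD1(ADD1(ADD1(ZERO)))
assert to_int(SUB1(THREE)) == 2
assert to_int(SUB1(ZERO)) == 0
```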

A recursive definition in Scheme such as

```
(define fact
  (lambda (x)
    (if (zero? x)
        1
        (* x (fact (sub1 x))))))
```

defines the
function `fact`

in terms of itself. This self-referential
capability that we can achieve in Scheme at first seems impossible to simulate
in the lambda calculus because of the lack of names. However it can be
done.

Note that we are really looking for a term F that is a solution of the equation

F == lx.(if (zero? x) 1 (* x (F (sub1 x)))).

Put another way,
we are looking for a *fixpoint* of the function

T =def= lf.lx.(if (zero? x) 1 (* x (f (sub1 x)))),

that is, a term F such that TF == F.

Fixpoint theorems abound in mathematics. A whorl on your head where your hair grows straight up is a fixpoint. At any instant of time, there must be at least one spot on the globe where the wind is not blowing. For any continuous map f from the closed real unit interval [0,1] to itself, there is always a point x such that f(x) = x.

The lambda calculus is no exception. Any term W has a fixpoint

(lx.W(xx) lx.W(xx))

as can be seen by performing one b step:

(lx.W(xx) lx.W(xx)) => W(lx.W(xx) lx.W(xx)).

Moreover, there
is a lambda term Y, called the *fixpoint combinator*, which applied to any
W gives its fixpoint:

Y =def= lt.(lx.t(xx) lx.t(xx)).

Let's argue that the fixpoint of T above gives the factorial function. Define

fact =def= (lx.T(xx) lx.T(xx)).

We wish to show that for any n,

(fact n) => n!

We argue this by induction on n. Basis:

(fact 0)
=> ((lx.T(xx) lx.T(xx))
0) => (T(lx.T(xx)
lx.T(xx)) 0)

=> (lx.(if (zero? x) 1
(* x ((lx.T(xx) lx.T(xx))
(sub1 x)))) 0)

=> (if (zero? 0) 1
(* 0 ((lx.T(xx)
lx.T(xx)) (sub1 0))))

=> 1.

Induction step:

(fact n+1)
=> ((lx.T(xx) lx.T(xx))
n+1) => (T(lx.T(xx)
lx.T(xx)) n+1)

=> (lx.(if (zero? x) 1
(* x ((lx.T(xx) lx.T(xx))
(sub1 x)))) n+1)

=> (if (zero? n+1) 1
(* n+1 ((lx.T(xx)
lx.T(xx)) (sub1 n+1))))

=> (* n+1 ((lx.T(xx)
lx.T(xx)) (sub1 n+1)))

=> (* n+1 ((lx.T(xx)
lx.T(xx)) n))

= (* n+1 (fact n))

=> (* n+1 n!)
(by the induction hypothesis)

=> (n+1)!
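The fixpoint construction can be sketched in Python with ordinary ints and Booleans rather than Church encodings. Under Python's eager evaluation the term Y as written above diverges, so this sketch uses the standard eta-expanded variant (the Z combinator):

```python
# Z is Y with the self-application eta-expanded so evaluation is delayed.
Z = lambda t: (lambda x: t(lambda v: x(x)(v)))(lambda x: t(lambda v: x(x)(v)))

# T = lf.lx.(if (zero? x) 1 (* x (f (sub1 x))))
T = lambda f: lambda x: 1 if x == 0 else x * f(x - 1)

fact = Z(T)       # a fixpoint of T
assert fact(0) == 1
assert fact(5) == 120
```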