Today is a theory
day. We will take a look at the mathematical precursor to Scheme,
the *lambda calculus*.

The lambda calculus
was invented by Alonzo Church in the 1930s to study the interaction of *functional
abstraction* and *function application* from an abstract, purely mathematical point
of view.

It was one of many
related systems that were proposed in the late 1920s and 1930s. This was an
exciting time for theoretical computer science, even though computers had not
yet
been invented. There were many great mathematicians such as Kurt Gödel, Alan
Turing, and Emil Post who were trying to understand the notion of *computation*.
They certainly understood intuitively what an algorithm was, but they were
lacking a formal mathematical definition. They came up with a number of
different systems for capturing the general idea of computation:

- Turing machines -- Turing
- µ-recursive functions -- Gödel
- rewrite systems -- Post
- the lambda calculus -- Church
- combinatory logic -- Schönfinkel, Curry

These systems all
looked very different, but what was amazing was that they all turned out to
be *computationally equivalent* in the sense that each could encode and
simulate the others. Church thought that this was too striking to be mere
coincidence, and declared that they had found the correct definition of the
notion of computability: it was the common spirit embodied in a different way
by each one of these formalisms. This declaration became known as *Church's
thesis*. Nowadays we have a better understanding of computability, and
one could add any modern general-purpose programming language such as Java, C,
or Scheme (suitably abstracted) to this list, but Church and his colleagues did
not have that luxury.

The lambda calculus is like Scheme except much simpler.

- *Everything* is a function. There are no other primitive types---no integers, strings, cons objects, Booleans, etc. If you want these things, you must encode them using functions.
- No state or side effects. It is purely functional. Thus we can think exclusively in terms of the substitution model.
- The order of evaluation is irrelevant. Scheme imposes a certain evaluation order, namely *eager evaluation*---evaluate the arguments before applying the function. The lambda calculus does not specify an evaluation order.
- No special forms (this is related to the last bullet). Special forms in Scheme are just expressions that are evaluated using a different evaluation order than the default. As mentioned, the evaluation order is irrelevant in the lambda calculus.
- Only unary (one-argument) functions. No thunks or functions of more than one argument.

*Lambda terms*
E, F, G, ... are defined inductively:

- **variables:** any variable x, y, z, ... is a lambda term;
- **application:** if E and F are lambda terms, then EF is a lambda term;
- **abstraction:** if E is a lambda term and x is a variable, then lx.E is a lambda term.

Notes:

- This is the basis of the inductive definition of lambda terms.
- The expression EF denotes application of E (considered as a function) to F (considered as an input). This would be written `(E F)` in Scheme. Think of this as an expression with an invisible operator "apply" between the E and F, like + or *. Parentheses are used for readability only, so they do not have the same significance as in Scheme. The apply operator is not associative, so (EF)G is not the same as E(FG) in general. The expression EFG written without parentheses should be read as (EF)G.
- The expression lx.E would be written `(lambda (x) E)` in Scheme. The application operator binds tighter than the abstraction operator, so lx.lx.xy should be read as lx.(lx.(xy)).

Computation in the
lambda calculus takes place by *substitution*. There are two main
rules:

**a-reduction (renaming bound variables):** lx.E => ly.E[x/y], where E[x/y] denotes the result of replacing all free occurrences of x in E by y, provided y would not be captured by a ly already in E. Example:

OK: lx.ly.xy => lz.ly.zy (rename x to z, nothing bad happened)

not OK: lx.ly.xy => ly.ly.yy (rename x to y, y was captured)

**b-reduction (substitution rule):** (lx.E F) => E[x/F], where E[x/F] denotes the result of replacing all free occurrences of x in E by F. Before doing this, bound variables in E are renamed by a-reduction if necessary to avoid capturing free variables of F. Example:

OK: (lx.ly.xy lz.z) => ly.(lz.z y)

not OK: (lx.ly.xy lz.yz) => ly.(lz.yz y) (y was captured)

but if we rename the bound y first, it's OK:

(lx.ly.xy lz.yz) => (lx.la.xa lz.yz) => la.(lz.yz a)
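The two rules can be sketched as a hypothetical mini-implementation in Python (not part of the notes; the term representation and helper names are ours). Terms are tagged tuples, `subst` computes E[x/F] with the a-renaming step built in, and `beta` performs one b-reduction at the root.

```python
# Terms are tagged tuples:
#   ('var', x)      -- a variable
#   ('app', E, F)   -- application EF
#   ('abs', x, E)   -- abstraction lx.E

def free_vars(t):
    """The set of free variables of a term."""
    if t[0] == 'var':
        return {t[1]}
    if t[0] == 'app':
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}         # lx.E binds x

_counter = [0]
def fresh(avoid):
    """A variable name not in `avoid`, for a-renaming."""
    while True:
        _counter[0] += 1
        name = 'v%d' % _counter[0]
        if name not in avoid:
            return name

def subst(t, x, s):
    """E[x/s]: replace free occurrences of x in t by s, a-renaming bound
    variables first so that no free variable of s is captured."""
    if t[0] == 'var':
        return s if t[1] == x else t
    if t[0] == 'app':
        return ('app', subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                              # x is rebound here: stop
        return t
    if y in free_vars(s):                   # would capture a free y in s
        z = fresh(free_vars(s) | free_vars(body) | {x})
        body, y = subst(body, y, ('var', z)), z
    return ('abs', y, subst(body, x, s))

def beta(t):
    """One b-step at the root: (lx.E F) => E[x/F]."""
    assert t[0] == 'app' and t[1][0] == 'abs'
    _, x, body = t[1]
    return subst(body, x, t[2])

# The "not OK" example: (lx.ly.xy lz.yz) -- the bound y is renamed first,
# giving la.(lz.yz a) up to the choice of fresh name.
t = ('app',
     ('abs', 'x', ('abs', 'y', ('app', ('var', 'x'), ('var', 'y')))),
     ('abs', 'z', ('app', ('var', 'y'), ('var', 'z'))))
r = beta(t)
assert r[0] == 'abs' and r[1] != 'y'    # bound y was renamed
assert free_vars(r) == {'y'}            # the free y of the argument survives
```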

Examples in Scheme:

- `(lambda (x) (+ x 3))` is equivalent to `(lambda (y) (+ y 3))`.
- This is just the substitution rule: `((lambda (x) (+ x 3)) 7) => (+ 7 3)`.

An a- or b-reduction step can be performed at any time to any subterm of a lambda term. Write E => F if E goes to F in some finite number of a- or b-reduction steps.

Scheme specifies an order of evaluation, but the lambda calculus does not. It
turns out that it does not matter because of the *Church-Rosser theorem*:
If E
=> F_{1} and E =>
F_{2}, then there always exists a G such that
F_{1} => G and
F_{2} => G.

```
        E
       / \
      /   \
     /     \
    /       \
   F1        F2
    \       /
     \     /
      \   /
       \ /
        G
```

So if you diverge by applying rules to different parts of the term, you can always get back together again by applying more rules. Example:

```
      ((lambda (x) (+ x x)) (* 2 3))
            /              \
       lazy/                \eager
          /                  \
         /                    \
(+ (* 2 3) (* 2 3))    ((lambda (x) (+ x x)) 6)
         \                    /
          \                  /
           \                /
            \              /
              (+ 6 6)
```

Terms are
equivalent if they have a common reduct; that is,
E_{1}
== E_{2} if there exists an F such that E_{1}
=> F and E_{2} =>
F.

```
   E1    E2
    \    /
     \  /
      \/
      F
```

The
relation == is an equivalence relation on terms. To show transitivity, we use the
Church-Rosser property: if E_{1} == E_{2} and E_{2}
== E_{3}, then there exist F_{1} and F_{2} such that

```
   E1    E2    E3
    \    /\    /
     \  /  \  /
      \/    \/
      F1    F2
```

By the
Church-Rosser property, there exists a G such that F_{1} =>
G and F_{2} => G:

```
   E1    E2    E3
    \    /\    /
     \  /  \  /
      F1    F2
       \    /
        \  /
         \/
         G
```

which proves that E_{1}
== E_{3}.

All functions are
unary in the lambda calculus. Functions of higher arity are simulated by *currying*.
A function of two arguments is simulated by a function of one argument that
returns a function of one argument that returns the result. That is, we
would simulate

`(lambda (x y) E)`

in curried form by

`(lambda (x) (lambda (y) E)) = lx.ly.E`

Instead of applying the function to two arguments, e.g.

`(f A B)`

we apply it to the first argument, then apply the result to the second argument:

`((f A) B)`
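Currying carries over directly to any language with first-class functions. A small illustration in Python (the function `add` is our own example, not from the notes):

```python
# A two-argument function simulated by nested one-argument functions.
def add(x):
    return lambda y: x + y

# (f A B) becomes ((f A) B):
assert add(3)(4) == 7
# Partial application comes for free:
add3 = add(3)
assert add3(10) == 13
```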

The lambda calculus is powerful enough to encode other data types and operations.

Define

#t =def=
lx.ly.x

#f =def= lx.ly.y

In other words, #t and #f are two-argument functions that just return the first and second argument, respectively. Then we can take "if" to be the identity function!

if =def= lx.x

(if #t
B C) = ((if #t) B C) =>
(#t B C) = ((lx.ly.x
B) C)

=> (ly.B C) =>
B

(if #f
B C) = ((if #f) B C) =>
(#f B C) = ((lx.ly.y
B) C)

=> (ly.y C) =>
C.

We can define other Boolean functions:

and =def=
lx.ly.xyx

or =def=
lx.ly.xxy

Then for example

(and #t #f) => (#t #f #t) = (lx.ly.x #f #t) => #f.
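These Boolean encodings can be transcribed into Python lambdas (a sketch; the uppercase names are ours). Eager evaluation is harmless here because the "branches" are already-evaluated values.

```python
TRUE  = lambda x: lambda y: x        # #t = lx.ly.x
FALSE = lambda x: lambda y: y        # #f = lx.ly.y
IF    = lambda x: x                  # if = lx.x

AND = lambda x: lambda y: x(y)(x)    # and = lx.ly.xyx
OR  = lambda x: lambda y: x(x)(y)    # or  = lx.ly.xxy

assert IF(TRUE)('B')('C') == 'B'     # (if #t B C) => B
assert IF(FALSE)('B')('C') == 'C'    # (if #f B C) => C
assert AND(TRUE)(FALSE) is FALSE     # (and #t #f) => #f
assert OR(FALSE)(TRUE) is TRUE
```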

In Scheme, all cons does is package up two things in such a way that they can be extracted again by car and cdr. To code this in the lambda calculus, we can take two terms A and B and produce a function that will apply another function to them.

cons =def= lx.ly.lc.cxy

Thus if we apply this to A and B we get

(cons A B) => lc.(c A B)

Now we can extract A and B again by applying this function to #t and #f, respectively:

(lc.(c
A B) #t) => (#t A B) = (lx.ly.x
A B) => A

(lc.(c A B) #f) => (#f A B) = (lx.ly.y
A B) => B

so we should define

car =def=
ld.(d #t)

cdr =def=
ld.(d #f).
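The pairing encoding works the same way in Python lambdas (a sketch; the uppercase names are ours):

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y

CONS = lambda x: lambda y: lambda c: c(x)(y)   # cons = lx.ly.lc.cxy
CAR  = lambda d: d(TRUE)                       # car  = ld.(d #t)
CDR  = lambda d: d(FALSE)                      # cdr  = ld.(d #f)

p = CONS('A')('B')                             # (cons A B) => lc.(c A B)
assert CAR(p) == 'A'
assert CDR(p) == 'B'
```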

If we define the null list object appropriately, we can get a null? test:

null =def=
lx.#t

null? =def= ld.(d
lx.ly.#f).

Then

(null? null) = (ld.(d lx.ly.#f) lx.#t) => (lx.#t lx.ly.#f) => #t

(null? (cons A
B)) = (ld.(d lx.ly.#f)
lc.(c A B))

=> (lc.(c A B) lx.ly.#f)

=> (lx.ly.#f A B)

=>
#f.
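The null object and null? test, again in Python lambdas (a sketch; names are ours):

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
CONS  = lambda x: lambda y: lambda c: c(x)(y)

NULL  = lambda x: TRUE                             # null  = lx.#t
NULLP = lambda d: d(lambda x: lambda y: FALSE)     # null? = ld.(d lx.ly.#f)

assert NULLP(NULL) is TRUE
assert NULLP(CONS('A')('B')) is FALSE
```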

In Scheme,
numbers are a built-in primitive type. In the lambda calculus, we have to
encode them as lambda terms. Church came up with a neat way to do this and his encoding is
called the *Church numerals*. The idea is to encode the number n as a lambda
term n representing a function
that takes another function f as input and produces the n-fold composition of f with
itself.

0
=def= lf.lx.x

1 =def= lf.lx.fx

2 =def= lf.lx.f(fx)

3 =def= lf.lx.f(f(fx))

...

n =def=
lf.lx.f(...(f(f(fx)))...)

Note that 0
is just #f after a-conversion. Let's write f^{n}x
as an abbreviation for the term f(...(f(f(fx)))...). Then

n
=def= lf.lx.f^{n}x.

This is nice, but in order to do anything with it, we have to encode arithmetic. Let's start with the simplest operation: successor. We would like to define a lambda term add1 that when applied to any n produces n+1. Note that

nfx
=> f^{n}x,

so

f (nfx)
=> (f f^{n}x) = f^{n+1}x.

Lambda abstracting, we get

lf.lx.f(nfx)
=> lf.lx.f^{n+1}x = n+1.

Therefore we want our successor function to produce lf.lx.f(nfx) from n, so we define

add1 =def= ln.lf.lx.f(nfx).
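Church numerals and the successor can also be run in Python lambdas (a sketch; the names and the `to_int` decoder are ours, added just to inspect results):

```python
ZERO = lambda f: lambda x: x                      # 0 = lf.lx.x
ADD1 = lambda n: lambda f: lambda x: f(n(f)(x))   # add1 = ln.lf.lx.f(nfx)

def to_int(n):
    """Decode n by applying it to the ordinary successor on Python ints."""
    return n(lambda k: k + 1)(0)

ONE, TWO = ADD1(ZERO), ADD1(ADD1(ZERO))
assert to_int(ZERO) == 0
assert to_int(ONE) == 1
assert to_int(ADD1(TWO)) == 3
```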

How do we test for 0?

zero? =def= ln.(n lz.#f #t).

Then

(zero? 0)
= (ln.(n lz.#f
#t) lf.lx.x)

=> (lf.lx.x
lz.#f #t) => (lx.x #t)
=> #t

(zero? n+1)
= (ln.(n lz.#f
#t) lf.lx.f^{n+1}x)

=> (lf.lx.f^{n+1}x
lz.#f #t) => (lx.(lz.#f)^{n+1}x
#t)

=>
((lz.#f)^{n+1
}#t) => #f.
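The zero? test, transcribed into Python lambdas (a sketch; names are ours):

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
ZERO  = lambda f: lambda x: x
ADD1  = lambda n: lambda f: lambda x: f(n(f)(x))

ZEROP = lambda n: n(lambda z: FALSE)(TRUE)    # zero? = ln.(n lz.#f #t)

assert ZEROP(ZERO) is TRUE
assert ZEROP(ADD1(ZERO)) is FALSE
assert ZEROP(ADD1(ADD1(ZERO))) is FALSE
```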

Addition is just iterated successor. Note that

(n
add1) => lx.(add1^{n} x)

thus

(n
add1 m) =>
(lx.(add1^{n} x) m)
=> (add1^{n} m)
=> m+n.

This suggests we define

+ =def= ln.lm.(n add1 m).

Multiplication can be defined by iterated addition and exponentiation by iterated multiplication in a similar fashion.
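Addition as iterated successor, and multiplication as iterated addition in the same style, in Python lambdas (a sketch; names and the `to_int` decoder are ours):

```python
ZERO = lambda f: lambda x: x
ADD1 = lambda n: lambda f: lambda x: f(n(f)(x))
PLUS = lambda n: lambda m: n(ADD1)(m)            # + = ln.lm.(n add1 m)
MULT = lambda n: lambda m: n(PLUS(m))(ZERO)      # apply (+ m) n times to 0

def to_int(n):
    return n(lambda k: k + 1)(0)

TWO, THREE = ADD1(ADD1(ZERO)), ADD1(ADD1(ADD1(ZERO)))
assert to_int(PLUS(TWO)(THREE)) == 5
assert to_int(MULT(TWO)(THREE)) == 6
```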

Subtraction is a little more difficult, but it can be done. Define inductively

[] =def= null

[n n-1 n-2 ... 2 1 0] =def= (cons n [n-1 n-2 ... 2 1 0])

So [...] represents a proper list in the usual Scheme sense under our encoding. Our approach is to come up with a term F such that

nF[0] => [n n-1 n-2 ... 2 1 0].

Then we will be able to take

sub1 =def= ln.(if (zero? n) 0 (car (cdr (nF[0])))).

Note

nF[0]
=> (F^{n} [0])

so this will work provided for all n we have

(F [n n-1 n-2 ... 2 1 0]) => [n+1 n n-1 n-2 ... 2 1 0].

This can be achieved by defining

F =def= lx.(cons (add1 (car x)) x).
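The predecessor via this list trick, assembled from the earlier encodings in Python lambdas (a sketch; names are ours). Python is eager, so both branches of IF are computed; on n = 0 the "else" branch yields a junk value that IF simply discards.

```python
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y
IF    = lambda x: x
CONS  = lambda x: lambda y: lambda c: c(x)(y)
CAR   = lambda d: d(TRUE)
CDR   = lambda d: d(FALSE)
NULL  = lambda x: TRUE
ZERO  = lambda f: lambda x: x
ADD1  = lambda n: lambda f: lambda x: f(n(f)(x))
ZEROP = lambda n: n(lambda z: FALSE)(TRUE)

F = lambda x: CONS(ADD1(CAR(x)))(x)     # F = lx.(cons (add1 (car x)) x)
LIST0 = CONS(ZERO)(NULL)                # [0]
SUB1 = lambda n: IF(ZEROP(n))(ZERO)(CAR(CDR(n(F)(LIST0))))

def to_int(n):
    return n(lambda k: k + 1)(0)

THREE = ADD1(ADD1(ADD1(ZERO)))
assert to_int(SUB1(THREE)) == 2
assert to_int(SUB1(ZERO)) == 0
```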

A recursive definition in Scheme such as

```
(define fact
  (lambda (x)
    (if (zero? x)
        1
        (* x (fact (sub1 x))))))
```

defines the
function `fact`

in terms of itself. This self-referential
capability that we can achieve in Scheme at first seems impossible to simulate
in the lambda calculus because of the lack of names. However it can be
done.

Note that we are really looking for a term F that is a solution of the equation

F == lx.(if (zero? x) 1 (* x (F (sub1 x)))).

Put another way,
we are looking for a *fixpoint* of the function

T =def= lf.lx.(if (zero? x) 1 (* x (f (sub1 x)))),

that is, a term F such that TF == F.

Fixpoint theorems abound in mathematics. A whorl on your head where your hair grows straight up is a fixpoint. At any instant of time, there must be at least one spot on the globe where the wind is not blowing. For any continuous map f from the closed real unit interval [0,1] to itself, there is always a point x such that f(x) = x.

The lambda calculus is no exception. Any term W has a fixpoint

(lx.W(xx) lx.W(xx))

as can be seen by performing one b step:

(lx.W(xx) lx.W(xx)) => W(lx.W(xx) lx.W(xx)).

Moreover, there
is a lambda term Y, called the *fixpoint combinator*, which applied to any
W gives its fixpoint:

Y =def= lt.(lx.t(xx) lx.t(xx)).

Let's argue that the fixpoint of T above gives the factorial function. Define

fact =def= (lx.T(xx) lx.T(xx)).

We wish to show that for any n,

(fact n) => n!

We argue this by induction on n. Basis:

(fact 0)
=> ((lx.T(xx) lx.T(xx))
0) => (T(lx.T(xx)
lx.T(xx)) 0)

=> (lx.(if (zero? x) 1
(* x ((lx.T(xx) lx.T(xx))
(sub1 x)))) 0)

=> (if (zero? 0) 1
(* 0 ((lx.T(xx)
lx.T(xx)) (sub1 0))))

=> 1.

Induction step:

(fact n+1)
=> ((lx.T(xx) lx.T(xx))
n+1) => (T(lx.T(xx)
lx.T(xx)) n+1)

=> (lx.(if (zero? x) 1
(* x ((lx.T(xx) lx.T(xx))
(sub1 x)))) n+1)

=> (if (zero? n+1) 1
(* n+1 ((lx.T(xx)
lx.T(xx)) (sub1 n+1))))

=> (* n+1 ((lx.T(xx)
lx.T(xx)) (sub1 n+1)))

=> (* n+1 ((lx.T(xx)
lx.T(xx)) n))

= (* n+1 (fact n))

=> (* n+1 n!)
(by the induction hypothesis)

=> (n+1)!
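The fixpoint construction can be sketched in Python with ordinary ints and Booleans rather than Church encodings. Under Python's eager evaluation the term Y as written above diverges, so this sketch uses the standard eta-expanded variant (the Z combinator):

```python
# Z is Y with the self-application eta-expanded so evaluation is delayed.
Z = lambda t: (lambda x: t(lambda v: x(x)(v)))(lambda x: t(lambda v: x(x)(v)))

# T = lf.lx.(if (zero? x) 1 (* x (f (sub1 x))))
T = lambda f: lambda x: 1 if x == 0 else x * f(x - 1)

fact = Z(T)       # a fixpoint of T
assert fact(0) == 1
assert fact(5) == 120
```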