Decoding Lambda Calculus Terms
I find the idea of programming in pure untyped lambda calculus absurd and amusing. I ran into a roadblock with my interpreter, though: the encoded versions of values in lambda calculus are not super friendly for human eyes. Even natural numbers take a second to parse, e.g. $\lambda f x.f (f x)$ for 2. Wouldn’t it be nice if the console could print values in their decoded form? (As an aside, everything is an encoding; we’re simply so used to seeing 2 that we unbox it somewhat automatically as the abstract concept of 2.)
My first idea was to just analyze the syntax tree. This is pretty simple: given a Church numeral with variable $f$, we can just count the number of $f$’s (I sketch this in code below, once we have the syntax tree type in hand). However, note that this only works if the number is in exactly the form we expect - it cannot be an un-reduced form of the same concept. This isn’t a huge issue, as we can just reduce the expression prior to decoding it, but it is a bit inelegant in my opinion. Couldn’t there be some way to represent this translation in a more pure way? After all, the encoding must have some meaning that can be accessed from within lambda calculus, or it would be useless.
I had to chew on this for a while. Continuing the example of Church numerals, a natural number $n$ is encoded as repeated application of an argument $f$ to an argument $x$ - $n$ times, specifically. If we could let $x = 0$, and $f$ be the successor function $y\rightarrow y + 1$, we could retrieve the decoded number from the reduced version of the expression. In a pseudo-Haskell notation, given the encoding $n$, we reduce the expression $n\ (y\rightarrow y+1)\ 0$. This would be nice, but our current grammar cannot support this notion of embedding typed expressions into lambda calculus computations. To review, we have the following data type:
data Expr = Var Name
          | App Expr Expr
          | Lam Name Expr
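To make that counting idea concrete, a purely syntactic decoder over this type might look like the following sketch (assuming Name is a String synonym, or at least has an Eq instance):
-- Expects exactly the shape \f. \x. f (f (... (f x))) and counts the f's.
countChurch :: Expr -> Maybe Int
countChurch (Lam f (Lam x body)) = go body
  where
    go (Var v)         | v == x = Just 0
    go (App (Var g) e) | g == f = succ <$> go e
    go _                        = Nothing
countChurch _ = Nothing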
My first idea was to parameterize the expression with some type $a$ - the type of the embedded computations. Then our data type looks like this:
data Expr a = Var Name
            | App (Expr a) (Expr a)
            | Lam Name (Expr a)
            | Embedded a
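For instance, the numeral $\lambda f x.f (f x)$ from earlier would be written (again assuming Name is a String synonym) as:
-- Church numeral 2 as a syntax tree; the type parameter stays free
-- because no Embedded values appear.
churchTwo :: Expr a
churchTwo = Lam "f" (Lam "x" (App (Var "f") (App (Var "f") (Var "x"))))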
The idea here is that the reduction gives you some final expression Embedded x, and you just extract the inner value. Unfortunately, the devil is in the details.
How do we encode typed functions here? The Lam constructor gives us a way to represent untyped functions, but we would need to embed the typed functions within the type parameter. For a natural number, maybe we would represent the intermediate computation as the following:
data IntermediateNat = Value Int
                     | Increment
With this, we can specify how to combine our typed values:
combine :: IntermediateNat -> IntermediateNat -> IntermediateNat
combine Increment (Value n) = Value (n + 1)
combine (Value n) (Value m) = ???
Crap. The extra rigidity of a type system means we cannot accept every
combination of applications. There are two options here: we can leave that
as undefined behavior, causing a runtime error, or we can make the possibility
of failure explicit. Whenever possible, we would like to represent all expected
states explicitly within the type system, because it’s nicer to deal with Maybe than with runtime errors. So, let’s do that:
combine :: IntermediateNat -> IntermediateNat -> Maybe IntermediateNat
combine Increment (Value n) = Just (Value (n + 1))
combine _ _ = Nothing
Maybe we can pass these combine functions as an argument to the reduce function. It’s an algebra of some type - a partial semigroup, maybe? There is an additional case to consider when searching for a redex: specifically, App (Embedded Increment) (Embedded (Value n)) should now also be considered a redex. I explored the possibility of only accepting these embedded redexes when they are valid (e.g., don’t accept App (Embedded (Value n)) (Embedded (Value m))), but I ran into trouble with that idea later.
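As a small sketch (the function name here is mine, not the interpreter’s), contracting such an embedded redex just defers to the supplied algebra:
-- An application of two embedded values is the new kind of redex;
-- contracting it defers to the user-supplied algebra.
contractEmbedded :: (a -> a -> Maybe a) -> Expr a -> Maybe (Expr a)
contractEmbedded f (App (Embedded x) (Embedded y)) = Embedded <$> f x y
contractEmbedded _ _                               = Nothing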
Ok, so it looks like we can write a Church encoding decoder. Given an expression that should be an encoding of a Church numeral, we apply that function to the increment and zero embeddings.
decodeChurch :: Expr IntermediateNat -> Maybe Int
decodeChurch n =
  let e = App (App n (Embedded Increment)) (Embedded (Value 0)) in
  ...do the reduction and extract the value...
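Spelled out, that might look something like the sketch below; reduceWith is a hypothetical stand-in for the interpreter’s reducer, which takes the algebra and performs normal-order reduction (contracting embedded redexes along the way):
-- Stand-in for the real reducer, which lives in the interpreter.
reduceWith :: (a -> a -> Maybe a) -> Expr a -> Maybe (Expr a)
reduceWith = undefined

decodeChurch :: Expr IntermediateNat -> Maybe Int
decodeChurch n = do
  reduced <- reduceWith combine (App (App n (Embedded Increment)) (Embedded (Value 0)))
  case reduced of
    Embedded (Value k) -> Just k   -- fully reduced to an embedded number
    _                  -> Nothing  -- the input wasn't really a Church numeral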
At this point, I thought I had basically figured out decoding data types. I then turned my attention to decoding lists. This turned out to be much harder. Recall that a list $[x_1,\ldots,x_n]$ is encoded as \(\lambda s_1.s_1\ x_1\ (\lambda s_2.s_2\ x_2\ (\ldots(\lambda s_n.s_n\ x_n\ \mathit{nil})\ldots))\)
This is trickier, because it’s not as simple as applying a function to a static
“zero” element; we need to get the $x$’s back out. It looked a bit impossible at
first, because it seemed like we would need to be doing $n$ operations at once.
However, recall that a linked list consists of a cons of the head of the list and the tail of the list (in Haskell, the cons function is (:)). The bracket notation is just syntactic sugar. We can desugar our list as $x_1:x_2:\ldots:x_n:[]$ and, even further (by turning infix (:) into prefix), as (:) x1 ((:) x2 (...((:) xn [])...)). If you’re familiar with functional programming, this should remind you of folds. Luckily, a fold is a pure recursive function, which can thus be represented in lambda calculus. All we need now is to figure out how to encode the Haskell cons function and intermediate Haskell lists.
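To spell out the fold correspondence in plain Haskell: substituting (:) for the fold function and [] for the base case is the identity on lists.
-- foldr f z (x1 : x2 : ... : xn : []) = f x1 (f x2 (... (f xn z) ...)),
-- so folding with (:) and [] just rebuilds the list.
foldRebuild :: Bool
foldRebuild = foldr (:) [] [1, 2, 3] == ([1, 2, 3] :: [Int])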
The other tricky part of lists is that they’re parameterized over the type of element in the list. There are two approaches to this. We can either first get a list of subexpressions, and then run a conversion function over the subexpressions, or we can run the conversion function over the subexpressions, and then decode the list. I somewhat arbitrarily ran with the former option.
Notice that we currently have no way of getting a Haskell list of expressions, because the algebras we use only operate on decoded values. For this, I decided to modify the definition of Expr to the following:
data TypedExpr a = Var Name
                 | App (TypedExpr a) (TypedExpr a)
                 | Lam Name (TypedExpr a)
                 | Typed a
                 | TypedFunc (a -> Maybe (TypedExpr a))
                 | UntypedFunc (TypedExpr a -> Maybe (TypedExpr a))
It’s also helpful to have a type for the old, pure lambda expressions - let’s call that the UntypedExpr type.
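As a quick example (the name is mine), the successor algebra from the Church numeral decoder can now live directly inside an expression as a typed function:
-- Successor embedded as a typed function; applying it to Typed 2
-- should contract to Typed 3 once we handle the new redexes below.
succExpr :: TypedExpr Int
succExpr = TypedFunc (\n -> Just (Typed (n + 1)))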
This representation will allow us to define functions of any arity. We now have two new kinds of redexes to worry about, on top of the ordinary beta-redex. We can define a new data type to handle all of these kinds of redexes:
data Redex a = UntypedRedex (Context a) Name (TypedExpr a) (TypedExpr a)
             | MixedRedex (Context a) (TypedExpr a -> Maybe (TypedExpr a)) (TypedExpr a)
             | TypedRedex (Context a) (a -> Maybe (TypedExpr a)) a
The context is an expression with a “hole” in it marking the place where the redex came from. As an aside, I used to represent this as a function TypedExpr a -> TypedExpr a, but that doesn’t enforce the constraint that the hole can only appear once. We can use the following representation:
data Context a = LamContext Name (Context a)
               | LeftAppContext (Context a) (TypedExpr a)
               | RightAppContext (TypedExpr a) (Context a)
               | Here
We can use some simple structural induction to prove this constraint. The leaf node Here clearly has exactly one hole. Given a non-terminal context $C$, assume each of its sub-contexts has exactly one hole. Because every non-terminal Context constructor has exactly one Context subexpression, we know that $C$ also has exactly one hole.
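For concreteness, here is a sketch of the plugging operation this representation supports (the name plug is my own choice):
-- Fill the single hole of a context with an expression.
plug :: Context a -> TypedExpr a -> TypedExpr a
plug Here                  e = e
plug (LamContext x c)      e = Lam x (plug c e)
plug (LeftAppContext c r)  e = App (plug c e) r
plug (RightAppContext l c) e = App l (plug c e)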
Let’s get back to business, and lists. First, let’s define the empty list expression:
emptyListExpr :: TypedExpr [a]
emptyListExpr = Typed []
That was simple - it’s just a typed value we embed in an expression. The cons expression is more complicated, because it takes two arguments. The first argument will be an untyped value (a TypedExpr [a]), and the second will be a typed value (a [TypedExpr [a]]). We would like to write the following:
consListExpr :: TypedExpr [a] -> [TypedExpr [a]] -> [TypedExpr [a]]
consListExpr = (:)
However, consListExpr is not an expression; it is a Haskell function that returns an expression. As such, we cannot embed it in an expression. Using the new representation, we can define a closed-form expression as follows:
consListExpr :: TypedExpr [a]
consListExpr =
  UntypedFunc (\x -> Just (
    TypedFunc (\xs -> Just (
      Typed (x:xs)
    ))))
I’m not a huge fan of the syntax here. It seems like there might be some nice way to hide this behind do-notation, but we can save that for another time.
Now, given an encoded list xs, we can construct an expression that will reduce to a typed value: foldr # consListExpr # emptyListExpr # xs (here, I introduce the alias (#) = App).
So we’re done, right? Well, I thought so, but… no. On a non-empty list, foldr f z (x:xs) = f x (foldr f z xs). In normal-order reduction, f is applied before the recursive foldr call is evaluated. Unfortunately, the un-applied recursive foldr call is not a typed list yet - it only becomes one once it’s applied. We have two options: switch to applicative order (ew), or restructure the computation so that, by the time f is called, the second argument is already a typed list. What if we look to foldr’s cousin, foldl?
On a non-empty list, foldl f z (x:xs) = foldl f (f z x) xs. Now, f is not applied to non-reduced terms. As a side effect, we can’t use cons anymore, because doing so would produce the list in reversed order:
foldl (flip (:)) [] [x1,...,xn]
  = flip (:) (... (flip (:) (flip (:) [] x1) x2) ...) xn
  = xn : ... : x2 : x1 : []
  = [xn, ..., x1]
All we have to do to fix this is switch from cons to snoc. As an implementation detail, we can use difference lists internally, because snoc is $O(n)$ on regular lists, but $O(1)$ for difference lists.
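To make that last step concrete in plain Haskell, independent of the TypedExpr machinery, here is the difference-list version of snoc that foldl would fold with:
-- A difference list represents a list as the function that prepends it.
type DList a = [a] -> [a]

emptyD :: DList a
emptyD = id

-- snoc is just composition on the right: O(1).
snocD :: DList a -> a -> DList a
snocD acc x = acc . (x :)

-- Recover an ordinary list by applying to [].
toList :: DList a -> [a]
toList d = d []

-- foldl with snoc keeps the original order (no flip needed):
checkOrder :: Bool
checkOrder = toList (foldl snocD emptyD [1, 2, 3]) == ([1, 2, 3] :: [Int])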