I read the paper Abstract Models of Memory Management from Morrisett, Felleisen, and Harper (1995) today. And it contained some very interesting things about Garbage Collection.

The idea of this paper is that we can invent a formal language to talk about Memory Management at a higher level than bits, bytes, and pointers. Once we can speak at a high level of objects, intentions, and computation as rule reductions our mental landscape is made clearer so the interesting things we want to say are easier to express. This is my primary interest in the world of computer languages: taking away the burden of knowing about computers so effort can be focusing understanding your problems and their potential solutions. This is also why I prefer Lispy languages, and why the authors of this paper chose lambda-calculus as a computation model. It makes it easy to build domain specific languages with powerful, understandable semantics and clear, consistent syntax.

There was this great gem about the impossibility of a finding a perfectly optimal garbage collector by reducing optimal garbage collection to the Halting Problem, the pedagogical undecidable problem. (Notes: lambda-gc is a variation of the lambda-calculus for the purposes of use in this paper, but the idea can be applied to any language really.)

Proposition 2.4 (Garbage Undecidable)

Determining if a binding is garbage in an arbitrary closed lambda-gc program is undecidable.

Proof (sketch):

We can reduce the halting problem to an optimal collector by taking an arbitrary program, adding a binding to the heap and modifying the program so that if it terminates, it accesses the extra binding. An optimal collector will collect the binding if and only if the original program does not terminate.This is very cool. I like finding out that certain things are undecidable, because they we don’t have to feel bad about ourselves because we can’t figure out what the optimal solution would be. Although, we get instead the assertion that we’ll never agree on what the best method is and you never quite know if your algorithm if the best.

The second thing that I thought was very clever was this illustration of why having your garbage collector know something about your type system is a good thing. Most garbage collectors just look at THAT pointers go, not WHERE pointers go or WHY, by having your GC support Type Inference it can garbage collect values that will never be touched but *could* be.

Remember, lambda-calculus is functional so we don’t have to worry about functions and variables changing behind our back. (Also note: Here (P i e) means that e is a list and we want the i-th element. So if e = (1 2) then (P 1 e) = 1.)

For a simple example, consider the following lambda-gc program:

letrec {x1 = 1, x2 = 2, x3 = (x2, x2), x4 = (x1, x3) } in x4.1Every binding in the heap is accessible from the program’s expression (p1 x4), so the free-variable based collection rules can collect nothing. But clearly the program will never dereference x3 nor x2. The inference-based collection scheme described in this section will allow us to conclude that replacing the binding x3 = (x2, x2) with x3 = 0 (or any other binding) will have no observable effect on evaluation. That is, the inference collection scheme shows that

letrec {x1 = 1, x2 = 2, x3 = 0, x4 = (x1, x3)} in p1 x4is Kleene equivalent to the original program. Now by applying the free-variable rule, we can conclude that the binding x2 = 2 can be safely collected.

I like this because it shows that by making more formal languages with more type-annotations (and my imaginations imagines other formalisms that would help) we can build more intelligent garbage collectors (and other language features) to make computation: less expensive, faster, and more expressive.