Wednesday 29 August 2007

Fakin' closures

Closures are awesome. There are so many algorithms that can be implemented in a very straightforward manner once you have the ability to make little customised functions and pass them around.

For example, it makes functional programming in D a snap. Here's a simple example of what I mean:

string format_list(real[] parts, string sep)
{
    string reduce_closure(string left, string right)
    {
        return left ~ sep ~ right;
    }

    string map_closure(real n)
    {
        return toString(n);
    }

    return reduce(&reduce_closure, map(&map_closure, parts));
}

The one failing of D's closures are that... they aren't really closures. See, D's closures are made up of a function pointer, and a pointer to the enclosing function's stack pointer. The stack pointer is what allows you to actually create closures, but it's also why you must not use them outside the enclosing scope's lifetime. As soon as your closure goes out of scope, it's bye-bye stack frame, and hello Mr. Segmentation Fault if you should try to call it. Oh dear.

In practice, this can be worked around by putting your closure inside a class and heap allocating it, but that's annoying. Wouldn't it be nice to be able to just return the damn things out of your functions, or store them somewhere for later use?

If you're easily put off my horrible, hackish code: you'd best leave now. It's going to get nasty.

Remember how I said that closures are made up of a pointer to some code and the enclosing function's stack pointer? Well, on x86 the stack grows downwards, and a function's arguments are stored just above the stack pointer with its local variables below the stack pointer.

Say we have a function foo, with a closure bar, and another function baz.

If we call baz from foo, passing our closure bar, then baz would know two important things:

  1. It would have an upper-bound on foo's local variables by virtue of knowing its stack pointer (which is contained in the delegate to bar) and

  2. it would have a lower-bound based on the address of its own first argument on the stack.

So if we have both a lower and upper bound, we can create a heap copy of foo's local variables, and store that in our delegate instead of the pointer to the stack.

Just to prove the concept, here's a working implementation:

module closures;

import std.stdio;

/**
 * Takes a closure defined in the caller, copies the local variables onto the
 * heap, and returns a modified delegate.
 */
dgT closure(dgT)(dgT dg)
{
    void* end_ptr = dg.ptr;
    void* start_ptr = &dg;

    auto context = new ubyte[end_ptr-start_ptr+1];
    context[] = cast(ubyte[]) start_ptr[0..context.length];

    dg.ptr = &context[$-1];
    return dg;
}

/**
 * Creates a closure that sets *ptr to value when called.
 */
void delegate() make_dg(uint* ptr_, uint value_)
{
    auto ptr = ptr_;
    auto value = value_;

    void dg()
    {
        *ptr = value;
    }

    return closure(&dg);
}

void main()
{
    uint f;

    auto fn1 = make_dg(&f, 42);
    auto fn2 = make_dg(&f, 1701);

    f = 15;
    writefln("f: %s", f);
    fn1();
    writefln("f: %s", f);
    fn2();
    writefln("f: %s", f);
}

When run, it prints:

f: 15
f: 42
f: 1701

Just like one would expect. As I mentioned before, this only works if your closure doesn't access the enclosing function's arguments. But apart from that, it's a pretty neat trick, don'tcha think?

Well, hey, let's just make everything into a closure, and then we'll have our general garbage collector, installed by 'use less memory'. —Larry Wall

4 comments:

Anonymous said...

Here's how you would do something similar using the Tools package from scrapple:

xt4100 ~/d $ cat test2.d && echo "------" && gdc test2.d -o test2 scrapple/trunk/tools/tools/*.d -Iscrapple/trunk/tools && ./test2
module test2;
import std.stdio, tools.ext;
void delegate() make_dg(uint *target, uint what) {
return (ref uint *t, ref uint w) { *t=w; }~fix(target, what);
}

void main() {
uint f;
auto fn1=make_dg(&f, 42);
auto fn2=make_dg(&f, 1701);
f=15; writefln(f);
fn1(); writefln(f);
fn2(); writefln(f);
}
------
15
42
1701
The disadvantage is that you need to explicitly specify the external values you want to use. The advantage is that it stays away from the stack (or is that a disadvantage?)
std.bind works too, although I'd say that your trick has the advantage of being significantly neater. :)

--downs (who, coincidentally, wrote the tools package ^^)

Dk said...

Cool; that approach is probably a lot safer than mine is.

To be honest, I'd be much happier if delegates grew some kind of .dup method, but there you go.

Anonymous said...

I'm affraid it doesn't work at all.

With compiler v1.010, it would work as intended in debug mode but not in release, giving 15, 42, 165. As for v1.024, both debug and release give 15, 42, 165.

Anonymous said...

I used a mixin-expression and a ctfe-function to generate something along the lines of:

string var1 = "i am the context";
int var2 = 5;

&( ((new class Object {
this(string var1, int var2) { /* ... */ }
void run() {
/* closure body pasted here */
}
})(var1, var2) ).run );

which is also really safe but very complicated to implement and not that easy to use because you have to write the function body inside a string and have to explicitely select the variables you want to "import".

-- asmanian