Wednesday, 29 August 2007

Fakin' closures

Closures are awesome. There are so many algorithms that can be implemented in a very straightforward manner once you have the ability to make little customised functions and pass them around.

For example, it makes functional programming in D a snap. Here's a simple example of what I mean:

string format_list(real[] parts, string sep)
{
    string reduce_closure(string left, string right)
    {
        return left ~ sep ~ right;
    }

    string map_closure(real n)
    {
        return toString(n);
    }

    return reduce(&reduce_closure, map(&map_closure, parts));
}

The one failing of D's closures are that... they aren't really closures. See, D's closures are made up of a function pointer, and a pointer to the enclosing function's stack pointer. The stack pointer is what allows you to actually create closures, but it's also why you must not use them outside the enclosing scope's lifetime. As soon as your closure goes out of scope, it's bye-bye stack frame, and hello Mr. Segmentation Fault if you should try to call it. Oh dear.

In practice, this can be worked around by putting your closure inside a class and heap allocating it, but that's annoying. Wouldn't it be nice to be able to just return the damn things out of your functions, or store them somewhere for later use?

If you're easily put off my horrible, hackish code: you'd best leave now. It's going to get nasty.

Remember how I said that closures are made up of a pointer to some code and the enclosing function's stack pointer? Well, on x86 the stack grows downwards, and a function's arguments are stored just above the stack pointer with its local variables below the stack pointer.

Say we have a function foo, with a closure bar, and another function baz.

If we call baz from foo, passing our closure bar, then baz would know two important things:

  1. It would have an upper-bound on foo's local variables by virtue of knowing its stack pointer (which is contained in the delegate to bar) and

  2. it would have a lower-bound based on the address of its own first argument on the stack.

So if we have both a lower and upper bound, we can create a heap copy of foo's local variables, and store that in our delegate instead of the pointer to the stack.

Just to prove the concept, here's a working implementation:

module closures;

import std.stdio;

/**
 * Takes a closure defined in the caller, copies the local variables onto the
 * heap, and returns a modified delegate.
 */
dgT closure(dgT)(dgT dg)
{
    void* end_ptr = dg.ptr;
    void* start_ptr = &dg;

    auto context = new ubyte[end_ptr-start_ptr+1];
    context[] = cast(ubyte[]) start_ptr[0..context.length];

    dg.ptr = &context[$-1];
    return dg;
}

/**
 * Creates a closure that sets *ptr to value when called.
 */
void delegate() make_dg(uint* ptr_, uint value_)
{
    auto ptr = ptr_;
    auto value = value_;

    void dg()
    {
        *ptr = value;
    }

    return closure(&dg);
}

void main()
{
    uint f;

    auto fn1 = make_dg(&f, 42);
    auto fn2 = make_dg(&f, 1701);

    f = 15;
    writefln("f: %s", f);
    fn1();
    writefln("f: %s", f);
    fn2();
    writefln("f: %s", f);
}

When run, it prints:

f: 15
f: 42
f: 1701

Just like one would expect. As I mentioned before, this only works if your closure doesn't access the enclosing function's arguments. But apart from that, it's a pretty neat trick, don'tcha think?

Well, hey, let's just make everything into a closure, and then we'll have our general garbage collector, installed by 'use less memory'. —Larry Wall

Sunday, 26 August 2007

The Future of D Is Aaargh, My Eyes!

So the first ever D Conference has come and gone. Sadly, being a poor uni student, Australian and busy like an anime nerd in Akihabara with 100,000 spare yen meant I couldn't go.

Thankfully, the ever-wonderful Brad Roberts has posted up most of the slides from the various speakers so the rest of us can take a peek. One of particular interest1 is the set from Walter and Andrei's talk on the future of D.

I encourage you to read the full set of slides, but here's what I think of it (for what that's worth.)

First up, some very welcome additions to the language that will make every-day programming a lot nicer. There's function and template overloading which will finally allow you to do this:

void foo(int i);
void foo(T)(T* t);

What can I say but "finally!"? Also on the topic of overloads are function overload sets. Currently, if you import overloaded functions from more than one module (for instance: you import two modules that both have a global toString function), you need to either fully qualify the particular overload you're interested in, or manually alias in each module's overloads.

Function overload sets do away with this provided each overload is distinct. It's a small thing, but it's the small things that make programming in D so much more pleasant than in C or C++2.

Another of my pet hates is going the way of spelling and grammar online: having to qualify enumeration members. Now you'll be able to elide the enum's name before a member in cases where the compiler already knows the enum's type. Thank god for that.

Then there's the upgrade to switch. One thing I love about D's switch is that by default it will loudly complain (by crashing your program) if you don't have a case to handle the value you're switching on. Walter takes it one step further with final switch which will actually refuse to compile if you don't have a case for every possible value.

Obviously more useful for enums and subsets of integers than, say, strings. But that's OK. No one's perfect.

And while we're on upgraded constructs: static foreach. About time. Not that you can't do it with a range tuple template, but this is much cleaner4

So that's the day-to-day stuff. What about the seriously cool stuff we can dangle in front of C++ and Java programmers to make them cry on the inside?

Well, first up we've got the construct that comes up every few months on the newsgroup: type method extensions. Or, as Walter calls it, uniform function call syntax.

foo(a, args...);
a.foo(args...);

Those two will now be interchangeable. And it'll work with built-in types like int, too. Having suggested this myself a few times, I couldn't be happier about this coming.

It seems that another idea that I've shared with many people is being worked on: pure functions. The concept of "pureness" comes from functional programming. A pure function is one whose output depends only on its arguments, and has no side-effects.

Take, for instance, the following function:

int foo(int a, int b)
{
    return a + b;
}

Yes, it's contrived, but stay with me. This function's result depends entirely on its inputs, and has no side-effects; it's pure. What does this mean?

  • It means that the compiler only ever has to call it once. If it sees the same function call in two places in a function, it can simply compute the result once, and reuse it. You can't do this with regular functions because the compiler can't guarantee that the function doesn't have side-effects.
  • More interestingly, it means that the compiler can actually cache results from the function for future re-use. It could even theoretically go the whole hog and just pre-compute every possible result at compile-time.
  • Best of all, because pure functions have no side-effects and don't depend on or alter global state, they can be automatically parallelised. Suddenly, it just got a whole lot easier to make use of all those cores we have these days.

But wait, there's more! After bitching and whining about it for years, structs are finally getting constructors and destructors!

That's fantastic. Know what's even better? They're getting a copy operator overload, too. This (along with the awesome alias some_member this; declaration) means that D will finally be able to easily support the last major memory-management technique: reference-counting. I can see this being huge in the game development circles.

There are also more new operator overloads: we're also getting opImplicitCastTo and opImplicitCastFrom. This means we'll be able to construct custom types that can seamlessly "pretend" to be other pre-existing types6.

This also ties into a new concept being introduced called polysemous values: these are values that have an indeterminant type7. Things like cast(int)1 + cast(uint)2, or a type with multiple implicit casts. This allows the compiler to reason about values for which it can't immediately determine their type, deferring the decision until later.

We're also getting a new array type that's distinct from slices; slices will work the same as they do now, except you won't be able to modify their length property to resize them, whilst you will be able to with arrays. Anyone who has been caught by obscure and baffling bugs when you start accidentally resizing slices will appreciate this; it makes D's arrays that much tighter and safer.

A feature that I actually poo-poohed a few days ago is getting in: struct inheritance. Sorry, "inheritance." It means that structs will be able to implement interfaces, complete with static checks to make sure all the methods are there, but won't be castable down to that interface. Kind of like C++ concepts for structs.

Another feature that's got me salivating in anticipation are static function parameters. That's where you mark one or more parameters to a function as being static, meaning that the value is "passed" at compile-time. It allows you to write functions that execute partially at compile-time. Think of it as a cross between CTFE and regular functions.

This also means you will be able to write functions that have different behaviours depending on whether some of the arguments are known at compile-time or not. That means you could use a slower but CTFE-compatible algorithm to perform some of the function at compile-time without having to worry about users accidentally using the slower version at runtime.

Then comes the big one: AST macros. These are kind of like templates except that instead of operating on specific types of values, they operate on bits of code. The simple way to think about it is this: when you pass something to a template, the compiler goes off and works out what that expression means, then gives it to the template. For macros, on the other hand, the compiler just hands the macro the unparsed expression and says "here, you work this out."

I have a funny feeling Don Clugston's already incredible BLADE library is going to be even more impressive once we get AST macros. If this keeps up, we may even be able to finally kill FORTRAN8.

Now, as I mentioned in an earlier post, I was looking forward to writing a neat shell-style variable expansion macro when all this came around. Except Walter's gone and added it into the language itself.

macro say(s) { ... }
auto a = 3, b = "Pi", c = 3.14;
say("$a musketeers computed $b to be $c\n");

Not quite as flexible as what I had in mind, but still. Pants.

Other interesting additions include:

  • the creation of a standard template library,
  • a nothrow contract on functions,
  • the order of function argument evaluation being defined,
  • three new string literal forms:
    • delimited strings,
    • code strings (which would have been so much more useful before macros) and
    • heredoc strings, and
  • a special return storage class for function arguments that let you create both const and non-const versions of a function without having to duplicate code.

Whew! Needless to say, I'm stoked about all this. To quote a famous (and sadly, very dead) Australian: I'm excited!

This really crystallises why I threw my lot in with D for me: because unlike C which is basically dead, C++ which is bloated, arthritic and has eight heads and Java which makes me want to beat myself over the head with a brick...

D makes programming fun. It doesn't just let me tell the computer what to do, it makes it easy. It's like statically-typed Python in a lot of ways.

But I do have to disagree with Walter on one point: the future of D isn't bright.

It's blinding.

[1]: Not to say they aren't all interesting, mind you.

[2]: The big things make programming in D more powerful, safe and flexible. And they cure baldness, too3.

[3]: Not speaking from personal experience, mind you.

[4]: So clean that if you use it twice a day, it'll make your teeth all sparkly.5

[5]: Just be careful not to get any on your gums... it's the tiny shards of glass, y'see...

[6]: It also vindicates the decision to use the opX naming system instead of C++'s operator X notation; implicit casting doesn't have a symbol.

[7]: I would have been tempted to call them "quantum values" or "entangled values" or even "boxed cats". But maybe that's just me.

[8]: Unlikely, but it's always nice to dream. I mean, some people voluntarily use Java!

Blinded by the light; revved up like a Deuce, another runner in the night... Madman drummers bummers, Indians in the summer with a teenage diplomat and yes I do know all the words to this damn song thankyouverymuch.