Sunday, 4 November 2007

Firefox: How to Crash in Style

So Firefox just crashed on me. Again. So why, exactly, am I grinning like an idiot?

Back in the appreciatively nostalgic days, when Phoenix crashed, it just vanished from your desktop with maybe a generic "This program just lost the page you spent an hour trying to find with Google: sucks to be you" dialog and that's it. Firefox 2 made things a little more bearable by keeping track of what tabs you had open. In the event of a crash, it could then restore your session most of the time.

The one bug-bear was the damned crash reporter. It made you fill in all sorts of stuff, and click buttons, and half the time it just sat there twiddling its thumbs complaining about how it couldn't access the internet.

Enter the current alpha of Firefox 3. This thing makes crashing almost a total non-issue. I was reading through some old Coding Horror posts when Firefox just up and died; vanished from the face of the desktop. A second or two later, it pops up this dialog saying "Aww jeeze; sorry about that. We screwed up, but we'll get right on it." It then gives me a little field to pop in my email address if I want to be kept in the loop, a quit button and a restart button.

I clicked restart, and a few seconds later it's like the crash had never happened. The best part of all is that in the background, it's sending crash data back to those tireless peeps at Mozilla so they can figure out why it crashed, and stop it from happening again.

Why is this so much better than anything else I've used? I think Mozilla have managed to do three things really right with this:

  1. There's no long speech about how sorry they are, and how my bug report will help them and that my privacy is always blah, blah, blah, where's the cancel button?

  2. They don't ask me for any information at all. The crash reporter is apparently smart enough to get everything they need.

  3. There are two prominent buttons: quit and restart. Clicking restart puts me straight back where I was without any fuss.

Contrast this with the OpenOffice.org crash experience[1]. When that crashes, it throws up a full-blown wizard, asking for details on what you were doing. When it restarts (eventually, since OOo is so damnably lethargic) it then throws another wizard at you asking you if you want to recover your work. Well, of course I want to recover my work. What a stupid question. And then when it's done that, it stops you again to let you know that it's finished. That's great; get out of my way, already!

Now, all this is not to say that I'm happy with how often Firefox 3 appears to be crashing; I sincerely hope it gets more stable before it gets released proper. But when Firefox does crash, it does what it can to minimise the impact and let you get on with what you were doing.

Really, in an ideal world, this is how all software would work.

To get back to the old car analogies, Firefox 3 is the car that inexplicably explodes every few hours followed by immediately re-assembling itself, all while leaving the driver unscathed.

[1]: Note that I haven't actually had OpenOffice.org crash on me in a while. The following description could, in fact, be an amalgamation of various crash reporters I've experienced.

Wednesday, 29 August 2007

Fakin' closures

Closures are awesome. There are so many algorithms that can be implemented in a very straightforward manner once you have the ability to make little customised functions and pass them around.

For example, it makes functional programming in D a snap. Here's a simple example of what I mean:

string format_list(real[] parts, string sep)
{
    string reduce_closure(string left, string right)
    {
        return left ~ sep ~ right;
    }

    string map_closure(real n)
    {
        return toString(n);
    }

    return reduce(&reduce_closure, map(&map_closure, parts));
}

The one failing of D's closures is that... they aren't really closures. See, D's closures are made up of a function pointer and a pointer to the enclosing function's stack frame (what I'm loosely calling its stack pointer). The stack pointer is what allows you to actually create closures, but it's also why you must not use them outside the enclosing scope's lifetime. As soon as the enclosing function returns, it's bye-bye stack frame, and hello Mr. Segmentation Fault if you should try to call it. Oh dear.

In practice, this can be worked around by putting your closure inside a class and heap allocating it, but that's annoying. Wouldn't it be nice to be able to just return the damn things out of your functions, or store them somewhere for later use?
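For the record, the class workaround looks roughly like this. It's only a minimal sketch: Setter and make_setter are names made up for illustration, and it assumes a current D2 compiler.

```d
import std.stdio;

// Hypothetical helper class: it holds the state that a nested function
// would otherwise have read from the enclosing function's stack frame.
class Setter
{
    uint* ptr;
    uint value;

    this(uint* ptr, uint value)
    {
        this.ptr = ptr;
        this.value = value;
    }

    void call()
    {
        *ptr = value;
    }
}

// Returns a delegate whose context pointer refers to a heap object,
// so it stays valid after make_setter has returned.
void delegate() make_setter(uint* ptr, uint value)
{
    auto s = new Setter(ptr, value);
    return &s.call;
}

void main()
{
    uint f = 15;
    auto fn = make_setter(&f, 42);
    fn();
    writefln("f: %s", f);
}
```

It works, but that's a whole class of boilerplate for what should have been a three-line nested function.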

If you're easily put off by horrible, hackish code: you'd best leave now. It's going to get nasty.

Remember how I said that closures are made up of a pointer to some code and the enclosing function's stack pointer? Well, on x86 the stack grows downwards, and a function's arguments are stored just above the stack pointer with its local variables below the stack pointer.

Say we have a function foo, with a closure bar, and another function baz.

If we call baz from foo, passing our closure bar, then baz would know two important things:

  1. It would have an upper-bound on foo's local variables by virtue of knowing its stack pointer (which is contained in the delegate to bar) and

  2. it would have a lower-bound based on the address of its own first argument on the stack.

So if we have both a lower and upper bound, we can create a heap copy of foo's local variables, and store that in our delegate instead of the pointer to the stack.

Just to prove the concept, here's a working implementation:

module closures;

import std.stdio;

/**
 * Takes a closure defined in the caller, copies the local variables onto the
 * heap, and returns a modified delegate.
 */
dgT closure(dgT)(dgT dg)
{
    void* end_ptr = dg.ptr;
    void* start_ptr = &dg;

    auto context = new ubyte[end_ptr-start_ptr+1];
    context[] = cast(ubyte[]) start_ptr[0..context.length];

    dg.ptr = &context[$-1];
    return dg;
}

/**
 * Creates a closure that sets *ptr to value when called.
 */
void delegate() make_dg(uint* ptr_, uint value_)
{
    auto ptr = ptr_;
    auto value = value_;

    void dg()
    {
        *ptr = value;
    }

    return closure(&dg);
}

void main()
{
    uint f;

    auto fn1 = make_dg(&f, 42);
    auto fn2 = make_dg(&f, 1701);

    f = 15;
    writefln("f: %s", f);
    fn1();
    writefln("f: %s", f);
    fn2();
    writefln("f: %s", f);
}

When run, it prints:

f: 15
f: 42
f: 1701

Just like one would expect. One caveat: this only works if your closure doesn't access the enclosing function's arguments (which is why make_dg copies its arguments into locals first). But apart from that, it's a pretty neat trick, don'tcha think?

Well, hey, let's just make everything into a closure, and then we'll have our general garbage collector, installed by 'use less memory'. —Larry Wall

Sunday, 26 August 2007

The Future of D Is Aaargh, My Eyes!

So the first ever D Conference has come and gone. Sadly, being a poor uni student, Australian and busy like an anime nerd in Akihabara with 100,000 spare yen meant I couldn't go.

Thankfully, the ever-wonderful Brad Roberts has posted up most of the slides from the various speakers so the rest of us can take a peek. One of particular interest[1] is the set from Walter and Andrei's talk on the future of D.

I encourage you to read the full set of slides, but here's what I think of it (for what that's worth.)

First up, some very welcome additions to the language that will make every-day programming a lot nicer. There's function and template overloading which will finally allow you to do this:

void foo(int i);
void foo(T)(T* t);

What can I say but "finally!"? Also on the topic of overloads are function overload sets. Currently, if you import overloaded functions from more than one module (for instance: you import two modules that both have a global toString function), you need to either fully qualify the particular overload you're interested in, or manually alias in each module's overloads.

Function overload sets do away with this provided each overload is distinct. It's a small thing, but it's the small things that make programming in D so much more pleasant than in C or C++[2].

Another of my pet hates is going the way of spelling and grammar online: having to qualify enumeration members. Now you'll be able to elide the enum's name before a member in cases where the compiler already knows the enum's type. Thank god for that.

Then there's the upgrade to switch. One thing I love about D's switch is that by default it will loudly complain (by crashing your program) if you don't have a case to handle the value you're switching on. Walter takes it one step further with final switch which will actually refuse to compile if you don't have a case for every possible value.
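To give an idea of what that buys you, here's a small sketch using the syntax as it exists in current D2 (an assumption on my part; the slides predate the final design). Colour and describe are hypothetical names.

```d
import std.stdio;

enum Colour { red, green, blue }

// Every member of Colour must have a case; removing one of them
// turns this into a compile-time error rather than a runtime crash.
string describe(Colour c)
{
    final switch (c)
    {
        case Colour.red:   return "red";
        case Colour.green: return "green";
        case Colour.blue:  return "blue";
    }
}

void main()
{
    writeln(describe(Colour.green));
}
```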

Obviously more useful for enums and subsets of integers than, say, strings. But that's OK. No one's perfect.

And while we're on upgraded constructs: static foreach. About time. Not that you can't do it with a range tuple template, but this is much cleaner[4].

So that's the day-to-day stuff. What about the seriously cool stuff we can dangle in front of C++ and Java programmers to make them cry on the inside?

Well, first up we've got the construct that comes up every few months on the newsgroup: type method extensions. Or, as Walter calls it, uniform function call syntax.

foo(a, args...);
a.foo(args...);

Those two will now be interchangeable. And it'll work with built-in types like int, too. Having suggested this myself a few times, I couldn't be happier about this coming.
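A tiny sketch of the idea, written against a current D2 compiler (where, as far as I can tell, this is how it ended up working); clampTo is a hypothetical free function.

```d
// Hypothetical free function. Note that int has no methods of its own.
int clampTo(int value, int lo, int hi)
{
    return value < lo ? lo : (value > hi ? hi : value);
}

void main()
{
    int a = 120;

    // Ordinary call syntax...
    assert(clampTo(a, 0, 100) == 100);

    // ...and the same call written as if clampTo were a method of int.
    assert(a.clampTo(0, 100) == 100);
}
```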

It seems that another idea that I've shared with many people is being worked on: pure functions. The concept of "pureness" comes from functional programming. A pure function is one whose output depends only on its arguments, and which has no side-effects.

Take, for instance, the following function:

int foo(int a, int b)
{
    return a + b;
}

Yes, it's contrived, but stay with me. This function's result depends entirely on its inputs, and has no side-effects; it's pure. What does this mean?

  • It means that the compiler only ever has to call it once for any given set of arguments. If it sees the same function call in two places in a function, it can simply compute the result once, and reuse it. You can't do this with regular functions because the compiler can't guarantee that the function doesn't have side-effects.
  • More interestingly, it means that the compiler can actually cache results from the function for future re-use. It could even theoretically go the whole hog and just pre-compute every possible result at compile-time.
  • Best of all, because pure functions have no side-effects and don't depend on or alter global state, they can be automatically parallelised. Suddenly, it just got a whole lot easier to make use of all those cores we have these days.
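Here's a sketch of how that reads with the pure keyword in current D2 (an assumption about your compiler; the exact rules were still being designed when the slides were written). The names counter and impure are made up for contrast.

```d
// Result depends only on the arguments; no global state is touched.
pure int foo(int a, int b)
{
    return a + b;
}

int counter;

// Not pure: each call reads and modifies a global.
int impure(int a)
{
    return a + counter++;
}

// Marking this one pure would be rejected, since it reads mutable global state:
// pure int bad(int a) { return a + counter; }

// A pure function with constant arguments can be evaluated at compile time.
enum precomputed = foo(2, 3);
static assert(precomputed == 5);

void main()
{
    // Same arguments, same answer, every time.
    assert(foo(2, 3) == foo(2, 3));

    // The impure version gives different answers for the same argument.
    assert(impure(1) != impure(1));
}
```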

But wait, there's more! After bitching and whining about it for years, structs are finally getting constructors and destructors!

That's fantastic. Know what's even better? They're getting a copy operator overload, too. This (along with the awesome alias some_member this; declaration) means that D will finally be able to easily support the last major memory-management technique: reference-counting. I can see this being huge in the game development circles.

There are more new operator overloads, too: we're getting opImplicitCastTo and opImplicitCastFrom. This means we'll be able to construct custom types that can seamlessly "pretend" to be other pre-existing types[6].

This also ties into a new concept being introduced called polysemous values: these are values that have an indeterminate type[7]. Things like cast(int)1 + cast(uint)2, or a type with multiple implicit casts. This allows the compiler to reason about values for which it can't immediately determine their type, deferring the decision until later.

We're also getting a new array type that's distinct from slices; slices will work the same as they do now, except you won't be able to modify their length property to resize them, whilst you will be able to with arrays. Anyone who has been caught by obscure and baffling bugs when you start accidentally resizing slices will appreciate this; it makes D's arrays that much tighter and safer.

A feature that I actually poo-poohed a few days ago is getting in: struct inheritance. Sorry, "inheritance." It means that structs will be able to implement interfaces, complete with static checks to make sure all the methods are there, but won't be castable down to that interface. Kind of like C++ concepts for structs.

Another feature that's got me salivating in anticipation is static function parameters. That's where you mark one or more parameters to a function as being static, meaning that the value is "passed" at compile-time. It allows you to write functions that execute partially at compile-time. Think of it as a cross between CTFE and regular functions.

This also means you will be able to write functions that have different behaviours depending on whether some of the arguments are known at compile-time or not. That means you could use a slower but CTFE-compatible algorithm to perform some of the function at compile-time without having to worry about users accidentally using the slower version at runtime.

Then comes the big one: AST macros. These are kind of like templates except that instead of operating on specific types of values, they operate on bits of code. The simple way to think about it is this: when you pass something to a template, the compiler goes off and works out what that expression means, then gives it to the template. For macros, on the other hand, the compiler just hands the macro the unparsed expression and says "here, you work this out."

I have a funny feeling Don Clugston's already incredible BLADE library is going to be even more impressive once we get AST macros. If this keeps up, we may even be able to finally kill FORTRAN[8].

Now, as I mentioned in an earlier post, I was looking forward to writing a neat shell-style variable expansion macro when all this came around. Except Walter's gone and added it into the language itself.

macro say(s) { ... }
auto a = 3, b = "Pi", c = 3.14;
say("$a musketeers computed $b to be $c\n");

Not quite as flexible as what I had in mind, but still. Pants.

Other interesting additions include:

  • the creation of a standard template library,
  • a nothrow contract on functions,
  • the order of function argument evaluation being defined,
  • three new string literal forms:
    • delimited strings,
    • code strings (which would have been so much more useful before macros) and
    • heredoc strings, and
  • a special return storage class for function arguments that lets you create both const and non-const versions of a function without having to duplicate code.

Whew! Needless to say, I'm stoked about all this. To quote a famous (and sadly, very dead) Australian: I'm excited!

For me, this really crystallises why I threw my lot in with D: unlike C, which is basically dead; C++, which is bloated, arthritic and has eight heads; and Java, which makes me want to beat myself over the head with a brick...

D makes programming fun. It doesn't just let me tell the computer what to do, it makes it easy. It's like statically-typed Python in a lot of ways.

But I do have to disagree with Walter on one point: the future of D isn't bright.

It's blinding.

[1]: Not to say they aren't all interesting, mind you.

[2]: The big things make programming in D more powerful, safe and flexible. And they cure baldness, too[3].

[3]: Not speaking from personal experience, mind you.

[4]: So clean that if you use it twice a day, it'll make your teeth all sparkly.[5]

[5]: Just be careful not to get any on your gums... it's the tiny shards of glass, y'see...

[6]: It also vindicates the decision to use the opX naming system instead of C++'s operator X notation; implicit casting doesn't have a symbol.

[7]: I would have been tempted to call them "quantum values" or "entangled values" or even "boxed cats". But maybe that's just me.

[8]: Unlikely, but it's always nice to dream. I mean, some people voluntarily use Java!

Blinded by the light; revved up like a Deuce, another runner in the night... Madman drummers bummers, Indians in the summer with a teenage diplomat and yes I do know all the words to this damn song thankyouverymuch.

Saturday, 28 July 2007

pragma(msg, "Gday!");

No, I'm not dead.

Just thought I'd drop a line to Pragma who has just chiselled himself out a new corner of Cyberspace over at Phase Positive.

In a hole in the ground there lived a hobbit...

Tuesday, 19 June 2007

Const Wars

A long time ago, in an IRC channel far, far away...

Episode 5

A NEW CONST

It is a period of civil war. Rebel posters, striking from irc and the newsgroups, strive to win their final victory against the evil Constant Empire.

During this battle, Rebel agents were horrified to discover that the Empire's ultimate weapon, the D 2.0 Release, a new compiler with three kinds of const, had been released.

Pursued by the Empire's sinister agents, Princess Mutable races against the clock to find a way to destroy the terrible compiler to save her people and restore mutability to the galaxy...

Episode 5

A NEW CONST

It is a period of civil war. Rebel posters, striking from irc and the newsgroups, strive to win their final victory against the evil Mutable Empire.

During this battle, Rebel agents were horrified to discover that the Rebel's ultimate weapon, the D 2.0 Release, a new compiler with three kinds of const, was feared and hated.

Pursued by the Empire's sinister agents, Princess Invariant races against the clock to find a way to justify the badly needed compiler to save her people and restore type safety to the galaxy...

That's the wonderful thing about religion; you can use it to justify anything, with no requirement for facts or logic!

Thursday, 14 June 2007

Wrapping functions for fun and profit

Error handling sucks. I mean, I've got code to write! I don't want to have to waste my time making sure my function calls actually worked. After all, nothing ever goes wrong when I'm running it, so any problems are the user's fault.

Or something along those lines. That's one thing that always annoyed me about C-style libraries: you had to sit there writing all this code to handle errors on every single damn call. I can't count the number of times I've seen OpenGL examples where they never do a single error check, apparently because it's too much work.

So when I started writing GL code in D, I realised pretty quickly that this was sub-optimal. After all, you want to know when something's gone wrong (if you don't find out, how can you ever fix it?) but having to write all those ifs was out of the question on account of me being supremely lazy.

One way around this is to have a function that takes the error code from a function, and throws an exception if it's something other than NO_ERROR. Sadly, OpenGL doesn't use return values; it has a separate function called glGetError that tells you if an error's occurred.

OK, we can deal with this; we just need a templated function that checks for an error, throws an exception if there was one, or passes back what we pass into it.

T glCheck(T)(T result)
{
    if( glGetError() == GL_NO_ERROR )
        return result;
    else
        throw new GLException("OH NOES!");
}

And that does work. Well, except for functions that have void return types. That's when it starts to get a little ugly; we need to have a different function that we call afterwards.

And this is all well and good if you happen to like simple solutions to problems. Not me, though. I wanted something that I could stick in front of any GL call and have it do error checking. I also wanted to try and remove the double closing paren problem (every time you nest an expression, it gets just that tiny bit uglier).

So let's change things around a bit. Instead of a function that we pass the GL call's result to, let's create a function that wraps the GL call.

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    alias ReturnType!(Fn) returnT;

    static if( is( returnT == void ) )
        Fn(args);
    else
        auto result = Fn(args);

    glCheckError();

    static if( !is( returnT == void ) )
        return result;
}

What we're doing here is creating a function that has the exact same signature as the function we want to call. When we call this wrapper function, it calls the underlying function, checks for errors (throwing an exception as necessary: that's the job of glCheckError), and returns the result.

Those static ifs are there because you can't declare a variable of type void in D, which kinda sucks. You'd use the above function like this:

glCheck!(glClear)(GL_COLOR_BUFFER_BIT);

For those keeping count, that's one character longer than the "pass-through" style. The nice thing is that this works uniformly with any function, no matter its return type.
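If you want to play with the wrapping trick without dragging OpenGL into it, here's a self-contained sketch for a current D2 compiler. Everything in it is a stand-in: getError, addPositive, resetState, checkError and checked are hypothetical names, and Parameters is the present-day std.traits name for ParameterTypeTuple.

```d
import std.conv : to;
import std.traits : Parameters, ReturnType;

// Hypothetical stand-ins for a C-style API with a separate error query.
int lastError = 0;

int getError()
{
    auto e = lastError;
    lastError = 0;
    return e;
}

int addPositive(int a, int b)
{
    if (a < 0 || b < 0)
    {
        lastError = 1;
        return 0;
    }
    return a + b;
}

void resetState()
{
    lastError = 0;
}

// Throws if the previous call left an error code behind.
void checkError(string name)
{
    auto code = getError();
    if (code != 0)
        throw new Exception(name ~ " failed with error " ~ to!string(code));
}

// Same signature as Fn, plus an error check after the call.
ReturnType!Fn checked(alias Fn)(Parameters!Fn args)
{
    enum name = __traits(identifier, Fn);

    static if (is(ReturnType!Fn == void))
    {
        Fn(args);
        checkError(name);
    }
    else
    {
        auto result = Fn(args);
        checkError(name);
        return result;
    }
}

void main()
{
    assert(checked!addPositive(2, 3) == 5);
    checked!resetState();

    bool caught = false;
    try
    {
        checked!addPositive(-1, 3);
    }
    catch (Exception e)
    {
        caught = true;
    }
    assert(caught);
}
```

This version branches on the return type once rather than twice, which is purely a matter of taste; the idea is the same.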

However, we can still improve this. For instance, since we have an alias to the function being called, we can improve the call to glCheckError to this:

glCheckError((&Fn).stringof)

This allows us to report the exact GL call that failed (normally, all we would get is an exception telling us which error code we got). Even cooler, however, is we can use this information to actually log our GL calls as they happen:

version( gl_LogCalls )
{
    log.writef("%s",(&Fn).stringof[2..$]);
    log.writef("(");
    static if( args.length > 0 )
    {
        log.writef("%s", args[0]);
        foreach( arg ; args[1..$] )
            log.writef(", %s", arg);
    }
    log.writeLine(")");
    version( gl_LogCalls_Flush )
        log.flush();
}

If we place that in our glCheck function just before we call the function itself, it gives us the ability to trace through our GL code without having to hunt through functions. This can be really useful when you've got some weird behaviour, and can't figure out what's causing it.

One last improvement: in OpenGL, there are times where calling glGetError can itself cause an error. The most obvious of these are between glBegin and glEnd calls. You can solve this by either building some logic into glCheck to account for glBegin/glEnd blocks, or you can do what I did and split the function into two: glSafe which does the error checking and glRaw which doesn't.

Before I go, one small note: if you are using DerelictGL, you need to replace the Fn in ReturnType!(Fn) and ParameterTypeTuple!(Fn) with typeof(Fn) because of a weird bug with specialising templates on aliases to function pointers, and replace the line that logs the name of the function with:

log.writef("%s",Fn.stringof);

Wrap me up, in your love, your love takes me higher...

Friday, 8 June 2007

Mixins, CTFE and shell-style variable expansion

Edit: Frits van Bommel pointed out that I'd forgotten about declaration mixins, which have now been added.

It didn't take all that long before PHP started annoying me. Don't get me wrong; it's a great language for hacking together dynamic web pages, but if I have to look at another full-blown CMS written in it, I am going to scream.

That said, PHP did get a few things right. One of these is the syntax it uses for variable expansion in strings. Basically, it's stolen almost verbatim from the Bourne shell and relatives:

$life = 42;
echo "Meaning of life: $life";

Which obviously expands to Meaning of life: 42. PHP adds to this by allowing you to use braces to expand a more complex expression involving member and array access.

Sadly, those of us in the statically-typed world have to make do with rather less attractive options like C-style format strings...

printf("Meaning of life: %d", life);

...C#-style format strings...

WriteLine("Meaning of life: {0}", life);

..."whisper" syntax...

Cout("Meaning of life: ")(life).newline;

...or worst of all, the hideous monster that is C++'s iostreams.

cout << "Meaning of life: " << life
     << endl;

What would be really cool is a formatting function that uses PHP/shell style inline expansion, optionally with C-style formatting options. Best of both worlds. So, today I sat down and implemented such a formatter which makes this possible:

int a = 13;
int b = 29;

mixin(writeFln(
    "     a: $a" "\n"
    "     b: $b" "\n"
    " a + b: ${a+b}" "\n"
    "in hex: $(x){a+b}"));

Which prints:

     a: 13
     b: 29
 a + b: 42
in hex: 2a

Those already familiar with both string mixins and CTFE will no doubt already have an idea how I accomplished this; you can skip to the code at the end if you like. For those of you who would like to know how D makes this possible, read on.

Compile time function execution

The first feature that D adds to make this possible is compile time function execution or CTFE. Now this is basically just an extension of regular constant folding. Imagine you have code like this:

int registry = 1700 + 1;

You would expect any good compiler to be able to simplify this code since both numbers are constant (and the compiler hopefully knows how to add two numbers.) CTFE simply expands on this and allows you to use function calls in constant-folding, provided they satisfy a stringent set of requirements. The requirements are too long to list here, but suffice to say that they all boil down to two things: stick to basic structural constructs (no labelled breaks, gotos, nested functions, etc.), and don't do anything that involves pointers or memory allocation.

For example, CTFE allows us to do things like this:

char[] repeat(char[] a, uint n)
{
    char[] result;
    for( ; n>0; --n )
        result ~= a;
    return result;
}

It's worth noting that simple array operations like changing the length, concatenation, etc. are allowed for CTFE.
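To actually see a function like this run at compile time, you have to use its result somewhere that demands a constant. A minimal sketch, assuming a current D2 compiler (hence string rather than char[]); banner is a made-up name:

```d
import std.stdio;

// An ordinary function; nothing marks it as "compile-time only".
string repeat(string a, uint n)
{
    string result;
    for (; n > 0; --n)
        result ~= a;
    return result;
}

// Initialising an enum forces the call to be evaluated by the compiler.
enum banner = repeat("ab", 3);
static assert(banner == "ababab");

void main()
{
    // The very same function still works at runtime.
    writeln(repeat("-", 10));
    writeln(banner);
}
```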

String mixins

The other half of this magic trick is string mixins. In D, there are actually four different mixins:

  1. "Regular" mixins, or as I call them, scope mixins. These will, when you specify a template, mix the contents of that template into the current scope. For example:

    template foo()
    {
        char[] bar = "baz";
    }
    
    void main()
    {
        mixin foo;
        writefln(bar); // Outputs "baz".
    }

  2. Statement mixins. These mixins take a list of statements (as in D source code) as a compile-time constant string, and insert them into the source at that location. For instance, we could replace the last line of the previous example with:

    mixin("writefln(bar);"); // Outputs "baz".

  3. Expression mixins. These are like the statement mixins, except that they allow you to mix in any semantically complete expression. That includes things like "bar", "12+34", "a = foo(PI)" and even "writefln(bar)" (note the lack of a semicolon!)

  4. Declaration mixins. These are like statement mixins, except instead of mixing in executable statements, they mix in declarations (like functions, classes, imports, etc.).

    Many thanks to Frits van Bommel for pointing out that I stupidly missed this one.

Those last three are collectively known as "string mixins" since they take plain old strings of D source code. Now, you may be wondering how mixing in plain D source could be useful. It isn't, until you combine it with a CTFE function (or a template) that produces D source code from some other format.
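As a taste of that combination, here's a deliberately tiny sketch for a current D2 compiler. declareConstant is a hypothetical generator function; the compiler runs it, and the string it returns is mixed in as a declaration.

```d
import std.stdio;

// Hypothetical generator: builds the source text of a constant declaration.
string declareConstant(string name, string expr)
{
    return "enum " ~ name ~ " = " ~ expr ~ ";";
}

// Evaluated via CTFE, then mixed in as: enum answer = 6 * 7;
mixin(declareConstant("answer", "6 * 7"));
static assert(answer == 42);

void main()
{
    writeln(answer);
}
```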

Putting them together

The idea now is quite straightforward. We're going to write a CTFE compatible function that takes our format string and converts it into D source code. We then feed the result of this function into a string mixin which inserts our freshly generated code into our source file.

So what are we aiming for? Basically, we want to turn this:

"foo $bar baz"

Into this:

"foo ",bar," baz"

There are various ways of doing this, but my personal favourite is to write a simple state machine to process the string. So how would such a state machine look in a CTFE function? Lots of weird, crazy syntax? Sadly, no. It's actually pretty anticlimactic...

char[] ctfe_format_string(char[] fs)
{
    State state;
    char[] result;

    foreach( char c ; fs )
    {
        if( state == State.Default )
        {
            // output character & look for '$'
        }
        else if( state == State.PostDollar )
        {
            // work out if this is a variable or not...
        }
        else if( state == State.Varname )
        {
            // write out the variable name...
        }
        // ...
    }

    return result;
}

That's pretty much it; the actual details are just ordinary string manipulation code. There's really nothing terribly interesting to it at all. Aside from that, there are a few convenience methods for joining the string pieces together and dumping the result to STDOUT. You can check out the details by grabbing the source code. In fact, I encourage you to go read it now, just to see how stupidly simple this kind of manipulation is.

I mean, look at this:

// Finish the variable expansion
result ~= "))";

That's about as complex as the code gets.
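If you'd rather not download anything, here's a cut-down sketch of the same idea for a current D2 compiler. It is not the formatter described above: expand and isNameChar are names made up for this example, only plain $name is handled (no ${...}, no format options), and quotes or backslashes in the format string aren't escaped.

```d
import std.stdio : writeln;

// True for characters allowed in a (simplified) variable name.
bool isNameChar(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

// Turns `foo $bar baz` into the argument list `"foo ",bar," baz"`.
string expand(string fs)
{
    string result = "\"";
    bool inName = false;

    foreach (char c; fs)
    {
        if (inName)
        {
            if (isNameChar(c))
            {
                result ~= c;
                continue;
            }
            // The name has ended: open a new string literal.
            result ~= ",\"";
            inName = false;
        }

        if (c == '$')
        {
            // Close the current literal and start a variable name.
            result ~= "\",";
            inName = true;
        }
        else
            result ~= c;
    }

    // Close the trailing literal, unless the string ended on a variable.
    if (!inName)
        result ~= "\"";
    return result;
}

static assert(expand("foo $bar baz") == `"foo ",bar," baz"`);

void main()
{
    int a = 13;
    string b = "Pi";

    // Mixed in as: writeln("a is ",a," and b is ",b);
    mixin("writeln(" ~ expand("a is $a and b is $b") ~ ");");
}
```

A two-state machine is enough here because there's only one kind of expansion; the brace and format-option forms are what push the real thing to more states.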

So, OK; we've got it splitting strings up. Yeah, it's cute, but can you do anything else with it?

Perhaps the best example to date is Don Clugston's BLADE library which uses CTFE and string mixins to generate near-optimal floating-point vector code from library code.

Another example is Kirk McDonald's Pyd which uses string mixins behind the scenes. Or how about Gregor Richards' compile-time brainf**k compiler?

And there are other possibilities that haven't been explored yet. Like compiling SQL or LINQ statements, or creating parsers out of BNF grammars.

So the next time you've got some recurring pattern that you can't quite express concisely using ordinary code, think about how a little CTFE and mixin magic might help things along.

"Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." —Howard Aiken

The source code for the formatter can be downloaded from the D wiki here, and is licensed under the BSDv2 license.

D 1.0 to be frozen in carbonite

So after much crying and gnashing of teeth, Walter has finally confirmed what I'm sure a lot of D users have been waiting to hear:

It's getting pretty clear to me that the 1.0 series won't get any new features, just bug fixes, as the recent 1.015 release suggests.

The new features will all go into the 2.0 series.

Walter Bright, Re: Do we need a time-out in D evolution?

This is somewhat mixed news. It's good in that we will now (hopefully) end up with a nice, stable compiler release and a less stable but cutting-edge experimental compiler release. It's bad in that it means more work for Walter; he introduced the largely unused -v1 compiler switch so that he wouldn't have to maintain two compiler branches.

Interesting question: does this mean GDC is going to branch, too?

That's very funny. You know something? No new features for you!

Thursday, 7 June 2007

You can't touch this.

So, there seem to be quite a few people who are confused about the upcoming changes to const-ness in D 2.0. The topic came up, once again, in the #D IRC channel, and I've been persuaded to write up how I understand it.

It is worth pointing out that this is all up in the air at the moment, meaning all of this could be rendered incorrect before I even finish writing it. Also, I'm not going to give any examples of what the real code will look like, since the syntax appears to have recently changed, and I'm still not entirely sure what the impact is. Think of the examples as being "pseudo code."

That said, the concept itself seems fairly stable, which is what I'm going to talk about. With that, on with the show.

Firstly, it's important to disregard any pre-conceived notions you may have about what the word const means. What it will mean in D 2.0 is very different to what it means in D 1.x, C, C++ or any other language you might know.

Now, with D 2.0 will come three kinds of const-ness. For the moment, we will call these "storage const", "reference const" and "reference immutable."

Storage const is the easy one; it simply means that whatever bits are being used to store a particular variable cannot be changed once they are initialised. For example, every int variable in your program has four bytes of memory allocated to it somewhere, be it on the stack or in the static data segment. Storage const says that you can store a value into this int, but you cannot change it afterwards. One pretty obvious use of this would be:

"storage const" PI = 3.1415926535897931;

Unless you work for Xerox, you aren't likely to change the value of Pi. The same thing can be done with variables in functions:

void doStuff(int a, int b)
{
    "storage const" int c = a + b;
}

This is fine since we set the value of c during initialisation. However, the following wouldn't compile:

"storage const" int meaning_of_life = 42;
meaning_of_life = 72;
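
If you'd rather have something you can feed to a compiler, here's a rough sketch. It assumes a later D 2 compiler in which plain const on an int behaves like the storage const I'm describing; that keyword choice and the helper name sumOnce are my assumptions, not settled syntax.

```d
// Sketch only: assumes a later D 2 compiler where `const` on a value-type
// variable means "set once at initialisation, never reassigned".
// sumOnce is a made-up helper for illustration.
int sumOnce(int a, int b)
{
    const int c = a + b;   // fine: c gets its value at initialisation

    // Reassigning c afterwards is rejected at compile time.
    static assert(!__traits(compiles, c = 0));
    return c;
}

void main()
{
    const int meaningOfLife = 42;

    // The "meaning_of_life = 72" line from above, checked without
    // breaking the build.
    static assert(!__traits(compiles, meaningOfLife = 72));

    assert(sumOnce(2, 3) == 5);
    assert(meaningOfLife == 42);
}
```

The static assert lines are just a way of saying "this assignment would not compile" while still letting the rest of the file build.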

Once a storage const variable has had its value set, there's no changing it. There's one other interesting question: what about reference types like pointers, arrays and objects? Remember that a reference type stores an address of some kind. A storage const pointer, for example, would have the pointer itself fixed, but what it points to is unaffected. For example:

"storage const" int* stuff = new int;

(*stuff) = 23; // This is cool
(*stuff) = 19; // Still cool
stuff = new int; // This ain't cool at all.

In this example, we're free to change the int that stuff points at, but we are not allowed to make stuff itself point anywhere else.

Once you've wrapped your head around that, we can move on to "reference const".

Unlike storage const, reference const says nothing about a variable's storage. Rather, it deals with what that variable points at. If we take the previous example and change "storage const" to "reference const", we get:

"reference const" int* stuff = new int;

(*stuff) = 23; // This isn't cool anymore.
(*stuff) = 19; // Neither is this.
stuff = new int; // This *is* cool.

What reference const is saying is "Ok, you've got a pointer to something, but you aren't allowed to change that something; feel free to point at something else, though!" In a way, reference const can be thought of as a read-only view of some data. You can look, but can't touch.
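
Here's the read-only view idea as a compilable sketch. I'm assuming a later D 2 compiler where const(int)* spells "pointer to an int I may not write through"; the spelling and the helper readThrough are my assumptions.

```d
// Sketch only: assumes a later D 2 compiler where const(int)* means
// "pointer to read-only int". readThrough is a made-up helper.
int readThrough(const(int)* view)
{
    // Writing through the view does not compile...
    static assert(!__traits(compiles, *view = 19));
    // ...but looking is fine.
    return *view;
}

void main()
{
    int* owner = new int;
    const(int)* view = owner;   // read-only view of the same int

    *owner = 23;                // the owner may still write
    assert(readThrough(view) == 23);

    view = new int;             // pointing the view somewhere else is cool
    assert(*view == 0);         // a fresh int starts out as int.init (0)
    assert(*owner == 23);       // the original int is untouched
}
```

Note that the owner changing the int is visible through the view; reference const only restricts what you can do through that particular reference.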

Another important thing is that reference const is transitive. Take this, for example:

"reference const" int** foo;

foo is a reference const pointer to a pointer to an int. Notice that we didn't say anything about the int* we're pointing to; just foo itself. That means we can't do this:

(*foo) = new int;

However, we also can't do this:

(*(*foo)) = 42;

This can be kinda tricky to explain, so just think of it like this: once you go through a "reference const" reference (be it a pointer, array or object reference), all references from that point onward are "reference const". Essentially, you can't change anything beyond a reference const.
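
The transitive part can be checked mechanically. This sketch again assumes a later D 2 compiler; const(int*)* is my assumed spelling for "pointer to something read-only, all the way down", and peek is a made-up helper.

```d
// Sketch only: assumes a later D 2 compiler where const is transitive.
// peek is a made-up helper for illustration.
int peek(const(int*)* foo)
{
    // One hop through foo: can't re-point the inner pointer.
    static assert(!__traits(compiles, *foo = new int));
    // Two hops through foo: can't change the int either.
    static assert(!__traits(compiles, **foo = 42));
    // The int at the end is const even though we never said so directly.
    static assert(is(typeof(**foo) == const(int)));

    return **foo;   // reading is always allowed
}

void main()
{
    int* inner = new int;
    *inner = 7;
    int** outer = &inner;       // fully mutable on the owner's side

    assert(peek(outer) == 7);

    **outer = 8;                // the owner can still write
    assert(peek(outer) == 8);
}
```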

Go back over that, and make sure you understand what's going on. Once you've done that, we'll move on to the last and strongest form of const-ness: "reference immutable".

Now, reference immutable is very similar to reference const; it applies to reference types and says "you can look, but can't touch." The important difference is that where reference const says "you can't touch", reference immutable says "no one can touch." Reference immutable is a guarantee to you and the compiler that nothing can modify the data being pointed at. For instance, if you have a static string in your program stored in the data segment (a part of the executable itself), that string is reference immutable since no one is able to change it. In fact, a reference immutable piece of data may not even necessarily have an address in memory; if it never changes, the compiler could potentially just insert its value wherever it gets used.

From the user's end, there isn't any appreciable difference between something that's reference const and something that's reference immutable. The difference is on the creator's end; you might get a reference const buffer that the owner keeps updating, or a reference immutable buffer that gets filled and then fixed.
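
To make the "you can't touch" versus "no one can touch" split concrete, here's a sketch. It assumes a later D 2 compiler in which reference immutable is spelled immutable; that spelling, like the helper firstOf, is my assumption and may not match the keyword your compiler uses.

```d
// Sketch only: assumes a later D 2 compiler with const(...) and
// immutable(...) type constructors. firstOf is a made-up helper.
int firstOf(const(int)[] view)
{
    // From the user's end there's no difference: we just read.
    return view[0];
}

void main()
{
    int[] buffer = [1, 2, 3];
    const(int)[] view = buffer;     // we can't touch, but the owner can
    buffer[0] = 99;
    assert(firstOf(view) == 99);    // the data changed underneath the view

    immutable(int)[] fixed = [1, 2, 3];   // no one can touch
    static assert(!__traits(compiles, fixed[0] = 99));
    assert(firstOf(fixed) == 1);

    // Mutable data can't quietly become immutable, since someone might
    // still be holding a writable reference to it.
    static assert(!is(int[] : immutable(int)[]));
}
```

The same function accepts both kinds of data, which is the point: the guarantee is different, but the reader's code is not.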

So, the only question remaining is: how do these three concepts map to keywords in D 2.0? At the moment, it looks like this:

  • "storage const" will be final,
  • "reference const" will be const and
  • "reference immutable" will be invariant.

Let me just reiterate that I'm not saying anything about where these keywords go, where you put parentheses, etc. I'm just trying to explain the concepts; don't take any of the code examples above literally.

One last thing I'd like to point out (as suggested by Oskar Linde): whilst reference const and reference immutable are part of a variable's type, storage const isn't; it only affects one particular variable. For example:

"storage const"   int  a;
"reference const" int* b;

int  c = a; // You can do this; typeof(a) is int
int* d = b; // But can't do this; typeof(b) is "reference const" int*

Hopefully, that's helped to clear things up, at least a little. If you have any questions or corrections, let me know and I'll try to clear them up.

Stop! Hammer time!

Incidentally...

... while(nan) doesn't compile since nan isn't defined. Technically, it should be while(real.nan), but that was a bit unwieldy.

As for what it does, it turns out that despite nan comparing false to everything including itself, it's effectively the same as while(true). Go figure.
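
If you want to see it for yourself, here's a small sketch. My guess at the reason (an assumption, I haven't checked it against the spec) is that a floating-point condition gets tested as "not equal to zero", and nan isn't equal to zero any more than it's equal to anything else. The helper isTruthy is made up for the demonstration.

```d
// isTruthy is a made-up helper: it reports which way an `if` on a real goes.
bool isTruthy(real x)
{
    if (x)
        return true;
    return false;
}

void main()
{
    real n = real.nan;

    assert(!(n == n));       // nan isn't equal even to itself
    assert(isTruthy(n));     // yet as a condition it counts as true
    assert(!isTruthy(0.0));  // zero is the value that counts as false

    // while(real.nan) keeps looping, so we have to break out by hand.
    int laps = 0;
    while (n)
    {
        if (++laps == 3)
            break;
    }
    assert(laps == 3);
}
```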