Friday, 8 June 2007

Mixins, CTFE and shell-style variable expansion

Edit: Frits van Bommel pointed out that I'd forgotten about declaration mixins, which have now been added.

It didn't take all that long before PHP started annoying me. Don't get me wrong; it's a great language for hacking together dynamic web pages, but if I have to look at another full-blown CMS written in it, I am going to scream.

That said, PHP did get a few things right. One of these is the syntax it uses for variable expansion in strings. Basically, it's stolen almost verbatim from the Bourne shell and relatives:

$life = 42;
echo "Meaning of life: $life";

Which obviously expands to Meaning of life: 42. PHP adds to this by allowing you to use braces to expand a more complex expression involving member and array access.

Sadly, those of us in the statically-typed world have to make do with rather less attractive options like C-style format strings...

printf("Meaning of life: %d", life);

...C#-style format strings...

WriteLine("Meaning of life: {}", life);

..."whisper" syntax...

Cout("Meaning of life: ")(life).newline;

...or worst of all, the hideous monster that is C++'s iostreams.

cout << "Meaning of life: " << life
     << endl;

What would be really cool is a formatting function that uses PHP/shell style inline expansion, optionally with C-style formatting options. Best of both worlds. So, today I sat down and implemented such a formatter which makes this possible:

int a = 13;
int b = 29;

mixin(writeFln(
    "     a: $a" "\n"
    "     b: $b" "\n"
    " a + b: ${a+b}" "\n"
    "in hex: $(x){a+b}"));
     a: 13
     b: 29
 a + b: 42
in hex: 2a

Those already familiar with both string mixins and CTFE will no doubt already have an idea how I accomplished this; you can skip to the code at the end if you like. For those of you who would like to know how D makes this possible, read on.

Compile time function execution

The first feature that D adds to make this possible is compile time function execution or CTFE. Now this is basically just an extension of regular constant folding. Imagine you have code like this:

int registry = 1700 + 1;

You would expect any good compiler to be able to simplify this code since both numbers are constant (and the compiler hopefully knows how to add two numbers.) CTFE simply expands on this and allows you to use function calls in constant-folding, provided they satisfy a stringent set of requirements. The requirements are too long to list here, but suffice to say that they all boil down to two things: stick to basic structural constructs (no labelled breaks, gotos, nested functions, etc.), and don't do anything that involves pointers or memory allocation.

For example, CTFE allows us to do things like this:

char[] repeat(char[] a, uint n)
{
    char[] result;
    for( ; n>=0; --n )
        result ~= a;
    return result;
}

It's worth noting that simple array operations like changing the length, concatenation, etc. are allowed for CTFE.

String mixins

The other half of this magic trick are string mixins. In D, there are actually four different mixins:

  1. "Regular" mixins, or as I call them, scope mixins. These will, when you specify a template, mix the contents of that template into the current scope. For example:

    template foo()
    {
        char[] bar = "baz";
    }
    
    void main()
    {
        mixin foo;
        writefln(bar); // Outputs "baz".
    }
  2. Statement mixins. These mixins take a list of statements (as in D source code) as a compile-time constant string, and inserts them into the souce at that location. For instance, we could replace the last line of the previous example with:

    mixin("writefln(bar);"); // Outputs "baz".
  3. Expression mixins. These are like the statement mixins, except that they allow you to mix in any semantically complete expression. That includes things like "bar", "12+34", "a = foo(PI)" and even "writefln(bar)" (note the lack of a semicolon!)

  4. Declaration mixins. These are like statement mixins, except instead of mixin in executable statements, they mix in declarations (like functions, classes, imports, etc.).

    Many thanks to Frits van Bommel for pointing out that I stupidly missed this one.

Those last three are collectively known as "string mixins" since they take plain old strings of D source code. Now, you may be wondering how mixing in plain D source could be useful. It isn't, until you combine it with a CTFE function (or a template) that produces D source code from some other format.

Putting them together

The idea now is quite straightforward. We're going to write a CTFE compatible function that takes our format string and converts it into D source code. We then feed the result of this function into a string mixin which inserts our freshly generated code into our source file.

So what are we aiming for? Basically, we want to turn this:

"foo $bar baz"

Into this:

"foo ",bar," baz"

There are various ways of doing this, but my personal favourite is to write a simple state machine to process the string. So how would such a state machine look in a CTFE function? Lots of weird, crazy syntax? Sadly, no. It's actually pretty anticlimactic...

char[] ctfe_format_string(char[] fs)
{
    State state;
    char[] result;

    foreach( char c ; fs )
    {
        if( state == State.Default )
        {
            // output character & look for '$'
        }
        else if( state == State.PostDollar )
        {
            // work out if this is a variable or not...
        }
        else if( state == State.Varname )
        {
            // write out the variable name...
        }
        // ...
    }

    return result;
}

That's pretty much it; the actual details are just ordinary string manipulation code. There's really nothing terribly interesting to it at all. Aside from that, there's a few convenience methods for joining the string pieces together and dumping the result to STDOUT. You can check out the details by grabbing the source code. In fact, I encourage you to go read it now, just to see how stupidly simple this kind of manipulation is.

I mean, look at this:

// Finish the variable expansion
result ~= "))";

That's about as complex as the code gets.

So, OK; we've got it splitting strings up. Yeah, it's cute, but can you do anything else with it?

Perhaps the best example to date is Don Clugston's BLADE library which uses CTFE and string mixins to generate near-optimal floating-point vector code from library code.

Another example is Kirk McDonald's Pyd which uses string mixins behind the scenes. Or how about Gregor Richards' compile-time brainf**k compiler?

And there are other possibilities that haven't been explored yet. Like compiling SQL or LINQ statements, or creating parsers out of BNF grammars.

So the next time you've got some recurring pattern that you can't quite express concisely using ordinary code, think about how a little CTFE and mixin magic might help things along.

"Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." —Howard Aiken

The source code for the formatter can be downloaded from the D wiki here, and is licensed under the BSDv2 license.

3 comments:

Anonymous said...

In D, there are actually three different mixins:
You missed one: mixin declarations.

Dk said...

Those were "statement mixins" (at least, I sure hope they are!)

Anonymous said...

No, statement mixins (or mixin statements, as the spec calls them) can only appear where statements are valid. (i.e. anywhere you can call writefln() if std.stdio is imported)

Mixin declarations on the other hand are valid anywhere declarations (surprise!) are valid, meaning they can be used to add global variables, functions, unit tests, attribute specifiers, import declarations and lots of other good stuff. Basically anything that can occur directly after a module declaration.

Take a look at the changelog entry for DMD 1.005, where string mixins were added. There are clearly 3 types of them defined.