Thursday, 15 April 2010

NOCOMMIT for Mercurial

One problem with DVCSes (DVCSii?) is that because they encourage committing often, I sometimes accidentally end up committing temporary debugging code I didn't mean to.

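So here's a simple little Mercurial hook that blocks any commit which adds a line containing the string !NOCOMMIT. It runs as a pretxncommit hook: Mercurial calls it after the changeset has been created but before the transaction is finalised, so a non-zero exit status rolls the commit back.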

#!/usr/bin/env python

#
# Prevents any commits which add a line containing the string `!NOCOMMIT'.
#
# Based on code at:
# http://hgbook.red-bean.com/read/handling-repository-events-with-hooks.html
#

import re

def scan_nocommit(difflines):
    """Yield (filename, line number, text) for each added !NOCOMMIT line."""

    linenum = 0
    header = False
    filename = None

    for line in difflines:
        if header:
            # capture name of file
            m = re.match(r'(?:---|\+\+\+) ([^\t]+)', line)
            if m and m.group(1) != '/dev/null':
                # strip the leading "a/" or "b/"
                filename = m.group(1).split('/', 1)[-1]
            if line.startswith('+++ '):
                header = False
            continue

        if line.startswith('diff '):
            header = True
            continue

        # hunk header - save the line number; the ",count" parts are
        # optional (e.g. "@@ -0,0 +1 @@" for a new one-line file)
        m = re.match(r'@@ -\d+(?:,\d+)? \+(\d+)', line)
        if m:
            linenum = int(m.group(1))
            continue

        # hunk body - check added lines for !NOCOMMIT
        m = re.match(r'\+.*!NOCOMMIT.*', line)
        if m:
            yield filename, linenum, line[1:].rstrip()

        # context and added lines exist in the new file; removed lines don't
        if line and line[0] in ' +':
            linenum += 1


def main():
    import os, sys

    msg_shown = False

    added = 0
    for filename, linenum, line in scan_nocommit(os.popen('hg export tip')):
        if not msg_shown:
            sys.stderr.write('refusing commit; nocommit flags present:\n')
            msg_shown = True
        sys.stderr.write('%s:%d: %s\n' % (filename, linenum, line))
        added += 1

    if added:
        # Save commit message so we don't need to retype it
        os.system('hg tip --template "{desc}" > .hg/commit.save')
        sys.stderr.write('commit message saved to .hg/commit.save\n')
        return 1

    return 0


if __name__ == '__main__' and not __file__.endswith('idle.pyw'):
    import sys
    sys.exit(main())
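To see the line-number bookkeeping in action without a repository, here's a cut-down sketch of the same scanning approach run over a made-up diff string. It's Python 3, it only understands the simple unified-diff shape shown, and `find_nocommit` and `SAMPLE` are illustrative names, not part of the hook above.

```python
import re

# Hunk header: "@@ -old[,count] +new[,count] @@"; the counts are optional.
HUNK_RE = re.compile(r'@@ -\d+(?:,\d+)? \+(\d+)')

def find_nocommit(diff_text, marker='!NOCOMMIT'):
    """Yield (filename, new_line_number, text) for added lines containing marker."""
    filename = None
    linenum = 0
    for line in diff_text.splitlines():
        if line.startswith('+++ '):
            # "+++ b/path/to/file" -> "path/to/file"
            filename = line[4:].split('\t')[0].split('/', 1)[-1]
            continue
        if line.startswith(('diff ', '--- ')):
            continue
        m = HUNK_RE.match(line)
        if m:
            linenum = int(m.group(1))
            continue
        if line.startswith('+') and marker in line:
            yield filename, linenum, line[1:].rstrip()
        # Context and added lines exist in the new file; removed lines don't.
        if line[:1] in (' ', '+'):
            linenum += 1

SAMPLE = """\
diff -r 000000000000 -r 111111111111 foo.py
--- a/foo.py
+++ b/foo.py
@@ -1,3 +1,4 @@
 import os
-x = 1
+x = 2
+print(x)  # !NOCOMMIT
 y = 3
"""

for hit in find_nocommit(SAMPLE):
    print('%s:%d: %s' % hit)   # foo.py:3: print(x)  # !NOCOMMIT
```

The removed line `-x = 1` doesn't advance the counter, which is why the flagged line lands on line 3 of the new file rather than line 4.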

To use this:

  • Save the above as .hg/block_nocommit.py in your repository.
  • If you're on *nix, set execute permissions on the script.
  • Add pretxncommit.nocommit = .hg/block_nocommit.py in the [hooks] section of your .hg/hgrc file.

Now all I need to do is figure out how to write a 'block commits that contain bugs' script...

Sunday, 4 November 2007

Firefox: How to Crash in Style

So Firefox just crashed on me. Again. So why, exactly, am I grinning like an idiot?

Back in the appreciatively nostalgic days, when Phoenix crashed, it just vanished from your desktop with maybe a generic "This program just lost the page you spent an hour trying to find with Google: sucks to be you" dialog and that's it. Firefox 2 made things a little more bearable by keeping track of what tabs you had open. In the event of a crash, it could then restore your session most of the time.

The one bugbear was the damned crash reporter. It made you fill in all sorts of stuff, and click buttons, and half the time it just sat there twiddling its thumbs complaining about how it couldn't access the internet.

Enter the current alpha of Firefox 3. This thing makes crashing almost a total non-issue. I was reading through some old Coding Horror posts when Firefox just up and died; vanished from the face of the desktop. A second or two later, it pops up this dialog saying "Aww jeeze; sorry about that. We screwed up, but we'll get right on it." It then gives me a little field to pop in my email address if I want to be kept in the loop, a quit button and a restart button.

I clicked restart, and a few seconds later it's like the crash had never happened. The best part of all is that in the background, it's sending crash data back to those tireless peeps at Mozilla so they can figure out why it crashed, and stop it from happening again.

Why is this so much better than anything else I've used? I think Mozilla have managed to do three things really right with this:

  1. There's no long speech about how sorry they are, and how my bug report will help them and that my privacy is always blah, blah, blah, where's the cancel button?

  2. They don't ask me for any information at all. The crash reporter is apparently smart enough to get everything they need.

  3. There are two prominent buttons: quit and restart. Clicking restart puts me straight back where I was without any fuss.

Contrast this to the OpenOffice.org crash experience[1]. When that crashes, it throws up a full-blown wizard, asking for details on what you were doing. When it restarts (eventually, since OOo is so damnably lethargic) it then throws another wizard at you asking you if you want to recover your work. Well, of course I want to recover my work. What a stupid question. And then when it's done that, it stops you again to let you know that it's finished. That's great; get out of my way, already!

Now, all this is not to say that I'm happy with how often Firefox 3 appears to be crashing; I sincerely hope it gets more stable before it gets released proper. But when Firefox does crash, it does what it can to minimise the impact and let you get on with what you were doing.

Really, in an ideal world, this is how all software would work.

To get back to the old car analogies, Firefox 3 is the car that inexplicably explodes every few hours followed by immediately re-assembling itself, all while leaving the driver unscathed.

[1]: Note that I haven't actually had OpenOffice.org crash on me in a while. The description above could, in fact, be an amalgamation of various crash reporters I've experienced.

Wednesday, 29 August 2007

Fakin' closures

Closures are awesome. There are so many algorithms that can be implemented in a very straightforward manner once you have the ability to make little customised functions and pass them around.

For example, it makes functional programming in D a snap. Here's a simple example of what I mean:

string format_list(real[] parts, string sep)
{
    string reduce_closure(string left, string right)
    {
        return left ~ sep ~ right;
    }

    string map_closure(real n)
    {
        return toString(n);
    }

    return reduce(&reduce_closure, map(&map_closure, parts));
}

The one failing of D's closures is that... they aren't really closures. See, D's closures are made up of a function pointer and a pointer to the enclosing function's stack frame. That frame pointer is what allows you to actually create closures, but it's also why you must not use them after the enclosing function has returned. As soon as it returns, it's bye-bye stack frame, and hello Mr. Segmentation Fault if you should try to call the closure. Oh dear.

In practice, this can be worked around by putting your closure inside a class and heap allocating it, but that's annoying. Wouldn't it be nice to be able to just return the damn things out of your functions, or store them somewhere for later use?
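For comparison, in a language whose closures keep captured variables alive on the heap, that's exactly what you get. Here's a Python sketch (not D) of the same "set a value later" idea used in the D example further down; since Python has no pointers, a one-element list stands in for the `uint*`, and `make_setter` is a made-up name.

```python
def make_setter(cell, value):
    """Return a closure that stores `value` into cell[0] when called.

    Python keeps the captured `cell` and `value` alive for as long as the
    closure exists, so it stays valid after make_setter returns.
    """
    def setter():
        cell[0] = value
    return setter

f = [15]                      # one-element list standing in for a pointer
fn1 = make_setter(f, 42)
fn2 = make_setter(f, 1701)

print(f[0])   # 15
fn1()
print(f[0])   # 42
fn2()
print(f[0])   # 1701
```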

If you're easily put off by horrible, hackish code: you'd best leave now. It's going to get nasty.

Remember how I said that closures are made up of a pointer to some code and the enclosing function's frame pointer? Well, on x86 the stack grows downwards, and a function's arguments are stored just above its frame pointer, with its local variables just below it.

Say we have a function foo, with a closure bar, and another function baz.

If we call baz from foo, passing our closure bar, then baz would know two important things:

  1. It would have an upper bound on foo's local variables by virtue of knowing foo's frame pointer (which is contained in the delegate to bar) and

  2. it would have a lower bound based on the address of its own first argument on the stack.

So if we have both a lower and upper bound, we can create a heap copy of foo's local variables, and store that in our delegate instead of the pointer to the stack.
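Here's a toy model of that bookkeeping in Python, with a `bytearray` standing in for the stack and integer indices for addresses. It's a simulation only: the addresses and layout are made up, and real frames also hold saved registers and return addresses. The point is the arithmetic: copy everything from the lower bound up to and including the frame pointer, then point the new "frame pointer" at the last byte of the copy so negative offsets still find the same locals.

```python
def heapify_frame(memory, lower, frame_ptr):
    """Copy memory[lower..frame_ptr] (inclusive) into a fresh buffer.

    Returns (context, new_frame_ptr). Locals sit *below* the frame pointer
    and are addressed with negative offsets, so pointing new_frame_ptr at
    the last byte of the copy keeps those offsets valid.
    """
    context = bytearray(memory[lower:frame_ptr + 1])
    return context, len(context) - 1

def load_local(buf, frame_ptr, offset):
    """Read the byte at frame_ptr + offset (offset < 0 for locals)."""
    return buf[frame_ptr + offset]

# Toy "stack": 32 bytes of memory, higher index == higher address.
stack = bytearray(range(32))
FOO_FRAME = 24          # foo's frame pointer (what the delegate holds)
BAZ_FIRST_ARG = 16      # address of baz's first argument (the lower bound)

ctx, ctx_fp = heapify_frame(stack, BAZ_FIRST_ARG, FOO_FRAME)
before = load_local(stack, FOO_FRAME, -4)      # one of foo's "locals"

# foo returns and its frame gets reused (here: zeroed) by some other call.
stack[BAZ_FIRST_ARG:FOO_FRAME + 1] = bytes(FOO_FRAME + 1 - BAZ_FIRST_ARG)

# The stack copy is clobbered; the heap copy still has the old value.
print(before, load_local(stack, FOO_FRAME, -4), load_local(ctx, ctx_fp, -4))
# 20 0 20
```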

Just to prove the concept, here's a working implementation:

module closures;

import std.stdio;

/**
 * Takes a closure defined in the caller, copies the local variables onto the
 * heap, and returns a modified delegate.
 */
dgT closure(dgT)(dgT dg)
{
    void* end_ptr = dg.ptr;
    void* start_ptr = &dg;

    auto context = new ubyte[end_ptr-start_ptr+1];
    context[] = cast(ubyte[]) start_ptr[0..context.length];

    dg.ptr = &context[$-1];
    return dg;
}

/**
 * Creates a closure that sets *ptr to value when called.
 */
void delegate() make_dg(uint* ptr_, uint value_)
{
    auto ptr = ptr_;
    auto value = value_;

    void dg()
    {
        *ptr = value;
    }

    return closure(&dg);
}

void main()
{
    uint f;

    auto fn1 = make_dg(&f, 42);
    auto fn2 = make_dg(&f, 1701);

    f = 15;
    writefln("f: %s", f);
    fn1();
    writefln("f: %s", f);
    fn2();
    writefln("f: %s", f);
}

When run, it prints:

f: 15
f: 42
f: 1701

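Just like one would expect. Note that this only works if your closure doesn't access the enclosing function's arguments: they live above the frame pointer, outside the region that gets copied, which is why make_dg copies ptr_ and value_ into locals first. But apart from that, it's a pretty neat trick, don'tcha think?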

Well, hey, let's just make everything into a closure, and then we'll have our general garbage collector, installed by 'use less memory'. —Larry Wall

Sunday, 26 August 2007

The Future of D Is Aaargh, My Eyes!

So the first ever D Conference has come and gone. Sadly, being a poor uni student, Australian and busy like an anime nerd in Akihabara with 100,000 spare yen meant I couldn't go.

Thankfully, the ever-wonderful Brad Roberts has posted up most of the slides from the various speakers so the rest of us can take a peek. One of particular interest[1] is the set from Walter and Andrei's talk on the future of D.

I encourage you to read the full set of slides, but here's what I think of it (for what it's worth).

First up, some very welcome additions to the language that will make everyday programming a lot nicer. There's overloading between functions and function templates, which will finally allow you to do this:

void foo(int i);
void foo(T)(T* t);

What can I say but "finally!"? Also on the topic of overloads are function overload sets. Currently, if you import overloaded functions from more than one module (for instance: you import two modules that both have a global toString function), you need to either fully qualify the particular overload you're interested in, or manually alias in each module's overloads.

Function overload sets do away with this provided each overload is distinct. It's a small thing, but it's the small things that make programming in D so much more pleasant than in C or C++[2].

Another of my pet hates is going the way of spelling and grammar online: having to qualify enumeration members. Now you'll be able to elide the enum's name before a member in cases where the compiler already knows the enum's type. Thank god for that.

Then there's the upgrade to switch. One thing I love about D's switch is that by default it will loudly complain (by crashing your program) if you don't have a case to handle the value you're switching on. Walter takes it one step further with final switch which will actually refuse to compile if you don't have a case for every possible value.

Obviously more useful for enums and subsets of integers than, say, strings. But that's OK. No one's perfect.

And while we're on upgraded constructs: static foreach. About time. Not that you can't do it with a range tuple template, but this is much cleaner.[4]

So that's the day-to-day stuff. What about the seriously cool stuff we can dangle in front of C++ and Java programmers to make them cry on the inside?

Well, first up we've got the construct that comes up every few months on the newsgroup: type method extensions. Or, as Walter calls it, uniform function call syntax.

foo(a, args...);
a.foo(args...);

Those two will now be interchangeable. And it'll work with built-in types like int, too. Having suggested this myself a few times, I couldn't be happier about this coming.

It seems that another idea that I've shared with many people is being worked on: pure functions. The concept of "pureness" comes from functional programming. A pure function is one whose output depends only on its arguments, and has no side-effects.

Take, for instance, the following function:

int foo(int a, int b)
{
    return a + b;
}

Yes, it's contrived, but stay with me. This function's result depends entirely on its inputs, and has no side-effects; it's pure. What does this mean?

  • It means that the compiler only ever has to call it once for any given set of arguments. If it sees the same call, with the same arguments, in two places in a function, it can simply compute the result once and reuse it. You can't do this with regular functions because the compiler can't guarantee that the function doesn't have side-effects.
  • More interestingly, it means that the compiler can actually cache results from the function for future re-use. It could even theoretically go the whole hog and just pre-compute every possible result at compile-time.
  • Best of all, because pure functions have no side-effects and don't depend on or alter global state, they can be automatically parallelised. Suddenly, it just got a whole lot easier to make use of all those cores we have these days.
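Python won't check purity for you, but the caching point can be imitated by hand: `functools.lru_cache` memoises a function, and that's only safe because the function is pure. This is a library-level stand-in for what the slides describe the compiler doing automatically, not D code.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def foo(a, b):
    # Pure: the result depends only on the arguments and nothing else is
    # touched, so answering repeat calls from a cache can't change behaviour.
    return a + b

x = foo(2, 3) + foo(2, 3)   # the second call is answered from the cache
info = foo.cache_info()
print(x, info.hits, info.misses)   # 10 1 1
```

If foo read a global or printed something, the cache would silently change what the program does; that's precisely the guarantee marking a function pure gives the compiler.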

But wait, there's more! After bitching and whining about it for years, structs are finally getting constructors and destructors!

That's fantastic. Know what's even better? They're getting a copy operator overload, too. This (along with the awesome alias some_member this; declaration) means that D will finally be able to easily support the last major memory-management technique: reference-counting. I can see this being huge in game development circles.
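To make the reference-counting point a little more concrete, here's a rough Python sketch of the pattern those hooks enable. Python can't hook struct copies, so `copy()` and `release()` are explicit calls standing in for the copy overload and destructor that, per the slides, the compiler would invoke for you; `Resource` and `RefCounted` are made-up names.

```python
class Resource:
    """Hypothetical resource that must be freed exactly once."""
    def __init__(self, name):
        self.name = name
        self.freed = False

    def free(self):
        assert not self.freed, "double free"
        self.freed = True


class RefCounted:
    """Value-style handle; all copies share one counter.

    copy() plays the role of a struct copy hook and release() the role
    of a struct destructor.
    """
    def __init__(self, resource):
        self._shared = {"res": resource, "count": 1}

    def copy(self):
        self._shared["count"] += 1
        twin = RefCounted.__new__(RefCounted)   # skip __init__
        twin._shared = self._shared             # share the same counter
        return twin

    def release(self):
        self._shared["count"] -= 1
        if self._shared["count"] == 0:
            self._shared["res"].free()          # last handle frees it


tex = Resource("texture")
a = RefCounted(tex)
b = a.copy()        # count: 2
a.release()         # count: 1, still alive
print(tex.freed)    # False
b.release()         # count: 0, freed now
print(tex.freed)    # True
```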

There are more new operator overloads, too: we're getting opImplicitCastTo and opImplicitCastFrom. This means we'll be able to construct custom types that can seamlessly "pretend" to be other pre-existing types[6].

This also ties into a new concept being introduced called polysemous values: these are values that have an indeterminate type[7]. Things like cast(int)1 + cast(uint)2, or a type with multiple implicit casts. This allows the compiler to reason about values whose type it can't immediately determine, deferring the decision until later.

We're also getting a new array type that's distinct from slices; slices will work the same as they do now, except you won't be able to modify their length property to resize them, whilst you will be able to with arrays. Anyone who has been caught by obscure and baffling bugs after accidentally resizing a slice will appreciate this; it makes D's arrays that much tighter and safer.

A feature that I actually pooh-poohed a few days ago is getting in: struct inheritance. Sorry, "inheritance." It means that structs will be able to implement interfaces, complete with static checks to make sure all the methods are there, but won't be castable down to that interface. Kind of like C++ concepts for structs.

Another feature that's got me salivating in anticipation is static function parameters. That's where you mark one or more parameters to a function as being static, meaning that the value is "passed" at compile-time. It allows you to write functions that execute partially at compile-time. Think of it as a cross between CTFE and regular functions.

This also means you will be able to write functions that have different behaviours depending on whether some of the arguments are known at compile-time or not. That means you could use a slower but CTFE-compatible algorithm to perform some of the function at compile-time without having to worry about users accidentally using the slower version at runtime.

Then comes the big one: AST macros. These are kind of like templates except that instead of operating on specific types of values, they operate on bits of code. The simple way to think about it is this: when you pass something to a template, the compiler goes off and works out what that expression means, then gives it to the template. For macros, on the other hand, the compiler just hands the macro the unparsed expression and says "here, you work this out."

I have a funny feeling Don Clugston's already incredible BLADE library is going to be even more impressive once we get AST macros. If this keeps up, we may even be able to finally kill FORTRAN[8].

Now, as I mentioned in an earlier post, I was looking forward to writing a neat shell-style variable expansion macro when all this came around. Except Walter's gone and added it into the language itself.

macro say(s) { ... }
auto a = 3, b = "Pi", c = 3.14;
say("$a musketeers computed $b to be $c\n");

Not quite as flexible as what I had in mind, but still. Pants.

Other interesting additions include:

  • the creation of a standard template library,
  • a nothrow contract on functions,
  • the order of function argument evaluation being defined,
  • three new string literal forms:
    • delimited strings,
    • code strings (which would have been so much more useful before macros) and
    • heredoc strings, and
  • a special return storage class for function arguments that let you create both const and non-const versions of a function without having to duplicate code.

Whew! Needless to say, I'm stoked about all this. To quote a famous (and sadly, very dead) Australian: I'm excited!

This really crystallises why I threw my lot in with D: because unlike C, which is basically dead, C++, which is bloated, arthritic and has eight heads, and Java, which makes me want to beat myself over the head with a brick...

D makes programming fun. It doesn't just let me tell the computer what to do, it makes it easy. It's like statically-typed Python in a lot of ways.

But I do have to disagree with Walter on one point: the future of D isn't bright.

It's blinding.

[1]: Not to say they aren't all interesting, mind you.

[2]: The big things make programming in D more powerful, safe and flexible. And they cure baldness, too.[3]

[3]: Not speaking from personal experience, mind you.

[4]: So clean that if you use it twice a day, it'll make your teeth all sparkly.[5]

[5]: Just be careful not to get any on your gums... it's the tiny shards of glass, y'see...

[6]: It also vindicates the decision to use the opX naming system instead of C++'s operator X notation; implicit casting doesn't have a symbol.

[7]: I would have been tempted to call them "quantum values" or "entangled values" or even "boxed cats". But maybe that's just me.

[8]: Unlikely, but it's always nice to dream. I mean, some people voluntarily use Java!

Blinded by the light; revved up like a Deuce, another runner in the night... Madman drummers bummers, Indians in the summer with a teenage diplomat and yes I do know all the words to this damn song thankyouverymuch.

Saturday, 28 July 2007

pragma(msg, "Gday!");

No, I'm not dead.

Just thought I'd drop a line to Pragma who has just chiselled himself out a new corner of Cyberspace over at Phase Positive.

In a hole in the ground there lived a hobbit...

Tuesday, 19 June 2007

Const Wars

A long time ago, in an IRC channel far, far away...

Episode 5

A NEW CONST

It is a period of civil war. Rebel posters, striking from irc and the newsgroups, strive to win their final victory against the evil Constant Empire.

During this battle, Rebel agents were horrified to discover that the Empire's ultimate weapon, the D 2.0 Release, a new compiler with three kinds of const, had been released.

Pursued by the Empire's sinister agents, Princess Mutable races against the clock to find a way to destroy the terrible compiler to save her people and restore mutability to the galaxy...

Episode 5

A NEW CONST

It is a period of civil war. Rebel posters, striking from irc and the newsgroups, strive to win their final victory against the evil Mutable Empire.

During this battle, Rebel agents were horrified to discover that the Rebels' ultimate weapon, the D 2.0 Release, a new compiler with three kinds of const, was feared and hated.

Pursued by the Empire's sinister agents, Princess Invariant races against the clock to find a way to justify the badly needed compiler to save her people and restore type safety to the galaxy...

That's the wonderful thing about religion; you can use it to justify anything, with no requirement for facts or logic!

Thursday, 14 June 2007

Wrapping functions for fun and profit

Error handling sucks. I mean, I've got code to write! I don't want to have to waste my time making sure my function calls actually worked. After all, nothing ever goes wrong when I'm running it, so any problems are the user's fault.

Or something along those lines. That's one thing that always annoyed me about C-style libraries: you had to sit there writing all this code to handle errors on every single damn call. I can't count the number of times I've seen OpenGL examples where they never do a single error check, apparently because it's too much work.

So when I started writing GL code in D, I realised pretty quickly that this was sub-optimal. After all, you want to know when something's gone wrong (if you don't find out, how can you ever fix it?) but having to write all those ifs was out of the question on account of me being supremely lazy.

One way around this is to have a function that takes the error code from a function, and throws an exception if it's something other than NO_ERROR. Sadly, OpenGL doesn't report errors through return values; it has a separate function called glGetError that tells you if an error has occurred.

OK, we can deal with this; we just need a templated function that checks for an error, throws an exception if there was one, or passes back what we pass into it.

T glCheck(T)(T result)
{
    if( glGetError() == GL_NO_ERROR )
        return result;
    else
        throw new GLException("OH NOES!");
}

And that does work. Well, except for functions that have void return types. That's when it starts to get a little ugly; we need to have a different function that we call afterwards.

And this is all well and good if you happen to like simple solutions to problems. Not me, though. I wanted something that I could stick in front of any GL call and have it do error checking. I also wanted to try and remove the double closing paren problem (every time you nest an expression, it gets just that tiny bit uglier).

So let's change things around a bit. Instead of a function that we pass the GL call's result to, let's create a function that wraps the GL call.

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    alias ReturnType!(Fn) returnT;

    static if( is( returnT == void ) )
        Fn(args);
    else
        auto result = Fn(args);

    glCheckError();

    static if( !is( returnT == void ) )
        return result;
}

What we're doing here is creating a function that has the exact same signature as the function we want to call. When we call this wrapper function, it calls the underlying function, checks for errors (throwing an exception as necessary: that's the job of glCheckError), and returns the result.

Those static ifs are there because you can't declare a variable of type void in D, which kinda sucks. You'd use the above function like this:

glCheck!(glClear)(GL_COLOR_BUFFER_BIT);

For those keeping count, that's one character longer than the "pass-through" style. The nice thing is that this works uniformly with any function, no matter its return type.
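Python has no void to trip over (a function with no return value just hands back None), so the wrapper idea collapses to a plain higher-order function. Below is a sketch against a made-up C-style API with a separate error query, in the spirit of glGetError; `get_error`, `fake_clear`, `fake_gen_list` and `APIError` are hypothetical stand-ins, not an OpenGL binding.

```python
import functools

# Hypothetical C-style API: errors are reported through a separate query.
NO_ERROR = 0
INVALID_VALUE = 0x0501
_error = NO_ERROR

def get_error():
    """Return the pending error flag and clear it."""
    global _error
    err, _error = _error, NO_ERROR
    return err

def fake_clear(mask):
    """Returns nothing (like a void C function); sets an error on bad input."""
    global _error
    if mask < 0:
        _error = INVALID_VALUE

def fake_gen_list(n):
    return 1000 + n

class APIError(Exception):
    pass

def checked(fn):
    """Wrap fn so every call is followed by an error check.

    Works the same whether fn returns a value or nothing (None).
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        err = get_error()
        if err != NO_ERROR:
            raise APIError("%s failed with error 0x%04X" % (fn.__name__, err))
        return result
    return wrapper

checked(fake_clear)(1)               # fine, returns None
print(checked(fake_gen_list)(2))     # 1002
try:
    checked(fake_clear)(-1)
except APIError as e:
    print(e)                         # fake_clear failed with error 0x0501
```

Using `fn.__name__` in the message plays the same role as `.stringof` in the D version: the exception names the call that failed, not just the error code.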

However, we can still improve this. For instance, since we have an alias to the function being called, we can improve the call to glCheckError to this:

glCheckError((&Fn).stringof)

This allows us to report the exact GL call that failed (normally, all we would get is an exception telling us which error code we got). Even cooler, however, is we can use this information to actually log our GL calls as they happen:

version( gl_LogCalls )
{
    log.writef("%s",(&Fn).stringof[2..$]);
    log.writef("(");
    static if( args.length > 0 )
    {
        log.writef("%s", args[0]);
        foreach( arg ; args[1..$] )
            log.writef(", %s", arg);
    }
    log.writeLine(")");
    version( gl_LogCalls_Flush )
        log.flush();
}

If we place that in our glCheck function just before we call the function itself, it gives us the ability to trace through our GL code without having to hunt through functions. This can be really useful when you've got some weird behaviour, and can't figure out what's causing it.
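The logging half carries over to any language that can get at a function's name. Here's a Python sketch using `__name__` where the D version uses `.stringof`; `set_color` is a made-up function and an `io.StringIO` stands in for the log.

```python
import functools
import io

def logged(fn, log):
    """Wrap fn so each call is written to `log` as "name(arg1, arg2, ...)"
    before the underlying function runs."""
    @functools.wraps(fn)
    def wrapper(*args):
        log.write("%s(%s)\n" % (fn.__name__, ", ".join(str(a) for a in args)))
        return fn(*args)
    return wrapper

def set_color(r, g, b):     # hypothetical API call
    return (r, g, b)

log = io.StringIO()
logged(set_color, log)(1.0, 0.5, 0)
logged(set_color, log)(0, 0, 0)
print(log.getvalue(), end="")
# set_color(1.0, 0.5, 0)
# set_color(0, 0, 0)
```

Writing the line before making the call, as the D snippet does, means that if the program dies inside a call, the last logged line is the one that did it.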

One last improvement: in OpenGL, there are times when calling glGetError can itself cause an error. The most obvious is between glBegin and glEnd calls. You can solve this either by building some logic into glCheck to account for glBegin/glEnd blocks, or by doing what I did and splitting the function in two: glSafe, which does the error checking, and glRaw, which doesn't.
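The first option, tracking whether you're inside a glBegin/glEnd-style block, might look something like this Python sketch. Everything here is hypothetical: the error query is injected as a plain function so it runs without any GL context, and `queries` just counts how often the error state would have been checked.

```python
import functools

class Checker:
    """Skips error queries between a block-opening and block-closing call,
    where (as with glBegin/glEnd) querying the error state is itself an error."""
    def __init__(self, get_error):
        self.get_error = get_error
        self.inside_block = False
        self.queries = 0

    def wrap(self, fn, opens=False, closes=False):
        @functools.wraps(fn)
        def wrapper(*args):
            result = fn(*args)
            if opens:
                self.inside_block = True
            elif closes:
                self.inside_block = False
            # Only query once we're back outside a block; errors raised
            # inside it stay pending until the block closes.
            if not self.inside_block:
                self.queries += 1
                err = self.get_error()
                if err:
                    raise RuntimeError("%s: error %d" % (fn.__name__, err))
            return result
        return wrapper

def begin(): pass
def vertex(x, y): pass
def end(): pass

chk = Checker(lambda: 0)          # pretend no errors ever occur
gl_begin = chk.wrap(begin, opens=True)
gl_vertex = chk.wrap(vertex)
gl_end = chk.wrap(end, closes=True)

gl_begin(); gl_vertex(0, 0); gl_vertex(1, 0); gl_vertex(0, 1); gl_end()
print(chk.queries)                # 1: checked only after end()
```

The trade-off is that an error from a call inside the block is reported against the closing call, which is less precise than the glSafe/glRaw split but needs no extra thought at the call site.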

Before I go, one small note: if you are using DerelictGL, you need to replace the Fn in ReturnType!(Fn) and ParameterTypeTuple!(Fn) with typeof(Fn) because of a weird bug with specialising templates on aliases to function pointers, and replace the line that logs the name of the function with:

log.writef("%s",Fn.stringof);

Wrap me up, in your love, your love takes me higher...