Thursday, 7 June 2007

You can't touch this.

So, there seem to be quite a few people who are confused about the upcoming changes to const-ness in D 2.0. The topic came up, once again, in the #D IRC channel, and I've been persuaded to write up how I understand it.

It is worth pointing out that this is all up in the air at the moment, meaning all of this could be rendered incorrect before I even finish writing it. Also, none of the examples below use real syntax, since the syntax appears to have recently changed, and I'm still not entirely sure what the impact is. Think of the examples as being "pseudo code."

That said, the concept itself seems fairly stable, which is what I'm going to talk about. With that, on with the show.

Firstly, it's important to disregard any pre-conceived notions you may have about what the word const means. What it will mean in D 2.0 is very different to what it means in D 1.x, C, C++ or any other language you might know.

Now, with D 2.0 will come three kinds of const-ness. For the moment, we will call these "storage const", "reference const" and "reference immutable."

Storage const is the easy one; it simply means that whatever bits are being used to store a particular variable cannot be changed once they are initialised. For example, every int variable in your program has four bytes of memory allocated to it somewhere, be it on the stack or in the static data segment. Storage const says that you can store a value into this int, but you cannot change it afterwards. One pretty obvious use of this would be:

"storage const" PI = 3.1415926535897931;

Unless you work for Xerox, you aren't likely to change the value of Pi. The same thing can be done with variables in functions:

void doStuff(int a, int b)
{
    "storage const" int c = a + b;
}

This is fine since we set the value of c during initialisation. However, the following wouldn't compile:

"storage const" int meaning_of_life = 42;
meaning_of_life = 72;

Once a storage const variable has had its value set, there's no changing it. There's one other interesting question: what about reference types like pointers, arrays and objects? Remember that a reference type stores an address of some kind. A storage const pointer, for example, would have the pointer itself fixed, but what it points to is unaffected. For example:

"storage const" int* stuff = new int;

(*stuff) = 23; // This is cool
(*stuff) = 19; // Still cool
stuff = new int; // This ain't cool at all.

In this example, we're free to change the int that stuff points at, but we are not allowed to make stuff itself point anywhere else.
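If you want something you can actually compile, here is a minimal sketch of the storage const idea for a plain int. It rests on assumptions that go beyond this post: it targets a later D 2.x compiler, where, as far as I know, the keywords settled as const and immutable, and where const on a value type like int behaves like the storage const described above. I'm not aware of a way in those compilers to fix only the pointer while leaving the pointed-at data writable, so the pointer example stays pseudo code. The names addUp and meaningOfLife are just for illustration.

```d
// Sketch only: `const` stands in for "storage const" on a plain int.
int addUp(int a, int b)
{
    const int c = a + b; // set once, during initialisation
    // Reassigning c afterwards is rejected at compile time.
    static assert(!__traits(compiles, c = 0));
    return c;
}

void main()
{
    const int meaningOfLife = 42;
    // The assignment from the text above does not compile.
    static assert(!__traits(compiles, meaningOfLife = 72));
    assert(meaningOfLife == 42);
    assert(addUp(2, 3) == 5);
}
```

The static assert lines are there so the "this wouldn't compile" claims get checked by the compiler instead of sitting in a comment.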

Once you've wrapped your head around that, we can move on to "reference const".

Unlike storage const, reference const says nothing about a variable's storage. Rather, it deals with what that variable points at. If we take the previous example and change "storage const" to "reference const", we get:

"reference const" int* stuff = new int;

(*stuff) = 23; // This isn't cool anymore.
(*stuff) = 19; // Neither is this.
stuff = new int; // This *is* cool.

What reference const is saying is "Ok, you've got a pointer to something, but you aren't allowed to change it; feel free to point at something else, though!" In a way, reference const can be thought of as a read-only view of some data. You can look, but can't touch.
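To make the "read-only view" idea concrete, here is a small sketch. It assumes a later D 2.x compiler and the const(int)* spelling, which is my assumption about where the parentheses ended up and not something this post settles. The names owner, view and readThrough are made up for the example.

```d
// Reads through a read-only view; it cannot write to *p.
int readThrough(const(int)* p)
{
    return *p;
}

void main()
{
    int* owner = new int;      // mutable access, kept by the owner
    *owner = 23;

    const(int)* view = owner;  // read-only view of the same int
    assert(readThrough(view) == 23);

    // Writing through the view is rejected at compile time.
    static assert(!__traits(compiles, *view = 19));

    *owner = 19;               // the owner can still change the data
    assert(*view == 19);       // and the view sees the change

    view = new int;            // pointing the view elsewhere is allowed
    assert(*view == 0);        // a freshly allocated int starts at 0
}
```

Note that the data changed underneath the view. Reference const only promises that you can't touch it, not that nobody else will.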

Another important thing is that reference const is transitive. Take this, for example:

"reference const" int** foo;

foo is a reference const pointer to a pointer to an int. Notice that we didn't say anything about the int* we're pointing to; just foo itself. That means we can't do this:

(*foo) = new int;

However, we also can't do this:

(*(*foo)) = 42;

This can be kinda tricky to explain, so just think of it like this: once you go through a "reference const" reference (be it a pointer, array or object reference), all references from that point onward are "reference const". Essentially, you can't change anything beyond a reference const.
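Transitivity is easier to believe once the compiler agrees with you, so here is a hedged sketch. It assumes a later D 2.x compiler and writes the pseudo code's "reference const" int** as const(int*)*, which is my guess at the closest real spelling. The function name peek is hypothetical.

```d
// foo itself may be repointed, but nothing behind it can be written.
int peek(const(int*)* foo)
{
    // One level down: the int* we point at cannot be replaced.
    static assert(!__traits(compiles, *foo = new int));
    // Two levels down: the int at the end cannot be changed either.
    static assert(!__traits(compiles, **foo = 42));
    return **foo; // reading is fine at every level
}

void main()
{
    int* inner = new int;
    *inner = 7;
    int** outer = &inner;
    assert(peek(outer) == 7);

    *inner = 8;               // the owner still has full access
    assert(peek(outer) == 8);
}
```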

Go back over that, and make sure you understand what's going on. Once you've done that, we'll move on to the last and strongest form of const-ness: "reference immutable".

Now, reference immutable is very similar to reference const; it applies to reference types and says "you can look, but can't touch." The important difference is that where reference const says "you can't touch", reference immutable says "no one can touch." Reference immutable is a guarantee to you and the compiler that nothing can modify the data being pointed at. For instance, if you have a static string in your program stored in the data segment (a part of the executable itself), that string is reference immutable since no one is able to change it. In fact, a reference immutable piece of data may not even necessarily have an address in memory; if it never changes, the compiler could potentially just insert its value wherever it gets used.

From the user's end, there isn't any appreciable difference between something that's reference const and something that's reference immutable. The difference is on the creator's end; you might get a reference const buffer that the owner keeps updating, or a reference immutable buffer that gets filled and then fixed.
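Here is a rough sketch of that creator-versus-user split. It assumes a later D 2.x compiler, where I believe "reference immutable" is spelled immutable; the function total and the variable names are invented for the example.

```d
// Accepts any int array it is only going to read.
int total(const(int)[] xs)
{
    int sum = 0;
    foreach (x; xs)
        sum += x;
    return sum;
}

void main()
{
    int[] buffer = [1, 2, 3];               // owner keeps write access
    immutable(int)[] fixed = [10, 20, 30];  // nobody can write to this

    // From the user's end both look the same: a const reader takes either.
    assert(total(buffer) == 6);
    assert(total(fixed) == 60);

    // On the creator's end they differ: mutable data cannot be
    // passed off as immutable without a cast.
    static assert(!is(int[] : immutable(int)[]));

    buffer[0] = 100;                        // the owner updates its buffer
    assert(total(buffer) == 105);

    // String literals are immutable data: string is immutable(char)[].
    static assert(is(string == immutable(char)[]));
}
```

The reader function only asks for const, which is why it works for both the buffer that keeps changing and the one that is filled and then fixed.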

So, the only question remaining is: how do these three concepts map to keywords in D 2.0? At the moment, it looks like this:

  • "storage const" will be final,
  • "reference const" will be const and
  • "reference immutable" will be invariant.

Let me just reiterate that I'm not saying anything about where these keywords go, where you put parentheses, etc. I'm just trying to explain the concepts; don't take any of the code examples above literally.

One last thing I'd like to point out (as suggested by Oskar Linde): whilst reference const and reference immutable are part of a variable's type, storage const isn't; it only affects one particular variable. For example:

"storage const"   int  a;
"reference const" int* b;

int  c = a; // You can do this; typeof(a) is int
int* d = b; // But can't do this; typeof(b) is "reference const" int*
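A compilable approximation of those two assignments, assuming a later D 2.x compiler. One caveat: as far as I know, such a compiler reports the type of a const int variable as const(int) and not int, so this sketch only demonstrates which assignments are allowed, not the typeof claim. The function bumpCopy is hypothetical.

```d
// Copying a const int into an ordinary int gives a fully mutable copy.
int bumpCopy(const int a)
{
    int c = a; // allowed: the value is copied out
    ++c;       // changing the copy does not touch a
    return c;
}

void main()
{
    const int a = 5;
    const(int)* b = &a;

    assert(bumpCopy(a) == 6);
    assert(a == 5);

    // A mutable pointer cannot be made from a const one...
    static assert(!is(const(int)* : int*));
    // ...though going from mutable to const is allowed.
    static assert(is(int* : const(int)*));
    assert(*b == 5);
}
```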

Hopefully, that's helped to clear things up, at least a little. If you have any questions or corrections, let me know and I'll try to address them.

Stop! Hammer time!


Dan said...

Wow, thanks. That really cleared this up for me, I've been spending way too much time trying to piece it all together lurking in the NG.

Jesse Tov said...

You say this is nothing like _const_ in C/C++, but then the first two things you introduce are! A "storage const" int ** is "int ** const" in C, and a "reference const" int ** is just "int const * const *" in C. I've always found the C syntax for this appealing, but I realize that many people don't. Is the goal here to reduce confusion? Because it's nothing new.

Also, you say that storage const-ness isn't part of the type system, but certainly if you have a storage const int a, then &a must be a reference const int *. There's something type-y going on there.

On the other hand, immutability is a nice feature to have, and I agree that's nothing like C(++) const. (Though there is some kind of relationship between const, invariant, and volatile, eh?)

Dk said...

Just a general comment: now that D 2.0 has been released with the shiny new const stuff, it seems that what I wrote here is still true (go me!)

That said, I urge everyone interested in D's const system to download the new compiler from the DigitalMars website and play with it. Reading about it is all well and good, but nothing beats being able to poke it with a stick...

Jesse: when I first started drafting this, I used the actual keywords. A lot of people will see the word "const" and think "I bet it works just like it does in C++!" In some ways it does, but in others it doesn't. It's simpler to just beat people over the head with "it's not C++; don't try comparing it because I'm sick of splitting hairs on this" than not.

Also, I never said that storage const wasn't part of the type system: I said it's not part of a variable's type. That taking the address of a storage const variable gives you a reference immutable type is kind of a side-effect of the addressof operator.

Finally, on immutability, there are two schools of thought at the moment: that it's really good to have, or that it's totally useless. I suppose we'll soon find out which is more correct.

Tyler Prete said...

Funny that you say no one wants to change the value of pi, because that is precisely what you did.

"storage const" PI = 3.1415926535897931;

That last value should be a 2 ;)

I'm just poking fun though. This was a good article, very informative.

Tomas said...

Good explanation. I like this disambiguation between different usages of const. C++ does not provide immutable storage, and does not differentiate between the other two uses of const.

Will D allow one to call functions on a reference const value that mutate the object, or is there a way to mark functions as non-mutating (as in C++)? I feel much of the value of const is lost without this.

Dk said...

Tyler: Well, at least I'm closer than Terry Pratchett was in Going Postal; he used 3.

Tomas: Anything's possible with enough casting. However, unlike C++, casting away constness and then changing something has undefined behaviour. Also, you can have member functions marked as "const" that prohibit you from changing an object through its this reference.
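For the const member function part, a minimal sketch, assuming a later D 2.x compiler. The Counter struct and its members are made up for illustration.

```d
struct Counter
{
    int n;

    // const member function: `this` is read-only inside it
    int get() const { return n; }

    // ordinary member function: may change the object
    void bump() { ++n; }
}

void main()
{
    Counter mine = Counter(1);
    mine.bump();
    assert(mine.get() == 2);

    const Counter theirs = Counter(3);
    assert(theirs.get() == 3);          // const functions are callable
    // A mutating function cannot be called on a const object.
    static assert(!__traits(compiles, theirs.bump()));
}
```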