30 August 2006

The Kernel Programming Language and the Quest for Simplicity

I've been reading with considerable interest about the Kernel programming language under development by John Shutt. Information and a PDF file are available here.

I am not sure I fully understand the implications and details of his $vau and $wrap constructions, but I strongly support his goals, and the rest of his report kept me, literally, awake half the night reading it. Shutt writes:

...the mandate of the Kernel language is to have a clean design, flowing from a broad design philosophy as refined by a series of more specific design guidelines --- just one of which is that all manipulable entities should be first-class.

He is trying to create a Scheme-like language which is even more uniform, predictable, and configurable than R5RS Scheme.

I don't have a Ph.D. in the programming language field, so bear with me if my terminology is a little vague. It seems that the big thing Shutt is attempting is to give fine-grained control over evaluation, essentially separating out the parameter-evaluation stage of a lambda call into a distinct and fully configurable stage. This means you can do more clever things with objects without evaluating them. I think it also points you towards macro-like constructions that don't lead off into the existing slightly swampy maze of different Scheme macro implementations. It also could be a more explicit and flexible means of handling the need for quoting.

On top of that he is also doing a number of slightly smaller things (on the scale of theoretical complexity), but which I think are great.

For instance, Shutt has in his design very explicit rules for what to do with cyclic structures, a.k.a. infinite lists. While in some situations creating these would be considered an error, in others they make perfect sense: for example, in creating trees that describe recursive-descent parsing. Therefore, giving a well-defined semantics to handling cyclic structures is a sensible and intriguing thing to do. If you want to write functions which aren't willing to handle cyclic structures, you can use one of his predicates to check for that possibility.

He also has addressed a number of minor irritants which don't require a deep understanding of the lambda calculus to appreciate. For instance, he proposes that boolean evaluation should never be implicit: that is, if you want to evaluate something as a boolean value, you have to use a predicate on it to generate a boolean value, and compare it to true or false. In other words, no more "anything not explicitly false is true" or "false is false, and nil is false, but anything else is true, and true is true" or other bizarre variants on this theme.

As someone who switches languages a lot, I have always tried to make my boolean evaluations absolutely explicit to avoid the potential confusion that implicit evaluation can bring. For example, I always write (pedantically) in C++:

if ( false == stack.full() ) {
    stack.push( ... );
}

In addition, he has addressed the question "if everything in Scheme has a value, what is the value of a control structure?" To make this explicit he proposes an immutable value called inert, somewhat analogous to void in C/C++, which is an explicit type and which even has an explicit external representation.

There's lots more great stuff in Shutt's draft report and it addresses many of the weaknesses that make Scheme implementations so maddeningly inconsistent. Read it, study it, live it. I am looking forward to Shutt's doctoral thesis and I hope that there is one day very soon a freely available implementation of Kernel to try out!

17 August 2006

A Rant about Const

WARNING: THE FOLLOWING MAY CAUSE BRAIN DAMAGE

This is an unfinished essay that sat in draft form in my blog, unpublished, for several years. It still seems very unfinished, but I decided to go ahead and publish it anyway. I'm not sure it actually makes sense; it certainly needs some revision. It's possible my thinking has just been too altered by working with pure functional programming languages such as Haskell to really understand the C++ mindset anymore. There may indeed be a point buried in here somewhere, if only I can determine what it is, but it might be so trivial or pedantic as to be uninteresting to anyone but me. But I'll let you, the reader, decide that.

A while back I was thinking about why I couldn't qualify the return value of a function -- in this case an object -- with const. When compiling some code with GCC, such a qualifier produced a warning, something like "type qualifier on return type is meaningless." GCC was calling my code meaningless.

Maybe it isn't sensible, and maybe the compiler can't enforce it, but it seemed to express what I meant, so I looked this up in the comp.lang.c++ newsgroups. What I keep reading is that returning a const value, when the value is int or float or some other plain built-in data type, as opposed to a class type, is not "meaningful."

Either I'm too dumb to understand this argument, or the argument isn't actually coherent; it is just widely propagated because everyone mistakes the existing C/C++ runtime behavior for a truly sensible model of language semantics.

What they seem to be actually saying is "according to the language standard, C++ is not allowed to enforce meaningful const semantics on return types that are not reference or pointer types. And pointer types are hopelessly unsafe. But references can be pretty unsafe too."

There was a thread about this on comp.lang.c++, specifically about why you can't return a const int. One of the comments was:

"A value returned by a function is a rvalue, unless it is a reference. The notion of being (or not being) const-qualified is not applicable to rvalues of non-class types ('int' in your case). A rvalue cannot be modified just because it is a rvalue. That's it. It doesn't make any difference whether you const-qualify it or not. That's what your compiler is trying to tell you."

See: http://groups.google.com/group/comp.lang.c++/browse_thread/thread/4e50f8abf9c11f71/6039d5b2e6612df0%236039d5b2e6612df0

As far as I can tell, the C++ standard specifies this because of the way returning values from functions has usually been implemented in C/C++ (by copying on the stack). So the caller isn't actually getting the "same" object or integer value, and in fact the original will be gone.

What this means is actually that you can freely change the return value:

int y = f(x);
y = y + 1;

But this is not actually "changing the rvalue" because the rvalue was actually copied into y; you're just changing a copy. So the statement above is correct in that the rvalue "cannot be modified." (It means that, but only indirectly.)

Even in cases when the compiler puts the return values in registers, or wherever, the standard rules about lvalues and rvalues still apply, for sanity and safety and consistency.

But this is not what I want to express when I use const with a return value -- consider it an annotation, if you will, rather than something the compiler will enforce. In the methods that generated warnings, I was using const return values to mean:

Case 1: what I'm returning is the result of a mathematical function call: f(x) always returns the same value for the given x, in a functional programming sense, so it wouldn't make sense for you to change the value I'm returning to you. If my function f is f(x) = 1 + x, then f(1) = 2. It doesn't make any sense to then say f(1) = 3 later. But of course this makes more sense in Haskell, where variables don't "vary," than in C or C++.

Case 2: what I'm returning is the result of an iterator and represents the state of another object at a given moment in the life of the program. Therefore, you shouldn't change the value itself; if you want to update it, call the iterator to get a new updated value. If, say, I write iterator_state = iterator.GetNextValue() and get a new iterator_state back, it doesn't make any sense to say iterator_state = iterator_state + 1 without calling GetNextValue(). Of course, that expresses something in C++, but your program becomes confusing. It would be less confusing, readability-wise, if you explicitly copied iterator_state to a working variable before altering it.

It seems to me that I ought to be able to express the notion that the caller of one of my object's methods should not update the object I'm returning, whether "returning" means binding it via a pointer, a reference (hidden pointer), or by copying the rvalue.

If it is a const u32_t I'm returning, the compiler should insist that the value be assigned only to a const u32_t. (It doesn't).

If I'm returning a const object, it seems to me that the compiler should be able to require that it is assigned to a const variable. (It doesn't, but at least it doesn't generate the warning in those cases, because const-qualification is considered meaningful for class-type return values.)

This is, as far as I can tell, a completely arbitrary distinction, with the sole exception that copying an object gets to invoke the copy constructor (default, or one you provide), whereas copying a built-in type just does it for you.

In a similar thread Greg Comeau wrote:

"Also, returning a const T, where T IS NOT A BUILTIN TYPE may not be meaningless, because you may end up calling member functions on that object, and the member function may need to be a const member function."

It is "meaningful" in the sense that you may want to put the returned object into a const variable to _prevent_ calling any non-const member functions. But Greg phrased it backwards. You can always call const member functions on non-const objects. He should have said "you may end up calling member functions on that object, and you might want to prevent the calling of non-const member functions."

Example:

// Declarations, as they would appear in the class:
static AZMMultiZoneRef_c const MakeFromZonesBitmap( u32_t zones_bmp );
u32_t MutateMe( void_t );

// Usage:
const AZMMultiZoneRef_c zones_ref = AZMMultiZoneRef_c::MakeFromZonesBitmap( p_data->zones_bitmap );
zones_ref.MutateMe();

The compiler will (rightly) complain about this. But it will not complain about calling const member functions on a non-const object. The intent is that you should not be able to mutate an object that is const, except using the mutable workaround to allow hidden state changes.

However, the rules for returned objects let us completely get around this! If I don't like that restriction, I can just assign the return value to a non-const object. So this is legal:

AZMMultiZoneRef_c zones_ref = AZMMultiZoneRef_c::MakeFromZonesBitmap( p_data->zones_bitmap );
zones_ref.MutateMe();

Not much of a restriction, then, is it?

OK, so let's say we're more serious about enforcing const. So, we'll use a pointer or reference instead, because that will maintain the const enforcement restrictions, right? BUT... the semantics are actually quite different, and there is a serious safety hole introduced:

"When returning references, a reference bound to the return value is not valid after object deletion; when returning values, a reference bound to the return value is valid after object deletion."

And, of course, if what you are returning a pointer or reference to is an object in a local variable, that is a seriously broken program right there. So it is kind of a "damned if I do, damned if I don't" situation.

C++ is actively hostile to functional programming. In fact, the "safety features" of the language can't even be used consistently. And with this compiler, when you try to express what you mean (taken as a given that the compiler won't enforce it), it treats it as a warning!

27 June 2006

Paying the Sin Tax on Semantics: the Lowly Struct

While this is not strictly about Scheme, I want to make a point about language design and the tax we pay (call it a sin tax, as the sins of the language designer are visited on generations of hapless language users) when there is no regularity in the syntax.

Programmers from the C/C++/Java world often look at languages like Scheme, with its relatively uniform syntax (or, I should say, almost a complete lack of syntax, where the language source is pretty much a textual form of an abstract syntax tree) and bemoan how strange it is.

But, as I will show, the alternative is actually far more complex: that alternative is to have semantics and syntax all jumbled together. The result is a mess that comes back to confuse generations of programmers again, and again, and again.

Said programmer with brains damaged from years of slogging away in C, C++, or Java might only really see this after working a bit with simple languages, such as Scheme, and then returning, as I've done with my embedded work. Going back is painful. To quote Gollum's Song:

Where once was light
now darkness falls
where once was love
love is no more

Let's look at one particular example which seems simple, but which is not, and which C and C++ programmers have come to accept. The Unix-Hater's Handbook describes this phenomenon with a quotation attributed to Ken Pier at Xerox PARC, which could also apply to the C family of languages:

"I liken starting one's computing career with Unix, say as an undergraduate, to being born in East Africa. It is intolerably hot, your body is covered with lice and flies, you are malnourished and you suffer from numerous curable diseases. But, as far as young East Africans can tell, this is simply the natural condition and they live within it. By the time they find out differently, it is too late. They already think that the writing of shell scripts is a natural act."

C has the concept of a struct, which is a sort of low-level record:

struct s_tag {
   int x;
   char y;
};

This defines an aggregate type. If you put names after the definition, they define variables of that type:

struct s_tag {
   int x;
   char y;
} s_var_1, s_var_2;

Just like you define instances of an int:

int i_var_1;

You don't actually need a name for the type, unless you want to use it elsewhere; the following is legal:

struct {
    int x;
    char y;
} s_var_1, s_var_2;

However, even given the relative simplicity of this construct, we're already in a twisty maze of language features, all not quite alike! The struct is unusual in C in that it operates in a separate namespace reserved for struct and union tags, kind of like the way Common Lisp uses a distinct namespace for functions. While you normally define an instance of a variable like this (say, at file scope, or as a local variable inside a function):

int my_int;

Given the struct definition:

struct s_tag {
    int x;
    char y;
};

you cannot write:

s_tag s_var_1, s_var_2;

because the struct tag (the part that goes between the keyword "struct" and the opening curly bracket) is not a type in the general C namespace.

To access that namespace, you use the word struct again as a qualifier:

struct s_tag my_struct;

In practice, most C programmers ignore this second namespace, and create a typedef, which makes an alias of the struct tag in the general C namespace. This could be written:

struct s_tag {
   int x;
   char y;
};

typedef struct s_tag s_type;

but there is a commonly used shortcut, in which wrapping a typedef around the struct declaration changes its interpretation, and instead of defining variables, you are declaring a type:

typedef struct s_tag {
   int x;
   char y;
} s_type;

Since you aren't going to use the tag, you can leave it off altogether:

typedef struct {
   int x;
   char y;
} s_type;

and still use s_type as a type:

s_type my_struct;

Already, as you can see, the struct requires a kind of "little language" within C, with a lot of context-sensitivity necessary to parse all this. And people think Lisp's format is complicated! Other "little languages" include the way in which for loops and switch statements are written.

A "little language" is also known as a DSL, or domain-specific language -- you see them used a lot in Lisp-style programming, and implemented using macros. But if we wanted to try to support these various syntactic and semantic variations using Lisp or Scheme-style macros, we couldn't do it -- it isn't just a matter of replacing curly brackets with parentheses; there is no underlying common syntactic form. In other words, comprehending a language like C requires that we constantly change the behavior of the lowest level of the compiler, the parser, in a context-sensitive manner. This makes the full C language quite complex, and don't even get me started on C++. It also imposes a burden on the programmer, which he or she will tend, over time, to forget is there -- read the quotation from Ken Pier again!

There's another way in which structs are special. C supports a special form of initializer, called an aggregate initializer. This lets you specify the contents of a struct on initialization -- that is, when it comes into scope, whether it is file scope or function scope -- but not later, upon assignment, also known as mutation. In other words, the legal syntax changes again depending on context!

This makes it very clear to me that it is highly valuable to teach Scheme first, so that this tortured mess can be understood as it is, rather than as a set of givens about programming in general.

This aggregate initializer syntax can also be used for arrays, unions, and various nested combinations of the above. It also can in particular be used for const variables. For our example:

const s_type my_struct = { 1, 'a' };

If you leave off some of the values, the remainder of the struct will be initialized to zero; aggregate initializers may be nested; and dimensions (except for the outermost dimension of an array) must be specified, or the compiler won't know how much data to supply. All well and good, and commonplace to C programmers. Simple, useful, and aside from some caution you must exercise about alignment and possible holes in the runtime representation, structs generally go about their business quite happily and never hurt anyone. (Of course, this is a lie; this shit causes unanticipated bugs and crashes every day. It would be more accurate to say that at this point we haven't introduced that much complexity, and so experienced developers who have been around the block have learned how to use these language elements with reasonable safety by applying some commonplace "best practices." This isn't the same as claiming that the language truly supports or helps to enforce this reasonable safety).

And then C++ came along.

One of the ways you can initialize objects in C++ is to provide an initializer list in your constructor. This is in fact the required method for initializing members of a class that are const or that are references, which cannot be re-bound after initialization. In other cases, it is not required that you initialize members that way, but because members are default-initialized before the constructor body runs, if you instead assign an initial value in the constructor body, you're wasting effort. Chalk this up to the distinction C and C++ make between initialization and assignment; somewhat baffling to Scheme programmers, who would understand both as variants on the underlying concept of binding.

The class could be declared like this:

class my_class {
    const int member_a;
    const int member_b;
};

The constructor is declared like this:

my_class( int a, int b );

and defined like this:

my_class::my_class( int a, int b ) :
   member_a( a ),
   member_b( b )
{
    /* body, which is often empty */
}

The initializer list is, essentially, made up of constructor calls.

What do you do if you have a struct in your class?

class my_class {
    const s_type my_struct;
};

Can you initialize it like this?

my_class::my_class( int a, int b ) :
   my_struct( { a, b } )
{
    /* body, which is often empty */
}

Naturally, after using Scheme for a while, I tend to presume that programming languages are at least somewhat regular. There's an aggregate initializer syntax (really, another tiny DSL) that you can use to initialize a struct, right? So can I use it here?

The answer is "no, that would be too easy!"

If you are accustomed to Scheme, you have become used to a language that is relatively regular -- different language forms, which return values, can be used just about everywhere it might make sense to do so. But C++ was not designed like that. There is no underlying syntax that is common despite the changed context.

In particular, although compatibility with C was a major goal of the language design, C++ broke this compatibility in several ways:

1. struct types can be declared directly in the common namespace.

In other words, you can just say

struct s_type {
  /* ... */
};

without having to typedef it, and simply use s_type freely as a type. The typedef struct form still works, but you've introduced an alias for the type; this is not usually an issue since C programmers tend to use prefixes, or suffixes, or some Hungarian-like name mangling to try to keep them straight. This has the side benefit of allowing you to use header files containing the same struct declarations in plain old C, assuming you keep those headers free of other C++-specific code.

2. Structs are classes.

It seems like a strange design decision if a design goal was really to maintain as much compatibility with C as possible, but when Bjarne Stroustrup designed C++, he apparently decided that a struct was just a class, with different default access rules (all members are public by default). The C construct known as the union is also a class now, but that's another story for another day.

But despite this surface uniformity, C++ is still a hybrid language. It isn't "objects all the way down," like Ruby, or Dylan, where even an integer is an object, with a class. Which means that a lot of other language constructs act kind of like classes, in some contexts -- except when they don't.

When you read Stroustrup's book The C++ Programming Language, Special Edition, expecting to look at examples of construction and initialization in classes, he often throws in a struct instead. The C++ Standard document often does the same thing. (Warning: reading the C++ Standard document can cause persistent brain damage).

This also means it is perfectly legal to adorn the good old struct with public, protected, and private, and to provide constructors, assignment operators, and various other methods.

In fact, if you want to use a struct as a const member of a class, which means that it must be initialized using the initializer list syntax described above, you must do so by giving the struct a constructor. You can do this inline as follows:

struct s_type {
   int x;
   char y;
   s_type ( const int init_x, const char init_y )
       { x = init_x; y = init_y; }
};

Or you can just declare the constructor in the header and place the constructor definition in your implementation file, and you can even... wait for it... use the same initializer-list syntax:

struct s_type {
    int x;
    char y;
    s_type ( const int init_x, const char init_y );
};

s_type::s_type( const int init_x, const char init_y ) :
    x( init_x ),
    y( init_y )
{
    /* empty constructor body */
}

And then, finally, in your class-containing-a-struct's initializer list, you can do this:

my_class::my_class( int a, int b ) :
   my_struct( a, b )
{
    /* body, which is often empty */
}

All that syntax, just to initialize a structure member, because Stroustrup decided that initializer lists would actually be lists of constructor calls, except when they aren't quite.

But what if you don't want to bother with that, but just want to set up the value of your member struct in the constructor body, like this:

my_class::my_class( int a, int b )
    /* no initializer list */
{
    my_struct( a, b );
}

The answer is: you can't! Because this isn't initialization; it is assignment. This is true even if the struct member of your class is not const.

However, you can assign your struct in the constructor body:

my_class::my_class( int a, int b )
    /* no initializer list */
{
    s_type temp_struct = { 0, 'a' };   
    my_struct = temp_struct;
}

and this is legal, because it isn't initialization.

If you want to maintain full compatibility with C and expose your header files to a standard C compiler, you will need to perform just this kind of awkward hack to use structs as members. This is apparently what maintaining full backward compatibility with C meant to Stroustrup: that is, it didn't mean as much as indulging his trivial observation that structs behaved like classes with public data members. Sort of.

Or, if you don't need a separate copy of your struct for each instance of the class (and if it is const, you probably don't), you can make it static, which means it is initialized at file scope:

class my_class {
    static const s_type my_struct;
};

const s_type my_class::my_struct = { 0, 'a' };

Do you have a headache yet?

It is surprisingly difficult to get this information out of C++ books, even Stroustrup's books, and the C++ standard document: see the end notes, below. Understanding the usage in C++ basically requires an archaeological excavation of hacks: the original, oddly designed two-namespace struct in C, the common workarounds, the semantics of classes, Stroustrup's somewhat odd mutation of struct into class, and the strict distinction between initialization and assignment. (Not that there aren't a few things in Common Lisp that feel similarly weighted with historic baggage!)

I know this hasn't quite been about Scheme, but I am just trying to point out how C and C++ programmers have come to live with this strange mix of syntax-directed semantics, and semantic-specific syntax, where constructs are legal depending on their context. I've been using C and C++ on and off for about twenty years, but have only very recently fully understood why I can't initialize structs in what seems like the logical, backwards-compatible way. Maybe you think you can just look it up in the index. Good luck! And if you think you can just look this up in the language's BNF, think again!

C and C++ programmers live with these restrictions and special cases every day, yet think that functional programming in a language like Scheme is somehow hard. It isn't -- it is freedom!

---------------------------

End Notes:

The compiler wasn't cooperating, so I asked myself "does C++ really forbid initialization of a struct member using aggregate initializer syntax (with the curly brackets) in a constructor initialization list?"

Stroustrup, The C++ Programming Language, Special Edition p. 101, 234, 809, 818:

Stroustrup actually seems to advocate freely intermixing struct and class depending on your intentions for access control. Regarding initialization of structs, he writes "using a constructor [to initialize a struct] is usually better." But, of course, that breaks compatibility with one of the simplest but most highly useful tools of C, and the struct was never designed to support object-oriented programming.

Since the struct "tag" is promoted to a type in the general namespace, there is a further workaround to maintain compatibility with C (which is broken anyway): although there is not a distinct namespace for structs, you can have a struct and a non-struct type with the same name declared in the same scope, but the non-struct will take precedence unless you disambiguate with the prefix "struct." Stroustrup's examples complicate things even further by disambiguating using namespaces, which are another feature not present in C.

According to Stroustrup's somewhat limited version of the grammar of C++, there isn't any aggregate initialization in mem_initializer_list (p. 810), but the grammar is not complete. Page 247 seems to indicate that items in the member initializer list in the constructor definition are all constructor calls, given that even POD members (plain old data, like int and char and float and pointers to these types) can be initialized with this constructor call syntax.

In C++ Std. section 8.4: the grammar shows ctor-initializer and mem-initializer-list (where "mem" means "member"). This is described in 12.6.2 and seems to break down further, in the formal grammar, into class names and parameter lists, which seems to indicate that these are all constructors, but again this does not seem to be a complete syntax for C++. Good luck trying to find a complete grammar for C++!

04 April 2006

A New Planet Scheme!

Planet Scheme is reborn at http://www.scheme.dk/planet/ and this blog is now syndicated. Thanks for aggregating me! Time to get back to my Sudoku solver, or take a diversion into another program.

15 March 2006

Planet Scheme off the Air?

Planet Scheme seems to be defunct.

Presumably this means they won't be aggregating this blog, so I guess I can stop asking!

09 February 2006

Thinking about Sudoku and Scheme, Part 5.1

Just a brief post today. I have been too busy to do much on my Sudoku solving program itself, but I did spend some time polishing up my array2D library. With revision this library has finally assumed what I feel is its "natural" shape. The optimal design was not clear at the outset, but with some use I feel I've discerned the implementation that minimizes the amount of redundant code and thus the number of places that possible errors may be hiding. I'm sure it could be even more concise; I have a feeling that a little macro magic could cut the lines of code in half, but the shape of that implementation has not yet become clear to me, and I have an allergy to using techniques that don't provide a clear advantage.

It is really quite a simple library, but in order to provide some assurance of correctness, I also wrote a series of PLT Scheme test cases. I have not figured out how to use DrScheme to tell me if this test suite covers all of the code, or if this is even possible, but it is at least nearly complete.

The library does not do meaningful error-checking. Scheme textbook code rarely does anything by way of error-checking. As a result I'm short on strategies I could use to address this, especially when trying to write to R5RS.

It isn't possible to code around every conceivable error. For some applications, it makes sense to perform verification any time data is read from or written to memory. We can't operate at that level. We also shouldn't put in effort to detect flaws in the implementation itself. Given our partially functional approach, in which we don't update any top-level definitions, there aren't a lot of unexpected states our library can get into. We should focus our attention on the response to incorrect inputs. There are only a couple of classes of errors that the library should expect:

1. The set of array boundary errors. Because coordinate pairs are mapped to a single index on a vector, many of the possible boundary errors will result in an incorrect cell being set or read. Only cell references that exceed the internal representation will generate a meaningful run-time error, and this error will not tell us whether the x or y index was out of bounds.

2. The set of errors that can arise from receiving an incorrect value-generator or mapping function. In Scheme, a function signature really consists only of the name and the number of parameters, and does not include type information about the parameters. Supplying an incorrect number of parameters will be caught with a meaningful error at run-time, but because the passed functions may be called via several levels of indirection, and the function may not be bound to a named variable, it may be difficult to glean from the error message just what has gone wrong. (Another way to say this is that DrScheme has thrown away too much context information).

Type errors are potentially more insidious, but in my implementation these functions will tend to receive an array (vector), integer indices, and possibly another function. The result of doing a vector access on an integer or function, or treating a vector or function like an integer, should at least fail fast with a moderately meaningful error.

There are richer language constructs that could address these problems. For example, either Common Lisp or Dylan would allow me to specify type information for the parameters. The runtime is then capable of flagging type errors as soon as they happen, or, better yet, of verifying type safety across functions and generating code that operates on POD (plain old data). Dylan also supports limited types, although implementation support in Gwydion's d2c is spotty; limited types would allow range checking further upstream. Of course, if I wrote this program in Common Lisp or Dylan, I would not bother simulating an array. And if our language supported exception handling, I could wrap the top-level operations to catch exceptions without much modification inside all my functions.

Another approach to bounds-checking in my 2-D array would be to encapsulate the array data together with metadata on its x and y dimensions. I am doing this to a limited extent with the closure generation, but the captured x-dimension is not used to validate that the x-index is within bounds. Other array implementations use a record or a closure to wrap up the representation itself. But then it is more difficult to display the contents of the array, and if we are going to start down that road, we might as well try to generalize our library for an n-dimensional array. That seems like overkill, so for now we will call array2D complete as it is.
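As a sketch of that encapsulation idea (make-checked-getter is a hypothetical name, not part of the array2D library, and error here is the DrScheme-style error procedure rather than strict R5RS), a constructor can capture both dimensions in a closure and validate each index before computing the vector offset:

```scheme
; Hypothetical bounds-checked accessor generator: the closure
; captures the vector and both dimensions, and rejects any
; index outside [0, dim) before touching the representation.
(define make-checked-getter
  (lambda (vec x-dim y-dim)
    (lambda (x-idx y-idx)
      (if (or (< x-idx 0) (>= x-idx x-dim)
              (< y-idx 0) (>= y-idx y-dim))
          (error "index out of bounds")
          (vector-ref vec (+ (* y-idx x-dim) x-idx))))))

(define get-3x3
  (make-checked-getter (vector 0 1 2 3 4 5 6 7 8) 3 3))

(get-3x3 1 1)   ; => 4
; (get-3x3 3 0) would now fail fast instead of
; silently reading the cell at (0, 1).
```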

31 January 2006

Interesting Embeddable Languages

There are a few languages I am paying particular attention to these days.

The first is Scheme; that's the language I'm using for my Sudoku experiments.

The second is Lua. Lua is a lightweight language that includes a byte-code compiler and interpreter. It is designed to have a small implementation footprint and to interface well with C. It has lightweight threads. Lua's home is at http://lua.org.

The third is Io. Io has a syntax that looks a bit weird, unless you're familiar with Smalltalk. I've never used Smalltalk, but I have used NewtonScript. The prototype-based inheritance scheme used in Io reminds me of NewtonScript's inheritance, although Io does not support dual inheritance chains. It may not be quite portable enough to move over to an arbitrary platform, but I won't know for sure until I try. Io's home is at http://iolanguage.com.

And then there is Ruby. Ruby is not quite so lightweight, but it is designed to be embeddable. It would not be my first choice, though: it does not have a well-tested compiler or byte-code runtime, it has too little regularity in its syntax (otherwise known as "too much syntax"), and it uses sigils, which I don't like except as a convention, as in Dylan. Still, as far as I'm concerned, for usability it beats Perl hands-down. I prefer it to Python for no particularly good reasons other than Python's use of significant whitespace, which is a relatively trivial feature, and Guido's apparent hostility towards fully usable lexical closures. Ruby's home is at http://ruby-lang.org.

Ruby's support for higher-order functions has the side effect of making it serve as an acceptable dialect of Lisp, even without an exact equivalent of Lisp macros. I was able to bring up my basic Sudoku code in just a couple of hours using Ruby. Ruby is great for rapid prototyping, and the standard libraries "feel" just the right size. There are some lessons here to inform future generations of Lisp, if the Lisp community is willing to listen to them.

There are some Forth-inspired languages that are also interesting; I am a fan of Forth, and consider it to be the "other" programmable programming language. Joy (http://www.latrobe.edu.au/philosophy/phimvt/joy.html) is a sort of functional variant on Forth.

30 January 2006

Thinking about Sudoku and Scheme, Part 4

Last time, I wrote:

Now we've got our given board and our candidate board, and we can start smashing them together to shake out candidates, and begin finding values to go in our boxes.

So let's do that. But first, I need to confess something. You may have found yourself wondering "You already have a data structure that contains the givens and their coordinates. Why make the given-board vector at all?" If you wondered that, congratulations. You're paying attention, possibly more than I am, and working towards writing code that is simple.

So what's the answer? The answer is "I don't have a good reason; the code just grew that way, and has not been optimally cleaned up yet." As I write this code, I'm learning to use Scheme more effectively. It has been an iterative process. Right now I'm in the part of the cycle that is excited to have the code successfully working on the first part of the Sudoku board solution. At some point I'll probably be in the part of the cycle that wants to rip up the code and make it smaller and simpler. Scheme lends itself well to stepwise refinement. This is both a blessing and a curse. A blessing, because it is easy to rewrite your code. In C++, assuming your design is not visibly horrible and doesn't crash, it is generally too painful to redo all those interfaces and class definitions and so working code often doesn't get improved unless there is a burning need. Scheme, especially if it is written in an appropriate functional style, is easy to improve locally. This also means that you can always improve it some more, and fall into the trap of improving it endlessly without solving an actual problem. That's the curse.

OK, now let's get to the meaty part, keeping in mind that this is code in progress. Here's how we map the givens and use them to cancel the first round of candidates. From the top:

(cancel-givens
(make-given-board given-list)
(make-initial-candidate-board))

Simple enough so far, right? Now we just have to define cancel-givens, and we're done.

(define cancel-givens
 (lambda (given-board candidate-board)
   (map-board-coordinates-elts
    given-board
    (lambda (x-idx y-idx elt)
      (if elt
          (cancel-given
           candidate-board x-idx y-idx elt))))
   candidate-board))

We know roughly what that map function is going to look like; it will call a function with the coordinates and elements of the given board. Now we just need to define cancel-given, and we're done. Here it is; this one is big (most likely it could and should be further simplified):

(define cancel-given
 (lambda (candidate-board x-idx y-idx given)
   (reduce-candidate-list-to-given!
    candidate-board x-idx y-idx given)
   (letrec ((filtered-neighborhood-unique-coordinates-list
             (make-filtered-coordinate-list
              (get-unique-neighborhood-coordinates
               x-idx y-idx) (list x-idx y-idx)))
            (candidate-remover
             (make-candidate-remover given))
            (coordinate-list-handler
             (lambda (coordinate-list)
               (if (not (null? coordinate-list))
                   (begin
                     (candidate-remover
                      candidate-board
                      (caar coordinate-list)
                      (cadar coordinate-list))
                     (coordinate-list-handler
                      (cdr coordinate-list)))))))
     (coordinate-list-handler
      filtered-neighborhood-unique-coordinates-list))
   candidate-board))

Let's look at this one in a little more detail. Remember that this is called with each given. The first thing we do is turn the candidate list in the box with the given to a list containing only the given. Maybe that isn't the best representation, but it will do for now:

(define reduce-candidate-list-to-given!
 (lambda (candidate-board x-idx y-idx given)
   (set-board-elt!
    candidate-board
    x-idx y-idx
    (list given))))

Note that this is not strictly functional in style. I have tried to move over to a functional style, but my roots are still in imperative programming, so a fully functional, referentially-transparent implementation does not come naturally quite yet. In general, I am at a bit of a loss for good idioms to handle the gamut of solving strategies on the board over time without introducing mutation, but that is probably from lack of experience in purely functional programming techniques and data structures. I could keep regenerating the whole board, but that seems strange to me. I may explore a stricter functional design in a future version.

Meanwhile, let's continue on and look at the next task, which is to remove the given from the entire neighborhood. Recall that the neighborhood of a box is the combined row, column and grid. We want a list of coordinates. Simply combining the coordinates for the row, column and grid will generate duplicate sets of coordinates; we also don't want to process the box itself, or we'll remove the given we just set. So we first use get-unique-neighborhood-coordinates:

(define get-unique-neighborhood-coordinates
 (lambda (x-idx y-idx)
   (let ((grid-bounds
          (map-coordinates-to-grid-bounds x-idx y-idx))
         (acc
          (make-unique-coordinate-accumulator)))
     (map-board-row-coordinates y-idx (car acc))
     (map-board-col-coordinates x-idx (car acc))
     (map-board-subarray-coordinates
      (car grid-bounds)
      (cadr grid-bounds)
      (caddr grid-bounds)
      (cadddr grid-bounds)
      (car acc))
     ((cadr acc)))))

The function map-coordinates-to-grid-bounds uses a little bit of Scheme math to determine the bounds of the 3x3 grid our box is in. Here's how I do that:

(define map-coordinates-to-grid-bounds
 (lambda (x-idx y-idx)
   (list (get-grid-lower-bound x-idx grid-x-dim)
         (get-grid-upper-bound x-idx grid-x-dim)
         (get-grid-lower-bound y-idx grid-y-dim)
         (get-grid-upper-bound y-idx grid-y-dim))))

And here are the helper functions:

(define get-grid-lower-bound
 (lambda (idx dim)
   (* dim (floor (/ idx dim)))))

(define get-grid-upper-bound
 (lambda (idx dim)
   (- (* dim (ceiling (/ (+ 1 idx) dim))) 1)))

The particulars of Scheme math require some care if you want to make sure you get a whole number result rounded in the direction that you want. If you're used to using languages that don't do exact fractions, this may seem like too much work. This way seems to work, though, so we will move on for now.
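To see the rounding at work, here are the two helpers applied to sample x indices in a 9x9 puzzle with 3-wide grids; because / on integers yields an exact rational, floor and ceiling behave predictably. (The definitions are repeated so the sketch stands alone.)

```scheme
(define get-grid-lower-bound
  (lambda (idx dim)
    (* dim (floor (/ idx dim)))))

(define get-grid-upper-bound
  (lambda (idx dim)
    (- (* dim (ceiling (/ (+ 1 idx) dim))) 1)))

; Box x = 4 lies in the middle 3-wide grid, columns 3 through 5:
(get-grid-lower-bound 4 3)   ; => 3
(get-grid-upper-bound 4 3)   ; => 5

; Box x = 0 lies in the leftmost grid, columns 0 through 2:
(get-grid-lower-bound 0 3)   ; => 0
(get-grid-upper-bound 0 3)   ; => 2
```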

Next, we use an accumulator (created by make-unique-coordinate-accumulator) to collect up all the coordinates. This isn't a numeric accumulator, but a list accumulator: when we create it, it holds a reference to an empty list, and as we call it with coordinates, the list is assembled. Our accumulator generator looks like this:

(define make-unique-coordinate-accumulator
 (lambda ()
   (let ((coordinates (list)))
     (list
      (lambda (x-idx y-idx)
        (if (not (member (list x-idx y-idx) coordinates))
            (set! coordinates
                  (cons (list x-idx y-idx) coordinates))))
      (lambda ()
        coordinates)))))

This is again not strictly functional; in a more functional style, we'd probably generate a new accumulator with a new list created from the previous one. But then the client that held the accumulator would have to mutate its binding in order to get the new accumulator, so it would have to be strictly functional as well -- a sort of viral strict functionality would require rewriting all the upstream code. Something to think about!

This is yet another function generator, with a twist. We use a let to create a binding for a list, called coordinates. This is the list that will accumulate the coordinates. The actual return value of the function is... another list! This time, a list of two functions. The first function in the list receives a coordinate pair; if that pair is not already present in the closed-over list, as determined by member, we destructively set coordinates to the list with the new pair prepended. The second function in the list hardly looks like a function at all; in fact, it consists only of:

(lambda ()
 coordinates)

All that does is return the value held in the binding named coordinates. It's the rough equivalent of a return statement in C, C++, or Java. The important thing to note here is that both of the functions in the list we return contain a reference to coordinates. The binding in question is the one that is in lexical scope when the lambda form is evaluated. Note that we could stuff these functions inside some other data structure; we could even put them in a single cons cell using set-car! and set-cdr!. But for now I am still using the list as my "Swiss Army Knife" data structure, so we'll use that.
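Here is the accumulator in action (the definition is repeated verbatim so the example stands alone). Note that the duplicate pair is dropped, and that the results come back in reverse order of insertion, because new pairs are consed onto the front:

```scheme
(define make-unique-coordinate-accumulator
  (lambda ()
    (let ((coordinates (list)))
      (list
       (lambda (x-idx y-idx)
         (if (not (member (list x-idx y-idx) coordinates))
             (set! coordinates
                   (cons (list x-idx y-idx) coordinates))))
       (lambda ()
         coordinates)))))

(define acc (make-unique-coordinate-accumulator))
((car acc) 0 0)   ; record (0 0)
((car acc) 1 0)   ; record (1 0)
((car acc) 0 0)   ; duplicate: silently ignored
((cadr acc))      ; => ((1 0) (0 0))
```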

Returning to the innards of get-unique-neighborhood-coordinates let's look at what happens next:

(map-board-row-coordinates y-idx (car acc))
(map-board-col-coordinates x-idx (car acc))
(map-board-subarray-coordinates
 (car grid-bounds)
 (cadr grid-bounds)
 (caddr grid-bounds)
 (cadddr grid-bounds)
 (car acc))

I'm applying my accumulator to the row, the column, and the subarray. The map functions are unexceptional; they are wrappers around my 2D array implementation. To pass the four parameters that constitute the bounding rectangle of the subarray, I'm again using those all-purpose list functions car, cadr, caddr, and cadddr, which simply retrieve the first, second, third, and fourth elements of a list. Actually, many Schemes (via SRFI 1, for example, or in DrScheme) let you rewrite this to use first, second, third, and fourth if you want to, but older textbooks generally use the composed list abbreviations, so I will stick with them for now; rather than just switch to clearer names, if I rewrite this to get rid of the composed list functions, I'd rather use a self-documenting data structure of some kind.

The last expression in the function is the return value. Don't overlook its importance:

((cadr acc))

Note the double set of parentheses: (cadr acc) retrieves the function contained in the second element of our accumulator list, and the outer application calls it, which yields the contents of the accumulator. Since it is the last expression in the function, get-unique-neighborhood-coordinates will return its value.

Now we're back in cancel-given. The next thing we do is call make-filtered-coordinate-list on our unique list. We do this to remove the box containing our given, so that we don't wind up removing that candidate as well:

(define make-filtered-coordinate-list
 (lambda (coordinate-list coordinate-pair)
   (if (null? coordinate-list) (list)
       (if (not (equal?
                 coordinate-pair (car coordinate-list)))
           (cons (car coordinate-list)
                 (make-filtered-coordinate-list
                  (cdr coordinate-list) coordinate-pair))
           (make-filtered-coordinate-list
            (cdr coordinate-list) coordinate-pair)))))

This function is a textbook case of filtering one member out of a list, returning a new list. The key implementation detail is that it recursively walks the list, building up a new list as it goes. As it examines each element, if the element is not equal to the item in question, it conses it onto the generated list; if it is, the element is not consed, and we simply continue to recurse. You can see similar examples in The Little Schemer. Note that this function counts on the fact that equal? works properly on data structures that are not atoms and on atoms that are different objects in memory. In Scheme, Lisp, and other dynamic languages -- basically, as soon as you get away from "unboxed" primitive data types -- equality is a complex subject!
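A quick demonstration (repeating the definition so it stands alone): the pair (1 0) is dropped, and everything else survives in its original order. Because equal? compares structurally, the quoted pair matches even though it is a different object in memory:

```scheme
(define make-filtered-coordinate-list
  (lambda (coordinate-list coordinate-pair)
    (if (null? coordinate-list) (list)
        (if (not (equal?
                  coordinate-pair (car coordinate-list)))
            (cons (car coordinate-list)
                  (make-filtered-coordinate-list
                   (cdr coordinate-list) coordinate-pair))
            (make-filtered-coordinate-list
             (cdr coordinate-list) coordinate-pair)))))

(make-filtered-coordinate-list
 '((0 0) (1 0) (2 0)) '(1 0))   ; => ((0 0) (2 0))
```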

OK, are you still with me? We've got a list of the coordinates for the neighborhood we're examining. Next, we make a candidate remover, which is another function generator: given a candidate value, it returns a function that removes that value from a box's candidate list. That should be straightforward enough by now, so I won't belabor it. So, does it work?

Well, of course, it did not work on the first try. Or the second or third try! In fact, while I have presented this design top-down, I actually wrote it using a much more realistic mixture of top-down and bottom-up design. Sometimes the higher-level algorithm is clear from the start; that part is often best worked out on paper. The cancel-givens and cancel-given functions started out as stubs that called stubs. To test the implementation function-by-function, the stubs just returned fixed values; these were later fleshed out with real code. As I implemented more functions, I found that certain other function definitions seemed to fall out naturally, so I wrote those next. Meanwhile, I had already gone "bottom up" in the process of writing my 2D array functions, so my implementation hooked up in the middle. Then there was a bit of "impedance mismatch," where the functions I thought I would need turned out not to be precisely the functions I did need. I wound up iterating in both directions multiple times before getting even to this point. But using the DrScheme listener, turnaround on this kind of iteration is very quick.

The result of removing all the givens from the board, and crossing off candidates in the neighborhood of each given as I go, is a candidate-board vector that looks like the one below. A list containing only one value indicates that the box has been solved; the original givens appear as such singletons, along with any boxes solved outright by this first pass. I have formatted the output lightly to make the rows a little more clear:

((2 3 7) (3 8) (2 3 7 8) (2 3 4) (6) (5) (2 3 4 8) (9) (1)
 (1 2 3 6 7) (1 3 6 8) (2 3 6 7 8) (2 3 4 9) (2 3 7 8) (2 3 7 8 9) (5) (4 6 8) (2 3 4 6 8)
 (4) (5) (9) (1) (2 3 8) (2 3 8) (2 3 8) (7) (2 3 6 8)
 (2 5 6) (4 6 8) (2 5 6 8) (2 5) (9) (1 2 7) (1 2 4 8) (3) (2 4 6 8)
 (2 3 5 9) (3 4) (1) (8) (2 3) (6) (7) (4 5) (2 4 9)
 (2 3 5 6 9) (7) (2 3 5 6 8) (2 3 5) (4) (1 2 3) (1 2 8 9) (1 5 6 8) (2 6 8 9)
 (1 3 7) (9) (3 7) (3) (1 3 8) (4) (6) (2) (5)
 (1 3 5 6 7) (1 3 6) (4) (2 3 6 9) (1 2 3 8) (1 2 3 8 9) (1 3 8 9) (1 8) (3 7 8 9)
 (8) (2) (3 6) (7) (5) (1 3 9) (1 3 4 9) (1 4) (3 4 9))

Note that there were 28 givens. The next important question to ask is "did we find any more solved boxes?" That is, did we reduce the candidate list in any of the non-given boxes to a length of 1, indicating only a single possibility in that box?

If you look closely, you will notice that, in fact, we have solved some boxes, simply by crossing out the givens and following through on the implications of those givens for the boxes in the neighborhood of each given. In the third-from-last row, which would be row 6 in our zero-based indexing, we see that there is a single 3 candidate in one of the candidate lists. This was not a given:

(1 3 7) (9) (3 7) (3) (1 3 8) (4) (6) (2) (5)

However, our code has not yet discovered that we have made more progress. In the next installment, we'll look at how the code can make that discovery, and what the implications are. We'll also look at how to apply some slightly less elementary solving rules to the board.

Practical Subversion and Other Subversive Practices

Since I'm a fan of the book Programming Ruby, I purchased the two Pragmatic Programmer books, Pragmatic Version Control Using CVS and Pragmatic Version Control Using Subversion. I was planning to use them as a quick refresher to CVS and a quick cross-introduction to Subversion, since I have a couple of years of experience using (and occasionally fixing) CVS repositories, and also have used a couple of other tools including MKS RCS, Voodoo, and (a little bit of) ClearCase, and we are implementing a Subversion repository at work.

I wound up returning the books to the store, though. I would recommend either of them to someone who is new to both tools, but they are too light for someone who has already used one of these version control systems extensively.

For CVS, there are plenty of free texts available online, including the venerable Cederquist, which is a bit hard to use as a tutorial, and the Fogel and Bar book (at http://cvsbook.red-bean.com/). I have the second edition in dead-tree form; the third edition is available free online. If you like it, buy a paper copy.

For Subversion, although there are some free texts available, I decided that Practical Subversion, by Garrett Rooney, is the right text, on the right level, at the right time, for me. The chapter on best practices alone helps differentiate it from other books which tend to be organized around a command-line options reference. We'll see how it holds up as I gain a little more experience with Subversion.

I also picked up a copy of The Reasoned Schemer. I am a big fan of The Little Schemer and The Seasoned Schemer, although I must admit I have not completed The Seasoned Schemer.

I also started reading A Little Java, a Few Patterns, which is an effort to teach functional and recursive programming using Java. Given that I had a lot of Java experience and had already thought a lot about design patterns, it seemed like a strange and forced hybrid, so I decided not to bother with it and just work on Scheme instead. The book may be interesting to developers who know Java and who would like to begin to understand Scheme.

I am now re-skimming The Little Schemer as a warmup to a new book on logic programming. I've been interested in learning Prolog for a while now but have not gotten much past reading part of Clocksin and Mellish's book Programming in Prolog, 5th ed. Given that I'm currently learning Scheme, implementing a Prolog-like language in Scheme might be the impetus that I need. If I get enough time to work on it I will report on what I find.

27 January 2006

Thinking about Sudoku and Scheme, Part 3

OK, so I've given you a small view of one part of the array2D library and how it is implemented. Now let's look at my Sudoku solver from the top. Note that this is a hobby project, done in the rare scraps of free time I am able to piece together these days, by a relative novice in Scheme. In other words, it is not finished, polished, textbook work, so be kind.

Let's review what Sudoku is and consider some of the most elementary solving strategies. First, terminology. These are the terms I use; some are borrowed and some are my own. Other Sudoku geeks may use alternate terminology.

Box: the smallest unit; a single space for a number. I identify boxes using zero-based x, y coordinate pairs.

Open box: a box that does not have a number in it yet.

Coordinates: on a 9x9 grid (not the only possible size in Sudoku), the top row goes from (0, 0) to (8, 0) and the rightmost column goes from (8, 0) to (8, 8).

Given: one of the numbers already provided in the puzzle.

Grid: in a 9x9 Sudoku puzzle, one of the 9 3x3 sub-parts; on a 16x16 puzzle, one of the 16 4x4 sub-parts, etc. Grids are not necessarily square; some published puzzles could contain 4x3 grids, for example.

Virtual row: a single group that must contain the set of possible numbers (in a 9x9 puzzle, 1 through 9). Note that each box is actually part of three virtual rows: its row, column, and grid.

Number set: for a 9x9 puzzle, the integers 1 through 9. I have seen Sudoku boards that allow the numbers 1 through 12, 1 through 16, 1 through 25, and 1 through 49. Sometimes letters are used: for example, the numbers 1 through 16 can be encoded using the hexadecimal digits 0..F (or 1..G), and the numbers 1 through 25 can be encoded by the letters A..Y. There are more exotic Sudoku involving the letters that make a specific word, but this is really just an encoding issue and does not fundamentally change the rules.

Sudoku rule: there is probably a good formal expression of this, but informally speaking, the Sudoku rule says that each box in each virtual row must contain one of the numbers in the number set.

By implication, each box cannot contain a number that is not a member of the number set; it also cannot contain a duplicate of another number in the same virtual row; in a completed puzzle each virtual row must be filled, which means the whole puzzle must be filled.

If you find that a puzzle you are working on seems to break this rule, either the puzzle was not valid to begin with, or you have made a solving error. The second case is more likely, but I have found errors in published puzzles.

There are several possible types of errors. The givens may violate the Sudoku rule, which will lead to an insoluble puzzle, or more than one solution may be possible, which could lead you to find a solution that follows the Sudoku rule but which does not match the published solution.

Candidate: one of the possible values that could legally be written in a box.

On an empty 9x9 grid, each box has as its candidates all the numbers 1 through 9, so no progress can be made towards a solution.

When the givens are taken into account, the set of candidates in the remaining open boxes in the neighborhood must be reduced to maintain the Sudoku rule. As the puzzle is filled in, the candidates are reduced further.

As you perform logical deductions on the board, you will remove candidates; when only one candidate remains for a box, the number should be written down, and that number must then be removed from the candidates of the boxes elsewhere in the neighborhood; repeat until no open boxes remain. (This is, in a nutshell, the algorithm for solving any Sudoku puzzle, and it translates directly into the main loop of a computer solution that uses human solving techniques rather than testing every possible chain of possibilities.)

Noncandidate: instead of writing down candidates as you work on the puzzle, you could also write down noncandidates: that is, numbers that have been ruled out for each box. When the set of noncandidates for a given box leaves only one remaining candidate in the number set, you can write down that candidate, and you must then add the number to the sets of noncandidates elsewhere in the neighborhood. You could represent noncandidates by writing them down directly, although this can be confusing, or by writing down the number set in each open box and crossing out numbers as they become noncandidates.

Single Candidate: the case when only one candidate remains in a box. Once you recognize this case, you should write in the number; the box is solved. Then make sure to reduce the candidates in the neighborhood. Which brings us to another term.

Neighborhood: the combined row, column, and grid that contains a given open box. Neighborhood is a distinct concept from virtual row because if a number appears anywhere in the neighborhood containing a given box, it cannot be a valid candidate for the box; this is because the Sudoku rule applies to each box three ways.
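Before moving on to the strategies, the Sudoku rule on a single virtual row can be sketched as a predicate. This is an illustration written for this post, not part of my solver; open boxes are represented by #f, and virtual-row-legal? is a hypothetical name:

```scheme
; A virtual row is legal when every filled box holds a member
; of the number set and no value appears twice. Open boxes (#f)
; are skipped; seen accumulates the values encountered so far.
(define virtual-row-legal?
  (lambda (row number-set)
    (letrec ((check
              (lambda (boxes seen)
                (cond ((null? boxes) #t)
                      ((not (car boxes))              ; open box
                       (check (cdr boxes) seen))
                      ((or (not (member (car boxes) number-set))
                           (member (car boxes) seen)) ; bad value or duplicate
                       #f)
                      (else (check (cdr boxes)
                                   (cons (car boxes) seen)))))))
      (check row (list)))))

(virtual-row-legal? '(9 #f 8 3 #f #f 7 4 2)
                    '(1 2 3 4 5 6 7 8 9))   ; => #t
(virtual-row-legal? '(9 #f 9 3 #f #f 7 4 2)
                    '(1 2 3 4 5 6 7 8 9))   ; => #f, duplicate 9
```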

Basic Solving Strategies

For demonstration purposes, we will use the following 9x9 puzzle (taken from _Sudoku: Easy to Hard Presented by Will Shortz_ Volume 3):

-------------------------
| 9 . 8 | 3 . . | 7 4 2 |
| . 4 . | 5 8 . | . 9 . |
| 2 . . | . 7 . | 1 . 8 |
-------------------------
| . . 6 | 2 9 4 | 8 3 . |
| 4 8 . | . . . | . . 1 |
| 3 7 . | 6 . 8 | . 2 . |
-------------------------
| . 3 . | . 5 6 | . . . |
| 6 . . | 1 . . | . 8 3 |
| . 2 . | . . 3 | . 1 5 |
-------------------------

Let's look at how we might represent this board in Scheme, and then start coming up with functions to operate on it.

To allow my solver to be configured for different board sizes and grid dimensions, we'll provide bindings for the independent dimension values:

(define max-box-value 9)
(define board-x-dim 9)
(define board-y-dim 9)
(define grid-x-dim 3)
(define grid-y-dim 3)

To start solving our sample puzzle, let's use the following algorithm:

1. Make a given board. A given board is a board array that contains only the givens. Here are the givens, as a list of lists, where each inner list contains a coordinate pair (itself a list) and a value:

(define given-list
  '(((4 0) 6) ((5 0) 5) ((7 0) 9) ((8 0) 1)
    ((6 1) 5)
    ((0 2) 4) ((1 2) 5) ((2 2) 9) ((3 2) 1) ((7 2) 7)
    ((4 3) 9) ((7 3) 3)
    ((2 4) 1) ((3 4) 8) ((5 4) 6) ((6 4) 7)
    ((1 5) 7) ((4 5) 4)
    ((1 6) 9) ((5 6) 4) ((6 6) 6) ((7 6) 2) ((8 6) 5)
    ((2 7) 4)
    ((0 8) 8) ((1 8) 2) ((3 8) 7) ((4 8) 5)))

Here's a function which will make a given board:

(define make-given-board
  (lambda (given-list)
    (let ((given-board
           (make-vector (* board-x-dim board-y-dim) #f))
          (setter! (array2D-make-setter board-x-dim)))
      (letrec ((do-write-givens
                (lambda (puzzle-array givens)
                  (if (not (null? givens))
                      (begin
                        (do-write-givens
                         puzzle-array (cdr givens))
                        (setter! puzzle-array
                                 (caaar givens)
                                 (cadaar givens)
                                 (cadar givens)))))))
        (do-write-givens given-board given-list))
      given-board)))

This is a pretty simple function, but let's look at a couple of points. First, the default value of our vector is #f, the canonical "false" value. This has no real purpose other than allowing the elements of this array to act as arguments to Scheme's if or cond and give the expected results. We've already seen how to use letrec to create a self-recursive helper function, and we already know how the curried setter! function works. The body of this function is a pretty straightforward example of recursion over a list. The arguments we pass to setter! are a little more interesting:

(setter! puzzle-array
         (caaar givens)
         (cadaar givens)
         (cadar givens))

These primitives provide shortcuts for expressing combinations of car and cdr. The letters between the c and the r are read from left to right, outermost operation first: caaar becomes (car (car (car ...))), and cadaar becomes (car (cdr (car (car ...)))).

An aside: I have a love-hate relationship with these constructions. On the one hand, they allow you to do quick deconstruction of arbitrary list structures, and that's a good thing because lists are very quick and dirty and you can easily put together data structures using lists. On the other hand, they allow you to do quick deconstruction of arbitrary list structures, and that's bad because they are not self-documenting in any way and nothing in them is named. Using something like a plist or record type is probably a better idea, so I will consider that later; Common Lisp provides destructuring, which would also be a valid option because it is self-documenting.
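To make the access pattern concrete, here is a sketch applying those three selectors to a one-entry list shaped like given-list (the entry is taken from given-list above):

```scheme
; A one-entry tail of the givens list, as seen inside
; do-write-givens; each entry is a coordinate pair
; followed by a value.
(define givens '(((4 0) 6)))

(caaar givens)    ; => 4, the x coordinate
(cadaar givens)   ; => 0, the y coordinate
(cadar givens)    ; => 6, the given value
```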

Anyway, that will build a given board. Now let's build a candidate board. The initial candidate list can be easily generated using another function:

(define make-initial-candidate-list
  (lambda ()
    ((lambda (max)
       (letrec ((make-candidate-list
                 (lambda (count)
                   (if (> count max) (list)
                       (cons count
                             (make-candidate-list
                              (+ count 1)))))))
         (make-candidate-list 1)))
     max-box-value)))
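Evaluating the generator yields the full number set (both definitions are repeated so the sketch stands alone):

```scheme
(define max-box-value 9)

(define make-initial-candidate-list
  (lambda ()
    ((lambda (max)
       (letrec ((make-candidate-list
                 (lambda (count)
                   (if (> count max) (list)
                       (cons count
                             (make-candidate-list
                              (+ count 1)))))))
         (make-candidate-list 1)))
     max-box-value)))

(make-initial-candidate-list)   ; => (1 2 3 4 5 6 7 8 9)
```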

I'm going to wrap this up in the form of a value generator. A value generator is a function that receives x and y indices and returns a new value for the board. In this case our value generator will ignore the parameters:

(define initial-candidate-value-generator
  (lambda (x-idx y-idx)
    (make-initial-candidate-list)))

Now we can use our set-board-elts primitive, which is a curried array mapping function, to fill in the candidates. Each box gets a list containing the full set of candidates:

(define make-initial-candidate-board
  (lambda ()
    (let ((candidate-board
           (make-vector
            (* board-x-dim board-y-dim) '())))
      (set-board-elts
       candidate-board initial-candidate-value-generator)
      candidate-board)))

Now we've got our given board and our candidate board, and we can start smashing them together to shake out candidates, and begin finding values to go in our boxes. More on that next time.

Version Control Systems

I've had a fair amount of experience using CVS, but not for a couple of years. Recently I have been investigating version control systems again. One scenario I wanted to investigate was the following. Assuming a tree of files like this:

A/
  B/
    foo.cpp
    foobar.cpp
C/
  D/
    bar.cpp
    barbaz.cpp

From this, we want to be able to check out the following to a sandbox directory with a single checkout command:

F/
  foo.cpp
  bar.cpp

If you're a CVS or svn user, go along with me for a moment and assume that this is a valid thing to want to do.

Unfortunately, it seems to be impossible.

What about CVS modules?

Well, I would summarize what modules can do by saying that upon checkout, modules can collapse existing directory hierarchy, add directory hierarchy that does not exist in the repository, or place directories in new locations, but they fall short of allowing you to arbitrarily rearrange the tree, and they won't rearrange individual files. This is probably due in part to the way CVS stores sandbox metadata on disk: it goes into your directories, even though CVS does not generally treat directories as first-class citizens.

Here is what you can do with modules:

Alias modules will simply allow you to use a different name for an existing module or path. When you check out, all intermediate directories will be created. For example:

r5rv_alias_dirs -a r5rv/a/b r5rv/c/d

Will give you:

r5rv/
     a/
       b/
         foo.cpp
         foobar.cpp
     c/
       d/
         bar.cpp
         barbaz.cpp

Regular modules let you label all or some of the files in a directory with a module name. For example, this will get only the files indicated and write them in a directory with the name of the module, generating no intermediate directories in the output:

r5rv_foo r5rv/a/b foo.cpp

This will give you:

r5rv_foo/
         foo.cpp

Leaving off the filename will get you all the files in the specified subdirectory:

r5rv_b r5rv/a/b

giving you:

r5rv_b/
       foo.cpp
       foobar.cpp

Note that for regular modules you can't put multiple directories on the same line, or CVS gets confused.

If you want to put specific files or the contents of specific directories together, you have to use ampersand modules. But when CVS writes out ampersand modules, it adds the module names to the directory structure! So, given:

r5rv_bar r5rv/c/d/ bar.cpp
r5rv_foo_bar &r5rv_foo &r5rv_bar

Checking out r5rv_foo_bar will give you:

r5rv_foo_bar/
             r5rv_foo/
                      foo.cpp
             r5rv_bar/
                      bar.cpp

Not quite what we want.

You can exclude directories when defining an alias module. For example:

r5rv_no_c -a !r5rv/c r5rv

will give you only:

r5rv/
     a/
       b/

Note that if you already have r5rv checked out from the module r5rv_alias_dirs above, CVS will not remove the contents of c/, so it would be best to remove your local sandbox version of r5rv before checking out this new module.

So can we achieve an output dir like:

r5rv_proj/
          f/
            foo.cpp
            bar.cpp

I don't think CVS can do this directly, but we can come close, if we're willing to accept intermediate directories around our individual files. Using the r5rv_foo_bar module above as a model, we can change it to rename the working directory to something other than the module name:

r5rv_foo_bar_renamed -d r5rv_proj &r5rv_foo &r5rv_bar

We get:

r5rv_proj/
          r5rv_bar/
                   bar.cpp
          r5rv_foo/
                   foo.cpp

What if we wanted to fix up this hierarchy with a post-checkout script? Assuming we can guarantee the script will run on the client, and we have a script that will move our files into place, we can request that it run by defining a module like this:

r5rv_post -o combine.sh -d r5rv_proj &r5rv_foo &r5rv_bar

This gives us what we want, but there is one problem: we have not properly preserved the contents of the CVS metadata directories. This means that CVS no longer knows what directory foo.cpp and bar.cpp belong to, and will be generating all kinds of errors when, for example, doing a status check or update. There are some workarounds that can be put into place using .cvsignore files, and I suppose that if I was feeling ambitious I could write some scripts that would actually run sed on the metadata files in the CVS subdirectories, but it should be obvious that this solution would be ugly and brittle.
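For concreteness, here is a sketch of what such a combine.sh might contain (combine.sh is just the name used in the module definition above; these are my guesses at its contents, with a mocked-up checkout layout so the sketch is self-contained):

```shell
#!/bin/sh
# Mock up the layout the r5rv_post checkout would produce:
mkdir -p r5rv_proj/r5rv_foo r5rv_proj/r5rv_bar
touch r5rv_proj/r5rv_foo/foo.cpp r5rv_proj/r5rv_bar/bar.cpp

# The fixup: hoist the files out of their per-module wrapper
# directories, then remove the wrappers. Note this does nothing
# about the CVS/ metadata directories, which is exactly the
# problem described above.
mv r5rv_proj/r5rv_foo/foo.cpp r5rv_proj/
mv r5rv_proj/r5rv_bar/bar.cpp r5rv_proj/
rmdir r5rv_proj/r5rv_foo r5rv_proj/r5rv_bar
```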

So what about Subversion?

In Subversion, when you commit your initial structure, you can do something like this:

original:

a/
  b/
     foo.cpp
     foobar.cpp
c/
  d/
     bar.cpp
     barbaz.cpp

Let's say you want to be able to mix-and-match these into your output. You can check them in to svn as separate projects. Easily done:

svn import ./a file:///cygdrive/e/repo-svn/r5rv-a -m "initial"
svn import ./c file:///cygdrive/e/repo-svn/r5rv-c -m "initial"

Now you want to create a directory that will contain both of them upon checkout:

mkdir r5rv-proj
svn import ./r5rv-proj file:///cygdrive/e/repo-svn/r5rv-proj -m "initial"

Then over in your workspace, you check out that empty directory: the last parameter is the name to give it in your workspace:

svn checkout file:///cygdrive/e/repo-svn/r5rv-proj r5rv-proj

Now you have a working copy of an empty directory (something that is impossible in CVS!). You want to add an svn:externals property to it. This property consists of a set of svn URLs for things that should be retrieved into this directory when it is checked out. You can set it with svn propset, but since you want one local directory name and one svn URL per line, you can't easily do it directly on the command line. Instead you can do something like:

echo "r5rv-a file:///cygdrive/e/repo-svn/r5rv-a" >> props
echo "r5rv-c file:///cygdrive/e/repo-svn/r5rv-c" >> props

Check that it looks right:

$ cat props
r5rv-a file:///cygdrive/e/repo-svn/r5rv-a
r5rv-c file:///cygdrive/e/repo-svn/r5rv-c

then set the externals property using -F, which means read from a file, like so:

$ svn propset svn:externals -F props r5rv-proj/
property 'svn:externals' set on 'r5rv-proj'

You can confirm the property has been set:

$ svn propget svn:externals r5rv-proj
r5rv-a file:///cygdrive/e/repo-svn/r5rv-a
r5rv-c file:///cygdrive/e/repo-svn/r5rv-c

Clean up:

$ rm props

OK, now commit:

svn commit r5rv-proj/ -m "added external property"

And update:

svn update r5rv-proj/

You see something like this:

Fetching external item into 'r5rv-proj/r5rv-a'
A    r5rv-proj/r5rv-a/b
A    r5rv-proj/r5rv-a/b/foo.cpp
Updated external to revision 4.

Fetching external item into 'r5rv-proj/r5rv-c'
A    r5rv-proj/r5rv-c/d
A    r5rv-proj/r5rv-c/d/bar.cpp
Updated external to revision 4.

That's a lot of work! But the model is actually simpler than elaborate use of entries in the modules file. In addition, under svn, a non-privileged user can make this change (that is, a user with ordinary write access, rather than administrator access). And more importantly, these changes are all versioned; under CVS, if you want to use those module definitions, you have to make sure that the paths specified in the modules don't change, or the modules file will be out of sync with the repository structure. This is a general problem with CVS, which is not really designed to cope well with restructuring the repository.

Now your local project looks like this:

r5rv-proj/
          r5rv-a/
                 b/
                   foo.cpp
          r5rv-c/
                 d/
                   bar.cpp

Which is pretty close to what we want, taking into account the fact that neither CVS nor svn seems to be able to write individual files from different repository locations to the same output directory in the working copy.

However, there are a couple of things to keep in mind about the svn solution. First, your external links have to be complete svn URLs. That is, they can't be relative URLs to directories in the same repository [see note 1].

Worse, it seems that since I'm using svn URLs that start with svn+ssh, the URL I put into the svn:externals property has to include my username! [see note 2]. This means that the external URL will only work when I do the update, not when my co-workers do it. What a mess.

It appears that svn ignores externals when you perform a commit. That is, it assumes you are using an external piece of code that you don't own, and don't want to (or don't have permission to) change. However, you can specify that the external reference should either remain pinned to a particular revision, or should track the head; if you choose the latter, then when you do an update and the external tree has changed, your working copy will be updated too.
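For the record, each line of the svn:externals property takes an optional revision pin between the local directory name and the URL. Something like this (reusing the URLs from above) should pin one external to a revision while tracking head on the other, though I haven't tested this exact property:

```
r5rv-a -r 4 file:///cygdrive/e/repo-svn/r5rv-a
r5rv-c file:///cygdrive/e/repo-svn/r5rv-c
```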

Anyway, there it is: svn externals are slightly more flexible than CVS modules, but not arbitrarily flexible, and they come with some critical usability issues. Neither tool will really let you assemble a checkout out of arbitrary directories and files.

I am going with svn and hoping not to look back, but the ability to define completely arbitrary modules would have been a nice fit for our build process.

From the point of view of programming languages, or even a DSL for version control, this would seem like a perfect example of failed Greenspunning. Why shouldn't the modules system in CVS let you specify arbitrary paths, or even use regular expression pattern matching, to specify the members of a retrievable module? Why shouldn't a tool like svn let you use a DSL to specify what actually happens in the checkout process, with primitives that correspond to the domain entities and proper handling of metadata?

Without this, it seems to me, there is still room in the version control space for a better tool. But I guess there is always room for a better tool; the question is "how flexible is flexible enough?"

NOTE 1: Apparently this Subversion feature (relative external links) has been under discussion for a long time. See the bug report. It looks like it might appear in svn 1.5, but I'm not holding my breath.

NOTE 2: There seems to be a way to configure resource files so that command-line svn clients will let you use external svn+ssh URLs that don't specify a username. However, at my workplace we are all using the TortoiseSVN client, and I'm not aware of any way to make that client work with these URLs without specifying the username. Also, just as a matter of design, it seems like a bad idea to force your repository to contain hard-coded references to its own URL, since these will break if the entire repository is moved.

25 January 2006

Thinking about Sudoku and Scheme, Part 2

Here is how I implemented some of my array primitives. I'll start in the middle:

(define map-board-row-elts
  (array2D-make-row-elt-mapper board-x-dim))

When treating a vector like a two-dimensional array, assuming that you store it row-first, the key piece of information you need is the size of a row. (More generally, for an n-dimensional array, to access an element you need to know the sizes of dimensions 1 to n - 1.) At the top level of my program, the dimensions of my Sudoku board are known; the array primitives don't know them. Rather than pass this information to every array access call, I create a function. The function array2D-make-row-elt-mapper captures this information in a closure and returns another function:

(define array2D-make-row-elt-mapper
  (lambda (x-dim)
    (lambda (array2D y-idx elt-handler)
      (array2D-map-row-elts
       array2D y-idx x-dim elt-handler))))

Note the nested lambda expressions. The top one receives the x-dim parameter, and the body of this function is another lambda expression. This second lambda expression is the return value of this function: it returns a function. This returned function invokes another function which requires an x-dim parameter.

In this case, the purpose of returning this function is to provide the functionality of another function, array2D-map-row-elts, but without the need to provide an x-dim parameter. The new function is a curried function.

What is a curried function? Currying means taking a function with n parameters and creating a new function with n - 1 parameters. The new function provides a default value for the missing parameter, giving you a simplified API to the original function.

Common Lisp and other languages provide curry and rcurry functions which peel off parameters from the left or right of the parameter list, respectively. So our currying may not meet a strict definition, although we are doing the same thing: providing an interface function that will supply a value for one of the original parameters. Currying often uses a constant for these default parameters; in our case, we're using a variable, but the idea is similar. Scheme does not have a standard curry function, but it is easy enough to express currying by using nested lambda expressions.
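To make the nested-lambda idiom concrete, here is a sketch of a generic left-curry (curry-first is my own name; this is not a standard R5RS procedure):

```scheme
;; Fix the first argument of f, returning a function that takes
;; the remaining arguments.
(define curry-first
  (lambda (f x)
    (lambda args
      (apply f x args))))

;; ((curry-first + 10) 1 2) => 13
```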

In my case, x-dim is the curried parameter. Inside this lambda expression is a reference to x-dim. Scheme supports lexical closure; at the time this lambda expression is evaluated, the result is a function object that "closes over" or "captures" -- that is, retains a reference to -- the variable binding in effect when the lambda expression is evaluated.

The captured binding is the x-dim parameter binding created when array2D-make-row-elt-mapper was called; its value came from the caller's board-x-dim. Note that because Scheme passes arguments by value, a later set! on board-x-dim would not change what the closure sees; only a set! on x-dim itself would. Still, a shared mutable binding creates complex state with multiple possible "listeners" for a state change: this is one of the reasons that Scheme programmers tend to prefer a functional style where variables are not altered after their initial binding.
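A tiny example (my own, not from the solver) of the fact that a closure retains the binding itself, not a copy of its value:

```scheme
;; Each call to make-counter creates a fresh binding of n; the
;; returned function closes over that binding and mutates it.
(define make-counter
  (lambda ()
    (let ((n 0))
      (lambda ()
        (set! n (+ n 1))
        n))))

(define tick (make-counter))
(tick)   ; => 1
(tick)   ; => 2 -- the binding of n persists between calls
```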

If you are a C programmer, you may find this very confusing. You can think of it as if the returned function retains a pointer to its x-dim parameter long after the call that created it has returned. The C runtime model is quite different, though: local non-static ("automatic") variables live on the stack and may be recycled when a given function returns, so it is unsafe to retain pointers to them (in fact, returning a pointer to an automatic variable is a common beginner's bug in C).

In Scheme, bindings exist as long as there are references to them; when there are no more references to them, they may be garbage-collected (or not).

Let's look at the definition of the next function we call, array2D-map-row-elts:

(define array2D-map-row-elts
  (lambda (array2D y-idx x-dim elt-handler)
    (array2D-map-subrow-elts
     array2D
     0 (- x-dim 1)
     y-idx
     x-dim
     elt-handler)))

OK, we call yet another function: array2D-map-subrow-elts, which is more general (it will map any contiguous subset of the array elements). It is the general case, and array2D-map-row-elts is the more specific case (it maps the whole row), which means we can quite naturally define this specific case by parameterizing the general case. It isn't quite the same as currying, but it is a similar idea; we provide some fixed parameters representing the start of the row (index 0) and end (index x-dim - 1). Here is the general function:

(define array2D-map-subrow-elts
  (lambda (array2D
           x-first-idx x-last-idx
           y-idx
           x-dim
           elt-handler)
    (letrec ((iter-x
              (lambda (x-idx y-idx)
                (elt-handler
                 (array2D-get array2D x-idx y-idx x-dim))
                (if (< x-idx x-last-idx)
                    (iter-x (+ x-idx 1) y-idx)))))
      (iter-x x-first-idx y-idx))))

Are we there yet?

This function receives all the parameters and walks through the array, starting with an arbitrary x-idx (in this case, zero), and proceeding until we have processed the last legal x index (in this case, x-dim - 1).

Note that instead of iteration to walk the array, I use recursion. I won't go into a full explanation of recursion here; suffice it to say that although many people feel that recursion is unnatural and does not reflect how they think about iteration in the "real world," after expressing algorithms recursively for a while, it starts to seem much more natural, and the proposition that recursion and iteration can be transformed into each other does not seem so strange.

This function is formulated using tail recursion: the recursive call is the last thing the function does, so nothing from the current call is needed after it. In C or Pascal, compilers are generally not smart about this sort of thing, and recursing over a large array could result in a very deep set of stack frames. Scheme, by contrast, is required to recognize that since the values in the previous call are no longer used, it is free to discard them, so no deep stack is actually required. Since this is the case, the decision to use recursion was largely arbitrary. I chose it because I wanted to practice writing recursive functions.
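To make the distinction concrete, here is a small pair of functions (my own example, not from the solver) contrasting a non-tail-recursive sum with a tail-recursive one:

```scheme
;; Not tail-recursive: the addition happens AFTER the recursive
;; call returns, so each call leaves a pending stack frame.
(define sum-to
  (lambda (n)
    (if (= n 0)
        0
        (+ n (sum-to (- n 1))))))

;; Tail-recursive: the recursive call is the last thing done, and
;; the running total is carried in an accumulator, so R5RS requires
;; this to run in constant stack space.
(define sum-to-iter
  (lambda (n acc)
    (if (= n 0)
        acc
        (sum-to-iter (- n 1) (+ acc n)))))

;; (sum-to 5) => 15, (sum-to-iter 5 0) => 15
```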

The letrec may be unfamiliar. If the innermost function was not recursive, we could just call it like this, without ever binding the function to a name at all:

((lambda (x-idx y-idx)
   ... )
 0 y-idx)

In this case the innermost function is never bound to a name, because nothing has to refer to it by name. The expression around it would simply call it. This is also known as applying parameters to the anonymous function object.

You might think that we could just do this with let. But with let, the binding of the name iter-x would exist only within the body of the let expression -- the expression or expressions that occur after the binding list. Since the definition of the function occurs as part of the binding list, not as part of the body, the recursive call to iter-x inside it would be unbound, and let will not work; letrec is designed specifically to allow the definition of recursive or mutually recursive functions within the binding list.
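A minimal illustration of the difference (my own example, reusing the iter-x name):

```scheme
;; This will NOT work: with let, iter-x is not in scope inside its
;; own definition, so the inner call to iter-x is unbound.
;;
;; (let ((iter-x (lambda (n) (if (> n 0) (iter-x (- n 1)) 'done))))
;;   (iter-x 3))
;;
;; With letrec, the name IS visible to the expressions in the
;; binding list, so the recursive call resolves:
(letrec ((iter-x (lambda (n)
                   (if (> n 0) (iter-x (- n 1)) 'done))))
  (iter-x 3))   ; => done
```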

So now are we there yet?

Almost. My final comment on this function concerns the use of the elt-handler parameter. The generated function allows the caller to pass in a handler function. Passing functions to other functions (and returning functions from functions) is known as higher-order programming; support for higher-order programming is one of the essential features of Lisp and Scheme.

It might seem like this is much too complicated. Functions generating functions calling functions calling functions? Well, yes, this is a bit of overkill for accessing an array; normally we would use some standard functions for handling data structures. But the general style -- creating a lot of small functions -- appears very commonly in Scheme programs, and now you have had a taste of how a typical data structure library might be implemented.

I don't claim to be very experienced with Scheme yet, but I am happy to have gotten to the point where writing functions like array2D-make-row-elt-mapper and its supporting functions comes rapidly and naturally. Reading examples helps, but the lessons don't really click until you tackle a programming problem, and feel your mind begin to undergo rewiring. The level of abstraction, and thus the leverage you can gain from Scheme, become much higher than this, so stay tuned.

Some Computer Science Classics

I've been building up a computer science library via ABE Books (http://abebooks.com, highly recommended), purchasing a couple of used books and a couple of new books each month.

Recently arrived used:

  • Hoare, Communicating Sequential Processes
  • Dijkstra, A Discipline of Programming
  • Agha, Actors

Recently arrived new:

  • Brodie, Thinking Forth

Some comments on these texts later.

Thinking about Sudoku and Scheme, Part 1

I've been working on some tools to help me solve Sudoku puzzles in Scheme. That goal is actually secondary to the goal of learning to write better, more idiomatic Scheme programs, and so progress towards the Sudoku solver is frequently interrupted while I backtrack in an attempt to reorganize and optimize the supporting code.

I'm working on this in both a bottom-up and a top-down direction. Bottom up, to develop something of a domain-specific Sudoku-solving language. Top down, because my higher-level strategy is also not that clear at the outset and is undergoing stepwise refinement.

My approach to solving Sudoku by computer is not to recursively explore the tree of moves until a solution is found. That approach is easily doable but less interesting to me than trying to emulate some of the solving strategies that can be performed by a human solver. Ultimately I would like the tool to assist a human solver, not replace her.

My tool of choice is PLT Scheme version 3.00. For reasons too painful to go into just now, I have currently given up on using either SRFIs or SLIB for libraries and am writing everything in plain R5RS Scheme. More on that later.

Welcome

This blog, "Curse and Recurse," is a spinoff of my weblogs "Geek Like Me" and "Geek Like Me, Too."

http://thepottshouse.org/blosxom.cgi

http://geeklikemetoo.blogspot.com

The topic of this blog is pragmatic programming and programming languages. I am interested in clearer and more effective ways to express programs, and in tools that will get us to bug-free code faster, Sapir-Whorf be damned. I don't have a Ph.D. and this is not an exposition about theoretical type systems.

The languages I am currently most interested in are Scheme and Ruby. I have long-term goals to learn idiomatic programming in Common Lisp and to find better approaches to embedded systems programming by using embeddable languages such as Lua and Io. This is not currently another Ruby on Rails weblog, and I am not doing web development these days, although I have done so in the distant past.

If you are still reading, welcome! Maybe we can both learn something.