09 February 2006

Thinking about Sudoku and Scheme, Part 5.1

Just a brief post today. I have been too busy to do much on my Sudoku solving program itself, but I did spend some time polishing up my array2D library. After some revision, this library has finally assumed what I feel is its "natural" shape. The optimal design was not clear at the outset, but with some use I feel I've discerned the implementation that minimizes the amount of redundant code and thus the number of places where errors might be hiding. I'm sure it could be even more concise; I have a feeling that a little macro magic could cut the lines of code in half, but the shape of that implementation has not yet become clear to me, and I have an allergy to using techniques that don't provide a clear advantage.

It is really quite a simple library, but in order to provide some assurance of correctness, I also wrote a series of PLT Scheme test cases. I have not figured out how to use DrScheme to tell me if this test suite covers all of the code, or if this is even possible, but it is at least nearly complete.

The library does not do meaningful error-checking. Scheme textbook code rarely does much in the way of error-checking, so I'm short on strategies I could borrow to address this, especially when trying to write to R5RS.

It isn't possible to code around every conceivable error. For some applications, it makes sense to perform verification any time data is read from or written to memory. We can't operate at that level. We also shouldn't put in effort to detect flaws in the implementation itself. Given our partially functional approach, in which we don't update any top-level definitions, there aren't a lot of unexpected states our library can get into. We should focus our attention on the response to incorrect inputs. There are only a couple of classes of errors that the library should expect:

1. The set of array boundary errors. Because coordinate pairs are mapped to a single index on a vector, many possible boundary errors will result in the wrong cell being read or written. Only cell references that exceed the internal representation will generate a meaningful run-time error, and even that error will not tell us whether the x or the y index was out of bounds.

2. The set of errors that can arise from receiving an incorrect value-generator or mapping function. In Scheme, a function signature really consists only of the name and the number of parameters, and does not include type information about the parameters. Supplying an incorrect number of parameters will be caught with a meaningful error at run time, but because the passed functions may be called via several levels of indirection, and the function may not be bound to a named variable, it may be difficult to glean from the error message just what has gone wrong. (Another way to say this is that DrScheme has thrown away too much context information.)

Type errors are potentially more insidious, but in my implementation these functions will tend to receive an array (vector), integer indices, and possibly another function. The result of doing a vector access on an integer or function, or treating a vector or function like an integer, should at least fail fast with a moderately meaningful error.
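Both failure modes can be seen in a small sketch of the kind of coordinate mapping described above. The name array2d-index is hypothetical, not the library's actual API:

```scheme
;; Hypothetical version of the (x, y) -> vector-index mapping.
(define (array2d-index width x y)
  (+ x (* y width)))

;; In a 4x4 array (a vector of 16 cells), an out-of-range x silently
;; aliases to a perfectly valid cell on the next row:
(array2d-index 4 5 1)  ; => 9, the same index as (x=1, y=2) -- no error

;; Only an index past the end of the backing vector fails at run time,
;; and the error mentions neither x nor y:
(vector-ref (make-vector 16 0)
            (array2d-index 4 0 4))  ; vector-ref: index out of range
```

The first call illustrates why most boundary errors are silent: the flattened index is in bounds even though the coordinate pair is not.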

There are richer language constructs that could address these problems. For example, either Common Lisp or Dylan would allow me to specify type information for the parameters. The runtime is then capable of flagging type errors as soon as they happen, or, better yet, verifying type safety across functions and generating code that operates on POD (plain old data). Dylan also supports limited types, although implementation support in Gwydion's d2c is spotty. Limited types would allow range checking further upstream. Of course, if I wrote this program in Common Lisp or Dylan, I would not bother simulating an array. If our language supported exception handling, I could wrap the top-level operations to catch exceptions without much modification to my functions.
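In fact, PLT Scheme (though not R5RS proper) does provide exceptions, so the wrapping strategy is available there. A sketch, where solve and board are hypothetical names standing in for my actual top-level operation:

```scheme
;; Catch any run-time failure from the solver at the top level,
;; leaving the library functions themselves untouched.
;; PLT Scheme's with-handlers and exn:fail? are used here.
(define (safe-solve board)
  (with-handlers ([exn:fail? (lambda (e)
                               (printf "solver failed: ~a~n"
                                       (exn-message e))
                               #f)])
    (solve board)))  ; solve is a hypothetical top-level operation
```

This keeps the error-handling policy in one place instead of threading checks through every function.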

Another approach to bounds-checking in my 2-D array would be to encapsulate the array data together with metadata on its x and y dimensions. I am doing this to a limited extent with the closure generation, but the captured x-dimension is not used to validate that the x-index is within bounds. Other array implementations use a record or a closure to wrap up the representation itself. But then it is more difficult to display the contents of the array, and if we are going to start down that road, we might as well try to generalize our library for an n-dimensional array. That seems like overkill, so for now we will call array2D complete as it is.
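For completeness, here is one possible shape for that closure-based encapsulation, capturing both dimensions and validating every access. All the names here are illustrative, not the library's API:

```scheme
;; A bounds-checked 2-D array wrapped in a closure. The dispatch-on-
;; symbol style is only one way to expose get and set operations.
(define (make-checked-array2d w h init)
  (let ((data (make-vector (* w h) init)))
    (define (check x y)
      (unless (and (>= x 0) (< x w) (>= y 0) (< y h))
        (error 'array2d "index (~a, ~a) out of bounds for ~ax~a" x y w h)))
    (lambda (op x y . val)
      (check x y)
      (case op
        ((get) (vector-ref data (+ x (* y w))))
        ((set) (vector-set! data (+ x (* y w)) (car val)))))))

;; Usage:
(define a (make-checked-array2d 4 4 0))
(a 'set 1 2 99)
(a 'get 1 2)   ; => 99
(a 'get 5 1)   ; => error: index (5, 1) out of bounds, not silent aliasing
```

The cost, as noted above, is that the raw vector is no longer visible for easy display, which is part of why I'm leaving array2D as it is.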