27 November 2006

Some Very Basic Haskell and Thoughts on Error Diagnosis

So, yesterday I attempted to write a very simple Haskell one-liner. I wanted to use a list comprehension to dump the results of some simple math on a range of numbers. Something like this:

[n / 5 | n <- [0..1024] ]

That's a tremendously concise program. You could read it as follows: "generate a list of n divided by 5 where n is drawn from the enumerated list of numbers 0 to 1024." Type this into GHCi (that's GCH in interactive mode) and it instantly spits out [0.0, 0.2, 0.4 through 204.8].

That's cool, but the reason I was writing this little program was actually to look at rounding behavior when doing division on hardware-sized integers. The '/' operator is not defined on the Int type, so if I ask the list comprehension to operate on a list of Int-typed values I understandably get an error:

Prelude> [n / 5 | n <- [(0::Int)..(1024::Int)] ]
<interactive>:1:1:
   No instance for (Fractional Int)
     arising from use of `/' at <interactive>:1:1-5
   Possible fix: add an instance declaration for (Fractional Int)
   In the expression: n / 5
   In the expression: [n / 5 | n <- [(0 :: Int) .. (1024 :: Int)]]
   In the definition of `it':
     it = [n / 5 | n <- [(0 :: Int) .. (1024 :: Int)]]

GHCi is giving me a lot of context here; that's nice. The message indicates the definition of "it." Apparently "it" is the way GHCi implements a REPL; my expression gets bound to "it" and evaluated. Then it zooms in on what goes wrong. Understanding this error message completely requires an understanding of type classes; I'm not fully up on type classes yet, but it obviously has something to do with type. The suggestion in this case is probably not what I want: I think it is suggesting that I could extend the slash operator to handle integer arguments. That would be a sensible suggestion if I was doing something a little more sophisticated with my own classes. But instead let's try div, which is supposed to do integer division, instead of the slash operator:

Prelude> [n div 5 | n <- [(0::Int)..(1024::Int)] ]
<interactive>:1:1:
   Couldn't match expected type `(a -> a -> a) -> t1 -> t'
          against inferred type `Int'
   In the expression: n div 5
   In the expression: [n div 5 | n <- [(0 :: Int) .. (1024 :: Int)]]
   In the definition of `it':
     it = [n div 5 | n <- [(0 :: Int) .. (1024 :: Int)]]

Whoah. Now the error message is different, but I'm doing something wrong. Let's see if we can decode it. It says that in n div 5 it is expecting the type (a -> a -> a) -> t1 -> t. What does that mean?

I think it means that div takes two values and returns a third, and I'm then trying to apply two arguments to the resulting value. Or something like that. This error message may have everything I need to know; maybe after studying my copy of Bird's Introduction to Functional Programming Using Haskell, which should be arriving in the mail any day now, I'll be able to decipher this; but then again, maybe not. Time to do some Googling. Googling complete, with no answer; time to ask for help.

I got a very quick and helpful reply on fa.haskell. (The newsgroup comp.lang.haskell supposedly has been created, but it doesn't seem to show up in Google Groups yet). The helpful reader suggested that I use `div` (note that those are backticks, not single quotation marks). What this really means is that div in Haskell is a prefix operator, not an infix operator like the slash. It expects its arguments to come afterwards. The magic backticks allow transforming a prefix operator into an infix operator. That works great.

My larger topic is error messages. Having tried to contribute just a little tiny bit on d2c, the Dylan to C compiler, motivated by stumbling across bugs and places where error messages were a bit weak, I can attest to the following principle. Since I've never heard it expressed anywhere else, I'm going to call it Potts' First Principle of Programming Language Implementation. Here it is:

A compiler implementation will require several times more code to provide a helpful diagnostic message than it does to detect a failure, issue a cryptic message, and exit. The diagnostic-generating code will be complicated, difficult to understand, and inelegant. It is nevertheless worth writing.

The difference between the ugly implementation which is helpful to the end user and the wonderfully elegant implementation that produces a cryptic failure message is that the former will be adopted and the latter will not.

Consider writing two methods, one that compiles good code quickly but doesn't give you much help when a failure is detected, and another fully instrumented method that diagnoses the problem in English. Having worked with a large number of different tools, I've found that the primary difference between a truly usable tool and one that is not so usable is the quality of the diagnostics.

A quick example from Dylan: Dylan allows multiple inheritance. Multiple inheritance introduces means that the inheritance graph of a given class can be not just a list but a DAG (a directed acyclic graph). This means that it is possible that you can have a diamond-shaped inheritance graph, in which one or more base classes is included via more than one path. This presents a problem when searching for the next applicable method. The rule, expressed informally, is this: the different partial orderings of the inherited classes, from child to parent to nth-generation ancestor, can't contradict each other. You can't have a class which is both a sports car first and a fire truck second, as well as a fire truck first and a sports car second. If you do, you don't know what is going to happen when you try to shift it into gear.

The d2c compiler complained about this situation with an error message that was, on the positive side, concise. It said "Inconsistent CPL" and quit.

The code in the compiler that detected this condition was a model of elegance. In order for it to display a much more detailed message, something like "class wonderCar: the partial class precedence list (sportsCar, fireTruck) reached via base class superClassTypeA contradicts the partial class precedence list (fireTruck, sportsCar) reached via base class superClassTypeB," some much uglier code has to be inserted, or wrapped around, that model of elegance, to accumulate the information to display to the end user. It's worth writing that method. It could even be implemented as a kind of exception handler when the elegant version fails.

Diagnostic messages: they're what's for breakfast.