07 March 2007

Haskell for the Short Attention Span: Run-Length Encoding, Part 3

This isn't a big improvement, but it occurs to me that my implementation from last time:

>deRLE :: [Integer] -> Bool -> Integer -> [Integer]
>deRLE [] False _ = []
>deRLE points False max = deRLEHelper points max
>deRLE points True max = deRLEHelper (0 : points) max

>deRLEHelper :: [Integer] -> Integer -> [Integer]
>deRLEHelper [] max = []
>deRLEHelper [unmatched] max = [unmatched..max]
>deRLEHelper (start:end:rest) max = [start..end - 1]
>    ++ deRLEHelper rest max

can be yet shorter; since deRLEHelper handles the empty list specially, deRLE doesn't need to:

>deRLE :: [Integer] -> Bool -> Integer -> [Integer]
>deRLE points False max = deRLEHelper points max
>deRLE points True max = deRLEHelper (0 : points) max

That way only the function that actually recurses down to the empty list has to explicitly handle it.

Can we make it even more concise and still retain the self-documenting aspects of the code? I'll have to come back to that question. It's time to respond to comments. When I first mentioned my triplify function, which breaks a string into strings of three characters, I wrote "I'm sure there is a very clever one-line fold that applies take 3 in an mind-expanding manner, but I was unable to find it." I got some great suggestions. The first, due to Cale Gibbard, was:

map (take 3) . takeWhile (not . null) . iterate (drop 3)

Let's look at it piecewise as a pipeline, from right to left.

First, our list/string (for example, "ABCDEFG") is fed to the function iterate along with (drop 3). Iterate is an interesting function. I tried to understand it a few months ago by reading the description at zvon.org, which says that iterate

creates an infinite list where the first item is calculated by applying the function on the second argument, the second item by applying the function on the previous result and so on.

They give the following example:

Input: take 10 (iterate (2*) 1)
Output: [1,2,4,8,16,32,64,128,256,512]

Note that if the behavior followed the description, the output would be [2,4,8,16,32,64,128,256,512,1024]. Their description is wrong, although the output matches GHCi's output. What does iterate really do? I'll describe it in the form of a poem.

iterate takes a function and a value, and returns a list consisting of:

the value

followed by
  the value of applying the function to the value

followed by
  the value of applying the function to
    the value of applying the function to the value

followed by
  the value of applying the function to
    the value of applying the function to
      the value of applying the function to the value
      
it will keep talking
  even when
    it has nothing
      new to say
        so please don't
        ask it any
          open-ended
            questions...

That is, I can't evaluate

iterate (drop 3) "ABCDEFG"

because even though the input is finite, the output isn't; there's no termination to the recursion. However, I can evaluate part of the result:

take 4 (iterate (drop 3) "ABCDEFG")
["ABCDEFG","DEFG","G",""]

And because of Haskell's lazy evaluation, we can feed this infinite result "upstream" to be used in the remainder of our calculation, and it will work fine as long as it doesn't evaluate the complete result. The next stage in our "points-free" pipeline is:

takeWhile (not . null)

The takeWhile function operates on a list, returning a list made up of its values as long as they match the predicate supplied. This has the effect of squashing the overly talkative iterate, because we stop evaluating its results as soon as it returns the first null value. Lazy evaluation is so cool!

Now, our finite list of strings is handed off to map (take 3), which generates a list of 3-character prefixes. Here's the whole process:

iterate (drop 3) "ABCDEFG"

yields an infinite list beginning ["ABCDEFG","DEFG","G",""];

takeWhile (not . null)

applied to the above will take elements from the list until it hits the first null: ["ABCDEFG","DEFG","G"]

map (take 3)

applied to the above will give us ["ABC","DEF","G"]. Note that the short remainder is preserved.

>

That seems so useful, I'm surprised it isn't a function in the Standard Prelude, called groupsOf. Although it would also be nice to have a version which didn't keep the short remainder, perhaps groupsOfOnly.

I got another suggestion from a user called "kirby." He suggested the function

takeWhile (\="") . List . unfoldr (Just . splitAt 3)

Wow, my first unfoldr! I'm going to have to stop there for today, while I learn a little something about Just, Nothing, and the Maybe warm fuzzy thing. My first funster!

1 comment:

Cale Gibbard said...

Speaking of warm fuzzy things, it's also possible to write that last one as

unfoldr (\x -> guard (x /= "") >> return (splitAt 3 x))

Probably not any clearer, but it makes better use of unfoldr.

Personally, I like the following definition of unfoldr, which turns out equivalent to the original, but often more natural to use:

unfoldr' p f g = map f . takeWhile p . iterate g

(Now you perhaps see how my original definition fits in with this.)

Under that definition we can write triplify as:

unfoldr' (not . null) (take 3) (drop 3)

The reason why the two versions of unfoldr are equivalent is that:

unfoldr f = unfoldr' isJust (fst . fromJust) (f . snd . fromJust) . f
and
unfoldr' p f g = unfoldr (\x -> guard (p x) >> Just (f x, g x))

Despite the bit of awkwardness in reexpressing the original unfoldr, the 3 parameter version seems closer to most applications where you really need it.

Another example is computing digits of a number in say, base 10:

digits = unfoldr' (/= 0) (`mod` 10) (`div` 10)