02 July 2013

The Polar Game in Haskell, Day 5: Array v. List

So, a little more progress in learning me a Haskell: I've managed to implement the board using an immutable array. There's good news and bad news here. If you're an old hand at functional programming, you probably know all this and more, but I needed to do a little thinking on purely functional data structures. I have not really been satisfied with the amount of code necessary to manage my 2-D board in a list. I spent some time doodling some possible alternative implementation before concluding that purely functional data structures -- in which nodes are never mutated -- are hard. Anything I might be accustomed to doing with double or multiply-linked lists is pretty much a washout, since you can't ever share structure. In fact, I think one data structure I came up with might not be constructible at all without being able to mutate links between nodes. So I'm starting to understand why the tutorials all advise me to stick with lists.

Nevertheless, this is a small problem, and efficiency is not my biggest concern, at least not in the learning phase. I wanted to figure out how to use an immutable array. The tutorials have not been very satisfying. They seem to assume that anything this trivial is too trivial to demonstrate. But here's what I did.

First, the type of an array in Haskell encodes the number of dimensions and the node type, but not the size. You set that when you call the constructor. Here's a 2-D array type for my board:

type BoardArray = Array ( Int, Int ) Tile

I specified some bounds:

max_row :: Int
max_row = 3

max_col :: Int
max_col = 23

And I should point out one of the fundamental problems with using arrays: it's very easy to kill your program by exceeding the array bounds. There is a similar problem with head, but when writing functions with pattern-matching and guards there are pretty accepted conventions for dealing with empty lists. I suppose one could use guard patterns on all array accesses, but it starts to seem a little silly.

The next thing is that a given array works with some auxiliary types. The // operator takes an array and a list of tuples and builds a new array with updated content. The type of that list of tuples is this:

type TileAssocList = [ ( ( Int, Int ), Tile ) ]

For accessing multiple items in an array, the range method builds lists of indexing tuples. The syntax to range requires tuples of tuples, with the parentheses piling up, so I wrapped it up in a function:

make_2d_range :: Int -> Int -> Int -> Int -> [ ( Int, Int ) ]
make_2d_range y0 x0 y1 x1 = range ( ( y0, x0 ), ( y1, x1 ) )

So how does that work? It just iterates coordinates, permuting from higher indices to lower, like so:

*Main> make_range 0 0 0 1
[(0,0),(0,1)]

*Main> make_range 0 0 1 3
[(0,0),(0,1),(0,2),(0,3),(1,0),(1,1),(1,2),(1,3)]

For this problem domain, I need to know how reversed ranges work. For example, when the penguin is facing West, I want to build a range and a list of tiles in reverse index order. Can range do that for me?

*Main> make_range 0 23 0 0
[]

Ah... no. I guess that would have been too easy. So I'll have to account for those cases specially. Here's a function to get the penguin's view out of a 2-D array of tiles, in the form of a tile association list I can use to create a freshly created "modified" array (it's not really modified, but a new one is created with the updates from that list applied):

view_array :: BoardArray -> Pos -> Dir -> TileAssocList
view_array board pos dir =
    let row = ( posY pos )
        col = ( posX pos )
        coord_list = case dir of
            East  -> if ( col == max_col )
                     then []
                     else make_2d_range row ( col + 1 ) row max_col
            South -> if ( row == max_row )
                     then []
                     else make_2d_range ( row + 1 ) col max_row col
            West ->  if ( col == 0 )
                     then []
                     else make_2d_range row 0 row ( col - 1 )
            North -> if ( row == 0 )
                     then []
                     else make_2d_range 0 col ( row - 1 ) col
        tile_assoc = zip coord_list ( map ( (!) board )
                                           coord_list )
    in case dir of
        East -> tile_assoc
        South -> tile_assoc
        West -> reverse tile_assoc
        North -> reverse tile_assoc

That's not so bad. The key to this function is the ! operator -- this gets a tuple and an array and returns an element -- and I zip the elements up with their coordinate tuples. Note that a lot of the bulk of this function is handling the edge cases, because we don't want to apply an out-of-range coordinate tuple to !. There may still be a shorter, clearer implementation possible. By comparison, here's a list-of-lists version factored a bit using currying to make it as self-documenting as I could get it -- note the use of id to let me return a general function as orient. I'm sure it doesn't impress FP whizzes, but I'm kinda proud of it -- I feel like I'm starting to use Haskell a little more idiomatically:

view_list :: BoardList -> Pos -> Dir -> [Tile]
view_list board pos dir =
    let row = ( posY pos )
        col = ( posX pos )
        transposed = elem dir [ South, North ]
        reversed = elem dir [ West, North ]
        orient | reversed = reverse
               | otherwise = id
        trim = case dir of
            East -> drop ( col + 1 )
            South -> drop ( row + 1 )
            West -> take col
            North -> take row
        extract | transposed = ( transpose board ) !! col
                | otherwise = board !! row  
    in orient $ trim $ extract

Testing view_list:

*Main> view_list init_board_list (Pos 0 0) East
[Empty,Empty,Empty,Empty,Empty,Empty,Empty,Empty,Empty,Empty,Empty,Empty,
Empty,Empty,Tree,Empty,Empty,Empty,Empty,Empty,Ice_Block,Empty,Empty]

*Main> view_array init_board_array (Pos 0 0) East
[((0,1),Empty),((0,2),Empty),((0,3),Empty),((0,4),Empty),
((0,5),Empty),((0,6),Empty),((0,7),Empty),((0,8),Empty),
((0,9),Empty),((0,10),Empty),((0,11),Empty),((0,12),Empty),
((0,13),Empty),((0,14),Empty),((0,15),Tree),((0,16),Empty),
((0,17),Empty),((0,18),Empty),((0,19),Empty),((0,20),Empty),
((0,21),Ice_Block),((0,22),Empty),((0,23),Empty)]

Now we can write step. Here's the list version I've presented before:

step_list :: [Tile] -> ( Bool, [Tile] )
step_list [] = ( False, [] )
step_list ts = if walkable (head ts) then ( True, ts )
                                     else ( False, collide ts )

The array version is a little more complicated, because I want to strip the list I pass to collide down to just a list of tiles, in order to retain that clean logic for dealing with just a list of tiles. So I unzip my coordinate tuples from my tiles, get a potentially updated tile list, and zip it back together. That complicates it a bit, like so:

step_array :: TileAssocList -> ( Bool, TileAssocList )
step_array [] = ( False, [] )
step_array tile_assoc = if ( walkable $ head tile_list )
                        then ( True, tile_assoc )
                        else ( False, zip coord_list
                               ( collide tile_list ) )
    where ( coord_list, tile_list ) = unzip tile_assoc

I'm going to have to uglify my nice collide method a bit because I need to return at least one additional value -- indicating whether collide consumed a heart, so that we can keep score of the game.

Next up, you can see the array and list solutions start to diverge hugely. It's hard to merge the list-based board back together with the potentially updated tile list to create the next immutable list-based board. My original method was pretty hideous. With Jeff's refactoring it's still a lot of code. (Note: I don't have this completely working yet; I'm getting a run-time error about bad patterns I haven't quite figured out yet):

next_board_list :: BoardList -> Pos -> Dir -> ( Bool, BoardList )
next_board_list board pos dir =
    let ( penguin_could_move, updated_view_list ) = 
        step_list $ view_list board pos dir
    in ( penguin_could_move, update_board_from_view_list 
         board pos dir updated_view_list )

apply_view_list_to_row :: [Tile] -> Int -> Bool -> [Tile] -> [Tile]
apply_view_list_to_row orig pos True update =
    take ( pos + 1 ) orig ++ ( init update )
apply_view_to_row orig pos False update =
    ( reverse ( init update ) ) ++ ( drop pos orig )

apply_view_list_to_rows :: BoardList -> Int -> Int -> 
    Bool -> [Tile] -> BoardList
apply_view_list_to_rows orig row pos is_forward update =
    take row orig ++
    nest ( apply_view_to_row ( orig !! row ) pos is_forward update ) ++
    drop ( row + 1 ) orig

update_board_from_view_list :: BoardList -> Pos -> Dir -> 
    [Tile] -> BoardList
update_board_from_view_list board pos dir updated_view_list
    | is_eastwest = apply_view_list_to_rows board
                        ( posY pos ) ( posX pos )
                        is_forward updated_view_list
    | otherwise = transpose ( apply_view_list_to_rows ( transpose board )
                              ( posX pos ) ( posY pos ) 
                              is_forward updated_view_list )
    where is_forward = elem dir [ East, South ]
          is_eastwest = elem dir [ East, West ]

By comparison, the array is much more suited to create an updated version of itself, given a list of elements to update. This is handled by the // function, in this simple function to create the next board in array form, called from step_array:

next_board_array :: BoardArray -> Pos -> Dir -> ( Bool, BoardArray )
next_board_array board pos dir =
    let ( penguin_could_move, updated_view ) =
        step_array $ view_array board pos dir
    in ( penguin_could_move, board // updated_view )

I like that -- it looks like we're working with the data structure rather than against it, although the overhead to manage the ranges and lists still feels to me more complicated than it should be. That complexity carries over elsewhere: for example, pretty-printing the array requires that range logic again. In fact I wind up just wrapping up and re-using the logic to pretty-print the list, so you can see how much additional code I needed:

pretty_tiles :: [Tile] -> String
pretty_tiles [] = "\n"
pretty_tiles (t:ts) = case t of
                 Empty     -> "___"
                 Mountain  -> "mt "
                 House     -> "ho "
                 Ice_Block -> "ic "
                 Heart     -> "he "
                 Bomb      -> "bo "
                 Tree      -> "tr "
             ++ pretty_tiles ts

pretty_board_list :: BoardList -> String
pretty_board_list [] = ""
pretty_board_list (ts:tss) = pretty_tiles ts ++ pretty_board_list tss

split_tile_list :: [ Tile ] -> [ [ Tile ] ]
split_tile_list [] = []
split_tile_list ts = [ take tiles_in_row ts ] ++
                     ( split_tile_list $ ( drop tiles_in_row ) ts )
    where tiles_in_row = max_col + 1

pretty_board_array :: BoardArray -> String 
pretty_board_array board = pretty_board_list split_tiles
    where full_range = make_2d_range 0 0 max_row max_col
          all_tiles = map ( (!) board ) full_range
          split_tiles = split_tile_list all_tiles

As an aside, it seems like there ought to be at least one standard list split function, but it looks like folks don't really agree on how it should work

So there it is -- the array is kind of a mixed blessing here. I haven't done any large-scale profiling on it, to determine if the need to generate a whole new array each pass is a big loss, compared to the potential structure-sharing in the list implementation. It simplifies some of the code dramatically, while adding a layer of dealing with ranges and lists of tuples everywhere -- as soon as we want to pull items out of the array, or merge them back in to a new array, we're dealing with lists again. Still, given the ugliness of the list merge code, it seems like the more natural choice for this kind of small game board data structure.

8 comments:

Jeff Licquia said...

Took care of a little insomnia by fixing your list implementation. There are two problems.

First: the bad patterns error comes from two places where apply_view_to_row wasn't renamed to account for the "_list_" implementation. A quick search-n-replace fixed that right up.

After that, the list version "steals" tiles; each eastward move at the beginning chops a tile off the end, and eventually the moves collide with the shortened list and things blow up. I remembered you used to have Edge tiles, but decided to get rid of them, so I guessed that the "init update" calls in apply_view_list_to_row were for chopping off the Edge tile that's no longer there, so I just took out the "init" call. Either my hunch was right, or I'm just lucky. :-)

After those two fixes, the list and array implementations seem to produce identical output. I know you're more a fan of the array version, but I want to see the all-important performance evaluation later, which won't happen with broken lists.

I really like the new view functions. I still tend to be leery of the "case dir of..." thing happening over and over, but am getting tired again (yay! die, insomnia!), so not seeing the problem clearly.

Also, the pretty printer stuff is screaming for something like a fold or map, but I'm not seeing how that improves things at the moment. Maybe some sleep will bring clarity.

Michael Alan Dorman said...

I think (as something of a Haskell newbie myself, so take this with a grain of salt) what you want to be looking at is lenses to simplify your board manipulation code.

http://www.youtube.com/watch?v=cefnmjtAolY is a presentation from Edward Kmett about them.

Paul R. Potts said...

Thanks, Jeff. Sorry to hear about the insomnia! But I'm happy for the help.

Yes, when I got rid of Edge it broke a lot of things including my collide and slide methods, and that was not immediately apparent. I had to rewrite that extensively. I'm keeping the github version up to date: https://github.com/paulrpotts/arctic-slide-haskell

but I'm not going to go back and revise all the blog posts as I go -- that way lies madness : )

Paul R. Potts said...

Michael, I am vaguely aware of lenses as something I would like to master... and I've still got to get a better handle on monads. I feel like I understand them when explained but attempting to use one in my own code outside of IO has been a complete failure. I will take a look at that video.

Jeff Licquia said...

I know this is your learning exercise, not mine... but I ended up implementing scorekeeping with the Writer monad.

Are you interested in seeing it, or do you not want the fun spoiled?

Paul R. Potts said...

Jeff, I'd love to see it! The feedback and comments are what makes this really valuable for me to blog about, rather than just writing it and keeping the code to myself. Maybe it will be the moment when I achieve monad enlightenment! If you like, send me an e-mail and I'll make it a real entry. Or actually if you have a Blogger account, I could enable you as a contributor on Blogger, at least I think I could.

Paul R. Potts said...

I am taking a sanity break from glowing screens over the Fourth of July Weekend -- I'll get back to looking at this next week.

Cake said...

Wow, I'm so happy I found your blog. I was trying to model a game in haskell recently (I thought it would be a good training), and I stumble on a lot of things you did, like Arrays.

I'll be sure to keep an eye on your articles now :)