08 March 2013

Scala, Day One

I am beginning an investigation of Scala. I have been a little bit disdainful of languages implemented on the JVM – but some of that disdain goes back to some impressions gained with much, much earlier versions of the JVM, plagued with various memory leaks. And I’m not a big fan of Java, having used it in anger enough to get to know its downsides in depth. But – the JVM seems to have improved considerably, and I shouldn’t blame the JVM for Java. As someone who has been pushing for the use of advanced languages in real, commercial projects my whole career, the only thing I regret about this is that despite my interest in languages like Haskell, I’ve rarely been able to use them “in anger” (that is, in a real, shipping product), at my day job. But that may be changing. It looks like Scala might just be, if not the one and only, at least an acceptable, practical Haskell.

So, I downloaded the Scala 2.10.0 binary distribution onto my Mac running MacOS X 10.8.2 (the latest update to Mountain Lion). I stuck the scala-2.10.0 directory into /Library/Scala/ and after a little sudo ln -s action to make symlinks to the binaries, I was in business, with no apparent issues.

There’s a free O’Reilly book online: Introducing Scala, so I am starting there. I also downloaded my “free” copy of Scala for the Impatient, but it signed me up for a mailing list and appeared in my Downloads folder as a .pdf.flv file (WTF?). Launching that file opens Adobe Media Player, which produces an error message and then demands that I update Adobe AIR. (It turns out it’s actually a PDF. I have no idea why that happened.) So I’m printing out the first 32 pages of that book as well, and I’ll read it tonight.

The introductory material in the O’Reilly text is quite interesting. I like the strong, static typing of variables. It’s something I like about Haskell. Except, of course, that Haskell doesn’t have variables. I like the “everything’s an object” – that’s Dylan. I like “everything has a value” – that’s Scheme. I like functional programming and object-oriented programming, both… so, in theory, Scala and I should get along quite well.

Here’s the first bit of sample code in the O’Reilly book:

class Upper {
  def upper(strings: String*): Seq[String] = {
    strings.map((s:String) => s.toUpperCase())
  }
}

val up = new Upper
Console.println(up.upper("A", "First", "Scala", "Program"))

The first thing I notice is that this code uses names that differ only in capitalization (Upper and upper). The upper method has nearly the same name as the class, but it’s not a constructor (as a method named after its class would be under the C++ convention). This reminds me immediately of the style that Apple’s WebObjects used for Java code. It was very common to see code like this in WebObjects sample code:

Person person = person();

or even

Person person = (Person) objects().objectForKey("person");

WebObjects was incredibly powerful and you could get some cool things up and running with it very easily, but code like that always made me cringe. I thought one of the goals of Scala was to get away from Java’s overly-wordy syntax? But that looks like mostly a coding convention thing; let’s hope they don’t use redundant-looking names frequently.

Anyway, the class definition

class Upper {
  def upper(strings: String*): Seq[String] = {
    strings.map((s:String) => s.toUpperCase())
  }
}

doesn’t look too bad. It’s all in one place – no separate header like C++, and that code will execute without requiring any boilerplate module or namespace definitions or importing any standard modules. Classes have methods, unlike Dylan, where classes only have slots and are operated upon by specialized generic functions. (I really liked that paradigm, by the way, which Dylan borrowed from CLOS, but it seems to be out of fashion now. Another note to self: does Scala support generic functions?)

Upper looks like a class with one method, upper, with a parameter called strings. (Why is it a class if it has only one method? Shhh, it’s just an introductory example. Move on!). I’m assuming String* is a type. But what is the significance of the asterisk in the type name? Is that a naming convention, indicating it is a general base class for strings, or does it indicate some kind of regular-expression-based pattern matching on type name? (Because that might be… well, weird?)

Ah… the book says that String* indicates a variable-length argument list – that is, you can provide any number of strings, including none at all. So the asterisk is not part of a type name after all; it marks the parameter as repeated. The result is a typed list (lists in old-school Lisp or Scheme are polymorphic and can contain objects of any type, but this has implementation consequences, requiring tagged or “boxed” data and a lot of runtime checks, while old-school Java’s lack of type-parameterized containers was a headache for other reasons). The book says the type of strings inside the method body is an array of strings; as far as I can tell, in Scala 2.8 and later it is actually a Seq[String].
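To convince myself how the repeated parameter behaves, here is a minimal sketch of the three ways I believe it can be called. The object name VarargsDemo and the words list are my own inventions for illustration, not the book's; the upper method has the same shape as the book's.

```scala
object VarargsDemo {
  // Same shape as the book's method: String* accepts zero or more String arguments.
  def upper(strings: String*): Seq[String] = strings.map(_.toUpperCase())

  def main(args: Array[String]): Unit = {
    println(upper())            // calling with no arguments at all is legal
    println(upper("a", "b"))    // two separate arguments
    val words = List("first", "scala")
    println(upper(words: _*))   // ": _*" expands an existing sequence into the varargs slot
  }
}
```

Without the ": _*" annotation the last call would not compile, because a List[String] is not itself a String.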

OK, I get that. In Haskell, if I were writing a recursive algorithm that operated on a list, I’d probably pattern-match in order to specialize on the empty list (not take its tail, for example). But presumably if we aren’t doing some kind of guarding up front, it means we won’t be executing any operations on that list that fail catastrophically if the list is empty.
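A small sketch of that reasoning, assuming I have it right: mapping over an empty sequence just does nothing, while an operation that needs a first element can be guarded with a Haskell-style pattern match. EmptySafe and firstUpper are hypothetical names of mine, not anything from the book.

```scala
object EmptySafe {
  // map visits zero elements of an empty sequence, so nothing can fail here.
  val nothing: Seq[String] = Seq.empty[String].map(_.toUpperCase())

  // Hypothetical helper: match on the list's shape to guard the empty case.
  def firstUpper(strings: List[String]): Option[String] = strings match {
    case Nil       => None                      // empty list: there is no head to take
    case head :: _ => Some(head.toUpperCase())  // non-empty: upper-case the first element
  }
}
```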

It looks to me like Seq[String] is a return type – the specification of return types is sort of Dylanesque. Here’s a method definition in Dylan:

define method square-list (numbers :: <list>) => (out :: <list>)
  map(method(x) x * x end, numbers);
end;

Note that the type of the incoming parameter and of the return value is <list> (a type name can contain punctuation in Dylan). In Scala, is Seq[String] an arbitrary name for a type, where the brackets mean something to the reader, or does that bracket syntax mean something to the compiler? (Note to self…)

Oh, the book says that Seq[String] is a parameterized type, like a generic type or instance of a type class. I think I get that, although I will no doubt need to study what Scala is doing with generic types much more thoroughly.
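As a rough illustration of what "parameterized" means here, the same Seq container can be instantiated at different element types, and a method can take a type parameter of its own. This is only a sketch; SeqTypes and firstOrElse are names I made up.

```scala
object SeqTypes {
  val words: Seq[String] = Seq("A", "First", "Scala", "Program")

  // Same container, different element type: the brackets carry the type argument.
  val lengths: Seq[Int] = words.map(_.length)

  // Hypothetical generic method: T stands for whatever element type the caller supplies.
  def firstOrElse[T](items: Seq[T], default: T): T =
    if (items.isEmpty) default else items.head
}
```

So the brackets do mean something to the compiler: handing a Seq[Int] to a method that expects a Seq[String] is rejected at compile time.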

Next, I notice that we’re defining the function body with

def upper(strings: String*): Seq[String] = {
  strings.map((s:String) => s.toUpperCase())
}

in a pattern that looks to me like:

def name( parameter names and types ): return type or types = {
    body
}

That = { body } looks a little bit odd, but it raises some interesting possibilities (what if we don’t assign a function body?) I would also like it if the return type could have a name like the incoming parameters do; in Dylan you can do that, although since the name isn’t bound to anything in the method itself or in the calling method, it is strictly for documentation purposes.

The book tells me that the = is present because a function definition doesn’t have to be wrapped in braces, and so it prevents some syntactic ambiguity. All right, I can live with that for now. Also, “functions are values.”
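Here is a minimal sketch of those points as I currently understand them: a body without braces, a function held in a val, and a method with no body at all, which Scala treats as abstract. Every name here (DefForms, square, cube, Shape, Rect) is mine, chosen for illustration.

```scala
object DefForms {
  // Braces are optional when the body is a single expression.
  def square(x: Int): Int = x * x

  // "Functions are values": a function literal bound to an ordinary name.
  val cube: Int => Int = (x: Int) => x * x * x

  // Leaving off "= body" declares an abstract method, which is only
  // allowed inside a trait or an abstract class.
  trait Shape {
    def area: Double
  }

  // Hypothetical concrete class that supplies the missing body.
  class Rect(w: Double, h: Double) extends Shape {
    def area: Double = w * h
  }
}
```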

Moving on to

strings.map((s:String) => s.toUpperCase())

At first glance, I have to say I’m not entirely clear on what the => is doing. Is it some kind of “apply?” This line looks needlessly complicated. In Haskell, I would try to write this in a point-free style without naming any variables, just composing functions. In Haskell, the standard toUpper function does not take strings, but characters. If it did take strings, we could just write the function like so:

upper = map toUpper

In Haskell, strings are defined as lists of characters, so we can still compose it like so:

upper = map ( map toUpper )

and that works fine. Stick the following lines in a file and load it into GHCi:

import Data.Char
pfupper = map ( map toUpper )

then type

pfupper ["These", "are", "Haskell", "strings"]

I get back

["THESE","ARE","HASKELL","STRINGS"]

But in Scala’s world everything is an object, and the functions we’re dealing with here are methods. I’ll grit my teeth a bit at the loss of elegance. Maybe we can get it back later – there’s a chapter on functional programming. For now, let’s press on and see how the book explains it.

Breaking down the method body, the outer expression, strings.map(), is just a call to the map method of the strings parameter. The strings parameter is a list or array or some such, and mapping over a data structure like that is a common operation in functional programming. Instead of passing a data structure to a function, we pass a function to a method of the data structure. That function is then called for each element, and the results are returned in a fresh list.

Oh, and since everything has a value, and functions or methods in this paradigm typically have the value of their last (or only) expression, there’s no return statement. That’s like Scheme or Dylan; no big deal.
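A tiny sketch of "everything has a value", assuming nothing beyond standard Scala: the if/else produces a value, and the last expression of the block is what the method returns. LastExpression and describe are hypothetical names.

```scala
object LastExpression {
  def describe(n: Int): String = {
    // if/else is an expression, so its result can be bound directly to a name.
    val parity = if (n % 2 == 0) "even" else "odd"
    s"$n is $parity"   // last expression in the block: the method's result
  }
}
```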

What is up with that argument, (s:String) => s.toUpperCase()? Mr. Book says that this is a function literal. I think that means it’s what Schemers call a lambda. In Haskell, we could do this with a lambda that looked something like:

\ s -> map toUpper s

putting it into a function gives us

lupper ss = map (\ s -> map toUpper s) ss 

And that works just like the point-free version did.
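For comparison, here is how I would sketch the Scala side of the same idea: the function literal can be bound to a name and then handed to map, just as the Haskell lambda can. The names FunctionLiterals, shout, and upperInferred are my own, and the claim that the parameter type can be dropped is my reading of how inference works here.

```scala
object FunctionLiterals {
  // The book's literal, bound to a name; String => String is its type.
  val shout: String => String = (s: String) => s.toUpperCase()

  // Passing the named value to map is equivalent to writing the literal inline.
  def upper(strings: Seq[String]): Seq[String] = strings.map(shout)

  // The element type is known from the sequence, so the annotation can be omitted.
  def upperInferred(strings: Seq[String]): Seq[String] =
    strings.map(s => s.toUpperCase())
}
```

So => plays the role of Haskell's -> in a lambda: parameters on the left, body on the right.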

Anyway, all that Scala syntax is slightly off-putting, although new languages usually are. Syntax is both unimportant and the biggest source of frustration when adapting to a new language, especially when it’s your nth language, for some moderately large n; I keep falling into trying to write Haskell, or trying to write Scheme, or trying to write Dylan, or trying to write NewtonScript…

But wait, the book provides a simplified version:

object Upper {
  def upper(strings: String*) = strings.map(_.toUpperCase())
}

println(Upper.upper("A", "First", "Scala", "Program"))

That transforms

strings.map((s:String) => s.toUpperCase())

into

strings.map(_.toUpperCase())

That makes it a little more “point-free” since we don’t specify a name and type for the parameter to our lambda expression. I’ll have to read up a little more about the underscore. It seems to be indicating that we should call the toUpperCase() method on whatever unnamed object comes into the lambda. We let the compiler deal with the type-checking.

That seems promising. Getting rid of the type and the name allows us to think a little more functionally. Note that this is very different than the Haskell underscore, which is used for pattern-matching. Maybe we should call it the “whatever” operator? Or the “let the compiler worry about it” operator? Hmmm… I’ll have to think about that some more.
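A short sketch of the underscore as I understand it so far. In a function literal each underscore stands for one successive parameter; Scala also uses an underscore as a wildcard inside match patterns, which is much closer to the Haskell meaning. Placeholders and isEmptyWord are hypothetical names.

```scala
object Placeholders {
  val words = Seq("A", "First", "Scala", "Program")

  // These two lines mean the same thing; the underscore stands for the one parameter.
  val explicit = words.map((s: String) => s.toUpperCase())
  val short    = words.map(_.toUpperCase())

  // Each underscore is a separate parameter, so _ + _ is a two-argument function.
  val total = Seq(1, 2, 3).reduce(_ + _)

  // In a match, the underscore is a wildcard pattern that matches anything.
  def isEmptyWord(s: String): Boolean = s match {
    case "" => true
    case _  => false
  }
}
```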

So, what else is going on here? object defines a singleton… nice. Design patterns ahoy! (Yet another note to self – can you inherit from an object instance, as in Omega, or Self, or NewtonScript, or specify inheritance only from a class?) The return type of upper is inferred, not explicit. We don’t need the braces for a method that is only one expression.
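On the inheritance note to self, here is a minimal sketch of what I believe the rule to be: an object can extend a class or trait, but nothing can extend an object. The Shouter trait and the enclosing Singletons object are invented for this illustration.

```scala
object Singletons {
  // Hypothetical trait used only for this illustration.
  trait Shouter {
    def shout(s: String): String = s.toUpperCase()
  }

  // A singleton object may inherit from a class or trait...
  object Upper extends Shouter {
    def upper(strings: String*) = strings.map(shout)
  }

  // ...but the reverse is not allowed: `class Louder extends Upper` does not
  // compile, because Upper names a single instance rather than a class.
}
```

That makes Scala's object different from prototype-style inheritance in Self or NewtonScript, where one instance can inherit directly from another.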

More tomorrow! (Or maybe Monday.)