03 June 2013

Objective-C, Day 2

(Warning: non-FP content)

So today I'm working on chapter 3 of iOS Programming, which is about memory management. I have vague memories of manual reference counting with retain and release from earlier experiments with Objective-C, but this book teaches ARC (Automatic Reference Counting). You enable ARC when you configure an Xcode project for iOS.

The idea behind object references (well, pointers in Objective-C) and trees of objects and their owners is not new to me. But the book still does not clarify whether a statement like items = nil actually invokes runtime behavior to destroy objects. As far as I can tell, under ARC it does: assigning nil to a strong reference compiles down to a release of the old value, and if that was the last strong reference, the object is destroyed on the spot. When I override a dealloc method to log, I see the dealloc methods called when the closing curly brace of @autoreleasepool is reached -- which suggests that in my test the pool itself was still holding references (objects returned from convenience constructors are typically autoreleased). This makes sense in C terms, too: the closing brace is where variables go out of scope, so reference-counting bookkeeping must be invoked there as well. I haven't found a way to step in with the debugger at that point to watch it happen, though Apple does publish the runtime's source among its open-source releases, so in principle one could read along.
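
Here's roughly the experiment, as a minimal sketch (the class name echoes the book's, but the details are mine):

    #import <Foundation/Foundation.h>

    @interface BNRItem : NSObject
    @end

    @implementation BNRItem
    // Under ARC you may still override dealloc to observe destruction,
    // but you must not call [super dealloc] -- the compiler forbids it.
    - (void)dealloc
    {
        NSLog(@"dealloc: %@", self);
    }
    @end

    int main(int argc, char *argv[])
    {
        @autoreleasepool {
            // The convenience constructor returns an autoreleased array,
            // so the pool holds a reference in addition to ours.
            NSMutableArray *items = [NSMutableArray array];
            [items addObject:[[BNRItem alloc] init]];

            items = nil; // ARC releases our strong reference here
            NSLog(@"items is now nil");
        } // the pool drains here; in my run, dealloc logs at this brace
        return 0;
    }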

We next get into weak references. There's a __weak specifier that can appear as part of an object pointer declaration, and iOS Programming says that "an interesting property of weak references is that they know when the object they reference is destroyed... the [parent object] automatically sets its [child object] instance variable to nil." I'm going to have to read more about that; I really wish this book were more precise in its language. (As I understand it, the runtime tracks __weak references in a side table and zeroes them when the referenced object is deallocated.) There's another specifier, __unsafe_unretained, which is not zeroed when the object dies -- so a pointer to a destroyed object could still be dereferenced. I suppose I should just be happy that this works and use it, but I'm the type to want to know what is going on at the register level.

Apparently ARC (the "automatic" in the reference counting) is all due to some amazing work done in clang: following the same ownership conventions programmers used to observe by hand, the compiler inserts the retain and release calls for you at compile time, and the optimizer strips out the redundant pairs. That seems like a great development -- because I've never quite liked how garbage collectors (at least, those without hardware support), no matter how efficient they are, had to actually be implemented: at run time, looking at dumb machine pointers. It seems to me that there could be room in purely-GC'ed languages for this kind of static analysis. But I haven't thought very hard about that yet.
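
Back to those two specifiers -- a minimal sketch of the difference (variable names are mine):

    #import <Foundation/Foundation.h>

    int main(int argc, char *argv[])
    {
        @autoreleasepool {
            __weak NSObject *weakRef = nil;
            __unsafe_unretained NSObject *unsafeRef = nil;

            {
                NSObject *owner = [[NSObject alloc] init];
                weakRef = owner;   // neither assignment keeps the object alive
                unsafeRef = owner;
            } // owner's last strong reference goes away here

            NSLog(@"weak: %@", weakRef); // logs (null): the runtime zeroed it
            // unsafeRef still holds the old address; dereferencing it now
            // would be use-after-free, i.e. undefined behavior.
        }
        return 0;
    }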

Properties seem to be a shorthand, asking the compiler to provide getters and setters. There's a great example of how much code this can eliminate. There's a strange wart in that we are asked to write (nonatomic) on every property, because the default is atomic: atomic accessors are guarded so that threads getting and setting a property concurrently can't observe a half-written value, and since that guarding has a cost that most iOS code doesn't need to pay, the book opts out everywhere. Again, a dark corner to look into.
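
Roughly what the shorthand expands to, as I understand it (my approximation, not the book's exact example):

    #import <Foundation/Foundation.h>

    @interface BNRItem : NSObject

    // One line: the compiler synthesizes an _itemName instance variable,
    // a getter, and a setter.
    @property (nonatomic, strong) NSString *itemName;

    @end

    @implementation BNRItem

    // Hand-written equivalents of what gets synthesized -- this is the
    // boilerplate that @property eliminates:
    //
    //   - (NSString *)itemName { return _itemName; }
    //   - (void)setItemName:(NSString *)itemName { _itemName = itemName; }
    //
    // An atomic property (the default) would additionally guard these
    // accessors against concurrent access from multiple threads.

    @end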

For a little refresher, I took a break by reading the first few chapters of Brad Cox's book. He outlines his vision of a software IC, along the lines of a hardware IC, that would support reusability primarily through late binding, limited interfaces, and wrapping up functionality with data so that clients don't have to write the operations on an object's data type. It's an interesting vision.

Cox writes of early Objective-C:

Objective-C adds precisely one new data type, the object, to those C provides already, and precisely one new operation, the message expression.

Cox's book indicates that early versions of Objective-C used an intermediate preprocessor step, between the C macro-preprocessing step and the regular C compiler. If you are my age you might recall that, early on, C++ was treated similarly, with a program called Cfront. This is no longer the case with either Objective-C or C++, although the approach lives on in tools like Qt, with its Meta-Object Compiler (moc).

Cox describes the pragmatic design of Objective-C:

One of Objective-C's key features is that it is always possible to bypass the object-oriented machinery to access an object's private information directly. This is one of a hybrid language's greatest theoretical weaknesses and one of its greatest pragmatic strengths. Low-level code is often best developed in a conventional manner, exploiting static binding to obtain high machine efficiency and strong type checking. Conversely, user-level code is often best written as objects. It is important that these levels interface smoothly and efficiently.

There is an interesting passage about object identifiers. In a slight muddling of his earlier statement he writes that "objects are identified by a new Objective-C data type called an id, or object identifier." In practice, though, this isn't really a new data type per se: the identifier is just the address of (a pointer to) the object. The veneer over C is so thin that, he writes, it would have been perfectly feasible to implement the dynamic message sends using straightforward C function calls like

reply = _msg_(aReceiver, "aMessage", argument1, ...)
(I assume he'd use C's variable-length argument list mechanism here), but for efficiency he wanted to avoid string comparison in the dispatch mechanism. Cox was quite aware of the reaction that messages like [anObject do:arg1 with:arg2] would inspire in C programmers, writing "the convention seems strange to those accustomed to conventional function call syntax, and frankly, I find it a mixed blessing."
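
A toy version of that idea in plain C (entirely my sketch, with the variable-length argument list dropped for brevity) makes the efficiency concern concrete -- every send costs a string comparison, which is why real Objective-C interns selector names so that dispatch can key on a unique pointer instead:

    #include <stdio.h>
    #include <string.h>

    typedef struct object Object;
    typedef Object *(*ToyIMP)(Object *self);

    // One method-table entry: a selector name and its implementation.
    struct toy_method { const char *name; ToyIMP imp; };

    struct object {
        struct toy_method *methods; // the object's "class": a method table
        int value;
    };

    // The hypothetical _msg_: select the implementation by name.
    Object *_msg_(Object *receiver, const char *selector)
    {
        for (struct toy_method *m = receiver->methods; m->name; m++)
            if (strcmp(m->name, selector) == 0) // the cost Cox wanted to avoid
                return m->imp(receiver);
        fprintf(stderr, "object does not respond to %s\n", selector);
        return NULL;
    }

    Object *print_value(Object *self)
    {
        printf("value = %d\n", self->value);
        return self;
    }

    int main(void)
    {
        struct toy_method table[] = { { "print", print_value }, { NULL, NULL } };
        Object obj = { table, 42 };
        _msg_(&obj, "print"); // prints "value = 42"
        return 0;
    }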

In this formulation, Objective-C classes were still objects, and class methods, as distinct from instance methods, were called "factory methods" (but I'm peeking ahead a number of pages, so there might be some differences from their current incarnation). Cox writes "...the programmer conceives the initial object... the loader gives it life by installing it in memory... this is done once for each class in the system. These primal objects are called factory objects... every factory's name is published in a global variable."

Cox is a very thoughtful writer and he presents a minimalist view of what an object-oriented language requires in order to provide the most basic advantage of OOP. He writes:

One last time: the only substantive difference between conventional programming and object-oriented programming is the selection mechanism. This is not to minimize its importance, but to demystify what is basically a very simple notion. Its significance is that it moves a single responsibility across the boundary between the consumer of a service and the supplier of that service. In conventional programming, the consumer is responsible for choosing functions that operate properly on the data managed by that service. The selection mechanism moves that single responsibility off the consumer and onto the supplier. In this small change lies all the power of the technique.

That really caused me to, as they say, nearly drop my monocle. Could this really be where most of the advantage of OOP (such as it is: imperfectly realized, understood, and applied) comes from?
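
To make that concrete (my example, not Cox's): in the conventional style the consumer inspects a tag and selects the right function itself; with a message send, the selection happens on the supplier's side, in the receiving class.

    #import <Foundation/Foundation.h>
    #include <math.h>

    @interface Circle : NSObject
    - (double)area;
    @end
    @implementation Circle
    - (double)area { return M_PI * 2.0 * 2.0; } // radius fixed at 2 for brevity
    @end

    @interface Square : NSObject
    - (double)area;
    @end
    @implementation Square
    - (double)area { return 3.0 * 3.0; } // side fixed at 3 for brevity
    @end

    int main(void)
    {
        @autoreleasepool {
            // The consumer never chooses an implementation; sending "area"
            // lets each receiver's class supply its own.
            NSArray *shapes = @[ [[Circle alloc] init], [[Square alloc] init] ];
            for (id shape in shapes)
                NSLog(@"area = %f", [shape area]);
        }
        return 0;
    }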

The mainstream business world got C++ instead, a mash-up of C and Simula, and then Java, the Cobol of object-oriented programming languages. I was not pleased. As I consider my career I also consider which implementation languages I should focus my efforts on for the next decade. After exposure to Haskell it's hard to believe that, ultimately, functions -- without explicit state -- won't prove to be a cleaner reusable element than classes, whatever kind of binding they use. And don't get me wrong -- I like classes. I'm pretty certain that I'll still be writing C code, at least occasionally, in ten years. I'd prefer to be writing less C++. But what I'd really like is to move on -- to pick a "last programming language." Objective-C isn't that, for me -- it's too imperative, and can't truly be made safe, just safer. I'm enjoying it in this context, but would Objective-C even be a viable option outside of the Apple ecosystem? It doesn't seem to have gained much adoption, and the state of GNUstep doesn't seem terribly robust. And so my career-long quest for a Better Way continues, while at the same time I try to gain facility with all the Good Ways along the way.
