My Hacking Journal: 2009

Monday, April 27, 2009

Using the type system for discoverability and enforcing constraints

Now, anyone from a Ruby or other dynamically typed language background who read my last post might be thinking "big deal, I've been able to do that for ages". And it's true; some of these languages, with their support for first-class functions, make functional code just as easy to write as it is in Scala and the like. So why would I want to use a strongly statically typed language over a dynamically typed one?

Well, I'm glad you asked. Before I really got stuck into understanding functional programming and strongly statically typed languages, I spent some time working on a Rails app. Unlike most of the experiences I'd heard from people moving to Ruby/Rails though, I joined this project during the maintenance phase. What followed for me was an uphill battle of trying to understand the code and gain the implicit knowledge of my colleagues who had worked on the code before me. The codebase left a lot to be desired in terms of discoverability. Due to the magic of meta-programming, I couldn't even rely on grep to find all references to a particular method.

One example that remains burned into my memory was the frustration surrounding something as seemingly simple as:


tags = item.tags

What is the type of tags? Is it an array of Tag objects, an array of string tag names, or a comma-separated string of tag names?

The answer in my case was all of the above! What it was at runtime depended on the code path that led to the tags property being set on the item. By coincidence/implicit design/whatever, it just so happened that the way it was used was consistent across the code paths that invoked it, so it never showed up as a problem until I came in one day to re-use the data in a different context. Not even the tests (which covered > 98% of the code base) picked this up for me, because the tests were written upfront for the particular context that was being implemented and so it was always set to what was expected.
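For contrast, here is a minimal Scala sketch of how a declared type removes that ambiguity. The Tag and Item classes and the parseTags helper are hypothetical names made up for illustration, not code from the Rails project. The assumption is simply that the comma-separated form gets converted once, at the boundary, so every later reader of item.tags sees one type:

```scala
object TagsSketch {
  // Hypothetical domain types; the original project had no such declarations.
  final case class Tag(name: String)
  final case class Item(title: String, tags: List[Tag])

  // Hypothetical helper: convert the comma-separated form once, on the way in.
  def parseTags(csv: String): List[Tag] =
    csv.split(",").toList.map(_.trim).filter(_.nonEmpty).map(Tag(_))

  def main(args: Array[String]): Unit = {
    val item = Item("post", parseTags("scala, types"))
    // The compiler knows this is a List[Tag]; assigning a String here is a compile error.
    val tags: List[Tag] = item.tags
    println(tags.map(_.name).mkString(", "))
  }
}
```

Whichever code path builds the Item, it has to produce a List[Tag] or the code does not compile.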

It got to the point where I was paranoid about releasing anything, because I was never sure of all the knock-on effects my changes would have. I was bitten too many times by not executing all of the code paths that led to the point of my changes, because I couldn't find them all up front. I was paranoid about deleting code that I didn't think was being used any more, and even worried about factoring out common-looking code, because on at least one occasion each of those changes produced a bug that turned up either late in testing or actually in production.

My flirtation with mid-size projects in dynamically typed languages wasn't all doom and gloom though. The second Rails app I worked on went a lot smoother. I put this down more to the fact that it was a greenfield project than to my being more familiar with the language/framework. In developing the project from the start, I had all of the assumptions and implicit knowledge about the types in my head. As a result, using the Rails framework was a breeze, and we managed to get the project completed on time and on budget.

Yes, it could be said that my first experience was just a one-off; that the previous developers didn't always stick to the conventions (whatever they're supposed to be), or were doing things in ways I wasn't aware of. But it's exactly some of these implicit assumptions and conventions that can be enforced by the compiler in a strongly statically typed language. There's no need to write all your assumptions as comments in code; many can be encoded in the language you're writing in.

Consider this code:


# user.rb
 class User
   def authorised?
     ...
   end
   ...
 end

# admin.rb
 class Admin
   def Admin.reset_system(user)
     check_authorisation(user)
     ...
     send_admin_email(user, "System was reset")
   end

   def Admin.send_admin_email(user, msg)
     check_authorisation(user)
     ...
   end

   def Admin.check_authorisation(user)
     raise "Unauthorised" unless user.authorised?
   end
 end

Without looking at the implementation, the tests or the documentation (assuming it exists), nothing tells clients calling the reset_system and send_admin_email methods that they must pass in an authorised user (never mind that you can't be sure whether the methods accept a User instance, a user identifier, or something else). Also note the duplicated authorisation check. You could probably use some meta-programming to run the check before the methods are called, so that you no longer have to remember to add it to each new restricted method. However, the check is then removed from the context of the method, and it would be easy to circumvent if you needed a quick hack to get something working. Yes, the tests you (are supposed to) write should catch these problems, but they give you no guarantee about future changes (and from my own personal experience, the tests are a PITA to maintain).

Now, imagine not only being able to get around the problems above, but to do so in a way where it is practically not possible to get the code past the compiler without it conforming to your assumptions (unless you subvert the type system, in which case you're defeating its purpose). This is the situation I've become used to in a strongly statically typed language.


// User.scala
  class User(...) {
    def authorised_? = ...
  }

// AuthorisedUser.scala
  sealed abstract class AuthorisedUser(val user:User)

  private class AuthorisedUserImpl(user:User) extends AuthorisedUser(user)

  object AuthorisedUser {
    def authorise(user:User):Option[AuthorisedUser] =
        if (user.authorised_?) Some(new AuthorisedUserImpl(user)) else None
  }

// Admin.scala
  object Admin {
    def resetSystem(user:AuthorisedUser) = {
      ...
      sendAdminEmail(user, "System was reset")
    }
    def sendAdminEmail(user:AuthorisedUser, msg:String) = ...
  }

A quick note about the Scala syntax again:

Scala makes a distinction between objects and classes, where the former declares a singleton and only defines static functions (e.g. authorise), and the latter can be instantiated and defines instance methods (e.g. authorised_?). Java does not make this distinction; a class can define both static functions and instance methods.

The sealed keyword denotes a class that cannot be extended from outside of the file it is defined in. In this case, as a client, you cannot circumvent the authorisation check by creating another subclass of AuthorisedUser to pass in to the restricted functions.

Some and None are subclasses of the Option type in Scala. In this case, it allows an optional AuthorisedUser to be returned from the authorise function depending on whether or not the underlying user is authorised.

From the above code, it is just not practically possible to write client code that calls the resetSystem and sendAdminEmail functions without first calling the authorise function. If you try, your code won't compile. None of this finding out at test-time/run-time - it can't run, because it won't compile.

Not only that, but the compiler actually removes the need to test authorisation checks for the Admin functions. There are no tests to write that would make sense; either the user is authorised, or the code doesn't compile!
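To make that concrete, here is a self-contained sketch of what client code ends up looking like. It follows the listing above, but the parts that were elided there are filled in with assumptions: the User constructor and its admin flag, the String results returned by the Admin functions, and the tryReset helper are all hypothetical, and the implementation class is nested in the companion object so that everything fits in one file.

```scala
object AuthorisationSketch {
  // Hypothetical stand-in for the elided User class.
  class User(val name: String, admin: Boolean) {
    def authorised_? : Boolean = admin
  }

  sealed abstract class AuthorisedUser(val user: User)

  object AuthorisedUser {
    // The only subclass is private, so authorise is the only way to obtain an instance.
    private final class AuthorisedUserImpl(user: User) extends AuthorisedUser(user)

    def authorise(user: User): Option[AuthorisedUser] =
      if (user.authorised_?) Some(new AuthorisedUserImpl(user)) else None
  }

  object Admin {
    // Assumed bodies: return the message text instead of really sending anything.
    def resetSystem(user: AuthorisedUser): String =
      sendAdminEmail(user, "System was reset")

    def sendAdminEmail(user: AuthorisedUser, msg: String): String =
      s"To ${user.user.name}: $msg"
  }

  // Hypothetical client: the Option forces it to handle the unauthorised case.
  def tryReset(user: User): Option[String] =
    AuthorisedUser.authorise(user).map(Admin.resetSystem)

  def main(args: Array[String]): Unit = {
    println(tryReset(new User("alice", true)))  // prints Some(To alice: System was reset)
    println(tryReset(new User("bob", false)))   // prints None
    // Admin.resetSystem(new User("bob", false))  // rejected by the compiler: a User is not an AuthorisedUser
  }
}
```

The client never writes an authorisation check of its own. It either has an AuthorisedUser in hand, or it has a None and nothing to pass to the Admin functions.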

As you can see, in this case the statically typed language allows more assumptions to be made explicit in the code. If I decide to do a one-off hack, the compiler can be used to keep me honest, and I've found that it generally makes me think more about the problem I'm trying to solve. Yes, it can be frustrating trying to get your code to pass the compiler (especially when you think it's right), but more often than not, when it does compile, I am a lot more confident that it is correct in terms of being consistent with my assumptions. The type system can also be seen as the ultimate defensive coding practice. With a type system, discussions like these tend to become moot.

The most common objection I've come across to using a statically typed language over a dynamically typed language is that the former is more verbose and requires more code to achieve the same functionality as the latter. This is only true of languages with a poor type system implementation, and has already been addressed many times before. I wasn't going to mention it, but for those who didn't notice, the strongly statically typed version above actually has fewer lines of code than the dynamically typed version. It also requires fewer tests (because there's no way to call the admin functions with an unauthorised user) and has all the other benefits of statically compiled code. Win-win-win.

Whilst there will be more examples of some of the advantages you can get from the type system, I don't plan to write too much more on the specific topic of Dynamic vs. Static types as it has already been covered many times before. I am very open to criticisms of the above comparison, as long as they focus on the topic at hand, not the actual implementation (it is one of the smaller examples I could come up with to try to get my point across, and as such, it is somewhat contrived).

Also note that you can get these benefits in a language with a weaker type system, such as Java. However, the ceremony involved in declaring the types usually makes it infeasible to do for the trivial cases and painful for the complex cases. Below you can see the previous Scala code that has been ported to Java, making use of the FunctionalJava library:


// User.java
  public class User {
    public boolean isAuthorised() {
      ...
    }
  }

// AuthorisedUser.java
  import fj.data.Option;  // Option type from the FunctionalJava library

  public abstract class AuthorisedUser {
    final User user;
 
    private AuthorisedUser(final User user) {
      this.user = user;
    }
    
    private static class AuthorisedUserImpl extends AuthorisedUser {
      public AuthorisedUserImpl(final User user) {
        super(user);
      }
    }
    
    public static Option<AuthorisedUser> authorise(final User user) {
      return user.isAuthorised()
          ? Option.<AuthorisedUser>some(new AuthorisedUserImpl(user))
          : Option.<AuthorisedUser>none();
    }
  }

// Admin.java
  public class Admin {
 
    public static void resetSystem(final AuthorisedUser user) {
      ...
      sendAdminEmail(user, "System was reset");
    }
 
    public static void sendAdminEmail(final AuthorisedUser user, final String msg) {
      ...
    }
  }

Friday, March 13, 2009

Filtering a list

OK, let's start off with an easy one. How many times have you written something like the following:


  public Iterable<User> getManagers(Iterable<User> users) {
    List<User> managers = new ArrayList<User>();
    for (User user : users) {
      if (user.isManager()) {
        managers.add(user);
      }
    }
    return managers;
  }

I know I've written a lot of code like that in the past. Each time, there's something slightly different. Either the types are different or the test in the if statement is different. For example, imagine a similar requirement to get the list of active users. There doesn't appear to be much more you can do aside from:


  public Iterable<User> getActiveUsers(Iterable<User> users) {
    List<User> activeUsers = new ArrayList<User>();
    for (User user : users) {
      if (user.isActive()) {
        activeUsers.add(user);
      }
    }
    return activeUsers;
  }

Well, there is some stuff we can factor out if we really wanted to:


  private static interface Condition {
    boolean check(User user);
  }

  private Iterable<User> filterUsers(Iterable<User> users, Condition cond) {
    List<User> filteredUsers = new ArrayList<User>();
    for (User user : users) {
      if (cond.check(user)) {
        filteredUsers.add(user);
      }
    }
    return filteredUsers;
  }

  public Iterable<User> getManagers(Iterable<User> users) {
    return filterUsers(users, new Condition() {
      public boolean check(User user) {
        return user.isManager();
      }
    });
  }

  public Iterable<User> getActiveUsers(Iterable<User> users) {
    return filterUsers(users, new Condition() {
      public boolean check(User user) {
        return user.isActive();
      }
    });
  }

However, given that it's more code than the two original methods, I probably wouldn't actually do that unless I had a lot of similar looking methods. In any case, there's still duplication in the creation of the conditions. Without resorting to reflection (which would instantly open me up to a whole new class of bugs), I can't think of much more that can be done with this code in Java.

In terms of the bounds of readability I mentioned in my dilemma, I would have left it at the first two methods and moved on.

Now take a look at the same two methods written in Scala, a language that supports first-class functions:


  def getManagers(users:List[User]):List[User] = {
    return users.filter(isUserManager);
  }

  def getActiveUsers(users:List[User]):List[User] = {
    return users.filter((user:User) => user.isActive());
  }

  def isUserManager(user:User):Boolean = {
    return user.isManager();
  }

First, some quick notes about the syntax:

The def keyword is used to declare a function or a method

In Scala, type annotations come at the end of the parameter and function declarations (after the ':' character)

Generics in Scala are expressed using square brackets as opposed to Java's angle brackets

On to the semantics of the code. The filter function on Scala's List class is a higher-order function, which in this case means that it is a function that takes another function as a parameter. In the first case, you can see how the parameter to the filter function is another named function that is defined later on. In the second case, the parameter is defined in-line as an anonymous (unnamed) function.

In this example, the type of the parameter that List.filter expects is:

A function, which takes a single User parameter, and returns a Boolean value.

In the example, the List.filter function returns a (possibly empty) List of User objects.
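Written out as a Scala type, that parameter is User => Boolean. The sketch below spells the type out explicitly and adds a hand-rolled equivalent of filter to show what a higher-order function signature looks like. The User case class with its two flags and the keepIf function are hypothetical, made up only so the example is self-contained:

```scala
object FilterSketch {
  // Hypothetical stand-in for the User class used in the examples.
  final case class User(name: String, manager: Boolean, active: Boolean) {
    def isManager(): Boolean = manager
    def isActive(): Boolean = active
  }

  // The kind of value List.filter expects here: a function from User to Boolean.
  val isUserManager: User => Boolean = (user: User) => user.isManager()

  // A simplified, hand-written version of filter, taking the condition as a parameter.
  def keepIf(users: List[User], cond: User => Boolean): List[User] =
    users match {
      case Nil => Nil
      case head :: tail =>
        if (cond(head)) head :: keepIf(tail, cond) else keepIf(tail, cond)
    }

  def main(args: Array[String]): Unit = {
    val users = List(
      User("ann", true, true),
      User("bob", false, true),
      User("cat", true, false))
    println(users.filter(isUserManager).map(_.name))   // prints List(ann, cat)
    println(keepIf(users, _.isActive()).map(_.name))   // prints List(ann, bob)
  }
}
```

Because the condition is just a value of type User => Boolean, a named function, an anonymous function and a stored val can all be passed in the same way.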

I intentionally wrote out the Scala code in a verbose manner to try to make it easier to read if you're coming from Java. However, in most cases there are a lot of aspects of the Scala syntax that are optional, such as semicolons, the return keyword, braces for single statements, and return types when they can be inferred. As someone who is more comfortable with the language, I find it easier to write it and at least as easy to read it as:


  def getManagers(users:List[User]) = users.filter(isUserManager)
  def getActiveUsers(users:List[User]) = users.filter((user:User) => user.isActive())
  def isUserManager(user:User) = user.isManager()

Actually, I would probably inline the isUserManager function because it's almost like an alias now. And because the user parameter in the anonymous function is only used once and its type can be inferred, both functions can be shortened further still:


  def getManagers(users:List[User]) = users.filter(_.isManager())
  def getActiveUsers(users:List[User]) = users.filter(_.isActive())

Now, imagine getting used to writing code like that and then coming back to this:


  public Iterable<User> getManagers(Iterable<User> users) {
    List<User> managers = new ArrayList<User>();
    for (User user : users) {
      if (user.isManager()) {
        managers.add(user);
      }
    }
    return managers;
  }

  public Iterable<User> getActiveUsers(Iterable<User> users) {
    List<User> activeUsers = new ArrayList<User>();
    for (User user : users) {
      if (user.isActive()) {
        activeUsers.add(user);
      }
    }
    return activeUsers;
  }

And, if you'll allow me to mix my metaphors, this is barely scratching the tip of the iceberg.

Wednesday, March 4, 2009

Moral dilemma

I have found myself in the unfortunate position of being asked to dumb-down the work that I do so it is understandable to an "average" programmer (whatever that means). I understand where the request is coming from, i.e., there is a fear that if it's not dumbed down, it will be difficult to find someone to understand and maintain the work in the future. However, for me to comply with the request would mean knowingly writing code that is less correct, contains more duplication (both in terms of the effort and the resulting code), and has more known potential traps for future maintenance - something that the "average" programmer would tell you is a "bad thing".

How did I find myself in this position?

In the past, I would try to adhere to the principles of DRY, but on a number of occasions, I would get to the point where in order to avoid repetition, I had to bend the language that I used in ways it wasn't designed for. The resulting code would end up being verbose and very difficult to understand after a while, and I still wasn't achieving the re-use that I thought should be possible. In the end, I learned roughly where some of those boundaries of readability lay, and resigned myself to the belief that the duplication was a fact of life.

Then, about a year ago, I was lucky enough to spend a bit of time learning about Functional Programming and Strong, Static Type Systems with someone who is a self-confessed fundamentalist functional programmer. Over many months, I was able to change the way I thought about programming. My progress was very slow to begin with, and I found that a lot of the discussions and reading I was doing on the subject were out of reach of my understanding. However, with continual encouragement, and once I finally started doing the exercises and writing small programs in this new way, things started to sink in and make sense. I didn't change my way of thinking in a single Eureka! moment, but more in a series of small "A ha, now I think I get it" moments.

Now I can write far more concise code using very powerful abstractions, with the added benefit that a large class of potential bugs doesn't even get past the compiler.

What am I going to do about it?

Back to my dilemma. Obviously, if I want to do something about it, I need to present my argument in a convincing way. I don't have a way of showing the whole picture in a single, simple, succinct way. In fact, I'm very doubtful that such a way exists. There was a lot of unlearning I had to do before I could really begin to see the benefits of functional programming.

I had planned to write up my experiences along the way while I was learning, but I didn't think I could do any better than what other people have already put out there; people who have come to the same conclusions and made the same observations that I have. Being forced back into the position of writing imperative code, though, has given me a good reason to get my thoughts down in writing.

So, with that in mind, I'm planning to write up some examples of situations I've come across, where I have relented and written repetitive, less correct, or more error-prone code in order to fulfill the contradictory requirements that led to my dilemma. Using these examples, I hope to build a case for using languages that support functional programming and strong, static type systems even if these languages aren't popular or in the mainstream.

From the personal experience of someone who heavily bought into TDD, OOAD, Design Patterns, etc, I can predict that some of the examples will look wrong, or completely opposite to what is regarded as "best practice" in the mainstream. I will try to identify those areas and drill down into them when I get a chance. In the meantime, you should at least be prepared to question your current knowledge.

This is not going to be another burrito tutorial, and I can almost guarantee that anyone who reads only what I plan to write will not come away with an understanding of Functional Programming. The aim is to show some of the possibilities and maybe encourage people to look further into the context that makes it possible.
