A journal to record my notes and ideas related to software development and computing

Friday, March 13, 2009

Filtering a list

Ok, let's start off with an easy one. How many times have you written something like the following:

public Iterable<User> getManagers(Iterable<User> users) {
    List<User> managers = new ArrayList<User>();
    for (User user : users) {
        if (user.isManager()) {
            managers.add(user);
        }
    }
    return managers;
}

I know I've written a lot of code like that in the past. Each time, there's something slightly different. Either the types are different or the test in the if statement is different. For example, imagine a similar requirement to get the list of active users. There doesn't appear to be much more you can do aside from:

public Iterable<User> getActiveUsers(Iterable<User> users) {
    List<User> activeUsers = new ArrayList<User>();
    for (User user : users) {
        if (user.isActive()) {
            activeUsers.add(user);
        }
    }
    return activeUsers;
}

Well, there is some stuff we could factor out if we really wanted to:

private static interface Condition {
    boolean check(User user);
}

private Iterable<User> filterUsers(Iterable<User> users, Condition cond) {
    List<User> filteredUsers = new ArrayList<User>();
    for (User user : users) {
        if (cond.check(user)) {
            filteredUsers.add(user);
        }
    }
    return filteredUsers;
}

public Iterable<User> getManagers(Iterable<User> users) {
    return filterUsers(users, new Condition() {
        public boolean check(User user) {
            return user.isManager();
        }
    });
}

public Iterable<User> getActiveUsers(Iterable<User> users) {
    return filterUsers(users, new Condition() {
        public boolean check(User user) {
            return user.isActive();
        }
    });
}

However, given that it's more code than the two original methods, I probably wouldn't actually do that unless I had a lot of similar-looking methods. In any case, there's still duplication in the creation of the conditions. Without resorting to reflection (which would instantly open me up to a whole new class of bugs), I can't think of much more that can be done with this code in Java.
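For the case where the types differ rather than the test, the same factoring can be made generic. The following is only a rough sketch: the names Filters, Condition<T> and filter are made up for illustration and don't come from any library. It removes the dependence on User, but every condition still needs its own anonymous class, so the duplication described above remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Filters {

    // Hypothetical generic condition; T is whatever element type is being filtered.
    public interface Condition<T> {
        boolean check(T item);
    }

    // Returns a new list with the items for which cond.check is true, in their original order.
    public static <T> List<T> filter(Iterable<T> items, Condition<T> cond) {
        List<T> filtered = new ArrayList<T>();
        for (T item : items) {
            if (cond.check(item)) {
                filtered.add(item);
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        // The condition is still an anonymous class, just as in the User-specific version.
        List<Integer> evens = filter(numbers, new Condition<Integer>() {
            public boolean check(Integer n) {
                return n % 2 == 0;
            }
        });

        System.out.println(evens); // prints [2, 4, 6]
    }
}
```

The call site is no shorter than before; all the generic version buys is that filter itself never has to be written again for a new element type.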

In terms of the bounds of readability I mentioned in my dilemma, I would have left it at the first two methods and moved on.


Now take a look at the same two methods written in Scala, a language that supports first-class functions:

def getManagers(users:List[User]):List[User] = {
    return users.filter(isUserManager);
}

def getActiveUsers(users:List[User]):List[User] = {
    return users.filter((user:User) => user.isActive());
}

def isUserManager(user:User):Boolean = {
    return user.isManager();
}

First, some quick notes about the syntax:

  • The def keyword is used to declare a function or a method

  • In Scala, type annotations come at the end of the parameter and function declarations (after the ':' character)

  • Generics in Scala are expressed using square brackets as opposed to Java's angle brackets


On to the semantics of the code. The filter function on Scala's List class is a higher-order function, which in this case means that it takes another function as a parameter. In the first case, the parameter to the filter function is a named function that is defined later on. In the second case, the parameter is defined in-line as an anonymous (unnamed) function.

In this example, the type of the parameter that List.filter expects is:

A function, which takes a single User parameter, and returns a Boolean value.

In the example, the List.filter function returns a (possibly empty) List of User objects.
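If you're coming from Java, the function type that List.filter expects can be approximated with a one-method interface. What follows is only a sketch: UserPredicate, IS_USER_MANAGER, the filter helper and the minimal User class are stand-ins invented for illustration, not Scala's API or any library's. It mirrors the two call styles described above: a predicate that is defined once and passed by name, and one that is written in-line at the call site.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HigherOrderSketch {

    // Minimal stand-in for the User type (assumed shape, for illustration only).
    static final class User {
        private final String name;
        private final boolean manager;
        private final boolean active;

        User(String name, boolean manager, boolean active) {
            this.name = name;
            this.manager = manager;
            this.active = active;
        }

        String getName() { return name; }
        boolean isManager() { return manager; }
        boolean isActive() { return active; }
    }

    // Hypothetical Java counterpart of the Scala function type User => Boolean.
    interface UserPredicate {
        boolean apply(User user);
    }

    // Counterpart of the named function isUserManager: defined once, passed by name.
    static final UserPredicate IS_USER_MANAGER = new UserPredicate() {
        public boolean apply(User user) {
            return user.isManager();
        }
    };

    // Higher-order in spirit: the test to apply arrives as a parameter.
    static List<User> filter(List<User> users, UserPredicate predicate) {
        List<User> result = new ArrayList<User>();
        for (User user : users) {
            if (predicate.apply(user)) {
                result.add(user);
            }
        }
        return result;
    }

    // Helper for printing: collects the names of the given users.
    static List<String> names(List<User> users) {
        List<String> result = new ArrayList<String>();
        for (User user : users) {
            result.add(user.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
            new User("ann", true, true),
            new User("bob", false, true),
            new User("cat", true, false));

        // Named predicate, like users.filter(isUserManager).
        List<User> managers = filter(users, IS_USER_MANAGER);

        // In-line predicate, like users.filter((user:User) => user.isActive()).
        List<User> active = filter(users, new UserPredicate() {
            public boolean apply(User user) {
                return user.isActive();
            }
        });

        System.out.println(names(managers)); // prints [ann, cat]
        System.out.println(names(active));   // prints [ann, bob]
    }
}
```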


I intentionally wrote out the Scala code in a verbose manner to try to make it easier to read if you're coming from Java. However, in most cases a lot of Scala's syntax is optional, such as semicolons, the return keyword, braces for single statements, and return types when they can be inferred. As someone who is more comfortable with the language, I find it easier to write it and at least as easy to read it as:

def getManagers(users:List[User]) = users.filter(isUserManager)
def getActiveUsers(users:List[User]) = users.filter((user:User) => user.isActive())
def isUserManager(user:User) = user.isManager()

Actually, I would probably inline the isUserManager function because it's almost like an alias now. And because the user parameter in each anonymous function is only used once and its type can be inferred, the functions can be shortened further still:

def getManagers(users:List[User]) = users.filter(_.isManager())
def getActiveUsers(users:List[User]) = users.filter(_.isActive())

Now, imagine getting used to writing code like that and then coming back to this:

public Iterable<User> getManagers(Iterable<User> users) {
    List<User> managers = new ArrayList<User>();
    for (User user : users) {
        if (user.isManager()) {
            managers.add(user);
        }
    }
    return managers;
}

public Iterable<User> getActiveUsers(Iterable<User> users) {
    List<User> activeUsers = new ArrayList<User>();
    for (User user : users) {
        if (user.isActive()) {
            activeUsers.add(user);
        }
    }
    return activeUsers;
}

And, if you'll allow me to mix my metaphors, this is barely scratching the tip of the iceberg.

Wednesday, March 4, 2009

Moral dilemma

I have found myself in the unfortunate position of being asked to dumb down the work that I do so it is understandable to an "average" programmer (whatever that means). I understand where the request is coming from, i.e., there is a fear that if it's not dumbed down, it will be difficult to find someone who can understand and maintain the work in the future. However, for me to comply with the request would mean knowingly writing code that is less correct, contains more duplication (both in terms of the effort and the resulting code), and has more known potential traps for future maintenance - something that the "average" programmer would tell you is a "bad thing".

How did I find myself in this position?


In the past, I would try to adhere to the principles of DRY, but on a number of occasions, I would get to the point where in order to avoid repetition, I had to bend the language that I used in ways it wasn't designed for. The resulting code would end up being verbose and very difficult to understand after time, and I still wasn't achieving the re-use that I thought should be possible. In the end, I learned roughly where some of those boundaries of readability lay, and resigned myself to the belief that the duplication was a fact of life.1

Then, about a year ago, I was lucky enough to spend a bit of time learning about Functional Programming and Strong, Static Type Systems with someone who is a self-confessed fundamentalist functional programmer. Over many months, I was able to change the way I thought about programming. My progress was very slow to begin with, and I found that a lot of the discussions and reading I was doing on the subject were beyond my understanding. However, with continual encouragement, and once I finally started doing the exercises and writing small programs in this new way, things started to sink in and make sense. I didn't change my way of thinking in a single Eureka! moment, but more in a series of small "A ha, now I think I get it" moments.

Now I can write far more concise code using very powerful abstractions, with the added benefit that a large class of potential bugs doesn't even get past the compiler.

What am I going to do about it?


Back to my dilemma. Obviously, if I want to do something about it, I need to present my argument in a convincing way. I don't have a way of showing the whole picture in a single, simple, succinct way. In fact, I'm very doubtful that such a way exists. There was a lot of unlearning I had to do before I could really begin to see the benefits of functional programming.

I had planned to write up my experiences along the way while I was learning, but I didn't think I could do any better than what other people have already put out there; people who have come to the same conclusions and made the same observations that I have. Being forced back into writing imperative code, though, has given me a good reason to get my thoughts down in writing.

So, with that in mind, I'm planning to write up some examples of situations I've come across, where I have relented and written repetitive, less correct, or more error-prone code in order to fulfill the contradictory requirements that led to my dilemma. Using these examples, I hope to build a case for using languages that support functional programming and strong, static type systems even if these languages aren't popular or in the mainstream.

From my personal experience as someone who heavily bought into TDD, OOAD, Design Patterns, etc., I can predict that some of the examples will look wrong, or completely opposite to what is regarded as "best practice" in the mainstream. I will try to identify those areas and drill down into them when I get a chance. In the meantime, you should at least be prepared to question your current knowledge.

This is not going to be another burrito tutorial, and I can almost guarantee that anyone who reads only what I plan to write will not come away with an understanding of Functional Programming. The aim is to show some of the possibilities and maybe encourage people to look further into the context that makes it possible.

Meta


I haven't really thought of a logical sequence, so things may seem out of order. However, I'll try to come back and provide cross-links to follow-ups and related entries.

Also, although there aren't many existing entries in this blog, I'll use the tag whyfp (as in, Why Functional Programming Matters) to mark each entry that is related to my dilemma.


1 Interestingly, I recently read a blog post that discusses duplication/redundancy, by someone who seems to have come to conclusions similar to mine. If I understand correctly, they refer to this as incidental redundancy, and conclude that you can't, and shouldn't, do anything about it. However, I now know that some redundancy I previously thought was incidental is in fact related, and if you discover or know the abstractions and use a language that allows them to be easily expressed, the redundancy is avoidable.

About Me

Computer programmer with an interest in music, and a passion for brewing beer, which I'm working at developing into a career!