A journal to record my notes and ideas related to software development and computing

Friday, March 13, 2009

Filtering a list

Ok, let's start off with an easy one. How many times have you written something like the following:

public Iterable<User> getManagers(Iterable<User> users) {
List<User> managers = new ArrayList<User>();
for (User user : users) {
if (user.isManager()) {
managers.add(user);
}
}
return managers;
}

I know I've written a lot of code like that in the past. Each time, there's something slightly different. Either the types are different or the test in the if statement is different. For example, imagine a similar requirement to get the list of active users. There doesn't appear to be much more you can do aside from:

public Iterable<User> getActiveUsers(Iterable<User> users) {
List<User> activeUsers = new ArrayList<User>();
for (User user : users) {
if (user.isActive()) {
activeUsers.add(user);
}
}
return activeUsers;
}

Well, there is some stuff we can factor out if we really wanted to:

private static interface Condition {
boolean check(User user);
}

private Iterable<User> filterUsers(Iterable<User> users, Condition cond) {
List<User> filteredUsers = new ArrayList<User>();
for (User user : users) {
if (cond.check(user)) {
filteredUsers.add(user);
}
}
return filteredUsers;
}

public Iterable<User> getManagers(Iterable<User> users) {
return filterUsers(users, new Condition() {
public boolean check(User user) {
return user.isManager();
}
});
}

public Iterable<User> getActiveUsers(Iterable<User> users) {
return filterUsers(users, new Condition() {
public boolean check(User user) {
return user.isActive();
}
});
}

However, given that it's more code than the two original methods, I probably wouldn't actually do that unless I had a lot of similar looking methods. In any case, there's still duplication in the creation of the conditions. Without resorting to reflection (which would instantly open myself up to a whole new class of bugs), I can't think of much more that can be done with this code in Java.

In terms of the bounds of readability I mentioned in my dilemma, I would have left it at the first two methods and moved on.


Now take a look at the same two methods written in Scala; a language that supports first class functions:

def getManagers(users:List[User]):List[User] = {
return users.filter(isUserManager);
}

def getActiveUsers(users:List[User]):List[User] = {
return users.filter((user:User) => user.isActive());
}

def isUserManager(user:User):Boolean = {
return user.isManager();
}

First, some quick notes about the syntax:

  • The def keyword is used to declare a function or a method

  • In Scala, type annotations come at the end of the parameter and function declarations (after the ':' character)

  • Generics in Scala are expressed using square brackets as opposed to Java's angle brackets


On to the semantics of the code. The filter function on Scala's List class is a higher-order function, which in this case it means that it is a function that takes another function as a parameter. In the first case, you can see how the parameter to the filter function is another named function that is defined later on. In the second case, the parameter is defined in-line as an anonymous (unnamed) function.

In this example, the type of the parameter that List.filter expects is:

A function, which takes a single User parameter, and returns a Boolean value.

In the example, the List.filter function returns a (possibly empty) List of User objects.


I intentionally wrote out the Scala code in a verbose manner to try to make it easier read if you're coming from Java. However, in most cases there are a lot of aspects of the Scala syntax that are optional, such as semicolons, the return keyword, braces for single statements, and return types when they can be inferred. As someone who is more comfortable with the language, I find it easier to write it and at least as easy to read it as:

def getManagers(users:List[User]) = users.filter(isUserManager)
def getActiveUsers(users:List[User]) = users.filter((user:User) => user.isActive())
def isUserManager(user:User) = user.isManager()

Actually, I would probably inline the isUserManager function because it's almost like an alias now, and because the user parameter in the anonymous function is only used once and the type of it can be inferred, the last function can be shortened further still:

def getManagers(users:List[User]) = users.filter(_.isManager())
def getActiveUsers(users:List[User]) = users.filter(_.isActive())

Now, imagine getting used to writing code like that and then coming back to this:

public Iterable<User> getManagers(Iterable<User> users) {
List<User> managers = new ArrayList<User>();
for (User user : users) {
if (user.isManager()) {
managers.add(user);
}
}
return managers;
}

public Iterable<User> getActiveUsers(Iterable<User> users) {
List<User> activeUsers = new ArrayList<User>();
for (User user : users) {
if (user.isActive()) {
activeUsers.add(user);
}
}
return activeUsers;
}

And, if you'll allow me to mix my metaphors, this is barely scratching the tip of the iceberg.

No comments:

About Me

Computer programmer with and interest in music, and a passion for brewing beer, which I'm working at developing into a career!