My Hacking Journal: scala

Monday, April 27, 2009

Using the type system for discoverability and enforcing constraints

Now, anyone from a Ruby or other dynamically typed language background who read my last post might be thinking "big deal, I've been able to do that for ages". And it's true; some of these languages support first-class functions, which makes writing functional code just as easy as it is in Scala and the like. So why would I want to use a strongly statically typed language over a dynamically typed one?

Well, I'm glad you asked. Before I really got stuck into understanding functional programming and strongly statically typed languages, I spent some time working on a Rails app. Unlike most of the experiences I'd heard from people moving to Ruby/Rails though, I joined this project during the maintenance phase. What followed for me was an up-hill battle of trying to understand the code and gain the implicit knowledge of my other colleagues who worked on the code before me. The codebase left a lot to be desired in terms of discoverability. Due to the magic of meta-programming, I couldn't even rely on grep to find all references to a particular method.

One example that remains burned into my memory was the frustration surrounding something as seemingly simple as:


tags = item.tags

What is the type of tags? Is it an array of Tag objects, an array of string tag names, or a comma separated string of tag names?

The answer in my case was all of the above! What it was at runtime depended on the code path that led to the tags property being set on the item. By coincidence/implicit design/whatever, it just so happened that the way it was used was consistent across the code paths that invoked it, so it never showed up as a problem until I came in one day to re-use the data in a different context. Not even the tests (which covered > 98% of the code base) picked this up for me, because the tests were written up front for the particular context that was being implemented, and so it was always set to what was expected.

It got to the point where I was paranoid about releasing anything, because I wasn't sure of all of the knock-on effects my changes would have. I was bitten too many times by not executing all of the code paths that led to the point of my changes, because I couldn't find them all up front. I was paranoid about deleting code that I didn't think was being used any more, and even worried about factoring out common-looking code, because on at least one occasion of each kind a bug turned up either late in testing or actually in production.

My flirtation with mid-size projects in dynamically typed languages wasn't all doom and gloom though. The second Rails app I worked on went a lot smoother. I put this down more to the fact that it was a greenfield project than to my being more familiar with the language/framework. In developing the project from the start, I had all of the assumptions and implicit knowledge about the types in my head. As a result, using the Rails framework was a breeze, and we managed to get the project completed on time and on budget.

Yes, it could be said that my first experience was just a one-off; that the previous developers didn't always stick to the conventions (whatever they're supposed to be), or were doing things in ways I wasn't aware of. But it's exactly some of these implicit assumptions and conventions that can be enforced by the compiler in a strongly statically typed language. There's no need to write all your assumptions as comments in code; many can be encoded in the language you're writing in.
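To make that concrete, here is a minimal Scala sketch of how the tags ambiguity from earlier could be ruled out by a type signature. The Tag and Item classes and the parseTags helper are hypothetical names invented for this illustration; they are not taken from the Rails codebase described above.

```scala
object TagsDemo {
  // Hypothetical model: the field's type states that tags are always Tag objects.
  case class Tag(name: String)
  case class Item(title: String, tags: List[Tag])

  // Hypothetical helper: the one place where a comma separated string
  // is converted into a list of Tag objects.
  def parseTags(csv: String): List[Tag] =
    csv.split(",").toList.map(_.trim).filter(_.nonEmpty).map(Tag(_))

  val item = Item("post", parseTags("scala, types"))

  // Callers never have to guess what item.tags holds.
  val names: List[String] = item.tags.map(_.name)
}
```

With this shape, passing a raw String where a List[Tag] is expected is rejected by the compiler, so every code path has to agree on the representation.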

Consider this code:


# user.rb
 class User
   def authorised?
     ...
   end
   ...
 end

# admin.rb
 class Admin
   def Admin.reset_system(user)
     check_authorisation(user)
     ...
     send_admin_email(user, "System was reset")
   end

   def Admin.send_admin_email(user, msg)
     check_authorisation(user)
     ...
   end

   def Admin.check_authorisation(user)
     raise "Unauthorised" unless user.authorised?
   end
 end

Without looking at the implementation, the tests or the documentation (assuming it exists), nothing indicates that clients calling the reset_system and send_admin_email methods must pass in an authorised user (never mind the fact that you don't know for sure whether the methods accept a User instance, a user identifier, or something else). Also note the duplicated authorisation check. You could probably use some meta-programming to run the check before the methods are called, so that you don't have to remember to add it to each restricted method in the future. However, the check is then removed from the context of the method, and it would be easy to circumvent if you needed to do a quick hack to get something working. Yes, the tests you (are supposed to) write should catch these problems, but they don't give you any guarantee over future changes (and from my own personal experience, the tests are a PITA to maintain).

Now, imagine not only being able to get around the problems above, but to do so in a way where it is practically not possible to get the code past the compiler without it conforming to your assumptions (unless you subvert the type system, in which case you're defeating its purpose). This is the situation I've become used to in a strongly statically typed language.


// User.scala
  class User(...) {
    def authorised_? = ...
  }

// AuthorisedUser.scala
  sealed abstract class AuthorisedUser(val user:User)

  private class AuthorisedUserImpl(user:User) extends AuthorisedUser(user)

  object AuthorisedUser {
    def authorise(user:User):Option[AuthorisedUser] =
        if (user.authorised_?) Some(new AuthorisedUserImpl(user)) else None
  }

// Admin.scala
  object Admin {
    def resetSystem(user:AuthorisedUser) = {
      ...
      sendAdminEmail(user, "System was reset")
    }
    def sendAdminEmail(user:AuthorisedUser, msg:String) = ...
  }

A quick note about the Scala syntax again:

Scala makes the distinction between objects and classes, where the former declares a singleton and only defines static functions (eg. authorise), and the latter can be instantiated and defines instance methods (eg. authorised_?). Java does not make this distinction; a class can define both static functions and instance methods.

The sealed keyword denotes a class that cannot be extended from outside of the file it is defined in. In this case, as a client, you cannot circumvent the authorisation check by creating another subclass of AuthorisedUser to pass in to the restricted functions.

Some and None are subclasses of the Option type in Scala. In this case, it allows an optional AuthorisedUser to be returned from the authorise function depending on whether or not the underlying user is authorised.

From the above code, it is just not practically possible to write client code that calls the resetSystem and sendAdminEmail functions without first calling the authorise function. If you try, your code won't compile. None of this finding out at test-time/run-time - it can't run, because it won't compile.
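As a rough, self-contained sketch of what such client code might look like: the tryReset function, the admin flag on User, and the String return value of resetSystem are all assumptions added here so that the example can run on its own; they are not part of the code above.

```scala
object AuthDemo {
  // Hypothetical User: the `admin` flag stands in for whatever the real check is.
  case class User(name: String, admin: Boolean) {
    def authorised_? : Boolean = admin
  }

  sealed abstract class AuthorisedUser(val user: User)
  private class AuthorisedUserImpl(user: User) extends AuthorisedUser(user)

  object AuthorisedUser {
    def authorise(user: User): Option[AuthorisedUser] =
      if (user.authorised_?) Some(new AuthorisedUserImpl(user)) else None
  }

  object Admin {
    // Returns a String only so that the sketch has something observable.
    def resetSystem(user: AuthorisedUser): String =
      "system reset by " + user.user.name
  }

  // Hypothetical client: the only route to resetSystem goes through authorise.
  def tryReset(user: User): String =
    AuthorisedUser.authorise(user) match {
      case Some(authorised) => Admin.resetSystem(authorised)
      case None             => "refused: " + user.name
    }

  // Admin.resetSystem(User("mallory", false))
  // ^ does not compile: a User is not an AuthorisedUser
}
```

The pattern match forces the client to say what happens in the None case, which is exactly the unauthorised path that the Ruby version only discovers at runtime.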

Not only that, but the compiler actually removes the need to test authorisation checks for the Admin functions. There are no tests to write that would make sense; either the user is authorised, or the code doesn't compile!

As you can see, in this case the statically typed language allows more assumptions to be made explicit in the code. If I decide to do a one-off hack, the compiler can be used to keep me honest, and I've found that it generally makes me think more about the problem I'm trying to solve. Yes, it can be frustrating trying to get your code to pass the compiler (especially when you think it's right), but more often than not, when it does compile, I am a lot more confident that it is correct in terms of being consistent with my assumptions. The type system can also be seen as the ultimate defensive coding practice. With a type system, discussions like these tend to become moot.

The most common objection I've come across to using a statically typed language over a dynamically typed language is that the former is more verbose and requires more code than the latter to achieve the same functionality. This is only true of languages with a poor type system implementation, and has already been addressed many times before. I wasn't going to mention it, but for those who didn't notice, the strongly statically typed version above actually has fewer lines of code than the dynamically typed version. It also requires fewer tests (because there's no way to call the admin functions with an unauthorised user) and has all the other benefits of statically compiled code. Win-win-win.

Whilst there will be more examples of some of the advantages you can get from the type system, I don't plan to write too much more on the specific topic of Dynamic vs. Static types, as it has already been covered many times before. I am very open to criticisms of the above comparison, as long as they focus on the topic at hand, not the actual implementation (it is one of the smaller examples I could come up with to try to get my point across, and as such, it is somewhat contrived).

Also note that you can get these benefits in a language with a weaker type system, such as Java. However, the ceremony involved in declaring the types usually makes it infeasible to do for the trivial cases and painful for the complex cases. Below you can see the previous Scala code that has been ported to Java, making use of the FunctionalJava library:


// User.java
  public class User {
    public boolean isAuthorised() {
      ...
    }
  }

// AuthorisedUser.java
  public abstract class AuthorisedUser {
    final User user;
 
    private AuthorisedUser(final User user) {
      this.user = user;
    }
    
    private static class AuthorisedUserImpl extends AuthorisedUser {
      public AuthorisedUserImpl(final User user) {
        super(user);
      }
    }
    
    public static Option<AuthorisedUser> authorise(final User user) {
      return user.isAuthorised()
          ? Option.<AuthorisedUser>some(new AuthorisedUserImpl(user))
          : Option.<AuthorisedUser>none();
    }
  }

// Admin.java
  public class Admin {
 
    public static void resetSystem(final AuthorisedUser user) {
      ...
      sendAdminEmail(user, "System was reset");
    }
 
    public static void sendAdminEmail(final AuthorisedUser user, final String msg) {
      ...
    }
  }

Friday, March 13, 2009

Filtering a list

Ok, let's start off with an easy one. How many times have you written something like the following:


  public Iterable<User> getManagers(Iterable<User> users) {
    List<User> managers = new ArrayList<User>();
    for (User user : users) {
      if (user.isManager()) {
        managers.add(user);
      }
    }
    return managers;
  }

I know I've written a lot of code like that in the past. Each time, there's something slightly different. Either the types are different or the test in the if statement is different. For example, imagine a similar requirement to get the list of active users. There doesn't appear to be much more you can do aside from:


  public Iterable<User> getActiveUsers(Iterable<User> users) {
    List<User> activeUsers = new ArrayList<User>();
    for (User user : users) {
      if (user.isActive()) {
        activeUsers.add(user);
      }
    }
    return activeUsers;
  }

Well, there is some stuff we can factor out if we really wanted to:


  private static interface Condition {
    boolean check(User user);
  }

  private Iterable<User> filterUsers(Iterable<User> users, Condition cond) {
    List<User> filteredUsers = new ArrayList<User>();
    for (User user : users) {
      if (cond.check(user)) {
        filteredUsers.add(user);
      }
    }
    return filteredUsers;
  }

  public Iterable<User> getManagers(Iterable<User> users) {
    return filterUsers(users, new Condition() {
      public boolean check(User user) {
        return user.isManager();
      }
    });
  }

  public Iterable<User> getActiveUsers(Iterable<User> users) {
    return filterUsers(users, new Condition() {
      public boolean check(User user) {
        return user.isActive();
      }
    });
  }

However, given that it's more code than the two original methods, I probably wouldn't actually do that unless I had a lot of similar looking methods. In any case, there's still duplication in the creation of the conditions. Without resorting to reflection (which would instantly open myself up to a whole new class of bugs), I can't think of much more that can be done with this code in Java.

In terms of the bounds of readability I mentioned in my dilemma, I would have left it at the first two methods and moved on.

Now take a look at the same two methods written in Scala; a language that supports first class functions:


  def getManagers(users:List[User]):List[User] = {
    return users.filter(isUserManager);
  }

  def getActiveUsers(users:List[User]):List[User] = {
    return users.filter((user:User) => user.isActive());
  }

  def isUserManager(user:User):Boolean = {
    return user.isManager();
  }

First, some quick notes about the syntax:

The def keyword is used to declare a function or a method

In Scala, type annotations come at the end of the parameter and function declarations (after the ':' character)

Generics in Scala are expressed using square brackets as opposed to Java's angle brackets

On to the semantics of the code. The filter function on Scala's List class is a higher-order function, which in this case it means that it is a function that takes another function as a parameter. In the first case, you can see how the parameter to the filter function is another named function that is defined later on. In the second case, the parameter is defined in-line as an anonymous (unnamed) function.

In this example, the type of the parameter that List.filter expects is:

A function, which takes a single User parameter, and returns a Boolean value.

In the example, the List.filter function returns a (possibly empty) List of User objects.
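To spell that type out, here is a small self-contained sketch. The User case class and the sample data are hypothetical stand-ins; the point is only that `User => Boolean` is how Scala writes "a function that takes a User and returns a Boolean", and that any value of that type can be handed to filter.

```scala
object FilterDemo {
  // Hypothetical User with just enough state for the example.
  case class User(name: String, manager: Boolean, active: Boolean) {
    def isManager(): Boolean = manager
    def isActive(): Boolean = active
  }

  // A function value whose type is "takes a User, returns a Boolean".
  val isManager: User => Boolean = user => user.isManager()

  val users = List(
    User("ann", manager = true, active = true),
    User("bob", manager = false, active = true),
    User("cat", manager = false, active = false)
  )

  // A named function value and an inline anonymous function both fit the parameter.
  val managers: List[User] = users.filter(isManager)
  val active: List[User] = users.filter((user: User) => user.isActive())

  // When nothing matches, the result is an empty List rather than null.
  val nobody: List[User] = users.filter(user => false)
}
```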

I intentionally wrote out the Scala code in a verbose manner to try to make it easier to read if you're coming from Java. However, in most cases there are a lot of aspects of the Scala syntax that are optional, such as semicolons, the return keyword, braces for single statements, and return types when they can be inferred. As someone who is more comfortable with the language, I find it easier to write it and at least as easy to read it as:


  def getManagers(users:List[User]) = users.filter(isUserManager)
  def getActiveUsers(users:List[User]) = users.filter((user:User) => user.isActive())
  def isUserManager(user:User) = user.isManager()

Actually, I would probably inline the isUserManager function because it's almost like an alias now, and because the user parameter in the anonymous function is only used once and the type of it can be inferred, the last function can be shortened further still:


  def getManagers(users:List[User]) = users.filter(_.isManager())
  def getActiveUsers(users:List[User]) = users.filter(_.isActive())

Now, imagine getting used to writing code like that and then coming back to this:


  public Iterable<User> getManagers(Iterable<User> users) {
    List<User> managers = new ArrayList<User>();
    for (User user : users) {
      if (user.isManager()) {
        managers.add(user);
      }
    }
    return managers;
  }

  public Iterable<User> getActiveUsers(Iterable<User> users) {
    List<User> activeUsers = new ArrayList<User>();
    for (User user : users) {
      if (user.isActive()) {
        activeUsers.add(user);
      }
    }
    return activeUsers;
  }

And, if you'll allow me to mix my metaphors, this is barely scratching the tip of the iceberg.

Friday, October 24, 2008

Proxy instantiation problem from Hibernate Session load

This one had me puzzled for a while yesterday. I was able to retrieve objects from a Session via a query, but I was getting the following error when I was using the load method:


org.hibernate.HibernateException: Javassist Enhancement failed: Thing
        at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.getProxy(JavassistLazyInitializer.java:142)
        at org.hibernate.proxy.pojo.javassist.JavassistProxyFactory.getProxy(JavassistProxyFactory.java:72)
        at org.hibernate.tuple.entity.AbstractEntityTuplizer.createProxy(AbstractEntityTuplizer.java:402)
        at org.hibernate.persister.entity.AbstractEntityPersister.createProxy(AbstractEntityPersister.java:3483)
        at org.hibernate.event.def.DefaultLoadEventListener.createProxyIfNecessary(DefaultLoadEventListener.java:298)
        at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:219)
        at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:126)
        at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:905)
        at org.hibernate.impl.SessionImpl.load(SessionImpl.java:822)
        at org.hibernate.impl.SessionImpl.load(SessionImpl.java:815)
...
Caused by: java.lang.InstantiationException: Thing_$$_javassist_0
        at java.lang.Class.newInstance0(Class.java:335)
        at java.lang.Class.newInstance(Class.java:303)
        at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.getProxy(JavassistLazyInitializer.java:139)
        ... 49 more

When I changed the code from


  session.load(classOf[Thing], id).asInstanceOf[Thing]

to


  session.createQuery("from Thing where id = " + id).list().asInstanceOf[java.util.List[Thing]].toList(1)

everything worked as expected.

I found a few results from the web, but nothing to indicate what my specific problem was.

The code I was working on was loosely based on another Scala project, which uses an older version of Hibernate. I changed the libraries over, and got a similar problem with CGLIB:


org.hibernate.HibernateException: CGLIB Enhancement failed: Thing
        at org.hibernate.proxy.pojo.cglib.CGLIBLazyInitializer.getProxy(CGLIBLazyInitializer.java:96)
        at org.hibernate.proxy.pojo.cglib.CGLIBProxyFactory.getProxy(CGLIBProxyFactory.java:49)
        at org.hibernate.tuple.entity.AbstractEntityTuplizer.createProxy(AbstractEntityTuplizer.java:379)
        at org.hibernate.persister.entity.AbstractEntityPersister.createProxy(AbstractEntityPersister.java:3455)
        at org.hibernate.event.def.DefaultLoadEventListener.createProxyIfNecessary(DefaultLoadEventListener.java:257)
        at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:191)
        at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:103)
        at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:878)
        at org.hibernate.impl.SessionImpl.load(SessionImpl.java:795)
        at org.hibernate.impl.SessionImpl.load(SessionImpl.java:788)
...
Caused by: java.lang.InstantiationException: Thing$$EnhancerByCGLIB$$e29dbda1
        at java.lang.Class.newInstance0(Class.java:335)
        at java.lang.Class.newInstance(Class.java:303)
        at org.hibernate.proxy.pojo.cglib.CGLIBLazyInitializer.getProxyInstance(CGLIBLazyInitializer.java:107)
        at org.hibernate.proxy.pojo.cglib.CGLIBLazyInitializer.getProxy(CGLIBLazyInitializer.java:93)
        ... 49 more

This time I was luckier with my search results, which eventually led me to these two pages. Indeed, the problem was due to having a private no-args constructor in my Thing object. As soon as I changed the constructor to be visible, the calls to load() started working as expected.
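For illustration, a minimal sketch of what such a class might look like in Scala. The fields are invented and this is not the original Thing class; the assumption, based on the fix described above, is that the proxy subclass generated by Javassist or CGLIB must be able to reach the no-args constructor.

```scala
// Hypothetical entity; the fields are made up, only the constructors matter here.
class Thing(var id: Long, var name: String) {
  // Auxiliary no-args constructor, left visible on purpose.
  // Declaring it as `private def this()` is the variant that failed for load().
  def this() = this(0L, "")
}
```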

I was thrown a bit by the InstantiationException, which according to the JavaDoc is

Thrown when an application tries to create an instance of a class using the newInstance method in class Class, but the specified class object cannot be instantiated because it is an interface or is an abstract class.

While I can see that it's related, I don't really think the private constructor issue strictly falls under that description. In any case, I'm sure that a better message, or a different exception, or an update to the documentation, would have helped me isolate the problem a lot quicker.

Oddly, the code from the other project also has a private no-args constructor, yet it seems to work. That part remains a mystery to me, but I thought I'd post my other findings in case it helps anyone else who runs into a similar issue.

Thursday, August 7, 2008

Installing GHC on OSX

During my quest to better understand Functors & Monads, it was suggested that I take a detour and learn Haskell first, as I was struggling a bit with Scala's type annotations. So, off to MacPorts I went:


  ~> sudo port install ghc
  --->  Fetching ghc
  --->  Verifying checksum(s) for ghc
  --->  Extracting ghc
  --->  Applying patches to ghc
  --->  Configuring ghc
  --->  Building ghc with target all

Half an hour later, I was still waiting for it to build. Something was not quite right. I killed the process and tried again (after issuing a sudo port clean ghc, because it seemed to be in an inconsistent state after killing it).

Same problem - it got up to building, then seemed to hang. Looking at top, I could see ghc-6.8.2 running intermittently. After a while, ghc-6.8.3 was also added to the list of processes (running intermittently), but still no progress on the build.

I gave it about 45 minutes before killing it again. Brad has been down this path before, and he put me on to the pre-packaged installer available from the http://haskell.org/ghc download page. As per the instructions, I installed XCode off the Leopard disc, then ran the installer, and hey presto, I was ghc hello_world.hs'ing it up!

Monday, August 4, 2008

Installing/upgrading Scala

Because I keep forgetting:


cd /opt
sudo tar xzf ~/install/scala-2.7.1.final.tgz
sudo rm scala
sudo ln -s scala-2.7.1.final scala
cd scala
sudo bin/sbaz install scala-devel-docs

This cryptic message when installing the docs indicates a lack of privileges, i.e. not running with sudo:


planning to install: scala-devel-docs/2.7.1.final
Installing...
java.io.FileNotFoundException: /opt/scala-2.7.1.final/meta/cache/scala-devel-docs-2.7.1.final.zip.tmp (No such file or directory)
