A journal to record my notes and ideas related to software development and computing

Monday, April 27, 2009

Using the type system for discoverability and enforcing constraints

Now, anyone from a Ruby or other dynamically typed language background who read my last post might be thinking "big deal, I've been able to do that for ages". And it's true; some of these languages support first-class functions, which makes writing functional code just as easy as it is in Scala and the like. So why would I want to use a strongly statically typed language over a dynamically typed one?

Well, I'm glad you asked. Before I really got stuck into understanding functional programming and strongly statically typed languages, I spent some time working on a Rails app. Unlike most of the experiences I'd heard from people moving to Ruby/Rails though, I joined this project during the maintenance phase. What followed for me was an uphill battle of trying to understand the code and gain the implicit knowledge of my colleagues who had worked on the code before me. The codebase left a lot to be desired in terms of discoverability. Due to the magic of meta-programming, I couldn't even rely on grep to find all references to a particular method.

One example that remains burned into my memory was the frustration surrounding something as seemingly simple as:

tags = item.tags

What is the type of tags? Is it an array of Tag objects, an array of string tag names, or a comma separated string of tag names?

The answer in my case was all of the above! What it was at runtime depended on the code path that led to the tags property being set on the item. By coincidence/implicit design/whatever, it just so happened that the way it was used was consistent across the code paths that invoked it, so it never showed up as a problem until I came in one day to re-use the data in a different context. Not even the tests (which covered > 98% of the code base) picked this up for me, because the tests were written upfront for the particular context that was being implemented, and so the property was always set to what was expected.


It got to the point where I was paranoid about releasing anything, because I wasn't sure of all the knock-on effects my changes would have. I was bitten too many times by not executing all of the code paths that led to the point of my changes, because I couldn't find them all up front. I was paranoid about deleting code that I didn't think was being used any more, and even worried about factoring out common-looking code, because on at least one occasion of each, a bug turned up either late in testing or actually in production.

My flirtation with mid-sized projects in dynamically typed languages wasn't all doom and gloom though. The second Rails app I worked on went a lot more smoothly. I put this down more to the fact that it was a greenfield project than to my being more familiar with the language/framework. In developing the project from the start, I had all of the assumptions and implicit knowledge about the types in my head. As a result, using the Rails framework was a breeze, and we managed to get the project completed on time and on budget.


Yes, it could be said that my first experience was just a one-off; that the previous developers didn't always stick to the conventions (whatever they're supposed to be), or were doing things in ways I wasn't aware of. But it's exactly some of these implicit assumptions and conventions that can be enforced by the compiler in a strongly statically typed language. There's no need to write all your assumptions as comments in code; many can be encoded in the language you're writing in.
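To make that concrete with the tags example from earlier, here is a minimal sketch in Java. The Tag and Item classes and the tagNamesCsv helper are hypothetical (the original Rails models aren't shown in this post); the point is only that the declared return type answers the "what is the type of tags?" question for every code path, and any other representation has to be asked for explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical classes for illustration; the original Rails models are not shown.
class TagsDemo {
    public static void main(String[] args) {
        Item item = new Item(Arrays.asList(new Tag("scala"), new Tag("types")));
        List<Tag> tags = item.tags();            // always a list of Tag objects
        System.out.println(tags.size());         // prints 2
        System.out.println(item.tagNamesCsv());  // prints scala,types
        // String csv = item.tags();             // does not compile: List<Tag> is not a String
    }
}

final class Tag {
    private final String name;

    Tag(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

final class Item {
    private final List<Tag> tags;

    Item(List<Tag> tags) {
        // Copy and wrap so the stored list cannot change behind our back.
        this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
    }

    // The signature states the one representation every caller will see.
    List<Tag> tags() {
        return tags;
    }

    // A comma separated string is still available, but only on request.
    String tagNamesCsv() {
        return tags.stream().map(Tag::name).collect(Collectors.joining(","));
    }
}
```

Whichever code path constructs the Item, it has to hand over a list of Tag objects, so a new caller never needs to trace the setters to find out what they will get back.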


Consider this code:

# user.rb
class User
  def authorised?
    ...
  end
  ...
end

# admin.rb
class Admin
  def Admin.reset_system(user)
    check_authorisation(user)
    ...
    send_admin_email(user, "System was reset")
  end

  def Admin.send_admin_email(user, msg)
    check_authorisation(user)
    ...
  end

  def Admin.check_authorisation(user)
    raise "Unauthorised" unless user.authorised?
  end
end

Without looking at the implementation, the tests or the documentation (assuming it exists), there is nothing to indicate that clients calling the reset_system and send_admin_email methods must pass in an authorised user (never mind that you can't tell for sure whether the methods accept a User instance, a user identifier, or something else). Also note the duplicated authorisation check. You could probably use some meta-programming to run the check before the methods are called, so you wouldn't have to remember to add it to each restricted method in the future. However, the check would then be removed from the context of the method, and it would be easy to circumvent if you needed a quick hack to get something working. Yes, the tests you (are supposed to) write should catch these problems, but they don't give you any guarantee over future changes (and from my own personal experience, the tests are a PITA to maintain).


Now, imagine not only being able to get around the problems above, but to do so in a way where it is practically not possible to get the code past the compiler without it conforming to your assumptions (unless you subvert the type system, in which case you're defeating its purpose). This is the situation I've become used to in a strongly statically typed language.


// User.scala
class User(...) {
  def authorised_? = ...
}

// AuthorisedUser.scala
sealed abstract class AuthorisedUser(val user: User)

private class AuthorisedUserImpl(user: User) extends AuthorisedUser(user)

object AuthorisedUser {
  def authorise(user: User): Option[AuthorisedUser] =
    if (user.authorised_?) Some(new AuthorisedUserImpl(user)) else None
}

// Admin.scala
object Admin {
  def resetSystem(user: AuthorisedUser) = {
    ...
    sendAdminEmail(user, "System was reset")
  }
  def sendAdminEmail(user: AuthorisedUser, msg: String) = ...
}

A quick note about the Scala syntax again:

  • Scala makes the distinction between objects and classes, where the former declares a singleton and only defines static functions (eg. authorise), and the latter can be instantiated and defines instance methods (eg. authorised_?). Java does not make this distinction; a class can define both static functions and instance methods.

  • The sealed keyword denotes a class that cannot be extended from outside of the file it is defined in. In this case, as a client, you cannot circumvent the authorisation check by creating another subclass of AuthorisedUser to pass in to the restricted functions.

  • Some and None are the two subtypes of the Option type in Scala. In this case, Option allows an optional AuthorisedUser to be returned from the authorise function, depending on whether or not the underlying user is authorised.



From the above code, it is just not practically possible to write client code that calls the resetSystem and sendAdminEmail functions without first calling the authorise function. If you try, your code won't compile. None of this finding out at test-time/run-time - it can't run, because it won't compile.

Not only that, but the compiler actually removes the need to test authorisation checks for the Admin functions. There are no tests to write that would make sense; either the user is authorised, or the code doesn't compile!

As you can see, in this case the statically typed language allows more assumptions to be made explicit in the code. If I decide to do a one-off hack, the compiler can be used to keep me honest, and I've found that it generally makes me think more about the problem I'm trying to solve. Yes, it can be frustrating trying to get your code to pass the compiler (especially when you think it's right), but more often than not, when it does compile, I am a lot more confident that it is correct in terms of being consistent with my assumptions. The type system can also be seen as the ultimate defensive coding practice. With a type system, discussions like these tend to become moot.



The most common objection I've come across to using a statically typed language over a dynamically typed language is that the former is more verbose and requires more code to achieve the same functionality as the latter. This is only true of languages with a poor type system implementation, and has already been addressed many times before. I wasn't going to mention it, but for those who didn't notice, the strongly statically typed version above actually has fewer lines of code than the dynamically typed version. It also requires fewer tests (because there's no way to call the admin functions with an unauthorised user) and has all the other benefits of statically compiled code. Win-win-win.

Whilst there will be more examples of some of the advantages you can get from the type system, I don't plan to write too much more on the specific topic of Dynamic vs. Static types, as it has already been covered many times before. I am very open to criticisms of the above comparison, as long as they focus on the topic at hand, not the actual implementation (it is one of the smaller examples I could come up with to try to get my point across, and as such, it is somewhat contrived).



Also note that you can get these benefits in a language with a weaker type system, such as Java. However, the ceremony involved in declaring the types usually makes it infeasible to do for the trivial cases and painful for the complex cases. Below you can see the previous Scala code that has been ported to Java, making use of the FunctionalJava library:

// User.java
public class User {
    public boolean isAuthorised() {
        ...
    }
}

// AuthorisedUser.java
import fj.data.Option;

public abstract class AuthorisedUser {
    final User user;

    // Private constructor: only classes nested in this file can extend AuthorisedUser
    private AuthorisedUser(final User user) {
        this.user = user;
    }

    private static class AuthorisedUserImpl extends AuthorisedUser {
        public AuthorisedUserImpl(final User user) {
            super(user);
        }
    }

    public static Option<AuthorisedUser> authorise(final User user) {
        return user.isAuthorised()
            ? Option.<AuthorisedUser>some(new AuthorisedUserImpl(user))
            : Option.<AuthorisedUser>none();
    }
}

// Admin.java
public class Admin {

    public static void resetSystem(final AuthorisedUser user) {
        ...
        sendAdminEmail(user);
    }

    public static void sendAdminEmail(final AuthorisedUser user) {
        ...
    }
}
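
For what it's worth, here is a rough sketch of what client code for the Java port might look like. It is a self-contained approximation rather than the code above: java.util.Optional stands in for FunctionalJava's Option so that it runs without the library, and the User constructor, the name() accessor, the attemptReset helper and the String return value of resetSystem are all hypothetical additions made purely so that there is something to print.

```java
import java.util.Optional;

// Sketch only: java.util.Optional stands in for FunctionalJava's Option, and
// the User constructor, name(), attemptReset and the returned strings are
// hypothetical additions for the sake of a runnable demo.
class AuthorisationDemo {
    public static void main(String[] args) {
        System.out.println(attemptReset(new User("alice", true)));    // prints system reset by alice
        System.out.println(attemptReset(new User("mallory", false))); // prints refused: mallory
        // Admin.resetSystem(new User("mallory", false)); // does not compile: a User is not an AuthorisedUser
    }

    // The only way to obtain an AuthorisedUser is to go through authorise,
    // and the empty case has to be dealt with before anything is returned.
    static String attemptReset(User user) {
        return AuthorisedUser.authorise(user)
                .map(Admin::resetSystem)
                .orElse("refused: " + user.name());
    }
}

class User {
    private final String name;
    private final boolean authorised;

    User(String name, boolean authorised) {
        this.name = name;
        this.authorised = authorised;
    }

    String name() {
        return name;
    }

    boolean isAuthorised() {
        return authorised;
    }
}

abstract class AuthorisedUser {
    final User user;

    // Private constructor: only classes nested in AuthorisedUser can extend it.
    private AuthorisedUser(User user) {
        this.user = user;
    }

    private static final class AuthorisedUserImpl extends AuthorisedUser {
        AuthorisedUserImpl(User user) {
            super(user);
        }
    }

    static Optional<AuthorisedUser> authorise(User user) {
        return user.isAuthorised()
                ? Optional.<AuthorisedUser>of(new AuthorisedUserImpl(user))
                : Optional.<AuthorisedUser>empty();
    }
}

class Admin {
    static String resetSystem(AuthorisedUser authorised) {
        return "system reset by " + authorised.user.name();
    }
}
```

The private constructor plays the role that sealed plays in the Scala version: code outside AuthorisedUser cannot add another subclass, so the authorise function remains the only way to get hold of an instance.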

3 comments:

fawcett said...

Interesting post. I've encountered problems similar to your "tags" issue many times in dynamic languages.

A grammatical nitpick: it's 'fewer lines' and 'fewer tests', not 'less' (because they are countable). Sorry, this one is a pet peeve of mine. :)

Gabriel C. said...

Interesting! Clearly your Ruby codebase needs 130% of code coverage :) (be prepared to be burned at the stake for your denouncement!)

In a similar vein, last month I wrote about what I think are some advantages of static type systems. Even the "lowly" Java type system can help.

Kristian Domagala said...

@fawcett, Thanks, grammatical nitpicks fixed :-)

@Gabriel C., Ha ha, I nearly included a link to that post in my entry! It really struck a chord with me about how a lot of people don't take full advantage of the type system even when it is available to them, and they end up with the disadvantages of a dynamically typed language while still incurring the overhead of non-inferred types. Although, I have to admit that since returning to Java on my current project, I've found that it is sometimes too frustrating to create all of my desired types and I occasionally resort to using their primitive counterparts.

Regarding being prepared, as long as there is evidence-based reason behind the fire, then I'm quite happy to learn and be burned :-)
