Sunday, September 26, 2010

Don't repeat yourself, even across platforms

It's well known that duplication (not to mention triplication, quadruplication, etc.) is bad for maintainability (hence the DRY principle). I have not seen much talk about DRY with regard to multiple platforms/programming languages, so this post is my attempt to distill my thoughts and to learn about the pros/cons of applying this principle across languages/platforms.

Why would you use multiple platforms, or do a polyglot project?

Without the goal of enumerating all possible reasons, some cases are:
  • Web applications need the same input sanity validation performed both client side (JavaScript) for usability and server side (whatever technology you use) for security. The same argument can be made for DTO validation in any N-tier application.
  • For a portfolio of applications in the same domain, there is a need for consistency - e.g.: a reference data catalog for input fields (you want to use the same terminology across the applications).
  • Similar logic can apply to different platforms - e.g.: some static code analysis is the same whether we talk about Java or C# code, and it'd be nice to have just a single implementation.

Some approaches

Metadata-driven libraries - the canonical rules live in some kind of database in a cross-platform representation (e.g.: validation logic saved as regular expressions), and each language gets a minimal, generic, data-driven implementation. There is only a small amount of code that you need to depend on in your application, which makes it a stable dependency (note I'm not talking about the code stored in the database, just the API you program against). However, the actual validation logic becomes a black box from the testing perspective, the metadata format severely limits what you can do with it, and extension points are much harder to find - e.g.: should you want to restrict your application to accept only a certain range of zip codes, how do you build that into a regexp-based, data-driven validation framework? It certainly is possible, but hard; in many cases expanding the framework increases its complexity significantly, and the data dictionary becomes hard to maintain.
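To make this concrete, here is a minimal sketch of the data-driven approach in TypeScript. The rule format and names (FieldRule, validate) are made up for illustration, and the rules are inlined here where a real system would load them from the shared metadata store:

    // Rule definitions as data: any platform can interpret these rows.
    interface FieldRule {
      field: string;
      pattern: string;       // stored as a string so every platform can compile it
      errorMessage: string;
    }

    // Inlined for the sketch; in reality fetched from the shared metadata store.
    const rules: FieldRule[] = [
      { field: "zipCode", pattern: "^\\d{5}$", errorMessage: "Zip code must be 5 digits" },
      { field: "email", pattern: "^[^@\\s]+@[^@\\s]+$", errorMessage: "Invalid email address" },
    ];

    // The generic, per-platform interpreter: small, stable, and rule-agnostic.
    function validate(input: Record<string, string>): string[] {
      const errors: string[] = [];
      for (const rule of rules) {
        const value = input[rule.field] ?? "";
        if (!new RegExp(rule.pattern).test(value)) {
          errors.push(rule.errorMessage);
        }
      }
      return errors;
    }

    console.log(validate({ zipCode: "1234", email: "foo@example.com" }));
    // -> ["Zip code must be 5 digits"]

The per-platform code stays tiny and generic; all the actual knowledge lives in the rule rows, which is both the appeal and the limitation described above.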

Plain Old X Objects* appeal more to me. They are easily debuggable and can be used in isolation and for offline development. They can be much richer (you can report every error encountered back to the user rather than a simple pass/fail), more readable (though you can certainly write readable regular expressions), and should you want to enhance the default behavior in special cases, you can simply override it. The problem, of course, is making the same logic available on multiple platforms in a unified fashion.
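By contrast, a hand-written validator is just plain code. The sketch below is hypothetical (the class names and the zip code range are invented for illustration), but it shows the rich error reporting and the ordinary language-level override that are awkward to express in a regexp-only framework:

    // A plain, hand-written validator: debuggable, testable, reports every error.
    class ZipCodeValidator {
      validate(value: string): string[] {
        const errors: string[] = [];
        if (!/^\d{5}$/.test(value)) {
          errors.push("Zip code must be exactly 5 digits");
        }
        return errors;
      }
    }

    // Special-casing is an ordinary subclass, not a framework extension point.
    // The range below is illustrative, not an accurate list of New York zip codes.
    class NewYorkZipCodeValidator extends ZipCodeValidator {
      validate(value: string): string[] {
        const errors = super.validate(value);
        const zip = parseInt(value, 10);
        if (errors.length === 0 && (zip < 10001 || zip > 14975)) {
          errors.push("Zip code must be within the New York range");
        }
        return errors;
      }
    }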

If the technology stack allows it, pick a language that runs on all parts of the stack (e.g.: Ruby/Python/JavaScript are all available standalone and for the CLR/JVM). The integration tests are then fairly easy: just run the same tests on each platform.
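As an illustration, here is what the shared implementation can look like when the client and the server both happen to run JavaScript/TypeScript; the module layout and names are made up:

    // shared/validation.ts - written once, bundled for the browser and reused
    // as-is on a Node.js server, so the rule has exactly one home.
    export function validateEmail(value: string): string[] {
      return /^[^@\s]+@[^@\s]+$/.test(value) ? [] : ["Invalid email address"];
    }

    // In the browser: validate before submitting, for usability.
    //   import { validateEmail } from "./shared/validation";
    // On the server: validate the incoming request again, for security.
    //   import { validateEmail } from "./shared/validation";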

If there is no suitable bridge for the stack, code generation is an option. It is almost the same concept as the metadata-driven framework above, just taking the datastore out of the runtime picture and generating classes for each of the rules to enable offline usage, debugging, etc. This comes at the extra cost of creating and evolving the code generator tool itself.
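A toy generator, to illustrate the idea: it consumes the same kind of rule metadata as the data-driven sketch above and emits a standalone validator class as source text. The template and naming are invented, and a real tool would need one backend per target platform:

    interface FieldRule {
      field: string;
      pattern: string;
      errorMessage: string;
    }

    // Emit a self-contained validator class for one rule.
    function generateValidatorClass(rule: FieldRule): string {
      const className = rule.field[0].toUpperCase() + rule.field.slice(1) + "Validator";
      return [
        `export class ${className} {`,
        `  validate(value: string): string[] {`,
        `    return new RegExp(${JSON.stringify(rule.pattern)}).test(value)`,
        `      ? []`,
        `      : [${JSON.stringify(rule.errorMessage)}];`,
        `  }`,
        `}`,
      ].join("\n");
    }

    console.log(generateValidatorClass({
      field: "zipCode",
      pattern: "^\\d{5}$",
      errorMessage: "Zip code must be 5 digits",
    }));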

Testing

Regardless of the path chosen, the shared logic must be tested and documented on all platforms, which might be hard to do. Acceptance testing tools (FitNesse, Concordion, etc.) can help, or for data-driven tests (e.g.: for validation: input string, whether it should be valid, expected error message) a simple test runner can be created for each platform.
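Such a test runner can be very small. The sketch below assumes a hypothetical case format of (input, should be valid, expected error message); each platform would implement the same loop against its own validator implementation:

    interface ValidationCase {
      input: string;
      shouldBeValid: boolean;
      expectedError?: string;
    }

    // Inlined for the sketch; in practice the cases would come from a shared file.
    const cases: ValidationCase[] = [
      { input: "12345", shouldBeValid: true },
      { input: "1234", shouldBeValid: false, expectedError: "Zip code must be 5 digits" },
    ];

    // Pass any platform's validator in; a valid input must yield no errors,
    // an invalid one must yield the expected message.
    function runCases(validate: (value: string) => string[]): void {
      for (const c of cases) {
        const errors = validate(c.input);
        const passed = c.shouldBeValid
          ? errors.length === 0
          : errors.includes(c.expectedError ?? "");
        console.log(`${passed ? "PASS" : "FAIL"}: "${c.input}"`);
      }
    }

    runCases(value => (/^\d{5}$/.test(value) ? [] : ["Zip code must be 5 digits"]));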

(Potential) Problems

  • Diversity vs. monoculture. The library becomes a single point of failure, and any bug has far-reaching consequences. On the other hand, any problem has to be fixed only once, and the benefits are reaped by everyone using the library. However, there might be fewer people looking at the shared domain for corner cases...
  • Shared dependency overhead - shared libraries can slow down development both for the clients of the library and the library itself. Processes for integration must be in place, etc. Gojko Adzic has a great post on shared library usage.
  • False sense of security - users of the library might assume it covers everything they need and stop thinking each problem through carefully - e.g.: a DTO validation library might be confused with entity/business-rule validation.
  • Ayende has recently written a post about Maintainability, Code Size & Code Complexity that is (slightly) relevant to this discussion ("The problem with the smaller and more complex code base is that the complexity tends to explode very quickly."). In my reading, his points apply more to the data-driven (or code-generated) approach, where the smart framework becomes overly complex and fragile. Note that he talks about a single application, and it's known that when dealing with a portfolio (NHProf, EFProf, etc.), he chose to use a single base infrastructure.

Have you done something similar in practice? What are your thoughts and experiences? Or have you been thinking about this very same topic? What have I missed? What have I misunderstood? Let me know!