Thursday, March 17, 2011

How & Why I do code reviews

Code review is one of the programming practices that I advocate. However, as with any practice, I find understanding and communicating the values and driving forces behind it even more important than the actual practice itself (see: Cargo Cult). So, here is my take on code reviews.

Why?

I'm pretty much against introducing any practice just for the sake of introducing it. Different goals require different means, and without clear goals, it's rather difficult to evaluate whether a practice has helped the team or not. The reasons I practice code reviews are:

Knowledge sharing (note: none of the teams I've been part of practiced pair programming, which serves this purpose quite well). It's a way to break the silos that exist within the team, regardless of whether they are horizontal (UI, server side, data access, etc.) or vertical (this is Jane Doe's component, no one else can touch it, for there be dragons). I rather like that when I ask for a holiday, my manager doesn't have to consider whether there is anyone who could work with my (recent) code.

Mentoring, though it could be considered part of knowledge sharing, deserves to stand on its own. Many concepts I first came across in a book felt way too abstract, but when explained in context ("what you've just written there is known as the Factory pattern...") I gained a better and deeper understanding of them.

Culture creation. Again, abstract vs. concrete, talking vs. doing. One can write any number of coding guideline and best practice documents. However, these documents usually describe high-level objectives, and almost any concrete implementation can be argued to fit the description. I much prefer people arguing over whether a new class should be extracted to a dry design guideline that caps the number of methods per class and is only checked formally (if at all).

Quality Assurance. Are there enough and appropriate tests? Do they make sense to the next person (I routinely had to rename tests for my reviewers, because I was already too close to the code and didn't communicate intent well enough)? Are all tests actually passing? Have you accidentally reinvented the wheel (I once failed to recognize that what I was working with was an existing domain concept - the beauty of working with legacy code - so I created another model class for it)?

Last, but not least, catching obvious bugs is a benefit not to be overlooked. Face it: we all make mistakes that are embarrassing in hindsight. Catching them during the development cycle keeps the embarrassment in a smaller circle :)

What?

Computers were invented to automate tiresome, repetitive, and boring tasks. Spending programmer time on checking tabs vs. spaces, curly brace placement, and other easily automated checks (StyleCop, Checkstyle, FindBugs, etc.) is worse than waste. If such things are important to you, do a quick check that there are no warnings in the build for them, and move on to the areas where review actually adds value.
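
If such a check is not already part of the build, wiring it in is a one-off task. Below is a minimal sketch, assuming a Python codebase and the flake8 linter (pick the StyleCop/Checkstyle/FindBugs equivalent for your stack); the script name is made up for the example.

    #!/usr/bin/env python
    # check_style.py - a hypothetical build step: let the tool do the nitpicking,
    # so human reviewers can focus on design and intent.
    import subprocess
    import sys

    # flake8 exits with a non-zero status when it finds style violations.
    result = subprocess.run(["flake8", "."])
    if result.returncode != 0:
        print("Style warnings found - fix them before asking for a review.")
        sys.exit(1)

    print("No style warnings; reviewers can focus on the interesting parts.")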

When?

Code reviews should be concerned with small, easily digestible chunks, just like commits. That's why I prefer to review commits. Before they are committed. I can only offer non-scientific justification (experience and anecdote), and no explanation, but my observation is that committing code makes it as rigid as clay baked in an oven - it becomes much more difficult to change. Renaming a method or variable is much easier while the change is still in the staging area (maybe it's the burden of writing another commit message?). This also leaves enough time to ensure that tests are run before each commit (run the command line build while the review is ongoing).
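
One hedged way to automate the "tests are run before each commit" part is a git pre-commit hook: git aborts the commit whenever the hook exits with a non-zero status. The sketch below assumes a Python project whose command line build is simply the test suite (substitute your own build command).

    #!/usr/bin/env python
    # .git/hooks/pre-commit (must be executable)
    # Runs the command line build before every commit; a failing build blocks
    # the commit, so nothing broken becomes part of the reviewed history.
    import subprocess
    import sys

    build = subprocess.run(["python", "-m", "pytest"])
    if build.returncode != 0:
        print("Build/tests failed - commit aborted.")
        sys.exit(1)  # a non-zero exit code makes git abort the commit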

Some suggest that developers should ask for a code review only when they feel they've made a riskier change. Since I'm really bad at predicting when I make mistakes (to be frank, I can't predict it at all), just to be on the safe side (and to keep the information flowing), I advocate reviewing every commit.

How?

There are multiple ways to achieve the same thing; one that I've found to work is the following:

  1. the developer who has just completed a committable chunk of work signals the others (via a team chat window, though ACM ICPC style balloons could work too) that her change is ready for review.
    1. someone from the team picks up the request in the next few minutes - this delay ensures that the reviewer is not interrupted and dragged out of the flow.
    2. If no one picks it up within a few minutes, the developer stashes her work and continues with something else until someone becomes available, at which point they return to the original change set.
  2. the command line build is started on the developer's machine while the nature of the changes is described to the reviewer in a few sentences, so the context is set
  3. the reviewer drives the order in which the changes are reviewed. She asks questions and requests clarifications, which usually result in rename refactorings.
    I tend to start with the tests - first checking that the names make sense and no relevant corner cases are missed, then the actual test code. Only once I'm happy with that do I turn to the production code.
    1. if any major flaw (a bug, a failing test, duplication - anything that cannot be fixed as quickly as it can be discussed) is discovered, the code cannot be committed, and another review is needed (we tend to have the next reviewer be the same person as before, in order not to slow down the process).
    2. otherwise, the normal checkin dance begins, and the code is committed.

Cons, worries, and pitfalls (real and unreal ones)

Just like any other practice, this takes time to learn and to do. However, I much prefer the extra few minutes spent discovering duplication to the days it would take to remove later. Exposing potential misunderstandings that wouldn't be discovered by just talking justifies it again. If a review takes longer than a few minutes, that signals a deeper issue that should be discussed - as agile practices promote, we've exposed a problem early. In summary: I think it's worth it.

Co-location, while it certainly makes the process easier, is not required. If you are working on a remote team, you probably already have appropriate communication channels; a voice call with a shared screen is all that's needed.

It takes effort and discipline. If the reviewer doesn't pay enough attention or isn't engaged, it's not going to work. Regularly discussing what kinds of problems slipped through past code reviews, and whether they could (or should) have been caught, can be part of retrospectives.

Just as it's possible to create horrible designs with TDD, it's just as easy to drift away from the program's architecture and design in small, individually correct steps. Even when reviewing only small chunks, keep the big picture in mind.

Some teams do post-commit code reviews instead, arguing that doing it asynchronously means the work of others is not interrupted and the developer is not slowed down by waiting. I've seen teams practice post-commit reviews effectively (as they say, a team of good developers is likely to achieve good results regardless of the process and practices they follow), so I'm not saying it can't work, but I prefer to review before the commit. In addition to the clay analogy above, synchronous communication is more effective than asynchronous (at least between humans; between machines it's another story). At the end of the day, developers will still need to talk to each other for clarification, so not much time is saved anyway.

Tooling. Most code review software doesn't support pre-commit reviews; however, such tools have value for other purposes. I like to use them to start discussions about things I see in the code that don't need immediate changes for the features we are working on now, but should be discussed at some point - either to become a nice-to-have task, or to turn out to be a must-have refactoring. These tools can be more effective than leaving TODO comments in the source code - threaded discussions are nice, and hard to have in code comments. Another area where they shine is architecture/prototype reviews; they are a great way to prepare for a design review meeting. And yes, I do prefer code-based architecture evaluation to UML diagrams (though the latter are useful when communicating about the architecture), because concrete coded scenarios bring up more flaws than the whiteboard - see the Non Coding Architect.

And do I need to say it? Just like any other practice, it's no silver bullet!


Speaking of silver bullets - nothing is perfect, including this approach and this post itself. Please do leave a comment pointing out any improvements or problems!




Update: Code Review During Retrospective lays out a great approach for purely education-focused code reviews; I recommend reading that too!

Friday, December 3, 2010

When in Rome, do as the Romans do

The Pragmatic Programmer gives a very popular piece of advice: learn a new language every year. The main reason behind this advice is that when you learn a new language, you learn a new way to think.

However, especially when getting started with a new one, we (inadvertently) try to understand it by comparing it to something we already know, which is likely the language we are most comfortable with, thus turning the learning experience into a square-peg-in-a-round-hole problem. Be aware of this limitation before dismissing a language or a language feature as horrible.

Some easy-to-spot examples of mistakes I have seen myself and/or others make are:
  • creating lots of simple DTO classes in Python instead of using tuples (see the sketch after this list)
  • trying to create a synchronous API for a web service call in Flex
  • creating an indexer for the immutable list in Scala
  • ...
  • ...and of course, procedural PL/SQL code with lots of loops and ifs instead of set-based operations
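
To illustrate the first bullet, here is a rough sketch of the Python idiom (collections.namedtuple here; a plain tuple would do too) next to the hand-written DTO habit carried over from Java/C#. The Point/PointDto names are invented for the example.

    from collections import namedtuple

    # The habit carried over from Java/C#: a hand-written DTO class.
    class PointDto:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    # The more idiomatic version: one line, immutable, with __eq__/__repr__ for free.
    Point = namedtuple("Point", ["x", "y"])

    p = Point(x=1, y=2)
    print(p.x, p.y)  # 1 2
    print(p)         # Point(x=1, y=2)
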
The subtle differences (idioms) can be trickier, but are worth learning about. Similar operations can behave surprisingly differently in different languages. When slicing a collection in C# and asking for an invalid range, you get an IndexOutOfRangeException, while in a functional language you likely get an empty list as the result. In Python it's called monkey patching, while Ruby people call it opening up a class.
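
To make the slicing difference concrete, here is the behavior in Python (chosen only because it's already mentioned above) - an out-of-range slice raises no exception, it is simply clamped:

    numbers = [1, 2, 3]

    # Indexing past the end raises, much like the C# indexer:
    # numbers[10]  -> IndexError

    # Slicing past the end, however, silently clamps to what's available:
    print(numbers[1:100])  # [2, 3]
    print(numbers[10:20])  # [] - an empty list, no exception raised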

Even seasoned programmers picking up a new language miss these - so if you can get someone to review your code in as much detail as was done for Uncle Bob in Clojure or Davy Brion in Ruby, do yourself a favor and ask for it!

If you are afraid of learning in public and don't have access to experienced mentors, read code and try to understand the unreadable-at-first bits - I have fond memories of the time we tried to figure out what (and why) some method in scalacheck did.

Note that I'm not saying you should not innovate; there have been (and will be!) great things coming out of concepts moving between platforms - boy, am I glad for NHibernate, and annotation-based test methods coming from NUnit to JUnit were quite welcome too. However, first understand the platform's own means to the given end before deciding you need something from another platform.

So embrace the culture of the language, and if you feel you are not learning something new in the new language, don't rest, but seek out those differences that surely are there under the hood!

Happy learning, dear reader!

Wednesday, November 10, 2010

My #citcon London 2010 experience

This was both the first citcon (apparently pronounced "kitcon") and the first open space conference I've attended, and I'm sure it won't be the last of either!

The venue was the Skillsmatter Exchange, which was simple in appearance (I guess it's a former warehouse of sorts), but it had all the space, flipcharts, tables, projectors, and seats this event needed. There was enough room to chat without disturbing sessions, or to just grab a coffee - which was sufficient for this conference (and for my needs/expectations).

Friday evening was dedicated to introductions, session proposals, topic merging, voting, and of course, some beer. It was a nice reminder that my problems are not unique - I actually stepped out of the proposals line because a number of people had already suggested topics that covered my problem. Or maybe it had more to do with the fact that I'm still not comfortable speaking in front of big audiences (more than 20-30 people make me nervous; I guess I'll just have to practice, given that there was a time when any public speaking scared me).

Saturday morning the schedule was supposedly finalized by the end of the provided breakfast. It later turned out not to be the case, so I missed a couple of sessions, though all the ones I attended were great. Having to choose from many good sessions is a much better problem than not having any good ones to choose from. Nonetheless, I'll keep this in mind for future open spaces: don't take anything for granted until you are in the session.

The first session was about database testing, versioning, and deployment. This is a topic near and dear to my heart, and it was interesting to realize the boundaries of the boxes I've been thinking in - e.g.: I've not worked in multi-version deployment scenarios, where you need to go from any version to the current one. I also learned to be more explicit in my communication: when I asked about a tool like Agitar's, but for databases, to generate characteristic tests for regression, and a few people nodded that it might be useful, I assumed everyone knew what I was talking about, which led to a misunderstanding between me and Gojko Adzic about testing with "random data". Though we clarified this after the session, I'm still wondering why I couldn't explain it during the session.

Before lunch I attended the module dependency handling session, where I learned there is no tool yet for handling versioned dependencies (i.e.: I would want to know whether the version/feature branches of a given project are compatible with all versions of the dependent projects). The discussion was nice, but I had some trouble following once things got Java-specific.

Next up was the Overcoming Organisational Defensiveness session. I thought I knew a lot about this topic, and was pleasantly surprised by all the new aspects that I learned, including, but not limited to:

  • Don't fix your problem, fix their problem
  • while not everyone agrees, in certain contexts, un-branding a practice helps (e.g.: we referenced Dan North's example from another conference of not doing pair programming, but asking two people to look into a problem together)
  • when developers don't have time to improve, bringing in a coach doesn't solve the problem. You first need to dedicate time for them to learn.
  • instead of trying to change people, step back and see if you can make some change that affects them in the ways you would like, but doesn't require their prior approval/cooperation and can be done by yourself
  • don't just do 5 whys, do the 5 "so what?" too

The next session's title was Using KPIs / Getting a Data-Driven Organization; however, the topic shifted to organizational culture and psychology. Not surprisingly, we didn't find a silver bullet, but there were a number of good laughs and ideas (transparency, exposure, accountability, responsibility, over- and under-controlling, and ownership were the main themes). It was a smaller session, but in my experience that made it much more focused and interactive compared with the bigger ones. Plus, I was surprised to learn there are people using vim on the Mac :)

The final session was Beyond Basic TDD, the essence of which boils down to TDD having a marketing message that design will just evolve, despite Kent Beck admitting that might not always be the case (i.e.: if you've got people with good design sense, no methodology will prevent them from writing well-designed code). There should be more focus on teaching programmers about design. Gojko took over from there to facilitate the discussion about what can be done to bring those familiar with TDD basics to the next level. I have to admit I became quite exhausted by this session - something to keep in mind next time when organizing sessions around a whiteboard.

Overall, it was great fun; it was good to bounce ideas off people, chat with random attendees, and talk with people I'd known only from twitter. The one thing I regret is that I had to skip the after-event pubbing (the first evening I was just way too tired, while Saturday evening I was catching up with some friends living in London). Next time I'll try to allocate one more day for the trip, because the hallway conversations were great during the event, and I would be surprised if they were any worse in the pub.

Sunday, October 31, 2010

Dealing with crunch mode

Before going any further: I think that crunch mode is not sustainable and is best avoided. I won't analyze the phenomenon or its causes; much has been written about it. However, it definitely exists, though usually for shorter periods of time. Having had this experience a number of times, I felt I should organize my thoughts for future reference (for myself, and maybe for others too).

Some of the points below might sound like good practice for software development in general - often we only start working on process problems after they've blown up in our face. Holding retrospectives after such periods can bring quite a change, so don't miss out on the opportunity! However, we haven't even started yet, so let's step back in time.

Knowing from the beginning why the close deadline is important is crucial to keeping the team motivated during this period. Working overtime for no apparent reason makes an already bad situation worse.

Have well-defined boundaries for both time and scope. This is generally good practice, but here it is a must - setting out on a death march without a clear purpose is unlikely to succeed. Break things down into small (4-8 hour) tasks and prioritize them (and I mean an ordered list, not marking all of them critical priority). If the added-up estimates already exceed the available time, cut scope. Most likely the only way it can be cut is vertically - e.g.: we will have logging, but we will work against only a single library and won't make it pluggable for this deadline. Or we won't make the logging destination configurable (a contrived example, but it helps make the point).

Taking on technical debt can help. Be sure you all understand that this does not equal crap code, unconditional copy-paste, etc. Only introduce debt knowingly (you might want to keep an actual list of the shortcuts you've taken so you know what to fix in upcoming releases).

Stop experimenting and move back to the comfort zone. Almost all projects involve some new element the team is not familiar with. If the team is just learning unit testing, they likely write the tests after they get the code to work, and take a long time to write them. In that situation, not writing tests for the time being could be the right choice. However, be careful to get the right message across - you are not dropping the practice because it slows development down, you are only rescheduling the learning period to some other time. Also, just like with performance optimizations, make these decisions after measuring - don't do it based on speculation.

On the non-technical front, you can cut back on meetings, both in number and in duration. You might also want to schedule them so they don't interrupt the workflow. Have a status meeting in the morning and at the end of the day, with a team lunch if more discussion is needed.

Work from home. Of course, some infrastructure is needed, and it might not be an option for everyone, especially for the person with a construction site next door. However, saving on the commute could give you some extra time during the day without cutting too much into your personal life. Significant Others are more understanding about you being a bit more tired if they at least get to see you.


[Scope section was updated based on @ljszalai's comment on twitter]

Friday, October 22, 2010

Evaluating software products

This topic has been on my mind for some time, having had to use products (that were chosen after evaluation) with much pain. The first reaction (well, maybe the second) to such situations is "if only I had been tasked with picking the right tool...". However, after thinking more about it, I'm not quite sure how we can evaluate products in a way that keeps everyone happy.

So when Pusztai László brought up the buy vs. build topic at the 2nd Microsoft Hungary Application Lifecycle Management Conference, I just couldn't resist asking him: how do you evaluate software products properly and effectively? Unfortunately there wasn't enough time to discuss it in detail during his presentation, and I couldn't find him to follow up during the break, so here I am, elaborating on (and trying to answer) my own questions, hoping you can correct, confirm, or add to my ideas.

The context is:
  • there is a recognized problem (that can be solved by tooling)
  • the impact of the tool is large enough to warrant an evaluation
  • there is commitment to get a tool
  • the potential tools have been narrowed down to a reasonable number of candidates
Resist the temptation to just start playing with them at this point. It's a mistake, because if we don't know what the success criteria are, the chance of success is greatly reduced. Even when we think they're obvious, let's state them explicitly - the same argument applies here as when clarifying user requirements. Try to involve as many stakeholders as possible. (Note: if the goal is to replace an existing system, be sure to explicitly list the things that are good about the current one too, just in case those aspects didn't make the list because we were too busy focusing on improvements to the current pain points!) This list will be our reminder, when we are awed by the bells, whistles, and other extra features a system brings, to check whether they solve the actual problem.

The next step varies depending on the number of people available - a big enterprise might be able to evaluate all the selected products in parallel and decide with all the results in hand. A smaller company might need to do the evaluations in sequence, and maybe stop the process once a good enough candidate is found (kind of like hiring - perfect is the enemy of good).

Beware that programmers like to get things to work (even if the things themselves wouldn't want to!). Pilot teams might go to great lengths to resolve issues during the pilot (with duct tape, if needed). If we have had to write about 70% of the features in use ourselves, should we still choose that product (think about applying upgrades!)? Also, if people are already complaining during the pilot, that's a sure sign to stop investing any more time in that product.

Solutions found during a pilot should be treated like spikes - learn from them, take those ideas for the final implementation, but most of the time, the pilot solution should not be promoted to production use as-is.

One thing that is hard to find out during the pilot is how the application scales beyond the pilot team. If some kind of partitioning (separate servers for separate teams, load balancing, etc.) is possible, that's great, but it's better to know the app's limits before it hits them. It's worth searching the support forums for information (in addition to finding the long-outstanding unresolved issues), as it is to invite people who have been using the product for a longer period to lunch to hear their war stories.

Another aspect to consider is data migration to (and from) the platform - when replacing an existing system, consider the cost and feasibility of migrating to the new one. In some cases, not migrating might be the best choice (e.g.: keep a read-only CVS instance so people can dig through the revision history, and only import the mainlines into the new system), but the descriptions of all the manual test cases for our flagship product are probably something that should be migrated.

One of the things I'm still unsure about is how many products to evaluate in a row if none seems to meet the requirements we set. Should we lower the bar and review whether we really need all the things we listed? Or is that the point when we just write our own tool?


Jason Yip has a good post on criteria for evaluating Off-The-Shelf-Software


Updates: posts I found after publishing this piece that deal with the same topic, just from other angles.

Sunday, October 10, 2010

Slides for the Continuous Delivery talk

I gave a talk on Friday at the firm's tech user group about the evolution of build/release practices towards Continuous Delivery. The slides are up under my github presentations project. It was built on top of the Continuous Integration presentation I gave at the Agile Hungary Meetup earlier this year (also on github, Hungarian slides only). Contrary to that one, here I was talking about things I haven't yet accomplished and am only looking forward to doing, which worried me a bit - however, declaring this upfront was well received, and it took the pressure off from that point on.

Though I wasn't as good as Tim Fitz (especially lacking his experience), the talk generated questions and discussion, which was the purpose, so I'm quite happy with how it went.

My only regret is that I forgot to record the talk so I could review my presentation skills.

Sunday, September 26, 2010

Don't repeat yourself, even across platforms

It's well known that duplication (not to mention triplication, quadruplication, etc.) is bad for maintainability (hence the DRY principle). I have not seen much discussion of DRY with regard to multiple platforms/programming languages, so this post is my attempt to distill my thoughts and to learn about the pros/cons of applying this principle across languages/platforms.

Why would you use multiple platforms, or do a polyglot project?

Without aiming to enumerate all possible reasons, some cases are:
  • Web applications need the same input sanity validation performed both client-side (JavaScript) for usability and server-side (whatever technology you use) for security. The same argument can be made for DTOs in any N-tier application.
  • For a portfolio of applications in the same domain, there is a need for consistency - e.g.: reference data catalog for input fields (you want to use the same terminology across the applications)
  • There can be similar logic applicable to different platforms - e.g.: some static code analysis is the same across platforms, whether we talk about Java or C# code, and it'd be nice to have just a single implementation.

Some approaches

Metadata-driven libraries - the canonical data source is stored in some kind of database, which contains a cross-platform representation of the logic (e.g.: validation logic saved as regular expressions), and you have a minimal implementation in each language that is generic and data-driven. There is only a minimal amount of code that you need to depend on in your application, which makes it a stable dependency (note I'm not talking about the code stored in the database, just the API you program against). However, the actual validation logic becomes a black box from the testing perspective, the metadata format severely limits what you can do with it, and extension points are much harder to find - e.g.: should you want to restrict your application to accept only a certain range of zip codes, how do you build that into a regexp-based, data-driven validation framework? It certainly is possible, but hard, and in many cases expanding the framework increases complexity significantly, and the data dictionary becomes hard to maintain.
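
A minimal sketch of this approach in Python, assuming the metadata has already been loaded from the shared database into a mapping of field name to regular expression (the field names and patterns below are invented for the example); each platform only needs this handful of generic driver lines:

    import re

    # Metadata as it might arrive from the shared database:
    # field name -> validation regexp (illustrative values only).
    VALIDATION_RULES = {
        "zip_code": r"^\d{5}$",
        "email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
    }

    def is_valid(field, value):
        """Generic, data-driven check: the logic lives in the data, not in code."""
        return re.match(VALIDATION_RULES[field], value) is not None

    print(is_valid("zip_code", "12345"))  # True
    print(is_valid("zip_code", "1234"))   # False - but we can't easily say why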

Plain Old X Objects* appeal more to me. They are easily debuggable and can be used in isolation and for offline development. They can be much richer (you can report all the errors encountered back to the user rather than a simple pass/fail), more readable (though you can certainly write readable regular expressions), and should you want to enhance or override the default behavior in special cases, you can do so easily. The problem, of course, is making the same logic available on multiple platforms in a unified fashion.
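
For contrast, a rough Plain Old Python Object version (class and field names are again invented): richer error reporting, and the zip code range restriction mentioned above becomes an ordinary subclass.

    class ZipCodeValidator:
        """Plain old object: debuggable, testable offline, easy to extend."""

        def errors_for(self, value):
            errors = []
            if len(value) != 5:
                errors.append("zip code must be exactly 5 characters")
            if not value.isdigit():
                errors.append("zip code must contain digits only")
            return errors  # every problem is reported, not just pass/fail

    class LocalZipCodeValidator(ZipCodeValidator):
        """The 'certain range of zip codes' special case from above."""

        def errors_for(self, value):
            errors = super().errors_for(value)
            if not errors and not (10000 <= int(value) <= 19999):
                errors.append("zip code is outside the supported region")
            return errors

    print(ZipCodeValidator().errors_for("12a4"))
    # ['zip code must be exactly 5 characters', 'zip code must contain digits only']
    print(LocalZipCodeValidator().errors_for("99999"))
    # ['zip code is outside the supported region']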

If the technology stack allows it, pick a language that runs on all parts of the stack (e.g.: Ruby/Python/JavaScript are all available standalone and on the CLR/JVM). The integration tests are fairly easy: just run the same tests on the different platforms.

If there is no suitable bridge for the stack, code generation is another option - almost the same concept as the simple metadata-driven framework above, just without the datastore: classes are generated for each of the rules to enable offline usage, debugging, etc. This comes at the additional cost of creating and evolving the code generator tool.
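
A toy sketch of that idea, reusing the hypothetical rule metadata from above and emitting Python source (a real generator would need a template per target platform):

    # Generate a plain validator class per rule from the shared metadata,
    # so applications depend on generated source files rather than on a datastore.
    RULES = {"zip_code": r"^\d{5}$"}  # illustrative metadata only

    TEMPLATE = '''import re

    class {cls}Validator:
        PATTERN = re.compile(r"{pattern}")

        def is_valid(self, value):
            return self.PATTERN.match(value) is not None
    '''

    for field, pattern in RULES.items():
        class_name = "".join(part.capitalize() for part in field.split("_"))
        source = TEMPLATE.format(cls=class_name, pattern=pattern)
        with open(field + "_validator.py", "w") as out:
            out.write(source)  # e.g. zip_code_validator.py defining ZipCodeValidator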

Testing

Regardless of the path chosen, the shared logic must be tested and documented on all platforms, which might be hard to do. Acceptance testing tools (FitNesse, Concordion, etc.) can help, or for data-driven tests (e.g.: for validation: an input string, whether it should be valid, and the expected error message) a simple test runner can be created for each of the platforms.
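
On the Python side, such a data-driven runner can be as small as a parametrized pytest test. The case table (input, expected validity, expected error fragment) would in practice be loaded from a file shared by all platforms, and the stand-in validation function below represents the shared library under test:

    import re

    import pytest

    def zip_code_errors(value):
        """Stand-in for the shared validator under test on this platform."""
        return [] if re.match(r"^\d{5}$", value) else ["not a 5 digit zip code"]

    # (input string, should be valid, expected error fragment) - platform-neutral cases.
    CASES = [
        ("12345", True, None),
        ("1234", False, "5 digit"),
        ("12a45", False, "5 digit"),
    ]

    @pytest.mark.parametrize("value,should_be_valid,expected_error", CASES)
    def test_zip_code_rules(value, should_be_valid, expected_error):
        errors = zip_code_errors(value)
        if should_be_valid:
            assert errors == []
        else:
            assert any(expected_error in error for error in errors)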

(Potential) Problems

  • Diversity vs. monoculture. The library becomes a single point of failure, and any bug has far-reaching consequences. On the other hand, the reverse is also true: any problem has to be fixed only once, and the benefits can be reaped by everyone who uses the library. However, there might be fewer people looking at the shared domain for corner cases...
  • Shared dependency overhead - shared libraries can slow down development both for the clients of the library and the library itself. Processes for integration must be in place, etc. Gojko Adzic has a great post on shared library usage.
  • False sense of security - users of the library might assume that's all they need to do and not think through every problem as carefully. E.g.: a DTO validation library might be confused with entity/business rule validation
  • Ayende recently wrote a post about Maintainability, Code Size & Code Complexity that is (slightly) relevant to this discussion ("The problem with the smaller and more complex code base is that the complexity tends to explode very quickly."). In my reading the points are more applicable to the data-driven (or code-generated-from-data) approach, where that smart framework becomes overly complex and fragile. Note that he talks about a single application, and it's known that when dealing with a portfolio (NHProf, EFProf, etc.), he chose to use a single base infrastructure.

Have you done something similar in practice? What are your thoughts and experiences? Or have you been thinking about this very same topic? What have I missed? What have I misunderstood? Let me know!