Friday, October 22, 2010

Evaluating software products

This topic has been on my mind for some time, having had to use products (that were chosen after evaluation) with much pain. The first reaction (well, maybe the second) to such situations is "if only I had been tasked with picking the right tool...". However, after thinking more about it, I'm not quite sure how we can evaluate products in a way that keeps everyone happy.

So when Pusztai László brought up the buy vs. build topic at the 2nd Microsoft Hungary Application Lifecycle Management Conference, I just couldn't resist asking him - how do you evaluate software products properly and effectively? Unfortunately there wasn't time to discuss it in sufficient detail during his presentation, and I couldn't find him to follow up during the break, so here I am, elaborating on (and trying to answer) my own questions, hoping you can correct, confirm, or add to my ideas.

The context is:
  • there is a recognized problem (that can be solved by tooling)
  • the impact of the tool is large enough to warrant an evaluation
  • there is commitment to get a tool
  • the potential tools have been narrowed down to a reasonable number of candidates
Resist the temptation to just start playing with them at this point. That's a mistake: if we don't know what the success criteria are, the chance of succeeding is greatly reduced. Even when we think they're obvious, let's state them explicitly - the same argument applies here as when clarifying user requirements. Try to involve as many stakeholders as possible. (Note: if the goal is to replace an existing system, be sure to explicitly list the things that are good about the current one too, just in case those aspects didn't make the list because we were too busy focusing on improvements for the current pain points!) This list will be our reminder, when we are awed by the bells, whistles, and other extra features a system brings, to check whether it solves the actual problem.
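To keep those criteria from remaining an implicit mental checklist, it can help to turn them into a simple weighted scorecard - a spreadsheet works, or a few lines of script. Below is a minimal sketch in Python; the criteria, weights, and ratings are hypothetical, made up for illustration rather than a recommended set.

    # Minimal sketch of a weighted evaluation scorecard.
    # The criteria, weights, and ratings are hypothetical - the point is to
    # write the success criteria down *before* the pilots start.

    CRITERIA = {                              # weight = importance to stakeholders
        "solves the current pain point": 5,
        "keeps what works today":        4,   # don't lose the good parts of the old system
        "integration effort":            3,
        "licensing cost":                2,
    }

    def score(candidate, ratings):
        """ratings: criterion -> 0..5 rating gathered during the pilot."""
        total = sum(CRITERIA[c] * ratings.get(c, 0) for c in CRITERIA)
        maximum = 5 * sum(CRITERIA.values())
        return candidate, round(100 * total / maximum)  # percentage of the maximum

    if __name__ == "__main__":
        print(score("Tool A", {"solves the current pain point": 4,
                               "keeps what works today": 2,
                               "integration effort": 3,
                               "licensing cost": 5}))   # -> ('Tool A', 67)

The numbers matter far less than the act of agreeing on the weights with the stakeholders before the first vendor demo.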

The next step varies depending on the number of people available - a big enterprise might be able to evaluate all the selected products in parallel and decide with all the results in hand. A smaller company might need to do the evaluations in sequence, and maybe stop the process once a good enough candidate is found (a bit like hiring - the perfect is the enemy of the good).
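The "good enough" strategy can even be written down as an explicit stopping rule, so the team agrees up front on when to stop piloting. A small sketch, again with hypothetical names and an arbitrary threshold:

    # Sequential evaluation with an explicit "good enough" bar: pilot the
    # candidates one at a time and stop at the first that clears the threshold.
    # evaluate() stands in for running an actual pilot and returning a score
    # (e.g. the scorecard percentage from the previous sketch).

    GOOD_ENOUGH = 75  # percent of the maximum weighted score, agreed up front

    def pick_sequentially(candidates, evaluate, threshold=GOOD_ENOUGH):
        best = None
        for candidate in candidates:
            result = evaluate(candidate)        # run the pilot
            if result >= threshold:
                return candidate, result        # good enough - stop evaluating
            if best is None or result > best[1]:
                best = (candidate, result)
        return best                             # nothing cleared the bar; best so far

    if __name__ == "__main__":
        pilot_scores = {"Tool A": 62, "Tool B": 81, "Tool C": 90}  # hypothetical
        print(pick_sequentially(pilot_scores, pilot_scores.get))   # ('Tool B', 81); Tool C is never piloted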

Beware that programmers like to get things to work (even if the things themselves wouldn't want to!). Pilot teams might go to great lengths to resolve issues during the pilot (with duct tape, if needed). If we end up having written about 70% of the features in use ourselves, should we still choose that product (think about what happens when it's time to apply upgrades!)? Also, if people are already complaining during the pilot, that's a sure sign to stop investing any more time in that product.

Solutions found during a pilot should be treated like spikes - learn from them, take those ideas for the final implementation, but most of the time, the pilot solution should not be promoted to production use as-is.

One thing that is hard to find out during a pilot is how the application scales beyond the pilot team. If some kind of partitioning (separate servers for separate teams, load balancing, etc.) is possible, that's great, but it's better to know the limits of the app before we hit them. It's worth searching the support forums for information (and for long-outstanding unresolved issues), and worth inviting people who have been using the product for a longer period of time to lunch to hear their war stories.

Another aspect to consider is data migration to (and from) the platform - when replacing an existing system, consider the cost and feasibility of migrating to the new one. In some cases, not migrating might be the best choice (e.g. keep a read-only CVS instance around so people can dig back into the revision history, and only import the mainlines into the new system), but the descriptions of all the manual test cases for our flagship product are probably something that should be migrated.

One of the things I'm still unsure about is how many products to evaluate in a row if none seems to meet the requirements we set. Should we lower the bar and review whether we really need all the things we listed? Or is that the point where we just write our own tool?


Jason Yip has a good post on criteria for evaluating Off-The-Shelf-Software


Updates: posts that I've found after publishing this piece that deal with the same topic, but from other angles.