Friday, December 3, 2010

When in Rome, do as the Romans do

The Pragmatic Programmer gives a very popular piece of advice: learn a new language every year. The main reason behind this advice is that when you learn a new language, you learn a new way to think.

However, especially when getting started with a new one, we (inadvertently) try to understand it by comparing it to something we already know, which is likely the language we are most comfortable with, thus turning the learning experience into a square peg-round hole problem. Be aware of this limitation before dismissing a language or a language feature as something horrible.

Some easy to spot examples of mistakes I have seen myself and/or others make are:
  • creating lots of simple DTO classes in Python instead of using tuples
  • trying to create a synchronous API for a web service call in Flex
  • creating an indexer for the immutable list in Scala
  • ...
  • ...and of course, procedural PL/SQL code with lots of loops and ifs instead of set-based operations
The subtle differences (idioms) can be trickier, but are worth learning about. Similar operations can behave surprisingly differently in different languages. When splitting a collection in C# and asking for an invalid slice range, you get an IndexOutOfRangeException, while in a functional language you likely get an empty list as the result. In Python it's called monkey patching, while Ruby people call that opening up a class.
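
To make two of these points concrete, here is a small Python sketch: a namedtuple standing in for a hand-written DTO class, and the difference between indexing and slicing past the end of a list. The `Point` name and the sample values are made up for illustration.

```python
from collections import namedtuple

# A lightweight record instead of a hand-written DTO class.
# "Point" and its fields are illustrative names only.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)
x, y = p  # unpacks like a plain tuple

numbers = [1, 2, 3]

# Slicing past the end is not an error in Python: it yields an empty list.
out_of_range_slice = numbers[10:20]

# Indexing past the end, on the other hand, raises IndexError.
try:
    numbers[10]
    index_failed = False
except IndexError:
    index_failed = True
```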

Even a seasoned programmer picking up a new language misses these - so if you can get someone to review your code in such detail as it's been done for Uncle Bob in Clojure or Davy Brion in Ruby, do yourself a favor, and ask for it!

If you are afraid of learning in public, and don't have access to experienced mentors, read code, and try to understand the unreadable-at-first bits - I have fond memories of the time we tried to figure out what some method in scalacheck did (and why).

Note that I'm not saying you should not innovate: there have been (and will be!) great things coming out of concepts moving between platforms - boy, I'm glad for NHibernate, and annotation-based test methods coming from NUnit to JUnit were quite welcome too. However, first understand the platform's own means to the given end before deciding you need something from another platform.

So embrace the culture of the language, and if you feel you are not learning something new in the new language, don't rest, but seek out those differences that surely are there under the hood!

Happy learning, dear reader!

Wednesday, November 10, 2010

My #citcon London 2010 experience

This was both my first citcon (apparently pronounced "kitcon") and the first open space conference I've attended, and I'm sure it won't be the last of either!

The venue was the Skillsmatter Exchange, which was simple in appearance (guess it's a former warehouse of sorts), but with enough space, flipcharts, tables, projectors, and seats for this event. There was enough space to chat without disturbing sessions, or to just grab a coffee - which seemed to be sufficient for this conference (and for my needs/expectations).

Friday evening was dedicated to introductions, session proposals, topic merging, voting, and of course, some beer. It was a nice reminder that my problems are not unique - I actually stepped out of the proposals line, given that a number of people had already suggested topics that covered my problem. Maybe it had more to do with the fact that I'm still not comfortable speaking in front of big audiences (more than 20-30 people make me nervous; guess I'll just have to practice, given that there was a time when any public speaking used to scare me).

Saturday morning the schedule was supposedly finalized by the end of the breakfast provided. Later on it turned out not to be the case, so I missed a couple of sessions, though all the ones I attended were great. Having trouble choosing from many good sessions is much better than not having any good ones to choose from. Nonetheless, I'll keep in mind for future open spaces not to take anything for granted until you are in the session.

The first session was about database testing, versioning, and deployment. This is a topic near and dear to my heart, and it was interesting to realize the boundaries of the boxes I've been thinking in - e.g.: I've not been working in multiversion deployment scenarios, where you need to go from any version to the current. I've also learned to be more explicit in my communication, e.g.: when I asked about a tool like Agitar's for databases to generate characteristic tests for regression, just because a few people nodded that it might be useful, I assumed everyone knew what I was talking about, which led to a misunderstanding between me and Gojko Adzic about testing with "random data". Though we clarified this after the session, I'm wondering why I couldn't explain it during the session.

Before lunch, I attended the module dependency handling session, where I learned there is no tool yet for handling versioned dependencies (i.e.: I would want to know if the version/feature branches of a given project are compatible with all versions of dependent projects). The discussion was nice, but I had some trouble following once things got Java specific.

Next up was the Overcoming Organisational Defensiveness session. I thought I knew a lot about this topic, and was pleasantly surprised by all the new aspects that I learned, including, but not limited to:

  • Don't fix your problem, fix their problem
  • while not everyone agrees, in certain contexts, un-branding a practice helps (e.g.: we referenced Dan North's example from another conference of not doing pair programming, but asking two people to look into a problem together)
  • when developers don't have time to improve, bringing in a coach doesn't solve the problem. You first need to dedicate time for them to learn.
  • instead of trying to change people, step back and see if you can create some change that would affect them in ways you would like, but doesn't require their prior approval/cooperation and that you can do yourself
  • don't just do 5 whys, do the 5 "so what?" too

The next session's title was Using KPIs/Getting a data-driven organization; however, the topic shifted to organization culture and psychology. Not surprisingly, we didn't find a silver bullet, but there were a number of good laughs and ideas (transparency, exposure, accountability, responsibility, over- and undercontrolling, and ownership were the main themes). It was a smaller session, but in my experience that made it much more focused and interactive compared with the bigger sessions. Plus, I was surprised to learn there are people using vim on the Mac :)

The final session was Beyond Basic TDD, where the essence boils down to TDD having a marketing message that design will just evolve, despite Kent Beck admitting that might not always be the case (i.e.: if you've got people with good design sense, no methodology will prevent them from writing well designed code). There should be more focus on teaching programmers about design. Gojko took over from there to facilitate the discussion around what can be done to bring those familiar with TDD basics to the next level. I have to admit I had become quite exhausted by this session, something to keep in mind next time when organizing sessions around a whiteboard.

Overall, it was great fun: it was good to bounce ideas off people, chat with random ones, and to talk with people known only from twitter. The one thing I regret is that I had to skip the after-event pubbing (the first evening I was just way too tired, while Saturday evening I was catching up with some friends living in London). Next time I'll try to allocate one more day for the trip, because the hallway conversations were great during the event, and I would be surprised if they were any worse in the pub.

Sunday, October 31, 2010

Dealing with crunch mode

Before going any further: I think that crunch mode is not sustainable and is best avoided. I won't analyze the phenomenon or its causes; much has been written about it. However, it definitely exists, though usually for shorter periods of time. Having had this experience a number of times, I felt I should organize my thoughts for future reference (for myself, and maybe for others too).

Some of the points below might sound like good practice for software development in general - often we only start working on process problems after they've blown up in our faces. Holding retrospectives after such periods can bring quite a change, so don't miss out on the opportunity! However, we haven't even started yet, so let's step back in time.

Knowing why the close deadline is important from the beginning is crucial to keep the team motivated during this period. Working overtime for no apparent reason makes the already bad situation worse.

Have well defined boundaries for both time and scope. This is generally a good practice, but here it is a must - setting out on a death march without a clear purpose is unlikely to succeed. Break things down into small (4-8 hours) tasks and prioritize them (and I mean an ordered list, not making all of them critical priority). If the summed estimates already exceed the available time, cut scope. Most likely the only way it can be cut is vertically - e.g.: we will have logging, but we will work against only a single library, and won't make it pluggable for this deadline. Or we won't make the logging destination configurable (though it's a contrived example, it helps make the point).

Taking on technical debt can help. Be sure you all understand that this does not mean crap code, unconditional copy-paste, etc. Only introduce debt knowingly (you might want to keep an actual list of the shortcuts you have taken so you know what to fix in the upcoming releases).

Stop experimenting and move back to the comfort zone. Almost all projects involve some new element that the team is not familiar with. If the team is just learning unit testing, they likely write the tests after they got the code to work, and are taking a long time to write them. In that situation, not writing tests for the time being could be the right choice. However, be careful to get the right message across - you are not dropping the practice because it slows development down, but only rescheduling the learning period to some other time. Also, just like with performance optimizations, make these decisions after measuring - don't do it based on speculation.

On the non-technical front, you can cut back on meetings, both in numbers and in duration. You might also want to schedule them so they don't interrupt the workflow. Have a status in the morning and at the end of the day, with a team lunch if more discussion is needed.

Work from home. Of course, some infrastructure is needed, and it might not be an option for everyone, especially for someone living next door to a construction site. However, saving on the commute could give you some extra time during the day without cutting too much into your personal life. Significant Others are more understanding of you being a bit more tired if they at least get to see you.

[Scope section was updated based on @ljszalai's comment on twitter]

Friday, October 22, 2010

Evaluating software products

This topic has been on my mind for some time, having had to use products (that were chosen after evaluation) with much pain. The first reaction (well, maybe the second) to such situations is the "if only I had been tasked with picking the right tool...". However, after thinking more about it, I'm not quite sure how we can evaluate products to keep everyone happy.

So when Pusztai László brought up the buy vs. build topic at the 2nd Microsoft Hungary Application Lifecycle Management Conference, I just couldn't resist asking him - how do you evaluate software products properly and effectively? Unfortunately there wasn't enough time to discuss it in enough detail during his presentation, and I couldn't find him to follow up on it during the break, so here I am, elaborating (and trying to answer) my own questions, hoping you can correct, confirm, or add to my ideas.

The context is:
  • there is a recognized problem (that can be solved by tooling)
  • the impact of the tool is large enough to warrant an evaluation
  • there is commitment to get a tool
  • the potential tools have been narrowed down to a reasonable number of candidates
Resist the temptation to just start playing with them at this point. It's a mistake, because if we don't know what the success criteria are, the chance to succeed is greatly reduced. Even when we think they are obvious, let's state them explicitly - the same argument applies here as when clarifying user requirements. Try to involve as many stakeholders as possible. (Note: if the goal is to replace an existing system, be sure to list explicitly the things that are good about the current one too, just in case those aspects didn't make the list because we were too busy focusing on the improvements for the current pain points!) This list will be the reminder, when we are awed by the bells, whistles, and other extra features a system brings, to check whether they solve the actual problem.

The next step varies depending on the number of people available - a big enterprise might be able to evaluate all the selected products in parallel and decide once all the results are in. A smaller company might need to do the evaluation in sequence, and maybe stop the eval process once a good enough candidate is found (kind of like hiring - perfect is the enemy of the good).

Beware that programmers like to get things to work (even if the things themselves wouldn't want to!). Pilot teams might go to great lengths to resolve issues during the pilot (with duct tape, if needed). If we have written about 70% of the features in use ourselves, should we choose that product (think about applying upgrades!)? Also, if people are complaining already during the pilot, that's a sure sign to stop investing any more time in that product.

Solutions found during a pilot should be treated like spikes - learn from them, take those ideas for the final implementation, but most of the time, the pilot solution should not be promoted to production use as-is.

One thing that is hard to find during the pilot is how the application scales beyond the pilot team. If some kind of partitioning (separate servers for separate teams, load balancing, etc.) is possible, that's great, but better to know the limits of the app before it hits them. It's worth searching the support forums (in addition to finding the long outstanding unresolved issues) for information, as is to invite people for lunch who have been using this product for a longer period of time to hear the war stories.

Another aspect to consider is data migration to (and from) the platform - when replacing an existing system, consider the cost/feasibility of migration to the new one. In some cases, not migrating might be the best choice (e.g.: keep a read-only CVS instance so people can dig back the revision history, and only import the mainlines into the new system), but the descriptions for all the manual test cases for our flagship product are probably something that should be migrated.

One of the things I'm still unsure about is how many products to evaluate in a row if none seems to meet the requirements we set. Should we lower the bar, and review whether or not we really need all the things we listed? Or is that the point when we just write our own tool?

Jason Yip has a good post on criteria for evaluating Off-The-Shelf-Software

Updates: posts that I've found after publishing this piece that deal with the same topic, but just some other aspects.

Sunday, October 10, 2010

Slides for the Continuous Delivery talk

I gave a talk on Friday at the firm tech user group about the build/release evolution towards Continuous Delivery. The slides are up under my github presentations project. It was built on top of the Continuous Integration presentation I gave at the Agile Hungary Meetup earlier this year (also on github, Hungarian slides only). Unlike that one, here I was talking about things I haven't yet accomplished and am only looking forward to doing, which worried me a bit - however, declaring this upfront was well received and it didn't pressure me from that point on.

Though I wasn't as good as Tim Fitz (especially lacking the experience) the talk has generated questions and discussion, which was the purpose, so I'm quite happy with how it went.

My only regret is forgetting to record the presentation to review my presentation skills.

Sunday, September 26, 2010

Don't repeat yourself, even across platforms

It's well known that duplication (not even mentioning triplication, quadruplication, etc.) is bad for maintainability (hence the DRY principle). I have not seen much talk around DRY with regard to multiple platforms/programming languages, so this post is my attempt to distill my thoughts and to learn about the pros/cons of applying this principle across languages/platforms.

Why would you use multiple platforms, or do a polyglot project?

Without the goal of enumerating all possible reasons, some cases are:
  • Web applications need the same input sanity validation performed both client side for usability (JavaScript) and server side (whatever technology you use) for security. The same argument can be made for any N-tier application for DTOs.
  • For a portfolio of applications in the same domain, there is a need for consistency - e.g.: reference data catalog for input fields (you want to use the same terminology across the applications)
  • There can be similar logic applicable to different platforms - e.g.: some static code analysis is the same across platforms, whether or not we talk about Java or C# code, and it'd be nice to have just a single implementation.

Some approaches

Metadata-driven libraries - the canonical data source is stored in some kind of a database, which contains a cross-platform implementation of the logic (e.g.: validation logic saved as regular expressions), and you have a minimal implementation in each language that is generic and data driven. There is a minimal amount of code that you need to depend on in your application, which makes it a stable dependency (note I'm not talking about the code stored in the database, just the API you program against). However, the actual validation logic becomes a black box from the testing perspective, the metadata format severely limits what you can do with it, and extension points are much harder to find - e.g.: should you want to restrict your application to only accept a certain range of zip codes, how do you build that into a regexp-based data-driven validation framework? It certainly is possible, but hard, and in many cases expanding the framework increases complexity significantly, and the data dictionary becomes somewhat hard to maintain.
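
A minimal sketch of what the per-language part of such a metadata-driven validator could look like in Python. The `RULES` catalog, its field names, and the patterns are all made up for illustration; in the approach described above they would be loaded from the shared datastore rather than inlined.

```python
import re

# Hypothetical rule catalog. In the approach described above this would be
# loaded from the shared database; it is inlined here so the sketch runs.
RULES = {
    "zip_code": {"pattern": r"\d{5}", "message": "zip code must be 5 digits"},
    "username": {"pattern": r"[a-z][a-z0-9_]{2,15}", "message": "invalid username"},
}


def validate(field, value, rules=RULES):
    """Return None if value matches the rule for field, else the rule's message."""
    rule = rules[field]
    if re.fullmatch(rule["pattern"], value):
        return None
    return rule["message"]
```

The generic part stays tiny, but note that restricting zip codes to a numeric range would have to be squeezed into the pattern itself, which is exactly the kind of extension that gets awkward.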

Plain Old X Objects* appeal more to me. They are easily debuggable, can be used in isolation and offline development. They can be much richer (you can have all the errors encountered reported back to the user rather than a pass/fail), more readable (though certainly you can write readable regular expressions), and should you want to enhance/override the default behavior in special cases, you can easily override them. The problem of course is to make the same logic available on multiple platforms, in a unified fashion.
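
As an illustration only, a plain-object validator in Python might look like the sketch below. The class names, messages, and the zip code range are invented for the example; the point is that all errors are reported and that a special case is an ordinary subclass.

```python
class ZipCodeValidator:
    """Plain object: collects every problem instead of a bare pass/fail."""

    def validate(self, value):
        errors = []
        if not value.isdecimal():
            errors.append("zip code must contain only digits")
        if len(value) != 5:
            errors.append("zip code must be 5 characters long")
        return errors


class RegionalZipCodeValidator(ZipCodeValidator):
    """Special case: only accept a (made-up) range of zip codes."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def validate(self, value):
        errors = super().validate(value)
        # Only check the range once the basic format is known to be fine.
        if not errors and not (self.low <= int(value) <= self.high):
            errors.append(f"zip code must be between {self.low} and {self.high}")
        return errors
```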

If the technology stack allows it, pick a language that runs on all parts of the stack (e.g.: Ruby/Python/JavaScript are all available as standalone and for the CLR/JVM.) The integration tests are fairly easy, just run the same tests with the different platforms.

If there is no suitable bridge for the stack, code generation is one option, which is almost the same concept as the metadata-driven simple framework above, just taking out the datastore and generating classes for each of the rules to enable offline usage, debugging, etc. This has the additional cost of creating and evolving the code generator tool.


Regardless of the path chosen, the shared logic must be tested and documented on all platforms, which might be hard to do. Acceptance testing tools (fitnesse, concordion, etc.) can help, or for the data-driven tests (e.g.: for validation, input string, should be valid, expected error message) a simple testrunner can be created for each of the platforms.
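
A rough sketch of such a simple testrunner in Python, assuming the shared cases are rows of (input string, should be valid, expected error message). The `CASES` rows and the `validate_zip` stand-in are hypothetical; each platform would plug in its own implementation and read the same rows from a shared file.

```python
# Shared test cases: (input string, should be valid, expected error message).
# In practice these rows would live in a file every platform's runner reads;
# they are inlined here, and the values are invented for illustration.
CASES = [
    ("12345", True, None),
    ("1234", False, "zip code must be 5 digits"),
    ("abcde", False, "zip code must be 5 digits"),
]


def validate_zip(value):
    """Stand-in for the platform's implementation of the shared rule."""
    if len(value) == 5 and value.isdecimal():
        return None
    return "zip code must be 5 digits"


def run_cases(validator, cases):
    """Return a list of human-readable failures; empty means all cases passed."""
    failures = []
    for value, should_be_valid, expected_message in cases:
        message = validator(value)
        is_valid = message is None
        if is_valid != should_be_valid or message != expected_message:
            failures.append(f"{value!r}: got {message!r}, expected {expected_message!r}")
    return failures
```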

(Potential) Problems

  • Diversity vs. monoculture. The library becomes a single point of failure, and any bug has far reaching consequences. On the other hand, the reverse is true: any problem has to be fixed only once, and the benefits can be reaped by all that use the library. However, there might be fewer people looking at the shared domain for corner cases...
  • Shared dependency overhead - shared libraries can slow down development both for the clients of the library and the library itself. Processes for integration must be in place, etc. Gojko Adzic has a great post on shared library usage.
  • False sense of security - users of the library might assume that's all they need to do and not think through every problem so carefully. E.g.: DTO validation library might be confused with entity/business rules validation
  • Ayende has recently written a post about Maintainability, Code Size & Code Complexity that is (slightly) relevant to this discussion ("The problem with the smaller and more complex code base is that the complexity tends to explode very quickly."). In my reading the points are more applicable for the data-driven (or from there code generated) approach, where that smart framework becomes overly complex and fragile. Note he talks about a single application, and it's known that when dealing with a portfolio (NHProf, EFProf, etc.), he chose to use a single base infrastructure.

Have you done something similar in practice? What are your thoughts and experiences? Or have you been thinking about this very same topic? What have I missed? What have I misunderstood? Let me know!

Monday, August 30, 2010

Executable bug tracker

Disclaimer: I have no practical experience (yet) with the concept I describe below; it is a "thinking out loud" kind of post. The context is a team working on a product in the maintenance/legacy phase of its life-cycle, with developers who are already comfortable with automated testing.1

It will be about the small, nice-to-have priority bugs/known issues. The ones that never get formally prioritized in any of the releases, because there are always more important features and issues. The ones that you record in your issue tracker, to keep your conscience at peace; and which will be closed as "won't fix" at the end.

Some advocate2 that you should just save yourself the trouble of maintaining these bugs at all, and not bother recording them until clients/management push for it.

To clarify: I'm not against clients choosing not to prioritize these issues. However, I would love to find a way to give these issues a chance to be fixed, without compromising delivery of business features.

One of the contributing factors why these issues don't get fixed (IMHO, of course :)) is that it takes a lot of effort to actually find a bug to fix when you have some slack time. You have to search through your tracker for open bugs, scan them to pick one, build up the context to actually begin to work on it (aka.: getting into the zone), etc. All this makes it too much of a hassle when all you have is a spare few minutes: you would be happy to fix an issue near the module you are currently working on, but not with this extra burden added.

A possible solution is to have a collection of automated tests reproducing the bugs, with asserts that fail on the current codebase. These tests live separate from the main test suite (extra jar/DLL, categories, namespace, etc.), but live together in the IDE with the app (to aid refactoring). There could even be a custom test runner or an additional step in the build process to notify you if any of these bugs are fixed - you might even fix one accidentally.
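
One way this could be sketched in Python is with the standard library's `unittest.expectedFailure` decorator, which marks a test as a known failure and reports an "unexpected success" once the bug goes away. The `format_price` function and its bug are invented for the example; this is an assumption about how the idea might be wired up, not a tested process.

```python
import unittest


def format_price(amount):
    """Made-up application code with a known, low-priority bug:
    negative amounts are rendered as '$-5.00' instead of '-$5.00'."""
    return f"${amount:.2f}"


class KnownBugs(unittest.TestCase):
    """Lives apart from the main suite; every test documents an open bug."""

    @unittest.expectedFailure
    def test_negative_price_puts_sign_before_currency(self):
        self.assertEqual(format_price(-5), "-$5.00")


def run_known_bugs():
    """Run only the known-bug tests and return the raw result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KnownBugs)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

While the bug exists, the test lands in `result.expectedFailures`; if somebody fixes it (even accidentally), it moves to `result.unexpectedSuccesses`, which a build step could use as the notification.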

With such a setup we can rely on static code analysis to find bugs in the area of the code we are about to start working on/just finished with, thus lowering the cost for one to begin working on a bug. Even if one won't fix it straight away, the test could simply be improved upon (remember the Boy Scout rule?).

The one concern I have is with the recording phase of this process - many a time the most costly part of fixing a bug is actually finding a way to reproduce it :) However, if the original "bug report" is the programmatic equivalent of "open this form, enter these values, then right click and observe the application crash", it might not add a noticeable overhead (especially in comparison to filing a bug report in the issue tracker).

1. or open source projects
2. see disclaimer - it might actually make sense if working on a well kept codebase, with frequent releases.

Thursday, August 19, 2010

Executable documentation

It's good to see build and release automation becoming more and more common, but I'm curious to see whether this wave of automation will stop at just releasing or flow over to other areas of software development, and change the attitude in general.

Automated testing and continuous integration took some time to spread. Software developers spend their days automating mundane tasks so clients can focus on adding value, and thus should have been easily convinced, yet many have been (and some still are) opposed to the idea of automating the mundane tasks that they perform themselves. Even so, I'm hoping that one of my pet peeves - stale documentation - will become rarer and rarer as automation becomes more mainstream.

Below are just some document types that could be made live and executable:
  • Specifications. I'm not the first to suggest this, acceptance testing, tools, and books have been around, but haven't caught on yet. I've been introduced to this concept by Gojko Adzic, and I can recommend his past talks/videos or books for getting started on this topic.
  • New developer getting started instructions. In addition to local machine setup (though the approach Tamas Sipos described of using virtual machines per project is even better than scripting it), this usually includes gaining access to all sorts of different file shares, web services, machines, databases, mapping them to proper local names, and so forth. This is usually presented in the form of a list, where you actually copy-paste it into the command line. There is no reason this couldn't be scripted. Mirroring access from an existing developer is sometimes easier than keeping the setup scripts up to date.
  • Revoking access from departing developers. This might be more applicable to bigger enterprise environments, but it is just as important as setting up a new developer. Script it.
  • Installation instructions, and fixlogs/workarounds for 3rd party applications (or even your own applications). These are the ones that warn you to only ever run this script as a given user. Or from a particular machine. And to execute the following commands, containing loops and decision branches, written in plain text. And to make SQL calls, send xml/json messages, where you just need to substitute <this> with the current value, etc. Script them, and reduce the document to a single instruction - execute with the following two parameters.
  • Code review guidelines, coding standards. Naming conventions, indentations, method length/complexity, all sorts of other static code analysis (the domain shouldn't call the UI code directly! We shouldn't have commented out code! There should be no Windows Forms controls with Visible = false never ever changed elsewhere to true in the class! etc.) should not be done by hand if it can be automated - and there are quite a number of mature tools out there, all extensible, such as StyleCop, FxCop, Checkstyle, FindBugs, xDepend. Focus code reviews on the more important things.
  • Data flow diagrams. For the live, production system, you are better off generating this dependency graph from the scheduling tool you use, which makes it surely represent production, as opposed to the manually maintained Visio diagram or similar.
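
To illustrate the coding standards item, here is a small sketch of a method-length check written with Python's standard `ast` module. It is a toy stand-in for what the mature tools listed above do for their platforms; the 20-line limit and the function name are arbitrary choices for the example, and `end_lineno` needs Python 3.8 or later.

```python
import ast

MAX_FUNCTION_LINES = 20  # arbitrary limit chosen for this sketch


def find_long_functions(source, max_lines=MAX_FUNCTION_LINES):
    """Return (name, line_count) for every function longer than max_lines."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count lines from the "def" line to the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders
```

A check like this can run in the build, so the written guideline shrinks to "the build must pass".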
Hope it was inspiring :) Do you know more document types I have missed?

Saturday, August 7, 2010

On hiring programmers - writing code before the interview

What prompted this post was this job ad for an experienced web developer by Netpositive and the discussions that it prompted - and the realization that there is no way I can explain my view within twitter's limitations.

I liked the ad because as a prerequisite for being invited to an interview, applicants are required to write a little web application (regularly read from an RSS feed, store it locally, display the posts on a page (with paging), and add Facebook's "like" functionality to the page).

Being able to conduct an interview based on code the candidate has written in her own time before the interview has the following benefits:
  • the interview-fever problem is eliminated - some smart people can't solve simple problems during the interview they would be able to do in a minute under normal conditions
  • those that can talk the talk but can't apply the theoretical knowledge to real problems don't get past this filter
  • those that cannot sell themselves during the interview but are good programmers can be considered
  • as an interviewer, I can focus on what the candidate knows, and can ask them to suggest ways to implement new features in an application they are already familiar with
  • it is more fair for people who might not know the jargon and terminology (though they certainly should learn it later), but are good at programming
  • you can learn a lot that might not be uncovered in a regular interview, e.g.: how likely the candidate is to reinvent the wheel rather than looking for existing solutions
  • the interviewer can screen applicants for requirement analysis if needed - just give an ambiguous enough spec
  • those candidates, who just want to get a job rather than a job at the given company are likely not going to apply because of the extra effort required here. Some great candidates will not apply either; however, I think that is an OK risk to take.
The hardest thing with this approach is picking the problem to be solved -
  • should be big enough, not just a one off script, but something where some design is required, so the interviewers can learn about the candidate
  • should be small enough so a good candidate can complete it in only a few hours;
  • should be a problem relevant to the job the opening is for, not another hello world program;
  • should be such that it's not a problem if it's googleable - we all google all the time, the important part is that the candidate should demonstrate an understanding of the googled solution
  • should obviously not be real work for the company - I would not want to apply to a company that wants to use my work for free to deliver to their customers.
I'm not even going to attempt to give a silver bullet solution that satisfies all the above, because - as everything in the software field - it depends on your context. However, the below ideas could be used as starting points:
  • problems from online programming competition archives, e.g.: UVa Online Judge, Sphere Online Judge, etc.
  • dedicated online screening applications, like Codility
  • using tasks (bugfix, new features, etc.) from OSS projects. Yes, it is free work in a sense, but it contributes to the applicant's resume and makes the world a better place! :)