Thursday, March 1, 2012

Book Review - Programming Collective Intelligence by Toby Segaran

Book cover photo Disclaimer: I received a free (electronic) copy of this ebook (Programming Collective Intelligence by Toby Segaran) from O'Reilly as part of the O'Reilly Blogger Review Program, which also requires me to write a review about it. That aside, I would have purchased this book this year anyway, and would have reviewed it on this blog too.

About me and why I read this book


I've been programming professionally for ~7.5 years, mainly business applications and reporting, so I already have quite some love for data. While I haven't used math much in my day jobs, I liked (and was good at) it in high school, including taking extra classes - so I have learned basic statistics. Refreshing and advancing my data analytics skills is one of my goals this year, and reading this book was part of the plan.


About the book



The book introduces lots of algorithms that can be used to gain new insight into any kind of data one might come across. The explanations are broken up into digestible chunks, and are supported by great visualizations. While understanding of the previous chunks is required for the later ones, this allowed me to read through most of the book on the train to and from work.


Each of the algorithms is illustrated with real world application examples, and examples where applying them doesn't make sense are brought too. The exercises at the end of the chapters are applied and not purely theoretical - and coming up with exercises from the domain I work with every day was pretty easy! The book is really inspiring, which is great for an introductory book!


In addition to the well written, gradual introduction, the book has a concise algorithm reference at the end, so when one needs a quick refresher, there is no need to wade through the lengthy tutorials.


While the prose and the logic of the explanations are great, I have found the code samples hard to follow: really short, cryptic variable names; leaky abstractions; inconsistent coding style just to name a few. Some code samples are actually incorrect implementations of the given algorithm and there are antipatterns like string sql concatenation in the code without a warning comment to the reader to remind them it's a bad practice.


Nonetheless, it is great to have actual code to play with, just the initial reading and reviewing of it requires some extra effort.

The book claims that you don't need previous Python knowledge to understand the code samples, which I can't confirm (I use Python at my day job), but I wouldn't be surprised if not knowing Python could make understanding the code even more difficult (I've actually learned a few new language features from the samples!). Also, the Python language has come a long way since 2.4, which is the version used in the book - and that old version makes the code feel dated.

The book was written in 2007, but is not dated. First, the foundations of any topic tend to be timeless, and the most recent algorithm the book describes was published in 1990. The Table of Contents is comparable to more recently written ones (though I haven't read other introductory books yet).

In summary: I would recommend it as a great introductory book!