Thursday, October 25, 2012

Book Review - Exploring Everyday Things with Ruby and R by Sau Sheong Chang

Disclaimer: I received a free electronic copy of this book (Exploring Everyday Things with Ruby and R by Sau Sheong Chang) from O'Reilly as part of the O'Reilly Blogger Review Program, which also requires me to write a review of it. That aside, I would have purchased this book this year anyway, and would have reviewed it on this blog too.

About me and why I read this book

I have been programming professionally for about 8 years, mainly business applications and reporting, so I already have quite a fondness for data. While I haven't used math much in my day jobs, I liked it (and was good at it) in high school, where I also took extra classes - so I have learned basic statistics. Refreshing and advancing my data analytics skills is one of my goals this year, and reading this book was part of that plan - I have heard that R is one of the most powerful languages for statistical analysis currently available.

About the book

The book assumes a basic understanding of programming and sets two goals:
  • to awaken the curiosity in the reader to go out and explore things and search for explanations, models, and experiments to validate understanding;
  • to show you some basic, but practical R and Ruby.
While the author intended each chapter to be more or less self-sufficient, I found the book is better read sequentially, especially the simulation chapters.

Ruby

I had no trouble with the code examples, even though I have programmed in Ruby for only about half an hour in total in my life. Beware that the only knowledge you gain about Ruby is the bare minimum required, so you'll have to put aside your thirst for a complete understanding of the language and its ecosystem. If you need a proper understanding of a language before you can work in it (which I don't think is necessary here), you are better off either reading a Ruby book first or using your favorite language to obtain the data - the code is easy to port.

Making me curious

I had a lot of wow/a-ha moments, both about the topics chosen for discussion and about the math/algorithmic ideas. You may find that you disagree with some of the conclusions the author draws, and the introduction emphasizes that the goal of the book is not to convince you of these conclusions, but to demonstrate the journey from question to conclusion in order to equip you with the tools to do the same. This is mostly achieved.
I award bonus points for mentioning the limitations of the analytical tools used - I don't think I would trust any book/article/blog post which presents something without its downsides!
Not all examples are exactly everyday (e.g., an analysis of going to work by car vs. public transportation would have been more everyday than how to simulate the flocking of birds), but they cover a wide breadth of topics. The processing and analysis of the data is always challenging enough, plus your general knowledge is expanded.
One thing I missed is a description of a really important part - as a layman, how do I go about finding which algorithms to use? While it isn't a book about Research 101, a description of the search process would have been great. You can of course always google, but when entering a new topic I find guided search helpful - which are some of the trick keywords, which sites to prefer/avoid, etc. On the other hand, enough methods are described that just properly learning and understanding them would make me a much better statistician already. Once done with that, I could just fall back on reading through the R packages and methods, hoping that if I have seen a word before, it would emerge from my passive knowledge when I'm faced with a matching problem.

The R language

The book does a solid job of helping you get started. It demonstrates enough language features to enable you to experiment with it for work projects (e.g., use MySQL as a data source, create packages, etc.); points out the R component/library hubs to look for community packages; and recommends further learning resources.
The code examples are like most programming book snippets - procedural, with (mostly) everything located in a single method/script. It is not the tangled spaghetti mess one despises in legacy code, but it does make for a lower signal-to-noise ratio and requires more effort from the reader. I guess it's a genre problem, so if you have read other programming books, you shouldn't have any problems with this one.
Technical comment: the ebook isn't formatted to play nicely with the Kindle DX, and while in print a code block might only be broken across left and right pages, on the Kindle it makes for an awkward read.
The exposed APIs suggest that R is a bit too ceremonial for my taste, but that could be abstracted away for a project that warrants R's use. I have also used a number of visually great third-party .NET UI components that were a pain to work with from a programmer's perspective, yet helped us create a great product. Plus, things that feel alien at first become second nature after enough practice, so it isn't a big deal. I plan to take a look at NumPy as well, and defer the decision on whether to dive deeper into R (possibly via using F# 3.0 type providers for R).

Overall

The book hasn't left me in awe, but it didn't feel like a chore to read as some other books do. I got the taste of R that I wanted when I picked up my copy to read. On top of that, I have learned about fun things, and it also added books to my reading (wish)list (e.g., The Grammar of Graphics by Leland Wilkinson, Armchair Economist by Stephen E. Landsburg, and more). This is no definitive guide on R, but to whet your appetite and get you started, it is a good one I can recommend without reservations.