Monday, May 10, 2010
Pentaho 3.2 Data Integration: Beginner's Guide (not after reading the book)
This book was a total surprise. I've known Maria Roldan's work for a while, since she is a very valuable community member: she wrote an excellent Kettle tutorial in the wiki and participates in the Portuguese and Spanish Pentaho communities. But I didn't know she was working on a book, which is a very good sign for this very large and active community.
The contents of this book are also surprising. I wasn't expecting to find the chapter called "Working with databases" at #8 on the list. When you work with Kettle and the rest of the Pentaho suite on a daily basis, building and populating large data warehouses, you almost expect that to be first on the list. (By the way, the data warehouse subject comes in chapter 12.)
Odd? Well, not if you go through the book and think about it for a bit. Pentaho Data Integration (PDI) is not a data warehouse building tool that can also be used for other data integration tasks. PDI is an amazing tool for any kind of data handling, involving data warehouses or not, involving databases or not. Since it's great for any data handling, of course it's also great for the subset that is data warehouse management.
And I think this is the main message Maria Roldan wants to get across. Forget Perl hacks, forget homegrown shell scripts, and please don't write yet another stored procedure that will make you (or the poor guy after you) go nuts when the time comes to support it. Just use Kettle. You'll find a lot of screenshots with accurate explanations, which makes this book a very light read. You can even take it with you when travelling or on vacation, and you'll be able to understand it even without experimenting alongside it.
This book can bring anyone up to speed with Kettle very fast. The first chapter is all about installation procedures. Chapters 2 and 10 describe the concepts everyone needs to know about transformations and jobs. All the others are clearly the work of someone who has spent a lot of time working with the tool. There are brief descriptions of a large number of steps used in real-world examples, and chapters dedicated entirely to very important subjects: data validation and error handling (chapter 7), transforming the rowset (chapter 6) and how to connect it all with the appropriate task flows (chapters 10 and 11).
Summing it all up, if you need to take bits and bytes from point A to point B, just buy the book.
PS: I support the home team, and as such I'm biased towards Pentaho, just like I'm biased towards Mozilla or my team at WebDetails, but that bias is not at work in this review. The proof that I enjoyed this book and all the hard work behind it is that as soon as I saw this twitter message I acted as fast as I could, and now Maria Roldan is officially a member of the WebDetails team. Welcome Cari!