
This book was a total surprise. I've known Maria Roldan's work for a while as a very valuable community member (she wrote an excellent Kettle tutorial in the wiki and takes part in the Portuguese and Spanish Pentaho communities), but I didn't know she was working on a book. That is a very good sign for this very large and active community.
The contents of this book are also surprising. I wasn't expecting to find the chapter "Working with databases" at number 8 in the table of contents. When you work with Kettle and the rest of the Pentaho suite on a daily basis, building and populating large data warehouses, you almost expect it to come first. (By the way, the data warehouse subject comes in chapter 12.)
Odd? Well, not if you go through the book and think about it for a bit. Pentaho Data Integration (PDI) is not a data warehouse building tool that also happens to be usable for other data integration tasks. PDI is an amazing tool for any kind of data handling, whether it involves data warehouses or not, and whether it involves databases or not. Since it's great for any data handling, of course it's also great for the subset that is data warehouse management.
And I think this is the main message Maria Roldan wants to get across. Forget Perl hacks, forget homegrown shell scripts, and please don't write yet another stored procedure that will drive you (or the poor guy after you) nuts when the time comes to support it. Just use Kettle. You'll find a lot of screenshots with accurate explanations, which make this book a very light read. You can even take it along on a trip or vacation and follow it without running any experiments.
This book can bring anyone up to speed with Kettle very fast. The first chapter is all about installation. Chapters 2 and 10 describe all the concepts everyone needs to know about transformations and jobs. All the others are clearly the result of someone who has spent a lot of time working with the tool. There are brief descriptions of a large number of steps, used in real-world examples, and chapters dedicated entirely to very important subjects: data validation and error handling (chapter 7), transforming the rowset (chapter 6), and how to connect it all with the appropriate task flows (chapters 10 and 11).
But my favorite chapter is number 5, the JavaScript chapter. While most Kettle experts will tell you to avoid the JavaScript step at all costs, Maria dedicated an entire chapter to it. I love JavaScript and I love the JavaScript step (that's right, Thomas, regardless of your complaints). Used right, it can save your day; true, it can also ruin your transformation if you're not careful. It will definitely slow your transformation down, and if you're an ETL God you'll look for alternatives. But for almost every other situation in the world of mortals, where we have other bottlenecks to worry about, just use this step!
Summing it all up: if you need to move bits and bytes from point A to point B, just buy the book.
PS: I support the home team, and as such I'm biased towards Pentaho, just as I'm biased towards Mozilla or my team at WebDetails, but that bias played no part in this review. The proof that I enjoyed this book and all the hard work behind it is that as soon as I saw this Twitter message, I acted as fast as I could, and now Maria Roldan is officially a member of the WebDetails team. Welcome, Cari!



An ETL God... hrm... I know you have to recite some Hail Marys for that, but I don't know whether I have to do any penance for being on the receiving end of idolatry. :)
Maybe my penance should just be a quick screenshot proving that, while I do use fancy replacements for JS, I certainly don't leave home without it:
http://screencast.com/t/Zjc1ZjRmMDEt
Humor is not sarcasm. It may carry some gentle mockery, but it doesn't hurt anyone, and its target can be someone else or oneself.
Only the person wearing the shoe knows where it pinches.