Friday, July 29, 2011

5 days, 5 dashboards, 5 websites - 3/5

Day 3. Let's get geeky!

Dashboard 3: Star Wars!





The first time I showed it to the team internally, most of them said "There's no such Jedi!" in a very offended way! Bottom line: we did it because we wanted to! It shows that BI doesn't always have to be extremely serious to convey data effectively, and with a smile. Try it here



Site 3: cbf.webdetails.org





This was our first project, born over 4 years ago. And it's still an everyday lifesaver, a must-have for any collaborative environment. Like its siblings, it also has a new home. This project is also at the base of the Guided CBF project by Analytical Labs.

Thursday, July 28, 2011

5 days, 5 dashboards, 5 websites - 2/5

Day 2. Still looking good for the final goal

Dashboard 2: Sync Demo



This was made by students of the CTools training course in Holland. Try it here



Site 2: cda.webdetails.org





An amazing and very mature project that hardly needs an introduction - but definitely needed a home. CDA is The Way to extract data from Pentaho, and it can also be used standalone. Tons of features.

Wednesday, July 27, 2011

5 days, 5 dashboards, 5 websites - 1/5

Challenge: in 5 days, introduce 5 demo dashboards made with the CTools and introduce 5 websites for the projects.


Let's start!


Dashboard 1: Greatest Driver of all times





This was made by the students of the CTools course in São Paulo. Try it here




Site 1: ccc.webdetails.org





CCC deserves a proper home. It's a great project that's been in production for a while. And now it has a home! Not only can you browse a showcase of the supported charts, you can also test changes to them.


Have fun! More in the coming days.

Friday, July 22, 2011

Accented (international) characters in Pentaho User Console

The biggest disadvantage of software made in the USA is that its makers sometimes forget that not everyone speaks English (and I'm the first one to admit that life would be simpler if we all did).


And most of the folks that *don't* sometimes need accented characters. There are a few issues with PUC when trying, for instance, to create directories with accented characters (see picture).




There are 2 things you need to do to fix this. The basic idea is: always work with utf-8 everywhere: file encodings, database connections, browser encoding. It will save years of your life.

  1. Modify your start-pentaho.bat/start-pentaho.sh to add the option -Dfile.encoding=utf-8
  2. Modify webapps/pentaho/mantle/Mantle.jsp to add the following snippet:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Restart your server. It will work.
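
To see why mismatched encodings break accented names, here is a minimal Python sketch. It is not Pentaho code; the directory name and the choice of ISO-8859-1 as the "wrong" charset are assumptions made up for illustration.

```python
# A hypothetical accented directory name, stored as UTF-8 bytes.
name = "relatórios"
utf8_bytes = name.encode("utf-8")

# Wrong: one side reads the bytes with a different charset
# (ISO-8859-1 is only an assumed example of a mismatched default).
garbled = utf8_bytes.decode("iso-8859-1")

# Right: both sides agree on UTF-8, so the name round-trips unchanged.
round_trip = utf8_bytes.decode("utf-8")

print(repr(garbled))     # 'relatÃ³rios'
print(repr(round_trip))  # 'relatórios'
```

The two changes above are about making the JVM and the browser agree on the same charset, so names take the second path rather than the first.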


ps1: this was a request from one of the CTools students in Brazil. Come meet us at one of the other courses to get your deepest questions answered.

ps2: Thanks for helping me out here, Nick B. ;)

Tuesday, July 12, 2011

Elasticsearch, Kettle and the CTools

I'm not much into the SQL vs NoSQL discussion. I have enough years of BI to know that the important thing is to choose the right tool for the job. And that requires a lot of tools!


Here's one more for our set: ElasticSearch. ElasticSearch is an Open Source (Apache 2), Distributed, RESTful Search Engine built on top of Lucene.


It may not be obvious, but there are tons of reasons why a search engine is a great choice as a BI data source - and far beyond the simple free-form text search.


Due to the intrinsic nature of NoSQL and its schema-less approach, we can store virtually anything. Due to the clustering abilities of elasticsearch, scalability is not even an issue. Using the query syntax, we have a powerful way to get the data out. And it's blazing fast!


We initially used ElasticSearch for the twitter dashboard at Mozilla, described in a previous blog post. Everyone was very happy with the results, and we're betting quite a lot on elasticsearch at Mountain View.


So we made an effort to bring elasticsearch closer to the CTools and Pentaho. The first thing we did was to add an ElasticSearch Bulk Loader to Kettle.

In kettle 4.2 you'll be able to find this new step. Here's a sample of a transformation using it. As simple as it gets:




There are a few things I'd like to highlight:

  • It's simple - in 5 minutes you can get an elasticsearch engine with data in it
  • It's fast - roughly 20k rows per second on these sample docs (200k docs indexed in 9 seconds, for 60 MB of storage)
  • It's versatile - we can either index fields or full json documents
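
For a feel of what bulk indexing looks like at the REST level, here is a rough Python sketch that builds a body in the newline-delimited format of Elasticsearch's `_bulk` endpoint and redoes the throughput arithmetic. It is not how the Kettle step is implemented; the index name, type name, and documents are made up, and the `_type` field reflects Elasticsearch versions of that era.

```python
import json

def bulk_body(index, doc_type, docs):
    """Build a newline-delimited body for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an action line and the source line.
    """
    lines = []
    for doc_id, doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Hypothetical sample documents (full json documents, as in the last bullet).
docs = [
    ("1", {"user": "alice", "text": "hello elasticsearch"}),
    ("2", {"user": "bob", "text": "hello kettle"}),
]
body = bulk_body("tweets", "tweet", docs)
print(body)

# Throughput from the numbers quoted above: 200k docs in 9 seconds.
rate = 200_000 / 9
print(round(rate))  # 22222 docs per second
```
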
From this point on we can just query for documents in elasticsearch:
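
As a rough sketch of such a query, the snippet below builds a URI search request against the `_search` endpoint. The host, index name (`tweets`), and field name (`user`) are assumptions for illustration only; nothing is sent over the network.

```python
from urllib.parse import urlencode

def search_url(base_url, index, query, size=10):
    """Build an Elasticsearch URI search URL using the Lucene query syntax."""
    params = urlencode({"q": query, "size": size})
    return f"{base_url.rstrip('/')}/{index}/_search?{params}"

# Hypothetical local node and index; "user:alice" matches docs whose
# "user" field contains "alice".
url = search_url("http://localhost:9200", "tweets", "user:alice", size=5)
print(url)
# http://localhost:9200/tweets/_search?q=user%3Aalice&size=5
```

Opening that URL in a browser, or fetching it with any HTTP client, should return the matching documents as JSON.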





Now, what to do with this? What's really interesting is being able to use this from within CDA. That way we'll be able to use ElasticSearch as a datasource not only for dashboards but also for reports. Using kettle as the bridge between ES and our frontend tools guarantees a great degree of isolation and security. Here's a sample transformation:




Now we can tie this to CDA, and then use it with CDE for our dashboards. Here's the result:
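
Once the transformation is exposed as a CDA data access, it can also be queried over HTTP. The sketch below only builds such a URL. It assumes CDA's `doQuery` endpoint with `solution`, `path`, `file`, `dataAccessId` and `outputType` parameters, and custom parameters prefixed with `param`; check these against your CDA version. The solution, file, and parameter names are hypothetical.

```python
from urllib.parse import urlencode

def cda_query_url(base_url, solution, path, cda_file, data_access_id, params=None):
    """Build a CDA doQuery URL (parameter names are assumptions, see above)."""
    query = {
        "solution": solution,
        "path": path,
        "file": cda_file,
        "dataAccessId": data_access_id,
        "outputType": "json",
    }
    # Custom query parameters are assumed to be passed with a "param" prefix.
    for name, value in (params or {}).items():
        query["param" + name] = value
    return base_url.rstrip("/") + "/content/cda/doQuery?" + urlencode(query)

# Hypothetical server, solution layout, and search term.
url = cda_query_url(
    "http://localhost:8080/pentaho",
    "demo", "elasticsearch", "tweets.cda", "search",
    params={"term": "pentaho"},
)
print(url)
```
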









Note: in order to run this from the Pentaho BI server, both jsonpath.jar and json_simple.jar have to be added to the lib dir of the application server.


With all this we can quickly build any dashboard that uses these resources. As a very rough demonstration I built this one:




Have fun

Thursday, July 7, 2011

Pentaho Data Integration 4 Cookbook available




If you're like me, you're lazy. And if you're lazy, you're probably of the opinion that the best way to learn is by example.



Pentaho Data Integration 4 Cookbook is exactly that - a collection of over 70 (I counted 79; I think even María Carina Roldan and Adrián Sergio Pulvirenti, the authors of the book, stopped counting them at some point) real-world examples of how to use kettle to do... well, almost everything. It even has a recipe for integrating kettle in a CDE dashboard \o/


This book is very "usage" oriented. You won't find details about using its APIs, or about integrating kettle in a 3rd party java program. You'll find recipes for what 95% of the kettle users do: implement jobs and transformations.


The list of chapters gives a good idea of the contents of the book:

  • Chapter 1: Working with Databases
  • Chapter 2: Reading and Writing Files
  • Chapter 3: Manipulating XML Structures
  • Chapter 4: File Management
  • Chapter 5: Looking for Data
  • Chapter 6: Understanding Data Flows
  • Chapter 7: Executing and Reusing Jobs and Transformations
  • Chapter 8: Integrating Kettle and the Pentaho Suite
  • Chapter 9: Getting the Most Out of Kettle


The best thing about this? It's the 5th Pentaho-related book.

And this is great for the open source community! Keep them coming! What's next? A CTools book? ;)