Thursday, October 20, 2011

CTools training: Australia

Next in the CTools world tour: Sydney, Australia, 16-18 November

Our friends at Bizcubed invited us for some amazing steaks, and we're obviously going. In exchange, we just need to give a CTools training!

Register here and C you there!


Friday, October 14, 2011

Multidimensional support in CCC

It's time to improve CCC!

Currently, the CCC data engine uses a resultset composed of an array of series, an array of categories and an array of values. Internally, in the DataTranslator class, we store a single array where the first row holds the series, the first column holds the categories and the remaining cells hold the values:
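As a rough illustration of that layout, here is a minimal sketch in Python (not CCC's actual JavaScript code). The function name `build_table`, the sample labels and numbers, and the choice to pass the values as one list per category are all assumptions made for the example.

```python
# Hypothetical sample data, for illustration only.
series = ["S1", "S2"]
categories = ["C1", "C2", "C3"]
# Assumption: values[i][j] is the value for categories[i] and series[j].
values = [
    [10, 20],
    [30, 40],
    [50, 60],
]


def build_table(series, categories, values):
    """Build one 2D array: first row = series, first column = categories."""
    header = [None] + list(series)  # top-left corner cell is empty
    body = [[cat] + list(row) for cat, row in zip(categories, values)]
    return [header] + body


table = build_table(series, categories, values)
for row in table:
    print(row)
```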


One of the biggest advantages of CCC is the way it automagically integrates with the other CTools. A user defines his datasource, connects it to a chart, and voilà, he has a chart. In order to achieve that, we had to implement translators that map each kind of datasource to this inner structure. We need two: a relational translator, for SQL queries, and a crosstab translator, for the MDX-style format.

This is how the data from a relational datasource is mapped to the internal CCC structure, when we do a query like:
select series, category, sum(value) from foo group by series, category
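The pivot that a relational translator has to perform can be sketched as follows. This is an illustrative Python version under assumptions, not the real translator: the name `relational_to_table` and the sample rows are made up, and missing series/category combinations are simply filled with `None`.

```python
def relational_to_table(rows):
    """Pivot (series, category, value) rows into the internal table layout."""
    series, categories, cells = [], [], {}
    for s, c, v in rows:
        # Keep labels in order of first appearance.
        if s not in series:
            series.append(s)
        if c not in categories:
            categories.append(c)
        cells[(c, s)] = v
    header = [None] + series
    # One row per category; None where a combination has no value.
    body = [[c] + [cells.get((c, s)) for s in series] for c in categories]
    return [header] + body


# Hypothetical result of the query above.
rows = [
    ("S1", "C1", 10),
    ("S1", "C2", 30),
    ("S2", "C1", 20),
    ("S2", "C2", 40),
]
table = relational_to_table(rows)
```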


This applies directly to SQL queries; they all have this shape. The other distinct way of inputting data is through a CrosstabTranslator. This is mostly used with Mondrian queries or other pivot-like structures. Example query:
select {[Series].children} on COLUMNS,  {[Categories].children} on ROWS where [Measures].[Values]
This will be mapped to our internal structure almost directly:
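Because a crosstab result already has series on the columns and categories on the rows, the translation is little more than adding the header row. A minimal Python sketch, assuming the result arrives as a list of column captions plus rows whose first element is the category caption (both the input shape and the name `crosstab_to_table` are assumptions):

```python
def crosstab_to_table(column_captions, rows):
    """Prepend the series header row; the body is already in the right shape."""
    header = [None] + list(column_captions)
    return [header] + [list(row) for row in rows]


# Hypothetical crosstab result.
table = crosstab_to_table(
    ["S1", "S2"],
    [["C1", 10, 20], ["C2", 30, 40]],
)
```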


This is what we currently support. It's enough for almost all the charts we have in CCC: bar, line, pie, etc. For others, this isn't exactly applicable, but we found a way to reuse this structure (e.g. bullet charts).

But this is not enough for more complex visualization. Imagine you want to have a heatgrid but instead of having only different cell colors you also want different box sizes. Or imagine that instead of having only one category (eg: countries) we have 2, countries and years.

The following is our "RFI" to fix this: we'll implement a multiVal mode that keeps the same tabular approach to the internal CCC structure, but instead of holding a single value, each cell will support an array of values.



The regular cases will contain arrays of dimension one, and backward compatibility will be ensured by making the old values array a flattened representation of this new structure that holds only one value: the last index for the categories and series, and the first for the values.
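One way to read that backward-compatibility rule is sketched below in Python. It assumes that series and category labels become arrays (one entry per level) and that each cell holds an array of values; the name `flatten_for_legacy` and the sample data are hypothetical.

```python
def flatten_for_legacy(multi_table):
    """Derive the old single-value table from the multi-value one.

    Labels keep only their last level; cells keep only their first value.
    """
    header = [None] + [labels[-1] for labels in multi_table[0][1:]]
    body = []
    for row in multi_table[1:]:
        category = row[0][-1]
        # An empty or missing cell stays None in the legacy view.
        cells = [cell[0] if cell else None for cell in row[1:]]
        body.append([category] + cells)
    return [header] + body


# Hypothetical multi-value table: two category levels, two values per cell.
multi_table = [
    [None, ["S1"], ["S2"]],
    [["Europe", "Portugal"], [10, 1.5], [20, 2.5]],
    [["Europe", "Spain"], [30, 3.5], [40, 4.5]],
]
legacy = flatten_for_legacy(multi_table)
```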

How does this work in a real scenario? Our SQL query will be a bit different:
select s1, .., sN, c1, .., cM, sum(v1), .., sum(vK) from X group by s1, .., sN, c1, .., cM
This is the mapping that will be done:
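A possible shape for that mapping, sketched in Python under assumptions: the translator is told how many leading columns are series (N) and how many are categories (M), and everything after them is treated as values (K). The function name, its parameters and the sample rows are hypothetical.

```python
def multi_relational_to_table(rows, n_series, n_categories):
    """Pivot rows of N series + M category + K value columns into a table
    whose labels and cells are all arrays."""
    series, categories, cells = [], [], {}
    for row in rows:
        s = tuple(row[:n_series])
        c = tuple(row[n_series:n_series + n_categories])
        v = list(row[n_series + n_categories:])
        if s not in series:
            series.append(s)
        if c not in categories:
            categories.append(c)
        cells[(c, s)] = v
    header = [None] + [list(s) for s in series]
    body = [[list(c)] + [cells.get((c, s)) for s in series] for c in categories]
    return [header] + body


# Hypothetical rows: 1 series column, 2 category columns, 2 value columns.
rows = [
    ("Online", "Europe", 2010, 10, 3),
    ("Online", "Europe", 2011, 12, 4),
    ("Retail", "Europe", 2010, 7, 2),
]
table = multi_relational_to_table(rows, n_series=1, n_categories=2)
```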


For the crosstab translator the query will be different too; this is a very standard operation in Analyzer:

select {C1} * {C2} * {M} on COLUMNS,  {S1} * {S2} on ROWS where [Measures].[Values]
(we have to decide where we want the measures to appear. Either we stick them on the rows or allow them to be on the cols too, really haven't thought about that long enough)


Once this support is implemented in the backend, 50% of the problem is fixed: The data is there, available to use.

Now new charts can (and should) be implemented using the new multiVar mode when it makes sense. Having 1 series / category / value ends up being only a particular case of the generic support.


New methods will have to be implemented on the DataEngine too. We now have getSeries() and getCategories() (and all their variants), which were enough until now. We'll need two new methods here: getMultiSeries() and getMultiCategories(), both returning arrays, plus specific methods that take a levelIndex, e.g. getMultiSeriesForLevel(levelIdx).
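To make the proposed accessors more concrete, here is a small Python sketch. It is only a guess at the semantics: the class name `MultiDataEngine`, the snake_case method names, and the exact return shapes (one array of level labels per series or category) are assumptions, not the planned CCC API.

```python
class MultiDataEngine:
    """Toy stand-in for the data engine, wrapping a multi-value table."""

    def __init__(self, table):
        self._table = table

    def get_multi_series(self):
        # One array of level labels per series column.
        return [list(labels) for labels in self._table[0][1:]]

    def get_multi_categories(self):
        # One array of level labels per category row.
        return [list(row[0]) for row in self._table[1:]]

    def get_multi_series_for_level(self, level_idx):
        # The label at a single level, for every series.
        return [labels[level_idx] for labels in self.get_multi_series()]

    def get_series(self):
        # Legacy view: the last level only.
        return self.get_multi_series_for_level(-1)


# Hypothetical table with two series levels and one category level.
engine = MultiDataEngine([
    [None, ["Online", "Web"], ["Online", "Mobile"]],
    [["2010"], [10], [20]],
    [["2011"], [30], [40]],
])
```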


Feedback is welcome. This will impact the way scales and labels are generated, along with other things that I'm not yet able to foresee, but one thing at a time.

This should really extend CCC's ability to generate new visualizations without ever losing sight of one of the main goals: it has to be dead easy (and fast) to integrate with our data.


Monday, October 10, 2011

Pentaho Ctools versioning with GIT





We have a few Pentaho projects where we are involved all the way from data to UI. Mozilla is one of those cases. Controlling all aspects of the project makes it easier to manage different versions.


But we also have a lot of projects where we work in collaboration with our customers: we develop the whole UX and dashboard layer and they convert the data access layer (the CDA part). In these cases, it gets a bit more complicated to reconcile the changes we make on the dummy data scenario with the real data implementation.


Cees van Kemenade, from Vinzi, is working on a tool to tightly integrate version management and some other amazing features that will make this issue history, but in the meantime we'll try to solve this using a standard tool: GIT





The idea is to create a GIT repository for the project. While we work on the master branch, all of the client's commits will go to the "real" branch. From that point on, we keep working on master and the client works on real, merging in each subsequent version from master. Any specific bugs the client fixes, we can cherry-pick back to the master branch.


While this is nothing groundbreaking, it can really save days or weeks of development just by using technologies that we already use in other contexts.