Friday, October 14, 2011

Multidimensional support in CCC

It's time to improve CCC!

Currently, the CCC data engine uses a resultset composed of an array of series, an array of categories and an array of values. Internally, in the DataTranslator class, we store a single table where the first row holds the series, the first column holds the categories and the remaining cells hold the values.
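As a rough illustration of that layout (not the actual DataTranslator code), here is a minimal Python sketch. The labels, the numbers, the orientation of the values array and the helper name build_table are all assumptions made for the example.

```python
# Hypothetical data: labels and numbers are made up for illustration.
series = ["Series A", "Series B"]
categories = ["Cat 1", "Cat 2", "Cat 3"]
# Assumed orientation: values[i][j] belongs to categories[i] and series[j].
values = [
    [10, 20],
    [30, 40],
    [50, 60],
]


def build_table(series, categories, values):
    """Build the tabular layout: first row = series, first column = categories."""
    header = [None] + list(series)  # the top-left corner cell is unused
    body = [[cat] + list(row) for cat, row in zip(categories, values)]
    return [header] + body


table = build_table(series, categories, values)
for row in table:
    print(row)
```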


One of the biggest advantages of CCC is the way it automagically integrates with the other Ctools. A user defines a datasource, connects it to a chart, and voilà, there's a chart. To achieve that, we had to implement specific translators that map each kind of datasource to this inner structure. We need two: a relational translator, for SQL queries, and a crosstab translator, for MDX-style results.

This is how the data from a relational datasource is mapped to the internal CCC structure, when we do a query like:
select series, category, sum(value) from foo group by series, category
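A minimal sketch of what such a relational mapping could look like, assuming each result row arrives as a (series, category, value) tuple in the column order of the query above. The function name relational_to_table and the sample rows are hypothetical, not part of CCC.

```python
def relational_to_table(rows):
    """Pivot (series, category, value) rows into the tabular layout.

    Series and categories keep the order in which they first appear.
    """
    series, categories, cells = [], [], {}
    for s, c, v in rows:
        if s not in series:
            series.append(s)
        if c not in categories:
            categories.append(c)
        cells[(c, s)] = v
    header = [None] + series
    # Combinations missing from the resultset become None.
    body = [[c] + [cells.get((c, s)) for s in series] for c in categories]
    return [header] + body


# Hypothetical resultset for the query above.
rows = [
    ("Series A", "Cat 1", 10),
    ("Series A", "Cat 2", 30),
    ("Series B", "Cat 1", 20),
    ("Series B", "Cat 2", 40),
]
relational_table = relational_to_table(rows)
```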


This applies directly to SQL queries; they all follow this shape. The other distinct input format goes through the CrosstabTranslator. It is mostly used with Mondrian queries or other pivot-like structures. Example query:
select {[Series].children} on COLUMNS, {[Categories].children} on ROWS where [Measures].[Values]
This will be mapped to our internal structure almost directly:


This is what we currently support. It's enough for almost all the charts in CCC: bar, line, pie, etc. For others it isn't directly applicable, but we found a way to reuse this structure (e.g. bullet charts).

But this is not enough for more complex visualizations. Imagine you want a heatgrid where the cells differ not only in color but also in box size. Or imagine that instead of a single category (e.g. countries) we have two: countries and years.

The following is our "RFI" to fix this: we'll implement a multiVal mode that keeps the same tabular approach for the internal CCC structure, but instead of holding a single value, each cell will hold an array of values.



The regular cases will contain arrays of dimension one, and backward compatibility will be ensured by making the old values array a flattened representation of this new structure that holds only one value per cell: the last index for the categories and series, and the first for the values.
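To make the flattening rule concrete, here is a small Python sketch of one reading of it: every series header, category header and value cell is an array, and the backward-compatible view keeps the last entry of each series and category and the first entry of each value array. The sample data and the name flatten are assumptions for illustration only.

```python
# Hypothetical multi-valued table: every header and cell is an array.
# Series have 2 levels, categories have 2 levels, cells hold 2 values.
multi_table = [
    [None, ["Europe", "Portugal"], ["Europe", "Spain"]],
    [["2010", "Q1"], [10, 0.5], [20, 0.7]],
    [["2010", "Q2"], [30, 0.1], [40, 0.9]],
]


def flatten(multi_table):
    """Reduce a multi-valued table to the old single-valued layout.

    Keeps the last level of each series and category and the first
    value of each cell (one interpretation of the rule in the text).
    """
    header = [None] + [s[-1] for s in multi_table[0][1:]]
    body = [
        [row[0][-1]] + [cell[0] for cell in row[1:]]
        for row in multi_table[1:]
    ]
    return [header] + body


flat_table = flatten(multi_table)
```

With arrays of dimension one, flatten simply unwraps each cell, which is the regular single-valued case.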

How does this work in a real scenario? Our SQL query will be a bit different:
select s1, .., sN, c1, .., cM, sum(v1), .., sum(vK) from X group by s1, .., sN, c1, .., cM
This is the mapping that will be done:
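As a sketch only, the mapping could be expressed like this in Python, assuming each result row lists the N series columns first, then the M category columns, then the K aggregated values, as in the query above. The function name, its parameters and the sample rows are hypothetical.

```python
def multi_relational_to_table(rows, n_series, n_categories):
    """Pivot rows of (s1..sN, c1..cM, v1..vK) into a multi-valued table."""
    series, categories, cells = [], [], {}
    for row in rows:
        s = tuple(row[:n_series])
        c = tuple(row[n_series:n_series + n_categories])
        v = list(row[n_series + n_categories:])
        if s not in series:
            series.append(s)
        if c not in categories:
            categories.append(c)
        cells[(c, s)] = v
    header = [None] + [list(s) for s in series]
    # Combinations missing from the resultset become None.
    body = [
        [list(c)] + [cells.get((c, s)) for s in series]
        for c in categories
    ]
    return [header] + body


# Hypothetical resultset with N = 2 series, M = 2 categories, K = 2 values.
multi_rows = [
    ("Europe", "Portugal", "2010", "Q1", 10, 0.5),
    ("Europe", "Spain", "2010", "Q1", 20, 0.7),
    ("Europe", "Portugal", "2010", "Q2", 30, 0.1),
]
multi_result = multi_relational_to_table(multi_rows, n_series=2, n_categories=2)
```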


On the crosstab translator the query will be different too; this is a very standard operation in Analyzer:

select {C1} * {C2} * {M} on COLUMNS,  {S1} * {S2} on ROWS where [Measures].[Values]
(We still have to decide where we want the measures to appear: either we stick them on the rows or we allow them on the columns too. I really haven't thought about that long enough.)


Once this support is implemented in the backend, half of the problem is solved: the data is there, available to use.

New charts can (and should) then be implemented using the new multiVal mode when it makes sense. Having one series / category / value ends up being just a particular case of the generic support.


New methods will have to be implemented on the DataEngine too. We currently have getSeries() and getCategories() (and all their variants), which were enough until now. We'll need two new methods here, getMultiSeries() and getMultiCategories(), both returning arrays, plus specific methods that take a levelIndex, e.g. getMultiSeriesForLevel(levelIdx).
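Purely as a sketch of how these accessors might behave, here is a Python stand-in whose methods mirror the proposed getMultiSeries(), getMultiCategories() and getMultiSeriesForLevel(levelIdx). The class name, the sample table and the return shapes are assumptions; in particular, returning the distinct values of a level is only one possible meaning for the per-level method.

```python
class MultiDataEngineSketch:
    """Hypothetical stand-in for the proposed DataEngine additions."""

    def __init__(self, multi_table):
        # First row = series headers, first column = category headers.
        self._table = multi_table

    def get_multi_series(self):
        """Mirrors getMultiSeries(): one array of levels per series."""
        return [list(s) for s in self._table[0][1:]]

    def get_multi_categories(self):
        """Mirrors getMultiCategories(): one array of levels per category."""
        return [list(row[0]) for row in self._table[1:]]

    def get_multi_series_for_level(self, level_idx):
        """Mirrors getMultiSeriesForLevel(levelIdx).

        Assumption: returns the distinct values found at that level,
        in order of first appearance.
        """
        seen = []
        for s in self.get_multi_series():
            if s[level_idx] not in seen:
                seen.append(s[level_idx])
        return seen


# Hypothetical multi-valued table.
engine = MultiDataEngineSketch([
    [None, ["Europe", "Portugal"], ["Europe", "Spain"]],
    [["2010", "Q1"], [10, 0.5], [20, 0.7]],
    [["2010", "Q2"], [30, 0.1], [40, 0.9]],
])
```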


Feedback is welcome. This will impact the way scales and labels are generated, along with other things that I can't yet foresee, but one thing at a time.

This should really extend CCC's ability to generate new visualizations without ever losing sight of one of its main goals: it has to be dead easy (and fast) to integrate with our data.

