CDC - Community Distributed Cache
One more member for the CTools, and this one is big. It's available through the ctools-installer using the -b dev flag and requires Pentaho 4.5.
About
CDC stands for Community Distributed Cache. It provides a high-performance, scalable, distributed in-memory clustered cache, based on Hazelcast, for both CDA and Mondrian. CDC is a Pentaho plugin that provides the following features:
- CDA distributed cache support
- Mondrian distributed cache support
- Ability to switch between the default and CDC caches for CDA and Mondrian
- Gracefully handles adding / removing cache nodes
- Allows selectively clearing the cache of specific CDE dashboards
- Allows selectively clearing the cache of specific schemas / cubes / dimensions of Mondrian cubes
- Provides an API to clean the cache from the outside (e.g. after running an ETL)
- Provides a view of the cluster status
- Supports multiple Pentaho servers using the same cluster (e.g. staging and production)
- Supports several memory configuration options
Motivation
Performance is a key point not only in business intelligence software but in any user interface. The goal of CDC is to give Pentaho implementations based on Mondrian / CDA a distributed caching layer that keeps the database from being hit as much as possible.
One added feature is the ability to clear the cache of only specific Mondrian cubes. Even though Mondrian has a very complete API to control the member cache, Pentaho only exposes a "clear all" function, which ends up being very limited in production environments.
Being able to survive server restarts is a design bonus of the cache, and CDA supports it out of the box. Mondrian will support it as soon as MONDRIAN-1107 is fixed.
Requirements
- Mondrian 3.4 or newer (in Pentaho 4.5)
- CDA 12.05.15
Usage
It's very simple to configure CDC:
- Install CDC using either the installer (soon to be available) or the ctools-installer. If you do a manual install, be sure to copy the contents of solution/system/cdc/pentaho/lib to the server's WEB-INF/lib
- Download the standalone cache node
- Run the standalone cache node (launch-hazelcast.sh) on the same machine as Pentaho or in the same internal network, optionally editing the file to change the memory settings (defaults to 1 GB; increase at will). You can launch as many nodes as you want.
- Launch Pentaho and click on the CDC button
- Enable cache usage on CDA and Mondrian
- Restart the Pentaho server
- Check that the options on the settings screen are satisfactory. Usually the defaults work fine.
Open Analyzer, JPivot or a CDE dashboard that uses CDA, and you should see the cache being populated.
Cluster info
Hazelcast has a very good Management Center, so reimplementing that kind of feature is outside the scope of CDC. However, we do provide a simple cluster information dashboard that gives an overview of the state of the nodes.
Note about lite nodes: the Pentaho server is itself a cache node. However, it's configured so that it doesn't hold data, hence the term lite node.
Clean cache
With CDC you can selectively control the contents of the cache, allowing you to clean either specific dashboards or cubes. The business case around this is simple: we need to clear the cache after new data becomes available (usually as a result of an ETL job). CDC allows us not only to do that but also to do it from within the ETL process.
CDA
CDC offers a solution navigator so that we can select a dashboard. When we select that dashboard, all the CDA queries used by that dashboard will be cleaned.
Clicking on the URL button gives us a URL that we can call externally (e.g. from an ETL job). Be aware that you need to add the user credentials when calling it from the outside (e.g. &userid=joe&password=password).
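The only requirement stated above for an external caller is that the credentials be appended to the copied URL. A minimal Java sketch of that step follows, assuming the URL is copied verbatim from the URL button. The class and method names (CdcCacheCall, withCredentials, call) are hypothetical; only the userid/password parameter names come from this post.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: calls a CDC "clean cache" URL from outside Pentaho,
// e.g. at the end of an ETL run. Only the userid/password parameter names
// come from the CDC post; everything else is illustrative.
class CdcCacheCall {

    // Appends Pentaho credentials to the copied URL, using '?' if the URL has
    // no query string yet and '&' otherwise. Values are form-encoded.
    static String withCredentials(String url, String user, String password) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator
                + "userid=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Issues a GET request and returns the HTTP status code.
    static int call(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

An ETL job would then run something like CdcCacheCall.call(CdcCacheCall.withCredentials(copiedUrl, "joe", "password")) and treat a non-2xx status as a failure. Because the password travels in the query string, a dedicated limited-functionality user keeps the exposure small.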
Mondrian
This one is very similar to the previous one, but navigates through the available cubes. One can then clean either the entire schema, a specific cube or even the individual cell cache for a specific dimension (use this last one with care).
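The clearCube URL quoted in the comments (MondrianCacheCleanService/clearCube with catalog and cube parameters) can also be assembled programmatically. The sketch below assumes the Pentaho base URL and that clearCube takes exactly those two parameters plus credentials. The schema- and dimension-level endpoints are not shown in this post, so they are not sketched, and the class name MondrianClearCubeUrl is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical builder for the Mondrian clearCube URL quoted in the comments:
// <base>/content/ws-run/MondrianCacheCleanService/clearCube?catalog=..&cube=..
class MondrianClearCubeUrl {

    static String build(String pentahoBase, String catalog, String cube,
                        String user, String password) {
        return pentahoBase
                + "/content/ws-run/MondrianCacheCleanService/clearCube"
                + "?catalog=" + enc(catalog)
                + "&cube=" + enc(cube)
                + "&userid=" + enc(user)
                + "&password=" + enc(password);
    }

    // Form-encodes each value so names with spaces or symbols stay intact.
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Rebuilds the example URL from the comments.
        System.out.println(build("http://localhost:8080/pentaho",
                "project", "project", "joe", "password"));
    }
}
```

The resulting string can be handed to an HTTP step in the ETL or to any HTTP client.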



This will increase database server performance, excellent contribution.
Congrats WebDetails!
Pedro, this is simply awesome! Great work WD team!
Man! Man! Oh, Man!
Congratulations to the team for this amazing release. Back when we designed the pluggable caches in Mondrian, we suspected that the community would make good use of them, but this is more than we had hoped for. Let's keep working on this and push it further. Let us know if there are more things you would like us to add. By the way, have you tried Mondrian's statistics API? It should give you detailed data, at the cell level, on cache hits/misses and plenty of other info, such as the SQL queries running and the MDX statements being executed.
Thanks Luc!
About the statistics, that's amazing info! Is there a blog post / whatever that can teach us how to use it?
What you are looking for is MondrianServer.getMonitor(), which gives you access to 4 types of statistics:
    ServerInfo getServer();
    List<ConnectionInfo> getConnections();
    List<StatementInfo> getStatements();
    List<SqlStatementInfo> getSqlStatements();
Our API webpage seems outdated, but if you download 3.4.2 and look at the API there, you should find all the info you need.
Pedro,
Is there any documentation of the CDC API? I would like to find a way to clean the cache from an ETL job.
Thanks,
Juan Gordon
Juan, to clean the cache from the ETL you just need to use an HTTP step and give it the CDC URL that cleans the cube/dashboard you need refreshed.
Remember to include the userid and password in the URL. You can create a limited-functionality user for this if you want to manage the security permissions.
Hope this helps, Alexander Schurman
Hi Alexander, how do I put the user and password in the URL?
http://localhost:8080/pentaho/content/ws-run/MondrianCacheCleanService/clearCube?catalog=project&cube=project&userid=joe&password=password
Pedro, a nice feature to add in the future would be a way to erase a cube's cache based on one dimension element, like a day or month...
Many times incremental updates only change a cube's data for one month/week/day, so instead of cleaning the whole cube cache, we would love to be able to clean only the information related to month XXXX.
As you know, this can be done using the Mondrian API, but for now doing it is very programmatic and painful...
Well, it is just an idea... not an easy one, but an idea.
Thanks, Alex
I installed the plugin, but when I click on the CDC icon it shows a static screen with no functional buttons. Please suggest what could be wrong in my configuration.