A dashboard is a great way to give users information on a specific subject - assuming we know what we're going to show. In this case, I had absolutely no idea.
When we're sitting on a pile of data, we need to go through a discovery phase to decide what information will actually be valuable to the user.
Telemetry is a Mozilla project that aims to make its products - Firefox, Thunderbird and Fennec (the codename for Firefox Mobile) - better by analyzing performance data sent by users during their real-world activity, along with the impact that developer changes have on that performance. The goal is simple: better products and happier, more productive users.
As one can imagine, we have a lot of data. All the submissions are primarily stored in HBase and later aggregated into ElasticSearch, which allows more versatile, near-real-time analysis. We were then able to get a dynamic view over the data that summed up all the contributions from the users:
I previously blogged about the techniques that allow us to get data from/to ElasticSearch, and once again it proved an invaluable method. In this case, due to the huge amount of data, we had to use Kettle's UDJC (User Defined Java Class) step, initially developed by Mozilla Metrics' chief engineer Daniel Einspanjer, together with the Jackson JSON processor, to achieve high performance while processing the huge dataset - submitting some Kettle improvements along the way.
This allowed developers to see the impact of their changes, and it had the best effect a data tool can have: in answering some questions, it raised others.
Most of those new questions were related to time-based analysis - being able to track, over time, the impact of changes on a specific probe. This would have the immediate effect of giving people the data to decide whether a specific release channel is ready to move to the next channel in the rapid release cycle, and to answer some of the questions that the new process brings:
- Is Aurora ready to move to Beta?
- Are we getting the expected performance improvement in Nightly?
As a stretch goal, my personal objective was to implement some kind of system that would allow us to quickly identify regressions in the code without having to manually go through all the probes.
Back to basics - Kimball's Data Warehouse
This required a new approach to the data. Or rather, an old approach. In Business Intelligence, we live in exciting times, with tons of available technologies that let us choose the best tool for the job (I recently did a blog post on the subject). But let's not forget 20 years of accumulated knowledge: this specific set of questions required building a standard, Kimball-style data warehouse.
The goal is to track improvements in the project's code by following, over time, the evolution of some key metrics we chose. Currently, the ones being tracked are:
- Standard deviation
- Percentiles (25, 50, 75)
We consider platform builds for a specific application, version and OS to share the same codebase. So platformBuildID-appName-appVersion-OS acts as our "primary key": all submissions with the same key are aggregated together and treated as being generated from the same code.
On a daily basis we query Telemetry for the builds made in the last 7 days (a configurable value). The underlying assumption is that 7 days is enough for sampling, and that changes to the main KPIs after that period would be due to environmental factors rather than to the code itself. This number is currently being studied to find the best value to use.
We also discard, for this data warehouse, all submissions with fewer than 500 counts, in order to have a good enough sample size.
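Here's a minimal sketch, in R, of what that aggregation boils down to. The `subs` data frame and its column names are hypothetical, and each submission is simplified to a single numeric value for illustration:

```r
# Hypothetical data frame `subs`, one row per submission, with columns
# platformBuildID, appName, appVersion, OS, count and value.
subs$key <- paste(subs$platformBuildID, subs$appName,
                  subs$appVersion, subs$OS, sep = "-")

# Discard submissions with fewer than 500 counts
subs <- subs[subs$count >= 500, ]

# Aggregate the tracked metrics per build key
metrics <- do.call(rbind, lapply(split(subs, subs$key), function(d) {
  data.frame(mean  = mean(d$value),
             stdev = sd(d$value),
             p25   = unname(quantile(d$value, 0.25)),
             p50   = unname(quantile(d$value, 0.50)),
             p75   = unname(quantile(d$value, 0.75)),
             n     = nrow(d))
}))
```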
I spent about two weeks building this data warehouse, with no guarantee that the results would yield anything decent. So once I had a resultset I could work with, I took the opportunity to use R to analyze the data - something that had been on my to-do list for ages.
R is an insanely powerful statistical analysis tool, with tons of packages that guarantee the bottleneck will be your own mathematical knowledge (or lack thereof), making it one of analysts' favorite tools.
R does wonders when we have the data in a tabular format and want to do ad-hoc analysis, so I picked a resultset and started playing with the data: the evolution of the CYCLE_COLLECTOR probe on the Windows platform and the Nightly channel.
The first thing I did was try to get a feel for the shape of the data (this, obviously, after a couple of days spent finding my way around R). After a while, it was looking like this:
The initial analysis revealed a relation between the submission counts and the mean / standard deviation: the higher the count, the lower the mean and standard deviation. This is consistent with something the metrics team already knew - the initial submissions are not representative of the general population, so in this case sample size really matters.
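For reference, a rough sketch of the kind of quick checks this boiled down to, against the hypothetical `metrics` frame from the sketch above:

```r
# First feel for the shape of the data
summary(metrics)
hist(metrics$mean, breaks = 30, main = "Per-build means")

# How do the mean and standard deviation relate to the submission count?
cor(metrics$n, metrics$mean)
cor(metrics$n, metrics$stdev)
plot(metrics$n, metrics$mean, log = "x",
     xlab = "Submission count", ylab = "Mean")
```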
I also tried for a while to fit a statistical model to this data, mostly around fitting a normal distribution, hoping to get further analysis out of its parameters, like the CDF and other density functions. This proved to be a frustrating task, as no decent fit came out of it.
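For the curious, an attempt like that looks roughly as follows in R, using `fitdistr` from the MASS package (the variable names are, again, my own):

```r
library(MASS)

# Try a normal fit on the raw probe values
fit <- fitdistr(subs$value, "normal")
fit$estimate  # fitted mean and sd

# Overlay the fitted density on the empirical distribution -
# in practice the two never matched decently
hist(subs$value, breaks = 50, freq = FALSE, main = "Empirical vs fitted")
curve(dnorm(x, mean = fit$estimate["mean"], sd = fit$estimate["sd"]),
      add = TRUE, col = "red")
```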
Given all the distinct types of probes in the code, we decided to take only the means and standard deviations into consideration and look at their evolution over time. This is the view we decided to use:
A single chart tells the evolution of CYCLE_COLLECTOR: the position of each point represents the mean, the size of the points represents the standard deviation (not the exact value, but scaled to it), and the color encodes the size of the sample.
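In ggplot2 terms, a sketch of that view could look something like this, assuming the hypothetical `metrics` frame gained a `buildDate` column parsed from the build key:

```r
library(ggplot2)

# Mean as position, standard deviation as point size,
# sample size as color, over build dates
ggplot(metrics, aes(x = buildDate, y = mean, size = stdev, colour = n)) +
  geom_point() +
  labs(x = "Build date", y = "Mean",
       size = "Std. dev.", colour = "Submissions")
```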
From R to CDE Dashboard
The next step, after knowing the kind of analysis we needed to give to developers, was to build a dashboard that lets users get this data from the BI system automatically, up to date, and quickly parameterizable - obviously without requiring the consumers of the data to know any R.
All the CTools were built with the goal of being able to create virtually *anything*, and replicating an R analysis is a very good challenge. Here's the end result after... 2 days:
With live connections to the data, users can freely play with it and change the parameters to quickly see the impact of code changes.
One of the biggest advantages of a data warehouse is that it comes with an astonishing query language: MDX. In our case (as for anyone using Pentaho as a BI server), Mondrian is the ROLAP engine that runs those queries.
MDX is very well suited for answering business questions, and behaves particularly well in time-based analysis. So the next step was building a table that compares the last 7 days' average with the prior 28 days' average; big shifts indicate either improvements or regressions (a sketch of the kind of MDX query involved is shown after the table). Here's the resulting table, ordered by regressions by default:
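For illustration, a hedged sketch of what such a comparison can look like in MDX - the cube, dimension, member names and anchor date below are hypothetical, not the production query:

```mdx
WITH
  -- last 7 days vs the 28 days before them, anchored on a given day
  SET [Last7]   AS LastPeriods(7,  [Date].[2012].[3].[15])
  SET [Prior28] AS LastPeriods(28, [Date].[2012].[3].[15].Lag(7))
  MEMBER [Measures].[Last 7 Days Avg]   AS Avg([Last7],   [Measures].[Mean])
  MEMBER [Measures].[Prior 28 Days Avg] AS Avg([Prior28], [Measures].[Mean])
  MEMBER [Measures].[Shift] AS
    ([Measures].[Last 7 Days Avg] - [Measures].[Prior 28 Days Avg])
      / [Measures].[Prior 28 Days Avg],
    FORMAT_STRING = '0.0%'
SELECT
  { [Measures].[Last 7 Days Avg],
    [Measures].[Prior 28 Days Avg],
    [Measures].[Shift] } ON COLUMNS,
  -- biggest upward shifts (likely regressions) first
  Order([Probe].[Probe].Members, [Measures].[Shift], BDESC) ON ROWS
FROM [Telemetry]
```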
A regression of 1590% was immediately noticed. Clicking on that row allowed us to inspect the actual histogram distribution:
I immediately checked with one of the Firefox developers, who mentioned that an error in that specific build had caused this probe's counters to be completely skewed. Success!
It's instantly rewarding to find that the improvements absolutely outnumber the regressions. One of my favorites, showing all the improvements developers have been putting into the code, is IMAGE_DECODE_ON_DRAW_LATENCY:
This is currently being used by internal product developers to get metrics on their code, and the metrics team is working on allowing contributors outside the company to take advantage of these tools as well.
Help Mozilla help you
This is only possible with the help of users who are willing to submit their performance data back to Mozilla. This is what we do with your data - and there's absolutely nothing in it that can be traced back to you, as privacy is always the number one concern at Mozilla. Here's how you can help: