We're struggling with versioning and deployment. We need some help in managing the development and (especially) the deployment process.
This is a very well-known problem. Versioning is relevant in every area, not just development, but the right way to approach it changes drastically from area to area, and Pentaho is no different.
We've been working on Pentaho projects for over 5 years, and since day 1 we've dealt with scenarios where we had to manage:
- Working on different projects locally
- Working with multiple people on the same project
- Managing several environments - development, staging, upgrades
- Managing platform upgrades
Here's a collection of my experiences with this issue as of today. It will obviously change, since we're always trying to optimize our processes.
I've been asked a specific set of questions, which I'll answer after putting them in the context of the big picture.
Versioning
There are 2 ways to approach this problem:
- Simple pentaho-solutions versioning
- CBF (Community Build Framework) setup
As for CBF... well, here's what Paul Stoellberger, Saiku author and general BI guru, said on the IRC channel:
< pstoellberger> i really need to get my cbf out again
< pstoellberger> just done hackish installations recently
Everyone doing something serious with Pentaho uses CBF. You may think you don't need it. You may think it's complicated and not worth the effort (that's no longer an excuse: we've put up a quickstart bundle for you). You're wrong, and you'll know it once you start using it.
VCS infrastructure
But one thing at a time. Before going through the specific workflows, you need to choose your VCS (Version Control System) tool. Once again there are 2 options:
- SVN
- Git
If you were expecting to see CVS on this list, press Alt-F4 to close your Internet Explorer browser and go back to 1998; we don't want you here!
I'll skip the long arguments about those two. Use Git. It's amazing: it handles branches and tags very efficiently, allows multiple remote repositories and has great UI tools that come in very handy.
Also bear in mind that Git is not GitHub. While you can definitely host your solutions there, you don't have to. I'd even say most of us would rather keep our files and implementations securely locked down.
Starting with infrastructure: Git doesn't even need a "server". Any shared directory can be used as the central repository. I've even used a "poor man's Git server", initializing a bare repository (git init --bare myproject/) in a Dropbox folder. That proved to be a very error-prone approach, since there's no way to guarantee that the repository won't get corrupted by Dropbox synchronization. So use a proper system. There are 2 options (this is getting a bit repetitive):
We use Gitolite. Once installed, it's very easy to administer (creating repositories, adding and managing users and permissions) and very secure, since all access goes through SSH.
Regardless of what you choose, from here on I'll assume you have a proper VCS server available.
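For a quick local test of the simplest option described above (a bare repository in a shared directory), here's a minimal sketch. It is an illustration and not a recommended production setup. A temporary directory from mktemp stands in for the network share, and the names myproject, alice and bob are made up. The branch name is forced to master through init.defaultBranch so that it matches the rest of this post. On a real setup, the clone URL would be a path on the share or, with Gitolite, an SSH address.

```shell
#!/bin/sh
# Sketch: a bare repository in a shared directory acting as the central repo.
# All names are hypothetical; $work/shared stands in for a network share.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
mkdir shared

# The "server": a bare repository has no working tree, only history.
git -c init.defaultBranch=master init --quiet --bare shared/myproject.git

# First developer: clone, commit, push.
git -c init.defaultBranch=master clone --quiet shared/myproject.git alice
cd alice
echo "hello" > README
git add README
git commit --quiet -m "first commit"
git push --quiet origin master
cd ..

# Second developer: a fresh clone of the same path already has the commit.
git clone --quiet shared/myproject.git bob
cat bob/README
```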
Versioning the pentaho-solutions directory
Originally we included the entire CBF structure in the same repository, as described in the CBF documentation. SVN lets you check out a subdirectory of a repository, so we could check out only the project-client/solution/ folder. Git can't check out just a subdirectory that way, and I could never get my head around submodules, so I simply have 2 different repositories (notice a trend here with the number 2?):
- project-client
- project-client-solution
The second has all the BI server solution files. The first has... everything else, from the CBF-specific structure to ETL. After cloning the project-client repository, we either symlink project-client/solution to a separate project-client-solution clone or clone project-client-solution directly inside the project-client directory.
Here's a real-world example from one of our projects:
pedro@arpeggio:~/tex/pentaho/project-client (master)$ d
total 68
24 -rw-r--r-- 1 pedro pedro 20859 Feb 21 2011 build.xml.cbf-3.7
4 drwxr-xr-x 2 pedro pedro 4096 May 21 16:41 config/
4 drwxr-xr-x 2 pedro pedro 4096 Feb 17 2011 etl/
4 -rwxr-xr-x 1 pedro pedro 138 Apr 27 2011 importCache.sh*
4 -rw-rw-r-- 1 pedro pedro 1057 May 21 14:46 kettle.properties.diogo
4 -rw-rw-r-- 1 pedro pedro 1118 May 21 14:46 kettle.properties.remote
8 -rw-rw-r-- 1 pedro pedro 4534 May 21 14:46 kettle.properties.server
4 drwxr-xr-x 4 pedro pedro 4096 Apr 19 12:57 patches/
4 drwxr-xr-x 5 pedro pedro 4096 Feb 17 2011 patches-ee/
4 -rwxrwxr-x 1 pedro pedro 261 May 21 14:46 remote_in.sh*
0 lrwxrwxrwx 1 pedro pedro 29 Sep 6 2011 solution -> ../project-client-solution/
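The layout in the listing can be reproduced with a short script. This is a sketch that rests on a few assumptions. Two throwaway local repositories stand in for the server-hosted project-client and project-client-solution, and their contents are invented. Keeping the solution symlink out of project-client's history through .git/info/exclude is a choice made for this example, not necessarily how the original project handles it.

```shell
#!/bin/sh
# Sketch: two sibling clones plus a "solution" symlink, as in the listing.
# Repository contents are hypothetical placeholders.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
mkdir upstream

# Stand-ins for the two central repositories.
git -c init.defaultBranch=master init --quiet upstream/project-client
(cd upstream/project-client && mkdir etl config && touch etl/.keep config/.keep &&
  git add . && git commit --quiet -m "CBF skeleton")
git -c init.defaultBranch=master init --quiet upstream/project-client-solution
(cd upstream/project-client-solution && mkdir dashboards && touch dashboards/.keep &&
  git add . && git commit --quiet -m "solution files")

# The developer's sandbox: both repositories cloned side by side...
mkdir pentaho
cd pentaho
git clone --quiet ../upstream/project-client project-client
git clone --quiet ../upstream/project-client-solution project-client-solution

# ...and a relative symlink so CBF finds the solution at project-client/solution.
ln -s ../project-client-solution/ project-client/solution
# Local-only ignore rule so the link never shows up in project-client's status.
echo "solution" >> project-client/.git/info/exclude

ls -l project-client
```

The alternative mentioned in the text, cloning project-client-solution straight into project-client/solution, avoids the symlink. It also needs the same kind of ignore rule so that project-client doesn't try to track the nested repository.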
project-client-solution is simply the Pentaho solution folder without the system-specific and sample folders: admin, bi-developers, plugin-samples, steel-wheels and system. Tune this exclusion list at will in the .gitignore file. Here's mine for project-client-solution:
pedro@arpeggio:~/tex/pentaho/project-stonegate-solution (master)$ cat .gitignore
admin/
system/
steel-wheels/
bi-developers/
*_tmp*
index*.properties
cde_sample/
.project
plugin-samples/
From this point on we can use generic VCS techniques. Git can take a while to get used to, but the commands we need are very simple. I won't focus much on the project-client CBF structure, as there's plenty of documentation on the CBF website, but everything here applies to it as well.
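Before the first commit, it's worth checking that the exclusion list does what you expect. The sketch below runs git check-ignore (Git 1.8.2 and later) against the .gitignore patterns listed above. The file names are hypothetical: one real dashboard, one system file, one temporary file and one localized properties file.

```shell
#!/bin/sh
# Sketch: verify which paths the .gitignore keeps out of the repository.
# File names are hypothetical examples.
set -eu
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet project-client-solution
cd project-client-solution

# The exclusion list from the post.
cat > .gitignore <<'EOF'
admin/
system/
steel-wheels/
bi-developers/
*_tmp*
index*.properties
cde_sample/
.project
plugin-samples/
EOF

mkdir -p system dashboards
touch system/pentaho.xml dashboards/sales.cdfde dashboards/sales_tmp.cdfde index_pt.properties

# -v shows the .gitignore line that matched each ignored path.
git check-ignore -v dashboards/sales_tmp.cdfde index_pt.properties || true

# What Git would actually offer to commit (system/ is skipped entirely).
git status --porcelain --untracked-files=all
```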
Moving on to the list of questions.
FAQ: Frequently Asked Questions
Q: How do I check out a project with Git?
A: $ git clone email@example.com:project-client-solution
This works not only for the development sandboxes (both project-client and project-client-solution) but also, for project-client-solution, on the production and staging machines. That lets us manage versioning on the servers too.
Q: Should several developers work on the same development box? How do we avoid conflicts?
A: I don't recommend it. It's always a good idea for each developer to have a local development sandbox. Sharing a box is doable if the developers work on different areas, but you'll run into conflicts that are much harder to isolate.
Q: How do I check in/check out units of work?
A: Once we have our local repository, we can update it to the latest version with:
$ git pull
Q: How do I check in my work?
Unlike SVN, committing doesn't send your work to the remote repository. You should commit early and often (you don't even need an Internet connection for that) and then push to the central server:
$ git add . # stage the changes that go into the next commit
$ git commit -m 'message'
$ git push
This is where a visual tool comes in handy. Mac users have GitX, Linux users have gitg, and everyone has gitk and git gui. For the commit part, I usually use the visual tool.
Q: What are the guidelines for packaging a new version of a dashboard and migrating it from development to test, and from test to production?
Development happens on the main branch, called master. When we're ready to release a certain version, we create a new branch named after that version (feel free to use whatever naming scheme you like). In this example, I'll call it v1.
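You can try the commit-early, push-later rhythm in a sandbox. The sketch below uses a throwaway bare repository as the central server, and the file names are made up. The command git log origin/master..HEAD is a handy way to see which local commits haven't been pushed yet.

```shell
#!/bin/sh
# Sketch: several local commits, then pushes to a (hypothetical) central repo.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet --bare central.git
git -c init.defaultBranch=master clone --quiet central.git sandbox
cd sandbox

# Commit early and often: these commits only touch the local repository.
for n in 1 2 3; do
  echo "widget $n" >> dashboard.txt
  git add dashboard.txt
  git commit --quiet -m "step $n"
done
# First push: -u also makes origin/master the upstream of local master.
git push --quiet -u origin master

echo "tweak" >> dashboard.txt
git commit --quiet -a -m "tweak"   # -a stages changes to already-tracked files

# Local commits not yet on the server (just "tweak" at this point).
git log --oneline origin/master..HEAD
git push --quiet
```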
That branch can then be checked out on the QA server:
dev$ git branch v1
dev$ git push origin v1 # a new branch has to be pushed explicitly
qa$ git pull
qa$ git checkout v1
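You can rehearse the release-branch step locally. In this sketch, the names are hypothetical: one throwaway bare repository plays the central server, and the dev and qa directories play the development sandbox and the QA server's checkout.

```shell
#!/bin/sh
# Sketch: cut a v1 release branch on dev and check it out on QA.
# Repository and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet --bare central.git

# dev: initial work on master.
git -c init.defaultBranch=master clone --quiet central.git dev
(cd dev && echo "dashboard v1" > sales.cdfde && git add sales.cdfde &&
  git commit --quiet -m "first dashboard" && git push --quiet -u origin master)

# qa: the QA server has its own clone, made before the release exists.
git clone --quiet central.git qa

# dev: create the release branch and publish it.
(cd dev && git branch v1 && git push --quiet origin v1)

# qa: fetch the new branch and switch to it; Git creates a local v1 tracking origin/v1.
cd qa
git pull --quiet
git checkout --quiet v1
git rev-parse --abbrev-ref HEAD
```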
The next step is testing it. If we find a bug that needs fixing, we fix it on that branch and, if appropriate, merge the fix back into the development branch:
qa$ git commit -a -m 'fixed bug on v1'
qa$ git push
dev$ git checkout master # make sure we're back on master
dev$ git fetch # get the fix QA pushed to origin/v1
dev$ git merge origin/v1 # merge the bug fix into master
dev$ git push # fix integrated
Once we're happy with the solution and ready to go to production, tags come in handy: we create a tag and push it.
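In this sketch, master has moved on since v1 was cut, so the fix arrives as a real merge rather than a fast-forward. All names and file contents are hypothetical. The --no-edit flag accepts Git's default merge message so that the script doesn't open an editor.

```shell
#!/bin/sh
# Sketch: fix a bug on the v1 release branch (QA) and merge it into master (dev).
# Names and file contents are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet --bare central.git

# Setup: dev publishes master and cuts v1.
git -c init.defaultBranch=master clone --quiet central.git dev
(cd dev && echo "total = sum(sales)" > query.txt && git add query.txt &&
  git commit --quiet -m "initial query" && git push --quiet -u origin master &&
  git branch v1 && git push --quiet origin v1)

# Meanwhile development continues on master.
(cd dev && echo "new widget" > widget.txt && git add widget.txt &&
  git commit --quiet -m "new widget" && git push --quiet)

# QA works directly on the release branch and pushes the fix.
git clone --quiet --branch v1 central.git qa
(cd qa && echo "total = sum(sales) - sum(returns)" > query.txt &&
  git commit --quiet -a -m "fixed bug on v1" && git push --quiet)

# dev: back on master, fetch the fix and merge it.
cd dev
git checkout --quiet master
git fetch --quiet
git merge --quiet --no-edit origin/v1
git push --quiet
cat query.txt
```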
qa$ git tag v1.0
qa$ git push --tags
prod$ git fetch --tags # also works when prod is on a detached HEAD from a previous tag
prod$ git checkout v1.0
That would put you on the correct version. Don't forget to update the solution repository.
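This sketch walks through the tag-and-deploy step end to end. It uses a lightweight tag, as in the commands above; git tag -a would create an annotated tag that also records who tagged it and when. The repository names are hypothetical. Refreshing the BI server afterwards is Pentaho-specific and is not scripted here.

```shell
#!/bin/sh
# Sketch: tag the tested release on QA and deploy exactly that tag on production.
# Names are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet --bare central.git

# Setup: a v1 branch with one tested commit, checked out on QA.
git -c init.defaultBranch=master clone --quiet central.git qa
(cd qa && echo "release candidate" > sales.cdfde && git add sales.cdfde &&
  git commit --quiet -m "v1 ready" && git push --quiet -u origin master &&
  git checkout --quiet -b v1 && git push --quiet -u origin v1)

# Production already has a clone of the solution repository.
git clone --quiet central.git prod

# QA: tag the tested commit and publish the tag.
(cd qa && git tag v1.0 && git push --quiet --tags)

# prod: fetch tags and check out the release (a detached HEAD at v1.0).
cd prod
git fetch --quiet --tags
git checkout --quiet v1.0
git describe --tags
```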
Q: I was playing with the solution but I don't want to commit any change; I just want to wipe everything and get back to the clean state.
$ git reset --hard HEAD
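Note that git reset --hard only restores tracked files; any new files created while experimenting stay behind. The sketch below uses git clean to remove those as well, and git clean -n previews what would be deleted. The file names are hypothetical.

```shell
#!/bin/sh
# Sketch: throw away experiments, including files Git doesn't track yet.
# File names are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet sandbox
cd sandbox
echo "original" > sales.cdfde
git add sales.cdfde
git commit --quiet -m "clean state"

# Experiments: edit a tracked file and create an untracked one.
echo "experiment" > sales.cdfde
echo "scratch" > scratch.cdfde

git reset --quiet --hard HEAD   # sales.cdfde is back to "original"
git status --porcelain          # "?? scratch.cdfde" survives the reset
git clean -n                    # dry run: list what would be removed
git clean -f                    # remove untracked files (-d for dirs, -x for ignored files too)
```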
Q: I changed a single file and I want to revert it to its last committed state.
$ git checkout -- <file>
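In this sketch, only the named file is reverted and edits to other files are kept. The file names are hypothetical. The -- separates options from paths, so a file name can't be mistaken for a branch name. On Git 2.23 and later, git restore <file> does the same job.

```shell
#!/bin/sh
# Sketch: revert one file, keep other uncommitted edits. Names are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet sandbox
cd sandbox
echo "a v1" > a.cdfde
echo "b v1" > b.cdfde
git add .
git commit --quiet -m "baseline"

echo "a broken" > a.cdfde
echo "b improved" > b.cdfde

# Discard the changes to a.cdfde only.
git checkout -- a.cdfde
git status --porcelain   # only " M b.cdfde" remains
```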
Q: How do we avoid overwriting each other's work (which happens now if we're not careful)?
There's no Git command for this; it basically comes for free. Git refuses a push if someone else has pushed in the meantime, so you have to pull and merge their changes first, and nobody's work gets silently overwritten. However, if you're working on a bigger change, it's a good idea to create a new branch for it. That way the feature can be developed independently.
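You can see the "comes for free" part with two clones of the same repository. In this sketch, with hypothetical users alice and bob, bob's push is rejected because alice pushed first. It only goes through after he fetches and merges her commit, so nothing is overwritten.

```shell
#!/bin/sh
# Sketch: Git rejects a push that would discard someone else's commits.
# Users and files are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet --bare central.git
git -c init.defaultBranch=master clone --quiet central.git alice
(cd alice && echo "v0" > shared.txt && git add shared.txt &&
  git commit --quiet -m "start" && git push --quiet -u origin master)
git clone --quiet central.git bob

# Alice pushes first.
(cd alice && echo "alice" > alice.txt && git add alice.txt &&
  git commit --quiet -m "alice's work" && git push --quiet)

# Bob committed on top of the old state; his push is refused.
cd bob
echo "bob" > bob.txt
git add bob.txt
git commit --quiet -m "bob's work"
if git push --quiet 2>/dev/null; then
  echo "unexpected: push accepted"
else
  echo "push rejected: fetch and merge first"
fi

git fetch --quiet origin
git merge --quiet --no-edit origin/master   # integrate Alice's commit
git push --quiet                            # now accepted
```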
Here's a schematic of how it conceptually works:
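$ git checkout -b featureX # create the branch and switch to it
This starts an isolated line of development. You can do regular commits, pushes (the first push of a new branch is git push -u origin featureX), etc. When you're finished, you can merge it back into the main branch. You do that by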
1) switching to the main branch, usually called master:
$ git checkout master
2) merging featureX back into master:
$ git merge featureX
3) resolving any conflicts, if there are some, and committing. After a push, featureX will be available to everyone.
This isn't meant to be a full Git tutorial; there's tons of great documentation out there, and it's a really powerful tool. But it should provide some good approaches for handling a Pentaho implementation.
If you have any extra questions, just email me or comment here and I'll add them.
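This sketch runs the three steps end to end in a sandbox. The featureX branch and the file names are hypothetical. The last two commands, which delete the merged branch locally and on the server, are optional housekeeping that the steps above don't require.

```shell
#!/bin/sh
# Sketch: develop on a feature branch, then merge it back into master.
# Branch and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init --quiet --bare central.git
git -c init.defaultBranch=master clone --quiet central.git dev
cd dev
echo "main dashboard" > main.cdfde
git add main.cdfde
git commit --quiet -m "baseline"
git push --quiet -u origin master

# Isolated development on featureX.
git checkout --quiet -b featureX
echo "new chart" > featureX.cdfde
git add featureX.cdfde
git commit --quiet -m "featureX: new chart"
git push --quiet -u origin featureX       # share work in progress

# 1) back to master  2) merge  3) push
git checkout --quiet master
git merge --quiet --no-edit featureX
git push --quiet

# Optional cleanup of the merged branch.
git branch --quiet -d featureX
git push --quiet origin --delete featureX
```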