HEP TMVA kit

Root is a C++ analysis framework (and much more) that is very popular in High Energy Physics (HEP). TMVA is a toolkit for training and applying various machine learning algorithms (or rather multivariate analysis algorithms, as they are called in HEP).

The kit provided here was used to produce the "simple TMVA boosted trees" benchmark; it has been tested specifically with TMVA v4.1.3 in Root 5.34/03 on Mac OS 10.9.2. We provide it mostly to help HEP people get started with the challenge.

The boosted trees algorithm is used because it is the one used in the ATLAS reference analysis, from which the provided data come. No attempt has been made to optimise the parameters of the trees (hence the "simple" keyword).

The kit is actually written in Python (Root comes with a C++ interpreter, but the more recent Python binding is increasingly popular). It is a simple script with five steps:

  1. conversion of the .csv file into a .root file
  2. training on the training file
  3. evaluation of the scores for the training and test files
  4. optimisation of the score threshold with respect to AMS
  5. creation of the submission file
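Step 4 can be illustrated with a minimal sketch in plain Python, independent of Root and TMVA. It assumes the AMS definition used by the challenge, with a regularisation term of 10 added to the background, and treats every observed score as a candidate threshold. The names ams and best_threshold and the list-based inputs are illustrative assumptions, not taken from higgsmltmva.py, and the sketch does not renormalise the weights if only part of the training set is used.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for weighted signal s and background b.

    b_reg is the regularisation term (assumed to be 10, as in the challenge).
    """
    radicand = 2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, radicand))


def best_threshold(scores, labels, weights):
    """Return (cut, ams) maximising AMS when selecting events with score >= cut.

    labels are "s" (signal) or "b" (background); weights are event weights.
    """
    # Sort by decreasing score so the selected region grows one event at a time.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    s = b = 0.0
    best_cut, best_ams = None, -1.0
    for rank, i in enumerate(order):
        if labels[i] == "s":
            s += weights[i]
        else:
            b += weights[i]
        # Evaluate only once all events sharing this score have been added.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            value = ams(s, b)
            if value > best_ams:
                best_cut, best_ams = scores[i], value
    return best_cut, best_ams


if __name__ == "__main__":
    # Toy input, purely for illustration.
    cut, value = best_threshold(
        [0.9, 0.8, 0.3, 0.1], ["s", "s", "b", "b"], [1.0, 1.0, 1.0, 1.0]
    )
    print("best cut:", cut, "AMS:", value)
```

Sorting once and accumulating the weighted sums keeps the scan at one pass over the events instead of recomputing the sums for every candidate threshold.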

By default, all steps are run one after the other, but the output of each step is saved to disk, so that each step can be run or rerun separately for easy debugging.
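One common way to get this behaviour is to skip a step when its output file already exists. The sketch below shows that pattern only; the helper run_step, its force flag and the file name in the usage example are hypothetical, and the actual script may select steps differently.

```python
import os


def run_step(output_path, produce, force=False):
    """Run produce(output_path) unless its output file already exists.

    Returns True if the step ran and False if it was skipped.
    Pass force=True to rerun a step even though its output is present.
    """
    if os.path.exists(output_path) and not force:
        return False
    produce(output_path)
    return True


if __name__ == "__main__":
    # Hypothetical step that just writes a marker file.
    def fake_step(path):
        with open(path, "w") as f:
            f.write("done\n")

    print(run_step("step_output.tmp", fake_step))  # runs if the file is absent
    print(run_step("step_output.tmp", fake_step))  # skipped on the second call
    os.remove("step_output.tmp")
```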

How to run the kit (assuming Root is already installed):

  1. in one directory, download training.csv and test.csv (from Kaggle) and, from here, the script higgsmltmva.py
  2. run the script: python higgsmltmva.py
  3. submit the resulting submission.csv to the Kaggle site
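Before uploading, it can be useful to check the submission file locally. The sketch below assumes the challenge's submission format: a header line EventId,RankOrder,Class, one row per test event, RankOrder a permutation of 1..N, and Class equal to s or b. The function check_submission is illustrative and not part of the kit, and it is written for Python 3.

```python
import csv


def check_submission(path):
    """Return the number of data rows if the file looks well formed.

    Raises ValueError on the first problem found.
    """
    ranks = []
    event_ids = set()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["EventId", "RankOrder", "Class"]:
            raise ValueError("unexpected header: %r" % (header,))
        for line_no, row in enumerate(reader, start=2):
            if len(row) != 3:
                raise ValueError("line %d: expected 3 fields" % line_no)
            event_id, rank, label = row
            if event_id in event_ids:
                raise ValueError("line %d: duplicate EventId %s" % (line_no, event_id))
            if label not in ("s", "b"):
                raise ValueError("line %d: Class must be s or b" % line_no)
            event_ids.add(event_id)
            ranks.append(int(rank))  # int() raises ValueError on bad input
    n = len(ranks)
    # Every rank from 1 to n must appear exactly once.
    if sorted(ranks) != list(range(1, n + 1)):
        raise ValueError("RankOrder must be a permutation of 1..%d" % n)
    return n


if __name__ == "__main__":
    print(check_submission("submission.csv"), "rows look well formed")
```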

Questions and suggestions about this kit should preferably be posted on the Challenge's forum on Kaggle.

 
