ROOTR RMLR
RMLR (ROOT MACHINE LEARNING WITH R) FOR TMVA
Machine Learning in R
Introduction
R does not define a standardized interface for its machine learning algorithms. Therefore, for any non-trivial experiment, you need to write lengthy, tedious and error-prone wrappers to call the different algorithms and unify their respective output. Additionally, you need to implement infrastructure to resample your models, optimize hyperparameters, select features, cope with pre- and post-processing of data and compare models in a statistically meaningful way. As this becomes computationally expensive, you might want to parallelize your experiments as well. This often forces users to make suboptimal trade-offs in their experiments due to time constraints or a lack of expert programming skills. mlr provides this infrastructure so that you can focus on your experiments! The framework currently focuses on supervised methods such as classification, regression and survival analysis, together with their corresponding evaluation and optimization. It is written so that you can extend it yourself, or deviate from the implemented convenience methods and construct your own complex experiments or algorithms.
Methods for Classification, Regression, Cluster and Survival Analysis:
http://mlr-org.github.io/mlr-tutorial/devel/html/integrated_learners/index.html
mlr Project Sites
https://github.com/berndbischl/mlr/
http://berndbischl.github.io/mlr/tutorial/html/index.html
http://www.rdocumentation.org/packages/mlr
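The unified interface described above can be sketched in a few lines of plain R. This is a minimal example (not part of the ROOT macros below), assuming mlr is installed and using R's built-in iris dataset; `mmce` is mlr's mean misclassification error measure:

```r
# Minimal mlr workflow: task -> learner -> train -> predict -> evaluate
library(mlr)

task = makeClassifTask(data = iris, target = "Species")  # define the problem
lrn  = makeLearner("classif.lda")                        # pick an algorithm
mod  = train(lrn, task)                                  # fit the model
pred = predict(mod, task = task)                         # predict on the task data
performance(pred, measures = mmce)                       # mean misclassification error
```

Swapping the algorithm only requires changing the string passed to makeLearner; the rest of the workflow stays identical.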
Required Packages
reshape colorspace minqa nloptr profileModel numDeriv rrcov GGally RColorBrewer dichromat munsell labeling pbkrtest itertools lme4 brglm lava cubature SparseM gdata caTools sgeostat robCompositions digest gtable scales proto Rcpp stringr rJava car missForest Formula corpcor foreach BradleyTerry2 ade4 rgl R2HTML prodlim np quantreg rgenoud lhs mnormt plotmo plotrix entropy latticeExtra acepack igraph combinat stabs nnls quadprog mvtnorm strucchange coin zoo sandwich doParallel iterators gtools bitops DEoptimR gplots pcaPP mvoutlier glasso matrixcalc RWekajars fdrtool maptree ParamHelpers BBmisc ggplot2 checkmate parallelMap plyr reshape2 ada adabag bartMachine brnn care caret clue clusterSim clValid cmaes CoxBoost crs Cubist DiceKriging DiceOptim DiscriMiner e1071 earth elmNN emoa extraTrees FNN FSelector gbm GenSA glmnet Hmisc irace kernlab kknn klaR kohonen laGP LiblineaR lqa mboost mco mda mlbench modeltools mRMRe nodeHarvest party penalized pls pROC randomForest randomForestSRC randomUniformForest RCurl rjson robustbase ROCR rrlda rsm RWeka sda stepPlr testthat tgp TH.data
RMLR
Installing
install.packages("mlr", dependencies=TRUE)
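After installation you can check from an R session that the package loads and that the LDA learner used below is available. A small sketch, assuming a working mlr installation:

```r
# Verify the installation and inspect the available classifiers
library(mlr)
listLearners("classif")        # table of all integrated classification learners
makeLearner("classif.lda")     # the LDA (Fisher) learner used in this example
```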
Example using background and signal from TMVA with the LDA (Fisher) classification method
Note that the mlr classification learners do not support event weights (see http://mlr-org.github.io/mlr-tutorial/devel/html/integrated_learners/index.html).
Write the following ROOT script, ReadData.C, to load the data from the .root file and pass it to the R environment:
#include<TRInterface.h>
#include<vector>

void ReadTree(std::vector<float> &var1, std::vector<float> &var2,
              std::vector<float> &var3, std::vector<float> &var4, TTree *t)
{
   float v1, v2, v3, v4;
   t->SetBranchAddress("var1", &v1);
   t->SetBranchAddress("var2", &v2);
   t->SetBranchAddress("var3", &v3);
   t->SetBranchAddress("var4", &v4);
   for (int i = 0; i < t->GetEntries(); i++) {
      t->GetEntry(i);
      var1.push_back(v1);
      var2.push_back(v2);
      var3.push_back(v3);
      var4.push_back(v4);
   }
}

// Read signal and background into R
void ReadData()
{
   ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();

   TString fname = "./tmva_class_example.root";
   if (gSystem->AccessPathName(fname)) // file does not exist in local directory
      gSystem->Exec("curl -O http://root.cern.ch/files/tmva_class_example.root");

   TFile *input = TFile::Open(fname);
   std::cout << "--- RMLR Classification : Using input file: " << input->GetName() << std::endl;

   // --- Register the training and test trees
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");

   // NOTE: the first half of var1..var4 holds the signal, the second half the background
   std::vector<float> var1, var2, var3, var4;
   // signal variables
   ReadTree(var1, var2, var3, var4, signal);
   // background variables
   ReadTree(var1, var2, var3, var4, background);

   // Pass the variables to R
   r["var1"] << var1;
   r["var2"] << var2;
   r["var3"] << var3;
   r["var4"] << var4;

   // Custom variables like in TMVA
   r << "myvar1 <- var1+var2";
   r << "myvar2 <- var1-var2";

   // A factor is required for classification;
   // this factor has two levels (background and signal).
   // https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
   r << "target <- factor(c(rep('signal',6000), rep('background',6000)))";

   // NOTE: the data frame has columns myvar1, myvar2, var3, var4, target
   //   xs1, xs2, xs3, xs4, signal      ... (6000 rows for signal)
   //   xb1, xb2, xb3, xb4, background  ... (6000 rows for background)
   r << "tmvadata <- data.frame(myvar1, myvar2, var3, var4, target=target)";
}
With the data loaded, run:
ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
ReadData();
r << "library(mlr, quietly = TRUE)";

// Define the task:
r << "task = makeClassifTask(id = 'RMLR', data = tmvadata, target = 'target')";
// Define the learner:
r << "lrn = makeLearner('classif.lda')";
// Define the resampling strategy:
r << "rdesc = makeResampleDesc(method = 'CV', stratify = TRUE)";
// Do the resampling:
r << "r = resample(learner = lrn, task = task, resampling = rdesc, show.info = FALSE)";

// Train the learner.
// You can take a subset of the data (e.g. with the sample function)
// and pass it via the subset option of the train function.
// http://berndbischl.github.io/mlr/tutorial/html/train/index.html
r << "subset = seq(1, 12000, by = 2)"; // 3000 signal and 3000 background entries
r << "trainer = train('classif.lda', task, subset = subset)";

r << "print('-------------------------------------------------------------------')";
r << "print('------------ Classification: LDA information ---------------------')";
r << "print(trainer$learner)";
r << "cat('TIME:', trainer$time, '\n\n')";
r << "print(trainer$learner.model)";
r << "print('-------------------------------------------------------------------')";
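As a possible next step, one could evaluate the trained model on the entries that were left out of the training subset. The lines below are a sketch in the same ROOT/TRInterface style (not part of the original example); `mmce` is mlr's mean misclassification error measure:

```cpp
// Hypothetical continuation: predict on the held-out half of the data
// (the entries not in 'subset') and print the mean misclassification error.
r << "test.set = setdiff(seq_len(12000), subset)";
r << "pred = predict(trainer, task = task, subset = test.set)";
r << "print(performance(pred, measures = mmce))";
r << "print(head(as.data.frame(pred)))"; // truth vs. predicted class per event
```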