
ROOTR RMLR

RMLR (ROOT MACHINE LEARNING WITH R) FOR TMVA


Machine Learning in R


Introduction

R does not define a standardized interface for all of its machine learning algorithms. Therefore, for any non-trivial experiment, you need to write lengthy, tedious and error-prone wrappers to call the different algorithms and unify their respective outputs. Additionally, you need to implement infrastructure to resample your models, optimize hyperparameters, select features, cope with pre- and post-processing of data, and compare models in a statistically meaningful way. As this becomes computationally expensive, you might want to parallelize your experiments as well. This often forces users into unsatisfactory trade-offs due to time constraints or a lack of expert programming skills.

mlr provides this infrastructure so that you can focus on your experiments. The framework currently focuses on supervised methods such as classification, regression and survival analysis, together with their corresponding evaluation and optimization. It is written so that you can extend it yourself, or deviate from the implemented convenience methods and construct your own complex experiments or algorithms.
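
For a first look at this unified interface, here is a minimal sketch (not part of the TMVA example below) that runs a cross-validated LDA on R's built-in iris data from a ROOT session through TRInterface; the mlr function names are real, while the macro name MlrQuickLook is arbitrary.

#include<TRInterface.h>

void MlrQuickLook()
{
   ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
   r<<"library(mlr, quietly = TRUE)";
   //every experiment follows the same pattern: task -> learner -> resampling
   r<<"task  = makeClassifTask(id = 'iris', data = iris, target = 'Species')";
   r<<"lrn   = makeLearner('classif.lda')";
   r<<"rdesc = makeResampleDesc('CV', iters = 3)";
   r<<"res   = resample(lrn, task, rdesc, show.info = FALSE)";
   //aggregated mean misclassification error over the folds
   r<<"print(res$aggr)";
}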

Methods for Classification, Regression, Clustering and Survival Analysis

http://mlr-org.github.io/mlr-tutorial/devel/html/integrated_learners/index.html

mlr Project Sites

https://github.com/berndbischl/mlr/
http://berndbischl.github.io/mlr/tutorial/html/index.html
http://www.rdocumentation.org/packages/mlr

Required Packages

reshape colorspace minqa nloptr profileModel numDeriv rrcov GGally 
RColorBrewer dichromat munsell labeling pbkrtest itertools lme4 brglm 
lava cubature SparseM gdata caTools sgeostat robCompositions digest
gtable scales proto Rcpp stringr rJava car missForest Formula corpcor
foreach BradleyTerry2 ade4 rgl R2HTML prodlim np quantreg rgenoud lhs
mnormt plotmo plotrix entropy latticeExtra acepack igraph combinat stabs
nnls quadprog mvtnorm strucchange coin zoo sandwich doParallel iterators 
gtools bitops DEoptimR gplots pcaPP mvoutlier glasso matrixcalc RWekajars 
fdrtool maptree ParamHelpers BBmisc ggplot2 checkmate parallelMap plyr 
reshape2 ada adabag bartMachine brnn care caret clue clusterSim clValid 
cmaes CoxBoost crs Cubist DiceKriging DiceOptim DiscriMiner e1071 earth 
elmNN emoa extraTrees FNN FSelector gbm GenSA glmnet Hmisc irace kernlab 
kknn klaR kohonen laGP LiblineaR lqa mboost mco mda mlbench modeltools 
mRMRe nodeHarvest party penalized pls pROC randomForest randomForestSRC 
randomUniformForest RCurl rjson robustbase ROCR rrlda rsm RWeka sda stepPlr 
testthat tgp TH.data
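
Most of these are pulled in automatically when mlr is installed with dependencies=TRUE (see Installing below). If some are still missing for a particular learner, a sketch like the following can be used from ROOT to install only the absent ones; the package vector is abbreviated here and the CRAN mirror is just an example.

#include<TRInterface.h>

void InstallRequired()
{
   ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
   //abbreviated list: extend 'pkgs' with the package names above as needed
   r<<"pkgs <- c('ParamHelpers','BBmisc','ggplot2','checkmate','parallelMap')";
   //keep only the packages that are not installed yet
   r<<"missing <- pkgs[!pkgs %in% installed.packages()[,'Package']]";
   r<<"if (length(missing)) install.packages(missing, repos = 'https://cran.r-project.org')";
}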


RMLR



Installing

install.packages("mlr", dependencies=TRUE)



Example using signal and background from TMVA with the LDA (Fisher) classification method

Note that the mlr classification methods do not support event weights; see the list of integrated learners: http://mlr-org.github.io/mlr-tutorial/devel/html/integrated_learners/index.html

Write the following ROOT script, ReadData.C, to load the data from the .root file and pass it to the R environment:

#include<TRInterface.h>
#include<TFile.h>
#include<TTree.h>
#include<TSystem.h>
#include<vector>
#include<iostream>

void ReadTree(std::vector<float> &var1, std::vector<float> &var2, std::vector<float> &var3, std::vector<float> &var4, TTree *t)
{
   float v1, v2, v3, v4;

   t->SetBranchAddress("var1", &v1);
   t->SetBranchAddress("var2", &v2);
   t->SetBranchAddress("var3", &v3);
   t->SetBranchAddress("var4", &v4);

   for (int i = 0; i < t->GetEntries(); i++) {
      t->GetEntry(i);
      var1.push_back(v1);
      var2.push_back(v2);
      var3.push_back(v3);
      var4.push_back(v4);
   }
}

//read signal and background into R
void ReadData()
{
   ROOT::R::TRInterface &r=ROOT::R::TRInterface::Instance();
   TString fname = "./tmva_class_example.root";
   
   if (gSystem->AccessPathName( fname ))  // file does not exist in local directory
      gSystem->Exec("curl -O http://root.cern.ch/files/tmva_class_example.root");
   
   TFile *input = TFile::Open( fname ); 
   std::cout << "--- RMLR Classification       : Using input file: " << input->GetName() << std::endl;

   // --- Register the training and test trees
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
  
   //NOTE: the first half of var1,var2,var3,var4 is signal, the other half background
   //signal variables
   std::vector<float>  var1,var2,var3,var4;
   ReadTree(var1,var2,var3,var4,signal);
   //background variables
   ReadTree(var1,var2,var3,var4,background);
   
   //passing variables to R   
   r["var1"]<<var1;
   r["var2"]<<var2;
   r["var3"]<<var3;
   r["var4"]<<var4;
   //custom variables like in tmva
   r<<"myvar1 <- var1+var2";
   r<<"myvar2 <- var1-var2";
  
   //A factor is required for classification
   //this factor has two levels (background and signal)
   //https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
   r<<"target<-factor(c(rep('signal',6000),rep('background',6000)))";
   //NOTE: the data frame has myvar1,myvar2,var3,var4,target
   //                         xs1   ,xs2   ,xs3 ,xs4 ,signal
   //                         ......(6000 rows for signal)
   //                         xb1   ,xb2   ,xb3 ,xb4 ,background
   //                         ......(6000 rows for background)
   
   r<<"tmvadata <- data.frame(myvar1,myvar2,var3,var4,target=target)";
  
  
}
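
As a quick sanity check (not part of the original macro), the structure and the signal/background balance of the resulting data frame can be inspected right after calling ReadData():

ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
ReadData();
//column types and a few example values of the data frame built above
r<<"str(tmvadata)";
//should report 6000 signal and 6000 background rows
r<<"print(table(tmvadata$target))";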



With the data loaded, run the following to define the task, the LDA learner and the resampling strategy, and to train the model:
    ROOT::R::TRInterface &r=ROOT::R::TRInterface::Instance();
    ReadData();
    r<<"library(mlr,quietly =TRUE)";
    //## Define the task:
    r<<"task = makeClassifTask(id = 'RMLR', data = tmvadata, target = 'target')";

    //## Define the learner:
    r<<"lrn = makeLearner('classif.lda')";

    //## Define the resampling strategy:
    r<<"rdesc = makeResampleDesc(method = 'CV', stratify = TRUE)";

    //## Do the resampling:
    r<<"r = resample(learner = lrn, task = task, resampling = rdesc, show.info = FALSE)";
    
    //## Train the learner
    //you can take a subset of data using sample function
    //and you can pass it using the option subset=data in the train function
    r<<"subset=seq(1, 12000, by = 2)";//3000 signals and 3000 background data
    //http://berndbischl.github.io/mlr/tutorial/html/train/index.html
    r<<"trainer = train('classif.lda', task,subset=subset)";
    
    r<<"print('-------------------------------------------------------------------')";
    r<<"print('------------Classification: LDA   information ---------------------')";
    r<<"print(trainer$learner)";
    r<<"cat('TIME:',trainer$time,'\n\n')";
    r<<"print(trainer$learner.model)";
    r<<"print('-------------------------------------------------------------------')";