
ROOTR RMLR

RMLR (ROOT MACHINE LEARNING WITH R) FOR TMVA


Machine Learning in R


Introduction

R does not define a standardized interface for all of its machine learning algorithms. Therefore, for any non-trivial experiment, you need to write lengthy, tedious and error-prone wrappers to call the different algorithms and unify their respective outputs. Additionally, you need to implement infrastructure to resample your models, optimize hyperparameters, select features, cope with pre- and post-processing of data, and compare models in a statistically meaningful way. As this becomes computationally expensive, you might want to parallelize your experiments as well. This often forces users into unsatisfactory trade-offs due to time constraints or a lack of expert programming skills.

mlr provides this infrastructure so that you can focus on your experiments. The framework currently focuses on supervised methods such as classification, regression and survival analysis, together with their corresponding evaluation and optimization. It is written so that you can extend it yourself, or deviate from the implemented convenience methods and construct your own complex experiments or algorithms.
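
For a first look at this unified interface, here is a minimal sketch (not part of the TMVA example below) that runs a cross-validated LDA on R's built-in iris data from a ROOT session through TRInterface; the mlr function names are real, while the macro name MlrQuickLook is arbitrary.

#include<TRInterface.h>

void MlrQuickLook()
{
   ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
   r<<"library(mlr, quietly = TRUE)";
   //every experiment follows the same pattern: task -> learner -> resampling
   r<<"task  = makeClassifTask(id = 'iris', data = iris, target = 'Species')";
   r<<"lrn   = makeLearner('classif.lda')";
   r<<"rdesc = makeResampleDesc('CV', iters = 3)";
   r<<"res   = resample(lrn, task, rdesc, show.info = FALSE)";
   //aggregated mean misclassification error over the folds
   r<<"print(res$aggr)";
}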

Methods for Classification, Regression, Clustering and Survival Analysis

http://mlr-org.github.io/mlr-tutorial/devel/html/integrated_learners/index.html

mlr Project Sites

https://github.com/berndbischl/mlr/
http://berndbischl.github.io/mlr/tutorial/html/index.html
http://www.rdocumentation.org/packages/mlr

Required Packages

reshape colorspace minqa nloptr profileModel numDeriv rrcov GGally 
RColorBrewer dichromat munsell labeling pbkrtest itertools lme4 brglm 
lava cubature SparseM gdata caTools sgeostat robCompositions digest
gtable scales proto Rcpp stringr rJava car missForest Formula corpcor
foreach BradleyTerry2 ade4 rgl R2HTML prodlim np quantreg rgenoud lhs
mnormt plotmo plotrix entropy latticeExtra acepack igraph combinat stabs
nnls quadprog mvtnorm strucchange coin zoo sandwich doParallel iterators 
gtools bitops DEoptimR gplots pcaPP mvoutlier glasso matrixcalc RWekajars 
fdrtool maptree ParamHelpers BBmisc ggplot2 checkmate parallelMap plyr 
reshape2 ada adabag bartMachine brnn care caret clue clusterSim clValid 
cmaes CoxBoost crs Cubist DiceKriging DiceOptim DiscriMiner e1071 earth 
elmNN emoa extraTrees FNN FSelector gbm GenSA glmnet Hmisc irace kernlab 
kknn klaR kohonen laGP LiblineaR lqa mboost mco mda mlbench modeltools 
mRMRe nodeHarvest party penalized pls pROC randomForest randomForestSRC 
randomUniformForest RCurl rjson robustbase ROCR rrlda rsm RWeka sda stepPlr 
testthat tgp TH.data
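
Most of these are pulled in automatically when mlr is installed with dependencies=TRUE (see Installing below). If some are still missing for a particular learner, a sketch like the following can be used from ROOT to install only the absent ones; the package vector is abbreviated here and the CRAN mirror is just an example.

#include<TRInterface.h>

void InstallRequired()
{
   ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
   //abbreviated list: extend 'pkgs' with the package names above as needed
   r<<"pkgs <- c('ParamHelpers','BBmisc','ggplot2','checkmate','parallelMap')";
   //keep only the packages that are not installed yet
   r<<"missing <- pkgs[!pkgs %in% installed.packages()[,'Package']]";
   r<<"if (length(missing)) install.packages(missing, repos = 'https://cran.r-project.org')";
}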


RMLR



Installing

install.packages("mlr", dependencies=TRUE)



Example using signal and background from TMVA with the LDA (Fisher) classification method

Note that the mlr classification methods do not support event weights; see the list of integrated learners: http://mlr-org.github.io/mlr-tutorial/devel/html/integrated_learners/index.html

Write the following ROOT script, ReadData.C, to load the data from the .root file and pass it to the R environment:

#include<TRInterface.h>
#include<TFile.h>
#include<TTree.h>
#include<TSystem.h>
#include<vector>
#include<iostream>

void ReadTree(std::vector<float> &var1, std::vector<float> &var2, std::vector<float> &var3, std::vector<float> &var4, TTree *t)
{
   float v1, v2, v3, v4;

   t->SetBranchAddress("var1", &v1);
   t->SetBranchAddress("var2", &v2);
   t->SetBranchAddress("var3", &v3);
   t->SetBranchAddress("var4", &v4);

   for (int i = 0; i < t->GetEntries(); i++) {
      t->GetEntry(i);
      var1.push_back(v1);
      var2.push_back(v2);
      var3.push_back(v3);
      var4.push_back(v4);
   }
}

//read signal and background into R
void ReadData()
{
   ROOT::R::TRInterface &r=ROOT::R::TRInterface::Instance();
   TString fname = "./tmva_class_example.root";
   
   if (gSystem->AccessPathName( fname ))  // file does not exist in local directory
      gSystem->Exec("curl -O http://root.cern.ch/files/tmva_class_example.root");
   
   TFile *input = TFile::Open( fname ); 
   std::cout << "--- RMLR Classification       : Using input file: " << input->GetName() << std::endl;

   // --- Register the training and test trees
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
  
   //NOTE: the first half of var1,var2,var3,var4 is signal, the other half background
   //signal variables
   std::vector<float>  var1,var2,var3,var4;
   ReadTree(var1,var2,var3,var4,signal);
   //background variables
   ReadTree(var1,var2,var3,var4,background);
   
   //passing variables to R   
   r["var1"]<<var1;
   r["var2"]<<var2;
   r["var3"]<<var3;
   r["var4"]<<var4;
   //custom variables like in tmva
   r<<"myvar1 <- var1+var2";
   r<<"myvar2 <- var1-var2";
  
   //A factor is required for classification
   //this factor has two levels (background and signal)
   //https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
   r<<"target<-factor(c(rep('signal',6000),rep('background',6000)))";
   //NOTE: the data frame has myvar1,myvar2,var3,var4,target
   //                         xs1   ,xs2   ,xs3 ,xs4 ,signal
   //                         ......(6000 rows for signal)
   //                         xb1   ,xb2   ,xb3 ,xb4 ,background
   //                         ......(6000 rows for background)
   
   r<<"tmvadata <- data.frame(myvar1,myvar2,var3,var4,target=target)";
  
  
}
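
As a quick sanity check (not part of the original macro), the structure and the signal/background balance of the resulting data frame can be inspected right after calling ReadData():

ROOT::R::TRInterface &r = ROOT::R::TRInterface::Instance();
ReadData();
//column types and a few example values of the data frame built above
r<<"str(tmvadata)";
//should report 6000 signal and 6000 background rows
r<<"print(table(tmvadata$target))";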



With the data loaded, run the following to define the task, the LDA learner and the resampling strategy, and to train the model:
    ROOT::R::TRInterface &r=ROOT::R::TRInterface::Instance();
    ReadData();
    r<<"library(mlr,quietly =TRUE)";
    //## Define the task:
    r<<"task = makeClassifTask(id = 'RMLR', data = tmvadata, target = 'target')";

    //## Define the learner:
    r<<"lrn = makeLearner('classif.lda')";

    //## Define the resampling strategy:
    r<<"rdesc = makeResampleDesc(method = 'CV', stratify = TRUE)";

    //## Do the resampling:
    r<<"r = resample(learner = lrn, task = task, resampling = rdesc, show.info = FALSE)";
    
    //## Train the learner
    //you can take a subset of data using sample function
    //and you can pass it using the option subset=data in the train function
    r<<"subset=seq(1, 12000, by = 2)";//3000 signals and 3000 background data
    //http://berndbischl.github.io/mlr/tutorial/html/train/index.html
    r<<"trainer = train('classif.lda', task,subset=subset)";
    
    r<<"print('-------------------------------------------------------------------')";
    r<<"print('------------Classification: LDA   information ---------------------')";
    r<<"print(trainer$learner)";
    r<<"cat('TIME:',trainer$time,'\n\n')";
    r<<"print(trainer$learner.model)";
    r<<"print('-------------------------------------------------------------------')";