Random Forest Predrag Radenković 3237/10

Presentation transcript:

Random Forest Predrag Radenković 3237/10 Faculty of Electrical Engineering, University of Belgrade

Definition Random forest (or random forests) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. The term comes from random decision forests, first proposed by Tin Kam Ho of Bell Labs in 1995. The method combines Breiman's "bagging" idea and the random selection of features.
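A minimal sketch (my own illustration, not from the slides) of how the mode of the individual trees' votes becomes the forest's prediction, assuming the trees are already-fitted scikit-learn-style estimators with a predict method:

```python
# Aggregate the class votes of fitted trees by taking the mode (majority vote).
from collections import Counter

def forest_predict(trees, x):
    votes = [tree.predict([x])[0] for tree in trees]   # one class label per tree
    return Counter(votes).most_common(1)[0][0]         # mode of the votes
```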

Decision trees Decision trees are the individual learners that are combined. They are one of the most popular learning methods, commonly used for data exploration. One type of decision tree is called CART: the classification and regression tree. CART uses greedy, top-down, binary, recursive partitioning that divides the feature space into a set of disjoint rectangular regions. Regions should be pure with respect to the response variable. A simple model is fit in each region: a majority vote for classification, a constant value for regression.
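To illustrate the greedy partitioning step, here is a hedged sketch (an assumed helper, not the presenter's code) of choosing one binary split by minimizing the weighted Gini impurity of the two child regions; CART applies this search recursively within each region:

```python
# Scan every feature and threshold; keep the split with the lowest weighted Gini impurity.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    best = (None, None, np.inf)                      # (feature index, threshold, impurity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:    # skip degenerate splits
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best
```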

Decision trees involve greedy, recursive partitioning. [Figure: a simple dataset with two predictors, and the greedy, recursive partitioning along TI and PE.]

Algorithm Each tree is constructed using the following algorithm: 1. Let the number of training cases be N, and the number of variables in the classifier be M. 2. We are told the number m of input variables to be used to determine the decision at a node of the tree; m should be much less than M. 3. Choose a training set for this tree by sampling N times with replacement from all N available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree by predicting their classes. 4. For each node of the tree, randomly choose m variables on which to base the decision at that node, and calculate the best split based on these m variables in the training set. 5. Each tree is fully grown and not pruned (as may be done when constructing a normal tree classifier). For prediction, a new sample is pushed down each tree and assigned the label of the training samples in the terminal node it ends up in. This procedure is iterated over all trees in the ensemble, and the majority vote of all trees is reported as the random forest prediction.
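A compact sketch of this construction, assuming scikit-learn's DecisionTreeClassifier as the base learner (its max_features parameter plays the role of m); prediction then aggregates the trees' votes as in the earlier mode sketch:

```python
# Bootstrap-sample N cases per tree, restrict each node to m randomly chosen variables,
# and grow each tree fully without pruning. Bootstrap indices are kept for OOB estimates.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=100, m=None, seed=0):
    rng = np.random.default_rng(seed)
    N, M = X.shape
    m = m or max(1, int(np.sqrt(M)))                 # a common default when m is not given
    trees, boot_indices = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, N, size=N)             # bootstrap sample, with replacement
        tree = DecisionTreeClassifier(max_features=m)    # m variables considered per node
        trees.append(tree.fit(X[idx], y[idx]))       # fully grown, unpruned tree
        boot_indices.append(idx)                     # remembered for OOB bookkeeping later
    return trees, boot_indices
```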

Algorithm flow chart (for computer scientists): [Figure: flow chart of the tree-building algorithm.]

Random Forest – practical considerations Splits are chosen according to a purity measure, e.g. squared error (regression), or the Gini index or deviance (classification). How to select N (the number of trees)? Build trees until the error no longer decreases. How to select m? Try the recommended default, half of it, and twice it, and pick the best.
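A hedged sketch of that m-selection heuristic, using scikit-learn's RandomForestClassifier and its built-in out-of-bag score as a stand-in implementation (my assumption, not the presenter's code):

```python
# Try the default m (sqrt of the number of variables), half of it, and twice it,
# and keep whichever value gives the lowest out-of-bag error.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pick_m(X, y, n_trees=500):
    default_m = max(1, int(np.sqrt(X.shape[1])))
    candidates = {default_m, max(1, default_m // 2), min(X.shape[1], default_m * 2)}
    oob_errors = {}
    for m in candidates:
        rf = RandomForestClassifier(n_estimators=n_trees, max_features=m,
                                    oob_score=True, random_state=0)
        rf.fit(X, y)
        oob_errors[m] = 1.0 - rf.oob_score_          # OOB error for this choice of m
    return min(oob_errors, key=oob_errors.get)
```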

Features and Advantages The advantages of random forests are: It is one of the most accurate learning algorithms available; for many data sets, it produces a highly accurate classifier. It runs efficiently on large databases. It can handle thousands of input variables without variable deletion. It gives estimates of which variables are important in the classification. It generates an internal, unbiased estimate of the generalization error as the forest building progresses. It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.

Features and Advantages It has methods for balancing error in class-imbalanced data sets. Generated forests can be saved for future use on other data. Prototypes are computed that give information about the relation between the variables and the classification. It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) giving interesting views of the data. These capabilities can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection. It offers an experimental method for detecting variable interactions.

Disadvantages Random forests have been observed to overfit on some datasets with noisy classification/regression tasks. For data including categorical variables with different numbers of levels, random forests are biased in favor of attributes with more levels; therefore, the variable importance scores from a random forest are not reliable for this type of data.

RF – Additional information Estimating the test error: While growing the forest, estimate the test error from the training samples. For each tree grown, roughly one third of the samples (about 36.8% on average) are not selected in the bootstrap sample; these are called out-of-bag (OOB) samples. Using the OOB samples as input to the corresponding tree, predictions are made as if they were novel test samples. Through book-keeping, the majority vote (classification) or average (regression) is computed for all OOB samples over all trees. The test error estimated this way is very accurate in practice, with a reasonable N.
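A sketch of that OOB bookkeeping, assuming the trees and their bootstrap indices were kept as in the earlier build_forest sketch:

```python
# For each tree, predict only the cases it never saw during training; then take the
# majority OOB vote per case and compare it with the true label.
import numpy as np
from collections import Counter

def oob_error(trees, boot_indices, X, y):
    votes = [[] for _ in range(len(X))]
    for tree, idx in zip(trees, boot_indices):
        oob = np.setdiff1d(np.arange(len(X)), idx)   # out-of-bag cases for this tree
        if len(oob) == 0:
            continue
        for i, pred in zip(oob, tree.predict(X[oob])):
            votes[i].append(pred)                    # this tree's vote for an unseen case
    preds = [Counter(v).most_common(1)[0][0] for v in votes if v]
    truth = [y[i] for i, v in enumerate(votes) if v]
    return float(np.mean(np.array(preds) != np.array(truth)))   # OOB misclassification rate
```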

RF – Additional information Estimating the importance of each predictor: Denote by ê the OOB estimate of the loss when using the original training set D. For each predictor x_p, where p ∈ {1,…,k}: randomly permute the p-th predictor to generate a new set of samples D' = {(y_1, x'_1), …, (y_N, x'_N)}, then compute the OOB estimate ê_p of the prediction error with the new samples. A measure of the importance of predictor x_p is ê_p − ê, the increase in error due to the random permutation of the p-th predictor.
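A sketch of this permutation-importance measure, reusing the oob_error helper sketched for the previous slide (again an illustrative assumption, not the presenter's code):

```python
# ê is the OOB error on the original data; ê_p is the OOB error after randomly
# permuting the p-th predictor column; the importance of x_p is ê_p - ê.
import numpy as np

def permutation_importance(trees, boot_indices, X, y, seed=0):
    rng = np.random.default_rng(seed)
    base_error = oob_error(trees, boot_indices, X, y)        # ê on the original training set
    importances = np.zeros(X.shape[1])
    for p in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, p] = rng.permutation(X_perm[:, p])         # permute the p-th predictor
        importances[p] = oob_error(trees, boot_indices, X_perm, y) - base_error   # ê_p - ê
    return importances
```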

Conclusions & summary: Fast, fast, fast! RF is fast to build, and even faster to predict. Practically speaking, not requiring cross-validation for model selection alone can speed training by 10x-100x or more. Fully parallelizable, to go even faster. Automatic predictor selection from a large number of candidates. Resistance to overtraining. Ability to handle data without preprocessing: data does not need to be rescaled, transformed, or modified; resistant to outliers; automatic handling of missing values. Cluster identification: can be used to generate tree-based clusters through sample proximity.

Thank you! Predrag Radenković 3237/10