Random Forest Photometric Redshift Estimation

Samuel Carliles (1), Tamas Budavari (2), Sebastien Heinis (2), Carey Priebe (3), Alex Szalay (2)
Johns Hopkins University: (1) Dept. of Computer Science, (2) Dept. of Physics & Astronomy, (3) Dept. of Applied Mathematics & Statistics

Photometric Redshifts
- You know what they are
- I did it on SDSS DR6 colors
- The spectroscopic redshift is modeled as z_spec = f(u-g, g-r, r-i, i-z)
- The estimate is ẑ_phot = f(u-g, g-r, r-i, i-z)
- The estimation error is ε = ẑ_phot - z_spec
- I did it with Random Forests
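For concreteness, a minimal R sketch of building these color features. The data frame sdss_dr6 and its column names are illustrative assumptions, not from the talk:

```r
# Hypothetical input: 'sdss_dr6' with magnitudes u, g, r, i, z and the
# spectroscopic redshift z_spec. The four colors are the regression inputs.
train <- data.frame(
  ug    = sdss_dr6$u - sdss_dr6$g,
  gr    = sdss_dr6$g - sdss_dr6$r,
  ri    = sdss_dr6$r - sdss_dr6$i,
  iz    = sdss_dr6$i - sdss_dr6$z,
  zspec = sdss_dr6$z_spec
)
```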

Regression Trees
- A binary tree
- It partitions the input training data into clusters of similar objects
- Each new test object is matched with the cluster to which it is "closest" in the input space
- The output value is the mean of the output values of the training objects in its cluster
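As a concrete illustration (not the authors' code), here is a regression tree fit in R with the standard rpart package, reusing the train frame sketched above:

```r
library(rpart)

# method = "anova" grows a regression tree: each leaf predicts the mean
# response of the training objects that fall in it.
tree <- rpart(zspec ~ ug + gr + ri + iz, data = train, method = "anova")

# A test object is routed to the leaf whose region it falls in and
# receives that leaf's mean as its estimate.
predict(tree, newdata = data.frame(ug = 0.4, gr = 0.2, ri = 0.1, iz = 0.05))
```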

Building a Regression Tree
- Starting at the root node, choose a dimension on which to split
- Choose the point which "best" distinguishes clusters in that dimension
- Points to the left go in the left child; points to the right go in the right child
- Repeat the process in each child node until every object is in its own leaf node
[Diagram: example split points x1, x2, x3]
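A minimal R sketch of this greedy recursion (illustrative, not the authors' code); the best_split helper it calls is sketched below, after the resubstitution-error slide:

```r
# Grow the tree by recursive partitioning until no further split is
# possible (in the limit, every object sits in its own leaf).
build_tree <- function(X, y) {
  s <- if (length(y) > 1) best_split(X, y) else NULL
  if (is.null(s)) {
    return(list(leaf = TRUE, value = mean(y)))   # leaf node: predict the mean
  }
  go_left <- X[[s$dim]] <= s$point               # points left vs. right
  list(leaf  = FALSE, dim = s$dim, point = s$point,
       left  = build_tree(X[go_left,  , drop = FALSE], y[go_left]),
       right = build_tree(X[!go_left, , drop = FALSE], y[!go_left]))
}
```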

How Do You Choose the Dimension and Split Point?
- The best split point in a dimension is the one which minimizes the resubstitution error in that dimension
- The best dimension is the one whose best split point has the lowest resubstitution error

What's Resubstitution Error?
- For a candidate split point, there are points to the left and points to the right
- ε = Σ_L (x - x̄_L)² / N_L + Σ_R (x - x̄_R)² / N_R, where x̄_L and x̄_R are the means over the N_L left and N_R right points
- That's the resubstitution error
- Minimize it
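In code, reading the slide's sums as running over the response (redshift) values of the objects on each side of the split, the last two slides look like this sketch:

```r
# Resubstitution error of a candidate split, as defined above: the
# per-side mean squared deviation from that side's mean response.
resub_error <- function(x, y, point) {
  yL <- y[x <= point]
  yR <- y[x >  point]
  sum((yL - mean(yL))^2) / length(yL) +
    sum((yR - mean(yR))^2) / length(yR)
}

# Best split: the (dimension, point) pair with the lowest resubstitution
# error; candidate points are midpoints between consecutive observed values.
best_split <- function(X, y) {
  best <- NULL
  for (d in names(X)) {
    v <- sort(unique(X[[d]]))
    if (length(v) < 2) next                      # this dimension can't split
    for (p in (v[-1] + v[-length(v)]) / 2) {
      e <- resub_error(X[[d]], y, p)
      if (is.null(best) || e < best$error) {
        best <- list(dim = d, point = p, error = e)
      }
    }
  }
  best                                           # NULL if no split exists
}
```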

Randomizing a Regression Tree
- Train it on a bootstrap sample
- This is a sample of N objects chosen uniformly at random with replacement from the complete training set
- Instead of choosing the best dimension to split on from among all input dimensions, choose the best from a random subset of them
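A sketch of these two randomizations on top of the earlier build_tree, with one simplification flagged in the comments:

```r
random_tree <- function(X, y, mtry = max(1, floor(ncol(X) / 3))) {
  # 1. Bootstrap sample: N draws uniformly at random with replacement.
  idx  <- sample(nrow(X), nrow(X), replace = TRUE)
  # 2. Random dimension subset. For brevity this sketch draws one subset
  #    per tree; Breiman's Random Forests redraw it at every node.
  dims <- sample(names(X), mtry)
  build_tree(X[idx, dims, drop = FALSE], y[idx])
}
```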

Random Forest
- An ensemble of "randomized" Regression Trees
- The ensemble estimate is the mean of the individual tree estimates
- The individual tree estimation errors are iid, and the Central Limit Theorem gives the distribution of their mean
- Their mean is exactly ε = ẑ_phot - z_spec
- That means we have the error distribution for that object!
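Continuing the sketch (assumed helpers from the previous slides), the forest is a list of randomized trees, and its estimate is the mean of the tree estimates:

```r
# Route an object down one tree to its leaf value.
predict_tree <- function(node, obj) {
  if (node$leaf) return(node$value)
  if (obj[[node$dim]] <= node$point) predict_tree(node$left,  obj)
  else                               predict_tree(node$right, obj)
}

random_forest <- function(X, y, ntree = 100) {
  lapply(seq_len(ntree), function(i) random_tree(X, y))
}

# Ensemble estimate: the mean of the individual tree estimates. The spread
# of those estimates is what the CLT argument on this slide builds on.
predict_forest <- function(forest, obj) {
  est <- vapply(forest, predict_tree, numeric(1), obj = obj)
  c(mean = mean(est), sd = sd(est))
}
```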

Implemented in R
- More training data -> better estimates
- Forests converge pretty quickly in forest size
- Training set size and input space are constrained by memory in the R implementation
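The talk does not show the exact call, but a minimal usage sketch of the CRAN randomForest package (the standard R implementation) would be:

```r
library(randomForest)

# Illustrative parameters; the forest estimate converges quickly in ntree.
rf <- randomForest(zspec ~ ug + gr + ri + iz, data = train, ntree = 500)

# 'test' is an assumed held-out frame with the same color columns.
z_phot <- predict(rf, newdata = test)
```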

Results
- RMS Error = 0.023
- Training set size = 80,000

Error Distribution
[Plots: error distribution and standardized error distribution]
- Since we know the error distribution for each object, we can standardize the errors, and the results should be standard normal over all test objects. Like in this plot! :)
- If the standardized errors are standard normal, then we can predict how many of the errors fall between the tails of the distribution for different tail sizes. Like in this plot! (mostly)
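A sketch of the standardization check using the toy helpers above ('test' and its zspec column are assumed names): divide each object's error by its forest-estimated standard deviation, then compare to a standard normal with a QQ plot:

```r
forest <- random_forest(train[, c("ug", "gr", "ri", "iz")], train$zspec)

pred <- t(sapply(seq_len(nrow(test)),
                 function(i) predict_forest(forest, test[i, ])))
std_err <- (pred[, "mean"] - test$zspec) / pred[, "sd"]   # standardized errors

# If the standardized errors are standard normal, the points hug the line.
qqnorm(std_err)
qqline(std_err)
```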

Summary
- Random Forest estimates come with Gaussian error distributions
- 0.023 RMS error is competitive with other methodologies
- This makes Random Forests good

Future Work
- The Cramér-Rao lower bound (CRLB) says bigger N gives better estimates from the same estimator
- 80,000 objects is good, but we have far more than that available
- Random Forests in R are extremely memory- (and therefore time-) inefficient, I believe due to the FORTRAN implementation
- So I'm writing a C# implementation