CHAPTER 29 Classification and Regression Trees Dean L. Urban From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design,

CHAPTER 29 Classification and Regression Trees. Dean L. Urban. From: McCune, B. & J. B. Grace. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon.

Tables, Figures, and Equations

Table. A matrix matching statistical techniques to various applications that require group classification or discrimination. Applications are discussed in the Introduction, coded here as groups defined on species composition (SPP) or environmental variables (ENV). Techniques are discriminant analysis (DA), group-contrast Mantel test (GC-Mantel), multivariate analysis of variance (MANOVA), nonparametric MANOVA (NPMANOVA), multi-response permutation procedures (MRPP), indicator species analysis (ISA), classification and regression trees (CART), generalized linear models (GLM), and generalized additive models (GAM).

Application | Appropriate Techniques
Exploratory data analysis:
1a. Do SPP groups differ? | CART, DA, GC-Mantel, MANOVA, NPMANOVA, MRPP
1b. On which ENV variable(s)? | CART, DA, partial GC-Mantel
2a. Do ENV groups differ? | ISA, CART, GC-Mantel, MRPP
2b. On which SPP? | ISA, CART, partial GC-Mantel
3a. Do habitats differ? | DA, CART, MANOVA, NPMANOVA, MRPP, logistic regression, GLM, GAM, etc.
3b. On which variable(s)? | CART, DA, partial GC-Mantel, logistic regression, etc.
Predict group membership:
1c. on SPP | ISA (with some modification)
2c. on ENV | CART, DA, (multinomial) logistic regression
3c. habitat variables | CART, DA, logistic regression

Table. Indicator Species Analysis for the seven forest types identified via hierarchical clustering. Indicator values (IV) are percentages of perfect fidelity. Indicator values were tested for statistical significance based on 1000 permutations (**, p < 0.001; *, p < 0.005). Sequence = order of groups in data, Identifier = group identifier, Avg = average IV, Max = maximum IV, MaxGrp = group with highest IV.
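As an illustration of how an indicator value and its permutation test can be computed, here is a minimal Python sketch. It assumes the common formulation in which IV is relative abundance times relative frequency times 100, and a p-value of (1 + hits) / (1 + permutations). Both are assumptions, since the caption does not state the formula. The function names and the toy abundances are hypothetical, not data from the chapter.

```python
import random

def indicator_values(abundance, groups):
    """IV (percent) of one species in each group.

    Assumed formula: relative abundance x relative frequency x 100.
    """
    labels = sorted(set(groups))
    mean_ab, freq = {}, {}
    for g in labels:
        vals = [a for a, grp in zip(abundance, groups) if grp == g]
        mean_ab[g] = sum(vals) / len(vals)                    # mean abundance in group
        freq[g] = sum(1 for v in vals if v > 0) / len(vals)   # share of plots occupied
    total = sum(mean_ab.values())
    if total == 0:
        return {g: 0.0 for g in labels}
    return {g: 100.0 * (mean_ab[g] / total) * freq[g] for g in labels}

def permutation_p(abundance, groups, n_perm=1000, seed=0):
    """Observed maximum IV and its permutation p-value (group labels shuffled)."""
    rng = random.Random(seed)
    observed = max(indicator_values(abundance, groups).values())
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(indicator_values(abundance, shuffled).values()) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)

# Hypothetical species found only in the four plots of group "A".
toy_abundance = [5, 4, 6, 5, 0, 0, 0, 0]
toy_groups = ["A"] * 4 + ["B"] * 4
toy_iv = indicator_values(toy_abundance, toy_groups)
toy_max_iv, toy_p = permutation_p(toy_abundance, toy_groups)
```

A species confined to one group and present in all of that group's plots reaches the maximum IV of 100, which is what "perfect fidelity" means in the caption.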

Figure (Upper): Classification tree for 7 forest types on 15 environmental variables (function rpart, complexity parameter (cp) = , minsplit = 10, split = information).
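The settings "split = information" and "minsplit = 10" in the caption describe an entropy-based splitting rule and a minimum node size for attempting a split. The sketch below shows how such a criterion can score candidate thresholds on one variable. It is plain Python, not the rpart implementation, and the function names, the choice of base-2 logarithms, and the toy elevation data are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the node at values < threshold."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def best_split(values, labels, minsplit=10):
    """Best (threshold, gain), or None if the node is too small to split."""
    if len(labels) < minsplit:
        return None
    xs = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:
        return None
    scored = [(t, information_gain(values, labels, t)) for t in candidates]
    return max(scored, key=lambda tg: tg[1])

# Hypothetical node: two forest types separated cleanly by elevation.
toy_elevation = [100, 150, 200, 250, 300, 800, 850, 900, 950, 1000]
toy_forest = ["A"] * 5 + ["B"] * 5
toy_threshold, toy_gain = best_split(toy_elevation, toy_forest)
```

A full tree repeats this search over all 15 predictors at every node and recurses into the two child nodes until no node qualifies for splitting.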

Figure (Lower): Pruned classification tree, simplified by stopping the tree at the number of nodes corresponding to the point where the pruning curve crosses the minimum (1 S.E.) line (Fig. 29.2).

Table. Misclassification table for the 7 forest types, based on a pruned CART model with 11 nodes (Fig. 29.3). Rows are actual forest types, columns are predicted forest types. Row totals are indexed as number correct/number misclassified. Total misclassification rate based on jack-knifing is 39/98 (39.8%).
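The row totals and the overall rate in such a table follow directly from the counts. The sketch below, in plain Python, shows the arithmetic. The 3 x 3 matrix is a hypothetical stand-in, since the chapter's 7 x 7 counts are not reproduced here, and the function names are illustrative only.

```python
def misclassification_rate(confusion):
    """Return (n_wrong, n_total, rate); rows are actual, columns predicted."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    wrong = total - correct
    return wrong, total, wrong / total

def row_summaries(confusion):
    """Per-row 'number correct/number misclassified' strings."""
    return [f"{row[i]}/{sum(row) - row[i]}" for i, row in enumerate(confusion)]

# Hypothetical 3-class table (not the chapter's data).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
toy_wrong, toy_total, toy_rate = misclassification_rate(toy_confusion)
toy_rows = row_summaries(toy_confusion)

# The caption's own figure: 39 misclassified out of 98 plots.
caption_percent = round(100 * 39 / 98, 1)
```

The last line confirms that 39/98 rounds to the 39.8% quoted in the caption.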

Figure. Cost-complexity pruning curve for the classification tree in Figure . Error bars are estimated from 10 cross-validation subsets of the samples. The horizontal line is one standard error above the minimum error rate. “Inf” = infinite. Relative error is calculated by cross-validation.
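The pruning rule implied by the horizontal line can be sketched as follows: find the tree size with the lowest cross-validated error, add one standard error to that minimum, and keep the smallest tree whose error falls at or below the line. This Python sketch assumes a table of (number of splits, cross-validated relative error, standard error) rows. The values are hypothetical and are not read from the chapter's figure.

```python
def one_se_choice(cp_table):
    """Pick the smallest tree within one S.E. of the minimum CV error.

    cp_table rows: (n_splits, cv_error, cv_std_error).
    Returns (chosen_row, threshold).
    """
    best = min(cp_table, key=lambda row: row[1])      # lowest CV error
    threshold = best[1] + best[2]                     # the "1 S.E." line
    eligible = [row for row in cp_table if row[1] <= threshold]
    return min(eligible, key=lambda row: row[0]), threshold

# Hypothetical pruning table (not values from the chapter).
toy_cp_table = [
    (0, 1.00, 0.05),
    (1, 0.80, 0.06),
    (3, 0.62, 0.06),
    (5, 0.58, 0.06),
    (8, 0.55, 0.06),
    (12, 0.57, 0.06),
]
toy_choice, toy_line = one_se_choice(toy_cp_table)
```

In this toy table the minimum error occurs at 8 splits, but the 5-split tree is already within one standard error of it, so the simpler tree is preferred.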