Plans to improve estimators to better utilize panel data
John Coulston, Southern Research Station, Forest Inventory and Analysis



Background and Motivation Symposium session on combining panel data. Recommendation: "…any serious attempt at defining an estimation system for analysis of changes and trends over time must explicitly account for time in the assumed underlying model… adopt and encourage an inferential model for FIA that places time on an equal footing with area…" (Putting the A back in FIA, Clutter 2006).

Examples and approach
–Forest area change in Georgia from
–Spatial realization of forest age structure in Alabama in 2007
Use an appropriate technique for the question posed.

Why reinvent the wheel? Some analytical alternatives to Bechtold and Patterson 2005 for the annual forest inventory:
–Mixed estimator (Van Deusen 1999, 2002): current estimates; flexible underlying trend
–Mixed model (Smith & Conkling 2005): current estimates and significance of annual change; linear trend
–Random Forest (Breiman 2001; Crookston & Finley 2008): machine-learning approach to classification and regression; implemented in temporal map-based estimation

Is there a trend in Georgia forest area from

Mixed Estimator

Mixed Model

Stratified Estimate

Is there a trend in Georgia forest area from

Example: forest area trends in GA

Typical sampling error approach Hypothesis: H0: Δpf = 0; H1: Δpf ≠ 0. Approach: sampling errors overlap, so no significant change is declared. Issues: Type II errors; failure to leverage repeated measures.
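The Type II error risk of the overlap heuristic can be shown numerically: two estimates whose 95% confidence intervals overlap can still differ significantly under a direct two-sample z-test of the difference. A minimal sketch (the estimates and standard errors are invented numbers, not FIA data):

```python
import math

# Hypothetical forest-area proportions with standard errors for two panels
p1, se1 = 0.62, 0.015
p2, se2 = 0.57, 0.015
z_crit = 1.96  # two-sided 5% level

# Overlap heuristic: do the two 95% confidence intervals intersect?
ci1 = (p1 - z_crit * se1, p1 + z_crit * se1)
ci2 = (p2 - z_crit * se2, p2 + z_crit * se2)
overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]

# Direct test of the difference: H0: delta_pf = 0 vs. H1: delta_pf != 0
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # independent estimates
z = (p1 - p2) / se_diff
significant = abs(z) > z_crit

print(overlap, significant)  # True True: intervals overlap, yet change is significant
```

Here the intervals overlap, so the heuristic declares "no change", while the z-test on the difference rejects H0 — exactly the Type II error the slide warns about.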

Explicitly testing for change If the trend is sufficiently linear, then the mixed model can be used to test H0: b1 = 0 vs. H1: b1 ≠ 0. Recall the mixed model: b1 is the slope (the change in y over time).
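The slope test can be sketched with ordinary least squares (the full Smith & Conkling mixed model also carries random effects, which are omitted here; the yearly values below are invented for illustration):

```python
import math

# Hypothetical annual forest-area percentages over a panel cycle
years   = [1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006]
percent = [64.1, 64.0, 63.7, 63.5, 63.4, 63.1, 62.9, 62.6]

n = len(years)
xbar = sum(years) / n
ybar = sum(percent) / n
sxx = sum((x - xbar) ** 2 for x in years)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, percent))

b1 = sxy / sxx                    # slope: change in forest area per year
b0 = ybar - b1 * xbar
resid = [y - (b0 + b1 * x) for x, y in zip(years, percent)]
s2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance
se_b1 = math.sqrt(s2 / sxx)

# t-test of H0: b1 = 0 vs. H1: b1 != 0
t = b1 / se_b1
t_crit = 2.447                    # two-sided 5% critical value, df = n - 2 = 6
trend_detected = abs(t) > t_crit
print(f"b1 = {b1:.3f} per year, |t| = {abs(t):.1f}, trend detected: {trend_detected}")
```

Because the test uses all eight annual observations jointly, it detects the downward trend that pairwise comparisons of overlapping sampling errors would miss.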

Example 2: Spatial realization of forest age structure in Alabama in 2007 Using a time series of Landsat images, identify the disturbance year and magnitude for each pixel. Calibrate the disturbance year and magnitude information to FIA age-class information based on:

C_jz = f(X_z, Y_z, M_z(j-d), (j-d)_z, F_jz)

where C_jz is the age class for location z at time j; X_z is the longitude of location z; Y_z is the latitude of location z; M_z(j-d) is the magnitude of the last disturbance, in year j-d, at location z; (j-d)_z is the number of years since the last disturbance at location z; and F_jz is the land cover in year j at location z.
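Fitting the calibration f(·) with a Random Forest classifier can be sketched as follows. Everything here is synthetic: the pixel values, the land-cover codes, and the rule generating age class are invented, and scikit-learn stands in for the R implementation cited earlier:

```python
import random
from sklearn.ensemble import RandomForestClassifier

random.seed(0)

# Synthetic pixel records mirroring the predictors in the model above:
# [X_z (lon), Y_z (lat), M_z(j-d) (magnitude), (j-d)_z (years since), F_jz (cover code)]
def make_pixel():
    years_since = random.randint(0, 60)
    magnitude = random.uniform(0, 1) if years_since < 60 else 0.0
    lon = random.uniform(-88.5, -85.0)   # roughly Alabama longitudes
    lat = random.uniform(30.2, 35.0)     # roughly Alabama latitudes
    cover = random.choice([1, 2, 3])     # hypothetical land-cover codes
    # Assumed rule: age class tracks time since last stand-clearing disturbance
    age_class = min(years_since // 20, 2)
    return [lon, lat, magnitude, years_since, cover], age_class

data = [make_pixel() for _ in range(500)]
X = [row for row, _ in data]
y = [label for _, label in data]

# Out-of-bag cases give a built-in accuracy estimate, as in the algorithm's step 3
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print(f"out-of-bag accuracy: {rf.oob_score_:.2f}")
```

With real data, the fitted forest would then be applied per pixel to produce the age-class map evaluated on the next slide.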

Random Forest Algorithm Each tree is constructed using the following algorithm:
1. Let the number of training cases be N, and the number of variables in the classifier be M.
2. We are told the number m of input variables to be used to determine the decision at a node of the tree; m should be much less than M.
3. Choose a training set for this tree by choosing N times with replacement from all N available training cases (i.e., take a bootstrap sample). Use the rest of the cases to estimate the error of the tree by predicting their classes.
4. For each node of the tree, randomly choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set.
5. Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
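The five steps above can be sketched directly: draw a bootstrap sample per tree (step 3), grow an unpruned tree that considers only m random variables at each split (steps 2, 4, and 5, via scikit-learn's max_features), and combine trees by majority vote. The toy data below is invented; this is a sketch of the algorithm, not the production implementation:

```python
import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

random.seed(0)

# Toy data: N cases, M variables; the class depends on the first two variables
N, M = 300, 4
X = [[random.uniform(0, 1) for _ in range(M)] for _ in range(N)]
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]
m = 2  # step 2: variables tried per node; m should be much less than M

trees = []
for i in range(25):
    # Step 3: bootstrap sample of size N, drawn with replacement
    idx = [random.randrange(N) for _ in range(N)]
    Xb = [X[j] for j in idx]
    yb = [y[j] for j in idx]
    # Steps 4-5: fully grown, unpruned tree; m random variables per node
    tree = DecisionTreeClassifier(max_features=m, random_state=i)
    tree.fit(Xb, yb)
    trees.append(tree)

def forest_predict(row):
    # Majority vote over all trees in the forest
    votes = Counter(int(t.predict([row])[0]) for t in trees)
    return votes.most_common(1)[0][0]

train_acc = sum(forest_predict(r) == label for r, label in zip(X, y)) / N
print(f"training accuracy of the 25-tree forest: {train_acc:.2f}")
```

The per-node random subset of m variables is what distinguishes Random Forest from plain bagging: it decorrelates the trees so that the majority vote reduces variance more effectively.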

Accuracy of Age Class Map

Conclusions
–No one technique could answer the two questions posed; use the appropriate methodology, or combination of methodologies, to address your question.
–From the examples, time should be explicitly accounted for when doing trend analysis or making current estimates. Leverage the longitudinal (repeated-measures) data when possible.
–The temporally indifferent method currently used by FIA does generally provide estimates with smaller standard errors. However, it is not a current estimate, and the estimate should be tied to the approximate mid-point of the cycle, not the end year.
–All demonstrated techniques run in the R statistical package, which can be directly linked to either internal Oracle tables or FIADB.