
1

2 U.S. Department of the Interior U.S. Geological Survey Decision Trees for Land Cover Mapping Guilty Parties: B. Wylie, C. Homer, C. Huang, L. Yang, M. Coan EROS Data Center, Sioux Falls

3 Talk Contents
+ Concept
+ Advantages
+ Training
+ Descriptive and Predictive Trees
+ More than Just Class Prediction
+ Cross Validation
+ Splus versus See5/C5
+ Hierarchical Trees
+ Recipe for Success

4 How do you eat an elephant? One bite at a time! Divide & Conquer. Stratify & Predict.

5 Separating Apples and Oranges (Lemons?) [figure: scatter plot of Weight versus Specific Gravity; a single threshold split classifies one group 6/6 = 100% correctly and the other 6/7 = 86%]
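The apples-and-oranges idea above, picking the single threshold that best separates two classes, can be sketched in plain Python. The fruit measurements below are hypothetical, chosen only so that one class separates perfectly and the other at 6/7, as in the figure:

```python
def best_split(samples):
    """samples: list of (value, label) pairs. Returns (threshold, accuracy)
    of the best rule 'class A if value <= threshold else class B'."""
    values = sorted(set(v for v, _ in samples))
    best = (None, 0.0)
    for t in values:
        for left, right in (("apple", "orange"), ("orange", "apple")):
            correct = sum(1 for v, lab in samples
                          if lab == (left if v <= t else right))
            acc = correct / len(samples)
            if acc > best[1]:
                best = (t, acc)
    return best

# Hypothetical specific-gravity values; one orange overlaps the apples.
samples = [(0.70, "apple"), (0.75, "apple"), (0.78, "apple"),
           (0.90, "orange"), (0.95, "orange"), (1.00, "orange"),
           (0.76, "orange")]
t, acc = best_split(samples)
```

With the overlapping orange, no single threshold does better than 6/7, which is the 86% case on the slide.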

6 Example Decision Tree [figure: binary tree splitting on MNF bands, e.g. Mnf1<=28, Mnf3<=19, Mnf13>56, Mnf16<=54, Mnf2<=43; left branch = True, right branch = False; leaves are classes: decid., shrub, cedar, P. pine]
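A tree like the one in the figure is applied per pixel by walking from the root to a leaf. A minimal sketch, with nodes as nested tuples; the band names and thresholds only loosely follow the figure and are not the actual zone model:

```python
# Each internal node: (band, threshold, true_branch, false_branch);
# leaves are class-name strings. Structure is illustrative only.
TREE = ("Mnf1", 28,
        ("Mnf3", 19, "decid.", ("Mnf13", 56, "shrub", "cedar")),
        ("Mnf16", 54, ("Mnf2", 43, "cedar", "P. pine"), "P. pine"))

def classify(pixel, node=TREE):
    """Walk the tree: take the True branch when pixel[band] <= threshold."""
    while not isinstance(node, str):
        band, thresh, true_br, false_br = node
        node = true_br if pixel[band] <= thresh else false_br
    return node
```

For example, a pixel with Mnf1=20 and Mnf3=10 follows two True branches and lands in the "decid." leaf.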

7 Advantages of Decision Trees
+ Rapid
+ Repeatable
+ Nonparametric
+ Utilize categorical data
+ Handle non-linear relationships
+ Less sensitive to errors in training data
Disadvantages of Decision Trees
+ Require lots of training data
+ Over-fitting
+ Weighted toward the relative % of training data
+ Short-sighted (stepwise methods fail to identify the optimal subset of regressors -- Fox, 1991)

8 Other Methods:
Unsupervised Clustering
+ Cluster busting can be time consuming
+ Cluster interpretation is subjective
+ Cannot include categorical inputs
+ Difficult to interpret with multiple dates or when non-spectral data are included (e.g. DEM)
+ Parametric (assumes normal distribution)
Supervised Classification
+ Parametric (assumes normal distribution)
+ Cannot include categorical inputs
+ Problematic with multiple dates or when non-spectral data are included (e.g. DEM)
+ Difficult for large-area applications
Neural Nets
+ Long convergence times (training)
+ High CPU demands (training)
+ Grey box
+ Tricked by local minima
+ Non-repeatability of results (random search functions)
+ Sensitive to errors in training data

9 Spectral variability: a monoculture of wheat. Training data must capture the variability of a class (sample size).

10 [figure: Weight versus Specific Gravity scatter plot] Would 2 examples of each class produce a reliable separation?

11 Training samples
A classification tree is a "data mining" method, so it performs well with large training data sets.
+ Sampling of classes should reflect their relative frequency in the study area: rare classes = few training points; common classes = many training points
+ Adequate but not over-sampling of rare classes
+ Samples should be widely distributed over the study area to minimize autocorrelation effects and allow effective use of date-band information.
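The frequency-proportional sampling guidance above can be sketched as a simple allocation rule: split a training budget across classes by map frequency, with a floor so rare classes are still adequately (but not over-) sampled. Class names, frequencies, and the floor of 30 points are hypothetical:

```python
def allocate_samples(class_freq, total, floor=30):
    """Split 'total' training points across classes in proportion to their
    relative frequency in the study area, guaranteeing each rare class at
    least 'floor' points. class_freq: {class_name: fraction_of_map}."""
    return {c: max(floor, round(total * f)) for c, f in class_freq.items()}

# Hypothetical class frequencies for a 1000-point training budget:
freq = {"grass": 0.55, "forest": 0.40, "wetland": 0.05}
alloc = allocate_samples(freq, total=1000)
```

Here the rare wetland class gets 50 points, above the floor, while the common classes keep their proportional shares.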

12 Descriptive or Predictive decision trees? (De'ath and Fabricius 2000)
DESCRIPTIVE TREE:
1) A single tree
2) Objective is to understand important factors or functional relationships
3) The decisions used by the tree are as important as the predictions
A) Drivers of bear and deer habitat (Kobler and Adamic 2000, Debeljak et al. 2001)
B) Predicting species distributions (Vayssieres et al. 2000) + CART outperformed logistic regression

13 PREDICTION TREES:
1) Objective is best possible predictions
2) Combination of multiple trees
3) Higher accuracies, more stable and robust (DeFries and Chan 2000)

14 Multiple Tree Approaches: Prediction
1) Bagging (bootstrap sampling of training data) -- Splus & C5
2) Subset data layers -- Splus & C5
3) Boosting * -- C5
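The bagging approach listed above has two moving parts: bootstrap resampling of the training data (one resample per tree) and a plurality vote across the resulting trees' predictions for each pixel. A stdlib sketch of just those two steps, with hypothetical data:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw len(data) records with replacement -- bagging's resampling step.
    Each tree in the ensemble is trained on one such resample."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Combine the per-tree class predictions for one pixel by plurality."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
data = list(range(10))          # stand-in for training records
sample = bootstrap(data, rng)   # same size, drawn with replacement
vote = majority_vote(["forest", "shrub", "forest"])
</```

The vote step is the same whether the trees came from bagging, layer subsets, or boosting.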

15 Multiple Tree Approaches: Prediction. 2) Subset of data layers -- Splus & C5 [figure: separate trees built from soils, spectral, and LUDA layers (Tree 1, Tree 2, Tree 3), combined by VOTE]

16 Multiple Tree Approaches: Prediction. 3) Boosting (iterative trees try to account for previous trees' errors) -- C5. Different over-fitting issues associated with each tree tend to be averaged out. [figure: boosted trees combined by VOTE]
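The "account for previous trees' errors" step works by reweighting the training data: points the last tree got wrong are up-weighted so the next tree concentrates on them. A minimal AdaBoost-style sketch of one reweighting step (C5's internal boosting scheme differs in detail; this is the textbook update, not the See5 implementation):

```python
import math

def reweight(weights, correct, error):
    """One boosting step: up-weight samples the last tree misclassified.
    weights: per-sample weights summing to 1; correct: per-sample booleans;
    error: the tree's weighted error rate (0 < error < 0.5)."""
    alpha = 0.5 * math.log((1 - error) / error)   # tree's vote strength
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]               # renormalize to sum to 1

# Four equally weighted samples, one misclassified, error rate 0.25:
w = reweight([0.25] * 4, [True, True, True, False], error=0.25)
```

After the update the misclassified sample carries half the total weight, which is exactly what forces the next tree to attend to it.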

17 [figure: Single tree versus Boosted classification maps]

18 Boosting versus Single Tree (Zone 16, Independent Test Data)

19 Trees provide more than just land cover predictions. At each "terminal node" or "leaf" we:
+ know the number of training points correct, incorrect, and % right
+ could assign arbitrary "node numbers"
[figure: the example decision tree from slide 6]
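The per-leaf bookkeeping described above, counting correct and incorrect training points at each terminal node, is what yields a confidence value per leaf. A small sketch, with hypothetical leaf IDs and class names:

```python
from collections import defaultdict

def leaf_stats(assignments):
    """assignments: list of (leaf_id, predicted, actual) for each training
    point. Returns {leaf_id: (n_points, percent_right)} -- the per-leaf
    confidence that can be mapped alongside the land-cover prediction."""
    tally = defaultdict(lambda: [0, 0])   # leaf_id -> [n_correct, n_total]
    for leaf, pred, actual in assignments:
        tally[leaf][1] += 1
        if pred == actual:
            tally[leaf][0] += 1
    return {leaf: (tot, 100.0 * ok / tot) for leaf, (ok, tot) in tally.items()}

# Hypothetical training points falling into two leaves:
stats = leaf_stats([(1, "cedar", "cedar"), (1, "cedar", "shrub"),
                    (2, "P. pine", "P. pine")])
```

Mapping each pixel's leaf ID through this table is what turns the "leaf" map into a % right or confidence map.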

20 [figure: "leaf" map, land cover map, and % right or confidence map]

21 Identification of “useful” tree inputs Use relative frequency of use of data layers in the training data as a crude index of data layer “utility”. Top “utility” data layers from 40 possible input layers
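The crude "utility" index above is just a frequency count of how often each data layer is chosen as a split. A sketch, with each tree reduced to its list of (layer, threshold) splits; the layer names are hypothetical:

```python
from collections import Counter

def layer_utility(trees):
    """Rank input data layers by how often they appear as splits across one
    or more trees -- a crude index of layer 'utility'.
    Each tree is a list of (layer, threshold) split tuples."""
    counts = Counter(layer for tree in trees for layer, _ in tree)
    return counts.most_common()   # [(layer, n_uses), ...] most-used first

# Two hypothetical trees' split lists:
trees = [[("Mnf1", 28), ("Mnf3", 19), ("Mnf1", 25)],
         [("DEM", 2100), ("Mnf1", 30)]]
ranked = layer_utility(trees)
```

The top of the ranking identifies the handful of layers worth keeping out of the 40 candidates.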

22 How to use "utility" of input data layers
1) Reduce inputs to the decision tree
2) A reduced tree may have improved accuracies
3) Increases the speed at which the tree can be applied to the study area
4) Interpretation of underlying functional relationships (drivers)
5) Produce multiple trees for class "voting"

23 Honest Real-time Error Assessment: Cross Validation (3-fold)
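The 3-fold cross validation above partitions the training points into three folds; each fold is held out once as the test set while a tree is built on the other two. A sketch of the index bookkeeping (the interleaved fold assignment is one simple choice, not See5's):

```python
def kfold_indices(n, k=3):
    """Partition indices 0..n-1 into k folds; each fold serves once as the
    held-out test set while the remaining folds train the tree."""
    folds = [list(range(i, n, k)) for i in range(k)]   # interleaved folds
    splits = []
    for test in folds:
        train = sorted(j for other in folds if other is not test
                       for j in other)
        splits.append((train, test))
    return splits

splits = kfold_indices(9, k=3)   # 3 (train, test) index pairs
```

Averaging the per-fold error rates gives the honest accuracy estimate, while the final mapping tree can still be built from all of the training data.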

24 Accuracy Assessment: Cross Validation versus Independent Test, Zone 16, Utah

25 Uses of Cross Validation
+ Accuracy assessment
+ Optimal tree data sets
+ Pruning
+ All training data used for prediction
Cautions
+ Spatial autocorrelation
+ Look for "significant" error changes when pruning or selecting tree parameters

26 Splus – See5/C5 Bake Off

27 Past Experiences: hierarchical implementation of trees Landsat 7 ETM+ Mosaic (band 5,4,3) Mapping Zone 60, Spring, 2000 and 2001

28 Forest and Non-Forest Classification, Mapping Zone 60, 2001. Established a classification tree model for mapping forest and non-forest classes using 1700+ FIA plots (669 forest and 1100+ non-forest). The classification was run using a 5-fold cross-validation procedure. The agreement between mapped and reference/validation data is 95% with standard error (SE) less than 1.0%.

29 Forest Classification Based on NLCD 2000 Classification System, Mapping Zone 60, 2001. Established a classification tree model for mapping three MRLC forest classes using 669 FIA plots and a 5-fold cross-validation procedure (134 plots for validation in each of the 5 runs). The agreement between mapped and reference/validation data is 80% with SE 1.0%.

30 Forest Type Group Classification Based on USFS Classification System, Mapping Zone 60, 2001. Established a classification tree model for mapping six forest type groups using 669 FIA plots and a 5-fold cross-validation procedure (134 plots for validation in each of the 5 runs). The agreement between mapped and reference/validation data is 65% with SE 2.3%.
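The three zone-60 models above form the hierarchical chain: a first tree separates forest from non-forest, and only pixels labeled forest are passed to the finer-grained forest trees. A minimal sketch of that chaining; the stand-in trees and the band-value rules are entirely hypothetical:

```python
def hierarchical_classify(pixel, fnf_tree, forest_tree):
    """Two-stage ('hierarchical') prediction: the first tree separates
    forest from non-forest; the second labels only the forest pixels,
    so it can focus on the harder within-forest separations."""
    if fnf_tree(pixel) != "forest":
        return "non-forest"
    return forest_tree(pixel)

# Hypothetical stand-in trees keyed on a single 'band5' value:
fnf = lambda p: "forest" if p["band5"] < 60 else "non-forest"
ftype = lambda p: "evergreen" if p["band5"] < 40 else "deciduous"
label = hierarchical_classify({"band5": 35}, fnf, ftype)
```

Because each stage sees a narrower problem, the slide-deck results show the coarse split (95%) holding up much better than the six-way type-group split (65%).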

31 Leaf-on Landsat 7 ETM+ scene mosaic (bands 5,4,3) for mapping zone 16 – Utah/Southern Idaho

32 Forest/non-forest classification for mapping zone 16 – Utah/Southern Idaho
Fold Decision Tree
---- ----------------
     Size  Errors
0    *     16.8%
1    *     18.8%
2    *     19.1%
3    *     19.4%
4    *     16.3%
Mean       18.1%
SE          0.6%
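The Mean and SE rows in these fold tables are the sample mean and standard error of the per-fold error rates. A quick check against the forest/non-forest table above:

```python
import math

def mean_se(errors):
    """Mean and standard error of per-fold cross-validation error rates."""
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)                       # SE = s / sqrt(n)

# Fold error rates (percent) from the zone 16 forest/non-forest run:
mean, se = mean_se([16.8, 18.8, 19.1, 19.4, 16.3])
```

Rounded to one decimal these reproduce the slide's Mean 18.1% and SE 0.6%.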

33 Deciduous/evergreen/mixed classification for mapping zone 16 – Utah/Southern Idaho
Fold Decision Tree
---- ----------------
     Size  Errors
0    *     19.7%
1    *     19.9%
2    *     18.5%
3    *     19.4%
4    *     20.8%
Mean       19.7%
SE          0.4%

34 Forest type group classification for mapping zone 16 – Utah/Southern Idaho
Fold Decision Tree
---- ----------------
     Size  Errors
0    *     31.9%
1    *     37.3%
2    *     36.8%
3    *     40.0%
4    *     31.9%
5    *     37.8%
6    *     33.0%
7    *     35.7%
8    *     35.5%
9    *     32.3%
Mean       35.2%
SE          0.9%

35 Mean and standard error (in parentheses) of the overall accuracy (%) of classifications developed using a descriptive tree and a hierarchical tree approach in 5 repeated experiments. (Zone 16)

36 Recipe for Success
+ Adequate and representative training (adequately represent rare classes; preserve the relative proportions of training and population)
+ Model construction assessed with cross validation (boosting, pruning, and data layer exclusions)
+ Multiple trees for mapping (boosting)
+ Visually inspect the land cover, add training in misclassified areas, and reconstruct the model
+ Consider "hierarchical" trees to allow trees to focus on problematic separations
+ Avoid the See5/C5 "rule set" option (SLOW when applied spatially)

37 Training Data Collection From Imagine Files
+ Geo-registration linking?
+ Ignore values!
+ X, Y, Land cover (*.dat)?

38

39 Apply Tree Spatially
+ Mask!!!!!
+ Avoid Rules!!
+ Confidence map!!

40

