1 Using decision trees and their ensembles for analysis of NIR spectroscopic data
WSC-11, Saint Petersburg, 2018
In the light of the morning session on superresolution

2 Outline

3 Outline
Why decision trees?
What decision trees are?
Decision trees ensembles
Cases: Tecator, Olives
Conclusions

4 Why decision trees? Why not?

5 But why decision trees? Kaggle CEO and Founder Anthony Goldbloom:
“…in the history of Kaggle competitions, there are only two Machine Learning approaches that win competitions: Handcrafted and Neural Networks.”
“…It used to be random forest that was the big winner, but over the last six months a new algorithm called XGBoost has cropped up, and it’s winning practically every competition in the structured data category.”

6 Why NIR spectroscopic data?
When can linear regression be better than decision tree methods? (a sketch follows this list)
when the relationship between X and y is fully linear
when there is a very large number of features with a low S/N ratio
when covariate shift is likely
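A minimal sketch of the first point, assuming synthetic data in place of real spectra: when the signal is fully linear and spread thinly over many variables, plain linear regression should come out ahead of a forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Fully linear signal spread thinly over 50 features
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 50))
y = X @ rng.normal(scale=0.1, size=50) + rng.normal(scale=0.5, size=300)

for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=100, random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: CV R^2 = {r2:.2f}")
```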

7 Outline
Why decision trees?
What decision trees are?
Decision trees ensembles
Cases: Tecator, Olives
Conclusions

8 What decision trees are?
Drinks beer?
  yes → Knows statistics?
    yes → Not chemometrician
    no → Chemometrician
  no → Steals ideas from statisticians?
    yes → Chemometrician
    no → Not chemometrician
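As a sketch, the tree above is nothing more than nested if/else rules; in Python it could read as follows (the yes/no branch assignment is guessed from the slide layout):

```python
def classify(drinks_beer: bool, knows_statistics: bool, steals_ideas: bool) -> str:
    # Root split
    if drinks_beer:
        # Knows statistics? yes -> Not chemometrician, no -> Chemometrician
        return "Not chemometrician" if knows_statistics else "Chemometrician"
    # Steals ideas from statisticians? yes -> Chemometrician
    return "Chemometrician" if steals_ideas else "Not chemometrician"

print(classify(False, False, True))   # -> Chemometrician
```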

9 Decision trees for numeric variables

10 Decision trees for numeric variables
Where are the other variables?
At every split the best variable is used
The number of splits (tree depth) is limited
The efficiency of a split is the reduction in misclassification error it brings (see the sketch below)
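A minimal numpy sketch of that split criterion, with illustrative names: scan every threshold of one numeric variable and keep the one that most reduces the misclassification error.

```python
import numpy as np

def misclassification(y):
    """Error of predicting the majority class in a node."""
    if len(y) == 0:
        return 0.0
    return 1.0 - np.bincount(y).max() / len(y)

def best_split(x, y):
    """Scan all thresholds of one numeric variable x and return the
    split with the largest reduction in misclassification error."""
    parent = misclassification(y)
    best_t, best_gain = None, 0.0
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        child = (len(left) * misclassification(left)
                 + len(right) * misclassification(right)) / len(y)
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain

# Toy data: classes separate around x = 0.5
x = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))   # -> (0.3, 0.5)
```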

11 Decision trees for numeric variables
How many splits?
Limit the minimum number of objects in each bucket
Limit the maximum tree size (depth / number of splits)
Grow a big tree and prune all inefficient splits

16 Decision trees for numeric variables
How many splits?
[Figure: successive splits reduce the misclassification error by 50%, 44%, and 2%, leaving errors of 50%, 6%, and 4%]
Use cross-validation to calculate the errors (see the sketch below)
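A hedged scikit-learn sketch of the three size-control strategies tuned by cross-validation; synthetic data stands in for spectra, and ccp_alpha (post-growth pruning) requires a recent scikit-learn version.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for spectra: 200 samples, 100 "wavelengths"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y = X[:, 10] - 2 * X[:, 50] + rng.normal(scale=0.1, size=200)

# min_samples_leaf = minimum objects per bucket; max_depth = tree size;
# ccp_alpha > 0 prunes inefficient splits after growing a big tree
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={
        "max_depth": [3, 5, 10, None],
        "min_samples_leaf": [1, 5, 20],
        "ccp_alpha": [0.0, 0.001, 0.01],
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```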

17 Decision trees regression
Variable importance:
Is calculated for each variable individually
Takes into account the role of a variable in different splits
Is accumulated across all splits and normalized (see the sketch below)
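In scikit-learn these accumulated, normalized importances are exposed as feature_importances_; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: only variables 4 and 9 carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))
y = 3 * X[:, 4] + X[:, 9] + rng.normal(scale=0.1, size=150)

tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
# Importances are accumulated over all splits a variable appears in
# and normalized to sum to one
for i in np.argsort(tree.feature_importances_)[::-1][:3]:
    print(f"variable {i}: importance {tree.feature_importances_[i]:.2f}")
```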

18 Decision trees regression
The response variable is split into several bins
Each split minimizes the variance of the response in each node (see the sketch below)
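A small numpy sketch of the variance criterion (names are illustrative): for each candidate threshold, compute the weighted variance of the response in the two children and pick the minimum.

```python
import numpy as np

def variance_after_split(x, y, t):
    """Weighted variance of the response in the two child nodes."""
    left, right = y[x <= t], y[x > t]
    return (len(left) * left.var() + len(right) * right.var()) / len(y)

# A step in y around x = 0.5; the best split should land there
x = np.linspace(0, 1, 12)
y = np.where(x < 0.5, 1.0, 5.0) + 0.01 * np.arange(12)

thresholds = np.unique(x)[:-1]           # all candidate split points
best = min(thresholds, key=lambda t: variance_after_split(x, y, t))
print(best)                              # ~0.45, just below the jump
```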

19 Outline
Why decision trees?
What decision trees are?
Decision trees ensembles
Cases: Tecator, Olives
Conclusions

20 Decision trees ensembles
Ensemble learning — combine several models together
A group of weak learners can perform better together
Ensembles decrease variance and make predictions more stable and reliable

21 Decision trees ensembles
Bagging:
Create N random subsets (sampling with replacement)
Train a model on every subset (in parallel)
Use a simple average for prediction
Example: random forest

Boosting:
Train a model on a random subset (drawn randomly with replacement)
Sequentially build N improved models using new subsets
Use a weighted average for prediction
Example: gradient boosting

(A sketch contrasting the two follows below.)
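A hedged sketch contrasting the two families in scikit-learn, with synthetic data in place of spectra: a bagged ensemble (random forest) against a boosted one (gradient boosting).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for spectra
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 50))
y = X[:, 3] ** 2 + X[:, 7] + rng.normal(scale=0.1, size=300)

# Bagging: trees grown independently on bootstrap samples, predictions averaged
forest = RandomForestRegressor(n_estimators=200, random_state=0)
# Boosting: trees grown sequentially, each correcting the previous ensemble
boosted = GradientBoostingRegressor(n_estimators=200, random_state=0)

for name, model in [("random forest", forest), ("gradient boosting", boosted)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: CV R^2 = {r2:.2f}")
```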

22 Outline
Why decision trees?
What decision trees are?
Decision trees ensembles
Cases: Tecator, Olives
Conclusions

23 Tecator: prediction of fat content in chopped meat samples from NIR spectra
100 predictors (NIR spectra from a Tecator Infratec Food and Feed Analyzer, 850–1050 nm)
215 measurements (172 for calibration and 43 for test; a setup sketch follows)
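A sketch of how this split might be set up in Python; the file name tecator.csv and its column layout are assumptions, not part of the deck.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical layout: 100 absorbance columns plus a "fat" response column
data = pd.read_csv("tecator.csv")
X = data.drop(columns="fat")
y = data["fat"]

# 215 measurements -> 172 for calibration and 43 for test, as on the slide
X_cal, X_test, y_cal, y_test = train_test_split(X, y, test_size=43, random_state=0)
print(X_cal.shape, X_test.shape)   # (172, 100), (43, 100)
```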

24 Single tree — predictions

25 Single tree — the tree

26 Single tree — variable importance

27 Single tree — variable selection

28 Single tree — variable selection
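One plausible reading of the variable-selection slides in code: keep only the variables the tree actually splits on (non-zero importance) and refit; synthetic data stands in for the spectra.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for spectra
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 100))
y = 2 * X[:, 20] - X[:, 60] + rng.normal(scale=0.1, size=150)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
# A shallow tree splits on only a handful of variables; keep those
selected = np.flatnonzero(tree.feature_importances_ > 0)
print("selected variables:", selected)

# Refit on the reduced variable set
tree_small = DecisionTreeRegressor(max_depth=3, random_state=0)
tree_small.fit(X[:, selected], y)
```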

29 Random forest — predictions

30 Random forests — importance of variables

31 Random forests — variable selection
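For the forest, scikit-learn's SelectFromModel offers the same idea with an importance threshold; a minimal sketch (the median threshold is illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for spectra
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 100))
y = X[:, 5] + 0.5 * X[:, 55] + rng.normal(scale=0.1, size=200)

selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=0),
    threshold="median",   # keep variables above the median importance
).fit(X, y)
print("kept", selector.transform(X).shape[1], "of", X.shape[1], "variables")
```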

32 Outline
Why decision trees?
What decision trees are?
Decision trees ensembles
Cases: Tecator, Olives
Conclusions

33 Olives

34 Single tree — the tree and splits

35 Single tree — classification

36 Random forest — classification
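A hedged sketch of such a classification setup with a random forest; the data and classes below are synthetic placeholders, not the olive data itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic placeholder: 120 "spectra" with 60 variables and two classes
rng = np.random.default_rng(5)
X = rng.normal(size=(120, 60))
y = (X[:, 10] + X[:, 30] > 0).astype(int)

X_cal, X_test, y_cal, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=300, random_state=0)
forest.fit(X_cal, y_cal)
print("test accuracy:", forest.score(X_test, y_test))
```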

37 Variable importance

38 Outline
Why decision trees?
What decision trees are?
Decision trees ensembles
Cases: Tecator, Olives
Conclusions

39 Conclusions
“The bottom line is: You can spend 3 hours playing with the data, generating features and interaction variables and get a 77% r-squared; and I can ‘from sklearn.ensemble import RandomForestRegressor’ and in 3 minutes get an 82% r-squared.”
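The quote's three-minute recipe, spelled out as a runnable sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real data set
rng = np.random.default_rng(6)
X = rng.normal(size=(250, 100))
y = X[:, 1] - X[:, 2] ** 2 + rng.normal(scale=0.1, size=250)

X_cal, X_test, y_cal, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_cal, y_cal)
print("R^2:", r2_score(y_test, model.predict(X_test)))
```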

40 IASIM-2016

41 IASIM-2018, June 17–20, 2018, Seattle, WA, USA
March 12: deadline for student scholarship applications
April 5: deadline for abstracts

