Robert Plant != Richard Plant

Presentation on theme: "Robert Plant != Richard Plant" - Presentation transcript:

1 Robert Plant != Richard Plant

2 Modeling Workflow (diagram)
[Flowchart] Field data (response, coordinates) and covariates/predictors (direct or remotely sensed; may be the same data) are qualified and prepped into sample data (response, covariates). A random split divides the samples into training data (used to build the model, repeated over and over) and test data (used to validate it, yielding statistics). The fitted model then predicts, and repeated randomized runs are summarized into predicted values, uncertainty maps, and a final predictive map.

3 Cross-Validation
Split the data into training (build model) and test (validate) data sets
Leave-p-out cross-validation: validate on p samples, train on the remainder; repeated for all combinations of p
Non-exhaustive cross-validation: leave-p-out, but on only a subset of the possible combinations
Randomly splitting into 30% test and 70% training is common
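As a minimal sketch (hypothetical sample IDs, standard library only), the common random 70/30 split and the exhaustive leave-p-out enumeration can be illustrated like this:

```python
import random
from itertools import combinations

data = list(range(20))  # 20 hypothetical sample IDs

# Random 70/30 split (the common non-exhaustive approach)
random.seed(42)
shuffled = random.sample(data, len(data))
cut = int(0.7 * len(shuffled))
training, test = shuffled[:cut], shuffled[cut:]

# Exhaustive leave-p-out: every combination of p samples serves as test data once
p = 2
test_sets = list(combinations(data, p))
print(len(training), len(test))  # 14 6
print(len(test_sets))            # C(20, 2) = 190
```

Even for this tiny dataset, leave-2-out already requires 190 model fits, which is why the single random split is so widely used.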

4 K-fold Cross-Validation
Break the data into K sections ("folds")
Test on fold K_i, train on the remainder
Repeat for all K_i
10-fold is common (used in rpart())
[Diagram: folds 1-10, with one fold (4) labeled Test and the remaining folds labeled Training]

5 Bootstrapping
Drawing N samples from the sample data (with replacement)
Building the model
Repeating the process over and over
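A sketch of the idea with made-up field measurements, using the mean as a stand-in for "the model": resample with replacement, refit, and repeat, so the spread of the refitted estimates approximates the estimator's uncertainty.

```python
import random
import statistics

random.seed(1)
sample = [4.1, 5.0, 6.2, 5.5, 4.8, 5.9, 6.4, 5.1]  # made-up field measurements

# Draw N values with replacement, refit (here: just the mean), repeat many times
boot_means = []
for _ in range(1000):
    resample = random.choices(sample, k=len(sample))
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap estimates approximates the estimator's uncertainty
print(round(statistics.stdev(boot_means), 3))
```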

6 Random Forest
N samples drawn from the data with replacement
Repeated to create many trees: a "random forest"
"Splits" are selected based on the most common splits among all the trees
Bootstrap aggregation, or "bagging"
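A toy sketch of bagging (not a full random forest): one-level decision "stumps" stand in for trees, each fit to its own bootstrap resample of hypothetical 1-D data, with predictions made by majority vote.

```python
import random

# Toy 1-D data: the true rule is "label 1 when x > 5" (noise-free for simplicity)
random.seed(0)
data = [(x, 1 if x > 5 else 0) for x in range(11)]

def fit_stump(sample):
    """Pick the threshold t minimizing errors of the rule 'predict 1 when x > t'."""
    best_t, best_err = 0, float("inf")
    for t in range(11):
        err = sum((1 if x > t else 0) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Bagging: each stump is fit on its own bootstrap resample of the data
thresholds = [fit_stump(random.choices(data, k=len(data))) for _ in range(50)]

def predict(x):
    """Majority vote over the bagged stumps."""
    votes = sum(1 if x > t else 0 for t in thresholds)
    return 1 if votes > len(thresholds) / 2 else 0

print(predict(8), predict(2))  # 1 0
```

A real random forest additionally randomizes the candidate predictors at each split; here the aggregation step is the point.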

7 Boosting
Can a set of weak learners create a single strong learner? (Wikipedia)
Lots of "simple" trees combined to create a really complex tree
"Convex potential boosters cannot withstand random classification noise." (2008, Phillip Long of Google and Rocco A. Servedio of Columbia University)

8 Boosted Regression Trees
BRTs combine thousands of trees to reduce deviance from the data
Currently popular
More on this later

9 Sensitivity Testing
Injecting small amounts of "noise" into our data to see the effect on the model parameters (Plant)
The same approach can be used to model the impact of uncertainty on our model outputs and to make uncertainty maps
Note: this is not the same as sensitivity testing of model parameters

10 Jackknifing
Trying all combinations of covariates

11 Extrapolation vs. Prediction
Modeling: creating a model that allows us to estimate values between our data points (from the model)
Extrapolation: using existing data to estimate values outside the range of our data

12 Building Models
Selecting the method
Selecting the predictors ("model selection")
Optimizing the coefficients/parameters of the model

13 Modeling Workflow (diagram)
[Flowchart repeated from slide 2: field data and covariates/predictors are qualified and prepped into sample data, randomly split into training and test sets, used to build and validate the model, and summarized into predicted values, uncertainty maps, and a predictive map.]

14 Model Selection
Need a method to select the "best" set of predictors
Really, to select the best method, predictors, and coefficients (parameters)
Should be a balance between fitting the data and simplicity
R² only considers fit to the data (but linear regression is pretty simple)

15 Simplicity
"Everything should be made as simple as possible, but not simpler." (Albert Einstein)
"Albert Einstein Head" by photograph by Oren Jack Turner, Princeton, licensed through Wikipedia

16 Parsimony
"...too few parameters and the model will be so unrealistic as to make prediction unreliable, but too many parameters and the model will be so specific to the particular data set so to make prediction unreliable."
Edwards, A. W. F. (2001). Occam's bonus. p. 128-139 in Zellner, A., Keuzenkamp, H. A., and McAleer, M. (eds.), Simplicity, Inference and Modelling. Cambridge University Press, Cambridge, UK.

17 Parsimony
Under-fitting: model structure is ...included in the residuals
Over-fitting: residual variation is included as if it were structural
(Anderson)

18 Akaike Information Criterion
AIC = 2k - 2 ln(L)
k = number of estimated parameters in the model
L = maximized likelihood function for the estimated model
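As a minimal sketch with hypothetical log-likelihood values, the formula rewards fit but charges 2 per parameter:

```python
def aic(k, log_l):
    """AIC = 2k - 2 ln(L), computed from the log-likelihood directly."""
    return 2 * k - 2 * log_l

# Hypothetical comparison: model B fits slightly better but uses 3 more parameters
print(aic(k=2, log_l=-100.0))  # 204.0
print(aic(k=5, log_l=-99.0))   # 208.0 -> the simpler model wins here
```

The small gain in likelihood does not pay for the extra parameters, so the lower-AIC (simpler) model is preferred.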

19 AIC
Only a relative meaning; smaller is "better"
A balance between complexity (over-fitting, or modeling the errors: too many parameters)
and bias (under-fitting, or the model missing part of the phenomenon we are trying to model: too few parameters)

20 Likelihood
Likelihood of a set of parameter values given some observed data = probability of the observed data given those parameter values
Definitions:
x = all sample values
x_i = one sample value
θ = set of parameters
p(x|θ) = probability of x, given θ
See: ftp://statgen.ncsu.edu/pub/thorne/molevoclass/pruning2013cme.pdf

21 Likelihood

22 -2 Times Log Likelihood

23 p(x) for a fair coin
AIC = 2k - 2 ln(L)
L = p(x_1|θ) * p(x_2|θ) * ...
θ: 0.5 Heads, 0.5 Tails
What happens as we flip a "fair" coin?

24 p(x) for an unfair coin
AIC = 2k - 2 ln(L)
L = p(x_1|θ) * p(x_2|θ) * ...
θ: 0.8 Heads, 0.2 Tails
What happens as we flip this coin?

25 p(x) for a coin with two heads
AIC = 2k - 2 ln(L)
L = p(x_1|θ) * p(x_2|θ) * ...
θ: 1.0 Heads, 0.0 Tails
What happens as we flip this coin?

26 Does likelihood from p(x) work?
If the likelihood is the probability of the data given the parameters, and a response function provides the probability of a piece of data (i.e., the probability that this is suitable habitat), then we can use the probability that a specific occurrence is suitable as p(x|parameters)
Thus the likelihood of a habitat model (disregarding bias) can be computed by:
L(ParameterValues|Data) = p(Data_1|ParameterValues) * p(Data_2|ParameterValues) ...
This does not work directly: the highest likelihood goes to a model that predicts 1.0 everywhere, so we have to divide the model by its area so that the area under the model = 1.0
Remember: this only works when comparing models on the same dataset!
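The normalization step can be sketched on a tiny hypothetical suitability grid: dividing every cell by the total makes the surface sum to 1, so a model that predicts 1.0 everywhere no longer wins by default.

```python
# Hypothetical per-cell habitat-suitability values
suit = [[0.9, 0.7],
        [0.4, 0.0]]

# Normalize so the whole surface sums to 1 (a discrete stand-in for
# dividing the model by its area)
total = sum(v for row in suit for v in row)
norm = [[v / total for v in row] for row in suit]

print(round(sum(v for row in norm for v in row), 6))  # 1.0
```

After normalization a uniformly high surface spreads its probability thin, so concentrating suitability where the occurrences actually are is what raises the likelihood.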

27 Akaike...
Akaike showed that:
log(ℒ(θ̂|data)) - K ≈ E_y E_x [log(g(x|θ̂(y)))]
Which is equivalent to:
log(ℒ(θ̂|data)) - K = constant - E_θ̂ [I(f, ĝ)]
Akaike then defined:
AIC = -2 log(ℒ(θ̂|data)) + 2K

28 AICc
Additional penalty for more parameters:
AICc = AIC + 2k(k+1) / (n - k - 1)
Recommended when n is small or k is large
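A minimal sketch with hypothetical values shows why the correction matters for small samples and vanishes for large ones:

```python
def aicc(aic, k, n):
    """Small-sample correction: AICc = AIC + 2k(k+1) / (n - k - 1)."""
    return aic + 2 * k * (k + 1) / (n - k - 1)

# With n = 20 samples and k = 5 parameters the extra penalty is substantial...
print(aicc(aic=100.0, k=5, n=20))    # 100 + 60/14 ≈ 104.29
# ...but it fades as n grows, and AICc converges to plain AIC
print(aicc(aic=100.0, k=5, n=2000))
```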

29 ๐ต๐ผ๐ถ=2๐‘˜โˆ—๐‘™๐‘›(๐‘›) โˆ’2 lnโก(๐ฟ) BIC Bayesian Information Criterion
Adds n (number of samples) ๐ต๐ผ๐ถ=2๐‘˜โˆ—๐‘™๐‘›(๐‘›) โˆ’2 lnโก(๐ฟ)
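A sketch with hypothetical values, using the standard definition BIC = k ln(n) - 2 ln(L): because the per-parameter penalty grows with sample size, the same fit costs more under BIC as n increases.

```python
import math

def bic(k, n, log_l):
    """BIC = k ln(n) - 2 ln(L): the penalty per parameter grows with n."""
    return k * math.log(n) - 2 * log_l

# Same fit and same k, more data -> larger penalty term
print(round(bic(k=3, n=100, log_l=-50.0), 2))   # 3 ln(100) + 100 ≈ 113.82
print(round(bic(k=3, n=10000, log_l=-50.0), 2))
```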

30 Extra slides

31 Kullback-Leibler Divergence
Discrete: D_KL(P‖Q) = Σ_i P(i) ln(P(i)/Q(i))
Continuous: D_KL(P‖Q) = ∫ p(x) ln(p(x)/q(x)) dx
Justification: D_KL(P‖Q) = -Σ_x p(x) log(q(x)) + Σ_x p(x) log(p(x))

32 The distance can also be expressed as:
I(f, g) = ∫ f(x) log(f(x)) dx - ∫ f(x) log(g(x|θ)) dx
∫ f(x) ... dx is the expectation over f(x), so:
I(f, g) = E_f[log f(x)] - E_f[log g(x|θ)]
Treating E_f[log f(x)] as an unknown constant:
I(f, g) - C = -E_f[log g(x|θ)] = relative distance between g and f

