More General Need different response curves for each predictor

Presentation transcript:

More General
Need different response curves for each predictor
Need more complex responses

Generalized Additive Models
$g(E(Y_i)) = \beta_0 + f_1(x_{1i}) + f_2(x_{2i}) + \dots$
Adds functions to linearize each predictor variable
$E(Y_i) = g^{-1}(\beta_0 + f_1(x_{1i}) + f_2(x_{2i}) + \dots)$
Functions can be parametric or non-parametric, including splines
This makes GAMs:
Very general
Prone to over-fitting
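As a minimal sketch of the link-function relationship on this slide, the R code below fits a GAM with a log link to simulated count data. It assumes the mgcv package; the data frame sim_data and the predictors x1 and x2 are made up for illustration and are not from the slides. It checks that predictions on the response scale equal the inverse link applied to the additive predictor.

```r
library(mgcv)  # assumed: gam() and s() from the mgcv package

set.seed(1)
n <- 300
# Hypothetical simulated predictors and a Poisson response
sim_data <- data.frame(x1 = runif(n), x2 = runif(n))
sim_data$y <- rpois(n, lambda = exp(0.5 + sin(2 * pi * sim_data$x1) + sim_data$x2^2))

# g(E[Y]) = beta0 + f1(x1) + f2(x2), with g = log for the Poisson family
fit <- gam(y ~ s(x1) + s(x2), family = poisson(link = "log"), data = sim_data)

eta <- predict(fit, type = "link")      # additive predictor: beta0 + f1 + f2
mu  <- predict(fit, type = "response")  # E[Y] = g^{-1}(eta)

# For the log link, g^{-1} is exp(), so mu should match exp(eta)
print(all.equal(as.numeric(mu), as.numeric(exp(eta))))
```

Swapping the family (for example to binomial) changes g but not the additive structure on the right-hand side.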

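Spline Curves
$$f(x) = \begin{cases} \frac{1}{4}(x+2)^3 & -2 \le x \le -1 \\ \frac{1}{4}\left(3|x|^3 - 6x^2 + 4\right) & -1 \le x \le 1 \\ \frac{1}{4}(2-x)^3 & 1 \le x \le 2 \end{cases}$$
The pieces join at the knots
Bell-shaped Irwin-Hall spline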
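The piecewise cubic on this slide can be evaluated directly. The base-R sketch below uses a hypothetical helper, irwin_hall_spline(), that relies on the symmetry of the curve (both outer pieces equal (2 - |x|)^3 / 4) and returns 0 outside [-2, 2], which is an assumption about the intended support. It evaluates the curve at the piece boundaries to show that the pieces meet there.

```r
# Hypothetical helper: bell-shaped piecewise cubic from the slide.
# Outer pieces: (2 - |x|)^3 / 4 for 1 <= |x| <= 2
# Middle piece: (3|x|^3 - 6x^2 + 4) / 4 for |x| <= 1
# Assumed to be 0 outside [-2, 2].
irwin_hall_spline <- function(x) {
  ax <- abs(x)
  ifelse(ax <= 1,
         (3 * ax^3 - 6 * x^2 + 4) / 4,
         ifelse(ax <= 2, (2 - ax)^3 / 4, 0))
}

# Values at the piece boundaries and the centre
boundaries <- c(-2, -1, 0, 1, 2)
print(irwin_hall_spline(boundaries))

# The middle and outer formulas agree where they meet (|x| = 1)
middle_at_1 <- (3 * 1^3 - 6 * 1^2 + 4) / 4
outer_at_1  <- (2 - 1)^3 / 4
print(c(middle = middle_at_1, outer = outer_at_1))
```

Plotting curve(irwin_hall_spline, -3, 3) gives the bell shape named on the slide.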

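Spline Curves in R
Wrap predictors in a spline function: s(predictor)
Use the “gamma” parameter to control how smooth the fitted curves are. It does not set the number of knots directly: in gam() from the mgcv package it multiplies the effective degrees of freedom in the smoothing criterion, so larger values give smoother curves. The basis size is set with k inside s()
Controls over-fitting
1.4 is recommended
In R:
TheModel = gam(Height ~ s(AnnualPrecip), data = TheData, gamma = 1.4)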
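The call on this slide needs the original TheData, which is not available here. The sketch below reuses the slide's variable names on simulated data, assuming the mgcv package, to compare the effective degrees of freedom under two gamma values and under a smaller basis size k. The data, the sine-shaped signal, and the object names are illustrative assumptions.

```r
library(mgcv)  # assumed: gam() and s() from the mgcv package

set.seed(42)
n <- 200
# Simulated stand-in for TheData (not the FIA / BioClim data)
sim_data <- data.frame(AnnualPrecip = runif(n, 0, 10))
sim_data$Height <- sin(sim_data$AnnualPrecip) + rnorm(n, sd = 0.3)

# Same model form as the slide, with two different gamma values
fit_default  <- gam(Height ~ s(AnnualPrecip), data = sim_data, gamma = 1.4)
fit_smoother <- gam(Height ~ s(AnnualPrecip), data = sim_data, gamma = 10)

# k sets the basis dimension, an upper limit on how wiggly the smooth can be
fit_small_k <- gam(Height ~ s(AnnualPrecip, k = 5), data = sim_data, gamma = 1.4)

# Total effective degrees of freedom (including the intercept)
edf_default  <- sum(fit_default$edf)
edf_smoother <- sum(fit_smoother$edf)
edf_small_k  <- sum(fit_small_k$edf)
print(c(gamma_1.4 = edf_default, gamma_10 = edf_smoother, k_5 = edf_small_k))

# Deviance explained and AIC, the two summaries reported later in the slides
print(c(dev_expl = summary(fit_default)$dev.expl, AIC = AIC(fit_default)))
```

A larger gamma should not increase the effective degrees of freedom here, which is the sense in which it controls over-fitting.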

Reading
When you have time:
“The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman
Generalized Additive Models by Hastie and Tibshirani
For our next meeting (on web site):
Read Martinez-Rincon (wahoo)
Jensen (crabs)

Which Approach?
[Figure: two surface plots over Age and Income, one from a GAM and one from a kernel smoother. The z-axis shows the proportion of families with a telephone at home.]
Hastie and Tibshirani 1986, Generalized Additive Models

GAM Plots in R
[Figure: partial plot (“partial” = 1 covariate) showing the modeled response curve, the 95% CI, the sample points, and the “grass” marks.]
FIA Doug-Fir height data vs. BioClim Annual Precipitation
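A partial plot like the one described on this slide can be drawn with plot() on a fitted mgcv model. The sketch below uses simulated data under the slide's variable names, since the FIA and BioClim data are not available here. The arguments are standard plot.gam options, and the default band of about two standard errors is assumed to correspond to the slide's 95% CI.

```r
library(mgcv)  # assumed: gam(), s(), and the plot method for gam objects

set.seed(7)
n <- 200
# Simulated stand-in data (not the FIA / BioClim data)
sim_data <- data.frame(AnnualPrecip = runif(n, 0, 10))
sim_data$Height <- sin(sim_data$AnnualPrecip) + rnorm(n, sd = 0.3)

fit <- gam(Height ~ s(AnnualPrecip), data = sim_data, gamma = 1.4)

# select = 1       : plot the first (here, only) smooth term
# se = TRUE        : draw the band at roughly 2 standard errors
# shade = TRUE     : shade the band instead of drawing dashed lines
# residuals = TRUE : add partial residuals as points
# rug = TRUE       : tick marks on the x-axis at the observed covariate values
plot(fit, select = 1, se = TRUE, shade = TRUE, residuals = TRUE, rug = TRUE)
```

With several smooth terms, pages = 1 places all partial plots on a single page.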

Brown Shrimp in GOM
In the SeaMap data, brown shrimp prefer muddy bottoms. They also spawn in shallow waters and then migrate to deeper water as they mature. The density goes down as the depth goes to 0 because the size of the net allows the smaller shrimp to escape.
Data from SeaMap and NOAA

Gamma=1.4
Models for Doug-Fir in California from FIA data
Explained Deviance: 59%, AIC=57807
Data from FIA and BioClim

Gamma=10 Explained Deviance: 59%, AIC=57961 Data from FIA and BioClim

Gamma=20 Explained Deviance: 57%, AIC=58081 Data from FIA and BioClim

Gamma=20 (best 3 predictors) Explained Deviance: 51%, AIC=58796 Data from FIA and BioClim

Gamma=0.1 Explained Deviance: 59%, AIC=57811 Data from FIA and BioClim

GAM Model Runs
Layers   Gamma   Explained Deviance (%)   AIC
All 6    1.4     59                       57807
         10      58                       57961
         20      57                       58081
Best 3   20      51                       58796
         0.1     59                       57811

Best Model? Best 3 predictors, gamma=20 Data from FIA and BioClim

Blue Crab Distribution Model

Blue Crab vs. Salinity
Jensen et al. 2005, Winter distribution of blue crab Callinectes sapidus in Chesapeake Bay: application and cross-validation of a two-stage generalized additive model

Response Curves (partial) GAMs BRTs

GAMs vs. BRTs
The BRT was made with over 5,000 trees!
“Results indicate little difference between the performance of GAM and BRT models”
Martinez-Rincon 2012, Comparative performance of generalized additive models and boosted regression trees for statistical modeling of incidental catch of wahoo (Acanthocybium solandri) in the Mexican tuna purse-seine fishery

Gamma in GAMs
$n$ = number of training points
$x$ = degrees of freedom (number of estimated parameters)
gam() chooses smoothing parameters to minimize:
$$\frac{n \sum_i (\hat{y}_i - y_i)^2}{(n - \gamma x)^2}$$
Note: The reason the effect of gamma reverses itself at large values is that $\gamma x$ becomes larger than $n$, so the squared denominator starts to grow again.
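The reversal described in the note can be seen with a few lines of arithmetic. The base-R sketch below defines a hypothetical helper, gcv_score(), that follows the form of the score on this slide. It holds the residual sum of squares and the degrees of freedom fixed at made-up values, which a real fit would not do, so it only illustrates how the denominator behaves as gamma grows.

```r
# Hypothetical helper following the slide's score:
# n * (residual sum of squares) / (n - gamma * x)^2
gcv_score <- function(rss, n, edf, gamma = 1) {
  n * rss / (n - gamma * edf)^2
}

# Made-up values, held fixed purely for illustration
n   <- 100   # number of training points
edf <- 10    # degrees of freedom (x on the slide)
rss <- 50    # residual sum of squares

gammas <- c(0.1, 1.4, 5, 10, 15, 20)
scores <- gcv_score(rss, n, edf, gammas)

# The penalty grows until gamma * x reaches n, then shrinks again
print(data.frame(gamma = gammas, gamma_x = gammas * edf, score = scores))
```

Once gamma * x passes n, the squared denominator treats the overshoot like an undershoot of the same size, so complexity is no longer penalized in a consistent direction.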

Anderson We are not trying to model the data; instead, we are trying to model the information in the data. The goal is to recover the information that applies more generally to the process, not just to the particular data set. If we were merely trying to model the data well, we could fit high order Fourier series terms or polynomial terms until the fit is perfect. Data contain both information and noise; fitting the data perfectly would include modeling the noise and this is counter to our science objective.

Additional Resources
Generalized Additive Models: an introduction with R
Copyrighted book
Includes:
Linear models
GLMs
GAMs
Examples in R
Some matrix algebra

Additional Resources
Geospatial Analysis with GAMs: http://www.casact.org/education/annual/2011/handouts/C3-Guszcza.pdf
Disease mapping using GAMs (workshop): http://www.cireeh.org/pmwiki.php/Main/Gam-mapWorkshop
Mapping population based studies: http://www.ij-healthgeographics.com/content/5/1/26