A Tale of Two GAMs Generalized additive models as a tool for data exploration Mariah Silkey, Actelion Pharmacueticals Ltd. 1.

Slides:



Advertisements
Similar presentations
What Could We Do better? Alternative Statistical Methods Jim Crooks and Xingye Qiao.
Advertisements

Additive Models, Trees, etc. Based in part on Chapter 9 of Hastie, Tibshirani, and Friedman David Madigan.
Lecture 9: Smoothing and filtering data
Polynomial Functions and Models Lesson 4.2. Review General polynomial formula a 0, a 1, …,a n are constant coefficients n is the degree of the polynomial.
Regression “A new perspective on freedom” TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA A A A A AAA A A.
Generalized Additive Models Keith D. Holler September 19, 2005 Keith D. Holler September 19, 2005.
Flexible smoothing with B-splines and Penalties or P-splines P-splines = B-splines + Penalization Applications : Generalized Linear and non linear Modelling.
Chapter 2 Functions and Graphs Section 4 Polynomial and Rational Functions.
RECITATION 1 APRIL 14 Lasso Smoothing Parameter Selection Splines.
More General Need different response curves for each predictor Need more complex responses.
Introduction to Smoothing Splines
Model assessment and cross-validation - overview
Data mining and statistical learning - lecture 6
Vector Generalized Additive Models and applications to extreme value analysis Olivier Mestre (1,2) (1) Météo-France, Ecole Nationale de la Météorologie,
1 Curve-Fitting Spline Interpolation. 2 Curve Fitting Regression Linear Regression Polynomial Regression Multiple Linear Regression Non-linear Regression.
Datamining and statistical learning - lecture 9 Generalized linear models (GAMs)  Some examples of linear models  Proc GAM in SAS  Model selection in.
統計計算與模擬 政治大學統計系余清祥 2003 年 6 月 9 日 ~ 6 月 10 日 第十六週:估計密度函數
Missing at Random (MAR)  is unknown parameter of the distribution for the missing- data mechanism The probability some data are missing does not depend.
Lesson #32 Simple Linear Regression. Regression is used to model and/or predict a variable; called the dependent variable, Y; based on one or more independent.
Basis Expansions and Regularization Based on Chapter 5 of Hastie, Tibshirani and Friedman.
Nonparametric Smoothing Methods and Model Selections T.C. Lin Dept. of Statistics National Taipei University 5/4/2005.
Linear statistical models 2009 Count data  Contingency tables and log-linear models  Poisson regression.
Biostatistics-Lecture 14 Generalized Additive Models Ruibin Xi Peking University School of Mathematical Sciences.
Spline and Kernel method Gaussian Processes
Data Mining Volinsky - Columbia University 1 Chapter 4.2 Regression Topics Credits Hastie, Tibshirani, Friedman Chapter 3 Padhraic Smyth Lecture.
Model Comparison for Tree Resin Dose Effect On Termites Lianfen Qian Florida Atlantic University Co-author: Soyoung Ryu, University of Washington.
Linear Model. Formal Definition General Linear Model.
1 Multiple Regression A single numerical response variable, Y. Multiple numerical explanatory variables, X 1, X 2,…, X k.
Simple Linear Regression. The term linear regression implies that  Y|x is linearly related to x by the population regression equation  Y|x =  +  x.
9.2A- Linear Regression Regression Line = Line of best fit The line for which the sum of the squares of the residuals is a minimum Residuals (d) = distance.
Curve Fitting Pertemuan 10 Matakuliah: S0262-Analisis Numerik Tahun: 2010.
Generalized Additive Models: An Introduction and Example
DierckxSpline : An R Package For Minimal Knot Splines Sundar Dorai-Raj Spencer Graves July 29, 2007.
Chi Square Test for Goodness of Fit Determining if our sample fits the way it should be.
1 Copyright © 2015, 2011, and 2008 Pearson Education, Inc. Chapter 1 Functions and Graphs Section 4 Polynomial and Rational Functions.
Basis Expansions and Generalized Additive Models Basis expansion Piecewise polynomials Splines Generalized Additive Model MARS.
LECTURE 17: BEYOND LINEARITY PT. 2 March 30, 2016 SDS 293 Machine Learning.
Estimating standard error using bootstrap
Chapter 2 Functions and Graphs
Solving Higher Degree Polynomial Equations.
Chapter 4.2 Regression Topics
Polynomials Functions
More General Need different response curves for each predictor
Generalized Additive Models & Decision Trees
Chapter 2 Functions and Graphs
Curve-Fitting Spline Interpolation
Machine learning, pattern recognition and statistical data modelling
Polynomial Functions and Models
Log Linear Modeling of Independence
Linear Regression.
Spline Interpolation Class XVII.
Linear regression Fitting a straight line to observations.
Least Squares Fitting A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the.
Lecture 1: Introduction to Machine Learning Methods
Simon W. Rabkin et al. JACEP 2017;j.jacep
Lesson 2.2 Linear Regression.
Nonlinear Fitting.
More General Need different response curves for each predictor
Discrete Least Squares Approximation
Simon W. Rabkin et al. JACEP 2017;3:
Basis Expansions and Generalized Additive Models (2)
Multivariate Analysis Regression
Ch 4.1 & 4.2 Two dimensions concept
Splines Part 2: Cubic and Smoothing Splines
Logistic Regression with “Grouped” Data
A medical researcher wishes to determine how the dosage (in mg) of a drug affects the heart rate of the patient. Find the correlation coefficient & interpret.
Chapter 3 Vocabulary Linear Regression.
Regression and Correlation of Data
Generalized Additive Model
Presentation transcript:

A Tale of Two GAMs Generalized additive models as a tool for data exploration Mariah Silkey, Actelion Pharmacueticals Ltd. 1

Some observational data Progressive disease Lots of missing data Time between visits, or whether visits take place at all is uncontrolled Patients may be on the study drug, or not, or on a competing treatment Patients can switch treatments at will 2

Linear Regression Y i =  +  1 ×X i +  2 ×X i 2 +  i where  i ~ N(0,  2 ) Ordinary least squares used to fit parameters  +  i 3

General additive models Additive models fit smooth functions through some terms, so rather than optimizing  i we optimize smoothing functions f 1 Y i =  + f 1 (X i ) + f (X i 2 ) +  i where  i ~ N(0,  2 ) optimize span for loess # and location of knots for a regression spline # and location of knots, and degree of polynomial for quadratic and cubic regression splines Λ for penalized splines 4

Some observational data Progressive disease Lots of missing data Time between visits, or whether visits take place at all is uncontrolled Patients may be on the study drug, or not, or on a competing treatment Patients can switch treatments at will 5

R package gam gam written by Trevor Hastie and Robert Tibshirani uses loess smoothers or smoothing splines Optimization by AIC, or visual inspection of residuals 6

gam alá gam Call: gam(formula = counts ~ studyday + lo(covariate2) + lo(covariate1), family = poisson, data = x5) Deviance Residuals: Min 1Q Median 3Q Max AIC: DF for Terms and Chi-squares for Nonparametric Effects Df Npar Df Npar Chisq P(Chi) lo(covariate2) e-06 lo(covariate1)

R for GAM - mgcv mgcv written by Simon Wood Uses penalized regression spline methods, using lots of knots and minimizing the penalized sum of squares equation below ║Y - X  ║ 2 + λ∫f ″ (x) 2 dx Either for a given λ, or uses cross validation to choose the optimal penalty. 8

gam alá mgcv b <- gam(counts ~ study day + s(covariate1)+s(covariate2), data=dat, family = poisson) 9

gam alá mgcv b <- gam(counts ~ study day + s(covariate1)+s(covariate2), data=dat, family = poisson) 10

gam alá mgcv b2 <- gamm(counts ~ studyday + s(covariate1)+s(covariate2), random =list (subject =~1), data=dat, family = poisson) 11

gam and mgcv Readers familiar with the classical textbook from Hastie and Tibshirani may prefer the gam package as it follows the theory described in the book. The gam in mgcv: allows for a wider variety of covariance structures, including repeated measures, and spatial and temporal correlations. It’s the bee’s knees, in short. 12

References Mixed Effects Models and Extensions in Ecology with R Alain F. Zuur, Elena N. Ieno, Neil J. Walker, Anatoly A. Saveliev, Graham M. Smith Generalized Additive Models: An Introduction with R, 2006 Simon N. Wood The Elements of Statistical Learning: Data Mining, Inference and Prediction,2009 Trevor Hastie, Robert Tibshirani, and Jerome Friedman R: A language and environment for statistical computing. R Development Core Team (2009). R Foundation for Statistical Computing, Vienna, Austria. ISBN , URL Wood, S.N. (2008) Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society (B) 70(3): ;more information at Packages mgcv and gam can be downloaded 13