Raymond J. Carroll, Texas A&M University. Postdoctoral Training Program: Non/Semiparametric Regression and Clustered/Longitudinal Data.

Raymond J. Carroll Texas A&M University Postdoctoral Training Program: Non/Semiparametric Regression and Clustered/Longitudinal Data

2 Where am I From? [Map of Texas: College Station, home of Texas A&M; Wichita Falls, my hometown; Big Bend National Park; I-35 and I-45.]

3 Acknowledgments Raymond Carroll, Alan Welsh, Naisyin Wang, Enno Mammen, Xihong Lin, Oliver Linton. A series of papers is on my web site. Lin, Wang and Welsh: longitudinal data. Linton and Mammen: time series data.

4 Outline Longitudinal models: panel data Background: splines = kernels for independent data Nonparametric case: do splines = kernels? Semiparametric case: partially linear model: does it matter what nonparametric method is used?

5 Panel Data (for simplicity) i = 1,…,n clusters/individuals; j = 1,…,m observations per cluster.

Subject | Wave 1 | Wave 2 | … | Wave m
   1    |   X    |   X    | … |   X
   2    |   X    |   X    | … |   X
   …    |        |        |   |   X
   n    |   X    |   X    | … |   X

6 Panel Data (for simplicity) i = 1,…,n clusters/individuals; j = 1,…,m observations per cluster. Important point: the cluster size m is meant to be fixed. This is not a time series problem, where the cluster size increases to infinity. We have equivalent time series results.
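The clustered layout above can be sketched in code. This is a minimal, illustrative simulation (the function θ, the exchangeable correlation ρ, and the sizes here are hypothetical choices, not taken from the slides):

```python
import numpy as np

def simulate_panel(n=35, m=3, rho=0.4, seed=0):
    """Simulate clustered data Y_ij = theta(X_ij) + eps_ij with an
    exchangeable within-cluster correlation rho.  theta(x) = sin(2*pi*x)
    is a hypothetical smooth function used only for illustration."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, m))        # time-varying covariate
    # exchangeable covariance: unit variances, all off-diagonals rho
    Sigma = rho * np.ones((m, m)) + (1.0 - rho) * np.eye(m)
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    Y = np.sin(2.0 * np.pi * X) + eps
    return X, Y, Sigma

X, Y, Sigma = simulate_panel()
print(X.shape, Y.shape)
```

Each row is one cluster (one subject across its m waves), matching the table on the previous slide.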

7 The Nonparametric Model Y = response; X = time-varying covariate. Model: Y_ij = θ(X_ij) + ε_ij. Question: can we improve efficiency by accounting for correlation?

8 Independent Data Two major methods: splines (smoothing splines, P-splines, etc.), with penalty parameter λ; kernels (local averages, local linear, etc.), with kernel function K and bandwidth h.

9 Independent Data Two major methods: splines (smoothing, P-splines, etc.) and kernels (local averages, etc.). Both are linear in the responses, and both give similar answers in practice. Silverman showed that the weight functions are asymptotically equivalent. In this sense, splines = kernels.
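"Linear in the responses" means the fit at a point t is a weighted sum Σᵢ Gₙ(t, xᵢ) yᵢ. A minimal sketch with a Nadaraya-Watson smoother (the Gaussian kernel, bandwidth, and test function are illustrative assumptions):

```python
import numpy as np

def nw_weights(t, x, h):
    """Nadaraya-Watson weight function G_n(t, x): Gaussian kernel with
    bandwidth h, normalized to sum to one.  The fit at t is G @ y, so
    the estimator is linear in the responses y."""
    k = np.exp(-0.5 * ((x - t) / h) ** 2)
    return k / k.sum()

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
G = nw_weights(0.25, x, h=0.05)
fit = G @ y        # linear in y; only x's near t = 0.25 carry weight
```

The weight vector G is exactly the kind of local, bump-shaped weight function compared with the spline weights on the next slides.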

10 The weight functions G_n(t = 0.25, x) in a specific case for independent data. [Figure: kernel and smoothing-spline weight functions.] Note the similarity of shape and the locality: only X's near t = 0.25 get any weight.

11 Working Independence Working independence: ignore all correlations, but posit some reasonable marginal variances. Splines and kernels have obvious weighted versions, and weighting is important for efficiency. Splines and kernels are linear in the responses, and the Silverman result still holds. In this sense, splines = kernels.

12 Accounting for Correlation Splines have an obvious analogue for non-independent data. Let Σ be a working covariance matrix, and fit by penalized generalized least squares (GLS). Because splines are based on likelihood ideas, they generalize quickly to new problems. Kernels have no such obvious analogue.
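The penalized GLS spline has a closed form: minimizing Σᵢ (Yᵢ − Bᵢα)ᵀ Σ⁻¹ (Yᵢ − Bᵢα) + λ αᵀDα gives α̂ = (Σᵢ BᵢᵀΣ⁻¹Bᵢ + λD)⁻¹ Σᵢ BᵢᵀΣ⁻¹Yᵢ. A sketch under assumed details (truncated-power basis, ridge-type penalty D on the knot coefficients, hypothetical simulated data):

```python
import numpy as np

def pspline_gls(X, Y, Sigma, knots, lam=0.1, degree=3):
    """Penalized GLS spline (sketch): truncated-power basis with a
    ridge penalty lam on the knot coefficients and working covariance
    Sigma, accumulated cluster by cluster."""
    n, m = X.shape

    def basis(x):
        return np.column_stack(
            [x ** d for d in range(degree + 1)] +
            [np.clip(x - k, 0.0, None) ** degree for k in knots])

    p = degree + 1 + len(knots)
    D = np.diag([0.0] * (degree + 1) + [1.0] * len(knots))
    Sinv = np.linalg.inv(Sigma)
    BtSB, BtSy = np.zeros((p, p)), np.zeros(p)
    for i in range(n):                       # block-diagonal GLS cross-products
        Bi = basis(X[i])
        BtSB += Bi.T @ Sinv @ Bi
        BtSy += Bi.T @ Sinv @ Y[i]
    alpha = np.linalg.solve(BtSB + lam * D, BtSy)
    return lambda x: basis(np.atleast_1d(x)) @ alpha

# hypothetical example data: theta(x) = sin(2*pi*x), exchangeable Sigma
rng = np.random.default_rng(2)
n, m, rho = 100, 3, 0.6
Xd = rng.uniform(0, 1, (n, m))
Sig = rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
Yd = np.sin(2 * np.pi * Xd) + 0.3 * rng.multivariate_normal(np.zeros(m), Sig, n)
theta_hat = pspline_gls(Xd, Yd, Sig, knots=np.linspace(0.1, 0.9, 9))
```

Swapping in a different working Σ changes only the cross-products, which is why the spline side generalizes so easily.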

13 Accounting for Correlation Kernels are not so obvious. Local likelihood kernel ideas are standard in independent data problems, and most attempts at kernels for correlated data have tried to use local likelihood kernel methods.

14 Kernels and Correlation Problem: how to define locality for kernels? Goal: estimate the function at t. Let K_h be a diagonal matrix of standard kernel weights. Standard kernel method: GLS, pretending the inverse covariance matrix is K_h^{1/2} Σ^{-1} K_h^{1/2}. The estimate is inherently local.

15 Kernels and Correlation Specific case: m = 3, n = 35, exchangeable correlation structure. Red: ρ = 0.0; Green: ρ = 0.4; Blue: ρ = 0.8. Note the locality of the kernel method. [Figure: the weight functions G_n(t = 0.25, x) in this case.]

16 Splines and Correlation Specific case: m = 3, n = 35, exchangeable correlation structure. Red: ρ = 0.0; Green: ρ = 0.4; Blue: ρ = 0.8. Note the lack of locality of the spline method. [Figure: the weight functions G_n(t = 0.25, x) in this case.]

17 Splines and Correlation Specific case: m = 3, n = 35, complex correlation structure. Red: nearly singular; Green: ρ = 0.0; Blue: AR(0.8). Note the lack of locality of the spline method. [Figure: the weight functions G_n(t = 0.25, x) in this case.]

18 Splines and Standard Kernels Accounting for correlation: standard kernels remain local, while splines are not local. The numerical results can be confirmed theoretically. Don't we want our nonparametric regression estimates to be local?

19 Results on Kernels and Correlation For kernel GLS with weights K_h^{1/2} Σ^{-1} K_h^{1/2}, the optimal working covariance matrix is working independence! Using the correct covariance matrix increases the variance and increases the MSE. Hence splines ≠ kernels (or at least these kernels).

20 Better Kernel Methods: SUR Iterative, due to Naisyin Wang. Consider the current state of the iteration. For every j, assume the function is fixed and known for the other components k ≠ j. Use the seemingly unrelated regression (SUR) idea: for each j, form the estimating equation for a local average/local linear fit of the j-th component only, using GLS with kernel weights. Sum the estimating equations together, and solve.
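One way to see the SUR iteration: at each pass, every observation gets a residual-adjusted pseudo-response (its own Y plus Σ⁻¹-weighted residuals of the *other* observations in its cluster at the current fit), which is then smoothed with ordinary working-independence kernel weights. This is a simplified, illustrative sketch of that scheme, not Wang's exact algorithm; the grid, bandwidth, and simulated data are assumptions:

```python
import numpy as np

def sur_kernel(X, Y, Sigma, grid, h=0.08, iters=10):
    """Simplified SUR-type iterative kernel smoother.  Each pass:
    (1) residuals at the current fit, (2) residual-adjusted
    pseudo-responses Y*_ij = Y_ij + sum_{k != j} (s_jk/s_jj) R_ik
    with s = Sigma^{-1}, (3) working-independence kernel smoothing."""
    n, m = X.shape
    Sinv = np.linalg.inv(Sigma)
    theta = np.zeros(len(grid))
    for _ in range(iters):
        fit = np.interp(X.ravel(), grid, theta).reshape(n, m)
        R = Y - fit                                  # current residuals
        Ystar = np.empty_like(Y)
        for j in range(m):
            adj = sum(Sinv[j, k] * R[:, k] for k in range(m) if k != j)
            Ystar[:, j] = Y[:, j] + adj / Sinv[j, j]
        x, ys = X.ravel(), Ystar.ravel()
        for g, t in enumerate(grid):                 # Gaussian-kernel average
            w = np.exp(-0.5 * ((x - t) / h) ** 2)
            theta[g] = w @ ys / w.sum()
    return theta

# hypothetical example: theta(x) = sin(2*pi*x), exchangeable correlation
rng = np.random.default_rng(4)
n, m, rho = 200, 3, 0.6
Xd = rng.uniform(0, 1, (n, m))
Sig = rho * np.ones((m, m)) + (1 - rho) * np.eye(m)
Yd = np.sin(2 * np.pi * Xd) + rng.multivariate_normal(np.zeros(m), Sig, n)
grid = np.linspace(0.1, 0.9, 17)
theta_hat = sur_kernel(Xd, Yd, Sig, grid)
```

At the fixed point the pseudo-response has the conditional variance 1/σ^{jj} < σ_jj, which is where the efficiency gain over working independence comes from.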

21 SUR Kernel Methods It is well known that the GLS spline has an exact, analytic expression. We have shown that the SUR kernel method also has an exact, analytic expression. Both methods are linear in the responses, and relatively nontrivial calculations show that Silverman's result still holds: splines = SUR kernels.

22 Nonlocality The lack of locality of GLS splines and SUR kernels is surprising. Suppose we want to estimate the function at t: all observations in a cluster contribute to the fit, not just those with covariates near t. Somewhat as in GLIMs, there is a residual-adjusted pseudo-response whose expectation equals the response, and the fit is local in the pseudo-response.

23 Nonlocality Wang's SUR kernels = pseudo-kernels with a clever linear transformation: let Y* be the residual-adjusted pseudo-response. SUR kernels are working-independence kernels applied to Y*.

24 The Semiparametric Model Y = response; X, Z = time-varying covariates. Model: Y_ij = X_ij'β + θ(Z_ij) + ε_ij. Question: can we improve efficiency for β by accounting for correlation?

25 The Semiparametric Model General method: profile likelihood. Given β, solve for θ, say θ̂(·, β): your favorite nonparametric method applied to Y − X'β, using working independence, standard kernels, or SUR kernels.

26 Profile Methods Given β, solve for θ, say θ̂(·, β). Then fit WI/GLS to the model with mean X'β + θ̂(Z, β). Standard kernel methods have awkward, nasty properties; SUR kernel methods have nice properties and are semiparametric asymptotically efficient.
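For a linear smoother with matrix S, profiling under working independence has a familiar closed form: smooth θ out of both X and Y, then regress the partialled responses on the partialled covariates. A scalar-X sketch (Nadaraya-Watson smoother matrix, Gaussian kernel, and the simulated model are illustrative assumptions):

```python
import numpy as np

def profile_beta(x, z, y, h=0.08):
    """Profile estimator for beta in Y = X*beta + theta(Z) + eps under
    working independence.  S is the Nadaraya-Watson smoother matrix in Z;
    beta_hat = [(X - SX)'(X - SX)]^{-1} (X - SX)'(Y - SY)."""
    K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    S = K / K.sum(axis=1, keepdims=True)     # row-normalized smoother matrix
    xt = x - S @ x                           # X partialled for theta(Z)
    yt = y - S @ y                           # Y partialled for theta(Z)
    return float(xt @ yt / (xt @ xt))

# hypothetical example: beta = 2, theta(z) = sin(2*pi*z)
rng = np.random.default_rng(3)
N = 600
z = rng.uniform(0, 1, N)
x = rng.standard_normal(N)
y = 2.0 * x + np.sin(2 * np.pi * z) + 0.5 * rng.standard_normal(N)
beta_hat = profile_beta(x, z, y)
```

Replacing S with a GLS-spline or SUR-kernel smoother is what distinguishes the profile variants compared on the following slides.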

27 ARE for β of Working Independence Cluster size: black, m = 3; red, m = 4; green, m = 5; blue, m = 6. Scenario: X's: common correlation 0.3; Z's: common correlation 0.6; X & Z: common correlation 0.6; ε: common correlation varies. Note: efficiency depends on cluster size.

28 Profile Methods Given β, solve for θ, say θ̂(·, β). Then fit GLS to the model with mean X'β + θ̂(Z, β). If you fit working independence for your estimate of θ, there is not that great a loss of efficiency.

29 ARE for β of Working Independence/Profile Method Cluster size: black, m = 3; red, m = 4; green, m = 5; blue, m = 6. Scenario: X's: common correlation 0.3; Z's: common correlation 0.6; X & Z: common correlation 0.6; ε: common correlation varies. Note: efficiency depends on cluster size.

30 Conclusions In nonparametric regression: kernels = splines for working independence; working independence is inefficient; standard kernels ≠ splines for correlated data; SUR kernels = splines for correlated data. In semiparametric regression: profiling with SUR kernels is efficient; profiling with GLS for β and working independence for θ is nearly efficient.