Presentation transcript:

Non/Semiparametric Regression and Clustered/Longitudinal Data. Raymond J. Carroll, Department of Statistics and Nutrition, Texas A&M University

[Map of Texas: College Station, home of Texas A&M University; Wichita Falls, my hometown; Big Bend and Guadalupe Mountains National Parks; Palo Duro Canyon, the Grand Canyon of Texas; West Texas, East Texas, Midland, I-35, I-45.]

Outline. Series of semiparametric problems:
- Panel data
- Matched studies
- Family studies
- Finance applications
- Colon carcinogenesis experiments for crypt signaling

Outline.
- General framework: likelihood-criterion functions
- Algorithms: kernel-based
- General results: semiparametric efficiency; backfitting and profiling
- Splines and kernels: summary and conjectures

Acknowledgments: Xihong Lin, Harvard University.

References. The most general formulation is given in Lin and Carroll, "Semiparametric Estimation in General Repeated Measures Problems," JRSSB, to appear.

Basic Problems. Semiparametric problems have a parameter of interest, called β, and an unknown function θ(·). The key is that the unknown function is evaluated multiple times in computing the likelihood for an individual.

Example 1: Panel Data. i = 1, …, n clusters/individuals; j = 1, …, m observations per cluster.

Subject | Wave 1 | Wave 2 | … | Wave m
1       | X      | X      | … | X
2       | X      | X      | … | X
…       |        |        |   |
n       | X      | X      | … | X

Example 1: Panel Data (for simplicity). i = 1, …, n clusters/individuals; j = 1, …, m observations per cluster. Important points: the cluster size m is meant to be fixed. This is not a multiple time series problem, where the cluster size increases to infinity.

Example 1: Marginal Parametric Model. Y = response; X, Z = time-varying covariates. General result: we can improve efficiency for β by accounting for correlation, using generalized least squares (GLS).
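The slide's displayed equations did not survive extraction. A plausible reconstruction, with the notation β = (β_x, β_z), D_i = (X_i, Z_i), and working covariance Σ_i assumed here rather than taken from the slide, is

\[ Y_{ij} = X_{ij}^{\top}\beta_x + Z_{ij}^{\top}\beta_z + \epsilon_{ij}, \qquad \operatorname{cov}(Y_i \mid X_i, Z_i) = \Sigma_i, \]

\[ \widehat{\beta}_{\mathrm{GLS}} = \Bigl(\sum_{i=1}^{n} D_i^{\top}\Sigma_i^{-1} D_i\Bigr)^{-1} \sum_{i=1}^{n} D_i^{\top}\Sigma_i^{-1} Y_i. \]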

Example 1: Marginal Semiparametric Model. Y = response; X, Z = varying covariates. Question: can we improve efficiency for β by accounting for correlation?

Example 1: Marginal Nonparametric Model. Y = response; X = varying covariate. Question: can we improve efficiency by accounting for correlation (GLS)?
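The model displays on these two slides are also missing. Under the usual notation of the cited papers (assumed here), the semiparametric and nonparametric marginal models are plausibly

\[ Y_{ij} = Z_{ij}^{\top}\beta + \theta(X_{ij}) + \epsilon_{ij} \quad \text{(semiparametric)}, \qquad Y_{ij} = \theta(X_{ij}) + \epsilon_{ij} \quad \text{(nonparametric)}, \]

with θ(·) an unknown smooth function.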

Example 2: Matched Studies. Prospective logistic model, with i = person and S = stratum. Matched case-control studies selected 1 case and 1 control from each stratum.

Example 2: Matched Studies. The likelihood is then determined by the within-stratum conditional probability. Note how it involves two evaluations of the nonparametric function.
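The conditional likelihood itself was lost in extraction. A standard reconstruction, assuming the prospective model logit pr(Y_{iS} = 1) = β_{0S} + Z_{iS}^⊤β + θ(X_{iS}) and conditioning on there being exactly one case in each matched pair, is

\[ \operatorname{pr}(Y_{1S}=1, Y_{2S}=0 \mid \text{one case in stratum } S) = H\{(Z_{1S}-Z_{2S})^{\top}\beta + \theta(X_{1S}) - \theta(X_{2S})\}, \]

where H(u) = 1/(1 + e^{-u}) is the logistic distribution function. The stratum effect β_{0S} cancels, and θ is evaluated twice.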

Example 3: A Model in Finance. Previous literature used an integration estimator, which first solves a pilot estimation step. The theory (available only for m = 2) and the computation were horrible. For us: exact computation and a general theory.

Example 4: Twin Studies. Families consist of twins, followed longitudinally. The baseline for each twin is modeled nonparametrically, via the unknown function θ(·); the longitudinal component is modeled parametrically, via β.

General Formulation. Y_ij = response; X_ij, Z_ij = possibly varying covariates; likelihood (or criterion function) as below. The key is that the function θ(·) is evaluated multiple times for each individual; this distinguishes the problem from standard semiparametric models. The goal is to estimate β and θ(·) efficiently.
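The criterion itself is missing from the transcript. In the notation of Lin and Carroll's general formulation, as best it can be reconstructed here, it is a sum of cluster-level terms in which θ appears at every observation time:

\[ \sum_{i=1}^{n} \log \mathcal{L}\{Y_i, Z_i, \beta, \theta(X_{i1}), \ldots, \theta(X_{im})\}. \]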

General Formulation: Examples. Likelihood (or criterion function) as above. We will spend most of the talk on the partially linear marginal Gaussian model. Many of the results apply to marginal GLIMs and semiparametric GLMMs, and to the special finance model.

General Formulation: Examples. Likelihood (or criterion function) as above. A complex example: the units are families with m correlated individuals, each measured repeatedly in time. Here X_ij might be the baseline value for family member j, and Y_ij might be a multivariate ensemble of time-varying responses.

General Formulation: Overview. For these problems, I will give constructive methods of estimation, with asymptotic expansions and inference available. If the criterion function is a likelihood function, then the methods are semiparametric efficient. The methods avoid solving integral equations.

The Semiparametric Model. Y = response; X, Z = time-varying covariates. Question: can we improve efficiency for β by accounting for correlation? That is, what method is semiparametric efficient?

Working Independence. As a baseline, it is useful to contrast new methods with the most common one: working independence. There is a large literature on it for nonparametric and semiparametric problems. We hope to improve efficiency in both the parametric and the nonparametric parts.

Semiparametric Efficiency. The semiparametric efficient score is readily worked out, but it involves a Fredholm equation of the 2nd kind that is effectively impossible to solve directly: it involves the density of each X conditional on the others. The usual device of solving integral equations does not work here (or at least is not worth trying).

The Efficient Score

Profiling. Profile methods work like this: fix β; apply your smoother; call the result θ̂(·, β); maximize the Gaussian likelihood function in β. There is an explicit solution for most smoothers.

Backfitting Methods. Backfitting methods work like this: fix β; apply your smoother; call the result θ̂(·, β); maximize the likelihood function in β; iterate until convergence. (There is an explicit solution for most smoothers, but it differs from profiling.)
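For concreteness, here is a minimal working-independence sketch of the two schemes for the partially linear Gaussian model Y = Zβ + θ(X) + ε. The Nadaraya-Watson smoother, the function names, and the fixed bandwidth are illustrative assumptions; this is not the talk's GLS method.

import numpy as np

def nw_smoother_matrix(x, h):
    """n x n Nadaraya-Watson smoother matrix with a Gaussian kernel:
    row i holds the weights for the fit at x[i]."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def profile_fit(y, z, x, h):
    """Profiling: for fixed beta, theta_hat smooths y - z @ beta on x;
    plugging that in, beta solves a least-squares problem in the
    'partialled-out' data (I - S) y and (I - S) z."""
    S = nw_smoother_matrix(x, h)
    y_part, z_part = y - S @ y, z - S @ z
    beta = np.linalg.lstsq(z_part, y_part, rcond=None)[0]
    theta = S @ (y - z @ beta)        # fitted theta at the observed x's
    return beta, theta

def backfit(y, z, x, h, max_iter=100, tol=1e-8):
    """Backfitting: alternate smoothing (for theta) and ordinary
    least squares (for beta) until the parametric part stabilizes."""
    S = nw_smoother_matrix(x, h)
    beta = np.zeros(z.shape[1])
    theta = np.zeros_like(y)
    for _ in range(max_iter):
        theta = S @ (y - z @ beta)    # smooth current partial residuals
        beta_new = np.linalg.lstsq(z, y - theta, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, theta
        beta = beta_new
    return beta, theta

# Tiny usage example with simulated data
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(0, 1, n)
    z = rng.normal(size=(n, 1))
    y = 1.5 * z[:, 0] + np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=n)
    print(profile_fit(y, z, x, h=0.05)[0], backfit(y, z, x, h=0.05)[0])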

Comparing Profiling and Backfitting. Result 1 (Hu, Wang, and Carroll, 2004, Biometrika): profiling is at least as efficient as backfitting, and profiling is better for the working-independence smoother. Result 2: there exists a smoother for which profiling and backfitting are asymptotically equivalent, and which yields efficient estimates of β.

General Formulation: Revisited. Y_ij = response; X_ij, Z_ij = varying covariates; likelihood (or criterion function) as above. The key is that the function θ(·) is evaluated multiple times for each individual. The goal is to estimate β and θ(·) efficiently.

General Formulation: Revisited. The goal is to estimate the function at a target value t. Fix β. Pretend that the formulation involves m different functions θ_1(·), …, θ_m(·), one per position within the cluster.

General Formulation: Revisited. Pretend that the formulation involves m different functions, and pretend that θ_2(·), …, θ_m(·) are known. Fit a local linear regression for θ_1 via local likelihood, and get the local score function for θ_1(t).

General Formulation: Revisited. Repeat for j = 2, …, m: pretend the other functions are known, fit a local linear regression, and get the local score function for θ_j(t). Finally, solve the combined system of local score equations with θ_1(t) = ⋯ = θ_m(t) = θ(t). There is an explicit solution in the Gaussian cases.
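Schematically (a reconstruction; the local-linear intercept/slope notation α_0, α_1 is assumed, not taken from the slides), the j-th local likelihood step maximizes

\[ \sum_{i=1}^{n} K_h(X_{ij}-t)\, \log \mathcal{L}\{Y_i, Z_i, \beta, \widehat{\theta}_1(X_{i1}), \ldots, \alpha_0 + \alpha_1 (X_{ij}-t), \ldots, \widehat{\theta}_m(X_{im})\} \]

in (α_0, α_1), and its derivative in α_0 gives the local score for θ_j(t); the estimate of θ(t) then solves the equation obtained by combining the m local scores.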

Main Results. Semiparametric efficiency for β; backfitting (undersmoothed) = profiling; pseudo-likelihood extensions; high-order expansions for parameters and functions.

Gaussian Case: Independent Data. Splines (smoothing splines, P-splines, etc.) with penalty parameter λ = a ridge-regression fit: some bias, smaller variance. λ = 0 is over-parameterized least squares; λ = ∞ is a polynomial regression.
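In symbols (a standard formulation with the usual smoothing-spline penalty, not recovered verbatim from the slide), the spline fit minimizes

\[ \sum_{i=1}^{n} \{Y_i - \theta(X_i)\}^2 + \lambda \int \{\theta''(x)\}^2\, dx, \]

and in a basis expansion θ(x) = B(x)^⊤γ the solution is the ridge-type fit γ̂ = (B^⊤B + λΩ)^{-1} B^⊤Y for a penalty matrix Ω.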

Gaussian Case: Independent Data. Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h. As the bandwidth h → 0, only observations with X near t get any weight in the fit.
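A standard instance (general knowledge, not recovered from the slide): the local-average (Nadaraya-Watson) fit is

\[ \widehat{\theta}(t) = \frac{\sum_{i=1}^{n} K_h(X_i - t)\, Y_i}{\sum_{i=1}^{n} K_h(X_i - t)}, \qquad K_h(u) = h^{-1} K(u/h), \]

so shrinking h concentrates all the weight on observations with X_i near t.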

Kernel Regression

Independent Data. Splines and kernels are linear in the responses. Silverman (1984) showed that there is a kernel function and a bandwidth such that the weight functions are asymptotically equivalent. In this sense, splines = kernels.

Accounting for Correlation. Splines have an obvious analogue for non-independent data. Let Σ_i be a working covariance matrix; penalized generalized least squares (GLS) = GLS ridge regression. Because splines are based on likelihood ideas, they generalize quickly to new problems.
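Presumably (the display is missing) the penalized GLS criterion is

\[ \sum_{i=1}^{n} \{Y_i - \theta(X_i)\}^{\top} \Sigma_i^{-1} \{Y_i - \theta(X_i)\} + \lambda \int \{\theta''(x)\}^2\, dx, \]

where θ(X_i) = {θ(X_{i1}), …, θ(X_{im})}^⊤ stacks the function over the cluster.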

Advertisement

What Do GLS Splines Do? GLS splines are, in effect, iterated working-independence splines. Algorithm: iterate weighted but independent splines until convergence. This proves non-locality by inspection: locality holds only at the cluster level. If any X in a cluster is near the value of interest, then all observations in that cluster get weight.

What Should GLS Kernels Do? GLS splines are iterated working-independence splines (iterate weighted but independent splines until convergence). Obvious(?) approach: iterate weighted but independent kernels until convergence. The result is our method; in the Gaussian case, it is due to Naisyin Wang.
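For intuition, a minimal local-constant sketch of this "iterate weighted but independent kernels" idea for the Gaussian nonparametric model. The update rule, in which position j carries the local parameter while the current fits are plugged in at the other positions, is one plausible reading of the method; the function names, kernel, and initialization are illustrative assumptions.

import numpy as np

def gauss_kernel(u):
    """Gaussian kernel (any symmetric density would do)."""
    return np.exp(-0.5 * u ** 2)

def gls_kernel_fit(X, Y, Sigma, h, max_iter=200, tol=1e-8):
    """Iterate weighted-but-independent kernel fits until convergence.
    X, Y: n x m arrays (n clusters, m observations per cluster).
    Sigma: m x m working covariance matrix; W = Sigma^{-1}.
    Returns fitted theta at every observed X_ij (an n x m array)."""
    n, m = X.shape
    W = np.linalg.inv(Sigma)
    theta = np.full_like(Y, Y.mean(), dtype=float)   # crude initial fit
    targets = X.ravel()
    for _ in range(max_iter):
        new = np.empty_like(targets)
        for idx, t in enumerate(targets):
            Kw = gauss_kernel((X - t) / h)           # n x m kernel weights
            num = 0.0
            den = 0.0
            for j in range(m):
                # partial residuals: current fits removed at positions
                # k != j, while position j carries the local parameter
                r = Y - theta
                r[:, j] = Y[:, j]
                num += np.sum(Kw[:, j] * (r @ W[j]))
                den += np.sum(Kw[:, j]) * W[j, j]
            new[idx] = num / den
        new = new.reshape(n, m)
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta

Each pass is an independent kernel smooth of covariance-weighted pseudo-residuals; the fixed point is the GLS kernel fit, which, as the next slides note, can also be written in an exact, non-iterative form.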

Properties of GLS Kernels. Algorithm: iterate weighted but independent kernels until convergence. From a recent article in a major journal: "The procedure is very computationally intensive." Quite apart from the fact that, compared to many new methods, it is hardly computationally intensive, it is easy to derive an exact, non-iterative form.

Properties of GLS Kernels. Algorithm: iterate weighted but independent kernels until convergence. For a smoother matrix S, the function fits are linear in the data; the smoother matrix has to be applied to both sides, pseudo-observations and function fits. "Simply" write out the fixed point and solve. Matlab code is available.
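Schematically (a reconstruction of the missing display): if one pass of the algorithm maps the current fits to new ones via

\[ \widehat{\theta}_{\mathrm{new}} = A\,Y + B\,\widehat{\theta}_{\mathrm{old}} \]

for matrices A and B built from the kernel weights and Σ_i^{-1}, then the converged fit satisfies θ̂ = AY + Bθ̂, that is,

\[ \widehat{\theta} = (I - B)^{-1} A\, Y, \]

which is exact, non-iterative, and linear in the responses.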

Kernels = Splines. The GLS spline has an exact, analytic expression. We have shown that the GLS kernel method also has an exact, analytic expression. Both methods are linear in the responses, and nontrivial calculations show that Silverman's result still holds: splines = kernels.

Longitudinal CD4 Count Data (Zeger and Diggle). X = time of exam; covariates: age, # of smokes, drug use?, # of partners, depression? [Table comparing working-independence and semiparametric GLS estimates and s.e.'s, with the Zeger-Diggle (Z-D) correlation; the numbers did not survive extraction.]

Conclusions. General likelihood, with the distinguishing property that the unknown function is evaluated repeatedly for each individual. Kernel method: iterated local likelihood calculations, with an explicit solution in the Gaussian cases.

Conclusions. General results: the methods are semiparametric efficient by construction, and no integral equations need to be solved; backfitting and profiling are asymptotically equivalent.

Conclusions. Splines and kernels are asymptotically the same in the Gaussian case. Splines are generally easier to compute, although smoothing parameter selection can be intensive; low-order splines are even easier to compute, and we have a good, fast approximate CV algorithm.

Conclusions. Splines and kernels: one might conjecture that splines can be constructed for the general problem that are asymptotically efficient. Open problem: is this true, and how?

The Numbers in the Table. The decrease in s.e.'s is in accordance with our theory; the other phenomena are more difficult to explain. Nonetheless, they are not unique to semiparametric GEE methods: similar discrepant outcomes occurred in parametric GEE estimation in which θ(t) was replaced by a cubic regression function in time. Furthermore, we simulated data using the observed covariates, with responses generated from the multivariate normal with mean equal to the fitted mean from the parametric correlated GEE estimation and with correlation given by Zeger and Diggle. The level of divergence between the two sets of results in the simulated data was fairly consistent with what appeared in the table. For example, among the first 25 generated data sets, 3 had different signs for the # of partners coefficient, and 7 had the drug-use coefficient from working independence 1.8 times or more the size of the one from the proposed method.

The Marginal Nonparametric Model. Important assumption: covariates at other waves are not conditionally predictive, i.e., they are surrogates. This assumption is required for any GLS fit, including parametric GLS.
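In symbols (a reconstruction of the missing display; this is the standard surrogacy condition):

\[ E(Y_{ij} \mid X_{i1}, \ldots, X_{im}) = E(Y_{ij} \mid X_{ij}) = \theta(X_{ij}). \]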

Accounting for Correlation. Splines have an obvious analogue for non-independent data; kernels are not so obvious. One can do theory with kernels. Local likelihood kernel ideas are standard in independent-data problems, and most attempts at kernels for correlated data have tried to use local likelihood kernel methods.

[Figure: the weight functions G_n(t = 0.25, x) in a specific case for independent data, for a kernel fit and a smoothing spline.] Note the similarity of shape and the locality: only X's near t = 0.25 get any weight.

Kernels and Correlation. Problem: how do we define locality for kernels? Goal: estimate the function at t. Let K_ih be a diagonal matrix of standard kernel weights. Standard kernel method: GLS, pretending the inverse covariance matrix is the kernel-damped matrix below. The estimate is inherently local.
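The missing display is plausibly the kernel-weighted working precision used in the standard kernel GEE literature (an assumption here, following Lin and Carroll, 2000):

\[ K_{ih}^{1/2}\, \Sigma_i^{-1}\, K_{ih}^{1/2}, \qquad K_{ih} = \operatorname{diag}\{K_h(X_{i1}-t), \ldots, K_h(X_{im}-t)\}, \]

so each observation's contribution is damped by its own kernel weight, which is what makes the estimate local.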

Kernels and Correlation. Specific case: m = 3, n = 35, exchangeable correlation structure. [Figure: the weight functions G_n(t = 0.25, x) in a specific case; red: ρ = 0.0, green: ρ = 0.4, blue: ρ = 0.8.] Note the locality of the kernel method.

Splines and Correlation. Specific case: m = 3, n = 35, exchangeable correlation structure. [Figure: the weight functions G_n(t = 0.25, x) in a specific case; red: ρ = 0.0, green: ρ = 0.4, blue: ρ = 0.8.] Note the lack of locality of the spline method.

Splines and Standard Kernels. Accounting for correlation: standard kernels remain local; splines are not local. The numerical results have been confirmed theoretically.