
1 Raymond J. Carroll Department of Statistics and Nutrition Texas A&M University http://stat.tamu.edu/~carroll carroll@stat.tamu.edu Non/Semiparametric Regression and Clustered/Longitudinal Data

2 [Map of Texas: College Station, home of Texas A&M University; Wichita Falls, my hometown; Big Bend National Park; Palo Duro Canyon, the Grand Canyon of Texas; Guadalupe Mountains National Park; Midland; I-35 and I-45; West and East Texas]

3 Outline Series of semiparametric problems: panel data; matched studies; family studies; finance applications; colon carcinogenesis experiments for crypt signaling.

4 Outline General framework: likelihood-criterion functions. Algorithms: kernel-based. General results: semiparametric efficiency; backfitting and profiling. Splines and kernels: summary and conjectures.

5 Acknowledgments Xihong Lin, Harvard University

6 References The most general formulation is given in Lin and Carroll, Semiparametric Estimation in General Repeated Measures Problems, JRSSB, to appear.

7 Basic Problems Semiparametric problems. Parameter of interest, called β. Unknown function θ(·). The key is that the unknown function is evaluated multiple times in computing the likelihood for an individual.

8 Example 1: Panel Data i = 1,…,n clusters/individuals; j = 1,…,m observations per cluster.

Subject   Wave 1   Wave 2   …   Wave m
1         X        X        …   X
2         X        X        …   X
…
n         X        X        …   X

9 Example 1: Panel Data i = 1,…,n clusters/individuals; j = 1,…,m observations per cluster (for simplicity). Important points: the cluster size m is meant to be fixed. This is not a multiple-time-series problem where the cluster size increases to infinity.

10 Example 1: Marginal Parametric Model Y = response; X, Z = time-varying covariates. General result: we can improve efficiency for β by accounting for correlation: generalized least squares (GLS).
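For reference, the GLS benchmark in the parametric case has the textbook form (with D_i the design matrix for cluster i and Σ_i its working covariance; this notation is assumed here, not taken from the slide):

$$\hat\beta_{\mathrm{GLS}} = \Big(\sum_{i=1}^n D_i^T \Sigma_i^{-1} D_i\Big)^{-1} \sum_{i=1}^n D_i^T \Sigma_i^{-1} Y_i.$$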

11 Example 1: Marginal Semiparametric Model Y = response; X, Z = varying covariates. Question: can we improve efficiency for β by accounting for correlation?
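One standard way to write this partially linear marginal model (a sketch; the display from the original slide is not recoverable, and the notation β, θ(·) follows the rest of the talk):

$$E(Y_{ij} \mid X_{ij}, Z_{ij}) = Z_{ij}^T \beta + \theta(X_{ij}),$$

with θ(·) unknown and the observations correlated within clusters.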

12 Example 1: Marginal Nonparametric Model Y = response; X = varying covariate. Question: can we improve efficiency by accounting for correlation (GLS)?

13 Example 2: Matched Studies Prospective logistic model, with i = person and S = stratum. Matched case-control studies selected 1 case and 1 control from each stratum.

14 Example 2: Matched Studies Then the likelihood is determined by the conditional (stratum-level) probability. Note how this involves two evaluations of the nonparametric function.
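For 1:1 matching, the stratum effect cancels in the conditional likelihood, leaving the standard conditional-logistic form (a sketch assuming no additional parametric covariates; with them, Z^T β enters each exponent):

$$\mathrm{pr}(\text{subject 1 is the case} \mid \text{one case in stratum } i) = \frac{\exp\{\theta(X_{i1})\}}{\exp\{\theta(X_{i1})\} + \exp\{\theta(X_{i2})\}},$$

so θ(·) is evaluated at both the case and the control covariate in every stratum.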

15 Example 3: Model in Finance Previous literature used an integration estimator, namely one that first solved a preliminary fitting step. The theory (available only for m = 2) and the computation were horrible. For us: exact computation, general theory.

16 Example 4: Twin Studies Families consist of twins, followed longitudinally. The baseline for each twin is modeled nonparametrically; the longitudinal component is modeled parametrically.

17 General Formulation Y_ij = response; X_ij, Z_ij = possibly varying covariates. Likelihood (or criterion function). The key is that the function θ(·) is evaluated multiple times for each individual. This distinguishes it from standard semiparametric models. The goal is to estimate β and θ(·) efficiently.
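Following the description in the slides and the Lin and Carroll paper cited earlier, the criterion can be sketched generically as

$$\sum_{i=1}^n L\big\{Y_i, Z_i, \beta, \theta(X_{i1}), \ldots, \theta(X_{im})\big\},$$

with the single unknown function θ(·) appearing m times in each cluster's contribution.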

18 General Formulation: Examples Likelihood (or criterion function). We will spend most of the talk on the partially linear marginal Gaussian model. Many of the results apply to marginal GLIMs and semiparametric GLMMs. Special finance model.

19 General Formulation: Examples Likelihood (or criterion function). Complex example: units are families with m correlated individuals, each measured repeatedly in time. Here X_ij might be the baseline value for family member j. Also, Y_ij might be a multivariate ensemble of time-varying responses.

20 General Formulation: Overview Likelihood (or criterion function). For these problems, I will give constructive methods of estimation with asymptotic expansions and inference available. If the criterion function is a likelihood function, then the methods are semiparametric efficient. The methods avoid solving integral equations.

21 The Semiparametric Model Y = response; X, Z = time-varying covariates. Question: can we improve efficiency for β by accounting for correlation, i.e., what method is semiparametric efficient?

22 Working Independence As a base comparison, it is useful to contrast new methods with the most common one: working independence. There is a large literature for nonparametric and semiparametric problems. We hope to improve efficiency in both the parametric and the nonparametric parts.

23 Semiparametric Efficiency The semiparametric efficient score is readily worked out. It involves a Fredholm equation of the second kind. This is effectively impossible to solve directly: it involves the densities of each X conditional on the others. The usual device of solving integral equations does not work here (or at least is not worth trying).
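For orientation, a Fredholm equation of the second kind for an unknown function φ has the generic form

$$\phi(x) = g(x) + \int K(x, u)\,\phi(u)\,du;$$

here the kernel K involves the conditional densities of each X given the others, which is what makes a direct solution impractical.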

24 The Efficient Score

25 Profiling Profile methods work like this. Fix β. Apply your smoother; call the result θ̂(·, β). Maximize the Gaussian likelihood function in β. Explicit solution for most smoothers.

26 Backfitting Methods Backfitting methods work like this. Fix β. Apply your smoother; call the result θ̂(·). Maximize the likelihood function in β with θ̂(·) held fixed. Iterate until convergence (explicit solution for most smoothers, but different from profiling).
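A minimal numerical sketch of the two schemes for the independent-data partially linear model Y = Zβ + θ(X) + ε. Everything here is illustrative: the Nadaraya-Watson smoother matrix, the simulated data, and the helper names are assumptions, not the GLS smoothers of the talk.

```python
import numpy as np

def kernel_smoother_matrix(x, h):
    # Nadaraya-Watson smoother matrix with a Gaussian kernel (illustrative choice)
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d ** 2)
    return w / w.sum(axis=1, keepdims=True)

def profile_fit(y, z, S):
    # Profiling: partial the smooth out of Y and Z, then ordinary least squares
    y_t, z_t = y - S @ y, z - S @ z              # apply (I - S) to both
    beta = np.linalg.lstsq(z_t, y_t, rcond=None)[0]
    theta = S @ (y - z @ beta)                   # smooth the final residuals
    return beta, theta

def backfit(y, z, S, iters=200, tol=1e-8):
    # Backfitting: alternate smoothing and least squares until convergence
    beta = np.zeros(z.shape[1])
    theta = S @ y
    for _ in range(iters):
        theta = S @ (y - z @ beta)               # update the function fit
        beta_new = np.linalg.lstsq(z, y - theta, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta, theta

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0, 1, n))
z = rng.normal(size=(n, 1))
y = np.sin(2 * np.pi * x) + 1.5 * z[:, 0] + rng.normal(scale=0.3, size=n)
S = kernel_smoother_matrix(x, h=0.05)
print("profiling beta:  ", profile_fit(y, z, S)[0])
print("backfitting beta:", backfit(y, z, S)[0])
```

Both estimators are explicit here because the smoother is linear in the responses; the difference between the two schemes is exactly whether Z is also run through the smoother before the least-squares step.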

27 Comparing Profiling and Backfitting Result 1 (Hu, Wang, Carroll, 2004, Biometrika): profiling is at least as efficient as backfitting; profiling is better for the working independence smoother. Result 2: there exists a smoother for which profiling and backfitting are asymptotically equivalent, and which yields efficient estimates of β.

28 General Formulation: Revisited Y_ij = response; X_ij, Z_ij = varying covariates. Likelihood (or criterion function). The key is that the function θ(·) is evaluated multiple times for each individual. The goal is to estimate β and θ(·) efficiently.

29 General Formulation: Revisited Likelihood (or criterion function). The goal is to estimate the function at a target value t. Fix β. Pretend that the formulation involves m different functions θ_1(·),…,θ_m(·).

30 General Formulation: Revisited Pretend that the formulation involves m different functions θ_1(·),…,θ_m(·). Pretend that θ_2(·),…,θ_m(·) are known. Fit a local linear regression via local likelihood. Get the local score function for θ_1(·).

31 General Formulation: Revisited Repeat for each of the remaining functions: pretend the others are known; fit a local linear regression; get the local score function. Finally, solve the combined local score equations. Explicit solution in the Gaussian cases.
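Schematically, each step expands the working function linearly near the target t and accumulates kernel-weighted scores (the local intercept/slope α_0, α_1 and the score notation S_ij are assumptions for illustration):

$$\theta_j(x) \approx \alpha_0 + \alpha_1 (x - t), \qquad 0 = \sum_{j=1}^m \sum_{i=1}^n K_h(X_{ij} - t)\, S_{ij}(\alpha_0, \alpha_1),$$

with the estimate $\hat\theta(t) = \hat\alpha_0$.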

32 Main Results Semiparametric efficient for β. Backfitting (undersmoothed) = profiling. Pseudo-likelihood extensions. High-order expansions for parameters and functions.

33 Gaussian Case: Independent Data Splines (smoothing, P-splines, etc.) with penalty parameter λ = ridge regression fit. Some bias, smaller variance. λ = 0 is over-parameterized least squares; λ = ∞ is a polynomial regression.
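A standard way to see the ridge-regression connection, with basis matrix B, penalty matrix D, and penalty parameter λ (notation assumed, not from the slide):

$$\hat\gamma = \arg\min_\gamma\, \|Y - B\gamma\|^2 + \lambda\, \gamma^T D \gamma = (B^T B + \lambda D)^{-1} B^T Y,$$

so the fit $B\hat\gamma$ is linear in Y, with some bias and smaller variance.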

34 Gaussian Case: Independent Data Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h. As the bandwidth h → 0, only observations with X near t get any weight in the fit.
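For instance, the local-average version is the classical kernel estimator

$$\hat\theta(t) = \frac{\sum_i K\{(X_i - t)/h\}\, Y_i}{\sum_i K\{(X_i - t)/h\}},$$

and local linear simply replaces the constant fit by an intercept-plus-slope fit in the same kernel-weighted least squares.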

35 Kernel Regression

36 Independent Data Splines and kernels are linear in the responses. Silverman showed that there is a kernel function and a bandwidth so that the weight functions are asymptotically equivalent. In this sense, splines = kernels.

37 Accounting for Correlation Splines have an obvious analogue for non-independent data. Let Σ be a working covariance matrix. Penalized generalized least squares (GLS) = a GLS ridge regression. Because splines are based on likelihood ideas, they generalize quickly to new problems.
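With the same (assumed) basis/penalty notation as before, the penalized GLS criterion and its ridge-type solution are, as a sketch,

$$\hat\gamma = \arg\min_\gamma\, (Y - B\gamma)^T \Sigma^{-1} (Y - B\gamma) + \lambda\, \gamma^T D \gamma = (B^T \Sigma^{-1} B + \lambda D)^{-1} B^T \Sigma^{-1} Y.$$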

38 Advertisement http://stat.tamu.edu/~carroll/semiregbook/

39 What Do GLS Splines Do? GLS splines are iterated working-independence splines. Algorithm: iterate weighted but independent splines until convergence. This proves non-locality by inspection. Locality is at the cluster level: if any X in a cluster is near the value of interest, then all cluster observations get weight.
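A hedged sketch of that iteration: split the working precision matrix Σ⁻¹ into its diagonal part W (giving an ordinary weighted, independent spline solve) and its off-diagonal part C (carried into pseudo-observations). The basis B, penalty D, and λ are the assumed notation from above, and the splitting must be contractive for the loop to converge.

```python
import numpy as np

def gls_spline_by_iteration(y, B, D, Sigma_inv, lam, iters=500, tol=1e-10):
    """Fit the penalized GLS spline criterion
         (y - B g)' Sigma_inv (y - B g) + lam * g' D g
    by repeatedly solving weighted *independent* spline problems
    on pseudo-observations (a sketch of the iterated scheme)."""
    W = np.diag(np.diag(Sigma_inv))      # independence (diagonal) weights
    C = Sigma_inv - W                    # correlation carry-over
    A = B.T @ W @ B + lam * D            # one weighted-independent system
    gamma = np.zeros(B.shape[1])
    for _ in range(iters):
        resid = y - B @ gamma
        rhs = B.T @ (W @ y + C @ resid)  # pseudo-observations enter here
        gamma_new = np.linalg.solve(A, rhs)
        if np.max(np.abs(gamma_new - gamma)) < tol:
            break
        gamma = gamma_new
    return gamma                         # fixed point solves the full GLS criterion
```

At the fixed point, B'(W + C)(y - Bγ) = λDγ, which is exactly the stationarity condition of the full GLS criterion; the off-diagonal part C is also how every observation in a cluster ends up receiving weight, i.e., the non-locality seen by inspection.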

40 What Should GLS Kernels Do? GLS splines are iterated working-independence splines; the algorithm iterates weighted but independent splines until convergence. Obvious(?) approach: iterate weighted but independent kernels until convergence. The result is our method; in the Gaussian case, it is due to Naisyin Wang.

41 Properties of GLS Kernels Algorithm: iterate weighted but independent kernels until convergence. From a recent article in a major journal: “The procedure is very computationally intensive.” Quite apart from the fact that, compared to many new methods, it is hardly computationally intensive: it is easy to derive an exact, non-iterative form.

42 Properties of GLS Kernels Algorithm: iterate weighted but independent kernels until convergence. For a smoother matrix, the function fits satisfy a linear fixed-point equation: the smoother matrix has to be applied to both sides, to pseudo-observations and to function fits. “Simply” write the fixed point out and solve. Matlab code available.
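Schematically, with S the smoother matrix and A, b placeholder linear maps standing in for the terms the slide leaves implicit, the iteration is linear in the current fits, so its fixed point is available in closed form:

$$\hat\theta = S(b + A\hat\theta) \;\Longrightarrow\; \hat\theta = (I - SA)^{-1} S\, b,$$

which is the exact, non-iterative expression.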

43 Kernels = Splines The GLS spline has an exact, analytic expression. We have shown that the GLS kernel method also has an exact, analytic expression. Both methods are linear in the responses. Nontrivial calculations show that Silverman’s result still holds: splines = kernels.

44 Longitudinal CD4 Count Data (Zeger and Diggle): X = time of exam

                 Working Independence    Semiparametric GLS
                 Est.       s.e.         Est.       s.e.
Age              .014       .035         .010       .033
# of Smokes      .984       .192         .549       .144
Drug Use?        1.05       .53          .58        .33
# of Partners    -.054      .059         .080       .038
Depression?      -.033      .021         -.045      .013

45 Conclusions General likelihood. Distinguishing property: the unknown function is evaluated repeatedly for each individual. Kernel method: iterated local likelihood calculations, with an explicit solution in the Gaussian cases.

46 Conclusions General results: Semiparametric efficient: construction, no integral equations need to be solved Backfitting and profiling: asymptotically equivalent

47 Conclusions Splines and Kernels: Asymptotically the same in the Gaussian case Splines: generally easier to compute, although smoothing parameter selection can be intensive Low-order splines: even easier to compute, and we have a good, fast approximate CV algorithm

48 Conclusions Splines and Kernels: One might conjecture that splines can be constructed for the general problem that are asymptotically efficient Open Problem: is this true, and how?

49 The Numbers in the Table The decrease in s.e.’s is in accordance with our theory. The other phenomena are more difficult to explain. Nonetheless, they are not unique to semiparametric GEE methods. Similar discrepant outcomes occurred in parametric GEE estimation in which θ(t) was replaced by a cubic regression function in time. Furthermore, we simulated data using the observed covariates but with responses generated from the multivariate normal with mean equal to the fitted mean in the parametric correlated GEE estimation, and with correlation given by Zeger and Diggle. The level of divergence between the two sets of results in the simulated data was fairly consistent with what appeared in the table. For example, among the first 25 generated data sets, 3 had different signs for the sex-partners coefficient, and 7 had a drug-use coefficient from working independence that was 1.8 times or more the size of the one obtained by the proposed method.

50 The Marginal Nonparametric Model Important assumption: covariates at other waves are not conditionally predictive, i.e., they are surrogates. This assumption is required for any GLS fit, including parametric GLS.
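In symbols, one way to state the surrogacy assumption (a sketch in the talk's notation):

$$E(Y_{ij} \mid X_{i1}, \ldots, X_{im}) = E(Y_{ij} \mid X_{ij}) = \theta(X_{ij}).$$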

51 Accounting for Correlation Splines have an obvious analogue for non-independent data. Kernels are not so obvious. One can do theory with kernels. Local likelihood kernel ideas are standard in independent-data problems. Most attempts at kernels for correlated data have tried to use local likelihood kernel methods.

52 [Figure: the weight functions G_n(t=.25, x) in a specific case for independent data; kernel vs. smoothing spline.] Note the similarity of shape and the locality: only X’s near t=0.25 get any weight.

53 Kernels and Correlation Problem: how to define locality for kernels? Goal: estimate the function at t. Let K_h be a diagonal matrix of standard kernel weights. Standard kernel method: GLS pretending the inverse covariance matrix is K_h. The estimate is inherently local.
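Concretely, for a local-constant fit at t this is just kernel-weighted least squares (a sketch; 1 denotes a vector of ones):

$$\hat\theta(t) = \arg\min_\alpha \sum_{i=1}^n (Y_i - \alpha \mathbf{1})^T \mathcal{K}_i(t)\,(Y_i - \alpha \mathbf{1}), \qquad \mathcal{K}_i(t) = \mathrm{diag}\{K_h(X_{ij} - t)\}_{j=1}^m,$$

which collapses to an ordinary kernel average and is therefore local.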

54 Kernels and Correlation Specific case: m = 3, n = 35; exchangeable correlation structure. Red: ρ = 0.0; green: ρ = 0.4; blue: ρ = 0.8. Note the locality of the kernel method. [Figure: the weight functions G_n(t=.25, x) in a specific case.]

55 Splines and Correlation Specific case: m = 3, n = 35; exchangeable correlation structure. Red: ρ = 0.0; green: ρ = 0.4; blue: ρ = 0.8. Note the lack of locality of the spline method. [Figure: the weight functions G_n(t=.25, x) in a specific case.]

56 Splines and Standard Kernels Accounting for correlation: standard kernels remain local; splines are not local. The numerical results have been confirmed theoretically.

