Raymond J. Carroll Texas A&M University Postdoctoral Training Program: Non/Semiparametric.


1 Raymond J. Carroll Texas A&M University http://stat.tamu.edu/~carroll carroll@stat.tamu.edu Postdoctoral Training Program: http://stat.tamu.edu/B3NC Non/Semiparametric Regression and Clustered/Longitudinal Data

2 [Map of Texas showing: College Station, home of Texas A&M University; Wichita Falls, my hometown; Midland; Palo Duro Canyon, the Grand Canyon of Texas; Big Bend National Park; Guadalupe Mountains National Park; West Texas; East Texas; I-35; I-45]

3 Acknowledgments: Raymond Carroll, Alan Welsh, Naisyin Wang, Enno Mammen, Xihong Lin, Oliver Linton. Series of papers are on my web site. Lin, Wang and Welsh: longitudinal data (Mammen & Linton for pseudo-observation methods). Linton and Mammen: time series data.

4 Outline Longitudinal models: panel data Background: splines = kernels for independent data Correlated data: do splines = kernels?

5 Panel Data (for simplicity) i = 1,…,n clusters/individuals; j = 1,…,m observations per cluster. [Table: subjects 1, 2, …, n in rows; Wave 1, Wave 2, …, Wave m in columns; an X marks each observed (subject, wave) cell]

6 Panel Data (for simplicity) i = 1,…,n clusters/individuals j = 1,…,m observations per cluster Important points: The cluster size m is meant to be fixed This is not a multiple time series problem where the cluster size increases to infinity Some comments on the single time series problem are given near the end of the talk

7 The Marginal Nonparametric Model Y = Response X = time-varying covariate Question: can we improve efficiency by accounting for correlation?

8 Nonstandard Example: Colon carcinogenesis experiments. Cell-Based Measures in single rats of DNA damage, differentiation, proliferation, apoptosis, P27, etc.

9 The Marginal Nonparametric Model Important assumption Covariates at other waves are not conditionally predictive, i.e., they are surrogates This assumption is required for any GLS fit, including parametric GLS

10 Independent Data Splines (smoothing, P-splines, etc.) with penalty parameter λ = ridge regression fit. Some bias, smaller variance. λ = 0 is over-parameterized least squares; λ = ∞ is a polynomial regression.
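A minimal sketch of the "spline = ridge regression" view on this slide. The truncated-line basis, the knot grid, the simulated data and the function names are illustrative assumptions, not taken from the talk; only the knot coefficients are penalized, so a very large penalty collapses the fit to a straight line (a degree-1 polynomial regression), while a zero penalty gives over-parameterized least squares.

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def penalized_spline_fit(x, y, knots, lam):
    """Ridge fit that penalizes only the knot coefficients."""
    B = truncated_line_basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # no penalty on intercept, slope
    coef = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return B @ coef

# Simulated data (assumption: any smooth curve plus noise would do).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(100)
knots = np.linspace(0.1, 0.9, 9)

fit_ols = penalized_spline_fit(x, y, knots, lam=0.0)    # over-parameterized least squares
fit_mid = penalized_spline_fit(x, y, knots, lam=0.01)   # some bias, smaller variance
fit_line = penalized_spline_fit(x, y, knots, lam=1e8)   # numerically a straight line
```

With a different basis (for example cubic truncated powers) the large-penalty limit would be the corresponding higher-degree polynomial.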

11 Independent Data Kernels (local averages, local linear, etc.), with kernel density function K and bandwidth h. As the bandwidth h → 0, only observations with X near t get any weight in the fit.
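To illustrate the locality statement, here is a small sketch of local-average (Nadaraya-Watson type) weights. The Epanechnikov kernel, the grid of X values and the function names are assumptions chosen for illustration; any kernel density function K would behave similarly.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported kernel density on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_average_weights(x, t, h):
    """Weights w_i such that the local-average fit at t is sum_i w_i * y_i."""
    k = epanechnikov((np.asarray(x, dtype=float) - t) / h)
    return k / k.sum()

x = np.linspace(0.0, 1.0, 201)
t = 0.25
for h in (0.5, 0.1, 0.02):
    w = local_average_weights(x, t, h)
    support = x[w > 0]
    # The interval of X values receiving weight shrinks around t as h decreases.
    print(f"h={h}: weight on X in [{support.min():.3f}, {support.max():.3f}]")
```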

12 Independent Data Major methods Splines Kernels Smoothing parameters required for both Fits: similar in many (most?) datasets Expectation: some combination of bandwidths and kernel functions look like splines

13 Independent Data Splines and kernels are linear in the responses Silverman showed that there is a kernel function and a bandwidth so that the weight functions are asymptotically equivalent In this sense, splines = kernels
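For reference, the kernel in Silverman's equivalence result for smoothing splines has the closed form K(u) = (1/2) exp(-|u|/√2) sin(|u|/√2 + π/4). The sketch below evaluates it and checks numerically that it integrates to one; the trapezoid helper and the integration range are illustrative choices, not part of the talk.

```python
import math

def silverman_kernel(u):
    """Silverman's equivalent kernel for the smoothing spline."""
    a = abs(u) / math.sqrt(2.0)
    return 0.5 * math.exp(-a) * math.sin(a + math.pi / 4.0)

def trapezoid(f, lo, hi, steps):
    """Composite trapezoid rule on [lo, hi] with the given number of steps."""
    width = (hi - lo) / steps
    inner = sum(f(lo + i * width) for i in range(1, steps))
    return width * (0.5 * f(lo) + inner + 0.5 * f(hi))

# The tails decay like exp(-|u|/sqrt(2)), so [-40, 40] captures essentially all mass.
total_mass = trapezoid(silverman_kernel, -40.0, 40.0, 80000)
```

Unlike a kernel density function, this kernel takes small negative values in its side lobes, which is one reason the equivalence is stated in terms of weight functions rather than densities.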

14 The weight functions G_n(t = 0.25, x) in a specific case for independent data. [Figure: two panels, Kernel and Smoothing Spline] Note the similarity of shape and the locality: only X's near t = 0.25 get any weight.

15 Working Independence Working independence: ignore all correlations; fix up standard errors at the end. Advantage: the assumption that covariates at other waves are not conditionally predictive is not required. Disadvantage: possible severe loss of efficiency if carried too far.

16 Working Independence Working independence: Ignore all correlations Should posit some reasonable marginal variances Weighting important for efficiency Weighted versions: Splines and kernels have obvious analogues Standard method: Zeger & Diggle, Hoover, Rice, Wu & Yang, Lin & Ying, etc.

17 Working Independence Working independence: Weighted splines and weighted kernels are linear in the responses The Silverman result still holds In this sense, splines = kernels

18 Accounting for Correlation Splines have an obvious analogue for non-independent data. Let Σ be a working covariance matrix. Penalized generalized least squares (GLS) = GLS ridge regression. Because splines are based on likelihood ideas, they generalize quickly to new problems.
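A hedged sketch of "GLS ridge regression" for clustered data. The basis, penalty matrix, exchangeable working covariance and simulated data are assumptions for illustration (the cluster sizes m = 3, n = 35 echo the later slides); the estimator minimizes (y - Bc)' V^{-1} (y - Bc) + λ c'Dc with V block-diagonal across clusters.

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Design matrix with columns 1, x, and (x - k)_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def penalized_gls(B, y, D, lam, sigma, n):
    """Penalized GLS coefficients, with covariance V = I_n (kron) sigma."""
    v_inv = np.kron(np.eye(n), np.linalg.inv(sigma))  # block-diagonal inverse
    return np.linalg.solve(B.T @ v_inv @ B + lam * D, B.T @ v_inv @ y)

n, m, rho, lam = 35, 3, 0.4, 0.1
sigma = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))  # exchangeable working covariance
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, n * m)                          # ordered cluster by cluster
errors = (rng.standard_normal((n, m)) @ np.linalg.cholesky(sigma).T).ravel()
y = np.sin(2.0 * np.pi * x) + errors

knots = np.linspace(0.1, 0.9, 9)
B = truncated_line_basis(x, knots)
D = np.diag([0.0, 0.0] + [1.0] * len(knots))

coef_gls = penalized_gls(B, y, D, lam, sigma, n)
coef_wi = penalized_gls(B, y, D, lam, np.eye(m), n)       # working independence = plain ridge
```

Passing the identity as the working covariance recovers the ordinary ridge fit, which is the working-independence spline of the earlier slides.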

19 Accounting for Correlation Splines have an obvious analogue for non-independent data. Kernels are not so obvious. One can do theory with kernels. Local likelihood kernel ideas are standard in independent data problems. Most attempts at kernels for correlated data have tried to use local likelihood kernel methods.

20 Kernels and Correlation Problem: how to define locality for kernels? Goal: estimate the function at t. Let K be a diagonal matrix of standard kernel weights. Standard kernel method: GLS, pretending the inverse covariance matrix is a kernel-weighted version of the working inverse covariance. The estimate is inherently local.
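The sketch below shows why such an estimate is inherently local. It assumes the kernel-weighted inverse covariance takes the sandwich form K^{1/2} Σ^{-1} K^{1/2} (one common choice in the kernel GEE literature; the transcript does not preserve the slide's formula) and uses a local-constant fit with an Epanechnikov kernel. All names and simulated covariates are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def exchangeable(m, rho):
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

def standard_kernel_weights(x, t, h, sigma):
    """Weights g (shape n x m) so that the local-constant fit at t is sum(g * y).

    Assumed weight matrix per cluster: W_i = K_i^{1/2} sigma^{-1} K_i^{1/2}.
    """
    sigma_inv = np.linalg.inv(sigma)
    root_k = np.sqrt(epanechnikov((x - t) / h))                          # (n, m)
    w = root_k[:, :, None] * sigma_inv[None, :, :] * root_k[:, None, :]  # (n, m, m)
    # Minimizing sum_i (y_i - theta*1)' W_i (y_i - theta*1) over theta gives
    # theta_hat = sum_i 1' W_i y_i / sum_i 1' W_i 1.
    return w.sum(axis=1) / w.sum()

n, m = 35, 3
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, (n, m))
weights_by_rho = {rho: standard_kernel_weights(x, 0.25, 0.1, exchangeable(m, rho))
                  for rho in (0.0, 0.4, 0.8)}
```

Every weight carries a factor K_ij^{1/2}, so an observation whose covariate lies outside the kernel window gets exactly zero weight, whatever the correlation.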

21 Kernels and Correlation Specific case: m = 3, n = 35. Exchangeable correlation structure. Red: ρ = 0.0. Green: ρ = 0.4. Blue: ρ = 0.8. Note the locality of the kernel method. The weight functions G_n(t = 0.25, x) in a specific case.

22 Splines and Correlation Specific case: m = 3, n = 35. Exchangeable correlation structure. Red: ρ = 0.0. Green: ρ = 0.4. Blue: ρ = 0.8. Note the lack of locality of the spline method. The weight functions G_n(t = 0.25, x) in a specific case.

23 Splines and Correlation Specific case: m = 3, n = 35. Complex correlation structure. Red: nearly singular. Green: ρ = 0.0. Blue: ρ = AR(0.8). Note the lack of locality of the spline method. The weight functions G_n(t = 0.25, x) in a specific case.

24 Splines and Standard Kernels Accounting for correlation: Standard kernels remain local Splines are not local Numerical results can be confirmed theoretically

25 Results on Kernels and Correlation GLS with kernel weights: the optimal working covariance matrix is working independence! Using the correct covariance matrix increases variance and increases MSE. Splines ≠ kernels (or at least these kernels).

26 Pseudo-Observation Kernel Methods Better kernel methods are possible. Pseudo-observation: original method. Construction: specific linear transformation of Y. Mean = Θ(X). Covariance = diagonal matrix. This adjusts the original responses without affecting the mean.

27 Pseudo-Observation Kernel Methods Construction: specific linear transformation of Y. Mean = Θ(X). Covariance = diagonal. Iterative. Efficiency: more efficient than working independence. Proof of principle: kernel methods can be constructed to take advantage of correlation.

28 Efficiency of Splines and Pseudo-Observation Kernels. Exchng: exchangeable with correlation 0.6. AR: autoregressive with correlation 0.6. Near Sing: a nearly singular matrix.

29 What Do GLS Splines Do? GLS splines are really working independence splines using pseudo-observations: with suitably defined pseudo-observations, GLS splines are working independence splines.

30 GLS Splines and SUR Kernels GLS Splines are working independence splines Algorithm: iterate until convergence Idea: for kernels, do same thing This is Naisyin Wang’s SUR method
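A numerical sketch of the "iterate until convergence" idea. The pseudo-observation used here is an assumption, since the slide's formula is not preserved in the transcript: with Λ the diagonal of the inverse covariance, take Y* = fitted + Λ^{-1} V^{-1} (Y - fitted) and refit a working-independence penalized spline with weights Λ. A fixed point of this update satisfies the penalized GLS normal equations, and in this example the iteration converges to the direct GLS solution. The basis, data and variable names are illustrative.

```python
import numpy as np

def truncated_line_basis(x, knots):
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

n, m, rho, lam = 35, 3, 0.6, 0.1
sigma = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))   # exchangeable covariance
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, n * m)                           # ordered cluster by cluster
errors = (rng.standard_normal((n, m)) @ np.linalg.cholesky(sigma).T).ravel()
y = np.sin(2.0 * np.pi * x) + errors

knots = np.linspace(0.1, 0.9, 9)
B = truncated_line_basis(x, knots)
D = np.diag([0.0, 0.0] + [1.0] * len(knots))
v_inv = np.kron(np.eye(n), np.linalg.inv(sigma))
wi_weights = np.diag(v_inv)            # assumed Lambda: diagonal of the inverse covariance

# Direct penalized GLS spline.
coef_gls = np.linalg.solve(B.T @ v_inv @ B + lam * D, B.T @ v_inv @ y)

# Working-independence (weighted ridge) spline applied repeatedly to pseudo-observations.
wi_matrix = B.T @ (wi_weights[:, None] * B) + lam * D
coef_iter = np.zeros(B.shape[1])
for _ in range(500):
    fitted = B @ coef_iter
    y_star = fitted + (v_inv @ (y - fitted)) / wi_weights   # pseudo-observations
    coef_iter = np.linalg.solve(wi_matrix, B.T @ (wi_weights * y_star))
```

Replacing the weighted ridge step by a weighted kernel smoother of the same pseudo-observations is, in spirit, the kernel analogue the slide attributes to Naisyin Wang; that substitution is not implemented here.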

31 Better Kernel Methods: SUR Another view: consider the current state in the iteration. For every j, assume the function is fixed and known for k ≠ j. Use the seemingly unrelated regression (SUR) idea. For j, form the estimating equation for local averages/linear for the jth component only, using GLS with kernel weights. Sum the estimating equations together, and solve.

32 SUR Kernel Methods It is well known that the GLS spline has an exact, analytic expression We have shown that the SUR kernel method has an exact, analytic expression Both methods are linear in the responses Relatively nontrivial calculations show that Silverman’s result still holds Splines = SUR Kernels

33 Nonlocality The lack of locality of GLS splines and SUR kernels is surprising Suppose we want to estimate the function at t Result: All observations in a cluster contribute to the fit, not just those with covariates near t Locality: Defined at the cluster level

34 Nonlocality Wang’s SUR kernels = pseudo kernels with a clever linear transformation. With suitably defined pseudo-observations, SUR kernels are working independence kernels.

35 Locality of Splines Splines = SUR kernels (Silverman-type result). GLS spline: iterative; standard independent spline smoothing; SUR pseudo-observations at each iteration. GLS splines are not local. GLS splines are local in (the same!) pseudo-observations.

36 Time Series Problems Time series problems: many of the same issues arise. Original pseudo-observation method: two stages; linear transformation; mean Θ(X); independent errors; single standard kernel applied. Potential for great gains in efficiency (even infinite for AR problems with large correlation).

37 Time Series: AR(1) Illustration, First Pseudo-Observation Method. AR(1), correlation ρ: regress Y_t^0 on X_t.
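One transformation with the stated properties (mean Θ(X_t), white-noise errors) is Y_t^0 = Y_t - ρ{Y_{t-1} - Θ(X_{t-1})}. This explicit form is an assumption: the slide's definition of Y_t^0 is not preserved in the transcript, although it follows directly from the AR(1) recursion ε_t = ρ ε_{t-1} + u_t. The simulation below uses a known Θ and ρ purely to check the algebra; in practice both would come from a preliminary (first-stage) fit.

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho = 500, 0.8

def theta(x):
    """Regression function, known here only because the data are simulated."""
    return np.sin(2.0 * np.pi * x)

x = rng.uniform(0.0, 1.0, T)
u = rng.standard_normal(T)                  # white-noise innovations
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1.0 - rho ** 2)     # stationary starting value
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]        # AR(1) errors
y = theta(x) + eps

# First pseudo-observation, defined for t = 1, ..., T-1.
y0 = y[1:] - rho * (y[:-1] - theta(x[:-1]))
pseudo_errors = y0 - theta(x[1:])           # equals the innovations u_t
```

A single standard kernel smoother of y0 on x[1:] then sees independent errors with the innovation variance, which is smaller than the marginal variance of the AR(1) errors by the factor 1 - ρ².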

38 Time Series Problems More efficient methods can be constructed. Series of regression problems: for all j, form pseudo-observations with a specified mean and white noise errors. Regress for each j: fits are asymptotically independent. Then take a weighted average. Time series version of SUR kernels for longitudinal data?

39 Time Series: AR(1) Illustration, New Pseudo-Observation Method. AR(1), correlation ρ: regress Y_t^0 on X_t and Y_t^1 on X_{t-1}. Weights: 1 and ρ^2.
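A sketch of where weights 1 and ρ² can come from. It assumes the second pseudo-observation is Y_t^1 = Y_{t-1} - {Y_t - Θ(X_t)}/ρ, which has mean Θ(X_{t-1}) and error -u_t/ρ; this explicit form is a reconstruction from the AR(1) recursion, not taken from the transcript. Inverse-variance weighting of two fits with error variances proportional to 1 and 1/ρ² gives weights proportional to 1 and ρ².

```python
import numpy as np

rng = np.random.default_rng(3)
T, rho = 500, 0.8

def theta(x):
    """Regression function, known here only because the data are simulated."""
    return np.sin(2.0 * np.pi * x)

x = rng.uniform(0.0, 1.0, T)
u = rng.standard_normal(T)
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1.0 - rho ** 2)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]
y = theta(x) + eps

y0 = y[1:] - rho * (y[:-1] - theta(x[:-1]))    # regress on X_t; error u_t
y1 = y[:-1] - (y[1:] - theta(x[1:])) / rho     # regress on X_{t-1}; error -u_t / rho
err0 = y0 - theta(x[1:])
err1 = y1 - theta(x[:-1])

# Inverse-variance weights for combining the two fits: proportional to 1 and rho^2.
raw = np.array([1.0, rho ** 2])
weights = raw / raw.sum()
# Variance of the weighted average, relative to the first fit alone,
# treating the two fits as independent with relative variances 1 and 1/rho^2.
relative_variance = weights[0] ** 2 * 1.0 + weights[1] ** 2 / rho ** 2
```

Under these assumptions the combined estimate has relative variance 1/(1 + ρ²), so the gain over the first pseudo-observation method alone is at most a factor of two.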

40 Time Series Problems AR(1) errors with correlation ρ. Compared: the efficiency of the original pseudo-observation method relative to working independence, and the efficiency of the new (SUR?) pseudo-observation method relative to the original method.

41 The Semiparametric Model Y = response; X, Z = time-varying covariates. Question: can we improve efficiency for β by accounting for correlation?

42 Profile Methods Given β, solve for Θ as a function of β. Basic idea: regress Y − Z^T β on X using one of: working independence, standard kernels, pseudo-observation kernels, SUR kernels.

43 Profile Methods Given β, solve for Θ as a function of β. Then fit GLS or W.I. to the resulting model for the mean. Question: does it matter what kernel method is used? Question: how bad is using W.I. everywhere? Question: are there efficient choices?

44 The Semiparametric Model: Special Case If X does not vary with time, a simple semiparametric efficient method is available. The basic point is that, within a cluster, the responses adjusted for the parametric part have common mean Θ(X) and a common covariance matrix. If Θ were a polynomial, GLS likelihood methods would be natural.

45 The Semiparametric Model: Special Case Method: Replace polynomial GLS likelihood with GLS local likelihood with weights Then do GLS on the derived variable Semiparametric efficient

46 Profile Method: General Case Given β, solve for Θ as a function of β. Then fit GLS or W.I. to the resulting model for the mean. In this general case, how you estimate Θ matters: working independence, standard kernel, pseudo-observation kernel, SUR kernel.

47 Profile Methods In this general case, how you estimate Θ matters: working independence, standard kernel, pseudo-observation kernel, SUR kernel. The SUR method leads to the semiparametric efficient estimate. So too does the GLS spline.

48 Longitudinal CD4 Count Data (Zeger and Diggle)
Covariate | Working Independence Est. | s.e. | Semiparametric GLS Z-D Est. | s.e. | Semiparametric GLS refit Est. | s.e.
Age | .014 | .035 | .010 | .033 | .008 | .032
# of Smokes | .984 | .192 | .549 | .144 | .579 | .139
Drug Use? | 1.05 | .53 | .58 | .33 | .58 | .33
# of Partners | -.054 | .059 | .080 | .038 | .078 | .039
Depression? | -.033 | .021 | -.045 | .013 | -.046 | .014

49 Conclusions (1/3): Nonparametric Regression In nonparametric regression: kernels = splines for working independence (W.I.). Working independence is inefficient. Standard kernels ≠ splines for correlated data.

50 Conclusions (2/3): Nonparametric Regression In nonparametric regression: pseudo-observation methods improve upon working independence. SUR kernels = splines for correlated data. Splines and SUR kernels are not local. Splines and SUR kernels are local in pseudo-observations.

51 Conclusions (3/3): Semiparametric Regression In semiparametric regression: profile methods are a general class. Fully efficient parameter estimates are easily constructed if X is not time-varying. When X is time-varying, the method of estimating Θ affects the properties of the parameter estimates. Using SUR kernels or GLS splines as the nonparametric method leads to efficient results. Conclusions can change between working independence and semiparametric GLS.

52 Conclusions: Splines versus Kernels One has to be struck by the fact that all the grief in this problem has come from trying to define kernel methods At the end of the day, they are no more efficient than splines, and harder and more subtle to define Showing equivalence as we have done suggests the good properties of splines

53 The Numbers in the Table. The decrease in s.e.’s is in accordance with our theory. The other phenomena are more difficult to explain. Nonetheless, they are not unique to semiparametric GEE methods. Similar discrepant outcomes occurred in parametric GEE estimation in which Θ(t) was replaced by a cubic regression function in time. Furthermore, we simulated data using the observed covariates but having responses generated from the multivariate normal with mean equal to the fitted mean in the parametric correlated GEE estimation, and with correlation given by Zeger and Diggle. The level of divergence between the two sets of results in the simulated data was fairly consistent with what appeared in the Table. For example, among the first 25 generated data sets, 3 had different signs for sex partners and 7 had a drug use coefficient from WI at least 1.8 times as large as that obtained by the proposed method.

