Institute of Statistics and Decision Sciences In Defense of a Dissertation Submitted for the Degree of Doctor of Philosophy 26 July 2005 Regression Model.

1 Institute of Statistics and Decision Sciences
In Defense of a Dissertation Submitted for the Degree of Doctor of Philosophy
26 July 2005
Regression Model Search and Uncertainty with Many Predictors
Christopher M. Hans

2 26 July 2005 Regression Model Search & Uncertainty Overview
Regression Model Space Exploration
SSS Methodology
  – Comparisons to MCMC
  – Analytic Evaluation
Sparsity in the Normal Linear Model
Several examples from cancer genomics

3 Variable Selection & Model Uncertainty
Regression Modeling
  – Many possible predictors
Model/Variable Selection
  – Choose one set of “relevant” predictors
Model Averaging
  – Use all models (or at least a set of high-probability models)

4 Model Search Strategies
Stepwise Methods
  – Forward/Backward selection
  – “Leaps and Bounds” [Furnival & Wilson, 1974]
MCMC
  – Gibbs sampling [George & McCulloch, 1993, 1997; Geweke, 1996; Smith & Kohn, 1996, 1997; Brown et al., 1998]
  – Metropolis-Hastings [Madigan & York, 1995; Raftery et al., 1997; Green, 1995]

5 Sparsity via p(γ)
Focus is on “sparse” models
Sparsity is encoded in the model via the prior
Specification of  important

6 Parameter Space Priors
Encompassing Model
Induced Priors
Closed form calculation of p(y | γ)

7 Shotgun Stochastic Search
Shoot out many proposals
Evaluate them in parallel
Sample a new model from the proposals

8 Criteria for Model Search
At each iteration:
1. Move across dimension effectively
2. Allow each variable to be considered
3. Quickly identify “similar” models

9 Regression Model SSS
Defining the Neighborhood:
Current model γ of size k
Consider three types of proposals:
  – neighboring models γ- of dimension k - 1
  – neighboring models γ± of dimension k
  – neighboring models γ+ of dimension k + 1
These are the models “shot out” (in parallel)
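The three proposal sets on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's implementation: the function name `neighborhood`, the representation of a model as a set of column indices, and the choice to skip deletions when the model has a single variable (so the empty model is never proposed) are all assumptions.

```python
from itertools import product


def neighborhood(model, p):
    """Return (deletions, swaps, additions) for a model given as a set of
    variable indices drawn from range(p). Hypothetical helper."""
    current = frozenset(model)
    outside = [j for j in range(p) if j not in current]
    # Drop one variable: dimension k - 1. Assumption: skipped when k = 1,
    # so the empty model is never proposed.
    deletions = [current - {i} for i in current] if len(current) > 1 else []
    # Replace one included variable by one excluded variable: dimension k.
    swaps = [(current - {i}) | {j} for i, j in product(current, outside)]
    # Add one excluded variable: dimension k + 1.
    additions = [current | {j} for j in outside]
    return deletions, swaps, additions


deletions, swaps, additions = neighborhood({0, 3}, p=6)
print(len(deletions), len(swaps), len(additions))
```

For a model of size k among p candidates this yields k deletions (when k > 1), k(p - k) swaps, and p - k additions, so the swap set dominates the cost of each iteration when p is large.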

10 Choosing the New Model
Would like dimensional balance:
Bayes: sample based on relative posterior probabilities
Otherwise: BIC, R², F statistic, etc.

11 Regression Model SSS
[Diagram: Current Model → Parallel Computing Step → Three Proposals → New Model]
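One reading of the flow on this slide is a two-stage draw: pick one candidate from each of the deletion, swap, and addition sets with probability proportional to its score, then pick the new model from those (up to) three candidates. The Python sketch below follows that reading. It is an assumption about the scheme rather than a transcription of it, and `sss_step`, `sample_by_log_score`, and `toy_log_score` are hypothetical names. It assumes p ≥ 2 so that at least one proposal set is non-empty.

```python
import math
import random


def sample_by_log_score(models, log_score, rng):
    """Draw one model with probability proportional to exp(log_score)."""
    logs = [log_score(m) for m in models]
    top = max(logs)  # shift before exponentiating to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    return rng.choices(models, weights=weights, k=1)[0]


def sss_step(current, p, log_score, rng):
    """One search iteration under the two-stage reading (an assumption)."""
    current = frozenset(current)
    outside = [j for j in range(p) if j not in current]
    groups = [
        [current - {i} for i in current] if len(current) > 1 else [],
        [(current - {i}) | {j} for i in current for j in outside],
        [current | {j} for j in outside],
    ]
    # Stage 1: one candidate per non-empty proposal set.
    candidates = [sample_by_log_score(g, log_score, rng) for g in groups if g]
    # Stage 2: choose the new model among those candidates.
    return sample_by_log_score(candidates, log_score, rng)


def toy_log_score(model):
    """Hypothetical score: rewards variables 0 and 1, penalizes size."""
    return 5.0 * len(model & {0, 1}) - 1.0 * len(model)


rng = random.Random(0)
model = frozenset({5})
for _ in range(10):
    model = sss_step(model, 8, toy_log_score, rng)
print(sorted(model))
```

In a real search every scored proposal would also be recorded, since the models evaluated along the way, not only the ones visited, feed the posterior summary.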

12 SSS Output
As the search progresses:
  – Maintain list  * of the best models evaluated
  – Based on a score function [log p(y | γ) + log p(γ)]
  – Use these models to summarize the posterior

13 Posterior Summarization
Condition on  *
  – Norming Constant:
  – Model probabilities:
  – Dimension importance:
  – Variable importance:
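The formulas for these four quantities did not survive in the transcript. A common way to compute them, assumed here rather than taken from the slide, is to renormalize the scores over the retained list of models and then sum model probabilities by size and by variable. The function name `summarize` and the dictionary input format are hypothetical.

```python
import math
from collections import defaultdict


def summarize(scored_models):
    """scored_models maps a model (frozenset of variable indices) to its
    log score, log p(y | model) + log p(model)."""
    top = max(scored_models.values())
    # Norming constant over the retained list, on a shifted scale.
    norm = sum(math.exp(s - top) for s in scored_models.values())
    probs = {m: math.exp(s - top) / norm for m, s in scored_models.items()}
    dim_prob = defaultdict(float)  # probability of each model size
    var_prob = defaultdict(float)  # inclusion probability of each variable
    for m, pr in probs.items():
        dim_prob[len(m)] += pr
        for j in m:
            var_prob[j] += pr
    return probs, dict(dim_prob), dict(var_prob)


scores = {
    frozenset({0}): math.log(2.0),
    frozenset({0, 1}): math.log(1.0),
    frozenset({2}): math.log(1.0),
}
probs, dim_prob, var_prob = summarize(scores)
print(dim_prob, var_prob)
```

All of these probabilities are conditional on the retained list, so they overstate the true posterior probabilities to the extent that unvisited models carry mass.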

14 Relationship to MCMC
Metropolis-Hastings:
  – Use P(x) restricted to a neighborhood B(x) as the proposal distribution
  – Acceptance Probability:
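The acceptance formula itself is missing from the transcript. For a proposal q(x' | x) = P(x') / P(B(x)) on x' in B(x), the standard Metropolis-Hastings ratio simplifies to min{1, P(B(x)) / P(B(x'))}, provided the neighborhoods are symmetric (x is in B(x') whenever x' is in B(x)). The Python sketch below checks this on a toy four-state target. The weights, the neighbor map, and all function names are made up for illustration.

```python
def neighborhood_mass(x, weight, neighbors):
    """Total unnormalized target mass of the neighborhood B(x)."""
    return sum(weight[z] for z in neighbors[x])


def proposal_prob(x, x_new, weight, neighbors):
    """q(x_new | x): the target restricted to B(x) and renormalized."""
    return weight[x_new] / neighborhood_mass(x, weight, neighbors)


def acceptance_prob(x, x_new, weight, neighbors):
    """min(1, P(B(x)) / P(B(x_new))), valid for symmetric neighborhoods."""
    return min(1.0, neighborhood_mass(x, weight, neighbors)
               / neighborhood_mass(x_new, weight, neighbors))


# Toy target on a 4-cycle (hypothetical numbers).
weight = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Detailed balance: the probability flow x -> y equals the flow y -> x.
for x in weight:
    for y in neighbors[x]:
        forward = (weight[x] * proposal_prob(x, y, weight, neighbors)
                   * acceptance_prob(x, y, weight, neighbors))
        backward = (weight[y] * proposal_prob(y, x, weight, neighbors)
                    * acceptance_prob(y, x, weight, neighbors))
        assert abs(forward - backward) < 1e-12
```

Dropping the accept/reject step and always moving to the sampled neighbor gives a pure search rather than a sampler, which is one way to see the relationship the slide title refers to.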

15 Analysis of SSS Performance
Fixed dimensional SSS: nbd(γ) = γ±
Expected time to find γ*
Orthogonal design

16 Simulation Study
Real Data: n = 41, p = 8,408
50 simulated y from “true” model: |γ*| = 4
p ∈ {10, 100, 500, 1000, 2500, 5000, 7500, 8408, 10000, 12500, 15000, 17500, 20000, 22500, 25000}

17 Comparison to MCMC
40,000 SSS iterations; SSS: 11 hrs. 53 min.
29,163 Gibbs iterations; Gibbs: 75.41%
1,137,195,208 model evaluations
135,252 Gibbs iterations; Gibbs: 55 hrs. 13 min.; Gibbs: 97.49%

18 Illustration: Glioblastoma Survival Study
Keck Center for Neurooncogenomics at Duke
  – n = 41 (patients), p = 8,408 (genes)
  – Expression levels are standardized
  – Priors:  = 1,  = 3,  = 10/p

19 Posterior Summarization
1,000,000 models from 40,000 iterations (<12 hours)

20 Assessing Model Fit
Model Averaged Fit
  – Sample from

21 Extension to Binary Regression

22 Extension to Weibull Survival Models

23 Sparsity in the Normal Linear Model
Sparsity via p(γ)
Sparsity via p(y | γ): shrinkage
Marginal Likelihood:

24 Lower Bound on the Marginal Likelihood
Theorem:
  – y and x_j are centered, scaled
  – rank(X) = k for all k < n
  – For fixed  >0,  >0 and y
  – Equality achieved when:

25 Implications
Model “fit” has been removed
  – Latent penalty for adding an irrelevant predictor
 = 1:

26 Inference on Sparsity
Estimate average value of p(y | γ) for models in

27 Inference on Sparsity
Shift by lower bound
Approximate parametrically
Estimate missing posterior mass

28 Stochastic Version of Lower Bound

29 Stochastic Version of Lower Bound
First order Taylor expansion gives

30 Assessing the Approximation
[Figure panels: Random Models; Keck Models]

31 Marginal Likelihood for 

32 Marginal Posterior p(  | y)
Assign prior distribution:

33 Future Considerations
Connections to other modeling frameworks
  – Gaussian graphical models
Model space prior distributions
  – p(γ | X)
  – p(γ |  ), X ~ F 
p*(y |  ) for other priors
Extended analysis of SSS, connections to MCMC

34 Notation
Regression Subsets / Models:
  – γ is a p × 1 indicator vector
    γ_j = 1 if x_j is in the model
    γ_j = 0 if x_j is not in the model
    γ′ = (1,0,0,1,…) → model has x_1 and x_4
“Dimension” or Size of a model

35 Linear Model Framework
For a given model
Bayesian framework

36 Parameter Space Priors
Regression Model:
Implied Regression:

37 Parameter Space Priors
Derive from an encompassing model
  – Consistency across regression models
Assign a prior

38 Neighborhood Example
[Figure labels: deletion set; swap set; addition set]
Note
  – |γ-| = k if k > 2 (γ- = ∅ if k = 1)
  – |γ±| = k (p – k)
  – |γ+| = p – k

39 Relationship to MCMC
Closely related to Metropolis-Hastings
  – Use P(x) restricted to a neighborhood B(x) as the proposal distribution

40 Relationship to MCMC
Accept move with probability
Relating Notation:

41 Relationship to MCMC
Can’t use “two stage” sampling
  – Say we sample a model γ′ from γ+

42 Analysis of SSS Performance
Define the map
Z_t ∈ {0,…,k}
Analyze the induced chain {Z_t}

43 Analysis of SSS Performance
Consider the random variable
Interest is in
Focus on v_0: Z_0 = 0

44 Analysis of SSS Performance
Need to specify the transition matrix P_{p,k}
The vector of expected hitting times is
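The expression for the hitting-time vector is missing from the transcript. For a finite Markov chain, the standard result is that the expected times to reach a target state solve (I - Q) v = 1, where Q is the transition matrix with the target row and column removed. The sketch below solves that system by Gauss-Jordan elimination in plain Python. The function name and the small transition matrix are hypothetical and are not the P_{p,k} of the slides.

```python
def expected_hitting_times(P, target):
    """Expected number of steps to first reach `target` from each state of a
    finite Markov chain with row-stochastic transition matrix P."""
    states = [s for s in range(len(P)) if s != target]
    n = len(states)
    # Augmented system [I - Q | 1]; Q is P without the target row/column.
    A = [[(1.0 if r == c else 0.0) - P[states[r]][states[c]]
          for c in range(n)] + [1.0] for r in range(n)]
    for col in range(n):
        # Partial pivoting, then eliminate this column from all other rows.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    times = [0.0] * len(P)  # hitting time from the target itself is 0
    for r, s in enumerate(states):
        times[s] = A[r][n] / A[r][r]
    return times


# Toy chain on {0, 1, 2}; state 2 plays the role of "target model found".
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.00, 1.00]]
print(expected_hitting_times(P, target=2))
```

The entry for state 0 corresponds to starting with none of the target variables in the model, which is the case the preceding slide focuses on.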

45 Analysis of SSS for Orthogonal Designs
x_i′ x_j = 0 for all i ≠ j
Two models γ_a, γ_b
X_a = (X_1 X_2) and X_b = (X_1 X_3)
X_1 is a set of k - 1 common variables

46 Analysis of SSS for Orthogonal Designs
Ratio of marginal likelihoods
Least squares coefficient

47 Analysis of SSS for Orthogonal Designs
Simplifying assumption:

48 Simulation Study
Time for SSS to find γ* as p increases
Based on brain cancer survival data
  – n = 41, p = 8,408
  – “True” model has four variables
  – Simulated m = 1,...,50 response vectors, y^(m)
  –  ^(m)_i ~ N(0, 0.5), i = 1,…,n

49 Simulation Study
p ∈ {10, 100, 500, 1000, 2500, 5000, 7500, 8408, 10000, 12500, 15000, 17500, 20000, 22500, 25000}
Reorder X so that “true” variables are 1,2,3,4
p ≤ 8,408
  – Take first p columns of X and randomly permute
p > 8,408
  – Take first 8,408 columns, add p – 8,408 columns of random noise, and permute
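The design construction on this slide can be sketched as follows. The slide says only "random noise", so the standard normal noise columns are an assumption, as are the function name `build_design`, the column-list representation, and the returned list of positions of the "true" variables after the permutation.

```python
import random


def build_design(columns, p, n, rng, n_true=4):
    """columns: list of length-n columns with the "true" variables first.
    Returns the permuted p-column design and the new positions of the
    first n_true original columns."""
    available = len(columns)
    if p <= available:
        # Keep the first p real columns.
        chosen = [list(c) for c in columns[:p]]
    else:
        # Pad with noise columns; N(0, 1) is an assumption of this sketch.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                 for _ in range(p - available)]
        chosen = [list(c) for c in columns] + noise
    order = list(range(p))
    rng.shuffle(order)  # random column permutation
    design = [chosen[j] for j in order]
    true_positions = sorted(order.index(j) for j in range(min(n_true, p)))
    return design, true_positions


rng = random.Random(1)
X_cols = [[float(10 * j + i) for i in range(3)] for j in range(6)]  # toy data
design, true_positions = build_design(X_cols, p=9, n=3, rng=rng)
print(len(design), true_positions)
```

Tracking where the true columns land after the permutation is what lets a simulation check whether, and after how many iterations, the search has found the target model.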

50 Marginal Posterior for 
Assign prior distribution
When p >> k,

51 Example: Binary Regression
Can extend SSS to binary regression
Use Laplace approximation

52 Predicting Lymph Node Status
X: Gene Expression (breast cancer tumors)
Y: Lymph Node Positivity Status
n = 148
n_0 = 100 low risk (node negative)
n_1 = 48 high risk (high node positive)
p = 4,512

53 SSS Results: 100,000 Models

54 Model Fit and Predictive Accuracy

55 Example: Survival Regression
Weibull survival models
Use Laplace approximation

56 Predicting Survival Time
X: Gene Expression (lung cancer tumors)
Y: Survival Time
n = 91 patients
d = 45 observed survival times
n - d = 46 censored times
p = 2,717

57 SSS Results: 100,000 models

58 Survival Predictions

59 Sensitivity/Specificity

