
Institute of Statistics and Decision Sciences In Defense of a Dissertation Submitted for the Degree of Doctor of Philosophy 26 July 2005 Regression Model Search and Uncertainty with Many Predictors Christopher M. Hans

26 July 2005 · Regression Model Search & Uncertainty

Overview
– Regression model space exploration
– SSS methodology: comparisons to MCMC; analytic evaluation
– Sparsity in the normal linear model
– Several examples from cancer genomics

Variable Selection & Model Uncertainty
– Regression modeling: many possible predictors
– Model/variable selection: choose one set of "relevant" predictors
– Model averaging: use all models (or at least a high-probability set)

Model Search Strategies
– Stepwise methods: forward/backward selection; "leaps and bounds" [Furnival & Wilson, 1974]
– MCMC: Gibbs sampling [George & McCulloch, 1993, 1997; Geweke, 1996; Smith & Kohn, 1996, 1997; Brown et al., 1998]; Metropolis–Hastings [Madigan & York, 1995; Raftery et al., 1997; Green, 1995]

Sparsity via p(γ)
– Focus is on "sparse" models
– Sparsity is encoded in the model via the prior
– Specification of the prior on γ is important

Parameter Space Priors
– Encompassing model
– Induced priors
– Closed-form calculation of p(y | γ)

Shotgun Stochastic Search
– Shoot out many proposals
– Evaluate them in parallel
– Sample a new model from the proposals
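The propose/evaluate/sample loop of SSS can be sketched as follows. This is a minimal illustration, not the thesis implementation: `score` stands in for the log posterior score log p(y | γ) + log p(γ), `neighbors` for the neighborhood construction, and all names are hypothetical; the scoring of proposals (done here with a plain list comprehension) is the step the thesis parallelizes.

```python
import math
import random

def sss(score, neighbors, gamma0, iters=100, keep=10, seed=0):
    """Shotgun stochastic search sketch: at each iteration, score every
    model in the neighborhood of the current model (the proposals "shot
    out"), record them, and sample the next model with probability
    proportional to exp(score)."""
    rng = random.Random(seed)
    gamma = gamma0
    best = {gamma0: score(gamma0)}            # running list of best models seen
    for _ in range(iters):
        props = list(neighbors(gamma))
        scores = [score(g) for g in props]    # the parallelizable step
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        gamma = rng.choices(props, weights=weights, k=1)[0]
        best.update(zip(props, scores))
        if len(best) > keep:                  # retain only the top `keep` models
            best = dict(sorted(best.items(), key=lambda kv: -kv[1])[:keep])
    return gamma, best
```

On a toy problem where models are 0/1 tuples, the score is the number of included variables, and neighbors flip one bit, the search quickly discovers the highest-scoring model.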

Criteria for Model Search
At each iteration:
1. Move across dimensions effectively
2. Allow each variable to be considered
3. Quickly identify "similar" models

Regression Model SSS
Defining the neighborhood: current model γ of size k. Consider three types of proposals:
– neighboring models γ− of dimension k − 1
– neighboring models γ± of dimension k
– neighboring models γ+ of dimension k + 1
These are the models "shot out" (in parallel).

Choosing the New Model
– Would like dimensional balance
– Bayes: sample based on relative posterior probabilities
– Otherwise: BIC, R², F-statistic, etc.

Regression Model SSS
Current model → three proposals (parallel computing step) → new model

SSS Output
As the search progresses:
– Maintain a list Γ* of the best models evaluated
– Based on a score function [log p(y | γ) + log p(γ)]
– Use these models to summarize the posterior

Posterior Summarization
Condition on Γ*:
– Norming constant: C = Σ_{γ ∈ Γ*} p(γ) p(y | γ)
– Model probabilities: p(γ | y) ≈ p(γ) p(y | γ) / C
– Dimension importance: p(k | y) ≈ Σ_{γ ∈ Γ*: kγ = k} p(γ | y)
– Variable importance: p(γj = 1 | y) ≈ Σ_{γ ∈ Γ*: γj = 1} p(γ | y)
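The posterior summaries conditioned on the retained model set can be sketched in a few lines (names hypothetical; scores are renormalized over the retained set only, as on the slide):

```python
import math

def summarize(scores):
    """Posterior summaries conditioned on a retained model set.
    `scores` maps each model (a 0/1 tuple) to its score
    log p(y|gamma) + log p(gamma)."""
    m = max(scores.values())                     # subtract max for stability
    w = {g: math.exp(s - m) for g, s in scores.items()}
    C = sum(w.values())                          # norming constant (up to exp(m))
    probs = {g: wi / C for g, wi in w.items()}   # model probabilities
    p = len(next(iter(scores)))
    incl = [sum(pr for g, pr in probs.items() if g[j] == 1)
            for j in range(p)]                   # variable importance
    dim = {}
    for g, pr in probs.items():                  # dimension importance
        dim[sum(g)] = dim.get(sum(g), 0.0) + pr
    return probs, incl, dim
```

Because every quantity is a ratio of sums over the retained set, only relative scores matter, which is why the unnormalized log score from the search suffices.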

Relationship to MCMC
Metropolis–Hastings:
– Use P(x) restricted to a neighborhood B(x) as the proposal distribution
– Acceptance probability: min{1, P(B(x)) / P(B(x′))}

Analysis of SSS Performance
– Fixed-dimensional SSS: nbd(γ) = γ±
– Expected time to find γ*
– Orthogonal design

Simulation Study
Real data: n = 41, p = 8,408; simulated y from a "true" model with |γ*| = 4.
p ∈ {10, 100, 500, 1000, 2500, 5000, 7500, 8408, 10000, 12500, 15000, 17500, 20000, 22500, …}

Comparison to MCMC
– SSS: 40,000 iterations; 11 hrs 53 min; 1,137,195,208 model evaluations
– Gibbs: 29,163 iterations; 75.41%
– Gibbs: 135,252 iterations; 55 hrs 13 min; 97.49%

Illustration: Glioblastoma Survival Study
Keck Center for Neurooncogenomics at Duke
– n = 41 (patients), p = 8,408 (genes)
– Expression levels are standardized
– Priors: hyperparameter values 1, 3, and 10/p

Posterior Summarization
1,000,000 models from 40,000 iterations (< 12 hours)

Assessing Model Fit
– Model-averaged fit: sample from the model-averaged predictive distribution

Extension to Binary Regression

Extension to Weibull Survival Models

Sparsity in the Normal Linear Model
– Sparsity via p(γ)
– Sparsity via p(y | γ): shrinkage
– Marginal likelihood:

Lower Bound on the Marginal Likelihood
Theorem:
– y and the xj are centered and scaled
– rank(Xγ) = kγ for all γ with kγ < n
– For fixed prior hyperparameters > 0 and y
– Equality achieved when:

Implications
– Model "fit" has been removed
– Latent penalty for adding an irrelevant predictor; with the hyperparameter set to 1:

Inference on Sparsity
– Estimate the average value of p(y | γ) for models in a given set

Inference on Sparsity
– Shift by the lower bound
– Approximate parametrically
– Estimate the missing posterior mass

Stochastic Version of Lower Bound

Stochastic Version of Lower Bound
A first-order Taylor expansion gives:

Assessing the Approximation
(Figures: random models; Keck models)

Marginal Likelihood for the Sparsity Parameter

Marginal Posterior for the Sparsity Parameter
– Assign a prior distribution:

Future Considerations
– Connections to other modeling frameworks: Gaussian graphical models
– Model space prior distributions: p(γ | X); p(γ | ·) with X ~ F
– p*(y | γ) for other priors
– Extended analysis of SSS; connections to MCMC

Notation
Regression subsets / models:
– γ is a p × 1 indicator vector
– γj = 1 if xj is in the model; γj = 0 if xj is not
– γ′ = (1, 0, 0, 1, …) → model includes x1 and x4
– "Dimension" or size of a model: kγ = Σj γj
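The indicator notation in code form, as a toy example (values chosen to match the slide's γ′ = (1, 0, 0, 1, …)):

```python
p = 6
gamma = (1, 0, 0, 1, 0, 0)   # indicator vector: x1 and x4 are in the model
included = [j + 1 for j in range(p) if gamma[j] == 1]   # 1-indexed labels
k_gamma = sum(gamma)         # "dimension" or size of the model
```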

Linear Model Framework
– For a given model:
– Bayesian framework:

Parameter Space Priors
– Regression model:
– Implied regression:

Parameter Space Priors
– Derive from an encompassing model: consistency across regression models
– Assign a prior

Neighborhood Example
Deletion set, swap set, addition set. Note:
– |γ−| = kγ if kγ ≥ 2 (γ− = ∅ if kγ = 1)
– |γ±| = kγ (p − kγ)
– |γ+| = p − kγ
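The three neighborhood sets can be enumerated directly, and their sizes match the counts above (a sketch; function and variable names are hypothetical):

```python
def neighborhoods(gamma):
    """Deletion (gamma-), swap (gamma+-), and addition (gamma+) sets
    for an inclusion-indicator tuple gamma."""
    p = len(gamma)
    k = sum(gamma)
    ins = [i for i in range(p) if gamma[i] == 1]
    outs = [j for j in range(p) if gamma[j] == 0]
    def flip(g, *idx):
        g = list(g)
        for i in idx:
            g[i] = 1 - g[i]
        return tuple(g)
    minus = [flip(gamma, i) for i in ins] if k >= 2 else []  # empty if k = 1
    swap = [flip(gamma, i, j) for i in ins for j in outs]
    plus = [flip(gamma, j) for j in outs]
    return minus, swap, plus
```

For p = 5 and k = 2 this yields |γ−| = 2, |γ±| = 2 · 3 = 6, and |γ+| = 3, in line with the slide's formulas.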

Relationship to MCMC
Closely related to Metropolis–Hastings:
– Use P(x) restricted to a neighborhood B(x) as the proposal distribution

Relationship to MCMC
– Accept the move with probability min{1, P(B(x)) / P(B(x′))}
– Relating notation:
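With proposal q(x′ | x) = P(x′) / P(B(x)) for x′ ∈ B(x), the Hastings ratio [P(x′) q(x | x′)] / [P(x) q(x′ | x)] cancels to P(B(x)) / P(B(x′)), since P(x) and P(x′) drop out. A sketch of that acceptance probability (a toy illustration, not the thesis code):

```python
def accept_prob(P, B, x, x_new):
    """MH acceptance probability when the proposal draws x' from B(x)
    with probability P(x') / P(B(x)).  The Hastings ratio
    [P(x') q(x|x')] / [P(x) q(x'|x)] simplifies to P(B(x)) / P(B(x'))."""
    mass = lambda states: sum(P(z) for z in states)
    return min(1.0, mass(B(x)) / mass(B(x_new)))
```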

26 July 2005 Regression Model Search & Uncertainty Relationship to MCMC Can’t use “two stage” sampling –Say we sample a model  0 from  +

Analysis of SSS Performance
– Define the map Zt ∈ {0, …, k}
– Analyze the induced chain {Zt}

Analysis of SSS Performance
– Consider the hitting-time random variable
– Interest is in its expectation
– Focus on v0: Z0 = 0

Analysis of SSS Performance
– Need to specify the transition matrix Pp,k
– The vector of expected hitting times is:
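Expected hitting times satisfy the standard first-step system v_i = 1 + Σ_j P[i][j] v_j for transient states, with v = 0 at the target. A small sketch solving this by fixed-point iteration; the toy chain used in the example is hypothetical, not the thesis's Pp,k:

```python
def expected_hitting_times(P, target):
    """Expected number of steps to first reach `target` from each state
    of a finite Markov chain with row-stochastic transition matrix P
    (nested lists).  Iterates v <- 1 + P v on transient states, which
    converges geometrically for chains that reach `target`."""
    n = len(P)
    v = [0.0] * n
    for _ in range(10000):
        v = [0.0 if i == target else
             1.0 + sum(P[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v
```

For a three-state birth–death-style chain one can check the answer by hand: with the matrix below, v1 = 3 and v0 = 5.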

Analysis of SSS for Orthogonal Designs
– xi′xj = 0 for all i ≠ j
– Two models γa, γb with Xa = (X1 X2) and Xb = (X1 X3)
– X1 is a set of k − 1 common variables

Analysis of SSS for Orthogonal Designs
– Ratio of marginal likelihoods:
– Least-squares coefficient:

Analysis of SSS for Orthogonal Designs
Simplifying assumption:

Simulation Study
Time for SSS to find γ* as p increases. Based on brain cancer survival data:
– n = 41, p = 8,408
– The "true" model has four variables
– Simulated m = 1, …, 50 response vectors y(m)
– εi(m) ~ N(0, 0.5), i = 1, …, n

Simulation Study
p ∈ {10, 100, 500, 1000, 2500, 5000, 7500, 8408, 10000, 12500, 15000, 17500, 20000, 22500, …}
Reorder X so that the "true" variables are 1, 2, 3, 4.
– p ≤ 8,408: take the first p columns of X and randomly permute
– p > 8,408: take the first 8,408 columns, add p − 8,408 columns of random noise, and permute

Marginal Posterior for the Sparsity Parameter
– Assign a prior distribution
– When p ≫ kγ:

Example: Binary Regression
– SSS extends to binary regression
– Use a Laplace approximation

Predicting Lymph Node Status
– X: gene expression (breast cancer tumors); Y: lymph node positivity status
– n = 148: n0 = 100 low risk (node negative), n1 = 48 high risk (node positive)
– p = 4,512

SSS Results: 100,000 Models

Model Fit and Predictive Accuracy

Example: Survival Regression
– Weibull survival models
– Use a Laplace approximation

Predicting Survival Time
– X: gene expression (lung cancer tumors); Y: survival time
– n = 91 patients: d = 45 observed survival times, n − d = 46 censored
– p = 2,717

SSS Results: 100,000 Models

Survival Predictions

Sensitivity/Specificity