Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin


1 Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC
Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin
Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West, Duke Medical Center & Duke University

2 Estrogen Receptor Status
7000 genes, 49 breast tumors: 25 ER+, 24 ER-

3 Tumor – Chip Numbers

4 Given and Wanted
Given: 7000 numbers. Wanted: the probability that the tumor is ER+, for example 89%.

5 7000 Numbers Are More Numbers Than We Need
Predict ER status based on the expression levels of super-genes

6 Singular Value Decomposition
Data = Loadings × Singular values × Expression levels of super-genes (an orthogonal matrix)
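
The decomposition on this slide can be sketched with NumPy. This is a minimal illustration, not the authors' code: the data matrix is random stand-in data with the dimensions from slide 2, and the names `X`, `A`, `d` and `F` are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_tumors = 7000, 49

# Hypothetical stand-in for the expression data: one row per gene,
# one column per tumor.
X = rng.normal(size=(n_genes, n_tumors))

# Thin SVD: X = A @ diag(d) @ F
A, d, F = np.linalg.svd(X, full_matrices=False)
# A: (7000, 49) loadings with orthonormal columns
# d: (49,)      singular values in descending order
# F: (49, 49)   orthogonal matrix; row j holds the expression level
#               of super-gene j in each of the 49 tumors

# Multiplying the three factors back together recovers the data.
X_rebuilt = (A * d) @ F
```

The 7000 numbers per tumor are thereby replaced by 49 super-gene levels per tumor (one column of `F`), without losing any information about the 49 training profiles.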

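7 Probit Model
P(tumor i is ER+) = Φ(sum over super-genes j of [regression weight for super-gene j × expression level of super-gene j in tumor i]), where the class of tumor i is ER+ or ER- and Φ is the distribution function of a standard normal.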
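
As a small worked example of the probit model, the sketch below turns a weighted sum of super-gene levels into a probability. The three weights and expression levels are made-up numbers for illustration only, and the function names are assumptions of this example.

```python
import math
import numpy as np

def std_normal_cdf(x):
    """Distribution function of a standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_er_positive(weights, supergene_levels):
    """Probit model: probability that a tumor is ER+."""
    linear_predictor = float(np.dot(weights, supergene_levels))
    return std_normal_cdf(linear_predictor)

# Hypothetical regression weights and super-gene levels for one tumor.
weights = np.array([0.8, -0.3, 0.5])
levels = np.array([1.0, 0.5, 1.2])

p = prob_er_positive(weights, levels)
```

A linear predictor of zero gives a probability of exactly 0.5; large positive values push the probability toward 1 (ER+), large negative values toward 0 (ER-).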

8 Overfitting
Using only a small number of super-genes is not robust at all. When using many (or all) super-genes, the linear model is easily saturated, i.e. several models fit perfectly well. Consequence: for a new patient, some of these models predict that she is ER+ and others that she is ER-.
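
The saturation problem can be made concrete with a short sketch. It assumes 49 orthogonal super-genes and 49 tumors, uses random stand-in data, and constructs two weight vectors that both reproduce every training label yet disagree on a new patient. All names (`F`, `y`, `f_new`, `perfect_model`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 49

# Hypothetical orthogonal super-gene matrix: rows are super-genes,
# columns are tumors (the Q factor of a random matrix is orthogonal).
F, _ = np.linalg.qr(rng.normal(size=(n, n)))
y = np.array([1] * 25 + [-1] * 24)   # +1 = ER+, -1 = ER-
f_new = rng.normal(size=n)           # super-gene levels of a new patient

def perfect_model(margins):
    """Weights whose linear predictor on tumor i is y[i] * margins[i].

    Because F is orthogonal, F.T @ (F @ v) == v, so any positive
    margins give a model that classifies all training tumors correctly.
    """
    return F @ (y * margins)

# The new patient's predictor is sum_i g[i] * margins[i].
g = (F.T @ f_new) * y
s_pos, s_neg = g[g > 0].sum(), -g[g < 0].sum()
assert s_pos > 0 and s_neg > 0       # holds for this random draw

# Emphasise the tumors that pull the new patient toward ER+ ...
beta_plus = perfect_model(np.where(g > 0, 1.0, s_pos / (2 * s_neg)))
# ... or the tumors that pull her toward ER-.
beta_minus = perfect_model(np.where(g < 0, 1.0, s_neg / (2 * s_pos)))

fits_plus = np.all(np.sign(F.T @ beta_plus) == y)
fits_minus = np.all(np.sign(F.T @ beta_minus) == y)
pred_plus = f_new @ beta_plus        # positive: predicts ER+
pred_minus = f_new @ beta_minus      # negative: predicts ER-
```

Both models are perfect on the training data, so the likelihood alone cannot choose between them.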

9 Given the Few Profiles With Known Diagnosis:
The uncertainty about the right model is high. The variance of the model weights is large. The likelihood landscape is flat. We need additional model assumptions to solve the problem.

10 Informative Priors
Posterior ∝ Likelihood × Prior

11 If the Prior Is Chosen Badly:
We cannot reproduce the diagnosis of the training profiles any more. We still cannot identify the model. The diagnosis is driven mostly by the additional assumptions and not by the data.

12 The Prior Needs to Be Designed in 49 Dimensions
Shape? Center? Orientation? Not too narrow ... not too wide

13 Shape
Multidimensional normal, for simplicity

14 Center
Assumptions on the model correspond to assumptions on the diagnosis

15 Orientation
Orthogonal super-genes!

16 Not Too Narrow ... Not Too Wide
Auto-adjusting model. Scales are hyperparameters with their own priors.

17 Prior given the hyperparameter
Rescaling by singular values; hyperparameter; independent super-genes; unbiased prior

18 A prior for the hyper parameters
Conjugate prior Flexibility for Symmetric U-Shaped prior for k=2 or k=3

19 Latent Variable
Albert & Chib 1993

20 MCMC - Gibbs Sampler - Sequential updates of conditional distributions
All conditional posteriors can be calculated analytically (West 2001; Albert & Chib 1993).
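
A minimal sketch of a latent-variable Gibbs sampler for probit regression, in the spirit of the Albert & Chib data augmentation, is given below. It is a simplification and not the model from the slides: the prior variances are held fixed instead of being updated as hyperparameters, the data are random stand-ins, and all function and variable names are assumptions of this example.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()

def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive) or z <= 0."""
    p0 = _STD.cdf(-mean)                   # P(z <= 0)
    u = rng.uniform()
    p = p0 + u * (1.0 - p0) if positive else u * p0
    p = min(max(p, 1e-12), 1.0 - 1e-12)    # keep inv_cdf inside (0, 1)
    return mean + _STD.inv_cdf(p)

def gibbs_probit(X, y, prior_var, n_iter, rng):
    """Alternate between latent variables z and regression weights beta.

    X: (n, k) design matrix, y: (n,) labels in {0, 1},
    prior_var: (k,) fixed variances of a zero-mean normal prior.
    """
    n, k = X.shape
    # Conditional covariance of beta given z (does not depend on z).
    V = np.linalg.inv(X.T @ X + np.diag(1.0 / prior_var))
    L = np.linalg.cholesky(V)
    beta = np.zeros(k)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        # 1) z_i | beta, y_i: normal around the linear predictor,
        #    truncated to the side that agrees with the label.
        means = X @ beta
        z = np.array([sample_truncated_normal(m, yi == 1, rng)
                      for m, yi in zip(means, y)])
        # 2) beta | z: multivariate normal with mean V X'z, covariance V.
        beta = V @ (X.T @ z) + L @ rng.standard_normal(k)
        draws[t] = beta
    return draws

# Hypothetical toy data: the label depends only on the first column.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)
draws = gibbs_probit(X, y, prior_var=np.ones(3), n_iter=300, rng=rng)
posterior_mean = draws[100:].mean(axis=0)  # discard the first 100 draws
```

Both conditional distributions have closed forms, so each sweep only needs truncated normal and multivariate normal draws. A fuller version would add a third step that resamples the prior scales from their own conditional posterior.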

21 What are the additional assumptions that came in by the prior?
The model cannot be dominated by only a few super-genes (genes!). The diagnosis is based on global changes in the expression profiles, influenced by many genes. The assumptions are neutral with respect to the individual diagnosis.

22

23 Which Genes Have Driven the Prediction?
Gene                            Weight
nuclear factor 3 alpha          0.853
cysteine rich heart protein     0.842
estrogen receptor               0.840
intestinal trefoil factor       -
x box binding protein 1         0.835
gata 3                          0.818
ps 2                            -
liv1                            0.812
... many many more ...

24 Thank you!

