Computational Diagnostics: A new research group at the Max Planck Institute for Molecular Genetics, Berlin.



Will the patient respond to this drug?

A simple solution for simple problems: Find all genes that are induced at least x-fold and use them to predict clinical outcomes.
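The fold-change rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the gene names, and the expression values are hypothetical, and expression is assumed to be on a positive, non-logarithmic scale.

```python
def induced_genes(expr_a, expr_b, x=2.0):
    """Return genes whose mean expression in group B is at least x-fold
    the mean in group A.

    expr_a / expr_b map a gene name to a list of positive, non-log
    expression values (hypothetical data layout).
    """
    selected = []
    for gene in expr_a:
        mean_a = sum(expr_a[gene]) / len(expr_a[gene])
        mean_b = sum(expr_b[gene]) / len(expr_b[gene])
        # Guard against division by zero for genes with no signal in group A.
        if mean_a > 0 and mean_b / mean_a >= x:
            selected.append(gene)
    return selected


# Hypothetical toy data: two non-responders and two responders.
non_responders = {"geneA": [100.0, 120.0], "geneB": [50.0, 60.0]}
responders = {"geneA": [410.0, 390.0], "geneB": [55.0, 65.0]}

print(induced_genes(non_responders, responders, x=2.0))  # ['geneA']
```

The rule ignores variability between replicates, which is one reason the slide calls it a solution for simple problems only.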

Statistical Modeling: Experimental Design, Quality Control, Scaling, Normalization, Dimension Reduction, Predictive Classification, Quantifying the Evidence, Identifying the Evidence

Computational Infrastructure and more Data: Databases, Automatic Uploading, Standard Analysis Protocols, Analysis Software, Query Language, Understanding the disease, Designing a small Diagnostic Chip

Clinical Practice: Large patient databases complemented by expression profiles, monitoring the epidemiology of the disease

Breast Cancer, Expression Profiles and Binary Regression in 7000 Dimensions. Rainer Spang, Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West. Duke Medical Center & Duke University

Estrogen Receptor Status: 7000 genes, 49 breast tumors (25 ER+, 24 ER-)

Tumor – Chip Numbers

We Assume That the Following Steps Are Done: choosing the patients; doing the surgery; handling the tissues; preparing mRNA; hybridizing the chips; image analysis; excluding low-quality data; normalization; scaling.

How Much Evidence Is There? "I am 80% sure", "I know it", "It was a guess": the probability that the patient has xxx given the profile is 0.8, 1, 0.5, respectively.

Given: 7000 numbers. Wanted: the probability that the tumor is ER+ (for example, 89%).

7000 Numbers Are More Numbers Than We Need: Predict ER status based on the expression levels of super-genes.
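The slides do not say how super-genes are constructed. One plausible reading, taken here purely as an assumption, is that a super-gene is a weighted combination of all genes obtained from a singular value decomposition of the centered expression matrix. The sketch below computes only the leading such combination by power iteration; the function name and the toy matrix are hypothetical.

```python
import math


def leading_supergene(X, iters=100):
    """Leading 'super-gene' of an expression matrix (assumed: SVD-based).

    X is a list of genes, each a list of per-sample expression values.
    Returns (weights, scores): one loading per gene, one score per sample.
    """
    n = len(X[0])
    # Center every gene across the samples.
    Xc = [[v - sum(row) / n for v in row] for row in X]
    # Non-constant start vector: a constant vector lies in the null space
    # of the centered matrix and would collapse to zero.
    v = [float(j + 1) for j in range(n)]
    for _ in range(iters):
        # One power-iteration step on Xc^T Xc: v <- Xc^T (Xc v), normalized.
        u = [sum(r[j] * v[j] for j in range(n)) for r in Xc]
        v = [sum(Xc[g][j] * u[g] for g in range(len(Xc))) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    # Gene loadings: normalized left singular vector.
    u = [sum(r[j] * v[j] for j in range(n)) for r in Xc]
    u_norm = math.sqrt(sum(c * c for c in u))
    weights = [c / u_norm for c in u]
    # Per-sample expression level of the super-gene.
    scores = [sum(Xc[g][j] * weights[g] for g in range(len(Xc)))
              for j in range(n)]
    return weights, scores


# Hypothetical toy matrix: 3 genes x 4 tumors; genes 1 and 2 move together.
X = [[1.0, 1.0, 5.0, 5.0],
     [2.0, 2.0, 6.0, 6.0],
     [3.0, 3.1, 3.0, 2.9]]
weights, scores = leading_supergene(X)
print(scores)  # tumors 1-2 and tumors 3-4 receive scores of opposite sign
```

With 49 tumors, at most 49 such orthogonal combinations can carry information, which matches the 49 dimensions mentioned later in the deck.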

Overfitting: We Cannot Identify a Model. There are many different models that assign high probabilities to ER+ tumors and low probabilities to ER- tumors in the training set. For a new patient, we find among these models some that support that she is ER+ and others that predict she is ER-.
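A tiny, hand-made example shows the identifiability problem when there are more genes than tumors. All numbers and names below are hypothetical; the classifier is assumed to be a simple linear score whose sign gives the predicted status.

```python
def linear_score(weights, profile):
    """Weighted sum of expression values; a positive score is read as ER+."""
    return sum(w * x for w, x in zip(weights, profile))


# Two training tumors measured on three genes (more genes than tumors).
training = [([2.0, 0.5, 1.0], "ER+"),
            ([0.5, 2.0, 1.0], "ER-")]

# Two different models that differ only in the weight of the third gene.
model_a = [1.0, -1.0, 0.5]
model_b = [1.0, -1.0, -0.5]

for name, model in (("model_a", model_a), ("model_b", model_b)):
    predictions = ["ER+" if linear_score(model, x) > 0 else "ER-"
                   for x, _ in training]
    print(name, "training predictions:", predictions)

# A new patient on whom the two models disagree.
new_profile = [1.0, 1.0, 3.0]
print("model_a:", "ER+" if linear_score(model_a, new_profile) > 0 else "ER-")
print("model_b:", "ER+" if linear_score(model_b, new_profile) > 0 else "ER-")
```

Both models reproduce the training labels exactly, so the training data alone cannot decide between them.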

Given the Few Profiles With Known Diagnosis: The uncertainty about the right model is high. The variance of the model weights is large. The likelihood landscape is flat. We need additional model assumptions to solve the problem.

Informative Priors: Likelihood, Prior, Posterior
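The relation between the three quantities named on this slide can be written out for the setting of the talk. The notation is an assumption introduced here: $y_i \in \{0, 1\}$ is the ER status of tumor $i$, $x_i$ its vector of super-gene expression levels, $\beta$ the model weights, $\Phi$ the standard normal distribution function of the probit model named in the summary slide, and $\Sigma$ the covariance of a normal prior. The zero prior mean is likewise an assumption.

```latex
\[
\underbrace{p(\beta \mid y, X)}_{\text{posterior}}
\;\propto\;
\underbrace{\prod_{i=1}^{n} \Phi(x_i^\top \beta)^{y_i}
  \bigl(1 - \Phi(x_i^\top \beta)\bigr)^{1 - y_i}}_{\text{likelihood}}
\;\times\;
\underbrace{\mathcal{N}(\beta \mid 0, \Sigma)}_{\text{prior}}
\]
```

When the likelihood is flat in many directions, the prior term is what makes the posterior concentrate on a single region of weight space.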

If the Prior Is Chosen Badly: We cannot reproduce the diagnosis of the training profiles any more. We still cannot identify the model. The diagnosis is driven mostly by the additional assumptions and not by the data.

The Prior Needs to Be Designed in 49 Dimensions: Shape? Center? Orientation? Not too narrow... not too wide

Shape: multidimensional normal for simplicity

Center: Assumptions on the model correspond to assumptions on the diagnosis

Orientation: orthogonal super-genes!

Not Too Narrow... Not Too Wide: Auto-adjusting model. Scales are hyperparameters with their own priors.

What are the additional assumptions that came in by the prior? The model cannot be dominated by only a few super-genes ( genes! ). The diagnosis is done based on global changes in the expression profiles influenced by many genes. The assumptions are neutral with respect to the individual diagnosis.

Which Genes Have Driven the Prediction? Gene (weight): nuclear factor 3 alpha (0.853); cysteine rich heart protein (0.842); estrogen receptor (0.840); intestinal trefoil factor (0.840); x box binding protein gata ps liv; many, many more...

Cysteine Rich Heart Protein

Summary... so far: We have solved a relatively simple computational diagnostics problem (ER status in human breast cancers). Probit model. Overfitting is a problem. Additional model assumptions do the trick.
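How a probit model turns a profile into a probability can be sketched as follows. The weights, the profile, and the function name are hypothetical, and fitting the weights (the hard part discussed in the slides) is not shown.

```python
import math


def probit_probability(weights, profile, intercept=0.0):
    """P(ER+ | profile) = Phi(intercept + weights . profile),
    where Phi is the standard normal distribution function."""
    score = intercept + sum(w * x for w, x in zip(weights, profile))
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


# Hypothetical weights for two super-genes and one tumor profile.
weights = [0.8, -0.4]
profile = [1.5, 0.5]
p = probit_probability(weights, profile)
print(round(p, 3))  # 0.841, since the linear score is 1.0 and Phi(1) ~ 0.841
```

A score of zero gives a probability of 0.5, which corresponds to the "it was a guess" case from the evidence slide.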

A Common Problem With Expression Profiles: We do not have enough samples to answer a certain question. A possible strategy: introduce additional model assumptions.

Differential Expression I. Setup: two conditions (healthy vs. sick), some repetitions, genes. Which genes are up- or down-regulated? The most basic question. Good because it is a hypothesis-free approach.

Differential Expression II: degrees of freedom. A very bad multiple testing problem. It is possible in principle, but might require many replications depending on signal-to-noise ratios. SAM: regularized t-statistic + permutation-based false positive rates. Hard to improve the analysis because it is a hypothesis-free approach.
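The two ingredients named for SAM can be illustrated with a simplified sketch. This is not the SAM software: the fudge constant `s0`, the threshold, the function names, and the toy data are all hypothetical, and the permutation step here only estimates the expected number of genes that pass the threshold under shuffled labels.

```python
import random
import statistics


def regularized_t(a, b, s0=0.1):
    """(mean(b) - mean(a)) / (s + s0), with s the pooled standard error.
    The small constant s0 keeps genes with tiny variance from dominating."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    s = (pooled_var * (1.0 / na + 1.0 / nb)) ** 0.5
    return (statistics.mean(b) - statistics.mean(a)) / (s + s0)


def expected_false_positives(data, labels, threshold, n_perm=200, seed=0):
    """Average number of genes with |d| >= threshold when the condition
    labels are randomly permuted (labels: 0 = healthy, 1 = sick)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        for values in data.values():
            a = [v for v, lab in zip(values, perm) if lab == 0]
            b = [v for v, lab in zip(values, perm) if lab == 1]
            if abs(regularized_t(a, b)) >= threshold:
                total += 1
    return total / n_perm


# Hypothetical toy data: 3 healthy and 3 sick samples, two genes.
labels = [0, 0, 0, 1, 1, 1]
data = {"up": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
        "flat": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8]}

d_up = regularized_t(data["up"][:3], data["up"][3:])
d_flat = regularized_t(data["flat"][:3], data["flat"][3:])
efp = expected_false_positives(data, labels, threshold=5.0)
print(d_up, d_flat, efp)
```

Dividing the expected count under permutation by the number of genes actually called gives a rough false positive rate for the chosen threshold.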

Clustering of Genes. Setup: many different conditions (time series, multiple knock-outs). 100% explorative analysis. Essentially it is rearranging the data. Good for finding hypotheses but not for verifying them.

Clustering of Profiles (Patients): Maybe we can find new disease types or refine existing ones. Completely different results when different sets of genes are used. No predictive analysis.

Think About Data Analysis Ahead of Time: Collect possible questions on the data. Which of them are easy? Biologists and bioinformaticians might have a different take on that. Compare: number of samples vs. degrees of freedom. It is possible to compensate for a lack of data with model assumptions: which assumptions make sense? More complex questions can be the easier ones if they allow for an appropriate model.