Gene Expression Based Tumor Classification Using Biologically Informed Models ISI 2003 Berlin Claudio Lottaz und Rainer Spang Computational Diagnostics.

Slides:



Advertisements
Similar presentations
1 Harvard Medical School Mapping Transcription Mechanisms from Multimodal Genomic Data Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni Children.
Advertisements

Instance-based Classification Examine the training samples each time a new query instance is given. The relationship between the new query instance and.
Achim Tresch Computational Biology ‘Omics’ - Analysis of high dimensional Data.
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
Linking Genetic Profiles to Biological Outcome Paul Fogel Consultant, Paris S. Stanley Young National Institute of Statistical Sciences NISS, NMF Workshop.
Computational Diagnostics We are a new research group in the department of Computational Molecular Biology at the Max Planck Institute for Molecular Genetics.
Seeing the forest for the trees : using the Gene Ontology to restructure hierarchical clustering Dikla Dotan-Cohen, Simon Kasif and Avraham A. Melkman.
Lecture 14 – Neural Networks
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
. Differentially Expressed Genes, Class Discovery & Classification.
Goal: Reconstruct Cellular Networks Biocarta. Conditions Genes.
Microarray-based Disease Prognosis using Gene Annotation Signatures Michael Kovshilovsky Swapna Annavarapu SoCalBSI 2005.
Cristina Manfredotti D.I.S.Co. Università di Milano - Bicocca An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data Cristina Manfredotti.
Guidelines on Statistical Analysis and Reporting of DNA Microarray Studies of Clinical Outcome Richard Simon, D.Sc. Chief, Biometric Research Branch National.
Copyright  2003 limsoon wong Diagnosis of Childhood Acute Lymphoblastic Leukemia and Optimization of Risk-Benefit Ratio of Therapy Limsoon Wong Institute.
SIMPLISTIC CLASSIFICATION OF LYMPHOID LEUKEMIAS
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
1 Harvard Medical School Transcriptional Diagnosis by Bayesian Network Hsun-Hsien Chang and Marco F. Ramoni Children’s Hospital Informatics Program Harvard-MIT.
Model Assessment and Selection Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis.
Gene expression profiling identifies molecular subtypes of gliomas
Automatic methods for functional annotation of sequences Petri Törönen.
Whole Genome Expression Analysis
Structured Analysis of Microarrays & Differential Coexpression Claudio Lottaz, Dennis Kostka & Rainer Spang Courses in Practical DNA Microarray Analysis.
Classification (Supervised Clustering) Naomi Altman Nov '06.
Knowledge Discovery in Biomedicine Limsoon Wong Institute for Infocomm Research.
Jo Anne Zujewski, MD Cancer Therapy Evaluation Program Division of Cancer Diagnosis and Treatment National Cancer Institute May 2011 Breast Cancer Biology.
Frédéric Schütz Statistics and bioinformatics applied to –omics technologies Part II: Integrating biological knowledge Center.
Molecular Diagnosis Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis.
Exagen Diagnostics, Inc., all rights reserved Biomarker Discovery in Genomic Data with Partial Clinical Annotation Cole Harris, Noushin Ghaffari.
University of Washington Institute of Technology Tacoma, WA, USA Ecole des Hautes Etudes en Santé Publique Département Infobiostat Rennes, France Isabelle.
Diagnosis of multiple cancer types by shrunken centroids of gene expression Course: Topics in Bioinformatics Presenter: Ting Yang Teacher: Professor.
Diagnosis using computers. One disease Three therapies.
Sample classification using Microarray Data. AB We have two sample entities malignant vs. benign tumor patient responding to drug vs. patient resistant.
The Broad Institute of MIT and Harvard Classification / Prediction.
Gene Ontology as a tool for the systematic analysis of large-scale gene-expression data Stefan Bentink Joint groupmeeting Klipp/Spang
Classification of microarray samples Tim Beißbarth Mini-Group Meeting
Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks From Nature Medicine 7(6) 2001 By Javed.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Evolutionary Algorithms for Finding Optimal Gene Sets in Micro array Prediction. J. M. Deutsch Presented by: Shruti Sharma.
Computational Diagnostics A new research group at the Max Planck Institute for molecular Genetics, Berlin.
Bioinformatics MEDC601 Lecture by Brad Windle Ph# Office: Massey Cancer Center, Goodwin Labs Room 319 Web site for lecture:
Statistical Testing with Genes Saurabh Sinha CS 466.
Gene set analyses of genomic datasets Andreas Schlicker Jelle ten Hoeve Lodewyk Wessels.
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
Guest lecture: Feature Selection Alan Qi Dec 2, 2004.
Flat clustering approaches
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation Bioinformatics, July 2003 P.W.Load,
Case Study: Characterizing Diseased States from Expression/Regulation Data Tuck et al., BMC Bioinformatics, 2006.
Tutorial 8 Gene expression analysis 1. How to interpret an expression matrix Expression data DBs - GEO Clustering –Hierarchical clustering –K-means clustering.
Computational Biology Group. Class prediction of tumor samples Supervised Clustering Detection of Subgroups in a Class.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Dimension reduction (2) EDR space Sliced inverse regression Multi-dimensional LDA Partial Least Squares Network Component analysis.
Classifiers!!! BCH364C/391L Systems Biology / Bioinformatics – Spring 2015 Edward Marcotte, Univ of Texas at Austin.
Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks From Nature Medicine 7(6) 2001 By Javed.
Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.
Evolution-informed Modeling discover biomarkers for precision oncology Li Liu, M.D. August 22, 2016.
Semi-Supervised Clustering
Classification with Gene Expression Data
Statistical Testing with Genes
Alan Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani
Molecular Classification of Cancer
Claudio Lottaz and Rainer Spang
Computational Diagnostics
Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin
Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin
Multivariate Methods Berlin Chen
Label propagation algorithm
Claudio Lottaz and Rainer Spang
Presentation transcript:

Gene Expression Based Tumor Classification Using Biologically Informed Models ISI 2003 Berlin Claudio Lottaz und Rainer Spang Computational Diagnostics Group Max Planck Institute for Molecular Genetics, Berlin

Tumor Diagnosis/Prognosis with expression profiles Data: Tens of thousands of genes Tens to hundreds of patients Patients are labeled E.g. Disease (D) / Control (C) Problem: Predict the label of a new patient given his/her expression profile Patients Genes D C

Statistical Context: Learning Theory High dimensional models  overfitting problems Additional constraints  Regularized Multivariate Models What are these constraints? -A small maximal number of genes allowed in the model (Variable Selection, Sparse Models) - Likelihood penalties, - Informative priors, - Large Margins,

Frustrations Falsely predicted patients Questionable Labels Genes that make no sense in the context Secondary and tertiary effects are more prominent then causal molecular mechanisms

The patient groups are seen as molecular homogenous groups Genes are anonymous variables: x 1,…,x n Implicit assumptions of standard approaches

Our approach: 1. Sub-class finding instead of global class prediction 2. Use of functional annotations of genes Molecular Symptoms

Global class prediction vs. Subclass finding D C D‘ D\D‘ C Global class prediction: Find a molecular signature that separates D from C and generalizes to new patients Subclass finding: Find a Subclass D‘  D and a molecular signature that separates D‘ from C We call this signature a molecular symptom associated to D

Molecular symptoms High specificity and sub-optimal sensitivity partially supervised Molecular properties that are (almost) unique to the disease group, but do not need to be present in all patients having the disease Novel stratification of patients Hidden molecular sub-entities There are in general many molecular symptoms associated to a disease One patient can have several molecular symptoms

Exploiting functional annotations of genes A posteriori use of functional annotations A priori use of functional annotations ( suggested here ) Data Functional Annotations Statistical Analysis Data Functional Annotations Statistical Analysis

What are we looking for? C D‘ 1 D‘ 2 D‘ 3 D‘ 4 DNA Repair Apoptosis Cell Proliferation HOX Genes

Functional grouping of genes using Gene Ontologies (GO) n GO is a database of terms for genes n Known genes are annotated to the terms n Terms are connected as a directed acyclic graph n Levels represent specifity of the terms

The whole thing at a glance GO contains three different sub- ontologies: –Molecular function –Biological process –Cellular component

The Annotation Gene 1023 Gene Gene Gene 13 Gene Gene Gene 311 Gene 314 Gene Gene 6702 Gene Gene Genes are annotated to both leave and inner nodes Genes can have multiple annotations

Gene 1023 Gene Gene Gene 1023 Gene Gene Augmented Ontology

Structured Analysis of Microarrays (StAM) - Claudio Lottaz - Modular Grid of three components 1. Classification in Leave Nodes 2.Diagnosis Propagation 3.Regularization: Gene 1023 Gene Gene Gene 311 Gene 314 Gene 22666

1. Classification in the leave nodes by Shrunken centroid classification: PAM (Tibshirani et al 2002) DLDA-like Discrimination Discriminant function via the distance to the shrunken class centroids d(C), d(D) Regularization: Variable selection via centroid shrinkage Class Probabilities:

2. Propagation of Diagnosis by weighted averages w1w1 w2w2 w3w3 C1 C3C2 Pa Weights are proportional to CV-Performance measured by a weighted deviance The weight  is used to enforce high specificity and relaxed sensitivity. Typically:  =0.95

3. Regularization by graph shrinkage To get rid of uninformative branches of the Gene Ontology, we shrink the weights in the progression step by a constant   is chosen by crossvalidation

Redundancy of a shrunken graph For two nodes we define the distance:... which reflects the probability of an inconsistent diagnosis For a single node we define its redundancy after shrinkage:... where K  is the set of all remaining nodes in the graph after shrinkage For a shrunken GO graph, we define its redundancy:

Expression data from a leukemia study Study on acute lymphoblastic leukaemia (ALL) 327 patients genes (Affymetrix HG- U95Av2) Yeoh et al., Cancer Cell 2002 My focus in this talk: MLL – ( ) vs. Others Objective: Diagnosis of cytogenetic subtypes of ALLs: 20 MLL - ( ) 27 E2A-PBX1 15 BCR-ABL 79 TEL-AML1 87 Hyperdiploid 7 Hypodiploid 29 Pseudodiploid 18 nomal (B-cell ALL) 43 T-ALL

Training and Test Disease group: ( MLL positive ALL ) Control group : ( other types of ALL ) Trainings – Test data (2/3 – 1/3 of both Disease and Control cases, randomly split) All model selection steps are part of the training !!! They are performed using CV of only the trainings data. - Centroid shrinkage in the leave nodes - Deviances for the propagation weights - Graph shrinkage

Error Rate Redundancy

The GO graph before and after shrinkage AfterBefore The results on the next slides show the performance of the fully shrinked model on the test data

Rows: GO- Nodes Columns: 7 MLL-Patients (Testset) Color: MLL- Score Selectivity/ Specificity

Molecular Symptom III Molecular Symptom I Molecular Symptom II

Spermatogenesis ??? Cyclin A1 Even Skipped Homolog Transcription Factor like 5 Cell CycleOncogenesisCell Proliferation Molecular Symptom I

Apoptosis Lectin Protein Tyrosin Phosphotase Molecular Symptom I

Phosphorylation Molecular Symptom II

Cell-Cell Signalling Molecular Symptom III

Almost Global Nodes Signal Transduction The Root

Claudio Lottaz R-code StAM Structured Analysis of Microarrays Julie Floch Java Application for browsing StAM output

Summary Data Functional Annotations Statistical Analysis D‘ D\D‘ C a prori use of functional annotations gene ontologies StAM 1. Classification in Leave Nodes 2.Diagnosis Propagation 3.Regularization: subclass finding Molecular Symptoms