
It is only the beginning: Putting microarrays into context Matthias E. Futschik Institute for Theoretical Biology Humboldt-University, Berlin, Germany.


1 It is only the beginning: Putting microarrays into context. Matthias E. Futschik, Institute for Theoretical Biology, Humboldt-University, Berlin, Germany. Hvar summer school, 2004

2 The Whole Picture? [Diagram placing microarrays among other sources of information: DNA and chromosomal location, protein structures, protein functions, metabolites, medical expert knowledge]

3 Networks of Genes. Gene expression is regulated by complex genetic networks with a variety of interactions on different levels (DNA, RNA, protein), on many different time scales (seconds to years) and at various locations (nucleus, cytoplasm, tissue). Models: Boolean networks, Bayesian networks, differential equations.
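Of the three model classes listed, Boolean networks are the simplest to illustrate. The sketch below assumes a hypothetical three-gene circuit (genes A, B and C with made-up activation and repression rules, not taken from the talk) and updates all genes synchronously, so each new state is computed entirely from the previous one.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical regulatory rules (illustrative only): C represses A,
# A activates B, and A and B together activate C.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}


def step(state: State) -> State:
    """Synchronous update: every gene reads the same current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def trajectory(state: State, n_steps: int) -> List[State]:
    """Return the initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state)
        states.append(state)
    return states


if __name__ == "__main__":
    start = {"A": True, "B": False, "C": False}
    for t, s in enumerate(trajectory(start, 5)):
        print(t, "".join("1" if s[g] else "0" for g in "ABC"))
```

Starting with only A active, this toy network passes through the states 100, 110, 111, 011, 000 and then returns to 100, i.e. it sits on a cyclic attractor of length five. Rules for a real network would have to be inferred from data.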

4 Ontologies: Categorising and labelling objects. Ontology: a restricted vocabulary with structuring rules describing the relationships between terms. Representation as a graph: terms as nodes, rules as edges; transitivity rule; parent and child nodes (Bard and Rhee, Nature Rev. Genet., 2004)
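The transitivity rule means that a term inherits every ancestor of its parents, so annotating a gene to a child term implicitly annotates it to all terms above it. A minimal sketch, assuming the ontology is stored as a mapping from each term to its parent terms; the terms and edges are a simplified, hypothetical fragment, not the actual Gene Ontology graph.

```python
from typing import Dict, Set

# Simplified, hypothetical ontology fragment: child term -> parent terms.
# A term may have several parents, so the graph is a DAG rather than a tree.
PARENTS: Dict[str, Set[str]] = {
    "protein kinase activity": {"kinase activity", "catalytic activity on proteins"},
    "kinase activity": {"catalytic activity"},
    "catalytic activity on proteins": {"catalytic activity"},
    "catalytic activity": {"molecular function"},
    "molecular function": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable by repeatedly following parent edges (transitivity)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, set()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, set()))
    return seen
```

Here "catalytic activity" is reached through both parents of "protein kinase activity" but is reported only once, and the root term has no ancestors.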

5 Gene Ontology. Consists of three independent ontologies: molecular function (e.g. enzyme activity), biological process (e.g. signal transduction) and cellular component. Gene sets and clusters can easily be analysed based on Gene Ontology terms.
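One simple way to analyse a gene cluster with GO terms is to compare how often each term annotates cluster genes versus all annotated genes. The sketch assumes annotations are given as a gene-to-terms mapping (gene names and terms in the test are hypothetical) and reports fold enrichment only; it does not assess statistical significance.

```python
from collections import Counter
from typing import Dict, Set


def term_enrichment(cluster: Set[str], annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fold enrichment of each GO term in a cluster relative to all annotated genes.

    fold = (fraction of cluster genes with the term) / (fraction of all genes with the term)
    Cluster genes without any annotation are ignored.
    """
    background = Counter(term for terms in annotations.values() for term in terms)
    annotated = [gene for gene in cluster if gene in annotations]
    in_cluster = Counter(term for gene in annotated for term in annotations[gene])
    n_all, n_cluster = len(annotations), len(annotated)
    return {
        term: (count / n_cluster) / (background[term] / n_all)
        for term, count in in_cluster.items()
    }
```

A fold enrichment well above 1 flags a term as over-represented in the cluster; a significance test is still needed before calling the term enriched.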

6 Mapping of gene expression to chromosomal location. Significance analysis of the chromosomal location of differential gene expression (SW620 vs SW480). The p-value for finding at least k of a total of s significantly differentially expressed genes within a cytoband window is
$$P = \sum_{i=k}^{\min(n,s)} \frac{\binom{s}{i}\binom{g-s}{n-i}}{\binom{g}{n}},$$
where g is the total number of genes with a cytoband location and n the total number of genes within the cytoband window.
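The cytoband p-value described above is the upper tail of a hypergeometric distribution: the n genes of a window are treated as drawn from the g located genes, s of which are significant, and the question is how likely k or more significant genes are. A minimal sketch using only the standard library (parameter values in the tests are made up):

```python
from math import comb


def cytoband_pvalue(k: int, s: int, n: int, g: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population g, s successes, n draws).

    k: significant genes observed in the cytoband window
    s: significant genes in total
    n: genes in the cytoband window
    g: genes with a known cytoband location
    """
    numerator = sum(comb(s, i) * comb(g - s, n - i) for i in range(k, min(n, s) + 1))
    return numerator / comb(g, n)
```

Using exact integer binomial coefficients avoids rounding in the sum. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, g, s, n)` should give the same value.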

7 Relating gene copy number and gene expression I. Pollack et al., PNAS, 2001. Study of chromosomal abnormalities in breast cancer using genomic DNA and cDNA arrays; hotspots of increased gene copy number.

8 Relating gene copy number and gene expression II. A correlation between gene copy number and transcript levels was detected.
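Correlating copy number with expression amounts to computing, for each gene, a correlation coefficient across tumour samples. A minimal sketch of the Pearson coefficient, assuming both measurements are given as equal-length lists (e.g. log ratios); the numbers in the test are invented. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between, e.g., copy-number and expression ratios of one gene."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```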

9 Correlation of mRNA and protein abundance. Ideker et al., Science, 2001. Study of the yeast galactose-utilisation (GAL) pathway using microarrays, quantitative proteomics and databases of protein interactions. Dissection of transcriptional and post-translational control; new genes and interactions in the GAL pathway were found. Average correlation of 0.63 between transcript and protein levels.

10 Genome, Transcriptome and Translatome. Greenbaum et al., Genome Research, 2001. Interrelating genome, transcriptome and translatome: the translatome and the transcriptome have a similar composition in terms of functional categories, whereas the composition of the genome differs.
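One way to read "composition based on functional categories" is as each category's share of the total: counted once per gene for the genome, weighted by mRNA abundance for the transcriptome, or by protein abundance for the translatome. The sketch below assumes that reading; category labels and abundances in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def composition(category: Dict[str, str], weight: Dict[str, float]) -> Dict[str, float]:
    """Fraction of the total weight falling into each functional category.

    Genes missing from `weight` contribute zero.
    """
    totals: Dict[str, float] = defaultdict(float)
    for gene, cat in category.items():
        totals[cat] += weight.get(gene, 0.0)
    grand_total = sum(totals.values())
    return {cat: w / grand_total for cat, w in totals.items()}
```

Calling it with a weight of 1.0 for every gene gives the genome composition; passing mRNA or protein abundances instead gives the transcriptome or translatome composition, so a highly expressed category can dominate the transcriptome while being a small share of the genome.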

11 Linking expression to drug effectiveness. Relevance networks: Butte et al., PNAS, 2000. Correlation between growth inhibition by drugs and gene expression across the NCI60 cell lines; gene expression measured on Affymetrix chips (7245 genes); 5000 anticancer agents; significance testing based on randomisation. Result: a significant link between LCP1 and NSC 624044.

12 Combining gene expression data with clinical parameters. Diffuse large B-cell lymphoma (DLBCL): the most common lymphoid malignancy in adults. Treatment by multi-agent chemotherapy; in case of a relapse, bone marrow transplantation. The clinical course of DLBCL is highly variable: only 40% of treatments are successful => accurate outcome prediction is crucial for stratifying patients for intensified therapy.

13 Case study: DLBCL. Current prognostic model: International Prognostic Index (IPI). Alternative: microarray-based prediction of treatment outcome. DLBCL study by Shipp et al. (Nature Medicine, 2002, 8(1):68-74): expression profiles of 58 patients using Hu6800 Affymetrix chips (ca. 6800 genes). Prediction accuracy of outcome using a leave-one-out procedure: kNN: 70.7%; WV (weighted voting): 75.9%; SVM: 77.6%.
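Leave-one-out validation holds out each patient in turn, trains on the remaining ones, and scores the prediction for the held-out patient. A minimal sketch with a k-nearest-neighbour classifier (Euclidean distance, majority vote); k = 3 and the toy data in the test are assumptions, not the settings used by Shipp et al.

```python
from collections import Counter
from math import dist
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority class among the k training samples closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(x: List[Sequence[float]], y: List[str], k: int = 3) -> float:
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)
```

Any gene selection step (for example by signal-to-noise) has to be repeated inside each leave-one-out fold; selecting genes on all samples first makes the accuracy estimate optimistic.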

14 Sammon's mapping of the top 22 genes ranked by signal-to-noise: large overlap between the classes with 'cured' and 'fatal' outcome. Low correlation of gene expression with the classes: only 3 genes with a correlation coefficient > 0.4, compared with 263 genes in the leukemia study by Golub et al. and 215 genes in the colon cancer study by Alon et al. Limitation of the microarray approach: only mRNA abundance is measured, whereas many different factors (patient- and tumour-related) determine the outcome of therapy: integration might be necessary! DLBCL outcome prediction is challenging!
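The signal-to-noise criterion used to rank genes is commonly defined, following Golub et al., as (μ_cured − μ_fatal) / (σ_cured + σ_fatal). A minimal sketch, assuming sample standard deviations (implementations differ on this point) and class labels 'cured' and 'fatal'; the expression values in the test are invented.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def signal_to_noise(a: Sequence[float], b: Sequence[float]) -> float:
    """(mean_a - mean_b) / (sd_a + sd_b); undefined if both groups have zero spread."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def top_genes(expression: Dict[str, Sequence[float]], labels: Sequence[str], n: int) -> List[str]:
    """The n genes with the largest |signal-to-noise| between 'cured' and 'fatal' samples."""
    def score(values: Sequence[float]) -> float:
        cured = [v for v, lab in zip(values, labels) if lab == "cured"]
        fatal = [v for v, lab in zip(values, labels) if lab == "fatal"]
        return abs(signal_to_noise(cured, fatal))

    return sorted(expression, key=lambda gene: score(expression[gene]), reverse=True)[:n]
```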

15 Prognostic models for DLBCL. Clinical predictor: the IPI, based on five risk factors (age, tumour stage, patient's performance status, number of extranodal sites, serum LDH concentration). Survival rates determined in a clinical study: low risk: 73%, low-intermediate: 51%, intermediate-high: 42%, high: 26%. Conversion of the IPI into a Bayesian classifier using the survival rates as conditional probabilities P: a sample belongs to class 'cured' if P('cured'|IPI) > P('fatal'|IPI) => overall accuracy of 73.2%.
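The Bayesian IPI classifier can be sketched directly from the quoted survival rates, assuming P('cured'|IPI) equals the risk group's survival rate and P('fatal'|IPI) = 1 − P('cured'|IPI). Under that assumption the decision rule reduces to predicting 'cured' whenever the survival rate exceeds 50%.

```python
# Survival rates per IPI risk group as quoted on the slide, used as P('cured' | IPI).
P_CURED = {
    "low": 0.73,
    "low-intermediate": 0.51,
    "intermediate-high": 0.42,
    "high": 0.26,
}


def ipi_bayes_classify(risk_group: str) -> str:
    """Return the class with the larger posterior probability for this IPI group."""
    p_cured = P_CURED[risk_group]
    p_fatal = 1.0 - p_cured  # assumption: 'cured' and 'fatal' are the only outcomes
    return "cured" if p_cured > p_fatal else "fatal"
```

The low and low-intermediate groups are therefore predicted 'cured' and the two higher-risk groups 'fatal'.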

16 Prognostic models for DLBCL. Microarray-based predictor: identification of clusters by unsupervised learning, followed by supervised classification with EFuNN (evolving fuzzy neural network), a five-layer neural network, based on 17 genes selected by the signal-to-noise criterion. Accuracy using leave-one-out: 78.5%.

17 Independence of predictors. Set theory: the two predictors are complementary for 19 of 56 samples (8 samples correctly classified only by the IPI-based predictor, 11 only by the microarray-based predictor). This sets an upper bound of 92.9% on the achievable accuracy (52 of 56 samples are correctly classified by at least one predictor). Mutual information: x, y ∈ {0, 1} denote the class predictions (cured or fatal) of the microarray-based and IPI-based predictors; P and Q are the probabilities of the microarray- and IPI-based predictions; R(x,y) is the joint probability of the predictions by the two predictors:
$$I = \sum_{x,y} R(x,y)\,\log_2\frac{R(x,y)}{P(x)\,Q(y)} \approx 0.05$$
=> The microarray-based and IPI-based predictors are (nearly) statistically independent!
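The mutual information I can be estimated from paired predictions on the same samples by replacing R, P and Q with observed frequencies. A minimal sketch, assuming predictions are coded 0/1; the test vectors are toy examples, not the study's predictions. Empty cells of the joint table are skipped, following the convention 0 · log 0 = 0.

```python
from collections import Counter
from math import log2
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """I = sum_{x,y} R(x,y) log2(R(x,y) / (P(x) Q(y))) in bits, from paired predictions."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty prediction vectors of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts for R(x, y)
    p = Counter(x)              # counts for P(x), microarray-based predictions
    q = Counter(y)              # counts for Q(y), IPI-based predictions
    return sum(
        (c / n) * log2((c / n) / ((p[a] / n) * (q[b] / n)))
        for (a, b), c in joint.items()
    )
```

For two binary predictors, I lies between 0 (independent) and at most 1 bit (one prediction fully determines the other), so a value of about 0.05 bits is close to independence.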

18 Hierarchical modular decision system. Three-layer hierarchical model: a predictor module layer consisting of independently trained predictors; a class unit layer integrating the predictions of the single predictors; a decision layer producing the final prediction. Model parameters: α, β₁, β₂. Training: error backpropagation, with parallel training of the neural networks. Integration of predictions in class units: weighted sum. Validation: leave-one-out.
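How class units might combine module outputs can be sketched as a weighted sum followed by an arg-max decision. This is a hypothetical simplification: the per-module weights stand in for the model's trained parameters, and the sketch makes no claim about how α, β₁ and β₂ enter the actual system.

```python
from typing import Dict, List


def integrate(module_outputs: List[Dict[str, float]], weights: List[float]) -> str:
    """Class units sum the weighted class scores of all predictor modules;
    the decision layer returns the class with the largest summed score."""
    if len(module_outputs) != len(weights):
        raise ValueError("need exactly one weight per predictor module")
    class_units: Dict[str, float] = {}
    for scores, w in zip(module_outputs, weights):
        for cls, score in scores.items():
            class_units[cls] = class_units.get(cls, 0.0) + w * score
    return max(class_units, key=lambda cls: class_units[cls])
```

With equal weights, a confident module can outvote an uncertain one: a borderline clinical score of 0.51 for 'cured' is overridden by a microarray score of 0.8 for 'fatal'. In this sketch the weights are fixed by hand.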

19 Improved prediction by integration. Significantly improved accuracy of the modular hierarchical system (parameter values: α = 0.4, β₁ = 0.8, β₂ = 0.75):
Model               Accuracy
IPI                 73.2%
EFuNN               78.5%
Hierarchical model  87.5%
Constructive and destructive interference: both the microarray-based and the clinical predictor are necessary for the improvement.

20 Identification of areas of expertise. Stratification of the data set by IPI category: => data stratification can be used to detect areas of expertise, e.g. IPI risk groups low, low-intermediate and intermediate-high for the microarray-based classifier; => identification by data stratification can also indicate the limits of models, e.g. IPI risk group high for the microarray-based classifier. M.E. Futschik et al., Prediction of clinical behaviour and treatment for cancers, OMJ Applied Bioinformatics, 2003.
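Stratifying by IPI category amounts to computing a classifier's accuracy separately within each risk group. A minimal sketch; the labels in the test are invented.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_stratum(strata: Sequence[str], y_true: Sequence[str],
                        y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy of the predictions within each stratum, e.g. each IPI risk group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for stratum, truth, pred in zip(strata, y_true, y_pred):
        total[stratum] += 1
        correct[stratum] += int(truth == pred)
    return {s: correct[s] / total[s] for s in total}
```

A stratum with high accuracy marks an area of expertise; one near or below chance marks a limit of the model. With only a few patients per risk group, these per-stratum estimates are noisy.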

21 The way out of the microarray cave. Thank you and goodbye!

