Exciting Bioinformatics Adventures Limsoon Wong Institute for Infocomm Research.

Plan
Treatment optimization of childhood ALL
Treatment prognosis of DLBC lymphoma
Prediction of translation initiation site
Prediction of vaccine target
Reliability Assessment of Y2H expts

Treatment Optimization of Childhood Leukemia Image credit: FEER

Childhood ALL
Major subtypes are: T-ALL, E2A-PBX, TEL-AML, MLL genome rearrangements, Hyperdiploid>50, BCR-ABL
Diff subtypes respond differently to same Tx
Over-intensive Tx
–Development of secondary cancers
–Reduction of IQ
Under-intensive Tx
–Relapse
The subtypes look similar
Conventional diagnosis
–Immunophenotyping
–Cytogenetics
–Molecular diagnostics
Unavailable in most ASEAN countries
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

Single-Test Platform of Microarray & Machine Learning
Image credit: Affymetrix
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

Multidimensional Scaling Plot
Subtype Diagnosis
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

Is there a new subtype?
Hierarchical clustering of gene expression profiles reveals a novel subtype of childhood ALL
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

Conclusions
Conventional Tx: intermediate intensity to everyone
⇒ 10% suffer relapse
⇒ 50% suffer side effects
⇒ costs US$150m/yr
Our optimized Tx:
–high intensity to 10%
–intermediate intensity to 40%
–low intensity to 50%
–costs US$100m/yr
High cure rate of 80%
Less relapse
Fewer side effects
Save US$51.6m/yr
Copyright © 2004, 2005 by Jinyan Li and Limsoon Wong

References E.-J. Yeoh et al., “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling”, Cancer Cell, 1: , 2002

Treatment Prognosis for DLBC Lymphoma Image credit: Rosenwald et al, 2002

Diffuse Large B-Cell Lymphoma
DLBC lymphoma is the most common type of lymphoma in adults
Can be cured by anthracycline-based chemotherapy in 35 to 40 percent of patients
⇒ DLBC lymphoma comprises several diseases that differ in responsiveness to chemotherapy
Intl Prognostic Index (IPI)
–age, “Eastern Cooperative Oncology Group” performance status, tumor stage, lactate dehydrogenase level, sites of extranodal disease, ...
Not very good for stratifying DLBC lymphoma patients for therapeutic trials
⇒ Use gene-expression profiles to predict outcome of chemotherapy?
Copyright © 2005 by Limsoon Wong. Adapted from Huiqing Liu

Knowledge Discovery from Gene Expression of “Extreme” Samples
[Flow diagram. Labels: “extreme” sample selection; 8 yrs; knowledge discovery from gene expression; 240 samples; 80 samples; 26 long-term survivors; 47 short-term survivors; 7399 genes; 84 genes]
T is long-term if S(T) < 0.3
T is short-term if S(T) > 0.7
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
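
Note: a minimal Python sketch of the thresholding rule on this slide, assuming S(T) is a risk score in [0, 1] that has already been computed for each sample T. The function name, the "intermediate" label for scores between the two thresholds, and the sample scores are illustrative assumptions and do not come from the original study.

```python
def risk_group(score, low=0.3, high=0.7):
    """Map a risk score S(T) to a prognosis label using the slide's thresholds."""
    if score < low:
        return "long-term"
    if score > high:
        return "short-term"
    # Assumption: the slide does not name the in-between group.
    return "intermediate"

# Made-up scores, for illustration only.
scores = {"T1": 0.12, "T2": 0.55, "T3": 0.91}
groups = {name: risk_group(s) for name, s in scores.items()}
print(groups)
```

Scores exactly equal to a threshold fall into the in-between group here, because the slide states strict inequalities.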

Kaplan-Meier Plot for 80 Test Cases
p-value of log-rank test: <
Risk score thresholds: 0.7, 0.3
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong
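
Note: a Kaplan-Meier curve such as the one plotted on this slide can be estimated with the standard product-limit formula. The sketch below is a plain-Python version under stated assumptions: the function name and the follow-up data are made up, and the log-rank test quoted on the slide is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times:  follow-up duration of each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, estimated survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Made-up follow-up data (years, event flag), for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0])
print(curve)
```

One such curve would be computed per risk group and the groups then compared, for example with a log-rank test from a statistics package.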

Improvement Over IPI
(A) IPI low, p-value =
(B) IPI intermediate, p-value =
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong

Merit of “Extreme” Samples
(A) W/o sample selection (p = 0.38)
(B) With sample selection (p = 0.009)
No clear difference in the overall survival of the 80 samples in the validation group of the DLBCL study if no training sample selection is conducted
Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong

References H. Liu et al, “Selection of patient samples and genes for outcome prediction”, Proc. CSB2004, pages

Protein Translation Initiation Site Recognition

A Sample cDNA
299 HSU CAT U27655 Homo sapiens
CGTGTGTGCAGCAGCCTGCAGCTGCCCCAAGCCATGGCTGAACACTGACTCCCAGCTGTG 80
CCCAGGGCTTCAAAGACTTCTCAGCTTCGAGCATGGCTTTTGGCTGTCAGGGCAGCTGTA 160
GGAGGCAGATGAGAAGAGGGAGATGGCCTTGGAGGAAGGGAAGGGGCCTGGTGCCGAGGA 240
CCTCTCCTGGCCAGGAGCTTCCTCCAGGACAAGACCTTCCACCCAACAAGGACTCCCCT
iEEEEEEEEEEEEEEEEEEEEEEEEEEE 160
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE 240
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
What makes the second ATG the TIS?
Copyright © 2005 by Limsoon Wong

Approach
Training data gathering
Signal generation
–k-grams, distance, domain know-how, ...
Signal selection
–Entropy, χ², CFS, t-test, domain know-how, ...
Signal integration
–SVM, ANN, PCL, CART, C4.5, kNN, ...
Copyright © 2005 by Limsoon Wong
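
Note: the "signal generation" step can be illustrated with a small Python sketch that finds every candidate ATG in a cDNA string and counts k-grams in a window on each side of it. This is a simplification under stated assumptions: it counts nucleotide k-grams, whereas the later slides refer to amino-acid k-grams; the window size, feature names, helper functions, and the toy sequence are all hypothetical.

```python
from collections import Counter

def candidate_atgs(seq):
    """0-based start positions of every ATG in a cDNA string."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]

def kgram_counts(text, k):
    """Count overlapping k-grams in text (empty if text is shorter than k)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def signals(seq, pos, window=30, k=3):
    """Feature dict for the candidate ATG at pos (feature names are hypothetical)."""
    upstream = seq[max(0, pos - window):pos]
    downstream = seq[pos + 3:pos + 3 + window]
    feats = {}
    for gram, n in kgram_counts(upstream, k).items():
        feats["up_" + gram] = n
    for gram, n in kgram_counts(downstream, k).items():
        feats["down_" + gram] = n
    # A simple positional signal: how many ATGs precede this one.
    feats["upstream_ATG_count"] = sum(1 for p in candidate_atgs(seq) if p < pos)
    return feats

# Toy sequence, made up for illustration.
toy = "CCATGGCATGA"
table = {pos: signals(toy, pos, window=4, k=2) for pos in candidate_atgs(toy)}
print(table)
```

Each candidate ATG becomes one row of features; the selection and integration steps then operate on that table.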

Amino-Acid Features Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong


Amino Acid K-grams Discovered (by entropy) Copyright © 2005 by Jinyan Li, Huiqing Liu, and Limsoon Wong

Validation Results (on Hatzigeorgiou’s)
Using top 100 features selected by entropy and trained on Pedersen & Nielsen’s dataset
Copyright © 2005 by Limsoon Wong. Adapted from Huiqing Liu
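
Note: "features selected by entropy" can be illustrated by ranking features on information gain, that is, the drop in class entropy after splitting samples on a feature. The sketch below assumes binary presence/absence features and is only one possible entropy-based criterion; it is not claimed to be the exact measure used in the study. All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(present, labels):
    """Entropy reduction from splitting samples on a boolean feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(present, labels) if x == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain

def top_features(table, labels, n_top):
    """table maps feature name -> list of booleans, one per sample."""
    ranked = sorted(table, key=lambda f: information_gain(table[f], labels), reverse=True)
    return ranked[:n_top]

# Toy data: 1 = true TIS, 0 = non-TIS (made up).
labels = [1, 1, 0, 0]
table = {
    "informative": [True, True, False, False],
    "useless": [True, False, True, False],
}
best = top_features(table, labels, n_top=1)
print(best)
```

With n_top set to 100 this would mirror the "top 100 features" setting quoted on the slide.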

Validation Results (on Chr X and Chr 21)
Using top 100 features selected by entropy and trained on Pedersen & Nielsen’s dataset
[Figure. Labels: ATGpr; Our method]
Copyright © 2005 by Limsoon Wong. Adapted from Huiqing Liu

References L. Wong et al., “Using feature generation and feature selection for accurate prediction of translation initiation sites”, GIW 13: , 2002

Vaccine Target Prediction
Image credit: Asif Khan

T-Cell Epitope Prediction
Why?
–Only 1%-5% of peptides from a protein bind to any one HLA molecule
–Traditional approaches are slow, & inapplicable to large-scale screening
⇒ Computer Modeling
–Enable systematic screening for HLA binders
–Minimize number of expts
–Reduce cost 10x
Challenges:
–There are ~2000 variants of HLA classified in ~20 supertypes
–Relatively small number of expt data on peptides that bind HLA molecules
–For majority of HLA molecules expt data do not exist
[Diagram. Labels: H1, H2, H3, H4; P1, P2, P3, P4; Promiscuous peptides; One supertype]
Copyright © 2005 by Limsoon Wong. Adapted from Asif Khan.
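
Note: systematic screening for promiscuous peptides can be sketched as sliding a fixed-length window over a protein and keeping the peptides that several HLA-specific predictors score above a cut-off. The sketch below makes loud assumptions: the per-allele scoring functions are toy placeholders rather than real binding models, and none of this reproduces MULTIPRED. The window length, threshold, and names are hypothetical.

```python
def peptides(protein, length=9):
    """All overlapping peptides of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]

def promiscuous(protein, predictors, threshold, min_alleles, length=9):
    """Peptides predicted to bind at least min_alleles HLA molecules.

    predictors: {allele name: function(peptide) -> score}; placeholders only.
    Returns (start position, peptide, list of predicted binding alleles).
    """
    hits = []
    for start, pep in enumerate(peptides(protein, length)):
        binders = [a for a, score in predictors.items() if score(pep) >= threshold]
        if len(binders) >= min_alleles:
            hits.append((start, pep, binders))
    return hits

# Toy predictors: count one residue type each (not real binding models).
toy_predictors = {
    "H1": lambda p: p.count("L"),
    "H2": lambda p: p.count("K"),
}
hits = promiscuous("ALLKKA", toy_predictors, threshold=1, min_alleles=2, length=3)
print(hits)
```

Runs of adjacent hits along the sequence would correspond to candidate hot-spot regions.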

Multipred Approach Copyright © 2005 by Asif Khan, Guanglan Zhang, Vladimir Brusic

Expt Validation
[Figure. Labels: FP; FN; DR supertype; Cut-off Threshold; HCV IB protein sequence]
Copyright © 2005 by Asif Khan, Guanglan Zhang, Vladimir Brusic

Accuracy of Multipred Copyright © 2005 by Asif Khan, Guanglan Zhang, Vladimir Brusic

Conclusions
Computer models are necessary to aid in identification of vaccine targets
Prediction models built are both sensitive and specific
MULTIPRED can identify promiscuous peptides and immunological hot-spots which are useful for vaccine design
Hot-spots are ideal for development of epitope-based vaccines

References K.N. Srinivasan, et al. “Predictions of Class I T-cell epitopes: Evidence of presence of immunological hot spots inside antigens”, Bioinformatics, 20:i297-i302, 2004.

Assessing Reliability of Protein-Protein Interaction Expts
[Figure. Labels: % of TP based on co-localization; % of TP based on shared cellular role (I = 1); % of TP based on shared cellular role (I = .95); TP = ~50%]
Image credit: Sprinzak et al, 2003

Some Protein Interaction Data Sets
Large disagreement betw methods
Can we find a way to rank candidate interacting pairs according to their reliability?
Copyright © 2005 by Limsoon Wong. Adapted from Sprinzak et al, 2003

Some “Reasonable” Speculations
A true interacting pair is often connected by at least one alternative path (reason: a biological function is performed by a highly interconnected network of interactions)
The shorter the alternative path, the more likely the interaction (reason: evolution of life is through “add-on” interactions of other or newer folds onto existing ones)
⇒ Existence of a strong short alternative path connecting an interacting pair indicates that the interaction is “reliable”
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004
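
Note: the idea of a short alternative path can be sketched with a breadth-first search that ignores the direct edge between the two proteins. This is only an illustration of the path-length part of the speculation: it does not model the "strength" of a path and is not the reliability index of Chen et al. The function name and the toy network are hypothetical.

```python
from collections import deque

def shortest_alternative_path(edges, a, b):
    """Edge count of the shortest a-b path that avoids the direct a-b edge.

    Returns None when no alternative path exists.
    """
    graph = {}
    for u, v in edges:
        if {u, v} == {a, b}:
            continue  # drop the reported interaction being assessed
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Toy interaction network (made up).
toy_edges = [("A", "B"), ("A", "C"), ("C", "B"), ("B", "D")]
print(shortest_alternative_path(toy_edges, "A", "B"))
print(shortest_alternative_path(toy_edges, "B", "D"))
```

Under the speculation on this slide, the A-B pair (supported through C) would be ranked above the B-D pair, which has no alternative path.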

Interaction Pathway Reliability Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004

Evaluation wrt Reproducible Interactions
The number of pairs not in the intersection of Ito & Uetz is not changed much wrt the ipr value of the pairs
The number of pairs in the intersection of Ito & Uetz increases wrt the ipr value of the pairs
“ipr” correlates well to “reproducible” interactions
“ipr” seems to work
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004

Evaluation wrt Common Cellular Role, etc
At the ipr threshold that eliminated 80% of pairs, ~85% of the remaining pairs have common cellular roles
“ipr” correlates well to common cellular roles, localization, & expression
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al, 2004

Evaluation wrt “Many-few” Interactions
Number of “Many-few” interactions increases when more “reliable” IPR threshold is used to filter interactions
Consistent with the Maslov-Sneppen prediction
[Figure: Part of the network of physical interactions reported by Ito et al., PNAS, 2001]
Copyright © 2005 by Limsoon Wong. Adapted from Chen et al., 2004

Evaluation wrt “Cross-Talkers”
A MIPS functional cat:
–| 02 | ENERGY
–| | glycolysis and gluconeogenesis
–| | glycolysis methylglyoxal bypass
–| | regulation of glycolysis & gluconeogenesis
First 2 digits is top cat
Other digits add more granularity to the cat
⇒ Compare non-co-localized high- & low-IPR pairs to find number that fall into same cat. If more high-IPR pairs are in same cat, then IPR works
For top cat
–148/257 high-IPR pairs are in same cat
–65/260 low-IPR pairs are in same cat
For fine-granularity cat
–135/257 high-IPR pairs are in same cat
–37/260 low-IPR pairs are in same cat
⇒ IPR works
⇒ IPR pairs that are not co-localized are real cross-talkers!
Copyright © 2005 by Limsoon Wong.
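
Note: the comparison on this slide is simple arithmetic, sketched below in Python. The pair counts are the ones quoted on the slide; the helper names and the dotted code format used in the example (such as "02.01") are assumptions, since the slide only states that the first 2 digits give the top cat.

```python
def top_cat(code):
    """Top-level category: the first 2 digits of a functional-category code."""
    return code[:2]

def same_top_cat(code_a, code_b):
    """True when two proteins share the same top-level category."""
    return top_cat(code_a) == top_cat(code_b)

# (pairs in same cat, total pairs), as quoted on the slide.
counts = {
    "top cat": {"high-IPR": (148, 257), "low-IPR": (65, 260)},
    "fine-granularity cat": {"high-IPR": (135, 257), "low-IPR": (37, 260)},
}

fractions = {
    level: {group: same / total for group, (same, total) in groups.items()}
    for level, groups in counts.items()
}

for level, groups in fractions.items():
    print(level, {group: round(f, 3) for group, f in groups.items()})
```

The fractions work out to roughly 58% versus 25% for the top cat and roughly 53% versus 14% for the fine-granularity cat, which is the gap behind the "IPR works" conclusion.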

Conclusions
There are latent local & global “motifs” that indicate the likelihood of protein interactions
These motifs can be exploited in computational elimination of false positives from high-throughput Y2H expts
Copyright © 2005 by Limsoon Wong.

References J. Chen et al, “Mining high-throughput experimental data for reliable protein interaction data using network”, 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2004), Florida, November 15-17, 2004

Acknowledgements
Childhood ALL:
–Jinyan Li, Huiqing Liu
–Allen Yeoh
DLBC Lymphoma:
–Jinyan Li, Huiqing Liu
Translation Initiation:
–Fanfan Zeng, Roland Yap
–Huiqing Liu
T-Cell Epitopes:
–Vladimir Brusic, Asif Khan, Guanglan Zhang
–Tom August, KN Srinivasan
Protein Interaction Reliability:
–Jin Chen, Mong Li Lee, Wynne Hsu
–See-Kiong Ng
–Prasanna Kolatkar, Jer-Ming Chia