CSCE555 Bioinformatics Lecture 23 Integrative Genomics II Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:

University of South Carolina, Department of Computer Science and Engineering

Outline: Integrative Genomics II
Integrative Microarray Analysis
The Data Sources
Graph mining of co-expression relationships
◦ CODENSE
◦ Second-order relationship analysis

Information flow in cellular systems: DNA → RNA → Protein
◦ Gene expression datasets (the transcriptome): oligonucleotide arrays, cDNA arrays
◦ Protein abundance measurement (mass spectrometry)
◦ Protein interactions (yeast two-hybrid system, protein arrays)
◦ Protein complexes (mass spectrometry)
Genome sequencing milestones: 1995 bacteria (~1.6K genes); 1997 eukaryote (~6K genes); 1998 animal (~20K genes); 2001 human (30~100K genes). Objective: the $1000 human genome.

Rapid accumulation of microarray data in public repositories
◦ NCBI Gene Expression Omnibus: 137,231 experiments
◦ EBI ArrayExpress: 55,228 experiments
Public microarray data grows about threefold per year.

Multiple Microarray Technology Platforms

Graph-based Approach for the Integrative Microarray Analysis
[Figure: datasets (gene × experiment matrices) yield recurrent gene patterns over nodes such as a, b, d, f and e, g, h, i, which feed functional annotation (Gene Ontology) and transcriptional annotation (TF).]

Frequent Subgraph Mining: the problem is hard!
Problem formulation: given n graphs, identify subgraphs that occur in at least m graphs (m ≤ n).
Our graphs are massive (>10,000 nodes and >1 million edges).
The traditional pattern-growth approach (expanding a frequent subgraph of k edges to k+1 edges) would not work, since the time and memory requirements increase exponentially with the size of the patterns and the number of networks.

Novel algorithms to identify diverse frequent network patterns
CoDense (Hu et al. ISMB 2005)
◦ identify frequent coherent dense subgraphs across many massive graphs
Network Biclustering (Huang et al. ISMB 2007)
◦ identify frequent subgraphs across many massive graphs
Network Modules (NeMo) (Yan et al. ISMB 2007)
◦ identify frequent dense vertex sets across many massive graphs

CODENSE: identify frequent coherent dense subgraphs across massive graphs

Identify frequent co-expression clusters across multiple microarray data sets
[Figure: several expression matrices (genes g1, g2, ... by conditions c1, c2, ..., cm), each converted into a co-expression graph over genes a to k.]
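The conversion from an expression matrix to a co-expression graph can be sketched as follows. The slide does not state the similarity measure or cutoff, so the Pearson correlation, the 0.9 cutoff, the function names, and the toy expression values are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coexpression_graph(expr, cutoff=0.9):
    """Return the set of gene pairs whose expression profiles correlate at or above cutoff.

    expr maps a gene name to its expression values across conditions.
    The cutoff is an assumed parameter, not a value from the slides.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if pearson(expr[g1], expr[g2]) >= cutoff:
            edges.add((g1, g2))
    return edges

# Toy data (made up): g2 is a scaled copy of g1, g3 is unrelated.
expr = {
    "g1": [0.1, 0.2, 0.4, 0.8],
    "g2": [0.2, 0.4, 0.8, 1.6],
    "g3": [0.9, 0.1, 0.7, 0.2],
}
graph = coexpression_graph(expr)
```

Running this once per microarray data set gives one graph per data set, which is the input assumed by the graph-mining steps that follow.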

The solution: CODENSE
CODENSE is a novel algorithm for mining frequent coherent dense subgraphs. The target subgraphs have three characteristics:
(1) Frequency: all edges occur in >= k graphs.
(2) Coherency: all edges exhibit correlated occurrences across the given graph set.
(3) Density: the subgraph is dense, i.e. its density d = 2m/(n(n-1)) is higher than a given threshold, where m is the number of edges and n is the number of nodes.
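As a small worked example of the density criterion, the sketch below evaluates d = 2m/(n(n-1)) for a subgraph. The function name and the example subgraph are illustrative assumptions; the threshold itself is not given in the slides, so none is applied here.

```python
def density(nodes, edges):
    """Density d = 2m / (n(n-1)) for an undirected simple graph.

    nodes: collection of node labels (n = number of nodes)
    edges: collection of node pairs (m = number of edges)
    """
    n = len(nodes)
    m = len(edges)
    if n < 2:
        return 0.0
    return 2 * m / (n * (n - 1))

# Hypothetical 4-node subgraph with 5 of the 6 possible edges.
nodes = {"c", "e", "f", "h"}
edges = {("c", "e"), ("c", "f"), ("c", "h"), ("e", "f"), ("e", "h")}
d = density(nodes, edges)  # 2*5 / (4*3) = 0.8333...
```

A complete graph has d = 1, so any threshold below 1 admits subgraphs that are nearly, but not fully, connected.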

CODENSE: Mine coherent dense subgraph
(1) Build a summary graph Ĝ by eliminating infrequent edges.
[Figure: input graphs G1 to G6 over nodes a to i, and the resulting summary graph Ĝ.]
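A minimal sketch of step (1), assuming each input graph is given as a set of undirected edges and that an edge is "frequent" when it occurs in at least k graphs (the frequency criterion stated earlier). The function name and the toy graphs are made up for illustration.

```python
from collections import Counter

def summary_graph(graphs, k):
    """Keep only the edges that occur in at least k of the input graphs.

    graphs: list of edge sets; each edge is a pair of node labels.
    Edges are treated as undirected, so (a, b) and (b, a) count as the same edge.
    """
    support = Counter()
    for edges in graphs:
        # Normalise orientation and count each edge at most once per graph.
        support.update({tuple(sorted(e)) for e in edges})
    return {e for e, count in support.items() if count >= k}

# Toy input (made up): edge d-e occurs in only one graph and is dropped for k = 2.
graphs = [
    {("a", "b"), ("b", "c"), ("d", "e")},
    {("b", "a"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("c", "d")},
]
summary = summary_graph(graphs, k=2)
```

The summary graph is a single graph regardless of how many inputs there are, which is what keeps the later dense-subgraph search tractable.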

CODENSE: Mine coherent dense subgraph
(2) Identify dense subgraphs Sub(Ĝ) of the summary graph Ĝ (Step 2: MODES).
Observation: if a frequent subgraph is dense, it must be a dense subgraph of the summary graph. However, the converse is not true.
[Figure: summary graph Ĝ over nodes a to i, and a dense subgraph Sub(Ĝ) on nodes c, e, f, g, h, i.]

CODENSE: Mine coherent dense subgraph
(3) Construct the edge occurrence profiles for each dense summary subgraph (Step 3).
[Figure: the dense subgraph Sub(Ĝ) and its edge occurrence profile table E, with one row per edge (c-e, c-f, c-h, c-i, e-f, ...) and one column per graph G1 to G6.]
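An edge occurrence profile can be read as a 0/1 vector with one entry per input graph, recording whether the edge is present in that graph. The sketch below builds such profiles under that assumption; the function name and toy graphs are illustrative only.

```python
def edge_profiles(graphs, edges):
    """Map each edge to a 0/1 list: entry i is 1 if the edge occurs in graphs[i].

    graphs: list of edge sets (undirected; orientation is normalised).
    edges:  the edges of one dense summary subgraph.
    """
    normalised = [{tuple(sorted(e)) for e in g} for g in graphs]
    profiles = {}
    for e in edges:
        key = tuple(sorted(e))
        profiles[key] = [1 if key in g else 0 for g in normalised]
    return profiles

# Toy input (made up): three graphs, profiles for two edges.
graphs = [
    {("c", "e"), ("c", "f")},
    {("e", "c")},
    {("c", "f"), ("c", "e")},
]
profiles = edge_profiles(graphs, [("c", "e"), ("c", "f")])
```

Each row of the profile table on the slide corresponds to one entry of the returned dictionary.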

CODENSE: Mine coherent dense subgraph
(4) Build a second-order graph S for each dense summary subgraph (Step 4).
[Figure: the edge occurrence profiles E and the resulting second-order graph S, whose vertices are the edges c-e, c-f, c-h, c-i, e-f, e-g, e-h, e-i, f-h, f-i, g-h, g-i, h-i.]

CODENSE: Mine coherent dense subgraph
(5) Identify dense subgraphs Sub(S) of the second-order graph S.
Observation: if a subgraph is coherent (its edges show high correlation in their occurrences across a graph set), then its second-order graph must be dense.
[Figure: second-order graph S and its dense subgraph Sub(S) on vertices c-e, c-f, c-h, e-f, e-g, e-h, e-i, f-h, g-h, g-i, h-i.]
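The second-order graph can be sketched as a graph whose vertices are edges of the original graphs, with two vertices linked when their occurrence profiles are similar. The slides do not specify the similarity measure or cutoff; Pearson correlation with a 0.8 cutoff is an assumption here, and the profile values are made up.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def second_order_graph(profiles, cutoff=0.8):
    """Link two first-order edges when their occurrence profiles correlate at or above cutoff.

    profiles: dict mapping an edge (pair of node labels) to its 0/1 occurrence list.
    Returns a set of pairs of first-order edges. The cutoff is an assumed parameter.
    """
    links = set()
    for e1, e2 in combinations(sorted(profiles), 2):
        if pearson(profiles[e1], profiles[e2]) >= cutoff:
            links.add((e1, e2))
    return links

# Toy profiles over six graphs (made up): c-e and c-f always co-occur, g-h does not follow them.
profiles = {
    ("c", "e"): [1, 1, 0, 1, 0, 1],
    ("c", "f"): [1, 1, 0, 1, 0, 1],
    ("g", "h"): [0, 0, 1, 0, 1, 1],
}
S = second_order_graph(profiles)
```

A set of first-order edges whose profiles all correlate with one another forms a dense region of S, which is the property the observation on this slide relies on.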

CODENSE: Mine coherent dense subgraph
(6) Identify the coherent dense subgraphs Sub(G) (Step 5).
[Figure: the dense second-order subgraph Sub(S) is mapped back to the coherent dense subgraphs Sub(G) on nodes {c, e, f, h} and {e, g, h, i}.]

CODENSE Solution
The identified subgraphs by definition satisfy the three criteria:
(1) Frequency: all edges occur in >= k graphs.
(2) Coherency: all edges exhibit correlated occurrences across the given graph set.
(3) Density: the subgraph is dense, i.e. its density d = 2m/(n(n-1)) is higher than a given threshold, where m is the number of edges and n is the number of nodes.

CODENSE
The design of CODENSE addresses the scalability issue. Instead of mining each biological network individually, CODENSE compresses the networks into two meta-graphs (the summary graph and the second-order graph) and performs clustering on these two graphs only. Thus, CODENSE can handle an arbitrarily large number of networks.

Applying CoDense to 39 yeast microarray data sets
[Figure: expression matrices (genes g1, g2, ... by conditions c1, c2, ..., cm), each converted into a co-expression graph over genes a to k.]

Functional annotation

Functional Annotation (Validation)
Method: leave-one-out approach, i.e. mask a known gene as unknown and assign its function based on the other genes in the subgraph pattern.
Functional categories: 166 functional categories with a GO level of at least 6.
Results: 448 predictions with an accuracy of 50%.
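The leave-one-out validation can be sketched as below. The slide only says the masked gene's function is assigned "based on the other genes in the subgraph pattern"; the majority vote used here, the one-function-per-gene simplification, the function names, and the gene labels are all assumptions for illustration.

```python
from collections import Counter

def predict_function(gene, cluster, annotation):
    """Predict a masked gene's function by majority vote over the other annotated genes.

    Majority voting is an assumed rule, not necessarily the one used in the slides.
    Returns None when no other gene in the cluster is annotated.
    """
    votes = Counter(annotation[g] for g in cluster if g != gene and g in annotation)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(clusters, annotation):
    """Mask each annotated gene in turn and report the fraction predicted correctly."""
    correct = 0
    total = 0
    for cluster in clusters:
        for gene in cluster:
            if gene not in annotation:
                continue  # only known genes can be validated
            predicted = predict_function(gene, cluster, annotation)
            if predicted is None:
                continue
            total += 1
            correct += int(predicted == annotation[gene])
    return correct / total if total else 0.0

# Toy cluster (hypothetical gene labels and annotations).
clusters = [["geneA", "geneB", "geneC", "geneD"]]
annotation = {
    "geneA": "ribosome biogenesis",
    "geneB": "ribosome biogenesis",
    "geneC": "ribosome biogenesis",
    "geneD": "ATP biosynthesis",
}
accuracy = leave_one_out_accuracy(clusters, annotation)  # 3 of 4 masked genes recovered
```

In practice a gene can carry several GO annotations, so a real evaluation would need to decide what counts as a correct prediction in that case.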

Functional Annotation (Prediction) We made functional predictions for 169 genes, covering a wide range of functional categories, e.g. amino acid biosynthesis, ATP biosynthesis, ribosome biogenesis, vitamin biosynthesis, etc. A significant number of our predictions can be supported by literature.

Functional annotation
We made functional predictions for 779 known and 116 unknown genes by random forest classification with 71% accuracy.
Variables for random forest classification:
◦ functional enrichment P-value
◦ network topology score
◦ network connectivity
◦ pattern recurrence numbers
◦ average node degree
◦ unknown gene ratio
◦ network size

Reconstruct transcriptional cascades by second-order analysis Zhou et al. Nature Biotech 2005

Frequently occurring tight clusters

Transcription Factors

Co-occurrence of tight clusters
[Figures: coexpression networks constructed with datasets 1, 2, 3, 4 and 5, one network per slide.]

Coexpression Networks
[Figure labels: Transcription Factor Set 1; Transcription Factor Set 2; Cooperativity.]

Relevance Networks

Three types of transcription cascades (Type I, Type II, Type III)
[Figure: cascade diagrams linking transcription factors (TF1, TF2, TF3 in Type I; TF1, TF2 in Types II and III) to genes 1 to 5. Edge types: transcription regulation, protein interaction.]

Applying to 39 yeast microarray data sets
We identified 60 transcription modules. Among them, we found 34 pairs that showed high 2nd-order correlation. A significant portion (29%, p-value < 10^-5 by Monte Carlo simulation) of those module pairs are participants in transcription cascades: 2 pairs in Type I, 8 pairs in Type II, and 3 pairs in Type III cascades. In fact, these transcription cascades interconnect into a partial cellular regulatory network.

[Figure: partial regulatory network of yeast transcription factors and transcription modules; individual gene labels omitted.]
Edge legend:
◦ Regulation of transcription modules by transcription factors, based on ChIP-chip data and supported by recurrent expression clusters
◦ Regulation between transcription factors, based on ChIP-chip data and supported by 2nd-order expression correlation
◦ Protein interactions between two transcription factors, based on experimental data and supported by 2nd-order expression correlation
◦ Two transcription modules with high 2nd-order expression correlations

Integrative Array Analyzer (iArray): a software package for cross-platform and cross-species microarray analysis. Bioinformatics 22(13):1665-7

Acknowledgement: These slides are adapted from the following presentation: Xianghong Jasmine Zhou, Molecular and Computational Biology, University of Southern California. Functions, Networks, and Phenotypes by Integrative Genomics Analysis. IMS, NUS, Singapore, July 18, 2007