Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.

Slides:



Advertisements
Similar presentations
Complex Networks for Representation and Characterization of Images For CS790g Project Bingdong Li 9/23/2009.
Advertisements

Original Figures for "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring"
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Predicting essential genes via impact degree on metabolic networks ISSSB’11 Takeyuki Tamura Bioinformatics Center, Institute for Chemical Research Kyoto.
O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.
Fast Algorithms For Hierarchical Range Histogram Constructions
Analysis and Modeling of Social Networks Foudalis Ilias.
D ISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORK Ideker Bioinformatics 2002 Presented by: Omrit Zemach April Seminar.
VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.
Directional triadic closure and edge deletion mechanism induce asymmetry in directed edge properties.
Networks. Graphs (undirected, unweighted) has a set of vertices V has a set of undirected, unweighted edges E graph G = (V, E), where.
Mutual Information Mathematical Biology Seminar
Introduction to Bioinformatics Algorithms Clustering.
University of CreteCS4831 The use of Minimum Spanning Trees in microarray expression data Gkirtzou Ekaterini.
Author: Jason Weston et., al PANS Presented by Tie Wang Protein Ranking: From Local to global structure in protein similarity network.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Global topological properties of biological networks.
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
Data Mining Presentation Learning Patterns in the Dynamics of Biological Networks Chang hun You, Lawrence B. Holder, Diane J. Cook.
Introduction to Bioinformatics - Tutorial no. 12
Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:
ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Protein Classification A comparison of function inference techniques.
Introduction to Bioinformatics Algorithms Clustering and Microarray Analysis.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
341: Introduction to Bioinformatics Dr. Natasa Przulj Deaprtment of Computing Imperial College London
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
BIONFORMATIC ALGORITHMS Ryan Tinsley Brandon Lile May 9th, 2014.
Gene expression & Clustering (Chapter 10)
The importance of enzymes and their occurrences: from the perspective of a network W.C. Liu 1, W.H. Lin 1, S.T. Yang 1, F. Jordan 2 and A.J. Davis 3, M.J.
ANALYZING PROTEIN NETWORK ROBUSTNESS USING GRAPH SPECTRUM Jingchun Chen The Ohio State University, Columbus, Ohio Institute.
Part 1: Biological Networks 1.Protein-protein interaction networks 2.Regulatory networks 3.Expression networks 4.Metabolic networks 5.… more biological.
Qiong Cheng, Robert Harrison, Alexander Zelikovsky Computer Science in Georgia State University Oct IEEE 7 th International Conference on BioInformatics.
Evolutionary Algorithms for Finding Optimal Gene Sets in Micro array Prediction. J. M. Deutsch Presented by: Shruti Sharma.
Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
Course Work Project Project title “Data Analysis Methods for Microarray Based Gene Expression Analysis” Sushil Kumar Singh (batch ) IBAB, Bangalore.
Lecture 3 1.Different centrality measures of nodes 2.Hierarchical Clustering 3.Line graphs.
Gene expression & Clustering. Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species –Dynamic.
Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.
Introduction to biological molecular networks
Support Vector Machines and Gene Function Prediction Brown et al PNAS. CS 466 Saurabh Sinha.
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
Cluster validation Integration ICES Bioinformatics.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.
Gene Expression Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae Vu, T. T.,
1 Microarray Clustering. 2 Outline Microarrays Hierarchical Clustering K-Means Clustering Corrupted Cliques Problem CAST Clustering Algorithm.
Tommy Messelis * Stefaan Haspeslagh Burak Bilgin Patrick De Causmaecker Greet Vanden Berghe *
Response network emerging from simple perturbation Seung-Woo Son Complex System and Statistical Physics Lab., Dept. Physics, KAIST, Daejeon , Korea.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Clustering [Idea only, Chapter 10.1, 10.2, 10.4].
Comparative Network Analysis BMI/CS 776 Spring 2013 Colin Dewey
Clustering Gene Expression Data BMI/CS 776 Mark Craven April 2002.
Structures of Networks
Biological networks CS 5263 Bioinformatics.
Network biology : protein – protein interactions
Building and Analyzing Genome-Wide Gene Disruption Networks
Anastasia Baryshnikova  Cell Systems 
Jongik Kim1, Dong-Hoon Choi2, and Chen Li3
Bioinformatics, Vol.17 Suppl.1 (ISMB 2001) Weekly Lab. Seminar
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.
Clustering The process of grouping samples so that the samples are similar within each group.
Clustering.
Presentation transcript:

Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002 – an oxford journal Presented by: Ohad Shai – Big data seminar 2013 bioinformatics Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Agenda Introduction Constructing biological networks graph Constructing gene’s expression graph Combining both into a single graph Divide nodes to groups using hierarchical clustering Decide the number of clusters by statistical measures with biological meaning Validate process using the yeast diauxic shift Conclusions 2 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only What is gene expression Genes are sequences in the DNA Genes are the recipe for creating Proteins 3 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only What is gene expression Genes are sequences in the DNA Genes are the recipe for creating Proteins DNA chips / Microarrays tools used to measure gene expression in cellular processes 4 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only What is gene expression Genes are sequences in the DNA Genes are the recipe for creating Proteins DNA chips / Microarrays tools used to measure gene expression in cellular processes Cluster by: – Genes: find related genes, predict function of genes – Samples: find related samples (as in previous paper) 5 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only What is gene expression Genes are sequences in the DNA Genes are the recipe for creating Proteins DNA chips / Microarrays tools used to measure gene expression in cellular processes Cluster by: – Genes: find related genes, predict function of genes – Samples: find related samples (as in previous paper) Sequential evaluation is sub- optimal (also seen in previous paper) 6 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only What is biological network A graph that describe biological features / processes Bioinformatics has increasingly shifted its focus from individual genes and proteins to large-scale networks Network topology might be similar to other non-biological networks such as the Internet For example: Scale free network - a network whose degree distribution follows a power law – Small amount of nodes with many connections (hubs) 7 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Metabolic networks Metabolism is the set of life-sustaining chemical transformations within the cells of living organisms These reactions allow organisms to grow and reproduce, maintain their structures, and respond to their environments Catalyzed by enzymes – Enzyme is a protein that accelerate metabolic reactions (ie: change one chemical to another) Each reaction: – Educt – Product – Enzyme 8 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Metabolic networks In the paper, KEGG database’ metabolic networks were usedKEGG Not only humans metabolism 9 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Previous evaluation methods The goal is to find genes, enzymes and metabolites related to a biological process (pathway) Sequential evaluation is sub- optimal (also seen in previous paper) Pathway scoring is used to add data Previous techniques suffers from combinatorial explosion or static analysis of data 10 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Agenda Introduction Constructing biological networks graph Constructing gene’s expression graph Combining both into a single graph Divide nodes to groups using hierarchical clustering Decide the number of clusters by statistical measures with biological meaning Validate process using the yeast diauxic shift Conclusion 11 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Transform metabolic network to graph Assemble set of reactions into a Petri-net A bi-partite graph – Molecules (metabolites and enzymes) on one side – Chemical reactions on the other side Enzyme can have more than one vertex – Related according to proximity 12 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Transform metabolic network to graph Assemble set of reactions into a Petri-net A bi-partite graph – Molecules (metabolites and enzymes) on one side – Chemical reactions on the other side Enzyme can have more than one vertex – Related according to proximity The graph is undirected Each edge has weight w The graph is the weighted distance graph, only with vertices that have gene expression data – How to do that? 13 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only How to create Remove vertices without expression data O(V^2) – Modify edges weight to all pair of vertices that are connected by removed vertex, if the connection has lower weight Run Floyd-Warshall on remaining vertices O(W^3) 14 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Floyd-Warshall Uses dynamic programming Algorithm: 15 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Floyd Warshall Algorithm - Example Consider Vertex 3: Nothing changes. Consider Vertex 2: D(1,3) = D(1,2) + D(2,3) Consider Vertex 1: D(3,2) = D(3,1) + D(1,2) Original weights. 16 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Floyd-Warshall Uses dynamic programming Algorithm: 17 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only How to create Remove vertices without expression data O(V^2) – Modify edges weight to all pair of vertices that are connected by removed vertex, if the connection has lower weight Run Floyd-Warshall on remaining vertices O(W^3) Alternatively, can use Dijkstra on all vertices O(WVlogV+WE) – Has better worse-time but actual run was slower The number of interesting vertices is relatively small 18 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Weight of edges (distance function) 2 alternatives suggested – Fixed weight – Normalized by degree The weight is decided by the degree of the molecule vertex since the graph is “scale free” (ie: few hubs), that weighting prevents exploiting hubs 19 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Next step Calculated graph – Calculate weight – Remove unneeded vertices – Run Floyd-Warshall to calculate min distance for all pairs Now we need to calculate gene expression graph… 20 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Gene expression graph The distance (weight) is calculated by Pearson correlation coefficient between time courses of gene expression levels – This was explained in previous class 21 Co-clustering of biological networks and gene expression data Gik – The value for gene Gi in time k

Intel Confidential – Internal Only Gene expression graph The distance (weight) is calculated by Pearson correlation coefficient between time courses of gene expression levels – This was explained in previous class – The authors used log of the ratio (the values range is 0-2) Anti-correlated genes are most distant The graph notation is 22 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Co-Clustering Create 2 graphs – Gene expression distance – Metabolic distance Combine them to one graph – For that a mapping of genes enzymes is required 23 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Mapping genes enzymes mapping is required For yeast, mapping is taken from MIPS databaseMIPS The mapping is *not* one to one Each new object representation consists of: Protein, Enzyme classification and Reaction – For example: (HXK1, EC /R01786) There is a translation between each object to enzyme and gene. Now we have to calculate a distance function 24 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Combined distance function It should match defined properties – Perfect correlation should not result very small distance – Missing links should not lead to very high distances Sum of 2 logistic curves was selected Logistic curve: – The initial stage of growth is approximately exponential; then, as saturation begins, the growth slows, and at maturity, growth stops. 25 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Combined distance function Sum of 2 logistic curves was selected 26 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Co-Clustering Create 2 graphs – Gene expression distance – Metabolic distance Combine them to one graph – For that a mapping of genes enzymes is required Cluster the graph 27 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Clustering Hierarchical average linkage clustering (UPGMA) – The result is a binary tree – Each node is a join step – The distance between nodes is pair-wise average of weights It might be time consuming (not mentioned in the paper) 28 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Clustering Hierarchical average linkage clustering (UPGMA) – The result is a binary tree – Each node is a join step – The distance between nodes is pair-wise average of weights It might be time consuming (not mentioned in the paper) 29 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Clustering – determine # of clusters We want to select # of clusters, such that clusters will have biological meaning Used measures are separation and tightness Calculated by silhouette-coefficient 30 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only silhouette-coefficient i – node in cluster A Average distance from A Average distance from Other nearest cluster Silhouette coefficient Range [-1,1] 31 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only silhouette-coefficient i – node in cluster A Average distance from A Average distance from Other nearest cluster Silhouette coefficient Range [-1,1] 32 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only silhouette-coefficient Silhouette coefficient Range [-1,1] 33 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Co-Clustering Create 2 graphs – Gene expression distance – Metabolic distance Combine them to one graph – For that a mapping of genes enzymes is required Cluster the graph Decide number of clusters by silhouette-coefficient Analyze the results 34 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Results Used data of diauxic shift of yeast – Yeast transforms into a sugar-rich medium – 7 time points for gene expression – Expect that the co-clustering will detect relations in that metabolic process 35 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Results The glycolysis pathway is much closer (less distant) than average – hence it will cluster together 36 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Results silhouette-coefficient A – same EC number B – pathway like clusters 37 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Results 38 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Results 39 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only Conclusion – Co-clustering Create 2 graphs with distance function – Normalized # of connections in metabolic network (and Floyd- Warshall) – PCC in gene expression network Join them – Mapping to objects – Distance is sum of 2 logistic curve Cluster by hierarchical average linkage clustering Decide number of clusters using silhouette coefficient Can be elaborated to other networks Outperform other pathway scoring methods 40 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only 41 Co-clustering of biological networks and gene expression data

Intel Confidential – Internal Only 42 Co-clustering of biological networks and gene expression data