Lecture 8. Topics in Biological Networks (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology.

Lecture outline
1. Definition and different types of biological networks
2. Some high-throughput experimental methods for probing biological networks
  – Important databases
3. Some computational methods for reconstructing biological networks
4. Data analysis
  – Analyzing the networks
  – Using the networks to analyze other data
  – Visualization and analysis tools
Last update: 22-Oct-2015 | CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Fall 2015

DEFINITION AND TYPES Part 1

Biological networks
A biological network is represented by a graph $G = (V, E)$
  – $V$: a set of nodes (vertices). Each node $v_i \in V$ represents an object
      – A gene, protein, metabolite, drug, ...
  – $E$: a set of edges. Each edge $e_{ij} \in E$ connects two nodes $v_i$ and $v_j$, and represents a relationship between the two objects
      – Protein-protein interaction (PPI), gene regulation, ...
      – Undirected ($e_{ij} \in E \Leftrightarrow e_{ji} \in E$, e.g., PPI) or directed ($e_{ij} \in E$ does not imply $e_{ji} \in E$, e.g., gene regulation)
  – May have additional node and edge attributes, such as confidence of interaction
(Figure: an example graph with nodes $v_1$, $v_2$, $v_3$, $v_4$)

Network types
Gene regulatory networks [project]
  – Transcription factor binding
      – Promoters
      – Distal regulatory elements
  – Micro-RNA [project]
Co-expression networks
Protein-protein interaction networks (lecture)
Genetic interaction networks [project]
Metabolic networks [project]
Gene-drug interaction networks [project]
Signaling networks
Neural networks
Disease transmission networks
Phylogenetic networks
Food web
...
(Figure labels grouping the network types by level: Molecular: DNA; Molecular: RNA; Molecular: proteins; Cellular: pathways; Multi-cellular; Inter-organism; Inter-species)

TF regulatory networks
Each node represents a gene and the protein(s) that it encodes
An edge $e_{ij}$ exists if $v_i$ represents a transcription factor (TF) that regulates the gene represented by $v_j$
  – Edges are directed
  – Edges should be signed (activation vs. repression), although this information is usually unavailable
  – May have edge weights to indicate confidence
  – Should record only direct regulation
  – The network itself does not provide information about the relationships between different edges
Other types of gene regulatory networks (e.g., miRNA) are defined in similar ways
Image credit: Deneris and Wyler, Nature Neuroscience, published online 26 February 2012

Co-expression networks
Each node represents a gene
An edge $e_{ij}$ exists if the genes represented by $v_i$ and $v_j$ co-express
  – Co-expression could be measured by correlation across multiple samples/conditions
      – May have edge weights to represent the degree of co-expression
  – Edges are usually undirected
      – Unless measures like expression ranks are used
  – Usually more meaningful to measure protein abundance, but easier to measure RNA level
  – Co-expression may suggest functional relationships
Image credit: Prieto et al., PLoS One 3(12):e3911, (2008). Node color indicates some network statistics to be explained later.
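
The correlation-based definition above can be sketched in a few lines. This is only an illustration under stated assumptions: Pearson correlation is used as the co-expression measure, the cutoff of 0.8 is arbitrary, and the function name `coexpression_edges` and the toy expression values are made up for the example.

```python
import numpy as np

def coexpression_edges(expr, genes, threshold=0.8):
    """Return weighted, undirected edges between genes whose expression
    profiles have Pearson correlation >= threshold (an assumed cutoff).

    expr: array of shape (n_genes, n_samples); genes: list of gene names.
    """
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix (rows are genes)
    edges = []
    n = len(genes)
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once (undirected)
            if corr[i, j] >= threshold:
                edges.append((genes[i], genes[j], float(corr[i, j])))
    return edges

# Toy data, invented for illustration: g1 and g2 rise together, g3 does not.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [5.0, 1.0, 4.0, 2.0],
])
edges = coexpression_edges(expr, ["g1", "g2", "g3"])
```

The edge weight here is the correlation itself; a rank-based measure would be one way to obtain the directed variant mentioned on the slide.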

Protein-protein interaction (PPI) networks
Each node represents a protein
An edge $e_{ij}$ exists if the proteins represented by $v_i$ and $v_j$ physically interact
  – Edges are undirected
  – Permanent and transient interactions are usually not distinguished
  – In some datasets/databases, $e_{ij}$ simply indicates that the proteins represented by $v_i$ and $v_j$ both participate in a complex, but they may not physically interact directly
  – Whether the different interactions can happen simultaneously is usually not considered
  – There are networks for specific types of interactions, e.g., phosphorylation networks
(Figure: human calcineurin heterodimer (1AUI). Image source: RCSB Protein Data Bank)

Genetic interaction networks
The term “genetic interaction” in general means any type of relationship between genes
Specifically, it has been used to describe some particular types of scenarios:
  – Each node represents a gene
  – An edge $e_{ij}$ exists if the growth rate of the cell is affected by the knockout/knockdown/overdose of the genes as shown in the table
  – Depending on the type, the edges can be directed or undirected

Type                      Definition
Synthetic lethality       0 = ij < i, j
Synthetic sick            0 < ij < i, j
Synthetic rescue          0 ?= i < ij
Dosage lethality          0 = ij* < i
Dosage sick               0 < ij* < i
Dosage rescue             0 ?= i < ij*
Phenotypic enhancement    ij < E[ij]
Phenotypic suppression    E[ij] < ij
*: overdose
(i, j: growth rate when gene i or gene j alone is perturbed; ij: growth rate when both are perturbed; E[ij]: the expected value of ij)
Image credit: Drees et al., Genome Biology 6(4):R38, (2005)

Metabolic pathways
Each node is a metabolite
An edge $e_{ij}$ exists if there is a reaction that turns the metabolite represented by $v_i$ into the metabolite represented by $v_j$
  – Edges are directed
  – Both $e_{ij}$ and $e_{ji}$ exist if the reaction is reversible
  – Each edge is labeled by the enzyme that accelerates the reaction in the cell
There is a dual representation, in which each node is a reaction, and an edge $e_{ij}$ exists if the reaction represented by $v_i$ produces a product that is a substrate of the reaction represented by $v_j$
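
The dual representation can be derived mechanically from a list of reactions. The sketch below assumes each reaction is given as a pair of substrate and product sets; the function name `reaction_graph` and the reaction and metabolite names (R1, A, B, ...) are placeholders, not real pathway data.

```python
from collections import defaultdict

def reaction_graph(reactions):
    """Build the reaction-centric (dual) graph.

    reactions: dict mapping a reaction name to (substrates, products),
    each a set of metabolite names. Returns a dict mapping each reaction
    to the set of reactions that use at least one of its products as a
    substrate.
    """
    consumers = defaultdict(set)  # metabolite -> reactions consuming it
    for name, (substrates, _) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(name)
    dual = {}
    for name, (_, products) in reactions.items():
        targets = set()
        for metabolite in products:
            targets |= consumers.get(metabolite, set())
        dual[name] = targets
    return dual

# Hypothetical toy pathway: A -> B (R1), B -> C (R2), B -> D (R3).
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
}
dual = reaction_graph(reactions)
```

In this toy case R1 points to both R2 and R3 because its product B is a substrate of each; a reversible reaction would be entered as two reactions, one per direction.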

Metabolic pathways: an example
(Two example figures. Image source: Kyoto Encyclopedia of Genes and Genomes)

Signaling pathways
Describe the events that happen in a cell in response to an external signal
A heterogeneous network involving different types of data
  – Protein-protein interaction
      – Phosphorylation
  – Gene regulation
  – ...
Image source: Kyoto Encyclopedia of Genes and Genomes

Handling many types of relationship
Need a systematic way to represent the many different types of relationship
Image credit: Lu et al., Trends in Biochemical Sciences 32(7):320-331, (2007)

Phylogenetic networks
Generalization of phylogenetic trees, allowing non-tree structures (i.e., cycles, due to, for example, horizontal gene transfers)
Each node is a species/clade
An edge $e_{ij}$ exists if the species represented by $v_j$ diverged from, or received genetic material from, the species represented by $v_i$
  – Network based on a single gene vs. network based on the whole genome of a species
(Figure panels: phylogenetic tree; phylogenetic network)
Image credit: Wikipedia; Smets and Barkay, Nature Reviews Microbiology 3(9):675-678, (2005)

HIGH-THROUGHPUT EXPERIMENTAL METHODS Part 2

Probing gene regulatory networks
Transcription factor binding targets
  – Chromatin immunoprecipitation followed by
      – Microarray (ChIP-chip)
      – Sequencing (ChIP-seq)
miRNA targets
  – Over-expression/silencing of miRNA, followed by profiling of changes in mRNA/protein levels
      – Including direct and indirect targets
  – Cross-linking immunoprecipitation-high-throughput sequencing (CLIP-seq)

PPI: Yeast two-hybrid (Y2H)
To test whether two proteins physically interact
Fuse one protein with a DNA binding domain (BD)
Fuse the other with an activation domain (AD)
If the two proteins physically interact, a reporter gene is expressed
Can fix the first protein (the “bait”), and try many different second proteins (the “preys”)
Image source: Wikipedia

Protein complex: TAP-MS
Tandem affinity purification followed by mass spectrometry
  – A TAP tag is added to a bait protein
  – The bait protein and other proteins that bind to it (directly or indirectly) bind to IgG beads, while other proteins are washed away
  – The identity of the proteins pulled down in this way can be determined by mass spectrometry
Image source: Wikipedia

Synthetic lethality
There are different methods
One of them is Synthetic Genetic Array (SGA)
  – Create single-mutation strains of different mating types
  – Mate and select for double mutation
  – Growth rate measured by visual inspection or image analysis of colony size
Image credit: Tong et al., Science 294(5550):2364-2368, (2001)

Databases
There are many databases for biological networks
BioGRID is a general database for various types of interactions in multiple species
Gene Expression Omnibus (GEO) contains a large amount of gene expression data
The Kyoto Encyclopedia of Genes and Genomes (KEGG) contains information about pathways
The Protein Data Bank (PDB) contains some crystal structures of interacting biological objects
There are species-specific databases
  – Human Protein Reference Database (HPRD)
  – Saccharomyces Genome Database (SGD)
  – ...
There are also databases that integrate other databases
  – Biological Networks database (IntegromeDB)

File formats
Two main ways to store networks:
  – Adjacency matrix
  – Adjacency list
Since most biological networks are sparse, the adjacency list is more commonly used
Simplest formats:
  – Simple interaction file (SIF)
  – XML
  – Formats with visualization information (e.g., GML)
(See http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats for some commonly used formats)
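
As an illustration of the adjacency-list idea, here is a minimal reader for SIF-style text. It assumes the common layout of one source node, one interaction type, and one or more target nodes per line, with whitespace-separated fields and no spaces inside names; the function name `parse_sif` and the sample network are hypothetical.

```python
from collections import defaultdict

def parse_sif(text):
    """Parse SIF-style text into an adjacency list:
    source -> list of (interaction_type, target)."""
    adjacency = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        adjacency[source]  # touch the key so isolated nodes are kept
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:  # one line may list several targets
                adjacency[source].append((interaction, target))
    return dict(adjacency)

# Hypothetical example: "pp" and "pd" are used here as interaction labels.
sample = "A pp B C\nB pd D\nE\n"
network = parse_sif(sample)
```

Only nodes that appear in the first column become keys, so memory grows with the number of listed edges rather than with the square of the number of nodes, which is the advantage of the list form for sparse networks.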

COMPUTATIONAL NETWORK RECONSTRUCTION METHODS Part 3

Problem definition
Network reconstruction as a machine learning problem
  – As a low-cost supplement to experimental methods
Inputs
  – A set of nodes $V$, each node $v_i$ described by a vector of features $x_i$
  – Each node pair $(v_i, v_j)$ described by a (potentially empty) vector of features $z_{ij}$
  – A (potentially empty) set of positive example edges $E^+ \subseteq V \times V$ (ideally $E^+ \subseteq E$)
  – A (potentially empty) set of negative example edges $E^- \subseteq V \times V$
Goal: For each node pair $(v_i, v_j)$, determine whether the edge $(v_i, v_j)$ is in the unknown set of edges, $E$
Evaluating accuracy of predictions:
  – Cross-validation (using some examples for training and some for testing; repeat for different training/testing splits)
  – Functional enrichment analysis
  – Experimental validation
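
The cross-validation step can be sketched as a split of the example edges into folds. This is a generic k-fold scheme written for illustration, not a method prescribed by the lecture; the function name `k_fold_edge_splits`, the choice k = 5, and the toy edge list are assumptions.

```python
import random

def k_fold_edge_splits(edges, k=5, seed=0):
    """Yield (train, test) lists of example edges for k-fold cross-validation."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)        # fixed seed for repeatability
    folds = [shuffled[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test

# Toy positive examples: ten made-up edges between integer node IDs.
examples = [(i, i + 1) for i in range(10)]
splits = list(k_fold_edge_splits(examples, k=5))
```

Each example edge is held out exactly once. Note that splitting by edge still lets the same node appear in both training and testing sets, which can make the estimate optimistic when predictions rely heavily on node features.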

Example: TF regulation
Inputs
  – $V$: the set of all genes (TFs and non-TFs), each node $v_i$ described by a vector $x_i$ of node features:
      – Expression level of the gene at different time points
      – Sequence at the promoter region of the gene
      – ...
  – Each node pair $(v_i, v_j)$ is described by a vector $z_{ij}$ of features:
      – (If $v_i$ represents a TF) Binding signal of the TF represented by $v_i$ at the promoter region of the gene represented by $v_j$
      – (If $v_i$ represents a TF) Expression of the gene represented by $v_j$ when the gene represented by $v_i$ is knocked out/down
      – ...
  – In some settings, there are no input positive examples
  – Usually there are no negative examples
Goal: Determine which genes each TF regulates (and how, i.e., activation vs. repression, coefficients, etc.)

Some common difficulties
Big data size ($O(n^2)$ node pairs for $n$ nodes)
  – Long computational time
  – Large memory consumption
Small number of positive examples
Noisy positive examples (false positives)
Lack of negative examples
How node features should be used to predict edges is not trivial
Weak features
Non-linear relationship between features and class (interaction/no interaction)

DATA ANALYSIS Part 4

How to interpret these hair balls?
(Figure panels: Transcription factor binding; Protein-protein interactions; Phosphorylation; Metabolic; Genetic interactions)
Image credit: Zhu et al., Genes & Development 21(9):1010-1024, (2007)

Interpreting biological networks
Network statistics
  – Identifying important nodes/edges
Network generation process
  – Understanding the formation/evolution of networks
Network modules
  – Identifying functional object groups
Network motifs
  – Understanding working principles

Network statistics
Some statistics about a network:
  – Degree of a node: number of edges incident on the node
      – In-degree and out-degree for a directed graph
  – Clustering coefficient of a node: the fraction of pairs of its neighbors that are connected to each other
  – Shortest path length between two nodes
  – Eccentricity of a node: the maximum of its shortest path lengths to all other nodes
  – Betweenness of a node: number of shortest paths that pass through the node
      – Similar definition for the betweenness of an edge
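
Most of these statistics take only a few lines to compute on an undirected graph stored as a dictionary of neighbor sets. The sketch below is a plain-Python illustration, assuming an unweighted, connected graph; the function names and the toy graph are made up, and betweenness is left out because it needs a longer algorithm.

```python
from collections import deque
from itertools import combinations

def degree(adj, v):
    """Number of edges incident on v."""
    return len(adj[v])

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    if len(neighbors) < 2:
        return 0.0  # convention assumed here for nodes with fewer than 2 neighbors
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricity(adj, v):
    """Largest shortest-path length from v (assumes a connected graph)."""
    return max(shortest_path_lengths(adj, v).values())

# Toy graph: a triangle A-B-C with an extra node D attached to C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
```

On this toy graph, C has degree 3 and a clustering coefficient of 1/3, since only the pair (A, B) of its three neighbor pairs is connected.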

Identifying important objects
A hub is an object with a large degree
  – It is likely important because, if it is disrupted, many interactions could be affected
A bottleneck is an object with a large betweenness
  – It is likely important because, if it is disrupted, the information flow between many node pairs could be affected
Image credit: Yu et al., PLoS Computational Biology 3(4):e59, (2007)

Degree distribution
It has been found that in many biological (and non-biological) networks, the degree distribution has a long tail
  – Most nodes have few interactions
  – A few nodes have many interactions
  – It has been proposed that these networks are “scale-free”, where the degree distribution follows a power law: $P(k) \sim c k^{-\gamma}$ (usually $2 < \gamma < 3$)
Preferential attachment is one way to produce a scale-free network
  – The rich get richer, the poor get poorer
(Figure panels: an Erdős-Rényi random network; a scale-free network)
Image source: Wikipedia
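
Preferential attachment is easy to simulate. The following sketch makes simplifying assumptions that are not from the lecture: each new node adds exactly one edge, the graph starts from a single edge between nodes 0 and 1, and the function name `preferential_attachment` is invented for the example.

```python
import random
from collections import Counter

def preferential_attachment(n, seed=0):
    """Grow a graph to n nodes; each new node links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # every node appears once per incident edge
    for new in range(2, n):
        old = rng.choice(endpoints)  # uniform over endpoints = degree-proportional
        edges.append((new, old))
        endpoints += [new, old]
    return edges

edges = preferential_attachment(1000)
degrees = Counter(v for edge in edges for v in edge)
degree_histogram = Counter(degrees.values())  # degree k -> number of nodes
```

Plotting `degree_histogram` on log-log axes is a quick way to look for the long tail: many nodes keep degree 1 while a few early nodes accumulate many edges.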

Identifying important pathways
In functional enrichment analysis, we check if an unexpectedly large fraction of genes in a target set share a common annotation
This idea can be generalized: check whether the genes in a target set are unexpectedly similar to each other
A biological network provides a natural way to compute similarity
  – Finding clusters of genes with many direct connections (similar to finding protein complexes from PPI)
      – Alternatively, finding such highly-connected modules could suggest gene sets for performing standard functional enrichment analysis
  – Finding clusters of genes that are close to each other in the network
  – Finding genes (in the target set or not) that are close to the genes in the target set
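
For the standard enrichment check mentioned at the top of this slide, one common choice (an assumption here, since the slide does not name a test) is the upper tail of a hypergeometric distribution. The function name `enrichment_p_value` and the numbers in the example are illustrative only.

```python
from math import comb

def enrichment_p_value(total, annotated, target_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric.

    total: number of genes in the background
    annotated: background genes carrying the annotation
    target_size: number of genes in the target set
    overlap: target-set genes carrying the annotation
    """
    upper = min(annotated, target_size)
    favorable = sum(
        comb(annotated, x) * comb(total - annotated, target_size - x)
        for x in range(overlap, upper + 1)
    )
    return favorable / comb(total, target_size)

# Made-up example: 5 of 10 background genes are annotated, and all 5 genes
# in the target set carry the annotation.
p = enrichment_p_value(total=10, annotated=5, target_size=5, overlap=5)
```

Here only one of the 252 possible target sets of size 5 is fully annotated, so p = 1/252. The network-based generalizations on the slide replace "shares an annotation" with a similarity computed from the graph.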

Network modules
Image credits: Palla et al., Nature 435(7043):814-818, (2005); Costanzo et al., Science 327(5964):425-431, (2010)

Genetic interaction network
Between-pathway vs. within-pathway explanations for negative interactions (the phenotype of the double knock-out is worse than expected from the two single knock-outs)
Image credit: Dixon et al., Annual Review of Genetics 43(1):601-625, (2009)

Phenotype-associating sub-networks
A biological network can also be used to find consistent signals in sub-networks (and average out noise)
Image credit: Chuang et al., Molecular Systems Biology 3:140, (2007)

Biological networks and network motifs
Image credit: Milo et al., Science 298(5594):824-827, (2002)

Statistical significance of a motif
To evaluate whether a pattern is over-represented, we want to know how many such patterns would be found in a “random” network
How to form a random network?
  – Erdős-Rényi random graphs: define the nodes, then each edge appears with a certain probability
      – Not close to reality in many cases
  – Price/Barabási-Albert model: add the nodes one by one, where the chance for the new node to connect to an old node is proportional to the number of edges the old node already has
      – Closer to reality
  – Permuting the graph by reconnecting edges
      – Preserving the total number of nodes
      – Preserving the total number of edges
      – Preserving the number of edges of each node
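
The third option, permuting the graph while preserving each node's number of edges, is often done by repeated edge swaps. The sketch below is one way to do this for a directed graph, paired with a counter for a single motif (the feed-forward loop); the function names, the number of swaps, and the toy graph are assumptions made for the example.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a directed graph by edge swaps:
    (a->b, c->d) becomes (a->d, c->b) unless that would create a self-loop
    or a duplicate edge. Each node keeps its in-degree and out-degree."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # rejected swap; the graph is left unchanged
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def count_feed_forward_loops(edges):
    """Count ordered triples (X, Y, Z) with edges X->Y, Y->Z and X->Z."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
    return sum(
        1
        for x in successors
        for y in successors[x]
        for z in successors.get(y, ())
        if z != x and z in successors[x]
    )

# Toy directed graph on 6 nodes: i -> i+1 and i -> i+2 (mod 6).
original = [(i, (i + 1) % 6) for i in range(6)] + [(i, (i + 2) % 6) for i in range(6)]
observed = count_feed_forward_loops(original)
randomized = rewire(original, n_swaps=100)
```

Repeating the rewiring with different seeds gives a null distribution of motif counts, against which `observed` can be compared, for example as a z-score or an empirical p-value.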

Statistical significance
Image credit: Milo et al., Science 298(5594):824-827, (2002)

Actual numbers observed
Image credit: Milo et al., Science 298(5594):824-827, (2002)

Possible functions of network motifs
A coherent feed-forward loop can reject rapid variations in the input, so that output is produced only when there is a persistent input
A single input motif (SIM) can turn on and turn off several downstream devices at different times according to their activation thresholds
(Figure label: X → Y → Z)
Image credit: Shen-Orr et al., Nature Genetics 31(1):64-68, (2002)

Visualization tools
aiSee
  – Tool for generating network figures in vector format
Cytoscape
  – One of the most popular tools: a visualization tool and a platform with many open-source plugins for various types of analysis
JUNG
N-Browse
Osprey
Pajek
tYNA
...

Analysis tools
Some of the tools listed on the previous slide
GeneSpring
  – Popular tool for pathway analysis (commonly used for microarray data)
GraphWeb
HCE, Weka, ... (for clustering and other types of data mining/machine learning tasks)
NetBox
Pandora
(See http://wiki.reactome.org/index.php/Reactome_Resource_Guide for a long list of tools)

Summary
There are many types of biological networks
  – Gene regulatory
  – Protein-protein interaction
  – Metabolic
  – ...
There are high-throughput experimental methods for identifying the interactions
There are also many computational methods for supplementing the noisy networks from experimental data
Networks can be used to study object relationships, identify important objects and modules, and find associations with a phenotype
