

Report due: March 31. Submit electronically in PDF format.
Find a computational research article (5 pages or more) from one of the following journals:
- Bioinformatics
- BMC Bioinformatics
- Genome Research
- Journal of Proteome Research
- Nucleic Acids Research
The article must have been published after 1/1/2015. Write a report based on a study of the paper and its citations.

Report due: March 31. Submit electronically in PDF format.
Requirements: 8 pages, 1" margins, 1.5 line spacing, not including figures/tables. Figures/tables need to be attached at the end of the document.
Include (but do not limit yourself to) the following components:
- Background and significance of the work.
- What is the technical improvement of the work over previous works?
- What could have been done better?
- If you were the authors, what would be your next step to extend this work?

Networks in Bioinformatics
- General Characteristics
- Directed Acyclic Graph and Gene Ontology
- Defining distances on DAGs
- Network and expression data
  - Testing on an existing network
  - Reverse engineering of networks

Network / Graph
A network is a set of vertices connected by edges.
- Undirected edges → “undirected network”
- Directed edges → “directed network”
Vertex-level characteristic: the number of connections to a vertex is its “degree”.
- Incoming edges → “in-degree” $k_i$
- Outgoing edges → “out-degree” $k_o$
- Total degree: $k = k_i + k_o$
Evolution of networks. S.N. Dorogovtsev, J.F.F. Mendes
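The degree definitions above can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `degrees` and the toy edge list are made up for illustration, and a directed network is assumed to be given as a list of (source, target) pairs.

```python
from collections import defaultdict

def degrees(edges):
    """Return (in-degree, out-degree, total degree) dicts for a directed edge list."""
    k_in = defaultdict(int)
    k_out = defaultdict(int)
    for src, dst in edges:
        k_out[src] += 1   # edge leaves src
        k_in[dst] += 1    # edge enters dst
    nodes = set(k_in) | set(k_out)
    # total degree k = k_i + k_o for every vertex
    k = {v: k_in[v] + k_out[v] for v in nodes}
    return dict(k_in), dict(k_out), k

# Hypothetical toy network with three vertices
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
k_in, k_out, k = degrees(toy_edges)
print(k)  # A and C have total degree 3, B has total degree 2
```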

Network
Network-level characteristics:
- Number of vertices: $N$
- Number of edges: $L$
- Number of loops: $I$. For a connected undirected network, $I = L - N + 1$.
- Degree distribution: the distribution of vertex degrees
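As a rough illustration of the loop count, the sketch below computes the number of independent loops of an undirected network as L - N + C, where C is the number of connected components. For a connected network (C = 1) this reduces to I = L - N + 1. The function name and the example graph are assumptions for illustration, and simple graphs without self-loops are assumed.

```python
def independent_loops(nodes, edges):
    """Number of independent loops of an undirected graph: L - N + C."""
    parent = {v: v for v in nodes}

    def find(v):
        # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)   # merge the two components

    components = len({find(v) for v in nodes})
    return len(edges) - len(nodes) + components

# A square with one diagonal: N = 4, L = 5, connected, so I = 5 - 4 + 1 = 2
square = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(independent_loops([1, 2, 3, 4], square))
```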

Network
Distribution of shortest paths: $\ell_{\mu\nu}$ is the length of the shortest path between nodes $\mu$ and $\nu$. Its mean value is called the “diameter” of the network here (in graph theory, “diameter” more often denotes the longest of the shortest paths).
Clustering coefficient: for each vertex, the fraction of existing connections between the nearest neighbors of the vertex:
$C^{(\mu)} \equiv \dfrac{y^{(\mu)}}{z^{(\mu)} (z^{(\mu)} - 1)/2}$
- $z^{(\mu)}$: number of neighboring vertices
- $y^{(\mu)}$: number of edges between the neighboring vertices
The clustering coefficient $C$ of the network is the mean of $C^{(\mu)}$ over all vertices.
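Both quantities can be computed directly from an adjacency structure. The sketch below assumes an undirected, connected network stored as a dict mapping each vertex to the set of its neighbors. The function names are illustrative, and setting the local clustering coefficient to 0 for vertices with fewer than two neighbors is a convention chosen here, not something stated on the slide.

```python
from collections import deque
from itertools import combinations

def local_clustering(adj, v):
    """y / (z (z - 1) / 2) for vertex v; defined as 0.0 when z < 2 (assumed convention)."""
    nbrs = adj[v]
    z = len(nbrs)
    if z < 2:
        return 0.0
    # y: number of edges among the neighbors of v
    y = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return y / (z * (z - 1) / 2)

def mean_shortest_path(adj):
    """Mean shortest-path length over all reachable ordered pairs, via breadth-first search."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical toy network: a triangle 1-2-3 with a pendant vertex 4 attached to 1
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
C = sum(local_clustering(adj, v) for v in adj) / len(adj)
print(C, mean_shortest_path(adj))
```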

Scale-free Network
Scale-free network: the degree distribution follows a power law, $P(k) \propto k^{-\gamma}$. Few nodes are of high degree, while most nodes are of low degree.
Contrast: random edge generation yields a Poisson degree distribution.

Scale-free Network
Nature 406(6794):378.
Quote from the figure legend: “Both networks contain 130 nodes and 215 links. Red, the five nodes with the highest number of links; green, their first neighbours.”

A large number of real-world networks, including biological networks, are found to have a power-law degree distribution. Some nodes serve as “hubs”. This makes sense for the WWW, social networks, and biological networks, where controllers such as transcription factors are well known. Scale-free networks are “ultra small-world”: most nodes can reach one another in a few steps. These networks exhibit “high tolerance to random perturbations but are sensitive to targeted attack on the highly connected nodes”.

Scale-free Network
One way to generate a network with such a distribution is the “rich get richer” model of Barabási and Albert (1999):
- Initiate a network, with degree ≥ 1 for each node.
- Add a new node to the network, linking it to each existing node $i$ with probability $\Pi(k_i) = k_i / \sum_j k_j$, where $k_i$ is the degree of node $i$.
Higher-degree nodes are more likely to gain new connections.
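A small simulation makes the “rich get richer” mechanism concrete. The following is a simplified sketch, not a reference implementation: the function name, the fully connected starting core of m + 1 nodes, and the choice to attach each new node to m distinct targets are assumptions made for illustration. Degree-proportional sampling is obtained by drawing uniformly from a list in which each node appears once per incident edge.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed initial network: a complete graph on m + 1 nodes (every degree >= 1)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node is listed once per incident edge, so a uniform draw is degree-proportional
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets are found
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(n=1000, m=2, seed=42)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print(max(degree.values()), sorted(degree.values())[len(degree) // 2])  # largest hub vs. median degree
```

Requiring distinct targets deviates slightly from exact degree-proportional sampling, which is acceptable for a sketch but worth keeping in mind when comparing with published results.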

Scale-free Network
The protein-protein interaction network is a scale-free network.
S. Wuchty, E. Ravasz and A.-L. Barabási: The Architecture of Biological Networks

Bioinformaticians’ interest in network
Characterizing the structure of biological networks, and finding functional and evolutionary implications.
(a) Ashbya gossypii ATCC (b) Burkholderia sp. (c) human
Scientific Reports 5, (2015)
“The compounds that have multiple pathways to the core compounds are less likely to cause diseases than the compounds without multiple pathways.”

Bioinformaticians’ interest in network
Characterizing the behavior of network nodes/subnetworks on an existing network.
BMC Genomics, 15:314

Bioinformaticians’ interest in network
Reverse engineering of networks based on observations of gene expression behavior: inference of regulatory relations.
Current Genomics, 2015, 16, 3-22

Bioinformaticians’ interest in network Disease etiology


Testing on the network
Goal: utilize an existing network to aid
- biomarker selection (“network marker”)
- disease mechanism finding
- predictive model building
Data:
- A network between biological units
  - Signal transduction network
  - Genetic interaction network
  - Protein-protein interaction network
  - TF regulatory network
  - ……
- Behavior of nodes
  - Expression data
  - Knock-out data
  - ……

Testing on the network
An example of a machine-learning approach.
Mol Syst Biol. 2007; 3: 140.

Testing on the network
Mol Syst Biol. 2007; 3: 140.
Network markers: Diamond – univariate significant

Testing on the network
Ann. Appl. Stat. (Epub ahead of print)
Example: a Bayesian framework
- Univariate test of all genes
- Transform p-values to normal quantiles
- Assume a gene is either “1” (disease related) or “0” (unrelated)
- Use a network-based mixture model: neighboring genes are more likely to share status
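The p-value transformation step can be sketched as follows. The slide does not say which direction the transform takes, so the mapping z = Φ⁻¹(1 − p), under which small p-values become large positive scores, is an assumption here, as are the function name and the clipping constant `eps`.

```python
from statistics import NormalDist

def p_to_z(p_values, eps=1e-12):
    """Map p-values to standard normal quantiles, z = Phi^{-1}(1 - p) (assumed direction)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_scores = []
    for p in p_values:
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        z_scores.append(std_normal.inv_cdf(1.0 - p))
    return z_scores

print(p_to_z([0.5, 0.025, 0.001]))
```

Under the null hypothesis the resulting scores are approximately standard normal, which is what makes a two-component mixture model over the genes convenient to write down.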

Reverse engineering of networks from microarray data
Goal: infer genetic regulation network structure from microarray data.
Key assumption: the mRNA level measured by microarray truly reflects the activity of the regulator.
- Sadly, this is only true for ~20% of the regulators.
- Methods incorporating more data/knowledge have been developed.

Reverse engineering of networks from microarray data
Margolin & Califano, Ann N Y Acad Sci. 2007,1115:51.
Hesselberth et al. Genome Biology. 2006,7:R30.

Reverse engineering of networks from microarray data
Methods:
- Correlation
- Partial correlation (Gaussian graphical models)
- Mutual information
- Bayesian network
Data used:
- Expression data alone
- Expression data + other information
  - Known transcription factor targets
  - ChIP-chip and ChIP-seq
  - Known interactions/pathways
  - …
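The difference between correlation and partial correlation can be seen on a three-gene toy example. In the sketch below, two hypothetical target genes x and y are both driven by a regulator z and have no direct relation to each other. The data are constructed (not measured) so that the noise terms are exactly orthogonal, and the first-order partial correlation formula is used in place of a full Gaussian graphical model.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Constructed data: regulator z plus two mutually orthogonal "noise" patterns
z = [-7, -5, -3, -1, 1, 3, 5, 7]
noise_x = [7, 1, -3, -5, -5, -3, 1, 7]
noise_y = [-7, 5, 7, 3, -3, -7, -5, 7]
x = [2 * a + e for a, e in zip(z, noise_x)]
y = [2 * a + e for a, e in zip(z, noise_y)]

print(pearson(x, y))          # clearly positive: x and y look co-expressed
print(partial_corr(x, y, z))  # essentially zero once the regulator is accounted for
```

A plain correlation network would draw an edge between x and y here, while a partial-correlation network would not.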

Reverse engineering of networks from microarray data
Margolin & Califano, Ann N Y Acad Sci. 2007,1115:51.
Differentiating mechanisms of co-regulation based on expression data alone is a daunting task.

Network for knowledge representation
Directed Acyclic Graph (DAG): a directed graph with no directed loops, i.e. starting from any node, there is no route that comes back to the same node.
The structure leads to a partial ordering of the nodes: if an edge i → j exists, node i is at a higher level than node j.
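One way to check the DAG property and obtain an ordering consistent with the edges is Kahn's algorithm for topological sorting. This is a generic sketch rather than anything specific to the slides: the function name and the tiny example hierarchy are made up for illustration.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return an order in which i precedes j for every edge i -> j,
    or None if the graph contains a directed loop (i.e. is not a DAG)."""
    children = {v: [] for v in nodes}
    indegree = {v: 0 for v in nodes}
    for i, j in edges:
        children[i].append(j)
        indegree[j] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)   # top-level nodes
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:      # all parents of c have been placed
                queue.append(c)
    return order if len(order) == len(nodes) else None

# Hypothetical hierarchy in which one node has two parents (allowed in a DAG, not in a tree)
nodes = ["root", "a", "b", "c"]
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c")]
print(topological_order(nodes, edges))
print(topological_order(nodes, edges + [("c", "root")]))  # adding a back edge creates a loop
```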

The Gene-Ontology knowledge-base
Organize knowledge about genes in a directed acyclic graph. The lower the level, the more detailed the knowledge. Each gene is annotated to the terms, reflecting people’s knowledge about it.

Similar thinking has been used on the tree of life and other areas.
Mol. BioSyst., 2014, 10, 86-92

The Gene-Ontology knowledge-base
Here’s how people’s knowledge about the gene ACE2 is summarized using the database. Based on these papers:

Gene ontology and high-throughput data
Gene ontology was necessitated by high-throughput data: when thousands of genes are measured simultaneously, people must be able to combine the results with existing knowledge in a computationally efficient way.

Gene ontology and high-throughput data
Two general types of considerations:
- Does a GO term have first-order association with the clinical outcome?
- Does the GO term change its interactions with other functional units in response to the clinical factor?

Gene ontology and high-throughput data
How to deal with dependency between (neighboring) GO terms?
General strategies:
- Treat all GO terms as independent units, test for significant changes one by one, and let biologists remove the redundant information.
- Use the GO structure to remove redundant terms, and test only a small informative subset of all GO terms.
- Test for independence conditioned on the results of descendant nodes.

Gene ontology and high-throughput data
How to determine whether a given GO term is up- or down-regulated in association with a disease is an active research area. We list a few examples here.
Difficulty: each GO term contains a number of genes, and these genes operate in a network fashion in the cell. Competition and feedback loops are common. The genes in one GO term do not all change in the same direction: in association with a disease, some are up-regulated, some are suppressed, and some do not change.

Gene ontology and high-throughput data
GO term: positive regulation of I-kappaB kinase/NF-kappaB cascade
Disease: Oral cancer metastasis

Gene ontology and high-throughput data
Cutoff-based methods:
General idea: test significance gene by gene. Select a threshold level and divide all genes into two groups: differentially expressed and non-differentially expressed. For each GO term, test the null hypothesis that the differentially expressed genes are drawn from the pool of all genes independently of the GO term.
- Hypergeometric
- Binomial
- Chi-square test
- … …
The arbitrary threshold has a substantial impact on the results.
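The hypergeometric version of this test can be written in a few lines. The sketch below is illustrative only: the function name and argument names are assumptions, and the numbers in the example are invented. It computes the upper-tail probability of seeing at least k term members among n differentially expressed genes, when K of the N measured genes belong to the GO term.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N genes in total, K in the GO term, n selected)."""
    upper = min(K, n)
    # comb() returns 0 when more genes are requested than available, so impossible terms vanish
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Invented example: 10,000 genes, 200 differentially expressed, a term with 50 genes, 6 of them selected
print(hypergeom_enrichment_p(N=10000, K=50, n=200, k=6))
```

Re-running the test with a different selection threshold changes n and k, which is the sensitivity to the cutoff noted above.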

Gene ontology and high-throughput data
Cutoff-free methods:
Try to avoid the use of an arbitrary threshold. Usually use permutation tests to find significance; this ensures that the correlation structure between the genes is preserved.
With a group of genes to analyze, the hypothesis becomes complicated. Different methods may use different assumptions and test different hypotheses.
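A permutation test of this kind can be sketched as follows. The gene-set statistic used here (the mean absolute difference in group means) is a stand-in chosen for simplicity and is not taken from any particular published method; the function names and toy data are likewise assumptions. The key point is that sample labels are shuffled while each sample's expression profile stays intact, so the correlation between genes is not disturbed.

```python
import random

def gene_set_score(expr, labels):
    """Mean absolute difference in group means over the genes of one set.
    expr: list of per-gene value lists (one value per sample); labels: 0/1 per sample."""
    score = 0.0
    for gene_values in expr:
        g1 = [v for v, l in zip(gene_values, labels) if l == 1]
        g0 = [v for v, l in zip(gene_values, labels) if l == 0]
        score += abs(sum(g1) / len(g1) - sum(g0) / len(g0))
    return score / len(expr)

def permutation_p(expr, labels, n_perm=1000, seed=0):
    """Permutation p-value: how often a relabelled data set scores at least as high."""
    rng = random.Random(seed)
    observed = gene_set_score(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # relabel whole samples; gene-gene correlation is untouched
        if gene_set_score(expr, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids a p-value of exactly 0

# Toy gene set with two genes measured in four controls (0) and four cases (1)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [[1, 2, 1, 2, 9, 10, 9, 10],
        [5, 5, 6, 6, 1, 1, 2, 2]]
print(permutation_p(expr, labels))
```

Because only the labels are permuted, this sketch tests a self-contained type of null hypothesis; a competitive test would compare the set against genes outside it instead.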

Gene ontology and high-throughput data
JOURNAL OF COMPUTATIONAL BIOLOGY. 13:798.
Comparing the distribution of p-values (or correlations, or other statistics) within one GO term to the overall distribution:
- Kolmogorov–Smirnov goodness-of-fit test statistic for comparing two distributions
- Anderson–Darling test statistic for testing for a uniform distribution
- Wilcoxon rank-sum test statistic
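As an illustration of the first statistic, the sketch below computes the two-sample Kolmogorov–Smirnov statistic, i.e. the largest gap between two empirical cumulative distribution functions. It returns the statistic only, not a p-value; the function name and the example p-values are invented for illustration.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # advance both empirical CDFs past every observation <= x (handles ties)
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Invented p-values: genes in one GO term versus a background sample
term_p = [0.001, 0.004, 0.02, 0.03, 0.4]
background_p = [0.05, 0.2, 0.35, 0.5, 0.6, 0.75, 0.8, 0.95]
print(ks_statistic(term_p, background_p))
```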

GSEA. PNAS vol. 102 no

Gene ontology and high-throughput data
The competitive null hypothesis: genes in the gene set are not more associated with the phenotype than genes outside the gene set.
The self-contained null hypothesis: no genes in the gene set are associated with the phenotype.

GSDCA. Single gene set gene set pairs

GSDCA.