Prior Knowledge Driven Causality Analysis in Gene Regulatory Network Discovery Authors: Shun Yao, Shinjae Yoo, Dantong Yu Stony Brook University Computational.

Presentation transcript:

Prior Knowledge Driven Causality Analysis in Gene Regulatory Network Discovery
Authors: Shun Yao, Shinjae Yoo, Dantong Yu
Stony Brook University; Computational Science Center, Brookhaven National Laboratory
Presenter: Shun Yao

Overview
– Motivation
– Challenges & Methods
– Experiments
– Contributions

Next Generation Sequencing: Data explosion
[Figure panels: Speed improvements in DNA sequencing; Cost improvements in DNA sequencing. Source: Nature 458, (2009)]
Analyzing the data systematically has become a challenge.

Time Series Gene Expression Data
Domain question: How do different genes coordinate with each other to make a process happen?
– Cell cycle
– Developmental biology
– Or any other process
What to do experimentally?
– Collect time series gene expression data through microarray or sequencing.
– Find the regulatory relationships from the data.
It is the bioinformatician's job to analyze the time series gene expression data.
[Figure: Biological process from a systematic perspective]

Granger causality modeling
Granger causality modeling:
– Originated from time series analysis in economics.
– One of the most popular approaches built on vector autoregressive (VAR) models.
– Results can be analyzed statistically.
General strategies:
– Bivariate Granger causality modeling: Pairwise Granger Causality (PGC)
– Multivariate Granger causality modeling: Conditional Granger Causality (CGC)

Bivariate Granger Causality model (PGC)
Two time series x_t and y_t (t = 1, 2, …, T). Model order is p.
Question: whether x_t Granger-causes y_t.
The model is fit by OLS over t = p+1, …, T, so the total number of regressions is m = T - p.
Calculate the significance value from the fit.
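
The slide's equations did not survive in the transcript, so the following is only a sketch of the textbook pairwise test, not necessarily the exact form shown in the talk. It assumes the usual formulation: a restricted regression of y_t on its own p lags, a full regression that adds p lags of x_t, and an F statistic comparing the two residual sums of squares. The function names `lagged` and `granger_f_stat` are made up for this illustration.

```python
import numpy as np

def lagged(series, p):
    """Columns [s_{t-1}, ..., s_{t-p}] for t = p+1, ..., T."""
    T = len(series)
    return np.column_stack([series[p - i:T - i] for i in range(1, p + 1)])

def granger_f_stat(x, y, p):
    """F statistic for 'x Granger-causes y' at model order p (textbook form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y) - p                                   # number of regressions, m = T - p
    target = y[p:]
    restricted = np.hstack([np.ones((m, 1)), lagged(y, p)])   # intercept + own past
    full = np.hstack([restricted, lagged(x, p)])               # plus the past of x

    def rss(design):
        # Ordinary least squares, then the residual sum of squares.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = m - 2 * p - 1                              # residual d.o.f. of the full model
    return ((rss_r - rss_f) / p) / (rss_f / dof)

# Synthetic example: y follows x with a one-step delay, so x should "cause" y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=199)
print(granger_f_stat(x, y, p=1), granger_f_stat(y, x, p=1))
```

Under the usual assumptions the statistic is compared against an F distribution with (p, m - 2p - 1) degrees of freedom to obtain the significance value; that step is left out here to keep the sketch dependent on NumPy only.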

Multivariate Granger Causality model (CGC)
y_t is an n×1 vector representing the expression of n genes at time t.
A_i is an n×n matrix representing the causality at model order i.
Matrix form and OLS solution: the fit requires T >= (n+1)p, and X'X must be invertible.
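
A minimal sketch of the multivariate fit, assuming the standard VAR form y_t = A_1 y_{t-1} + ... + A_p y_{t-p} + e_t without an intercept (the slide's own matrix form is not preserved, so this is an assumption). It makes the two stated conditions explicit: enough time points, and an invertible X'X. The name `fit_var` is hypothetical.

```python
import numpy as np

def fit_var(Y, p):
    """Fit a VAR(p) model by OLS.

    Y has shape (T, n): T time points, n genes. Returns [A_1, ..., A_p],
    where A_i[j, k] is the lag-i coefficient of gene k in the equation for gene j.
    """
    Y = np.asarray(Y, dtype=float)
    T, n = Y.shape
    if T < (n + 1) * p:
        raise ValueError(f"need T >= (n+1)p = {(n + 1) * p}, got T = {T}")
    # Each row of X holds [y_{t-1}, ..., y_{t-p}] side by side; shape (T-p, n*p).
    X = np.hstack([Y[p - i:T - i] for i in range(1, p + 1)])
    target = Y[p:]
    gram = X.T @ X
    if np.linalg.matrix_rank(gram) < n * p:
        raise ValueError("X'X is not invertible")
    B = np.linalg.solve(gram, X.T @ target)          # OLS solution, shape (n*p, n)
    # Block i of B is the transpose of A_{i+1}.
    return [B[i * n:(i + 1) * n].T for i in range(p)]

# Synthetic example: gene 0 drives gene 1 with a one-step delay.
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.0], [0.4, 0.3]])
Y = np.zeros((300, 2))
for t in range(1, 300):
    Y[t] = A_true @ Y[t - 1] + rng.normal(size=2)
print(fit_var(Y, p=1)[0])
```

The first check follows from counting rows and columns: X has T - p rows and np columns, so X'X can only have full rank when T - p >= np.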

Real situation for CGC and PGC
Limitation of pairwise GC: a significant number of false positives as n increases.
Limitation of conditional GC: X'X is not invertible.
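
The CGC limitation can be made concrete with the condition T >= (n+1)p from the multivariate model. The sketch below computes the largest gene set a conditional model can handle for a given number of time points; T = 36 is borrowed from the yeast dataset described later in the talk, and `max_cluster_size` is a name made up for this illustration.

```python
def max_cluster_size(T, p):
    """Largest n that satisfies T >= (n + 1) * p."""
    return T // p - 1

for p in (1, 2, 3):
    print(p, max_cluster_size(36, p))
# prints:
# 1 35
# 2 17
# 3 11
```

With 36 time points, even model order 1 limits a conditional model to 35 genes, far fewer than a target set of a few thousand genes.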

Overcoming the limitations simultaneously
Limitations in PGC and CGC
– False discoveries in PGC.
– Lack of data in CGC.
Advantages of using prior knowledge
– Different available biological experiment data.
– Additional information besides expression data.
Insufficient information
Lose F-statistics!

New Framework: Utilizing the prior knowledge
Use prior knowledge to guide clustering, and use the resulting clusters to assist Granger causality analysis.

Microarray data: Yeast Metabolic Cycle dataset
Source: Science 310 (5751), (2005)
The expression profile of 6209 uniquely expressed ORFs.
Target gene set selection based on significance and periodicity: 2935 genes with 36 time points covering three yeast metabolic cycles.

Prior knowledge data: YeastNet
A probabilistic functional gene network of yeast genes
– Constructed from ~1.8 million experimental observations
– Covers linkages among 5483 yeast proteins
– Currently version 2 (version 3 will be available soon)
Source: PLoS ONE 2(10), e988 (2007)
A general way to summarize heterogeneous knowledge
Graph constructing formula: [equation not preserved in the transcript]

Properties of the extracted YeastNet graph
Extracted YeastNet based on the target gene set
The extracted YeastNet is a well-connected gene association graph.
– The biggest component covers most of the genes.
– The nodes are well-connected with each other.
Prior knowledge graph: 2953 nodes and edges

Clustering using prior knowledge graph
We used the spectral clustering algorithm to cluster genes
– Based on distances/similarities
– Normalized cut
Tuning of the spectral clustering algorithm
[Figure: the cluster size distribution at k=300]
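
To illustrate the normalized-cut idea, here is a minimal two-way sketch. It is not the authors' pipeline: the talk uses k = 300 clusters, which would normally take the first k eigenvectors followed by a k-means step, while this example only splits a graph in two by the sign of the second eigenvector. The function name and the toy graph are assumptions made for this illustration.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a weighted graph in two using the relaxed normalized cut.

    W is a symmetric (n, n) array of non-negative edge weights with no
    isolated nodes. Returns a boolean array marking one side of the split.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                                # weighted node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    # Second-smallest eigenvector, rescaled back to the generalized problem.
    indicator = d_inv_sqrt * eigvecs[:, 1]
    return indicator > 0

# Toy association graph: two tight triangles joined by one weak link.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.01
print(normalized_cut_bipartition(W))
```

Which triangle ends up labelled True depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.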

CGC analysis on small clusters
GCCA toolbox developed by Seth (Journal of Neuroscience Methods 186).
– Model order p is selected by the BIC (Bayesian information criterion).
– Bonferroni approach to build Granger causality networks.
Bonferroni approach: for a network with significance level α, the corresponding edge significance level in the graph is α/(n(n-1)).
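
The Bonferroni correction divides the network-level significance by n(n-1), the number of possible directed edges among n genes. A small sketch follows; the function names and the p-value table are invented for illustration and are not part of the GCCA toolbox.

```python
def edge_significance(alpha, n):
    """Per-edge threshold alpha / (n (n - 1)) for a directed network on n genes."""
    return alpha / (n * (n - 1))

def significant_edges(p_values, alpha):
    """Keep directed edges (i, j) whose p-value passes the corrected threshold.

    p_values is an n x n nested list; p_values[i][j] is the p-value for
    'gene i Granger-causes gene j'. Diagonal entries are ignored.
    """
    n = len(p_values)
    threshold = edge_significance(alpha, n)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and p_values[i][j] <= threshold]

# Made-up p-values for three genes; the corrected threshold is 0.05 / 6.
p_values = [[1.0, 0.001, 0.5],
            [0.02, 1.0, 0.9],
            [0.2, 0.0001, 1.0]]
print(significant_edges(p_values, alpha=0.05))   # prints [(0, 1), (2, 1)]
```

The edge from gene 1 to gene 0 has p = 0.02, which would pass an uncorrected 0.05 threshold but fails the corrected one.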

An example discovered network
Two properties:
1. With different significance values, the resulting networks are slightly different.
2. Granger causality networks are highly hierarchical.
[Figure panels: edge significance level 0.05/(18(18-1)) = 0.05/306; edge significance level 0.10/(18(18-1)) = 0.10/306]

Functional prediction through the resulting causality network
Saccharomyces Genome Database (SGD) function search:
– PCL9: Cyclin in the late M/early G1 phase.
– UTP15, PAB1, PBN1: Cell cycle material preparation genes for early G1 phase.
– TDA10: ATP-binding protein with unknown function; similar to an E. coli kinase.
Prediction: TDA10 might play a signal transduction role in late M/early G1 phase.

Contributions
We proposed a new framework for applying Granger causality analysis to a large target gene set, overcoming two existing limitations.
– PGC limitation: False discoveries
– CGC limitation: Lack of data
We used a prior knowledge graph to find the group structure inside the target gene set, then applied the more accurate CGC model inside each group.
The Yeast Metabolic Cycle dataset was tested as an example.
We found meaningful new biological causality networks based on our approach.

Acknowledgements This work is supported by Brookhaven National Lab LDRD No