

Networks are useful for describing systems of interacting objects: the nodes represent the objects and the edges represent the interactions between them. Similarly, in a gene network each node corresponds to a gene, and a pair of nodes is connected by an edge if there is a significant co-expression relationship between them. But unlike many networks, such as the components inside a car engine or the wires inside a robot, biological systems are black boxes. We can observe the outcome of their interactions, but not the interactions themselves.

In the context of biology, link prediction refers to the problem of identifying functional links between genes from data that may be confounded by indirect effects. Suppose we are looking at three genes A, B, and C. Gene A inhibits the expression of gene B, and gene B inhibits the expression of gene C. If the expression of A increases, the expression of B decreases, which in turn increases the expression of C. One might therefore observe a correlation between the expression levels of genes A and C, even though there is no direct interaction between them. Gene expression data helps in identifying active and inactive genes in a cell.
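The A, B, C cascade can be illustrated with a small simulation. This is only a sketch: the linear model, the coefficient -0.8, and the noise level are assumptions made up for illustration and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n_samples = 5000

# Hypothetical linear model of the cascade A -| B -| C.
a = rng.normal(size=n_samples)
b = -0.8 * a + 0.5 * rng.normal(size=n_samples)  # A inhibits B
c = -0.8 * b + 0.5 * rng.normal(size=n_samples)  # B inhibits C; C never reads A

corr_ab = np.corrcoef(a, b)[0, 1]
corr_ac = np.corrcoef(a, c)[0, 1]
print(f"corr(A, B) = {corr_ab:.2f}")  # negative: direct inhibition
print(f"corr(A, C) = {corr_ac:.2f}")  # positive, although A and C do not interact
```

In this toy model C depends on A only through B, yet A and C come out clearly correlated, which is exactly the kind of indirect effect that link prediction has to filter out.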

We start with the expression values of m genes across n samples (conditions), so the input data is an m×n matrix. In the first step, a similarity score (co-expression measure) is calculated between each pair of rows of the expression matrix. The result is an m×m matrix in which each element shows how similarly the expression levels of two genes change together. The similarity score can be the Pearson correlation, the Spearman rank correlation, mutual information, or Euclidean distance.
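As a minimal sketch of this first step, the following uses Pearson correlation on a random m×n matrix. The data and the sizes are placeholders, not real expression values.

```python
import numpy as np

rng = np.random.default_rng(1)
m_genes, n_samples = 6, 20
# Placeholder expression matrix: rows are genes, columns are samples.
expression = rng.normal(size=(m_genes, n_samples))

# np.corrcoef treats each row as one variable, so the result is m x m.
similarity = np.corrcoef(expression)
print(similarity.shape)
```

The resulting matrix is symmetric with ones on the diagonal, since every gene is perfectly correlated with itself.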

The Pearson correlation measures the linear correlation between two variables X and Y, giving a value between +1 and −1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation. The Spearman rank correlation is the Pearson correlation computed on the ranks of the values, so it captures monotonic relationships. Mutual information measures how much knowing the expression levels of one gene reduces the uncertainty about the expression levels of another. Euclidean distance measures the geometric distance between two vectors, and so considers both the direction and the magnitude of the vectors of gene expression values.
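The four measures can be sketched for a single pair of expression vectors as follows. The function names are hypothetical, the rank computation assumes there are no tied values, and the mutual information is a simple histogram (plug-in) estimate whose bin count of 8 is an arbitrary choice.

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # Rank each vector (assumes no ties), then take the Pearson correlation.
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return pearson(rank_x, rank_y)

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def mutual_information(x, y, bins=8):
    # Plug-in estimate (in nats) from a 2-D histogram of the two vectors.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

x = np.linspace(0.0, 3.0, 50)
y = np.exp(x)  # monotonic but nonlinear function of x

print(pearson(x, y), spearman(x, y), euclidean(x, y), mutual_information(x, y))
```

Because y is a monotonic but nonlinear function of x, the Spearman score is 1 while the Pearson score is below 1, which shows how the choice of measure changes the resulting similarity matrix.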

Elements of the similarity matrix that reach a certain threshold (for example, ≥ 0.8) are replaced by 1, and the remaining elements are replaced by 0.
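A minimal sketch of the thresholding step, using the 0.8 cutoff from the slide on a made-up 3×3 similarity matrix. Zeroing the diagonal is an added assumption, so that genes are not linked to themselves.

```python
import numpy as np

# Made-up similarity scores for three genes.
similarity = np.array([
    [1.00, 0.92, 0.10],
    [0.92, 1.00, 0.85],
    [0.10, 0.85, 1.00],
])
threshold = 0.8

adjacency = (similarity >= threshold).astype(int)
np.fill_diagonal(adjacency, 0)  # assumption: ignore self-links
print(adjacency)
```

Here genes 0 and 1, and genes 1 and 2, end up linked, while the weakly similar pair 0 and 2 does not.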

The global response matrix, G, can be measured directly from gene expression measurements. Its element Gij captures the change in node i’s activity in response to changes in node j’s activity. Gij cannot distinguish between direct and indirect relationships: a path i → k → j can result in a measurable response between i and j, falsely suggesting the existence of a direct link between them. The local response matrix, S, is defined with partial derivatives: the “∂” indicates that Sij captures only local effects, i.e. the response of i to changes in j when all surrounding nodes except i and j remain unchanged.
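The defining equations are not reproduced in this transcript. Assuming x_i denotes the activity of node i (a symbol introduced here for illustration), the two matrices described above would be written as:

```latex
G_{ij} = \frac{\mathrm{d}x_i}{\mathrm{d}x_j},
\qquad
S_{ij} = \frac{\partial x_i}{\partial x_j}
```

The total derivative in G lets every other node react to the perturbation of j, whereas the partial derivative in S holds them fixed.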

We have already seen the global and local response matrices, G and S respectively. To extract Sij from the experimentally accessible Gij, we formally link the two matrices in a single equation. Solving this relation gives EQ.X, which provides Sij from the experimentally accessible Gij by ‘silencing’ indirect responses and preserving direct response terms.
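EQ.X itself does not appear in this transcript. The sketch below assumes it is the closed-form silencing expression S = (G − I + D((G − I)G)) G⁻¹, where I is the identity matrix and D(·) keeps only the diagonal of its argument. Both the formula and the function name `silence` should be read as assumptions rather than as the slide's exact content.

```python
import numpy as np

def silence(G):
    """Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^-1."""
    identity = np.eye(G.shape[0])
    diagonal_part = np.diag(np.diag((G - identity) @ G))  # D(.): keep diagonal only
    return (G - identity + diagonal_part) @ np.linalg.inv(G)

# Toy cascade 0 -> 1 -> 2 with direct strength 0.5 per link.
# Row i, column j holds the response of node i to a change in node j.
G_cascade = np.array([
    [1.00, 0.00, 0.00],
    [0.50, 1.00, 0.00],
    [0.25, 0.50, 1.00],  # 0.25 is the indirect response of node 2 to node 0
])
S_cascade = silence(G_cascade)
print(np.round(S_cascade, 6))
```

For this cascade the indirect entry (2, 0) is silenced to zero while the two direct entries keep their value of 0.5.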

Thresholding: The experimentally observed global response matrix, Gij, accounts for direct as well as indirect correlations, with no clear separation between them. Thresholding therefore predicts spurious links (thick dashed lines) and overlooks true links (thin solid lines). Thus, although the average Gij terms associated with direct links are higher than the average terms associated with indirect links, as captured by the discrimination ratio ∆G, the difference is not sufficient to fully discriminate between direct and indirect links. ∆G = ⟨Gij⟩Dir / ⟨Gij⟩Indir
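The failure of thresholding can be reproduced on a toy cascade. In this sketch the link strengths (0.9, 0.9, 0.3) and the 0.5 threshold are made-up values, G is generated from a simple linear model, and "indirect" is taken to mean every off-diagonal pair without a direct link.

```python
import numpy as np

# Toy cascade 0 -> 1 -> 2 -> 3 with two strong links and one weak link.
S_true = np.zeros((4, 4))
S_true[1, 0], S_true[2, 1], S_true[3, 2] = 0.9, 0.9, 0.3
direct = S_true > 0

# Assumed linear model: responses accumulate along all paths.
G = np.linalg.inv(np.eye(4) - S_true)

off_diag = ~np.eye(4, dtype=bool)
threshold = 0.5
predicted = (G >= threshold) & off_diag
spurious = predicted & ~direct   # indirect pairs that pass the threshold
missed = direct & ~predicted     # true links that fall below it

# Discrimination ratio: mean direct response over mean indirect response.
ratio = np.abs(G[direct]).mean() / np.abs(G[off_diag & ~direct]).mean()
print(np.argwhere(spurious), np.argwhere(missed), round(ratio, 2))
```

The indirect response of node 2 to node 0 (0.9 × 0.9 = 0.81) exceeds the weak direct link 2 → 3 (0.3), so no single threshold separates the two classes even though the ratio is above 1.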

If we apply the silencing method instead of thresholding, the flow from the source j to the target i is described as being carried through the indirect effect Gkj (brown) coupled with the direct impact Sik of the target’s nearest neighbor k. By silencing the indirect contributions, EQ.X provides the local response matrix, Sij, whose nonzero elements correspond to direct links. As indirect terms become much smaller in Sij, we obtain a greater discrimination ratio, ∆S. ∆S = ⟨Sij⟩Dir / ⟨Sij⟩Indir
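The decomposition described above can be checked numerically. The sketch assumes the relation takes the form Gij = Σk Sik Gkj for i ≠ j, which matches the verbal description but is not shown in the transcript, and it reuses a made-up cascade generated from a linear model.

```python
import numpy as np

# Toy cascade 0 -> 1 -> 2 -> 3 (made-up strengths).
S_true = np.zeros((4, 4))
S_true[1, 0], S_true[2, 1], S_true[3, 2] = 0.9, 0.9, 0.3
G = np.linalg.inv(np.eye(4) - S_true)  # assumed linear response model

# Right-hand side of the assumed relation: sum over k of S_ik * G_kj.
flow = S_true @ G
off_diag = ~np.eye(4, dtype=bool)
relation_holds = np.allclose(G[off_diag], flow[off_diag])

# The response of node 2 to node 0 passes through its neighbor, node 1.
print(relation_holds, G[2, 0], S_true[2, 1] * G[1, 0])
```

Every off-diagonal response is reproduced by routing it through the direct neighbors of the target, which is the structure that the silencing step inverts.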

The authors used a scale-free network consisting of N = 5000 nodes and L = 20000 links to test the power of EQ.X. Gij was obtained by perturbing the activity of each node, and Sij was then calculated using EQ.X. The figure compares the Gij and Sij values associated with interacting and non-interacting node pairs. Sij silences the correlations associated with indirect interactions, resulting in a clear separation between direct and indirect interactions, a phenomenon absent from Gij. Indeed, the receiver operating characteristic (ROC) curve derived from Gij has an area of AUROC = 0.91, reflecting inherent limitations in separating direct from indirect interactions based on Gij only. In contrast, for Sij we obtain AUROC = (blue), where the true-positive rate reaches 100% with a false-positive rate
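A much smaller version of this test can be sketched as follows. Everything here is an assumption made for illustration: a 30-node random acyclic network stands in for the 5000-node scale-free one, G comes from a linear response model rather than simulated perturbations, `silence` is the assumed form of EQ.X, and AUROC is computed by direct pairwise comparison of scores.

```python
import numpy as np

def silence(G):
    """Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^-1."""
    identity = np.eye(G.shape[0])
    diagonal_part = np.diag(np.diag((G - identity) @ G))
    return (G - identity + diagonal_part) @ np.linalg.inv(G)

def auroc(scores, labels):
    # Probability that a true link outscores a non-link (ties count half).
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

rng = np.random.default_rng(2)
n = 30
# Strictly lower-triangular links give a random acyclic toy network.
links = np.tril(rng.random((n, n)) < 0.1, k=-1)
S_true = np.where(links, rng.uniform(0.3, 0.7, size=(n, n)), 0.0)
G = np.linalg.inv(np.eye(n) - S_true)  # assumed linear response model
S_est = silence(G)

off_diag = ~np.eye(n, dtype=bool)
auroc_g = auroc(np.abs(G[off_diag]), links[off_diag])
auroc_s = auroc(np.abs(S_est[off_diag]), links[off_diag])
print(f"AUROC from G: {auroc_g:.3f}, AUROC from S: {auroc_s:.3f}")
```

In this idealized, noise-free setting the silenced matrix separates links from non-links perfectly; real data with noise, cycles, and hidden nodes would not behave as cleanly.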

The discrimination ratio of Sij, ∆S, is much higher than that of Gij, ∆G. This indicates that Sij is a much better predictor of direct versus indirect interactions. The silencing effect can be quantified in terms of a discrimination ratio k. In the model system it was found that k = 15, i.e. S has 15 times more power than G to discriminate direct from indirect interactions. The longer the distance dij between two nodes, the larger the silencing. Consider a linear cascade in which a change in any node results in a finite response Gij in all other nodes. EQ.X silences all indirect responses while leaving the responses of direct links effectively unchanged, offering a discriminative measure that enables a perfect reconstruction of the original network.
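The linear cascade can be worked through numerically. This sketch assumes a six-node chain with a made-up link strength of 0.5, a linear response model for G, and the assumed form of EQ.X in the hypothetical function `silence`.

```python
import numpy as np

def silence(G):
    """Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^-1."""
    identity = np.eye(G.shape[0])
    diagonal_part = np.diag(np.diag((G - identity) @ G))
    return (G - identity + diagonal_part) @ np.linalg.inv(G)

n, strength = 6, 0.5
# Chain 0 -> 1 -> ... -> 5: each node responds directly to its predecessor.
S_true = np.diag(np.full(n - 1, strength), k=-1)
G = np.linalg.inv(np.eye(n) - S_true)
S_est = silence(G)

# Response of node d to node 0, before and after silencing, by path length d.
for distance in range(1, n):
    print(distance, round(G[distance, 0], 5), round(abs(S_est[distance, 0]), 5))
```

The global response decays with distance but never vanishes, while the silenced response keeps the direct link at distance 1 and drops every longer path to zero, so the chain is reconstructed exactly.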

To test the predictive power of EQ.X on real data, an E. coli data set was used. The input data consists of the expression levels of 4,511 genes measured under 805 different experimental conditions, giving rise to an 805 × 4,511 expression matrix. Three separate global response matrices Gij were generated, based on Pearson correlations, Spearman rank correlations, and mutual information. From each of the three Gij matrices, Sij was obtained using EQ.X, and its performance was compared with that of Gij and validated against the gold standard used in the DREAM5 challenge. The slide reports improvements of 56%, 67%, and 6%.
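The shape of this pipeline can be sketched on synthetic data. The sketch makes several assumptions: a small samples × genes matrix stands in for the E. coli data, only the Pearson and Spearman variants are shown, the rank computation assumes no ties, and `silence` is the assumed form of EQ.X. It also needs more samples than genes so that the correlation matrix is invertible.

```python
import numpy as np

def silence(G):
    """Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^-1."""
    identity = np.eye(G.shape[0])
    diagonal_part = np.diag(np.diag((G - identity) @ G))
    return (G - identity + diagonal_part) @ np.linalg.inv(G)

rng = np.random.default_rng(3)
n_samples = 1000
# Synthetic stand-in data: genes 0 -> 1 -> 2 form a chain, genes 3 and 4 are independent.
g0 = rng.normal(size=n_samples)
g1 = 0.8 * g0 + rng.normal(size=n_samples)
g2 = 0.8 * g1 + rng.normal(size=n_samples)
g3 = rng.normal(size=n_samples)
g4 = rng.normal(size=n_samples)
expression = np.column_stack([g0, g1, g2, g3, g4])  # samples x genes

# Column-wise ranks for Spearman (assumes no tied values).
ranks = np.argsort(np.argsort(expression, axis=0), axis=0).astype(float)

G_pearson = np.corrcoef(expression, rowvar=False)
G_spearman = np.corrcoef(ranks, rowvar=False)
S_pearson = silence(G_pearson)
S_spearman = silence(G_spearman)

print(round(G_pearson[2, 0], 3), round(S_pearson[2, 0], 3))
```

In this toy data the indirect gene pair (2, 0) has a sizeable Pearson correlation that shrinks toward zero after silencing. Scoring the result against a gold standard, as in the DREAM5 comparison, is not shown here.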

Consider a network with 8 nodes, of which 2 are hidden. The resulting sub-network has 6 nodes (light blue), 5 of which constitute a connected component and one of which is isolated. If the silencer equation is applied to the sub-network, it will successfully silence the indirect correlations associated with the unhidden paths of the connected component. However, the correlations between the isolated node and the rest of the network, which cannot be associated with an existing indirect path, will not be silenced. Thus, as long as the isolated node pairs (connected via hidden paths) are a minority, Sij maintains its advantage, but if the majority of nodes become isolated, Sij becomes comparable to Gij and there is no silencing effect. Consider a simple example of a linear cascade i → k → j, in which k is a hidden node and all we are offered is the experimental response between i and j, Gij. Clearly, under these circumstances, EQ.X will not be able to classify the i → j link as indirect. Indeed, it is mathematically impossible to classify this link as direct or indirect, because there is no information in the observed response matrix from which the existence of node k could be inferred.
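The hidden-node cascade can be sketched by deleting the middle node from the response matrix before silencing. As before, the link strength 0.5 is made up and `silence` is the assumed form of EQ.X, not a confirmed implementation.

```python
import numpy as np

def silence(G):
    """Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^-1."""
    identity = np.eye(G.shape[0])
    diagonal_part = np.diag(np.diag((G - identity) @ G))
    return (G - identity + diagonal_part) @ np.linalg.inv(G)

# Full cascade 0 -> 1 -> 2; node 1 plays the role of the hidden node k.
G_full = np.array([
    [1.00, 0.00, 0.00],
    [0.50, 1.00, 0.00],
    [0.25, 0.50, 1.00],
])
observed = [0, 2]  # node 1 is not measured
G_observed = G_full[np.ix_(observed, observed)]

S_full = silence(G_full)
S_observed = silence(G_observed)

print(round(S_full[2, 0], 6))      # indirect response with node 1 observed
print(round(S_observed[1, 0], 6))  # same pair with node 1 hidden
```

With all three nodes observed the indirect response is silenced, but once the middle node is hidden the same response survives unchanged at 0.25, because nothing in the 2×2 matrix reveals the intermediate step.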

This research could be pivotal in tackling a range of problems that involve understanding complex network systems. Such problems span from studying the global spread of disease to analyzing social media data as a way to better understand fields ranging from political science to disaster preparedness. Hence the silencer equation helps translate the ever-growing amount of data on global correlations, which contain both direct and indirect interactions, into valuable local information with only direct interactions.