Networks are useful for describing systems of interacting objects, where the nodes represent the objects and the edges represent the interactions between them.


3 Networks are useful for describing systems of interacting objects, where the nodes represent the objects and the edges represent the interactions between them. Similarly, in a gene network each node corresponds to a gene, and a pair of nodes is connected with an edge if there is a significant co-expression relationship between them. But unlike many networks, such as the components inside your car engine or the wires inside a robot, biological systems are black boxes. We can observe the outcome of their interactions, but not the interactions themselves.

4 In the context of biology, link prediction refers to the problem of identifying functional links between genes from data that may be confounded by indirect effects. Suppose we are looking at three genes A, B, and C. Gene A inhibits the expression of gene B, and gene B inhibits the expression of gene C. If the expression of A increases, it will decrease the expression of B, which in turn increases the expression of C. One might therefore observe a correlation between the expression levels of genes A and C, even though there is no direct interaction between them. Gene expression helps in identifying active and inactive genes in a cell.
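The three-gene cascade can be illustrated with a small simulation. This is a minimal sketch and not part of the presentation: the linear inhibition strength (-0.9), the noise level, and the number of samples are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 2000  # hypothetical number of conditions

# Assumed linear model: A varies freely, B is inhibited by A, C is inhibited by B.
a = rng.normal(size=n_samples)
b = -0.9 * a + 0.3 * rng.normal(size=n_samples)
c = -0.9 * b + 0.3 * rng.normal(size=n_samples)

r_ab = np.corrcoef(a, b)[0, 1]  # direct link A -| B: strongly negative
r_bc = np.corrcoef(b, c)[0, 1]  # direct link B -| C: strongly negative
r_ac = np.corrcoef(a, c)[0, 1]  # no direct link, yet strongly positive

print(f"r(A,B)={r_ab:.2f}  r(B,C)={r_bc:.2f}  r(A,C)={r_ac:.2f}")
```

Under these assumptions the A-C correlation is large even though the model contains no direct A-C term, which is the confounding effect described above.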

5 We start with the gene expression values of m genes across n samples (conditions), so the input data is an m×n matrix. In the first step, a similarity score (co-expression measure) is calculated between each pair of rows in the expression matrix. The result is an m×m matrix in which each element shows how similarly the expression levels of two genes change together. Common similarity measures are Pearson correlation, mutual information, Euclidean distance, and Spearman rank correlation.
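As a rough illustration of this first step, the sketch below builds an m×m Pearson similarity matrix from a toy m×n expression matrix. The values in `expr` are made up for the example; `np.corrcoef` treats each row as one variable, which matches the genes-as-rows layout described above.

```python
import numpy as np

# Toy expression matrix: m = 3 genes (rows), n = 5 samples (columns).
# The values are invented for illustration only.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],       # gene 0
    [3.0, 5.0, 7.0, 9.0, 11.0],      # gene 1: rises with gene 0
    [-1.0, -2.0, -3.0, -4.0, -5.0],  # gene 2: falls as gene 0 rises
])

# Rows are variables, so the result is an m x m similarity matrix.
sim = np.corrcoef(expr)

print(sim.shape)        # (3, 3)
print(np.round(sim, 2))
```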

6 Pearson correlation is a measure of the linear correlation between two variables X and Y, giving a value between +1 and −1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation. Mutual information measures how much knowing the expression levels of one gene reduces the uncertainty about the expression levels of another. Euclidean distance measures the geometric distance between two vectors, and so considers both the direction and the magnitude of the vectors of gene expression values.
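The other measures can be sketched in a few lines of NumPy. These helper functions are hypothetical illustrations, not the presenters' code: the Spearman version assumes there are no tied values, and the mutual information estimate assumes a simple equal-width histogram with an arbitrary bin count.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation; assumes no tied values."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

def euclidean(x, y):
    """Geometric distance between two expression vectors."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

def mutual_information(x, y, bins=4):
    """Histogram-based mutual information in bits (bin count is an assumption)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

x = np.arange(8, dtype=float)
print(spearman(x, x ** 3))          # monotonic relationship
print(euclidean([0, 0], [3, 4]))
print(mutual_information(x, x))
```

Note that Euclidean distance is a dissimilarity (small means similar), so it has to be inverted or thresholded from below before it is used as a similarity score.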

7 Elements in the similarity matrix that are above a certain threshold (for example, ≥ 0.8) are replaced by 1, and the remaining elements are replaced by 0.
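The thresholding step might look like the sketch below. The similarity values are invented, and clearing the diagonal (so that a gene is not linked to itself) is an assumption that the slide does not state.

```python
import numpy as np

def threshold_network(sim, threshold=0.8):
    """Binarize a similarity matrix: 1 where sim >= threshold, else 0."""
    adjacency = (np.asarray(sim, float) >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)  # assumption: drop self-links
    return adjacency

# Made-up similarity matrix for three genes.
sim_example = np.array([
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.5],
    [0.2, 0.5, 1.0],
])

adjacency = threshold_network(sim_example, threshold=0.8)
print(adjacency)
```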

8 The global response matrix, G, can be measured directly from gene expression measurements. It captures the change in node i's activity in response to changes in node j's. Gij cannot distinguish between direct and indirect relationships: a path i → k → j can result in a measurable response observed between i and j, falsely suggesting the existence of a direct link between them. The local response matrix, Sij, is defined with a partial derivative: the “∂” indicates that Sij captures only local effects, i.e. the response of i to changes in j when all surrounding nodes except i and j remain unchanged.

9 We have already seen the global and local response matrices, G and S respectively. To extract Sij from the experimentally accessible Gij, we formally link the two. Solving this relation gives EQ.X, which provides Sij from the experimentally accessible Gij by ‘silencing’ indirect responses and preserving direct response terms.
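The equations themselves did not survive in this transcript. The sketch below assumes that EQ.X has the matrix form S = (G − I + D((G − I)G)) G⁻¹, where I is the identity matrix and D(·) keeps only the diagonal of its argument; treat this form as an assumption to check against the original paper. The function name `silence` and the toy cascade are hypothetical. Entry [row, col] is read as the response of node `row` to node `col`.

```python
import numpy as np

def silence(G):
    """Estimate the local response matrix S from the global response matrix G.

    Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^{-1},
    where D(.) zeroes all off-diagonal entries.
    """
    G = np.asarray(G, dtype=float)
    I = np.eye(G.shape[0])
    D = np.diag(np.diag((G - I) @ G))
    return (G - I + D) @ np.linalg.inv(G)

# Toy linear cascade 0 -> 1 -> 2 with made-up direct response strengths.
S_true = np.array([
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],   # node 1 responds directly to node 0
    [0.0, 0.5, 0.0],   # node 2 responds directly to node 1
])

# For a cascade, the global response adds the two-step path:
# G[2, 0] = 0.5 * 0.8 = 0.4 although nodes 0 and 2 are not linked.
G = np.eye(3) + S_true + S_true @ S_true

S_est = silence(G)
print(np.round(S_est, 3))
```

In this toy case the indirect entry G[2, 0] = 0.4 is silenced to zero in `S_est`, while the two direct entries are recovered unchanged.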

10 Thresholding: The experimentally observed global response matrix, Gij, accounts for direct as well as indirect correlations, with no clear separation between them. Thresholding predicts spurious links (thick dashed lines) and overlooks true links (thin solid lines). Thus, although the average Gij terms associated with direct links are higher than the average terms associated with indirect links, as captured by the discrimination ratio, ∆G, the difference is not sufficient to fully discriminate between direct and indirect links. ∆G = ⟨Gij⟩direct / ⟨Gij⟩indirect

11 If we apply the silencing method instead of thresholding, the flow from the source j to the target i is carried through the indirect effect Gkj (brown) coupled with the direct impact Sik of the target's nearest neighbor k. By silencing the indirect contributions, EQ.X provides the local response matrix, Sij, whose nonzero elements correspond to direct links. As indirect terms become much smaller in Sij, we obtain a greater discrimination ratio, ∆S. ∆S = ⟨Sij⟩direct / ⟨Sij⟩indirect
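A discrimination ratio of this kind can be computed as the mean response over linked pairs divided by the mean response over unlinked pairs. The sketch below is illustrative only: taking absolute values, excluding the diagonal, and the toy matrices are all assumptions.

```python
import numpy as np

def discrimination_ratio(response, links):
    """Mean |response| over direct links divided by mean over non-links."""
    response = np.abs(np.asarray(response, dtype=float))
    links = np.asarray(links, dtype=bool)
    off_diagonal = ~np.eye(response.shape[0], dtype=bool)
    direct = response[links & off_diagonal].mean()
    indirect = response[~links & off_diagonal].mean()
    return direct / indirect

# Toy global response matrix for a cascade 0 -> 1 -> 2 (made-up values);
# entry [row, col] is the response of node `row` to node `col`.
G_toy = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 1.0, 0.0],
    [0.4, 0.5, 1.0],
])
true_links = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
])

delta_G = discrimination_ratio(G_toy, true_links)
print(delta_G)
```

If the indirect terms are silenced all the way to zero, the denominator vanishes and the ratio diverges, so on real data it is worth guarding against division by zero.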

12 The authors used a scale-free network consisting of N = 5000 nodes and L = 20000 links to test the power of EQ.X. Gij was obtained by perturbing the activity of each node, and Sij was then calculated using EQ.X. The Gij and Sij values associated with interacting and non-interacting node pairs were then compared. Sij silences the correlations associated with indirect interactions, resulting in a clear separation between direct and indirect interactions, a phenomenon absent from Gij. Indeed, the receiver operating characteristic (ROC) curve derived from Gij has an area of AUROC = 0.91, reflecting inherent limitations in separating direct from indirect interactions based on Gij only. In contrast, for Sij we obtain AUROC = 0.997 (blue), where the true-positive rate reaches 100% with a false-positive rate
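For reference, the AUROC values quoted on this slide can be computed from any score matrix and a set of true links. The sketch below uses the pairwise (Mann-Whitney) definition of AUROC; the function name and the example scores are hypothetical and unrelated to the 5000-node experiment.

```python
import numpy as np

def auroc(scores, labels):
    """Probability that a true link outscores a non-link (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    positives = scores[labels]
    negatives = scores[~labels]
    diff = positives[:, None] - negatives[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Made-up scores for four node pairs; label 1 marks a true direct link.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
mixed = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
print(perfect, mixed)
```

The pairwise comparison builds a positives × negatives array, so for a network of the size described above a rank-based implementation would be more memory-friendly.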

13 The discrimination ratio of Sij, ∆S, is much higher than that of Gij, ∆G. This indicates that Sij is a much better predictor of direct versus indirect interactions. This silencing effect can be quantified by a ratio k. In the model system it was found that k = 15, i.e. S has 15 times more power to discriminate direct from indirect interactions than G. The longer the distance dij between two nodes, the larger the silencing. Consider a linear cascade in which changes in any node result in a finite response Gij by all other nodes. EQ.X silences all indirect responses while leaving the response of direct links effectively unchanged, offering a discriminative measure that enables a perfect reconstruction of the original network.

14 To test the predictive power of EQ.X on real data, an E. coli data set was used. The input data include the expression levels of 4,511 genes measured under different experimental conditions, giving rise to an 805 × 4,511 expression matrix. Three separate global response matrices Gij were generated, based on Pearson correlations, Spearman rank correlations, and mutual information. From each of the three Gij matrices, Sij was obtained using EQ.X, and its performance was compared with that of Gij and validated against the gold standard used in the DREAM5 challenge. Figure labels: 56% improvement, 67% improvement, 6% improvement.

15 Consider a network with 8 nodes, of which 2 are hidden. The resulting sub-network has 6 nodes (light blue), 5 of which constitute a connected component and one of which is isolated. If the silencer equation is applied to the sub-network, it will successfully silence the indirect correlations associated with the unhidden paths of the connected component. However, the correlations between the isolated node and the rest of the network, which cannot be associated with an existing indirect path, will not be silenced. Thus, as long as the isolated node pairs (connected via hidden paths) are a minority, Sij maintains its advantage; but if the majority of nodes become isolated, Sij becomes comparable to Gij and there is no silencing effect. Consider a simple example of a linear cascade i → k → j, in which k is a hidden node and all we are offered is the experimental response of i to j, Gij. Clearly, under these circumstances, EQ.X will not be able to classify the i → j link as indirect. Indeed, it is mathematically impossible to classify this link as direct or indirect, as there is no information in the observed response matrix from which the existence of node k could be inferred.
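The hidden-node limitation can be reproduced on a toy cascade. As before, this sketch assumes that EQ.X has the form S = (G − I + D((G − I)G)) G⁻¹, which is not preserved in the transcript; the node indices and response strengths are made up, and entry [row, col] is read as the response of node `row` to node `col`.

```python
import numpy as np

def silence(G):
    """Assumed form of EQ.X: S = (G - I + D((G - I) G)) G^{-1}."""
    G = np.asarray(G, dtype=float)
    I = np.eye(G.shape[0])
    D = np.diag(np.diag((G - I) @ G))
    return (G - I + D) @ np.linalg.inv(G)

# Cascade with three nodes: index 0 drives index 1 (the middle node k),
# which drives index 2. Strengths are invented for illustration.
S_true = np.array([
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])
G_full = np.eye(3) + S_true + S_true @ S_true

# All nodes observed: the indirect end-to-end response is silenced.
S_full = silence(G_full)

# Middle node hidden: only the two end nodes are measured.
observed = [0, 2]
G_observed = G_full[np.ix_(observed, observed)]
S_observed = silence(G_observed)

print(np.round(S_full[2, 0], 3))      # end-to-end term with k observed
print(np.round(S_observed[1, 0], 3))  # same pair with k hidden
```

With the middle node hidden, the end-to-end response of 0.4 passes through unchanged, because nothing in the observed 2×2 matrix indicates that an intermediate node exists.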

16 This research could be pivotal in tackling a range of problems that involve understanding complex network systems. This work spans from studying the global spread of disease to analyzing social media data as a way to better understand fields ranging from political science to disaster preparedness. Hence the silencer equation helps translate the ever-growing amount of data on global correlations, which contain both direct and indirect interactions, into valuable local information with only direct interactions.
