1
Bi-correlation clustering algorithm for determining a set of co-regulated genes. Bioinformatics, vol. 25, no. 21, 2009. Anindya Bhattacharya and Rajat K. De

2
Outline: Introduction; Bi-correlation clustering algorithm (BCCA); Results; Conclusion

3
Introduction: Biclustering. Biclustering groups the genes and the conditions of a dataset simultaneously, in order to find subgroups of genes that behave similarly over a subset of experimental conditions. The paper proposes a new correlation-based biclustering algorithm, the bi-correlation clustering algorithm (BCCA). BCCA produces a diverse set of biclusters of co-regulated genes. All the genes in a bicluster show a similar change of expression pattern over the bicluster's subset of samples.

4
Introduction: Cluster analysis. Most cluster analysis methods try to find groups of genes that remain co-expressed across all experimental conditions. In reality, genes tend to be co-regulated, and thus co-expressed, under only a few experimental conditions.

5
Bi-correlation clustering algorithm: Notation. There is a set of n genes, and each gene has m expression values. For each gene g_i there is an m-dimensional vector whose j-th component is the j-th expression value of g_i. There is a set of m microarray experiments (measurements). The n genes are to be grouped into K overlapping biclusters.

6
Bi-correlation clustering algorithm: Bicluster. A bicluster can be defined as a subset of genes possessing a similar behavior over a subset of experiments. It is represented as a pair consisting of a subset of genes and a subset of experiments, where each gene in the bicluster is correlated with all other genes in the bicluster, over the measurements in the bicluster, with a correlation value greater than or equal to a specified threshold.

7
Bi-correlation clustering algorithm: BCCA. BCCA uses the Pearson correlation coefficient to measure the similarity between the expression patterns of two genes.
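
As an illustration of the similarity measure named on this slide, the following is a minimal sketch of the Pearson correlation coefficient between two expression profiles. The function name `pearson`, the example profiles, and the choice to return 0.0 for a flat profile are assumptions made for this sketch and do not come from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

# Hypothetical expression profiles of three genes over five measurements.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]   # rises with gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

print(pearson(gene_a, gene_b))  # close to +1
print(pearson(gene_a, gene_c))  # close to -1
```

The coefficient depends only on the shape of the two profiles, not on their scale or offset, which is why it suits comparing changes in expression pattern.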

8
Bi-correlation clustering algorithm. Step 1: The set of biclusters S is initialized to NULL and the number of biclusters, Bicount, is initialized to 0. Step 2A: For each pair of genes in the dataset, BCCA generates a bicluster C under a set of conditions.
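
Steps 1 and 2A can be pictured with a short sketch that creates one starting bicluster per pair of genes. The slide's definition of the starting gene and condition sets did not survive extraction, so this sketch assumes each seed holds the two genes of the pair and every measurement. The names `seed_biclusters`, `S`, and `bicount` are illustrative.

```python
from itertools import combinations

def seed_biclusters(gene_names, n_measurements):
    """Yield one starting bicluster per unordered pair of genes."""
    for gene_a, gene_b in combinations(gene_names, 2):
        yield {
            "genes": {gene_a, gene_b},
            # Assumption: each seed starts with every measurement.
            "measurements": set(range(n_measurements)),
        }

S = []        # Step 1: the collection of accepted biclusters starts empty
bicount = 0   # Step 1: number of accepted biclusters

# Hypothetical gene names; 4 genes give 4 * 3 / 2 = 6 pairs.
seeds = list(seed_biclusters(["g1", "g2", "g3", "g4"], n_measurements=5))
print(len(seeds))
```

For n genes this enumerates n(n-1)/2 pairs, so the number of seeds grows quadratically with the number of genes.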

9
Bi-correlation clustering algorithm. Step 2C: For a pair of genes in C whose correlation does not yet satisfy the bicluster definition, BCCA identifies the sample in C whose deletion causes the maximum increase in the correlation value between the two genes. Depending on a threshold condition, the sample is deleted from the bicluster; otherwise, C is discarded. Deleting the measurement for which the two genes differ most in expression value results in the highest increase in correlation value. BCCA deletes one measurement at a time from the bicluster.
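
The one-measurement-at-a-time deletion in step 2C can be sketched as a greedy loop. This is a sketch under stated assumptions, not the authors' procedure: it tries every possible deletion rather than using the "largest difference" shortcut mentioned on the slide, and the rule for discarding C is garbled in the transcript, so a hypothetical `min_samples` floor stands in for it. The names `trim_pair` and `pearson` are also illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def trim_pair(x, y, threshold, min_samples=3):
    """Drop one measurement at a time until the correlation reaches threshold.

    Returns the kept measurement indices, or None if the pair is discarded.
    `min_samples` is a hypothetical stopping rule, not taken from the slides.
    """
    min_samples = max(min_samples, 2)  # correlation needs at least 2 points
    kept = list(range(len(x)))
    while True:
        r = pearson([x[j] for j in kept], [y[j] for j in kept])
        if r >= threshold:
            return kept
        if len(kept) <= min_samples:
            return None  # too few measurements left: discard the bicluster
        # Try each single deletion and keep the one with the highest correlation.
        best_j, best_r = None, -2.0
        for j in kept:
            rest = [k for k in kept if k != j]
            r_rest = pearson([x[k] for k in rest], [y[k] for k in rest])
            if r_rest > best_r:
                best_j, best_r = j, r_rest
        kept.remove(best_j)

# Hypothetical pair: the profiles agree except at the last measurement.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, -3.0]
print(trim_pair(x, y, threshold=0.95))
```

In this example the last measurement is the only one that breaks the pattern, so removing it alone is enough to pass the threshold.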

10
Bi-correlation clustering algorithm. Step 2D(a): Other genes from the dataset that satisfy the definition of a bicluster are included in C to augment it. Step 2D(b): BCCA checks whether the present bicluster C has already been found. If so, C is not included again; otherwise, C is considered a new bicluster.
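
Steps 2D(a) and 2D(b) can be sketched as an augmentation pass followed by a duplicate check. The sketch assumes that a gene joins C when its correlation with every current member, over the bicluster's measurements, is at least the threshold, and that genes are tried in dataset order. The names `augment` and `add_if_new` and the example data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def augment(expr, seed_genes, measurements, threshold):
    """Step 2D(a) sketch: add genes correlated with every current member."""
    cols = sorted(measurements)

    def profile(gene):
        # Restrict a gene's expression vector to the bicluster's measurements.
        return [expr[gene][j] for j in cols]

    members = list(seed_genes)
    for gene in expr:  # assumption: genes are tried in dataset order
        if gene in members:
            continue
        if all(pearson(profile(gene), profile(m)) >= threshold for m in members):
            members.append(gene)
    return frozenset(members), frozenset(cols)

def add_if_new(found, candidate):
    """Step 2D(b) sketch: keep the candidate only if not found before."""
    if candidate in found:
        return False
    found.add(candidate)
    return True

# Hypothetical expression data: gene name -> values over four measurements.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [1.5, 2.5, 3.4, 4.6],
    "g4": [4.0, 3.0, 2.0, 1.0],
}
found = set()
candidate = augment(expr, ["g1", "g2"], {0, 1, 2, 3}, threshold=0.9)
print(sorted(candidate[0]))
print(add_if_new(found, candidate))
print(add_if_new(found, candidate))
```

Storing each bicluster as a pair of frozensets makes the duplicate check independent of the order in which genes were added.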

11
Bi-correlation clustering algorithm

13
Results: Datasets. We demonstrate the effectiveness of BCCA in determining sets of co-regulated genes (i.e. genes having common transcription factors) and functionally enriched clusters (and attributes) on five datasets.

14
Results: Variation with respect to threshold. Plot for the YCCD dataset: average number of functionally enriched attributes (computed using P-values) versus correlation threshold value.

15
Results. The correlation threshold follows a guideline from a previous study by Allocco et al. (2004), which concluded that if two genes have a correlation between their expression profiles >0.84, then there is a >50% chance of their being bound by a common transcription factor.

16
Results: Locating common transcription factors. First, only biclusters with at most 50 genes are considered. The software TOUCAN 2 (Aerts et al., 2005) is used for performance comparison, extracting the number of transcription factors present in the proximal promoters of all the genes in a single bicluster. The presence of common transcription factors in the promoter regions of a set of genes is good evidence of co-regulation.

17
Results

19
Sequences of all five genes found in a bicluster generated by BCCA from the SPTD dataset. Any transcription factor may be present at more than one location in the upstream region.

20
Results: Functional enrichment (P-value). The functional enrichment of each GO category in each bicluster was assessed with the software Funcassociate (Berriz et al., 2003). The P-value represents the probability of observing, by chance, the number of genes from a specific GO functional category found within each cluster. A low P-value indicates that the genes belonging to the enriched functional categories are biologically significant in the corresponding clusters.

21
Results: P-value of a functional category. Suppose we have a total population of N genes, of which M have a particular annotation. If we observe x genes with that annotation in a sample of n genes, we can calculate the probability of that observation. The P-value is the probability of seeing x or more genes with the annotation, out of n, given that M of the N genes in the population have that annotation.
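
The probability described on this slide is the upper tail of a hypergeometric distribution. The slide's own formula did not survive extraction, so the following is a minimal sketch of the standard calculation using the slide's symbols N, M, n and x. The function name `enrichment_p_value` and the example numbers are made up for illustration.

```python
from math import comb

def enrichment_p_value(N, M, n, x):
    """P(X >= x) when n genes are drawn without replacement from N genes,
    M of which carry the annotation (hypergeometric upper tail)."""
    total = comb(N, n)       # all ways to choose the sample of n genes
    upper = min(n, M)        # cannot observe more annotated genes than this
    tail = sum(
        comb(M, k) * comb(N - M, n - k)  # exactly k annotated genes in the sample
        for k in range(x, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10 genes in total, 4 annotated, a cluster of 3 genes,
# 2 of which are annotated.
print(enrichment_p_value(N=10, M=4, n=3, x=2))  # (36 + 4) / 120 = 1/3
```

Exact integer arithmetic with `math.comb` avoids rounding error until the final division, which matters because enrichment P-values are often very small.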

22
Results. Only functional categories whose P-value passes the reporting cutoff are reported. In the analysis of the 10 biclusters obtained for YCCD, the highly enriched category in Bicluster 1 is ‘ribosome’.

23
Results

25
Conclusion. BCCA is able to find groups of genes that show a similar pattern of variation in their expression profiles over a subset of measurements. It performs better than other biclustering algorithms: it finds a higher number of common transcription factors for the set of genes in a bicluster, and its biclusters are more functionally enriched.
