1 Introduction to Bioinformatics - Tutorial no. 12
Expression Data Analysis:
- Clustering
- GEO
- EPClust
2 Application of Microarrays
- We only know the function of about 20% of the 30,000 genes in the human genome
- Gene exploration
- Faster and better
- Applications:
  - Evolution
  - Behavior
  - Cancer research
3 Microarray Analysis
- Unsupervised grouping: clustering
- Pattern discovery via grouping similarly expressed genes together
- Three techniques most often used:
  - k-Means Clustering
  - Hierarchical Clustering
  - Kohonen Self-Organizing Feature Maps
4 Hierarchical Agglomerative Clustering
- Michael Eisen, 1998
  - Cluster (algorithm)
  - TreeView (visualization)
- Step 1: Compute a similarity score between all pairs of genes
  - Pearson correlation
  - Euclidean distance
- Step 2: Find the two most similar genes and replace them with a node that contains their average
  - Builds a tree of genes
- Step 3: Repeat
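The three steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the code of the Cluster program: the function names (`pearson`, `euclidean`, `agglomerate`) and the toy gene names are assumptions made for the example, similarity is taken to be Pearson correlation, and a merged node simply stores the unweighted average of its two children.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Undefined (division by zero) if either profile is constant.
    return cov / (sx * sy)


def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def agglomerate(profiles):
    """Build a tree (nested tuples of gene names) from {gene: profile}.

    Assumption for this sketch: similarity is Pearson correlation and a
    merged node keeps the plain average of its two children's profiles.
    """
    nodes = [(name, list(vec)) for name, vec in profiles.items()]
    while len(nodes) > 1:
        # Steps 1 and 2: score all pairs and pick the most similar one.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: pearson(nodes[p[0]][1], nodes[p[1]][1]),
        )
        (tree_i, vec_i), (tree_j, vec_j) = nodes[i], nodes[j]
        merged = ((tree_i, tree_j), [(a + b) / 2 for a, b in zip(vec_i, vec_j)])
        # Step 3: replace the pair with the merged node and repeat.
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]


# Hypothetical toy data: g1/g2 rise together, g3/g4 fall together.
toy = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 1],
}
tree = agglomerate(toy)
```

With Euclidean distance instead of correlation, the pair to merge would be the one with the smallest distance, so `max` would become `min`.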
5 Agglomerative Hierarchical Clustering
- Need to define the distance between the new cluster and the other clusters.
  - Single Linkage: distance between the closest pair.
  - Complete Linkage: distance between the farthest pair.
  - Average Linkage: average distance between all pairs, or distance between cluster centers.
- [Figure: dendrogram showing the distance between joined clusters]
- The dendrogram induces a linear ordering of the data points
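The linkage rules above differ only in how the pairwise distances between two clusters are summarized. A minimal sketch, assuming each cluster is a list of points and the point-to-point distance is Euclidean; the function names are chosen for this example and are not taken from any particular package.

```python
import math
from itertools import product


def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def single_linkage(a, b, dist=euclidean):
    """Distance between the closest pair (one point from each cluster)."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b, dist=euclidean):
    """Distance between the farthest pair (one point from each cluster)."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b, dist=euclidean):
    """Average distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_linkage(a, b, dist=euclidean):
    """Distance between the cluster centers (coordinate-wise means)."""
    center_a = [sum(col) / len(a) for col in zip(*a)]
    center_b = [sum(col) / len(b) for col in zip(*b)]
    return dist(center_a, center_b)


# Hypothetical one-dimensional clusters: pairwise distances are 4, 6, 3, 5.
left = [(0.0,), (1.0,)]
right = [(4.0,), (6.0,)]
```

For these two toy clusters, single linkage reports the smallest of the four pairwise distances and complete linkage the largest, which is why single linkage tends to chain clusters together while complete linkage favors compact ones.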
6 Results of Clustering Gene Expression
- CLUSTER is simple and easy to use
- De facto standard for microarray analysis
- Limitations:
  - Hierarchical clustering in general is not robust
  - Genes may belong to more than one cluster, but the tree places each gene in only one
7 K-Means Clustering Algorithm
- Randomly initialize k cluster means
- Iterate:
  - Assign each gene to the nearest cluster mean
  - Recompute cluster means
- Stop when clustering converges
- Notes:
  - Really fast
  - Genes are partitioned into clusters
  - How do we select k?
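The loop on this slide can be written out as a short plain-Python sketch. It makes several assumptions not stated on the slide: the initial means are k randomly chosen data points, "nearest" means smallest squared Euclidean distance, convergence means the assignments stop changing, and the names `k_means`, `max_iter`, and `seed` are illustrative.

```python
import random


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def k_means(points, k, max_iter=100, seed=0):
    """Return (assignment, means) for a list of expression profiles."""
    rng = random.Random(seed)
    # Randomly initialize k cluster means (here: k distinct data points).
    means = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each gene to the nearest cluster mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        # Stop when the clustering no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute cluster means; an empty cluster keeps its old mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = mean(members)
    return assignment, means


# Hypothetical toy data: two well-separated groups of three profiles each.
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers = k_means(data, k=2)
```

The result depends on the random initialization, so on real data it is common to run the procedure several times with different seeds and compare the partitions.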