Multivariate statistical methods Cluster analysis.


Multivariate statistical methods Cluster analysis

Multivariate methods
- multivariate dataset: a group of n objects described by m variables (as a rule n > m, if possible)
- confirmatory vs. exploratory analysis
  - confirmatory: focuses on parameter estimation and hypothesis testing
  - exploratory: focuses on exploring the data and finding patterns and structure

Multivariate statistical methods
- Unit classification
  - Cluster analysis
  - Discriminant analysis
- Analysis of relations among variables
  - Canonical correlation analysis
  - Factor analysis
  - Principal component analysis

Unit classification methods

Cluster analysis (CA)
- the aim is to find groups of objects that are similar to each other and different from the objects in other groups
- methods of cluster analysis:
  - hierarchical
  - nonhierarchical

1. Hierarchical methods
- create clusters at different levels (clusters at a higher level contain the clusters of lower levels)
- the results form a tree structure and are presented as a dendrogram
- two things must be specified:
  - the similarity measure
  - the clustering algorithm
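The tree-building idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Euclidean distance as the similarity measure and nearest-neighbor linkage as the clustering algorithm, and the function name `agglomerate` and the sample points are hypothetical, not taken from the slides. The returned merge history is the information a dendrogram displays.

```python
import math

def agglomerate(points):
    """Bottom-up clustering sketch; returns the list of merges (the tree)."""
    # Start with every object in its own cluster (clusters hold object indices).
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest nearest-neighbor distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Record which clusters were joined and at what distance level.
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

# Hypothetical data: two well-separated pairs of objects.
sample = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
merges = agglomerate(sample)
for left, right, level in merges:
    print(left, right, level)
```

Each row of the output is one junction of the dendrogram: the two clusters joined and the distance at which they were joined.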

Hierarchical methods: expressing similarity
- qualitative variables: number of identical values / number of all values
- quantitative variables:
  - Euclidean distance
  - Manhattan (city-block) distance, referred to on these slides as Hamming distance
  - Chebyshev distance

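Similarity measures
- Euclidean distance: $d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2}$
- Manhattan (Hamming) distance: $d_{ij} = \sum_{k=1}^{n} |x_{ik} - x_{jk}|$
- Chebyshev distance: $d_{ij} = \max_{k} |x_{ik} - x_{jk}|$
where x_ik and x_jk are the values of the k-th characteristic for objects i and j, whose distance is measured in n-dimensional space, and n is the number of observed characteristics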
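As a quick check of the three distances named above, here is a minimal Python sketch. The function names and the two sample objects are hypothetical and chosen only for illustration.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Two hypothetical objects described by n = 2 characteristics.
obj_i = (1.0, 2.0)
obj_j = (4.0, 6.0)

print(euclidean(obj_i, obj_j))  # 5.0
print(manhattan(obj_i, obj_j))  # 7.0
print(chebyshev(obj_i, obj_j))  # 4.0
```

For the same pair of objects the Chebyshev distance is never larger than the Euclidean distance, which in turn is never larger than the Manhattan distance.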

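Distance of objects in 2D
Points at the same distance from a given object form:
- a circle for the Euclidean distance
- an inner square for the Manhattan (Hamming) distance
- an outer square for the Chebyshev distance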

Other types of similarity measures
- Power distance: the exponent p is defined by the user; the higher p is, the more weight larger distances receive and the less significant smaller distances become. The parameter r has the opposite effect.
- 1 - Pearson r: unsuitable for a small number of dimensions
- Percent disagreement: suitable for categorical variables
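The slide does not give formulas for these measures, so the sketch below rests on assumptions: the power distance is taken in the commonly used form (sum of |x_k - y_k|^p) raised to 1/r, 1 - Pearson r uses the ordinary sample correlation, and percent disagreement is the share of positions where two objects differ. All function names and data are hypothetical.

```python
import math

def power_distance(x, y, p, r):
    """Assumed form: (sum |x_k - y_k|**p) ** (1/r)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)

def one_minus_pearson(x, y):
    """1 minus the Pearson correlation of the two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)

def percent_disagreement(x, y):
    """Share of characteristics on which the two objects differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

# With p = r = 2 the power distance reduces to the Euclidean distance.
print(power_distance((1.0, 2.0), (4.0, 6.0), p=2, r=2))
# Perfectly opposite profiles give a value close to 2.
print(one_minus_pearson((1.0, 2.0, 3.0), (3.0, 2.0, 1.0)))
# Two of four categorical values differ.
print(percent_disagreement(("a", "b", "c", "d"), ("a", "x", "c", "y")))  # 0.5
```

With only two or three dimensions the correlation in `one_minus_pearson` is estimated from very few values, which is one way to read the warning about a small number of dimensions.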

Algorithms of clustering
- Nearest neighbor linkage: the distance between two clusters is defined as the distance between their two nearest objects
- Furthest neighbor linkage: the distance between two clusters is defined as the distance between their two furthest objects
- Unweighted group average linkage: the distance between two clusters is defined as the average distance over all pairs of objects in which the first member comes from the first cluster and the second member from the second cluster
- Weighted group average linkage: as the previous method, but it additionally uses cluster size (the number of objects) as weights
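The first three linkage rules differ only in how they summarize the pairwise distances between two clusters. A minimal sketch, assuming Euclidean distance between objects and using hypothetical function names and data:

```python
import math
from itertools import product

def pairwise(cluster_a, cluster_b):
    """All distances with one member from each cluster."""
    return [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]

def nearest_neighbor(cluster_a, cluster_b):
    return min(pairwise(cluster_a, cluster_b))

def furthest_neighbor(cluster_a, cluster_b):
    return max(pairwise(cluster_a, cluster_b))

def group_average(cluster_a, cluster_b):
    d = pairwise(cluster_a, cluster_b)
    return sum(d) / len(d)

# Two hypothetical clusters of two objects each.
first = [(0.0, 0.0), (0.0, 1.0)]
second = [(3.0, 0.0), (4.0, 0.0)]

print(nearest_neighbor(first, second))   # 3.0
print(furthest_neighbor(first, second))  # sqrt(17), about 4.12
print(group_average(first, second))      # lies between the two values above
```

The group average always lies between the nearest-neighbor and furthest-neighbor distances, which is why it is often seen as a compromise between the two.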

Algorithms of clustering
- Unweighted centroid: the distance between two clusters is defined as the distance between their centroids. The centroid is the vector of averages (each coordinate is the average of the corresponding coordinates of the objects in the cluster).
- Weighted centroid: as the previous method, but it additionally uses cluster size (the number of objects) as weights
- Ward's method: unlike the previous methods, it uses analysis of variance to evaluate the distance between clusters. Clusters are merged so that the within-cluster sum of squares remains minimal.
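The centroid and Ward criteria can be illustrated as follows. The sketch assumes one common way of expressing Ward's rule, namely the increase in the within-cluster sum of squares caused by merging two clusters; the function names and data are hypothetical.

```python
import math

def centroid(cluster):
    """Vector of coordinate averages of the objects in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def within_ss(cluster):
    """Sum of squared distances of the objects from their centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(point, c)) for point in cluster)

def centroid_distance(cluster_a, cluster_b):
    return math.dist(centroid(cluster_a), centroid(cluster_b))

def ward_increase(cluster_a, cluster_b):
    """Growth of the within-cluster sum of squares if the clusters merge."""
    merged = list(cluster_a) + list(cluster_b)
    return within_ss(merged) - within_ss(cluster_a) - within_ss(cluster_b)

# Two hypothetical clusters.
left = [(0.0, 0.0), (0.0, 2.0)]
right = [(4.0, 0.0), (4.0, 2.0)]

print(centroid(left), centroid(right))   # (0.0, 1.0) (4.0, 1.0)
print(centroid_distance(left, right))    # 4.0
print(ward_increase(left, right))        # 16.0
```

At each step Ward's method would merge the pair of clusters with the smallest such increase.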

2. Nonhierarchical methods
- the most widely used method is K-means
- the algorithm is based on moving objects among clusters
- the number of clusters is defined in advance, either at random or according to the analyst's experience
- centroids are computed for all clusters
- in each step all objects are examined: if an object is nearest to the centroid of its own cluster, it stays in that cluster; if not, it is moved to the cluster whose centroid is nearest
- the within-cluster sum of squares should be minimal
- the procedure is repeated until no object needs to be moved; this is the final solution
- no distance matrix is needed, so K-means is suitable for clustering large numbers of objects
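The procedure described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the initial centroids are passed in by the caller (here chosen by hand), Euclidean distance is assumed, and the names `k_means`, `mean_point` and the sample data are hypothetical.

```python
import math

def mean_point(points):
    """Centroid: coordinate-wise average of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Examine all objects: each one goes to the nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Stop when no object had to be moved.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute the centroid of every non-empty cluster.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean_point(members)
    return labels, centroids

# Hypothetical data with two visible groups, K = 2.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(data, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Only the distances from each object to the K centroids are computed in every pass, so no full object-by-object distance matrix is ever stored.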