Dr. Michael R. Hyman: Cluster Analysis

2 Introduction
Also called classification analysis and numerical taxonomy
Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized
No distinction between dependent and independent variables
Find naturally occurring groupings of objects

3 Uses in Studying Consumers
Benefit segmentation
Finding market niches
Finding homogeneous market segments for future study
Data reduction

4 Clusters Formed by Using Data on Two Characteristics

5 Scatter Plot of Income and Education Data for PC Owners and Non-owners

8 Procedure #1: Divisive (tear down)
Start with profile data
Find the variable with the highest variance
Split objects into those above and those below the mean on this variable
Find the remaining variable with the highest variance and split at its mean
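
The divisive steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `divisive_split`, the toy `profiles` data, and the choice to split every object at the overall mean (rather than at the mean within each subgroup) are not from the slides.

```python
from statistics import mean, pvariance

def divisive_split(profiles, n_splits=2):
    """Split objects at the mean of the highest-variance variables.

    profiles: dict mapping object name -> list of numeric profile values.
    Returns a dict mapping object name -> tuple of 0/1 flags, one flag per
    split (0 = at or below the mean, 1 = above the mean).
    Assumption: each split uses the mean over ALL objects.
    """
    names = list(profiles)
    n_vars = len(profiles[names[0]])
    # Variance of each profile variable across all objects.
    variances = [pvariance([profiles[n][j] for n in names]) for j in range(n_vars)]
    # Variable indices ordered from highest to lowest variance.
    order = sorted(range(n_vars), key=lambda j: variances[j], reverse=True)
    labels = {n: () for n in names}
    for j in order[:n_splits]:
        cut = mean(profiles[n][j] for n in names)
        for n in names:
            labels[n] += (1 if profiles[n][j] > cut else 0,)
    return labels

# Hypothetical profile data: [income in $1000s, years of education].
profiles = {"A": [20, 10], "B": [22, 16], "C": [60, 11], "D": [62, 17]}
labels = divisive_split(profiles)
print(labels)
```

Objects that share the same tuple of flags end up in the same group, so two splits yield at most four groups.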

9 Procedure #2: Agglomerative (build up)
Select similarity measure
– Distance (Euclidean, city block)
– Correlation
– Similarity
Search the similarity matrix for the most similar pair of clusters and merge them
Repeat iteratively until only one cluster remains
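
A minimal sketch of the build-up procedure, assuming Euclidean distance and nearest-neighbor (single) linkage. The function name `agglomerate` and the sample `points` are hypothetical. For clarity it recomputes distances on every pass instead of maintaining a similarity matrix, so it suits only small data sets.

```python
from itertools import combinations
from math import dist

def agglomerate(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def gap(pair):
        # Single linkage: distance between the two closest members.
        a, b = pair
        return min(dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters.
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        history.append((sorted(clusters[a]), sorted(clusters[b]), gap((a, b))))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical objects described by two characteristics.
points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (10, 0)]
history = agglomerate(points)
for left, right, d in history:
    print(left, "+", right, "at distance", round(d, 3))
```

The recorded merge distances are what a dendrogram plots, and they feed directly into the choice of where to stop merging.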

10 Commonly Used Similarity Coefficients

11 Procedure #2: Agglomerative (Stopping Rules)
Theory and practice
Distance at which clusters combine
Within/between group variance
Relative sizes of clusters
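
One way to operationalize the "distance at which clusters combine" rule is to stop just before the merge that produces the largest jump in merge distance. This is a common heuristic and an assumption here, not necessarily the rule the slides intend. The function name and the sample distances are hypothetical.

```python
def clusters_at_largest_jump(merge_distances):
    """Suggest a number of clusters from successive merge distances.

    merge_distances[k] is the distance at which the (k+1)-th merge occurred,
    so n objects produce n - 1 entries. Needs at least two merges.
    """
    n_objects = len(merge_distances) + 1
    # Increase in merge distance from each merge to the next.
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    # Index of the last merge made before the biggest jump.
    k = max(range(len(jumps)), key=jumps.__getitem__)
    # After k + 1 merges, n_objects - (k + 1) clusters remain.
    return n_objects - (k + 1)

# Hypothetical merge distances for five objects.
suggested = clusters_at_largest_jump([1.0, 1.5, 6.4, 7.1])
print(suggested)  # 3
```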

12 Procedure #2: Agglomerative (Linkage Methods)
Single (nearest neighbor)
– Makes long, thin clusters
Complete (maximum distance to farthest neighbor)
– Sensitive to outliers
Average distance between objects
Variance methods (minimum within-cluster variance)
Nodal (begin with two least similar objects as nodes)
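
The first three linkage methods differ only in how they summarize the object-to-object distances between two clusters. The sketch below assumes Euclidean distance; the function name `linkage_distance` and the sample clusters are hypothetical, and the variance and nodal methods are not covered.

```python
from math import dist
from statistics import mean

def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters under a given linkage method."""
    # All object-to-object distances across the two clusters.
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # nearest neighbor
        return min(pairwise)
    if method == "complete":    # farthest neighbor
        return max(pairwise)
    if method == "average":     # mean of all cross-cluster distances
        return mean(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical clusters of two objects each.
a = [(0, 0), (1, 0)]
b = [(3, 0), (6, 0)]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(a, b, method))
```

Because complete linkage uses the maximum, one extreme object can change the result, which is why that method is sensitive to outliers.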

15 Procedure #2: Agglomerative (Reliability and Validity Assessment)
Use different distance measures
Use different clustering methods
Split data, run both halves, and compare
Shuffle cases (objects)
Solve with subset of profile variables
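
Each of these checks ends with a comparison of two cluster solutions for the same objects. The slides do not name an agreement measure. As an assumption, the sketch below uses the Rand index: the share of object pairs that both solutions treat alike, either grouping the pair together or keeping it apart. The label lists are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two cluster solutions agree.

    labels_a and labels_b give each object's cluster label in the two
    solutions. Label values need not match across solutions.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical solutions for five objects, e.g. from two distance measures.
solution_1 = [0, 0, 1, 1, 2]
solution_2 = [1, 1, 0, 0, 0]
print(rand_index(solution_1, solution_2))  # 0.8
```

A value of 1.0 means the two solutions produce identical groupings even if the cluster labels differ.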

16 General Problems
Early assignments treated as permanent
– Precludes later revision for improved fit
Number of clusters
– More clusters mean greater intra-group homogeneity but less descriptive power
No good measure of cluster compactness
Lack of statistical properties makes inference difficult

17 General Problems (cont.)
Coping with inter-correlated profile variables
Must select profile variables that can discriminate among objects
Sensitive to unit of measurement and outliers
– Fix: Standardize data and delete outliers
Subjective interpretation of results (i.e., naming clusters)
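
The standardization fix can be sketched as a z-score transformation, so that a variable measured in dollars does not dominate one measured in years. The function name `standardize` and the sample rows are hypothetical, and the sketch assumes no variable is constant (a zero standard deviation would cause a division error).

```python
from statistics import mean, pstdev

def standardize(rows):
    """Convert each profile variable (column) to z-scores.

    rows: list of objects, each a list of numeric profile values.
    Assumption: every column has a nonzero standard deviation.
    """
    columns = list(zip(*rows))
    centers = [mean(c) for c in columns]
    spreads = [pstdev(c) for c in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, centers, spreads)]
        for row in rows
    ]

# Hypothetical data: [annual income in dollars, years of education].
rows = [[30000, 12], [50000, 16], [70000, 20]]
z = standardize(rows)
for row in z:
    print([round(v, 3) for v in row])
```

After standardizing, both variables in this toy data set contribute equally to any distance calculation, whereas the raw income values would otherwise swamp education.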

18 Steps for Conducting a Cluster Analysis: A Summary
