Data Mining Techniques: Clustering

Purpose: In clustering analysis, there is no pre-classified data. Instead, clustering analysis is a process where a set of objects is partitioned into several clusters. All members of one cluster are similar to each other and different from the members of other clusters, according to some similarity metric (e.g., the opposite of the distance between objects).

Cluster Analysis [Figure: customers (objects) plotted on two variables, X (Income) and Y (Age), and grouped into clusters]

Cluster Analysis: Data Matrix (n objects × p variables) and Dissimilarity Matrix (n × n)

Attribute Types Involved in Cluster Analysis
Interval Variables
–An interval variable contains continuous measurements (e.g., height, weight, temperature, cost) that follow a linear scale.
–It is essential that intervals keep the same importance throughout the scale.
Nominal Variables
–A nominal variable takes on more than two states. For example, a person's eye color can be blue, brown, green or grey.
–These states may be coded as 1, 2, ..., M, but neither their order nor the interval between any two states has any meaning.

Attribute Types Involved in Cluster Analysis
Ordinal Variables
–An ordinal variable takes on more than two states. For example, you may ask someone to rate their appreciation of some paintings using the following categories: 1=detest, 2=dislike, 3=indifferent, 4=like and 5=admire.
–In an ordinal variable, the states are ordered in a meaningful sequence. However, the intervals between consecutive states are not necessarily equal.
Binary Variables
–Binary variables have only two possible states. For example, the gender of a person is either female or male.

Dissimilarity (Distance) Measure
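
As a minimal sketch of how a dissimilarity matrix can be derived from a data matrix, the Python below assumes Euclidean distance as the measure; the transcript does not say which measure the slides use. The function names and the customer values are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(data, dist=euclidean):
    """Build the symmetric n x n dissimilarity matrix of an n x p data matrix."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The matrix is symmetric, so each pair is computed once.
            d[i][j] = d[j][i] = dist(data[i], data[j])
    return d


# Hypothetical data matrix: 4 objects (customers) x 2 variables.
customers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
D = dissimilarity_matrix(customers)
```

Any other distance function with the same two-argument shape (for example Manhattan distance) could be passed as `dist` instead.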

Categorization of Clustering Methods Exclusive vs. Non-Exclusive (Overlapping) Hierarchical Methods vs. Partitioning Methods Hierarchical Methods –Single Link Method –Complete Link Method Partitioning Methods –Kohonen Self-Organizing Feature Maps –K-Means Methods –K-Medoids Methods (PAM, CLARA, CLARANS) –Density-Based Methods –…

Hierarchical Methods: Dissimilarity Matrix (5 × 5)
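
The single link and complete link methods can be sketched as one agglomerative procedure that differs only in how the distance between two clusters is defined. The Python below is an illustrative sketch, not the slides' own worked example: the 5 × 5 matrix is hypothetical (the original matrix did not survive in the transcript), and the function name `agglomerative` is an assumption.

```python
def agglomerative(D, k, linkage="single"):
    """Merge the two closest clusters until k clusters remain.

    D is a symmetric dissimilarity matrix (list of lists).
    linkage="single" measures two clusters by their closest pair of members,
    linkage="complete" by their farthest pair.
    """
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


# Hypothetical 5 x 5 dissimilarity matrix for objects 0..4.
D5 = [
    [0, 10, 25, 45, 68],
    [10, 0, 15, 35, 58],
    [25, 15, 0, 20, 43],
    [45, 35, 20, 0, 23],
    [68, 58, 43, 23, 0],
]

single = agglomerative(D5, 2, "single")      # [[0, 1, 2, 3], [4]]
complete = agglomerative(D5, 2, "complete")  # [[0, 1], [2, 3, 4]]
```

On this matrix single link chains objects 0 to 3 together through successive nearest neighbours, while complete link produces two more compact groups.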

K-Means Methods

K-Means is sensitive to outliers!
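
A minimal sketch of K-Means, assuming the standard Lloyd iteration with squared Euclidean distance and caller-supplied initial centers; the slides' own formulation is not preserved in the transcript. The points and the outlier are hypothetical. The example shows why the method is sensitive to outliers: each center is a mean, so one extreme point drags it away from the bulk of its cluster.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (a center with an empty group stays where it is).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, groups


# Hypothetical data: two tight groups of four points each.
pts = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
start = [(1, 1), (9, 9)]

clean_centers, _ = kmeans(pts, start)
# A single outlier pulls the second center out of its group's range.
noisy_centers, _ = kmeans(pts + [(30, 30)], start)
```

Without the outlier the second center settles at (8.5, 8.5), inside its group. With the outlier it moves to roughly (12.8, 12.8), which lies outside the range spanned by the other four points of that group. K-Medoids methods use an actual object as the cluster representative instead of the mean.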

Exercise 7
Use Single Link, Complete Link and K-Means to cluster the following data (number of clusters = 2):
Object X Y