MIS 451 Building Business Intelligence Systems


1 MIS 451 Building Business Intelligence Systems
Clustering (2)

2 Problem
Target Marketing
Diaper, Baby food, Toys
Swiss cheese and Belgian chocolate
French Wine

3 Clustering
Clustering is a data mining method for grouping objects such that objects within the same cluster are similar and objects in different clusters are dissimilar.
Why clustering:
SQL-based OLAP is not suitable for clustering objects whose attributes have a large number of possible values.
SQL-based OLAP is not suitable for clustering objects with a large number of attributes.

4 Clustering
Steps in clustering objects:
Step 1: Compute the similarity between objects.
Step 2: Cluster the objects based on the similarity between them.

5 Similarity
An object (e.g., a customer) has a list of variables (e.g., attributes of a customer such as age, spending, gender, etc.). When measuring similarity between objects, we compare the variables of those objects. Rather than measuring similarity between variables directly, we use distance to measure the dissimilarity between variables: the larger the distance, the less similar the objects.

6 Dissimilarity
Continuous variable: Manhattan distance, Euclidean distance

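7 Dissimilarity
For two objects X and Y with continuous variables 1, 2, …, n, where x_i and y_i denote the values of variable i for X and Y, Manhattan distance is defined as:
$$d(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$$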
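The Manhattan distance can be sketched in a few lines of Python. The function name `manhattan_distance` and the customer values below are illustrative assumptions, not the figures from the slides.

```python
def manhattan_distance(x, y):
    """Sum of absolute differences across paired continuous variables."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical (age, spending) pairs, chosen only for illustration.
customer_a = (30, 200)
customer_b = (25, 260)

# |30 - 25| + |200 - 260| = 5 + 60
print(manhattan_distance(customer_a, customer_b))  # 65
```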

8 Dissimilarity
Example of Manhattan distance
[Table: columns NAME, AGE, SPENDING($); rows Sue, Carl, Tom, Jack; values missing from the transcript]

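9 Dissimilarity
For two objects X and Y with continuous variables 1, 2, …, n, where x_i and y_i denote the values of variable i for X and Y, Euclidean distance is defined as:
$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$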
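A comparable sketch for the Euclidean distance follows. The function name `euclidean_distance` and the sample values are assumptions made for illustration.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences across paired variables."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical (age, spending) pairs, chosen only for illustration.
customer_a = (30, 200)
customer_b = (27, 204)

# sqrt(3^2 + 4^2) = sqrt(25)
print(euclidean_distance(customer_a, customer_b))  # 5.0
```

Unlike the Manhattan distance, squaring makes a single large difference dominate the result, which is one reason variables on very different scales are usually standardized first.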

10 Dissimilarity
Example of Euclidean distance
[Table: columns NAME, AGE, SPENDING($); rows Sue, Carl, Tom, Jack; values missing from the transcript]

11 Dissimilarity
Standardize the values of a variable:
Step 1: Calculate the mean value.
Step 2: Calculate the mean absolute deviation.
Step 3: Standardize each value of the variable using the formula: new value = (old value - mean value) / mean absolute deviation
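The standardization steps can be sketched as follows, dividing by the mean absolute deviation. The function name `standardize` and the sample ages are hypothetical.

```python
def standardize(values):
    """Rescale values to (value - mean) / mean absolute deviation."""
    mean = sum(values) / len(values)
    # Mean absolute deviation: average distance of each value from the mean.
    mad = sum(abs(v - mean) for v in values) / len(values)
    if mad == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(v - mean) / mad for v in values]


# Hypothetical ages, chosen only for illustration.
ages = [20, 30, 40, 50]

# mean = 35, mean absolute deviation = (15 + 5 + 5 + 15) / 4 = 10
print(standardize(ages))  # [-1.5, -0.5, 0.5, 1.5]
```

After standardization, a variable measured in dollars and one measured in years contribute on a comparable scale to the distance.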

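12 Dissimilarity
Binary variable
distance = number of mismatched variables / total number of variables
(The fraction of matched variables measures similarity; the distance is 1 minus that fraction.)

NAME   Married(Y/N)   Gender   Internet connection at home
Sue    Y              M        Y
Carl   Y              F        Y
Tom    N              M        N
Jack   N              F        N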
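A minimal sketch of a binary-variable distance, using the table above. It counts mismatched variables so that identical objects get distance 0; the fraction of matched variables is then 1 minus this value. The function name `binary_distance` is an assumption.

```python
# Values from the table above: (married, gender, internet connection at home).
customers = {
    "Sue": ("Y", "M", "Y"),
    "Carl": ("Y", "F", "Y"),
    "Tom": ("N", "M", "N"),
    "Jack": ("N", "F", "N"),
}


def binary_distance(x, y):
    """Fraction of variables on which the two objects disagree."""
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return mismatches / len(x)


# Sue and Carl differ only in gender: 1 of 3 variables.
print(binary_distance(customers["Sue"], customers["Carl"]))
# Sue and Jack differ on all three variables.
print(binary_distance(customers["Sue"], customers["Jack"]))  # 1.0
```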

13 Clustering based on dissimilarity
After calculating dissimilarity between objects, a dissimilarity matrix can be created with objects as indexes and dissimilarities between objects as elements.
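One way to build such a matrix is a nested dictionary keyed by object name. This is a sketch under assumptions: the helper names `manhattan` and `dissimilarity_matrix` and the customer values are hypothetical, and Manhattan distance is used only as an example measure.

```python
def manhattan(x, y):
    """Sum of absolute differences across paired variables."""
    return sum(abs(a - b) for a, b in zip(x, y))


def dissimilarity_matrix(objects, distance):
    """Map every pair of object names to the distance between the objects."""
    names = list(objects)
    return {a: {b: distance(objects[a], objects[b]) for b in names} for a in names}


# Hypothetical (age, spending) values, chosen only for illustration.
customers = {"Sue": (30, 200), "Tom": (25, 260), "Carl": (41, 180)}

matrix = dissimilarity_matrix(customers, manhattan)
for name, row in matrix.items():
    print(name, row)
```

The matrix is symmetric with zeros on the diagonal, so in practice only one triangle needs to be stored.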

14 Clustering based on dissimilarity
[Dissimilarity matrix: rows and columns Sue, Tom, Carl, Jack, Mary; values missing from the transcript]

15 Clustering based on dissimilarity
Step 1: Initially, place each object in its own cluster.
Step 2: Calculate the dissimilarity between clusters. The dissimilarity between two clusters is the minimum dissimilarity between two objects, one from each cluster.
Step 3: Merge the two clusters with the least dissimilarity.
Step 4: Repeat steps 2-3 until all objects are in one cluster.
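The steps above can be sketched in Python. Using the minimum pairwise dissimilarity between clusters is commonly called single-linkage agglomerative clustering. The function name `single_linkage` and the matrix values are hypothetical; the values are not taken from the slides.

```python
def single_linkage(matrix):
    """Merge the two closest clusters until one remains.

    Returns the merges in order as (cluster_a, cluster_b, dissimilarity).
    """
    # Step 1: every object starts in its own cluster.
    clusters = [frozenset([name]) for name in matrix]
    merges = []
    while len(clusters) > 1:
        best = None
        # Step 2: cluster dissimilarity is the minimum over cross-cluster pairs.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(matrix[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Step 3: merge the two least dissimilar clusters.
        d, i, j = best
        merges.append((set(clusters[i]), set(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical symmetric dissimilarity matrix, chosen only for illustration.
matrix = {
    "Sue": {"Sue": 0, "Tom": 6, "Carl": 1, "Jack": 5},
    "Tom": {"Sue": 6, "Tom": 0, "Carl": 7, "Jack": 2},
    "Carl": {"Sue": 1, "Tom": 7, "Carl": 0, "Jack": 4},
    "Jack": {"Sue": 5, "Tom": 2, "Carl": 4, "Jack": 0},
}

merges = single_linkage(matrix)
for a, b, d in merges:
    print(sorted(a), "+", sorted(b), "at dissimilarity", d)
```

With this matrix, Sue and Carl merge first (dissimilarity 1), then Tom and Jack (2), and finally the two pairs (4). Cutting the sequence of merges at a chosen dissimilarity threshold yields the final clusters.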

