Other Clustering Techniques

COMP5331: Other Clustering Techniques. Prepared and presented by Raymond Wong (raywong@cse)

What we have learnt: k-means and the dendrogram (hierarchical clustering)

Other Clustering Models
Model-Based Clustering: EM Algorithm
Density-Based Clustering: DBSCAN
Scalable Clustering Method: BIRCH

EM Algorithm
Drawback of k-means/dendrogram: each point belongs to exactly one cluster, so there is no way to represent that a point may belong to different clusters with different probabilities. The EM algorithm instead uses a probability density to associate each point with every cluster.

EM Algorithm
Assume that we know there are k clusters, and that each cluster follows a distribution (e.g., a Gaussian distribution). For a 1D Gaussian distribution with mean μ and standard deviation σ:
p(x | ⟨μ, σ⟩) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

EM Algorithm
Since there are k clusters, we have k Gaussian distributions:
Cluster 1: mean μ₁, standard deviation σ₁
Cluster 2: mean μ₂, standard deviation σ₂
…
Cluster k: mean μₖ, standard deviation σₖ

EM Algorithm (Expectation-Maximization)
Step 1 (Parameter Initialization): initialize all μᵢ and σᵢ.
Step 2 (Expectation): for each point x and each cluster i, calculate the probability that x belongs to cluster i. One possible implementation:
p(x ∈ Cᵢ) = p(x | ⟨μᵢ, σᵢ⟩) / Σⱼ p(x | ⟨μⱼ, σⱼ⟩)
Step 3 (Maximization): recalculate each mean μᵢ according to the probabilities that all points belong to cluster i. One possible implementation:
μᵢ = Σₓ x · p(x ∈ Cᵢ) / Σₓ p(x ∈ Cᵢ)
Repeat Steps 2 and 3 until the parameters converge.
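To make the loop concrete, here is a minimal EM sketch in Python for a 1D Gaussian mixture, assuming equal cluster weights (the slide updates only the means; this sketch also re-estimates the standard deviations, a common variant). All names are illustrative:

```python
import numpy as np

def em_1d_gaussian(xs, k, n_iters=100, seed=0):
    """Minimal EM for a 1D Gaussian mixture with equal cluster weights."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(xs, size=k, replace=False)   # Step 1: initialize means from data
    sigma = np.full(k, xs.std() + 1e-6)          # Step 1: initialize std deviations
    for _ in range(n_iters):
        # Step 2 (Expectation): p(x in C_i), normalized over all clusters
        dens = np.exp(-(xs[:, None] - mu) ** 2 / (2 * sigma ** 2)) \
               / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (Maximization): responsibility-weighted mean and std per cluster
        w = resp.sum(axis=0)
        mu = (resp * xs[:, None]).sum(axis=0) / w
        sigma = np.sqrt((resp * (xs[:, None] - mu) ** 2).sum(axis=0) / w) + 1e-6
    return mu, sigma

# Usage: two well-separated 1D Gaussians
rng = np.random.default_rng(1)
xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
print(em_1d_gaussian(xs, k=2))
```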

Other Clustering Models
Model-Based Clustering: EM Algorithm
Density-Based Clustering: DBSCAN
Scalable Clustering Method: BIRCH

DBSCAN
Traditional clustering (e.g., k-means) can only represent spherical clusters and cannot handle irregularly shaped clusters. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) addresses this.

DBSCAN
Given a point p and a non-negative real number ε, the ε-neighborhood of point p, denoted by Nε(p), is the set of points q (including point p itself) such that the distance between p and q is within ε.

DBSCAN
(Figure: example points a, b, c, d, e illustrating the three point types.)
According to the ε-neighborhood of point p, we classify all points into three types:
Core point: given a non-negative integer MinPts, if the size of Nε(p) is at least MinPts, then p is a core point.
Border point: p is a border point if it is not a core point but Nε(p) contains at least one core point.
Noise point: p is a noise point if it is neither a core point nor a border point.

DBSCAN
Principle 1: Each cluster contains at least one core point.
Principle 2: Given any two core points p and q, if Nε(p) contains q (or Nε(q) contains p), then p and q are in the same cluster.
Principle 3: A border point p whose Nε(p) contains multiple core points is assigned arbitrarily to one of the clusters (formed by Principles 1 and 2) containing those core points.
Principle 4: Noise points do not belong to any cluster.
A minimal implementation of these principles is sketched below.
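Here is a minimal, unoptimized DBSCAN sketch in Python (O(n²) neighborhood computation; all names are our own, and a real implementation would use a spatial index):

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Naive DBSCAN over an (n, d) array, following Principles 1-4."""
    n = len(points)
    # Pairwise distances; N_eps(p) includes p itself
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    nbrs = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = [None] * n                      # None = unassigned yet
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] is not None:
            continue
        labels[i] = cluster_id               # Principle 1: seed cluster with a core point
        frontier = [i]
        while frontier:
            j = frontier.pop()
            for q in nbrs[j]:
                if labels[q] is None:
                    labels[q] = cluster_id   # Principles 2/3: core or border point joins
                    if core[q]:
                        frontier.append(q)   # only core points expand the cluster
        cluster_id += 1
    return [-1 if l is None else l for l in labels]  # Principle 4: -1 marks noise

# Usage: two dense blobs plus one outlier
pts = np.array([[0, 0], [0, 0.2], [0.2, 0], [5, 5], [5, 5.2], [5.2, 5], [10, 10]])
print(dbscan(pts, eps=0.5, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```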

Other Clustering Models
Model-Based Clustering: EM Algorithm
Density-Based Clustering: DBSCAN
Scalable Clustering Method: BIRCH

BIRCH
Disadvantages of previous algorithms: most cannot handle incremental updates, and most are not scalable to large datasets. BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) addresses both.

BIRCH Advantages: incremental and scalable.

BIRCH
Each cluster is described by the following three terms:
Mean (centroid): x̄ = (1/n) Σᵢ xᵢ
Radius R: the average distance from member objects to the mean, R = √( Σᵢ (xᵢ − x̄)² / n )
Diameter D: the average pairwise distance within the cluster, D = √( Σᵢ Σⱼ≠ᵢ (xᵢ − xⱼ)² / (n(n−1)) )

BIRCH
Idea of the algorithm (L stores the list of clusters), sketched in code below:
L ← {}
When a new data point x arrives:
  If L = {}: create a cluster C containing only x, and insert C into L.
  Else: find the cluster C in L closest to x, and insert x into C.
    If C now has diameter D greater than a given threshold:
      split C into two sub-clusters C1 and C2, remove C from L, and insert C1 and C2 into L.
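A minimal Python sketch of this naive insertion loop (names are illustrative; the split heuristic, a median cut along the widest dimension, is our own choice, since the slides leave it unspecified). The diameter is recomputed from scratch, which is exactly the inefficiency discussed next:

```python
import numpy as np

def diameter(cluster):
    """Average pairwise diameter D, computed naively from all points: O(n^2)."""
    pts = np.asarray(cluster)
    n = len(pts)
    if n < 2:
        return 0.0
    sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    return float(np.sqrt(sq.sum() / (n * (n - 1))))

def insert_point(clusters, x, threshold):
    """Insert x into the closest cluster; split it if D exceeds the threshold."""
    if not clusters:
        clusters.append([x])
        return
    means = [np.mean(c, axis=0) for c in clusters]
    i = int(np.argmin([np.linalg.norm(x - m) for m in means]))
    clusters[i].append(x)
    if diameter(clusters[i]) > threshold:
        c = clusters.pop(i)
        pts = np.asarray(c)
        d = int(np.argmax(np.ptp(pts, axis=0)))   # widest dimension
        cut = np.median(pts[:, d])
        c1 = [p for p in c if p[d] <= cut]
        c2 = [p for p in c if p[d] > cut]
        # a median cut can put everything on one side if values tie; keep C then
        clusters.extend([c1, c2] if c1 and c2 else [c])
```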

BIRCH
Without an efficient data structure, the computation of D is very slow: it requires all pairwise distances within the cluster, so checking the threshold after each insertion costs O(n²) for a cluster of n points.

BIRCH
Instead of storing x̄, R, and D directly, BIRCH stores a clustering feature (CF) for each cluster: ⟨n, LS, SS⟩
n: the number of points in the cluster
LS: the linear sum of the n points, LS = Σᵢ xᵢ
SS: the square sum of the n points, SS = Σᵢ xᵢ²

BIRCH
It is easy to verify that x̄, R, and D can all be derived from the CF, as the identities below show.
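Using the 1D notation of the slides, the derivations are:

x̄ = LS / n
R² = Σᵢ (xᵢ − x̄)² / n = SS/n − (LS/n)²
D² = Σᵢ Σⱼ≠ᵢ (xᵢ − xⱼ)² / (n(n−1)) = (2n·SS − 2·LS²) / (n(n−1))

(The D² identity follows from expanding (xᵢ − xⱼ)² = xᵢ² − 2xᵢxⱼ + xⱼ² and summing over all ordered pairs; for d-dimensional data, LS² becomes ‖LS‖² and the identities otherwise carry over.)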

BIRCH
Why efficient? When a new data point x arrives, the CF ⟨n, LS, SS⟩ is updated in constant time: n ← n + 1, LS ← LS + x, SS ← SS + x². Running time = O(1) per insertion (O(d) for d-dimensional data), with no need to revisit the cluster's points, and R and D can still be recovered from the updated CF at any time.
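A minimal 1D sketch of this bookkeeping (class and method names are our own), using the identities derived above:

```python
import math

class CF:
    """Clustering feature <n, LS, SS> for a 1D cluster."""
    def __init__(self):
        self.n, self.ls, self.ss = 0, 0.0, 0.0

    def insert(self, x):
        # O(1) update: no need to revisit the cluster's points
        self.n += 1
        self.ls += x
        self.ss += x * x

    def radius(self):
        mean = self.ls / self.n
        return math.sqrt(max(self.ss / self.n - mean * mean, 0.0))

    def diameter(self):
        if self.n < 2:
            return 0.0
        num = 2 * self.n * self.ss - 2 * self.ls ** 2
        return math.sqrt(max(num / (self.n * (self.n - 1)), 0.0))

cf = CF()
for x in [1.0, 2.0, 4.0]:
    cf.insert(x)
print(cf.radius(), cf.diameter())
```

Note that CFs are additive: the CF of the union of two clusters is the component-wise sum of their CFs, so merging sub-clusters is also O(1).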