AIM: Clustering the Data together

Presentation transcript:

AIM: Clustering the Data together
Clustering should not depend on visible features. An example of visible features is fingerprint patterns (whorls, arches, loops). These depend on genetic formation, so people from similar regions may share them. Clustering based on such features is therefore biased.

Letting the Data speak
Collect the data samples from the test domain.
Cluster the data using a clustering algorithm.

Clustering Algorithm used
K-means clustering algorithm. The algorithm in short:
1. Start with a predefined number of clusters.
2. Initialize each cluster with a centroid.
3. Run the algorithm to associate each member point with a cluster.
4. Re-calculate the centroids.

K-means clustering algorithm
1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move.
This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
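The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the presentation: the function name `kmeans`, the `max_iter` cap, the sample points, and the choice to leave an empty group's centroid where it was are all assumptions made for the example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Basic k-means on a list of equal-length tuples.

    `centroids` holds the K starting positions (Step 1).
    Returns (centroids, assignments).
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty group keeps its previous centroid.
                new_centroids.append(old)
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

On this sample the first three points end up in one group and the last three in the other, and the loop stops on the second pass because the centroids are unchanged.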

Algorithm Evaluation
Non-intuitive, yet true: the algorithm always converges.
Reason: there are only a finite number of ways of partitioning 'R' data points into 'k' groups.
Each time the configuration changes, it moves to one with an improved distortion (sum of squared errors).
So every change yields a configuration that has not been visited before.
If the algorithm ran forever, it would exhaust the available configurations, so it must stop.
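The distortion in this argument can be written out explicitly. The symbols below are assumptions introduced for illustration and do not come from the slides: x_i is the i-th data point, c_j the centroid of group j, a(i) the group that point i is assigned to, and D the distortion.

```latex
% Distortion: sum of squared distances from each point to its group's centroid
D = \sum_{i=1}^{R} \lVert x_i - c_{a(i)} \rVert^2

% Assignment step (centroids fixed): each point picks its closest centroid,
% so no term of the sum can grow.
a(i) = \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - c_j \rVert^2

% Update step (assignments fixed): the mean minimizes the squared error of a group.
c_j = \frac{1}{\lvert G_j \rvert} \sum_{i \in G_j} x_i ,
\qquad G_j = \{\, i : a(i) = j \,\}

% Hence the distortion after iteration t+1 is never larger than after iteration t.
D^{(t+1)} \le D^{(t)}
```

Since neither step can increase D and there are finitely many partitions, the sequence of configurations cannot keep changing indefinitely.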

Algorithm Evaluation
Convergence does not guarantee optimality.
To get a near-optimal solution:
1. Select the starting points carefully.
2. Run the algorithm several times.
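One way to act on the "several runs" advice is to restart k-means from different starting points and keep the run with the lowest distortion. The sketch below is an assumption-laden illustration, not the presentation's code: `best_of_runs`, `distortion`, the `runs` and `seed` parameters, and drawing the starts at random from the data are all choices made for the example.

```python
import math
import random

def kmeans(points, centroids, max_iter=100):
    """Compact k-means; returns the final centroids."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid (assumption).
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def distortion(points, centroids):
    """Sum of squared distances from each point to its closest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def best_of_runs(points, k, runs=10, seed=0):
    """Run k-means from several random starts; keep the lowest distortion."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        start = rng.sample(points, k)  # assumption: starts are drawn from the data
        result = kmeans(points, start)
        score = distortion(points, result)
        if best is None or score < best[0]:
            best = (score, result)
    return best

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
best_score, best_centroids = best_of_runs(points, k=2)
```

Comparing runs by distortion uses the same sum-of-squared-errors measure that the convergence argument relies on, so the "best" run is the one that minimizes the quantity k-means itself is reducing.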

Selecting the starting points
Using the k-nearest neighbor concept, where 'k' corresponds to the number of clusters:
1. Find the global mean of the entire dataset.
2. Find the 'k' data points closest to the global mean.
3. Use these 'k' closest data samples as the initial 'k' centroids.
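The selection rule can be sketched as follows. The function name `initial_centroids` and the sample points are hypothetical, and Euclidean distance is assumed since the slides do not name a metric.

```python
import math

def initial_centroids(points, k):
    """Pick the k data points closest to the global mean of the dataset."""
    n = len(points)
    # Global mean: average each coordinate over the whole dataset.
    global_mean = tuple(sum(col) / n for col in zip(*points))
    # sorted() is stable, so tied points keep their original order in the data.
    by_distance = sorted(points, key=lambda p: math.dist(p, global_mean))
    return by_distance[:k]

# Hypothetical 2-D data: four corner points and two points near the middle.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.5), (1.0, 2.0)]
seeds = initial_centroids(points, k=2)
```

Seeds chosen this way all lie near the center of the data, so they start close to one another and the first iterations of k-means have to pull them apart.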

The code so far….

References
Andrew Moore, Statistical data mining tutorial slides (http://www-2.cs.cmu.edu/~awm/tutorials/kmeans.html)
A Tutorial on Clustering Algorithms (http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/kmeans.html)