Unsupervised Learning: Clustering

Slides:



Advertisements
Similar presentations
CS 478 – Tools for Machine Learning and Data Mining Clustering: Distance-based Approaches.
Advertisements

Microarray Data Analysis (Lecture for CS397-CXZ Algorithms in Bioinformatics) March 19, 2004 ChengXiang Zhai Department of Computer Science University.
Agglomerative Hierarchical Clustering 1. Compute a distance matrix 2. Merge the two closest clusters 3. Update the distance matrix 4. Repeat Step 2 until.
Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , Chapter 8.
1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for
Slide 1 EE3J2 Data Mining Lecture 16 Unsupervised Learning Ali Al-Shahib.
Clustering. 2 Outline  Introduction  K-means clustering  Hierarchical clustering: COBWEB.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo Machine Learning Clustering.
Introduction to Bioinformatics - Tutorial no. 12
Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:
Clustering Unsupervised learning Generating “classes”
Evaluating Performance for Data Mining Techniques
CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:
Unsupervised Learning. CS583, Bing Liu, UIC 2 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate.
Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , (
Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for
Unsupervised Learning. Supervised learning vs. unsupervised learning.
Clustering I. 2 The Task Input: Collection of instances –No special class label attribute! Output: Clusters (Groups) of instances where members of a cluster.
Artificial Intelligence 8. Supervised and unsupervised learning Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka.
K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.8: Clustering Rodney Nielsen Many of these.
Clustering Clustering is a technique for finding similarity groups in data, called clusters. I.e., it groups data instances that are similar to (near)
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Clustering Unsupervised learning introduction Machine Learning.
Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler TexPoint fonts used in EMF. Read the TexPoint manual before you delete.
Machine Learning Queens College Lecture 7: Clustering.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
1 Machine Learning Lecture 9: Clustering Moshe Koppel Slides adapted from Raymond J. Mooney.
Given a set of data points as input Randomly assign each point to one of the k clusters Repeat until convergence – Calculate model of each of the k clusters.
Data Mining and Text Mining. The Standard Data Mining process.
Data Mining – Algorithms: K Means Clustering
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
Semi-Supervised Clustering
Clustering CSC 600: Data Mining Class 21.
Chapter 15 – Cluster Analysis
Machine Learning Clustering: K-means Supervised Learning
Constrained Clustering -Semi Supervised Clustering-
Machine Learning Lecture 9: Clustering
Data Mining K-means Algorithm
cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14
Machine Learning and Data Mining Clustering
Machine Learning I & II.
Data Clustering Michael J. Watts
CSE 5243 Intro. to Data Mining
Clustering.
Image Processing for Physical Data
AIM: Clustering the Data together
John Nicholas Owen Sarah Smith
Clustering.
Information Organization: Clustering
Data Mining 資料探勘 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Revision (Part II) Ke Chen
Data Mining – Chapter 4 Cluster Analysis Part 2
K-means Ask user how many clusters they’d like. (e.g. k=5) Copyright © 2001, 2004, Andrew W. Moore.
INTRODUCTION TO Machine Learning
Text Categorization Berlin Chen 2003 Reference:
Clustering Techniques
Data Mining CSCI 307, Spring 2019 Lecture 24
Presentation transcript:

Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. See http://www.autonlab.org/tutorials/ for a repository of Data Mining tutorials

Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y. But, what if we don’t have labels? No labels = unsupervised learning Only some points are labeled = semi-supervised learning Getting labels may be expensive, so we only get a few Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.

Top-down vs. Bottom Up Clustering is typically done using a distance measure defined between instances The distance is defined in the instance feature space Agglomerative approach works bottom up: Treat each instance as a cluster Merge the two closest clusters Repeat until the stop condition is met Top-down approach starts a cluster with all instances Find a cluster to split into two or more smaller clusters Repeat until stop condition met

Clustering Data

K-Means Clustering K-Means ( k , data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Clustering K-Means ( k , data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Clustering K-Means ( k , data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

Problems with K-Means Very sensitive to the initial points Do many runs of k-Means, each with different initial centroids. Seed the centroids using a better method than random. (e.g. Farthest-first sampling) Must manually choose k Learn the optimal k for the clustering. (Note that this requires a performance measure.)

Problems with K-Means How do you tell it which clustering you want? Constrained clustering techniques Same-cluster constraint (must-link) Different-cluster constraint (cannot-link)