Unsupervised Learning: Clustering

Slides:

Advertisements

Similar presentations

CS 478 – Tools for Machine Learning and Data Mining Clustering: Distance-based Approaches.

Advertisements

Microarray Data Analysis (Lecture for CS397-CXZ Algorithms in Bioinformatics) March 19, 2004 ChengXiang Zhai Department of Computer Science University.

Agglomerative Hierarchical Clustering 1. Compute a distance matrix 2. Merge the two closest clusters 3. Update the distance matrix 4. Repeat Step 2 until.

Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , Chapter 8.

1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)

Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.

© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.

Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for

Slide 1 EE3J2 Data Mining Lecture 16 Unsupervised Learning Ali Al-Shahib.

Clustering. 2 Outline  Introduction  K-means clustering  Hierarchical clustering: COBWEB.

Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University

Adapted by Doug Downey from Machine Learning EECS 349, Bryan Pardo Machine Learning Clustering.

Introduction to Bioinformatics - Tutorial no. 12

Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:

Clustering Unsupervised learning Generating “classes”

Evaluating Performance for Data Mining Techniques

CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:

Unsupervised Learning. CS583, Bing Liu, UIC 2 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate.

Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , (

Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.

Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.

Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for

Unsupervised Learning. Supervised learning vs. unsupervised learning.

Clustering I. 2 The Task Input: Collection of instances –No special class label attribute! Output: Clusters (Groups) of instances where members of a cluster.

Artificial Intelligence 8. Supervised and unsupervised learning Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka.

K-Means Algorithm Each cluster is represented by the mean value of the objects in the cluster Input: set of objects (n), no of clusters (k) Output:

Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.8: Clustering Rodney Nielsen Many of these.

Clustering Clustering is a technique for finding similarity groups in data, called clusters. I.e., it groups data instances that are similar to (near)

Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.

CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.

Clustering Unsupervised learning introduction Machine Learning.

Machine Learning and Data Mining Clustering (adapted from) Prof. Alexander Ihler TexPoint fonts used in EMF. Read the TexPoint manual before you delete.

Machine Learning Queens College Lecture 7: Clustering.

Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.

Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.

1 Machine Learning Lecture 9: Clustering Moshe Koppel Slides adapted from Raymond J. Mooney.

Given a set of data points as input Randomly assign each point to one of the k clusters Repeat until convergence – Calculate model of each of the k clusters.

Data Mining and Text Mining. The Standard Data Mining process.

Data Mining – Algorithms: K Means Clustering

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering

Semi-Supervised Clustering

Clustering CSC 600: Data Mining Class 21.

Chapter 15 – Cluster Analysis

Machine Learning Clustering: K-means Supervised Learning

Constrained Clustering -Semi Supervised Clustering-

Machine Learning Lecture 9: Clustering

Data Mining K-means Algorithm

cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14

Machine Learning and Data Mining Clustering

Machine Learning I & II.

Data Clustering Michael J. Watts

CSE 5243 Intro. to Data Mining

Image Processing for Physical Data

AIM: Clustering the Data together

John Nicholas Owen Sarah Smith

Information Organization: Clustering

Data Mining 資料探勘分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Revision (Part II) Ke Chen

Data Mining – Chapter 4 Cluster Analysis Part 2

K-means Ask user how many clusters they’d like. (e.g. k=5) Copyright © 2001, 2004, Andrew W. Moore.

INTRODUCTION TO Machine Learning

Text Categorization Berlin Chen 2003 Reference:

Clustering Techniques

Data Mining CSCI 307, Spring 2019 Lecture 24

Presentation transcript:

Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. See http://www.autonlab.org/tutorials/ for a repository of Data Mining tutorials

Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y. But, what if we don’t have labels? No labels = unsupervised learning Only some points are labeled = semi-supervised learning Getting labels may be expensive, so we only get a few Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.

Top-down vs. Bottom Up Clustering is typically done using a distance measure defined between instances The distance is defined in the instance feature space Agglomerative approach works bottom up: Treat each instance as a cluster Merge the two closest clusters Repeat until the stop condition is met Top-down approach starts a cluster with all instances Find a cluster to split into two or more smaller clusters Repeat until stop condition met

Clustering Data

K-Means Clustering K-Means ( k , data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Clustering K-Means ( k , data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

K-Means Clustering K-Means ( k , data ) Randomly choose k cluster center locations (centroids). Loop until convergence Assign each point to the cluster of the closest centroid. Reestimate the cluster centroids based on the data assigned to each.

Problems with K-Means Very sensitive to the initial points Do many runs of k-Means, each with different initial centroids. Seed the centroids using a better method than random. (e.g. Farthest-first sampling) Must manually choose k Learn the optimal k for the clustering. (Note that this requires a performance measure.)

Problems with K-Means How do you tell it which clustering you want? Constrained clustering techniques Same-cluster constraint (must-link) Different-cluster constraint (cannot-link)