
Clustering. Instructor: Max Welling. ICS 178 Machine Learning & Data Mining.

Unsupervised Learning
In supervised learning we were given attributes and targets (e.g. class labels). In unsupervised learning we are given only the attributes; our task is to discover structure in the data. Example: the data may be structured in clusters (see the figure on the slide). Is this a good clustering?

Why Discover Structure?
Often the result of an unsupervised learning algorithm is a new representation of the same data. This new representation should be more meaningful and can be used for further processing (e.g. classification). Clustering example: the new representation is the label of the cluster to which each data-point belongs, which tells us which data-cases are similar to each other. The new representation is also smaller and hence computationally more convenient. Clustering example: each data-case is now encoded by its cluster label, which is a lot cheaper to store than its full attribute values. Collaborative filtering (CF) example: we can group the users into user communities and/or the movies into movie genres; if we need to predict a rating, we simply use the average rating within the group.

Clustering: K-means
We iterate two operations: 1. update the assignment of data-cases to clusters; 2. update the location of each cluster. Let $z_i \in \{1,\dots,K\}$ denote the assignment of data-case $i$ to a cluster, let $\mu_c$ denote the position of cluster $c$ in $d$-dimensional space, and let $x_i$ denote the location of data-case $i$. Then iterate until convergence:
1. For each data-case, compute the distances to all clusters and pick the closest one: $z_i = \arg\min_c \|x_i - \mu_c\|^2$.
2. For each cluster, compute the mean location of all data-cases assigned to it: $\mu_c = \frac{1}{N_c} \sum_{i \in S_c} x_i$, where $S_c = \{i : z_i = c\}$ is the set of data-cases assigned to cluster $c$ and $N_c = |S_c|$ is the number of data-cases in cluster $c$.
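To make the two alternating steps concrete, here is a minimal NumPy sketch of this procedure (my own illustration; names such as kmeans are not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal k-means: X is an (N, d) array of data-cases, K the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize cluster locations on K randomly chosen data-cases (as suggested below).
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # Step 1: assign each data-case to the closest cluster location.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
        z = dists.argmin(axis=1)
        # Step 2: move each cluster to the mean of the data-cases assigned to it.
        new_mu = np.array([X[z == c].mean(axis=0) if np.any(z == c) else mu[c] for c in range(K)])
        if np.allclose(new_mu, mu):  # means stopped moving, so we have converged
            break
        mu = new_mu
    return z, mu
```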

K-means
Cost function: $C = \sum_{i=1}^{N} \|x_i - \mu_{z_i}\|^2$. Each step of k-means decreases this cost function. Initialization is often very important, since $C$ has very many local minima. A relatively good initialization: place the cluster locations on $K$ randomly chosen data-cases. How to choose $K$? Add a complexity term that grows with the number of clusters, e.g. $C' = C + \lambda K$, and minimize over $K$ as well.
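A hedged sketch of this model-selection idea, reusing the kmeans function from the previous sketch; the penalty weight lam and the helper names are my own illustrative choices, not values from the slides:

```python
import numpy as np

def kmeans_cost(X, z, mu):
    # Sum of squared distances of each data-case to its assigned cluster location.
    return float(((X - mu[z]) ** 2).sum())

def choose_K(X, K_max=10, lam=1.0):
    # Minimize cost + lam * K over K (one simple form of the complexity penalty).
    best_K, best_score = None, np.inf
    for K in range(1, K_max + 1):
        z, mu = kmeans(X, K)
        score = kmeans_cost(X, z, mu) + lam * K
        if score < best_score:
            best_K, best_score = K, score
    return best_K
```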

Vector Quantization
K-means divides the space into a Voronoi tessellation. Every point on a tile is summarized by that tile's code-book vector (shown as "+" in the figure). This clearly allows for data compression!
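A minimal sketch of vector quantization as compression, assuming mu is a code-book learned by the kmeans sketch above (function names are illustrative):

```python
import numpy as np

def vq_encode(X, mu):
    # Replace each data-case by the index of its nearest code-book vector.
    dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)   # one small integer per data-case

def vq_decode(codes, mu):
    # Reconstruct each data-case as its code-book vector (lossy decompression).
    return mu[codes]
```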

Mixtures of Gaussians
K-means assigns each data-case to exactly one cluster. But what if the clusters overlap? We may be uncertain which cluster a data-case really belongs to. The mixture-of-Gaussians algorithm assigns each data-case to each cluster with a certain probability.

MoG Clustering
Each cluster $c$ is modelled by a Gaussian density $\mathcal{N}(x;\mu_c,\Sigma_c) = \frac{1}{(2\pi)^{d/2}|\Sigma_c|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_c)^\top \Sigma_c^{-1}(x-\mu_c)\right)$. The covariance $\Sigma_c$ determines the shape of its equal-probability contours. Idea: fit these Gaussian densities to the data, one per cluster.

EM Algorithm: E-step
$r_{ic} = \dfrac{\pi_c\,\mathcal{N}(x_i;\mu_c,\Sigma_c)}{\sum_{c'} \pi_{c'}\,\mathcal{N}(x_i;\mu_{c'},\Sigma_{c'})}$
Here $r_{ic}$ is the probability (responsibility) that data-case $i$ belongs to cluster $c$, and $\pi_c$ is the a priori probability of being assigned to cluster $c$. Note that if the Gaussian for cluster $c$ puts high probability on data-case $i$ (i.e. the bell shape sits on top of the data-case), then it claims high responsibility for this data-case. The denominator just normalizes the responsibilities so that $\sum_c r_{ic} = 1$.

EM Algorithm: M-step
$N_c = \sum_i r_{ic}$: the total responsibility claimed by cluster $c$.
$\pi_c = N_c / N$: the expected fraction of data-cases assigned to this cluster.
$\mu_c = \frac{1}{N_c}\sum_i r_{ic}\,x_i$: the weighted sample mean, where every data-case is weighted according to the probability that it belongs to that cluster.
$\Sigma_c = \frac{1}{N_c}\sum_i r_{ic}\,(x_i-\mu_c)(x_i-\mu_c)^\top$: the weighted sample covariance.
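The E- and M-steps above combine into the following hedged NumPy/SciPy sketch of EM for a mixture of Gaussians (my own implementation of the standard updates, not code from the course):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_mog(X, K, n_iters=100, seed=0):
    """EM for a mixture of K Gaussians on an (N, d) data array X."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialize: means on random data-cases, identity covariances, uniform priors.
    mu = X[rng.choice(N, size=K, replace=False)].copy()
    Sigma = np.array([np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: responsibilities r[i, c] proportional to pi_c * N(x_i; mu_c, Sigma_c).
        r = np.column_stack([pi[c] * multivariate_normal.pdf(X, mu[c], Sigma[c]) for c in range(K)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted counts, priors, means, and covariances.
        Nc = r.sum(axis=0)                     # total responsibility claimed by each cluster
        pi = Nc / N
        mu = (r.T @ X) / Nc[:, None]
        for c in range(K):
            diff = X - mu[c]
            # Weighted sample covariance, plus a tiny ridge for numerical stability (my addition).
            Sigma[c] = (r[:, c, None] * diff).T @ diff / Nc[c] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, r
```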

EM-MoG
EM stands for "expectation maximization"; we won't go through the derivation. If we are forced to decide, we assign a data-case to the cluster that claims the highest responsibility. For a new data-case, we compute responsibilities as in the E-step and pick the cluster with the largest responsibility. The E and M steps are iterated until convergence (which is guaranteed). Every step increases the following objective function, the total log-probability of the data under the model we are learning:
$L = \sum_{i=1}^{N} \log \sum_{c=1}^{K} \pi_c\, \mathcal{N}(x_i;\mu_c,\Sigma_c)$
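For monitoring convergence, this objective can be evaluated directly; a small sketch reusing the imports from the EM code above (the helper name is my own):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_log_likelihood(X, pi, mu, Sigma):
    # Total log-probability of the data under the current mixture model.
    K = len(pi)
    p = sum(pi[c] * multivariate_normal.pdf(X, mu[c], Sigma[c]) for c in range(K))
    return float(np.log(p).sum())
```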

Agglomerative Hierarchical Clustering
Define a "distance" between clusters (later). Initially, every data-case is its own cluster. At each iteration, compute the distances between all existing clusters (you can store distances and avoid recomputing them), merge the two closest clusters into a single cluster, and update your "dendrogram". (Slide figure: every data-case is a cluster.)
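For reference, a hedged sketch of this procedure using SciPy's hierarchical-clustering utilities (one reasonable way to run it; the slides do not prescribe a library):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.rand(20, 2)                        # toy data: 20 data-cases in 2-D
Z = linkage(X, method='single')                  # agglomerative merges, closest clusters first
dendrogram(Z)                                    # draws the dendrogram (requires matplotlib)
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into 3 clusters
```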

Iteration 1

Iteration 2

Iteration 3
This way you build a hierarchy. What is the complexity order of this algorithm? (Why?)

Dendrogram

Distances
Single linkage, $d_{\min}(A,B) = \min_{x \in A,\, y \in B} d(x,y)$, produces a minimal spanning tree but tends to form elongated clusters. Complete linkage, $d_{\max}(A,B) = \max_{x \in A,\, y \in B} d(x,y)$, avoids elongated clusters.
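In the SciPy sketch above, these two cluster distances correspond to the method argument (again an illustration reusing the toy array X from that sketch, not course code):

```python
from scipy.cluster.hierarchy import linkage

Z_single = linkage(X, method='single')      # d_min: distance between the closest pair across clusters
Z_complete = linkage(X, method='complete')  # d_max: distance between the farthest pair across clusters
```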

Gene Expression Data
Micro-array data: the expression level of genes is measured under different experimental conditions. We would like to find the genes that co-express in a subset of conditions. Both genes and conditions are clustered and shown as dendrograms.

Exercise I
Imagine you have run a clustering algorithm on some data describing three attributes of cars: height, weight, and length, and you have found two clusters. An expert comes by and tells you that class 1 is really Ferraris while class 2 is Hummers. A new data-case (car) is presented, i.e. you get to see its height, weight, and length. Describe how you can use the output of your clustering, together with the information obtained from the expert, to classify the new car as a Ferrari or a Hummer. Be very precise: use an equation or pseudo-code to describe what to do. You now add the new car to the dataset and run k-means starting from the converged assignments and cluster means obtained before. Is it possible that the assignments of the old data change due to the addition of the new data-case?

Exercise II
We classify data according to the 3-nearest-neighbors (3-NN) rule. Explain in detail how this works. Which decision surface do you think is smoother: the one for 1-NN or the one for 100-NN? Explain. Is k-NN a parametric or a non-parametric method? Give an important property of non-parametric classification methods. We will do linear regression on data of the form $(X_n, Y_n)$, where $X_n$ and $Y_n$ are real values: $Y_n = A X_n + b + \epsilon_n$, where $A, b$ are parameters and $\epsilon_n$ is the noise variable. Provide the equation for the total error over the data-items. We want to minimize this error. With respect to what? You are given a new attribute value $X_{\text{new}}$. What would you predict for $Y_{\text{new}}$?