Clustering & Dimensionality Reduction 273A Intro Machine Learning

What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning we are only given the attributes, and our task is to discover structure in the data. Example I: the data may be structured in clusters. Example II: the data may live on a lower-dimensional manifold.

Why Discover Structure? Data compression: if you have a good model you can encode the data more cheaply. Example (PCA): to encode 2-D data directly, I have to encode the x and y position of every data-case. If the data lie close to a line, I could instead encode the offset and angle of the line plus each data-case's (small) deviation from the line. Small numbers can be encoded more cheaply than large numbers at the same precision. This idea is the basis for model selection: the complexity of your model (e.g. the number of parameters) should be such that you can encode the data-set with the fewest bits (up to a certain precision). Clustering does the same thing: represent every data-case by a cluster representative plus its deviation from it. More generally, ML is often trying to find semantically meaningful representations (abstractions), which are a good basis for making new predictions.
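As a rough illustration of the compression idea (not from the slides), here is a minimal sketch: fit the principal direction of 2-D points that lie near a line and encode each point by its 1-D coordinate along that direction plus a small residual. The data and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical 2-D data lying near a line (illustrative only).
rng = np.random.default_rng(0)
t = rng.uniform(-5, 5, size=200)
X = np.column_stack([t, 0.5 * t + 1.0]) + 0.05 * rng.normal(size=(200, 2))

# "Offset" = mean, "angle" = first principal direction.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
direction = Vt[0]                       # unit vector along the line

coords = (X - mean) @ direction         # one coordinate per point (large-ish)
residuals = (X - mean) - np.outer(coords, direction)  # tiny deviations

print("std along line:", coords.std())        # expensive to encode
print("std of residuals:", residuals.std())   # cheap to encode
```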

Clustering: K-means We iterate two operations: 1. update the assignment of data-cases to clusters; 2. update the location of each cluster. Let $z_i = c$ denote the assignment of data-case "i" to cluster "c", let $\mu_c$ denote the position of cluster "c" in a d-dimensional space, and let $x_i$ denote the location of data-case "i". Then iterate until convergence: 1. For each data-case, compute the distance to every cluster and assign it to the closest one: $z_i = \arg\min_c \|x_i - \mu_c\|^2$. 2. For each cluster, set its location to the mean of all data-cases assigned to it: $\mu_c = \frac{1}{N_c} \sum_{i \in S_c} x_i$, where $S_c$ is the set of data-cases assigned to cluster "c" and $N_c$ is the number of data-cases in cluster "c".
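A minimal NumPy sketch of the two alternating steps described above (the function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain k-means: alternate the assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize cluster locations on K randomly chosen data-cases.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each data-case to its closest cluster.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        z = d2.argmin(axis=1)
        # Step 2: move each cluster to the mean of its assigned data-cases.
        new_mu = np.array([X[z == c].mean(axis=0) if np.any(z == c) else mu[c]
                           for c in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return z, mu
```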

K-means Cost function: $C = \sum_{i=1}^N \|x_i - \mu_{z_i}\|^2$, the total squared distance of each data-case to its assigned cluster location. Each step in k-means decreases this cost function. Initialization is often very important since C has very many local minima. A relatively good initialization: place the cluster locations on K randomly chosen data-cases. How to choose K? Add a complexity term that grows with K and minimize over K as well, or use cross-validation, or Bayesian methods.
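A sketch of the penalized-cost idea for choosing K; the specific BIC-style penalty $\tfrac{1}{2}Kd\log N$ below is my assumption, not taken from the slides.

```python
import numpy as np

def kmeans_cost(X, z, mu):
    """Sum of squared distances of each data-case to its cluster location."""
    return ((X - mu[z]) ** 2).sum()

def choose_K(X, K_max=10):
    N, d = X.shape
    best_K, best_score = None, np.inf
    for K in range(1, K_max + 1):
        z, mu = kmeans(X, K)                # k-means sketch from above
        score = kmeans_cost(X, z, mu) + 0.5 * K * d * np.log(N)  # assumed penalty
        if score < best_score:
            best_K, best_score = K, score
    return best_K
```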

Vector Quantization K-means divides the space up into a Voronoi tessellation. Every point in a cell is summarized by that cell's code-book vector (the cluster location, marked "+" on the slide). This clearly allows for data compression!
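A small sketch of the compression this gives: instead of storing N d-dimensional vectors, store a K-entry code-book plus one integer index per data-case (again reusing the k-means sketch above; the bit-counting is only a rough illustration):

```python
import numpy as np

def vector_quantize(X, K=16):
    """Encode each data-case by the index of its nearest code-book vector."""
    z, codebook = kmeans(X, K)          # k-means sketch from above
    decoded = codebook[z]               # lossy reconstruction of the data
    # Rough storage comparison (assuming 32-bit floats / minimal index bits).
    raw_bits = X.size * 32
    vq_bits = codebook.size * 32 + len(X) * int(np.ceil(np.log2(K)))
    return z, codebook, decoded, raw_bits, vq_bits
```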

Mixtures of Gaussians K-means assigns each data-case to exactly one cluster. But what if clusters are overlapping? Then we may be uncertain about which cluster a data-case really belongs to. The mixture-of-Gaussians algorithm instead assigns each data-case to every cluster with a certain probability.

MoG Clustering Each cluster is modeled by a Gaussian density; its covariance determines the shape of the (elliptical) probability contours. Idea: fit these Gaussian densities to the data, one per cluster.
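Written out explicitly (a standard form consistent with the E-step and objective below; the slide shows it only as a figure), the density being fit is a weighted sum of Gaussians:

```latex
p(x) \;=\; \sum_{c=1}^{K} \pi_c \, \mathcal{N}(x \mid \mu_c, \Sigma_c),
\qquad
\mathcal{N}(x \mid \mu_c, \Sigma_c) \;=\;
\frac{1}{(2\pi)^{d/2} |\Sigma_c|^{1/2}}
\exp\!\Big(-\tfrac{1}{2}(x-\mu_c)^\top \Sigma_c^{-1} (x-\mu_c)\Big)
```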

EM Algorithm: E-step Compute the responsibilities $r_{ic} = \frac{\pi_c\,\mathcal{N}(x_i \mid \mu_c, \Sigma_c)}{\sum_{c'} \pi_{c'}\,\mathcal{N}(x_i \mid \mu_{c'}, \Sigma_{c'})}$. Here $r_{ic}$ is the probability that data-case "i" belongs to cluster "c", and $\pi_c$ is the a priori probability of being assigned to cluster "c". Note that if the Gaussian has high probability on data-case "i" (i.e. the bell shape sits on top of the data-case) then it claims high responsibility for this data-case. The denominator just normalizes the responsibilities so that $\sum_c r_{ic} = 1$.
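A minimal sketch of the E-step, using SciPy's multivariate normal density (variable names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, covs):
    """Responsibilities r[i, c] = P(cluster c | data-case i)."""
    K = len(pis)
    # Unnormalized responsibility: prior weight times Gaussian density.
    r = np.column_stack([
        pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=covs[c])
        for c in range(K)
    ])
    r /= r.sum(axis=1, keepdims=True)   # normalize each row to sum to 1
    return r
```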

EM Algorithm: M-step Using the responsibilities, update the parameters of every cluster: $N_c = \sum_i r_{ic}$ is the total responsibility claimed by cluster "c"; $\pi_c = N_c / N$ is the expected fraction of data-cases assigned to this cluster; $\mu_c = \frac{1}{N_c}\sum_i r_{ic}\, x_i$ is a weighted sample mean, where every data-case is weighted according to the probability that it belongs to that cluster; and $\Sigma_c = \frac{1}{N_c}\sum_i r_{ic}\,(x_i-\mu_c)(x_i-\mu_c)^\top$ is the corresponding weighted sample covariance.
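And the matching M-step sketch (same naming conventions as the E-step sketch above):

```python
import numpy as np

def m_step(X, r):
    """Re-estimate mixture weights, means and covariances from responsibilities."""
    N, d = X.shape
    Nc = r.sum(axis=0)                              # total responsibility per cluster
    pis = Nc / N                                    # expected fraction of data-cases
    mus = (r.T @ X) / Nc[:, None]                   # weighted sample means
    covs = []
    for c in range(r.shape[1]):
        diff = X - mus[c]
        covs.append((r[:, c, None] * diff).T @ diff / Nc[c])  # weighted covariance
    return pis, mus, np.array(covs)
```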

EM-MoG EM stands for "expectation maximization"; we won't go through the derivation. If we are forced to decide, we assign a data-case to the cluster that claims the highest responsibility. For a new data-case, we compute responsibilities as in the E-step and pick the cluster with the largest responsibility. The E and M steps are iterated until convergence (which is guaranteed). Every step increases the following objective function, the total log-probability of the data under the model we are learning: $L = \sum_{i=1}^N \log \sum_{c=1}^K \pi_c\, \mathcal{N}(x_i \mid \mu_c, \Sigma_c)$.
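Putting the pieces together, a sketch of the full EM loop with the log-probability above used as the convergence check (initialization on randomly chosen data-cases is my own choice here, mirroring the k-means advice):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, covs):
    """Total log-probability of the data under the current mixture."""
    dens = np.column_stack([
        pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=covs[c])
        for c in range(len(pis))
    ])
    return np.log(dens.sum(axis=1)).sum()

def em_mog(X, K, n_iters=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pis = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    prev = -np.inf
    for _ in range(n_iters):
        r = e_step(X, pis, mus, covs)       # E-step sketch from above
        pis, mus, covs = m_step(X, r)       # M-step sketch from above
        cur = log_likelihood(X, pis, mus, covs)
        if cur - prev < tol:                # the objective never decreases
            break
        prev = cur
    return pis, mus, covs, r
```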