Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN © The MIT Press, 2004

CHAPTER 7: Clustering

Semiparametric Density Estimation
Parametric: assume a single model for p(x | C_i) (Chapters 4 and 5).
Semiparametric: p(x | C_i) is a mixture of densities; multiple possible explanations/prototypes, e.g., different handwriting styles, accents in speech.
Nonparametric: no model; the data speaks for itself (Chapter 8).

Mixture Densities

$$p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$$

where G_i are the components/groups/clusters, P(G_i) are the mixture proportions (priors), and p(x | G_i) are the component densities.
Gaussian mixture: p(x | G_i) ~ N(μ_i, Σ_i), with parameters Φ = {P(G_i), μ_i, Σ_i}_{i=1}^{k} estimated from the unlabeled sample X = {x^t}_t (unsupervised learning).
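As a concrete illustration of the mixture density above, here is a minimal Python sketch (not from the slides; the weights, means, and covariances are made-up example values) that evaluates p(x) = Σ_i P(G_i) p(x | G_i) at a batch of points:

import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.6, 0.4])               # P(G_i); must sum to 1
means = [np.zeros(2), np.array([3.0, 3.0])]  # mu_i
covs = [np.eye(2), 0.5 * np.eye(2)]          # Sigma_i

def mixture_pdf(X, weights, means, covs):
    # p(x) = sum_i P(G_i) N(x; mu_i, Sigma_i), evaluated for each row of X
    return sum(w * multivariate_normal(mu, S).pdf(X)
               for w, mu, S in zip(weights, means, covs))

print(mixture_pdf(np.array([[0.0, 0.0], [3.0, 3.0]]), weights, means, covs))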

Classes vs. Clusters
Supervised: X = {x^t, r^t}_t with classes C_i, i = 1, ..., K, where p(x | C_i) ~ N(μ_i, Σ_i) and Φ = {P(C_i), μ_i, Σ_i}_{i=1}^{K}.
Unsupervised: X = {x^t}_t with clusters G_i, i = 1, ..., k, where p(x | G_i) ~ N(μ_i, Σ_i) and Φ = {P(G_i), μ_i, Σ_i}_{i=1}^{k}, but the labels r^t_i are unknown.

k-Means Clustering
Find k reference vectors (prototypes/codebook vectors/codewords) m_j, j = 1, ..., k, that best represent the data.
Assign each instance to its nearest (most similar) reference:

$$b_i^t = \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases}$$

Reconstruction error:

$$E\bigl(\{m_i\}_{i=1}^{k} \mid X\bigr) = \sum_t \sum_i b_i^t \,\lVert x^t - m_i \rVert^2$$

Encoding/Decoding
(figure: vector quantization viewed as an encoder that maps x^t to the index i of its nearest codeword and a decoder that maps the index back to m_i)

k-means Clustering
(figure: pseudocode of the k-means procedure; a runnable sketch follows)
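Since the pseudocode image did not survive transcription, here is a minimal NumPy sketch of the batch k-means procedure described above; initializing with randomly chosen instances is one common choice and an assumption of this sketch:

import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialize the reference vectors m_j with k randomly chosen instances
    m = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step (b_i^t): label each x^t with its nearest m_i
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: move each m_i to the mean of its assigned instances
        new_m = np.array([X[labels == i].mean(axis=0) if (labels == i).any()
                          else m[i] for i in range(k)])
        if np.allclose(new_m, m):   # converged: prototypes stopped moving
            break
        m = new_m
    return m, labels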

Expectation-Maximization (EM)
Log likelihood with a mixture model:

$$\mathcal{L}(\Phi \mid X) = \log \prod_t p(x^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t \mid G_i)\, P(G_i)$$

Assume hidden variables z which, when known, make optimization much simpler.
Complete likelihood, L_c(Φ | X, Z), in terms of x and z.
Incomplete likelihood, L(Φ | X), in terms of x.

E- and M-steps
Iterate the two steps:
1. E-step: estimate z given X and the current Φ:

$$Q(\Phi \mid \Phi^l) = E[\mathcal{L}_c(\Phi \mid X, Z) \mid X, \Phi^l]$$

2. M-step: find the new Φ' given z, X, and the old Φ:

$$\Phi^{l+1} = \arg\max_{\Phi} Q(\Phi \mid \Phi^l)$$

An increase in Q increases the incomplete likelihood:

$$\mathcal{L}(\Phi^{l+1} \mid X) \ge \mathcal{L}(\Phi^{l} \mid X)$$

EM in Gaussian Mixtures
z^t_i = 1 if x^t belongs to G_i, 0 otherwise (the analogue of the labels r^t_i in supervised learning); assume p(x | G_i) ~ N(μ_i, Σ_i).
E-step:

$$E[z_i^t \mid X, \Phi^l] = P(G_i \mid x^t, \Phi^l) \equiv h_i^t = \frac{P(G_i)\, p(x^t \mid G_i, \Phi^l)}{\sum_j P(G_j)\, p(x^t \mid G_j, \Phi^l)}$$

M-step (use the estimated soft labels h_i^t in place of the unknown labels):

$$P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad m_i = \frac{\sum_t h_i^t\, x^t}{\sum_t h_i^t}, \qquad S_i = \frac{\sum_t h_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t h_i^t}$$
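A minimal NumPy/SciPy sketch of these E- and M-steps for a Gaussian mixture; the small ridge added to the covariances is a numerical-stability choice of this sketch, not part of the slides:

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = np.full(k, 1.0 / k)                      # mixture proportions P(G_i)
    m = X[rng.choice(n, size=k, replace=False)]  # component means m_i
    S = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(d)] * k)
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i) p(x^t | G_i) / sum_j P(G_j) p(x^t | G_j)
        h = np.column_stack([P[i] * multivariate_normal(m[i], S[i]).pdf(X)
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate proportions, means, covariances with soft labels
        Ni = h.sum(axis=0)
        P = Ni / n
        m = (h.T @ X) / Ni[:, None]
        for i in range(k):
            diff = X - m[i]
            S[i] = (h[:, i, None] * diff).T @ diff / Ni[i] + 1e-6 * np.eye(d)
    return P, m, S, h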

(figure: mixture fitted by EM; the dashed contour marks the points where P(G_1 | x) = h_1 = 0.5, the boundary between the clusters)

Mixtures of Latent Variable Models
Regularize the clusters:
1. Assume shared/diagonal covariance matrices.
2. Use PCA/FA to decrease dimensionality: mixtures of PCA/FA, with

$$p(x^t \mid G_i) = \mathcal{N}(m_i,\; V_i V_i^T + \Psi)$$

EM can be used to learn V_i (Ghahramani and Hinton, 1997; Tipping and Bishop, 1999).

After Clustering
Dimensionality reduction methods find correlations between features and group features; clustering methods find similarities between instances and group instances.
Clustering allows knowledge extraction through the number of clusters, the prior probabilities, and the cluster parameters, i.e., the center and the range of features.
Example: CRM, customer segmentation.

Clustering as Preprocessing
The estimated group labels h_j (soft) or b_j (hard) may be seen as the dimensions of a new k-dimensional space, in which we can then learn our discriminant or regressor.
Local representation (only one b_j is 1, all others are 0; only a few h_j are nonzero) vs. distributed representation (after PCA, all z_j are nonzero).

Mixture of Mixtures
In classification, the input comes from a mixture of classes (supervised). If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:

$$p(x \mid C_i) = \sum_{j=1}^{k_i} p(x \mid G_{ij})\, P(G_{ij}), \qquad p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i)$$

Hierarchical Clustering
Cluster based on similarities/distances. Distance measures between instances x^r and x^s:
Minkowski (L_p; Euclidean for p = 2):

$$d_m(x^r, x^s) = \left[ \sum_{j=1}^{d} \lvert x_j^r - x_j^s \rvert^p \right]^{1/p}$$

City-block distance:

$$d_{cb}(x^r, x^s) = \sum_{j=1}^{d} \lvert x_j^r - x_j^s \rvert$$
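Both distance measures translate directly into code; a small sketch:

import numpy as np

def minkowski(xr, xs, p=2):
    # L_p distance; Euclidean for p = 2
    return np.sum(np.abs(xr - xs) ** p) ** (1.0 / p)

def city_block(xr, xs):
    # L_1 distance, the p = 1 special case
    return np.sum(np.abs(xr - xs))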

Agglomerative Clustering
Start with N groups, each containing one instance, and merge the two closest groups at each iteration. Distance between two groups G_i and G_j:
Single-link: $d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$
Complete-link: $d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$
Average-link: the average of the pairwise distances; centroid: the distance between the group means.
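A minimal sketch using SciPy's standard hierarchical-clustering routines; the toy data is an assumption, and method='single' can be swapped for 'complete' or 'average' to match the other group distances above:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))    # toy data (assumption)
Z = linkage(X, method='single', metric='euclidean')  # full merge history
labels = fcluster(Z, t=3, criterion='maxclust')      # cut the tree into 3 groups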

Dendrogram
(figure: an example single-link clustering drawn as a dendrogram; the tree records the order of the merges and the distances at which they occur)

Choosing k
Defined by the application, e.g., image quantization.
Plot the data (after PCA) and check visually for clusters.
Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears in the reconstruction error, log likelihood, or intergroup distances.
Check the resulting clusters manually for meaning.
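A sketch of the elbow heuristic, reusing the k_means function from the earlier sketch; X is assumed to be the data matrix:

import numpy as np
import matplotlib.pyplot as plt

ks = range(1, 10)
errors = []
for k in ks:
    m, labels = k_means(X, k)                    # k_means sketch from above
    errors.append(np.sum((X - m[labels]) ** 2))  # reconstruction error E
plt.plot(list(ks), errors, marker='o')
plt.xlabel('k')
plt.ylabel('reconstruction error')
plt.show()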