1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)

2 Overview
§ So far, in all the learning techniques we considered, a training example consisted of a set of attributes (or features) with either a class (in the case of classification) or a real number (in the case of regression) attached to it.
§ Unsupervised learning takes as training examples the set of attributes/features alone.
§ The purpose of unsupervised learning is to find natural partitions in the training set.
§ Two general strategies for unsupervised learning are clustering and hierarchical clustering.

3 Clustering and Hierarchical Clustering
[Figure: a training set E partitioned into clusters E1, E2, ... (flat clustering), versus the same set organized as a tree of nested clusters (hierarchical clustering).]

4 What is Unsupervised Learning Useful for? (I)
§ Collecting and labeling a large set of sample patterns can be very expensive. By designing a basic classifier with a small set of labeled samples, and then tuning it by letting it run without supervision on a large, unlabeled set, much time and trouble can be saved.
§ We can train with large amounts of (often less expensive) unlabeled data, and then use supervision only to label the groupings found. This is useful for large "data mining" applications where the contents of the database are not known beforehand.

5 What is Unsupervised Learning Useful for? (II)
§ Unsupervised methods can also be used to find features that are useful for categorization. There are unsupervised methods that represent a form of data-dependent "smart pre-processing" or "smart feature extraction."
§ Lastly, it can be of interest to gain insight into the nature or structure of the data. The discovery of similarities among patterns, or of major departures from expected characteristics, may suggest a significantly different approach to designing the classifier.

6 Clustering Methods I: A Method Based on Euclidean Distance
Suppose we have R randomly chosen cluster seekers C_1, ..., C_R. (During the training process, these points will move towards the centers of the clusters of patterns.)
§ For each pattern X_i presented:
  - Find the cluster seeker C_j that is closest to X_i.
  - Move C_j closer to X_i as follows: C_j <- (1 - α_j) C_j + α_j X_i, where α_j is a learning rate parameter for the j-th cluster seeker that determines how far C_j is moved towards X_i.
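As a small illustration (not from the original slides), here is a minimal Python sketch of this step, assuming patterns and seekers are plain lists of floats; the helper names closest_seeker and move_seeker are my own:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def closest_seeker(seekers, pattern):
    """Index j of the cluster seeker C_j nearest to the pattern X_i."""
    return min(range(len(seekers)), key=lambda j: squared_distance(seekers[j], pattern))

def move_seeker(seeker, pattern, alpha):
    """C_j <- (1 - alpha_j) C_j + alpha_j X_i, applied component-wise."""
    return [(1 - alpha) * c + alpha * x for c, x in zip(seeker, pattern)]
```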

7 Clustering Methods I: A Method Based on Euclidean Distance
§ It might be useful to make the cluster seekers move less far as training proceeds.
§ To do so, let each cluster seeker have a mass, m_j, equal to the number of times it has moved.
§ If we set α_j = 1/(1 + m_j), it can be seen that a cluster seeker is always at the center of gravity (sample mean) of the set of patterns towards which it has so far moved.
§ Once the cluster seekers have converged, the implied classifier can be based on a Voronoi partitioning of the space (based on the distances to the various cluster seekers).
§ Note that it is important to normalize the pattern features.
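Putting the two slides together, a hedged sketch of the training loop might look like this; it reuses the hypothetical closest_seeker and move_seeker helpers sketched above, and the fixed number of passes and the random initialization from the training set are my own choices:

```python
import random

def train_cluster_seekers(patterns, R, passes=5, seed=0):
    """Move R cluster seekers with mass-based learning rates alpha_j = 1/(1 + m_j),
    so each seeker stays at the sample mean of the patterns it has absorbed so far."""
    rng = random.Random(seed)
    seekers = [list(p) for p in rng.sample(list(patterns), R)]  # random starting positions
    masses = [0] * R                                            # m_j: moves made by seeker j
    for _ in range(passes):
        for x in patterns:
            j = closest_seeker(seekers, x)        # nearest seeker hosts this pattern
            alpha = 1.0 / (1 + masses[j])         # shrinking learning rate
            seekers[j] = move_seeker(seekers[j], x, alpha)
            masses[j] += 1
    return seekers
```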

8 Clustering Methods I: A Method Based on Euclidean Distance
§ The number of cluster seekers can be chosen adaptively, as a function of the distance between seekers and the sample variance of each cluster. Examples:
  - If the distance d_ij between two cluster seekers C_i and C_j ever falls below some threshold, then merge the two clusters.
  - If the sample variance of some cluster is larger than some amount, then split that cluster into two separate clusters by adding an extra cluster seeker.
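A hedged sketch of these two adaptive checks, under the assumption that the "sample variance" of a cluster is measured as the mean squared distance of its patterns to the seeker; the names merge_threshold and split_threshold are hypothetical:

```python
def should_merge(seeker_a, seeker_b, merge_threshold):
    """Merge two clusters if their seekers are closer than merge_threshold (Euclidean)."""
    dist = sum((a - b) ** 2 for a, b in zip(seeker_a, seeker_b)) ** 0.5
    return dist < merge_threshold

def should_split(cluster_patterns, seeker, split_threshold):
    """Split a cluster if its sample variance (here: mean squared distance of its
    patterns to the seeker) exceeds split_threshold."""
    if not cluster_patterns:
        return False
    variance = sum(
        sum((x - c) ** 2 for x, c in zip(p, seeker)) for p in cluster_patterns
    ) / len(cluster_patterns)
    return variance > split_threshold
```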

9 Clustering Methods II: A Method Based on Probabilities
Given a set of unlabeled patterns, an empty list L of clusters, and a measure of similarity between an instance X = (x_1, ..., x_n) and a cluster C_i, S(X, C_i) = p(x_1|C_i) ... p(x_n|C_i) p(C_i):
§ For each pattern X (one at a time):
  - Compute S(X, C_i) for each cluster C_i, and let S(X, C_max) be the largest. If S(X, C_max) exceeds some threshold, assign X to C_max and update that cluster's sample statistics; otherwise, create a new cluster C_new = {X} and add C_new to L.
  - Merge any existing clusters C_i and C_j whose sample means M_i and M_j satisfy (M_i - M_j)^2 < some threshold, and compute the sample statistics for the merged cluster.
  - If the sample statistics did not change, then return L.
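Here is a hedged sketch of the similarity score for categorical patterns, assuming each cluster keeps per-component value counts as its sample statistics; the ClusterStats name and the crude add-one smoothing (used so unseen values do not zero out the product) are my own additions:

```python
from collections import defaultdict

class ClusterStats:
    """Per-cluster sample statistics: value counts for each pattern component."""
    def __init__(self, n_components):
        self.n = 0  # number of patterns assigned to this cluster
        self.counts = [defaultdict(int) for _ in range(n_components)]

    def add(self, pattern):
        self.n += 1
        for i, value in enumerate(pattern):
            self.counts[i][value] += 1

    def p_component(self, i, value):
        """Crudely smoothed estimate of p(x_i = value | C)."""
        return (self.counts[i].get(value, 0) + 1) / (self.n + 2)

def similarity(pattern, cluster, total_patterns):
    """S(X, C) = p(x_1|C) * ... * p(x_n|C) * p(C), with p(C) estimated as n_C / N."""
    s = cluster.n / max(total_patterns, 1)
    for i, value in enumerate(pattern):
        s *= cluster.p_component(i, value)
    return s
```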

10 Hierarchical Clustering I: A Method Based on Euclidean Distance
Agglomerative Clustering:
1. Compute the Euclidean distance between every pair of patterns. Let the smallest distance be between patterns X_i and X_j.
2. Create a cluster C composed of X_i and X_j. Replace X_i and X_j in the training set by the cluster vector C (the average of X_i and X_j).
3. Go back to step 1, treating clusters as points (though with an appropriate weight), until only one point is left.
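As an illustration, here is a hedged sketch of this procedure for numeric patterns; interpreting the "appropriate weight" as the number of original patterns behind each merged point is my own choice, and the returned merge list is just one convenient way to record the resulting hierarchy:

```python
def agglomerative_cluster(patterns):
    """Repeatedly merge the closest pair of points, replacing them by their weighted
    average, until one point remains. Returns the merges as (id_a, id_b, new_id)."""
    # each entry: (vector, weight, node_id); leaves are the original patterns
    points = [(list(p), 1, i) for i, p in enumerate(patterns)]
    merges = []
    next_id = len(patterns)
    while len(points) > 1:
        # find the closest pair (squared Euclidean distance)
        best = None
        for a in range(len(points)):
            for b in range(a + 1, len(points)):
                d = sum((x - y) ** 2 for x, y in zip(points[a][0], points[b][0]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        (va, wa, ia), (vb, wb, ib) = points[a], points[b]
        merged = [(wa * x + wb * y) / (wa + wb) for x, y in zip(va, vb)]  # weighted centroid
        merges.append((ia, ib, next_id))
        for idx in sorted((a, b), reverse=True):  # remove merged points, highest index first
            points.pop(idx)
        points.append((merged, wa + wb, next_id))
        next_id += 1
    return merges
```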

11 Hierarchical Clustering II: A Method Based on Probabilities
A probabilistic quality measure for partitions:
§ Given a partitioning of the training set into R classes C_1, ..., C_R, and given that each component of a pattern X = (x_1, ..., x_n) can take values v_ij (where i is the component number and j indexes the different values that component can take):
§ If we use the following probability measure for guessing the i-th component correctly, given that the pattern is in class C_k: Σ_j [p_i(v_ij|C_k)]^2, where p_i(v_ij|C_k) = probability(x_i = v_ij | C_k), then
§ The average number of components whose values are guessed correctly is Σ_i Σ_j [p_i(v_ij|C_k)]^2.
§ The goodness measure of this partitioning is Σ_k p(C_k) Σ_i Σ_j [p_i(v_ij|C_k)]^2.
§ The final measure of goodness is Z = (1/R) Σ_k p(C_k) Σ_i Σ_j [p_i(v_ij|C_k)]^2.
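A hedged sketch of this measure for a candidate partition, with p(C_k) estimated as the fraction of patterns in C_k and p_i(v_ij|C_k) as the within-cluster relative frequency; the function name partition_quality is my own:

```python
def partition_quality(clusters):
    """Z = (1/R) * sum_k p(C_k) * sum_i sum_j [p_i(v_ij | C_k)]^2 for a partition
    given as a list of clusters, each a list of categorical patterns (equal-length
    tuples). Probabilities are estimated by relative frequencies."""
    total = sum(len(c) for c in clusters)
    if total == 0 or not clusters:
        return 0.0
    R = len(clusters)
    z = 0.0
    for cluster in clusters:
        if not cluster:
            continue
        p_ck = len(cluster) / total                      # p(C_k)
        expected_correct = 0.0
        for i in range(len(cluster[0])):                 # sum over components i
            counts = {}
            for pattern in cluster:
                counts[pattern[i]] = counts.get(pattern[i], 0) + 1
            # sum over values j of [p_i(v_ij | C_k)]^2
            expected_correct += sum((c / len(cluster)) ** 2 for c in counts.values())
        z += p_ck * expected_correct
    return z / R
```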

12 Hierarchical Clustering II: A Method Based on Probabilities
COBWEB:
1. Start with a tree whose root node contains all the training patterns and has a single empty successor.
2. Select a pattern X_i (if there are no more patterns to select, then terminate).
3. Set the current node to the root node.
4. For each of the successors of the current node, calculate the best host for X_i, as a function of the Z value of the different potential partitions.
5. If the best host is an empty node, then place X_i in it, generate an empty successor and an empty sibling of it, and go to 2.
6. If the best host is a non-empty singleton node, then place X_i in it, generate two successors of it containing {X_i} and its previous contents, add empty successors everywhere, and go to 2.
7. If the best host is a non-empty, non-singleton node, then place X_i in it, set the current node to this host, and go to 4.
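To make step 4 concrete, here is a hedged, much-simplified sketch of choosing the best host at a single level of the tree, reusing the hypothetical partition_quality function sketched above; it only compares "add X to an existing successor" against "start a new singleton successor" and ignores the tree bookkeeping (empty nodes, recursion, merging and splitting) of the full algorithm:

```python
def best_host(successors, x):
    """Return ('existing', index) or ('new', None), whichever placement of pattern x
    gives the higher partition quality Z among the successors of the current node."""
    best_choice, best_z = ('new', None), partition_quality(successors + [[x]])
    for k in range(len(successors)):
        candidate = [list(c) for c in successors]
        candidate[k] = candidate[k] + [x]       # tentatively place x in successor k
        z = partition_quality(candidate)
        if z > best_z:
            best_choice, best_z = ('existing', k), z
    return best_choice
```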

13 Hierarchical Clustering II: A Method Based on Probabilities
Making COBWEB less order-dependent:
§ Node merging:
  - Attempt to merge the two best hosts.
  - If merging improves the Z value, then do so.
§ Node splitting:
  - Attempt to replace the best host among a group of siblings by that host's successors.
  - If splitting improves the Z value, then do so.
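A hedged sketch of the node-merging test at one level, again reusing the hypothetical partition_quality function from above; the analogous splitting test would replace a host by its successors and likewise keep the change only if Z improves:

```python
def try_merge(successors, a, b):
    """Merge successors a and b if doing so improves the partition quality Z.
    Returns the (possibly unchanged) list of successors."""
    merged = successors[a] + successors[b]
    candidate = [c for k, c in enumerate(successors) if k not in (a, b)] + [merged]
    if partition_quality(candidate) > partition_quality(successors):
        return candidate
    return successors
```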

14 Other Unsupervised Methods
§ There are a lot of other unsupervised learning methods. Examples:
  - k-means
  - The EM algorithm
  - Competitive learning
  - Kohonen's neural networks: self-organizing maps
  - Principal component analysis, autoassociation