Data Mining Course 2007 Eric Postma Clustering

Overview
Three approaches to clustering:
1. Minimization of reconstruction error: PCA, nlPCA, k-means clustering
2. Distance preservation: Sammon mapping, Isomap, SPE
3. Maximum likelihood density estimation: Gaussian Mixtures

These datasets have identical statistics up to 2nd order

1. Minimization of reconstruction error

Illustration of PCA (1) Face dataset (Rice database)

Illustration of PCA (2) Average face

Illustration of PCA (3) Top 10 Eigenfaces

Each 39-dimensional data item describes different aspects of the welfare and poverty of one country. 2D PCA projection
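As a hedged illustration of how an eigenface decomposition and a low-dimensional PCA projection like the ones above can be computed, the sketch below uses scikit-learn's PCA; the array shapes and the random placeholder data are assumptions, not the Rice face database.

```python
# Minimal PCA sketch (assumed setup: `faces` is an (n_images, n_pixels) array).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
faces = rng.random((400, 64 * 64))        # placeholder for a real face dataset

pca = PCA(n_components=10)                # keep the top 10 principal components
scores = pca.fit_transform(faces)         # low-dimensional representation of each face

average_face = pca.mean_.reshape(64, 64)          # the "average face"
eigenfaces = pca.components_.reshape(10, 64, 64)  # the top 10 eigenfaces
reconstruction = pca.inverse_transform(scores)    # minimizes squared reconstruction error
```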

Non-linear PCA Using neural networks (to be discussed tomorrow)

2. Distance preservation

Sammon mapping
Given a data set X, the distance between any two samples i and j is defined as D_ij.
We consider the projection onto a two-dimensional plane in which the projected points are separated by distances d_ij.
Define an error function that penalizes the discrepancy between D_ij and d_ij.
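The error function referred to here is Sammon's stress, which weighs each squared discrepancy by the corresponding original distance:

E = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{(D_{ij} - d_{ij})^{2}}{D_{ij}}

Minimizing E by gradient descent yields the Sammon mapping.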

Sammon mapping

Main limitations of Sammon
The Sammon mapping procedure is a gradient-descent method; its main limitation is local minima.
MDS may be preferred because it finds a global minimum (being based on PCA).
Both methods have difficulty with "curved or curly subspaces".

Isomap (Tenenbaum et al.)
Build a graph in which each node represents a data point, connected to its K nearest neighbours
Compute shortest distances along the graph (e.g., with Dijkstra's algorithm)
Store all pairwise graph distances in a matrix D
Perform MDS on the matrix D
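A hedged sketch of these steps using scikit-learn's built-in Isomap (K = 7 neighbours and a 2-dimensional embedding, matching the illustrations that follow); the Swiss-roll data is generated here purely for illustration.

```python
# Isomap sketch: neighbourhood graph -> graph shortest paths -> MDS embedding.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 3D manifold data

embedding = Isomap(n_neighbors=7, n_components=2)         # K = 7, embed in 2D
Y = embedding.fit_transform(X)                             # shortest-path distances + MDS

# The matrix of graph (geodesic) distances used internally:
D = embedding.dist_matrix_
```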

Illustration of Isomap (1) For two arbitrary points on the manifold, the Euclidean distance does not always reflect similarity (cf. the dashed blue line)

Illustration of Isomap (2) Isomap finds the appropriate shortest path along the graph (red curve, for K=7, N=1000)

Illustration of Isomap (3) Two-dimensional embedding (red line is the shortest path along the graph, blue line is the true distance in the embedding).

Illustration of Isomap (4) Isomap's (●) ability to find the intrinsic dimensionality, as compared to PCA and MDS (∆ and o).

Illustration of Isomap (5)

Illustration of Isomap (6)

Illustration of Isomap (7) Interpolation along a straight line

Stochastic Proximity Embedding (SPE) algorithm
Agrafiotis, D.K. and Xu, H. (2002). A self-organizing principle for learning nonlinear manifolds. Proceedings of the National Academy of Sciences U.S.A.

Stress function: compares the output proximity between points i and j with the input proximity between points i and j
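A hedged, minimal Python sketch of the SPE update described by Agrafiotis and Xu: repeatedly pick a random pair of points and nudge their low-dimensional coordinates so that the output proximity d_ij moves towards the input proximity r_ij. The function name, learning-rate schedule, and step count are illustrative assumptions, not values taken from the paper.

```python
# Stochastic Proximity Embedding (SPE): hedged, minimal sketch.
import numpy as np

def spe(R, dim=2, n_steps=100_000, lam=1.0, lam_min=0.01, eps=1e-10, seed=0):
    """R: (n, n) matrix of input proximities r_ij; returns an (n, dim) embedding."""
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    Y = rng.random((n, dim))                          # random initial coordinates
    decay = (lam - lam_min) / n_steps                 # linear learning-rate schedule (assumption)
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)   # pick a random pair of points
        diff = Y[i] - Y[j]
        d = np.linalg.norm(diff)                      # current output proximity d_ij
        # Move the pair so that d_ij approaches the input proximity r_ij.
        step = lam * 0.5 * (R[i, j] - d) / (d + eps) * diff
        Y[i] += step
        Y[j] -= step
        lam -= decay
    return Y
```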

Swiss roll data set Original 3D set; 2D embedding obtained by SPE

Stress as a function of embedding dimension (averaged over 30 runs)

Scalability (# steps for four set sizes) Linear scaling

Conformations of methylpropylether C1-C2-C3-O4-C5

Diamine combinatorial library

Clustering
Minimize the total within-cluster variance (reconstruction error), E = Σ_c Σ_i k_ic ‖x_i − μ_c‖², where k_ic = 1 if data point i belongs to cluster c (and 0 otherwise).
K-means clustering:
1. Random selection of C cluster centres
2. Partition the data by assigning each point to its nearest cluster
3. The mean of each partition is the new cluster centre
A distance threshold may be used…
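A minimal Python sketch of these three steps (hypothetical function and variable names; a fixed number of iterations stands in for a proper convergence test, and empty clusters are not handled):

```python
# K-means sketch: assign points to the nearest centre, then recompute the centres.
import numpy as np

def kmeans(X, n_clusters, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Random selection of C cluster centres (here: C random data points).
    centres = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # 2. Partition the data: assign each point to its nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. The mean of each partition is the new cluster centre.
        # (Note: a cluster that loses all its points is not handled in this sketch.)
        centres = np.array([X[labels == c].mean(axis=0) for c in range(n_clusters)])
    return labels, centres
```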

Effect of distance threshold on the number of clusters

Main limitation of k-means clustering
Final partitioning and cluster centres depend on the initial configuration
Discrete partitioning may introduce errors
Instead of minimizing the reconstruction error, we may maximize the likelihood of the data (given some probabilistic model)

Neural algorithms related to k-means
Kohonen self-organizing feature maps
Competitive learning networks

3. Maximum likelihood

Gaussian Mixtures
Model the pdf of the data using a mixture of distributions, where K is the number of kernels (K << # data points).
Common choice for the component densities p(x|i):
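Written out, the mixture density and the usual isotropic Gaussian choice for the component densities p(x|i) are (a standard formulation; the exact parameterization shown on the original slide is not preserved in this transcript):

p(\mathbf{x}) = \sum_{i=1}^{K} P(i)\, p(\mathbf{x} \mid i), \qquad
p(\mathbf{x} \mid i) = \frac{1}{(2\pi\sigma_i^{2})^{d/2}} \exp\!\left( -\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert^{2}}{2\sigma_i^{2}} \right)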

Illustration of EM applied to the GM model
The solid line gives the initialization of the EM algorithm: two kernels, P(1) = P(2) = 0.5, μ1 = …, μ2 = …, σ1 = σ2 = 0.2356

Convergence after 10 EM steps.
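A hedged sketch of fitting such a two-kernel spherical mixture by EM with scikit-learn; the data, seed, and iteration cap below are placeholders rather than the slide's actual values.

```python
# EM for a Gaussian mixture: fit two spherical kernels to 1D data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder 1D data drawn from two components (stand-in for the slide's data).
x = np.concatenate([rng.normal(0.3, 0.1, 300),
                    rng.normal(0.7, 0.1, 300)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, covariance_type="spherical", max_iter=10)
gm.fit(x)                              # runs EM (here capped at 10 steps)

print(gm.weights_)                     # mixing proportions P(1), P(2)
print(gm.means_.ravel())               # kernel means mu_1, mu_2
print(np.sqrt(gm.covariances_))        # kernel standard deviations sigma_1, sigma_2
```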

Relevant literature: L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik (submitted). Dimensionality Reduction: A Comparative Review.