CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Dimensionality Reduction CS 685: Special Topics in Data Mining Spring 2008 Jinze Liu

Overview
What is Dimensionality Reduction?
– Simplifying complex data
– Using dimensionality reduction as a data mining “tool”
– Useful for both “data modeling” and “data analysis”
– Tool for “clustering” and “regression”
Linear Dimensionality Reduction Methods
– Principal Component Analysis (PCA)
– Multi-Dimensional Scaling (MDS)
Non-Linear Dimensionality Reduction
– Next week

What is Dimensionality Reduction?
Given N objects, each with M measurements, find the best D-dimensional parameterization
Goal: Find a “compact parameterization” or “latent variable” representation
Given N examples $x_i \in \mathbb{R}^M$, find $y_i \in \mathbb{R}^D$, where $D < M$
Underlying assumptions of DimRedux
– Measurements over-specify the data, M > D
– The number of measurements exceeds the number of “true” degrees of freedom in the system
– The measurements capture all of the significant variability

Uses for DimRedux
Build a “compact” model of the data
– Compression for storage, transmission, and retrieval
– Parameters for indexing, exploring, and organizing
– Generate “plausible” new data
Answer fundamental questions about data
– What is its underlying dimensionality? How many degrees of freedom are exhibited? How many “latent variables”?
– How independent are my measurements?
– Is there a projection of my data set where important relationships stand out?

DimRedux in Data Modeling
Data Clustering - Continuous to Discrete
– The curse of dimensionality: the sampling density is proportional to $N^{1/p}$, for N samples in p dimensions
– Need a mapping to a lower-dimensional space that preserves “important” relations
Regression Modeling - Continuous to Continuous
– A functional model that generates input data
– Useful for interpolation
Embedding Space
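As a rough worked illustration of the $N^{1/p}$ scaling in the curse-of-dimensionality bullet above (the sample size N = 1000 is an arbitrary assumption, not a value from the slides), the effective number of samples per axis drops quickly as the dimension p grows:

```latex
\begin{align*}
p = 1:&\quad N^{1/p} = 1000^{1/1} = 1000 \\
p = 3:&\quad N^{1/p} = 1000^{1/3} = 10 \\
p = 10:&\quad N^{1/p} = 1000^{1/10} \approx 2
\end{align*}
```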

Today’s Focus
Linear DimRedux methods
– PCA: Pearson (1901); Hotelling (1933)
– MDS: Torgerson (1952); Shepard (1962)
“Linear” Assumption
– Data is a linear function of the parameters (latent variables)
– Data lies on a linear (affine) subspace, where the matrix M mapping parameters to data is m x d

PCA: What problem does it solve?
Minimizes “least-squares” (Euclidean) error
– The D-dimensional model provided by PCA has the smallest Euclidean error of any D-parameter linear model: it minimizes $\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2$, where $x_i$ is a data point and $\hat{x}_i$ is the model predicted by the D-dimensional PCA
Projects data such that the variance is maximized
Finds an optimal “orthogonal” basis set for describing the given data
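The link between the least-squares and maximum-variance views can be sketched as follows. The notation is an assumption made for this illustration: the data points $x_i$ are already mean-centered, $C$ is their covariance matrix, and $\lambda_k$, $e_k$ are its eigenvalues and orthonormal eigenvectors.

```latex
\begin{align*}
C &= \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T = \sum_{k=1}^{M} \lambda_k e_k e_k^T,
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_M \ge 0 \\
\hat{x}_i &= \sum_{k=1}^{D} (e_k^T x_i)\, e_k \\
\frac{1}{N}\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2
  &= \sum_{k=D+1}^{M} \lambda_k
   = \operatorname{tr}(C) - \sum_{k=1}^{D} \lambda_k
\end{align*}
```

Since $\operatorname{tr}(C)$ is fixed by the data, keeping the D directions with the largest eigenvalues both maximizes the retained variance and minimizes the mean squared error.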

Principal Component Analysis
Also known to engineers as the Karhunen-Loève Transform (KLT)
Rotate data points to align successive axes with directions of greatest variance
– Subtract mean from data
– Normalize variance along each direction, and reorder according to the variance magnitude from high to low
– Normalized variance direction = principal component
Eigenvectors of the system’s covariance matrix
– Permute to order eigenvectors in descending order of eigenvalue
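The steps above can be sketched in MATLAB roughly as follows. This is a minimal illustration, not the course's implementation: the data matrix X and all variable names are hypothetical, and the sketch rotates the data without rescaling each axis to unit variance.

```matlab
% Hypothetical data: M = 3 measurements (rows) for N = 6 points (columns).
X = [2 0 -1 3 1 -2; 1 0 -1 2 1 -1; 0 1 0 1 -1 0];
N = size(X, 2);

% 1. Subtract the mean of each measurement.
Xc = X - mean(X, 2) * ones(1, N);

% 2. Eigendecomposition of the covariance matrix (cov expects one point per row).
[E, L] = eig(cov(Xc'));

% 3. Reorder the eigenvectors by decreasing eigenvalue (variance).
[lambda, order] = sort(diag(L), 'descend');
E = E(:, order);

% 4. Rotate the centered data so its axes align with the principal components.
Y = E' * Xc;
```

After the rotation, the covariance of Y should be diagonal, with the sorted eigenvalues in `lambda` on the diagonal.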

Simple PCA Example
Simple 3D example
>> x = rand(2, 500);
>> z = [1,0; 0,1; -1,-1] * x + [0;0;1] * ones(1, 500);
>> m = (100 * rand(3,3)) * z + rand(3, 500);
>> scatter3(m(1,:), m(2,:), m(3,:), 'filled');

Simple PCA Example (cont)
>> mm = m - mean(m')' * ones(1, 500);   % subtract the mean of each row
>> [E,L] = eig(cov(mm'));               % eigenvectors in E, eigenvalues on diag(L)
>> E
>> L
>> newm = E' * (m - mean(m')' * ones(1, 500));   % rotate centered data onto the eigenvectors
>> scatter3(newm(1,:), newm(2,:), newm(3,:), 'filled');
>> axis([-50,50, -50,50, -50,50]);
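As a possible follow-up to the example, the sketch below regenerates the same kind of data and asks how much variance each component explains. The names `lambda`, `explained`, and `newm2` are introduced here for illustration and are not in the slides. Because z lies on a plane before noise is added, the first two components would be expected to capture nearly all of the variance.

```matlab
% Regenerate data as in the example: a noisy plane embedded in 3-D.
x = rand(2, 500);
z = [1,0; 0,1; -1,-1] * x + [0;0;1] * ones(1, 500);
m = (100 * rand(3,3)) * z + rand(3, 500);

% Center, then eigendecompose the covariance matrix.
mm = m - mean(m, 2) * ones(1, 500);
[E, L] = eig(cov(mm'));

% Sort components by decreasing variance.
[lambda, order] = sort(diag(L), 'descend');
E = E(:, order);

% Cumulative fraction of variance explained by the first 1, 2, 3 components.
explained = cumsum(lambda) / sum(lambda);

% Two-dimensional parameterization using only the first two components.
newm2 = E(:, 1:2)' * mm;
```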

PCA Applied to Reillumination
Illumination can be modeled as an additive linear system.

Simulating New Lighting
We can simulate the appearance of a model under new illumination by combining images taken from a set of basis lights
We can then capture real-world lighting and use it to modulate our basis lighting functions

Problems
There are too many basis lighting functions
– These have to be stored in order to use them
– The resulting lighting model can be huge, in particular when representing high-frequency lighting
– Lighting differences can be very subtle
The cost of modulation is excessive
– Every basis image must be scaled and added together
– Each image requires a high dynamic range
Is there a more compact representation?
– Yes, use PCA.

PCA Applied to Illumination
More than 90% of the variance is captured in the first five principal components
Generate new illumination by combining only 5 basis images
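The slides do not show the computation, so the following is only a sketch of how such a compression might look. It assumes each basis image is flattened into one column of a matrix B. The synthetic low-rank data, the sizes P, n, k, and all variable names are hypothetical stand-ins, not the data or method used in the course.

```matlab
% Hypothetical stand-in for n basis images of P pixels each (one image per column).
P = 64; n = 20; k = 5;
B = rand(P, 3) * rand(3, n);             % synthetic low-rank "basis images"

% PCA of the image set via the SVD of the mean-centered matrix.
mu = mean(B, 2);                         % mean image
Bc = B - mu * ones(1, n);
[U, S, V] = svd(Bc, 'econ');
Uk = U(:, 1:k);                          % first k principal "eigen-images"
C = Uk' * Bc;                            % k x n coefficients, one column per light

% Fraction of variance captured by the first k components.
sv2 = diag(S) .^ 2;
captured = sum(sv2(1:k)) / sum(sv2);

% Relight with hypothetical weights w, first with all n images, then with k.
w = rand(n, 1);
relit_full = B * w;
relit_pca = mu * sum(w) + Uk * (C * w);
```

With this synthetic data the centered matrix has rank at most 3, so the two results agree up to round-off. On real images the k-component result would only approximate the full one.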

Results Video

MDS: What problem does it solve?
Takes as input a dissimilarity matrix M, containing pairwise dissimilarities between N-dimensional data points
Finds the best D-dimensional linear parameterization compatible with M (in other words, outputs a projection of the data in D-dimensional space where the pairwise distances match the original dissimilarities as faithfully as possible)

Multidimensional Scaling (MDS)
Dissimilarities can be metric or non-metric
Useful when absolute measurements are unavailable; uses relative measurements
Computation is invariant to the dimensionality of the data

An example: map of the US
Given only the distances between a bunch of cities, MDS finds suitable coordinates for the points in the specified dimension.

MDS Properties
Parameterization is not unique
– Axes are meaningless
– Not surprising, since Euclidean transformations and reflections preserve distances between points
Useful for visualizing relationships in high-dimensional data
– Define a dissimilarity measure
– Map to a lower-dimensional space using MDS
Common preprocess before cluster analysis
– Aids in understanding patterns and relationships in data
Widely used in marketing and psychometrics
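The non-uniqueness noted above can be checked numerically. In this small sketch the points, the rotation angle, and the translation are arbitrary choices made for illustration. It rotates, reflects, and translates a point set, then confirms that the pairwise distances are unchanged.

```matlab
% Hypothetical 2-D points, one per column (corners of a 3 x 4 rectangle).
X = [0 3 0 3; 0 0 4 4];
n = size(X, 2);

theta = pi / 6;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotation
F = [1 0; 0 -1];                                       % reflection about the first axis
Y = F * R * X + [10; -2] * ones(1, n);                 % rotate, reflect, translate

% Pairwise Euclidean distance matrices before and after the transformation.
DX = zeros(n);
DY = zeros(n);
for i = 1:n
    for j = 1:n
        DX(i, j) = norm(X(:, i) - X(:, j));
        DY(i, j) = norm(Y(:, i) - Y(:, j));
    end
end

% Largest change in any pairwise distance (should be round-off only).
max_change = max(abs(DX(:) - DY(:)));
```

Since MDS sees only the distances, it cannot distinguish X from Y, which is why the recovered axes carry no intrinsic meaning.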

Dissimilarities
Dissimilarities are distance-like quantities that satisfy the following conditions:
– $d_{ii} = 0$
– $d_{ij} \ge 0$
– $d_{ij} = d_{ji}$
A dissimilarity is metric if, in addition, it satisfies “the triangle inequality”:
– $d_{ij} \le d_{ik} + d_{kj}$
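A small check of these properties might look like the sketch below. It assumes the usual conditions (zero self-dissimilarity, non-negativity, symmetry) plus the triangle inequality for the metric case. The matrix D is a hypothetical example and the variable names are illustrative.

```matlab
% Hypothetical 4 x 4 dissimilarity matrix (distances between rectangle corners).
D = [0 3 4 5; 3 0 5 4; 4 5 0 3; 5 4 3 0];
n = size(D, 1);

% Basic conditions: zero diagonal, non-negative entries, symmetry.
is_dissimilarity = all(diag(D) == 0) && all(D(:) >= 0) && isequal(D, D');

% Metric case: d(i,j) <= d(i,k) + d(k,j) for every triple (i, j, k).
is_metric = is_dissimilarity;
for i = 1:n
    for j = 1:n
        for k = 1:n
            if D(i, j) > D(i, k) + D(k, j) + 1e-12   % small tolerance for round-off
                is_metric = false;
            end
        end
    end
end
```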

Relating MDS to PCA
Special case: when distances are Euclidean
PCA = eigendecomposition of the covariance matrix $M^T M$
Convert the pair-wise distance matrix to the covariance matrix

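How to get $M^T M$ from Euclidean Pair-wise Distances
For a triangle with vertices i, j, k, where $\theta$ is the angle at vertex k:
– Law of cosines: $d_{ij}^2 = d_{ik}^2 + d_{jk}^2 - 2 d_{ik} d_{jk} \cos\theta$
– Definition of a dot product: $b_{ij} = d_{ik} d_{jk} \cos\theta$
– Combining the two: $b_{ij} = \frac{1}{2}\left(d_{ik}^2 + d_{jk}^2 - d_{ij}^2\right)$
Eigendecomposition on $b$ to get $V S V^T$
$V S^{1/2}$ = matrix of new coordinates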

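Algebraically…
$b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{N}\sum_{k} d_{kj}^2 - \frac{1}{N}\sum_{k} d_{ik}^2 + \frac{1}{N^2}\sum_{k}\sum_{l} d_{kl}^2\right)$
– $d_{ij}^2$: the squared distance between points $p_i$ and $p_j$
– The *column average*: the average squared distance that a given point is from $p_j$
– The *row average*: the average squared distance that a given point is from $p_i$
– The “matrix average”: the average over all squared distances
So we “centered” the matrix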
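One way to see why subtracting the row and column averages and adding back the matrix average recovers dot products is sketched below. It assumes, for this illustration only, that the points are expressed relative to their centroid, so that $\sum_k p_k = 0$; the symbol $T$ is introduced here as shorthand.

```latex
\begin{align*}
d_{ij}^2 &= \|p_i\|^2 + \|p_j\|^2 - 2\, b_{ij}, \qquad b_{ij} = p_i \cdot p_j \\
\frac{1}{N}\sum_{k} d_{ik}^2 &= \|p_i\|^2 + T, \qquad
\frac{1}{N}\sum_{k} d_{kj}^2 = \|p_j\|^2 + T, \qquad
T = \frac{1}{N}\sum_{k} \|p_k\|^2 \\
\frac{1}{N^2}\sum_{k}\sum_{l} d_{kl}^2 &= 2T \\
d_{ij}^2 - \frac{1}{N}\sum_{k} d_{ik}^2 - \frac{1}{N}\sum_{k} d_{kj}^2
  + \frac{1}{N^2}\sum_{k}\sum_{l} d_{kl}^2 &= -2\, b_{ij}
\end{align*}
```

The cross terms vanish in the averages because $\sum_k p_i \cdot p_k = p_i \cdot \sum_k p_k = 0$. Dividing the last line by $-2$ gives the centered dot-product matrix.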

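MDS Mechanics
Given a dissimilarity matrix, D, the MDS model is computed as follows:
$B = -\frac{1}{2} H D^{(2)} H$, where $D^{(2)}$ holds the squared dissimilarities $d_{ij}^2$
H, the so-called “centering” matrix, is the identity matrix minus a scaled matrix of ones, computed as follows:
$H = I - \frac{1}{N} \mathbf{1}\mathbf{1}^T$
MDS coordinates are given by (in order of decreasing eigenvalue):
$X = V S^{1/2}$, where $B = V S V^T$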
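A minimal sketch of these mechanics (classical, Torgerson-style MDS) is given below. It assumes the dissimilarities are Euclidean distances, and the input matrix D and all variable names are hypothetical. The input holds the distances between the corners of a 3 x 4 rectangle, so a 2-D embedding should reproduce them.

```matlab
% Hypothetical input: distances between the corners of a 3 x 4 rectangle.
D = [0 3 4 5; 3 0 5 4; 4 5 0 3; 5 4 3 0];
N = size(D, 1);
d = 2;                                   % target dimensionality

H = eye(N) - ones(N) / N;                % centering matrix
B = -0.5 * H * (D .^ 2) * H;             % double-centered squared distances
B = (B + B') / 2;                        % remove round-off asymmetry

% Eigendecomposition, sorted by decreasing eigenvalue.
[V, S] = eig(B);
[s, order] = sort(diag(S), 'descend');
V = V(:, order);

% MDS coordinates: one row per point, scaled by sqrt of the eigenvalues.
X = V(:, 1:d) * diag(sqrt(s(1:d)));

% Check: distances between the recovered points should match D.
Drec = zeros(N);
for i = 1:N
    for j = 1:N
        Drec(i, j) = norm(X(i, :) - X(j, :));
    end
end
max_err = max(abs(D(:) - Drec(:)));
```

The recovered coordinates may be a rotated or reflected copy of the original rectangle, since only the distances are constrained.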

MDS Stress
The residual variance of B (i.e. the sum of the remaining eigenvalues) indicates the goodness of fit for the selected d-dimensional model
This term is often called MDS “stress”
Examining the residual variance gives an indication of the inherent dimensionality
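To illustrate how the residual eigenvalues hint at the inherent dimensionality, the sketch below uses hypothetical points that lie almost in a plane within 3-D. The point set and variable names are assumptions for this example. The residual fraction should drop sharply after two dimensions and vanish after three.

```matlab
% Hypothetical 3-D points that are nearly planar (small third coordinate).
P = [0 0 0; 3 0 0.1; 0 4 -0.1; 3 4 0; 1.5 2 0.2];
N = size(P, 1);

% Squared Euclidean distance matrix.
D2 = zeros(N);
for i = 1:N
    for j = 1:N
        D2(i, j) = sum((P(i, :) - P(j, :)) .^ 2);
    end
end

% Double centering and eigenvalues in decreasing order.
H = eye(N) - ones(N) / N;
B = -0.5 * H * D2 * H;
s = sort(eig((B + B') / 2), 'descend');
s = max(s, 0);                           % clip round-off negatives

% residual(d): fraction of the eigenvalue sum left out of a d-dimensional model.
residual = 1 - cumsum(s) / sum(s);
```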

Reflectance Modeling Example
From Pellacini et al., “Toward a Psychophysically-Based Light Reflection Model for Image Synthesis,” SIGGRAPH 2000
Objective
– Find a perceptually meaningful parameterization for reflectance modeling
The top row of white, grey, and black balls have the same “physical” reflectance parameters; however, the bottom row is “perceptually” more consistent.

Reflectance Modeling Example (cont)
User Task
– Subjects were presented with 378 pairs of rendered spheres and asked to rate their difference in “glossiness” on a scale of 0 (no difference) to 100.
A 27 x 27 dissimilarity matrix was constructed and MDS applied

Reflectance Modeling Example (cont)
Parameters of a 2D embedding space were determined
Two axes of “gloss” were established

Limitations of Linear methods
What if the data does not lie within a linear subspace?
Do all convex combinations of the measurements generate plausible data?
Low-dimensional non-linear manifold embedded in a higher-dimensional space
Next time: Nonlinear Dimensionality Reduction
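A simple illustration of this limitation, using a made-up data set rather than one from the slides: points on a circle have a single degree of freedom (the angle t), yet PCA needs two components to describe them, because the circle does not lie in any one-dimensional linear subspace.

```matlab
% 200 evenly spaced angles: a one-parameter (1-D) manifold.
t = linspace(0, 2 * pi, 201);
t = t(1:end-1);
X = [cos(t); sin(t); zeros(1, 200)];     % circle embedded in 3-D

% PCA eigenvalues of the centered data, in decreasing order.
Xc = X - mean(X, 2) * ones(1, 200);
lambda = sort(eig(cov(Xc')), 'descend');

% Cumulative fraction of variance explained by the first 1, 2, 3 components.
explained = cumsum(lambda) / sum(lambda);
```

Here the first component explains only about half of the variance, and two components are needed to explain essentially all of it, even though one nonlinear parameter would suffice.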

Summary
Linear dimensionality reduction tools are widely used for
– Data analysis
– Data preprocessing
– Data compression
PCA transforms the measurement data such that successive directions of greatest variance are mapped to orthogonal axis directions (bases)
– A d-dimensional embedding space (parameterization) can be established by modeling the data using only the first d of these basis vectors
– Residual modeling error is the sum of the remaining eigenvalues

Summary (cont)
MDS finds a d-dimensional parameterization that best preserves a given dissimilarity matrix
– Resulting model can be Euclidean transformed to align data with a more intuitive parameterization
– A d-dimensional embedding space (parameterization) is established by modeling the data using only the first d coordinates of the scaled eigenvectors
– Residual modeling error (MDS stress) is the sum of the remaining eigenvalues
– If a Euclidean metric dissimilarity matrix is used for MDS, the resulting d-dimensional model will match the PCA weights for the same dimensional model
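The last point can be checked numerically. The sketch below uses a small hypothetical data matrix with illustrative variable names. It computes 2-D PCA weights from the data and 2-D classical MDS coordinates from the Euclidean distances alone, then compares them up to the sign ambiguity of each eigenvector.

```matlab
% Hypothetical data: 3 measurements (rows) for 6 points (columns).
X = [2 0 -1 3 1 -2; 1 0 -1 2 1 -1; 0 1 0 1 -1 0];
N = size(X, 2);
Xc = X - mean(X, 2) * ones(1, N);

% PCA weights: project the centered data onto the top two eigenvectors
% of the scatter matrix.
[E, L] = eig(Xc * Xc');
[~, order] = sort(diag(L), 'descend');
pca_w = (E(:, order(1:2))' * Xc)';               % N x 2

% Classical MDS using only the squared Euclidean distances.
G = X' * X;
D2 = diag(G) * ones(1, N) + ones(N, 1) * diag(G)' - 2 * G;
H = eye(N) - ones(N) / N;
B = -0.5 * H * D2 * H;
[V, S] = eig((B + B') / 2);
[s, order] = sort(diag(S), 'descend');
mds_x = V(:, order(1:2)) * diag(sqrt(s(1:2)));   % N x 2

% Each column may differ by a sign flip, so compare absolute values.
diff_abs = max(max(abs(abs(pca_w) - abs(mds_x))));
```

The comparison assumes the leading eigenvalues are distinct, as they are for this data. With repeated eigenvalues the two results could differ by a rotation within the shared eigenspace.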