The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Dimensionality Reduction Part 1: Linear Methods Comp 790-090 Spring 2007.


1 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Dimensionality Reduction Part 1: Linear Methods Comp 790-090 Spring 2007

2 Overview
- What is dimensionality reduction?
  - Simplifying complex data
  - Using dimensionality reduction as a data mining “tool”
  - Useful for both “data modeling” and “data analysis”
  - Tool for “clustering” and “regression”
- Linear dimensionality reduction methods
  - Principal Component Analysis (PCA)
  - Multi-Dimensional Scaling (MDS)
- Non-linear dimensionality reduction
  - Next week

3 What is Dimensionality Reduction?
- Given N objects, each with M measurements, find the best D-dimensional parameterization
- Goal: find a “compact parameterization” or “latent variable” representation
  - Given N examples of $x_i \in \mathbb{R}^M$, find $y_i \in \mathbb{R}^D$ where $D < M$
- Underlying assumptions of DimRedux
  - Measurements over-specify the data, M > D
  - The number of measurements exceeds the number of “true” degrees of freedom in the system
  - The measurements capture all of the significant variability

4 Uses for DimRedux
- Build a “compact” model of the data
  - Compression for storage, transmission, and retrieval
  - Parameters for indexing, exploring, and organizing
  - Generate “plausible” new data
- Answer fundamental questions about data
  - What is its underlying dimensionality?
  - How many degrees of freedom are exhibited?
  - How many “latent variables”?
  - How independent are my measurements?
  - Is there a projection of my data set where important relationships stand out?

5 DimRedux in Data Modeling
- Data clustering: continuous to discrete
  - The curse of dimensionality: the sampling density is proportional to $N^{1/p}$, for N samples in p dimensions
  - Need a mapping to a lower-dimensional space that preserves “important” relations
- Regression modeling: continuous to continuous
  - A functional model that generates input data
  - Useful for interpolation
- Embedding space
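
As a worked illustration of the $N^{1/p}$ scaling (the sample counts below are chosen for illustration and do not come from the slides), the number of samples available per axis shrinks quickly as the dimension p grows:

```latex
\[
\text{samples per axis} \;\propto\; N^{1/p}:
\qquad 1000^{1/1} = 1000, \quad 1000^{1/3} = 10, \quad 1000^{1/10} \approx 2 .
\]
\[
\text{Matching the per-axis density of } 100 \text{ points in 1-D requires }
100^{10} = 10^{20} \text{ points in 10-D.}
\]
```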

6 Today’s Focus
- Linear DimRedux methods
  - PCA: Pearson (1901); Hotelling (1933)
  - MDS: Torgerson (1952); Shepard (1962)
- “Linear” assumption
  - Data is a linear function of the parameters (latent variables)
  - Data lies on a linear (affine) subspace, $x = M y + b$, where the matrix M is m x d

7 PCA: What problem does it solve?
- Minimizes “least-squares” (Euclidean) error
  - The D-dimensional model provided by PCA has the smallest Euclidean error of any D-parameter linear model: $\sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2$ is minimal, where $\hat{x}_i$ is the model predicted by the D-dimensional PCA
- Projects data such that the variance is maximized
- Finds an optimal “orthogonal” basis set for describing the given data
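
The least-squares and maximum-variance criteria on this slide describe the same optimum. One common way to write that equivalence is sketched below; the symbols $\bar{x}$ (data mean), $E_D$ (matrix of D orthonormal basis vectors), and $C$ (sample covariance) are notation assumed here rather than taken from the slides:

```latex
\[
\hat{x}_i = \bar{x} + E_D E_D^{T}(x_i - \bar{x}), \qquad
C = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^{T}
\]
\[
\min_{E_D^{T}E_D = I}\; \sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert^2
\;\Longleftrightarrow\;
\max_{E_D^{T}E_D = I}\; \operatorname{tr}\!\left(E_D^{T} C\, E_D\right)
\]
```

Both problems are solved by taking the columns of $E_D$ to be the D eigenvectors of $C$ with the largest eigenvalues.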

8 Principal Component Analysis
- Also known to engineers as the Karhunen-Loève Transform (KLT)
- Rotate data points to align successive axes with directions of greatest variance
  - Subtract mean from data
  - Normalize variance along each direction, and reorder according to the variance magnitude from high to low
  - Normalized variance direction = principal component
- Eigenvectors of the system’s covariance matrix
  - Permute to order eigenvectors in descending order
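
The steps listed on this slide can be sketched in a few lines of MATLAB. This is a minimal illustration on made-up data, not the course's own code; the variable names (X, Xc, vars, order, Y) are assumptions, and the data layout (one measurement per row, one sample per column) follows the example slides.

```matlab
% Hypothetical data: 3 measurements (rows) by 200 samples (columns).
X = [2 0 0; 0 1 0; 0 0 0.1] * randn(3, 200);

% Step 1: subtract the mean of each measurement.
Xc = X - mean(X, 2) * ones(1, size(X, 2));

% Step 2: eigendecomposition of the covariance matrix.
% cov expects one observation per row, hence the transpose.
C = cov(Xc');
C = (C + C') / 2;                 % guard against round-off asymmetry
[E, L] = eig(C);

% Step 3: permute so the eigenvalues (variances) run from high to low.
[vars, order] = sort(diag(L), 'descend');
E = E(:, order);

% Step 4: rotate the centered data onto the principal axes.
Y = E' * Xc;
```

After the rotation, the covariance of Y is diagonal and its diagonal holds the sorted variances, which is what "aligning successive axes with directions of greatest variance" amounts to.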

9 Simple PCA Example
Simple 3D example
>> x = rand(2, 500);                                      % two latent parameters per sample
>> z = [1,0; 0,1; -1,-1] * x + [0;0;1] * ones(1, 500);    % embed them on a plane in 3-D
>> m = (100 * rand(3,3)) * z + rand(3, 500);              % random linear mixing plus noise
>> scatter3(m(1,:), m(2,:), m(3,:), 'filled');

10 Simple PCA Example (cont)
>> mm = m - mean(m')' * ones(1, 500);             % subtract the mean of each measurement
>> [E,L] = eig(cov(mm'));                         % eigenvectors and eigenvalues of the covariance
>> E
E =
    0.8029   -0.5958    0.0212
    0.1629    0.2535    0.9535
    0.5735    0.7621   -0.3006
>> L
L =
  172.2525         0         0
         0  116.2234         0
         0         0    0.0837
>> newm = E' * (m - mean(m')' * ones(1, 500));    % rotate the centered data onto the eigenvectors
>> scatter3(newm(1,:), newm(2,:), newm(3,:), 'filled');
>> axis([-50,50, -50,50, -50,50]);

11 Simple PCA Example (cont)

12 PCA Applied to Reillumination
- Illumination can be modeled as an additive linear system.
- [equation in terms of R, i, x, y not recoverable]

13 Simulating New Lighting
- We can simulate the appearance of a model under new illumination by combining images taken from a set of basis lights
- We can then capture real-world lighting and use it to modulate our basis lighting functions
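
Because the lighting model is additive, "combining images taken from a set of basis lights" reduces to a weighted sum of the basis images. The sketch below illustrates this with random stand-in data; the sizes and the names B, w, relit are hypothetical and not from the slides.

```matlab
% Hypothetical sizes: 64 pixels per image, 8 basis lights.
numPixels = 64;
numLights = 8;

% Each column is one image of the scene lit by a single basis light.
B = rand(numPixels, numLights);

% Captured real-world lighting, expressed as one weight per basis light.
w = rand(numLights, 1);

% The relit image is a weighted sum of the basis images.
relit = B * w;

% Same result written as the explicit scale-and-add loop.
relitLoop = zeros(numPixels, 1);
for k = 1:numLights
    relitLoop = relitLoop + w(k) * B(:, k);
end
```

The loop makes the cost visible: every basis image is scaled and accumulated once per relighting, which is why a large set of basis lights becomes expensive.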

14 Problems
- There are too many basis lighting functions
  - These have to be stored in order to use them
  - The resulting lighting model can be huge, in particular when representing high-frequency lighting
  - Lighting differences can be very subtle
- The cost of modulation is excessive
  - Every basis image must be scaled and added together
  - Each image requires a high dynamic range
- Is there a more compact representation?
  - Yes, use PCA.

15 PCA Applied to Illumination
- More than 90% of the variance is captured in the first five principal components
- Generate new illumination by combining only 5 basis images
- V0 for n lights
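
A statement such as "more than 90% of the variance is captured in the first five principal components" comes from the cumulative sum of the sorted eigenvalues. The following sketch shows that calculation on synthetic data built from five underlying factors; the data, sizes, and variable names are assumptions for illustration, not the illumination data set from the slides.

```matlab
% Hypothetical image stack: 40 lighting conditions (rows), 100 pixels (columns),
% generated from 5 underlying factors plus a little noise.
A = randn(40, 5) * randn(5, 100) + 0.01 * randn(40, 100);

% Eigenvalues of the covariance matrix, sorted from high to low.
C = cov(A);
C = (C + C') / 2;                   % guard against round-off asymmetry
lambda = sort(eig(C), 'descend');
lambda = max(lambda, 0);            % clip tiny negative round-off

% Cumulative fraction of total variance explained by the first k components.
explained = cumsum(lambda) / sum(lambda);

% Smallest k that reaches the 90% threshold.
k = find(explained >= 0.90, 1);
```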

16 Results Video

17 Results Video

18 Results Video

19 MDS: What problem does it solve?
- Takes as input a dissimilarity matrix M, containing pairwise dissimilarities between N-dimensional data points
- Finds the best D-dimensional linear parameterization compatible with M (in other words, outputs a projection of the data in D-dimensional space where the pairwise distances match the original dissimilarities as faithfully as possible)

20 Multidimensional Scaling (MDS)
- Dissimilarities can be metric or non-metric
- Useful when absolute measurements are unavailable; uses relative measurements
- Computation is invariant to the dimensionality of the data

21 An example: map of the US
- Given only the distances between a bunch of cities

22 An example: map of the US
- MDS finds suitable coordinates for the points in the specified number of dimensions.

23 MDS Properties
- Parameterization is not unique; axes are meaningless
  - Not surprising, since Euclidean transformations and reflections preserve distances between points
- Useful for visualizing relationships in high-dimensional data
  - Define a dissimilarity measure
  - Map to a lower-dimensional space using MDS
- Common preprocess before cluster analysis
  - Aids in understanding patterns and relationships in data
- Widely used in marketing and psychometrics
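
The non-uniqueness noted on this slide is easy to check numerically: rotating, reflecting, and translating a point set leaves every pairwise distance unchanged, so MDS cannot tell the configurations apart. The sketch below uses a made-up 2-D point set; the names P, Q, R, F, and sqDist are illustrative assumptions.

```matlab
% Hypothetical 2-D configuration of 6 points (one point per column).
P = rand(2, 6);
n = size(P, 2);

% Rotate by 40 degrees, reflect about the x-axis, and shift.
theta = 40 * pi / 180;
R = [cos(theta), -sin(theta); sin(theta), cos(theta)];
F = [1, 0; 0, -1];
Q = R * F * P + [3; -2] * ones(1, n);

% Squared pairwise distances via d_ij^2 = |p_i|^2 + |p_j|^2 - 2 p_i . p_j
sqDist = @(X) sum(X.^2, 1)' * ones(1, size(X, 2)) ...
            + ones(size(X, 2), 1) * sum(X.^2, 1) ...
            - 2 * (X' * X);

% Largest difference between the two distance tables (round-off only).
gap = max(max(abs(sqDist(P) - sqDist(Q))));
```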

24 Dissimilarities
- Dissimilarities are distance-like quantities that satisfy the following conditions: $d_{ij} \ge 0$, $d_{ii} = 0$, and $d_{ij} = d_{ji}$
- A dissimilarity is metric if, in addition, it satisfies “the triangle inequality”: $d_{ij} \le d_{ik} + d_{kj}$

25 Relating MDS to PCA
- Special case: when distances are Euclidean
- PCA = eigendecomposition of the covariance matrix $M^T M$
- Convert the pair-wise distance matrix to the covariance matrix

26 How to get $M^T M$ from Euclidean Pair-wise Distances
- Triangle with vertices i, j, k
- Law of cosines
- Definition of a dot product
- Eigendecomposition on b to get $V S V^T$
- $V S^{1/2}$ = matrix of new coordinates
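
The two ingredients named on this slide combine as follows. This is a reconstruction under assumed notation: $d_{ij}$ is the Euclidean distance between points $p_i$ and $p_j$, point $k$ is taken as the origin, $\theta_k$ is the angle of the triangle at $k$, and $b_{ij}$ is the resulting inner product.

```latex
\begin{align*}
d_{ij}^2 &= d_{ki}^2 + d_{kj}^2 - 2\, d_{ki}\, d_{kj} \cos\theta_k
  && \text{(law of cosines)} \\
b_{ij} &= (p_i - p_k)\cdot(p_j - p_k) = d_{ki}\, d_{kj} \cos\theta_k
  && \text{(definition of a dot product)} \\
\Rightarrow\quad b_{ij} &= \tfrac{1}{2}\left(d_{ki}^2 + d_{kj}^2 - d_{ij}^2\right)
\end{align*}
```

Every inner product is therefore available from distances alone, and the eigendecomposition of the matrix of $b_{ij}$ values yields coordinates.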

27 Algebraically…
- The distance between points $p_i$ and $p_j$
- The *column average*: the average distance that a given point is from $p_j$
- The *row average*: the average distance that a given point is from $p_i$
- The “matrix average”
- So we “centered” the matrix
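
A standard form of the centering identity that these four quantities lead to is sketched below. It assumes the origin sits at the centroid of the points ($\sum_k p_k = 0$), writes $b_{ij} = p_i \cdot p_j$, and takes the averages over squared distances; the index names are chosen here for illustration.

```latex
\begin{align*}
d_{ij}^2 &= \lVert p_i - p_j \rVert^2 = b_{ii} + b_{jj} - 2\,b_{ij} \\
\frac{1}{N}\sum_{k} d_{kj}^2 &= \frac{1}{N}\sum_{k} b_{kk} + b_{jj}
  && \text{(column average)} \\
\frac{1}{N}\sum_{l} d_{il}^2 &= b_{ii} + \frac{1}{N}\sum_{l} b_{ll}
  && \text{(row average)} \\
\frac{1}{N^2}\sum_{k,l} d_{kl}^2 &= \frac{2}{N}\sum_{k} b_{kk}
  && \text{(matrix average)} \\
b_{ij} &= -\frac{1}{2}\Bigl( d_{ij}^2 - \frac{1}{N}\sum_{k} d_{kj}^2
  - \frac{1}{N}\sum_{l} d_{il}^2 + \frac{1}{N^2}\sum_{k,l} d_{kl}^2 \Bigr)
\end{align*}
```

Subtracting the row and column averages and adding back the matrix average cancels the $b_{ii}$ and $b_{jj}$ terms, leaving only $-2\,b_{ij}$ inside the parentheses.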

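28 MDS Mechanics
- Given a dissimilarity matrix, D, the MDS model is computed as follows: $B = -\tfrac{1}{2} H D^{(2)} H$, where $D^{(2)}$ holds the squared dissimilarities
- Here H, the so-called “centering” matrix, is the identity matrix minus a scaled all-ones matrix: $H = I - \tfrac{1}{N} \mathbf{1}\mathbf{1}^T$
- MDS coordinates are given by the columns of $V S^{1/2}$ from the eigendecomposition $B = V S V^T$, in order of decreasing eigenvalue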
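
The mechanics on this slide correspond to classical (Torgerson) MDS, which can be sketched with base MATLAB functions only. The example below builds a distance matrix from made-up planar points and then recovers coordinates from the distances alone; the data and the names P, D, H, B, lambda, Y are assumptions for illustration.

```matlab
% Hypothetical ground truth: 10 points in the plane (one point per row).
N = 10;
P = rand(N, 2);

% Pairwise Euclidean distance matrix D (N x N).
G = P * P';
sq = diag(G);
D = sqrt(max(sq * ones(1, N) + ones(N, 1) * sq' - 2 * G, 0));

% Centering matrix H = I - (1/N) * 1 * 1'.
H = eye(N) - ones(N) / N;

% Double-centered inner-product matrix B = -1/2 * H * D.^2 * H.
B = -0.5 * H * (D .^ 2) * H;
B = (B + B') / 2;                  % remove round-off asymmetry before eig

% Eigendecomposition, sorted by decreasing eigenvalue.
[V, S] = eig(B);
[lambda, order] = sort(diag(S), 'descend');
V = V(:, order);

% d-dimensional MDS coordinates: eigenvectors scaled by sqrt(eigenvalue).
d = 2;
Y = V(:, 1:d) * diag(sqrt(lambda(1:d)));
```

Y reproduces the original pairwise distances but generally differs from P by a rotation, reflection, and translation. Because the input points are planar, the eigenvalues beyond the second are zero up to round-off.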

29 MDS Stress
- The residual variance of B (i.e., the sum of the remaining eigenvalues) indicates the goodness of fit for the selected d-dimensional model
- This term is often called MDS “stress”
- Examining the residual variance gives an indication of the inherent dimensionality

30 Reflectance Modeling Example
- From Pellacini et al., “Toward a Psychophysically-Based Light Reflection Model for Image Synthesis,” SIGGRAPH 2000
- Objective: find a perceptually meaningful parameterization for reflectance modeling
- The top row of white, grey, and black balls have the same “physical” reflectance parameters; however, the bottom row is “perceptually” more consistent.

31 Reflectance Modeling Example
- User task: subjects were presented with 378 pairs of rendered spheres and asked to rate their difference in “glossiness” on a scale of 0 (no difference) to 100
- A 27 x 27 dissimilarity matrix was constructed and MDS applied

32 Reflectance Modeling Example
- Parameters of a 2D embedding space were determined
- Two axes of “gloss” were established

33 Limitations of Linear Methods
- What if the data does not lie within a linear subspace?
- Do all convex combinations of the measurements generate plausible data?
- Low-dimensional non-linear manifold embedded in a higher-dimensional space
- Next time: Nonlinear Dimensionality Reduction

34 Summary
- Linear dimensionality reduction tools are widely used for
  - Data analysis
  - Data preprocessing
  - Data compression
- PCA transforms the measurement data such that successive directions of greatest variance are mapped to orthogonal axis directions (bases)
- A d-dimensional embedding space (parameterization) can be established by modeling the data using only the first d of these basis vectors
- Residual modeling error is the sum of the remaining eigenvalues
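
The last point of this summary can be checked numerically. The sketch below, on made-up data with hypothetical variable names, reconstructs the data from the first d basis vectors and compares the mean squared residual with the sum of the discarded eigenvalues; it assumes the same N-1 normalization that cov uses.

```matlab
% Hypothetical data: 4 measurements (rows) by 300 samples (columns).
N = 300;
X = randn(4, 4) * randn(4, N);

% Center the data and compute the sorted PCA basis.
xbar = mean(X, 2);
Xc = X - xbar * ones(1, N);
C = cov(Xc');
C = (C + C') / 2;                  % guard against round-off asymmetry
[E, L] = eig(C);
[lambda, order] = sort(diag(L), 'descend');
E = E(:, order);

% Model the data with only the first d basis vectors.
d = 2;
Ed = E(:, 1:d);
Xhat = xbar * ones(1, N) + Ed * (Ed' * Xc);

% Residual error per sample (N-1 normalization) versus discarded eigenvalues.
residual = sum(sum((X - Xhat) .^ 2)) / (N - 1);
discarded = sum(lambda(d+1:end));
```

The two quantities agree to round-off, which is why the eigenvalue spectrum can be used to pick d without reconstructing the data.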

35 Summary (cont)
- MDS finds a d-dimensional parameterization that best preserves a given dissimilarity matrix
- The resulting model can be Euclidean transformed to align the data with a more intuitive parameterization
- A d-dimensional embedding space (parameterization) is established by modeling the data using only the first d coordinates of the scaled eigenvectors
- Residual modeling error (MDS stress) is the sum of the remaining eigenvalues
- If a Euclidean metric dissimilarity matrix is used for MDS, the resulting d-dimensional model will match the PCA weights for the same dimensional model
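
The final claim, that MDS on Euclidean distances matches the PCA weights, can be illustrated with a small experiment. The sketch below uses made-up data and hypothetical variable names; it computes PCA weights from the measurements and MDS coordinates from the pairwise distances only, then compares them up to the sign of each axis.

```matlab
% Hypothetical data: 12 points with 3 measurements each (one point per row).
N = 12;
X = randn(N, 3) * [3 0 0; 0 2 0; 0 0 1];

% PCA weights: project centered data onto the sorted covariance eigenvectors.
Xc = X - ones(N, 1) * mean(X, 1);
C = (Xc' * Xc) / (N - 1);
C = (C + C') / 2;
[E, L] = eig(C);
[~, order] = sort(diag(L), 'descend');
pcaY = Xc * E(:, order);

% Classical MDS computed from squared Euclidean distances only.
G = X * X';
sq = diag(G);
D2 = sq * ones(1, N) + ones(N, 1) * sq' - 2 * G;
H = eye(N) - ones(N) / N;
B = -0.5 * H * D2 * H;
B = (B + B') / 2;
[V, S] = eig(B);
[mu, order] = sort(diag(S), 'descend');
d = 2;
mdsY = V(:, order(1:d)) * diag(sqrt(mu(1:d)));

% The two sets of coordinates agree up to the sign of each axis.
gap = max(max(abs(abs(mdsY) - abs(pcaY(:, 1:d)))));
```

The sign ambiguity is expected, since eigenvectors are only defined up to a factor of -1.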

