# DIMENSIONALITY REDUCTION Computer Graphics Course, June 2013


What is high dimensional data?
- Images – dimension 3·X·Y
- Videos – dimension of one image × number of frames
- Documents
- Most data, actually!

How many dimensions?
- Images – dimension 3·X·Y
- This is the number of bytes in the uncompressed image (one byte per color channel per pixel)
- We can treat each byte as a dimension
- Each image is a point in high dimensional space
- Which space? The “space of images of size X·Y”
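
To make the 3·X·Y count concrete, here is a minimal Python sketch, assuming NumPy and an 8-bit RGB image stored as a height × width × 3 array. The helper name `image_to_point` and the image size are made up for illustration.

```python
import numpy as np

def image_to_point(image):
    """Flatten an H x W x 3 image into one vector (one coordinate per byte)."""
    return image.reshape(-1).astype(np.float64)

# Hypothetical 64 x 48 RGB image (X = 64, Y = 48), filled with zeros for the demo.
X, Y = 64, 48
image = np.zeros((Y, X, 3), dtype=np.uint8)
point = image_to_point(image)
print(point.shape)  # (9216,), i.e. 3 * X * Y dimensions
```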

How many dimensions?
- But we can describe an image using fewer bytes!
- “Blue sky, green grass, yellow road…”
- “Drawing of a kung-fu rat”

Why do Dimensionality Reduction?
- Visualization: understanding the structure of data
- With fewer dimensions, data is easier to describe and correlations (rules) are easier to find
- Compression of data for efficiency
- Clustering
- Discovering similarities between elements

Why do Dimensionality Reduction?
- Curse of dimensionality
  - 100000000000
  - 010000000000
  - 001000000000
  - 000100000000
  - ……
- All these vectors are at the same Euclidean distance from each other
- But some dimensions could be “worth more”
- Can you work with 1,000 images of 1,000,000 dimensions?
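
The equal-distance claim can be checked numerically. The following is a small sketch, assuming NumPy; the helper name `pairwise_distances` is illustrative. It builds the twelve one-hot vectors listed above and measures every pairwise Euclidean distance.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

one_hot = np.eye(12)                      # rows: 1000..., 0100..., 0010..., ...
dist = pairwise_distances(one_hot)
off_diagonal = dist[~np.eye(12, dtype=bool)]

# Any two distinct one-hot vectors differ in exactly two coordinates,
# so every distance equals sqrt(2).
print(off_diagonal.min(), off_diagonal.max())
```

Because all distances are identical, the raw Euclidean metric carries no information about which vectors are "similar" here.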

How to reduce dimensions?
- Image features:
  - Average colors
  - Histograms
  - FFT based features (frequency space)
  - More…
- Video features
- Document features
- Etc…

How to reduce dimensions?
- Feature dimension is still quite high (512, 1024, etc.)
- What now?

Linear Dimensionality Reduction
- Simplest way: project all points onto a plane (2D) or a lower-dimensional subspace
- Only one question: which plane is the best?
- PCA (SVD)
- For specific applications:
  - CCA (correlation)
  - LDA (data with labels)
  - NMF (non-negative components)
  - ICA (multiple sources)
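
As an illustration of "which plane is the best", here is a minimal PCA-by-SVD sketch, assuming NumPy. The function name `pca_project` and the synthetic data are assumptions for the demo, not part of the slides. PCA picks the plane that keeps the most variance, which is the same plane that minimizes the squared projection error.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project rows of `data` onto the top principal directions found by SVD."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are orthonormal directions, sorted by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    explained = singular_values[:n_components] ** 2 / (len(data) - 1)
    return projected, components, mean, explained

# Synthetic demo data (an assumption): 200 points lying close to a plane in 10-D.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # 10 x 2, orthonormal columns
data = latent @ basis.T + 0.01 * rng.normal(size=(200, 10))

projected, components, mean, explained = pca_project(data, n_components=2)
reconstruction = projected @ components + mean
print(projected.shape, np.abs(reconstruction - data).max())
```

Since the demo data is nearly planar, mapping the 2D coordinates back to 10-D reproduces the input up to the small added noise.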

Non-Linear Dimensionality Reduction
- What if the data is not linear?
- No plane will work here

Non-Linear Dimensionality Reduction
- MDS – MultiDimensional Scaling
- Use only distances between elements
- Try to reconstruct element positions from the distances, such that the distances between the reconstructed positions approximate the given ones
- Reconstruction can happen in 1D, 2D, 3D, …
- More dimensions = less error

Non-Linear Dimensionality Reduction
- MDS – MultiDimensional Scaling
- Classical MDS: an algebraic solution
  - Construct a squared proximity matrix and normalize it (“double centering”)
  - Extract the d largest eigenvalues and their eigenvectors
  - Multiply each eigenvector by sqrt(eigenvalue)
  - Each row gives the coordinates of its corresponding point

Non-Linear Dimensionality Reduction
- MDS – MultiDimensional Scaling
- Classical MDS: an algebraic solution
- [Figure: coordinate matrix with one column per eigenvector (e1 … e5) and one row per point (x1 … x5)]
- Each vector adds a dimension to the mapping …
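
The classical MDS steps can be sketched as follows, assuming NumPy and a matrix of Euclidean distances as input. The function name `classical_mds` and the demo points are illustrative. The factor of -0.5 in the double centering is the usual textbook convention and is an assumption about what the slides intend.

```python
import numpy as np

def classical_mds(distances, n_dims=2):
    """Recover point coordinates from a matrix of pairwise distances."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double centering of the squared distances gives a Gram (inner product) matrix.
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)       # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_dims]         # d largest
    top_values = np.clip(eigenvalues[order], 0.0, None)    # guard tiny negatives
    # Scale each eigenvector by sqrt(eigenvalue); each row is one point.
    return eigenvectors[:, order] * np.sqrt(top_values)

# Demo: distances of random 2-D points are reproduced by a 2-D embedding.
rng = np.random.default_rng(1)
points = rng.normal(size=(30, 2))
distances = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
embedding = classical_mds(distances, n_dims=2)
recovered = np.linalg.norm(embedding[:, None] - embedding[None, :], axis=-1)
print(np.abs(recovered - distances).max())
```

The embedding matches the original points only up to rotation, reflection and translation, which is why the check compares distances rather than coordinates.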

Non-Linear Dimensionality Reduction
- Non-metric MDS: optimization problem
- Example: Sammon’s projection
  - Start from random positions for each element
  - Define the stress of the system (a measure of how far the current distances are from the original ones)
  - In each step, move towards positions that reduce the stress (gradient descent)
  - Continue until convergence
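
A minimal sketch of this loop, assuming NumPy and the commonly quoted Sammon stress, E = (1/Σ D_ij) · Σ (D_ij − d_ij)² / D_ij, where D are the original distances and d the distances in the low-dimensional layout. The slide's own stress formula is not reproduced in the transcript, so this choice is an assumption. The step-size halving rule is also an addition of this sketch (Sammon's original paper uses a second-order update), included only to keep plain gradient descent stable.

```python
import numpy as np

def pairwise(points):
    """Euclidean distance matrix between all rows of `points`."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

def sammon_stress(target, positions):
    """Sammon stress between target distances and distances of `positions`."""
    upper = np.triu_indices(len(target), k=1)
    d_target = target[upper]
    d_low = pairwise(positions)[upper]
    return ((d_target - d_low) ** 2 / d_target).sum() / d_target.sum()

def sammon_gradient(target, positions, eps=1e-12):
    """Gradient of the stress with respect to every low-dimensional position."""
    n = len(target)
    d_low = pairwise(positions)
    off_diag = ~np.eye(n, dtype=bool)
    coeff = np.zeros_like(target)
    coeff[off_diag] = (target[off_diag] - d_low[off_diag]) / (
        target[off_diag] * d_low[off_diag] + eps)
    diff = positions[:, None, :] - positions[None, :, :]
    scale = target[np.triu_indices(n, k=1)].sum()
    return -2.0 / scale * (coeff[:, :, None] * diff).sum(axis=1)

def sammon_projection(target, n_dims=2, n_steps=200, step=1.0, seed=0):
    """Gradient descent on the stress, starting from random positions."""
    rng = np.random.default_rng(seed)
    positions = rng.normal(size=(len(target), n_dims))
    stress = sammon_stress(target, positions)
    history = [stress]
    for _ in range(n_steps):
        candidate = positions - step * sammon_gradient(target, positions)
        candidate_stress = sammon_stress(target, candidate)
        if candidate_stress < stress:      # accept only steps that reduce stress
            positions, stress = candidate, candidate_stress
            step *= 1.2
        else:                              # overshoot: retry with a smaller step
            step *= 0.5
        history.append(stress)
    return positions, history

# Demo data (an assumption): 30 random points in 5-D, laid out in 2-D.
rng = np.random.default_rng(42)
target = pairwise(rng.normal(size=(30, 5)))
positions, history = sammon_projection(target)
print(history[0], history[-1])
```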

Non-Linear Dimensionality Reduction
- Spectral embedding:
  - Create a graph of nearest neighbors
  - Compute the graph Laplacian (relates to the probability of walking on each edge in a random walk)
  - Compute eigenvalues – why?
  - Computing eigenvalues is like multiplying the matrix by itself many, many times (towards infinity), which is like performing random walks over and over until we reach a stable point
  - Again, the eigenvectors are the coordinates
  - Does not preserve distances like MDS; instead it groups together points that are likely neighbors
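
The pipeline above can be sketched in a few lines, assuming NumPy. This version uses an unweighted, symmetrized k-nearest-neighbor graph and the unnormalized Laplacian L = D − A, which is one common variant and an assumption here; the random-walk view in the slides corresponds more directly to a normalized Laplacian. The function names and the helix test data are illustrative.

```python
import numpy as np

def knn_adjacency(points, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = np.argsort(dist, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    adjacency = np.zeros((n, n))
    adjacency[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0
    return np.maximum(adjacency, adjacency.T)          # make the graph undirected

def spectral_embedding(points, k=4, n_dims=2):
    """Coordinates from the Laplacian eigenvectors with the smallest eigenvalues."""
    adjacency = knn_adjacency(points, k)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)   # ascending order
    # Skip the first eigenvector: for a connected graph it is constant.
    return eigenvectors[:, 1:n_dims + 1], eigenvalues

# Demo data (an assumption): 60 evenly spaced points along a helix in 3-D.
t = np.linspace(0.0, 3.0, 60)
points = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
embedding, eigenvalues = spectral_embedding(points)
print(embedding.shape, eigenvalues[:3])
```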

Non-Linear Dimensionality Reduction
- Other non-linear methods
  - Locally Linear Embedding (LLE): expresses each point as a linear combination of its neighbors
  - Isomap: takes an adjacency graph as input and computes MDS on the geodesic distances (distances on the graph)
  - Self Organizing Maps (SOM): next part…

SELF ORGANIZING MAPS & RECENT APPLICATIONS Computer Graphics Course, June 2013

Self Organizing Maps (SOM)
- Originated from neural networks
- Created by Kohonen, 1982
- Also known as Kohonen Maps
- Teuvo Kohonen: a Finnish researcher in learning and neural networks
- Due to SOM, became the most cited Finnish scientist!
- More than 8,000 citations
- So what is it?

What is a SOM?
- A type of neural network
- What is a neuron?
  - A function with several inputs and one output
  - In this case – usually a linear combination of the inputs according to weights

What is a SOM?
- [Figure: neurons, input (x_k), weights (m_ik)]
- No connection (feedback/feed forward) between neurons

Training a SOM
- Start from random weights
- For each input X(t) at iteration t:
  - Find the Best Matching Cell (BMC), also called the Best Matching Unit (BMU), for X(t)
  - Update the weights of each neuron close to the BMU
  - Weights are updated according to a decaying learning rate and radius
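
The training loop can be sketched as follows, assuming NumPy, a rectangular grid of neurons, a Gaussian neighborhood on the grid, and linear decay of both learning rate and radius. All of these choices, the parameter names (`lr0`, `radius0`), and the demo data are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def train_som(data, height, width, n_steps, lr0=0.5, radius0=None, seed=0):
    """Train a height x width SOM on the rows of `data`; returns the weights."""
    rng = np.random.default_rng(seed)
    radius0 = radius0 or max(height, width) / 2
    weights = rng.random((height, width, data.shape[1]))   # random start
    rows, cols = np.indices((height, width))
    for t in range(n_steps):
        x = data[rng.integers(len(data))]                  # pick one input X(t)
        # Best matching unit: neuron whose weights are closest to the input.
        dist = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Learning rate and radius both decay over time.
        decay = 1.0 - t / n_steps
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        # Neurons close to the BMU on the grid are pulled toward the input.
        grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
        influence = np.exp(-grid_dist2 / (2 * radius ** 2))
        weights += lr * influence[:, :, None] * (x - weights)
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its best matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    dist = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return dist.min(axis=1).mean()

# Demo data (an assumption): 300 colors clustered around red, green and blue.
rng = np.random.default_rng(3)
centers = np.array([[0.9, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.9]])
data = np.clip(centers[rng.integers(3, size=300)]
               + 0.02 * rng.normal(size=(300, 3)), 0.0, 1.0)

initial = train_som(data, 8, 8, n_steps=0)      # same seed: the untrained weights
trained = train_som(data, 8, 8, n_steps=3000)
print(quantization_error(data, initial), quantization_error(data, trained))
```

Calling `train_som` with `n_steps=0` returns the random starting weights for the same seed, which gives a simple baseline for comparing the quantization error before and after training.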

Training a SOM
- [Figure: neurons (m_i), with inputs X(1), X(2) and their best matching cells BMC(1), BMC(2)]

Training a SOM – The Math
- Best Matching Cell: the neuron m_c whose distance to the input x(t) is minimal
- Another option for BMC: maximal dot product x(t)^T m_c(t)
- Weight adaptation: each weight m_i is moved toward the input using a learning rate that depends on both the time and the distance of m_i from the BMC m_c
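
For reference, a sketch of the usual textbook (Kohonen) form of these rules, which matches the verbal description above. The symbols $h_{ci}$, $\alpha$, $\sigma$ and the grid positions $r_i$ are assumptions and may differ from the slide's notation; the Gaussian neighborhood is one common choice, not necessarily the one shown in the lecture.

```latex
% Best matching cell c for the input x(t): smallest Euclidean distance
c = \arg\min_i \lVert x(t) - m_i(t) \rVert

% Weight adaptation: every neuron moves toward the input,
% scaled by a neighborhood-dependent learning rate h_{ci}(t)
m_i(t+1) = m_i(t) + h_{ci}(t)\,\bigl[\,x(t) - m_i(t)\,\bigr]

% One common neighborhood function: learning rate \alpha(t) times a Gaussian
% of the grid distance between neuron i and the best matching cell c
h_{ci}(t) = \alpha(t)\,\exp\!\left(-\frac{\lVert r_c - r_i \rVert^{2}}{2\,\sigma(t)^{2}}\right)
```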

Training a SOM – The Math
- Example (motion map): … = 0.25·(H+W)·(1 − t/n_L)
- Terms annotated in the formula: distance between BMC and m_i; learning rate; kernel width; maximum number of iterations (n_L); height and width of the neuron map (H, W)
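
A small sketch of how such a schedule might be used, assuming NumPy. It assumes that the surviving expression 0.25·(H+W)·(1 − t/n_L) is the kernel width and that the kernel is a Gaussian of the grid distance to the BMC; both are interpretations, and the function names are hypothetical.

```python
import numpy as np

def kernel_width(t, n_steps, height, width):
    """Kernel width that shrinks linearly from 0.25 * (H + W) to zero."""
    return 0.25 * (height + width) * (1.0 - t / n_steps)

def neighborhood(grid_distance, t, n_steps, height, width, learning_rate):
    """Gaussian neighborhood weight for a neuron at `grid_distance` from the BMC.

    Only valid for t < n_steps, where the kernel width is still positive.
    """
    sigma = kernel_width(t, n_steps, height, width)
    return learning_rate * np.exp(-grid_distance ** 2 / (2.0 * sigma ** 2))

# On a 10 x 10 map with 100 iterations the width starts at 5 cells
# and has dropped to 2.5 cells halfway through training.
print(kernel_width(0, 100, 10, 10), kernel_width(50, 100, 10, 10))
```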

Presenting a SOM
- Option 1: at each node, present the data that relates to vector m_i (3D data, colors, continuous spaces)
- For a color map with 3 inputs, if a neuron's weights are (0.7, 0.2, 0.3), we would show a reddish color with a 0.7 red component, a 0.2 green component and a 0.3 blue component
- For a map of points on the plane with 2 inputs, we would draw a point for each neuron at position (W_x, W_y)

Presenting a SOM
- Option 2: give each neuron a representative from the training set X, namely the sample closest to vector m_i
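
Option 2 amounts to a nearest-neighbor lookup per neuron. A minimal sketch, assuming NumPy, Euclidean distance, and weights stored as a height × width × inputs array; the function name and the tiny 2×2 example are made up for illustration.

```python
import numpy as np

def nearest_training_samples(weights, training_set):
    """For each neuron, return the index of the closest training sample."""
    height, width, dim = weights.shape
    flat = weights.reshape(-1, dim)
    dist = np.linalg.norm(flat[:, None, :] - training_set[None, :, :], axis=-1)
    return dist.argmin(axis=1).reshape(height, width)

# Hypothetical 2 x 2 map with 2 inputs, and five training samples.
weights = np.array([[[0.0, 0.0], [1.0, 0.0]],
                    [[0.0, 1.0], [1.0, 1.0]]])
training_set = np.array([[0.9, 0.1], [0.1, 0.1], [0.2, 0.8],
                         [0.8, 0.9], [0.5, 0.5]])
representatives = nearest_training_samples(weights, training_set)
print(representatives)  # [[1 0]
                        #  [2 3]]
```

Each grid cell would then display the training item with that index (an image, a posture, a document) instead of the raw weight vector.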

More Examples

Motion Map
- Motion Map: Image-based Retrieval and Segmentation of Motion Data
- Sakamoto, Kuriyama, Kaneko
- SCA: Symposium on Computer Animation 2004
- Goal: present the user with a grid of postures in order to select a clip of motion data from a large database
- Perform clustering on the SOM instead of the abstract data

Motion Map
- Example results: 436 posture samples from 55K frames of 51 motion files

Motion Map
- Example results: clustering based on SOM

Motion Map - Details
- A map of posture samples is created from all motion files together
- Samples are thinned using a threshold on each sample's similarity to its closest sample, which reduces computation time
- A standard SOM is calculated
- Each posture is then connected to a hash table of the motion files that contain similar postures
- Clustering the SOM enables display of a simplified map to the user (next page)

Motion Map - Details
- Simplified map after SOM clustering: 17 dance styles

Procedural Texture Preview
- Eurographics 2012
- Goal: present the user with a single image which shows all possibilities of a procedural texture
- Method overview:
  - Selecting candidate parameter vectors which maximize completeness, variety and smoothness
  - Organizing the candidates in a SOM
  - Synthesis of a continuous map

Procedural Texture Preview
- Results
- [Figure labels: thumbnails of random parameters; texture preview in a single image; texture parameters]

Procedural Texture Preview - Details
- Selecting candidates for the parameters map using the following optimizations:
  - C = a set of dense samples
  - X = the candidates in the parameter map
- Completeness: minimize
- Variety: maximize
- Smoothness: minimize

Procedural Texture Preview - Details
- A standard SOM will jointly optimize the completeness and the smoothness
- To optimize the variety as well, the SOM implementation switches between minimizing E_c and maximizing E_v
- Instead of a regular learning rate, at each step the candidates (weight vectors) are replaced by a new candidate according to the above optimizations

Procedural Texture Preview - Details
- After the candidate selection, an image is synthesized which smoothly combines all selected candidates
- Stitching is done using standard patch-based texture synthesis methods (Graphcut Textures, Kwatra et al., TOG 2003)

Procedural Texture Preview
- Some more results

That’s all folks!
- Questions?