Dimensionality Reduction

Dimensionality reduction is another unsupervised task. Like clustering, it is a form of data modeling: we try to identify statistically supportable patterns in data. Another way of looking at it: reduce the complexity of the data. Clustering reduces, say, 1000 data points to 3 clusters; dimensionality reduction instead reduces the complexity of the space in which the data lives, by finding a low-dimensional projection of the data.

Objective functions. All learning methods depend on optimizing some objective function; otherwise, you can't tell whether you're making any progress. The objective measures whether model A is better than model B. In supervised learning it is a loss function: the difference between predicted and actual values. In unsupervised learning it is a measure of model fit or distortion: how well does the model represent the data?
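As a concrete (if simplified) illustration, here is a minimal Python sketch contrasting the two kinds of objective: a supervised squared-error loss comparing predictions to labels, and an unsupervised distortion measuring how well a very simple model (the data mean) represents the data. All array names and values here are hypothetical, chosen only for illustration.

import numpy as np

# Supervised objective: loss between predictions and true labels
y_true = np.array([1.0, 0.0, 2.0, 1.5])
y_pred = np.array([0.9, 0.2, 1.8, 1.4])
loss = np.mean((y_pred - y_true) ** 2)        # mean squared error

# Unsupervised objective: distortion of representing every point by the data mean
X = np.random.randn(100, 5)                   # 100 points in 5 dimensions
mu = X.mean(axis=0)
distortion = np.mean(np.sum((X - mu) ** 2, axis=1))

print("supervised loss (MSE):", round(loss, 3))
print("unsupervised distortion:", round(distortion, 3))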

The fit of dimensions. Given: a data set X = {x_1, ..., x_N} in a feature space F. Goal: find a low-dimensional representation of the data set, i.e., a projection of X into F' ⊂ F. That is, find g() such that g(X) ∈ F'. Constraint: preserve some property of X as much as possible.

Capturing classification. An easy "fit" criterion: keep the aspects of the data that make it easy to classify. This uses dimensionality reduction in conjunction with classification. Goal: find g() such that the loss of the model learned on g(X) is minimized, i.e., roughly g* = argmin_g L(f_g(g(X)), Y), where f_g denotes the model the base learner produces from the projected data g(X) and labels Y.

Feature subset selection. An early idea: let g() simply select a subset of the features. E.g., if X = [X[1], X[2], ..., X[d]], then g(X) = [X[2], X[17], ..., X[k]] for k ≪ d. The tricky part is picking which indices to keep. Q: How many such index sets are possible?
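A quick count answers the question above: there are C(d, k) ("d choose k") index sets of size k, and 2^d subsets of any size, which is why exhaustive search over feature subsets is infeasible for even moderate d. The values of d and k below are just illustrative.

from math import comb

d = 100   # total number of features (illustrative)
k = 10    # size of the subset we keep

print(comb(d, k))   # number of distinct size-k index sets: ~1.7e13
print(2 ** d)       # number of feature subsets of any size: ~1.3e30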

Wrapper method. This led to the wrapper method for FSS (Kohavi et al., KDD-1995; AIJ 97(1-2); etc.). Core idea: use the target learning algorithm as a black-box subroutine, and wrap (your favorite) search over feature subsets around that black box.

An example wrapper FSS

// hill-climbing search-based wrapper FSS
// Inputs:  data X, labels Y, loss function L, base learner baseLearn()
// Outputs: feature subset S, model fHat
function wrapper_FSS_hill(X, Y, L, baseLearn)
  S = {};                                  // initialize: empty feature set
  [Xtr, Ytr, Xtst, Ytst] = split_data(X, Y);
  l = Inf;
  do {
    lLast = l;
    nextSSet = extend_feature_set(S);      // all one-feature extensions of S
    foreach sp in nextSSet {
      model = baseLearn(Xtr[sp], Ytr);     // train on the candidate subset
      err   = L(model(Xtst[sp]), Ytst);    // evaluate on the held-out split
      if (err < l) {
        l = err; fHat = model; S = sp;     // keep the best candidate so far
      }
    }
  } while (l < lLast);                     // stop when no extension improves the loss
  return [S, fHat];
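Here is a runnable sketch of the same hill-climbing idea in Python with scikit-learn. The breast-cancer dataset, the decision-tree base learner, and the 0/1 loss are illustrative choices, not part of the original pseudocode.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def wrapper_fss_hill(X, Y):
    """Greedy forward feature-subset selection around a black-box base learner."""
    Xtr, Xtst, Ytr, Ytst = train_test_split(X, Y, random_state=0)
    selected, best_err, best_model = [], np.inf, None
    improved = True
    while improved:                              # hill-climb until no improvement
        improved = False
        for j in range(X.shape[1]):              # try adding each unused feature
            if j in selected:
                continue
            cand = selected + [j]
            model = DecisionTreeClassifier(random_state=0).fit(Xtr[:, cand], Ytr)
            err = np.mean(model.predict(Xtst[:, cand]) != Ytst)   # 0/1 loss
            if err < best_err:
                best_err, best_model, best_cand = err, model, cand
                improved = True
        if improved:
            selected = best_cand
    return selected, best_model, best_err

X, Y = load_breast_cancer(return_X_y=True)
S, f_hat, err = wrapper_fss_hill(X, Y)
print("selected features:", S, "held-out error:", round(err, 3))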

More general projections. FSS uses an orthogonal projection onto a subspace: essentially, drop some dimensions and keep others. It is often useful to work with more general projection functions g(). Example: a linear projection, g(x) = A x. Pick A to reduce dimension: A is a k×d matrix with k ≪ d.
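A minimal numpy sketch of such a linear projection; the matrix A here is random purely to show the shapes involved (PCA, below, is one principled way to choose A).

import numpy as np

d, k, N = 50, 3, 200                 # original dim, reduced dim, number of points
X = np.random.randn(N, d)            # data, one point per row
A = np.random.randn(k, d)            # a k x d projection matrix (here: random)

Z = X @ A.T                          # g(x) = A x applied to each row
print(Z.shape)                       # (200, 3)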

The right linearity. How do we pick A? What property of the data do we want to preserve? The typical answer: the squared error between each original data point and its reconstruction from the low-dimensional representation, J(A) = Σ_{i=1..N} ||x_i − x̂_i||^2, where x̂_i is the point reconstructed from the projection A x_i. Minimizing this leads to the method of principal component analysis (PCA), a.k.a. the Karhunen-Loève (KL) transform.

PCA
1. Find the mean of the data: μ = (1/N) Σ_{i=1..N} x_i
2. Find the scatter matrix: S = Σ_{i=1..N} (x_i − μ)(x_i − μ)^T (essentially a denormalized covariance matrix)
3. Find the eigenvectors/eigenvalues of S: S v_j = λ_j v_j, ordered so that λ_1 ≥ λ_2 ≥ ... ≥ λ_d
4. Take the top k ≪ d eigenvectors: v_1, ..., v_k
5. Form A from those vectors: A = [v_1 ... v_k]^T, giving the projection g(x) = A(x − μ)
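A minimal numpy sketch of these five steps; the data here are random, just to make the example self-contained (in practice an SVD of the centered data gives the same directions).

import numpy as np

np.random.seed(0)
X = np.random.randn(500, 10)              # N=500 points in d=10 dimensions
k = 2                                      # target dimension

mu = X.mean(axis=0)                        # 1. mean of the data
Xc = X - mu
S = Xc.T @ Xc                              # 2. scatter matrix (d x d)
evals, evecs = np.linalg.eigh(S)           # 3. eigenvalues/eigenvectors (ascending)
order = np.argsort(evals)[::-1]            #    sort eigenvalues in decreasing order
A = evecs[:, order[:k]].T                  # 4.-5. top-k eigenvectors as rows of A

Z = Xc @ A.T                               # g(x) = A(x - mu), shape (500, 2)
X_hat = Z @ A + mu                         # reconstruction back in d dimensions
print("reconstruction error:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))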

Nonlinearity. The coolness of PCA: it finds the directions of "maximal variance" in the data, which is good for linear data sets. The downfall of PCA: lots of stuff in the world is nonlinear.

LLE et al. This leads to a number of methods for nonlinear dimensionality reduction (NLDR): LLE, Isomap, MVU, etc. The core idea common to all of them: look at a small "patch" on the surface of the data manifold, make a low-dimensional, local, linear approximation to that patch, and "stitch together" all the local approximations into a global structure.

Unfolding the Swiss roll. [Figure: the 3-d Swiss-roll data and its 2-d approximation after unfolding.]
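A small scikit-learn sketch of this kind of unfolding, using LLE as one representative NLDR method; the dataset and parameter choices are illustrative, not taken from the slides.

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)  # 3-d data

# Locally Linear Embedding: local linear patches stitched into a global 2-d map
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)

print(X.shape, "->", X_2d.shape)   # (1500, 3) -> (1500, 2)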