
1 Dimensionality Reduction

2 Another unsupervised task. Clustering, etc. are all forms of data modeling: trying to identify statistically supportable patterns in data. Another way of looking at it: reduce the complexity of the data. Clustering: 1000 data points → 3 clusters. Dimensionality reduction: reduce the complexity of the space in which the data lives, i.e., find a low-dimensional projection of the data.
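A small sketch of the two "complexity reduction" views side by side (not from the slides; the synthetic data and scikit-learn choices here are just for illustration):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    X = np.random.default_rng(0).normal(size=(1000, 50))   # 1000 points in 50-d

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    Z = PCA(n_components=3).fit_transform(X)

    print(labels.shape)   # (1000,)   clustering: each point summarized by one of 3 cluster ids
    print(Z.shape)        # (1000, 3) dimensionality reduction: each point now lives in 3-d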

3 Objective functions. All learning methods depend on optimizing some objective function; otherwise, you can't tell if you're making any progress. The objective measures whether model A is better than model B. Supervised learning: a loss function, the difference between predicted and actual values. Unsupervised learning: model fit/distortion, i.e., how well the model represents the data.
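To make the supervised/unsupervised distinction concrete (my example, not the slide's): squared prediction error as a supervised loss, versus reconstruction error as an unsupervised distortion measure.

    import numpy as np

    # supervised: difference between predicted and actual values
    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.1, 1.9, 3.2])
    supervised_loss = np.mean((y_pred - y_true) ** 2)
    print(supervised_loss)

    # unsupervised: how well does a (deliberately crude) model represent the data?
    X = np.random.default_rng(0).normal(size=(100, 5))
    X_reconstructed = X.mean(axis=0) * np.ones_like(X)   # represent every point by the mean
    distortion = np.mean(np.sum((X - X_reconstructed) ** 2, axis=1))
    print(distortion)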

4 The fit of dimensions. Given: a data set X = {X_1, ..., X_N} in feature space F. Goal: find a low-dimensional representation of the data set, a projection of X into F' ⊂ F. That is: find g() such that g(X) ∈ F'. Constraint: preserve some property of X as much as possible.

5 Capturing classification. An easy "fit" function: keep the aspects of the data that make it easy to classify. This uses dimensionality reduction in conjunction with classification. Goal: find g() such that the loss of the model learned on g(X) is minimized: g* = argmin_g min_f Σ_i L(f(g(X_i)), Y_i).

6 Feature subset selection. Early idea: let g() select a subset of the features. E.g., if X = [X[1], X[2], ..., X[d]], then g(X) = [X[2], X[17], ..., X[k]] for k ≪ d. Tricky part: picking the indices to keep. Q: How many such index sets are possible?
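For concreteness (the slide leaves this as a question for the audience): there are C(d, k) index sets of size exactly k, and 2^d subsets overall, which is why exhaustive search over subsets is hopeless for even moderate d. A quick check:

    from math import comb

    d = 100                   # number of original features
    k = 10                    # size of the subset we keep
    print(comb(d, k))         # ~1.7e13 subsets of size exactly k
    print(2 ** d)             # ~1.3e30 subsets of any size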

7 Wrapper method. This led to the wrapper method for FSS (Kohavi et al.: KDD-1995, AIJ 97(1-2), etc.). Core idea: use the target learning algorithm as a black-box subroutine, and wrap (your favorite) search over feature subsets around that black box.

8 An example wrapper FSS

    // hill-climbing search-based wrapper FSS
    function wrapper_FSS_hill(X, Y, L, baseLearn)
      // Inputs:  data X, labels Y, loss function L,
      //          base learner method baseLearn()
      // Outputs: feature subset S, model fHat
      S = {}                                      // initialize: empty feature set
      [Xtr, Ytr, Xtst, Ytst] = split_data(X, Y)
      l = Inf
      do {
        lLast = l
        nextSSet = extend_feature_set(S)          // all one-feature extensions of S
        foreach sp in nextSSet {
          model = baseLearn(Xtr[sp], Ytr)         // train using only the features in sp
          err = L(model(Xtst[sp]), Ytst)          // evaluate on the same features
          if (err < l) { l = err; fHat = model; S = sp }   // keep the best extension
        }
      } while (l < lLast)                         // stop when no extension improves
      return (S, fHat)
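Not part of the original slides: a minimal runnable sketch of the same greedy forward-selection wrapper in Python. The base learner (scikit-learn's DecisionTreeClassifier), the 0/1 loss, and the 70/30 split are my assumptions, not choices made by the slides.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def wrapper_fss_hill(X, y, base_learner=DecisionTreeClassifier):
        """Greedy forward feature-subset selection, wrapper style."""
        X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.3, random_state=0)
        selected, best_err, best_model = [], np.inf, None
        while True:
            last_err = best_err
            # try extending the current subset by each unused feature
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                cand = selected + [j]
                model = base_learner().fit(X_tr[:, cand], y_tr)
                err = np.mean(model.predict(X_tst[:, cand]) != y_tst)   # 0/1 loss
                if err < best_err:
                    best_err, best_model, best_cand = err, model, cand
            if best_err < last_err:
                selected = best_cand        # accept the best extension
            else:
                break                       # no extension improved: stop
        return selected, best_model

Usage: selected, model = wrapper_fss_hill(X, y) returns the chosen column indices and the model trained on them.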

9 More general projections. FSS uses an orthogonal projection onto an axis-aligned subspace: essentially, drop some dimensions and keep others. It is often useful to work with more general projection functions g(). Example: a linear projection, g(X) = A X. Pick A to reduce dimension: a k×d matrix with k ≪ d.
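A minimal numpy illustration of such a linear projection (the random choice of A here is only for illustration; slide 10 discusses how A should actually be picked):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k, N = 50, 3, 1000
    X = rng.normal(size=(N, d))     # N data points in d dimensions
    A = rng.normal(size=(k, d))     # a k x d projection matrix, k << d
    Z = X @ A.T                     # each row is g(x) = A x, now k-dimensional
    print(Z.shape)                  # (1000, 3)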

10 The right linearity. How to pick A? What property of the data do we want to preserve? Typical answer: the squared error between each original data point and its reconstruction from the low-dimensional representation, Σ_i ||(X_i − μ) − A^T A (X_i − μ)||^2, minimized over k×d matrices A with orthonormal rows. This leads to the method of principal component analysis (PCA), a.k.a. the Karhunen-Loève (KL) transform.

11 PCA
1. Find the mean of the data: μ = (1/N) Σ_i X_i

12 PCA
1. Find the mean of the data: μ = (1/N) Σ_i X_i
2. Find the scatter matrix (essentially, the unnormalized covariance matrix): S = Σ_i (X_i − μ)(X_i − μ)^T

13 PCA
1. Find the mean of the data: μ = (1/N) Σ_i X_i
2. Find the scatter matrix (essentially, the unnormalized covariance matrix): S = Σ_i (X_i − μ)(X_i − μ)^T
3. Find the eigenvectors/eigenvalues of S: S v_j = λ_j v_j, for j = 1, ..., d

14 PCA
1. Find the mean of the data: μ = (1/N) Σ_i X_i
2. Find the scatter matrix (essentially, the unnormalized covariance matrix): S = Σ_i (X_i − μ)(X_i − μ)^T
3. Find the eigenvectors/eigenvalues of S: S v_j = λ_j v_j, for j = 1, ..., d
4. Take the top k ≪ d eigenvectors: the v_j with the k largest eigenvalues λ_j

15 PCA
1. Find the mean of the data: μ = (1/N) Σ_i X_i
2. Find the scatter matrix (essentially, the unnormalized covariance matrix): S = Σ_i (X_i − μ)(X_i − μ)^T
3. Find the eigenvectors/eigenvalues of S: S v_j = λ_j v_j, for j = 1, ..., d
4. Take the top k ≪ d eigenvectors: the v_j with the k largest eigenvalues λ_j
5. Form A from those vectors, A = [v_1, ..., v_k]^T, and project: g(X) = A(X − μ)
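Not from the slides: a compact numpy sketch of these five steps (eigh is used because the scatter matrix is symmetric):

    import numpy as np

    def pca_project(X, k):
        """Project the rows of X (N x d) onto the top-k principal components."""
        mu = X.mean(axis=0)                      # 1. mean of the data
        Xc = X - mu
        S = Xc.T @ Xc                            # 2. scatter matrix (d x d)
        eigvals, eigvecs = np.linalg.eigh(S)     # 3. eigenvalues/eigenvectors of S
        order = np.argsort(eigvals)[::-1]        #    sort by decreasing eigenvalue
        A = eigvecs[:, order[:k]].T              # 4.-5. top-k eigenvectors as the rows of A
        return Xc @ A.T, A, mu                   # g(X) = A (X - mu), applied to each row

    # Example: 1000 points in 50-d reduced to 3-d
    X = np.random.default_rng(0).normal(size=(1000, 50))
    Z, A, mu = pca_project(X, k=3)
    print(Z.shape)                               # (1000, 3)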

16 Nonlinearity. The coolness of PCA: it finds the directions of "maximal variance" in the data, which is good for linear data sets. The downfall of PCA: lots of stuff in the world is nonlinear.
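A tiny numerical illustration of that downfall (mine, not the slide's): points on a circle are intrinsically one-dimensional, yet their scatter matrix has two equally large eigenvalues, so no single "maximal variance" direction captures the structure and a 1-d PCA projection just collapses the circle onto a diameter.

    import numpy as np

    t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
    X = np.c_[np.cos(t), np.sin(t)]                    # a 1-d manifold embedded in 2-d
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc
    print(np.linalg.eigvalsh(S))                       # two nearly equal eigenvalues (~500 each)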

17 LLE et al. This leads to a number of methods for nonlinear dimensionality reduction (NLDR): LLE, Isomap, MVU, etc. The core idea behind all of them: look at a small "patch" on the surface of the data manifold, make a low-dimensional, local, linear approximation to that patch, and "stitch together" all of the local approximations into a global structure.

18 Unfolding the swiss roll [figure: 3-d data (left) and its 2-d approximation (right)]
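Not part of the original deck: a minimal scikit-learn sketch that reproduces this kind of unfolding, using make_swiss_roll and LocallyLinearEmbedding (the sample size and neighbor count are arbitrary choices).

    import numpy as np
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    # 3-d swiss roll: a 2-d manifold rolled up in 3-d space
    X, color = make_swiss_roll(n_samples=2000, random_state=0)

    # local, linear patches stitched into a global 2-d embedding
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
    Z = lle.fit_transform(X)
    print(Z.shape)          # (2000, 2): the "unrolled" coordinates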

