
Published by Kaleb Snipes. Modified about 1 year ago.

1
Curse of Dimensionality
Prof. Navneet Goyal
Dept. of Computer Science & Information Systems
BITS Pilani

2
Curse of Dimensionality!!
The curse of dimensionality poses serious challenges! It is an important factor influencing the design of pattern recognition techniques.
Example: a mixture of oil, water and gas, in one of three flow configurations (homogeneous, annular and laminar). Each data point is a point in a 12-dimensional space.
The plot shows 100 points along only two of the dimensions, $x_6$ and $x_7$. Given a new point x, can we predict its class?
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

3
Curse of Dimensionality!!
It is unlikely that x belongs to the blue class: it is surrounded by a lot of red points, and there are also many green points nearby.
Intuition: the class of x should be determined strongly by nearby points and less strongly by more distant points.
How can we turn this intuition into a learning algorithm?
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
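One simple way to encode this intuition, which is not the approach the following slides take, is to let every training point vote for its class with a weight that shrinks with distance. A minimal Python sketch, assuming points are tuples of floats and using inverse Euclidean distance as the weight (both our choices, not from the source); `weighted_vote` is a hypothetical helper name:

```python
import math
from collections import defaultdict


def weighted_vote(train, x, eps=1e-12):
    """Classify x by inverse-distance-weighted voting over all training points.

    train: list of (point, label) pairs, each point a tuple of floats.
    Nearby points get large weights, distant points get small ones.
    """
    scores = defaultdict(float)
    for point, label in train:
        d = math.dist(point, x)           # Euclidean distance (Python 3.8+)
        scores[label] += 1.0 / (d + eps)  # eps avoids division by zero when d == 0
    return max(scores, key=scores.get)    # label with the largest total weight
```

Every training point influences the decision, but a point twice as far away counts half as much.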

4
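Curse of Dimensionality!!
Make grid lines! Divide the input space into regular cells. To classify x, find the cell containing it and use majority voting among the training points in that cell.
Problems??
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006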
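The grid-and-vote idea can be sketched in Python as below. This is an illustration under stated assumptions: all points lie in the hypercube $[lo, hi]^D$, each axis is split into m equal intervals, and the helper names (`cell_index`, `fit_grid`, `predict_grid`) are hypothetical.

```python
from collections import Counter, defaultdict


def cell_index(x, lo, hi, m):
    """Map point x to a tuple of cell indices, with m equal cells per axis on [lo, hi]."""
    width = (hi - lo) / m
    # clamp so points exactly on the boundary fall into the first/last cell
    return tuple(min(max(int((xi - lo) / width), 0), m - 1) for xi in x)


def fit_grid(train, lo, hi, m):
    """Count the class labels of the training points falling in each cell."""
    cells = defaultdict(Counter)
    for point, label in train:
        cells[cell_index(point, lo, hi, m)][label] += 1
    return cells


def predict_grid(cells, x, lo, hi, m):
    """Majority vote in the cell containing x; None if that cell is empty."""
    votes = cells.get(cell_index(x, lo, hi, m))  # .get does not create new cells
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Empty cells return None. As D grows, most cells are empty and this becomes the common case.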

5
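Curse of Dimensionality
The number of cells grows exponentially with the dimensionality D (with M divisions per axis there are $M^D$ cells).
We therefore need an exponentially large number of training data points to keep the cells from being empty.
Not a good approach for more than a few dimensions!
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006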
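To make the exponential growth concrete, assume each of the D axes is split into m intervals (m = 10 below is an arbitrary choice). There are then $m^D$ cells, and N training points can occupy at most N of them. A small Python sketch (the function names are hypothetical):

```python
def num_cells(m, d):
    """Number of cells when each of d axes is split into m intervals."""
    return m ** d


def max_occupied_fraction(n_samples, m, d):
    """Upper bound on the fraction of cells that contain at least one of n_samples points."""
    return min(1.0, n_samples / num_cells(m, d))


# 100 points, as in the oil flow plot, with 10 intervals per axis
for d in (1, 2, 3, 12):
    print(d, num_cells(10, d), max_occupied_fraction(100, 10, d))
```

With the 100 points and 12 dimensions of the oil flow example, at most a fraction $10^{-10}$ of the $10^{12}$ cells can contain any training point.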

6
Curse of Dimensionality
Solutions??
– Dimensionality reduction
– Develop algorithms that are not affected by the curse of dimensionality

7
Curse of Dimensionality
Problems:
– running time
– over-fitting
– number of samples required
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition

8
Running Time
Complexity (running time) increases with dimension d!
Many methods have at least $O(nd^2)$ complexity, where n is the number of samples
– For example: estimation of the covariance matrix
With large d, $O(nd^2)$ complexity may be too costly
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition
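As an illustration of where the $O(nd^2)$ cost comes from, here is a plain-Python sketch of the sample covariance estimate. Dividing by n - 1 is our choice; dividing by n gives the maximum-likelihood estimate at the same cost. Each of the n samples contributes to all $d \times d$ entries:

```python
def covariance(samples):
    """Sample covariance matrix (divides by n - 1) of n points in d dimensions.

    The loops over n samples and d x d matrix entries cost O(n d^2) time.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]  # O(n d)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:                                          # n iterations ...
        c = [s[j] - mean[j] for j in range(d)]                 # centred sample
        for i in range(d):                                     # ... of d x d updates
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (n - 1) for v in row] for row in cov]
```

Filling only the entries with $i \le j$ and mirroring them roughly halves the work, but the complexity stays $O(nd^2)$.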

9
Number of Samples
Suppose we want to use the nearest neighbor approach with k = 1 (1NN).
Suppose we start with only one feature. This feature is not discriminative, i.e. it does not separate the classes well, so we decide to use 2 features.
The 1NN method needs a lot of samples, i.e. the samples have to be dense.
To maintain the same density as in 1D (9 samples per unit length), how many samples do we need?
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition
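For reference, 1NN itself fits in a few lines. A minimal Python sketch, assuming Euclidean distance and training data given as (point, label) pairs; `nn1_predict` is a hypothetical name:

```python
import math


def nn1_predict(train, x):
    """1-nearest-neighbor: return the label of the training point closest to x."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label
```

Each query scans all n training points at a cost of $O(nd)$. The prediction is only reliable when the nearest training point is genuinely close to x, which is why the samples have to be dense.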

10
Number of Samples
We need $9^2 = 81$ samples to maintain the same density as in 1D.
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition

11
Number of Samples
When we go from 1 feature to 2, no one gives us more samples; we still have 9.
This is far too sparse for 1NN to work well.
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition

12
Number of Samples
Things go from bad to worse if we decide to use 3 features.
If 9 was dense enough in 1D, in 3D we need $9^3 = 729$ samples!
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition

13
Number of Samples
In general, if n samples are dense enough in 1D, then in d dimensions we need $n^d$ samples!
$n^d$ grows really, really fast as a function of d.
Common pitfall:
– If we can't solve a problem with a few features, adding more features seems like a good idea.
– However, the number of samples usually stays the same.
– The method with more features is then likely to perform worse, not better as expected.
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition
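The sample count $n^d$ is easy to tabulate. A tiny Python sketch (the function name is hypothetical), using the slides' 9 samples per unit length:

```python
def samples_needed(n_1d, d):
    """Samples needed in d dimensions to match the density of n_1d samples per unit length in 1D."""
    return n_1d ** d


for d in range(1, 6):
    print(f"d={d}: {samples_needed(9, d)} samples")
```

For n = 9 this prints 9, 81, 729, 6561 and 59049 samples for d = 1 to 5.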

14
Number of Samples
For a fixed number of samples, as we add features, the classification error first decreases and then starts to increase.
Thus, for each fixed sample size n, there is an optimal number of features to use.
Reference: Prof. Olga Veksler, CS434a/541a: Pattern Recognition
