
1 Classification Problem Given: a set of labeled training examples. Predict the class label of a given query. [Figure: training points from two classes, marked + and −, scattered in the feature space together with the query point to be classified.]

2 Classification Problem The underlying probability distribution is unknown. We need to estimate the posterior probabilities $P(j \mid \mathbf{x})$ of each class $j$ given the query $\mathbf{x}$.

3 The Bayesian Classifier Loss function: $L(j, i)$, the cost of deciding class $j$ when the true class is $i$. Expected loss (conditional risk) associated with class $j$: $R(j \mid \mathbf{x}) = \sum_i L(j, i)\, P(i \mid \mathbf{x})$. Bayes rule: assign the query to the class with minimum conditional risk. Zero-one loss function: $L(j, i) = 0$ if $j = i$ and $1$ otherwise; under this loss the Bayes rule reduces to choosing the class $j$ that maximizes the posterior $P(j \mid \mathbf{x})$.
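
As a small illustration of the rule above (not part of the slides), the sketch below computes the conditional risk for a query under a given loss matrix and shows that with the zero-one loss the minimum-risk decision is simply the class with the largest posterior; the posterior values and function names are made up for the example.

```python
import numpy as np

def bayes_decision(posteriors, loss):
    """Pick the class with minimum conditional risk R(j|x) = sum_i L(j,i) P(i|x)."""
    risks = loss @ posteriors          # risks[j] = sum_i loss[j, i] * posteriors[i]
    return int(np.argmin(risks)), risks

# Hypothetical posteriors P(i|x) for a two-class problem (illustrative values only).
posteriors = np.array([0.3, 0.7])

# Zero-one loss: 0 on the diagonal, 1 elsewhere.
zero_one = np.ones((2, 2)) - np.eye(2)

decision, risks = bayes_decision(posteriors, zero_one)
print(decision, risks)   # class 1; risks = [0.7, 0.3] = 1 - posteriors
# Under zero-one loss, minimizing the risk is the same as maximizing the posterior.
```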

4 The Bayesian Classifier Bayes rule achieves the minimum error rate. How can we estimate the posterior probabilities $P(j \mid \mathbf{x})$?

5 Density estimation Use Bayes theorem to estimate the posterior probability values: $P(j \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid j)\, P(j)}{p(\mathbf{x})}$, where $p(\mathbf{x} \mid j)$ is the probability density function of $\mathbf{x}$ given class $j$ and $P(j)$ is the prior probability of class $j$.

6 Naïve Bayes Classifier Makes the assumption that the features are independent given the class: $p(\mathbf{x} \mid j) = \prod_{k=1}^{q} p(x_k \mid j)$. The task of estimating a $q$-dimensional density function is reduced to estimating $q$ one-dimensional density functions, so the complexity of the task is drastically reduced. The use of Bayes theorem becomes much simpler. Proven to be effective in practice.
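
A minimal sketch of how this could look in code, assuming Gaussian one-dimensional densities $p(x_k \mid j)$; the slides do not specify the density model, so the Gaussian choice and the class name here are illustrative assumptions.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with a univariate Gaussian p(x_k | j) per feature and class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors, self.means, self.vars_ = {}, {}, {}
        for c in self.classes:
            Xc = X[y == c]
            self.priors[c] = len(Xc) / len(X)          # P(j)
            self.means[c] = Xc.mean(axis=0)            # per-feature mean
            self.vars_[c] = Xc.var(axis=0) + 1e-9      # per-feature variance (smoothed)
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            # log P(j) + sum_k log p(x_k | j); the sum over features is the
            # independence assumption stated on the slide.
            log_lik = -0.5 * (np.log(2 * np.pi * self.vars_[c])
                              + (X - self.means[c]) ** 2 / self.vars_[c]).sum(axis=1)
            scores.append(np.log(self.priors[c]) + log_lik)
        return self.classes[np.argmax(np.stack(scores, axis=1), axis=1)]

# Toy usage with made-up data: two Gaussian blobs labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(GaussianNaiveBayes().fit(X, y).predict(np.array([[0.0, 0.0], [3.0, 3.0]])))
```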

7 Nearest-Neighbor Methods Predict the class label of the query as the most frequent one occurring among its K nearest neighbors. [Figure: training points from two classes (+ and −) with the query point.]

8 Nearest-Neighbor Methods Predict the class label of the query as the most frequent one occurring among its K nearest neighbors. [Figure: the same illustration, continued from the previous slide.]

9 Nearest-Neighbor Methods Predict the class label of the query as the most frequent one occurring among its K nearest neighbors, where the neighbors are determined by a distance metric. Basic assumption: points that are close under the chosen distance metric tend to share the same class label. [Figure: the scatter plot with the query's nearest neighbors marked.]
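
A compact sketch of the K-nearest-neighbor rule described above, using Euclidean distance and a majority vote; the slides leave the distance metric generic, so this metric and the helper names are illustrative choices.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Return the most frequent label among the k training points closest to x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances to the query
    nearest = np.argsort(dists)[:k]                      # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage with made-up data: two blobs labeled '-' and '+'.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array(['-'] * 30 + ['+'] * 30)
print(knn_predict(X, y, np.array([2.8, 3.1]), k=5))   # expected to print '+'
```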

10 Example: Letter Recognition Each character image is described by features such as edge counts and statistical moments (e.g. the first moment) of the pixel distribution.

11 Asymptotic Properties of K-NN Methods The K-NN rule is asymptotically optimal (its error rate converges to the Bayes error rate) if $K \to \infty$ and $K/N \to 0$ as the sample size $N \to \infty$. The first condition reduces the variance by making the estimation independent of the accidental characteristics of the K nearest neighbors. The second condition reduces the bias by ensuring that the K nearest neighbors are arbitrarily close to the query point.
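
One common way to satisfy both conditions is to let K grow like the square root of N; this particular schedule is an illustrative choice of mine, not one prescribed by the slides.

```python
import numpy as np

# K = sqrt(N) satisfies K -> infinity and K/N -> 0 as N grows.
for N in [100, 10_000, 1_000_000]:
    K = int(np.sqrt(N))
    print(f"N={N:>9}  K={K:>5}  K/N={K / N:.4f}")
```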

12 Asymptotic Properties of K-NN Methods Let $E_{1\mathrm{NN}}$ be the asymptotic classification error rate of the 1-NN rule and $E^{*}$ the classification error rate of the Bayes rule. Then $E^{*} \le E_{1\mathrm{NN}} \le 2E^{*}$: in the asymptotic limit, no decision rule is more than twice as accurate as the 1-NN rule.

13 Finite-sample settings If the number of training data N is large and the number of input features q is small, the asymptotic results may still be valid. However, for a moderate to large number of input variables, the sample size required for their validity is beyond feasibility. How well does the 1-NN rule work in finite-sample settings?

14 Curse-of-Dimensionality This phenomenon is known as the curse of dimensionality. It refers to the fact that in high-dimensional spaces data become extremely sparse and far apart from each other. It affects any estimation problem with high dimensionality.

15 Curse of Dimensionality Sample of size N = 500, uniformly distributed in the unit hypercube. [Figure: DMAX (distance to the farthest point), DMIN (distance to the closest point), and the ratio DMAX/DMIN as functions of the dimensionality.]

16 Curse of Dimensionality The distribution of the ratio DMAX/DMIN converges to 1 as the dimensionality increases. [Figure: distributions of DMAX/DMIN for increasing dimensionality.]
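
The effect on the last two slides can be reproduced with a short simulation; the choice of placing the query at the center of the hypercube, and the dimensionalities tested, are illustrative assumptions rather than details from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500  # sample size, matching the N = 500 used on the slide

for q in [2, 10, 100, 1000]:
    X = rng.uniform(size=(N, q))          # points uniform in the unit hypercube [0, 1]^q
    d = np.linalg.norm(X - 0.5, axis=1)   # distances from the center of the cube
    print(f"q={q:>4}  DMAX/DMIN={d.max() / d.min():7.2f}  "
          f"relative spread std/mean={d.std() / d.mean():.3f}")
# As q grows, DMAX/DMIN approaches 1 and the relative spread of the distances shrinks,
# i.e. all points become roughly equidistant from the query.
```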

17 Curse of Dimensionality [Figure: variance of the distances from a given point, plotted against the dimensionality.]

18 Curse of Dimensionality The variance of distances from a given point converges to 0 as the dimensionality increases.

19 Curse of Dimensionality Distance values from a given point flatten out as the dimensionality increases.

20 Computing radii of nearest neighborhoods

21 Median radius of a nearest neighborhood: for $N$ points uniformly distributed in the $q$-dimensional unit ball centered at the query, the median distance from the query to its closest point is $d(q, N) = \left(1 - \left(\tfrac{1}{2}\right)^{1/N}\right)^{1/q}$.

22 Curse-of-Dimensionality Random sample of size N with a uniform distribution in the q-dimensional unit hypercube. Diameter of a neighborhood using Euclidean distance: as the dimensionality increases, the distance from the query to its closest point grows quickly, so nearest neighborhoods become large and yield highly biased estimations.
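
The growth of nearest-neighbor distances with dimensionality can be checked empirically. The sketch below draws uniform samples in the unit hypercube, as on the slide, and reports the median distance from a query at the center to its closest sample point; the sample size, number of repetitions, and query position are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 50          # sample size and number of repetitions (illustrative)

for q in [1, 2, 5, 10, 20, 50]:
    nn_dists = []
    for _ in range(trials):
        X = rng.uniform(size=(N, q))            # uniform sample in [0, 1]^q
        d = np.linalg.norm(X - 0.5, axis=1)     # distances from the central query
        nn_dists.append(d.min())                # distance to the nearest neighbor
    print(f"q={q:>3}  median 1-NN distance={np.median(nn_dists):.4f}")
# The median nearest-neighbor distance grows with q even though N is fixed,
# so a "nearest" neighborhood covers an ever larger part of the space.
```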

23 Curse-of-Dimensionality It is a serious problem in many real-world applications: microarray data (3,000-4,000 genes); documents (10,000-20,000 words in the dictionary); images, face recognition, etc.

24 How can we deal with the curse of dimensionality?

25

26

27 Variance and covariance

28

29 Dimensionality Reduction Many dimensions are often interdependent (correlated). We can: reduce the dimensionality of the problem; transform the interdependent coordinates into significant and independent ones.
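
One standard technique for transforming correlated coordinates into uncorrelated ones is principal component analysis (PCA). The slides only hint at it through the variance/covariance material, so the sketch below is an assumption about the intended method rather than a transcription of it.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto the directions of maximum variance (principal components)."""
    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # q x q covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: the covariance matrix is symmetric
    order = np.argsort(eigvals)[::-1]            # sort components by decreasing variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                       # new, decorrelated coordinates

# Toy usage: 3-D data whose third coordinate is (almost) a copy of the first,
# so two components capture nearly all of the variance.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + 0.01 * rng.normal(size=200)])
Z = pca_transform(X, n_components=2)
print(Z.shape)                                   # (200, 2)
```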

