
1 KNN Classifier

2
- Handed an instance you wish to classify
- Look around the nearby region to see what other classes are around
- Whichever is most common: make that the prediction

3
- Assign the most common class among the K-nearest neighbors (like a vote)

4

5
- Train
  - Load the training data
- Classify
  - Read in the instance
  - Find the K-nearest neighbors in the training data
  - Assign the most common class among the K-nearest neighbors (like a vote)
- Euclidean distance: a is an attribute (dimension)
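The Euclidean distance the slide refers to (its formula appears only as an image in the original) is the standard one, where x is a training instance, q is the instance being classified, and a indexes the attributes (dimensions):

d(x, q) = \sqrt{\sum_{a} (x_a - q_a)^2}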

6
- Naïve approach: exhaustive
- For the instance to be classified:
  - Visit every training sample and calculate its distance
  - Sort
  - Take the first K in the list
- Euclidean distance: a is an attribute (dimension)
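A minimal sketch of this exhaustive approach in Python (the names `euclidean`, `knn_classify`, and the sample data are illustrative, not from the slides):

```python
import math
from collections import Counter

def euclidean(x, q):
    # Sum the squared differences over every attribute (dimension) a.
    return math.sqrt(sum((xa - qa) ** 2 for xa, qa in zip(x, q)))

def knn_classify(train, query, k=3):
    # Visit every training sample, calculate its distance to the query,
    # sort by that distance, and vote among the first K in the list.
    by_distance = sorted(train, key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Tiny example: two classes in a 2-D attribute space.
train = [((1.0, 1.0), "apple"), ((1.2, 0.9), "apple"), ((5.0, 5.1), "lemon")]
print(knn_classify(train, (1.1, 1.0), k=3))   # -> apple
```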

7
- The work that must be performed:
  - Visit every training sample and calculate its distance
  - Sort
- Lots of floating-point calculations
- The classifier puts off the work until it is time to classify
- Euclidean distance: a is an attribute (dimension)

8
- This is known as a “lazy” learning method
- If most of the work is done during the training stage, the method is known as “eager”
- Our next classifier, Naïve Bayes, will be eager
  - Training takes a while, but it can classify fast
- Which do you think is better? The difference is where the work happens

9
- From Wikipedia: a kd-tree is a space-partitioning data structure for organizing points in a k-dimensional space. kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches). kd-trees are a special case of BSP trees.
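A brief sketch of using a kd-tree to answer nearest-neighbor queries, here with SciPy's `KDTree` (the library choice and the data are illustrative; the slides do not name an implementation):

```python
import numpy as np
from scipy.spatial import KDTree

# Illustrative training data: rows are instances, columns are attributes.
train_points = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [4.8, 5.3]])
train_labels = ["apple", "apple", "lemon", "lemon"]

# Build the tree once up front (this is the part that slows "training")...
tree = KDTree(train_points)

# ...so that each query no longer has to visit every training sample.
distances, indices = tree.query([1.1, 1.0], k=3)
print([train_labels[i] for i in indices])   # -> ['apple', 'apple', 'lemon']
```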

10
- Speeds up classification
- Probably slows “training”

11
- Choosing K can be a bit of an art
- What if you could include all data points (K = n)?
- How might you do such a thing?
- One idea: weight the vote of each training sample by its distance from the point being classified

12
- Weight each vote by 1 over the distance squared
- Could get less fancy and weight linearly
- But then training data that is very far away would still have a strong influence
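A minimal sketch of distance-weighted voting with 1/d² weights (names and data are illustrative):

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, query, eps=1e-9):
    # Every training sample gets a vote, weighted by 1 / distance^2,
    # so far-away samples contribute almost nothing.
    votes = defaultdict(float)
    for attributes, label in train:
        d = math.dist(attributes, query)       # Euclidean distance
        votes[label] += 1.0 / (d * d + eps)    # eps guards against d == 0
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "apple"), ((1.2, 0.9), "apple"), ((5.0, 5.1), "lemon")]
print(weighted_knn_classify(train, (1.1, 1.0)))   # -> apple
```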

13
- Other radial basis functions
- Sometimes known as a kernel function
- One of the more common
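The formula on this slide is not in the transcript; the radial basis function most commonly used as such a weight is the Gaussian kernel, where d is the distance to a training sample and σ controls how quickly its influence falls off:

K(d) = \exp\left(-\frac{d^2}{2\sigma^2}\right)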

14
- The work is back-loaded
  - It gets worse the bigger the training data
  - Can be alleviated with data structures
- What else? Other issues?
- What if only some dimensions contribute to the ability to classify? Differences in the other dimensions would still put distance between a point and the target.

15
- The book calls this the curse of dimensionality
- More is not always better
- Instances might be identical in the important dimensions but distant in others
- From Wikipedia: In applied mathematics, the curse of dimensionality (a term coined by Richard E. Bellman),[1][2] also known as the Hughes effect[3] or Hughes phenomenon[4] (named after Gordon F. Hughes),[5][6] refers to the problem caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 100 evenly spaced sample points suffice to sample a unit interval with no more than 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice spacing of 0.01 between adjacent points would require 10^20 sample points: thus, in some sense, the 10-dimensional hypercube can be said to be a factor of 10^18 "larger" than the unit interval. (Adapted from an example by R. E. Bellman.)

16
- Thousands of genes
- Relatively few patients
- Is there a curse?

patient   g1     g2     g3     ...   gn     disease
p1        x1,1   x1,2   x1,3   ...   x1,n   Y
p2        x2,1   x2,2   x2,3   ...   x2,n   N
...
pm        xm,1   xm,2   xm,3   ...   xm,n   ?

17
- A Bayesian classifier could handle this
  - Think of discrete data as being pre-binned
- Remember the RNA classification example
  - The data in each dimension was A, C, U, or G
- How do we measure distance? A might be closer to G than to C or U (A and G are both purines, while C and U are pyrimidines)
- Dimensional distance becomes domain specific
- Representation becomes all-important: if the data could be arranged appropriately, techniques like Hamming distance could be used
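An illustrative, hypothetical per-dimension distance in the spirit of the slide: identical bases cost 0, same-group mismatches (purine/purine or pyrimidine/pyrimidine) cost less than cross-group ones; with flat 0/1 costs this reduces to the Hamming distance.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

def base_distance(a, b):
    # Hypothetical domain-specific distance for one dimension (one base).
    if a == b:
        return 0.0
    same_group = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return 0.5 if same_group else 1.0

def rna_distance(seq1, seq2):
    # Sum the per-base distances; with 0/1 costs this is the Hamming distance.
    return sum(base_distance(a, b) for a, b in zip(seq1, seq2))

print(rna_distance("ACGU", "GCGU"))   # 0.5: one purine-to-purine mismatch
```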

18
- First few records in the training data
- See any issues?
- Hint: think of how Euclidean distance is calculated
- Should really normalize the data

Redness    Yellowness  Mass      Volume    Class
4.816472   2.347954    125.5082  25.01441  apple
2.036318   4.879481    125.8775  18.2101   lemon
2.767383   3.353061    109.9687  33.53737  orange
4.327248   3.322961    118.4266  19.07535  peach
2.96197    4.124945    159.2573  29.00904  orange
5.655719   1.706671    147.0695  39.30565  apple
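A minimal sketch of the normalization the slide recommends, using z-score standardization (one common choice; the slide does not specify a scheme), so that Mass does not dominate the Euclidean distance just because its values are numerically larger:

```python
import numpy as np

# The fruit attributes from the table above (Redness, Yellowness, Mass, Volume).
X = np.array([
    [4.816472, 2.347954, 125.5082, 25.01441],
    [2.036318, 4.879481, 125.8775, 18.21010],
    [2.767383, 3.353061, 109.9687, 33.53737],
    [4.327248, 3.322961, 118.4266, 19.07535],
    [2.961970, 4.124945, 159.2573, 29.00904],
    [5.655719, 1.706671, 147.0695, 39.30565],
])

# Z-score: subtract each column's mean and divide by its standard deviation,
# so every attribute contributes to the distance on a comparable scale.
X_normalized = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_normalized.round(2))
```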

19
- Function approximation
- Real-valued prediction: take the average of the nearest k neighbors
- If we don't know the function, and/or it is too complex to “learn”, we can just plug in a new value: the KNN classifier can “learn” the predicted value on the fly by averaging the nearest neighbors
- Why average?
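A minimal sketch of KNN used for real-valued prediction by averaging (names and data are illustrative):

```python
import math

def knn_predict(train, query, k=3):
    # Find the k nearest training samples and average their target values.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(y for _, y in nearest) / k

# Noisy samples of roughly y = 2x, queried at x = 2.5.
train = [((1.0,), 2.1), ((2.0,), 3.9), ((3.0,), 6.2), ((4.0,), 8.1)]
print(knn_predict(train, (2.5,), k=2))   # (3.9 + 6.2) / 2 = 5.05
```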

20
- Choose an m and b that minimize the squared error
- But again: computationally, how?
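The quantity being minimized is the usual sum of squared errors of the line y = mx + b over the training data (the slide's own formula appears only as an image):

E(m, b) = \sum_{i=1}^{N} \left( y_i - (m x_i + b) \right)^2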

21
- If we want to learn an instantaneous slope
- We can do local regression
- Get the slope of a line that fits just the local data

22
- For each training datum we know what Y should be
- If we have a randomly generated m and b, these, along with X, will tell us a predicted Y
- So we know whether the m and b yield too large or too small a prediction
- We can nudge “m” and “b” in an appropriate direction (+ or -)
- Sum these proposed nudges across all the training data (see the sketch after the next slide)
- (Figure: the line represents the output, or predicted Y; in the example the target Y is too low)

23
- Which way should m go to reduce the error?
- (Figure: a plot labeled with the actual y, the rise, and the intercept b)
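A minimal sketch of the nudging described on the last two slides, i.e. gradient descent on the squared error: the sign of each proposed nudge comes from whether the current prediction is too high or too low, and the nudges are summed across all training data (learning rate, step count, and data are illustrative choices, not from the slides):

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    m, b = 0.0, 0.0                     # could just as well start from random values
    n = len(xs)
    for _ in range(steps):
        # Sum the proposed nudges across all training data (the gradient of E).
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        # Step opposite the gradient: downhill on the squared-error surface.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
print(fit_line(xs, ys))   # roughly (2.03, 0.0) for this nearly linear data
```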

24
- Locally weighted linear regression
- Would still perform gradient descent
- Becomes a global function approximation
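A hedged sketch of locally weighted linear regression: each training point's contribution to the fit is weighted by its closeness to the query (here via the Gaussian kernel mentioned earlier), so the line describes just the local data. For brevity this sketch solves the weighted least-squares system in closed form rather than by gradient descent; the names and data are illustrative.

```python
import numpy as np

def locally_weighted_predict(xs, ys, query, sigma=1.0):
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    # Gaussian weights: nearby training points dominate the local fit.
    w = np.exp(-((xs - query) ** 2) / (2 * sigma ** 2))
    # Weighted least squares for y = m*x + b (design matrix columns: x, 1).
    A = np.column_stack([xs, np.ones_like(xs)])
    W = np.diag(w)
    m, b = np.linalg.solve(A.T @ W @ A, A.T @ W @ ys)
    return m * query + b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.1, 8.9, 16.2, 24.8]    # roughly quadratic data
print(locally_weighted_predict(xs, ys, 2.5, sigma=0.8))   # local linear estimate near x = 2.5
```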

25
- KNN is highly effective for many practical problems
  - With sufficient training data
- Robust to noisy training data
- Work is back-loaded
- Susceptible to the curse of dimensionality

26

