
1 Machine Learning Group, University College Dublin. Nearest Neighbour Classifiers: Lazy vs Eager, k-NN, Condensed NN

2 Classification problems
- Exemplar characterised by a set of features; decide the class to which the exemplar belongs.
- Compare regression problems: exemplar characterised by a set of features; decide the value of a continuous output (dependent) variable.

3 Lazy vs Eager
- D-Trees are an example of an Eager ML algorithm: the D-Tree is built in advance, off-line, so there is less work to do at run-time.
- k-NN is a Lazy approach: little work is done off-line; keep the training examples and find the k nearest at run-time.

4 Classifying Apples & Pears: to what class does this belong?

5 Loan Approval System. Nearest Neighbour based on Similarity: what does "similar" mean? [figure: a query case and a stored case, each described by the features Sal, Amt, Age, JCat, Gen, compared feature by feature with weights ω_Sal, ω_Amt, ω_JC, ω_Gn, ω_Age]

6 Imagine just 2 features: Amount and Monthly_Sal. [figure: scatter plot of x and o cases against Amount and Monthly_Sal]

7 Voronoi Diagrams indicate areas in which the prediction is influenced by the same set of examples. [figure: Voronoi diagram with query point q and its nearest neighbour x]

8 3-Nearest Neighbours: the 3 nearest neighbours of query point q are 2 x's and 1 o, so a majority vote classifies q as x.

9 7-Nearest Neighbours: the 7 nearest neighbours of the same query point q are 3 x's and 4 o's, so a majority vote now classifies q as o.

10 k-NN and Noise
- 1-NN is easy to implement but susceptible to noise: a misclassification every time a noisy pattern is retrieved.
- k-NN with k ≥ 3 will overcome this.

11 k-Nearest Neighbour
- D: the set of training samples.
- Find the k nearest neighbours to q according to this difference criterion: for each x_i ∈ D,
  d(q, x_i) = \sum_{f \in F} w_f \, \delta(q_f, x_{if})
  where F is the set of features, w_f a feature weight, and δ a per-feature difference function.
- The category of q is decided by its k Nearest Neighbours.
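A minimal Python sketch of this classifier. The function name knn_classify, the uniform default weights, and the absolute-difference δ are assumptions for illustration, not from the slides:

```python
# Sketch of k-NN under the weighted difference criterion
#   d(q, x_i) = sum over features f of w_f * delta(q_f, x_if)
# (illustrative names; uniform weights and absolute difference assumed).
from collections import Counter

def knn_classify(q, D, k=3, weights=None, delta=lambda a, b: abs(a - b)):
    """D: list of (feature_vector, label) pairs; q: query feature vector."""
    if weights is None:
        weights = [1.0] * len(q)
    # Rank training samples by their weighted difference to q.
    ranked = sorted(D, key=lambda x: sum(w * delta(qf, xf)
                                         for w, qf, xf in zip(weights, q, x[0])))
    # The category of q is decided by a vote among its k nearest neighbours.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```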

12 Minkowski Distances
Generalisation of Euclidean (p = 2) and Manhattan (p = 1) distance:
  d(q, x) = \left( \sum_{f \in F} |q_f - x_f|^p \right)^{1/p}
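A quick sketch of this standard formula (the function name is illustrative):

```python
# Minkowski distance: p = 1 gives Manhattan, p = 2 gives Euclidean.
def minkowski(q, x, p=2):
    return sum(abs(qf - xf) ** p for qf, xf in zip(q, x)) ** (1.0 / p)

# e.g. minkowski([0, 0], [3, 4], p=2) -> 5.0 (Euclidean)
#      minkowski([0, 0], [3, 4], p=1) -> 7.0 (Manhattan)
```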

13 Appropriate δ functions. To what class does this belong?

14 e.g. MVT (now part of Agilent): Machine Vision for inspection of PCBs
- components present or absent
- solder joints good or bad

15 Components present? [figure: example board images, component Absent vs Present]

16 Characterise the image as a set of features

17 Dimension reduction in k-NN
- Not all features are required: noisy features are a hindrance.
- Some examples are redundant: retrieval time depends on the number of examples.
- Feature Selection reduces the p features to the q best features; Case Selection reduces the m examples to n covering examples.

18 Condensed NN
D: the set of training samples. Find E, where E ⊆ D, such that the NN rule used with E is as good as with D.

  choose x ∈ D randomly; D ← D \ {x}; E ← {x}
  DO
      learning? ← FALSE
      FOR EACH x ∈ D:
          classify x by NN using E
          if the classification is incorrect then
              E ← E ∪ {x}; D ← D \ {x}; learning? ← TRUE
  WHILE (learning? ≠ FALSE)
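A direct Python rendering of this pseudocode (the function name condensed_nn and the (features, label) case representation are assumptions for illustration):

```python
import random

def condensed_nn(D, dist):
    """D: list of (features, label) cases; dist: distance between feature vectors.
    Returns E, a condensed set whose 1-NN rule classifies the remaining cases correctly."""
    D = list(D)
    E = [D.pop(random.randrange(len(D)))]   # choose x in D randomly; move it to E
    learning = True
    while learning:
        learning = False
        for x in list(D):                   # iterate over a copy so we can remove from D
            nearest = min(E, key=lambda e: dist(x[0], e[0]))  # classify x by NN using E
            if nearest[1] != x[1]:          # misclassified: absorb x into E
                E.append(x)
                D.remove(x)
                learning = True
    return E
```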

19 Condensed NN: 100 examples, 2 categories. [figures: different CNN solutions]

20 Improving Condensed NN
- Different outcomes depending on data order: that's a bad thing in an algorithm.
- Sort the data based on distance to the nearest unlike neighbour to identify exemplars near the decision surface (see the sketch below); in the diagram, B is more useful than A.
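A sketch of that ordering, reusing the case representation from the CNN sketch above (nun_distance and nun_sorted are illustrative names):

```python
def nun_distance(x, D, dist):
    """Distance from case x to its nearest unlike neighbour (nearest case of another class)."""
    return min(dist(x[0], y[0]) for y in D if y[1] != x[1])

def nun_sorted(D, dist):
    # Cases closest to the decision surface (small NUN distance) come first,
    # so CNN absorbs the most useful exemplars regardless of the original data order.
    return sorted(D, key=lambda x: nun_distance(x, D, dist))
```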

21 Condensed NN: 100 examples, 2 categories. [figures: different CNN solutions, and CNN using NUN]

22 Aside: Real Data is not Uniform. [figure: the Iris data in two dimensions]

23 k-NN for spam filtering
- A Lazy Learning system will be able to adapt to the changing nature of spam.
- A Local Learning system is good for diverse disjunctive concepts like spam: porn, mortgage, religion, cheap drugs… versus work, family, play…
- Case-base editing techniques exist to improve the competence of a case base.

24 Spam Filtering
E-mail messages C1, C2, C3, C4, …, Cn pass through Feature Extraction to produce cases,
e.g. using the Bag-of-Words model from Text Classification & Information Retrieval:

  Case | F1 F2 F3 … Fn | S
  C1   |  ~  ~  ~ …  ~ | S1
  C2   |  ~  ~  ~ …  ~ | S2
  C3   |  ~  ~  ~ …  ~ | S3
  …    |               |
  Cn   |  ~  ~  ~ …  ~ | Sn

25 Texts as Bag-of-Words
1. The easiest online school on earth.
2. Here is the information from Computer Science for the next Graduate School Board meeting.
3. Please find attached the agenda for the Graduate School Board meeting.

  No. | Easiest | Online | School | Earth | Info. | Computer | Science | Graduate | Board | Meeting | Please | Find | Attached | Agenda
   1  |    x    |   x    |   x    |   x   |       |          |         |          |       |         |        |      |          |
   2  |         |        |   x    |       |   x   |    x     |    x    |    x     |   x   |    x    |        |      |          |
   3  |         |        |   x    |       |       |          |         |    x     |   x   |    x    |   x    |  x   |    x     |   x

26 Texts as Bag-of-Words: similarity can be measured by the dot-product between these vectors. Information is lost, e.g. sequence information.
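A small sketch of both points, with an assumed toy vocabulary and a crude tokeniser:

```python
# Texts become binary word-presence vectors; the dot product counts shared
# vocabulary words. Word order (sequence information) is discarded.
def bag_of_words(text, vocabulary):
    words = set(text.lower().replace('.', '').split())
    return [1 if v in words else 0 for v in vocabulary]

vocab = ["easiest", "online", "school", "earth", "graduate", "board", "meeting"]
v2 = bag_of_words("the next Graduate School Board meeting", vocab)
v3 = bag_of_words("the agenda for the Graduate School Board meeting", vocab)
similarity = sum(a * b for a, b in zip(v2, v3))   # 4: school, graduate, board, meeting
```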

27 Runtime System: ECUE (Email Classification Using Examples). An incoming Email goes through Feature Extraction to form the Target Case; the Casebase is reduced in turn by Feature Selection and Case Selection; Classification of the target case against the reduced casebase yields the verdict: spam!

28 Classification
- k-NN classifier with k = 3.
- Unanimous Voting to bias away from False Positives: a message is labelled spam only if all three neighbours are spam (see the sketch below).
- A Case Retrieval Net [Lenz et al. 98] improves the performance of the k-NN search.
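A minimal sketch of unanimous voting (classify_email and the label strings are illustrative assumptions, not ECUE's actual API):

```python
def classify_email(target_case, casebase, dist, k=3):
    """casebase: list of (features, label) pairs; label is 'spam' or 'non-spam'."""
    neighbours = sorted(casebase, key=lambda c: dist(target_case, c[0]))[:k]
    # Unanimous voting: only call it spam if every retrieved neighbour is spam,
    # which biases the filter away from False Positives.
    if all(label == "spam" for _, label in neighbours):
        return "spam"
    return "non-spam"
```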

29 k-NN: Summary
- ML avoids some Knowledge Engineering (KE) effort.
- Lazy vs Eager.
- How k-NN works.
- Dimension reduction: Condensed NN and Feature Selection.
- Spam Filtering application.

