
Classification. Dr. Eamonn Keogh, Computer Science & Engineering Department, University of California, Riverside. Riverside, CA 92521. Who is smarter, Humans or Pigeons?

Read in detail: Section 1.1 (again), Section 4.1, Section 4.3. Glance over: Section 4.3.4.

Examples of class A. Examples of class B. 1) What class is this object? 2) What class is this object?

Examples of class A. Examples of class B. 1) What class is this object? 2) What class is this object?

Examples of class A. Examples of class B. 1) What class is this object? 2) What class is this object?

The “game” we have just been playing is Supervised Classification. Why is it useful?

Examples of class A: people who contracted disease X. Examples of class B: people who are disease-free. 1) What class is this person? Is this person at risk of getting the disease? 2) What class is this person? Is this person at risk of getting the disease?

Each person is described by three features:

Patient temperature | Blood count | Weight
99                  | 4214        | 167
98                  | 3214        | 179
97                  | 2763        | 121
99                  | 3234        | 117
97                  | 0012        | 190
99                  | 0114        | 202
98                  | 1014        | 345
99                  | 1214        | 190
97                  | 0118        | 280
99                  | 3452        | 99

Examples of class A. Examples of class B. 1) What class is this object? 2) What class is this object?

Examples of class A. Examples of class B. 1) What class is this object? 2) What class is this object?

Classification. There are many classification algorithms; in this class we will consider only the Simple Linear Classifier, the Nearest Neighbor Classifier, Decision Trees, and Naïve Bayes.

The classification problem. The classification algorithm is shown a number of labeled examples from the problem domain of interest (this collection of labeled data is called the training set). The algorithm builds a model that "explains" the labeling of the examples (this model may or may not be accessible to humans, depending on the algorithm). At some future time the algorithm is shown an unlabeled example and is asked to classify it. [Figures: the Shape Domain and the Cat Domain.]

Sample dataset for a credit-worthiness problem:

Class | Income  | Savings | Num_credit_cards | Is_married
A     | 123,000 | 34,100  | 0                | N
B     | 24,000  | -2,…    | …                | Y
A     | 45,200  | 12,100  | 3                | N
…     | …       | …       | …                | …
B     | 423,020 | 23,440  | 0                | N
B     | 14,000  | 87,000  | 0                | Y
A     | 11,200  | -2,000  | 2                | Y
?     | 123,000 | 34,100  | 0                | N   ← What is this instance's class?

The number of rows is the size of the training set and the number of columns is its dimensionality; each row is called an instance (or exemplar) and each column is called a feature.
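To make this terminology concrete, here is a minimal sketch in Python of how such a training set might be represented. The values come from the table above, but the dict-of-features layout is just one possible choice, not anything prescribed by the slides.

```python
# Each instance is a (features, label) pair; rows with missing values are omitted.
training_set = [
    ({"income": 123000, "savings": 34100, "num_credit_cards": 0, "is_married": False}, "A"),
    ({"income": 45200,  "savings": 12100, "num_credit_cards": 3, "is_married": False}, "A"),
    ({"income": 423020, "savings": 23440, "num_credit_cards": 0, "is_married": False}, "B"),
    ({"income": 14000,  "savings": 87000, "num_credit_cards": 0, "is_married": True},  "B"),
    ({"income": 11200,  "savings": -2000, "num_credit_cards": 2, "is_married": True},  "A"),
]

size = len(training_set)                   # number of rows (instances)
dimensionality = len(training_set[0][0])   # number of columns (features)
print(size, dimensionality)                # -> 5 4
```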

Visualizing classification algorithms. We can visualize some classification algorithms in 2D… Warning: this tends to make the problem look easy.

Class | feature 1 (height1) | feature 2 (height2)
A     | 3                   | 4
B     | 5                   | 2.5
A     | 1.5                 | 5
…     | …                   | …
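As a sketch of such a 2D visualization, the three rows above can be plotted directly. matplotlib is assumed here only because it is a common choice; the slides do not name a library.

```python
import matplotlib.pyplot as plt

# The three labeled points from the table above.
points = {"A": [(3, 4), (1.5, 5)], "B": [(5, 2.5)]}
markers = {"A": "o", "B": "s"}

for label, pts in points.items():
    xs, ys = zip(*pts)
    plt.scatter(xs, ys, marker=markers[label], label=f"class {label}")

plt.xlabel("feature 1 (height1)")
plt.ylabel("feature 2 (height2)")
plt.legend()
plt.show()
```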

A trivial machine learning example represented in 2D Euclidean space. The blue circles and red squares represent the two classes in our training data, and the black shapes are the objects we are trying to classify. From now on we will only consider the 2D plots when explaining algorithms and problems. We should always remember that these plots are representations of real-world objects.

Simple Linear Classifier. A dataset which is not linearly separable.
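A minimal sketch of what a simple linear classifier computes: an item is assigned a class according to which side of the line w·x + b = 0 it falls on. The weights below are illustrative placeholders, not values learned from any dataset.

```python
def linear_classify(x, w=(-1.0, 1.0), b=0.0):
    """Classify a 2D point by the sign of w . x + b."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return "circle" if score > 0 else "square"

# With these placeholder weights the decision boundary is the diagonal y = x:
print(linear_classify((2, 5)))  # -> 'circle' (above the diagonal)
print(linear_classify((5, 2)))  # -> 'square' (below the diagonal)
```

Learning the classifier means choosing w and b from the training data; on a dataset that is not linearly separable, no choice of w and b classifies every training example correctly.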

Piecewise Linear Classifier. Simple Quadratic Classifier (or some other function).

This example is one for which we know a perfect rule: "above the diagonal is the circle class, below the diagonal is the square class". (Don't forget that for real-world problems we can never know a perfect rule, even if one exists.) What happens if we learn a piecewise linear classifier or a quadratic classifier on this dataset with only a small training set? The flexible model fits quirks of the sample rather than the underlying rule. This problem is called overfitting. Piecewise Linear Classifier. Simple Quadratic Classifier.
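The effect can be demonstrated numerically. The sketch below is an assumed setup, not taken from the slides: the true rule is the diagonal one above, a few training labels are corrupted by noise, and a very flexible model (the 1-nearest-neighbor classifier introduced on the next slide, standing in for the piecewise linear or quadratic classifier) memorizes that noise, scoring perfectly on the training set but worse on fresh data.

```python
import random

random.seed(0)

def true_label(p):
    # The perfect rule: above the diagonal is circle, below is square.
    return "circle" if p[1] > p[0] else "square"

def nn_label(p, train):
    # 1-nearest-neighbor by squared Euclidean distance.
    return min(train, key=lambda t: (t[0][0] - p[0]) ** 2 + (t[0][1] - p[1]) ** 2)[1]

# Small training set with 10% of the labels flipped.
train = []
for _ in range(20):
    p = (random.random(), random.random())
    lab = true_label(p)
    if random.random() < 0.1:
        lab = "square" if lab == "circle" else "circle"
    train.append((p, lab))

test = [(random.random(), random.random()) for _ in range(1000)]

train_acc = sum(nn_label(p, train) == lab for p, lab in train) / len(train)
test_acc = sum(nn_label(p, train) == true_label(p) for p in test) / len(test)
print(f"training accuracy: {train_acc:.2f}")  # 1.00 -- the noise is memorized
print(f"test accuracy:     {test_acc:.2f}")   # noticeably lower
```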

The Nearest Neighbor Algorithm. The nearest neighbor algorithm (NN) works by projecting the item to be classified into the same space as the training data, then finding the labeled exemplar which is closest. Whatever class that nearest neighbor belongs to is then assigned to the item to be classified. In this example, the item (6, 2) is correctly classified. In spite of its amazing simplicity, Nearest Neighbor is one of the best algorithms for many problems. We can use many different distance measures between objects; typically Euclidean distance is used.
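Here is a sketch of the algorithm in Python. The training points are hypothetical stand-ins (the slide's actual exemplars live in its figure); only the query item (6, 2) comes from the slide.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(item, training_set):
    # Find the labeled exemplar closest to the item; return its class.
    point, label = min(training_set, key=lambda ex: euclidean(ex[0], item))
    return label

training_set = [  # hypothetical (point, label) exemplars
    ((1, 1), "square"), ((2, 1.5), "square"),
    ((5, 2), "circle"), ((6, 3), "circle"),
]
print(nearest_neighbor((6, 2), training_set))  # -> 'circle'
```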

Evaluation of Classification. Leave-one-out. Cross-fold validation.
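A sketch of leave-one-out evaluation, assuming a classifier with the same call signature as the nearest_neighbor function above: each instance is held out in turn, the classifier sees the remaining instances, and we count how often the held-out label is predicted correctly. k-fold cross-validation is the same idea with the data split into k chunks rather than one instance per fold.

```python
def leave_one_out_accuracy(dataset, classify):
    correct = 0
    for i, (point, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]  # everything except instance i
        if classify(point, rest) == label:
            correct += 1
    return correct / len(dataset)

# e.g. with the nearest_neighbor and training_set defined earlier:
# leave_one_out_accuracy(training_set, nearest_neighbor)
```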

Discussion of Nearest Neighbor I. It is sensitive to irrelevant features; one possible solution is to search for good feature subsets. It is sensitive to noise; one possible solution is to use kNN, taking a majority vote over the k nearest neighbors (a sketch follows below). Suppose there is a disease. Although we don't know this, it happens that if your blood sugar is over 5.5 you have the disease, and below that you don't…
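A sketch of the kNN fix: instead of trusting the single nearest (possibly noisy) exemplar, take a majority vote among the k closest ones. The euclidean helper repeats the one from the nearest neighbor sketch above.

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn(item, training_set, k=3):
    # Sort exemplars by distance, keep the k closest, and vote.
    neighbors = sorted(training_set, key=lambda ex: euclidean(ex[0], item))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to plain nearest neighbor; with k = 3 or 5, one mislabeled training point near the query is usually outvoted by its correctly labeled neighbors.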

Discussion of Nearest Neighbor II. It is sensitive to the units in which the features are measured; one possible solution is to normalize the features. [Figures: X axis measured in feet vs. X axis measured in inches; Y axis measured in dollars in both.]
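A sketch of one common normalization scheme, z-score standardization (the slides do not name a particular scheme, so this choice is an assumption): each feature is rescaled to mean 0 and standard deviation 1, so the choice of feet versus inches no longer dominates the distance calculation.

```python
def zscore_normalize(rows):
    # rows: list of feature vectors; returns the standardized copies.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Feature 1 in inches, feature 2 in dollars: wildly different scales.
raw = [[70, 50000], [62, 20000], [66, 90000]]
print(zscore_normalize(raw))  # each column now has mean 0 and std 1
```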

Discussion of Nearest Neighbor III. Scalability.

A Famous Problem: R. A. Fisher's Iris Dataset. 3 classes, 50 instances of each class. The task is to classify Iris plants into one of 3 varieties using the Petal Length and Petal Width. Iris Setosa, Iris Versicolor, Iris Virginica.
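As a usage sketch, the Iris data ships with scikit-learn (an assumed dependency; the lecture itself does not reference it), and the slide's two-feature version of the task takes only a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, 2:4]   # columns 2 and 3: petal length, petal width
y = iris.target         # 0 = Setosa, 1 = Versicolor, 2 = Virginica

# 5-fold cross-validated accuracy of a 1-nearest-neighbor classifier.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
print(scores.mean())    # typically well above 0.9 with just these two features
```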