CES 514 – Data Mining Lecture 8: Classification (contd.)

Example: PEBLS
- PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
  – Works with both continuous and nominal features. For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM).
  – Each record is assigned a weight factor.
  – Number of nearest neighbors: k = 1.

Example: PEBLS
Distance between nominal attribute values (MVDM): d(v1, v2) = Σ_i | n1i/n1 – n2i/n2 |, where n1i is the number of class-i records with value v1 and n1 is the total number of records with value v1.

Class counts by Marital Status:
            Single   Married   Divorced
  Yes          2        0         1
  No           2        4         1

Class counts by Refund:
            Yes   No
  Yes        0     3
  No         3     4

d(Single, Married)   = | 2/4 – 0/4 | + | 2/4 – 4/4 | = 1
d(Single, Divorced)  = | 2/4 – 1/2 | + | 2/4 – 1/2 | = 0
d(Married, Divorced) = | 0/4 – 1/2 | + | 4/4 – 1/2 | = 1
d(Refund=Yes, Refund=No) = | 0/3 – 3/7 | + | 3/3 – 4/7 | = 6/7
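
As a minimal sketch, the MVDM computation above can be reproduced in a few lines of Python. The count table is transcribed from the Marital Status table above; the function name mvdm is ours, not from the lecture.

```python
# Sketch: modified value difference metric (MVDM) from class-count tables.
# d(v1, v2) = sum_i |n1i/n1 - n2i/n2|, where n1i is the count of class i
# among records with value v1.

# Class counts per attribute value, taken from the table above:
marital = {
    "Single":   {"Yes": 2, "No": 2},
    "Married":  {"Yes": 0, "No": 4},
    "Divorced": {"Yes": 1, "No": 1},
}

def mvdm(counts, v1, v2):
    """Distance between two nominal values under MVDM."""
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    classes = counts[v1].keys() | counts[v2].keys()
    return sum(abs(counts[v1].get(c, 0) / n1 - counts[v2].get(c, 0) / n2)
               for c in classes)

print(mvdm(marital, "Single", "Married"))   # 1.0
print(mvdm(marital, "Single", "Divorced"))  # 0.0
```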

Example: PEBLS
Distance between record X and record Y:
  Δ(X, Y) = w_X · w_Y · Σ_i d(X_i, Y_i)²
where the sum runs over the attributes and w_X is a reliability weight:
  – w_X ≈ 1 if X makes accurate predictions most of the time
  – w_X > 1 if X is not reliable for making predictions
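
A sketch of the record-level distance, assuming the standard PEBLS form Δ(X, Y) = w_X · w_Y · Σ_i d(X_i, Y_i)²; record_distance and attr_counts are illustrative names of ours, and mvdm is the helper from the previous sketch.

```python
# Sketch of the PEBLS record distance, assuming the standard form
# delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2, where d is the per-attribute
# MVDM distance (the mvdm() helper from the sketch above).

def record_distance(X, Y, w_X, w_Y, attr_counts):
    """X, Y: dicts mapping attribute name -> nominal value.
    attr_counts: dict mapping attribute name -> MVDM count table."""
    total = sum(mvdm(attr_counts[a], X[a], Y[a]) ** 2 for a in attr_counts)
    return w_X * w_Y * total

# w_X = (times X is used for prediction) / (times X predicts correctly),
# so w_X is close to 1 for reliable exemplars and greater than 1 otherwise.
```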

Support Vector Machines
- Find a linear hyperplane (decision boundary) that separates the data.

Support Vector Machines
- One possible solution.

Support Vector Machines
- Another possible solution.

Support Vector Machines
- Other possible solutions.

Support Vector Machines
- Which one is better, B1 or B2 (two candidate decision boundaries from the figure)?
- How do you define "better"?

Support Vector Machines
- Find the hyperplane that maximizes the margin; by this criterion, B1 is better than B2.

Support Vector Machines
- The decision boundary is w · x + b = 0; the two margin hyperplanes are w · x + b = 1 and w · x + b = –1.
- Classify: f(x) = 1 if w · x + b ≥ 1, and f(x) = –1 if w · x + b ≤ –1.
- The margin between the two hyperplanes is 2 / ||w||.

- We want to maximize the margin, 2 / ||w||.
  – This is equivalent to minimizing L(w) = ||w||² / 2.
  – Subject to the constraints: y_i (w · x_i + b) ≥ 1 for every training example (x_i, y_i).
  This is a constrained optimization problem; numerical approaches (e.g., quadratic programming) are used to solve it.
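
To make the optimization concrete, here is a hedged sketch that hands the hard-margin problem to a general-purpose constrained solver (scipy's minimize, standing in for a dedicated QP solver); the toy data set is assumed for illustration.

```python
# Sketch: solve the hard-margin SVM as a constrained optimization problem.
# Minimize ||w||^2 / 2 subject to y_i (w . x_i + b) >= 1 for all i.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (assumed).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

def objective(p):          # p = (w1, w2, b)
    w = p[:2]
    return 0.5 * w @ w

# One inequality constraint per training example: y_i (w . x_i + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda p, xi=xi, yi=yi: yi * (p[:2] @ xi + p[2]) - 1}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))
```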

Overview of optimization
Simplest optimization problem: maximize f(x) (one variable). If the function has nice properties (e.g., it is differentiable), we can use calculus: solve the equation f'(x) = 0. Suppose a is a root; if f''(a) < 0, then a is a (local) maximum.
Tricky issues:
- How do we solve the equation f'(x) = 0?
- What if there are many solutions? Each is only a "local" optimum.
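
For instance, a sketch of this recipe on an assumed example function, f(x) = –x² + 4x, carried out symbolically with sympy:

```python
# Sketch: the calculus recipe above on an assumed example f(x) = -x^2 + 4x.
import sympy as sp

x = sp.symbols('x')
f = -x**2 + 4*x
roots = sp.solve(sp.diff(f, x), x)        # solve f'(x) = 0  ->  [2]
for a in roots:
    if sp.diff(f, x, 2).subs(x, a) < 0:   # f''(a) < 0  =>  a is a maximum
        print(a, f.subs(x, a))            # 2, 4
```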

How to solve g(x) = 0
Even polynomial equations are hard to solve exactly. Quadratics have a closed form; degree five and higher in general do not.
Numerical (iterative) techniques:
- bisection
- secant
- Newton-Raphson
- etc.
Challenges: choosing the initial guess; the rate of convergence.
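
A sketch of two of the iterations named above, bisection and Newton-Raphson, applied to an assumed example g(x) = x³ – 2:

```python
# Sketch: two root-finding iterations for g(x) = 0.
def bisection(g, lo, hi, tol=1e-10):
    """Requires g(lo) and g(hi) to have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def newton(g, dg, x0, tol=1e-10, max_iter=100):
    """Newton-Raphson: fast near the root, but sensitive to the initial guess."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            break
    return x

g  = lambda x: x**3 - 2          # root is the cube root of 2
dg = lambda x: 3 * x**2
print(bisection(g, 1, 2), newton(g, dg, 1.0))   # both ~ 1.259921
```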

Functions of several variables
Consider a function of several variables, F(x, y). To find the maximum of F(x, y), we solve the pair of equations ∂F/∂x = 0 and ∂F/∂y = 0. If we can solve this system of equations, then we have found a local maximum or minimum of F. We can solve the system using numerical techniques similar to the one-dimensional case.
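
A sketch of the numerical route, using scipy's fsolve on the gradient of an assumed example F(x, y) = –(x – 1)² – (y + 2)²:

```python
# Sketch: solve grad F = 0 numerically for F(x, y) = -(x-1)^2 - (y+2)^2,
# which has its maximum at (1, -2).
from scipy.optimize import fsolve

def grad(p):
    x, y = p
    return [-2 * (x - 1),      # dF/dx
            -2 * (y + 2)]      # dF/dy

print(fsolve(grad, x0=[0.0, 0.0]))   # [ 1. -2.]
```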

When is the solution a maximum or a minimum?
Look at the Hessian, the matrix of second partial derivatives
  H = [ ∂²F/∂x²    ∂²F/∂x∂y ]
      [ ∂²F/∂y∂x   ∂²F/∂y²  ]
evaluated near the critical point a:
- if the Hessian is positive definite in a neighborhood of a, then a is a minimum;
- if the Hessian is negative definite in a neighborhood of a, then a is a maximum;
- if it is neither, then a is a saddle point.
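
In code, definiteness can be checked from the signs of the Hessian's eigenvalues; the matrices below are assumed examples, not from the lecture.

```python
# Sketch: classify a critical point by the eigenvalues of its Hessian
# (all positive -> minimum, all negative -> maximum, mixed -> saddle point).
import numpy as np

def classify(hessian):
    eig = np.linalg.eigvalsh(hessian)   # eigenvalues of a symmetric matrix
    if np.all(eig > 0):
        return "minimum"
    if np.all(eig < 0):
        return "maximum"
    return "saddle point"

print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # minimum
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point
```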

Application: linear regression
Problem: given (x1, y1), …, (xn, yn), find the best linear relation between x and y. Assume y = Ax + B. To find A and B, we minimize the sum of squared errors E(A, B) = Σ_i (y_i – A·x_i – B)². Since this is a function of two variables, we can solve by setting ∂E/∂A = 0 and ∂E/∂B = 0.
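
A sketch of the resulting closed-form solution (the normal equations for A and B); the data points are made up for illustration.

```python
# Sketch: least-squares fit of y = A*x + B by solving dE/dA = 0 and
# dE/dB = 0 in closed form (the normal equations).
import numpy as np

def fit_line(x, y):
    n = len(x)
    A = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / \
        (n * np.sum(x * x) - np.sum(x) ** 2)
    B = (np.sum(y) - A * np.sum(x)) / n
    return A, B

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 7.1])   # roughly y = 2x + 1
print(fit_line(x, y))                # A ~ 2.03, B ~ 1.03
```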

Constrained optimization
Maximize f(x, y) subject to g(x, y) = c. Using a Lagrange multiplier λ, the problem is reformulated as maximizing
  h(x, y, λ) = f(x, y) + λ (g(x, y) – c)
Now solve the equations ∂h/∂x = 0, ∂h/∂y = 0, and ∂h/∂λ = 0 (the last equation just restates the constraint g(x, y) = c).
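
A sketch of the recipe on an assumed example, maximizing f(x, y) = x + y subject to x² + y² = 1, with sympy solving the three equations:

```python
# Sketch: Lagrange multipliers on an assumed example,
# maximize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = 1.
import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = x + y
g = x**2 + y**2
h = f + lam * (g - 1)

# Solve dh/dx = 0, dh/dy = 0, dh/dlam = 0 simultaneously.
sols = sp.solve([sp.diff(h, v) for v in (x, y, lam)], (x, y, lam))
print(sols)   # (1/sqrt(2), 1/sqrt(2), ...) is the maximum, the other a minimum
```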

Support Vector Machines (contd.)
- What if the problem is not linearly separable?

Support Vector Machines
- What if the problem is not linearly separable?
  – Introduce slack variables ξ_i ≥ 0.
  – Minimize: L(w) = ||w||² / 2 + C Σ_i ξ_i
  – Subject to: y_i (w · x_i + b) ≥ 1 – ξ_i for every training example (x_i, y_i).
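
In practice the slack formulation shows up as the C parameter of soft-margin SVM implementations; a hedged sketch with scikit-learn's SVC on an assumed toy data set:

```python
# Sketch: the slack penalty is the C parameter of a soft-margin SVM.
# Larger C penalizes slack (training errors) more heavily.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [1, 0], [0, 1],
              [2, 2], [3, 3], [3, 2], [2, 3]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Typically fewer support vectors as C grows on separable data.
    print(C, clf.support_vectors_.shape[0])
```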

Nonlinear Support Vector Machines
- What if the decision boundary is not linear?

Nonlinear Support Vector Machines
- Transform the data into a higher-dimensional space, where a linear boundary can separate it.
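
A sketch of the same idea done implicitly with a kernel: scikit-learn's RBF-kernel SVC on assumed circular data that no line in the original space can separate.

```python
# Sketch: an RBF-kernel SVM separates circularly labeled data; the kernel
# plays the role of the higher-dimensional transformation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 0.5).astype(int)   # inside/outside a circle

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))   # close to 1.0; a linear kernel would do much worse
```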

Artificial Neural Networks (ANN)
- Example: the output Y is 1 if at least two of the three binary inputs are equal to 1.

Artificial Neural Networks (ANN)

Perceptron Model
- The model is an assembly of inter-connected nodes and weighted links.
- The output node sums its input values according to the weights of its links.
- The sum is compared against some threshold t:
  Y = 1 if Σ_i w_i x_i > t, else Y = 0 (equivalently, Y = sign(Σ_i w_i x_i – t)).
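
A sketch of such a perceptron computing the earlier ANN example (Y = 1 iff at least two of the three inputs are 1); the weights 0.3 and threshold 0.4 are one choice that works, not the only one.

```python
# Sketch: a perceptron for the three-input majority function.
# Any equal weights w with 2w > t > w would also work.
import itertools

def perceptron(x, w=(0.3, 0.3, 0.3), t=0.4):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > t else 0   # compare the weighted sum against threshold t

for x in itertools.product([0, 1], repeat=3):
    print(x, perceptron(x))    # outputs 1 exactly when two or more inputs are 1
```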

General Structure of ANN
- Training an ANN means learning the weights of the neurons.

Algorithm for learning ANN
- Initialize the weights (w_0, w_1, …, w_k).
- Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples.
  – Objective function: E = Σ_i [y_i – f(w, x_i)]²
  – Find the weights w_i that minimize this objective function, e.g., with the backpropagation algorithm.
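
As a one-layer stand-in for backpropagation, here is a sketch of the classic perceptron update rule, w ← w + η (y – ŷ) x, trained on the majority function from the earlier ANN example; names and hyperparameters are assumptions.

```python
# Sketch: the perceptron learning rule, the simplest weight-learning loop.
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=20):
    w = np.zeros(X.shape[1] + 1)                  # weights plus a bias term
    Xb = np.hstack([X, np.ones((len(X), 1))])     # append constant bias input
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            y_hat = 1 if w @ xi > 0 else 0
            w += eta * (yi - y_hat) * xi          # move weights toward target
    return w

# Three-input majority function from the ANN example above:
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
y = (X.sum(axis=1) >= 2).astype(int)
w = train_perceptron(X, y)
print(all((1 if w @ np.append(xi, 1) > 0 else 0) == yi
          for xi, yi in zip(X, y)))               # True: all examples learned
```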

WEKA

WEKA implementations
WEKA implements all the major data mining algorithms, including:
- decision trees (CART, C4.5, etc.)
- the naïve Bayes algorithm and its variants
- nearest-neighbor classifiers
- linear classifiers
- support vector machines
- clustering algorithms
- boosting algorithms
- etc.