Fuzzy Support Vector Machines (FSVMs) Weijia Wang, Huanren Zhang, Vijendra Purohit, Aditi Gupta.


Fuzzy Support Vector Machines (FSVMs) Weijia Wang, Huanren Zhang, Vijendra Purohit, Aditi Gupta

Outline
- Review of SVMs
- Formalization of FSVMs
- Training algorithm for FSVMs
- Noisy distribution model
- Determination of heuristic function
- Experiment results

SVM – brief review
- Classification technique
- Method: maps points into a high-dimensional feature space, then finds a separating hyperplane that maximizes the margin

Set S of labeled training points: S = {(x_i, y_i)}, i = 1, …, N. Each point belongs to one of the two classes, y_i ∈ {-1, +1}.
Let z = φ(x) be the feature-space vector, with mapping φ from input space to feature space.
Equation of the hyperplane: w · z + b = 0.
For linearly separable data, the optimization problem is
  minimize (1/2)||w||^2
  subject to y_i(w · z_i + b) ≥ 1, i = 1, …, N.

For non-linearly separable data (soft margin), introduce slack variables ξ_i ≥ 0. The optimization problem becomes
  minimize (1/2)||w||^2 + C Σ_i ξ_i
  subject to y_i(w · z_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0.
Σ_i ξ_i is a measure of the amount of misclassification.
Limitation: all training points are treated equally.

FSVM – Fuzzy SVM
- Each training point no longer belongs exactly to one of the two classes.
- Some training points are more important than others: these meaningful data points must be classified correctly, even if some noisy, less important points are misclassified.
- Fuzzy membership s_i: how much point x_i belongs to one class (the amount of meaningful information in the data point); 1 − s_i: the amount of noise in the data point.

Set S of labeled training points with fuzzy memberships: S = {(x_i, y_i, s_i)}, i = 1, …, N.
Optimization problem:
  minimize (1/2)||w||^2 + C Σ_i s_i ξ_i
  subject to y_i(w · z_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0.
C is the regularization constant; a large C gives a narrower margin and fewer misclassifications. A small s_i reduces the misclassification penalty for point x_i.
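The slides give only the optimization problem; as a minimal sketch of the idea, scikit-learn's SVC accepts per-sample weights in fit(), which rescale the penalty C per point in the same way the memberships s_i scale it in the objective above. The toy data and membership values below are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data; the last point looks mislabeled (noisy).
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9], [0.1, 0.2]])
y = np.array([-1, -1, +1, +1, +1])

# Standard soft-margin SVM: every point is penalized with the same C.
plain_svm = SVC(kernel="rbf", C=10.0).fit(X, y)

# FSVM-style training: fuzzy memberships s_i scale the penalty to s_i * C.
# These memberships are illustrative; the paper derives them from a heuristic h(x).
s = np.array([1.0, 1.0, 1.0, 1.0, 0.1])   # low membership for the suspected outlier
fuzzy_svm = SVC(kernel="rbf", C=10.0).fit(X, y, sample_weight=s)

print(plain_svm.predict([[0.15, 0.15]]), fuzzy_svm.predict([[0.15, 0.15]]))
```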

Lagrange function:
  L = (1/2)||w||^2 + C Σ_i s_i ξ_i − Σ_i α_i [y_i(w · z_i + b) − 1 + ξ_i] − Σ_i β_i ξ_i.
Taking derivatives and setting them to zero:
  ∂L/∂w = 0 ⇒ w = Σ_i α_i y_i z_i
  ∂L/∂b = 0 ⇒ Σ_i α_i y_i = 0
  ∂L/∂ξ_i = 0 ⇒ s_i C − α_i − β_i = 0.

Dual optimization problem:
  maximize W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
  subject to Σ_i y_i α_i = 0, 0 ≤ α_i ≤ s_i C.
Kuhn–Tucker conditions: λ_i g_i(x) = 0, where λ is the Lagrange multiplier and g(x) the inequality constraint.
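For readers who want to see the dual solved directly, here is a minimal sketch using the cvxopt QP solver; the only FSVM-specific change relative to the standard soft-margin dual is the per-point upper bound s_i C on α_i. The RBF kernel and all parameter values are illustrative assumptions.

```python
import numpy as np
from cvxopt import matrix, solvers

def fsvm_dual(X, y, s, C=10.0, gamma=1.0):
    """Solve the FSVM dual with an RBF kernel; returns the multipliers alpha.

    The only difference from the standard soft-margin SVM dual is the
    per-point box constraint 0 <= alpha_i <= s_i * C.
    """
    n = len(y)
    # RBF kernel matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

    P = matrix(np.outer(y, y) * K)               # quadratic term
    q = matrix(-np.ones(n))                      # linear term (maximize sum of alpha)
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))
    h = matrix(np.hstack([np.zeros(n), s * C]))  # 0 <= alpha_i <= s_i * C
    A = matrix(y.reshape(1, -1).astype(float))   # equality: sum_i y_i alpha_i = 0
    b = matrix(0.0)

    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).ravel()
```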

Points with α_i > 0 are support vectors (they lie on or inside the margin boundary).
Two types of support vectors:
- α_i < s_i C: the point lies on the margin of the hyperplane (ξ_i = 0).
- α_i = s_i C: the point violates the margin (ξ_i > 0); it is misclassified if ξ_i > 1.
⇒ Points with the same α_i could be different types of support vectors in FSVM due to different s_i.
⇒ SVM has one free parameter (C); FSVM has C plus an s_i per point (number of free parameters ~ number of training points).

Training algorithm for FSVMs
Objective function for optimization balances:
- minimization of the error function
- maximization of the margin
The balance is controlled by tuning C.

Selection of error function
- Least absolute value in SVMs
- Least squares value in LS-SVMs (Suykens and Vandewalle, 1999)
  - the QP is transformed into solving a linear system
  - the support values are mostly nonzero
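To make the LS-SVM bullet concrete: in the least-squares formulation the dual reduces to a single linear system rather than a QP. A minimal sketch, assuming an RBF kernel; the helper name and parameter values are illustrative.

```python
import numpy as np

def lssvm_train(X, y, gamma_reg=1.0, rbf_gamma=1.0):
    """LS-SVM classifier: the dual reduces to one linear system (Suykens & Vandewalle, 1999).

    Solves  [[0,  y^T           ],   [[b    ],   [[0],
             [y,  Omega + I/gam]]  *  [alpha]] =  [1]]
    with Omega_ij = y_i * y_j * K(x_i, x_j).
    """
    n = len(y)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-rbf_gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    Omega = np.outer(y, y) * K

    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma_reg
    rhs = np.concatenate([[0.0], np.ones(n)])

    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return alpha, b   # note: the alpha_i are generally all nonzero (no sparsity)
```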

Selection of error function
- Maximum likelihood method
  - usable when the underlying error probability distribution can be estimated
  - the optimization problem becomes a maximum likelihood fit of the hyperplane under that error distribution

Maximum likelihood error: limitation
- the precision of the estimated hyperplane depends on the estimate of the error function
- but the estimate of the error is reliable only when the underlying hyperplane is already well estimated (a circular dependency)

Selection of error function
- Weighted least absolute value
  - each data point is associated with a cost or importance factor
  - applicable when a noise distribution model of the data is given
  - p_x(x) is the probability that point x is not noise
  - the optimization becomes
      minimize (1/2)||w||^2 + C Σ_i p_x(x_i) ξ_i
      subject to y_i(w · z_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0

Weighted least absolute value: relation with FSVMs
- take p_x(x) as the fuzzy membership, i.e., p_x(x_i) = s_i

Selection of the max-margin term
- Generalized optimal plane (GOP)
- Robust linear programming (RLP)

Implementation of the noisy distribution model (NDM)
- Goal: build a probability distribution model for the data
- Ingredients:
  - a heuristic function h(x), highly relevant to p_x(x)
  - a confident factor h_C
  - a trashy factor h_T

Density function for data
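The density function itself did not survive the transcript. A minimal sketch of one plausible mapping from the heuristic h(x) to a membership s(x), assuming the piecewise form suggested by the parameters named on the surrounding slides (confident factor h_C, trashy factor h_T, mapping degree d, lower bound σ); the exact functional form used in the paper may differ.

```python
def membership_from_heuristic(h, h_T, h_C, d=1.0, sigma=0.1):
    """Map a heuristic value h(x) to a fuzzy membership s(x) in [sigma, 1].

    Assumed piecewise form (illustrative, not taken verbatim from the paper):
      h >= h_C       -> 1      (confidently not noise)
      h <= h_T       -> sigma  (treated as noise, kept with a small weight)
      h_T < h < h_C  -> interpolation of degree d between sigma and 1
    """
    if h >= h_C:
        return 1.0
    if h <= h_T:
        return sigma
    t = (h - h_T) / (h_C - h_T)
    return sigma + (1.0 - sigma) * t**d
```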

Heuristic function
- Kernel-target alignment
- K-nearest neighbors
Basic idea: outliers have a higher probability of being noise.

Kernel-target alignment
- A measurement of how likely the point x_i is noise.

Kernel-target alignment: example
The Gaussian kernel can be written as the cosine of the angle between the two mapped vectors in the feature space: since K(x, x) = 1, K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ = cos θ_ij.

An outlier data point x_i will have a smaller value of f_K(x_i, y_i).
Use f_K(x, y) as the heuristic function h(x).
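A minimal sketch of a kernel-target-alignment style heuristic; the averaging used for normalization is an assumption and may differ from the paper's exact definition of f_K.

```python
import numpy as np

def kta_heuristic(X, y, rbf_gamma=1.0):
    """Kernel-target alignment heuristic f_K(x_i, y_i) for every training point.

    Assumed form (an illustration; the normalization may differ from the paper):
      f_K(x_i, y_i) = (1/N) * sum_j y_i * y_j * K(x_i, x_j)
    A point whose kernel similarities mostly point to the opposite class gets a
    small (or negative) value, i.e. it looks like an outlier / noise.
    """
    sq = np.sum(X**2, axis=1)
    K = np.exp(-rbf_gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    return y * (K @ y) / len(y)
```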

K-nearest neighbors (k-NN)
- For each x_i, the set S_i^k consists of the k nearest neighbors of x_i.
- n_i is the number of data points in S_i^k whose class label is the same as the class label of x_i.
- Heuristic function: h(x_i) = n_i.
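A minimal sketch of the k-NN heuristic as described on this slide; the helper name and the choice of Euclidean distance are assumptions.

```python
import numpy as np

def knn_heuristic(X, y, k=5):
    """k-NN heuristic h(x_i) = n_i: the number of the k nearest neighbors of x_i
    (in the original input space) that share x_i's class label."""
    n = len(y)
    # Pairwise squared Euclidean distances in the original space.
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(D, np.inf)            # exclude the point itself
    h = np.empty(n, dtype=int)
    for i in range(n):
        neighbors = np.argsort(D[i])[:k]   # indices of the k nearest neighbors
        h[i] = np.sum(y[neighbors] == y[i])
    return h
```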

Comparison of the two heuristic functions
- Kernel-target alignment: operates in the feature space; uses the information of all data points to determine the heuristic for one point.
- k-NN: operates in the original space; uses the information of only k data points to determine the heuristic for one point.
How about combining the two?

Overall procedure for FSVMs
1. Use the SVM algorithm to get the optimal kernel parameters and the regularization parameter C.
2. Fix the kernel parameters and the regularization parameter C, determine the heuristic function h(x), and use exhaustive search to choose the confident factor h_C, the trashy factor h_T, the mapping degree d, and the fuzzy membership lower bound σ.
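A rough end-to-end sketch of step 2, reusing the hypothetical helpers sketched earlier (kta_heuristic, membership_from_heuristic) together with scikit-learn cross-validation; the parameter grids, the use of KTA rather than k-NN, and taking the extremes of f_K as h_C and h_T are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fit_fsvm(X, y, C, rbf_gamma):
    """Step 2 of the procedure: fix C and the kernel, then search the membership parameters."""
    h = kta_heuristic(X, y, rbf_gamma)          # heuristic values from the earlier sketch
    h_C, h_T = h.max(), h.min()                 # fix the confident / trashy factors first
    best_params, best_score = None, -np.inf
    for sigma in [0.05, 0.1, 0.2, 0.5]:         # illustrative grid
        for d in [0.5, 1.0, 2.0]:
            s = np.array([membership_from_heuristic(h_i, h_T, h_C, d, sigma) for h_i in h])
            svm = SVC(kernel="rbf", C=C, gamma=rbf_gamma)
            score = cross_val_score(svm, X, y, cv=5,
                                    fit_params={"sample_weight": s}).mean()
            if score > best_score:
                best_params, best_score = s, score
    # Retrain on all data with the best memberships found by the search.
    return SVC(kernel="rbf", C=C, gamma=rbf_gamma).fit(X, y, sample_weight=best_params)
```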

Experiments: data with a time property

SVM results for data with time property
FSVM results for data with time property

Experiments: two classes with different weighting

Results from SVM
Results from FSVM

Experiments: using the class center to reduce the effect of outliers

Results from SVM
Results from FSVM

Experiments (setting fuzzy membership): kernel-target alignment
Two-step strategy:
1. Fix f_K^UB = max_i f_K(x_i, y_i) and f_K^LB = min_i f_K(x_i, y_i); find σ and d using a two-dimensional search.
2. Then find f_K^UB and f_K^LB.

Experiments (setting fuzzy membership): k-nearest neighbors
- Perform a two-dimensional search for the parameters σ and k.
- k^UB = k/2 and d = 1 are fixed.

Experiments: comparison of results (test errors) from KTA and k-NN with other classifiers

Conclusion
- FSVMs work well when the average training error is high, which means they can improve the performance of SVMs on noisy data.
- The number of free parameters for FSVMs is very high: C plus an s_i for each data point.
- Results using KTA and k-NN are similar, but KTA is more complicated and takes more time to find optimal parameter values.
- This paper studies FSVMs only for two classes; multi-class scenarios are not explored.