ICONIP 2005: Improve Naïve Bayesian Classifier by Discriminative Training. Kaizhu Huang, Zhangbing Zhou, Irwin King, Michael R. Lyu. Oct. 2005.

Presentation transcript:


Outline
Background
–Classifiers
»Discriminative classifiers: Support Vector Machines
»Generative classifiers: Naïve Bayesian Classifiers
Motivation
Discriminative Naïve Bayesian Classifier
Experiments
Discussions
Conclusion

Background: Discriminative Classifiers
–Directly maximize a discriminative function or a posterior function
–Example: Support Vector Machines (SVM)

Background: Generative Classifiers
–Model the joint distribution for each class, P(x|C), and then use Bayes' rule to construct the posterior classifier P(C|x), where C is the class label and x is the feature vector.
–Example: Naïve Bayesian Classifiers
»Model the distribution of each class under the assumption that, given the class label, each feature of the data is independent of the other features.
»By Bayes' rule, P(C|x) = P(x|C)P(C)/P(x) ∝ P(x|C)P(C), since P(x) is constant w.r.t. C; combining this with the independence assumption gives P(C|x) ∝ P(C) ∏_i P(x_i|C).
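A minimal sketch of the discrete naïve Bayesian classifier described above; the variable names and the Laplace smoothing are our own illustrative choices, not from the paper:

```python
import numpy as np

class NaiveBayes:
    """Discrete naive Bayes: predict argmax_C  P(C) * prod_i P(x_i | C)."""

    def fit(self, X, y, alpha=1.0):
        # X: (n_samples, n_features) integer-coded features; y: class labels.
        self.classes_ = np.unique(y)
        n_values = X.max(axis=0) + 1                     # cardinality of each feature
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        # log_cond_[ci][i][v] = log P(x_i = v | C = classes_[ci]), Laplace-smoothed
        self.log_cond_ = []
        for c in self.classes_:
            Xc = X[y == c]
            tables = []
            for i, k in enumerate(n_values):
                counts = np.bincount(Xc[:, i], minlength=k) + alpha
                tables.append(np.log(counts / counts.sum()))
            self.log_cond_.append(tables)
        return self

    def predict(self, X):
        scores = np.empty((len(X), len(self.classes_)))
        for ci in range(len(self.classes_)):
            s = self.log_prior_[ci]
            for i in range(X.shape[1]):
                s = s + self.log_cond_[ci][i][X[:, i]]   # add per-sample log-probs
            scores[:, ci] = s
        return self.classes_[scores.argmax(axis=1)]
```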

Background: Comparison
[Figure: example of missing information. From left to right: original digit, 50% missing digit, 75% missing digit, and occluded digit.]

Background: Why are generative classifiers not as accurate as discriminative classifiers?
Scheme for generative classifiers in a two-category classification task: split the training set into subset D1 (labeled Class 1) and subset D2 (labeled Class 2); estimate a distribution P1 to approximate D1 and a distribution P2 to approximate D2; then construct the Bayes rule for classification.
1. It is incomplete for generative classifiers to approximate only the within-class information.
2. The inter-class discriminative information is discarded, yet it is exactly what is needed for classification.

Background: Why are generative classifiers superior to discriminative classifiers in handling missing-information problems?
–SVM lacks the ability to reason under such uncertainty.
–NB can conduct uncertainty inference under the estimated distribution. With A the full feature set, T the subset of A that is missing, and A−T thus the known features, NB marginalizes the missing features out: P(C | x_{A−T}) ∝ P(C) ∑_{x_T} ∏_i P(x_i | C).
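Because the naïve Bayes factorization makes each missing feature's factor sum to one, marginalizing the unobserved features amounts to simply dropping their factors. A short sketch of this inference, reusing the NaiveBayes class from the earlier block (again our own illustration):

```python
import numpy as np

def predict_with_missing(nb, x, missing):
    """Classify one sample whose features indexed by `missing` are unobserved.

    Given class C, sum_v P(x_i = v | C) = 1, so summing a missing feature
    out of prod_i P(x_i | C) just removes its factor; the classes are scored
    using only the observed features in A - T.
    """
    scores = []
    for ci in range(len(nb.classes_)):
        s = nb.log_prior_[ci]
        for i in range(len(x)):
            if i not in missing:                 # keep observed factors only
                s += nb.log_cond_[ci][i][x[i]]
        scores.append(s)
    return nb.classes_[int(np.argmax(scores))]
```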

Motivation
–It seems that a good classifier should combine the strategies of discriminative and generative classifiers.
–Our work trains one of the generative classifiers, the Naïve Bayesian Classifier, in a discriminative way.

Discriminative Naïve Bayesian Classifier
Working scheme of the Naïve Bayesian Classifier: estimate the distribution P1 to approximate sub-set D1 (labeled Class 1) and the distribution P2 to approximate sub-set D2 (labeled Class 2), each independently, then use the Bayes rule for classification. Interaction between the two estimates is needed!
Mathematical explanation of the Naïve Bayesian Classifier: each class distribution is fit by maximum likelihood, which is easily solved by the Lagrange multiplier method.
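The slide's formulas did not survive transcription; the per-class maximum-likelihood problem it refers to, and the closed form that the Lagrange multiplier method yields, are the standard ones:

```latex
% Per-class maximum-likelihood estimation for discrete naive Bayes:
\max_{P}\; \sum_{x \in D_c} \sum_{i} \log P(x_i \mid c)
\qquad \text{s.t.} \quad \sum_{v} P(x_i = v \mid c) = 1 \;\;\forall i.
% Attaching a Lagrange multiplier to each sum-to-one constraint and
% setting derivatives to zero gives empirical frequencies:
P(x_i = v \mid c) \;=\; \frac{\#\{\, x \in D_c : x_i = v \,\}}{\lvert D_c \rvert}.
```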

Discriminative Naïve Bayesian Classifier (DNB)
Optimization function of DNB:
–On one hand, minimizing this function approximates each class's data as accurately as possible.
–On the other hand, the optimization also tries to enlarge the divergence between the classes (the divergence term).
–Because it optimizes the joint distributions directly, DNB inherits NB's ability to handle missing-information problems.

Discriminative Naïve Bayesian Classifier (DNB)
Complete optimization problem: a nonlinear optimization problem under linear constraints.
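The objective itself did not survive in this transcript. Based on the description above (a within-class likelihood part plus an inter-class divergence term, optimized under linear sum-to-one constraints), it plausibly takes a form like the following, with λ ≥ 0 trading off fit against separation. This is a hedged reconstruction, not the paper's exact formula:

```latex
% Hedged reconstruction of the DNB optimization problem:
\min_{P_1, P_2}\;
  \underbrace{-\sum_{x \in D_1} \log P_1(x) \;-\; \sum_{x \in D_2} \log P_2(x)}_{\text{within-class fit}}
  \;-\; \lambda \underbrace{\sum_{x \in D_1 \cup D_2}
        \Bigl\lvert \log \tfrac{P_1(x)}{P_2(x)} \Bigr\rvert}_{\text{inter-class divergence}}
\qquad \text{s.t.} \quad \sum_{v} P_k(x_i = v \mid C_k) = 1 \;\;\forall i, k,
% where each P_k factorizes as a naive Bayes model over the features.
```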

Discriminative Naïve Bayesian Classifier (DNB)
Solving the optimization problem: use Rosen's gradient projection method.

Discriminative Naïve Bayesian Classifier (DNB)
Gradient and projection matrix.
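For linear equality constraints Ap = b, Rosen's method steps along the gradient projected onto the constraint surface via the projection matrix P = I − Aᵀ(AAᵀ)⁻¹A. A minimal sketch of one step (our own illustration; it handles only the equality constraints, not nonnegativity bounds):

```python
import numpy as np

def rosen_projection_step(p, grad, A, lr=0.1):
    """One projected-gradient step for  min f(p)  s.t.  A @ p = b.

    P = I - A^T (A A^T)^{-1} A projects grad onto the null space of A,
    so stepping along -P @ grad leaves A @ p (hence b) unchanged.
    """
    P = np.eye(len(p)) - A.T @ np.linalg.inv(A @ A.T) @ A
    return p - lr * (P @ grad)

# Example: keep a 3-dimensional distribution summing to one.
A = np.ones((1, 3))                       # constraint: p1 + p2 + p3 = 1
p = np.array([0.5, 0.3, 0.2])
grad = np.array([1.0, -0.5, 0.2])         # some objective gradient at p
p_new = rosen_projection_step(p, grad, A)
assert abs(p_new.sum() - 1.0) < 1e-12     # the constraint is preserved
```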

Extension to Multi-category Classification Problems

Experimental Results
Experimental setup
–Datasets
»4 benchmark datasets from the UCI machine learning repository
–Experimental environment
»Platform: Windows 2000
»Developing tool: Matlab 6.5

Without missing information
Observations
–DNB outperforms NB on every dataset.
–In comparison with SVM, DNB wins on 2 datasets and loses on the other 2.
–SVM outperforms DNB on Segment and Satimage.

With missing information
Scheme
–DNB uses the marginalization rule (5), P(C | x_{A−T}) ∝ P(C) ∑_{x_T} ∏_i P(x_i | C), to conduct inference when information is missing.
–SVM sets the missing features to 0 (the default way to process unknown features in LIBSVM).

With missing information
Setup: randomly discard features, gradually increasing from a small percentage to a large percentage.
[Figures: error rate on Iris and on Vote with missing information.]

With missing information
[Figures: error rate on Satimage and on DNA with missing information.]

Summary of Experimental Results
Observations
–NB demonstrates a robust ability to handle missing-information problems.
–DNB inherits NB's ability to handle missing information while achieving higher classification accuracy than NB.
–SVM cannot deal with missing-information problems easily.

Discussion
Can DNB be extended to general Bayesian Network (BN) classifiers?
–The structure-learning problem becomes involved: direct application of DNB encounters difficulties because, unlike in NB, the structure is not fixed.
–Finding the optimal general Bayesian Network classifier is an NP-complete problem.
Discriminative training of a constrained Bayesian Network classifier is possible…

Conclusion
We develop a novel model named the Discriminative Naïve Bayesian Classifier.
–It outperforms the Naïve Bayesian Classifier when no information is missing.
–It outperforms SVM in handling missing-information problems.