
1 Discriminative Naïve Bayesian Classifiers Kaizhu Huang Supervisors: Prof. Irwin King, Prof. Michael R. Lyu Markers: Prof. Lai Wan Chan, Prof. Kin Hong Wong

2 Outline Background –Classifiers »Discriminative classifiers: Support Vector Machines »Generative classifiers: Naïve Bayesian Classifiers Motivation Discriminative Naïve Bayesian Classifiers Experiments Discussions Conclusion

3 Background Discriminative Classifiers –Directly maximize a discriminative function or posterior function –Example: Support Vector Machines (SVM)

4 Background Generative Classifiers –Model the joint distribution for each class, P(x|C), and then use Bayes rule to construct the posterior classifier P(C|x). –Example: Naïve Bayesian Classifiers »Model the distribution of each class under the assumption that each feature of the data is independent of the other features, given the class label.
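In equation form (the standard Bayes rule and naïve factorization the slide's annotations refer to, P(x) being the term that is constant w.r.t. C):

P(C|x) = P(x|C) P(C) / P(x) ∝ P(x|C) P(C)

Combining the independence assumption, the class-conditional distribution factorizes over the n features:

P(x|C) = ∏_{i=1..n} P(x_i | C)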

5 Background Comparison Example of missing information (figure). From left to right: original digit, 50% missing digit, 75% missing digit, and occluded digit.

6 Background Why are generative classifiers not as accurate as discriminative classifiers? Scheme for generative classifiers in two-category classification tasks: split the pre-classified dataset into sub-dataset D1 for class 1 and sub-dataset D2 for class 2; estimate the distribution P1 to approximate D1 accurately and the distribution P2 to approximate D2 accurately; then use Bayes rule to perform classification. 1. It is incomplete for generative classifiers to approximate only the within-class information. 2. The inter-class discriminative information between classes is discarded.

7 Background Why are generative classifiers superior to discriminative classifiers in handling missing-information problems? –SVM lacks the ability to perform inference under uncertainty. –NB can conduct uncertainty inference under the estimated distribution, where A is the full feature set and T is the subset of A that is missing.
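Concretely, NB marginalizes the missing subset T out of the estimated joint distribution; because P(x|C) factorizes, each missing feature sums to one and simply drops out of the product:

P(C | x_{A−T}) ∝ P(C) ∑_{x_T} ∏_{i∈A} P(x_i|C) = P(C) ∏_{i∈A−T} P(x_i|C)

SVM has no corresponding distribution to marginalize over, which is the gap the slide points to.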

8 Motivation It seems that a good classifier should combine the strategies of discriminative classifiers and generative classifiers. Our work trains one of the generative classifiers, the Naïve Bayesian Classifier, in a discriminative way.

9 Roadmap of our work Discriminative training

10 How does our work relate to other work? 1. From discriminative classifiers to generative classifiers: Jaakkola and Haussler, NIPS '98. Difference: our method performs the reverse process, from generative classifiers to discriminative classifiers. 2. Discriminative training of HMM and GMM: Beaufays et al., ICASSP '99; Hastie et al., JRSS '96. Difference: our method is designed for Bayesian classifiers.

11 How does our work relate to other work? 3. Optimization on the posterior distribution P(C|x): Logistic Regression (LR). Difference: LR encounters computational difficulties in handling missing-information problems; as the number of missing or unknown features grows, inference becomes intractable.

12 Roadmap of our work

13 Discriminative Naïve Bayesian Classifiers Working scheme of the Naïve Bayesian Classifier: split the pre-classified dataset into sub-dataset D1 for class 1 and sub-dataset D2 for class 2; estimate the distribution P1 to approximate D1 accurately and the distribution P2 to approximate D2 accurately; then use Bayes rule to perform classification. Mathematical explanation: NB's estimation step is easily solved by the Lagrange multiplier method.
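A sketch of that mathematical explanation, assuming the usual maximum-likelihood formulation (the slide's own equations are not reproduced here): for each class c, NB maximizes the log-likelihood of sub-dataset Dc under the factorized model, subject to normalization constraints,

max_{Pc} ∑_x P̂c(x) log Pc(x)   s.t.   ∑_v Pc(x_i = v) = 1 for each feature i,

and setting the Lagrangian's derivatives to zero yields the empirical frequencies in closed form:

Pc(x_i = v) = (number of examples in Dc with x_i = v) / |Dc|.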

14 Discriminative Naïve Bayesian Classifiers (DNB) Optimization function of DNB (sketched below): on one hand, minimizing this function tries to approximate the dataset as accurately as possible; on the other hand, the optimization also tries to enlarge the divergence between classes (the divergence term). Optimizing the joint distribution directly inherits NB's ability to handle missing-information problems.
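One plausible form of that objective, assuming a KL data-fit term plus a weighted between-class divergence (the weight W and the exact divergence measure here are illustrative, not a verbatim reproduction of the slide):

min_{P1, P2}  ∑_{c=1,2} KL(P̂c || Pc) − W · Div(P1, P2)

The first term keeps each Pc close to its own sub-dataset (the NB part); the divergence term rewards separation between the two class models (the discriminative part).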

15 Discriminative Naïve Bayesian Classifiers (DNB) Complete optimization problem Unlike in NB, P1 and P2 cannot be optimized separately, since the divergence term makes them interacting variables.

16 Discriminative Naïve Bayesian Classifiers (DNB) Solving the optimization problem –A nonlinear optimization problem under linear constraints, solved using Rosen's gradient projection method (see the sketch below).
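A minimal sketch of one Rosen gradient-projection step for a linearly constrained problem. This illustrates the general method rather than the thesis's implementation, and it handles only the equality constraints; the nonnegativity constraints on probabilities would additionally require active-set bookkeeping.

import numpy as np

def gradient_projection_step(x, grad, A, step=0.1):
    # One Rosen gradient-projection step for: min f(x) subject to A x = b.
    # Project the negative gradient onto the null space of A so the
    # iterate stays on the constraint surface: P = I - A^T (A A^T)^{-1} A.
    P = np.eye(A.shape[1]) - A.T @ np.linalg.inv(A @ A.T) @ A
    d = -P @ grad                 # feasible descent direction
    return x + step * d

# Usage: keep a 3-bin distribution summing to 1 while taking a descent step.
A = np.ones((1, 3))               # constraint: sum(p) = 1
p = np.array([0.2, 0.3, 0.5])
g = np.array([0.4, -0.1, 0.2])    # hypothetical gradient of the objective at p
p_new = gradient_projection_step(p, g, A)
print(p_new, p_new.sum())         # the sum stays 1 up to floating point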

17 Discriminative Naïve Bayesian Classifiers (DNB) Gradient and projection matrix

18 Extension to Multi-category Classification problems

19 Experimental results Experimental Setup –Datasets »5 benchmark datasets from the UCI machine learning repository –Experimental environment »Platform: Windows 2000 »Development tool: Matlab 6.5

20 Without information missing  Observations –DNB outperforms NB on every dataset –Compared with SVM, DNB wins on 2 datasets and loses on 3 –SVM outperforms DNB on Segment and Satimage

21 With information missing DNB uses the estimated joint distribution, marginalizing out the missing features as above, to conduct inference when there is information missing. SVM sets the missing features to 0 (the default way to process unknown features in LIBSVM).
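A minimal sketch of the NB-style inference with missing features that DNB inherits (the function and array names here are hypothetical); the point is simply that missing features drop out of the product, with no imputation needed:

import numpy as np

def nb_predict_with_missing(x, priors, cond_probs):
    # x          : feature vector with np.nan marking missing entries
    # priors     : shape (n_classes,), the class priors P(C)
    # cond_probs : shape (n_classes, n_features, n_values), P(x_i = v | C)
    log_post = np.log(priors)
    for i, v in enumerate(x):
        if np.isnan(v):
            continue              # missing feature: sum_v P(x_i = v | C) = 1, so skip it
        log_post = log_post + np.log(cond_probs[:, i, int(v)])
    return int(np.argmax(log_post))

By contrast, replacing missing entries with 0, as in the LIBSVM default described above, silently asserts a specific observed value.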

22 With information missing

23–24 (Result figures with information missing, continued.)

25 Observations  NB demonstrates a robust ability to handle missing-information problems.  DNB inherits NB's ability to handle missing information while achieving higher classification accuracy than NB.  SVM cannot deal with missing-information problems easily.  On small datasets, DNB demonstrates a superior ability compared with NB.

26 Discussion Why does SVM outperform DNB when no information is missing?  SVM directly minimizes the error rate, while DNB minimizes an intermediate term.  SVM assumes no model, while DNB assumes independence among the features: “all models are wrong but some are useful”.

27 Discussion How does DNB relate to the Fisher Discriminant (FD)?  Using the difference of the means of the two classes as the divergence measure is less informative than using the full distributions.  FD is usually used as a dimension-reduction method rather than a classification method.
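For reference, the standard two-class Fisher criterion depends only on the class means m1, m2 and the within-class scatter S_W = S1 + S2, which is exactly the contrast the slide draws with a distribution-based divergence:

J(w) = (wᵀ(m1 − m2))² / (wᵀ S_W w)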

28 Discussion Can DNB be extended to general Bayesian Network (BN) classifiers? –Finding optimal general Bayesian network classifiers is an NP-complete problem, and the structure-learning problem becomes involved. Direct application of DNB encounters difficulties, since the structure is not fixed in restricted BNs. Work on a tree-like discriminative Bayesian network classifier is ongoing.

29 Discussion Discriminative training of tree-like Bayesian network classifiers: two reference distributions are used in each iteration, so that each class model approximates its own empirical distribution as closely as possible while staying as far as possible from the distribution of the other dataset.
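One way to formalize that two-reference scheme (an illustrative sketch under assumed notation, not the thesis's exact objective): for class c, with the other class written c̄ and trade-off weight W,

min_{Pc}  KL(P̂c || Pc) − W · KL(P̂c̄ || Pc)

so each class model is pulled toward its own empirical distribution and pushed away from the other class's.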

30 Future work Extensive evaluations of discriminative Bayesian network classifiers, including Discriminative Naïve Bayesian Classifiers and tree-like Bayesian network classifiers.

31 Conclusion We develop a novel model named Discriminative Naïve Bayesian Classifiers. It outperforms Naïve Bayesian Classifiers when no information is missing, and it outperforms SVMs in handling missing-information problems.

