Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers

Presentation transcript:

Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers
Yushi Jing, GVU, College of Computing, Georgia Institute of Technology
Vladimir Pavlović, Department of Computer Science, Rutgers University
James M. Rehg

Contribution
- A boosting approach to Bayesian network classification: an additive combination of simple models (e.g. Naive Bayes), each trained by weighted maximum-likelihood learning.
- Generalizes Boosted Naive Bayes (Elkan 1997), with a comprehensive experimental evaluation of BNB.
- Boosted Augmented Naive Bayes (BAN): an efficient training algorithm whose classification accuracy is competitive with Naive Bayes, TAN, BNC (Grossman and Domingos 2004), and ELR (Greiner and Zhou 2002).

Bayesian network classifiers
- Modular and intuitive graphical representation with an explicit probabilistic semantics.
- A Bayesian network classifier encodes the joint distribution over the features and the class label; classification uses the conditional distribution of the class label given the features.
- Question: how can a Bayesian network be trained discriminatively, and efficiently, to improve its classification accuracy?
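For concreteness, the relationship between the joint and conditional distributions, written for the simplest case (Naive Bayes, where each feature's only parent is the class). This is a standard illustration, not taken from the slides:

```latex
% Joint distribution encoded by the network (Naive Bayes factorization):
P(c, x_1, \dots, x_n) = P(c) \prod_{i=1}^{n} P(x_i \mid c)

% Conditional distribution used for classification:
P(c \mid x_1, \dots, x_n) =
  \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}
       {\sum_{c'} P(c') \prod_{i=1}^{n} P(x_i \mid c')}
```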

Parameter Learning
- Maximum-likelihood (ML) parameter learning is efficient: it maximizes the log-likelihood score LL_G of a fixed structure G.
- There is no analytic solution for the parameters that maximize the conditional log-likelihood CLL_G.
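For reference, with M training pairs (x^(m), c^(m)) and a fixed structure G, the two scores contrasted above are (standard definitions):

```latex
LL_G(\theta)  = \sum_{m=1}^{M} \log P_\theta\big(c^{(m)}, \mathbf{x}^{(m)}\big)

CLL_G(\theta) = \sum_{m=1}^{M} \log P_\theta\big(c^{(m)} \mid \mathbf{x}^{(m)}\big)
              = \sum_{m=1}^{M} \log
                \frac{P_\theta\big(c^{(m)}, \mathbf{x}^{(m)}\big)}
                     {\sum_{c'} P_\theta\big(c', \mathbf{x}^{(m)}\big)}
```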

Model selection
- Structure A: ML parameter learning does not optimize CLL_A. ELR optimizes CLL_A directly (Greiner and Zhou, 2002); it gives excellent classification accuracy but is computationally expensive to train.
- Structure B: ML parameter learning optimizes CLL_B when B is the optimal structure; the BNC algorithm searches for such a structure (Grossman and Domingos, 2004).
- Structure C (this work): an ensemble of sparse models as an alternative to B, using ML to train each sparse model.

Our Goal and Talk Outline
Goal: combine parameter and structure optimization, avoid over-fitting, and retain training efficiency.
Outline:
- Minimization function for the Boosted Bayesian network classifier
- Empirical evaluation of Boosted Naive Bayes (BNB)
- Boosted Augmented Naive Bayes (BAN)
- Empirical evaluation of BAN

Exponential Loss Function (ELF)
- The boosted Bayesian network classifier minimizes the ELF.
- ELF_F is an upper bound on -CLL_F, so the ELF serves as a tractable surrogate for maximizing the conditional log-likelihood.
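A sketch of the bound for a binary label y in {-1, +1}, writing the ensemble's conditional model in the usual logistic form P(y | x) = 1 / (1 + e^{-2 y F(x)}). The paper's exact scaling of F may differ; the inequality itself is just log(1 + z) <= z:

```latex
-CLL(F) = \sum_{m} \log\!\left(1 + e^{-2\, y^{(m)} F(\mathbf{x}^{(m)})}\right)
        \;\le\; \sum_{m} e^{-2\, y^{(m)} F(\mathbf{x}^{(m)})} = ELF(F)
```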

Minimizing ELF via an ensemble method
- AdaBoost (population version) constructs F(x) additively to approximately minimize ELF_F.
- The data weights are updated discriminatively after each round.
- Each weak hypothesis is trained with tractable (weighted) ML parameter learning.
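A minimal Python sketch of this loop (discrete AdaBoost with a Naive Bayes base model). scikit-learn's GaussianNB is used here only as a convenient weak learner that accepts sample weights; the paper's BNB uses a discrete Naive Bayes, so treat this as an illustration rather than the authors' implementation:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # stand-in weak learner (assumption)

def boosted_naive_bayes(X, y, T=10):
    """Discrete AdaBoost with a Naive Bayes base model (a sketch of BNB).
    y must be in {-1, +1}. Returns a list of (alpha_t, model_t) pairs."""
    M = len(y)
    w = np.full(M, 1.0 / M)                    # uniform initial data weights
    ensemble = []
    for _ in range(T):
        nb = GaussianNB()
        nb.fit(X, y, sample_weight=w)          # weighted ML training (tractable)
        pred = nb.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err <= 0 or err >= 0.5:             # stop if the weak learner is perfect or useless
            if err <= 0:
                ensemble.append((1.0, nb))
            break
        alpha = 0.5 * np.log((1 - err) / err)  # weak-hypothesis weight
        ensemble.append((alpha, nb))
        w *= np.exp(-alpha * y * pred)         # discriminative reweighting of the data
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    F = sum(alpha * model.predict(X) for alpha, model in ensemble)
    return np.sign(F)                          # sign of the additive score F(x)
```

The two points mirrored from the slide are the weighted ML fit of each weak model and the discriminative reweighting of the training data between rounds.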

Results: 25 UCI datasets (BNB)
[Scatter plots comparing per-dataset test error; per-plot win counts appear in the original figures.]
Average test error (BNB vs. competitor):
- BNB vs. NB: 0.151 vs. 0.173
- BNB vs. TAN: 0.151 vs. 0.184
- BNB vs. ELR-NB: 0.151 vs. 0.161
- BNB vs. BNC-2P: 0.151 vs. 0.164

Evaluation of BNB
- Computationally efficient: training cost is O(MNT) with T = 5-20 boosting rounds, versus O(MN) for Naive Bayes.
- Good classification accuracy: outperforms NB and TAN; competitive with ELR and BNC.
- Sparse structure + boosting yields competitive accuracy.
- Potential drawback: strongly correlated features (e.g. the Corral dataset).

Structure Learning
Challenges:
- Efficiency: exact structure learning is NP-hard, and even heuristic searches such as K2 or hill climbing examine a polynomial number of candidate structures.
- Resisting overfitting: the structure controls the classifier's capacity.
Our proposed solution:
- Combine sparse models to form an ensemble.
- Constrain the edge selection.

Creating G_tree, Step 1 (Friedman et al. 1999)
- Build the table of pairwise conditional mutual information between features, conditioned on the class.
- Create a maximum spanning tree using conditional mutual information as the edge weight.
- Convert the undirected tree into a directed graph.
[Figure: example graph over features 1-4.]
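A hedged Python sketch of this step, assuming discrete features and using networkx for the spanning tree; the library choice and function names are illustrative, not from the paper:

```python
import numpy as np
import networkx as nx  # assumed available for the spanning-tree step

def conditional_mutual_information(xi, xj, c):
    """Plug-in estimate of I(X_i; X_j | C) from discrete data."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = (c == cv)
        p_c = mask.mean()
        xi_c, xj_c = xi[mask], xj[mask]
        for a in np.unique(xi_c):
            for b in np.unique(xj_c):
                p_ab = np.mean((xi_c == a) & (xj_c == b))
                p_a = np.mean(xi_c == a)
                p_b = np.mean(xj_c == b)
                if p_ab > 0:
                    cmi += p_c * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def build_tan_tree(X, y):
    """Maximum spanning tree over features, weighted by I(X_i; X_j | C),
    then directed outward from an arbitrary root (feature 0)."""
    n_features = X.shape[1]
    g = nx.Graph()
    for i in range(n_features):
        for j in range(i + 1, n_features):
            w = conditional_mutual_information(X[:, i], X[:, j], y)
            g.add_edge(i, j, weight=w)
    mst = nx.maximum_spanning_tree(g)
    return list(nx.bfs_tree(mst, source=0).edges())  # directed (parent, child) pairs
```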

Initial structure
- Select Naive Bayes as the initial structure.
- Create BNB via AdaBoost.
- Evaluate BNB.
[Figure: Naive Bayes structure over features 1-4.]

Iteratively adding edges
Candidate edges are added one at a time; after each addition the boosted ensemble is rebuilt and scored by its ensemble CLL, and an edge is kept only if the ensemble CLL improves.
[Figure: successive structures over features 1-4 with ensemble CLL = -0.75, -0.65, -0.50, and a questionable candidate at -0.55.]
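A sketch of the greedy search described above. `train_boosted_ensemble(X, y, edges, T)` and `ensemble_cll(ensemble, X, y)` are assumed helper functions (building on the earlier BNB sketch), not the authors' code:

```python
def learn_ban_structure(X, y, tree_edges, T=10):
    """Greedy BAN structure search (a sketch): start from Naive Bayes, then try
    adding candidate tree edges one at a time, keeping an edge only when the
    boosted ensemble's conditional log-likelihood (CLL) improves."""
    edges = []                                   # start from Naive Bayes (no augmenting edges)
    ensemble = train_boosted_ensemble(X, y, edges, T)
    best_cll = ensemble_cll(ensemble, X, y)
    for edge in tree_edges:                      # candidates constrained to the CMI spanning tree
        candidate = edges + [edge]
        cand_ensemble = train_boosted_ensemble(X, y, candidate, T)
        cand_cll = ensemble_cll(cand_ensemble, X, y)
        if cand_cll > best_cll:                  # keep the edge only if ensemble CLL improves
            edges, ensemble, best_cll = candidate, cand_ensemble, cand_cll
    return edges, ensemble
```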

Final BAN structure
The final BAN classifier is the boosted ensemble built on the final structure produced by the edge-selection procedure above.

Analysis of BAN
- The BAN base structure is sparser than the BNC model.
- BAN uses an ensemble of sparser models to approximate a densely connected structure.
[Figures: example BAN model and example BNC-2P model.]

Computational complexity of BAN
Training complexity: O(MN^2 + MNTS)
- O(MN^2): building G_tree.
- O(MNTS): structure search, where T is the number of boosting iterations per structure and S is the number of structures examined (S < N).
Empirical training time: with T = 5-25 and S = 0-5, training takes approximately 25-100 times as long as Naive Bayes.

Results (simulated datasets)
- 25 different distributions, varying the CPT tables and the number of features.
- 4000 samples each, 5-fold cross validation.
[Figures: the true structure and the Naive Bayes structure.]

Results (simulated datasets): BAN vs. NB
[Scatter plot of per-dataset error: BAN wins on 19 datasets, NB on 0, with the remaining 6 tied.]

Results (simulated datasets): BAN vs. BNB
- BNB achieved the optimal error on 22 of the 25 datasets.
- BAN outperforms BNB on the remaining 3 (BAN 3 wins, BNB 0).
- The edges added by BAN are correct edges from the true structure.
[Figure: the true structure and the edges recovered by BAN.]

Results: 25 UCI datasets (BAN)
- Standard datasets for Bayesian network classifiers (Friedman et al. 1999; Greiner and Zhou 2002; Grossman and Domingos 2004).
- 5-fold cross validation.
- Implemented NB, TAN, BAN, BNB, and BNC-2P; obtained published results for ELR-NB and ELR-TAN.

Results: BAN vs. standard methods
[Scatter plots of per-dataset test error; per-plot win counts appear in the original figures.]
Average test error:
- BAN vs. NB: 0.141 vs. 0.173
- BAN vs. TAN: 0.141 vs. 0.184

Results: BAN vs. structure learning
- BAN vs. BNC-2P average test error: 0.141 vs. 0.164 (BNC-2P wins on only 1 dataset in the scatter plot).
- BAN contains 0-5 augmented edges; BNC-2P contains 4-16 augmented edges.

Results: BAN vs. ELR
- ELR error statistics are taken directly from the published results.
- Average test error: BAN vs. ELR-NB: 0.141 vs. 0.161; BAN vs. ELR-TAN: 0.141 vs. 0.155.
- BAN is more efficient to train.
[Scatter plots of per-dataset error; per-plot win counts appear in the original figures.]

Evaluation of BAN vs. BNB
Under a significance test:
- BAN outperforms BNB on 7 datasets (including Corral), by 2%-5%.
- BNB outperforms BAN on 2 datasets, by 0.5%-2%.
- The difference is not significant on 13 datasets; on datasets such as IRIS and MOFN, BAN chooses BNB as the base structure.
Without the significance test, BAN outperforms BNB on 16 datasets and BNB outperforms BAN on 6.
Average test error: 0.141 vs. 0.151.

Conclusion
- An ensemble of sparse models as an alternative to joint structure and parameter optimization.
- Simple to implement and very efficient to train.
- Classification accuracy competitive with NB, TAN, HGC, BNC, and ELR.

Future Work
- Extend BAN to handle sequential data.
- Analyze the class of Bayesian network classifiers that can be approximated with an ensemble of sparse structures.
- Can the BAN model parameters be obtained through parameter learning given the final model structure?
- Can the BAN approach be used to learn generative models?