Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers

Presentation transcript:

Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers
Yushi Jing, GVU, College of Computing, Georgia Institute of Technology
Vladimir Pavlović, Department of Computer Science, Rutgers University
James M. Rehg

Contribution
- A boosting approach to Bayesian network classification: an additive combination of simple models (e.g. Naive Bayes), each trained by weighted maximum-likelihood learning.
- Generalizes Boosted Naive Bayes (Elkan 1997), with a comprehensive experimental evaluation of BNB.
- Boosted Augmented Naive Bayes (BAN): an efficient training algorithm whose classification accuracy is competitive with Naive Bayes, TAN, BNC (Grossman and Domingos 2004), and ELR (Greiner and Zhou 2002).

Bayesian network classifiers
- Modular and intuitive graphical representation with an explicit probabilistic semantics.
- A Bayesian network classifier encodes the joint distribution over the features and the class label; classification uses the conditional distribution of the class label given the features.
- Question: how can a Bayesian network be trained discriminatively, and efficiently, to improve its classification accuracy?
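For concreteness, the relationship between the joint and conditional distributions, written for the simplest case (Naive Bayes, where each feature's only parent is the class). This is a standard illustration, not taken from the slides:

```latex
% Joint distribution encoded by the network (Naive Bayes factorization):
P(c, x_1, \dots, x_n) = P(c) \prod_{i=1}^{n} P(x_i \mid c)

% Conditional distribution used for classification:
P(c \mid x_1, \dots, x_n) =
  \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}
       {\sum_{c'} P(c') \prod_{i=1}^{n} P(x_i \mid c')}
```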

Parameter Learning
- Maximum-likelihood (ML) parameter learning is efficient: it maximizes the log-likelihood score LL_G of a fixed structure G.
- There is no analytic solution for the parameters that maximize the conditional log-likelihood CLL_G.
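For reference, with M training pairs (x^(m), c^(m)) and a fixed structure G, the two scores contrasted above are (standard definitions):

```latex
LL_G(\theta)  = \sum_{m=1}^{M} \log P_\theta\big(c^{(m)}, \mathbf{x}^{(m)}\big)

CLL_G(\theta) = \sum_{m=1}^{M} \log P_\theta\big(c^{(m)} \mid \mathbf{x}^{(m)}\big)
              = \sum_{m=1}^{M} \log
                \frac{P_\theta\big(c^{(m)}, \mathbf{x}^{(m)}\big)}
                     {\sum_{c'} P_\theta\big(c', \mathbf{x}^{(m)}\big)}
```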

Model selection
- Structure A: ML parameter learning does not optimize CLL_A. ELR optimizes CLL_A directly (Greiner and Zhou, 2002); it gives excellent classification accuracy but is computationally expensive to train.
- Structure B: ML parameter learning optimizes CLL_B when B is the optimal structure; the BNC algorithm searches for such a structure (Grossman and Domingos, 2004).
- Structure C (this work): an ensemble of sparse models as an alternative to B, using ML to train each sparse model.

Our Goal and Talk Outline
Goal: combine parameter and structure optimization, avoid over-fitting, and retain training efficiency.
Outline:
- Minimization function for the Boosted Bayesian network classifier
- Empirical evaluation of Boosted Naive Bayes (BNB)
- Boosted Augmented Naive Bayes (BAN)
- Empirical evaluation of BAN

Exponential Loss Function (ELF)
- The boosted Bayesian network classifier minimizes the ELF.
- ELF_F is an upper bound on -CLL_F, so the ELF serves as a tractable surrogate for maximizing the conditional log-likelihood.
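A sketch of the bound for a binary label y in {-1, +1}, writing the ensemble's conditional model in the usual logistic form P(y | x) = 1 / (1 + e^{-2 y F(x)}). The paper's exact scaling of F may differ; the inequality itself is just log(1 + z) <= z:

```latex
-CLL(F) = \sum_{m} \log\!\left(1 + e^{-2\, y^{(m)} F(\mathbf{x}^{(m)})}\right)
        \;\le\; \sum_{m} e^{-2\, y^{(m)} F(\mathbf{x}^{(m)})} = ELF(F)
```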

Minimizing ELF via an ensemble method
- AdaBoost (population version) constructs F(x) additively to approximately minimize ELF_F.
- The data weights are updated discriminatively after each round.
- Each weak hypothesis is trained with tractable (weighted) ML parameter learning.
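A minimal Python sketch of this loop (discrete AdaBoost with a Naive Bayes base model). scikit-learn's GaussianNB is used here only as a convenient weak learner that accepts sample weights; the paper's BNB uses a discrete Naive Bayes, so treat this as an illustration rather than the authors' implementation:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # stand-in weak learner (assumption)

def boosted_naive_bayes(X, y, T=10):
    """Discrete AdaBoost with a Naive Bayes base model (a sketch of BNB).
    y must be in {-1, +1}. Returns a list of (alpha_t, model_t) pairs."""
    M = len(y)
    w = np.full(M, 1.0 / M)                    # uniform initial data weights
    ensemble = []
    for _ in range(T):
        nb = GaussianNB()
        nb.fit(X, y, sample_weight=w)          # weighted ML training (tractable)
        pred = nb.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err <= 0 or err >= 0.5:             # stop if the weak learner is perfect or useless
            if err <= 0:
                ensemble.append((1.0, nb))
            break
        alpha = 0.5 * np.log((1 - err) / err)  # weak-hypothesis weight
        ensemble.append((alpha, nb))
        w *= np.exp(-alpha * y * pred)         # discriminative reweighting of the data
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    F = sum(alpha * model.predict(X) for alpha, model in ensemble)
    return np.sign(F)                          # sign of the additive score F(x)
```

The two points mirrored from the slide are the weighted ML fit of each weak model and the discriminative reweighting of the training data between rounds.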

Results: 25 UCI datasets (BNB)
[Scatter plots comparing per-dataset test error; per-plot win counts appear in the original figures.]
Average test error (BNB vs. competitor):
- BNB vs. NB: 0.151 vs. 0.173
- BNB vs. TAN: 0.151 vs. 0.184
- BNB vs. ELR-NB: 0.151 vs. 0.161
- BNB vs. BNC-2P: 0.151 vs. 0.164

Evaluation of BNB
- Computationally efficient: training cost is O(MNT) with T = 5-20 boosting rounds, versus O(MN) for Naive Bayes.
- Good classification accuracy: outperforms NB and TAN; competitive with ELR and BNC.
- Sparse structure + boosting yields competitive accuracy.
- Potential drawback: strongly correlated features (e.g. the Corral dataset).

Structure Learning
Challenges:
- Efficiency: exact structure learning is NP-hard, and even heuristic searches such as K2 or hill climbing examine a polynomial number of candidate structures.
- Resisting overfitting: the structure controls the classifier's capacity.
Our proposed solution:
- Combine sparse models to form an ensemble.
- Constrain the edge selection.

Creating G_tree, Step 1 (Friedman et al. 1999)
- Build the table of pairwise conditional mutual information between features, conditioned on the class.
- Create a maximum spanning tree using conditional mutual information as the edge weight.
- Convert the undirected tree into a directed graph.
[Figure: example graph over features 1-4.]
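A hedged Python sketch of this step, assuming discrete features and using networkx for the spanning tree; the library choice and function names are illustrative, not from the paper:

```python
import numpy as np
import networkx as nx  # assumed available for the spanning-tree step

def conditional_mutual_information(xi, xj, c):
    """Plug-in estimate of I(X_i; X_j | C) from discrete data."""
    cmi = 0.0
    for cv in np.unique(c):
        mask = (c == cv)
        p_c = mask.mean()
        xi_c, xj_c = xi[mask], xj[mask]
        for a in np.unique(xi_c):
            for b in np.unique(xj_c):
                p_ab = np.mean((xi_c == a) & (xj_c == b))
                p_a = np.mean(xi_c == a)
                p_b = np.mean(xj_c == b)
                if p_ab > 0:
                    cmi += p_c * p_ab * np.log(p_ab / (p_a * p_b))
    return cmi

def build_tan_tree(X, y):
    """Maximum spanning tree over features, weighted by I(X_i; X_j | C),
    then directed outward from an arbitrary root (feature 0)."""
    n_features = X.shape[1]
    g = nx.Graph()
    for i in range(n_features):
        for j in range(i + 1, n_features):
            w = conditional_mutual_information(X[:, i], X[:, j], y)
            g.add_edge(i, j, weight=w)
    mst = nx.maximum_spanning_tree(g)
    return list(nx.bfs_tree(mst, source=0).edges())  # directed (parent, child) pairs
```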

Initial structure
- Select Naive Bayes as the initial structure.
- Create BNB via AdaBoost.
- Evaluate BNB.
[Figure: Naive Bayes structure over features 1-4.]

Iteratively adding edges
Candidate edges are added one at a time; after each addition the boosted ensemble is rebuilt and scored by its ensemble CLL, and an edge is kept only if the ensemble CLL improves.
[Figure: successive structures over features 1-4 with ensemble CLL = -0.75, -0.65, -0.50, and a questionable candidate at -0.55.]
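A sketch of the greedy search described above. `train_boosted_ensemble(X, y, edges, T)` and `ensemble_cll(ensemble, X, y)` are assumed helper functions (building on the earlier BNB sketch), not the authors' code:

```python
def learn_ban_structure(X, y, tree_edges, T=10):
    """Greedy BAN structure search (a sketch): start from Naive Bayes, then try
    adding candidate tree edges one at a time, keeping an edge only when the
    boosted ensemble's conditional log-likelihood (CLL) improves."""
    edges = []                                   # start from Naive Bayes (no augmenting edges)
    ensemble = train_boosted_ensemble(X, y, edges, T)
    best_cll = ensemble_cll(ensemble, X, y)
    for edge in tree_edges:                      # candidates constrained to the CMI spanning tree
        candidate = edges + [edge]
        cand_ensemble = train_boosted_ensemble(X, y, candidate, T)
        cand_cll = ensemble_cll(cand_ensemble, X, y)
        if cand_cll > best_cll:                  # keep the edge only if ensemble CLL improves
            edges, ensemble, best_cll = candidate, cand_ensemble, cand_cll
    return edges, ensemble
```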

Final BAN structure
The final BAN classifier is the boosted ensemble built on the final structure produced by the edge-selection procedure above.

Analysis of BAN
- The BAN base structure is sparser than the BNC model.
- BAN uses an ensemble of sparser models to approximate a densely connected structure.
[Figures: example BAN model and example BNC-2P model.]

Computational complexity of BAN
Training complexity: O(MN^2 + MNTS)
- O(MN^2): building G_tree.
- O(MNTS): structure search, where T is the number of boosting iterations per structure and S is the number of structures examined (S < N).
Empirical training time: with T = 5-25 and S = 0-5, training takes approximately 25-100 times as long as Naive Bayes.

Results (simulated datasets)
- 25 different distributions, varying the CPT tables and the number of features.
- 4000 samples each, 5-fold cross validation.
[Figures: the true structure and the Naive Bayes structure.]

Results (simulated datasets): BAN vs. NB
[Scatter plot of per-dataset error: BAN wins on 19 datasets, NB on 0, with the remaining 6 tied.]

Results (simulated datasets): BAN vs. BNB
- BNB achieved the optimal error on 22 of the 25 datasets.
- BAN outperforms BNB on the remaining 3 (BAN 3 wins, BNB 0).
- The edges added by BAN are correct edges from the true structure.
[Figure: the true structure and the edges recovered by BAN.]

Results: 25 UCI datasets (BAN)
- Standard datasets for Bayesian network classifiers (Friedman et al. 1999; Greiner and Zhou 2002; Grossman and Domingos 2004).
- 5-fold cross validation.
- Implemented NB, TAN, BAN, BNB, and BNC-2P; obtained published results for ELR-NB and ELR-TAN.

Results: BAN vs. standard methods
[Scatter plots of per-dataset test error; per-plot win counts appear in the original figures.]
Average test error:
- BAN vs. NB: 0.141 vs. 0.173
- BAN vs. TAN: 0.141 vs. 0.184

Results: BAN vs. structure learning
- BAN vs. BNC-2P average test error: 0.141 vs. 0.164 (BNC-2P wins on only 1 dataset in the scatter plot).
- BAN contains 0-5 augmented edges; BNC-2P contains 4-16 augmented edges.

Results: BAN vs. ELR
- ELR error statistics are taken directly from the published results.
- Average test error: BAN vs. ELR-NB: 0.141 vs. 0.161; BAN vs. ELR-TAN: 0.141 vs. 0.155.
- BAN is more efficient to train.
[Scatter plots of per-dataset error; per-plot win counts appear in the original figures.]

Evaluation of BAN vs. BNB
Under a significance test:
- BAN outperforms BNB on 7 datasets (including Corral), by 2%-5%.
- BNB outperforms BAN on 2 datasets, by 0.5%-2%.
- The difference is not significant on 13 datasets; on datasets such as IRIS and MOFN, BAN chooses BNB as the base structure.
Without the significance test, BAN outperforms BNB on 16 datasets and BNB outperforms BAN on 6.
Average test error: 0.141 vs. 0.151.

Conclusion
- An ensemble of sparse models as an alternative to joint structure and parameter optimization.
- Simple to implement and very efficient to train.
- Classification accuracy competitive with NB, TAN, HGC, BNC, and ELR.

Future Work
- Extend BAN to handle sequential data.
- Analyze the class of Bayesian network classifiers that can be approximated with an ensemble of sparse structures.
- Can the BAN model parameters be obtained through parameter learning given the final model structure?
- Can the BAN approach be used to learn generative models?