Bayesian network classification using spline-approximated KDE
Y. Gurwicz, B. Lerner
Journal of Pattern Recognition

Outline
•Introduction
•Background on the Naïve Bayesian Network
•The Computational Issue with KDE
•Proposed Solution: Spline-Approximated KDE
•Experiments
•Conclusion

Introduction
•Bayesian network (BN) classifiers have been successfully applied to a variety of domains
•A BN classifier attains the asymptotically optimal classification error (i.e., the Bayes risk), provided the conditional and prior density estimates are asymptotically consistent (as KDE is)
•A particular form of the BN is the Naïve BN (NBN), which has been shown to perform well in practice and can help alleviate the curse of dimensionality [Zhang 2004]
•Hence the NBN is the focus of this work

Naïve Bayesian Network (NBN)
•A BN expresses a joint probability distribution (nodes = RVs, edges = dependencies)
•Because estimating node densities is difficult in high dimensions (samples become sparse), the BN can be constrained so that the attributes (RVs) are conditionally independent given the class (which increases the effective sample density for each estimate)
•This constrained BN is called the Naïve BN
•The following introductory slides are adapted from A. Moore's tutorial
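For reference, the factorization that this class-conditional independence yields is the standard naïve Bayes rule (textbook form, not quoted from the paper):

```latex
P(C \mid x_1, \ldots, x_n) \;\propto\; P(C) \prod_{j=1}^{n} P(x_j \mid C)
```

Each factor P(x_j | C) is a univariate density, and these univariate class-conditionals are exactly what the KDE (and later the spline approximation) must estimate.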

Estimating prior and conditional probabilities
•Methods for estimating the prior P(C) and conditional P(e|C) probabilities
–Parametric
•A Gaussian form is most commonly used for continuous RVs
•Fast to compute
•May not accurately reflect the true distribution
–Non-parametric
•KDE
•Slow
•Can accurately model the true distribution
•Can we come up with a fast non-parametric method? (see the sketch below)
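A minimal sketch of the trade-off this slide describes, using SciPy stand-ins (gaussian_kde and norm are our illustrative choices; the data are synthetic, not from the paper):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
x_train = rng.exponential(scale=2.0, size=500)  # skewed data: a poor match for a Gaussian

# Parametric: two numbers summarize the class-conditional, so a query is O(1).
mu, sigma = norm.fit(x_train)
p_gauss = norm.pdf(1.0, mu, sigma)

# Non-parametric: every query touches all N_tr training points.
kde = gaussian_kde(x_train)
p_kde = kde(np.array([1.0]))[0]

print(f"Gaussian: {p_gauss:.4f}  KDE: {p_kde:.4f}")  # the KDE tracks the skew; the Gaussian does not
```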

Cost of calculating conditionals
•Notation: N_ts = # of test patterns; N_tr = # of training patterns; N_f = # of features (dimensions); N_c = # of classes
•Parametric approach: O(N_ts * N_c * N_f)
•Non-parametric (KDE) approach: O(N_ts * N_tr * N_f)
•Since N_c << N_tr, direct KDE is far more expensive (worked numbers below)
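A back-of-the-envelope illustration of the gap, with hypothetical sizes chosen only for the arithmetic:

```python
# Hypothetical sizes; only the ratio matters.
N_ts, N_tr, N_f, N_c = 1_000, 10_000, 20, 10
parametric    = N_ts * N_c * N_f   # 200,000 density evaluations
nonparametric = N_ts * N_tr * N_f  # 200,000,000 kernel evaluations
print(nonparametric // parametric) # 1000: direct KDE does ~1000x the work here
```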

Reducing the dependence on N_tr: spline approximation
•Estimate the KDE using splines
•A spline is a piecewise polynomial of order P, interpolated over K intervals of the domain and constrained to satisfy smoothness conditions where pieces meet (e.g., s_1'' = s_2'' at the shared knot)
•Evaluating a spline requires only O(P + log K) time, or O(P) if a hash function maps a query directly to its interval
•Usually P = 4 (cubic splines)
•Hence significant computational savings can be attained over direct KDE
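A sketch of why a single spline query is cheap: locate the interval by binary search over the knots (O(log K)), then evaluate one order-P polynomial by Horner's rule (O(P)). The helper and its coefficient layout are hypothetical, not the paper's code:

```python
import numpy as np

def eval_piecewise_poly(knots, coeffs, x):
    """knots: sorted array of K+1 interval endpoints; coeffs: (K, P) array of
    per-interval polynomial coefficients, highest degree first; scalar query x."""
    i = np.searchsorted(knots, x, side="right") - 1  # O(log K) interval lookup
    i = int(np.clip(i, 0, len(coeffs) - 1))
    t = x - knots[i]                                 # local coordinate in interval i
    y = 0.0
    for c in coeffs[i]:                              # Horner's rule: O(P)
        y = y * t + c
    return y

knots = np.array([0.0, 1.0, 2.0])
coeffs = np.array([[1.0, 0.0],   # piece on [0, 1): t
                   [1.0, 1.0]])  # piece on [1, 2]: t + 1
print(eval_piecewise_poly(knots, coeffs, 0.5))  # 0.5
print(eval_piecewise_poly(knots, coeffs, 1.5))  # 1.5
```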

Constructing the splines
•Compute the knot values for the K intervals to interpolate
–K+1 density estimates from the KDE
–O(K * N_tr)
•Compute the P coefficients for each of the K interval polynomials
–O(K * P)
•Once the splines have been constructed, each density query costs only O(P) (see the sketch below)
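A compact sketch of this construction using SciPy stand-ins (gaussian_kde and CubicSpline are our substitutes for the paper's KDE and order-P = 4 spline fit; the data are synthetic):

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)
x_train = rng.normal(size=2_000)

kde = gaussian_kde(x_train)
K = 64
knots = np.linspace(x_train.min(), x_train.max(), K + 1)
spline = CubicSpline(knots, kde(knots))  # one-time cost: O(K * N_tr) + O(K * P)

# After construction, queries no longer touch the training set.
queries = rng.normal(size=5)
print(np.abs(spline(queries) - kde(queries)).max())  # small approximation error
```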

Experiments
•Measurements
–Approximation accuracy
–Classification accuracy
–Classification speed
•Classifiers
–BN-KDE
–BN-Spline
–BN-Gauss
•Synthetic and real-world datasets

Approximation Accuracy

Classification Accuracy

Classification Speed

Conclusion
•The spline-based method approximates the univariate standard KDE well
•Speed gains are realized over direct KDE
•Comments
–How should the number of spline intervals be determined? This is analogous to bandwidth selection in KDE
–The method assigns static intervals, which suffers the same drawback as a global bandwidth
–The method approximates the global-bandwidth KDE; how well do the splines approximate the adaptive KDE (AKDE)?
–The proposed method works for a static dataset; if the data distribution changes, the splines must be reconstructed
•So it may not be directly applicable to data streams
–Implications for LR-KDE
•Develop multi-query algorithms (e.g., for deriving the K+1 endpoints/knots)
•Assign dynamic spline intervals based on regularized LR, since each LR models a simple density

Reference
•H. Zhang, "The Optimality of Naïve Bayes," AAAI, 2004