
1 Learning Maximum Likelihood Bounded Semi-Naïve Bayesian Network Classifier
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong, Shatin, NT, Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk
SMC 2002, October 8, 2002, Hammamet, Tunisia

2 Outline
- Abstract
- Background
  - Classifiers
  - Naïve Bayesian Classifiers
  - Semi-Naïve Bayesian Classifiers
  - Chow-Liu Tree
- Bounded Semi-Naïve Bayesian Classifiers
- Experimental Results
- Discussion
- Conclusion

3 Abstract
- We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into a node is bounded.
- It has a lower computational cost than traditional semi-naïve Bayesian networks.
- Experiments show that the proposed technique is also more accurate.

4 A Typical Classification Problem
- Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

5 Background
- Classifiers
  - Given a pre-classified dataset D = {(x1, c1), ..., (xN, cN)}, where each xi is a training sample in m-dimensional real space and ci is its class label.
  - A classifier is defined as a mapping function f from the sample space to the set of class labels, chosen to satisfy f(xi) = ci on the training data.

6 Background
- Probabilistic Classifiers
  - The classification mapping function is defined as c(x) = argmax_C P(C | x) = argmax_C P(x, C) / P(x), where P(x) is a constant for a given x.
  - The joint probability P(x, C) is not easily estimated from the dataset; an assumption about the distribution has to be made, e.g., are the attributes dependent or independent?

7 Related Work
- Naïve Bayesian Classifiers (NB)
  - Assumption: given the class label C, the attributes are independent: P(x1, ..., xm | C) = prod_i P(xi | C).
  - Classification mapping function: c(x) = argmax_C P(C) prod_i P(xi | C).
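For illustration, a minimal count-based sketch of this NB mapping function in Python; the toy dataset, attribute values, and class labels below are hypothetical and not from the paper.

```python
from collections import Counter, defaultdict

# Toy pre-classified dataset: each row is (attribute tuple, class label).
# Attribute values and labels are made up purely for illustration.
data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
        (("rain", "mild"), "yes"), (("rain", "hot"), "yes")]

class_counts = Counter(c for _, c in data)
# cond_counts[(attribute index, value, class)] -> count
cond_counts = defaultdict(int)
for x, c in data:
    for i, v in enumerate(x):
        cond_counts[(i, v, c)] += 1

def classify(x):
    """Return argmax_C P(C) * prod_i P(x_i | C), the NB mapping function."""
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / len(data)                      # P(C)
        for i, v in enumerate(x):
            score *= cond_counts[(i, v, c)] / n_c    # P(x_i | C)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(classify(("rain", "hot")))   # expected: "yes"
```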

8 Related Work
- Naïve Bayesian Classifiers
  - NB's performance is comparable with some state-of-the-art classifiers even though its independence assumption usually does not hold in practice.
- Question: can the performance be improved when the conditional independence assumption of NB is relaxed?

9 Related Work
- Semi-Naïve Bayesian Classifiers (SNB)
  - A looser assumption than NB: independence holds among the joined variables, given the class label C.

10 Related Work
- Chow-Liu Tree (CLT)
  - Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C.
  - [Figure: a tree dependence structure]
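A brief sketch of the standard Chow-Liu construction (a maximum-weight spanning tree over pairwise mutual informations). This is a generic, unconditional version kept short for illustration; for the classifier the tree is built conditioned on the class variable C, and the binary data below is synthetic.

```python
import numpy as np
from itertools import combinations

def mutual_information(xi, xj):
    """Empirical mutual information between two discrete data columns."""
    mi = 0.0
    for a in set(xi):
        for b in set(xj):
            p_ab = np.mean((xi == a) & (xj == b))
            p_a, p_b = np.mean(xi == a), np.mean(xj == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(X):
    """Edges of a maximum-weight spanning tree whose edge weights are the
    pairwise mutual informations (Chow & Liu, 1968), built with Prim's rule."""
    n_vars = X.shape[1]
    weight = {(i, j): mutual_information(X[:, i], X[:, j])
              for i, j in combinations(range(n_vars), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        best = max((e for e in weight if (e[0] in in_tree) ^ (e[1] in in_tree)),
                   key=lambda e: weight[e])
        edges.append(best)
        in_tree.update(best)
    return edges

# Synthetic binary data: 200 samples, 4 attributes; attribute 1 copies
# attribute 0, so the edge (0, 1) should appear in the learned tree.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4))
X[:, 1] = X[:, 0]
print(chow_liu_tree(X))
```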

11 Summary of Related Work
- CLT: a conditional tree dependency assumption among variables. Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.
- SNB: a conditional independence assumption among joined variables. Traditional SNBs are not as well developed as CLT.

12 Problems of Traditional SNBs (Kononenko 91, Pazzani 96)
- Accurate? No: both rely on local heuristics.
- Efficient? No: inefficient even when joining 3 variables; exponential time cost.

13 Our Novel Bounded Semi-Naïve Bayesian Network
- Accurate? We use a global combinatorial optimization method.
- Efficient? We find the network based on Linear Programming, which can be solved in polynomial time.

14 Bounded Semi-Naïve Bayesian Network Model Definition
- Joined variables
- Completely covering the variable set without overlapping
- Conditional independence
- Bounded

15 Constraining the Search Space
- The search space is large; it is reduced by adding the following constraint: the cardinality of each joined variable is exactly equal to K.
- Hidden principle: when K is small, a single K-cardinality joined variable will be more accurate than separating its variables into several smaller joined variables.
  - Example: P(a,b) P(c,d) is closer to P(a,b,c,d) than P(a,b) P(c) P(d).
- Search space after reduction
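A small numeric check of this principle (not from the slides), using the identity KL(P || prod_i P_Si) = sum_i H(P_Si) - H(P) for a product of marginals over a partition of the variables; the random joint distribution below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
# A random joint distribution P(a, b, c, d) over four binary variables.
P = rng.random((2, 2, 2, 2))
P /= P.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

H_abcd = entropy(P)
H_ab = entropy(P.sum(axis=(2, 3)))     # marginal P(a, b)
H_cd = entropy(P.sum(axis=(0, 1)))     # marginal P(c, d)
H_c  = entropy(P.sum(axis=(0, 1, 3)))  # marginal P(c)
H_d  = entropy(P.sum(axis=(0, 1, 2)))  # marginal P(d)

# KL(P || Q) for a product-of-marginals Q equals the sum of the marginal
# entropies minus the joint entropy.
kl_pair  = H_ab + H_cd - H_abcd         # Q = P(a,b) P(c,d)
kl_split = H_ab + H_c + H_d - H_abcd    # Q = P(a,b) P(c) P(d)
print(kl_pair <= kl_split)              # True: the coarser split is closer
```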

16 Searching the K-Bounded-SNB Model
- How to search for the appropriate model? Find the m = [n/K] K-cardinality subsets (joined variables) of the variable (feature) set which satisfy the SNB conditions and maximize the log likelihood.
- [x] means rounding x to the nearest integer.

17 Global Optimization Procedure
- Constraints: no overlap among the joined variables; together the joined variables form the whole variable set.
- Relax the 0/1 constraints into 0 ≤ x ≤ 1: the integer programming (IP) problem is changed into a linear programming (LP) problem.
- Rounding scheme: round the LP solution back into an IP solution.
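A hedged sketch of this IP-to-LP relaxation with a simple rounding step, using scipy.optimize.linprog; the candidate subsets, the random stand-in weights, and the greedy rounding rule are illustrative assumptions, not the exact formulation or rounding scheme of the paper.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

n, K = 6, 2                      # 6 features, joined variables of size K = 2
subsets = list(combinations(range(n), K))

# Hypothetical weights: in the B-SNB setting the weight of a joined variable
# comes from its empirical entropy, so that maximizing the total weight
# maximizes the log likelihood. Random numbers stand in for those values here.
rng = np.random.default_rng(2)
weights = rng.random(len(subsets))

# Exact-cover constraints: every feature belongs to exactly one joined variable.
A_eq = np.zeros((n, len(subsets)))
for j, S in enumerate(subsets):
    for i in S:
        A_eq[i, j] = 1.0
b_eq = np.ones(n)

# LP relaxation of the IP: maximize w.x  <=>  minimize -w.x,  with 0 <= x <= 1.
res = linprog(-weights, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(subsets))

# A naive rounding scheme (illustration only): repeatedly accept the subset
# with the largest fractional value that does not overlap what is chosen.
order = np.argsort(-res.x)
chosen, covered = [], set()
for j in order:
    if not covered & set(subsets[j]):
        chosen.append(subsets[j])
        covered |= set(subsets[j])
print(chosen)   # a partition of {0,...,5} into 2-element joined variables
```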

18 Rounding Scheme

19 Experimental Setup
- Datasets
  - 6 benchmark datasets from the UCI machine learning repository
  - 1 synthetically generated dataset named "XOR"
- Experimental environment
  - Platform: Windows 2000
  - Developing tool: Matlab 6.1

20 Experimental Results
- Overall prediction rate (%)
- We set the bound parameter K to 2 and 3; 2-BSNB denotes the B-SNB model with the bound parameter set to 2.

21 Experimental Results
- [Figure: average error rate chart]

22 Results on the Tic-Tac-Toe Dataset
- The 9 attributes of the Tic-Tac-Toe dataset correspond to the 3x3 board positions:
  1 2 3
  4 5 6
  7 8 9

23 Observations
- B-SNBs with a large K are not good for sparse datasets. Post dataset: 90 samples; with K = 3 the accuracy decreases.
- Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; with K = 3 the accuracy increases.

24 Discussion
- When n cannot be divided by K exactly, i.e., (n mod K) = l with l ≠ 0, the assumption that all the joined variables have the same cardinality K is violated. Solution (see the sketch below):
  - Find an l-cardinality joined variable with the minimum entropy.
  - Do the optimization on the other n - l variables, since ((n - l) mod K) is then 0.
- How to choose K? When the number of samples in the dataset is small, a large K may not give good performance. A good K should be related to the nature of the dataset.
- How to relax SNB further? SNB is still strongly constrained; one direction is upgrading it into a mixture of SNBs.
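A small sketch of the remainder fix described above: when n mod K = l ≠ 0, split off the l-cardinality joined variable with the minimum empirical entropy and optimize over the remaining n - l features. The function names and synthetic data are assumptions for illustration, not the authors' code.

```python
import numpy as np
from itertools import combinations
from collections import Counter

def empirical_entropy(X, cols):
    """Empirical joint entropy of the selected columns of a data matrix."""
    rows = [tuple(r) for r in X[:, list(cols)]]
    counts = np.array(list(Counter(rows).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def split_off_remainder(X, K):
    """If n mod K = l != 0, return the minimum-entropy l-subset and the
    remaining feature indices (whose count is then divisible by K)."""
    n = X.shape[1]
    l = n % K
    if l == 0:
        return None, list(range(n))
    best = min(combinations(range(n), l),
               key=lambda S: empirical_entropy(X, S))
    rest = [i for i in range(n) if i not in best]
    return best, rest

# Synthetic data: 100 samples, 7 binary features, K = 3, so l = 1.
rng = np.random.default_rng(3)
X = rng.integers(0, 2, size=(100, 7))
remainder, rest = split_off_remainder(X, 3)
print(remainder, rest)   # one feature split off; the other 6 go to the LP
```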

25 Conclusion
- A novel Bounded Semi-Naïve Bayesian classifier is proposed.
- A direct combinatorial optimization method enables B-SNB to achieve global optimization.
- The transformation from an IP into an LP problem reduces the computational complexity to polynomial.
- It outperforms NB and CLT in our experiments.

26 Main References
- Chow, C. K. and Liu, C. N. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462-467, 1968.
- I. Kononenko. Semi-naive Bayesian classifier. In Proceedings of the Sixth European Working Session on Learning, pages 206-219. Springer-Verlag, 1991.
- M. J. Pazzani. Searching for dependencies in Bayesian classifiers. In D. Fisher and H.-J. Lenz, editors, Learning from Data: Artificial Intelligence and Statistics V, pages 239-248. New York, NY: Springer-Verlag, 1996.
- Nathan Srebro. Maximum likelihood bounded tree-width Markov networks. Master's thesis, MIT, 2001.
- Patrick M. Murphy. UCI repository of machine learning databases. ftp.ics.uci.edu: pub/machine-learning-databases. http://www.ics.uci.edu/~mlearn/MLRepository.html

27 Q&A
- Thanks!

