Robust Bayesian Classifier Presented by Chandrasekhar Jakkampudi

Classification Classification consists of assigning a class label to each case in a set of unclassified cases. 1. Supervised Classification: the set of possible classes is known in advance. 2. Unsupervised Classification: the set of possible classes is not known in advance; after classification we can try to assign a name to each discovered class. Unsupervised classification is also called clustering.

Supervised Classification The input data, also called the training set, consists of multiple records, each having multiple attributes or features. Each record is tagged with a class label. The objective of classification is to analyze the input data and to develop an accurate description or model of each class using the features present in the data. This model is then used to classify test data for which the class labels are not known. [1]

Bayesian Classifier Assumptions: 1. The classes are mutually exclusive and exhaustive. 2. The attributes are independent given the class. The classifier is called "naïve" because of these assumptions. It has empirically proven to be useful, and it scales very well.
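
Concretely, the two assumptions let Bayes' theorem factorize over the attributes; this standard identity (implicit on the slide) is what makes both training and prediction tractable:

$$p(c_j \mid a_1, \ldots, a_m) = \frac{p(c_j)\,\prod_{i=1}^{m} p(a_i \mid c_j)}{\sum_h p(c_h)\,\prod_{i=1}^{m} p(a_i \mid c_h)}.$$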

Bayesian Classifier A Bayesian classifier is defined by a set $C$ of classes and a set $A$ of attributes. A generic class belonging to $C$ is denoted by $c_j$ and a generic attribute belonging to $A$ by $A_i$. Consider a database $D$ in which each case consists of a set of attribute values and the class label of the case. Training the Bayesian classifier consists of estimating the conditional probability distribution of each attribute given the class.

Bayesian Classifier Let $n(a_{ik} \mid c_j)$ be the number of cases in which $A_i$ appears with value $a_{ik}$ and the class is $c_j$. Then

$$p(a_{ik} \mid c_j) = \frac{n(a_{ik} \mid c_j)}{\sum_k n(a_{ik} \mid c_j)} \qquad\text{and}\qquad p(c_j) = \frac{n(c_j)}{n}.$$

These are only estimates based on frequency. To incorporate our prior belief about $p(a_{ik} \mid c_j)$, we add $\alpha_j$ imaginary cases with class $c_j$, of which $\alpha_{jk}$ is the number of imaginary cases in which $A_i$ appears with value $a_{ik}$ and the class is $c_j$.
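
As a minimal sketch of these frequency estimates in Python (the data layout and function names are illustrative, not from the paper):

```python
from collections import Counter

def frequency_estimates(records, attr_index):
    """Frequency estimates from the slide for one attribute A_i of a
    complete database:
      p(a_ik | c_j) = n(a_ik | c_j) / sum_k n(a_ik | c_j)
      p(c_j)        = n(c_j) / n
    `records` is a list of (attribute_values, class_label) pairs;
    this layout is an assumption for the sketch."""
    n = len(records)
    n_class = Counter(label for _, label in records)        # n(c_j)
    n_joint = Counter((attrs[attr_index], label)            # n(a_ik | c_j)
                      for attrs, label in records)
    p_class = {c: n_c / n for c, n_c in n_class.items()}
    # for complete data, sum_k n(a_ik | c_j) equals n(c_j)
    p_cond = {(a, c): n_ac / n_class[c]
              for (a, c), n_ac in n_joint.items()}
    return p_cond, p_class
```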

Bayesian Classifier Thus

$$p(a_{ik} \mid c_j) = \frac{\alpha_{jk} + n(a_{ik} \mid c_j)}{\alpha_j + n(c_j)} \qquad\text{and}\qquad p(c_j) = \frac{\alpha_j + n(c_j)}{\alpha + n},$$

where $\alpha$ is the prior global precision. Once the training (estimation of the conditional probability distribution of each attribute, given the class) is complete, we can classify new cases. To find $p(c_j \mid e_k)$ we begin by calculating

$$p(c_j \mid a_{1k}) = \frac{p(a_{1k} \mid c_j)\,p(c_j)}{\sum_h p(a_{1k} \mid c_h)\,p(c_h)}, \qquad p(c_j \mid a_{1k}, a_{2k}) = \frac{p(a_{2k} \mid c_j)\,p(c_j \mid a_{1k})}{\sum_h p(a_{2k} \mid c_h)\,p(c_h \mid a_{1k})},$$

and so on, folding in one attribute at a time.
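
A sketch of the smoothed estimate and of the sequential posterior update above; the dictionary-based lookup layout is an assumption, not the paper's:

```python
def smoothed_estimate(n_ac, n_c, alpha_jk, alpha_j):
    """Slide's smoothed estimate:
    p(a_ik | c_j) = (alpha_jk + n(a_ik|c_j)) / (alpha_j + n(c_j))."""
    return (alpha_jk + n_ac) / (alpha_j + n_c)

def classify(evidence, p_cond, p_class):
    """Sequential update from the slide: start from the prior p(c_j)
    and fold in one attribute value at a time, renormalizing at each
    step; algebraically the same as p(c) * prod_i p(a_i|c) followed by
    one normalization. `p_cond[(i, a, c)]` holds p(A_i = a | c)."""
    posterior = dict(p_class)
    for i, a in enumerate(evidence):
        unnorm = {c: p_cond[(i, a, c)] * post
                  for c, post in posterior.items()}
        z = sum(unnorm.values())
        posterior = {c: u / z for c, u in unnorm.items()}
    return max(posterior, key=posterior.get)
```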

Bayesian Classifier Works well with complete databases. Methods also exist to handle incomplete databases; examples include the EM algorithm, Gibbs sampling, Bound and Collapse (BC), and the Robust Bayesian Classifier.

Robust Bayesian Classifier Incomplete databases seriously compromise the computational efficiency of Bayesian classifiers. One approach is to throw away all the incomplete entries. Another is to try to complete the database by having the user specify the pattern of the missing data. The Robust Bayesian Classifier makes no assumption about the nature of the missing data: it provides probability intervals that contain the estimates learned from all possible completions of the database.

Training We need to estimate the conditional probability $p(a_{ik} \mid c_j)$. There are three types of incomplete cases: 1. $A_i$ is missing; 2. the class $C$ is missing; 3. both are missing. Consider a case in which the value of $A_i$ is not known. Filling in all missing values of $A_i$ with $a_{ik}$ yields $p_{\max}(a_{ik} \mid c_j)$; filling in none of them with $a_{ik}$ yields $p_{\min}(a_{ik} \mid c_j)$. The actual value of $p(a_{ik} \mid c_j)$ lies somewhere between these two extremes.
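
One way to realize the two extremes in code is to complete each ambiguous case in whichever direction pushes the estimate up or down. This is a minimal sketch of the slide's idea under that assumption, not the paper's exact bookkeeping:

```python
def conditional_interval(records, attr_index, a, c):
    """Interval [p_min, p_max] for p(A_i = a | c) over all possible
    completions of the database; `None` marks a missing attribute
    value or class label."""
    n_ac = n_c_other = n_c_miss = 0   # cases whose class is c
    m_a = m_other = m_both = 0        # cases whose class is missing
    for attrs, label in records:
        v = attrs[attr_index]
        if label == c:
            if v == a:
                n_ac += 1             # observed (a, c)
            elif v is None:
                n_c_miss += 1         # class c, value missing
            else:
                n_c_other += 1        # class c, some other value
        elif label is None:
            if v == a:
                m_a += 1              # value a, class missing
            elif v is None:
                m_both += 1           # value and class both missing
            else:
                m_other += 1          # other value, class missing
    # p_max: fill missing values with a, and missing classes with c
    # exactly when doing so raises the ratio
    hi_num = n_ac + n_c_miss + m_a + m_both
    hi_den = n_ac + n_c_other + n_c_miss + m_a + m_both
    # p_min: fill missing values with anything but a, and route
    # missing-class cases into c only when that dilutes the ratio
    lo_num = n_ac
    lo_den = n_ac + n_c_other + n_c_miss + m_other + m_both
    p_max = hi_num / hi_den if hi_den else 0.0
    p_min = lo_num / lo_den if lo_den else 0.0
    return p_min, p_max
```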

Prediction Prediction involves computing $p(c_j \mid e_k)$. Since we now have an interval for $p(a_{ik} \mid c_j)$, we calculate $p_{\max}(c_j \mid e_k)$ and $p_{\min}(c_j \mid e_k)$. To make the actual prediction of the class, the authors introduce two criteria: 1. Stochastic dominance: assign class label $c_j$ if $p_{\min}(c_j \mid e_k)$ is greater than $p_{\max}(c_h \mid e_k)$ for all $h \neq j$. 2. Weak dominance: arrive at a single probability for $p(c_j \mid e_k)$ by assigning a score that falls in the interval between $p_{\min}(c_j \mid e_k)$ and $p_{\max}(c_j \mid e_k)$.

Prediction The stochastic dominance criterion reduces coverage because the probability intervals may overlap; it is the more conservative and safe method. The weak dominance criterion improves coverage; classification then depends on the score used to collapse the interval for $p(c_j \mid e)$ to a single probability. The score used by the authors is

$$\mathrm{score}(c_j) = p_{\min}(c_j \mid e)\,\frac{c-1}{c} + p_{\max}(c_j \mid e)\,\frac{1}{c},$$

where $c$ is the total number of classes.
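
Both criteria are straightforward once the intervals for $p(c_j \mid e)$ are available; a sketch, where the `intervals` container format is an assumption:

```python
def stochastic_dominance(intervals):
    """Predict c_j only when p_min(c_j|e) beats p_max(c_h|e) for every
    other class h; return None (no prediction, hence lower coverage)
    when the intervals overlap. `intervals`: class -> (p_min, p_max)."""
    for cj, (lo, _) in intervals.items():
        if all(lo > hi for ch, (_, hi) in intervals.items() if ch != cj):
            return cj
    return None

def weak_dominance(intervals):
    """Always predicts (100% coverage), using the authors' score
    p_min(c_j|e)*(c-1)/c + p_max(c_j|e)/c, c = number of classes."""
    c = len(intervals)
    scores = {cj: lo * (c - 1) / c + hi / c
              for cj, (lo, hi) in intervals.items()}
    return max(scores, key=scores.get)
```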

Results The Robust Bayesian Classifier was tested on the Adult database, which consists of 14 attributes over cases drawn from the US Census; part of the database is incomplete. The database is divided into two classes: people who earn more than $50,000 a year and people who do not. The Bayesian classifier gave an accuracy of 81.74% with a coverage of 93%. The Robust Bayesian Classifier under the stochastic dominance criterion gave an accuracy of 86.51% with a coverage of 87%. Under the weak dominance criterion it gave an accuracy of 82.5% with 100% coverage.

Conclusion The Robust Bayesian Classifier retains or improves upon the accuracy of the naïve Bayesian classifier. The stochastic dominance criterion should be used when accuracy is more important than coverage. For more general databases, the weak dominance criterion should be used because it maintains the accuracy of the classification while improving the coverage.

Bibliography
1. SLIQ: A Fast Scalable Classifier for Data Mining; Manish Mehta, Rakesh Agrawal and Jorma Rissanen.
2. An Introduction to the Robust Bayesian Classifier; Marco Ramoni and Paola Sebastiani.
3. A Bayesian Approach to Filtering Junk E-Mail; Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz.
4. Bayesian Networks without Tears; Eugene Charniak.
5. Bayesian Networks Basics; Finn V. Jensen.