Slides for “Data Mining” by I. H. Witten and E. Frank

From Naïve Bayes to Bayesian Network Classifiers

3 Naïve Bayes
• Assumption: attributes conditionally independent given the class
• Doesn't hold in practice, but classification accuracy often high
• However: sometimes performance much worse than e.g. a decision tree
• Can we eliminate the assumption?
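
For reference, the independence assumption can be written out (standard formulation, not taken verbatim from the slide): for attribute values a1, ..., an and class value c,

  Pr[a1, a2, ..., an | c] = Pr[a1 | c] × Pr[a2 | c] × ... × Pr[an | c]

so Bayes' rule gives Pr[c | a1, ..., an] ∝ Pr[c] × Pr[a1 | c] × ... × Pr[an | c].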

4 Enter Bayesian networks
• Graphical models that can represent any probability distribution
• Graphical representation: directed acyclic graph, one node for each attribute
• Overall probability distribution factorized into component distributions
• Graph's nodes hold the component distributions (conditional distributions)

5 Bayes net for weather data
(network diagram with conditional probability tables, shown as a figure in the original slides)

6 Bayes net for weather data (continued)
(a second network for the same data, shown as a figure in the original slides)

7 Computing the class probabilities
• Two steps: computing a product of probabilities for each class, and normalization
• For each class value:
  • Take all attribute values and the class value
  • Look up the corresponding entries in the conditional probability distribution tables
  • Take the product of all these probabilities
• Divide the product for each class by the sum of the products (normalization)
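
A minimal sketch of these two steps in Python, assuming the network is stored as a mapping from each node to its parents and a conditional probability table; all attribute names and probabilities below are illustrative only, not values from the slides:

# Hypothetical representation: node -> (parents, CPT), where each CPT maps a
# tuple of parent values to a distribution over the node's own values.
# All probabilities are made up purely for illustration.
network = {
    "play":    ((), {(): {"yes": 0.6, "no": 0.4}}),
    "outlook": (("play",), {("yes",): {"sunny": 0.2, "overcast": 0.5, "rainy": 0.3},
                            ("no",):  {"sunny": 0.5, "overcast": 0.1, "rainy": 0.4}}),
    "windy":   (("play",), {("yes",): {True: 0.3, False: 0.7},
                            ("no",):  {True: 0.6, False: 0.4}}),
}

def class_probabilities(network, instance, class_attr, class_values):
    """Step 1: product of CPT entries for each class value; step 2: normalize."""
    products = {}
    for c in class_values:
        assignment = dict(instance)
        assignment[class_attr] = c
        p = 1.0
        for node, (parents, cpt) in network.items():
            parent_values = tuple(assignment[parent] for parent in parents)
            p *= cpt[parent_values][assignment[node]]
        products[c] = p
    total = sum(products.values())
    return {c: p / total for c, p in products.items()}

print(class_probabilities(network, {"outlook": "sunny", "windy": True},
                          "play", ["yes", "no"]))

With the illustrative numbers above this prints roughly {'yes': 0.23, 'no': 0.77}.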

8 Why can we do this? (Part I)
• Single assumption: the values of a node's parents completely determine the probability distribution for the current node
• Means that a node/attribute is conditionally independent of the other nodes given its parents

9 Why can we do this? (Part II)
• Chain rule from probability theory (with the nodes ordered so that each node's parents precede it):
  Pr[a1, a2, ..., an] = Pr[a1] × Pr[a2 | a1] × ... × Pr[an | a1, a2, ..., an-1]
• Because of our assumption from the previous slide:
  Pr[a1, a2, ..., an] = Pr[a1 | a1's parents] × Pr[a2 | a2's parents] × ... × Pr[an | an's parents]

10 Learning Bayes nets
• Basic components of algorithms for learning Bayes nets:
  • Method for evaluating the goodness of a given network
    • Measure based on the probability of the training data given the network (or the logarithm thereof)
  • Method for searching through the space of possible networks
    • Amounts to searching through sets of edges, because the nodes are fixed
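
A minimal sketch of the log-probability measure, reusing the hypothetical node -> (parents, CPT) representation from the earlier sketch (the data format is an assumption for illustration):

import math

def log_likelihood(network, data):
    # Log of the probability of the training data given the network:
    # sum over instances and nodes of log Pr[node value | parent values].
    ll = 0.0
    for instance in data:  # each instance: dict mapping attribute -> value
        for node, (parents, cpt) in network.items():
            parent_values = tuple(instance[parent] for parent in parents)
            ll += math.log(cpt[parent_values][instance[node]])
    return ll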

11 Problem: overfitting
• Can't just maximize the probability of the training data
  • Because then it's always better to add more edges (fit the training data more closely)
• Need to use cross-validation or some penalty for the complexity of the network
  • AIC measure:  AIC score = -LL + K
  • MDL measure:  MDL score = -LL + (K/2) log N
  • LL: log-likelihood (log of the probability of the data), K: number of free parameters, N: number of instances
• Another possibility: Bayesian approach with a prior distribution over networks
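
These two penalized scores are simple to compute once LL, K and N are known; a tiny illustration (the log base just has to match the one used for LL):

import math

def aic_score(ll, k):
    # Akaike Information Criterion: penalize by the number of free parameters K
    return -ll + k

def mdl_score(ll, k, n):
    # Minimum Description Length: penalty also grows with the number of instances N
    return -ll + (k / 2.0) * math.log(n)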

12 Searching for a good structure
• Task can be simplified: can optimize each node separately
  • Because the probability of an instance is the product of the individual nodes' probabilities
  • Also works for the AIC and MDL criteria, because the penalties just add up
• Can optimize a node by adding or removing edges from other nodes
• Must not introduce cycles!
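
One way to keep the search acyclic is to check, before adding an edge, whether the would-be child is already an ancestor of the would-be parent; a small sketch (the parents-dictionary format is an assumption carried over from the earlier sketches):

def creates_cycle(parents, child, new_parent):
    # Adding the edge new_parent -> child closes a cycle exactly when
    # child is already an ancestor of new_parent.
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return False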

13 The K2 algorithm
• Starts with a given ordering of the nodes (attributes)
• Processes each node in turn
• Greedily tries adding edges from previously processed nodes to the current node
• Moves on to the next node when the current node can't be optimized further
• Result depends on the initial order
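
A minimal sketch of this greedy loop, assuming a local scoring function score(node, parents, data) is available (for example a per-node log-likelihood with an AIC or MDL penalty); the function and argument names are placeholders, not part of the original algorithm description:

def k2_structure(ordering, data, score, max_parents=None):
    # For each node (in the given order), keep adding the single best
    # predecessor as a parent while that improves the node's local score.
    parents = {node: [] for node in ordering}
    for i, node in enumerate(ordering):
        current = score(node, parents[node], data)
        candidates = set(ordering[:i])  # edges may only come from earlier nodes
        while candidates and (max_parents is None or len(parents[node]) < max_parents):
            best_gain, best_parent = 0.0, None
            for cand in candidates:
                gain = score(node, parents[node] + [cand], data) - current
                if gain > best_gain:
                    best_gain, best_parent = gain, cand
            if best_parent is None:  # no edge improves the score: move on
                break
            parents[node].append(best_parent)
            candidates.remove(best_parent)
            current += best_gain
    return parents

Restricting candidate parents to earlier nodes in the ordering also guarantees the result is acyclic.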

14 Some tricks
• Sometimes it helps to start the search with a naïve Bayes network
• It can also help to ensure that every node is in the Markov blanket of the class node
  • The Markov blanket of a node includes all parents, children, and children's parents of that node
  • Given values for the Markov blanket, a node is conditionally independent of nodes outside the blanket
  • I.e. a node is irrelevant to classification if it is not in the Markov blanket of the class node
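
A small helper that reads the Markov blanket off a parents dictionary like the one produced by the sketch above (illustrative only):

def markov_blanket(node, parents):
    # Parents, children, and the children's other parents of `node`,
    # given a dict mapping each node to its list of parents.
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket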

15 Other algorithms
• Extending K2 to greedily consider adding or deleting edges between any pair of nodes
• Further step: considering inverting the direction of edges
• TAN (Tree Augmented Naïve Bayes):
  • Starts with naïve Bayes
  • Considers adding a second parent to each node (apart from the class node)
  • An efficient algorithm exists

16 Likelihood vs. conditional likelihood
• In classification, what we really want is to maximize the probability of the class given the other attributes
  • Not the probability of the instances
• But: no closed-form solution for the probabilities in the nodes' tables that maximize this
• However: can easily compute the conditional probability of the data based on a given network
  • Seems to work well when used for network scoring
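
In symbols (standard definitions, not taken from the slides), for training instances j with attribute values a(j) and class value c(j):

  log-likelihood:              LL  = sum over j of log Pr[a(j), c(j)]
  conditional log-likelihood:  CLL = sum over j of log Pr[c(j) | a(j)]

Maximizing LL has simple closed-form table estimates (relative frequencies), whereas maximizing CLL does not; the idea here is to use CLL only for scoring candidate networks while the tables themselves are still estimated in the usual way.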

17 Discussion
• We have assumed: discrete data, no missing values, no new nodes
• A different method of using Bayes nets for classification: Bayesian multinets
  • I.e. build one network for each class and make predictions using Bayes' rule
• A different class of learning methods: testing conditional independence assertions