Predictive Modeling & The Bayes Classifier
Rosa Cowan
April 29, 2008

Goal of Predictive Modeling
To identify the class membership of a variable (entity, event, or phenomenon) through known values of other variables (characteristics, features, attributes). This means finding a function f such that y = f(x, θ), where:
- x = {x_1, x_2, …, x_p} is the vector of known feature values
- θ is a set of estimated parameters for the model
- y = c ∈ {c_1, c_2, …, c_m} for the discrete case
- y is a real number for the continuous case
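As a concrete (if toy) reading of y = f(x, θ), here is a minimal sketch in Python; the linear score and the two class labels are illustrative assumptions, not part of the slides:

```python
# A minimal sketch of the y = f(x, theta) view of predictive modeling.
# The linear score and the two-class threshold are assumed for illustration.

def f(x, theta):
    """Discrete case: map a feature vector x to one of the class labels."""
    score = sum(w * xi for w, xi in zip(theta["weights"], x)) + theta["bias"]
    return "c1" if score > 0 else "c2"

theta = {"weights": [0.8, -0.4, 1.2], "bias": -0.5}  # estimated parameters
print(f([1.0, 2.0, 0.5], theta))                      # -> "c1"
```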

Example Applications of Predictive Models
- Forecasting the peak bloom period for Washington's cherry blossoms
- Numerous applications in natural language processing, including semantic parsing, named entity extraction, coreference resolution, and machine translation
- Medical diagnosis (MYCIN: identification of bacterial infections)
- Sensor threat identification
- Predicting stock market behavior
- Image processing
- Predicting consumer purchasing behaviors
- Predicting successful movie and record productions

Predictive Modeling Ingredients
1. A model structure
2. A score function
3. An optimization strategy for finding the best θ
4. Data or expert knowledge for training and testing
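A minimal sketch of how the four ingredients fit together, assuming a one-parameter threshold model, 0-1 loss as the score function, and grid search as the optimizer (all illustrative choices, not from the slides):

```python
# Ingredient 4: data for training and testing
data = [(0.2, "fail"), (0.4, "fail"), (0.6, "pass"), (0.9, "pass")]

# Ingredient 1: a model structure with a single parameter theta
def model(x, theta):
    return "pass" if x >= theta else "fail"

# Ingredient 2: a score function (here, 0-1 loss over the data)
def loss(theta):
    return sum(model(x, theta) != y for x, y in data)

# Ingredient 3: an optimization strategy for finding the best theta
best_theta = min((t / 100 for t in range(101)), key=loss)
print(best_theta, loss(best_theta))  # any theta in (0.4, 0.6] gives zero loss
```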

2 Types of Predictive Models
- Classifiers (supervised classification)* – for the case when C is categorical
- Regression – for the case when C is real-valued
*The remainder of this presentation focuses on classifiers.

Classifier Variants & Example Types
1. Discriminative: work by defining decision boundaries or decision surfaces
   - Nearest neighbor methods; K-means
   - Linear & quadratic discriminant methods
   - Perceptrons
   - Support vector machines
   - Tree models (C4.5)
2. Probabilistic: work by identifying the most likely class for a given observation by modeling the underlying distributions of the features across classes*
   - Bayes modeling
   - Naïve Bayes classifiers
*The remainder of the presentation focuses on probabilistic models, with particular attention paid to the Naïve Bayes classifier.

General Bayes Modeling
Uses Bayes' rule:

P(C = c_k | x) = P(x | C = c_k) P(C = c_k) / P(x)

For general conditional-probability classification modeling, we're interested in the class c_k that maximizes the posterior P(C = c_k | x).
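A quick worked instance of Bayes' rule with assumed numbers (the probabilities below are not from the slides):

```python
# Assume P(pass) = 0.5, P(G=1 | pass) = 0.8, P(G=1 | fail) = 0.3.
p_pass, p_g_pass, p_g_fail = 0.5, 0.8, 0.3
p_g = p_g_pass * p_pass + p_g_fail * (1 - p_pass)  # P(G=1) by total probability
print(p_g_pass * p_pass / p_g)                      # P(pass | G=1) = 0.4/0.55 ~ 0.727
```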

Bayes Example
Let's say we're interested in predicting whether a particular student will pass CMSC498K. We have data on past student performance. For each student we know:
- if the student's GPA > 3.0 (G)
- if the student had a strong math background (M)
- if the student is a hard worker (H)
- if the student passed or failed the course

General Bayes Example (Cont.)
[Two joint probability tables, one for Pass and one for Fail, each with columns GPA>3 (G), Math? (M), Hardworker (H), and Prob; the probability values are not recoverable from the transcript.]
Joint probability distributions grow exponentially with the number of features! For binary-valued features, we need O(2^p) entries in the joint distribution for each class.
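A small sketch of the blowup, contrasting the 2^p joint entries per class with the p per-feature conditionals that naive Bayes (next slides) gets away with:

```python
# Full joint over p binary features: 2**p probability entries per class.
# Naive Bayes: only p conditionals P(x_i = 1 | class) per class.
for p in (3, 10, 20, 30):
    print(f"p={p}: joint entries per class = {2 ** p}, naive Bayes parameters = {p}")
```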

Augmented Naïve Bayes Net (Directed Acyclic Graph)
[Diagram: a DAG with class node "pass" (P(pass) = 0.5) pointing to feature nodes G, M, and H; in the augmented network, some arcs connect the features as well.]
G and H are conditionally independent of M given pass.

Naïve Bayes
[Diagram: a DAG with class node "pass" (P(pass) = 0.5) pointing to feature nodes G, M, and H; no arcs between features.]
Strong assumption of the conditional independence of all feature variables: each feature variable depends only on the class variable.
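A minimal naive Bayes sketch for the pass/fail example. Only P(pass) = 0.5 comes from the diagram; the conditional probabilities are placeholders, since the slide's numbers are not recoverable:

```python
priors = {"pass": 0.5, "fail": 0.5}   # P(pass) = 0.5 from the diagram

# P(feature = 1 | class) for G, M, H -- illustrative placeholder values
cond = {
    "pass": {"G": 0.8, "M": 0.7, "H": 0.9},
    "fail": {"G": 0.3, "M": 0.4, "H": 0.2},
}

def posterior(x):
    """x maps feature name -> 0/1. Returns P(class | x) for both classes."""
    scores = {}
    for c in priors:
        s = priors[c]
        for feat, val in x.items():
            p1 = cond[c][feat]
            s *= p1 if val == 1 else (1 - p1)   # independence assumption
        scores[c] = s
    z = sum(scores.values())                     # normalize by P(x)
    return {c: s / z for c, s in scores.items()}

print(posterior({"G": 1, "M": 0, "H": 1}))       # mostly "pass" here
```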

Characteristics of Naïve Bayes
- Only requires the estimation of the prior probabilities P(C_k) and p conditional probabilities for each class to be able to answer the full set of queries across classes and features.
- Empirical evidence shows that Naïve Bayes classifiers work remarkably well.
- The use of a full Bayes (belief) network provides only limited improvements in classification performance.

Why do Naïve Bayes Classifiers work so well?
- Performance is measured using a 0-1 loss function, which counts the number of incorrect classifications rather than measuring how accurately the classifier estimates the posterior probabilities.
- An additional explanation by Harry Zhang claims that it is the distribution of dependencies among features over the classes that affects the accuracy of Naïve Bayes.

Zhang's Explanation
Define local dependence: a measure of the dependency between a node and its parents, given by the ratio of the conditional probability of the node given its parents to the probability of the node without its parents.
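In Zhang (2004) this quantity is the "local dependence derivative"; the formulas below follow that paper's definitions and are an assumption about what the slide's missing equations showed:

```latex
% Local dependence derivative of node x with parents pa(x), in class c
% (reconstruction based on Zhang, 2004):
\mathrm{dd}^{c}\bigl(x \mid pa(x)\bigr) = \frac{P\bigl(x \mid pa(x), c\bigr)}{P(x \mid c)}

% Its ratio across the two classes (+ and -), used below to build DF(X):
\mathrm{ddr}(x) = \frac{\mathrm{dd}^{+}\bigl(x \mid pa(x)\bigr)}{\mathrm{dd}^{-}\bigl(x \mid pa(x)\bigr)}
```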

Zhang's Theorem #1
Given an augmented naïve Bayes graph and its corresponding naïve Bayes graph on features X_1, X_2, …, X_p, assume that f_b and f_nb are the Bayes and Naïve Bayes classifiers respectively; then the equation below is true.
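The theorem's equation was an image and did not survive the transcript. A hedged reconstruction from Zhang (2004), with f_b(X) and f_nb(X) read as the posterior-odds forms of the two classifiers and DF(X) the dependence distribution factor:

```latex
% Reconstruction based on Zhang (2004); the slide's own notation may differ.
f_b(X) \;=\; \mathrm{DF}(X)\, f_{nb}(X),
\qquad
\mathrm{DF}(X) \;=\; \prod_{i=1}^{p} \mathrm{ddr}(x_i)
```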

Zhang’s Theorem #2
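The body of this slide is also a missing image. If the statement follows Zhang (2004), it characterizes when the two classifiers agree under 0-1 loss; the condition below is derived directly from Theorem #1 (f_nb = f_b / DF) and should be read as a reconstruction, not the slide's exact wording:

```latex
% Derived from Theorem 1; f(X) >= 1 is read as "classify as +".
f_{nb}(X) \text{ gives the same classification as } f_b(X) \iff
\bigl(f_b(X) \ge 1 \text{ and } \mathrm{DF}(X) \le f_b(X)\bigr)
\;\text{or}\;
\bigl(f_b(X) < 1 \text{ and } \mathrm{DF}(X) > f_b(X)\bigr)
```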

Analysis
Determine when f_nb results in the same classification as f_b. Clearly this holds when DF(X) = 1. There are three cases for DF(X) = 1:
1. All the features are independent.
2. The local dependencies for each node distribute evenly in both classes.
3. The local dependencies supporting classification in one class are canceled by others supporting the opposite class.
A toy numeric check appears below.
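This sketch assumes a chain factorization P(x1, x2 | c) = P(x1 | c) P(x2 | x1, c) over two binary features and a made-up class-conditional joint; it verifies Theorem #1 (f_b = DF · f_nb) and reports whether the two classifiers agree on each input:

```python
joint = {  # P(x1, x2 | class), indexed by (x1, x2) -- illustrative values
    "+": {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.5},
    "-": {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1},
}
prior = {"+": 0.5, "-": 0.5}

def marginal(c, i, v):
    """P(x_i = v | c) from the joint."""
    return sum(p for x, p in joint[c].items() if x[i] == v)

def cond_x2(c, x1, x2):
    """P(x2 | x1, c); x1 is the only feature parent in this factorization."""
    return joint[c][(x1, x2)] / (joint[c][(x1, 0)] + joint[c][(x1, 1)])

for X in sorted(joint["+"]):
    f_b = (prior["+"] * joint["+"][X]) / (prior["-"] * joint["-"][X])
    f_nb = (prior["+"] * marginal("+", 0, X[0]) * marginal("+", 1, X[1])) / \
           (prior["-"] * marginal("-", 0, X[0]) * marginal("-", 1, X[1]))
    # DF(X) reduces to ddr(x2), since x1 has no feature parents (ddr = 1)
    df = (cond_x2("+", *X) / marginal("+", 1, X[1])) / \
         (cond_x2("-", *X) / marginal("-", 1, X[1]))
    assert abs(f_b - df * f_nb) < 1e-9          # Theorem 1: f_b = DF * f_nb
    print(X, (f_b >= 1) == (f_nb >= 1))         # do f_b and f_nb agree?
```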

The End (Except for Questions)

List of Sources

Hand, D., Mannila, H., & Smyth, P. (2001). Principles of Data Mining, Chapter 10. Cambridge, MA: The MIT Press.
Zhang, H. (2004). The Optimality of Naïve Bayes. Retrieved April 17, 2008.
Moore, A. (2001). Bayes Nets for Representing and Reasoning About Uncertainty. Retrieved April 22, 2008.
Naïve Bayes classifier. Retrieved April 10, 2008.
Ruane, Michael (March 30, 2008). Cherry Blossom Forecast Gets a Digital Aid. Retrieved April 10, 2008.