Naïve Bayes based Model
Billy Doran, 09130985

“If the model does what people do, do people do what the model does?”


Bayesian Learning
Bayesian learning determines the probability of a hypothesis H given a set of data D:

P(H|D) = P(D|H) P(H) / P(D)

P(H) is the prior probability of H: the probability of the hypothesis before the data are taken into account. P(H|D) is the posterior probability of H: given the data D, the probability of the hypothesis H. P(D) is the prior probability of observing D; it is constant across hypotheses and can be ignored when comparing them. P(D|H) is the likelihood of observing the data given the hypothesis: how well does the hypothesis reproduce the data?
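To make these pieces concrete, here is a minimal Python sketch of the theorem. The priors and likelihoods are invented for illustration and are not taken from the model:

```python
# Minimal sketch of Bayes' theorem with invented numbers.
# Hypotheses: the example belongs to Category A or Category B.
prior = {"A": 0.5, "B": 0.5}        # P(H): prior probability of each hypothesis
likelihood = {"A": 0.2, "B": 0.05}  # P(D|H): likelihood of the data under each hypothesis

# P(D) is the same for every hypothesis, so it acts as a normalising constant.
p_d = sum(prior[h] * likelihood[h] for h in prior)

posterior = {h: prior[h] * likelihood[h] / p_d for h in prior}
print(posterior)  # {'A': 0.8, 'B': 0.2}
```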

Maximum a Posteriori Probability
In order to classify an example as belonging to one category or another, we aim to find the hypothesis that maximises P(H|D). For example, take a training pattern with dimension values A, X, C: if we want to find the probability that this example belongs to Category A, the posterior probability is P(Category A | A, X, C).
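In code, MAP classification is simply an argmax over the posteriors; the scores below are hypothetical:

```python
# MAP classification: pick the category with the highest posterior score.
posteriors = {"A": 0.52, "B": 0.24, "C": 0.24}  # hypothetical normalised scores
map_category = max(posteriors, key=posteriors.get)
print(map_category)  # 'A'
```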

Naïve Bayes
The Naïve Bayes algorithm allows us to assume conditional independence of the dimensions given the category. This means that we consider each dimension in terms of its probability given the category:

P(A, B | Cat A) = P(A | Cat A) P(B | Cat A)

Using this information we are able to build a table of the conditional probabilities for each dimension.
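A short sketch of how such a table can be built from labelled patterns; the patterns below are hypothetical stand-ins, not the actual training set:

```python
from collections import Counter, defaultdict

# Hypothetical labelled patterns: (dimension values, category).
training = [
    (("A", "B", "C"), "A"),
    (("A", "A", "C"), "A"),
    (("B", "B", "A"), "B"),
    (("C", "C", "C"), "C"),
]

# counts[category][dim][value] = how often that dimension takes that
# value among patterns of the given category.
counts = defaultdict(lambda: defaultdict(Counter))
category_sizes = Counter()
for pattern, category in training:
    category_sizes[category] += 1
    for dim, value in enumerate(pattern):
        counts[category][dim][value] += 1

def cond_prob(value, dim, category):
    """P(Dimension dim = value | category)."""
    return counts[category][dim][value] / category_sizes[category]

print(cond_prob("A", 0, "A"))  # fraction of Category A patterns with A in dimension 1
```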

Conditional Probability Table
The table on this slide lists, for Category A, the conditional probability of each value (A, B or C) in each of the three dimensions. For example, P(Dimension1 = A | Category A) is 4/6, which is 0.666.

Calculation
In order to get the scores for the pattern (A, B, C) we first find:

P(A | A,B,C) = P(A|A) P(B|A) P(C|A) P(A) = 0.666 × 0.1666 × 0.1666 × 0.375 ≈ 0.0069
P(B | A,B,C) = 0.166 × 0.5 × 0.1 × 0.375 ≈ 0.0031
P(C | A,B,C) = 0.1 × 0.1 × 0.833 × 0.375 ≈ 0.0031

Next we normalise the scores to get values in the range [0, 1]:

A = 0.0069 / (0.0069 + 0.0031 + 0.0031) ≈ 0.52
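The same arithmetic in Python, using the numbers quoted on the slide (the 0.375 factors are the category priors):

```python
# Score each category for the test pattern (A, B, C):
# product of per-dimension conditional probabilities times the prior.
score_a = 0.666 * 0.1666 * 0.1666 * 0.375  # P(A|A) P(B|A) P(C|A) P(A)
score_b = 0.166 * 0.5 * 0.1 * 0.375        # P(A|B) P(B|B) P(C|B) P(B)
score_c = 0.1 * 0.1 * 0.833 * 0.375        # P(A|C) P(B|C) P(C|C) P(C)

# Normalise so the three scores sum to 1.
total = score_a + score_b + score_c
print(score_a / total)  # ≈ 0.526, the slide's normalised score of ~0.52 for A
```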

Conjunctions
In order to calculate the conjunction of two categories we find their joint probability:

P(A & B) = P(A) P(B)

This is similar to the Prototype Theory account of conjunctions.
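A minimal sketch of the conjunction scores; the slide does not state whether the raw or the normalised single-category scores are multiplied, so this assumes the normalised ones:

```python
from itertools import combinations

# Hypothetical normalised single-category scores for one test pattern.
single = {"A": 0.52, "B": 0.24, "C": 0.24}

# P(X & Y) = P(X) P(Y) for each pair of categories.
conjunctions = {x + "&" + y: single[x] * single[y]
                for x, y in combinations(single, 2)}
print(conjunctions)  # scores for A&B, A&C, B&C
```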

Training Data
The training table lists, for each of the sixteen training patterns, the model's scores for A, B, C and the conjunctions A&B, A&C and B&C, together with its Single and Joint classifications. Four patterns classify as A, two conjunction patterns as B/AB and A/AB, four as B, and six as C.

Training Data
The model is an almost perfectly consistent learner: it reproduces the original training data with essentially 100% accuracy. The conjunction examples #5 and #6 it classifies as B and A respectively; they obtain a significantly higher score in the A&B conjunction than in the A&C or B&C conjunctions. This suggests that these two examples are more representative of one member of the conjunction than the other.

Test Data
Model orderings for the five test patterns (Single; Joint):

#1: A>B>C; AB>AC>BC
#2: C>B>A; BC>AC>AB
#3: C>A>B; AC>BC>AB
#4: C>B>A; BC>AC>AB
#5: C>B=A; AB=AC>BC

Graphs: Comparing Experimental results to Model results

Test Data
The results are generally consistent with the experimental data, except for #3 and #4. For #3 the experiment predicts AC>AB>BC, while the model generates AC>BC>AB. For #4 the experimental data predict C>B>A, while the model gives B>C>A.

Statistical Analysis
The average correlation between the model and the experimental data was R = 0.88, which was significant at alpha = 0.05 with df = n − 2. The individual correlations were: #1 0.82, #2 0.93, #3 0.85, #4 0.84, #5 0.88, #6 0.92.
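A quick way to reproduce this kind of test; the arrays below are hypothetical placeholders, since the per-pattern data behind these correlations are not reproduced on the slides:

```python
from scipy.stats import pearsonr

# Hypothetical model scores and human ratings for one test pattern.
model_scores = [0.52, 0.24, 0.24, 0.13, 0.06, 0.03]
human_ratings = [0.60, 0.20, 0.20, 0.10, 0.08, 0.02]

# pearsonr's p-value already uses df = n - 2.
r, p = pearsonr(model_scores, human_ratings)
print(r, p < 0.05)  # correlation and significance at alpha = 0.05
```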

Unusual Predictions
How would the model handle the pattern shown? Output: A > B > C, AC > AB > BC. Is it possible to ask the model about a triple conjunction? Example: the model predicts C>B=A, AB=AC>ABC>BC.
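One natural way to extend the pairwise rule to a triple conjunction, under the same independence assumption; this is only a sketch, and the slides do not state that the model computes it this way:

```python
# Triple conjunction under independence: P(A & B & C) = P(A) P(B) P(C).
single = {"A": 0.30, "B": 0.30, "C": 0.40}  # hypothetical scores
triple = single["A"] * single["B"] * single["C"]
print(triple)  # 0.036
```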

Conclusion
Naïve Bayes produces a good hypothesis of how people learn category classification. The use of probabilities matches well with the underlying logic of the correlations between the dimensions and categories. Creating a Causal Network might be an informative way to investigate further the interactions between the individual dimensions.

Limitations
As the model uses a version of the prototype model to calculate its conjunctions, it is not able to capture overextension. To rectify this, the formula shown (where C is the category and K_C is the set of non-C categories) can be used to approximate overextension.

Limitations
The model also does not take negative evidence into account. While it captures the general trend of the categories, it does not, for example, represent the strength of negativity for Category C in test pattern #5. This pattern is very similar to the conjunction patterns given in the training data; the strong negative reaction seems to be caused by the association between those conjunctions and categories A and B.

The End