Ch5 Stochastic Methods Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2011.

Outline Introduction Intro to Probability Bayes' Theorem Naïve Bayes Applications of the Stochastic Methods

Introduction Chapter 4 introduced heuristic search as an approach to problem solving in domains where a problem does not have an exact solution, or where the full state space may be too costly to calculate

Introduction Important application domains for the use of stochastic methods include diagnostic reasoning, where cause/effect relationships are not always captured in a purely deterministic fashion, and gambling

Outline Introduction Intro to Probability Bayes' Theorem Naïve Bayes Applications of the Stochastic Methods

Elements of Probability Theory Elementary Event An elementary or atomic event is a happening or occurrence that cannot be made up of other events Event E An event is a set of elementary events Sample Space, S The set of all possible outcomes of an event E Probability, p The probability of an event E in a sample space is the ratio of the number of elements in E to the total number of possible outcomes
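
For a finite sample space of equally likely outcomes, this definition can be written compactly (a standard restatement of the ratio described above):

```latex
p(E) = \frac{|E|}{|S|}
```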

Elements of Probability Theory For example, what is the probability that a 7 or an 11 is the result of the roll of two fair dice? Elementary Event: a single roll of the pair of dice, e.g. 3,4 Event: the set of rolls whose sum is 7 or 11 Sample Space: each die has 6 outcomes, so the total set of outcomes of the two dice is 36

Elements of Probability Theory The number of combinations of the two dice that can give a 7 is 6: 1,6; 2,5; 3,4; 4,3; 5,2 and 6,1 So the probability of rolling a 7 is 6/36 The number of combinations of the two dice that can give an 11 is 2: 5,6; 6,5 So the probability of rolling an 11 is 2/36 Thus, the probability of rolling a 7 or an 11 is 6/36 + 2/36 = 8/36 = 2/9
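
The same result can be checked by brute-force enumeration of the sample space; a minimal sketch in Python:

```python
# Enumerate all 36 rolls of two fair dice and count those summing to 7 or 11.
from itertools import product
from fractions import Fraction

sample_space = list(product(range(1, 7), repeat=2))            # 36 ordered outcomes
event = [roll for roll in sample_space if sum(roll) in (7, 11)]

print(len(event), len(sample_space))                           # 8 36
print(Fraction(len(event), len(sample_space)))                 # 2/9
```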

Probability Reasoning Suppose you are driving on the interstate highway and realize you are gradually slowing down because of increased traffic congestion You then access the state highway statistics and download the relevant statistical information

Probability Reasoning In this situation, we have 3 parameters Slowing down (S): T or F Whether or not there's an accident (A): T or F Whether or not there's road construction (C): T or F

Probability Reasoning We may also present this in a traditional Venn diagram

Elements of Probability Theory Two events A and B are independent if and only if the probability of their both occurring is equal to the product of their occurring individually P(A ∩ B) = P(A) * P(B)

Elements of Probability Theory Consider the situation where bit strings of length 4 are randomly generated We want to know whether the event of the bit string containing an even number of 1s is independent of the event where the bit string ends with 0 We know the total space is 2^4 = 16

Elements of Probability Theory There are 8 bit strings of length four that end with 0 There are 8 bit strings of length four that have an even number of 1s The number of bit strings that have both an even number of 1s and end with 0 is 4: {1100, 1010, 0110, 0000}

Elements of Probability Theory P({even number of 1s} ∩ {end with 0}) = P({even number of 1s}) * P({end with 0}) 4/16 = 8/16 * 8/16 = 1/4
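
A minimal Python sketch that verifies this independence claim by enumerating all 16 strings:

```python
# Enumerate all length-4 bit strings and compare P(A and B) with P(A) * P(B).
from itertools import product

strings = [''.join(bits) for bits in product('01', repeat=4)]   # 16 strings

even_ones = [s for s in strings if s.count('1') % 2 == 0]       # event A, 8 strings
ends_zero = [s for s in strings if s.endswith('0')]             # event B, 8 strings
both      = [s for s in even_ones if s.endswith('0')]           # A and B, 4 strings

p_a, p_b, p_ab = len(even_ones) / 16, len(ends_zero) / 16, len(both) / 16
print(p_ab, p_a * p_b)    # 0.25 0.25  ->  the events are independent
```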

Probability Reasoning Finally, the conditional probability p(d|s) = |d ∩ s| / |s|

Probability Reasoning

Outline Introduction Intro to Probability Bayes' Theorem Naïve Bayes Applications of the Stochastic Methods

Bayes' Theorem P(A), P(B) are the prior probabilities P(A|B) is the conditional probability of A, given B P(B|A) is the conditional probability of B, given A
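
In symbols, Bayes' theorem relates these three quantities as follows (standard form):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```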

Bayes’ Theorem Suppose there is a school with 60% boys and 40% girls as its students. The female students wear trousers (50%) or skirts (50%) in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance, and what the observer can see is that this student is wearing trousers. What is the probability this student is a girl? The correct answer can be computed using Bayes' theorem

Bayes' Theorem P(B|A), or the probability of the student wearing trousers given that the student is a girl: since girls are as likely to wear skirts as trousers, this is 0.5 P(A), or the probability that the student is a girl regardless of any other information: this probability equals 0.4 P(B), or the probability of a (randomly selected) student wearing trousers regardless of any other information: since half of the girls and all of the boys wear trousers, this is 0.5×0.4 + 1×0.6 = 0.8

Bayes’ Theorem
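
Substituting these values into Bayes' theorem gives the probability asked for above (a worked step using only the numbers already stated):

```latex
P(\text{girl} \mid \text{trousers}) = \frac{P(\text{trousers} \mid \text{girl})\, P(\text{girl})}{P(\text{trousers})} = \frac{0.5 \times 0.4}{0.8} = 0.25
```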

Outline Introduction Intro to Probability Bayes' Theorem Naïve Bayes Applications of the Stochastic Methods

Naïve Bayesian Classifier: Training Dataset Class: C1: buys_computer = ‘yes’ C2: buys_computer = ‘no’ Data sample X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)

Bayesian Theorem: Basics Let X be a data sample Let H be a hypothesis (our prediction) that X belongs to class C Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X Example: customer X will buy a computer given that we know the customer's age and income
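
By Bayes' theorem, this posterior can be computed from quantities that are easier to estimate from the training data (standard form):

```latex
P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}
```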

Naïve Bayesian Classifier: Training Dataset Class: C1: buys_computer = ‘yes’ C2: buys_computer = ‘no’ Data sample X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)

Naïve Bayesian Classifier: An Example P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643 P(buys_computer = “no”) = 5/14 = 0.357 Compute P(X|Ci) for each class P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222 P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6 P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444 P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4 P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667 P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2 P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667 P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

Naïve Bayesian Classifier: An Example X = (age <= 30, income = medium, student = yes, credit_rating = fair) P(X|Ci): P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044 P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019 P(X|Ci)*P(Ci): P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028 P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.019 x 0.357 = 0.007 Therefore, X belongs to class (“buys_computer = yes”)
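
The same calculation expressed as a short Python sketch, using only the probabilities listed on the previous slide:

```python
# Naïve Bayes scoring for X = (age<=30, income=medium, student=yes, credit=fair).
priors = {'yes': 9/14, 'no': 5/14}

# P(attribute value | class), read off the training data above.
likelihoods = {
    'yes': [2/9, 4/9, 6/9, 6/9],   # age<=30, income=medium, student=yes, credit=fair
    'no':  [3/5, 2/5, 1/5, 2/5],
}

scores = {}
for cls, prior in priors.items():
    p = prior
    for l in likelihoods[cls]:
        p *= l                      # naïve (conditional independence) assumption
    scores[cls] = p

print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # -> 'yes'
```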

Towards Naïve Bayesian Classifier This can be derived from Bayes' theorem Since P(X) is constant for all classes, only P(X|Ci)P(Ci) needs to be maximized
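
Under the naïve assumption that the attributes are conditionally independent given the class, the class score factorizes (standard form of the classifier):

```latex
P(C_i \mid X) \propto P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)
```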

Naïve Bayesian Classifier: An Example Test on the following example: X = (age > 30, Income = Low, Student = yes, Credit_rating = Excellent)

Outline Introduction Intro to Probability Bayes' Theorem Naïve Bayes Applications of the Stochastic Methods

Tomato You say [t ow m ey t ow] and I say [t ow m aa t ow] Probabilistic finite machine A finite state machine where the next state function is a probability distribution over the full set of states of the machine Probabilistic finite state acceptor An acceptor, where one or more states are indicated as the start state and one or more as the accept states

So how is “Tomato” pronounced? A probabilistic finite state acceptor for the pronunciation of “tomato”, adapted from Jurafsky and Martin (2000).
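
A minimal sketch of the idea in Python; the states and transition probabilities below are hypothetical placeholders for illustration, not the values from the Jurafsky and Martin figure:

```python
import random

# Hypothetical probabilistic finite state acceptor for "tomato".
# Each state maps to a probability distribution over next states.
transitions = {
    'start': [('t', 1.0)],
    't':     [('ow', 1.0)],
    'ow':    [('m', 1.0)],
    'm':     [('ey', 0.5), ('aa', 0.5)],   # branch between the two pronunciations
    'ey':    [('t-2', 1.0)],
    'aa':    [('t-2', 1.0)],
    't-2':   [('ow-2', 1.0)],
    'ow-2':  [('accept', 1.0)],
}

def sample_path():
    """Walk the machine from the start state to the accept state."""
    state, path = 'start', []
    while state != 'accept':
        states, probs = zip(*transitions[state])
        state = random.choices(states, weights=probs)[0]
        path.append(state)
    return path[:-1]   # drop the 'accept' marker

print(sample_path())   # e.g. ['t', 'ow', 'm', 'ey', 't-2', 'ow-2']
```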

Natural Language Processing In the second example, we consider the phoneme recognition problem, often called decoding Suppose a phoneme recognition algorithm has identified the phone ni (as in “knee”) that occurs just after the recognized word I

Natural Language Processing We want to associate ni with either a word or the first part of a word We then use the Switchboard Corpus, a 1.4M-word collection of telephone conversations, to assist us

Natural Language Processing

We next apply a form of Naïve Bayes to analyze the phone ni following I
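
A hypothetical sketch of that decoding step in Python: for each candidate word w we score p(ni | w) * p(w | I) and keep the best one. The candidate words and probability values below are placeholders for illustration only, not the actual Switchboard corpus counts:

```python
# Bayesian decoding of the phone "ni": choose the candidate word w maximizing
# p(ni | w) * p(w | "I"); p(ni) is the same for every candidate, so it can be dropped.
candidates = {
    # word: (p_phone_given_word, p_word_given_I)  -- placeholder values
    'knee': (1.00, 0.000024),
    'the':  (0.00012, 0.046),
    'neat': (0.52, 0.00013),
    'need': (0.11, 0.00056),
    'new':  (0.001, 0.001),
}

scores = {w: p_phone * p_word for w, (p_phone, p_word) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])   # the word with the highest posterior score
```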