CSA4050: Advanced Topics in NLP
Probability I: Crash Concepts in Probability (November 2004)


Slide 1: Probability I
Topics: Experiments/Outcomes/Events; Independence/Dependence; Bayes' Rule; Conditional Probability/Chain Rule

Slide 2: Acknowledgement
Much of this material is based on material by Mary Dalrymple, King's College London.

Slide 3: Experiment, Basic Outcome, Sample Space
Probability theory is founded upon the notion of an experiment. An experiment is a situation which can have one or more different basic outcomes. Example: if we throw a die, there are six possible basic outcomes.
A sample space Ω is the set of all possible basic outcomes. For example:
- If we toss a coin, Ω = {H,T}
- If we toss a coin twice, Ω = {HH,HT,TH,TT}
- If we throw a die, Ω = {1,2,3,4,5,6}
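As an illustration (not part of the original slides), the sample spaces above can be enumerated in Python:

```python
from itertools import product

# Sample space for a single coin toss
coin = {"H", "T"}

# Sample space for two coin tosses: every ordered pair of single-toss outcomes
two_tosses = {a + b for a, b in product(coin, repeat=2)}
print(sorted(two_tosses))  # ['HH', 'HT', 'TH', 'TT']

# Sample space for one die throw
die = {1, 2, 3, 4, 5, 6}
print(len(die))  # 6
```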

Slide 4: Event
An event A ⊆ Ω is a set of basic outcomes, e.g.:
- tossing two heads, {HH}
- throwing a 6, {6}
- getting either a 2 or a 4, {2,4}
Ω itself is the certain event, whilst {} is the impossible event.
Note that the event space (the set of all events, i.e. all subsets of Ω) is not the same as the sample space.

Slide 5: Probability Distribution
A probability distribution of an experiment is a function that assigns a number (or probability) between 0 and 1 to each basic outcome, such that the sum of all the probabilities = 1.
The probability p(E) of an event E is the sum of the probabilities of all the basic outcomes in E.
A uniform distribution is one in which each basic outcome is equally likely.

Slide 6: Probability of an Event: Die Example
Sample space = set of basic outcomes = {1,2,3,4,5,6}.
If the die is not loaded, the distribution is uniform, so each basic outcome, e.g. {6} (throwing a six), is assigned the same probability, 1/6.
So p({3,6}) = p({3}) + p({6}) = 2/6 = 1/3.
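A minimal sketch of this computation (using exact fractions rather than floats, an implementation choice not made on the slides):

```python
from fractions import Fraction

# Uniform distribution over a fair die: each basic outcome gets probability 1/6
dist = {outcome: Fraction(1, 6) for outcome in range(1, 7)}
assert sum(dist.values()) == 1  # normalization: probabilities sum to 1

def p(event, dist):
    """Probability of an event = sum of the probabilities of its basic outcomes."""
    return sum(dist[o] for o in event)

print(p({3, 6}, dist))  # 1/3
```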

Slide 7: Estimating Probability
Repeat the experiment T times and count the frequency of E.
Estimated p(E) = count(E)/T
This can be done over m runs, yielding estimates p1(E), ..., pm(E).
The best estimate is the (possibly weighted) average of the individual pi(E).

Slide 8: Three Coin Tosses
Ω = {HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
Cases with exactly 2 tails: E = {HTT,THT,TTH}
Experiment: two runs of 1000 cases each (3000 tosses per run).
- c1(E) = 386, p1(E) = 0.386
- c2(E) = 375, p2(E) = 0.375
- pmean(E) = (0.386 + 0.375)/2 ≈ 0.381
Assuming a uniform distribution, p(E) = 3/8 = 0.375.
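The frequency-based estimate above can be reproduced by simulation; this is an illustrative sketch (the seed and trial count are arbitrary choices, not from the slides):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def estimate_two_tails(trials):
    """Estimate P(exactly 2 tails in 3 coin tosses) by relative frequency."""
    count = 0
    for _ in range(trials):
        tosses = [random.choice("HT") for _ in range(3)]
        if tosses.count("T") == 2:
            count += 1
    return count / trials

est = estimate_two_tails(1000)
print(est)  # close to the true value 3/8 = 0.375
```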

Slide 9: Word Probability
General problem: what is the probability of the next word/character/phoneme in a sequence, given the first N words/characters/phonemes?
To approach this problem we study an experiment whose sample space is the set of possible words.
N.B. The same approach could be used to study the probability of the next character or phoneme.

Slide 10: Word Probability
Approximation 1: all words are equally probable. Then the probability of each word = 1/N, where N is the number of word types. But all words are not equally probable.
Approximation 2: the probability of each word is the same as its frequency of occurrence in a corpus.

Slide 11: Word Probability
Estimate p(w), the probability of word w, given corpus C:
p(w) ≈ count(w)/size(C)
Example (Brown corpus, 1,000,000 tokens):
- the: 69,971 tokens, so p(the) = 69,971/1,000,000 ≈ 0.07
- rabbit: 11 tokens, so p(rabbit) = 11/1,000,000 = 0.000011
Conclusion: the next word is most likely to be the. Is this correct?
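This relative-frequency estimate is easy to sketch in Python; the toy corpus below is invented for illustration (real estimates would use something like the Brown corpus):

```python
from collections import Counter

# Hypothetical toy corpus, already tokenized
corpus = "look at the cute rabbit the rabbit likes the carrot".split()
counts = Counter(corpus)
size = len(corpus)

def p(word):
    """Relative-frequency estimate: count(w) / size of corpus."""
    return counts[word] / size

print(p("the"))     # 3/10 = 0.3
print(p("rabbit"))  # 2/10 = 0.2
```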

Slide 12: A Counterexample
Given the context: Look at the cute...
Is the more likely than rabbit?
Context matters in determining what word comes next. What is the probability of the next word in a sequence, given the first N words?

Slide 13: Independent Events
[Diagram: two events, A = eggs and B = Monday, shown as regions within the sample space]

Slide 14: Sample Space
(eggs,mon)  (cereal,mon)  (nothing,mon)
(eggs,tue)  (cereal,tue)  (nothing,tue)
(eggs,wed)  (cereal,wed)  (nothing,wed)
(eggs,thu)  (cereal,thu)  (nothing,thu)
(eggs,fri)  (cereal,fri)  (nothing,fri)
(eggs,sat)  (cereal,sat)  (nothing,sat)
(eggs,sun)  (cereal,sun)  (nothing,sun)

Slide 15: Independent Events
Two events, A and B, are independent if the fact that A occurs does not affect the probability of B occurring.
When two events, A and B, are independent, the probability of both occurring, p(A,B), is the product of the prior probabilities of each, i.e. p(A,B) = p(A) · p(B).
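Independence can be checked numerically on the breakfast/day sample space from slide 14, assuming (as an illustration) that all 21 basic outcomes are equally likely:

```python
from itertools import product
from fractions import Fraction

breakfasts = ["eggs", "cereal", "nothing"]
days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
# 21 equally likely basic outcomes (breakfast, day)
omega = list(product(breakfasts, days))
uniform = Fraction(1, len(omega))

def p(event):
    """Probability of an event under the uniform distribution on omega."""
    return sum(uniform for o in omega if o in event)

A = {o for o in omega if o[0] == "eggs"}  # event "eggs"
B = {o for o in omega if o[1] == "mon"}   # event "Monday"

# Independence: p(A and B) = p(A) * p(B)
assert p(A & B) == p(A) * p(B)
print(p(A), p(B), p(A & B))  # 1/3 1/7 1/21
```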

Slide 16: Dependent Events
Two events, A and B, are dependent if the occurrence of one affects the probability of the occurrence of the other.

Slide 17: Dependent Events
[Diagram: overlapping events A and B within the sample space; the overlap is A ∩ B]

Slide 18: Conditional Probability
The conditional probability of an event A, given that event B has already occurred, is written p(A|B).
In general, p(A|B) ≠ p(B|A).

Slide 19: Dependent Events: p(A|B) ≠ p(B|A)
[Diagram: overlapping events A and B within the sample space; the overlap is A ∩ B]

Slide 20: Example Dependencies
Consider the fair die example with:
- A = outcome divisible by 2
- B = outcome divisible by 3
- C = outcome divisible by 4
p(A|B) = p(A ∩ B)/p(B) = (1/6)/(1/3) = 1/2
p(A|C) = p(A ∩ C)/p(C) = (1/6)/(1/6) = 1
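These conditional probabilities can be verified directly; this sketch reuses the fair-die distribution from slide 6:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def p(event):
    """Uniform probability on a fair die."""
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """p(A|B) = p(A intersect B) / p(B)."""
    return p(a & b) / p(b)

A = {x for x in omega if x % 2 == 0}  # divisible by 2
B = {x for x in omega if x % 3 == 0}  # divisible by 3
C = {x for x in omega if x % 4 == 0}  # divisible by 4

print(cond(A, B))  # 1/2
print(cond(A, C))  # 1
```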

Slide 21: Conditional Probability
Intuitively, after B has occurred, event A is replaced by A ∩ B, the sample space Ω is replaced by B, and probabilities are renormalised accordingly.
The conditional probability of an event A given that B has occurred (p(B) > 0) is thus given by p(A|B) = p(A ∩ B)/p(B).
If A and B are independent, p(A ∩ B) = p(A) · p(B), so p(A|B) = p(A) · p(B)/p(B) = p(A).

Slide 22: Bayesian Inversion
For A and B both to occur, either A must occur first, then B, or vice versa. We get the following possibilities:
p(A|B) = p(A ∩ B)/p(B)
p(B|A) = p(A ∩ B)/p(A)
Hence p(A|B) p(B) = p(B|A) p(A).
We can thus express p(A|B) in terms of p(B|A):
p(A|B) = p(B|A) p(A)/p(B)
This equivalence, known as Bayes' Theorem, is useful when one or other quantity is difficult to determine.
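The inversion identity can be checked on the fair-die events from slide 20 (A = divisible by 2, B = divisible by 3), again with exact fractions:

```python
from fractions import Fraction

pA = Fraction(1, 2)    # p(A): even outcomes {2,4,6}
pB = Fraction(1, 3)    # p(B): outcomes {3,6}
pAB = Fraction(1, 6)   # p(A and B): just {6}

p_A_given_B = pAB / pB  # 1/2
p_B_given_A = pAB / pA  # 1/3

# Bayes' Theorem: p(A|B) = p(B|A) p(A) / p(B)
assert p_A_given_B == p_B_given_A * pA / pB
print(p_A_given_B, p_B_given_A)  # 1/2 1/3
```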

Slide 23: Bayes' Theorem
p(B|A) = p(B ∩ A)/p(A) = p(A|B) p(B)/p(A)
The denominator p(A) can be ignored if we are only interested in which event out of some set is most likely. Typically we are interested in the value of B that maximises the probability of an observation A, i.e.
argmax_B p(A|B) p(B)/p(A) = argmax_B p(A|B) p(B)

Slide 24: The Chain Rule
We can extend the definition of conditional probability to more than two events:
p(A1 ∩ ... ∩ An) = p(A1) · p(A2|A1) · p(A3|A1 ∩ A2) · ... · p(An|A1 ∩ ... ∩ An-1)
The chain rule allows us to talk about the probability of sequences of events, p(A1,...,An).
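A minimal sketch of the chain rule applied to a word sequence; the conditional probabilities below are hypothetical values invented for illustration, not estimates from any corpus:

```python
def chain_rule(cond_probs):
    """p(A1,...,An) = p(A1) * p(A2|A1) * ... * p(An|A1,...,An-1)."""
    result = 1.0
    for q in cond_probs:
        result *= q
    return result

# Hypothetical factors for the sequence "look", "at", "the", "cute":
# p(look), p(at|look), p(the|look,at), p(cute|look,at,the)
probs = [0.01, 0.2, 0.5, 0.05]
print(chain_rule(probs))  # approximately 5e-05
```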