1 Probability for dummies

2 Probability
probability is the long-term chance that a certain outcome will occur from some random (stochastic) process
deterministic process: only one possible "reality" of how the process might evolve over time; no randomness, the same output for the same starting conditions; example: Newton's second law
random process: the outcomes are never certain; most processes in nature are random; even if the initial condition (starting point) is known, there are many paths the process might take, but some paths are more probable than others

3 synonyms of probability: chance, likelihood, percentage, proportion
probability is a number between 0 and 1
chance applied to an individual: "What are my chances of winning the lottery?"
chance applied to a group: "The overall percentage of adults who get cancer …"
can be expressed as a percentage (80%), a ratio (0.80), or a word ("likely")
all probability terms revolve around the idea of long-term chance

4 Interpreting probabilities
Suppose the chance of winning a prize in a lottery is 10%. This means that in the long term (over thousands of tickets), 10% of all lottery tickets purchased for this game will win a prize and 90% won't. It does not mean that if you buy 10 tickets, one of them will win. If you buy 10 tickets, each of them has a 10% chance of winning. You may expect a higher chance of winning with 10 tickets, but it's not 100% (it is actually about 65%).
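The 65% figure comes from the complement rule: the chance that none of the 10 tickets wins is 0.9^10, so the chance of at least one win is 1 - 0.9^10 ≈ 0.651. A minimal check in Python, assuming the tickets win independently with probability 0.10:

    # probability of at least one winning ticket among 10 independent tickets
    p_win = 0.10
    p_at_least_one = 1 - (1 - p_win) ** 10
    print(p_at_least_one)   # ~0.6513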

5 Coming up with probabilities
some are easy to compute: the probability of a die landing on 6 (1 out of 6 ≈ 0.167)
some are difficult to figure out: the probability of a tropical storm developing into a hurricane (the probability of such an event is an estimate)

6 Three ways to obtain probabilities
classical approach: mathematical, formula-based, e.g. rolling a die
however, it doesn't always work: e.g. deciding between different brands of fridge to buy, with the criterion of having the least chance of needing repairs in the next five years; there is no math formula to figure out the chances of repairs for different brands, it depends on past data collected about repairs
in such cases use relative frequencies instead: collect data and take the percentage of time that an event occurs
(the third way, simulations, is described on the next slide)

7 simulations
set up a certain scenario (based on a scientific model, which means some assumptions are introduced)
play out the scenario over and over, many times
look at the percentage of runs in which a certain outcome happens
e.g. hurricane prediction
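To make the idea concrete, here is a minimal Monte Carlo sketch in Python; it is a hypothetical example (not from the slides) that re-estimates the 10-ticket lottery probability from slide 4 by playing the scenario many times and counting how often at least one ticket wins.

    import random

    def at_least_one_win(n_tickets=10, p_win=0.10):
        # play one round: buy n_tickets, each wins independently with probability p_win
        return any(random.random() < p_win for _ in range(n_tickets))

    n_runs = 100_000
    hits = sum(at_least_one_win() for _ in range(n_runs))
    print(hits / n_runs)   # close to 1 - 0.9**10 ≈ 0.651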

8 Terms and notation
probability is the chance of a certain outcome
sample space S – a list of all possible outcomes (S={1,2,3,4,5,6} for a die)
finite sample space – the elements of the set can be counted and the list ends (e.g. the die)
countably infinite sample space – you have a way to show the progression, but it goes on to infinity (S={0,1,2,3,4,5,6,…})
uncountably infinite sample space – an interval of values, such as S={all real numbers x such that 0 < x < 5}

9 event – a subset of the sample space S; notation: A, B, C, D, …
null set (empty set) – {}
union A ∪ B – corresponds to A or B
intersection A ∩ B – corresponds to A and B
complement Aᶜ – the set of outcomes from S that are not in A, e.g. S={1,2,3,4,5,6}, A={1,2}, Aᶜ={3,4,5,6}
(the slide shows Venn diagrams, with the red area representing each result)
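These set operations correspond directly to Python's built-in set type; a small sketch using the die sample space from the slide:

    S = {1, 2, 3, 4, 5, 6}   # sample space for one die
    A = {1, 2}
    B = {2, 4, 6}

    print(A | B)   # union A ∪ B        -> {1, 2, 4, 6}
    print(A & B)   # intersection A ∩ B -> {2}
    print(S - A)   # complement Aᶜ      -> {3, 4, 5, 6}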

10 Types of probabilities
all probabilities boil down to the big five: marginal (unconditional), union, joint (intersection), conditional, and complement
notation: S={1,2,3,4,5,6}, A={1,3,5}, P(A)=3/6=0.5, P(1)=1/6

11 marginal, union, and joint probability
marginal probability – the probability of set A all by itself, e.g. S={1,2,3,4,5,6}, A={1,3,5}, P(A)=0.5; you're interested in one single characteristic of an outcome (in this case, an odd roll)
union probability – P(A ∪ B) means the probability of A or B; A={2,4,6}, B={1,2,3}, P(A ∪ B)=5/6; two characteristics of an outcome; "or" means A or B or both (inclusive or), not "either/or"!
joint (intersection) probability – P(A ∩ B), also written P(A, B), means the probability of A and B happening at the same time; A={2,4,6}, B={1,2,3}, A∩B={2}, P(A∩B)=1/6
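Because all six die outcomes are equally likely, each of these probabilities is just a count divided by the size of S; a quick check in Python:

    S = {1, 2, 3, 4, 5, 6}
    A = {2, 4, 6}    # even roll
    B = {1, 2, 3}    # roll of 3 or less

    def P(event):
        # equally likely outcomes: favourable / total
        return len(event) / len(S)

    print(P(A))       # marginal   P(A)    = 0.5
    print(P(A | B))   # union      P(A∪B)  = 5/6
    print(P(A & B))   # joint      P(A∩B)  = 1/6
    print(P(S - A))   # complement P(Aᶜ)   = 0.5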

12 complement and conditional probability
complement probability: A={2,4}, Aᶜ={1,3,5,6}, P(Aᶜ)=4/6=2/3
conditional probability – the probability of an event given that another event has already occurred; P(A|B), "probability of A given B"
You roll a die and the roll comes up odd. What's the probability that the roll is 5?
we want P(die is 5 | die is odd); "die is odd" leaves just three possibilities, {1,3,5}, each of them equally likely, i.e. P(Five|Odd)=1/3
by the formula P(Five|Odd) = P(Five∩Odd) / P(Odd):
P(Five∩Odd) = P({5}∩{1,3,5}) = P({5}) = 1/6
P(Odd) = P({1,3,5}) = 3/6
P(Five|Odd) = (1/6) ÷ (3/6) = 1/3
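The same conditional probability in code, as a short self-contained sketch that applies P(A|B) = P(A∩B)/P(B) to the die sample space:

    S    = {1, 2, 3, 4, 5, 6}
    Odd  = {1, 3, 5}
    Five = {5}

    def P(event):
        return len(event) / len(S)

    # conditional probability: P(A|B) = P(A ∩ B) / P(B)
    print(P(Five & Odd) / P(Odd))   # 0.333... = 1/3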

13 Rules of probability
Imagine we have two boxes, one red and one blue. In the red box we have 2 apples and 6 oranges; in the blue box we have 3 apples and 1 orange. Suppose we pick one of the boxes and then randomly select a fruit from that box. Suppose that in doing so we pick the red box 40% of the time and the blue one 60%, and that within a box we are equally likely to select any of the fruits.
P(B=r) = 4/10, P(B=b) = 6/10
from Pattern Recognition and Machine Learning, C. Bishop

14 Now consider a slightly more general example involving two random variables X and Y (e.g. the Box and Fruit variables). X takes any of the values xi, i=1,…,5, and Y takes the values yj, j=1,…,3. Consider a total of N trials in which we sample both variables X and Y.
the number of trials in which X=xi and Y=yj is nij
the number of trials in which X takes the value xi (irrespective of Y) is ci
the number of trials in which Y takes the value yj (irrespective of X) is rj
(the slide shows a grid of cells nij, with column totals ci and row totals rj, j=3,2,1)
from Pattern Recognition and Machine Learning, C. Bishop

15 the joint probability that X takes the value xi and Y takes the value yj is
P(X=xi, Y=yj) = nij / N
the marginal probability is P(X=xi) = ci / N; because ci = Σj nij, we get P(X=xi) = Σj nij/N = Σj P(X=xi, Y=yj) … the sum rule
Now consider only the instances for which X=xi. The fraction of such instances for which Y=yj is the conditional probability P(Y=yj | X=xi) = nij / ci
and we can derive the product rule:
P(X=xi, Y=yj) = nij/N = (nij/ci)(ci/N) = P(Y=yj | X=xi) P(X=xi)
from Pattern Recognition and Machine Learning, C. Bishop

16 sum rule, product rule, and Bayes' theorem
sum rule: P(X) = ΣY P(X,Y)
product rule: P(X,Y) = P(Y|X) P(X)
P(X,Y) – joint probability, "probability of X and at the same time Y"
P(Y|X) – conditional probability, "probability of Y given X"
P(X) – marginal probability, "probability of X"
Because P(X,Y) = P(Y,X), i.e. P(Y|X) P(X) = P(X|Y) P(Y), we immediately obtain Bayes' theorem:
P(Y|X) = P(X|Y) P(Y) / P(X)
the denominator in Bayes' theorem is a normalization constant, P(X) = ΣY P(X|Y) P(Y), so that the sum of P(Y|X) over all Y equals 1
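A quick numerical check of the sum rule, product rule, and Bayes' theorem on a hypothetical count table nij like the one on slide 14 (the counts below are made up for the sketch):

    import numpy as np

    # hypothetical counts n[j, i]: rows index the Y values, columns the X values
    n = np.array([[3., 1., 4., 0., 2.],
                  [2., 5., 1., 3., 1.],
                  [1., 2., 2., 4., 3.]])
    N = n.sum()

    P_XY = n / N                        # joint P(X, Y) = nij / N
    P_X  = P_XY.sum(axis=0)             # sum rule: P(X) = sum over Y of P(X, Y)
    P_Y  = P_XY.sum(axis=1)
    P_Y_given_X = P_XY / P_X            # conditional P(Y|X) = P(X, Y) / P(X)
    P_X_given_Y = P_XY / P_Y[:, None]   # conditional P(X|Y) = P(X, Y) / P(Y)

    # product rule: P(X, Y) = P(Y|X) P(X)
    assert np.allclose(P_XY, P_Y_given_X * P_X)
    # Bayes' theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
    assert np.allclose(P_Y_given_X, P_X_given_Y * P_Y[:, None] / P_X)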

17 (figure only, from Pattern Recognition and Machine Learning, C. Bishop)

18 It is important to realize the difference between joint and conditional probabilities:
joint – you select someone with two characteristics from the entire group
conditional – you pull out the subgroup that has one characteristic already, and you want the probability that someone from that subgroup has the second characteristic
P(green, face) = 2/12 (2 green faces among all 12 objects on the slide)
P(green | face) = 2/5 (2 green faces among the 5 faces)

19 back to the boxes of fruit example
P(B=r) = 4/10, P(B=b) = 6/10
P(F=a|B=r) = 1/4, P(F=o|B=r) = 3/4, P(F=a|B=b) = 3/4, P(F=o|B=b) = 1/4
Question: an orange has been selected – from which box is it more likely to have come?
We want the conditional probability P(B=r|F=o) (or P(B=b|F=o)), but we only know P(F|B) and P(B), so we use Bayes' theorem:
P(F=o) = P(F=o|B=r) P(B=r) + P(F=o|B=b) P(B=b) = (3/4)(4/10) + (1/4)(6/10) = 9/20
P(B=r|F=o) = P(F=o|B=r) P(B=r) / P(F=o) = (3/10) / (9/20) = 2/3
and P(B=b|F=o) = 1 - P(B=r|F=o) = 1 - 2/3 = 1/3
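The same computation as a small Python sketch (Fraction is used so the results come out exactly as 9/20, 2/3, and 1/3):

    from fractions import Fraction as F

    P_B = {"r": F(4, 10), "b": F(6, 10)}                # prior over boxes
    P_F_given_B = {"r": {"a": F(1, 4), "o": F(3, 4)},   # fruit given box
                   "b": {"a": F(3, 4), "o": F(1, 4)}}

    # evidence P(F=o) by the sum rule
    P_o = sum(P_F_given_B[b]["o"] * P_B[b] for b in P_B)

    # posterior over boxes given an orange, by Bayes' theorem
    posterior = {b: P_F_given_B[b]["o"] * P_B[b] / P_o for b in P_B}
    print(P_o)        # 9/20
    print(posterior)  # {'r': Fraction(2, 3), 'b': Fraction(1, 3)}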

20 Bayes' theorem interpretation
If we were asked which box had been chosen before being told the identity of the fruit, the most complete information available to us would be P(B). This is called the prior probability, because it is the probability before observing the fruit.
Once we are told that the fruit is an orange, we use Bayes' theorem to compute P(B|F). This is called the posterior probability, because it is the probability after we have observed F.
Based on the prior alone we would say the blue box was chosen (P(blue)=6/10); however, based on the knowledge of the fruit's identity we actually answer the red box (P(red|orange)=2/3). This also agrees with intuition, since most of the oranges are in the red box.

21 Independent events
independence – knowledge that one event has happened does not affect the probability of the other event (knowing B has occurred does not change the probability of A occurring)
P(A|B)=P(A), P(B|A)=P(B) ⇒ "bye-bye conditional probabilities, rest in peace"
If two events are independent, then their joint probability is P(A,B) = P(A)·P(B)
this follows from the product rule: P(A,B) = P(A|B) P(B) = P(A) P(B)
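A quick check on the die (the events here are chosen just for illustration): A = "even roll" and B = "roll of 4 or less" are independent, since P(A∩B) = 2/6 = 1/3 equals P(A)·P(B) = (1/2)(2/3).

    from fractions import Fraction as F

    S = {1, 2, 3, 4, 5, 6}
    A = {2, 4, 6}        # even roll
    B = {1, 2, 3, 4}     # roll of 4 or less

    P = lambda event: F(len(event), len(S))
    print(P(A & B) == P(A) * P(B))   # True -> A and B are independent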

22 Spam filtering – example messages (labelled spam/ham)
ham: My work for Prof. Sponer has been progressing really slowly, since I had to find a little temporary job here
spam: 3 different tablets are currently available from the doctor and these work when there is sexual stimulation.
ham: Weather is still very nice where I live, we get sunshine every day
spam: take a look at the luscious dishes you'll be trying without any obligation to buy! click here
ham: I am sending the revised version of the draft. The introduction is little changed.
spam: How would you like to make $75 - $250 every single day just for clicking your mouse?
ham: I don't know how it is in Italy, but I know that in the US is little nasty too.
spam: increase at least a little size of your penis

23 Which probabilities can we get?
Each message is composed of words and falls into a category: spam or ham. Which probabilities can we get from a set of labelled messages?
P(category) – the probability that a randomly selected document falls in this category: number of documents in the category / total number of documents
P(word|category) – given a particular category, the probability that a word appears in it: divide the number of times the word appears in that category by the total number of documents in the category
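A minimal sketch of estimating these two quantities from a tiny labelled corpus (the corpus below is made up for illustration):

    from collections import Counter, defaultdict

    corpus = [
        ("spam", "click here to win a prize"),
        ("spam", "win money now click here"),
        ("ham",  "I am sending the revised draft"),
        ("ham",  "the draft introduction is little changed"),
    ]

    docs_per_cat  = Counter(cat for cat, _ in corpus)
    words_per_cat = defaultdict(Counter)
    for cat, text in corpus:
        words_per_cat[cat].update(text.lower().split())

    def p_category(cat):
        return docs_per_cat[cat] / len(corpus)

    def p_word_given_category(word, cat):
        # word occurrences in the category / documents in the category, as on the slide
        return words_per_cat[cat][word] / docs_per_cat[cat]

    print(p_category("spam"))                       # 0.5
    print(p_word_given_category("click", "spam"))   # 1.0 (appears in both spam documents)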

24 If we have a new message, we want to decide whether it is spam or not.
So which probability do we need to calculate? For the given message: is it spam or not, i.e. which category? So we want P(category|message).
But we only know P(category) and P(word|category), so we have to solve two problems:
How to get P(message|category) from P(word|category)?
How to get P(category|message) from P(message|category)?

25 P(word|category) → P(message|category)?
a message consists of words
each word has a certain probability of occurring within the given category
assuming the words occur independently of each other, P(message|category) can be calculated as the product of P(word|category) over all the words in the message
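In code, the independence assumption turns the message likelihood into a plain product over the words; a tiny self-contained sketch (the word probabilities below are made up; in practice log probabilities are summed instead, to avoid numerical underflow):

    from math import prod

    def p_message_given_category(message, word_probs):
        # word_probs maps word -> P(word|category) for one category
        # naive independence assumption: multiply the per-word probabilities
        return prod(word_probs.get(w, 0.0) for w in message.lower().split())

    spam_word_probs = {"click": 1.0, "here": 1.0, "win": 0.5, "prize": 0.5}
    print(p_message_given_category("click here", spam_word_probs))   # 1.0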

26 P(message|category) → P(category|message)
use Bayes' theorem: P(category|message) ∝ P(message|category) P(category)
so for the given message you calculate the probability that it is spam and the probability that it is ham, and take the category with the higher probability
this classification procedure is called naïve Bayes
naïve, because the assumption of independence of the words in a message is false (e.g. it is more likely that the words "prize" and "casino" appear together than that "python" and "casino" appear in the same message)
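Putting the pieces together, a minimal naïve Bayes sketch over the toy corpus from the earlier sketch; it works with log probabilities and adds a small smoothing constant alpha so unseen words don't zero out the product (the smoothing is my addition, not from the slides):

    import math
    from collections import Counter, defaultdict

    corpus = [
        ("spam", "click here to win a prize"),
        ("spam", "win money now click here"),
        ("ham",  "I am sending the revised draft"),
        ("ham",  "the draft introduction is little changed"),
    ]

    docs_per_cat  = Counter(cat for cat, _ in corpus)
    words_per_cat = defaultdict(Counter)
    for cat, text in corpus:
        words_per_cat[cat].update(text.lower().split())

    def log_posterior(message, cat, alpha=0.1):
        # log P(cat) + sum of log P(word|cat); logs avoid numerical underflow
        score = math.log(docs_per_cat[cat] / len(corpus))
        for w in message.lower().split():
            p = (words_per_cat[cat][w] + alpha) / (docs_per_cat[cat] + alpha)
            score += math.log(p)
        return score

    def classify(message):
        # pick the category with the highest (unnormalized) posterior
        return max(docs_per_cat, key=lambda cat: log_posterior(message, cat))

    print(classify("click here to win money"))        # spam
    print(classify("the revised draft is attached"))  # ham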

