# CS1512 1 CS1512 Foundations of Computing Science 2 Lecture 4 Probability and statistics (4) © J R W Hunter, 2006; C J van Deemter 2007.

## Presentation on theme: "CS1512 1 CS1512 Foundations of Computing Science 2 Lecture 4 Probability and statistics (4) © J R W Hunter, 2006; C J van Deemter 2007."— Presentation transcript:

CS1512 1 CS1512 Foundations of Computing Science 2 Lecture 4 Probability and statistics (4) © J R W Hunter, 2006; C J van Deemter 2007

CS1512 2 (Revd. Thomas) Bayes Theorem P( E 1 and E 2 ) = P( E 1 )* P( E 2 | E 1 ) Order of E 1 and E 2 is not important. P( E 2 and E 1 ) = P( E 2 )* P( E 1 | E 2 ). So, P( E 2 )* P( E 1 | E 2 ) = P( E 1 )* P( E 2 | E 1 ). So, P( E 1 )* P( E 2 | E 1 ) P( E 1 | E 2 ) = P( E 2 )

CS1512 3 (Revd. Thomas) Bayes Theorem P( E 1 )* P( E 2 | E 1 ) P( E 1 | E 2 ) = P( E 2 ) So, once you know P(E1) and P(E2), knowing one of the two conditional probabilities (that is, knowing either P(E1|E2) or P(E2|E1)) implies knowing the other of the two.

CS1512 4 (Revd. Thomas) Bayes Theorem P( E 1 )* P( E 2 | E 1 ) P( E 1 | E 2 ) = P( E 2 ) A simple example, without replacement: Two red balls and one black: RRB P(R)=2/3 P(B)= P(R|B)= P(B|R)=

CS1512 5 (Revd. Thomas) Bayes Theorem P( E 1 )* P( E 2 | E 1 ) P( E 1 | E 2 ) = P( E 2 ) A simple example: Two red balls and one black: RRB P(R)=2/3 P(B)=1/3 P(R|B)=1 P(B|R)=1/2

CS1512 6 (Revd. Thomas) Bayes Theorem P( E 1 )* P( E 2 | E 1 ) P( E 1 | E 2 ) = P( E 2 ) A simple example: Two red balls and one black: RRB P(R)=2/3 P(B)=1/3 P(R|B)=1 P(B|R)=1/2. Now calculate P(R|B) using Bayes: P(R|B)=

CS1512 7 (Revd. Thomas) Bayes Theorem P( E 1 )* P( E 2 | E 1 ) P( E 1 | E 2 ) = P( E 2 ) A simple example: Two red balls and one black: RRB P(R)=2/3 P(B)=1/3 P(R|B)=1 P(B|R)=1/2 Now calculate P(R|B) using Bayes: P(R|B)=(2/3*1/2)/1/3=(2/6)/(1/3)=1 --- correct!

CS1512 8 Bayes Theorem Why bother? Suppose E 1 is a particular disease (say measles) Suppose E 2 is a particular disease (say spots) P( E 1 ) is the probability of a given patient having measles (before you see them) P( E 2 ) is the probability of a given patient having spots (for whatever reason) P( E 2 | E 1 ) is the probability that someone who has measles has spots Bayes theorem allows us to calculate the probability: P( E 1 | E 2 ) = P (patient has measles given that the patient has spots)

CS1512 9 Discrete random variables Discrete random variable a discrete variable where the values taken are associated with a probability Discrete probability distribution formed by the possible values and their probabilities P ( X = ) =...

CS1512 10 Simple example – fair coin Let variable ( X ) be number of heads in one trial (toss). Values are 0 and 1 Probability distribution: Value of X ( i ) 0 1 P ( X = i ) 0.5 0.5

CS1512 11 Example – total of two dice Let variable ( X ) be the total when two fair dice are tossed. Its values are 2, 3, 4,... 11, 12 Note that not all these combination values are equally probable! Probability distribution: P ( X = 2 ) = 1/36 P ( X = 3 ) = 2/36 (because 3 = 1+2 = 2+1) P ( X = 4 ) = 3/36 (because 4 = 1+3 = 3+1 = 2+2)... 1 2 3 4 5 6 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12

CS1512 12 Example – total of two dice Let variable ( X ) be the total when two fair dice are tossed. Its values are 2, 3, 4,... 11, 12 Probability distribution: P ( X = 2 ) = 1/36 P ( X = 3 ) = 2/36... 1 2 3 4 5 6 1 2 3 4 5 6 7 2 3 4 5 6 7 8 3 4 5 6 7 8 9 4 5 6 7 8 9 10 5 6 7 8 9 10 11 6 7 8 9 10 11 12

CS1512 13 Bernouilli probability distribution Bernoulli trial: one in which there are only two possible and exclusive outcomes; call the outcomes success and failure; consider the variable X which is the number of successes in one trial Probability distribution: Value of X ( i ) 0 1 P ( X = i ) (1-p) p (fair coin p=1/2 ; graph p=4/10)

CS1512 14 Binomial distribution Carry out n Bernoulli trials. X is the number of successes in n trials. Possible values are: 0, 1, 2,... i,... n -1, n What is P(X=r) ? (= the probability of scoring exactly r successes in n trials) Let s=success and f=failure. Then a trial is a tuple like e.g. (n=7) Generally, a trial is an n-tuple consisting of s and f. Consider the 7-tuple. Its probability is P(f)*P(s)*P(f)*P(s)*P(f)*P(f)*P(f). Thats P(s) 2 * P(f) 5. Probability of a given tuple having r s-es and n-r f-s in it = P(s) r * P(f) n-r Note: P(f)=1-P(s) (i.e., P(f)+P(s)=1)

CS1512 15 Binomial distribution Probability of a given tuple having r s-es and n-r f-s in it = P(s) r * P(f) n-r Were interested in the probability of all tuples together that have r s-es and n-r f-s in them (i.e. the sum of their probabilities). How many ways are there of picking exactly r elements out of a set of n? Theres a formula for this (coming from Combinatorics), which says there are n! r! (n-r)! (Def: n! = n (n-1) (n-2)... 2 1) ( To see that this is not as bad as it looks, try a few values of n and r ) So, we need to multiply P(s) r * P(f) n-r with this factor.

CS1512 16 Binomial distribution Combining what youve seen in the previous slides Carry out n Bernoulli trials; in each trial the probability of success is P(s) and the probablity of failure is P(f) The distribution of X (i.e., the number of successes) is: P(X=r) = n C r P(s) r P(f) n-r n! n C r = r! (n-r)!

CS1512 17 Binomial distribution (slightly different formulation) Carry out n Bernoulli trials; in each trial the probability of success is p X is the number of successes in n trials. Possible values are: 0, 1, 2,... i,... n -1, n We state (without proving) that the distribution of X is: P(X=r) = n C r p r (1-p) n-r n! n C r = r! (n-r)!

CS1512 18 Example of binomial distribution Probability of r heads in n trials ( n = 20) P(X = r)

CS1512 19 Example of binomial distribution Probability of r heads in n trials ( n = 100) P(X = r)

CS1512 20 Probability of being in a given range Probability of r heads in n trials ( n = 100) P(0 X < 5) P(5 X < 10)...

CS1512 21 Probability of being in a given range P(55 X < 65) = P(55 X < 60) + P(60 X < 65) (mutually exclusive events) Add the heights of the columns

CS1512 22 Continuous random variable Consider a large population of individuals e.g. all males in the UK over 16 Consider a continuous attribute e.g. Height: X Select an individual at random so that any individual is as likely to be selected as any other X is said to be a continuous random variable

CS1512 23 Continuous random variable Suppose the heights in your sample range from 31cm (a baby) to 207cm Continuous cant list every height separately need bins. If you are satisfied with a rough division into a few dozen categories, you might decide to use bins as follows: 19.5 - 39.5cm: relative frequency = 3/100 39.5 - 59.5cm: relative frequency = 7/100 59.5 - 79.5cm: relative frequency = 23/100 etc. Relative frequencies may be plotted as follows:

CS1512 24 Relative Frequency Histogram height of the bar gives the relative frequency relative frequency

CS1512 25 Probability density function The probability distribution of X is said to be its probability density function defined such that: P(a x > b) = area under the curve between a and b NB total area under curve must be 1.0

CS1512 26 Increase the number of samples... and decrease the width of the bin...

CS1512 27 Relative frequency as area under the curve relative frequency of values between a and b = area

CS1512 28 The normal distribution Very common distribution: often called a Gaussian distribution variable measured for large number of nominally identical objects; variation assumed to be caused by a large number of factors; each factor exerts a small random positive or negative influence; e.g. height: age diet genetic influences etc. Symmetric about mean Unimodal: there is only one peak

CS1512 29 Mean Mean determines the centre of the curve: Mean = 10 Mean = 30

CS1512 30 Standard deviation Standard deviation determines the width of the curve: Std. Devn. = 2 Std. Devn. = 1

CS1512 31 Normal distribution Certain things can be proven about any normal distribution (This is why statisticians often assume that their sample space is normally distributed.... even though this is usually very difficult to be sure about!) Example (without proof): In normally distributed data, the probability of lying within 2 standard deviations of the mean (either side) are approximately 0.95 NB: Given a normal distribution, mean and standard deviation together allow you to reconstruct a probability distribution.

CS1512 32 Probability of sample lying within mean ± 2 standard deviations x Prob Dist Cum Prob Dist 2.00.0001338300.003% 2.40.00029194700.007% 2.80.00061190200.016% 3.20.00123221900.034% 3.60.00238408800.069% 4.00.00443184800.135% 4.40.00791545200.256% 4.80.01358296900.466% 5.20.0223945300.820% 5.60.03547459301.390% 6.00.05399096702.275% 6.40.07895015803.593% 6.80.11092083505.480% 7.20.14972746608.076% 7.60.19418605511.507% 8.00.24197072515.866% 8.40.28969155321.186% 8.80.33322460327.425% 9.20.3682701434.458% 9.60.39104269442.074% 10.00.3989422850.000% Example: Mean ( μ )= 10.0 Std. devn ( σ ) = 2.0 P ( X < μ – σ ) = 15.866% P ( X < μ – 2 σ ) = 2.275% P ( μ – 2 σ < X < μ + 2 σ ) = = (100 – 2 * 2.275)% = 94.5%

CS1512 33 Probability of sample lying within mean ± 2 standard deviations 2.275% 94.5% μ – 2σμ + 2σμ

CS1512 34 Non-normal distributions Some non-normal distributions are so close to normal that it doesnt harm much to pretend that they normal But some distributions are very far from normal As a stark reminder, suppose were considering the variable X= error when a persons age is state. Error = def real age – age last birthday Range(X)= 0 to 12 months Which of these values is most probable?

CS1512 35 Non-normal distributions Some non-normal distributions are so close to normal that it doesnt harm much to pretend that they normal But some distributions are very far from normal As a stark reminder, suppose were considering the variable X= error when a persons age is state. Error = def real age – age last birthday Range(X)= 0 to 12 months Which of these values is most probable? None: always P(x)=1/12 How would you plot this?

CS1512 36 Uniform probability distribution Also called rectangular distribution 0.0 1.0 x P(X) 0.0 1.0 P(X < y) = 1.0 y = y y

CS1512 37 Uniform probability distribution 0.0 x P(X) 0.0 P(X < a) = a P(X < b) = b P(a X < b) = b - a 1.0 a b

CS1512 38 Non-normal distributions (end)

Download ppt "CS1512 1 CS1512 Foundations of Computing Science 2 Lecture 4 Probability and statistics (4) © J R W Hunter, 2006; C J van Deemter 2007."

Similar presentations