1 Probability Models for Distributions of Discrete Variables.


1 1 Probability Models for Distributions of Discrete Variables

2 2 Randomly select a college student. Determine x, the number of credit cards the student has. x = # of cards; p(x) = probability of x occurring.
x     0     1     2     3     4     5
p(x)  0.20  0.30  0.20  0.15  0.10  0.05

3 3 Populations / Samples. A population is a collection of all units of interest. Example: all college students. A sample is a collection of units drawn from the population. Example: any subcollection of college students. Probabilities go with populations. Scientific studies randomly sample from the entire population: each unit in the sample is chosen randomly, so the entire sample is random as well.

4 4 Probability Models and Populations. For discrete data, a population and a sample are summarized the same way (for instance, as a table of values and accompanying relative frequencies). A probability distribution (or model) for a discrete variable is a description of values, with each value accompanied by a probability.

5 5 Probability Models and Populations. Definitions of Probability: 2. The probability of an event is the long-term (technically forever) relative frequency of occurrence of the event, when the experiment is performed repeatedly under identical starting conditions. 3. The probability of an event is the relative frequency of units in the population for which the event applies. To aggregate these meanings: the probability associated with an event is its relative frequency of occurrence over all possible ways the phenomenon can take place.

6 6 Probability Models. “All models are wrong. Some are useful.” (George Box, industrial statistician)

7 7 A probability distribution for a discrete variable is tabulated with a set of values, x, and probabilities, p(x). Probabilities must be nonnegative.
x     0     1     2     3     4     5
p(x)  0.20  0.30  0.20  0.15  0.10  0.05

8 8 A probability distribution for a discrete variable is tabulated with a set of values, x, and probabilities, p(x). Probabilities must be nonnegative and must sum to 1 (within rounding error).
x     0     1     2     3     4     5     SUM
p(x)  0.20  0.30  0.20  0.15  0.10  0.05  1.00
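The two requirements on the slide (nonnegative, summing to 1 within rounding error) can be checked mechanically. The following is a minimal sketch; the function name `is_valid_distribution` and the tolerance of 0.005 are assumptions chosen for illustration, not part of the presentation.

```python
import math

def is_valid_distribution(probs, tol=0.005):
    """Check the two slide requirements.

    tol is an assumed allowance for rounding error in a published table.
    """
    nonnegative = all(p >= 0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one

# Credit card distribution from the slides: p(0), p(1), ..., p(5)
credit_cards = [0.20, 0.30, 0.20, 0.15, 0.10, 0.05]

print(is_valid_distribution(credit_cards))      # True
print(is_valid_distribution([0.5, 0.6, -0.1]))  # False: negative entry
print(is_valid_distribution([0.5, 0.4]))        # False: sums to 0.9
```

The tolerance matters in practice: the family-size table later in the deck sums to 0.9999 rather than exactly 1.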

9 9 The mean μ of a probability distribution is the mean value observed for all possible outcomes of the phenomenon.

10 10 Consider idealized data sets (here n = 100).
x      0      1      2      3      4      5
p(x)   0.20   0.30   0.20   0.15   0.10   0.05
data   20 0s  30 1s  20 2s  15 3s  10 4s  5 5s

11 11 Idealized data set, n = 100:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5
Mean = 1.80, SD = 1.44

12 12 Consider idealized data sets (here n = 1000).
x      0       1       2       3       4       5
p(x)   0.20    0.30    0.20    0.15    0.10    0.05
data   200 0s  300 1s  200 2s  150 3s  100 4s  50 5s

13 13 Idealized data set, n = 1000:
0 0 0 0 0 0 0 … 0 (200)
1 1 1 1 1 1 1 1 1 1 … 1 (300)
2 2 2 2 2 2 … 2 (200)
3 3 3 3 … 3 (150)
4 4 … 4 (100)
5 … 5 (50)
Mean = 1.80, SD = 1.44

14 14 Values for the mean and standard deviation don’t depend on the number of data values; they depend instead on the relative location of the data values – they depend on the distribution in relative frequency terms.
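A short sketch can confirm that the idealized data sets of size 100 and 1000 share the same mean and standard deviation. The helper name `idealized_data` is hypothetical. The population form of the standard deviation (`statistics.pstdev`) is used because the idealized data set stands in for the whole population.

```python
import statistics

# Credit card distribution from the slides
probs = {0: 0.20, 1: 0.30, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

def idealized_data(probs, n):
    """Build a data set of size n in which each value x appears n * p(x) times."""
    data = []
    for x, p in probs.items():
        data.extend([x] * round(n * p))
    return data

for n in (100, 1000):
    data = idealized_data(probs, n)
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population SD: divides by n, not n - 1
    print(f"n = {len(data)}: mean = {mean:.2f}, SD = {sd:.2f}")
```

Both passes through the loop should report a mean of 1.80 and an SD of 1.44, matching the slides.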

15 15 The mean μ of a probability distribution is the mean value observed for all possible outcomes of the phenomenon. Formula: μ = Σ x × p(x), where Σ is the SUM symbol and μ is the Greek letter mu (“myou”). μ is synonymous with “population mean”.

16 16 Multiply each value by its probability. Sum the products.
x     p(x)   x × p(x)
0     0.20   0 × 0.20 = 0.00
1     0.30   1 × 0.30 = 0.30
2     0.20   2 × 0.20 = 0.40
3     0.15   3 × 0.15 = 0.45
4     0.10   4 × 0.10 = 0.40
5     0.05   5 × 0.05 = 0.25
SUM   1.00   1.80
Mean μ = 1.80
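The multiply-then-sum recipe translates directly into code. This is a minimal sketch; `distribution_mean` is a hypothetical helper name, and the distribution is stored as a dictionary mapping each value x to p(x).

```python
def distribution_mean(dist):
    """Mean of a discrete distribution: sum of x * p(x) over all values x."""
    return sum(x * p for x, p in dist.items())

# Credit card distribution from the slides
credit_cards = {0: 0.20, 1: 0.30, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

# Show each product, as in the table, then the total
for x, p in credit_cards.items():
    print(f"{x} x {p:.2f} = {x * p:.2f}")

mu = distribution_mean(credit_cards)
print(f"mean = {mu:.2f}")
```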

17 17 The standard deviation σ of a probability distribution is the standard deviation of the values observed for all possible outcomes of the phenomenon. Formula: σ = √( Σ (x – μ)² × p(x) ), where σ is the Greek letter “sigma”. σ denotes “population standard deviation”.

18 18 First obtain the variance.
x     p(x)   (x – μ)² × p(x)
0     0.20   (0 – 1.8)² × 0.20 = 0.648
1     0.30   (1 – 1.8)² × 0.30 = 0.192
2     0.20   (2 – 1.8)² × 0.20 = 0.008
3     0.15   (3 – 1.8)² × 0.15 = 0.216
4     0.10   (4 – 1.8)² × 0.10 = 0.484
5     0.05   (5 – 1.8)² × 0.05 = 0.512
σ² = 2.060 (take the square root to obtain σ)
σ = 1.44
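The variance table can be reproduced in a few lines. This is a sketch under the same dictionary representation as before; `distribution_variance` is a hypothetical name.

```python
import math

def distribution_variance(dist):
    """Variance of a discrete distribution: sum of (x - mu)^2 * p(x)."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Credit card distribution from the slides
credit_cards = {0: 0.20, 1: 0.30, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

variance = distribution_variance(credit_cards)
sd = math.sqrt(variance)  # the standard deviation is the square root of the variance
print(f"variance = {variance:.3f}, SD = {sd:.2f}")
```

The output should agree with the slide: a variance of 2.060 and an SD of 1.44.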

19 19 Idealized data set, n = 100:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5
Mean = 1.80, SD = 1.44. Mean – SD = 0.36, Mean + SD = 3.24. The values between these bounds are the 1s, 2s and 3s: 65 / 100 = 65%.

20 20 Mean = 1.80, SD = 1.44. Mean – SD = 0.36, Mean + SD = 3.24. The values within these bounds are 1, 2 and 3: 0.30 + 0.20 + 0.15 = 0.65.
x     0     1     2     3     4     5
p(x)  0.20  0.30  0.20  0.15  0.10  0.05
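The probability of landing within one standard deviation of the mean can be computed by summing p(x) over the values between the two bounds. This is a sketch; `prob_within_sd` is a hypothetical name, and it works with the unrounded SD, so the bounds differ from the rounded slide values only past the second decimal place.

```python
import math

def prob_within_sd(dist, k=1):
    """Return (low, high, probability) for the interval mean +/- k standard deviations."""
    mu = sum(x * p for x, p in dist.items())
    sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))
    low, high = mu - k * sd, mu + k * sd
    prob = sum(p for x, p in dist.items() if low <= x <= high)
    return low, high, prob

# Credit card distribution from the slides
credit_cards = {0: 0.20, 1: 0.30, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

low, high, prob = prob_within_sd(credit_cards)
print(f"bounds: {low:.2f} to {high:.2f}, probability = {prob:.2f}")
```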

21 21 x = # children in randomly selected college student’s family.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

22 22 x = # children in randomly selected college student’s family. p(1) = 0.2194: 21.94% of all college students come from a one-child family.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

23 23 Guess at mean? Above 2 (right skew ⇒ mean > mode).
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

24 24 To determine the mean, multiply values by probabilities, x × p(x), and sum these. 55/10 = 5.50 is not the mean. 1.000/10 = 0.10 is not the mean.
x     p(x)     x × p(x)
1     0.2194   1(0.2194) = 0.2194
2     0.2806   2(0.2806) = 0.5612
3     0.2329   3(0.2329) = 0.6987
4     0.1442   0.5768
5     0.0736   0.3680
6     0.0317   0.1902
7     0.0124   0.0868
8     0.0043   0.0344
9     0.0005   0.0045
10    0.0003   0.0030
SUM   55       1.0000   2.7430
Mean: μ = 2.7430

25 25 To determine the variance, multiply squared deviations from the mean by probabilities, (x – μ)² × p(x), and sum these.
x     p(x)     (x – μ)² × p(x)
1     0.2194   (1 – 2.743)² × 0.2194 = 0.6665
2     0.2806   (2 – 2.743)² × 0.2806 = 0.1549
3     0.2329   (3 – 2.743)² × 0.2329 = 0.0154
4     0.1442   0.2278
5     0.0736   0.3749
6     0.0317   0.3363
7     0.0124   0.2247
8     0.0043   0.1188
9     0.0005   0.0196
10    0.0003   0.0158
SUM   55       1.0000   2.1548
Variance: σ² = 2.1548

26 26 The standard deviation is the square root of the variance. Examining the data set consisting of # of children in the family recorded for all students: The mean is 2.743; the standard deviation is 1.468.
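The family-size calculation can be reproduced end to end with a short script. This is a sketch; the variable names are illustrative. Note that the tabulated probabilities sum to 0.9999 rather than exactly 1, which is the rounding error the earlier slides allow for.

```python
import math

# x = number of children in a randomly selected college student's family
children = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0736,
            6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}

mu = sum(x * p for x, p in children.items())                    # sum of x * p(x)
variance = sum((x - mu) ** 2 * p for x, p in children.items())  # sum of (x - mu)^2 * p(x)
sd = math.sqrt(variance)

print(f"mean = {mu:.4f}")
print(f"variance = {variance:.4f}")
print(f"SD = {sd:.3f}")
```

The results should match the slides: mean 2.7430, variance 2.1548, SD 1.468.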

27 27 Determine the probability a student is from a family with more than 5 children: P(x > 5).
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

32 32 Determine the probability a student is from a family with more than 5 children. P(x > 5) = 0.0317 + 0.0124 + 0.0043 + 0.0005 + 0.0003
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

33 33 Determine the probability a student is from a family with more than 5 children. P(x > 5) = 0.0317 + 0.0124 + 0.0043 + 0.0005 + 0.0003 = 0.0492
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

34 34 Determine the probability a student is from a family with more than 5 children. P(x > 5) = 0.0492. 4.92% of all college students come from families with more than 5 children (they have 5 or more brothers and sisters).
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

35 35 Determine the probability a student is from a family with at most 3 children. P(x ≤ 3) = 0.2194 + 0.2806 + 0.2329 = 0.7329
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

36 36 Determine the probability a student is from a family with at least 7 children. P(x ≥ 7) = 0.0124 + 0.0043 + 0.0005 + 0.0003 = 0.0175. Good idea: take the reciprocal of a small probability. 1/0.0175 = 57.1, so about 1 in 57 students.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

37 37 Determine the probability a student is from a family with fewer than 5 children. P(x < 5) = 0.2194 + 0.2806 + 0.2329 + 0.1442 = 0.8771
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

38 38 Equivalent phrasings:
at most 3                  at least 7
less than or equal to 3    greater than or equal to 7
no more than 3             no fewer/less than 7
x ≤ 3                      x ≥ 7
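Each phrasing maps to a comparison operator, and the probability is the sum of p(x) over the values that satisfy it. The sketch below recomputes the four probabilities from the preceding slides; the helper name `prob` is hypothetical.

```python
# x = number of children in a randomly selected college student's family
children = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0736,
            6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}

def prob(dist, condition):
    """Sum p(x) over every value x for which condition(x) is true."""
    return sum(p for x, p in dist.items() if condition(x))

more_than_5 = prob(children, lambda x: x > 5)    # "more than 5"
at_most_3 = prob(children, lambda x: x <= 3)     # "at most 3", "no more than 3"
at_least_7 = prob(children, lambda x: x >= 7)    # "at least 7", "no fewer than 7"
fewer_than_5 = prob(children, lambda x: x < 5)   # "fewer than 5"

print(f"P(x > 5)  = {more_than_5:.4f}")
print(f"P(x <= 3) = {at_most_3:.4f}")
print(f"P(x >= 7) = {at_least_7:.4f} (about 1 in {1 / at_least_7:.0f})")
print(f"P(x < 5)  = {fewer_than_5:.4f}")
```

The strict and non-strict comparisons differ only in whether the boundary value is included, which is exactly the distinction between "more than 5" and "at least 5".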

39 39 Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. Guess? 0.68
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

40 40 Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. Mean μ = 2.743, SD σ = 1.468. 1 SD below the mean: 2.743 – 1.468 = 1.275. 1 SD above the mean: 2.743 + 1.468 = 4.211.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

41 41 Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. 1 SD below the mean = 1.275. 1 SD above the mean = 4.211. Values are within 1 SD of the mean if they are between these.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

43 43 Determine the probability that the number of children in a student’s family falls within 1 standard deviation of the mean. 1 SD below the mean = 1.275. 1 SD above the mean = 4.211. Values are within 1 SD of the mean if they are between these (here x = 2, 3 or 4). The probability of being between these: 0.2806 + 0.2329 + 0.1442 = 0.6577
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

44 44 Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Guess? 0.95. 2 SD below the mean: 1.275 – 1.468 = –0.193. 2 SD above the mean: 4.211 + 1.468 = 5.679. Between –0.193 and 5.679.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

45 45 Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679. (Equivalent to 5 or fewer.)
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

46 46 Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679. (Equivalent to 5 or fewer.) We know an outcome more than 5 has probability 0.0492.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

47 47 Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679. (Equivalent to 5 or fewer.) We know an outcome more than 5 has probability 0.0492. The probability of an outcome at most 5 is 1 – 0.0492 = 0.9508.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003

48 48 Determine the probability that the number of children in a student’s family falls within 2 standard deviations of the mean. Between –0.193 and 5.679: probability 0.9508.
x     1       2       3       4       5       6       7       8       9       10
p(x)  0.2194  0.2806  0.2329  0.1442  0.0736  0.0317  0.0124  0.0043  0.0005  0.0003
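The complement shortcut used for the 2 SD interval can be sketched as follows. The function name `prob_within_k_sd` is hypothetical, and the mean and SD are taken as the rounded values from the slides. The sketch also computes the direct sum, which comes out 0.0001 lower because the tabulated probabilities total 0.9999.

```python
# x = number of children in a randomly selected college student's family
children = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0736,
            6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}

MU = 2.743     # mean, as reported on the slides
SIGMA = 1.468  # standard deviation, as reported on the slides

def prob_within_k_sd(dist, mu, sigma, k):
    """Return (low, high, probability by complement, probability by direct sum)."""
    low, high = mu - k * sigma, mu + k * sigma
    outside = sum(p for x, p in dist.items() if x < low or x > high)
    inside = sum(p for x, p in dist.items() if low <= x <= high)
    return low, high, 1 - outside, inside

low, high, by_complement, by_direct_sum = prob_within_k_sd(children, MU, SIGMA, 2)
print(f"bounds: {low:.3f} to {high:.3f}")
print(f"1 - P(outside) = {by_complement:.4f}")
print(f"direct sum     = {by_direct_sum:.4f}")
```

Either route is acceptable; the complement is simply less arithmetic when few values lie outside the interval.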

49 49 Pollutant Particles in Streamwater. A company monitors pollutants downstream of discharge into a stream. Data were collected on 200 days from a point 1 mile downstream of the plant on Stream A. Data were collected on 100 days from a point 1 mile downstream of the plant on Stream B.

50 50 How do means compare? (What are the means?) How do SDs compare? (What are the SDs?)

51 51 Similar Means. Similar Standard Deviations. (Similar everything except ns.)

53 53 Stream B: Mean = 1.775, SD = 1.242. Stream A: Mean = 1.770, SD = 1.340.

54 54 Here is the probability distribution for the number of diners seated at a table in a small café. a) Fill in the blank.
x     1     2     3     4
p(x)  0.10  0.20  ____  0.40

55 55 Here is the probability distribution for the number of diners seated at a table in a small café. a) Fill in the blank: p(3) = 0.30.
x     1     2     3     4
p(x)  0.10  0.20  0.30  0.40

56 56 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Start by computing x × p(x) for each row.
x     1     2     3     4
p(x)  0.10  0.20  0.30  0.40

57 57 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Start by computing x × p(x) for each row.
x   p(x)   x × p(x)
1   0.10
2   0.20
3   0.30
4   0.40

58 58 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Start by computing x × p(x) for each row.
x   p(x)   x × p(x)
1   0.10   1 × 0.10 = 0.10
2   0.20
3   0.30
4   0.40

59 59 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Start by computing x × p(x) for each row.
x   p(x)   x × p(x)
1   0.10   1 × 0.10 = 0.10
2   0.20   2 × 0.20 = 0.40
3   0.30
4   0.40

60 60 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Start by computing x × p(x) for each row.
x   p(x)   x × p(x)
1   0.10   1 × 0.10 = 0.10
2   0.20   2 × 0.20 = 0.40
3   0.30   3 × 0.30 = 0.90
4   0.40   4 × 0.40 = 1.60

61 61 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Sum these.
x   p(x)   x × p(x)
1   0.10   1 × 0.10 = 0.10
2   0.20   2 × 0.20 = 0.40
3   0.30   3 × 0.30 = 0.90
4   0.40   4 × 0.40 = 1.60

62 62 Here is the probability distribution for the number of diners seated at a table in a small café. b) Determine the mean μ. Sum these.
x   p(x)   x × p(x)
1   0.10   1 × 0.10 = 0.10
2   0.20   2 × 0.20 = 0.40
3   0.30   3 × 0.30 = 0.90
4   0.40   4 × 0.40 = 1.60
μ = 3.00

63 63 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Start by computing (x – μ)² × p(x) for each row.
x     1     2     3     4
p(x)  0.10  0.20  0.30  0.40

64 64 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Start by computing (x – μ)² × p(x) for each row. μ = 3
x     1     2     3     4
p(x)  0.10  0.20  0.30  0.40

65 65 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Start by computing (x – 3)² × p(x) for each row. μ = 3
x   p(x)   (x – 3)² × p(x)
1   0.10
2   0.20
3   0.30
4   0.40

66 66 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Start by computing (x – 3)² × p(x) for each row. μ = 3
x   p(x)   (x – 3)² × p(x)
1   0.10   (1 – 3)² × 0.10 = 0.40
2   0.20
3   0.30
4   0.40

67 67 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Start by computing (x – 3)² × p(x) for each row. μ = 3
x   p(x)   (x – 3)² × p(x)
1   0.10   (1 – 3)² × 0.10 = 0.40
2   0.20   (2 – 3)² × 0.20 = 0.20
3   0.30
4   0.40

68 68 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Start by computing (x – 3)² × p(x) for each row. μ = 3
x   p(x)   (x – 3)² × p(x)
1   0.10   (1 – 3)² × 0.10 = 0.40
2   0.20   (2 – 3)² × 0.20 = 0.20
3   0.30   (3 – 3)² × 0.30 = 0.00
4   0.40   (4 – 3)² × 0.40 = 0.40

69 69 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Sum these.
x   p(x)   (x – 3)² × p(x)
1   0.10   (1 – 3)² × 0.10 = 0.40
2   0.20   (2 – 3)² × 0.20 = 0.20
3   0.30   (3 – 3)² × 0.30 = 0.00
4   0.40   (4 – 3)² × 0.40 = 0.40

70 70 Here is the probability distribution for the number of diners seated at a table in a small café. c) Determine the standard deviation σ. Sum these.
x   p(x)   (x – 3)² × p(x)
1   0.10   (1 – 3)² × 0.10 = 0.40
2   0.20   (2 – 3)² × 0.20 = 0.20
3   0.30   (3 – 3)² × 0.30 = 0.00
4   0.40   (4 – 3)² × 0.40 = 0.40
Variance = 1.00. SD: σ = 1.00
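The whole café exercise (fill in the blank, then the mean, then the standard deviation) can be checked with a short script. This is a sketch with illustrative variable names; the blank is recovered from the requirement that the probabilities sum to 1.

```python
import math

# Known probabilities for the number of diners at a table; p(3) is the blank
known = {1: 0.10, 2: 0.20, 4: 0.40}

# a) The probabilities must sum to 1, so the blank is whatever is left over
p3 = 1 - sum(known.values())
diners = dict(known)
diners[3] = p3

# b) Mean: sum of x * p(x)
mu = sum(x * p for x, p in diners.items())

# Standard deviation: square root of the sum of (x - mu)^2 * p(x)
variance = sum((x - mu) ** 2 * p for x, p in diners.items())
sd = math.sqrt(variance)

print(f"p(3) = {p3:.2f}")
print(f"mean = {mu:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

The values are rounded for display because floating-point arithmetic may leave the blank a hair away from 0.30.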

71 71 Application [Optional]. This framework makes it possible to obtain fairly good approximations to means and standard deviations from a histogram of continuous data.

72 72 Example. Here are waiting times between student arrivals in a class. There are 21 students (20 waits). Approximate the mean and median. How do they compare?

73 73 Example: Mean. For each class, determine its frequency and corresponding midpoint. For the first class: Frequency = 10, Midpoint = 5.

74 74 Example: Mean. Tabulate frequencies and midpoints.
Midpoint   Frequency
5          10

75 75 Example: Mean. Tabulate frequencies and midpoints.
Midpoint   Frequency
5          10
15         5
25         3
35         1
45         1
Total      20

76 76 Example: Mean. Obtain relative frequencies.
Midpoint   Frequency   Relative Frequency
5          10          10/20 = 0.50
15         5
25         3
35         1
45         1
Total      20

77 77 Example: Mean. Obtain relative frequencies.
Midpoint   Frequency   Relative Frequency
5          10          10/20 = 0.50
15         5           5/20 = 0.25
25         3           3/20 = 0.15
35         1           1/20 = 0.05
45         1           1/20 = 0.05
Total      20          1.00

78 78 Example: Mean. Proceed with the formula.
Midpoint   Rel Freq   Product
5          0.50       5(0.50) = 2.50
15         0.25
25         0.15
35         0.05
45         0.05
Total      1.00

79 79 Example: Mean. Proceed as a discrete population distribution.
Midpoint   Rel Freq   Product
5          0.50       5(0.50) = 2.50
15         0.25       15(0.25) = 3.75
25         0.15       25(0.15) = 3.75
35         0.05       35(0.05) = 1.75
45         0.05       45(0.05) = 2.25
Total      1.00

80 80 Example: Mean. Proceed as a discrete population distribution.
Midpoint   Rel Freq   Product
5          0.50       5(0.50) = 2.50
15         0.25       15(0.25) = 3.75
25         0.15       25(0.15) = 3.75
35         0.05       35(0.05) = 1.75
45         0.05       45(0.05) = 2.25
Total      1.00       14.00
Mean ≈ 14.00
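The grouped-data approximation treats each class midpoint as a value and each relative frequency as its probability. The sketch below applies that idea to the frequency table; the variable names are illustrative. The standard deviation is approximated the same way, using the discrete-distribution formula from earlier in the deck.

```python
import math

# Class midpoints and frequencies read off the histogram of waiting times
frequencies = {5: 10, 15: 5, 25: 3, 35: 1, 45: 1}

n = sum(frequencies.values())  # 20 waits in total
rel_freq = {mid: f / n for mid, f in frequencies.items()}

# Treat (midpoint, relative frequency) as (x, p(x)) in a discrete distribution
approx_mean = sum(mid * p for mid, p in rel_freq.items())
approx_var = sum((mid - approx_mean) ** 2 * p for mid, p in rel_freq.items())
approx_sd = math.sqrt(approx_var)

print(f"approximate mean = {approx_mean:.2f}")
print(f"approximate SD   = {approx_sd:.1f}")
```

Under these inputs the approximate mean is 14.00 and the approximate SD is a little above 11, in line with the approximations quoted on the slides.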

81 81 Example: Median. Find the value with 50% below and 50% above.

82 82 Example: Median. Obtain relative frequencies.
Midpoint   Rel Freq
5          0.50
15         0.25
25         0.15
35         0.05
45         0.05
Total      1.00

83 83 Example: Median. Find the value with 50% below and 50% above. 10 of 20 = 50% of the waits fall below 10, so Median ≈ 10.00. Mean ≈ 14.00. Range ≈ 44. S.D. ≈ 11.

84 84 Example: Data / Exact Values.
Data: 1.3 1.9 1.9 2.5 2.6 3.0 3.6 3.7 5.9 9.7 10.4 10.6 11.2 13.5 15.9 21.4 27.5 29.8 33.6 43.5
Approximations     Actual Values
Median ≈ 10.0      Median = ____
Mean ≈ 14.0        Mean = ____
Range ≈ 44         Range = ____
SD ≈ 11            SD = ____

85 85 Example: Data / Exact Values.
Data: 1.3 1.9 1.9 2.5 2.6 3.0 3.6 3.7 5.9 9.7 10.4 10.6 11.2 13.5 15.9 21.4 27.5 29.8 33.6 43.5
Approximations     Actual Values
Median ≈ 10.0      Median = 10.05
Mean ≈ 14.0        Mean = 12.68
Range ≈ 44         Range = 42.2
SD ≈ 11            SD = 12.31
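The exact values can be recomputed from the 20 raw waiting times with Python's standard `statistics` module. This is a sketch; one assumption worth flagging is that the slide's SD of 12.31 corresponds to the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
import statistics

# The 20 waiting times listed on the slide
waits = [1.3, 1.9, 1.9, 2.5, 2.6, 3.0, 3.6, 3.7, 5.9, 9.7,
         10.4, 10.6, 11.2, 13.5, 15.9, 21.4, 27.5, 29.8, 33.6, 43.5]

median = statistics.median(waits)      # average of the 10th and 11th sorted values
mean = statistics.mean(waits)
data_range = max(waits) - min(waits)
sample_sd = statistics.stdev(waits)    # sample SD: divides by n - 1

print(f"median = {median:.2f}")
print(f"mean   = {mean:.3f}")
print(f"range  = {data_range:.1f}")
print(f"SD     = {sample_sd:.2f}")
```

Comparing these with the histogram-based approximations shows how much accuracy is lost by replacing every observation with its class midpoint.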

