Engineering Statistics Chapter 3 Distribution of Samples Distribution of sample statistics 3C - Proportions and Difference between Proportions.

Proportion of a property
When a sample is collected in relation to a property, it is important to know whether the observed proportion is reasonable. For example, when we interview a group of people for work, we would like to know whether the proportions of candidates by gender, age, race, etc. are typical. The proportion of a property in a sample depends strongly on the sample size. In small samples, an unusual proportion is not surprising. As the sample size increases, we expect the sample proportion to be closer to that of the population.

Distribution of proportions
If the proportion of a property in a population is π, and we take samples of size n, then the sample proportion p is expected to follow approximately the normal distribution, with mean π and variance π(1 – π)/n. As can be seen, the variance decreases as the sample size increases. When n is large, we would expect the proportion of the property in the sample to be very close to that of the population.
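To illustrate how the spread of the sample proportion shrinks as n grows, here is a minimal Python sketch. The function name proportion_sd and the value π = 0.3 are illustrative choices, not part of the original slides.

```python
import math

def proportion_sd(pi: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# pi = 0.3 is an arbitrary illustrative population proportion.
for n in (10, 40, 160, 640):
    print(f"n = {n:4d}  sd = {proportion_sd(0.3, n):.4f}")
```

Each fourfold increase in n halves the standard deviation, because n enters the formula under a square root.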

Need for Continuity Adjustment
Since the proportion is based on a ratio m/n, the value of m will be an integer. In order to avoid bias in obtaining the correct proportion, it is necessary to introduce a correction of ½ unit. This is the same as the continuity correction in discrete-to-continuous approximation. Thus we shall treat
p > m/n as p > (m + ½)/n,
p ≥ m/n as p ≥ (m – ½)/n,
p < m/n as p < (m – ½)/n, and
p ≤ m/n as p ≤ (m + ½)/n.

Example 1
30% of customers at a fast-food restaurant are old folks who are given discounts. During a short period, the restaurant serves 40 customers. What is the probability that the percentage of old folks is not more than 25%?
Solution: p ~ N(0.3, 0.3×(1–0.3)/40).
P(p ≤ 0.25) ⇒ P(p ≤ 0.25 + 0.5/40) [Continuity adjustment]
= P(z ≤ [0.2625 – 0.3]/√0.00525)
= P(z ≤ –0.52)
= 0.5 – 0.1985 = 0.3015.

Example 2
A furniture factory claims that less than 12% of its executive chairs have defects. An office has just ordered 25 such chairs. What is the probability that the percentage of defects exceeds 15%?
Solution: p ~ N(0.12, 0.12×(1–0.12)/25).
P(p > 0.15) ⇒ P(p > 0.15 + 0.5/25) [Continuity adjustment]
= P(z > [0.17 – 0.12]/√0.004224)
= P(z > 0.77)
= 0.5 – 0.2794 = 0.2206.
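The calculations in Examples 1 and 2 can be sketched in Python using only the standard library. This is an illustrative sketch, not the slides' own method of working: the function names are hypothetical, and the code uses the exact normal CDF rather than a z-table, so the results differ slightly from the table-based answers, which round z to two decimals.

```python
import math
from statistics import NormalDist

def prob_at_most(pi: float, n: int, threshold: float) -> float:
    """P(p <= threshold), shifting the threshold up by 0.5/n (continuity adjustment)."""
    sd = math.sqrt(pi * (1 - pi) / n)
    return NormalDist(pi, sd).cdf(threshold + 0.5 / n)

def prob_greater(pi: float, n: int, threshold: float) -> float:
    """P(p > threshold), shifting the threshold up by 0.5/n (continuity adjustment)."""
    sd = math.sqrt(pi * (1 - pi) / n)
    return 1 - NormalDist(pi, sd).cdf(threshold + 0.5 / n)

ex1 = prob_at_most(0.30, 40, 0.25)   # Example 1: table-based answer is 0.3015
ex2 = prob_greater(0.12, 25, 0.15)   # Example 2: table-based answer is 0.2206
print(f"Example 1: {ex1:.4f}")
print(f"Example 2: {ex2:.4f}")
```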

Example 3
It is estimated that 65% of students in the Faculty of Education (FoE) are ladies. A class in FoE has 120 students. What is the probability that the proportion of ladies in the class exceeds 70%?
Solution: Let p represent the proportion of ladies. Then p ~ N(0.65, [0.65×(1–0.65)]/120).
After continuity correction, P(p > 0.70) ⇒ P(p > 0.70 + 0.5/120)
= P(z > [0.7042 – 0.65]/√(0.65×0.35/120))
= P(z > 1.24)
= 0.5 – 0.3925 = 0.1075.

Alternative: Binomial distribution
We note that the same question can be solved using the binomial distribution as follows. Let X represent the number of ladies. Then X ~ Bin(120, 0.65). As n > 30, X is approximated by the normal distribution: X ~ N(120×0.65, 120×0.65×0.35). 70% of 120 is 84, so we are looking for P(X > 84). By continuity adjustment, we have
P(X > 84.5) = P(z > [84.5 – 78]/√27.3)
= P(z > 1.24) = 0.1075, as we obtained earlier.
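As a rough check on the normal approximation in Example 3, the exact binomial tail probability can be summed directly. The sketch below is illustrative only: the function names are hypothetical, and the exact sum is an addition for comparison, not something the slides compute.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(n: int, pi: float, k: int) -> float:
    """Exact P(X > k) for X ~ Bin(n, pi), summed term by term."""
    return sum(
        math.comb(n, x) * pi**x * (1 - pi) ** (n - x)
        for x in range(k + 1, n + 1)
    )

def normal_upper_tail(n: int, pi: float, k: int) -> float:
    """Normal approximation to P(X > k) with the +0.5 continuity adjustment."""
    mean = n * pi
    sd = math.sqrt(n * pi * (1 - pi))
    return 1 - NormalDist(mean, sd).cdf(k + 0.5)

exact = binomial_upper_tail(120, 0.65, 84)
approx = normal_upper_tail(120, 0.65, 84)
print(f"exact binomial:       {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The two values should be close for a sample of this size; the approximation is generally less reliable for small n or for proportions near 0 or 1.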

Example 4
18% of students withdraw half-way through a course. In a class with 45 students, what is the probability that less than 15% will withdraw?
Solution: p ~ N(0.18, 0.18×(1–0.18)/45).
After continuity adjustment, the event p < 0.15 ⇒ p < 0.15 – 0.5/45.
P(p < 0.1389) = P(z < [0.1389 – 0.18]/√0.00328)
= P(z < –0.72)
= 0.5 – 0.2642 = 0.2358.

Binomial Alternative
Let W represent the number of students who withdraw. Then W ~ Bin(45, 0.18). 15% of 45 is 6.75, so the event is W < 6.75. Even though the number here is a decimal, we still need to make the same continuity adjustment. Thus we look for W < 6.75 – 0.5. As n > 30, we use the approximation W ~ N(45×0.18, 45×0.18×0.82).
P(W < 6.25) = P(z < [6.25 – 8.1]/√6.642)
= P(z < –0.72) = 0.2358, as found above.

Difference between proportions
The same rules on the distribution of the difference between means apply to the difference between proportions. Thus if π_1 and π_2 are the proportions of the same property in two populations, and we take samples of sizes n_1 and n_2 from those two populations respectively, then we expect the difference p_1 – p_2 between the sample proportions to satisfy
p_1 – p_2 ~ N(π_1 – π_2, π_1(1 – π_1)/n_1 + π_2(1 – π_2)/n_2).

Example 5
In the 1985 cohort, it is known that 20% of non-graduates and 14% of graduates remain unemployed 6 months after coming on to the market. A survey tracks 80 non-graduates and 50 graduates of the cohort. Find the probability that the percentage of non-graduates who remain unemployed exceeds that of graduates by at least 10%.

Solution: Let p_n represent the proportion of unemployed non-graduates and p_g that of unemployed graduates.
p_n – p_g ~ N(0.20 – 0.14, 0.2×0.8/80 + 0.14×0.86/50).
P(p_n – p_g > 0.1) = P(z > [0.1 – 0.06]/√(0.2×0.8/80 + 0.14×0.86/50))
= P(z > 0.60)
= 0.5 – 0.2257 = 0.2743.

Example 6
The Transport Ministry believes that 35% of express buses exceed speed limits on the highway. On a certain day, two teams track express buses going in opposite directions. The team for north-bound traffic monitors 60 buses, while the south-bound team has 75 buses on record. What is the probability that the percentage of speeding buses for north-bound exceeds that of south-bound by at least 4%?
Solution: Let p_n represent the proportion of north-bound buses which speed, and p_s the same proportion for south-bound buses.

p_n – p_s ~ N(0.35 – 0.35, 0.35×0.65/60 + 0.35×0.65/75).
P(p_n – p_s > 0.04) = P(z > [0.04 – 0]/√(0.35×0.65/60 + 0.35×0.65/75))
= P(z > 0.48)
= 0.5 – 0.1844 = 0.3156.
So there is a probability of 0.3156 that the north-bound speeding percentage exceeds that of south-bound by 4% or more. Note that in this case, we also have the same probability 0.3156 that the proportion of south-bound speeders exceeds that of north-bound by 4% or more!
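The probabilities in Examples 5 and 6 can be reproduced with a short Python sketch. The function name is hypothetical, and the code uses the exact normal CDF instead of a z-table, so the results agree with the table-based answers only to about two or three decimal places. Following the worked solutions, no continuity adjustment is applied to the difference.

```python
import math
from statistics import NormalDist

def prob_diff_exceeds(pi1: float, n1: int, pi2: float, n2: int, d: float) -> float:
    """P(p1 - p2 > d) when p1 - p2 ~ N(pi1 - pi2, pi1(1-pi1)/n1 + pi2(1-pi2)/n2)."""
    variance = pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2
    return 1 - NormalDist(pi1 - pi2, math.sqrt(variance)).cdf(d)

ex5 = prob_diff_exceeds(0.20, 80, 0.14, 50, 0.10)        # table-based answer: 0.2743
ex6_north = prob_diff_exceeds(0.35, 60, 0.35, 75, 0.04)  # table-based answer: 0.3156
ex6_south = prob_diff_exceeds(0.35, 75, 0.35, 60, 0.04)  # roles of the two teams swapped
print(f"Example 5: {ex5:.4f}")
print(f"Example 6: north over south {ex6_north:.4f}, south over north {ex6_south:.4f}")
```

The last two values coincide because the population proportions are equal, so the difference is centred on zero and the variance is unchanged when the samples are swapped.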

Confidence Interval for Proportion
When we have the proportion of a property from the population, we expect the proportion for a sample to follow the normal distribution. Hence, we may apply the same procedure to estimate the (1 – α)100% confidence interval as for the mean. We shall use two examples to illustrate the method.

Example 7
The Tourism Department report says that 32% of tourists are foreigners. A group of 150 tourists is visiting the Royal Museum. What is the 95% confidence interval for the percentage of foreign tourists?
Solution: p ~ N(0.32, 0.32×0.68/150), i.e. p ~ N(0.32, 0.001451).
At 95% confidence, α = 0.05, α/2 = 0.025, and z_0.025 = 1.96.
Hence the 95% confidence interval for the proportion of foreign tourists is
0.32 – 1.96×√0.001451 ≤ p ≤ 0.32 + 1.96×√0.001451
⇒ 0.2453 ≤ p ≤ 0.3947
⇒ 24.53% to 39.47% of the tourists are foreigners.

Example 8
The records of a bank show that 17% of its customers are business customers, but the transactions for this group make up 75% of all transactions. During a certain hour, there were 50 customers and 400 transactions. Find the 90% confidence interval for the percentage of (i) business customers; (ii) business transactions.

Solution: p_1 = proportion of business customers; p_2 = proportion of business transactions.
p_1 ~ N(0.17, 0.17×0.83/50); p_2 ~ N(0.75, 0.75×0.25/400).
At 90% confidence, α = 0.1, α/2 = 0.05, and z_0.05 = 1.6449.
The confidence intervals are:
0.17 – 1.6449×√0.002822 ≤ p_1 ≤ 0.17 + 1.6449×√0.002822 ⇒ 0.0826 ≤ p_1 ≤ 0.2574; and
0.75 – 1.6449×√0.00046875 ≤ p_2 ≤ 0.75 + 1.6449×√0.00046875 ⇒ 0.7144 ≤ p_2 ≤ 0.7856.
Hence the range is 8.26% to 25.74% for business customers, and 71.44% to 78.56% for business transactions.
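The interval calculations in Examples 7 and 8 follow one pattern, sketched below in Python. The function name is hypothetical; the critical value is taken from the standard library's inverse normal CDF rather than a table, and the 95% level for Example 7 is the one used in its worked solution.

```python
import math
from statistics import NormalDist

def proportion_interval(pi: float, n: int, confidence: float) -> tuple[float, float]:
    """(1 - alpha)100% interval pi -/+ z_{alpha/2} * sqrt(pi(1 - pi)/n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for 95%
    half_width = z * math.sqrt(pi * (1 - pi) / n)
    return pi - half_width, pi + half_width

ex7 = proportion_interval(0.32, 150, 0.95)   # Example 7
ex8_customers = proportion_interval(0.17, 50, 0.90)      # Example 8 (i)
ex8_transactions = proportion_interval(0.75, 400, 0.90)  # Example 8 (ii)
for label, (lo, hi) in [("Ex 7", ex7), ("Ex 8(i)", ex8_customers), ("Ex 8(ii)", ex8_transactions)]:
    print(f"{label}: {lo:.4f} to {hi:.4f}")
```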

Confidence Interval From Sample
When the proportions are derived from sample data, we expect that the same normal distribution can be used to model the population proportion, using the sample proportion as the estimator. For such purposes, we expect the result to be good only if the sample size is reasonably large. For small samples, the sample proportion is not a reliable guide to the population proportion.

Example 9
In a survey on the cleanliness of eating stalls, it was found that only 55 out of 140 stalls checked follow proper procedures to maintain hygienic environments. Based on this, estimate the 95% confidence interval for the percentage of clean eating stalls nationwide.

Solution: Even though only the sample data are available, we can safely assume that the proportion from such a big sample is a good estimator for the wider proportion. Hence we shall use the normal distribution to estimate the proportion for the nation:
p ~ N(55/140, [55/140×85/140]/140).
At 95%, α = 0.05, α/2 = 0.025, and z_0.025 = 1.96.
So the 95% interval for the population proportion of clean eateries is
55/140 – 1.96×√([55/140×85/140]/140) to 55/140 + 1.96×√([55/140×85/140]/140)
⇒ 0.3120 ≤ p ≤ 0.4738, or 31.2% to 47.38%.
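When only sample counts are available, as in Example 9, the same interval can be computed directly from the counts. The sketch below is illustrative; the function name and its argument order are assumptions, not part of the original slides.

```python
import math
from statistics import NormalDist

def interval_from_counts(successes: int, n: int, confidence: float) -> tuple[float, float]:
    """Normal-approximation interval using the sample proportion as the estimator."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = interval_from_counts(55, 140, 0.95)  # Example 9: 55 clean stalls out of 140
print(f"{100 * low:.2f}% to {100 * high:.2f}%")
```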

Example 10
During a screening process, it was found that 20 out of 80 boys aged 15-18 and 30 out of 100 girls of the same age group are fat. Based on this study, find the probability that the proportion of fat girls exceeds that of boys by 2% or more.
NOTE: In this case, we only have the sample proportions. However, as the sample sizes are large enough, we can use these data to project the likely distribution of the difference of proportions.

Solution: Note that 20/80 = 0.25 and 30/100 = 0.3.
p_b ~ N(0.25, 0.25×0.75/80); p_g ~ N(0.3, 0.3×0.7/100);
p_g – p_b ~ N(0.3 – 0.25, 0.25×0.75/80 + 0.3×0.7/100).
P(p_g – p_b > 0.02) = P(z > [0.02 – 0.05]/√(0.25×0.75/80 + 0.3×0.7/100))
= P(z > –0.45)
= 0.5 + 0.1736 = 0.6736.

Difference of proportions
Using the distribution for the difference between proportions, we can find the probability for the difference between proportions (Examples 11 and 12). When the sample sizes are large, we can also use the sample proportions to estimate the interval for the difference between population proportions. The same procedure is used to determine the confidence interval for the difference in proportions (Example 13).

Example 11
On average, 37% of men and 18% of women in the country smoke. A survey is taken of 50 men and 60 women. What is the probability that the proportion of men who smoke exceeds that of women by at least 20%?
Solution: Let p_m and p_w represent the proportions of men and women who smoke. Then p_m ~ N(0.37, 0.37×0.63/50); p_w ~ N(0.18, 0.18×0.82/60).

Example 11 (Solution)
This means that p_m – p_w ~ N(0.37 – 0.18, 0.37×0.63/50 + 0.18×0.82/60). So
P(p_m ≥ p_w + 0.20) = P(p_m – p_w ≥ 0.20)
= P(z ≥ [0.20 – 0.19]/√(0.37×0.63/50 + 0.18×0.82/60))
= P(z ≥ 0.12)
= 0.5 – 0.0478 = 0.4522.

Example 12
65% of those achieving good results in the STPM exam and 55% of those in the Matriculation exam get admitted to universities of their choice. A check is made on 72 students successful at STPM and 45 of those at Matriculation. What is the probability that the success rate in university admission for those through Matriculation is at least as good as for those through STPM?
Solution: Let p_s be the proportion of STPM candidates who are successful and p_m that of Matriculation candidates.

Example 12 (Solution)
Then we have: p_s ~ N(0.65, 0.65×0.35/72); p_m ~ N(0.55, 0.55×0.45/45).
And so p_m – p_s ~ N(0.55 – 0.65, 0.65×0.35/72 + 0.55×0.45/45). Hence
P(p_m ≥ p_s) = P(p_m – p_s ≥ 0.0)
= P(z ≥ [0.0 – (–0.10)]/√(0.65×0.35/72 + 0.55×0.45/45))
= P(z ≥ 1.07)
= 0.5 – 0.3577 = 0.1423.

Example 13
Out of 75 sticks of LajuMaut cigarettes, 20 are found to have nicotine exceeding danger levels. For 60 sticks of CepatMaut cigarettes, 15 are also found to have nicotine exceeding danger levels. What is the 90% confidence interval of p_L – p_C, where p_L and p_C represent the proportions of LajuMaut and CepatMaut cigarettes with excessive levels of nicotine?

From the data given, p_L = 20/75 = 0.2667, and p_C = 15/60 = 0.25.
By theory, p_L – p_C ~ N(0.2667 – 0.25, 0.2667×0.7333÷75 + 0.25×0.75÷60).
At 90% confidence, α = 0.1, α/2 = 0.05, and z_0.05 = 1.6449.
Hence the confidence interval for the difference in proportions is from
0.0167 – 1.6449×√(0.2667×0.7333÷75 + 0.25×0.75÷60) to 0.0167 + 1.6449×√(0.2667×0.7333÷75 + 0.25×0.75÷60),
i.e. –0.1078 to 0.1412.
NOTE: The left boundary –0.1078 indicates that p_L may actually be less than p_C.
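The interval in Example 13 can be sketched in Python from the raw counts. The function name is hypothetical, and because the code carries full precision instead of rounding the proportions to four decimals, the last digit may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist

def diff_interval(x1: int, n1: int, x2: int, n2: int, confidence: float) -> tuple[float, float]:
    """Normal-approximation interval for p1 - p2 using sample proportions x1/n1 and x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width

low, high = diff_interval(20, 75, 15, 60, 0.90)  # LajuMaut vs CepatMaut
print(f"{low:.4f} to {high:.4f}")
print("interval contains zero:", low < 0 < high)
```

An interval that contains zero, as here, means the data do not rule out either brand having the higher proportion.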

Multiple Groups
When we want to compare the proportions of multiple (3 or more) groups in a population, the method using the normal distribution becomes ineffective. An alternative is to use the differences between what is expected and what is obtained and treat them as variations. The sum of squares of the differences can be modeled using the χ² distribution. However, as χ² distribution tables do not provide for probabilities, we shall only look at these cases in hypothesis testing. (See 4C).
