3 2010 #1 Agricultural experts are trying to develop a bird deterrent to reduce costly damage to crops in the United States. An experiment is to be conducted using garlic oil to study its effectiveness as a nontoxic, environmentally safe bird repellant. The experiment will use European starlings, a bird that causes considerable damage annually to the corn crop in the United States. Food granules made from corn are to be infused with garlic oil in each of five concentrations of garlic – 0 percent, 2 percent, 10 percent, 25 percent, and 50 percent. The researchers will determine the adverse reaction of the birds to the repellent by measuring the number of food granules consumed during a two-hour period following overnight food deprivation. There are forty birds available for the experiment, and the researchers will use eight birds for each concentration of garlic. Each bird will be kept in a separate cage and provided with the same number of food granules. a) For the experiment, identify i. the treatments ii. the experimental units iii. the response that will be measured
4 i. The treatments are the different concentrations of garlic in the food granules. Specifically, there are five treatments: 0 percent, 2 percent, 10 percent, 25 percent and 50 percent. ii. The experimental units are the birds (starlings), each placed in an individual cage. iii. The response is the number of food granules consumed by the bird.
5 After performing the experiment, the researchers recorded the data shown in the table below. i. Construct a graph of the data that could be used to investigate the appropriateness of a linear regression model for analyzing the results of the experiment.
7 ii. Based on your graph, do you think a linear regression model is appropriate? Explain. The curved pattern in this scatterplot reveals that a linear regression model would not be appropriate for modeling the relationship between these variables.
9 2003B #2 A simple random sample of adults living in a suburb of a large city was selected. The age and annual income of each adult in the sample were recorded. The resulting data are summarized in the table below.
Age Category | $25,000-$35,000 | $35,001-$50,000 | Over $50,000 | Total
21-30 | 8 | 15 | 27 | 50
31-45 | 22 | 32 | 35 | 89
46-60 | 12 | 14 | 27 | 53
Over 60 | 5 | 3 | 7 | 15
Total | 47 | 64 | 96 | 207
(The three income columns are categories of annual income.)
10 a) What is the probability that a person chosen at random from those in this sample will be in the age category?
b) What is the probability that a person chosen at random from those in this sample whose incomes are over $50,000 will be in the age category? Show your work.
11 c) Based on your answers to parts (a) and (b), is annual income independent of age category for those in this sample? Explain. If annual income and age were independent, the probabilities in (a) and (b) would be equal. Since these probabilities are not equal, annual income and age category are not independent for adults in this sample.
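The comparison in part (c) can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `compare` are assumptions, and the comparison is run for every age category because the check works the same way whichever row is asked about.

```python
# Counts per age category: ($25,000-$35,000, $35,001-$50,000, over $50,000).
# The 46-60 / over-$50,000 count (27) is the value implied by that row's total of 53.
counts = {
    "21-30": (8, 15, 27),
    "31-45": (22, 32, 35),
    "46-60": (12, 14, 27),
    "Over 60": (5, 3, 7),
}

grand_total = sum(sum(row) for row in counts.values())     # all adults in the sample
over_50k_total = sum(row[2] for row in counts.values())    # adults with income over $50,000


def compare(age):
    """Return (P(age), P(age | income over $50,000)) for one age category."""
    row = counts[age]
    return sum(row) / grand_total, row[2] / over_50k_total


for age in counts:
    p_age, p_age_given_high = compare(age)
    print(f"{age:8s} P(age) = {p_age:.4f}   P(age | over $50,000) = {p_age_given_high:.4f}")
```

If income and age category were independent in this sample, the two printed probabilities would match in every row.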
12 2009B #2 The ELISA tests whether a patient has contracted HIV. The ELISA is said to be positive if it indicates that HIV is present in a blood sample, and the ELISA is said to be negative if it does not indicate that HIV is present in a blood sample. Instead of directly measuring the presence of HIV, the ELISA measures levels of antibodies in the blood that should be elevated if HIV is present. Because of variability in antibody levels among human patients, the ELISA does not always indicate the correct result. As part of a training program, staff at a testing lab applied the ELISA to 500 blood samples known to contain HIV. The ELISA was positive for 489 of those blood samples and negative for the other 11 samples. As a part of the same training program, the staff also applied the ELISA to 500 other blood samples known to not contain HIV. The ELISA was positive for 37 of those blood samples and negative for the other 463 samples.
13 a) When a new blood sample arrives at the lab, it will be tested to determine whether HIV is present. Using the data from the training program, estimate the probability that the ELISA would be positive when it is applied to a blood sample that does not contain HIV. The estimated probability of a positive ELISA if the blood sample does not have HIV present is 37/500 = 0.074.
14 b) Among the blood samples examined in the training program that provided positive ELISA results for HIV, what proportion actually contained HIV? A total of 489 + 37 = 526 blood samples resulted in a positive ELISA. Of these, 489 samples actually contained HIV. Therefore the proportion of samples that resulted in a positive ELISA that actually contained HIV is 489/526 ≈ 0.9297.
15 c) When a blood sample yields a positive ELISA result, two more ELISAs are performed on the same blood sample. If at least one of the two additional ELISAs is positive, the blood sample is subjected to a more expensive and more accurate test to make a definitive determination of whether HIV is present in the sample. Repeated ELISAs on the same sample are generally assumed to be independent. Under the assumption of independence, what is the probability that a new blood sample that comes into the lab will be subjected to the more expensive test if that sample does not contain HIV?
16 From part (a), the probability that the ELISA will be positive, given that the blood sample does not actually have HIV present, is 0.074. Thus, the probability of a negative ELISA, given that the blood sample does not actually have HIV present, is 1 – 0.074 = 0.926.
P(new blood sample that does not contain HIV will be subjected to the more expensive test)
= P(1st ELISA positive and 2nd ELISA positive OR 1st ELISA positive and 2nd ELISA negative and 3rd ELISA positive | HIV not present in blood)
= P(1st ELISA positive and 2nd ELISA positive | HIV not present in blood) + P(1st ELISA positive and 2nd ELISA negative and 3rd ELISA positive | HIV not present in blood)
= (0.074)(0.074) + (0.074)(0.926)(0.074) = 0.005476 + 0.005071 ≈ 0.0105
17 OR P(new blood sample that does not contain HIV will be subjected to the more expensive test) = P(1st ELISA positive and not both the 2nd and 3rd are negative) = (0.074)(1 – 0.926²) = (0.074)(0.142524) ≈ 0.0105
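As a quick check that the two routes to the answer agree, the calculation can be sketched in Python. The variable names are illustrative assumptions; the only inputs are the 37 positive results among the 500 HIV-free training samples.

```python
# Estimated false-positive rate: 37 positives out of 500 samples known not to contain HIV.
p_pos = 37 / 500
p_neg = 1 - p_pos

# Route 1: add the two disjoint ways of reaching the more expensive test
# (positive, positive) or (positive, negative, positive).
route_sum = p_pos * p_pos + p_pos * p_neg * p_pos

# Route 2: first ELISA positive, and not both follow-up ELISAs negative.
complement = p_pos * (1 - p_neg ** 2)

print(round(route_sum, 4), round(complement, 4))
```

Both expressions rely on the stated assumption that repeated ELISAs on the same sample are independent.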
19 2002 #3 There are 4 runners on the New High School team. The team is planning to participate in a race in which each runner runs a mile. The team time is the sum of the individual times for the 4 runners. Assume that individual times of the 4 runners are all independent of each other. The individual times, in minutes, of the runners in similar races are approximately normally distributed with the following means and standard deviations.
Runner | Mean | Standard Deviation
Runner 1 | 4.9 | 0.15
Runner 2 | 4.7 | 0.16
Runner 3 | 4.5 | 0.14
Runner 4 | 4.8 |
20 a) Runner 3 thinks that he can run a mile in less than 4.2 minutes in the next race. Is this likely to happen? Explain. It is possible but unlikely that runner 3 will run a mile in less than 4.2 minutes in the next race. Based on his running time distribution, we would expect him to run faster than 4.2 minutes in fewer than 2 of every 100 races in the long run. OR It is possible but unlikely that runner 3 will run a mile in less than 4.2 minutes in the next race because 4.2 is more than 2 standard deviations below the mean: (4.2 – 4.5)/0.14 ≈ –2.14. Since the running time has a normal distribution, a time more than 2 standard deviations below the mean is unlikely.
21 b) The distribution of possible team times is approximately normal. What are the mean and standard deviation of this distribution? The runners' times are independent, therefore the means add and the variances add. The mean team time is 4.9 + 4.7 + 4.5 + 4.8 = 18.9 minutes, and the standard deviation of the team time is the square root of the sum of the four individual variances.
22 c) Suppose the team’s best time to date is 18.4 minutes. What is the probability that the team will beat its own best time in the next race?
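The steps in parts (a) through (c) can be sketched with Python's `statistics.NormalDist`. This is a rough sketch under one loud assumption: the standard deviation for Runner 4 does not appear in the table as reproduced here, so `RUNNER_4_SD = 0.15` is a placeholder chosen only so the code runs, and the printed team standard deviation and probability depend on it. The helper name `team_time_distribution` is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def team_time_distribution(means, sds):
    """Sum of independent normal times: the means add and the variances add."""
    return NormalDist(mu=sum(means), sigma=sqrt(sum(s ** 2 for s in sds)))


means = [4.9, 4.7, 4.5, 4.8]
RUNNER_4_SD = 0.15  # ASSUMED placeholder, not a value read from the table
sds = [0.15, 0.16, 0.14, RUNNER_4_SD]

runner_3 = NormalDist(mu=4.5, sigma=0.14)
team = team_time_distribution(means, sds)

print(f"P(runner 3 < 4.2)   = {runner_3.cdf(4.2):.4f}")   # part (a)
print(f"team mean           = {team.mean:.1f}")           # part (b)
print(f"team sd (assumed)   = {team.stdev:.4f}")          # part (b), depends on the placeholder
print(f"P(team time < 18.4) = {team.cdf(18.4):.4f}")      # part (c), depends on the placeholder
```

Replacing the placeholder with the value from the original table changes only the last two printed numbers; the mean of 18.9 minutes and the runner 3 probability do not depend on it.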
23 2010 #4 An automobile company wants to learn about customer satisfaction among the owners of five specific car models. Large sales volumes have been recorded for three of the models, but the other two models were recently introduced so their sales volumes are smaller. The number of new cars sold in the last six months for each of the models is shown in the table below. The company can obtain a list of all individuals who purchased new cars in the last six months for each of the five models shown in the table. The company wants to sample 2,000 of these owners.
Car Model | A | B | C | D | E | Total
Number of new cars sold in the last six months | 112,338 | 96,174 | 83,241 | 3,278 | 2,323 | 297,354
24 a) For simple random samples of 2,000 new car owners, what is the expected number of owners of model E and the standard deviation of the number of owners of model E? Because the population size is so large compared with the sample size (≈ 149 times the sample size), far greater than the usual standard of 10 or 20 times larger, we can use the binomial probability distribution even though this is technically sampling without replacement. The parameters of this binomial distribution are the sample size, n, which has a value of n = 2,000, and the proportion of new car buyers who bought model E, p, which has a value of p = 2,323/297,354 ≈ 0.0078. The expected value of the number of model E buyers in a simple random sample of 2,000 is therefore n × p = 2,000 × 0.0078 ≈ 15.62 (carrying the unrounded value of p). The variance is n × p × (1 – p) = 2,000 × 0.0078 × (1 – 0.0078) ≈ 15.50, so the standard deviation is the square root of 15.50, which is ≈ 3.94.
25 b) When selecting a simple random sample of 2,000 new car owners, how likely is it that fewer than 12 owners of model E would be included in the sample? Justify your answer. For the reason given in part (a), the binomial distribution with n = 2,000 and p ≈ 0.0078 can be used here. The probability that the sample would contain fewer than 12 owners of model E, P(X ≤ 11), is calculated from the binomial distribution to be approximately 0.147. This probability is small enough that the result (fewer than 12 owners of model E in the sample) is not likely, but this probability is also not small enough to consider the result very unlikely.
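The binomial calculations in parts (a) and (b) can be reproduced with the standard library alone. The sketch below is one possible way to do it; the helper name `binom_cdf` is an assumption, and it uses the unrounded p = 2,323/297,354 rather than 0.0078.

```python
from math import comb, sqrt

n = 2000
p = 2323 / 297354  # share of model E among all new cars sold

mean = n * p
sd = sqrt(n * p * (1 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


# "Fewer than 12" means at most 11 owners of model E in the sample.
p_fewer_than_12 = binom_cdf(11, n, p)

print(f"mean = {mean:.2f}, sd = {sd:.2f}, P(X < 12) = {p_fewer_than_12:.4f}")
```

Summing the probability mass function directly is adequate here because only twelve terms are needed.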
26 c) The company is concerned that a simple random sample of 2,000 owners would include fewer than 12 owners of Model D or fewer than 12 owners of Model E. Briefly describe a sampling method for randomly selecting 2,000 owners that will ensure at least 12 owners will be selected for each of the 5 car models. Stratified random sampling addresses the concern about the number of owners for models D and E. By stratifying on car model and then taking a simple random sample of at least 12 owners from the population of owners for each model, the company can ensure that at least 12 owners are included in the sample for each model while maintaining a total sample size of 2,000. For example, the company could select simple random samples of sizes 755, 647, 560, 22 and 16 for models A, B, C, D and E, respectively, to make the sample size approximately proportional to the size of the owner population for each model.
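One way to arrive at stratum sizes like those in the example is a largest-remainder proportional allocation, followed by a simple random sample within each stratum. The sketch below is an illustration only: the function names, the use of integer positions as owner IDs, and the fixed seed are all assumptions, not part of the original solution.

```python
import random

sold = {"A": 112_338, "B": 96_174, "C": 83_241, "D": 3_278, "E": 2_323}


def proportional_allocation(strata, n, minimum):
    """Largest-remainder allocation of n across strata, checked against a minimum."""
    total = sum(strata.values())
    sizes = {k: n * v // total for k, v in strata.items()}   # whole-number quotas
    leftover = n - sum(sizes.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(strata, key=lambda k: (n * strata[k]) % total, reverse=True)
    for k in by_remainder[:leftover]:
        sizes[k] += 1
    if min(sizes.values()) < minimum:
        raise ValueError("proportional allocation falls below the required minimum")
    return sizes


def stratified_sample(strata, sizes, seed=0):
    """Simple random sample of owner positions within each stratum (hypothetical ID scheme)."""
    rng = random.Random(seed)
    return {k: rng.sample(range(strata[k]), sizes[k]) for k in strata}


sizes = proportional_allocation(sold, n=2000, minimum=12)
sample = stratified_sample(sold, sizes)
print(sizes)
```

Any allocation that gives every model at least 12 owners and totals 2,000 would satisfy the requirement; proportional allocation is just one reasonable choice.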
28 2006 #3 The depth from the surface of Earth to a refracting layer beneath the surface can be estimated using methods developed by seismologists. One method is based on the time required for vibrations to travel from a distant explosion to a receiving point. The depth measurement (M) is the sum of the true depth (D) and the random measurement error (E). That is, M = D + E. The measurement error (E) is assumed to be normally distributed with mean 0 feet and standard deviation 1.5 feet. a) If the true depth at a certain point is 2 feet, what is the probability that the depth measurement will be negative? Since M = D + E (a normal random variable plus a constant is a normal random variable), we know that M is normally distributed with a mean of 2 feet and a standard deviation of 1.5 feet. Thus, P(M < 0) = P(Z < (0 – 2)/1.5) = P(Z < –1.33) ≈ 0.0918.
29 b) Suppose three independent depth measurements are taken at the point where the true depth is 2 feet. What is the probability that at least one of these measurements will be negative? P(at least one measurement < 0) = 1 – P(all three measurements ≥ 0) = 1 – (1 – 0.0918)³ = 1 – (0.9082)³ = 1 – 0.7491 = 0.2509
30 c) What is the probability that the mean of the three independent depth measurements taken at the point where the true depth is 2 feet will be negative? Let x̄ denote the mean of three independent depth measurements taken at a point where the true depth is 2 feet. Since each measurement comes from a normal distribution, the distribution of x̄ is normal with a mean of 2 feet and a standard deviation of 1.5/√3 ≈ 0.866 feet. Thus, P(x̄ < 0) = P(Z < (0 – 2)/0.866) = P(Z < –2.31) ≈ 0.0104.
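The three probabilities for this problem can be checked with `statistics.NormalDist`. The sketch below uses unrounded z-scores, so its results differ slightly in the third or fourth decimal place from values read off a z-table; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

true_depth, error_sd = 2.0, 1.5

# (a) A single measurement M = D + E is normal with mean 2 and sd 1.5.
single = NormalDist(mu=true_depth, sigma=error_sd)
p_negative = single.cdf(0)

# (b) At least one of three independent measurements is negative.
p_at_least_one = 1 - (1 - p_negative) ** 3

# (c) The mean of three measurements has sd 1.5 / sqrt(3).
mean_of_three = NormalDist(mu=true_depth, sigma=error_sd / sqrt(3))
p_mean_negative = mean_of_three.cdf(0)

print(f"(a) {p_negative:.4f}  (b) {p_at_least_one:.4f}  (c) {p_mean_negative:.4f}")
```

Averaging makes a negative result much less likely than a single reading, while taking three separate readings makes at least one negative value more likely.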
31 2009 #2 A tire manufacturer designed a new tread pattern for its all-weather tires. Repeated tests were conducted on cars of approximately the same weight traveling at 60 miles per hour. The tests showed that the new tread pattern enables the cars to stop completely in an average distance of 125 feet with a standard deviation of 6.5 feet and the stopping distances are approximately normally distributed. a) What is the 70th percentile of the distribution of stopping distances? Let X denote the stopping distance of a car with new tread tires where X is normally distributed with a mean of 125 feet and a standard deviation of 6.5 feet. The z-score corresponding to a cumulative probability of 70 percent is z = 0.52. Thus, the 70th percentile value can be computed as: x = μ + zσ = 125 + 0.52(6.5) ≈ 128.4 feet.
32 b) What is the probability that at least 2 cars out of 5 randomly selected cars in the study will stop in a distance that is greater than the distance calculated in part (a)? From part (a), it was found that a stopping distance of 128.4 feet has a cumulative probability of 0.70. Thus the probability of a stopping distance greater than 128.4 feet is 1 – 0.70 = 0.30. Let Y denote the number of cars with the new tread pattern out of five cars that stop in a distance greater than 128.4 feet. Y is a binomial random variable with n = 5 and p = 0.30. Thus, P(Y ≥ 2) = 1 – P(Y = 0) – P(Y = 1) = 1 – (0.70)⁵ – 5(0.30)(0.70)⁴ = 1 – 0.1681 – 0.3602 ≈ 0.4718.
33 c) What is the probability that a randomly selected sample of 5 cars in the study will have a mean stopping distance of at least 130 feet? Let x̄ denote the mean of the stopping distances of five randomly selected cars. All tires have the new tread pattern. Because the stopping distance for each of the five cars has a normal distribution, the distribution of x̄ is normal with a mean of 125 feet and a standard deviation of 6.5/√5 ≈ 2.907 feet. Thus, P(x̄ ≥ 130) = P(Z ≥ (130 – 125)/2.907) = P(Z ≥ 1.72) ≈ 0.0427.
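A short Python sketch of all three parts of the stopping-distance problem follows. It is meant only as a cross-check of the hand calculations: the variable names are assumptions, and `inv_cdf` returns an unrounded percentile rather than one based on z = 0.52.

```python
from math import comb, sqrt
from statistics import NormalDist

stopping = NormalDist(mu=125, sigma=6.5)

# (a) 70th percentile of a single stopping distance.
p70 = stopping.inv_cdf(0.70)

# (b) Each car exceeds the 70th percentile with probability 0.30;
#     P(Y >= 2) for Y ~ Binomial(5, 0.30) via the complement of Y = 0 or Y = 1.
p_exceed = 0.30
p_at_least_2 = 1 - sum(
    comb(5, k) * p_exceed ** k * (1 - p_exceed) ** (5 - k) for k in (0, 1)
)

# (c) The mean of 5 stopping distances has sd 6.5 / sqrt(5).
mean_of_5 = NormalDist(mu=125, sigma=6.5 / sqrt(5))
p_mean_ge_130 = 1 - mean_of_5.cdf(130)

print(f"(a) {p70:.1f} ft  (b) {p_at_least_2:.4f}  (c) {p_mean_ge_130:.4f}")
```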
35 2010 #3 A humane society wanted to estimate with 95 percent confidence the proportion of households in its county that own at least one dog. a) Interpret the 95 percent confidence level in this context. The 95 percent confidence level means that if one were to repeatedly take random samples of the same size from the population and construct a 95 percent confidence interval from each sample, then in the long run 95 percent of those intervals would succeed in capturing the actual value of the population proportion of households in the county that own at least one dog.
36 The humane society selected a random sample of households in its county and used the sample to estimate the proportion of households that own at least one dog. The conditions for calculating a 95 percent confidence interval for the proportion of households in this county that own at least one dog were checked and verified, and the resulting confidence interval was 0.417 ± 0.119. b) A national pet products association claimed that 39 percent of all American households owned at least one dog. Does the humane society’s interval estimate provide evidence that the proportion of dog owners in its county is different from the claimed national proportion? Explain. No. The 95 percent confidence interval 0.417 ± 0.119 is the interval (0.298, 0.536). This interval includes the value 0.39 as a plausible value for the population proportion of households in the county that own at least one dog. Therefore, the confidence interval does not provide evidence that the proportion of dog owners in this county is different from the claimed national proportion.
37 c) How many households were selected in the humane society’s sample? Show how you obtained your answer. The sample proportion is 0.417, and the margin of error is 0.119. Determining the sample size requires solving the equation 0.119 = 1.96 × √((0.417)(0.583)/n) for n. Thus, n = (1.96)²(0.417)(0.583)/(0.119)² ≈ 65.95, so the humane society must have selected 66 households for its sample.
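Solving the margin-of-error equation for n can be sketched as below. The function name `implied_sample_size` is an assumption, and the critical value comes from `NormalDist().inv_cdf` (about 1.95996) rather than the rounded 1.96, which does not change the rounded answer.

```python
from statistics import NormalDist


def implied_sample_size(p_hat, margin, confidence=0.95):
    """Solve margin = z * sqrt(p_hat * (1 - p_hat) / n) for n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p_hat * (1 - p_hat) / margin ** 2


n = implied_sample_size(p_hat=0.417, margin=0.119)
print(f"n is approximately {n:.2f}, so the sample had {round(n)} households")
```

The result is rounded to the nearest whole number because the goal is to recover the sample size actually used, not to plan a minimum sample size.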
39 2006B #6 Sunshine Farms wants to know whether there is a difference in consumer preference for two new juice products – Citrus Fresh and Tropical Taste. In an initial blind taste test, 8 randomly selected consumers were given unmarked samples of the two juices. The product that each consumer tasted first was randomly decided by the flip of a coin. After tasting the two juices, each consumer was asked to choose which juice he or she preferred, and the results were recorded. a) Let p represent the population proportion of consumers who prefer Citrus Fresh. In terms of p, state the hypotheses that Sunshine Farms is interested in testing. H0 : p = 0.5 versus Ha : p ≠ 0.5
40 b) One might consider using a one-proportion z-test to test the hypotheses in part (a). Explain why this would not be a reasonable procedure for this sample. The conditions for the large sample one-proportion z-test are not satisfied. np = n(1 – p) = 8 x 0.5 = 4 < 5.
41 c) Let X represent the number of consumers in the sample who prefer Citrus Fresh. Assuming there is no difference in consumer preference, find the probability for each possible value of X. Record the x-values and the corresponding probabilities in the table below. X will follow a binomial distribution with n = 8 and p = 0.5. The possible values of X and their corresponding probabilities are given in the table below:
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
P(X = x) | 0.00391 | 0.03125 | 0.10938 | 0.21875 | 0.27344 | 0.21875 | 0.10938 | 0.03125 | 0.00391
42 d) When testing the hypotheses in part (a), Sunshine Farms will conclude that there is a consumer preference if too many or too few individuals prefer Citrus Fresh. Based on your probabilities in part (c), is it possible for the significance level (probability of rejecting the null hypothesis when it is true) for this test to be exactly 0.05? Justify your answer. No, there is no possible test with a significance level of exactly 0.05. The probability that none of the individuals (X = 0) or all of the individuals (X = 8) prefer Citrus Fresh is 2 × 0.00391 = 0.00781, which is less than 0.05. The probability that one or fewer of the individuals (X ≤ 1) or seven or more of the individuals (X ≥ 7) prefer Citrus Fresh is 2 × (0.00391 + 0.03125) = 0.07031, which is greater than 0.05.
43 e) The preference data for the 8 randomly selected consumers are given in the table below. Based on these preferences and your previous work, test the hypotheses in part (a).
Individual | Juice Preference
1 | Tropical Taste
2 | Citrus Fresh
3 |
4 |
5 |
6 |
7 |
8 |
44 For the preference data provided, X = 2. From the table of binomial probabilities computed in part (c), the probability that two or fewer of the individuals (X ≤ 2) or six or more of the individuals (X ≥ 6) prefer Citrus Fresh when p = 0.5 is 2 × (0.00391 + 0.03125 + 0.10938) ≈ 0.2891. Because the p-value of 0.2891 is greater than any reasonable significance level, say α = 0.05, we would not reject the null hypothesis that p = 0.5. That is, we do not have statistically significant evidence for a consumer preference between Citrus Fresh and Tropical Taste.
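The exact binomial reasoning in parts (c) through (e) can be sketched in Python. The helper name `two_sided_tail` is an assumption; it returns the probability of a result at least as extreme as k in either direction when p = 0.5.

```python
from math import comb

n = 8
# Binomial(8, 0.5) probabilities for x = 0, 1, ..., 8.
pmf = [comb(n, x) / 2 ** n for x in range(n + 1)]


def two_sided_tail(k):
    """P(X <= k or X >= n - k) under H0: p = 0.5, for k < n / 2."""
    return sum(pmf[: k + 1]) + sum(pmf[n - k:])


for x, prob in enumerate(pmf):
    print(x, f"{prob:.5f}")

# Achievable significance levels for symmetric rejection regions (part d),
# and the p-value for the observed X = 2 (part e).
print([round(two_sided_tail(k), 5) for k in range(3)])
p_value = two_sided_tail(2)
```

Because X is discrete, the achievable significance levels jump from about 0.0078 to about 0.0703, which is why no symmetric rejection region gives exactly 0.05.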
45 f) Sunshine Farms plans to add one of these two new juices – Citrus Fresh or Tropical Taste – to its production schedule. A follow-up study will be conducted to decide which of the two juices to produce. Make one recommendation for the follow-up study that would make it better than the initial study. Provide a statistical justification for your recommendation in the context of the problem.
46 Increase the number of consumers involved in the preference test. More consumers will give you more data, and you will be better able to detect a difference between the population proportion of consumers who prefer Citrus Fresh and 0.5. The sample proportion in the initial study was only 0.25 (2/8), but we were not able to reject the null hypothesis that p = ½. With a larger number of consumers, a difference of that magnitude would allow the null hypothesis to be rejected. For example, with n = 80 and X = 20 the large sample z-statistic would be z = (0.25 – 0.5)/√((0.5)(0.5)/80) ≈ –4.47 and the p-value would be approximately zero.
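The effect of the larger sample can be illustrated with a small one-proportion z-test sketch. The function name `one_proportion_z` is an assumption, and the two-sided p-value matches the two-sided alternative from part (a).

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0=0.5):
    """Large-sample z-statistic and two-sided p-value for H0: p = p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = one_proportion_z(x=20, n=80)
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
```

With n = 80, the large-sample condition is comfortably met (80 × 0.5 = 40), unlike the initial study of 8 consumers.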