We learn how to make inferences about the population on the basis of information contained in the sample. Several of these techniques are based on the assumption that the population is approximately normally distributed. It will be important to determine whether the sample data come from a normal population before we can apply these techniques properly.
Example 4.24 The EPA mileage ratings on 100 cars are reproduced in the following table. Numerical and graphical descriptive measures for the data are shown on the StatCrunch and SPSS printouts presented in Figure 4.26. Determine whether the EPA mileage ratings are from an approximately normal distribution.
#1: Histogram or Stem-and-leaf Display Clearly, the mileages fall into an approximately mound-shaped, symmetric distribution centered around the mean of about 37 mpg. Therefore, check #1 in the box indicates that the data are approximately normal.
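#2: Compute the Intervals Compute the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s, and find the percentage of measurements that fall in each. These percentages agree almost exactly with those from a normal distribution (approximately 68%, 95%, and 99.7%).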
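Check #2 can be sketched in a few lines of Python. This is only an illustration: the 100 EPA mileage ratings are not reproduced in these notes, so the data below are simulated stand-in values, and the helper name `share_within` and the standard deviation of 2.4 used to generate them are assumptions.

```python
import random
import statistics

def share_within(data, k):
    """Proportion of observations within k sample standard deviations of the sample mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    lower, upper = mean - k * s, mean + k * s
    return sum(lower <= x <= upper for x in data) / len(data)

# Hypothetical stand-in data, NOT the actual EPA mileage ratings.
# The center of 37 follows the example; the spread of 2.4 is assumed.
rng = random.Random(0)
sample = [rng.gauss(37, 2.4) for _ in range(100)]

# Compare the observed proportions with the normal benchmarks.
for k, benchmark in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(f"within {k}s: {share_within(sample, k):.2f} (normal: about {benchmark})")
```

With real data, replace `sample` with the list of measurements and compare the three proportions with the benchmarks.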
#3: Ratio IQR/s Since the value is approximately equal to 1.3, we have further confirmation that the data are approximately normal.
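A minimal sketch of check #3, again using simulated stand-in data rather than the EPA ratings. The function name `iqr_over_s` is hypothetical, and `statistics.quantiles` uses one particular quartile convention, so the ratio may differ slightly from what StatCrunch or SPSS reports.

```python
import random
import statistics

def iqr_over_s(data):
    """Ratio of the interquartile range to the sample standard deviation."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
    return (q3 - q1) / statistics.stdev(data)

# Hypothetical stand-in data, NOT the actual EPA mileage ratings.
rng = random.Random(0)
sample = [rng.gauss(37, 2.4) for _ in range(100)]

# A ratio near 1.3 is consistent with an approximately normal distribution.
print(f"IQR/s = {iqr_over_s(sample):.2f}")
```

For comparison, evenly spread (uniform-like) data give a ratio near 1.7, so a value far from 1.3 is a warning sign.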
Conclusion The checks for normality are simple, yet powerful, techniques to apply, but they are only descriptive in nature. Thus, we should be careful not to claim that the 100 EPA mileage ratings are, in fact, normally distributed. We can only state that it is reasonable to believe that the data are from a normal distribution.
In previous sections, we assumed that we knew the probability distribution of a random variable, and using this knowledge, we were able to compute the mean, variance, and probabilities associated with the random variable. However, in most practical applications, the true mean and standard deviation are unknown quantities that have to be estimated.
Before being able to use the sample statistics to make inferences about population parameters, we need to be able to evaluate their properties. Does one sample statistic contain more information than another about a population parameter? On what basis should we choose the ‘best’ statistic for making inferences about a parameter?
This illustrates an important point: Neither the sample mean nor the sample median will always fall closer to the population mean. We cannot compare these two sample statistics or, in general, any two sample statistics on the basis of their performance with a single sample. We recognize that sample statistics are themselves random variables, because different samples can lead to different values for a sample statistic.
Last, as random variables, sample statistics must be judged and compared on the basis of their probability distributions. This means the collection of values and associated probabilities of each statistic that would be obtained if the sampling experiment were repeated a VERY LARGE NUMBER OF TIMES.
In actual practice, the sampling distribution of a statistic is obtained mathematically or (at least approximately) by simulating the sampling on a computer, using a procedure similar to that just described.
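The simulation idea can be sketched as follows. The population here is an assumed standard normal distribution, and the sample size of 11, the 5000 repetitions, and the helper name `simulate_statistic` are all illustrative choices rather than values from the notes.

```python
import random
import statistics

def simulate_statistic(statistic, n, reps, rng, mu=0.0, sigma=1.0):
    """Draw `reps` samples of size n from a normal(mu, sigma) population
    and return the value of `statistic` computed on each sample."""
    return [statistic([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(42)
means = simulate_statistic(statistics.mean, n=11, reps=5000, rng=rng)
medians = simulate_statistic(statistics.median, n=11, reps=5000, rng=rng)

# The spread of each list approximates the spread of that statistic's sampling distribution.
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sd of sample medians:", round(statistics.stdev(medians), 3))
```

Comparing the two spreads judges the statistics by their sampling distributions instead of by a single sample. For a normal population, the sample mean should show the smaller spread.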
Example 4.26 Consider the popular casino game of craps, in which a player throws two dice and bets on the outcome (the sum total of the dots showing on the upper faces of the two dice). Let’s say that if the sum total of the dice is 7 or 11, the roller wins $5; if the total is 2, 3, or 12, the roller loses $5; and for any other total (4, 5, 6, 8, 9, or 10) no money is lost or won on the roll. Let x represent the result of the come-out roll wager (-$5, $0, or +$5). The actual probability distribution of x is shown in the following table:
Outcome of Wager, x    -5     0     5
p(x)                   1/9    6/9   2/9
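The probabilities in the table can be verified by listing all 36 equally likely outcomes of two fair dice. The sketch below does this; the function name `payoff` is a label chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def payoff(total):
    """Wager result in dollars for a come-out roll with the given dice total."""
    if total in (7, 11):
        return 5
    if total in (2, 3, 12):
        return -5
    return 0

# Enumerate all 36 equally likely (die 1, die 2) outcomes and tally the payoffs.
counts = Counter(payoff(a + b) for a, b in product(range(1, 7), repeat=2))
p = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, px in p.items():
    print(f"x = {x:>2}: p(x) = {px}")
```

Fractions are printed in lowest terms, so 6/9 appears as 2/3.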
Although this example demonstrates the procedure for finding the exact sampling distribution of a statistic when the number of different samples that could be selected from the population is relatively small, in the real world populations often consist of a large number of different values, making samples difficult to count. When this occurs, we choose to obtain the approximate sampling distribution for a statistic by simulating the sampling over and over again and recording the proportion of times different values of the statistic occur.
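Both approaches can be compared on the craps wager from Example 4.26. The sketch below uses the sample mean as the statistic and assumes a sample of n = 3 wagers and 20,000 simulated samples; these choices are illustrative and not taken from the notes.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# Population distribution of the wager result x, from the table in Example 4.26.
values = [-5, 0, 5]
probs = [Fraction(1, 9), Fraction(6, 9), Fraction(2, 9)]
n = 3  # assumed sample size, kept small so every sample can be listed

# Exact: list all 3**n ordered samples and add up their probabilities.
exact = {}
for sample in product(range(len(values)), repeat=n):
    prob = Fraction(1)
    for i in sample:
        prob *= probs[i]
    xbar = Fraction(sum(values[i] for i in sample), n)
    exact[xbar] = exact.get(xbar, 0) + prob

# Approximate: repeat the sampling many times and record the proportions.
rng = random.Random(7)
reps = 20000
weights = [float(p) for p in probs]
sim = Counter(
    Fraction(sum(rng.choices(values, weights=weights, k=n)), n) for _ in range(reps)
)

for xbar in sorted(exact):
    print(f"{float(xbar):6.2f}  exact={float(exact[xbar]):.4f}  simulated={sim[xbar] / reps:.4f}")
```

With only 27 ordered samples the exact listing is easy. The simulated proportions should land close to the exact probabilities, which is what justifies relying on simulation when the samples are too numerous to list.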
4.8 Homework due Wednesday 4.9 Notes on Monday Chapter 4 Test Next Thursday