# Sta220 - Statistics Mr. Smith Room 310 Class #14.


Section 4.6 and 4.8

Section 4.6

We learn how to make inferences about the population on the basis of information contained in the sample. Several of these techniques are based on the assumption that the population is approximately normally distributed. It is therefore important to determine whether the sample data come from a normal population before we can apply these techniques properly.

Example 4.24 The EPA mileage ratings on 100 cars are reproduced in the following table. Numerical and graphical descriptive measures for the data are shown on the StatCrunch and SPSS printouts presented in Figure 4.26. Determine whether the EPA mileage ratings are from an approximately normal distribution.

Summary statistics:

#1: Histogram or Stem-and-leaf Display. Clearly, the mileages fall into an approximately mound-shaped, symmetric distribution centered around the mean of about 37 mpg. Therefore, check #1 in the box indicates that the data are approximately normal.

#2: Compute the Intervals. Find the percentage of measurements that fall within 1, 2, and 3 standard deviations of the mean. These percentages agree almost exactly with those from a normal distribution (approximately 68%, 95%, and 99.7%).

#3: Ratio IQR/s. Since the value is approximately equal to 1.3, we have further confirmation that the data are approximately normal.
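Checks #2 and #3 can be sketched in a few lines of Python. This is only an illustration: the EPA table is not reproduced here, so the data below are simulated stand-ins, and the function name `normality_checks` and the standard deviation of 2.4 are assumptions made for the example.

```python
import random
import statistics


def normality_checks(data):
    """Return the interval percentages (check #2) and the ratio IQR/s (check #3)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation

    # Check #2: percentage of measurements within k standard deviations of the mean.
    pct_within = {}
    for k in (1, 2, 3):
        inside = sum(1 for x in data if mean - k * s <= x <= mean + k * s)
        pct_within[k] = 100 * inside / len(data)

    # Check #3: interquartile range divided by s (about 1.3 for normal data).
    q1, _, q3 = statistics.quantiles(data, n=4)
    return pct_within, (q3 - q1) / s


# Hypothetical stand-in data: 100 simulated values, NOT the EPA mileage ratings.
rng = random.Random(220)
sample = [rng.gauss(37, 2.4) for _ in range(100)]
pct_within, iqr_over_s = normality_checks(sample)
print(pct_within, round(iqr_over_s, 2))
```

For roughly normal data, the three percentages should land near 68, 95, and 99.7, and the ratio near 1.3. Different software computes quartiles slightly differently, so small discrepancies in IQR/s are expected.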

SPSS normal probability plot for gas mileage data. An SPSS normal probability plot of the mileage data is shown in Figure 4.26. Notice that the ordered mileage values fall reasonably close to a straight line when plotted against the expected values from a normal distribution. This suggests that the EPA mileage data are approximately normally distributed. Copyright © 2013 Pearson Education, Inc. All rights reserved.
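The idea behind a normal probability plot can be sketched without plotting software: pair the ordered data with expected normal scores and measure how close the pairs are to a straight line. The sketch below assumes Blom's plotting positions, (i - 0.375)/(n + 0.25), which is one common convention. The function names and the simulated data are hypothetical, not the SPSS output or the EPA ratings.

```python
import random
from statistics import NormalDist, mean


def normal_scores(n):
    """Expected z-scores for n ordered observations (Blom plotting positions, an assumption)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def plot_correlation(data):
    """Correlation between the ordered data and their expected normal scores."""
    xs = normal_scores(len(data))
    ys = sorted(data)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: one roughly normal sample and one strongly skewed sample.
rng = random.Random(14)
normal_like = [rng.gauss(37, 2.4) for _ in range(100)]
skewed = [rng.expovariate(1 / 37) for _ in range(100)]
print(round(plot_correlation(normal_like), 3), round(plot_correlation(skewed), 3))
```

A correlation close to 1 corresponds to points lying near a straight line on the plot. Like the other checks, this is descriptive evidence rather than proof of normality.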

Conclusion: The checks for normality are simple, yet powerful, techniques to apply, but they are only descriptive in nature. Thus, we should be careful not to claim that the 100 EPA mileage ratings are, in fact, normally distributed. We can only state that it is reasonable to believe that the data are from a normal distribution.

Section 4.8

In previous sections, we assumed that we knew the probability distribution of a random variable, and using this knowledge, we were able to compute the mean, variance, and probabilities associated with the random variable. However, in most practical applications, the true mean and standard deviation are unknown quantities that have to be estimated.

We will often use the information contained in these sample statistics to make inferences about the parameters of a population.

Table 4.8: Note that the term statistic refers to a sample quantity and the term parameter refers to a population quantity.

Before being able to use the sample statistics to make inferences about population parameters, we need to be able to evaluate their properties. Does one sample statistic contain more information than another about a population parameter? On what basis should we choose the ‘best’ statistic for making inferences about a parameter?

This illustrates an important point: Neither the sample mean nor the sample median will always fall closer to the population mean. We cannot compare these two sample statistics or, in general, any two sample statistics on the basis of their performance with a single sample. Instead, we recognize that sample statistics are themselves random variables, because different samples can lead to different values of a sample statistic.

Last, as random variables, sample statistics must be judged and compared on the basis of their probability distributions. The probability distribution of a statistic is the collection of its values and their associated probabilities that would be obtained if the sampling experiment were repeated a VERY LARGE NUMBER OF TIMES.

In actual practice, the sampling distribution of a statistic is obtained mathematically or (at least approximately) by simulating the sampling on a computer, using a procedure similar to that just described.
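A minimal sketch of such a simulation is shown here. It assumes a hypothetical normal population with mean 50 and standard deviation 10 and samples of size 11. None of these values come from the slides, and the function name is made up for the illustration.

```python
import random
import statistics


def simulate_sampling_distributions(population_draw, n, reps, seed=0):
    """Draw `reps` samples of size `n`; record each sample's mean and median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [population_draw(rng) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians


# Hypothetical population: normal with mean 50 and standard deviation 10.
means, medians = simulate_sampling_distributions(
    lambda rng: rng.gauss(50, 10), n=11, reps=5000
)
print("spread of sample means:  ", round(statistics.stdev(means), 2))
print("spread of sample medians:", round(statistics.stdev(medians), 2))
```

For this particular population, the simulated sample means cluster more tightly around 50 than the sample medians do. That kind of comparison requires the whole sampling distribution and cannot be made from a single sample.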

Say that you have two statistics, A and B, for estimating the same parameter, and the following graph represents their sampling distributions. Which would you prefer and why?

Example 4.26 Consider the popular casino game of craps, in which a player throws two dice and bets on the outcome (the sum total of the dots showing on the upper faces of the two dice). Let’s say that if the sum total of the dice is 7 or 11, the roller wins \$5; if the total is 2, 3, or 12, the roller loses \$5; and for any other total (4, 5, 6, 8, 9, or 10) no money is lost or won on the roll. Let x represent the result of the come-out roll wager (-\$5, \$0, or +\$5). The actual probability distribution of x is:

| Outcome of wager, x | -5 | 0 | 5 |
| --- | --- | --- | --- |
| p(x) | 1/9 | 6/9 | 2/9 |
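The probabilities in this distribution can be checked by listing all 36 equally likely outcomes of two fair dice. The sketch below is one way to do that. The function name `wager` is an assumption for the example, and the mean of x is computed only as a by-product.

```python
from fractions import Fraction
from itertools import product


def wager(total):
    """Payoff in dollars for a come-out roll with the given dice total."""
    if total in (7, 11):
        return 5
    if total in (2, 3, 12):
        return -5
    return 0


# Each of the 36 ordered (die 1, die 2) outcomes has probability 1/36.
p = {-5: Fraction(0), 0: Fraction(0), 5: Fraction(0)}
for d1, d2 in product(range(1, 7), repeat=2):
    p[wager(d1 + d2)] += Fraction(1, 36)

# Mean of x: the sum of each value times its probability.
mu = sum(x * px for x, px in p.items())

for x, px in p.items():
    print(x, px)
print("mean of x:", mu)
```

`Fraction` reduces to lowest terms, so p(0) prints as 2/3, which is the same value as 6/9.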

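Sampling distribution of the sample mean (n = 3 rolls), possible values: -5, -3.33, -1.67, 0, 1.67, 3.33, 5

Sampling distribution of the sample median M (n = 3 rolls), possible values: -5, 0, 5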

Once again, the most likely median outcome after 3 randomly selected come-out rolls is M = \$0, a result that occurs with probability 0.8395.
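The probability 0.8395 can be reproduced by enumerating every possible sample of three rolls. The sketch below assumes the three rolls are independent and uses the probability distribution of x from Example 4.26. The variable names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from statistics import median

# Probability distribution of x from Example 4.26.
p = {-5: Fraction(1, 9), 0: Fraction(6, 9), 5: Fraction(2, 9)}

mean_dist = defaultdict(Fraction)    # sampling distribution of the sample mean
median_dist = defaultdict(Fraction)  # sampling distribution of the sample median

# Enumerate all 3 * 3 * 3 = 27 ordered samples of three independent rolls.
for sample in product(p, repeat=3):
    prob = p[sample[0]] * p[sample[1]] * p[sample[2]]
    mean_dist[Fraction(sum(sample), 3)] += prob
    median_dist[median(sample)] += prob

for m in sorted(median_dist):
    print("median", m, "probability", round(float(median_dist[m]), 4))
for xbar in sorted(mean_dist):
    print("mean", round(float(xbar), 2), "probability", round(float(mean_dist[xbar]), 4))
```

The median equals 0 with probability 612/729, which rounds to 0.8395. The sample mean takes seven possible values, from -5 to 5 in steps of 5/3.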

The preceding example demonstrates the procedure for finding the exact sampling distribution of a statistic when the number of different samples that could be selected from the population is relatively small. In the real world, populations often consist of a large number of different values, making the possible samples difficult to enumerate. When this occurs, we may choose to obtain the approximate sampling distribution of a statistic by simulating the sampling over and over again and recording the proportion of times different values of the statistic occur.
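As a rough illustration of this simulation approach, the craps wager can be simulated directly and the proportions compared with the exact result. The function names, the seed, and the number of repetitions are arbitrary choices for the sketch.

```python
import random
from collections import Counter
from statistics import median


def come_out_wager(rng):
    """Simulate one come-out roll of two fair dice and return the payoff in dollars."""
    total = rng.randint(1, 6) + rng.randint(1, 6)
    if total in (7, 11):
        return 5
    if total in (2, 3, 12):
        return -5
    return 0


def simulate_median_distribution(n_rolls=3, reps=100_000, seed=0):
    """Approximate the sampling distribution of the median by repeated sampling."""
    rng = random.Random(seed)
    counts = Counter(
        median([come_out_wager(rng) for _ in range(n_rolls)]) for _ in range(reps)
    )
    # Proportion of simulated samples in which each median value occurred.
    return {m: counts[m] / reps for m in sorted(counts)}


print(simulate_median_distribution())
```

With enough repetitions, the proportion for a median of 0 should settle close to the exact probability of 0.8395. The same procedure works for populations where exact enumeration is impractical.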

4.8 homework due Wednesday; 4.9 notes on Monday; Chapter 4 test next Thursday.