
1 Copyright © Cengage Learning. All rights reserved. 5 Joint Probability Distributions and Random Samples

2 Copyright © Cengage Learning. All rights reserved. 5.3 Statistics and Their Distributions

3 Consider selecting two different samples of size n from the same population distribution. The xi's in the second sample will virtually always differ at least a bit from those in the first sample. For example, a first sample of n = 3 cars of a particular type might result in fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a second sample may give x1 = 28.8, x2 = 30.0, and x3 = 32.5. Before we obtain data, there is uncertainty about the value of each xi.

4 Statistics and Their Distributions Because of this uncertainty, before the data becomes available we view each observation as a random variable and denote the sample by X1, X2, ..., Xn (uppercase letters for random variables). This variation in observed values in turn implies that the value of any function of the sample observations—such as the sample mean, sample standard deviation, or sample fourth spread—also varies from sample to sample. That is, prior to obtaining x1, ..., xn, there is uncertainty as to the value of x̄, the value of s, and so on.

5 Example 19 Suppose that material strength for a randomly selected specimen of a particular type has a Weibull distribution with parameter values α = 2 (shape) and β = 5 (scale). The corresponding density curve is shown in Figure 5.6. Figure 5.6 The Weibull density curve for Example 19

6 Example 19 Using the formulas μ = E(X) = β·Γ(1 + 1/α) for the mean, μ̃ = β(ln 2)^(1/α) for the median, and σ² = V(X) = β²[Γ(1 + 2/α) − Γ²(1 + 1/α)] for the variance, we obtain μ = E(X) = 4.4311, μ̃ = 4.1628, σ² = V(X) = 5.365, and σ = 2.316. The mean exceeds the median because of the distribution's positive skew. We used statistical software to generate six different samples, each with n = 10, from this distribution (material strengths for six different groups of ten specimens each). cont’d
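For readers who want to verify these numbers, here is a minimal Python sketch (not part of the original slides) that evaluates the Weibull mean, median, and variance formulas above for α = 2 and β = 5.

```python
# Minimal check of the Weibull moments quoted in Example 19, assuming the
# standard formulas:
#   mu = beta * Gamma(1 + 1/alpha)
#   median = beta * (ln 2)^(1/alpha)
#   sigma^2 = beta^2 * [Gamma(1 + 2/alpha) - Gamma(1 + 1/alpha)^2]
from math import gamma, log, sqrt

alpha, beta = 2.0, 5.0   # shape and scale from Example 19

mu = beta * gamma(1 + 1 / alpha)
median = beta * log(2) ** (1 / alpha)
var = beta**2 * (gamma(1 + 2 / alpha) - gamma(1 + 1 / alpha) ** 2)

print(round(mu, 4), round(median, 4), round(var, 3), round(sqrt(var), 3))
# 4.4311 4.1628 5.365 2.316
```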

7 Example 19 The results appear in Table 5.1. Table 5.1 Samples from the Weibull Distribution of Example 19 cont’d

8 Example 19 Table 5.1 is followed by the values of the sample mean, sample median, and sample standard deviation for each sample. Notice first that the ten observations in any particular sample are all different from those in any other sample. Second, the six values of the sample mean are all different from one another, as are the six values of the sample median and the six values of the sample standard deviation. The same is true of the sample 10% trimmed means, sample fourth spreads, and so on. cont’d
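The slides do not say which statistical software produced the six samples in Table 5.1. As an illustration only, a few lines of Python along the following lines would generate comparable samples; the actual draws, and hence the summary statistics, will differ from those in Table 5.1.

```python
# Generate six samples of n = 10 from the Weibull(shape = 2, scale = 5)
# distribution and compute the sample mean, median, and standard deviation
# for each, in the spirit of Table 5.1 (a sketch, not the text's software).
import numpy as np

rng = np.random.default_rng(seed=1)        # seed chosen arbitrarily
for i in range(1, 7):
    x = 5 * rng.weibull(2.0, size=10)      # numpy's weibull has scale 1, so rescale by beta = 5
    print(f"sample {i}: mean={x.mean():.3f}  "
          f"median={np.median(x):.3f}  sd={x.std(ddof=1):.3f}")
```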

9 Example 19 Furthermore, the value of the sample mean x̄ from any particular sample can be regarded as a point estimate (“point” because it is a single number, corresponding to a single point on the number line) of the population mean μ, whose value is known to be 4.4311. None of the estimates from these six samples is identical to what is being estimated. The estimates from the second and sixth samples are much too large, whereas the fifth sample gives a substantial underestimate. cont’d

10 Example 19 Similarly, the sample standard deviation s gives a point estimate of the population standard deviation σ. All six of the resulting estimates are in error by at least a small amount. In summary, the values of the individual sample observations vary from sample to sample; so, in general, will the value of any quantity computed from sample data; and the value of a sample characteristic used as an estimate of the corresponding population characteristic will virtually never coincide with what is being estimated. cont’d

11 Statistics and Their Distributions Definition A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

12 Statistics and Their Distributions Thus the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by X̄; the calculated value of this statistic is x̄. Similarly, S represents the sample standard deviation thought of as a statistic, and its computed value is s. If samples of two different types of bricks are selected and the individual compressive strengths are denoted by X1, ..., Xm and Y1, ..., Yn, respectively, then the statistic X̄ − Ȳ, the difference between the two sample mean compressive strengths, is often of great interest.

13 Statistics and Their Distributions Any statistic, being a random variable, has a probability distribution. In particular, the sample mean X̄ has a probability distribution. Suppose, for example, that n = 2 components are randomly selected and the number of breakdowns while under warranty is determined for each one. Possible values for the sample mean number of breakdowns X̄ are 0 (if X1 = X2 = 0), .5 (if either X1 = 0 and X2 = 1 or X1 = 1 and X2 = 0), 1, 1.5, ....

14 Statistics and Their Distributions The probability distribution of X̄ specifies P(X̄ = 0), P(X̄ = .5), and so on, from which other probabilities such as P(1 ≤ X̄ ≤ 3) and P(X̄ ≥ 2.5) can be calculated. Similarly, if for a sample of size n = 2 the only possible values of the sample variance are 0, 12.5, and 50 (which is the case if X1 and X2 can each take on only the values 40, 45, or 50, since for n = 2 the sample variance is s² = (x1 − x2)²/2), then the probability distribution of S² gives P(S² = 0), P(S² = 12.5), and P(S² = 50).

15 Statistics and Their Distributions The probability distribution of a statistic is sometimes referred to as its sampling distribution to emphasize that it describes how the statistic varies in value across all samples that might be selected.

16 Random Samples

17 Random Samples The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the method of sampling. Consider selecting a sample of size n = 2 from a population consisting of just the three values 1, 5, and 10, and suppose that the statistic of interest is the sample variance. If sampling is done “with replacement,” then S² = 0 will result if X1 = X2.

18 Random Samples However, S² cannot equal 0 if sampling is “without replacement.” So P(S² = 0) = 0 for one sampling method, and this probability is positive for the other method. Our next definition describes a sampling method often encountered (at least approximately) in practice.
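As a small illustrative check (assumed code, not from the slides), one can enumerate the n = 2 samples from the three-value population {1, 5, 10} and confirm that P(S² = 0) is positive with replacement but zero without replacement, under the assumption that each ordered pair is equally likely.

```python
# Compare P(S^2 = 0) with and without replacement for the population {1, 5, 10},
# treating each ordered pair as equally likely (an assumption for illustration).
from itertools import product, permutations

pop = [1, 5, 10]

def s2(x1, x2):                      # sample variance for n = 2
    xbar = (x1 + x2) / 2
    return (x1 - xbar) ** 2 + (x2 - xbar) ** 2

with_repl = list(product(pop, repeat=2))       # 9 ordered pairs
without_repl = list(permutations(pop, 2))      # 6 ordered pairs

p_with = sum(s2(*pair) == 0 for pair in with_repl) / len(with_repl)
p_without = sum(s2(*pair) == 0 for pair in without_repl) / len(without_repl)
print(p_with, p_without)   # 0.333... and 0.0
```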

19 Random Samples Definition The rv's X1, X2, ..., Xn are said to form a (simple) random sample of size n if
1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.

20 Random Samples Conditions 1 and 2 can be paraphrased by saying that the Xi's are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, Conditions 1 and 2 are satisfied exactly. These conditions will be approximately satisfied if sampling is without replacement, yet the sample size n is much smaller than the population size N.

21 Random Samples In practice, if n/N ≤ .05 (at most 5% of the population is sampled), we can proceed as if the Xi's form a random sample. The virtue of this sampling method is that the probability distribution of any statistic can be more easily obtained than for any other sampling method. There are two general methods for obtaining information about a statistic's sampling distribution. One method involves calculations based on probability rules, and the other involves carrying out a simulation experiment.

22 Deriving a Sampling Distribution

23 Deriving a Sampling Distribution Probability rules can be used to obtain the distribution of a statistic provided that it is a “fairly simple” function of the Xi's and either there are relatively few different X values in the population or else the population distribution has a “nice” form. Our next example illustrates such a situation.

24 Example 20 A certain brand of MP3 player comes in three configurations: a model with 2 GB of memory, costing $80, a 4 GB model priced at $100, and an 8 GB version with a price tag of $120. If 20% of all purchasers choose the 2 GB model, 30% choose the 4 GB model, and 50% choose the 8 GB model, then the probability distribution of the cost X of a single randomly selected MP3 player purchase is given by p(80) = .2, p(100) = .3, p(120) = .5, with μ = 106 and σ² = 244. (5.2)
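A quick sketch (assumed code, not from the slides) confirming the mean and variance quoted in (5.2):

```python
# Mean and variance of the cost distribution (5.2).
pmf = {80: 0.20, 100: 0.30, 120: 0.50}   # P(X = 80), P(X = 100), P(X = 120)

mu = sum(x * p for x, p in pmf.items())                # expected cost
var = sum(x**2 * p for x, p in pmf.items()) - mu**2    # E(X^2) - mu^2
print(mu, var)   # 106.0 244.0 (up to floating-point rounding)
```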

25 Example 20 Suppose on a particular day only two MP3 players are sold. Let X1 = the revenue from the first sale and X2 = the revenue from the second. Suppose that X1 and X2 are independent, each with the probability distribution shown in (5.2) [so that X1 and X2 constitute a random sample from the distribution (5.2)]. cont’d

26 Example 20 Table 5.2 lists possible (x1, x2) pairs, the probability of each [computed using (5.2) and the assumption of independence], and the resulting x̄ and s² values. [Note that when n = 2, s² = (x1 − x̄)² + (x2 − x̄)².]

Table 5.2 Outcomes, Probabilities, and Values of x̄ and s² for Example 20

x1    x2    p(x1, x2)   x̄     s²
80    80    .04         80     0
80    100   .06         90     200
80    120   .10         100    800
100   80    .06         90     200
100   100   .09         100    0
100   120   .15         110    200
120   80    .10         100    800
120   100   .15         110    200
120   120   .25         120    0

cont’d

27 Example 20 Now to obtain the probability distribution of X̄, the sample average revenue per sale, we must consider each possible x̄ value and compute its probability. For example, x̄ = 100 occurs three times in the table with probabilities .10, .09, and .10, so p_X̄(100) = P(X̄ = 100) = .10 + .09 + .10 = .29 Similarly, p_S²(800) = P(S² = 800) = P(X1 = 80, X2 = 120 or X1 = 120, X2 = 80) = .10 + .10 = .20 cont’d

28 Example 20 The complete sampling distributions of X̄ and S² appear in (5.3) and (5.4).

x̄:        80    90    100   110   120
p_X̄(x̄):  .04   .12   .29   .30   .25      (5.3)

s²:        0     200   800
p_S²(s²):  .38   .42   .20                  (5.4)

cont’d
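The distributions (5.3) and (5.4) can be reproduced by enumerating all nine (x1, x2) pairs, exactly the bookkeeping of Table 5.2. The following Python sketch is an illustration of that enumeration, not the text's own calculation.

```python
# Enumerate all ordered pairs from distribution (5.2) to build the sampling
# distributions of X-bar and S^2 for n = 2.
from itertools import product
from collections import defaultdict

pmf = {80: 0.20, 100: 0.30, 120: 0.50}              # distribution (5.2)

p_xbar, p_s2 = defaultdict(float), defaultdict(float)
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    xbar = (x1 + x2) / 2
    s2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2        # sample variance when n = 2
    p_xbar[xbar] += p1 * p2                         # independence => multiply probabilities
    p_s2[s2] += p1 * p2

print({x: round(p, 2) for x, p in sorted(p_xbar.items())})
# {80.0: 0.04, 90.0: 0.12, 100.0: 0.29, 110.0: 0.3, 120.0: 0.25}   -> (5.3)
print({v: round(p, 2) for v, p in sorted(p_s2.items())})
# {0.0: 0.38, 200.0: 0.42, 800.0: 0.2}                             -> (5.4)
```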

29 Example 20 Figure 5.7 pictures a probability histogram for both the original distribution (5.2) and the X̄ distribution (5.3). The figure suggests first that the mean (expected value) of the X̄ distribution is equal to the mean 106 of the original distribution, since both histograms appear to be centered at the same place. Figure 5.7 Probability histograms for the underlying distribution and X̄ distribution in Example 20 cont’d

30 Example 20 From (5.3), μ_X̄ = (80)(.04) + ... + (120)(.25) = 106 = μ. Second, it appears that the X̄ distribution has smaller spread (variability) than the original distribution, since probability mass has moved in toward the mean. Again from (5.3), σ²_X̄ = E(X̄²) − μ²_X̄ = (80²)(.04) + ... + (120²)(.25) − (106)² = 122 = σ²/2. cont’d

31 Example 20 The variance of X̄ is precisely half that of the original variance (because n = 2). Using (5.4), the mean value of S² is μ_S² = E(S²) = Σ s² · p_S²(s²) = (0)(.38) + (200)(.42) + (800)(.20) = 244 = σ². That is, the X̄ sampling distribution is centered at the population mean μ, and the S² sampling distribution is centered at the population variance σ². cont’d

32 Example 20 If there had been four purchases on the day of interest, the sample average revenue X̄ would be based on a random sample of four Xi's, each having the distribution (5.2). More calculation eventually yields the pmf of X̄ for n = 4 as

x̄:        80      85      90      95      100     105     110     115     120
p_X̄(x̄):  .0016   .0096   .0376   .0936   .1761   .2340   .2350   .1500   .0625

cont’d

33 Example 20 From this, μ_X̄ = 106 = μ and σ²_X̄ = 61 = σ²/4. Figure 5.8 is a probability histogram of this pmf. Figure 5.8 Probability histogram for X̄ based on n = 4 in Example 20 cont’d
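The n = 4 calculation can be automated in the same way. The sketch below (again an illustration, not the text's method) enumerates all 81 ordered outcomes and recovers μ_X̄ = 106 and V(X̄) = 61.

```python
# Enumerate all 3^4 = 81 ordered samples of size 4 from distribution (5.2)
# and compute the mean and variance of the resulting X-bar distribution.
from itertools import product

pmf = {80: 0.20, 100: 0.30, 120: 0.50}             # distribution (5.2)

p_xbar = {}
for combo in product(pmf.items(), repeat=4):
    xbar = sum(x for x, _ in combo) / 4
    prob = 1.0
    for _, p in combo:
        prob *= p                                   # independence => multiply
    p_xbar[xbar] = p_xbar.get(xbar, 0.0) + prob

mu = sum(x * p for x, p in p_xbar.items())
var = sum(x**2 * p for x, p in p_xbar.items()) - mu**2
print(round(mu, 1), round(var, 1))                  # 106.0 61.0
```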

34 Example 20 Example 20 should suggest first of all that the computation of p_X̄(x̄) and p_S²(s²) can be tedious. If the original distribution (5.2) had allowed for more than three possible values, then even for n = 2 the computations would have been more involved. The example should also suggest, however, that there are some general relationships between E(X̄), V(X̄), E(S²), and the mean μ and variance σ² of the original distribution. cont’d

35 Simulation Experiments

36 Simulation Experiments The second method of obtaining information about a statistic's sampling distribution is to perform a simulation experiment. This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.

37 Simulation Experiments The following characteristics of an experiment must be specified:
1. The statistic of interest (X̄, S, a particular trimmed mean, etc.)
2. The population distribution (normal with μ = 100 and σ = 15, uniform with lower limit A = 5 and upper limit B = 10, etc.)
3. The sample size n (e.g., n = 10 or n = 50)
4. The number of replications k (number of samples to be obtained)

38 Simulation Experiments Then use appropriate software to obtain k different random samples, each of size n, from the designated population distribution. For each sample, calculate the value of the statistic and construct a histogram of the k values. This histogram gives the approximate sampling distribution of the statistic. The larger the value of k, the better the approximation will tend to be (the actual sampling distribution emerges as k → ∞). In practice, k = 500 or 1000 is usually sufficient if the statistic is “fairly simple.”
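A minimal sketch of such a simulation experiment, using the example specifications listed above (statistic X̄, normal population with μ = 100 and σ = 15); the particular n, k, and plotting choices here are just one possible setup, not a prescribed procedure.

```python
# Simulate k samples of size n from a normal population, compute X-bar for
# each, and plot a histogram as an approximation to the sampling distribution.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
n, k = 10, 1000                       # sample size and number of replications

xbars = np.array([rng.normal(loc=100, scale=15, size=n).mean()
                  for _ in range(k)])

plt.hist(xbars, bins=30, edgecolor="black")   # approximate sampling distribution of X-bar
plt.xlabel("sample mean")
plt.ylabel("frequency")
plt.show()
```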

39 Simulation Experiments The final aspect of the histograms to note is their spread relative to one another. The larger the value of n, the more concentrated is the sampling distribution about the mean value. This is why the histograms for n = 20 and n = 30 are based on narrower class intervals than those for the two smaller sample sizes. For the larger sample sizes, most of the x̄ values are quite close to 8.25. This is the effect of averaging. When n is small, a single unusual x value can result in an x̄ value far from the center.

40 Simulation Experiments With a larger sample size, any unusual x values, when averaged in with the other sample values, still tend to yield an x̄ value close to μ. Combining these insights yields a result that should appeal to your intuition: X̄ based on a large n tends to be closer to μ than does X̄ based on a small n.

