# Chapter 9 Chapter 10 Chapter 11 and 12

## Presentation on theme: "Chapter 9 Chapter 10 Chapter 11 and 12"— Presentation transcript:

Review #2 Chapter 9 Chapter 10 Chapter 11 and 12

Chapter 9 Sampling Distributions
A statistic is a random variable describing a characteristic of a random sample, for example the sample mean or the sample variance. We use statistic values in inferential statistics, that is, to make inferences about population characteristics from sample characteristics. Statistics have distributions of their own.

Chapter 9 The Central Limit Theorem
The distribution of the sample mean is normal if the parent distribution is normal. The distribution of the sample mean approaches the normal distribution for sufficiently large samples ($n \ge 30$), even if the parent distribution is not normal. The parameters of the sampling distribution of the mean are: Mean: $\mu_{\bar{x}} = \mu$. Standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. (Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)

Chapter 9 The Central Limit Theorem
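Problem 1 (Using Excel) Given a normal population whose mean is 50 and whose standard deviation is 5: Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52. Answer: $P(49 < \bar{x} < 52) = P\left(\frac{49-50}{5/\sqrt{4}} < Z < \frac{52-50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) \approx .4436$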

Chapter 9 The Central Limit Theorem
Problem 1 (Using the table) Given a normal population whose mean is 50 and whose standard deviation is 5: Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52. Answer: $P(-.4 < Z < .8) = P(Z < .8) - P(Z < -.4) = .7881 - .3446 = .4435$

Chapter 9 The Central Limit Theorem
Problem 1 Question 2: Find the probability that a random sample of 16 has a mean between 49 and 52. Answer: the standard deviation of the sample mean is now $5/\sqrt{16} = 1.25$, so $P(49 < \bar{x} < 52) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$
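
The two questions of Problem 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `prob_mean_between` is an illustrative assumption, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low, high):
    """P(low < sample mean < high) when the parent population is normal."""
    # The sample mean has mean mu and standard deviation sigma / sqrt(n).
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

p_n4 = prob_mean_between(50, 5, 4, 49, 52)    # z-limits are -0.4 and 0.8
p_n16 = prob_mean_between(50, 5, 16, 49, 52)  # z-limits are -0.8 and 1.6
print(round(p_n4, 4), round(p_n16, 4))
```

The larger sample concentrates the sample mean around 50, so the same interval captures more probability.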

Chapter 9 The Central Limit Theorem
Problem 2: The amount of time per day spent by adults watching TV is normally distributed with $\mu = 6$ and $\sigma = 1.5$ hours. Question 1: What is the probability that a randomly selected adult watches TV for more than 7 hours a day? Answer: $P(X > 7) = P(Z > (7-6)/1.5) = P(Z > .67) \approx .25$. Question 2: What is the probability that 5 adults watch TV on the average 7 or more hours? Answer: $P(\bar{x} \ge 7) = P(Z \ge (7-6)/(1.5/\sqrt{5})) = P(Z \ge 1.49) \approx .068$.

Chapter 9 The Central Limit Theorem
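Problem 2: Question 3: What is the probability that the total time of watching TV of the five adults will not exceed 28 hours? Answer: a total of 28 hours corresponds to a sample mean of $28/5 = 5.6$, so $P(\bar{x} \le 5.6) = P(Z \le (5.6-6)/(1.5/\sqrt{5})) = P(Z \le -.596) \approx .275$. Question 4: What total TV watching time is exceeded by only 3% of the population for samples of 5 adults? Answer: with $z_{.03} \approx 1.88$, the sample mean exceeded 3% of the time is $6 + 1.88(1.5/\sqrt{5}) \approx 7.26$, so the total is about $5(7.26) = 36.3$ hours. Comment: Excel returns X for a given left-hand tail probability, so the probability to enter is $1 - .03 = .97$, with standard error $1.5/5^{.5}$.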
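
As a rough check of the four answers to Problem 2, here is a sketch with Python's standard library. The variable names are illustrative assumptions; the inputs ($\mu = 6$, $\sigma = 1.5$, $n = 5$) come from the problem statement.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5
single_adult = NormalDist(mu, sigma)            # one randomly selected adult
mean_of_five = NormalDist(mu, sigma / sqrt(n))  # sample mean of 5 adults

p_q1 = 1 - single_adult.cdf(7)   # one adult watches more than 7 hours
p_q2 = 1 - mean_of_five.cdf(7)   # average of 5 adults is 7 hours or more
p_q3 = mean_of_five.cdf(28 / n)  # total <= 28 hours means average <= 5.6

# inv_cdf takes a left-tail probability, so "exceeded by 3%" is the 97th percentile.
total_q4 = n * mean_of_five.inv_cdf(1 - 0.03)

print(round(p_q1, 4), round(p_q2, 4), round(p_q3, 4), round(total_q4, 2))
```

Questions 3 and 4 are stated in terms of totals; the sketch converts each total to a sample mean by dividing or multiplying by $n$.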

Chapter 9 The Central Limit Theorem
Problem 3: Assume that the mean monthly rent paid by students in a particular town is \$350 with a standard deviation of \$40. A random sample of 100 students who rented apartments was taken. Question 1: What is the probability that the sample mean of the monthly rent exceeds \$355?

Chapter 9 The Central Limit Theorem
Problem 3 - continued Question 2: What is the probability that the total revenue from renting 10 randomly selected apartments falls between 3300 and 3700 dollars? The standard deviation of the sample mean for 10 apartments is $40/10^{.5} \approx 12.65$.

Chapter 9 The Central Limit Theorem
Problem 3 - continued Question 3: Let’s assume the population mean was unknown, but the standard deviation was known to be \$40. A sample of 100 rentals was selected in order to estimate the mean monthly rent paid by the whole student population. What is the probability that the sample mean differs from the actual mean by more than \$5? How about more than \$10?
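
The three questions of Problem 3 follow the same pattern, sketched below with the standard library. One assumption is added loudly here: for Question 2 the sample has only 10 apartments, so the sketch treats the rents as approximately normal, which the slides do not state.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 350.0, 40.0

# Question 1: sample mean of 100 rents exceeds 355.
mean_100 = NormalDist(mu, sigma / sqrt(100))
p_q1 = 1 - mean_100.cdf(355)

# Question 2: total of 10 rents between 3300 and 3700, i.e. mean between 330 and 370.
# Assumption: rents are roughly normal, since n = 10 is too small for the CLT alone.
mean_10 = NormalDist(mu, sigma / sqrt(10))
p_q2 = mean_10.cdf(3700 / 10) - mean_10.cdf(3300 / 10)

# Question 3: the estimation error (sample mean minus true mean) is centered at 0,
# so the unknown population mean drops out of the calculation.
error = NormalDist(0.0, sigma / sqrt(100))
p_off_by_5 = 2 * (1 - error.cdf(5))
p_off_by_10 = 2 * (1 - error.cdf(10))

print(round(p_q1, 4), round(p_q2, 4), round(p_off_by_5, 4), round(p_off_by_10, 4))
```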

Chapter 9 The Central Limit Theorem
Problem 3 – continued

Chapter 9 Sampling distribution of the sample proportion
In a sample of size n, if $np > 5$ and $n(1-p) > 5$, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters: mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. (Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)

Sampling distribution of the sample proportion
Problem 4: A commercial by a household appliance manufacturer claims that less than 5% of all of its products require a service call in the first year. A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.

Sampling distribution of the sample proportion
Problem 4 - Continued: Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year? If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
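
A minimal sketch of the calculation for Problem 4 follows. It assumes the boundary value $p = .05$ as the most favorable reading of "less than 5%"; that choice is an assumption of this sketch, not a statement from the slides.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.05, 400  # p = .05 is the boundary of the manufacturer's claim (assumption)

# Conditions for the normal approximation of the sample proportion.
assert n * p > 5 and n * (1 - p) > 5

p_hat_dist = NormalDist(p, sqrt(p * (1 - p) / n))
z = (0.10 - p) / p_hat_dist.stdev
p_more_than_10pct = 1 - p_hat_dist.cdf(0.10)

print(round(z, 2), p_more_than_10pct)
```

The observed 10% lies more than four standard deviations above .05, so such a sample would be very hard to reconcile with the claim.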

Sampling Distribution of the Difference Between two Means
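If two independent variables are normally distributed with means and variances $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$ respectively, then $\bar{x}_1 - \bar{x}_2$ is also normally distributed with mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and variance $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma_1^2/n_1 + \sigma_2^2/n_2$, where $n_1$ and $n_2$ are the two sample sizes.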

Sampling Distribution of the Difference Between two Means
When at least one of the populations is not normally distributed but the sample sizes are both at least 30, $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed, with a mean and a variance as indicated above.

Sampling Distribution of the Difference Between two Means
Example: A national TV telethon committee is interested in determining whether donations made by males are on the average larger than those made by females by \$4. Two samples of 25 males and 25 females were selected, and the donations made were recorded. If the standard deviations of the male and female populations are \$2.4 and \$1.8 respectively, what is the probability that the sample mean of the male donations exceeds the sample mean of the female donations by at least \$5? Assume donations for the two populations are normally distributed.

Sampling Distribution of the Difference Between two Means
Solution: For males, $\sigma_1 = 2.4$ and $n_1 = 25$. For females, $\sigma_2 = 1.8$ and $n_2 = 25$.
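
The telethon example can be worked through as follows. This sketch assumes that the true difference between the population means is exactly \$4, which is how the question reads but is not spelled out in the transcript.

```python
from math import sqrt
from statistics import NormalDist

true_difference = 4.0            # assumed value of mu_1 - mu_2
sigma_males, n_males = 2.4, 25
sigma_females, n_females = 1.8, 25

# Standard deviation of (male sample mean - female sample mean).
std_error = sqrt(sigma_males**2 / n_males + sigma_females**2 / n_females)

difference_dist = NormalDist(true_difference, std_error)
p_at_least_5 = 1 - difference_dist.cdf(5)

print(round(std_error, 2), round(p_at_least_5, 4))
```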

Chapter 10 Introduction to Estimation
A population’s parameter can be estimated by a point estimator and by an interval estimator. A confidence interval with $1-\alpha$ confidence level is an interval estimator that covers the estimated parameter $100(1-\alpha)\%$ of the time. Confidence intervals are constructed using sampling distributions.

Confidence interval of the mean – Known Variance
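We use the central limit theorem to build the following confidence interval: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. Under the standard normal curve, the area between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is $1-\alpha$, and each tail has area $\alpha/2$.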

Confidence interval of the mean – Known Variance
Problem 5: How many classes do university students miss each semester? A survey of 100 students was conducted. (See Data next.) Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student. Use 99% confidence level.

Confidence interval of the mean – Known Variance
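Solution: the interval is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $z_{\alpha/2} = z_{.005} = 2.575$. The half-width is $2.575(2.2/\sqrt{100}) \approx .57$, giving LCL = 9.64, UCL = 10.78 (an interval centered at 10.21). You can use Data Analysis Plus > Z-Estimate: Mean.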
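
The interval for Problem 5 can be reproduced with a short sketch. The survey data are not in the transcript, so the sample mean of 10.21 used below is an assumption, taken as the midpoint of the limits quoted on the slide. The function name `z_interval` is likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# xbar = 10.21 is an assumption (midpoint of the slide's LCL and UCL).
lcl, ucl = z_interval(xbar=10.21, sigma=2.2, n=100, confidence=0.99)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```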

Confidence interval of the mean – Known Variance
Solution (using Data Analysis Plus): Shade the data set (you may include the title label). Select Data Analysis Plus, then “Z-Estimate: Mean”. Type in the sigma (2.2), check Labels (if appropriate), type in alpha (.01), click OK.

Selecting the sample size
The shorter the confidence interval, the more accurate the estimate. We can, therefore, limit the width of the interval to 2W, and get $W = z_{\alpha/2}\,\sigma/\sqrt{n}$. From here we have $n = (z_{\alpha/2}\,\sigma/W)^2$. W is called “Margin of error”, or “Bound on the error estimate”.

Selecting the sample size
Problem 6 An operations manager wants to estimate the average amount of time needed by a worker to assemble a new electronic component. Sigma is known to be 6 minutes. The required estimate accuracy is within 20 seconds. Find the sample size for confidence levels of 90% and 95%.

Selecting the sample size
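Solution: $\sigma = 6$ min; $W = 20$ sec $= 1/3$ min. For $1-\alpha = .90$: $z_{\alpha/2} = z_{.05} = 1.645$, so $n = [1.645(6)/(1/3)]^2 = 29.61^2 \approx 876.8$, rounded up to 877. For $1-\alpha = .95$: $z_{\alpha/2} = z_{.025} = 1.96$, so $n = [1.96(6)/(1/3)]^2 = 35.28^2 \approx 1244.7$, rounded up to 1245.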
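
The sample-size calculation can be sketched as a small function. The name `sample_size` is an illustrative assumption; the result is rounded up because a fractional observation cannot be sampled.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, margin, confidence):
    """Smallest n whose z-interval half-width is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return ceil((z * sigma / margin) ** 2)

# sigma = 6 minutes, W = 20 seconds = 1/3 minute (same units for both).
n_90 = sample_size(sigma=6, margin=1 / 3, confidence=0.90)
n_95 = sample_size(sigma=6, margin=1 / 3, confidence=0.95)
print(n_90, n_95)
```

Raising the confidence level from 90% to 95% at the same margin of error increases the required sample by roughly 40%.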

Chapter 11 Hypotheses tests
In hypothesis tests we hypothesize on a value of a population parameter, and test to see if there is sufficient evidence to support our belief. The structure of a hypothesis test: Formulate two hypotheses. H0: the null hypothesis, the one we try to reject in favor of H1. H1: the alternative hypothesis, the one we try to prove. Define a significance level $\alpha$.

Hypotheses tests
The significance level is the probability of erroneously rejecting the null hypothesis: $\alpha$ = P(reject H0 when H0 is true). Sample from the population and calculate a statistic that provides an indication whether or not the parameter value under H1 is more likely to be true. We shall test the population mean assuming the standard deviation is known.

Hypotheses tests of the Mean – Known Variance
Problem 7: A machine is set so that the average diameter of ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at 5% significance level that the mean diameter is not .50 inch?

Hypotheses tests of the Mean – Known Variance
Solution: The population studied is the ball-bearing diameters. We hypothesize on the population mean. A good point estimator for the population mean is the sample mean. We use the distribution of the sample mean to build a sample statistic to test whether $\mu = .50$ inch.

Hypotheses tests of the Mean – Known Variance
Solution – (A Two Tail rejection region) Define the hypotheses: H0: $\mu = .50$; H1: $\mu \ne .50$. Define the probability of committing a type I error: $\alpha = .05$.

Hypotheses tests of the Mean – Known Variance
Solution - A Two Tail rejection region Critical Z: $Z_{.025} = 1.96$ (obtained from the Z-table). Build a rejection region: $Z_{sample} > Z_{\alpha/2}$ or $Z_{sample} < -Z_{\alpha/2}$, that is, $Z_{sample} > 1.96$ or $Z_{sample} < -1.96$. Calculate the value of the sample Z statistic and compare it to the critical value: $Z_{sample} = (.51 - .50)/(.05/\sqrt{100}) = 2$. Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at 5% significance level.

Hypotheses tests of the Mean – Known Variance
Solution - A Two Tail rejection region We can perform the test in terms of the mean value. Let us find the critical mean values for rejection: $\bar{x}_{L2} = \mu_0 + z_{.025}\,\sigma/\sqrt{n} = .50 + 1.96(.05)/(100)^{1/2} = .5098$ and $\bar{x}_{L1} = \mu_0 - z_{.025}\,\sigma/\sqrt{n} = .50 - 1.96(.05)/(100)^{1/2} = .4902$. Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at 5% significance level.

Hypotheses tests of the Mean – Known Variance
Calculate the p value of this test. Solution: p-value = $P(Z > Z_{sample}) + P(Z < -Z_{sample}) = P(Z > 2) + P(Z < -2) = 2P(Z > 2) = 2[1 - .9772] = .0456$. Since .0456 < .05, H0 is rejected.
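
The three equivalent views of the two-tail test in Problem 7 (standardized statistic, critical sample means, p-value) can be sketched together. Variable names are illustrative; the inputs come from the problem statement.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 0.50, 0.05, 100, 0.51, 0.05
std_normal = NormalDist()
std_error = sigma / sqrt(n)

# 1. Standardized test statistic against the critical z value.
z_sample = (xbar - mu0) / std_error
z_critical = std_normal.inv_cdf(1 - alpha / 2)

# 2. The same rejection region expressed as critical sample means.
lower_critical_mean = mu0 - z_critical * std_error
upper_critical_mean = mu0 + z_critical * std_error

# 3. Two-tail p-value.
p_value = 2 * (1 - std_normal.cdf(abs(z_sample)))

reject_h0 = abs(z_sample) > z_critical
print(round(z_sample, 2), round(lower_critical_mean, 4),
      round(upper_critical_mean, 4), round(p_value, 4), reject_h0)
```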

Hypotheses tests of the Mean – Known Variance
Problem 8 The average annual return on investment for American banks was found to be 10.2% with standard deviation of 0.8%. It is believed that banks that exercise comprehensive planning do better. A sample of 26 banks that exercise comprehensive training provided the following result: Mean return = 10.5%. Can we infer that the belief about bank performance is supported at 10% significance level by this sample result?

Hypotheses tests of the Mean – Known Variance
Solution: (A Right Hand Tail Rejection region) The population tested is the “annual rate of return”. H0: $\mu = 10.2$; H1: $\mu > 10.2$. Let us perform the test with the standardized rejection region approach: $Z_{sample} > Z_{.10}$ (right hand tail rejection region), where $Z_{.10} = 1.28$. Reject H0 if $Z_{sample} > 1.28$.

Hypotheses tests of the Mean – Known Variance
Conclusion At 10% significance level there is sufficient evidence in the data to reject H0 in favor of H1, since the sample statistic falls inside the rejection region. Interpretation: If we are willing to accept 10% chance of making the wrong conclusion, we can conclude banks conducting comprehensive training perform better than banks who do not.

Hypotheses tests of the Mean – Known Variance
Let us perform the test with the p-value method: $P(\bar{x} > 10.5 \text{ given that } \mu = 10.2) = P(Z > (10.5 - 10.2)/[.8/(26)^{1/2}]) = P(Z > 1.91) = 1 - .9719 = .0281$. Since .0281 < .10 we reject the null hypothesis at 10% significance level.

Hypotheses tests of the Mean – Known Variance
Note the equivalence between the standardized (rejection region) method and the p-value method. $P(Z > Z_{.10}) = .10$ with $Z_{.10} = 1.28$, and $P(Z > 1.91) = .0281$. The statement “the p-value is smaller than alpha” (.0281 < .10) is equivalent to the statement “the test statistic falls in the rejection region” (1.91 > 1.28).
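
The equivalence of the two methods can be illustrated with the numbers of Problem 8. This is a sketch with assumed variable names; it computes both decisions independently and compares them.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 10.2, 0.8, 26, 10.5, 0.10
std_normal = NormalDist()

z_sample = (xbar - mu0) / (sigma / sqrt(n))
z_critical = std_normal.inv_cdf(1 - alpha)  # right-tail critical value Z_.10
p_value = 1 - std_normal.cdf(z_sample)      # right-tail p-value

in_rejection_region = z_sample > z_critical
p_below_alpha = p_value < alpha

# Both decision rules always agree, because the normal CDF is increasing.
print(round(z_sample, 2), round(z_critical, 2), round(p_value, 4),
      in_rejection_region, p_below_alpha)
```

The p-value here is computed from the unrounded statistic, so it differs slightly in the fourth decimal from the table-based .0281.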

Hypotheses tests of the Mean – Known Variance
Problem 9 In the midst of labor-management negotiations, the president of a company argues that the company’s blue collar workers, who are paid an average of \$30K a year, are well-paid because the mean annual pay for blue-collar workers in the country is less than \$30K. This figure is disputed by the union. To test the president’s belief an arbitrator draws a random sample of 350 blue-collar workers from across the country and their income recorded (see file Salaries). If the arbitrator assumes that income is normally distributed with a standard deviation of \$8,000, can it be inferred at 5% significance level that the company’s president is correct?

Hypotheses tests of the Mean – Known Variance
Solution (A Left Hand Tail Rejection Region) The population tested is the annual salary. H0: $\mu = 30$K; H1: $\mu < 30$K. Left hand tail rejection region: $Z < -Z_{.05}$, or $Z < -1.645$. $Z_{sample} = (\bar{x} - 30{,}000)/(8{,}000/350^{.5}) = -2.059$. Since –2.059 < –1.645, there is sufficient evidence to infer that on the average blue collar workers’ income is lower than \$30K at 5% significance level.

Hypotheses tests of the Mean – Known Variance
Calculate the p-value of this test: Solution: p-value = $P(Z < Z_{sample}) = P(Z < -2.059) \approx .02$

Type II Error Problem 7a Calculate $\beta$ for the two-tail hypotheses test performed in problem 7, when the actual mean diameter is .515 inch. Solution: The null hypothesis is H0: $\mu = .500$ and the actual value is $\mu = .515$. The rejection region in terms of the critical values of the sample mean is bounded by $\bar{x}_{L1} = .50 - 1.96(.05)/(100)^{.5} = .4902$ and $\bar{x}_{L2} = .50 + 1.96(.05)/(100)^{.5} = .5098$. $\beta$ = P(Do not reject H0 when H1 is true) = $P(.4902 < \bar{x} < .5098 \text{ when } \mu = .515)$ = $P((.4902 - .515)/[.05/(100)^{.5}] < Z < (.5098 - .515)/[.05/(100)^{.5}])$ = $P(-4.96 < Z < -1.04) = P(1.04 < Z < 4.96) = P(Z < 4.96) - P(Z < 1.04) \approx 1 - P(Z < 1.04) = 1 - .8508 = .1492$. This large probability may be reduced by taking larger samples.
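
The type II error probability for Problem 7a can be checked with the sketch below, which also repeats the calculation for a larger sample to illustrate the closing remark. The function name `beta_two_tail` and the alternative sample size of 400 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def beta_two_tail(mu0, mu_actual, sigma, n, alpha):
    """P(do not reject H0) for a two-tail z-test when the true mean is mu_actual."""
    std_error = sigma / sqrt(n)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_critical * std_error  # critical sample means
    upper = mu0 + z_critical * std_error
    actual = NormalDist(mu_actual, std_error)
    return actual.cdf(upper) - actual.cdf(lower)

beta_n100 = beta_two_tail(0.50, 0.515, 0.05, 100, 0.05)
beta_n400 = beta_two_tail(0.50, 0.515, 0.05, 400, 0.05)  # hypothetical larger sample
print(round(beta_n100, 4), beta_n400)
```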

Ch 12: Inference when the Variance is Unknown
Generally, the variance may be unknown. In this case we change the test statistic from “Z” to “t” when testing the population mean. To test the population proportion we’ll use the normal distribution (under certain conditions).

Testing the mean – unknown variance
Replace the statistic Z with “t”: $t = (\bar{x} - \mu)/(s/\sqrt{n})$, with $n - 1$ degrees of freedom. The original distribution must be normal (or at least mound shaped).

Testing the mean – unknown variance
Problem 10 A federal agency inspects packages to determine if the contents are at least as large as advertised. A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight is 8.04 ounces was drawn (data is provided later). From the sample results: Can we conclude that the average weight does not meet the weight stated? (Use $\alpha = .05$.) Estimate the mean weight of all containers with 99% confidence. What assumption must be met?

Testing the mean – unknown variance
Solution We hypothesize on the mean weight. H0: $\mu = 8.04$; H1: $\mu < 8.04$. (i) n = 5. For small samples let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94. The rejection region: $t < -t_{\alpha, n-1} = -t_{.05, 5-1} = -2.132$. The $t_{sample}$ = ? Mean = (8.07+…+7.94)/5 = 7.996. Std. Dev. = $\{[(8.07 - 7.996)^2 + \dots + (7.94 - 7.996)^2]/4\}^{1/2}$ = 0.054.

Testing the mean – unknown variance
The $t_{sample}$ is calculated as follows: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$. Since –.165 > –2.132, the sample statistic does not fall in the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8, at 5% significance level.
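
The small-sample statistics can be recomputed from the five weights. The sketch below takes the critical value –2.132 from the slide's t-table rather than computing it. It evaluates the statistic for two hypothesized means: the slide's value of about –.165 matches a hypothesized mean of 8, while the stated 8.04 gives a different statistic. Treating both as candidates is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]
n = len(weights)
xbar = mean(weights)
s = stdev(weights)    # sample standard deviation (divisor n - 1)
t_critical = -2.132   # -t_{.05,4}, copied from the slide's t-table

def t_statistic(mu0):
    """One-sample t statistic for the hypothesized mean mu0."""
    return (xbar - mu0) / (s / sqrt(n))

t_vs_8_04 = t_statistic(8.04)  # hypothesized mean as stated in H0
t_vs_8_00 = t_statistic(8.00)  # hypothesized mean implied by the slide's -.165

print(round(xbar, 3), round(s, 4), round(t_vs_8_04, 2), round(t_vs_8_00, 3))
```

In both cases the statistic stays above the critical value, so the conclusion (do not reject H0) is the same.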

Testing the mean – unknown variance
(ii) n = 50. To calculate the sample statistics we use Excel, “Descriptive statistics” from the Tools > Data analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04. The confidence interval is calculated by $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n} = 8.02 \pm t_{.005,\,50-1}(.04/\sqrt{50})$, where $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, and $t_{.005,49}$ is about 2.68 from the t-table. LCL = 8.005, UCL = 8.035.

Testing the mean – unknown variance
Comments: Check whether it appears that the distribution is normal.

Using Excel: To obtain an exact value for t use the TINV function:
The exact value: =TINV(0.01,49), where .01 is the two-tail probability (= .005*2) and 49 is the degrees of freedom.

Testing the mean – unknown variance
Problem 11 Engineers in charge of the production of car seats are concerned about the compliance of the springs used with design specifications. Springs are designed to be 500mm long. Springs too long or too short must be reworked. A standard deviation of 2mm in springs length will result in an acceptable number of reworked springs. A sample of 100 springs was taken and measured.

Testing the mean – unknown variance
Problem 11 – continued Can we infer at 10% significance level that the mean spring length is not 500mm? Solution: H0: $\mu = 500$; H1: $\mu \ne 500$. Since the standard deviation is unknown, we need to run a t-test, assuming the spring length is normally distributed. Rejection region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ with d.f. = 99, that is, $t < -t_{.05,99}$ or $t > t_{.05,99}$ (about ±1.66). The value –.12 appears on the slide next to the rejection region.

The test and the confidence interval are based on the approximate normal distribution of the sample proportion, if $np > 5$ and $n(1-p) > 5$. For the confidence interval of p we have: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$. For the hypotheses test, we use a Z test: $Z = (\hat{p} - p)/\sqrt{p(1-p)/n}$.

Problem 12 (problem 11 continued). The engineers were interested in the percentage of springs that are the correct length. They marked each spring in the sample as Correct – 1; Too long – 2; Too short – 3. Can we infer that less than 90% of the springs are the correct length, at 10% significance level?

Problem 12 - Solution H0: p = .9; H1: p < .9. Rejection region: $Z < -Z_{\alpha}$, or Z < –1.28. Conclusion: Since –1.33 < –1.28, we can infer that less than 90% of the springs do not need reworking.

Problem 12 – solution continued Let us estimate the proportion of good springs at 99% confidence level.

Problem 12 – solution continued Find the sample size if the proportion of good springs is to be estimated to within … Consider the given sample as an initial sample.

Problem 13 A consumer protection group runs a survey of 400 dentists to check a claim that more than 4 out of 5 dentists recommend ingredients included in a certain toothpaste. The survey results are as follows: 71 – No; 329 – Yes At 5% significance level, can the consumer group infer that the claim is true?

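Problem 13 - Solution The two hypotheses are: H0: p = .8; H1: p > .8. The rejection region: $Z > Z_{\alpha}$, with $Z_{.05} = 1.645$. The sample proportion is $\hat{p} = 329/400 = .8225$, so $Z = (.8225 - .8)/\sqrt{.8(.2)/400} = .0225/.02 = 1.125$. Conclusion: Since 1.125 < 1.645, the consumer group cannot confirm the claim at 5% significance level.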
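
The proportion test of Problem 13 can be sketched as follows, using the survey counts from the problem statement. Variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 329, 400
p0, alpha = 0.8, 0.05

# Conditions for the normal approximation under H0.
assert n * p0 > 5 and n * (1 - p0) > 5

p_hat = yes / n
z_sample = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
z_critical = NormalDist().inv_cdf(1 - alpha)       # right-tail critical value
p_value = 1 - NormalDist().cdf(z_sample)

reject_h0 = z_sample > z_critical
print(round(p_hat, 4), round(z_sample, 3), round(p_value, 4), reject_h0)
```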

Summary Example An automotive expert claims that the large number of self-serve gas stations has resulted in poor automobile maintenance, and that the average tire pressure is more than 4.5 psi below its manufacturer's specification. A random sample of 50 tires revealed the results stored in the file TirePressure. Assume the tire pressure is normally distributed with $\sigma = 1.5$ psi, and answer the following questions:

Tire Pressure Summary Example At 10% significance level can we infer that the expert is correct? What is the p value? Solution: The Hypotheses: H0: $\mu = 4.5$; H1: $\mu > 4.5$. The rejection region: $Z > Z_{.10}$, or Z > 1.28. From the data we have: mean = 5.04, so $Z = (5.04 - 4.5)/(1.5/50^{.5}) = 2.545$. Since 2.545 > 1.28, there is sufficient evidence to infer that the expert is correct. The p value = P(Sample Mean > 5.04 when $\mu = 4.5$) = P(Z > 2.545) = 1 – .9945 = .0055.

Summary Example Find the probability of making a type II error when the actual tire under-inflation is 5 psi on the average. Solution: The Rejection Region in terms of the sample means is found first: $Z_L = 1.28 = (\bar{x}_L - 4.5)/(1.5/50^{.5})$, so $\bar{x}_L = 4.5 + 1.28(1.5/50^{.5}) = 4.77$. So, the Rejection Region is: Sample mean > 4.77. $\beta$ = P(accept H0 when H1 is true) = P(sample mean does not fall in the RR, when $\mu = 5$) = $P(\bar{x} < 4.77 \text{ when } \mu = 5) = P(Z < (4.77 - 5)/(1.5/50^{.5})) = P(Z < -1.08)$. From Excel: [=NORMSDIST(-1.077)] = .1407
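
Both parts of the tire-pressure example (the right-tail test and the type II error) can be sketched in one pass. The sample mean of 5.04 is the value quoted on the slide; variable names are illustrative assumptions. Because the sketch keeps unrounded intermediate values, its results differ from the slide's in the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_actual, sigma, n, alpha, xbar = 4.5, 5.0, 1.5, 50, 0.10, 5.04
std_normal = NormalDist()
std_error = sigma / sqrt(n)

# Part 1: right-tail z-test and its p-value.
z_sample = (xbar - mu0) / std_error
p_value = 1 - std_normal.cdf(z_sample)

# Part 2: type II error when the true mean under-inflation is 5 psi.
xbar_critical = mu0 + std_normal.inv_cdf(1 - alpha) * std_error
beta = NormalDist(mu_actual, std_error).cdf(xbar_critical)

print(round(z_sample, 3), round(p_value, 4), round(xbar_critical, 2), round(beta, 4))
```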

The following statistic is $\chi^2$ (Chi squared) distributed with $n-1$ degrees of freedom: $\chi^2 = (n-1)s^2/\sigma^2$. We use this relationship to test and estimate the variance.

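The Hypotheses tested are: H0: $\sigma^2 = \sigma_0^2$ against H1: $\sigma^2 > \sigma_0^2$, H1: $\sigma^2 < \sigma_0^2$, or H1: $\sigma^2 \ne \sigma_0^2$. The rejection region is, respectively: $\chi^2 > \chi^2_{\alpha,\,n-1}$; $\chi^2 < \chi^2_{1-\alpha,\,n-1}$; $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$.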

Testing the Variance Problem 15
Engineers in charge of the production of car seats are concerned about the compliance of the springs used with design specifications. Springs are designed to be 500mm long. Springs too long or too short must be reworked. A standard deviation of 2mm in springs length will result in an acceptable number of reworked springs. A sample of 100 springs was taken and measured.

Testing the Variance Problem 15 - continued Can we infer at 10% significance level that the number of springs requiring reworking is unacceptably large? The number of springs requiring reworking depends on the standard deviation, or the variance. H0: $\sigma^2 = 4$; H1: $\sigma^2 > 4$. Rejection region: $\chi^2_{sample} > \chi^2_{\alpha}$ with d.f. = 99, that is, $\chi^2_{sample} > \chi^2_{.10,99} \approx 117.4$.

Testing the Variance Problem 15 - conclusion Since the sample $\chi^2$ exceeds the critical value, we can infer at 10% significance level that the standard deviation is greater than 2, thus the number of springs that require reworking is unacceptably large.

Testing the Variance Problem 16
A random sample of 100 observations was taken from a normal population. The sample variance was 29.76. Can we infer at 2.5% significance level that the population variance DOES NOT exceed 30? Estimate the population variance with 90% confidence.

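Testing the Variance Problem 16 - Solution
H0: $\sigma^2 = 30$; H1: $\sigma^2 < 30$. Rejection region: $\chi^2 < \chi^2_{1-\alpha,\,n-1}$, that is, $\chi^2 < \chi^2_{.975,99} = 73.36$. The sample statistic is $\chi^2 = (n-1)s^2/\sigma_0^2 = (100 - 1)(29.76)/30 = 98.208$.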

Testing the Variance Problem 16 - conclusion Since 98.208 > 73.36, we conclude that there is insufficient evidence at 2.5% significance level to infer that the variance is smaller than 30.
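
The arithmetic of Problem 16 is short enough to sketch directly. The Python standard library has no chi-squared distribution, so the critical value 73.36 is copied from the slides rather than computed; variable names are illustrative assumptions.

```python
n = 100
sample_variance = 29.76
hypothesized_variance = 30.0
chi2_critical = 73.36  # chi-squared_{.975, 99}, value quoted on the slides

# Test statistic: (n - 1) s^2 / sigma_0^2, with n - 1 degrees of freedom.
chi2_sample = (n - 1) * sample_variance / hypothesized_variance

# Left-tail test: reject H0 only if the statistic falls below the critical value.
reject_h0 = chi2_sample < chi2_critical
print(round(chi2_sample, 3), reject_h0)
```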

Using Excel We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$ for a given $\chi^2$ and known d.f., and then determine the p-value. Use the CHIDIST function: =CHIDIST($\chi^2$, d.f.). For example: =CHIDIST(98.208,99) returns $P(\chi^2_{99} > 98.208)$. In our example we had a left hand tail rejection region, and therefore the p-value is $P(\chi^2_{99} < 98.208)$ = 1 – CHIDIST(98.208,99), which is greater than .025.

Using Excel We can get the exact $\chi^2$ value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$, for any given probability $\alpha$ and known d.f., then define the rejection region. Use the CHIINV function: =CHIINV($\alpha$, d.f.). For example: =CHIINV(.975,99) = 73.36. That is: $P(\chi^2_{99} > 73.36) = .975$, so the rejection region is $\chi^2 < 73.36$.