
1 1 Review #2 Chapter 9 Chapter 10 Chapter 11 and 12

2 2 Chapter 9 Sampling Distributions
A statistic is a random variable describing a characteristic of a random sample.
–Sample mean
–Sample variance
We use statistic values in inferential statistics (making inferences about population characteristics from sample characteristics). Statistics have distributions of their own.

3 3 Chapter 9 The Central Limit Theorem
The distribution of the sample mean is normal if the parent distribution is normal. The distribution of the sample mean approaches the normal distribution for sufficiently large samples ($n \ge 30$), even if the parent distribution is not normal. The parameters of the sampling distribution of the mean are:
–Mean: $\mu_{\bar{x}} = \mu$
–Standard deviation: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
(Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)

4 4 Chapter 9 The Central Limit Theorem Problem 1 (Using Excel) Given a normal population whose mean is 50 and whose standard deviation is 5, –Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52. –Answer: $\sigma_{\bar{x}} = 5/\sqrt{4} = 2.5$, so $P(49 < \bar{x} < 52) = P(-.4 < Z < .8) \approx .4436$.

5 5 Chapter 9 The Central Limit Theorem Problem 1 (Using the table) Given a normal population whose mean is 50 and whose standard deviation is 5, –Question 1: Find the probability that a random sample of 4 has a mean between 49 and 52. –Answer: $P(-.4 < Z < .8) = P(Z < .8) - P(Z < -.4) = .7881 - .3446 = .4435$.

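6 6 Chapter 9 The Central Limit Theorem Problem 1 –Question 2: Find the probability that a random sample of 16 has a mean between 49 and 52. –Answer: $\sigma_{\bar{x}} = 5/\sqrt{16} = 1.25$, so $P(49 < \bar{x} < 52) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$.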
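The two questions of Problem 1 differ only in the sample size, so they can be checked with one small helper. This is a minimal sketch using Python's standard library rather than the Excel sheet from the slides; the function name `prob_mean_between` is an illustrative choice, not something from the deck.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(mu, sigma, n, lo, hi):
    """P(lo < sample mean < hi) when the parent population is normal."""
    # The sample mean is normal with mean mu and standard error sigma/sqrt(n).
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(hi) - xbar.cdf(lo)


p_n4 = prob_mean_between(50, 5, 4, 49, 52)    # Question 1, n = 4
p_n16 = prob_mean_between(50, 5, 16, 49, 52)  # Question 2, n = 16
print(f"n=4: {p_n4:.4f}  n=16: {p_n16:.4f}")
```

The larger sample gives the higher probability because the standard error shrinks from 2.5 to 1.25, concentrating the sample mean around 50.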

7 7 Chapter 9 The Central Limit Theorem Problem 2: The amount of time per day spent by adults watching TV is normally distributed with $\mu = 6$ and $\sigma = 1.5$ hours. –Question 1: What is the probability that a randomly selected adult watches TV for more than 7 hours a day? –Answer: $P(X > 7) = P(Z > (7-6)/1.5) = P(Z > .667) \approx .2525$. –Question 2: What is the probability that 5 adults watch TV, on average, 7 or more hours? –Answer: $P(\bar{x} \ge 7) = P(Z \ge (7-6)/(1.5/\sqrt{5})) = P(Z \ge 1.49) \approx .068$.

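8 8 Chapter 9 The Central Limit Theorem Problem 2: –Question 3: What is the probability that the total time of watching TV of the five adults will not exceed 28 hours? –Answer: a total of 28 hours corresponds to a sample mean of 28/5 = 5.6, so $P(\bar{x} \le 5.6) = P(Z \le (5.6-6)/(1.5/\sqrt{5})) = P(Z \le -.596) \approx .2755$. –Question 4: What total TV watching time is exceeded by only 3% of the population for samples of 5 adults? –Answer: $z_{.03} = 1.88$, so $\bar{x} = 6 + 1.88(1.5/\sqrt{5}) \approx 7.26$ and the total is $5(7.26) \approx 36.3$ hours. Comments: 1. Excel returns X for a given left hand tail probability (here .97), using the standard error $1.5/5^{.5}$.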
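The four questions of Problem 2 mix a single observation, a sample mean, and a sample total. The sketch below works through them with `statistics.NormalDist`, converting totals to means by dividing by n; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5

single = NormalDist(mu, sigma)            # one adult
mean5 = NormalDist(mu, sigma / sqrt(n))   # mean of 5 adults

p_q1 = 1 - single.cdf(7)            # Q1: one adult watches more than 7 hours
p_q2 = 1 - mean5.cdf(7)             # Q2: average of 5 adults is 7 hours or more
p_q3 = mean5.cdf(28 / n)            # Q3: total <= 28 hours, i.e. mean <= 5.6
total_q4 = n * mean5.inv_cdf(0.97)  # Q4: total exceeded only 3% of the time

print(f"Q1={p_q1:.4f} Q2={p_q2:.4f} Q3={p_q3:.4f} Q4={total_q4:.2f} hours")
```

`inv_cdf` takes a left-tail probability, which mirrors the slide's comment about Excel: a value exceeded 3% of the time sits at the 97th percentile.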

9 9 Chapter 9 The Central Limit Theorem Problem 3: Assume that the mean monthly rent paid by students in a particular town is $350 with a standard deviation of $40. A random sample of 100 students who rented apartments was taken. Question 1: What is the probability that the sample mean of the monthly rent exceeds $355? Answer: $P(\bar{x} > 355) = P(Z > (355-350)/(40/\sqrt{100})) = P(Z > 1.25) = .1056$.

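10 10 Chapter 9 The Central Limit Theorem Problem 3 - continued Question 2: What is the probability that the total revenue from renting 10 randomly selected apartments falls between 3300 and 3700 dollars? Answer: the total lies in that range when the sample mean lies between 330 and 370. With $\sigma_{\bar{x}} = 40/10^{.5} = 12.65$, $P(330 < \bar{x} < 370) = P(-1.58 < Z < 1.58) \approx .886$.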

11 11 Chapter 9 The Central Limit Theorem Problem 3 - continued Question 3: Let's assume the population mean was unknown, but the standard deviation was known to be $40. A sample of 100 rentals was selected in order to estimate the mean monthly rent paid by the whole student population. What is the probability that the sample mean differs from the actual mean by more than $5? How about more than $10?

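12 12 Chapter 9 The Central Limit Theorem Problem 3 – continued Answer: $P(|\bar{x} - \mu| > 5) = P(|Z| > 5/(40/\sqrt{100})) = P(|Z| > 1.25) = 2(.1056) = .2112$, and $P(|\bar{x} - \mu| > 10) = P(|Z| > 2.5) = 2(.0062) = .0124$.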
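Question 3 of Problem 3 does not need the unknown population mean, because only the distance between the sample mean and the population mean matters. A short sketch of all three questions follows; `prob_error_exceeds` is a hypothetical helper name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 350, 40

# Question 1: sample mean of 100 rents exceeds $355.
p_q1 = 1 - NormalDist(mu, sigma / sqrt(100)).cdf(355)

# Question 2: total of 10 rents between 3300 and 3700, i.e. mean in (330, 370).
mean10 = NormalDist(mu, sigma / sqrt(10))
p_q2 = mean10.cdf(3700 / 10) - mean10.cdf(3300 / 10)


def prob_error_exceeds(d, sigma, n):
    """P(|sample mean - population mean| > d); mu cancels out."""
    z = d / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(z))


p_q3_5 = prob_error_exceeds(5, sigma, 100)
p_q3_10 = prob_error_exceeds(10, sigma, 100)
print(f"{p_q1:.4f} {p_q2:.4f} {p_q3_5:.4f} {p_q3_10:.4f}")
```

Note that Question 2 uses only 10 observations, so the calculation leans on the rents themselves being roughly normal, which the problem does not state explicitly.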

13 13 Chapter 9 Sampling distribution of the sample proportion In a sample of size n, if np > 5 and n(1-p) > 5, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters: mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. (Assumption: the population is sufficiently large, so no correction is needed in the calculation of the variance.)

14 14 Sampling distribution of the sample proportion Problem 4: –A commercial of a household appliances manufacturer claims that less than 5% of all of its products require a service call in the first year. –A survey of 400 households that recently purchased the manufacturer's products was conducted to check the claim.

15 15 Sampling distribution of the sample proportion Problem 4 - Continued: Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year? Taking p = .05, $P(\hat{p} > .10) = P(Z > (.10-.05)/\sqrt{.05(.95)/400}) = P(Z > 4.59) \approx 0$. If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
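The calculation for Problem 4 can be sketched as follows, taking p = .05 as the boundary value of the manufacturer's claim (an assumption, since the claim itself is "less than 5%").

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.05, 400           # claimed proportion (boundary value) and sample size
assert n * p > 5 and n * (1 - p) > 5   # normal approximation conditions

se = sqrt(p * (1 - p) / n)             # standard error of the sample proportion
z = (0.10 - p) / se                    # how far 10% lies above the claim
tail = 1 - NormalDist().cdf(z)         # P(sample proportion > .10)

print(f"se={se:.4f} z={z:.2f} tail={tail:.2e}")
```

A z value this large means a 10% service-call rate would be extremely unlikely if the claim were true, which is the point the slide's closing question is driving at.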

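16 16 Sampling Distribution of the Difference Between two Means If two independent variables are normally distributed with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively, then $\bar{x}_1 - \bar{x}_2$ is also normally distributed with: mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and variance $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma_1^2/n_1 + \sigma_2^2/n_2$.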

17 17 Sampling Distribution of the Difference Between two Means When at least one of the populations is not normally distributed but the sample sizes are both at least 30, $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed, with the mean and variance indicated above.

18 18 Sampling Distribution of the Difference Between two Means Example: A national TV telethon committee is interested in determining whether donations made by males are on the average larger than those made by females by $4. Two samples of 25 males and 25 females were selected, and the donations made were recorded. If the standard deviations of the male and female populations are $2.4 and $1.8 respectively, what is the probability that the sample mean of the male donations exceeds the sample mean of the female donations by at least $5? Assume donations for the two populations are normally distributed.

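19 19 Sampling Distribution of the Difference Between two Means Solution With subscript 1 for males and 2 for females, and assuming $\mu_1 - \mu_2 = 4$: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{2.4^2/25 + 1.8^2/25} = .6$, so $P(\bar{x}_1 - \bar{x}_2 \ge 5) = P(Z \ge (5-4)/.6) = P(Z \ge 1.67) = .0475$.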
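The telethon example can be checked with a few lines. This sketch assumes the true difference in mean donations is exactly $4, as the committee hypothesizes; `diff_of_means` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_means(mu_diff, sd1, sd2, n1, n2):
    """Sampling distribution of x1_bar - x2_bar for independent normal samples."""
    return NormalDist(mu_diff, sqrt(sd1**2 / n1 + sd2**2 / n2))


d = diff_of_means(4, 2.4, 1.8, 25, 25)   # males minus females
p_at_least_5 = 1 - d.cdf(5)

print(f"std error = {d.stdev:.2f}, P(diff >= 5) = {p_at_least_5:.4f}")
```

The unrounded result differs slightly in the fourth decimal from a table lookup at z = 1.67, because the exact z is 1.6667.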

20 20 Chapter 10 Introduction to Estimation A population parameter can be estimated by a point estimator and by an interval estimator. A confidence interval with confidence level $1-\alpha$ is an interval estimator that covers the estimated parameter $100(1-\alpha)\%$ of the time. Confidence intervals are constructed using sampling distributions.

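21 21 Confidence interval of the mean – Known Variance We use the central limit theorem to build the following confidence interval: since $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha$, with area $\alpha/2$ in each tail, the interval is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.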

22 22 Confidence interval of the mean – Known Variance Problem 5: How many classes do university students miss each semester? A survey of 100 students was conducted. (See Data next) Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student. Use 99% confidence level.

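23 23 Confidence interval of the mean – Known Variance –Solution: $\sigma = 2.2$, $n = 100$; $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, $z_{\alpha/2} = z_{.005} = 2.575$. The half-width of the interval is $2.575(2.2)/\sqrt{100} \approx .57$, giving LCL = 9.64 and UCL $\approx$ 10.77 (the LCL plus twice the half-width). You can use Data Analysis Plus > Z-Estimate: Mean.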

24 24 Confidence interval of the mean – Known Variance –Solution (using Data Analysis Plus): Shade the data set (you may include the title label). Select Data Analysis Plus, then Z-Estimate: Mean. Type in the sigma (2.2), check Labels (if appropriate), type in alpha (.01), click OK.
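Outside Excel, the z-interval for Problem 5 takes only a few lines. The survey data are not part of this transcript, so the sample mean of 10.21 used below is a hypothetical value chosen to be consistent with the reported lower limit of 9.64; `z_interval` is likewise an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin


# xbar = 10.21 is an assumed sample mean, not taken from the transcript.
lcl, ucl, margin = z_interval(10.21, 2.2, 100, 0.99)
print(f"margin = {margin:.3f}, interval = ({lcl:.2f}, {ucl:.2f})")
```

Only the centre of the interval depends on the data; the margin of about .57 follows entirely from sigma, n and the confidence level.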

25 25 Selecting the sample size The shorter the confidence interval, the more accurate the estimate. We can, therefore, limit the width of the interval to 2W, and get $W = z_{\alpha/2}\,\sigma/\sqrt{n}$. From here we have $n = (z_{\alpha/2}\,\sigma/W)^2$. W is called the margin of error, or the bound on the error of estimation.

26 26 Selecting the sample size Problem 6 An operations manager wants to estimate the average amount of time needed by a worker to assemble a new electronic component. Sigma is known to be 6 minutes. The required estimate accuracy is within 20 seconds. Find the sample size for a confidence level of 90%, and again for 95%.

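27 27 Selecting the sample size –Solution: $\sigma = 6$ min; W = 20 sec = 1/3 min. For $1-\alpha = .90$: $z_{\alpha/2} = z_{.05} = 1.645$, so $n = (1.645 \cdot 6/(1/3))^2 \approx 876.8$, rounded up to 877. For $1-\alpha = .95$: $z_{\alpha/2} = z_{.025} = 1.96$, so $n = (1.96 \cdot 6/(1/3))^2 \approx 1244.7$, rounded up to 1245.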
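The sample-size formula always rounds up, since a fractional observation cannot be collected and rounding down would miss the required margin. A minimal sketch for Problem 6, with `sample_size_for_mean` as an illustrative name:

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


sigma_min = 6          # minutes
margin_min = 20 / 60   # 20 seconds expressed in minutes

n_90 = sample_size_for_mean(sigma_min, margin_min, 0.90)
n_95 = sample_size_for_mean(sigma_min, margin_min, 0.95)
print(n_90, n_95)
```

Keeping sigma and the margin in the same unit (minutes here) is the step most easily missed when doing this by hand.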

28 28 Chapter 11 Hypotheses tests
–In hypothesis tests we hypothesize on a value of a population parameter, and test to see if there is sufficient evidence to support our belief.
–The structure of a hypothesis test: Formulate two hypotheses.
–$H_0$: The null hypothesis, the one we try to reject in favor of …
–$H_1$: The alternative hypothesis, the one we try to prove.
Define a significance level $\alpha$.

29 29 Hypotheses tests –The significance level is the probability of erroneously rejecting the null hypothesis: $\alpha$ = P(reject $H_0$ when $H_0$ is true). –Sample from the population and calculate a statistic that provides an indication whether or not the parameter value under $H_1$ is more likely to be true. –We shall test the population mean assuming the standard deviation is known.

30 30 Hypotheses tests of the Mean – Known Variance Problem 7: A machine is set so that the average diameter of ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at 5% significance level that the mean diameter is not .50 inch?

31 31 Hypotheses tests of the Mean – Known Variance Solution: The population studied is the ball-bearing diameters. –We hypothesize on the population mean. –A good point estimator for the population mean is the sample mean. –We use the distribution of the sample mean to build a sample statistic to test whether $\mu = .50$ inch.

32 32 Hypotheses tests of the Mean – Known Variance Solution – (A Two Tail rejection region) –Define the hypotheses: $H_0: \mu = .50$; $H_1: \mu \ne .50$. –The probability of committing a type one error is $\alpha = .05$.

33 33 Hypotheses tests of the Mean – Known Variance Solution - A Two Tail rejection region Build a rejection region: $Z_{sample} > z_{\alpha/2}$, or $Z_{sample} < -z_{\alpha/2}$. Calculate the value of the sample Z statistic, $Z_{sample} = (.51 - .50)/(.05/\sqrt{100}) = 2$, and compare it to the critical value $z_{.025} = 1.96$ (obtained from the Z-table). Since 2 > 1.96, there is sufficient evidence to reject $H_0$ in favor of $H_1$ at 5% significance level.

34 34 Hypotheses tests of the Mean – Known Variance Solution - A Two Tail rejection region We can perform the test in terms of the mean value. Let us find the critical mean values for rejection: $\bar{x}_{L2} = \mu_0 + z_{.025}\,\sigma/\sqrt{n} = .50 + 1.96(.05)/(100)^{1/2} = .5098$ and $\bar{x}_{L1} = \mu_0 - z_{.025}\,\sigma/\sqrt{n} = .50 - 1.96(.05)/(100)^{1/2} = .4902$. Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at 5% significance level.

35 35 Hypotheses tests of the Mean – Known Variance Calculate the p value of this test. Solution: p-value = $P(Z > Z_{sample}) + P(Z < -Z_{sample}) = P(Z > 2) + P(Z < -2) = 2[1 - .9772] = .0456$. Since .0456 < .05, $H_0$ is rejected.

36 36 Problem 8 –The average annual return on investment for American banks was found to be 10.2% with standard deviation of 0.8%. –It is believed that banks that exercise comprehensive planning do better. –A sample of 26 banks that exercise comprehensive training provide the following result: Mean return = 10.5% –Can we infer that the belief about bank performance is supported at 10% significance level by this sample result? Hypotheses tests of the Mean – Known Variance

37 37 Hypotheses tests of the Mean – Known Variance Solution: (A Right Hand Tail Rejection region) The population tested is the annual rate of return. –$H_0: \mu = 10.2$ –$H_1: \mu > 10.2$ Let us perform the test with the standardized rejection region approach: $Z_{sample} > z_{.10}$ (right hand tail rejection region), where $z_{.10} = 1.28$. Reject $H_0$ if $Z_{sample} > 1.28$. Here $Z_{sample} = (10.5 - 10.2)/(.8/\sqrt{26}) = 1.91$.

38 38 Hypotheses tests of the Mean – Known Variance Conclusion –At 10% significance level there is sufficient evidence in the data to reject $H_0$ in favor of $H_1$, since the sample statistic falls inside the rejection region. Interpretation: –If we are willing to accept a 10% chance of making the wrong conclusion, we can conclude that banks conducting comprehensive training perform better than banks that do not.

39 39 Hypotheses tests of the Mean – Known Variance Let us perform the test with the p-value method: $P(\bar{x} > 10.5$ given that $\mu = 10.2) = P(Z > (10.5 - 10.2)/[.8/(26)^{1/2}]) = P(Z > 1.91) = 1 - .9719 = .0281$. Since .0281 < .10 we reject the null hypothesis at 10% significance level.

40 40 Hypotheses tests of the Mean – Known Variance Note the equivalence between the standardized (rejection region) method and the p-value method: $P(Z > z_{.10}) = .10$, with $z_{.10} = 1.28$. The statement "the p-value is smaller than alpha" is equivalent to the statement "the test statistic falls in the rejection region".
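The equivalence noted above can be seen by computing both the test statistic and the p-value for Problems 7 and 8. The sketch below is one possible way to organize the calculation; `z_test` and its `tail` argument are names invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, tail):
    """Return (z statistic, p-value) for a test of the mean with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    if tail == "two":
        p = 2 * (1 - std.cdf(abs(z)))
    elif tail == "right":
        p = 1 - std.cdf(z)
    elif tail == "left":
        p = std.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, p


z7, p7 = z_test(0.51, 0.50, 0.05, 100, "two")    # Problem 7, alpha = .05
z8, p8 = z_test(10.5, 10.2, 0.8, 26, "right")    # Problem 8, alpha = .10
print(f"Problem 7: z={z7:.2f}, p={p7:.4f}; Problem 8: z={z8:.2f}, p={p8:.4f}")
```

In both problems the p-value falls below alpha exactly when the z statistic passes the critical value (1.96 and 1.28 respectively). The p-values differ from the slide values in the fourth decimal because the slides round z before using the table.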

41 41 Hypotheses tests of the Mean – Known Variance Problem 9 –In the midst of labor-management negotiations, the president of a company argues that the company's blue-collar workers, who are paid an average of $30K a year, are well-paid because the mean annual pay for blue-collar workers in the country is less than $30K. –This figure is disputed by the union. To test the president's belief an arbitrator draws a random sample of 350 blue-collar workers from across the country and records their income (see file Salaries). –If the arbitrator assumes that income is normally distributed with a standard deviation of $8,000, can it be inferred at 5% significance level that the company's president is correct?

42 42 Hypotheses tests of the Mean – Known Variance Solution (A Left Hand Tail Rejection Region) The population tested is the annual salary. –$H_0: \mu = 30$K; $H_1: \mu < 30$K –Left hand tail rejection region: $Z < -z_{.05}$, or $Z < -1.645$. $Z_{sample} = (\bar{x} - 30{,}000)/(8{,}000/350^{.5}) = -2.059$, where $\bar{x} \approx 29{,}120$ is the sample mean from the Salaries data. Since -2.059 < -1.645, there is sufficient evidence to infer that on the average blue-collar workers' income is lower than $30K at 5% significance level.

43 43 Hypotheses tests of the Mean – Known Variance Calculate the p-value of this test. Solution: p-value = $P(Z < Z_{sample}) = P(Z < -2.059) \approx .0197$.

44 44 Type II Error Problem 7a Calculate $\beta$ for the two-tail hypothesis test performed in Problem 7, when the actual mean diameter is .515 inch ($H_0: \mu = .500$; $H_1: \mu = .515$). Solution –The rejection region in terms of the critical values of the sample mean was found before: $\bar{x}_{L1} = .4902$; $\bar{x}_{L2} = .5098$. –$\beta$ = P(Do not reject $H_0$ when $H_1$ is true) = $P(.4902 < \bar{x} < .5098$ when $\mu = .515)$ = $P((.4902 - .515)/[.05/(100)^{.5}] < Z < (.5098 - .515)/[.05/(100)^{.5}])$ = $P(-4.96 < Z < -1.04) = P(1.04 < Z < 4.96) = P(Z < 4.96) - P(Z < 1.04) \approx 1 - P(Z < 1.04) = 1 - .8508 = .1492$ –This large probability may be reduced by taking larger samples.
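The type II error calculation has two stages: find the acceptance region under the null hypothesis, then ask how likely the sample mean is to land in it under the actual mean. The sketch below recomputes the critical means from the problem's inputs rather than copying them from the slide; `beta_two_tail` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_tail(mu0, mu_actual, sigma, n, alpha):
    """Type II error probability for a two-tail z test of the mean."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * se, mu0 + z * se   # acceptance region for x-bar
    actual = NormalDist(mu_actual, se)          # true sampling distribution
    return lower, upper, actual.cdf(upper) - actual.cdf(lower)


lower, upper, beta = beta_two_tail(0.50, 0.515, 0.05, 100, 0.05)
print(f"accept H0 for {lower:.4f} < x-bar < {upper:.4f}; beta = {beta:.4f}")
```

Calling the function with a larger n shows the slide's closing remark in action: for instance, n = 400 halves the standard error and makes beta far smaller.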

45 45 Ch 12: Inference when the Variance is Unknown Generally, the variance may be unknown. In this case we change the test statistic from Z to t, when testing the population mean. To test the population proportion we'll use the normal distribution (under certain conditions).

46 46 Testing the mean – unknown variance Replace the statistic Z with t: $t = (\bar{x} - \mu)/(s/\sqrt{n})$, with n - 1 degrees of freedom. The original distribution must be normal (or at least mound shaped).

47 47 Testing the mean – unknown variance Problem 10 –A federal agency inspects packages to determine if the contents are at least as large as advertised. –A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight was 8.04 ounces was drawn. (data is provided later) –From the sample results… Can we conclude that the average weight does not meet the weight stated? (use $\alpha = .05$). Estimate the mean weight of all containers with 99% confidence. What assumption must be met?

48 48 Testing the mean – unknown variance Solution –We hypothesize on the mean weight. $H_0: \mu = 8.04$; $H_1: \mu < 8.04$. (i) n = 5. For small samples let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94. –The rejection region: $t < -t_{\alpha, n-1} = -t_{.05, 5-1} = -2.132$. –The $t_{sample}$ = ? –Mean = (8.07 + … + 7.94)/5 = 7.996. Std. Dev. = $\{[(8.07 - 7.996)^2 + … + (7.94 - 7.996)^2]/4\}^{1/2} = .0546$.

49 49 Testing the mean – unknown variance –The $t_{sample}$ is calculated as follows: $t = (\bar{x} - \mu)/(s/\sqrt{n}) = (7.996 - 8.04)/(.0546/\sqrt{5}) = -1.80$. –Since -1.80 > -2.132, the sample statistic does not fall in the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04, at 5% significance level.
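The manual calculation for the five-container sample can be reproduced with the standard library. Python's `statistics` module has no t distribution, so the critical value 2.132 is hard-coded from a t-table (t at .05 with 4 degrees of freedom); that table lookup is an assumption of this sketch rather than something computed.

```python
from math import sqrt
from statistics import mean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]   # sample from part (i)
mu0 = 8.04                                 # weight stated on the package

xbar = mean(weights)
s = stdev(weights)                         # sample std. dev. (divides by n - 1)
t_sample = (xbar - mu0) / (s / sqrt(len(weights)))

T_CRIT_05_DF4 = 2.132                      # from a t-table, not computed here
reject_h0 = t_sample < -T_CRIT_05_DF4      # left-tail rejection region

print(f"mean={xbar:.3f} s={s:.4f} t={t_sample:.3f} reject={reject_h0}")
```

`statistics.stdev` uses the n - 1 divisor, which matches the division by 4 in the manual solution; `pstdev` would divide by n and give a different t value.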

50 50 Testing the mean – unknown variance –(ii) n = 50. To calculate the sample statistics we use Excel, Descriptive statistics from the Tools>Data analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04 –The confidence interval is calculated by $\bar{x} \pm t_{\alpha/2, n-1}\, s/\sqrt{n} = 8.02 \pm 2.68(.04/\sqrt{50}) = 8.02 \pm .015$, where $1-\alpha = .99$, $\alpha = .01$, $\alpha/2 = .005$, and $t_{.005, 50-1}$ is about 2.68 from the t-table. LCL = 8.005, UCL = 8.035.

51 51 Testing the mean – unknown variance Comments –Check whether it appears that the distribution is normal.

52 52 Using Excel –To obtain an exact value for t use the TINV function: =TINV(0.01,49), which returns approximately 2.680. The first argument, .01, is the two tail probability (= .005*2); the second argument, 49, is the degrees of freedom.

53 53 Testing the mean – unknown variance Problem 11 –Engineers in charge of the production of car seats are concerned about the compliance of the springs used with design specifications. –Springs are designed to be 500mm long. Springs too long or too short must be reworked. A standard deviation of 2mm in spring length will result in an acceptable number of reworked springs. –A sample of 100 springs was taken and measured.

54 54 Testing the mean – unknown variance Problem 11 – continued –Can we infer at 10% significance level that the mean spring length is not 500mm? Solution: $H_0: \mu = 500$; $H_1: \mu \ne 500$. Since the standard deviation is unknown, we need to run a t-test, assuming the spring length is normally distributed. Rejection region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, with d.f. = 99 and $t_{.05, 99} \approx 1.660$.

55 55 Inference about a population proportion The test and the confidence interval are based on the approximated normal distribution of the sample proportion, if np > 5 and n(1-p) > 5. For the confidence interval of p we have: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$. For the hypotheses test, we use a Z test: $Z = (\hat{p} - p)/\sqrt{p(1-p)/n}$.

56 56 Problem 12 (problem 11 continued). The engineers were interested in the percentage of springs that are the correct length. They marked each spring in the sample as –Correct – 1; –Too long – 2; –Too short – 3; Can we infer that less than 90% of the springs are the correct length, at 10% sig. level? Inference about a population proportion

57 57 Inference about a population proportion Problem 12 - Solution –$H_0: p = .9$; $H_1: p < .9$ –Rejection region: $Z < -z_{.10}$, or $Z < -1.28$. Conclusion: Since -1.33 < -1.28 we can infer that less than 90% of the springs do not need reworking.

58 58 Inference about a population proportion Problem 12 – solution continued –Let us estimate the proportion of good springs at 99% confidence level, using $\hat{p} \pm z_{.005}\sqrt{\hat{p}(1-\hat{p})/n}$ with $z_{.005} = 2.575$ and n = 100.

59 59 Inference about a population proportion Problem 12 – solution continued –Find the sample size if the proportion of good springs is to be estimated to within .035. Consider the given sample an initial sample, so that its $\hat{p}$ is used in $n = (z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}/W)^2$ with W = .035.

60 60 Problem 13 –A consumer protection group runs a survey of 400 dentists to check a claim that more than 4 out of 5 dentists recommend ingredients included in a certain toothpaste. –The survey results are as follows: 71 – No; 329 – Yes –At 5% significance level, can the consumer group infer that the claim is true? Inference about a population proportion

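61 61 Inference about a population proportion Problem 13 - Solution –The two hypotheses are: $H_0: p = .8$; $H_1: p > .8$. The rejection region: $Z > z_{.05}$, where $z_{.05} = 1.645$. With $\hat{p} = 329/400 = .8225$, $Z = (.8225 - .8)/\sqrt{.8(.2)/400} = 1.125$. Conclusion: Since 1.125 < 1.645 the consumer group cannot confirm the claim at 5% significance level.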
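The dentist survey in Problem 13 is a right-tail z test on a proportion. A minimal sketch follows; note that the standard error is computed from the hypothesized p = .8, not from the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

yes, n = 329, 400
p0, alpha = 0.8, 0.05

p_hat = yes / n
assert n * p0 > 5 and n * (1 - p0) > 5      # normal approximation conditions

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic under H0
z_crit = NormalDist().inv_cdf(1 - alpha)    # right-tail critical value
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > z_crit

print(f"p_hat={p_hat:.4f} z={z:.3f} z_crit={z_crit:.3f} "
      f"p-value={p_value:.4f} reject={reject_h0}")
```

Although 82.25% of the sampled dentists said yes, the p-value of roughly .13 is well above .05, so the sample does not establish that the true proportion exceeds 80%.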

62 62 Summary Example An automotive expert claims that the large number of self-serve gas stations has resulted in poor automobile maintenance, and that the average tire pressure is more than 4.5 psi below its manufacturer specifications. A random sample of 50 tires revealed the results stored in the file TirePressure. Assume the tire pressure is normally distributed with $\sigma = 1.5$ psi, and answer the following questions:

63 63 Summary Example At 10% significance level can we infer that the expert is correct? What is the p value? Solution –The Hypotheses: $H_0: \mu = 4.5$; $H_1: \mu > 4.5$. The rejection region: $Z > z_{.10}$, or $Z > 1.28$. From the data we have: mean = 5.04, so $Z = (5.04 - 4.5)/(1.5/50^{.5}) = 2.545$. –Since 2.545 > 1.28, there is sufficient evidence to infer that the expert is correct. The p value = P(Sample Mean > 5.04 when $\mu = 4.5$) = $P(Z > 2.545) = 1 - .9945 = .0055$.

64 64 Summary Example Find the probability of making a type II error when the actual tire under-inflation is 5 psi on the average. Solution The Rejection Region in terms of the sample means is found first: $Z_L = 1.28 = (\bar{x}_L - 4.5)/(1.5/50^{.5})$, so $\bar{x}_L = 4.5 + 1.28(1.5/50^{.5}) = 4.77$. So, the Rejection Region is: Sample mean > 4.77. $\beta$ = P(accept $H_0$ when $H_1$ is true) = P(sample mean does not fall in the RR, when $\mu = 5$) = $P(\bar{x} < 4.77$ when $\mu = 5)$ = $P(Z < (4.77 - 5)/(1.5/50^{.5})) = P(Z < -1.08) = .1401$. From Excel: [=NORMSDIST(-1.077)] = .1407

65 65 Inference about the population Variance The following statistic is $\chi^2$ (Chi squared) distributed with n-1 degrees of freedom: $\chi^2 = (n-1)s^2/\sigma^2$. We use this relationship to test and estimate the variance.

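66 66 Inference about the population Variance The Hypotheses tested are: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$, or $H_1: \sigma^2 \ne \sigma_0^2$. The rejection region is, respectively: $\chi^2 > \chi^2_{\alpha, n-1}$; $\chi^2 < \chi^2_{1-\alpha, n-1}$; or $\chi^2 > \chi^2_{\alpha/2, n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2, n-1}$.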

67 67 Testing the Variance Problem 15 Engineers in charge of the production of car seats are concerned about the compliance of the springs used with design specifications. Springs are designed to be 500mm long. –Springs too long or too short must be reworked. –A standard deviation of 2mm in springs length will result in an acceptable number of reworked springs. A sample of 100 springs was taken and measured.

68 68 Testing the Variance Problem 15 - continued Can we infer at 10% significance level that the number of springs requiring reworking is unacceptably large? The number of springs requiring reworking depends on the standard deviation, or the variance. $H_0: \sigma^2 = 4$; $H_1: \sigma^2 > 4$. Rejection region: $\chi^2_{sample} > \chi^2_{\alpha, n-1}$ with d.f. = 99, that is, $\chi^2_{sample} > \chi^2_{.10, 99} \approx 117.4$.

69 69 Testing the Variance Problem 15 - conclusion Since the sample $\chi^2$ statistic exceeds the critical value, we can infer at 10% significance level that the standard deviation is greater than 2, thus the number of springs that require reworking is unacceptably large.

70 70 Testing the Variance Problem 16 A random sample of 100 observations was taken from a normal population. The sample variance was 29.76. Can we infer at 2.5% significance level that the population variance does not exceed 30? Estimate the population variance with 90% confidence.

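71 71 Testing the Variance Problem 16 – Solution: $H_0: \sigma^2 = 30$; $H_1: \sigma^2 < 30$. $\chi^2 = (n-1)s^2/\sigma^2 = (100-1)(29.76)/30 = 98.208$. Rejection region: $\chi^2 < \chi^2_{1-\alpha, n-1}$, that is, $\chi^2 < \chi^2_{.975, 99} \approx 73.36$.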

72 72 Testing the Variance Problem 16 - conclusion Since 98.208 > 73.36 we conclude that there is insufficient evidence at 2.5% significance level to infer that the variance is smaller than 30.
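The Problem 16 test can be sketched without Excel. Python's standard library has no chi-squared distribution, so the critical value below comes from the Wilson-Hilferty normal approximation, which is a substitute for CHIINV introduced here and not the method used in the slides; `chi2_quantile_wh` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_wh(left_tail_prob, df):
    """Approximate chi-squared quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(left_tail_prob)
    c = 2 / (9 * df)
    return df * (1 - c + z * sqrt(c)) ** 3


n, s2, sigma0_sq, alpha = 100, 29.76, 30, 0.025

chi2_sample = (n - 1) * s2 / sigma0_sq      # test statistic
chi2_crit = chi2_quantile_wh(alpha, n - 1)  # left-tail critical value
reject_h0 = chi2_sample < chi2_crit         # H1: variance < 30

print(f"chi2={chi2_sample:.3f} critical={chi2_crit:.2f} reject={reject_h0}")
```

With 99 degrees of freedom the approximation agrees with tabled values to about two decimals, which is ample here because the sample statistic is nowhere near the rejection region.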

73 73 Using Excel –We can get an exact value of the probability $P(\chi^2_{d.f.} > \chi^2)$ for a given $\chi^2$ and known d.f., and then determine the p-value. –Use the CHIDIST function: =CHIDIST($\chi^2$, d.f.). For example: =CHIDIST(98.208,99) returns $P(\chi^2_{99} > 98.208) \approx .50$. –In our example we had a left hand tail rejection region, and therefore the p-value is $P(\chi^2_{99} < 98.208)$ = 1 - CHIDIST(98.208,99).

74 74 Using Excel –We can get the exact $\chi^2$ value for which $P(\chi^2_{d.f.} > \chi^2) = \alpha$ for any given probability $\alpha$ and known d.f., then define the rejection region. –Use the CHIINV function: =CHIINV($\alpha$, d.f.). For example: =CHIINV(.975,99) $\approx 73.36$. That is: $P(\chi^2_{99} > 73.36) = .975 = 1 - .025$. The rejection region is: $\chi^2 < 73.36$.

