Inferences on a Population Mean

TOPIC 8 Inferences on a Population Mean

Road Map
Statistical Methods: Descriptive Statistics and Statistical Inference. Statistical Inference: Estimation and Decision Making.

Estimation Process
Population: the mean, μ, is unknown. Random sample: the sample mean is X̄ = 50. Conclusion: "I am 95% confident that μ is between 40 and 60."

Point Estimates
Estimating Population Parameters Using Sample Statistics: the population mean (µ) is estimated by the sample mean X̄, the population standard deviation (σ) by the sample standard deviation S, and the population proportion (p) by the sample proportion. These are single-value estimates (point estimates). They do not tell us how close the estimate is to the actual unknown parameter. Because a sample statistic varies from sample to sample, an interval built around the sample statistic gives a more informative estimate of the population parameter.

Key Elements of Interval Estimation
Sample statistic (point estimate). Confidence interval, bounded by a lower confidence limit and an upper confidence limit. A stated level of confidence (a probability) that the interval contains the population parameter.

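Intervals and Confidence Level
[Figure: sampling distribution of the sample mean, centered at $\mu_{\bar{X}} = \mu$ with standard error $\sigma_{\bar{X}}$; central area $1 - \alpha$ and area $\alpha/2$ in each tail, drawn above a large number of intervals.]
Intervals extend from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$. Of a large number of such intervals, (1 − α)100% contain μ and α100% do not.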

Interval Estimates
If you took all possible samples of size n and built a 95% interval around each sample mean, 95% of those intervals would include the population mean and only 5% would not. In this sense you have 95% confidence that the population mean lies somewhere in your interval. The width of the interval tells us how close the estimate is likely to be to the true population parameter.
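
The "95% of intervals contain the mean" idea can be checked with a small simulation. This is only a sketch: the population values (mean 50, standard deviation 10), the sample size of 25, the number of trials, and the function name `coverage` are all assumptions chosen for illustration, and σ is treated as known.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated intervals X-bar +/- Z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from a normal population and average it.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=50, sigma=10, n=25, confidence=0.95, trials=2000)
print(f"Share of intervals containing the mean: {rate:.3f}")
```

The printed share should land close to 0.95, and it moves closer as the number of trials grows.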

Confidence Level
The common levels of confidence are 90%, 95%, and 99%. The level of confidence is symbolized by (1 − α)100%, where α is the proportion in the tails, upper (α/2) and lower (α/2), outside the confidence interval.
Confidence Level (1 − α)100%    α       α/2      Critical Value Zα/2
90%                             0.10    0.05     1.645
95%                             0.05    0.025    1.960
99%                             0.01    0.005    2.575
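
As a rough check, the critical values can be computed instead of read from a table. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `z_critical` is an illustrative choice. Note that the computed 99% value rounds to 2.576, while tables often list 2.575.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Z_{alpha/2} leaves an area of alpha/2 in the upper tail.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")
```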

Factors Affecting Interval Width
Intervals extend from $\bar{X} - Z\sigma_{\bar{X}}$ to $\bar{X} + Z\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. Data dispersion: measured by σ. Sample size: affects $\sigma_{\bar{X}}$. Level of confidence (1 − α): affects Z. Have students explain why each of these occurs; the effect of the level of confidence can be seen in the sampling distribution.

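Confidence Interval for μ (σ Known)
Interval estimation when σ is known:
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Assumptions: the population follows a normal distribution, or the sample size is large (n ≥ 30). The value of $Z_{\alpha/2}$ changes according to the confidence level (90%, 95%, or 99%).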

Example
You’re a Q/C inspector for Gallo. The σ for 2-liter bottles is 0.05 liters. A random sample of 100 bottles showed x̄ = 1.99 liters. What is the 90% confidence interval estimate of the true mean amount in 2-liter bottles?

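Example Solution
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.99 \pm 1.645 \cdot \frac{0.05}{\sqrt{100}} = 1.99 \pm 0.0082$$
so 1.982 ≤ μ ≤ 1.998 liters. (Find $Z_{0.05} = 1.645$ from the normal table.)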
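
The bottle example can be checked numerically. The following is a minimal sketch, assuming σ is known and using only the standard library; `z_interval` is an illustrative name, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# 2-liter bottle example: sigma = 0.05, n = 100, x-bar = 1.99, 90% confidence.
low, high = z_interval(1.99, 0.05, 100, 0.90)
print(f"{low:.4f} <= mu <= {high:.4f}")
```

The same function applies to any of the σ-known exercises by changing the four arguments.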

Exercise
The quality control manager at a light bulb factory needs to estimate the mean life of a large shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light bulbs indicated a sample mean life of 350 hours.
a) Construct a 95% confidence interval estimate of the population mean life of light bulbs in this shipment.
b) Do you think that the manufacturer has the right to state that the light bulbs last an average of 400 hours? Explain.
c) Must you assume that the population of light bulb life is normally distributed? Explain.

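Exercise Solution
a) $350 \pm 1.96 \cdot \frac{100}{\sqrt{64}} = 350 \pm 24.5$, so 325.5 ≤ μ ≤ 374.5 hours. (Find $Z_{0.025} = 1.96$ from the normal table.)
b) No. The claimed mean of 400 hours lies outside the interval.
c) Not necessary. The sample size (n = 64) is large (n ≥ 30).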

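Confidence Interval for μ (σ Unknown)
Interval estimation when σ is unknown: the sample standard deviation S will be a good estimator of σ. Then the confidence interval is approximately
$$\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}$$
Assumption: the sample size is large (n ≥ 30).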

Exercise
A stationery store wants to estimate the mean retail value of greeting cards that it has in its inventory. A random sample of 100 greeting cards indicates a mean value of \$2.55 and a standard deviation of \$0.44.
a) Construct a 95% confidence interval estimate of the mean value of all greeting cards in the store’s inventory.
b) Suppose there were 2,500 greeting cards in the store’s inventory. How are the results above useful in assisting the store owner to estimate the total value of her inventory?

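Exercise Solution
a) $2.55 \pm 1.96 \cdot \frac{0.44}{\sqrt{100}} = 2.55 \pm 0.0862$, so \$2.46 ≤ μ ≤ \$2.64. (Find $Z_{0.025} = 1.96$ from the normal table.)
b) Multiplying the interval limits by the 2,500 cards gives an estimated total inventory value between about \$6,159 and \$6,591.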

Confidence Interval for μ (σ Unknown)
What if the sample size is small (less than 30)? Instead of the standard normal statistic, which requires knowing σ or a good approximation of it, we use a statistic that follows Student’s t distribution. If the random variable X is normally distributed, then the following statistic has a t distribution with n − 1 degrees of freedom (df):
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Please notice that some books, including our textbook, use Student’s t statistic even for large sample sizes (n ≥ 30). This also applies to the next topic, hypothesis testing!

Student’s t Distribution
[Figure: the standard normal curve (Z) compared with t curves for df = 13 and df = 5. All are bell-shaped and symmetric, but the t curves have ‘fatter’ tails.]

Degrees of Freedom (df)
Number of observations that are free to vary after the sample statistic has been calculated. Example: the sum of 3 numbers is 6. X1 = 1 (or any number), X2 = 2 (or any number), X3 = 3 (cannot vary, because the sum must be 6). Degrees of freedom = n − 1 = 3 − 1 = 2.

t - Table
Match ‘df’ (degrees of freedom, df = n − 1) and the upper tail area, α/2. For example, to find the ‘t’ value with 90% confidence level and sample size n = 6: α = 10%, α/2 = 5% = 0.05; df = n − 1 = 6 − 1 = 5. Match (5, 0.05) in the table: tα/2 = 2.0150.
        Upper tail area
d.f.    0.25      0.1       0.05      0.025
1       1.0000    3.0777    6.3137    12.7062
2       0.8165    1.8856    2.9200    4.3027
3       0.7649    1.6377    2.3534    3.1824
4       0.7407    1.5332    2.1318    2.7765
5       0.7267    1.4759    2.0150    2.5706
6       0.7176    1.4398    1.9432    2.4469

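Confidence Interval for μ (σ Unknown)
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
Assumptions: σ is unknown; the population follows a normal distribution; the sample size is small (n < 30).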

Example
You’re a time study analyst in manufacturing. You’ve recorded the following task times (min.): 3.6, 4.2, 4.0, 3.5, 3.8, 3.1. What is the 90% confidence interval estimate of the population mean task time? Allow students about 20 minutes to solve.

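Example Solution
n = 6, df = n − 1 = 6 − 1 = 5, t.05 = ±2.015 (from the t table). From the data, X̄ = 3.7 and S = 0.3899.
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} = 3.7 \pm 2.015 \cdot \frac{0.3899}{\sqrt{6}} = 3.7 \pm 0.3207$$
so 3.379 ≤ μ ≤ 4.021 minutes.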
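
The task-time interval can be reproduced from the raw data. The sketch below is an illustration only: Python's standard library has no t distribution, so the critical value 2.015 (df = 5, upper tail area 0.05) is copied from the t table rather than computed, and `t_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

times = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]  # task times in minutes
T_CRIT = 2.015  # t_{0.05} with df = 5, taken from the t table (not computed here)

def t_interval(data, t_crit):
    """Confidence interval for the mean when sigma is unknown (small sample)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

low, high = t_interval(times, T_CRIT)
print(f"{low:.3f} <= mu <= {high:.3f}")
```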

Exercise
The following data represent the bounced check fee, in dollars, charged by a sample of 23 banks for direct-deposit customers who maintain a \$100 balance:
a) Construct a 90% confidence interval for the population mean bounced check fee.
b) What assumption is required for the interval to be valid? Is it reasonably satisfied?

Exercise Solution
n = 23, df = n − 1 = 23 − 1 = 22, t.05 = ±1.717 (from the t table).
a) $\bar{X} \pm 1.717 \cdot \frac{S}{\sqrt{23}}$, with X̄ and S computed from the sample.
b) It is assumed that the population follows a normal distribution.

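Sample Size for Estimating µ
"I don’t want to sample too much or too little!"
$$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{SE^{2}}$$
where SE is the sampling error, the largest acceptable distance between the estimate and μ. When σ is unknown, the sample standard deviation S from prior sampling would be a good estimator. Please notice that the textbook we use has a different formula to determine the sample size, written in terms of L, the interval length.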

Example
What sample size is needed to be 90% confident the mean is within ±5? A pilot study suggested that the standard deviation is 45.
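
A quick calculation for this example, offered as a sketch. It assumes the usual sample-size formula n = (Z·σ / SE)² with the result rounded up to the next whole observation; `sample_size` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, error, confidence):
    """Smallest n so that the interval half-width is at most `error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    return ceil((z * sigma / error) ** 2)  # always round up

# Pilot study: standard deviation 45, desired error +/- 5, 90% confidence.
n = sample_size(sigma=45, error=5, confidence=0.90)
print(n)
```

Rounding up rather than to the nearest integer keeps the achieved error at or below the target.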

Road Map
Statistical Methods: Descriptive Statistics and Statistical Inference. Statistical Inference: Estimation and Decision Making.

Hypothesis Testing Process
Population: "I believe the population mean age is 50 (hypothesis)." Random sample: the sample mean is X̄ = 20. Conclusion: "Reject hypothesis! Not close."

What is a Hypothesis?
A statement or claim about a population parameter (μ, p, σ), developed for the purpose of testing. Hypothesis statements are made before analysis. Example: "We claim that the average mileage is 10 km/l." There are two elements of a hypothesis: the null hypothesis and the alternative hypothesis.

Null and Alternative Hypothesis
A Null Hypothesis is a statement that nothing unusual occurs or will occur. We begin with the assumption that it is true. Designated H0. Always contains an equality sign: =, ≤, or ≥.
An Alternative Hypothesis is the opposite of the null hypothesis: a statement that something unusual occurs or will occur. Designated H1 or Ha. Always has an inequality sign: ≠, <, or >.

Example: Test 1
“Test that the population mean is not 5”
State the question statistically: μ ≠ 5. State the opposite statistically: μ = 5. The two statements must be mutually exclusive and exhaustive. Select the null hypothesis: H0: μ = 5. Hence the alternative hypothesis is Ha: μ ≠ 5 (two-tailed test).

Example: Test 2
“Test that the population mean is less than 5”
State the question statistically: μ < 5. State the opposite statistically: μ ≥ 5. The two statements must be mutually exclusive and exhaustive. Select the null hypothesis: H0: μ ≥ 5. Hence the alternative hypothesis is Ha: μ < 5 (one-tailed test / lower-tailed test).
“Test that the population mean is greater than 5”
Null hypothesis: H0: μ ≤ 5. Alternative hypothesis: Ha: μ > 5 (one-tailed test / upper-tailed test).

Critical Value and Rejection Region
The critical value is the dividing point between the region where the null hypothesis is rejected and the region where it is not rejected. It depends on the selected α value (level of significance), the type of test statistic (Z or t), and whether a one- or two-tailed test is used.
[Figure: distribution of Z or t with a central nonrejection region of area 1 − α and a rejection region of area α/2 in each tail, bounded by the critical values −Zα/2 (or −tα/2) and Zα/2 (or tα/2).]

1) Level of Significance (α)
Probability of rejecting a true null hypothesis. Represented by ‘α’ (alpha). Selected by the researcher at the start. Typical values are 0.01, 0.05, and 0.10 (that is, in the range 0.01 to 0.10). Corresponds to a confidence level of (1 − α)100%: for example, α = 0.01 (1%) means a 99% confidence level.

2) Type of Test Statistic: Z or t?
One sample from a population (1- or 2-tailed test): if σ is known, use the Z test; if σ is unknown, use the t test.

3) Whether One or Two Tailed Test?
Example 1: Two-Tailed Test. H0: μ = 5, Ha: μ ≠ 5.
[Figure: nonrejection region of area 1 − α centered on the H0 value, with a rejection region of area α/2 in each tail beyond the critical values −Zα/2 (or −tα/2) and Zα/2 (or tα/2).]
For a two-tailed test (Ha: μ ≠ 5), both the upper and lower tails marked by the critical values are the rejection region for H0. If the calculated test statistic (Z or t) lies in the rejection region, the null hypothesis (H0) is rejected. Otherwise, do not reject H0.

3) Whether One or Two Tailed Test?
Example 2: One/Upper-Tailed Test. H0: μ = 5, Ha: μ > 5.
[Figure: nonrejection region of area 1 − α to the left of the critical value Zα (or tα), and a rejection region of area α in the upper tail.]
For a one/upper-tailed test (Ha: μ > 5), the rejection region is the right-hand tail marked by a positive critical value. If the calculated test statistic (Z or t) lies in the rejection region, the null hypothesis (H0) is rejected. Otherwise, do not reject H0.

3) Whether One or Two Tailed Test?
Example 3: One/Lower-Tailed Test. H0: μ = 5, Ha: μ < 5.
[Figure: rejection region of area α in the lower tail beyond the critical value −Zα (or −tα), and a nonrejection region of area 1 − α to its right.]
For a one/lower-tailed test (Ha: μ < 5), the rejection region is the left-hand tail marked by a negative critical value. If the calculated test statistic (Z or t) lies in the rejection region, the null hypothesis (H0) is rejected. Otherwise, do not reject H0.

Steps in Hypothesis Testing
1) Set up the null and alternative hypotheses, H0 and Ha, based on the research question.
2) Decide the test to perform (Z or t) depending on whether σ is known or unknown.
3) Determine the critical value for the given α.
4) Compute the test statistic (Z or t) depending on the type of test used.
5) Make the decision using the rule (reject / do not reject H0).
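
The steps above can be sketched for the Z test (σ known). This is an illustration under stated assumptions, not a prescribed implementation: the function name `z_test` and the `tail` labels are invented here, and the numbers in the usage lines come from the light-bulb lifetime exercise later in this topic (mean 200, σ = 20, n = 100, x̄ = 195, α = 0.01, lower-tailed).

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, tail):
    """Return (test statistic, critical value, reject H0?) for a one-sample Z test.

    tail is 'lower' (Ha: mu < mu0), 'upper' (Ha: mu > mu0) or 'two' (Ha: mu != mu0).
    """
    std = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))       # step 4: test statistic
    if tail == "two":
        crit = std.inv_cdf(1 - alpha / 2)      # step 3: +/- Z_{alpha/2}
        reject = abs(z) > crit
    elif tail == "upper":
        crit = std.inv_cdf(1 - alpha)          # step 3: Z_alpha
        reject = z > crit
    elif tail == "lower":
        crit = -std.inv_cdf(1 - alpha)         # step 3: -Z_alpha
        reject = z < crit
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return z, crit, reject                     # step 5: decision

z, crit, reject = z_test(xbar=195, mu0=200, sigma=20, n=100, alpha=0.01, tail="lower")
print(f"Z = {z:.2f}, critical value = {crit:.2f}, reject H0: {reject}")
```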

Z - Test of Hypothesis for μ
Hypothesis testing when σ is known: Z test.
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
Assumptions: the population follows a normal distribution, or the sample size is sufficiently large (n ≥ 30).

Test Statistic: Z Test
Rejection Regions for Common Values of α
Alternative Hypothesis   Rejection Region            α = 0.10                    α = 0.05                    α = 0.01
Lower-Tailed             Z < −Zα                     Z < −1.280                  Z < −1.645                  Z < −2.330
Upper-Tailed             Z > Zα                      Z > 1.280                   Z > 1.645                   Z > 2.330
Two-Tailed               Z < −Zα/2 or Z > Zα/2       Z < −1.645 or Z > 1.645     Z < −1.960 or Z > 1.960     Z < −2.575 or Z > 2.575

How to Find Critical Z: Two Tailed Test
What is Zα/2 given α = .05? Here α/2 = .025, so the table area between 0 and Z is .5 − .025 = .4750, which is found at row 1.9, column .06. Hence Zα/2 = 1.96 and the critical values are −1.96 and 1.96.
Standardized Normal Probability Table (Portion), σ = 1
Z      .05      .06      .07
1.6    .4505    .4515    .4525
1.7    .4599    .4608    .4616
1.8    .4678    .4686    .4693
1.9    .4744    .4750    .4756

How to Find Critical Z: One Tailed Test
What is Zα given α = .025? The tail area is again .025, so the same table entry (.4750 at row 1.9, column .06) gives Zα = 1.96.

Example: Two Tailed Test
Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed x̄ = 372.5. The company has specified σ to be 15 grams. Test at the .05 level of significance.

Example Solution
H0: μ = 368. Ha: μ ≠ 368.
α = .05, α/2 = .025, n = 25.
Critical values: Z = ±1.96 (reject H0 if Z < −1.96 or Z > 1.96).
Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = 1.5$
Decision: Z = 1.5 is in the nonrejection region, so do not reject H0 at α = .05.
Conclusion: There is no evidence that the average is not 368 grams.

Example: One Tailed Test
You’re an analyst for Ford. You want to find out if the average miles per gallon of Escorts is at least 32 mpg. Similar models have a standard deviation of 3.8 mpg. You take a sample of 60 Escorts & compute a sample mean of 30.7 mpg. At the .01 level of significance, is there evidence that the miles per gallon is at least 32?

Example Solution
H0: μ = 32. Ha: μ < 32.
α = .01, n = 60.
Critical value: Z = −2.33 (reject H0 if Z < −2.33).
Test statistic: $Z = \frac{30.7 - 32}{3.8/\sqrt{60}} = -2.65$
Decision: Z = −2.65 is in the rejection region, so reject H0 at α = .01.
Conclusion: There is evidence that the average is less than 32 mpg.

Exercises
1) The quality control manager believes that the lifetime of bulbs follows a normal distribution with a mean of 200 hours and a standard deviation of 20 hours. A sample of 100 bulbs showed a sample mean of 195 hours. The manager wants to test whether the mean lifetime of all bulbs is 200 at a 1% level of significance.
2) A company that manufactures chocolate bars is particularly concerned that the mean weight of a chocolate bar not be greater than 6.03 ounces. Past experience allows you to assume that the standard deviation is 0.02 ounces. A sample of 50 chocolate bars is selected, and the sample mean is 6.034 ounces. Using the α = 0.01 level of significance, is there evidence that the population mean weight of chocolate bars is greater than 6.03 ounces?

Exercise 1 Solution
H0: μ = 200. Ha: μ < 200.
α = .01, n = 100.
Critical value: Z = −2.33 (reject H0 if Z < −2.33).
Test statistic: $Z = \frac{195 - 200}{20/\sqrt{100}} = -2.5$
Decision: Z = −2.5 is in the rejection region, so reject H0 at α = .01.
Conclusion: There is evidence that the average is less than 200 hours.

Exercise 2 Solution
H0: μ ≤ 6.03. Ha: μ > 6.03.
α = .01, n = 50.
Critical value: Z = 2.33 (reject H0 if Z > 2.33).
Test statistic: Z = 1.41
Decision: Z = 1.41 is in the nonrejection region, so do not reject H0 at α = .01.
Conclusion: There is no evidence that the average is greater than 6.03 ounces.

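Z - Test of Hypothesis for μ
Hypothesis testing when σ is unknown: for a large sample use the Z test; for a small sample use the t test. Test statistic for the large-sample case:
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$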

Example
A random sample of 64 observations produced the following summary statistics:
a) Test the null hypothesis that the mean of the population is 0.36 against the alternative hypothesis, μ < 0.36. Use α = 0.1.
b) Test the null hypothesis that the mean of the population is 0.36 against the alternative hypothesis, μ ≠ 0.36. Use α = 0.1. Interpret the result.

Example Solution (1)
H0: μ = 0.36. Ha: μ < 0.36.
α = 0.1, n = 64.
Critical value: Z = −1.28 (reject H0 if Z < −1.28).
Test statistic: Z = −1.6
Decision: Z = −1.6 is in the rejection region, so reject H0 at α = 0.1.
Conclusion: There is evidence that the average is less than 0.36.

Example Solution (2)
H0: μ = 0.36. Ha: μ ≠ 0.36.
α = 0.1, α/2 = 0.05, n = 64.
Critical values: Z = ±1.645 (reject H0 if Z < −1.645 or Z > 1.645).
Test statistic: Z = −1.6
Decision: Z = −1.6 is in the nonrejection region, so do not reject H0 at α = 0.1.
Conclusion: There is no evidence that the average is not 0.36.

t - Test for Mean (μ), σ Unknown
When σ is unknown: for a small sample use the t test; for a large sample use the Z test.
Assumptions: the population is normally distributed; if it is not normal, it is only slightly skewed and a small sample (n < 30) is taken. Parametric test procedure. t test statistic:
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \qquad df = n - 1$$

How to Find Critical t: Two Tailed Test
Given: n = 3; α = .10. Then df = n − 1 = 2 and α/2 = .05, so the critical values are t = −2.920 and t = 2.920.
Critical Values of t Table (Portion)
v      t.10     t.05     t.025
1      3.078    6.314    12.706
2      1.886    2.920    4.303
3      1.638    2.353    3.182

Example
You’re a marketing analyst for Wal-Mart. Wal-Mart had teddy bears on sale last week. The weekly sales (\$00) of bears sold in 10 stores was:
At the .05 level of significance, is there evidence that
a) the average bear sales per store is different from 5 (\$00)?
b) the average bear sales per store is more than 5 (\$00)?
Assume that the population is normally distributed. Allow students about 10 minutes to solve this.

Example Solution (1)
H0: μ = 5. Ha: μ ≠ 5.
α = .05, α/2 = .025, df = 10 − 1 = 9.
Critical values: t = ±2.262 (reject H0 if t < −2.262 or t > 2.262).
Test statistic: t = 1.31
Decision: t = 1.31 is in the nonrejection region, so do not reject H0 at α = .05.
Conclusion: There is no evidence that the average is different from 5.
Note: More than 5 have been sold (6.4), but not enough to be significant.

Example Solution (2)
H0: μ = 5. Ha: μ > 5.
α = .05, df = 10 − 1 = 9.
Critical value: t = 1.833 (reject H0 if t > 1.833).
Test statistic: t = 1.31
Decision: t = 1.31 is in the nonrejection region, so do not reject H0 at α = .05.
Conclusion: There is no evidence that the average is more than 5.
Note: More than 5 have been sold (6.4), but not enough to be significant.

Exercise
A particular branch of a well-known bank stores enough money during the weekends to satisfy its customers’ needs. The expected average withdrawal during the weekend is \$500. When they looked at a sample of the last 16 weekend transactions, they found the average withdrawal to be \$540 with a standard deviation of \$70. At α = .05, is there evidence that the mean withdrawal has increased during weekends? It is assumed that the population follows a normal distribution.

Exercise Solution
H0: μ = 500. Ha: μ > 500.
α = .05, df = n − 1 = 16 − 1 = 15.
Critical value: t = 1.753 (reject H0 if t > 1.753).
Test statistic: $t = \frac{540 - 500}{70/\sqrt{16}} = \frac{40}{17.5} \approx 2.29$
Decision: t ≈ 2.29 is in the rejection region, so reject H0 at α = .05.
Conclusion: There is evidence that the average is more than 500.
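
The bank exercise can be verified from its summary statistics. This is a sketch only: the standard library has no t distribution, so the critical value 1.753 (df = 15, α = .05, upper tail) is taken from the exercise rather than computed, and `t_statistic` is an illustrative name.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic from summary statistics."""
    return (xbar - mu0) / (s / sqrt(n))

T_CRIT = 1.753  # t_{0.05} with df = 15, taken from the t table (not computed here)

t = t_statistic(xbar=540, mu0=500, s=70, n=16)
reject = t > T_CRIT  # upper-tailed test: reject H0 when t exceeds the critical value
print(f"t = {t:.3f}, reject H0: {reject}")
```

Since 40 / 17.5 is about 2.286, the statistic exceeds 1.753 and falls in the rejection region, which agrees with the decision to reject H0.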

Observed Significance Level (p-Values)
The p-value is the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis, H0, is true. The p-value, often referred to as the observed level of significance, is the smallest value of the significance level at which H0 can be rejected. The decision rules for rejecting H0 in the p-value approach are:
If the p-value is less than α, the null hypothesis is rejected.
If the p-value is greater than or equal to α, the null hypothesis is not rejected.

Finding the p-Values
Two-tailed test: p-value = 2 × Area = 2 × Probability (Z ≥ ‘absolute value’ of the computed test statistic value).
Upper-tailed test: p-value = Area = Probability (Z ≥ the computed test statistic value).
Lower-tailed test: p-value = Area = Probability (Z ≤ the computed test statistic value).
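
The three p-value rules can be written as one small function. This is a minimal sketch for Z statistics using the standard library; the name `p_value` and the `tail` labels are illustrative choices.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value for a computed Z statistic; tail is 'lower', 'upper' or 'two'."""
    std = NormalDist()
    if tail == "lower":
        return std.cdf(z)                   # P(Z <= z)
    if tail == "upper":
        return 1 - std.cdf(z)               # P(Z >= z)
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))    # 2 * P(Z >= |z|)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

for z in (-2.65, -2.50):
    print(f"Z = {z:.2f}: lower-tailed p-value = {p_value(z, 'lower'):.4f}")
```

Comparing the returned value with α then gives the same decision as the critical-value approach.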

Example
You’re an analyst for Ford. You want to find out if the average miles per gallon of Escorts is at least 32 mpg. Similar models have a standard deviation of 3.8 mpg. You take a sample of 60 Escorts & compute a sample mean of 30.7 mpg. What is the value of the observed level of significance (p-value)?

Example Solution
H0: μ ≥ 32. Ha: μ < 32.
σ = 3.8, n = 60.
Test statistic: Z = −2.65
p-value: P(Z ≤ −2.65) = .004
Decision: Since the p-value < α (α = .01), reject H0.

Exercise The quality control manager believes that the lifetime of bulbs follows a normal distribution with a mean of 200 hours and a standard deviation of 20 hours. A sample of 100 bulbs showed a sample mean of 195 hours. The manager wants to test whether the mean lifetime of all bulbs is 200 at a 1% level of significance. Use the p-value approach.

Exercise Solution
H0: μ ≥ 200. Ha: μ < 200.
σ = 20, n = 100.
Test statistic: Z = −2.50
p-value: P(Z ≤ −2.50) = .0062
Decision: Since the p-value < α (α = .01), reject H0.

Risks in Using Hypothesis Testing
                         Actual Situation
Statistical Decision     H0 True                                    H0 False
Do not reject H0         Correct decision, Confidence = (1 − α)     Type II error, P(Type II error) = β
Reject H0                Type I error, P(Type I error) = α          Correct decision, Power = (1 − β)
Type I error has serious consequences.

α and β Have an Inverse Relationship
For a fixed sample size, you can’t reduce both errors simultaneously!
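
The trade-off can be illustrated numerically. The sketch below computes β for a lower-tailed Z test at several values of α. It borrows σ = 3.8, n = 60 and the hypothesized mean 32 from the Ford example; the "true" mean of 31 is a hypothetical value chosen only for illustration, and `beta_lower_tail` is an invented name.

```python
from math import sqrt
from statistics import NormalDist

def beta_lower_tail(mu0, mu_true, sigma, n, alpha):
    """P(Type II error) for H0: mu >= mu0 vs Ha: mu < mu0 when the true mean is mu_true."""
    std = NormalDist()
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean falls below this cutoff.
    xbar_crit = mu0 - std.inv_cdf(1 - alpha) * se
    # Type II error: the sample mean stays at or above the cutoff although mu = mu_true.
    return 1 - std.cdf((xbar_crit - mu_true) / se)

# mu_true = 31 is a hypothetical alternative, not a value from the slides.
betas = {a: beta_lower_tail(32, 31, 3.8, 60, a) for a in (0.01, 0.05, 0.10)}
for a, b in betas.items():
    print(f"alpha = {a:.2f}: beta = {b:.3f}, power = {1 - b:.3f}")
```

As α increases, β falls and the power rises. With n held fixed, lowering one error probability raises the other.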

Any Questions ?