# L. Wang, Department of Statistics, University of South Carolina: Inference on a Single Mean

## Presentation transcript

Slide 1: Inference on a Single Mean

Slides 2 and 3: Use Calculation from Sample to Estimate Population Parameter. [Diagram: a sample is selected from the population; a statistic is calculated from the sample; the statistic is used to estimate a parameter; the parameter describes the population.]

Slide 4: Statistic versus Parameter

| Statistic | Parameter |
| --- | --- |
| Describes a sample. | Describes a population. |
| Always known. | Usually unknown. |
| Changes upon repeated sampling. | Is fixed. |
| Examples: [symbols not recovered] | Examples: [symbols not recovered] |

Slide 5: A Statistic is a Random Variable. Upon repeated sampling of the same population, the value of a statistic changes. While we don't know what the next value will be, we do know the overall pattern over many, many samplings. The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.

Slide 6: Sampling Distribution of $\bar{Y}$. If a random sample of size $n$ is taken from a normal population having mean $\mu_y$ and variance $\sigma_y^2$, then $\bar{Y}$ is a random variable which is also normally distributed with mean $\mu_y$ and variance $\sigma_y^2/n$.

Slide 7: Sampling Distribution of $\bar{Y}$. [Figure: normal curves N(100, 5), N(100, 3.54), N(100, 1.58), and N(100, 1), where the second parameter is the standard deviation. These values are consistent with means of n = 1, 2, 10, and 25 observations, since $5/\sqrt{n}$ gives 5, 3.54, 1.58, and 1.]

Slide 8: Light Bulbs. The life of a light bulb is normally distributed with a mean of 2000 hours and standard deviation of 300 hours. What is the probability that a randomly chosen light bulb will have a life of less than 1700 hours? What is the probability that the mean life of three randomly chosen light bulbs will be less than 1700 hours?
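The two light bulb questions can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the slides. The only change between the two questions is the standard deviation, which becomes $\sigma/\sqrt{n}$ for the mean of $n$ bulbs.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 2000.0, 300.0  # bulb life in hours, as given on the slide

# One bulb: Y ~ N(mu, sigma)
p_single = NormalDist(mu, sigma).cdf(1700)

# Mean of n = 3 bulbs: Ybar ~ N(mu, sigma / sqrt(n))
n = 3
p_mean = NormalDist(mu, sigma / sqrt(n)).cdf(1700)

print(f"P(Y < 1700)    = {p_single:.4f}")
print(f"P(Ybar < 1700) = {p_mean:.4f}")
```

The single bulb falls below 1700 hours with probability of roughly 0.16, while the mean of three bulbs does so with probability of roughly 0.04, because the average varies less than a single reading.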

Slide 9: Why Averages Instead of Single Readings? Suppose we are manufacturing light bulbs. The life of these bulbs has historically followed a normal distribution with a mean of 2000 hours and standard deviation of 300 hours. We change the filament material and, unbeknown to us, the average life of the bulbs decreases to 1500 hours. (We will assume that the distribution remains normal with a standard deviation of 300 hours.) If we randomly sample 1 bulb, will we realize that the average life has decreased? What if we sample 3 bulbs? 9 bulbs?

Slide 10: Why Averages Instead of Single Readings? [Figure: sampling distributions of single readings centered at μ = 1500 and μ = 2000, each with σ = 300. A reading Y < 1400 would signal a shift.]

Slide 11: Why Averages Instead of Single Readings? [Figure: sampling distributions of averages of n = 3 centered at μ = 1500 and μ = 2000, each with standard deviation 173. An average below 1650 would signal a shift.]

Slide 12: Why Averages Instead of Single Readings? [Figure: sampling distributions of averages of n = 9 centered at μ = 1500 and μ = 2000, each with standard deviation 100. An average below 1800 would signal a shift.]
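To see why larger samples detect the shift more reliably, one can compute, for each cutoff on Slides 10 to 12, the chance of a signal when the mean is still 2000 (a false alarm) and when it has dropped to 1500 (a detection). This is a sketch under the slides' assumptions (normal data, σ = 300); the dictionary and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

sigma = 300.0
cutoffs = {1: 1400.0, 3: 1650.0, 9: 1800.0}  # n -> signal cutoff from Slides 10-12

results = {}
for n, cutoff in cutoffs.items():
    se = sigma / sqrt(n)                               # standard deviation of the average
    false_alarm = NormalDist(2000.0, se).cdf(cutoff)   # signal although mu is still 2000
    detect = NormalDist(1500.0, se).cdf(cutoff)        # signal when mu has dropped to 1500
    results[n] = (se, false_alarm, detect)
    print(f"n={n}: se={se:6.1f}  P(false alarm)={false_alarm:.4f}  P(detect)={detect:.4f}")
```

Each cutoff sits about two standard errors below 2000, so the false-alarm rate stays near 2% in all three cases, while the detection probability rises sharply with n.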

Slide 13: What if the original distribution is not normal? Consider the roll of a fair die.

Slide 14: Suppose the single measurements are not normally distributed. Let Y = life of a light bulb in hours. Y is exponentially distributed with λ = 0.0005 = 1/2000.

Slide 15: [Figure panels: single measurements; averages of 2 measurements; averages of 4 measurements; averages of 25 measurements.] Source: Lawrence L. Lapin, Statistics in Modern Business Decisions, 6th ed., 1993, Dryden Press, Ft. Worth, Texas.

Slide 16: [Figure panels: n = 1, n = 2, n = 4, n = 25.] As n increases, what happens to the variance? A. Variance increases. B. Variance decreases. C. Variance remains the same.

Slide 17: [Figure panels: n = 1, n = 2, n = 4, n = 25.]

Slide 18: Central Limit Theorem. If n is sufficiently large, the sample means of random samples from a population with mean μ and standard deviation σ are approximately normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.
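A small simulation can illustrate the theorem for the exponential light bulb model (mean 2000 hours, so the standard deviation is also 2000). This is only a sketch: the seed, the number of replications, and the variable names are arbitrary choices, not part of the slides.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(1)          # arbitrary seed so the run is repeatable
pop_mean = 2000.0       # exponential: mean = standard deviation = 1 / lambda
reps = 5000             # number of simulated samples per sample size

sd_of_means = {}
for n in (1, 2, 4, 25):
    means = [
        sum(random.expovariate(1 / pop_mean) for _ in range(n)) / n
        for _ in range(reps)
    ]
    sd_of_means[n] = stdev(means)
    print(f"n={n:2d}: simulated sd = {sd_of_means[n]:7.1f}, "
          f"sigma/sqrt(n) = {pop_mean / sqrt(n):7.1f}")
```

The simulated standard deviation of the sample means should track $\sigma/\sqrt{n}$ closely and shrink as n grows, which matches answer B on Slide 16.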

Slide 19: Random Behavior of Means Summary. If Y is distributed N(μ, σ), then $\bar{Y}$ is distributed $N(\mu, \sigma/\sqrt{n})$. If Y is distributed non-N(μ, σ), then $\bar{Y}$ is distributed approximately $N(\mu, \sigma/\sqrt{n})$ when n is sufficiently large.

Slide 20: If We Can Consider $\bar{Y}$ to be Normal … Recall: If Y is distributed normally with mean μ and standard deviation σ, then $Z = \frac{Y - \mu}{\sigma}$ is standard normal. So if $\bar{Y}$ is distributed normally with mean μ and standard deviation $\sigma/\sqrt{n}$, then $Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ is standard normal.

Slide 21: If the time between adjacent accidents in an industrial plant follows an exponential distribution with an average of 700 days, what is the probability that the average time between 49 pairs of adjacent accidents will be greater than 900 days?
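One way to work the accident question is to apply the Central Limit Theorem with n = 49. The sketch below relies on the fact that an exponential distribution's standard deviation equals its mean; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mean_days = 700.0
sd_days = 700.0          # exponential distribution: sd equals the mean
n = 49

se = sd_days / sqrt(n)   # standard error of the average: 700 / 7 = 100 days
z = (900 - mean_days) / se
p_greater = 1 - NormalDist(mean_days, se).cdf(900)

print(f"z = {z:.2f}, P(Ybar > 900) = {p_greater:.4f}")
```

The average of 900 days sits two standard errors above 700, so the normal approximation gives a probability of about 0.023.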

Slide 22: XYZ Bottling Company claims that the distribution of fill on its 16 oz bottles averages 16.2 ounces with a standard deviation of 0.1 oz. We randomly sample 36 bottles and get $\bar{y}$ = 16.15. If we assume a standard deviation of 0.1 oz, do we believe XYZ's claim of averaging 16.2 ounces?
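A quick check of the bottling claim asks how unusual a sample mean of 16.15 would be if the claimed mean of 16.2 were true. This sketch treats σ = 0.1 as known, as the slide does, and reports the lower-tail probability; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_claim, sigma, n = 16.2, 0.1, 36
ybar = 16.15

se = sigma / sqrt(n)                 # standard error of the mean
z = (ybar - mu_claim) / se           # how many standard errors below the claim
p_lower = NormalDist().cdf(z)        # P(Ybar <= 16.15) if the claim is true

print(f"z = {z:.2f}, P(Ybar <= {ybar}) = {p_lower:.5f}")
```

The sample mean lies about three standard errors below the claimed value, and the lower-tail probability is on the order of 0.001, so the data cast doubt on the claim.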

Slide 23: Up Until Now We Have Been Assuming that We Knew the True Standard Deviation (σ), But Let's Face Facts … When we use s to estimate σ, then the calculated value $t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$ follows a t-distribution with n − 1 degrees of freedom. Note: we must be able to assume that we are sampling from a normal population.

Slide 24: Let's take another look at XYZ Bottling Company. If we assume that fill on the individual bottles follows a normal distribution, does the following data support the claim of an average fill of 16.2 oz? Data: 16.1, 16.0, 16.3, 16.2, 16.1.
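The t statistic for these five fills can be computed directly. The sketch below assumes a two-sided comparison at α = 0.05, which the slide does not state, and uses the standard t-table value $t_{0.025,4} = 2.776$ rather than computing the quantile; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

fills = [16.1, 16.0, 16.3, 16.2, 16.1]   # data from the slide
mu_claim = 16.2

n = len(fills)
ybar = mean(fills)
s = stdev(fills)                          # sample standard deviation (n - 1 divisor)
t_stat = (ybar - mu_claim) / (s / sqrt(n))

t_crit = 2.776   # t-table value for df = 4, two-sided alpha = 0.05 (assumed alpha)
consistent_with_claim = abs(t_stat) < t_crit

print(f"ybar = {ybar:.2f}, s = {s:.4f}, t = {t_stat:.4f}, df = {n - 1}")
print("Data consistent with the claim:", consistent_with_claim)
```

With such a small sample, the observed mean of 16.14 is only about 1.2 estimated standard errors below 16.2, so these data do not contradict the claim.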

Slide 25: In Summary. When we know σ: $Z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$. When we estimate σ with s: $t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$ with n − 1 degrees of freedom. We assume we are sampling from a normal population.

Slide 26: Relationship Between Z and t Distributions. [Figure: the standard normal (Z) density overlaid with t densities for df = 3 and df = 1.]

Slide 27: Internal Combustion Engine. The nominal power produced by a student-designed internal combustion engine is 100 hp. The student team that designed the engine conducted 10 tests to determine the actual power. The data follow: 98, 101, 102, 97, 101, 98, 100, 92, 98, 100. Assume data came from a normal distribution.

Slide 28: Internal Combustion Engine. Summary data:

| Column | n | Mean | Std. Dev. |
| --- | --- | --- | --- |
| hp | 10 | 98.7 | 2.9 |

What is the probability of getting a sample mean of 98.7 hp or less if the true mean is 100 hp?
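As a worked sketch of this question, using the rounded summary values from the table and the t-distribution with n − 1 = 9 degrees of freedom:

```latex
t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}
  = \frac{98.7 - 100}{2.9/\sqrt{10}}
  \approx \frac{-1.3}{0.917}
  \approx -1.42,
\qquad
P\left(t_{9} \le -1.42\right) \approx 0.095 .
```

This agrees with the value 0.0949 shown on Slide 29. Using the unrounded standard deviation (about 2.87) gives t ≈ −1.43 instead.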

Slide 29: Internal Combustion Engine. [Figure: t density with the lower-tail probability 0.0949 marked.] What did we assume when doing this analysis? Are you comfortable with the assumption?

Slide 30: Can We Assume Sampling from a Normal Population? If data are from a normal population, there is a linear relationship between the data and their corresponding Z values. If we plot y on the vertical axis and z on the horizontal axis, the y intercept estimates μ and the slope estimates σ.

Slide 31: How to Calculate Corresponding Z-Values. Order the data. Estimate the percent of the population below each data point: $P_i = \frac{i - 0.5}{n}$, where i is a data point's position in the ordered set and n is the number of data points in the set. Look up the Z-value that has proportion $P_i$ of the distribution below it.

Slide 32: Normal Probability (QQ) Plot. Data set: 2, 4, 7, 10.

| i | $y_i$ | $P_i$ | Z |
| --- | --- | --- | --- |
| 1 | 2 | 0.125 | −1.15 |
| 2 | 4 | 0.375 | −0.32 |
| 3 | 7 | 0.625 | +0.32 |
| 4 | 10 | 0.875 | +1.15 |
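The plotting positions and Z-values in this table can be reproduced as follows. This is a minimal sketch assuming the plotting position $P_i = (i - 0.5)/n$, which matches the tabulated values; the least-squares fit at the end is an added illustration of "intercept estimates μ, slope estimates σ", and the names are hypothetical.

```python
from statistics import NormalDist, mean

data = sorted([2, 4, 7, 10])
n = len(data)

rows = []
for i, y in enumerate(data, start=1):
    p = (i - 0.5) / n                 # estimated proportion below the i-th point
    z = NormalDist().inv_cdf(p)       # Z-value with proportion p below it
    rows.append((i, y, p, z))
    print(f"i={i}  y={y:2d}  P={p:.3f}  Z={z:+.2f}")

# Least-squares line y = intercept + slope * z through the QQ points
zs = [r[3] for r in rows]
ys = [r[1] for r in rows]
z_bar, y_bar = mean(zs), mean(ys)
slope = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / sum((z - z_bar) ** 2 for z in zs)
intercept = y_bar - slope * z_bar
print(f"intercept (estimates mu) = {intercept:.2f}, slope (estimates sigma) = {slope:.2f}")
```

With only four points the line is a rough guide; the point of the plot is to judge whether the points fall close to a straight line.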

Slide 33: Normal Probability (QQ) Plot. [Figure: QQ plot.] This data is a random sample from a N(10, 2) population.

Slide 34: Normal Probability (QQ) Plot. [Figure: QQ plot.]

Slide 35: Estimation of the Mean

Slide 36: Point Estimators. A point estimator is a single number calculated from sample data that is used to estimate the value of a parameter. Recall that statistics change value upon repeated sampling of the same population while parameters are fixed, but unknown. Examples: [symbols not recovered]

Slide 37: In General: What Makes a "Good" Estimator? (1) Accuracy: An unbiased estimator of a parameter is one whose expected value is equal to the parameter of interest. (2) Precision: An estimator is more precise if its sampling distribution has a smaller standard error*. *Standard error is the standard deviation of the sampling distribution.

Slide 38: Unbiased Estimators. For normal populations, both the sample mean and sample median are unbiased estimators of μ. [Figure: sampling distributions of the mean and the median, both centered at µ.]

Slide 39: Most Efficient Estimators. If you have multiple unbiased estimators, then you choose the estimator whose sampling distribution has the least variation. This is called the most efficient estimator. For normal populations, the sample mean is the most efficient estimator of μ. [Figure: sampling distributions of the mean and the median.]

Slide 40: Interval Estimate of the Mean. Since $P\left(-1.96 \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$, a little algebra gives $P\left(\bar{Y} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{Y} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$. So we say that we are 95% sure that μ is in the interval $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$. What assumptions have we made?

Slide 41: Interval Estimate of the Mean. [Figure: standard normal (Z) density with central area 0.95 between −1.96 and 1.96 and area 0.025 in each tail.]

Slide 42: Interval Estimate of the Mean. Let's go from 95% confidence to the general case: $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The symbol $z_\alpha$ is the z-value that has an area of α to the right of it.

Slide 43: Interval Estimate of the Mean. [Figure: standard normal density with central area 1 − α between $-z_{\alpha/2}$ and $+z_{\alpha/2}$ and area α/2 in each tail, illustrating the (1 − α)100% confidence interval.]

Slide 44: What Does (1 − α)100% Confidence Mean? [Figure: sampling distribution of $\bar{y}$ centered at μ, with (1 − α)100% confidence intervals from repeated samples.]

Slide 45: If $z_{0.05}$ = 1.645, we are _____% confident that the mean is between $\bar{y} - 1.645\,\sigma/\sqrt{n}$ and $\bar{y} + 1.645\,\sigma/\sqrt{n}$. A. 99% B. 95% C. 90% D. 85%

Slide 46: Which z-value would you use to calculate a 99% confidence interval on a mean? A. $z_{0.10}$ = 1.282 B. $z_{0.01}$ = 2.326 C. $z_{0.005}$ = 2.576 D. $z_{0.0005}$ = 3.291

Slide 47: Plastic Injection Molding Process. A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8. Periodically, clogs from one of the feeder lines cause the mean width to change. As a result, the operator periodically takes random samples of size 4.

Slide 48: Plastic Injection Molding. A recent sample of four yielded a sample mean of 101.4. Construct a 95% confidence interval for the true mean width. Construct a 99% confidence interval for the true mean width.
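Both intervals follow from $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with the known σ = 8 and n = 4. A minimal sketch, where the helper `z_interval` is a hypothetical name introduced only for this illustration:

```python
from math import sqrt
from statistics import NormalDist

def z_interval(ybar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value with alpha/2 to its right
    margin = z * sigma / sqrt(n)
    return ybar - margin, ybar + margin

lo95, hi95 = z_interval(101.4, 8, 4, 0.95)
lo99, hi99 = z_interval(101.4, 8, 4, 0.99)

print(f"95% CI: ({lo95:.2f}, {hi95:.2f})")
print(f"99% CI: ({lo99:.2f}, {hi99:.2f})")
```

The 99% interval is wider than the 95% interval because a larger z-value is needed to capture μ with higher confidence.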

Slide 49: When going from a 95% confidence interval to a 99% confidence interval, the width of the interval will A. Increase. B. Decrease. C. Remain the same.

Slide 50: Interval Width, Level of Confidence and Sample Size. At a given sample size, as level of confidence increases, interval width __________. At a given level of confidence, as sample size increases, interval width __________.

Slide 51: Calculate Sample Size Before Sampling! The width of the interval is determined by the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$. Suppose we wish to estimate the mean to a maximum error of d: setting $z_{\alpha/2}\,\sigma/\sqrt{n} = d$ and solving for n gives $n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^2$, rounded up to the next whole number.

Slide 52: Plastic Injection Molding. A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8. What sample size is required to estimate the true mean width to within ± 2 units at 95% confidence? What sample size is required to estimate the true mean width to within ± 2 units at 99% confidence?
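The required sample sizes come from solving $z_{\alpha/2}\,\sigma/\sqrt{n} \le d$ for n and rounding up. A short sketch with σ = 8 and d = 2; the function name `sample_size` is an assumption made for this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, d, confidence):
    """Smallest n with margin of error z * sigma / sqrt(n) no larger than d."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / d) ** 2)

n95 = sample_size(8, 2, 0.95)
n99 = sample_size(8, 2, 0.99)

print(f"95% confidence: n = {n95}")
print(f"99% confidence: n = {n99}")
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed d.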

Slide 53: If we don't have prior knowledge of the standard deviation, but can assume we are sampling from a normal population… Instead of using a z-value to calculate the confidence interval, $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, we use a t-value with n − 1 degrees of freedom: $\bar{y} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$.

Slide 54: Interval Estimate of the Mean. [Figure: t density with df = n − 1, central area 1 − α between $-t_{\alpha/2}$ and $+t_{\alpha/2}$ and area α/2 in each tail, illustrating the (1 − α)100% confidence interval.]

Slide 55: Plastic Injection Molding – Reworded. A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution. A recent sample of four yielded a sample mean of 101.4 and sample standard deviation of 8. Estimate the true mean width with a 95% confidence interval.
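A worked sketch of this interval, using the standard t-table value $t_{0.025,\,3} = 3.182$ for n − 1 = 3 degrees of freedom:

```latex
\bar{y} \pm t_{0.025,\,3}\,\frac{s}{\sqrt{n}}
  = 101.4 \pm 3.182 \cdot \frac{8}{\sqrt{4}}
  = 101.4 \pm 12.73
  \quad\Longrightarrow\quad (88.67,\ 114.13).
```

The interval is noticeably wider than the z-based 95% interval with known σ = 8, because the t-value 3.182 replaces 1.96 to account for estimating σ from only four observations.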

Slide 56: Hypothesis Testing

Slide 57: Statistical Hypothesis. A statistical hypothesis is an assertion or conjecture concerning one or more population parameters. Examples:
–More than 7% of the landings for a certain airline exceed the runway.
–The defective rate on a manufacturing line is less than 10%.
–The mean lifetime of the bulbs is above 2200 hours.

Slide 58: The Null and Alternative Hypotheses. The Null Hypothesis, $H_0$, represents what we assume to be true. It is always stated so as to specify an exact value of the parameter. The Alternative (Research) Hypothesis, $H_1$ or $H_a$, represents the alternative to the null hypothesis and allows for the possibility of several values. It carries the burden of proof. In most situations, the researcher hopes to disprove or reject the null hypothesis in favor of the alternative hypothesis.

Slide 59: Steps to a Hypothesis Test. (1) Determine the null and alternative hypotheses. (2) Collect data and calculate the test statistic, assuming the null hypothesis is true. (3) Assuming the null hypothesis is true, calculate the p-value or use the rejection region method. (4) Draw a conclusion and state it in English.

Slide 60: Two Types of Mistakes. (1) Type I error: Reject the null hypothesis when it is true. (2) Type II error: Fail to reject the null hypothesis when the alternative hypothesis is true. Let α = P(type I error) and β = P(type II error). The power of the test is 1 − β.

Slide 61: Combustion Engine. The nominal power produced by a student-designed combustion engine is assumed to be at least 100 hp. We wish to test the alternative that the power is less than 100 hp. Let µ = nominal power of engine. A QQ plot shows it is reasonable to assume the data came from a normal distribution. Sample Data:

Slide 62: Combustion Engine. (1) State hypotheses, set alpha: $H_0: \mu = 100$, $H_a: \mu < 100$, α = 0.05. (2) Choose test statistic: $t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$. (3, 4) Designate the critical value for the test (if using the rejection region method) and draw a conclusion, or calculate the p-value and draw a conclusion.

Slide 63: (3) Designate Rejection Region. [Figure: t density with df = 9, with the axis labeled from −4 to +4 in t units and as $\bar{Y}$ = avg hp centered at 100. The rejection region is the lower tail of area 0.05 below t = −1.833.] Assumes $H_0: \mu = 100$ is true.

Slide 64: Draw conclusion. [Figure: t density with df = 9 showing the observed t = −1.4327 to the right of the critical value −1.833.] Since −1.4327 > −1.833, the test statistic is not in the rejection region, so we fail to reject $H_0$.

Slide 65: p-value. The p-value is the probability of getting the sample result we got or something more extreme, assuming the null hypothesis is true. [Figure: t density with df = 9 and lower-tail area 0.0928 below t = −1.4327.]

Slide 66: p-value. $P(t_{df=9} < -1.4327) = 0.0928$. Note: If p-value < α, reject $H_0$. If p-value > α, fail to reject $H_0$. [Figure: t density with df = 9; area 0.05 lies below the critical value −1.833 and area 0.0928 lies below the observed t = −1.4327.]
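The engine test can be reproduced end to end from the ten readings on Slide 27. Python's standard library has no t-distribution, so this sketch integrates the t density numerically with Simpson's rule; `t_pdf` and `t_cdf` are hypothetical helpers written for this illustration, not library functions.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry; steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3                      # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

hp = [98, 101, 102, 97, 101, 98, 100, 92, 98, 100]   # data from Slide 27
mu0, alpha = 100, 0.05

n = len(hp)
t_stat = (mean(hp) - mu0) / (stdev(hp) / math.sqrt(n))
p_value = t_cdf(t_stat, n - 1)                # lower-tail p-value for Ha: mu < 100
reject = p_value < alpha

print(f"t = {t_stat:.4f}, df = {n - 1}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Both approaches give the same decision: the p-value exceeds α = 0.05, and equivalently the test statistic does not fall below the critical value −1.833.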

Slide 67: Average Life of a Light Bulb. Historically, a particular light bulb has had a mean life of no more than 2000 hours. We have changed the production process and believe that the life of the bulb has increased. Let μ = mean life. (1) Set up hypotheses: $H_0: \mu \le 2000$, $H_a: \mu > 2000$, α = 0.05.

Slide 68: Average Life of a Light Bulb. (2) Collect data and calculate test statistic: p-value = $P(t_{df=14} > 2.5282) = 0.0121$. [Figure: t density with df = 14; area 0.05 lies above the critical value 1.761 and area 0.0121 lies above the observed t = 2.5282.]

Slide 69: Average Life of a Light Bulb. State Conclusion: A. At 0.05 level of significance there is insufficient evidence to conclude that µ > 2000 hours. B. At 0.05 level of significance there is sufficient evidence to conclude that µ > 2000 hours.

Slide 70: Mean Width of a Manufactured Part. Test the theory that the mean width of a manufactured part differs from 100 cm. Let µ = mean width. (1) Set up hypotheses. α = 0.05

Slide 71: Mean Width of a Manufactured Part. (2, 3) Collect data and calculate test statistic. (4) State conclusion.

Slide 72: Given population parameter µ and value $\mu_0$, for $H_0: \mu = \mu_0$ the rejection region depends on the alternative. $H_a: \mu \ne \mu_0$: reject in both tails, with area α/2 in each. $H_a: \mu > \mu_0$: reject in the upper tail, with area α. $H_a: \mu < \mu_0$: reject in the lower tail, with area α.

Slide 73: Focus on the two types of errors in a hypothesis test. 1) Reject $H_0$ when $H_0$ is true. This is called a type I error. P(Reject $H_0$ | $H_0$ is true) = α. 2) Fail to reject $H_0$ when $H_a$ is true at some value. This is called a type II error. P(Fail to reject $H_0$ | $H_a$ is true at some value) = β.

Slide 74: Avg Life of Light Bulb - Type I Error. α = probability that we will reject $H_0$ when $H_0$ is true. $H_0: \mu \le 2000$, $H_a: \mu > 2000$. [Figure: Z density, assuming $H_0$ is true, with the "Fail to reject $H_0$" region marked below the upper-tail rejection region.]

Slide 75: Type I and Type II Errors. [Figure: sampling distributions under $H_0: \mu = 2000$ and under µ = 2200.] β = probability we will fail to reject $H_0$ when $H_a$ is true at µ = 2200. α = probability that we will reject $H_0$ when $H_0$ is true.

Slide 76: How can we control the size of β? The value of α. Location of our point of interest. Sample size.

Slide 77: Calculating β. If µ = 2200, what is the probability of a type II error? Given: α = 0.05 and we are assuming µ = 2000. We will also assume we know σ = 216.

Slide 78: Calculating β. [Figure: sampling distributions under $H_0: \mu = 2000$ and under µ = 2200; the cutoff 2091 separates "Fail to reject $H_0$" (below) from "Reject $H_0$" (above).]

Slide 79: Calculating β

Slide 80: α, β and Power. α = P(Reject $H_0$ | µ = 2000) = 0.05. β = P(Fail to reject $H_0$ | µ = 2200) = 0.0254. We say that the power of this test at µ = 2200 is 1 − 0.0254 = 0.9746. Power = 1 − β. Power = P(Reject $H_0$ | µ is at some $H_a$ level).
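The β calculation can be sketched as follows. The sample size is not stated on these slides; n = 15 is an assumption inferred from the df = 14 on Slide 68, and it reproduces the cutoff of about 2091 shown on Slide 78. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt = 2000.0, 2200.0
sigma, alpha = 216.0, 0.05
n = 15   # assumption: inferred from df = 14 on Slide 68, not stated on this slide

se = sigma / sqrt(n)
# Upper-tailed test: reject H0 when the sample mean exceeds this cutoff
cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Type II error: the sample mean falls below the cutoff although mu = 2200
beta = NormalDist(mu_alt, se).cdf(cutoff)
power = 1 - beta

print(f"cutoff = {cutoff:.1f}, beta = {beta:.4f}, power = {power:.4f}")
```

Under these assumptions β comes out near 0.026. The slide's 0.0254 is consistent with rounding the cutoff to 2091 before computing the tail area.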

Slide 81: Plastic Injection Molding. A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution. A recent sample of n = 4 yielded a sample mean of 101.4 and sample standard deviation of 8. Does this data support the statement: "The true average width is greater than 95."?

Slide 82: Plastic Injection Molding. Confidence Interval Approach. 95% confidence interval on µ: __________

Slide 83: Plastic Injection Molding. Hypothesis Test Approach. $H_0$: __________ $H_a$: __________ α = 0.05. Test statistic is __________. p-value = __________. Conclusion: __________
