Presentation on theme: "Statistics Review – Part I"— Presentation transcript:
1 Statistics Review – Part I
Topics: Z-values, Confidence Intervals, Hypothesis Testing, Paired Tests, T-tests, F-tests
2 Statistics References
References used in class slides:
- Sullivan III, Michael. Statistics: Informed Decisions Using Data. Pearson Education, 2004.
- Gitlow et al. Six Sigma for Green Belts and Champions. Prentice Hall, 2004.
3 Sampling and the Normal Distribution
Relative frequency histograms that are symmetric and bell-shaped are said to have the shape of a normal curve.
4 Sampling and the Normal Distribution
If a continuous random variable is normally distributed, or has a normal probability distribution, then a relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric).
6 Sampling and the Normal Distribution
Suppose that the mean normal blood sugar level in the population is $\mu_0 = 9.7$ mmol/L with standard deviation $\sigma = 2.0$ mmol/L, and you want to see whether diabetics have increased blood sugar levels.
A sample of n = 64 individuals with diabetes has mean $\bar{x} = 13.7$ mmol/L with standard deviation 2.0 mmol/L.
How do you compare these values? Standardize!
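This sketch illustrates the "Standardize!" step by computing the z-score of the diabetic sample mean. It makes two assumptions. First, it treats the 2.0 mmol/L figure as the population standard deviation. Second, because a sample mean is being standardized (not a single reading), it divides by the standard error $\sigma/\sqrt{n}$. The function name is hypothetical.

```python
import math


def z_for_sample_mean(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Standardize a sample mean: subtract the population mean, then divide by
    the standard error sigma / sqrt(n) (the standard deviation of the sample mean)."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu0) / standard_error


# Values from the slide (sigma = 2.0 is ASSUMED to be the population standard deviation)
z = z_for_sample_mean(xbar=13.7, mu0=9.7, sigma=2.0, n=64)
print(f"z = {z:.2f}")  # standard error = 2.0 / 8 = 0.25, so z = 4.0 / 0.25
```

A z-score of 16 lies far beyond anything a standard normal table lists. So a sample mean this large would be extremely unlikely if diabetics had the same mean as the general population.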
7 Sampling and the Normal Distribution
Reading z-scores
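8 Sampling and the Normal Distribution
Standardization:
- Uses Z-tables to evaluate sample means
- Puts samples on the same scale
- Subtract the mean and divide by the standard deviation; for a sample mean, the relevant standard deviation is the standard error $\sigma/\sqrt{n}$, giving $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$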
9 Sampling and the Normal Distribution
Why do we standardize? It enables the comparison of populations or samples using a standardized set of values.
Recall: $z = \frac{x - \mu}{\sigma}$
10 Sampling and the Normal Distribution
The table gives the area under the standard normal curve to the left of a specified Z-score, $z_0$, as shown in the figure.
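The table lookup can also be done in code. The area to the left of $z_0$ is the standard normal CDF, $\Phi(z_0)$. Below is a minimal sketch that uses the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ with Python's `math.erf`. The helper name is illustrative. `statistics.NormalDist().cdf` gives the same values.

```python
import math


def standard_normal_cdf(z0: float) -> float:
    """Area under the standard normal curve to the left of z0, i.e. P(Z < z0).
    Uses the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z0 / math.sqrt(2.0)))


for z0 in (-1.96, 0.0, 1.0, 1.645, 2.0):
    print(f"P(Z < {z0:>6}) = {standard_normal_cdf(z0):.4f}")
```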
12 Sampling and the Normal Distribution
Population mean = 10, standard deviation = 5.
What is the likelihood of a sample (n = 16) having a mean greater than 12?
What is the likelihood of a sample (n = 16) having a mean less than 8?
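Both questions can be answered by treating the sample mean as normally distributed with mean μ and standard error $\sigma/\sqrt{n}$. This sketch assumes σ = 5 is the population standard deviation. Then the standard error is 5/4 = 1.25, and the two cutoffs correspond to z = ±1.6.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10.0, 5.0, 16          # values from the slide
se = sigma / math.sqrt(n)             # standard error of the mean = 1.25
sampling_dist = NormalDist(mu, se)    # distribution of the sample mean

p_greater_12 = 1.0 - sampling_dist.cdf(12.0)   # P(xbar > 12) = P(Z > 1.6)
p_less_8 = sampling_dist.cdf(8.0)              # P(xbar < 8)  = P(Z < -1.6)
print(f"P(xbar > 12) = {p_greater_12:.4f}")
print(f"P(xbar < 8)  = {p_less_8:.4f}")
```

The normal curve is symmetric, and 12 and 8 are equally far from the mean. The two probabilities are therefore equal.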
13 Sampling and the Normal Distribution
Notation for the probability of a standard normal random variable:
- P(a < Z < b) represents the probability that a standard normal random variable is between a and b.
- P(Z > a) represents the probability that a standard normal random variable is greater than a.
- P(Z < a) represents the probability that a standard normal random variable is less than a.
14 Sampling and the Normal Distribution
Before using Z-tables, assess whether the data are normally distributed. Ways to do this:
- Histogram
- Normal probability plot
15 Sampling and the Normal Distribution
Normal Probability Plots:
16 Sampling and the Normal Distribution
Normal Probability Plots: the "fat pencil" test to detect normality
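The following is a minimal sketch of how the points on a normal probability plot can be computed. It pairs each sorted observation with the standard normal quantile of its plotting position. Two choices here are assumptions and are swapped in, not taken from the slides:

- The plotting position (i − 0.5)/n is only one of several common conventions.
- The "fat pencil" test is visual. The correlation coefficient of the plotted points is used here as a rough numeric stand-in for it.

The sample values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def normal_probability_plot_points(data):
    """Pair each sorted observation with the standard normal quantile of its
    plotting position (i - 0.5) / n. Plotted, these pairs form a normal
    probability plot; points near a straight line suggest approximate normality."""
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()
    expected_z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return expected_z, ordered


def pearson_r(xs, ys):
    """Pearson correlation coefficient, used here as a numeric stand-in for
    the visual 'fat pencil' check of straightness."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample, for illustration only
sample = [9.1, 10.4, 8.7, 11.2, 9.9, 10.8, 9.5, 10.1, 12.0, 8.3]
z_scores, observed = normal_probability_plot_points(sample)
for z, x in zip(z_scores, observed):
    print(f"{z:6.2f}  {x:5.1f}")
print(f"correlation of the plotted points: {pearson_r(z_scores, observed):.3f}")
```

A correlation close to 1 corresponds to points that a fat pencil would cover. Curved or S-shaped patterns lower it.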
17 Sampling and the Normal Distribution
Shapes of Normal Probability Plots:
18 Sampling and the Normal Distribution
Normal Probability Plots vs. Box Plots:
19 Sampling and the Normal Distribution
If the distribution of the data is approximately normal, use Z-tables to determine the likelihood of events.
20 Sampling and the Normal Distribution
Z-scores can also be "flipped" to determine the highest or lowest acceptable sample mean.
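"Flipping" a z-score means solving $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ for $\bar{x}$, which gives $\bar{x} = \mu \pm z\,\sigma/\sqrt{n}$. The sketch below reuses the values from the earlier example (μ = 10, σ = 5, n = 16) and assumes a central 95% band. For a one-sided limit, use `inv_cdf(0.95)` instead. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def acceptable_mean_limits(mu, sigma, n, coverage=0.95):
    """'Flip' the z-score formula: solve z = (xbar - mu) / (sigma / sqrt(n))
    for xbar. Returns the lowest and highest sample means inside the central
    `coverage` fraction of the sampling distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.96 for 95%
    se = sigma / math.sqrt(n)
    return mu - z * se, mu + z * se


low, high = acceptable_mean_limits(mu=10.0, sigma=5.0, n=16)
print(f"95% of sample means fall between {low:.2f} and {high:.2f}")
```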
21 Confidence Intervals
Point estimate: the value of a statistic that estimates the value of a parameter.
Confidence interval estimate: an interval of numbers, along with a probability that the interval contains the unknown parameter.
Level of confidence: a probability that represents the percentage of intervals that will contain the parameter if a large number of repeated samples are obtained.
22 Confidence Intervals
A 95% level of confidence means that if 100 confidence intervals were constructed, each based on a different sample from the same population, we would expect about 95 of the intervals to contain the population mean.
The construction of a confidence interval for the population mean depends upon three factors:
- The point estimate of the population mean
- The level of confidence
- The standard deviation of the sample mean: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
23 Confidence Intervals
If a simple random sample is drawn from a population that is normally distributed, or the sample size is large, the distribution of the sample mean will be normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
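Combining the three factors gives the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below applies it to the diabetic sample from slide 6 ($\bar{x} = 13.7$, n = 64). It assumes σ = 2.0 mmol/L is known. The function name is illustrative.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known:
    xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # point estimate +/- this many standard errors
    margin = z_crit * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


# Diabetic sample from slide 6 (sigma ASSUMED known)
low, high = z_confidence_interval(13.7, 2.0, 64)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```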
33 Confidence Intervals
Properties of the t Distribution
1. The t distribution is different for different values of n.
2. The t distribution is centered at 0 and is symmetric about 0.
3. The area under the curve is 1. The area under the curve to the right of 0 equals the area under the curve to the left of 0, which equals 1/2.
4. As t increases or decreases without bound, the graph approaches, but never equals, zero.
5. The area in the tails of the t distribution is a little greater than the area in the tails of the standard normal distribution. This is because using s as an estimate of σ introduces more variability into the t statistic.
6. As the sample size n increases, the density curve of t approaches the standard normal density curve. This occurs because, by the law of large numbers, the values of s approach σ.
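Properties 5 and 6 can be checked by simulation. The sketch below draws many normal samples and standardizes each sample mean two ways: once with the true σ (a z statistic) and once with the sample standard deviation s (a t statistic). It then counts how often each lands beyond ±2. The sample sizes, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics


def tail_proportions(n=5, trials=10_000, cutoff=2.0, seed=1):
    """Draw `trials` samples of size n from N(0, 1). For each, standardize the
    sample mean twice: with the true sigma (a z statistic) and with the sample
    standard deviation s (a t statistic). Return the share of each beyond +/- cutoff."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)
        z = xbar / (1.0 / math.sqrt(n))   # sigma = 1 is known in this simulation
        t = xbar / (s / math.sqrt(n))     # s estimates sigma, adding variability
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / trials, t_hits / trials


for n in (5, 30):
    z_tail, t_tail = tail_proportions(n=n)
    print(f"n = {n:2d}: share beyond +/-2  z: {z_tail:.3f}  t: {t_tail:.3f}")
```

With n = 5 the t statistic lands in the tails noticeably more often than z. With n = 30 the gap mostly closes, which matches property 6.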
36 Confidence Intervals
EXAMPLE: Finding t-values
Find the t-value such that the area under the t distribution to the right of the t-value is 0.2, assuming 10 degrees of freedom.
Hint: find $t_{0.20}$ with 10 degrees of freedom.
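Without a t table, $t_{0.20}$ can be found numerically. The method has three parts:

- Integrate the t density with Simpson's rule to get right-tail areas.
- Use the symmetry about 0 from the properties slide.
- Bisect on t until the right-tail area equals 0.20.

This is a hand-rolled sketch with hypothetical helper names. It is not a library routine; in practice a statistics package or the printed table would be used. It should reproduce the tabled value of about 0.879.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_area(t0, df, steps=2000):
    """Area to the right of t0 (t0 >= 0), using symmetry about 0 and Simpson's
    rule: P(T > t0) = 0.5 - (integral of the density from 0 to t0)."""
    h = t0 / steps
    total = t_pdf(0.0, df) + t_pdf(t0, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df, hi=50.0):
    """Find t_alpha, the t-value with area alpha to its right (alpha < 0.5),
    by bisection on the right-tail area."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_area(mid, df) > alpha:
            lo = mid        # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2


print(f"t_0.20 with 10 df = {t_critical(0.20, 10):.3f}")
```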
43 Confidence Intervals
EXAMPLE: Constructing a Confidence Interval about a Population Standard Deviation
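The slide gives only the example title, so the numbers below are hypothetical. For a normal population, a confidence interval for σ runs from $\sqrt{(n-1)s^2/\chi^2_{\alpha/2}}$ to $\sqrt{(n-1)s^2/\chi^2_{1-\alpha/2}}$, where $\chi^2_p$ has area p to its right with n − 1 degrees of freedom.

The sketch computes chi-square areas from the series for the regularized incomplete gamma function and finds the critical values by bisection. These helper names are hypothetical, and the method is a sketch rather than a library API.

```python
import math


def chi2_cdf(x, df, max_terms=2000):
    """P(X <= x) for a chi-square variable with df degrees of freedom, using the
    series for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= y / (a + k)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_critical(right_area, df):
    """Chi-square value with `right_area` of the distribution to its right (bisection)."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > right_area:
            lo = mid      # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2


def sigma_confidence_interval(s, n, confidence=0.95):
    """(1 - alpha) CI for sigma from a normal population:
    lower = sqrt((n-1)s^2 / chi2_{alpha/2}), upper = sqrt((n-1)s^2 / chi2_{1-alpha/2})."""
    alpha = 1 - confidence
    df = n - 1
    chi2_upper = chi2_critical(alpha / 2, df)       # large value, area alpha/2 to the right
    chi2_lower = chi2_critical(1 - alpha / 2, df)   # small value, area 1 - alpha/2 to the right
    ss = df * s * s
    return math.sqrt(ss / chi2_upper), math.sqrt(ss / chi2_lower)


# Hypothetical summary statistics, for illustration only
low, high = sigma_confidence_interval(s=2.0, n=10)
print(f"95% CI for sigma: ({low:.3f}, {high:.3f})")
```

The interval is not symmetric about s, because the chi-square distribution is skewed right.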
44 Hypothesis Testing
Hypothesis testing is a procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.
Selecting hypothesis testing methods: see next slides.
47 Hypothesis Testing
The null hypothesis, denoted H0 (read "H-naught"), is a statement to be tested. The null hypothesis is assumed true until evidence indicates otherwise. In this chapter, it will be a statement regarding the value of a population parameter.
The alternative hypothesis, denoted H1 (read "H-one"), is a claim to be tested. We are trying to find evidence for the alternative hypothesis. In this chapter, it will be a claim regarding the value of a population parameter.
48 Hypothesis Testing
There are three ways to set up the null and alternative hypotheses:
1. Equal versus not equal (two-tailed test)
   H0: parameter = some value
   H1: parameter ≠ some value
2. Equal versus less than (left-tailed test)
   H0: parameter = some value
   H1: parameter < some value
3. Equal versus greater than (right-tailed test)
   H0: parameter = some value
   H1: parameter > some value
49 Hypothesis Testing
THREE WAYS TO STRUCTURE THE HYPOTHESIS TEST:
52 Hypothesis Testing
Four Outcomes from Hypothesis Testing
1. We could reject H0 when in fact H1 is true. This would be a correct decision.
2. We could not reject H0 when in fact H0 is true. This would be a correct decision.
3. We could reject H0 when in fact H0 is true. This would be an incorrect decision. This type of error is called a Type I error.
4. We could not reject H0 when in fact H1 is true. This would be an incorrect decision. This type of error is called a Type II error.
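Outcome 3 can be made concrete with a simulation. Every sample is drawn from a population where H0 really is true, so each rejection is a Type I error. The rejection rate should settle near the chosen significance level α.

The sketch assumes a right-tailed z-test with σ known. The population values, sample size, and seed are arbitrary and hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=0.0, sigma=1.0, n=25, alpha=0.05, trials=10_000, seed=7):
    """Estimate how often a right-tailed z-test rejects H0 when H0 is true.
    Every sample comes from a population whose mean really is mu0, so each
    rejection counts as a Type I error."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if (xbar - mu0) / se > z_crit:
            rejections += 1
    return rejections / trials


print(f"Estimated Type I error rate: {type_i_error_rate():.3f}")
```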
53
For example, we might reject the null hypothesis if the sample mean is more than 2 standard deviations (of the sample mean) above the population mean. Why?
54 Hypothesis Testing
If the null hypothesis is true, then $P(Z < 2) = 0.9772$, so 97.72% of all sample means will be less than $\mu + 2\sigma_{\bar{x}}$.
55 Hypothesis Testing
Because sample means greater than 2.88 are unusual if the population mean is 2.62, we are inclined to believe the population mean is greater than 2.62.
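The arithmetic behind slides 53 to 55 is sketched below. The standard deviation of the sample mean is not shown in the text. The value 0.13 is an assumption inferred from the slide's numbers, since (2.88 − 2.62)/2 = 0.13. The observed sample mean of 2.95 is purely hypothetical.

```python
from statistics import NormalDist

mu0 = 2.62        # population mean if H0 is true (from the slide)
se = 0.13         # ASSUMED standard deviation of the sample mean, inferred from (2.88 - 2.62) / 2
cutoff = mu0 + 2 * se

share_below = NormalDist().cdf(2.0)   # P(Z < 2): area to the left of z = 2
print(f"P(Z < 2) = {share_below:.4f}")
print(f"Under H0, {share_below:.2%} of sample means fall below {cutoff:.2f}")

xbar = 2.95       # HYPOTHETICAL observed sample mean
z = (xbar - mu0) / se
print(f"z for xbar = {xbar}: {z:.2f} -> unusual under H0: {z > 2}")
```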
57 Hypothesis Testing
Step 1: A claim is made regarding the population mean. The claim is used to determine the null and alternative hypotheses. Again, the hypothesis can be structured in one of three ways:
61 Hypothesis Testing
Step 4: Compare the critical value with the test statistic.
Step 5: State the conclusion.
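Steps 4 and 5 of the classical (critical-value) approach can be sketched for a z-test about μ with σ known. The sketch computes the test statistic, looks up the critical value for the chosen tail, and compares the two. It is applied to the diabetes data from slide 6 as a right-tailed test, assuming σ = 2.0 mmol/L and α = 0.05. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def z_test_classical(xbar, mu0, sigma, n, alpha=0.05, tail="right"):
    """Classical (critical-value) z-test for a mean with sigma known.
    tail is "left", "right" or "two". Returns (test statistic, critical value, reject H0?)."""
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))
    std = NormalDist()
    if tail == "right":
        crit = std.inv_cdf(1 - alpha)
        reject = z0 > crit
    elif tail == "left":
        crit = std.inv_cdf(alpha)
        reject = z0 < crit
    elif tail == "two":
        crit = std.inv_cdf(1 - alpha / 2)
        reject = abs(z0) > crit
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z0, crit, reject


# Diabetes example: H0: mu = 9.7 versus H1: mu > 9.7 (right-tailed)
z0, crit, reject = z_test_classical(xbar=13.7, mu0=9.7, sigma=2.0, n=64)
print(f"z0 = {z0:.2f}, critical z = {crit:.3f}, reject H0: {reject}")
```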
62 Hypothesis Testing
A P-value is the probability of observing a sample statistic as extreme as, or more extreme than, the one observed, under the assumption that the null hypothesis is true.
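For a z-test about μ with σ known, the P-value is the tail area beyond the observed z-statistic. That is the left tail, the right tail, or both tails, depending on how H1 is set up. The sketch below reuses the numbers from slide 12 (μ0 = 10, σ = 5, n = 16) with a hypothetical observed mean of 12. The function name is illustrative. H0 is rejected when the P-value is less than α.

```python
import math
from statistics import NormalDist


def z_test_p_value(xbar, mu0, sigma, n, tail="two"):
    """P-value of a z-test for a mean with sigma known: the probability, assuming
    H0 is true, of a test statistic at least as extreme as the observed one."""
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf
    if tail == "left":
        return phi(z0)
    if tail == "right":
        return 1 - phi(z0)
    if tail == "two":
        return 2 * (1 - phi(abs(z0)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Hypothetical numbers: mu0 = 10, sigma = 5, n = 16, observed xbar = 12
for tail in ("left", "right", "two"):
    print(tail, round(z_test_p_value(12.0, 10.0, 5.0, 16, tail), 4))
```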
63 Hypothesis Testing
Hypothesis Test Regarding μ with σ Known (P-values)
64 Hypothesis Testing
Step 1: A claim is made regarding the population mean. The claim is used to determine the null and alternative hypotheses. Again, the hypothesis can be structured in one of three ways:
69 Hypothesis Testing
Properties of the t Distribution
1. The t distribution is different for different values of n, the sample size.
2. The t distribution is centered at 0 and is symmetric about 0.
3. The area under the curve is 1. Because of the symmetry, the area under the curve to the right of 0 equals the area under the curve to the left of 0, which equals ½.
4. As t increases without bound, the graph approaches, but never equals, zero. As t decreases without bound, the graph approaches, but never equals, zero.
5. The area in the tails of the t distribution is a little greater than the area in the tails of the standard normal distribution. This is because we are using s as an estimate of σ, which introduces more variability into the t statistic.
71 Hypothesis Testing
Step 1: A claim is made regarding the population mean. The claim is used to determine the null and alternative hypotheses. Again, the hypothesis can be structured in one of three ways:
77 Hypothesis Testing
Hypothesis Test Regarding a Population Variance or Standard Deviation
If a claim is made regarding the population variance or standard deviation, we can use the following steps to test the claim, provided that
(1) the sample is obtained using simple random sampling, and
(2) the population is normally distributed.
78
Step 1: A claim is made regarding the population variance or standard deviation. The claim is used to determine the null and alternative hypotheses. We present the three cases for a claim regarding a population standard deviation.
81 Hypothesis Testing
Step 4: Compare the critical value with the test statistic.
Step 5: State the conclusion.
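For a claim about σ, the test statistic is $\chi^2_0 = (n-1)s^2/\sigma_0^2$ with n − 1 degrees of freedom. The sketch below shows only the right-tailed case (H1: σ > σ0) and compares the statistic with the critical value $\chi^2_\alpha$. The sample size, standard deviation, and claim are hypothetical.

The chi-square helpers are hand-rolled and hypothetical. They use a series for the incomplete gamma function plus bisection, not a library API.

```python
import math


def chi2_cdf(x, df, max_terms=2000):
    """P(X <= x) for a chi-square variable, via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= y / (a + k)
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_critical(right_area, df):
    """Chi-square value with `right_area` of the distribution to its right (bisection)."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, df) > right_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_test_sigma_right(s, n, sigma0, alpha=0.05):
    """Right-tailed test of H0: sigma = sigma0 versus H1: sigma > sigma0.
    Test statistic chi2_0 = (n - 1) s^2 / sigma0^2 with n - 1 degrees of freedom;
    reject H0 if it exceeds the critical value chi2_alpha."""
    df = n - 1
    stat = df * s * s / (sigma0 * sigma0)
    crit = chi2_critical(alpha, df)
    return stat, crit, stat > crit


# Hypothetical numbers: n = 20, s = 2.6, claim sigma > 2.0
stat, crit, reject = chi2_test_sigma_right(s=2.6, n=20, sigma0=2.0)
print(f"chi2_0 = {stat:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```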
82 Paired Testing
A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample. A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample. Dependent samples are often referred to as matched-pairs samples.
83 Paired Testing
EXAMPLE: Independent versus Dependent Sampling
For each of the following, determine whether the sampling method is independent or dependent.
(a) A researcher wants to know whether the price of a one-night stay at a Holiday Inn Express hotel is less than the price of a one-night stay at a Red Roof Inn hotel. She randomly selects 8 towns where the two hotels are located close to each other and determines the price of a one-night stay at each.
(b) A researcher wants to know whether the newly issued "state" quarters have a mean weight that is different from that of "traditional" quarters. He randomly selects 18 "state" quarters and 16 "traditional" quarters and compares their weights.
84 Paired Testing
In order to test hypotheses regarding the mean difference, the following requirements must be satisfied:
- A simple random sample is obtained.
- The sample data are matched pairs.
- The differences are normally distributed, or the sample size n is large (n > 30).
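With matched pairs, the test works on the differences $d_i$. The test statistic is $t_0 = \bar{d}/(s_d/\sqrt{n})$ with n − 1 degrees of freedom. Below is a minimal sketch that uses hypothetical hotel prices in the spirit of example (a). The function name is illustrative. The resulting $t_0$ would then be compared with a t critical value, or converted to a P-value.

```python
import math
from statistics import fmean, stdev


def paired_t_statistic(before, after):
    """Matched-pairs t statistic: work with the differences
    d_i = before_i - after_i and compute t0 = dbar / (s_d / sqrt(n)),
    which has n - 1 degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("matched pairs need samples of equal length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    dbar = fmean(diffs)
    s_d = stdev(diffs)
    return dbar / (s_d / math.sqrt(n)), n - 1


# Hypothetical prices for two hotels in the same five towns, illustration only
hotel_a = [101, 112, 95, 130, 88]
hotel_b = [100, 110, 92, 126, 83]
t0, df = paired_t_statistic(hotel_a, hotel_b)
print(f"t0 = {t0:.3f} with {df} degrees of freedom")
```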
95 T-Tests
Step 4: Compare the critical value with the test statistic.
Step 5: State the conclusion.
96 T-Tests
The degrees of freedom used to determine the critical value(s) presented in the last example are conservative. More accurate results can be obtained by using the following degrees of freedom:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
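The sketch below computes the two-sample t statistic for independent samples, the more accurate degrees of freedom above, and a conservative alternative. It assumes the conservative choice is the smaller of n1 − 1 and n2 − 1, as in Sullivan's text. The quarter weights are hypothetical.

```python
import math
from statistics import fmean, variance


def welch_t(sample1, sample2):
    """t statistic for two independent samples (no pooling), the more accurate
    Welch-Satterthwaite degrees of freedom, and the conservative alternative
    min(n1 - 1, n2 - 1)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1      # s1^2 / n1
    v2 = variance(sample2) / n2      # s2^2 / n2
    t0 = (fmean(sample1) - fmean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t0, df, min(n1 - 1, n2 - 1)


# Hypothetical weights (grams), for illustration only
state = [5.67, 5.70, 5.65, 5.72, 5.68, 5.66]
traditional = [5.61, 5.75, 5.58, 5.80, 5.63]
t0, df, df_conservative = welch_t(state, traditional)
print(f"t0 = {t0:.3f}, Welch df = {df:.2f}, conservative df = {df_conservative}")
```

The Welch value always falls between the conservative value and n1 + n2 − 2. Using the smaller, conservative value gives a larger critical value and so makes rejection harder.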
98 F-Tests
Requirements for Testing Claims Regarding Two Population Standard Deviations
1. The samples are independent simple random samples.
2. The populations from which the samples are drawn are normally distributed.
101 F-Tests
Characteristics of the F-distribution
1. It is not symmetric; the F-distribution is skewed right.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and denominator. This is similar to the χ² distribution and Student's t-distribution, whose shapes depend upon their degrees of freedom.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero.
103 F-Tests
$F_{\alpha,\, n_1-1,\, n_2-1}$ is the critical F with n1 − 1 degrees of freedom in the numerator, n2 − 1 degrees of freedom in the denominator, and an area of α to the right of the critical F.
To find the critical F with an area of α to the left, use the following:
$$F_{1-\alpha,\, n_1-1,\, n_2-1} = \frac{1}{F_{\alpha,\, n_2-1,\, n_1-1}}$$
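The reciprocal rule can be checked numerically. The steps are:

- Integrate the F density with Simpson's rule.
- Bisect to find the right-tail critical value.
- Take the reciprocal of the critical value with the degrees of freedom swapped.
- Confirm that the result leaves an area of α to its left.

The sample sizes are hypothetical, and the helpers are a hand-rolled sketch (not a library API) that assumes the numerator degrees of freedom exceed 2.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 numerator and d2 denominator df."""
    if x <= 0:
        return 0.0   # exact for d1 > 2 (the case assumed here)
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_kernel = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                        - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_kernel - math.log(x) - log_beta)


def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) by Simpson's rule on [0, x]; adequate for d1 > 2."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3


def f_critical(right_area, d1, d2, hi=100.0):
    """F value with `right_area` of the distribution to its right (bisection)."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, d1, d2) > right_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha, n1, n2 = 0.05, 6, 11                    # hypothetical sample sizes
d1, d2 = n1 - 1, n2 - 1
right_crit = f_critical(alpha, d1, d2)          # F_{alpha, n1-1, n2-1}
left_crit = 1 / f_critical(alpha, d2, d1)       # F_{1-alpha, n1-1, n2-1} via the reciprocal rule
print(f"right-tail critical F: {right_crit:.3f}")
print(f"left-tail critical F:  {left_crit:.3f}")
print(f"area to the left of the left-tail value: {f_cdf(left_crit, d1, d2):.4f}")
```

The last line should print a value close to α, which confirms the reciprocal rule without a separate left-tail table.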
104 F-Tests
Hypothesis Test Regarding Two Population Standard Deviations