Normality, Sampling & Hypothesis Testing, and Sample Size Estimation. Jobayer Hossain, PhD; Larry Holmes, Jr, PhD. October 23, 2008. RESEARCH STATISTICS.



Bell-shaped Histogram
The left half of a bell-shaped (symmetric) histogram is the mirror image of the right half.

Normal Distribution
The Normal Distribution is a density curve based on the following formula:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
– It is completely defined by two parameters: mean (µ) and standard deviation (σ).
– A density function describes the overall pattern of a distribution.
– The total area under the curve is always 1.0.
– The normal distribution is symmetrical. What does this mean? The mean, median, and mode are all the same.

The Beauty of the Normal Distribution
The 68-95-99.7 Rule: In the normal distribution with mean µ and standard deviation σ:
– About 68% of the observations fall within σ of the mean µ.
– About 95% of the observations fall within 2σ of the mean µ.
– About 99.7% of the observations fall within 3σ of the mean µ.
No matter what µ (mean) and σ (standard deviation) are, the area between µ-σ and µ+σ is about 68%; the area between µ-2σ and µ+2σ is about 95%; and the area between µ-3σ and µ+3σ is about 99.7%. Almost all values fall within 3 standard deviations. This is called the 68-95-99.7 rule.
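As a quick numerical check of the rule, the sketch below computes the three areas from the normal cumulative distribution function in Python's standard library. The helper name `area_within` is made up for this illustration and does not come from the slides.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The areas do not depend on the particular mean and standard deviation chosen.
for k in (1, 2, 3):
    print(k, round(area_within(k), 4), round(area_within(k, mu=112, sigma=10), 4))
```

The printed areas are roughly 0.68, 0.95, and 0.997 for both parameter choices, which is what the rule states.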

[Figure: the 68-95-99.7 rule. Normal curve with the horizontal axis marked at µ-3σ, µ-2σ, µ-σ, µ, µ+σ, µ+2σ, µ+3σ; the shaded bands cover 68%, 95%, and 99.7% of the data. Credit: SU]

Normal Distribution: Standardizing and z-Scores
If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is
$z = \frac{x - \mu}{\sigma}$
A standardized value is often called a z-score. If x is a normal variable with mean µ and standard deviation σ, then z is a standard normal variable with mean 0 and standard deviation 1.
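A minimal sketch of standardizing one observation follows. The numbers (an SBP reading of 150 from a population with mean 130 and standard deviation 10) are assumed purely for illustration, and `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardized value of x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

z = z_score(150, mu=130, sigma=10)   # illustrative values, not from the slides
p_below = NormalDist().cdf(z)        # share of a standard normal lying below z
print(z, round(p_below, 4))
```

Once the value is on the z scale, a single standard normal table (or `NormalDist()` with its default mean 0 and standard deviation 1) answers probability questions for any normal variable.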

Normal Distribution
Let x_1, x_2, …, x_n be n independent normal random variables, each with mean µ and standard deviation σ. Then their sum ∑x_i is also normal, with mean nµ and standard deviation σ√n. The distribution of the sample mean x̄ is also normal, with mean µ and standard deviation σ/√n. The standardized score of the mean is
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
The mean of this standardized random variable is 0 and its standard deviation is 1.
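The σ/√n result can be checked by simulation. The sketch below is only an illustration under assumed values (population mean 130, standard deviation 15, samples of size 25): it draws many samples and compares the spread of their means with σ/√n.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 130.0, 15.0, 25   # assumed population mean, SD and sample size
REPLICATES = 5000                # number of simulated samples

# Mean of each simulated sample of size N
sample_means = [
    mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPLICATES)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sd = SIGMA / sqrt(N)   # sigma / sqrt(n) = 3.0 here

print(round(mean_of_means, 2), round(sd_of_means, 2), theoretical_sd)
```

The simulated standard deviation of the sample means should land close to 3.0, and their average close to 130.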

Are the data normally distributed?
1. Look at the histogram. Does it appear bell shaped?
2. Compute descriptive summary measures: are the mean, median, and mode similar?
3. Do about 2/3 of observations lie within 1 std dev of the mean? Do about 95% of observations lie within 2 std dev of the mean?
4. Look at a normal probability plot: is it approximately linear?
5. Or look at a normal quantile plot.
6. Run tests of normality, such as the Kolmogorov-Smirnov (K-S) test or the Shapiro-Wilk W statistic.
To perform a K-S test or Shapiro-Wilk test for normality in SPSS: Analyze -> Descriptive Statistics -> Explore -> select the variable in the Dependent List -> select Plots -> select "Normality plots with tests" -> Continue -> OK
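Checks 2 and 3 in the list above can be sketched with the Python standard library alone. This is a rough screen, not a formal test of normality; the data here are simulated as a stand-in for real measurements, and `fraction_within` is a hypothetical helper name.

```python
import random
from statistics import mean, median, stdev

def fraction_within(data, k):
    """Fraction of observations within k sample SDs of the sample mean."""
    m, s = mean(data), stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

random.seed(7)
data = [random.gauss(0, 1) for _ in range(1000)]  # simulated stand-in for real data

print("mean vs median:", round(mean(data), 3), round(median(data), 3))
print("within 1 SD:", fraction_within(data, 1))   # roughly 0.68 for normal data
print("within 2 SD:", fraction_within(data, 2))   # roughly 0.95 for normal data
```

Fractions far from 0.68 and 0.95, or a mean well away from the median, suggest a closer look with a quantile plot or a formal test.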

Normal quantile plot
[Figure: q-q plot of 100 sample observations from a normal distribution with mean 0 and standard deviation 1]
– If the points lie on or close to a straight diagonal line, the data are approximately normal.
– Points far away from the overall pattern indicate outliers.
– Systematic deviations from a straight line indicate a departure from normality.

Population and Sample

Population and sample
Population: The entire collection of individuals, objects, or measurements that we want information about.
Sample: A subset (part) of the population that we select to examine in order to gather information.
– The primary objective is to create a sample whose distribution is similar to the distribution of the population, that is, a subset of the population whose center, spread, and shape are as close as possible to those of the population.
– Methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, multistage sampling, area sampling, quota sampling, etc.

Population and Sample
Random Sample: A simple random sample of size n from a population is a subset of n elements chosen in such a way that every possible subset of n elements has the same chance of being selected.
Example: Consider a population of 5 numbers (1, 2, 3, 4, 5). How many different samples of size 2 (without replacement) can we draw from this population? There are 10:
(1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)

Population and Sample
The population mean of the five numbers in the previous slide is 3. The averages of the 10 samples of size 2 are 1.5, 2, 2.5, 3, 2.5, 3, 3.5, 3.5, 4, 4.5. The mean of these 10 averages is (1.5 + 2 + 2.5 + 3 + 2.5 + 3 + 3.5 + 3.5 + 4 + 4.5)/10 = 30/10 = 3, which is the same as the population mean.
Why do we need randomness in sampling?
– It reduces the possibility of subjective and other biases.
– The mean and variance of a random sample are unbiased estimates of the population mean and variance, respectively.
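The enumeration in this example is small enough to reproduce directly. The sketch below lists every sample of size 2 from the population (1, 2, 3, 4, 5) and confirms that the average of the sample means equals the population mean.

```python
from itertools import combinations

population = [1, 2, 3, 4, 5]

# Every possible sample of size 2 drawn without replacement
samples = list(combinations(population, 2))
sample_means = [sum(s) / len(s) for s in samples]

population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(samples))            # 10
print(sample_means)
print(population_mean, mean_of_sample_means)   # 3.0 3.0
```

This is the sense in which the sample mean is unbiased: averaged over all equally likely samples, it hits the population mean exactly.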

Sampling error and bias

Sampling Variability and standard error
If we repeat an experiment or measurement on the same number of subjects, the statistic varies as the sample varies. This variability is known as sampling variability.
Standard error (SE) measures the sampling variability, or the precision, of an estimate.
– It indicates how precisely one can estimate a population value from a given sample.
– For a large sample, the sample estimate will be within one SE of the population value approximately 68% of the time.
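For the sample mean, the standard error is usually estimated as the sample standard deviation divided by √n. The sketch below shows that calculation on five made-up SBP readings; both the data and the helper name `standard_error` are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sbp = [110, 120, 130, 140, 150]   # hypothetical SBP readings in mm Hg
print(mean(sbp), round(standard_error(sbp), 4))
```

Because n sits under a square root, quadrupling the sample size is needed to halve the standard error.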

Parameter vs Statistic
Parameter: Any statistical characteristic of a population.
– Population mean, population median, population standard deviation, and the difference of two population means are examples of parameters, e.g. the mean systolic BP of all AIDHC employees is 112 mm Hg.
– Parameters describe the distribution of a population.
– Parameters are fixed and usually unknown.

Parameter vs Statistic
Statistic: Any statistical characteristic of a sample.
– Sample mean, sample median, sample standard deviation, sample proportion, odds ratio, and sample correlation coefficient are some examples of statistics, e.g. the mean systolic BP of a sample of 50 AIDHC employees, or the difference in mean systolic BP between a sample of 25 women and 25 men at AIDHC.
– A statistic describes the distribution of a sample.
– The value of a statistic is known and varies from sample to sample.
– Statistics are used for making inferences about parameters.

Statistical Inference
Statistical inference is the process by which we acquire information about populations from samples.
Two types of estimates for making inferences:
– Point estimation, e.g. mean SBP
– Interval estimation, e.g. CI

Elements/Steps in hypothesis testing
Hypothesis testing steps:
1. Specify the null (H0) and alternative (H1) hypotheses.
2. Select the significance level (alpha), e.g. 0.05 or 0.01.
3. Calculate the test statistic, e.g. t, F, Chi-square.
4. Calculate the probability value (p-value) or confidence interval.
5. Describe the result and statistic in an understandable way.
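The five steps can be walked through with a one-sample z-test. The slides name t, F, and Chi-square as typical test statistics; the z-test is used here only because it needs nothing beyond the standard library, and it assumes the population standard deviation is known. All numbers (sample mean 126, σ = 15, n = 50) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0 (sigma assumed known)."""
    se = sigma / sqrt(n)                            # standard error of the mean
    z = (sample_mean - mu0) / se                    # step 3: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # step 4: two-sided p-value
    return z, p_value, p_value < alpha              # step 5: decision at level alpha

# Step 1: H0: mu = 130 vs H1: mu != 130.  Step 2: alpha = 0.05.
z, p, reject = one_sample_z_test(sample_mean=126, mu0=130, sigma=15, n=50)
print(round(z, 3), round(p, 4), reject)
```

With these assumed numbers the p-value is a little above 0.05, so the null hypothesis is not rejected at the 5% level.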

What is a Hypothesis?
A hypothesis is an assumption about the population parameter.
– A parameter is a characteristic of the population, like its mean or variance.
– The parameter (here, the mean) must be identified before analysis.
Example: We assume the mean SBP of men at AIDHC is 135 mm Hg.

The Null Hypothesis, H0
– States the (numerical) assumption to be tested, e.g. the mean SBP of AIDHC employees = 130 mm Hg.
– Begin with the assumption that the null hypothesis is TRUE (similar to the notion of innocent until proven guilty).
– Refers to the status quo.
– Always contains the '=' sign.
– The null hypothesis may or may not be rejected.

The Alternative Hypothesis, H1
– Is the opposite of the null hypothesis, e.g. the mean SBP of AIDHC employees is not 130 mm Hg.
– Challenges the status quo.
– Never contains the '=' sign.
– The alternative hypothesis may or may not be accepted.
– Is generally the hypothesis that is believed to be true by the researcher.

Identify the Problem
Steps:
– State the Null Hypothesis (H0: µ = 130).
– State its opposite, the Alternative Hypothesis (H1: µ < 130).
Hypotheses are mutually exclusive and exhaustive. Sometimes it is easier to form the alternative hypothesis first.

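Hypothesis Testing Process
[Diagram: Assume the population mean SBP is 130 mm Hg (null hypothesis). Draw a sample from the population and compute the sample mean. Is the observed sample mean likely if the null hypothesis is true? "No, not likely!" leads to: REJECT the null hypothesis.]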

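Hypothesis Testing Goal: Keep α, β reasonably small
(α is the probability of a Type I error, rejecting a true null hypothesis; β is the probability of a Type II error, failing to reject a false null hypothesis.)

α & β Have an Inverse Relationship
Reduce the probability of one error and the other one goes up.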

Factors Affecting Type II Error, β
– True value of the population parameter: β increases when the difference between the hypothesized parameter and the true value decreases.
– Significance level α: β increases when α decreases.
– Population standard deviation σ: β increases when σ increases.
– Sample size n: β increases when n decreases.
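All four effects can be seen in one formula. The sketch below assumes a one-sided z-test with known σ, for which β = 1 - Φ(δ√n/σ - z_α), where δ is the distance between the hypothesized and true means in the direction of the alternative. The function name `type_ii_error` and the numbers are illustrative assumptions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(delta, sigma, n, alpha=0.05):
    """Beta for a one-sided z-test (sigma known); delta > 0 is |hypothesized - true mean|."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)              # critical value for level alpha
    power = NormalDist().cdf(delta * sqrt(n) / sigma - z_alpha)
    return 1 - power

base = type_ii_error(delta=5, sigma=15, n=50)
print("baseline       :", round(base, 3))
print("smaller delta  :", round(type_ii_error(2, 15, 50), 3))
print("smaller alpha  :", round(type_ii_error(5, 15, 50, alpha=0.01), 3))
print("larger sigma   :", round(type_ii_error(5, 25, 50), 3))
print("smaller n      :", round(type_ii_error(5, 15, 20), 3))
```

Each of the four variations prints a β larger than the baseline, matching the four bullets.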

How to choose between Type I and Type II errors
– The choice depends on the cost of the error.
– Choose a small Type I error rate when the cost of rejecting the maintained hypothesis or standard treatment is high.
– Choose a larger Type I error rate when you have an interest in changing the standard treatment.

Point Estimation
A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value, or point.
[Diagram: the population distribution with its unknown parameter, the sample distribution, and the point estimator]

Interval Estimation
An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
[Diagram: the population distribution with its unknown parameter, the sample distribution, and the interval estimator]

Confidence Interval (CI)
CI = point estimate ± (measure of how confident we want to be) × (standard error)
– Point estimate: the value of the statistic in the sample (e.g., mean).
– Measure of confidence: the critical value for the statistic.
– Standard error: the standard error of the statistic.
What effect does a larger sample size have on the confidence interval? It reduces the standard error and makes the CI narrower, indicating a more precise estimate.
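A small sketch of the recipe for a mean, using the normal critical value (a large-sample simplification; a t critical value would be the usual choice for small samples). The sample mean of 126 and standard deviation of 15 are assumed values, and `mean_confidence_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """point estimate ± critical value × standard error, with a normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

lo50, hi50 = mean_confidence_interval(126, 15, 50)
lo200, hi200 = mean_confidence_interval(126, 15, 200)
print(round(lo50, 2), round(hi50, 2))     # interval with n = 50
print(round(lo200, 2), round(hi200, 2))   # narrower interval with n = 200
```

Going from 50 to 200 subjects halves the width of the interval, since the standard error shrinks with √n.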

P-Value versus the Confidence Interval
Two main ways to assess study precision and the role of chance in a study.
– The p-value measures (as a probability) the evidence against the null hypothesis: it is the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true.
– A significance level of 0.05 means that, when the null hypothesis is true, about 5 of 100 experiments would appear significant just by chance ("Type I error").

P-Value versus the Confidence Interval
– A confidence interval (CI) is an interval computed from the sample that covers the true value of the parameter with a specified level of confidence.
– A CI measures the precision of an estimate (when sampling variability is high, the interval is wide to reflect the uncertainty of the estimate).
– A 95% CI implies that if one repeats a study 100 times, about 95 of the 100 intervals will contain the true measure of association.
– If the value of the parameter under the null hypothesis does not lie within the 95% CI, the result is significant at the 5% level.
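The repeated-study reading of a 95% CI can be illustrated by simulation. The sketch below assumes a normal population with a known standard deviation (true mean 130, σ = 15, n = 30 per study, all invented values), builds a 95% interval in each of 1000 simulated studies, and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(42)  # fixed seed so the run is repeatable

TRUE_MEAN, SIGMA, N, STUDIES = 130.0, 15.0, 30, 1000   # assumed values
z = NormalDist().inv_cdf(0.975)
margin = z * SIGMA / sqrt(N)        # half-width of the known-sigma 95% interval

covered = 0
for _ in range(STUDIES):
    xbar = mean(random.gauss(TRUE_MEAN, SIGMA) for _ in range(N))
    if xbar - margin <= TRUE_MEAN <= xbar + margin:
        covered += 1

coverage = covered / STUDIES
print(coverage)   # expected to be close to 0.95
```

The true mean never moves; what varies from study to study is the interval, which is why the 95% describes the procedure and not any single interval.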

Procedures for sample size calculation
– Selection of the primary variables of interest and formulation of hypotheses
– Information on the standard deviation (if numeric) or proportion (if categorical)
– A tolerable level of significance (α)
– Selection of a reasonable test statistic
– Power or confidence level
– A scientifically or clinically meaningful effect/difference


Example: Sample Size for Mean using CI
What sample size is needed to be 95% confident of being correct within ± 6? A previous study suggested that the standard deviation is 40.
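One common way to answer this is to require that the CI half-width z·σ/√n be no larger than the margin E, which gives n = (z·σ/E)², rounded up. The sketch below applies that formula to the numbers in the example; the function name is made up, and the formula assumes a normal critical value with σ treated as known.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return ceil((z * sigma / margin) ** 2)

n = sample_size_for_mean(sigma=40, margin=6)
print(n)  # 171
```

The raw value is about 170.7, and it is rounded up because rounding down would leave the margin slightly wider than ± 6.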

Example: Sample Size for Proportion using CI
What sample size is needed to estimate, within ± 5% and with 95% confidence, the proportion of AIDHC employees who have already had a flu shot? Suppose a very small sample showed that 40% of AIDHC employees had already had a flu shot.
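For a proportion, the same reasoning gives n = z²·p(1-p)/E², rounded up. The sketch below assumes "± 5%" means 5 percentage points (E = 0.05) and uses the pilot estimate p = 0.40 from the example; the function name is a made-up helper.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, margin, confidence=0.95):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

n = sample_size_for_proportion(p=0.40, margin=0.05)
print(n)  # 369

# With no pilot estimate, p = 0.5 gives the most conservative (largest) sample size.
print(sample_size_for_proportion(p=0.5, margin=0.05))  # 385
```

Since the pilot sample is described as very small, the conservative p = 0.5 figure may be the safer planning number.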

Credits Thanks are due to Faith Goa of the Golden State University for the implied permission to utilize some of the illustrations from their slides on “Fundamentals of Hypothesis Testing” for education purposes only. Other sources consulted during the preparation of these slides are herein acknowledged as well.

Questions