1 Estimating Population Parameters Based on a Sample
5/12/2019 HK Dr. Sasho MacKenzie

2 Why Estimate?
It is often not feasible (for lack of time and money) to measure an entire population. Therefore, a researcher must select a representative sample from the population and use it to make estimates. This general principle is used frequently in research and is known as statistical inference.

3 Estimating a Population Mean
Researchers often want to know the mean of a population. For example, Health Canada may want to understand obesity trends over the next 10 years. The first step would be to measure obesity in the population on an annual basis (by measuring BMI). Researchers cannot measure all 20 million adult Canadians every year, so each year a random sample is measured and used to estimate the values for the entire population.

4 Sampling Error
It is unlikely that the sample will have exactly the same mean as the entire population. Sampling error is the amount of error in the estimate of a population parameter that is based on a sample statistic. Therefore, Health Canada needs to determine how accurate the sample's mean BMI is and what the odds are that it differs from the population mean by a given amount.

5 Standard Error of the Mean (SEM)
The standard error of the mean is a numeric value that indicates the amount of error that may occur when estimating a population mean. An estimate of the population mean is always an educated guess and is accompanied by a probability statement. That is, upper and lower limits can be set around the estimated mean, and the chance that the true mean falls outside this range can be stated as a probability, such as 5 out of 100 times, or p = .05.

6 Understanding SEM
Consider the following theoretical exercise. Take 100 random samples (N = 400 each) of the Canadian adult population and find the mean BMI of each sample. This means measuring 400 people and computing their mean BMI, then repeating the process 100 times. This generates 100 estimates of the population mean.

7 Understanding SEM
Most of the 100 sample means will cluster around the true population mean, although some will stray further from it. The sample means form a normal distribution, just as the individual BMI measurements within a sample do. The standard deviation of these 100 sample means is the SEM.
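The thought experiment on slides 6 and 7 can be simulated. The Python sketch below assumes a hypothetical population in which BMI is normally distributed with mean 27 and SD 4 (numbers borrowed from the slides, not real Health Canada data); the names `simulate_sample_means`, `POP_MEAN`, and so on are illustrative only.

```python
import random
import statistics

# Hypothetical population: BMI ~ Normal(mean=27, SD=4). These parameters are
# assumptions chosen to match the slides, not measured data.
POP_MEAN, POP_SD = 27.0, 4.0
N_PER_SAMPLE = 400   # people measured in each sample
N_SAMPLES = 100      # number of repeated samples

def simulate_sample_means(seed: int = 1) -> list[float]:
    """Draw N_SAMPLES random samples and return the mean BMI of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(N_SAMPLES):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_SAMPLE)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means()
# The SD of the 100 sample means approximates the SEM.
print(f"mean of the sample means = {statistics.fmean(means):.2f}")
print(f"SD of the sample means   = {statistics.stdev(means):.2f}")
```

Under these assumptions the SD of the simulated means should land near 0.2; the exact value varies with the seed because only 100 samples are drawn.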

8 Individual BMI Scores of One Sample
[Figure: normal distribution of individual BMI scores in one sample (mean = 27, SD = 4). X-axis: BMI (kg/m²), marked at 15, 19, 23, 27, 31, 35, 39; y-axis: frequency. 68.2%, 95.4%, and 99.7% of scores lie within ±1, ±2, and ±3 SD.]

9 Interpreting Previous Slide
The sample had a mean of 27 and an SD of 4. The scores formed a normal distribution, which means:
68.2% of the scores lie between 23 and 31 (±1 SD)
95.4% of the scores lie between 19 and 35 (±2 SD)
99.7% of the scores lie between 15 and 39 (±3 SD)
These values can be used to estimate the proportion of the entire population that would fall within these limits. But Health Canada needs an estimate of the mean!
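The percentages above are areas under a normal curve. A short Python sketch using the standard library's `statistics.NormalDist` computes those areas and the matching BMI limits; the helper name `coverage` is hypothetical. The exact areas (68.27%, 95.45%, 99.73%) are slightly more precise than the rounded figures on the slide.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()        # mean 0, SD 1
SAMPLE_MEAN, SAMPLE_SD = 27.0, 4.0    # values from the slide

def coverage(k: float) -> float:
    """Fraction of a normal distribution lying within ±k SD of its mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 2, 3):
    lower = SAMPLE_MEAN - k * SAMPLE_SD
    upper = SAMPLE_MEAN + k * SAMPLE_SD
    print(f"±{k} SD: {lower:.0f} to {upper:.0f} kg/m², {coverage(k):.2%} of scores")
```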

10 Distribution of the 100 Sample Means
[Figure: normal distribution of the 100 sample means (mean = 27, SEM = 0.2). X-axis: BMI (kg/m²), marked at 26.4, 26.6, 26.8, 27, 27.2, 27.4, 27.6; y-axis: frequency. 68.2%, 95.4%, and 99.7% of the sample means lie within ±1, ±2, and ±3 SEM.]

11 Interpreting Previous Slide
The mean of the 100 sample means was 27, and the SD of the 100 sample means was 0.2. Because the means are normally distributed, there is a:
68.2% chance that 26.8 < true mean < 27.2
95.4% chance that 26.6 < true mean < 27.4
99.7% chance that 26.4 < true mean < 27.6
The more precise, or narrower, the estimate, the lower the odds of being correct. As the estimate becomes more encompassing, the odds of being correct improve.

12 Calculating SEM in Reality
It is not practical to take 100 samples and then find the SD of their means. Instead, the SEM is calculated from the SD of a single sample and the number of measurements in that sample:
$$\mathrm{SEM} = \frac{SD}{\sqrt{N}}$$
where SD = the sample standard deviation and N = the number of measurements in the sample.
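The SEM formula translates directly into code. A minimal Python sketch; it assumes "sample standard deviation" means the usual n − 1 (Bessel-corrected) SD, which is what `statistics.stdev` computes, and both function names are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values: list[float]) -> float:
    """SEM = SD / sqrt(N), using the sample SD (n - 1 denominator, an assumption)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements to estimate the SD")
    return statistics.stdev(values) / math.sqrt(n)

def sem_from_summary(sd: float, n: int) -> float:
    """SEM when only the summary statistics (SD and N) are known."""
    return sd / math.sqrt(n)

# Summary values from the Health Canada example: SD = 4, N = 400.
print(sem_from_summary(4, 400))
```

`standard_error_of_mean` is for raw measurements; `sem_from_summary` is for the common case where only the reported SD and N are available.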

13 SEM Example
Suppose Health Canada measured the BMI of one sample of 400 adults and found: Mean = 27, SD = 4. Therefore,
$$\mathrm{SEM} = \frac{4}{\sqrt{400}} = \frac{4}{20} = 0.2$$
This agrees with the standard deviation of the 100 sample means in the last graph.

14 SEM is a Z-score
The SEM is the standard deviation of the normal curve formed by the sample means; therefore, a distance of one SEM from the mean is equivalent to a Z-score of ±1. The true mean of the population can be represented by the following equation:
$$\text{True mean} = \bar{X} \pm Z \times \mathrm{SEM}$$
where $\bar{X}$ is the sample mean. Using the previous example and Z = 1, True mean = 27 ± (1 × 0.2) = 27 ± 0.2.
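The equation above can be written as a small helper. This is a sketch; the function name `estimate_true_mean` is hypothetical, and the caller supplies the Z value for the desired level of confidence.

```python
def estimate_true_mean(sample_mean: float, sem: float, z: float) -> tuple[float, float]:
    """Return the lower and upper limits sample_mean ± z * SEM."""
    margin = z * sem
    return sample_mean - margin, sample_mean + margin

# Health Canada example: sample mean = 27, SEM = 0.2.
for z in (1.0, 2.0):
    low, high = estimate_true_mean(27.0, 0.2, z)
    print(f"Z = {z:.0f}: true mean between {low:.1f} and {high:.1f}")
```

With Z = 1 this reproduces 27 ± 0.2, and with Z = 2 it gives 27 ± 0.4.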

15 Level of Confidence
A level of confidence (LOC) is a percentage figure that establishes the probability that a statement is correct. It is based on the characteristics of the normal curve. Using the example from the last slide, Health Canada can conclude that the mean BMI for adults, 27 ± 0.2, is accurate at the 68% level of confidence.

16 What if 68% isn’t enough?
If Health Canada wanted to be 95.4% confident, they would broaden the estimate of the mean to the values on the normal curve that encompass 95.4% of the area. This requires 2 standard deviations: with Z = 2, True mean = 27 ± (2 × 0.2) = 27 ± 0.4. This estimate is accurate at the 95.4% LOC.

17 Probability of Error (p-value)
If there is a 68% chance of being correct, there is also a 32% chance of being incorrect. This is referred to as the probability of error. The area under the curve that represents the probability of error is called alpha (α); it is the level of chance occurrence. Alpha is tied directly to Z because alpha is the area under the curve that extends beyond a given Z-score (beyond both −Z and +Z in a two-tailed problem), so a larger Z gives a smaller alpha.
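The link between Z and alpha can be computed rather than looked up. A Python sketch using `statistics.NormalDist`; the helper name `two_tailed_alpha` is hypothetical, and it assumes the two-tailed case described above.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def two_tailed_alpha(z: float) -> float:
    """Area under the standard normal curve beyond -z and beyond +z (both tails)."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

# Probability of error when the estimate spans ±1 SEM.
print(f"alpha at Z = 1: {two_tailed_alpha(1.0):.2f}")
```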

18 Z, Level of Confidence, and P-value
Z      LOC    p-value (two-tailed)
1.00   68%    .32
1.65   90%    .10
1.96   95%    .05
2.58   99%    .01
The above table shows the relationship between the Z-score, the LOC, and the two-tailed p-value. By tradition, the LOC is presented as a percentage and the probability of error as a decimal.
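The table can also be read in reverse: given a desired LOC, find the Z that leaves alpha / 2 in each tail. A Python sketch using `NormalDist.inv_cdf`; the helper name `z_for_confidence` is hypothetical. Note that the table's Z values are rounded: Z = 1.00 corresponds to about 68.27% confidence, and 90% confidence corresponds to Z ≈ 1.645.

```python
from statistics import NormalDist

def z_for_confidence(loc: float) -> float:
    """Two-tailed critical Z for a level of confidence given as a fraction, e.g. 0.95."""
    alpha = 1 - loc
    return NormalDist().inv_cdf(1 - alpha / 2)

for loc in (0.68, 0.90, 0.95, 0.99):
    print(f"LOC {loc:.0%}: Z = {z_for_confidence(loc):.3f}, p = {1 - loc:.2f}")
```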

19 Graphic of LOC, p-value, and alpha
[Figure: normal distribution of sample means, x-axis BMI (kg/m²) from 26.4 to 27.6, y-axis frequency. The central 90% of the area lies between 26.67 (Z = −1.65) and 27.33 (Z = 1.65), with 5% in each tail. Level of confidence = 90%; probability of error = alpha = 0.10 (5% + 5%).]

20 Tails of the Normal Curve
On the last slide, Health Canada had to consider the area at both ends (tails) of the curve. This was necessary because the true mean could lie either above or below the estimated range. This is considered a two-tailed problem. The following question would be considered a one-tailed problem: What is the chance that the mean BMI of the population is greater than 27.5?

21 One-Tailed Problem
To answer this question, we need to convert 27.5 to a Z-score and determine the area under the normal curve beyond that Z-score:
Z = (27.5 − 27) / 0.2 = 2.5
To find the area beyond Z = 2.5, we could consult a table in a statistics textbook or use Excel. The Excel formula =1-NORMSDIST(2.5) returns a p-value of .0062. This means there is a 0.6% chance that the mean BMI of adult Canadians is greater than 27.5 kg/m².
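The Excel calculation can be mirrored with Python's standard library, where `NormalDist().cdf` plays the role of `NORMSDIST`. A sketch; `one_tailed_p_above` is a hypothetical helper name, and it follows the slides' reading of the upper-tail area as the chance that the true mean exceeds the given value.

```python
from statistics import NormalDist

def one_tailed_p_above(value: float, mean: float, sem: float) -> float:
    """Upper-tail area beyond Z = (value - mean) / SEM."""
    z = (value - mean) / sem
    return 1 - NormalDist().cdf(z)   # same idea as Excel's =1-NORMSDIST(z)

p = one_tailed_p_above(27.5, mean=27.0, sem=0.2)
print(f"p = {p:.4f} ({p:.1%})")
```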

