Instructor Resource Chapter 5 Copyright © Scott B. Patten, 2015. Permission granted for classroom use with Epidemiology for Canadian Students: Principles,


1 Instructor Resource Chapter 5 Copyright © Scott B. Patten, 2015. Permission granted for classroom use with Epidemiology for Canadian Students: Principles, Methods & Critical Appraisal (Edmonton: Brush Education Inc. www.brusheducation.ca).

2 Chapter 5. Random error from sampling

3 Objectives Identify and differentiate the 2 main sources of error in epidemiologic research: random error and systematic error. Describe the relationship between sampling and random error. Define confidence intervals and how to calculate them.

4 Objectives (continued) Describe the relationship between sample size and precision in a prevalence study. Differentiate estimation and statistical testing. Describe statistical testing and define key related concepts (significant versus nonsignificant tests, type I and type II error, statistical power). Explain the influence of sample size on statistical power.

5 Sources of error in epidemiological research Sources of error include random error (a.k.a. stochastic error) and systematic error (a.k.a. bias). A clear definition of bias comes from a clear understanding of what is meant by random error, which is why we are starting with random error.

6 PREVALENCE PREVALENCE is spelled in uppercase letters to indicate that the parameter is calculated from the population (not sampled) data. PREVALENCE is not an estimate: in the absence of measurement errors, it is the true population parameter.

7 Prevalence Prevalence is spelled in lowercase letters (prevalence) to indicate that the parameter is calculated from a sample. When calculated from a sample, prevalence is an estimate: repeating the process of sampling would result in different estimates. The different estimates are due to sampling variability. The difference between a true value and a sample-based estimate is a type of error: random error.
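The sampling variability described on this slide can be illustrated with a small simulation. This is only a sketch: the population size, the PREVALENCE of 0.10, the sample size of 200, and the function name `sample_prevalence` are all assumptions chosen for illustration, not values from the text.

```python
import random

def sample_prevalence(population, n, rng):
    """Estimate prevalence from a simple random sample of size n."""
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n

# Hypothetical population: 10,000 people, 1,000 of whom have the disease.
population = [1] * 1000 + [0] * 9000
PREVALENCE = sum(population) / len(population)  # true population parameter

rng = random.Random(42)  # fixed seed so the run is repeatable
estimates = [sample_prevalence(population, 200, rng) for _ in range(5)]

print("PREVALENCE (population):", PREVALENCE)
print("prevalence (5 samples): ", estimates)
```

Each entry in `estimates` is a sample-based prevalence; its distance from `PREVALENCE` is the random error for that sample.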

8 Random samples In a random sample, the selection of subjects into the sample cannot be predicted. Each person’s disease status is an independent observation that reflects true prevalence of disease in the population through the law of large numbers. The sample prevalence therefore estimates the true value, but can differ from the true value due to random error.

9 Sampling terminology In a probability sample, the probability of selecting a person from the population is known. A simple random sample is a basic form of a probability sample: the probability of selecting each member of the population is the same. The probability of selection is a selection probability. In practice, sampling requires a list from which to select. This is a sampling frame.
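As a rough sketch of these terms, a simple random sample can be drawn from a sampling frame with Python's standard library. The frame of 2,000 made-up identifiers and the sample size of 100 are assumptions for illustration only.

```python
import random

# Hypothetical sampling frame: a list of 2,000 person identifiers.
sampling_frame = [f"person_{i:04d}" for i in range(2000)]
sample_size = 100

rng = random.Random(7)  # fixed seed so the run is repeatable
# Simple random sample: every member of the frame is equally likely to be chosen.
sample = rng.sample(sampling_frame, sample_size)

# Selection probability: the same for every member of the frame.
selection_probability = sample_size / len(sampling_frame)
print("selection probability:", selection_probability)
```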

10 Sampling terminology (continued) Inference describes the process of gaining information about a population based on data collected from a sample. The target population is the subject of inference: it is the population whose parameters are estimated through sampling.

11 Sampling terminology (continued) A source population is a subset of a target population: it is a smaller population within a larger target population from which a sample is drawn. A study population is a common term for a sample drawn from a source population: this is a confusing term because a “study population” is not a population, it’s a sample.

12 Dealing with random error The law of large numbers predicts that larger samples lead to parameter estimates (e.g., prevalence) that more closely reflect the true population values. Therefore, epidemiological studies prefer large samples. Nevertheless, random error needs to be addressed during data analysis.
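The effect of sample size on random error can be sketched by repeating the sampling many times at several sample sizes and averaging the absolute error. The population, the sample sizes, the number of repeats, and the helper `mean_abs_error` are assumptions made for this illustration.

```python
import random

def mean_abs_error(population, n, repeats, rng):
    """Average absolute difference between sample prevalence and the true value."""
    true_value = sum(population) / len(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)
        errors.append(abs(sum(sample) / n - true_value))
    return sum(errors) / repeats

# Hypothetical population with a true PREVALENCE of 0.10.
population = [1] * 1000 + [0] * 9000
rng = random.Random(1)  # fixed seed so the run is repeatable

errors_by_n = {n: mean_abs_error(population, n, 200, rng) for n in (50, 500, 5000)}
for n, err in errors_by_n.items():
    print(f"n = {n:>4}: mean absolute error = {err:.4f}")
```

The error shrinks as the sample grows but does not reach zero, which is why random error still has to be addressed in the analysis.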

13 Dealing with random error (continued) There are 2 general approaches: confidence intervals and statistical tests. Confidence intervals are the preferred approach.

14 Confidence intervals Confidence intervals define a range of plausible values for true population parameters, based on a desired level of confidence. Usually, 95% confidence is the desired level. A confidence interval consists of 2 numbers called confidence limits. The confidence interval comprises all values between the lower and upper confidence limits. You can be 95% confident that a 95% confidence interval captures the true population value.
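One way to read the "95% confident" statement is through repeated sampling: if many samples were drawn and a 95% interval computed from each, about 95% of those intervals would be expected to contain the true value. The sketch below checks this by simulation. It assumes a true prevalence of 0.10, samples of 500, and a normal-approximation interval with SE = sqrt(p(1 - p)/n); that SE formula and the function names are assumptions, not taken from the slides.

```python
import math
import random

def wald_interval(cases, n, z=1.96):
    """Normal-approximation 95% interval for a proportion (assumed method)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def coverage(true_prevalence, n, repeats, rng):
    """Share of simulated intervals that contain the true prevalence."""
    hits = 0
    for _ in range(repeats):
        # Each person is an independent observation with the same disease probability.
        cases = sum(rng.random() < true_prevalence for _ in range(n))
        lower, upper = wald_interval(cases, n)
        if lower <= true_prevalence <= upper:
            hits += 1
    return hits / repeats

rng = random.Random(2015)  # fixed seed so the run is repeatable
observed_coverage = coverage(true_prevalence=0.10, n=500, repeats=1000, rng=rng)
print(f"Share of intervals containing the true value: {observed_coverage:.3f}")
```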

15 Confidence intervals (continued) The best type of confidence interval is an exact confidence interval. Others are based on approximations. For example, in a standard normal distribution, the range from -1.96 to +1.96 includes 95% of values, so if an estimate is normally distributed:
Lower 95% Confidence Limit = Estimate - (1.96 x SE)
Upper 95% Confidence Limit = Estimate + (1.96 x SE)
where SE is the standard error associated with the estimate.
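A minimal sketch of the approximate limits for a prevalence estimate follows. The slide does not say how SE is obtained; the code assumes the usual normal-approximation standard error for a proportion, SE = sqrt(p(1 - p)/n), and the counts used are made up.

```python
import math

def approx_ci_95(cases, n):
    """Approximate 95% confidence limits: Estimate -/+ (1.96 x SE)."""
    estimate = cases / n
    # Assumed SE for a proportion; not stated in the slides.
    se = math.sqrt(estimate * (1 - estimate) / n)
    return estimate - 1.96 * se, estimate + 1.96 * se

# Hypothetical prevalence studies with the same estimate (0.10) but different sizes.
lower_small, upper_small = approx_ci_95(cases=50, n=500)
lower_large, upper_large = approx_ci_95(cases=500, n=5000)

print(f"n =  500: {lower_small:.4f} to {upper_small:.4f}")
print(f"n = 5000: {lower_large:.4f} to {upper_large:.4f}")
```

With the same estimate, the larger sample gives a narrower interval, which is the link between sample size and precision. The approximation becomes unreliable for small samples or prevalences near 0 or 1, where exact intervals are preferable.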

16 Statistical tests Instead of providing a range of values, statistical tests are designed to help answer the question, “Is exposure associated with disease?” They follow a series of steps.

17 Statistical tests (continued) Step 1: Formulate a null hypothesis (e.g., there is no association between exposure and disease). Step 2: Calculate the probability of observing, due to chance alone, an effect as large as or larger than the one observed, assuming that the null hypothesis is true. Step 3: If the probability in step 2 is small, the null hypothesis is rejected.
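The three steps can be sketched with one common test. The slides do not name a specific test; the example below assumes a two-sided z-test comparing the proportion diseased in exposed and unexposed groups, and all counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Two-sided z-test of the null hypothesis that both groups share one risk."""
    p1 = cases_exposed / n_exposed
    p2 = cases_unexposed / n_unexposed
    # Step 1: under the null hypothesis, pool the groups to estimate the common risk.
    pooled = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_unexposed))
    z = (p1 - p2) / se
    # Step 2: probability of a difference this large or larger if the null is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 30/200 exposed and 15/200 unexposed people have the disease.
z, p_value = two_proportion_z_test(30, 200, 15, 200)

# Step 3: reject the null hypothesis if the probability is small.
reject_null = p_value < 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, reject null: {reject_null}")
```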

18 Statistical tests (continued) Statistical tests work by rejecting a hypothesis, not by proving a hypothesis. Null hypotheses are never rejected with certainty; they are just deemed unlikely. The decision that a result (or one more extreme) is unlikely is usually based on its probability (given the null hypothesis) being less than 5% (p < 0.05).

19 Statistical errors Statistical tests can make 2 types of errors: rejecting a null hypothesis that is true (type I error) and failing to reject a null hypothesis that is false (type II error).

20
| | An association exists in the population (null hypothesis is false) | No association exists in the population (null hypothesis is true) |
| Statistical test is significant | No error | Type I error |
| Statistical test is nonsignificant | Type II error | No error |

21 Statistical power Statistical power is the probability of rejecting a null hypothesis that is false. Power is calculated from: sample size (larger = greater power); effect size (bigger = greater power); and the probability at which the null is rejected (larger = greater power*). For continuous measures (e.g., comparing means), the standard deviation of the outcome also contributes to statistical power. * but this is usually set at the conventional 5% level and not changed to increase power
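How each input moves power can be sketched with a normal-approximation formula for comparing two proportions. This formula, the function name `approx_power`, and the example risks are assumptions for illustration; the slides do not give a power formula, and dedicated software would normally be used.

```python
import math
from statistics import NormalDist

def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses a normal approximation and ignores the small chance of rejecting
    the null in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)

# Hypothetical risks: 15% in the exposed group, 7.5% in the unexposed group.
base = approx_power(0.15, 0.075, n_per_group=200)
larger_sample = approx_power(0.15, 0.075, n_per_group=800)
larger_effect = approx_power(0.20, 0.075, n_per_group=200)
larger_alpha = approx_power(0.15, 0.075, n_per_group=200, alpha=0.10)

print(f"baseline power:     {base:.2f}")
print(f"larger sample size: {larger_sample:.2f}")
print(f"larger effect size: {larger_effect:.2f}")
print(f"larger alpha:       {larger_alpha:.2f}")
print(f"type II error probability at baseline: {1 - base:.2f}")
```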

22 Probability of error The probability of type I error is the value of probability at which the null is rejected. The probability of type II error is 1 - statistical power.

23 End

