
1 Introduction to Biostatistics for Clinical and Translational Researchers KUMC Departments of Biostatistics & Internal Medicine University of Kansas Cancer Center FRONTIERS: The Heartland Institute of Clinical and Translational Research

2 Course Information. Jo A. Wick, PhD. Office location: 5028 Robinson. Email: jwick@kumc.edu. Lectures are recorded and posted at http://biostatistics.kumc.edu under ‘Educational Opportunities’.

3 Course Objectives. Understand the role of statistics in the scientific process; understand the features, strengths and limitations of descriptive, observational and experimental studies; distinguish between association and causation; understand the roles of chance, bias and confounding in the evaluation of research.

4 Course Calendar. June 29: Descriptive Statistics and Core Concepts; July 6: Hypothesis Testing; July 13: Linear Regression & Survival Analysis; July 20: Clinical Trial & Experimental Design.

5 Probability Review

6 Experiment. An experiment is a process whose results are not known until after it has been performed. The range of possible outcomes is known in advance. We do not know the exact outcome, but we would like to know the chances of its occurrence. The probability of an outcome E, denoted P(E), is a numerical measure of the chances of E occurring, with 0 ≤ P(E) ≤ 1.

7 Probability. The most common definition of probability is the relative frequency view: P(E) is the proportion of times E occurs in a long series of repetitions of the experiment. Probabilities for the outcomes of a random variable x are represented through a probability distribution.
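The relative frequency view can be illustrated with a short simulation. This is a minimal sketch. The experiment (rolling a fair die, with E = "the roll is a 6") and the function names are hypothetical choices for illustration. It estimates P(E) as the fraction of repetitions in which E occurs.

```python
import random

def relative_frequency(event, trials, seed=1):
    """Estimate P(E) as (number of trials in which E occurs) / (number of trials)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials

def roll_is_six(rng):
    """Hypothetical experiment: roll a fair die; the event E is 'the roll is a 6'."""
    return rng.randint(1, 6) == 6

# With more repetitions, the relative frequency settles near the true P(E) = 1/6.
for n in (100, 10_000, 100_000):
    print(n, round(relative_frequency(roll_is_six, n), 4))
```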

8 Population Parameters. Most often our research questions involve unknown population parameters. What is the average BMI among 5th graders? What proportion of hospital patients acquire a hospital-based infection? Determining these values exactly would require a census. However, because the population is prohibitively large (or for other reasons), a sample is taken instead.

9 Sample Statistics. Statistics describe or summarize sample observations. They vary from sample to sample, which makes them random variables. We use statistics generated from samples to make inferences about the parameters that describe populations.

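10 Sampling Variability. [Figure: repeated samples drawn from a population with mean μ and standard deviation σ, and the resulting sampling distribution of the sample statistic]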
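Sampling variability can be made concrete by simulation. This is a sketch under assumed values: a hypothetical normal population with μ = 50 and σ = 10. It draws many samples of size n, records each sample mean, and summarizes the resulting sampling distribution. The spread of the sample means should be close to σ/√n and shrinks as n grows.

```python
import random
import statistics

def sample_means(mu, sigma, n, reps, seed=2):
    """Draw `reps` samples of size n from a Normal(mu, sigma) population; return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mu, sigma = 50.0, 10.0  # hypothetical population parameters
for n in (4, 25, 100):
    means = sample_means(mu, sigma, n, reps=2000)
    # Columns: n, mean of the sample means, SD of the sample means, sigma / sqrt(n)
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 2), round(sigma / n ** 0.5, 2))
```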

11 Types of Samples. Random sample: each person has an equal chance of being selected. Convenience sample: persons are selected because they are convenient or readily available. Systematic sample: persons are selected based on a pattern (e.g., every kth person on a list). Stratified sample: persons are selected from within subgroups (strata).
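The four sampling schemes can be sketched against a small, hypothetical sampling frame: 100 people, each tagged with a clinic that serves as the stratum. The function names and the frame are illustrative assumptions. "Convenience" is modeled crudely as taking whoever is first on the list.

```python
import random

# Hypothetical sampling frame: 100 people, each tagged with a clinic (the strata).
population = [{"id": i, "clinic": "A" if i < 60 else "B"} for i in range(100)]

def simple_random_sample(pop, n, rng):
    # Every subset of size n is equally likely to be chosen.
    return rng.sample(pop, n)

def convenience_sample(pop, n):
    # Whoever is readily available: here, simply the first n people on the list.
    return pop[:n]

def systematic_sample(pop, n, rng):
    # Every k-th person after a random starting point.
    k = len(pop) // n
    start = rng.randrange(k)
    return pop[start::k][:n]

def stratified_sample(pop, n_per_stratum, key, rng):
    # A simple random sample of size n_per_stratum within each subgroup.
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    return [p for members in strata.values() for p in rng.sample(members, n_per_stratum)]

rng = random.Random(3)
print([p["id"] for p in systematic_sample(population, 10, rng)])
print([(p["id"], p["clinic"]) for p in stratified_sample(population, 3, "clinic", rng)])
```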

12 Random Sampling. For studies, it is optimal (but not always possible) for the sample providing the data to be representative of the population under study. Simple random sampling is a sampling scheme in which every possible subsample of size n from the population is equally likely to be selected. In theory, it provides a representative sample. Assuming the sample is representative, the summary statistics (e.g., the mean) should be ‘good’ estimates of the true quantities in the population, and the larger n is, the better the estimates will be.

13 Types of Samples. We will explore the impact of sampling when we discuss Experimental Design on July 20.

14 Hypothesis Testing

15 Recall: Types of Data. All data contain information. It is important to recognize that the hierarchy implied by a variable's level of measurement affects (1) how we describe the data and (2) which statistical methods we use to analyze it.

16 Levels of Measurement. Nominal: difference. Ordinal: difference, order. Interval: difference, order, equivalence of intervals. Ratio: difference, order, equivalence of intervals, absolute zero. The levels run along a spectrum from discrete, qualitative data to continuous, quantitative data.

17 Types of Data. NOMINAL → ORDINAL → INTERVAL → RATIO (information increases from nominal to ratio).

18 Levels of Measurement. The levels are in increasing order of mathematical structure, meaning that more mathematical operations and relations are defined. The higher levels are required in order to define some statistics. At the lower levels, assumptions tend to be less restrictive and the appropriate data analysis techniques tend to be less sensitive. In general, it is desirable to have a higher level of measurement.

19 Levels of Measurement.
Level | Statistical Summary | Mathematical Relation/Operation
Nominal | Mode | One-to-one transformations
Ordinal | Median | Monotonic transformations
Interval | Mean, standard deviation | Positive linear transformations
Ratio | Geometric mean, coefficient of variation | Multiplication by a constant c > 0
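The "Mathematical Relation/Operation" column can be checked numerically on a small, hypothetical set of positive values. Three checks follow:

- The median survives a monotonic transformation (log), but the mean does not.
- The coefficient of variation survives multiplication by c > 0.
- The coefficient of variation does not survive a shift of the zero point.

This is an illustrative sketch, not a formal proof.

```python
import math
import statistics

data = [2.0, 3.0, 5.0, 8.0, 40.0]  # hypothetical positive, right-skewed values

# Ordinal level: the median commutes with a monotonic transformation such as log.
print(statistics.median([math.log(x) for x in data]), math.log(statistics.median(data)))

# The mean does not: the mean of the logs differs from the log of the mean.
print(statistics.fmean([math.log(x) for x in data]), math.log(statistics.fmean(data)))

def cv(xs):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.stdev(xs) / statistics.fmean(xs)

# Ratio level: the CV is unchanged by multiplication by c > 0 (e.g., a change of units)...
print(cv(data), cv([2.54 * x for x in data]))
# ...but a shift of the zero point changes it, which is why the CV needs a true zero.
print(cv([x + 10 for x in data]))
```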

20 Recall: Hypotheses. Null hypothesis (H₀): a statement of no difference or association between variables. This is the hypothesis we test; the first step in the ‘recipe’ for hypothesis testing is to assume H₀ is true. Alternative hypothesis (H₁): a statement of a difference or association between variables. This is what we are trying to prove.

21 Hypothesis Testing. One-tailed hypothesis: the outcome is expected in a single direction (e.g., administration of the experimental drug will result in a decrease in systolic BP); H₁ includes ‘<’ or ‘>’. Two-tailed hypothesis: the direction of the effect is unknown (e.g., the experimental therapy will result in a different response rate than that of the current standard of care); H₁ includes ‘≠’.
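The direction in H₁ determines which tail(s) of the null distribution count as evidence. A minimal sketch follows, assuming a test statistic that is standard normal under H₀; the slides' t tests work the same way with a t distribution. The observed value z = −1.8 is hypothetical. The two-sided p-value is twice the one-tailed p-value in the observed direction.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value for a test statistic z that is standard normal under H0.

    alternative: 'less' (H1 uses <), 'greater' (H1 uses >), or 'two-sided' (H1 uses ≠).
    """
    std = NormalDist()
    if alternative == "less":
        return std.cdf(z)
    if alternative == "greater":
        return 1 - std.cdf(z)
    if alternative == "two-sided":
        return 2 * std.cdf(-abs(z))
    raise ValueError(f"unknown alternative: {alternative}")

z = -1.8  # hypothetical observed statistic, e.g. systolic BP decreased
print(p_value(z, "less"), p_value(z, "two-sided"))
```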

22 Hypothesis Testing. The statistical hypotheses are statements concerning characteristics of the population(s) of interest: population mean, μ; population variability, σ; population rate (or proportion), π; population correlation, ρ. Example: it is hypothesized that the response rate for the experimental therapy is greater than that of the current standard of care, π_Exp > π_SOC. This is H₁.

23 Recall: Decisions. Type I error (α): a true H₀ is incorrectly rejected (“an innocent man is proven GUILTY in a court of law”); the commonly accepted rate is α = 0.05. Type II error (β): failing to reject a false H₀ (“a guilty man is proven NOT GUILTY in a court of law”); the commonly accepted rate is β = 0.2. Power (1 − β): correctly rejecting a false H₀ (“justice has been served”); the commonly accepted rate is 1 − β = 0.8.
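The conventional α = 0.05 and 1 − β = 0.8 can be tied to sample size with a standard formula. The sketch below uses a simplifying assumption not in the slides: a one-sided, one-sample z test with known σ. The effect size (half a standard deviation) is hypothetical. It uses n = ((z₁₋α + z₁₋β)σ/δ)², rounded up, and checks the resulting power.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def power_one_sided(delta, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z test (sigma assumed known) when the true shift is delta."""
    z_crit = Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(z_crit - delta * math.sqrt(n) / sigma)

def n_for_power(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n meeting the target power: ((z_{1-alpha} + z_{1-beta}) * sigma / delta)**2, rounded up."""
    z_a, z_b = Z.inv_cdf(1 - alpha), Z.inv_cdf(power)
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)

n = n_for_power(delta=0.5, sigma=1.0)  # hypothetical effect of half a standard deviation
print(n, round(power_one_sided(0.5, 1.0, n), 3))
```

When delta = 0 (H₀ exactly true), `power_one_sided` returns α, the Type I error rate.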

24 Decisions

25 Basic Recipe for Hypothesis Testing.
1. State H₀ and H₁.
2. Assume H₀ is true.
3. Collect the evidence: from the sample data, compute the appropriate sample statistic and the test statistic. Test statistics (e.g., t, chi-square, F) quantify the level of evidence within the sample; they also provide the information needed to compute a p-value.
4. Determine whether the test statistic is large enough to meet the a priori level of evidence necessary to reject H₀ (or, equivalently, is p < α?).

26 Example: Carbon Monoxide. An experiment is undertaken to determine the concentration of carbon monoxide in air. It is hypothesized that the actual concentration is greater than 10 mg/m³. Eighteen air samples are obtained and the concentration of each sample is measured. The random variable (outcome) x is carbon monoxide concentration. The characteristic (parameter) of interest is μ, the true average concentration of carbon monoxide in air.

27 Step 1: State H₀ and H₁. H₁: μ > 10 mg/m³ (what we think!). H₀: μ ≤ 10 mg/m³ (what we assume in order to test!). Step 2: Assume μ = 10.

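28 Step 3: Evidence. Carbon monoxide concentrations (mg/m³) for the 18 air samples: 10.25, 10.37, 10.66, 10.47, 10.56, 10.22, 10.44, 10.38, 10.63, 10.40, 10.39, 10.26, 10.32, 10.35, 10.54, 10.33, 10.48, 10.68. Sample statistic: x̄ = 10.43. Test statistic: t = 1.79. What does 1.79 mean? How do we use it?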

29 Student’s t Distribution. Remember when we assumed H₀ was true (Step 2: assume μ = 10)?

30 Student’s t Distribution. What we were actually doing was setting up this theoretical Student’s t distribution, centered at t = 0, from which the p-value can be calculated.

31 Student’s t Distribution. Assuming the true air concentration of carbon monoxide is actually 10 mg/m³ (Step 2: assume μ = 10), how likely is it that we would get evidence in the form of a sample mean equal to 10.43?

32 Student’s t Distribution. We can say how likely by framing the statement in terms of the probability of an outcome. With the null distribution centered at t = 0 and an observed t = 1.79, p = P(t ≥ 1.79) = 0.0456.
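The p-value on this slide can be reproduced from the reported t = 1.79 with 18 − 1 = 17 degrees of freedom. This is a minimal, dependency-free sketch. It integrates Student's t density numerically with Simpson's rule, a shortcut chosen here so the code is self-contained. In practice, a statistics package routine such as SciPy's `scipy.stats.t.sf` would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """P(T >= t). By symmetry, for t >= 0 this is 0.5 minus the integral of the density
    from 0 to t; the integral is approximated with Simpson's rule (steps must be even)."""
    if t < 0:
        return 1 - t_sf(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Carbon monoxide example: observed t = 1.79 with n - 1 = 17 degrees of freedom.
p = t_sf(1.79, df=17)
print(round(p, 4))
```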

33 Step 4: Make a Decision. Decision rule: if p ≤ α, the chances of getting the evidence actually collected in our sample, given that the null hypothesis is true, are very small. The observed data conflict with the null ‘theory’ and support the alternative ‘theory.’ Since the evidence (data) was actually observed and our theory (H₀) is unobservable, we choose to believe that our evidence is the more accurate portrayal of reality and reject H₀ in favor of H₁.

34 Step 4: Make a Decision. What if our evidence had not been in as great a degree of conflict with our theory? If p > α, the chances of getting the evidence actually collected in our sample, given that the null hypothesis is true, are relatively high, and we fail to reject H₀.

35 Decision. How do we know if the decision we made was the correct one? We don’t! If α = 0.05, the chance that our decision is an incorrect rejection of a true H₀ is no greater than 5%. We have no way of knowing whether we made this kind of error; we only know that our chances of making it in this setting are relatively small.
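The "no greater than 5%" claim can be checked by simulating many studies in which H₀ is true and counting how often it is (wrongly) rejected. The sketch makes simplifying assumptions that are not in the slides: a one-sided z test with known σ = 1, data drawn with the true mean exactly at the null value 10, and n = 18. The long-run rejection rate should land close to α.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=10.0, sigma=1.0, n=18, alpha=0.05, reps=20_000, seed=4):
    """Simulate a one-sided z test of H0: mu <= mu0 vs H1: mu > mu0 when the true mean is
    exactly mu0 (sigma known), and return the fraction of simulated studies that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        z = (xbar - mu0) / (sigma / n ** 0.5)
        rejections += z > z_crit  # True counts as 1
    return rejections / reps

print(type_i_error_rate())
```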

36 Which test do I use? What kind of outcome do you have? Nominal? Ordinal? Interval? Ratio? How many samples do you have? Are they related or independent?

37 Types of Tests: One Sample.
Measurement Level | Population Parameter | Hypotheses | Sample Statistic | Inferential Method(s)
Nominal | Proportion π | H₀: π = π₀; H₁: π ≠ π₀ | p̂ (sample proportion) | Binomial test or z test (if np > 10 and nq > 10)
Ordinal | Median M | H₀: M = M₀; H₁: M ≠ M₀ | m = p₅₀ (sample 50th percentile) | Wilcoxon signed-rank test
Interval | Mean μ | H₀: μ = μ₀; H₁: μ ≠ μ₀ | x̄ | Student’s t or Wilcoxon (if non-normal or small n)
Ratio | Mean μ | H₀: μ = μ₀; H₁: μ ≠ μ₀ | x̄ | Student’s t or Wilcoxon (if non-normal or small n)
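For the nominal, one-sample case, the exact binomial test needs only binomial probabilities. This sketch uses one common two-sided convention: add up the probabilities of all outcomes no more likely than the observed one. The data (15 responders out of 20, tested against π₀ = 0.5) are hypothetical. Here nπ₀ = 10 fails the "> 10" condition in the table, so the exact test is the appropriate choice.

```python
from math import comb

def binomial_test_two_sided(x, n, pi0):
    """Exact two-sided binomial test of H0: pi = pi0.

    The p-value is the total probability, under H0, of every outcome that is no more
    likely than the observed count x.
    """
    probs = [comb(n, k) * pi0**k * (1 - pi0) ** (n - k) for k in range(n + 1)]
    p_obs = probs[x]
    # A tiny relative tolerance keeps outcomes tied with x from being lost to rounding.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# Hypothetical data: 15 responders out of n = 20, testing H0: pi = 0.5.
print(round(binomial_test_two_sided(15, 20, 0.5), 4))
```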

38 Types of Tests. Parametric methods make assumptions about the distribution of the data (e.g., that it is normally distributed) and are suited to sample sizes large enough to assess whether the distributional assumption is met. Nonparametric methods make no assumption about the specific form of the distribution. They are suitable for small samples, or for large samples in which parametric assumptions are violated. They use the ranks of the data values rather than the actual values themselves. As a result, they lose power when the parametric test is appropriate.
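Here is a small sketch of the rank transformation that nonparametric tests rely on. Tied values share the average of the ranks they occupy, which is a common convention. The second test line shows the information that ranks discard: an extreme value such as 100 receives the same rank as any other largest value.

```python
def average_ranks(values):
    """Replace each value by its rank (1 = smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([8.7, 9.6, 8.7, 11.1, 7.9]))
```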

39 Types of Tests: Two Independent Samples.
Measurement Level | Population Parameters | Hypotheses | Sample Statistics | Inferential Method(s)
Nominal | π₁, π₂ | H₀: π₁ = π₂; H₁: π₁ ≠ π₂ | p̂₁, p̂₂ | Fisher’s exact or chi-square (if expected cell counts are at least 5)
Ordinal | M₁, M₂ | H₀: M₁ = M₂; H₁: M₁ ≠ M₂ | m₁, m₂ | Median test
Interval | μ₁, μ₂ | H₀: μ₁ = μ₂; H₁: μ₁ ≠ μ₂ | x̄₁, x̄₂ | Student’s t or Mann-Whitney (if non-normal, unequal variances or small n)
Ratio | μ₁, μ₂ | H₀: μ₁ = μ₂; H₁: μ₁ ≠ μ₂ | x̄₁, x̄₂ | Student’s t or Mann-Whitney (if non-normal, unequal variances or small n)
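Fisher's exact test for two independent proportions can be written with nothing more than binomial coefficients. With all row and column totals fixed, the top-left cell of the 2×2 table follows a hypergeometric distribution. This sketch uses the common two-sided convention of summing the probabilities of all tables no more likely than the observed one. It works in integer arithmetic to avoid rounding ties. The 2×2 table of counts is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]] (margins held fixed)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    row2 = n - row1

    def weight(x):
        # Numerator of P(top-left cell = x); the common denominator is comb(n, col1).
        return comb(row1, x) * comb(row2, col1 - x)

    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    observed = weight(a)
    extreme = sum(weight(x) for x in support if weight(x) <= observed)
    return extreme / comb(n, col1)

# Hypothetical table: responders / non-responders under two treatments.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))
```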

40 Comparing Central Tendency (by number of groups, normality/sample size, and independent vs. dependent samples).
2 groups, normal or large n: independent samples → 2-sample t; dependent samples → paired t.
2 groups, non-normal or small n: independent samples → Wilcoxon rank-sum (Mann-Whitney); dependent samples → Wilcoxon signed-rank.
> 2 groups, normal or large n: independent samples → ANOVA; dependent samples → 2-way ANOVA.
> 2 groups, non-normal or small n: independent samples → Kruskal-Wallis; dependent samples → Friedman’s test.
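The test-selection chart can be encoded as a small lookup function. This is a hypothetical helper, not part of any library. It pairs independent two-group comparisons with the Wilcoxon rank-sum (Mann-Whitney) test and paired comparisons with the Wilcoxon signed-rank test. It is only a mnemonic for the chart, not a substitute for judgment about the design.

```python
def choose_test(n_groups, normal_or_large_n, independent):
    """Suggest a test for comparing central tendency (a sketch of the decision chart).

    n_groups: number of groups compared (2 or more)
    normal_or_large_n: True if the data look approximately normal or n is large
    independent: True for independent samples, False for dependent (related) samples
    """
    if n_groups < 2:
        raise ValueError("this chart covers comparisons of two or more groups")
    if n_groups == 2:
        if normal_or_large_n:
            return "2-sample t test" if independent else "paired t test"
        return "Wilcoxon rank-sum (Mann-Whitney) test" if independent else "Wilcoxon signed-rank test"
    if normal_or_large_n:
        return "ANOVA" if independent else "2-way ANOVA"
    return "Kruskal-Wallis test" if independent else "Friedman test"

print(choose_test(2, normal_or_large_n=False, independent=True))
```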

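41 One-Sample Test of a Mean. Dissolving times (seconds) of a drug in gastric juice: 42.7, 43.4, 44.6, 45.1, 45.6, 45.9, 46.8, 47.6. It is hypothesized that the drug will take more than 45 seconds to fully dissolve. H₁: μ > 45; H₀: μ ≤ 45. Under H₀ the test statistic follows a t distribution with 7 degrees of freedom, centered at t = 0. The observed t ≈ 0.37 gives p = P(t > 0.37) ≈ 0.36.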
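This one-sample t test can be sketched from scratch. The sketch computes t = (x̄ − μ₀)/(s/√n) and takes the one-sided p-value from Student's t with n − 1 degrees of freedom. The t tail probability comes from numerical (Simpson's rule) integration of the density. That is an assumption made to keep the block dependency-free; a statistics package would normally supply it. For the dissolving-time data it should give t ≈ 0.37 and p ≈ 0.36.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """P(T >= t) via symmetry and Simpson's rule on the density (steps must be even)."""
    if t < 0:
        return 1 - t_sf(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_sample_t_greater(data, mu0):
    """One-sample t test of H0: mu <= mu0 vs H1: mu > mu0; returns (t, one-sided p)."""
    n = len(data)
    t = (statistics.fmean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    return t, t_sf(t, n - 1)

times = [42.7, 43.4, 44.6, 45.1, 45.6, 45.9, 46.8, 47.6]  # dissolving times (seconds)
t, p = one_sample_t_greater(times, 45)
print(round(t, 2), round(p, 2))
```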

42 Two-Sample Test of Means. Clotting times (minutes) of blood for subjects given one of two different drugs. Drug B: 8.8, 8.4, 7.9, 8.7, 9.1, 9.6. Drug G: 9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5. It is hypothesized that the two drugs will result in different blood-clotting times. H₁: μ_B ≠ μ_G; H₀: μ_B = μ_G.

43 Two-Sample Test of Means. What we’re actually hypothesizing: H₀: μ_B − μ_G = 0.

44 Two-Sample Test of Means. What we’re actually hypothesizing: H₀: μ_B − μ_G = 0. Under H₀ the t distribution is centered at t = 0. The observed t = −2.48 gives p = P(|t| ≥ 2.48) ≈ 0.03.
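The clotting-time comparison can be reproduced with Student's pooled-variance two-sample t test. The sketch reads the data as Drug B (n = 6) and Drug G (n = 7); this reading reproduces the slide's t ≈ −2.48 with 11 degrees of freedom and p ≈ 0.03. As in the other sketches, the t tail probability comes from numerical integration, chosen only to avoid external dependencies.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=2000):
    """P(T >= t) via symmetry and Simpson's rule on the density (steps must be even)."""
    if t < 0:
        return 1 - t_sf(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def pooled_two_sample_t(x, y):
    """Student's two-sample t test assuming equal population variances.
    Returns (t, degrees of freedom, two-sided p)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df, 2 * t_sf(abs(t), df)

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]
print(pooled_two_sample_t(drug_b, drug_g))
```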

45 Assumptions of t. In order to use the parametric Student’s t test, a few assumptions need to be met: approximate normality of the observations and, in the case of two samples, approximately equal variances in the two populations.

46 Assumption Checking. To assess the assumption of normality, a simple histogram can reveal skewness or outliers.

47 Assumption Checking Skewness

48 Assumption Checking. Other graphical assessments include the quantile-quantile (QQ) plot.
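The points of a normal QQ plot can be computed directly. Pair each sorted observation with a standard-normal quantile: if the data are roughly normal, the points fall near a straight line. This sketch uses the plotting position (i − 0.5)/n, one common choice among several. It also summarizes straightness with the correlation between the two sets of quantiles. That summary is an illustrative convenience, not a formal test. The data are the dissolving times from the one-sample example.

```python
from statistics import NormalDist, fmean

def qq_points(data):
    """Pair each sorted observation with the standard-normal quantile at position (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    return [(NormalDist().inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def qq_correlation(data):
    """Correlation between theoretical and observed quantiles; near 1 means a nearly straight QQ plot."""
    pts = qq_points(data)
    zs, xs = [p[0] for p in pts], [p[1] for p in pts]
    mz, mx = fmean(zs), fmean(xs)
    sxy = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    sz = sum((z - mz) ** 2 for z in zs)
    sx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sz * sx) ** 0.5

times = [42.7, 43.4, 44.6, 45.1, 45.6, 45.9, 46.8, 47.6]  # dissolving times (seconds)
print(round(qq_correlation(times), 3))
```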

49 Assumption Checking Violation of normality:

50 Assumption Checking. To assess the assumption of equal variances (when there are 2 groups), side-by-side boxplots can reveal heteroscedasticity (unequal variances).

51 Assumption Checking. Rule of thumb: if the larger sample variance is more than 2 times the smaller, consider the equal-variance assumption violated.
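The rule of thumb translates into a one-line comparison of sample variances. This sketch applies it to the clotting-time data from the two-sample example. The ratio there comes out just under 2, so by this rule the equal-variance assumption would be treated as acceptable, though only barely.

```python
import statistics

def variance_ratio_check(x, y, max_ratio=2.0):
    """Rule of thumb: flag unequal variances if the larger sample variance exceeds
    max_ratio times the smaller. Returns (ratio, violated)."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    ratio = max(vx, vy) / min(vx, vy)
    return ratio, ratio > max_ratio

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]
print(variance_ratio_check(drug_b, drug_g))
```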

52 Now what? If you have enough observations (20? 30?) to determine whether the assumptions are reasonable, check them. If they are violated, you have two options. (a) Try a transformation (e.g., natural log) to correct the violated assumptions and reassess. Proceed with the t-test if the transformation fixes the problem, and with a nonparametric test if it does not. (b) Skip the transformation altogether and proceed to the nonparametric test. If the assumptions hold, proceed with the t-test.
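The effect of a natural-log transformation on a right-skewed outcome can be seen by comparing sample skewness before and after. This sketch simulates a hypothetical log-normally distributed outcome; real data would of course not be guaranteed to become symmetric. It uses the simple moment estimator of skewness, which is near 0 for a symmetric sample.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5 (moment estimator; near 0 for a symmetric sample)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

# Hypothetical right-skewed outcome: log-normally distributed values.
rng = random.Random(5)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
logged = [math.log(x) for x in raw]
print(round(skewness(raw), 2), round(skewness(logged), 2))
```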

53 Now what? If you have too small a sample to adequately assess the assumptions, perform the nonparametric test instead. For the one-sample t, we typically substitute the Wilcoxon signed-rank test; for the two-sample t, we typically substitute the Mann-Whitney test.
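Here is a minimal sketch of the Mann-Whitney test on the clotting-time data. U counts the pairs in which a Drug B value exceeds a Drug G value, with ties counted as 1/2. Software differs on which U it reports; some report the smaller of the two. The p-value below uses the large-sample normal approximation without tie or continuity correction. That is a simplification: with samples this small (6 and 7) and with ties present, an exact or tie-corrected p-value would generally be preferred.

```python
from statistics import NormalDist

def mann_whitney_u(x, y):
    """U for sample x: number of (x_i, y_j) pairs with x_i > y_j, counting ties as 1/2."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)

def mann_whitney_z(x, y):
    """Normal approximation (no tie or continuity correction). Returns (U, z, two-sided p)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return u, z, 2 * NormalDist().cdf(-abs(z))

drug_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
drug_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]
print(mann_whitney_z(drug_b, drug_g))
```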

54 Consequences of Nonparametric Testing. Robust! Less powerful, because they are based on ranks, which do not contain the full information in the raw data. When in doubt, use the nonparametric test; it will be less likely to give you a ‘false positive’ result.

55 Summary. Probability review; population parameters; sample statistics; types of samples; hypothesis testing; matching the level of measurement to the type of test; recipe for hypothesis testing; types of tests; parametric versus nonparametric; assumption checking.

