Statistics for clinicians: Biostatistics course by Kevin E. Kip, Ph.D., FAHA, Professor and Executive Director, Research Center, University of South Florida

1 Statistics for clinicians: Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center, University of South Florida, College of Nursing
Professor, College of Public Health, Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute, Morsani College of Medicine
Tampa, FL, USA

2 SECTION 1.1 Module Overview and Introduction. Part 1: Introduction to biostatistics, descriptive statistics, SPSS, and PowerPoint.

3 Module 1 Learning Objectives:
1. Describe the characteristics of different types of variables (e.g. nominal, ordinal, continuous).
2. Calculate proportions, ratios, and percentages.
3. Calculate and interpret the measures prevalence, incidence, and relative risk.
4. Calculate and interpret descriptive statistics: mean, median, mode, range, percentiles, variance, standard deviation, etc.
5. Explain the properties of skewness and kurtosis.
6. Identify the data structure and basic features of the SPSS software program.
7. Generate plots to depict frequency distributions (bar charts, box plots, line graphs, scatter plots, histograms).
8. Demonstrate the basics of SPSS data manipulation and use of the syntax editor.

4 Assigned Reading: Module 1 Textbook: Essentials of Biostatistics in Public Health Section 1.1 Chapter 3 Chapter 4

5 Biostatistics: The application of statistical principles in medicine, public health (e.g. nursing), or biology. Make health-related inferences about a population (i.e. we can’t study everyone in the population). Use biostatistical principles grounded in mathematical and probability theory for “best” estimates of health summaries and measures of effect or association.

6 Key Terms and Concepts:
--- Variable types (dichotomous, nominal, ordinal, continuous)
--- Proportions and percentages
--- Ratios
--- Prevalence, incidence, and relative risk
--- Mean, median, mode, and range
--- Percentiles and interquartile range
--- Variance, standard deviation, and standard error of mean
--- Coefficient of variation
--- Skewness and kurtosis
--- Introduction to SPSS
--- Introduction to PowerPoint

7 Variable Types:
--- Dichotomous (2 categories, may signify order): male / female; low / high
--- Nominal (2 or more categories, no order): male / female; unmarried / married / divorced / widowed
--- Ordinal (categorical variable, with categories ordered in a meaningful sequence): strongly agree / agree / undecided / disagree / strongly disagree
--- Continuous (can assume one of a large or infinite number of values): e.g. financial gain from $0 to $50,000

8 Variable Types (identify the correct type(s)):
Variable         Scoring                                                   Dichot.  Nominal  Ordinal  Contin.
Quality of life  1 = Poor, 2 = Fair, 3 = Average, 4 = Good, 5 = Very Good
Ethnicity        1 = Non-Hispanic, 2 = Hispanic
Race             1 = African American, 2 = Caucasian, 3 = Other
Diabetes         1 = Absent, 2 = Present
Systolic BP      Ranges from 95 to 190 mmHg

9 Variable Types (identify the correct type(s)):
Variable         Scoring                                                   Dichot.  Nominal  Ordinal  Contin.
Quality of life  1 = Poor, 2 = Fair, 3 = Average, 4 = Good, 5 = Very Good                    ●
Ethnicity        1 = Non-Hispanic, 2 = Hispanic                            ●        ●
Race             1 = African American, 2 = Caucasian, 3 = Other                     ●
Diabetes         1 = Absent, 2 = Present                                   ●
Systolic BP      Ranges from 95 to 190 mmHg                                                           ●

10 Proportions and Percentages:
• Persons included in the numerator are always included in the denominator:
  Proportion (P) = A / (A + B), where B = all remaining persons
• Indicates the magnitude of a part, related to the total.
• Tells us the fraction of the population that is affected.
• Percentage = proportion x 100
• Proportion range: 0 to 1.0; percentage range: 0 to 100

11 Proportions and Percentages:
Smoking Status     N       P       %
Never             60   0.462    46.2
Former            45   0.346    34.6
Current           25   0.192    19.2
Total            130     1.0   100.0
P NEVER   = 60 / (60 + 45 + 25)
P FORMER  = 45 / (60 + 45 + 25)
P CURRENT = 25 / (60 + 45 + 25)

12 Proportions and Percentages (Calculate):
Blood Pressure Status       N       P       %
Normal                     40    ____    ____
Pre-hypertensive           75    ____    ____
Stage I hypertension       25    ____    ____
Stage II hypertension      10    ____    ____
Total                    ____    ____    ____

13 Proportions and Percentages (Calculate):
Blood Pressure Status       N       P       %
Normal                     40   0.267    26.7
Pre-hypertensive           75   0.500    50.0
Stage I hypertension       25   0.167    16.7
Stage II hypertension      10   0.067     6.7
Total                     150     1.0     100
P Normal  = 40 / (40 + 75 + 25 + 10)
P pre-hyp = 75 / (40 + 75 + 25 + 10)
P stageI  = 25 / (40 + 75 + 25 + 10)
P stageII = 10 / (40 + 75 + 25 + 10)
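The blood pressure table can be checked with a few lines of Python. This is only an illustrative sketch: the function name `proportions` and the dictionary layout are assumptions made here, not part of the course material (which uses SPSS).

```python
def proportions(counts):
    """Return {category: (proportion, percentage)} for a dict of counts.

    Every category count is part of the denominator (the total),
    which is what makes each result a proportion rather than a ratio.
    """
    total = sum(counts.values())
    return {name: (n / total, 100 * n / total) for name, n in counts.items()}


# Counts taken from the blood pressure practice slide.
bp_counts = {
    "Normal": 40,
    "Pre-hypertensive": 75,
    "Stage I hypertension": 25,
    "Stage II hypertension": 10,
}

bp = proportions(bp_counts)
for name, (p, pct) in bp.items():
    print(f"{name:<22} P = {p:.3f}   % = {pct:.1f}")
```

The proportions sum to 1.0 and the percentages to 100, matching the Total row.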

14 Ratios:
• Like a proportion, a ratio is a fraction, BUT without a specified relationship between the numerator and denominator.
• Example: Occurrence of major depression
  Female cases = 240, male cases = 120
  Ratio = 240 / 120 = 2:1 female to male

15 SECTION 1.2 Epidemiological Measures

16 Prevalence (proportion): The presence (proportion) of disease or condition in a population (generally irrespective of the duration of the disease). Prevalence quantifies the “burden” of disease.
P = (number of existing cases) / (total population), at a set point in time (e.g. September 30, 1999)

17 Prevalence (proportion): Example: On June 30, 1999, neighborhood A has:
• Population of 1,600
• 29 current cases of hepatitis B
• 1,571 individuals without hepatitis B
So, P = 29 / 1,600 = 0.018 or 1.8%

18 Cumulative Incidence (CI):
CI = (no. of new cases of disease during a given period) / (total population at risk during the given period)
Example: During a 1-year period, 10 out of 100 “at risk” persons develop the disease of interest.
CI = 10 / 100 = 0.10 or 10.0%
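Prevalence and cumulative incidence are both simple proportions; they differ in what is counted (existing versus new cases) and in the denominator (total population versus population at risk). A minimal Python sketch of the two worked examples, with function names chosen here for illustration only:

```python
def prevalence(existing_cases, total_population):
    """Existing cases divided by the total population at one point in time."""
    return existing_cases / total_population


def cumulative_incidence(new_cases, population_at_risk):
    """New cases during a period divided by the population at risk in that period."""
    return new_cases / population_at_risk


# Hepatitis B example: 29 current cases in a population of 1,600.
p = prevalence(29, 1600)

# Cumulative incidence example: 10 new cases among 100 persons at risk in 1 year.
ci = cumulative_incidence(10, 100)

print(f"Prevalence = {p:.3f} ({100 * p:.1f}%)")
print(f"Cumulative incidence = {ci:.2f} ({100 * ci:.1f}%)")
```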

19 Relative Risk (RR): Compares the incidence of disease (risk) among the exposed with the incidence of disease (risk) among the non-exposed (“reference”) by means of a ratio. The reference group assumes a value of 1.0 (the “null” value).

20 The “null” value (1.0): If the relative risk estimate is > 1.0, the exposure appears to be a risk factor for disease. If the relative risk estimate is < 1.0, the exposure appears to be protective against disease occurrence. An estimate equal to 1.0 indicates no association between exposure and disease.

21 Risk Ratio = Incidence E+ / Incidence E-, where E = exposure status
Hypothesis: Being subject to physical abuse in childhood is associated with lifetime risk of attempted suicide.
Results: Of 2,240 children not subject to physical abuse, 16 have attempted suicide. Of 840 children subjected to physical abuse, 10 have attempted suicide.
           E+       E-
D+         10       16
D-        830    2,224
Total     840    2,240
RR = I E+ / I E-
RR = (10 / 840) / (16 / 2,240)
RR = 0.01190 / 0.00714 = 1.67 (1.68 if the incidences are first rounded to 0.0119 and 0.0071)

22 Practice: Risk Ratio = Incidence E+ / Incidence E-, where E = exposure status
Hypothesis: Being subject to physical abuse in childhood is associated with lifetime risk of attempted suicide.
Results: Of 1,750 children not subject to physical abuse, 14 have attempted suicide. Of 620 children subjected to physical abuse, 12 have attempted suicide.
           E+       E-
D+       ____     ____
D-       ____     ____
Total    ____     ____
RR = I E+ / I E-
RR = ____

23 Practice: Risk Ratio = Incidence E+ / Incidence E-, where E = exposure status
Hypothesis: Being subject to physical abuse in childhood is associated with lifetime risk of attempted suicide.
Results: Of 1,750 children not subject to physical abuse, 14 have attempted suicide. Of 620 children subjected to physical abuse, 12 have attempted suicide.
           E+       E-
D+         12       14
D-        608    1,736
Total     620    1,750
RR = I E+ / I E-
RR = (12 / 620) / (14 / 1,750)
RR = 0.01935 / 0.008 = 2.42
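The two risk ratio calculations can be reproduced as follows. This is a sketch with an assumed helper name (`risk_ratio`); it carries full precision through the division, so the first example comes out at about 1.67 rather than the 1.68 obtained from pre-rounded incidences.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed divided by incidence in the non-exposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed


# Worked example: 10 of 840 exposed vs. 16 of 2,240 non-exposed.
rr_example = risk_ratio(10, 840, 16, 2240)

# Practice example: 12 of 620 exposed vs. 14 of 1,750 non-exposed.
rr_practice = risk_ratio(12, 620, 14, 1750)

print(f"RR (example)  = {rr_example:.2f}")
print(f"RR (practice) = {rr_practice:.2f}")
```

Both estimates are above the null value of 1.0, so in these data the exposure appears to be a risk factor.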

24 Odds Ratio = Odds of Exposure D+ / Odds of Exposure D-, where D = disease (outcome) status
Hypothesis: Eating chili peppers is associated with development of gastric cancer.
Cases: 21 (12 ate chili peppers, 9 did not eat chili peppers)
Controls: 479 (88 ate chili peppers, 391 did not eat chili peppers)
             D+         D-
E+       12 (a)     88 (b)
E-        9 (c)    391 (d)
Total    21        479
OR = (a / c) / (b / d)
OR = (12 / 9) / (88 / 391)
OR = 1.333 / 0.225 = 5.92

25 Practice: Odds Ratio = Odds of Exposure D+ / Odds of Exposure D-, where D = disease (outcome) status
Hypothesis: Eating chili peppers is associated with development of gastric cancer.
Cases: 44 (14 ate chili peppers, 30 did not eat chili peppers)
Controls: 610 (100 ate chili peppers, 510 did not eat chili peppers)
             D+         D-
E+      ___ (a)    ___ (b)
E-      ___ (c)    ___ (d)
OR = (a / c) / (b / d)
OR = ____

26 Practice: Odds Ratio = Odds of Exposure D+ / Odds of Exposure D-, where D = disease (outcome) status
Hypothesis: Eating chili peppers is associated with development of gastric cancer.
Cases: 44 (14 ate chili peppers, 30 did not eat chili peppers)
Controls: 610 (100 ate chili peppers, 510 did not eat chili peppers)
             D+         D-
E+       14 (a)    100 (b)
E-       30 (c)    510 (d)
Total    44        610
OR = (a / c) / (b / d)
OR = (14 / 30) / (100 / 510)
OR = 0.467 / 0.196 = 2.38
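The odds ratio arithmetic from the two chili pepper examples can be sketched in Python. The function name `odds_ratio` is an assumption for illustration; the cell labels a, b, c, d follow the 2 x 2 tables above.

```python
def odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a / c) over odds of exposure among controls (b / d).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a / c) / (b / d)


# Worked example: 12 and 9 among cases, 88 and 391 among controls.
or_example = odds_ratio(12, 88, 9, 391)

# Practice example: 14 and 30 among cases, 100 and 510 among controls.
or_practice = odds_ratio(14, 100, 30, 510)

print(f"OR (example)  = {or_example:.2f}")
print(f"OR (practice) = {or_practice:.2f}")
```

Algebraically, (a / c) / (b / d) equals the cross-product (a x d) / (b x c), which gives the same result.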

27 SECTION 1.3 Descriptive Statistics

28 Mean, Median, Mode, and Range
Mean, median, and mode are 3 kinds of "averages".
• "Mean" is the "average" where you add up all the numbers and then divide by the number of numbers.
• "Median" is the "middle" value in the list of numbers.
• "Mode" is the value that occurs most often.
• "Range" is the difference between the largest and smallest values.

29 Formula for the population mean
• The population mean is calculated using a formula: $\mu = \frac{\sum x}{n}$
• μ (mu) is the symbol for the population mean
• “sum all the observations of x, and divide by n”

30 Formula for the sample mean
• The sample mean is calculated using a formula: $\bar{x} = \frac{\sum x}{n}$
• x̄ (x bar) is the symbol for the sample mean
• “sum all the observations of x, and divide by n”
The mean and the median are summary measures used to describe the most "typical" value in a set of values. Statisticians refer to the mean and median as measures of central tendency.

31 Mean, Median, Mode, and Range
Child             1    2    3    4    5    6    7    8    9   10
Weight (lbs.)    52   64   65   70   72   76   80   84   88   94
Age (years): 8, 10, 9, 9, …, 9, 11 (remaining values missing from the transcript)
1. Calculate mean weight: X̄ = (52+64+65+70+72+76+80+84+88+94) / 10 = 745 / 10 = 74.5
2. What is the median weight? Note that Child 5 and 6 are both in the “middle”. Median = (72 + 76) / 2 = 74
3. What is the mode age? = 11 (4 occurrences)
4. What is the weight range? = 94 – 52 = 42

32 Mean, Median, Mode, and Range (Calculate):
Child             1    2    3    4    5    6    7    8    9   10
Weight (lbs.)    72   63   59   94   88   72   88   50   58   66
Age (years): 11, 10, 8, 11, 12, 10, 12, 8, 10 (one value missing from the transcript)
1. Calculate mean weight: X̄ = ____
2. What is the median weight? = ____
3. What is the mode age? = ____
4. What is the weight range? = ____
Weight (lbs.), reordered: ____
Note: To determine the median and range, you need to reorder weight from lowest to highest.

33 Mean, Median, Mode, and Range (Calculate):
Child             1    2    3    4    5    6    7    8    9   10
Weight (lbs.)    72   63   59   94   88   72   88   50   58   66
Age (years): 11, 10, 8, 11, 12, 10, 12, 8, 10 (one value missing from the transcript)
1. Calculate mean weight: X̄ = (50+58+59+63+66+72+72+88+88+94) / 10 = 71.0
2. What is the median weight? = (66 + 72) / 2 = 69
3. What is the mode age? = 10 (4 occurrences)
4. What is the weight range? = 94 – 50 = 44
Weight (lbs.), reordered: 50, 58, 59, 63, 66, 72, 72, 88, 88, 94
Note: To determine the median, you need to reorder weight from lowest to highest.
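The practice answers for weight can be verified with Python's standard `statistics` module. This is a sketch, not the course's SPSS workflow. Because the age row is incomplete in this transcript, the mode is illustrated on the weights instead, where two values (72 and 88) tie for most frequent.

```python
import statistics

# Weights (lbs.) of the 10 children in the practice example.
weights = [72, 63, 59, 94, 88, 72, 88, 50, 58, 66]

mean_weight = statistics.mean(weights)
# median() sorts internally; with 10 values it averages the 5th and 6th.
median_weight = statistics.median(weights)
weight_range = max(weights) - min(weights)
# multimode() returns every value tied for the highest frequency.
weight_modes = statistics.multimode(weights)

print("Sorted:", sorted(weights))
print("Mean:", mean_weight)
print("Median:", median_weight)
print("Range:", weight_range)
print("Mode(s):", weight_modes)
```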

34 Percentiles and Interquartile Range
Percentile (or centile): Value of a variable below which a certain percent of observations fall.
Example: The 20th percentile is the value (or score) below which 20 percent of the observations are found.
The 25th percentile is known as the first quartile (Q1); the 50th percentile as the median or second quartile (Q2); and the 75th percentile as the third quartile (Q3). The interquartile range is equal to Q3 minus Q1.

35 Percentiles and Interquartile Range
Tertiles: 3 equal parts; percentile points = 33.3, 66.7
Quartiles: 4 equal parts; percentile points = 25, 50, 75
Quintiles: 5 equal parts; percentile points = 20, 40, 60, 80
Deciles: 10 equal parts; percentile points = 10, 20, 30, 40, 50, 60, 70, 80, 90
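As a rough illustration of cut points, Python's `statistics.quantiles` returns the n - 1 values that split a dataset into n equal-probability groups. The sketch below reuses the 10 weights from the earlier practice example. Note that software packages interpolate between observations in different ways, so the cut points from SPSS or another tool may differ slightly from these.

```python
import statistics

# Weights (lbs.) from the earlier practice example.
weights = [72, 63, 59, 94, 88, 72, 88, 50, 58, 66]

# n groups are separated by n - 1 cut points.
tertile_points = statistics.quantiles(weights, n=3)
quartile_points = statistics.quantiles(weights, n=4)
quintile_points = statistics.quantiles(weights, n=5)
decile_points = statistics.quantiles(weights, n=10)

# Quartile cut points are Q1, Q2 (the median), and Q3.
q1, q2, q3 = quartile_points
iqr = q3 - q1  # interquartile range = Q3 minus Q1

print("Tertile cut points: ", tertile_points)
print("Quartile cut points:", quartile_points)
print("Quintile cut points:", quintile_points)
print("Decile cut points:  ", decile_points)
print("Interquartile range:", iqr)
```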

36 Percentiles and Interquartile Range (Example)
[Histogram: % of participants (y-axis, 0 to 12) by age in years (x-axis, 45 to 75)]

37 Percentile Points/Groups
Tertiles:
  0 – 33.3%        45 to 54
  >33.3 – 66.7%    55 to 62
  >66.7 – 100%     63 to 75
Quartiles:
  0 – 25%          45 to 52
  >25 – 50%        53 to 57
  >50 – 75%        58 to 64
  >75 – 100%       65 to 75
Quintiles:
  0 – 20%          45 to 51
  >20 – 40%        52 to 55
  >40 – 60%        56 to 60
  >60 – 80%        61 to 65
  >80 – 100%       66 to 75

38 Percentiles and Interquartile Range (Identify)
[Histogram: % of participants (y-axis, 0 to 20) by diastolic blood pressure in mmHg (x-axis, 42 to 138)]

39 Percentile Points/Groups
Tertiles:
  0 – 33.3%        ________
  >33.3 – 66.7%    ________
  >66.7 – 100%     ________
Quartiles:
  0 – 25%          ________
  >25 – 50%        ________
  >50 – 75%        ________
  >75 – 100%       ________
Quintiles:
  0 – 20%          ________
  >20 – 40%        ________
  >40 – 60%        ________
  >60 – 80%        ________
  >80 – 100%       ________

40 Percentile Points/Groups
Tertiles:
  0 – 33.3%        40 to 75
  >33.3 – 66.7%    76 to 83
  >66.7 – 100%     84 to 137
Quartiles:
  0 – 25%          40 to 72
  >25 – 50%        73 to 79
  >50 – 75%        80 to 87
  >75 – 100%       88 to 137
Quintiles:
  0 – 20%          40 to 70
  >20 – 40%        71 to 76
  >40 – 60%        77 to 83
  >60 – 80%        84 to 88
  >80 – 100%       89 to 137

41 Variance, SD, and SE of Mean
Population variance: Average squared deviation from the population mean, as defined by the following formula:
$\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}$
where σ² is the population variance, μ is the population mean, X_i is the ith element from the population, and n is the number of elements in the population.
Observations from a simple random sample can be used to estimate the variance of a population. For this purpose, sample variance is defined by a slightly different formula, and uses a slightly different notation.

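42 • Sample variance is calculated using a formula: $S_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Variance is the mean of the squared deviations of the observations; for a sample, the sum of squared deviations is divided by n – 1 rather than n.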

43 Sample Variance Calculation
ID      Age (X)    X̄       (X – X̄)²
1       43         49.5     42.25
2       25         49.5    600.25
3       31         49.5    342.25
4       55         49.5     30.25
5       45         49.5     20.25
6       62         49.5    156.25
7       41         49.5     72.25
8       58         49.5     72.25
9       38         49.5    132.25
10      52         49.5      6.25
11      70         49.5    420.25
12      74         49.5    600.25
Total   Mean: 49.5          ∑ = 2,495
S²X = 2,495 / (12 – 1)
S²X = 226.8
Range = (74 – 25) = 49 years

44 Sample Variance Calculation (Practice)
ID      Age (X)    X̄        (X – X̄)²
1       45         47.83    ____
2       38         47.83    ____
3       32         47.83    ____
4       57         47.83    ____
5       43         47.83    ____
6       64         47.83    ____
7       48         47.83    ____
8       55         47.83    ____
9       32         47.83    ____
10      60         47.83    ____
11      54         47.83    ____
12      46         47.83    ____
Total   Mean: 47.83         ∑ = ____
S²X = ____
Range = ____

45 Sample Variance Calculation (Practice)
ID      Age (X)    X̄        (X – X̄)²
1       45         47.83      8.03
2       38         47.83     96.69
3       32         47.83    250.69
4       57         47.83     84.03
5       43         47.83     23.36
6       64         47.83    261.36
7       48         47.83      0.03
8       55         47.83     51.36
9       32         47.83    250.69
10      60         47.83    148.03
11      54         47.83     38.03
12      46         47.83      3.36
Total   Mean: 47.83          ∑ = 1,215.67
S²X = 1,215.67 / (12 – 1)
S²X = 110.5
Range = (64 – 32) = 32 years
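Both sample variance tables can be reproduced step by step. The sketch below uses an illustrative helper (`sample_variance`, a name assumed here) that mirrors the table columns: deviation from the mean, squared, summed, then divided by n - 1.

```python
import statistics

# Ages from the worked example (Sample A) and the practice slide (Sample B).
sample_a = [43, 25, 31, 55, 45, 62, 41, 58, 38, 52, 70, 74]
sample_b = [45, 38, 32, 57, 43, 64, 48, 55, 32, 60, 54, 46]


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return sum_sq_dev / (n - 1)


var_a = sample_variance(sample_a)
var_b = sample_variance(sample_b)

print(f"Sample A: variance = {var_a:.1f}, range = {max(sample_a) - min(sample_a)}")
print(f"Sample B: variance = {var_b:.1f}, range = {max(sample_b) - min(sample_b)}")

# The standard library's statistics.variance uses the same n - 1 denominator.
print(f"Check A: {statistics.variance(sample_a):.1f}")
```

Sample A's ages are spread over a wider range (49 years versus 32), which is reflected in its larger variance.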

46 Sample A: S²X = 226.8. Sample B: S²X = 110.5.
Question: Why is the variance for Sample A much larger than the variance for Sample B?

47 Standard Deviation (SD): Square root of the variance. σ = sqrt [ σ² ]
The standard deviation is a measure of variation. Unlike variance, the SD is in the same scale as the variable of interest (i.e. age in this example).
Sample A: S²X = 226.8, so SD = sqrt [226.8] = 15.1
Sample B: S²X = 110.5, so SD = sqrt [110.5] = 10.5

48 Standard Error of the Mean (SEM)
SEM: Standard deviation of the error in a sample mean relative to the true mean, per the formula below, where s is the sample standard deviation and n is the sample size:
$SEM = \frac{s}{\sqrt{n}}$
• Represents how close to the population mean the sample mean is likely to be.
• Decreases with larger sample sizes, as the estimate of the population mean improves.
Note: Standard deviation is the degree to which individuals within a sample differ from the sample mean; hence, it is not systematically affected by sample size.
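To connect SD and SEM to the earlier numbers, the following sketch computes both for the two age samples used in the variance slides. The function name `sem` is an assumption for illustration, and the SEM is taken as the sample SD divided by the square root of n.

```python
import math
import statistics

# Ages from the variance slides (Sample A and Sample B).
sample_a = [43, 25, 31, 55, 45, 62, 41, 58, 38, 52, 70, 74]
sample_b = [45, 38, 32, 57, 43, 64, 48, 55, 32, 60, 54, 46]


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


sd_a = statistics.stdev(sample_a)  # square root of the sample variance (n - 1)
sd_b = statistics.stdev(sample_b)
sem_a = sem(sample_a)
sem_b = sem(sample_b)

print(f"Sample A: SD = {sd_a:.1f}, SEM = {sem_a:.2f}")
print(f"Sample B: SD = {sd_b:.1f}, SEM = {sem_b:.2f}")
```

With the SD held fixed, quadrupling the sample size would halve the SEM, which is the sense in which the SEM decreases with larger samples.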

49 Coefficient of Variation (CV)
CV: When a sample of data from the population is available, the population CV is estimated as the ratio of the sample standard deviation to the sample mean:
$CV = \frac{s}{\bar{x}}$
• Provides an indication of the size of the standard deviation relative to the mean.
• Independent of the unit in which the measurement was taken, because the SD and the mean share the same unit.
• Thus, it is dimensionless and can be used to compare between datasets with widely different means.

50 Coefficient of Variation (CV)
ID          Age      BMI
1            45       28
2            26       23
3            55       48
4            46       31
5            61       36
6            57       22
7            39       40
8            50       25
9            72       32
10           38       42
N            10       10
Mean      48.90    32.70
Variance 172.10    75.34
SD        13.12     8.68
SEM        4.15     2.74
CV         0.27     0.27
Note the same coefficient of variation for age and BMI despite much different mean, variance, SD, and SEM.
CV is dimensionless and can be used to compare between datasets or variables with widely different means.
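The summary rows of the age and BMI table can be recomputed as a check. This is a sketch using Python's standard library; the `summarize` helper and its dictionary keys are names assumed here for illustration.

```python
import math
import statistics

# Age and BMI for the 10 participants in the table.
age = [45, 26, 55, 46, 61, 57, 39, 50, 72, 38]
bmi = [28, 23, 48, 31, 36, 22, 40, 25, 32, 42]


def summarize(values):
    """Return mean, sample variance, SD, SEM, and CV for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # n - 1 denominator
    sd = math.sqrt(variance)
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "sem": sd / math.sqrt(n),
        "cv": sd / mean,  # dimensionless: SD and mean share the same unit
    }


age_stats = summarize(age)
bmi_stats = summarize(bmi)

for label, stats in (("Age", age_stats), ("BMI", bmi_stats)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

To two decimals both CVs are 0.27, even though the means and SDs differ considerably.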

51 Skewness and Kurtosis Skewness: Measure of asymmetry of the distribution of a continuous variable. Can be positive or negative, or even undefined. Negative skew indicates that the tail on the left side of distribution is longer than the right side and bulk of the values lie to the right of the mean. Positive skew indicates that the tail on the right side is longer than the left side and bulk of the values lie to the left of the mean. General guideline for skewed distribution: absolute value > 1. For a sample of n values, the sample skewness is:
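The formula shown on this slide is not preserved in the transcript. One common definition of sample skewness, given here as an assumption rather than as the slide's own formula, is the moment-based coefficient below, where $\bar{x}$ is the sample mean and $m_2$, $m_3$ are the second and third sample central moments. Statistical packages often report a bias-adjusted variant, so software output may differ slightly.

```latex
g_1 = \frac{m_3}{m_2^{3/2}}
    = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}
           {\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{3/2}}
```

A value near 0 indicates a roughly symmetric distribution; the sign gives the direction of the longer tail.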

52 Skewness and Kurtosis Kurtosis: Measure of "peakedness" of the distribution of a continuous variable, as compared to a normal distribution. Similar to skewness, kurtosis is a descriptor of the shape of a probability distribution. For a sample of n values the sample excess kurtosis is
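The formula shown on this slide is not preserved in the transcript. A common moment-based definition of sample excess kurtosis, offered here as an assumption rather than as the slide's own formula, is shown below, where $m_2$ and $m_4$ are the second and fourth sample central moments. Subtracting 3 makes the value 0 for a normal distribution; as with skewness, software packages often report a bias-adjusted variant.

```latex
g_2 = \frac{m_4}{m_2^{2}} - 3
    = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}
           {\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{2}} - 3
```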

