Appropriate techniques of statistical analysis Anil C Mathew PhD Professor of Biostatistics & General Secretary ISMS PSG Institute of Medical Sciences.

Appropriate techniques of statistical analysis Anil C Mathew PhD Professor of Biostatistics & General Secretary ISMS PSG Institute of Medical Sciences and Research Coimbatore 641 004

Types of studies
- Case study
- Case series
- Cross-sectional studies
- Case control study
- Cohort study
- Randomized controlled trials
- Screening test evaluation

Data analysis - Case series: Measures of averages (Mean, Median, Mode)
Length of stay for 5 patients (days): 1, 3, 2, 4, 5
Mean length of stay: 3 days
Median length of stay: 3 days
Mode length of stay: no mode
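The three averages for the length-of-stay example can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the variable names are assumptions, not part of the presentation.

```python
import statistics

# Length of stay (days) for the 5 patients on the slide
length_of_stay = [1, 3, 2, 4, 5]

mean_stay = statistics.mean(length_of_stay)
median_stay = statistics.median(length_of_stay)

# multimode returns every value tied for the highest count.
# When all values occur equally often, no value stands out as a mode.
modes = statistics.multimode(length_of_stay)
has_mode = len(modes) < len(set(length_of_stay))

print(mean_stay, median_stay, has_mode)
```

Every value occurs exactly once, so the check reports that there is no mode, in line with the slide.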

Which is the best average?
          Mean   Median   Mode
DBP       81     79       76
Height    180
SAL       7.5    7.6      8.1

Data analysis - Case series: Frequency distribution
RBC            Frequency   Relative frequency
5.95-7.95      1           0.029
7.95-9.95      8           0.229
9.95-11.95     14          0.400
11.95-13.95    9           0.257
13.95-15.95    2           0.057
15.95-17.95    1           0.029
Total          35          1.000
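Each relative frequency is simply the class frequency divided by the total count. A minimal sketch, assuming the six RBC class frequencies shown on the slide (the variable names are illustrative):

```python
# Class frequencies for the six RBC intervals on the slide
frequencies = [1, 8, 14, 9, 2, 1]

total = sum(frequencies)

# Relative frequency = class frequency / total, rounded to 3 decimals
relative = [round(f / total, 3) for f in frequencies]

print(total, relative)
```

Because each value is rounded separately, the rounded relative frequencies may sum to slightly more or less than 1.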

Design of Cohort Study (diagram): Population → People without the disease → Exposed / Not exposed → each group followed forward in time (direction of inquiry) to disease or no disease

Is obesity associated with adverse pregnancy outcomes? Women with a Body Mass Index > 30 delivering singletons. Ref: University of Udine, Italy, 2006
          Preterm birth   No preterm birth   Total   % preterm
Obese     16              35                 51      31.4
Normal    46              487                533     8.6
RR = 31.4 / 8.6 = 3.65
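The relative risk is the risk of preterm birth among obese women divided by the risk among women of normal weight. The sketch below recomputes it from the counts on the slide; the function name is hypothetical. Working from the unrounded counts gives about 3.64, while dividing the rounded percentages (31.4 / 8.6) gives the 3.65 quoted on the slide.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Counts from the slide: 16 of 51 obese women and 46 of 533
# normal-weight women had a preterm birth.
rr = relative_risk(16, 51, 46, 533)

print(round(16 / 51 * 100, 1), round(46 / 533 * 100, 1), round(rr, 2))
```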

Design of Case Control Study (diagram): Disease → Exposed / Not exposed; No disease → Exposed / Not exposed

Results of a Case Control Study
                   Lung cancer (D+)   No lung cancer (D-)   Totals
Exposed (E+)       80 (a)             30 (b)                a + b
Non-exposed (E-)   20 (c)             70 (d)                c + d
Totals             100 (a + c)        100 (b + d)

Analysis of Case-control study
Odds ratio = (a × d) / (b × c) = (80 × 70) / (30 × 20) = 9.3
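The cross-product calculation can be written as a small function. This is an illustrative sketch; the function name is an assumption, and the cell labels a, b, c, d follow the case control table in the presentation.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = non-exposed cases, d = non-exposed controls.
    """
    return (a * d) / (b * c)

# Lung cancer example: a = 80, b = 30, c = 20, d = 70
or_value = odds_ratio(80, 30, 20, 70)

print(round(or_value, 1))
```

The explicit parentheses matter: written as `a*d/b*c` without them, most programming languages evaluate left to right and return a different number.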

Data Analysis - Screening Test Evaluation
Can plasma levels of BCPF (breast carcinoma promoting factor) be used to diagnose breast cancer? Positive criterion: BCPF > 150 units, compared with breast biopsy (the gold standard).
             D+     D-      Total
BCPF T+      570    150     720
BCPF T-      30     850     880
Total        600    1000    1600
TP = 570, FN = 30, FP = 150, TN = 850

Sensitivity = P(T+ | D+) = 570/600 = 95%
Specificity = P(T- | D-) = 850/1000 = 85%
False negative rate = 1 - sensitivity
False positive rate = 1 - specificity
Prevalence = P(D+) = 600/1600 = 38%
Positive predictive value = P(D+ | T+) = 570/720 = 79%
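All of these measures follow from the four cells of the screening table. A minimal sketch, assuming the BCPF counts from the presentation (TP = 570, FN = 30, FP = 150, TN = 850); the function name and dictionary keys are illustrative.

```python
def screening_measures(tp, fn, fp, tn):
    """Return the standard screening-test measures from a 2x2 table."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # P(T+ | D+)
    specificity = tn / (tn + fp)   # P(T- | D-)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
        "prevalence": (tp + fn) / total,   # P(D+)
        "ppv": tp / (tp + fp),             # P(D+ | T+)
    }

measures = screening_measures(tp=570, fn=30, fp=150, tn=850)

for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

The prevalence is exactly 37.5%, which the slide rounds to 38%.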

Tradeoffs between sensitivity and specificity
- Favour high sensitivity when the consequences of missing a case are potentially grave
- Favour high specificity when a false positive diagnosis may lead to risky treatment

Data analysis - Case series: Measures of variation (Range, Standard deviation)
Group 1   Group 2
29        25
30        30
31        35
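The two groups have the same mean but very different spread, which the range and standard deviation make visible. The sketch below assumes the groups are 29, 30, 31 and 25, 30, 35, and uses the sample standard deviation (divisor n - 1); both are assumptions about what the slide intends.

```python
import statistics

group1 = [29, 30, 31]
group2 = [25, 30, 35]

for name, values in (("Group 1", group1), ("Group 2", group2)):
    value_range = max(values) - min(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    print(name, "mean:", statistics.mean(values),
          "range:", value_range, "SD:", sd)
```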

Data analysis- Analytical studies Tests of significance

Case Study 1: Drug A and Drug B
Aim: Efficacy of two drugs in lowering serum cholesterol levels
Method: Drug A - 50 patients; Drug B - 50 patients
Result: Average serum cholesterol level at the end of 6 months is lower in those receiving drug B than in those receiving drug A

What is the Conclusion?

A) Drug B is superior to Drug A in lowering cholesterol levels: Possible/Not possible

B) Drug B is not superior to Drug A, instead the difference may be due to chance: Possible/Not possible

C) The difference is not due to the drug; uncontrolled differences other than treatment between the samples of men receiving drug A and drug B account for it: Possible/Not possible

D) Drug A may have been selectively administered to patients whose serum cholesterol levels were more refractory to drug therapy: Possible/Not possible

An observed difference in a study can be due to
1) Random chance
2) Biased comparison
3) Uncontrolled confounding variables

Solutions: A and B
Test of significance - p value
P < 0.05 means that, if there were truly no difference, a difference at least as large as the one observed would arise by random chance less than 5% of the time
P < 0.01 means that such a difference would arise by random chance less than 1% of the time
The p value does not tell us about the magnitude of the difference

Solutions: C and D
Use random allocation and compare the baseline characteristics of the groups

Figure 1

Table 1 - Baseline Characteristics
Characteristic                            Vitamin group (n = 141)   Placebo group (n = 142)
Mean age ± SD, y                          28.9 ± 6.4                29.8 ± 5.6
Smokers, n (%)                            22 (15.6)                 14 (9.9)
Mean body mass index ± SD, kg/m2          25.3 ± 6.0                25.6 ± 5.6
Mean blood pressure ± SD, mm Hg
  Systolic                                112 ± 15                  110 ± 12
  Diastolic                               67 ± 11                   68 ± 10
Parity, n (%)
  0                                       91 (65)                   87 (61)
  1                                       39 (28)                   42 (30)
  2                                       9 (6)                     8 (6)
  >2                                      2 (1)                     5 (4)
Coexisting disease, n (%)
  Essential hypertension                  10 (7)                    7 (5)
  Lupus/antiphospholipid syndrome         4 (3)                     1 (1)
  Diabetes                                2 (1)                     3 (2)

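“t” Test
Ho: There is no difference in mean birth weight of children from HSE and LSE in the population
Critical ratio (CR):
\[ t = \frac{|\bar{X}_1 - \bar{X}_2|}{SD \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
Pooled standard deviation:
\[ SD = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}} \]
\[ SD = \sqrt{\frac{14 \times 0.27^2 + 9 \times 0.22^2}{23}} = 0.25 \]
\[ t = \frac{|2.91 - 2.26|}{0.25 \sqrt{\frac{1}{15} + \frac{1}{10}}} \approx 6.36 \]
DF = n1 + n2 - 2 = 23
Calculated t > table value: reject Ho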
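The birth weight calculation can be reproduced from the summary statistics alone. The sketch below is an illustration, not the author's code. It assumes the group with n = 15 has mean 2.91 and SD 0.27, and the group with n = 10 has mean 2.26 and SD 0.22, as the worked numbers suggest. The function name is hypothetical. The critical value 2.069 is the standard two-sided 5% t-table entry for 23 degrees of freedom.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled standard deviation."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = abs(mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, df

t_stat, pooled_sd, df = pooled_t_test(2.91, 0.27, 15, 2.26, 0.22, 10)

# Two-sided 5% critical value for 23 degrees of freedom (t table)
T_CRITICAL_DF23 = 2.069
reject_null = t_stat > T_CRITICAL_DF23

print(round(pooled_sd, 2), round(t_stat, 2), df, reject_null)
```

Carrying the unrounded pooled SD through gives a t statistic slightly below the hand calculation, which rounds the pooled SD to 0.25 first. The conclusion is the same either way.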

GENERAL STEPS IN HYPOTHESIS TESTING
1) State the hypothesis to be tested
2) Select a sample and collect data
3) Calculate the test statistic
4) Evaluate the evidence against the null hypothesis
5) State the conclusion

Commonly used statistical tests
- t test: compare two mean values
- Analysis of variance: compare more than two mean values
- Chi-square test: compare two proportions
- Correlation coefficient: relationship between two continuous variables

Data entry format
Treatment   Age   Weight   Diabetes   Painscore-b   Painscore-a   Vomiting
1           21    50       1          9             6             0
1           24    53       0          10            9             0
1           25    55       1          9             9             1
1           28    50       0          10            6             1
1           29    60       0          10            5             0
1           20    65       0          10            8             0
0           26    60       0          9             9             0
0           25    90       1          9             9             1
0           24    80       1          9             9             1
0           28    89       0          10            8             1
0           22    86       1          10            9             1
0           22    45       0          10            9             0

Example: t test
Body temperature (°C)   Simple febrile seizure (N = 25)   Febrile without seizure (N = 25)   P value
Mean                    39.01                             38.64                              P < 0.001
SD                      0.56                              0.45

Example: Analysis of variance
Serum zinc level in simple febrile seizure patients, by duration of seizure
Duration (min)   n    Mean    SD     P value
< 5              3    10.27   0.25   P < 0.001
5 to 10          18   9.02    0.81
> 10             4    6.90    0.98
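A one-way analysis of variance can be computed directly from each group's size, mean, and standard deviation. The sketch below assumes the three duration groups have n = 3, 18, and 4 with the means and SDs shown on the slide. It also assumes an ordinary one-way ANOVA was used, which the slide does not state. The function name is hypothetical.

```python
def one_way_anova_from_summary(groups):
    """F statistic from (n, mean, sd) tuples, one tuple per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-group and within-group sums of squares
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# (n, mean serum zinc, SD) for seizure duration < 5, 5 to 10, > 10 minutes
zinc_groups = [(3, 10.27, 0.25), (18, 9.02, 0.81), (4, 6.90, 0.98)]

f_stat, df_between, df_within = one_way_anova_from_summary(zinc_groups)
print(round(f_stat, 1), df_between, df_within)
```

The resulting F statistic is then compared with the F table at (2, 22) degrees of freedom to obtain the p value.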

Example: Chi-square test
Characteristics of patients in the two groups
Duration of fever (hours)   Simple febrile seizure   Febrile without seizure   P value
< 24                        16                       6                         P < 0.05
More than 24                9                        19
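For a 2x2 table the chi-square statistic has a convenient shortcut formula, N(ad - bc)^2 divided by the product of the four marginal totals. The sketch below applies it to the fever duration counts. It omits the continuity correction, which is an assumption since the slide does not say whether one was applied. The function name is illustrative, and 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table, without continuity correction.

    Table layout:   a  b
                    c  d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Rows: fever < 24 h, fever > 24 h
# Columns: simple febrile seizure, febrile without seizure
chi2 = chi_square_2x2(16, 6, 9, 19)

# 5% critical value of chi-square with 1 degree of freedom
CHI2_CRITICAL_DF1 = 3.841
significant = chi2 > CHI2_CRITICAL_DF1

print(round(chi2, 2), significant)
```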

Example: Correlation
We found a negative correlation between serum zinc level and simple febrile seizure event (r = -0.86, p < 0.001)

Type 1 and Type 2 Errors
            Ho True                               Ho False / H1 True
Accept Ho   Correct decision                      Type 2 error, β = P(Type 2 error)
Reject Ho   Type 1 error, α = P(Type 1 error)     Correct decision, Power = 1 - β

Multivariate problem
Main outcome is a continuous variable: linear regression
Main outcome is a dichotomous variable: logistic regression

Bradford Hill's Questions
Introduction - Why did you start?
Methods - What did you do?
Results - What did you find?
Discussion - What does it mean?

How to begin writing?
Data Tables → Methods, Results → Introduction, Discussion → Abstract → Title, Key words, References

Thank you