Medical Statistics (full English class) Ji-Qian Fang School of Public Health Sun Yat-Sen University.

Chapter 10 Statistical Analysis of Enumeration Data

10.1 Statistical Description for enumeration data

Absolute measure: the number counted in each category (the frequency). Absolute measures can hardly be used to compare different populations.

1. Relative measures
There are three kinds of relative measures: frequency (proportion), intensity (rate), and ratio.

(1) Relative frequency
Note: the Chinese textbook is wrong here! This measure is not a “rate”; it is a proportion, or relative frequency.

Example 10-1 (p. 304, revised)
Question: Which grade has the most serious condition of myopia?

Prevalence rates describe P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade).
Constituent ratios among myopia cases describe P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia).
Which grade has the most serious condition of myopia?
Answer: P(Myopia | Third grade) is the maximum, so the third grade has the highest prevalence of myopia. P(Second grade | Myopia) is the maximum only because, among the myopia cases, second-grade students are the most numerous in absolute terms.

(2) Intensity
Example: A smoking population was followed up for 562,833 person-years, and 346 lung cancer cases were found. The incidence rate of lung cancer in this smoking population is
Incidence rate = 346/562833 = 61.47 per 100,000 person-years

Example: The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.

In general:
Denominator: the sum of the person-years observed in the period
Numerator: the total number of events occurring in the period
Unit: persons per person-year, i.e. 1/year
Nature: the relative frequency per unit of time
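As a check of the arithmetic in the smoking example, an intensity is simply the number of events divided by the person-years observed, scaled to a convenient base. A minimal Python sketch; the function name `incidence_rate` and its `per` parameter are hypothetical.

```python
def incidence_rate(events: int, person_years: float, per: float = 100_000) -> float:
    """Intensity: number of events per `per` person-years of observation."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * per


# Smoking population from the example: 346 lung cancer cases in 562,833 person-years
rate = incidence_rate(346, 562_833)
print(f"{rate:.2f} per 100,000 person-years")  # 61.47 per 100,000 person-years
```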

(3) Ratio
A ratio is one number divided by another, related number. Examples:
Sex ratio of students in this class: No. of males : No. of females = 52%
Coefficient of variation: CV = SD/mean
Ratio of time spent per clinic visit: large hospital : community health station = 81.9 min : 18.6 min = 4.40

2. Caution in the use of relative measures
a. The denominator should be big enough; otherwise the absolute measure should be used. Example: out of 5 cases, 3 were cured. Should we report a 60% cure rate?
b. Pay attention to the population the relative measure comes from. Mistake in the textbook (p. 305): “Distinguish between constitutes and proportion”!? We should say “Distinguish between the prevalence rate and the constituent ratio among patients”.
Prevalence rate: the population is the students in the same grade.
Constituent ratio: the population is all the patients.

The two frequency distributions above describe populations consisting only of patients; to describe the prevalence rate, one has to look at the general population.

c. Pooled estimate of a frequency
$\text{Pooled estimate} = \sum \text{numerators} \big/ \sum \text{denominators}$
Example: the prevalence of myopia over the 3 grades ≠ (15.16 + 15.89 + 18.36)/3 %
The prevalence of myopia over the 3 grades = (67 + 68 + 56)/(442 + 428 + 305) = 191/1175 = 16.26%
d. Comparability between frequencies, or between frequency distributions: notice the balance of other conditions.
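The difference between averaging the three grade-specific rates and pooling numerators and denominators can be seen with a short Python sketch using the figures quoted above. The simple mean gives every grade equal weight, although the grades have different numbers of students; the variable names are illustrative only.

```python
# Myopia data quoted in item c: cases and students examined, grades 1-3
cases = [67, 68, 56]
students = [442, 428, 305]

grade_rates = [100 * c / n for c, n in zip(cases, students)]  # % in each grade
mean_of_rates = sum(grade_rates) / len(grade_rates)            # NOT the pooled estimate
pooled = 100 * sum(cases) / sum(students)                       # sum of numerators / sum of denominators

print([round(r, 2) for r in grade_rates])  # [15.16, 15.89, 18.36]
print(round(mean_of_rates, 2))             # 16.47
print(round(pooled, 2))                    # 16.26
```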

e. If the distributions of other variables differ, standardization is needed to improve comparability.
f. To compare two samples, a hypothesis test is needed (see the chi-square test).
The following emphasizes these two points: standardization and hypothesis testing.

3. Standardization of a crude frequency or crude intensity
Crude incidence rate of city A = 28.96; crude incidence rate of city B = 35.03. Strange!? The two crude rates are not comparable, because the population compositions (constituent ratios) of the two cities are quite different.
Table 10-3 Incidence rates of infectious diseases, children of two cities

Standardized incidence rate of city A = 793/24767 = 32.02‰
Standardized incidence rate of city B = 523/24767 = 21.12‰
Two steps (direct standardization):
Step 1: Select a standard population; its stratum sizes serve as the “weights”.
Step 2: Take the weighted average of the actual stratum-specific incidence rates; this is the directly standardized rate.
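Because Table 10-3 itself is not reproduced here, the Python sketch below uses HYPOTHETICAL strata chosen only to reproduce the “strange” reversal: city A has higher rates in every stratum but a lower crude rate, because most of its children sit in the low-rate stratum. The same weighted average gives the crude rate (own population as weights) and the direct standardized rate (standard population as weights). `weighted_rate` is a hypothetical helper name.

```python
def weighted_rate(specific_rates, weights):
    """Weighted average of stratum-specific rates.

    With a city's own stratum populations as weights this is its crude rate;
    with a common standard population as weights it is the directly
    standardized rate.
    """
    if len(specific_rates) != len(weights):
        raise ValueError("need one weight per stratum")
    return sum(r * w for r, w in zip(specific_rates, weights)) / sum(weights)


# HYPOTHETICAL illustration (not Table 10-3): rates per 1000 in three strata
rates_a, pop_a = [40.0, 30.0, 20.0], [100, 200, 3000]
rates_b, pop_b = [30.0, 20.0, 10.0], [3000, 200, 100]
standard_pop = [1000, 2000, 3000]  # hypothetical standard population

crude_a, crude_b = weighted_rate(rates_a, pop_a), weighted_rate(rates_b, pop_b)
std_a, std_b = weighted_rate(rates_a, standard_pop), weighted_rate(rates_b, standard_pop)
print(f"crude: A={crude_a:.2f}, B={crude_b:.2f}")     # crude: A=21.21, B=28.79
print(f"standardized: A={std_a:.2f}, B={std_b:.2f}")  # standardized: A=26.67, B=16.67
```

The choice of standard population changes the standardized values themselves, so only comparisons made with the same standard are meaningful.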

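Indirect standardization (when only the total numbers of deaths are known):
Known: the age-specific populations $N_{i1}$ (smokers) and $N_{i2}$ (non-smokers); the total numbers of deaths $D_1 = 432$ and $D_2 = 210$.
Select a set of standard age-specific mortality rates $P_i$; the overall mortality rate of the standard population is $P = 34.60$ per $10^5$.
Standardized mortality ratio:
$SMR_1 = D_1 \big/ \sum_i N_{i1} P_i = 432/100.67 = 4.2912$ (smokers)
$SMR_2 = D_2 \big/ \sum_i N_{i2} P_i = 0.8620$ (non-smokers)
Standardized mortality rate:
$P'_1 = P \cdot SMR_1 = 34.60 \times 4.2912 = 148.48$ (per $10^5$), $P'_2 = P \cdot SMR_2 = 34.60 \times 0.8620 = 29.83$ (per $10^5$)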
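The indirect method can be sketched in Python as observed deaths divided by expected deaths, then multiplied by the standard overall rate. The smoker figures (432 observed, 100.67 expected) are taken from the example; reading 34.60 per $10^5$ as the standard population's overall rate is our interpretation, and the two-stratum input to `expected_deaths` is purely hypothetical. Both function names are illustrative.

```python
def expected_deaths(populations, standard_rates):
    """Deaths expected if each stratum had the standard rate: sum of N_i * P_i."""
    return sum(n * p for n, p in zip(populations, standard_rates))


def smr(observed_deaths, expected):
    """Standardized mortality ratio: observed / expected deaths."""
    return observed_deaths / expected


# Smokers: observed deaths and expected deaths as quoted in the example
smr_smokers = smr(432, 100.67)
standard_rate = 34.60  # assumed overall mortality rate of the standard population, per 10^5
print(round(smr_smokers, 4))                  # 4.2912
print(round(standard_rate * smr_smokers, 2))  # 148.48

# HYPOTHETICAL strata, only to show how expected deaths are formed
print(expected_deaths([10_000, 20_000], [0.001, 0.002]))
```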

10.2 Statistical Inference for Enumeration Data

1. Sampling error of a frequency
Example: Suppose the death rate of rats fed a certain poison is 0.2. What will happen when we do the experiment on n = 1, 2, 3, or 4 rats?
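The rat example can be made concrete by listing, for each n, every possible observed death frequency and its probability. The Python sketch below assumes a binomial model (rats die independently, each with probability 0.2), which is our assumption rather than something stated on the slide. It also checks that the spread of the observed frequency matches $\sqrt{\pi(1-\pi)/n}$. `sampling_distribution` is a hypothetical helper.

```python
from math import comb, sqrt


def sampling_distribution(n, pi):
    """Probability of each possible sample frequency k/n, k = 0..n (binomial model)."""
    return {k / n: comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)}


pi = 0.2  # death rate of rats fed the poison
for n in range(1, 5):
    dist = sampling_distribution(n, pi)
    mean = sum(p * prob for p, prob in dist.items())
    sd = sqrt(sum((p - mean) ** 2 * prob for p, prob in dist.items()))
    shown = {round(p, 2): round(prob, 4) for p, prob in dist.items()}
    print(n, shown, f"mean={mean:.2f}", f"sd={sd:.4f}", f"formula={sqrt(pi * (1 - pi) / n):.4f}")
```

With n = 1 the observed frequency can only be 0 or 1; as n grows, the possible values fill in and the standard deviation shrinks like $1/\sqrt{n}$.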

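In general, suppose the population proportion is $\pi$ and the sample size is $n$. The sample frequency $p$ is a random variable with mean $\pi$ and standard error $\sigma_p = \sqrt{\pi(1-\pi)/n}$. When $\pi$ is unknown and $n$ is big enough, $\sigma_p$ is approximately equal to $s_p = \sqrt{p(1-p)/n}$.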

Example 10-5 HBV surface antigen: 200 people were tested, and 7 were positive. $p = 7/200 = 0.035$, $s_p = \sqrt{0.035 \times 0.965/200} = 0.0130$.

If the sample size $n$ is big enough and the observed frequency is $p$, then approximately $u = (p - \pi)/s_p \sim N(0, 1)$.

2. Confidence interval of a probability (population proportion)
If the sample size $n$ is big enough and the observed frequency is $p$, then
95% confidence interval: $p \pm 1.96\,s_p$
99% confidence interval: $p \pm 2.58\,s_p$

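Example 10-5 (continued) HBV surface antigen: 200 people were tested, and 7 were positive.
95% confidence interval: $0.035 \pm 1.96 \times 0.0130 = (0.0095, 0.0605)$
99% confidence interval: $0.035 \pm 2.58 \times 0.0130 = (0.0015, 0.0685)$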
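The normal-approximation interval for Example 10-5 can be reproduced with a few lines of Python. This is a sketch of the $p \pm z\,s_p$ formula only; `proportion_ci` is a hypothetical name, and the z values 1.96 and 2.58 are the ones used on the slide.

```python
from math import sqrt


def proportion_ci(x, n, z):
    """Normal-approximation CI for a proportion: p +/- z * s_p, s_p = sqrt(p(1-p)/n)."""
    p = x / n
    s_p = sqrt(p * (1 - p) / n)
    return p, s_p, (p - z * s_p, p + z * s_p)


# Example 10-5: 7 positive out of 200
p, s_p, ci95 = proportion_ci(7, 200, 1.96)
_, _, ci99 = proportion_ci(7, 200, 2.58)
print(f"p={p:.3f}, s_p={s_p:.4f}")                # p=0.035, s_p=0.0130
print(f"95% CI: ({ci95[0]:.4f}, {ci95[1]:.4f})")  # 95% CI: (0.0095, 0.0605)
print(f"99% CI: ({ci99[0]:.4f}, {ci99[1]:.4f})")  # 99% CI: (0.0015, 0.0685)
```

Here $np = 7$ is only just above 5, so the approximation is usable but not generous.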

3. Hypothesis testing of a proportion (u test)
1. Comparison of a sample proportion with a population proportion
Example 10.6 Cerebral infarction
New method: 98 cases, cure rate 50%
Routine treatment: cure rate 30% (taken as the population cure rate $\pi_0$)

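$H_0: \pi = \pi_0 = 0.30$, $H_1: \pi \neq \pi_0$, $\alpha = 0.05$
Statistic: $u = \dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \dfrac{0.50 - 0.30}{\sqrt{0.30 \times 0.70/98}} = 4.32$
Decision rule: if $|u| \ge u_{\alpha/2} = 1.96$, reject $H_0$; otherwise there is no reason to reject $H_0$ (accept $H_0$).
Since $u = 4.32 > 1.96$, $H_0$ is rejected.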
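A Python sketch of the one-sample u test for Example 10.6, including the large-sample check $n\pi_0 > 5$ and $n(1-\pi_0) > 5$. The two-sided critical value 1.96 ($\alpha = 0.05$) is an assumption about the test the slide intends; `one_sample_u` is a hypothetical name.

```python
from math import sqrt


def one_sample_u(p, pi0, n):
    """u statistic comparing a sample proportion p with a known population proportion pi0."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)


n, p, pi0 = 98, 0.50, 0.30  # Example 10.6
if not (n * pi0 > 5 and n * (1 - pi0) > 5):
    raise ValueError("sample too small for the normal approximation")
u = one_sample_u(p, pi0, n)
print(round(u, 2))                                            # 4.32
print("reject H0" if abs(u) >= 1.96 else "do not reject H0")  # reject H0
```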

2. Comparison of two sample proportions
Example 10.7 Carrier rate of hepatitis B
City: 522 people were tested, 24 carriers, 4.60% (population carrier rate $\pi_1$)
Countryside: 478 people were tested, 33 carriers, 6.90% (population carrier rate $\pi_2$)

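Pooled estimate: $p_c = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{24 + 33}{522 + 478} = 0.057$
Standard error of $p_1 - p_2$: $s_{p_1 - p_2} = \sqrt{p_c(1-p_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = 0.01468$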

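Statistic: $u = \dfrac{p_1 - p_2}{s_{p_1 - p_2}} = \dfrac{0.04598 - 0.06904}{0.01468} = -1.57$
Decision rule: if $|u| \ge 1.96$, reject $H_0: \pi_1 = \pi_2$; otherwise there is no reason to reject $H_0$ (accept $H_0$).
Since $|u| = 1.57 < 1.96$, $H_0$ is not rejected.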
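The same steps for two independent samples (pooled estimate, standard error of the difference, u) can be sketched in Python with the Example 10.7 counts. `two_sample_u` is a hypothetical helper, and 1.96 again assumes a two-sided test at $\alpha = 0.05$.

```python
from math import sqrt


def two_sample_u(x1, n1, x2, n2):
    """u test for two independent proportions using the pooled estimate p_c."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)
    se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, p_c, se


u, p_c, se = two_sample_u(24, 522, 33, 478)  # Example 10.7: city vs countryside
print(f"p_c={p_c:.3f}, se={se:.5f}, u={u:.2f}")               # p_c=0.057, se=0.01468, u=-1.57
print("reject H0" if abs(u) >= 1.96 else "do not reject H0")  # do not reject H0
```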

Summary
Parameter estimation and hypothesis testing for a proportion are based on the normal approximation, which requires a big enough sample. How big is enough? By experience, $n\pi > 5$ and $n(1-\pi) > 5$.
If the sample size is not big, the u test can't be used, and there is no t test for a proportion (see a more detailed textbook).

10.3 Chi-square test

The u test can only be used to compare $\pi$ with a given $\pi_0$ (one sample) or $\pi_1$ with $\pi_2$ (two samples). If we need to compare more than two samples, the chi-square test is widely used.

1. Basic idea of the $\chi^2$ test
Given a set of observed frequencies $A_1, A_2, A_3, \ldots$, we want to test whether the data follow a certain theory. If the theory is true, we will have a set of theoretical frequencies $T_1, T_2, T_3, \ldots$. Compare $A_1, A_2, A_3, \ldots$ with $T_1, T_2, T_3, \ldots$: if they are quite different, the theory might not be true; otherwise, the theory is acceptable.

(2) Chi-square test for a 2×2 table
Example 10-8 Acute lower respiratory infection (theoretical frequencies in parentheses)
Treatment   Effective      Not effective   Total       Effective rate
Drug A      68 (64.82) a   6 (9.18) b      74 (a+b)    91.89%
Drug B      52 (55.18) c   11 (7.82) d     63 (c+d)    82.54%
Total       120 (a+c)      17 (b+d)        137         87.59%
$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \neq \pi_2$, $\alpha = 0.05$
To calculate the theoretical frequencies: if $H_0$ is true, $\pi_1 = \pi_2 \approx 120/137$, so
$T_{11} = 74 \times 120/137 = 64.82$, $T_{21} = 63 \times 120/137 = 55.18$
$T_{12} = 74 \times 17/137 = 9.18$, $T_{22} = 63 \times 17/137 = 7.82$

To compare A and T, use the statistic $\chi^2 = \sum \dfrac{(A - T)^2}{T}$. If $H_0$ is true, $\chi^2$ follows a chi-square distribution with $\nu = (\text{rows} - 1)(\text{columns} - 1)$ degrees of freedom. If the $\chi^2$ value is big enough, we doubt $H_0$ and reject it.

For Example 10-8, $\nu = (2-1)(2-1) = 1$ and $\chi^2_{0.05(1)} = 3.84$. Now $\chi^2 = 2.734 < 3.84$, so $P > 0.05$ and $H_0$ is not rejected. We have no reason to say that the effects of the two treatments are different.

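For a 2×2 table, there is a specific formula for the chi-square calculation:
$\chi^2 = \dfrac{(ad - bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$
For Example 10-8: $\chi^2 = \dfrac{(68 \times 11 - 6 \times 52)^2 \times 137}{74 \times 63 \times 120 \times 17} = 2.738$ (the small difference from 2.734 comes from rounding the theoretical frequencies).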
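A Python sketch showing that the basic statistic $\sum (A-T)^2/T$ and the specific 2×2 formula agree exactly when the theoretical frequencies are not rounded. The function names are hypothetical; cell labels a, b, c, d follow the Example 10-8 table.

```python
def expected_2x2(a, b, c, d):
    """Theoretical frequencies T = row total * column total / N."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi2_basic(a, b, c, d):
    """Pearson chi-square: sum of (A - T)^2 / T over the four cells."""
    observed = [[a, b], [c, d]]
    t = expected_2x2(a, b, c, d)
    return sum((observed[i][j] - t[i][j]) ** 2 / t[i][j] for i in range(2) for j in range(2))


def chi2_shortcut(a, b, c, d):
    """Specific 2x2 formula: (ad - bc)^2 N / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 68, 6, 52, 11  # Example 10-8
print([[round(x, 2) for x in row] for row in expected_2x2(a, b, c, d)])
# [[64.82, 9.18], [55.18, 7.82]]
print(round(chi2_basic(a, b, c, d), 3), round(chi2_shortcut(a, b, c, d), 3))  # 2.738 2.738
```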

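A large sample is required:
(1) If $N \ge 40$ and all $T \ge 5$: use the formulas above.
(2) If $N < 40$ or any $T < 1$: the $\chi^2$ test is not applicable.
(3) If $N \ge 40$ and $1 \le T < 5$ in some cell: an adjustment (continuity correction) is needed:
$\chi^2_c = \sum \dfrac{(|A - T| - 0.5)^2}{T}$, or equivalently $\chi^2_c = \dfrac{(|ad - bc| - N/2)^2 N}{(a+b)(c+d)(a+c)(b+d)}$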

Example 10-9 Hematosepsis (theoretical frequencies in parentheses)
Treatment   Effective    No effect   Total   Effective rate (%)
Drug A      28 (26.09)   2 (3.91)    30      93.33
Drug B      12 (13.91)   4 (2.09)    16      75.00
Total       40           6           46      86.96
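The three large-sample rules for a 2×2 table can be turned into a small Python dispatcher. This is a sketch of those rules as stated on the slides, not a general-purpose routine; `chi2_2x2_rules` is a hypothetical name. For Example 10-9, N = 46 and the smallest theoretical frequency is 2.09, so the corrected formula applies.

```python
def chi2_2x2_rules(a, b, c, d):
    """Pick the 2x2 chi-square variant using the slide's rules on N and min T.

    Returns (method, value); value is None when the chi-square test is not applicable.
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    t_min = min(r * k / n for r in rows for k in cols)
    denom = rows[0] * rows[1] * cols[0] * cols[1]
    if n < 40 or t_min < 1:
        return "not applicable", None
    if t_min >= 5:
        return "ordinary", (a * d - b * c) ** 2 * n / denom
    # N >= 40 and 1 <= T < 5 in some cell: continuity-corrected formula
    return "corrected", (abs(a * d - b * c) - n / 2) ** 2 * n / denom


method, chi2 = chi2_2x2_rules(28, 2, 12, 4)  # Example 10-9
print(method, round(chi2, 2))  # corrected 1.69
```

With these inputs the sketch gives about 1.69, below 3.84; the uncorrected formula would give about 3.09, which shows how much the correction can matter when expected counts are small.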

(3) $\chi^2$ test for a paired 2×2 table
Example 10-10 Two diagnostic methods are each applied to the same 53 cases of lung cancer. Question: Are the two positive rates equal?
Method A    Method B +   Method B -   Total
+           25 (a)       2 (b)        27
-           11 (c)       15 (d)       26
Total       36           17           53
Note: the two samples are not independent, so the $\chi^2$ test above does not work.

Basic idea: comparing the positive rates $P_A = (a+b)/n = 27/53$ and $P_B = (a+c)/n = 36/53$ is equivalent to comparing $b = 2$ and $c = 11$, because the concordant cell $a$ appears in both. Given the $b + c = 13$ patients on whom the two methods disagree, do they fall into the two cells with equal chance?

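$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \neq \pi_2$, $\alpha = 0.05$
When $H_0$ is true, for a large sample ($b + c > 40$): $\chi^2 = \dfrac{(b - c)^2}{b + c}$, $\nu = 1$
Otherwise, an adjustment is needed: $\chi^2_c = \dfrac{(|b - c| - 1)^2}{b + c}$, $\nu = 1$
If the $\chi^2$ value is too big, reject $H_0$.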

Example 10-10: $b + c = 13 < 40$, so $\chi^2_c = \dfrac{(|2 - 11| - 1)^2}{2 + 11} = 4.92$, $\nu = 1$. Since $4.92 > \chi^2_{0.05(1)} = 3.84$, $P < 0.05$ and $H_0$ is rejected.
Conclusion: There is a significant difference in positive rates between the two diagnostic methods. Since $P_A < P_B$, method B is better.
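This paired procedure is commonly known as McNemar's test. A minimal Python sketch using only the discordant cells b and c, with the slide's threshold of 40 for the uncorrected form; `mcnemar_chi2` and its `large_sample_threshold` parameter are hypothetical.

```python
def mcnemar_chi2(b, c, large_sample_threshold=40):
    """Chi-square for a paired 2x2 table, using only the discordant cells b and c.

    Uses (b - c)^2 / (b + c) when b + c exceeds the threshold, otherwise the
    corrected form (|b - c| - 1)^2 / (b + c); 1 degree of freedom either way.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    if b + c > large_sample_threshold:
        return (b - c) ** 2 / (b + c)
    return (abs(b - c) - 1) ** 2 / (b + c)


chi2 = mcnemar_chi2(2, 11)  # Example 10-10: b = 2, c = 11
print(round(chi2, 2), "reject H0" if chi2 > 3.84 else "do not reject H0")  # 4.92 reject H0
```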

(4) Chi-square test for an R×C table

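To calculate the theoretical frequencies: $T_{RC} = \dfrac{n_R n_C}{N}$, where $n_R$ and $n_C$ are the totals of the corresponding row and column.
To compare A and T, use the statistic $\chi^2 = \sum \dfrac{(A - T)^2}{T}$, $\nu = (R - 1)(C - 1)$.
Specific formula: $\chi^2 = N\left(\sum \dfrac{A^2}{n_R n_C} - 1\right)$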
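A Python sketch of the R×C computation, checking that the basic statistic and the specific formula agree. Example 10-8 is used as a 2×2 special case; the 3×3 table is HYPOTHETICAL and only exercises the general code. Function names are illustrative.

```python
def chi2_rxc(table):
    """Pearson chi-square and degrees of freedom for an R x C contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, a in enumerate(row):
            t = row_totals[i] * col_totals[j] / n  # theoretical frequency T
            chi2 += (a - t) ** 2 / t
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_rxc_shortcut(table):
    """Specific formula: N * (sum of A^2 / (n_R * n_C) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(a * a / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table) for j, a in enumerate(row))
    return n * (s - 1)


# Example 10-8 as a 2 x 2 special case, plus a HYPOTHETICAL 3 x 3 table
for table in ([[68, 6], [52, 11]], [[10, 20, 30], [20, 20, 20], [30, 20, 10]]):
    chi2, df = chi2_rxc(table)
    print(df, round(chi2, 3), round(chi2_rxc_shortcut(table), 3))
# 1 2.738 2.738
# 4 20.0 20.0
```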

Caution:
(1) Both 2×2 tables and R×C tables are called contingency tables; the 2×2 table is a special case of the R×C table.
(2) When R > 2, “$H_0$ is rejected” only means that there are differences among some of the groups. It does not necessarily mean that all the groups differ from one another.
(3) The $\chi^2$ test requires a large sample. By experience, the theoretical frequencies should be at least 5 in at least 4/5 of the cells, and no cell should have a theoretical frequency below 1. Otherwise, we cannot use the chi-square test directly.
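The rule of thumb for R×C tables can be checked mechanically. The Python sketch below encodes it as “no T below 1, and T ≥ 5 in at least 4/5 of the cells”; `large_sample_ok` is a hypothetical helper. For 2×2 tables the slides use the separate N/T rules with a continuity correction, which is why Example 10-9 fails this generic check yet can still be analysed with the corrected formula.

```python
def expected_table(table):
    """Theoretical frequencies T = n_R * n_C / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def large_sample_ok(table):
    """Rule of thumb: no T below 1, and T >= 5 in at least 4/5 of the cells."""
    cells = [t for row in expected_table(table) for t in row]
    if min(cells) < 1:
        return False
    return sum(t >= 5 for t in cells) / len(cells) >= 4 / 5


print(large_sample_ok([[68, 6], [52, 11]]))  # True  (Example 10-8)
print(large_sample_ok([[28, 2], [12, 4]]))   # False (Example 10-9: two of four T < 5)
```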

If the above requirements are violated, what should we do?
(1) Increase the sample size.
(2) Re-organize the categories: pool some categories, or drop some categories.
Think: In fact, it is not appropriate to use a chi-square test for Example 10-10 in the textbook. Why?
