Medical Statistics (full English class) Ji-Qian Fang School of Public Health Sun Yat-Sen University.



Chapter 10 Statistical Analysis of Enumeration Data

10.1 Statistical Description for enumeration data

Absolute measure: the number counted in each category (frequency). Absolute measures can hardly be used for comparison between different populations.

1. Relative measure
Three kinds of relative measures: (1) Frequency (proportion) (2) Intensity (rate) (3) Ratio

(1) Relative frequency
Note: the Chinese textbook is wrong here. This measure is not a "rate"; it is a proportion (frequency).

Example 10-1 (p. 304, revised)
Question: which grade has the most serious myopia problem?

Prevalence rates describe: P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade).
The composition among myopia cases describes: P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia).
Which grade has the most serious myopia problem?
Answer: P(Myopia | Third grade) is the maximum, so the third grade has the highest prevalence of myopia. P(Second grade | Myopia) is the maximum, so among the myopia cases the absolute number of second-grade students is the highest.

(2) Intensity
Example: a smoking population was followed up for [number missing] person-years, and 346 lung cancer cases were found. The incidence rate of lung cancer in the smoking population is:
Incidence rate = 346 / [number missing] = 61.47 per 100,000 person-years

Example: the mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.

In general:
Denominator: sum of the person-years observed in the period
Numerator: total number of events occurring in the period
Unit: persons per person-year, or 1/year
Nature: the relative frequency per unit of time

(3) Ratio
A ratio is one number divided by another related number.
Examples:
Sex ratio of students in this class: No. of males : No. of females = 52%
Coefficient of variation: CV = SD/mean
Ratio of time spent per clinic visit: large hospital : community health station = 81.9 min : 18.6 min = 4.40

2. Cautions in the use of relative measures
a. The denominator should be big enough; otherwise the absolute measure should be used. Example: out of 5 cases, 3 were cured. Is "60%" meaningful?
b. Pay attention to the population from which the relative measure comes. Mistake in the textbook (p. 305): "Distinguish between constitutes and proportion"!? We should say "Distinguish between prevalence rate and composition among patients".
Prevalence rate: the population is the students in the same grade.
Composition: the population is all the patients.

The above two frequency distributions reflect two populations of all patients. To describe the prevalence rate, one has to look at the general population.

c. Pooled estimate of the frequency
Pooled estimate = $\sum$ numerators / $\sum$ denominators
Example: the prevalence of myopia among the 3 grades is not the simple average of the three grade-specific prevalences, $(p_1 + p_2 + p_3)/3$. It is (total myopia cases)/(total students) = 192/1175 = 16.34%.
d. Comparability between frequencies or between frequency distributions: check that the other conditions are balanced.

e. If the distributions of other variables differ, "standardization" is needed to improve comparability.
f. To compare two samples, a hypothesis test is needed (see the chi-square test).
The following sections emphasize these two points: standardization and hypothesis testing.
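The pooled estimate in point c can be checked numerically. The sketch below is only an illustration: the per-grade split is hypothetical (the slides preserve only the totals 192 and 1175), and the function names are invented for this example.

```python
# Hypothetical per-grade counts (myopia cases, students).
# Only the totals 192 and 1175 come from the slides; the split is assumed.
grades = {
    "first": (50, 450),
    "second": (72, 400),
    "third": (70, 325),
}


def pooled_proportion(counts):
    """Pooled estimate: sum of numerators divided by sum of denominators."""
    total_cases = sum(cases for cases, _ in counts)
    total_people = sum(people for _, people in counts)
    return total_cases / total_people


def mean_of_proportions(counts):
    """Unweighted average of group proportions (NOT the pooled estimate)."""
    proportions = [cases / people for cases, people in counts]
    return sum(proportions) / len(proportions)


counts = list(grades.values())
pooled = pooled_proportion(counts)
naive = mean_of_proportions(counts)
print(f"pooled = {pooled:.4f}, unweighted mean = {naive:.4f}")
```

The two numbers differ whenever the grades have different sizes, which is why the simple average should not be reported as the overall prevalence.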

3. Standardization for crude frequency or crude intensity
Table 10-3 Incidence rates of infectious diseases, children of two cities
Crude incidence rate of city A = 28.96; crude incidence rate of city B = [value missing]. Strange!? The two crude rates are not comparable, because the compositions of the two populations are quite different.

Standardized incidence rate of city A = 793/24767 = [value missing]‰
Standardized incidence rate of city B = 3523/24767 = [value missing]‰
Two steps:
(1) Select a standard population, whose counts are used as "weights".
(2) Take the weighted average of the actual incidence rates; this is the directly standardized rate.
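The two steps of direct standardization can be sketched in a few lines. Everything numeric below is hypothetical (Table 10-3 itself did not survive in the transcript), and the function name is an assumption made for illustration.

```python
def direct_standardized_rate(specific_rates, standard_population):
    """Weighted average of stratum-specific rates.

    The weights are the stratum sizes of the chosen standard population.
    """
    if len(specific_rates) != len(standard_population):
        raise ValueError("one rate is needed per stratum of the standard population")
    expected_cases = sum(
        rate * size for rate, size in zip(specific_rates, standard_population)
    )
    return expected_cases / sum(standard_population)


# Hypothetical stratum-specific incidence rates for two cities (as proportions)
rates_city_a = [0.010, 0.030, 0.060]
rates_city_b = [0.012, 0.035, 0.070]
# Hypothetical standard population, e.g. the two cities combined, by stratum
standard = [10000, 8000, 6000]

std_a = direct_standardized_rate(rates_city_a, standard)
std_b = direct_standardized_rate(rates_city_b, standard)
print(f"standardized rate A = {1000 * std_a:.2f} per 1000")
print(f"standardized rate B = {1000 * std_b:.2f} per 1000")
```

Because both cities are weighted by the same standard population, the two standardized rates are comparable even when the cities' own compositions differ.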

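Known: the age-specific populations $N_{i1}$, $N_{i2}$ and the total numbers of deaths $\sum D_{i1} = 432$, $\sum D_{i2} = 210$.
Select a set of standard mortality rates $P_i$.
Standardized mortality ratio:
$SMR_1 = \sum D_{i1} / \sum N_{i1} P_i$ = 432 / [value missing] = [value missing] (smokers)
$SMR_2 = \sum D_{i2} / \sum N_{i2} P_i$ = 210 / [value missing] = [value missing] (non-smokers)
Standardized mortality rate:
$P'_1 = 34.60 \times SMR_1$ = [value missing] (1/10^5), $P'_2 = 34.60 \times SMR_2$ = 29.83 (1/10^5)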

10.2 Statistical Inference for Enumeration Data

1. Sampling error of frequency
Example: suppose the death rate is 0.2 when rats are fed a certain poison. What will happen when we do the experiment on n = 1, 2, 3 or 4 rat(s)?
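The rat experiment can be enumerated directly. The sketch below assumes the rats die independently with probability 0.2 (a binomial model, which is an assumption of this illustration) and lists every possible sample frequency with its probability; the function name is hypothetical.

```python
from math import comb


def frequency_distribution(n, pi):
    """Return (deaths, sample frequency, probability) for k = 0..n deaths.

    Assumes n independent rats, each dying with probability pi.
    """
    return [
        (k, k / n, comb(n, k) * pi**k * (1 - pi) ** (n - k))
        for k in range(n + 1)
    ]


for n in (1, 2, 3, 4):
    print(f"n = {n}")
    for deaths, freq, prob in frequency_distribution(n, 0.2):
        print(f"  deaths = {deaths}, frequency = {freq:.2f}, probability = {prob:.4f}")
```

With so few rats the observed frequency can only take a handful of values, and most of them are far from the true 0.2; that spread is the sampling error of the frequency.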

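In general, suppose the population proportion is $\pi$ and the sample size is $n$. The sample frequency $p$ is a random variable with mean $\pi$ and standard error $\sigma_p = \sqrt{\pi(1-\pi)/n}$. When $\pi$ is unknown and $n$ is big enough, $\sigma_p$ is approximately equal to $S_p = \sqrt{p(1-p)/n}$.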

Example 10-5 HBV surface antigen: 200 people were tested, 7 positive.
$p = 7/200 = 0.035$, $S_p = \sqrt{0.035 \times 0.965 / 200} = 0.0130$

If the sample size $n$ is big enough and the observed frequency is $p$, then approximately $p \sim N(\pi, S_p^2)$, where $S_p = \sqrt{p(1-p)/n}$.

2. Confidence interval of probability
If the sample size $n$ is big enough and the observed frequency is $p$, then:
95% confidence interval: $p \pm 1.96\, S_p$
99% confidence interval: $p \pm 2.58\, S_p$

Example 10-5 HBV surface antigen: 200 people were tested, 7 positive.
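For Example 10-5, the normal-approximation interval $p \pm z\,S_p$ with $S_p = \sqrt{p(1-p)/n}$ can be computed as in the following sketch. The function name is hypothetical; the multipliers 1.96 and 2.58 are the usual two-sided normal quantiles.

```python
import math


def proportion_ci(x, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    x: number of positives, n: sample size, z: normal quantile
    (1.96 for 95%, 2.58 for 99%).
    """
    p = x / n
    s_p = math.sqrt(p * (1 - p) / n)  # estimated standard error of p
    return p - z * s_p, p + z * s_p


low95, high95 = proportion_ci(7, 200, z=1.96)
low99, high99 = proportion_ci(7, 200, z=2.58)
print(f"95% CI: ({low95:.4f}, {high95:.4f})")
print(f"99% CI: ({low99:.4f}, {high99:.4f})")
```

Here $np = 7$ only just exceeds 5, so the approximation is borderline; an exact binomial interval would be a reasonable cross-check.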

3. Hypothesis testing of proportions (u test)
1. Comparison of a sample proportion with a population proportion
Example 10.6 Cerebral infarction
              Cases    Cure rate
New method    98       50%
Routine                30%

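Statistic: $u = \dfrac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$
Decision rule: if $|u| \ge u_\alpha$, reject $H_0$; otherwise there is no reason to reject $H_0$ (accept $H_0$).
Since $u = (0.50 - 0.30)/\sqrt{0.30 \times 0.70/98} = 4.32$ exceeds the critical value (1.96 for a two-sided test at $\alpha = 0.05$), reject $H_0$.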
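For Example 10.6 the one-sample u statistic, $u = (p - \pi_0)/\sqrt{\pi_0(1-\pi_0)/n}$, can be computed as below. This is a sketch with hypothetical function names; the two-sided P value is an assumption of the illustration, since the transcript does not say whether the test is one-sided or two-sided.

```python
import math


def one_sample_u(p, pi0, n):
    """u statistic comparing a sample proportion p with a known pi0."""
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)


def two_sided_p(u):
    """Two-sided P value from the standard normal distribution."""
    return math.erfc(abs(u) / math.sqrt(2))


u = one_sample_u(0.50, 0.30, 98)
print(f"u = {u:.2f}, two-sided P = {two_sided_p(u):.6f}")
```

Note that the standard error uses $\pi_0$, not the observed $p$, because the calculation is made under $H_0$.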

2. Comparison of two sample proportions
Example 10.7 Carrier rate of hepatitis B
City: 522 people were tested, 24 carriers, 4.60% (population carrier rate: $\pi_1$)
Countryside: 478 people were tested, 33 carriers, 6.90% (population carrier rate: $\pi_2$)

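Pooled estimate: $p_c = \dfrac{X_1 + X_2}{n_1 + n_2} = \dfrac{24 + 33}{522 + 478} = 0.057$
Standard error of $p_1 - p_2$: $S_{p_1 - p_2} = \sqrt{p_c(1-p_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = 0.0147$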

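Statistic: $u = \dfrac{p_1 - p_2}{S_{p_1 - p_2}}$
Decision rule: if $|u| \ge u_\alpha$, reject $H_0$; otherwise there is no reason to reject $H_0$ (accept $H_0$).
Since $|u| = 1.57$ is below the critical value (1.96 for a two-sided test at $\alpha = 0.05$), do not reject $H_0$.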
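The two-sample comparison in Example 10.7 can be sketched the same way, using the pooled estimate for the standard error of the difference. The function name is hypothetical, and the critical value 1.96 assumes a two-sided test at $\alpha = 0.05$.

```python
import math


def two_sample_u(x1, n1, x2, n2):
    """u statistic for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled estimate under H0: pi1 = pi2
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


u = two_sample_u(24, 522, 33, 478)
print(f"u = {u:.2f}")
print("reject H0" if abs(u) >= 1.96 else "do not reject H0")
```

The counts give 24/522 = 4.60% for the city sample and 33/478 = 6.90% for the countryside sample.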

Summary
Parameter estimation and hypothesis testing for a proportion are based on the normal approximation (when the sample size is big enough).
How big is enough? By experience, $n\pi > 5$ and $n(1-\pi) > 5$.
If the sample size is not big, the u test cannot be used, and there is no t test for proportions (see a more detailed textbook).

10.3 Chi-square test

The u test can only be used for comparing $\pi$ with a given $\pi_0$ (one sample) or comparing $\pi_1$ with $\pi_2$ (two samples). To compare more than two samples, the chi-square test is widely used.

1. Basic idea of the $\chi^2$ test
Given an observed frequency distribution $A_1, A_2, A_3, \ldots$, test whether the data follow a certain theory. If the theory is true, there is a corresponding theoretical frequency distribution $T_1, T_2, T_3, \ldots$ Compare $A_1, A_2, A_3, \ldots$ with $T_1, T_2, T_3, \ldots$: if they are quite different, the theory might not be true; otherwise the theory is acceptable.

(2) Chi-square test for 2×2 table
Example 10-8 Acute lower respiratory infection (theoretical frequencies in parentheses)
Treatment    Effect          Non-effect     Total       Effect rate
Drug A       68 (64.82) a    6 (9.18) b     74 (a+b)    91.89%
Drug B       52 (55.18) c    11 (7.82) d    63 (c+d)    82.54%
Total        120 (a+c)       17 (b+d)       137         87.59%
$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \neq \pi_2$, $\alpha = 0.05$
To calculate the theoretical frequencies: if $H_0$ is true, $\pi_1 = \pi_2 \approx 120/137$
$T_{11} = 74 \times 120/137 = 64.82$, $T_{21} = 63 \times 120/137 = 55.18$
$T_{12} = 74 \times 17/137 = 9.18$, $T_{22} = 63 \times 17/137 = 7.82$

To compare A and T by a statistic $\chi^2$: $\chi^2 = \sum \dfrac{(A - T)^2}{T}$
If $H_0$ is true, $\chi^2$ follows a chi-square distribution with $\nu$ = (row − 1)(column − 1) degrees of freedom.
If the $\chi^2$ value is big enough, we doubt $H_0$ and reject it.

For Example 10-8, $\nu$ = (row − 1)(column − 1) = (2 − 1)(2 − 1) = 1, and $\chi^2_{0.05(1)} = 3.84$. Now $\chi^2 = 2.74 < 3.84$, so $H_0$ is not rejected. We have no reason to say the effects of the two treatments are different.

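For a 2×2 table, there is a specific formula for the chi-square calculation: $\chi^2 = \dfrac{(ad - bc)^2\, n}{(a+b)(c+d)(a+c)(b+d)}$
For Example 10-8, $\chi^2 = \dfrac{(68 \times 11 - 6 \times 52)^2 \times 137}{74 \times 63 \times 120 \times 17} = 2.74$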
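Both routes to the statistic for Example 10-8, the general sum over cells $\sum (A-T)^2/T$ and the 2×2 shortcut $(ad-bc)^2 n / [(a+b)(c+d)(a+c)(b+d)]$, can be checked against each other. The following is a sketch with hypothetical function names; the P value helper relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square from observed (A) and theoretical (T) frequencies."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            t = row_totals[i] * col_totals[j] / n  # theoretical frequency
            chi2 += (observed[i][j] - t) ** 2 / t
    return chi2


def chi_square_2x2_shortcut(a, b, c, d):
    """Specific 2x2 formula: (ad - bc)^2 n / product of the four margins."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = chi_square_2x2(68, 6, 52, 11)
shortcut = chi_square_2x2_shortcut(68, 6, 52, 11)
print(f"chi2 = {chi2:.2f} (shortcut {shortcut:.2f}), P = {p_value_df1(chi2):.3f}")
```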

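A large sample is required:
(1) If $N \ge 40$ and all $T_i \ge 5$, the $\chi^2$ test can be used directly.
(2) If $N < 40$ or any $T_i < 1$, the $\chi^2$ test is not applicable.
(3) If $N \ge 40$ and $1 \le T_i < 5$ for some cell, an adjustment (continuity correction) is needed:
$\chi^2 = \sum \dfrac{(|A - T| - 0.5)^2}{T} = \dfrac{(|ad - bc| - n/2)^2\, n}{(a+b)(c+d)(a+c)(b+d)}$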

Example 10-9 Hematosepsis (theoretical frequencies in parentheses)
Treatment    Effective      No effect     Total    Effective rate (%)
Drug A       28 (26.09)     2 (3.91)      30       93.33
Drug B       12 (13.91)     4 (2.09)      16       75.00
Total        40             6             46       86.96
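Example 10-9 has $N = 46 \ge 40$ but a smallest theoretical frequency of about 2.09, so the adjusted statistic $(|ad-bc| - n/2)^2 n / [(a+b)(c+d)(a+c)(b+d)]$ applies. The sketch below encodes the three sample-size cases and the continuity-corrected formula; the function names and the returned labels are hypothetical.

```python
def min_expected_2x2(a, b, c, d):
    """Smallest theoretical frequency in a 2x2 table."""
    n = a + b + c + d
    return min(
        row * col / n
        for row in (a + b, c + d)
        for col in (a + c, b + d)
    )


def choose_2x2_procedure(n, min_expected):
    """Map sample size and smallest T to the case that applies."""
    if n < 40 or min_expected < 1:
        return "chi-square not applicable"
    if min_expected < 5:
        return "corrected chi-square"
    return "uncorrected chi-square"


def corrected_chi_square_2x2(a, b, c, d):
    """Continuity-corrected chi-square for a 2x2 table."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 28, 2, 12, 4  # Example 10-9
procedure = choose_2x2_procedure(a + b + c + d, min_expected_2x2(a, b, c, d))
chi2_corrected = corrected_chi_square_2x2(a, b, c, d)
print(procedure, f"chi2 = {chi2_corrected:.2f}")
```

The corrected value is below 3.84, so at $\alpha = 0.05$ the data would not reject $H_0$ under this calculation.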

(3) $\chi^2$ test for paired 2×2 table
Example 10-10 Two diagnosis methods are each applied to 53 cases of lung cancer.
Question: are the two positive rates equal?
Method A    Method B +    Method B −    Total
+           25 (a)        2 (b)         27
−           11 (c)        15 (d)        26
Total       36            17            53
Note: the two samples are not independent, so the above $\chi^2$ test does not work.

Question: are the two positive rates equal?
Basic idea: comparing $P_A = (a+b)/n$ and $P_B = (a+c)/n$ is equivalent to comparing $b$ and $c$, that is, comparing "2" and "11". Given these 13 patients, do they fall into the two cells with equal chance?

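$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \neq \pi_2$, $\alpha = 0.05$
When $H_0$ is true, for a large sample ($b + c > 40$): $\chi^2 = \dfrac{(b - c)^2}{b + c}$, $\nu = 1$
Otherwise an adjustment is needed: $\chi^2 = \dfrac{(|b - c| - 1)^2}{b + c}$, $\nu = 1$
If the $\chi^2$ value is too big, reject $H_0$.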

Example 10-10: $\nu = 1$, $\chi^2 = 4.92 > 3.84$, $P < 0.05$, so $H_0$ is rejected.
Conclusion: there is a significant difference in positive rates between the two diagnosis methods. Since $P_A < P_B$, method B is better.
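The paired test uses only the two discordant cells. The sketch below applies the adjusted form $(|b-c|-1)^2/(b+c)$ when $b + c$ is not above 40 and the unadjusted form $(b-c)^2/(b+c)$ otherwise; the function name and the `cutoff` parameter are hypothetical.

```python
def paired_chi_square(b, c, cutoff=40):
    """Chi-square for a paired 2x2 table from the discordant counts b and c.

    Uses the unadjusted form when b + c > cutoff, else the adjusted form.
    """
    total = b + c
    if total == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    if total > cutoff:
        return (b - c) ** 2 / total
    return (abs(b - c) - 1) ** 2 / total


chi2_paired = paired_chi_square(2, 11)  # Example 10-10: b = 2, c = 11
print(f"chi2 = {chi2_paired:.2f}")
print("reject H0" if chi2_paired > 3.84 else "do not reject H0")
```

With $b + c = 13$ the adjusted form is used, giving $64/13 \approx 4.92$, the value quoted for Example 10-10.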

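(4) Chi-square test for R×C table

To calculate the theoretical frequencies: $T_{RC} = \dfrac{n_R\, n_C}{n}$, where $n_R$ and $n_C$ are the row and column totals.
To compare A and T by the statistic $\chi^2$: $\chi^2 = \sum \dfrac{(A - T)^2}{T}$, $\nu = (R - 1)(C - 1)$
Specific formula: $\chi^2 = n\left(\sum \dfrac{A^2}{n_R\, n_C} - 1\right)$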

Caution:
(1) Both 2×2 tables and R×C tables are called contingency tables. A 2×2 table is a special case of an R×C table.
(2) When R > 2, "$H_0$ is rejected" only means that there is a difference among some groups. It does not necessarily mean that all the groups are different.
(3) The $\chi^2$ test requires a large sample. By experience, the theoretical frequencies should be greater than 5 in more than 4/5 of the cells, and the theoretical frequency in any cell should be greater than 1. Otherwise, we cannot use the chi-square test directly.

If the above requirements are violated, what should we do?
(1) Increase the sample size.
(2) Re-organize the categories: pool some categories, or cancel some categories.
Think: in fact, it is not appropriate to use a chi-square test for Example [number missing] in the textbook. Why?
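The R×C statistic and the large-sample check in caution (3) can be sketched together. The function names are hypothetical, and "more than 4/5 of the cells" is interpreted here as "at least 4/5", which is an assumption. Because a 2×2 table is a special case, the sketch is exercised on the tables of Examples 10-8 and 10-9.

```python
def expected_frequencies(table):
    """Theoretical frequency of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def rxc_chi_square(table):
    """Chi-square via n * (sum A^2 / (n_R * n_C) - 1); returns (chi2, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(
        table[i][j] ** 2 / (row_totals[i] * col_totals[j])
        for i in range(len(table))
        for j in range(len(col_totals))
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return n * (s - 1), df


def large_sample_ok(table):
    """Check caution (3): T > 5 in at least 4/5 of cells (assumed reading)
    and T > 1 in every cell."""
    flat = [t for row in expected_frequencies(table) for t in row]
    share_above_5 = sum(t > 5 for t in flat) / len(flat)
    return share_above_5 >= 0.8 and min(flat) > 1


example_10_8 = [[68, 6], [52, 11]]
example_10_9 = [[28, 2], [12, 4]]
chi2_rxc, df_rxc = rxc_chi_square(example_10_8)
print(f"chi2 = {chi2_rxc:.2f}, df = {df_rxc}, ok = {large_sample_ok(example_10_8)}")
print(f"Example 10-9 ok = {large_sample_ok(example_10_9)}")
```

When the check fails, the statistic should not be compared with the chi-square critical value directly; the options are the ones listed in the text (larger sample, pooled or dropped categories).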