
1 Bandit Thinkhamrop, PhD. (Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University, THAILAND

2 Statistical inference revisited Statistical inference uses data from samples to make inferences about a population: 1. Estimate the population parameter, characterized by a confidence interval for the magnitude of the effect of interest. 2. Test a hypothesis formulated before looking at the data, characterized by a p-value.

3 Sample: n = 25, X̄ = 52, SD = 5. From this sample we draw inferences about the population in two ways: parameter estimation [95% CI] and hypothesis testing [P-value].

4 Sample: n = 25, X̄ = 52, SD = 5, SE = 1. Parameter estimation [95% CI]: 52 − 1.96(1) to 52 + 1.96(1), i.e. 50.04 to 53.96. We are 95% confident that the population mean lies between 50.04 and 53.96. (Critical values: Z = 1.64 for 90%, Z = 1.96 for 95%, Z = 2.58 for 99% confidence.)
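A minimal Stata sketch of this interval, assuming the normal critical value 1.96 used on the slide (the t-based interval shown on a later slide is slightly wider):
. di 52 - 1.96*(5/sqrt(25))
. di 52 + 1.96*(5/sqrt(25))
The first line gives the lower limit 50.04 and the second the upper limit 53.96.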

5 Sample: n = 25, X̄ = 52, SD = 5, SE = 1. Hypothesis testing: H0: μ = 55, HA: μ ≠ 55. Z = (55 − 52)/1 = 3.

6 Hypothesis testing: H0: μ = 55, HA: μ ≠ 55. Z = (55 − 52)/1 = 3. If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is 0.0027: P-value = 1 − 0.9973 = 0.0027. (The observed mean of 52 lies 3 SE away from the hypothesized mean of 55.)
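A hedged sketch of the same two-sided p-value in Stata, using the standard normal CDF:
. di 2*(1 - normal(3))
This returns approximately 0.0027, matching the slide.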

7 Calculation of the previous example based on the t-distribution. Stata command to find the probability:
. di (ttail(24, 3))*2
.00620574
Stata command to find the t value for the 95% CL:
. di invttail(24, 0.025)
2.0638986
Web-based statistical tables: http://vassarstats.net/tabs.html or www.stattrek.com

8 Revisit the example based on the t-distribution (Stata output)
1. Estimate the population parameter:
Variable |  Obs   Mean   Std. Err.   [95% Conf. Interval]
       x |   25     52           1     49.9361    54.0639
2. Test the hypothesis formulated before looking at the data:
One-sample t test
Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
       x |   25     52           1           5     49.9361    54.0639
mean = mean(x)                                   t = -3.0000
Ho: mean = 55                       degrees of freedom = 24
Ha: mean < 55            Ha: mean != 55            Ha: mean > 55
Pr(T < t) = 0.0031    Pr(|T| > |t|) = 0.0062    Pr(T > t) = 0.9969
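The output above can presumably be reproduced without raw data using Stata's immediate one-sample t test (arguments: n, sample mean, SD, hypothesized mean):
. ttesti 25 52 5 55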

9 Mean one group: T-test
1. Hypothesis: H0: μ = 0; Ha: μ ≠ 0
2. Data: 1, 2, 2, 5, 5
3. Calculate the t-statistic
4. Obtain the p-value based on the t-distribution
5. Make a decision: P-value = 0.023. Reject the null hypothesis at the 0.05 level of significance. The mean of y is statistically significantly different from zero.
Stata command:
. di (ttail(4, 3.59))*2
.02296182

10 Mean one group: T-test (cont.)
One-sample t test
Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
       y |    5      3      .83666    1.870829    .6770594    5.322941
mean = mean(y)                                   t = 3.5857
Ho: mean = 0                        degrees of freedom = 4
Ha: mean < 0             Ha: mean != 0             Ha: mean > 0
Pr(T < t) = 0.9885    Pr(|T| > |t|) = 0.0231    Pr(T > t) = 0.0115
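A hedged sketch of reproducing this output by entering the five observations and running the one-sample t test (the variable name y matches the output above):
. clear
. input y
1
2
2
5
5
end
. ttest y == 0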

11 One sample t-test using SPSS Please do the data analysis using SPSS and paste the results here.

12 Comparing 2 means: T-test
1. Hypothesis: H0: μA = μB; Ha: μA ≠ μB
2. Data: a = 1, 2, 2, 5, 5; b = 5, 9, 9, 8, 9
3. Calculate the t-statistic
4. Obtain the p-value based on the t-distribution
5. Make a decision: P-value = 0.002 (http://vassarstats.net/tabs.html). Reject the null hypothesis at the 0.05 level of significance. The mean of group A is statistically significantly different from that of group B.

13 T-test
Two-sample t test with equal variances
   Group |  Obs   Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
       a |    5      3       .83666    1.870829    .6770594    5.322941
       b |    5      8     .7745967    1.732051    5.849375    10.15063
combined |   10    5.5     .9916317    3.135815    3.256773    7.743227
    diff |          -5     1.140175               -7.629249   -2.370751
diff = mean(1) - mean(2)                          t = -4.3853
Ho: diff = 0                         degrees of freedom = 8
Ha: diff < 0             Ha: diff != 0             Ha: diff > 0
Pr(T < t) = 0.0012    Pr(|T| > |t|) = 0.0023    Pr(T > t) = 0.9988
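A hedged sketch of reproducing this output with the data in long format (the variable names y and group are assumptions):
. clear
. input y group
1 1
2 1
2 1
5 1
5 1
5 2
9 2
9 2
8 2
9 2
end
. ttest y, by(group)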

14 Two independent sample t-test using SPSS Please do the data analysis using SPSS and paste the results here.

15 Mann-Whitney U test Wilcoxon rank-sum test Two-sample Wilcoxon rank-sum (Mann-Whitney) test group | obs rank sum expected -------------+--------------------------------- 1 | 5 16 27.5 2 | 5 39 27.5 -------------+--------------------------------- combined | 10 55 55 unadjusted variance 22.92 adjustment for ties -1.25 ---------- adjusted variance 21.67 Ho: y(group==1) = y(group==2) z = -2.471 Prob > |z| = 0.0135
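Assuming the same long-format data (y and group) as in the t-test sketch above, this output presumably comes from Stata's rank-sum command:
. ranksum y, by(group)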

16 Mann-Whitney U test Wilcoxon rank-sum test using SPSS Please do the data analysis using SPSS and paste the results here.

17 Comparing 2 means: ANOVA
Mathematical model of ANOVA: X = μ + τ + ε, i.e. X = Grand mean + Treatment effect + Error, or X = M + T + E.
3. Calculate the F-statistic
4. Obtain the p-value based on the F-distribution
5. Make a decision: P-value = 0.002 (http://vassarstats.net/tabs.html). Reject the null hypothesis at the 0.05 level of significance. The mean of group A is statistically significantly different from that of group B.
Decomposition: grand mean 5.5; treatment effects are the group means minus the grand mean, [3 − 5.5] and [8 − 5.5]; the remainder is error. The sums of squares split into SST (between groups) and SSE (within groups), with degrees of freedom 1 (between groups) and 8 (within groups).

18 ANOVA 2 groups Analysis of Variance Source SS df MS F Prob > F ------------------------------------------------------------------------ Between groups 62.5 1 62.5 19.23 0.0023 Within groups 26 8 3.25 ------------------------------------------------------------------------ Total 88.5 9 9.83333333 Bartlett's test for equal variances: chi2(1) = 0.0211 Prob>chi2 = 0.885
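Assuming the same two-group data (y, group), this ANOVA table can presumably be reproduced with:
. oneway y group, tabulate
The tabulate option also lists the group means alongside the ANOVA table.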

19 Comparing 3 means: ANOVA
1. Hypothesis: H0: μA = μB = μC; Ha: at least one mean is different
2. Data: a = 1, 2, 2, 5, 5; b = 5, 9, 9, 8, 9; c = 4, 4, 6, 8, 4

20 ANOVA 3 groups (cont.)
Mathematical model of ANOVA: X = μ + τ + ε, i.e. X = Grand mean + Treatment effect + Error, or X = M + T + E.
3. Calculate the F-statistic
4. Obtain the p-value based on the F-distribution
5. Make a decision: P-value = 0.003 (http://vassarstats.net/tabs.html). Reject the null hypothesis at the 0.05 level of significance. At least one mean of the three groups is statistically significantly different from the others.
Decomposition: group means 3, 8, and 5.2; grand mean 5.4; treatment effects [3 − 5.4], [8 − 5.4], and [5.2 − 5.4]; the remainder is error. The sums of squares split into SST (between groups) and SSE (within groups); with n = 15, the degrees of freedom are 2 (between groups) and 12 (within groups).

21 ANOVA 3 groups Analysis of Variance Source SS df MS F Prob > F ------------------------------------------------------------------------ Between groups 62.8 2 31.4 9.71 0.0031 Within groups 38.8 12 3.23333333 ------------------------------------------------------------------------ Total 101.6 14 7.25714286 Bartlett's test for equal variances: chi2(2) = 0.0217 Prob>chi2 = 0.989
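A hedged sketch for the three-group case, entering the data in long format (group coded 1, 2, 3; variable names are assumptions) and running the one-way ANOVA:
. clear
. input y group
1 1
2 1
2 1
5 1
5 1
5 2
9 2
9 2
8 2
9 2
4 3
4 3
6 3
8 3
4 3
end
. oneway y group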

22 ANOVA 3 groups using SPSS Please do the data analysis using SPSS and paste the results here.

23 Kruskal-Wallis test Kruskal-Wallis equality-of-populations rank test +------------------------+ | group | Obs | Rank Sum | |-------+-----+----------| | 1 | 5 | 22.00 | | 2 | 5 | 61.50 | | 3 | 5 | 36.50 | +------------------------+ chi-squared = 7.985 with 2 d.f. probability = 0.0185 chi-squared with ties = 8.190 with 2 d.f. probability = 0.0167
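Assuming the same three-group data as in the ANOVA sketch above, this output presumably comes from:
. kwallis y, by(group)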

24 Kruskal-Wallis test using SPSS Please do the data analysis using SPSS and paste the results here.

25 Comparing 2 means: Regression
Data: group a (x = 1): y = 1, 2, 2, 5, 5; group b (x = 2): y = 5, 9, 9, 8, 9
1. Worked table:
 x   y   (x-x̄)   (x-x̄)²   (y-ȳ)   (x-x̄)(y-ȳ)
 1   1    -0.5     0.25     -4.5       2.25
 1   2    -0.5     0.25     -3.5       1.75
 1   2    -0.5     0.25     -3.5       1.75
 1   5    -0.5     0.25     -0.5       0.25
 1   5    -0.5     0.25     -0.5       0.25
 2   5     0.5     0.25     -0.5      -0.25
 2   9     0.5     0.25      3.5       1.75
 2   9     0.5     0.25      3.5       1.75
 2   8     0.5     0.25      2.5       1.25
 2   9     0.5     0.25      3.5       1.75
Mean: x̄ = 1.5, ȳ = 5.5; Sum: Σ(x-x̄)² = 2.5, Σ(x-x̄)(y-ȳ) = 12.5
y = a + bx, where b = 12.5/2.5 = 5; then 5.5 = a + 5(1.5), thus a = 5.5 − 7.5 = −2.

26 Comparing 2 means: Regression — [Scatter plot of y against x showing the intercept a and slope b of the fitted line, for the data a = 1, 2, 2, 5, 5 and b = 5, 9, 9, 8, 9.]

27 Comparing 2 means: Regression (cont.) — [The same scatter plot with x coded 1 and 2; (x, y) pairs: (1,1), (1,2), (1,2), (1,5), (1,5), (2,5), (2,9), (2,9), (2,8), (2,9).]

28 Comparing 2 means: Regression (cont.) — In the fitted model y = a + bx, b is the difference in mean y between x = 1 and x = 2, and a is the intercept. Fitted values: y = 3 when x = 1, y = 8 when x = 2, and y = −2 when x = 0, so the fitted line is y = −2 + 5x; it passes through the point of means (x̄, ȳ) = (1.5, 5.5).

29 Regression model (2 means) Source | SS df MS Number of obs = 10 -------------+------------------------------ F( 1, 8) = 19.23 Model | 62.5 1 62.5 Prob > F = 0.0023 Residual | 26 8 3.25 R-squared = 0.7062 -------------+------------------------------ Adj R-squared = 0.6695 Total | 88.5 9 9.83333333 Root MSE = 1.8028 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- group | 5 1.140175 4.39 0.002 2.370751 7.629249 _cons | -2 1.802776 -1.11 0.299 -6.157208 2.157208 ------------------------------------------------------------------------------
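Assuming the two-group data with the grouping code 1/2 stored in a variable named group, this output presumably comes from regressing y on that code:
. regress y group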

30 Regression model (3 means) i.group _Igroup_1-3 (naturally coded; _Igroup_1 omitted) Source | SS df MS Number of obs = 15 -------------+------------------------------ F( 2, 12) = 9.71 Model | 62.8 2 31.4 Prob > F = 0.0031 Residual | 38.8 12 3.23333333 R-squared = 0.6181 -------------+------------------------------ Adj R-squared = 0.5545 Total | 101.6 14 7.25714286 Root MSE = 1.7981 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _Igroup_2 | 5 1.137248 4.40 0.001 2.522149 7.477851 _Igroup_3 | 2.2 1.137248 1.93 0.077 -.2778508 4.677851 _cons | 3 .8041559 3.73 0.003 1.247895 4.752105 ------------------------------------------------------------------------------
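The _Igroup_ names suggest the older xi: prefix for indicator coding; a hedged sketch, assuming the three-group data with group coded 1 to 3 (in current Stata, factor-variable notation regress y i.group fits the same model):
. xi: regress y i.group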

31 Correlation coefficient
Pearson product-moment correlation
– Denoted by r (for the sample) or ρ (for the population)
– Requires the bivariate normal distribution assumption
– Requires a linear relationship
Spearman rank correlation
– Suitable for small samples; does not require the bivariate normal distribution assumption

32 Regression model using SPSS Please do the data analysis using SPSS and paste the results here.

33 Pearson product-moment correlation: r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²], or equivalently r = Σ(z_x z_y)/(n − 1). Indeed, it is the mean of the products of the standard scores.

34 Scatter plot — [Scatter plot of b against a for the five pairs (a, b): (1,5), (2,9), (2,9), (5,8), (5,9).]

35 Calculation of the correlation coefficient (r)
[1] x   [2] y   [3] (x-x̄)/SDx   [4] (y-ȳ)/SDy   [3] × [4]
  1       5        -1.07           -1.73           1.85
  2       9        -0.53            0.58          -0.31
  2       9        -0.53            0.58          -0.31
  5       8         1.07            0.00           0.00
  5       9         1.07            0.58           0.62
Mean      3       8                         Sum    1.85
SD        1.87    1.73
r = 1.85/(n − 1) = 1.85/4 ≈ 0.46
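A hedged sketch of obtaining both correlation coefficients in Stata, assuming the five (x, y) pairs above are entered as variables x and y:
. clear
. input x y
1 5
2 9
2 9
5 8
5 9
end
. pwcorr x y, sig
. spearman x y
pwcorr reports the Pearson r with its p-value; spearman reports the rank correlation used on a later slide.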

36 Interpretation of the correlation coefficient
Correlation   Negative           Positive
None          −0.09 to 0.00      0.00 to 0.09
Small         −0.30 to −0.10     0.10 to 0.30
Medium        −0.50 to −0.30     0.30 to 0.50
Strong        −1.00 to −0.50     0.50 to 1.00
These serve as a guide, not a strict rule. In fact, the interpretation of a correlation coefficient depends on the context and purposes. (From Wikipedia, the free encyclopedia)

37 The correlation coefficient reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). The figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero. (Figure from Wikimedia Commons.)

38 Inference on the correlation coefficient: back-transforming the Fisher z confidence limits. Stata commands:
. di tanh(-0.885)
-.70891534
. di tanh(1.887)
.95511058
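A hedged sketch of the full Fisher-transformation interval, assuming r = 0.46 and n = 5, so the limits on the transformed scale are z′ ± 1.96/√(n − 3) (roughly where −0.885 and 1.887 come from):
. di 0.5*ln((1+0.46)/(1-0.46))
. di tanh(0.5*ln((1+0.46)/(1-0.46)) - 1.96/sqrt(2))
. di tanh(0.5*ln((1+0.46)/(1-0.46)) + 1.96/sqrt(2))
The first line gives z′ ≈ 0.50; the last two back-transform the limits to roughly −0.71 and 0.95 on the correlation scale.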

39 Stata command: ci2 x y, corr spearman
Confidence interval for Spearman's rank correlation of x and y, based on Fisher's transformation.
Correlation = 0.354 on 5 observations (95% CI: -0.768 to 0.942)
Warning: This method may not give valid results with small samples (n <= 10) for rank correlations.

40 Inference on the correlation coefficient. Or use the Stata command:
. di (ttail(3, 0.9))*2
.43445103
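The 0.9 is presumably the t statistic for testing ρ = 0, t = r√(n − 2)/√(1 − r²), with r = 0.46 and n = 5; a hedged check in Stata:
. di 0.46*sqrt(5-2)/sqrt(1-0.46^2)
This returns approximately 0.90, with n − 2 = 3 degrees of freedom.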

41 Inference on correlation coefficient using SPSS Please do the data analysis using SPSS and paste the results here.

42 Inference on proportion One proportion Two proportions Three or more proportions

43 One proportion: Z-test
1. Hypothesis: H0: π1 = 0; Ha: π1 ≠ 0
2. Data: y is a 0/1 variable (1, 0, 1, ..., 0); n_y = 50, p_y = 0.1
3. Calculate the z-statistic
4. Obtain the p-value based on the Z-distribution
5. Make a decision: P-value = 0.018 (http://vassarstats.net/tabs.html). Reject the null hypothesis at the 0.05 level of significance. The proportion of Y is statistically significantly different from zero.
Stata command to get the p-value:
. di (1-normal(2.357))*2
.01842325
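A hedged sketch of the z statistic itself, using the estimated standard error √(p(1 − p)/n), which is what the quoted p-value implies:
. di (0.1 - 0)/sqrt(0.1*0.9/50)
This returns approximately 2.357.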

44 Comparing 2 proportions: Z-test
1. Hypothesis: H0: π1 = π0; Ha: π1 ≠ π0
2. Data: paired 0/1 columns x and y (1 1; 1 0; 0 1; ...; 1 0), summarized as n0 = 50, p0 = 0.1 and n1 = 50, p1 = 0.4, or as the table:
x \ y      0     1   Total
  0       45     5      50
  1       30    20      50
Total     75    25     100
3. Calculate the z-statistic
4. Obtain the p-value based on the Z-distribution
5. Make a decision: P-value = 0.0005 (http://vassarstats.net/tabs.html). Reject the null hypothesis at the 0.05 level of significance. The proportion of Y differs statistically significantly between the two groups of x.

45 Z-test for two proportions
Two-sample test of proportions         0: Number of obs = 50
                                       1: Number of obs = 50
Variable |   Mean   Std. Err.      z    P>|z|   [95% Conf. Interval]
       0 |     .1    .0424264                    .0168458    .1831542
       1 |     .4     .069282                    .2642097    .5357903
    diff |    -.3    .0812404                   -.4592282   -.1407718
         | under Ho:  .0866025   -3.46   0.001
diff = prop(0) - prop(1)                          z = -3.4641
Ho: diff = 0
Ha: diff < 0             Ha: diff != 0             Ha: diff > 0
Pr(Z < z) = 0.0003    Pr(|Z| > |z|) = 0.0005    Pr(Z > z) = 0.9997
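This output can presumably be reproduced from the summary figures with Stata's immediate two-sample test of proportions (or with prtest y, by(x) on the raw data):
. prtesti 50 0.1 50 0.4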

46 Comparing 2 proportions: Chi-square test
1. Hypothesis: H0: π_ij = π_i+ × π_+j, where i = 0, 1 and j = 0, 1; Ha: π_ij ≠ π_i+ × π_+j
2. Data:
x \ y      0     1   Total
  0       45     5      50
  1       30    20      50
Total     75    25     100
3. Calculate the χ² statistic:
 O    E                        (O-E)   (O-E)²   (O-E)²/E
45    (75/100) × 50 = 37.50     7.50    56.25       1.50
 5    (25/100) × 50 = 12.50    -7.50    56.25       4.50
30    (75/100) × 50 = 37.50    -7.50    56.25       1.50
20    (25/100) × 50 = 12.50     7.50    56.25       4.50
Chi-square (df = 1) = 12.00
4. Obtain the p-value based on the chi-square distribution
5. Make a decision: P-value = 0.001 (http://vassarstats.net/tabs.html). Reject the null hypothesis at the 0.05 level of significance. There is a statistically significant association between x and y.

47 Comparing 2 proportions: Chi-square-test | y x | 0 1 | Total -----------+----------------------+---------- 0 | 45 5 | 50 1 | 30 20 | 50 -----------+----------------------+---------- Total | 75 25 | 100 Pearson chi2(1) = 12.0000 Pr = 0.001
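Without the raw data, the same cross-tabulation and test can presumably be obtained from the cell counts with Stata's immediate tabulate command:
. tabi 45 5 \ 30 20, chi2 exact
The chi2 option gives the Pearson chi-square; exact adds Fisher's exact test, as on the next slide.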

48 csi 20 5 30 45, or exact | Exposed Unexposed | Total -----------------+------------------------+------------ Cases | 20 5 | 25 Noncases | 30 45 | 75 -----------------+------------------------+------------ Total | 50 50 | 100 | | Risk |.4.1 |.25 | | | Point estimate | [95% Conf. Interval] |------------------------+------------------------ Risk difference |.3 |.1407718.4592282 Risk ratio | 4 | 1.62926 9.820408 Attr. frac. ex. |.75 |.3862245.8981712 Attr. frac. pop |.6 | Odds ratio | 6 | 2.086602 17.09265 (Cornfield) +------------------------------------------------- 1-sided Fisher's exact P = 0.0005 2-sided Fisher's exact P = 0.0010

49 Binomial regression
. binreg y x, rr
Generalized linear models No. of obs = 100 Optimization : MQL Fisher scoring Residual df = 98 (IRLS EIM) Scale parameter = 1 Deviance = 99.80946404 (1/df) Deviance = 1.018464 Pearson = 99.99966753 (1/df) Pearson = 1.020405 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = ln(u) [Log] BIC = -351.4972 ------------------------------------------------------------------------------ | EIM y | Risk Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 4 1.833024 3.03 0.002 1.629265 9.820377 _cons | .1 .0424262 -5.43 0.000 .0435379 .2296851 ------------------------------------------------------------------------------

50 Logistic regression
. logistic y x
Logistic regression Number of obs = 100 LR chi2(1) = 12.66 Prob > chi2 = 0.0004 Log likelihood = -49.904732 Pseudo R2 = 0.1125 ------------------------------------------------------------------------------ y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 6 3.316625 3.24 0.001 2.030635 17.72844 _cons | .1111111 .0523783 -4.66 0.000 .044106 .2799096 ------------------------------------------------------------------------------


