1 Common Statistical Analyses: Theory behind them
Bandit Thinkhamrop, PhD (Statistics), Department of Biostatistics and Demography, Faculty of Public Health, Khon Kaen University, THAILAND

2 Statistical inference revisited
Statistical inference uses data from a sample to draw conclusions about a population. It takes two forms:
1. Estimation of the population parameter, characterized by a confidence interval for the magnitude of the effect of interest
2. Testing of a hypothesis formulated before looking at the data, characterized by a p-value

3 Sample: n = 25, X̄ = 52, SD = 5 → Population
Parameter estimation [95% CI]
Hypothesis testing [P-value]

4 Sample: n = 25, X̄ = 52, SD = 5, SE = SD/√n = 1 → Population: Parameter estimation
Common critical values: Z = 2.58 (99%), Z = 1.96 (95%), Z = 1.64 (90%)
Parameter estimation [95% CI]: 52 - 1.96(1) to 52 + 1.96(1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.
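The same limits can be checked directly in Stata. A minimal sketch from the summary statistics above (cii means is the immediate confidence-interval command in recent Stata versions; it uses t rather than Z, so its interval is slightly wider):
. di 52 - 1.96*(5/sqrt(25))   // lower limit: 50.04
. di 52 + 1.96*(5/sqrt(25))   // upper limit: 53.96
. cii means 25 52 5           // t-based 95% CI: 49.94 to 54.06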

5 Sample: n = 25, X̄ = 52, SD = 5, SE = 1 → Population: Hypothesis testing
H0: μ = 55   HA: μ ≠ 55
Z = (55 - 52)/1 = 3

6 Hypothesis testing
H0: μ = 55   HA: μ ≠ 55
The sample mean 52 lies 3 SE below the hypothesized mean 55 (at -3 SE).
If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or more extreme is
Z = (55 - 52)/1 = 3, so P-value = 2 × P(Z ≥ 3) = 0.0027
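The tail probability can be reproduced in Stata, where normal() is the standard normal CDF:
. di 2*(1 - normal(3))   // 0.0027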

7 Calculation of the previous example based on the t-distribution
Stata command to find the t value for a 95% CI (df = n - 1 = 24):
. di invttail(24, 0.025)   // 2.0639
Stata command to find the two-sided probability for t = 3:
. di (ttail(24, 3))*2   // 0.0062
Web-based statistical tables give the same values.

8 Revisit the example based on the t-distribution (Stata output)
1. Estimate the population parameter:
Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
x        |   25     52        1           5        49.94     54.06
2. Test the hypothesis formulated before looking at the data:
One-sample t test
mean = mean(x)                          t = -3.00
Ho: mean = 55          degrees of freedom = 24
Ha: mean < 55          Ha: mean != 55             Ha: mean > 55
Pr(T < t) = 0.0031     Pr(|T| > |t|) = 0.0062     Pr(T > t) = 0.9969
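This whole output can be reproduced from the summary statistics alone with the immediate command ttesti (arguments: n, sample mean, SD, hypothesized mean):
. ttesti 25 52 5 55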

9 Mean one group: T-test
1. Hypothesis: H0: μ = 0   Ha: μ ≠ 0
2. Data: y = 1, 2, 5, 2, 5 (n = 5, mean 3, SD 1.87)
3. Calculate the t statistic: t = (ȳ - 0)/(SD/√n) = 3/0.84 = 3.59
4. Obtain the p-value based on the t-distribution (df = n - 1 = 4). Stata command:
. di (ttail(4, 3.59))*2   // P-value = 0.023
5. Make a decision: reject the null hypothesis at the 0.05 significance level. The mean of y is statistically significantly different from zero.

10 Mean one group: T-test (cont.)
One-sample t test
Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
y        |    5      3      0.84        1.87        0.68      5.32
mean = mean(y)                          t = 3.59
Ho: mean = 0           degrees of freedom = 4
Ha: mean < 0           Ha: mean != 0              Ha: mean > 0
Pr(T < t) = 0.9885     Pr(|T| > |t|) = 0.0230     Pr(T > t) = 0.0115

11 Comparing 2 means: T-test
1. Hypothesis: H0: μA = μB   Ha: μA ≠ μB
2. Data:
a: 1 2 5 2 5   (mean 3)
b: 5 9 8 9 9   (mean 8)
3. Calculate the t statistic: t = (x̄A - x̄B)/SE of the difference
4. Obtain the p-value based on the t-distribution: P-value = 0.002
5. Make a decision: reject the null hypothesis at the 0.05 significance level. The mean of group A is statistically significantly different from that of group B.
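Assuming the data are stored in long form, an outcome y and a group indicator group (the variable names are this sketch's assumption), the output on the next slide comes from:
. ttest y, by(group)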

12 T-test: Two-sample t test with equal variances
Group    |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
a        |    5      3      0.84        1.87        0.68      5.32
b        |    5      8      0.77        1.73        5.85     10.15
combined |   10    5.5      0.99        3.14        3.26      7.74
diff     |         -5      1.14                    -7.63     -2.37
diff = mean(a) - mean(b)               t = -4.39
Ho: diff = 0           degrees of freedom = 8
Ha: diff < 0           Ha: diff != 0              Ha: diff > 0
Pr(T < t) = 0.0012     Pr(|T| > |t|) = 0.0023     Pr(T > t) = 0.9988

13 Mann-Whitney U test (Wilcoxon rank-sum test)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
group    |  obs   rank sum   expected
a        |    5         16       27.5
b        |    5         39       27.5
combined |   10         55       55
unadjusted variance 22.92, adjustment for ties -1.25, adjusted variance 21.67
Ho: y(group==a) = y(group==b)
z = -2.47   Prob > |z| = 0.0135
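With the same long-form layout assumed above (outcome y, indicator group), this test is:
. ranksum y, by(group)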

14 Comparing 2 means: ANOVA
Mathematical model of ANOVA: X = μ + τ + ε
X = Grand mean + Treatment effect + Error, i.e., X = M + T + E
Treatment effects (group mean - grand mean): [3 - 5.5] and [8 - 5.5]
Degrees of freedom: between groups = k - 1 = 1; within groups = N - k = 8
3. Calculate the F statistic: F = (SST/df between)/(SSE/df within), where SST is the between-groups (treatment) sum of squares and SSE the within-groups (error) sum of squares
4. Obtain the p-value based on the F-distribution: P-value = 0.002
5. Make a decision: reject the null hypothesis at the 0.05 significance level. The mean of group A is statistically significantly different from that of group B.

15 ANOVA 2 groups
Analysis of Variance
Source           SS     df     MS       F     Prob > F
Between groups   62.5    1    62.50   19.23    0.0023
Within groups    26.0    8     3.25
Total            88.5    9
Bartlett's test for equal variances: chi2(1) = 0.02   Prob>chi2 = 0.885
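Assuming the same long-form layout (outcome y, indicator group), oneway reproduces this table, including Bartlett's test:
. oneway y group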

16 Comparing 3 means: ANOVA
1. Hypothesis: H0: μA = μB = μC   Ha: at least one mean is different
2. Data:
a: 1 2 5 2 5   (mean 3)
b: 5 9 8 9 9   (mean 8)
c: 4 6 4 4 8   (mean 5.2)

17 ANOVA 3 groups (cont.)
Mathematical model of ANOVA: X = μ + τ + ε
X = Grand mean + Treatment effect + Error, i.e., X = M + T + E
Treatment effects (group mean - grand mean): [3 - 5.4], [8 - 5.4], [5.2 - 5.4]
Degrees of freedom: between groups = k - 1 = 2; within groups = N - k = 12
3. Calculate the F statistic: F = (SST/2)/(SSE/12), with SST between groups and SSE within groups
4. Obtain the p-value based on the F-distribution: P-value = 0.003
5. Make a decision: reject the null hypothesis at the 0.05 significance level. At least one mean of the three groups is statistically significantly different from the others.

18 ANOVA 3 groups
Analysis of Variance
Source           SS      df     MS      F     Prob > F
Between groups   62.8     2    31.40   9.71    0.0031
Within groups    38.8    12     3.23
Total           101.6    14
Bartlett's test for equal variances: chi2(2) = 0.02   Prob>chi2 = 0.989
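The same command handles three groups; the tabulate option additionally lists the group means and SDs:
. oneway y group, tabulate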

19 Kruskal-Wallis test
Kruskal-Wallis equality-of-populations rank test
group | Obs | Rank Sum
a     |   5 |     22.0
b     |   5 |     61.5
c     |   5 |     36.5
chi-squared = 7.99 with 2 d.f.   probability = 0.0185
chi-squared with ties = 8.19 with 2 d.f.   probability = 0.0167
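Again assuming outcome y and indicator group, this output comes from:
. kwallis y, by(group)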

20 Comparing 2 means: Regression
1. Data (group a coded x = 1, group b coded x = 2):
x: 1 1 1 1 1 2 2 2 2 2
y: 1 2 5 2 5 5 9 8 9 9
2. Work table: for each observation compute (x - x̄), (x - x̄)², (y - ȳ), and (x - x̄)(y - ȳ); for example, the first observation gives -0.5, 0.25, -4.5, and 2.25.
Mean: x̄ = 1.5, ȳ = 5.5   Sum: Σ(x - x̄)² = 2.5, Σ(x - x̄)(y - ȳ) = 12.5
y = a + bx, where b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² = 12.5/2.5 = 5; then 5.5 = a + 5(1.5), thus a = 5.5 - 7.5 = -2
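The hand calculation can be verified in Stata. A sketch assuming the ten observations are stored as y with group code x (1 or 2):
. regress y x
. di _b[_cons] + _b[x]*1   // fitted value at x = 1: the group A mean, 3
. di _b[_cons] + _b[x]*2   // fitted value at x = 2: the group B mean, 8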

21 Comparing 2 means: Regression
[Scatter plot of the group data (a at x = 1, b at x = 2; y axis 0 to 10) with the fitted line y = -2 + 5x; the intercept -2 lies below the origin]

22 Comparing 2 means: Regression (cont.)
[The same scatter plot, with y plotted against the group code x = 1, 2]

23 Comparing 2 means: Regression (cont.)
y = a + bx = -2 + 5x
At x = 1, y = 3 (mean of group A); at x = 2, y = 8 (mean of group B).
The line passes through (x̄, ȳ) = (1.5, 5.5).
b = the difference in y between x = 1 and x = 2.
a = the value of y at x = 0, here -2.

24 Regression model (2 means)
Source   |     SS   df     MS        Number of obs =     10
Model    |   62.5    1   62.50      F(1, 8)       =  19.23
Residual |   26.0    8    3.25      Prob > F      = 0.0023
Total    |   88.5    9    9.83      R-squared     = 0.7062
                                    Adj R-squared = 0.6695
                                    Root MSE      = 1.8028
y     | Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
group |     5       1.14    4.39   0.002      2.37      7.63
_cons |    -2       1.80   -1.11   0.299     -6.16      2.16

25 Regression model (3 means)
. xi: regress y i.group
i.group   _Igroup_1-3   (naturally coded; _Igroup_1 omitted)
Source   |     SS   df     MS        Number of obs =     15
Model    |   62.8    2   31.40      F(2, 12)      =   9.71
Residual |   38.8   12    3.23      Prob > F      = 0.0031
Total    |  101.6   14    7.26      R-squared     = 0.6181
                                    Adj R-squared = 0.5545
                                    Root MSE      = 1.7981
y         | Coef.   Std. Err.     t    P>|t|   [95% Conf. Interval]
_Igroup_2 |     5       1.14    4.40   0.001      2.52      7.48
_Igroup_3 |   2.2       1.14    1.93   0.077     -0.28      4.68
_cons     |     3       0.80    3.73   0.003      1.25      4.75
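The xi: prefix reflects older Stata; recent versions fit the same model with factor-variable notation:
. regress y i.group
Either way, _cons is the mean of the omitted group 1 (3), and the group-2 and group-3 coefficients are the differences from it: 8 - 3 = 5 and 5.2 - 3 = 2.2.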

26 Correlation coefficient
Pearson product-moment correlation
Denoted by r (for the sample) or ρ (for the population)
Requires the bivariate normal distribution assumption
Requires a linear relationship
Spearman rank correlation
For small samples; does not require the bivariate normal distribution assumption

27 Pearson product-moment correlation
r = Σ[(x - x̄)/SDx][(y - ȳ)/SDy] / (n - 1)
Indeed, it is the mean of the products of the standard scores.
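This can be seen numerically in Stata. A sketch assuming paired variables x and y; egen's std() creates the standard scores:
. egen zx = std(x)
. egen zy = std(y)
. gen zxzy = zx*zy
. quietly summarize zxzy
. di r(sum)/(r(N) - 1)   // identical to the r reported by: corr x y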

28 Scatter plot
[Scatter plot of b (vertical axis, 2 to 10) against a (horizontal axis, 1 to 5) for the five paired observations: (1,5), (2,9), (5,8), (2,9), (5,9)]

29 Calculation for correlation coefficient (r)
[1] x   [2] y   [3] (x - x̄)/SDx   [4] (y - ȳ)/SDy   [3]*[4]
  1       5         -1.07             -1.73            1.85
  2       9         -0.53              0.58           -0.31
  5       8          1.07              0.00            0.00
  2       9         -0.53              0.58           -0.31
  5       9          1.07              0.58            0.62
Mean: 3, 8   SD: 1.87, 1.73
Sum of the products = 1.85, so r = 1.85/(n - 1) = 1.85/4 = 0.46

30 Interpretation of correlation coefficient
Strength   Negative          Positive
None       -0.09 to 0.00     0.00 to 0.09
Small      -0.30 to -0.10    0.10 to 0.30
Medium     -0.50 to -0.30    0.30 to 0.50
Strong     -1.00 to -0.50    0.50 to 1.00
These serve as a guide, not a strict rule. In fact, the interpretation of a correlation coefficient depends on the context and purposes. (From Wikipedia, the free encyclopedia)

31 The correlation coefficient reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). The figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero. (Image: Wikimedia Commons)

32 Inference on correlation coefficient
Use Fisher's transformation: z = arctanh(r) = ½ ln[(1 + r)/(1 - r)], which is approximately normal with SE = 1/√(n - 3).
Here r = 0.46 and n = 5, so z = 0.50 and the 95% CI on the transformed scale is 0.50 ± 1.96/√2, i.e., -0.885 to 1.887.
Back-transform the limits with tanh. Stata commands:
. di tanh(-0.885)   // -0.71
. di tanh(1.887)    // 0.95
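The whole interval can also be produced in one pass from r and n alone; a minimal sketch:
. di tanh(atanh(0.46) - invnormal(0.975)/sqrt(5 - 3))   // lower limit, about -0.71
. di tanh(atanh(0.46) + invnormal(0.975)/sqrt(5 - 3))   // upper limit, about 0.95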

33 Stata command (ci2 is a user-written command, available from SSC)
. ci2 x y, corr spearman
Confidence interval for Spearman's rank correlation of x and y, based on Fisher's transformation.
Correlation = 0.354 on 5 observations (95% CI: -0.768 to 0.942)
Warning: This method may not give valid results with small samples (n <= 10) for rank correlations.

34 Inference on correlation coefficient
Alternatively, test H0: ρ = 0 with t = r√[(n - 2)/(1 - r²)], which follows a t-distribution with n - 2 degrees of freedom; here t = 0.9 with 3 degrees of freedom.
Or use the Stata command:
. di (ttail(3, 0.9))*2   // P-value ≈ 0.44

35 Inference on proportion
One proportion
Two proportions
Three or more proportions

36 One proportion: Z-test
1. Hypothesis: H0: π = 0.2   Ha: π ≠ 0.2
2. Data: binary y (0/1), n = 50, p̂ = 0.1
3. Calculate the z statistic: z = (p̂ - π0)/√[p̂(1 - p̂)/n] = (0.1 - 0.2)/√(0.1 × 0.9/50) = -2.357
4. Obtain the p-value based on the Z-distribution. Stata command to get the p-value:
. di (1 - normal(2.357))*2   // P-value = 0.018
5. Make a decision: reject the null hypothesis at the 0.05 significance level. The proportion of y is statistically significantly different from 0.2.

37 Comparing 2 proportions: Z-test
1. Hypothesis: H0: π1 = π0   Ha: π1 ≠ π0
2. Data:
        y = 0   y = 1   Total
x = 0     45       5      50
x = 1     30      20      50
Total     75      25     100
n0 = 50, p0 = 0.1; n1 = 50, p1 = 0.4
3. Calculate the z statistic, using the pooled proportion p = 25/100 = 0.25:
z = (0.4 - 0.1)/√[0.25 × 0.75 × (1/50 + 1/50)] = 3.46
4. Obtain the p-value based on the Z-distribution: P-value = 0.0005
5. Make a decision: reject the null hypothesis at the 0.05 significance level. The proportions of y in the two groups of x are statistically significantly different from each other.

38 Z-test for two proportions
Two-sample test of proportions        0: Number of obs = 50
                                      1: Number of obs = 50
Variable | Mean   Std. Err.      z    P>|z|   [95% Conf. Interval]
0        |  0.1     0.0424
1        |  0.4     0.0693
diff     | -0.3     0.0812                      -0.459    -0.141
         | under Ho: 0.0866   -3.46   0.001
diff = prop(0) - prop(1)              z = -3.46
Ho: diff = 0
Ha: diff < 0           Ha: diff != 0              Ha: diff > 0
Pr(Z < z) = 0.0003     Pr(|Z| > |z|) = 0.0005     Pr(Z > z) = 0.9997

39 Comparing 2 proportions: Chi-square test
1. Hypothesis: H0: πij = πi+ π+j, where i = 0, 1; j = 0, 1 (x and y are independent)
Ha: πij ≠ πi+ π+j
2. Data (observed and expected counts):
O    E                   O-E     (O-E)²   (O-E)²/E
45   (75/100)50 = 37.50   7.50    56.25     1.50
5    (25/100)50 = 12.50  -7.50    56.25     4.50
30   (75/100)50 = 37.50  -7.50    56.25     1.50
20   (25/100)50 = 12.50   7.50    56.25     4.50
3. Calculate the χ² statistic: χ² = Σ(O - E)²/E = 12.00 (df = 1)
4. Obtain the p-value based on the chi-square distribution: P-value = 0.0005
5. Make a decision: reject the null hypothesis at the 0.05 significance level. There is a statistically significant association between x and y.
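Both the statistic and its p-value can be reproduced from the cell counts alone; tabi rebuilds the 2x2 table and chi2tail() gives the upper-tail probability:
. tabi 45 5 \ 30 20, chi2
. di chi2tail(1, 12)   // 0.0005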

40 Comparing 2 proportions: Chi-square test
           |        y
         x |      0       1 |  Total
         0 |     45       5 |     50
         1 |     30      20 |     50
     Total |     75      25 |    100
Pearson chi2(1) = 12.0000   Pr = 0.001

41 . csi 20 5 30 45, or exact
           | Exposed   Unexposed |  Total
Cases      |      20           5 |     25
Noncases   |      30          45 |     75
Total      |      50          50 |    100
Risk       |     0.4         0.1 |   0.25
                  Point estimate   [95% Conf. Interval]
Risk difference        0.3
Risk ratio             4.0
Attr. frac. ex.        0.75
Attr. frac. pop        0.6
Odds ratio             6.0   (Cornfield)
1-sided Fisher's exact P =
2-sided Fisher's exact P =
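The point estimates follow directly from the 2x2 counts, for example:
. di (20/50)/(5/50)     // risk ratio = 4
. di (20*45)/(5*30)     // odds ratio = 6
. di (20/50) - (5/50)   // risk difference = 0.3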

42 Binomial regression
. binreg y x, rr
Generalized linear models               No. of obs      = 100
Optimization: MQL Fisher scoring        Residual df     = 98
              (IRLS EIM)                Scale parameter = 1
Deviance =              (1/df) Deviance =
Pearson  =              (1/df) Pearson  =
Variance function: V(u) = u*(1-u)   [Bernoulli]
Link function:     g(u) = ln(u)     [Log]
BIC =
      |        EIM
y     | Risk Ratio   Std. Err.    z    P>|z|   [95% Conf. Interval]
x     |          4
_cons |        0.1

43 Logistic regression
. logistic y x
Logistic regression                 Number of obs = 100
                                    LR chi2(1)    =
                                    Prob > chi2   =
Log likelihood =                    Pseudo R2     =
y     | Odds Ratio   Std. Err.    z    P>|z|   [95% Conf. Interval]
x     |          6
_cons |
Note that with these data the odds ratio (6) is larger than the risk ratio (4): the outcome is common (25% overall), so the odds ratio overstates the risk ratio.

44 Q & A

