Bandit Thinkhamrop, PhD. (Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University, THAILAND.



Statistical inference revisited
Statistical inference uses data from a sample to draw conclusions about a population:
1. Estimate the population parameter, characterized by a confidence interval for the magnitude of the effect of interest.
2. Test a hypothesis formulated before looking at the data, characterized by a p-value.

From sample to population: n = 25, X̄ = 52, SD = 5
1. Parameter estimation [95% CI]
2. Hypothesis testing [P-value]

n = 25, X̄ = 52, SD = 5, SE = SD/√n = 1
Parameter estimation [95% CI]: 52 ± 1.96 × (1) = 50.04 to 53.96
We are 95% confident that the population mean lies between 50.04 and 53.96.
(Z = 1.64 for 90% confidence, Z = 1.96 for 95%, Z = 2.58 for 99%.)
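These limits can be checked directly in Stata; a minimal sketch, assuming only the summary figures above (mean 52, SE 1):
. di invnormal(0.975)                  // 1.96, the Z for 95% confidence
. di 52 - 1.96*1 " to " 52 + 1.96*1    // 95% CI: 50.04 to 53.96
. di 52 - 2.58*1 " to " 52 + 2.58*1    // 99% CI: 49.42 to 54.58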

n = 25, X̄ = 52, SD = 5, SE = 1
Hypothesis testing:
H0: μ = 55
HA: μ ≠ 55
Z = (52 − 55)/1 = −3

Hypothesis testing
H0: μ = 55
HA: μ ≠ 55
Z = (52 − 55)/1 = −3
If the true mean in the population is 55, the chance of obtaining a sample mean of 52 or one more extreme is 0.0027.
P-value = P(Z ≤ −3) + P(Z ≥ +3) = 0.0027
[Normal curve with the rejection regions beyond −3 SE and +3 SE]
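The two-sided P-value can be verified in Stata from the standard normal distribution function:
. di 2*(1 - normal(3))    // two-sided P for |Z| = 3: 0.0027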

Calculation of the previous example based on the t-distribution
Stata command to find the probability:
. di (ttail(24, 3))*2
≈ 0.0062
Stata command to find the t value for the 95% CL:
. di invttail(24, 0.025)
2.0639
A web-based statistical table can be used instead.

Revisit the example based on the t-distribution (Stata output)
1. Estimate the population parameter
2. Test the hypothesis formulated before looking at the data

One-sample t test
 Variable |  Obs   Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
        x |   25     52          1           5      49.9361    54.0639
 mean = mean(x)                                        t = -3.0000
 Ho: mean = 55                          degrees of freedom = 24
 Ha: mean < 55         Ha: mean != 55          Ha: mean > 55
 Pr(T < t) = 0.0031    Pr(|T| > |t|) = 0.0062   Pr(T > t) = 0.9969
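The same output can be reproduced without the raw data via Stata's immediate command, using only the summary figures (n = 25, mean 52, SD 5, hypothesized mean 55):
. ttesti 25 52 5 55    // immediate one-sample t test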

Mean, one group: t-test
1. Hypothesis: H0: μ = 0; Ha: μ ≠ 0
2. Data: y (n = 5)
3. Calculate the t-statistic: t = 3.59
4. Obtain the p-value from the t-distribution (df = 4)
5. Make a decision: P-value ≈ 0.023 < 0.05, so reject the null hypothesis at the 0.05 level of significance. The mean of y is statistically significantly different from zero.
Stata command:
. di (ttail(4, 3.59))*2
≈ 0.023

Mean, one group: t-test (cont.)
One-sample t test output for y (n = 5): mean = mean(y), t = 3.59, Ho: mean = 0, degrees of freedom = 4; Ha: mean != 0, two-sided Pr(|T| > |t|) ≈ 0.023.

One sample t-test using SPSS Please do the data analysis using SPSS and paste the results here.

Comparing 2 means: t-test
1. Hypothesis: H0: μA = μB; Ha: μA ≠ μB
2. Data: groups a and b (n = 5 each)
3. Calculate the t-statistic
4. Obtain the p-value from the t-distribution
5. Make a decision: P-value < 0.05, so reject the null hypothesis at the 0.05 level of significance. The mean of group A is statistically significantly different from that of group B.

t-test
Two-sample t test with equal variances: group a (Obs = 5, mean = 3), group b (Obs = 5, mean = 8), combined (Obs = 10, mean = 5.5); diff = mean(a) − mean(b) = −5, with its Std. Err. and 95% Conf. Interval; Ho: diff = 0, degrees of freedom = 8; Ha: diff < 0, Ha: diff != 0, Ha: diff > 0, each with its Pr.
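A minimal Stata sketch for producing this table, assuming the raw data are stored in a variable y with a group indicator group:
. ttest y, by(group)    // two-sample t test with equal variances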

Two independent sample t-test using SPSS Please do the data analysis using SPSS and paste the results here.

Mann-Whitney U test / Wilcoxon rank-sum test
Two-sample Wilcoxon rank-sum (Mann-Whitney) test: group 1 (obs = 5, expected rank sum = 27.5), group 2 (obs = 5, expected rank sum = 27.5), combined (obs = 10, rank sum = 55); unadjusted variance, adjustment for ties, adjusted variance; Ho: y(group==1) = y(group==2), with the z statistic and Prob > |z|.
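A minimal Stata sketch, again assuming variables y and group:
. ranksum y, by(group)    // Wilcoxon rank-sum (Mann-Whitney) test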

Mann-Whitney U test Wilcoxon rank-sum test using SPSS Please do the data analysis using SPSS and paste the results here.

Comparing 2 means: ANOVA
Mathematical model of ANOVA: X = μ + τ + ε, i.e. X = grand mean + treatment effect + error = M + T + E.
With group means 3 and 8, the grand mean is M = 5.5, so the treatment effects are 3 − 5.5 = −2.5 and 8 − 5.5 = +2.5; the treatment effects yield the between-group sum of squares (SST) and the errors yield the within-group sum of squares (SSE). Degrees of freedom: between groups = 1, within groups = 8.
3. Calculate the F-statistic
4. Obtain the p-value from the F-distribution
5. Make a decision: P-value < 0.05, so reject the null hypothesis at the 0.05 level of significance. The mean of group A is statistically significantly different from that of group B.

ANOVA, 2 groups
Analysis of variance: Between groups (SS = 62.5, df = 1, MS = 62.5), Within groups (df = 8), Total (df = 9), with the F statistic and Prob > F.
Bartlett's test for equal variances: chi2(1), Prob > chi2 = 0.885
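A minimal Stata sketch for the one-way ANOVA, assuming the same y and group variables; oneway also prints Bartlett's test by default:
. oneway y group    // one-way analysis of variance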

Comparing 3 means: ANOVA
1. Hypothesis: H0: μA = μB = μC; Ha: at least one mean differs
2. Data: groups a, b, and c (n = 5 each)

ANOVA, 3 groups (cont.)
Mathematical model of ANOVA: X = μ + τ + ε, i.e. X = grand mean + treatment effect + error = M + T + E.
With group means 3, 8, and 5.2, the grand mean is M = 5.4, so the treatment effects are 3 − 5.4 = −2.4, 8 − 5.4 = +2.6, and 5.2 − 5.4 = −0.2; the treatment effects yield SST and the errors yield SSE. Degrees of freedom: between groups = 2, within groups = 12.
3. Calculate the F-statistic
4. Obtain the p-value from the F-distribution
5. Make a decision: P-value < 0.05, so reject the null hypothesis at the 0.05 level of significance. At least one mean of the three groups is statistically significantly different from the others.

ANOVA, 3 groups
Analysis of variance: Between groups (df = 2), Within groups (df = 12), Total (df = 14), with F = 9.71 and its Prob > F.
Bartlett's test for equal variances: chi2(2), Prob > chi2 = 0.989

ANOVA 3 groups using SPSS Please do the data analysis using SPSS and paste the results here.

Kruskal-Wallis test
Kruskal-Wallis equality-of-populations rank test: three groups (Obs = 5 each) with their rank sums; chi-squared with 2 d.f. and its probability, and the ties-adjusted chi-squared with 2 d.f. and its probability.
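A minimal Stata sketch, assuming variables y and group with three levels:
. kwallis y, by(group)    // Kruskal-Wallis equality-of-populations rank test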

Kruskal-Wallis test using SPSS Please do the data analysis using SPSS and paste the results here.

Comparing 2 means: Regression
Data: for groups a and b, tabulate x, y, (x − x̄), (x − x̄)², (y − ȳ), and (x − x̄)(y − ȳ); the column sums give Σ(x − x̄)² = 2.5 and Σ(x − x̄)(y − ȳ) = 12.5, with means x̄ = 1.5 and ȳ = 5.5.
y = a + bx, where b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² = 12.5/2.5 = 5; then 5.5 = a + 5(1.5), thus a = 5.5 − 7.5 = −2.
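The slope and intercept arithmetic can be checked with two lines of Stata:
. di 12.5/2.5      // slope b = 5
. di 5.5 - 5*1.5   // intercept a = -2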

Comparing 2 means: Regression
[Scatter plot of y against x, with group a at x = 1 and group b at x = 2]

Comparing 2 means: Regression (cont.)
[Scatter plot of y against x with the fitted regression line]

Comparing 2 means: Regression (cont.)
The fitted line is y = a + bx = −2 + 5x.
b is the difference in y between x = 1 and x = 2: y = 3 when x = 1 and y = 8 when x = 2.
a is the intercept: y = −2 when x = 0.
The line passes through the point (x̄, ȳ) = (1.5, 5.5).

Regression model (2 means)
Stata output from regressing y on group: Number of obs = 10, F(1, 8), Model and Residual SS, R-squared, Adj R-squared, Root MSE; coefficients: group = 5 (the difference between the two group means) and _cons = −2, each with Std. Err., t, P>|t|, and 95% Conf. Interval.
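A minimal Stata sketch, assuming y and the 1/2-coded group variable used above; the t statistic for group is identical to the two-sample t test shown earlier:
. regress y group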

Regression model (3 means)
. xi: regress y i.group
i.group _Igroup_1-3 (naturally coded; _Igroup_1 omitted)
Stata output: Number of obs = 15, F(2, 12) = 9.71, Model and Residual SS, R-squared, Adj R-squared, Root MSE; coefficients: _Igroup_2 = 5 (group b mean minus group a mean), _Igroup_3 = 2.2 (group c mean minus group a mean), _cons = 3 (group a mean), each with Std. Err., t, P>|t|, and 95% Conf. Interval.
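In current Stata versions the xi: prefix is no longer needed; factor-variable notation fits the same model. A minimal sketch:
. regress y i.group    // group 1 is the omitted reference category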

Correlation coefficient
Pearson product-moment correlation
– Denoted by r (for the sample) or ρ (for the population)
– Requires the bivariate normal distribution assumption
– Requires a linear relationship
Spearman rank correlation
– Suitable for small samples; does not require the bivariate normal distribution assumption

Regression model using SPSS Please do the data analysis using SPSS and paste the results here.

Pearson product-moment correlation
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], or equivalently r = Σ[(x − x̄)/SDx][(y − ȳ)/SDy] / (n − 1).
Indeed, it is the mean of the products of the standard scores.

Scatter plot
[Scatter plot of the x–y data for groups a and b]

Calculation of the correlation coefficient (r)
Columns: [1] x, [2] y, [3] (x − x̄)/SDx, [4] (y − ȳ)/SDy, and [3] × [4]. Over the n = 5 observations (x̄ = 3, ȳ = 8), the sum of column [3] × [4] is 1.85.
r = 1.85/(5 − 1) = 0.46
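A minimal Stata sketch of the same standard-score computation, assuming variables x and y (egen's std() centers each variable and divides by its sample SD):
. egen zx = std(x)
. egen zy = std(y)
. gen zxzy = zx*zy
. quietly summarize zxzy
. di r(sum)/(_N - 1)    // r = 1.85/4, about 0.46
. pwcorr x y, sig       // built-in check, with a p-value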

Interpretation of the correlation coefficient
Correlation   Negative           Positive
None          −0.09 to 0.00      0.00 to 0.09
Small         −0.30 to −0.10     0.10 to 0.30
Medium        −0.50 to −0.30     0.30 to 0.50
Strong        −1.00 to −0.50     0.50 to 1.00
These serve as a guide, not a strict rule; in fact, the interpretation of a correlation coefficient depends on the context and purposes. (From Wikipedia, the free encyclopedia.)

The correlation coefficient reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). The figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero. (Figure from Wikimedia Commons.)

Inference on the correlation coefficient
With r = 0.46 and n = 5, Fisher's transformation gives z = atanh(r) ≈ 0.50 with standard error 1/√(n − 3) ≈ 0.707, so the 95% limits on the z scale are 0.50 ± 1.96 × 0.707, i.e. −0.885 to 1.887. Back-transform with the Stata commands:
. di tanh(-0.885)
-0.709
. di tanh(1.887)
0.955
The 95% CI for ρ is therefore −0.71 to 0.95.
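A minimal sketch of the whole Fisher-transformation computation in Stata, assuming the values r = 0.4625 and n = 5 from the worked example:
. local r = 0.4625
. local n = 5
. local z = atanh(`r')         // Fisher's z, about 0.50
. local se = 1/sqrt(`n' - 3)   // standard error on the z scale
. di tanh(`z' - 1.96*`se') " to " tanh(`z' + 1.96*`se')    // about -0.71 to 0.95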

Stata command:
. ci2 x y, corr spearman
Confidence interval for Spearman's rank correlation of x and y, based on Fisher's transformation.
Correlation = … on 5 observations (95% CI: … to 0.942)
Warning: this method may not give valid results with small samples (n <= 10) for rank correlations.

Inference on the correlation coefficient
Alternatively, test H0: ρ = 0 using t = r√[(n − 2)/(1 − r²)] ≈ 0.90 on n − 2 = 3 degrees of freedom.
Or use the Stata command:
. di (ttail(3, 0.9))*2
≈ 0.43 (two-sided P-value)
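The t statistic itself can be computed the same way; a minimal sketch with the assumed values r = 0.4625 and n = 5:
. local r = 0.4625
. local n = 5
. local t = `r'*sqrt((`n' - 2)/(1 - `r'^2))
. di `t'                     // about 0.90
. di 2*ttail(`n' - 2, `t')   // two-sided P, about 0.43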

Inference on correlation coefficient using SPSS Please do the data analysis using SPSS and paste the results here.

Inference on proportions
– One proportion
– Two proportions
– Three or more proportions

One proportion: Z-test
1. Hypothesis: H0: π = 0; Ha: π ≠ 0
2. Data: y with ny = 50, py = 0.1
3. Calculate the z-statistic: z = py/√[py(1 − py)/ny] = 0.1/0.0424 = 2.357
4. Obtain the p-value from the Z-distribution
5. Make a decision: P-value = 0.018 < 0.05, so reject the null hypothesis at the 0.05 level of significance. The proportion of Y is statistically significantly different from zero.
Stata command to get the p-value:
. di (1 - normal(2.357))*2
0.0184

Comparing 2 proportions: Z-test
1. Hypothesis: H0: π1 = π0; Ha: π1 ≠ π0
2. Data: n0 = 50, p0 = 0.1; n1 = 50, p1 = 0.4
         y
 x       0     1   Total
 0      45     5      50
 1      30    20      50
 Total  75    25     100
3. Calculate the z-statistic
4. Obtain the p-value from the Z-distribution
5. Make a decision: P-value < 0.05, so reject the null hypothesis at the 0.05 level of significance. The proportion of Y differs significantly between the two levels of x.

Z-test for two proportions
Two-sample test of proportions: group 0 (Number of obs = 50) and group 1 (Number of obs = 50); each proportion with its Mean, Std. Err., z, P>|z|, and 95% Conf. Interval; diff = prop(0) − prop(1) = −0.3; under Ho: z = −3.46; Ho: diff = 0; Ha: diff < 0, Ha: diff != 0, Ha: diff > 0, each with Pr(Z < z), Pr(|Z| > |z|), Pr(Z > z).
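The immediate form needs only the group sizes and proportions; a minimal sketch:
. prtesti 50 0.1 50 0.4    // two-sample z test of proportions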

Comparing 2 proportions: chi-square test
1. Hypothesis: H0: πij = πi+ π+j, where i = 0, 1 and j = 0, 1 (x and y are independent); Ha: πij ≠ πi+ π+j
2. Data:
         y
 x       0     1   Total
 0      45     5      50
 1      30    20      50
 Total  75    25     100
3. Calculate the χ²-statistic:
 O     E                      (O − E)   (O − E)²   (O − E)²/E
 45    (75/100) × 50 = 37.5      7.5      56.25       1.50
 5     (25/100) × 50 = 12.5     −7.5      56.25       4.50
 30    (75/100) × 50 = 37.5     −7.5      56.25       1.50
 20    (25/100) × 50 = 12.5      7.5      56.25       4.50
 Chi-square (df = 1) = 12.00
4. Obtain the p-value from the chi-square distribution
5. Make a decision: P-value = 0.001 < 0.05, so reject the null hypothesis at the 0.05 level of significance. There is a statistically significant association between x and y.

Comparing 2 proportions: chi-square test
           |        y
         x |      0       1 |  Total
 ----------+----------------+-------
         0 |     45       5 |     50
         1 |     30      20 |     50
 ----------+----------------+-------
     Total |     75      25 |    100
Pearson chi2(1) = 12.0000    Pr = 0.001
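Stata's immediate tabulate command reproduces this table from the four cell counts alone; a minimal sketch (the exact option adds Fisher's exact test):
. tabi 45 5 \ 30 20, chi2 exact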

. csi 20 5 30 45, or exact
                 |   Exposed   Unexposed |     Total
 ----------------+-----------------------+----------
           Cases |        20           5 |        25
        Noncases |        30          45 |        75
 ----------------+-----------------------+----------
           Total |        50          50 |       100
            Risk |        .4          .1 |       .25
                 |  Point estimate | [95% Conf. Interval]
 Risk difference |              .3 |
 Risk ratio      |               4 |
 Attr. frac. ex. |             .75 |
 Attr. frac. pop |              .6 |
 Odds ratio      |               6 |  (Cornfield)
 1-sided Fisher's exact P = …
 2-sided Fisher's exact P = …

Binomial regression
. binreg y x, rr
Generalized linear models: No. of obs = 100, residual df = 98; optimization: MQL Fisher scoring (IRLS EIM), scale parameter = 1; variance function V(u) = u*(1-u) [Bernoulli], link function g(u) = ln(u) [Log]; deviance, Pearson, and BIC statistics as reported.
Coefficients (EIM): Risk Ratio for x = 4 and _cons = .1, each with Std. Err., z, P>|z|, and 95% Conf. Interval.
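When binreg has convergence trouble, two commonly used Stata alternatives fit the same risk-ratio model; a minimal sketch (the second line is the "modified Poisson" approach with robust standard errors):
. glm y x, family(binomial) link(log) eform    // log-binomial model, reports risk ratios
. poisson y x, irr vce(robust)                 // modified Poisson, ratios interpreted as risk ratios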

Logistic regression
. logistic y x
Logistic regression: Number of obs = 100, LR chi2(1), Prob > chi2, log likelihood, and pseudo R2 as reported.
Coefficients: Odds Ratio for x = 6 and _cons, each with Std. Err., z, P>|z|, and 95% Conf. Interval.