# EPI 809/Spring 2008: Probability Distribution of Random Error

## Presentation transcript

EPI 809/Spring 2008 1 Probability Distribution of Random Error

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Linear Regression Assumptions
Assumptions on the errors $\varepsilon_1, \dots, \varepsilon_n$ (Gauss-Markov conditions):
1. Independent errors
2. Mean of probability distribution of errors is 0
3. Errors have constant variance $\sigma^2$, for which an estimator is $S^2$
4. Probability distribution of error is normal
5. Potential violation of G-M condition

Error Probability Distribution

Random Error Variation
1. Variation of Actual Y from Predicted Y
2. Measured by Standard Error of Regression Model
   - Sample Standard Deviation of $\hat{\varepsilon}$, $s$
3. Affects Several Factors
   - Parameter Significance
   - Prediction Accuracy
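The formula for the standard error of regression is not captured in the transcript. A sketch of the usual definition, assuming simple linear regression with one predictor (so the error degrees of freedom are $n - 2$, matching the df $= 5 - 2 = 3$ used in the deck's example):

```latex
s = \sqrt{\frac{SSE}{n - 2}},
\qquad
SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

For the obstetrics example, SSE $= 1.1$ and $n = 5$ give $s = \sqrt{1.1/3} \approx 0.6055$, which agrees with the Root MSE of 0.60553 in the SAS output.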

Evaluating the Model: Testing for Significance

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Test of Slope Coefficient
1. Shows If There Is a Linear Relationship Between X & Y
2. Involves Population Slope $\beta_1$
3. Hypotheses
   - $H_0: \beta_1 = 0$ (No Linear Relationship)
   - $H_a: \beta_1 \neq 0$ (Linear Relationship)
4. Theoretical basis of the test statistic is the sampling distribution of slope

Sampling Distribution of Sample Slopes
All Possible Sample Slopes
- Sample 1: 2.5
- Sample 2: 1.6
- Sample 3: 1.8
- Sample 4: 2.1
- ... (very large number of sample slopes)

[Figure: Sampling Distribution, labeled with $\beta_1$, $\hat{\beta}_1$, and $S_{\hat{\beta}_1}$]

Slope Coefficient Test Statistic
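The formula on this slide is missing from the transcript. The computer-output slide later in the deck gives $t = \hat{\beta}_k / S_{\hat{\beta}_k}$; a sketch of the slope case follows. The expression for $S_{\hat{\beta}_1}$ and the symbol $SS_{xx}$ are the standard textbook forms and are assumptions here, not text recovered from the slide.

```latex
t = \frac{\hat{\beta}_1}{S_{\hat{\beta}_1}},
\qquad
S_{\hat{\beta}_1} = \frac{s}{\sqrt{SS_{xx}}},
\qquad
SS_{xx} = \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2,
\qquad
\text{df} = n - 2
```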

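Test of Slope Coefficient Rejection Rule
- Reject $H_0$ in favor of $H_a$ if $t$ falls in the colored (rejection) area
- Reject $H_0$ for $H_a$ if P-value $= P(|T| > |t|) < \alpha$, where $T \sim t_{(n-2)}$

[Figure: $t_{(n-2)}$ density centered at 0, with rejection regions of area $\alpha/2$ below $-t_{1-\alpha/2,\,(n-2)}$ and above $t_{1-\alpha/2,\,(n-2)}$]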

Test of Slope Coefficient Example
Reconsider the Obstetrics example with the following data:

| Estriol (mg/24h) | B.w. (g/1000) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Is the linear relationship between Estriol & Birthweight significant at the .05 level?

Solution Table For β’s

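Solution Table for SSE

| Birth weight $= y$ | Estriol $= x$ | Predicted $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ | (Obs - pred)$^2$ $= (y - \hat{y})^2$ |
|---|---|---|---|
| 1 | 1 | 0.6 | 0.16 |
| 1 | 2 | 1.3 | 0.09 |
| 2 | 3 | 2 | 0 |
| 2 | 4 | 2.7 | 0.49 |
| 4 | 5 | 3.4 | 0.36 |
| Total: 10 | Total: 15 | - | SSE = 1.1 |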
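The SSE table can be reproduced with a few lines of plain Python. This is a minimal sketch, assuming the five (estriol, birthweight) pairs from the example slide; the variable names are illustrative and do not come from the deck.

```python
# Obstetrics example data as read from the slides (estriol = x, birthweight = y).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Ordinary least squares estimates.
b1 = ss_xy / ss_xx        # slope
b0 = y_bar - b1 * x_bar   # intercept

# Fitted values, squared residuals, SSE, and the standard error of regression.
y_hat = [b0 + b1 * xi for xi in x]
sq_resid = [(yi - yh) ** 2 for yi, yh in zip(y, y_hat)]
sse = sum(sq_resid)
s = (sse / (n - 2)) ** 0.5

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
for xi, yi, yh, sq in zip(x, y, y_hat, sq_resid):
    print(f"x={xi}  y={yi}  y_hat={yh:.1f}  (y - y_hat)^2={sq:.2f}")
print(f"SSE = {sse:.2f}, s = {s:.5f}")
```

The estimates ($\hat{\beta}_0 = -0.1$, $\hat{\beta}_1 = 0.7$) and SSE $= 1.1$ agree with the table and with the SAS parameter estimates reported later in the deck.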

Test of Slope Parameter Solution
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
- $\alpha = .05$
- df $= 5 - 2 = 3$
- Critical Value(s): $\pm t_{.975,\,3} = \pm 3.182$
- Test Statistic: $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$

Test Statistic Solution From Table

Test of Slope Parameter
- $H_0: \beta_1 = 0$
- $H_a: \beta_1 \neq 0$
- $\alpha = .05$
- df $= 5 - 2 = 3$
- Critical Value(s): $\pm 3.182$
- Test Statistic: $t = 0.70 / 0.19149 = 3.66$
- Decision: Reject at $\alpha = .05$
- Conclusion: There is evidence of a linear relationship
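The decision can be checked numerically. The sketch below assumes the example data from the slides, takes the critical value 3.182 from a standard t table, and uses a closed-form CDF that holds only for the t distribution with 3 degrees of freedom; the helper name `t_cdf_df3` is hypothetical.

```python
import math

# Obstetrics example data as read from the slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta1 = 0.
se_b1 = s / math.sqrt(ss_xx)
t_stat = b1 / se_b1

# Assumption: two-sided critical value t_{0.975, 3} taken from a t table.
T_CRIT = 3.182


def t_cdf_df3(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


p_value = 2 * (1 - t_cdf_df3(abs(t_stat)))
reject = abs(t_stat) > T_CRIT

print(f"se(b1) = {se_b1:.5f}, t = {t_stat:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

For other sample sizes the closed-form CDF no longer applies, and a statistics library or a t table would be needed instead.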

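Test of Slope Parameter Computer Output

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | -0.10000 | 0.63509 | -0.16 | 0.8849 |
| Estriol | 1 | 0.70000 | 0.19149 | 3.66 | 0.0354 |

Slide callouts: Parameter Estimate is $\hat{\beta}_k$; Standard Error is $S_{\hat{\beta}_k}$; t Value is $t = \hat{\beta}_k / S_{\hat{\beta}_k}$; Pr > |t| is the P-Value.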

Measures of Variation in Regression
1. Total Sum of Squares ($SS_{yy}$)
   - Measures Variation of Observed $Y_i$ Around the Mean $\bar{Y}$
2. Explained Variation (SSR)
   - Variation Due to Relationship Between X & Y
3. Unexplained Variation (SSE)
   - Variation Due to Other Factors

Variation Measures
- Total sum of squares: $(Y_i - \bar{Y})^2$
- Unexplained sum of squares: $(Y_i - \hat{Y}_i)^2$
- Explained sum of squares: $(\hat{Y}_i - \bar{Y})^2$

[Figure: the three deviations shown for a single observation $Y_i$]

Coefficient of Determination
1. Proportion of Variation ‘Explained’ by Relationship Between X & Y

$0 \leq r^2 \leq 1$
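The defining formula did not survive in the transcript. A sketch in terms of the sums of squares named on the preceding slides ($SS_{yy}$, SSR, SSE), using the standard decomposition $SS_{yy} = SSR + SSE$:

```latex
r^2 = \frac{\text{Explained variation}}{\text{Total variation}}
    = \frac{SSR}{SS_{yy}}
    = 1 - \frac{SSE}{SS_{yy}}
```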

Coefficient of Determination Examples
[Figure panels: $r^2 = 1$, $r^2 = .8$, $r^2 = 0$]

Coefficient of Determination Example
Reconsider the Obstetrics example. Interpret a coefficient of determination of 0.8167.

Answer: About 82% of the total variation in birthweight is explained by the mother’s Estriol level.

$r^2$ Computer Output
- Root MSE: 0.60553 (this is $S$)
- R-Square: 0.8167 (this is $r^2$)
- Dependent Mean: 2.00000
- Adj R-Sq: 0.7556 ($r^2$ adjusted for number of explanatory variables & sample size)
- Coeff Var: 30.27650
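The summary statistics in this output can be recomputed from the example data. The sketch below assumes the usual definitions: adjusted $r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1)$ with $k = 1$ predictor, and coefficient of variation $= 100 \cdot s / \bar{y}$. These definitions are assumptions, since the slide itself shows only the numbers.

```python
import math

# Obstetrics example data as read from the slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
k = 1  # number of explanatory variables
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_yy = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
ssr = ss_yy - sse                                              # explained

root_mse = math.sqrt(sse / (n - k - 1))
r_square = 1 - sse / ss_yy
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
coeff_var = 100 * root_mse / y_bar

print(f"Root MSE = {root_mse:.5f}")
print(f"R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_square:.4f}")
print(f"Dependent Mean = {y_bar:.5f}, Coeff Var = {coeff_var:.5f}")
```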

Using the Model for Prediction & Estimation

Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

Prediction With Regression Models
What Is Predicted?
- Population Mean Response $E(Y)$ for Given X
  - Point on Population Regression Line
- Individual Response ($Y_i$) for Given X

What Is Predicted?

Confidence Interval Estimate of Mean Y
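The interval formula on this slide is not in the transcript. A sketch of the standard form for simple linear regression, assuming $x_p$ denotes the X value at which the mean response is estimated and $SS_{xx} = \sum (x_i - \bar{x})^2$:

```latex
\hat{y} \pm t_{1-\alpha/2,\,(n-2)} \cdot S_{\hat{Y}},
\qquad
S_{\hat{Y}} = s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}
```

Each term under the square root corresponds to one of the width factors the deck lists: sample size enters through $1/n$ and $SS_{xx}$, dispersion through $s$, and distance from the mean through $(x_p - \bar{x})^2$.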

Factors Affecting Interval Width
1. Level of Confidence $(1 - \alpha)$
   - Width Increases as Confidence Increases
2. Data Dispersion ($s$)
   - Width Increases as Variation Increases
3. Sample Size
   - Width Decreases as Sample Size Increases
4. Distance of $X_p$ from Mean $\bar{X}$
   - Width Increases as Distance Increases

Why Distance from Mean?
[Figure callout: Greater dispersion than $X_1$; X axis marked at $\bar{X}$]

Confidence Interval Estimate Example
Reconsider the Obstetrics example with the following data:

| Estriol (mg/24h) | B.w. (g/1000) |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 4 |

Estimate the mean BW and a subject’s BW response when the Estriol level is 4, at the .05 level.

Solution Table

Confidence Interval Estimate Solution - Mean BW
[Slide callout: X to be predicted]

Prediction Interval of Individual Response
[Slide callout: Note!]
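The formula itself is missing from the transcript. A sketch of the standard prediction interval for an individual response at $x_p$, with the same assumed notation as the confidence interval for the mean ($SS_{xx} = \sum (x_i - \bar{x})^2$):

```latex
\hat{y} \pm t_{1-\alpha/2,\,(n-2)} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}
```

The only difference from the interval for the mean is the leading 1 under the square root, which accounts for the variance of an individual error around the regression line. This is presumably what the slide's "Note!" callout points at.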

Why the Extra ‘S’?

SAS codes for computing mean and prediction intervals

    data BW;  /* Read the data into SAS */
      input estriol birthw;
      cards;
    1 1
    2 1
    3 2
    4 2
    5 4
    ;
    run;

    proc reg data=BW;  /* Fit a linear regression model */
      /* CLM: confidence limits for the mean response;
         CLI: prediction limits for an individual response */
      model birthw = estriol / cli clm alpha=.05;
    run;

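Interval Estimate from SAS Output

The REG Procedure. Dependent Variable: y. Output Statistics:

| Obs | Dep Var y | Predicted Value | Std Error Mean Predict | 95% CL Mean (lower) | 95% CL Mean (upper) | 95% CL Predict (lower) | 95% CL Predict (upper) | Residual |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.0000 | 0.6000 | 0.4690 | -0.8927 | 2.0927 | -1.8376 | 3.0376 | 0.4000 |
| 2 | 1.0000 | 1.3000 | 0.3317 | 0.2445 | 2.3555 | -0.8972 | 3.4972 | -0.3000 |
| 3 | 2.0000 | 2.0000 | 0.2708 | 1.1382 | 2.8618 | -0.1110 | 4.1110 | 0 |
| 4 | 2.0000 | 2.7000 | 0.3317 | 1.6445 | 3.7555 | 0.5028 | 4.8972 | -0.7000 |
| 5 | 4.0000 | 3.4000 | 0.4690 | 1.9073 | 4.8927 | 0.9624 | 5.8376 | 0.6000 |

Slide callouts: Predicted Y when X = 3; Std Error Mean Predict is $S_{\hat{Y}}$; 95% CL Mean is the Confidence Interval; 95% CL Predict is the Prediction Interval.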
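The row for an Estriol level of 4 can be recomputed by hand. The sketch below assumes the example data from the slides and a table value of $t_{0.975,\,3} = 3.182446$; the function name `intervals` is hypothetical and is not part of SAS or of the deck.

```python
import math

# Obstetrics example data as read from the slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# Assumption: t_{0.975, 3} from a t table (95% two-sided, 3 degrees of freedom).
T_CRIT = 3.182446


def intervals(x_p):
    """Return (prediction, SE of mean, CI for the mean, PI for an individual)."""
    y_hat = b0 + b1 * x_p
    leverage = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    se_mean = s * math.sqrt(leverage)        # S_Yhat, for the mean response
    se_pred = s * math.sqrt(1 + leverage)    # extra 1 for an individual response
    ci = (y_hat - T_CRIT * se_mean, y_hat + T_CRIT * se_mean)
    pi = (y_hat - T_CRIT * se_pred, y_hat + T_CRIT * se_pred)
    return y_hat, se_mean, ci, pi


y_hat4, se_mean4, ci4, pi4 = intervals(4)
print(f"y_hat = {y_hat4:.4f}, SE mean = {se_mean4:.4f}")
print(f"95% CL Mean:    ({ci4[0]:.4f}, {ci4[1]:.4f})")
print(f"95% CL Predict: ({pi4[0]:.4f}, {pi4[1]:.4f})")
```

Calling `intervals` with other X values reproduces the remaining rows, and shows both intervals widening as X moves away from $\bar{x} = 3$.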

Hyperbolic Interval Bands

Correlation Models

Types of Probabilistic Models

Correlation vs. regression
- Both variables are treated the same in correlation; in regression there is a predictor and a response
- In regression the x variable is assumed non-random or measured without error
- Correlation is used in looking for relationships, regression for prediction

Correlation Models
1. Answer ‘How Strong Is the Linear Relationship Between 2 Variables?’
2. Coefficient of Correlation Used
   - Population Correlation Coefficient Denoted $\rho$ (Rho)
   - Values Range from -1 to +1
   - Measures Degree of Association
3. Used Mainly for Understanding

Sample Coefficient of Correlation
1. Pearson Product Moment Coefficient of Correlation between x and y
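The formula for the sample coefficient is missing from the transcript. A sketch of the standard Pearson product-moment form, written with sums of squares; the symbols $SS_{xy}$ and $SS_{xx}$ are assumed notation, chosen to match the $SS_{yy}$ used earlier in the deck:

```latex
r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

In simple linear regression, $r$ has the same sign as the estimated slope, and its square is the coefficient of determination $r^2$.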

Coefficient of Correlation Values
[Figure: scale from -1.0 to +1.0 with marks at -.5, 0, and +.5]
- -1.0: Perfect Negative Correlation
- 0: No Correlation
- +1.0: Perfect Positive Correlation
- From 0 toward -1.0: Increasing degree of negative correlation
- From 0 toward +1.0: Increasing degree of positive correlation

Coefficient of Correlation Examples
[Figure panels: $r = 1$, $r = -1$, $r = .89$, $r = 0$]

Test of Coefficient of Correlation
1. Shows If There Is a Linear Relationship Between 2 Numerical Variables
2. Same Conclusion as Testing Population Slope $\beta_1$
3. Hypotheses
   - $H_0: \rho = 0$ (No Correlation)
   - $H_a: \rho \neq 0$ (Correlation)

1 Sample t-Test on Correlation Coefficient
- Hypotheses
  - $H_0: \rho = 0$ (No Correlation)
  - $H_a: \rho \neq 0$ (Correlation)
- Test statistic under $H_0$: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{(n-2)}$
- Reject $H_0$ if $|t| > t_{1-\alpha/2,\,(n-2)}$
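The deck states that this test reaches the same conclusion as the test of the population slope. A small sketch, assuming the obstetrics data from the earlier example slides, shows that the two t statistics coincide numerically; variable names are illustrative.

```python
import math

# Obstetrics example data as read from the slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson correlation and the t statistic for H0: rho = 0.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t statistic for H0: beta1 = 0, computed from the regression fit.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
t_slope = b1 / (s / math.sqrt(ss_xx))

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"t (correlation) = {t_corr:.4f}, t (slope) = {t_slope:.4f}")
```

Because the statistics are identical and both use $n - 2$ degrees of freedom, the P-values and decisions match as well.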

1 Sample Z-Test on Correlation Coefficient
- Hypotheses (Fisher)
  - $H_0: \rho = \rho_0$
  - $H_a: \rho \neq \rho_0$
- Test statistic under $H_0$: $z$, based on Fisher's transformation of $r$
- Reject $H_0$ if $|z| > z_{1-\alpha/2}$
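The test statistic is not reproduced in the transcript. A sketch of the standard Fisher z-transformation test follows; the symbols $Z_r$ and $Z_0$ are assumed notation, and the normal approximation is usually recommended only for moderately large $n$:

```latex
Z_r = \frac{1}{2} \ln\!\left( \frac{1 + r}{1 - r} \right),
\qquad
Z_0 = \frac{1}{2} \ln\!\left( \frac{1 + \rho_0}{1 - \rho_0} \right),
\qquad
z = (Z_r - Z_0)\sqrt{n - 3} \;\sim\; N(0, 1) \text{ approximately, under } H_0
```

Unlike the t-test, this form allows a null value $\rho_0$ other than zero.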

Conclusion
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Understand and check model assumptions
6. Predict Response Variable
7. Comment on SAS Output
8. Correlation Models
9. Test of Coefficient of Correlation
