Partial Regression Plots. Life Insurance Example: (nknw364.sas) Y = the amount of life insurance for the 18 managers (in $1000) X 1 = average annual income.


1 Partial Regression Plots

2 Life Insurance Example (nknw364.sas)
Y = the amount of life insurance for the 18 managers (in $1000)
X1 = average annual income (in $1000)
X2 = risk aversion score (0 – 10)

3 Life Insurance: Input, diagnostics
    title1 h=3 'Insurance';
    data insurance;
        infile 'I:\My Documents\Stat 512\CH10TA01.DAT';
        input income risk amount;
    run;
    proc print data=insurance;
    run;
    *diagnostics;
    title2 h=2 'residual plots';
    symbol1 v=circle c=black;
    proc reg data=insurance;
        model amount = income risk/r p;
        plot r.*(p. income risk);
    run;

4 Life Insurance: output, diagnostics

5 Life Insurance: Initial Regression
Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              2           173919         86960    542.33   <.0001
    Error             15       2405.14763     160.34318
    Corrected Total   17           176324
    Root MSE          12.66267    R-Square   0.9864
    Dependent Mean   134.44444    Adj R-Sq   0.9845
    Coeff Var          9.41851
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1           -205.71866         11.39268    -18.06     <.0001
    income       1              6.28803          0.20415     30.80     <.0001
    risk         1              4.73760          1.37808      3.44     0.0037

6 Life Insurance: Scatter plot
    title2 h=2 'Scatterplot';
    proc sgscatter data=insurance;
        matrix income risk amount;
    run;

7 Life Insurance – Residual Plots

8 Life Insurance – Residual Plots (cont)

9 Life Insurance: Partial Regression Plots (1)
    proc reg data=insurance;
        model amount=income risk/partial;
    run;

10 Life Insurance: Partial Regression Plots (1)
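For reference, the quantities plotted in a partial regression (added-variable) plot for X1 in this two-predictor model can be written as below. The notation e(· | ·) is the usual textbook notation and is assumed here, since the slides do not spell it out.

```latex
\begin{aligned}
e_i(Y \mid X_2)   &= Y_i - \hat{Y}_i(X_2)       && \text{residual from regressing } Y \text{ on } X_2, \\
e_i(X_1 \mid X_2) &= X_{i1} - \hat{X}_{i1}(X_2) && \text{residual from regressing } X_1 \text{ on } X_2.
\end{aligned}
```

The plot shows e(Y | X2) against e(X1 | X2). The least squares slope of that scatter equals the coefficient b1 of X1 in the full model, so curvature in the plot suggests that X1 may need a nonlinear term.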

11 Life Insurance: Partial Regression Plots (2) risk
    title1 h=3 'Partial residual plot';
    title2 h=2 'for risk';
    symbol1 v=circle i=rl;
    axis1 label=(h=2 'Risk Aversion Score');
    axis2 label=(h=2 angle=90 'Amount of Insurance');
    proc reg data=insurance;
        model amount risk = income;
        output out=partialrisk r=resamt resrisk;
    proc gplot data=partialrisk;
        plot resamt*resrisk / haxis=axis1 vaxis=axis2 vref = 0;
    run;

12 Life Insurance: Partial Regression Plots (2) risk (cont)
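One way to check the manual construction, sketched here under the assumption that the data set partialrisk (with variables resamt and resrisk) was created as on slide 11: regress the two sets of residuals on each other. This step is not part of the original program.

```sas
/* Sketch only: assumes partialrisk from slide 11 exists,      */
/* with resamt = e(amount | income), resrisk = e(risk | income) */
proc reg data=partialrisk;
    model resamt = resrisk;
run;
```

The fitted slope should reproduce the risk coefficient of the full model (4.73760 on slide 5), and the intercept should be zero up to rounding. The reported standard error will differ slightly from the full-model value because this regression does not account for the degree of freedom already used by income.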

13 Life Insurance: Partial Regression Plots (2) income
    axis3 label=(h=2 'Income');
    title2 h=2 'for income';
    proc reg data=insurance;
        model amount income = risk;
        output out=partialincome r=resamt resinc;
    proc gplot data=partialincome;
        plot resamt*resinc / haxis=axis3 vaxis=axis2 vref = 0;
    run;

14 Life Insurance: Partial Regression Plots (2) income (cont)

15 Life Insurance: Quadratic
    title1 'Quadratic model';
    title2 '';
    data quad;
        set insurance;
        sinc = income;
    proc standard data=quad out=quad mean=0;
        var sinc;
    data quad;
        set quad;
        incomesq = sinc*sinc;
    proc corr data=quad;
        var amount risk income incomesq;
    run;
    proc reg data=quad;
        model amount = income risk incomesq;
    run;

16 Life Insurance: Quadratic (regression)
Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              3           176249         58750   10958.0   <.0001
    Error             14         75.05895       5.36135
    Corrected Total   17           176324
    Root MSE           2.31546    R-Square   0.9996
    Dependent Mean   134.44444    Adj R-Sq   0.9995
    Coeff Var          1.72224
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1           -200.81134          2.09649    -95.78     <.0001
    income       1              5.88625          0.04201    140.11     <.0001
    risk         1              5.40039          0.25399     21.26     <.0001
    incomesq     1              0.05087          0.00244     20.85     <.0001

17 Life Insurance: Quadratic (residual plots)

18 Life Insurance: normality (Original Model vs. With Quadratic Term)

19 Types of Outliers

20 Life Insurance: Studentized Residuals (nknw364.sas)
    proc reg data=quad;
        model amount=income risk incomesq/r;
        output out = diag r=resid student=student;
    run;
    proc print data=diag;
    run;

21 Life Insurance: Studentized Residuals (cont)
Output Statistics
    Obs   Dependent   Predicted   Std Error      Residual   Std Error   Student    -2-1 0 1 2
          Variable    Value       Mean Predict              Residual    Residual
      1     91.0000     97.8164         0.7181    -6.8164       2.201     -3.097   |******|      |
      2    162.0000    160.1201         0.9577     1.8799       2.108      0.892   |      |*     |
      3     11.0000     11.5901         1.5574    -0.5901       1.713     -0.344   |      |      |
      4    240.0000    240.6278         0.8580    -0.6278       2.151     -0.292   |      |      |
      5     73.0000     71.5019         0.6656     1.4981       2.218      0.675   |      |*     |
      6    311.0000    309.6777         1.4363     1.3223       1.816      0.728   |      |*     |
      7    316.0000    315.6359         2.0100     0.3641       1.150      0.317   |      |      |
      8    154.0000    153.3645         0.9829     0.6355       2.096      0.303   |      |      |
      9    164.0000    162.4847         0.8211     1.5153       2.165      0.700   |      |*     |
     10     54.0000     52.4068         0.7346     1.5932       2.196      0.726   |      |*     |
     11     53.0000     52.8060         0.8340     0.1940       2.160     0.0898   |      |      |
     12    326.0000    327.6975         1.4378    -1.6975       1.815     -0.935   |     *|      |
     13     55.0000     54.4957         0.7142     0.5043       2.203      0.229   |      |      |
     14    130.0000    131.0179         1.2720    -1.0179       1.935     -0.526   |     *|      |
     15    112.0000    109.6080         0.8185     2.3920       2.166      1.104   |      |**    |
     16     91.0000     93.0992         0.8093    -2.0992       2.169     -0.968   |     *|      |
     17     14.0000     13.8135         1.2042     0.1865       1.978     0.0943   |      |      |
     18     63.0000     62.2363         0.6776     0.7637       2.214      0.345   |      |      |

22 Life Insurance: Studentized Residuals (cont)
    Obs   income   risk   amount       sinc   incomesq      resid    student
      1   45.010      6       91    -5.0268     25.268   -6.81637   -3.09652
      2   57.204      4      162     7.1672     51.369    1.87988    0.89174
      3   26.852      5       11   -23.1848    537.534   -0.59009   -0.34440
      4   66.290      7      240    16.2532    264.167   -0.62783   -0.29193
      5   40.964      5       73    -9.0728     82.315    1.49807    0.67550
      6   72.996     10      311    22.9592    527.126    1.32229    0.72806
      7   79.380      1      316    29.3432    861.025    0.36407    0.31672
      8   52.766      8      154     2.7292      7.449    0.63552    0.30314
      9   55.916      6      164     5.8792     34.565    1.51532    0.69992
     10   38.122      4       54   -11.9148    141.962    1.59323    0.72557
     11   35.840      6       53   -14.1968    201.548    0.19397    0.08980
     12   75.796      9      326    25.7592    663.538   -1.69746   -0.93525
     13   37.408      5       55   -12.6288    159.486    0.50425    0.22894
     14   54.376      2      130     4.3392     18.829   -1.01786   -0.52609
     15   46.186      7      112    -3.8508     14.828    2.39205    1.10437
     16   46.130      4       91    -3.9068     15.263   -2.09925   -0.96765
     17   30.366      3       14   -19.6708    386.939    0.18647    0.09429
     18   39.060      5       63   -10.9768    120.490    0.76374    0.34494
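As a reminder of what the Student Residual column contains, the (internally) studentized residual divides each residual by its own estimated standard error. The symbols below follow common textbook notation and are an assumption, not something shown on the slides.

```latex
r_i = \frac{e_i}{s\{e_i\}}, \qquad s\{e_i\} = \sqrt{MSE\,(1 - h_{ii})}
```

For observation 1, the output on slide 21 gives e_1 = -6.8164 and s{e_1} = 2.201, so r_1 = -6.8164 / 2.201 ≈ -3.097, which matches the printed value.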

23 Life Insurance: Studentized Deleted Residuals
    proc reg data=quad;
        model amount=income risk incomesq/r influence;
        output out = diag1 r=resid rstudent=rstudent;
    run;
    proc print data=diag1;
    run;

24 Studentized Deleted Residuals (cont)
    Obs   Dependent Variable   Predicted Value   RStudent   Hat Diag H
      1              91.0000           97.8164    -5.3155       0.0962
      2             162.0000          160.1201     0.8848       0.1711
      3              11.0000           11.5901    -0.3333       0.4524
      4             240.0000          240.6278    -0.2822       0.1373
      5              73.0000           71.5019     0.6618       0.0826
      6             311.0000          309.6777     0.7153       0.3848
      7             316.0000          315.6359     0.3063       0.7535
      8             154.0000          153.3645     0.2931       0.1802
      9             164.0000          162.4847     0.6866       0.1258
     10              54.0000           52.4068     0.7127       0.1006
     11              53.0000           52.8060     0.0866       0.1297
     12             326.0000          327.6975    -0.9308       0.3856
     13              55.0000           54.4957     0.2210       0.0951
     14             130.0000          131.0179    -0.5120       0.3018
     15             112.0000          109.6080     1.1138       0.1249
     16              91.0000           93.0992    -0.9653       0.1222
     17              14.0000           13.8135     0.0909       0.2705
     18              63.0000           62.2363     0.3338       0.0856
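The RStudent column can be reproduced without refitting the model n times. A standard identity, written here in assumed textbook notation with n cases and p parameters, is:

```latex
t_i = e_i \left[ \frac{n - p - 1}{SSE\,(1 - h_{ii}) - e_i^{2}} \right]^{1/2}
```

For observation 1 in the quadratic model, n = 18, p = 4, SSE = 75.05895, e_1 = -6.8164 and h_11 = 0.0962, which gives t_1 ≈ -5.32. This agrees with the printed -5.3155 up to rounding of the inputs.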

25 Studentized Deleted Residuals (cont)
    Sum of Residuals                      0
    Sum of Squared Residuals       75.05895
    Predicted Residual SS (PRESS) 103.99525
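The PRESS statistic reported here sums the squared deleted (leave-one-out) prediction errors. In assumed textbook notation, where the subscript (i) means the fit with case i omitted:

```latex
PRESS = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_{i(i)} \right)^{2}
      = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^{2}
```

PRESS is never smaller than the sum of squared residuals, since each term is inflated by 1/(1 - h_ii)². Here it is 103.99525 against 75.05895.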

26 Studentized Deleted Residuals (cont)
    Obs   income   risk   amount       sinc   incomesq      resid   rstudent
      1   45.010      6       91    -5.0268     25.268   -6.81637   -5.31555
      2   57.204      4      162     7.1672     51.369    1.87988    0.88480
      3   26.852      5       11   -23.1848    537.534   -0.59009   -0.33328
      4   66.290      7      240    16.2532    264.167   -0.62783   -0.28217
      5   40.964      5       73    -9.0728     82.315    1.49807    0.66180
      6   72.996     10      311    22.9592    527.126    1.32229    0.71525
      7   79.380      1      316    29.3432    861.025    0.36407    0.30630
      8   52.766      8      154     2.7292      7.449    0.63552    0.29307
      9   55.916      6      164     5.8792     34.565    1.51532    0.68658
     10   38.122      4       54   -11.9148    141.962    1.59323    0.71270
     11   35.840      6       53   -14.1968    201.548    0.19397    0.08656
     12   75.796      9      326    25.7592    663.538   -1.69746   -0.93078
     13   37.408      5       55   -12.6288    159.486    0.50425    0.22103
     14   54.376      2      130     4.3392     18.829   -1.01786   -0.51204
     15   46.186      7      112    -3.8508     14.828    2.39205    1.11382
     16   46.130      4       91    -3.9068     15.263   -2.09925   -0.96529
     17   30.366      3       14   -19.6708    386.939    0.18647    0.09089
     18   39.060      5       63   -10.9768    120.490    0.76374    0.33381
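A formal outlier check compares each |rstudent| with a Bonferroni critical value from the t distribution with n - p - 1 degrees of freedom. The data step below is a minimal sketch, not part of the original program: it assumes the data set diag1 from slide 23, the new names outliers, tcrit and flag are hypothetical, and the family level alpha = 0.10 is an assumed choice.

```sas
/* Sketch only: Bonferroni test on the studentized deleted residuals. */
/* n = 18 cases and p = 4 parameters (quadratic model);               */
/* alpha = 0.10 is an assumption, not taken from the slides.          */
data outliers;
    set diag1;
    n = 18;
    p = 4;
    alpha = 0.10;
    tcrit = tinv(1 - alpha/(2*n), n - p - 1);  /* two-sided Bonferroni cutoff */
    flag = (abs(rstudent) > tcrit);            /* 1 when the case exceeds it  */
run;
proc print data=outliers;
    where flag = 1;
    var amount resid rstudent tcrit;
run;
```

Only cases whose studentized deleted residual exceeds the cutoff in absolute value are printed, so the listing is empty when no case is flagged.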

27 Studentized Deleted Residuals: w/o square
    Obs   Dependent Variable   Predicted Value   RStudent   Hat Diag H
      1              91.0000          105.7311    -1.2259       0.0693
      2             162.0000          172.9321    -0.9048       0.1006
      3              11.0000          -13.1845     2.4487       0.1890
      4             240.0000          244.2780    -0.3518       0.1316
      5              73.0000           75.5522    -0.2028       0.0756
      6             311.0000          300.6583     1.0138       0.3499
      7             316.0000          298.1627     2.7483       0.6225
      8             154.0000          163.9763    -0.8371       0.1319
      9             164.0000          174.3084    -0.8336       0.0658
     10              54.0000           52.9440     0.0850       0.1005
     11              53.0000           48.0699     0.4033       0.1201
     12             326.0000          313.5272     1.1933       0.2994
     13              55.0000           53.1919     0.1451       0.0944
     14             130.0000          145.6744    -1.4415       0.2096
     15             112.0000          117.8634    -0.4742       0.0957
     16              91.0000          103.2985    -1.0120       0.0775
     17              14.0000           -0.5636     1.3004       0.1818
     18              63.0000           63.5798    -0.0462       0.0849

28 /r vs. /influence
/r keyword: Obs, Dependent Variable, Predicted Value, Std Error Mean Predict, Residual, Std Error Residual, Student Residual, bar graph, Cook's D
/influence keyword: Obs, Residual, RStudent, Hat Diag H, Cov Ratio, DFFITS, DFBETAS (all parameters)

29 Hat Matrix Diagnosis, DFFITS
    Obs   Residual   RStudent   Hat Diag H   Cov Ratio    DFFITS
      1    -6.8164    -5.3155       0.0962      0.0147   -1.7339
      2     1.8799     0.8848       0.1711      1.2842    0.4020
      3    -0.5901    -0.3333       0.4524      2.3742   -0.3029
      4    -0.6278    -0.2822       0.1373      1.5215   -0.1126
      5     1.4981     0.6618       0.0826      1.2842    0.1986
      6     1.3223     0.7153       0.3848      1.8735    0.5656
      7     0.3641     0.3063       0.7535      5.3027    0.5356
      8     0.6355     0.2931       0.1802      1.5981    0.1374
      9     1.5153     0.6866       0.1258      1.3342    0.2604
     10     1.5932     0.7127       0.1006      1.2830    0.2384
     11     0.1940     0.0866       0.1297      1.5420    0.0334
     12    -1.6975    -0.9308       0.3856      1.6912   -0.7373
     13     0.5043     0.2210       0.0951      1.4643    0.0717
     14    -1.0179    -0.5120       0.3018      1.7786   -0.3366
     15     2.3920     1.1138       0.1249      1.0675    0.4209
     16    -2.0992    -0.9653       0.1222      1.1616   -0.3601
     17     0.1865     0.0909       0.2705      1.8390    0.0553
     18     0.7637     0.3338       0.0856      1.4216    0.1022
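The DFFITS column combines the studentized deleted residual with the leverage. In assumed textbook notation:

```latex
(DFFITS)_i = t_i \left( \frac{h_{ii}}{1 - h_{ii}} \right)^{1/2}
```

For observation 1, t_1 = -5.3155 and h_11 = 0.0962 give (DFFITS)_1 ≈ -1.734, in line with the printed -1.7339. A commonly quoted guideline treats |DFFITS| > 1 as influential for small to medium data sets; by that rule only observation 1 stands out here.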

30 Cook’s Distance, DFBetas, Cov Ratio
                                             ------------- DFBETAS --------------
    Obs   Cook's D   Cov Ratio    DFFITS   Intercept    income      risk   incomesq
      1      0.255      0.0147   -1.7339     -0.4126    0.0662   -0.3686     0.9168
      2      0.041      1.2842    0.4020      0.0110    0.2513   -0.2064    -0.2579
      3      0.025      2.3742   -0.3029     -0.1839    0.2513   -0.0525    -0.2312
      4      0.003      1.5215   -0.1126      0.0642   -0.0692   -0.0299     0.0230
      5      0.010      1.2842    0.1986      0.1216   -0.0566   -0.0108    -0.0580
      6      0.083      1.8735    0.5656     -0.3627    0.1183    0.3901     0.1704
      7      0.077      5.3027    0.5356     -0.0249    0.2235   -0.3381     0.2233
      8      0.005      1.5981    0.1374     -0.0372    0.0245    0.0788    -0.0712
      9      0.018      1.3342    0.2604     -0.0462    0.1333    0.0084    -0.1799
     10      0.015      1.2830    0.2384      0.1978   -0.0988   -0.0773    -0.0084
     11      0.000      1.5420    0.0334      0.0195   -0.0244    0.0126     0.0091
     12      0.137      1.6912   -0.7373      0.4425   -0.1728   -0.3821    -0.3486
     13      0.001      1.4643    0.0717      0.0535   -0.0427    0.0030     0.0063
     14      0.030      1.7786   -0.3366     -0.0807   -0.1746    0.2583     0.1861
     15      0.044      1.0675    0.4209      0.0160   -0.0195    0.2003    -0.2036
     16      0.033      1.1616   -0.3601     -0.1515   -0.0774    0.1654     0.2177
     17      0.001      1.8390    0.0553      0.0462   -0.0383   -0.0150     0.0317
     18      0.003      1.4216    0.1022      0.0714   -0.0471   -0.0003    -0.0097
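Cook's distance can likewise be computed from the residual and the leverage alone. The formula below uses assumed textbook notation, with p the number of parameters:

```latex
D_i = \frac{e_i^{2}}{p \, MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}}
```

For observation 1 in the quadratic model, e_1 = -6.8164, p = 4, MSE = 5.36135 and h_11 = 0.0962 give D_1 ≈ 0.255, matching the first row of the table.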

31 Life Insurance: Multicollinearity
    proc reg data=quad;
        model amount=income risk incomesq/tol vif;
    run;
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
    Intercept    1           -200.81134          2.09649    -95.78     <.0001           .                    0
    income       1              5.88625          0.04201    140.11     <.0001     0.73842              1.35424
    risk         1              5.40039          0.25399     21.26     <.0001     0.92058              1.08627
    incomesq     1              0.05087          0.00244     20.85     <.0001     0.78954              1.26657

32 Body Fat: Multicollinearity (nknw260b.sas)
    data bodyfat;
        infile 'I:\My Documents\Stat 512\CH07TA01.DAT';
        input skinfold thigh midarm fat;
    proc print data=bodyfat;
    run;
    proc reg data=bodyfat;
        model fat=skinfold thigh midarm/vif tol;
    run;
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Tolerance   Variance Inflation
    Intercept    1            117.08469         99.78240      1.17     0.2578           .                    0
    skinfold     1              4.33409          3.01551      1.44     0.1699     0.00141            708.84291
    thigh        1             -2.85685          2.58202     -1.11     0.2849     0.00177            564.34339
    midarm       1             -2.18606          1.59550     -1.37     0.1896     0.00956            104.60601
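The Tolerance and Variance Inflation columns are two views of the same quantity. With R_k² denoting the coefficient of determination from regressing predictor X_k on the other predictors (assumed textbook notation):

```latex
(VIF)_k = \frac{1}{1 - R_k^{2}}, \qquad \text{Tolerance}_k = 1 - R_k^{2} = \frac{1}{(VIF)_k}
```

For income in the life insurance model, 1 / 0.73842 ≈ 1.35424, as printed. In the body fat model the tolerances are close to zero, so the variance inflation factors run into the hundreds, far above the commonly used warning level of 10.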

33 Blood Pressure Example: Background (nknw406.sas)
Researching the relationship between age and diastolic blood pressure in healthy women ages 20 – 60.
Y = diastolic blood pressure (diast)
X = age
n = 54

34 Blood Pressure: input
    data pressure;
        infile 'H:\My Documents\Stat 512\CH11TA01.DAT';
        input age diast;
    proc print data=pressure;
    run;
    title1 h=3 'Blood Pressure';
    title2 h=2 'Scatter plot';
    symbol1 v=circle i=sm70 c=purple;
    axis1 label=(h=2);
    axis2 label=(h=2 angle=90);
    proc sort data=pressure;
        by age;
    proc gplot data=pressure;
        plot diast*age;
    run;

35 Blood Pressure: Scatterplot

36 Blood Pressure: regression (unweighted)
    proc reg data=pressure;
        model diast=age / clb;
        output out=diag r=resid;
    run;
Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              1       2374.96833    2374.96833     35.79   <.0001
    Error             52       3450.36501      66.35317
    Corrected Total   53       5825.33333
    Root MSE          8.14575    R-Square   0.4077
    Dependent Mean   79.11111    Adj R-Sq   0.3963
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
    Intercept    1             56.15693          3.99367     14.06     <.0001   48.14304     64.17082
    age          1              0.58003          0.09695      5.98     <.0001    0.38548      0.77458

37 Blood Pressure: Residual Plots
    data diag;
        set diag;
        absr=abs(resid);
        sqrr=resid*resid;
    title2 h=2 'residual abs(resid) squared residual plots vs. age';
    proc gplot data=diag;
        plot (resid absr sqrr)*age/haxis=axis1 vaxis=axis2;
    run;

38 Blood Pressure: Residual Plots (cont)

39 Blood Pressure: computing weights
    proc reg data=diag;
        model absr=age;
        output out=findweights p=shat;
    data findweights;
        set findweights;
        wt=1/(shat*shat);

40 Blood Pressure: computing weights if using resid²
    proc reg data=diag;
        model sqrr=age;
        output out=findweights p=shat2;
    data findweights;
        set findweights;
        wt=1/shat2;
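The two weighting schemes and the estimator they feed can be summarized as follows. The symbols are assumed textbook notation; the hatted s_i and v_i correspond to the predicted values shat and shat2 in the code.

```latex
\begin{aligned}
w_i &= \frac{1}{\hat{s}_i^{\,2}}
  && \text{with } \hat{s}_i \text{ fitted from the regression of } |e_i| \text{ on age}, \\
w_i &= \frac{1}{\hat{v}_i}
  && \text{with } \hat{v}_i \text{ fitted from the regression of } e_i^{2} \text{ on age}, \\
\mathbf{b}_w &= (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y},
  && \mathbf{W} = \operatorname{diag}(w_1, \dots, w_n).
\end{aligned}
```

Cases with a larger estimated error variance receive smaller weights, so they pull less on the fitted line. Both schemes require the fitted values to be positive for every case, which is worth checking before taking reciprocals.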

41 Blood Pressure: weighted regression
    proc reg data=findweights;
        model diast=age / clb p;
        weight wt;
        output out = weighted r = resid p = predict;
    run;
Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              1         83.34082      83.34082     56.64   <.0001
    Error             52         76.51351       1.47141
    Corrected Total   53        159.85432
    Root MSE          1.21302    R-Square   0.5214
    Dependent Mean   73.55134    Adj R-Sq   0.5122
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
    Intercept    1             55.56577          2.52092     22.04     <.0001   50.50718     60.62436
    age          1              0.59634          0.07924      7.53     <.0001    0.43734      0.75534

42 Blood Pressure: Comparison
Normal (unweighted) Regression, Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
    Intercept    1             56.15693          3.99367     14.06     <.0001   48.14304     64.17082
    age          1              0.58003          0.09695      5.98     <.0001    0.38548      0.77458
Weighted Regression, Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
    Intercept    1             55.56577          2.52092     22.04     <.0001   50.50718     60.62436
    age          1              0.59634          0.07924      7.53     <.0001    0.43734      0.75534

43 Blood Pressure: new residuals
    data graphtest;
        set weighted;
        resid1 = sqrt(wt)*resid;
    title2 h=2 'Weighted data - residual plot';
    symbol1 v=circle i=none color=red;
    proc gplot data=graphtest;
        plot resid1*predict/vref=0 haxis=axis1 vaxis=axis2;
    run;

44 Blood Pressure: new residuals

45 Biased vs. Unbiased Estimators

46 Body Fat Example (ridge.sas)
n = 20 healthy female subjects, ages 25 – 34
Y = body fat (fat)
X1 = triceps skinfold thickness (skinfold)
X2 = thigh circumference (thigh)
X3 = midarm circumference (midarm)
Previous conclusion: problem with multicollinearity. Good model with a) thigh only or with b) midarm and skinfold only.

47 Body Fat Example: Regression (input)
    data bodyfat;
        infile 'I:\My Documents\Stat 512\CH07TA01.DAT';
        input skinfold thigh midarm fat;
    proc print data=bodyfat;
    run;
    proc reg data=bodyfat;
        model fat=skinfold thigh midarm;
    run;

48 Body Fat Example: Regression (output)
Analysis of Variance
    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              3        396.98461     132.32820     21.52   <.0001
    Error             16         98.40489       6.15031
    Corrected Total   19        495.38950
    Root MSE          2.47998    R-Square   0.8014
    Dependent Mean   20.19500    Adj R-Sq   0.7641
    Coeff Var        12.28017
Parameter Estimates
    Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
    Intercept    1            117.08469         99.78240      1.17     0.2578
    skinfold     1              4.33409          3.01551      1.44     0.1699
    thigh        1             -2.85685          2.58202     -1.11     0.2849
    midarm       1             -2.18606          1.59550     -1.37     0.1896

49 Body Fat Example: Scatter plot

50 Body Fat Example: Correlation
    proc corr data=bodyfat noprob;
    run;
Pearson Correlation Coefficients, N = 20
                skinfold     thigh    midarm       fat
    skinfold     1.00000   0.92384   0.45778   0.84327
    thigh        0.92384   1.00000   0.08467   0.87809
    midarm       0.45778   0.08467   1.00000   0.14244
    fat          0.84327   0.87809   0.14244   1.00000

51 Body Fat Example: Ridge trace
    title1 h=3 'Ridge Trace';
    title2 h=2 'Body Fat Example';
    axis1 label=(h=2);
    axis2 label=(h=2 angle=90);
    symbol1 v = S i = none c = black;
    symbol2 v = T i = none c = red;
    symbol3 v = M i = none c = green;
    proc reg data = bodyfat outvif outest = bfout ridge = 0 to .1 by 0.002;
        model fat = skinfold thigh midarm / noprint;
        plot / ridgeplot nomodel nostat;
    run;

52 Body Fat Example: Ridge trace (cont)
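For orientation, the ridge estimator traced in this plot can be written in terms of the correlation-transformed variables. The notation below is the usual textbook form and is assumed here; c is the ridge constant that appears as _RIDGE_ in the outest= data set.

```latex
\mathbf{b}^{R} = (\mathbf{r}_{XX} + c\,\mathbf{I})^{-1}\,\mathbf{r}_{YX}, \qquad c \ge 0
```

Here r_XX is the correlation matrix of the predictors and r_YX is the vector of correlations between each predictor and Y. Setting c = 0 recovers the ordinary least squares solution; increasing c introduces bias but shrinks the variance of the estimates, and the ridge trace is used to look for the smallest c at which the coefficients stabilize.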

53 Body Fat Example: VIF factors
    title2 h=2 'Variance Inflation Factors';
    proc gplot data = bfout;
        plot (skinfold thigh midarm)* _RIDGE_ / overlay haxis=axis1 vaxis=axis2;
        where _TYPE_ = 'RIDGEVIF';
    run;

54 Body Fat Example: VIF factors (cont)
    proc print data = bfout;
        var _RIDGE_ skinfold thigh midarm;
        where _TYPE_ = 'RIDGEVIF';
    Obs   _RIDGE_   skinfold     thigh    midarm
      2     0.000    708.843   564.343   104.606
      4     0.002     50.559    40.448     8.280
      6     0.004     16.982    13.725     3.363
      8     0.006      8.503     6.976     2.119
     10     0.008      5.147     4.305     1.624
     12     0.010      3.486     2.981     1.377
     14     0.012      2.543     2.231     1.236
     16     0.014      1.958     1.764     1.146
     18     0.016      1.570     1.454     1.086
     20     0.018      1.299     1.238     1.043
     22     0.020      1.103     1.081     1.011
     24     0.022      0.956     0.963     0.986
     26     0.024      0.843     0.872     0.966
     28     0.026      0.754     0.801     0.949
     30     0.028      0.683     0.744     0.935

55 Body Fat Example: Parameters
    title2 'Parameter Estimates';
    proc print data = bfout;
        var _RIDGE_ _RMSE_ Intercept skinfold thigh midarm;
        where _TYPE_ = 'RIDGE';
    run;
    Obs   _RIDGE_    _RMSE_   Intercept   skinfold      thigh     midarm
      3     0.000   2.47998     117.085    4.33409   -2.85685   -2.18606
      5     0.002   2.54921      22.277    1.46445   -0.40119   -0.67381
      7     0.004   2.57173       7.725    1.02294   -0.02423   -0.44083
      9     0.006   2.58174       1.842    0.84372    0.12820   -0.34604
     11     0.008   2.58739      -1.331    0.74645    0.21047   -0.29443
     13     0.010   2.59104      -3.312    0.68530    0.26183   -0.26185
     15     0.012   2.59360      -4.661    0.64324    0.29685   -0.23934
     17     0.014   2.59551      -5.637    0.61249    0.32218   -0.22278
     19     0.016   2.59701      -6.373    0.58899    0.34131   -0.21004
     21     0.018   2.59822      -6.946    0.57042    0.35623   -0.19991
     23     0.020   2.59924      -7.403    0.55535    0.36814   -0.19163
     25     0.022   2.60011      -7.776    0.54287    0.37786   -0.18470
     27     0.024   2.60087      -8.083    0.53233    0.38590   -0.17881
     29     0.026   2.60156      -8.341    0.52331    0.39265   -0.17372
     31     0.028   2.60218      -8.559    0.51549    0.39837   -0.16926

