# Estimating the accuracy of the approximation (surrogate)

## Presentation transcript

- Assuming that the error is due to normally distributed, uncorrelated random variables, we obtain an estimate of the error standard deviation, called the standard error. It is the standard measure of accuracy.
- The coefficient of multiple determination measures how much of the variability in the data is captured by the approximation.
- The adjusted coefficient of multiple determination accounts for the fitting bias.
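The three measures named above are usually written as follows. The formulas are not preserved in this transcript, so this is the standard textbook form, with assumed notation: $n$ data points $y_i$ with mean $\bar{y}$, predictions $\hat{y}_i$, and $n_\beta$ fitted coefficients.

$$
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - n_\beta}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad
R_a^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - n_\beta}
$$

Because $n - n_\beta < n - 1$ whenever more than one coefficient is fitted, $R_a^2$ is always below $R^2$, and the gap grows as coefficients are added.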

Curve fit

    noise = randn(1,30);               % standard normal noise
    x = 1:1:30;
    y = x + noise
    % y = 3.908 2.825 4.379 2.942 4.5314 5.7275 8.098 … 25.84 27.47 27.00 30.96
    [p,s] = polyfit(x,y,1);            % least-squares straight line
    yfit = polyval(p,x);
    plot(x,y,'+',x,x,'r',x,yfit,'b')   % data (+), true line (red), fit (blue)

With dense data, the functional form is clear. The fit serves to filter out the noise.

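Example with y=0.1*x

    noise = randn(1,30);
    x = 1:1:30;
    y = 0.1*x + noise;
    xx = [ones(30,1), x'];             % columns: constant term, x
    [B,BINT,R,RINT,STATS] = regress(y',xx)
    % STATS = 0.3016 12.0896 0.0017 1.7498

The four entries of STATS are the R² statistic, the F statistic, its p-value, and the estimate of the error variance. With the slope reduced to 0.1, the same noise level leaves only about 30% of the variability in the data captured by the fit.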

Estimating error in coefficients

- Some coefficients are estimated more accurately than others.
- Each coefficient has its own standard error.
- The t-statistic is the ratio of a coefficient to its standard error; we would like its magnitude to be at least 2.
- Coefficients that are poorly estimated may be dropped to improve the accuracy of predictions.
- Dropping one coefficient changes the t-statistics of the others, so dropping and adding coefficients must be done iteratively.
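The formula for the standard error of a coefficient is not preserved in this transcript. The standard least-squares result is given here with assumed notation: $X$ is the matrix whose rows contain the basis functions evaluated at each data point, $b_i$ is the $i$-th fitted coefficient, and $\hat{\sigma}$ is the standard error of the fit.

$$
\operatorname{se}(b_i) = \hat{\sigma}\,\sqrt{\left[(X^T X)^{-1}\right]_{ii}}, \qquad
t_i = \frac{b_i}{\operatorname{se}(b_i)}
$$

Since $(X^T X)^{-1}$ depends on every column of $X$, removing one basis function changes the standard errors of all remaining coefficients, which is why the t-statistics have to be recomputed after each change.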

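Regression in Excel (Data Analysis add-in)

| Rand | Rand-0.5 | x | y | fit | error |
|---|---|---|---|---|---|
| 0.764742 | 0.264742 | 1 | 1.264742 | 1.03539 | 0.03539 |
| 0.258649 | -0.24135 | 2 | 1.758649 | 2.031192 | 0.031192 |
| 0.735026 | 0.235026 | 3 | 3.235026 | 3.026994 | 0.026994 |
| 0.411036 | -0.08896 | 4 | 3.911036 | 4.022797 | 0.022797 |
| … | … | … | … | … | … |
| 0.674921 | 0.174921 | 24 | 24.17492 | 23.93884 | -0.06116 |
| 0.69481 | 0.19481 | 25 | 25.19481 | 24.93465 | -0.06535 |
| 0.647964 | 0.147964 | 26 | 26.14796 | 25.93045 | -0.06955 |
| 0.407839 | -0.09216 | 27 | 26.90784 | 26.92625 | -0.07375 |
| 0.211674 | -0.28833 | 28 | 27.71167 | 27.92205 | -0.07795 |
| 0.405013 | -0.09499 | 29 | 28.90501 | 28.91786 | -0.08214 |
| 0.242633 | -0.25737 | 30 | 29.74263 | 29.91366 | -0.08634 |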

Regression output

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.999381 |
| R Square | 0.998763 |
| Adjusted R Square | 0.998719 |
| Standard Error | 0.313962 |
| Observations | 30 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0.039587 | 0.117570 | 0.336711 | 0.738845 | -0.201245 | 0.280419 |
| X Variable 1 | 0.995802 | 0.006623 | 150.364623 | 2.93E-42 | 0.982237 | 1.009368 |
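The coefficient standard errors in this output can be checked by hand. The sketch below assumes the closed-form expressions for a straight-line fit and uses only the reported standard error of the fit (0.313962) and the sample locations x = 1, …, 30. The function name `line_fit_standard_errors` is illustrative.

```python
import math


def line_fit_standard_errors(x, sigma):
    """Standard errors of intercept and slope for a least-squares line.

    x     : sample locations
    sigma : estimated error standard deviation (the "Standard Error" of the fit)
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    se_intercept = sigma * math.sqrt(1.0 / n + x_mean ** 2 / sxx)
    se_slope = sigma / math.sqrt(sxx)
    return se_intercept, se_slope


# Values copied from the regression output; x = 1..30 as in the spreadsheet.
x = list(range(1, 31))
se_b0, se_b1 = line_fit_standard_errors(x, sigma=0.313962)
t_b0 = 0.039587 / se_b0   # t-statistic of the intercept
t_b1 = 0.995802 / se_b1   # t-statistic of the slope
print(round(se_b0, 6), round(se_b1, 6))
print(round(t_b0, 3), round(t_b1, 1))
```

The results agree with the table to the precision of the rounded inputs. The intercept's t-statistic of about 0.34 is far below 2, which marks it as poorly estimated, while the slope is estimated very accurately.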

Output with y=0.1x

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.969193 |
| R Square | 0.939334 |
| Adjusted R Square | 0.937168 |
| Standard Error | 0.251021 |
| Observations | 30 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | -0.19083 | 0.094 | -2.03012 | 0.051942 | -0.3834 | 0.0017 |
| X Variable 1 | 0.11025 | 0.005295 | 20.82177 | 1.41E-18 | 0.0994 | 0.1211 |
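The "Adjusted R Square" entries in both Excel outputs follow from "R Square", the number of observations, and the number of fitted coefficients. The helper `adjusted_r2` below is a small sketch using the standard definition; its name is an assumption.

```python
def adjusted_r2(r2, n_points, n_coeffs):
    """Adjusted coefficient of multiple determination."""
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - n_coeffs)


# R Square values copied from the two Excel outputs; 30 points, 2 coefficients.
print(round(adjusted_r2(0.998763, 30, 2), 6))
print(round(adjusted_r2(0.939334, 30, 2), 6))
```

Both results match the reported values to within the rounding of the inputs. With 30 points and only two coefficients, the correction is small in either case.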

Example 3.2.1

Given the data below, use Microsoft Excel to fit linear and quadratic polynomials, and compare the standard errors and t-statistics of the coefficients.

| X | -2 | 0 | 1 | 2 |
|---|---|---|---|---|
| Y | -1.5 | 0 | 1.25 | 1.75 |
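The example asks for Excel, but the comparison can also be sketched in plain Python. The code below is an illustration under two assumptions: the data are read from the flattened table as X = (-2, 0, 1, 2) and Y = (-1.5, 0, 1.25, 1.75), and the helper names `invert` and `polyfit_with_t_stats` are hypothetical. It solves the normal equations directly, which is adequate only for small, well-conditioned problems like this one.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def polyfit_with_t_stats(x, y, degree):
    """Least-squares polynomial fit; coefficients are ordered [1, x, x^2, ...]."""
    n, m = len(x), degree + 1          # needs n > m for an error estimate
    basis = [[xi ** k for k in range(m)] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(m)] for i in range(m)]
    xty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(m)]
    inv = invert(xtx)
    coeffs = [sum(inv[i][j] * xty[j] for j in range(m)) for i in range(m)]
    residuals = [yi - sum(c * b for c, b in zip(coeffs, row))
                 for row, yi in zip(basis, y)]
    sigma = math.sqrt(sum(e * e for e in residuals) / (n - m))   # standard error of fit
    std_errs = [sigma * math.sqrt(inv[i][i]) for i in range(m)]
    t_stats = [c / s if s > 0 else math.inf for c, s in zip(coeffs, std_errs)]
    return {"coeffs": coeffs, "std_errs": std_errs, "t_stats": t_stats, "sigma": sigma}


# Data as read from the flattened table (an assumption).
x_data = [-2.0, 0.0, 1.0, 2.0]
y_data = [-1.5, 0.0, 1.25, 1.75]
linear = polyfit_with_t_stats(x_data, y_data, 1)
quadratic = polyfit_with_t_stats(x_data, y_data, 2)
for name, fit in (("linear", linear), ("quadratic", quadratic)):
    print(name, fit)
```

With four points, the quadratic fit leaves a single degree of freedom for estimating the error, so its standard errors and t-statistics should be interpreted with caution.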

Linear fit

Graphical comparison.

Cross validation

- Error estimates based on model assumptions are vulnerable when those assumptions do not hold. For polynomial response surface approximations, the assumptions are rarely satisfied.
- Cross validation divides the data into $n_g$ groups.
- Fit the approximation to $n_g - 1$ groups and use the remaining group to estimate the error. Repeat for each group.
- When each group consists of one point, the error is called PRESS (prediction error sum of squares).
- Calculate the error at each point and then report the r.m.s. error.
- It can be shown that the PRESS errors can be obtained in closed form (the equation is not preserved in this transcript); the result can be used only if the problem is not ill-conditioned.
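The leave-one-out procedure can be sketched as follows for a straight-line fit. The data are made up for illustration, and the function names are assumptions. The second function uses the standard identity that the leave-one-out error equals the ordinary residual divided by one minus the leverage of that point. This is presumed, not confirmed, to be the closed-form result the slide refers to.

```python
import math


def fit_line(x, y):
    """Least-squares line; returns (intercept, slope)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    return y_mean - slope * x_mean, slope


def press_errors_by_refitting(x, y):
    """Leave each point out in turn, refit, and record the prediction error there."""
    errors = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (b0 + b1 * x[i]))
    return errors


def press_errors_by_leverage(x, y):
    """Same errors from a single fit: residual / (1 - leverage)."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    errors = []
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - x_mean) ** 2 / sxx
        errors.append((yi - (b0 + b1 * xi)) / (1.0 - leverage))
    return errors


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Illustrative data only (not from the slides).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2]
refit = press_errors_by_refitting(x, y)
shortcut = press_errors_by_leverage(x, y)
print("PRESS r.m.s. error:", rms(refit))
```

The two routes give the same errors up to round-off. The shortcut avoids refitting but relies on the full-data fit being well-conditioned, and a leverage close to 1 makes the division unreliable.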

Questions

- The pairs (0,0), (1,1), (2,1) represent strain (millistrains) and stress (ksi) measurements. Estimate Young's modulus using the three commonly used error norms.
- Estimate the error in Young's modulus using cross validation.

