Ch4 Describing Relationships Between Variables. Pressure.


1 Ch4 Describing Relationships Between Variables

2 Pressure

3 Section 4.1: Fitting a Line by Least Squares Often we want to fit a straight line to data. For example, an experiment might give the following data on the density of specimens made from a ceramic compound pressed at different pressures. By fitting a line to the data we can predict the average density of specimens made at any given pressure, even pressures we did not investigate experimentally.


5 For a straight line we assume a model which says that, on average in the whole population of possible specimens, the average density y is related to pressure x by the equation y = β0 + β1x. The population (true) intercept β0 and slope β1 are represented by Greek letters, just like μ and σ.

6 How to choose the best line? The principle of least squares. To apply the principle of least squares in fitting an equation for y to an n-point data set, values of the equation parameters are chosen to minimize Σ(yi − ŷi)², where y1, y2, …, yn are the observed responses and the ŷi are the corresponding responses predicted or fitted by the equation.

7 In other words We want to choose a slope and intercept so as to minimize the sum of squared vertical distances from the data points to the line in question.

8 A least squares fit minimizes the sum of squared deviations from the fitted line: minimize Σ(yi − ŷi)². Deviations from the fitted line are called "residuals," so we are minimizing the sum of squared residuals, called the "residual sum of squares."

9 Come again We need to minimize Σ(yi − (β0 + β1xi))² over all possible values of β0 and β1. This is a calculus problem (take partial derivatives with respect to β0 and β1 and set them to zero).

10 The resulting formulas for the least squares estimates of the slope and intercept are b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. Notice the notation: we use b1 and b0 to denote the particular fitted values of the parameters β1 and β0.
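These formulas translate directly into code. A minimal Python sketch (the function name is ours, not from the slides):

```python
def least_squares_line(x, y):
    """Fit y ~ b0 + b1*x by the least squares formulas:
    b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),  b0 = ybar - b1*xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on y = 2 + 3x recover that intercept and slope.
b0, b1 = least_squares_line([0, 1, 2, 3], [2, 5, 8, 11])
```

On data with scatter, the same formulas return the line that minimizes the residual sum of squares.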

11 The fitted line For the measured data we fit a straight line ŷ = b0 + b1x. For the i-th point, the fitted value or predicted value is ŷi = b0 + b1xi, which represents the likely y behavior at that x value.

12 Ceramic Compound data

x       y       x − x̄   y − ȳ    (x − x̄)(y − ȳ)   (x − x̄)²
2000    2.486   -4000   -0.181     724            16,000,000
2000    2.479   -4000   -0.188     752            16,000,000
2000    2.472   -4000   -0.195     780            16,000,000
4000    2.558   -2000   -0.109     218             4,000,000
4000    2.570   -2000   -0.097     194             4,000,000
4000    2.580   -2000   -0.087     174             4,000,000
6000    2.646       0   -0.021       0                     0
6000    2.657       0   -0.010       0                     0
6000    2.653       0   -0.014       0                     0
8000    2.724    2000    0.057     114             4,000,000
8000    2.774    2000    0.107     214             4,000,000
8000    2.808    2000    0.141     282             4,000,000
10000   2.861    4000    0.194     776            16,000,000
10000   2.879    4000    0.212     848            16,000,000
10000   2.858    4000    0.191     764            16,000,000
                              sum = 5840    sum = 120,000,000

x̄ = 6000     b1 = 5840 / 120,000,000 = 4.87E-05
ȳ = 2.667    b0 = 2.667 − (4.87E-05)(6000) = 2.375
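As a check, the table's summary quantities can be reproduced in a few lines of Python (a sketch; the variable names are ours):

```python
# Ceramic compound data from the table (pressure in psi, density in g/cc).
x = [2000]*3 + [4000]*3 + [6000]*3 + [8000]*3 + [10000]*3
y = [2.486, 2.479, 2.472, 2.558, 2.570, 2.580,
     2.646, 2.657, 2.653, 2.724, 2.774, 2.808,
     2.861, 2.879, 2.858]

xbar = sum(x) / len(x)                                        # 6000
ybar = sum(y) / len(y)                                        # 2.667
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # 5840
sxx = sum((xi - xbar) ** 2 for xi in x)                       # 120,000,000
b1 = sxy / sxx                                                # about 4.87e-05
b0 = ybar - b1 * xbar                                         # 2.375
```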


14 Interpolation Consider x = 5000 psi: there are no data with this x value. If interpolation is sensible from a physical point of view, the fitted value ŷ = b0 + b1(5000) can be used to represent the density at 5,000 psi pressure.
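For instance, plugging x = 5000 into the line fitted to the ceramic data (b0 = 2.375, b1 = 5840/120,000,000 from the table above) gives the interpolated density; a one-line sketch:

```python
# Fitted coefficients from the ceramic data table.
b0, b1 = 2.375, 5840 / 120_000_000

# Predicted (interpolated) density at 5000 psi, between the 4000 and 6000 psi groups.
y_hat_5000 = b0 + b1 * 5000   # roughly 2.618 g/cc
```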

15 "Least squares" is the optimal method to use for fitting the line if:
– The relationship is in fact linear.
– For a fixed value of x, the resulting values of y are normally distributed with the same constant variance at all x values.

16 If these assumptions are not met, then we are not using the best tool for the job. For any statistical tool, know when that tool is the right one to use.

17 4.1.2 The Sample Correlation and Coefficient of Determination The sample (linear) correlation coefficient, r, is a measure of how "correlated" the x and y variables are. The correlation is always between −1 and +1:
– +1 means perfect positive linear correlation
– 0 means no linear correlation
– −1 means perfect negative linear correlation

18 The sample correlation is computed by r = Σ(xi − x̄)(yi − ȳ) / √( Σ(xi − x̄)² · Σ(yi − ȳ)² ). It is a measure of the strength of an apparent linear relationship.
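A minimal Python sketch of this formula, applied to the ceramic data (variable names are ours):

```python
import math

# Sample correlation r = Sxy / sqrt(Sxx * Syy) for the ceramic data.
x = [2000]*3 + [4000]*3 + [6000]*3 + [8000]*3 + [10000]*3
y = [2.486, 2.479, 2.472, 2.558, 2.570, 2.580,
     2.646, 2.657, 2.653, 2.724, 2.774, 2.808,
     2.861, 2.879, 2.858]

xbar, ybar = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # about 0.991: a strong positive linear relationship
```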


21 Coefficient of Determination R² = ( Σ(yi − ȳ)² − Σ(yi − ŷi)² ) / Σ(yi − ȳ)² is another measure of the quality of a fitted equation.

22 Interpretation of R² R² = fraction of variation accounted for (explained) by the fitted line.

23 (Density here is recorded in thousandths, i.e. 1000 × the earlier y values.)

Pressure   y = Density   y − ȳ    (y − ȳ)²
2000       2486          -181     32761
2000       2479          -188     35344
2000       2472          -195     38025
4000       2558          -109     11881
4000       2570           -97      9409
4000       2580           -87      7569
6000       2646           -21       441
6000       2657           -10       100
6000       2653           -14       196
8000       2724            57      3249
8000       2774           107     11449
8000       2808           141     19881
10000      2861           194     37636
10000      2879           212     44944
10000      2858           191     36481
mean       6000  2667   sum = 0   sum = 289366
st dev     2927.7  143.767
correlation = 0.991
correlation² = 0.982

24
Observation   Predicted Density   Residual   Residual²
1             2472.333             13.667     186.778
2             2472.333              6.667      44.444
3             2472.333             -0.333       0.111
4             2569.667            -11.667     136.111
5             2569.667              0.333       0.111
6             2569.667             10.333     106.778
7             2667.000            -21.000     441.000
8             2667.000            -10.000     100.000
9             2667.000            -14.000     196.000
10            2764.333            -40.333    1626.778
11            2764.333              9.667      93.444
12            2764.333             43.667    1906.778
13            2861.667             -0.667       0.444
14            2861.667             17.333     300.444
15            2861.667             -3.667      13.444
                                   sum =     5152.667

25 If we don't use the pressures to predict density:
– We use ȳ to predict every yi.
– Our sum of squared errors is Σ(yi − ȳ)² = 289,366 = SS Total.
If we do use the pressures to predict density:
– We use ŷi = b0 + b1xi to predict yi.
– Σ(yi − ŷi)² = 5,152.67 = SS Residual.

26 The percent reduction in our error sum of squares is (289,366 − 5,152.67) / 289,366 = 0.982. Using x to predict y decreases the error sum of squares by 98.2%.

27 The reduction in error sum of squares from using x to predict y is the sum of squares explained by the regression equation: 289,366 − 5,152.67 = 284,213.33 = SS Regression. This reduction fraction is also the correlation squared: r² = 0.991² = 0.982 = R².
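The sum-of-squares bookkeeping from the last few slides can be verified in a couple of lines (Python sketch using the slide values; density in thousandths, as in the tables):

```python
# Sum-of-squares decomposition for the ceramic data.
ss_total = 289366            # sum of (y - ybar)^2, errors using only ybar
ss_residual = 5152.666667    # sum of squared residuals from the fitted line

ss_regression = ss_total - ss_residual   # 284,213.33, explained by the line
r_squared = ss_regression / ss_total     # about 0.982, matching 0.991**2
```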

28 For a perfectly straight line:
– All residuals are zero; the line fits the points exactly.
– SS Residual = 0, so SS Regression = SS Total; the regression equation explains all variation.
– R² = 100%, r = ±1, r² = 1.
If r = 0, then there is no linear relationship between x and y:
– R² = 0%; using x to predict y with a linear model does not help at all.

29 4.1.3 Computing and Using Residuals Does our linear model extract the main message of the data? What is left behind? (Hopefully only small fluctuations explainable as random variation.) Residuals!

30 Good Residuals: Patternless They should look randomly scattered if the fitted equation is telling the whole story. When there is a pattern, something has gone unaccounted for in the fitting.

31 Plotting residuals will be most crucial in Section 4.2 with multiple x variables, but residual plots are still of use here. Plot residuals:
– versus predicted values
– versus x
– in run order
– versus other potentially influential variables, e.g. technician
– in a normal plot
Read page 135 of the book for more residual plots.
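Computing the residuals to plot is a one-liner once the line is fitted; a small sketch for the ceramic data (coefficients taken from the earlier slides; the plotting call is only indicated in a comment):

```python
# Residuals e_i = y_i - yhat_i for the ceramic data. For a least squares line
# they always sum to (essentially) zero; plotted against x or against the
# fitted values, they should look patternless if the line tells the whole story.
x = [2000]*3 + [4000]*3 + [6000]*3 + [8000]*3 + [10000]*3
y = [2.486, 2.479, 2.472, 2.558, 2.570, 2.580,
     2.646, 2.657, 2.653, 2.724, 2.774, 2.808,
     2.861, 2.879, 2.858]

b0, b1 = 2.375, 5840 / 120_000_000   # fitted coefficients from the data table
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# e.g. with matplotlib: plt.scatter(x, residuals); plt.axhline(0)
```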

32 Checking Model Adequacy With only a single x variable, we can tell most of what we need from a plot of the data with the fitted line.

33 A residual plot gives us a magnified view of the increasing variance and curvature. This residual plot indicates two problems with this linear least squares fit:
– The relationship is not linear, as indicated by the curvature in the residual plot.
– The variance is not constant, so the least squares method isn't the best approach even if we handle the nonlinearity.

34 Some Cautions Don't fit a linear function to these data directly with least squares: with increasing variability, not all squared errors should count equally.

35 Some Study Questions
– What does it mean to say that a line fit to data is the "least squares" line? Where do the terms least and squares come from?
– We are fitting data with a straight line. What 3 assumptions (conditions) need to be true for a linear least squares fit to be the optimal way of fitting a line to the data?
– What does it mean if the correlation between x and y is −1? What is the residual sum of squares in this situation?
– If the correlation between x and y is 0, what is the regression sum of squares, SS Regression, in this situation?

36 Consider the following data.

ANOVA
             df   SS         MS        F       Significance F
Regression   1    124.0333   124.03    15.85   0.016383072
Residual     4    31.3       7.825
Total        5    155.3333

             Coefficients   Standard Error   t Stat    P-value   Lower 95%     Upper 95%
Intercept    -0.5           2.79732          -0.1787   0.867     -8.26660583   7.2666058
X            1.525          0.383039         3.9813    0.016     0.461513698   2.5884863

37
– What is the value of R²?
– What is the least squares regression equation?
– How much does y increase on average if x is increased by 1.0?
– What is the sum of squared residuals? Do not compute the residuals; find the answer in the Excel output.
– What is the sum of squared deviations of y from ȳ?
– By how much is the sum of squared errors reduced by using x to predict y compared to using only ȳ to predict y?
– What is the residual for the point with x = 2?

