Presentation is loading. Please wait.

Presentation is loading. Please wait.

Spreadsheet Problem Solving

Similar presentations


Presentation on theme: "Spreadsheet Problem Solving"— Presentation transcript:

1 Spreadsheet Problem Solving
fitting models to data straight-line regression multilinear regression nonlinear regression model building and selection using Data Analysis Regression tool Trendline Solver

2 Review of Straight-line Linear Regression
[ from Class #6 ] x y y = ax + b y1 y11 e11 Model x11 For each data point, there is an error between that point and the model line. Fitting the model has to do with minimizing these errors.

3 Finding the model parameters that give the best fit
For the straight-line model, the model parameters are the slope (a) and the intercept (b). The problem is then to find the values of a and b that give the best fit. What is meant by the best fit? The standard measure of goodness of fit is the sum of squares of the errors: So, the problem reduces to finding the minimum of SSE by adjusting a and b.

4 Fitting a straight-line model to data
The minimization of SSE can be solved by calculus to give formulas for the best values of a and b: and Excel solves problems like this with either formulas or built-in tools (Data Analysis Regression & Trendline).

5 Example: straight-line fit

6 Transfer the data to an Excel spreadsheet
and create a graph

7 Calculating the slope and intercept using Excel formulas

8 The formulas behind the numbers

9 Using the model straight-line equation to compute
the predictions: and copy these to the graph, displaying as a straight line

10

11 Using an alternate, shortcut approach Trendline
Start with a simple graph of the data Select the data series by clicking on it Right-click on a data point to get context-sensitive menu Select Add Trendline option

12 The Add Trendline dialog box
Linear selected by default OK for this problem Click on Options tab

13 Options tab Set for Display equation on chart Click OK

14 Initial form of graph with straight-line added
Fix up equation display

15 Looks just like before, but we got there quicker
But neither of these approaches gives us much information about the model, how good it is, etc.

16 A 2nd alternate approach Data Analysis Regression tool
Tools Data Analysis recall that, if Data Analysis does not appear on the Tools menu, you will need to check Analysis Toolpak in the Add-ins dialog box [if it’s not there, you will have to go back to Microsoft Office/Excel set-up] Initial, empty Regression dialog box

17 Regression dialog box set up for our problem
checking Residuals will give us also model predictions

18 Initial (poorly formatted) Regression output display
[ on new worksheet ] Format Autoformat OK and fix up display for appropriate significant figures

19 Final Display of Regression Output
[ tons of info, most of which you will not understand for a couple years ] used to judge goodness of fit intercept and slope values used to judge whether terms “belong” in the model add to data graph for visual comparison with model

20 Judging Goodness of Fit
correlation coefficient: if close to +1 or –1, indicates strong correlation between x and y [something we already know from the original graph!] coefficient of determination: %-age of the variability in y that’s accounted for by the model adjustment to R2 that penalizes the value for using a model with too many terms gives an idea of how far off the model predictions will be Adjusted R2 or Standard Error can be used to compare different models and choose which fits best. The higher the value of Adjusted R2 the better, the lower the value of Standard Error the better.

21 Judging whether terms belong in the model
P-values estimate the probability that the true value of the coefficient could be zero P-values that are quite small, like these, indicate that there is little question about the significance of the term coefficients. In our case here, that means that both the intercept term and the slope term belong in the model. A P-value of 5% (0.05) or greater causes suspicion that the coefficient may not be significant and that the term should probably be dropped from the model

22 The Data Analysis Regression tool appears much more complicated and involved that the shortcut Trendline tool, so . . . Why use Data Analysis Regression? It provides more information that let’s us judge the goodness of fit and significance of model terms 2) It can handle model forms that cannot be handled by Trendline So, generally, when using Excel, we prefer the Data Analysis Regression tool over Trendline but Trendline is still quite good for “quick and dirty” looks at the data Learn to use both!

23 More complicated models
Note: it is called linear regression, even when there are nonlinear terms in x, because the terms are linear in the model parameters, a, b, c, etc. More complicated models Polynomial models General linear models Examples: polynomial models above Multilinear models Examples:

24 Nonlinear models Transformable to linear Not transformable straight-line regression! We can use the Data Analysis Regression tool for everything except the nonlinear models that can’t be transformed into linear. For those, we can use the Solver.

25 Example: polynomial regression
curvature evident

26 Setting up for polynomial fits
Select for quadratic model, etc

27 Data Analysis Regression tool
check Labels because headings are included in selections for Y and X check Residuals

28 Quadratic model regression results
model performance adjR2 model coefficients copy to graph

29 Quadratic model really doesn’t “capture” behavior of data

30 Continue with fits of cubic, 4th- & 5th-order polynomials
Summary of results Looks like 5th-order offers best performance but improvement is marginal over 4th-order. Resulting model:

31

32 Precautions on polynomial fitting
Try to use the lowest-order model that gives a good fit. Higher-order models will have “wiggles” between data points that will cause prediction errors. In fact, an (n-1)th-order polynomial will provide a perfect fit to the n data points, but it will usually do bizarre things in between the data points.

33 Example: multi-linear regression Model 1:
X-input range includes two independent variables: x1 and x2 High P value for intercept in Model 1 suggests Model 2 without intercept, but there is a significant loss in adjR2

34 Model performance isn’t that
great for either model, and Model 1 doesn’t appear dramatically better than Model 2 Note: for multi-linear models, we plot Predicted vs Measured y. A perfect model would place points directly on the 45-degree line.

35 Nonlinear Regression Fitting the parameters of the van der Waals’ equation of state Data for SO2 Find the values of a and b that give the best predictions for P, when compared to the measured values of P

36 Strategy for Nonlinear Regression
1) estimate initial values for a and b 2) compute predicted P’s using data for and T 3) compute errors between predicted P’s and measured P’s 4) sum the squares of these errors to compute SSE 5) have the Solver minimize SSE by adjusting the values of a and b

37 - Basic data Calculated Pressure by both ideal gas law
and van der Waals Sum of squares of this column -

38 van der Waals Calculation
Ideal Gas Calculation Sum of Squares Calculation van der Waals Calculation Error Calculation

39 Setting up Solver Parameters
SSE as Target Cell Minimize by adjusting a and b with b>=0 constraint Results

40 Results

41 Note departure of ideal gas predictions at higher pressures


Download ppt "Spreadsheet Problem Solving"

Similar presentations


Ads by Google