Presentation is loading. Please wait.

Presentation is loading. Please wait.

7 - 1 Chapter 7: Data Analysis in the Service of Modeling The Art of Modeling with Spreadsheets S.G. Powell and K.R. Baker © John Wiley and Sons, Inc.

Similar presentations


Presentation on theme: "7 - 1 Chapter 7: Data Analysis in the Service of Modeling The Art of Modeling with Spreadsheets S.G. Powell and K.R. Baker © John Wiley and Sons, Inc."— Presentation transcript:

1 7 - 1 Chapter 7: Data Analysis in the Service of Modeling The Art of Modeling with Spreadsheets S.G. Powell and K.R. Baker © John Wiley and Sons, Inc. PowerPoint Slides Prepared By: Tava Olsen Washington University in St. Louis

2 7 - 2 Data Analysis in the Context of Modeling  Supports the modeling process Improves accuracy of model Improves usefulness of conclusions  Modeling is the primary goal Data analysis is a means to that goal

3 7 - 3 Topics for Chapter  Finding facts in databases Searching, editing, sorting, and filtering  Estimating parameters Point estimates and interval estimates  Estimating relationships among variables Single, multiple, and nonlinear regression  Forecasting a single variable Time series methods

4 7 - 4 Databases  Tables of information  Each row is a record in the database  Each column is a field for the records  Excel calls such a table a list

5 7 - 5 Excel Lists  First row contains names for each field  Each successive row contains one record  Lists may be: Searched and edited Sorted Filtered Tabulated

6 7 - 6 Searching and Editing Lists  First assign a range name to entire list Include column titles  With list selected choose Data – Form  Examine records one at a time: Find Prev Find Next Enter new record with New button Delete record with Delete button

7 7 - 7 Database Form  ***Insert Figure 7.8

8 7 - 8 Criteria Button  Found under Data – Form  Allows for searching of records Enter data into a field Click Find Next

9 7 - 9 Alternate Excel Search Techniques  Highlight entire database  Use Edit – Find to search  Use Find and Replace to edit entries  In Find and Replace “?” stands for any single symbol “*” stands for any sequence of symbols

10 7 - 10 Sorting: Data – Sort Command  ***insert figure 7.10

11 7 - 11 Filtering  Select database then Data – Filter – AutoFilter  Will filter lists based on values Found under arrow at the title of each column  Arrow on title turns blue to remind list is filtered  Can remove filter by: Select (All) using the list arrow; or Selecting Show All under Data – Filter

12 7 - 12 More Filtering  Top 10 option returns records with smallest or largest value of a numerical record  Custom option allows filtering with compound criteria  More complicated compound criteria can be achieved with Data – Filter – Advanced Filter submenu

13 7 - 13 Tabulating  Select Data – Pivot Table  Creates summary tables  Layout button on third step of wizard creates the format for the table

14 7 - 14 Analyzing Sample Data  Data is unlikely to cover whole population  Work with sample from population Statistics are summary measures about sample Want to construct statistics that represent population  Convenience sampling Have easy access to information on subset of population Subset may not be representative  Random sampling All objects in population have equal chance of appearing in sample

15 7 - 15 Descriptive Statistics  Summarizes information in sample  Gives numerical picture of observations  Excel Tools – Data Analysis Descriptive Statistics table produced based on data given as input

16 7 - 16 Inferential Statistics  Use information in sample to make inferences about population  Systematic Error If sample not representative of population Avoid by careful sampling  Sampling Error Sample is merely subset of population Mitigated by taking large samples

17 7 - 17 Point Estimates  The sample average is calculated as:  The sample variance is calculated as:  and its square root is the sample standard deviation:

18 7 - 18 Interval Estimates  P(L <=  <= U) = 1 –   L and U represent the lower and upper limits of the interval  1 –  represents the confidence level Usually a large percentage like 95 or 99%   represents the (unknown) true value of the parameter.

19 7 - 19 Sampling Theory  Working with a population described by a Normal probability model Mean  and standard deviation .  Take repeated samples of n items from population  Calculate the sample average each time  The sample averages will follow a Normal distribution with a mean of  and a variance of  2 /n

20 7 - 20 Estimates  Standard error: the standard deviation of some function being used to provide an estimate  Use the sample average to estimate the population mean  The standard deviation of the sample average is called the standard error of the mean:

21 7 - 21 Z-scores  The z-score measures the number of standard deviations away from the mean  The z-score corresponding to any particular sample average is:  Tells how many standard errors from the mean  90% of the sample averages will have z-scores between –1.64 and +1.64 The chances are 90% that the sample average will fall no more than 1.64 standard errors from the true mean

22 7 - 22 Confidence Intervals for Means  Upper and lower limits on estimate for mean:  n>30 recommended unless original population resembles Normal  z can be computed using NORMSINV(1-  /2)  Replace  by the sample standard deviation s Provided that sample is larger than n = 30  Excel Descriptive Statistics also will calculate half- width of confidence interval

23 7 - 23 Interval Estimates for a Proportion  To estimate the sample proportion p, the interval estimate is:  Sample size should be at least 50 for this formula to be reliable

24 7 - 24 Sample Size Determination  Suppose want to estimate mean of sample to within a range of ±R n = (z  / R) 2  Assumes: Sampling from Normal distribution Known variance – can begin with small sample to estimate standard deviation

25 7 - 25 Sample Size Determination for Proportions  Suppose want to estimate a proportion to within a range of ±R n = z 2 p(1 – p) / R 2  Value maximized at p = 0.5  Conservative value: n = (z/2) 2 / R 2

26 7 - 26 Estimating Relationships  Scatter plot – visualize association  Correlation:  n – number of pairs of observations for x, y  s x, s y – standard deviations of x, y  r – measures strength of linear relationship between x and y

27 7 - 27 r-statistic  Independent of units of measurement  Lies in range [-1, 1]  r > 0 – positive association  r < 0 – negative association  r close to 1 (or –1) implies a strong association  r close to 0 implies a weak association  Excel function: CORREL(xrange,yrange)

28 7 - 28 Regression Relationships  Relationships based on empirical data  Dependent variable – predicted from values of one or more independent variables  Regression models can be: Linear or nonlinear Simple or multiple

29 7 - 29 Simple Linear Regression  y = a + bx + e  y - dependent variable  x - independent variable  e - an “error” term.  Constants a and b represent the intercept and slope, respectively, of the regression line

30 7 - 30 Error Term in Regression  Unexplained “noise” in the relationship  May represent limitations of knowledge  Or may represent random deviations of the dependent variable from its mean, y

31 7 - 31 Regression Goal  Want to find line to most closely match the observed relationship between x and y  Define “most closely” as minimizing sum of squared differences between observed and model values Minimizing sum of differences would set y equal to its mean Penalizes large differences more than small differences

32 7 - 32 Performing Regression  Residuals: e i = y i – y = y i – (a + bx i )  Sum of squared differences between observations and model : SS =  The regression problem: choose a and b to minimize SS

33 7 - 33 Regression Analysis  Assumes residuals are normally distributed with mean 0  Regression parameters can be calculated directly from the data  Simpler to use Excel’s regression tool (Under Data Analysis menu)

34 7 - 34 Quantifying Regression Fit  Coefficient of determination: R 2  Lies in range [0, 1]  Closer to one – better fit  Measures how much of the variation in y- values is explained by model 1 – perfect match to model 0 – equation explains none of observed variation

35 7 - 35 Regression Window  *** insert Figure 7.28

36 7 - 36 Regression Output R Squared Degree of significance (under 0.1 is significant) Estimate for a Estimate for b P values of under 0.1 are statistically significant

37 7 - 37 Simple Nonlinear Regression  A straight line may not be the most plausible description of dependency, e.g., y = ax b  Can follow previous ideas to minimize sum of squared differences No Excel functions or simple formulas  Or can transform non-linear relationship into linear one, e.g., log y = log a + b log x Give up some intuition for convenience

38 7 - 38 Multiple Linear Regression  Multiple independent variables y = a 0 + a 1 x 1 + a 2 x 2 + … + a m x m + e  Work with n observations – each has: One observation of dependent variable One observation each of the m independent variables  Seek to minimize the sum of squared differences  Put all independent variables into x-range in Excel’s regression tool

39 7 - 39 Regression Output Coefficient of multiple determination Coefficients of regression equation P values of under 0.1 are statistically significant Square root of R square Accounts for presence of multiple variables

40 7 - 40 Values to Include in Regression  Ideally pick values that can be justified based on practical or theoretical grounds  Could choose set that generates largest value of adjusted R 2  Also could choose based on those with significant p-values for coefficients  Remember that good models require good forecasts for the independent variables

41 7 - 41 Regression Assumptions  Errors in the regression model Follow a Normal distribution Are mutually independent Have the same variance  Linearity is assumed to hold

42 7 - 42 Forecasting with Time Series Models  Use historical data  Assume near-term future will resemble past  Hypothesize a model with: An average level: x t =  + e  – mean value; e – random noise term A trend A seasonal or cyclic fluctuation

43 7 - 43 Measures of Forecast Accuracy  MSE – Mean Squared Error between forecast and actual  MAD – Mean Absolute Deviation between forecast and actual  MAPE – Mean Absolute Percent Error between forecast and actual

44 7 - 44 Moving Average Model  x t : observation from period t  n-period moving average forecast: F t = (x t + x t–1 + … + x t–n+1 ) / n  Under Excel Data Analysis Moving Average: interval = number of periods Pairs forecast F t and observation x t

45 7 - 45 Exponential Smoothing  Historic observations: x t, x t–1, x t–2, etc.  Forecast: F t =  x t + (1 –  )F t–1  Smoothing constant:   Implies: F t =  x t +  (1 –  )x t–1 +  (1 –  ) 2 x t–2 +  (1 –  ) 3 x t–3 + … F t = F t–1 +  (x t – F t–1 )  Under Excel Data Analysis damping factor = 1 - 

46 7 - 46 Exponential Smoothing with a Trend  x t =  +  t + e  Forecast calculated after the observation for period t will be calculated as (F t + T t )   and  – smoothing constants  F t =  x t + (1 –  )(F t–1 + T t–1 )  T t =  (F t – F t–1 ) + (1 –  )T t–1

47 7 - 47 Exponential Smoothing with Trend and Seasonality  x t = (  +  t)S t + e  p = number of periods in a cycle  Forecast calculated after the observation for period t will be calculated as (F t + T t )S t–p+1  ,  and  – smoothing constants  F t =  x t / S t-p + (1 –  )(F t–1 + T t–1 )  T t =  (F t – F t–1 ) + (1 –  )T t–1  S t =  x t / F t + (1 –  ) S t-p

48 7 - 48 Summary  Data collection and analysis should support modeling Locate relevant information Estimate parameters and relations Construct routine forecasts  Excel provides many tools Databases: searching, sorting, filtering, and tabulating Data Analysis: descriptive statistics, linear regression, moving average and exponentially smoothed forecasts


Download ppt "7 - 1 Chapter 7: Data Analysis in the Service of Modeling The Art of Modeling with Spreadsheets S.G. Powell and K.R. Baker © John Wiley and Sons, Inc."

Similar presentations


Ads by Google