Published by Edwin Johnston. Modified over 8 years ago.
1
7 - 1 Chapter 7: Data Analysis in the Service of Modeling
The Art of Modeling with Spreadsheets
S.G. Powell and K.R. Baker
© John Wiley and Sons, Inc.
PowerPoint Slides Prepared By: Tava Olsen, Washington University in St. Louis
2
7 - 2 Data Analysis in the Context of Modeling
- Supports the modeling process
  - Improves accuracy of model
  - Improves usefulness of conclusions
- Modeling is the primary goal
  - Data analysis is a means to that goal
3
7 - 3 Topics for Chapter
- Finding facts in databases
  - Searching, editing, sorting, and filtering
- Estimating parameters
  - Point estimates and interval estimates
- Estimating relationships among variables
  - Simple, multiple, and nonlinear regression
- Forecasting a single variable
  - Time series methods
4
7 - 4 Databases
- Tables of information
  - Each row is a record in the database
  - Each column is a field for the records
- Excel calls such a table a list
5
7 - 5 Excel Lists
- First row contains names for each field
- Each successive row contains one record
- Lists may be:
  - Searched and edited
  - Sorted
  - Filtered
  - Tabulated
6
7 - 6 Searching and Editing Lists
- First assign a range name to the entire list
  - Include column titles
- With the list selected, choose Data – Form
- Examine records one at a time:
  - Find Prev
  - Find Next
- Enter a new record with the New button
- Delete a record with the Delete button
7
7 - 7 Database Form
[Figure 7.8]
8
7 - 8 Criteria Button
- Found under Data – Form
- Allows for searching of records
  - Enter data into a field
  - Click Find Next
9
7 - 9 Alternate Excel Search Techniques
- Highlight the entire database
- Use Edit – Find to search
- Use Find and Replace to edit entries
- In Find and Replace:
  - “?” stands for any single symbol
  - “*” stands for any sequence of symbols
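The wildcard rules can be tried outside Excel. The following is a minimal Python sketch using the standard `fnmatch` module, whose "?" and "*" follow the same convention. The names in the list are hypothetical, and unlike Excel's Find, this sketch matches whole entries and is case-sensitive.

```python
from fnmatch import fnmatchcase

# Hypothetical entries from one field of a list (not taken from the slides).
names = ["Smith", "Smyth", "Smithson", "Jones"]

# "?" matches exactly one character.
one_char = [n for n in names if fnmatchcase(n, "Sm?th")]

# "*" matches any sequence of characters, including an empty one.
any_run = [n for n in names if fnmatchcase(n, "Sm*")]

print(one_char)
print(any_run)
```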
10
7 - 10 Sorting: Data – Sort Command
[Figure 7.10]
11
7 - 11 Filtering
- Select the database, then choose Data – Filter – AutoFilter
- Filters the list based on values
  - Found under the arrow at the title of each column
- The arrow on a title turns blue as a reminder that the list is filtered
- Remove a filter by:
  - Selecting (All) using the list arrow; or
  - Selecting Show All under Data – Filter
12
7 - 12 More Filtering
- Top 10 option returns the records with the smallest or largest values of a numerical field
- Custom option allows filtering with compound criteria
- More complicated compound criteria can be achieved with the Data – Filter – Advanced Filter submenu
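As a rough analogue of these three filter types, here is a short Python sketch over a list of records. The field names and values are invented for illustration, and the code only mimics the idea of each filter, not Excel's behavior in detail.

```python
# Hypothetical records; field names and values are invented for illustration.
records = [
    {"region": "East", "units": 120, "price": 9.5},
    {"region": "West", "units": 80, "price": 12.0},
    {"region": "East", "units": 45, "price": 15.0},
    {"region": "South", "units": 200, "price": 7.0},
]

# "Top 10"-style filter: keep the k records with the largest value in one field.
top2 = sorted(records, key=lambda r: r["units"], reverse=True)[:2]

# "Custom"-style filter: a compound criterion on a single field.
mid_units = [r for r in records if 50 <= r["units"] <= 150]

# "Advanced Filter"-style criterion: conditions on several fields at once.
east_cheap = [r for r in records if r["region"] == "East" and r["price"] < 10]
```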
13
7 - 13 Tabulating
- Select Data – Pivot Table
- Creates summary tables
- Layout button on the third step of the wizard creates the format for the table
14
7 - 14 Analyzing Sample Data
- Data is unlikely to cover the whole population
  - Work with a sample from the population
- Statistics are summary measures about the sample
  - Want to construct statistics that represent the population
- Convenience sampling
  - Have easy access to information on a subset of the population
  - Subset may not be representative
- Random sampling
  - All objects in the population have an equal chance of appearing in the sample
15
7 - 15 Descriptive Statistics
- Summarizes information in the sample
- Gives a numerical picture of the observations
- Excel: Tools – Data Analysis
  - Descriptive Statistics table produced based on the data given as input
16
7 - 16 Inferential Statistics
- Use information in the sample to make inferences about the population
- Systematic error
  - Arises if the sample is not representative of the population
  - Avoid by careful sampling
- Sampling error
  - Arises because the sample is merely a subset of the population
  - Mitigated by taking large samples
17
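7 - 17 Point Estimates
- The sample average is calculated as:
  $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- The sample variance is calculated as:
  $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
- and its square root is the sample standard deviation:
  $s = \sqrt{s^2}$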
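A small worked example of the three point estimates, written as a Python sketch. The observations are hypothetical, and the variance uses the usual n - 1 divisor for a sample.

```python
import math

# Hypothetical sample of five observations.
sample = [12.0, 15.0, 9.0, 14.0, 10.0]
n = len(sample)

# Sample average.
x_bar = sum(sample) / n

# Sample variance with the n - 1 divisor.
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Sample standard deviation.
s = math.sqrt(s2)

print(x_bar, s2, s)
```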
18
7 - 18 Interval Estimates
- $P(L \le \mu \le U) = 1 - \alpha$
- $L$ and $U$ represent the lower and upper limits of the interval
- $1 - \alpha$ represents the confidence level
  - Usually a large percentage like 95% or 99%
- $\mu$ represents the (unknown) true value of the parameter
19
7 - 19 Sampling Theory
- Working with a population described by a Normal probability model
  - Mean $\mu$ and standard deviation $\sigma$
- Take repeated samples of $n$ items from the population
- Calculate the sample average each time
- The sample averages will follow a Normal distribution with a mean of $\mu$ and a variance of $\sigma^2 / n$
20
7 - 20 Estimates
- Standard error: the standard deviation of some function being used to provide an estimate
- Use the sample average to estimate the population mean
- The standard deviation of the sample average is called the standard error of the mean:
  $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
21
7 - 21 Z-scores
- The z-score measures the number of standard deviations away from the mean
- The z-score corresponding to any particular sample average is:
  $z = (\bar{x} - \mu) / \sigma_{\bar{x}}$
  - Tells how many standard errors the sample average lies from the mean
- 90% of the sample averages will have z-scores between –1.64 and +1.64
  - The chances are 90% that the sample average will fall no more than 1.64 standard errors from the true mean
22
7 - 22 Confidence Intervals for Means
- Upper and lower limits on the estimate for the mean:
  $\bar{x} \pm z \, \sigma / \sqrt{n}$
  - n > 30 recommended unless the original population resembles a Normal
- z can be computed using NORMSINV(1 – α/2)
- Replace $\sigma$ by the sample standard deviation $s$
  - Provided that the sample is larger than n = 30
- Excel Descriptive Statistics also will calculate the half-width of the confidence interval
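The interval for a mean can be sketched in Python as follows. `NormalDist().inv_cdf` from the standard library plays the role of NORMSINV here; the function name `mean_confidence_interval` and the sample data are hypothetical, and the sketch assumes a sample larger than 30 so that s can stand in for the population standard deviation.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) limits for the population mean."""
    alpha = 1.0 - confidence
    # Standard Normal quantile, the same role as NORMSINV(1 - alpha/2).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    x_bar = mean(sample)
    # Sample standard deviation s replaces sigma (assumes n > 30).
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of 35 observations centered on 50.
sample = [50 + (i % 7) - 3 for i in range(35)]
lower, upper = mean_confidence_interval(sample, confidence=0.95)
print(lower, upper)
```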
23
7 - 23 Interval Estimates for a Proportion
- To estimate a population proportion from the sample proportion $p$, the interval estimate is:
  $p \pm z \sqrt{p(1 - p) / n}$
- Sample size should be at least 50 for this formula to be reliable
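A short Python sketch of the proportion interval, under the same Normal approximation. The function name and the counts (60 successes in 200 trials) are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (lower, upper) limits for a population proportion."""
    p = successes / n  # sample proportion
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width

# Hypothetical survey: 60 of 200 respondents answer yes.
lo, hi = proportion_interval(60, 200)
print(lo, hi)
```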
24
7 - 24 Sample Size Determination
- Suppose we want to estimate the population mean to within a range of ±R
- $n = (z \sigma / R)^2$
- Assumes:
  - Sampling from a Normal distribution
  - Known variance – can begin with a small sample to estimate the standard deviation
25
7 - 25 Sample Size Determination for Proportions
- Suppose we want to estimate a proportion to within a range of ±R
- $n = z^2 p(1 - p) / R^2$
- Value maximized at p = 0.5
  - Conservative value: $n = (z/2)^2 / R^2$
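Both sample-size formulas, for a mean and for a proportion, are easy to evaluate directly. The Python sketch below rounds up to the next whole observation; the function names and the inputs (sigma = 10, R = 2, R = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided standard Normal quantile for a confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def sample_size_mean(sigma, R, confidence=0.95):
    # n = (z * sigma / R)^2, rounded up.
    return math.ceil((z_value(confidence) * sigma / R) ** 2)

def sample_size_proportion(R, p=0.5, confidence=0.95):
    # n = z^2 p (1 - p) / R^2; p = 0.5 gives the conservative value.
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1.0 - p) / R ** 2)

n_mean = sample_size_mean(sigma=10.0, R=2.0)
n_prop = sample_size_proportion(R=0.05)
print(n_mean, n_prop)
```

With p = 0.5 the second function reproduces the conservative value, since p(1 - p) = 1/4.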
26
7 - 26 Estimating Relationships
- Scatter plot – visualize association
- Correlation:
  $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$
  - $n$ – number of pairs of observations for x, y
  - $s_x$, $s_y$ – standard deviations of x, y
  - $r$ – measures strength of linear relationship between x and y
27
7 - 27 r-statistic
- Independent of units of measurement
- Lies in range [-1, 1]
  - r > 0 – positive association
  - r < 0 – negative association
  - r close to 1 (or –1) implies a strong association
  - r close to 0 implies a weak association
- Excel function: CORREL(xrange,yrange)
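For readers working outside Excel, the r-statistic can be computed with a few lines of Python. This is a sketch of the standard sample correlation, not a reproduction of how CORREL is implemented; the function name `correl` and the data are hypothetical.

```python
from statistics import mean, stdev

def correl(xs, ys):
    """Sample correlation coefficient between paired observations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Dividing by (n - 1) * s_x * s_y makes r independent of units.
    return cross / ((n - 1) * stdev(xs) * stdev(ys))

xs = [1.0, 2.0, 3.0, 4.0]
r_up = correl(xs, [2.0, 4.0, 6.0, 8.0])    # exact increasing line
r_down = correl(xs, [8.0, 6.0, 4.0, 2.0])  # exact decreasing line
print(r_up, r_down)
```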
28
7 - 28 Regression Relationships
- Relationships based on empirical data
- Dependent variable – predicted from values of one or more independent variables
- Regression models can be:
  - Linear or nonlinear
  - Simple or multiple
29
7 - 29 Simple Linear Regression
- $y = a + bx + e$
  - $y$ – dependent variable
  - $x$ – independent variable
  - $e$ – an “error” term
- Constants $a$ and $b$ represent the intercept and slope, respectively, of the regression line
30
7 - 30 Error Term in Regression
- Unexplained “noise” in the relationship
- May represent limitations of knowledge
- Or may represent random deviations of the dependent variable from its mean
31
7 - 31 Regression Goal
- Want to find the line that most closely matches the observed relationship between x and y
- Define “most closely” as minimizing the sum of squared differences between observed and model values
  - Minimizing the plain sum of differences does not work: positive and negative differences cancel, so a flat line at the mean of y already gives a sum of zero
  - Squaring penalizes large differences more than small differences
32
7 - 32 Performing Regression
- Residuals: $e_i = y_i - \hat{y}_i = y_i - (a + b x_i)$
- Sum of squared differences between observations and model:
  $SS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$
- The regression problem: choose $a$ and $b$ to minimize $SS$
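For simple linear regression the minimizing a and b have a well-known closed form, which the Python sketch below applies. The data and function names are hypothetical, and the closed form is the standard least-squares solution rather than anything specific to Excel's tool.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squares(xs, ys, a, b):
    """SS: sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
ss = sum_squares(xs, ys, a, b)
print(a, b, ss)
```

Any other choice of intercept or slope gives a larger SS on the same data, which is what "minimize SS" means in practice.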
33
7 - 33 Regression Analysis
- Assumes residuals are normally distributed with mean 0
- Regression parameters can be calculated directly from the data
- Simpler to use Excel’s regression tool (under the Data Analysis menu)
34
7 - 34 Quantifying Regression Fit
- Coefficient of determination: $R^2$
- Lies in range [0, 1]
  - Closer to one – better fit
- Measures how much of the variation in y-values is explained by the model
  - 1 – perfect match to model
  - 0 – equation explains none of the observed variation
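One common way to compute the coefficient of determination is 1 minus the ratio of unexplained to total variation. The Python sketch below uses that definition; the function name and data are hypothetical, and the fitted values come from an assumed line y = 0.05 + 1.99x.

```python
def r_squared(ys, fitted):
    """R^2 = 1 - (unexplained variation) / (total variation in y)."""
    y_bar = sum(ys) / len(ys)
    unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - unexplained / total

# Hypothetical observations and an assumed fitted line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [0.05 + 1.99 * x for x in xs]

r2 = r_squared(ys, fitted)
r2_perfect = r_squared(ys, ys)                      # model matches every point
r2_mean_only = r_squared(ys, [sum(ys) / len(ys)] * len(ys))  # model is just the mean
print(r2, r2_perfect, r2_mean_only)
```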
35
7 - 35 Regression Window
[Figure 7.28]
36
7 - 36 Regression Output
- R Squared
- Degree of significance (under 0.1 is significant)
- Estimate for a
- Estimate for b
- P values of under 0.1 are statistically significant
37
7 - 37 Simple Nonlinear Regression
- A straight line may not be the most plausible description of the dependency, e.g., $y = a x^b$
- Can follow the previous ideas to minimize the sum of squared differences
  - No Excel functions or simple formulas
- Or can transform the nonlinear relationship into a linear one, e.g., $\log y = \log a + b \log x$
  - Give up some intuition for convenience
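The log transformation can be sketched in Python: fit a straight line to (log x, log y) and convert the intercept back with an exponential. The function name and data are hypothetical. Note that this minimizes squared differences in log space, which is not the same as minimizing them in the original units.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log y = log a + b log x."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    lx_bar = sum(lx) / n
    ly_bar = sum(ly) / n
    b = (sum((u - lx_bar) * (v - ly_bar) for u, v in zip(lx, ly))
         / sum((u - lx_bar) ** 2 for u in lx))
    log_a = ly_bar - b * lx_bar
    return math.exp(log_a), b

# Hypothetical data generated exactly from y = 3 * x**2.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 12.0, 48.0, 192.0]
a_hat, b_hat = fit_power_law(xs, ys)
print(a_hat, b_hat)
```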
38
7 - 38 Multiple Linear Regression
- Multiple independent variables
  - $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m + e$
- Work with n observations – each has:
  - One observation of the dependent variable
  - One observation each of the m independent variables
- Seek to minimize the sum of squared differences
- Put all independent variables into the x-range in Excel’s regression tool
39
7 - 39 Regression Output
- Coefficient of multiple determination
- Coefficients of regression equation
- P values of under 0.1 are statistically significant
- Square root of R square
- Accounts for presence of multiple variables
40
7 - 40 Values to Include in Regression
- Ideally pick values that can be justified on practical or theoretical grounds
- Could choose the set that generates the largest value of adjusted $R^2$
- Also could choose based on those with significant p-values for coefficients
- Remember that good models require good forecasts for the independent variables
41
7 - 41 Regression Assumptions
- Errors in the regression model:
  - Follow a Normal distribution
  - Are mutually independent
  - Have the same variance
- Linearity is assumed to hold
42
7 - 42 Forecasting with Time Series Models
- Use historical data
- Assume the near-term future will resemble the past
- Hypothesize a model with:
  - An average level: $x_t = a + e$
    - $a$ – mean value; $e$ – random noise term
  - A trend
  - A seasonal or cyclic fluctuation
43
7 - 43 Measures of Forecast Accuracy
- MSE – Mean Squared Error between forecast and actual
- MAD – Mean Absolute Deviation between forecast and actual
- MAPE – Mean Absolute Percent Error between forecast and actual
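The three accuracy measures can be written out directly. In the Python sketch below the function names and the data are hypothetical, and MAPE is expressed in percent, which assumes no actual value is zero.

```python
def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    """Mean absolute deviation."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percent error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical actual values and forecasts.
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 112.0, 115.0, 135.0]

print(mse(actual, forecast), mad(actual, forecast), mape(actual, forecast))
```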
44
7 - 44 Moving Average Model
- $x_t$: observation from period t
- n-period moving average forecast:
  $F_t = (x_t + x_{t-1} + \dots + x_{t-n+1}) / n$
- Under Excel Data Analysis, Moving Average:
  - Interval = number of periods
  - Pairs forecast $F_t$ and observation $x_t$
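A minimal Python sketch of the n-period moving average. The function name and series are hypothetical; the first value returned corresponds to period n, the earliest period with n observations available.

```python
def moving_average(xs, n):
    """Return F_t = (x_t + ... + x_{t-n+1}) / n for every t with n observations."""
    return [sum(xs[t - n + 1:t + 1]) / n for t in range(n - 1, len(xs))]

# Hypothetical series of five observations.
series = [10.0, 12.0, 14.0, 16.0, 18.0]
ma3 = moving_average(series, 3)
print(ma3)
```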
45
7 - 45 Exponential Smoothing
- Historic observations: $x_t, x_{t-1}, x_{t-2}$, etc.
- Forecast: $F_t = \alpha x_t + (1 - \alpha) F_{t-1}$
  - Smoothing constant: $\alpha$
- Implies:
  - $F_t = \alpha x_t + \alpha(1 - \alpha) x_{t-1} + \alpha(1 - \alpha)^2 x_{t-2} + \alpha(1 - \alpha)^3 x_{t-3} + \dots$
  - $F_t = F_{t-1} + \alpha (x_t - F_{t-1})$
- Under Excel Data Analysis: damping factor = $1 - \alpha$
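The smoothing recursion is short enough to sketch in Python. The slide does not say how to start the recursion, so initializing with the first observation is an assumption of this sketch, as are the function name and the data.

```python
def exp_smooth(xs, alpha):
    """Return F_t = alpha * x_t + (1 - alpha) * F_{t-1} for each period."""
    f = xs[0]  # assumption: start the recursion at the first observation
    out = []
    for x in xs:
        f = alpha * x + (1.0 - alpha) * f
        out.append(f)
    return out

# Hypothetical series.
series = [10.0, 20.0, 30.0]
smoothed = exp_smooth(series, alpha=0.5)
print(smoothed)
```

The update can equally be written as `f = f + alpha * (x - f)`, the error-correction form of the same recursion.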
46
7 - 46 Exponential Smoothing with a Trend
- Model: $x_t = a + bt + e$
- The forecast made after the observation for period t is $F_t + T_t$
- $\alpha$ and $\beta$ – smoothing constants
- $F_t = \alpha x_t + (1 - \alpha)(F_{t-1} + T_{t-1})$
- $T_t = \beta (F_t - F_{t-1}) + (1 - \beta) T_{t-1}$
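A Python sketch of the level and trend updates. The starting level and trend (`level0`, `trend0`) must be supplied by the user in this sketch, since the slide does not specify an initialization; the function name and data are hypothetical.

```python
def holt(xs, alpha, beta, level0, trend0):
    """Run the level (F) and trend (T) updates; return the final (F_t, T_t)."""
    f, t = level0, trend0
    for x in xs:
        f_new = alpha * x + (1.0 - alpha) * (f + t)   # level update
        t = beta * (f_new - f) + (1.0 - beta) * t     # trend update
        f = f_new
    return f, t

# Hypothetical series that rises by exactly 2 per period.
series = [10.0, 12.0, 14.0, 16.0]
level, trend = holt(series, alpha=0.5, beta=0.5, level0=8.0, trend0=2.0)
next_forecast = level + trend  # forecast made after the last observation
print(level, trend, next_forecast)
```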
47
7 - 47 Exponential Smoothing with Trend and Seasonality
- Model: $x_t = (a + bt) S_t + e$
- $p$ = number of periods in a cycle
- The forecast made after the observation for period t is $(F_t + T_t) S_{t-p+1}$
- $\alpha$, $\beta$ and $\gamma$ – smoothing constants
- $F_t = \alpha x_t / S_{t-p} + (1 - \alpha)(F_{t-1} + T_{t-1})$
- $T_t = \beta (F_t - F_{t-1}) + (1 - \beta) T_{t-1}$
- $S_t = \gamma x_t / F_t + (1 - \gamma) S_{t-p}$
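The three updates can be combined in a short Python sketch. Starting values for the level, trend, and the p seasonal factors must be supplied, which is an assumption of the sketch because the slide gives no initialization; the function name and data are hypothetical.

```python
def holt_winters(xs, p, alpha, beta, gamma, level0, trend0, seasonals0):
    """Multiplicative seasonal smoothing; returns the forecast for the next period."""
    f, t = level0, trend0
    s = list(seasonals0)  # s[k]: latest seasonal factor for cycle position k
    for i, x in enumerate(xs):
        k = i % p
        f_new = alpha * x / s[k] + (1.0 - alpha) * (f + t)  # deseasonalized level
        t = beta * (f_new - f) + (1.0 - beta) * t           # trend
        s[k] = gamma * x / f_new + (1.0 - gamma) * s[k]     # seasonal factor
        f = f_new
    # The next period reuses the factor estimated one cycle earlier.
    return (f + t) * s[len(xs) % p]

# Hypothetical series built exactly as (10 + 2t) * S_t with factors 0.5 and 1.5.
series = [6.0, 21.0, 8.0, 27.0]
forecast = holt_winters(series, p=2, alpha=0.4, beta=0.3, gamma=0.2,
                        level0=10.0, trend0=2.0, seasonals0=[0.5, 1.5])
print(forecast)
```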
48
7 - 48 Summary
- Data collection and analysis should support modeling
  - Locate relevant information
  - Estimate parameters and relations
  - Construct routine forecasts
- Excel provides many tools
  - Databases: searching, sorting, filtering, and tabulating
  - Data Analysis: descriptive statistics, linear regression, moving average and exponentially smoothed forecasts