© 1998, Geoff Kuenning Comparison Methodology Meaning of a sample Confidence intervals Making decisions and comparing alternatives Special considerations.

Presentation transcript:

© 1998, Geoff Kuenning Comparison Methodology Meaning of a sample Confidence intervals Making decisions and comparing alternatives Special considerations in confidence intervals Sample sizes

© 1998, Geoff Kuenning Estimating Confidence Intervals Two formulas for confidence intervals –Over 30 samples from any distribution: z-distribution –Small sample from normally distributed population: t-distribution Common error: using t-distribution for non-normal population –Central Limit Theorem often saves us

© 1998, Geoff Kuenning The z Distribution Interval on either side of mean: $\bar{x} \mp z_{1-\alpha/2}\, s/\sqrt{n}$ Significance level $\alpha$ is small for large confidence levels Tables of z are tricky: be careful!

© 1998, Geoff Kuenning The t Distribution Formula is almost the same: $\bar{x} \mp t_{1-\alpha/2;\,n-1}\, s/\sqrt{n}$ Usable only for normally distributed populations! But works with small samples
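
A minimal Python sketch of the two interval formulas, assuming the sample mean, sample standard deviation, and sample size are already known. The names `z_quantile` and `mean_interval` and the sample figures are hypothetical. Only the z quantile is computed (with the standard library's `statistics.NormalDist`); a t quantile would be looked up in a table and passed in the same way.

```python
from math import sqrt
from statistics import NormalDist


def z_quantile(confidence):
    """Two-sided z quantile z_{1-alpha/2} for a confidence level such as 0.90."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_interval(mean, std_dev, n, quantile):
    """Return (low, high) = mean -/+ quantile * std_dev / sqrt(n)."""
    half_width = quantile * std_dev / sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical large sample: 36 observations, mean 170, s = 12, 90% confidence.
low, high = mean_interval(170.0, 12.0, 36, z_quantile(0.90))
print(f"90% interval: ({low:.2f}, {high:.2f})")
```

For a small sample from a normal population, the same `mean_interval` call applies with the quantile replaced by the tabulated t value for n - 1 degrees of freedom.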

© 1998, Geoff Kuenning Making Decisions Why do we use confidence intervals? –Summarizes error in sample mean –Gives way to decide if measurement is meaningful –Allows comparisons in face of error But remember: at 90% confidence, 10% of sample intervals do not include population mean

© 1998, Geoff Kuenning Testing for Zero Mean Is population mean significantly nonzero? If confidence interval includes 0, answer is no Can test for any value (mean of sums is sum of means) Example: our height samples are consistent with average height of 170 cm –Also consistent with 160 and 180!

© 1998, Geoff Kuenning Comparing Alternatives Often need to find better system –Choose fastest computer to buy –Prove our algorithm runs faster Different methods for paired/unpaired observations –Paired if ith test on each system was same –Unpaired otherwise

© 1998, Geoff Kuenning Comparing Paired Observations Treat problem as 1 sample of n pairs For each test calculate performance difference Calculate confidence interval for differences If interval includes zero, systems aren’t different –If not, sign indicates which is better
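
The paired procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own code: the function name `paired_interval` and the run times are hypothetical, and the t quantile is supplied by the caller from a table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_interval(a, b, t_quantile):
    """Confidence interval for the mean of the pairwise differences a[i] - b[i].

    t_quantile is t_{1-alpha/2; n-1}, looked up in a table by the caller.
    """
    if len(a) != len(b):
        raise ValueError("paired observations need equal-length samples")
    diffs = [x - y for x, y in zip(a, b)]
    half_width = t_quantile * stdev(diffs) / sqrt(len(diffs))
    center = mean(diffs)
    return center - half_width, center + half_width


# Hypothetical run times (seconds) of the same five tests on systems A and B.
system_a = [5.4, 16.6, 0.6, 1.4, 0.6]
system_b = [19.1, 3.5, 3.4, 2.5, 3.6]
low, high = paired_interval(system_a, system_b, t_quantile=2.132)  # t_{0.95;4}
different = not (low <= 0.0 <= high)
print(f"90% interval for A - B: ({low:.2f}, {high:.2f}); different: {different}")
```

Because the interval is computed on the differences, variation that is common to both systems on a given test cancels out, which is why pairing usually gives a tighter interval than treating the samples as unrelated.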

© 1998, Geoff Kuenning Example: Comparing Paired Observations Do home baseball teams outscore visitors? Sample from :

© 1998, Geoff Kuenning Example: Comparing Paired Observations H-V Mean 1.4, 90% interval (-0.75, 3.6) –Can’t reject the hypothesis that difference is 0. –70% interval is (0.10, 2.76)

© 1998, Geoff Kuenning Comparing Unpaired Observations Samples of sizes $n_a$ and $n_b$ for alternatives A and B Start with confidence intervals –If no overlap: Systems are different and higher mean is better (for HB metrics) –If overlap and each CI contains other mean: Systems are not different at this level If close call, could lower confidence level –If overlap and one mean isn’t in other CI: Must do t-test [Figure: means and confidence intervals of A and B for the three cases]

© 1998, Geoff Kuenning The t-test (1) 1. Compute sample means $\bar{x}_a$ and $\bar{x}_b$ 2. Compute sample standard deviations $s_a$ and $s_b$ 3. Compute mean difference $= \bar{x}_a - \bar{x}_b$ 4. Compute standard deviation of difference: $s = \sqrt{s_a^2/n_a + s_b^2/n_b}$

© 1998, Geoff Kuenning The t-test (2) 5. Compute effective degrees of freedom: 6. Compute the confidence interval: 7. If interval includes zero, no difference !
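
The seven steps can be sketched in Python as follows. The degrees-of-freedom expression used here is the Welch-Satterthwaite approximation, which is an assumption: the formula on the slide is not preserved and may differ in detail. The function names and the measurements are hypothetical, and the t quantile comes from a table.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_difference(a, b):
    """Return (mean difference, its standard deviation, effective DOF).

    The DOF uses the Welch-Satterthwaite approximation (an assumption here).
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # s_a^2/n_a and s_b^2/n_b
    diff = mean(a) - mean(b)
    std_diff = sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return diff, std_diff, dof


def difference_interval(diff, std_diff, t_quantile):
    """Interval diff -/+ t * s; t is t_{1-alpha/2; dof} from a table."""
    return diff - t_quantile * std_diff, diff + t_quantile * std_diff


# Hypothetical unpaired measurements of two systems.
a = [10.2, 11.1, 9.8, 10.5, 10.9]
b = [12.0, 13.4, 11.7, 12.9]
diff, std_diff, dof = unpaired_difference(a, b)
# The effective DOF is fractional; round down before consulting the table.
low, high = difference_interval(diff, std_diff, t_quantile=2.015)  # t_{0.95;5}
print(f"diff = {diff:.2f}, s = {std_diff:.3f}, dof = {dof:.2f}")
print(f"90% interval: ({low:.2f}, {high:.2f})")
```

If the resulting interval excludes zero, the sign of the difference indicates which system has the larger mean.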

© 1998, Geoff Kuenning Comparing Proportions If k of n trials give a certain result, then confidence interval is $p \mp z_{1-\alpha/2}\sqrt{p(1-p)/n}$ with $p = k/n$ If interval includes 0.5, can’t say which outcome is statistically meaningful Must have k>10 to get valid results
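
A short Python sketch of the proportion interval, using the usual normal approximation p -/+ z * sqrt(p(1-p)/n) with p = k/n. The function name `proportion_interval` and the trial counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(k, n, confidence):
    """Normal-approximation interval for a proportion p = k/n."""
    p = k / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Hypothetical: system A beat system B in 63 of 100 trials.
low, high = proportion_interval(63, 100, 0.90)
includes_half = low <= 0.5 <= high
print(f"90% interval: ({low:.3f}, {high:.3f}); includes 0.5: {includes_half}")
```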

© 1998, Geoff Kuenning Special Considerations Selecting a confidence level Hypothesis testing One-sided confidence intervals

© 1998, Geoff Kuenning Selecting a Confidence Level Depends on cost of being wrong 90%, 95% are common values for scientific papers Generally, use highest value that lets you make a firm statement –But it’s better to be consistent throughout a given paper

© 1998, Geoff Kuenning Hypothesis Testing The null hypothesis (H 0 ) is common in statistics –Confusing due to double negative –Gives less information than confidence interval –Often harder to compute Should understand that rejecting null hypothesis implies result is meaningful

© 1998, Geoff Kuenning One-Sided Confidence Intervals Two-sided intervals test for mean being outside a certain range (see “error bands” in previous graphs) One-sided tests useful if only interested in one limit Use $z_{1-\alpha}$ or $t_{1-\alpha;n}$ instead of $z_{1-\alpha/2}$ or $t_{1-\alpha/2;n}$ in formulas
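
To see how much the quantile changes, the following Python lines compare the two-sided and one-sided z quantiles at the same significance level. The choice of alpha = 0.10 is only an example.

```python
from statistics import NormalDist

alpha = 0.10  # 90% confidence, chosen for illustration

two_sided = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1-alpha/2}
one_sided = NormalDist().inv_cdf(1.0 - alpha)        # z_{1-alpha}

print(f"two-sided z = {two_sided:.3f}, one-sided z = {one_sided:.3f}")
```

The one-sided quantile is smaller, so a one-sided bound sits closer to the sample mean than either end of the two-sided interval at the same confidence level.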

© 1998, Geoff Kuenning Sample Sizes Bigger sample sizes give narrower intervals –Smaller values of t, v as n increases –$\sqrt{n}$ in formulas But sample collection is often expensive –What is the minimum we can get away with? Start with a small number of preliminary measurements to estimate variance.

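© 1998, Geoff Kuenning Choosing a Sample Size To get a given percentage error ±r%: $n = \left(\frac{100\,z\,s}{r\,\bar{x}}\right)^2$ Here, z represents either z or t as appropriate For a proportion p = k/n: $n = z^2\,\frac{p(1-p)}{r^2}$, where r is the desired half-width of the interval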

© 1998, Geoff Kuenning Example of Choosing Sample Size Five runs of a compilation took 22.5, 19.8, 21.1, 26.7, 20.2 seconds How many runs to get ±5% confidence interval at 90% confidence level? $\bar{x}$ = 22.1, s = 2.8, $t_{0.95;4}$ = 2.132 So $n = \left(\frac{(100)(2.132)(2.8)}{(5)(22.1)}\right)^2 \approx 29.2$, i.e., 30 runs
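
The example can be checked with a few lines of Python, using the percentage-error formula n = (100 t s / (r x̄))². The function name `required_sample_size` is hypothetical; the five run times and the t quantile are the ones quoted on the slide.

```python
from math import ceil
from statistics import mean, stdev


def required_sample_size(samples, quantile, percent_error):
    """Runs needed for a +/- percent_error% interval, from preliminary samples."""
    x_bar = mean(samples)
    s = stdev(samples)
    return (100.0 * quantile * s / (percent_error * x_bar)) ** 2


runs = [22.5, 19.8, 21.1, 26.7, 20.2]          # seconds, from the slide
n_exact = required_sample_size(runs, quantile=2.132, percent_error=5.0)
n_needed = ceil(n_exact)                        # round up to whole runs
print(f"n = {n_exact:.1f}, so {n_needed} runs")
```

Carrying full precision instead of the rounded mean and standard deviation changes the intermediate value slightly but not the rounded-up answer.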

© 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions Verifying assumptions visually

© 1998, Geoff Kuenning What Is a (Good) Model? For correlated data, model predicts response given an input Model should be equation that fits data Standard definition of “fits” is least-squares –Minimize squared error –While keeping mean error zero –Minimizes variance of errors

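© 1998, Geoff Kuenning Least-Squared Error If $\hat{y}_i = b_0 + b_1 x_i$ then error in estimate for $x_i$ is $e_i = y_i - \hat{y}_i$ Minimize Sum of Squared Errors (SSE) $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$ Subject to the constraint $\sum_{i=1}^{n} e_i = 0$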

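© 1998, Geoff Kuenning Estimating Model Parameters Best regression parameters are $b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$ and $b_0 = \bar{y} - b_1\bar{x}$ where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$, $\sum xy = \sum x_i y_i$, $\sum x^2 = \sum x_i^2$ Note error in book!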

© 1998, Geoff Kuenning Parameter Estimation Example Execution time of a script for various loop counts: $\bar{x}$ = 6.8, $\bar{y}$ = 2.32, $\sum xy$ = 88.54, $\sum x^2$ = 264 $b_1 = \frac{88.54 - (5)(6.8)(2.32)}{264 - (5)(6.8)^2} = 0.29$ $b_0$ = 2.32 − (0.29)(6.8) = 0.35
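
A small Python sketch that recomputes the parameters from the summary statistics quoted above. The function name `fit_from_sums` is hypothetical, and n = 5 is taken from the later confidence-interval example. Carrying full precision gives a slope of about 0.2945 and an intercept of about 0.32; the 0.35 quoted above results from rounding the slope to 0.29 before substituting.

```python
def fit_from_sums(n, x_bar, y_bar, sum_xy, sum_x2):
    """Least-squares slope and intercept from summary statistics."""
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_from_sums(n=5, x_bar=6.8, y_bar=2.32, sum_xy=88.54, sum_x2=264)
b0_rounded_slope = 2.32 - round(b1, 2) * 6.8   # reproduces the slide's arithmetic
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, b0 with rounded slope = {b0_rounded_slope:.2f}")
```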

© 1998, Geoff Kuenning Graph of Parameter Estimation Example

© 1998, Geoff Kuenning Variants of Linear Regression Some non-linear relationships can be handled by transformations –For $y = ae^{bx}$, take logarithm of y, do regression on $\ln(y) = b_0 + b_1 x$, let $b = b_1$, $a = e^{b_0}$ –For $y = a + b\log(x)$, take log of x before fitting parameters, let $b = b_1$, $a = b_0$ –For $y = ax^b$, take log of both x and y, let $b = b_1$, $a = e^{b_0}$
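
As an illustration of the first transformation, the Python sketch below fits y = a·e^(bx) by running an ordinary linear fit on (x, ln y) and then mapping the intercept back with the exponential. The function names and the synthetic data are hypothetical, and natural logarithms are assumed.

```python
from math import exp, log


def linear_fit(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    return y_bar - b1 * x_bar, b1


def exponential_fit(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x."""
    b0, b1 = linear_fit(xs, [log(y) for y in ys])
    return exp(b0), b1


# Synthetic, noise-free data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * exp(0.5 * x) for x in xs]
a, b = exponential_fit(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The power-law case works the same way with the logarithm applied to both `xs` and `ys` before calling `linear_fit`.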

© 1998, Geoff Kuenning Allocating Variation If no regression, best guess of y is $\bar{y}$ Observed values of y differ from $\bar{y}$, giving rise to errors (variance) Regression gives better guess, but there are still errors We can evaluate quality of regression by allocating sources of errors

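© 1998, Geoff Kuenning The Total Sum of Squares Without regression, squared error is $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum y^2 - n\bar{y}^2 = SSY - SS0$, where $SSY = \sum y^2$ and $SS0 = n\bar{y}^2$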

© 1998, Geoff Kuenning The Sum of Squares from Regression Recall that regression error is $SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$ Error without regression is SST So regression explains SSR = SST − SSE Regression quality measured by coefficient of determination $R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}$

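© 1998, Geoff Kuenning Evaluating Coefficient of Determination Compute $SST = \sum y^2 - n\bar{y}^2$, $SSE = \sum y^2 - b_0\sum y - b_1\sum xy$, and $R^2 = (SST - SSE)/SST$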

© 1998, Geoff Kuenning Example of Coefficient of Determination For previous regression example –$\sum y$ = 11.60, $\sum y^2$ = 29.79, $\sum xy$ = 88.54, $n\bar{y}^2$ = 26.9 –SSE = 29.79 − (0.35)(11.60) − (0.29)(88.54) = 0.05 –SST = 29.79 − 26.9 = 2.89 –SSR = 2.89 − 0.05 = 2.84 –$R^2$ = (2.89 − 0.05)/2.89 = 0.98
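
The allocation of variation can be reproduced in Python from the quoted sums and the rounded coefficients b0 = 0.35 and b1 = 0.29. The function name `allocate_variation` is hypothetical. Note that the shortcut SSE = Σy² − b0·Σy − b1·Σxy is exact only for unrounded least-squares coefficients, so the figures here are approximate.

```python
def allocate_variation(n, sum_y, sum_y2, sum_xy, b0, b1):
    """Return (SST, SSE, SSR, R^2) from summary statistics."""
    y_bar = sum_y / n
    sst = sum_y2 - n * y_bar ** 2              # total variation about the mean
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy    # variation left after regression
    ssr = sst - sse                            # variation explained by regression
    return sst, sse, ssr, ssr / sst


sst, sse, ssr, r2 = allocate_variation(
    n=5, sum_y=11.60, sum_y2=29.79, sum_xy=88.54, b0=0.35, b1=0.29
)
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}, R^2 = {r2:.2f}")
```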

© 1998, Geoff Kuenning Standard Deviation of Errors Variance of errors is SSE divided by degrees of freedom –DOF is n − 2 because we’ve calculated 2 regression parameters from the data –So variance (mean squared error, MSE) is SSE/(n − 2) Standard deviation of errors is square root: $s_e = \sqrt{SSE/(n-2)}$

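© 1998, Geoff Kuenning Checking Degrees of Freedom Degrees of freedom always equate: –SS0 has 1 (computed from $\bar{y}$) –SST has n − 1 (computed from data and $\bar{y}$, which uses up 1) –SSE has n − 2 (needs 2 regression parameters) –So SST = SSY − SS0 = SSR + SSE, with degrees of freedom n − 1 = n − 1 = 1 + (n − 2), where $SSY = \sum y^2$ has n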

© 1998, Geoff Kuenning Example of Standard Deviation of Errors For our regression example, SSE was 0.05, so MSE is 0.05/3 = 0.017 and $s_e$ = 0.13 Note high quality of our regression: –$R^2$ = 0.98 –$s_e$ = 0.13 –Why such a nice straight-line fit?

© 1998, Geoff Kuenning Confidence Intervals for Regressions Regression is done from a single population sample (size n) –Different sample might give different results –True model is $y = \beta_0 + \beta_1 x$ –Parameters $b_0$ and $b_1$ are really means taken from a population sample

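© 1998, Geoff Kuenning Calculating Intervals for Regression Parameters Standard deviations of parameters: $s_{b_0} = s_e\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum x^2 - n\bar{x}^2}}$ and $s_{b_1} = \frac{s_e}{\sqrt{\sum x^2 - n\bar{x}^2}}$ Confidence intervals are $b_i \mp t\,s_{b_i}$ where t has n − 2 degrees of freedom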

© 1998, Geoff Kuenning Example of Regression Confidence Intervals Recall $s_e$ = 0.13, n = 5, $\sum x^2$ = 264, $\bar{x}$ = 6.8 So $s_{b_0} = 0.13\sqrt{\frac{1}{5} + \frac{(6.8)^2}{264 - 5(6.8)^2}} = 0.16$ and $s_{b_1} = \frac{0.13}{\sqrt{264 - 5(6.8)^2}} = 0.023$ Using a 90% confidence level, $t_{0.95;3}$ = 2.353

© 1998, Geoff Kuenning Regression Confidence Example, cont’d Thus, $b_0$ interval is $0.35 \mp 2.353(0.16) = (-0.03, 0.73)$ –Not significant at 90% And $b_1$ is $0.29 \mp 2.353(0.023) = (0.24, 0.34)$ –Significant at 90% (and would survive even 99% test)
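
A Python sketch of the parameter intervals, assuming the standard formulas s_b0 = s_e·sqrt(1/n + x̄²/(Σx² − n·x̄²)) and s_b1 = s_e/sqrt(Σx² − n·x̄²). The function names are hypothetical; the inputs are the rounded values from the running example, and the t quantile is the tabulated t_{0.95;3}.

```python
from math import sqrt


def parameter_std_devs(s_e, n, x_bar, sum_x2):
    """Standard deviations of the intercept b0 and slope b1."""
    sxx = sum_x2 - n * x_bar ** 2
    s_b0 = s_e * sqrt(1.0 / n + x_bar ** 2 / sxx)
    s_b1 = s_e / sqrt(sxx)
    return s_b0, s_b1


def interval(estimate, std_dev, t_quantile):
    """Return estimate -/+ t * std_dev."""
    return estimate - t_quantile * std_dev, estimate + t_quantile * std_dev


s_b0, s_b1 = parameter_std_devs(s_e=0.13, n=5, x_bar=6.8, sum_x2=264)
t = 2.353                                   # t_{0.95;3}, 90% confidence
b0_low, b0_high = interval(0.35, s_b0, t)
b1_low, b1_high = interval(0.29, s_b1, t)
print(f"s_b0 = {s_b0:.2f}, b0 interval = ({b0_low:.2f}, {b0_high:.2f})")
print(f"s_b1 = {s_b1:.3f}, b1 interval = ({b1_low:.2f}, {b1_high:.2f})")
```

Under these formulas the intercept interval straddles zero while the slope interval does not, so only the slope is significant at this level. Small differences from hand calculations come from rounding the standard deviations before multiplying.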

© 1998, Geoff Kuenning Confidence Intervals for Nonlinear Regressions For nonlinear fits using exponential transformations: –Confidence intervals apply to transformed parameters –Not valid to perform inverse transformation on intervals

© 1998, Geoff Kuenning Confidence Intervals for Predictions Previous confidence intervals are for parameters –How certain can we be that the parameters are correct? Purpose of regression is prediction –How accurate are the predictions? –Regression gives mean of predicted response, based on sample we took

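© 1998, Geoff Kuenning Predicting m Samples Standard deviation for mean of future sample of m observations at $x_p$ is $s_{\hat{y}_{mp}} = s_e\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum x^2 - n\bar{x}^2}}$ Note deviation drops as $m \to \infty$ Variance minimal at $x = \bar{x}$ Use t-quantiles with n–2 DOF for interval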

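© 1998, Geoff Kuenning Example of Confidence of Predictions Using previous equation, what is predicted time for a single run of 8 loops? Time = 0.35 + 0.29(8) = 2.67 Standard deviation of errors $s_e$ = 0.13, so $s_{\hat{y}_{1p}} = 0.13\sqrt{1 + \frac{1}{5} + \frac{(8 - 6.8)^2}{32.8}} = 0.14$ 90% interval is then $2.67 \mp 2.353(0.14) = (2.34, 3.00)$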
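
The prediction interval for a single run of 8 loops can be sketched in Python, assuming the standard formula s = s_e·sqrt(1/m + 1/n + (x_p − x̄)²/(Σx² − n·x̄²)) with m = 1. The function name `prediction_std_dev` is hypothetical; the inputs are the rounded values from the running example and the tabulated t_{0.95;3}.

```python
from math import sqrt


def prediction_std_dev(s_e, m, n, x_p, x_bar, sum_x2):
    """Standard deviation of the mean of m future observations at x_p."""
    sxx = sum_x2 - n * x_bar ** 2
    return s_e * sqrt(1.0 / m + 1.0 / n + (x_p - x_bar) ** 2 / sxx)


predicted = 0.35 + 0.29 * 8                  # regression estimate at x = 8
s_pred = prediction_std_dev(s_e=0.13, m=1, n=5, x_p=8, x_bar=6.8, sum_x2=264)
half_width = 2.353 * s_pred                  # t_{0.95;3}, 90% confidence
low, high = predicted - half_width, predicted + half_width
print(f"predicted = {predicted:.2f}, s = {s_pred:.3f}, "
      f"interval = ({low:.2f}, {high:.2f})")
```

Raising `m` shrinks the 1/m term, and moving `x_p` toward the mean of x shrinks the last term, which matches the two notes about when the deviation is smallest. Hand calculations that round the standard deviation first can differ in the last digit.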

© 1998, Geoff Kuenning Verifying Assumptions Visually Regressions are based on assumptions: –Linear relationship between response y and predictor x Or nonlinear relationship used in fitting –Predictor x nonstochastic and error-free –Model errors statistically independent With distribution N(0,c) for constant c If assumptions violated, model misleading or invalid

© 1998, Geoff Kuenning Testing Linearity Scatter plot x vs. y to see basic curve type [Figure: four example scatter plots: Linear; Piecewise Linear; Outlier; Nonlinear (Power)]

© 1998, Geoff Kuenning Testing Independence of Errors Scatter-plot errors $e_i$ versus predicted response $\hat{y}_i$ Should be no visible trend Example from our curve fit: [Figure: errors plotted against $\hat{y}_i$]

© 1998, Geoff Kuenning More on Testing Independence May be useful to plot error residuals versus experiment number –In previous example, this gives same plot except for x scaling No foolproof tests

© 1998, Geoff Kuenning Testing for Normal Errors Prepare quantile-quantile plot Example for our regression:

© 1998, Geoff Kuenning Testing for Constant Standard Deviation Tongue-twister: homoscedasticity Return to independence plot Look for trend in spread Example:

© 1998, Geoff Kuenning Linear Regression Can Be Misleading Regression throws away some information about the data –To allow more compact summarization Sometimes vital characteristics are thrown away –Often, looking at data plots can tell you whether you will have a problem

© 1998, Geoff Kuenning Example of Misleading Regression [Table: data sets I, II, III, and IV, each with columns x and y]

© 1998, Geoff Kuenning What Does Regression Tell Us About These Data Sets? Exactly the same thing for each! N = 11 Mean of y = 7.5 Y = X Standard error of regression is All the sums of squares are the same Correlation coefficient =.82 R 2 =.67

© 1998, Geoff Kuenning Now Look at the Data Plots [Figure: scatter plots of data sets I, II, III, and IV]

For Discussion Today Project Proposal 1. Statement of hypothesis 2. Workload decisions 3. Metrics to be used 4. Method