
We’ll consider here the problem of paired data. There are two common notations. The form (x1, y1), (x2, y2), …, (xn, yn) shows the data as n points in two-space. The spreadsheet form lists the pairs in two columns:

X    Y
x1   y1
x2   y2
x3   y3
…    …
xn   yn

PowerPoint show prepared by Gary Simon, 11 MARCH 2008.

The separate points are assumed independent. We wish to find a relationship between variable X and variable Y. We have here a data set on eye response to different types of drops, but for now we’ll look at just a few simple items of information:

DP0OD — pupil diameter, start of experiment, right eye
DP0OS — pupil diameter, start of experiment, left eye
AGE — subject age

There are altogether 100 subjects.

Let’s consider the relationship between the pupil diameters of the two eyes. An obvious first step is making a scatterplot showing all 100 people. Let’s put the right eye on the horizontal axis and the left eye on the vertical axis. This is not a critical decision. This graph shows that the points cluster near a diagonal line. This is not a surprise.

Here’s the same picture with the Y = X line superimposed: The points cling close to the line.

There are a few simple ways to summarize this situation. Perhaps the best is the correlation r, which is close to 1 here, as the tight clustering near the line suggests.

Now let’s complicate this a bit. Suppose that we want to check on the relationship between DP0OS (pupil diameter, left eye) and AGE.
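The correlation for paired data can be computed directly. Here is a minimal sketch with NumPy; the pupil-diameter pairs below are made-up illustrative values, not the study’s actual data:

```python
import numpy as np

# Hypothetical paired measurements: right-eye and left-eye
# pupil diameters in mm (illustrative values only).
right = np.array([4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 3.5, 4.9])
left  = np.array([4.0, 5.4, 3.9, 5.8, 4.8, 5.6, 3.6, 5.0])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the correlation r.
r = np.corrcoef(right, left)[0, 1]
print(round(r, 3))
```

Because the two columns track each other so closely, r comes out very near 1, matching the picture of points hugging the diagonal.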

These two variables are not symmetric. We’ll think of the variable AGE as “logically earlier.” This means that we obtain it easily, reliably, and (probably) earlier than the pupil diameter. Also, it’s logical to think of using AGE to predict pupil diameter. We will designate AGE as the independent variable, we will identify it with the symbol X, and we will place it on the horizontal axis of the coming scatterplot.

We’ll think of the variable DP0OS as “logically later.” This information is obtained with some difficulty, with possible measurement error, and (probably) later than the age. We will designate DP0OS as the dependent variable, we will identify it with the symbol Y, and we will place it on the vertical axis of the coming scatterplot.

The scatterplot is next. Before it’s shown, we should ask ourselves whether

* pupil diameter generally rises with age
* pupil diameter is unrelated to age
* pupil diameter generally decreases with age

What do you think?

Here is the scatterplot:

Suppose that you would like to summarize the relationship between the two variables. You would like to write Pupil Diameter = Y = dependent variable = f(AGE) = f(X) = f(independent variable) for some function f. The problem is that you’ll never find a believable function to go through all the dots on the scatterplot. There is too much statistical noise.

The expression of the model will be revised to Y = f(X) + ε. The symbol ε represents statistical noise. It may involve random errors in measuring Y, or it may represent variability that we simply don’t know how to account for. One could also use “multiplicative noise” in the form Y = f(X) × ε; in some cases, this is useful. For now, we’ll stick with the “additive noise” with the + sign. We will have a lot to say about the ε term. For now, we’ll just assume that it is independent over the data points.

What form should we use for the function f?

How about f(X) = log X?
How about f(X) = aX² + bX + c?
How about f(X) = tan(aX² + h)?
How about f(X) = ?

We will start with the simplest function, the straight line. This is f(X) = β0 + β1X. The symbols β0 and β1 are parameters. β0 is the intercept, also called the Y-intercept. β1 is the slope. In nearly all cases, β0 and β1 are not known, and we have to estimate them from data.

The notation is not universal. You will also see

f(X) = α + βX — This is OK.
f(X) = a + bX — Use of Roman letters is not recommended.

For issues related to considering which symbols are fixed and which are random, we will prefer f(x) = β0 + β1x. That is, we will prefer lower-case x. It is, however, impossible to enforce distinctions between x and X and also between y and Y. We can’t be too dogmatic about the notation.

The relationship between Y and X will be described through the simple linear regression model

Y = β0 + β1x + ε

This is made more direct by putting a subscript i on to label individual data points. Our preferred form for the simple linear regression model is

Yi = β0 + β1xi + εi,  with i = 1, 2, …, n.

The simple linear regression model also includes these assumptions about the noise terms ε1, ε2, ε3, …, εn:

* The ε’s are independent of each other and also independent of the x’s.
* The ε’s are sampled from a hypothetical population in which the mean is zero and the standard deviation is σ.

In some cases, we may add in the further assumption that the ε’s are sampled from a normal population.
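These assumptions can be made concrete by simulating from the model. The sketch below uses assumed, illustrative parameter values (β0 = 7, β1 = −0.02, σ = 0.8 are inventions for the demo, not estimates from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (illustrative) parameter values -- not from the study.
beta0, beta1, sigma = 7.0, -0.02, 0.8
n = 100

x = rng.uniform(18, 75, size=n)        # ages; the x's are treated as fixed
eps = rng.normal(0.0, sigma, size=n)   # iid noise: mean 0, sd sigma
y = beta0 + beta1 * x + eps            # Y_i = beta0 + beta1*x_i + eps_i

# The simulated noise terms should average near zero
# with standard deviation near sigma:
print(round(eps.mean(), 2), round(eps.std(), 2))
```

Running this a few times with different seeds shows the sample mean of the ε’s hovering near 0 and their spread near σ, which is exactly what the assumptions claim.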

The simple linear regression model Yi = β0 + β1xi + εi has three unknown parameters: β0, β1, and σ. Estimating these parameters is an important part of the regression task. Estimating β0 and β1 is equivalent to drawing a line on the scatterplot. The estimate of σ tells us how well the line describes the set of points on the scatterplot.

The estimate of β0 is written b0. The estimate of β1 is written b1. The estimate of σ is written s; you’ll also see sε or sY|x. Note this consistent pattern of usage: model parameters are Greek letters, and data-based estimates are the corresponding Latin letters.

Be aware that other schemes exist. Someone who writes the model as Yi = α + βxi + εi will use a for the estimate of α and b for the estimate of β. Someone who writes the model as Yi = a + bxi + εi will use â for the estimate of a and b̂ for the estimate of b.

For our problem, the model is DPi = β0 + β1 AGEi + εi. The pupil diameter DP is in units of mm (millimeters). The variable AGE is in units of years. Therefore, β0 and its estimate b0 are in units of mm. Also, the ε’s and their standard deviation σ are in units of mm, as is the estimate of σ. The slope β1 and its estimate b1 are in units of mm per year.

How should we estimate β 0 and β 1 ? We could guess. We could draw a nice-looking line on the scatterplot and then use that line to get the estimates. These are not necessarily bad methods, but they are not reproducible. This means that different people get different answers. Worse yet, the same person on two occasions will produce different answers.

We will instead propose that the estimates be done by minimizing a mathematical function. Many proposals have been made, but the nearly universal choice is least squares. Choose b0 and b1 to minimize the function

Q = Σi=1..n [yi − (b0 + b1xi)]²

How should this minimization be done?

The solution is by (mindless and routine) differentiation. That is, solve the system

∂Q/∂b0 = 0    ∂Q/∂b1 = 0

This results in two linear equations in the two unknowns b0 and b1.
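Setting those two partial derivatives to zero produces the so-called normal equations, a 2×2 linear system: n·b0 + (Σxi)·b1 = Σyi and (Σxi)·b0 + (Σxi²)·b1 = Σxiyi. A sketch solving that system directly, on a small synthetic data set invented for illustration:

```python
import numpy as np

# Synthetic (x, y) data, roughly linear, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Normal equations, from dQ/db0 = 0 and dQ/db1 = 0:
#   n*b0      + sum(x)*b1   = sum(y)
#   sum(x)*b0 + sum(x*x)*b1 = sum(x*y)
A = np.array([[n,       x.sum()],
              [x.sum(), (x * x).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])

b0, b1 = np.linalg.solve(A, rhs)
print(round(b0, 3), round(b1, 3))
```

For this toy data the solve gives b0 = 0.05 and b1 = 1.99, i.e., a line very close to y = 2x, which matches the eyeball impression of the numbers.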

The solution method selected by the previous slide works, but it’s clumsy. Here is a cleaner way to do this.

(1) Find the five sums Σxi, Σyi, Σxi², Σyi², Σxiyi.

(2) Next find these quantities:

x̄ = Σxi / n,  ȳ = Σyi / n,
Sxx = Σxi² − (Σxi)²/n,
Syy = Σyi² − (Σyi)²/n,
Sxy = Σxiyi − (Σxi)(Σyi)/n

(3) Find b1 (the estimate of the slope β1) as b1 = Sxy / Sxx.

(4) Find b0 (the estimate of the intercept β0) as b0 = ȳ − b1x̄.

Note that b1 is found before b0.

(5) Finally, calculate Syy|x = Syy − b1Sxy. We’ll use this later in the estimation of σ, the standard deviation of the noise.
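Steps (1) through (5) can be sketched end to end. The data below are synthetic, invented for illustration; the final line cross-checks the hand procedure against NumPy’s built-in line fit, and the usual estimate s = √(Syy|x / (n − 2)) of σ is included:

```python
import numpy as np

# Synthetic (x, y) data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# (1) the five sums
sx, sy = x.sum(), y.sum()
sxx_raw, syy_raw, sxy_raw = (x * x).sum(), (y * y).sum(), (x * y).sum()

# (2) means and corrected sums of squares / cross-products
xbar, ybar = sx / n, sy / n
Sxx = sxx_raw - sx**2 / n
Syy = syy_raw - sy**2 / n
Sxy = sxy_raw - sx * sy / n

# (3) slope estimate, then (4) intercept estimate
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# (5) residual sum of squares, then the estimate of sigma
Syy_x = Syy - b1 * Sxy
s = (Syy_x / (n - 2)) ** 0.5

# Cross-check: np.polyfit returns [slope, intercept] for degree 1.
b1_np, b0_np = np.polyfit(x, y, 1)
print(round(b0, 3), round(b1, 3), round(s, 3))
```

The hand formulas and np.polyfit agree to machine precision, which is a useful sanity check whenever the sums are computed by hand or in a spreadsheet.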

While it’s possible to do this for our problem of pupil diameter versus age with just the use of a calculator… there are too many steps and we are likely to make errors. We’ll give this to the Minitab function Stat > Regression > Regression.

The Minitab output is extensive, but from it we find

Regression Analysis: DP0OD versus AGE
The regression equation is DP0OD = 7.27 + b1 AGE

This is called the fitted regression equation. It identifies for us the intercept b0 = 7.27 and the slope estimate b1, which is negative here.

Here is a reprise of the scatterplot, now shown with the fitted regression line. This was made in Minitab with Stat > Regression > Fitted Line Plot. It also reports sε ≈ 0.83, the estimate of σ.

It’s important to distinguish population quantities from sample quantities. The process of regression is not simply “numbers in” → “numbers out.”

The simple linear regression model is Yi = β0 + β1xi + εi. If you are asked to graph the line Y = β0 + β1x... please refuse! You cannot graph this line because β0 and β1 are unknown population parameters.

With data, you will get the estimates b0 and b1. The fitted regression line is ŷ = b0 + b1x. The “hat” on ŷ is helpful, but it’s a typesetting nuisance. The fitted line is often given without the “hat.”
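Once b0 and b1 are in hand, the fitted values ŷi = b0 + b1xi and residuals ei = yi − ŷi follow immediately. A sketch, again on synthetic illustrative data, also demonstrates two classic properties of least-squares residuals:

```python
import numpy as np

# Synthetic (x, y) data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates; np.polyfit returns [slope, intercept].
b1, b0 = np.polyfit(x, y, 1)

y_hat = b0 + b1 * x    # fitted values on the line
resid = y - y_hat      # residuals: estimates of the noise terms

# Least-squares residuals sum to (numerically) zero and are
# orthogonal to x -- consequences of the normal equations.
print(round(resid.sum(), 10), round((resid * x).sum(), 10))
```

Both printed quantities are zero up to floating-point noise; checking them is a quick way to verify that a fit was computed correctly.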

For the pupil diameter problem, the fitted line is ŷ = 7.27 + b1 AGE. The interpretation of the slope b1 is that each additional year of age is associated with a small reduction, in mm, in pupil diameter. The interpretation of 7.27 is... to be avoided. It’s tempting to say that it’s an assessment of pupil diameter at birth, but the data set did not include anyone younger than 18, so we won’t force an interpretation.

The estimate of the noise standard deviation was calculated as sε ≈ 0.83 mm, which is rather large for this context. What are we to make of this large value? It says that AGE is far from a perfect predictor of pupil diameter.

We still have to decide:

* Is there an objective way to decide whether this whole activity was worth doing?
* Is there an objective way to decide whether the model Yi = β0 + β1xi + εi was a good choice?