12b. Regression Analysis, Part 2 CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson Department of Computer and Information Science, IUPUI

Fitting the Data

If there are more than two data points, chances are they don’t all fall on one straight line. We need to find the equation for a straight line that does the “best job” of reproducing the data. About half of the data points should fall above our line (“positive residual”) and about half should fall below (“negative residual”).

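Residual

The difference between the measured and the calculated Y-values:

$$ \text{residual}_i = y_i - \hat{y}_i, \qquad \hat{y}_i = m x_i + b $$

Here y_i is the measured value and \hat{y}_i is the value calculated from the estimated line, so a point above the line has a positive residual.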
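The residual definition above can be sketched in Python. The data values and the line y = 0.6x + 2.2 are made up for illustration; they are not taken from the slides.

```python
# Illustrative data (invented for this sketch, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Hypothetical estimated line y = m*x + b.
m, b = 0.6, 2.2

# Residual = measured Y minus the Y calculated from the line.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

# Count points above the line (positive) and below it (negative).
above = sum(1 for r in residuals if r > 0)
below = sum(1 for r in residuals if r < 0)

print(residuals)
print("above:", above, "below:", below)
```

For this small data set the points split two above and three below, which is roughly the "about half and half" pattern the slide describes.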

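Finding the Slope ( m ) of an Estimated Line

The slope of the estimated line is the ratio of the covariance between the X and Y data sets to the variance of the X data set:

$$ m = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} $$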
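As a minimal sketch of the covariance-over-variance ratio described above, the slope can be computed by hand in Python. The data values are invented for illustration only.

```python
# Illustrative data (invented for this sketch, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sample covariance of X and Y, and sample variance of X.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)

# Slope of the estimated line.
m = cov_xy / var_x
print(m)
```

The (n - 1) divisors cancel in the ratio, so using population or sample formulas gives the same slope as long as both use the same divisor.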

Finding the y-Intercept ( b ) of an Estimated Line

Once we’ve found the slope, we can find the y-intercept using the standard equation for a line, with one exception: we must use the means of the X and Y data sets as our coordinates (since the actual data points are unlikely to be on the estimated line):

$$ b = \bar{y} - m\,\bar{x} $$

Excel functions:
–m: SLOPE(..,..)
–b: INTERCEPT(..,..)
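The two steps above (slope first, then intercept from the means) can be sketched as one small Python function. The name `fit_line` and the data are assumptions made for this example; the function is meant to mirror what SLOPE and INTERCEPT compute, not to reproduce Excel's implementation.

```python
def fit_line(xs, ys):
    """Return (m, b) for the estimated line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by variance of X
    # (the common divisor cancels, so plain sums are used).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Illustrative data (invented for this sketch, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(m, b)
```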

Practice

Find an equation for the trendline of the following data set and predict the reading hours when aptitude is 25, 33 or 45.

Table columns: Student, Reading Aptitude, Reading Hours

Predicting Values

Once we get the slope ( m ) and the y-intercept ( b ) of the estimated line, we have a mathematical relation that ties the X variable to the Y variable. We can use this relation to predict X- and Y-coordinates that are not part of the data sets. E.g., what are the estimated reading hours for two new students, one with a reading aptitude of 25 and the other with 46?

y = m * x + b
x = 25: y = m * 25 + b
x = 46: y = m * 46 + b
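A hedged sketch of the prediction step in Python follows. The aptitude and hours values below are hypothetical, since the original practice table is not available here, so the resulting numbers are not the ones from the slides.

```python
# Hypothetical reading data (NOT the data set from the slides).
aptitude = [20, 30, 40, 50]
hours = [2, 3, 5, 6]

n = len(aptitude)
mean_x = sum(aptitude) / n
mean_y = sum(hours) / n

# Slope and intercept of the estimated line.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(aptitude, hours))
sxx = sum((x - mean_x) ** 2 for x in aptitude)
m = sxy / sxx
b = mean_y - m * mean_x


def predict(x):
    """Estimated reading hours for a given reading aptitude."""
    return m * x + b


for new_aptitude in (25, 46):
    print(new_aptitude, predict(new_aptitude))
```

With this made-up data the line is roughly y = 0.14x - 0.9, giving about 2.6 hours for an aptitude of 25 and about 5.54 hours for 46.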

Interpolation

Interpolation is the process by which we use the formula for the estimated line to predict a value of Y for a given value of X that is not included in the data set, but is within the range of the data set. The given value of X and the predicted Y-value will be on the estimated line.

Extrapolation

Extrapolation is the process by which we use the formula for the estimated line to predict a value of Y for a given value of X that is not included in the data set AND is not within the range of the data set. The given value of X and the predicted Y-value will be on the estimated line, but outside of the range of the data set.
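The distinction between the two definitions above comes down to whether the new X lies inside the range of the observed X values. A minimal Python sketch, with a hypothetical helper name `classify_prediction` and made-up X values:

```python
def classify_prediction(x, xs):
    """Label a prediction at x as interpolation or extrapolation.

    A value inside [min(xs), max(xs)] counts as interpolation;
    anything outside that range counts as extrapolation.
    """
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical observed X values (NOT the data set from the slides).
aptitude = [20, 30, 40, 50]

for x in (25, 46, 60):
    print(x, classify_prediction(x, aptitude))
```

Here 25 and 46 fall inside the observed range of 20 to 50, while 60 falls outside it, so only the last prediction would be an extrapolation.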

R² Value

How good is the line? How confident is the prediction?

R: Correlation Coefficient, -1 ≤ R ≤ 1
R²: Coefficient of Determination, 0 ≤ R² ≤ 1

The Coefficient of Determination is used to measure the certainty of making predictions from a graph. It represents the proportion of the variation in Y that is explained by the trendline. The closer it is to 1, the more confident the prediction is.
- Adapted from "Correlation Coefficient" (
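As a rough illustration of how R and R² relate, the sketch below computes R from the data and then cross-checks R² as one minus the ratio of residual variation to total variation in Y. The data values are invented for this example.

```python
import math

# Illustrative data (invented for this sketch, not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in Y

# Correlation coefficient R and coefficient of determination R^2.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Cross-check: R^2 = 1 - (residual variation) / (total variation).
m = sxy / sxx
b = mean_y - m * mean_x
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared_check = 1 - sse / syy

print(r, r_squared, r_squared_check)
```

For this data set R² comes out near 0.6, meaning the line accounts for about 60% of the variation in Y.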

Excel Functions

TREND() - Returns predicted Y values in a linear trend when passed X data.
Add Trendline (from the Chart menu) - Returns the trendline, its equation, and the R² value for a set of X,Y data.