Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03


1 Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03
Daniela Stan, PhD
Course homepage:
Office hours (no appointment needed):
M, 3:00pm - 3:45pm at LOOP, CST 471
W, 3:00pm - 3:45pm at LOOP, CST 471
11/10/2018 Daniela Stan - CSC323

2 Outline
Chapter 2: Looking at Data – Relationships between two or more variables
Least-Squares Regression
Cautions about Regression and Correlation

3 Regression
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

4 Regression
Fitting a line to data means drawing a line that comes as close as possible to the points: y = a + b*x
b = slope, the rate of change of y as x increases by one unit
a = intercept, the value of y when x = 0
Example: Height = a + b*age

5 Prediction
Use of regression: to predict the value of y for any value of x by substituting this x into the equation of the regression line.
Extrapolation is the use of a regression line for prediction outside the range of values of the explanatory variable x that you used to obtain the line. Such predictions are often not accurate.
Example: Problem 2.35
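As a rough illustration of prediction versus extrapolation, the sketch below plugs x into a line of the form Height = a + b*age and flags any x outside the range used to fit the line. The coefficients and the age range are made up for illustration; they are not taken from Problem 2.35 or from the course data.

```python
def predict(a, b, x, x_min, x_max):
    """Return (predicted y, is_extrapolation) for the line y = a + b*x.

    x_min and x_max are the smallest and largest x values that were
    used to fit the line; any x outside that range is an extrapolation.
    """
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Hypothetical line: height (cm) = 65 + 6 * age (years), fitted on ages 2 to 10.
a, b = 65.0, 6.0

inside = predict(a, b, 5, x_min=2, x_max=10)
outside = predict(a, b, 30, x_min=2, x_max=10)

print(inside)   # (95.0, False)
print(outside)  # (245.0, True)
```

The second prediction (245 cm at age 30) shows why extrapolated values should not be trusted: the line keeps growing even though the relationship it was fitted to does not.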

6 Least-Squares Regression
How do we fit a line to a scatterplot? The regression line makes the prediction errors as small as possible.

7 Least-Squares Regression (cont.)
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
How to calculate the least-squares regression line?
ŷ = a + b*x, with slope b = r * (Sy / Sx) and intercept a = ȳ - b*x̄
Where: ŷ = predicted value, r = correlation, Sx, Sy = standard deviations of x and y, x̄, ȳ = means of x and y
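The calculation can be sketched in a few lines of Python, assuming the standard textbook formulas: slope b = r * (Sy / Sx) and intercept a = ȳ - b*x̄, with r computed as the average product of standardized values (dividing by n - 1). The data set and the function names are invented for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)   # slope
    a = mean(ys) - b * mean(xs)     # intercept
    return a, b


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(round(a, 3), round(b, 3))  # 2.2 0.6
```

Because a = ȳ - b*x̄, the fitted line always passes through the point (x̄, ȳ), here (3, 4).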

8 Correlation and Regression
The correlation r is the slope of the least-squares regression line when we measure both x and y in standardized units.
The square of the correlation, r^2, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
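Both statements can be checked numerically. The sketch below uses made-up data and, as an assumption, fits the line with the equivalent sum-of-products form of the least-squares slope (sum of cross-deviations divided by sum of squared x-deviations), so that the comparison with r is not circular. It relies on the standard reading of r^2 as the fraction of the variation in y explained by the regression.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert values to standardized units: (value - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def fit_line(xs, ys):
    """Least-squares (a, b) using sums of deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Correlation from standardized values.
zx, zy = standardize(xs), standardize(ys)
r = sum(u * v for u, v in zip(zx, zy)) / (len(xs) - 1)

# Slope of the least-squares line fitted to the standardized data.
_, slope_standardized = fit_line(zx, zy)

# Fraction of the variation in y explained by the line fitted to the raw data.
a, b = fit_line(xs, ys)
y_bar = mean(ys)
explained = sum((a + b * x - y_bar) ** 2 for x in xs)
total = sum((y - y_bar) ** 2 for y in ys)
fraction_explained = explained / total

print(round(r, 4), round(slope_standardized, 4))      # 0.7746 0.7746
print(round(r ** 2, 4), round(fraction_explained, 4))  # 0.6 0.6
```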

9 Cautions about regression and correlation
A residual is the difference between an observed value of y and the value predicted by the regression line: residual = observed y - predicted y.
A residual plot is a scatterplot of the regression residuals against the explanatory variable; it helps us to assess the fit of the regression line.
A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of the relationships among those variables.
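A minimal sketch of computing residuals, using made-up data and a hypothetical helper fit_line. It prints the (x, residual) pairs that a residual plot would display rather than drawing the plot itself.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares (a, b) using sums of deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys, a, b):
    """Residual = observed y minus the y predicted by the line a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)

# These (x, residual) pairs are the points of the residual plot.
for x, e in zip(xs, res):
    print(f"x={x}  residual={e:+.2f}")
```

For a least-squares line the residuals sum to zero (up to rounding), so a residual plot is centered on the horizontal line at 0. A curved pattern or a widening spread around that line suggests that a straight line does not describe the data well.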

10 Data Mining
Exploring really large databases in the hope of finding useful patterns is called data mining.
Steps: Domain Understanding -> Data Selection -> Cleaning & Preprocessing -> Discovering Patterns -> Evaluation & Interpretation -> Knowledge
The entire process is iterative and interactive.

