WFM 5201: Data Management and Statistical Analysis. © Dr. Akm Saiful Islam. Lecture-6: Correlation and Regression Analysis. June, 2008. Institute of Water and Flood Management (IWFM), Bangladesh University of Engineering and Technology (BUET).
Correlation: Correlation is concerned with describing the direction (positive or negative) and strength of a relationship between two variables. Correlation makes no distinction between the two variables (it is a measure of how they vary jointly), whereas regression theory depends on a dependent variable being affected by an error-free independent variable.
Correlation coefficient: The direction and strength of the relationship can be expressed by means of a correlation coefficient “r” (Pearson’s “r”), which is mathematically defined as: $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$, where $\sum (X - \bar{X})(Y - \bar{Y})$ is the sum of cross products of deviations, $\sum (X - \bar{X})^2$ is the sum of squared deviations for X, and $\sum (Y - \bar{Y})^2$ is the sum of squared deviations for Y.
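As a rough illustration of the definition of Pearson's r (sum of cross products of deviations divided by the square root of the product of the two sums of squared deviations), the sketch below computes r in plain Python. The function name `pearson_r` and the sample values are made up for illustration; they are not data from the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross products of deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for X and for Y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, not data from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # prints 0.7746
```

Swapping `x` and `y` gives the same value, which reflects the point that correlation makes no distinction between the two variables.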
Correlation coefficient: A correlation coefficient varies from -1 to +1, with -1 indicating a perfect negative relationship (one variable increases while the other decreases), 0 indicating no relationship, and +1 indicating a perfect positive relationship. The size (absolute value) of the correlation indicates the strength of the relationship; a coefficient whose absolute value is closer to 1 indicates a stronger relationship than one closer to 0.
Linear Regression: Regression is primarily concerned with using the relationship between two variables to predict one variable from knowledge of the other. Correlation, on the other hand, is primarily concerned with discovering whether or not a relationship exists in the first place, and then specifying the strength and direction of this relationship.
Linear Regression: The simple linear regression equation is given as: $\hat{Y} = b_0 + b_1 X$, where $\hat{Y}$ = predicted value of the dependent variable, X = given data, $b_0$ = intercept of regression line, $b_1$ = slope of regression line. Fitting the line in this way is also known as the least squares method.
Regression line
Coefficient of Regression
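The slide's formulas for the regression coefficients did not survive in this text. As a hedged sketch, the standard least squares estimates for a simple linear regression are: slope = (sum of cross products of deviations) / (sum of squared deviations for X), and intercept = mean of Y minus slope times mean of X. The function name `least_squares_fit` and the sample values below are assumptions made for illustration, not taken from the lecture.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x fitted by least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope of the regression line
    b0 = mean_y - b1 * mean_x    # intercept of the regression line
    return b0, b1

# Made-up illustrative values, not data from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
# prints: intercept b0 = 2.20, slope b1 = 0.60
```

Because the intercept is computed from the means, the fitted line always passes through the point (mean of X, mean of Y).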
Coefficient of Determination: The decomposition of the sample variation of Y leads to a measure of the "goodness of fit", which is known as the coefficient of determination and denoted by $R^2$.
The coefficient of determination is a measure commonly used to describe how well the sample regression line fits the observed data. Range: 0 to 1, where 0 means the poorest fit and 1 the best fit of the regression model.
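One common way to compute the coefficient of determination for a fitted line is 1 minus the ratio of the unexplained (residual) sum of squares to the total sum of squares of Y. The sketch below assumes that definition and a least squares fit; the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for a least squares line through (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation of y
    if sxx == 0 or sst == 0:
        raise ValueError("R^2 is undefined when x or y is constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Unexplained variation: squared distances from the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / sst

# Made-up illustrative values, not data from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))  # prints 0.6
```

For a simple linear regression with one predictor, this value equals the square of Pearson's r for the same data.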
Exercise-1: Fit a regression equation between Boro production and rainfall and find $R^2$. Data columns: Year | Boro Production | Rainfall
Deviations or Errors: The deviations (errors) are the differences between the observed values Y and the values $\hat{Y}$ on the fitted line. The sum of squares of these deviations from the fitted line is: $\sum (Y - \hat{Y})^2$. Total deviation $(Y - \bar{Y})$ = Explained deviation $(\hat{Y} - \bar{Y})$ + Unexplained deviation $(Y - \hat{Y})$.
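The split into total, explained, and unexplained deviation can be checked numerically. For a least squares line, the corresponding sums of squares also add up: total = explained + unexplained. The sketch below is a minimal check under that assumption; the function name `sums_of_squares` and the sample values are made up for illustration.

```python
def sums_of_squares(x, y):
    """Return (total, explained, unexplained) sums of squares for a least squares line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)                    # observed vs mean
    explained = sum((fi - mean_y) ** 2 for fi in fitted)           # fitted vs mean
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # observed vs fitted
    return total, explained, unexplained

# Made-up illustrative values, not data from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
total, explained, unexplained = sums_of_squares(x, y)
print(f"{total:.2f} = {explained:.2f} + {unexplained:.2f}")
# prints: 6.00 = 3.60 + 2.40
```

The ratio of explained to total (3.6 / 6.0 here) is the coefficient of determination for this made-up data set.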
Total, explained, and unexplained deviation
Regression diagnostics: Patterns for residual plots: (a) satisfactory, (b) funnel, (c) double bow, (d) non-linear.
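A residual plot is usually drawn as residual (observed minus fitted value) against the fitted value. The sketch below only computes the pairs that would be plotted, assuming a least squares line; the function name `residual_pairs` and the sample values are hypothetical, and the plotting itself is left to whatever tool is at hand.

```python
def residual_pairs(x, y):
    """Return a list of (fitted value, residual) pairs for a least squares line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    pairs = []
    for xi, yi in zip(x, y):
        fitted = b0 + b1 * xi
        pairs.append((fitted, yi - fitted))  # residual = observed - fitted
    return pairs

# Made-up illustrative values, not data from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for fitted, residual in residual_pairs(x, y):
    print(f"fitted = {fitted:.1f}, residual = {residual:+.1f}")
```

In a satisfactory plot the residuals scatter evenly around zero with no trend; a spread that widens with the fitted value corresponds to the funnel pattern, and a curved trend corresponds to the non-linear pattern.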