BIVARIATE DATA: CORRELATION AND REGRESSION
Two variables of interest: X and Y.
GOALS: Quantify the association between X and Y (correlation). Predict the value of Y from the value of X (regression).
EXAMPLES: (height, weight), (years of education, salary), (hours of studying, exam score), (SAT score, GPA), (chemical reaction time, temperature), (rainfall, runoff volume), (demand, price), etc.
BIVARIATE DATA: pairs (X, Y): $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, where $(x_i, y_i)$ is the i-th observation, i.e. the values of X and Y measured on the i-th "subject".
Correlation studies: study the type and amount of association between X and Y.
Regression studies: aim to predict Y from X by constructing a simple equation relating Y to X.
CORRELATION
Graphical representation of bivariate data: the SCATTER PLOT. Plot the observations $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ as points in the plane.
Types of association (relationship) between the variables: positive, negative, none.
Positive association: two variables are positively associated if large values of one tend to occur with large values of the other, and small values of one tend to occur with small values of the other.
Example: height and weight are usually positively associated.
[Figure: scatter plot illustrating positive association]
CORRELATION, contd.
Negative association: two variables are negatively associated if large values of one tend to occur with small values of the other; the variables tend to "move in opposite directions". E.g. high demand often occurs with low price.
No association: if there is no association, the points in the scatter plot show no pattern.
[Figure: scatter plots illustrating negative association and no association]
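CORRELATION COEFFICIENT
Measure of the strength of association: the correlation coefficient $r_{xy}$.
Data: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.
Sample statistics:
Sample means: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
Sample standard deviations: $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and $s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$.
Sample correlation coefficient: $r_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$.
The correlation coefficient measures the strength of LINEAR association.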
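A minimal Python sketch of the sample correlation coefficient, computed as the sum of products of standardized values divided by n − 1 (the same divisor used in $s_x$ and $s_y$). The helper `pearson_r` is our own hypothetical name, not a library function; the income/savings figures are the ones from the household example in this deck.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r_xy via standardized values (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations with the n - 1 divisor.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of z-scores, divided by n - 1.
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


print(pearson_r([1, 2, 3], [2, 4, 6]))   # perfect positive line: 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))   # perfect negative line: -1.0

income = [25, 28, 35, 39, 44, 48, 52, 65, 55, 72]
savings = [0.5, 0.0, 0.8, 1.6, 1.8, 3.1, 4.3, 4.6, 3.5, 7.2]
print(round(pearson_r(income, savings), 3))  # 0.963
```

Standardizing first makes it visible that r does not depend on the units of X or Y.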
PROPERTIES OF THE SAMPLE CORRELATION COEFFICIENT $r_{xy}$
$r_{xy} > 0$ indicates positive association between X and Y.
$r_{xy} < 0$ indicates negative association between X and Y.
$r_{xy} \approx 0$ indicates no linear association between X and Y.
$-1 \le r_{xy} \le 1$; the closer $|r_{xy}|$ is to 1, the stronger the linear relationship between X and Y.
Computational formula for r: given the sample statistics, we can compute r as
$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)\, s_x s_y}$.
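Why the computational formula agrees with the definition: expanding the cross-product sum and using $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$ gives

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
  &= \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i + n\bar{x}\bar{y} \\
  &= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y}
   = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}, \\
r_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i-\bar{x})(y_i-\bar{y})}{s_x s_y}
  &= \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)\, s_x s_y}.
\end{aligned}
```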
CORRELATION COEFFICIENT AND ASSOCIATION
[Figure: scatter plots with panels labelled r = 0.75, r = 0.9, r = 0.28, r = 0.5 and r = −0.3, illustrating strong, moderate, weak and no association, plus an almost perfect association that is not linear and therefore has a small r.]
CORRELATION, CONTD.
Correlation does not imply CAUSATION! Watch out for hidden (lurking) variables.
Example: in a study of fires, with X = amount of damage and Y = number of firefighters, $r_{XY} \approx 0.85$. Do more firefighters cause more damage? No: the hidden variable is the size of the fire. Larger fires both cause more damage and bring out more firefighters.
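EXAMPLE
In a study of income and savings, data were collected from 10 households. Both income and savings are reported in thousands of dollars in the following table. Find the correlation coefficient between income and savings.
Income (X):   25    28    35    39    44    48    52    65    55    72
Savings (Y):  0.5   0.0   0.8   1.6   1.8   3.1   4.3   4.6   3.5   7.2
Solution: summary statistics, with X = income and Y = savings:
$\sum x_i = 463$, $\sum x_i^2 = 23533$, $\sum y_i = 27.4$, $\sum y_i^2 = 120.04$, $\sum x_i y_i = 1564.4$.
With $n = 10$, $\bar{x} = 46.3$ and $\bar{y} = 2.74$: $\sum x_i y_i - n\bar{x}\bar{y} = 1564.4 - 10(46.3)(2.74) = 295.78$, $(n-1)s_x^2 = 23533 - 10(46.3)^2 = 2096.1$ and $(n-1)s_y^2 = 120.04 - 10(2.74)^2 = 44.964$, so $r_{xy} = \frac{295.78}{\sqrt{2096.1 \times 44.964}} \approx \frac{295.78}{307.0} \approx 0.963$.
There is a strong positive association between family income and savings.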
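The same arithmetic in Python, working only from the summary statistics, as a sketch that mirrors the computational formula $r_{xy} = (\sum x_i y_i - n\bar{x}\bar{y}) / ((n-1) s_x s_y)$:

```python
import math

# Summary statistics from the income/savings example (n = 10 households).
n = 10
sum_x, sum_x2 = 463, 23533
sum_y, sum_y2 = 27.4, 120.04
sum_xy = 1564.4

x_bar = sum_x / n  # 46.3
y_bar = sum_y / n  # about 2.74
# Sample standard deviations from sums of squares: (sum x^2 - n * x_bar^2) / (n - 1).
s_x = math.sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))
s_y = math.sqrt((sum_y2 - n * y_bar ** 2) / (n - 1))

# Computational formula for the sample correlation coefficient.
r = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)
print(f"r = {r:.3f}")  # r = 0.963
```

With raw sums like these, subtracting nearly equal large numbers can lose precision on big data sets; for hand calculation with ten observations it is harmless.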
EXAMPLE, MINITAB
Correlations (Pearson)
Correlation of income and savings = 0.963, P-Value = 0.000
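The P-Value in this output refers to a test of the null hypothesis that the population correlation is zero. We assume it comes from the usual test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which has a t distribution with $n - 2$ degrees of freedom under that hypothesis. A minimal sketch; because Python's standard library has no t distribution, the two-sided critical value for $\alpha = 0.001$ with 8 degrees of freedom (5.041, from a standard t table) is hard-coded under the hypothetical name `T_CRIT_8DF`:

```python
import math

r = 0.963  # sample correlation reported for income vs. savings
n = 10     # number of households

# Test statistic for H0: population correlation = 0 (t distribution, n - 2 df).
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided critical value for alpha = 0.001 with 8 df, taken from a t table.
T_CRIT_8DF = 5.041
print(f"t = {t:.2f}; two-sided p < 0.001: {abs(t) > T_CRIT_8DF}")
```

A t of about 10 on 8 degrees of freedom is far beyond the table value, consistent with the p-value that Minitab rounds to 0.000.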