Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Spring 2015 Room 150 Harvill.



2 Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Spring 2015 Room 150 Harvill Building 8:00 - 8:50 Mondays, Wednesdays & Fridays.


4 Labs continue this week with Multiple Regression

5 Schedule of readings: Before the next exam (Monday, May 4th), please read Chapters 10–14, and Chapters 17 and 18 in Plous (Chapter 17: Social Influences; Chapter 18: Group Judgments and Decisions).

6 Homework due Wednesday (April 22nd): On the class website, please print and complete homework worksheet #19, Completing Simple Regression using Excel (extended deadline). Extra Credit Opportunity

7 Next couple of lectures (4/20/15). Use this as your study guide: logic of hypothesis testing with correlations; interpreting correlations and scatterplots; simple and multiple regression.

8 Rory’s Regression: Predicting sales from number of visits (sales calls). Dependent variable: number of systems sold. Independent variable: number of sales calls. Regression line (and equation): r = 0.71, b = 11.579 (slope), a = 20.526 (intercept). Predict using the regression line (and regression equation), and describe the relationship. Correlation: this is a strong positive correlation; sales tend to increase as sales calls increase. Slope: as sales calls increase by 1, predicted sales increase by 11.579 systems. Intercept: a salesperson who makes zero sales calls is predicted to sell 20.526 systems (a prediction from the line, not a guaranteed minimum). Review

9 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 1 call? Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(1) = 32.105. A salesperson who makes one sales call should sell 32.105 systems. Madison and Joshua: they should sell 32.105 systems. If they sell more, they are overperforming; if they sell fewer, they are underperforming. Review

10 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 2 calls? Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(2) = 43.684. A salesperson who makes two sales calls should sell 43.684 systems. Isabella and Jacob: they should sell 43.684 systems. If they sell more, they are overperforming; if they sell fewer, they are underperforming. Review

11 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 3 calls? Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(3) = 55.263. A salesperson who makes three sales calls should sell 55.263 systems. Ava and Emma: they should sell 55.263 systems. If they sell more, they are overperforming; if they sell fewer, they are underperforming. Review

12 Regression: Predicting sales. Step 1: Predict sales for a certain number of sales calls. What should you expect from a salesperson who makes 4 calls? Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x. Step 3: Solve for some value of Y’: Y’ = 20.526 + 11.579(4) = 66.842. A salesperson who makes four sales calls should sell 66.842 systems. Emily: she should sell 66.842 systems. If she sells more, she is overperforming; if she sells fewer, she is underperforming. Review
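Slides 9 through 12 plug a different number of calls into the same equation and then compare actual sales with the prediction. The minimal Python sketch below reproduces those four predictions using only the slide's a = 20.526 and b = 11.579; the function names `predict_sales` and `performance`, and the "50 systems sold" figure in the usage, are hypothetical illustrations, not Rory's data.

```python
# Rory's regression line from the slides: Y' = a + bx
A = 20.526  # intercept (a)
B = 11.579  # slope (b): extra systems predicted per additional sales call


def predict_sales(calls: float) -> float:
    """Predicted systems sold, Y' = a + b*x, rounded to 3 decimals like the slides."""
    return round(A + B * calls, 3)


def performance(calls: float, sold: float) -> str:
    """Compare actual sales with the predicted sales for the same number of calls."""
    predicted = predict_sales(calls)
    if sold > predicted:
        return "overperforming"
    if sold < predicted:
        return "underperforming"
    return "as predicted"


if __name__ == "__main__":
    for calls in range(1, 5):
        print(f"{calls} call(s): Y' = {predict_sales(calls)}")
    # Hypothetical salesperson: 2 calls, 50 systems sold
    print(performance(2, 50))
```

Rounding to three decimals simply mirrors the slides; the comparison in `performance` is the "sell more means overperforming" rule stated on each slide.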

13 Residuals: How well does the prediction line predict the Ys from the Xs? Shorter green lines suggest better prediction (smaller error); longer green lines suggest worse prediction (larger error). Why are the green lines vertical? Remember, we are predicting the variable on the Y axis, so error is how wrong we are about Y (the vertical distance). A note about curvilinear relationships and patterns of the residuals.

14 Does the prediction line perfectly predict the predicted variable from the predictor variable? No, we are wrong sometimes. The green lines show how much “error” there is in our prediction line, that is, how wrong our predictions are. The difference between the predicted Y’ and the actual Y is called the “residual” (it’s a deviation score). How can we estimate how much “error” we have? [Figure: scatterplot with green residual lines; labels 14.7 and -23.7] Perfect correlation = +1.00 or -1.00: each variable perfectly predicts the other, there is no scatter around the line, and all the dots fall on a straight line.

15 Regression Analysis: Least Squares Principle. When we calculate the regression line, we try to minimize the distance between the predicted Ys and the actual (data) Y points (the length of the green lines). Because the negative and positive deviations would cancel each other out, we square those distances (deviations). So we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values.”
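The least squares principle can be made concrete with a small sketch. It assumes the standard closed-form solution b = Σ(X − mean X)(Y − mean Y) / Σ(X − mean X)² and a = mean Y − b · mean X, and uses a hypothetical five-point data set (not Rory's data, which the slides do not list). It shows that the raw residuals cancel to zero, which is why they are squared, and that tilting the line away from the least-squares slope increases the sum of squares.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line Y' = a + bX that minimizes the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Residual = actual Y minus predicted Y' (a signed green line)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Hypothetical data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
res = residuals(xs, ys, a, b)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"sum of residuals = {sum(res):.3f}")  # positives and negatives cancel out
print(f"sum of squared residuals = {sum_of_squares(res):.3f}")
# Any other line, e.g. a steeper slope of 0.7, gives a larger sum of squares
print(f"with b = 0.7: {sum_of_squares(residuals(xs, ys, a, 0.7)):.3f}")
```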

16 Is the regression line better than just guessing the mean of the Y variable? How much does the information about the relationship actually help? Which minimizes error better? How much better does the regression line predict the observed results? The answer is r². Wow!
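One way to see why r² answers the "better than guessing the mean?" question: compare the squared error of always guessing the mean of Y with the squared error of the regression line. The proportional reduction in error is r². This is a sketch on the same kind of hypothetical five-point data (not from the slides); it also checks that the result equals Pearson's r squared.

```python
def r_squared(xs, ys):
    """Proportional reduction in squared error from using the regression line
    instead of always guessing the mean of Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_mean = sum((y - mean_y) ** 2 for y in ys)                   # error when guessing the mean
    ss_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # error with the regression line
    return 1 - ss_line / ss_mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(f"r^2 from error reduction: {r_squared(xs, ys):.3f}")
print(f"r squared directly:       {pearson_r(xs, ys) ** 2:.3f}")
```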

17 What is r²? r² = the proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with r = .8, what amount (proportion or percentage) of the variance of mother’s height is accounted for by daughter’s height? Answer: .64, because (.8)² = .64.

18 What is r²? r² = the proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with r = .8, what proportion of the variance of mother’s height is not accounted for by daughter’s height? Answer: .36, because 1.0 - .64 = .36, or 36%, because 100% - 64% = 36%.

19 What is r²? r² = the proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If ice cream sales and temperature are correlated with r = .5, what amount (proportion or percentage) of the variance of ice cream sales is accounted for by temperature? Answer: .25, because (.5)² = .25.

20 What is r²? r² = the proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If ice cream sales and temperature are correlated with r = .5, what amount (proportion or percentage) of the variance of ice cream sales is not accounted for by temperature? Answer: .75, because 1.0 - .25 = .75, or 75%, because 100% - 25% = 75%.
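The four worked examples in slides 17 through 20 follow one rule: square r to get the proportion explained, and subtract from 1 for the proportion not explained. A short sketch (the function name `variance_split` is just an illustrative choice) reproduces them and shows that a negative r gives the same r², which is why r² carries no direction information.

```python
def variance_split(r: float) -> tuple[float, float]:
    """Return (proportion explained, proportion not explained) for correlation r."""
    explained = r ** 2  # coefficient of determination, r^2
    return explained, 1 - explained


examples = [("mother/daughter height", 0.8), ("ice cream sales/temperature", 0.5)]
for label, r in examples:
    explained, unexplained = variance_split(r)
    print(f"{label}: r = {r}, explained = {explained:.0%}, not explained = {unexplained:.0%}")

# Squaring removes the sign: r = -0.8 explains the same 64% as r = +0.8
print(f"r = -0.8: explained = {variance_split(-0.8)[0]:.0%}")
```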

21 Some useful terms: Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent). The coefficient of correlation is the name for “r”. The coefficient of determination is the name for “r²” (remember, it is never negative, so it carries no information about direction). The standard error of the estimate is our measure of the variability of the dots around the regression line (roughly the average deviation of each data point from the regression line, like a standard deviation).
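The slide describes the standard error of the estimate informally. A common textbook formula is s = √(Σ(Y − Y’)² / (n − 2)), dividing by n − 2 because the intercept and slope are both estimated from the data. This sketch assumes that formula and uses hypothetical five-point data, not the lecture's example.

```python
import math


def standard_error_of_estimate(xs, ys):
    """sqrt(sum((Y - Y')^2) / (n - 2)) around the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # n - 2 because two quantities (a and b) were estimated from the data
    return math.sqrt(ss_resid / (n - 2))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(f"standard error of the estimate = {standard_error_of_estimate(xs, ys):.3f}")
```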

22 Regression Example: Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople, and also whether more sales calls actually result in more systems being sold. So he simply measures the number of sales calls made by each salesperson and how many systems each one successfully sold.

23 Summary: Interpret r = 0.71. There is a positive relationship between the number of sales calls and the number of systems sold, and it is a strong relationship. Remember, we have not demonstrated cause and effect here, only that the two variables (sales calls and systems sold) are related.

24 Correlation Coefficient: Excel Example. Interpret r = 0.71. Does this correlation reach significance? n = 10, df = 8, alpha = .05. The observed r is larger than the critical r (0.71 > 0.632), therefore we reject the null hypothesis: r(8) = 0.71; p < 0.05.
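The critical r of 0.632 comes from a table, but it can be checked from the t distribution. This sketch assumes the standard relationship t = r√df / √(1 − r²), which rearranges to a critical r of t / √(t² + df). The critical t of 2.306 (two-tailed alpha = .05, df = 8) is hard-coded from a standard t table so the example needs only the Python standard library.

```python
import math

T_CRIT_DF8 = 2.306  # two-tailed alpha = .05, df = 8, from a standard t table


def critical_r(t_crit: float, df: int) -> float:
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def t_from_r(r: float, df: int) -> float:
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


r_obs, n = 0.71, 10  # values from the slide
df = n - 2
r_crit = critical_r(T_CRIT_DF8, df)
print(f"critical r = {r_crit:.3f}")
print(f"observed t = {t_from_r(r_obs, df):.3f}")
print("reject H0" if abs(r_obs) > r_crit else "do not reject H0")
```

Comparing r with the critical r and comparing t with the critical t are the same decision, just expressed on two different scales.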

25 Coefficient of Determination: Excel Example. Interpret r² = 0.504 (.71² = .504): we can say that 50.4 percent of the variation in the number of systems sold is explained, or accounted for, by the variation in the number of sales calls. Remember, we lose the directionality of the relationship with r².

26 Multiple regression equations: we can use variables to predict the behavior of the stock market, the probability of an accident, the amount of pollution in a particular well, the quality of a wine for a particular year, or which candidates will make the best workers. Preview

27 Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a. We can use variables to predict which candidates will make the best workers. We measured current workers: the best workers tend to have the highest “success scores” (success scores range from 1 to 1,000). We try to predict which applicants will have the highest success score. We have found that these variables predict success: Age (X₁), Niceness (X₂) and Harshness (X₃). Niceness and Harshness are both 10-point scales (10 = really nice; 10 = really harsh). According to your research, age has only a small effect on success, while workers’ attitude has a big effect. It turns out the best workers have high “niceness” scores and low “harshness” scores. Your results are summarized by this regression formula: Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700. Preview

28 Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a. According to your research, age has only a small effect on success, while workers’ attitude has a big effect. It turns out the best workers have high “niceness” scores and low “harshness” scores. Your results are summarized by this regression formula: Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700. Preview

29 Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a. Y’ is the dependent variable: “Success score” is your dependent variable. X₁, X₂ and X₃ are the independent variables: “Age”, “Niceness” and “Harshness” are the independent variables. Each b is called a regression coefficient; each b shows the change in Y for each unit change in its own X (holding the other independent variables constant). a is the Y-intercept. According to your research, age has only a small effect on success, while workers’ attitude has a big effect. It turns out the best workers have high “niceness” scores and low “harshness” scores. Your results are summarized by this regression formula: Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700. Preview
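With the regression formula in hand, predicting a success score is just substitution. This minimal sketch uses the coefficients from the slides (b₁ = 1, b₂ = 20, b₃ = -75, a = 700); the two applicants and their scores are hypothetical, invented only to show how the formula ranks candidates.

```python
def success_score(age: float, nice: float, harsh: float) -> float:
    """Y' = b1*X1 + b2*X2 + b3*X3 + a with the coefficients from the slides."""
    b1, b2, b3, a = 1, 20, -75, 700
    return b1 * age + b2 * nice + b3 * harsh + a


# Hypothetical applicants: (name, age, niceness 1-10, harshness 1-10)
applicants = [("Applicant A", 30, 8, 2), ("Applicant B", 50, 3, 7)]

# Rank applicants from highest to lowest predicted success score
ranked = sorted(applicants, key=lambda p: success_score(*p[1:]), reverse=True)
for name, age, nice, harsh in ranked:
    print(name, success_score(age, nice, harsh))
```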

30 The Multiple Regression Equation: Interpreting the Regression Coefficients. b₁ = the regression coefficient for age (X₁) is 1. The coefficient is positive, which suggests a positive relationship between age and success when the other variables are held constant: as age increases, the success score increases. The numeric value of the regression coefficient provides more information: if age increases by 1 year while the other two independent variables are held constant, we predict a 1-point increase in the success score. Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a; Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700. Preview

31 The Multiple Regression Equation: Interpreting the Regression Coefficients. b₂ = the regression coefficient for niceness (X₂) is 20. The coefficient is positive, which suggests a positive relationship between niceness and success when the other variables are held constant: as niceness increases, the success score increases. The numeric value of the regression coefficient provides more information: if the “niceness score” increases by one while the other two independent variables are held constant, we predict a 20-point increase in the success score. Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700; Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a. Preview

32 The Multiple Regression Equation: Interpreting the Regression Coefficients. b₃ = the regression coefficient for harshness (X₃) is -75. The coefficient is negative, which suggests a negative relationship between harshness and success when the other variables are held constant: as harshness increases, the success score decreases. The numeric value of the regression coefficient provides more information: if the “harshness score” increases by one while the other two independent variables are held constant, we predict a 75-point decrease in the success score. Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700; Y’ = b₁X₁ + b₂X₂ + b₃X₃ + a. Preview
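The "holding the other two constant" interpretation in slides 30 through 32 can be checked directly: raise one predictor by one unit, leave the others unchanged, and the predicted score moves by exactly that predictor's coefficient. This sketch uses the slide's formula; the baseline worker (age 40, niceness 5, harshness 5) and the helper name `effect_of_one_unit` are hypothetical.

```python
def success_score(age, nice, harsh):
    """Success score = (1)(Age) + (20)(Nice) + (-75)(Harsh) + 700, from the slides."""
    return 1 * age + 20 * nice - 75 * harsh + 700


def effect_of_one_unit(baseline, name):
    """Change in predicted score when one predictor rises by 1 and the others are held constant."""
    bumped = dict(baseline)
    bumped[name] += 1
    return success_score(**bumped) - success_score(**baseline)


baseline = {"age": 40, "nice": 5, "harsh": 5}  # hypothetical worker
for name in ("age", "nice", "harsh"):
    print(name, effect_of_one_unit(baseline, name))
```

Because the formula is linear, the result does not depend on which baseline worker is chosen.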


