Modeling Possibilities

Example 11.3: Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

Objective: To use StatPro’s multiple regression procedure to analyze whether the bank discriminates against females in terms of salary.

BANK.XLS The Fifth National Bank of Springfield is facing a gender-discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees. The bank’s employee database is listed in this file. Here is a partial list of the data.

Variables For each of the 208 employees, the data set includes the following variables:
EducLev: education level, a categorical variable with categories 1 (finished high school), 2 (finished some college courses), 3 (obtained a bachelor’s degree), 4 (took some graduate courses), and 5 (obtained a graduate degree)
JobGrade: a categorical variable indicating the current job level, with possible levels 1 through 6 (6 is highest)
YrHired: year the employee was hired
YrBorn: year the employee was born
Gender: a categorical variable with values “Female” and “Male”

Variables -- continued
YrsPrior: number of years of work experience at another bank prior to working at Fifth National
PCJob: a dummy variable with value 1 if the employee’s current job is computer-related and value 0 otherwise
Salary: current annual salary in thousands of dollars

Do the data provide evidence that females are discriminated against in terms of salary?

Naïve Approach A naïve approach to the problem is to compare the average salaries of the males and females. The average of all salaries is $39,922, the average female salary is $37,210, and the average male salary is $45,505. The difference between the averages is statistically significant. The females are definitely earning less, but perhaps there is a reason. The question is whether the difference between the average salaries is still evident after taking other attributes into account. This is a perfect task for regression.
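The naïve comparison is a two-sample test of whether mean female and mean male salaries differ. Below is a minimal sketch that uses only the Python standard library. It computes Welch’s t statistic, which does not assume equal variances in the two groups. That is an assumption on our part: the original analysis may have used a different test. The salaries are hypothetical values in thousands of dollars, not the BANK.XLS data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    Compares the means of two groups without assuming equal variances.
    Returns (t, df).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each mean (sample variance / group size).
    va = statistics.variance(sample_a) / n_a
    vb = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Hypothetical salaries in $1000s (NOT the bank's data).
female = [30.0, 34.0, 36.0, 40.0]
male = [40.0, 44.0, 46.0, 50.0]
t, df = welch_t(female, male)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = -3.397, df = 6.0
```

A large negative t means the female mean is well below the male mean relative to sampling noise. As the slide goes on to argue, this by itself says nothing about *why* the gap exists.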

Dummy Variables Some potential explanatory variables are categorical and cannot be measured on a quantitative scale. However, we often need to use these variables because they are related to the response variable. The trick is to create dummy variables, also called indicator or 0-1 variables. These are variables that indicate the category a given observation is in.

Dummy Variables -- continued To create dummy variables we can use an IF statement or StatPro’s Dummy variable procedure. The Dummy variable procedure is usually easier, particularly when there are multiple categories. Once the dummy variables are created, we can combine categories if we like by simply adding the corresponding dummy columns; the sum is the dummy for the combined category.
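The IF-statement approach is easy to mimic in code. Below is a minimal sketch in plain Python, assuming the categorical column is stored as a list. The names `make_dummies`, `combine`, and the `Ed_` prefix are hypothetical illustrations, not StatPro’s procedure. The block also combines two categories by adding their dummy columns.

```python
def make_dummies(values, prefix):
    """Return {name: list of 0/1}, one dummy per category that appears.

    Equivalent to writing =IF(cell=category,1,0) once per category.
    Categories absent from the data get no column.
    """
    categories = sorted(set(values))
    return {
        f"{prefix}_{c}": [1 if v == c else 0 for v in values]
        for c in categories
    }


def combine(*columns):
    """Dummy for a merged category: add the component dummy columns.

    Because each row falls in exactly one category, the sum is still 0/1.
    """
    return [sum(row) for row in zip(*columns)]


# Hypothetical education levels for five employees.
educ_lev = [1, 3, 5, 3, 2]
dummies = make_dummies(educ_lev, "Ed")
# For example, a single "some college or less" category from levels 1 and 2.
ed_1_2 = combine(dummies["Ed_1"], dummies["Ed_2"])
print(dummies["Ed_3"])  # [0, 1, 0, 1, 0]
print(ed_1_2)           # [1, 0, 0, 0, 1]
```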

Regression Analysis In this example we create dummy variables for Gender and EducLev. Then we can run a regression analysis with Salary as the response variable, using any combination of numerical and dummy explanatory variables. We must follow two rules: (1) we shouldn’t use any of the original categorical variables that the dummies are based on, and (2) we should use one less dummy than the number of categories for any categorical variable.

Regression Analysis -- continued This second rule is a technical one. The dummies for all categories of a variable add up to 1 for every employee, which duplicates the constant term, so the coefficients cannot be estimated uniquely. If we violate the rule, the software will give us an error message. For example, of the education dummies Ed_1 through Ed_5, any four can be used. The omitted dummy then corresponds to the reference category. As we will see, the coefficients of the included dummies are all interpreted relative to this reference category. To get used to dummy variables in regression analysis, we will proceed in several stages.
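The second rule can be checked numerically. When the model has a constant column, the full set of dummies for one categorical variable adds up to that column, so the design matrix loses a rank. Below is a sketch using NumPy with hypothetical education codes. Whether a package stops with an error or quietly picks one of many solutions depends on the software.

```python
import numpy as np

# Hypothetical education levels 1-5 for eight employees (every level occurs).
educ_lev = np.array([1, 2, 3, 4, 5, 3, 2, 5])
levels = np.arange(1, 6)
# One 0/1 column per level: Ed_1 ... Ed_5.
all_dummies = (educ_lev[:, None] == levels[None, :]).astype(float)
intercept = np.ones((len(educ_lev), 1))

# Constant plus all five dummies: Ed_1 + ... + Ed_5 equals the constant column.
x_all = np.hstack([intercept, all_dummies])
# Constant plus four dummies (Ed_1 omitted as the reference category).
x_ref = np.hstack([intercept, all_dummies[:, 1:]])

print(x_all.shape[1], np.linalg.matrix_rank(x_all))  # 6 5  -> rank-deficient
print(x_ref.shape[1], np.linalg.matrix_rank(x_ref))  # 5 5  -> full rank
```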

Regression Analysis -- continued We first estimate a regression equation with only one explanatory variable, the Female dummy. The output is shown in this table. The resulting equation is Predicted Salary = 45.505 - 8.296Female

Regression Analysis -- continued To interpret this equation, recall that Female has only two possible values, 0 and 1. If we substitute 1, the predicted salary equals 45.505 - 8.296 = 37.209, and if we substitute 0, the predicted salary is 45.505. These are the average salaries (in thousands of dollars) of females and males. Therefore the interpretation of the -8.296 coefficient of the Female dummy variable is straightforward: it is the average female salary minus the average male salary.
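The fact that a regression on a single 0/1 dummy reproduces the two group averages holds for any data set, not just this bank’s. Below is a sketch with NumPy least squares on hypothetical salaries. The intercept equals the mean of the Female = 0 group. The Female coefficient equals the female-minus-male difference in means.

```python
import numpy as np

# Hypothetical salaries in $1000s and a Female dummy (1 = female).
salary = np.array([30.0, 34.0, 36.0, 40.0, 40.0, 44.0, 46.0, 50.0])
female = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)

# Design matrix: constant column plus the dummy.
x = np.column_stack([np.ones_like(female), female])
(b0, b1), *_ = np.linalg.lstsq(x, salary, rcond=None)

male_mean = salary[female == 0].mean()
female_mean = salary[female == 1].mean()
print(round(b0, 3), round(male_mean, 3))                # 45.0 45.0
print(round(b1, 3), round(female_mean - male_mean, 3))  # -10.0 -10.0
```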

Regression Analysis -- continued The above equation tells only part of the story; it ignores all information except gender. We expand this equation by adding the experience variables YrsExper (years of experience with Fifth National) and YrsPrior. The output is shown in this table.

Regression Analysis -- continued The corresponding equation is
Predicted Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior - 8.080Female
It is useful to write two separate equations, one for females and one for males:
Females: Predicted Salary = 27.412 + 0.988YrsExper + 0.131YrsPrior
Males: Predicted Salary = 35.492 + 0.988YrsExper + 0.131YrsPrior
We interpret the coefficient -8.080 of the Female dummy variable as the average salary disadvantage for females relative to males after controlling for job experience. But there is still more to the story.

Regression Analysis -- continued We next add education level to the equation by including four of the five education level dummies. Although any four could be used, we use Ed_2 to Ed_5, so that the lowest level becomes the reference category. We would expect this to lead to positive coefficients for these dummies, which are easier to interpret. The resulting output is shown in the table on the next slide.
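Choosing a different reference category changes how the education coefficients are expressed, but not the fitted values. Here is a sketch of the bookkeeping. Let $b_0$ be the intercept and $b_k$ the coefficient of Ed_k with Ed_1 as the reference, so $b_1 = 0$. Let $r$ be any other level chosen as the new reference:

```latex
\begin{aligned}
\text{contribution of level } k \text{ (reference Ed\_1)}: &\quad b_0 + b_k, \qquad b_1 = 0 \\
\text{new reference Ed\_}r: &\quad b_0' = b_0 + b_r, \qquad b_k' = b_k - b_r \\
\text{same fitted value}: &\quad b_0' + b_k' = (b_0 + b_r) + (b_k - b_r) = b_0 + b_k
\end{aligned}
```

This simple shift assumes the model has no interaction terms involving the education dummies, which is the case for the equations in this example.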

Regression Analysis -- continued

Regression Analysis -- continued The estimated regression equation is now
Predicted Salary = 26.613 + 1.033YrsExper + 0.362YrsPrior - 4.501Female + 0.160Ed_2 + 4.765Ed_3 + 7.320Ed_4 + 11.770Ed_5
There are now two categorical variables involved, gender and education level. However, we can still write a separate equation for any combination of categories by setting the dummies to the appropriate values.

Regression Analysis -- continued For example, the equation for females at the fifth education level is found by setting Female = 1 and Ed_5 = 1 and setting the other education dummies equal to 0. The constant becomes 26.613 - 4.501 + 11.770 = 33.882, so the resulting equation is
Predicted Salary = 33.882 + 1.033YrsExper + 0.362YrsPrior
We interpret this equation as follows: for either gender and any education level, the expected increase in salary for one extra year of experience with Fifth National is $1033, and the expected increase in salary for one extra year of prior experience with another bank is $362.
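Setting dummies to 0 or 1 and folding the matching coefficients into the constant is a mechanical step, so it is easy to code. Below is a minimal sketch that assumes the coefficients of the gender-and-education equation are stored in a dict. `intercept_for` is a hypothetical helper name.

```python
# Coefficients of the equation with gender and education dummies
# (Ed_1 is the omitted reference category).
coef = {
    "const": 26.613, "YrsExper": 1.033, "YrsPrior": 0.362,
    "Female": -4.501, "Ed_2": 0.160, "Ed_3": 4.765,
    "Ed_4": 7.320, "Ed_5": 11.770,
}


def intercept_for(coef, dummies_on):
    """Constant term of the equation for one combination of categories.

    dummies_on lists the dummies set to 1; all other dummies are 0,
    so only the listed coefficients are added to the constant.
    """
    return coef["const"] + sum(coef[d] for d in dummies_on)


# Females at education level 5: Female = 1, Ed_5 = 1.
print(round(intercept_for(coef, ["Female", "Ed_5"]), 3))  # 33.882
# Males at the reference education level: every dummy is 0.
print(round(intercept_for(coef, []), 3))                  # 26.613
```

Only the constant changes from one combination to the next. The slopes on YrsExper and YrsPrior stay the same because the model contains no interaction terms.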

Regression Analysis -- continued The coefficients of the education dummies indicate the average increase in salary an employee can expect relative to the reference (lowest) education level. The key coefficient, the negative $4501 for females, indicates the average salary disadvantage for females relative to males, given that they have the same experience levels and the same education levels. One further explanation for gender differences in salary might be job grade. Perhaps females tend to be in lower job grades, which would help explain why they get lower salaries on average.

Regression Analysis -- continued One way to check this is with a pivot table, as shown below, where we put job grade in the row area, gender in the column area, and request counts, displayed as percentages of columns. Clearly, females tend to be concentrated at the lower job grades.
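The pivot table described here puts job grade in the rows and gender in the columns, with counts shown as percentages of each column. It can be sketched in plain Python. The records below are hypothetical, not the bank’s data. With pandas, `pd.crosstab(..., normalize='columns')` produces the same kind of layout.

```python
from collections import Counter

# Hypothetical (JobGrade, Gender) records, NOT the BANK.XLS data.
records = [
    (1, "Female"), (1, "Female"), (2, "Female"), (1, "Female"),
    (2, "Male"), (4, "Male"), (3, "Female"), (5, "Male"),
]


def column_percentages(records):
    """Percent of each gender (column) falling in each job grade (row)."""
    counts = Counter(records)                          # (grade, gender) -> count
    totals = Counter(gender for _, gender in records)  # gender -> column total
    grades = sorted({grade for grade, _ in records})
    genders = sorted(totals)
    return {
        grade: {g: 100.0 * counts[(grade, g)] / totals[g] for g in genders}
        for grade in grades
    }


table = column_percentages(records)
for grade, row in table.items():
    print(grade, {g: f"{p:.1f}%" for g, p in row.items()})
```

Each column sums to 100%, so the two genders can be compared even though the groups differ in size.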

Regression Analysis -- continued This certainly helps to explain why females get lower salaries on average. We can go one step further to see the effect of job grade on salary by including the dummies for job grade in the equation, along with the other variables we have included so far. As with the education dummies, we use the lowest job grade as the reference category and include only the five dummies for the other categories.

Regression Analysis -- continued While we’re at it, we add the other two potential explanatory variables to the equation: Age, coded as 95 minus YrBorn, and HasPCJob, a dummy based on the PCJob categorical variable. The regression output is shown on the next slide. As expected, the coefficients of the job grade dummies are all positive, and they increase as the job grade increases; it pays to be in the higher job grades.

Regression Analysis -- continued

Regression Analysis -- continued The effect of age appears to be minimal, and there appears to be a “bonus” of close to $5000 for having a PC-related job. The R² value has now increased to 76.5%, and the penalty for being female has decreased to $2555, still large but not as large as before. However, even if this penalty, the coefficient of Female in this last equation, is considered “small,” is it convincing evidence against the argument for gender discrimination?
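For reference, R² measures the fraction of the variation in Salary around its mean that the regression explains. Here $y_i$ is an employee’s actual salary, $\hat{y}_i$ the predicted salary, and $\bar{y}$ the mean salary:

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

An R² of 76.5% therefore means that the residual sum of squares is 23.5% of the total sum of squares about the mean salary.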

Regression Analysis -- continued We believe the answer is “no.” We have used variations in job grades to reduce the penalty for being female. But the remaining question is then: Why are females predominantly in the low job grades? Perhaps this is the real source of gender discrimination. Perhaps management is not advancing the females as quickly as it should, which naturally results in lower salaries for females.