
1 Understanding regression

2 A regression is an average

Experiment: Imagine that you are looking at people coming through a door. Imagine also that you had “metric eyes” (rather like Superman’s x-ray vision) and could accurately estimate the height of each person as they passed through. After 10 people had gone through the door, what would be the best prediction for the height of the eleventh person?

Answer: the average. This is why the “average” is also called the “expected value.”

3 The expected value of the height of the 11th person is the average of the heights of the previous 10.
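The claim that the average is the best prediction can be checked with a small sketch. It assumes "best" means the lowest mean squared error, and the ten heights are hypothetical values chosen for illustration; neither comes from the slides.

```python
from statistics import mean

# Hypothetical heights (cm) of the first 10 people; illustrative values only.
heights = [158, 162, 165, 167, 170, 171, 173, 175, 178, 181]

def mean_squared_error(guess, observations):
    """Average squared gap between one fixed guess and each observation."""
    return sum((x - guess) ** 2 for x in observations) / len(observations)

# The average of the first 10 heights is the prediction for the 11th person.
prediction = mean(heights)

# Compare the average with a few nearby guesses (assumed criterion: squared error).
candidates = [prediction - 5, prediction - 1, prediction, prediction + 1, prediction + 5]
errors = {guess: mean_squared_error(guess, heights) for guess in candidates}
best_guess = min(errors, key=errors.get)

print(f"average of the first 10 heights: {prediction} cm")
print(f"candidate with the lowest mean squared error: {best_guess} cm")
```

Among the candidates tried, the average itself gives the smallest error, which is the sense in which it serves as the expected value.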

4 Imagine that, as you estimate the height of each person coming through the door, you also note their gender. Information on gender improves our ability to predict height.

5 Regression

Two basic purposes:
– Explanation
– Prediction

Regression is an efficient way to analyze the structure of the data. A regression model is a sentence that connects the average or expected value of something (a person’s height) to several other variables at once (multivariate analysis).

6 The regression sentence

The regression equation may be read as a sentence that summarizes the simultaneous influence of independent variables (causes or drivers) on a single dependent variable (the effect or outcome). Here is a simple, single-variable model:

Height = 165 + 5D (D = 1 for a man and 0 for a woman)

The regression sentence: The predicted (expected) height for people coming through the door is 165 cm, plus 5 cm if that person is a man. In other words, women have an expected height of 165 cm and men have an expected height of 170 cm. The 5 that multiplies D is the regression coefficient.
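A minimal sketch of how a model such as Height = 165 + 5D arises from data. The six observations are hypothetical, picked so that the fit reproduces the slide's numbers, and the closed-form least-squares formulas for a single regressor are used in place of a statistics package.

```python
from statistics import mean

# Hypothetical observations: (height in cm, D) with D = 1 for a man, 0 for a woman.
people = [(163, 0), (165, 0), (167, 0), (168, 1), (170, 1), (172, 1)]

heights = [h for h, _ in people]
dummies = [d for _, d in people]
mean_h, mean_d = mean(heights), mean(dummies)

# Ordinary least squares with one regressor:
#   slope = sum((D - mean_D) * (H - mean_H)) / sum((D - mean_D)^2)
#   intercept = mean_H - slope * mean_D
slope = (sum((d - mean_d) * (h - mean_h) for h, d in people)
         / sum((d - mean_d) ** 2 for d in dummies))
intercept = mean_h - slope * mean_d

# Group averages, for comparison with the fitted equation.
mean_women = mean(h for h, d in people if d == 0)
mean_men = mean(h for h, d in people if d == 1)

print(f"Height = {intercept:.1f} + {slope:.1f} D")
print(f"average height of women: {mean_women}, of men: {mean_men}")
```

With a single dummy variable, the intercept equals the average for the group coded 0 and the coefficient equals the gap between the two group averages, which is why the regression can be read as a conditional average.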

7 Adding variables

Adding more variables conditions our prediction (expectation) for the height of people. Typical variables could include:
– number of litres of milk consumed per week
– income of parents ($’000s)
– kilometres above sea level at birth

8 [Figure: scatter plot of height (cm) against number of litres of milk consumed each week, horizontal axis marked from 0 to 5, with the fitted line Height = 100 + 15L meeting the vertical axis at 100.]

For every litre consumed, expected height increases by 15 cm. No milk consumption implies an expected height of 100 cm. Someone who drinks 20 litres of milk each week has an expected height of 400 cm.
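The 400 cm figure follows mechanically from the equation. The sketch below evaluates Height = 100 + 15L and flags inputs that fall outside the range of the data. The range of 0 to 5 litres is an assumption read off the chart's horizontal axis, and the helper names are hypothetical.

```python
# Assumed range of observed milk consumption (litres per week), taken from
# the 0 and 5 tick marks on the chart's horizontal axis.
OBSERVED_LITRES = (0.0, 5.0)

def predict_height(litres):
    """Evaluate the slide's fitted line: Height = 100 + 15L (in cm)."""
    return 100 + 15 * litres

def is_extrapolation(litres, observed=OBSERVED_LITRES):
    """True when the input lies outside the range the line was fitted on."""
    low, high = observed
    return not (low <= litres <= high)

for litres in (0, 3, 20):
    note = " (extrapolation beyond the observed range)" if is_extrapolation(litres) else ""
    print(f"{litres} litres -> {predict_height(litres)} cm{note}")
```

A fitted line only summarizes the data it was estimated on, so a prediction at 20 litres says more about the equation than about anyone's height.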

9 Regression sentences

An earnings regression relates expected earnings to several variables.

Y = 6,000 + 200.5 AGE + 1000.5 YEARS_ED (Y = annual income)

“Expected annual income for the sample is $6,000 plus 200.5 times AGE plus 1000.5 times years of education.”

A 30-year-old with 12 years of education can expect to earn: $6,000 + 200.5(30) + 1000.5(12) = $24,021

For every additional year of education, expected annual income increases by $1,000.50. The numbers that multiply AGE and YEARS_ED are the regression coefficients.
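The worked example can be reproduced in a few lines. The function name is hypothetical; the coefficients are the ones in the equation above.

```python
def expected_income(age, years_ed):
    """Evaluate Y = 6,000 + 200.5 AGE + 1000.5 YEARS_ED (annual income in $)."""
    return 6000 + 200.5 * age + 1000.5 * years_ed

# The slide's example: a 30-year-old with 12 years of education.
income = expected_income(age=30, years_ed=12)

# Effect of one more year of education with age held fixed: the coefficient.
extra_year = expected_income(30, 13) - expected_income(30, 12)

print(f"expected income: ${income:,.2f}")
print(f"gain from one more year of education: ${extra_year:,.2f}")
```

Holding age fixed and adding one year of education changes the prediction by exactly the YEARS_ED coefficient, which is how a coefficient in a multivariate model is read.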

10 Example: LMAPD impact analysis

– Wanted to associate labour market programming with outcome
– Wanted to assess the presence and intensity of programming
– Built a regression sentence that expressed this relationship

Hours Worked = a_1 + a_2 Female + a_3 Aboriginal + … + a_(k-1) Employ Inter. + a_k # Employ Inter.

Output appears more complicated, but follows the same principles.

11 Ex. LMAPD: Estimating VR counselling hours (LMAPD VRhours)

Admin data includes the total cost of services that the VR program spent on a particular client, but it does not include the cost of VR counselling. To estimate VR counselling costs per client, 281 VR clients with currently active VR counsellors were selected. VR counsellors were given a short questionnaire that included the following question, to be answered for each VR client:

“On average, over the entire time that you have been this client’s counsellor, how many hours per month did you spend on this client’s case?”

12 Ex. LMAPD VRhours

Surveys for 270 clients were returned. Information from the surveys was merged with the administrative data. The next step was to run a regression on the sample of 270 VR clients to calculate coefficients for the independent variables (from the admin data). These coefficients could then be used to estimate VR counselling costs for the entire sample of VR clients (n=1,062).

13 Ex. LMAPD VRhours

Dependent variable: average monthly time in hours spent by VR counsellors on the clients’ files (survey question)

Independent variables:
– Demographic: gender, Aboriginal status, minority status, age, disability type
– Service data: urban/rural service delivery region, organization that delivered services

14 Ex. LMAPD VRhours: Independent variables

Variable                           Type          Mean
(Male gender)                      M.E. dummy    0.61
Female gender                      M.E. dummy    0.39
(Non-Aboriginal)                   M.E. dummy    0.98
Aboriginal                         M.E. dummy    0.02
(Non-minority)                     M.E. dummy    0.99
Minority                           M.E. dummy    0.01
Age                                Continuous    35.09
Cognitive disability               N.E. dummy    0.17
Physical disability                N.E. dummy    0.30
Psychiatric disability             N.E. dummy    0.28
Hearing disability                 N.E. dummy    0.09
Vision disability                  N.E. dummy    0.13
Learning disability                N.E. dummy    0.14
(Urban service delivery region)    M.E. dummy    0.69
Rural service delivery region      M.E. dummy    0.31
(Provincial service delivery)      M.E. dummy    0.52
SMD service delivery               M.E. dummy    0.31
CPA service delivery               M.E. dummy    0.06
CNIB service delivery              M.E. dummy    0.12

15 Ex. LMAPD VRhours: Independent variables

Variables in parentheses (X) are the dummy variables excluded from the regression.

Types of variables:
– Continuous
– Mutually exclusive (M.E.) dummy variable
– Not mutually exclusive (N.E.) dummy variable

16 Ex. LMAPD VRhours: Regression results

Independent variable                   Coefficient    P-value
Constant                                      2.05       0.01
Female gender (fg)                            0.10       0.74
Aboriginal (ab)                              -1.14       0.26
Minority (m)                                  3.98       0.01
Age (ag)                                     -0.01       0.36
Cognitive disability (cd)                     0.20       0.78
Physical disability (phd)                     5.43       0.00
Psychiatric disability (psd)                  1.08       0.10
Hearing disability (hd)                       6.34       0.00
Vision disability (vd)                        0.58       0.73
Learning disability (ld)                     -0.61       0.35
Rural service delivery region (r)             0.17       0.612
SMD service delivery (smd)                   -6.06       0.00
CPA service delivery (cpa)                   -5.16       0.00
CNIB service delivery (cnib)                  0.61       0.74

Sample: 270
Adj. R²: 0.1508

17 Ex. LMAPD VRhours: Coefficients

– Aboriginal status is associated with fewer hours per month (-1.14, although with a p-value of 0.26 this is not statistically significant).
– Minority status is associated with 3.98 more hours of VR counselling per month.
– Rural clients logged slightly more hours in counselling than urban clients (0.17, not statistically significant).
– Clients with physical and hearing disabilities require substantial support (5.43 and 6.34 more hours per month, respectively).

18 Ex. LMAPD VRhours: Regression sentence

VRhours = 2.05 + 0.10fg - 1.14ab + 3.98m - 0.01ag + 0.20cd + 5.43phd + 1.08psd + 6.34hd + 0.58vd - 0.61ld + 0.17r - 6.06smd - 5.16cpa + 0.61cnib

The estimated coefficients can now be combined with the independent variable values for all 1,062 VR participants to calculate the estimated number of VR hours required for each client.
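Applying the regression sentence to one client might look like the sketch below. The coefficients are the reported estimates, while the client record, the dictionary layout and the function name are hypothetical. Dummy variables left out of a record are treated as 0, which places the client in the excluded (reference) category.

```python
# Estimated coefficients from the regression results, keyed by abbreviation.
CONSTANT = 2.05
COEFFICIENTS = {
    "fg": 0.10, "ab": -1.14, "m": 3.98, "ag": -0.01,
    "cd": 0.20, "phd": 5.43, "psd": 1.08, "hd": 6.34,
    "vd": 0.58, "ld": -0.61, "r": 0.17,
    "smd": -6.06, "cpa": -5.16, "cnib": 0.61,
}

def predict_vr_hours(client):
    """Constant plus each coefficient times the client's value (missing = 0)."""
    return CONSTANT + sum(coef * client.get(name, 0)
                          for name, coef in COEFFICIENTS.items())

# Hypothetical client: a 40-year-old woman with a physical disability, served
# in a rural region by the provincial service (the excluded provider category).
client = {"fg": 1, "ag": 40, "phd": 1, "r": 1}
hours = predict_vr_hours(client)

print(f"estimated VR counselling hours per month: {hours:.2f}")
```

Running the same function over every record in the administrative data would give an estimate for each of the 1,062 participants, including those whose counsellors were not surveyed.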

19 Assessing the quality of a regression

1. Goodness of fit (R²) measures the proportion of the variation in Y explained by the model. R² varies between 0 (low) and 1 (high).
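A short sketch of how goodness of fit is computed, using the standard definition of R² as one minus the ratio of unexplained variation to total variation. The heights and predictions are hypothetical illustration values, not data from the presentation.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical heights (cm): three women followed by three men.
actual = [163, 165, 167, 168, 170, 172]

# Model A: predict the overall average for everyone (no explanatory variable).
overall_mean = sum(actual) / len(actual)
r2_mean_only = r_squared(actual, [overall_mean] * len(actual))

# Model B: predict 165 for women and 170 for men (Height = 165 + 5D).
r2_with_gender = r_squared(actual, [165, 165, 165, 170, 170, 170])

print(f"R^2 using the overall average only: {r2_mean_only:.2f}")
print(f"R^2 after adding gender: {r2_with_gender:.2f}")
```

Predicting the overall average for everyone explains none of the variation, so its R² is 0; adding a variable that carries information about the outcome moves R² towards 1.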

20 Assessing the quality of a regression

2. Statistical significance

The larger the coefficient relative to its standard error, the more confident we are that it is not zero. The lower the standard error, the more confident we are that we have measured the effect reliably. The coefficient divided by its standard error is the t value. The rule of 2 is applied as a “t” test: an absolute t value greater than about 2 indicates that the coefficient is statistically significant.

Y = 6,000 + 20.5 AGE + 100.5 YEARS_ED
(2.5) (3.8) (1.2)

Computer output reports t values (shown in parentheses above), standard errors, p values and a host of other diagnostics.
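The rule of 2 can be sketched as follows. The pairing of the three parenthesised t values with the constant, AGE and YEARS_ED in left-to-right order is an assumption, and the function names are hypothetical. Because t is the coefficient divided by its standard error, the standard error implied by each pair can be recovered as well.

```python
# (coefficient, reported t value) for each term of the slide's equation.
# Matching the t values to the terms in left-to-right order is an assumption.
terms = {
    "constant": (6000.0, 2.5),
    "AGE": (20.5, 3.8),
    "YEARS_ED": (100.5, 1.2),
}

def implied_standard_error(coefficient, t_value):
    """Invert t = coefficient / standard error to recover the standard error."""
    return coefficient / t_value

def passes_rule_of_two(t_value):
    """Rule of thumb: |t| above 2 suggests the coefficient differs from zero."""
    return abs(t_value) > 2

significant = {name: passes_rule_of_two(t) for name, (_, t) in terms.items()}
standard_errors = {name: implied_standard_error(c, t)
                   for name, (c, t) in terms.items()}

for name, (coef, t) in terms.items():
    print(f"{name}: coefficient = {coef}, t = {t}, "
          f"implied SE = {standard_errors[name]:.2f}, "
          f"passes rule of 2 = {significant[name]}")
```

Under that pairing, YEARS_ED has the larger coefficient of the two variables but fails the rule of 2, which shows that the size of a coefficient alone says little about its reliability.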

21 Photo radar and traffic safety

Model 1: Deaths = A + B (Number of installations)
(The test is whether B is positive.)

Model 2: Deaths = A + B (Year) + C (D), D = 0 (year 2001)
(The test is whether C is negative.)

[Figure: scatter plot, “Traffic accidents and photo radar for Canada’s largest cities”, showing number of deaths from traffic accidents against number of photo radar installations.]

22 Regression variables

Dependent (Outcome)
Independent (Causal)
– Context (age, gender, ethnicity)
– Driver (policy)

Policy can be measured directly ($, person years) or as a change in state (dummy variable).

23 Building a regression model

– Identify the dependent (effect or outcome) variable(s).
– What are the independent (causal) variables?
– Are there policy impacts? How are these to be measured?
