How Do Lawyers Set Fees?

Learning Objectives
1. Model, i.e. "story" or question
2. Multiple regression review
3. Omitted variables (our first failure of GM)
4. Dummy variables

Model
An example of how we can use the tools we have learned: simple analyses without a complicated structure can often be useful
Question: lawyers claim that they set fees to reflect the amount of legal work done
Our suspicion is that fees are set to reflect the amount of money at stake
– a form of second-degree price discrimination

Model
How do we translate a story into econometrics and then test it?
Our idea: fees are determined by the size of the award rather than the work done
– percentage fees
– price discrimination
Be careful to consider alternatives: insurance

Analysis
As always, summarize and describe the data
Graph the variables of interest (see over)
Run a regression to find the percentage price rule

reg ins_allow award
[Stata output: coefficient table for award and _cons, with F test, R-squared and confidence intervals; the numbers were not preserved in the transcript]

Formulate the Story as a Hypothesis
The story is that lawyers charge a fee based on the award
So the null hypothesis is that the coefficient on award is zero
H0: β = 0  H1: β ≠ 0
Test the hypothesis that award is not statistically significant
– Stata does it automatically

1. H0: β = 0  H1: β ≠ 0
2. Calculate the test statistic assuming that H0 is true: t = (b − 0)/se(b) = 11.53
3. Either find the test statistic on the t distribution and calculate the p-value: Prob(t > 11.53) = 0.000
Or compare with one of the traditional threshold ("critical") values on the t distribution with N − k degrees of freedom at the 5% significance level: |t| = 11.53 exceeds all the usual critical values, and Prob(t > 11.53) = 0.000
4. So we reject the null hypothesis
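The rejection rule above can be sketched in a few lines of Python. This is an illustrative sketch, not Stata's computation: it uses the normal approximation to the t distribution (reasonable here, with around 89 degrees of freedom), and the function name is ours.

```python
import math

def two_sided_p(t_stat):
    # Two-sided p-value via the normal approximation to the
    # t distribution: p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2))

t = 11.53                      # t statistic reported on the slide
print(two_sided_p(t) < 0.05)   # True: reject H0 at the 5% level
```

With |t| = 11.53 the p-value is effectively zero, which is why the slide reports Prob(t > 11.53) = 0.000.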

Type 1 Error
Note how we set up the hypothesis test: the null was that the percentage charge is zero
A type 1 error is rejecting the null when it is true
The probability of a type 1 error is the significance level
So there is a 5% chance of saying that lawyers charge a percentage fee when they do not

Some Comments
You could formulate the test as one-sided:
H0: β ≥ 0  H1: β < 0, or H0: β ≤ 0  H1: β > 0
Exercise: do this and think about which is best
Could also test a particular value
– H0: β = 0.2  H1: β ≠ 0.2

Omitted Variables
Our first failure of the GM theorem
Key practical issue
– There are always some variables missing (R² < 1)
When does it matter?
– When they are correlated with the included variables
– OLS becomes inconsistent and biased
Often a way to undermine econometric results
Discuss in two ways
– State the issue formally
– Use the lawyers example

Formally
Suppose we have a model with z omitted:
y_i = α + βx_i + γz_i + u_i (true model)
y_i = a + bx_i + u_i (estimated)
Then we will have E(b) ≠ β: b is a biased estimator of the effect of x on y
It is also inconsistent: the bias does not disappear as N → ∞
The bias is determined by the formula E(b) = β + γδ
– β = direct effect of x on y
– γ = direct effect of z on y
– δ = effect of x on z (the coefficient from a regression of z on x)
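A quick simulation illustrates the formula E(b) = β + γδ. This is a sketch using only the Python standard library, and all the parameter values are made up for the demonstration, not taken from the lawyers data.

```python
import random

random.seed(0)
N = 50_000
alpha, beta, gamma = 1.0, 0.5, 2.0   # true model: y = alpha + beta*x + gamma*z + u
delta = 0.8                          # z = delta*x + v, so z correlates with x

xs, ys = [], []
for _ in range(N):
    x = random.gauss(0, 1)
    z = delta * x + random.gauss(0, 1)
    y = alpha + beta * x + gamma * z + random.gauss(0, 1)
    xs.append(x)
    ys.append(y)

# OLS slope from regressing y on x alone (z omitted)
mx, my = sum(xs) / N, sum(ys) / N
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))

print(round(b, 2))   # close to beta + gamma*delta = 0.5 + 2.0*0.8 = 2.1, not 0.5
```

The estimated slope lands near β + γδ = 2.1 rather than the true direct effect β = 0.5: OLS attributes the effect of the omitted z to x.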

In Practice
OLS erroneously attributes the effect of the missing z to x
– This violates the GM assumption that E(u|x) = 0
From the formula, the bias goes away if
– γ = 0: the variable doesn't matter and should be omitted
– δ = 0: the missing variable is unrelated to the included variable(s)
In any project, ask:
– Are there missing variables that ought to be included (γ ≠ 0)?
– Could they be correlated with any included variables (δ ≠ 0)?
– What is the direction of the bias?

Lawyers Example
Suppose we had the simple model of lawyers' fees as before
A criticism of this model is that it doesn't take account of the work done by lawyers
– i.e. measures of the quantity and quality of work are omitted variables
– This invalidates the estimate of b
– This is how you could undermine the study

Is the criticism valid?
– These variables ought to be included, as they plausibly affect the fee, i.e. γ ≠ 0
– They could be correlated with the included award variable (δ ≠ 0): it is plausible that more work leads to a higher award, or that higher-award cases require more work
It turns out not to matter in our case because award and trial are uncorrelated
Not always the case: use IV

Dummy Variables
Record classifications
– Dichotomous: "yes/no", e.g. gender, trial, etc.
– Ordinal: e.g. level of education
OLS doesn't treat them differently
But we need to be careful about how the coefficients are interpreted
Illustrate with "trial" in the fee regression
– Trial = 1 if the case went to court
– Trial = 0 if the case settled before court

Our basic model is fee_i = β1 + β2 award_i + u_i
This can be interpreted as predicting fees based on awards, i.e. E[fee_i] = β1 + β2 E[award_i]
We suspect that the fee is systematically different if the case goes to trial:
fee_i = β1 + β2 award_i + β3 Trial_i + u_i

Now the prediction becomes:
E[fee_i] = β1 + β2 E[award_i] + β3 if trial
E[fee_i] = β1 + β2 E[award_i] if not
Note that "trial" disappears when it is zero
This translates into separate intercepts on the graph
β3 is the extra € for bringing a case to trial
Testing whether β3 is significant is a test of a significant difference in fees between the two groups
For the price discrimination story: award is still significant
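The intercept-shift prediction rule can be written out directly. The coefficient values b1, b2, b3 below are hypothetical illustrations, not the slide's estimates:

```python
# Prediction rule with a dummy intercept shift: the trial dummy adds
# b3 to the intercept and leaves the slope on award unchanged.
def predict_fee(award, trial, b1=1000.0, b2=0.15, b3=500.0):
    # E[fee] = b1 + b2*award + b3 when trial == 1, else b1 + b2*award
    return b1 + b2 * award + b3 * trial

print(predict_fee(10_000, trial=0))  # 2500.0
print(predict_fee(10_000, trial=1))  # 3000.0  (same slope, higher intercept)
```

The gap between the two predictions is b3 at every award level, which is exactly the "separate intercepts" picture on the graph.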

regress ins_allow award trial
[Stata output: coefficient table for award, trial and _cons, with F test, R-squared and confidence intervals; the numbers were not preserved in the transcript]

While the intercept could differ between the two groups, the slope could too, i.e. the degree of price discrimination could be different between the two groups
Model this with an "interaction term":
fee_i = β1 + β2 award_i + β3 Trial_i + β4 award_i·Trial_i + u_i

Now the prediction becomes:
E[fee_i] = β1 + (β2 + β4) E[award_i] + β3 if trial
E[fee_i] = β1 + β2 E[award_i] if not
Note that "trial" disappears when it is zero
This translates into separate intercepts and slopes on the graph
– the extra € for bringing a case to trial, and an extra %
Testing whether β4 is significant is a test of a significant difference in the percentage fee between the two groups
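Extending the same sketch with an interaction term shifts the slope as well as the intercept. Again, the coefficient values are hypothetical illustrations:

```python
# Interaction term: both the intercept (b3) and the slope on award (b4)
# shift when trial == 1.
def predict_fee(award, trial, b1=1000.0, b2=0.15, b3=500.0, b4=0.05):
    # E[fee] = b1 + (b2 + b4*trial)*award + b3*trial
    return b1 + b2 * award + b3 * trial + b4 * award * trial

print(predict_fee(10_000, trial=0))  # 2500.0 (slope 0.15)
print(predict_fee(10_000, trial=1))  # 3500.0 (slope 0.20 and higher intercept)
```

Trial cases now pay an extra percentage of the award (b4) on top of the fixed extra charge (b3), matching the "separate intercepts and slopes" picture.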

gen interact = trial*award
regress ins_allow award trial interact
[Stata output: coefficient table for award, trial, interact and _cons, with F test, R-squared and confidence intervals; the numbers were not preserved in the transcript]

Multiple Hypotheses
It is a little weird that the interact and trial variables are individually insignificant
It is possible that they are jointly significant
Formally: H0: β3 = 0 and β4 = 0  H1: β3 ≠ 0 and/or β4 ≠ 0
This is not the same as two t-tests in sequence
Use an F-test of a "linear restriction"
It turns out the t-test is a special case

Procedure
1. Estimate the model assuming the null is true, i.e. impose the restriction. Record the R² for the restricted model: R²_r
2. Estimate the unrestricted model, i.e. assuming the null is false. Record the R² for the unrestricted model: R²_u

3. Form the test statistic:
F = [(R²_u − R²_r)/r] / [(1 − R²_u)/(N − K_u)]
r = number of restrictions (count the equals signs in H0)
N = number of observations
K_u = number of variables (and the constant) in the unrestricted model
4. Compare with the critical value from the F tables: F(r, N − K_u)
If the test statistic is greater than the critical value, reject H0
F(2, 87) = 3.15 at the 5% significance level
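The F statistic in step 3 is a one-line computation once the two R² values are recorded. The R² values below are hypothetical (the slide's numbers were not preserved in the transcript); r, N and K_u match the interaction model: 2 restrictions, 91 observations, 4 estimated parameters.

```python
# F test of the joint restriction beta3 = beta4 = 0.
def f_stat(r2_u, r2_r, r, n, k_u):
    # F = ((R2u - R2r)/r) / ((1 - R2u)/(N - Ku))
    return ((r2_u - r2_r) / r) / ((1 - r2_u) / (n - k_u))

F = f_stat(r2_u=0.62, r2_r=0.60, r=2, n=91, k_u=4)
print(round(F, 2))   # ~2.29 with these made-up R2 values
print(F > 3.15)      # compare with the 5% critical value F(2, 87)
```

With these illustrative R² values the statistic falls short of the critical value, so the joint null would not be rejected; with the real slide numbers the comparison is made the same way.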

Comments/Intuition
Imposing a restriction must make the model explain less of the dependent variable
If it explains "a lot" less, we reject the restriction as being unrealistic
How much is "a lot"?
– Compare the two R² (not the "adjusted R²")
– Scale the difference
– Compare to a threshold value
The critical value is a function of 3 parameters: df1, df2, and the significance level
Note that the test doesn't say anything about the component hypotheses
Could do t-tests this way: the t-test is a special case
Stata automatically reports the F-test of H0: β2 = 0, …, βk = 0

Conclusions
We had four learning objectives
1. Model, i.e. "story" or question
2. Multiple regression review
3. Omitted variables (the first failure of GM)
4. Dummy variables
What's next?
– More examples
– More problems for OLS