A Flavour of Errors in Variables Modelling Jonathan Gillard
Constructing the Model We have two variables, $\xi$ and $\eta$, which are linearly related in the form $\eta = \alpha + \beta\xi$. Instead of observing the $n$ pairs $(\xi_i, \eta_i)$, we observe the $n$ data pairs $(x_i, y_i)$, where $x_i = \xi_i + \delta_i$ and $y_i = \eta_i + \varepsilon_i$, and it is assumed that $\delta_i$ and $\varepsilon_i$ are independent error terms having zero mean and variances $\sigma_\delta^2$ and $\sigma_\varepsilon^2$ respectively.
Down's Syndrome Affects 1 in 1000 children born in the UK. Down's syndrome is caused by the presence of an extra chromosome. An extra copy of chromosome 21 is included when the sperm and the egg combine to form the embryo. Screening tests are used to calculate the chance of a baby having the condition.
The Data Set
How can we fit a line? There are clearly errors in both variables. To use standard statistical techniques of estimation to estimate β, one needs additional information about the variance of the estimators – Madansky (1959). We know the dating error is ±2 days – this is enough information!
Method of Moments The method of moments has a long history, involves an enormous amount of literature, has been through periods of severe turmoil associated with its sampling properties compared to other estimation procedures, yet survives as an effective tool, easily implemented and of wide generality – Bowman and Shenton
Method of Moments The maximum likelihood approach to estimation is primarily justified by asymptotic (as the sample size goes to infinity) considerations – Cheng and Van Ness
Estimating the Parameters As the dating error is ±2 days, $\sigma_\delta = 2$. Use a modified $y$ on $x$ regression estimator: $\hat{\beta} = s_{xy} / (s_{xx} - \sigma_\delta^2)$. Other parameters, such as the intercept $\alpha$, can be estimated from the method of moments equations.
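The estimator on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's own code: the function name `mom_fit_known_error_variance` is hypothetical, the sample moments use divisor n (n - 1 is also common), and the intercept is taken from the first-moment equation, mean(y) = α + β mean(x).

```python
def mom_fit_known_error_variance(x, y, var_delta):
    """Method of moments fit of y = alpha + beta * xi when Var(delta) is known.

    Hypothetical helper for illustration; sample moments use divisor n.
    Returns the pair (alpha, beta).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sample variance of x and sample covariance of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

    # Subtract the known error variance from s_xx before dividing.
    denom = s_xx - var_delta
    if denom <= 0:
        raise ValueError("s_xx - var_delta must be positive")

    beta = s_xy / denom
    alpha = y_bar - beta * x_bar  # from the first-moment equation
    return alpha, beta
```

With a dating error of standard deviation 2 days, the call would pass `var_delta=4.0`.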
Typology of Residuals What are residuals used for? 1. Prediction 2. Model checking 3. Leverage 4. Influence 5. Deletion
Estimating the true points Two naive method of moments estimators (m.m.e.s) of ξ: The optimal linear combination is:
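As a rough illustration of combining two estimators of a true point ξ_i, the sketch below uses one standard construction, which may differ from the presenter's. It assumes the two naive estimators are x_i and (y_i - α)/β, with error variances σ_δ² and σ_ε²/β², and weights them to minimise the variance of the combination. The function name and the weight symbol `lam` are hypothetical.

```python
def combine_true_point_estimates(x_i, y_i, alpha, beta, var_delta, var_eps):
    """Minimum-variance linear combination of two naive estimates of xi_i.

    Assumed naive estimates (an illustration, not taken from the slides):
      from x: x_i                   (error variance var_delta)
      from y: (y_i - alpha) / beta  (error variance var_eps / beta**2)
    """
    if beta == 0:
        raise ValueError("beta must be non-zero to invert the line")
    total = var_eps + beta ** 2 * var_delta
    if total <= 0:
        raise ValueError("at least one error variance must be positive")

    naive_from_x = x_i
    naive_from_y = (y_i - alpha) / beta

    # Weight on the x-based estimate: larger when y is the noisier variable.
    lam = var_eps / total
    return lam * naive_from_x + (1 - lam) * naive_from_y
```

When `var_delta` is zero the weight is 1 and the function returns `x_i` unchanged, which matches the intuition that an error-free x needs no correction.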
The Estimated True Points
Estimated true against observed
A residual? Attempt to write as a usual regression model: $y = \alpha + \beta x + (\varepsilon - \beta\delta)$ 1. $x$ is always random due to random error 2. $\mathrm{Cov}(x, \varepsilon - \beta\delta) = -\beta\sigma_\delta^2$ 3. Using ordinary least squares estimates leads to inconsistent estimators
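The inconsistency in point 3 can be seen in a small simulation. The sketch below is illustrative only: the parameter values, the normal distributions and the function names are all assumptions, not taken from the presentation. It generates data from the errors in variables model, then compares the ordinary least squares slope s_xy / s_xx with the corrected slope s_xy / (s_xx - σ_δ²).

```python
import random


def sample_moments(x, y):
    """Return (s_xx, s_xy) using divisor n."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    return s_xx, s_xy


def simulate_slopes(n=50000, alpha=1.0, beta=2.0,
                    sd_xi=3.0, sd_delta=2.0, sd_eps=1.0, seed=0):
    """Simulate x = xi + delta, y = alpha + beta*xi + eps; return two slopes.

    All defaults are hypothetical values chosen for illustration.
    Returns (ols_slope, corrected_slope).
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, sd_xi)                 # true, unobserved value
        x.append(xi + rng.gauss(0.0, sd_delta))    # observed x with error
        y.append(alpha + beta * xi + rng.gauss(0.0, sd_eps))  # observed y

    s_xx, s_xy = sample_moments(x, y)
    ols_slope = s_xy / s_xx                        # ignores the error in x
    corrected_slope = s_xy / (s_xx - sd_delta ** 2)  # uses known variance
    return ols_slope, corrected_slope


if __name__ == "__main__":
    ols, corrected = simulate_slopes()
    print("OLS slope:", ols)
    print("Corrected slope:", corrected)
```

Under these assumed values the ordinary least squares slope tends to β σ_ξ² / (σ_ξ² + σ_δ²) = 2 × 9 / 13, which is below the true β = 2, while the corrected slope should settle near 2 as n grows.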