Constructing Confidence Intervals based on Register Statistics
Thomas Laitila
Statistics Sweden and Örebro University
Presentation at Q2014, Vienna, June 2-5, 2014
Outline
- Why - Why confidence intervals?
- Criteria - Criteria on measures of uncertainty of register statistics
- CIm - Confidence Image, a new tool
- Example
- Discussion
Why - Chatterjee (2003)
- There are two methods for deriving statements: deduction and induction
- Statistics provides a method for inductive inference
Why - Induction
[Diagram with the labels: Assumptions, Evidence (Statistics), Area of concern, Statement]
Why - Induction and Evidence
- All evidence comes with uncertainty about the general case
- Statements derived by induction are uncertain
- Example:
  - Inductive statement: A man will inevitably die
  - Evidence: No man born more than 150 years ago is still alive
Why - Why is statistical inference so special?
- Statistics is the only theory providing objective measures of the uncertainty of inductive inference
- Objective measures of uncertainty are essential in official statistics
Why - Summing up
- Register statistics are uncertain
- Statistical inference provides objective measures of uncertainty
- Inference on register statistics should be founded in statistical inference theory
- Do we have appropriate statistical tools?
  - Yes, and no
- No tool fulfills reasonable criteria
Criteria - Criteria on a measure
a) Founded within statistical inference theory
   - Interpretable and objective measures
b) Easy to interpret by users
   - How easy is the interpretation of an ordinary confidence interval?
c) Of low cost
d) Comparable with measures in sample surveys
   - Comparability/coherency
Confidence Image (CIm) - Laitila (2014)
- Idea: Use external information to restrict the potential values of the study variables (y_1, y_2, …, y_N)
  - This restricts the potential values of the population parameter of interest, t = f(y_1, y_2, …, y_N)
  - The more information, the more t is restricted
- Information can come in any form, as long as it comes with a measure of uncertainty
- We can use registers, sample surveys, old statistics, big data, Google, Facebook, whatever!
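The idea of restricting t by restricting the y_k can be illustrated with a minimal sketch for the special case where t is a total. It is not taken from Laitila (2014): the function name `total_bounds`, the per-unit bounds, and the toy numbers are all assumptions made for illustration. It only shows that tighter information on the unobserved y_k gives a narrower range for t.

```python
def total_bounds(observed_values, n_missing, unit_lower, unit_upper):
    """Bounds on the total t = sum of y_k when every missing y_k is
    known to lie in [unit_lower, unit_upper] (hypothetical helper)."""
    observed_sum = sum(observed_values)
    lower = observed_sum + n_missing * unit_lower
    upper = observed_sum + n_missing * unit_upper
    return lower, upper


# Hypothetical toy register: five observed values and three missing ones.
observed = [0, 12, 0, 40, 7]

# Weak information: each missing value is at most 500.
wide = total_bounds(observed, n_missing=3, unit_lower=0, unit_upper=500)
# Stronger information: each missing value is at most 50.
narrow = total_bounds(observed, n_missing=3, unit_lower=0, unit_upper=50)

print(wide)    # (59, 1559)
print(narrow)  # (59, 209)
```

If the per-unit bounds hold with certainty, the resulting range holds at the 100% level; bounds that come with a probability statement would carry that level over to the range.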
Example - Estimation of total number of cattle in Swedish farms
Table 1: Information 1) in available register on farms (N = 72030)
Columns: County | No. of units | No. of missing values | Sum of y_k (with a Total row; the table values are not preserved in this text)
1) No measurement or coverage errors in the register.
Problem: Estimate the total number of cattle with an interval estimate using the information in the register, which contains missing values.
Example - Pieces of information
- A1: Available data in the register
- A2: The 100 largest farms are in the register, and the number of cattle for the 100th largest farm is 553
- A3: Number of farms with ≥ 100 cattle (Table 2)
- A4: A 95% CI for the proportion of farms with zero cattle: 0.6 – 0.71
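How pieces such as A1, A2 and A4 might be combined can be sketched as follows. This is an illustrative reading, not the calculation in Laitila (2014). It assumes that A2 implies every missing value is at most 553, that a farm with cattle has at least 1, and that the register covers the whole population. A3 is left out, and all counts in the usage lines are hypothetical (the slide's table values are not available).

```python
import math
from fractions import Fraction


def bounds_a1_a2(observed_sum, n_missing, cap):
    """A1 + A2: every missing y_k is assumed to lie in [0, cap]."""
    return observed_sum, observed_sum + n_missing * cap


def bounds_a1_a2_a4(observed_sum, n_missing, n_zero_observed,
                    population_size, cap, share_low, share_high):
    """Add A4: the share of zero-cattle farms lies in [share_low, share_high]."""
    # Range for the number of zero-cattle farms in the whole population
    # (Fraction keeps the arithmetic exact for decimal strings like "0.60").
    min_zeros = math.ceil(Fraction(share_low) * population_size)
    max_zeros = math.floor(Fraction(share_high) * population_size)
    # Implied range for the number of zeros among the missing values.
    min_zeros_missing = min(max(min_zeros - n_zero_observed, 0), n_missing)
    max_zeros_missing = max(min(max_zeros - n_zero_observed, n_missing), 0)
    # Non-zero missing units hold at least 1 and at most `cap` cattle.
    lower = observed_sum + (n_missing - max_zeros_missing) * 1
    upper = observed_sum + (n_missing - min_zeros_missing) * cap
    return lower, upper


# Hypothetical register: 1000 farms, 200 missing values,
# 550 observed zeros, observed sum of 20000 cattle.
print(bounds_a1_a2(20000, 200, 553))
# (20000, 130600)
print(bounds_a1_a2_a4(20000, 200, 550, 1000, 553, "0.60", "0.71"))
# (20040, 102950)
```

Under these assumptions the first range holds with certainty, while the second inherits the 95% level of the interval in A4, which matches the pattern of confidence levels reported for the information sets in the example.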
Example - Table 2
Table 2: Additional information (No. of units)
Column groups: In register, In population
Columns: County | y_k=0 | y_k>=553 | y_k>= (with a Total row; the table values are not preserved in this text)
Example - Calculated CIms
Table 3: Confidence intervals for the total number of cattle based on information sets A1 - A4. (Thousands of cattle; true value 1.56 million)
Information used | Confidence level | Lower bound | Upper bound
A1 - A2 | 100% | |
A1 - A3 | 100% | |
A1 - A4 | 95% | |
(The lower and upper bounds are not preserved in this text.)
Discussion
- The CIm can be generalized directly to multivariate cases
- The CIm fulfills all four criteria listed above
- Traditional confidence intervals are special cases of CIms
- Any kind of information (data) can be used, as long as there is a probability measure of its certainty
- The CIm is a theory; methodological development is still needed
Thanks for your attention!
The paper Laitila (2014) is available on request.