Improved Register Data Matching and its Impact on Survey Population Estimates Steve Vale Office for National Statistics, UK.

Presentation transcript:

Improved Register Data Matching and its Impact on Survey Population Estimates Steve Vale Office for National Statistics, UK

Contents
- Background
- Current matching systems
- Enhancements
- Impact on survey populations

Background
- No common business identifier in UK
- Data from different sources matched using name, address and postcode
- Software based around SSAName3
- Limited clerical input for “possible match” category (>10 employment)
- Quality marker (“inquiry stop”) used to indicate probability of duplication and to exclude some enterprises from survey populations

Inquiry Stop 6 Units - Time series
[Chart: number of Inquiry Stop 6 units by month, starting June 2002; vertical axis from 100,000 to 200,000 units]

The Project
- Aim to improve the quality of automatic matching
- Reduce the number of units on the register that are not included in survey populations
- Improve certainty about probability of duplication
- Part funded by Eurostat

Matching Process 1
- Name is standardised to form a name key
- Name keys are checked against existing records at decreasing levels of accuracy until possible matches are found
- The names, addresses and postcodes of possible matches are compared, and a score out of 100 is calculated
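The three steps on this slide can be illustrated with a minimal Python sketch. It is not the SSAName3 algorithm, which the slides do not describe: the noise-word list, the three key levels, the field weights and the use of `difflib` string similarity are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical noise words; the real standardisation rules are not in the slides.
NOISE_WORDS = {"LTD", "LIMITED", "PLC", "AND", "THE", "CO", "COMPANY"}

# Hypothetical field weights, chosen only so that the maximum score is 100.
WEIGHTS = {"name": 50, "address": 30, "postcode": 20}


def name_key(name: str) -> str:
    """Standardise a business name: upper case, no punctuation, no noise words."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in NOISE_WORDS)


def keys_by_level(name: str) -> list:
    """Return keys ordered from the most exact to the least exact."""
    tokens = name_key(name).split()
    return [
        " ".join(tokens),              # level 1: full standardised name
        " ".join(sorted(tokens)),      # level 2: word order ignored
        tokens[0] if tokens else "",   # level 3: first word only
    ]


def find_candidates(name: str, register: list) -> list:
    """Try each key level in turn and stop at the first level that finds records."""
    for level, wanted in enumerate(keys_by_level(name)):
        if not wanted:
            continue  # an empty key would match nothing meaningful
        hits = [r for r in register if keys_by_level(r["name"])[level] == wanted]
        if hits:
            return hits
    return []


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def match_score(new: dict, candidate: dict) -> float:
    """Weighted comparison of name, address and postcode, out of 100."""
    return sum(w * similarity(new[f], candidate[f]) for f, w in WEIGHTS.items())
```

A new record would first be passed to `find_candidates`, and each candidate returned would then be scored with `match_score`.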

Matching Process 2
- If the score is >79 it is considered to be a definite match
- If the score is between 60 and 79 it is considered a possible match, and is reported for clerical checking
- If the score is <60 it is considered a non-match
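The score bands above translate directly into a small classification function. This is a sketch only: the function and constant names are illustrative, and the slide does not say how a fractional score between 79 and 80 would be handled, so the code simply treats anything above 79 as definite.

```python
DEFINITE_ABOVE = 79   # scores above this are definite matches
POSSIBLE_FROM = 60    # scores from this value up to 79 are possible matches


def classify(score: float) -> str:
    """Map a match score out of 100 to one of the three categories."""
    if score > DEFINITE_ABOVE:
        return "definite match"
    if score >= POSSIBLE_FROM:
        return "possible match"
    return "non-match"
```

For example, `classify(79)` falls in the possible-match band, while `classify(80)` is a definite match.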

Matching Process 3
- Possible matches are checked clerically and linked where appropriate using an on-line system
- Non-matches with >9 employment are checked - if no link is found they are sent a Business Register Survey form
- Samples of definite matches and smaller non-matches are checked periodically
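The follow-up rules on this slide can be summarised as a routing function. The category labels and return strings below are hypothetical, chosen only to mirror the wording of the slide.

```python
def follow_up(category: str, employment: int) -> str:
    """Return the follow-up action for a matching outcome.

    category is one of "definite", "possible" or "non-match" (illustrative labels).
    """
    if category == "possible":
        return "clerical check"
    if category == "non-match":
        if employment > 9:
            # Larger non-matches are checked individually.
            return "clerical check, then Business Register Survey form if no link is found"
        return "periodic sample check"
    if category == "definite":
        return "periodic sample check"
    raise ValueError(f"unknown category: {category!r}")
```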

Improvements 1
- Re-matching using cleaned addresses
- Gains from timing
- Gains from cleaning and standardising addresses
- Needs extra storage space on the register for cleaned addresses (approx. 3Gb)
- Address cleaning tool used: Matchcode5 by Capscan

Improvements 2
- Enhancing name keys
  - Standardised creation
  - Inclusion of part of postcode
- Better treatment of compound names, e.g. John Smith trading as Smiths Bakery
- More use of data on company registrations to assist matching of corporate units
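Two of these enhancements can be sketched in Python: splitting a compound name into its component names, and adding part of the postcode to the name key. The slides do not say which part of the postcode is used or how compound names are detected, so the outward code, the "trading as" / "t/a" patterns and all function names here are assumptions.

```python
import re

# Hypothetical separators for compound names.
TRADING_AS = re.compile(r"\s+(?:trading\s+as|t/a)\s+", re.IGNORECASE)


def split_compound_name(name: str) -> list:
    """Split 'X trading as Y' into its component names."""
    return [part.strip() for part in TRADING_AS.split(name) if part.strip()]


def outward_code(postcode: str) -> str:
    """Return the outward part of a UK postcode (everything before the
    three-character inward code)."""
    compact = postcode.replace(" ", "").upper()
    return compact[:-3] if len(compact) > 3 else compact


def compact_name(name: str) -> str:
    """Upper-case the name and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def name_keys_with_postcode(name: str, postcode: str) -> list:
    """Build one key per component name, each qualified by the outward code."""
    area = outward_code(postcode)
    return [f"{compact_name(part)}|{area}" for part in split_compound_name(name)]
```

With the slide's example and an arbitrary postcode, `name_keys_with_postcode("John Smith trading as Smiths Bakery", "NP10 8XG")` yields one key for the person and one for the trading name, so a record from another source can match on either.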

Results 1
- Approximately 30% of units outside survey populations will match to units already in those populations
- Less than 5% of the remainder are duplicates of units in the survey populations
- Some units in survey populations found to be duplicates (1%?)

Results 2
- Overall impact:
  - 6% more units in survey populations
  - Maximum of 1.4% increase in employment
- Timing of change is an issue
- The risk of duplication will be less than the risk of under-coverage

Conclusions
- Matching rates will be improved by regular re-matching using cleaned addresses.
- Initial matching by name can be improved if part of the postcode is included.
- Improvements to matching increase the certainty that the remaining unmatched units are genuinely single source.
- Desk profiling and clerical matching can reduce duplication still further if targeted at high risk units.

Further information
www.statistics.gov.uk/idbr
steve.vale@ons.gov.uk
Any Questions?