1 Business Register: Quality Practices Eddie Salyers
2 An Assessment of Current Quality Assurance Practices and Ongoing Work to Develop a Comprehensive Quality Plan for the U.S. Census Bureau Business Register
3 Business Register: Quality Practices Introduction –Database Redesign –Quality Assurance Team Business Register Overview Quality Assurance –Migration –Administrative Records –Census Bureau Data Collections –Recommendations Conclusion
4 BR Database Redesign –Complete redesign –Old: Standard Statistical Establishment List (SSEL), VAX RDB –New: Business Register (BR), Oracle –All software rewritten –New BR production: Fall 2002
5 Quality Assurance Team Mission: Assure that the quality of the new BR is, at a minimum, commensurate with that of the old SSEL which it replaces, and establish a complete quality framework.
6 Quality Assurance Team Definitions: Quality – “The totality of features and characteristics of a product or service that bear on its ability to satisfy specified or implied needs.” (ISO, 1986). Reliability – “The ability of a system or component to perform its required functions under stated conditions for a specified period of time.” [IEEE 90]. Integrity – Information in the system follows designated standards and is consistent both within an individual table and between associated tables.
7 Business Register Overview Primary Functions –Economic Census enumeration list –Survey sampling frames –Central storage of administrative data –Control file for data collection/processing –Data for statistical products –Data for economic research
8 Key Concepts and Definitions The BR’s Units Business/Statistical –Establishment (standard statistical unit) –Enterprise (standard statistical unit) –Enterprise segment (variable unit; e.g., alternate reporting unit) Administrative (mainly for IRS tax reporting) –EIN unit –SSN unit
9 Business Organization Basic Types Single-establishment enterprise: –An enterprise that operates just one establishment (i.e., at one physical location); a single unit (SU) Multi-establishment enterprise: –An enterprise that operates two or more establishments (2-plus locations); a multiunit (MU)
10 Multiunit A more complex MU may have: Multiple EIN units One subsidiary enterprise or more
11 Complex Multiunits The largest U.S. multiunits may have: Several thousand EINs More than 10,000 establishments
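The unit concepts on the preceding slides can be sketched as a small data model. This is an illustrative sketch only: the class names, field names, and EIN values below are assumptions made up for the example and do not reflect the BR's actual table design.

```python
from dataclasses import dataclass, field


@dataclass
class Establishment:
    # One physical location; `ein` is the EIN used for payroll tax reporting.
    name: str
    ein: str


@dataclass
class Enterprise:
    name: str
    establishments: list = field(default_factory=list)

    def is_multiunit(self) -> bool:
        # Two or more establishments is a multiunit (MU); exactly one is a single unit (SU).
        return len(self.establishments) >= 2

    def ein_units(self) -> set:
        # A complex MU may report under several EINs.
        return {e.ein for e in self.establishments}


# Hypothetical enterprises with made-up EINs.
corner_store = Enterprise("Corner Store", [Establishment("Main St", "11-0000001")])
chain = Enterprise("Chain", [
    Establishment("North", "22-0000001"),
    Establishment("South", "22-0000002"),
])
```

Here `corner_store` is a single unit, while `chain` is a multiunit that reports under two EIN units.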
12 System Oracle Database Many Related Tables Interactive Web-Based Interface built with Oracle Forms & PL/SQL Interface used for research and updates Software for interactive and batch updates and edits
13 Migration Complete Redesign –New IDs –New Table Structures –All New Software –Copy Existing data –Load “new” data
14 Migration Quality Checks –Create SAS Datasets from Old SSEL and New BR for 2001 Records –Record-to-Record Match of 2001 SSEL and 2001 BR: after accounting for differences caused by design, no significant differences were found –Comparison of 2001 BR to 2002 BR: checks both the migration and the software used to load 2002 records; year-to-year changes as expected
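A record-to-record match of this kind can be sketched as follows. The slides describe the check as done with SAS datasets; this Python version is only an illustration of the idea. It assumes a common key is available on both sides (for example through a crosswalk, since the redesign introduced new IDs), and the function name, field names, and records are all hypothetical.

```python
def match_records(old, new, key, ignore=()):
    """Compare two lists of dict records keyed on `key`.

    Returns (only_in_old, only_in_new, differing), where `differing` maps a
    key to the set of field names whose values disagree. Fields listed in
    `ignore` are skipped, to account for differences caused by the redesign.
    """
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}
    only_in_old = sorted(old_by_key.keys() - new_by_key.keys())
    only_in_new = sorted(new_by_key.keys() - old_by_key.keys())
    differing = {}
    for k in old_by_key.keys() & new_by_key.keys():
        o, n = old_by_key[k], new_by_key[k]
        fields = (set(o) | set(n)) - set(ignore) - {key}
        bad = {f for f in fields if o.get(f) != n.get(f)}
        if bad:
            differing[k] = bad
    return only_in_old, only_in_new, differing


# Made-up records for illustration only.
ssel_2001 = [
    {"id": "A1", "state": "MD", "employment": 12, "old_format_code": "X"},
    {"id": "A2", "state": "VA", "employment": 40, "old_format_code": "Y"},
    {"id": "A3", "state": "PA", "employment": 7, "old_format_code": "Z"},
]
br_2001 = [
    {"id": "A1", "state": "MD", "employment": 12},
    {"id": "A2", "state": "VA", "employment": 41},
]
only_old, only_new, differing = match_records(
    ssel_2001, br_2001, key="id", ignore=("old_format_code",)
)
```

In this toy data, record `A3` is missing from the new file and record `A2` disagrees on `employment`; the `old_format_code` field is ignored as a by-design difference.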
15 Administrative Records Internal Revenue Service: Business Master File (BMF) Payroll tax returns Business income tax returns Bureau of Labor Statistics (BLS): –Description: Industrial classification assigned by State Employment Security Agencies as part of Covered Employment and Wages Social Security Administration –Applications for new Employer Identification Number (EIN)
16 Administrative Records Over 100 Million administrative records are received each year.
17 Administrative Records Quality Assurance Current Practices: Stage 1: –Tabulate distributions of variables on incoming files and compare to expected values. –Unchanged with redesign; works on inputs Stage 2: –Basic Validity Test: Edits to assure each item has a valid form (valid states, data type, etc.) –Ratio Edits: Examine consistency of correlated data, e.g., payroll per employee –Data failing edits are replaced with imputed values and referred to an analyst for review –Done as part of load to BR database –Process is similar to old, but all software rewritten for new BR
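The Stage 2 edits can be illustrated with a minimal sketch. The state list, ratio bounds, and imputation rule below are placeholder assumptions chosen for the example, not the BR's actual edit parameters, and the function and field names are hypothetical.

```python
VALID_STATES = {"MD", "VA", "PA"}  # truncated list, for illustration only


def edit_record(rec, ratio_low, ratio_high, impute_ratio):
    """Apply a validity edit and a payroll-per-employee ratio edit.

    Returns a list of failed edit names. A record failing the ratio edit has
    its payroll replaced by an imputed value; any failure flags the record
    for analyst review.
    """
    failures = []
    if rec.get("state") not in VALID_STATES:
        failures.append("invalid_state")
    emp, pay = rec.get("employment"), rec.get("payroll")
    if not isinstance(emp, int) or emp < 0:
        failures.append("invalid_employment")
    elif emp > 0 and isinstance(pay, (int, float)):
        ratio = pay / emp
        if not ratio_low <= ratio <= ratio_high:
            failures.append("payroll_per_employee")
            # Assumed imputation rule: employment times a typical ratio.
            rec["payroll"] = emp * impute_ratio
    rec["needs_review"] = bool(failures)
    return failures


# Made-up record: payroll of 5,000,000 for 10 employees is implausibly high.
rec = {"state": "MD", "employment": 10, "payroll": 5_000_000}
failures = edit_record(rec, ratio_low=5_000, ratio_high=200_000, impute_ratio=40_000)
```

The record fails only the ratio edit, so its payroll is replaced with the imputed value and `needs_review` is set for the analyst referral.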
18 Administrative Records Quality Assurance Current Practices: –Strengths: Identifies systematic file errors well –Weaknesses: Lack of macro-level post-processing quality assurance; communication; identifying significant problems with large cases
19 Administrative Records Quality Assurance Recommendations –Perform a routine macro-level review using the SAS datasets that are created monthly from the BR –Creation of a Centralized Administrative Record Tracking System –Standardization and Automation of all Current QA Reports –Increase Ability to Identify Important Companies with Missing or Inaccurate Administrative Records –Development of Systematic Review of Post-Processing Administrative Record QA –Monitor Cost of Current Administrative Record Quality Assurance Activities
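One way a routine macro-level review could work is to compare aggregate totals between consecutive monthly extracts and flag large movements. The sketch below is an assumption about what such a review might look like, not a description of an existing BR procedure; the function names, grouping by state, tolerance, and data are all hypothetical.

```python
from collections import defaultdict


def totals_by_group(records, group_field, value_field):
    # Sum `value_field` within each value of `group_field`.
    totals = defaultdict(float)
    for r in records:
        totals[r[group_field]] += r[value_field]
    return dict(totals)


def flag_macro_changes(prev, curr, tolerance):
    """Return groups whose total changed by more than `tolerance` (a fraction)
    between two extracts. Groups present in only one extract, or moving away
    from a zero total, are flagged with None because no relative change exists."""
    flagged = {}
    for g in sorted(set(prev) | set(curr)):
        p, c = prev.get(g), curr.get(g)
        if p is None or c is None:
            flagged[g] = None
        elif p == 0:
            if c != 0:
                flagged[g] = None
        elif abs((c - p) / p) > tolerance:
            flagged[g] = (c - p) / p
    return flagged


# Made-up monthly extracts.
march = [
    {"state": "MD", "payroll": 100.0},
    {"state": "MD", "payroll": 50.0},
    {"state": "VA", "payroll": 200.0},
]
april = [
    {"state": "MD", "payroll": 155.0},
    {"state": "VA", "payroll": 300.0},
]
flagged = flag_macro_changes(
    totals_by_group(march, "state", "payroll"),
    totals_by_group(april, "state", "payroll"),
    tolerance=0.10,
)
```

With a 10% tolerance, only the VA total (up 50%) is flagged; the small MD movement passes.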
20 Census Bureau Data Collections Company Organization Survey Description: Register-proving survey directed to selected multiunit enterprises Content –Ownership or control by a United States parent –Ownership or control by a foreign parent –Inventory of establishments, verifying or collecting the following for each: primary and secondary name; physical location; EIN used for payroll tax reporting; SIC; employment for pay period including March 12; first quarter and annual payroll; year-end operating status
21 Census Bureau Data Collections Economic Census Description: Enumeration of establishments in covered industries Content for each establishment: –Ownership or control by a parent enterprise –Locations of operation –Primary and secondary name –Physical location address –EIN used for payroll tax reporting –SIC and Type of Operation –Employment for pay period including March 12 –First quarter and annual payroll –Dollar volume of business (value of shipments, sales, receipts, revenue) –Year-end operating status –Value of products and services by category (selectively) –Other industry-specific content
22 Census Bureau Data Collections Quality Assurance Current Practices: Data Entry –Independent verification of samples –Data are re-keyed and differences adjudicated –Lots accepted or rejected based on error rates Batch Update Operations –Basic Validity Test: Edits to assure each item has a valid form (valid states, data type, etc.) –Ratio Edits: Examine consistency of correlated data, e.g., payroll per employee –Data failing edits are replaced with imputed values and referred to an analyst for review –Done as part of load to BR database –Process is similar to old, but all software rewritten for new BR
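The data-entry check (independent re-keying of a sample, then accepting or rejecting the lot on its error rate) can be sketched as below. The sample size, acceptance threshold, and function name are assumptions for illustration; the slides do not state the actual sampling plan.

```python
import random


def verify_lot(keyed, rekeyed, sample_size, max_error_rate, seed=0):
    """Sample record positions from a lot, compare the original keying with an
    independent re-keying, and accept the lot if the sampled error rate is at
    or below `max_error_rate`. Returns (accepted, error_rate, mismatches);
    the mismatched positions are the ones that would go to adjudication."""
    rng = random.Random(seed)
    n = min(sample_size, len(keyed))
    sample = sorted(rng.sample(range(len(keyed)), n))
    mismatches = [i for i in sample if keyed[i] != rekeyed[i]]
    error_rate = len(mismatches) / n if n else 0.0
    return error_rate <= max_error_rate, error_rate, mismatches


# Made-up lot; the whole lot is sampled here so the outcome does not depend on the seed.
keyed = ["a", "b", "c", "d"]
rekeyed = ["a", "x", "c", "d"]
accepted, error_rate, mismatches = verify_lot(
    keyed, rekeyed, sample_size=4, max_error_rate=0.10
)
```

One of the four sampled records disagrees, so the sampled error rate is 25% and the lot is rejected under the assumed 10% threshold.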
23 Census Bureau Data Collections Quality Assurance Current Practices: Clerical Operations –A second person who is qualified as a verifier selects and inspects a sample of the referrals from each completed work unit (dependent verification) –Rejected work units are subjected to 100% re-inspection –Note: the “old” SSEL had functionality to hold corrections until they passed inspection
24 Additional QA Team Recommendations –Improve Error Tracking –Improve Imputation for Missing Employment and Payroll Values –Evaluate ORACLE DQI (Data Quality Inspector) as a way to identify problems –Expand use of SAS datasets built from the BR to assess quality –Review and document user needs and how the BR meets those needs –Comparison to Bureau of Labor Statistics (BLS) Business Establishment List (BEL)
25 Conclusion No identifiable difference in quality between the new BR and the old SSEL; most procedures remain the same; migration completed accurately. Concerns –Clerical processing –Dependence on staff expertise Several areas for potential improvement