Editing And Imputation For Manufacturing Statistics At Statistics Canada Marie Brodeur Director General, Industry Statistics Branch Santiago, Chile March.


1 Editing And Imputation For Manufacturing Statistics At Statistics Canada Marie Brodeur Director General, Industry Statistics Branch Santiago, Chile March 15 to 17, 2011

2 Outline Of The Presentation  Overview of the Manufacturing Program  Centralized Process  Surveys  Overview of the UES Survey Process  Post Collection Processing Inputs & Tools  Use of Tax Data  The many phases of UES Post Collection Process  Managing the UES Post Collection Process 2

3 Statistics Canada 3

4 Business and Trade Statistics Industry Statistics Economy-wide Statistics Agriculture, Technology and Transportation Statistics Manufacturing and Energy Distributive Trades Service Industries Enterprise Statistics Consumer Prices International Trade Producer Prices Investment and Capital Stock Enterprise Statistics Agriculture Small Business And Special Surveys Science, Innovation And Electronic Information Transportation 4

5 Manufacturing Distribution Of Sales 5

6  Establishments primarily engaged in the physical or chemical transformation of materials and substances into new products  Includes assembly of the component parts of manufactured goods, blending of materials, finishing of manufactured products by dyeing, heat treating, plating and similar operations  Transformation of own materials or those owned by others  Service outputs: custom work, repair and maintenance  Product outputs: finished goods, intermediate goods Who Are Manufacturers? 6

7  Monthly Survey of Manufacturing (MSM)  Annual Survey of Manufactures and Logging (ASML)  Series of sub-annual commodity surveys Manufacturing Program At Statistics Canada (STC) 7

8  Monthly indicator of manufacturing activity  Last Redesign in 1999  Designed to be a reliable indicator for both trends and levels  Establishment Survey (n= 10,500)  Stratified by Province, NAICS and Size General Characteristics Of The MSM 8

9  Sales Goods of own manufacture  Inventories Raw materials Goods-in-process Finished products  Orders New orders Unfilled orders  Goods purchased for resale (revenue and inventory) These data are collected but not released  Sales is the main concept, exceptionally production for some industries (aerospace and shipbuilding) MSM Concepts 9

10 Frame And Coverage (values shown as Simple / Complex)
Total number of establishments on the Business Register: 2,278,730 / 110,557
Value of sales of all establishments on the Business Register: $2,214.9 billion / $1,859.1 billion
Total number of manufacturing establishments on the Business Register: 84,215 / 6,648
Value of sales of manufacturing establishments on the Business Register: $340.8 billion / $234.5 billion

11 MSM Sampling Plan [Diagram: take-all, take-some and take-none strata; legend: tax replaced, survey]

12  Background The Goods and Services Tax (GST) is the federal Value Added Tax GST is collected by the Canada Revenue Agency (CRA) The CRA provides tax data to Statistics Canada  Information received includes the Business Number, revenue, tax remitted and input tax credit MSM Sampling Plan: Use Of Tax 12

13  Who is replaced? Single establishment enterprises  Replace 50% of sampled data with GST data Chronic refusals  Who is not replaced? Very large single enterprise establishments Complex units (i.e. multiple establishments) – as it is found in the GST database Use Of Tax Data 13

14  Measures the contribution of manufacturing industries to economic activity in Canada  In 2010, manufacturing accounted for 15% of GDP and 12% of total employment (SEPH)  Key input to SNA Input-Output tables  Survey collects data on what commodities are produced (Make matrix) where commodities are destined (provincial I/O tables) what commodities and primary inputs are used in production (Use matrix) What Is The Annual Survey Of Manufactures And Logging (ASML)? 14

15  ASML is conducted under the umbrella of Statistics Canada’s Unified Enterprise Survey Program (UES)  Same as MSM  Establishments primarily engaged in manufacturing and logging activities and classified to NAICS 31, 32 and 33 as well as NAICS 113  Estimates produced for 261 NAICS6 level industries  Estimates produced for the 10 provinces and 3 territories. Survey Coverage 15

16  Revenue variables (16), expense variables (43), detailed opening and closing inventories (12), other financial (5)  Sales or outputs variables are valued at producer or FOB factory gate prices required by SNA  Commodities consumed (inputs) and produced (outputs) both goods and services  Collect commodity values and quantities (for selected goods)  Services produced and consumed collected as expense items and classified based on COA Content: Commodity Variables 16

17 Types Of Administrative (Tax) Data From the Canada Revenue Agency (CRA) Agreement between two agencies T1 (unincorporated businesses) T2 (incorporated businesses) T4 (pay slips) GST (Goods and Services Tax) PD7 (payroll deduction accounts) 17

18 Editing And Imputation For Manufacturing Surveys

19 Why A Centralized Process?  Best Practices  Standardization of Processes Cross Survey Comparisons Enterprise Centric Processing/Coherence Analysis  Efficient use of Resources  Transportable Knowledge Across Survey Programs 19

20 Challenges Of A Centralized Process  Remain Centralized  Distribute processing  Priority Setting  Communication and Coordination 20

21 Pre-Grooming Allocation / Estimation Edit & Imputation “Clean” Records Central Data Store Subject Matter Review & Correction Tool Tax Data USTART UES Post-Collection Processing 21

22 Collection  Collection Period: February to early October  Collection Processing System: Blaise Blaise can be seen as being a Collection Control Center Blaise has many functions:  Call Scheduler  Transaction history files  Audit Trail Files  And more 22

23 Blaise: Variables  Questionnaire number  Mail-out date  Number of calls  Length of the call  Number of contact attempts  Response code  And more 23

24 Blaise: Bonuses Over The Years  Blaise Transaction History (BTH) Files Collection data analysis:  Produced a paper on best time to call  Produced a paper on maximum # of attempts  Audit Trail Files Find outliers Difficult to answer questions 24

25 Collection  Precontact (Dec-Jan) –Mostly for Business Register (BR) births; verification of contact information (name, address, …) –By phone (in a few cases, a letter or a fact sheet is sent)  Mail-out of questionnaires (Jan-March) –2 or 3 mail-out dates  Follow-up in case of non-response for some units (begins about a month after mail-out) –Phone call, letter or fax  Mail-back of questionnaires  Verifications of received questionnaires / Edits –Is the questionnaire complete or are some key variables missing? (Edit follow-up by phone in some cases) 25

26 Collection  Coding of questionnaires (about 20 response codes) Response, non-response, out-of-scope, …  Imaging / Data capture (CADI - Computer Assisted Data Input) 26

27 Centralized Collection Mailout (38K CEs) Pre-Contact (17K Businesses) Edit / Verification (BLAISE) Receipt (75% target) Delinquent Follow-Up Capture / Imaging “Clean” Records Score Function 27

28 UES: Data Collection / Score Function  Introduced in 2002, the UES score function is the main tool used at the collection stage to determine which priority to give for the follow-up of about 23,000 Collection Entities (CE) each year.  Reduces collection costs yet retains data quality  Similar to the collection goal of obtaining a high weighted coverage response rate.  PRIORITY 1: Extensive follow-up for the larger revenue CEs in cases of non-response.  PRIORITY 0: Minimum follow-up for the smaller CEs in cases of non-response. 28
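The slide does not give the score function's formula, so the following is only a minimal sketch of the idea of splitting Collection Entities into PRIORITY 1 and PRIORITY 0. It assumes a hypothetical rule: rank CEs by weighted revenue and give priority 1 to the largest ones until a chosen share of total weighted revenue is covered. The names `CollectionEntity`, `assign_priorities` and `coverage_target` are illustrative and not part of the UES systems.

```python
from dataclasses import dataclass


@dataclass
class CollectionEntity:
    ce_id: str
    weighted_revenue: float  # assumed input: design weight x frame revenue


def assign_priorities(entities, coverage_target=0.75):
    """Return {ce_id: priority}; 1 = extensive follow-up, 0 = minimum follow-up.

    Hypothetical rule: the largest CEs get priority 1 until the cumulative
    weighted revenue reaches coverage_target of the total.
    """
    total = sum(e.weighted_revenue for e in entities)
    priorities = {}
    covered = 0.0
    for e in sorted(entities, key=lambda e: e.weighted_revenue, reverse=True):
        if total > 0 and covered < coverage_target * total:
            priorities[e.ce_id] = 1
            covered += e.weighted_revenue
        else:
            priorities[e.ce_id] = 0
    return priorities


entities = [
    CollectionEntity("A", 500.0),
    CollectionEntity("B", 300.0),
    CollectionEntity("C", 150.0),
    CollectionEntity("D", 50.0),
]
priorities = assign_priorities(entities, coverage_target=0.75)
```

With these made-up figures the two largest CEs receive extensive follow-up and the two smallest receive minimum follow-up, which mirrors the trade-off between collection cost and weighted coverage described on the slide.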

29 DISSEMINATION COLLECTION Chart Of Accounts Sales Operating revenue Cost of sales Gross profit Expenses EBIT Outputs Inputs Value added Shipments Operating Surplus GDP LINK, BRIDGE, CONCORDANCE 29

30 Expected Benefits Of A Chart Of Accounts  Standardization in business data collection  Higher survey response  Increase in quality of data  Comparison of data from various sources  Increase efficiency in using administrative data 30

31 Links To Chart Of Accounts CHART OF ACCOUNT Establishment Legal entity Enterprise 31

32 UES: Use Of Tax Data  Validation (comparison)  Verify dubious collected data against the equivalent tax data record  Imputation  One of the methods used for non-response  Estimation  Below take-none  Direct Data Replacement  Update Business Register  Allocation of survey data (use tax revenues, salaries and expenses)

33  Develop centralized systems Move away from stand-alone Single point of access for security  Integrated Questionnaire Metadata System  Edit and imputation  Allocation and Estimation  Data Warehouse Centralized Processing Systems And Databases

34 Enterprise Portfolio Managers  Top 350 enterprises in Canada  Status Platinum, Gold, Silver, Bronze  Personal visits  Enterprise Profiling  Coordination of mail-out and collection  Enterprise/ Establishment coherence  Holistic Response Management Strategic Response Unit Escalation Process / Statistics Act 34

35 Review and Correction (Post-Capture)  Done via an application which is a micro-editing tool  Opportunity to perform edits and to manually correct data before the automated edit and imputation process  Opportunity to gain an understanding of the quality of data coming in from the field 35

36 What Is Generally Done By SMOs During This Process?  Ensure that industry codes are valid and response codes are correct  Ensure that equivalent survey cells have consistent data  Enter data for records that came in after the collection cut-off date  Review high impact outliers in terms of profit, average salary, etc.  Check comments made by respondents and collection staff 36

37 Why Is This Process Necessary?  Reviewing and correcting records will increase the number and quality of donors for the automated edit and imputation (E&I) stage. This will improve the quality of data coming out of E&I.  Need to assess the quality of collected data  Determine if problems with questionnaire  Inability of respondent to provide a given data point  Determine if enough data for E&I 37

38 What Should Not Be Done During This Process?  Do not plug data for non-response records. They will be imputed during the automated E&I. 38

39 What Is E & I?  Editing Verify that parts add up to total Ensure that there are no missing values where parts add up to total There must be consistency between related variables  Imputation Changing values in fields which fail edit rules with a view to ensuring that the resulting data satisfy all edit rules. In practice, reported data will rarely be changed Impute for missing data or partially responded data Impute entire records in the case of total non-response 39
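As an illustration of the editing rules above, here is a minimal sketch of a check that parts add up to a total and that no value is missing. The field names and the `tolerance` parameter are assumptions made for the example; the actual edit specifications are not given in the presentation.

```python
def sum_of_parts_edit(record, total_field, part_fields, tolerance=0.5):
    """Return a list of edit failures for one record (empty list = record passes)."""
    failures = []
    # Rule 1: no missing values where parts add up to a total.
    missing = [f for f in [total_field, *part_fields] if record.get(f) is None]
    if missing:
        failures.append(("missing", missing))
        return failures
    # Rule 2: parts must add up to the total, within a rounding tolerance.
    difference = sum(record[f] for f in part_fields) - record[total_field]
    if abs(difference) > tolerance:
        failures.append(("sum_mismatch", difference))
    return failures


parts = ["sales", "other_revenue"]
clean = {"total_revenue": 100, "sales": 80, "other_revenue": 20}
bad = {"total_revenue": 100, "sales": 80, "other_revenue": 10}
partial = {"total_revenue": 100, "sales": None, "other_revenue": 20}

clean_result = sum_of_parts_edit(clean, "total_revenue", parts)
bad_result = sum_of_parts_edit(bad, "total_revenue", parts)
partial_result = sum_of_parts_edit(partial, "total_revenue", parts)
```

Records that fail such an edit would then be candidates for imputation, while records that pass can serve as donors.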

40 Why Is E&I Necessary?  To produce a complete and consistent data file that accounts for all sampled units  Both units that did not respond to the survey and units that did not provide a complete response must be imputed  Correct erroneous responses 40

41 E&I Terminology  Data Group Groupings (defined by SM) of records that will be kept together for imputation purposes These groupings are based on multiple dimensions:  industry (NAICS)  geography (province)  Data groups that will be used for a specific survey will depend on: initial sample design (number of units sampled and the level of stratification used) number of records that respond to the survey (a minimum of 5 or 10 records is required in a data group)  May be changed during production if not enough donors 41
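The data group rule above can be sketched as follows. The sketch assumes groups keyed on (NAICS, province) and a hypothetical fallback that drops the province dimension when a group has fewer than `min_donors` respondents; the slide only says that groups may be changed when there are not enough donors, not how.

```python
from collections import defaultdict


def build_data_groups(records, min_donors=5):
    """Map each record id to a data group key, collapsing groups with too few donors."""
    donors = defaultdict(int)
    for r in records:
        if r["responded"]:
            donors[(r["naics"], r["province"])] += 1

    groups = {}
    for r in records:
        key = (r["naics"], r["province"])
        if donors[key] < min_donors:
            # Hypothetical fallback: pool all provinces within the industry.
            key = (r["naics"], "ALL")
        groups[r["id"]] = key
    return groups


records = [
    {"id": f"ON{i}", "naics": "311", "province": "ON", "responded": True}
    for i in range(5)
]
records += [
    {"id": f"QC{i}", "naics": "311", "province": "QC", "responded": True}
    for i in range(2)
]
records.append({"id": "QC_nr", "naics": "311", "province": "QC", "responded": False})

groups = build_data_groups(records, min_donors=5)
```

Here the Ontario group keeps its own key because it has five donors, while the Quebec records fall back to the pooled industry group.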

42 E&I Terminology (continued)  Edit Group Grouping of variables within a record that will be processed together in an imputation method Generally edit groups may be defined as follows for most surveys:  revenue and expense sections  employment section and provincial distribution of goods/services sold Allows for a record to be a donor if it has clean data in one section even when other sections are blank; this increases the donor pool 42

43 E&I Terminology (continued)  Key variables Total operating revenue Total operating expenses Salaries Cost of goods sold 43

44 The Stages Of The E&I System  Pre-processing  BANFF E & I System  Post-Processing  Allocation 44

45 Preprocessing  Deterministic Edits  Conditional edits - If A then B  Sum of Parts (SOP)  Assign 100% to percentage totals  Impute reporting period  Donor Outlier Detection 45
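Two of the pre-processing steps above, conditional edits of the form "If A then B" and assigning 100% to percentage totals, might look like this in a minimal sketch. The edit shown (employees imply salaries) and the field names are invented for illustration and are not taken from the survey's edit specifications.

```python
def failed_conditional_edits(record, edits):
    """Return the names of 'if A then B' edits that the record fails."""
    return [
        name
        for name, if_a, then_b in edits
        if if_a(record) and not then_b(record)
    ]


def assign_percentage_total(record, total_field="province_pct_total"):
    """Return a copy of the record with the percentage total set to 100."""
    fixed = dict(record)
    fixed[total_field] = 100.0
    return fixed


# Hypothetical edit: a unit reporting employees should also report salaries.
EDITS = [
    (
        "employees_imply_salaries",
        lambda r: r["employees"] > 0,
        lambda r: r["salaries"] > 0,
    ),
]

record = {"employees": 12, "salaries": 0.0, "province_pct_total": None}
failed = failed_conditional_edits(record, EDITS)
fixed = assign_percentage_total(record)
```

Because these edits are deterministic, they can run before any donor is selected and leave only genuinely missing or inconsistent values for the imputation stage.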

46 BANFF E & I System  Impute for missing key variables as specified by subject matter (i.e. total revenue, total expenses)  Impute for other missing variables: Apply Historical Trend Apply Current Year Trend Use donor (for partial imputation),  Select a donor for massive imputation for total non-response 46

47 BANFF Algorithms  DIFTREND - Historical trend imputation  CURRATIO - Current ratio imputation  PREVALUE – Value from the previous period for the same unit is imputed  PREAUX – Historical value of a proxy variable for the same unit  CURAUX – Current value of a proxy variable for the same unit 47
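To make the descriptions above concrete, here is a simplified sketch of what each kind of estimator computes. These functions are not Banff's interface, and the formulas for the trend and ratio methods are assumed forms chosen to match the one-line descriptions on the slide; Banff's exact definitions may differ.

```python
def prevalue(prev_value):
    """Value from the previous period for the same unit."""
    return prev_value


def preaux(prev_proxy):
    """Historical value of a proxy variable for the same unit."""
    return prev_proxy


def curaux(cur_proxy):
    """Current value of a proxy variable for the same unit."""
    return cur_proxy


def historical_trend(prev_value, cur_proxy, prev_proxy):
    """Assumed form: previous value moved by the unit's own proxy growth."""
    return prev_value * cur_proxy / prev_proxy


def current_ratio(cur_proxy, donors):
    """Assumed form: unit's current proxy times the respondents' value/proxy ratio.

    donors is a list of (value, proxy) pairs from current-period respondents.
    """
    ratio = sum(v for v, _ in donors) / sum(p for _, p in donors)
    return cur_proxy * ratio


trend_imputed = historical_trend(prev_value=100.0, cur_proxy=220.0, prev_proxy=200.0)
ratio_imputed = current_ratio(cur_proxy=50.0, donors=[(10.0, 20.0), (30.0, 60.0)])
```

In this made-up example the proxy grew by 10%, so the trend method imputes 110.0, and the respondents' value-to-proxy ratio of 0.5 gives a ratio imputation of 25.0.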

48 Post-Processing  Prorate components to ensure that they sum exactly to totals  Perform a number of consistency checks to ensure that micro-data are valid  Assign customer location (percentage cells)  Massive Imputation (donor selected during processing but applied in the post-processor) 48
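Prorating can be sketched as scaling every component by the same factor and then absorbing any rounding residual so that the parts sum exactly to the total. Working in whole units and pushing the residual onto the largest component are assumptions of this example, not something stated in the presentation.

```python
def prorate(parts, total):
    """Scale components proportionally so that they sum exactly to total.

    parts maps component names to values; total is a whole number.
    """
    current_sum = sum(parts.values())
    if current_sum == 0:
        raise ValueError("cannot prorate components that sum to zero")
    factor = total / current_sum
    rounded = {name: round(value * factor) for name, value in parts.items()}
    # Assumed convention: the largest component absorbs the rounding residual.
    residual = total - sum(rounded.values())
    largest = max(rounded, key=rounded.get)
    rounded[largest] += residual
    return rounded


expenses = prorate({"wages": 60, "materials": 30, "other": 20}, total=100)
thirds = prorate({"a": 1, "b": 1, "c": 1}, total=100)
```

The relative shares of the components are preserved up to rounding, which is why prorating is applied after imputation rather than replacing it.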

49 Allocation - Definition & Purpose Definition:  Allocation is the distribution of survey and administrative data from their acquisition level (Collection Entity) to the targeted statistical units (Establishments or Locations) as defined on the survey frame. Purpose:  To provide fully-processed micro data on a fiscal year basis, for establishments or locations in-sample for the UES  Determine the distribution of value added by province 49
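A minimal sketch of allocation, assuming a Collection Entity's reported value is distributed to its establishments in proportion to their tax revenue (the presentation mentions tax revenues, salaries and expenses as allocation variables but gives no formula). The function and field names are hypothetical.

```python
def allocate(ce_value, tax_revenue_by_establishment):
    """Distribute a Collection Entity value to establishments pro rata to tax revenue."""
    total = sum(tax_revenue_by_establishment.values())
    if total <= 0:
        raise ValueError("allocation needs a positive tax revenue total")
    return {
        est: ce_value * revenue / total
        for est, revenue in tax_revenue_by_establishment.items()
    }


def by_province(allocated, province_of):
    """Sum allocated establishment values by province."""
    totals = {}
    for est, value in allocated.items():
        province = province_of[est]
        totals[province] = totals.get(province, 0.0) + value
    return totals


allocated = allocate(1000.0, {"E1": 300.0, "E2": 100.0})
provincial = by_province(allocated, {"E1": "ON", "E2": "QC"})
```

Summing the allocated establishment values by province gives the kind of provincial distribution that the slide lists as a purpose of allocation.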

50 Sample Survey Allocation 50

51  Post Collection Operations Committee Discuss production issues of common interest Provide status reports on production and production readiness  Divisional Production meetings Working group level dealing with production issues relating to a specific subject matter division, including planning and adhoc requests  Post Collection Processing Teams Structured by Subject Matter Division to provide the best support and to maximise subject matter expertise  Change Management Requests Improvements  Service Request Management Portal (SRM) Corrections Managing The UES Post Collection Process 51

52 Future Directions  IBSP (Integrated Business Statistics Project) New and Improved UES, to consolidate and standardise processing for more annual and sub-annual business surveys Start RY2013. To be completed for RY2015 Number of surveys to increase from 60 annual surveys to 120 annual and sub-annual surveys. 52

