(Policy) research with confidential micro data Eric J. Bartelsman Vrije Universiteit Amsterdam Tinbergen Institute Expertenworkshop Ondernemingsdata in.


1 (Policy) research with confidential micro data
Eric J. Bartelsman
Vrije Universiteit Amsterdam, Tinbergen Institute
Expertenworkshop Ondernemingsdata in België
Brussels, September 25, 2009

2 Overview
- Benefits of using linked longitudinal firm-level datasets
- International experience
- Modes of access to confidential firm-level datasets

3 Benefits of using firm-level data
- Improving quality of statistics
- Testing of theories at firm-level
- Providing ‘moments’ for modelling
- Policy evaluation

4 Benefits of using firm-level data
- Improving quality of statistics
  - Assessing quality of published stats
  - New uses for old data
  - Uncovering new collection methods and new data needs
- Testing of theories at firm-level
- Providing ‘moments’ for modelling
- Policy evaluation

5 Data Quality
- In-house use at National Stats Office (NSO):
  - Consistency in cross-sectional and longitudinal dimensions
  - Integration: top-down vs bottom-up
- External users: quality improvement criteria
  - Systematic learning from external users

6 New uses for ‘old’ data
- Linking of multiple sources
  - Link NSO surveys to Business Register
  - Cross-linking with other registers
    - Housing, transport, labor, tax
  - Linking with external surveys
- Creation of new indicators from linked data
  - Gross flows
  - Higher moments; correlations
- New disaggregations
  - Subsamples: region, industry, size, type
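As an illustration of the "gross flows" indicator named on this slide, the following is a minimal sketch of how gross job creation and destruction could be computed from a linked longitudinal firm panel. The record layout `(firm_id, year, employment)`, the function name, and the sample figures are assumptions made for the example, not part of the presentation. A firm missing in a year is treated as having zero employment, so entry and exit count as creation and destruction.

```python
from collections import defaultdict


def gross_job_flows(records):
    """Compute gross job flows from (firm_id, year, employment) records.

    Returns {year: (creation, destruction, net)} for each pair of
    consecutive years present in the data.
    """
    employment = defaultdict(dict)  # firm_id -> {year: employment}
    for firm_id, year, emp in records:
        employment[firm_id][year] = emp

    years = sorted({y for by_year in employment.values() for y in by_year})
    flows = {}
    for prev, curr in zip(years, years[1:]):
        creation = destruction = 0
        for by_year in employment.values():
            # Assumption: a firm absent in a year has zero employment.
            change = by_year.get(curr, 0) - by_year.get(prev, 0)
            if change > 0:
                creation += change
            else:
                destruction += -change
        flows[curr] = (creation, destruction, creation - destruction)
    return flows


# Hypothetical panel: firm A grows, firm B shrinks, firm C enters in 2001.
records = [
    ("A", 2000, 10), ("A", 2001, 15),
    ("B", 2000, 20), ("B", 2001, 12),
    ("C", 2001, 5),
]
print(gross_job_flows(records))  # {2001: (10, 8, 2)}
```

The net change of 2 jobs is all the aggregate would show; the linked micro data reveal that it results from 10 jobs created and 8 destroyed.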

7 New collection methods
- Links to registers allow for mass imputation of small samples
- Collection of data at ‘transactions’ site
- New types of info from linking disparate sources
  - Example: linked geographic info for disaster planning.

8 Uncovering data needs
- Micro-level research reveals useful indicators
  - Employment gross flows (US/BLS)
  - Firm demographics (Eurostat)
- Interaction with external researchers improves understanding of users’ needs at NSOs
- Gaps in available data are identified through research

9 Benefits of using firm-level data
- Improving quality of statistics
- Testing of theories at firm-level
  - Firm-level data now used in many fields: IO, Trade, Labor, Finance, Management, Organization, Macro
  - Recent improvements in modelling heterogeneous firms
    - Variation in costs (… of learning, transport, etc)
    - Usually representative consumer, constant mark-up
  - Application of econometric techniques (GMM, clever instruments) to cope with endogeneity
- Providing ‘moments’ for modelling
- Policy evaluation

10 Benefits of using firm-level data
- Improving quality of statistics
- Testing of theories at firm-level
- Providing ‘moments’ for modelling
  - Information drawn from linked longitudinal firm-level distributions can be used to calibrate models.
  - Especially the ability to do cross-country comparisons is promising
- Policy evaluation

11 Benefits of using firm-level data
- Improving quality of statistics
- Testing of theories at firm-level
- Providing ‘moments’ for modelling
- Policy evaluation
  - Individual decision making units respond to policy
    - Track decisions and outcomes from longitudinal micro data
    - No need to infer result from movement in aggregate
  - Identification requires a control group
    - Implementation of policy differs across cells (locations, between types of units, or over time)
    - Effect of policy differs across cells (e.g. highways affect transport-intensive firms)
    - Cross-country comparisons for identification
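One common way to use a control group for identification is a difference-in-differences comparison. The slide does not name a specific estimator, so the sketch below is an assumption chosen for illustration, as are the row layout `(treated, post, outcome)` and the sample values.

```python
from statistics import mean


def diff_in_diff(rows):
    """Difference-in-differences estimate from (treated, post, outcome) rows.

    treated: True if the unit belongs to a cell where the policy applies.
    post:    True if the observation is from after the policy started.
    """
    groups = {}
    for treated, post, outcome in rows:
        groups.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in groups.items()}
    # Change in the treated group minus change in the control group.
    return (m[(True, True)] - m[(True, False)]) - (
        m[(False, True)] - m[(False, False)]
    )


# Hypothetical outcomes for four treated and four control observations.
rows = [
    (True, False, 10.0), (True, False, 12.0),
    (True, True, 15.0), (True, True, 17.0),
    (False, False, 8.0), (False, False, 10.0),
    (False, True, 9.0), (False, True, 11.0),
]
print(diff_in_diff(rows))  # 4.0
```

The treated units rise by 5 on average and the controls by 1, so the estimated policy effect is 4 under the usual assumption that both groups would otherwise have followed the same trend.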

12 International Experience
- History of micro data access:
  - Stats Norway: early 1970s
  - US Census: late 1980s
- Typical attitude of NSO before allowing access
  - Micro data is too difficult; you can’t really do that with data; we don’t trust you to use the data; absolute security is required
  - Well, maybe we can think of how to allow access…
- Now: at least 25 NSOs have facilities for micro data research
  - Also, they use the backbone as basis of statistical process: enormous gains in productivity

13 International Experience
- Situation in EU countries
  - Business Register, VAT register, SS register, Business Surveys
  - Some have on-site, others have remote access:
    - Fin, Swe, Dnk, UK, Nld, Slo, Est
    - Some have excellent in-house research: Fra
- In other countries a variety of situations: ad-hoc sharing of data, on-site, trusted third party

14 Modes of access to confidential micro data
- Research shop within stats agency
- On-site facility with access rules for external researchers
- Secure remote-access for external researchers
- Remote execution
- Distributed micro data analysis
  - how to share unsharable data

15 Issues to consider
- Absolute certainty about confidentiality of data
- Uniqueness of published official statistics
- Requirements for access
- Resource cost sharing

16 Confidentiality
- Must weigh costs and benefits
  - What is ‘cost’ of confidential data being released
    - Relate to costs of not allowing access to data: Increasing irrelevance of stats agency and hopefully extreme budget cuts
  - Don’t just look at technical side of disclosure
    - What is likelihood of malice or fraud
    - Look at ease of getting same or better confidential data elsewhere

17 Uniqueness
- The ‘one published number’ view of stats agencies conflicts with reliability
  - We all know numbers don’t add up and that different assumptions generate different stats. So, openness, replicability, review, robustness testing by others will enhance reputation of stats agency publications
- Research output can be labelled as such with a disclaimer

18 Requirements for access
- Create (legal) framework for allowing access by external researchers
  - Screening of projects and research teams
  - Special employee status
- Create technical facilities
  - Database architecture
  - Meta data
  - On-site laboratory
  - Remote-access facilities

19 Distributed Micro Data Research
- Distributed Micro Data research was developed to allow cross-country research using confidential firm-level data that could not be combined
- The key is to ‘micro-aggregate’ underlying micro data into cells that pass disclosure and
  - Provide enough information for further analysis, and/or
  - Can be merged at cell-level with other sources
- DMD can be viewed as system to allow customer-driven publication of statistics
  - ‘Moments’ are useful for economic modelling
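The micro-aggregation step described on this slide can be sketched roughly as follows: firm-level records are grouped into cells, each cell is summarised by a few moments, and cells with too few firms are withheld. The field names, the `min_firms` threshold of 3, and the sample data are hypothetical; actual disclosure rules are set by each NSO.

```python
from collections import defaultdict
from statistics import mean, pstdev

MIN_FIRMS = 3  # hypothetical minimum cell count, not an official rule


def micro_aggregate(firms, keys, variable, min_firms=MIN_FIRMS):
    """Aggregate firm records into cells defined by `keys`.

    Returns {cell: {"n", "mean", "sd"}} for cells holding at least
    `min_firms` firms; smaller cells are suppressed.
    """
    cells = defaultdict(list)
    for firm in firms:
        cells[tuple(firm[k] for k in keys)].append(firm[variable])

    table = {}
    for cell, values in cells.items():
        if len(values) < min_firms:
            continue  # suppressed: too few firms to release
        table[cell] = {
            "n": len(values),
            "mean": mean(values),
            "sd": pstdev(values),  # population standard deviation
        }
    return table


# Hypothetical firm records: industry code, year, labour productivity.
firms = [
    {"IndC": "27", "year": 2000, "lp": 1.0},
    {"IndC": "27", "year": 2000, "lp": 2.0},
    {"IndC": "27", "year": 2000, "lp": 3.0},
    {"IndC": "01", "year": 2000, "lp": 5.0},
    {"IndC": "01", "year": 2000, "lp": 7.0},
]
dmd_table = micro_aggregate(firms, keys=("IndC", "year"), variable="lp")
print(dmd_table)
```

Only the `("27", 2000)` cell survives here. Because every country runs the same code with the same cell definitions, the resulting tables can be stacked across countries without the micro data ever leaving the NSO.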

20 Data for Cross-country Firm-level Analysis
[Diagram. Labels: Single country; Multiple countries; Surveys, Business Registers; Macro and Sectoral Timeseries; Longitudinal Micro Data (LMD); National Accounts Industry Data (N.A.); SC; DMD; EUKLEMS; EUKLEMS+]

21 Distributed Micro Data Analysis
[Workflow diagram. Labels: Researcher; Policy Question; Research Design; Program Code; Publication; Research Network; Metadata; Network members; DMD Tables; NSOs; Provision of metadata; Approval of access; Execution of code; Disclosure analysis of DMD tables; Disclosure analysis of publication]

22 DMD Projects
- OECD 2000-2003
- World Bank 2006
  - Followup 2009-2011
- EU/NL 2007
- Eurostat ICT Impacts 2008-2009
  - Followup 2010

23 Analytical uses of DMD datasets
- Creation of new indicators from linked data
  - Definition of cells based on complex longitudinal characteristics
    - e.g. employer-employee matched
    - ‘Event’ studies (tracking sub-populations based on prior characteristics)
  - Indicators may be higher moments, correlations, regression coefficients, etc.
    - e.g. correlation of profitability and employee gender-ratio, by industry, region and time
- Linking of outside data sources at cell-level
  - Generate custom tabulations of data to match cells of other published or DMD datasets
    - e.g. labor force gender-ratio by region and time
- Cross-country analysis with panels with the same cell-level definitions

24 Uses of DMD for Policy Evaluation
- Individual decision making units respond to policy
  - Track decisions and outcomes from longitudinal micro data
  - No need to infer result from movement in aggregate
- Identification requires a control group
  - Implementation of policy differs across cells (locations, between types of units, or over time)
  - Effect of policy differs across cells (e.g. highways affect transport-intensive firms)

25 Implementing efficient firm-level data analysis
- Technical facilities
- Meta-data libraries
- Disclosure analysis and rules for re-use of extracted datasets

26 Technical Facilities
- Back-bones for universe of statistical units
  - Firms, Households, Dwellings, etc
- Relational database organisation of data and meta-data
- Statistical tools inside relational database programming environment
- Remote access or remote execution
  - Remote access allows data visualisation, interactive data checking

27 Meta-data
- Ideal application of meta-data
  - Be able to write generic code remotely
  - Convert code to run locally, using meta-data
- Meta-data set up to describe
  - available datasets
  - unique record identifiers
  - classifications
  - ‘economic variables’
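The idea of writing generic code remotely and converting it to run locally can be sketched as a lookup against a local metadata table. Everything here is an assumption made for illustration: the `resolve` function, the dictionary layout, and the `/data/...` locations are hypothetical, and each NSO would maintain its own entries.

```python
from string import Template

# Hypothetical local metadata kept by one NSO: logical survey code ->
# dataset name pattern, unique keys, and storage location.
LOCAL_DATASOURCES = {
    "BR": {"name": "GenBusReg", "keys": ["FID", "year"], "location": "/data/dirx"},
    "PS": {"name": "SBS_$year", "keys": ["FID"], "location": "/data/diry"},
}


def resolve(survey, year, datasources=LOCAL_DATASOURCES):
    """Translate a generic (survey, year) reference into a local path and keys."""
    entry = datasources[survey]
    # Fill in the year where the name pattern contains a $year placeholder.
    name = Template(entry["name"]).substitute(year=year)
    return {"path": f"{entry['location']}/{name}", "keys": entry["keys"]}


# The researcher's generic code only names the survey and the year.
print(resolve("PS", 1999))  # {'path': '/data/diry/SBS_1999', 'keys': ['FID']}
```

The researcher's program never contains a local file path, so the same code can be sent to several NSOs and each one resolves it against its own metadata.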

28 Necessary meta-data
- list of available forms and schedules
- info on record identifiers (Firm_id, person_id)
- info on ‘economic variables’
- info on classifications
- concordances between units
- concordances between variables
- concordances to standard classifications

29 Underlying Metadata: datasources

Survey Type   Name        Unique keys   Location
BR            GenBusReg   FID, year     G:\dirx
PS            SBS_yyyy    FID           G:\diry
EC            ECS_yyyy    FID           G:\dirz
IS            InvS_yyyy   FID           G:\dirz

30 Underlying Metadata: variables in survey

Name     Description              Units        Domain
FID      Unique FirmID            string       GBR
IndC     Detailed industry code   string       ISIC r3
Q1       Use of IT                integer      YNM
PurchS   Software Exp             Eur (1000)   ECS_1999

31 Underlying Metadata: classifications of domains (ISICr3)

IndC    Description
TOT     Total Economy
AG      Agriculture, Fishing, Forestry
01      Farms
MFG     Manufacturing
27t35   Durables
27      Basic Metals

32 Underlying Metadata: Concordances (IndC_ICTind)

IndC   ICTind
01     Other
…
12     Other
…
27     27a8
28     27a8
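A concordance like the one on this slide can be applied as a simple mapping when regrouping detailed industry data. The sketch below uses only the four IndC to ICTind pairs shown on the slide; the function name and the values being aggregated are hypothetical.

```python
from collections import defaultdict

# The four concordance rows visible on the slide (IndC -> ICTind).
IND_TO_ICTIND = {"01": "Other", "12": "Other", "27": "27a8", "28": "27a8"}


def regroup(values_by_indc, concordance=IND_TO_ICTIND):
    """Sum values by detailed industry code into ICTind groups.

    Raises KeyError for a code missing from the concordance, so gaps in
    the metadata are noticed rather than silently dropped.
    """
    totals = defaultdict(float)
    for indc, value in values_by_indc.items():
        totals[concordance[indc]] += value
    return dict(totals)


# Hypothetical software expenditure by detailed industry.
print(regroup({"01": 5.0, "12": 3.0, "27": 2.0, "28": 4.0}))
# {'Other': 8.0, '27a8': 6.0}
```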

33 Disclosure Analysis
- Can be fairly automated, based on cell-count and ‘concentration’
- Further, rules may be instated about further use of DMD dataset. For example, requirement that dataset be erased after use will reduce worries about secondary disclosure.
- Checking may also be required on final publication
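An automated check based on cell count and concentration might look like the sketch below. The specific rule (a minimum number of firms plus a cap on the largest firm's share of the cell total) and the thresholds of 3 firms and 75 percent are assumptions for illustration; the slide does not specify the rules, and each NSO sets its own.

```python
def passes_disclosure(values, min_count=3, max_share=0.75):
    """Check one cell against a count rule and a concentration rule.

    values:    non-negative firm-level contributions to the cell total.
    min_count: hypothetical minimum number of firms in a cell.
    max_share: hypothetical cap on the largest firm's share of the total.
    """
    if len(values) < min_count:
        return False  # too few firms in the cell
    total = sum(values)
    if total <= 0:
        return False  # share undefined; be conservative and withhold
    return max(values) / total <= max_share


print(passes_disclosure([10, 10, 10]))  # True: three firms, none dominant
print(passes_disclosure([90, 5, 5]))    # False: one firm holds 90%
print(passes_disclosure([10, 10]))      # False: only two firms
```

A check of this kind can run on every cell of a DMD table before it leaves the NSO, which is what makes the process largely automatic.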

