The NORC Data Enclave for Sensitive Microdata Timothy M. Mulcahy Senior Research Scientist, NORC/University of Chicago,


1 The NORC Data Enclave for Sensitive Microdata Timothy M. Mulcahy Senior Research Scientist, NORC/University of Chicago, mulcahy-tim@norc.uchicago.edu

2 Overview
• Enclave Mission
• Data Protection
• Metadata Documentation
• Portfolio Approach
• Focus on Research Collaboration/Developing Metrics
• Current Status
• Summary

3 Enclave Mission
• To promote access to sensitive NIST microdata
– Serves mandate of TIP to “accelerate the development of high-risk, transformative research targeted to address key societal challenges.”
– NIST has a unique source of innovation data which researchers can use to study:
  • Entrepreneurship & innovation
  • Early stage technology development
  • Commercialisation of high-risk R&D
• To protect confidentiality
– Technical
– Legal
– Organizational
– Statistical
• To archive, index and curate ATP microdata

4 What’s in it for NIST?
• Researcher access to database to examine entrepreneurship and firm behavior
• Development of research community, including graduate students (and possibly undergraduates)
• High quality research => more insights into value added of ATP/TIP program
a) High quality analysis leverages federal investment
b) Metadata documentation improves scientific quality

5 Ideal System
• Secure
• Flexible
• Low Cost
• Meet Replication standard
– The only way to understand and evaluate an empirical analysis fully is to know the exact process by which the data were generated
– Replication datasets include all information necessary to replicate empirical results
– Metadata crucial to meet the standard
  • Composed of documentation and structured metadata
  • Undocumented data are useless
  • Create foundation for metadata documentation and extend data lifecycle

6 Metadata & Survey Cycle
• Data collection is not a static process – it’s a lifecycle
• It evolves dynamically over time and involves many players
• It extends to aggregate data to reach decision makers
• Metadata are crucial to capture knowledge
*Exhibit courtesy of Chuck Humphrey

7 NORC Data Enclave: Mechanics
1. Data Protection
a) Already collect data for multiple statistical agencies (BLS, Federal Reserve (IRS data), EIA, NSF/SRS etc.) => safeguards in place
b) NIST-approved IT security plan
2. Provision of access – a portfolio approach
a) Statistical protection (Statistical)
b) Researcher training (Educational)
c) Dissemination to researcher community (Operational)
d) Agency-specific data protection requirements (Legal)

8 Statistical, Technical, Legal & Operational Controls

9 Data Protection

10 The Data Enclave is fully compliant with DOC IT Security Program Policy, Section 6.5.2, the Federal Information Security Management Act, provisions of mandatory Federal Information Processing Standards (FIPS) and all other applicable NIST Data IT system and physical security requirements.

11 IT Security
• Encrypted connection with the data enclave using virtual private network (VPN) technology. VPN technology enables the data enclave to prevent an outsider from reading the data transmitted between the researcher’s computer and NORC’s network.
• Users access the data enclave from specific, pre-defined IP addresses.
• Citrix’s Web-based technology.
– All applications and data run on the server at the data enclave.
– Data enclave can prevent the user from transferring any data from data enclave to a local computer.
– Data files cannot be downloaded from the remote server to the user’s local PC.
– User cannot use the “cut and paste” feature in Windows to move data from the Citrix session.
– User is prevented from printing the data on a local computer.
• Audit logs and audit trails
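
The restriction to specific, pre-defined IP addresses can be illustrated with a minimal Python sketch. This is not the enclave's actual configuration: the networks below are documentation-reserved ranges, the name `is_allowed` is hypothetical, and a real deployment would most likely enforce the rule at the VPN gateway or firewall rather than in application code.

```python
import ipaddress

# Hypothetical allowlist; these are documentation-reserved ranges (RFC 5737),
# not addresses used by the NORC Data Enclave.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single researcher workstation
    ipaddress.ip_network("198.51.100.0/24"),  # an approved institution's subnet
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip falls inside a pre-defined network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed addresses are rejected rather than raising.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

A connection handler would call `is_allowed(remote_address)` before starting a session and write the decision to the audit log either way.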

12 Provision of Access

13 2413 Menu Options for Agency X (and Study Y) 1,42,312Licensing (different levels of anonymization) None13,53 with customization Onsite Access 252None Remote Access Educational (1,2,3,4) Operational (1,2,3,4,5) Statistical (1,2,3,4,5) Legal Options (1,2,3,4) Sample Modalities Provision of Research Access

14 Two Approaches:
• Remote access
– External researchers access data via an encrypted connection with the data enclave using VPN
– RSA Smart Card
– Restrict user access from specific, pre-defined IP addresses
– Citrix technology to access applications – configured so no downloads, cut and paste or print possible
• Onsite access
– Secure room at NORC site (Bethesda, MD & Chicago, IL)
– Secure machines
– Video camera
– Audit logs and trails
– Workspaces

15 Legal and Statistical Protections
• Legal
– Access Agreement signed by the institution and the individual researcher
– Approved institutions
– Access limited to data requested and authorized
• Statistical
– Remove obvious identifiers and replace with unique identifiers
– Statistical techniques chosen by agency (recognising data quality issues)
Note: Both are at discretion of agency and can go above and beyond the minimum level of protection
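
Removing obvious identifiers and replacing them with unique identifiers might look like the following Python sketch. The slides do not say which fields count as identifiers or how the replacement IDs are generated, so the field names, the keyed HMAC-SHA256 approach, and the function name `pseudonymize` are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical field names; the source does not list which fields are identifiers.
DIRECT_IDENTIFIERS = ("firm_name", "contact_email")


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and add a stable, keyed pseudonym in their place."""
    # Join identifier values with a separator that is unlikely to occur in them.
    identity = "\x1f".join(str(record.get(field, "")) for field in DIRECT_IDENTIFIERS)
    digest = hmac.new(secret_key, identity.encode("utf-8"), hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["unique_id"] = digest[:16]  # truncated only for readability
    return cleaned


# Example with made-up values and a throwaway key.
row = {"firm_name": "Example Corp", "contact_email": "a@example.org", "rd_spend": 1200}
example = pseudonymize(row, b"demo-key-not-for-production")
```

Because the pseudonym is keyed, the same firm maps to the same `unique_id` across files, which keeps records linkable without exposing the name. This step alone does not rule out re-identification from the remaining variables, which is why the slide leaves further statistical techniques to the agency.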

16 Researcher Training
• Subjects
– Basic confidentiality
– Agency specific (joint with agency)
– Dataset specific (joint with agency)
• Locations
– Onsite
– Web-based
– Researcher locations (AAEA, JSM, AOM, ASA, ASSA, NBER Summer Institute)
Note: The training is designed to go above and beyond current practice in terms of both frequency and coverage

19 Researcher Responsibilities
• Serve Agency Mission
• Metadata documentation
– Code
– Information about variables
• Post research output
• Cite sources
• Evaluation and feedback

20 Developing a Virtual Collaboratory
• Value Added
– Serve Agency Mission
– Metadata documentation
  • Code
  • Information about variables
• Policy Relevance
– Research output
  • Cite sources
  • Evaluation and feedback

21 Logging On
The browser downloads the .ica file and launches the Citrix Client

22 ENCLAVE LEVEL PORTAL

23 SITE MENU CONTENT DISPLAY AREA

25 ENCLAVE LEVEL FEATURES
• Informs users about enclave updates, events, publications, new features, etc.
• Guidelines and technical assistance for new users
• Calendar of events such as conferences, data releases, trainings, etc.
• Background information on the data enclave

26 ENCLAVE LEVEL FEATURES
• Overview and catalog of surveys available in the enclave
• General information on clients or survey series

27 ENCLAVE LEVEL FEATURES
• Access to enclave documentation and public survey documents (reports, questionnaires, no data!). This information consists of files organized in folders and can also be searched by category.
• A wiki-based knowledge area maintained by the enclave managers. Provides FAQ, technical info, tips & tricks, etc.
• Issue tracking system for users to request technical assistance from the enclave staff or report issues with the survey data.
• Collaborative features reserved for data enclave managers (not visible to regular users)

29 GROUP LEVEL FEATURES

30 Summary
• Goal: To promote access to sensitive ATP microdata while protecting confidentiality
• Benefits:
– Secure, low-cost approach to leveraging ATP’s investment in data collection
– Archiving, indexing, and curation of ATP microdata
– Applicable and customizable to agency needs and requirements

31 Next Steps
• Developing metrics
– Number of interactions
– Additions to the wiki, code, combined variables, macros
– Research output (how to quantify)
• Developing incentives
– Establish leaders
– External communications
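
As a rough illustration of the first two metrics, the Python sketch below tallies interactions and contributions from an event log. The log format, the user names, and the function name `contribution_metrics` are hypothetical; the slides leave open how such activity would be recorded and how research output would be quantified.

```python
from collections import Counter

# Hypothetical event log of (user, kind) pairs. The kinds mirror the slide's
# list, but the log format itself is an assumption.
EVENTS = [
    ("alice", "wiki_edit"),
    ("bob", "code_upload"),
    ("alice", "code_upload"),
    ("carol", "macro"),
    ("bob", "wiki_edit"),
    ("alice", "combined_variable"),
]


def contribution_metrics(events):
    """Count total interactions, contributions per kind, and contributions per user."""
    by_kind = Counter(kind for _, kind in events)
    by_user = Counter(user for user, _ in events)
    return {"interactions": len(events), "by_kind": by_kind, "by_user": by_user}


metrics = contribution_metrics(EVENTS)
```

Ranking `metrics["by_user"]` with `most_common()` is one simple way to spot the most active contributors when looking for leaders, though raw counts say nothing about the quality of a contribution.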

32 Contact Information
• Timothy M. Mulcahy
• Mulcahy-Tim@norc.uchicago.edu
• Website
– http://dataenclave.norc.org

