A Microdata Computation Centre for de-centralized data sources


1 A Microdata Computation Centre for de-centralized data sources
Anja Burghardt and David Schiller (FDZ of BA at IAB) IASSIST 2014, Toronto (Canada), June 4, 2014

2 Introduction
The European “Data without Boundaries” (DwB) project aims to improve access to confidential microdata. A European Remote Access Network (EuRAN) should ease data access. A number of services will be provided to researchers. One of them is a Microdata Computation Centre (MiCoCe).

3 Introduction
EuRAN and its Single Point of Access (SPA) form a secure infrastructure for accessing tools to analyze distributed data sources. Restriction when dealing with confidential microdata: the data has to stay physically in the facilities of the data providers. This is due to legal restrictions: “I can only take care of my data as long as it is stored in my institution or at least in my country (range of validity of my law)”. Goal: enable analyses of multiple distributed data sources via the EuRAN SPA.
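The goal stated above can be illustrated with a minimal sketch, assuming each data provider is willing to release only three aggregates (count, sum, sum of squares) for a variable. The names `SiteSummary`, `summarize_locally` and `pooled_mean_and_variance` are hypothetical and are not part of EuRAN or MiCoCe; the point is only that a pooled mean and variance can be computed without any microdata leaving a provider.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SiteSummary:
    """Aggregates a data provider releases; no individual records are included."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values: List[float]) -> SiteSummary:
    # Runs inside the provider's facility; only the aggregates are returned.
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_and_variance(summaries: List[SiteSummary]) -> Tuple[float, float]:
    # Runs at the central service; combines the aggregates from all providers.
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance


# Hypothetical values held by two providers in different countries.
site_a = summarize_locally([1.0, 2.0, 3.0])
site_b = summarize_locally([4.0, 5.0])
mean, variance = pooled_mean_and_variance([site_a, site_b])
```

The result equals what a researcher would obtain from the combined records, provided the variable is measured the same way at every site.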

4 EuRAN – the network

5 Single Point of Access and Service Hub
Web portal; User Account Management; Digital Rights Management; Researcher CV / Passport; Online submission system; Information Platform; Datasets available; Variable level documentation; Accreditation; Data Access; Virtual Research Environment; Collaborative work; Remote Desktop; Job submission; Output; MiCoCe; Data Storage (general); Data Storage (project); Data analysis „on the fly“; Additional Service

6 MiCoCe workshop in Nuremberg, Germany
DwB proposed the MiCoCe in its deliverable 4.2. The need for such a tool is stressed by both sides: researchers, in order to answer European-level research questions; and data owners, in order to give secure access to their data sources. Fact: DwB had no concrete idea of how to design and implement something like this. Reaction: we organized a workshop with experts from different disciplines (held April 29-30, 2014).

7 Workshop participants
Tim Mulcahy (NORC), Duncan Smith (University of Manchester), Paul Burton (University of Bristol), Amadou Gaye (University of Bristol), Claus Goran Hjelm (Statistics Sweden), Christian Boehme (GWDG), Thorsten Busert (DIPF), Gerald Mahlmeister (TBA21), Roxane Silberman (CNRS), Leo Engberts (CBS), Titus Purdea (Eurostat), Patricia Kelly Hall (Minnesota Population Center), Ørnulf Risnes (NSD), Andreas Nold (SAS), Lars Hvidberg (University of Southern Denmark), Gillian Raab (University of Edinburgh), Beata Nowok (University of Edinburgh), Alf Wachsmann (MDC for Molecular Medicine), Hans Irebäck (Statistics Sweden), Cosmin Basca (University of Zurich), Oliver Schmitt (GWDG), Peter-Paul de Wolf (CBS), Christoph Stallmann (University of Magdeburg), David Schiller (IAB), Anja Burghardt (IAB), Stefan Bender (IAB), Johanna Eberle (IAB), Thomas Rhein (IAB), Iris Dieterich (IAB), and Jörg Drechsler (IAB).

8 Outline
Need for harmonization of data sources. Trusted third party approaches. Examples of solutions from different disciplines. Big data and public/private sector cooperation. Conclusion and outlook.

9 Need for harmonization of data sources
Some statements: “From a technical point of view you can run analyses on distributed data sources, but the results are likely nonsense if the data is not harmonized.” “As a researcher I don’t care if data is marked as comparable. Access to data is always better than no access, and it is up to me to use the data in a useful way.” “The responsibility for output harmonization lies with the data provider but also with the researcher. However, the data provider should support the researcher by providing good (and understandable) data documentation!” “Harmonization of both microdata AND metadata is necessary.” “The VRE of EuRAN can be a secured environment to create international data sets.”
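What harmonization means in practice can be sketched as a recoding step that maps each provider's own category codes onto a common scheme before any pooled analysis. The site names, codes and the `SITE_RECODES` table below are invented for illustration; in a real project they would come from the providers' data documentation.

```python
from typing import Dict, List

# Hypothetical recoding tables: provider-specific education codes
# mapped onto one shared three-level scheme.
SITE_RECODES: Dict[str, Dict[str, str]] = {
    "site_a": {"1": "low", "2": "medium", "3": "high"},
    "site_b": {"P": "low", "S": "medium", "T": "high"},
}


def harmonize(site: str, raw_codes: List[str]) -> List[str]:
    """Translate a provider's raw codes into the shared scheme.

    Codes without a documented mapping raise an error instead of being
    passed through, because silently mixing schemes gives meaningless results.
    """
    mapping = SITE_RECODES[site]
    unmapped = sorted(set(raw_codes) - set(mapping))
    if unmapped:
        raise ValueError(f"{site}: no harmonized category for codes {unmapped}")
    return [mapping[code] for code in raw_codes]


harmonized_a = harmonize("site_a", ["1", "3", "2"])
harmonized_b = harmonize("site_b", ["T", "P"])
```

Failing loudly on undocumented codes reflects the point made in the statements: the recoding is only as good as the metadata that describes each source.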

10 Trusted third party approaches
Some kind of centralized infrastructure is needed in order to work with distributed data sources. If there is a trusted party, there can be a chance to move the data physically to a central place. Different levels of creating trust exist: Trust in an organization or person (example: Tim Mulcahy from NORC, who was able to create a data set out of business information of US rating agencies). Trust in secure (and sophisticated) organizational workflows (example: the German National Cohort, which works with confidential information about individuals coming from different sources).

11 Examples of solutions from different disciplines
Health Data (DataSHIELD) - Data Aggregation Through Anonymous Summary-statistics from Harmonized Individual levEL Databases. Governmental Data (Statistics Sweden and Statistics Netherlands) – e.g. federated solutions. Synthetic Data (University of Edinburgh) – creating non-confidential data to support research projects. Database and File systems (DIPF and GWDG) – adapting current solutions to fit the needs of social science research. Virtualization (SAS) – using approaches created for Big Data to solve data security issues. Statistical approaches (CBS) – adapting statistical methods to solve the challenges of analyzing confidential microdata in a secure way. Adaptive query processing over distributed linked data-endpoints (University of Zurich) – privacy preserving solution if adapted to social science needs. Need to analyze and compare approaches to support the needs of researchers and data providers.
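One idea shared by several of these approaches is that only aggregates are released, and only when they do not single out individuals. The following is a rough sketch of such an output check, not the API of DataSHIELD or of any other tool listed above. The function names and the minimum cell size of 5 are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

MIN_CELL_SIZE = 5  # assumed threshold; real disclosure rules differ by provider


def local_counts(categories: Iterable[str]) -> Counter:
    # Runs at the data provider; returns a frequency table, not records.
    return Counter(categories)


def release_pooled_table(
    site_counts: List[Counter], min_cell_size: int = MIN_CELL_SIZE
) -> Dict[str, Optional[int]]:
    """Pool the frequency tables and suppress cells below the threshold.

    Suppressed cells are returned as None. A real system would probably
    also check each provider's own table before it leaves the site.
    """
    pooled: Counter = Counter()
    for counts in site_counts:
        pooled.update(counts)
    return {
        category: (n if n >= min_cell_size else None)
        for category, n in pooled.items()
    }


# Hypothetical employment status held by two providers.
site_a = local_counts(["employed"] * 4 + ["unemployed"] * 2)
site_b = local_counts(["employed"] * 3 + ["unemployed"] * 1)
table = release_pooled_table([site_a, site_b])
```

Here the pooled "employed" cell (7) is released, while the "unemployed" cell (3) is suppressed because it falls below the assumed threshold.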

12 Big data and public/private sector cooperation
The use of Big Data for social science research is becoming an increasingly important topic. Besides evaluating the usefulness of Big Data, an infrastructure to work with Big Data is needed. MiCoCe and EuRAN can be adapted to work as such an infrastructure. Big Data is often hosted by private sector companies. Data sources from the public and private sectors need to be made available without harming the interests of either side. MiCoCe and EuRAN can work as a trusted infrastructure to enable access to public and private sector data.

13 Conclusion and Outlook
This was a first summary of the workshop output. A paper with more detailed discussions will be prepared by the workshop participants and will be (a small) part of the DwB output (European Data Access Forum, the final event of DwB (I), in March 2015). Creating running tools can be a task for one or more follow-up project(s) of DwB. Hopefully we will have good news on that at the 41st IASSIST in Minneapolis.

14 Thank you for your attention!
Anja Burghardt, David Schiller, This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no

