ROLE OF ANONYMIZATION FOR DATA PROTECTION Irene Schluender and Murat Sariyar (TMF)

Background
Individual-level data used in health research contexts:
- EHRs
- Hospital discharge databases
- Health insurance data
- Clinical studies
- Genetic datasets: 1000 Genomes, HapMap, TCGA, …

Legal reasons for anonymisation: Privacy
BASIC DATA PROTECTION CONSTRAINTS
- Legal framework: national law and EU law; there is no international harmonisation, only “soft law”, e.g. the Declaration of Helsinki
- EU Data Protection Directive: any processing of personal data is generally prohibited unless explicitly permitted (Article 7: “Member States shall provide that personal data may be processed only if: …”)
- Permission must be based on law or on the consent of the data subject
- Permission must refer to a specific purpose, which is difficult in “omics” research and data mining

ANONYMISATION AS “MAGIC BULLET”?
- Article 7: “Member States shall provide that personal data may be processed only if: …”
- Dichotomy of data protection law: anonymous versus personal data
- Only personal data is protected by law
- Anonymous data: no consent or other legal basis is needed for processing (Rec. 26: “the principles of protection shall not apply to data rendered anonymous”)
- Conclusion: anonymise to get rid of any data protection constraints!

Anonymisation vs. De-identification
- No HIPAA list!
- Removing all 18 identifiers yields de-identified data, not anonymous data!
- The list only makes sense within the context of HIPAA and cannot be transferred to the European legal framework.

Context and Trade-off
- Anonymisation is not static but depends on context knowledge
  - Example: “Harry Smith”
  - De facto anonymity is sufficient (“reasonable means”)
- Trade-off: usefulness versus re-identification risk
  - Information is reduced or distorted
  - Some of it may be relevant for research
- Challenges: enhanced re-identification technologies, increasing context knowledge

Anonymisation of genetic data?
- DNA sequences alone do not disclose the identity of an individual
- But they can provide enough information to single out a person
- Opinion on Anonymisation Techniques (Art. 29 Working Party): “Genetic data profiles are an example of personal data that can be at risk of identification if the sole technique used is the removal of the identity of the donor due to the unique nature of certain profiles. It has already been shown in the literature that the combination of publically available genetic resources (e.g. genealogy registers, obituary, results of search engine queries) and the metadata about DNA donors (time of donation, age, place of residence) can reveal the identity of certain individuals even if that DNA was donated ‘anonymously’.”

Side-effects (adverse events) of anonymisation
- Full (unlinked) anonymisation deprives donors of the possibility of exercising their right to withdraw consent (critical for biosamples/genetic data)
- It makes feeding back research results or incidental findings impossible
- It is not useful where research is linked to treatment (oncology: precision medicine)

Therefore …
- Anonymisation is not a panacea for all data protection issues
- We will have to rely on “broad consent” (for example as agreed with the German Ethics Committee’s Working Group) plus additional safeguards (access control etc.)
- Broad consent will (hopefully) be supported by the GDPR (Rec. 25aa)

The technical perspective

What is anonymization? ISO 29100:2011: “Anonymization is the process by which personally identifiable information (PII) is irreversibly altered in such a way that a PII principal can no longer be identified directly or indirectly, either by the PII controller alone or in collaboration with any other party.”

Relevant terms: Kinds of Attributes
(1) Unique identifiers (e.g., social security number)
(2) Quasi-identifiers (e.g., ZIP code), abbreviated QIDs
(3) Sensitive attributes (exhibiting a special characteristic)
(4) Non-sensitive attributes

Relevant terms: Quasi-Identifier
- OECD definition of a quasi-identifier: variable values or combinations of variable values within a dataset that are not structural uniques but might be empirically unique and therefore in principle uniquely identify a population unit.
- The set of QIDs should contain an attribute A if an attacker could potentially obtain A from other external resources.
- The QIDs (5-digit ZIP code, birth date, gender) uniquely identify 87% of the population in the U.S.
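To make the notion of empirical uniqueness concrete, the following minimal sketch computes the share of records that are unique on a chosen set of QIDs. The function name `unique_fraction`, the field names, and the four toy records are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter


def unique_fraction(records, qids):
    """Return the share of records whose QID combination occurs exactly once."""
    keys = [tuple(r[q] for q in qids) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical toy data; real datasets are far larger.
records = [
    {"zip": "55131", "birth": "1980-02-11", "sex": "F", "diagnosis": "asthma"},
    {"zip": "55131", "birth": "1980-02-11", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "55128", "birth": "1975-07-30", "sex": "M", "diagnosis": "flu"},
    {"zip": "60311", "birth": "1991-12-01", "sex": "F", "diagnosis": "asthma"},
]

# Two of the four records are unique on the full QID combination.
print(unique_fraction(records, ["zip", "birth", "sex"]))  # 0.5
# On sex alone, only the single male record is unique.
print(unique_fraction(records, ["sex"]))  # 0.25
```

The more attributes are combined, the larger the unique share tends to become, which is why attributes that look harmless in isolation still have to be treated as QIDs.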

Important Anonymization techniques
Generalization and Suppression (hide some details in the QIDs)
- Replace some values with a parent value in a taxonomy
- Full-domain and local (subtree, cell) generalization
- Suppression (see former slide)
Anatomization and Permutation (structural changes)
- De-associate the relationship between QIDs and sensitive attributes
- Partition into groups and shuffle sensitive values within each group
Perturbation
- Additive noise (randomization; independent of other records, hence applicable to data streams), data swapping, synthetic data generation
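Two of the techniques listed above, permutation within groups and additive noise, can be sketched in a few lines. This is an illustration under simplifying assumptions: groups are formed from consecutive records, the noise is Gaussian, and the names `permute_within_groups` and `add_noise` are hypothetical.

```python
import random


def permute_within_groups(records, sensitive, group_size, rng):
    """Shuffle the sensitive attribute inside consecutive groups of records.

    QID values stay in place, so the link between a QID combination and
    a sensitive value is only known up to the group.
    """
    out = [dict(r) for r in records]  # copy, leave the input untouched
    for start in range(0, len(out), group_size):
        group = out[start:start + group_size]
        values = [r[sensitive] for r in group]
        rng.shuffle(values)
        for r, v in zip(group, values):
            r[sensitive] = v
    return out


def add_noise(values, sigma, rng):
    """Add independent Gaussian noise to each value (record by record)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
recs = [{"zip": str(i), "diagnosis": d}
        for i, d in enumerate(["a", "b", "c", "d", "e"])]
shuffled = permute_within_groups(recs, "diagnosis", 2, rng)
noisy = add_noise([1.0, 2.0], 0.5, rng)
```

Because the noise for each record is drawn independently of all other records, a value can be perturbed as soon as it arrives, without waiting for the rest of the data.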

Anonymization techniques: Caveat
- These are criteria, not techniques: k-anonymity, l-diversity, t-closeness
- And there is no hierarchy among them!
- k-anonymity protects against identity disclosure
- l-diversity and t-closeness protect against attribute disclosure
- There are several definitions of l-diversity
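Since these are criteria, they can be checked on a table that has already been transformed. The sketch below measures k-anonymity as the smallest equivalence-class size and uses distinct l-diversity (the number of distinct sensitive values per class), which is only one of the several l-diversity definitions. Function names and the toy table are assumptions for illustration.

```python
from collections import defaultdict


def equivalence_classes(records, qids):
    """Group records that share the same QID combination."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in qids)].append(r)
    return classes


def k_anonymity(records, qids):
    """Largest k for which the table is k-anonymous (smallest class size)."""
    return min(len(c) for c in equivalence_classes(records, qids).values())


def distinct_l_diversity(records, qids, sensitive):
    """Smallest number of distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in c})
               for c in equivalence_classes(records, qids).values())


# Hypothetical, already generalized table.
table = [
    {"zip": "551**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "551**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "603**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "603**", "age": "20-29", "diagnosis": "diabetes"},
]

print(k_anonymity(table, ["zip", "age"]))                        # 2
print(distinct_l_diversity(table, ["zip", "age"], "diagnosis"))  # 1
```

The toy table is 2-anonymous, yet everyone in the first class has the same diagnosis, so an attacker learns the sensitive value without identifying a record. This is the attribute disclosure that k-anonymity alone does not prevent.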

Anonymization techniques: generalization
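A minimal sketch of full-domain generalization, assuming two simple taxonomies that are not taken from the slides: ZIP codes are generalized by masking trailing digits, and ages by mapping them to fixed-width bands. Masking all digits corresponds to suppressing the value. All names are hypothetical.

```python
def generalize_zip(zip_code, keep_digits):
    """Keep the leading digits and mask the rest, e.g. 55131 -> 551**."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)


def generalize_age(age, width):
    """Map an age to a band of the given width, e.g. 34 -> 30-39."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def full_domain_generalize(records, keep_digits, age_width):
    """Apply the same generalization level to every record (full-domain)."""
    return [
        {**r,
         "zip": generalize_zip(r["zip"], keep_digits),
         "age": generalize_age(r["age"], age_width)}
        for r in records
    ]


# Hypothetical toy data.
patients = [
    {"zip": "55131", "age": 34, "diagnosis": "flu"},
    {"zip": "55128", "age": 37, "diagnosis": "asthma"},
    {"zip": "60311", "age": 52, "diagnosis": "diabetes"},
]

generalized = full_domain_generalize(patients, keep_digits=3, age_width=10)
print(generalized[0])  # {'zip': '551**', 'age': '30-39', 'diagnosis': 'flu'}
```

After generalization the first two records share the QID combination (551**, 30-39), while the third remains unique and would need further generalization or suppression.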

Conclusion
- Creating an anonymous dataset while retaining as much of the underlying information as the task requires (usefulness) is done by technical means
- However, the legal perspective should correspond with the technical one (e.g., regarding the definition of sensitive attributes)

References
- B. C. M. Fung et al. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys.
- L. Sweeney. k-anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems.
- C. C. Aggarwal. Privacy-Preserving Data Mining: Models and Algorithms. Advances in Database Systems, Springer.