PhUSE Data De-identification Standard for CDISC SDTM IG 3.2


1 PhUSE Data De-identification Standard for CDISC SDTM IG 3.2 and EMA Policy 0070
15Sept2016
Sherry Meeh, Integrated Data Analytics and Reporting, Janssen

2 Disclaimer The views and opinions expressed in this slide deck and in the oral presentation belong to the presenter and do not necessarily reflect the views of Janssen Pharmaceutical Companies of Johnson & Johnson or the Integrated Data Analytics and Reporting organization. This presentation does not include any advice or counsel.

3 Topics
- Background
- PhUSE Data De-identification Standard for CDISC SDTM IG 3.2: Structure; Decisions and Rules; Important Considerations
- EMA Policy 0070
- Data Flows
- Reference
- Q & A

4 Background: PhUSE Data Transparency Working Group
- To meet transparency and data sharing needs, the PhUSE Data Transparency Working Group was launched in July 2014.
- Goal: to drive an industry set of data de-identification rules based on the CDISC SDTM data model.
- Volunteers from:
  - Pharmaceutical companies (J&J, GSK, Amgen, AstraZeneca, Novartis, Shire, Novo Nordisk, Roche, EISAI, …)
  - CROs (Accenture, d-Wise, ICON, Theorem, …)
  - Academia, privacy groups, and software companies (SAS, IBM, …)

5 Background: PhUSE Data Transparency Working Group
The working group established the Data De-identification Standard for CDISC SDTM IG 3.2:
- Draft standard released for extended review in Feb2015
- Janssen completed one of the two pilot studies in March2015
- PhUSE Data De-identification Standard for CDISC SDTM IG 3.2 released on 15May2015
- Endorsed by the EMA in EMA Policy 0070 (European Medicines Agency policy on publication of clinical data for medicinal products for human use), 02March2016, as guidance and standards on the anonymisation of personal data

6 PhUSE Data De-identification Standard for SDTM IG 3.2
Aim: to provide practical guidance to
- Facilitate the identification of direct and quasi identifiers, with 3 levels of data privacy assessment:
  - Direct identifiers
  - Level 1 quasi identifiers
  - Level 2 quasi identifiers
- Provide rules with technical guidelines (primary and alternative)
- Ensure consistency across sponsors
- Provide guidance on estimating the residual risk in de-identified data

7 PhUSE Data De-identification Standard for SDTM IG 3.2
Every domain and variable defined in SDTM IG 3.2 is assessed.

8 PhUSE Data De-identification Standard for SDTM IG 3.2 - Structure
- Cover Sheet: version and date
- Introduction Sheet: background, disclaimer, important considerations, etc.
- Definition Sheet:
  - Direct Identifier: "One or more direct identifiers can be used to uniquely identify an individual. E.g. Subject ID, Social Security Number, Telephone number, Exact address, etc. It is compulsory to remove or de-identify any direct identifier."
  - Quasi Identifier Level 1: "Information that is not likely to change over time, be visible and available in other sources. Typically Demographic information. E.g. Sex, Age at baseline, Country, BMI, etc."
  - Quasi Identifier Level 2: "Longitudinal information that is likely to change over time. E.g. Measurements, Events, etc."

9 PhUSE Data De-identification Standard for SDTM IG 3.2 - Structure (cont.)
- Decision Sheet: important areas, with the rationale for each decision: dates, low frequency, etc.
- Rules Sheet: principal rules for data de-identification: Keep, Remove, Elevate to continent, Derive Age, etc.
- SDTMIG Sheet: the SDTM variables (1300+) listed in SDTMIG, with their assessment for data de-identification
- References Sheet: reference documents used to establish the standard: SDTM IG, HIPAA Privacy Rule, etc.
- Appendices Sheet: details on the dates offsetting method and on low frequency
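The Rules Sheet pairs each SDTM variable with one rule. A minimal Python sketch of applying such a sheet to a single DM record follows. The `DM_RULES` excerpt, the rule names and the `apply_rules` helper are hypothetical illustrations, not copied from the standard, and only four rule types are covered (Derive Age is not). Raising an error for a variable that has no rule reflects the idea that every variable must be assessed before data is released.

```python
# Hypothetical excerpt of a rules sheet for DM (variable -> rule);
# the real SDTMIG sheet assesses 1300+ variables.
DM_RULES = {
    "STUDYID": "KEEP",
    "USUBJID": "RECODE",
    "SITEID": "REMOVE",
    "INVID": "REMOVE",
    "BRTHDTC": "REMOVE",
    "SEX": "KEEP",
    "COUNTRY": "ELEVATE_TO_CONTINENT",
}


def apply_rules(record, rules, id_map, continent_of):
    """Return a de-identified copy of one record (a dict of variable -> value).

    id_map maps original subject IDs to recoded ones and continent_of maps
    country codes to continents; both are supplied by the caller.
    """
    out = {}
    for var, value in record.items():
        rule = rules.get(var)
        if rule is None:
            # Refuse to release a variable that nobody has assessed.
            raise KeyError(f"{var} has no de-identification rule")
        if rule == "KEEP":
            out[var] = value
        elif rule == "REMOVE":
            continue  # the variable is dropped from the output
        elif rule == "RECODE":
            out[var] = id_map[value]
        elif rule == "ELEVATE_TO_CONTINENT":
            out[var] = continent_of[value]
        else:
            raise ValueError(f"unknown rule {rule!r} for {var}")
    return out


if __name__ == "__main__":
    dm = {"STUDYID": "ABC", "USUBJID": "ABC-001", "SITEID": "01",
          "BRTHDTC": "1960-01-10", "SEX": "F", "COUNTRY": "GBR"}
    print(apply_rules(dm, DM_RULES, {"ABC-001": "991"}, {"GBR": "EUROPE"}))
```

Keeping the rules as data rather than code means the same engine can be reused across studies, with only the rules table reviewed per release.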

10 PhUSE Data De-identification Standard for SDTM IG 3.2 - Decisions and Rules
ID variables
- Create a new random unique subject ID that is not made up of any identifiable information.
- Site numbers must not be replicated in the recoded subject IDs.
- The lists of original and recoded subject IDs must not have any values in common.
- The same recoded subject ID must be used in extension study data.
- If the same subject is part of several studies within the same request, consider providing the same subject ID.
- If excluded patients (protocol inclusion/exclusion criteria) are deleted, this must be documented.
- Reference ID and Sponsor ID may be constructed using CRF page or laboratory sample numbers (direct identifiers) and are recommended to be recoded, while Group ID and Link ID are usually numbers derived by the sponsor and can be kept as-is. If these IDs are only operational and not used to link data in other datasets, they can be removed.
Geographical Location
- Country is an important Level 1 quasi identifier. To decrease residual risk by default (i.e. in the case of proactive data de-identification), the primary rule is to aggregate country to continent unless country is critical to the analysis (e.g. country is a fixed-effect factor in a statistical model and the results could not otherwise be reproduced). The alternative rule is to keep country as-is.
- Site ID and Investigator ID need to be removed, because a frequency analysis would likely reveal the most highly recruiting site in a country/region (which by definition would include many of the participants). The alternative rule is to recode Site ID and Investigator ID if they are required for the analysis; in that case, this should be considered within the risk assessment.
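The ID and geography rules can be sketched in Python as follows, assuming subject IDs are strings and that COUNTRY holds ISO 3166-1 alpha-3 codes (the coding SDTM uses for DM.COUNTRY). `recode_subject_ids`, `country_to_continent` and the partial lookup table are hypothetical; a real mapping would have to cover every country in the study.

```python
import random

# Hypothetical, deliberately partial lookup from ISO 3166-1 alpha-3 codes
# to continent labels; a real study needs a complete, documented mapping.
CONTINENT_BY_COUNTRY = {
    "GBR": "EUROPE", "FRA": "EUROPE", "DEU": "EUROPE", "ESP": "EUROPE",
    "USA": "NORTH AMERICA", "CAN": "NORTH AMERICA",
}


def recode_subject_ids(original_ids, existing=None, id_width=6, seed=None):
    """Map each original subject ID to a new random numeric ID.

    New IDs contain no site or subject digits, are unique, and never equal
    any original ID. Passing a parent study's mapping as `existing` reuses
    its IDs, e.g. for extension-study data.
    """
    # A seeded generator is reproducible (useful for testing only).
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    mapping = dict(existing or {})
    originals = set(original_ids) | set(mapping)
    used = set(mapping.values())
    low, high = 10 ** (id_width - 1), 10 ** id_width
    if len(originals) + len(used) > (high - low) // 2:
        raise ValueError("id_width is too small for this many subjects")
    for old in sorted(set(original_ids)):
        if old in mapping:
            continue  # keep the ID already assigned
        new = str(rng.randrange(low, high))
        while new in used or new in originals:
            new = str(rng.randrange(low, high))
        used.add(new)
        mapping[old] = new
    return mapping


def country_to_continent(country):
    """Elevate a country code to its continent; fail loudly if unmapped."""
    try:
        return CONTINENT_BY_COUNTRY[country]
    except KeyError:
        raise ValueError(f"no continent mapping for {country!r}") from None
```

The returned old-to-new mapping is itself a re-identification key, so it would stay with the sponsor and never travel with the shared data. Failing on an unmapped country, rather than silently keeping it, avoids leaking the finer-grained value by accident.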

11 PhUSE Data De-identification Standard for SDTM IG 3.2 - Decisions and Rules (cont.)
Age and Aggregate Age
- "Date of Birth is removed and Age must be kept as-is for patients up to 89 years old included and redacted (in this case replaced by "." in numeric variable AGE in DM dataset) for patients above 89 years old and set to missing when not available."
- If the age distribution in the dataset includes low frequencies, or the overall residual risk requires further de-identification, age may be further aggregated into intervals (e.g. "20-25 YEARS", "25-30 YEARS", etc. or "20-22 YEARS", "22-24 YEARS", etc.) in variable AGECATDI in DM, with values in capital letters; the variable AGE in DM must then be deleted. The interval length depends on the data and must be established by the sponsor.
Date variables
- Offset dates (see Appendix 1 for the detailed algorithm)
- Relative dates
Free-text data
- In general, free-text data must be deleted, because free text is considered a direct identifier: it can hold any data, including personally identifiable information (PII).
- Using coded values instead of free text is recommended.
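The age and date rules can be sketched in Python as below. The 5-year default band width, the ">89 YEARS" label and the 30-day maximum offset are assumptions for illustration: the standard leaves the interval length to the sponsor, and its Appendix 1 offsetting algorithm is not reproduced in this presentation, so `subject_offsets` shows a generic per-subject random shift rather than that algorithm. `study_day` follows the SDTM --DY convention (the reference date is day 1, there is no day 0) for the relative-date alternative. All function names are hypothetical.

```python
import random
from datetime import date, timedelta

MAX_KEPT_AGE = 89  # ages above this are redacted


def redact_age(age):
    """AGE kept as-is up to and including 89; missing (None) above 89 or if unknown."""
    if age is None or age > MAX_KEPT_AGE:
        return None
    return age


def age_category(age, width=5):
    """Hypothetical AGECATDI value: a fixed-width band in capital letters."""
    if age is None:
        return None
    if age > MAX_KEPT_AGE:
        return ">89 YEARS"
    lower = (age // width) * width
    upper = min(lower + width - 1, MAX_KEPT_AGE)  # never let a band cross 89
    return f"{lower}-{upper} YEARS"


def subject_offsets(subject_ids, max_days=30, seed=None):
    """Draw one non-zero random shift (in days) per subject."""
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    return {
        sid: rng.randint(1, max_days) * rng.choice((-1, 1))
        for sid in subject_ids
    }


def offset_date(d, offset_days):
    """Shift a date; reusing one offset for all of a subject's dates keeps intervals."""
    return d + timedelta(days=offset_days)


def study_day(d, reference):
    """Relative day per the SDTM --DY convention: reference date is day 1, no day 0."""
    delta = (d - reference).days
    return delta + 1 if delta >= 0 else delta


if __name__ == "__main__":
    shift = subject_offsets(["991"], seed=7)["991"]
    first, last = date(2015, 3, 1), date(2015, 3, 15)
    # The 14-day interval survives the shift.
    print((offset_date(last, shift) - offset_date(first, shift)).days)  # 14
    print(age_category(56), age_category(97), redact_age(97))
```

Applying one offset per subject, rather than per date, is what keeps durations such as time to onset analysable after de-identification.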

12 PhUSE Data De-identification Standard for SDTM IG 3.2 - Important Considerations
Sponsor's responsibility:
- A secure portal where the download and upload of data is controlled (for example, Janssen uses the Yale Open Data Access (YODA) portal)
- Data sharing agreements are signed between sponsors and researchers
- The researchers have privacy practices in place at their institution
- The rules described here do not guarantee an acceptable or very small residual risk of re-identification in the data. It is the responsibility of the sponsors to define and measure the residual risk and to define an acceptable risk threshold.
- SDTM is a normalized data model: not all direct or quasi identifiers may be captured in this deliverable, and it is the responsibility of the sponsor to ensure that such an assessment is conducted and reviewed according to defined internal procedures.
- Documents (e.g. the CSR) disclosed together with the de-identified data or made public (or semi-public) must be redacted in the same manner and consistently.
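Because the sponsor must measure residual risk against its own threshold, one common way to find low-frequency records is to count how many subjects share each combination of quasi-identifier values (an equivalence class, as in k-anonymity). The Python sketch below shows that generic technique; it is not the measure prescribed by the standard, and the threshold `k`, the function names and the sample variable names are assumptions. Records are assumed to be one per subject.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count the subjects sharing each combination of quasi-identifier values."""
    return Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)


def low_frequency_groups(records, quasi_identifiers, k):
    """Return combinations shared by fewer than k subjects (re-identification hot spots)."""
    counts = equivalence_classes(records, quasi_identifiers)
    return {combo: n for combo, n in counts.items() if n < k}


def max_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest equivalence class (0.0 if no records)."""
    counts = equivalence_classes(records, quasi_identifiers)
    return 1 / min(counts.values()) if counts else 0.0


if __name__ == "__main__":
    # Hypothetical de-identified DM rows, one per subject.
    dm = [
        {"AGECATDI": "55-59 YEARS", "SEX": "F", "COUNTRY": "EUROPE"},
        {"AGECATDI": "55-59 YEARS", "SEX": "F", "COUNTRY": "EUROPE"},
        {"AGECATDI": "60-64 YEARS", "SEX": "M", "COUNTRY": "NORTH AMERICA"},
    ]
    qi = ["AGECATDI", "SEX", "COUNTRY"]
    print(low_frequency_groups(dm, qi, k=2))  # the lone 60-64 YEARS male
    print(max_risk(dm, qi))  # 1.0, because one subject is unique
```

Groups flagged this way are candidates for wider age bands or further aggregation before release.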

13 PhUSE Data De-identification Standard for SDTM IG 3.2 - Sample
Finding Picasso
Subject ID | Birth Date | Age | Sex | Country
001 | 10JAN1960 | 56 | F | UK
002 | 09MAR1954 | 62 | M | USA
003 | 25OCT1881 | 135 | | Spain
004 | 06JUNE1945 | 71 | | France
005 | 20OCT1912 | 104 | |
006 | 12DEC1907 | 109 | |
007 | 12NOV1943 | 73 | |
008 | 18AUG1950 | 66 | | Germany
009 | 09SEPT1919 | 97 | |

14 PhUSE Data De-identification Standard for SDTM IG 3.2 - Sample
Finding Picasso
DeID Subject ID | Birth Date | DeID Age | Sex | De-ID Country
991 | | 55-59 | F | Europe
992 | | 60-64 | M | NA
993 | | >89 | |
994 | | 70-74 | |
995 | | | |
996 | | | |
997 | | | |
998 | | 65-69 | |
999 | | | |

15 EMA Policy 0070
- External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use, 02Mar2016
- Composed of two phases:
  - Phase 1 of Policy 0070 entered into force on 1st January and pertains to the publication of clinical reports only
  - Phase 2, which will be implemented at a later stage, pertains to the publication of individual patient data (IPD)
- Endorsed the PhUSE De-ID standard as guidance and standards on the anonymisation of personal data

16 PhUSE Working Group: De-Identification Data Flow Scenarios
[Diagram: data flows from Raw Data through SDTM and ADaM to TFLs (CSR), with CODE at each step. Panels: Clinical Data Flow; Current: DeID SDTM and ADaM together (anonymized SDTM and ADaM, anonymized CSR listings); Future: DeID at SDTM (anonymized SDTM, then anonymized ADaM and anonymized CSR listings produced by code).]

17 Thank You! To the Working Group participants for their contribution:
Vinitha Arumugam & Patricia Coyle (GSK); Jean-Marc Ferran (Qualiance and PhUSE); Nancy Freidland (IBM); Per-Arne Stahl (AstraZeneca); Nick Dedonder (BDLS); Gene Lightfoot (SAS); Sherry Meeh (Johnson & Johnson); Cathal Gallagher (d-Wise); Jacques Lanoue & Benoit Vernay (Novartis); Kim Musgrave (Amgen); Nate Freimark (Theorem); Joanna Koft (Biogen Idec); Gary Chen (Shire); Khaled El Emam (Privacy Analytics); Jennifer Chin (EISAI); Carol Herremans (Merck); Beate Hientzsch & Sven Greiner (Accovion); Kishore Papineni, Thijs van den Hoven & Bharat Jaswani (Astellas); Kelly Mewes (Roche); Kristin Kelly (Accenture); Sarah Nolan (Liverpool University & Cochran); Boris Grimm (BI); Shafi Chowdury (Shafi Consultancy); Ravi Yandamuri (MMS Holdings)
To the Janssen Pilot Study participants for their contribution: Denis Michel, Bin Zhou, Hongwen Gao

18 References
- PhUSE Data De-identification Standard for CDISC SDTM IG 3.2
- EMA Policy 0070 (_and_procedural_guideline/2016/03/WC pdf)

19 Q&A Integrated Data Analytics and Reporting Janssen

