PhUSE Data De-identification Standard for CDISC SDTM IG 3.2, and EMA Policy 0070
Sherry Meeh, 15Sept2016
Integrated Data Analytics and Reporting, Janssen
Janssen Research & Development
Disclaimer
The views and opinions expressed in this slide deck and in the oral presentation belong to the presenter and do not necessarily reflect the views of Janssen Pharmaceutical Companies of Johnson & Johnson or the Integrated Data Analytics and Reporting organization. This presentation does not include any advice or counsel.
Topics
Background
PhUSE Data De-identification Standard for CDISC SDTM IG 3.2
–Structure
–Decisions and Rules
–Important Considerations
EMA Policy 0070
Data Flows
Reference
Q & A
Background
PhUSE Data Transparency Working Group: To meet transparency and data sharing needs
Launched PhUSE Data Transparency Working Group in July 2014
Goal: To drive an industry set of data de-identification rules that are based on the CDISC SDTM data model
Volunteers from:
–Pharmaceutical companies (J&J, GSK, Amgen, AstraZeneca, Novartis, Shire, Novo Nordisk, Roche, EISAI, ...)
–CROs (Accenture, d-Wise, ICON, Theorem, ...)
–Academia, privacy groups, software companies (SAS, IBM, ...)
Background
PhUSE Data Transparency Working Group: Established Data De-identification Standard for CDISC SDTM IG 3.2
Draft standard released for extended review in Feb2015
Janssen completed one of two pilot studies in March2015
PhUSE Data De-identification Standard for CDISC SDTM IG 3.2 was released on 15May2015: http://www.phuse.eu/data_transparency.aspx
EMA endorsement in EMA Policy 0070 (European Medicines Agency policy on publication of clinical data for medicinal products for human use), 02March2016, as guidance and standards on the anonymisation of personal data
PhUSE Data De-identification Standard for SDTM IG 3.2
Aims to provide practical guidance to:
Facilitate the identification of direct and quasi identifiers with 3 levels of data privacy assessment:
–Direct identifiers
–Level 1 quasi identifiers
–Level 2 quasi identifiers
Provide rules with technical guidelines (primary and alternative)
Ensure consistency across sponsors
Provide guidance on estimating residual risk in de-identified data
PhUSE Data De-identification Standard for SDTM IG 3.2
Every domain and variable defined in SDTM IG 3.2 is assessed.
PhUSE Data De-identification Standard for SDTM IG 3.2 - Structure
Cover Sheet
–Version and Date
Introduction Sheet
–Background, Disclaimer, Important Considerations, etc.
Definition Sheet
–Direct Identifier: "One or more direct identifiers can be used to uniquely identify an individual. E.g. Subject ID, Social Security Number, Telephone number, Exact address, etc. It is compulsory to remove or de-identify any direct identifier."
–Quasi Identifier Level 1: "Information that is not likely to change over time, be visible and available in other sources. Typically Demographic information. E.g. Sex, Age at baseline, Country, BMI, etc."
–Quasi Identifier Level 2: "Longitudinal information that is likely to change over time. E.g. Measurements, Events, etc."
PhUSE Data De-identification Standard for SDTM IG 3.2 - Structure (cont.)
Decision Sheet
–Important areas with rationale for decisions: Dates, Low Frequency, etc.
Rules Sheet
–Principal rules for data de-identification: Keep, Remove, Elevate to continent, Derive Age, etc.
SDTMIG Sheet
–SDTM variables (1300+) listed in SDTMIG with their assessment for data de-identification
References Sheets
–Reference documents used for establishing the standard: SDTM IG, HIPAA Privacy Rule, etc.
Appendices Sheet
–Details regarding: Dates Offsetting Method, and Low Frequency.
PhUSE Data De-identification Standard for SDTM IG 3.2 - Decisions and Rules
ID variables
–Create a new random unique subject ID that is not made up of any identifiable information. Site numbers must not be replicated in the recoded subject IDs. The list of original subject IDs and the recoded ones must not have any values in common. The same recoded subject ID must be used in extension study data. If the same subject is part of several studies within the same request, consider providing the same subject ID. If excluded patients (protocol inclusion/exclusion criteria) are deleted, this must be documented.
–Reference ID and Sponsor ID may be constructed using CRF page or laboratory sample numbers (Direct Identifiers) and are recommended to be recoded, while Group ID and Link ID are usually numbers derived by the sponsor and can be kept as-is. If only operational and not used to link data in other datasets, they can be removed.
Geographical Location
–Country is an important Level 1 Quasi Identifier. To decrease residual risk by default (i.e. in the case of proactive data de-identification), the primary rule is to aggregate country to continent unless country is critical to the analysis (e.g. Country is a fixed-effect factor in a statistical model and the results cannot be reproduced without it). The alternative rule is to keep country as-is.
–Site ID and Investigator ID need to be removed because a frequency analysis would likely reveal the most highly recruiting site in a country/region (which by definition would include many of the participants). The alternative rule is to recode Site ID and Investigator ID if required for the analysis; in this case, it should be considered within the risk assessment.
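The subject ID rules on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the PhUSE standard: the function name recode_subject_ids, the 8-digit ID format, the example IDs, and the in-memory mapping are all hypothetical. The sketch draws IDs at random rather than deriving them from site numbers, rejects any candidate that matches an original ID, and reuses an existing mapping so that an extension study gets the same recoded IDs. A stricter reading of the site-number rule (no site number appearing anywhere inside a recoded ID) would need an extra check.

```python
import secrets


def recode_subject_ids(original_ids, existing_mapping=None, width=8):
    """Return a dict mapping each original subject ID to a random recoded ID.

    Hypothetical helper: recoded IDs are random digit strings, so they are
    not built from site numbers or other identifiable information.
    """
    mapping = dict(existing_mapping or {})
    originals = set(original_ids) | set(mapping)
    used = set(mapping.values())
    for sid in original_ids:
        if sid in mapping:
            continue  # same subject keeps its recoded ID (e.g. extension study)
        while True:
            candidate = "".join(secrets.choice("0123456789") for _ in range(width))
            # Original and recoded ID lists must not share any value.
            if candidate not in originals and candidate not in used:
                break
        mapping[sid] = candidate
        used.add(candidate)
    return mapping


# Made-up IDs for illustration only.
core = recode_subject_ids(["101-0001", "101-0002", "102-0001"])
extension = recode_subject_ids(["101-0002", "103-0007"], existing_mapping=core)
```

In practice the mapping itself is sensitive and would have to be stored under the sponsor's own access controls; that part is outside this sketch.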
PhUSE Data De-identification Standard for SDTM IG 3.2 - Decisions and Rules (cont.)
Age and Aggregate Age
–"Date of Birth is removed and Age must be kept as-is for patients up to 89 years old included and redacted (in this case replaced by "." in numeric variable AGE in DM dataset) for patients above 89 years old and set to missing when not available."
–If the distribution of age in the dataset includes low-frequency values or the overall residual risk requires further de-identification, further age aggregation may be applied by aggregating age in different intervals (e.g. "20-25 YEARS", "25-30 YEARS", etc. or "20-22 YEARS", "22-24 YEARS", etc.) within variable AGECATDI in DM, with values in capital letters, and the variable AGE in DM must be deleted. The interval length will depend on the data and must be established by the sponsor.
Date variables
–Offset dates (see appendix 1 for detailed algorithm)
–Relative date
Free-text data
–In general, free-text data must be deleted, as free-text data is considered to be a Direct Identifier: it can hold any data, including personally identifiable information (PII).
–Coded values are recommended instead of free text
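The age and date rules on this slide can be illustrated with a short Python sketch. The appendix 1 offsetting algorithm is not reproduced in this deck, so the sketch assumes one common approach: a single random offset per subject, applied to all of that subject's dates so that the intervals between events are preserved. The function names, the offset range of 1 to 365 days, and the visit dates are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def deidentify_age(age, cap=89):
    """Keep AGE as-is up to `cap` inclusive; return None (missing) otherwise."""
    if age is None or age > cap:
        return None
    return age


def offset_dates(dates, offset_days):
    """Shift every date of one subject by the same number of days."""
    return [d + timedelta(days=offset_days) for d in dates]


def relative_days(dates, reference):
    """Express dates as day counts from a reference date (e.g. first visit)."""
    return [(d - reference).days for d in dates]


# Made-up visit dates for a single subject.
visits = [date(2015, 3, 2), date(2015, 3, 16), date(2015, 4, 13)]

rng = random.Random(2016)      # seeded only to keep the illustration repeatable
offset = -rng.randint(1, 365)  # assumed range; one offset per subject
shifted = offset_dates(visits, offset)

# Intervals between visits are unchanged by the shift.
assert relative_days(shifted, shifted[0]) == relative_days(visits, visits[0])
```

Relative dates drop the calendar anchor entirely, while offsetting keeps real-looking dates; either way the spacing between a subject's events survives for analysis.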
PhUSE Data De-identification Standard for SDTM IG 3.2 - Important Considerations
Sponsor's responsibility
–A secure portal where the download and upload of data is controlled. For example, Janssen utilizes the Yale Open Data Access (YODA) portal
–Data sharing agreements are signed between sponsors and researchers
–The researchers have privacy practices in place at their institution
–The rules described here do not guarantee an acceptable or very small residual risk of re-identification in the data. It is the responsibility of the sponsors to define and measure the residual risk and to define an acceptable risk threshold.
–SDTM is a normalized data model, so not all direct or quasi identifiers may be captured in this deliverable. It is the responsibility of the sponsor to ensure that such an assessment is conducted and reviewed according to defined internal procedures
–Redacted documents (e.g. CSR) disclosed together with the de-identified data or made public (or semi-public) must be redacted consistently, in the same manner as the data
PhUSE Data De-identification Standard for SDTM IG 3.2 - Sample
Finding Picasso
Subject ID  Birth Date  Age  Sex  Country
001         10JAN1960   56   F    UK
002         09MAR1954   62   M    USA
003         25OCT1881   135  M    Spain
004         06JUNE1945  71   F    France
005         20OCT1912   104  M    UK
006         12DEC1907   109  M    France
007         12NOV1943   73   F    USA
008         18AUG1950   66   M    Germany
009         09SEPT1919  97   F    Spain
PhUSE Data De-identification Standard for SDTM IG 3.2 - Sample
Finding Picasso
DeID Subject ID  Birth Date  DeID Age  Sex  De-ID Country
991                          55-59     F    Europe
992                          60-64     M    NA
993                          >89       M    Europe
994                          70-74     F    Europe
995                          >89       M    Europe
996                          >89       M    Europe
997                          70-74     F    NA
998                          65-69     M    Europe
999                          >89       F    Europe
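The two sample tables are consistent with 5-year age bands capped at 89 and with country aggregated to continent. The Python sketch below reproduces the DeID Age and De-ID Country columns from the original rows. The band width is inferred from the sample rather than stated in the deck, and the names age_category and CONTINENT are hypothetical. Subject ID recoding and removal of Birth Date are left out.

```python
# Country-to-continent lookup covering only the countries in the sample.
CONTINENT = {"UK": "Europe", "France": "Europe", "Spain": "Europe",
             "Germany": "Europe", "USA": "NA"}


def age_category(age, width=5, cap=89):
    """Return a band such as '55-59', or '>89' for ages above the cap."""
    if age > cap:
        return f">{cap}"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# (age, sex, country) rows copied from the sample slide, in subject order.
sample = [(56, "F", "UK"), (62, "M", "USA"), (135, "M", "Spain"),
          (71, "F", "France"), (104, "M", "UK"), (109, "M", "France"),
          (73, "F", "USA"), (66, "M", "Germany"), (97, "F", "Spain")]

deid = [(age_category(age), sex, CONTINENT[country])
        for age, sex, country in sample]

for row in deid:
    print(*row)
```

After banding, the 135-year-old subject born in Spain shares the values ">89", "M", "Europe" with two other subjects, which is the point of the "Finding Picasso" example.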
EMA Policy 0070
External guidance on the implementation of the European Medicines Agency policy on the publication of clinical data for medicinal products for human use, 02Mar2016
Composed of two phases:
–Phase 1 of Policy 0070 entered into force on 1st January 2015. Phase 1 pertains to publication of clinical reports only
–Phase 2, which will be implemented at a later stage, pertains to the publishing of individual patient data (IPD)
Endorsed the PhUSE De-ID standard as guidance and standards on the anonymisation of personal data
PhUSE Working Group: De-Identification Data Flow Scenarios
[Diagram: three data flows; each step is performed by code]
Clinical Data Flow: Raw Data, SDTM, ADaM, TFLs (CSR)
Current: DeID SDTM and ADaM together (outputs: Anonymized SDTM and ADaM, Anonymized CSR (Listings))
Future: DeID at SDTM (outputs: Anonymized SDTM, Anonymized ADaM, Anonymized CSR (Listings))
The Working Group participants for their contribution
Vinitha Arumugam & Patricia Coyle (GSK)
Jean-Marc Ferran (Qualiance and PhUSE)
Nancy Freidland (IBM)
Per-Arne Stahl (AstraZeneca)
Nick Dedonder (BDLS)
Gene Lightfoot (SAS)
Sherry Meeh (Johnson & Johnson)
Cathal Gallagher (d-Wise)
Jacques Lanoue & Benoit Vernay (Novartis)
Kim Musgrave (Amgen)
Nate Freimark (Theorem)
Joanna Koft (Biogen Idec)
Gary Chen (Shire)
Khaled El Emam (Privacy Analytics)
Jennifer Chin (EISAI)
Carol Herremans (Merck)
Beate Hientzsch & Sven Greiner (Accovion)
Kishore Papineni, Thijs van den Hoven & Bharat Jaswani (Astellas)
Kelly Mewes (Roche)
Kristin Kelly (Accenture)
Sarah Nolan (Liverpool University & Cochran)
Boris Grimm (BI)
Shafi Chowdury (Shafi Consultancy)
Ravi Yandamuri (MMS Holdings)
The Janssen Pilot Study participants for their contribution
Denis Michel
Bin Zhou
Hongwen Gao
Thank You!
References
PhUSE Data De-identification Standard for CDISC SDTM IG 3.2
–http://www.phuse.eu/data_transparency.aspx
EMA Policy 0070
–http://www.ema.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2016/03/WC500202621.pdf
Q&A Integrated Data Analytics and Reporting Janssen