Data Anonymisation and Linkage



2 Data Anonymisation and Linkage
Alison Bell, Senior Data Analyst / Programmer, Health Informatics Centre (HIC), University of Dundee

3 What is HIC?
The Health Informatics Centre (HIC) is a partnership between the University of Dundee, NHS Tayside and the Information Services Division of NHS National Services (ISD). It is a shared research resource with strong scientific traditions, built on MEMO work since the early 1980s.
HIC provides authorised researchers and others with anonymised extracts of information derived from person-specific data sets captured by the NHS, University of Dundee researchers and others, to help them answer research questions and address important quality and patient safety issues.

4 HIC Structures
Staff and facilities managed by HIC Executive
User input: HIC User Group
Governance:
  Confidentiality & Privacy Advisory Committee (HICCPAC)
  Users Forum
  Annual External Audit

5 Issues that HIC addresses
Governance: linkage then anonymisation carried out in NHS domain
Trust in access to NHS data through approved SOPs, Privacy Advisory Committee, “Clinical Information Bureau”
Deterministic linkage via single patient identifier
Continually improving data quality through clinical use of data & HIC Users’ Group
Ecological fallacy: person, not practice, based data

6 Information governance
Physical security:
  Isolation of servers holding identifiable data and staff working with it
  Reliable backup and recovery mechanisms
  Separation of functions on NHSNet, JANET
Governed by Confidentiality & Privacy Advisory Committee:
  Members include lawyer, GP, Caldicott Guardians, Director Public Health
Management tools:
  Standard Operating Procedure
  Adverse incident reporting mechanism on intranet
  Project management system enforces SOP
  Annual external audit by information security experts & table of issues reviewed monthly by HIC Exec

7 HIC Standard Operating Procedure
Covers:
  Acquisition & anonymisation of datasets
  Requesting access to data
  Project level anonymisation (Pro-CHI)
  Release & archival of datasets
  Reversal of anonymisation
Includes:
  Definitions
  Appendix summarising 8 data protection principles
  Declaration & signature
HIC has Caldicott & Ethics approval to supply anonymised data to approved research projects

8 HIC project management system
Allocates each project a unique ID
Captures:
  Identity & contact details of “approved researcher”
  Project funder
  Project abstract
  Copies of approval from Ethics & Caldicott (if required), NHS R&D, protocol
  Data sources and versions
  Exact syntax used to generate & link data extracts
  Audit trail of all data releases
  Exact location of archived datasets once project complete


10 Available HIC Data
HIC hosts a large number of Tayside data sets received from various sources (ISD, PSD, GRO, Ninewells Labs etc.)
These cover various populations and time periods, and use a variety of coding systems
Each of these patient-specific data sets contains the patient CHI number, allowing linkage across multiple data sets
HIC currently has approval to provide Tayside data only, but is seeking to extend to Fife & Glasgow soon

11 How data are linked and anonymised
[Diagram: data flow across three domains: Data Provider (mainly NHS), Clinical Information Bureau, Academia. Column headings: CHI labelled data; CHI labelled data; Fully anonymised but linked data.]
Paper prescription (ID): find and enter CHI, giving drug data labelled with CHI
Lab result (ID): find CHI, giving lab data labelled with CHI
Link using CHI, giving drug data and lab data labelled with CHI
Delete CHI and add Pro-CHI, giving drug data and lab data without CHI
Analysis
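The "Link using CHI" step on this slide is a deterministic join on one shared identifier. A minimal sketch of that idea follows; the record layout, the field names (`chi`, `drug`, `result`) and the CHI values are made up for illustration and are not taken from HIC's systems.

```python
from collections import defaultdict

def link_by_chi(drug_rows, lab_rows):
    """Group drug and lab records under their shared CHI (deterministic linkage)."""
    linked = defaultdict(lambda: {"drugs": [], "labs": []})
    for row in drug_rows:
        linked[row["chi"]]["drugs"].append(row["drug"])
    for row in lab_rows:
        linked[row["chi"]]["labs"].append(row["result"])
    return dict(linked)

# Made-up CHI values and records, for illustration only.
drug_rows = [
    {"chi": "0000000001", "drug": "simvastatin"},
    {"chi": "0000000002", "drug": "metformin"},
]
lab_rows = [
    {"chi": "0000000001", "result": "cholesterol 5.2"},
]
linked = link_by_chi(drug_rows, lab_rows)
```

Because every data set carries the same identifier, no fuzzy matching on names or dates of birth is needed; a patient who appears in only one data set simply ends up with an empty list for the other.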

12 Anonymisation Process
Every research dataset has its own project level anonymisation (Pro-CHI) applied to the data before being released to a researcher.
Purpose-written software generates the Pro-CHI based on the Project Management unique ID & the CHI:
  A 3-letter alphabetic code is generated from the PM ID (converted to base 26), eg. 165 translates to agj
  The last 7 digits are randomly generated
  Eg. under project 165, a CHI is replaced by a Pro-CHI beginning agj
All research data relating to a specific project will have the same 3-letter code.
All other patient identifiers are removed (eg. name, address etc.)
Other anonymisations are performed – anon DOB, anon GP code
If any identifiable data is required, specific Caldicott approval must be granted
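The Pro-CHI scheme described on this slide can be sketched as follows. This is not HIC's software: the names `project_code` and `ProChiGenerator` are hypothetical, and two details are assumptions, namely that the base-26 code uses a=0 … z=25 padded to three letters (which is consistent with the slide's example, 165 giving agj) and that a lookup table is kept so the same CHI always receives the same Pro-CHI within a project.

```python
import secrets
import string

def project_code(project_id, width=3):
    """Write the project ID in base 26 using a=0 ... z=25, left-padded with 'a'."""
    if not 0 <= project_id < 26 ** width:
        raise ValueError("project ID does not fit in the code width")
    letters = []
    for _ in range(width):
        project_id, digit = divmod(project_id, 26)
        letters.append(string.ascii_lowercase[digit])
    return "".join(reversed(letters))

class ProChiGenerator:
    """Issues one Pro-CHI per CHI within a single project (illustrative only)."""

    def __init__(self, project_id):
        self.prefix = project_code(project_id)
        self._by_chi = {}     # CHI -> Pro-CHI; assumed to stay inside the secure domain
        self._issued = set()  # Pro-CHIs already handed out, to avoid collisions

    def pro_chi(self, chi):
        if chi not in self._by_chi:
            while True:
                # Three-letter project code followed by 7 random digits.
                candidate = self.prefix + "".join(
                    secrets.choice(string.digits) for _ in range(7)
                )
                if candidate not in self._issued:
                    break
            self._issued.add(candidate)
            self._by_chi[chi] = candidate
        return self._by_chi[chi]

gen = ProChiGenerator(165)
first = gen.pro_chi("0000000001")  # made-up CHI
again = gen.pro_chi("0000000001")
other = gen.pro_chi("0000000002")
```

Because the digits are random rather than derived from the CHI, a Pro-CHI cannot be recomputed from the CHI; any reversal of anonymisation would have to go through a stored table like `_by_chi`. A different project ID yields a different prefix, so extracts released to different projects do not share identifiers.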

13 A bit more about the prescribing data set…
The Tayside prescribing data set is unique in the UK. It is a database of all Tayside encashed prescriptions, including CHI, date prescribed and drugs dispensed.
Prior to 2005, paper prescriptions were scanned by the data entry clerks and all prescription details were entered manually using a purpose-built application.
Since 2005, PSD have been automatically sending HIC the scanned prescription images and associated data:
  300,000 prescriptions per month (total 14.5m in database from 2005)
  13 GB of .tif images per month (front and back)
  17% (50,000) still require data entry (CHI) each month

14 Users of HIC data 2004-9
93 projects totalling £16m (£3.2m pa), including:
Diabetes research Maternal & Child Health Dental Health Services Research Cardiovascular Genetics Health Informatics Drug Safety Scottish Longitudinal Studies Centre

15 Examples of recent studies using prescription data
Influence of apo-e & other genotypes on response to statins (Louise Donnelly, GSK studentship)
Adherence: to insulin (Morris et al, Lancet); to sulphonylureas (Donnan et al Diab Med, Evans et al Diab Med)
Drug safety studies: corticosteroids and risk of fracture (Donnan et al); statins (Li Wei); methadone (Fahey); methotrexate (Guthrie)
Markers for co-morbidity, eg. emergency admissions study (Donnan)

16 Future plans
Enhanced HIC service including:
  Programming, statistical, Clinical Trials Unit support, data management
Scaling up to a Scotland-wide Health Programme (SHIP)
Rolling out novel research data mechanism to further improve information governance: MILA
Pilot study – obtaining identifiable retinal images from Ninewells eye clinic (300 5 MB each) & anonymising them for research

17 Conventional Record-Linkage
[Diagram: data sources feed a trusted repository (PAC oversight and SOPs), which generates identifier substitutions and delivers to the recipient.]
Confidentiality? Governance? Scalability?

18 MILA: Multi-Institutional Linkage & Anonymisation
MILA is an alternative linkage mechanism. It continues to enforce the separation of data even as the linkage proceeds. It forces all parties to be involved and makes possible a more rigorous governance procedure.
[Diagram: data sources A and B, a linker (holds identifiers) and the recipient. The linker lists each person as Person (IDA, IDB, …), eg. Person 1 (17, 89, …), Person 2 (…). Labels on the flows: A (17), B (89), (17 -> 2), (89 -> 2).]
Confidentiality ✓ Governance ✓ Scalability ✓
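One way to read the (17 -> 2) and (89 -> 2) labels on this slide is as per-source substitution tables issued by the linker. The sketch below illustrates that reading only; the slide does not say who applies the substitutions or how they are transported, so the `Linker` class, the `relabel` function and the choice to have each data source relabel its own records are all assumptions.

```python
class Linker:
    """Holds identifiers only: which local ID each person has at each source."""

    def __init__(self, people):
        # people: {project ID: {source name: local ID at that source}}
        self._people = people

    def substitutions(self, source):
        """Return the {local ID: project ID} table for one data source."""
        return {
            ids[source]: project_id
            for project_id, ids in self._people.items()
            if source in ids
        }

def relabel(records, table):
    """Assumed to run at a data source: swap local IDs for project IDs before release.

    Records whose local ID is not in the table are not released.
    """
    return [
        (table[local_id], payload)
        for local_id, payload in records
        if local_id in table
    ]

# IDs follow the slide's labels: source A knows the person as 17, source B as 89.
linker = Linker({2: {"A": 17, "B": 89}})
from_a = relabel([(17, "drug data")], linker.substitutions("A"))
from_b = relabel([(89, "lab data"), (90, "lab data")], linker.substitutions("B"))
received = sorted(from_a + from_b)
```

In this arrangement the linker never sees clinical payloads, each source sees only its own local IDs, and the recipient sees only project ID 2, which is still enough to join the two extracts.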

19 Some research data mechanisms
Project specific, ad hoc data collections
  Pros: Simple, personal, researcher in control
  Cons: Ad hoc – no governance, re-use
Data warehouse
  Pros: Copies of all data in one place
  Cons: Threat to trust, privacy
GRID computing & eScience techniques
  Pros: No copies of data
  Cons: Is it trustworthy? Is it scalable?
Multi Institution Linkage & Anonymisation (MILA)
  Pros: Transparent, data owners retain control
  Cons: In development – pilot complete

20 How MILA matches the requirements
Stakeholder: Patients, the public
  Trust that mechanism respects consent & privacy ?
  Data used once, for intended purpose only ✓
  Promotes research and knowledge creation ✓
Stakeholder: Data owners, eg. NHS
  Trust that mechanism always secure, follows law ?
  No work to provide or update dataset ✓ (a benefit ?)
  Due credit given ✓
Stakeholder: Researchers
  Trust in data provenance, quality, completeness ?
  Wide range of datasets (data owners trust mechanism) ?
  Dataset descriptions, scoping searches ✓
  Data anonymised but linkable ✓
  Simple, rapid, cheap data extracts ✓
  Long term data curation ✓
Patients / public: respect, no surprises
Data owner: No hassle, no scandals

21 Sir Alan Langlands, September 2005
