Data linkage: the key to long term outcomes Professor Ronan Lyons Farr Institute – CIPHER Centre for Improvement in Population Health through E-records.


1 Data linkage: the key to long term outcomes Professor Ronan Lyons Farr Institute – CIPHER Centre for Improvement in Population Health through E-records Research. Swansea University Biennial Scientific Meeting, Congenital Anomaly Registers: Utilizing a valuable resource Tuesday 7th October 2014 Dylan Thomas Centre, Swansea

2 Content of Presentation: Farr Institute; Data linkage in the UK; What is possible now and in the future; Long term outcomes

3 Historical research

4 MRC’s vision for UK medical bioinformatics research Enabling technologies & infrastructure Developing capacity & expertise Funding for innovative research High throughput data Cohorts Trials BioBanks Educational Environmental Social Data NHS Clinical Data Patient groups Demographic data

5 Strengthening health informatics research
Centres: Farr UCL Partners; Farr Scotland; Farr - CIPHER; Farr N8 Manchester
MRC coordinated 10-partner £19m call for e-health informatics research centres across the UK: cutting edge research using data linkage; capacity building
Additional £20m capital to create Farr Institute
UK Health Informatics Research Network: coordinate training, share good practice and develop methodologies; engage with the public, collaborate with industry and the NHS

6 Who is Farr? “Diseases are more easily prevented than cured and the first step to their prevention is the discovery of their exciting causes.” William Farr

7 “To harness health data for patient and public benefit by setting the international standard in trustworthy reuse of electronic patient records and related linkable data for large-scale research.” Our Vision

8 Our Ten Key Activities
1. Collaborative Leadership
2. Cutting edge Research
3. Public engagement
4. Governance (safe havens)
5. Methods development
6. Meta Data and Enabling Datasets
7. Harmonised eInfrastructure
8. Partnerships
9. Training/Capacity Building
10. Communications
To deliver impact nationally and internationally

9 Various developments across the UK
Considerable number of initiatives:
UK: Farr Institute; Administrative Data Research Centres/Network
England: Health and Social Care Information Centre; Clinical Practice Research Datalink
Northern Ireland: Northern Ireland Longitudinal Study
Scotland: Information Services Division (ISD Scotland); Electronic Data Research and Innovation Service (eDRIS)
Wales: SAIL databank

10 Steps in utilising health information for research
1. Building trust, partnerships and collaboration
2. Development of anonymisation and linkage techniques
3. Quality assessment and appraisal of datasets
4. Use of datasets to support research
SAIL uses a split file, trusted third party (TTP), multi-stage encryption, and a stepwise, restricted-field remote access analysis system to ensure privacy protection.
Lyons RA, et al. The SAIL databank: linking multiple health and social care datasets. BMC Med Inform Decis Mak. 2009 Jan 16;9:3.

11 Secure Anonymised Information Linkage (SAIL) databank SAIL : a multi-sourced data bank of linkable anonymised data on the population of Wales: health service operational systems national databases clinical and biological data education, housing, social care, etc. Uses a trusted third party, split file and multiple encryption technologies to create Anonymised Linkage Fields (ALFs) for individuals and residences SAIL Gateway is a remote access analysis facility to curtailed data.

12 SAIL split file/trusted third party methodology
[Diagram: anonymisation process. Labels: Data Provider (operational system); Demographic data only; Clinical / activity data; NHS Wales Informatics Service (Validate; Trace & Geo-code; Construct ALF); HIRU (Blue C) (Recombine; Encrypt and load); Validated, anonymised data; Other recombined data]
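The split file idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SAIL's actual implementation: the record fields, the function names (`split_record`, `make_alf`, `recombine`), the join key, and the use of HMAC-SHA256 as a stand-in for the encryption step are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the trusted third party (TTP).
TTP_KEY = b"example-key-held-by-trusted-third-party"


def split_record(record, join_key):
    """Data provider: split one record into demographic and clinical parts."""
    demographic = {"join_key": join_key, "nhs_number": record["nhs_number"],
                   "name": record["name"], "postcode": record["postcode"]}
    clinical = {"join_key": join_key, "diagnosis": record["diagnosis"]}
    return demographic, clinical


def make_alf(demographic):
    """TTP: replace identifiers with a keyed hash standing in for the ALF."""
    digest = hmac.new(TTP_KEY, demographic["nhs_number"].encode(), hashlib.sha256)
    return {"join_key": demographic["join_key"], "alf": digest.hexdigest()}


def recombine(alf_part, clinical):
    """Databank: rejoin ALF and clinical data, then drop the join key."""
    assert alf_part["join_key"] == clinical["join_key"]
    return {"alf": alf_part["alf"], "diagnosis": clinical["diagnosis"]}


# Made-up example record; no real identifiers.
record = {"nhs_number": "0000000000", "name": "Example Person",
          "postcode": "XX0 0XX", "diagnosis": "example diagnosis"}
demographic, clinical = split_record(record, join_key=1)
linked = recombine(make_alf(demographic), clinical)
```

In this sketch the party holding the clinical data never sees the identifiers, and the TTP never sees the clinical data; the same input identifier always yields the same linkage field, which is what makes later linkage across datasets possible.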

13 Datasets in SAIL (incomplete coverage)
Administrative Health: Population; Inpatients; Outpatients; Emergency Department; Child Health Database Wales; NHS Direct Wales
Administrative Non-Health: Births; Deaths; Educational Attainment; Social Services; Housing
Clinically rich databases:
- Specialty specific: Cancer Incidence; Cancer Screening; Congenital Anomalies; Arthropathies; Myocardial Infarction; Diabetes; etc.
- General: GP Data; Laboratory systems
- Study specific: Embedded trials and cohorts

14 Patient Journey Analysis - Health and Social Care

15 Particular difficulties with congenital anomaly research
- Fetal deaths are common with more severe malformations
- A fetus does not have an 'identity' such as an NHS number
- There may be multiple fetuses
- Babies often leave hospital with an incomplete name ('Baby Surname')
- Early neonatal deaths are not registered with a GP
- However, it is possible to link maternal and baby NHS numbers if systems like the National Community Child Health Database in Wales exist (NN4B)

16 Informatics challenges
Modern cohorts/registries designed for multi-modal data linkage
- Huge amounts of data
- Different database structures/sizes
- Major challenges when creating cross/cohort/platform analyses
- Semantic interoperability/data harmonisation issues: original metadata - standards; variable definitions from baseline/laboratory results; variable definitions from routine GP/hospital data
- GP Read codes: UK/NZ, user variation+++
- UK inpatient data: different in Wales/England/Scotland
- Too difficult to move very large and complex data: recipients would need to design/implement very complex data structures just to receive data
Privacy protection essential
- Potential for 'jigsaw' attacks, threat from reidentification scientists
World-wide shortage of skills and expertise in managing these challenges
- No single institution with all necessary skills
- Need for international collaboration
- Build upon existing expertise, developments and investments

17 Cohort Data in UK Dementia Platform
22 cohorts involved
UK Biobank has the greatest variety:
- Baseline survey
- Baseline anthropometrics/physiological measurements (continuous/categorical)
- Baseline biochemistry/haematology
- Genomics: 821,000 SNPs
- Imaging: retinal/MRI/US
- Accelerometer data
- Follow up: death and cancer registry; primary care; hospital data; disease registries; self reported conditions/status; functional/cognitive impairment

18 Access options for cohort data – establish preferences
Possibilities:
1. Access only available to investigators at the host institution's site
2. Pre-specified analyses are conducted by the host institution with results sent to external researchers
3. Data, or subsets of data, are transferred to external researchers
4. The host institution operates a remote analysis platform in which external researchers can carry out simple or complex analyses
5. The host institution facilitates remote access to external research enquiry tools, e.g. DataShield
6. Data are transferred to a data platform where data can be downloaded to external researchers, e.g. UK Data Archive
7. Data are locally managed on a remote analysis platform containing multiple cohorts to enable cross-cohort analyses at the individual record level (UKSeRP)

19 Remote analysis platform for multiple cohorts: UK Secure e-Research Platform (UK SeRP)
Built upon SAIL Gateway developments
Built with MRC capital infrastructure for Farr Institute; bid supported by ALSPAC, UK Biobank, LifeStudy cohorts
A national/international resource delivered through Farr
- A secure environment to enable research groups to conform to best practices of data management, security and information governance
- A remote access large scale IT infrastructure with standard and bespoke analytical tools
Leaves data ownership with the cohorts
- Devolved account and access control
- Information governance responsibility and control with projects
Researchers focus on the science

20 UK Secure eResearch Platform - UKSeRP
[Architecture diagram. Labels: Portal; Virtual Desktops; Virtualisation Stack; IBM DB2 MP-DB; SQL 2014 Cluster; Hadoop Cluster; IBM ICA; PostgreSQL + PostGIS; ArcGIS; NRDA; Security; Probabilistic Linkage; Data Catalogue, Documentation, Metrics, Quality; T1, T2, T3; Shared Filestore; Doc / Community Support; HPC / Specialist]

21 Wales Electronic Cohort for Children (WECC)
- Multidisciplinary collaborative project
- Platform for translating routinely collected data into an anonymised population level child e-cohort
- Investigate the widest possible range of social and environmental determinants of child health and social outcomes
- Inform the development of interventions to reduce health inequalities of children in Wales
Two phases: Phase 1, proof of concept; Phase 2, dynamic capabilities

22 WECC development
[Flow diagram. Sources: Birth records (ONS births); Mortality records (ONS deaths); WDS; Child Health (NCCHD); linked via ALF_E. Steps: WECC eligibility criteria applied; data cleaning (rules for removal of duplicates and errors). Result: Wales Electronic Cohort for Children, N=981,404]
WDS: Welsh Demographic Service; NCCHD: National Community Child Health; ONS: Office for National Statistics

23 [Diagram: WECC core and linked tables]
WECC core n = 981,404; ♂: 500,181 (51.0%); ♀: 481,205 (49.0%)
- Born in Wales n = 766,309; ♂: 392,959 (51.3%); ♀: 373,333 (49.0%)
- Non-Welsh births n = 215,095; ♂: 107,222 (49.8%); ♀: 107,872 (50.2%)
Links with health and education data via ALF_E
Links with maternal health data via mALF_E
Links with SAIL eGIS data via ALF_E/RALF_E
WECC derived tables: Inpatient; GP consultations; Perinatal and Child health; Environment; House Moves
National dataset: Education
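The way a child record reaches other tables through its linkage fields can be illustrated with a small Python sketch. The field names `ALF_E` and `mALF_E` come from the slide; everything else (the rows, the variables, and the `link` helper) is made up for illustration and does not reflect the real WECC table structure.

```python
# Hypothetical rows; all ALF_E values and variables are invented.
children = [
    {"ALF_E": "c1", "mALF_E": "m1", "birth_weight_kg": 3.4},
    {"ALF_E": "c2", "mALF_E": "m1", "birth_weight_kg": 2.9},
    {"ALF_E": "c3", "mALF_E": "m2", "birth_weight_kg": 3.1},
]
maternal = [{"ALF_E": "m1", "smoker": True}, {"ALF_E": "m2", "smoker": False}]
education = [{"ALF_E": "c1", "ks1_pass": True}, {"ALF_E": "c3", "ks1_pass": False}]


def index_by(rows, key):
    """Build a lookup from linkage field value to row."""
    return {row[key]: row for row in rows}


def link(children, maternal, education):
    """Left-join each child to maternal data (via mALF_E) and education data (via ALF_E)."""
    mothers = index_by(maternal, "ALF_E")
    results = index_by(education, "ALF_E")
    linked = []
    for child in children:
        mother = mothers.get(child["mALF_E"], {})
        result = results.get(child["ALF_E"], {})
        linked.append({
            "ALF_E": child["ALF_E"],
            "birth_weight_kg": child["birth_weight_kg"],
            "maternal_smoker": mother.get("smoker"),
            "ks1_pass": result.get("ks1_pass"),  # None when no education record links
        })
    return linked


linked_rows = link(children, maternal, education)
```

Keeping unmatched children in the output with `None` values, rather than dropping them, makes linkage gaps visible in later analysis.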

24 Examples of analyses
I. Influence of maternal and child health factors on time to first admission with a respiratory disorder (Paranjothy S. et al (2013) Pediatrics 132:6 e1562-e1569)
II. Influence of head injuries on educational attainment at age 7 (Gabbe B.J. et al (2014) J Epidemiol Community Health. 68:5 466-470)
III. Educational outcomes for frequent movers (Hutchings H. et al (2013) PLoS One. 8 (8) e70601)
IV. Influence of the physical and social environment on childhood obesity

25 Background to WECC phase 2
Poor educational attainment → unemployment and/or low salary → ill-health
A greater understanding of factors underlying education inequalities is necessary to target interventions to protect future generations from poverty and ill health.
[Diagram labels: Health of the child; Environment; Family size; Household illness; Educational attainment; Unemployment; Low salary; Ill health]

26 Research questions
1. Does moving to a less deprived community influence child health and educational outcomes?
2. To what extent do serious childhood or family health conditions affect educational outcomes?
3. Is poor educational attainment a risk factor for adverse health in adolescence?
4. Can a novel hybrid cohort study, embedding a traditional detailed survey cohort (e.g. Millennium Cohort Study (MCS)) within D-WECC, be used to evaluate the strengths and weaknesses of using e-cohorts for epidemiological studies?

27 Data linkage and long term outcomes
Individual linkage
- Mortality data: survival and cause of death
- GP and hospital activity: health service impact/comorbidity
- Laboratory and imaging systems: severity of condition/comorbidity
- Education attainment: social impact of condition
- Work and benefits: social impact/disability
Family/household linkage
- Impact on the wider family


29 Time to the first emergency respiratory hospital admission Risk decreased with each successive week in gestation up to 40 – 42 weeks. Risk further increased for babies that were small for gestational age. The increased risk is small for late preterm infants but the number affected is large and will impact on healthcare services.

30 Head injury and school performance
J Epidemiol Community Health 2014;68:466-470 doi:10.1136/jech-2013-203427
For children entering school, what is the association between preceding head injury and KS1 (age 5-7 years) performance?
Cohort flow:
n=116,154 born in Wales Sept 1998 - Aug 2001
  n=14,262 left Wales
  n=101,892 remaining in Wales
    n=90,661 valid KS1 result
      n=290 head injury admission
      n=90,371 no head injury
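The counts in this cohort flow can be cross-checked with simple arithmetic. The sketch below uses only the numbers on the slide; the variable names and the `check_flow` helper are illustrative, and the "without valid KS1" figure is derived here by subtraction rather than quoted from the slide.

```python
# Counts as given on the slide.
born_in_wales = 116_154
left_wales = 14_262
remaining_in_wales = 101_892
valid_ks1 = 90_661
head_injury = 290
no_head_injury = 90_371


def check_flow():
    """Check that each stage of the flow adds up and derive two summary figures."""
    return {
        "remaining_matches": born_in_wales - left_wales == remaining_in_wales,
        "groups_match": head_injury + no_head_injury == valid_ks1,
        # Derived by subtraction; this count is not stated on the slide.
        "without_valid_ks1": remaining_in_wales - valid_ks1,
        "head_injury_pct": round(100 * head_injury / valid_ks1, 2),
    }


summary = check_flow()
```

Both stages reconcile, and head injury admissions make up roughly 0.3% of children with a valid KS1 result, which helps explain the wide confidence intervals for some injury types.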

31 Association between head injury and satisfactory performance on KS1

Predictor                              OR (95% CI)          AOR (95% CI)
Head injury
  None (reference)                     1                    1
  Skull fracture                       0.73 (0.50, 1.09)    0.79 (0.52, 1.18)
  Concussion                           0.85 (0.33, 2.16)    0.87 (0.31, 2.49)
  Intracranial injury                  0.50 (0.33, 0.75)    0.46 (0.30, 0.72)
Gender
  Male (reference)                     -                    1
  Female                               -                    1.95 (1.87, 2.03)
Townsend deprivation index quintile
  1 (Least deprived) (reference)       -                    1
  2                                    -                    0.64 (0.59, 0.69)
  3                                    -                    0.49 (0.45, 0.52)
  4                                    -                    0.38 (0.35, 0.41)
  5 (Most deprived)                    -                    0.26 (0.24, 0.28)
Age at KS1 assessment (years)          -                    2.77 (2.60, 2.97)
Birth weight (kg)                      -                    1.41 (1.35, 1.47)
Gestational age (weeks)                -                    1.01 (1.00, 1.03)
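To read a table like this, it can help to convert a reported odds ratio and its 95% confidence interval back into an approximate standard error and p-value. The sketch below assumes the intervals are Wald intervals, symmetric on the log odds scale; that is a common convention but is not stated on the slide, so the results are approximations and not figures from the paper. The function names are illustrative.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error of log(OR) from a 95% CI (assumes a Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def z_statistic(odds_ratio, lower, upper):
    """Approximate z statistic for the null hypothesis OR = 1."""
    return math.log(odds_ratio) / se_from_ci(lower, upper)


def two_sided_p(z):
    """Two-sided p-value from a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Adjusted odds ratios taken from the table above.
p_intracranial = two_sided_p(z_statistic(0.46, 0.30, 0.72))
p_skull_fracture = two_sided_p(z_statistic(0.79, 0.52, 1.18))
```

Under these assumptions the intracranial injury estimate is clearly distinguishable from 1, while the skull fracture estimate is not, which matches what the intervals show directly: the first excludes 1 and the second includes it.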

32 Household level linkage

33 Soon - a tidal wave of data…
- Full genome sequence ~£3,000, dropping in price 10x every 2-4 years
- Existing NHS genetic test ~£1,000
- Disk cost to store an individual's variations ~10p
- Development of continuous monitoring and remote sensors
- Data from many other sources
- New approaches needed for accessing, manipulating, visualizing
- Requires entirely new perspective
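The "10x every 2-4 years" claim implies an exponential decline, which a short sketch makes concrete. The starting price of £3,000 and the 2 to 4 year range come from the slide; the function name and the assumption of a smooth exponential trend are illustrative, and the outputs are projections under that assumption, not forecasts from the talk.

```python
def projected_cost(initial_cost, years, years_per_tenfold_drop):
    """Cost after `years` if the price falls tenfold every `years_per_tenfold_drop` years."""
    return initial_cost * 10 ** (-years / years_per_tenfold_drop)


# Fast and slow ends of the range quoted on the slide, four years out.
fast = projected_cost(3000, years=4, years_per_tenfold_drop=2)
slow = projected_cost(3000, years=4, years_per_tenfold_drop=4)
```

Four years out, the fast and slow ends of the quoted range differ by a factor of ten, so the rate of decline matters more to planning than the starting price.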

34 The future is bright
- Expect further development of data linkage capabilities across the UK
- However, capacity is a major issue
- Amount of work needed is often underestimated
- Ensuring that privacy is protected, and that the public are engaged and accept this research approach, are key activities

