Research Databases for NRES London, 29th Feb 2012


1 Research Databases for NRES London, 29th Feb 2012

2 JHC roles
1. Research chair at UoN: epidemiology, risk prediction and drug safety
2. Member of the NIGB Ethics and Confidentiality Committee (ECC)
3. Developed and runs the not-for-profit QResearch database with EMIS
4. Inner-city GP

3 Outline
- Background
- Key ethical issues
  - Scientific
  - Confidentiality
- Example of QResearch
- Data linkage and pseudonymisation
- Discussion/questions

4 Background
- Large volumes of electronic data are now collected in the NHS
- Huge potential for useful research
- Technology exists to extract data and assemble it into databases
- Databases are popular with academics and the DH
- Large numbers for studies
- Relative efficiency
- Increasing potential for data linkage

5 Definition of a research database in the NRES SOP
- "A structured collection of individual-level personal information, which is stored for potential research purposes beyond the life of a specific research project with defined end points"
- Includes databases set up for research
- Includes re-use of databases established for:
  - audit
  - disease registers

6 Research databases
- Included in NRES SOPs
- Specific section within the IRAS form
- Approvals generally for 5 years, renewable
- Can include generic approval
- Can include providing data to third parties as part of a research service
- Detailed protocol required on purpose, operation, methods, policies and governance

7 New research databases
- What is the purpose?
- Do we need a new one, or can an existing database be used?
- Who will 'own' it and be responsible for it?
- What data will it contain and how will it be accessed?
- What is the governance framework?
- Will it contain identifiable data, with or without consent?
- Is s251 support required?

8 Key objectives for safe data sharing
- Patients and their data
- Minimise risk
- Privacy
- Maximise public benefit
- Maintain public trust

9 Three main options for data access
- Consent
- Pseudonymisation
- s251

10 De-identification
- Various methods to reduce the identifiability of data
- Pseudonymisation
- Use of samples and limited data items rather than the whole database
- Conversion of date of birth to year of birth or age
- Contracts/data sharing agreements with clear liabilities and penalties
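The date-of-birth coarsening listed above can be sketched in a few lines. This is a minimal Python illustration, not QResearch code. The record layout and the field names `dob`, `year_of_birth` and `age` are hypothetical. Age is taken in completed years on an assumed reference date, such as study entry.

```python
from datetime import date


def to_year_of_birth(dob: date) -> int:
    """Coarsen a full date of birth to the year of birth only."""
    return dob.year


def age_on(dob: date, reference: date) -> int:
    """Age in completed years on a reference date."""
    # Subtract one if the birthday has not yet occurred in the reference year.
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


def coarsen_record(record: dict, reference: date) -> dict:
    """Return a copy with 'dob' replaced by 'year_of_birth' and 'age' (hypothetical fields)."""
    out = {k: v for k, v in record.items() if k != "dob"}
    out["year_of_birth"] = to_year_of_birth(record["dob"])
    out["age"] = age_on(record["dob"], reference)
    return out


if __name__ == "__main__":
    rec = {"patient_id": "P001", "sex": "F", "dob": date(1970, 6, 15)}
    print(coarsen_record(rec, date(2012, 2, 29)))
    # {'patient_id': 'P001', 'sex': 'F', 'year_of_birth': 1970, 'age': 41}
```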

11 Example: QResearch
- Established in 2002 to support ethical medical research
- Largest of three UK databases, and expanding
- Management board: UoN and EMIS
- Advisory board: professional and lay representation; advises on policy, strategy etc.
- Scientific board: reviews science and risk assessment

12 QResearch key facts
- Large pseudonymised database
- >700 GP practices, 14 million patients
- Patient- and event-level data
- Demographics: year of birth, sex, ethnicity
- Diagnoses, lab results, clinical values
- Medication, referrals
- No free text; no strong identifiers
- All research peer reviewed and published

13 QResearch uploads
- Informed consent from practice
- Practice displays notice in waiting room
- Practice activates upload software
- Data pseudonymised BEFORE data leaves practice
- Patients can be opted out of upload
- Secure upload to server at EMIS with full NHS security clearance
- Backups delivered to University
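The sketch below shows how pseudonymising inside the practice system and honouring practice-level opt-outs could fit together before upload. Everything in it is an assumption for illustration. This includes the field names, the `ALLOWED_FIELDS` whitelist, and the use of a plain SHA-256 hash of the NHS number followed by a salt. The slides do not describe EMIS's actual upload software.

```python
import hashlib

# Hypothetical whitelist of coded fields allowed to leave the practice.
# Names, addresses, NHS numbers and free text are never copied into the extract.
ALLOWED_FIELDS = ("year_of_birth", "sex", "ethnicity", "diagnoses", "medications")


def pseudonymise(nhs_number: str, salt: str) -> str:
    """Assumed scheme: SHA-256 of the NHS number followed by a salt, as hex."""
    return hashlib.sha256((nhs_number + salt).encode("utf-8")).hexdigest()


def prepare_upload(patients: list, opted_out: set, salt: str) -> list:
    """Build the extract inside the practice system, before anything is uploaded."""
    upload = []
    for patient in patients:
        if patient["nhs_number"] in opted_out:
            continue  # opted-out patients are excluded entirely
        # Copy only whitelisted fields; everything else stays in the practice.
        row = {k: patient[k] for k in ALLOWED_FIELDS if k in patient}
        row["pseudo_id"] = pseudonymise(patient["nhs_number"], salt)
        upload.append(row)
    return upload
```

A whitelist of permitted fields is used rather than a blacklist of identifiers. Any new field, free text included, then stays behind by default.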

14 QResearch: security
- Full database stored on an offline server
- Full encryption of hard drive
- Keypad-protected server room with limited access
- 24-hour CCTV with monitoring
- Confidentiality clauses in staff contracts
- Full log of all data accesses
- Log of all uses of data
- No data losses or breaches in 10 years

15 QResearch policy
- Although all data are pseudonymised, we apply the same safeguards as if they were identifiable:
  - to minimise any risk of re-identification of patients (and practices)
  - to maintain public and professional trust
- Explicit policy to ensure all results of research studies are widely and freely available for public benefit

16 Researcher access
- University-based academics
- One must be GMC registered
- Standard application form
- Clarify research question and methods
- Independent scientific review
- Provided with the sample size and data items needed to answer the question
- Data only used for the agreed purpose
- Data destroyed after project completed

17 Why is it important to ensure robust scientific methods?
- Published research must give valid results which don't mislead or misinform doctors, patients or policy makers
- Equally, we need to avoid research going unpublished, e.g. a good study with important results
- Avoid duplication of effort
- Avoid publication bias
- Avoid suppression of unpopular results (e.g. side effects of medicines)

18 Ensuring scientific quality
1. Is there a clear research question?
2. Can the data answer the question?
3. Are the methods scientifically valid?
4. Are the results likely to be generalisable?
5. Does the team have the skills to do the project?
6. Is the researcher free to publish?
- Some databases with generic REC approval will organise independent scientific review to answer the above.

19 Risk to confidentiality
- Each study needs a risk assessment, even if the data are pseudonymised
- Could the study lead to identification of patients because of:
  - other data that the researcher might have
  - small numbers/rare events
- Minimise risk by de-identifying data
- Data sharing agreement and sanctions for misconduct
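Small-cell suppression of aggregate outputs is one common way to reduce the small-numbers/rare-events risk. The sketch below illustrates that general technique; it is not a stated QResearch rule. The threshold of 5 is a hypothetical value.

```python
from collections import Counter

# Hypothetical disclosure threshold; the slides do not give a value.
SMALL_CELL_THRESHOLD = 5


def suppressed_counts(values, threshold=SMALL_CELL_THRESHOLD):
    """Count categories, masking any non-zero count below the threshold."""
    counts = Counter(values)
    return {
        category: n if n >= threshold else f"<{threshold}"
        for category, n in counts.items()
    }


if __name__ == "__main__":
    outcomes = ["asthma"] * 12 + ["rare_condition"] * 2
    print(suppressed_counts(outcomes))  # {'asthma': 12, 'rare_condition': '<5'}
```

Suppression of single cells is not sufficient on its own. A masked value can sometimes be recovered by differencing it against row totals or related tables, so it complements rather than replaces the per-study risk assessment.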

20 QResearch data linkage study
- Linked to deprivation data in 2002
- Linked to ONS cause of death in 2007
- Currently being linked to HES and the cancer registry
- Testing a new method of data linkage using pseudonymised data
- Exceptionally high levels of valid, complete NHS numbers in ONS data, HES and GP data
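An NHS number's validity can be checked with its modulus 11 check digit:

1. Weight the first nine digits 10 down to 2 and sum them.
2. Take the sum modulo 11.
3. The check digit is 11 minus that remainder. A result of 11 becomes 0; a result of 10 means no valid number exists for that stem.

The sketch below applies this rule. It also computes the share of records with a present, valid number. `valid_completeness` is a hypothetical helper, not the measure used in the linkage study.

```python
def is_valid_nhs_number(nhs_number: str) -> bool:
    """Check a 10-digit NHS number with the modulus 11 algorithm."""
    digits = nhs_number.replace(" ", "")
    if len(digits) != 10 or not digits.isdigit():
        return False
    # Weights 10, 9, ..., 2 applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        return False  # no valid check digit exists for this stem
    return check == int(digits[9])


def valid_completeness(nhs_numbers) -> float:
    """Share of records with a present and valid NHS number (hypothetical metric)."""
    if not nhs_numbers:
        return 0.0
    ok = sum(1 for n in nhs_numbers if n and is_valid_nhs_number(n))
    return ok / len(nhs_numbers)
```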

21 Open Pseudonymiser project
- Need an approach which doesn't extract identifiable data but still allows linkage
- Legal, ethical and NIGB approvals
- Secure, scalable
- Reliable, affordable
- Generates IDs which are unique to the project
- Can be used by any set of organisations wishing to share data
- Pseudonymisation applied as close as possible to the identifiable data, i.e. within clinical systems

22 Pseudonymisation: method
- Scrambles NHS number BEFORE extraction from clinical system
- Takes NHS number + project-specific encrypted 'salt code'
- One-way hashing algorithm (SHA2-256): no known collisions, and US standard from 2010
- Applied twice: before leaving the clinical system and on receipt by the next organisation
- Apply identical software to the second dataset
- Allows two pseudonymised datasets to be linked
- Can't be reverse engineered
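The salted-hash step and the linkage it enables can be sketched as below. This is a minimal Python illustration under stated assumptions:

- The salt is appended to the NHS number with spaces removed.
- The result is encoded as UTF-8 and the SHA-256 digest kept as hex.

OpenPseudonymiser's actual concatenation and encoding may differ, and the second hashing step on receipt is not modelled. Function names and dataset fields are hypothetical.

```python
import hashlib


def pseudonymise(nhs_number: str, project_salt: str) -> str:
    # Assumed scheme: NHS number (spaces removed) + salt, UTF-8, SHA-256 hex digest.
    data = (nhs_number.replace(" ", "") + project_salt).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def pseudonymise_dataset(rows, salt):
    """Replace the NHS number in each row with its salted hash."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "nhs_number"}
        new_row["pseudo_id"] = pseudonymise(row["nhs_number"], salt)
        out.append(new_row)
    return out


def link(left, right):
    """Inner join on pseudo_id; assumes at most one row per pseudo_id in `right`."""
    index = {r["pseudo_id"]: r for r in right}
    return [{**l, **index[l["pseudo_id"]]} for l in left if l["pseudo_id"] in index]


if __name__ == "__main__":
    salt = "example-project-salt"  # hypothetical project salt
    gp = pseudonymise_dataset([{"nhs_number": "943 476 5919", "diagnosis": "asthma"}], salt)
    hes = pseudonymise_dataset([{"nhs_number": "9434765919", "admission": "2010-03-01"}], salt)
    print(link(gp, hes))
```

Both suppliers run the same function with the same project salt. The GP and HES rows for one patient therefore receive the same `pseudo_id` and can be joined, yet neither pseudonymised dataset contains the NHS number.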


24 Web tool to create encrypted salt: proof of concept
- Website's private key used to encrypt a user-defined, project-specific salt
- Encrypted salt distributed to the relevant data supplier with identifiable data
- Public key in the supplier's software decrypts the salt at run time and concatenates it to the NHS number (or equivalent)
- Hash then applied
- Resulting ID is then unique to the patient within the project
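The sketch below shows the roles of the two keys in this proof of concept using toy textbook RSA. The web tool raises the salt to the private exponent. The supplier's software recovers it with the public exponent and then hashes the NHS number plus salt. The sketch is deliberately insecure (small fixed primes, no padding) and is not OpenPseudonymiser's implementation; a real deployment would use a vetted cryptographic library. All names and the NHS-number-plus-salt hashing scheme are assumptions.

```python
import hashlib

# Toy textbook RSA, NOT secure: small fixed Mersenne primes and no padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent, embedded in supplier software
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept by the web tool (Python 3.8+)


def encrypt_salt_with_private_key(salt: str) -> int:
    """Web tool side: transform the project salt with the private key."""
    m = int.from_bytes(salt.encode("utf-8"), "big")
    if m >= N:
        raise ValueError("salt too long for this toy modulus")
    return pow(m, D, N)


def decrypt_salt_with_public_key(encrypted: int) -> str:
    """Supplier side: recover the salt with the public key at run time."""
    m = pow(encrypted, E, N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


def supplier_pseudo_id(nhs_number: str, encrypted_salt: int) -> str:
    """Decrypt the salt, concatenate it to the NHS number, then apply SHA-256."""
    salt = decrypt_salt_with_public_key(encrypted_salt)
    return hashlib.sha256((nhs_number + salt).encode("utf-8")).hexdigest()
```

In this arrangement the supplier never handles the plaintext salt outside its software. The software only has to accept salts produced by the web tool's key.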

25 Openpseudonymiser.org
- Website
- Desktop application
- Software for integration
- Test data
- Documentation
- Utility to generate encrypted salt codes
- Source code under GNU GPL

26 Progress so far
- Pseudonymised the entire:
  - HES database since 1997
  - cause of death data since 1993
  - cancer registrations since 1990
- Linked all three datasets based only on the pseudonymised NHS number: >99% complete
- Due to link to GP data in spring 2012
- Implementing into major GP computer systems

27 Key points
- Pseudonymisation at source
- Instead of extracting identifiers and storing lookup tables/keys centrally, the technology to generate the key is stored within the clinical systems
- Use of a project-specific encrypted salted hash ensures secure sets of IDs unique to each project
- Full control remains with the data controller
- Can work in addition to existing approaches
- Open-source technology, so transparent and free
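The effect of project-specific salts can be shown in a few lines. The same NHS numbers hashed with two different hypothetical project salts yield unrelated IDs. Pseudonymised extracts from separate projects therefore cannot be joined on their IDs. The NHS-number-plus-salt concatenation and UTF-8 encoding are assumptions.

```python
import hashlib


def pseudonymise(nhs_number: str, project_salt: str) -> str:
    # Assumed scheme: NHS number followed by the project salt, UTF-8, SHA-256 hex digest.
    return hashlib.sha256((nhs_number + project_salt).encode("utf-8")).hexdigest()


cohort = ["9434765919", "4010232137"]

# Hypothetical salts for two unrelated projects.
ids_a = {pseudonymise(n, "salt-for-project-A") for n in cohort}
ids_b = {pseudonymise(n, "salt-for-project-B") for n in cohort}

# Within one project the ID is stable, so that project's datasets link...
assert pseudonymise("9434765919", "salt-for-project-A") in ids_a
# ...but the two projects' ID sets do not overlap, so they cannot be joined.
print(len(ids_a & ids_b))
```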

28 Definition of clinical care team
- Important, as it determines whether s251 support is required
- Tendency by the research community to adopt a very broad definition to justify access
- Definition is tricky; as a guide, the individual:
  - has a duty of care to the patient
  - has a duty of confidence
  - would be recognised in that role by a reasonable patient

29 Implications of Open Data
- VERY difficult to see how patient-level data can be suitably de-identified so that it can be published online to meet Cameron's promises
- Current work by the IC/DH on a de-identification standard to help custodians decide when data can be published

