9th May 2006. Data Quality and Ensuring Usability …of routinely collected PC data. Presented to: Integrating Clinical and Genetic Datasets: Nirvana or Pandora’s Box. Presented by: Simon de Lusignan, email@example.com
About me: GP in Guildford; 11,500-patient practice; 6.5 whole-time-equivalent GPs; computerised since 1988; Senior Lecturer, St. George's; Primary Care Informatics (PCI) research group; using routinely collected data for quality improvement + research; electronic libraries; computer in the consultation; telemonitoring; Chair, PCI WG of EFMI; developing a BSc in BMI.
Overview. Introduction: benefits from linking clinical + genetic data; growing volumes of accessible primary care data, increasingly used for quality improvement + research. Objective: is it possible to define the features of a routinely collected dataset which can be integrated with genetic data? Method: literature review + 10 years of experiential learning working with data. Features of “quality” data: 1. What is data quality? 2. Unique identifiers + denominators. 3. What needs to be defined about data processing + storage. Discussion.
Introduction. “GIVEN”: benefits from linking clinical and genetic data. Routinely collected clinical data are used increasingly for: 1. Quality improvement; 2. Clinical audit; 3. Health service planning; 4. Research. References: 1. de Lusignan S, van Weel C. The use of routinely collected computer data for research in primary care: opportunities and challenges. Fam Pract. 2006 Apr;23(2):253-63. 2. de Lusignan S, Hague N, van Vlymen J, Kumarapeli P. Routinely collected general practice data are complex but with systematic processing can be used for quality improvement and research. Accepted for publication: Informatics in Primary Care.
Objective To define the features of clinical data which make them fit for integration with genetic data
Features of “quality” data: defining data quality; unique identifiers; defined process of data extraction + storage.
Defining data quality. Evolving definitions: completeness + accuracy (Pringle et al., BJGP 1995); currency (Williams, Methods 2003); sensitivity + positive predictive value (Thiru et al., BMJ 2003); Data Quality Probe (Brown + Warmington, IPC 2003); “fit for purpose” (PCI WG EFMI, 2005).
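Two of the measures listed on this slide have standard definitions: sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP). The following is a minimal sketch of how they could be computed for one diagnosis, assuming a reference standard exists that says which patients truly have the condition. The function name and the patient IDs are illustrative and do not come from the talk.

```python
def sensitivity_and_ppv(recorded, truly_present):
    """Compare coded records against a reference standard.

    recorded:      set of patient IDs with the diagnosis coded in the record
    truly_present: set of patient IDs who have the condition according to
                   the reference standard (an assumption of this sketch)
    Returns (sensitivity, positive_predictive_value).
    """
    true_pos = len(recorded & truly_present)    # coded and truly present
    false_neg = len(truly_present - recorded)   # present but not coded
    false_pos = len(recorded - truly_present)   # coded but not present

    # Guard against empty denominators rather than raising ZeroDivisionError.
    sensitivity = (true_pos / (true_pos + false_neg)
                   if (true_pos + false_neg) else float("nan"))
    ppv = (true_pos / (true_pos + false_pos)
           if (true_pos + false_pos) else float("nan"))
    return sensitivity, ppv


# Illustrative data only: 4 patients coded, 5 truly have the condition.
sens, ppv = sensitivity_and_ppv({1, 2, 3, 4}, {2, 3, 4, 5, 6})
print(sens, ppv)
```

Low sensitivity points to under-recording, while low PPV points to codes applied to patients who do not have the condition.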
Unique IDs: linkage of data; interoperability of systems; follow-up / traceability of individuals; population denominator + “ghost” patients. National identifiers: England + Wales, NHS number; Scotland, CHI number. Our system: “MIQUEST” unique ID for one practice, compounded with study number + unique ID for practice; converted to non-case-sensitive ASCII format.
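The compound identifier described on this slide could be built along the following lines. This is only a sketch: the separator, the field order, the function names and the example values are assumptions made for illustration, not the format actually used by the group.

```python
import unicodedata


def to_caseless_ascii(value):
    """Reduce an identifier to upper-case ASCII so that comparisons do not
    depend on case or on accented characters."""
    decomposed = unicodedata.normalize("NFKD", value.strip())
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.upper()


def compound_id(study_no, practice_id, patient_id, sep="-"):
    """Join study number, practice ID and within-practice patient ID.

    The layout STUDY-PRACTICE-PATIENT is hypothetical.
    """
    parts = [to_caseless_ascii(p) for p in (study_no, practice_id, patient_id)]
    if any(not p or sep in p for p in parts):
        raise ValueError("each part must be non-empty and must not contain the separator")
    return sep.join(parts)


# Hypothetical values; the same patient written with different casing
# resolves to the same compound ID.
print(compound_id("s01", "Prac7", "ab12Cd"))
print(compound_id("S01", "prac7", "AB12cd"))
```

One caveat with this approach: if the source system issues case-sensitive IDs, folding case can make two different patients collide, so it would be prudent to check for duplicates after conversion.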
Processing data: (1) Appreciation of data entry issues + contemporary perspective of system users; (2) Defined stages of data processing + applications used at each stage, + quality controls; (3) Archive coding systems and the look-up tables used to infer meaning or rubrics; (4) The queries used to extract the data; (5) A metadata system to ensure traceability of each cell of data; (6) The ethical constraints that apply to the dataset.
(1) Data entry issues + contemporary perspective of users. Examples: COPD and bronchitis codes are easily confused; recoding half of the practice asthmatics from a diagnosis to a “history of” code. Ref: Faulconer ER, de Lusignan S. An eight-step method for assessing diagnostic data quality: COPD as an exemplar. Inform Prim Care. 2004;12(4):243-54.
(2) Defined stages of data processing. We have defined eight discrete steps in data processing: (1) design of queries + piloting; (2) data entry (already dealt with); (3) extraction; (4) migration (unique IDs essential); (5) integration; (6) cleaning; (7) processing; and (8) analysis. Ref: van Vlymen J, de Lusignan S, Hague N, Chan T, Dzregah B. Ensuring the Quality of Aggregated General Practice Data: Lessons from the Primary Care Data Quality Programme (PCDQ). Stud Health Technol Inform. 2005;116:1010-5.
(3) Archive coding systems. Coding systems are constantly evolving; in general, coding systems are becoming larger + more complex. You can go from many to few, but not from few to many. We archive: the clinical codes look-up engine used, e.g. NHS Triset Browser; each relevant version of the coding systems, e.g. 4- and 5-Byte Read Codes, Drug Dictionary, proprietary codes.
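The “many to few, but not few to many” point can be illustrated with a small sketch. Collapsing detailed codes into coarser ones is a simple look-up, but the reverse look-up returns several candidates, so the original detail cannot be recovered unless the source codes were archived. The codes below are invented placeholders, not real Read Codes, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical look-up table: several fine-grained codes map to one coarse code.
FINE_TO_COARSE = {
    "X1a": "X1",
    "X1b": "X1",
    "X2a": "X2",
}


def collapse(codes, lookup):
    """Many to few: every fine code has exactly one coarse equivalent."""
    return [lookup[code] for code in codes]


def candidates(coarse_code, lookup):
    """Few to many: list every fine code that could have produced coarse_code."""
    inverse = defaultdict(set)
    for fine, coarse in lookup.items():
        inverse[coarse].add(fine)
    return inverse[coarse_code]


print(collapse(["X1a", "X1b", "X2a"], FINE_TO_COARSE))
# More than one candidate means the original code is not recoverable.
print(sorted(candidates("X1", FINE_TO_COARSE)))
```

This is one reason for archiving each version of the coding system together with the look-up tables used at the time of extraction.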
(4) The query library. Re-issued by date. Query set for each clinical programme, e.g. C1, C2, C3 (Cardiac programme). Query set for each extraction type, e.g. E4, E5, G4, G5 (E for EMIS, G for Generic). Defined look-up tables + rubrics for queries.
(5) Metadata system: follows data from query set to analysis; preserves original data; derived variables clearly identified; associated dates + numerics labelled; rules for units used; look-up table used to define variable names. Ref: van Vlymen J, de Lusignan S. A system of metadata to control the process of query, aggregating, cleaning and analysing large datasets of primary care data. Inform Prim Care. 2005;13(4):281-91.
Source data: metadata structure. Example variable name: C2_PDNP P1 _G3_1_DI. Component labels shown on the slide: originating query set bigram; query file; Read code / CCC; repeat index; type bigram. Type bigrams and their meanings: DI = Diagnosis; RX = Drugs, Prescription; OC = Occupation; HO = History, Symptoms; OE = Examination, Signs.
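A variable name of this kind could be decoded as sketched below. This rests on assumptions that the transcript does not confirm: that the five slide labels correspond, in order, to five underscore-separated fields, and that the spaces inside the example name are extraction artefacts that can be dropped. The class and function names are hypothetical.

```python
from typing import NamedTuple

# Type bigrams and meanings as listed on the slide.
TYPE_BIGRAMS = {
    "DI": "Diagnosis",
    "RX": "Drugs / Prescription",
    "OC": "Occupation",
    "HO": "History / Symptoms",
    "OE": "Examination / Signs",
}


class VariableName(NamedTuple):
    query_set: str      # originating query set bigram
    query_file: str     # query file
    code: str           # Read code / CCC
    repeat_index: int   # repeat index
    type_bigram: str    # type bigram
    type_meaning: str   # meaning looked up from TYPE_BIGRAMS


def parse_variable_name(name):
    """Split a metadata variable name into its labelled components.

    Assumes five underscore-separated fields; whitespace is removed first.
    """
    fields = "".join(name.split()).split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {name!r}")
    query_set, query_file, code, repeat, bigram = fields
    if bigram not in TYPE_BIGRAMS:
        raise ValueError(f"unknown type bigram: {bigram!r}")
    return VariableName(query_set, query_file, code, int(repeat),
                        bigram, TYPE_BIGRAMS[bigram])


parsed = parse_variable_name("C2_PDNP P1 _G3_1_DI")
print(parsed.query_set, parsed.repeat_index, parsed.type_meaning)
```

Encoding provenance in the variable name means each column can be traced back to the query set that produced it, which is the traceability the metadata system is meant to provide.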
Data quality is best defined in terms of “fitness for purpose”: what purpose, when? Transparent methods of data processing allow audit of results. Understanding data entry issues / context is essential. Metadata can help control processing. Careful curation of data may allow its use beyond the timescale of the original study.
9th May 2006. Thanks for listening. Simon de Lusignan; Tel: 020 8725 5661; Fax: 020 8767 7697; Email: firstname.lastname@example.org; Web: www.gpinformatics.org, www.sgul.ac.uk/informatics/