Implementation of Probabilistic Matching in NYC Chronic Hepatitis B and NYC A1C Registries, and Implications Towards an MPI Maushumi Mavinkurve Director,


1 Implementation of Probabilistic Matching in NYC Chronic Hepatitis B and NYC A1C Registries, and Implications Towards an MPI
Maushumi Mavinkurve, Director, Center for Data Matching, NYC Department of Health and Mental Hygiene
October 17th, 2008, Integrated Surveillance Seminar

2 Overview
- Describe data quality challenges in disease surveillance
- Describe probabilistic matching techniques
- Implementation of probabilistic matching
  - NYC Chronic Hepatitis B Registry (LVR)
  - NYC Hemoglobin A1C Registry (NYCAR)
- NYC: proposed challenges and benefits of an MPI

3 Public Health Surveillance
The public health surveillance process includes:
- Collection of data on a specific disease or condition via standardized information systems
- Analysis and interpretation of the data
- Dissemination of information to individuals who can act on it
- Utilization of information to facilitate the necessary response that will effectively deal with the public health issue

4 Surveillance Data Quality Issues
- Accuracy
  - Non-standardized across different data sources
  - Multiple laboratory systems
- De-duplication of reports
  - Exact duplicates
  - Multiple events linked to a unique person
- Non-relevant information
Notes:
- Accuracy refers to the difference between an estimate of a parameter and its true value. We characterize the difference in terms of systematic (bias) and random (variance) errors.
- Completeness
- Integrity
- Timeliness refers to the length of time between the reference period of the information and when we deliver the data product to our customers.
- Relevance refers to the degree to which our data products provide information that meets our customers’ needs.
- Accessibility refers to the ease with which customers can identify, obtain, and use the information in our data products.
- Interpretability refers to the availability of documentation to aid customers in understanding and using our data products. This documentation typically includes: the underlying concepts; definitions; the methods used to collect, process, and analyze the data; and the limitations imposed by the methods used.
- Transparency refers to providing documentation about the assumptions, methods, and limitations of a data product to allow qualified third parties to reproduce the information, unless prevented by confidentiality or other legal constraints.

5 Impact of Data Quality Issues in Surveillance
- Impacts on surveillance reporting
  - Over or underestimates of true cases
  - Geographical misrepresentation (missing address)
- Increases costs
  - Additional staff required to address data quality issues
- Increases inefficiencies
  - Timeliness for patient or provider follow up

6 Addressing Data Quality Challenges
Modern disease surveillance information systems:
- Validate data at time of collection
  - Minimize inaccurate or incomplete data
- Standardize different data to a uniform structure
- Integrate matching technology to create:
  - Patient indexes (person-centric systems vs event-centric systems)
  - Provider indexes
  - Facility indexes
Note: each registry could be described as a system that will ultimately feed from a larger MPI.

7 What is Probabilistic Matching?
- Rule based match algorithms employing fields that uniquely identify an entity: name, dob, gender, telephone, etc.
- Standardizes data
  - Formalizes names: Mike → Michael
- Parses data into smaller tokens
  - Addressline1 → house number, street name, street type, apt #
- Creates fields that enhance matching
  - Phonetic coding: Soundex, NYSIIS
  - Hash and packed keys
- Adapts to specific data: incorporates uniqueness or frequency of data values when comparing records
  - “Mary Jones” vs “Maushumi Mavinkurve”
- Processes data in blocks: viable to use on large volume data sets
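The steps listed on this slide can be sketched in a few lines of Python. This is an illustration only, not the registry's implementation: the nickname table, the sample records, and the function names are hypothetical, and the frequency weight (log2 of total records over records sharing the value) is a simplified stand-in for however the actual matching software scores rare versus common values.

```python
import math
from collections import Counter, defaultdict

# Hypothetical nickname table; a production list would be far larger.
NICKNAMES = {"mike": "michael", "bill": "william", "liz": "elizabeth"}

_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
SOUNDEX_DIGIT = {ch: d for d, letters in _GROUPS.items() for ch in letters}


def standardize_first_name(name):
    """Formalize a first name, e.g. 'Mike' -> 'michael'."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def soundex(name):
    """American Soundex: first letter followed by three digits."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_DIGIT.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGIT.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate same-coded consonants
            prev = digit
    return (code + "000")[:4]


def block_records(records, key):
    """Group records by a blocking key so only records in the same block are compared."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return blocks


def surname_agreement_weight(surname, surname_counts):
    """Simplified frequency weight: agreement on a rare value counts for more."""
    total = sum(surname_counts.values())
    return math.log2(total / surname_counts[surname])


# Made-up sample records for illustration.
records = [
    {"first": "Mike", "last": "Jones"},
    {"first": "Michael", "last": "Johns"},
    {"first": "Mary", "last": "Jones"},
    {"first": "Maushumi", "last": "Mavinkurve"},
]
blocks = block_records(records, key=lambda r: soundex(r["last"]))
surname_counts = Counter(r["last"].lower() for r in records)
```

In this toy data, agreement on "mavinkurve" earns a higher weight than agreement on "jones", which is the point of the “Mary Jones” vs “Maushumi Mavinkurve” comparison on the slide.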

8 Evaluating Match Algorithm
- Outcome of a potential match is a weight or likelihood that 2 records are the same entity
- Surveillance programs identify thresholds for the match algorithm
- Prior to reviewing results of the match algorithm:
  - Identify implications for precision (PPV) vs negative predictive value (NPV)
    - Evaluation of health code mandate
    - Practical issues
    - Surveillance reporting
  - Identify guidelines or criteria to review matches

9 Identifying Thresholds
- Goal: maximize precision or PPV
  - Sacrifice on negative predictive value (NPV)
- Surveillance programs can decide to review ambiguous matches
- Therefore: set high thresholds
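The trade-off on this slide can be made concrete with a small calculation. The sketch below assumes a manually reviewed set of candidate pairs, each with a match weight and a reviewer's verdict; the data values and the function name are invented for illustration.

```python
def ppv_npv(reviewed_pairs, threshold):
    """Return (PPV, NPV) when pairs with weight >= threshold are linked.

    reviewed_pairs: iterable of (weight, is_true_match) from manual review.
    """
    tp = fp = tn = fn = 0
    for weight, is_true_match in reviewed_pairs:
        linked = weight >= threshold
        if linked and is_true_match:
            tp += 1
        elif linked:
            fp += 1
        elif is_true_match:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Made-up review results: (match weight, reviewer says same person?)
reviewed = [
    (12, True), (10, True), (9, False), (8, True),
    (6, True), (5, False), (3, False), (1, False),
]

low = ppv_npv(reviewed, threshold=5)    # links more pairs
high = ppv_npv(reviewed, threshold=10)  # links only the strongest pairs
```

With this toy data the low threshold links two false pairs (PPV 4/6) but misses no true matches (NPV 1.0), while the high threshold links no false pairs (PPV 1.0) at the cost of two missed matches (NPV 4/6).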

10 Outcome of Probabilistic Matching
Entity-centric, relational registry system

11 Background of Hepatitis B in NYC
- Hepatitis B surface antigen test was developed and FDA approved in the 1980s
- Decline in acute Hepatitis B incidence rates from 11.5 cases per 100,000 persons in 1985 to 1.6 in 2006
- In NYC, burden of chronic Hepatitis B infection is up to 2x higher within specific populations:
  - MSM
  - IDU
  - Persons born in regions where HBsAg prevalence is >2% (Asian/PI, Eastern Europe, Middle East, Africa, Pacific Island immigrants)
- Need for continued surveillance and monitoring
Source: Recommendations for identification and public health management of persons with chronic Hepatitis B infection

12 Hepatitis B Surveillance Activities
- Monitor disease trends
  - Aggregate descriptive reporting aimed to guide prevention and intervention efforts
- Outreach with newly infected
  - Educational materials to new cases reported to the registry

13 NYC Hepatitis B Registry
- Legacy application, built in-house in 1999
- Automatic weekly batch uploads of laboratory reports
- Data entry of provider reports
- System did not index on patients (event-based) and could not link 2 reports for the same person
- Program utilized staff to build and apply deterministic match algorithms
  - Resource intensive
  - Version control

14 NYC Liver Virus Registry (LVR)
- Implemented in October 2008 (2 weeks ago!), built in-house
- Migrated all legacy data: almost 10 years' worth of data
- Web-based application
- Person-centric: integrates probabilistic matching
  - Consolidated views of all information for a person
  - Ability to conduct longitudinal analysis

15 LVR Probabilistic Matching
- Created a match algorithm based on fields unique to the patient from laboratory and provider reports
- Processed all legacy data: ~380,000 records
- Program evaluated the algorithm and identified thresholds
- Results: the match algorithm linked the ~380,000 reports to ~111,000 unique persons
- Probabilistic matching enhanced de-duplication by 1% as compared to the legacy deterministic algorithm

16 LVR Challenges & Successes
Challenges:
- Iterative review process is time and resource intensive
- Evaluation against the legacy deterministic match: not a gold standard, and the legacy match itself was not evaluated
- Identifying target PPV and NPV
Successes:
- Long term savings on time and resources
- Streamlined system
- Longitudinal analysis
- More accurate case counting
- Enhanced data quality

17 Implementing Probabilistic Matching with NYC Hemoglobin A1C Registry (NYCAR)

18 What is Diabetes?
- Diabetes is a chronic disease caused by inadequate insulin levels or sensitivity, leading to elevated blood sugar levels
- Blood sugar levels can be measured by:
  - Plasma glucose
  - Fingerstick glucose
  - Glycosylated hemoglobin or A1C (goal is <7%)
- Persistently high blood sugar levels can cause:
  - Heart disease and stroke
  - Kidney failure
  - Blindness
  - Nerve damage and amputation

19 Diabetes Burden in NYC
- Diabetes is epidemic in NYC
  - Prevalence has more than doubled over the past 10 years
  - Approximately 500,000 New Yorkers have diabetes
  - An additional ~200,000 New Yorkers have diabetes, but have not yet been diagnosed
  - Approximately 1 in 8 adults have diabetes
- In 2006, diabetes was the 4th leading cause of death in NYC

20 Prevalence of Self-Reported Diabetes Among Adults in NYC
Source: NYC estimates: CDC Behavioral Risk Factor Surveillance System (BRFSS), NYC Community Health Survey (
Source: National estimates: BRFSS 2006

21 Use of Traditional Public Health Surveillance for Chronic Disease
Disease reporting to public health agency to:
- Monitor trends
- Describe glycemic control in NYC
- Identify special populations
- Target individuals with poor control
- Communicate with provider community
- Feedback to providers and their patients
- Control epidemics
- Decrease complications/improve quality of life

22 Hemoglobin A1C Tests
- A1C is a measure of average blood sugar control in the preceding 3 months (goal <7%)
- A1C is used to:
  - Monitor an individual’s blood sugar control
  - Guide changes in medication therapy
  - Convey risk of diabetes complications
- Most people who get A1Cs have diabetes, so it is a marker for diabetes status
- THEREFORE, AN A1C REGISTRY WILL PROVIDE A MECHANISM FOR TRACKING INDIVIDUALS WITH DIABETES
Note: Goal is an A1C of 7.0% (average blood sugar of 170 mg/dL).

23 Implementation of NYCAR
- Based on existing NY State / NYC laboratory reporting system
- Amendment to NYC health code, Article 13, which mandates communicable disease reporting, to include A1C
  - Public hearing Summer 2005
  - Approval of amendment December 2005
  - Went into effect January 15, 2006
- Laboratories submitting data to NY State and NYC are subject to the mandate
  - Report information on patient, ordering provider and facility, testing facility and result
  - Submit via secure network
- Receive ~5,000 new lab reports daily (high volume)
Notes:
- Patient advocacy and privacy groups voiced concerns during amendment proceedings
  - Felt there was no satisfactory rationale for public health agency involvement in chronic disease reporting
- DOHMH clearly wrote into the amendment that information can only be released to:
  - Treating medical provider(s)
  - Patient
- Patients can opt out of intervention but not from being in the registry
- Laboratory reporting:
  - 34 labs reporting A1C tests
  - Test results reported by lab within 24 hours
  - PHINMS: secure file transmission
  - HL7 messages or ASCII files

24 Objectives of New York City A1C Registry (NYCAR)
- Surveillance and epidemiology
  - Track trends on the population level
- Provider feedback and communication
  - Quarterly provider reports in comparison to peers
  - Quarterly rosters of patients stratified by A1C level
- Patient feedback (via provider)
  - Letters with A1C information
  - Local resources
- Deliver resources to providers/patients
- All of the above require matching and data linkages
Notes:
- Began January 15, 2006 with the mandate of electronic lab reporting.
- Provider Reports: Quarterly reports with patients listed by A1C level will be distributed to providers. Reports may be used to identify individuals who may benefit from additional support, such as intensification of therapy, or a referral to a physical activity program or self-management program.
- Patient Letters: Letters with recent A1C test results and a reminder to return to care will be sent to patients with high A1C levels.

25 Components of A1C Registry
Information collected from laboratory reports includes:
- Individual name, address, date of birth, sex
- Name and address of ordering provider, ordering facility and testing facility
- A1C test collection date and result

26 NYCAR Probabilistic Methodology
- Created 3 separate matching models:
  - Patient
  - Provider
  - Ordering Facility
- Obtained a representative sample of data
  - Sample used about 100,000 records selected from a specific time period
- For each model, created a match algorithm utilizing fields that uniquely identify each entity
  - Name (patient, provider, ordering facility), patient dob, gender, address, providerID, telephone number, etc.
- Provided match results to the program to review and identify thresholds
- Creates indexes for patient, provider and ordering facility:
  - Each patient appears once and all tests for that individual are linked
  - Each provider appears once and all tests reported by that provider are linked
  - Each ordering facility appears once and all reports by that facility are linked

27 Program Threshold Evaluation
- Due to the volume of reports, it is impractical for staff to review all ambiguous matches, so thresholds need to be set
- Method to identify thresholds using the sample:
  - 2 reviewers and 1 tie-breaker scored matches, referencing guidelines
  - Utilized a sampling method within weight ranges
  - Identified the specific weight or threshold at which target precision rates were met, based on review
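One way the review procedure on this slide could be organized is sketched below. The band width, sample size, target precision, and all function names are assumptions made for illustration; the slide does not specify how the sampling or threshold search was actually carried out.

```python
import random
from collections import defaultdict


def sample_by_weight_band(pairs, band_width, per_band, seed=0):
    """Draw up to per_band candidate pairs from each weight range.

    pairs: list of (pair_id, weight).
    """
    rng = random.Random(seed)
    bands = defaultdict(list)
    for pair_id, weight in pairs:
        bands[int(weight // band_width)].append((pair_id, weight))
    sample = []
    for band in sorted(bands):
        members = bands[band]
        sample.extend(rng.sample(members, min(per_band, len(members))))
    return sample


def adjudicate(reviewer_a, reviewer_b, tie_breaker):
    """Two reviewers decide; the tie-breaker is used only when they disagree."""
    return reviewer_a if reviewer_a == reviewer_b else tie_breaker


def lowest_threshold_meeting(reviewed, target_precision):
    """Lowest weight w such that precision among pairs with weight >= w meets the target.

    reviewed: list of (weight, is_true_match). Returns None if no weight qualifies.
    """
    best = None
    for candidate in sorted({w for w, _ in reviewed}, reverse=True):
        linked = [is_match for w, is_match in reviewed if w >= candidate]
        precision = sum(linked) / len(linked)
        if precision >= target_precision:
            best = candidate
    return best


# Made-up adjudicated review results: (match weight, same person?)
reviewed = [(12, True), (10, True), (9, False), (8, True), (6, True), (5, False)]
threshold_75 = lowest_threshold_meeting(reviewed, target_precision=0.75)
threshold_100 = lowest_threshold_meeting(reviewed, target_precision=1.0)
```

Note that precision is not guaranteed to fall steadily as the threshold drops, so a stricter variant would stop at the first weight where the target is missed rather than keep scanning.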

28 Deploying Probabilistic Matching
On a weekly basis, the following process occurs:
- All new incoming A1C lab reports are parsed into 3 staging entities: patient, provider and facility
- Each entity is matched against the existing respective entities in the registry
  - If matched above thresholds, it is linked to an existing record
  - If below thresholds, a new entity (patient, provider or facility) is created
- Provider Reports and Rosters and Patient Letters are generated using an in-house developed application which reads from the registry
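The link-or-create step described on this slide can be sketched as follows. The class, the toy scoring function, and the field names are hypothetical; a real deployment would keep one such index per entity type (patient, provider, facility), restrict comparisons to a block of candidates, and plug in the tuned match algorithm in place of the toy scorer. The sketch treats a weight equal to the threshold as a match, which is a choice made here and not something the slide states.

```python
class EntityIndex:
    """Keeps one record per entity and the reports linked to it."""

    def __init__(self, score, threshold):
        self.score = score          # function(record_a, record_b) -> match weight
        self.threshold = threshold  # weight at or above which records are linked
        self.entities = {}          # entity_id -> representative record
        self.reports = {}           # entity_id -> list of linked report ids
        self._next_id = 1

    def link_or_create(self, report_id, record):
        """Link the report to the best existing entity, or create a new entity."""
        best_id, best_weight = None, float("-inf")
        for entity_id, existing in self.entities.items():
            weight = self.score(record, existing)
            if weight > best_weight:
                best_id, best_weight = entity_id, weight
        if best_id is not None and best_weight >= self.threshold:
            self.reports[best_id].append(report_id)
            return best_id
        new_id = self._next_id
        self._next_id += 1
        self.entities[new_id] = record
        self.reports[new_id] = [report_id]
        return new_id


def toy_score(a, b):
    """Hypothetical scorer: one point per agreeing field."""
    return sum(1 for field in ("last", "first", "dob") if a.get(field) == b.get(field))


patients = EntityIndex(toy_score, threshold=2)
id1 = patients.link_or_create("rpt-1", {"last": "jones", "first": "mary", "dob": "1970-01-01"})
id2 = patients.link_or_create("rpt-2", {"last": "jones", "first": "mary", "dob": "1970-01-01"})
id3 = patients.link_or_create("rpt-3", {"last": "jones", "first": "mark", "dob": "1982-05-09"})
```

Here the first two reports land on the same patient, so a later report generator can read all of that person's tests from one entry, while the third report creates a new patient because only the surname agrees.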

29 Facility Report (Page 1 and Page 2) Note: All information in this slide is fictitious

30 Provider Report Note: All information in this slide is fictitious

31 Patient Letter

32 Challenges and Successes
Challenges:
- Quality of record linkage
  - Misconceptions: bad quality data cannot be reconciled!
  - Need sufficient information for successful linkage of multiple tests per individual as well as master provider and facility indexing
- Maintaining accurate facility-provider linkage
  - 1 large umbrella facility can have multiple names
  - Providers can work for multiple facilities
- Case definitions
  - Individuals with diabetes
  - Provider for a given patient
- Effect of inter-laboratory variation: data integrity and availability
- Review thresholds: time and resource intensive
Successes:
- Entire process is seamless, electronic and automated
- High volume of data
- Ability to conduct longitudinal analysis

33 Is NYC ready for an MPI?

34 NYC Current Status
Modernizing several disease registries:
- Chronic Hepatitis B: completed
- NYCAR: completed
- STD: requirements completed
- TB: requirements completed
- HIV: planning
Is this an opportune time to develop an MPI?

35 Planning an MPI: Challenges
Challenges:
- Each registry program has requirements for matching based on:
  - Patient population
  - Data quality and volume
  - Dissemination/use of surveillance data
- Foster consensus among disease programs
- Breach of security: higher risk
- Legal barriers to creating an MPI
  - Analysis of health code by reportable disease
  - Laws, particularly in NYC, are extremely specific. Mandated reportables must be used for the purpose of surveillance and epidemiology of that specific disease. Is an MPI a stretch? Particularly for A1C data, which has the most strictly written health code.
- Political barriers to creating an MPI

36 Planning an MPI: Benefits
- Pooling data from different sources could enhance PPV and NPV of the match
- Streamline IT resources
  - Support staff
  - Infrastructure
- Ability to conduct syndemic surveillance and investigation
- More efficient use of limited resources
Note: A syndemic is defined as two or more afflictions, interacting synergistically, contributing to excess burden of disease in a population.

37 Acknowledgements
- Diabetes Prevention and Control Program: Lynn Silver, Shadi Chamany, Angela Merges, Charlotte Neuhaus, Bahman Tabei, Cindy Driver, Leslie Korenda
- Division of Informatics and Information Technology: Don Weiner, Stephen Giannotti, Namrata Kumar, Jisen Ho, Laura Goodman
- Bureau of Chronic Disease Control: Katherine Bornschlegl, Magdalena Berger, Emily Lumeng
- Division of Epidemiology: Lorna Thorpe, Bonnie Kerker, Jenna Mandel-Ricci, Ram Koppaka

38 Questions?
Maushumi Mavinkurve, Director, Center for Data Matching
NYC Department of Health and Mental Hygiene (P)

