Implementation of Probabilistic Matching in NYC Chronic Hepatitis B and NYC A1C Registries, and Implications Towards an MPI Maushumi Mavinkurve Director,

1 Implementation of Probabilistic Matching in NYC Chronic Hepatitis B and NYC A1C Registries, and Implications Towards an MPI Maushumi Mavinkurve Director, Center for Data Matching NYC Department of Health and Mental Hygiene October 17th, 2008 Integrated Surveillance Seminar

2 Overview Describe data quality challenges in disease surveillance Describe probabilistic matching techniques Implementation of probabilistic matching –NYC Chronic Hepatitis B Registry (LVR) –NYC Hemoglobin A1C Registry (NYCAR) NYC proposed challenges and benefits of an MPI

3 Public Health Surveillance Public health surveillance process includes: –Collection of data on a specific disease or condition via standardized information systems –Analysis and interpretation of the data –Dissemination of information to individuals who can act on it –Utilization of information to facilitate necessary response that will effectively deal with the public health issue

4 Surveillance Data Quality Issues Accuracy Non-standardized across different data sources –Multiple laboratory systems De-duplication of reports –Exact duplicates –Multiple events linked to a unique person Non-relevant information

5 Impact of Data Quality Issues in Surveillance Impacts on surveillance reporting –Over or underestimates of true cases –Geographical misrepresentation (missing address) Increases costs –Additional staff required to address data quality issues Increases inefficiencies –Timeliness for patient or provider follow up

6 Addressing Data Quality Challenges Modern disease surveillance information systems: Validate data at time of collection –Minimize inaccurate or incomplete data Standardize different data to a uniform structure Integrate matching technology to create –Patient indexes (person-centric systems vs event-centric systems) –Provider indexes –Facility indexes

7 What is Probabilistic Matching? Rule based match algorithms Standardizes Data Parses data into smaller tokens Create fields that enhance matching Adapt to specific data - incorporates uniqueness or frequency of data values when comparing records Processes data in blocks – viable to use on large volume data sets
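The steps listed on this slide can be sketched in a few lines of Python. The slides do not name the model or product used, so this is only an illustration under stated assumptions: a log2(m/u) agreement weight in the style of Fellegi-Sunter record linkage, token frequency standing in for the chance-agreement probability, a birth-year blocking key, and made-up field names and records.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical records; field names are illustrative, not the registry's schema.
RECORDS = [
    {"id": 1, "name": "SMITH, John ", "dob": "1970-01-02"},
    {"id": 2, "name": "john smith", "dob": "1970-01-02"},
    {"id": 3, "name": "Jon Smyth", "dob": "1970-03-09"},
    {"id": 4, "name": "Ana Zielinski", "dob": "1982-11-30"},
    {"id": 5, "name": "Zielinski, Ana", "dob": "1982-11-30"},
    {"id": 6, "name": "Mary Smith", "dob": "1970-05-05"},
]

M_PROB = 0.95  # assumed P(token agrees | same person); not taken from the slides


def tokens(name):
    """Standardize a name (case, punctuation) and parse it into a set of tokens."""
    return frozenset(re.sub(r"[^a-z ]", " ", name.lower()).split())


def token_frequencies(records):
    """Relative frequency of each name token, used as a rough chance-agreement rate."""
    counts = Counter(t for r in records for t in tokens(r["name"]))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def pair_weight(a, b, freq):
    """Sum log2(m/u) over shared name tokens, so rarer tokens contribute more."""
    shared = tokens(a["name"]) & tokens(b["name"])
    return sum(math.log2(M_PROB / freq[t]) for t in shared)


def candidate_pairs(records):
    """Blocking: only compare records that share a birth year."""
    blocks = defaultdict(list)
    for r in records:
        blocks[r["dob"][:4]].append(r)
    for block in blocks.values():
        yield from combinations(block, 2)


def score_all(records):
    """Return {(id_a, id_b): weight} for every pair that survives blocking."""
    freq = token_frequencies(records)
    return {(a["id"], b["id"]): pair_weight(a, b, freq)
            for a, b in candidate_pairs(records)}
```

In this toy data the pair sharing the rarer tokens "ana" and "zielinski" scores higher than the pair sharing "john" and "smith", and records born in different years are never compared. A production algorithm would also weigh disagreement and use more fields than name and birth year.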

8 Evaluating Match Algorithm Outcome of a potential match is a weight or likelihood that 2 records are the same entity Surveillance programs identify thresholds for match algorithm Prior to reviewing results of match algorithm: –Identify implications for precision (PPV) vs negative predictive value (NPV) Evaluation of health code mandate Practical issues Surveillance reporting –Identify guidelines or criteria to review matches

9 Identifying Thresholds Goal: maximize precision or PPV Sacrifice on negative predictive value (NPV) Surveillance programs can decide to review ambiguous matches Therefore - set high thresholds
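A small worked example of the trade-off described above, as a sketch only: the weights and reviewer labels below are invented, and the convention that a pair is linked when its weight is at or above the threshold is an assumption.

```python
# Hypothetical reviewed pairs: (match weight, reviewers judged it a true match).
REVIEWED = [
    (2, False), (4, False), (5, True), (6, False),
    (8, True), (9, True), (11, True), (12, True),
]


def ppv_npv(reviewed, threshold):
    """Compute PPV and NPV when pairs with weight >= threshold are linked."""
    tp = sum(1 for w, m in reviewed if w >= threshold and m)
    fp = sum(1 for w, m in reviewed if w >= threshold and not m)
    tn = sum(1 for w, m in reviewed if w < threshold and not m)
    fn = sum(1 for w, m in reviewed if w < threshold and m)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

With this data, a threshold of 5 links six pairs, five of them correct (PPV 5/6, NPV 1.0). Raising the threshold to 8 links only correct pairs (PPV 1.0) but leaves one true match unlinked (NPV 0.75).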

10 Outcome of Probabilistic Matching Entity-centric, relational registry system

11 Background of Hepatitis B in NYC Decline in acute Hepatitis B incident case rates (per 100,000 persons) from 11.5 in 1985 to 1.6 in 2006 In NYC burden of chronic Hepatitis B infection as much as 2x higher within specific populations –MSM –IDU –Persons born in regions where HBsAg prevalence >2% Need for continued surveillance and monitoring Source: recommendations for identification and public health management of persons with chronic Hepatitis B infection

12 Hepatitis B Surveillance Activities Monitor disease trends Aggregate descriptive reporting aimed to guide prevention and intervention efforts Outreach with newly infected –Educational materials to new cases reported to the registry

13 NYC Hepatitis B Registry Legacy application, built in-house in 1999 Automatic weekly batch uploads of laboratory reports Data entry of provider reports System did not index on patients (event-based), could not link 2 reports for the same person. Program utilized staff to build and apply deterministic match algorithms –Resource intensive –Version control
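For contrast with the probabilistic approach, a deterministic match can be sketched as an exact comparison on a fixed key. The slides do not describe the legacy rules, so the key below (normalized last name, first initial, date of birth) and the sample reports are purely hypothetical.

```python
import re
from collections import defaultdict


def match_key(record):
    """Hypothetical deterministic key: last name, first initial, date of birth."""
    last = re.sub(r"[^a-z]", "", record["last"].lower())
    first = re.sub(r"[^a-z]", "", record["first"].lower())
    return (last, first[:1], record["dob"])


def group_reports(reports):
    """Group report ids whose keys are exactly equal."""
    groups = defaultdict(list)
    for r in reports:
        groups[match_key(r)].append(r["report_id"])
    return list(groups.values())


# Made-up reports that plausibly refer to one person.
REPORTS = [
    {"report_id": 1, "last": "O'Neil", "first": "Sean", "dob": "1975-04-12"},
    {"report_id": 2, "last": "ONEIL", "first": "S.", "dob": "1975-04-12"},
    {"report_id": 3, "last": "O'Neill", "first": "Sean", "dob": "1975-04-12"},
]
```

Here `group_reports(REPORTS)` links reports 1 and 2 but leaves report 3 on its own because of a one-letter spelling difference. Each such variant needs another hand-written rule, which is one plausible reading of the maintenance and version-control burden the slide mentions.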

14 NYC Liver Virus Registry (LVR) Implemented in October 2008, built in-house Migrated all legacy data Web-based application Person-centric - integrates probabilistic matching Consolidated views of all information for a person Ability to conduct longitudinal analysis

15 LVR Probabilistic Matching Created a match algorithm based on fields unique to patient from laboratory and provider reports Processed all legacy data ~380,000 records Program evaluated algorithm and identified thresholds Results: the match algorithm linked the ~380,000 reports to ~111,000 unique persons Probabilistic matching enhanced de-duplication by 1% as compared to legacy deterministic algorithm

16 LVR Challenges & Successes Challenges: –Iterative review process time and resource intensive –Evaluation against legacy deterministic match –Identifying target PPV and NPV Successes: –Long term savings on time and resources –Streamlined system –Longitudinal analysis –More accurate case counting –Enhanced data quality

17 Implementing Probabilistic Matching with NYC Hemoglobin A1C Registry (NYCAR)

18 What is Diabetes? Diabetes is a chronic disease caused by inadequate insulin levels or sensitivity leading to elevated blood sugar levels Blood sugar levels can be measured by –Plasma glucose –Fingerstick glucose –Glycosylated hemoglobin or A1C (goal is <7%) Persistently high blood sugar levels can cause –Heart disease and stroke –Kidney failure –Blindness –Nerve damage and amputation

19 Diabetes Burden in NYC Diabetes is epidemic in NYC Prevalence has more than doubled over the past 10 years. Approximately 500,000 New Yorkers have diabetes An additional ~200,000 New Yorkers have diabetes, but have not yet been diagnosed Approximately 1 in 8 adults have diabetes In 2006, diabetes was the 4th leading cause of death in NYC

20 Prevalence of Self-Reported Diabetes Among Adults in NYC [chart not reproduced in transcript] Sources: NYC estimates from CDC Behavioral Risk Factor Surveillance System (BRFSS) 1994-2001 and NYC Community Health Survey 2002-2006; national estimates from BRFSS 2006

21 Use of Traditional Public Health Surveillance for Chronic Disease Disease reporting to public health agency to: –Monitor trends Describe glycemic control in NYC –Identify special populations Target individuals with poor control –Communicate with provider community Feedback to providers and their patients –Control epidemics Decrease complications/improve quality of life

22 Hemoglobin A1C Tests A1C is a measure of average blood sugar control in preceding 3 months (goal <7%) A1C is used to: –Monitor individual’s blood sugar control –Guide changes in medication therapy –Impart risk of diabetes complications Most people who get A1Cs have diabetes so it is a marker for diabetes status THEREFORE, AN A1C REGISTRY WILL PROVIDE A MECHANISM FOR TRACKING INDIVIDUALS WITH DIABETES

23 Implementation of NYCAR Based on existing NY State / NYC laboratory reporting system Amendment to NYC health code, Article 13 which mandates communicable disease reporting, to include A1C –Public hearing Summer 2005 –Approval of amendment December 2005 –Went into effect January 15, 2006 Laboratories submitting data to NY State and NYC subject to mandate –Report information on patient, ordering provider and facility, testing facility and result –Submit via secure network Receive ~5,000 new lab reports daily – High Volume

24 Objectives of New York City A1C Registry (NYCAR) Surveillance and epidemiology –Track trends on the population level Provider feedback and communication –Quarterly provider reports in comparison to peers –Quarterly rosters of patients stratified by A1C level Patient feedback (via provider) –Letters with A1C information –Local resources Deliver resources to providers/patients All of the above requires matching and data linkages

25 Components of A1C Registry Information collected by laboratory reports include: –Individual name, address, date of birth, sex –Name and address of ordering provider, ordering facility and testing facility –A1C test collection date and result

26 NYCAR Probabilistic Methodology Created 3 separate matching models: –Patient –Provider –Ordering Facility Obtained a representative sample of data For each model - created a match algorithm utilizing fields that uniquely identify each entity –Name (patient, provider, ordering facility), patient dob, gender, address, providerID, telephone number, etc. Provided match results to program for review and identify thresholds

27 Program Threshold Evaluation Due to volume of reports, impractical for staff to review all ambiguous matches – need to set thresholds Method to identify thresholds using a sample –2 reviewers and 1 tie-breaker scored matches referencing guidelines –Utilized a sampling method within weight ranges –Identified specific weight or threshold at which target precision rates were met based on review
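The review method above can be sketched as three small steps: sample candidate pairs within weight ranges, settle each pair by two reviewers plus a tie-breaker, and pick the lowest weight at which the reviewed pairs meet the target precision. The function names, bin width, sample sizes, and data here are hypothetical; the slides do not give the actual sampling design or target rates.

```python
import random
from collections import defaultdict


def sample_by_weight_range(weights, bin_width, per_bin, seed=0):
    """Draw up to per_bin pair ids from each weight range for manual review."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for pair_id, w in weights.items():
        bins[int(w // bin_width)].append(pair_id)
    sample = []
    for key in sorted(bins):
        members = bins[key]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample


def consensus(reviewer_1, reviewer_2, tie_breaker):
    """Use the tie-breaker's judgment only when the two reviewers disagree."""
    return reviewer_1 if reviewer_1 == reviewer_2 else tie_breaker


def lowest_threshold_meeting(reviewed, target_ppv):
    """Return the lowest weight w where pairs with weight >= w reach target_ppv.

    reviewed is a list of (weight, is_true_match); returns None if no weight qualifies.
    """
    for w in sorted({weight for weight, _ in reviewed}):
        above = [is_match for weight, is_match in reviewed if weight >= w]
        if sum(above) / len(above) >= target_ppv:
            return w
    return None


# Hypothetical consensus-labeled sample.
REVIEWED = [
    (2, False), (4, False), (5, True), (6, False),
    (8, True), (9, True), (11, True), (12, True),
]
```

On this sample a 95% precision target gives a threshold of 8, while an 80% target gives 5. Because only a sample is reviewed, the resulting threshold is an estimate and depends on how the weight ranges are sampled.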

28 Deploying Probabilistic Matching All new incoming A1C lab reports parsed into 3 staging entities: –patient, provider and facilities Each entity is matched against existing respective entities in the registry –If matched above thresholds, linked to an existing record –If below thresholds, a new entity is created (patient, provider or facility) Provider Reports and Rosters and Patient Letters are generated using an in-house developed application which reads from the registry
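The link-or-create flow described on this slide might look roughly like the sketch below. Everything named here is an assumption made for illustration: the report fields, the `EntityIndex` class, the placeholder scoring function (a simple count of agreeing fields rather than a probabilistic weight), and the choice to link when the score is at or above the threshold.

```python
from dataclasses import dataclass, field


def split_report(report):
    """Parse one lab report into patient, provider, and facility staging records."""
    return {
        "patient": {k: report[k] for k in ("patient_name", "patient_dob")},
        "provider": {k: report[k] for k in ("provider_name", "provider_id")},
        "facility": {k: report[k] for k in ("facility_name",)},
    }


def toy_score(a, b):
    """Placeholder weight: number of fields that agree exactly (not a real model)."""
    return sum(1 for k in a if a[k] == b.get(k))


@dataclass
class EntityIndex:
    """One index per entity type: entity id -> staging records linked to it."""
    threshold: float
    entities: dict = field(default_factory=dict)

    def link_or_create(self, record):
        best_id, best_score = None, float("-inf")
        for entity_id, linked in self.entities.items():
            s = max(toy_score(record, r) for r in linked)
            if s > best_score:
                best_id, best_score = entity_id, s
        if best_id is not None and best_score >= self.threshold:
            self.entities[best_id].append(record)  # link to the existing entity
            return best_id
        new_id = len(self.entities) + 1  # otherwise create a new entity
        self.entities[new_id] = [record]
        return new_id


def ingest(report, indexes):
    """Route each staging record to its index and return the resulting entity ids."""
    return {kind: indexes[kind].link_or_create(rec)
            for kind, rec in split_report(report).items()}
```

Keeping a separate index and threshold per entity type mirrors the three matching models (patient, provider, ordering facility) described earlier, so a second report for the same patient from a different provider links to the existing patient while creating a new provider.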

29 Facility Report Note: All information in this slide is fictitious Page 1 Page 2

30 Provider Report Note: All information in this slide is fictitious

31 Patient Letter

32 Challenges and Successes Challenges –Quality of record linkage Need sufficient information for successful linkage of multiple tests per individual as well as master provider and facility indexing Maintaining accurate facility-provider linkage –Effect of laboratory variation – availability of data –Review thresholds – time and resource intensive Successes –Entire process is seamless, electronic and automated –High volume of data –Ability to conduct Longitudinal analysis

33 Is NYC ready for an MPI?

34 NYC Current Status Modernizing several disease registries: –Chronic Hepatitis B - completed –NYCAR – completed –STD – requirements completed –TB – requirements completed –HIV – planning Is this an opportune time to develop an MPI?

35 Planning an MPI: Challenges Each registry program has requirements for matching based on: –Patient population –Data quality and volume –Dissemination/Use of Surveillance data Foster consensus among disease programs –Breach of Security – higher risk –Legal barriers to creating an MPI Analysis of health code by reportable disease –Political barriers to creating an MPI

36 Planning an MPI: Benefits Pooling data from different sources could enhance PPV and NPV of the match Streamline IT resources –Support staff –Infrastructure Ability to conduct syndemic surveillance and investigation More efficient use of limited resources

37 Acknowledgements Diabetes Prevention and Control Program Lynn Silver Shadi Chamany Angela Merges Charlotte Neuhaus Bahman Tabei Cindy Driver Leslie Korenda Division of Informatics and Information Technology Don Weiner Stephen Giannotti Namrata Kumar Jisen Ho Laura Goodman Division of Epidemiology Lorna Thorpe Bonnie Kerker Jenna Mandel-Ricci Ram Koppaka Bureau of Chronic Disease Control Katherine Bornschlegl Magdalena Berger Emily Lumeng

38 Questions? Maushumi Mavinkurve Director, Center for Data Matching NYC Department of Health and Mental Hygiene (P) 212 515 5182
