1 Implementation of Probabilistic Matching in NYC Chronic Hepatitis B and NYC A1C Registries, and Implications Towards an MPI
Maushumi Mavinkurve, Director, Center for Data Matching
NYC Department of Health and Mental Hygiene
October 17th, 2008
Integrated Surveillance Seminar
2 Overview
- Describe data quality challenges in disease surveillance
- Describe probabilistic matching techniques
- Implementation of probabilistic matching
  - NYC Chronic Hepatitis B Registry (LVR)
  - NYC Hemoglobin A1C Registry (NYCAR)
- Challenges and benefits of a proposed NYC MPI
3 Public Health Surveillance
Public health surveillance process includes:
- Collection of data on a specific disease or condition via standardized information systems
- Analysis and interpretation of the data
- Dissemination of information to individuals who can act on it
- Utilization of information to facilitate the necessary response that will effectively deal with the public health issue
4 Surveillance Data Quality Issues
- Accuracy
- Non-standardized across different data sources
- Multiple laboratory systems
- De-duplication of reports
  - Exact duplicates
  - Multiple events linked to a unique person
- Non-relevant information
- Accuracy refers to the difference between an estimate of a parameter and its true value. We characterize the difference in terms of systematic (bias) and random (variance) errors.
- Completeness
- Integrity
- Timeliness refers to the length of time between the reference period of the information and when we deliver the data product to our customers.
- Relevance refers to the degree to which our data products provide information that meets our customers' needs.
- Accessibility refers to the ease with which customers can identify, obtain, and use the information in our data products.
- Interpretability refers to the availability of documentation to aid customers in understanding and using our data products. This documentation typically includes: the underlying concepts; definitions; the methods used to collect, process, and analyze the data; and the limitations imposed by the methods used.
- Transparency refers to providing documentation about the assumptions, methods, and limitations of a data product to allow qualified third parties to reproduce the information, unless prevented by confidentiality or other legal constraints.
5 Impact of Data Quality Issues in Surveillance
- Impacts on surveillance reporting
  - Over- or underestimates of true cases
  - Geographical misrepresentation (missing address)
- Increases costs
  - Additional staff required to address data quality issues
- Increases inefficiencies
  - Timeliness for patient or provider follow-up
6 Addressing Data Quality Challenges
Modern disease surveillance information systems:
- Validate data at time of collection
  - Minimize inaccurate or incomplete data
- Standardize different data to a uniform structure
- Integrate matching technology to create
  - Patient indexes (person-centric systems vs event-centric systems)
  - Provider indexes
  - Facility indexes
Each registry could be viewed as a system that will ultimately feed from a larger MPI.
7 What is Probabilistic Matching?
- Rule-based match algorithms
  - Employ fields that uniquely identify an entity: name, dob, gender, telephone, etc.
- Standardizes data
  - Formalizes names: Mike → Michael
- Parses data into smaller tokens
  - Addressline1 → house number, street name, street type, apt #
- Creates fields that enhance matching
  - Phonetic coding: Soundex, NYSIIS
  - Hash and packed keys
- Adapts to specific data: incorporates uniqueness or frequency of data values when comparing records
  - "Mary Jones" vs "Maushumi Mavinkurve"
- Processes data in blocks: viable to use on large volume data sets
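The standardization, phonetic coding, and blocking steps can be illustrated with a minimal Python sketch. It is not the registry's implementation: the nickname table, the record field names (`last`, `dob`), and the choice of "Soundex of last name plus birth year" as the blocking key are all assumptions made for illustration. The Soundex function follows the common American Soundex rules.

```python
import re
from collections import defaultdict

# Hypothetical nickname table; a production system would use a far larger one.
NICKNAMES = {"MIKE": "MICHAEL", "BILL": "WILLIAM", "LIZ": "ELIZABETH"}

# American Soundex letter-to-digit codes.
SOUNDEX_CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def standardize_name(name):
    """Uppercase, strip non-letters, and map a known nickname to its formal form."""
    cleaned = re.sub(r"[^A-Z]", "", name.upper())
    return NICKNAMES.get(cleaned, cleaned)


def soundex(name):
    """Return the 4-character American Soundex code, or '' for an empty name."""
    name = re.sub(r"[^A-Z]", "", name.upper())
    if not name:
        return ""
    encoded = name[0]
    prev = SOUNDEX_CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (encoded + "000")[:4]


def blocking_key(record):
    """Assumed key: Soundex of last name plus birth year (dob as 'YYYY-MM-DD')."""
    return soundex(record["last"]) + "|" + record["dob"][:4]


def candidate_pairs(records):
    """Compare only records that share a blocking key, not every possible pair."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                pairs.append((members[i], members[j]))
    return pairs
```

Blocking is what keeps the comparison count manageable: with n records, an all-pairs comparison grows as n squared, while blocked comparison only pairs records inside each block. The trade-off is that a true match whose blocking fields disagree is never compared, which is why systems often run more than one blocking pass.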
8 Evaluating Match Algorithm
- Outcome of a potential match is a weight or likelihood that 2 records are the same entity
- Surveillance programs identify thresholds for the match algorithm
- Prior to reviewing results of the match algorithm:
  - Identify implications for precision (PPV) vs negative predictive value (NPV)
    - Evaluation of health code mandate
    - Practical issues
    - Surveillance reporting
  - Identify guidelines or criteria to review matches
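One standard way to produce such a weight is the Fellegi-Sunter formulation, sketched below in Python. The slides do not say which formulation the registries used, so treat this as an assumption. The m-probability is the chance a field agrees when two records are the same entity, and the u-probability is the chance it agrees by coincidence; the values in `FIELD_PARAMS` and the field names are hypothetical.

```python
import math

# Hypothetical (m, u) probabilities per field; real values are estimated from data.
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "gender": (0.99, 0.5),
}


def field_weight(m, u, agrees):
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum field weights; a missing value (None) contributes no evidence."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue
        total += field_weight(m, u, a == b)
    return total


def frequency_adjusted_agreement(m, value_count, total_count):
    """Agreement weight when u is replaced by the relative frequency of the value.

    Agreement on a rare value is stronger evidence than agreement on a common one.
    """
    u_value = value_count / total_count
    return math.log2(m / u_value)
```

Under this scheme, agreement on gender adds little (it agrees by chance half the time), while agreement on date of birth adds a lot. The frequency adjustment captures why agreement on a rare surname counts for more than agreement on "Jones".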
9 Identifying Thresholds
- Goal: maximize precision or PPV
  - Sacrifice on negative predictive value (NPV)
  - Surveillance programs can decide to review ambiguous matches
- Therefore: set high thresholds
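The trade-off can be made concrete with a small Python sketch that computes PPV and NPV from manually reviewed pairs at a given threshold. The weights and review outcomes below are invented purely for illustration.

```python
def ppv_npv(reviewed, threshold):
    """Compute (PPV, NPV) for a threshold.

    reviewed: list of (weight, is_true_match) pairs from manual review.
    A pair is called a match when its weight is at or above the threshold.
    Returns None for a value whose denominator is zero.
    """
    tp = sum(1 for w, truth in reviewed if w >= threshold and truth)
    fp = sum(1 for w, truth in reviewed if w >= threshold and not truth)
    tn = sum(1 for w, truth in reviewed if w < threshold and not truth)
    fn = sum(1 for w, truth in reviewed if w < threshold and truth)
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Invented review results: (match weight, reviewer says same person?)
REVIEWED = [
    (18, True), (15, True), (13, True), (11, True), (10, False),
    (8, True), (5, False), (3, False), (1, False), (0, False),
]
```

With this toy data, a threshold of 8 gives a PPV of 5/6 and an NPV of 1.0; raising the threshold to 11 lifts the PPV to 1.0 while the NPV falls to 5/6, because one true match now sits below the cut-off.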
10 Outcome of Probabilistic Matching Entity-centric, relational registry system
11 Background of Hepatitis B in NYC
- Hepatitis B surface antigen test was developed and FDA approved in the 1980s
- Decline in acute Hepatitis B incidence rates from 11.5 cases per 100,000 persons in 1985 to 1.6 in 2006
- In NYC, burden of chronic Hepatitis B infection up to 2x higher within specific populations
  - MSM
  - IDU
  - Persons born in regions where HBsAg prevalence >2% (Asian/PI, Eastern Europe, Middle East, Africa, Pacific Island immigrants)
- Need for continued surveillance and monitoring
Source: Recommendations for identification and public health management of persons with chronic Hepatitis B infection
12 Hepatitis B Surveillance Activities
- Monitor disease trends
  - Aggregate descriptive reporting aimed to guide prevention and intervention efforts
- Outreach with newly infected
  - Educational materials to new cases reported to the registry
13 NYC Hepatitis B Registry
- Legacy application, built in-house in 1999
- Automatic weekly batch uploads of laboratory reports
- Data entry of provider reports
- System did not index on patients (event-based) and could not link 2 reports for the same person
- Program utilized staff to build and apply deterministic match algorithms
  - Resource intensive
  - Version control
14 NYC Liver Virus Registry (LVR)
- Implemented in October 2008 (2 weeks ago!), built in-house
- Migrated all legacy data: almost 10 years' worth of data
- Web-based application
- Person-centric: integrates probabilistic matching
  - Consolidated views of all information for a person
  - Ability to conduct longitudinal analysis
15 LVR Probabilistic Matching
- Created a match algorithm based on fields unique to a patient from laboratory and provider reports
- Processed all legacy data: ~380,000 records
- Program evaluated the algorithm and identified thresholds
- Results: the match algorithm linked the ~380,000 reports to ~111,000 unique persons
- Probabilistic matching improved de-duplication by 1% compared to the legacy deterministic algorithm
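Collapsing reports into unique persons amounts to grouping every report connected by an accepted match. The slides do not describe how LVR does this; the Python sketch below uses a union-find structure, one common technique for the grouping step, with reports identified by hypothetical integer positions.

```python
def group_reports(n_reports, matched_pairs):
    """Group reports into persons from accepted pairwise matches.

    n_reports: number of reports, identified as 0 .. n_reports - 1.
    matched_pairs: iterable of (i, j) pairs whose weight cleared the threshold.
    Returns a list of groups, each a list of report ids for one person.
    """
    parent = list(range(n_reports))

    def find(x):
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for report in range(n_reports):
        groups.setdefault(find(report), []).append(report)
    return list(groups.values())
```

For example, six reports with accepted matches (0, 1), (1, 2), and (4, 5) collapse into three persons. Note that grouping is transitive: reports 0 and 2 end up together even though they were never matched directly, which is one reason a high threshold matters.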
16 LVR Challenges & Successes
Challenges:
- Iterative review process is time and resource intensive
- Evaluation against legacy deterministic match: not a gold standard, and the legacy match itself was not evaluated
- Identifying target PPV and NPV
Successes:
- Long-term savings on time and resources
- Streamlined system
- Longitudinal analysis
- More accurate case counting
- Enhanced data quality
17 Implementing Probabilistic Matching with NYC Hemoglobin A1C Registry (NYCAR)
18 What is Diabetes?
- Diabetes is a chronic disease caused by inadequate insulin levels or sensitivity, leading to elevated blood sugar levels
- Blood sugar levels can be measured by
  - Plasma glucose
  - Fingerstick glucose
  - Glycosylated hemoglobin or A1C (goal is <7%)
- Persistently high blood sugar levels can cause
  - Heart disease and stroke
  - Kidney failure
  - Blindness
  - Nerve damage and amputation
19 Diabetes Burden in NYC
- Diabetes is epidemic in NYC
  - Prevalence has more than doubled over the past 10 years
  - Approximately 500,000 New Yorkers have diabetes
  - An additional ~200,000 New Yorkers have diabetes but have not yet been diagnosed
  - Approximately 1 in 8 adults have diabetes
- In 2006, diabetes was the 4th leading cause of death in NYC
20 Prevalence of Self-Reported Diabetes Among Adults in NYC
[Figure: prevalence chart]
Source, NYC estimates: CDC Behavioral Risk Factor Surveillance System (BRFSS), NYC Community Health Survey (http://www.nyc.gov/health/epiquery)
Source, national estimates: BRFSS 2006
21 Use of Traditional Public Health Surveillance for Chronic Disease
Disease reporting to public health agency to:
- Monitor trends
- Describe glycemic control in NYC
- Identify special populations
- Target individuals with poor control
- Communicate with provider community
- Feedback to providers and their patients
- Control epidemics
- Decrease complications/improve quality of life
22 Hemoglobin A1C Tests
- A1C is a measure of average blood sugar control in the preceding 3 months (goal <7%)
  - An A1C of 7.0% corresponds to an average blood sugar of 170 mg/dL
- A1C is used to:
  - Monitor an individual's blood sugar control
  - Guide changes in medication therapy
  - Convey risk of diabetes complications
- Most people who get A1Cs have diabetes, so it is a marker for diabetes status
THEREFORE, AN A1C REGISTRY WILL PROVIDE A MECHANISM FOR TRACKING INDIVIDUALS WITH DIABETES
23 Implementation of NYCAR
- Based on existing NY State / NYC laboratory reporting system
- Amendment to NYC health code, Article 13, which mandates communicable disease reporting, to include A1C
  - Public hearing Summer 2005
  - Approval of amendment December 2005
  - Went into effect January 15, 2006
- Laboratories submitting data to NY State and NYC are subject to the mandate
  - Report information on patient, ordering provider and facility, testing facility and result
  - Submit via secure network
- Receive ~5,000 new lab reports daily (high volume)
- Patient advocacy and privacy groups voiced concerns during amendment proceedings
  - Felt there was no satisfactory rationale for public health agency involvement in chronic disease reporting
- DOHMH clearly wrote into the amendment that information can only be released to:
  - Treating medical provider(s)
  - Patient
- Patients can opt out of intervention but not from being in the registry
- Laboratory reporting:
  - 34 labs reporting A1C tests
  - Test results reported by lab within 24 hours
  - PHINMS: secure file transmission
  - HL7 messages or ASCII files
24 Objectives of New York City A1C Registry (NYCAR)
- Surveillance and epidemiology
  - Track trends on the population level
- Provider feedback and communication
  - Quarterly provider reports in comparison to peers
  - Quarterly rosters of patients stratified by A1C level
- Patient feedback (via provider)
  - Letters with A1C information
  - Local resources
- Deliver resources to providers/patients
- All of the above require matching and data linkages
Began January 15, 2006 with mandate of electronic lab reporting.
Provider Reports: Quarterly reports with patients listed by A1C level will be distributed to providers. Reports may be used to identify individuals who may benefit from additional support, such as intensification of therapy, or a referral to a physical activity program or self-management program.
Patient Letters: Letters with recent A1C test results and a reminder to return to care will be sent to patients with high A1C levels.
25 Components of A1C Registry
Information collected by laboratory reports includes:
- Individual name, address, date of birth, sex
- Name and address of ordering provider, ordering facility and testing facility
- A1C test collection date and result
26 NYCAR Probabilistic Methodology
- Created 3 separate matching models:
  - Patient
  - Provider
  - Ordering facility
- Obtained a representative sample of data
  - Sample used about 100,000 records selected from a specific time period
- For each model, created a match algorithm utilizing fields that uniquely identify each entity
  - Name (patient, provider, ordering facility), patient dob, gender, address, provider ID, telephone number, etc.
- Provided match results to the program to review and identify thresholds
- Creates indexes for patient, provider and ordering facility:
  - Each patient appears once and all tests for that individual are linked
  - Each provider appears once and all tests reported by that provider are linked
  - Each ordering facility appears once and all reports by that facility are linked
27 Program Threshold Evaluation
- Due to the volume of reports, it is impractical for staff to review all ambiguous matches: need to set thresholds
- Method to identify thresholds using the sample
  - 2 reviewers and 1 tie-breaker scored matches referencing guidelines
  - Utilized a sampling method within weight ranges
  - Identified the specific weight or threshold at which target precision rates were met based on review
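The review workflow can be sketched in Python as three small steps: sample candidate pairs within weight ranges, resolve reviewer disagreements with a tie-breaker, and find the lowest weight at which the reviewed sample still meets the target precision. The bin width, sample size per bin, and target value are hypothetical parameters; the slides do not give the values the program used.

```python
import random


def sample_by_weight_range(pairs, bin_width, per_bin, seed=0):
    """Draw up to per_bin pairs from each weight range.

    pairs: list of (pair_id, weight). Bins are [k*bin_width, (k+1)*bin_width).
    """
    rng = random.Random(seed)
    bins = {}
    for pair_id, weight in pairs:
        bins.setdefault(int(weight // bin_width), []).append((pair_id, weight))
    sample = []
    for key in sorted(bins):
        members = bins[key]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample


def adjudicate(reviewer_1, reviewer_2, tie_breaker):
    """Use the shared decision when reviewers agree, else the tie-breaker's."""
    return reviewer_1 if reviewer_1 == reviewer_2 else tie_breaker


def lowest_threshold_meeting_precision(scored, target_precision):
    """Return the lowest weight t such that precision among pairs with
    weight >= t meets the target, or None if no threshold qualifies.

    scored: list of (weight, is_match) from adjudicated review.
    """
    best = None
    for t in sorted({w for w, _ in scored}, reverse=True):
        above = [is_match for w, is_match in scored if w >= t]
        if sum(above) / len(above) >= target_precision:
            best = t
    return best
```

One caveat of this simplified version: because each weight range is sampled at a fixed size, the pooled precision is not a direct estimate of precision in the full data. A fuller treatment would weight each range by how many pairs it contains.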
28 Deploying Probabilistic Matching
On a weekly basis, the following process occurs:
- All new incoming A1C lab reports are parsed into 3 staging entities: patient, provider and facility
- Each entity is matched against the existing respective entities in the registry
  - If matched above thresholds, it is linked to an existing record
  - If below thresholds, a new entity (patient, provider or facility) is created
- Provider Reports and Rosters and Patient Letters are generated using an in-house developed application which reads from the registry
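The link-or-create step might look like the following Python sketch. Everything concrete here is an assumption for illustration: the report field names, the in-memory dictionaries standing in for the registry, and the toy score that merely counts agreeing fields in place of a real match weight. Treating a score equal to the threshold as a link is also a choice made for this sketch.

```python
def split_report(report):
    """Split one lab report into patient, provider, and facility staging records."""
    return {
        "patient": {k: report[k] for k in ("patient_name", "patient_dob")},
        "provider": {k: report[k] for k in ("provider_name",)},
        "facility": {k: report[k] for k in ("facility_name",)},
    }


def agreement_count(a, b):
    """Toy score: number of fields with identical values (stand-in for a weight)."""
    return sum(1 for k in a if a[k] == b.get(k))


def link_or_create(staged, index, threshold, score=agreement_count):
    """Link to the best-scoring existing entity, or create a new one.

    index: dict of entity_id -> record. Returns (entity_id, created_flag).
    """
    best_id, best_score = None, None
    for entity_id, existing in index.items():
        s = score(staged, existing)
        if best_score is None or s > best_score:
            best_id, best_score = entity_id, s
    if best_score is not None and best_score >= threshold:
        return best_id, False
    new_id = len(index) + 1  # sequential ids; assumes entities are never deleted
    index[new_id] = staged
    return new_id, True


def process_report(report, registry, thresholds):
    """Match each staging entity against its own index and return the linked ids."""
    links = {}
    for entity, staged in split_report(report).items():
        links[entity], _ = link_or_create(staged, registry[entity], thresholds[entity])
    return links
```

Keeping a separate index and threshold per entity type mirrors the three separate matching models: a patient, a provider, and a facility each carry different identifying fields and tolerate different error rates.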
29 Facility Report (Page 1 and Page 2)
Note: All information in this slide is fictitious
30 Provider Report
Note: All information in this slide is fictitious
32 Challenges and Successes
Challenges:
- Quality of record linkage
  - Misconceptions: bad quality data cannot be reconciled!
  - Need sufficient information for successful linkage of multiple tests per individual as well as master provider and facility indexing
  - Maintaining accurate facility-provider linkage
    - 1 large umbrella facility can have multiple names
    - Providers can work for multiple facilities
- Case definitions
  - Individuals with diabetes
  - Provider for a given patient
- Effect of laboratory variation
  - Impact of inter-laboratory variation: data integrity and availability
- Review thresholds: time and resource intensive
Successes:
- Entire process is seamless, electronic and automated
- High volume of data
- Ability to conduct longitudinal analysis
34 NYC Current Status
Modernizing several disease registries:
- Chronic Hepatitis B: completed
- NYCAR: completed
- STD: requirements completed
- TB: requirements completed
- HIV: planning
Is this an opportune time to develop an MPI?
35 Planning an MPI: Challenges
- Each registry program has requirements for matching based on:
  - Patient population
  - Data quality and volume
  - Dissemination/use of surveillance data
- Foster consensus among disease programs
- Breach of security: higher risk
- Legal barriers to creating an MPI
  - Analysis of health code by reportable disease
  - Laws, particularly in NYC, are extremely specific. Mandated reportables must be used for the purpose of surveillance and epidemiology of that specific disease. Is an MPI a stretch?
  - Particularly A1C data: this is the most strictly written health code
- Political barriers to creating an MPI
36 Planning an MPI: Benefits
- Pooling data from different sources could enhance PPV and NPV of the match
- Streamline IT resources
  - Support staff
  - Infrastructure
- Ability to conduct syndemic surveillance and investigation
  - More efficient use of limited resources
Syndemic is defined as two or more afflictions, interacting synergistically, contributing to excess burden of disease in a population.
37 Acknowledgements
Diabetes Prevention and Control Program: Lynn Silver, Shadi Chamany, Angela Merges, Charlotte Neuhaus, Bahman Tabei, Cindy Driver, Leslie Korenda
Division of Informatics and Information Technology: Don Weiner, Stephen Giannotti, Namrata Kumar, Jisen Ho, Laura Goodman
Bureau of Chronic Disease Control: Katherine Bornschlegl, Magdalena Berger, Emily Lumeng
Division of Epidemiology: Lorna Thorpe, Bonnie Kerker, Jenna Mandel-Ricci, Ram Koppaka
38 Questions?
Maushumi Mavinkurve, Director, Center for Data Matching
NYC Department of Health and Mental Hygiene
(P)