Copyright 2009 bSolv. All rights reserved
Citizen360 Identity Resolution
Introduction to the Identity Resolution (IR) processes
Version 1.0
You should see the system overview before you run this presentation.
Citizen360 – Introduction to Identity Resolution
Identity Resolution (IR) is the process by which computer records are analyzed to find those that represent the same physical person, and then to merge or link those records.
Citizen360 IR Approach
IR Sweep: this batch program identifies possible citizen matches
– Built with a high degree of parallelism: up to 10 instances can run in parallel
– Can be configured to run against customized citizen models
– Runs a configurable IR Algorithm
– Can set the confidence-threshold level (e.g., 70%) below which match results are not reported
IR Algorithm: this is the algorithm, used by the IR Sweep program, that calculates the “match confidence level” between two citizens
– The algorithm produces a single confidence result expressed as a percentage, e.g., 83%
– The algorithm is made up of three major components:
   – Identifier match, e.g., SSN
   – Personal Demographic Data (PDD) match, e.g., names, ages, and gender
   – Location match, e.g., phones, emails, and addresses
Record “Merge”
– This moves the different citizen detail-records under the same “citizen header”
– Although called a “merge”, it is really a “link”: the records in the source systems are not forced to be the same
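The sweep described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the Citizen360 implementation: the names `sweep` and `toy_confidence` are hypothetical, the comparison runs serially rather than in parallel instances, and it assumes that pairs scoring below the threshold are dropped while pairs at or above it are reported.

```python
from itertools import combinations


def sweep(citizens, match_confidence, threshold=0.70):
    """Compare every pair of citizens and keep pairs at or above the threshold.

    `match_confidence` is any function returning a value between 0 and 1.
    Treating the threshold as inclusive is an assumption of this sketch.
    """
    results = []
    for a, b in combinations(citizens, 2):
        confidence = match_confidence(a, b)
        if confidence >= threshold:
            results.append((a["id"], b["id"], confidence))
    # Report the most confident candidate matches first.
    return sorted(results, key=lambda r: r[2], reverse=True)


def toy_confidence(a, b):
    """Hypothetical stand-in for the IR Algorithm: share of equal fields."""
    fields = ("ssn", "name", "dob")
    return sum(a[f] == b[f] for f in fields) / len(fields)


citizens = [
    {"id": 111111, "ssn": "123-45-6789", "name": "Ann Lee", "dob": "07/14/1965"},
    {"id": 222222, "ssn": "123-45-6789", "name": "Ann Lee", "dob": "07/14/1965"},
    {"id": 333333, "ssn": "987-65-4321", "name": "Bob Ray", "dob": "01/01/1970"},
]
candidates = sweep(citizens, toy_confidence, threshold=0.70)
```

Passing the scoring function in as a parameter mirrors the slide's point that the sweep runs a configurable algorithm; the sample records are invented for the example.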
Merging does not change the data – it is still held by the Source System
[Diagram: three citizen records, each held by its own source system, are linked by the Identity Resolution Process. The diagram shows two matches, at 83% and 82%.]
Citizen Id: 111111 | Master Index: 10001340057 | Date of Birth: 07/14/1965 | Source: DHS
Citizen Id: 222222 | Master Index: 10001340065 | Date of Birth: 07/14/1965 | Source: DSS
Citizen Id: 333333 | Master Index: 1000130073 | Date of Birth: 07/13/1965 | Source: DOH
Master Index History values after the merge: 10001340057, 10001340065, 1000130073
– We can continue to use any of the original/historic “master index” values to reference the citizen
– Based on Identity Resolution processes we may decide to merge other records…
– The data is still unique by source system, but we now know that it is for a common citizen
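One way to picture a “merge” that is really a link is a lookup table from every master index value, current or historic, to a shared citizen header. The sketch below is a minimal illustration, not the product's data model: `CitizenIndex`, `add`, `link`, and `lookup` are hypothetical names, and the choice of which header survives a link is an assumption.

```python
class CitizenIndex:
    """Links source records under one citizen header without altering them."""

    def __init__(self):
        self.header_of = {}  # any master index value -> current header value
        self.records = {}    # header value -> list of source detail records

    def add(self, master_index, record):
        self.header_of[master_index] = master_index
        self.records[master_index] = [record]

    def link(self, surviving, absorbed):
        """Move the absorbed header's records under the surviving header."""
        surviving = self.header_of[surviving]
        absorbed = self.header_of[absorbed]
        if surviving == absorbed:
            return
        # Detail records are moved as-is; no field is overwritten.
        self.records[surviving].extend(self.records.pop(absorbed))
        # Historic master index values keep working after the link.
        for value, header in list(self.header_of.items()):
            if header == absorbed:
                self.header_of[value] = surviving

    def lookup(self, master_index):
        return self.records[self.header_of[master_index]]


index = CitizenIndex()
index.add("10001340057", {"source": "DHS", "dob": "07/14/1965"})
index.add("10001340065", {"source": "DSS", "dob": "07/14/1965"})
index.add("1000130073", {"source": "DOH", "dob": "07/13/1965"})
index.link("10001340057", "10001340065")
index.link("10001340057", "1000130073")
linked = index.lookup("1000130073")  # a historic value still resolves
```

After both links, any of the three master index values returns the same three records, and the DOH record still carries its own date of birth.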
IR Algorithm Configuration
Data elements that are compared are given “grades”:
– None: a confirmed non-match
– Approximate: a less exact match; quite often one or more values are absent (null)
– Close: for example, SSNs that have some digits swapped, dates of birth that are 1 day apart, a name that “sounds like” another name
– Exact: a confirmed exact match
Each data element grade type is given a score between 0 and 1 (exact)
– The grade scoring is configurable through the user interface
Each data element grade is weighted and applied to the overall score
– The data element weighting is configurable through the user interface
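To make the grade-and-weight idea concrete, here is a small sketch. The slide does not state how the weighted grades are combined, so the weighted average used here is an assumption, and every number in `GRADE_SCORES` and `WEIGHTS` is invented for illustration (in the product both are set through the user interface).

```python
# Hypothetical grade scores between 0 (none) and 1 (exact).
GRADE_SCORES = {"none": 0.0, "approximate": 0.4, "close": 0.8, "exact": 1.0}

# Hypothetical per-element weights; larger means more influence.
WEIGHTS = {"ssn": 5.0, "name": 3.0, "dob": 2.0, "address": 1.0}


def match_confidence(grades, grade_scores=GRADE_SCORES, weights=WEIGHTS):
    """Return a confidence percentage from per-element grades.

    `grades` maps a data element name to its grade, e.g. {"ssn": "exact"}.
    Combining by weighted average is an assumption of this sketch.
    """
    total_weight = sum(weights[element] for element in grades)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[element] * grade_scores[grade]
                   for element, grade in grades.items())
    return 100.0 * weighted / total_weight


confidence = match_confidence(
    {"ssn": "exact", "name": "close", "dob": "close", "address": "approximate"}
)
# (5*1.0 + 3*0.8 + 2*0.8 + 1*0.4) / 11 = 9.4 / 11, roughly 85.45%
```

Because scores and weights are plain dictionaries passed as parameters, changing the configuration does not require changing the scoring function.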
IR Algorithm Sophistication – a few examples
Can select the preferred phonetic algorithms for different fields, e.g., Soundex, Metaphone, Double Metaphone, Phonex, NYSIIS
The Double Metaphone phonetic comparison is generally the best for names:
– Much more powerful than Soundex
– Can properly handle Eastern European names, e.g., Budjinski
– Considers correct and incorrect pronunciations of names such as “Juan”, e.g., “hwahn” and “jewann”
– Can handle the silent B in Bomb and Dumb, etc.
The address comparison converts the addresses into the best-fit standardized post office address names
– Full and partial address matches
Dates are considered Close matches if they are within a range, differ by only a single digit, or are possibly in a different format (US standard, mm/dd/yyyy, compared to US INS, dd/mm/yyyy)
Emails with the same name but different domains are considered Close matches
Number fields (e.g., SSNs) are considered Close if they just have digits swapped, if they are the same except a digit is missing from one, etc.
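A few of the Close-match rules listed on this slide can be sketched as simple predicates. These are illustrative guesses at the rules, not the product's logic: all function names are hypothetical, "digits swapped" is narrowed to one adjacent transposition, the date range defaults to one day, and the single-digit date difference rule is not covered.

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y"
DAY_FIRST_FORMAT = "%d/%m/%Y"


def digits_swapped(a, b):
    """True if b equals a with one pair of adjacent characters transposed."""
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (len(diff) == 2 and diff[1] == diff[0] + 1
            and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])


def one_digit_missing(a, b):
    """True if deleting one character from the longer value gives the shorter."""
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if len(longer) - len(shorter) != 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def same_name_different_domain(a, b):
    """True if two emails share the part before '@' but not the domain."""
    local_a, _, domain_a = a.lower().partition("@")
    local_b, _, domain_b = b.lower().partition("@")
    return local_a != "" and local_a == local_b and domain_a != domain_b


def _parse(text, fmt):
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def dates_close(a, b, max_days=1):
    """True if the dates differ by at most max_days, or b is a day-first a."""
    us_a, us_b = _parse(a, US_FORMAT), _parse(b, US_FORMAT)
    if us_a is None:
        return False
    if us_b is not None and us_a != us_b and abs((us_a - us_b).days) <= max_days:
        return True
    # Reread b as dd/mm/yyyy in case the two systems use different formats.
    day_first_b = _parse(b, DAY_FIRST_FORMAT)
    return a != b and day_first_b is not None and day_first_b == us_a
```

For example, `dates_close("07/13/1965", "07/14/1965")` holds because the dates are one day apart, and `same_name_different_domain("jdoe@example.com", "jdoe@example.org")` holds for these made-up addresses.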
Confidential and Proprietary
THANK YOU
info@bSolv.com
www.bsolv.com
3330 Cumberland Boulevard, Suite 500, Atlanta, GA 30339
Office: +1 678.638.6692