Fuzzy Duplicates Analysis with ACL. Prepared by: Kevin Legere. Date: April 3rd, 2013.


1 Fuzzy Duplicates Analysis with ACL
Prepared by: Kevin Legere
Date: April 3rd, 2013

2 Agenda
© 2012 ACL Services Ltd. ACL | Transforming Audit and Risk
Overview
Example
FUZZYDUP command
OMIT() Function
Script Editor and RECOFFSET
Q&A

3 Overview
What is a "Fuzzy Duplicate"?
  – A match based on criteria where the values are not exact but very close
    » EX: "ACL Services" and "ACL Service"
Typically used for:
  » Keyword matching
  » Invoice Number matching
  » Vendor Name matching*
  » Employee Name matching
Can be simple or complex
  » Depends entirely on your approach and desired accuracy
* Focus of this presentation

4 Overview
Simple Match Examples:
  – Exact or 100% match
    » "ACL" = "ACL"
  – Force upper or lower case
    » "ACL" = UPPER("acl")
    » "acl" = LOWER("ACL")
  – Removal of special characters
    » "ACL" = EXCLUDE("*ACL.", "*.")
  – Only compare numbers or letters
    » "ACL" = INCLUDE(UPPER("ACL123"), "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    » "123" = INCLUDE("ACL123", "0123456789")
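The simple matches above can be tried outside ACL as well. The following is a minimal Python sketch, not ACL code: the helper names `include` and `exclude` are hypothetical stand-ins that only approximate what the slide shows for ACL's INCLUDE() and EXCLUDE().

```python
import string


def include(text: str, allowed: str) -> str:
    """Keep only the characters of text that appear in allowed."""
    return "".join(ch for ch in text if ch in allowed)


def exclude(text: str, unwanted: str) -> str:
    """Drop every character of text that appears in unwanted."""
    return "".join(ch for ch in text if ch not in unwanted)


# Hypothetical re-creation of the slide's examples.
print("acl".upper() == "ACL")                             # force upper case
print(exclude("*ACL.", "*."))                             # strip special characters
print(include("ACL123".upper(), string.ascii_uppercase))  # letters only
print(include("ACL123", string.digits))                   # digits only
```

Comparing two names after passing both through the same cleaning helper is what turns a near match into an exact match.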

5 Overview
Complex Match Examples:
  – Removal of company type indicators (LLC, INC, LTD, etc.)
    » "ACL Services Ltd." = "ACL Services"
  – Percent of word match, AKA letter by letter
    » "ACL Services" vs. "ACL Service": 11/12 character match, or about 91.7% match
  – Word by Word*
    » "ACL Services" vs. "ACL Champions": "ACL" matches "ACL", but "Services" does not match "Champions", so 1 of 2 words match = 50% match
  – Levenshtein distance
  – Sounds like
  – NYSIIS
* Most used by ACL Consultants
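The letter-by-letter and word-by-word percentages above can be reproduced with a short Python sketch. It assumes a position-by-position comparison scored against the longer of the two inputs, which is one reading that matches the slide's 11/12 and 50% figures; the function names are hypothetical and this is not how ACL itself scores matches.

```python
def char_match_pct(a: str, b: str) -> float:
    """Percent of positions holding the same character, relative to the longer string."""
    if not a and not b:
        return 100.0
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / max(len(a), len(b))


def word_match_pct(a: str, b: str) -> float:
    """Percent of word positions holding the same word, relative to the longer word list."""
    words_a, words_b = a.split(), b.split()
    if not words_a and not words_b:
        return 100.0
    same = sum(x == y for x, y in zip(words_a, words_b))
    return 100.0 * same / max(len(words_a), len(words_b))


print(round(char_match_pct("ACL Services", "ACL Service"), 1))
print(word_match_pct("ACL Services", "ACL Champions"))
```

A positional comparison is simple but brittle: one inserted character near the start shifts every later position, which is part of why edit-distance measures such as Levenshtein distance are listed as alternatives.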

6 Vendor Master Analysis

7 Vendor Master Analysis
Fuzzy Duplicates on Vendor Name
  – Possible risk
    » Payments are being sent to more than one record for the same vendor
  – May not involve risk. The goal can be to normalize the vendor master list to ensure that duplicates do not exist.
    » Ideally, one unique vendor should exist in your vendor master list, with one or more address records in your vendor address table

8 Vendor Master Analysis
Sample file contains 75 vendors
  – Only Vendor Code and Vendor Name
Where do you start for Vendor Name matching?
  – Look for exact duplicates
  – Focus on simple matching
  – Sort or Summarize!

9 Vendor Master Analysis
Step 1: Summarize your Vendor Master File
  » Choose Vendor Name as your key field
  » Add Vendor Code to the Other Fields for summarizing
  » Be sure to check "Presort"

10 Vendor Master Analysis
Step 2: Quickly comb over the data to identify a common trend.
  » We will focus on this issue in the sample data:
  » Create a computed field that corrects the trend (or cleans the data).

11 Vendor Master Analysis
Functions used in the Default Value text box:
  INCLUDE(UPPER(ALLTRIM(Vendor_Name)), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
Within ACL, the computed field will return the following:

12 Vendor Master Analysis
Step 3: Perform a Duplicates command on the computed field
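Steps 1 to 3 amount to building a cleaned key for each vendor name and then grouping records that share a key. The Python sketch below illustrates the idea only; it is not ACL script, the function names are hypothetical, and the vendor rows are made-up sample data rather than the 75-vendor file used in the presentation.

```python
import string
from collections import defaultdict


def clean_name(name: str) -> str:
    """Approximate INCLUDE(UPPER(ALLTRIM(name)), 'A'..'Z'): trim, uppercase, keep letters only."""
    return "".join(ch for ch in name.strip().upper() if ch in string.ascii_uppercase)


def duplicate_groups(vendors):
    """Group (code, name) pairs by cleaned name and keep groups with more than one record."""
    groups = defaultdict(list)
    for code, name in vendors:
        groups[clean_name(name)].append((code, name))
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Made-up sample rows for illustration only.
vendors = [
    ("V001", "ACL Services Ltd."),
    ("V002", " A.C.L. Services LTD"),
    ("V003", "Acme Supply"),
]

for key, rows in duplicate_groups(vendors).items():
    print(key, rows)
```

Because the key keeps letters only, names that differ solely in digits (for example a store number) would collapse into one group, so the groups are candidates for review rather than confirmed duplicates.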

13 Vendor Master Analysis
Results are as follows:

14 FUZZYDUP command
ACL 9.3 has new features that make Fuzzy Duplicate analysis easier
  – FUZZYDUP command
  – OMIT() function
  – ISFUZZYDUP() function
  – LEVDIST() function
Important parameters to understand
  – Levenshtein Distance
  – Difference Percentage

15 FUZZYDUP command
Syntax
  – FUZZYDUP ON { key_field } { LEVDISTANCE value } TO table_name
Example
  – FUZZYDUP ON Vendor_Name OTHER ALL LEVDISTANCE 2 DIFFPCT 50 TO My_Results
Levenshtein Distance (LEVDISTANCE)
  » The number of single-character edits required to make the strings equal
    EX: "Smith" and "Smythe" have a Levenshtein Distance of 2
Difference Percentage (DIFFPCT)
  » The threshold for the percentage difference between two strings
    EX: "Smith" and "Smythe" have a Percentage Difference of 40%: (Levenshtein Distance of 2 / 5 characters in the shorter string) * 100%
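The two parameters can be checked by hand with a small Python sketch. It uses the standard dynamic-programming form of Levenshtein distance and, following the slide's (2/5) arithmetic, assumes the difference percentage divides by the length of the shorter string. The function names are hypothetical, and `is_fuzzy_dup` is only a guess at how the two thresholds combine, not a description of FUZZYDUP's internals.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def difference_pct(a: str, b: str) -> float:
    """Levenshtein distance as a percentage of the shorter string's length (assumed denominator)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0 if a == b else float("inf")
    return 100.0 * levenshtein(a, b) / shorter


def is_fuzzy_dup(a: str, b: str, levdistance: int, diffpct: float) -> bool:
    """Assumed rule: a pair qualifies only if it passes both thresholds."""
    return levenshtein(a, b) <= levdistance and difference_pct(a, b) <= diffpct


print(levenshtein("Smith", "Smythe"))
print(difference_pct("Smith", "Smythe"))
print(is_fuzzy_dup("Smith", "Smythe", levdistance=2, diffpct=50))
```

Under these assumptions, the slide's settings of LEVDISTANCE 2 and DIFFPCT 50 would accept the "Smith" and "Smythe" pair, since its distance is 2 and its difference percentage is 40.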

16 OMIT() Function
When do I use OMIT()?
  – When you want to refine fuzzy duplicate analysis
  – Look for repeating strings you want to remove from your Vendor Name field
Syntax
  – OMIT(string1, string2 <,case_sensitive>)
  – Specify T to make the substrings specified for removal case-sensitive, or F to ignore case
Example
  – OMIT(Vendor_Name, " Ltd, Inc, Corp, Corporation", F)
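The effect of the OMIT() example can be approximated in Python. This is a sketch under stated assumptions, not ACL's implementation: the `omit` helper is hypothetical, it treats the second argument as a comma-separated list in which spaces are significant, and it removes longer substrings first so that " Corp" does not clip " Corporation" into "oration". ACL's own processing order may differ.

```python
import re


def omit(text: str, substrings: str, case_sensitive: bool) -> str:
    """Remove each comma-separated substring from text, longest substring first."""
    parts = [p for p in substrings.split(",") if p]
    parts.sort(key=len, reverse=True)  # design choice of this sketch, not taken from ACL
    for part in parts:
        if case_sensitive:
            text = text.replace(part, "")
        else:
            text = re.sub(re.escape(part), "", text, flags=re.IGNORECASE)
    return text


indicators = " Ltd, Inc, Corp, Corporation"
print(omit("ACL Services Ltd", indicators, False))
print(omit("Acme Corporation", indicators, False))
print(omit("ACME INC", indicators, True))  # unchanged: case-sensitive, "INC" is not "Inc"
```

Stripping company type indicators before running a fuzzy comparison keeps suffixes such as "Ltd" from inflating the edit distance between otherwise identical vendor names.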

17 Script Editor and RECOFFSET

19 Contact Information
Kevin Legere
Implementation Consultant
ACL Services Ltd
Alberni Street, Vancouver, BC, Canada V6G 1A5
