
1 © OMIKRON 2004 - www.omikron.net © 1993-2006 OMIKRON Data Quality GmbH ∙ www.global.omikron.net ∙ info@omikron.net Duplicates & matching in worldwide data: Challenges & Solutions Carsten Kraus

2 The world becomes more global
- Germany generates over 30% of its GNP from exports
- In 1993, exports accounted for only 20%

3 The world becomes more local
- When conquered by the Russians, Mongolia changed from its own alphabet to Cyrillic
- In 1993, they changed back
- Ireland and many other countries are making efforts to strengthen their own languages
- Do not expect the world to switch to English standards any time soon

4 The World: so what?
Worldwide customer data needs:
- A data structure adequate for worldwide use
- Entry forms adequate for worldwide use
  - e.g. for the internet
- Processing adequate for worldwide use
  - e.g. matching / duplicate check

5 Why is matching important?
- Save money
  - Avoid duplicates in mailings
  - Avoid selling to blacklisted customers or above the credit limit
- Earn more money
  - Avoid customer frustration
  - A single view of the customer is needed for BI
    - CLV
    - cross-selling
    - marketing controlling
    - ...
- Save your life
  - Avoid selling to terrorists

6 Do not trust the postal code
- In the UK, the postal code is very fine-grained, sometimes down to the level of a single building
  - It is therefore a strong identifier in duplicate checks
- Germany, France, Switzerland, Sweden, ...: only 5 or 4 numeric digits
- 70 countries do not have any postal code at all, e.g. Ireland
- Do not assume that you can base strong identification on the postal code

7 Postal data
- Street names are available only for a few countries
- Update intervals differ
- Therefore: the duplicate check must also be able to handle addresses that are not postally precise

8 Characters in the world:
- Alphabets: Александр Пушкин
- Abugidas: देवनागरी
- Abjads: أسامة بن لادن
- Syllabic scripts: あいこ
- Symbol scripts: 愛子

9 Processes, e.g. duplicate check
- "Unicode capabilities" alone are not enough

10 Abjads
- Semitic languages
  - Arabic [ العربية ]
  - Hebrew [ עברית ]
  - Abjads are written from right to left
- Abjads use only consonants
  - For most words, vowels are optional: they are obvious to native readers and are added when speaking
- Problems arise when Arabic names are written in Latin script:
  - 27 ways of writing Usama bin Laden [ أسامة بن لادن ] in the archives of "Der Spiegel" (a magazine like "Time")
  - (Demonstration with Omikron technology)
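To illustrate why Latin spellings of an Arabic name vary mostly in their vowels, here is a minimal Python sketch that reduces a name to its consonant skeleton before comparing. This is not Omikron's technology; the function name `consonant_skeleton` and the vowel set are assumptions made for the example, and a real matcher would need far more (for instance handling of "h", "y", "al-" prefixes, and a way to limit false positives).

```python
import unicodedata

# Assumption for this sketch: only these letters count as vowels.
VOWELS = set("aeiou")

def consonant_skeleton(name: str) -> str:
    """Reduce a Latin-script name to its lowercase consonants."""
    # NFKD splits accented letters into base letter + combining mark;
    # the ASCII/alpha filter below then drops marks, spaces and punctuation.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    letters = [c for c in decomposed if c.isascii() and c.isalpha()]

    skeleton = []
    prev = ""
    for c in letters:
        # Skip vowels, and collapse directly doubled consonants ("ss" -> "s").
        if c not in VOWELS and c != prev:
            skeleton.append(c)
        prev = c
    return "".join(skeleton)

variants = ["Usama bin Laden", "Osama bin Ladin", "Oussama Ben Laden"]
keys = {consonant_skeleton(v) for v in variants}
print(keys)  # all three variants share one key
```

Such a key is deliberately crude: it groups the spellings above, but it also merges unrelated names that happen to share consonants, so it is only suitable as a candidate filter before a finer comparison.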

11 Japan
- Multiple ways of writing: Aiko (Child of Love)
  - 愛子
  - あいこ

12 China
- ZHANG Aiguo 张爱国
- ZHANG Aimin 张爱民
- ZHANG Aidang 张爱党
- ZHANG is the family name, Ai the generation name
- Only the last syllable represents the given name

13 The 5 most common names
- U.K.: 3.3%
- China: 31%

14 Thus
- Do not trust the family name to be a good differentiator in all countries
- Your software should be able to handle these cases

15 Householding
- In many countries, male and female names have different endings
  - E.g. Greek names
    - Male: Πέτρος Κώτης (Petros Kotis)
    - Female: Αναστασία Κώτη (Anastasia Koti)
- When identifying households, it is not enough to search for a 100% match
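The point about gendered endings can be sketched with a similarity score instead of an exact comparison. The example below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; it is not the method used by Omikron, and the function names and the 0.8 threshold are assumptions chosen for illustration only.

```python
import unicodedata
from difflib import SequenceMatcher

def family_name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two family names."""
    # Normalize so that composed and decomposed accents compare equal,
    # and casefold so that case differences do not matter.
    norm_a = unicodedata.normalize("NFC", a).casefold()
    norm_b = unicodedata.normalize("NFC", b).casefold()
    return SequenceMatcher(None, norm_a, norm_b).ratio()

def may_share_household(family_a: str, family_b: str, threshold: float = 0.8) -> bool:
    """Flag two family names as household candidates (threshold is an assumption)."""
    return family_name_similarity(family_a, family_b) >= threshold

print(may_share_household("Κώτης", "Κώτη"))   # gendered endings of one family name
print(may_share_household("Kotis", "Smith"))  # unrelated names
```

A fixed threshold like this is only a starting point: short names are very sensitive to a single differing letter, so a real householding check would also weigh the address and other fields.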

16 Order of given/family name
- In the U.K., names begin with the given name:
  - John Smith
- In France and in many other countries, the given name comes after the family name:
  - DUPONT Michel
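One simple way to make a comparison independent of name order is to compare the sorted name parts rather than the raw string. The following is a minimal sketch under that assumption; `name_key` and `same_name_any_order` are hypothetical helper names, and the approach ignores initials, titles, and spelling variants.

```python
def name_key(full_name: str) -> tuple:
    """Return the casefolded name parts in sorted order."""
    return tuple(sorted(full_name.casefold().split()))

def same_name_any_order(a: str, b: str) -> bool:
    """True if both names consist of the same parts, regardless of order."""
    return name_key(a) == name_key(b)

print(same_name_any_order("DUPONT Michel", "Michel Dupont"))
print(same_name_any_order("John Smith", "John Smyth"))
```

Note the trade-off: ignoring order also equates "Martin Michel" with "Michel Martin", which may be two different people, so order-insensitive matching should be combined with other evidence.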

17 Compounds in company names
- English: ring tone service Ltd. (3 words)
- German: klingeltonservice GmbH (1 word)
- Or:
  - Klingelton-Service
  - Service für Klingeltöne (service for ring tones)
- Not only German:
  - Netherlands
  - Scandinavia
  - Occasional occurrence in many languages
- Most algorithms cannot solve this because they compare word by word
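A word-by-word comparison fails on the compound spellings above because the word boundaries differ. One possible workaround, sketched here as an assumption and not as the technique Omikron uses, is to compare the names with all separators removed. The helper name `compound_key` is hypothetical.

```python
import re

def compound_key(company_name: str) -> str:
    """Casefold the name and remove spaces, hyphens and other punctuation."""
    # [\W_]+ matches runs of non-word characters and underscores;
    # letters such as "ö" count as word characters and are kept.
    return re.sub(r"[\W_]+", "", company_name.casefold())

print(compound_key("Klingelton-Service GmbH") == compound_key("klingeltonservice GmbH"))
print(compound_key("ring tone service Ltd.") == compound_key("Ringtone Service Ltd"))
```

This handles "Klingelton-Service" versus "klingeltonservice", but not a reordered form such as "Service für Klingeltöne", which needs matching below the word level (for example on shared substrings) rather than a normalized key.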

18 Omikron offers:
- Worldwide matching technology
  - At D&B Sweden and at Schober Iberia, we replaced localized solutions because our international matching technology delivered better results than the localized solutions, even on local data
  - At Reed, we found 20 000 more duplicates in 400 000 international addresses already processed by another high-end software product
  - Patent pending
- Other DQ technology
  - e.g. data structuring, upper/lower case, ...
- All built into an SOA-ready solution, the Omikron DQ Server
  - Other environments available

19 Thus:
- Handling international data correctly means more than just being able to import Unicode data
- Keep in mind the impact on
  - Data storage
  - Data entry
  - Matching
  - Salutation
  - etc.
- The global world is becoming more local again: take care of it and you will have a competitive advantage
- Feel free to ask us to help you ;-)

20 Attention, content has been added/changed
- For the updated presentation, please give me your card

21 Thank you for your attention!
Carsten Kraus
ckraus@omikron.net

