De-Duplication: A not so simple problem. Covers Appendix Part 5.



False?
False positives occur when a group of records is identified as duplicates but the records do NOT represent the same customer.
False negatives occur when actual redundant representations of the same customer are NOT identified.
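A minimal sketch of how the two error types could be counted, assuming a small hand-checked sample exists in which the true customer behind each record key is known. The function name `pair_errors`, the keys, and the customer labels are all hypothetical and are not taken from the slides.

```python
from itertools import combinations


def pair_errors(predicted_groups, true_customer):
    """Return (false_positive_pairs, false_negative_pairs).

    predicted_groups: iterable of sets of record keys flagged as duplicates.
    true_customer: dict mapping record key -> real customer (hand-checked).
    """
    # Every pair of keys placed together in a predicted duplicate group.
    predicted_pairs = set()
    for group in predicted_groups:
        predicted_pairs.update(
            frozenset(pair) for pair in combinations(sorted(group), 2)
        )

    # Every pair of keys that really belongs to the same customer.
    true_pairs = {
        frozenset((a, b))
        for a, b in combinations(sorted(true_customer), 2)
        if true_customer[a] == true_customer[b]
    }

    false_positives = predicted_pairs - true_pairs  # grouped, but different people
    false_negatives = true_pairs - predicted_pairs  # same person, but missed
    return false_positives, false_negatives


# Hypothetical sample: keys 101/102 are customer A, keys 103/104 are customer B.
true_customer = {101: "A", 102: "A", 103: "B", 104: "B"}
predicted_groups = [{101, 102, 103}]
fp, fn = pair_errors(predicted_groups, true_customer)
```

Here the group wrongly pulls key 103 in with customer A (two false-positive pairs) and never links 103 to 104 (one false-negative pair).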

Customer Name – only personal names
Postal Address – only United States address formats
Tax ID – could be a personal National Insurance Number or another unique identifier

Identical
Would you argue that these are NOT duplicate customers?

Exact???

Abbreviation
The abbreviation of first and middle names is a common challenge.
Does a matching Tax ID guarantee that a variation is a duplicate?
What about when Tax ID is missing?
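One possible way to treat abbreviated names, sketched under the assumption that names are already split into (first, middle, last) parts and that an initial is "compatible" with any name starting with that letter. The helper names are hypothetical. Compatibility is only a hint that two records might match, not proof, which is exactly why the Tax ID questions above matter.

```python
def name_part_matches(a, b):
    """True if two name parts are equal, or one is an initial of the other."""
    a = a.strip(". ").lower()
    b = b.strip(". ").lower()
    if not a or not b:
        return True  # a missing part does not rule out a match
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]  # initial versus full name
    return a == b


def names_compatible(name_a, name_b):
    """Compare (first, middle, last) tuples; last names must match exactly."""
    first_a, middle_a, last_a = name_a
    first_b, middle_b, last_b = name_b
    return (
        last_a.strip().lower() == last_b.strip().lower()
        and name_part_matches(first_a, first_b)
        and name_part_matches(middle_a, middle_b)
    )


# Hypothetical examples.
same_person_maybe = names_compatible(("John", "Quincy", "Smith"), ("J.", "Q", "Smith"))
clearly_different = names_compatible(("John", "", "Smith"), ("Jane", "", "Smith"))
ambiguous = names_compatible(("J", "", "Smith"), ("Jane", "", "Smith"))
```

The last example shows the risk: "J Smith" is compatible with both John and Jane, so an initial on its own can produce a false positive when the Tax ID is missing.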

Marriage
Marriages can be good for people but possibly bad for their data.
Did the hyphenated last name on Key 252 help overcome the change of address and missing Tax ID?
How do you know if Keys 261 and/or 262 are truly the same customer as Key 263?

False Positives
For Keys 312 & 313, do you think the matching Tax ID and similar name indicate possible duplication of Key 311 despite the different postal address?
For Keys 322 & 323, do you think the exact same postal address and similar name indicate possible duplication of Key 321 despite the missing Tax IDs?
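The two questions above can be sketched as a simple rule set. This is only one possible strategy, not the answer the slides intend: the 0.8 similarity threshold, the record layout, and the decision to treat conflicting Tax IDs as distinct customers are all assumptions. Name similarity uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

NAME_THRESHOLD = 0.8  # hypothetical cut-off for "similar name"


def similar(a, b):
    """Crude name similarity based on matching character runs."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= NAME_THRESHOLD


def classify_pair(rec_a, rec_b):
    """Return 'duplicate', 'review' or 'distinct' for two customer records."""
    name_ok = similar(rec_a["name"], rec_b["name"])
    tax_a, tax_b = rec_a.get("tax_id"), rec_b.get("tax_id")
    same_tax = bool(tax_a) and tax_a == tax_b
    conflicting_tax = bool(tax_a) and bool(tax_b) and tax_a != tax_b
    same_address = rec_a["address"].lower() == rec_b["address"].lower()

    if conflicting_tax:
        return "distinct"   # assumption: different Tax IDs mean different people
    if name_ok and same_tax and same_address:
        return "duplicate"  # everything agrees
    if name_ok and (same_tax or same_address):
        return "review"     # partial evidence: send to a human
    return "distinct"


# Hypothetical records, not the ones shown on the slides.
moved = classify_pair(
    {"name": "John Smith", "tax_id": "111", "address": "1 Main St"},
    {"name": "Jon Smith", "tax_id": "111", "address": "9 Oak Ave"},
)
no_tax = classify_pair(
    {"name": "John Smith", "tax_id": None, "address": "1 Main St"},
    {"name": "John Smith", "tax_id": None, "address": "1 Main St"},
)
```

Routing partial matches to "review" rather than merging them automatically trades extra manual work for fewer false positives.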

Same Address
A common challenge is records that share the same family name and the exact same postal address.

What goes in Report Appendix?
Discuss deduplication
– What is your business strategy?
Show via a flow chart how you would attempt deduplication

Mailing List Management Functional Requirements
Set out what the new system will do.
You have some experience with this from CS22120 Group Project.
An attempt to describe, logically, the functionality of the system.
You need to describe it, NOT build it.

Requirements
Functional
– What is it supposed to do
Non-Functional requirements
– Computer Environment
– Personnel
– Web based

Some functions
Set up required fields
Add, Modify and Delete Fields
Import initial list
– Field matching
– Excel, CSV programs
Add, Modify and Delete Records
Merge records from externally purchased files
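To make "field matching" on import concrete, here is a small illustrative sketch using the standard `csv` module. The required fields, the column headings in the incoming file, and the `FIELD_MAP` mapping are all hypothetical; a report would describe this behaviour rather than build it.

```python
import csv
import io

REQUIRED_FIELDS = ["name", "address", "post_code"]  # hypothetical required fields

# Hypothetical mapping: column heading in the incoming file -> our field name.
FIELD_MAP = {"Full Name": "name", "Addr": "address", "Postcode": "post_code"}


def import_list(csv_file, field_map):
    """Read a CSV file and rename its columns to the system's field names."""
    reader = csv.DictReader(csv_file)
    records = []
    for row in reader:
        record = {
            target: (row.get(source) or "").strip()
            for source, target in field_map.items()
        }
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            raise ValueError(f"line {reader.line_num}: missing {missing}")
        records.append(record)
    return records


# A tiny in-memory file standing in for an exported spreadsheet.
sample = io.StringIO("Full Name,Addr,Postcode\nJane Doe,1 High Street,AB1 2CD\n")
imported = import_list(sample, FIELD_MAP)
```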

Mailing List Functionality cont’d
Cleanse using the Postcode Address File (PAF)
– Contains all addresses in the UK
– Use to correct address from post code
– Can add correct: street name, post town, county
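A rough sketch of cleansing from the post code, assuming the PAF data has already been loaded into a dictionary keyed by post code. The real PAF is a licensed data product with its own file layout; the `PAF_SAMPLE` entry, the field names, and the simplification to one entry per post code are invented purely for illustration.

```python
# Hypothetical stand-in for PAF data: post code -> correct address parts.
PAF_SAMPLE = {
    "AB1 2CD": {
        "street": "Example Street",
        "post_town": "Exampletown",
        "county": "Exampleshire",
    },
}


def normalise_postcode(raw):
    """Upper-case and put a single space before the final three characters."""
    compact = "".join(raw.split()).upper()
    if len(compact) <= 3:
        return compact
    return f"{compact[:-3]} {compact[-3:]}"


def cleanse(record, paf):
    """Overwrite street, post town and county with the values held for the post code."""
    postcode = normalise_postcode(record.get("post_code", ""))
    entry = paf.get(postcode)
    if entry is None:
        # Unknown post code: only tidy its format, leave the address alone.
        return {**record, "post_code": postcode}
    return {**record, "post_code": postcode, **entry}


dirty = {"name": "Jane Doe", "street": "Exmple St", "post_code": "ab12cd"}
clean = cleanse(dirty, PAF_SAMPLE)
```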

Sorting
Sort by
– Post code
– Geographic Areas
– Job Title
– SIC codes
– Turnover (Ascending/Descending/Random)
– And combinations of above
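Sorting by a combination of fields can be sketched with repeated stable sorts, applying the least significant key first. The field names and sample records are hypothetical, and the random order is shown as a separate seeded shuffle.

```python
import random


def sort_records(records, keys):
    """Sort by several fields.

    keys: list of (field, descending) pairs, most significant first.
    Python's sort is stable, so sorting by the last key first and the
    first key last gives the combined ordering.
    """
    result = list(records)
    for field, descending in reversed(keys):
        result.sort(key=lambda record: record[field], reverse=descending)
    return result


def shuffle_records(records, seed=None):
    """Return the records in random order (seeded for repeatability)."""
    result = list(records)
    random.Random(seed).shuffle(result)
    return result


# Hypothetical records.
records = [
    {"key": 1, "post_code": "B", "turnover": 10},
    {"key": 2, "post_code": "A", "turnover": 5},
    {"key": 3, "post_code": "A", "turnover": 20},
]
# Post code ascending, then turnover descending within each post code.
by_area_then_size = sort_records(records, [("post_code", False), ("turnover", True)])
```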

Mailing List Functionality cont’d
Select number of records to deliver, and maybe by
– Post code
– Job Title
– SIC codes
– Turnover (Ascending/Descending/Random)
– Add false “ghosts”
– File formats
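A minimal sketch of selecting records for delivery. The slide does not define "ghosts"; this assumes they are decoy (seed) records that the list owner plants so that unauthorised reuse of the list can be detected. The function name, parameters, and sample data are hypothetical.

```python
def select_for_delivery(records, count, matches=None, ghosts=()):
    """Pick up to `count` matching records and append the ghost records.

    matches: optional function record -> bool (e.g. a post code filter).
    ghosts: decoy records owned by the list supplier (assumed meaning).
    """
    eligible = [r for r in records if matches is None or matches(r)]
    return eligible[:count] + list(ghosts)


# Hypothetical data.
customers = [
    {"name": "A", "post_code": "SY23 1AA"},
    {"name": "B", "post_code": "LL57 2BB"},
    {"name": "C", "post_code": "SY23 3CC"},
    {"name": "D", "post_code": "SY23 4DD"},
]
ghost = {"name": "G. Host", "post_code": "SY23 9ZZ"}

delivery = select_for_delivery(
    customers,
    count=2,
    matches=lambda r: r["post_code"].startswith("SY23"),
    ghosts=[ghost],
)
```

The ghosts are added on top of the requested count here; whether they should instead count towards it is a business decision the requirements would need to state.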

Product?
Must be able to distribute software
– How?
– Web or local OS
– Hardware Platform

Competition
Mailing Houses
– Data discs
– Web
– Mailing list management services
Software Companies
– Dedupe software
– Mailing List Management Software
CHECK THESE OUT FOR THE REPORT