Essentials of Data Quality for Predictive Modeling Jeremy Benson, FCAS, FSA Alietia Caughron, Ph.D Central States Actuarial Forum June 5, 2009.


Slide 2 Agenda
- Team Structure
- Goals of Data Quality
- Process
- Benefits of Data Quality
- Lessons Learned

Slide 3 Separate Data and Modeling Teams
- Advantages
  - Increased focus on data quality
  - Different skill sets are needed for data management than for modeling
  - Data quality issues go beyond modeling
  - Data quality team can start next project earlier
- Disadvantages
  - Modeling team is not as intimate with the data
  - Knowledge transfer to the modeling team may be incomplete

Slide 4 Overall Goals of Data Quality
- Accurate, Consistent, Complete Data
- The data should be appropriate for the purpose of the analysis
- Improved ability to explain and defend decisions
- Better decisions result from better data
- Actuary's time is freed up for more focus on core professional responsibilities, decision and analysis

Slide 5 Goals of Data Quality for Predictive Modeling
- Data: Fundamentally a statistical exercise that requires data.
- "Clean" data: If the data set is incomplete, model convergence can/will be a problem.
- "Good" clean data: Quality of the data impacts models' predictive accuracy.
- Documentation: Supports repeatability (model updates) & buy-in from others.
- Communication: Feedback loop improves knowledge transfer & assists in prioritizing.

Slide 6 Data Quality Process
- Data Collection and Integration
- Data Quality Testing
- Data Scrubbing
- Documentation
- Communication

Slide 7 Data Collection and Integration
- Scope
  - Records (Filters)
  - Fields
- Metadata
  - Operational Definitions of each field
- Data Integration
  - Mapping
  - Aggregation/Duplication
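
The mapping and duplication steps on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' process: it assumes two hypothetical source systems (`system_a`, `system_b`) that encode state differently, and the field names, codes, and mapping table are invented for the example.

```python
# Hypothetical mapping from each source system's state codes to one
# common value. Sources, codes, and field names are illustrative assumptions.
STATE_MAP = {
    ("system_a", "IL"): "IL",
    ("system_b", "17"): "IL",   # system_b is assumed to use numeric codes
    ("system_a", "MO"): "MO",
    ("system_b", "29"): "MO",
}


def integrate(records):
    """Map source-specific codes to common values and drop exact duplicates."""
    seen = set()
    merged, unmapped = [], []
    for rec in records:
        key = (rec["source"], rec["state"])
        if key not in STATE_MAP:
            unmapped.append(rec)      # set aside for review rather than guessing
            continue
        row = (rec["policy_id"], STATE_MAP[key])
        if row in seen:               # same policy already loaded from another source
            continue
        seen.add(row)
        merged.append({"policy_id": row[0], "state": row[1]})
    return merged, unmapped


records = [
    {"source": "system_a", "policy_id": "P1", "state": "IL"},
    {"source": "system_b", "policy_id": "P1", "state": "17"},  # duplicate of P1
    {"source": "system_b", "policy_id": "P2", "state": "29"},
    {"source": "system_b", "policy_id": "P3", "state": "99"},  # no mapping defined
]
merged, unmapped = integrate(records)
```

Records with no mapping are returned separately instead of being dropped, so they can be raised with the business experts who own the source system.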

Slide 8 Data Quality Testing
- Data Profiling
  - Fill Rates
  - Frequency Tests
  - Min/Max
- Data Integration Tests
- Reconciliation
- Business Rules
  - Loss Date should be after Effective Date
  - Claims should have a matching Policy
  - Tests for Negative Premium/Loss
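
The profiling tests and business rules listed on this slide might look like the following in code. This is a sketch under stated assumptions: the record layout, field names, and sample values are hypothetical, and a loss date equal to the effective date is treated as valid, which the slide does not specify.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data; field names and values are illustrative only.
policies = [
    {"policy_id": "P1", "effective": date(2008, 1, 1), "premium": 1200.0, "class_code": "A"},
    {"policy_id": "P2", "effective": date(2008, 6, 1), "premium": -50.0, "class_code": None},
    {"policy_id": "P3", "effective": date(2008, 9, 1), "premium": 800.0, "class_code": "A"},
]
claims = [
    {"claim_id": "C1", "policy_id": "P1", "loss_date": date(2008, 3, 15)},
    {"claim_id": "C2", "policy_id": "P2", "loss_date": date(2008, 5, 1)},  # before effective date
    {"claim_id": "C3", "policy_id": "P9", "loss_date": date(2008, 7, 4)},  # no matching policy
]


def profile(rows, field):
    """Fill rate, min/max, and value frequencies for one field."""
    values = [r[field] for r in rows if r[field] is not None]
    return {
        "fill_rate": len(values) / len(rows),
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "freq": Counter(values),
    }


def business_rule_failures(policies, claims):
    """Return (record id, failed rule) pairs for the three rules on the slide."""
    by_id = {p["policy_id"]: p for p in policies}
    failures = []
    for p in policies:
        if p["premium"] is not None and p["premium"] < 0:
            failures.append((p["policy_id"], "negative premium"))
    for c in claims:
        policy = by_id.get(c["policy_id"])
        if policy is None:
            failures.append((c["claim_id"], "no matching policy"))
        elif c["loss_date"] < policy["effective"]:
            failures.append((c["claim_id"], "loss date before effective date"))
    return failures


class_profile = profile(policies, "class_code")
premium_profile = profile(policies, "premium")
failures = business_rule_failures(policies, claims)
```

Returning the failing record IDs, rather than only a count, supports the root-cause drilldown described later in the communication process.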

Slide 9 Data Scrubbing
- Normalizing Data
  - If fields are required to add to a certain value, adjusting the data so that they do.
- Imputations
  - Filling in of Missing Data
- Translations
  - Changing the value of a data point to make it consistent with other values that have the same meaning
- Cleaning
  - Correction of Erroneous Data
- Mapping
  - Process in which similar data is merged together and the values in the datasets are translated to become consistent with each other
  - Redefining the segmentation of data
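
Three of the scrubbing operations above (normalizing, imputation, translation) can be illustrated briefly. The slide does not say which methods were used, so the choices here are assumptions: proportional rescaling for normalization, median fill for imputation, and a hypothetical lookup table for translation.

```python
from statistics import median


def normalize_shares(shares, total=1.0):
    """Rescale values proportionally so they sum to `total`."""
    current = sum(shares)
    if current == 0:
        raise ValueError("cannot rescale shares that sum to zero")
    return [s * total / current for s in shares]


def impute_median(values):
    """Replace None with the median of the observed values (one possible method)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical translation table: several spellings, one consistent code.
GENDER_TRANSLATION = {"m": "M", "male": "M", "f": "F", "female": "F"}


def translate(value, table):
    """Return the consistent code, or the original value if no translation exists."""
    return table.get(value.strip().lower(), value)


normalized = normalize_shares([0.5, 0.3, 0.3])   # inputs sum to 1.1, not 1.0
imputed = impute_median([10, None, 30, 20])
translated = [translate(v, GENDER_TRANSLATION) for v in ["Male", " f ", "X"]]
```

Values with no translation pass through unchanged so they can be flagged and documented instead of being silently altered.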

Slide 10 Documentation
- Clear and concise so that someone else can re-create the process
- Modeler should be able to understand what each data element represents and any data scrubbing that took place for that element
- Any justification for changes to the dataset should be clearly documented
- Helps provide the modeler with a comfort level about the data.

Slide 11 Communication Process
- IT
  - Determine how to access the data and obtain permission to the data
- Actuarial, Underwriting, Claims, Operations
  - Understand the intended uses of the data
  - Determination of the Scope of Records and Fields to use for Modeling
  - Input and Feedback on Results of Data Quality Testing
  - Drilldown into Root Cause Analysis of Data Quality Issues
- Project Sponsor
  - Set Overall Direction for Predictive Modeling Project

Slide 12 Communication Process
Communication with Predictive Modeling Team
- Provides direction for data collection & quality review, e.g. identifying "must-haves", "nice-to-have", "wish list", and "not needed"
- Timeline management
- Facilitates knowledge transfer and mitigates concerns about separating the two functions (data collection/quality & modeling)

Slide 13 Benefits of Data Quality for Predictive Modeling
- See the impact of Pre and Post Data Quality
- More time to focus on building the models
- Improvements in data that measurably impact the business can be taken care of because there are resources focused on the data
- Obtain early buy-in from the business

Slide 14 Benefits of Data Quality
- Retrospective
  - Fixing Data Errors in Systems
  - Documentation of cleanup for other Actuarial Analysis
- Prospective
  - Involvement in System Architecture when setting up new system
  - Fix processes that caused data errors
  - Find errors before they adversely affect results
- Communication
  - Business Awareness of Data Quality Issues

Slide 15 Lessons Learned
- Data Quality affects everyone: it is not just a business or IT issue
- Communication with Business experts is essential to understanding why there are errors in the data
- Formalized Process for Data Collection, Integration, Quality and Scrubbing will produce both better data and data sooner
- It is important to have operational definitions
- If pulling from multiple datasets, mapping the datasets is HUGE
- Process and Communication are just as, if not more, important than the Data Quality Tests and Results.