Predictive Modeling and Dental Fraud Detection November 22, 2007 Jonathan Polon FSA Barry Senensky FSA FCIA MAAA Claim Analytics Inc. www.claimanalytics.com.

Slides:



Advertisements
Similar presentations
Barry Senensky FSA FCIA MAAA Overview of Claim Scoring November 6, 2008.
Advertisements

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Improving Disability Claims Management with Predictive Modeling May 15, 2008 Claim Analytics Inc. Barry Senensky FSA FCIA MAAA Jonathan Polon FSA
FTP Biostatistics II Model parameter estimations: Confronting models with measurements.
Evaluating Inforce Blocks Of Disability Business With Predictive Modeling SOA Spring Health Meeting May 28, 2008 Jonathan Polon FSA
Disability Pricing and Dental Fraud Detection: Supervised and Unsupervised Learning October 11, 2007 Jonathan Polon FSA Claim Analytics Inc.
Dental Insurance Fraud Detection With Predictive Modeling SOA Spring Health Meeting May 29, 2008 Jonathan Polon FSA
Predictive Modeling for Disability Pricing May 13, 2009 Claim Analytics Inc. Barry Senensky FSA FCIA MAAA Jonathan Polon FSA
Multiple Criteria for Evaluating Land Cover Classification Algorithms Summary of a paper by R.S. DeFries and Jonathan Cheung-Wai Chan April, 2000 Remote.
Mining the Data Ira M. Schoenberger, FACHCA Senior Administrator 2011 AHCA/NCAL Quality Symposium Friday February 18, 2011.
Introduction to Machine Learning course fall 2007 Lecturer: Amnon Shashua Teaching Assistant: Yevgeny Seldin School of Computer Science and Engineering.
Data Mining: A Closer Look Chapter Data Mining Strategies (p35) Moh!
Microarray analysis 2 Golan Yona. 2) Analysis of co-expression Search for similarly expressed genes experiment1 experiment2 experiment3 ……….. Gene i:
Chapter 14 The Second Component: The Database.
1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.
Part I: Classification and Bayesian Learning
CORRELATIO NAL RESEARCH METHOD. The researcher wanted to determine if there is a significant relationship between the nursing personnel characteristics.
Data Mining: A Closer Look
Data Mining: A Closer Look Chapter Data Mining Strategies 2.
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Walter Hop Web-shop Order Prediction Using Machine Learning Master’s Thesis Computational Economics.
1 Statistical Tools for Multivariate Six Sigma Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
Overview DM for Business Intelligence.
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Data Mining Dr. Chang Liu. What is Data Mining Data mining has been known by many different terms Data mining has been known by many different terms Knowledge.
CS490D: Introduction to Data Mining Prof. Chris Clifton April 14, 2004 Fraud and Misuse Detection.
1 Data Mining DT211 4 Refer to Connolly and Begg 4ed.
Anomaly detection with Bayesian networks Website: John Sandiford.
Inductive learning Simplest form: learn a function from examples
IE 585 Introduction to Neural Networks. 2 Modeling Continuum Unarticulated Wisdom Articulated Qualitative Models Theoretic (First Principles) Models Empirical.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Data Mining Knowledge on rough set theory SUSHIL KUMAR SAHU.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 2 Data Mining: A Closer Look Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Extracting meaningful labels for WEBSOM text archives Advisor.
Descriptive Statistics vs. Factor Analysis Descriptive statistics will inform on the prevalence of a phenomenon, among a given population, captured by.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Advanced Analytics on Hadoop Spring 2014 WPI, Mohamed Eltabakh 1.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
Dimension Reduction in Workers Compensation CAS predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.
Chapter 6 – Three Simple Classification Methods © Galit Shmueli and Peter Bruce 2008 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
Predictive Modeling Spring 2005 CAMAR meeting Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 4 An Excel-based Data Mining Tool (iData Analyzer) Jason C. H. Chen, Ph.D. Professor of MIS.
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
Neural Networks Demystified by Louise Francis Francis Analytics and Actuarial Data Mining, Inc.
CSC 562: Final Project Dave Pizzolo Artificial Neural Networks.
Collaborative Filtering via Euclidean Embedding M. Khoshneshin and W. Street Proc. of ACM RecSys, pp , 2010.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
Principal Component Analysis
Clustering Algorithms Minimize distance But to Centers of Groups.
Monday, February 22,  The term analytics is often used interchangeably with:  Data science  Data mining  Knowledge discovery  Extracting useful.
Azure Machine Learning Introduction to Azure ML. Setting Expectations This presentation is for you if…  you hear the buzzword “Machine Learning” and.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
DATA MINING and VISUALIZATION Instructor: Dr. Matthew Iklé, Adams State University Remote Instructor: Dr. Hong Liu, Embry-Riddle Aeronautical University.
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
MIS2502: Data Analytics Advanced Analytics - Introduction
Risk Mgt and the use of derivatives
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Kathi Kellenberger Redgate Software
Dr. Morgan C. Wang Department of Statistics
Descriptive Statistics vs. Factor Analysis
Introduction to Predictive Modeling
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Model generalization Brief summary of methods
Azure Machine Learning
Presentation transcript:

Predictive Modeling and Dental Fraud Detection November 22, 2007 Jonathan Polon FSA Barry Senensky FSA FCIA MAAA Claim Analytics Inc.

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Introduction to Predictive Modeling Overview of Dental Fraud Paper Predictive Modeling vs Classical Stats Supervised vs Unsupervised Learning Predictive Modeling for Fraud Detection Agenda

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Introductionto Predictive Modeling

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Computer Performance MeasureIBM 7094 c Laptop c Change Processor Speed (MIPS) 0.252,0008,000-fold increase Main Memory 144 KB256,000 KB1,778-fold increase Approx. Cost ($2003) $11,000,000$2,0005,500-fold decrease

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 What is a Predictive Model A Predictive Model is a model which is created or chosen to try to best predict the probability of an outcome Two “ERA’s” of Predictive Models –Traditional –Modern

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Traditional Predictive Models Generally type used by Actuaries Examples include –Mortality Studies –Lapse Studies –Morbidity Studies

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Characteristics of Traditional Predictive Models Relatively low usage of computer power Usually require many assumptions May need large amounts of data to maintain credibility Relatively easy to understand

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Modern Predictive Models Have been around 40+ years Used extensively in industry Applications include –Credit Scores –Credit Card Fraud Detection –Stock Selection –Mail sorting –Hot dogs and Hamburgers

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Examples of Modern Predictive Modeling Methods Classification and Regression Trees Neural Networks Genetic Algorithms Stochastic Gradient Boosted Trees Clustering Techniques –K Means –Expectation Maximization

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Characteristics of Modern Predictive Models Intense usage of computer power Require few assumptions Can maintain credibility with lesser of amounts of data Models can be black boxes

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Why aren’t Actuaries building modern predictive models? Life Insurance Industry is conservative and slow to change Not a traditional actuarial tool The times are changing! –Especially P&C Actuaries Its only a matter of time! –It just makes too much sense! –Innumerable applications to help solve insurance problems

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Applications of Modern Predictive Models to Insurance Problems Traditional –Better understanding of experience –Better pricing –Better reserving –Better underwriting Non-traditional –More objective claims management –Improved fraud detection

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Overview of Dental Fraud Research Paper

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dental Fraud Research Started with approximately 200,000 dental claims records Used modern predictive modeling techniques to identify dentists with “atypical” insurance claims practices Project was very successful

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dental Fraud Research (Cont’d) Research Paper titled “Dental Insurance Claims Identification of Atypical Claims Activity” Published April 2007 and available on CIA Website and our website

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Predictive Modeling Vs Classical Statistics

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 What is Predictive Modeling? Analyse historic dataYes So what’s the difference? Identify & quantify relationships between inputs and outcomes Yes Apply this learning to predict outcomes of new cases Yes ActionClassical Statistics Predictive Modeling

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 The essentials haven’t changed. But many things are different.

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Exploiting Modern Computing Classical Statistics Electronic computing Predates it Approach  Computationally efficient  eg, tables  eg, common distributions Predictive Modeling Exploits it  Computationally intense  Test complex relationships  Use a numeric approach

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Exploiting Computing Power To Build a Predictive Model: Analyst specifies desired form of the model Applies iterative, numerical approach to optimize weights Common approaches include: gradient descent, competitive learning and exhaustive search

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Predictive Modeling Summary Primary Advantage Can quantify greater complexity of relationships between input and outcome Can result in much better accuracy than traditional techniques Primary Concern Models can be difficult to interpret

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Common Modeling Techniques Generalized Linear Models Neural Networks Genetic Algorithms Stochastic Gradient Boosted Trees Random Forests Support Vector Machines

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 SupervisedVs Unsupervised Learning

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Supervised Learning Known outcome associated with each record in training dataset Objective: build a model to accurately estimate outcomes for each record Eg, Predicting claim incidence rates or severities based on past experience Commonly referred to as predictive modeling

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Unsupervised Learning No known outcome associated with any record in training dataset Objective: self-organization or clustering; finding structure in the data Eg, Grocery stores: define types of shoppers and their preferences Insurance fraud is typically unsupervised learning. Because it is not known which historic claims were fraudulent.

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dental Fraud Detection Using Unsupervised Learning

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dental Fraud Project Overview Explore use of pattern detection tools in detecting dental claim anomalies 2004 data 1,600 Ontario GPs (non-specialists) 200,000 claims

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Traditional Tools Rule-based Strong at identifying claims that match known types of fraudulent activity Limited to identifying what is known Typically analyze at the level of a single claim, in isolation

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Pattern Detection Tools Analyze millions of claims to reveal patterns –Technology learns what is normal and what is atypical – not limited by what we already know Categorize each claim. Reveal dentists with high percentage of atypical claims –Immediately highlight new questionable behaviors –Find new large-dollar schemes –Also identify frequent repetition of small-dollar abuses

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Methodology – 3 Perspectives 1.By Claim e.g. Joe Green’s semi-annual check-up 2.By Tooth e.g. all work done in 2004 on Joe’s bicuspid by Dr. Brown 3.By General Work e.g. all general work (exams, fluoride, radiographs, scaling and polishing) done on Joe in 2004 by Dr. Brown

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Methodology – 2 Techniques 1.Principal components analysis 2.Clustering

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCACluster Claim by claim Tooth by tooth General work Methodology – Summary

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Principal Components Analysis Visualization technique Allows reduction of multi-dimensional data to lower dimensions while maximizing amount of information preserved Powerful approach for identifying outliers

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Simple Example Lots of variance between points around both axes

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Simple Example Create new axes, X’ and Y’: rotate original axes 45º

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Simple Example In the new axes, very little variance around Y’ - Y’ contains little “information”

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Simple Example Can set all Y’ values to 0 – ie, ignore Y’ axis Result: reduce to 1 dimension with little loss of info ♦ Original ● Revised

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Identifying Atypical Dentists

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Identifying Atypical Dentists 1.Begin at the individual transaction level 2.Determine the average transaction for each dentist 3.Graph quickly isolates dentists that are outliers

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dentist Average 90 th percentile PCA: Identifying Atypical Dentists

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 PCA: Summary 1.Visualization allows for easy and intuitive identification of dentists that are atypical 2.Manual investigation required to understand why a dentist or claim is atypical

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Clustering Techniques Categorization tools Organize claims into several groups of similar claims Applicable for profiling dentists by looking at the percentage of claims in each cluster We apply two different clustering techniques –K-Means –Expectation-Maximization

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Clustering Example

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Clustering Example

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Eg: Clusters found – by Claim 1.Major dental problems 2.Moderate dental problems 3.Age > 28, minor work performed 4.Work includes unbundled procedures 5.Minor work and expensive technologies 6.Age < 28, minor work performed

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Clusters: Identifying Atypical Dentists 1.Begin at the individual claim level 2.Calculate the proportion of claims in each cluster – in total, and for each dentist 3.Isolate dentists with large deviations from average

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Clusters: Identifying Atypical Dentists

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Clustering Techniques: Summary 1.Clustering provides an easy to understand method to profile dentists 2.Unlike PCA, clustering tells why a given dentist is considered atypical 3.Effectively identifies atypical activity at the dentist level

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Pilot Project Results PCA identified 68 dentists that are atypical Clusters identified 182 dentists that are atypical 36 dentists are identified as atypical by both PCA and Clusters In total, 214 of 1,644 dentists are identified as atypical (13%) Billings by atypical dentists were $2.5 MM out of $16.1 MM billed by all dentists (15%)

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 ExamplesOf Atypical Dentists

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Analysis: Dentist A is far beyond norms in clusters 6 and 8; both indicate high charges in ‘general dentistry’ What we discovered: Each of Dentist A’s patients is being billed for at least 30 minutes of polishing and 45 minutes of scaling Dentist A

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dentist B Analysis : Dentist B is far beyond norm in cluster 7 What we discovered: Frequent extractions and anesthesia  Dentist B looks like an oral surgeon, yet is a general practitioner

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dentist C Analysis: Very high proportion in Cluster 1, suggesting many high-ticket visits What we discovered: First, Dentist C performs an inordinate amount of scaling. Second, Dentist C has emergency examinations with atypically high frequency

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dentist M Analysis: Dentist M has a disproportionate amount of work beyond the 90 th percentile of work by all dentists on all teeth What we discovered: Dentist M appears to utilize lab work very heavily

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 More Atypical Dentists Frequent use of panoramic x-rays Frequent use of nitrous oxide – including with procedures rarely associated with anesthesia Large number of extractions, often using multiple types of sedation Very high proportion of claims for crowns and endodontic work Individual instruction on oral hygiene provided with very atypical frequency

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Dental Fraud Summary Highly effective in identifying dentists with claim portfolios significantly different from the norm Enables experts to quickly identify and focus on those dentists with atypical claims activity Pattern detection:

2007 General Meeting Assemblée générale General Meeting Assemblée générale 2007 Questions?