CS490D: Introduction to Data Mining Prof. Chris Clifton April 14, 2004 Fraud and Misuse Detection.

2 What is Fraud Detection?
Identify wrongful actions
–Is right and wrong universal?
–If so, why not just prevent wrong actions?
Identify actions by the wrong people
Identify suspect actions
–Legal
–But probably not right

3 In Data Mining terms…
Classification?
–Classify into fraudulent and non-fraudulent behavior
–What do we need to do this?
Outlier Detection
–Assume non-fraudulent behavior is normal
–Find the exceptions
Problems?

4 Solution: Differential Profiling
Determine individual behavior
–What is normal for the individual
–What separates one individual from another
Gives profile of individual behavior
How do we do this?
[Diagram: "Profile", "Classification Mining", "Profile", with + and – labeled examples]
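A minimal sketch of differential profiling, assuming each user's behavior is summarized as relative command frequencies and compared by cosine similarity. Both choices are assumptions; the slide does not fix the features or the similarity measure, and all function names are hypothetical. A session is flagged when it matches some other user's profile better than the claimed user's.

```python
import math
from collections import Counter


def build_profile(sessions):
    """Relative frequency of each command over a list of sessions."""
    counts = Counter(cmd for session in sessions for cmd in session)
    total = sum(counts.values())
    return {cmd: n / total for cmd, n in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency vectors stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def best_match(session, profiles):
    """The user whose profile is most similar to this session."""
    observed = build_profile([session])
    return max(profiles, key=lambda user: cosine(observed, profiles[user]))


def is_suspect(claimed_user, session, profiles):
    """Differential test: suspicious if the session looks more like someone else."""
    return best_match(session, profiles) != claimed_user


training = {
    "alice": [["ls", "cd", "vi", "make"], ["vi", "make", "make", "ls"]],
    "bob": [["mail", "ls", "mail", "lpr"], ["mail", "lpr", "ls"]],
}
profiles = {user: build_profile(s) for user, s in training.items()}
print(is_suspect("alice", ["vi", "make", "ls"], profiles))     # False
print(is_suspect("alice", ["mail", "lpr", "mail"], profiles))  # True
```

Comparing a session against every user's profile, not only the claimed user's, is what makes the test differential; it is also why the cost grows with the number of users.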

5 Has this been done?
Intrusion Detection (Lane & Brodley)
Profiled computer users based on command sequences
–Command
–Some (but not all) argument information
–Sequence information
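As a rough illustration of profiling by command sequences, the sketch below compares fixed-length windows of a new session against windows from a user's history. The adjacency-weighted match (each consecutive matching command counts one more than the last) is modeled loosely on the kind of similarity measure Lane & Brodley describe; the window length, the averaging, and all names are assumptions.

```python
def adjacency_similarity(x, y):
    """Similarity of two equal-length command sequences. A match scores one more
    than the match before it, so runs of consecutive matches outweigh scattered ones."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    score, run = 0, 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        score += run
    return score


def windows(commands, length):
    """All overlapping fixed-length windows of a command stream."""
    return [tuple(commands[i:i + length]) for i in range(len(commands) - length + 1)]


def session_score(history, new_commands, length=3):
    """Average, over the new windows, of the best similarity to any window seen in
    the user's history. Low scores mean the session is unlike the user's past."""
    known = set(windows(history, length))
    new = windows(new_commands, length)
    if not known or not new:
        return 0.0
    best = [max(adjacency_similarity(w, k) for k in known) for w in new]
    return sum(best) / len(best)


history = ["ls", "cd", "vi", "make", "ls", "cd", "vi", "make"]
print(session_score(history, ["ls", "cd", "vi", "make"]))  # 6.0
print(session_score(history, ["mail", "lpr", "mail"]))     # 0.0
```

Tokens here are bare commands; including some argument information, as the slide mentions, would only change what goes into each sequence.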

6 Results
[Charts: Accuracy; Time to Alarm]

7 Scaling Issues
What happens with millions of users?
–Credit card
–Cell phone
What about new users?
Ideas?

8 Multi-user profiles
Cluster users
Develop profiles for clusters
–E.g., differential profiling
Old customers: Do they match the profile for their cluster?
–Allows wider range of acceptable behavior
New customers: Do they match any profile?
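The slide leaves the clustering method and the matching rule open. As one possibility, the sketch uses plain k-means over per-customer feature vectors and treats "matches a profile" as "lies within a fixed distance (radius) of a cluster centroid". The features, k, radius, and function names are all hypothetical.

```python
import math


def assign(point, centroids):
    """Index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means. Seeds centroids with the first k points so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def matches_own_cluster(point, cluster_id, centroids, radius):
    """Existing customer: is current behavior within `radius` of their cluster's profile?"""
    return math.dist(point, centroids[cluster_id]) <= radius


def matches_any_cluster(point, centroids, radius):
    """New customer: is behavior within `radius` of any cluster profile?"""
    return min(math.dist(point, c) for c in centroids) <= radius


# Hypothetical features per customer: (calls per day, average call minutes)
customers = [(2, 3), (3, 3), (2, 4), (20, 1), (21, 2), (19, 1)]
centroids = kmeans(customers, k=2)
home = assign((20, 1), centroids)
print(matches_own_cluster((20, 1), home, centroids, radius=2.0))  # True
print(matches_any_cluster((10, 10), centroids, radius=2.0))       # False
```

A cluster profile tolerates more variation than a single customer's profile would, which is the "wider range of acceptable behavior" trade-off on the slide.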

9 Data mining for detection and prevention

10 Data mining defined:
“The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques.” - The Gartner Group

11 Matching known fraud/non-compliance
Which new cases are similar to known cases?
How can we define similarity?
How can we rate or score similarity?
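One way to answer the three questions above, assumed here rather than taken from the slides: put features on a common [0, 1] scale, define similarity as a decreasing function of Euclidean distance, and score a new case by its similarity to the nearest known fraud case. The features and function names are hypothetical.

```python
import math


def min_max_scaler(rows):
    """Build a function that rescales each feature to [0, 1] using ranges from `rows`,
    so that no single feature (e.g. a dollar amount) dominates the distance."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(row):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(row, lows, highs)]

    return scale


def similarity(a, b):
    """Turn Euclidean distance into a score in (0, 1]; identical cases score 1."""
    return 1.0 / (1.0 + math.dist(a, b))


def fraud_similarity_score(case, known_fraud, scale):
    """Score a case by its similarity to the closest known fraud case."""
    x = scale(case)
    return max(similarity(x, scale(f)) for f in known_fraud)


# Hypothetical features: (claim amount in $, number of procedures billed)
known_fraud = [(9000, 12), (8500, 10)]
new_cases = [(8800, 11), (1000, 2)]
scale = min_max_scaler(known_fraud + new_cases)
for case in new_cases:
    print(case, round(fraud_similarity_score(case, known_fraud, scale), 3))
```

Taking the maximum over known cases is a nearest-neighbor rule; averaging the top few similarities would be a less noisy alternative.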

12 Anomalies and irregularities
How can we detect anomalous or unusual behavior?
What do we mean by usual?
Can we rate or score cases on their degree of anomaly?
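To score a degree of anomaly, "usual" has to be defined first. One common choice, assumed here rather than taken from the slides, is a robust z-score built on the median and the median absolute deviation (MAD), so the outliers being hunted do not inflate the yardstick the way they would inflate a mean and standard deviation. The 3.5 cutoff is a widely used rule of thumb, not a value from the source, and the data are hypothetical.

```python
import math
import statistics


def robust_z_scores(values):
    """Degree of anomaly for each value: distance from the median in units of the
    scaled median absolute deviation (MAD). The 1.4826 factor makes the scaled MAD
    comparable to a standard deviation for roughly normal data."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median: anything else is infinitely unusual.
        return [0.0 if v == med else math.inf for v in values]
    return [abs(v - med) / (1.4826 * mad) for v in values]


# Hypothetical: one provider's daily claim totals
daily_claims = [120, 135, 110, 128, 900]
scores = robust_z_scores(daily_claims)
flagged = [v for v, z in zip(daily_claims, scores) if z > 3.5]
print(flagged)  # [900]
```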

13 Data mining is not
“Blind” application of analysis/modeling algorithms
Brute-force crunching of bulk data
Black box technology
Magic

14 How do you mine data?
Use the Cross Industry Standard Process for Data Mining (CRISP-DM)
Based on real-world lessons:
–Focus on business issues
–User-centric & interactive
–Full process
–Results are used

15 Techniques used to identify fraud
Predict and Classify
–Regression algorithms (predict numeric outcome): neural networks, CART, Regression, GLM
–Classification algorithms (predict symbolic outcome): CART, C5.0, logistic regression
Group and Find Associations
–Clustering/Grouping algorithms: K-means, Kohonen, 2Step, Factor analysis
–Association algorithms: apriori, GRI, Capri, Sequence

16 Techniques for finding fraud:
Predict the expected value for a claim and compare it with the actual value of the claim.
Cases that fall far outside the expected range should be evaluated more closely.
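The slide does not name the predictive model. A one-variable least-squares line is the simplest stand-in; "far outside the expected range" is assumed here to mean more than k residual standard deviations, with k = 3 chosen arbitrarily. Any of the regression methods on slide 15 could take the line's place, and the data and names are hypothetical.

```python
import statistics


def fit_expected_value(xs, ys):
    """Least-squares line y = a + b*x fitted to historical claims, plus the spread
    (population standard deviation) of its residuals."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    spread = statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])
    return a, b, spread


def outside_expected_range(x, actual, model, k=3.0):
    """Flag a claim whose actual value is more than k residual spreads from the prediction."""
    a, b, spread = model
    expected = a + b * x
    return abs(actual - expected) > k * spread


# Hypothetical history: (length of stay in days, claim amount in $)
days = [1, 2, 3, 4, 5]
amounts = [1100, 2050, 2950, 4100, 4900]
model = fit_expected_value(days, amounts)
print(outside_expected_range(3, 3100, model))  # False: close to the expected 3020
print(outside_expected_range(2, 6000, model))  # True: far above the expected 2055
```

The model is fitted on historical claims and then applied to new ones, so a fraudulent new claim cannot widen the range it is judged against.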

17 Techniques for finding fraud: Decision Trees and Rules
Build a profile of the characteristics of fraudulent behavior.
Pull out the cases that meet the historical characteristics of fraud.
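A toy version of learning a fraud profile from labeled history: a depth-1 decision tree (a "stump") that picks the single rule "feature >= threshold means fraud" with the fewest errors on past cases. This only illustrates the idea; it is not CART or C5.0, which grow many splits recursively and choose them by impurity measures. It considers only the ">=" direction, and the features and names are hypothetical.

```python
def learn_stump(rows, labels):
    """Learn the single rule 'feature >= threshold -> fraud' (a depth-1 decision
    tree) that misclassifies the fewest labelled historical cases."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            errors = sum((r[f] >= threshold) != is_fraud
                         for r, is_fraud in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    _, feature, threshold = best
    return feature, threshold


def matches_fraud_profile(row, rule):
    """Pull out a case if it meets the learned characteristic of past fraud."""
    feature, threshold = rule
    return row[feature] >= threshold


# Hypothetical features: (claims per month, share of claims billed on weekends)
history = [(3, 0.1), (4, 0.2), (5, 0.15), (30, 0.6), (25, 0.7), (6, 0.9)]
was_fraud = [False, False, False, True, True, False]
rule = learn_stump(history, was_fraud)
print(rule)                                    # (0, 25)
print(matches_fraud_profile((28, 0.3), rule))  # True
print(matches_fraud_profile((10, 0.8), rule))  # False
```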

18 Techniques for finding fraud: Clustering and Associations
Group behavior using a clustering algorithm
Find groups of events using the association algorithms
Identify outliers and investigate
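To find groups of events that occur together, the sketch runs the first two passes of the Apriori idea (apriori appears in the slide 15 list): keep items whose support reaches a threshold, then count only pairs built from those items. It is a reduced illustration, not a full Apriori or any product's implementation; the billing codes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """First two passes of Apriori: keep items whose support (fraction of
    transactions containing them) reaches min_support, then count only pairs of
    those items, since a pair cannot be more frequent than either of its items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(set(t) & frequent_items), 2)
    )
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


# Hypothetical: billing codes that appear together on each claim
claims = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"C", "D"}, {"A", "B", "C"}]
print(frequent_pairs(claims, min_support=0.5))  # {('A', 'B'): 0.8}
```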

19 Fraud detection using CRISP-DM
Provides a systematic way to detect fraud and abuse
Ensures auditing and investigative efforts are used to maximum effect
Continually assesses and updates models to identify newly emerging fraud patterns
Leads to higher recoupments

20 Data mining in action: Fraud, waste and abuse case studies

21 How can data mining help?
Payment error prevention
Billing and payment fraud
Audit selection

22 Payment Error Prevention
The US Health Care Financing Administration needed to isolate the likely causes of payment error by developing a profile of acceptable billing practices, and used this information to focus its auditing effort.

23 Payment error prevention solution
Clementine™
Using audited discharge records, built profiles of appropriate decisions such as diagnosis coding and admission
Matched new cases against these profiles
Cases not matching are audited

24 Payment error prevention results
Detected 50% of past incorrect payments, resulting in significant recovery of funding lost to payment errors
PRO analysts are able to use the resulting Clementine models to prevent future errors

25 Billing and payment fraud
The US Defense Finance and Accounting Service needed to find fraud in millions of Dept of Defense transactions, and identified suspicious cases to focus investigations.

26 Billing and payment fraud solution
Clementine
Detection models based on known fraud patterns
Analyzed all transactions, scoring each on its similarity to these known patterns
High-scoring transactions flagged for investigation

27 Billing and payment fraud results
Identified over 1,200 payments for further investigation
Integrated the detection process
Anomaly detection methods (e.g., clustering) will serve as ‘sentinel’ systems for previously undetected fraud patterns

28 Audit selection
The Washington State Department of Revenue needed to detect erroneous tax returns, and focused audit investigations on cases with the highest likely adjustments.

29 Audit selection solution
Clementine
Using previously audited returns, modeled adjustment (recovery) per auditor hour based on return information
Models then score future returns to show those with the highest potential adjustment
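The slide says only that adjustment per auditor hour was modeled from audited returns. The sketch substitutes the simplest possible model, an average rate per return category (a hypothetical attribute), to show how training, scoring, and ranking fit together; it is not the Clementine model the Department used, and all names and figures are hypothetical.

```python
from collections import defaultdict


def train_yield_model(audited):
    """audited: (category, adjustment in $, auditor hours) for past audits.
    Returns total adjustment / total hours per category, plus the same ratio over
    all audits as a fallback for categories never audited before."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for category, adjustment, hours in audited:
        totals[category][0] += adjustment
        totals[category][1] += hours
    rates = {c: adj / hrs for c, (adj, hrs) in totals.items()}
    overall = sum(a for _, a, _ in audited) / sum(h for _, _, h in audited)
    return rates, overall


def rank_returns(returns, model):
    """Order unaudited (return_id, category) pairs by predicted adjustment per
    auditor hour, highest first."""
    rates, overall = model
    return sorted(returns, key=lambda r: rates.get(r[1], overall), reverse=True)


# Hypothetical past audits and incoming returns
past = [("retail", 4000, 10), ("retail", 2000, 10),
        ("construction", 9000, 12), ("services", 500, 5)]
incoming = [("R1", "services"), ("R2", "construction"), ("R3", "retail"), ("R4", "mining")]
model = train_yield_model(past)
print([rid for rid, _ in rank_returns(incoming, model)])  # ['R2', 'R4', 'R3', 'R1']
```

Ranking by yield per hour rather than by raw adjustment is what lets a fixed pool of auditor time go to the most productive cases.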

30 Audit selection results
Maximizes auditors’ time by focusing on cases likely to yield the highest return
Closes the ‘tax gap’

31 Data mining: key to detecting and preventing fraud, waste and abuse
Learn from the past
–High-quality, evidence-based decisions
Predict
–Prevent future instances
React to changing circumstances
–Models kept current, from latest data

