Data Mining Journal Entries for Fraud Detection: A Pilot Study
by Roger S. Debreceny & Glen L. Gray
Discussed by Severin Grabski

Objective
Explore research issues related to the application of statistical data mining to fraud detection in journal entries
– Is this important?
– YES! Most significant frauds are not committed by the users of the ERP systems; they are carried out “outside” these well-controlled systems.
Was this accomplished?
– Maybe

Accomplished?
Used Benford’s Law to examine journal entries
Statistically significant differences in first-digit distributions were found (chi-square test). Should these be investigated?
– For Omicron, a 0% difference still yields a statistically significant result (p < 0.015). What does this tell me?
– Is a 1% difference between observed and predicted frequencies indicative of a problem?
– Could use the Mean Absolute Deviation (MAD) instead
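Why a tiny difference can still be "significant" can be sketched in Python. This is a minimal illustration, not the authors' procedure: it assumes amounts arrive as plain numbers, and the function names and the hypothetical 1-percentage-point shift from digit 9 to digit 1 are invented for the example. The chi-square statistic is proportional to the number of entries, while MAD depends only on the proportions.

```python
import math

# Benford's expected first-digit proportions: P(d) = log10(1 + 1/d); index 0 -> digit 1.
BENFORD_FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]

CHI2_CRIT_DF8_05 = 15.507  # chi-square critical value, 8 degrees of freedom, alpha = 0.05


def first_digit(amount):
    """Leading non-zero digit of a non-zero amount (sign ignored)."""
    return int(str(abs(amount)).lstrip("0.")[0])


def first_digit_proportions(amounts):
    """Observed first-digit proportions for digits 1..9, skipping zero amounts."""
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    return [digits.count(d) / n for d in range(1, 10)], n


def chi_square(observed, n):
    """Chi-square statistic from observed proportions and sample size n."""
    return n * sum((o - p) ** 2 / p for o, p in zip(observed, BENFORD_FIRST))


def mad(observed):
    """Mean absolute deviation from Benford across the nine first digits."""
    return sum(abs(o - p) for o, p in zip(observed, BENFORD_FIRST)) / 9


# Hypothetical data: move 1 percentage point of frequency from digit 9 to digit 1.
shifted = BENFORD_FIRST.copy()
shifted[0] += 0.01
shifted[8] -= 0.01

for n in (1_000, 10_000, 100_000):
    chi = chi_square(shifted, n)
    print(n, round(chi, 2), chi > CHI2_CRIT_DF8_05, round(mad(shifted), 5))
```

With the same 1-point shift, the chi-square statistic grows tenfold with each tenfold increase in n (about 2.5, 25 and 252), crossing the critical value somewhere between 1,000 and 10,000 entries, while MAD stays at 0.02 / 9 ≈ 0.0022 throughout.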

Entity    Total Dev    MAD
Beta      0.19         0.0211
Chi       0.03         0.0033
ChiEta    0.06         0.0067
ChiNu     0.11         0.0122
ChiPi     0.30         0.0333
Delta     0.06         0.0067
Eta       0.20         0.0222
EtaNu     0.10         0.0111
EtaPi     0.08         0.0089
Nu        0.34         0.0378
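In every row of this table, the MAD value equals Total Dev divided by nine, the number of possible first digits (for example, 0.19 / 9 ≈ 0.0211 for Beta and 0.34 / 9 ≈ 0.0378 for Nu). Assuming Total Dev is the sum of absolute differences between the actual proportion AP_d and the Benford-expected proportion EP_d of each first digit d (notation chosen here for illustration), the relation is:

```latex
\text{Total Dev} = \sum_{d=1}^{9} \lvert AP_d - EP_d \rvert,
\qquad
\text{MAD} = \frac{1}{9} \sum_{d=1}^{9} \lvert AP_d - EP_d \rvert = \frac{\text{Total Dev}}{9},
\qquad
EP_d = \log_{10}\!\left(1 + \frac{1}{d}\right)
```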

Benford’s Law & First 5 Firms

Accomplished?
Identifying “violations” of Benford’s first-digit law provides only a preliminary indication
– Nigrini and Mittermaier (1997) recommend using the first digit as an initial test of reasonableness

Other “Benford’s Law” Digit Tests
Second Digit Test
– This also gives only a preliminary indication
First Two Digits Test
– Provides more direction
Number Duplication
– Identify and rank-order duplicate numbers
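The expected frequencies behind these tests, and a simple duplication ranking, can be sketched as follows. The function names and the `top=5` default are hypothetical; the digit formulas are the standard Benford expressions, and the duplication step simply counts exact repeats of an amount.

```python
import math
from collections import Counter


def benford_nth_digit(d, n):
    """Benford probability that the n-th digit (from the left) equals d.

    n = 1 covers digits 1..9; n >= 2 covers digits 0..9.
    """
    if n == 1:
        return math.log10(1 + 1 / d)
    lo, hi = 10 ** (n - 2), 10 ** (n - 1)
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))


def benford_first_two(dd):
    """Benford probability of a first-two-digit combination dd (10..99)."""
    return math.log10(1 + 1 / dd)


def duplicate_ranking(amounts, top=5):
    """Number duplication test: amounts occurring more than once, most frequent first."""
    counts = Counter(amounts)
    return [(amt, c) for amt, c in counts.most_common() if c > 1][:top]


# Second-digit expectations: 0 is the most likely second digit, 9 the least.
print([round(benford_nth_digit(d, 2), 4) for d in range(10)])
print(duplicate_ranking([100, 250, 100, 75, 250, 100]))  # [(100, 3), (250, 2)]
```

The first-two-digit test spreads the data over 90 cells instead of 9, which is why it points more directly at specific amounts worth examining.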

Other Benford’s Law Research
Carslaw (1988), using expected second-digit frequencies, found support for the rounding up of income figures (more 0s and fewer 9s than expected).
Thomas (1989), again using second digits, found support for the rounding up of income and the rounding down of losses.
Nigrini
– (1994) used first-two-digit frequencies to analyze payroll fraud, and
– (1996) used first-two-digit frequencies to examine tax compliance

Fourth Digit Test
Chi-square test for a distributional difference in the fourth digit
– “…distribution of the fourth digit for each organization for all dollar amounts over $999.”
– Was this the fourth digit from the left or from the right?
– What if the transaction was for $100,000?
While statistically significant differences were found, should these be investigated?
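The left-versus-right ambiguity is easy to see in code. This sketch is hypothetical: it assumes cents are dropped and reads "over $999" as a whole-dollar part of at least 1,000. Under Benford's Law the fourth digit is nearly uniform (about 0.1002 for 0 down to about 0.0998 for 9), so the choice of digit matters more than the expected distribution.

```python
import math


def fourth_digit_from_left(amount):
    """4th digit counting from the leftmost digit of the whole-dollar amount."""
    return int(str(int(abs(amount)))[3])


def fourth_digit_from_right(amount):
    """4th digit counting leftward from the units digit (the thousands digit)."""
    return (int(abs(amount)) // 1000) % 10


def benford_fourth_digit(d):
    """Benford probability that the fourth digit from the left equals d (0..9)."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(100, 1000))


def fourth_digit_chi_square(amounts, from_left=True):
    """Chi-square (9 df) of fourth digits against Benford, amounts >= $1,000 only."""
    extract = fourth_digit_from_left if from_left else fourth_digit_from_right
    digits = [extract(a) for a in amounts if int(abs(a)) >= 1000]
    n = len(digits)
    if n == 0:
        raise ValueError("no amounts of $1,000 or more")
    stat = 0.0
    for d in range(10):
        expected = n * benford_fourth_digit(d)
        stat += (digits.count(d) - expected) ** 2 / expected
    return stat


print(fourth_digit_from_left(25_678), fourth_digit_from_right(25_678))    # 7 5
print(fourth_digit_from_left(100_000), fourth_digit_from_right(100_000))  # 0 0
```

For a round amount such as $100,000 both readings give 0, so a population of round-number entries would inflate the count of 0s regardless of which definition the authors used.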

Three Digit Test
Examined the last three digits of dollar amounts
– Used the “top 5” last-three-digit patterns
– Found that 4 of 29 entities had 30–60% of their transactions consisting of the top 5 last-three-digit patterns
It would be interesting to note whether these were the entities that “failed” Benford’s Law
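A minimal sketch of the top-5 concentration measure, assuming cents are dropped before taking the last three digits (the slides do not say). The 0.30 flag threshold is borrowed from the lower end of the 30–60% range reported above; the function names and sample data are hypothetical.

```python
from collections import Counter


def last_three(amount):
    """Last three digits of the whole-dollar amount as a zero-padded string."""
    return f"{int(abs(amount)) % 1000:03d}"


def top5_share(amounts, top=5):
    """Top last-three-digit patterns and the share of transactions they cover."""
    if not amounts:
        return [], 0.0
    patterns = Counter(last_three(a) for a in amounts)
    leaders = patterns.most_common(top)
    covered = sum(count for _, count in leaders)
    return leaders, covered / len(amounts)


def flag_entities(amounts_by_entity, threshold=0.30, top=5):
    """Entities whose top patterns cover at least `threshold` of their transactions."""
    return sorted(
        entity
        for entity, amounts in amounts_by_entity.items()
        if top5_share(amounts, top)[1] >= threshold
    )


sample = [1000, 2000, 3000, 1500, 2500, 1234, 5678, 4321, 1000, 3000]
print(top5_share(sample, top=2))  # ([('000', 5), ('500', 2)], 0.7)
print(flag_entities({"A": sample, "B": list(range(1000, 1100))}))  # ['A']
```

Cross-tabulating the flagged entities against the first-digit results would answer whether these are the same entities that "failed" Benford's Law.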

Data Mining J/E Questions
Would have liked a more reasoned, theoretical approach to specifying where and why data mining techniques should be applied
Sources of J/Es?
– How do they influence the data mining?
Unusual patterns between classes of J/Es?
Does the class of J/E influence the nature of the J/E (i.e., do some types of J/E have a higher probability of fraud)?
Evidence from Benford’s Law or right-most digits?
Underlying issues that will guide effective and efficient data mining of JEs

Descriptive Statistics
Is there any way to group the firms by industry?
What can be found by grouping and analyzing the firms by size?

Other Questions
What approaches other than Benford’s Law can be applied to mining journal entries?
What do audit teams currently do in computerized analysis of journal entries?
The analysis expects a “large enough” number of journal entries in order to highlight that fraud might be occurring.
What if only a few JEs are made?
What is the sensitivity of this approach?

Confusion
Number of organizations?
– 36 organizations, minus 8 data sets with less than 1 year of data and 1 incomplete data set, leaves 27. So why 29 observations?
Did you count each year for the 2 organizations that provided 2 years of data as a separate observation?
– What is the justification?
– Why not do a year-to-year comparison for those organizations?

What’s Missing?
Interpretation and more detailed analysis of the data
– We know that there are “violations”, but never learn whether there is really fraudulent activity
What other data mining techniques are planned?
Analytical reasoning as to what tests should be done or what is revealed by certain tests

Data Mining Extensions
Compare the entities with a “larger” average number of line items per journal entry (e.g., >10) as one pool?
Alternatively, look at those in which the maximum number of line items is large (e.g., >100)
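Forming these two pools only requires counting line items per journal entry. The sketch below assumes a hypothetical record layout of one (entity, je_id) pair per line item; the cut-offs of 10 and 100 are the slide's own examples, and the entity names in the usage are made up.

```python
from collections import defaultdict


def line_item_stats(lines):
    """Per-entity (average, maximum) line items per journal entry.

    `lines` holds one (entity, je_id) pair per journal-entry line item.
    """
    per_je = defaultdict(int)
    for entity, je_id in lines:
        per_je[(entity, je_id)] += 1
    by_entity = defaultdict(list)
    for (entity, _), count in per_je.items():
        by_entity[entity].append(count)
    return {e: (sum(c) / len(c), max(c)) for e, c in by_entity.items()}


def pools(stats, avg_cut=10, max_cut=100):
    """Entities above the average cut-off, and entities above the maximum cut-off."""
    high_avg = sorted(e for e, (avg, _) in stats.items() if avg > avg_cut)
    high_max = sorted(e for e, (_, mx) in stats.items() if mx > max_cut)
    return high_avg, high_max


lines = ([("EntityA", 1)] * 12 + [("EntityA", 2)] * 10
         + [("EntityB", 1)] * 3 + [("EntityB", 2)] * 150)
stats = line_item_stats(lines)
print(stats)         # {'EntityA': (11.0, 12), 'EntityB': (76.5, 150)}
print(pools(stats))  # (['EntityA', 'EntityB'], ['EntityB'])
```

The two criteria can select different entities: an entity with many moderately sized entries lands in the average pool, while a single very large entry is enough to place an entity in the maximum pool.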

Summary
Objective: explore research issues related to the application of statistical data mining to fraud detection in journal entries
A good first step, and this is a pilot study
Would like more theoretical motivation for the tests and research issues
Would have liked more data analysis
Could I apply this in an audit? I’m not sure; more research is needed

Thank You