Data-Driven Fraud Detection


LEARNING OBJECTIVES
After studying this chapter, you should be able to:
- Describe the importance of data-driven fraud detection, including the difference between errors and fraud.
- Explain the steps in the data analysis process.
- Be familiar with common data analysis packages.
- Understand the principles of data access, including Open Database Connectivity (ODBC), text import, and data warehousing.
- Perform basic data analysis procedures for fraud detection.
- Read and analyze a Matosas matrix.
- Understand how fraud is detected by analyzing financial statements.
CHAPTER 6

Errors and Frauds
ERRORS
- Errors are not intentional.
- They are simply problems caused by failures in systems, procedures, or policies.
- They do not represent fraud and normally do not result in legal action.
- Errors are usually spread evenly throughout a data set.
FRAUD
- Fraud is the intentional circumvention of controls by intelligent human beings.
- Perpetrators cover their tracks by creating false documents or changing records in database systems.
- Evidence of fraud may be found in very few transactions.
- Fraudulent symptoms are found in single cases or limited areas of the data set.

Audit Sampling and Fraud
Statistical sampling has become a standard auditing procedure. Audit sampling is an effective analysis procedure for finding routine errors spread throughout a data set. In contrast, sampling is usually a poor analysis technique when looking for a needle in a haystack. If you sample at a 5 percent rate, any single fraudulent transaction has a 95 percent chance of falling outside the sample, so a fraud that touches only a few transactions will usually be missed. Often, fraud examiners strive to complete full-population analysis to ensure that the "needles" are found. Given the right tools and techniques, full-population analysis is often the preferred method in a fraud investigation.
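The arithmetic behind the needle-in-a-haystack problem can be sketched as follows. This is a minimal illustration, assuming every record is sampled independently with the same probability (a simplification of real audit sampling); the function name is hypothetical.

```python
def chance_of_missing_all(sample_rate: float, fraud_count: int) -> float:
    """Probability that a random sample contains none of the fraudulent records.

    Assumption: each record is included independently with probability
    sample_rate, so each fraudulent record is missed with 1 - sample_rate.
    """
    return (1.0 - sample_rate) ** fraud_count


# Chance of missing every fraudulent record at a 5 percent sampling rate
for fraud_count in (1, 3, 10):
    missed = chance_of_missing_all(0.05, fraud_count)
    print(f"{fraud_count} fraudulent records: {missed:.1%} chance of missing all")
```

Even with ten fraudulent records in the population, a 5 percent sample misses all of them more often than not, which is why full-population analysis is attractive when the tools allow it.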

The Data Analysis Process
Fraud investigators must be prepared to learn new methodologies, software tools, and analysis techniques to successfully take advantage of data-oriented methods. Data-driven fraud detection is proactive in nature. The investigator no longer has to wait for a tip to be received. The investigator brainstorms the schemes and symptoms that might be found and then looks for them. Data-driven detection is essentially a hypothesis-testing approach: The investigator makes hypotheses and tests to see which are supported by the data.

Figure 6.1 The Proactive Method of Fraud Detection

The Data Analysis Process—Six Steps
Step 1: Understand the business
Step 2: Identify possible frauds that could exist
Step 3: Catalog possible fraud symptoms
Step 4: Use technology to gather data about symptoms
Step 5: Analyze results
Step 6: Investigate symptoms

Step 1: Understand the Business
The same fraud detection procedures cannot be applied generically to all businesses or even to different units of the same organization. Several potential methods to gather information about a business are as follows:
- Tour the business, department, or plant
- Become familiar with competitor processes
- Interview key personnel (ask them where fraud might be found)
- Analyze financial statements and other accounting information
- Review process documentation
- Work with auditors and security personnel
- Observe employees performing their duties

Step 2: Identify Possible Frauds That Could Exist
This risk assessment step requires an understanding of the nature of different frauds, how they occur, and what symptoms they exhibit. The fraud identification process begins by conceptually dividing the business unit into its individual functions or cycles. During this stage, the fraud detection team should brainstorm potential frauds by type and player.

Step 3: Catalog Possible Fraud Symptoms
In Step 3, the fraud examiner should carefully consider what symptoms could be present in the potential frauds identified in Step 2.
Types of Fraud Symptoms
- Accounting anomalies
- Internal control weaknesses
- Analytical anomalies
- Extravagant lifestyles
- Unusual behaviors
- Tips and complaints

Figure 6.2 Red Flags of Kickbacks

Step 4: Use Technology to Gather Data about Symptoms
Searching and analysis:
- Data analysis applications
- Custom structured query language (SQL) queries and scripts
The deliverable of this step is a set of data that matches the symptoms identified in the previous step.
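A custom SQL query for one symptom might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for a corporate database, and the invoices table, its columns, and the duplicate-invoice symptom are all hypothetical choices made for illustration.

```python
import sqlite3


def find_duplicate_invoices(conn):
    """Return (vendor_id, invoice_no, times_billed) for invoices billed more than once."""
    query = """
        SELECT vendor_id, invoice_no, COUNT(*) AS times_billed
        FROM invoices
        GROUP BY vendor_id, invoice_no
        HAVING COUNT(*) > 1
        ORDER BY vendor_id, invoice_no
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor_id TEXT, invoice_no TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        ("V1", "1001", 500.0),
        ("V1", "1002", 750.0),
        ("V2", "2001", 320.0),
        ("V2", "2001", 320.0),  # same vendor and invoice number billed twice
    ],
)

duplicates = find_duplicate_invoices(conn)
print(duplicates)
```

The rows returned are the deliverable of this step: the subset of data that matches a cataloged symptom and is passed on for analysis.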

Step 5: Analyze Results
Once anomalies are refined and determined by the examiners to be likely indications of fraud, they are analyzed using either traditional or technology-based methods:
- Screening results using computer algorithms
- Real-time analysis and detection of fraud
One advantage of the data-driven approach is its potential reuse.

Step 6: Investigate Symptoms
The final step of the data-driven approach is investigation into the most promising indicators. The primary advantage of the data-driven approach is that the investigator takes charge of the fraud investigation process. Instead of waiting for tips or other indicators to become egregious enough to show on their own, the data-driven approach can highlight frauds while they are small. The primary drawback to the data-driven approach is that it can be more expensive and time intensive than the traditional approach.

Data Analysis Software
Software used by auditors and investigators for data analysis includes:
- ACL Audit Analytics
  - Powerful program for data analysis
  - Most widely used by auditors worldwide
- CaseWare's IDEA
  - Recent versions include an increasing number of fraud techniques
  - ACL's primary competitor
- Microsoft Office + ActiveData
  - A plug-in for Microsoft Office
  - Provides data analysis procedures
  - Based in Excel and Access
  - Less expensive alternative to ACL and IDEA
Other software packages include:
- SAS and SPSS (statistical analysis programs with available fraud modules)
- Traditional programming languages like Perl, Python, Ruby, and Visual Basic
- Specialized data mining platforms

Data Access
The most important (and often most difficult) step in data analysis is gathering the right data in the right format during the right time period. Methods include:
- Open Database Connectivity (ODBC)
- Text import
- Hosting a data warehouse

Open Database Connectivity (ODBC)
Open Database Connectivity (ODBC) is a standard method of querying data from corporate relational databases. It is a connector between the front-end analysis and the back-end corporate database. It is usually the best way to retrieve data for analysis:
- It can retrieve data in real time.
- It allows use of the powerful SQL language for searching and filtering.
- It allows repeated pulls for iterative analysis.
- It retrieves metadata like column types and relationships directly.

Text Import
Several text formats exist for copying data from one application (e.g., a database) to another (e.g., an analysis application):
- Delimited text
  - Comma-separated values (CSV)
  - Tab-separated values (TSV)
- Fixed-width format
- Extensible markup language (XML)
  - Used in many new applications
- EBCDIC
  - Used primarily on IBM mainframes
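The delimited and fixed-width formats can be read with a few lines of code. The sketch below uses only Python's standard library; the column names, sample values, and the fixed-width layout (vendor in columns 0 to 9, amount in columns 10 to 15) are assumptions for illustration.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse CSV (or TSV with delimiter='\\t') text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, fields):
    """Parse fixed-width text; fields is a list of (name, start, end) slices."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append({name: line[start:end].strip() for name, start, end in fields})
    return rows


csv_text = "vendor,amount\nAcme,125.50\nBolt,89.00\n"
tsv_text = "vendor\tamount\nAcme\t125.50\n"

# Build the fixed-width sample programmatically so the column widths are exact
layout = [("vendor", 0, 10), ("amount", 10, 16)]
fixed_text = "\n".join(
    vendor.ljust(10) + amount.rjust(6)
    for vendor, amount in [("Acme", "125.50"), ("Bolt", "89.00")]
)

print(read_delimited(csv_text))
print(read_delimited(tsv_text, delimiter="\t"))
print(read_fixed_width(fixed_text, layout))
```

All three readers return values as text, which is one reason the data preparation step (type conversion in particular) matters after a text import.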

Hosting a Data Warehouse
Many investigators simply import data directly into their analysis application, effectively creating a simplified data warehouse. While most programs are capable of storing millions of records in multiple tables, most analysis applications are relatively poor data repositories. Databases are the optimal method of storing data. Audit analysis applications such as ACL and IDEA provide options for server-based storage of data.

Data Analysis Techniques
Once data are retrieved and stored in a data warehouse, analysis application, or text file, they need to be analyzed to identify transactions that match the indicators identified earlier in the process. Analysis techniques commonly used by fraud investigators:
- Data preparation
- Benford's Law
- Digital analysis
- Outlier investigation
- Stratification and summarization
- Time trend analysis
- Fuzzy matching
- Real-time analysis

Data Preparation
One of the most important (and often most difficult) tasks in data analysis is proper preparation of data.
Areas of concern:
- Type conversion and consistency of values
- Descriptives about columns of data
- Time standardization
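The three areas of concern can be illustrated with a short sketch. The helper names, the currency text format, and the list of accepted date formats are assumptions; in particular, slash-separated dates are assumed to be month first, which must be confirmed against the real source system.

```python
from datetime import datetime
from decimal import Decimal
from statistics import mean


def to_amount(raw):
    """Type conversion: turn text such as '$1,250.00' into an exact Decimal."""
    return Decimal(raw.replace("$", "").replace(",", "").strip())


def to_iso_date(raw, formats=("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")):
    """Time standardization: convert several date layouts to ISO 8601 text."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date: {raw!r}")


def describe(values):
    """Descriptives: basic statistics for one column of numeric values."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


amounts = [to_amount(text) for text in ("$1,250.00", "75.10", " 300 ")]
print(describe(amounts))
print(to_iso_date("03/15/2024"), to_iso_date("15.03.2024"))
```

Running descriptives on every column before analysis is a cheap way to catch conversion problems, such as a minimum that is unexpectedly negative or a count that does not match the source system.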

Digital Analysis
Digital analysis is the art of analyzing the digits that make up number sets like invoice amounts, reported hours, and costs. Benford's Law predicts that, for many kinds of financial data, the first digits of the numbers in a set follow a known distribution in which low digits appear far more often than high digits. Using Benford's Law to detect fraud has the major advantage of being a very inexpensive method to implement and use. The disadvantage of using Benford's Law is that it is tantamount to hunting fraud with a shotgun: it can signal that something is unusual in a data set without pointing to a specific transaction.
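Benford's Law gives the expected share of numbers whose first digit is d as log10(1 + 1/d). The sketch below computes those expected values and compares them with the observed first-digit shares of a data set; the function names and the sample amounts are hypothetical.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected proportion of values whose first digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def first_digit(amount):
    """Leading nonzero digit of a nonzero number, ignoring sign and scale."""
    # Scientific notation puts the leading digit in the first character
    return int(f"{abs(amount):.15e}"[0])


def first_digit_report(amounts):
    """Map each digit 1 to 9 to (observed share, expected share)."""
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero amounts to analyze")
    return {d: (counts[d] / total, benford_expected(d)) for d in range(1, 10)}


# Hypothetical invoice amounts for one supplier
sample = [1200.0, 135.5, 18.0, 940.0, 2100.0, 160.0, 0.05, 310.0]
for digit, (observed, expected) in first_digit_report(sample).items():
    print(digit, f"observed {observed:.1%}", f"expected {expected:.1%}")
```

A real analysis needs far more than eight values before the observed shares mean anything, and a large gap between observed and expected shares is a reason to look closer rather than proof of fraud.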

Table 6.1 Benford’s Law Probability Values

Figure 6.3 Digital Analysis—Supply Management

Outlier Investigation
Another common analysis that fraud investigators perform is identification of outliers. By focusing on outliers, investigators can easily identify cases that do not match the norm.
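One common way to identify outliers is the z-score, which expresses each value as a number of standard deviations from the mean. The sketch below is one possible approach rather than the method prescribed by the chapter; the threshold of 3 and the sample amounts are assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the number of standard deviations each value lies from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values identical, so no outliers
    return [(v - mu) / sigma for v in values]


def find_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose absolute z-score exceeds the threshold."""
    return [
        (index, value)
        for index, (value, z) in enumerate(zip(values, z_scores(values)))
        if abs(z) > threshold
    ]


# Hypothetical purchase amounts: twenty routine payments and one unusual one
payments = [100] * 20 + [1000]
print(find_outliers(payments))
```

Because a single extreme value inflates both the mean and the standard deviation, very small data sets can hide an outlier from this test, so the threshold should be tuned to the data being examined.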

Figure 6.4 Supplier Graphs

Stratification
Stratification is the splitting of complex data sets into groupings. The data set must be stratified into a number of "subtables" before analysis can be done. For many data sets, stratification can result in thousands of subtables. While basic programs like spreadsheets make working with this many tables difficult and time consuming, analysis applications like ACL and IDEA make working with lists of tables much easier.

Summarization
Summarization is an extension of stratification. Summarization runs one or more calculations on the subtables to produce a single record representing each subtable. Basic summarization usually produces a single results table with one record per case value. Pivot tables (also called cross tables) are two-dimensional views with cases in one dimension and the calculations in the detail cells.
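Stratification followed by summarization can be sketched in a few lines. The record layout (a vendor field and an amount field) and the function names are hypothetical; the point is that each subtable collapses to one summary record.

```python
from collections import defaultdict


def stratify(records, key):
    """Split records into subtables, one per distinct value of `key`."""
    subtables = defaultdict(list)
    for record in records:
        subtables[record[key]].append(record)
    return dict(subtables)


def summarize(subtables, field):
    """Produce one summary record per subtable (one record per case value)."""
    results = []
    for case, rows in sorted(subtables.items()):
        amounts = [row[field] for row in rows]
        results.append(
            {
                "case": case,
                "count": len(amounts),
                "total": sum(amounts),
                "average": sum(amounts) / len(amounts),
            }
        )
    return results


# Hypothetical purchase records
purchases = [
    {"vendor": "Acme", "amount": 100.0},
    {"vendor": "Bolt", "amount": 40.0},
    {"vendor": "Acme", "amount": 300.0},
]
summary = summarize(stratify(purchases, "vendor"), "amount")
for row in summary:
    print(row)
```

The results table has one row per vendor, which is the form that later techniques (sorting, trend analysis, outlier tests) operate on.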

Time Trend Analysis
Time trend analysis is a summarization technique that produces a single number that summarizes each graph. By sorting the results table appropriately, the investigator quickly knows which graphs need further manual investigation.
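One number that can summarize a time series is the slope of its least-squares trend line. The sketch below assumes that choice (the chapter does not name the calculation here) and assumes evenly spaced periods; the vendor names and values are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against period index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to compute a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def rank_by_trend(series_by_case):
    """Return (case, slope) pairs sorted from steepest increase downward."""
    slopes = {case: trend_slope(values) for case, values in series_by_case.items()}
    return sorted(slopes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical monthly purchase totals per vendor
monthly_totals = {
    "Acme": [10.0, 10.0, 10.0, 10.0],
    "Bolt": [10.0, 20.0, 30.0, 40.0],
    "Crest": [12.0, 11.0, 13.0, 12.0],
}
ranking = rank_by_trend(monthly_totals)
print(ranking)
```

Sorting by slope puts the steepest increases at the top of the results table, so only those graphs need to be drawn and reviewed by hand.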

Figure 6.5 Time Trend Graph

Fuzzy Matching
Another common technique is fuzzy matching of textual values. This technique allows for searches to be performed that will find matches between some text and entries in a database that are less than 100 percent identical. The first and most common method of fuzzy matching is use of the Soundex algorithm. A more powerful technique for fuzzy matching uses n-grams. This technique compares runs of letters in two values to get a match score from 0 to 100 percent.
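Both techniques can be sketched briefly. The Soundex function follows the standard American Soundex rules; the n-gram score is one possible scoring scheme (shared three-letter runs divided by all distinct three-letter runs), chosen here as an assumption because the chapter does not specify a formula.

```python
# Standard Soundex consonant codes; vowels, H, W, and Y have no code
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets the run of repeated codes
    return (encoded + "000")[:4]


def ngrams(text, n=3):
    """Set of n-letter runs in the text, ignoring case and whitespace."""
    cleaned = "".join(text.lower().split())
    if len(cleaned) < n:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def ngram_score(a, b, n=3):
    """Match score from 0 to 100 percent based on shared n-grams."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 100.0 * len(grams_a & grams_b) / len(grams_a | grams_b)


print(soundex("Robert"), soundex("Rupert"))
print(round(ngram_score("Acme Supply", "Acme Suply"), 1))
```

Soundex gives a yes-or-no match on how names sound, while the n-gram score lets the investigator set a cutoff, for example reviewing every vendor and employee address pair that scores above a chosen percentage.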

Real-Time Analysis
Data-driven investigation is one of the most powerful methods of discovering fraud. It is usually performed during investigations or periodic audits, but it can be integrated directly into existing systems to perform real-time analysis on transactions. Although real-time analysis is similar to traditional accounting controls because it works at transaction time, it is a distinct technique because it specifically analyzes each transaction for fraud (rather than for accuracy or some other attribute).

Data Analysis Matosas Matrix
The Matosas matrix is a high-level view of which contracts have indicator hits that need to be investigated. It allows the investigator to mentally relate combinations of indicators to different schemes. An example of a Matosas matrix appears in Figure 6.6. This matrix lists one record per contract for which vendors bid. Each column in the table represents an indicator run by the system. While the matrix shown in Figure 6.6 contains only four indicators, a matrix in the real world might contain 50 or 100 indicator columns.
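The structure of such a matrix (one row per contract, one column per indicator) can be sketched as follows. The three indicators, the contract fields, and the sample data are hypothetical and are not the indicators shown in Figure 6.6.

```python
def build_indicator_matrix(contracts, indicators):
    """Return one row per contract with a True/False cell per indicator.

    Rows are sorted so contracts with the most indicator hits come first.
    """
    rows = []
    for contract in contracts:
        row = {"contract": contract["id"]}
        for name, test in indicators.items():
            row[name] = bool(test(contract))
        row["hits"] = sum(row[name] for name in indicators)
        rows.append(row)
    return sorted(rows, key=lambda r: r["hits"], reverse=True)


# Hypothetical indicators: each takes a contract record and returns True on a hit
indicators = {
    "single_bidder": lambda c: c["bid_count"] == 1,
    "winner_bid_last": lambda c: c["winner_bid_last"],
    "award_above_estimate": lambda c: c["award"] > c["estimate"],
}

contracts = [
    {"id": "C-001", "bid_count": 4, "winner_bid_last": False, "award": 90, "estimate": 100},
    {"id": "C-002", "bid_count": 1, "winner_bid_last": True, "award": 130, "estimate": 100},
    {"id": "C-003", "bid_count": 3, "winner_bid_last": True, "award": 95, "estimate": 100},
]

matrix = build_indicator_matrix(contracts, indicators)
for row in matrix:
    cells = " ".join("X" if row[name] else "." for name in indicators)
    print(row["contract"], cells, row["hits"])
```

Sorting by hit count is a convenience added in this sketch; the investigator still reads across each row to judge which combination of indicators suggests which scheme.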

Figure 6.6 Example Matosas Matrix for Contract Bidding

Analyzing Financial Statements
To detect fraud through financial statements, investigators focus on unexplained changes. Balance sheets and income statements are converted from position and period statements to change statements in four ways:
- Comparing account balances in the statements from one period to the next
- Calculating key ratios and comparing them from period to period
- Performing vertical analysis
- Performing horizontal analysis
The statement of cash flows is already a change statement and doesn't need to be converted.
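Vertical analysis restates each line item as a percentage of a base figure (such as total assets or sales), and horizontal analysis computes the percentage change in each item from one period to the next. A minimal sketch, with hypothetical account names and balances:

```python
def vertical_analysis(statement, base_item):
    """Express every line item as a percentage of the base item."""
    base = statement[base_item]
    return {item: 100.0 * amount / base for item, amount in statement.items()}


def horizontal_analysis(prior, current):
    """Percentage change per line item; None when the prior balance is missing or zero."""
    changes = {}
    for item, now in current.items():
        before = prior.get(item)
        changes[item] = 100.0 * (now - before) / before if before else None
    return changes


# Hypothetical balance sheet excerpts for two periods
year_1 = {"cash": 50.0, "inventory": 150.0, "total_assets": 200.0}
year_2 = {"cash": 40.0, "inventory": 260.0, "total_assets": 300.0}

print(vertical_analysis(year_2, "total_assets"))
print(horizontal_analysis(year_1, year_2))
```

In this made-up example inventory grows much faster than total assets, which is the kind of unexplained change an investigator would then try to account for.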

Table 6.2 Common Ratios

Figure 6.7 Vertical Analysis of a Balance Sheet

Figure 6.8 Vertical Analysis of an Income Statement

Figure 6.9 Horizontal Analysis of a Balance Sheet and an Income Statement

Figure 6.10 ESM Government–Horizontal Analysis

Figure 6.11 ESM Government–Vertical Analysis

Figure 6.12 Statement of Cash Flows
