ANOMALY DETECTION FRAMEWORK FOR BIG DATA


1 ANOMALY DETECTION FRAMEWORK FOR BIG DATA

2 Outline Introduction, Background, Problem, Related Work, Methodology, Implementation, Result, Conclusions/Future Work. by NKG

3 Introduction The volume of data hitting the servers of financial institutions is increasing at an exponential rate, which makes the data difficult to analyze. Big Data Anomaly Detection Framework: the Support Vector Machine in Spark MLlib was used to build a model that detects fraudulent transactions.

4 Background What is Big Data? Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. The idea of big data arose because traditional tools are no longer sufficient to process data that keeps growing in volume and complexity. Anomaly Detection: Anomaly detection is defined as the task of finding instances in a dataset which are different from the norm (Goldstein, 2014). Predictive modeling techniques such as anomaly detection should therefore be applied to this big data.

5 Background Cont. What is Data Mining? Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers, develop more effective marketing strategies, increase sales and decrease costs. It has great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. Credit card fraud is a wide-ranging term for theft and fraud committed using or involving a payment card, such as a credit card or debit card, as a fraudulent source of funds in a transaction (Wikipedia, 2016).

6 Problem By the nature of their business, financial firms are required to store large volumes of data in strict compliance with a number of regulators. Growth brings a new challenge: new products and a growing customer base put increased pressure on banks to manage and secure their data. The problem under study is to identify the best algorithms and techniques for detecting fraud (data anomalies) in the online credit card transactions of financial institutions, in order to improve decisions.

7 Introduction Cont. General Objectives: Review the concept of big data and the tools and techniques for real-time data analysis. Develop a model for anomaly detection. Propose a framework based on adversarial machine learning techniques to detect anomalies in big data in a real-time environment.

8 Related Work Data Mining Process: Data Clustering and Integration, Selection and Transformation, Data Mining, Interpretation/Evaluation, Knowledge.

9 Related Work Cont. Big Data: Volume, Velocity, Variety, Veracity, Value.

10 Related Work Cont. Technologies of Big Data There are many big data technologies. Examples are Spark, Hadoop, Cassandra, MongoDB and NoSQL.

11 Related Work Cont. Mahest et al. (2015) used classification profiling of input code to differentiate between good and malicious code. The Intelligent Customer Analytics for Recognition and Exploration (iCARE) framework was presented to analyze customer behavior using banking big data; it leverages the IBM products SPSS Analytic Server and InfoSphere. Credit Card Fraud Detection using SVM and reduction of False Alarms.

12 Related Work Cont. Comparing SVM-S with iCARE
SVM-S: distributed data; real-time streaming; Apache Spark; runs in memory.
iCARE: distributed data; no real-time streaming; IBM InfoSphere; runs on external storage.

13 Methodology An experiment was conducted to address the problem using the Apache Spark big data platform. The Support Vector Machine with Spark (SVM-S) Framework was used. The data for the experiment were credit card transactions.
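To make the classifier at the core of this methodology concrete, here is a minimal sketch of a linear SVM trained by stochastic sub-gradient descent on the hinge loss. It is a plain-Python stand-in, not Spark MLlib and not the presenter's code. The two features (scaled amount, scaled transactions per hour), the toy rows and all hyperparameters are assumptions made for illustration only.

```python
import random

def train_linear_svm(rows, labels, epochs=200, step=0.1, reg=0.01, seed=0):
    """Fit w, b by stochastic sub-gradient descent on the hinge loss.

    rows   : list of feature lists, assumed already scaled to [0, 1]
    labels : +1 for fraud, -1 for normal
    """
    rng = random.Random(seed)
    w = [0.0] * len(rows[0])
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - step * reg) for wj in w]
            if margin < 1.0:
                # Inside the margin or misclassified: move the boundary.
                w = [wj + step * y * xj for wj, xj in zip(w, x)]
                b += step * y
    return w, b

def predict(w, b, x):
    """Return +1 (flag as fraud) or -1 (treat as normal)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical toy data: [scaled amount, scaled transactions per hour].
train_rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3],
              [0.9, 0.8], [0.8, 0.95], [0.95, 0.9]]
train_labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(train_rows, train_labels)
print(predict(w, b, [0.0, 0.0]), predict(w, b, [1.0, 1.0]))
```

On a cluster the same idea would be delegated to the Spark library rather than hand-written; the loop above only shows what the training step optimizes.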

14 Methodology Cont. Data Sources [Figure: data sources, with labels Client, Creating Files, Bank Data Sources]

15 SVM-S Model The Support Vector Machine with Spark (SVM-S) Model combines historical and streaming data for prediction. It can stream real-time transactions and predict anomalies in them using the training data.
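The split between historical training and streaming prediction can be sketched as follows. This is only an illustration in plain Python: the weights stand for a model assumed to have been fitted on historical data beforehand, and the record layout (`id`, `features`) and the batch size are hypothetical. Spark Streaming is not used here.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an iterator of records into lists of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def score(weights, bias, features):
    """Signed distance-like score of a linear model; >= 0 means anomalous."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical model parameters, assumed to come from historical training.
WEIGHTS, BIAS = [2.0, 2.0], -2.0

def flag_anomalies(stream, batch_size=2):
    """Score each arriving micro-batch and collect the flagged record ids."""
    flagged = []
    for batch in micro_batches(stream, batch_size):
        for txn in batch:
            if score(WEIGHTS, BIAS, txn["features"]) >= 0:
                flagged.append(txn["id"])
    return flagged

# Hypothetical incoming transactions with pre-scaled features.
incoming = [
    {"id": "t1", "features": [0.1, 0.2]},
    {"id": "t2", "features": [0.9, 0.8]},
    {"id": "t3", "features": [0.2, 0.1]},
]
print(flag_anomalies(incoming))  # ['t2']
```

Because `micro_batches` consumes any iterator, the same loop works whether the records come from a list, a file or a socket.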

16 SVM-S Architecture The SVM-S Architecture combines Support Vector Machine and Logistic Regression for training the data model. It sits on top of HDFS, from which it acquires the data for prediction.

17 SVM-S Framework The SVM-S Framework streams data in real time using Spark Streaming. It processes supervised, semi-supervised and unsupervised data.

18 Implementation Set up Spark. Create the path for Python or Scala programming. The data used came from the literature review: credit card transactions of German companies and Australian agencies. The data was a .json file, and it was analyzed using Spark in standalone mode on an i5 laptop with 8 GB of RAM.
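Before a .json file of transactions can be fed to a classifier, each record has to be turned into a numeric feature vector. A minimal sketch of that step, assuming one JSON object per line, is shown below. The field names (`id`, `amount`, `hour`) are hypothetical, since the slides do not describe the file's schema.

```python
import json

def parse_transactions(lines):
    """Parse JSON-lines text into (id, [amount, hour]) pairs, skipping blanks."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        records.append((obj["id"], [float(obj["amount"]), float(obj["hour"])]))
    return records

def min_max_scale(vectors):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(v, lows, highs)]
        for v in vectors
    ]

# Hypothetical sample lines standing in for the contents of the .json file.
sample = [
    '{"id": "a", "amount": 20.0, "hour": 9}',
    '',
    '{"id": "b", "amount": 520.0, "hour": 3}',
]
records = parse_transactions(sample)
scaled = min_max_scale([features for _, features in records])
print(scaled)  # [[0.0, 1.0], [1.0, 0.0]]
```

Scaling matters for margin-based models such as SVMs, because a feature with a large numeric range would otherwise dominate the decision boundary.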

19 Implementation Data Credit card transactions were used to implement the SVM-S Framework. Normal transactions were mixed with anomalous transactions for the SVM-S Framework to detect.
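When anomalies are injected into normal traffic, the true label of every transaction is known, so detection quality can be measured directly. The sketch below computes the detection rate and the false alarm rate from such labels. The label lists are invented toy values for illustration and are not results from the experiment.

```python
def confusion_counts(actual, predicted):
    """Count (tp, fp, tn, fn) where 1 = anomalous and 0 = normal."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def detection_rate(tp, fn):
    """Share of injected anomalies that were flagged (recall)."""
    return tp / (tp + fn) if tp + fn else 0.0

def false_alarm_rate(fp, tn):
    """Share of normal transactions that were wrongly flagged."""
    return fp / (fp + tn) if fp + tn else 0.0

# Hypothetical toy labels: 1 marks an injected anomaly or a raised flag.
actual = [0, 0, 0, 0, 1, 1, 0, 1]
predicted = [0, 0, 1, 0, 1, 1, 0, 0]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(round(detection_rate(tp, fn), 2))  # 0.67
print(false_alarm_rate(fp, tn))          # 0.2
```

Reporting both numbers together is useful in fraud detection, where normal transactions vastly outnumber fraudulent ones and plain accuracy can look high even when most fraud is missed.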

20 Result SVM-S was very accurate and robust in detecting the injected fraudulent transactions. The SVM-S Framework is very efficient when compared with the iCARE Framework.

21 Conclusion This thesis has considered how big data can be leveraged in the financial sector. In particular, the study has presented an approach to anomaly detection in big data and proposed an Anomaly Detection Framework to enhance fraud detection in financial institutions. To cope with the velocity, veracity and volume of big data, the framework relies on fast, albeit less accurate, live streaming of data to find anomalies in real time. It is one of a kind among the frameworks to be introduced by financial institutions in Ghana.

22 Reference

23 Future Work First, it is important to consider other common types of applications and datasets; new features could be added so that the framework serves other data types, such as sensor data. Secondly, the framework could be implemented in a real business environment, where it could also work alongside decision-making systems. Thirdly, the component-based design of the Anomaly Detection Framework (ADF) allows further components to be added or upgraded. Finally, future research may show that different machine learning algorithms perform better in implementations.

