ANOMALY DETECTION FRAMEWORK FOR BIG DATA

Outline
Introduction, Background, Problem, Related Work, Methodology, Implementation, Result, Conclusions/Future Work
by NKG

Introduction
The volume of data hitting the servers of financial institutions is increasing at an exponential rate, which makes the data difficult to analyze. The Big Data Anomaly Detection Framework is built on Spark: a Support Vector Machine (SVM) from Spark MLlib was used to build a model that detects fraudulent transactions.
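Spark MLlib's RDD-based API includes `SVMWithSGD` (in `pyspark.mllib.classification`), which trains a linear SVM with stochastic gradient descent; the newer DataFrame-based `LinearSVC` fits the same hinge-loss model with a different optimizer. The slides do not show the training code, so the following is only a plain-Python sketch of the technique, a linear SVM fitted by stochastic subgradient descent on the L2-regularized hinge loss, that runs without a Spark cluster. The learning rate, regularization weight, epoch count and the two toy features are assumptions chosen for illustration.

```python
import random

def train_linear_svm(data, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the L2-regularized hinge loss.

    data: list of (features, label) pairs, label +1 = fraud, -1 = normal.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    samples = list(data)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # Hinge loss is active (inside the margin or misclassified):
                # move the hyperplane toward classifying x correctly.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (flag as fraud) or -1 (normal) from the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, already standardized features (e.g. scaled amount, scaled frequency).
toy = [([-1.0, -0.8], -1), ([-0.8, -1.2], -1), ([-1.2, -1.0], -1), ([-0.9, -0.9], -1),
       ([1.1, 0.9], 1), ([0.9, 1.2], 1), ([1.0, 1.0], 1)]
w, b = train_linear_svm(toy)
print(predict(w, b, [1.0, 1.0]), predict(w, b, [-1.0, -1.0]))
```

The bias is left unregularized, and the features are assumed to be on comparable scales; with raw transaction amounts a fixed step size like this would make training unstable, so scaling would normally come first.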

Background
What is Big Data? Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. The idea of big data arose because the growing volume and complexity of data mean that traditional tools are no longer sufficient to process it.
Anomaly Detection: Anomaly detection is defined as the task of finding instances in a dataset which are different from the norm (Goldstein, 2014). Predictive modeling such as anomaly detection should therefore be performed on big data.
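The definition above can be illustrated with the simplest possible notion of "the norm". The sketch below assumes a single numeric feature (a transaction amount), learns its mean and standard deviation from historical values, and flags new values whose z-score exceeds a threshold of 3. Both the feature and the threshold are illustrative assumptions; this is not the SVM-based method the framework itself uses.

```python
from statistics import mean, pstdev

def fit_norm(history):
    """Learn 'the norm' as the mean and population standard deviation of past values."""
    return mean(history), pstdev(history)

def is_anomaly(value, mu, sigma, threshold=3.0):
    """Flag a value whose distance from the mean exceeds `threshold` standard deviations."""
    if sigma == 0:
        # Degenerate history: every past value was identical.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical past transaction amounts for one customer.
mu, sigma = fit_norm([20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0])  # mean 22.0, std 2.0
print(is_anomaly(950.0, mu, sigma))  # True
print(is_anomaly(26.0, mu, sigma))   # False (z = 2.0)
```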

Background Cont.
What is Data Mining? Data mining is a process used by companies to turn raw data into useful information. By using software to look for patterns in large batches of data, businesses can learn more about their customers, develop more effective marketing strategies, increase sales and decrease costs. Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers.
Credit card fraud is a wide-ranging term for theft and fraud committed using or involving a payment card, such as a credit card or debit card, as a fraudulent source of funds in a transaction (Wikipedia, 2016).

Problem
With this growth comes a new challenge: new products and a growing customer base increase the pressure on banks to manage and secure their data. By the nature of their business, financial firms are required to store a large volume of data under strict compliance with a number of regulators. The problem under study is to identify the best algorithms and techniques for detecting fraud (data anomalies) in online credit card transactions at financial institutions, in order to improve decision making.

Introduction Cont.
General Objectives:
Review the concept of big data and the tools and techniques for real-time data analysis.
Develop a model for anomaly detection.
Propose a framework based on adversarial machine learning techniques to detect anomalies in real time in a big data environment.

Related Work
Data Mining Process: Data Cleaning and Integration, then Selection and Transformation, then Data Mining, then Interpretation/Evaluation, which finally yields Knowledge.
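As a toy illustration of the process steps listed above, each stage can be written as a small function and chained together. The record fields (`id`, `amount`), the two data sources and the threshold rule used in the mining step are all hypothetical; a real pipeline would replace `mine` with a trained model.

```python
def clean(records):
    """Data cleaning: drop records with a missing or negative amount."""
    return [r for r in records if r.get("amount") is not None and r["amount"] >= 0]

def integrate(*sources):
    """Data integration: merge sources, keeping the first record seen for each id."""
    merged = {}
    for source in sources:
        for r in source:
            merged.setdefault(r["id"], r)
    return list(merged.values())

def select_and_transform(records):
    """Selection and transformation: keep (id, amount) and min-max scale amounts to [0, 1]."""
    amounts = [r["amount"] for r in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0
    return [(r["id"], (r["amount"] - lo) / span) for r in records]

def mine(features, threshold=0.9):
    """Data mining: a deliberately simple hypothetical rule flagging very high scaled amounts."""
    return [tid for tid, x in features if x >= threshold]

def interpret(flagged, total):
    """Interpretation/evaluation: summarize the result as knowledge for an analyst."""
    return f"{len(flagged)} of {total} transactions flagged: {sorted(flagged)}"

# Hypothetical records from two sources; t2 is incomplete and t1 appears twice.
bank = [{"id": "t1", "amount": 40.0}, {"id": "t2", "amount": None}, {"id": "t3", "amount": 55.0}]
cards = [{"id": "t4", "amount": 2500.0}, {"id": "t1", "amount": 40.0}]
records = integrate(clean(bank), clean(cards))
print(interpret(mine(select_and_transform(records)), len(records)))
# 1 of 3 transactions flagged: ['t4']
```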

Related Work Cont.
Big Data: big data is commonly characterized by five Vs: Volume, Velocity, Variety, Veracity and Value.

Related Work Cont.
Technologies of Big Data: there are many big data technologies, for example Spark, Hadoop, Cassandra, MongoDB and other NoSQL databases.

Related Work Cont.
Mahest et al. (2015) used classification profiling of input code to differentiate between benign and malicious code. The Intelligent Customer Analytics for Recognition and Exploration (iCARE) framework was presented to analyze customer behavior using banking big data; it leverages the IBM products SPSS Analytic Server and InfoSphere. Another related work is "Credit Card Fraud Detection using SVM and Reduction of False Alarms".

Related Work Cont.
Comparing SVM-S with iCARE

                      SVM-S                    iCARE
Data distribution     Distributed data         Distributed data
Streaming             Real-time streaming      Not real-time streaming
Platform              Apache Spark             IBM InfoSphere
Processing            Runs in memory           Runs on external storage

Methodology
An experiment was conducted on the Apache Spark big data platform to address the problem, using the Support Vector Machine with Spark (SVM-S) framework. The data for the experiment consisted of credit card transactions.

Methodology Cont.
Data Sources (diagram): Client, Creating Files, Bank Data Sources.

SVM-S Model
The Support Vector Machine with Spark (SVM-S) model combines historical and streaming data for prediction: it streams real-time transactions and predicts anomalies using the model trained on historical data.
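One way to read "combines historical and streaming data" is a model whose weights were learned offline on historical transactions, which then scores each streamed transaction and is refined whenever a confirmed label arrives. The sketch below assumes exactly that; the class name, the starting weights (standing in for a historically trained linear SVM) and the feedback loop are hypothetical, and the update is one hinge-loss stochastic subgradient step.

```python
class StreamingLinearSVM:
    """Hypothetical sketch: a linear SVM warm-started from historical training,
    used to score streamed transactions and refined with confirmed labels."""

    def __init__(self, weights, bias, lr=0.1, reg=0.01):
        self.w = list(weights)  # assumed to come from training on historical data
        self.b = bias
        self.lr = lr
        self.reg = reg

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        """+1 = flag as fraud, -1 = normal."""
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """One stochastic subgradient step on the L2-regularized hinge loss (y in {-1, +1})."""
        margin = y * self.score(x)
        self.w = [wi * (1 - self.lr * self.reg) for wi in self.w]
        if margin < 1:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y

def score_stream(model, stream):
    """Yield (transaction_id, prediction) for each streamed item; when the item
    carries a confirmed label (not None), update the model after predicting."""
    for tid, x, label in stream:
        yield tid, model.predict(x)
        if label is not None:
            model.update(x, label)

model = StreamingLinearSVM([0.5, 0.5], 0.0)
stream = [("s1", [1.0, 1.1], None), ("s2", [-1.0, -0.9], -1), ("s3", [0.2, 0.1], -1)]
print(list(score_stream(model, stream)))  # [('s1', 1), ('s2', -1), ('s3', 1)]
```

Here `s3` is a false alarm; once its confirmed "normal" label has been applied, `model.predict([0.2, 0.1])` returns -1.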

SVM-S Architecture
The SVM-S architecture combines a Support Vector Machine and Logistic Regression to train the data model. It runs on top of HDFS, from which it acquires the data used for prediction.
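The slide does not say how the SVM and logistic regression are combined. The sketch below assumes one plausible arrangement: both are linear models trained by the same stochastic gradient loop, differing only in the loss (hinge for the SVM, logistic for logistic regression), and a transaction is flagged only when both models agree. The AND rule, the hyperparameters and the toy data are hypothetical; the rule trades some recall for fewer false alarms.

```python
import math

def sgd_linear(data, grad_fn, epochs=100, lr=0.1, reg=0.01):
    """Generic SGD for a linear model; grad_fn(score, y) returns dLoss/dScore, y in {-1, +1}."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = grad_fn(s, y)
            w = [wi - lr * (g * xi + reg * wi) for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def hinge_grad(s, y):
    """Subgradient of the SVM hinge loss max(0, 1 - y*s) with respect to s."""
    return -float(y) if y * s < 1 else 0.0

def logistic_grad(s, y):
    """Gradient of the logistic loss log(1 + exp(-y*s)) with respect to s."""
    return -y / (1.0 + math.exp(y * s))

def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def combined_flag(svm, logreg, x, p_threshold=0.5):
    """Hypothetical AND rule: flag only if the SVM and logistic regression both say fraud."""
    svm_says_fraud = score(svm, x) >= 0
    p_fraud = 1.0 / (1.0 + math.exp(-score(logreg, x)))
    return svm_says_fraud and p_fraud >= p_threshold

toy = [([-1.0, -0.8], -1), ([-0.8, -1.2], -1), ([-1.2, -1.0], -1), ([-0.9, -0.9], -1),
       ([1.1, 0.9], 1), ([0.9, 1.2], 1), ([1.0, 1.0], 1)]
svm = sgd_linear(toy, hinge_grad)
logreg = sgd_linear(toy, logistic_grad)
print(combined_flag(svm, logreg, [1.0, 1.0]), combined_flag(svm, logreg, [-1.0, -1.0]))
```

Raising `p_threshold` above 0.5 makes the combined rule stricter still, which is one lever for reducing false alarms.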

SVM-S Framework
The SVM-S framework streams data in real time using Spark Streaming, and it processes supervised, semi-supervised and unsupervised data.
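Spark Streaming's DStream model processes a live stream as a sequence of small batches collected over a fixed batch interval. The following plain-Python sketch imitates that idea without Spark: it groups timestamped transactions into micro-batches and applies an anomaly rule to each batch. The one-second interval, the `(timestamp, amount)` event format and the amount threshold are hypothetical. Unlike Spark Streaming, which produces a batch (possibly empty) for every interval, this sketch simply skips intervals with no events.

```python
from itertools import groupby

def micro_batches(events, interval):
    """Group (timestamp, payload) events, assumed sorted by timestamp, into
    consecutive micro-batches of `interval` seconds."""
    for batch_id, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield batch_id, [payload for _, payload in group]

def process_stream(events, interval, is_anomaly):
    """Apply an anomaly rule to each micro-batch and collect the flagged payloads."""
    alerts = []
    for batch_id, batch in micro_batches(events, interval):
        alerts.append((batch_id, [p for p in batch if is_anomaly(p)]))
    return alerts

# Hypothetical (timestamp in seconds, amount) events and a hypothetical rule.
events = [(0.2, 30.0), (0.7, 45.0), (1.1, 2000.0), (1.9, 25.0), (3.4, 60.0)]
print(process_stream(events, 1.0, lambda amount: amount > 1000))
# [(0, []), (1, [2000.0]), (3, [])]
```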

Implementation
Set up Spark and create the path for Python or Scala programming. The data came from the literature review: credit card transactions of German companies and Australian agencies. The data was stored as .json files and was analyzed using Spark in standalone mode on an Intel Core i5 laptop with 8 GB of RAM.
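By default, Spark's `spark.read.json` expects JSON Lines input, with one JSON object per line. The slides do not show the schema of the transaction files, so the field names below (`amount`, `hour`, `is_fraud`) are hypothetical. This standard-library sketch parses such lines into (features, label) pairs with labels in {-1, +1}, the form a linear SVM expects.

```python
import json

def load_transactions(lines, feature_fields=("amount", "hour"), label_field="is_fraud"):
    """Parse JSON Lines (one transaction object per line) into (features, label) pairs.
    Field names are hypothetical; blank lines are skipped."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        features = [float(record[f]) for f in feature_fields]
        label = 1 if record.get(label_field) else -1  # +1 = fraud, -1 = normal
        rows.append((features, label))
    return rows

sample = [
    '{"amount": 120.5, "hour": 14, "is_fraud": false}',
    '',
    '{"amount": 4999.0, "hour": 3, "is_fraud": true}',
]
print(load_transactions(sample))
# [([120.5, 14.0], -1), ([4999.0, 3.0], 1)]
```

The same function accepts an open file object, since iterating over a file yields its lines.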

Implementation Data
Credit card transactions were used to implement the SVM-S framework. Normal transactions were mixed with anomalous transactions, which the SVM-S framework then had to detect.
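The evaluation setup described above, in which normal transactions are mixed with injected anomalies whose labels are known, can be sketched as follows. The synthetic amounts (normal values around 50, anomalies between 2000 and 5000), the counts and the seed are hypothetical and are not the thesis data.

```python
import random

def inject_anomalies(normal, anomalies, seed=42):
    """Mix normal transactions with injected anomalies, keeping ground-truth labels
    (0 = normal, 1 = injected anomaly) so detection can be scored afterwards."""
    labeled = [(t, 0) for t in normal] + [(t, 1) for t in anomalies]
    random.Random(seed).shuffle(labeled)
    return labeled

def make_synthetic(n_normal=200, n_anomalies=5, seed=42):
    """Hypothetical generator: normal amounts around 50, anomalies far larger."""
    rng = random.Random(seed)
    normal = [round(rng.gauss(50.0, 10.0), 2) for _ in range(n_normal)]
    anomalies = [round(rng.uniform(2000.0, 5000.0), 2) for _ in range(n_anomalies)]
    return inject_anomalies(normal, anomalies, seed)

data = make_synthetic()
print(len(data), sum(label for _, label in data))  # 205 5
```

Fixing the seed makes the mixed dataset reproducible, so different detectors can be compared on exactly the same injected anomalies.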

Result
SVM-S detected the injected fraudulent transactions accurately and robustly, and the SVM-S framework was more efficient than the iCARE framework.
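Because fraud is rare, overall accuracy alone can make a detector look very accurate even when it misses most fraud; precision, recall and the number of false alarms give a fuller picture. The slides report no figures, so the following sketch only shows how such metrics could be computed from ground-truth labels (for example, the injected-anomaly labels) and a detector's predictions; the example numbers are made up for illustration.

```python
def detection_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary detector (1 = fraud, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "false_alarms": fp}

# Made-up example: 10 frauds among 1000 transactions, and a detector that never flags anything.
y_true = [1] * 10 + [0] * 990
m = detection_metrics(y_true, [0] * 1000)
print(m["accuracy"], m["recall"])  # 0.99 0.0
```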

Conclusion
This thesis has considered how big data can be leveraged in the financial sector. In particular, it has presented an approach to anomaly detection in big data, proposing an Anomaly Detection Framework (ADF) to enhance fraud detection in financial institutions. To cope with the velocity, veracity and volume of big data, the framework relies on fast, albeit less accurate, live streaming of data to find anomalies in real time. The framework is the first of its kind to be introduced for financial institutions in Ghana.

Reference

Future Work
It is important to consider other common types of applications and datasets, and to add new features so that the framework can serve other data types, such as sensor data. Secondly, the framework should be implemented in a real business environment, where it could also be integrated with that environment's decision-making systems. In addition, the modular design of the ADF allows further components to be added or upgraded. Finally, future research may show that other machine learning algorithms perform better in such implementations.