

Cloud Data Anonymization Using Hadoop MapReduce Framework with QoS Evaluation and Behaviour Analysis PROJECT GUIDE: Ms. S. Subbulakshmi TEAM MEMBERS: A. Mahalakshmi (921711205037), S. Sundareswari (921711205075).

PROJECT OVERVIEW To achieve scalability and to measure the quality of the designed system, we propose anonymizing data with Hadoop MapReduce on cloud, together with QoS evaluation. A scalable two-phase top-down specialization (TDS) approach anonymizes large-scale data sets using the MapReduce framework on cloud. We perform behavioural analysis and analyse QoS parameters such as availability, throughput, transmission delay, latency, scalability and speed.

EXISTING SYSTEM Existing systems use centralized TDS algorithms, which makes them inadequate for handling large-scale data sets. Although some distributed algorithms have been proposed, they mainly focus on secure anonymization of data sets from multiple parties rather than on scalability. These algorithms are suitable only for small data sets, and the level of anonymization they achieve is also low.

PROPOSED SYSTEM We propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud, with QoS evaluation and behavioural analysis. This approach splits the input data into small data sets, applies anonymisation to each of them, and then merges them to obtain the final result. We examine the sensitive fields of every data set and give priority to those fields. We then analyze the QoS parameters of the system together with behavioural analysis.

MODULES Data Partition; Anonymization; Merging; Optimised Balanced Scheduling Algorithm; QoS Evaluation and Behaviour Analysis

OPTIMISED BALANCED SCHEDULING Load balancing is a prerequisite for improving cloud performance and maximising resource utilisation. It is the process of distributing the total load across the individual nodes of the collective system so that resources are used effectively and job response time improves. This process avoids the situation where some nodes are overloaded while others are underloaded.
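
The slide does not say which scheduling algorithm is used, so the following is only a minimal sketch of one common balancing heuristic: jobs are taken largest first and each is given to the node with the smallest current load. The function name `assign_jobs` and the use of a job "size" as the load measure are assumptions made for illustration.

```python
import heapq


def assign_jobs(job_sizes, num_nodes):
    """Greedy balancing sketch: largest job first, to the least-loaded node.

    Illustrative heuristic only; not necessarily the project's algorithm.
    """
    # Min-heap of (current_load, node_id): the least-loaded node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    for size in sorted(job_sizes, reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(size)
        heapq.heappush(heap, (load + size, node))
    return assignment


if __name__ == "__main__":
    plan = assign_jobs([8, 7, 6, 5, 4], num_nodes=2)
    for node, jobs in plan.items():
        print(node, jobs, sum(jobs))
```

With this heuristic no node stays idle while another accumulates most of the work, which is the overloaded/underloaded situation the slide describes.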

Selecting Hadoop Location

DATA PARTITION We collect a large number of data sets and split each large data set into small data sets. We then assign a random number to each data set and analyze the parameters and size of the data set. Based on the size of the data, we partition it into a number of files and upload them to the cloud.
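
As a rough illustration of the partitioning step, the sketch below derives the number of files from the data size and assigns each record a random partition number. The function name `partition_records`, the `max_per_file` parameter, and the choice to randomise per record are assumptions; the slide does not fix these details, and uploading to the cloud is left out.

```python
import math
import random


def partition_records(records, max_per_file, seed=None):
    """Split records into ceil(n / max_per_file) partitions at random.

    Random assignment gives partitions of similar size on average;
    sizes are not guaranteed to be equal.
    """
    rng = random.Random(seed)
    num_files = max(1, math.ceil(len(records) / max_per_file))
    partitions = [[] for _ in range(num_files)]
    for record in records:
        # The random number decides which partition file gets the record.
        partitions[rng.randrange(num_files)].append(record)
    return partitions


if __name__ == "__main__":
    parts = partition_records(list(range(10)), max_per_file=4, seed=1)
    print([len(p) for p in parts])
```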

DATA ANONYMISATION In the anonymization process we first identify the sensitive parameters and then apply anonymization to them. We then view the parameters based on the anonymized results. For anonymisation we use a bottom-up generalisation approach, which hides detailed information by creating successive layers of more general values.
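
Generalisation by successive layers can be sketched with a small taxonomy in which every value points to a more general parent. The taxonomy contents, the field names and the functions `generalise` and `anonymise` below are hypothetical examples, not taken from the project.

```python
# Hypothetical taxonomy: each value maps to its more general parent value.
TAXONOMY = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}


def generalise(value, levels=1):
    """Climb the taxonomy by `levels` layers; stop at the root."""
    for _ in range(levels):
        value = TAXONOMY.get(value, value)
    return value


def anonymise(records, sensitive_fields, levels=1):
    """Return copies of the records with the chosen fields generalised."""
    return [
        {
            key: generalise(val, levels) if key in sensitive_fields else val
            for key, val in record.items()
        }
        for record in records
    ]


if __name__ == "__main__":
    data = [{"city": "Madurai", "job": "Engineer"}]
    print(anonymise(data, {"job"}))
```

Each extra level hides more detail: in this toy taxonomy one level turns "Engineer" into "Professional" and two levels turn it into "Any".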

MERGING The intermediate results of the small data sets are merged. The MRTDS driver organizes the small intermediate results for merging, and the merged data sets are collected on the cloud. Anonymisation is then applied again to the merged data; this second pass is called specialization.
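
One simple way to picture the merge is in map/reduce terms: each partition reports how many records fall into each anonymised group, and the counts are summed. The sketch below assumes that form of intermediate result; the function names are hypothetical and this is not the MRTDS driver itself.

```python
from collections import Counter


def count_groups(records, fields):
    """Map step: count the records of one partition per anonymised group."""
    return Counter(tuple(record[f] for f in fields) for record in records)


def merge_counts(partition_counts):
    """Reduce step: sum the per-partition group counts into one result."""
    merged = Counter()
    for counts in partition_counts:
        merged.update(counts)
    return merged


if __name__ == "__main__":
    part_a = [{"job": "Professional"}, {"job": "Artist"}]
    part_b = [{"job": "Professional"}]
    merged = merge_counts(count_groups(p, ["job"]) for p in (part_a, part_b))
    print(merged)
```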

CONCLUSION In this work, we investigated the scalability problem of large-scale data anonymization by top-down specialization (TDS) and proposed a highly scalable two-phase TDS approach using MapReduce on cloud. QoS parameters such as availability, throughput, transmission delay, latency, scalability and speed are also computed to compare the efficiency of the approach with that of the existing system.
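
The slides do not define how the QoS parameters are measured. As a sketch under assumed definitions, three of them could be derived from per-job measurements as follows: availability as the fraction of the observation window without downtime, throughput as bytes processed per second of the window, and latency as mean job duration. The function name `qos_summary` and the record layout are hypothetical.

```python
def qos_summary(jobs, observed_seconds, downtime_seconds):
    """Compute simple QoS figures from job measurements.

    jobs: list of (bytes_processed, start_time, end_time) tuples,
    with times in seconds. The definitions used here are assumptions.
    """
    latencies = [end - start for _, start, end in jobs]
    total_bytes = sum(size for size, _, _ in jobs)
    return {
        "availability": (observed_seconds - downtime_seconds) / observed_seconds,
        "throughput_bytes_per_s": total_bytes / observed_seconds,
        "mean_latency_s": sum(latencies) / len(latencies),
    }


if __name__ == "__main__":
    measurements = [(100, 0.0, 2.0), (300, 1.0, 5.0)]
    print(qos_summary(measurements, observed_seconds=10.0, downtime_seconds=1.0))
```

Running the same computation on the existing and the proposed system would give the side-by-side comparison the conclusion refers to.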