Tools for Processing Big Data
Jinan Al Aridhee and Christian Bach


Abstract
Information technology plays an important role in the processing of Big Data. Major challenges in processing Big Data include capturing, searching, sorting, and analysis: sorting even a few petabytes of data is beyond the reach of conventional tools. Because of this complexity, dedicated tools are required to explore such large volumes of data. This paper presents some of the most commonly used data analytical tools, including Hadoop, NoSQL, Hive, and Pig.

INTRODUCTION
Big Data refers to collections of data sets that are too large and complex to process with traditional tools; individual data sets can run to many gigabytes and beyond. Because of the assortment of information it encompasses, Big Data always brings challenges relating to its volume and complexity. Hadoop, NoSQL, Hive, and Pig are a few of the modern technologies that are useful in analyzing such data.

Fig. 1. Different tools to process Big Data: Hadoop, Pig, Hive, and NoSQL.

BIG DATA
Massive volumes of data sets that cannot be managed using simple data tools. Analytical tools are used to manage the flood of data and turn it into a source of productive and usable information (Jaison, Kavitha, & Janardhanan, 2016).

HIVE
Hive is a distributed data warehouse system for Hadoop that facilitates easy data summarization. It allows us to obtain the final analytics components from the processed Big Data (Genkin, Dehne, Pospelova, Chen, & Navarro, 2016).

Fig. 4. Components of the Hive architecture: Metastore, Driver, and Execution Engine.

HADOOP
The most commonly used Big Data framework, Hadoop combines commodity hardware with open-source software to allow large-scale processing of data sets. It partitions the data so that it can be computed across many hosts, executing application computations in parallel, close to their data (Patel, Yuan, Roy, & Abernathy, 2017).
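Hadoop's "compute across many hosts, close to the data" model is the MapReduce paradigm. As a minimal sketch of the programming model only (pure Python on one machine; real Hadoop distributes the map, shuffle, and reduce phases across a cluster), here is a word count, the canonical MapReduce example:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each input split independently emits (word, 1) pairs.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; here, sum the counts.
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))
```

For example, `word_count(["big data tools", "big data"])` yields `{"big": 2, "data": 2, "tools": 1}`. Because each map call and each reduce call is independent, the framework can run them in parallel on whichever host already holds that slice of the data.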
PIG
Pig is a high-level programming language used to write MapReduce programs that run within the Hadoop framework. It also provides some basic analytical functionality for NoSQL data stores (Garcia, 2013).

Figure 5. Components of the Pig programming language: Pig Latin, Parser, Optimizer, and Compiler, running over Hadoop's MapReduce, HDFS, and YARN.

Fig. 2. Components of Hadoop: MapReduce, HDFS, and YARN.

NOSQL
Non-relational (NoSQL) databases are considered the new era of databases: they provide dynamic schemas, a flexible data model, a scale-out architecture, and efficient Big Data storage and access. Today NoSQL is adopted mainly for its scalability and performance characteristics (Zaki, 2014).

Figure 3. Types of NoSQL databases: BigTable, Cassandra, and DynamoDB.

RESULTS
Figure 6. Suggested framework model for Big Data and the components of the different tools: Hadoop (MapReduce, HDFS, YARN), Hive (Metastore, Driver, Execution Engine), Pig (Pig Latin, Parser, Optimizer, Compiler), and NoSQL (BigTable, Cassandra, DynamoDB).

CONCLUSION
Many tools are available for processing Big Data, but Hadoop, NoSQL, Hive, and Pig are especially cost effective and flexible. All of these tools work on distributed systems: enormous amounts of data are stored across a cluster and then processed with tools such as Hadoop, Hive, and Pig. They are also resistant to failure, and they process data quickly, handling millions of records in seconds.

REFERENCES
Garcia, C. (2013). Demystifying MapReduce. Procedia Computer Science, 20, 484-489. https://doi.org/10.1016/j.procs.2013.09.307
Genkin, M., Dehne, F., Pospelova, M., Chen, Y., & Navarro, P. (2016, December). Automatic, on-line tuning of YARN container memory and CPU parameters. Paper presented at the 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
Jaison, A., Kavitha, N., & Janardhanan, P. S. (2016, October). Docker for optimization of Cassandra NoSQL deployments on node limited clusters. Paper presented at the 2016 International Conference on Emerging Technological Trends (ICETT).
Patel, D., Yuan, X., Roy, K., & Abernathy, A. (2017, March-April). Analyzing network traffic data using Hive queries. Paper presented at SoutheastCon 2017.
Zaki, A. K. (2014). NoSQL databases: New millennium database for big data, big users, cloud computing and its security challenges. International Journal of Research in Engineering and Technology, 3(15), 403-409.
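To make the Pig dataflow style described above concrete, the following is a pure-Python analogue of a short Pig Latin script. The records and field names are hypothetical, invented for illustration; Pig itself would compile the equivalent LOAD/FILTER/GROUP/FOREACH statements into MapReduce jobs on the cluster rather than running them in memory like this:

```python
from collections import defaultdict

# Hypothetical input: (user, url, response_time_ms) records, standing in
# for a file that a real script would bring in with Pig's LOAD.
records = [
    ("alice", "/home", 120),
    ("bob",   "/home", 340),
    ("alice", "/cart", 95),
    ("bob",   "/cart", 610),
    ("carol", "/home", 80),
]

# Pig Latin: slow = FILTER records BY time > 100;
slow = [r for r in records if r[2] > 100]

# Pig Latin: by_url = GROUP slow BY url;
by_url = defaultdict(list)
for user, url, time_ms in slow:
    by_url[url].append(time_ms)

# Pig Latin: stats = FOREACH by_url GENERATE group, COUNT(slow), AVG(slow.time);
stats = {url: (len(times), sum(times) / len(times))
         for url, times in by_url.items()}
```

With this sample data, `stats` is `{"/home": (2, 230.0), "/cart": (1, 610.0)}`: two slow requests to /home averaging 230 ms, and one to /cart at 610 ms. Each step transforms a whole relation at once, which is what lets Pig parallelize the script across a Hadoop cluster.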