Tools for Processing Big Data
Jinan Al Aridhee and Christian Bach

ABSTRACT
Information technology plays an important role in processing Big Data. Major limitations in processing Big Data include capturing, searching, sorting, and analysis; sorting a huge volume of data, such as several petabytes, is particularly difficult. The tools highlighted in this research include Hadoop, NoSQL, Hive, and Pig. Because of the complexity of Big Data, dedicated tools are required to explore such large volumes of data. In this paper, we present some of these commonly used data analysis tools.

INTRODUCTION
Big Data is a large collection of data sets that are very hard to process using traditional tools. The size of a data set can far exceed gigabytes. Because of the variety of information it encompasses, Big Data always brings challenges related to its volume and complexity. Hadoop, NoSQL, Hive, and Pig are a few of the many modern technologies that are useful in data analysis.
Fig. 1. Different tools to process Big Data (Hadoop, Pig, Hive, NoSQL).

BIG DATA
Big Data refers to massive volumes of data that cannot be managed with simple data tools. Analytical tools are used to manage this flood of data and turn it into a source of productive, usable information (Jaison, Kavitha, & Janardhanan, 2016).

HIVE
Hive is a distributed data warehouse system for Hadoop that facilitates easy data summarization. It allows us to obtain the final analytics from the processed Big Data (Genkin et al., 2016).
Fig. 4. Components of Hive architecture (Metastore, Driver, Execution Engine).

HADOOP
Hadoop is the most commonly used framework. It combines hardware with open source software to allow large-scale processing of data sets. It partitions data and computation across many hosts and executes application computations in parallel, close to their data (Patel, Yuan, Roy, & Abernathy, 2017).
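As a rough illustration of the partition-then-compute-in-parallel idea described for Hadoop, the following Python sketch counts words in a MapReduce style. It does not use Hadoop or any Hadoop API: the function names (partition, map_partition, reduce_counts, word_count), the sample records, and the use of threads as stand-ins for cluster hosts are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into roughly equal chunks, one per (hypothetical) host."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(chunk):
    """Map step: count words locally within a single partition."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(records, num_partitions=3):
    chunks = partition(records, num_partitions)
    # Threads stand in for cluster hosts; each one processes only its own chunk.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(map_partition, chunks))
    return reduce_counts(partial_counts)


# Hypothetical sample input, not taken from the poster.
records = [
    "hadoop stores big data",
    "hive queries big data",
    "pig scripts run on hadoop",
]
result = word_count(records)
print(result["big"], result["hadoop"], result["pig"])  # prints: 2 2 1
```

Each partition is counted independently and only the small per-partition summaries are merged at the end, which is the property that lets a real cluster keep computation close to the data.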
Fig. 2. Components of Hadoop (MapReduce, HDFS, YARN).

PIG
Pig is a high-level programming language used to write MapReduce programs that run within the Hadoop framework. It also provides some basic analytical functionality for NoSQL data stores (Garcia, 2013).
Fig. 5. Components of the Pig programming language (Pig Latin, Parser, Optimizer, Compiler).

NOSQL
Non-relational (NoSQL) databases are considered new-era databases. They provide dynamic schemas, a flexible data model, a scale-out architecture, and efficient Big Data storage and access. Today NoSQL is used mainly for its scalability and performance characteristics (Zaki, 2014).
Fig. 3. Types of NoSQL databases (BigTable, Cassandra, DynamoDB).

RESULTS
Fig. 6. Suggested framework model for Big Data and the components of its tools:
Hadoop: MapReduce, HDFS, YARN
Hive: Metastore, Driver, Execution Engine
Pig: Pig Latin, Optimizer, Parser, Compiler
NoSQL: BigTable, Cassandra, DynamoDB

CONCLUSION
Many tools are available for processing Big Data, but Hadoop, NoSQL, Hive, and Pig are very cost-effective and highly flexible. All of these tools work on distributed systems, which makes it possible to store enormous amounts of data in a cluster and then process that data with tools such as Hadoop, Hive, and Pig. What these tools have in common is that they are resistant to failure. They are also fast, processing millions of data items per second.

REFERENCES
Garcia, C. (2013). Demystifying MapReduce. Procedia Computer Science, 20(Supplement C).
Genkin, M., Dehne, F., Pospelova, M., Chen, Y., & Navarro, P. (2016, December). Automatic, On-Line Tuning of YARN Container Memory and CPU Parameters. Paper presented at the 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
Jaison, A., Kavitha, N., & Janardhanan, P. S. (2016, October). Docker for optimization of Cassandra NoSQL deployments on node limited clusters. Paper presented at the 2016 International Conference on Emerging Technological Trends (ICETT).
Patel, D., Yuan, X., Roy, K., & Abernathy, A. (2017, March-April). Analyzing network traffic data using Hive queries. Paper presented at the SoutheastCon 2017.
Zaki, A. K. (2014). NoSQL databases: new millennium database for big data, big users, cloud computing and its security challenges. International Journal of Research in Engineering and Technology (IJRET), 3(15).
