Performance Comparison of Clustered Systems Yugandhar Maram, #91527748 Anjana Vadivel, #78563168 Stuthi Balaji, #34682837.


1 Performance Comparison of Clustered Systems Yugandhar Maram, #91527748 Anjana Vadivel, #78563168 Stuthi Balaji, #34682837

2 Motivation and Goals The primary motivation is to study the architecture of distributed systems and understand the typical issues that arise in middleware. The aim of this project is to analyze the performance of several distributed systems and explain the results. We are currently considering Hadoop and Spark as our target distributed environments.

3 Implementation Details To analyze both systems, we use Hive, which runs on top of them. We issue TPC-H benchmark SQL queries to Hive against a database of many gigabytes that is spread across the systems. Hive translates each SQL query into Hadoop or Spark jobs, which are then executed in a distributed manner.
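As a rough illustration of how one query could be sent to either backend, the sketch below builds a Hive CLI invocation that selects the execution engine through the `hive.execution.engine` property. The helper name `hive_command` and the query file name are hypothetical, and the sketch assumes a Hive installation in which both the `mr` and `spark` engines are configured; it is not taken from the project itself.

```python
def hive_command(engine, query_file):
    """Build a Hive CLI argument list that runs one query file on one engine.

    Assumption: the cluster's Hive build supports the chosen engine.
    """
    if engine not in ("mr", "spark"):
        raise ValueError(f"unsupported engine: {engine!r}")
    return [
        "hive",
        "--hiveconf", f"hive.execution.engine={engine}",  # pick MapReduce or Spark
        "-f", query_file,                                 # run the SQL in this file
    ]


# Hypothetical file holding one TPC-H query.
print(hive_command("spark", "tpch_q1.sql"))
```

The returned list can be passed to `subprocess.run` on a machine where the `hive` binary is on the path.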

4 Implementation Details (cont.) We will then analyze the performance of these systems based on the latency to produce the required results. We will compare the architectural differences between the systems to explain the query results. We will also analyze how the same systems perform with databases of different sizes.
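One possible way to collect the latency numbers is sketched below: each configuration (for example, an engine and a database size) is represented by a callable that runs the query, and the harness records the median wall-clock time over a few repetitions. The names `measure_latency` and `compare`, the choice of the median, and the stand-in workloads are all assumptions made for illustration, not the project's actual measurement method.

```python
import statistics
import time


def measure_latency(run_query, repeats=3):
    """Return the median wall-clock time, in seconds, of calling run_query()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(runners, repeats=3):
    """Map each configuration label to its median latency in seconds."""
    return {label: measure_latency(run, repeats) for label, run in runners.items()}


# Stand-in workloads; a real runner would submit a TPC-H query through Hive.
results = compare({
    "fast-placeholder": lambda: None,
    "slow-placeholder": lambda: time.sleep(0.01),
})
for label, seconds in results.items():
    print(f"{label}: {seconds:.4f} s")
```

Using the median rather than a single run reduces the influence of one-off delays such as cold caches or job start-up.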

5 Related Work/Progress We have set up a Hadoop environment on our local machines and run MapReduce programs successfully. We have also set up Hive on top of Hadoop and run sample queries to check that it works correctly. We carefully studied the architecture of Hadoop distributed systems, the Google File System, and other topics relevant to this project.

6 Evaluation Plans This week, we will start the next phase by running TPC-H queries on Hive. Once we are familiar with the whole setup on our local systems, we will start the actual analysis on cluster nodes. Finally, we will begin the last phase: explaining the results and presenting our analysis.

7 References http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/#download. The Google File System, by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Hadoop Distributed File System: Architecture and Design, by Dhruba Borthakur.

