B534 distributed computing


Presentation on theme: "B534 distributed computing"— Presentation transcript:

1 B534 distributed computing
Developing a Dynamic Virtual Cluster for Massively Parallel Applications: Case Study of Performance Analysis with PageRank Algorithm on FutureGrid Team 009: Joshi Harshad, Joshi Swapnil, Nachankar Vaibhav

2 Distributed systems A distributed system is a collection of independent computers that appears to its users as a single coherent system. - Tanenbaum's book

3 Distributed systems Historically, computers were used mainly for complex scientific and engineering problems, which engaged large computer clusters. Issues of performance and benchmarking of these clusters were thus mainly limited to a select set of scientists and engineers. With the birth of the internet, distributed systems are becoming ubiquitous; their uses range from mobile phones to booking travel tickets to office work. Internet-based computing can be found everywhere. Cloud computing has become another success story and has prompted many academic and commercial institutions to adopt it for their work. It is therefore important to understand the features of, and differences between, distributed systems and to study their components. In this project we attempt to decompose and study in detail two systems: an academic cloud and an academic bare-metal supercomputing platform.

4 Two popular systems: Bare-metal and cloud computing
Bare-metal platform: a platform formed by joining compute nodes through an interconnect switch. Commonly used interconnects include Gigabit Ethernet, Myrinet and InfiniBand, with InfiniBand being the fastest among them. Cloud computing: a model for delivering Internet-based information and technology services in real time. It allows users to see the services while the infrastructure that delivers them remains transparent (or in the "cloud").

5 Hypothesis for the study in question
A hypothesis for this study is that larger and more complex problems, where the performance of the computation on a distributed system relies on communication, will show stark differences between the results obtained from the above two platforms. The cloud platform is expected to show lower performance in this case, since the InfiniBand interconnect of the bare-metal platform provides much faster communication between compute nodes.

6 Overview of the PageRank algorithm
In the Web 2.0 era it is becoming increasingly important to find the data most relevant to a query among the millions of webpages on the internet. Thousands or more webpages are added every day, so the index behind the search has to be updated constantly, or at least often enough to keep the data properly indexed. This requires sorting/indexing webpages by some score. The PageRank algorithm, introduced by the Google search engine, tries to address this need.

7 Overview of the PageRank algorithm
Taken from Prof Qiu’s lecture notes

8 PageRank algorithm contd…
PR, PageRank (a probability value); pi, the page under consideration; L(pj), the number of outbound links on page pj; d, the damping factor, which can be set between 0 and 1 (it is usually set to 0.85); N, the total number of pages.
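The symbols listed on this slide are those of the commonly published PageRank recurrence, PR(pi) = (1 - d)/N + d * (sum over pages pj linking to pi of PR(pj)/L(pj)). A minimal serial sketch of that recurrence is given here for illustration; the function name `pagerank`, the dictionary input format, and the even redistribution of rank from pages without outbound links are assumptions, not details taken from the slides.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=200):
    """Iterate the PageRank recurrence until the ranks stop changing.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    d:     damping factor, 0.85 as suggested on the slide.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no outbound links is spread
        # evenly over all pages so that the ranks keep summing to 1.
        dangling = sum(pr[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for pj in pages:
            outs = links.get(pj, [])
            for pi in outs:
                # Page pj passes PR(pj) / L(pj) to every page it links to.
                new[pi] += d * pr[pj] / len(outs)
        delta = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            break
    return pr


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

In this three-page example, page C receives links from both A and B and therefore ends up with the highest rank.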

9 Implementation of Parallel PageRank
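The transcript does not preserve how the team partitioned the computation, so the following is only a sketch of one common approach, simulated in a single Python process: each "process" owns a slice of the source pages, computes partial rank contributions, and the partial results are then summed in a reduction step. The names `partial_contributions` and `parallel_pagerank`, the round-robin partitioning, and the requirement that every page has at least one outbound link are all assumptions made for this illustration.

```python
def partial_contributions(local_links, pr, d):
    """Work of one simulated process: contributions from the pages it owns."""
    contrib = {}
    for pj, outs in local_links.items():
        for pi in outs:
            contrib[pi] = contrib.get(pi, 0.0) + d * pr[pj] / len(outs)
    return contrib


def parallel_pagerank(links, num_procs=2, d=0.85, iters=50):
    """Partitioned PageRank, simulated sequentially.

    Assumes every page has at least one outbound link (no dangling pages).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    # Assumed round-robin partition of source pages across the processes.
    parts = [{p: links[p] for p in pages[r::num_procs]} for r in range(num_procs)]
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # On a real cluster these calls would run concurrently, one per process.
        partials = [partial_contributions(part, pr, d) for part in parts]
        # Reduction step: stands in for the inter-node communication whose
        # cost depends on the interconnect.
        new = {p: (1.0 - d) / n for p in pages}
        for contrib in partials:
            for p, value in contrib.items():
                new[p] += value
        pr = new
    return pr


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(parallel_pagerank(graph, num_procs=2))
```

The reduction is the part of each iteration that requires every process to exchange data with the others, which is why a workload like this is sensitive to interconnect speed as the dataset grows.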

10 Results Performance Analysis for small dataset

11 Varying No. of URLs BareMetal

12 Varying No. of Processes (BareMetal)

13 Monitoring system
Test and implement parallel PageRank on FutureGrid. Optimize for better speedup over the initial results. Build a monitoring system using Pub/Sub. Build a dynamic virtual cluster.

14 Implementation of Monitoring system - Results on bare-metal cluster

15 Conclusions A parallel algorithm for PageRank calculations was successfully implemented. The algorithm was tested on two systems: a bare-metal cluster and a virtual platform (Eucalyptus). The results obtained were in agreement with the hypothesis: the InfiniBand interconnect provided better communication, and for large datasets the communication between nodes becomes the bottleneck for the calculations.

16 Future Work Performance can be tested more rigorously with other systems and a variety of datasets. If possible, performance can also be tested for different interconnect protocols.

17 Thanks We are grateful to the FutureGrid administrators for providing FutureGrid access for our work and for their help in running the program successfully. Special thanks go to Andrew Young, who helped solve major issues whenever technical problems with FutureGrid arose. We are also thankful for the questions and answers raised by our classmates, as the forums helped us solve problems while executing our tasks. Last but not least, we thank Prof Qiu and the AIs for guiding us in each task and for showing the distributed-systems approach of the overall project.

