A Survey on Parallel Computing in Heterogeneous Grid Environments Takeshi Sekiya Chikayama-Taura Laboratory M1 Nov 24, 2006.



Parallel Computing in Grid Environments
Opportunities to use multi-cluster environments are increasing
  – But schemes designed for stand-alone clusters cause problems in grid-like usage
New mechanisms are needed
  – Handling heterogeneity
  – Firewall/NAT traversal
  – Adaptation to dynamic environments
  – Monitoring
[Figure: problems in grid environments: heterogeneous hardware and software, failure, firewall/NAT, maintenance, dynamic change of CPU/network load, complex configuration, difficulty knowing what is happening]

Heterogeneous Environments
Heterogeneous machines
  – Binaries are different
  – Complex configuration is required when hardware/software differs
Heterogeneous networks
  – Synchronization overhead in parallel applications when latency/bandwidth differ
  – Firewalls/NATs

Firewall/NAT
Firewalls/NATs hinder bi-directional connectivity
Bi-directional TCP/IP connectivity needs to be provided to support a wide spectrum of applications
[Figure: firewall or NAT]

Solutions to the Internet Asymmetric-Connectivity Problem
MPI Environment on Grid with Virtual Machines [Tachibana et al. 2006]
  – Xen for VM and VPN for virtual network
  – Low-cost VM migration
ViNe [Tsugawa et al. 2006]
  – A host named Virtual Router
  – Overlay network based
WOW [Ganguly et al. 2006]

Outline
Introduction
WOW
  – IPOP: IP over P2P
  – Routing IP on the P2P Overlay
  – Connection Setup
  – Joining an Existing Network
  – NAT Traversal
  – Experiments
Summary

Objective and Approach
The system is architected to:
  – Adapt to heterogeneous environments: present a cluster-like environment to end users
  – Scale to a large number of nodes
  – Facilitate the addition of nodes through self-organization of the virtual network: less manual configuration
Approach with virtualization
  – Virtual machines: homogeneous software
  – Self-organizing overlay network: all-to-all connectivity

Virtual Machine
A homogeneous software environment
Offers opportunities for load balancing and fault tolerance
Users can use pre-configured systems
  – Linux distribution
  – Libraries and software

Virtual Network
[Figure: a virtual grid cluster built on IPOP (IP over P2P), which runs on a P2P overlay network over the physical infrastructure, spanning NATs and firewalls]

IPOP [Ganguly et al. 2006]
Characteristics
  – A virtual IP address space
  – Self-organizing
Architecture
  – IP tunneling over P2P
  – A virtualized network interface (tap) captures virtual IP packets
  – Brunet P2P overlay network

Capturing Virtual IP Packets
The tap appears to applications as a network interface
IPOP translates virtual IP addresses to Brunet P2P network addresses
[Figure: tunneling between two hosts. On each host the application exchanges Ethernet frames carrying IP packets with IPOP through the tap; IPOP carries each IP packet inside a Brunet message]
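
The tunneling step can be sketched in a few lines. This is a minimal illustration, not IPOP's code: it assumes the overlay address is the SHA-1 digest of the packed virtual IP (the slides only state that IPOP translates virtual IP addresses to Brunet addresses and that Brunet addresses are 160-bit SHA-1 hashes), and the names `overlay_address`, `encapsulate` and `decapsulate` are hypothetical.

```python
import hashlib
import ipaddress

def overlay_address(virtual_ip: str) -> int:
    """Map a virtual IP to a 160-bit overlay address.

    Assumption: the address is the SHA-1 digest of the packed IP bytes.
    """
    packed = ipaddress.ip_address(virtual_ip).packed
    return int.from_bytes(hashlib.sha1(packed).digest(), "big")

def encapsulate(ip_packet: bytes, dst_virtual_ip: str) -> dict:
    """Wrap an IP packet captured from the tap in an overlay message."""
    return {"dst": overlay_address(dst_virtual_ip), "payload": ip_packet}

def decapsulate(message: dict) -> bytes:
    """Recover the original IP packet on the receiving side."""
    return message["payload"]
```

The receiving side would write the recovered packet back to its own tap, so the application sees ordinary IP traffic on both ends.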

Brunet P2P
Ring-structured overlay
Organized connections
  – Near: with neighbors
  – Far: across the ring
160-bit SHA-1 hash address
Greedy routing
Each node has a constant number of connections
  – O(log^2(n)) overlay hops
[Figure: ring of nodes n1 to n12 showing a multi-hop path from n1 to n7]
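
Greedy routing on a ring can be sketched as follows. The sketch assumes that near connections link each node to both ring neighbors and that far connections are chosen uniformly at random; the slides do not say how Brunet picks far connections, so that choice and the helper names (`build_overlay`, `greedy_route`) are illustrative only. Because links are symmetric here, a node may end up with slightly more than the configured number of connections.

```python
import random

RING = 2 ** 160  # size of the 160-bit address space

def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two addresses, going either way around the ring."""
    d = abs(a - b) % RING
    return min(d, RING - d)

def build_overlay(addresses, far_per_node=1, seed=0):
    """Connect each node to both ring neighbors (near) and to random nodes (far)."""
    rng = random.Random(seed)
    nodes = sorted(addresses)
    links = {a: set() for a in nodes}
    for i, a in enumerate(nodes):
        near = [nodes[i - 1], nodes[(i + 1) % len(nodes)]]
        far = [rng.choice(nodes) for _ in range(far_per_node)]
        for b in near + far:
            if b != a:
                links[a].add(b)
                links[b].add(a)
    return links

def greedy_route(src, dst, links):
    """Forward to the neighbor closest to dst until dst is reached."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(links[here], key=lambda n: ring_distance(n, dst))
        if ring_distance(nxt, dst) >= ring_distance(here, dst):
            raise RuntimeError("no neighbor is closer to the destination")
        path.append(nxt)
    return path
```

With near connections in place, some neighbor is always strictly closer to the destination, so the loop terminates; far connections only shorten the path.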

Connection Setup: Connection Protocol
Node A wishes to connect to node B
1. A sends a CTM (Connect To Me) request to B over the P2P network
  – The CTM request contains A’s URI
2. When B receives the CTM request, B sends a CTM reply to A
  – The CTM reply contains B’s URI
URI (Uniform Resource Indicator), e.g. brunet.tcp: :1024
[Figure: CTM request from A to B, CTM reply from B to A]

Connection Setup: Linking Protocol
3. B sends a link request message to A over the physical network
4. When A receives the link request, A simply responds with a link reply message
5. Finally, a new connection is established between A and B
[Figure: link request from B to A, link reply from A to B, then a direct connection between A and B]
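
The five steps of connection setup can be traced with a small message-log simulation. This is a sketch only: the `Node` class, the message dictionaries and the example URIs (which use documentation-range IP addresses) are assumptions made for illustration, not Brunet's message format.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    uri: str  # hypothetical transport address, e.g. "brunet.tcp:192.0.2.1:1024"
    connections: set = field(default_factory=set)
    inbox: list = field(default_factory=list)  # messages received, in order

def connect(a: Node, b: Node) -> None:
    """Trace the connection protocol (steps 1-2) and linking protocol (steps 3-5)."""
    # 1. A -> B over the P2P overlay: CTM request carrying A's URI
    b.inbox.append({"type": "CTM request", "uri": a.uri})
    # 2. B -> A over the P2P overlay: CTM reply carrying B's URI
    a.inbox.append({"type": "CTM reply", "uri": b.uri})
    # 3. B -> A over the physical network, using the URI learned in step 1
    a.inbox.append({"type": "link request", "from": b.uri})
    # 4. A -> B over the physical network
    b.inbox.append({"type": "link reply", "from": a.uri})
    # 5. Both ends now record the direct connection
    a.connections.add(b.name)
    b.connections.add(a.name)
```

For example, `connect(Node("A", "brunet.tcp:192.0.2.1:1024"), Node("B", "brunet.tcp:192.0.2.2:1024"))` leaves each node with the other in its `connections` set.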

Linking Race Condition (1)
A race condition may occur because the linking protocol is initiated by both peers
[Figure: both peers send link requests and receive link replies; both attempts succeed]

Linking Race Condition (2)
On receiving a link request, a node checks that there is no existing connection or connection attempt to the sender
On receiving a link error, a node restarts the protocol after a random back-off
[Figure: a link request that arrives while active linking is in progress is answered with a link error; after a random back-off, the retried link request receives a link reply]
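
The check-and-back-off rule can be sketched as a tick-based simulation in which both peers start linking at the same time. The `Peer` class, the tick model and the back-off range are assumptions chosen for illustration; the slides do not specify how the back-off is drawn.

```python
import random

class Peer:
    def __init__(self, name):
        self.name = name
        self.connections = set()
        self.attempting = set()  # peers we have an outstanding link request to

    def on_link_request(self, sender):
        """Refuse if a connection or a connection attempt to sender already exists."""
        if sender in self.connections or sender in self.attempting:
            return "link error"
        self.connections.add(sender)
        return "link reply"

def link_with_backoff(a, b, rng, max_rounds=50, max_backoff=8):
    """Both peers start at tick 0; return the round in which a link succeeds."""
    other = {a: b, b: a}
    wake = {a: 0, b: 0}  # tick at which each peer sends its next link request
    for round_no in range(1, max_rounds + 1):
        now = min(wake.values())
        senders = [p for p in (a, b) if wake[p] == now]
        for p in senders:  # requests sent in the same tick cross on the wire
            p.attempting.add(other[p].name)
        answers = {p: other[p].on_link_request(p.name) for p in senders}
        for p in senders:
            p.attempting.discard(other[p].name)
            if answers[p] == "link reply":
                p.connections.add(other[p].name)
                return round_no
            wake[p] = now + 1 + rng.randrange(max_backoff)  # random back-off
    raise RuntimeError("peers never linked")
```

The first round always collides because both requests are in flight together; once the random back-offs differ, one peer retries alone and its request is accepted, leaving exactly one connection.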

Joining an Existing Network: Leaf Connection
A new node N creates a leaf connection to an initial node I by using the linking protocol directly
I acts as a forwarding agent for N
[Figure: new node N with a leaf connection to initial node I; the correct position of the new node is elsewhere on the ring]

Joining an Existing Network: Send CTM Request
N sends a CTM request addressed to itself over the P2P network
  – The CTM request contains N’s URI
Because N is not yet in the ring, the CTM request is received by its right and left neighbors
[Figure: CTM request from new node N via initial node I to right neighbor R and left neighbor L]

Joining an Existing Network: Send CTM Reply
L and R send CTM replies containing their URIs to I
I forwards the CTM replies to N
[Figure: CTM replies from left neighbor L and right neighbor R to initial node I, forwarded to new node N]

Joining an Existing Network: Linking Protocol
L and R start the linking protocol by sending link request messages to N over the physical network
[Figure: link requests from left neighbor L and right neighbor R to new node N]

Joining an Existing Network: Complete Joining
N forms connections with its neighbors and is now in the ring
N then acquires “far” connections
[Figure: new node N in the ring between left neighbor L and right neighbor R]
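
The effect of addressing a CTM request to N's own address is that it lands on the two nodes nearest to that address. A minimal sketch of that lookup, assuming the ring is held as a sorted list of integer addresses and that "left" means the next lower address (both are conventions chosen here, not taken from the slides):

```python
import bisect

def ring_neighbors(ring, new_addr):
    """Return (left, right) neighbors of new_addr in a sorted list of ring addresses."""
    i = bisect.bisect_left(ring, new_addr)
    left = ring[i - 1]           # wraps to the last node when i == 0
    right = ring[i % len(ring)]  # wraps to the first node when i == len(ring)
    return left, right

def join(ring, new_addr):
    """Find the new node's neighbors, then insert it at its correct ring position."""
    left, right = ring_neighbors(ring, new_addr)
    bisect.insort(ring, new_addr)
    return left, right
```

For a ring `[10, 40, 70]`, `join(ring, 50)` returns `(40, 70)` and leaves the ring as `[10, 40, 50, 70]`.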

Adaptive Shortcut Creation
High latencies were observed in experiments due to multi-hop overlay routing
Shortcut creation
  – Count IPOP packets sent to other nodes
  – When the number of packets to a node within an interval exceeds a threshold, initiate connection setup
  – Because maintaining connections incurs overhead, drop connections that are no longer in use
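
The counting policy can be sketched as below. The `ShortcutManager` class is hypothetical, and "no longer in use" is interpreted here as "no packets during the last interval", which is an assumption; the slides give no concrete threshold or interval.

```python
from collections import Counter

class ShortcutManager:
    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()  # packets sent to each peer in the current interval
        self.shortcuts = set()   # peers we hold a direct connection to

    def on_packet(self, peer):
        """Record one IPOP packet sent to peer."""
        self.counts[peer] += 1

    def end_interval(self):
        """Call once per interval; returns (created, dropped) sets of peers."""
        busy = {p for p, c in self.counts.items() if c > self.threshold}
        created = busy - self.shortcuts
        # Assumed drop rule: a shortcut that carried no packets this interval is idle
        dropped = {p for p in self.shortcuts if self.counts[p] == 0}
        self.shortcuts = (self.shortcuts | created) - dropped
        self.counts.clear()
        return created, dropped
```

A caller would start connection setup for every peer in `created` and close the connections listed in `dropped`.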

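NAT
[Figure: host a in a private network reaches host b in the global network through a NAT. The NAT table maps port 5000 on the private side to port 6000 on the global side: an outgoing packet with source port 5000 and destination port 80 leaves the NAT with source port 6000, and the reply addressed to port 6000 is rewritten back to port 5000]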
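
The translation in the figure can be sketched as a small port-mapping table. The `Nat` class is illustrative, and the IP addresses used in the usage example are placeholders from the documentation ranges, since the slide's own addresses are not available; only the port numbers 5000, 6000 and 80 come from the slide.

```python
class Nat:
    def __init__(self, public_ip, first_port=6000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, src, dst):
        """Rewrite the source of a packet leaving the private network."""
        if src not in self.out:
            self.out[src] = self.next_port
            self.back[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of an arriving packet, or return None to drop it."""
        ip, port = dst
        if ip != self.public_ip or port not in self.back:
            return None
        return src, self.back[port]
```

With `nat = Nat("192.0.2.1")`, a packet from `("10.0.0.5", 5000)` to `("198.51.100.7", 80)` leaves with source `("192.0.2.1", 6000)`. A packet arriving for a port with no table entry is dropped, which is why a host behind a NAT cannot be contacted first from outside.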

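NAT Traversal: UDP Hole Punching
[Figure: host A (IP A) behind a NAT (IP N) and host B (IP B) behind a NAT (IP M). NAT tables: A:a ⇔ N:n and M:m ⇔ B:b. A's packet (Src A:a, Dst M:m) leaves its NAT as Src N:n, Dst M:m; B's packet (Src B:b, Dst N:n) leaves its NAT as Src M:m, Dst N:n and is delivered to A as Src M:m, Dst A:a]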
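
The exchange in the figure can be sketched with two simplified NATs. The sketch assumes each NAT admits an inbound packet only from an endpoint that its inside host has already sent to, and that each host already knows the other's public endpoint (for example from the URIs exchanged during connection setup). The `HolePunchNat` class is hypothetical; the symbolic addresses A:a, N:n, M:m and B:b follow the figure.

```python
class HolePunchNat:
    """NAT with one fixed mapping that filters inbound packets by remote endpoint."""

    def __init__(self, public, private):
        self.public = public    # endpoint seen from outside, e.g. ("N", "n")
        self.private = private  # endpoint of the inside host, e.g. ("A", "a")
        self.allowed = set()    # remote endpoints the inside host has sent to

    def outbound(self, dst):
        """Inside host sends to dst; this opens a hole for replies from dst."""
        self.allowed.add(dst)
        return self.public, dst  # (src, dst) as seen on the global network

    def inbound(self, src, dst):
        """Deliver to the inside host if a hole for src exists, else drop (None)."""
        if dst != self.public or src not in self.allowed:
            return None
        return src, self.private

nat_n = HolePunchNat(public=("N", "n"), private=("A", "a"))
nat_m = HolePunchNat(public=("M", "m"), private=("B", "b"))

first = nat_m.inbound(*nat_n.outbound(("M", "m")))   # A -> B: dropped, but opens N's hole
second = nat_n.inbound(*nat_m.outbound(("N", "n")))  # B -> A: passes through N's hole
third = nat_m.inbound(*nat_n.outbound(("M", "m")))   # A -> B: passes through M's hole
```

The first packet is lost at B's NAT but is still useful, because sending it creates the table entry that lets B's packet in; from then on traffic flows in both directions.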

Experimental Setup
Hosts: 2.4GHz Xeon, Linux , VMware GSX
Host: 1.3GHz P-III Linux VMPlayer
Host: 1.7GHz P4, Win XP SP2, VMPlayer
Hosts: 2.0 GHz Xeon, Linux , VMware GSX
34 compute nodes, 118 P2P router nodes on PlanetLab

Experiment 1: Joining and Shortcut Connections
Node A: IPOP node
Node B: newly joining node
  – A and B are in different network domains with NATs
  – B sends ICMP packets to A at 1-second intervals
Within period 1 (about 3 seconds), B establishes a route to other nodes
Within period 2 (about 28 seconds), B establishes a shortcut connection to A

Experiment 2: PVM Parallel Application FastDNAml (1)
Parallelized with PVM using a master-workers model
FastDNAml has a high computation-to-communication ratio
Dynamic task assignment tolerates performance heterogeneity among computing nodes
[Figure: a master, a task pool, and workers]

Experiment 2: PVM Parallel Application FastDNAml (2)
Execution with shortcuts enabled is 24% faster than with shortcuts disabled
The parallel speedup is 13.6x
  – 23x was reported in previous work on a homogeneous cluster
[Table: execution time (sec) and parallel speedup for sequential execution (node #2; speedup n/a) and for parallel execution (30 nodes) with shortcuts disabled and with shortcuts enabled]

Summary
Introduced WOW
  – Scalable, fault-resilient, low-management infrastructure
Future work
  – Research on easy-to-use middleware for heterogeneous, adaptive Grid environments