Presentation transcript:

An Introduction to GridMPI
Yutaka Ishikawa (1,2) and Motohiko Matsuda (2)
(1) University of Tokyo, (2) Grid Technology Research Center, AIST (National Institute of Advanced Industrial Science and Technology)
This work is partially supported by the NAREGI project.
2006/1/23

Motivation
MPI, the Message Passing Interface, has been widely used to program parallel applications.
Users want to run such applications over a Grid environment without modifying the program.
However, the performance of existing MPI implementations does not scale in a Grid environment.
[Figure: a single (monolithic) MPI application spanning computing resource sites A and B over a wide-area network]

Motivation
Focus on a metropolitan-area, high-bandwidth environment: 10 Gbps, up to about 500 miles (less than 10 ms one-way latency).
– We have already demonstrated, using an emulated WAN environment, that the NAS Parallel Benchmark programs scale if the one-way latency is below 10 ms.
[Figure: a single (monolithic) MPI application spanning computing resource sites A and B over a wide-area network]
Motohiko Matsuda, Yutaka Ishikawa, and Tomohiro Kudoh, "Evaluation of MPI Implementations on Grid-connected Clusters using an Emulated WAN Environment," CCGRID 2003.
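The 500 miles / 10 ms figures can be sanity-checked with a back-of-the-envelope calculation. The short C program below is an illustration added here, not part of the slides: it computes the fiber propagation delay over roughly 500 miles and the bandwidth-delay product of a 10 Gbps link at a 10 ms one-way delay, i.e. the amount of data TCP must keep in flight to fill the pipe.

```c
/* Back-of-the-envelope check (hedged, not from the slides) of why
 * "500 miles" and "10 ms one-way latency" go together, and what that
 * means for TCP on a 10 Gbps link. */
#include <stdio.h>

int main(void)
{
    double distance_km = 500 * 1.609;   /* ~500 miles */
    double fiber_speed = 2.0e5;         /* km/s, ~2/3 of c in fiber */
    double one_way_s   = distance_km / fiber_speed;

    double link_bps    = 10e9;          /* 10 Gbps */
    double rtt_s       = 2.0 * 0.010;   /* 10 ms one-way -> 20 ms RTT */
    double bdp_bytes   = link_bps * rtt_s / 8.0;

    printf("propagation delay, one-way: %.1f ms\n", one_way_s * 1e3);
    printf("bandwidth-delay product:    %.1f MB\n", bdp_bytes / 1e6);
    return 0;
}
```

Propagation alone accounts for about 4 ms one-way over 500 miles; with switching and routing overhead, the slide's 10 ms bound is a reasonable envelope, and at 10 Gbps it implies on the order of 25 MB in flight.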

Issues
High-performance communication facilities for MPI on long fat networks:
– TCP vs. MPI communication patterns
– Network topology: latency and bandwidth
Interoperability
– Most MPI library implementations use their own network protocol.
Fault tolerance and migration
– To survive a site failure.
Security
TCP: designed for streams.
MPI: bursty traffic; repeats computation and communication phases; traffic changes with the communication pattern.
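To make the TCP/MPI contrast concrete, the following C/MPI sketch is an added illustration, not code from GridMPI: an iterative code alternates computation phases, during which the network is idle, with communication phases in which every rank transmits at once, so the wire sees bursts rather than the steady stream TCP is tuned for.

```c
/* Minimal sketch of the traffic pattern described above: computation
 * phases with no traffic alternate with communication phases in which
 * all ranks transmit simultaneously (bursty traffic). */
#include <mpi.h>
#include <stdlib.h>

#define N (1 << 20)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double *local = malloc(N * sizeof(double));
    double *sums  = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) local[i] = 1.0;

    for (int iter = 0; iter < 100; iter++) {
        /* computation phase: no packets on the wire */
        for (int i = 0; i < N; i++)
            local[i] = local[i] * 0.5 + 1.0;

        /* communication phase: every rank sends at once,
         * producing the burst traffic noted in the slide */
        MPI_Allreduce(local, sums, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }

    free(local);
    free(sums);
    MPI_Finalize();
    return 0;
}
```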

Issues
Same issues as the previous slide, with emphasis on interoperability:
– There are many MPI library implementations, and most use their own network protocol.
[Figure: four clusters, each using a different vendor's MPI library (Vendors A, B, C, and D), connected over the Internet]

GridMPI Features
MPI-2 implementation.
IMPI (Interoperable MPI) protocol, with extensions for the Grid:
– MPI-2
– New collective protocols
– Checkpointing
Integration of vendor MPIs:
– IBM, Solaris, Fujitsu, and MPICH2
High-performance TCP/IP implementation on long fat networks:
– Pacing the transmission rate so that bursty transmission is controlled according to the MPI communication pattern (a sketch of the idea follows below).
[Figure: cluster X running a vendor MPI and cluster Y running YAMPII, connected via IMPI, with checkpointing support]
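A minimal sketch of the pacing idea, assuming a simple rate/gap model rather than GridMPI's actual mechanism: to hold the sending rate at a target value, the sender spaces frames by frame_bits / target_rate instead of emitting a whole window as one burst.

```c
/* Hedged sketch of rate pacing (not GridMPI's implementation):
 * compute the inter-frame gap that keeps the average transmission
 * rate at a target value. */
#include <stdio.h>
#include <stdint.h>

/* Gap in nanoseconds between frames of frame_bytes bytes so that the
 * average rate equals target_bps. */
static uint64_t pacing_gap_ns(uint64_t frame_bytes, uint64_t target_bps)
{
    uint64_t frame_bits = frame_bytes * 8;
    return (frame_bits * 1000000000ULL) / target_bps;
}

int main(void)
{
    /* Example: 1500-byte Ethernet frames paced to 500 Mbps
     * (half of a 1 Gbps link) need 24 us between frames. */
    uint64_t gap = pacing_gap_ns(1500, 500000000ULL);
    printf("inter-frame spacing: %llu ns\n", (unsigned long long)gap);
    return 0;
}
```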

Evaluation
It is almost impossible to reproduce the execution behavior of communication performance in a wide-area network.
A WAN emulator, GtrcNET-1, is used to scientifically examine implementations, protocols, communication algorithms, etc.
GtrcNET-1, developed at AIST, provides:
– injection of delay, jitter, and errors
– traffic monitoring and frame capture
– four 1000Base-SX ports and one USB port for a host PC
– an FPGA (XC2V6000)

Experimental Environment
Two clusters of 8 PCs each:
– CPU: Pentium 4 / 2.4 GHz
– Memory: DDR
– NIC: Intel PRO/1000 (82547EI)
– OS: Linux (Fedora Core 2)
– Socket buffer size: 20 MB (see the sketch below)
WAN emulator: GtrcNET-1.
[Figure: nodes 0-7 behind one Catalyst 3750 switch and nodes 8-15 behind another, connected through GtrcNET-1; bandwidth 1 Gbps, emulated delay 0 ms to 10 ms]
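The 20 MB socket buffer matters because, at 1 Gbps and up to a 20 ms round-trip time, roughly 2.5 MB of data must be in flight to keep the pipe full. The sketch below shows how such a buffer would typically be requested on Linux with setsockopt(); it is a generic illustration, not GridMPI's code, and the kernel may cap the value by net.core.wmem_max / net.core.rmem_max.

```c
/* Hedged sketch: requesting a 20 MB socket buffer, matching the
 * configuration listed above. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int bufsize = 20 * 1024 * 1024;   /* 20 MB */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("setsockopt(SO_SNDBUF)");
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
        perror("setsockopt(SO_RCVBUF)");

    /* Check what the kernel actually granted. */
    int granted = 0;
    socklen_t len = sizeof(granted);
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len);
    printf("requested %d bytes, kernel reports %d bytes\n", bufsize, granted);

    close(fd);
    return 0;
}
```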

GridMPI vs. MPICH-G2 (1/4)
FT (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes.
[Graph: relative performance vs. one-way delay (msec)]

GridMPI vs. MPICH-G2 (2/4)
IS (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes.
[Graph: relative performance vs. one-way delay (msec)]

GridMPI vs. MPICH-G2 (3/4)
LU (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes.
[Graph: relative performance vs. one-way delay (msec)]

GridMPI vs. MPICH-G2 (4/4)
NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes.
[Graph: relative performance vs. one-way delay (msec)]
Note: no parameters were tuned in GridMPI.

GridMPI on an Actual Network
NAS Parallel Benchmarks run on an 8-node (2.4 GHz) cluster at Tsukuba and an 8-node (2.8 GHz) cluster at Akihabara:
– 16 nodes in total
Performance is compared with:
– the result using 16 nodes at 2.4 GHz
– the result using 16 nodes at 2.8 GHz
[Figure: Pentium 4 2.4 GHz x 8 at Tsukuba and Pentium 4 2.8 GHz x 8 at Akihabara, each connected internally by 1 Gbps, linked over the JGN2 network (10 Gbps bandwidth, 1.5 msec RTT, 60 km / 40 mi. apart)]
[Graph: relative performance per benchmark]

Demonstration
Easy installation:
– Download the source.
– Build it and set up the configuration files.
Easy use:
– Compile your MPI application (a minimal example follows below).
– Run it!
[Figure: the same Tsukuba/Akihabara configuration over the JGN2 network (10 Gbps bandwidth, 1.5 msec RTT, 60 km / 40 mi.)]
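As a stand-in for the demo application (the slides do not show its source), a minimal MPI program like the one below is enough to see the "compile it, run it" workflow: each rank reports the host it runs on, which makes it obvious when processes are spread across the Tsukuba and Akihabara clusters. It would typically be built with an MPI compiler wrapper such as mpicc and launched with mpirun.

```c
/* Minimal MPI program: each rank reports where it is running. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    printf("rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}
```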

NAREGI Software Stack (Beta Ver. 2006)
(Globus, Condor, UNICORE → OGSA / WSRF)
[Diagram: NAREGI software stack comprising Grid-Enabled Nano-Applications, Grid PSE, Grid Programming (Grid RPC, Grid MPI), Grid Visualization, Grid Workflow, Super Scheduler, Grid VM, Distributed Information Service, Data, and High-Performance & Secure Grid Networking]

GridMPI Current Status
GridMPI version 0.9 has been released:
– MPI-1.2 features are fully supported.
– MPI-2.0 features are supported except for MPI-IO and the one-sided communication primitives.
– Conformance tests: MPICH Test Suite, 0/142 (fails/tests); Intel Test Suite, 0/493 (fails/tests).
GridMPI version 1.0 will be released this spring:
– MPI-2.0 fully supported.

Concluding Remarks
GridMPI is integrated into the NAREGI package.
GridMPI is not only for production use but is also our research vehicle for Grid environments, in the sense that new ideas for the Grid are implemented and tested in it.
We are currently studying high-performance communication mechanisms for long fat networks:
– Modifications of TCP behavior: M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and Y. Ishikawa, "TCP Adaptation for MPI on Long-and-Fat Networks," IEEE Cluster 2005.
– Precise software pacing: R. Takano, T. Kudoh, Y. Kodama, M. Matsuda, H. Tezuka, and Y. Ishikawa, "Design and Evaluation of Precise Software Pacing Mechanisms for Fast Long-Distance Networks," PFLDnet 2005.
– Collective communication algorithms with respect to network latency and bandwidth (a cost-model sketch follows below).
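As an illustration of the last bullet (not GridMPI's actual algorithm selection code), the sketch below uses the standard alpha-beta cost model to choose between a latency-bound binomial-tree broadcast and a bandwidth-bound scatter-plus-allgather broadcast; on a WAN link the one-way latency term alpha is orders of magnitude larger than inside a cluster, which shifts the crossover point.

```c
/* Hedged sketch of latency/bandwidth-aware collective selection.
 * alpha: one-way latency in seconds, beta: per-byte transfer time,
 * P: number of processes, n: message size in bytes.
 *
 *   binomial-tree bcast:       ceil(log2 P) * (alpha + n*beta)
 *   scatter+allgather bcast:  ~2*log2 P * alpha + 2*(P-1)/P * n*beta */
#include <math.h>
#include <stdio.h>

typedef enum { BCAST_BINOMIAL, BCAST_SCATTER_ALLGATHER } bcast_algo_t;

static bcast_algo_t choose_bcast(int P, double n, double alpha, double beta)
{
    double logP = ceil(log2((double)P));
    double t_binomial = logP * (alpha + n * beta);
    double t_scag     = 2.0 * logP * alpha
                      + 2.0 * ((double)(P - 1) / P) * n * beta;
    return (t_binomial <= t_scag) ? BCAST_BINOMIAL : BCAST_SCATTER_ALLGATHER;
}

int main(void)
{
    /* Example: 64 processes, 1 MB message, 5 ms one-way latency, 1 Gbps. */
    double alpha = 5e-3;        /* seconds */
    double beta  = 8.0 / 1e9;   /* seconds per byte at 1 Gbps */
    bcast_algo_t a = choose_bcast(64, 1 << 20, alpha, beta);
    printf("chosen: %s\n",
           a == BCAST_BINOMIAL ? "binomial" : "scatter+allgather");
    return 0;
}
```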

BACKUP

GridMPI Version 1.0
– YAMPII, developed at the University of Tokyo, is used as the core implementation.
– Intra-cluster communication is handled by YAMPII (TCP/IP, SCore).
– Inter-cluster communication is handled by IMPI (TCP/IP).
[Diagram: layered architecture with the MPI API on top; a Request Interface and Request Layer; IMPI and the LACT Layer (collectives); a P2P Interface over TCP/IP, PMv2, MX, O2G, and vendor MPI; and an RPIM Interface over ssh, rsh, SCore, Globus, and vendor MPI]
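The value of a P2P Interface layer like the one in the diagram is that the request layer is written once against an abstract transport, and each transport (TCP/IP, PMv2, MX, O2G, vendor MPI) supplies its own implementation. The following C sketch is purely hypothetical: the struct and function names are invented here, not taken from GridMPI/YAMPII headers, and a toy loopback transport stands in for a real one so the example is self-contained.

```c
/* Hypothetical sketch of a pluggable point-to-point transport interface. */
#include <stdio.h>
#include <string.h>
#include <stddef.h>

typedef struct p2p_ops {
    const char *name;                                    /* e.g. "tcp", "mx" */
    int (*send)(int dest_rank, const void *buf, size_t len);
    int (*recv)(int src_rank, void *buf, size_t len);
} p2p_ops_t;

/* Toy "loopback" transport standing in for a real TCP/IP or MX module. */
static char mailbox[256];
static size_t mailbox_len;

static int lo_send(int dest, const void *buf, size_t len)
{
    (void)dest;
    if (len > sizeof(mailbox)) return -1;
    memcpy(mailbox, buf, len);
    mailbox_len = len;
    return 0;
}

static int lo_recv(int src, void *buf, size_t len)
{
    (void)src;
    if (len > mailbox_len) len = mailbox_len;
    memcpy(buf, mailbox, len);
    return (int)len;
}

static const p2p_ops_t loopback_ops = { "loopback", lo_send, lo_recv };

/* The request layer only ever sees the p2p_ops_t table. */
static const p2p_ops_t *transport = &loopback_ops;

int main(void)
{
    char msg[] = "hello via the p2p interface";
    char out[256] = {0};
    transport->send(0, msg, sizeof(msg));
    transport->recv(0, out, sizeof(out));
    printf("[%s] received: %s\n", transport->name, out);
    return 0;
}
```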

GridMPI vs. Others (1/2)
NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes.
[Graph: relative performance vs. one-way delay (msec)]

GridMPI vs. Others (1/2)
NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes.
[Graph: relative performance]

GridMPI vs. Others (2/2)
NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes.
[Graph: relative performance]

GridMPI vs. Others
NAS Parallel Benchmarks x 16.
[Graph: relative performance]

GridMPI vs. Others
NAS Parallel Benchmarks 3.2.
[Graph: relative performance]