Parallel Computing on Wide-Area Clusters: the Albatross Project. Aske Plaat, Thilo Kielmann, Jason Maassen, Rob van Nieuwpoort, Ronald Veldema. Vrije Universiteit.


Parallel Computing on Wide-Area Clusters: the Albatross Project
Aske Plaat, Thilo Kielmann, Jason Maassen, Rob van Nieuwpoort, Ronald Veldema, Henri Bal
Vrije Universiteit Amsterdam, Faculty of Sciences

2 Introduction
Cluster computing is becoming popular
-Excellent price/performance ratio
-Fast commodity networks
Next step: wide-area cluster computing
-Use multiple clusters for a single application
-A form of metacomputing
Challenges
-Software infrastructure (e.g., Legion, Globus)
-Parallel applications that can tolerate WAN latencies

3 Albatross project
Study applications and programming environments for wide-area parallel systems
Basic assumption: the wide-area system is hierarchical
-Connect clusters, not individual workstations
General approach
-Optimize applications to exploit the hierarchical structure, so that most communication is local

4 Outline
Experimental system and programming environments
Application-level optimizations
Performance analysis
Wide-area optimized programming environments

5 Distributed ASCI Supercomputer (DAS)
[Cluster diagram: VU (128 nodes), UvA (24), Leiden (24), Delft (24), connected by 6 Mb/s ATM]
Node configuration
-200 MHz Pentium Pro
-MB memory
-2.5 GB local disks
-Myrinet LAN
-Fast Ethernet LAN
-Redhat Linux

6 Programming environments
Existing library/language + expose hierarchical structure
-Number of clusters
-Mapping of CPUs to clusters
Panda library
-Point-to-point communication
-Group communication
-Multithreading
[Layer diagram: Java, Orca, and MPI on top of the Panda library; Panda runs on LFC and TCP/IP, over Myrinet and ATM]

7 Example: Java
Remote Method Invocation (RMI)
-Simple, transparent, object-oriented, RPC-like communication primitive
Problem: RMI performance
-JDK RMI on Myrinet is a factor of 40 slower than C RPC (1228 vs. 30 µs)
Manta: high-performance Java system [PPoPP’99]
-Native (static) compilation: source → executable
-Fast RMI protocol between Manta nodes
-JDK-style protocol to interoperate with JVMs
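The RMI call pattern the slide refers to can be sketched in plain Java. This is a minimal, self-contained example of JDK-style RMI, not Manta's optimized protocol; the `Counter` interface, its implementation, and the registry port are illustrative assumptions, and the registry is created in-process so the sketch runs in a single JVM.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Illustrative remote interface; every remote method must declare RemoteException.
interface Counter extends Remote {
    int increment(int delta) throws RemoteException;
}

// Server-side implementation; extending UnicastRemoteObject exports it for remote calls.
class CounterImpl extends UnicastRemoteObject implements Counter {
    private int value = 0;
    CounterImpl() throws RemoteException { super(); }
    public synchronized int increment(int delta) throws RemoteException {
        value += delta;
        return value;
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // Create a registry in-process so the sketch runs in one JVM.
        Registry registry = LocateRegistry.createRegistry(1099);
        CounterImpl impl = new CounterImpl();
        registry.rebind("counter", impl);

        // Client side: look up the stub and invoke it like a local object.
        Counter stub = (Counter) registry.lookup("counter");
        System.out.println(stub.increment(5));   // prints 5

        // Unexport so the JVM can exit (RMI threads are non-daemon).
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

The client invokes `increment` through a stub exactly as it would a local method; the synchronous round trip behind that call is what the JDK/Manta latency comparison above measures.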

8 JDK versus Manta
[Latency comparison chart: 200 MHz Pentium Pro, Myrinet, JDK interpreter, 1 object as parameter]

9 Manta on wide-area DAS
2 orders of magnitude between intra-cluster (LAN) and inter-cluster (WAN) communication performance
Application-level optimizations [JavaGrande’99]
-Minimize WAN overhead

10 Example: SOR
Red/black Successive Overrelaxation
-Neighbor communication, using RMI
Problem: nodes at cluster boundaries
-Overlap wide-area communication with computation
-RMI is synchronous → use multithreading
[Diagram: CPUs in cluster 1 and cluster 2 exchanging boundary rows; LAN vs. WAN latencies in µs]
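The latency-hiding idea on this slide can be sketched with a plain Java thread: because RMI is synchronous, the boundary exchange runs on its own thread while the CPU updates interior rows that do not depend on it. The grid size, the `exchangeBoundary` stand-in (a local delay instead of a real RMI to a neighbouring cluster), and all names here are hypothetical.

```java
public class OverlapSketch {
    static double[][] grid = new double[8][8];

    // Stand-in for the wide-area RMI: after a simulated WAN delay,
    // it fills the ghost row (row 0) received from the neighbour cluster.
    static void exchangeBoundary() {
        try { Thread.sleep(10); } catch (InterruptedException e) { }
        for (int j = 0; j < grid[0].length; j++) grid[0][j] = 1.0;
    }

    // One relaxation sweep over the interior of row i.
    static void updateRow(int i) {
        for (int j = 1; j < grid[i].length - 1; j++) {
            grid[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j]
                               + grid[i][j-1] + grid[i][j+1]);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Start the (synchronous) boundary exchange on a separate thread...
        Thread comm = new Thread(OverlapSketch::exchangeBoundary);
        comm.start();
        // ...and meanwhile update the interior rows, which need no ghost row.
        for (int i = 2; i < grid.length - 1; i++) updateRow(i);
        // Wait for the exchange, then update the row next to the boundary.
        comm.join();
        updateRow(1);
        System.out.println(grid[1][1]);   // prints 0.25
    }
}
```

Only the row adjacent to the cluster boundary has to wait for the WAN round trip; all other work proceeds in parallel with it, which is exactly the overlap the slide describes.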

11 Wide-area optimizations

12 Performance Java applications
Wide-area DAS system: 4 clusters of 10 CPUs
Sensitivity to wide-area latency and bandwidth
-See HPCA’99

13 Discussion
Optimized applications obtain good speedups
-Reduce wide-area communication, or hide its latency
Java RMI is easy to use, but some optimizations are awkward to express
-Lack of asynchronous communication and broadcast
The RMI model does not help in exploiting the hierarchical structure of wide-area systems
Need a wide-area optimized programming environment

14 MagPIe: wide-area collective communication
Collective communication among many processors
-e.g., multicast, all-to-all, scatter, gather, reduction
MagPIe: MPI’s collective operations optimized for hierarchical wide-area systems [PPoPP’99]
Transparent to the application programmer

15 Spanning-tree broadcast
[Diagram: broadcast trees spanning clusters 1-4]
MPICH (WAN-unaware)
-Wide-area latency is chained
-Data is sent multiple times over the same WAN link
MagPIe (WAN-optimized)
-Each sender-receiver path contains at most 1 WAN link
-No data item travels multiple times to the same cluster
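A rough way to see the difference is to count WAN messages in a small simulation. The sketch below compares a WAN-unaware binomial spanning tree (used here as a stand-in; MPICH's actual tree shape may differ) with the two-level scheme, where the root sends the data once to a coordinator in each remote cluster, which then re-broadcasts over its fast local network. The configuration matches the 4 clusters of 10 CPUs used in the experiments.

```java
// Counts WAN messages for a flat binomial-tree broadcast versus a
// two-level, cluster-aware broadcast. All names are illustrative.
public class BroadcastSketch {
    static int cluster(int rank, int clusterSize) { return rank / clusterSize; }

    // WAN-unaware binomial tree rooted at rank 0: the parent of rank r is r
    // with its lowest set bit cleared. A message counts as a WAN message
    // when parent and child sit in different clusters.
    static int flatWanMessages(int p, int clusterSize) {
        int wan = 0;
        for (int r = 1; r < p; r++) {
            int parent = r & (r - 1);   // clear lowest set bit
            if (cluster(parent, clusterSize) != cluster(r, clusterSize)) wan++;
        }
        return wan;
    }

    // WAN-optimized: exactly one WAN message per remote cluster.
    static int hierarchicalWanMessages(int p, int clusterSize) {
        return p / clusterSize - 1;
    }

    public static void main(String[] args) {
        int p = 40, clusterSize = 10;   // 4 clusters of 10 CPUs
        System.out.println(flatWanMessages(p, clusterSize));
        System.out.println(hierarchicalWanMessages(p, clusterSize));   // prints 3
    }
}
```

In this model the flat tree crosses the WAN more often than the 3 messages the hierarchical scheme needs, and, unlike the hierarchical scheme, it can route the same data into a cluster more than once, which is the redundancy the slide points out.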

16 MagPIe results
MagPIe collective operations are wide-area optimal, except non-associative reduction
Operations are up to 10 times faster than MPICH
A factor 2-3 speedup improvement over MPICH for some (unmodified) MPI applications

17 Conclusions
Wide-area parallel programming is feasible for many applications
Exploit the hierarchical structure of wide-area systems to minimize WAN overhead
Programming systems should take the hierarchical structure of wide-area systems into account