
1 Deep Computing Messaging Framework | Lightweight Communication for Petascale Supercomputing
   Supercomputing 2008
   Michael Blocksome, blocksom@us.ibm.com
   © 2008 IBM Corporation

2 DCMF Open Source Community
   – Open source community established January 2008
   – Wiki: http://dcmf.anl-external.org/wiki
   – Mailing list: dcmf@lists.anl-external.org
   – Git source repository (helpful git resources on the wiki):
       git clone http://dcmf.anl-external.org/dcmf.git/

3 Design Goals
   – Scalable to millions of tasks
   – Efficient on low-frequency embedded cores
     – inlined systems programming interface (SPI)
   – Supports many programming paradigms
     – active messages
     – multiple contexts
     – multiple levels of application interfaces
   – Structured component design
     – extensible to new architectures
     – software architecture for multiple networks
     – open source runtime with external contributions
   – Separate library for optimized collectives
     – hardware acceleration
     – software collectives

4 IBM Blue Gene/P Messaging Software Stack
   [Stack diagram] From top to bottom: applications (including QCD applications and direct DCMF applications); a library portability layer (MPICH2 via the dcmfd ADI, Charm++, Berkeley UPC over GASNet, Global Arrays over ARMCI); the DCMF public API and CCMI; DCMF itself (C++); the DMA SPI (Systems Programming Interface); and the BG/P network hardware. The legend distinguishes IBM-supported from externally supported software.

5 Direct DCMF Application Programming
   – dcmf.h: core interface
     – point-to-point and utilities
     – all functions implemented
   – collectives interface(s)
     – may or may not be implemented
     – check the return value on register! (see the sketch after this slide)
   – Collective Component Messaging Interface (CCMI)
     – high-level collectives library
     – uses the multisend interface
     – extensible to new collectives
   [Diagram] The application sits on dcmf.h (all point-to-point), dcmf_globalcollectives.h (global collectives), dcmf_multisend.h (multisend), and dcmf_collectives.h (high-level collectives via CCMI); underneath, the DCMF messager's protocols and devices, plus a sysdep layer, drive the BG/P hardware.
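The "check the return value on register" point is the key idiom when programming dcmf.h directly. Below is a minimal sketch of that pattern; the DCMF_GlobalBarrier_* names, configuration fields, and constants are paraphrased from the dcmf.h / dcmf_globalcollectives.h headers and should be treated as assumptions rather than the verbatim API.

```c
/* Minimal sketch: register an optional collective and check the result.
 * Type, field, and constant names are paraphrased assumptions; consult
 * dcmf.h and dcmf_globalcollectives.h in the repository for the real API. */
#include <stdio.h>
#include "dcmf.h"
#include "dcmf_globalcollectives.h"

int main(void)
{
    DCMF_Messager_initialize();

    DCMF_GlobalBarrier_Configuration_t config;
    config.protocol = DCMF_DEFAULT_GLOBALBARRIER_PROTOCOL;  /* assumed constant */

    DCMF_Protocol_t barrier;
    DCMF_Result rc = DCMF_GlobalBarrier_register(&barrier, &config);
    if (rc != DCMF_SUCCESS) {
        /* Collectives are optional: a port that only implements the core
         * point-to-point API may return DCMF_UNIMPL here.  Fall back to a
         * point-to-point barrier instead of assuming success. */
        fprintf(stderr, "global barrier unavailable (rc=%d)\n", rc);
    }

    DCMF_Messager_finalize();
    return 0;
}
```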

6 DCMF Blue Gene/P Performance

   Point-to-point:
     Protocol                   Latency (µs)
     DCMF Eager one-way         1.6
     MPI Eager one-way          2.4
     MPI Rendezvous one-way     5.6
     DCMF Put                   0.9
     DCMF Get                   1.6
     ARMCI blocking put         2.0
     ARMCI blocking get         3.3

   – MPI achieves 4300 MB/sec (96% of peak) for torus near-neighbor communication on 6 links

   Collectives on 512 nodes (SMP):
     Collective operation       Performance
     MPI Barrier                1.3 µs
     MPI Allreduce (int sum)    4.3 µs
     MPI Broadcast              4.3 µs
     MPI Allreduce throughput   817 MB/sec
     MPI Bcast throughput       2.0 GB/sec

   – Barriers accelerated via the global interrupt network
   – Allreduce and broadcast operations accelerated via the collective network
   – Large broadcasts take advantage of the 6 edge-disjoint routes on a 3D torus

7 Why use DCMF?
   – Scales on BG/P to millions of tasks: high efficiency, low overhead
   – Open source, with active community support
   – Applications and libraries port easily to the DCMF interface
   – Unique features of DCMF (see the next chart)

8 Feature Comparison (to the best of our knowledge)

     Feature                                MX   VERBS      LAPI  ELAN      DCMF
     Multiple contexts                      N    Y          Y     Y         Y
     Active messages                        N    N¹         Y     Y         Y
     One-sided calls                        N    Y          Y     Y         Y
     Strided or vector calls                N¹   N¹         Y     Y         N²
     Multisend calls                        N¹   N¹         N¹    N¹        Y
     Message ordering and consistency       N    N          N     N         Y
     Device interface for many networks     N    Y (C API)  N     N         Y³ (C++)
     Topology awareness                     N    N          N     N         Y
     Architecture neutral                   N    Y          Y     N         Y
     Non-blocking optimized collectives     N¹   N¹         N¹    Blocking  Y

   ¹ Can be implemented in software on top of the features this API provides, possibly at lower efficiency
   ² Non-contiguous transfer operation to be added
   ³ Device-level programming is available at the protocol level, not the API

9 DCMF C API Features
   – Multiple context registration: supports multiple, concurrent communication paradigms
   – Memory consistency: one-sided communication APIs like UPC and ARMCI need optimized support for memory consistency levels
   – Active messaging: a good match for Charm++ and other active-message runtimes; MPI can be easily supported (sketched below)
   – Multisend protocols: amortize startup cost across many messages sent together
   – Topology awareness
   – Optimized protocols
   – See dcmf.h
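To give a feel for the active-message style, here is a send-side sketch. The prototypes are paraphrased (the callback shapes, configuration fields, and consistency constant are all assumptions); dcmf.h is the authority on the real signatures.

```c
/* Active-message sketch: register once, then send; the receiver's callback
 * fires on arrival with no matching receive posted.  All prototypes here
 * are paraphrased assumptions -- see dcmf.h for the real signatures. */
#include <string.h>
#include "dcmf.h"

static DCMF_Protocol_t send_protocol;
static DCMF_Request_t  request;

/* Assumed shape of the short-message receive callback. */
static void recv_short(void *clientdata, const DCQuad *msginfo, unsigned count,
                       size_t peer, const char *src, size_t bytes)
{
    /* consume the payload in place */
}

static void send_done(void *clientdata) { *(int *)clientdata = 1; }

void am_send(size_t peer, char *buf, size_t bytes)
{
    DCMF_Send_Configuration_t config;
    memset(&config, 0, sizeof(config));
    config.protocol      = DCMF_DEFAULT_SEND_PROTOCOL;  /* assumed constant */
    config.cb_recv_short = recv_short;
    DCMF_Send_register(&send_protocol, &config);

    int done = 0;
    DCMF_Callback_t cb = { send_done, &done };
    DCQuad info;  /* 16-byte application header delivered with the message */
    DCMF_Send(&send_protocol, &request, cb, DCMF_SEQUENTIAL_CONSISTENCY,
              peer, bytes, buf, &info, 1);
    while (!done)
        DCMF_Messager_advance();  /* poll until the completion callback runs */
}
```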

10 Extending DCMF to Other Architectures
   – Copy the "Linux sockets" messager and build options
     – contains the sockets device and a DCMF_Send() protocol
     – implements the core API; returns DCMF_UNIMPL for collectives
   – A new architecture only needs to implement DCMF_Send()
     – the sockets device enables DCMF on Linux clusters
     – the shmem device enables DCMF on multi-core systems
   – DCMF provides default point-to-point implementations layered over send:
     – DCMF_Put()
     – DCMF_Get()
     – DCMF_Control()
   – Selectively implement architecture devices and optimized protocols
     – assign to DCMF_USER0_SEND_PROTOCOL (for example) to test

11 Upcoming Features (nothing promised)
   – Common Device Interface (CDI)
     – POSIX shared memory
     – sockets
     – InfiniBand
   – Multi-channel advance
     – a thread may advance a "slice" of the messaging devices
     – dedicated threads result in uncontested locks for high-level communication libraries
   – A blocking advance API (see the sketch after this list)
     – eliminates explicit processor polls on supported hardware
     – may degrade to a regular DCMF_Messager_advance() on unsupported hardware
   – Extend the API to access Blue Gene features in a portable manner
     – network and device structures
     – replace the hardware struct with key-value pairs
   – Noncontiguous point-to-point one-sided
     – an iterator can be used to implement all the other interfaces (strided, vector, etc.)
   – One-sided "on the fly" (ad hoc) collectives
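For the blocking-advance item: today completion is driven by explicit polling, as in the sketch below. DCMF_Messager_advance() is the call named on the slides; the blocking variant is only proposed, so the comment describes it hypothetically.

```c
#include "dcmf.h"

/* Current pattern: poll the messager until a completion callback flips the
 * flag, occupying a core the whole time it waits. */
void wait_for(volatile int *done)
{
    while (!*done)
        DCMF_Messager_advance();

    /* A blocking advance (hypothetical, per this slide) would let this loop
     * sleep on supported hardware and degrade to the regular
     * DCMF_Messager_advance() on everything else. */
}
```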

12 DCMF Device Abstraction
   – At the core of DCMF is a "device", with a packet API abstraction and a DMA API abstraction
   – In principle the functions are virtual; in practice the methods are inlined for performance
     – Barton-Nackman C++ templates (sketched below)
   – Common Device Interface (CDI)
     – implement this interface and you get all of DCMF "for free"
     – good for rapid prototypes
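The "virtual in principle, inlined in practice" remark refers to the curiously recurring template pattern popularized by Barton and Nackman. The sketch below uses hypothetical class names (Device, SocketsDevice, advance_impl), not DCMF's actual device classes.

```cpp
// Barton-Nackman / CRTP sketch with hypothetical names: the base template
// forwards to the derived class at compile time, so advance() behaves like
// a virtual function but is resolved statically and can be fully inlined.
template <class Derived>
class Device {
public:
    int advance() { return static_cast<Derived *>(this)->advance_impl(); }
};

class SocketsDevice : public Device<SocketsDevice> {
public:
    int advance_impl() { return 0; /* drain the socket, deliver packets */ }
};

// Generic code written against Device<T> pays no vtable cost.
template <class T>
int poll(Device<T> &dev) { return dev.advance(); }

int main() {
    SocketsDevice d;
    return poll(d);
}
```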

13 Current DCMF Devices
   – Blue Gene/P
     – DMA / 3-D torus network
     – collective network
     – global interrupt network
     – lockbox / memory atomics
   – Generic (all hybrid compatible)
     – sockets
     – shared memory
     – InfiniBand

14 Other DCMF Projects
   – IBM: Roadrunner
   – Argonne National Laboratory: MPICH2, ZeptoOS
   – Pacific Northwest National Laboratory: Global Arrays / ARMCI
   – Berkeley: UPC / GASNet
   – University of Illinois at Urbana-Champaign: Charm++

15 Open Source Project Ideas, in no particular order
   – Store-and-forward protocols
   – Stream API
   – Channel combining; message striping across devices
   – Extend to other process managers (Open MPI, etc.)
   – Extend to other platforms (OS X, BSD, Windows, ?)
   – DCMF functional and performance test suite
   – Scalability improvements for sockets and InfiniBand
   – Combined shmem/sockets messager
   – GPU device? hybrid model
   – Shared memory collectives

16 How Can We Be a More Effective Open Source Project?
   – How do we improve the open source experience?
   – Specific needs and directions?
   – Missing features?

17 Additional Charts
   – DCMF on Linux Clusters
   – DCMF on InfiniBand

18 DCMF on Linux Clusters

19 DCMF on Linux Clusters
   – Build instructions on the wiki: http://dcmf.anl-external.org/wiki/index.php/Building_DCMF_for_Linux
   – Test environment for application developers
     – evaluate the DCMF API and runtime
     – port applications to DCMF before reserving time on Blue Gene/P
   – Uses MPICH2 PMI for job launch and management
     – needs a pluggable job launch and sysdep extension to remove the MPICH2 dependency
   – Implemented devices
     – sockets device
     – shmem device

20 DCMF Sockets Device
   – Standard sockets syscalls, implemented on many architectures
   – Uses the "packet" CDI
     – a new "stream" CDI may provide better performance (see the framing sketch below)
   – The current design is not scalable; it is primarily a development and porting platform
   – Can be used to initialize other devices that require synchronization
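To illustrate why a packet CDI over a stream socket leaves performance on the table, here is a generic framing sketch in plain POSIX sockets; nothing in it is DCMF code, and the packet size is hypothetical. Every fixed-size packet must be framed and copied onto the byte stream, whereas a stream CDI could hand a whole message to the kernel at once.

```c
/* Generic POSIX sketch, not DCMF code: a packet-style device over TCP must
 * frame each fixed-size packet on the byte stream itself. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define PACKET_BYTES 240  /* hypothetical fixed packet payload size */

static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;  /* error or peer closed the connection */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one "packet": a small length header plus the fixed-size payload
 * (assumes bytes <= PACKET_BYTES).  The copy and the per-packet syscall
 * are exactly the overhead a stream CDI could avoid. */
int packet_send(int fd, const void *payload, uint16_t bytes)
{
    char pkt[2 + PACKET_BYTES];
    memcpy(pkt, &bytes, sizeof(bytes));
    memcpy(pkt + 2, payload, bytes);
    return write_all(fd, pkt, sizeof(pkt));
}
```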

21 DCMF Shmem Device
   – Uses the "packet" CDI
   – Point-to-point send only
   – Thread safe: multiple threads may post messages to the device
   – No collectives

22 DCMF on InfiniBand

23 DCMF InfiniBand Motivations
   – Optimize for low-power processors and large "fat" nodes
   – InfiniBand project lead: Charles Archer
     – communicate via the dcmf mailing list

24 DCMF InfiniBand Device
   – Implements the CDI "rdma" version
     – direct RDMA
     – memregions
   – Implements the CDI "packet" version
     – "eager" style sends
   – rdma CDI design
     – SRQ: scalable, but worst latency (see the verbs sketch below)
   – packet CDI design
     – per-destination RDMA with send/recv
     – per-destination RDMA with direct DMA: best latency
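The SRQ versus per-destination trade-off comes from how receive buffers are provisioned. The libibverbs sketch below shows the shared-receive-queue side using standard verbs calls; the attribute values are illustrative and the surrounding device/protection-domain setup is elided.

```c
/* Standard libibverbs calls: one shared receive queue (SRQ) serves every
 * connection, so receive buffering stays roughly constant as the number of
 * peers grows (scalable), at some latency cost versus dedicated
 * per-destination resources. */
#include <infiniband/verbs.h>

struct ibv_srq *make_srq(struct ibv_pd *pd)
{
    struct ibv_srq_init_attr attr = {
        .attr = {
            .max_wr  = 4096, /* illustrative: receives shared by all peers */
            .max_sge = 1,
        },
    };
    return ibv_create_srq(pd, &attr);
}
```

Each queue pair then points its ibv_qp_init_attr.srq at this SRQ instead of posting receives per connection; the per-destination designs on the slide invert that trade, spending memory per peer to cut latency.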

25 DCMF InfiniBand: Future Work
   – Remove artificial limits on scalability (currently 32 nodes)
   – Implement memregion caching
   – Multiple adaptor support (?)
   – Switch management routines (?)
   – Multiple network implementations: SRQ and "per destination"
   – Async progress through IB events

