Developing a Scalable Coherent Interface (SCI) device for MPJ Express

1 Developing a Scalable Coherent Interface (SCI) device for MPJ Express
Guillermo López Taboada, 14th October, 2005
Dept. of Electronics and Systems, University of A Coruña (Spain)
Visitor at the Distributed Systems Group

2 Outline
Introduction; Design of scidev; Implementation issues; Benchmarking; Future work; Conclusions
November 24, 2018

3 Introduction
The interconnection network and its associated software libraries play a key role in High Performance Clustering Technology.
Cluster interconnection technologies: Gigabit and 10 Gigabit Ethernet, Myrinet, SCI, InfiniBand, QsNet (Quadrics), GSN - HIPPI, Giganet.
Latencies are small (usually under 10 us).
Bandwidths are high (usually above 1 Gbps).

4 Introduction
SCI (Scalable Coherent Interface):
Latency: 1.42 us (theoretical)
Bandwidth: 5333 Mbps (bi-directional)
Usually deployed without a switch (small clusters)
Topologies: 1D (ring) or 2D (torus)

5 Introduction
Figure: example of a 2D torus SCI cluster with FE (admin).

6 Introduction
Software available from Dolphinics:
Software available from Scali:
ScaIP: IP emulation
ScaSISCI: SISCI (Software Infrastructure for SCI)
ScaMPI: proprietary MPI implementation

7 Introduction
Java's portability means that, for networking, the JDK supports only the widespread TCP/IP stack.
Previously, IP emulations (ScaIP and SCIP) were used, but their performance is similar to FE.
Now there is a high-performance socket implementation, SCI SOCKETS.
Other interconnection technologies followed a similar path, e.g. Myrinet (IPoGM -> GMSockets).

8 Introduction
Several research projects have tried to add Java support for these System Area Networks, mainly on Myrinet:
KaRMI/GM (JavaParty, Univ. of Karlsruhe)
Manta/LFC/Panda/Ibis (Vrije Universiteit, the Netherlands)
Java GM Sockets
RMIX Myrinet
mpiJava/MPICH-GM or MPICH-MX
But there is nothing for SCI.

9 Introduction
My PhD project: “Designing Efficient Mechanisms for Java communications on SCI systems”
The motivation is to fill the gap between Java and this high-speed interconnect, which lacks software support for Java.
SCI Java Fast Sockets
An SCI communication device, the base of a messaging system
SCI Channel for Java NIO
Wrappers for some libraries
Optimized RMI for high-speed networks
Low-level Java buffering and communication system

10 Introduction
MPJ Express, a reference implementation of the MPI bindings for the Java language, has been released.
The MPI bindings for C, C++ and Fortran are already mature, while work on the Java binding is ongoing at DSG.
This is a good opportunity to provide SCI support to a messaging system.

11 Outline
Introduction; Design of scidev; Implementation issues; Benchmarking; Future work; Conclusions

12 Design of scidev
Use of the Java Native Interface (JNI) is unavoidable: to provide SCI support with good performance, we have to rely on specific low-level libraries.
When SCI hardware is present, it should be used.
Portability is lost in exchange for higher performance.
Differences between mpiJava and scidev:
mpiJava: a thin wrapper providing a large number of Java MPI primitives
scidev: a thicker layer providing a small API
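The "use SCI hardware when it is present" rule could be approximated by a guarded native-library load. The following is only a minimal sketch: the library name `scidev` and the idea of falling back to the pure-Java `niodev` device are illustrative assumptions, not details confirmed by the slides.

```java
// Hypothetical sketch: select the SCI device only when its JNI library loads.
// The library name "scidev" and the fallback "niodev" are assumptions for illustration.
final class DeviceSelector {

    /** Returns true if the named JNI library can be loaded on this machine. */
    static boolean nativeLibraryAvailable(String libName) {
        try {
            System.loadLibrary(libName);   // throws if the library is not on java.library.path
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;                  // no native support: the caller should fall back
        }
    }

    /** Chooses the native device when available, otherwise the portable fallback. */
    static String chooseDevice(String nativeLib, String nativeDevice, String fallbackDevice) {
        return nativeLibraryAvailable(nativeLib) ? nativeDevice : fallbackDevice;
    }

    public static void main(String[] args) {
        System.out.println("Selected device: " + chooseDevice("scidev", "scidev", "niodev"));
    }
}
```

With this shape the portability cost is confined to one decision point: machines without the native library still run, only more slowly.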

13 Design of scidev
Implementing the xdev API:
init()
finish()
id()
iprobe(ProcessID srcID, int tag, int context)
irecv(Buffer buf, ProcessID srcID, int tag, int context, Status status)
isend(Buffer buf, ProcessID destID, int tag, int context)
and the blocking counterparts of these functions (probe, recv, send), plus issend and ssend.
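To make the shape of this API concrete, here is a minimal sketch of a device interface with a single-process loopback implementation. The method names and parameter lists follow the slide; the return types, the placeholder `ProcessID`, `Buffer` and `Status` classes, and the `LoopbackDevice` itself are assumptions made only for illustration and are not the MPJ Express types.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

// Hypothetical sketch: names and parameters follow the slide, everything else is assumed.
final class XdevSketch {
    static final class ProcessID { final int rank; ProcessID(int rank) { this.rank = rank; } }
    static final class Buffer {
        final ByteBuffer data;
        Buffer(int capacity) { data = ByteBuffer.allocate(capacity); }
    }
    static final class Status { int tag = -1; int bytes = 0; }

    interface Device {
        void init();
        void finish();
        ProcessID id();
        boolean iprobe(ProcessID srcID, int tag, int context);
        boolean irecv(Buffer buf, ProcessID srcID, int tag, int context, Status status);
        void isend(Buffer buf, ProcessID destID, int tag, int context);
    }

    /** Loopback device: every message is delivered to the sending process itself. */
    static final class LoopbackDevice implements Device {
        private static final class Pending {
            final byte[] payload; final int tag; final int context;
            Pending(byte[] payload, int tag, int context) {
                this.payload = payload; this.tag = tag; this.context = context;
            }
        }
        private final ProcessID self = new ProcessID(0);
        private final Queue<Pending> queue = new ArrayDeque<>();

        public void init() { queue.clear(); }
        public void finish() { queue.clear(); }
        public ProcessID id() { return self; }

        public boolean iprobe(ProcessID srcID, int tag, int context) {
            for (Pending p : queue) {
                if (p.tag == tag && p.context == context) return true;
            }
            return false;
        }

        public boolean irecv(Buffer buf, ProcessID srcID, int tag, int context, Status status) {
            for (Iterator<Pending> it = queue.iterator(); it.hasNext();) {
                Pending p = it.next();
                if (p.tag == tag && p.context == context) {
                    it.remove();
                    buf.data.clear();
                    buf.data.put(p.payload);       // copy the payload into the caller's buffer
                    status.tag = tag;
                    status.bytes = p.payload.length;
                    return true;
                }
            }
            return false;                          // nothing matching has arrived yet
        }

        public void isend(Buffer buf, ProcessID destID, int tag, int context) {
            ByteBuffer view = buf.data.duplicate();
            view.flip();                           // expose the bytes written so far
            byte[] payload = new byte[view.remaining()];
            view.get(payload);
            queue.add(new Pending(payload, tag, context));
        }
    }

    /** Sends a short text to ourselves and returns what was received (null if unmatched). */
    static String roundTrip(String text, int tag) {
        Device dev = new LoopbackDevice();
        dev.init();
        Buffer out = new Buffer(64);
        out.data.put(text.getBytes(StandardCharsets.UTF_8));
        dev.isend(out, dev.id(), tag, 0);
        Buffer in = new Buffer(64);
        Status status = new Status();
        if (!dev.iprobe(dev.id(), tag, 0) || !dev.irecv(in, dev.id(), tag, 0, status)) return null;
        dev.finish();
        return new String(in.data.array(), 0, status.bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello", 7));
    }
}
```

A small interface like this is what makes a device such as scidev swappable underneath the higher layers: only these few calls have to be reimplemented per interconnect.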

14 Design of scidev

15 Design of scidev
Figure (layer diagram) with labels: mpjdev, JVM, xdev, mxdev, scidev, JNI, O.S., Native Libraries.

16 Design of scidev
Native libraries: SCILib and SISCI

17 Outline
Introduction; Design of scidev; Implementation issues; Benchmarking; Future work; Conclusions

18 Implementation Issues
Optimizations / initialization process:
JNI: caching field identifiers and references to objects.
Long protocol: 2 messages are sent, the first starting at a 4-byte-aligned address and the second running from a 128-byte-aligned address up to a 128-byte-aligned address (it may run past the end of the message; the raw Buffer has a 2^n length).
Algorithm to initialize the message queues of SCILib:
Connect (to nodes with lower rank)
Create (for all nodes, beginning with the following rank)
Connect (the remaining nodes)
The complexity is O(n).
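The three-phase queue initialization can be sketched as a function that lists the steps one node performs. This is an interpretation of the slide, not the actual SCILib code: the wrap-around order in the create phase and the step labels are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the connect / create / connect ordering described above.
final class QueueInitOrder {

    /** Lists the initialization steps performed by node `rank` in a cluster of `nodes` nodes. */
    static List<String> stepsFor(int rank, int nodes) {
        List<String> steps = new ArrayList<>();
        // Phase 1: connect to the queues of lower-ranked nodes.
        for (int peer = 0; peer < rank; peer++) {
            steps.add("connect " + peer);
        }
        // Phase 2: create local queues for every peer, starting with the next rank
        // (wrapping around to rank 0 is an assumption).
        for (int i = 1; i < nodes; i++) {
            steps.add("create " + (rank + i) % nodes);
        }
        // Phase 3: connect to the remaining, higher-ranked nodes.
        for (int peer = rank + 1; peer < nodes; peer++) {
            steps.add("connect " + peer);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsFor(1, 4));
    }
}
```

Each node performs 2(n-1) steps, which matches the O(n) complexity stated on the slide.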

19 Implementation Issues
Transport protocols:
3 native protocols: Inline (1-113 bytes), Short (114 bytes-64 KB), Long (64 KB-1 MB).
scidev fragments messages larger than 1 MB and uses:
Inline for control messages and small messages (up to 113 bytes)
Short with PIO (Programmed Input/Output) for messages < 8 KB
Short with DMA (Direct Memory Access) for messages of 8-64 KB
The Long protocol in the user-level libraries does not use DMA transfers, so it is replaced by scidev's own Long protocol with DMA transfers.
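The size-based choice of protocol can be written down as a small selector. This is a sketch using the thresholds quoted on the slide; the enum names, the exact treatment of boundary sizes (for example whether exactly 8 KB goes to PIO or DMA) and the helper `fragmentCount` are assumptions.

```java
// Hypothetical sketch: thresholds come from the slide, boundary handling is assumed.
final class ProtocolSelector {
    enum Protocol { INLINE, SHORT_PIO, SHORT_DMA, LONG_DMA }

    static final int INLINE_MAX = 113;            // bytes
    static final int PIO_LIMIT = 8 * 1024;        // below this, Short uses PIO
    static final int SHORT_MAX = 64 * 1024;       // up to this, Short uses DMA
    static final int FRAGMENT_MAX = 1024 * 1024;  // larger messages are fragmented

    /** Picks the protocol for one fragment of `bytes` bytes (1 byte to 1 MB). */
    static Protocol select(int bytes) {
        if (bytes <= 0 || bytes > FRAGMENT_MAX) {
            throw new IllegalArgumentException("fragment size out of range: " + bytes);
        }
        if (bytes <= INLINE_MAX) return Protocol.INLINE;
        if (bytes < PIO_LIMIT) return Protocol.SHORT_PIO;
        if (bytes <= SHORT_MAX) return Protocol.SHORT_DMA;
        return Protocol.LONG_DMA;
    }

    /** Number of fragments of at most 1 MB needed for a message of `messageBytes` bytes. */
    static long fragmentCount(long messageBytes) {
        return (messageBytes + FRAGMENT_MAX - 1) / FRAGMENT_MAX;
    }

    public static void main(String[] args) {
        System.out.println(select(100) + " " + select(4096) + " " + select(32768) + " " + select(500000));
        System.out.println(fragmentCount(3L * 1024 * 1024 + 1));
    }
}
```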

20 Implementation Issues
Communications:
scidev is based on non-blocking communications.
It is coded using niodev as a template.
Asynchronous sends are used for message sizes > 1 MB.
Notification strategy:
Follows the approach of SCI SOCKET, using the mbox interrupt library.
Created without transferring the references (SCI interrupt handlers).
Each interrupt (both user_interruptions and dma_interruptions) registers a callback method.

21 Implementation Issues
Sending/Receiving:
2 threads: a user thread and a selector thread, synchronized to reduce latency.
1 message queue in which the control messages of pending communications are kept.
Data is sent directly from the direct ByteBuffer of the “Buffer”.
If the selector thread receives a message that has not been posted, it creates an intermediate buffer for temporary storage.
If the message has been posted, it copies the message directly into the direct ByteBuffer of the “Buffer”.
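The two receive paths (posted versus not yet posted) can be sketched with two lookup tables shared by the user thread and the selector thread. This is an illustration only: matching on a single integer tag, holding at most one message per tag, and the class and method names are all simplifying assumptions.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the posted / not-posted receive paths described above.
final class ReceiveMatcher {
    private final Map<Integer, ByteBuffer> posted = new HashMap<>();      // tag -> user buffer
    private final Map<Integer, ByteBuffer> unexpected = new HashMap<>();  // tag -> temporary copy

    /** User thread: posts a receive. Returns true if an early message was already waiting. */
    synchronized boolean postReceive(int tag, ByteBuffer userBuffer) {
        ByteBuffer early = unexpected.remove(tag);
        if (early != null) {
            userBuffer.put(early);          // extra copy out of the intermediate buffer
            return true;
        }
        posted.put(tag, userBuffer);
        return false;
    }

    /** Selector thread: a message arrived. Returns true if it went straight to a posted buffer. */
    synchronized boolean onArrival(int tag, ByteBuffer incoming) {
        ByteBuffer userBuffer = posted.remove(tag);
        if (userBuffer != null) {
            userBuffer.put(incoming);       // direct copy into the user's buffer
            return true;
        }
        // Not posted yet: keep the data in an intermediate direct buffer.
        ByteBuffer temp = ByteBuffer.allocateDirect(incoming.remaining());
        temp.put(incoming);
        temp.flip();
        unexpected.put(tag, temp);
        return false;
    }

    public static void main(String[] args) {
        ReceiveMatcher matcher = new ReceiveMatcher();
        ByteBuffer user = ByteBuffer.allocateDirect(16);
        matcher.postReceive(1, user);
        boolean direct = matcher.onArrival(1, ByteBuffer.wrap(new byte[] {1, 2, 3}));
        System.out.println("delivered directly: " + direct + ", bytes: " + user.position());
    }
}
```

The sketch shows why posting receives early matters for latency: the unexpected path pays for an allocation and a second copy.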

22 Implementation Issues
This schema is used for each pair of nodes.
Figure labels: selector thread, user thread, SBUFFER, RBUFFER, ULL, LONG, SHORT, Inline, Queue, Intermediate, SCI.

23 Outline
Introduction; Design of scidev; Implementation issues; Benchmarking; Future work; Conclusions

24 Benchmarking
JDK 1.5 on holly. Latency (us).
       MPJE  mpiJava  C sockets  Java sockets
SCI      51       12          5            11
FE      161      145         83           109
GbE     131      101         65            86
scidev latency is 33 us!

25 Benchmarking
JDK 1.5 on holly. Asymptotic bandwidths (Mbps).
Columns: MPJE, mpiJava, C sockets, Java sockets
SCI: 1200, 1480, 400, 360
FE: 90, 92, 93 (only three values are given; their columns are unclear)
GbE: 680, 587, 900, 600*
scidev throughput is 1280 Mbps!

26 Outline
Introduction; Design of scidev; Implementation issues; Benchmarking; Future work; Conclusions

27 Future work
Immediately:
Testing collective communications (so far only point-to-point has been tested).
A design with lower interdependence between xdev and mpjbuf.
Reading information from the different configuration file formats used in SCI.
Benchmarking with MPJ applications, and developing MPJ and xdev applications.
New buffering implementation.

28 Future work
Buffering system with Sbuffer and Rbuffer in ULL (still intermediate).
Figure labels: SBUFFER, RBUFFER, ULL, LONG, SHORT, Inline, Queue, Intermediate, SCI.

29 Outline
Introduction; Design of scidev; Implementation issues; Benchmarking; Future work; Conclusions

30 Conclusions
Performance is still a problem:
Try to avoid the control message, maybe by integrating this data into the user-level library.
Aim: latency of 30 us and bandwidth of 1350 Mbps.
Current development phase: testing.
It is hard to do multiple initializations in a single thread (restarting the device).
The design is somewhat coupled with MPJ (strong interdependence).
Needs evaluation and implementation using a kernel-level library (threads and process spawning handled natively).

31 Questions?

32 Appendix
Visitor at the DSG during summer 05.
Pursuing a PhD at the Univ. of A Coruña (Spain).

33 Appendix
BS in Computing Tech. in 2002 at A Coruña Univ.
Member of the Computer Architecture Group.
Areas of interest of the group:
High-performance compilers (automatic detection of parallelism)
Cluster computing
Grid applications
Management of parallel/distributed systems
Fault tolerance in MPI
Computer graphics (rendering, radiosity)
Geographical Information Systems
12 staff members, 8 PhD students

34 Appendix
Computer Architecture Group.
CrossGrid (EU project within GridStart)

35 Appendix
The Computer Architecture Group is young, with an average age of 32 years.
Some achievements ( ):
Papers in international conferences: 102
Papers in journals: 53 (41 in the JCR/SCI list)
Regional, national and European funded projects (approx. 1M € in 5 years)

36 Acknowledgements
DSG, for providing full support for my work.
Especially Aamir and Raz, for late, smoky and caffeinated DSG office hours.
Mark, for hosting the visit and for his valuable support.
ICG and UoP, for the facilities and services.
Bryan Carpenter, for his rare but valuable comments and his help with some JNI problems.
DXIDI (Xunta de Galicia), for funding the visit.

37 A Coruña
You will always be welcome in A Coruña!
