Channel Interface
Takeshi Nanri (Kyushu Univ., Japan) and Shinji Sumimoto (Fujitsu Ltd.)
Dec. 2014
Supported by the Advanced Communication for Exa (ACE) project, JST, Japan.


Overview of the proposal in Kobe (Sep. 2014)
- A single-directional, in-order, on-demand channel.
- Data type: MPI_Ch, a handle of a channel.
- Routines for channel allocation / deallocation:
    MPI_Channel_create(sender, receiver, &ch, comm)
    MPI_Channel_free(ch)
    MPI_Channel_ifree(ch, &req)
- Routines for message passing on a channel (usage sketch below):
    MPI_Channel_send(ch, addr, count, type, comm)
    MPI_Channel_isend(ch, addr, count, type, comm, &req)
    MPI_Channel_recv(ch, addr, count, type, comm)
    MPI_Channel_irecv(ch, addr, count, type, comm, &req)
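
A minimal usage sketch of the proposed routines, assuming rank 0 as the sender and rank 1 as the receiver. MPI_Ch and the MPI_Channel_* calls are the interface proposed in these slides, not standard MPI, and details such as which ranks must participate in channel creation are assumptions made for illustration.

    #include <mpi.h>

    /* Sketch only: MPI_Ch and the MPI_Channel_* routines are the proposed
       interface from these slides, not part of any MPI standard. */
    void channel_example(MPI_Comm comm, int rank, double *buf, int count)
    {
        MPI_Ch ch;

        /* Assumed here: both endpoints call the creation routine. */
        MPI_Channel_create(0, 1, &ch, comm);

        if (rank == 0)
            MPI_Channel_send(ch, buf, count, MPI_DOUBLE, comm);
        else if (rank == 1)
            MPI_Channel_recv(ch, buf, count, MPI_DOUBLE, comm);

        MPI_Channel_free(ch);
    }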

Background and motivation of the proposal
- Background: memory requirement for communication.
    - For buffers and/or control structures (e.g. remote addresses, flags and counters).
    - The requirement depends on the pattern of producers and consumers.
    - High-speed communication tends to require more memory.
- Motivation: enable a memory-efficient implementation of the library.
    - Lower latency and higher bandwidth with just enough memory consumption.
- Approach: explicit specification of the duration and the pattern of the communication.

Comments so far
- Need use-cases to show its advantage; there were similar approaches.
- It can't be memory efficient:
    - MPI_COMM_WORLD already requires O(N) memory consumption.
    - Even the proposed channels require O(N) memory to keep the information for establishing connections.
- 'On-demand' techniques already exist: lazy creation of connections, heuristic de-allocation.
- The effect of 'in order' is not promising:
    - A 'no-wildcards' hint on INFO (Ticket 381) would solve the problem (see the sketch below).
    - Data transfers on networks can be 'unordered'.
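
For reference, a minimal sketch of how such a hint could be attached with the standard MPI-3 info mechanism. The key name "mpi_assert_no_any_source" is an assumption; Ticket 381 had not fixed the exact key when these slides were written.

    #include <mpi.h>

    /* Sketch: attach a "no wildcards" assertion to a communicator using the
       MPI-3 info calls. The key name below is assumed for illustration. */
    void set_no_wildcard_hint(MPI_Comm comm)
    {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "mpi_assert_no_any_source", "true");
        MPI_Comm_set_info(comm, info);
        MPI_Info_free(&info);
    }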

Comments so far (cont.)
- Should consider other patterns:
    - P2P is too primitive; dedicated buffers for every P2P pair are not scalable.
    - Creating channels requires dumping information.
    - A 'topology' may be a more efficient way to express a change of the communication pattern (collectives, graphs, etc.).
    - Should also consider MPI_Freeze / MPI_Thaw.
- 'count' should be MPI_Count.
- How to decide the internal buffer size?
- Functions for querying the address of the buffer? This would help to use zero-copy data transfer (see the sketch below).
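
A sketch of what those last two suggestions might look like as declarations; both routine names (MPI_Channel_send_x, MPI_Channel_query_buffer) are hypothetical and only illustrate the idea.

    #include <mpi.h>

    /* Hypothetical declarations, for illustration only:
       (1) the count argument widened to MPI_Count, and
       (2) a query routine exposing a channel's internal buffer so the
           application can fill it in place (zero-copy). */
    int MPI_Channel_send_x(MPI_Ch ch, const void *addr, MPI_Count count,
                           MPI_Datatype type, MPI_Comm comm);
    int MPI_Channel_query_buffer(MPI_Ch ch, void **addr, MPI_Count *size);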

Chances of advantages of the current proposal
- Still, memory consumption can be small enough:
    - Choose appropriate options to minimize the memory consumption at MPI_Init.
    - The information for a connection is usually less than 100 bytes per process, i.e. less than 100 MB for 1M processes.
    - This holds when the number of connections per process is small enough.
- Different from persistent communication: the address and size of the message are not fixed.
- More precise de-allocation of buffers than heuristics.
- Hints can be provided per channel, rather than per communicator (e.g. the size of the internal buffer; see the sketch below).
but...
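
One way a per-channel hint could be expressed, sketched here under clear assumptions: the proposal's MPI_Channel_create takes no MPI_Info argument, so the routine name MPI_Channel_create_with_info and the key "channel_buffer_size" are hypothetical.

    #include <mpi.h>

    /* Sketch only: routine name and info key are assumptions used to
       illustrate a per-channel buffer-size hint. */
    void create_hinted_channel(MPI_Comm comm, MPI_Ch *ch)
    {
        MPI_Info hints;
        MPI_Info_create(&hints);
        MPI_Info_set(hints, "channel_buffer_size", "65536");  /* bytes */
        MPI_Channel_create_with_info(0, 1, ch, comm, hints);
        MPI_Info_free(&hints);
    }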

Problems in the current proposal
- No practical use-cases:
    - Persistent communication seems to be sufficient in most cases (e.g. adaptive mesh refinement, stencil, ...).
    - Messages with different addresses and sizes may be packed into one buffer (see the sketch below).
- May conflict with other similar approaches: Op_notify / Sync_notify, Stream(?).
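
A minimal sketch of that packing idea using standard MPI only, assuming a fixed-size staging buffer (the size 65536 and the loop body are placeholders): data whose address and count change each iteration is packed into one reused buffer, so a single persistent send with a fixed address and maximum size suffices.

    #include <mpi.h>

    #define STAGE_BYTES 65536  /* assumed upper bound on a packed message */

    /* Sender side only; the receiver would post matching receives of
       STAGE_BYTES MPI_PACKED and unpack using the leading count. */
    void persistent_send_with_packing(MPI_Comm comm, int dest)
    {
        static char stage[STAGE_BYTES];
        MPI_Request req;

        /* One persistent send over the whole staging buffer. */
        MPI_Send_init(stage, STAGE_BYTES, MPI_PACKED, dest, 0, comm, &req);

        for (int iter = 0; iter < 10; iter++) {
            double a[4] = { 0.0 };   /* placeholder payload; count may vary */
            int n = 4;
            int pos = 0;

            /* Pack this iteration's data into the fixed buffer. */
            MPI_Pack(&n, 1, MPI_INT, stage, STAGE_BYTES, &pos, comm);
            MPI_Pack(a, n, MPI_DOUBLE, stage, STAGE_BYTES, &pos, comm);

            MPI_Start(&req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        MPI_Request_free(&req);
    }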

Other ways
- Topology: multiple producers and consumers.
- Persistent collectives (proposed 7 years ago in the Collectives WG), e.g. persistent neighbor / graph communications with a fixed address and size of the buffer to be used (sketches below).

Example (persistent broadcast, as proposed):

    MPI_Request persreq;
    ...
    do_init();
    /* MPI_Bcast_init is the proposed persistent-collective routine. */
    MPI_Bcast_init(buf, count, datatype, root, comm, &persreq);
    while (!finished) {
        fill_buffer(buf, count);   /* refresh the fixed buffer in place */
        MPI_Start(&persreq);
        finished = do_computation();
        MPI_Wait(&persreq, MPI_STATUS_IGNORE);
    }
    MPI_Request_free(&persreq);
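
Following the same idea, a sketch of the persistent-neighbor option. MPI_Dist_graph_create_adjacent is standard MPI-3; MPI_Neighbor_alltoall_init is the proposed persistent variant, and its exact signature (including the MPI_Info argument) is an assumption here.

    #include <mpi.h>

    /* Sketch: the graph topology fixes the communication pattern, and a
       persistent neighborhood collective fixes the buffers. The init
       routine below is the proposed extension, not standard MPI-3. */
    void persistent_neighbor_sketch(MPI_Comm comm, int left, int right,
                                    double *sendbuf, double *recvbuf, int count)
    {
        MPI_Comm graph;
        int sources[2] = { left, right };
        int dests[2]   = { left, right };
        MPI_Request req;

        MPI_Dist_graph_create_adjacent(comm, 2, sources, MPI_UNWEIGHTED,
                                       2, dests, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &graph);

        MPI_Neighbor_alltoall_init(sendbuf, count, MPI_DOUBLE,
                                   recvbuf, count, MPI_DOUBLE,
                                   graph, MPI_INFO_NULL, &req);

        for (int step = 0; step < 100; step++) {
            /* refresh sendbuf in place; addresses and sizes never change */
            MPI_Start(&req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }

        MPI_Request_free(&req);
        MPI_Comm_free(&graph);
    }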

Conclusions
- Proposal of a Channel Interface; at least, there are some differences from other approaches.
- A practical use case is the critical issue; we continue to look for one.
- Persistent collectives and neighbors are other options.