
Presentation transcript:

A Hybrid MPI Design using SCTP and iWARP
Distributed Systems Group
Mike Tsai, Brad Penoff, and Alan Wagner
Department of Computer Science, University of British Columbia
Vancouver, Canada
April 14, 2008

A Hybrid Message Passing Interface (MPI) Design using the Stream Control Transmission Protocol (SCTP) and the Internet Wide Area Remote Direct Memory Access Protocol (iWARP)

Research Background
SCTP (Stream Control Transmission Protocol)
– IETF-standardized transport protocol for IP
– Can be used anywhere TCP or UDP is used
– Additional features
SCTP and MPI middleware
– LAM (unreleased)
– MPICH2 (1.0.5 and on): ch3:sctp
– Open MPI: SCTP BTL (in v1.3 trunk)

State-of-the-Art Networking
Hardware acceleration techniques for IP
– Protocol offload
– OS bypass
– Zero copy
– RDMA
– 10 GigE
How would these look for SCTP?
Are there benefits here for using SCTP?

Story/motivation
iWARP (Internet Wide Area RDMA Protocol)
– IETF standard for RDMA over IP
Use RDMA, point-to-point, or a mix?
“Why Compromise?” (G. HPCWire.com)
– Depending on the application, use whichever is best.
For MPI middleware, who decides what’s best?
The programmer!

Contribution
Hybrid MPI with functional decomposition lets the programmer decide:
– Let RMA use RDMA
– Let other communications use point-to-point
Explore SCTP’s use within iWARP
– Extended the OSC userspace software iWARP, making many internal OSC changes

iWARP: DDP & LLP
Direct Data Placement (DDP)
– Fragments messages into segments
– Reassembles segments
– Segments are self-contained
– Separates data delivery from data placement
– Out-of-order delivery
DDP requires the lower layer protocol (LLP) to:
– Keep segment boundaries
– Be reliable
– Use a strong checksum
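The DDP properties listed above can be illustrated with a small sketch. This is not the OSC implementation or the DDP wire format; the names `Segment`, `fragment`, and `Receiver` are hypothetical, and the header fields are simplified assumptions. The point is only that when every segment carries its own placement information, the receiver can place segments in any arrival order and declare the message delivered once all bytes are in.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A self-contained segment (hypothetical, simplified header)."""
    msg_id: int     # message this segment belongs to
    offset: int     # byte offset of the payload within the message
    last: bool      # True on the final segment of the message
    payload: bytes


def fragment(msg_id, data, max_payload):
    """Split one message into segments of at most max_payload bytes."""
    segments = []
    for offset in range(0, len(data), max_payload):
        chunk = data[offset:offset + max_payload]
        is_last = offset + len(chunk) == len(data)
        segments.append(Segment(msg_id, offset, is_last, chunk))
    return segments


class Receiver:
    """Places segments directly into the target buffer as they arrive."""

    def __init__(self, size):
        self.buffer = bytearray(size)
        self.received = 0   # bytes placed so far
        self.total = None   # message length, known once the last segment is seen

    def place(self, seg):
        # Placement: copy the payload to its final position, whatever the order.
        self.buffer[seg.offset:seg.offset + len(seg.payload)] = seg.payload
        self.received += len(seg.payload)
        if seg.last:
            self.total = seg.offset + len(seg.payload)

    def delivered(self):
        # Delivery: the message is complete only when every byte has been placed.
        return self.total is not None and self.received == self.total


message = bytes(range(256)) * 4
segments = fragment(1, message, 100)
random.Random(0).shuffle(segments)      # simulate out-of-order arrival
receiver = Receiver(len(message))
for seg in segments:
    receiver.place(seg)
print("delivered:", receiver.delivered())
```

The sketch assumes a reliable lower layer that preserves segment boundaries and delivers each segment exactly once, which is what the slide asks of the LLP.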

iWARP: LLP = MPA over TCP
Marker PDU Aligned (MPA) framing
– Message framing: DDP segment vs. TCP stream
– Markers for out-of-order
– For middlebox fragmentation
– Stronger checksum
… is a complex layer (majority of OSC code)!
… can lead to non-compliant TCP stacks.
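To see why a framing layer is needed at all between DDP segments and a TCP byte stream, here is a deliberately simplified sketch. It is not MPA: it has no markers and no checksum, and `frame` and `Deframer` are hypothetical names. It only shows that a byte stream loses message boundaries, so the sender must encode them (here with an assumed 2-byte length prefix) and the receiver must rebuild them from arbitrarily sized chunks.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 16-bit big-endian integer.

    Raises struct.error if the message is longer than 65535 bytes.
    """
    return struct.pack("!H", len(message)) + message


class Deframer:
    """Recovers message boundaries from a byte stream fed in arbitrary chunks."""

    def __init__(self):
        self._pending = bytearray()

    def feed(self, chunk: bytes):
        """Add stream bytes and return every message completed so far."""
        self._pending += chunk
        messages = []
        while len(self._pending) >= 2:
            (length,) = struct.unpack_from("!H", self._pending, 0)
            if len(self._pending) < 2 + length:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._pending[2:2 + length]))
            del self._pending[:2 + length]
        return messages


stream = frame(b"segment one") + frame(b"segment two")
deframer = Deframer()
recovered = []
for i in range(0, len(stream), 5):      # the stream arrives in 5-byte pieces
    recovered.extend(deframer.feed(stream[i:i + 5]))
print(recovered)
```

A message-based transport makes this layer unnecessary, because the transport itself preserves the boundaries.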

SCTP is a better LLP
LLP’s needs are built in to SCTP:
– Reliable, message-based
– CRC32c checksum
– Out-of-order support: MSG_UNORDERED
Multistreaming
Multihoming
Unmodified stack supports:
– Path failover
– Multirail data striping

In the beginning, there was ch3:sctp

OSC iWARP was modified and incorporated as a thread.

RMA done by modified OSC iWARP

OSC iWARP changes to support MPI
– Runs in a thread
– Uses SCTP
– All OSC ops made non-blocking
– Locks added around shared data

Connection Management Design
Connection establishment:
– Separate one-to-many socket for new QPs (SCTP “peeloff” feature)
– New QP sends request from one-to-many socket
– Request/ACK received, then QP socket peeled off
– For conflicts, MPI rank resolves who sends ACK
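The slide does not say which rank wins when two processes request a connection to each other at the same time. The sketch below assumes, purely for illustration, that the lower rank's request wins, so the higher-ranked process is the one that sends the ACK. The function name `should_send_ack` and its parameters are hypothetical.

```python
def should_send_ack(my_rank: int, peer_rank: int, have_pending_request: bool) -> bool:
    """Decide whether this process acknowledges the peer's connection request.

    Assumed tie-break (not stated in the slides): when both sides have sent
    requests, the lower rank's request wins and the higher rank sends the ACK.
    """
    if my_rank == peer_rank:
        raise ValueError("a process does not open a QP connection to itself")
    if not have_pending_request:
        return True  # no conflict: simply acknowledge the peer's request
    return my_rank > peer_rank


# Both ranks sent requests simultaneously: exactly one of them sends the ACK.
print(should_send_ack(0, 1, True), should_send_ack(1, 0, True))
```

Any deterministic rule that both sides can evaluate locally would work equally well; using the MPI rank is convenient because it is already unique and known to both processes.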

Progress Engine

Performance
What we tested:
– Compared our new ch3:hybrid to the original ch3:sctp
– Two 3.2 GHz Intel boxes (GigE + switch)
OSU latency tests (MPI_Put & MPI_Get)
Homemade synthetic benchmark
– Combination of RMA and MPI-1 calls

OSU One-sided Latency Tests
ch3:hybrid adds 2-8% overhead

Synthetic Application
ch3:hybrid was faster than ch3:sctp: 3.8 seconds vs. 4.5 seconds
Extra thread helps in some cases

Conclusions
RDMA versus point-to-point for MPI
– Why choose? Functional decomposition lets the programmer decide
SCTP is a good match for iWARP
– Implementation of iWARP using SCTP shown.
– SCTP has its place in the state-of-the-art.
– It’d be more exciting to have SCTP-based devices…

Google “sctp mpi” for more information about our work.
Thank you!
