Stan Smith Intel SSG/DPD February, 2015 Kernel OpenFabrics Interface Initialization.

Slides:



Advertisements
Similar presentations
Introduction to Sockets Jan Why do we need sockets? Provides an abstraction for interprocess communication.
Advertisements

CSE 333 – SECTION 8 Networking and sockets. Overview Network Sockets IP addresses and IP address structures in C/C++ DNS – Resolving DNS names Demos.
KOFI Stan Smith Intel SSG/DPD January, 2015 Kernel OpenFabrics Interface.
Networks: TCP/IP Socket Calls1 Elementary TCP Sockets Chapter 4 UNIX Network Programming Vol. 1, Second Ed. Stevens.
VIA and Its Extension To TCP/IP Network Yingping Lu Based on Paper “Queue Pair IP, …” by Philip Buonadonna.
Socket Programming.
Socket Addresses. Domains Internet domains –familiar with these Unix domains –for processes communicating on the same hosts –not sure of widespread use.
The Basics of communication LectureII. Processing Techniques.
CSE/EE 461 Getting Started with Networking. Basic Concepts  A PROCESS is an executing program somewhere.  Eg, “./a.out”  A MESSAGE contains information.
Stan Smith Intel SSG/DPD March, 2015 Kernel OpenFabrics Interface Server End point setup.
5/12/05CS118/Spring051 A Day in the Life of an HTTP Query 1.HTTP Brower application Socket interface 3.TCP 4.IP 5.Ethernet 2.DNS query 6.IP router 7.Running.
IP-UDP-RTP Computer Networking (In Chap 3, 4, 7) 건국대학교 인터넷미디어공학부 임 창 훈.
Gursharan Singh Tatla Transport Layer 16-May
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface KFI Framework.
Stan Smith Intel SSG/DPD February, 2015 Kernel OpenFabrics Interface kOFI Framework.
IB ACM InfiniBand Communication Management Assistant (for Scaling) Sean Hefty.
Sockets CIS 370 Fall 2009, UMassD. Introduction  Sockets provide a simple programming interface which is consistent for processes on the same machine.
Assignment 3 A Client/Server Application: Chatroom.
Network Programming Tutorial #9 CPSC 261. A socket is one end of a virtual communication channel Provides network connectivity to any other socket anywhere.
Socket Programming. Introduction Sockets are a protocol independent method of creating a connection between processes. Sockets can be either – Connection.
CS345 Operating Systems Φροντιστήριο Άσκησης 2. Inter-process communication Exchange data among processes Methods –Signal –Pipe –Sockets.
Review: –What is AS? –What is the routing algorithm in BGP? –How does it work? –Where is “policy” reflected in BGP (policy based routing)? –Give examples.
OpenFabrics 2.0 Sean Hefty Intel Corporation. Claims Verbs is a poor semantic match for industry standard APIs (MPI, PGAS,...) –Want to minimize software.
Sockets CIS 370 Lab 10 UMass Dartmouth. Introduction 4 Sockets provide a simple programming interface which is consistent for processes on the same machine.
Server Sockets: A server socket listens on a given port Many different clients may be connecting to that port Ideally, you would like a separate file descriptor.
OpenFabrics 2.0 or libibverbs 1.0 Sean Hefty Intel Corporation.
TCP : Transmission Control Protocol Computer Network System Sirak Kaewjamnong.
1 The Internet and Networked Multimedia. 2 Layering  Internet protocols are designed to work in layers, with each layer building on the facilities provided.
OFI SW - Progress Sean Hefty - Intel Corporation.
Hyung-Min Lee ©Networking Lab., 2001 Chapter 8 ARP and RARP.
Fabric Interfaces Architecture Sean Hefty - Intel Corporation.
---- IT Acumens. COM IT Acumens. COMIT Acumens. COM.
Chapter 2 Applications and Layered Architectures Sockets.
Scalable RDMA Software Solution Sean Hefty Intel Corporation.
The Socket Interface Chapter 22. Introduction This chapter reviews one example of an Application Program Interface (API) which is the interface between.
Network Programming Eddie Aronovich mail:
Remote Shell CS230 Project #4 Assigned : Due date :
Storage Interconnect Requirements Chen Zhao, Frank Yang NetApp, Inc.
Networking Tutorial Special Interest Group for Software Engineering Luke Rajlich.
Advanced Sockets API-II Vinayak Jagtap
Fabric Interfaces Architecture Sean Hefty - Intel Corporation.
Transport Layer 3-1 Chapter 3 Outline r 3.1 Transport-layer services r 3.2 Multiplexing and demultiplexing r 3.3 Connectionless transport: UDP.
CSE/EE 461 Getting Started with Networking. 2 Basic Concepts A PROCESS is an executing program somewhere. –Eg, “./a.out” A MESSAGE contains information.
Socket Programming Lab 1 1CS Computer Networks.
The Client-Server Model And the Socket API. Client-Server (1) The datagram service does not require cooperation between the peer applications but such.
Part 4: Network Applications Client-server interaction, example applications.
OFI SW Sean Hefty - Intel Corporation. Target Software 2 Verbs 1.x + extensions 2.0 RDMA CM 1.x + extensions 2.0 Fabric Interfaces.
How to write a MSGQ Transport (MQT) Overview Nov 29, 2005 Todd Mullanix.
CSCI 330 UNIX and Network Programming Unit XV: Transmission Control Protocol.
S OCKET P ROGRAMMING IN C Professor: Dr. Shu-Ching Chen TA: HsinYu Ha.
Introduction to Sockets
S OCKET P ROGRAMMING IN C Professor: Dr. Shu-Ching Chen TA: Hsin-Yu Ha.
Review: – Why layer architecture? – peer entities – Protocol and service interface – Connection-oriented/connectionless service – Reliable/unreliable service.
CSCI 330 UNIX and Network Programming Unit XIV: User Datagram Protocol.
1 IEX8175 RF Electronics Avo Ots telekommunikatsiooni õppetool, TTÜ raadio- ja sidetehnika inst.
OpenFabrics 2.0 rsockets+ requirements Sean Hefty - Intel Corporation Bob Russell, Patrick MacArthur - UNH.
Socket Programming. Computer Science, FSU2 Interprocess Communication Within a single system – Pipes, FIFOs – Message Queues – Semaphores, Shared Memory.
Stan Smith Intel SSG/DPD June, 2015 Kernel Fabric Interface Kfabric Framework.
Lecture 3 TCP and UDP Sockets CPE 401 / 601 Computer Network Systems slides are modified from Dave Hollinger.
11 CS716 Advanced Computer Networks By Dr. Amir Qayyum.
1 Socket Interface. 2 Client-Server Architecture The client is the one who speaks first Typical client-server situations  Client and server on the same.
Socket Programming(1/2). Outline  1. Introduction to Network Programming  2. Network Architecture – Client/Server Model  3. TCP Socket Programming.
1 Socket Interface. 2 Basic Sockets API Review Socket Library TCPUDP IP EthernetPPP ARP DHCP, Mail, WWW, TELNET, FTP... Network cardCom Layer 4 / Transport.
SC’13 BoF Discussion Sean Hefty Intel Corporation.
Assignment 3 A Client/Server Application: Chatroom
developing to the openfabrics interfaces libfabric
Fabric Interfaces Architecture – v4
Socket Programming in C
Transport layer API: Socket Programming
Outline Communications in Distributed Systems Socket Programming
Presentation transcript:

Stan Smith Intel SSG/DPD February, 2015 Kernel OpenFabrics Interface Initialization

Steps Application Flow –Initialization –Server connection setup –Client connection setup –Connection finalization –Data transfer –Shutdown Initialization –fi_getinfo()*info – query & select a fabric provider –fi_fabric()*fabric – create a fabric instance –fi_domain()*domain – select a fabric domain (which HCA via info) 2

kOFI Initialization #include fi_getinfo() - Acquire a list of desirable/available fabric providers. List elements (struct fi_info) are filled-out per device/provider. –Release: fi_freeinfo( &fi ); –Duplicate: fi2 = fi_dupinfo( &fi ); Select appropriate fabric (traverse provider list). fi_fabric(fi, &fabric) Create a fabric instance based on fabric provider selection. fi_domain(fabric, fi, &domain) create a fabric access domain object. 3

fi_getinfo() Acquire a list of available fabric providers based on struct fi_info hints. int fi_getinfo(int version, const char *node, const int port, uint64_t flags, struct fi_info *hints, struct fi_info **info); version - Interface version requested by application. node - Optional, name or fabric address to resolve. port - Optional, port number of address. flags - Operation flags for the fi_getinfo call. hints - *fi_info structure which specifies criteria for selecting fabrics. info : (OUT) a linked list of fi_info structures containing fabric information. 4

fi_getinfo() changed Acquire a list of available fabric providers based on struct fi_info hints. int fi_getinfo(int version, struct fi_info *hints, struct fi_info **info); version - Interface version requested by application. Removed: node - Optional, name or fabric address to resolve. port - Optional, port number of address. flags - Operation flags for the fi_getinfo call. In favor of hints.src_addr -> struct sockaddr for (RDS & IB providers) hints - *fi_info structure which specifies criteria for selecting fabrics. info : (OUT) a linked list of fi_info structures containing fabric information. 5

fi_getinfo() cont. struct fi_info { struct fi_info *next; uint64_t caps;// fabric interface capabilities uint64_t mode;// Operational modes supported by the IF enum fi_ep_type ep_type;// type of fabric IF communication desired uint32_t addr_format; // format of addresses referenced by the fabric size_t src_addrlen;// length of the source address size_t dest_addrlen;// length of the destination address void *src_addr;// source address void *dest_addr;// destination address fi_connreq_t connreq;// a specific connection request struct fi_tx_attr *tx_attr; struct fi_rx_attr *rx_attr; struct fi_ep_attr *ep_attr; struct fi_domain_attr *domain_attr;// optional:.name = “ib0” struct fi_fabric_attr *fabric_attr; }; 6

Tx attributes Optional Transmit context attributes may be specified and returned as part of fi_getinfo. When provided as hints, requested values of struct fi_tx_ctx_attr should be set. On output, the actual transmit context attributes that can be provided will be returned. struct fi_tx_attr { uint64_tcaps;# see CAPABILITIES section in fi_getinfo(3) for details # e.g. FI_MSG, FI_RMA, FI_ATOMIC uint64_tmode;# see MODE section in fi_getinfo(3) for details; FI_LOCAL_MR uint64_top_flags;# Tx operation types: (man fi_endpoint.3) uint64_tmsg_order;# order in which transport layer headers (as viewed by the # application) are processed. FI_ORDER_NONE, FI_ORDER_STRICT uint64_tcomp_order;# order in which completed requests are written into the # completion queue. size_tinject_size;# max Tx buffer size which is reusable prior to associated CQ event. size_tsize;# bytes in local Tx context queue; work-queue depth. size_tiov_limit; # max QP scatter-gather list depth (num elements). size_trma_iov_limit; # max RDMA scatter-gather list depth. }; 7

Rx attributes rx_attr - Optional receive context attributes may be specified and returned as part of fi_getinfo. When provided as hints, requested values of struct fi_rx_ctx_attr should be set. On output, the actual receive context attributes that can be provided will be returned. Output values will be greater than or or equal to the requested input values. struct fi_rx_attr { uint64_tcaps; # see CAPABILITIES section in fi_getinfo(3) for details # e.g. FI_MSG, FI_RMA, FI_ATOMIC uint64_tmode;# see MODE section in fi_getinfo(3) for details; FI_LOCAL_MR uint64_top_flags;# permitted operations uint64_tmsg_order;# order in which transport layer headers are processed. uint64_tcomp_order;# order in which completed requests are written into the # completion queue. size_ttotal_buffered_recv; # total size of provider managed Rx buffer space. # (see FI_BUFFERED_RECV context flag) size_tsize;# bytes in local Rx context queue; work-queue depth. size_tiov_limit;# max scatter-gather list depth. }; 8

EndPoint attributes ep_attr - Optional endpoint attributes may be specified and returned as part of fi_getinfo. When provided as hints, requested values of struct fi_ep_attr should be set. On output, the actual endpoint attributes that can be provided will be returned. Output values will be greater than or equal to requested input values. struct fi_ep_attr { uint32_tprotocol;# end-2-end protocol: FI_PROTO_RDMA_CM_IB_RC uint32_tprotocol_version;# connection protocol version size_tmax_msg_size;# max number of bytes per data transfer operation. size_tinject_size;# max TX buffer size that can be reused prior to CQ event. size_ttotal_buffered_recv;# Total # of Rx bytes utilized for FI_BUFFERED_RECV size_tmsg_prefix_size;# size of message prefix size(0), FI_MSG_PREFIX size_tmax_order_raw_size;# RDMA: Read-after-Write byte size size_tmax_order_war_size;# RDMA: Write-after-Read byte size size_tmax_order_waw_size; # RDMA: Write-after-Write byte size uint64_tmem_tag_format;# bit array: number of tag bits supported by the provider uint64_tmsg_order;# order in which transport layer headers are processed. uint64_tcomp_order;# order which completed requests are written to the CQ. size_ttx_ctx_cnt;# Tx work-queue-depth size_trx_ctx_cnt;# Rx work-queue-depth }; 9

Domain attributes domain_attr - Optionally supplied domain attributes may be specified and returned as part of fi_getinfo. When provided as hints, requested values of struct fi_domain_attr should be set. On output, the actual domain attribute that can be provided will be returned. Output values will be >= to requested input values. struct fi_domain_attr { struct fid_domain*domain;# optional existing domain instance char*name;# ib0, /sys/class/infiniband enum fi_threadingthreading; FI_THREAD_UNSPEC, FI_THREAD_SAFE, FI_THREAD_FID, FI_THREAD_DOMAIN, FI_THREAD_COMPLETION, FI_THREAD_ENDPOINT enum fi_progresscontrol_progress; enum fi_progressdata_progress; FI_PROGRESS_UNSPEC, FI_PROGRESS_AUTO (provider determined), FI_PROGRESS_MANUAL (provider requires the use of an application thread to complete an asynchronous request (socket dev); eq/wait/cq/counter). }; 10

Domain attributes II struct fi_domain_attr { enum fi_resource_mgmtresource_mgmt; # provider support for protecting local and/or remote resource overruns; IB RNR FI_RM_UNSPEC, FI_RM_DISABLED, FI_RM_ENABLED size_tmr_key_size;# range in bytes a key may span size_tcq_data_size;# remote CQ data size, Tx supplied RCQ data size_tcq_cnt;# max num of CQs per domain size_tep_cnt;# max num of EndPoints per domain size_ttx_ctx_cnt;# optimal num of Tx QPs size_trx_ctx_cnt;# optimal num of Rx QPs size_tmax_ep_tx_ctx;# max num of Tx contexts per EP size_tmax_ep_rx_ctx;# max num of Rx contexts per EP }; 11

Fabric attributes fabric_attr - Optional fabric attributes may be specified and returned as part of fi_getinfo. When provided as hints, requested values of struct fi_fabric_attr should be set. On output, the actual fabric attributes that can be provided will be returned. struct fi_fabric_attr { struct fid_fabric*fabric;# optional already opened fabric instance, char*name;# infiniband, /sys/class/ char*prov_name;# ibverbs uint32_tprov_version; }; 12

RDS Example 13 struct fi_info*providers, hints = { 0 }; struct fi_fabric_attrfabric_attr = { 0 }; struct fi_domain_attrdomain_hints = { 0 }; struct fi_ep_attrep_hints = { 0 }; ep_hints.protocol= FI_PROTO_UNSPEC; fabric_attr.name = "IP"; fabric_attr.prov_name = "RDS"; // domain_hints.name = "ib0";// provider will set local address in hints.src_addr hints.domain_attr = &domain_hints; hints.fabric_attr = &fabric_attr; hints.ep_attr= &ep_hints; hints.ep_type= FI_EP_RDM; /* Reliable Datagram */ hints.caps= FI_MSG | FI_SOURCE | FI_CANCEL | FI_SEND | FI_RECV; hints.addr_format= FI_SOCKADDR_IN; hints.src_addr= (struct sockaddrs_in*) &local_IF_addr; rc = fi_getinfo( FT_FIVERSION, &hints, &providers );

Infiniband RC Example 14 struct fi_info*provider, hints = { 0 }; struct fi_fabric_attrfabric_attr = { 0 }; struct fi_domain_attrdomain_hints = { 0 }; struct fi_ep_attrep_hints = { 0 }; ep_hints.protocol= FI_PROTO_RDMA_CM_IB_RC; hints.ep_type= FI_EP_MSG; hints.caps= FI_MSG | FI_RMA | FI_CANCEL; hints.addr_format= FI_SOCKADDR_IN; hints.src_addr= &local_IF_addr; hints.src_addrlen= sizeof(local_IF_addr); local_IF_addr.sin_family= AF_INET; local_IF_addr.sin_port= htons(SVR_PORT); local_IF_addr.sin_addr.s_addr = htonl(SVR_ADDR);// or in4_pton() ret = fi_getinfo(FT_FIVERSION, &hints, &provider);

Concerns fi_info() Node name resolution for each provider? Failure to expose port configuration, sans doman.name? fi_info per port? Sockaddr_ib. Complexity – trying to be everything to everyone; how to reduce? A struct fi_info* for each: EndPoint type: IB: RC, UD Fi_info for each port, although no explicit port identification? 15

fi_fabric() Open the fabric domain provider which the application selected. A fabric domain represents a collection of hardware and software resources that access a single physical or virtual network. struct fid_fabric*fi; int fi_fabric( fi_fabric_attr *attr, struct fid_fabric *fabric, void *context ) Arguments: attr: Attributes of fabric to open. fabric (out) Fabric provider pointer context: fabric level event context. 16

fi_domain() Open a fabric access domain. A fabric access domain typically refers to a physical or virtual NIC or hardware port; however, a domain may span across multiple hardware components for fail-over or data striping purposes. struct fid_domain*domain; int fi_domain(struct fid_fabric *fabric, struct fi_info *info, struct fid_domain **domain, void *context); Arguments: fabric – previously opened fabric pointer. info – provider info structure used in creating the ‘fabric’ pointer. domain (out) fabric provider domain. context – domain event context. 17

Examples 18 struct fi_info*fi = provider fi_info record selected from fi_getinfo() provider list struct fid_fabric*fabric; intrc; rc = fi_fabric( fi->fabric_attr, &fabric, NULL ); if ( rc ) { printk( KERN_ERROR “%s() ERR: fi_fabric() rc %d\n”,__func__,rc); return rc; } struct fid_domain *domain; rc = fi_domain( fabric, fi, &domain, NULL ); // Ready to create End points: passive and active to connect.