Presentation is loading. Please wait.

Presentation is loading. Please wait.

Fabric Interfaces Architecture Sean Hefty - Intel Corporation.

Similar presentations


Presentation on theme: "Fabric Interfaces Architecture Sean Hefty - Intel Corporation."— Presentation transcript:

1 Fabric Interfaces Architecture Sean Hefty - Intel Corporation

2 Changes v2 –Remove interface object –Add open interface as base object –Add SRQ object –Add EQ group object v3 –Modified SRQ –Enhanced architecture semantics www.openfabrics.org 2

3 Overview Object Model –Do we have the right type of objects defines? –Do we have the correct object relationships? Interface Synopsis –High-level description of object operations –Is functionality missing? –Are interfaces associated with the right object? Architectural Semantics –Do the semantics match well with the apps? –What semantics are missing? www.openfabrics.org 3

4 Object “Class” Model Objects represent collection of attributes and interfaces –I.e. object-oriented programming model Consider architectural model only at this point www.openfabrics.org 4 Objects do not necessarily map directly to hardware or software objects

5 Conceptual Object Hierarchy www.openfabrics.org 5 Fabric Descriptor Fabric Domain Address Vector Map Index Endpoint Msg Passive Active Datagram RDM Dispatcher Event Queue Completion CM AV Domain EQ Group Counter Memory Region Interfaces Object “inheritance”

6 Object Relationships www.openfabrics.org 6 Fabric Passive EP EQCM Domain AV Map Index Active EP Msg Datagram RDM Dispatch EP EQ Group EQ CQ CM AV Domain Counter MR Object “scope”

7 Fabric Represents a communication domain or boundary – Single IB or RoCE subnet, IP (iWarp) network, Ethernet subnet Multiple local NICs / ports Topology data, network time stamps Determines native addressing – Mapped addressing possible – GID/LID versus IP www.openfabrics.org 7 Fabric Passive EP EQ Domain

8 Passive (Fabric) EP Listening endpoint – Connection-oriented protocols Wildcard listen across multiple NICs / ports Bind to address to restrict listen – Listen may migrate with address www.openfabrics.org 8 Fabric Passive EP EQ Domain

9 Fabric EQ Associated with passive endpoint(s) Reports connection requests Could be used to report fabric events www.openfabrics.org 9 Fabric Passive EP EQ Domain

10 Resource Domain Boundary for resource sharing – Physical or logical NIC – Command queue Container for data transfer resources A provider may define multiple domains for a single NIC – Dependent on resource sharing www.openfabrics.org 10 Fabric Passive EP EQ Domain

11 Domain Address Vectors Maintains list of remote endpoint addresses – Map – native addressing – Index – ‘rank’-based addressing Resolves higher-level addresses into fabric addresses – Native addressing abstracted from user Handles address and route changes www.openfabrics.org 11 Domain AV Active EP EQ EQ Group Counter MR

12 Domain Endpoints Data transfer portal – Send / receive queues – Command queues – Ring buffers – Buffer dispatching Multiple types defined – Connection-oriented / connectionless – Reliable / unreliable – Message / stream www.openfabrics.org 12 Domain AV Active EP EQ EQ Group Counter MR

13 Domain Event Queues Reports asynchronous events Unexpected errors reported ‘out of band’ Events separated into ‘EQ domains’ – CM, AV, completions – 1 EQ domain per EQ – Future support for merged EQ domains www.openfabrics.org 13 Domain AV Active EP EQ EQ Group Counter MR

14 EQ Groups Collection of EQs Conceptually shares same wait object Grouping for progress and wait operations www.openfabrics.org 14 Domain AV Active EP EQ EQ Group Counter MR

15 Domain Counters Provides a count of successful completions of asynchronous operations – Conceptual HW counter Count is independent from an actual event reported to the user through an EQ www.openfabrics.org 15 Domain AV Active EP EQ EQ Group Counter MR

16 Domain Memory Regions Memory ranges accessible by fabric resources – Local and/or remote access Defines permissions for remote access www.openfabrics.org 16 Domain AV Active EP EQ EQ Group Counter MR

17 Interface Synopsis Operations associated with identified ‘classes’ General functionality, versus detailed methods –The full set of methods are not defined here –Detailed behavior (e.g. blocking) is not defined Identify missing and unneeded functionality –Mapping of functionality to objects www.openfabrics.org 17 Use timeboxing to limit scope of interfaces to refine by a target date

18 www.openfabrics.org 18 Base Class CloseDestroy / free object BindCreate an association between two object instances SyncFencing operation that completes only after previously issued asynchronous operations have completed Control(~fcntl) set/get low-level object behavior I/F OpenOpen provider extended interfaces

19 www.openfabrics.org 19 Fabric DomainOpen a resource domain EndpointCreate a listening EP for connection-oriented protocols EQ OpenOpen an event queue for listening EP or reporting fabric events

20 www.openfabrics.org 20 Resource Domain QueryObtain domain specific attributes Open AV, EQ, EP, SRQ, EQ Group Create an address vector, event or completion counter, event queue, endpoint, shared receive queue, or EQ group MR OpsRegister data buffers for access by fabric resources

21 www.openfabrics.org 21 Address Vector InsertInsert one or more addresses into the vector RemoveRemote one or more addresses from the vector LookupReturn a stored address StraddrConvert an address into a printable string

22 www.openfabrics.org 22 Base EP EnableEnables an active EP for data transfers CancelCancel a pending asynchronous operation Getopt(~getsockopt) get protocol specific EP options Setopt(~setsockopt) set protocol specific EP options

23 www.openfabrics.org 23 Passive EP Getname(~getsockname) return EP address ListenStart listening for connection requests RejectReject a connection request

24 www.openfabrics.org 24 Active EP CMConnection establishment ops, usable by connection-oriented and connectionless endpoints MSG2-sided message queue ops, to send and receive messages RMA1-sided RDMA read and write ops Tagged2-sided matched message ops, to send and receive messages (conceptual merge of messages and RMA writes) Atomic1-sided atomic ops TriggeredDeferred operations initiated on a condition being met

25 www.openfabrics.org 25 Event Queue ReadRetrieve a completion event, and optional source endpoint address data for received data transfers Read ErrRetrieve event data about an operation that completed with an unexpected error WriteInsert an event into the queue ResetDirects the EQ to signal its wait object when a specified condition is met StrerrorConverts error data associated with a completion into a printable string

26 www.openfabrics.org 26 EQ Group PollCheck EQs for events WaitWait for an event on the EQ group

27 www.openfabrics.org 27 Completion Counter ReadRetrieve a counter’s value AddIncrement a counter SetSet / clear a counter’s value WaitWait until a counter reaches a desired threshold

28 www.openfabrics.org 28 Memory Region Desc(~lkey) Optional local memory descriptor associated with a data buffer Key(~rkey) Protection key against access from remote data transfers

29 Architectural Semantics Progress Ordering - completions and data delivery Multi-threading and locking model Buffering Function signatures and semantics www.openfabrics.org 29 Once defined, object and interface semantics cannot change – semantic changes require new objects and interfaces Need refining

30 Progress Ability of the underlying implementation to complete processing of an asynchronous request Need to consider ALL asynchronous requests –Connections, address resolution, data transfers, event processing, completions, etc. HW/SW mix www.openfabrics.org 30 All(?) current solutions require significant software components

31 Progress Support two progress models –Automatic and implicit Separate operations as belonging to one of two progress domains –Data or control –Report progress model for each domain www.openfabrics.org 31 SAMPLEImplicitAutomatic DataSoftwareHardware offload ControlSoftwareKernel services

32 Automatic Progress Implies hardware offload model –Or standard kernel services / threads for control operations Once an operation is initiated, it will complete without further user intervention or calls into the API Automatic progress meets implicit model by definition www.openfabrics.org 32

33 Implicit Progress Implies significant software component Occurs when reading or waiting on EQ(s) Application can use separate EQs for control and data Progress limited to objects associated with selected EQ(s) App can request automatic progress –E.g. app wants to wait on native wait object –Implies provider allocated threading www.openfabrics.org 33

34 Ordering Applies to a single initiator endpoint performing data transfers to one target endpoint over the same data flow –Data flow may be a conceptual QoS level or path through the network Separate ordering domains –Completions, message, data Fenced ordering may be obtained using fi_sync operation www.openfabrics.org 34

35 Completion Ordering Order in which operation completions are reported relative to their submission Unordered or ordered –No defined requirement for ordered completions Default: unordered www.openfabrics.org 35

36 Message Ordering Order in which message (transport) headers are processed –I.e. whether transport message are received in or out of order Determined by selection of ordering bits –[Read | Write | Send] After [Read | Write | Send] –RAR, RAW, RAS, WAR, WAW, WAS, SAR, SAW, SAS Example: –fi_order = 0 // unordered –fi_order = RAR | RAW | RAS | WAW | WAS | SAW | SAS // IB/iWarp ordering www.openfabrics.org 36

37 Data Ordering Delivery order of transport data into target memory –Ordering per byte-addressable location –I.e. access to the same byte in memory Ordering constrained by message ordering rules –Must at least have message ordering first www.openfabrics.org 37

38 Data Ordering Ordering limited to message order size –E.g. MTU –In order data delivery if transfer <= message order size Message order size = 0 –No data ordering Message order size = -1 –All data ordered www.openfabrics.org 38

39 Other Ordering Rules Ordering to different target endpoints not defined Per message ordering semantics implemented using different data flows –Data flows may be less flexible, but easier to optimize for –Endpoint aliases may be configured to use different data flows www.openfabrics.org 39

40 Multi-threading and Locking Support both thread safe and lockless models –Compile time and run time support –Run-time limited to compiled support Lockless (based on MPI model) –Single – single-threaded app –Funneled – only 1 thread calls into interfaces –Serialized – only 1 thread at a time calls into interfaces Thread safe –Multiple – multi-threaded app, with no restrictions www.openfabrics.org 40

41 Buffering Support both application and network buffering –Zero-copy for high-performance –Network buffering for ease of use Buffering in local memory or NIC –In some case, buffered transfers may be higher- performing (e.g. “inline”) Registration option for local NIC access –Migration to fabric managed registration Required registration for remote access –Specify permissions www.openfabrics.org 41


Download ppt "Fabric Interfaces Architecture Sean Hefty - Intel Corporation."

Similar presentations


Ads by Google