Realizing the Performance Potential of the Virtual Interface Architecture Evan Speight, Hazim Abdel-Shafi, and John K. Bennett Rice University, Dep. Of.


1 Realizing the Performance Potential of the Virtual Interface Architecture. Evan Speight, Hazim Abdel-Shafi, and John K. Bennett, Rice University, Dept. of Electrical and Computer Engineering. Presented by Constantin Serban, R.U.

2 VIA Goals: A communication infrastructure for System Area Networks (SANs). Targets mainly high-speed cluster applications. Efficiently harnesses the communication performance of the underlying networks.

3 Trends: Peak bandwidth has increased by two orders of magnitude over the past decade, while the latency seen by users has decreased only modestly. The latency introduced by the protocol is typically several times the latency of the transport layer. The problem is especially acute for small messages.

4 Targets: The VI architecture addresses the following issues: Decrease latency, especially for small messages (used in synchronization). Increase aggregate bandwidth (only a fraction of the peak bandwidth is utilized). Reduce the CPU processing caused by per-message overhead.

5 Overhead: Overhead comes mainly from two sources. Every network access requires one or two traps into the kernel – the user/kernel mode switch is time consuming. Usually two data copies occur: – from the user buffer to the message-passing API – from the message layer to the kernel buffer
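
To make the two overhead sources concrete, here is a toy cost model. It is only a sketch: the function names and every constant (trap cost, copy cost, wire time) are illustrative assumptions, not figures from the talk. It shows why fixed per-message costs dominate for small messages.

```python
# Toy cost model for sending one message through a kernel-based stack.
# All constants below are illustrative assumptions, not measurements.

TRAP_COST_US = 5.0         # assumed cost of one user/kernel mode switch
COPY_COST_US_PER_KB = 1.0  # assumed cost of copying 1 KB of payload once
WIRE_COST_US_PER_KB = 8.0  # assumed time to put 1 KB on the wire


def send_cost_us(size_kb, traps, copies):
    """Return (software_overhead, wire_time) in microseconds."""
    overhead = traps * TRAP_COST_US + copies * COPY_COST_US_PER_KB * size_kb
    wire = WIRE_COST_US_PER_KB * size_kb
    return overhead, wire


def overhead_fraction(size_kb, traps=2, copies=2):
    """Fraction of the total send time spent in traps and copies."""
    overhead, wire = send_cost_us(size_kb, traps, copies)
    return overhead / (overhead + wire)


if __name__ == "__main__":
    for size_kb in (0.0625, 1, 64):
        print(f"{size_kb:>8} KB: {overhead_fraction(size_kb):.0%} overhead")
```

Setting `traps=0, copies=0` in this model corresponds to the ideal that a user-level, zero-copy design aims for.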

6 VIA approach: Remove the kernel from the critical path – move communication code out of the kernel into user space. Provide a zero-copy protocol – data is sent from and received directly into the user buffer; no message copy is performed.

7 VIA emerged as a standardization effort from Compaq, Intel, and Microsoft. It builds on several academic ideas: the main architecture is most similar to U-Net, and essential features are derived from VMMC. Current implementations include: – GigaNet cLan – VIA implemented in hardware – Tandem ServerNet – VIA emulated in a software driver – Myricom Myrinet – VIA emulated in firmware

8 VIA architecture

9 VIA operations: Set-Up/Tear-Down (VIA is a point-to-point, connection-oriented protocol; the VI endpoint is the core concept in VIA), Register/De-Register Memory, Connect/Disconnect, Transmit, Receive, RDMA

10 VIA operations – Set-Up/Tear-Down: VIA is a point-to-point, connection-oriented protocol. The VI endpoint is the core concept in VIA. The VipCreateVi function creates a VI endpoint in user space. The user-level library passes the call to the kernel agent, which passes the creation information to the NIC. The OS thus controls application access to the NIC.

11 VIA operations - cont’d Register/De-Register Memory: All data buffers and descriptors reside in registered memory. The NIC performs DMA I/O operations in this registered memory. Registration pins the pages in physical memory, provides a handle for manipulating them, and transfers their addresses to the NIC. It is performed once, usually at the beginning of the communication session.
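
The register/deregister cycle can be pictured with a small simulation. This is a minimal sketch, not the VIA library interface: `RegistrationTable` and its methods are hypothetical names, and a Python dictionary stands in for the page pinning and address translation that a real kernel agent and NIC would perform.

```python
import itertools


class RegistrationTable:
    """Toy stand-in for the NIC's view of registered (pinned) memory."""

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._regions = {}  # handle -> registered buffer

    def register(self, buffer: bytearray) -> int:
        """Record the buffer and return a handle (real code would pin pages)."""
        handle = next(self._next_handle)
        self._regions[handle] = buffer
        return handle

    def deregister(self, handle: int) -> None:
        """Forget the buffer; later DMA attempts through this handle fail."""
        del self._regions[handle]

    def dma_view(self, handle: int, offset: int, length: int) -> memoryview:
        """Return a window the simulated NIC may read or write directly."""
        buffer = self._regions.get(handle)
        if buffer is None:
            raise KeyError(f"handle {handle} is not registered")
        if offset < 0 or length < 0 or offset + length > len(buffer):
            raise ValueError("access outside the registered region")
        return memoryview(buffer)[offset:offset + length]


if __name__ == "__main__":
    table = RegistrationTable()
    buf = bytearray(16)
    handle = table.register(buf)          # done once, at session start
    table.dma_view(handle, 4, 4)[:] = b"data"
    print(bytes(buf))
    table.deregister(handle)
```

The point of the handle is that every later transfer names memory the NIC already knows about, so no per-message kernel involvement is needed.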

12 VIA operations - cont’d Connect/Disconnect: Before communication, each endpoint is connected to a remote endpoint. The connection is passed to the kernel agent and down to the NIC. VIA does not define an addressing scheme; implementations can use existing schemes.

13 VIA operations - cont’d Transmit/receive: The sender builds a descriptor for the message to be sent. The descriptor points to the actual data buffer. Both the descriptor and the data buffer reside in a registered memory area. The application then posts a doorbell to signal that the descriptor is available. The doorbell contains the address of the descriptor. Doorbells are kept in an internal queue inside the NIC.

14 VIA operations - cont’d Transmit/receive (cont’d): Meanwhile, the receiver creates a descriptor that points to an empty data buffer and posts a doorbell in the receiver NIC queue. When the doorbell in the sender queue reaches the head of the queue, the data is located through a double indirection (doorbell to descriptor, descriptor to buffer) and sent into the network. On the receiving side, the first doorbell/descriptor is picked up from the receiver queue and its buffer is filled with the data.
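
The descriptor and doorbell flow described on the last two slides can be sketched as a small simulation. Everything here is hypothetical scaffolding (`Descriptor`, `ToyNic`, `deliver`, the made-up addresses); it only mirrors the order of events, with a dictionary standing in for registered memory and a function call standing in for the network.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Descriptor:
    buffer: bytearray   # stands in for a registered data buffer
    length: int = 0     # number of valid bytes in the buffer
    done: bool = False  # set once the transfer has completed


class ToyNic:
    """Holds descriptors by address plus one send and one receive doorbell queue."""

    def __init__(self):
        self.descriptors = {}        # address -> Descriptor
        self.send_doorbells = deque()
        self.recv_doorbells = deque()
        self._next_addr = 0x1000     # made-up descriptor addresses

    def place(self, desc: Descriptor) -> int:
        """Put a descriptor in 'registered memory' and return its address."""
        addr = self._next_addr
        self._next_addr += 0x40
        self.descriptors[addr] = desc
        return addr

    def post_send(self, addr: int) -> None:
        self.send_doorbells.append(addr)   # the doorbell carries the address

    def post_recv(self, addr: int) -> None:
        self.recv_doorbells.append(addr)


def deliver(sender: ToyNic, receiver: ToyNic) -> None:
    """Move one message from the head send doorbell to the head receive doorbell."""
    if not sender.send_doorbells:
        raise RuntimeError("nothing to send")
    if not receiver.recv_doorbells:
        raise RuntimeError("no receive descriptor posted")
    send_desc = sender.descriptors[sender.send_doorbells.popleft()]  # doorbell -> descriptor
    payload = send_desc.buffer[:send_desc.length]                    # descriptor -> data
    recv_desc = receiver.descriptors[receiver.recv_doorbells.popleft()]
    if len(payload) > len(recv_desc.buffer):
        raise ValueError("receive buffer too small")
    recv_desc.buffer[:len(payload)] = payload
    recv_desc.length = len(payload)
    send_desc.done = recv_desc.done = True


if __name__ == "__main__":
    sender, receiver = ToyNic(), ToyNic()
    rx = Descriptor(bytearray(8))
    receiver.post_recv(receiver.place(rx))        # receiver posts first
    tx = Descriptor(bytearray(b"hello"), length=5)
    sender.post_send(sender.place(tx))
    deliver(sender, receiver)
    print(bytes(rx.buffer[:rx.length]))
```

Note that in this sketch the receive descriptor must already be posted when the message arrives; otherwise `deliver` raises an error.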

15 VIA operations - cont’d RDMA: As a mechanism derived from VMMC, VIA allows Remote DMA operations: RDMA Read and RDMA Write. Each node allocates a receive buffer and registers it with the NIC. Additional structures that contain read and write pointers to the receive buffers are exchanged during connection setup. Each node can then read from and write to the remote node address directly. These operations pose potential implementation problems.
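
As a rough illustration of the RDMA idea, the sketch below lets one node write into and read from the other node's receive buffer without the remote side posting a descriptor. `Node`, `connect`, `rdma_write`, and `rdma_read` are hypothetical names, and a shared `memoryview` stands in for the pointers exchanged at connection setup; a real NIC would move the data over the network.

```python
class Node:
    def __init__(self, name: str, size: int):
        self.name = name
        self.recv_buffer = bytearray(size)  # stands in for a registered receive buffer
        self.remote = None                  # view of the peer's buffer, set by connect()


def connect(a: Node, b: Node) -> None:
    """Toy connection setup: each side learns where the other's buffer lives."""
    a.remote = memoryview(b.recv_buffer)
    b.remote = memoryview(a.recv_buffer)


def _check(node: Node, offset: int, length: int) -> None:
    if node.remote is None:
        raise RuntimeError(f"{node.name} is not connected")
    if offset < 0 or length < 0 or offset + length > len(node.remote):
        raise ValueError("access outside the remote buffer")


def rdma_write(node: Node, offset: int, data: bytes) -> None:
    """Place data directly into the peer's buffer at the given offset."""
    _check(node, offset, len(data))
    node.remote[offset:offset + len(data)] = data


def rdma_read(node: Node, offset: int, length: int) -> bytes:
    """Fetch bytes directly from the peer's buffer."""
    _check(node, offset, length)
    return bytes(node.remote[offset:offset + length])


if __name__ == "__main__":
    a, b = Node("a", 16), Node("b", 16)
    connect(a, b)
    rdma_write(a, 2, b"ping")        # lands in b's buffer; b takes no action
    print(bytes(b.recv_buffer[2:6]))
    print(rdma_read(b, 0, 4))        # b reads a's (still zeroed) buffer
```

Because the remote CPU is not involved, the two sides still need some agreed way to learn that data has arrived, which is one reason these operations are harder to implement than plain send/receive.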

16 Evaluation Benchmarks: Two VI implementations: – GigaNet cLan: bandwidth 125 MB/s, latency 480 ns – Tandem ServerNet: bandwidth 50 MB/s, latency 300 ns. Performance measured: – Bandwidth and latency – Polling vs. blocking – CPU utilization
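
One way to read the bandwidth and latency figures on this slide is through a simple linear model, time = latency + size / bandwidth. The sketch below applies that model to the two quoted configurations. The model itself and the reading of MB as 10^6 bytes are assumptions; it ignores all software overhead, so it describes the hardware limit rather than the measured results.

```python
# Figures quoted on the slide: (bandwidth in bytes/s, latency in seconds).
# Treating 1 MB as 10**6 bytes is an assumption.
NETWORKS = {
    "GigaNet cLan": (125e6, 480e-9),
    "Tandem ServerNet": (50e6, 300e-9),
}


def transfer_time_s(size_bytes, latency_s, bandwidth):
    """Linear model: fixed latency plus serialization time."""
    return latency_s + size_bytes / bandwidth


def effective_bandwidth(size_bytes, latency_s, bandwidth):
    """Bytes per second actually achieved for one message of this size."""
    return size_bytes / transfer_time_s(size_bytes, latency_s, bandwidth)


def half_bandwidth_size(latency_s, bandwidth):
    """Message size at which effective bandwidth is half the peak (n = L * B)."""
    return latency_s * bandwidth


if __name__ == "__main__":
    for name, (bw, lat) in NETWORKS.items():
        n_half = half_bandwidth_size(lat, bw)
        print(f"{name}: half of peak bandwidth at about {n_half:.0f} bytes")
```

Under this model the half-bandwidth point is about 60 bytes for cLan and 15 bytes for ServerNet, which suggests that with hardware latencies this low, any software overhead on the send path quickly becomes the dominant cost for small messages.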

17 Bandwidth

18 Latency

19 Latency Polling/Blocking

20 CPU utilization

21 MPI performance using VIA: The challenge is to deliver this performance to distributed applications. Software layers such as MPI are commonly used between VIA and the application: they provide increased usability but bring additional overhead. How can this layer be optimized so that it uses VIA efficiently?

22 MPI VIA - performance

23 MPI observations: The difference between MPI-UDP and MPI-VIA-baseline is remarkable. MPI-VIA-baseline is still far from VIA-Native. Several improvements are proposed to bring MPI-VIA closer to VIA-Native by reducing MPI overhead.

24 MPI Improvements: Eliminating unnecessary copies: MPI-UDP and MPI-VIA use a single set of receive buffers, so data must be copied to the application; the fix is to allow the user to register any buffer. Choosing a synchronization primitive: all synchronization formerly used OS constructs/events; a better implementation uses processor swap instructions. No acknowledgment: remove the message acknowledgment by switching to a reliable VIA mode.
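
The first improvement, eliminating the extra copy, can be sketched as follows. The function names are hypothetical and `deliver` merely stands in for the NIC placing an incoming message into a buffer; the sketch contrasts receiving through a library-owned buffer with receiving straight into a buffer the user has registered.

```python
def deliver(payload: bytes, target: bytearray) -> None:
    """Stand-in for the NIC writing an incoming message into a buffer."""
    if len(payload) > len(target):
        raise ValueError("target buffer too small")
    target[:len(payload)] = payload


def receive_via_library_buffer(payload, library_buffer, user_buffer) -> int:
    """Baseline: data lands in the library's buffer, then is copied out."""
    deliver(payload, library_buffer)
    n = len(payload)
    if n > len(user_buffer):
        raise ValueError("user buffer too small")
    user_buffer[:n] = library_buffer[:n]   # the copy the improvement removes
    return 1                               # extra copies performed


def receive_into_user_buffer(payload, user_buffer) -> int:
    """Improved: the user's own (registered) buffer is the delivery target."""
    deliver(payload, user_buffer)
    return 0                               # no extra copy


if __name__ == "__main__":
    message = b"halo exchange"
    lib_buf, user_a, user_b = bytearray(64), bytearray(64), bytearray(64)
    print(receive_via_library_buffer(message, lib_buf, user_a))
    print(receive_into_user_buffer(message, user_b))
    print(user_a == user_b)
```

Both paths leave the same bytes in the user buffer; the difference is one fewer pass over the data per message, which matters most for large messages.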

25 VIA - Disadvantages: Polling vs. blocking synchronization – a tradeoff between CPU consumption (polling) and overhead (blocking). Memory registration: locking large amounts of memory makes virtual memory mechanisms inefficient, and registering/deregistering on the fly is slow. Point-to-point vs. multicast: VIA lacks multicast primitives; implementing multicast on top of the existing point-to-point mechanism makes communication inefficient.

26 Conclusion: Low latency for small messages; small messages have a strong impact on application behavior. Significant improvement over UDP communication (does this still hold after recent hardware TCP/UDP implementations?). This comes at the expense of an uncomfortable API.

