High-Performance Object Access in OSD Storage Subsystem Yingping Lu.

1 High-Performance Object Access in OSD Storage Subsystem Yingping Lu

2 Outline
OSD overview
Problem and common approaches
Related work
Initial proposal
Issues

3 Design Objectives of OSD
Scalability (local area, enterprise, global)
High performance (high throughput, low latency)
Cross-platform support
High availability (resilient to device and machine failure)
Support for permanent, mobile, and even disconnected clients
Security (authentication, access control, encryption of transmitted and stored data)
Data sharing
Manageability?

4 Region
Communication entities:
– Client
– Metadata manager
– OSD device
Communication paths:
– Client to metadata server
– Client to OSD device
– Metadata server to OSD device
– Metadata server to metadata server

5 Problem
Network bandwidth keeps increasing (10 Gb/s is on the way).
OSD applications require high performance.
How can object data be delivered efficiently between the OSD device and the client?

6 Potential Measures
Potential performance improvement measures: locality-based
Migration (reduce transmission time)
– Migrate the data to a location closer to the client.
Replication (reduce transmission time)
– Replicate a copy within the client’s proximity.
– Can replicate the data object or the metadata.
Cache (reduce disk access time and transmission time)
– Where: client, metadata server, object device, etc.
– What: data object, metadata, locking.
– How long: TTL, lease, renewal.
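
The "how long" question for a cache (TTL, lease, renewal) can be sketched as follows. This is a minimal illustration and not part of the original slides; the class name LeaseCache, its methods, and the injectable clock are assumptions made for the example.

```python
import time


class LeaseCache:
    """Toy client-side cache: each entry is valid until its lease expires."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock      # injectable so the behavior can be tested
        self._entries = {}      # object_id -> (data, expiry_time)

    def put(self, object_id, data):
        """Cache data and start a fresh lease for it."""
        self._entries[object_id] = (data, self.clock() + self.lease_seconds)

    def get(self, object_id):
        """Return cached data, or None if missing or the lease has expired."""
        entry = self._entries.get(object_id)
        if entry is None:
            return None
        data, expiry = entry
        if self.clock() >= expiry:
            del self._entries[object_id]   # expired: drop the stale copy
            return None
        return data

    def renew(self, object_id):
        """Extend the lease of a still-valid entry; return True on success."""
        data = self.get(object_id)
        if data is None:
            return False
        self._entries[object_id] = (data, self.clock() + self.lease_seconds)
        return True
```

In a real system the renewal would be a round trip to the metadata server or the object device, which is where the consistency constraints listed later come in.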

7 Performance Improvement Measures (cont.)
Improvement measures: aggregation (device grouping)
– Improves the aggregate I/O throughput and reliability
– Works like a RAID system
Improvement measures: data path-based
– Decouple the control path from the data path
– Reduce the length of the critical path at the data access level
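
Aggregation "like a RAID system" can be illustrated with plain round-robin striping. This sketch is an assumption-laden illustration: the function names and the stripe unit are hypothetical, and it shows only the throughput side (RAID-0 style), with no parity or mirroring for reliability.

```python
def stripe(data, num_devices, stripe_unit):
    """Split data round-robin into per-device lists of stripe units."""
    devices = [[] for _ in range(num_devices)]
    for i in range(0, len(data), stripe_unit):
        unit_index = i // stripe_unit
        devices[unit_index % num_devices].append(data[i:i + stripe_unit])
    return devices


def unstripe(devices):
    """Reassemble the original bytes by reading the units round-robin."""
    out = []
    depth = max((len(d) for d in devices), default=0)
    for row in range(depth):
        for d in devices:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)
```

Because consecutive units land on different devices, a large read can be served by all devices in parallel.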

8 Performance Constraints
Consistency (in updating, reconciliation)
Locking and serialization
Security
Small data size access
Crash recovery

9 Leveraging Data Access Path
Streamline the end system
– Zero copy/RDMA
– User-level programming/OS bypass
– TCP offloading
Improve the transport system
– Large window size
– Explicit congestion notification
– Selective acknowledgement
– Connection splitting (mobile)
– eXplicit Control Protocol (XCP)

10 What’s Wrong With the End System
Streamlining end systems
– Problem: the end system cannot deliver the available network bandwidth to applications.
– Sources of overhead: memory copy, context switching, interrupt service, checksum generation, protocol processing.

11 End System Overhead
Streamlining the end system: overhead
Per packet
– Protocol processing (execute code, allocate/release buffers)
– Access control
– Interrupt service time for each received packet
– Kernel context switching
Per byte
– Checksum generation
– Memory copy
– Data transmission
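
The per-packet versus per-byte split suggests a simple cost model: total host cost is (number of packets × per-packet cost) + (number of bytes × per-byte cost). The sketch below is illustrative only; the function name and the cost figures used with it are made-up assumptions, not measurements from the talk.

```python
def host_cost_ns(total_bytes, packet_payload, per_packet_ns, per_byte_ns):
    """Estimate end-system cost of a transfer in nanoseconds.

    per_packet_ns covers protocol processing, interrupts, context switches;
    per_byte_ns covers checksumming and memory copies.
    """
    packets = -(-total_bytes // packet_payload)   # ceiling division
    return packets * per_packet_ns + total_bytes * per_byte_ns
```

With a hypothetical 10,000 ns per packet and 1 ns per byte, moving 1,000,000 bytes costs 11,000,000 ns with 1,000-byte payloads but 2,250,000 ns with 8,000-byte payloads. The per-byte term is unchanged, which is why larger packets and interrupt coalescing attack the per-packet term while zero copy attacks the per-byte term.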

12 Streamlining End System
Solutions
– RDMA (zero copy)
– One system-wide buffer pool
– User-level networking (bypassing the kernel)
– TCP offloading
– Jumbo packets
– Interrupt coalescing
– Scatter/gather list
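
As an illustration of the scatter/gather item, the sketch below hands several separate buffers to the kernel in one call using Python's socket.sendmsg and socket.recvmsg_into (available on Unix-like systems). The helper names are assumptions for this example. It avoids joining header and payload into a temporary buffer in user space; it does not by itself remove the copy into the kernel.

```python
import socket


def send_gathered(sock, buffers):
    """Send several separate buffers with a single gather call."""
    return sock.sendmsg(buffers)


def recv_scattered(sock, buffers):
    """Receive into several preallocated buffers with a single scatter call."""
    nbytes, _ancdata, _flags, _addr = sock.recvmsg_into(buffers)
    return nbytes


if __name__ == "__main__":
    a, b = socket.socketpair()
    header, payload = b"HDR1", b"object-data"
    send_gathered(a, [header, payload])

    hdr_buf, body_buf = bytearray(4), bytearray(11)
    recv_scattered(b, [hdr_buf, body_buf])
    print(bytes(hdr_buf), bytes(body_buf))
    a.close()
    b.close()
```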

13 Related Work
Previous work
– I/O Lite
– VI (Myrinet, ServerNet)
– SDP
– InfiniBand
– SRP
– DAT (Direct Access Transport Collaborative)
– DAFS (SNIA)
– NFS/RDMA (SNIA)
– RDMA over TCP/IP

14 I/O Lite
Purpose: reduce memory copies
Approach:
– Maintain a global buffer pool in the system
– Allow applications, IPC, the file system, and the network subsystem to share one copy of the data
Pros:
– Reduces memory copies
– Useful for read-only buffers
Cons:
– Requires rewriting the system
– Buffer updates are difficult

15 RDMA
Extends DMA semantics across the machine boundary
Two operations: RDMA read, RDMA write
Memory registration: memory needs to be “pinned”
A descriptor carries the source address, destination address, and length
Special hardware (the NIC) handles the RDMA operation
Pros:
– Zero copy
– Offloads CPU processing
Cons:
– Needs special hardware
– Needs reprogramming
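
The registration-plus-descriptor model can be mimicked in a single process. This is a toy sketch, not real RDMA: both "hosts" live in one address space, and the names ToyRdmaEngine and Descriptor, as well as the integer region keys, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Carries source, destination, and length, as on the slide."""
    src_key: int
    src_offset: int
    dst_key: int
    dst_offset: int
    length: int


class ToyRdmaEngine:
    """Stands in for the RDMA-capable NIC."""

    def __init__(self):
        self._regions = {}   # key -> memoryview of a registered buffer
        self._next_key = 1

    def register(self, buf):
        """'Pin' a buffer and return the key that descriptors refer to."""
        key = self._next_key
        self._next_key += 1
        self._regions[key] = memoryview(buf)
        return key

    def rdma_write(self, d):
        """Place bytes directly into the destination region, no CPU staging."""
        src = self._regions[d.src_key]    # KeyError if never registered
        dst = self._regions[d.dst_key]
        if (d.src_offset + d.length > len(src)
                or d.dst_offset + d.length > len(dst)):
            raise ValueError("descriptor exceeds registered region")
        dst[d.dst_offset:d.dst_offset + d.length] = \
            src[d.src_offset:d.src_offset + d.length]
```

Only registered memory can be named in a descriptor, which is the property that lets real hardware move data without involving the remote CPU.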

16 Remote DMA Scenario
[Figure: Host A and Host B, each with a CPU, a buffer (Buffer A, Buffer B), and an RDMA engine (NIC); steps 1, 2, and 3 label the transfer.]

17 Virtual Interface Architecture (VIA)
Goal: low latency and high throughput through direct access to the NIC and zero copy
Programming abstraction: VI (queue pair)
Components: consumer, VI provider (UA, KA, NIC)
Operations: RDMA, send/receive
Presents a standard for RDMA operations and the VI abstraction
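
The queue-pair abstraction can be pictured as a send queue, a receive queue, and a stream of completions. The sketch below is a simplified single-process model; the class name QueuePair, its method names, and the tuple-shaped completions are assumptions for this example and do not reproduce the VIA programming interface.

```python
from collections import deque


class QueuePair:
    """Toy VI: the consumer posts work, the 'NIC' matches it with the peer."""

    def __init__(self):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.completions = deque()
        self.peer = None

    def post_recv(self, buf):
        """Post a writable buffer that an incoming message may land in."""
        self.recv_queue.append(buf)

    def post_send(self, data):
        """Post a message to be delivered to the peer."""
        self.send_queue.append(bytes(data))

    def process(self):
        """Play the NIC's role: pair posted sends with the peer's receives."""
        while self.send_queue and self.peer.recv_queue:
            data = self.send_queue.popleft()
            buf = self.peer.recv_queue.popleft()
            n = min(len(data), len(buf))
            buf[:n] = data[:n]
            self.completions.append(("send", n))
            self.peer.completions.append(("recv", n))
```

A send with no matching posted receive simply stays queued here, which is one reason flow control becomes an issue for protocols built on this model.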

18 InfiniBand
An emerging I/O interconnect technology
Decouples I/O from the CPU
Adopts a serial, switch-based fabric
Provides a unified communication mechanism (4 layers)
Provides VI support (verbs, QP, RDMA, etc.)
Implements the VI concept in a standard network

19 SCSI RDMA Protocol (SRP)
Goal: provide SCSI access across an IB fabric
Exploits IB RDMA to transfer SCSI data
Enables a SAN based on IB
Targeted specifically at IB; not suitable for IP
Block-level (SCSI) access (can it be object-level?)

20 DAFS and NFS/RDMA
DAFS
– Being developed by the DAFS consortium
– A lightweight file sharing protocol for local data sharing
– Leverages NFS 4.0
– Exploits the RDMA mechanism to transfer file data
NFS/RDMA
– Being developed by the SNIA NFS/RDMA group
– Enables NFS to exploit new networking technology (VIA, IB)
– Makes changes to RPC/XDR to use RDMA semantics
– Targets the local area environment

21 Sockets Direct Protocol (SDP)
Microsoft’s solution for the data center (2000)
Retains the same socket programming interface
Bypasses TCP/IP processing in the kernel
Supports RDMA semantics
Not routable; works within a data center or cluster

22 RDMA over TCP/IP
Developed by the RDMA Consortium
Supports RDMA over a TCP/IP network
Consists of three components: RDMAP, DDP, MPA
– RDMAP: provides RDMA operations
– DDP: direct data placement
– MPA: handles framing
SCTP (Stream Control Transmission Protocol) is an alternative transport
[Figure: protocol stack, top to bottom: ULP; RDMAP; DDP; MPA over TCP, or SCTP; IP.]
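
Framing is needed because TCP is a byte stream while DDP works on discrete messages. The sketch below shows the general idea with a length prefix, padding to a 4-byte boundary, and a trailing checksum. It is a simplification and not the MPA wire format: it uses zlib's CRC-32 instead of the CRC32c that MPA specifies, omits MPA's optional markers, and the function names are assumptions.

```python
import struct
import zlib


def frame(payload):
    """Wrap one message: 2-byte length, payload, zero padding, 4-byte CRC."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length field")
    pad = -(2 + len(payload)) % 4
    body = struct.pack("!H", len(payload)) + payload + b"\x00" * pad
    return body + struct.pack("!I", zlib.crc32(body))


def deframe(stream):
    """Split a byte stream into payloads; return (payloads, leftover_bytes)."""
    payloads = []
    pos = 0
    while len(stream) - pos >= 2:
        (length,) = struct.unpack_from("!H", stream, pos)
        padded = (2 + length + 3) // 4 * 4
        end = pos + padded + 4
        if end > len(stream):
            break                      # incomplete frame: wait for more bytes
        body = stream[pos:pos + padded]
        (crc,) = struct.unpack_from("!I", stream, pos + padded)
        if crc != zlib.crc32(body):
            raise ValueError("checksum mismatch")
        payloads.append(body[2:2 + length])
        pos = end
    return payloads, stream[pos:]
```

The leftover bytes returned by deframe are what a receiver would keep until the rest of the frame arrives on the stream.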

23 Summary
Link-level
– No routing information carried
– Relies on the underlying link-level switch for forwarding
– Restricted to data center and cluster environments
– Examples: VIA, InfiniBand, SRP, SDP, DAFS, NFS/RDMA
Transport-level
– Carries a TCP/IP header
– Can traverse an IP network
– Handles framing and direct data placement

24 OSD Requirements
Direct delivery from the object device
– Direct transmission between initiator and target device
– This is the critical data path
Secure delivery
– No secure channel is assumed; encryption of the transmitted object is necessary
QoS requirement
– An object may have a specific QoS requirement
Mobile client
– A client may connect, disconnect, and connect again
– Errors can occur during transmission

25 Initial Proposal: OSD/Secure RDMA
This is a ULP-based RDMA
– The RDMA is tightly integrated with the OSD protocol
Leverages RDMA over TCP/IP
– Extends the communication to IP networks
The OSD device initiates the RDMA request
Security-enabled RDMA
– The underlying transport supports security
QoS support
– A virtual lane-type mechanism provides QoS support

26 OSD/Secure RDMA Architecture
[Figure: an OSD client (application, OSD VIPL, buffers, VI, NIC, NIC driver) and an OSD device (OSD controller, OSD VIPL, object manager, buffers, disk driver, VI, NIC, NIC driver) connected through an IP network.]

27 Protocol Stacks
OSD/RDMA maps OSD to RDMA
DDP provides direct data placement
The underlying transport can be either SCTP or MPA with TCP
IPSec is used as the security protocol (object encryption)
[Figure: protocol stack, top to bottom: OSD protocol; OSD/RDMA; DDP; MPA over TCP, or SCTP; IP/IPSec. Other labels in the figure: intelligent NIC, OSD consumer, OSD VIPL consumer.]

28 Data Access Case – Get an Object
[Figure: message sequence between OSD client and OSD device.]
OSD client to OSD device: request an object with object ID, credential, and descriptor (after step 1*)
OSD device to OSD client: RDMA write, data packet (after step 2*)
Completion: RDMAWrCompl
1* (client):
– First obtain access permission and establish a session
– Register memory
– Post a send request
2* (device):
– Validate the request
– Register a memory buffer
– Fetch the object from disk or cache into the buffer
– Post an RDMA write request
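
The get-object exchange can be walked through in a single process. This is a sketch under stated assumptions: the names ToyOsdDevice and client_get, the dictionary-shaped completion, and the string credentials are hypothetical, and a plain memory copy stands in for the device-initiated RDMA write.

```python
class ToyOsdDevice:
    """Stands in for the OSD device side of the exchange (step 2*)."""

    def __init__(self, objects, valid_credentials):
        self.objects = objects                    # object_id -> bytes
        self.valid_credentials = valid_credentials

    def handle_get(self, object_id, credential, client_buffer):
        # Validate the request.
        if credential not in self.valid_credentials:
            return {"status": "denied", "length": 0}
        # Fetch the object from "disk or cache".
        data = self.objects.get(object_id)
        if data is None:
            return {"status": "not_found", "length": 0}
        if len(data) > len(client_buffer):
            return {"status": "buffer_too_small", "length": 0}
        # This copy stands in for the RDMA write into the client's memory.
        client_buffer[:len(data)] = data
        # The returned record plays the role of RDMAWrCompl.
        return {"status": "ok", "length": len(data)}


def client_get(device, object_id, credential, buffer_size):
    """Client side (step 1*): prepare a buffer, send the request, read data."""
    buf = bytearray(buffer_size)                  # "registered" memory
    completion = device.handle_get(object_id, credential, memoryview(buf))
    if completion["status"] != "ok":
        raise RuntimeError(completion["status"])
    return bytes(buf[:completion["length"]])
```

Note that the data never travels in a reply message: the client learns only the completion status and length, and finds the object already placed in its own buffer.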

29 Issues to Be Solved
Elaborate the OSD object transfer protocol
– Should we simply consider SCSI/OSD?
– What would the new requirements be, e.g. security?
The integration of iSCSI over RDMA
– Session establishment: OSD session, iSCSI session, RDMA connection, TCP connection. In what sequence? Persistent or transient?
– Define the format of the OSD/RDMA packet: memory descriptor, commands (login, logout, CMD), flow control
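
To make the packet-format question concrete, here is one possible shape for a fixed header that carries a command and a memory descriptor. Everything in it is hypothetical: the slides leave the format open, so the field names, field widths, opcode values, and the 28-byte layout are assumptions chosen only for illustration.

```python
import struct
from typing import NamedTuple

# Hypothetical layout, network byte order:
# opcode (1), flags (1), session id (2), object id (8),
# then the memory descriptor: steering tag (4), offset (8), length (4).
HEADER = struct.Struct("!BBHQIQI")

OP_LOGIN, OP_LOGOUT, OP_CMD = 1, 2, 3   # made-up opcode values


class OsdRdmaHeader(NamedTuple):
    opcode: int
    flags: int
    session_id: int
    object_id: int
    stag: int
    offset: int
    length: int


def encode(header):
    """Serialize a header into its fixed-size wire form."""
    return HEADER.pack(*header)


def decode(raw):
    """Parse the fixed-size header from the start of raw."""
    return OsdRdmaHeader(*HEADER.unpack(raw[:HEADER.size]))
```

A flow-control field (for example a credit count) could be added in the same way once that part of the design is settled.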

30 Issues
Integration of RDMA with OSD (cont.)
– Define a set of standard APIs for OSD/RDMA: create a session, register memory, post a work queue element, query status, etc.
Integration with security
– IPSec vs. SSL?
Handling QoS requirements
– QoS attributes: how to specify them in an object
– QoS assurance: credit-based flow control?
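
Credit-based flow control, raised above as a candidate for QoS assurance, can be sketched as follows. The class name CreditedSender and the one-credit-per-message rule are assumptions for this illustration; the slides do not specify a scheme.

```python
class CreditedSender:
    """Sends a message only while it holds credits granted by the receiver."""

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.pending = []   # messages waiting for credit
        self.sent = []      # messages that went out, in order

    def submit(self, message):
        """Queue a message and send it right away if credit allows."""
        self.pending.append(message)
        self._drain()

    def grant(self, credits):
        """Receiver returns credits, e.g. after freeing receive buffers."""
        self.credits += credits
        self._drain()

    def _drain(self):
        # Each message sent consumes exactly one credit.
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.pop(0))
            self.credits -= 1
```

Because the receiver controls how many credits each sender holds, it can bound its buffer usage and, in principle, give different objects or clients different shares.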

