Challenges in Getting Flash Drives Closer to CPU Myoungsoo Jung (UT-Dallas) Mahmut Kandemir (PSU) The University of Texas at Dallas.

1 Challenges in Getting Flash Drives Closer to CPU Myoungsoo Jung (UT-Dallas) Mahmut Kandemir (PSU) The University of Texas at Dallas

2 Take-away
Leveraging the PCIe bus as a storage interface
– ≠ conventional memory system interconnects
– ≠ thin storage interfaces
– Requires a new SSD architecture and storage stack
Motivation: few studies focus on the system characteristics of these emerging PCIe SSD platforms.
Contributions: we quantitatively analyze the challenges PCIe SSDs face in getting flash memory closer to the CPU:
1. Memory consumption
2. Computation resource requirement
3. Performance as a shared storage system
4. Latency impact on their storage-level queuing mechanisms

3 Bandwidth Trend
Bandwidth improvement (150 MB/s to 600 MB/s)

4 Bandwidth Trend
SSDs have improved their bandwidth 4x
SSDs begin to blur the distinction between devices with block access semantics and devices with memory access semantics

5 Flash Storage Migration
[Figure labels: Core, Flash, Interface Bottleneck]
Taking SSDs out of the I/O controller hub and locating them as close to the CPU as possible
The PCIe interface is one of the easiest ways to integrate flash memory into the processor-memory complex

6 Flash Integration
1. Bridge-based PCIe SSD (BSSD)
2. From-scratch PCIe SSD (FSSD)

7 Bridge-based PCIe SSD (BSSD)
Multiple traditional SAS/SATA SSD controllers
Bridge controller exposing the aggregated SAS/SATA SSD performance
RC = Root Complex, CTRL = Controller, EP = Endpoint, HBA = Host Bus Adapter
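One rough way to reason about the aggregated performance a bridge controller can expose is to sum the bandwidth of the SAS/SATA SSD controllers behind it and cap the result at the PCIe link bandwidth. The sketch below is only an upper-bound model under stated assumptions: 600 MB/s per SATA SSD is taken from the bandwidth trend slide, about 500 MB/s per PCIe 2.0 lane is an assumed link rate, and the function name, the SSD count, and the lane count are hypothetical and do not come from the presentation.

```python
def bssd_aggregate_bandwidth(ssd_mb_s, pcie_lanes, lane_mb_s=500.0):
    """Upper bound (MB/s) on the bandwidth a bridge-based PCIe SSD can expose.

    ssd_mb_s   -- per-SSD bandwidths behind the bridge (assumed values)
    pcie_lanes -- number of PCIe lanes on the host link (assumed value)
    lane_mb_s  -- usable bandwidth per lane; 500 MB/s is an assumption
                  that roughly matches one PCIe 2.0 lane
    """
    if pcie_lanes <= 0:
        raise ValueError("pcie_lanes must be positive")
    device_side = sum(ssd_mb_s)          # what the SATA/SAS side can deliver
    link_side = pcie_lanes * lane_mb_s   # what the PCIe link can carry
    return min(device_side, link_side)   # the slower side is the bottleneck


if __name__ == "__main__":
    # Four hypothetical 600 MB/s SATA SSDs behind an x8 link.
    print(bssd_aggregate_bandwidth([600.0] * 4, pcie_lanes=8))
```

The model ignores the redundant control logic and encoding/decoding overheads that the next slide lists as drawbacks, so measured throughput would be expected to fall below this bound.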

8 Bridge-based PCIe SSD (BSSD)
PROS
– High compatibility
– Fast development process
CONS
– Redundant control logic
– Computational overheads
– Encoding/decoding overheads

9 From-scratch PCIe SSD (FSSD)
PCIe endpoints (EPs) have upstream and downstream buffers, which control inbound and outbound I/O requests
PCIe EPs and the switch are implemented as a form of native PCIe controller
FSSD is built from the bottom up by directly interconnecting the NAND flash interface and the external PCIe link
Point-to-point PCIe link network

10 From-scratch PCIe SSD (FSSD)
PROS
– Highly scalable
– Exposing flash performance
CONS
– Protocol design/implementation
– Tailoring SW/HW
– Resource competition

11 Flash Software Stack
Host: Database, File System, Block Storage Layer, HBA Device Driver
Logical Block I/O Interface
Storage: Host Interface Layer (NVMHC), Flash Software (FTL), Hardware Abstraction Layer
Other labels: Buffer cache, Address mapping, Wear-leveling

12 Experimental Setup
Host configuration
– Quad-core i7 Sandy Bridge, 3.4 GHz
– External extra HDD (for logging the footprints)
– 16 GB memory (4 GB DDR DIMM × 4)
Most performance values observed with FSSD are about 40% better than those observed with BSSD

13 Tool
Synthesized micro-benchmark workloads of Iometer
Modified Iometer
– Time series evaluation: a script that generates log data every second
– Memory usage evaluation: added a module to Iometer that calls the system API GlobalMemoryStatusEx()
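The memory measurement described above can be approximated outside Iometer. The following is a minimal sketch, not the authors' module: it calls the Windows API GlobalMemoryStatusEx() through Python's ctypes once per second and prints the physical memory in use. The helper names (`used_physical_bytes`, `query_memory_status`, `log_memory`) and the output format are hypothetical; on non-Windows systems the query returns None because the API is not available.

```python
import ctypes
import sys
import time


class MEMORYSTATUSEX(ctypes.Structure):
    """Layout of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def used_physical_bytes(status):
    """Physical memory in use = total physical minus available physical."""
    return status.ullTotalPhys - status.ullAvailPhys


def query_memory_status():
    """Return a filled MEMORYSTATUSEX, or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    # The API requires dwLength to be set before the call.
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return status


def log_memory(samples, interval_s=1.0):
    """Print one 'timestamp used-GiB' line per sample (hypothetical format)."""
    for _ in range(samples):
        status = query_memory_status()
        if status is None:
            print("GlobalMemoryStatusEx is only available on Windows")
            return
        used_gib = used_physical_bytes(status) / 2**30
        print(f"{time.time():.0f} {used_gib:.2f} GiB")
        time.sleep(interval_s)


if __name__ == "__main__":
    log_memory(samples=3)
```

This reports system-wide physical memory use, so a comparison between devices only makes sense when the benchmark is the dominant workload on the host.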

14 Memory Usage (Overall)
[Figure: physical memory consumption vs. request size (1 to 512 sectors); panels: Writes, Reads]
FSSD consumes 3x to 16x more memory space
FSSD consumes 2.5x more memory space
0.6 GB (BSSD)

15 Memory Usage (BSSD)
[Figure: memory consumption]
Submits I/Os whenever the device is available
128 entries
BSSD requires only 0.6 GB of memory space regardless of the I/O type and size.
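One possible reading of the "128 entries" label is a fixed-depth submission queue: the host can only have a bounded number of requests outstanding, so the memory tied up in queued requests stays bounded as well. The sketch below illustrates that idea only. The class and function names are hypothetical, the 512-byte sector size is an assumption, and the result is not meant to reproduce the 0.6 GB figure measured on the slide.

```python
from collections import deque


class BoundedSubmissionQueue:
    """Toy model of a fixed-depth I/O queue (depth of 128 taken from the slide)."""

    def __init__(self, depth=128):
        self.depth = depth
        self.entries = deque()

    def try_submit(self, request):
        """Accept the request only if a queue entry is free."""
        if len(self.entries) >= self.depth:
            return False  # caller must wait until the device completes an I/O
        self.entries.append(request)
        return True

    def complete_one(self):
        """Device finished the oldest request; its entry becomes free."""
        return self.entries.popleft() if self.entries else None


def max_outstanding_bytes(depth, sectors_per_request, sector_bytes=512):
    """Worst-case payload held by a full queue (512 B sectors are an assumption)."""
    return depth * sectors_per_request * sector_bytes


if __name__ == "__main__":
    queue = BoundedSubmissionQueue()
    accepted = sum(queue.try_submit(i) for i in range(200))
    print(accepted)                                 # requests accepted before the queue fills
    print(max_outstanding_bytes(128, 512) / 2**20)  # MiB held by a full queue of 512-sector requests
```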

16 Memory Usage (FSSD)
2 GB memory requirements
10 GB of memory usage just to manage the underlying SSD may not be acceptable in many applications
As the I/O process progresses, memory usage keeps increasing in a logarithmic fashion and reaches 10 GB

17 CPU Usage (BSSD)
[Figure: time series of host-level CPU usage]
BSSD consumes 15% to 30% of total CPU cycles for handling I/O requests

18 CPU Usage (FSSD)
FSSD requires much higher CPU usage (50% to 90%)
CPU usage over 60% for I/O processing alone can degrade overall system performance
60% of the cycles on the host-side CPU
I/O service with queue-mode operation requires 50% more CPU cycles

19 FSSD performance (multi-threads)
[Figure panels: Latency, Throughput]
Worse than four workers by 118%
Worse than a single worker by 289%
2.2x better than a single worker
FSSD offers very stable and predictable performance

20 FSSD resource usages (multi-threads)
[Figure panels: Memory consumption, CPU usage]
The advantage decreases because of high memory requirements and CPU usage
Requires 134% more memory space
Requires 201% more computation resources
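The slides mix two notations: "N% more" (as on this slide) and "Nx" multipliers (as in "2.2x better" on the previous slide). A small conversion makes the two comparable. The helper below is an illustrative sketch with a hypothetical name; it assumes "N% more" means the baseline plus N% of the baseline, whatever baseline the slide's figure uses.

```python
def percent_more_to_factor(percent_more):
    """Convert 'N% more than the baseline' into a multiplier of the baseline."""
    return 1.0 + percent_more / 100.0


if __name__ == "__main__":
    # Figures quoted on this slide.
    print(f"memory: {percent_more_to_factor(134):.2f}x of the baseline")
    print(f"CPU:    {percent_more_to_factor(201):.2f}x of the baseline")
```

Under that assumption, 134% more memory corresponds to 2.34 times the baseline and 201% more computation to 3.01 times the baseline.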

21 BSSD resource usages (multi-threads)
[Figure panels: Memory consumption, CPU usage]
Offers similar memory requirements (less than 0.66 GB) irrespective of the number of threads
Offers similar CPU usage (less than 30%) irrespective of the number of threads

22 BSSD performance (multi-threads)
[Figure panels: Latency, Throughput]
Worse than four workers by 289%
Worse than a single worker by 708%
No differences with a varying number of workers
Write cliff occurs (garbage collection impact)

23 Latency Impact on a Queuing Method
[Figure panels: FSSD, BSSD]
Worse than a legacy request by 106x
Worse than a legacy request by 86x
Worse than a legacy request by 99x
Worse than a legacy request by 184x

24 Summary
Design trade-off between performance and resource utilization
– All-Flash-Array
– Data-center/HPC local node SSD
Software stack optimization
– Co-operative approaches
– Unified/direct file systems
– Garbage collection schedulers
– Queue control
We are constructing an environment for automated SSD evaluation at camelab.org

