1
The NVMe™ Standard: The Next Five Years
Sponsored by the NVM Express® organization, the owner of the NVMe™, NVMe-oF™ and NVMe-MI™ standards
2
Speakers: David Allen; David L. Black, Ph.D., NVMe Board Member
3
Follow NVMe™ https://www.linkedin.com/company/nvmexpress/
nvmexpress.org @NVMexpress
4
About NVM Express®
NVM Express (NVMe™) is an open collection of standards and information to fully expose the benefits of non-volatile memory in all types of computing environments, from mobile to data center. NVMe is designed from the ground up to deliver high-bandwidth, low-latency storage access for current and future NVM technologies.
- NVM Express Base Specification: the register interface and command set for PCI Express-attached storage, with industry-standard software available for numerous operating systems. NVMe™ is widely considered the de facto industry standard for PCIe SSDs.
- NVM Express Management Interface (NVMe-MI™) Specification: the command set and architecture for out-of-band management of NVM Express storage (i.e., discovering, monitoring, and updating NVMe™ devices using a BMC).
- NVM Express over Fabrics (NVMe-oF™) Specification: the extension to NVM Express that enables tunneling the NVM Express command set over additional transports beyond PCIe. NVMe over Fabrics™ brings the benefits of this efficient storage architecture to the world's largest data centers, at scale, by allowing the same protocol to run over various networked interfaces.
5
Introduction: Where Are We Now?
6
Source: Micron
7
NVMe™ Specification Roadmap: 2014 – 2019
Timeline: 2015 to 2019, by quarter
NVMe™ Base
- NVMe™ 1.2 (Nov ’14): Namespace Management, Controller Memory Buffer, Host Memory Buffer, Live Firmware Update
- NVMe™ (May ’16)
- NVMe™ 1.3: Sanitize, Streams, Virtualization
- NVMe™ (next): IO Determinism, Persistent Memory Region, Multipathing
NVMe™ over Fabrics
- NVMe-oF™ 1.0 (May ’16): Transport and protocol, RDMA binding
- NVMe-oF™ (next): Enhanced Discovery, TCP Transport Binding
NVMe™ Management Interface
- NVMe-MI™ 1.0 (Nov ’15): Out-of-band management, Device discovery, Health & temp monitoring, Firmware Update
- NVMe-MI™ 1.1: SES Based Enclosure Management, NVMe-MI™ In-band, Storage Device Enhancements
(Legend: released NVMe specification vs. planned release)
8
NVMe™ (next): Major New Functionality
I/O Determinism; Manageability & Testability Improvements
Projected completion: mid-2019
9
I/O Determinism: Background
“In practice, a single user request may result in thousands of subqueries, with a critical path that is dozens of subqueries long. The fork/join structure of subqueries causes latency outliers to have a disproportionate effect on total latency.” (Ajoux et al., “Challenges to Adopting Stronger Consistency at Scale,” Facebook & USC, 2015)
Great resources from FMS this year!
10
Latency Outliers and Queue Depth
Goal: enable a “QoS wall” without trading away high performance for low queue depth (QD), i.e., eliminate read latency outliers without giving up high performance.
11
I/O Determinism: Eliminate Read Latency Outliers
Media alternates between deterministic and non-deterministic states (windows):
- Deterministic state/window
  - Reads: bounded read latency
  - Host avoids writes; device avoids background activity (e.g., garbage collection, data scrubbing)
  - Exits the deterministic state/window after some time period; writes may cause an immediate exit
- Non-deterministic state/window
  - Reads: unbounded read latency
  - Writes and background operations happen during this state/window
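The alternation above can be sketched as a small state machine. This is a hypothetical Python model, not the NVMe interface: the class name, the time budget `max_deterministic_time`, and the `pending_background_work` counter are illustrative assumptions, and the sketch takes the strictest reading of "writes may cause an immediate exit" by exiting on every write.

```python
from dataclasses import dataclass

DETERMINISTIC = "deterministic"
NON_DETERMINISTIC = "non-deterministic"


@dataclass
class DeterminismWindow:
    """Hypothetical model of one NVM Set alternating between the two windows."""

    max_deterministic_time: float  # assumed time budget before a forced exit
    state: str = DETERMINISTIC
    elapsed: float = 0.0
    pending_background_work: int = 0  # e.g., garbage-collection steps still owed

    def read_latency_bounded(self) -> bool:
        """Reads get bounded latency only in the deterministic window."""
        return self.state == DETERMINISTIC

    def write(self) -> None:
        """A write creates background work; this sketch treats it as an immediate exit."""
        self.pending_background_work += 1
        if self.state == DETERMINISTIC:
            self._enter_non_deterministic()

    def tick(self, dt: float) -> None:
        """Advance time by dt."""
        self.elapsed += dt
        if self.state == DETERMINISTIC:
            if self.elapsed >= self.max_deterministic_time:
                self._enter_non_deterministic()
        elif self.pending_background_work > 0:
            # Background operations run only in the non-deterministic window.
            self.pending_background_work -= 1

    def try_enter_deterministic(self) -> bool:
        """Return to the deterministic window once the owed background work is done."""
        if self.state == NON_DETERMINISTIC and self.pending_background_work == 0:
            self.state = DETERMINISTIC
            self.elapsed = 0.0
            return True
        return False

    def _enter_non_deterministic(self) -> None:
        self.state = NON_DETERMINISTIC
        self.elapsed = 0.0
```

Here `tick()` simply stands in for the passage of time and for the device performing its deferred background work; a host that keeps writes out of the deterministic window keeps its reads on the bounded-latency path.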
12
I/O Determinism: Isolate Independent Workloads
Problem: we don't want the entire device to go non-deterministic at once (too much impact).
Solution: NVM Sets
- A media subset that is logically separate from the media in other NVM Sets
- Individual NVM Sets change determinism state independently
- NVM Sets are constructed by the device vendor to avoid the noisy-neighbor problem: workloads using one NVM Set do not impact others (example: separate channels and/or dies)
From the spec: “An NVM Set is a collection of NVM that is separate (logically and potentially physically) from NVM in other NVM Sets. One or more namespaces may be created within an NVM Set and those namespaces inherit the attributes of the NVM Set. A namespace is wholly contained within a single NVM Set and shall not span more than one NVM Set. Figure Fig4.TBDa shows an example of three NVM Sets. NVM Set A contains three namespaces (NS A1, NS A2, and NS A3). NVM Set B contains two namespaces (NS B1 and NS B2). NVM Set C contains one namespace (NS C1). Each NVM Set shown also contains ‘Unallocated’ regions that consist of NVM that is not yet allocated to a namespace.”
From the Facebook presentation: an abstract allocation of SSD hardware resources
- Each set has dedicated NAND resources
- Each set can have dedicated channels, depending on the architecture
- Each set carries out its own writes and background operations
- Physically isolated to avoid “collisions” caused by noisy neighbors
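The containment rule quoted from the spec (a namespace lives wholly inside one NVM Set, and the rest of each set stays unallocated) can be sketched as a tiny allocator. This is a hypothetical Python model: the class names, the allocation units, and the set sizes are assumptions, and only the layout of the spec's example (3, 2, and 1 namespaces in Sets A, B, and C) comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class NVMSet:
    """Hypothetical model of one NVM Set; capacity is in arbitrary allocation units."""

    set_id: str
    capacity: int
    namespaces: dict = field(default_factory=dict)  # namespace id -> size

    @property
    def unallocated(self) -> int:
        """NVM in this set not yet allocated to any namespace."""
        return self.capacity - sum(self.namespaces.values())


class Subsystem:
    def __init__(self, nvm_sets):
        self.sets = {s.set_id: s for s in nvm_sets}
        self.owner = {}  # namespace id -> id of the one NVM Set that contains it

    def create_namespace(self, ns_id: str, set_id: str, size: int) -> None:
        """Allocate a namespace wholly inside a single NVM Set."""
        if ns_id in self.owner:
            raise ValueError(f"{ns_id} already exists in NVM Set {self.owner[ns_id]}")
        nvm_set = self.sets[set_id]
        if size > nvm_set.unallocated:
            # No spilling into a neighboring set: a namespace never spans NVM Sets.
            raise ValueError(f"NVM Set {set_id} has only {nvm_set.unallocated} units free")
        nvm_set.namespaces[ns_id] = size
        self.owner[ns_id] = set_id


# The spec's example layout: three NVM Sets holding 3, 2, and 1 namespaces.
sub = Subsystem([NVMSet("A", 100), NVMSet("B", 100), NVMSet("C", 100)])
for ns_id in ("A1", "A2", "A3"):
    sub.create_namespace(ns_id, "A", 20)
sub.create_namespace("B1", "B", 30)
sub.create_namespace("B2", "B", 30)
sub.create_namespace("C1", "C", 50)
```

Rejecting an oversized request, rather than borrowing space from another set, is what keeps one set's writes and background work physically away from its neighbors in this model.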
13
Manageability & Testability Improvements
- Persistent Event Log: record interesting events and errors. Important use case: system vendor triage of returned SSDs.
- Rebuild Assist: help the host deal with failures. Proactive scan/scrub detection of media failures (e.g., a degraded cell); react to component failures (e.g., a die or channel).
- Verify Command: read and discard data (to check data integrity). Improves testing for read errors, because discarding the data visibly shortens test time.
- Administrative Controller: an NVMe™ controller that doesn't do I/O. Important use case: enclosure management via NVMe-MI™.
14
NVMe™ (next) Well Underway
NVMe (next) development status, 2017 to 2019 by quarter: 8 TPs in Phase 1, 11 in Phase 2, 2 in Phase 3, 22 ratified
- NVMe 1.3 (May ’17): Device Self Test, Set Timestamp, Boot Partitions, Sanitize, Error Log Updates, Globally Unique NGUID/EUI64, SGL Dword Simplify, Streams Directive, Device Telemetry, Virtualization, Host Controlled Thermal Mgmt, NVMe-MI Tunneling
- NVMe (next): Grab Bag (incl. Strict Mode), Para-virtualized Dev Support, IO Determinism, Persistent Event Log, Rebuild Assist, Verify Command, Administrative Controller, Sanitize (secure erase) extension, Persistent Memory Region, Namespace Write Protect, Enhanced Command Retry, and many more …
22 TPs already ratified; ratified TPs are available publicly at: resources/specifications/
(Legend: released NVMe specification vs. planned release. * Subject to change)
15
Beyond NVMe™ (next)
Theme: Diversification of Non-Volatile Media and I/O Functionality
16
Diversification
Non-Volatile Media: it's no longer just NAND flash
- SCM (Storage Class Memory)
- QLC (Quad Level Cell) NAND
- Endurance (DWPD = Drive Writes Per Day): a range of more than 2 orders of magnitude (100x), and increasing
I/O Functionality: new commands, e.g.:
- Zoned Namespaces
- Key Value
Expect interesting combinations.
17
Non-Volatile Media – Two Recent Developments
SCM (Storage Class Memory), e.g., 3D XPoint, Magnetic RAM
- Very fast access, fine-grained overwrite-in-place
- Endurance: 10s of DWPD possible
- NVMe™ direction: expose parallelism for the host to exploit; better performance and isolation, based on the host knowing what doesn't interfere
QLC (Quad Level Cell) NAND Flash
- Slower access; write & erase behavior differs from SLC/MLC/TLC NAND
- Endurance: may be < 0.5 DWPD
- NVMe™ direction: expose endurance for the host to manage; better endurance, based on more host control of write amplification
New I/O functionality applies to all types of non-volatile media; SCM and QLC are important motivating examples.
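To make the DWPD figures concrete, the usual conversion multiplies the DWPD rating by capacity and by the number of days in the warranty period. The sketch below applies it to two hypothetical drives: the 0.3 and 30 DWPD ratings, the 8 TB capacity, and the 5-year warranty are assumptions chosen to fall inside the ranges on the slides, not figures for any real product.

```python
import math


def rated_writes_tb(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total writes (TB) implied by a DWPD rating: DWPD x capacity x days of warranty."""
    return dwpd * capacity_tb * 365 * warranty_years


# Hypothetical drives: same capacity and warranty, different media.
CAPACITY_TB = 8
WARRANTY_YEARS = 5
qlc = rated_writes_tb(0.3, CAPACITY_TB, WARRANTY_YEARS)  # QLC: "may be < 0.5 DWPD"
scm = rated_writes_tb(30, CAPACITY_TB, WARRANTY_YEARS)   # SCM: "10s of DWPD possible"

# A 100x endurance gap is two orders of magnitude.
orders_of_magnitude = math.log10(scm / qlc)
print(f"QLC: {qlc:,.0f} TB, SCM: {scm:,.0f} TB, gap: {orders_of_magnitude:.1f} orders of magnitude")
```

Because capacity and warranty cancel out in the ratio, the gap between the two drives depends only on their DWPD ratings.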
18
Exposing Parallelism and Endurance
Moving beyond I/O Determinism: Endurance Group Management
- Enables explicit combination of media chunks (aka Parallel Units) to construct namespaces, e.g., based on SSD channel/die structure
- Endurance Groups are an underlying abstraction for NVM Sets
Moving beyond Streams: Zoned Namespaces
- Expose NAND sequential write and block erase via new commands
- Reuse the zone concept and states from SMR (Shingled Magnetic Recording) HDD (disk) standards
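The core zone rule (writes must land at the zone's write pointer, and space comes back only when the whole zone is reset, much like a NAND block erase) can be sketched in a few lines. This is a simplified, hypothetical model: it collapses the richer set of zone states from the SMR standards into EMPTY, OPEN, and FULL, and the class and method names are illustrative, not NVMe command names.

```python
class Zone:
    """Simplified zone: sequential writes at a write pointer, space reclaimed only by reset."""

    def __init__(self, start_lba: int, size: int):
        self.start = start_lba
        self.size = size
        self.write_pointer = start_lba
        self.blocks = {}  # lba -> data

    @property
    def state(self) -> str:
        if self.write_pointer == self.start:
            return "EMPTY"
        if self.write_pointer == self.start + self.size:
            return "FULL"
        return "OPEN"

    def write(self, lba: int, data: list) -> None:
        """Accept a write only if it starts exactly at the write pointer."""
        if lba != self.write_pointer:
            raise ValueError(f"write at LBA {lba}, but write pointer is {self.write_pointer}")
        if lba + len(data) > self.start + self.size:
            raise ValueError("write would cross the zone boundary")
        for offset, block in enumerate(data):
            self.blocks[lba + offset] = block
        self.write_pointer += len(data)

    def read(self, lba: int):
        return self.blocks[lba]

    def reset(self) -> None:
        """Like a NAND block erase: the whole zone becomes writable from the start again."""
        self.blocks.clear()
        self.write_pointer = self.start
```

An attempt to overwrite LBA `start` after data has been written is rejected, which is how this sketch expresses that zones do not allow overwrite-in-place; the host, not the device, decides when a zone is worth resetting, which is where the extra control over write amplification comes from.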
19
Key Value I/O Commands
Motivations: new NoSQL databases (e.g., RocksDB) and scale-out storage. Both use an internal Key/Value abstraction:
- Write: Key/Value pair
- Read: present a key to storage, get the value back
Native Key/Value commands remove translation layers from the storage stack.
[Diagram: today, a Key/Value pair is translated into an LBA and written to the SSD, which maps LBA to PBA; with native Key/Value commands, the SSD maps KV directly to PBA]
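The diagram's point (two mapping layers versus one) can be sketched with dictionaries standing in for mapping tables. This is a toy, hypothetical model: the class names are illustrative, and real flash translation layers and KV SSDs are far more involved; only the KV→LBA→PBA versus KV→PBA structure comes from the slide.

```python
class BlockSSD:
    """Block SSD: its FTL maps each LBA to a physical block address (PBA)."""

    def __init__(self):
        self.lba_to_pba = {}
        self.media = {}
        self.next_pba = 0

    def write(self, lba: int, value: bytes) -> None:
        self.media[self.next_pba] = value  # NAND-style out-of-place write
        self.lba_to_pba[lba] = self.next_pba
        self.next_pba += 1

    def read(self, lba: int) -> bytes:
        return self.media[self.lba_to_pba[lba]]


class KVOverBlock:
    """Host-side KV layer: must translate each key into an LBA first."""

    def __init__(self, ssd: BlockSSD):
        self.ssd = ssd
        self.key_to_lba = {}

    def put(self, key: bytes, value: bytes) -> None:
        # A new key gets the next free LBA; an existing key keeps its LBA.
        lba = self.key_to_lba.setdefault(key, len(self.key_to_lba))
        self.ssd.write(lba, value)

    def get(self, key: bytes) -> bytes:
        return self.ssd.read(self.key_to_lba[key])


class NativeKVSSD:
    """Native KV SSD: maps each key straight to a PBA, one translation layer."""

    def __init__(self):
        self.key_to_pba = {}
        self.media = {}
        self.next_pba = 0

    def put(self, key: bytes, value: bytes) -> None:
        self.media[self.next_pba] = value
        self.key_to_pba[key] = self.next_pba
        self.next_pba += 1

    def get(self, key: bytes) -> bytes:
        return self.media[self.key_to_pba[key]]
```

Both stores return the same values, but the layered version keeps and updates two tables per key (`key_to_lba` on the host, `lba_to_pba` in the device) where the native one keeps a single `key_to_pba` table.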
20
Beyond NVMe™ (next): Summary
- Endurance Group Management
- More I/O functionality: Zoned Namespaces, Key Value
- More enhancements in familiar areas, e.g., Security, Manageability and Testability
- And whatever else needs attention
21
NVMe™ Management Interface (NVMe-MI™) 1.1
In-Band NVMe-MI, Multi-SSD Devices, Enclosure Management
Projected completion: late Q1/early Q2 2019
22
In-Band Management and NVMe-MI™
The in-band mechanism allows an application to tunnel NVMe-MI™ commands through the NVMe™ driver, using two new NVMe Admin commands:
- NVMe-MI Send
- NVMe-MI Receive
Benefits:
- Provides management capabilities not available in-band via NVMe commands
- Efficient NVM Subsystem health status reporting
- Ability to manage NVMe at a FRU level
- Vital Product Data (VPD) access
- Enclosure management
23
NVMe™ Storage Device with Multiple NVM Subsystems (not just one SSD)
[Photos: carrier board from Amfeltec; carrier board from Facebook]
24
SES-Based Enclosure Management
While the NVMe™ and SCSI architectures differ, the elements of an enclosure and the capabilities required to manage those elements are the same.
- Example enclosure elements: power supplies, fans, displays or indicators, locks, temperature sensors, current sensors, voltage sensors, and ports
- Comprehensive enclosure management leverages SCSI Enclosure Services (SES), a standard developed by T10 for managing enclosures using the SCSI architecture
- Motivation for the Administrative Controller (which doesn't do I/O): an Administrative Controller for the enclosure services process, reached using in-band NVMe-MI™ from the host
25
NVMe-MI™ over NVMe-oF™ (Beyond NVMe-MI 1.1)
- Vision: all NVMe™ management via NVMe-MI™
- Gap: NVMe over Fabrics™ management
- Status: the plumbing for NVMe-MI over NVMe-oF is in place; the details still need to be specified
26
NVMe over Fabrics™ [NVMe-oF™] (next)
NVMe™/TCP, ANA Multipathing, Authentication, Enhanced Discovery
Projected completion: late Q1/early Q2 2019
27
NVMe™/TCP – Now ratified!!
[Diagram: NVMe™ host software → host-side transport abstraction → Fibre Channel, InfiniBand*, RoCE, iWARP, TCP, next-gen fabrics → controller-side transport abstraction]
- NVMe block storage protocol over standard TCP/IP transport
- Enables disaggregation of NVMe SSDs without compromising latency and without requiring changes to networking infrastructure
- Independently scale storage & compute to maximize resource utilization and optimize for specific workload requirements
- Maintains the NVMe model: subsystems, controllers, namespaces, admin queues, data queues
28
NVMe™/TCP in a Nutshell
- NVMe-oF™ commands sent over standard TCP/IP sockets
- Each NVMe queue pair mapped to a TCP connection
- TCP provides a reliable transport layer for the NVMe queueing model
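The queue-pair-to-connection mapping can be sketched as a host object that opens one connection per queue, with queue 0 as the admin queue. This is a hypothetical sketch: a local `socket.socketpair()` stands in for each real TCP connection, the command payloads are placeholder byte strings, and the PDU framing and connection setup that real NVMe/TCP performs are omitted.

```python
import socket


class NvmeTcpHostSketch:
    """Hypothetical host: one connection per NVMe queue pair (queue 0 = admin queue)."""

    def __init__(self, num_io_queues: int):
        self.host_ends = {}
        self.controller_ends = {}
        for qid in range(num_io_queues + 1):
            # A local socketpair stands in for a real TCP connection to the controller.
            host_end, controller_end = socket.socketpair()
            self.host_ends[qid] = host_end
            self.controller_ends[qid] = controller_end

    def submit(self, qid: int, command: bytes) -> None:
        """Send a command only on its own queue's connection."""
        self.host_ends[qid].sendall(command)

    def close(self) -> None:
        for sock in [*self.host_ends.values(), *self.controller_ends.values()]:
            sock.close()


host = NvmeTcpHostSketch(num_io_queues=2)
host.submit(0, b"IDENTIFY")  # placeholder admin command
host.submit(1, b"READ")      # placeholder I/O commands
host.submit(2, b"WRITE")
```

Giving each queue pair its own connection keeps the queues independent: TCP's in-order delivery preserves each queue's command order, and a slow or stalled connection holds up only its own queue.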
29
NVMe™/TCP Usage
- Enables NVMe-oF™ I/O operations in existing IP data center environments
- Software-only NVMe host driver with NVMe-TCP transport
- Provides an NVMe-oF alternative to iSCSI for storage systems with PCIe NVMe SSDs
- More efficient end-to-end NVMe operations, by eliminating SCSI-to-NVMe translations
- Coexists with other NVMe-oF transports; transport selection may be based on h/w support and/or policy
30
NVMe™ Multipathing and Namespace Sharing
- NVMe multipath I/O refers to two or more completely independent PCI Express paths between a single host and a namespace
- Namespace sharing enables two or more hosts to access a common shared namespace using different NVM Express controllers
- Both multipath I/O and namespace sharing require that the NVM subsystem contain two or more controllers
[Diagram: NVMe™ multipathing, with a single host reaching one namespace through multiple ports and controllers; namespace sharing, with Hosts A, B, and C reaching one shared namespace through different controllers]
31
Symmetric and Asymmetric Multipathing
Simple: Symmetric Multipathing, all paths equivalent
- Send any I/O on any path, in principle … in practice, drivers show some favoritism
- Many storage systems work better if one storage processor is fully loaded before I/O is sent to another, e.g., due to cache effects
- The NVMe standard has support for Symmetric Multipathing … but there's more …
Complex: Asymmetric Multipathing, some paths are better than others, e.g.:
- Failover-only paths: no I/O unless other paths fail
- Sub-optimal paths: e.g., I/O is forwarded across storage processors (heads)
NEW: Asymmetric Namespace Access (ANA), NVMe standard support for Asymmetric Multipathing
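With ANA, each controller reports an access state for a namespace, and the host picks paths accordingly. The sketch below is a minimal host-side policy under stated assumptions: the state names mirror the ANA states (optimized, non-optimized, inaccessible, persistent loss, change), but the selection policy and the controller names are hypothetical, and it deliberately ignores load balancing among equally good paths.

```python
from enum import Enum
from typing import Dict, Optional


class AnaState(Enum):
    """ANA states a controller can report for a namespace."""
    OPTIMIZED = "optimized"
    NON_OPTIMIZED = "non-optimized"
    INACCESSIBLE = "inaccessible"
    PERSISTENT_LOSS = "persistent loss"
    CHANGE = "change"


# Only these two states carry I/O in this sketch; optimized paths are preferred.
PREFERENCE = (AnaState.OPTIMIZED, AnaState.NON_OPTIMIZED)


def select_path(paths: Dict[str, AnaState]) -> Optional[str]:
    """Return an optimized path if any, else a non-optimized one, else None."""
    for wanted in PREFERENCE:
        # Sorted order makes the choice deterministic (a deliberate, simple "favoritism").
        for path in sorted(paths):
            if paths[path] is wanted:
                return path
    return None
```

When the optimized controller becomes inaccessible, the same call falls back to the non-optimized one, which is the asymmetric behavior that plain symmetric multipathing cannot express.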
32
NVMe-oF™ In-band Authentication (Authentication = Proof of Identity)
Goal: replace the unimplemented TCG-based authentication in NVMe-oF 1.0
NVMe-oF (next): native NVMe-oF authentication, based on FC and iSCSI experience:
- DH-CHAP: adapted from Fibre Channel, originally designed for iSCSI. NVMe-oF: DH-HMAC-CHAP, with cryptographic strengthening
- SRP: adapted and updated from iSCSI. NVMe-oF: N-SRP, with cryptographic strengthening
Both authentication protocols use shared-secret credentials, the same class of credentials as common FC and iSCSI authentication. Certificates are not supported.
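The shared-secret idea behind CHAP-style authentication can be sketched as a generic HMAC challenge-response: one side sends a fresh random challenge, the other proves it knows the secret by returning a keyed hash, and the secret itself never crosses the wire. This is not DH-HMAC-CHAP: the real protocol defines its own message formats, hash negotiation, and a Diffie-Hellman exchange, all omitted here; the function names and the example secret are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Authenticator side: a fresh random challenge for each transaction."""
    return secrets.token_bytes(32)


def chap_response(secret: bytes, transaction_id: int, challenge: bytes) -> bytes:
    """HMAC-SHA-256 over transaction id and challenge, keyed by the shared secret."""
    message = transaction_id.to_bytes(4, "big") + challenge
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify(secret: bytes, transaction_id: int, challenge: bytes, response: bytes) -> bool:
    """Recompute the expected response and compare in constant time."""
    expected = chap_response(secret, transaction_id, challenge)
    return hmac.compare_digest(expected, response)


shared_secret = b"example-shared-secret"  # hypothetical credential provisioned on both sides
challenge = new_challenge()
response = chap_response(shared_secret, 1, challenge)  # computed by the party proving identity
```

A fresh challenge per transaction prevents replaying an old response, and the constant-time comparison avoids leaking how many bytes of a guess were correct.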
33
Enhanced Discovery: NVMe-oF™ (next)
How do I connect storage consumers to storage suppliers?
- Specification enhancement for efficient, dynamic resource management
- Fabric-transport-specific mechanisms to determine where to get provisioning information from
- Allows the fabric to tell hosts when something changes
- Allows hosts to dynamically discover new resources, or adapt to the removal of resources from the NVMe-oF™ environment
- Dynamically find new paths, or learn when old paths go away
- Now possible over RDMA and TCP as well as FC
Special thanks: Fred Knight, NetApp; Phil Cayton, Intel
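Once the fabric tells a host that something changed, the host still has to work out what changed. Assuming the host re-reads the list of reachable subsystem ports, a set difference between the old and new snapshots yields the new paths and the paths that went away. The field names borrow discovery log terminology (subsystem NQN, transport type, transport address, service ID), but the record layout, the NQN, the addresses (from the 192.0.2.0/24 documentation range), and the port are illustrative assumptions.

```python
from typing import Iterable, List, NamedTuple, Tuple


class DiscoveryEntry(NamedTuple):
    """One reachable subsystem port, loosely modeled on a discovery log entry."""
    subnqn: str
    trtype: str   # transport, e.g. "tcp", "rdma", "fc"
    traddr: str
    trsvcid: str


def diff_discovery(
    old: Iterable[DiscoveryEntry], new: Iterable[DiscoveryEntry]
) -> Tuple[List[DiscoveryEntry], List[DiscoveryEntry]]:
    """Return (paths that appeared, paths that went away) between two snapshots."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)


NQN = "nqn.2014-08.com.example:subsys1"  # hypothetical subsystem name
before = [
    DiscoveryEntry(NQN, "tcp", "192.0.2.10", "4420"),
    DiscoveryEntry(NQN, "rdma", "192.0.2.11", "4420"),
]
after = [
    DiscoveryEntry(NQN, "tcp", "192.0.2.10", "4420"),
    DiscoveryEntry(NQN, "tcp", "192.0.2.12", "4420"),
]
added, removed = diff_discovery(before, after)
```

The host would then connect to each entry in `added` and tear down its connections for each entry in `removed`; an unchanged entry triggers no work at all.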
34
NVMe-oF™ Discovery & Mgmt. – Beyond NVMe-oF (next)
To support large-scale deployment of NVMe-oF:
- Specification enhancement for efficient, dynamic resource management
- Fabric-transport-specific mechanisms to determine where to get provisioning information from
- Linux kernel driver stack changes as the specification evolves
- Management tools to enable NVMe-oF management and scale-out
- Automatically finding the discovery root (currently manually configured)
- Discovery support for multiple fabric installations (e.g., a FABRIC ID in the discovery service, so that given a name and a port, you know which fabric it's connected to)
Discovery is still just discovery: it is NOT about configuration management or provisioning of storage or fabric connectivity.
Special thanks: Fred Knight, NetApp
35
NVMe™ Applications
36
NVMe enables faster performance and greater density than legacy protocols. It's geared for applications that require top performance:
- Data-centric computing, supercomputing, artificial intelligence, machine learning, virtual and augmented reality, real-time data analytics, online trading platforms, and other latency-sensitive workloads
- Other developments are just appearing on the horizon
IDC predicts that between 60% and 70% of Fortune 2000 companies will have at least one mission-critical workload that leverages real-time big data analytics by 2020.
37
Summary – Next Releases
NVMe™ (next)
- Better exploit media: I/O Determinism
- Manageability and testability improvements
- And much more …
NVMe-oF™ (next)
- TCP Transport Binding
- Asymmetric Multipathing (ANA)
- Enhanced Discovery
NVMe-MI™ 1.1
- In-band NVMe commands for NVMe-MI functionality
- NVMe enclosure management
- Managing multi-SSD NVMe storage devices, e.g., based on carrier cards
38
Future Vision Beyond Next Releases
- NVMe™: diversity in NVM media and I/O functionality. Emphasis: more I/O functionality, improved endurance management
- NVMe-MI™: support all NVMe-related management. Fabrics emphasis: NVMe-MI™ over NVMe-oF™
- NVMe-oF™: enhance management, orchestration, and automation. Initial emphasis: discovery enhancements
39
Summary - The Future of NVMe™
- Continue to focus on the features that the market demands, plus the ability to tailor them for competitive advantage
- As we contend with the perpetual growth of data, we need to constantly rethink how data is captured, preserved, accessed, and transformed
- Performance, economics, and endurance of data at scale are paramount
- NVMe is having a great impact on businesses and what they can do with data, particularly Fast Data for real-time analytics and emerging technologies