Agenda
  Definitions
  What is a SAN Fabric
  What is a storage array
  Front-end connections
  Controllers
  Back-end connections
  Physical Disks
  Management
  Performance
  Future – Distributed storage
Definitions
  SAN – Storage Area Network
    Generally used as a catch-all term for all of the following definitions
    For storage personnel, SAN does NOT equal storage array
  LUN – Logical Unit Number, also known as a volume
  WWN – World Wide Name
    The MAC address equivalent for storage networks
  Fabric – Network that connects hosts to storage
  iSCSI – Internet SCSI
  SCSI – Small Computer System Interface
  FC – Fibre Channel
  FCoE – Fibre Channel over Ethernet
  FCIP – Fibre Channel over IP
  Storage Array – Storage device that provides block-level access to volumes
  DAS/DASD – Direct Attached Storage
    Storage directly attached to a server without any network
  NAS – Network Attached Storage
    Storage device that provides file-level access to volumes
  RAID – Redundant Array of Independent Disks
    A way to combine multiple physical disks into a logical entity providing different performance and protection characteristics
What is a SAN fabric
  A network comprising hosts, the storage arrays they access, and the storage switches that provide the network connectivity
SAN Fabric Details
  A SAN Fabric has hosts that connect to the network
    Each host has a physical connection and some logical addresses
    pWWN (Port WWN) is the MAC address equivalent for the host port that is connected to the network
    FCID is a dynamic address that also represents the connection
      Only HP-UX 11v2 and below use this
  Typically hosts connect into a storage switch
    These look like traditional network switches in many ways and operate the same way
    These switches contain both host ports and storage ports, or in storage terms, initiators and targets
  Storage arrays that provide storage also connect into these switches to complete the network
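Because a pWWN plays the same role as a MAC address, it is usually written the same way: as colon-separated hex pairs. The sketch below normalizes a WWN into that form. It assumes the common 64-bit (8-byte) port WWN; the function name `normalize_wwn` and the sample value are hypothetical and not taken from any vendor tool.

```python
import re


def normalize_wwn(raw):
    """Return a 64-bit WWN as lower-case, colon-separated hex pairs.

    Assumption: the WWN is the 8-byte form used for host and array ports.
    """
    # Drop the separators people commonly type (colons, dashes, spaces).
    digits = re.sub(r"[:\-\s]", "", raw).lower()
    if not re.fullmatch(r"[0-9a-f]{16}", digits):
        raise ValueError(f"not a 64-bit WWN: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


# Hypothetical host port WWN, typed without separators.
print(normalize_wwn("10000000C9AABB01"))
```

Normalizing before creating aliases or zones avoids treating the same port as two different WWNs because of case or separator differences.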
What is a storage array?
  A storage array is a system whose components together make storage available for consumption
  The components are front-end ports, controllers, back-end ports, and physical disk drives
Front-end connections
  Front-end connections are used by individual hosts to connect to the storage array and use the available volumes
    Hosts can be directly connected in a small or medium-size SAN, or in a DAS environment
  The physical transport mechanism can be fibre or copper
  The logical transport protocols can be block-level protocols such as iSCSI, FC, or FCoE
    Some arrays also support file-level protocols, as NAS devices do
  The larger arrays tend to have more front-end connections to aggregate bandwidth and provide load balancing
  Volumes are typically presented to hosts via one or more front-end connections
Controllers
  Controllers are the brains that translate requests from the front-end ports and determine how to fulfill them
  Controllers run code optimized for moving data and performing the mathematical calculations needed to support RAID levels
  Controllers also have a certain amount of on-board memory, or cache, to help reduce the amount of data that has to come from spinning disks
    Many arrays perform some level of read-ahead caching and write caching to optimize performance
  They also have diagnostic routines and management functions to support the operation of the array
Back-end connections
  Back-end connections run from the controllers to the physical disk shelves or disks
  They carry the actual commands that tell the disks to retrieve or write blocks of data
  These connections are usually transparent to all but the most sophisticated storage engineers
  Often they have specific fan-out ratios, where each disk shelf may have two or four connections and split the available bandwidth in some way
  Back-end connections are rarely a bottleneck
Physical Disks
  These days physical disks come in all shapes and sizes
  Spinning drives come in capacities anywhere from 146GB to 3TB, with the space increasing year over year (though not performance)
    These drives also come in various rotational speeds, anywhere from 5400 RPM in a laptop drive to 15,000 RPM in an enterprise-class drive, which directly affects performance
  Non-spinning drives, also known as SSDs, come in capacities that don't yet match spinning drives, though there are SSD cards that have up to 960GB of storage space available
  These physical disks directly impact the performance of the storage array, and are usually the bottleneck for most enterprise-class storage systems
Provisioning
  Provisioning storage is a multi-step process
    Configure the host with any required software, including multi-path support
    Alias the host port WWN
    Zone the host port alias to a storage array WWN
    Activate the updated zone information
    Create the host representation on the storage array
    Create the volume on the storage array
    Present/LUN mask the volume to the correct host
    Format the volume for use
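The provisioning steps above can be sketched as a toy in-memory model. It is not any vendor's CLI or API; the class names, method names, and WWNs are hypothetical. It illustrates one point: a host sees a volume only once the zone is active on the fabric and the volume is LUN masked to that host on the array.

```python
from dataclasses import dataclass, field


@dataclass
class Fabric:
    """Toy model of the switch side: aliases and zones."""
    aliases: dict = field(default_factory=dict)       # alias -> WWN
    pending_zones: set = field(default_factory=set)   # defined, not yet active
    active_zones: set = field(default_factory=set)    # enforced by the fabric

    def add_alias(self, alias, wwn):
        self.aliases[alias] = wwn

    def zone(self, host_alias, array_alias):
        pair = (self.aliases[host_alias], self.aliases[array_alias])
        self.pending_zones.add(pair)

    def activate(self):
        # Zoning changes take effect only when the zone set is activated.
        self.active_zones |= self.pending_zones
        self.pending_zones.clear()


@dataclass
class StorageArray:
    """Toy model of the array side: hosts, volumes, and LUN masks."""
    port_wwn: str
    hosts: dict = field(default_factory=dict)     # host name -> host port WWN
    volumes: dict = field(default_factory=dict)   # volume name -> size in GB
    masks: set = field(default_factory=set)       # (volume name, host name)

    def create_host(self, name, wwn):
        self.hosts[name] = wwn

    def create_volume(self, name, size_gb):
        self.volumes[name] = size_gb

    def present(self, volume, host):
        if volume not in self.volumes or host not in self.hosts:
            raise KeyError("create the host and volume before masking")
        self.masks.add((volume, host))


def host_can_see(fabric, array, host, volume):
    """True only when the host is both zoned and LUN masked."""
    zoned = (array.hosts.get(host), array.port_wwn) in fabric.active_zones
    masked = (volume, host) in array.masks
    return zoned and masked


# Hypothetical WWNs for illustration only.
HOST_WWN = "10:00:00:00:c9:aa:bb:01"
ARRAY_WWN = "50:06:01:60:aa:bb:cc:01"

fabric = Fabric()
fabric.add_alias("web01_hba0", HOST_WWN)
fabric.add_alias("array_fe0", ARRAY_WWN)
fabric.zone("web01_hba0", "array_fe0")

array = StorageArray(ARRAY_WWN)
array.create_host("web01", HOST_WWN)
array.create_volume("vol1", 100)
array.present("vol1", "web01")

print(host_can_see(fabric, array, "web01", "vol1"))  # zone not yet activated
fabric.activate()
print(host_can_see(fabric, array, "web01", "vol1"))  # zoned and masked
```

Keeping pending and active zones separate mirrors the "activate" step in the list: forgetting it is a common reason a correctly masked volume never appears on the host.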
Performance
  There are many statistics you can use to monitor your storage devices, but two key ones tend to impact performance more directly than most
  IOPS – Input/Output Operations Per Second
    This is based on the number of disks that support the volume being used and the RAID level of the volume
    A 15k RPM disk provides about 200 IOPS raw, without any RAID write penalty
    RAID 1 has a 1:2 ratio for writes. For every 1 write command sent to the array, 2 commands are sent to the disks
    RAID 5 has a 1:4 ratio, while RAID 6 has a 1:6 ratio
      RAID 6 write sequence: read existing data block, read parity 1, read parity 2, calculate XOR (parity, which is not I/O), write data, write parity 1, write parity 2
    Read commands are always 1:1
    For an application that requires 10,000 IOPS with a 50/50 read-to-write ratio on a RAID 6 volume:
      5,000 read IOPS, translating into 25 physical disks
      5,000 write IOPS, translating into 30,000 back-end operations requiring 150 physical disks
      Total requirement is 175 physical disks just to support the performance needed!
  Bandwidth
    This is based on the speed of the connections from the host to the array, as well as how much oversubscription is taking place within the SAN Fabric
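The RAID sizing arithmetic from the IOPS example can be reproduced with a short sketch. The write penalties and the 200 IOPS per 15k RPM disk are the slide's figures; the function name `disks_required` and its parameters are hypothetical, and the model ignores controller cache, which in practice absorbs some of this load.

```python
import math

# Back-end disk operations generated by one front-end write (from the slide).
WRITE_PENALTY = {"raid1": 2, "raid5": 4, "raid6": 6}


def disks_required(total_iops, read_fraction, raid_level, iops_per_disk=200):
    """Estimate the physical disks needed to sustain a front-end IOPS target."""
    read_iops = total_iops * read_fraction
    write_iops = total_iops - read_iops
    # Reads are 1:1; each write is multiplied by the RAID write penalty.
    backend_ops = read_iops + write_iops * WRITE_PENALTY[raid_level]
    return math.ceil(backend_ops / iops_per_disk)


# 10,000 IOPS, 50/50 read/write, RAID 6, 200 IOPS per disk.
print(disks_required(10_000, 0.5, "raid6"))  # 175, matching the slide
```

Running the same workload through `"raid5"` or `"raid1"` shows how strongly the write penalty drives the disk count.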
Performance
  Bandwidth
    Fibre Channel currently supports 16Gb full duplex, though 8Gb is more common
      That's nominally 1600 MBps in each direction, or 3200 MBps (about 3GB of data each second) bi-directionally
    FCoE currently supports 10Gb, though the roadmap includes 40Gb and 100Gb
      10Gb is nominally 1200 MBps in each direction (2400 MBps bi-directionally), while 100Gb is 24,000 MBps bi-directionally, about 23.4GB per second!
    Besides the speed, there is the matter of oversubscription
Performance
  Oversubscription – The practice of provisioning less aggregate bandwidth at the array than the attached hosts could collectively demand
    In an environment with 100 servers, each with dual 8Gb FC connections, a total of 1600Gb is directed at a storage array via some SAN switch
    The storage array may only have a total of eight 8Gb FC connections, for 64Gb of aggregate bandwidth
    That gives a ratio of 1600:64, or 25:1
    This is done in networking all the time and is now standard in the storage world
    The assumption is that all 100 hosts will never need to transmit at their full bandwidth 100% of the time
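The oversubscription ratio above is simple enough to compute for any environment. This is a minimal sketch using the slide's numbers; the function name and parameter names are hypothetical.

```python
from fractions import Fraction


def oversubscription_ratio(hosts, links_per_host, host_link_gb,
                           array_ports, array_port_gb):
    """Return (host bandwidth, array bandwidth, reduced ratio) in Gb."""
    host_bw = hosts * links_per_host * host_link_gb
    array_bw = array_ports * array_port_gb
    return host_bw, array_bw, Fraction(host_bw, array_bw)


# 100 servers with dual 8Gb FC links against eight 8Gb array ports.
host_bw, array_bw, ratio = oversubscription_ratio(100, 2, 8, 8, 8)
print(f"{host_bw}Gb : {array_bw}Gb = {ratio.numerator}:{ratio.denominator}")
```

For the slide's environment this prints `1600Gb : 64Gb = 25:1`.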
Storage Futures
  Converged Infrastructure
    Datacenters designed today talk about converged infrastructures
    One HP Blade enclosure can encompass server, networking, and storage components that need to be configured in a holistic manner
    Virtualization has helped speed this convergence up, though organizational design is usually still far behind
  Storage arrays are beginning to support target-based zoning
    The goal is to reduce the administration needed to configure a host-to-storage mapping by letting the storage array do more of it intelligently, without human intervention
Storage Futures
  Over the last few years storage has begun transitioning from "big old iron" to distributed systems where data is spread across multiple nodes for capacity and performance
    EMC Isilon
    HP Ibrix
    Nutanix
    VMware VSAN
    Nexenta
  As always in IT, the pendulum is swinging back to distributed platforms, where each node hosts a small amount of data instead of one big platform hosting all of the data
Storage Futures
  Data protection is maturing beyond traditional RAID levels such as 1, 1+0, 5, and 6
    RAID levels do offer additional protection, but most of the time they don't protect against corruption
    RAID levels also have performance implications that are usually negative for the applications residing on them
  These days the solution is to create multiple copies of files or blocks based upon some rules
    Most of the large public cloud providers use this solution, including Amazon S3 (Simple Storage Service)
    It just so happens that by default anything stored in S3 has three copies!
  The 'utopia' world is a place where each application has some metadata that controls what protection level and performance characteristics are required
    This would enable these applications to run internally or externally yet provide the same experience regardless
    This is the essence of SDDC, the Software Defined Data Center. The application requirements will define where applications run without any intervention
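The "multiple copies based upon some rules" idea can be illustrated with a small placement sketch. It uses rendezvous (highest-random-weight) hashing to pick distinct nodes for each block. This is one illustrative technique chosen for the example, not the method used by S3 or by any product named above; the function name, node names, and the default of three copies are assumptions.

```python
import hashlib


def place_copies(block_id, nodes, copies=3):
    """Pick `copies` distinct nodes for a block using rendezvous hashing."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested copy count")

    def score(node):
        # Each (node, block) pair gets a stable pseudo-random weight.
        return hashlib.sha256(f"{node}:{block_id}".encode()).hexdigest()

    # The highest-scoring nodes hold the copies.
    return sorted(nodes, key=score, reverse=True)[:copies]


# Hypothetical five-node cluster.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(place_copies("block-0001", nodes))
```

Because the placement depends only on the block ID and the node names, any node can compute where a block lives without a central lookup table, and removing a node that holds no copy of a block leaves that block's placement unchanged.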