Presentation on theme: "Storage Performance for SQL Server Joe Chang"— Presentation transcript:
Storage Performance for SQL Server Joe Chang email@example.com www.sql-server-performance.com/joe_chang.asp
Objectives Understand storage components Learn how to interpret performance counters How to determine if you have a bottleneck Sizing and configuring the storage system How many disks and IO channels do you need Verify storage system performance Does the system perform as expected How SQL operations translate to disk ops
Quiz How much capacity do you have? For each data and log partition What is the RAID level for each? How many disks and IO channels on each? What are the performance characteristics? Small block low queue random IOPS Small block high queue random IOPS Large block sequential transfer rate Read and write for each
Topics Storage components & system Performance counters IO Performance Tools & Testing IO characteristics of SQL Server operations Configuring the storage system File placement strategies
Storage Components Storage Performance for SQL Server
Storage Components System Overview Disk Interfaces Disks Disk Performance RAID Controllers / Host Bus Adapters PCI-X, PCI-Express
System Overview Chipset (memory controller & IO bridges) connects processors, memory & IO System bus usually means processor bus Internal IO connects memory controller to IO bridges, may be proprietary PCI – connects IO adapters to IO bridge Disk interface connects disks to IO adapters
Serverworks GC-LE chipset PCI-X 64-bit 0.8GB/sec@100MHz 1.0GB/sec@133MHz Popular in 2-way Xeon servers with 400 & 533MHz FSB EMC Clariion CX500/700 series What is actual realizable IO bandwidth?
Disk Interfaces Desktop Interfaces ATA 133 MBytes/sec SATA 1.5 Gbit/sec (8b/10b) ~150MB/sec SATA-IO (formerly SATA-II) up to 3.0 Gbit/sec defined Enterprise Interfaces SCSI – LVD signaling 14-15 disks per SCSI bus U160 Full speed only on data, not command U320 More efficient protocol, full speed command FC 126 devices per loop 1 Gbit/sec, 2 Gbit/sec, 4 Gbit/sec soon SAS – point to point, with fan out, 128 devices 3.0 Gbit/sec full duplex, 6.0 Gbit/sec second generation
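The ~150MB/sec figure for 1.5 Gbit/sec SATA follows from 8b/10b encoding, where each data byte occupies 10 bits on the wire. A minimal sketch of that arithmetic, applied to the SATA and SAS line rates listed above; the function name is illustrative, and protocol overhead beyond the encoding is ignored:

```python
def payload_mb_per_sec(line_rate_gbit: float) -> float:
    """Usable MB/sec on a serial link with 8b/10b encoding (10 line bits per data byte)."""
    bits_per_sec = line_rate_gbit * 1e9
    return bits_per_sec / 10 / 1e6


for name, rate in [("SATA 1.5", 1.5), ("SATA/SAS 3.0", 3.0), ("SAS 6.0", 6.0)]:
    print(f"{name} Gbit/sec -> about {payload_mb_per_sec(rate):.0f} MB/sec")
```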
SCSI SCSI 1 controller per bus and up to 15 disks or up to 2 controllers and up to 14 disks Shared bus, disks must arbitrate for bus Common form factor: 14 disks in 3U rack Can be 1 SCSI channel (1x14) or 2 SCSI channels (2x7)
Fiber Channel Dual port Arbitrated Loop, Fabric, or Point-to-point topologies Bandwidth is shared Can achieve higher utilization than shared bus One port of loop topology
SATA Differential signals - 2 wires, +/- 1 pair for transmit, 1 pair for receive
SATA 1 disk per port New features in SATA Native Command Queuing Port Multiplier 1 port can connect to multiple devices Port Selector Each disk can have 2 ports
Disk Drives 36GB 74GB 146GB 300GB 7200 RPM SATA: $150 10K 3.5in SCSI/FC: $200, $400, $750 15K 3.5in SCSI/FC: $200, $400, $800 10K 2.5in SAS: $325, $600 Bare drive, no hot-plug carrier, no enclosure
Drive Speed versus Capacity 95mm 84mm 65mm 7200RPM, 8ms 200, 300, 400GB BPI 763K/in, 91.56Mbit/in² 10,000RPM, 5ms 73, 146, 300GB BPI 658K/in 15,000RPM, 3.6ms 36, 73, 146GB BPI 628K/in Lower RPM drives have higher bit density and larger platters contributing to very low $/GB. Desktop rated for 2 years @ 20% duty cycle, server for 5 years @ 100%
Disk Performance Characteristics Random I/O Rotational speed Seek time Command Queuing, Short Stroke Sequential I/O Media transfer rate Outer versus Inner tracks Disk interface saturation
Disk Specs (2003)
Generation                            7200.7      10K.6       15K.3
RPM                                   7200        10K         15K
Rotational Latency (ms)               4.16        3.0         2.0
Avg. Seek, R/W (ms)                   8.5/9.5     4.9/5.5     3.6/4.0
Track-to-Track Seek, R/W (ms)         1.0/1.2     0.55/0.75   0.4/0.6*
Transfer Rate
  Internal - Raw (Mbit/s)             xxx-683     475-841     632-891
  Internal Formatted (MB/sec)         xx-84       43-78       57-86
  Sustained (MB/sec)                  32-58       38-68.5     49-75
* Includes 0.2ms controller overhead
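The rotational latency row matches the time for half a revolution at each spindle speed. A small sketch of that relationship; the function name is illustrative:

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: on average the head waits half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution


for rpm in (7200, 10_000, 15_000):
    print(f"{rpm} RPM -> {avg_rotational_latency_ms(rpm):.2f} ms")
```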
Random IO Rate
Drive Speed   Rotational Latency (ms)   Avg. Seek (ms)   8K Transfer (ms)   Tot Latency (ms)   I/O per sec
7200          4.16                      8.5              0.20               12.86              77.7
10K           3.0                       4.9              0.19               8.09               123.7
15K           2.0                       3.6              0.16               5.76               174.2
IO rate based on data distributed over entire disk accessed at random, one IO command issued at a time
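The table's arithmetic can be sketched as total latency = rotational latency + average seek + 8K transfer, with IO per second = 1000 / total latency in ms. The inputs below are the rounded values from the slide, so the computed IO rates may differ slightly from the slide's last column; the names are illustrative:

```python
# (rotational latency, average read seek, 8K transfer) in ms, from the slide
DRIVES = {
    "7200": (4.16, 8.5, 0.20),
    "10K": (3.0, 4.9, 0.19),
    "15K": (2.0, 3.6, 0.16),
}


def random_iops(rot_ms, seek_ms, xfer_ms):
    """Return (total latency in ms, IOs per second) with one IO outstanding."""
    total_ms = rot_ms + seek_ms + xfer_ms
    return total_ms, 1000.0 / total_ms


for name, parts in DRIVES.items():
    total_ms, iops = random_iops(*parts)
    print(f"{name}: {total_ms:.2f} ms per IO, about {iops:.0f} IO/sec")
```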
Other Factors – Random IO Short Stroke: Data is distributed over a fraction of the entire disk Average seek time is lower (track-to-track minimum) Command Queuing: More than one IO issued at a time, Disk can reorder individual IO accesses, lowering access time per IO
Controllers and Adapters [RAID] Controllers Processing capability for RAID logic etc SCSI 2-4 channels per adapter SAS 8 ports SATA 4-8 ports (12 & 16) Host Bus Adapters (HBA) Only interfaces IO bridge to disk interface Fiber channel 1 or 2 ports, SCSI
Controllers and Adapters PCI-X PCI-e SATA, SATA II: 3Ware, Highpoint, RaidCore, LSI SCSI: HP SA 64X, 640X, LSI FC HBA: Emulex, QLogic; QLogic SAS: HP SA P600, LSI
U320 RAID Controllers May generate 240MB/sec per channel 2 Channel adapter ~480MB/sec 2 adapters per PCI-X bus OK Minute part of PCI-e x8 port 4 Channel adapter could generate 1GB/sec Prefer 1 adapter on 133MHz PCI-X bus
Fiber Channel HBA 1 & 2 port adapters PCI-X and PCI-e 2 port may generate 350-400MB/sec 2 dual port adapters per PCI-X bus if bandwidth used is mostly uni-directional Dual port adapter only uses fraction of PCI- Express x8 port
SAS Adapters HP Smart Array P600 RAID Controller 8 3.0Gbit/sec SAS ports 2.4GB/sec each direction 2 x 4 port connectors Max 38 drives PCI-X LSI Logic SAS3xxx
SATA Raid Controllers 8 port SATA common 1.5Gbit/sec per port 3.0Gbit/sec per port on SATA-II PCI-X SATA disks max out at 50-70MB/sec 560MB/sec per 8-port adapter max
PCI-X and PCI Express PCI-X 64-bit wide 100MHz 2 slots per bus, 133MHz 1 slot 800MB/sec, 1GB/sec Most adapters available PCI Express 3 x8 slots Each 2GB/sec in each direction No single adapter can generate this
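The PCI-X numbers are simply bus width times clock rate. A quick sketch of that calculation, giving peak rather than realizable bandwidth; the function name is illustrative:

```python
def parallel_bus_mb_per_sec(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel bus: bytes per clock times clocks per microsecond."""
    return width_bits / 8 * clock_mhz


print(parallel_bus_mb_per_sec(64, 100))  # PCI-X at 100MHz
print(parallel_bus_mb_per_sec(64, 133))  # PCI-X at 133MHz, roughly 1GB/sec
```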
Performance Counters Storage Performance for SQL Server
Performance Counters System Monitor - measured from OS Can only see disks visible to OS HW specific – detail for each disk in array OS: Physical & Logical disks Size, latency, queue depth, IOPS, MB/sec Are disk ops small block random Large or sequential ops Read/Write mix
OS & Hardware Counters OS Counters Average values only Example: 100 Reads, Average Bytes 16K Don’t show the actual mix of 8K, 64K etc Hardware – Vendor Specific May give distribution of actual IOPS
OS: Physical & Logical Disk Physical Disk Frequently most useful Seen by OS as distinct physical disk Hardware RAID may have striped multiple disks Disks may be shared by other partitions Logical Disk When partition is striped across multiple physical disks
Counters (Transfer, Read, Write) Avg. Disk Bytes/[Read] [Write] [Transfer] Disk [Reads] [Writes] [Transfer]/sec Disk [Read] [Write]  Bytes/sec Avg. Disk [Read] [Write]  Queue Length Avg. Disk sec/[Read] [Write] (latency) No simple single value interpretation Must examine all the above counters together
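One way to read these counters together is to derive the average IO size and the implied number of outstanding IOs from the rate and latency counters. The sketch below is an illustration only: the function name and sample values are made up, and the outstanding-IO estimate uses Little's law (rate times latency), which is an assumption added here rather than something stated on the slide.

```python
def summarize_disk_counters(transfers_per_sec, bytes_per_sec, sec_per_transfer):
    """Derive average IO size and implied outstanding IOs from three counters."""
    if transfers_per_sec <= 0:
        return {"avg_kb": 0.0, "mb_per_sec": 0.0, "outstanding_ios": 0.0}
    return {
        "avg_kb": bytes_per_sec / transfers_per_sec / 1024,
        "mb_per_sec": bytes_per_sec / 1e6,
        # Little's law: IOs in flight = IO rate x time each IO takes
        "outstanding_ios": transfers_per_sec * sec_per_transfer,
    }


# Hypothetical sample: 1000 transfers/sec, 8,192,000 bytes/sec, 8 ms per transfer
sample = summarize_disk_counters(1000, 8_192_000, 0.008)
print(sample)
```

Small average size with several milliseconds of latency suggests random IO; large average size or near-zero latency points toward sequential activity.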
Interpreting Counters No single counter can determine whether IOPS are random or sequential High activity at 0 ms latency indicates small block sequential IOPS Latency ~ Queue depth × Media transfer time also indicates sequential activity Ex. 64MB/sec – 64K in 1ms For queue depth 2, latency doubles
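The sequential example (64K at 64MB/sec is about 1ms, doubling at queue depth 2) can be sketched as below. This treats 1MB as 1000KB to match the slide's round numbers; the function name is illustrative.

```python
def sequential_latency_ms(block_kb, media_mb_per_sec, queue_depth=1):
    """Approximate latency: queue depth x time to transfer one block from the media."""
    transfer_ms = block_kb / media_mb_per_sec  # KB / (MB/sec) = ms when 1 MB = 1000 KB
    return queue_depth * transfer_ms


print(sequential_latency_ms(64, 64))                 # one IO outstanding
print(sequential_latency_ms(64, 64, queue_depth=2))  # two IOs outstanding
```

If observed latency is close to this estimate, the load is probably sequential; random IO adds seek and rotational latency on top.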
Random IO Low queue depth For small block IO, 8-64K Rotational latency and seek time are primary contributors to latency Avg. Seek time for data spread across entire disk Track-to-track seek time for highly localized data High Queue depth High IOPS per disk possible for small block IO due to command queuing
Counters Looking for indications of: 1) small random transfers 2) sequential or large block transfers
Low Write – 4 min Recovery 4 15K SCSI Non checkpoint writes not as bad
All Data in memory 4 15K SCSI Checkpoints do not slow SQL batch, no reads required
Configuring the storage system Storage Performance for SQL Server
Configuring Storage Systems SCSI, FC, SATA & SAS Disk Units
RAID Performance Scaling
Operation      RAID 0    RAID 1+0    RAID 5
Read           1         1           1
Small Write    1         1/2         1/4
Large Write    1         1/2         1 - 1/N
Theoretical performance per drive for N drives in a RAID group RAID 5 write: 1 read data, 1 read parity, 1 write data, 1 write parity. Write penalty is reduced if entire stripe can be written
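The scaling factors can be applied to a RAID group as a rough estimate. The sketch below encodes the table and multiplies by drive count and per-drive rate; the 8-drive, 150 IOPS example is hypothetical, and the names are illustrative.

```python
def per_drive_factor(raid, op, n_drives):
    """Theoretical per-drive performance relative to a bare drive, per the slide's table."""
    table = {
        "RAID 0":   {"read": 1.0, "small_write": 1.0,  "large_write": 1.0},
        "RAID 1+0": {"read": 1.0, "small_write": 0.5,  "large_write": 0.5},
        "RAID 5":   {"read": 1.0, "small_write": 0.25, "large_write": 1.0 - 1.0 / n_drives},
    }
    return table[raid][op]


def group_rate(raid, op, n_drives, rate_per_drive):
    """Theoretical rate for the whole RAID group (same units as rate_per_drive)."""
    return n_drives * rate_per_drive * per_drive_factor(raid, op, n_drives)


# Hypothetical group: 8 drives at 150 small random IOPS each
for raid in ("RAID 0", "RAID 1+0", "RAID 5"):
    print(raid, group_rate(raid, "small_write", n_drives=8, rate_per_drive=150))
```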
Popular “Thumb Rules” Older thumb rules 100 Random IOPS per 10K, 150 per 15K 75 Random IOPS per 7200RPM disk, 150 Sequential IOPS per 7200 disk Newer IOPS rule: 200-300? Max queue depth 2 per physical disk How true/relevant are the above?
SAN Vendors Claim: Big cache is solution to disk performance RAID 5 is OK Carve multiple LUNs from each RAID Group Allocate as necessary Don’t need to separate Data & Logs Higher space utilization With shared storage resource versus “islands” of storage Are any of the above true for database applications?
Random Read Summary Command Queuing Significantly increases IOPS at high disk queue but higher latency Fully supported in SCSI/FC systems New SATA disks, not yet in controllers (?) Short Strokes Use only small fraction of disk space Further increases IO Lower latency
Sequential Disk Access Scales nearly linearly with number of disks 50-70MB/sec per disk SATA – no controller limitations SCSI – U320 – practical limit 240MB/sec? FC – 2Gbit/sec – 170MB/sec bidirectional Bus architecture PCI-X 2 Slots, 100MHz, 800MB/sec PCI-e x4 1GB/sec bidirectional SAN – 9.6MB/sec per disk?
Sequential Disk Access Distribute disks over multiple SCSI channels or FC ports SCSI 7 disks per SCSI channel, 2 SCSI ports per 14 disk rack FC – limited expansion in SAN? 1 rack per port Distribute HBAs over multiple PCI-X busses Distribute data across multiple files?
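A rough way to see why disks should be spread over channels and buses is to take the minimum of the disk, channel, and bus limits. The sketch below uses the slide's approximate figures (about 60MB/sec per disk, 240MB/sec per U320 channel, 800MB/sec per 100MHz PCI-X bus) as assumed inputs; the function name is illustrative.

```python
def sequential_limit(n_disks, mb_per_disk, n_channels, mb_per_channel, bus_mb):
    """Return (bottleneck name, MB/sec) for a simple disks -> channels -> bus path."""
    limits = {
        "disks": n_disks * mb_per_disk,
        "channels": n_channels * mb_per_channel,
        "bus": bus_mb,
    }
    bottleneck = min(limits, key=limits.get)
    return bottleneck, limits[bottleneck]


# 14-disk rack split over 2 U320 channels on one 100MHz PCI-X bus
print(sequential_limit(14, 60, 2, 240, 800))
# Same rack on a single channel
print(sequential_limit(14, 60, 1, 240, 800))
```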
Database Characteristics Data files Random reads for transactions Sequential or large block access Log file(s) Sequential writes, small or large blocks Backup Tempdb ? Potentially high queue operations?
Checkpoint Summary All dirty data pages dumped to disk queue Data reads are normally prioritized over writes, but any reads issued during checkpoint must wait until outstanding writes complete? SQL 2005 has smarter checkpoint Disk should have sufficient peak IO to minimize checkpoint impact In memory data also works Trace Flag 3505 disables automatic checkpoints
Transaction Log 2 disks in RAID 1 OK for most applications Few situations need more than 50MB/sec Small writes: > 5000 writes/sec per disk! Avg. Disk sec/Write should read mostly 0 ms! Log Backups / Mixed Data + Log No longer purely sequential disk ops. Random IO performance characteristics Does big SAN cache help T-Log backup?
Key Metrics Random Reads Reads/sec versus Latency curve Not single value High read rate at low latency High Queue Depth Capability Blast through checkpoints & tables scans Transaction Log Backups
Storage Systems Large Spindle count for random IO Normal low queue IO rate Checkpoint IO capability Transaction log backup Multiple Channels for bandwidth SCSI – U320 – 240MB/sec max? FC 2Gbit/sec full-duplex – 170MB/sec SATA/SAS – disk has its own bus
SAN Specs
EMC Clariion   IOPS*    MB/sec   Disks    FC
CX300          50K      680      60       4F/2B
CX500          120K     760      120      4F/4B
CX700          200K     1520     240      8F/8B
Symmetrix      IOPs     MB/sec   Disks    FC
DMX800         -        -        60/120   8/16
DMX1000        -        -        144      16/32
DMX2000        275K?    3000?    288      32/64
* Peak IOPS to cache?
SAN Specs - HP
MSA        IOPs         MB/sec   Disks   Ports
500        -            -        14      1 SCSI
1000       30K          200      42      1 FC
1500       30K          200      56      1 FC
           IOPs         MB/sec   Disks   FC
XP12000    1.9M 120K    8000     1148    32/64
SAN Specs – HP (2)
EVA        IOPs    MB/sec   Disks   Ports
4000       141K    335      56      4/4
6000       141K    650      112     4/4
8000       200K    1300     240     4/4
           IOPs    MB/sec   Disks   FC
EVA 3000   141K    335      56      2
EVA 5000   141K    700      240     2/4
Big Cache on SAN Reads – system memory is better Writes: Modify 100,000 random rows in table ~100K dirty pages, 800MB What is SAN cache line size? If 64KB, then 64KB * 100K = 6.4GB needed! Cache setting OLTP 2M Read for each LUN, All else to write(?)
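The cache arithmetic can be checked with a short sketch: 100,000 dirty 8KB pages are about 800MB of data, but if each page lands in its own 64KB cache line the cache needs about 6.4GB. The worst-case assumption of one page per cache line and the function name are illustrative.

```python
PAGE_KB = 8  # SQL Server data page size


def cache_needed_mb(dirty_pages, cache_line_kb):
    """Worst case: every dirty page occupies a separate cache line (decimal MB)."""
    return dirty_pages * cache_line_kb / 1000


dirty_pages = 100_000
print(cache_needed_mb(dirty_pages, PAGE_KB))  # size of the dirty data itself
print(cache_needed_mb(dirty_pages, 64))       # cache consumed with 64KB cache lines
```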
SAN Summary RAID 5 may have poor write performance How likely is a large stripe to be modified? LUNs per RAID Group Absolutely essential to separate sequential & random loads. Low & High queue loads Space Utilization High utilization is possible, but really want low space utilization for short stroke performance gains
Key Metrics Random IO Performance at low latency <10-15ms Important for Random IO Performance at high queue Ability to handle checkpoints Sequential Performance
Summary Single number metrics have no meaning Random and Sequential IO Queue depth versus latency Checkpoints and Transaction Log backup Checkpoints generate high disk queues T-Log BU disrupts zero latency writes Very difficult to guarantee 100% fast response times (for SLA) Feedback: firstname.lastname@example.org