Storage Performance for Data Warehousing Joe Chang
About Joe Chang. SQL Server Execution Plan Cost Model; true cost structure by system architecture; decoding statblob (distribution statistics); SQL Clone, a statistics-only database. Tools: ExecStats (cross-reference index use by SQL execution plan), performance monitoring, Profiler/Trace aggregation.
Organization Structure In many large IT departments DB and Storage are in separate groups Storage usually has own objectives Bring all storage into one big system under full management (read: control) Storage as a Service, in the Cloud One size fits all needs Usually have zero DB knowledge Of course we do high bandwidth, 600MB/sec good enough for you?
Data Warehouse Storage. OLTP: throughput with fast response. DW: flood the queues for maximum throughput. Do not use shared storage for a data warehouse! Storage system vendors like to give the impression that the SAN is a magical, immensely powerful box that can meet all your needs: just tell us how much capacity you need and don't worry about anything else. My advice: stay away from shared storage controlled by a different team.
Nominal and Net Bandwidth. PCI-E Gen 2: 5 Gbit/s signaling per lane; x8 = 5GB/s nominal, 4GB/s net; x4 = 2GB/s net. SAS 6 Gbit/s: x4 port is 3GB/s nominal, 2.2GB/s net? Fibre Channel 8 Gbit/s nominal: 780MB/s point-to-point, 680MB/s from host to SAN to back-end loop. SAS RAID controller, x8 PCI-E Gen 2, 2 x4 6G ports: 2.8GB/s. Depends on the controller, will change!
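The nominal figures on this slide follow from lane rate times lane count, and the PCI-E net figures match what 8b/10b encoding alone would leave. The following is a minimal sketch of that arithmetic, not a measurement. The function names are illustrative, and attributing the gap between nominal and net to 8b/10b encoding is an assumption; the slide's 2.2GB/s for SAS is lower than the 2.4GB/s that encoding alone gives, presumably because of further protocol overhead.

```python
def nominal_gbytes_per_sec(gbit_per_lane: float, lanes: int) -> float:
    """Raw signaling rate of a link in GB/s (8 bits per byte, no overhead)."""
    return gbit_per_lane * lanes / 8.0


def after_8b10b(gbytes_per_sec: float) -> float:
    """Bandwidth left after 8b/10b encoding (8 data bits per 10 line bits)."""
    return gbytes_per_sec * 8.0 / 10.0


# (Gbit/s per lane, number of lanes), taken from the slide.
links = {
    "PCI-E Gen 2 x8": (5.0, 8),
    "PCI-E Gen 2 x4": (5.0, 4),
    "SAS 6G x4": (6.0, 4),
}

summary = {}
for name, (rate, lanes) in links.items():
    nominal = nominal_gbytes_per_sec(rate, lanes)
    summary[name] = (nominal, after_8b10b(nominal))
    print(f"{name}: nominal {nominal:.1f} GB/s, after encoding {summary[name][1]:.1f} GB/s")
```

Real controllers deliver less than the post-encoding figure, which is why the slide quotes 2.8GB/s for a two-port SAS controller in an x8 slot.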
Storage – SAS Direct-Attach. Many fat pipes, very many disks. Balance by pipe bandwidth. Don't forget fat network pipes. Option A: 24 disks in one enclosure for each x4 SAS port, two x4 SAS ports per controller. Option B: split each enclosure over 2 x4 SAS ports, 1 controller. [Diagram labels: four RAID controllers in PCI-E x8 slots with SAS x4 ports; RAID and 2 x10GbE adapters in PCI-E x4 slots]
Storage – FC/SAN. PCI-E x8 Gen 2 slot with quad-port 8Gb FC. If 8Gb quad-port is not supported, consider a system with many x4 slots, or consider SAS! SAN systems typically offer 3.5in 15-disk enclosures. Difficult to get a high spindle count with density. Disk enclosures per 8Gb FC port, MB/s per disk? [Diagram labels: 8Gb FC HBAs in PCI-E x8 and x4 slots; 2 x10GbE adapters in PCI-E x4 slots]
Storage – SSD / HDD Hybrid. Log: single DB on HDD, unless rollbacks or T-log backups disrupt log writes; multiple DBs on SSD, otherwise too many RAID 1 pairs are needed for logs. Storage enclosures typically have 12 disks per channel, and a channel can only support the bandwidth of a few SSDs. Use the remaining bays for extra storage with HDD. No point expending valuable SSD space on backups and flat files. No RAID with SSD? [Diagram labels: four SAS controllers in PCI-E x8 slots with SAS x4 ports; RAID and 2 x10GbE adapters in PCI-E x4 slots; SSD]
PDW. [Diagram labels: Gigabit Ethernet, InfiniBand, Fibre Channel; Control, Management, Landing Zone, Backup and Compute nodes; repeated SP/FC storage units]
[Diagram labels: SAS x4, RAID, 2 x10GbE, PCI-E x4, SP, FC, IB, SSD]
SSD. Current: mostly 3Gbps SAS/SATA SSD, some 6Gbps SATA SSD. Fusion IO: direct PCI-E Gen 2 interface, 320GB-1.2TB capacity, 200K IOPS, 1.5GB/s. No RAID? An HDD is fundamentally a single point of failure; an SSD could be built with redundant components. HP reported problems with SSD on RAID controllers, Fujitsu did not?
Big DW Storage – iSCSI Are you nuts? Well, maybe if you like frequent long coffee-cigarette breaks
Storage Configuration - Arrays Shown: two 12-disk Arrays per 24-disk enclosure Options: between 6-16 disks per array SAN systems may recommend R or R5 7+1 Very Many Spindles Comment on Meta LUN
Data Consumption Rate: Xeon. TPC-H Query 1, Lineitem scan; SF1 1GB, 2k8 875M. The data consumption rate is much higher for current-generation Nehalem and Westmere processors than for the Core 2 referenced in the Microsoft FTDW document. TPC-H Q1 is more compute intensive than the FTDW light query. [Table: columns Processors, Total Cores, Q1 sec, SQL, Total MB/s, MB/s per core, GHz, Mem GB, SF; rows for Conroe, Nehalem, Westmere and Nehalem-EX Xeon systems; numeric values not recoverable]
Data Consumption Rate: Opteron. TPC-H Query 1, Lineitem scan; SF1 1GB, 2k8 875M. Expected Istanbul to have better performance per core than Shanghai due to HT Assist. Magny-Cours has much better performance per core (at 2.3GHz versus 2.8GHz for Istanbul)! Or is this Win/SQL 2K8 R2? [Table: columns Processors, Total Cores, Q1 sec, SQL, Total MB/s, MB/s per core, GHz, Mem GB, SF; rows for Barcelona, Shanghai, Istanbul and Magny-Cours Opteron systems; numeric values not recoverable]
Data Consumption Rate. TPC-H Query 1, Lineitem scan; SF1 1GB, 2k8 875M. [Table: columns Processors, Total Cores, Q1 sec, SQL, Total MB/s, MB/s per core, GHz, Mem GB, SF; combined Xeon and Opteron rows, including Barcelona, Shanghai, Istanbul and Magny-Cours; numeric values not recoverable]
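The "Total MB/s" and "MB/s per core" columns in these tables are presumably derived from the size of the Lineitem scan, the Q1 elapsed time and the core count. A minimal sketch of that derivation follows. It reads the slide's "2k8 875M" as roughly 875MB of Lineitem data per scale-factor unit in SQL Server 2008 and assumes the size scales linearly with scale factor; the scale factor, elapsed time and core count in the example are hypothetical, not values from the tables.

```python
LINEITEM_MB_PER_SF = 875.0  # assumed reading of "2k8 875M" on the slide


def consumption_rate(scale_factor: int, q1_seconds: float, total_cores: int):
    """Return (total MB/s, MB/s per core) for a full Lineitem scan."""
    data_mb = LINEITEM_MB_PER_SF * scale_factor
    total_mb_per_sec = data_mb / q1_seconds
    return total_mb_per_sec, total_mb_per_sec / total_cores


# Hypothetical run: SF 100, Q1 finishes in 25 seconds on 8 cores.
total_rate, per_core_rate = consumption_rate(scale_factor=100, q1_seconds=25.0, total_cores=8)
print(f"total {total_rate:.0f} MB/s, {per_core_rate:.1f} MB/s per core")
```

The per-core figure is the one that carries over to storage sizing: multiplying it by the core count of a target system gives the bandwidth the storage must sustain to keep the processors busy.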
Storage Targets. [Table: rows Processors, Total Cores, BW Core, Target MB/s, PCI-E x8-x4, SAS HBA, Storage Units/Disks, Actual Bandwidth; four Xeon and Opteron configurations up to Xeon X7560, model numbers partly lost; Actual Bandwidth 5 GB/s, 10 GB/s, 15 GB/s, 26 GB/s] 8-way: 9 controllers in x8 slots, 24 disks per x4 SAS port; 2 controllers in x4 slots, 12 disks. 24 15K disks per enclosure: 12 disks per x4 SAS port requires 100MB/sec per disk, possible but not always practical; 24 disks per x4 SAS port requires 50MB/sec, more achievable in practice. 2U disk enclosure, 24 x 73GB 15K 2.5in disks: $14K, $600 per disk. Think: shortest path to metal (iron oxide).
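The two per-disk figures on this slide (100MB/sec with 12 disks per port, 50MB/sec with 24) both imply a target of about 1.2GB/s per x4 SAS port. The sketch below works through that sizing arithmetic. The 1,200 MB/s port target is inferred from the slide rather than stated on it, the 10 GB/s example reuses one of the slide's bandwidth figures as a target, and the function names are illustrative.

```python
import math

PORT_TARGET_MB_S = 1200.0  # inferred: 12 disks x 100 MB/s = 24 disks x 50 MB/s


def required_mb_per_disk(port_target_mb_s: float, disks_per_port: int) -> float:
    """Sequential rate each disk must sustain to fill one x4 SAS port target."""
    return port_target_mb_s / disks_per_port


def disks_for_target(system_target_mb_s: float, mb_per_disk: float) -> int:
    """Number of disks needed to reach a system-wide bandwidth target."""
    return math.ceil(system_target_mb_s / mb_per_disk)


per_disk_12 = required_mb_per_disk(PORT_TARGET_MB_S, 12)
per_disk_24 = required_mb_per_disk(PORT_TARGET_MB_S, 24)

# Example: a 10 GB/s target at the more achievable 50 MB/s per disk.
disks_needed = disks_for_target(10_000.0, per_disk_24)
enclosures_needed = math.ceil(disks_needed / 24)  # 24 disks per 2U enclosure

print(per_disk_12, per_disk_24, disks_needed, enclosures_needed)
```

At the slide's price of $14K per 24-disk enclosure, the enclosure count translates directly into a storage budget.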
Your Storage and the Optimizer. Assumptions: 2.8GB/sec per SAS 2 x4 adapter (could be 3.2GB/sec per PCI-E G2 x8); HDD 400 IOPS per disk (big query key lookup, loop join at high queue, short-stroked, possible skip-seek); SSD 35,000 IOPS. [Table: rows Disks, BW (KB/s), Random IOPS, Sequential IOPS, Sequential-Rand IO ratio; columns Optimizer model, SAS 2x4, FC 4G, SSD. Recoverable values: Optimizer BW 10,800, random IOPS 320; SAS 2x4 BW 2,800,000, random IOPS 9,600 and 19,200; FC 4G 30 disks, BW 360,000, random IOPS 12,000; SSD 8 disks, BW 2,800,000, random IOPS 280,000. Sequential IOPS and most ratio values are truncated.] The SQL Server Query Optimizer makes key lookup versus table scan decisions based on a 4.22 sequential-to-random IO ratio. A DW-configured storage system has a much different ratio; 30 disks per 4G FC about matches the QO; SSD is in the other direction.
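The sequential-to-random ratio can be recomputed from the bandwidth and random IOPS figures on this slide. The sketch below assumes sequential IOPS are counted in 8KB pages (the SQL Server page size), which reproduces the 4.22 quoted for the optimizer model; the configuration names are illustrative, and the SAS row uses the 9,600 random IOPS figure.

```python
PAGE_KB = 8  # SQL Server page size; assumed unit for sequential IOPS


def seq_to_random_ratio(bandwidth_kb_s: float, random_iops: float) -> float:
    """Sequential page reads per second divided by random IOPS."""
    sequential_iops = bandwidth_kb_s / PAGE_KB
    return sequential_iops / random_iops


# (bandwidth in KB/s, random IOPS), using figures from the slide.
configs = {
    "Optimizer model": (10_800, 320),
    "SAS 2x4": (2_800_000, 9_600),
    "FC 4G, 30 disks": (360_000, 30 * 400),
    "SSD, 8 devices": (2_800_000, 8 * 35_000),
}

ratios = {name: seq_to_random_ratio(bw, iops) for name, (bw, iops) in configs.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Under these assumptions the SAS direct-attach configuration comes out far above the optimizer's ratio, the 4G FC configuration lands close to it, and SSD falls below it, which is the slide's point: the optimizer's choice between key lookups and scans is calibrated for storage unlike a DW system.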
Fast Track Reference Architecture: My Complaints. Several expensive SAN systems (11 disks); each must be configured independently; $1,500-2,000 amortized per disk. Too many 2-disk arrays; 2 LUNs per array; too many data files. Build indexes with MAXDOP 1: is this brain dead? Designed around 100MB/sec per disk, but not all DW is single scan, or sequential. Scripting?
Fragmentation. Weak storage system: 1) fragmentation could degrade IO performance; 2) defragmenting a very large table on a weak storage system could render the database marginally to completely non-functional for a very long time. Powerful storage system: 3) fragmentation has very little impact; 4) defragmenting has mild impact, and completes within the night-time window. What is the correct conclusion? [Diagram labels: Table, File, Partition, LUN, Disk]
Operating System View of Storage
Operating System Disk View Controller 1 Port 0 Controller 1 Port 1 Disk 2 Basic 396GB Online Disk 3 Basic 396GB Online Controller 2 Port 0 Controller 2 Port 1 Disk 4 Basic 396GB Online Disk 5 Basic 396GB Online Controller 3 Port 0 Controller 3 Port 1 Disk 6 Basic 396GB Online Disk 7 Basic 396GB Online Additional disks not shown, Disk 0 is boot drive, 1 – install source?
File Layout Disk 2, Partition 0 File Group for the big Table File 1 Partition 1 File Group for all others File 1 Partition 2 Tempdb File 1 Partition 4 Backup and Load File 1 Disk 3 Partition 0 File Group for the big Table File 2 Partition 1 Small File Group File 2 Partition 2 Tempdb File 2 Partition 4 Backup and Load File 2 Disk 4 Partition 0 File Group for the big Table File 3 Partition 1 Small File Group File 3 Partition 2 Tempdb File 3 Partition 4 Backup and Load File 3 Disk 5 Partition 0 File Group for the big Table File 4 Partition 1 Small File Group File 4 Partition 2 Tempdb File 4 Partition 4 Backup and Load File 4 Disk 6 Partition 0 File Group for the big Table File 5 Partition 1 Small File Group File 5 Partition 2 Tempdb File 5 Partition 4 Backup and Load File 5 Disk 7 Partition 0 File Group for the big Table File 6 Partition 1 Small File Group File 6 Partition 2 Tempdb File 6 Partition 4 Backup and Load File 6 Each File Group is distributed across all data disks Log disks not shown, tempdb share common pool with data
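The layout on this slide is regular enough to generate: every data disk carries the same four partitions, and each partition holds one file of the corresponding file group. The sketch below builds that mapping as plain tuples. The short purpose names are paraphrased from the slide, the partition numbers (0, 1, 2, 4) are copied as shown, and the function name is illustrative.

```python
DATA_DISKS = [2, 3, 4, 5, 6, 7]  # OS disk numbers from the slide

# Partition number -> purpose, as listed on the slide.
PARTITIONS = {
    0: "Big table file group",
    1: "Small file group",
    2: "Tempdb",
    4: "Backup and load",
}


def file_layout(disks, partitions):
    """Return (disk, partition, purpose, file number) for every data file."""
    rows = []
    for file_no, disk in enumerate(disks, start=1):
        for partition, purpose in partitions.items():
            rows.append((disk, partition, purpose, file_no))
    return rows


layout = file_layout(DATA_DISKS, PARTITIONS)
for row in layout[:4]:
    print(row)
```

Because every file group has one file on each disk, a scan of any single file group draws on all six disks at once.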
File Groups and Files Dedicated File Group for largest table Never defragment One file group for all other regular tables Load file group? Rebuild indexes to different file group
Partitioning - Pitfalls Common Partitioning Strategy Partition Scheme maps partitions to File Groups What happens in a table scan? Read first from Part 1 then 2, then 3, … ? SQL 2008 HF to read from each partition in parallel? What if partitions have disparate sizes? Disk 2 File Group 1 Disk 3 File Group 2 Disk 4 File Group 3 Disk 5 File Group 4 Disk 6 File Group 5 Disk 7 File Group 6 Table Partition 1 Table Partition 2 Table Partition 3 Table Partition 4 Table Partition 5 Table Partition 6
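The questions on this slide can be made concrete with an idealized model of scan time. The sketch below compares three cases: partitions read one after another (each from its own disk), partitions read in parallel, and the same data striped across all disks. The partition sizes and the 100 MB/s per-disk rate are hypothetical, and the model ignores CPU cost and anything other than sequential disk bandwidth.

```python
def scan_serial(sizes_mb, disk_mb_s):
    """Partitions read one at a time, each limited to its own disk."""
    return sum(sizes_mb) / disk_mb_s


def scan_parallel(sizes_mb, disk_mb_s):
    """All partitions read at once; the largest partition finishes last."""
    return max(sizes_mb) / disk_mb_s


def scan_striped(sizes_mb, disk_mb_s, disk_count):
    """Data spread evenly over all disks, so every disk works for the whole scan."""
    return sum(sizes_mb) / (disk_mb_s * disk_count)


# Hypothetical: six partitions of disparate size, one per disk, 100 MB/s per disk.
sizes = [100_000, 20_000, 20_000, 20_000, 20_000, 20_000]

serial_sec = scan_serial(sizes, 100.0)
parallel_sec = scan_parallel(sizes, 100.0)
striped_sec = scan_striped(sizes, 100.0, 6)
print(serial_sec, parallel_sec, round(striped_sec, 1))
```

Even with parallel partition reads, one oversized partition holds the scan to the speed of a single disk, whereas distributing each file group across all disks keeps every disk busy regardless of partition sizes.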