1 #SQLSatRiyadh Storage Performance 2013 Joe Chang www.qdpma.com

2 About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data (2004)
Decoding statblob/stats_stream – writing your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis
See ExecStats on www.qdpma.com

3 Storage Performance Chain
All elements must be correct – no weak links. Perfect on 6 out of 7 elements and 1 not correct = bad IO performance.
Chain (from diagram): HDD/SSD – SAS – RAID Group – Direct Attach/SAN – SAS/FC – SQL Server File – SQL Server Engine – SQL Server Extent Pool

4 Storage Performance Overview
System architecture – PCI-E, SAS, HBA/RAID controllers
SSD, NAND, flash controllers, standards – form factors, endurance, ONFI, interfaces, SLC/MLC performance
Storage system architecture – direct-attach, SAN
Database – SQL Server files, filegroups

5 Sandy Bridge EN & EP
Xeon E5-2600: Socket R, 2011 pins, 2 QPI, 4 DDR3 channels, 40 PCI-E 3.0 8GT/s lanes, DMI2
Model, cores, clock, LLC, QPI, (Turbo):
E5-2690 8 core 2.9GHz 20M 8.0GT/s (3.8)*
E5-2680 8 core 2.7GHz 20M 8.0GT/s (3.5)
E5-2670 8 core 2.6GHz 20M 8.0GT/s (3.3)
E5-2667 6 core 2.9GHz 15M 8.0GT/s (3.5)*
E5-2665 8 core 2.4GHz 20M 8.0GT/s (3.1)
E5-2660 8 core 2.2GHz 20M 8.0GT/s (3.0)
E5-2650 8 core 2.0GHz 20M 8.0GT/s (2.8)
E5-2643 4 core 3.3GHz 10M 8.0GT/s (3.5)*
E5-2640 6 core 2.5GHz 15M 7.2GT/s (3.0)
Xeon E5-2400: Socket B2, 1356 pins, 1 QPI 8GT/s, 3 DDR3 memory channels, 24 PCI-E 3.0 8GT/s lanes, DMI2 (x4 @ 5GT/s)
E5-2470 8 core 2.3GHz 20M 8.0GT/s (3.1)
E5-2440 6 core 2.4GHz 15M 7.2GT/s (2.9)
E5-2407 4c – 4t 2.2GHz 10M 6.4GT/s (n/a)
[Diagram: EN and EP two-socket block diagrams – cores, LLC, QPI, DMI2, PCI-E and PCH links]
Slot layouts: Dell T620 4 x16, 2 x8, 1 x4; Dell R720 1 x16, 6 x8; HP DL380 G8p 2 x16, 3 x8, 1 x4; Supermicro X9DRX+F 10 x8, 1 x4 g2
80 PCI-E gen 3 lanes + 8 gen 2 possible. Disable cores in BIOS/UEFI?

6 Xeon E5-4600
Socket R, 2011 pins, 2 QPI, 4 DDR3 channels, 40 PCI-E 3.0 8GT/s lanes, DMI2
Model, cores, clock, LLC, QPI, (Turbo):
E5-4650 8 core 2.70GHz 20M 8.0GT/s (3.3)*
E5-4640 8 core 2.40GHz 20M 8.0GT/s (2.8)
E5-4620 8 core 2.20GHz 16M 7.2GT/s (2.6)
E5-4617 6c - 6t 2.90GHz 15M 7.2GT/s (3.4)
E5-4610 6 core 2.40GHz 15M 7.2GT/s (2.9)
E5-4607 6 core 2.20GHz 12M 6.4GT/s (n/a)
E5-4603 4 core 2.00GHz 10M 6.4GT/s (n/a)
High-frequency 6-core gives up HT; no high-frequency 4-core. 160 PCI-E gen 3 lanes + 16 gen 2 possible
Slot layouts: Dell R820 2 x16, 4 x8, 1 internal; HP DL560 G8p 2 x16, 3 x8, 1 x4; Supermicro X9QR 7 x16, 1 x8
[Diagram: four-socket block diagram – cores, LLC, QPI, DMI2, PCI-E links]

7 PCI-E, SAS & RAID CONTROLLERS 2

8 PCI-E gen 1, 2 & 3
Gen | Raw bit rate | Unencoded | Bandwidth per direction | BW x8 per direction | Net bandwidth x8
PCIe 1 | 2.5GT/s | 2Gbps | ~250MB/s | 2GB/s | 1.6GB/s
PCIe 2 | 5.0GT/s | 4Gbps | ~500MB/s | 4GB/s | 3.2GB/s
PCIe 3 | 8.0GT/s | 8Gbps | ~1GB/s | 8GB/s | 6.4GB/s?
PCIe 1.0 & 2.0 encoding scheme: 8b/10b. PCIe 3.0 encoding scheme: 128b/130b. Simultaneous bi-directional transfer.
Protocol overhead – sequence/CRC, header – 22 bytes (20%?)
Adaptec Series 7: 6.6GB/s, 450K IOPS
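
A rough check of the table, using the ~20% packet overhead cited on the next slide:
  PCIe 1: 2.5 GT/s x 8/10 (8b/10b) = 2.0 Gbps ≈ 250 MB/s per lane
          x8 link: 8 x 250 MB/s = 2.0 GB/s per direction; minus ~20% protocol overhead ≈ 1.6 GB/s net
  PCIe 3: 8.0 GT/s x 128/130 (128b/130b) ≈ 7.9 Gbps ≈ 1 GB/s per lane
          x8 link: ≈ 8 GB/s per direction; minus ~20% protocol overhead ≈ 6.4 GB/s net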

9 PCI-E Packet Net realizable bandwidth appears to be 20% less (1.6GB/s of 2.0GB/s)

10 PCIe Gen2 & SAS/SATA 6Gbps
SATA 6Gbps – single lane, net BW 560MB/s
SAS 6Gbps – x4 lanes, net BW 2.2GB/s; dual-port, SAS protocol only (not supported by SATA)
[Diagram: PCIe g2 x8 (3.2GB/s) HBA to SAS x4 6G (2.2GB/s)]
Some bandwidth mismatch is OK, especially on the downstream side

11 PCIe 3 & SAS 12Gbps – coming soon? Slowly?
Infrastructure will take more time
PCIe 3.0 x8 HBA: 2 SAS x4 12Gbps ports, or 4 SAS x4 6Gbps ports if the HBA can support 6GB/s
[Diagram: PCIe g3 x8 HBA – SAS x4 12G to SAS expanders – SAS x4 6Gb to devices]

12 PCIe Gen3 & SAS 6Gbps

13 LSI 12Gbps SAS 3008

14 PCIe RAID Controllers?
2 x4 SAS 6Gbps ports (2.2GB/s per x4 port) – 1st generation PCIe 2 – 2.8GB/s?
Adaptec: PCIe g3 can do 4GB/s – 3 x4 SAS 6Gbps is a bandwidth match
PCIe 3.0 x8, 6 x4 SAS 6Gbps – Adaptec Series 7, PMC – 1 chip: x8 PCIe g3 and 24 SAS 6Gbps lanes. Because they could.
[Diagram: PCIe g3 x8 HBA with SAS x4 6G ports]

15 SSD, NAND, FLASH CONTROLLERS 2

16 SSD Evolution
HDD replacement – using existing HDD infrastructure – PCI-E card form factor lacks expansion flexibility
Storage system designed around SSD – PCI-E interface with HDD-like form factor? – storage enclosure designed for SSD
Rethink computer system memory & storage. Re-do the software stack too!

17 SFF-8639 & Express Bay
SCSI Express – storage over PCI-E, NVMe

18 New Form Factors - NGFF
Enterprise 10K/15K HDD – 15mm
SSD storage enclosure could be 1U, 75 x 5mm devices?

19 SATA Express Card (NGFF) mSATA M2 Crucial

20 SSD – NAND Flash
NAND – SLC, MLC regular and high-endurance – eMLC could mean endurance or embedded MLC, which are different things
Controller interfaces the NAND to SATA or PCI-E
Form factor – SATA/SAS interface in a 2.5in HDD or new form factor – PCI-E interface and form factor, or HDD-like form factor – complete SSD storage system

21 NAND Endurance Intel – High Endurance Technology MLC

22 NAND Endurance – Write Performance
[Chart: write performance vs. endurance for SLC, MLC-e, MLC]
Cost structure: MLC = 1, MLC EE = 1.3, SLC = 3. Process dependent: 34nm, 25nm, 20nm. Write perf?

23 NAND P/E - Micron 34 or 25nm MLC NAND is probably good Database can support cost structure

24 NAND P/E - IBM 34 or 25nm MLC NAND is probably good Database can support cost structure

25 Write Endurance
Vendors commonly cite a single spec for a range of models (120, 240, 480GB). Should it vary with raw capacity? Depends on over-provisioning?
3-year life is OK for MLC cost structure, maybe even 2-year
MLC 20TB write life = 10GB/day for 2000 days (5 years+), or 20GB/day for ~3 years
Vendors now cite 72TB write endurance for 120-480GB capacities?
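
The lifetime figures above work out as follows:
  20 TB write life / 10 GB per day = 2000 days ≈ 5.5 years
  20 TB write life / 20 GB per day = 1000 days ≈ 2.7 years (the ~3-year case)
  72 TB rating / 40 GB per day ≈ 1800 days ≈ 5 years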

26 NAND
SLC – fast writes, high endurance
eMLC – slow writes, medium endurance
MLC – medium writes, low endurance
MLC cost structure of $1/GB @ 25nm – eMLC 1.4X, SLC 2X?

27 ONFI
Open NAND Flash Interface organization
1.0 2006 – 50MB/s
2.0 2008 – 133MB/s
2.1 2009 – 166 & 200MB/s
3.0 2011 – 400MB/s – Micron has 200 & 333MHz products
ONFI 1.0 – 6 channels to support 3Gbps SATA (260MB/s)
ONFI 2.0 – 4+ channels to support 6Gbps SATA (560MB/s)
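
The channel counts follow from the interface bandwidths:
  SATA 3Gbps net ≈ 260 MB/s: 6 channels x 50 MB/s (ONFI 1.0) = 300 MB/s, enough to saturate it
  SATA 6Gbps net ≈ 560 MB/s: 4 channels x 133 MB/s (ONFI 2.0) ≈ 532 MB/s, 5 channels ≈ 665 MB/s – hence "4+ channels"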

28 NAND write performance MLC 85MB/s per 4-die channel (128GB) 340MB/s over 4 channels (512GB)?

29 Controller Interface: PCIe vs. SATA
[Diagram: NAND – controller – NAND]
Some bandwidth mismatch/overkill is OK
ONFI 2 – 8 channels at 133MHz to SATA 6Gbps (560MB/s) is a good match
PCIe or SATA? Multiple lanes? CPU access efficiency and scaling – Intel & NVM Express
6-8 channels at 400MB/s to match 2.2GB/s x4 SAS? 16+ channels at 400MB/s to match 6.4GB/s x8 PCIe 3
But ONFI 3.0 is overwhelming SATA 6Gbps?

30 Controller Interface: PCIe vs. SATA
[Diagram: SATA controller with DRAM and NAND vs. PCIe controller with DRAM and NAND]
PCIe NAND controller vendors:
Vendor | Channels | PCIe gen
IDT | 32 | x8 Gen3 NVMe
Micron | 32 | x8 Gen2
Fusion-IO | 3 | x4?/x8 Gen2?

31 SATA & PCI-E SSD Capacities
64 Gbit MLC NAND die, 150mm² at 25nm (2 x 32 Gbit at 34nm, 1 x 64 Gbit at 25nm, 1 x 64 Gbit at 29nm)
8 x 64 Gbit die in 1 package = 64GB
SATA controller – 8 channels, 8 packages x 64GB = 512GB
PCI-E controller – 32 channels x 64GB = 2TB
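
The capacity arithmetic:
  64 Gbit die = 8 GB; 8 die per package = 64 GB
  SATA controller: 8 channels x 64 GB = 512 GB
  PCI-E controller: 32 channels x 64 GB = 2048 GB = 2 TB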

32 PCI-E vs. SATA/SAS
SATA/SAS controllers have 8 NAND channels – no economic benefit in fewer channels?
8 channels is a good match for 50MB/s NAND to SATA 3Gbps – approx 280MB/s realizable BW
8 channels is also a good match for 100MB/s NAND to SATA 6Gbps – 560MB/s realizable BW
NAND is now at 200 & 333MB/s
PCI-E – 32 channels practical (1500 pins) – 333MHz is a good match to gen 3 x8 – 6.4GB/s BW

33 Crucial/Micron P400m & P400e
Crucial P400m: 100GB / 200GB / 400GB
Raw: 168GB / 336GB / 672GB
Seq read (up to): 380MB/s
Seq write (up to): 200MB/s / 310MB/s
Random read: 52K / 54K / 60K
Random write: 21K / 26K
Endurance (2M-hr MTBF): 1.75PB / 3.0PB / 7.0PB
Price: $300? / $600? / $1000?
Crucial P400e: 100GB / 200GB / 400GB
Raw: 128GB / 256GB / 512GB
Seq read (up to): 350MB/s
Seq write (up to): 140MB/s
Random read: 50K
Random write: 7.5K
Endurance (1.2M-hr MTBF): 175TB
Price: $176 / $334 / $631
P410m SAS specs slightly different. EE MLC: higher endurance, write perf not lower than MLC? Preliminary – need to update

34 Crucial m4 & m500
Crucial m4: 128GB / 256GB / 512GB
Raw: 128GB / 256GB / 512GB
Seq read (up to): 415MB/s
Seq write (up to): 175MB/s / 260MB/s
Random read: 40K
Random write: 35K / 50K
Endurance: 72TB
Price: $112 / $212 / $400
Crucial m500: 120GB / 240GB / 480GB / 960GB
Raw: 128GB / 256GB / 512GB / 1024GB
Seq read (up to): 500MB/s
Seq write (up to): 130MB/s / 250MB/s / 400MB/s
Random read: 62K / 72K / 80K
Random write: 35K / 60K / 80K
Endurance (1.2M-hr MTBF): 72TB
Price: $130 / $220 / $400 / $600
Preliminary – need to update

35 Micron & Intel SSD Pricing (2013-02) P400m raw capacities are 168, 336 and 672GB (pricing retracted) Intel SSD DC S3700 pricing $235, 470, 940 and 1880 (800GB) respectively Need corrected P400m pricing

36 4K Write IOPS (K)
[Chart: 4K write IOPS by model and capacity]
P400m raw capacities are 168, 336 and 672GB (pricing retracted). Intel SSD DC S3700 pricing $235, 470, 940 and 1880 (800GB) respectively. Need corrected P400m pricing

37 SSD Summary
MLC is possible with a careful write strategy – partitioning to minimize index rebuilds – avoid full database restore to SSD (see the partitioning sketch below)
Endurance (HET) MLC – write perf? – standard DB practices work – but avoid frequent index defrags?
SLC – only for extreme write intensive? – lower volume product, higher cost
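
A rough illustration of the partitioning point above (a hedged sketch – the table, column, and boundary values are hypothetical, not from the presentation): with a date-partitioned table, only the write-active partition's index needs rebuilding, so the bulk of the SSD-resident data is never rewritten.

-- Hypothetical example: partition a large table by month so that
-- index maintenance touches only the newest partition.
CREATE PARTITION FUNCTION pfMonth (datetime)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

CREATE PARTITION SCHEME psMonth
AS PARTITION pfMonth ALL TO ([PRIMARY]);  -- map to real filegroups in practice

CREATE TABLE dbo.Orders (
    OrderID   bigint    NOT NULL,
    OrderDate datetime  NOT NULL,
    Filler    char(200) NOT NULL DEFAULT ''
) ON psMonth (OrderDate);

CREATE CLUSTERED INDEX cix_Orders ON dbo.Orders (OrderDate, OrderID)
    ON psMonth (OrderDate);

-- Rebuild only the current (write-active) partition instead of the whole index:
ALTER INDEX cix_Orders ON dbo.Orders REBUILD PARTITION = 4;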

38 DIRECT ATTACH STORAGE 3

39 Full IO Bandwidth
Misc devices on 2 x4 PCIe g2: internal boot disks, 1GbE or 10GbE, graphics
10 PCIe g3 x8 slots possible – Supermicro only; HP, Dell systems have 5-7 x8+ slots + 1 x4?
4GB/s per slot with 2 x4 SAS, 6GB/s with 4 x4
Mixed SSD + HDD – reduce wear on MLC
[Diagram: 2-socket system, 192GB per node, 8 PCIe x8 RAID controllers each with HDD + SSD, plus x4 slots for 10GbE/InfiniBand and misc]

40 System Storage Strategy
Dell & HP only have 5-7 slots; 4 controllers @ 4GB/s each is probably good enough?
Few practical products can use PCIe G3 x16 slots
Capable of 16GB/s with initial capacity – 4 HBAs, 4-6GB/s each with allowance for capacity growth – and mixed SSD + HDD
[Diagram: 2-socket system, 192GB per node, with RAID, InfiniBand and 10GbE adapters, HDD + SSD]

41 Clustered SAS Storage – Dell MD3220
Dell MD3220 supports clustering: up to 4 nodes without an external switch (extra nodes not shown)
[Diagram: Node 1 and Node 2 (192GB each) with HBAs connected to dual MD3220 controllers – SAS host ports, IOC, PCIe switch, 2GB cache, SAS expander – to SSD and HDD]

42 Alternate SSD/HDD Strategy
Primary system – all SSD for data & tempdb; logs may be on HDD
Secondary system – HDD for backup and restore testing
[Diagram: primary system (SSD) and backup system (HDD RAID) linked by 10GbE/InfiniBand]

43 System Storage: Mixed SSD + HDD
Each RAID group/volume should not exceed the 2GB/s BW of x4 SAS; 2-4 volumes per x8 PCIe G3 slot
SATA SSD read 350-500MB/s, write 140MB/s+; 8 per volume allows for some overkill
16 SSD per RAID controller; 64 SATA/SAS SSDs to deliver 16-24GB/s
4 HDD per volume rule does not apply
HDD for local database backup, restore tests, and DW flat files
SSD & HDD on shared channel – simultaneous bi-directional IO
[Diagram: 2-socket system, 192GB per node, HBAs with SSD + HDD, 10GbE and InfiniBand]

44 SSD/HDD System Strategy
MLC is possible with a careful write strategy – partitioning to minimize index rebuilds – avoid full database restore to SSD
Hybrid SSD + HDD system, full-duplex signalling
Endurance (HET) MLC – write perf? – standard DB practices work, avoid index defrags
SLC – only for extreme write intensive? – lower volume product, higher cost
HDD – for restore testing

45 SAS Expander / Disk Enclosure (expansion ports not shown): 2 x4 to hosts, 1 x4 for expansion, 24 x1 for disks

46 Storage Infrastructure – Designed for HDD
2U enclosure, 15mm bays
2 SAS expanders for dual-port support – 1 x4 upstream (to host), 1 x4 downstream (expansion) – 24 x1 for bays

47 Mixed HDD + SSD Enclosure (2U)
Current: 24 x 15mm = 360mm + spacing
Proposed: 16 x 15mm = 240mm + 16 x 7mm = 120mm

48 Enclosure: 24 x 15mm and Proposed
Current 2U enclosure, 24 x 15mm bays – HDD or SSD. 2 SAS expanders, 32 lanes each: 4 lanes upstream to host, 4 lanes downstream for expansion, 24 lanes for bays
[Diagram: PCIe host x8, HBA, SAS x4 6Gbps at 2.2GB/s to SAS expanders]
New (SAS 12Gbps): 16 x 15mm + 16 x 7mm bays. 2 SAS expanders, 40 lanes each: 4 lanes upstream to host, 4 lanes downstream for expansion, 32 lanes for bays
2 RAID groups for SSD, 2 for HDD; 1 SSD volume on path A, 1 SSD volume on path B

50 SAS x4 Alternative Expansion
Each SAS expander – 40 lanes: 8 lanes upstream to host with no expansion, or 4 lanes upstream and 4 lanes downstream for expansion; 32 lanes for bays
[Diagram: PCIe x8 HBA, SAS x4 host links daisy-chained through enclosures 1-4]

51 PCI-E with Expansion
[Diagram: PCIe host x8 to HBA and SAS expander (SAS x4 6Gbps, 2.2GB/s) vs. PCIe host x8 to PCI-E switch and x8 PCI-E slot SSDs]
PCI-E slot SSD suitable for known capacity
48 & 64 lane PCI-E switches available – x8 or x4 ports
Express Bay form factor? Few x8 ports or many x4 ports?

52 Enclosure for SSD (+ HDD?)
2 x4 upstream on each expander – 4GB/s – no downstream ports for expansion?
32 ports for device bays – 16 SSD (7mm) + 16 HDD (15mm)
40 lanes total with no expansion – 48 lanes with expansion

53 Large SSD Array
Large number of devices, large capacity – downstream from CPU has excess bandwidth
Do not need SSD firmware peak performance – 1) no stoppages, 2) consistency is nice
Mostly static data – some write intensive – careful use of partitioning to avoid index rebuild and defragmentation – if 70% is static and 10% is write intensive, does wear leveling work?

54 DATABASE – SQL SERVER 4

55 Database Environment OLTP + DW Databases are very high value – Software license + development is huge – 1 or more full time DBA, several application developers, and help desk personnel – Can justify any reasonable expense – Full knowledge of data (where the writes are) – Full control of data (where the writes are) – Can adjust practices to avoid writes to SSD

56 Database – Storage Growth
10GB per day data growth – 10M items at 1KB per row (or 4 x 250 byte rows) – 18TB for 5 years (1831 days) – database log can stay on HDD
Heavy system – 64-128 x 256/512GB (raw) SSD – each SSD can support 20GB/day (36TB lifetime?)
With partitioning – few full index rebuilds. Can replace MLC SSD every 2 years if required (big company)
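
The growth numbers above:
  10 GB/day x 1831 days (5 years) ≈ 18.3 TB of data
  One SSD at 20 GB/day x 1831 days ≈ 36.6 TB written over the same period – the "36TB lifetime?" figure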

57 Extra Capacity - Maintenance
Storage capacity will be 2-3X database size – it would be really stupid if you cannot update the application for lack of space to modify a large table
SAN environment: only the required storage capacity is allocated; may not be able to perform maintenance ops if the SAN admin does not allocate extra space

58 SSD/HDD Component Pricing 2013
MLC consumer: <$1.0K/TB
MLC Micron P400e: <$1.2K/TB
MLC endurance: <$2.0K/TB
SLC: $4K???
HDD 600GB 10K: $400

59 Database Storage Cost
8 x 256GB (raw) SSD per x4 SAS channel = 2TB; 2 x4 ports per RAID controller = 4TB/RC; 4 RAID controllers per 2-socket system = 16TB
32TB with 512GB SSD, 64TB with 1TB SSD
64 SSD per system at $250 (MLC) = $16K
64 HDD 10K 600GB at $400 = $26K
Server 2 x E5, 24 x 16GB, qty 2 at $12K each
SQL Server 2012 EE $6K x 16 cores = $96K
HET MLC and even SLC premium OK. Server/Enterprise premium – high validation effort, low volume, high support expectations

60 OLTP & DW
OLTP – backup to local HDD – superfast backup, read 10GB/s, write 3GB/s (R5) – writes to data blocked during backup – recovery requires log replay
DW – example: 10TB data, 16TB SSD – flat files on HDD – tempdb will generate intensive writes (1TB)
Database (real) restore testing – force tx roll forward/back, i.e., need HDD array

61 SQL Server Storage Configuration
IO system must have massive IO bandwidth – IO over several channels
Database must be able to use all channels simultaneously – multiple files per filegroup (a sketch follows below)
Volumes / RAID groups on each channel – each volume comprised of several devices
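
A minimal sketch of the multiple-files-per-filegroup idea (drive letters, names, and sizes are hypothetical): one data file per volume/RAID group so that a filegroup spans every IO channel.

-- Hypothetical layout: four data volumes D:..G:, one file per volume,
-- so proportional fill spreads every object across all four channels.
CREATE DATABASE BigDB
ON PRIMARY
    (NAME = BigDB_sys, FILENAME = 'C:\SQLData\BigDB_sys.mdf', SIZE = 1GB),
FILEGROUP FG_A
    (NAME = FG_A_1, FILENAME = 'D:\SQLData\FG_A_1.ndf', SIZE = 100GB),
    (NAME = FG_A_2, FILENAME = 'E:\SQLData\FG_A_2.ndf', SIZE = 100GB),
    (NAME = FG_A_3, FILENAME = 'F:\SQLData\FG_A_3.ndf', SIZE = 100GB),
    (NAME = FG_A_4, FILENAME = 'G:\SQLData\FG_A_4.ndf', SIZE = 100GB)
LOG ON
    (NAME = BigDB_log, FILENAME = 'L:\SQLLog\BigDB_log.ldf', SIZE = 50GB);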

62 HDD, RAID versus SQL Server
HDD – pure sequential is not practical, impossible to maintain – large block 256K is good enough, 64K OK
RAID controller – 64K to 256K stripe size
SQL Server – default extent allocation: 64K per file – with the -E startup option, 4 consecutive extents – why not 16???

63 File Layout – Physical View
Each filegroup and tempdb has 1 data file on every data volume
IO to any object is distributed over all paths and all disks
[Diagram: 2-socket system, 192GB per node, 4 x8 HBAs, x4 10GbE]

64 Filegroup & File Layout
Each filegroup has 1 file on each data volume; each object is distributed across all data "disks"; tempdb data files share the same volumes (see the tempdb sketch below)
As shown, 2 RAID groups per controller, 1 per port. Can be 4 RG/volumes per controller. OS and log disks not shown
[Diagram: Controllers 1-4, Ports 0/1, Disks 2-9 – each disk holding FileGroup A file n, FileGroup B file n, Tempdb file n]
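
A matching sketch for tempdb (hypothetical paths and sizes): one tempdb data file per data volume, mirroring the user-database layout above so tempdb IO is spread over the same channels.

-- Hypothetical: add one tempdb data file on each data volume.
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'D:\SQLData\tempdb2.ndf', SIZE = 20GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'E:\SQLData\tempdb3.ndf', SIZE = 20GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'F:\SQLData\tempdb4.ndf', SIZE = 20GB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev5, FILENAME = 'G:\SQLData\tempdb5.ndf', SIZE = 20GB);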

65 RAID versus SQL Server Extents
Default: allocate extent 1 from file 1, extent 2 from file 2, and so on. Disk IO is 64K, so only 1 disk in each RAID group is active
[Diagram: 4 disks (Controllers 1-2, Ports 0-1) – disk 2 gets extents 1, 5, 9, 13...; disk 3 gets 2, 6, 10, 14...; disk 4 gets 3, 7, 11, 15...; disk 5 gets 4, 8, 12, 16...]

66 Consecutive Extents (-E)
Allocate 4 consecutive extents from each file; OS issues 256K disk IO; each HDD in the RAID group sees 64K IO; up to 4 disks in the RG get IO
[Diagram: 4 disks – disk 2 gets extents 1-4, 17-20, 33-36; disk 3 gets 5-8, 21-24, 37-40; disk 4 gets 9-12, 25-28, 41-44; disk 5 gets 13-16, 29-32, 45-48]

67 Storage Summary
OLTP – endurance MLC or consumer MLC?
DW – MLC with higher over-provisioning
QA – consumer MLC or endurance MLC?
Tempdb – possibly SLC
Single log – HDD; multiple logs – SSD?
Backups / test restore / flat files – HDD
No caching, no auto-tiering

68 SAN

69 Software Cache + Tier

70 Cache + Auto-Tier
Good idea if: 1) no knowledge, 2) no control
In the database we have: 1) full knowledge, 2) full control – virtual file stats, filegroups, partitioning (see the query below)
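
A sketch of the "virtual file stats" point: sys.dm_io_virtual_file_stats reports reads, writes, and stall time per database file, which is exactly the knowledge a SAN-side cache/auto-tier layer does not have (column selection here is illustrative).

-- Per-file IO volume and latency since instance start.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name                  AS file_name,
       vfs.num_of_reads,
       vfs.num_of_bytes_read,
       vfs.num_of_writes,
       vfs.num_of_bytes_written,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
ORDER BY vfs.num_of_bytes_written DESC;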

71 Common SAN Vendor Configuration
[Diagram: Node 1 and Node 2 (768GB) – switch – SP A / SP B (24GB cache) over 8Gbps FC or 10Gbps FCoE – x4 SAS 2GB/s – main volume, log volume, SSD/10K/7.2K tiers, hot spares]
Path and component fault-tolerance, poor IO performance
Multi-path IO: preferred port, alternate port
Single large volume for data, additional volumes for log, tempdb, etc.
All data IO on a single FC port – 700MB/s IO bandwidth

72 Multiple Paths & Volumes (3)
Multiple quad-port FC HBAs; optional SSD volumes; data files must also be evenly distributed; many SAS ports; multiple local SSD for tempdb
[Diagram: Node 1 and Node 2 (768GB) – switch – SP A / SP B (24GB cache), 8Gb FC, x4 SAS 2GB/s – data volumes 1-16, SSD volumes 1-4, log volumes 1-4, local SSD x8 per node]

73 Multiple Paths & Volumes (2)
Multiple quad-port FC HBAs; optional SSD volumes; data files must also be evenly distributed; many SAS ports; multiple local SSD for tempdb
[Diagram: Node 1 and Node 2 (768GB) – switch – SP A / SP B (24GB cache), 8Gb FC, x4 SAS 2GB/s – data volumes 1-16, SSD volumes 1-4, log volumes 1-4, local SSD x8 per node]

74 8Gbps FC Rules
4-5 HDD RAID groups/volumes – SQL Server with -E only allocates 4 consecutive extents
2+ volumes per FC port – target 700MB/s per 8Gbps FC port
SSD volumes – limited by 700-800MB/s per 8Gbps FC port – too many ports required for serious BW – management headache from too many volumes

75 SQL Server
Table scan to a heap generates 512K IO – easy to hit 100MB/s per disk
Table scan to a (clustered) index generates 64K IO – 30-50MB/s per disk likely

76 EMC VNX 5300 FT DW Ref Arch

77 iSCSI & File Structure
[Diagram: three layouts of dual controllers (x4 10GbE each, RJ45/SFP+) – DB1 files and DB2 files on separate controllers vs. DB1 file 1 / DB2 file 1 and DB1 file 2 / DB2 file 2 spread across both]

78

79 EMC VMAX

80 EMC VMAX Original and 2nd Gen
2nd gen: 2.8 GHz Xeon w/turbo (Westmere), 24 CPU cores, 256 GB cache memory (maximum), Quad Virtual Matrix, PCIe Gen2
Original: 2.3 GHz Xeon (Harpertown), 16 CPU cores, 128 GB cache memory (maximum), Dual Virtual Matrix, PCIe Gen1

81 EMC VMAX 10K

82 EMC VMAX Virtual Matrix

83 VMAX Director

84 EMC VMAX Director
[Diagram: VMAX engine – two directors, each with IOH, VMI, FC, HBA, SAS]
VMAX 10K (new): up to 4 engines, 1 x 6c 2.8GHz per director, 50GB/s VM BW?, 16 x 8Gbps FC per engine
VMAX 20K: engine 4 QC 2.33GHz, 128GB, Virtual Matrix BW 24GB/s; system – 8 engines, 1TB, VM BW 192GB/s, 128 FE ports
VMAX 40K: engine 4 SC 2.8GHz, 256GB, Virtual Matrix BW 50GB/s; system – 8 engines, 2TB, VM BW 400GB/s, 128 FE ports
RapidIO IPC: 3.125GHz, 2.5Gb/s (8b/10b), 4 lanes per connection, 10Gb/s = 1.25GB/s, 2.5GB/s full duplex, 4 connections per engine = 10GB/s
36 PCI-E lanes per IOH, 72 combined: 8 FE, 8 BE, 16 VMI 1, 32 VMI 2

85

86 SQL Server Default Extent Allocation
[Diagram: data files 1-4 – file 1 holds extents 1, 5, 9, 13...; file 2 holds 2, 6, 10, 14...; file 3 holds 3, 7, 11, 15...; file 4 holds 4, 8, 12, 16...]
Allocate 1 extent per file in round robin, with proportional fill
EE/SE table scan tries to stay 1024 pages ahead? SQL can read 64 contiguous pages from 1 file
The storage engine reads index pages serially in key order
Partitioned table support for heap organization desired?

87 SAN
[Diagram: Node 1 and Node 2 (192GB, HBA) – switch – SP A / SP B (24GB cache, 768GB), 8Gb FC, x4 SAS 2GB/s – data volumes 1-16, log volume, SSD 1-8, SSD/10K tiers]

88 Clustered SAS
[Diagram: Node 1 and Node 2 (192GB, HBA) connected to clustered SAS storage – dual controllers with SAS host ports, IOC, PCIe switch, 2GB cache, SAS expanders, SAS in/out – HDD and SSD; plus InfiniBand/10GbE/RAID adapters]

89 Fusion-IO ioScale

90

91

92

