
1 GMR Head Northern California CMG February 2004 Ted Oatway Enterprise Solution Specialist

2 Know Your Data
1) I/O is typically dominated by reads.
2) I/O is very random, even for large files.
3) Large caches have little to offer.
4) A few files are responsible for the majority of the activity.
5) These files are typically mapped in host memory by the calling process.
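
These characteristics are easy to check against a workload of your own. Below is a minimal sketch, assuming a hypothetical comma-separated trace with one `op,offset,size` record per line; the file name and record format are illustrative, not taken from the referenced study.

```python
import csv

def summarize_trace(path: str) -> None:
    """Report the read/write mix and how sequential the accesses are."""
    reads = writes = sequential = total = 0
    next_expected = {}                      # op type -> offset that would continue the last access
    with open(path, newline="") as f:
        for op, offset, size in csv.reader(f):
            offset, size = int(offset), int(size)
            total += 1
            if op == "R":
                reads += 1
            else:
                writes += 1
            if next_expected.get(op) == offset:
                sequential += 1             # continues straight on from the previous access
            next_expected[op] = offset + size
    print(f"reads: {reads/total:.0%}  writes: {writes/total:.0%}  "
          f"sequential: {sequential/total:.0%}")

summarize_trace("io_trace.csv")             # illustrative file name
```

A high read share and a low sequential share would match points 1 and 2 above.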

3 Reference Paper
Instructional Workload (INS)
>twenty laboratory machines
>eight months of traces
Research Workload (RES)
>13 desktop machines
>one year of traces
WEB workload
>single web server for an online library project
>uses the Postgres database management system
>2,300 accesses per day
>one month of traces
A Comparison of File System Workloads
>Drew Roselli, Jacob R. Lorch, and Thomas E. Anderson
>University of California, Berkeley, and University of Washington
>June 2000

4 Basis Reference Papers
The BSD study
>In 1985, Ousterhout et al. presented a general characterization of dynamically collected traces [Oust85].
>They traced three servers running BSD UNIX for slightly over three days.
The Sprite study
>In 1991, Baker et al. conducted the same type of analysis on file server and client information and its effect on local cache activity.
>They collected four two-day sets of traces [Bake91].
The IBM study
>In 1991, Bozman et al. repeated many of the Sprite studies using traces from two separate IBM sites [Bozm91].
>This study confirmed that the Sprite findings applied to non-academic sites as well.
The NT study
>In 1999, the same studies were repeated on three sets of two-week traces taken from 45 hosts running Windows NT [Voge99].

5 Conclusions (1 of 3)
Reads vs. Writes
>Reads typically dominate writes for block activity.
>WEB workloads are read intensive.
>RES – activity is dominated by writes to logs when the local cache is small.
 –When the local cache is increased, logging activity occurs on the host.
 –The workload then matches INS.
>INS – a small local cache significantly increases read activity (about 5x when the local cache is too small).
Average Block Lifetime
>Under UNIX, newly created blocks die within an hour.
>Under NT, newly created blocks that survive one second are likely to remain alive a day.
>All workloads –
 –Overwrites cause most deleted blocks.
 –Overwrites show substantial locality.
 –A small write buffer is sufficient to absorb write traffic for nearly all workloads.
>A 30-second write delay in cache benefits most workloads (see the sketch below).
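
A toy illustration of the write-delay point above, with every parameter (write rate, block counts, overwrite locality, delay) invented for the example: blocks that are re-written within the delay window never have to be flushed, so a modest buffer absorbs most write traffic whenever overwrites show locality.

```python
import random

def absorbed_fraction(n_writes: int, hot_blocks: int, cold_blocks: int,
                      hot_ratio: float, delay_s: float, write_rate_per_s: float) -> float:
    """Fraction of writes whose data is superseded within the write-delay window.

    Each time a block is re-written within delay_s of its previous write, the
    earlier copy never has to reach disk, so that earlier write is 'absorbed'.
    """
    random.seed(1)                           # repeatable illustration
    last_write_time = {}
    absorbed = 0
    for i in range(n_writes):
        t = i / write_rate_per_s
        # Skewed access: most writes hit a small set of hot blocks (overwrite locality).
        if random.random() < hot_ratio:
            block = random.randrange(hot_blocks)
        else:
            block = hot_blocks + random.randrange(cold_blocks)
        prev = last_write_time.get(block)
        if prev is not None and t - prev <= delay_s:
            absorbed += 1
        last_write_time[block] = t
    return absorbed / n_writes

# Illustrative parameters only: 100 writes/s, 90% of writes aimed at 200 hot blocks.
print(f"{absorbed_fraction(50_000, 200, 50_000, 0.9, 30.0, 100.0):.0%} of writes absorbed")
```

With these assumed numbers, roughly nine out of ten writes are overwritten inside the 30-second window and never generate disk traffic.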

6 Conclusions (2 of 3)
Caching read traffic
>Small caches can sharply decrease disk read traffic.
>There is no support for the claim that disk traffic becomes dominated by writes when a large cache is used.
>Large caches show a diminishing return beyond the working-set size.
>Even a 1MB cache reduces read bandwidth by 65–90%.
Memory-mapping
>All modern workloads use memory-mapping to a large extent.
>Under UNIX, a small number of memory-mapped files are shared among many active processes.
>If a file is kept in memory as long as it is memory-mapped by any process, the miss rate for file map requests is extremely low.
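
For readers who have not used memory-mapped file access, here is a minimal Python sketch of the pattern the study describes (the file name is illustrative): the file is mapped into the process address space and read through ordinary slicing, with pages faulted in on demand rather than fetched by explicit read() calls.

```python
import mmap

# Map an existing file read-only and access it like a byte buffer.
# "trace.dat" is an illustrative file name, not something from the study.
with open("trace.dat", "rb") as f:
    with mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        header = mm[:16]          # the first 16 bytes are paged in on demand
        print(header.hex())
```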

7 Conclusions (3 of 3)
Read-ahead pre-fetching
>Applications are accessing larger files, and the maximum file size has increased in recent years.
>However, larger files are more likely to be accessed randomly than before.
>This renders straightforward pre-fetching less useful.
File access patterns
>File access patterns are bimodal: most files tend to be mostly read or mostly written.
>This is especially true for files that are accessed frequently.

8 Understanding the Basic Building Block
Disk capacity has increased by a factor of 8x in the last five years
>18GB, 36GB, 73GB, 146GB
>Soon to be 300GB
Disk internal transfer rates have barely doubled over the same period
>32MB/sec to 60MB/sec

9 Critical Disk Drive Metrics
ITR – Internal Transfer Rate – how quickly the disk can transfer data between the platters and the buffer cache.
ETR – External Transfer Rate – how quickly the disk can transfer data between the buffer cache and the controller.
On-Board Cache – the buffer cache resident on the disk drive. Some disk controllers allow this cache to be operated in write-through mode, but this is unusual today.
RPM – Revolutions Per Minute, or "how quickly the next block of data comes around".
Average Access Time – on average, how quickly the head can move to a selected track.
IOPS – rotational latency (set by RPM) plus average access time determines the number of I/O operations the disk can perform per second (a worked example follows below).
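
As a rough illustration of the IOPS relationship, a drive's random-I/O ceiling can be estimated as one average seek plus half a rotation per operation. The seek times below are assumed values chosen to resemble 10K and 15K drives, not quoted specifications.

```python
def estimated_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O ceiling: one average seek plus half a rotation per I/O."""
    half_rotation_ms = 0.5 * 60_000.0 / rpm      # half a revolution, in milliseconds
    service_time_ms = avg_seek_ms + half_rotation_ms
    return 1000.0 / service_time_ms

# Assumed example drives (illustrative seek times, not quoted specifications).
print(round(estimated_iops(10_000, 5.0)))   # ~125 IOPS for a 10K RPM drive
print(round(estimated_iops(15_000, 3.8)))   # ~172 IOPS for a 15K RPM drive
```

Note that raw transfer rates barely enter this estimate; for random I/O, mechanical latency dominates.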

10 Disk Specifications
Capacity               RPM    Cache   ETR          ITR         IOPS
36GB (older)           10K    4MB     100 MB/sec   35 MB/sec
…GB                    10K    4MB     200 MB/sec   35 MB/sec
…GB                    10K    4MB     200 MB/sec   35 MB/sec
…GB                    15K    8MB     200 MB/sec   60 MB/sec
…GB                    15K    8MB     200 MB/sec   60 MB/sec
…GB                    10K    16MB    200 MB/sec   60 MB/sec
…GB                    7.2K   16MB    200 MB/sec   28 MB/sec   89
Barracuda SATA 120GB   7.2K   8MB     150 MB/sec   71 MB/sec   75

11 Understanding Volume (RAID) Groups
1) Small RAID groups are better than larger RAID groups for most workloads.
2) Small RAID groups better emulate individual disk drives.
3) I/O is "bursty" by nature.
4) "Bursty" and random access patterns cause I/Os to block at the disk drive.
5) Contention occurs at the Volume Group level.

12 D-Series Disk – Basic Building Blocks
Diagram labels: Volume Group, RAID5 (3+1), Volume = Logical Unit.
Rules of Thumb
>The smallest disk in a stripe determines the overall size of the stripe.
>The slowest disk in a stripe determines the overall speed of the stripe.
>Contention occurs at the Volume Group level.

13 Queuing Theory
AWQ = Average Wait Queue
>One 18GB drive rated at 68 IOPS servicing a 60 IOPS load: AWQ = 181ms.
>Two 18GB drives at 68 IOPS each sharing the 60 IOPS load (parallelism): AWQ = 2.8ms, a 64x improvement.
>Concurrency case: AWQ = 4.3ms, a 42x improvement (diagram labels: 60 IOPS, 200 IOPS, 36GB 15K 179 IOPS, 36GB 10K 100 IOPS).
White Paper – Storage Systems Performance Considerations
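
To see why spreading the same load over more spindles collapses the wait, here is a minimal M/M/1-style sketch. It is a simplification and will not reproduce the AWQ figures above exactly (those come from the referenced white paper); the point it shows is that queueing delay explodes as a single drive approaches its IOPS ceiling and falls sharply when the load is split.

```python
def mm1_wait_ms(arrival_iops: float, service_iops: float) -> float:
    """Mean time a request waits in queue for an M/M/1 server, in milliseconds."""
    rho = arrival_iops / service_iops          # utilisation
    if rho >= 1.0:
        return float("inf")                    # the queue grows without bound
    service_time_s = 1.0 / service_iops
    return 1000.0 * rho * service_time_s / (1.0 - rho)

# One 68 IOPS drive carrying the full 60 IOPS load vs. two drives carrying 30 IOPS each.
print(round(mm1_wait_ms(60, 68), 1))   # single drive: ~110 ms of queueing
print(round(mm1_wait_ms(30, 68), 1))   # each of two drives: ~11.6 ms
```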

14 Small RAID Groups
Small RAID Groups provide –
>More throughput than a single large RAID group (in aggregate).
 –We do not design for a RAID level but determine the best layout for the drives we have configured.
>A better configuration for High Availability (H/A).
 –A large RAID group often must have two or more disks in a single tray.
>Smaller LUNs with less contention.
 –A RAID5 (3+1) using four 73GB disk drives is 210GB useable.
 –A RAID5 (7+1) using eight 73GB disk drives is 420GB useable.
>Better RAID5 write performance.
>Better balance on four FC-AL loop disk arrays.
(A quick comparison of the two eight-drive layouts follows below.)
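
A back-of-the-envelope comparison of the two ways to lay out eight drives, as referenced in the list above. The percentages are simple parity arithmetic; the IOPS figures are the benchmark numbers quoted on the next slide, not new measurements.

```python
def raid5_usable_fraction(group_size: int) -> float:
    """Fraction of a RAID5 group's raw space available for data (one disk's worth holds parity)."""
    return (group_size - 1) / group_size

# Eight drives laid out as one (7+1) group or as two (3+1) groups.
print(f"RAID5 (7+1):     {raid5_usable_fraction(8):.1%} usable")   # 87.5%
print(f"2 x RAID5 (3+1): {raid5_usable_fraction(4):.1%} usable")   # 75.0%
# Per slide 15, the two small groups benchmark at roughly 18,000 IOPS in
# aggregate versus about 11,000 IOPS for the single large group.
```

The trade is a modest amount of capacity for substantially more aggregate throughput and less contention per LUN.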

15 More Throughput
>A RAID5 (3+1), 128K segment size design benchmarks at about 9,000 IOPS.
>A RAID5 (7+1), 128K segment size design benchmarks at about 11,000 IOPS.
>Two RAID5 (3+1) designs benchmark at 18,000 IOPS.

16 Better High Availability (H/A)
>Configuring RAID10 vertically potentially puts the mirror in the same disk tray as the primary.
>Configuring RAID10 horizontally puts the mirror in a separate disk tray from the primary.
>Configuring RAID5 horizontally puts all disks in the same disk tray.
>Configuring RAID5 vertically puts each disk in a separate disk tray.

17 Less Contention
Diagram: one 8-way stripe vs. two 4-way stripes (S = serviced concurrently).

18 RAID5 Write Algorithms
Full-stripe writes: writes that update all the stripe units in a parity group.
>The new parity value is computed across all new blocks.
>No additional read or write operations are required to compute parity.
>Full-stripe writes are the most efficient type of write.
Reconstruct writes: writes that compute parity by reading in the data from the stripe that is not to be updated.
>Parity is then computed over this data and the new data.
>Reconstruct writes are less efficient than full-stripe writes.
Read-modify writes: writes that compute the new parity value by
>1) reading the old data blocks from the disks to be updated,
>2) reading the old parity blocks for the stripe,
>3) calculating how the new data is different from the old data, and
>4) changing the old parity to reflect these differences.
Source: Striping in a RAID Level 5 Disk Array, University of Michigan
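
The read-modify-write path is just XOR arithmetic, which a short sketch can make concrete (pure Python over byte strings, standing in for real disk blocks): the new parity is derived from the old data, the old parity, and the new data alone, at a cost of two reads plus two writes per logical write.

```python
def xor_bytes(*chunks: bytes) -> bytes:
    """Bytewise XOR of equally sized chunks."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

# A 3+1 stripe: three data chunks and their parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_bytes(d0, d1, d2)

# Read-modify-write of d1: read old data and old parity (2 reads),
# then write new data and new parity (2 writes): four I/Os for one logical write.
new_d1 = b"XXXX"
new_parity = xor_bytes(parity, d1, new_d1)

# The shortcut matches recomputing parity across the whole stripe.
assert new_parity == xor_bytes(d0, new_d1, d2)
```

A full-stripe write would skip the two reads entirely, which is why writes that span more of the stripe are more efficient.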

19 RAID5 Rules of Thumb
>Reads in a RAID Level 5 are very similar to RAID Level 0.
>Writes in a RAID Level 5 are quite different.
>In general, writes that span a larger fraction of the stripe are more efficient than writes that span a smaller fraction.
>Smaller RAID groups tend to be more efficient for writes than large RAID groups.
Source: Striping in a RAID Level 5 Disk Array, University of Michigan

20 Designing with Four FC-AL Loops
>Larger capacity disk drives allow for smaller arrays with 20TB capacities.
>Larger capacities in smaller configurations require fewer components overall.
>Attention to small details is now more important.

21 Small RAID – Matching Segment Size
Benchmark chart: baseline vs. matched segment size – 80% improvement overall.

22 Loop Saturation – 1Gbps FC-AL

23 Vertical Load Balancing (StorageTek Array)
>A D200 tray has a 2Gbps back-plane.
>One tray supports two fibers for 2 x 2Gbps throughput.
>Two trays support four fibers for 4 x 2Gbps throughput.
>Three trays support four fibers for 4 x 2Gbps throughput, but the loops are unbalanced.
>Four trays support four fibers for 4 x 2Gbps throughput, and the loops are balanced.
(Diagram labels: D280 controller, LCC, cache, battery.)

24 Horizontal Load Balancing (StorageTek Array)
>Each tray supports two fibre channel loops.
>All even-numbered slots are serviced by the red loop, with fail-over to the green loop.
>All odd-numbered slots are serviced by the green loop, with fail-over to the red loop.

25 3-Tray Design Problems
Diagram: three trays holding Volume Groups 1–4 configured as RAID5 (6+1), plus a hot spare and unused slots.

26 3-Tray Design Problems
Diagram: back-end primary paths to Volume Groups 1–4 across three trays configured as RAID5 (6+1), RAID5 (3+1), and RAID5 (6+1).
Source: Jean Dion, StorageTek Canada

27 Spinning on a Cache Miss
>1 LUN per controller = 50% throughput.
>6 LUNs per controller = 100% throughput.

28 A Real-World Example
>Traditional ORACLE layout using the Optimal Flexible Architecture (OFA).
>Newer ORACLE layout using the "Stripe and Mirror Everything" architecture (S.A.M.E.).

29 Parallel Access – 24 Disks – RAID1
Diagram volumes: Data1–Data4, Index1–Index4, Archive, RollBack, /u01, Temp.
>RAID1 – 72GB drives – 144GB per Volume Group – 864GB useable.
>Twelve threads.
>Since IOPS cannot be shared, hot spots are created.

30 Parallel Access – 24 Disks – RAID5
Diagram volumes: Data1 – Index2, Data2 – Index3, Data3 – Index4, Data4 – Index1, RollBack – Dump, Temp – /u01.
>RAID5 – 72GB drives.
>Presented to the server as 12 separate LUNs that are NOT concatenated together.
>Faster "virtual" volumes are more resistant to hot spots.
>Six threads.
>For the same number of disks we see: wasted disk, but no hot spots.

31 Stripe and Mirror Everything (SAME)
Diagram volumes: Data, Index, Swap, Rollback, Dump, Temp, /u01.
>24 disks – 1 thread; 16 disks – 1 thread.
>Option #1 – same size DB; Option #2 – same number of disks.

32 ORACLE RDBMS Layout
Diagram: two RAID5 groups and one RAID1 group split across Service Processors "A" and "B".
>Data_vg – RAID5 (3+1), 73GB 10K RPM disk drives; ~219GB Volume Group presented as two ~110GB volumes.
>Oracle_vg – RAID1 (1+1), 73GB 10K RPM disk drives; ~73GB Volume Group presented as two ~36GB volumes.

33 Technologies Roadmap
Optical-electronic technologies
>DVD R/W (18GB today)
>Blu-ray Disc (200GB planned)
>Holographic storage (3D)
>Colossal Storage Corp. project (atomic holographic recording): density 200 Tbits/in²
Other technologies (magnetic, MEMS, ...)
>Atomic resolution storage (1,000 Gbits/in²)
>Heat-assisted magnetic recording (1,000 Gbits/in² planned)
>Perpendicular magnetization
>AFM storage (IBM Millipede project)
>Superparamagnetic effect (2005): 60 Gbits/in² (source: IBM)
>Self-Ordered Magnetic Arrays (SOMA)

34 Increasing the Areal Density
SUPERPARAMAGNETIC LIMIT:
>Min elementary bit size 9 nm
>Could be reached by 2005
>Gb/in² is the limit
TRANSITION WIDTH:
>The width between two neighboring bits of opposite magnetization: minimum distance 40 to 80 nm
SIDE TRACK EFFECT:
>Requires extra space between tracks to prevent over-writing
TRACKING:
>The smaller the bits get, the more difficult it is to read them
Source: Seagate – From Gigabytes to Exabytes

35 Single Pole Perpendicular Magnetic Recording
High-density magnetic data storage
>Longitudinal recording methods lay the magnetic media in the plane of the recording surface.
>Perpendicular recording methods stand the magnetization of the media on end, perpendicular to the plane of the recording surface.
>Perpendicular recording may approach 1 Tb per in²
 –1TB of data on a 3.5 inch disk (a rough arithmetic check follows below)
 –1TB of data on a tape cartridge
Source: Seagate – From Gigabytes to Exabytes
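
As a rough sanity check on the "1TB on a 3.5 inch disk" figure, areal density times recordable surface area gives capacity. The recording-band radii below are assumptions chosen for illustration, not Seagate numbers.

```python
import math

def platter_surface_capacity_tb(density_tb_per_in2: float,
                                outer_radius_in: float,
                                inner_radius_in: float) -> float:
    """Capacity of one platter surface: areal density times the recordable annulus area."""
    area_in2 = math.pi * (outer_radius_in**2 - inner_radius_in**2)
    return density_tb_per_in2 * area_in2

# 1 Tb/in^2 = 0.125 TB/in^2; assume a recording band of roughly 0.8" to 1.8"
# on a 3.5-inch-form-factor platter (illustrative radii).
tb_per_surface = platter_surface_capacity_tb(1.0 / 8.0, 1.8, 0.8)
print(round(tb_per_surface, 2))   # ~1.0 TB on a single surface
```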

36 Heat Assisted Magnetic Recording
Heat Assisted Magnetic Recording (HAMR)
>Also known as optically assisted recording ("Is it HAMR or OAR?").
>Involves producing a hot spot (commonly with a laser) on the media while data is simultaneously written magnetically.
>The net effect is that when the media is heated, the coercivity, or field required to write on the media, is reduced.
>Higher stability against superparamagnetism.
A laser heats spots on the disk to make them easier to magnetize.
Source: Seagate – From Gigabytes to Exabytes

37 Micro Electro-Mechanical System (MEMS)
The Millipede (IBM) – AFM (atomic-force microscopy) or "probe recording"
Thermomechanical storage:
>Tiny depressions melted by an AFM tip into a polymer medium represent stored data bits that can then be read by the same tip.
>60Kbps throughput – but there can be thousands of heads in an array.
>150 Gb/in² to 400 Gb/in².

38 Questions, Concerns or Comments?

39 Holographic Memory Systems
First to market
>InPhase's Tapestry product
Initial capacity
>100GB
Transfer rate
>20MB/sec
As optical storage goes, InPhase believes that dual-layer DVD will run out of capacity at 50GB.
Still many challenges ahead
>Materials
>Laser technologies
>Fast-frame-rate, large-array detectors
Volume holography technologies

40 SQL Server Performance Considerations
1) Let SQL Server do most of the tuning.
2) Reduce I/O so that the buffer cache is best utilized.
3) Create and maintain good indexes.
4) Multiple instances
   a. Performance tuning can be complicated when running multiple instances of SQL Server.
5) Partition large data sets and indexes.
6) Monitor disk I/O subsystem performance.
7) Tune applications and queries.
8) Optimize active data.
   a. As much as 80 percent of database activity may be due to the most recently loaded data.
Source: Microsoft – Partitioning Very Large SQL Server Databases

41 SQL Server Partitioning
SQL Server activity to these objects can be separated across different hard drives, RAID controllers, and PCI channels (or combinations of the three):
>Transaction Logs
>Database
>tempdb
>Tables
>Non-clustered Indexes
This separation:
>Provides the most flexibility, allowing separate RAID channels to be associated with the different areas of activity.
>Takes full advantage of online RAID expansion.
>Easily relates disk queuing associated with each activity back to a distinct RAID channel. Disk queuing issues are then simply resolved by adding more drives to the RAID array.
Source: Microsoft – Partitioning Very Large SQL Server Databases

42 SQL Server Layout
Diagram: two RAID5 groups and one RAID1 group split across Service Processors "A" and "B", hosting tempdb, the database and tables, and the transaction logs.
>RAID5 (3+1), 73GB 10K RPM disk drives – ~219GB Volume Group presented as two ~110GB volumes.
>RAID1 (1+1), 73GB 10K RPM disk drives – ~73GB Volume Group presented as two ~36GB volumes.

