GMR Head
Northern California CMG, February 2004
Ted Oatway, Enterprise Solution Specialist
© Copyright 2004 Storage Technology Corporation (StorageTek)

Know Your Data
1) I/O is typically dominated by reads.
2) I/O is very random, even for large files.
3) Large caches have little to offer.
4) A few files are responsible for the majority of the activity.
5) These files are typically mapped in host memory by the calling process.
Reference Paper
A Comparison of File System Workloads
>Drew Roselli, Jacob R. Lorch, and Thomas E. Anderson
>University of California, Berkeley and University of Washington
>June 2000
Instructional workload (INS)
>Twenty laboratory machines
>Eight months of traces
Research workload (RES)
>13 desktop machines
>One year of traces
WEB workload
>A single web server for an online library project
>Uses the Postgres database management system
>2,300 accesses per day
>One month of traces
Basis Reference Papers
The BSD study
>In 1985, Ousterhout et al. presented a general characterization of dynamically collected traces [Oust85].
>They traced three servers running BSD UNIX for slightly over three days.
The Sprite study
>In 1991, Baker et al. conducted the same type of analysis on file server and client information and its effect on local cache activity.
>They collected four two-day sets of traces [Bake91].
The IBM study
>In 1991, Bozman et al. repeated many of the Sprite studies using traces from two separate IBM sites [Bozm91].
>This study confirmed that the Sprite findings applied to non-academic sites as well.
The NT study
>In 1999, the same studies were repeated on three sets of two-week traces taken from 45 hosts running Windows NT [Voge99].
Conclusions (1 of 3)
Reads vs. writes
>Reads typically dominate writes for block activity.
>WEB workloads are read intensive.
>RES activity is dominated by writes to logs when the local cache is small.
–When the local cache is increased, logging activity occurs on the host.
–The workload then matches INS.
>INS workload: a small local cache significantly increases read activity (5x with a too-small local cache).
Average block lifetime
>Under UNIX, newly created blocks die within an hour.
>Under NT, newly created blocks that survive one second are likely to remain alive a day.
>All workloads:
–Overwrites cause most deleted blocks.
–Overwrites show substantial locality.
–A small write buffer is sufficient to absorb write traffic for nearly all workloads.
>A 30-second write delay in cache benefits most workloads.
Conclusions (2 of 3)
Caching read traffic
>Small caches can sharply decrease disk read traffic.
>There is no support for the claim that disk traffic becomes dominated by writes when a large cache is used.
>Large caches show diminishing returns beyond the working-set size.
>Even a 1MB cache reduces read bandwidth by 65–90%.
Memory-mapping
>All modern workloads use memory-mapping to a large extent.
>Under UNIX, a small number of memory-mapped files are shared among many active processes.
>If a file is kept in memory as long as it is memory-mapped by any process, the miss rate for file map requests is extremely low.
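The diminishing-returns claim is easy to reproduce with a toy cache simulation. In this sketch (the block count, trace length, and Zipf-like skew are all invented for illustration, not taken from the traces above), an LRU cache holding just 1% of the blocks already absorbs a large share of the reads, while a 50x larger cache improves the miss rate far less than proportionally:

```python
import random
from collections import OrderedDict

def lru_miss_rate(trace, cache_size):
    """Fraction of accesses in `trace` that miss an LRU cache of `cache_size` blocks."""
    cache, misses = OrderedDict(), 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)        # refresh recency on a hit
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least-recently-used block
    return misses / len(trace)

random.seed(4)
n_blocks = 10_000
weights = [1 / (b + 1) for b in range(n_blocks)]   # Zipf-like: a few hot blocks
trace = random.choices(range(n_blocks), weights, k=50_000)

small = lru_miss_rate(trace, 100)      # cache holds 1% of all blocks
large = lru_miss_rate(trace, 5_000)    # a 50x larger cache
```

With this skew, `small` comes out around one half while `large` is still well above zero, because compulsory (first-touch) misses remain no matter how big the cache grows: the same diminishing-returns shape the traces show.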
Conclusions (3 of 3)
Read-ahead pre-fetching
>Applications are accessing larger files, and the maximum file size has increased in recent years.
>However, larger files are now more likely to be accessed randomly.
>This renders straightforward pre-fetching less useful.
File access patterns
>File access patterns are bimodal: most files tend to be mostly read or mostly written.
>This is especially true for files that are accessed frequently.
Understanding the Basic Building Block
Disk capacity has increased by a factor of 8x in the last five years
>18GB, 36GB, 73GB, 146GB
>Soon to be 300GB
Disk internal transfer rates have barely doubled in the same period
>32MB/sec to 60MB/sec
Critical Disk Drive Metrics
ITR (Internal Transfer Rate) – how quickly the disk can transfer data between the platters and the buffer cache.
ETR (External Transfer Rate) – how quickly the disk can transfer data between the buffer cache and the controller.
On-board cache – the buffer cache resident on the disk drive. Some disk controllers allow this cache to be operated in write-through mode, but this is unusual today.
RPM – revolutions per minute, or "how quickly the next block of data comes around."
Average access time – on average, how quickly the head can move to a selected track.
IOPS – RPM and average access time together determine the number of I/O operations the disk can perform per second.
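The relationship between RPM, access time, and IOPS can be sketched with back-of-the-envelope arithmetic: each random I/O pays roughly one average seek plus half a rotation. A minimal calculation (the 3.5ms and 4.9ms average seek times are assumed values for illustration, not figures from this deck):

```python
def theoretical_iops(rpm, avg_seek_ms):
    """Rough random-I/O ceiling for one drive:
    each operation costs one average seek plus half a rotation."""
    half_rotation_ms = 60_000 / rpm / 2     # half a revolution, in milliseconds
    return 1000 / (avg_seek_ms + half_rotation_ms)

# A 15K RPM drive with an assumed 3.5ms seek lands near the 179 IOPS
# quoted for 15K drives in the specification table on the next slide.
print(round(theoretical_iops(15_000, 3.5)))   # -> 182
print(round(theoretical_iops(10_000, 4.9)))   # -> 127
```

This is why RPM matters so much for random workloads: going from 10K to 15K RPM cuts the rotational half of the per-I/O cost by a third.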
Disk Specifications
Capacity       RPM   Cache  ETR         ITR        IOPS
36GB (older)   10K   4MB    100 MB/sec  35 MB/sec  123
36GB           10K   4MB    200 MB/sec  35 MB/sec  100
72GB           10K   4MB    200 MB/sec  35 MB/sec  100
36GB           15K   8MB    200 MB/sec  60 MB/sec  179
72GB           15K   8MB    200 MB/sec  60 MB/sec  179
146GB          10K   16MB   200 MB/sec  60 MB/sec  130
180GB          7.2K  16MB   200 MB/sec  28 MB/sec  89
Barracuda SATA disk drive:
120GB          7.2K  8MB    150 MB/sec  71 MB/sec  75
Understanding Volume (RAID) Groups
1) Small RAID groups are better than large RAID groups for most workloads.
2) Small RAID groups better emulate individual disk drives.
3) I/O is "bursty" by nature.
4) Bursty, random access patterns cause I/Os to block at the disk drive.
5) Contention occurs at the Volume Group level.
D-Series Disk – Basic Building Blocks
Volume Group: RAID5 (3+1). Volume = Logical Unit.
Rules of thumb
>The smallest disk in a stripe determines the overall size of the stripe.
>The slowest disk in a stripe determines the overall speed of the stripe.
>Contention occurs at the Volume Group level.
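The first two rules of thumb can be expressed directly in code. A small sketch (the helper name and the mixed-drive example are mine, not from the deck):

```python
def stripe_properties(disks):
    """disks: list of (capacity_gb, speed_mb_s) tuples for the stripe members.
    The smallest member caps per-disk usable capacity and the slowest member
    caps per-disk throughput, so both totals scale from the worst drive."""
    n = len(disks)
    capacity_gb = n * min(cap for cap, _ in disks)
    speed_mb_s = n * min(speed for _, speed in disks)
    return capacity_gb, speed_mb_s

# Mixing a 36GB/60MBps drive with a 72GB/35MBps drive wastes half of the
# larger disk and throttles the faster one:
print(stripe_properties([(36, 60), (72, 35)]))   # -> (72, 70)
```

The practical consequence is the one the deck keeps returning to: build volume groups from identical drives.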
Queuing Theory
>One 18GB drive (68 IOPS) serving a 60 IOPS load: AWQ = 181ms.
>The same 60 IOPS load spread across several 18GB drives (68 IOPS each): AWQ = 2.8ms, a 64x improvement (parallelism).
>One 36GB 15K drive (179 IOPS) serving the 60 IOPS load, versus a 36GB 10K drive (100 IOPS): AWQ = 4.3ms, a 42x improvement (concurrency).
AWQ = Average Wait Queue
White paper: Storage Systems Performance Considerations
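The slide's figures come from the model in the cited white paper, but the same qualitative story falls out of the textbook M/M/1 queue, whose mean queue wait is Wq = lambda / (mu * (mu - lambda)). This sketch uses that simpler model, so its numbers differ from the slide's while showing the same cliff near saturation:

```python
def mm1_wait_ms(arrival_iops, service_iops):
    """Mean time an I/O spends waiting in queue at an M/M/1 server, in ms:
    Wq = lambda / (mu * (mu - lambda))."""
    lam, mu = arrival_iops, service_iops
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate >= service rate")
    return 1000 * lam / (mu * (mu - lam))

# One 68-IOPS drive handling all 60 IOPS runs near saturation:
print(round(mm1_wait_ms(60, 68), 1))    # -> 110.3
# Splitting the load over two such drives cuts the wait almost 10x:
print(round(mm1_wait_ms(30, 68), 1))    # -> 11.6
# A single much faster drive (179 IOPS) helps just as dramatically:
print(round(mm1_wait_ms(60, 179), 1))   # -> 2.8
```

The nonlinearity is the point: wait time explodes as utilization approaches 100%, so either parallelism (more spindles) or concurrency (a faster spindle) buys far more than the raw IOPS ratio suggests.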
Small RAID Groups
Small RAID groups provide:
>More throughput than a single large RAID group (in aggregate).
–We do not design for a RAID level; we determine the best layout for the drives we have configured.
>A better configuration for high availability (H/A).
–A large RAID group often must place two or more disks in a single tray.
>Smaller LUNs with less contention.
–A RAID5 (3+1) using four 73GB disk drives is ~219GB usable.
–A RAID5 (7+1) using eight 73GB disk drives is ~511GB usable.
>Better RAID5 write performance.
>Better balance on four-loop FC-AL disk arrays.
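The usable-capacity arithmetic behind the LUN-size bullets can be sketched in a few lines. These use raw (unformatted) drive capacities; formatted capacity comes out somewhat lower:

```python
def raid5_usable_gb(n_data, disk_gb):
    """RAID5 (n_data + 1 parity): one disk's worth of raw capacity holds parity."""
    return n_data * disk_gb

def raid10_usable_gb(n_disks, disk_gb):
    """RAID10: half the raw capacity is consumed by mirror copies."""
    return n_disks * disk_gb // 2

print(raid5_usable_gb(3, 73))    # RAID5 (3+1) of 73GB drives  -> 219
print(raid5_usable_gb(7, 73))    # RAID5 (7+1) of 73GB drives  -> 511
print(raid10_usable_gb(4, 73))   # RAID10 of four 73GB drives  -> 146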
More Throughput
>A RAID5 (3+1), 128K segment size design benchmarks at about 9,000 IOPS.
>A RAID5 (7+1), 128K segment size design benchmarks at about 11,000 IOPS.
>Two RAID5 (3+1) designs benchmark at 18,000 IOPS, using the same number of disks as a single (7+1).
Better High Availability (H/A)
>Configuring RAID10 vertically potentially puts the mirror in the same disk tray as the primary.
>Configuring RAID10 horizontally puts the mirror in a separate disk tray from the primary.
>Configuring RAID5 horizontally puts all disks in the same disk tray.
>Configuring RAID5 vertically puts each disk in a separate disk tray.
Less Contention
One 8-way stripe vs. two 4-way stripes: two smaller stripes allow more requests to be serviced concurrently than one large stripe.
RAID5 Write Algorithms
Full-stripe writes: writes that update all the stripe units in a parity group.
>The new parity value is computed across all new blocks.
>No additional read or write operations are required to compute parity.
>Full-stripe writes are the most efficient type of write.
Reconstruct writes: writes that compute parity by reading the data in the stripe that is not being updated.
>Parity is then computed over this data and the new data.
>Reconstruct writes are less efficient than full-stripe writes.
Read-modify writes: writes that compute the new parity value by
>1) reading the old data blocks from the disks to be updated,
>2) reading the old parity blocks for the stripe,
>3) calculating how the new data differs from the old data, and
>4) changing the old parity to reflect these differences.
Source: Striping in a RAID Level 5 Disk Array, University of Michigan
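The three algorithms can be compared by counting disk I/Os, and the parity itself is just a bytewise XOR. A sketch (the counting convention of one I/O per block read or written is my simplification; real controllers coalesce operations):

```python
from functools import reduce

def parity(blocks):
    """RAID5 parity: the bytewise XOR of equal-sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def write_ios(stripe_width, blocks_updated):
    """Disk I/Os to update `blocks_updated` data blocks in a stripe of
    `stripe_width` data blocks plus one parity block, taking the cheaper
    of the reconstruct and read-modify-write algorithms."""
    if blocks_updated == stripe_width:
        return stripe_width + 1                 # full-stripe: write all, read nothing
    reconstruct = (stripe_width - blocks_updated) + blocks_updated + 1
    read_modify = 2 * blocks_updated + 2        # read+write each block and the parity
    return min(reconstruct, read_modify)

# XOR parity also rebuilds any single lost block:
p = parity([b'\x01\x02', b'\x03\x04'])
assert parity([b'\x01\x02', p]) == b'\x03\x04'

# A 1-block update in a wide (7+1) stripe costs 4 I/Os via read-modify-write;
# a full-stripe write on a narrow (3+1) stripe also costs 4 I/Os yet commits
# three data blocks, which is why small groups write more efficiently.
print(write_ios(7, 1), write_ios(3, 3))   # -> 4 4
```

This is the arithmetic behind the rule of thumb on the next slide: the larger the fraction of the stripe a write spans, the fewer extra I/Os parity maintenance costs.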
RAID5 Rules of Thumb
>Reads in RAID Level 5 are very similar to RAID Level 0.
>Writes in RAID Level 5 are quite different.
>In general, writes that span a larger fraction of the stripe are more efficient than writes that span a smaller fraction.
>Smaller RAID groups tend to be more efficient for writes than large RAID groups.
Source: Striping in a RAID Level 5 Disk Array, University of Michigan
Designing with Four FC-AL Loops
>Larger-capacity disk drives allow for smaller arrays with 20TB capacities.
>Larger capacities in smaller configurations require fewer components overall.
>Attention to small details is now more important.
Small RAID – Matching Segment Size
(Chart: matching the segment size to the small RAID group shows an 80% improvement overall versus the baseline.)
Loop Saturation – 1Gbps FC-AL
Vertical Load Balancing (StorageTek Array)
>A D200 tray has a 2Gbps back-plane.
>One tray supports two fibers for 2 x 2Gbps throughput.
>Two trays support four fibers for 4 x 2Gbps throughput.
>Three trays support four fibers for 4 x 2Gbps throughput, but the loops are unbalanced.
>Four trays support four fibers for 4 x 2Gbps throughput, and the loops are balanced.
(Diagram: D280 controller with cache, battery, and a link control card (LCC) per tray.)
Horizontal Load Balancing (StorageTek Array)
>Each tray supports two fibre channel loops.
>All even-numbered slots are serviced by the red loop, with fail-over to the green loop.
>All odd-numbered slots are serviced by the green loop, with fail-over to the red loop.
3-Tray Design Problems – RAID5 (6+1)
(Diagram: three trays of ten slots each, holding Volume Groups 1 through 4, a hot spare, and unused slots.)
3-Tray Design Problems – Back-End Primary Paths
Primary paths per back-end loop (four loops):
>3-Tray RAID5 (6+1): 10 / 9 / 5 / 4 – badly unbalanced
>4-Tray RAID5 (6+1): 8 / 6 / 8 / 6 – still uneven
>4-Tray RAID5 (3+1): 7 / 7 / 7 / 7 – balanced
Source: Jean Dion, StorageTek Canada
Spinning on a Cache Miss
>1 LUN per controller = 50% throughput.
>6 LUNs per controller = 100% throughput.
A Real-World Example
>Traditional ORACLE layout using the Optimal Flexible Architecture (OFA).
>Newer ORACLE layout using the "Stripe and Mirror Everything" architecture (S.A.M.E.).
Parallel Access - 24 Disks - RAID1
>Twelve RAID1 volume groups: Data1-4, Index1-4, Archive, RollBack, /u01, Temp. Twelve threads.
>RAID1 - 72GB drives - 144GB per volume group - 864GB usable.
>Since IOPS cannot be shared across volume groups, hot spots are created.
Parallel Access - 24 Disks - RAID5
>Six RAID5 volume groups: Data1-Index2, Data2-Index3, Data3-Index4, Data4-Index1, RollBack-Dump, Temp-/u01. Six threads.
>72GB drives, presented to the server as 12 separate LUNs that are NOT concatenated together.
>Faster "virtual" volumes are more resistant to hot spots.
>For the same number of disks: some wasted disk, but no hot spots.
Stripe and Mirror Everything (SAME)
All areas (Data, Index, Swap, Rollback, Dump, Temp, /u01) share a single wide stripe.
>Option #1 - same size DB: 16 disks, 1 thread.
>Option #2 - same number of disks: 24 disks, 1 thread.
ORACLE RDBMS Layout
Data_vg
>RAID5 (3+1), 73GB 10K RPM disk drives
>~219GB volume group, two ~110GB volumes
Oracle_vg
>RAID1 (1+1), 73GB 10K RPM disk drives
>~73GB volume group, two ~36GB volumes
(Two RAID5 groups and one RAID1 group, split across Service Processors "A" and "B".)
Technologies Roadmap (2000–2015)
Optical-electronic technologies
>DVD R/W (18GB today)
>Blu-ray Disc (200GB planned)
>Holographic storage (3D)
>Colossal Storage Corp. project: atomic holography recording, density 200 Tbits/in²
Other technologies (magnetic, MEMS, ...)
>Superparamagnetic effect reached by ~2005 at 60 Gbits/in² (source: IBM)
>Perpendicular magnetization
>Heat-assisted magnetic recording (1,000 Gbits/in² planned)
>Atomic resolution storage (1,000 Gbits/in²)
>AFM storage (IBM Millipede project)
>Self-Ordered Magnetic Arrays (SOMA)
Increasing the Areal Density
SUPERPARAMAGNETIC LIMIT:
>Minimum elementary bit size ~9nm.
>Could be reached by 2005.
>60–70 Gb/in² is the limit.
TRANSITION WIDTH:
>The width between two neighboring bits of opposite magnetization: minimum distance 40 to 80nm.
SIDE-TRACK EFFECT:
>Requires extra space between tracks to prevent over-writing.
TRACKING:
>The smaller the bits get, the more difficult it is to read them.
Source: Seagate – From Gigabytes to Exabytes
Single-Pole Perpendicular Magnetic Recording
High-density magnetic data storage
>Longitudinal recording methods lay the magnetization of the media in the plane of the recording surface.
>Perpendicular recording methods stand the magnetization of the media on end, perpendicular to the plane of the recording surface.
>Perpendicular recording may approach 1 Tb per in²: 1TB of data on a 3.5-inch disk, or 1TB on a tape cartridge.
Source: Seagate – From Gigabytes to Exabytes
Heat-Assisted Magnetic Recording
Heat-Assisted Magnetic Recording (HAMR)
>Also known as optically assisted recording ("Is it HAMR or OAR?").
>A laser heats spots on the media while data is simultaneously written magnetically.
>The net effect is that when the media is heated, its coercivity, the field required to write on the media, is reduced.
>Provides higher stability against superparamagnetism.
Source: Seagate – From Gigabytes to Exabytes
Micro-Electro-Mechanical Systems (MEMS)
Thermomechanical storage: the IBM Millipede, based on AFM (atomic-force microscopy), also called "probe recording".
>Tiny depressions melted by an AFM tip into a polymer medium represent stored data bits that can then be read by the same tip.
>60Kbps throughput per tip, but there can be thousands of heads in an array.
>150 Gb/in² to 400 Gb/in².
Questions, Concerns or Comments?
Holographic Memory Systems
Volume holography technologies
First to market
>InPhase's Tapestry product
Initial capacity
>100GB
Transfer rate
>20MB/sec
>As optical storage goes, InPhase believes that dual-layer DVD will run out of capacity at 50GB.
Still many challenges ahead
>Materials
>Laser technologies
>Fast-frame-rate, large-array detectors
SQL Server Performance Considerations
1) Let SQL Server do most of the tuning.
2) Reduce I/O so that the buffer cache is best utilized.
3) Create and maintain good indexes.
4) Multiple instances
   a. Performance tuning can be complicated when running multiple instances of SQL Server 2000.
5) Partition large data sets and indexes.
6) Monitor disk I/O subsystem performance.
7) Tune applications and queries.
8) Optimize active data.
   a. As much as 80 percent of database activity may be due to the most recently loaded data.
Source: Microsoft – Partitioning Very Large SQL Server Databases
SQL Server Partitioning
SQL Server activity to these objects can be separated across different hard drives, RAID controllers, and PCI channels (or combinations of the three):
>Transaction logs
>Database
>tempdb
>Tables
>Non-clustered indexes
Benefits
>Provides the most flexibility, allowing separate RAID channels to be associated with the different areas of activity.
>Takes full advantage of online RAID expansion.
>Easily relates disk queuing associated with each activity back to a distinct RAID channel; disk queuing issues are then simply resolved by adding more drives to the RAID array.
Source: Microsoft – Partitioning Very Large SQL Server Databases
SQL Server Layout
>tempdb: RAID5 (3+1) group
>Database and tables: RAID5 (3+1) group
>Transaction logs: RAID1 (1+1) group
RAID5 (3+1): 73GB 10K RPM disk drives, ~219GB volume group, two ~110GB volumes.
RAID1 (1+1): 73GB 10K RPM disk drives, ~73GB volume group, two ~36GB volumes.
(Groups are split across Service Processors "A" and "B".)