ANALYZING STORAGE SYSTEM WORKLOADS

1 ANALYZING STORAGE SYSTEM WORKLOADS
Paul G. Sikalinda, Pieter S. Kritzinger {psikalin, psk}, DNA Research Group, Computer Science Department, University of Cape Town,
and Lourens O. Walters, Mosaic Software.
Rondebosch, Cape Town, Republic of South Africa.

2 Presentation Outline
-Introduction
-Motivation and Objectives
-Storage Systems
-Storage System Workloads
-The Storage System Workload Analyzed
-Statistical Methodology
-Workload Analysis Results
-Conclusions
-Future Work

3 Introduction
The DNA Group specializes, among other things, in using theory, formal methods and software tools in the:
– specification of …
– design of …
– modelling of …
– building of …
– security of …
– *workload analysis of …
– correctness analysis of …
– performance analysis of …
concurrent computing systems (CCS).


5 Introduction (cont’d)
[Figure: ANALYZING STORAGE SYSTEM WORKLOADS. Diagram labels: RP, RQ, PROCESSOR. Request attributes shown: Start Address, Operation Type, Request Size, Timestamps, etc.]

6 Motivation and Objectives
-A lot of effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.
-Workload modelling is an important component in the design, performance evaluation and correctness evaluation of storage systems.
-Common assumptions are not correct:
 -uniform distribution of start addresses,
 -exponential inter-arrival times.
-Storage system workloads should therefore be analyzed to arrive at correct models.

7 Motivation and Objectives (cont’d)
Workload analysis supports:
-Designing storage systems.
-Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance.
-Understanding application behavior and requirements.
-Deciding to pool storage system resources (SSPs).
-Implementing intelligent storage systems, etc.

8 Motivation and Objectives (cont’d)
Our aim was to analyze storage system workloads in terms of the
(a) inter-arrival times,
(b) sizes and
(c) “seek distances”
of I/O requests, and to provide statistics for these parameters that can be used to:
(a) derive models for storage system evaluation and
(b) design optimization techniques (read caching, I/O parallelism, etc.).

9 Storage Systems
[Figure: Enterprise Storage System (ESS). Components: host/bus adapter, cache, array controller, disk drives. Paths: to host, to controller, to cache, to disks.]

10 Storage Systems (cont’d)
ESS are powerful disk storage systems with the following capabilities:
-High performance*.
-Large capacity and availability.
-Protection against physical drive failure, which can be provided using RAID methods.
*But they still cannot match processor speeds because of the mechanical processes in the disk drives.

11 Storage System Workloads
I/O request servicing and workload classification:
-Logical Workloads (File System Workloads)
-Storage System Workloads (Physical I/O Traffic)
[Figure: path of an I/O request. Labels: Application Software, Operating System, File System, Disk System.]

12 Storage System Workloads (cont’d)
Workload Parameters:
-Logical Volume Number
-*Start Address (seek distances)
-*Request Size
-Operation Type (i.e., read or write)
-*Time Stamp (inter-arrival times)
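The starred parameters above are not analyzed directly; inter-arrival times and seek distances are derived from consecutive requests. The following is a minimal sketch of that derivation, not the authors' tooling. The `IORequest` record layout and the definition of seek distance as the signed difference between consecutive start addresses are assumptions made for illustration; the slides do not state the exact definition, and this sketch ignores the logical volume number.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IORequest:
    # Hypothetical record layout; real trace file formats differ.
    volume: int          # logical volume number
    start_block: int     # start address, in blocks
    size_bytes: int      # request size, in bytes
    op: str              # operation type: "R" or "W"
    timestamp_us: float  # arrival time, in microseconds


def derive_parameters(trace: List[IORequest]) -> Tuple[List[float], List[int]]:
    """Return (inter-arrival times, seek distances) for consecutive requests."""
    ordered = sorted(trace, key=lambda r: r.timestamp_us)
    inter_arrivals = []
    seek_distances = []
    for prev, cur in zip(ordered, ordered[1:]):
        inter_arrivals.append(cur.timestamp_us - prev.timestamp_us)
        # Assumed definition: signed difference of start addresses.
        seek_distances.append(cur.start_block - prev.start_block)
    return inter_arrivals, seek_distances


# Illustrative records only, not taken from the analyzed trace.
trace = [
    IORequest(0, 1000, 8192, "R", 0.0),
    IORequest(0, 1016, 8192, "R", 250.0),
    IORequest(0, 200, 16384, "W", 400.0),
]
gaps, seeks = derive_parameters(trace)
print(gaps)   # [250.0, 150.0]
print(seeks)  # [16, -816]
```

A trace of n requests yields n - 1 inter-arrival times and n - 1 seek distances, while every request contributes one request size.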

13 The Storage System Workload Analyzed
-We analyzed the inter-arrival times, request sizes, and “seek distances” of I/O requests from a system running a web search engine.
-The I/O trace files were obtained from the Storage Performance Council (SPC).

14 Statistical Methodology
-Visual Techniques:
 -Histogram and
 -ECDF graphs.
-Key Data Statistics:
 -Sample mean,
 -Variance and standard deviation,
 -Coefficient of skew, kurtosis, and variation,
 -Five number data summaries (minimum, lower quartile, median, upper quartile, maximum),
 -Lower and upper outlier limits.
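The key data statistics listed above can be computed with the Python standard library. This is a sketch under stated assumptions, not the procedure used in the study: the slides do not say which estimators were used, so the moment-based skew and kurtosis, the "inclusive" quartile method, and the outlier multiplier `k` (defaulting to the common 1.5 × IQR rule) are all illustrative choices.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(data: Sequence[float], k: float = 1.5) -> Dict[str, object]:
    """Summary statistics; assumes a non-zero mean and non-constant data."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance (divides by n - 1)
    std = math.sqrt(variance)
    # Central moments (dividing by n) for the skew and kurtosis coefficients.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "n": n,
        "five_number": (min(data), q1, median, q3, max(data)),
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,             # coefficient of variation
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,     # non-excess: a normal distribution gives 3
        "lower_outlier": q1 - k * iqr,  # assumed rule: quartile -/+ k * IQR
        "upper_outlier": q3 + k * iqr,
    }


summary = summarize([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(name, value)
```

Keeping `k` as a parameter makes it easy to compare different outlier rules on the same data.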

15 Results 1: inter-arrival times (µs)
Sample Size: 1055448
Five Number Summary: (126, 242, 1695, 4487, 100100)
Sample Mean: 2985.761
Sample Variance: 12508927
Standard Deviation: 3536.796
Coefficient of Variation: 1.184554
Coefficient of Skew: 2.142186
Coefficient of Kurtosis: 8.884555
Upper Outlier: 26142

16 Results 1: inter-arrival times
-Highly variable data: the range is 126 to 100100 microseconds.
-The coefficient of kurtosis shows that the distribution is heavy-tailed.
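The ECDF mentioned in the methodology is one way to look at the tail of the inter-arrival times directly: the fraction of observations above a threshold is 1 - F(x). The sketch below is illustrative only; the function name `make_ecdf` and the sample values are assumptions, not data from the analyzed trace.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Build the empirical cumulative distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x: float) -> float:
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return ecdf


# Illustrative values only, not taken from the trace.
gaps_us = [130.0, 240.0, 250.0, 1700.0, 4500.0, 9000.0, 26000.0, 100000.0]
F = make_ecdf(gaps_us)
print(F(250.0))       # 0.375
print(1 - F(4500.0))  # 0.375, the fraction of gaps longer than 4500 µs
```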

17 Results 2: Request sizes (bytes)
Sample Size: 1055449
Five Number Summary: (512, 8192, 8192, 24580, 1138000)
Sample Mean: 15510
Sample Variance: 102017528
Standard Deviation: 10100.37
Coefficient of Variation: 0.6512577
Coefficient of Skew: 3.441212
Coefficient of Kurtosis: 287.6503
Upper Outlier: 106520

18 Results 2: Request sizes
-Distribution peaks: 8192 (60%), 16384 (10%), 24576 (9%) and 32768 (20%) bytes.
-Reason: the OS filesystem block size is 8192 bytes, and the peaks fall at multiples of it.

19 Results 3: Seek distances (blocks)
Sample Size: 1055448
Five Number Summary: (-34926160, -8581248, 6.4, 8580496, 34910700)
Sample Mean: 27.95
Sample Variance: 170691900000000
Standard Deviation: 13064910
Coefficient of Skew: 0
Coefficient of Variation: 467398.8
Upper Outlier: 51482656
Lower Outlier: -51482528

20 Results 3: Seek distances
-The distribution of seek distances is symmetrical.

21 Conclusions
(1) Analyzing storage system workloads is necessary to model the workloads properly:
-To model Web inter-arrival times, the Weibull, lognormal, beta, gamma and exponential probability density functions should be considered.
-To model Web data sizes and seek distances, a probability mass function is more appropriate.
*We intend to use the models in simulations of ESS.
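A probability mass function of the kind suggested above can be estimated as relative frequencies and then sampled to drive a simulation. The following is a minimal sketch, not the authors' model; the function names and the example sizes are hypothetical and chosen only to show the mechanics.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def empirical_pmf(values: Sequence[int]) -> Dict[int, float]:
    """Relative frequency of each distinct value, in ascending order."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in sorted(counts.items())}


def sample_from_pmf(pmf: Dict[int, float], k: int, rng: random.Random) -> List[int]:
    """Draw k values with probabilities given by the PMF."""
    outcomes = list(pmf)
    weights = list(pmf.values())
    return rng.choices(outcomes, weights=weights, k=k)


# Illustrative request sizes only, not the analyzed trace.
sizes = [8192] * 6 + [16384] + [24576] + [32768] * 2
pmf = empirical_pmf(sizes)
print(pmf)  # {8192: 0.6, 16384: 0.1, 24576: 0.1, 32768: 0.2}

# A seeded generator keeps simulated workloads reproducible.
draws = sample_from_pmf(pmf, 5, random.Random(0))
print(draws)
```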

22 Conclusions (cont’d)
(2) The analysis results are useful when designing optimization techniques for storage systems. E.g.:
-A cache management block size of 8192 bytes.
-I/O rescheduling and background tasking would be ideal for the workload.
-The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*.
*The results are not broadly applicable.

23 Conclusions (cont’d)
(3) Other conclusions:
-Request sizes are influenced by the filesystem in use.
-Seek distances are not always uniformly distributed.
*In summary, we have provided statistics about the parameters of the storage system workload that we analyzed and have shown how they can be used to derive models and design I/O optimization techniques.

24 Future Work
-Rigorously find a probability density function matching a given data set of inter-arrival times.
-Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types).

