Understanding I/O Performance with PATROL-Perform and PATROL-Predict


1 Understanding I/O Performance with PATROL-Perform and PATROL-Predict
Debbie Sheetz, Sr. Staff Consultant, BMC Software

2 I/O Performance Analysis Overview
I/O metric definitions
Baseline I/O performance analysis
What-if I/O performance analysis
C4P075

3 How Important is I/O to Performance?
Predict/Visualizer presents a unified view of the system so that the relative contributions of CPU and disk I/O can be assessed
Don’t solve a problem that you don’t have
CPU is the dominant factor here

4 Source of I/O Metrics
Key to understanding I/O is to know your metrics
Disks are reported/collected as they are defined/known to UNIX or NT
  This may or may not correspond 1-to-1 to physical units
Disk configuration is collected from the standard interface for the particular OS
Disk statistics are collected from the standard interface for the particular OS (same metrics used by iostat, etc.)
Analyze/Predict interprets and reports based on these metrics

5 I/O Configuration Collection Issues
Sometimes the disk configuration is reported as “Unknown”
Three possible causes
  Disk configuration is not available from the OS
  Standard interface to OS fails to return the disk configuration
  Collected disk configuration is not matched by an entry in the hardware (.hrw) and .odm
RAID is not collected directly
This DOES NOT AFFECT the baseline metrics or baseline model calibration
For certain ‘what-if’ disk modeling scenarios, the disk must be identified

6 Key I/O Metrics
A few metrics tell most of the story about disk I/O
Disk throughput
  Data transferred (e.g. bytes, words, etc.)
  Disk reads/writes
  Disk accesses
Disk utilization (active time)

7 I/O Metrics: Throughput
Data transferred (e.g. bytes, words, etc.)
PATROL-Perform and Predict report I/Os in 4 KB units
  Consistency for reporting (Analyze, Visualizer, Predict)
  Ease of modeling I/O cross-node and cross-platform
Units measured vary by platform
  HP, OSF: words (Disk Statistics, Words Xfered)
  Solaris, AIX: blocks (Disk Statistics, Blocks Read/Written)
  NT: bytes (NT Physical Disk, Disk Read/Write)
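The 4 KB normalization described on this slide can be sketched as a simple unit conversion. This is an illustration only, not the collector's actual code: the function name is hypothetical, and the size of a native unit (word or block) is passed in by the caller because the slides do not state those sizes. The 512-byte block used in the example is an assumption.

```python
IO_UNIT_BYTES = 4 * 1024  # the slides state that I/Os are reported in 4 KB units


def to_4kb_ios(native_count, native_unit_bytes):
    """Convert a native throughput counter to 4 KB I/O units.

    native_count:      value of the platform counter (words, blocks, or bytes)
    native_unit_bytes: size of one native unit in bytes; this must come from
                       the platform documentation, it is NOT given in the slides
    """
    if native_unit_bytes <= 0:
        raise ValueError("native_unit_bytes must be positive")
    return native_count * native_unit_bytes / IO_UNIT_BYTES


if __name__ == "__main__":
    # NT reports bytes, so one native unit is 1 byte.
    print(to_4kb_ios(8192, 1))
    # Assumed 512-byte blocks, for illustration only.
    print(to_4kb_ios(16, 512))
```

Keeping the unit size as an explicit argument avoids baking a platform guess into the conversion.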

8 I/O Metrics: Throughput
Disk accesses (i.e. transfers)
Number of times an I/O request was made of the disk
  Size of data transfer can vary
Doesn’t matter where the I/O is actually serviced:
  Physical disk (seek, latency, and data transfer)
  Cache on the disk
  Cache on the disk controller
Doesn’t matter whether RAID or non-RAID
Similar metrics collected for UNIX/NT
  UNIX: Disk Statistics, Transfers
  NT: NT Physical Disk, Disk Transfers/Sec

9 I/O Metrics: Throughput
Disk reads/writes
Number of times a read vs. write I/O request was made of the disk
  Size of data transfer can vary
Different metrics collected for UNIX/NT
  Solaris, AIX: Disk Statistics, Blocks Read/Written
  HP, OSF: Not Available
  NT: NT Physical Disk, Disk Read/Write Bytes/Sec
Reported in Analyze/Predict as I/O rates (1 I/O = 4 KB)

10 I/O Metrics: Utilization (Active Time)
Disk utilization (active time)
Amount of time disk was observed to be actively servicing an I/O request
Doesn’t matter where the I/O is actually serviced:
  Physical disk (seek, latency, and data transfer)
  Cache on the disk
  Cache on the disk controller
Doesn’t matter whether RAID or non-RAID
Should reflect the relative efficiency of I/O processing when compared with disk throughput measures
  Use disk service time for this (service time = utilization / I/Os)
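The service time relation on this slide (service time = utilization / I/Os) can be written out as a small calculation. This is a minimal sketch: the function name and the input values are illustrative, with utilization given as a fraction of elapsed time and the I/O rate in 4 KB I/Os per second.

```python
def service_time_ms(utilization, ios_per_sec):
    """Average disk service time in milliseconds per I/O.

    utilization: fraction of time the disk was busy (0.25 means 25% active)
    ios_per_sec: I/O rate over the same interval
    """
    if ios_per_sec <= 0:
        raise ValueError("ios_per_sec must be positive")
    seconds_per_io = utilization / ios_per_sec
    return seconds_per_io * 1000.0


if __name__ == "__main__":
    # Hypothetical disk: 25% busy while handling 100 I/Os per second.
    print(service_time_ms(0.25, 100))
```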

11 I/O Metrics: Utilization (Active Time)
Disk active time
Different metrics collected for UNIX/NT
  UNIX: Disk Statistics, Active Time
  NT: NT Physical Disk, % Disk Time
  Windows: NT Physical Disk, % Idle Time
Windows/NT metrics are reinterpreted by Analyze
  Perfmon caps calculated utilization at 100%
  Observations of collected Windows/NT disk data show utilizations well over 100%
  Analyze scales all collected NT times down
  Perfmon and Analyze/Predict will not match

12 I/O Metrics Collection Issues
If “iostat” can’t see it, the collector can’t collect it
  The OS is supplying the metrics
  If the metrics are missing or incorrect, both “iostat” and PATROL-Perform/Predict, etc. will report the same
  Problem needs to be addressed by the OS vendor
Refer any questions about valid I/O metrics to BMC Technical Support
  Always need to know the exact platform (e.g. HP 11.00, 64-bit)
  Run iostat and the collector in parallel
  Use current collector for the platform

13 Baseline I/O Performance Analysis Overview
Observe key disk I/O metrics from baseline measurements
Identify I/O patterns
  For the system
  For a disk or group of disks
    Distribution amongst disks
  For a workload/transaction
Determine how important I/O is to overall performance

14 Baseline I/O Performance Analysis Overview
Observe key disk I/O metrics from baseline measurements
Identify I/O performance characteristics
  Relative speed of I/O processing
  Read/write ratios
  Blocksize used
Disk utilization objectives
  Distribution amongst disks

15 Baseline Case Study
CPU pattern doesn’t precisely match I/O pattern

16 Baseline Case Study
I/O is dominated by one Oracle instance, but there are other contributors
Study patterns within days and across days, weeks, etc.

17 Baseline Case Study
I/O is the major component of response time during prime time

18 Baseline Case Study
Distribution of I/O amongst disks is fairly even

19 I/O Analysis Technique: CUTDISK
How to filter I/O data so only the important disks are studied?
Use “CUT DISK” feature
  In Analyze
  In Manager
    If already specified in .an file input to Manager, don’t need Manager specification, too
Analyze/Predict reports shorter, Visualizer files smaller, Visualizer database smaller, Visualizer graphics easier to present

20 I/O Analysis Technique: CUTDISK
Concept is to aggregate I/O from less utilized disks, preserve important disks individually
  I/Os are NOT removed from the model
Choose appropriate threshold
  I/O rate or disk utilization may be used
  Threshold value can be set for a specific purpose
    Setting of 0 removes only disks which are not used at all
    Setting of 5% utilization removes most disks
  Paging disks are never removed
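The aggregation idea behind CUTDISK can be sketched as follows. This is not the product's implementation: the function name, the record fields, and the "OTHER" bucket are hypothetical. It only mirrors the rules stated on the slide: disks above the threshold stay individual, paging disks are always kept, and the I/O of the remaining disks is summed rather than discarded.

```python
def cut_disks(disks, threshold, metric="utilization"):
    """Split disks into individually kept disks and one aggregate bucket.

    disks:     list of dicts with "name", "utilization", "io_rate",
               and optionally "is_paging" (hypothetical record layout)
    threshold: cut-off applied to the chosen metric
    metric:    "utilization" or "io_rate"
    """
    kept = []
    other = {"name": "OTHER", "disk_count": 0, "io_rate": 0.0}
    for disk in disks:
        # A threshold of 0 drops only disks with no activity at all.
        if disk.get("is_paging", False) or disk[metric] > threshold:
            kept.append(disk)
        else:
            other["disk_count"] += 1
            other["io_rate"] += disk["io_rate"]  # I/O stays in the model
    return kept, other


if __name__ == "__main__":
    sample = [
        {"name": "ssd1", "utilization": 0.30, "io_rate": 50.0},
        {"name": "ssd2", "utilization": 0.00, "io_rate": 0.0},
        {"name": "ssd3", "utilization": 0.02, "io_rate": 3.0},
        {"name": "swap", "utilization": 0.01, "io_rate": 1.0, "is_paging": True},
    ]
    kept, other = cut_disks(sample, threshold=0.05)
    print([d["name"] for d in kept], other)
```

Summing the I/O of the cut disks keeps the total I/O rate of the system unchanged, which matches the slide's point that I/Os are not removed from the model.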

21 I/O Analysis Technique: CUTDISK
Specify under Options, Cut Disk Options in Analyze

22 I/O Analysis Technique: CUTDISK
Specify under Options, Advanced Features in Manager

23 Baseline Case Study
Observe disk utilization patterns
Utilizations mostly even, most under 40%

24 Baseline Case Study
Observe disk processing efficiency
Looks good! Most service times under 5 ms per 4 KB transfer. A few outliers could use a closer look …

25 Baseline Case Study
Look at ssd4
High service time isn’t so high after all: transfers divided by 9.85 I/Os is about 1.3. That means the reported service time covers 1.3 actual data transfers, or 9.3 ms per physical transfer.

26 Baseline Case Study
Look at ssd3
High service time isn’t really high here either: transfers divided by 1.37 I/Os is about 7.8. That means the reported service time covers 7.8 actual data transfers, or 6.9 ms per transfer. Another way to think about this is that the average blocksize is 4 KB / 7.8, or about 0.5 KB.
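The reasoning on the ssd4 and ssd3 slides can be restated as arithmetic: because Analyze/Predict report service time per 4 KB I/O, a disk doing several small physical transfers per 4 KB I/O looks slower than it is. The sketch below uses hypothetical function names; the ratios 7.8 and 2:1 come from the slides, while the 12 ms service time is a made-up input.

```python
IO_UNIT_KB = 4.0  # the slides report I/Os in 4 KB units


def avg_blocksize_kb(transfers_per_io):
    """Average physical transfer size given transfers per 4 KB I/O."""
    if transfers_per_io <= 0:
        raise ValueError("transfers_per_io must be positive")
    return IO_UNIT_KB / transfers_per_io


def per_transfer_ms(service_ms_per_io, transfers_per_io):
    """Service time per physical transfer given service time per 4 KB I/O."""
    if transfers_per_io <= 0:
        raise ValueError("transfers_per_io must be positive")
    return service_ms_per_io / transfers_per_io


if __name__ == "__main__":
    # ssd3: about 7.8 transfers per 4 KB I/O, so blocks of roughly half a KB.
    print(round(avg_blocksize_kb(7.8), 2))
    # A 2:1 ratio of I/Os to transfers is 0.5 transfers per I/O.
    print(avg_blocksize_kb(0.5))
    # Hypothetical disk: 12 ms per 4 KB I/O spread over 1.5 transfers.
    print(per_transfer_ms(12.0, 1.5))
```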

27 Baseline Case Study
In fact, good (larger) blocksizes explain the good disk performance
These graphics show roughly a 2:1 ratio between I/Os and transfers, or an 8 KB blocksize

28 Baseline Case Study Conclusion
Even though I/O is a major contributor to response time, there are no obvious tuning opportunities
Continue to study the key I/O metrics over time
Identify trends in I/O performance

29 What-if I/O Performance Analysis Overview
Via the Predict model, you can change:
I/O patterns
  For the system
    Change in workload volume
    Change in the types of workloads
  For a disk or group of disks
    Distribution amongst disks
    Change in amount of transaction I/O required

30 What-if I/O Performance Analysis Overview
Via the Predict model, you can change:
I/O performance characteristics
  Relative speed of I/O processing
    Disk configuration change
  Blocksize used

31 What-if I/O Performance Analysis Overview
Predict shows how this affects performance
Performance objectives
  Workload/transaction response objectives
  Disk utilization objectives
Reports I/O patterns
  System
  Distribution amongst disks
Reports individual disk performance
Can view results in Predict and/or Visualizer

32 What-if Case Study
Management wants to know how performance will change if a new RAID disk technology is implemented
Study strategy
  Perform Visualizer analysis of baseline I/O performance characteristics, build baseline model
  Perform Visualizer analysis of benchmark of I/O using new disk technology (IBM “Shark”)
  Use Predict to do ‘what-if’

33 What-if Case Study: Benchmark Data Analysis
Benchmark demonstrates substantial I/O rate
Since current system has high I/O rates, a subset of the benchmark will be studied

34 What-if Case Study: Benchmark Data Analysis
Selected subset of the benchmark

35 What-if Case Study: Benchmark Data Analysis
Key I/O characteristic: I/Os vs. transfers
Ratio of I/Os to transfers is about 5.7, or 23 KB per native I/O access

36 What-if Case Study: Benchmark Data Analysis
Key I/O characteristic: reads vs. writes
Ratio of reads to writes is about 1.5:1

37 What-if Case Study: Benchmark Data Analysis
Key I/O characteristic: service time for 4 KB I/O
Predominant service time is about 0.5 ms

38 What-if Case Study: Benchmark Data Analysis
Key I/O characteristic: service time for 4 KB I/O
View by controller, disks over 5% utilization
Note less efficiency at lower I/O load

39 What-if Case Study: Change Model
Only one change is needed in the Predict model
  Set the disk service time/IO according to the benchmark
DO NOT use the hardware table method because more specific info is available
  Hardware table method applies ratio of new disk type to current disk type
  Both disk types must be in the hardware table
  Baseline disk type must be specified

40 What-if Case Study: Change Model
Model must be baselined
Two methods for changing service time
  Edit the disk service time/IO in the GUI
  Use a command file if there are many disks
Command file format
  MODIFY DISK hdisk10 EDISKTIME .5
  MODIFY DISK hdisk11
  Etc.
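When many disks need the same change, the command file can be generated rather than typed. The sketch below is an assumption-laden helper, not a BMC tool: the function name is hypothetical, and the only syntax it relies on is the single complete line shown on the slide (MODIFY DISK name EDISKTIME value). Whether Predict accepts "0.5" as well as ".5" is not stated in the slides, so check the generated file against the product documentation.

```python
def edisktime_commands(disk_names, service_time_ms):
    """Build one MODIFY DISK line per disk, following the slide's example line.

    disk_names:      iterable of disk names as they appear in the model
    service_time_ms: new service time per I/O taken from the benchmark
    """
    return [
        f"MODIFY DISK {name} EDISKTIME {service_time_ms}"
        for name in disk_names
    ]


if __name__ == "__main__":
    # Disk names follow the hdiskNN pattern used on the slide.
    for line in edisktime_commands(["hdisk10", "hdisk11"], 0.5):
        print(line)
```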

41 What-if Case Study: Modeling Results
Model is evaluated and net change is observed
(Figure: baseline on the left, what-if on the right)

42 What-if Case Study: Modeling Results
Relative reduction in response time reported with relative response time
Reduction of 26% for the workload of interest

43 What-if Case Study: Modeling Results
Why not a larger reduction?
(Figure: baseline on the left, what-if on the right)
New service time/utilization is about 75% of baseline (.5 ms / .65 ms) for the disks doing the most I/O
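A rough way to see why the gain is limited is to compare per-disk response time before and after the change. The slides do not describe Predict's queueing model, so the sketch below swaps in the textbook M/M/1 approximation (response = service time / (1 - utilization)) purely for illustration. The 400 I/Os per second load is a hypothetical value; only the .65 ms and .5 ms service times come from the slides.

```python
def mm1_response_ms(service_ms, ios_per_sec):
    """Per-I/O response time under an assumed M/M/1 queue (not Predict's model)."""
    utilization = service_ms / 1000.0 * ios_per_sec
    if utilization >= 1.0:
        raise ValueError("disk would be saturated at this load")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    load = 400  # hypothetical I/Os per second on one busy disk
    baseline = mm1_response_ms(0.65, load)
    what_if = mm1_response_ms(0.5, load)
    print(round(0.5 / 0.65, 2))          # service time ratio from the slides
    print(round(what_if / baseline, 2))  # response time ratio under M/M/1
```

Under these assumptions the disk response time improves only somewhat more than the service time does, because baseline utilization is already low and little queueing delay exists to remove.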

44 What-if Case Study: Modeling Results
What else will improve performance more?
More even I/O distribution in benchmark

45 What-if Case Study: Modeling Results
What else will improve performance more?
Possible use of more optimistic service time, e.g. .45 ms observed with CUTDISK set at 100 IO/sec
  Should confirm with more benchmark data and/or vendor

46 What-if Case Study Conclusion
Change to new technology will
  Reduce I/O service time
  Reduce I/O wait time
    From reduced utilization (due to service time decrease)
    From better I/O distribution (due to more even utilizations)
Reduction not as large as expected because current I/O performance is already good (.65 ms vs. .5 ms)
Allows for additional workload growth compared with current technology

