Storage Optimization Strategies
Techniques for configuring your Progress OpenEdge Database in order to minimize IO operations
Tom Bascom, White Star Software


A Few Words about the Speaker
Tom Bascom; Progress user & roaming DBA since 1987
President, DBAppraise, LLC
– Remote database management service for OpenEdge.
– Simplifying the job of managing and monitoring the world’s best business applications.
VP, White Star Software, LLC
– Expert consulting services related to all aspects of Progress and OpenEdge.

We Will NOT be Talking about:
SANs, servers, operating systems, RAID levels … and so forth.

What Do We Mean by “Storage Optimization”?
The trade press thinks it means BIG DISKS.
Your CFO thinks it means BIG SAVINGS.
Programmers think it means BIG DATABASES.
SAN vendors think it means BIG COMMISSIONS.
DBAs seek the best possible reliability and performance at a reasonable cost.

The Foundation of OpenEdge Storage Optimization

Type 2 Storage Areas
Type 2 storage areas are the foundation for all advanced features of the OpenEdge database.
Type 2 areas have cluster sizes of 8, 64 or 512.
Cluster sizes of 0 or 1 are Type 1 areas.
Data blocks in Type 2 areas contain data from just one table.
# misc32 storage area
d "misc32_dat":11,32;8 .
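The rule on this slide can be checked mechanically against a structure-file line. The following is a minimal sketch, assuming the line layout shown above (`d "name":area-number,rows-per-block;blocks-per-cluster path`); the function name `parse_area` and the returned dictionary keys are hypothetical, and the regular expression only covers this simple form.

```python
import re

# Cluster sizes that, per the slide, make an area Type 2.
TYPE2_CLUSTER_SIZES = {8, 64, 512}

# Matches: d "name":area_number[,rows_per_block][;blocks_per_cluster]
AREA_RE = re.compile(
    r'^d\s+"(?P<name>[^"]+)":(?P<num>\d+)'
    r'(?:,(?P<rpb>\d+))?(?:;(?P<cluster>\d+))?'
)


def parse_area(line):
    """Parse one data-area line and classify the area as Type 1 or Type 2."""
    match = AREA_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a data area line: {line!r}")
    # A missing cluster size is treated as 0, which the slide counts as Type 1.
    cluster = int(match.group("cluster") or 0)
    rpb = match.group("rpb")
    return {
        "name": match.group("name"),
        "number": int(match.group("num")),
        "rows_per_block": int(rpb) if rpb else None,
        "cluster_size": cluster,
        "type": 2 if cluster in TYPE2_CLUSTER_SIZES else 1,
    }


area = parse_area('d "misc32_dat":11,32;8 .')
print(area["name"], "is a Type", area["type"], "area")
```

Running the same check over every `d` line of a structure file is a quick way to spot areas that were left as Type 1.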

Only Read What You Need
Because data blocks in Type 2 storage areas are “asocial”:
– Locality of reference is leveraged more strongly.
– Table-oriented utilities such as index rebuild, binary dump and so forth know exactly which blocks they need to read and which blocks they do not need to read.
– DB features, such as the SQL-92 fast table scan and fast table drop, can operate much more effectively.

MYTH
Storage optimization is just for large tables.
Type 2 storage areas are just for large tables.

Truth
Very small, yet active tables often dominate an application’s IO profile.
And Type 2 areas are a very powerful tool for addressing this.

Case Study
A system with 30,000 record reads/sec:
– The bulk of the reads were from one 10,000 record table.
– Coincidentally, Big B (-B) was set to 10,000.
– That table was in a Type 1 area, and its records were widely scattered.
– Moving the table to a Type 2 area fixed the problem. Only 2% of -B was now needed for this table!
– Performance improved dramatically.
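The arithmetic behind the 2% figure can be sketched as follows. The slide gives only the record count, the -B value and the 2% result; the two packing densities used here (one record per block when scattered, about 50 records per block after the move) are assumptions chosen to illustrate the worst case and to reproduce the 2%.

```python
def buffers_needed(records, records_per_block):
    """Number of blocks (and so -B buffers) needed to hold every record."""
    # Ceiling division without importing math.
    return -(-records // records_per_block)


big_b = 10_000      # -B from the case study
records = 10_000    # size of the hot table

# Assumed worst case in the Type 1 area: every record in a different block.
scattered = buffers_needed(records, 1)
# Assumed density after the move to Type 2: about 50 records per block.
packed = buffers_needed(records, 50)

print(f"scattered: {scattered} buffers = {scattered / big_b:.0%} of -B")
print(f"packed:    {packed} buffers = {packed / big_b:.0%} of -B")
```

Under these assumptions the scattered table alone could occupy the whole buffer pool, while the packed table leaves 98% of it for everything else.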

Type 2 Storage Area Usage
Always use Type 2 areas for areas that contain data, indexes or LOBs.
The schema area is a Type 1 area.

How to Define Your Storage Areas

Use the Largest DB Block Size
Large blocks reduce IO; fewer operations are needed to move the same amount of data.
More data can be packed into the same space because there is proportionally less overhead.
Because a large block can contain more data, it has improved odds of being a cache “hit.”
Large blocks enable HW features to be leveraged: especially SAN HW.

What about Windows?
There are those who would say “except for Windows.” (Because Windows is a 4K-oriented OS.)
I have had good success with Windows & 8k blocks.
NTFS can be changed to use an 8k block…

Use Many (Type 2) Storage Areas
Do NOT assign tables to areas based on “function.”
Instead group objects by common “technical attributes.”
Create distinct storage areas for:
– Each very large table
– Tables with common Rows Per Block settings
– Indexes versus data

Record Fragmentation

Fragmentation and Scatter
“Fragmentation” is splitting records into multiple pieces.
“Scatter” is the distance between (logically) adjacent records.
$ proutil dbname -C dbanalys > dbname.dba
…
RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" :
                        -Record Size (B)-    -Fragments-  Scatter
Table          Records  Size  Min  Max  Mean  Count Factor  Factor
PUB.APP_FLAGS  M …

Create Limit
The minimum free space in a block
Provides room for routine record expansion
OE10.2B default is 150 (4k & 8k blocks)
Must be smaller than the toss limit
Only rarely worth adjusting

Toss Limit
The minimum free space a block must have to stay on the “RM Chain”
Avoids looking for space in blocks that don’t have much
Must be set higher than the Create Limit
Default is 300 (4k & 8k blocks)
Ideally should be less than average row size
Only rarely worth adjusting
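How the two limits interact can be sketched with a deliberately simplified model. This is an illustration of the two definitions above, not the storage manager's actual logic: the class `Limits` and both functions are hypothetical names, and the model ignores block headers and record overhead.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    create_limit: int = 150   # default quoted for 4k & 8k blocks
    toss_limit: int = 300     # default quoted for 4k & 8k blocks

    def __post_init__(self):
        # The slides require the create limit to be below the toss limit.
        if self.create_limit >= self.toss_limit:
            raise ValueError("create limit must be smaller than toss limit")


def can_create(free_bytes, record_bytes, limits):
    """A new record fits only if create_limit bytes stay free afterwards."""
    return free_bytes - record_bytes >= limits.create_limit


def stays_on_rm_chain(free_bytes, limits):
    """A block is kept on the RM chain only while it has toss_limit bytes free."""
    return free_bytes >= limits.toss_limit


limits = Limits()
print(can_create(500, 300, limits))     # 200 bytes would remain
print(can_create(400, 300, limits))     # only 100 bytes would remain
print(stays_on_rm_chain(299, limits))   # below the toss limit
```

In this model a block with 299 free bytes is no longer searched for space, which is the point of the toss limit: it avoids visiting blocks that are unlikely to hold another record.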

Fragmentation, Create & Toss Summary

Create and Toss Limit Usage
Symptom: Fragmentation occurs on updates to existing records. You anticipated one fragment, but two were created.
Action: Increase Create Limit, or decrease Rows Per Block.
Symptom: There is limited (or no) fragmentation, but database block space is being used inefficiently, and records are not expected to grow beyond their original size.
Action: Decrease Create Limit, or increase Rows Per Block.
Symptom: You have many (thousands, not hundreds) of blocks on the RM chain with insufficient space to create new records.
Action: Increase Toss Limit.
* Create and Toss limits are per area for Type 1 areas and per table for Type 2 areas.

Rows Per Block

Why not “One Size Fits All”?
A universal setting such as 128 rows per block seems simple.
And for many situations it is adequate.
But…
Too large a value may lead to fragmentation and too small to wasted space.
It also makes advanced data analysis more difficult.
And it really isn’t that hard to pick good values for RPB.

Set Rows Per Block Optimally
Use the largest Rows Per Block that:
– Fills the block
– But does not unnecessarily fragment it
Rough Guideline:
– Next power of 2 after BlockSize / (AvgRecSize + 20)
– Example: 8192 / (AvgRecSize + 20) = 34, next power of 2 = 64
Caveat: there are far more complex rules that can be used and a great deal depends on the application’s record creation & update behavior.
# misc32 storage area
d "misc32_dat":11,32;8 .
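The rough guideline translates directly into a few lines of code. This is a sketch of that guideline only, not of the more complex rules the caveat mentions. The function name is hypothetical, the average record size of 220 bytes is an assumed value picked because it reproduces the slide's quotient of 34, "next power of 2 after" is read as the smallest power of 2 that is not below the quotient, and the cap of 256 reflects my understanding of the largest rows-per-block value OpenEdge accepts.

```python
def suggested_rpb(block_size, avg_rec_size, overhead=20, max_rpb=256):
    """Rough guideline: next power of 2 after BlockSize / (AvgRecSize + 20)."""
    estimate = block_size / (avg_rec_size + overhead)
    rpb = 1
    # Double until we reach the first power of 2 that is >= the estimate.
    while rpb < estimate:
        rpb *= 2
    # Assumed upper bound on rows per block.
    return min(rpb, max_rpb)


# Assumed average record size of 220 bytes in an 8k block:
# 8192 / (220 + 20) is about 34, and the next power of 2 is 64.
print(suggested_rpb(8192, 220))
```

Feeding the function the mean record size reported by dbanalys for each table gives a first-pass grouping of tables by rows per block, which is one of the "technical attributes" suggested earlier for assigning tables to areas.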

RPB Example

Set Rows Per Block Optimally
[Table comparing block size and rows-per-block settings. Columns: BlkSz, RPB, Blocks, Disk (KB), Waste/Blk, %Used, Actual RPB, IO/1,000 Recs. Rows are annotated “Original”, “Suggested” and “Oops!” (cell values not preserved).]

Rows Per Block Caveats
Blocks have overhead, which varies by storage area type, block size, Progress version and by tweaking the create and toss limits.
Not all data behaves the same:
– Records that are created small and that grow frequently may tend to fragment if RPB is too high.
– Record size distribution is not always Gaussian.
If you’re unsure – round up!

Cluster Size

Blocks Per Cluster
When a Type 2 area expands it will do so a cluster at a time.
Larger clusters are more efficient:
– Expansion occurs less frequently.
– Disk space is more likely to be contiguously arranged.
# misc32 storage area
d "misc32_dat":11,32;8 .

Why not “One Size Fits All”?
A universal setting such as 512 blocks per cluster seems simple…

Set Cluster Size Optimally
There is no advantage to having a cluster more than twice the size of the table.
Except that you need a cluster size of at least 8 to be Type 2.
Indexes are usually much smaller than data.
There may be dramatic differences in the size of indexes even on the same table.
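One way to read the first two points as a selection rule is sketched below. This is an interpretation, not a rule stated on the slide: the function name is hypothetical, and it assumes the object's size in blocks is known (for example from dbanalys) and that the only candidates are the Type 2 cluster sizes 8, 64 and 512.

```python
VALID_CLUSTER_SIZES = (8, 64, 512)   # Type 2 cluster sizes from the slides


def suggested_cluster_size(object_blocks):
    """Largest valid cluster size that is at most twice the object's size."""
    limit = 2 * object_blocks
    fitting = [size for size in VALID_CLUSTER_SIZES if size <= limit]
    # Tiny objects still need a cluster size of 8 to live in a Type 2 area.
    return max(fitting) if fitting else min(VALID_CLUSTER_SIZES)


for blocks in (2, 40, 1000):
    print(blocks, "blocks ->", suggested_cluster_size(blocks), "blocks per cluster")
```

Because indexes on the same table can differ greatly in size, applying the rule per object rather than per table may put them in different areas.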

Different Index Sizes
$ proutil dbname -C dbanalys > dbname.dba
…
RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" :
                        -Record Size (B)-    -Fragments-  Scatter
Table          Records  Size  Min  Max  Mean  Count Factor  Factor
PUB.APP_FLAGS  M …
INDEX BLOCK SUMMARY FOR AREA "APP_FLAGS_Idx" :
Table          Index            Flds  Lvls  Blks  Size  %Util  Factor
PUB.APP_FLAGS
               AppNo            … M
               FaxDateTime      … K
               FaxUserNotified  … K

Logical Scatter

Logical Scatter Case Study
A process reading approximately 1,000,000 records.
An initial run time of 2 hours.
– 139 records/sec.
Un-optimized database.

Perform IO in the Optimal Order

4k DB Block
Table   Index     %Sequential  %Idx Used  Density
Table1  t1_idx1*           0%       100%     0.09
        t1_idx2            0%                0.09
Table2  t2_idx1           69%        99%     0.51
        t2_idx2*          98%         1%     0.51
        t2_idx3           74%         0%     0.51

8k DB Block
Table   Index     %Sequential  %Idx Used  Density
Table1  t1_idx1*          71%       100%     0.10
        t1_idx2           63%         0%     0.10
Table2  t2_idx1           85%        99%     1.00
        t2_idx2*         100%         1%     1.00
        t2_idx3           83%         0%     0.99

Oops!

Logical Scatter Case Study
[Table of six runs. Columns: Block Size, Hit Ratio, %Sequential, Block References, IO Ops, Time (cell values not preserved).]
The process was improved from an initial runtime of roughly 2 hours (top row) to approximately 20 minutes (bottom row) by moving from 4k blocks, 69% sequential access and a hit ratio of approximately 95% to 8k blocks, 85% sequential access and a hit ratio of 99%.
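The throughput figures in this case study follow from the record count and the two run times. A small worked calculation, using only numbers quoted in the slides (about 1,000,000 records, roughly 2 hours before and roughly 20 minutes after); the function name is hypothetical:

```python
def records_per_second(records, minutes):
    """Average read rate over a run of the given length."""
    return records / (minutes * 60)


records = 1_000_000
before = records_per_second(records, 120)   # initial run, about 2 hours
after = records_per_second(records, 20)     # tuned run, about 20 minutes

print(f"before: {before:.0f} records/sec")
print(f"after:  {after:.0f} records/sec")
print(f"speedup: {after / before:.1f}x")
```

The first figure matches the 139 records/sec quoted for the un-optimized database; the tuned run works out to roughly six times that rate.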

Avoid IO, But If You Must…

… in Big B You Should Trust!
[Table of IO cost by layer. Columns: Layer, Time, # of Recs, # of Ops, Cost per Op, Relative. Rows: Progress to -B; -B to FS Cache; FS Cache to SAN; -B to SAN Cache*; SAN Cache to Disk; -B to Disk (cell values not preserved).]
* Used concurrent IO to eliminate FS cache

New Feature!
10.2B supports a new feature called “Alternate Buffer Pool.”
This can be used to isolate specified database objects (tables and/or indexes).
The alternate buffer pool has its own distinct -B.
If the database objects are smaller than -B, there is no need for the LRU algorithm.
This can result in major performance improvements for small, but very active, tables.
proutil dbname -C enableB2 areaname
Table and Index level selection is for Type 2 only!

Conclusion
Always use Type 2 storage areas.
Define your storage areas based on technical attributes of the data.
Static analysis isn’t enough – you need to also monitor runtime behaviors.
White Star Software has a great deal of experience in optimizing storage. We would be happy to engage with any customer that would like our help!

Thank You!

Questions?