3 Database Internals
Internal Blocks
- Data blocks
- Index blocks
- Other block types
Physical Layout
- Data storage areas
- Primary recovery area
- After image journal
- Other storage
4 Database Blocks
- Master Blocks
- Data Block (RM Block)
- Index Block (Ix Block)
- Index Anchor Block
- Free Blocks
- Empty Blocks
5 Master Block
This stores the “master” information for the database including:
- Area status (opened, closed, crashed)
- Last opened date & time
- High water mark for the area
- Last backup date & time
This information and more can be retrieved through the virtual system tables (VSTs).
6 Data Blocks
- These are also known as RM (Record Manager) blocks
- Can contain information from one or more tables
- They can be “full” (RM blocks)
- Or partially full (RM Chain blocks)
7 Record Storage
- In most environments, records are mixed from different tables in the same block
- Progress can store from 1 to 256 records per block per storage area
- All areas for a database must have the same block size (1 to 8 KB)
- Total records per area is fixed
- More records per block equals fewer total blocks
Notes:
Prior to version 9, any block of 4 KB or less could only have 32 records, and 8 KB blocks could have 64 records.
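The trade-off between records per block and total blocks can be sketched with a little arithmetic. This is a minimal illustration that assumes each area has a fixed budget of 2^31 record addresses; that figure and the function names are assumptions made for the example, not values stated on the slide.

```python
# Assumed fixed number of record addresses per storage area (illustrative only).
ADDRESSES_PER_AREA = 2 ** 31


def max_blocks(records_per_block: int) -> int:
    """Blocks addressable in one area for a given records-per-block setting."""
    if not 1 <= records_per_block <= 256:
        raise ValueError("records per block must be between 1 and 256")
    return ADDRESSES_PER_AREA // records_per_block


def max_area_bytes(records_per_block: int, block_size_kb: int) -> int:
    """Upper bound on area size implied by the address budget."""
    return max_blocks(records_per_block) * block_size_kb * 1024


if __name__ == "__main__":
    for rpb in (32, 64, 256):
        print(rpb, max_blocks(rpb), max_area_bytes(rpb, 8))
```

Under this assumption, raising the records per block packs small records more tightly but lowers the number of blocks, and so the maximum size, that one area can reach.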
8 Index Blocks
- Also known as IX blocks
- Only contain information from one index
- Always considered partially full; blocks will split to accommodate growth
- Each block contains the address of itself, the next and previous blocks to support forward and reverse searches
9 Index Structure
- Balanced B-tree
- Compressed data
- All data access* through index
*Except rowid access
11 Free Blocks
- Contain address information
- No affiliation with IX or RM until utilized
- Under high water mark of the database
12 Empty Blocks
- White space
- No addresses
- Under total blocks of the database (area)
- Above high water mark
Notes:
- No individual block allocation
- Reduced fragmentation of the database
- Utilities (example: index rebuild) do not look at these blocks
13 Storage Areas
- Data objects
- Control Area
- Schema/Default Area
- Primary recovery area
- Application Data area(s)
14 Storage Areas – Data Objects
- Index object
- Table object
- Schema object
- Sequences
Notes:
Index objects contain index information. Each index object contains information about only 1 index.
Table objects contain table information. Each table object contains information about only 1 table.
There is only 1 schema object per database, which contains information about all tables and indexes stored in the database.
Sequences are stored in the schema (default) area but are treated differently than schema information. Think of this information as “quasi-schema”.
15 Control Area
- Always has a .db extension
- Describes the “physical schema” or layout of the database
- Lists storage areas and extents associated with a database
- Also known as the structure file
16 Default Storage Area
- Also known as the “schema” area
- This is always area 6
- All information that is not assigned a storage area will be stored here
- All information is stored in this area if the data is converted from version 8 with a conv89
17 Application Data Areas
- An area can contain 1 or more data objects
- An area can have 1 or more extents
- Tables and indexes can share areas
- These use area numbers
Notes:
The goal of areas should be to reduce downtime and improve performance.
Data that is accessed sequentially may perform better if isolated to its own area.
The number of records per block is tunable per area.
18 Primary Recovery Area
- General information
- Format
- Reuse
Same as bi file(s) in version 8 and earlier
This is always area number 3
19 Primary Recovery Area General Info
- This process is automatic and can’t be turned off
- This file is vital to database integrity, both physical and logical
- This file is generally sequential
20 Primary Recovery Area Format
- Each block is a cluster
- Each cluster contains information regarding transactions to allow transaction undo and redo
- The transaction information is called notes
- Each note contains a transaction id
21 Primary Recovery Area Reuse
- Clusters fill sequentially
- When the last formatted cluster is reached there is a reuse decision point
- The “oldest” cluster is examined to determine if it can be reused
- Then, either the oldest cluster is reused or another cluster is added, formatted and used
Notes:
Cluster reuse decision point: once there are no more formatted clusters to use, the decision point process happens at every checkpoint.
A checkpoint is a synchronization point between memory, the database and the before image file that occurs every time a before image cluster is filled.
The decision asks two questions:
- Are all of the transactions in the “oldest” cluster completed?
- Has the before image truncate interval (-G, default 60 seconds) expired?
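The reuse decision described above can be sketched as a small function. This is only an illustration of the two questions in the notes; the `Cluster` fields and the idea of measuring the interval from the moment the cluster was filled are assumptions made for the example, not a description of the engine's internals.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    open_transactions: int       # transactions in this cluster not yet completed
    seconds_since_filled: float  # assumed reference point for the -G interval


def next_cluster_action(oldest: Cluster, truncate_interval: float = 60.0) -> str:
    """Return 'reuse' if the oldest cluster passes both checks, else 'add'."""
    all_done = oldest.open_transactions == 0
    interval_expired = oldest.seconds_since_filled >= truncate_interval
    if all_done and interval_expired:
        return "reuse"
    # Otherwise another cluster is added, formatted and used.
    return "add"


if __name__ == "__main__":
    print(next_cluster_action(Cluster(open_transactions=0, seconds_since_filled=120)))
    print(next_cluster_action(Cluster(open_transactions=2, seconds_since_filled=120)))
```

A long-running open transaction keeps the oldest cluster from being reused, which is one way the BI file grows.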
22 Progress Memory Architecture
- The database engine can be serverless
- The database engine can be multi-server
- Progress applications can be host-based
- Progress applications can be 2-tier client/server
- Progress applications can be n-tier client/server
23 Shared Memory Host-based Configuration
[Diagram of shared memory: Record locks (-L), Buffers (-B), Index Cursors (-c), After Image Buffers, Before Image Buffers, User Control Table, Server Control Table, Latch Control Table, Hash Table, Other Stuff]
Notes:
This is a visual representation of the Progress database broker process. Although this process is commonly called a server, that is a misnomer, because each local user is self-service. The local users connect directly to shared memory and update the structures without the help of a server process.
The latch control table ensures that two people cannot update the same portion of memory at the same time.
24 What are Latches?
- Concurrency control mechanism
- Very coarse in old versions of Progress
- More granular in current versions of Progress
25 Shared Memory Client/Server Configuration
[Diagram of shared memory: Listen Socket, Servers, Record locks (-L), Buffers (-B), Index Cursors (-c), After Image Buffers, Before Image Buffers, User Control Table, Server Control Table, Latch Control Table, Hash Table, Other Stuff]
Notes:
The listen socket “listens” for requests for connection to come in over the network and then spawns servers or connects the user to an existing server for the duration of their session. All subsequent requests from that session will be directed to the server.
A session will maintain a connection to a single server until it is disconnected.
26 Shared Memory Client/Server Configuration
[Diagram: Listen Socket, Database Broker Memory, Servers, and two AppServers]
27 Hardware Configurations
- Disk Considerations
- Memory Allocation
- CPU Considerations
28 Disk Contention
In most environments disks are the largest area for improvement. All of the data flows from the disks to the other resources, so this affects both local and networked users.
Notes:
Disk is at the root of 90% of performance problems. The cause may be the application asking for too much data (improper index use), a bad layout or not enough buffers, but in any case the disks play a large role.
29 Balancing Disk I/O
Balancing disk I/O is the process of making sure that all of the available disk resources (filesystems, disks and controllers) are working equally hard under load. This is also called eliminating variance. A well tuned system will have less than a 15% variance.
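One way to put a number on that variance is sketched below. The slide does not define how the 15% is measured, so this example assumes it is the spread between the busiest and least busy disk, expressed as a percentage of the mean I/O rate; the disk names and rates are made up.

```python
def io_variance_percent(io_per_disk: dict) -> float:
    """Spread between busiest and least busy disk as a percent of the mean."""
    rates = list(io_per_disk.values())
    if not rates:
        raise ValueError("no disks supplied")
    mean = sum(rates) / len(rates)
    if mean == 0:
        return 0.0
    return (max(rates) - min(rates)) / mean * 100


def is_balanced(io_per_disk: dict, limit_percent: float = 15.0) -> bool:
    """True when the measured spread is under the target from the slide."""
    return io_variance_percent(io_per_disk) < limit_percent


if __name__ == "__main__":
    sample = {"disk1": 100, "disk2": 110, "disk3": 90}  # I/Os per second, illustrative
    print(io_variance_percent(sample), is_balanced(sample))
```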
30 What Causes Disk I/O?
- Operating system (swapping and paging)
- Progress Database (DB and BI)
- Application (code and temp files)
- Other applications
31 What RAID Really Means
RAID has many levels. I will only cover a few:
- RAID 0: This level is also called striping.
- RAID 1: This is referred to as mirroring.
- RAID 5: Most common RAID level.
- RAID 10: This is mirroring and striping. Often referred to as RAID 0 + 1 (strictly, RAID 1 + 0).
32 RAID 0: Striping
[Diagram: a volume set on a disk array of Disk 1, Disk 2 and Disk 3, with Stripe 1 spread across the disks]
33 RAID 0: Striping (continued)
- Good for read and write I/O performance
- No failover protection
- Lower data reliability (1 fails, they all fail)
34 RAID 1: Mirroring
[Diagram: primary disks (Disk 1, Disk 2) paired with mirror copies labeled Parity 1 and Parity 2]
35 RAID 1: Mirroring (continued)
- OK for read and write applications
- Good failover protection
- High data reliability
- Most expensive in terms of hardware
36 RAID 5: Poor Man’s Mirroring
- This is the kiss of death for OLTP performance
- User information is striped
- Parity information is striped WITH user information
- OK for 100% read only applications
- Poor performance for writes
Notes:
RAID 5’s problem lies in its write overhead. Each write must do the following:
- Write the primary information
- Calculate the parity information (compress)
- Write the calculated parity information
Software RAID compounds the problem, as the calculations of parity contend for CPU with the user processes.
Manufacturers will contend that they have “solved” the problem with hardware accelerators, but RAID 5 will never be able to perform as well as standard mirroring (RAID 1).
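To make the write overhead concrete, here is a minimal sketch of parity maintenance using XOR, the usual parity scheme for RAID 5. It is a generic illustration, not the behavior of any particular controller; the function names and the sample bytes are invented for the example.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together to produce the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_data, new_data, old_parity):
    """Parity after one block changes: needs the old data and old parity first."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    d1, d2, d3 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
    p = parity([d1, d2, d3])

    # A lost block can be rebuilt from the survivors plus parity.
    assert parity([d1, d3, p]) == d2

    # A small write touches the data block and the parity block.
    new_d2 = b"\xff\x00"
    assert updated_parity(d2, new_d2, p) == parity([d1, new_d2, d3])
    print(p.hex())
```

The extra reads and the second write in `updated_parity` are the overhead the notes describe: one logical write becomes several physical I/Os.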
37 RAID 10: Mirroring and Striping
- Good for read and write applications
- High level of data reliability, though not as high as RAID 1 due to striping
- Just as expensive as RAID 1
38 Software Methods for I/O Distribution
- Manual spread of data across non-striped disks
- Better control, as you can see where the I/O is going
- More attention by system administrator is needed
39 Options
- Progress multi-volume
- Progress storage areas
- 8K database block size
- BI cluster size
- Use page writers
- Move the temp-file I/O with -T
- Location of application files
- Use of program libraries to reduce I/O
40 Multi-Volume Database
- Progress-specific way to distribute I/O
- Only way to eliminate I/O indirection in a Progress environment
- Only way to pre-allocate database blocks
- Every database is multi-volume in Progress version 9
Notes:
I/O indirection is when a file is so large that all of its addresses cannot be stored in a single i-node table (the i-node table stores the logical to physical address map for the file). If the file is very large (over 500 MB) there is a probability that the file will require multiple tables to store the data, and additional seeks will be required to locate information on the disk.
OS manufacturers will contend that indirection will not occur until files are greater than 2 GB, but testing has proven that the value is much lower. 500 MB is a safe choice, but if you have a very large database you can use 1 GB extents and probably will not see indirection.
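The extent sizing guidance above reduces to simple arithmetic, sketched here. The function name and the sample sizes are illustrative assumptions; only the 500 MB and 1 GB extent sizes come from the notes.

```python
import math


def extents_needed(data_mb: float, extent_mb: int = 500) -> int:
    """Number of fixed-size extents required to hold data_mb of data."""
    if data_mb <= 0 or extent_mb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(data_mb / extent_mb)


if __name__ == "__main__":
    # An 1800 MB area with the conservative 500 MB extent size.
    print(extents_needed(1800))
    # A 10 GB area with 1 GB extents for a very large database.
    print(extents_needed(10_000, extent_mb=1000))
```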
41 Storage Areas
Benefits
- Greater control of location of data
- Minimize downtime for utilities
- Stripe some, leave some on straight disks
Drawbacks
- More things to break
- More complex to monitor
42 Storage Areas - Control
- A storage area can hold 1 or more data objects (index, table, schema, …)
- Separate schema from data if possible
- Try to keep the number of areas manageable; only add more areas for valid business reasons
43 Minimize Downtime
Smaller data areas allow utilities, such as offline index rebuild, to run faster, as they have fewer blocks to scan.
44 Database Administration Tools
- Backup and restore
- After image journaling
- Other utilities
45 probkup
Pros
- Progress aware
- Supports online backup
- Easy
Cons
- Slower than OS methods
- Does not back up more than the database
Notes:
Progress aware:
- Knows the structure of the database
- Knows the database status (running, down, crashed, …)
- Supports incremental backups
- Will not back up empty blocks
- Compresses free blocks
This is the only utility that can back up the database while it is running in multi-user mode.
probkup only backs up the database and before image files. It does not get application files, the log file, or after image files.
OS utilities must back up the entire database, including the empty blocks at the end of each storage area.
46 prorest
- Utility to restore a Progress backup
- Can restore to a different structure provided there are enough storage areas
Syntax:
prorest dbname device_or_filename [-list | -vp | -vf]
Notes:
-list: Provides a description of all application data storage areas contained within a database backup. The information can be used to create a new structure description file.
-vp: This is a verification pass on the tape. The restore utility reads the backup volumes and computes and compares the backup block CRCs with those in the block headers.
-vf: Specifies that the restore utility compares the backup to the database block-for-block. This will not work if the database you are trying to compare with is in use.
47 After Imaging
Pros
- Allows you to recover to present
- Recover from media failure
- Only way to “repair” catastrophic user error
Cons
- Additional point of failure
- Adds complexity to the system
- Performance impact
48 How After Imaging Works
FOR EACH CUSTOMER:
    UPDATE CUSTOMER.
END.
For each update:
- Before image note written
- After image note written
49 How to Integrate After Imaging
- In conjunction with a backup site
- To update a report server
- As a means of backup
Notes:
Every high availability system should have a tested backup strategy that includes after imaging.
50 AI to Update a Backup Site
- Poor man’s replication
- Allows for periodic update of a copy of the database
- The copy can then be backed up with a conventional backup mechanism
Notes:
Progress supports software replication for sites that need 100% up-to-date replication. In most cases, a system that includes asynchronous replication using after imaging will provide the proper level of protection from data loss.
51 AI to Update a Report Server
- Similar to keeping a backup site
- Requires two copies of the database in addition to the original (one for update and a second for reporting)
- The reporting database is a copy of the backup that is done periodically to keep the data synchronized
52 AI as a Means of Backup
- Not generally a good idea
- Increased recovery time
- Reduced reliability
- Back up the database each weekend
- Back up the AI file(s) each weeknight
53 Progress® Utilities
Index rebuild
- idxbuild
- idxfix
- idxcompact
DB analysis
55 idxbuild
- Can only be run on a database that has been shut down
- Can be run on 1 or more indexes
Syntax:
proutil <dbname> -C idxbuild [-TB n] [-TM n] [-T dirname]
Notes:
You will get significantly better performance if you rebuild indexes with sorting.
Sorting can take as much space as the database itself; in most cases 50% of the database size will be taken up by sorting.
If the sort files will exceed 2 GB you must use a dbname.srt file to specify the location and size of the sort files. Generally, it is a good idea to limit the size of each sort file to 500 MB.
If you use a dbname.srt file you do not need to specify a -T parameter.
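The sort-space rules of thumb in these notes can be turned into a quick planning calculation. This sketch only applies the numbers quoted above (50% of database size, the 2 GB threshold, 500 MB per sort file); the function name and the sample database sizes are assumptions.

```python
import math


def plan_sort_space(db_mb: float, fraction: float = 0.5, max_file_mb: int = 500):
    """Estimate sort space, whether a .srt file is required, and how many sort files."""
    sort_mb = db_mb * fraction
    needs_srt_file = sort_mb > 2048          # sort files would exceed 2 GB
    sort_files = math.ceil(sort_mb / max_file_mb)
    return sort_mb, needs_srt_file, sort_files


if __name__ == "__main__":
    # A 10 GB database with the typical 50% estimate.
    print(plan_sort_space(10_000))
    # Worst case from the notes: sorting takes as much space as the database.
    print(plan_sort_space(10_000, fraction=1.0))
```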
56 idxfix
- Verifies index to record linkage
- Verifies index block to index block linkage
- Works online while in multi-user mode
Syntax:
proutil <dbname> -C idxfix
Notes:
Fixes index corruption only.
You will get a menu like this:
Index Fix Utility
Select one of the following:
1. Scan records for missing index entries.
2. Scan indexes for invalid index entries.
3. Both 1 and 2 above.
4. Cross-reference check of multiple indexes for a table.
5. Build indexes from existing indexes.
6. Delete one record and its index entries.
7. Quit
Option 1: Scans the database records for missing or incorrect index entries.
Option 2: Scans the index for corrupt index entries. You can scan 1 or more indexes.
Option 3: Does option 1 and option 2.
Option 4: Prompts you for the table and indexes for which you want to run a cross-reference check. Looks for both invalid index entries and invalid records.
Option 5: Prompts you to specify the table and the index that you want to use as the source from which to build another index for the same table.
Option 6: Prompts you to specify the recid of the record you want to delete. Deletes one record and all its index entries from the database.
Option 7: Ends the PROUTIL Index Fix utility.
57 idxcompact
- Fast way to compress (reorganize) indexes online
- Utility will pass through the index several times (number of index levels + 1)
- Runs online
Syntax:
proutil <dbname> -C idxcompact [owner-name.]table-name.index-name [n]
Notes:
The default level of compaction is 80%. The n at the end of the command specifies the level of compaction that you want. If the index has a greater level of compaction than you specify, it will retain that level of compaction after the utility is run.
The index compacting utility operates in phases:
Phase 1: If the index is a unique index, the delete chain is scanned and the index blocks are cleaned up by removing deleted entries.
Phase 2: The non-leaf levels of the B-tree are compacted starting at the root, working toward the leaf level.
Phase 3: The leaf level is compacted.
You cannot run any other administrative operation on an index that is being compacted; idxfix can be run serially but not in parallel with index compact.
58 Database Analysis
- ixanalys – analysis of indexes
- chanalys – analysis of record chains
- dbanalys – analysis of records and indexes
Syntax:
proutil <dbname> -C XXanalys
Notes:
Sample output dbanalys:
SUMMARY FOR AREA "Inventory" : 8
                      Records        Indexes        Combined
NAME                  Size  Tot %    Size  Tot %    Size  Tot %
PUB.Bin               K              K              K
PUB.InventoryTrans    B              B              B
PUB.Item              B              B              B     0.6
PUB.POLine            K              K              K
PUB.PurchaseOrder     K              K              K
PUB.Supplier          B              B              B
Sample output ixanalys:
INDEX BLOCK SUMMARY FOR AREA "Order" : 11
Table        Index          Fields  Levels  Blocks  Size  % Util  Factor
PUB.BillTo
             custnumbillto                          B
PUB.Order
             CustOrder                              K
             OrderDate                              K
             OrderNum                               K
PUB.ShipTo
             custnumshipto                          B
59 Truncate BI
- Reduce the size of the BI file
- Change the cluster size of the BI file
- Change the block size of the BI file
Syntax:
proutil <dbname> -C truncate bi [-bi n] [-biblocksize n] [-G n]
Notes:
Use only when modifying the BI cluster or block size, or after abnormal growth of the BI file (for example, after a large schema change).
-bi specifies the BI cluster size in KB. Generally, a BI cluster size between 1024 (1 MB) and 8192 (8 MB) is ideal, depending on transaction load.
-biblocksize specifies the BI block size in KB (16 KB max). Either 8 KB (v9 default) or 16 KB (maximum) is ideal.
60 BI Grow
- After truncation it is best to pre-grow your BI file to its anticipated size
- Keeps BI sequential (good for performance)
- Database must be shut down
Syntax:
proutil <dbname> -C bigrow n
Notes:
The n at the end of the command specifies the number of BI clusters that you want to add to the BI file.
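Working out the n to pass to bigrow is a matter of dividing the anticipated BI size by the cluster size. The sketch below assumes you already know how many clusters exist after truncation and pass that in as `existing_clusters`; the function name, the database name `mydb` and the sample sizes are hypothetical.

```python
import math


def bigrow_clusters(anticipated_bi_mb: float, cluster_kb: int, existing_clusters: int = 0) -> int:
    """Clusters to add so the BI file reaches its anticipated size."""
    if anticipated_bi_mb <= 0 or cluster_kb <= 0:
        raise ValueError("sizes must be positive")
    total = math.ceil(anticipated_bi_mb * 1024 / cluster_kb)
    return max(total - existing_clusters, 0)


if __name__ == "__main__":
    # 64 MB anticipated BI size with 4096 KB (4 MB) clusters.
    n = bigrow_clusters(64, 4096)
    print(f"proutil mydb -C bigrow {n}")
```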
61 Table Move
- Allows the movement of a table from one storage area to another
- Works “online”
- Uses 4 times the amount of BI space as is taken up by the table
Syntax:
proutil <dbname> -C tablemove [owner-name.]table-name table-area [index-area]
Notes:
While the table is being moved a lock is placed on the entire table, and any updates to this table will be delayed until the table move is complete. This is why the word online is quoted above.
owner-name: Specifies the owner of the table you want to move. You must specify an owner name unless the table's name is unique within the database, or the table is owned by "PUB." By default, Progress 4GL tables are owned by PUB.
table-name: The name of the table to be moved.
table-area: Specifies the target storage area name into which the table is to be moved.
index-area: Optionally, you can specify the name of the target index area. If the target index area is supplied, the indexes will be moved to that area. Otherwise they will be left in their existing location.
Note: Area names with spaces in the name must be quoted, for example, "Area Name."
62 Index Move
- Allows movement of indexes from one storage area to another
- Works “online”
- Uses a significant amount of BI space
Syntax:
proutil db-name -C indexmove [owner-name.]table-name.index-name area-name
Notes:
Basically, the index is locked (either shared or exclusive) for the duration of the process.
The PROUTIL INDEXMOVE utility operates in two phases:
Phase 1: The new index is being constructed in the new area. The old index remains in the old area and all users can continue to use the index for read operations.
Phase 2: The old index is being killed and all the blocks of the old index are being moved to the free block chain. For a large index, this phase might take a significant amount of time. During this phase all operations on the index are blocked until the new index is available; users accessing the index might experience a freeze in their applications.
owner-name: Specifies the owner of the table containing the index you want to move. You must specify an owner name unless the table's name is unique within the database, or the table is owned by "PUB." By default, Progress 4GL tables are owned by PUB.
table-name: Specifies the source table containing the index to be moved.
index-name: Specifies the name of an index to move.
area-name: Specifies the target storage area name into which the index is to be moved.
Note: Area names that contain spaces must be quoted. For example, "Area Name."
63 Database Log Truncation
- Reduces the size of the log file
- Database must be down for it to work
Syntax:
prolog <dbname>
Notes:
There is a truncate_log script in the $SCRIPTS directory that will truncate log files while the database is up and running.
64 Performance Tuning - Basics
- Before Image cluster size
- Database block size
- Tuning APWs
- Memory tips
- Increasing CPU efficiency
65 Networking Tips
- Keep things local
- No temp files on network drives
- Move the application “close” to the user
- Use -cache to speed initial connection
- Use -pls if you are using program libraries over the network
- Application issues are magnified over a network (field-lists, no-lock, indexes, …)
Notes:
The best way to improve performance in a network environment is to avoid using the network. This is the reason web applications are more efficient than n-tier, and n-tier is more efficient than client/server.
A well written network application will work well locally, but the converse is not true.
66 Networking Tips (Continued)
- -Mm 8192 to increase the TCP packet size from 1k to 8k
- -Ma: increase the number of servers to reduce or eliminate server contention
Notes:
Progress will fill these packets as much as possible, and when smaller amounts of data are sent (lock requests) then smaller packets will be sent. View -Mm as an upper size limit.
67 Stripe Some, Leave Others Flat
- Tables that are accessed sequentially may benefit from being isolated to their own table space
- Randomly accessed tables will generally perform better on striped volumes
- Disk systems that have read ahead algorithms will help sequential access most when placed on a single disk (or mirror)
68 8k Block Size
- Most systems will benefit from using 8k block size (NT should use 4k)
- You will retrieve more information per physical I/O, especially on index reads
- I/O is done how the operating system likes it to be done
Notes:
You may need to increase the number of records per block to avoid wasting space in the database.
69 BI Cluster Size
- Somewhere between 1 MB and 4 MB works for most people
- If you are checkpointing every 2 minutes or more often during peak periods, increase the cluster size
- If you have a “workgroup” version of Progress, leave your cluster size alone (512 KB)
- Don’t forget to use bigrow to avoid allocating clusters one at a time
Notes:
There are cases where larger cluster sizes are necessary. In load scenarios it is generally a good idea to set the BI cluster size to a large value (32 MB). This is especially important when using the binary load facility.
BI Grow pre-formats BI clusters to eliminate the initial formatting of clusters. It also allows the BI file to be used and reused in a more sequential manner.
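The 2-minute rule above can be checked against a list of checkpoint times. This sketch assumes you have collected checkpoint timestamps, in seconds, for the peak period; how those are gathered is left out, and the function names are illustrative.

```python
def shortest_checkpoint_gap(checkpoint_times):
    """Smallest gap in seconds between consecutive checkpoints, or None if too few."""
    times = sorted(checkpoint_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return min(gaps) if gaps else None


def should_increase_cluster_size(checkpoint_times, threshold_seconds: float = 120.0) -> bool:
    """True when checkpoints occur every 2 minutes or more often."""
    gap = shortest_checkpoint_gap(checkpoint_times)
    return gap is not None and gap <= threshold_seconds


if __name__ == "__main__":
    peak = [0, 300, 390, 900]  # seconds since the start of the peak window, illustrative
    print(shortest_checkpoint_gap(peak), should_increase_cluster_size(peak))
```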
70 Progress® Page Writers
- Every database that does updates should have a before image writer (BIW)
- Every database that does updates should have at least 1 asynchronous page writer (APW)
- Every database that is using after imaging should have an after image writer (AIW)
71 Tuning APWs
- Start with 1 APW
- Monitor buffers flushed at checkpoint on the activity screen (option 5) in promon
- If buffers flushed increases during the “important” hours of the day, add 1 APW
Notes:
If you do an online backup or a quiet point you will see additional buffers flushed, and these additional buffers need to be ignored.
There are cases where buffers flushed cannot be eliminated (e.g. online backup), so if you add APWs and buffers flushed does not decrease, go back to the previous number of APWs.
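The add-then-verify loop described above can be written down as two small checks. This is a sketch of the decision logic only; it assumes you record the buffers flushed at checkpoint yourself (with online backup and quiet point samples already excluded), and the function names are made up for the example.

```python
def should_add_apw(flushed_samples) -> bool:
    """True if buffers flushed grew over the samples taken during the important hours."""
    if len(flushed_samples) < 2:
        return False
    return flushed_samples[-1] > flushed_samples[0]


def apw_count_after_review(previous_count: int, flushed_before: int, flushed_after: int) -> int:
    """Keep the added APW only if buffers flushed actually decreased."""
    if flushed_after < flushed_before:
        return previous_count + 1
    return previous_count  # no improvement: go back to the previous number


if __name__ == "__main__":
    print(should_add_apw([10, 40, 85]))
    print(apw_count_after_review(previous_count=1, flushed_before=85, flushed_after=12))
```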
72 Use -T to Level Disk I/O
Local (host based) users and batch jobs should use the -T parameter to place their temporary file (.srt, .pge, .lbi, …) I/O on a drive that is not working as hard as the other drives on the system.
Note: -T should never point to a network drive.
Notes:
Network users (client/server) should always keep the temporary files local to the client process.
By default, the temporary files go in the working directory of the client process.
73 Application Files
- Keep paths short: say run <subdir>/program to eliminate unnecessary searches
- Put programs into libraries (prolib) to reduce I/O to temp files
- Libraries use a hashed search mechanism for better performance
74 Memory Contention
Memory should be used to reduce disk I/O. Broker (server) side parameters should be tuned first, and then user parameters can be modified. In a memory lean situation, memory should be taken away from individual users before reducing broker parameters.
Notes:
A memory lean situation can cause a disk contention issue by swapping.
75 Memory Hints
- Swapping is bad; buy more memory or reduce parameters to avoid it
- Increase -B in 10% increments until the point of diminishing returns or swapping, whichever comes first
- Use V9 private buffers (-Bp) for reporting
- Do not use private buffers (-I) prior to V9
Notes:
Use the memory you have but leave a buffer for growth.
At small -B settings (less than 100 MB) higher incremental changes may get you to the proper setting faster.
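A schedule of -B values to try can be generated ahead of time. This sketch assumes each step adds 10% of the current value (compounding) rather than 10% of the starting value, which the slide does not specify; the starting value is arbitrary.

```python
def b_schedule(start_buffers: int, step_percent: int = 10, steps: int = 5):
    """List of -B values to test, each step_percent larger than the last."""
    if start_buffers <= 0 or step_percent <= 0 or steps < 0:
        raise ValueError("arguments must be positive")
    values = [start_buffers]
    for _ in range(steps):
        current = values[-1]
        values.append(current + current * step_percent // 100)
    return values


if __name__ == "__main__":
    # Stop walking the list once hit rates stop improving or the system starts to swap.
    print(b_schedule(10_000, steps=3))
```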
76Memory Hints(continued) Use memory for the users closest to the customer first (developers increase last)Use -Bt for large temp tablesSet -bibufs between 50 and Look at the activity screen in promon (BI buffer waits) to see if additional tuning is necessary. Start with 50 as this will work for the vast majority of peopleNotes:
77 CPU Contention
High CPU activity is not bad in and of itself, but high system CPU activity is bad and should be corrected.
Notes:
High system CPU activity can be caused by other operations or functions, like NFS or Samba.
78 Components of CPU Activity
- USER - This is what you paid for
- SYSTEM - This is overhead
- WAIT - This is waste
- IDLE - This is nothing ;-)
79 CPU Activity Goals
The goal is to have as much USER time as possible with as little SYSTEM and WAIT as possible.
A practical split is:
- USER: 70%
- SYSTEM: 20%
- WAIT: 0%
- IDLE: 10%
Notes:
In many cases you can get 75+% USER, so try to get the best you can by adjusting -spin, napmax and workloads.
Remember: look at your baseline timings and make sure they are improving. Good statistics do not equal good performance in all cases.
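Comparing a measured CPU breakdown with the practical split above is a one-line subtraction per component. The goal figures come from the slide; the sample measurement and the function name are invented for illustration.

```python
# Practical split from the slide, in percent.
GOAL = {"user": 70, "system": 20, "wait": 0, "idle": 10}


def cpu_gaps(sample: dict) -> dict:
    """Measured minus goal for each component; positive means above the goal."""
    return {name: sample[name] - goal for name, goal in GOAL.items()}


if __name__ == "__main__":
    measured = {"user": 55, "system": 30, "wait": 10, "idle": 5}  # illustrative sample
    for name, gap in cpu_gaps(measured).items():
        print(f"{name.upper():6} {gap:+d}%")
```

As the notes caution, these percentages are only a guide; baseline timings remain the real measure of improvement.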
80Eliminating High SYSTEM CPU Activity Always use -spinUse a setting of 1 for single CPU systemsUse a higher setting for multiple CPU systemsTesting has shown that the optimal setting for -spin is somewhere between 2000 and First try 2000-napmax should default to 5000 but in some late 7 and early 8 versions of Progress it is set to 100 which is way too lowNotes:Spin can be modified within promon or through VSTs.Promon:R&D, option 4, Option 4, Option 1To see the value of napmax:promon, R&D, debghb, Option 4, Option 4
81 Eliminating High WAIT CPU Activity
- WAIT = Waiting on I/O
- If you still have IDLE time it generally is not a big problem
- Look at paging/swapping first
- Next look at your disk I/O
Notes:
Generally, it is good to maintain about 5% idle.
In some cases, WAIT will be logged as idle.
82 Progress Database Future Directions
- Increased uptime through online utilities
- Increased speed of utilities to maintain the database
- Support for clusters to increase reliability
- Open standards support
83 Replication
- New feature in 9.1D of Progress
- Fathom High Availability
- Allows for single or bi-directional replication
- Target database can be used for update or reporting
84 Replication (continued)
- Source database has an agent that forwards changes to the target database(s)
- Only one agent per database
- One or more targets per agent
- Raw record format is used to increase performance and reduce overhead