Redo and Archiving.

1 Redo and Archiving

2 Objectives After completing this lesson, you should be able to:
- Explain how to measure redo resource contention
- List the causes of redo contention
- Detail the LGWR algorithm and tuning concepts
- Identify redo-less operations and their likely effect on performance
- Describe the ARCH process and best archiving practices

3 Redo Architecture Concepts
Physiological logging:
- Page actions: change vectors
- Minitransactions: redo records
- Page fix rule
- Write-ahead log
- Log force at commit
- Logical ordering of redo
Redo Architecture The purpose of the redo architecture is to guarantee transaction durability: the property that a transaction must survive failures once it has been committed. The redo architecture of the Oracle server is based on logging transactional information to operating system files. You can then use these files to recover transactions after a failure, restoring the database to a consistent state. Several rules and algorithms complete the redo architecture: the page fix rule, the write-ahead log, log force at commit, and the logical ordering of redo.

4 Logging Methods
[Diagram: TABLE A with indexes INDEX B and INDEX C]
Logging Methods Assume that an insert statement is executed against a table (Table A) with two indexes (Index B and Index C). Because undo information must be stored together with the original operation, a corresponding delete operation is logged as well. This example assumes that there is no activity on data dictionary objects. Note: For more information on the concepts of transaction processing, see "Transaction Processing: Concepts and Techniques" by Jim Gray and Andreas Reuter.

5 Logging Methods
[Diagram comparing the three methods for the insert of the previous slide:
- Logical log: insert operation / delete operation
- Physical log: after images of blocks A, B, C (A', B', C')
- Physiological log: (dba A, delete op, insert op), (dba B, delete op, insert op), (dba C, delete op, insert op)]
Logging Methods (continued)
Logical Logging A logical logging method stores the insert and delete operations. This does not require much storage space, but one such operation can modify many blocks. Consider what would happen if the insert affected hundreds of blocks and the recovery process failed partway through the statement. First, all the modified blocks would need to be rolled back, which is not straightforward because there is only an undo operation for the entire statement, not for part of it. Then the insert would have to be repeated from the beginning, wasting valuable time.
Physical Logging A database that uses physical logging stores before and after images of every affected block, requiring a significant amount of storage space.
Physiological Logging Physiological logging combines the best features of both methods: it stores operations (like logical logging) instead of full block images, but the granularity is at the block level (like physical logging), so one operation is stored for each individual block change. This method takes less space than physical logging and can easily be restarted after a failure.
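The trade-off among the three methods can be sketched with a toy size model. All constants here are illustrative assumptions, not Oracle figures:

```python
def log_size(method, blocks_changed, block_size=8192, op_size=64):
    """Rough, hypothetical size model for one statement that changes
    `blocks_changed` blocks under each logging method."""
    if method == "logical":
        return op_size                          # one operation record overall
    if method == "physical":
        return 2 * blocks_changed * block_size  # before + after image per block
    if method == "physiological":
        return blocks_changed * op_size         # one small record per block
    raise ValueError(method)
```

With three blocks changed, logical logging is smallest, physiological logging is slightly larger, and physical logging dwarfs both; only physiological logging is both compact and restartable per block.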

6 Oracle9i Architecture Diagram
[Diagram: System Global Area (buffer cache, redo log buffer), server process, background processes SMON, PMON, DBWn, CKPT, LGWR, ARCn, plus control files, database files, online redo logs, and archived redo logs]
Oracle9i Architecture Diagram Shown above is a simplified Oracle9i architecture diagram, illustrating the processes and file systems used. Physiological logging is achieved with Oracle's redo architecture. Coordination exists between the components of the Oracle database:
- SGA: buffer cache, redo log buffer
- Physical devices: data files, redo log files (online and archived)
- Background processes: LGWR, DBWn, CKPT, ARCn

7 Redo Logging in the Oracle Server
Complex actions are structured as a sequence of minitransactions, called redo records, which:
- Constitute an atomic change to the database state
- Are used to move the database state forward
Redo records are structured as a sequence of page actions, called change vectors, which:
- Change one Oracle block
- Are used to move the block consistency forward
Redo Logging in the Oracle Server The Oracle server implements physiological logging. At the transactional level, a change to a block requires both redo and undo to be logged. This means that a redo record for a transactional change to a data block includes changes to both data and undo blocks. At commit time, the Oracle server guarantees that the transaction table of the rollback segment header is updated. It also guarantees that the redo and undo of the changes, and the commit record (containing the commit SCN), are all written to the redo logs. Note: There are some exceptions to redo generation as described above. For example, for temporary tables the Oracle server logs only the undo part of the transaction.
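The nesting described above (a redo record containing change vectors, each touching exactly one block) can be sketched as plain data structures. The block addresses and payloads below are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class ChangeVector:
    dba: int        # data block address of the single block this page action changes
    payload: bytes  # description of the change applied to that block

@dataclass
class RedoRecord:
    scn: int        # SCN stored with the record
    vectors: list   # the minitransaction's change vectors

# A transactional change to one data block carries change vectors for both
# the data block and the corresponding undo block (addresses hypothetical):
rec = RedoRecord(scn=1001,
                 vectors=[ChangeVector(0x00400001, b"row piece"),
                          ChangeVector(0x00800002, b"undo entry")])
```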

8 Page Fix Rule The page fix rule guarantees that:
- The buffer cache location of a data block is locked before reading or modifying the block
- Redo and undo change vectors are generated before modifying the block
- Consistency and isolation of the buffer are maintained
Page Fix Rule An Oracle buffer (or page) must be protected while it is being modified. The page fix rule prevents other access until all the changes needed to take the buffer to its next consistent state have been applied. A redo record must be created before changing the buffer. The Oracle server guarantees that when a buffer is unpinned after a page action completes, the redo records are consistent with the buffer state. The page fix rule is implemented on UNIX with semaphores: a page semaphore must be obtained in exclusive mode before altering the page, or in shared mode before reading it. The semaphore is held until the changes to be applied are recorded in a log buffer and have been made to the page. Inconsistent buffers caused by failures during a page action (for example, process death in the middle of a page action) require online block recovery, which applies redo records from the redo logs to the block copy. This is performed automatically when a buffer is found to be corrupt in memory.
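The ordering the rule imposes (pin exclusively, generate redo, only then modify, then unpin) can be sketched as follows. This is a minimal illustration, not Oracle code; the lock stands in for the page semaphore:

```python
import threading

class Page:
    """Sketch of the page fix rule: pin, log, change, unpin."""
    def __init__(self):
        self._pin = threading.Lock()  # stands in for the page semaphore
        self.redo = []
        self.data = b""

    def change(self, redo_record, new_data):
        with self._pin:                    # exclusive access while modifying
            self.redo.append(redo_record)  # redo recorded before the change
            self.data = new_data           # buffer reaches its next consistent state
```

Because both steps happen under the pin, any reader that later acquires the pin sees redo consistent with the buffer state, as the text above requires.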

9 Write-Ahead Logging Write-ahead logging guarantees that:
- Before a block buffer is changed in the buffer cache, the corresponding redo must have been written to the log buffer
- Before DBWR writes a data block buffer, LGWR must have written the necessary redo records to the log files
Write-Ahead Logging This rule implies two conditions: Before a change is made to a block, the Oracle server guarantees that the redo record has been written to the redo log buffer. Because the buffer cache replacement policy is an LRU algorithm, a dirty and uncommitted block may need to be paged out to disk. Before DBWR can write the dirty block back to disk, the redo records that contain the corresponding changes (which are in the redo log buffer, by the first condition) must be written to persistent storage (the online log files). The entries are written to the online logs by LGWR, so DBWR must post LGWR and wait for it before doing its own write operation.
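The second condition can be sketched as a simple gate: DBWR may write a dirty buffer only after LGWR has flushed redo at least up to that buffer's highest redo address. The class and RBA values are hypothetical:

```python
class WriteAheadCheck:
    """Sketch of the write-ahead rule for DBWR and LGWR coordination."""
    def __init__(self):
        self.flushed_rba = 0  # highest redo byte address already on disk

    def lgwr_flush(self, rba):
        # LGWR advances the on-disk redo position
        self.flushed_rba = max(self.flushed_rba, rba)

    def dbwr_can_write(self, buffer_high_rba):
        # DBWR must wait until the buffer's redo is on persistent storage
        return buffer_high_rba <= self.flushed_rba
```

In practice DBWR posts LGWR and waits when this check fails, exactly the ordering the rule describes.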

10 Log Force at Commit A transaction commit:
- Signals the completion of the transaction
- Posts LGWR to write the contents of the log buffer to disk
- Waits for LGWR to complete the write
- Does not force a data block write
Group commit: when several transactions commit simultaneously, a single post to LGWR is sufficient to flush the contents of the log buffer for all of them.
Log Force at Commit During the commit action an SCN is allocated and a commit redo record is generated (ktucm). The commit record is initially stored in the log buffer; to make the committed change permanent, all the redo entries up to and including the commit record must be written to the online logs (on disk). The Oracle server optimizes the writing of redo so that the redo for any transactions that complete during the signaling of a write is bundled together and written at the same time (termed group, or piggyback, commit). Recovery is designed to support the transactional property of durability, and the Oracle server achieves this by writing only the redo entries at commit time; it is not necessary to write all the changed data blocks at that point.
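The group-commit bundling can be sketched as one write that covers every commit pending at the moment LGWR wakes up. The RBA values are hypothetical:

```python
def group_commit(pending_rbas):
    """All commits pending when LGWR wakes are satisfied by a single write
    up to the highest requested redo byte address (a sketch)."""
    write_target = max(pending_rbas)
    completed = [rba for rba in pending_rbas if rba <= write_target]
    return write_target, completed

# Three sessions commit while LGWR is busy; one write to RBA 340
# completes all three, so only one post to LGWR was needed.
target, done = group_commit([120, 340, 275])
```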

11 Logical Ordering of Redo
- Logical ordering of redo is achieved through the implementation of system change numbers (SCNs).
- Each database has one global SCN generator. This sequence acts as an internal clock identifying the committed version of the database.
- An SCN is assigned to a transaction when it commits, and an SCN is stored with each redo record.
Logical Ordering of Redo An SCN defines a committed version of the database. An SCN is saved in the header of each redo record. A new SCN is allocated when a transaction commits, or when it is necessary to mark the redo as being allocated after a specific SCN (for consistency purposes). SCNs are also allocated and stored in other data structures, such as the control file and data file headers. An SCN is 6 bytes long and consists of a 4-byte base and a 2-byte wrap number. SCNs can be allocated at a rate of 16,384 per second for over 500 years without running out.
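The "over 500 years" claim follows directly from the 6-byte layout, and is easy to check:

```python
# 6-byte SCN: a 4-byte base plus a 2-byte wrap gives 2^48 distinct values.
max_scn = 2 ** 48
rate = 16_384                               # allocations per second, as quoted above
years = max_scn / rate / (365 * 24 * 3600)  # roughly 544 years before exhaustion
```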

12 Redo Log A redo log is an ordered list of redo records.
- Redo logs are normally mirrored, forming a redo log group. Each individual log in a group is called a member.
- The database server requires a minimum of two redo log groups.
- Redo log groups are used in a circular fashion.
- Each instance creates its own sequence of redo logs, known as a thread of redo.
Redo Log For redundancy, redo logs are normally mirrored by the Oracle server, forming redo log groups. A group is associated with a thread#, a sequence#, and a particular range of SCNs. Each individual log in a group is called a member. All the members of a group have identical contents and are written to simultaneously (as much as the operating system and hardware allow). In general, it is preferable to use Oracle multiplexing rather than operating system mirroring, although both can be used simultaneously.

13 Redo Byte Address
A redo byte address (RBA) is a pointer to a specific location within a redo thread. It is:
- Recorded at the beginning of each redo record
- Made of three components: the log sequence number, the block number within the redo log, and the byte offset within the block
Example dump header:
REDO RECORD - Thread:1 RBA: 0x LEN: 0x02f0 VLD: 0x01 SCN scn: 0x e4d3 05/12/02 15:19:54
Redo Byte Address The RBA field is recorded in the control file and the data files together with the corresponding SCN. In earlier versions, the Oracle server used RBAs to control recovery. The Oracle server now uses SCNs, but the RBA remains a very useful way to quickly locate the beginning of the redo stream to be applied.
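Given the three components, locating the start of a redo record is simple arithmetic: the sequence number names the log file, and the block number plus byte offset give the position inside it. This helper is hypothetical, assuming the 512-byte log blocks mentioned later in this lesson:

```python
def rba_byte_position(block_number, byte_offset, block_size=512):
    """Byte position within the log file named by the RBA's sequence number.
    A hypothetical helper; 512-byte redo log blocks are assumed."""
    return block_number * block_size + byte_offset
```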

14 Redo Log Buffer
[Diagram: SGA containing the buffer cache, shared pool, and redo log buffer; DBWR writes data file blocks, LGWR flushes the redo log buffer to the redo log files]
Redo Log Buffer This buffer resides in the SGA. The server processes fill it with redo records, and LGWR flushes its contents to the redo log files.

15 Redo Log Buffer
- The redo log buffer is a memory area for temporary storage of redo records.
- Its total size is determined by the LOG_BUFFER parameter.
- It is subdivided into several pieces that are used in a circular fashion.
- The size of each piece is an operating system block (usually 512 bytes). Each piece maps to an on-disk block of the current redo log.
Redo Log Buffer (continued) The total size of the log buffer is determined by the LOG_BUFFER parameter. The default value is the greater of 512K and 128K multiplied by the number of CPUs. The value of the parameter must be a multiple of the physical block size. There is also an imposed minimum: four times the maximum database block size on that host, plus two memory pages (the "guard" pages that help protect the redo buffer). The memory page size is 8192 bytes on Sun/Solaris and 4096 bytes on Windows NT.
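The default-value rule stated above ("512K or 128K multiplied by the number of CPUs, whichever is greater") reduces to one line:

```python
def default_log_buffer(cpu_count):
    # Default per the text above: greater of 512K and 128K per CPU.
    return max(512 * 1024, 128 * 1024 * cpu_count)
```

So a 2-CPU host gets the 512K floor, while an 8-CPU host gets 1 MB.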

16 Redo Generation The server process:
1. Pins a data block buffer in exclusive mode
2. Builds change vectors and bundles them into a redo record in the PGA
3. Determines the space that is required in the log buffer
4. Allocates that space in the log buffer
5. Writes the redo record into the log buffer
6. Changes the data block in the buffer cache
Redo Generation A server process intending to change a data block starts by pinning the buffer in exclusive mode. It then constructs an array of change vectors describing the changes. The array, called kcocv, is built in the PGA and populated through calls to kconew() and kcoadd(). After the redo record is built, it is written to the log buffer by a call to kcrfwr(). This routine starts by determining how many bytes must be allocated. Then it determines the SCN to be stored with the redo record; note that not every record necessarily has a different SCN. After a valid SCN has been determined, a redo copy latch is acquired. While holding the latch, each change vector is checked to ensure that it does not change a file that has had its redo generation stopped.

17 Redo Generation (continued)
Next the redo allocation latch is acquired. The redo copy latch is not really needed until later, but acquiring it first avoids executing the acquisition code while holding the redo allocation latch. This increases contention on the redo copy latch, which can be reduced by using multiple copy latches. At this point, a quick check is made to ensure that no other process has generated redo at a higher SCN than the one determined above; if one has, a new SCN is allocated. Next, the function determines whether there is enough space available to write the redo record. Space can only be allocated in the redo buffer if there is enough space in the log file; if there is space in the buffer but not in the file, a log switch must occur first. (Because a small part of the redo log may be left unused, on some platforms the archived redo logs may be smaller than the online logs.) If there is not enough space to proceed with the allocation, the process has to wait for LGWR to make space available, and the redo allocation and redo copy latches are freed at this point. The process must then determine whether to wait for LGWR to finish the current I/O (if one is in progress), to post LGWR to initiate a new I/O (because no other process has made the request yet), or to request a log switch. To prevent multiple processes from posting LGWR or requesting log switches simultaneously, the redo writing latch is implemented: a process in need of space must first acquire this latch to determine what to do next, whether to post LGWR to perform a write and free space in the buffer, request a log switch, or simply wait. While LGWR is writing redo to the files, it also holds the redo writing latch, so any other process trying to get this latch has to wait. Hence, the first thing a process does after it gets the redo writing latch is to check whether space has just become available.
If space has become available, the latch is released and processing starts again from the attempt to get the redo copy latch. If there is no space in the current log file, and kcrfwr() returns FALSE to indicate that it cannot write the redo to the buffer, then kcrfws() is called; either another process has already requested a log switch, in which case the process just waits for space to become available, or it requests a log switch itself. If there is no space in the buffer but there is space in the current redo log, LGWR is posted to do a write and free some space. If the latches were released because space was not available, they have to be acquired again. When space is finally available, it is allocated, and the redo allocation latch is freed. Still under the redo copy latch, the change vectors are copied into the log buffer, the buffers are linked into the checkpoint queues for bounded recovery time, and the change vectors are applied to them. At this point, the redo copy latch is also released. The last thing this function does is determine, based on certain thresholds, whether LGWR should be posted for a write; if so, LGWR is requested to perform one.

18 Redo Generation
[Flowchart of kcrfwr(): calculate redo size; get redo copy latch; get redo allocation latch; determine SCN; check "Is there space available?". If yes: release the redo allocation latch, copy the change vectors to the log buffer, adjust the remaining free space, increment statistics, release the redo copy latch, and return TRUE. If no: release the redo allocation latch, release the redo copy latch, increment redo buffer allocation retries, and get the redo writing latch; check "Has space become available?". If yes: release the redo writing latch and retry from the redo copy latch. If no: check "Is there space in the current log file?". If yes: post LGWR to perform I/O, release the redo writing latch, wait for log buffer space, and retry; if no: return FALSE]
Redo Generation (continued) This diagram shows the algorithm of kcrfwr(), which is followed by the server process when copying redo information into the log buffer.
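The space decision at the heart of the flowchart can be condensed into one function. This is a greatly simplified sketch of the kcrfwr()/kcrfws() behavior described above; the names and return values are ours, not Oracle's, and all latching is omitted:

```python
def redo_space_action(needed, buffer_free, logfile_free):
    """Simplified sketch of the foreground's space decision when
    writing a redo record of `needed` bytes."""
    if needed <= buffer_free and needed <= logfile_free:
        return "allocate and copy"       # the TRUE path of the flowchart
    if needed > logfile_free:
        return "request log switch"      # space in buffer but not in the file
    return "post LGWR and wait"          # buffer full; LGWR must free space
```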

19 Writes to Redo Log Files
LGWR writes the contents of the buffer to disk:
- When posted by foreground processes, either because redo log buffer space is not available or because a transaction has committed
- When the log buffer is 1/3 full
- When 1 MB worth of redo records has been logged
- When the thread is closed
- At a log switch (user-initiated or normal)
- At the three-second LGWR inactivity timeout
Writes to Redo Log Files If a foreground process has detected that there is no space in the redo log buffer but there is space in the current redo log, LGWR is posted to make space available in the buffer. Also, when a transaction commits, LGWR is forced to flush the log buffer: it is signaled with a request to write the redo records, up to and including the last record containing the commit SCN, to the log file on disk. Before LGWR can respond to the request, other foregrounds may generate more redo and post it with buffer write requests as well; all these requests are combined, forming a group (or piggyback) commit. The users do not see a "Commit complete" message until LGWR completes the write operation to disk. The parameter _LOG_IO_SIZE determines the write threshold for LGWR: the number of used redo blocks that automatically initiates a log write. By default, this parameter is zero, which makes the write occur when the log buffer is 1/3 full. If a nonzero value is set, that value is used as the threshold, provided it is not greater than half the total redo buffer size. In either case, the write threshold cannot be more than 1 MB. A write covers the redo in the buffer from the start of the unwritten redo to the current position.
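Our reading of the _LOG_IO_SIZE rules above can be written out as a sketch; this is an interpretation of the slide text, not a documented Oracle formula:

```python
def lgwr_write_threshold(log_buffer, log_io_size=0, block=512):
    """Sketch of the _LOG_IO_SIZE rules as described above: zero means
    one third of the buffer; a nonzero value (in redo blocks) is capped
    at half the buffer; either way the threshold never exceeds 1 MB."""
    if log_io_size == 0:
        t = log_buffer // 3
    else:
        t = min(log_io_size * block, log_buffer // 2)
    return min(t, 1024 * 1024)
```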

20 Writes to Redo Log Files (continued)
After LGWR writes out the log buffers, the changes become "real" to the database. By "real" it is meant that the Oracle server can guarantee that committed transactions are present, and uncommitted transactions are rolled back, if the database is restarted. A database crash coupled with the loss of the current log file is the worst case, because it becomes practically impossible to recover the database fully.

21 LGWR Algorithm
1. Acquire the redo writing and redo allocation latches.
2. Determine the buffers to write out.
3. Release the redo allocation latch.
4. Determine how many writes are required.
[Diagram: circular log buffer; request A covers a contiguous range (single write); request B wraps around the end of the buffer, producing two pieces B1 and B2 (two writes)]
LGWR Algorithm When LGWR is required to write redo buffers, it requests the redo writing latch first, to make sure that foreground processes do not continue to post LGWR requesting I/O, and then the redo allocation latch, to prevent them from allocating more redo space. LGWR then determines the buffers to be written: all full buffers up to and including the buffer that is to be synched or used are included in the write. At this point, a new SCN is allocated to ensure that LGWR never does two flushes at the same SCN. After this has been completed, the redo allocation latch can be released. Because the log buffer is a circular structure represented by an array, up to two write requests are built: depending on where the last request ended, the range to write may have wrapped.
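The single-write versus two-write cases from the diagram fall out of circular-buffer arithmetic. A minimal sketch, with block ranges given as inclusive (start, end) pairs:

```python
def build_write_requests(start, end, nblocks):
    """The log buffer is circular: if the range to flush wraps past the
    end of the underlying array, two write requests are issued."""
    if start <= end:
        return [(start, end)]                 # request A: single write
    return [(start, nblocks - 1), (0, end)]   # request B: two writes
```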

22 LGWR Algorithm
5. Calculate the target RBA for advancing the incremental checkpoint.
6. Release the redo writing latch.
7. Ensure that all foregrounds have completed modifying the redo buffers that need to be written.
8. Update redo block headers in the log buffer, including checksums if necessary.
9. Write the blocks to disk.
LGWR Algorithm (continued) For recovery purposes, the target RBA is calculated and the need for a checkpoint is evaluated based on LOG_CHECKPOINT_INTERVAL. After this, the redo writing latch is released. Because it is possible in some cases that redo generation is still in progress, LGWR must ensure that all foregrounds have completed modifying the redo buffers it is about to write. While copying into the redo buffer, a foreground holds a redo copy latch and records the low block number of its copy with the latch; when the latch is not in use, the buffer number is set to UB4MAXVAL. Thus LGWR need not wait unless the block associated with a redo copy latch is lower than the first block that goes into the next disk write. This check must be done before calculating the checksums, because the values to be checksummed could be in the middle of changing. If a foreground is still copying redo, LGWR requests a wake-up post when the corresponding redo copy latch is released. Note that LGWR does not acquire any redo copy latches at all; it just waits for them to be released. This is enough to guarantee that the foreground has finished modifying the buffer to be written, and this way LGWR does not block foregrounds that intend to write in other areas of the log buffer. After all the redo generation is completed for the buffers to be written, LGWR fills in the block header (with SCN information) for each redo buffer and then issues the disk writes. You can enable asynchronous I/O for LGWR with the parameter _LGWR_ASYNC_IO.

23 Redo Wait Events There are 12 wait events that are directly related to redo. Under normal operation, only a few of these events are waited on:
- log file parallel write
- log file sync
- log file switch
In a well-tuned system, redo-related waits should be absent or minimal.
Redo Wait Events
log file parallel write: This is the main event waited on while writing the redo records to the current log file. The call to write the log buffer is only made in LGWR, and the write operation is parallelized per physical disk; however, the wait is not over until the last redo buffer is sent to disk.
log file sync: When a request is made to sync, the write by LGWR includes all the redo up to the specified RBA, plus the redo pointed to by that RBA that is not already on disk. The time it takes for LGWR to complete the write and then post the requestor is the log file sync wait. Each user COMMIT or ROLLBACK must sync the redo with an infinite RBA, forcing a complete flush of the log buffer to the redo log file.
log file switch (checkpoint incomplete): This can occur under heavy load when the checkpoint for the next log has not completed; the process has to wait before a log switch is possible.
log file switch (archiving needed): A process has to wait for a log switch that cannot yet happen because the previous contents of the next log have not been archived.

24 Redo Wait Events (continued)
log file switch (clearing log file): This happens when the DBA enters the ALTER DATABASE CLEAR LOGFILE command and LGWR needs to switch to the log file that is being cleared. Wait time: 1 second.
log file switch completion: This happens when the current log file is simply full; LGWR must complete the write to the current log file and open the new one (switch log files). Wait time: 1 second.
switch logfile command: Waiting for a user-issued switch-logfile command to complete.
log buffer space: This is the event kcrfws() waits on. It means a process is waiting for space in the log buffer because the foreground processes are writing redo records faster than LGWR can write them out. Consider making the log buffer bigger if it is small, or moving the log files to faster disks such as striped disks. Wait time: 1 second normally, but 5 seconds if waiting for a log switch to complete.
log switch/archive: This event is waited on when the DBA manually enters the command ALTER SYSTEM ARCHIVE LOG CHANGE <scn>. If the log to be archived is in the current thread, a log switch is requested and waited for until it is done; otherwise, this event is waited on for 10 seconds.
log file sequential read: Any request to read from the redo log files waits on this event. The read can be for redo records or for the log file header. The P3 column indicates the number of blocks read, and is set to 1 if the wait is for the log file header. The number of blocks to read in one batch is port-specific and cannot be tuned.
log file single write: This event is waited on only when writing the log file header block. There are two places where the header block is written: when adding a new member file and when the log sequence number is advanced. The only contention reason could be the access time to the file, which can be resolved by reducing non-redo activity on the disk or by placing the redo files on faster disks.
LGWR wait for redo copy: LGWR is about to write a set of log blocks but has to wait until the foreground processes complete any current copy operations that affect the buffers about to be written out. In Oracle 7.0 and 8.0 this appeared as latch waits for the redo copy latch; the current method is intended to be faster, as the foreground posts LGWR when done. Wait time: 10 ms, or until posted.

25 Redo Statistics There are 16 redo statistics:
- Most of them (14) are defined in kcrfh.h. The other two are defined in kcb.h and relate to the buffer cache and log file synchronization.
- Statistics are calculated in these kcr functions:
  kcrfwr(): write redo into the log buffer
  kcrfws(): wait for space (log switch)
  kcrfwi(): write redo into the log file
Redo Statistics Here is the complete list of redo-related statistics, with a brief explanation of each:
redo writes, redo blocks written, redo write time: All three of these statistics are calculated in kcrfwc(), at the end of the redo parallel write operation. These are the main redo statistics and are only updated by LGWR. Redo writes is the number of times the log buffer is written, and redo blocks written is the total number of blocks written; the average blocks per write can easily be calculated from these values. Redo write time is the total time required to write all the redo to disk; in a way, it is the cumulative value of the individual log file parallel write wait times.
redo log space requests, redo log space wait time: These statistics show the number of requests to allocate space in the redo log file, and the total time spent waiting. They are incremented only if space is not immediately available and a foreground process is waiting for a log switch. The number of requests is the sum of all log file switch events except those from an ALTER SYSTEM SWITCH LOGFILE command.
redo size, redo entries: These statistics are calculated in kcrfwr() just before the redo record is copied into the log buffer. They indicate how much redo has been generated since instance startup. Because redo size is in bytes, the average redo per redo record can be calculated by dividing redo size by redo entries. There are no wait events associated with these statistics.
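The derived figures mentioned above (average blocks per write, average redo per entry) are simple ratios over the raw statistics. The sample numbers below are hypothetical:

```python
def redo_averages(stats):
    """Derived figures from the raw redo statistics (names as in the text)."""
    return {
        "blocks per write": stats["redo blocks written"] / stats["redo writes"],
        "bytes per entry": stats["redo size"] / stats["redo entries"],
    }

# Hypothetical snapshot of the four statistics:
sample = {"redo blocks written": 4000, "redo writes": 500,
          "redo size": 1_280_000, "redo entries": 10_000}
```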

26 Redo Statistics (continued)
redo buffer allocation retries: The total number of retries necessary to allocate space in the redo buffer. Retries are needed either because LGWR has fallen behind or because an event (such as a log switch) is occurring. While a process is retrying, it frees the redo allocation and redo copy latches but has to acquire the redo writing latch.
redo log switch interrupts: In a RAC environment, a log switch is forced on idle instances to minimize recovery time. This is the number of times that another instance asked to advance the log.
redo wastage: The number of bytes that were free in the log buffer when its contents were flushed to disk. It is necessary to flush the log buffer when a transaction commits or a log is switched. The name "wastage" is misleading; it is really unused buffer space.
redo writer latching time: When LGWR is writing the log buffer, it has to ensure that no other process is writing into the blocks being flushed out. In Oracle 7.0 and 8.0, LGWR would get and release all the redo copy latches to make this verification; from Oracle8i, LGWR does not get the latches, but still waits until the foreground processes complete their writes. In either case, this statistic measures the time LGWR spends waiting for other processes to finish writing into the log buffer.
redo ordering marks: This statistic is only relevant in a RAC environment. The ordering of redo records is based on the SCN, and in some uncommon situations the current (Lamport) SCN in the current instance might be less than the required ordering SCN. In this case a new SCN is obtained (through the Distributed Lock Manager). This statistic is the number of times an SCN had to be allocated to force a redo record to have a higher SCN than a record generated in another thread using the same block.
redo synch writes, redo synch time: These two statistics are calculated in the buffer cache management routines rather than in the redo routines. Redo is generated and copied into the buffer, and it is written by LGWR in the background. Redo synch writes is incremented each time the redo records must be written out to disk due to a commit; redo synch time is the time it takes for the redo synch write to complete.

27 Redo Latches There are three types of redo latches:
- redo copy
- redo allocation
- redo writing
There can be multiple redo copy latches; there is only one redo allocation latch and one redo writing latch.
Redo Latches The number of redo copy latches can be set with the initialization parameter _LOG_SIMULTANEOUS_COPIES. From Oracle8i, the parameter is hidden because it is not a good idea to add too many redo copy latches, especially for OLTP. Adding extra redo copy latches reduces the probability that foregrounds miss getting a latch; the default is two times the number of CPUs, which may already be a high value in many cases. Even though LGWR no longer gets these latches, it still has to wait for foregrounds to complete, so the more latches there are, the more likely the waits. In the past, it was possible to copy redo into the buffer under the redo allocation latch, provided the redo record size did not exceed the parameter LOG_SMALL_ENTRY_MAX_SIZE. From Oracle8i, this parameter is obsolete, and a redo copy latch is always acquired to copy the redo. The redo writing latch exists to prevent numerous processes from posting LGWR when there is no space in the log buffer: only the first process to get the latch posts LGWR. Likewise, LGWR also holds the latch while writing, so no other process can post it while a write is in progress. When a new process gets the redo writing latch, the first thing it checks is whether LGWR has just made space available; if so, the process releases the latch without posting LGWR.

28 Optimizing the Redo Buffer
Log buffer contention is not generally a major problem: It is easily identified by checking the statistic redo buffer allocation retries. Ideally, there should be no waits for redo log space. Two ways to reduce space requests: Reduce the amount of redo that is generated. Improve the efficiency of LGWR. Optimizing the Redo Buffer Contention for redo log buffers does not generally cause major performance problems on the database. Usually, the LGWR process writes the changes quickly enough to ensure that buffers are free for new entries. Waits for the event log buffer space could indicate contention for the redo log buffers. However, it is more useful to find out the proportion of redo entries that had to wait for space to become available. This ratio can be calculated as: redo buffer allocation retries / redo entries This ratio should be near zero. If only a small fraction of the entries ended up waiting for space in the log buffer, then it might be an indication of only temporary contention. For example, eliminating space requests that delay 0.1% of redo entries cannot increase database performance by more than 0.1%. Reducing the amount of redo that is generated on the system can be achieved by performing redo-less operations (UNRECOVERABLE or NOLOGGING). These operations are discussed later in this lesson.
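The ratio described above can be sketched as a small helper. The sample values are hypothetical; on a real system, both statistics would be read from V$SYSSTAT:

```python
# Sketch: proportion of redo entries that had to retry a buffer allocation.
# Inputs correspond to the V$SYSSTAT statistics "redo buffer allocation
# retries" and "redo entries"; the example values are made up.

def redo_retry_ratio(allocation_retries, redo_entries):
    """Fraction of redo entries that waited for log buffer space."""
    if redo_entries == 0:
        return 0.0
    return allocation_retries / redo_entries

# 12 retries over 240,000 redo entries -- effectively no contention
print(redo_retry_ratio(12, 240000))  # 5e-05
```

A value near zero, as here, confirms that log buffer space is not worth further tuning effort.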

29 Optimizing the Redo Buffer
The parameter LOG_BUFFER defines the size of the redo log buffer. Increase this parameter if there are no I/O bottlenecks on the redo disks. A value of 128K is reasonable for most systems. Striping redo log files may help LGWR to flush the log buffer faster. Optimizing the Redo Buffer (continued) In theory, three times the maximum write size should be all that is necessary to keep LGWR at optimal speed. However, having a larger buffer allows for the scheduling delay in activating LGWR and may allow it to be activated less often, leading to fewer context switches. Use the following formula when sizing the redo log buffer: 1.5 * (average transaction redo size * number of transaction commits per second) Note: The 1.5 factor is used because LGWR starts to write the redo buffer when it becomes 1/3 full. It is also easy to see whether there is I/O contention on the redo disks: the system event log file sync will have high wait times in such a case. Generally the wait time should be less than 0.2 seconds. If the event has a high wait time, then look at the total number of timeouts compared to the total number of waits: Logical waits = total_waits – total_timeouts Striping the log files may help in this case, with a recommended stripe size of 16K. The redo log file write batch is 128K, so striping the redo logs across 8 disks with a stripe size of 16K could improve the write time by as much as 75%.
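The sizing formula above can be expressed directly. The workload figures below are hypothetical examples, not recommendations:

```python
# Sketch: the log buffer sizing formula from the text,
#   1.5 * (average transaction redo size * commits per second).
# The workload numbers in the example are hypothetical.

def suggested_log_buffer_bytes(avg_txn_redo_bytes, commits_per_sec):
    """Suggested LOG_BUFFER size in bytes per the 1.5x sizing rule."""
    return 1.5 * avg_txn_redo_bytes * commits_per_sec

# Example: 800 bytes of redo per transaction, 100 commits per second
print(suggested_log_buffer_bytes(800, 100))  # 120000.0 bytes (~117 KB)
```

Note that the result for this moderate workload lands close to the 128K figure the slide calls reasonable for most systems.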

30 Tuning Redo Latch Contention
If waits for log file sync have the greatest impact: If waits for LGWR wait for redo copy are also high, then probably there are too many copy latches. If not, then the I/O is too slow. If waits for latch free have the greatest impact: Contention for redo copy latch: Increase the number of copy latches Contention for redo allocation latch: Consider redo-less operations Contention for redo writing latch: Increase the size of the log buffer or reduce _LOG_IO_SIZE Tuning Redo Latch Contention If the wait events show that the most significant event (most impact) is log file sync, then this means LGWR is taking a long time to write and acknowledge the commits. If waits for LGWR wait for redo copy are also high, then: LGWR is waiting for the foregrounds to modify the buffers that must be written to disk Probably there are too many redo copy latches Reduce the value of _LOG_SIMULTANEOUS_COPIES If there are high waits for log buffer space instead of LGWR wait for redo copy, then: Reduce _LOG_IO_SIZE to increase the frequency of redo writes if there is no contention for the redo allocation latch Increase _LOG_IO_SIZE if there are sleeps for the redo allocation latch to reduce the load on the latch from LGWR

31 Tuning Redo Latch Contention (continued)
If there are no waits for LGWR wait for redo copy or log buffer space, then the delay is in the I/O itself, which is too slow. In that case:
Reduce LOG_BUFFER to reduce the number and size of sync writes (try three times the average redo write size, calculated from the redo size and redo blocks written statistics)
Ensure RAID 5 is not being used for the logs
Use asynchronous I/O if supported
Stripe the redo logs on multiple spindles
Reduce the number of members in each log group
Use operating system mirroring instead of Oracle multiplexing (if nothing else works)
If the wait events show that the most significant event is log file switch completion (which means the log file switch is taking too long), then:
Increase the log file size to reduce the number of log switches
Reduce the number of log members (use hardware mirroring for more than two members)
Reduce the number of control file copies (use hardware mirroring)
Hold raw log files open
If the wait events show that the most significant event is latch free, then for the latch with the highest contention, do one of the following:
Redo copy latch: The foregrounds are fighting over the few copy latches that are available. Increase the value of _LOG_SIMULTANEOUS_COPIES, and make sure that the waits for log file sync do not increase too much.
Redo allocation latch: There is a need for many allocations simultaneously. Consider redo-less operations (NOLOGGING). Increase the size of the log buffer (or modify _LOG_IO_SIZE) to reduce the load on the latch from LGWR.
Redo writing latch: This is unlikely to happen without serious contention on the redo allocation latch. It means that the foregrounds fill the available space in the buffer far quicker than LGWR can make more space available, and find themselves always requesting more space. Increase the size of the log buffer (if the rate of commits is high), or reduce _LOG_IO_SIZE (if the rate of commits is slow).
If the wait events show that the most significant event is log buffer space, then increase the size of the log buffer, provided the waits for log file switch completion do not increase too much.
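The diagnostic steps in this and the previous slide form a simple decision tree. The sketch below captures it as a lookup function; the event and latch names come from the text, but the function itself is only an illustration, not a real diagnostic tool:

```python
# Sketch: the redo latch tuning decision tree as a lookup function.
# Event and latch names follow the lesson text; the inputs (e.g. what
# counts as "high" redo copy waits) are simplified assumptions.

def redo_tuning_hint(top_wait_event, redo_copy_waits_high=False,
                     hot_latch=None):
    """Return a tuning suggestion for the dominant redo-related wait."""
    if top_wait_event == "log file sync":
        if redo_copy_waits_high:
            return "too many copy latches: reduce _LOG_SIMULTANEOUS_COPIES"
        return "LGWR I/O is too slow: tune redo log I/O"
    if top_wait_event == "latch free":
        hints = {
            "redo copy": "increase _LOG_SIMULTANEOUS_COPIES",
            "redo allocation": "consider NOLOGGING operations",
            "redo writing": "increase LOG_BUFFER or adjust _LOG_IO_SIZE",
        }
        return hints.get(hot_latch, "identify the hot latch first")
    if top_wait_event == "log buffer space":
        return "increase LOG_BUFFER"
    return "no redo tuning indicated"

print(redo_tuning_hint("latch free", hot_latch="redo allocation"))
# consider NOLOGGING operations
```

The point of the sketch is the branching order: identify the dominant wait event first, and only then look at the individual latch statistics.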

32 Redo-less Operations Certain operations can be redo-less:
Direct loader and direct path INSERT CREATE TABLE as SELECT (CTAS) ALTER TABLE ... MOVE CREATE INDEX ALTER INDEX REBUILD / SPLIT PARTITION All partition operations that involve data movement Redo-less operations are enabled by the NOLOGGING attribute. An object created with NOLOGGING cannot be recovered. A standby database must be refreshed after a NOLOGGING operation. Redo-less Operations It is possible not to generate redo for certain operations. You can specify the NOLOGGING clause at the tablespace level, or for specific commands. For example, Direct Loader, CREATE TABLE AS SELECT, and CREATE INDEX have the NOLOGGING option. When specified, it eliminates redo generation for the operation. Note that data dictionary modifications are still logged, as is an invalidation record for the blocks that are affected by the operation. Also, if delayed block cleanout is being performed on the source object, then redo is still generated. For objects other than LOBs, if you omit the NOLOGGING clause, then the logging attribute of the object defaults to the logging attribute of the tablespace in which it resides. For LOBs, if you omit the NOLOGGING clause: If you specify CACHE, then LOGGING is used (because you cannot have CACHE NOLOGGING). If you specify NOCACHE or CACHE READS, then the logging attribute defaults to the logging attribute of the tablespace in which it resides. Note: For more information on the NOLOGGING clause, see Oracle9i SQL Reference Release 2 (9.2).

33 NOLOGGING Performance
[Two bar charts compare NOLOGGING and LOGGING for CTAS, CREATE INDEX, and direct INSERT: one shows redo generated in KB (logarithmic scale), the other elapsed time in seconds.] NOLOGGING Performance The graphs illustrate the effect of NOLOGGING for three different operations. The table used for the tests was 25 MB, with an average row size of 100 bytes. The database was bounced before each test to minimize the effect of the buffer cache on the time spent. The amount of redo generated was obtained from the statistic redo size in V$SESSTAT. The chart showing the redo generated has a logarithmic Y-axis scale because the difference between a regular and a redo-less operation is too large to illustrate on a linear scale.

34 Redo with NOLOGGING A redo record is still written for NOLOGGING operations. For each write there is an invalidation record. Each record covers several blocks (a range). When this redo is applied, the range of blocks is marked soft-corrupt. Redo with NOLOGGING An invalidation record is written to the redo log indicating the range of blocks that should be marked as software corrupted. The change vector type is INVLD, and the range is determined by an initial starting DBA and the number of consecutive blocks thereafter. Here is an example: REDO RECORD - Thread:1 RBA: 0x0000cf a.01c0 LEN: 0x0028 VLD: 0x01 SCN scn: 0x04e2.0023a5da 10/07/02 09:54:20 CHANGE #1 INVLD AFN:8 DBA:0x BLKS:0x001f SCN:0x04e2.0023a5da SEQ: 1 OP:19.2 When this redo record is applied, ORA-273 errors are logged against these blocks. ORA-273 "media recovery of direct load data that was not logged" Cause: A media recovery session encountered a table that was loaded by the direct loader without logging any redo information. Some or all of the blocks in this table are now marked as corrupt. Action: The table must be dropped or truncated so that the corrupted blocks can be reused. If a more recent backup of the file is available, then try to recover this file to eliminate this error. Any attempt to read these blocks also results in an ORA-1578 error in the alert log.

35 Archiving Archivelog mode tracks redo logs that must be archived.
Online redo logs must be archived before being overwritten. Database flag changes to media recovery enabled. ARCH process is normally started to perform automatic archiving. User processes can also archive logs if they are instructed to do so. Archiving A database that is running in Archivelog mode maintains a linked list of the online redo logs that are used and prevents a log from being overwritten before its contents have been saved (archived) somewhere else. Normally, the ARCH process is automatically started through initialization parameters to make copies of the online logs to the archive destination as LGWR fills them up. User processes can also be spawned as temporary archive log processes by issuing archiving commands.

36 Events that Post the Archiver
When archiving is manually enabled (ARCHIVE LOG START command) Log switch Timeouts every 300 seconds (5 minutes) During instance recovery Events that Post the Archiver The Oracle server posts ARCH only when the database mode is set to Archivelog. If ARCH is not started automatically through the initialization parameter on instance startup, then you can start it through user commands. Normally, it is LGWR that posts ARCH to archive a log after a log switch. If ARCH has not received a message from LGWR in 5 minutes, then a timeout occurs and ARCH wakes up to check whether there is a need to archive anything. ARCH determines the need for archiving by reading the information in the control file. ARCH is signaled not to archive when instance or media recovery is occurring. Note: In a Real Application Clusters environment, there are several additional events that post the ARCH process. That discussion is beyond the scope of this course. Refer to DSI408: Real Application Clusters Internals for this information.

37 ARCH Process Flow Read online redo log asynchronously
Fill archive log buffers Write to archive log file asynchronously Redo log file Archive log file Archive log buffer Async read Async write Archive log buffer Async read Async write Archiver Process Flow Generically, ARCH: Reads the database information record from the control file to determine which logs are unarchived Allocates memory buffers (_LOG_ARCHIVE_BUFFERS), each of size _LOG_ARCHIVE_BUFFER_SIZE * redo log block size Opens all the log members of the online log group (read-only) and validates the headers Conducts a search for available archive destinations Creates and opens archive log files in all active and reachable destinations Loops until the entire log has been copied: Reads portions of the online redo log into the archive buffers, using a round-robin method to assign a different log member for each buffer read, if possible Performs a sanity check on the contents of each buffer Writes the buffers to the archive logs (in SCN order) Ends loop Closes all open files (online log members and archive logs in all destinations) Read and write operations are done asynchronously if supported and configured on the platform.
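The buffer allocation step above implies a fixed memory footprint for ARCH. The sketch below computes it; the defaults (4 buffers of 64 blocks) come from the next slide, and the 512-byte redo log block size is a common platform value, not a universal one:

```python
# Sketch: memory used by the ARCH buffers, per the allocation rule
#   _LOG_ARCHIVE_BUFFERS * (_LOG_ARCHIVE_BUFFER_SIZE * redo block size).
# Defaults 4 and 64 are from the lesson; the 512-byte redo log block
# size is an assumption (it varies by platform).

def arch_buffer_memory_bytes(n_buffers=4, buffer_size_blocks=64,
                             log_block_size=512):
    """Total bytes allocated for the archive log buffers."""
    return n_buffers * buffer_size_blocks * log_block_size

print(arch_buffer_memory_bytes())  # 131072 (128 KB with the defaults)
```

With the defaults this is only 128 KB, which is why the text says the defaults are adequate for most applications.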

38 Archiver Process Flow (continued)
Two parameters are provided to tune archiver so that it runs as fast as necessary: _LOG_ARCHIVE_BUFFERS = number of buffers (default 4, max 128) _LOG_ARCHIVE_BUFFER_SIZE = size of buffers (default 64) The defaults should be adequate for most applications.

39 Archiver Operations Performs error recovery
Switches to another member if errors occur If all members are bad, then returns an error Performs load balancing Reads each buffer from a different member Performs redo log validation Ensures validation regardless of any initialization parameter Checks log file header structure for errors Enables log block checksums by setting DB_BLOCK_CHECKSUM to true Archiver Operations Error recovery: If anything that is read into the buffer fails the sanity check, then the Oracle server reads the same section from other members. If it fails to read these blocks from all the log members, then the Oracle server signals an error indicating that archiving has failed. Load balancing: After each successful buffer read, the next buffer read is assigned to a different log member to balance I/O. Redo log validation: Regardless of any initialization parameters, ARCH always validates some fundamental data structures in the redo log header. Sanity checking continues as the redo log blocks are read. The following are always validated while reading from the redo log files: Low SCN (SCN allocated when switching into log) Next SCN (SCN after redo in this log) Next SCN of current log should equal Low SCN of next log. DB ID (hashed DB name and create date) Thread number for redo in this log Log initialization seq# within thread Reset Logs Counter and SCN (from control file) at log switch Setting DB_BLOCK_CHECKSUM to TRUE enables both data and log block checksums. When TRUE, every redo log block is given a checksum before it is written to the current log.

40 Tuning ARCH Wait event log file switch (archiving needed)
Ensure optimal file configuration: Optimize throughput for LGWR and ARCH. Maintain resilience to media failures. Use asynchronous I/O: Operating system asynchronous I/O support Oracle asynchronous I/O parameters Evaluate size and number of redo logs. Tuning ARCH The wait event that indicates a need to tune ARCH is log file switch (archiving needed). Ensure optimal file configuration: Check disk queue lengths, CPU waits and usage, contention in I/O at disk, channel, and/or controller level. Start by making sure that the distribution of the redo log files and archive log destination is appropriate to reduce I/O hot spots. Redo log files should be on raw file systems with a RAID 0+1 configuration, and stored on dedicated disks with fine-grained striping enabled. Archive logs can usually only be created on cooked file systems (for example, UFS, NTFS). These file systems should be built with a RAID 0+1 configuration. Again, fine-grained striping could be used. Archive logs should always be separated from the redo logs and should not be on RAID-5 devices. Use asynchronous I/O: Determine if your operating system supports asynchronous I/O. The Oracle server is likely to need an initialization parameter to use it. Asynchronous reads help tremendously. Asynchronous writes may help if the operating system supports asynchronous I/O to regular file systems. Evaluate the size and number of the online redo logs: Usually just increasing the size and the number of redo log groups gives ARCH more room to keep up with LGWR. Adding more online logs does not help if LGWR activity is always high. However, it may help if LGWR activity is spasmodic because ARCH can catch up during slow periods.

41 Tuning ARCH Establish a strategy for temporary peaks:
Reduce the number of archive destinations. Use multiple ARCH processes. Tune the archiving process by adjusting: _LOG_ARCHIVE_BUFFER_SIZE _LOG_ARCHIVE_BUFFERS Tuning ARCH (continued) The following initialization parameters control writing to multiple destinations (max 10): LOG_ARCHIVE_DEST_n: Can be set as OPTIONAL or MANDATORY LOG_ARCHIVE_DEST_STATE_n: The availability state of the corresponding destination. Values: ENABLE or DEFER LOG_ARCHIVE_MIN_SUCCEED_DEST: The minimum number of destinations that must succeed Note, however, that using multiple archive destinations increases the amount of work to be performed by ARCH. Multiple archiver processes can be enabled with the LOG_ARCHIVE_MAX_PROCESSES initialization parameter; the maximum is 10 processes. Finally, if everything else has failed to bring the speed of ARCH in line with LGWR, then fine-level tuning can be achieved with the parameters that control the number and size of archive buffers: _LOG_ARCHIVE_BUFFER_SIZE _LOG_ARCHIVE_BUFFERS

42 Tuning ARCH (continued)
You can tune archiving either to run as slowly as possible without becoming a bottleneck, or as quickly as possible without substantially reducing system performance. Minimizing the Impact on System Performance To make ARCn processes work as slowly as possible without forcing the system to wait for redo logs, begin by setting the number of archive buffers to 1 and the size of each buffer to the maximum possible. If the performance of the system drops significantly while ARCn is working, then lower the value of _LOG_ARCHIVE_BUFFER_SIZE until system performance is no longer reduced when ARCn runs. Note: If you want archiving to be very slow, but find that the Oracle server frequently has to wait for redo log files to be archived before they can be reused, then you can create additional redo log file groups or use multiple archiver processes. Improving Archiving Speed To improve archiving performance, use multiple archive buffers so that the ARCn process reads the online redo log at the same time that it writes the archive log. You can set _LOG_ARCHIVE_BUFFERS to 2, but for a very fast tape drive you may want to set it to 3 or more. Then set the size of the archive buffers to a moderate number, and increase it until archiving is as fast as you want it to be without impairing system performance.

43 Multiple Archive Log Processes
Support multiple archive locations Increase archiving throughput Reduce the need to perform manual archives Are controlled by the new dynamic parameter: LOG_ARCHIVE_MAX_PROCESSES Are started if LOG_ARCHIVE_START=TRUE and automatic archiving is enabled Default value is 1 Multiple Archive Log Processes An Oracle instance can have up to ten ARCn processes (ARC0 to ARC9). The LGWR process starts a new ARCn process whenever the current number of ARCn processes is insufficient to handle the workload. The alert log keeps a record of when LGWR starts a new ARCn process (load-balance scheduling of new ARCH process). The parameter LOG_ARCHIVE_MAX_PROCESSES specifies the number of ARCn processes to be invoked. Multiple ARCn processing prevents the bottleneck that occurs when LGWR switches through the multiple online redo logs faster than a single ARCn process can write inactive logs to multiple destinations. Note that each ARCn process works on only one inactive log at a time but must archive to each specified destination. Because LGWR keeps track of the archiving performance, generally you do not need to change the value of the LOG_ARCHIVE_MAX_PROCESSES parameter, but in case you anticipate a heavy workload for archiving, such as during bulk loading of data, you can modify its value to start multiple archivers quickly. You can do this by changing the initialization parameter file or by dynamically invoking the ALTER SYSTEM command without having to shut down the instance. For more information, refer to WebIV Note:

44 Archive Problem Solving
Obtain a thorough description of the problem. Perform debugging with archive commands. Check v$ views. Check any archive trace files. Determine what ARCH is calling. Check for platform-specific issues. Find a solution to prevent the problem from occurring. Archive Problem Solving Use the ARCHIVE LOG LIST command to reveal whether archiving is enabled and whether an ARCH process is running. To determine whether there is a database hanging problem, compare NEXT LOG TO BE ARCHIVED with the OLDEST ONLINE LOG SEQUENCE number. Using the ALTER SYSTEM ARCHIVE LOG [NEXT|ALL|CURRENT] commands usually generates further understanding. Check the v$ views. These are numerous and offer valuable information on the status of the ARCH process. Some to consider: v$log v$archive v$archive_dest v$archive_processes v$archived_log The alert.log or background trace files may indicate that the current archive destination cannot be written to, possibly due to lack of disk space. Immediate resolution usually requires freeing up disk space, or redirecting archiving to a different destination.

45 Archive Problem Solving (continued)
If there is no I/O contention and ARCH seems too slow or hangs, but there is no evidence of error or hardware problems, then it may be necessary to dump the call stack for ARCH. This usually requires the use of an operating system debugger or oradebug. Check for platform-specific issues that may be causing archiving bottlenecks. Lastly, if an archiving problem occurs, then it is time to review your archiving procedures. Archive log files should be backed up to media that can be taken off site. When done properly, archiving should not degrade the performance of a database.

46 Summary In this lesson, you should have learned about:
Causes of performance problems that are related to redo record generation Internal implementation of LGWR and the redo log buffer Tuning redo generation Internal implementation of ARCH Tuning archiving

47 References WebIV Note: 73163.1
Source: kcrfw.c, kcrf.h, kcrfh.h, kcrr.h, kcrr.c

48

