Presentation is loading. Please wait.

Presentation is loading. Please wait.

Diagnosing the Bottlenecks in your Streams Environment Brian Keating Chris Lawson May 17, 2007.

Similar presentations


Presentation on theme: "Diagnosing the Bottlenecks in your Streams Environment Brian Keating Chris Lawson May 17, 2007."— Presentation transcript:

1

2 Diagnosing the Bottlenecks in your Streams Environment Brian Keating Chris Lawson May 17, 2007

3 Whats our Goal for Today? We will divulge & show you how to deal with some of the complications of Streams.

4 Our Agenda Overview of Streams Replication Overview of Streams Replication What are the parts of Streams? What are the parts of Streams? How does it work? How does it work? Troublesome Parts of Streams Troublesome Parts of Streams Troubleshooting techniques Troubleshooting techniques A fun trivia question A fun trivia question Types of bottlenecks & how to resolve Types of bottlenecks & how to resolve Some helpful SQL scripts Some helpful SQL scripts Questions? Questions?

5 A Few Caveats Unless otherwise stated, our examples assume Oracle 10g, release 2. Unless otherwise stated, our examples assume Oracle 10g, release 2. Streams set-up is not the real problem; thus our we focus on monitoring and troubleshooting. Streams set-up is not the real problem; thus our we focus on monitoring and troubleshooting. Set-up documented in Oracle 10g Release 2 Concepts & Admin Guide. Set-up documented in Oracle 10g Release 2 Concepts & Admin Guide. Alsothe subject of conflict resolution is a complex subject of its own, and will not be covered. Alsothe subject of conflict resolution is a complex subject of its own, and will not be covered.

6 Is Streams Difficult or Easy? How difficult is it to figure out all the pieces, & keep Streams working well? Heres one opinion...

7 Gosh, Its Really Easy! Your Life will be so Peaceful You dont need rocket scientists on your staff! * * Marketing executive, on the simplicity of Streams/CDC.

8 Heres another point of view...

9 Streams is like a Whirlpool Is Streams really a proven product? * * Programmer, noting the numerous bug fixes. DBA

10 Streams Replication: A View from 10,000 Feet Streams propagates both DML & DDL changes to another database (or elsewhere in same database.)Streams propagates both DML & DDL changes to another database (or elsewhere in same database.) Streams is based on Log Miner, which has been around for years.Streams is based on Log Miner, which has been around for years. You decide, via rules, which changes are replicated.You decide, via rules, which changes are replicated. You can optionally transform data before you apply it at the destination.You can optionally transform data before you apply it at the destination.

11 Overview of Streams : Three Main Processes Capture Propagate Apply c0, c1 a001 background processes Note: p processes also used in capture & apply J00, j01

12 Whats in the Redo Log? LogMiner continuously reads DML & DDL changes from redo logs. It converts those changes into logical change records (LCRs). There is at least 1 LCR per row changed. The LCR contains the actual change, as well as the original data. Redo logs LCRs LogMiner Processing

13 Whats in the Redo Log? Recall that changed (dirty) blocks in the redo log buffer are written after anyones commitnot just your own. This means that changes will often be captured, propagated (but not applied) even though you havent committed!

14 The Capture Process: Capture Change Capture Change Generate LCR Enqueue Message The capture change step reads changes from redo logs. All changes (DML and DDL) in redo logs are captured – regardless of whether Streams is configured to propagate any given change. Observe capture_message_number value (in v$streams_capture). It regularly increments – even if there are no application transactions.

15 The Capture Process: Generate LCR First, examine change & determine if Streams is configured to handle that change or not. If so, convert into one or more logical change records, or LCRs. An LCR only contains changes for one row. DML statements that affect multiple rows will cause multiple LCRs to be generated. So, if one statement updates 1,000 rows, at least 1,000 LCRs will be created! Capture Change Generate LCR Enqueue Message

16 The Capture Process: Enqueue Message This step places previously-created LCRs onto the capture processs queue. Oracle uses q00n background workers, & the QMNC (Queue Monitor Coordinator) Capture Change Generate LCR Enqueue Message q001 q002q003 QMNC

17 The Capture Process: Enqueue Message Example of the queue background processes: Select Program, Sid From V$session Where Upper(program) Like '%(Q%' And Osuser = 'Oracle Order By 1; PROGRAM SID (QMNC) 471 (q000) 456 (q001) 936 (q002) 1366 (q003) 261 (q004) 744

18 The Propagate Process Source Queue Propagate Process Destination Queue Streams copies LCRs from the source queue to destination queue.Streams copies LCRs from the source queue to destination queue. These transfers are done by the J00n background processes.These transfers are done by the J00n background processes. How do we know its the j processes?How do we know its the j processes? Lets look at session statistics and see whos doing all the db link communicatingLets look at session statistics and see whos doing all the db link communicating

19 The Propagate Process Whos doing the Sending? Select Sid, Name, Value From V$sesstat One, V$statname Two Where One.Statistic# = Two.Statistic# And Name Like '%Link% And Value > 1000 SID NAME VALUE SID NAME VALUE bytes sent via SQL*Net to dblink E bytes sent via SQL*Net to dblink E bytes sent via SQL*Net to dblink E bytes sent via SQL*Net to dblink E bytes received via SQL*Net from dblink bytes received via SQL*Net from dblink bytes received via SQL*Net from dblink bytes received via SQL*Net from dblink SQL*Net roundtrips to/from dblink SQL*Net roundtrips to/from dblink SQL*Net roundtrips to/from dblink SQL*Net roundtrips to/from dblink SIDs 467, 947 are indeed the j00 processes

20 The Propagate Process Whats in the Queue? Sample of the queue content: Select Queue_Name, Num_Msgs From V$Buffered_Queues; QUEUE_NAME NUM_MSGS APP_NY_PROD_Q 544 CAP_LA_PROD_Q 1640

21 The Apply Process Queue and Dequeue LCRs usually wait in destination queue until their transaction is committed or rolled back.LCRs usually wait in destination queue until their transaction is committed or rolled back. There are some exceptions, due to the queue spilling (covered later)There are some exceptions, due to the queue spilling (covered later) lcrlcrlcr lcr lcr lcr lcr lcr The LCRs also remain on the capture queue until they are applied at the destination.The LCRs also remain on the capture queue until they are applied at the destination.

22 When transaction commits, the Streams sends over a single Commit LCR.When transaction commits, the Streams sends over a single Commit LCR. After destination applies LCRs, destination sends back confirming LCR that its okay for capture queue to remove those LCRs.After destination applies LCRs, destination sends back confirming LCR that its okay for capture queue to remove those LCRs. For rollback, source sends 1 rollback LCR for each LCR in that transaction.For rollback, source sends 1 rollback LCR for each LCR in that transaction. Commits & Rollbacks And now to the troubleshooting...

23 Trivia Question Question: how does Streams avoid infinite loop replication? Question: how does Streams avoid infinite loop replication? For example, after a change is applied at a destination database, why doesnt Streams re- capture that change, and then propagate it back to the original source database? For example, after a change is applied at a destination database, why doesnt Streams re- capture that change, and then propagate it back to the original source database? The answer to this question will be The answer to this question will be provided later in the presentation.

24 LCR Concepts An LCR is one type of Streams message – the type that is formatted by the Capture process. Two types of LCRs: DDL LCRs and row LCRs. Two types of LCRs: DDL LCRs and row LCRs. A DDL LCR contains information about a single DDL change. A DDL LCR contains information about a single DDL change. A row LCR contains information about a A row LCR contains information about a change to a single row. As a result, transactions that affect multiple As a result, transactions that affect multiple rows will cause multiple LCRs to be generated.

25 LCR Concepts LOB Issues Tables that contain LOB datatypes can cause multiple LCRs to be generated per row affected! Tables that contain LOB datatypes can cause multiple LCRs to be generated per row affected! Inserts into tables with LOBs will always cause multiple LCRs to be generated per row. Inserts into tables with LOBs will always cause multiple LCRs to be generated per row. Updates into those tables might cause multiple LCRs, if any of the LOB columns are updated. Updates into those tables might cause multiple LCRs, if any of the LOB columns are updated. Deletes will never generate multiple LCRs – deletes always generate 1 LCR per row. Deletes will never generate multiple LCRs – deletes always generate 1 LCR per row.

26 LCR Concepts Other Items to Note All of the LCRs that are part of a single transaction must be applied by one apply server.All of the LCRs that are part of a single transaction must be applied by one apply server. If a transaction generates 10,000 LCRs, then all 10,000 of them must be applied by just one apply server – no matter how many servers are running. For each transaction, there is one additional LCR generated, at the end of the transaction.For each transaction, there is one additional LCR generated, at the end of the transaction. If a single transaction deletes 2,000 rows, then 2,001 LCRs will be generated. This item is important to note for the spill threshold value.

27 Spill Concepts Normally, outstanding messages are held in Normally, outstanding messages are held in buffered queues, which reside in memory. buffered queues, which reside in memory. In some cases, messages can be spilled; that is, In some cases, messages can be spilled; that is, they can be moved into tables on disk. 3 Reasons: they can be moved into tables on disk. 3 Reasons: The total number of outstanding messages The total number of outstanding messages is too large to fit in the buffered queue; is too large to fit in the buffered queue; A given message has been in memory too long; A given message has been in memory too long; The number of messages in a single transaction is The number of messages in a single transaction is larger than the LCR spill threshold parameter. larger than the LCR spill threshold parameter.

28 Spill Concepts Types of Spill Tables There are two types of tables that can hold spilled messages: There are two types of tables that can hold spilled messages: Queue tables: each buffered queue has a queue table associated with it. Queue tables are created at the same time as buffered queues.Queue tables: each buffered queue has a queue table associated with it. Queue tables are created at the same time as buffered queues. Name format for queue tables: AQ$_ _P Example: AQ$_CAPTURE_QUEUE_TABLE_P

29 Spill Concepts Types of Spill Tables (cont.) The Spill table: there is one (and only one) spill table, in any database that uses Streams.The Spill table: there is one (and only one) spill table, in any database that uses Streams. The name of the spill table is: SYS.STREAMS$_APPLY_SPILL_MSGS_PART As the name implies, the spill table can only be used by apply processes – not by capture processes.

30 Spill Concepts Where are the spilled messages? Tables where spilled messages are placed: Tables where spilled messages are placed: If a message is spilled because the total number of outstanding messages is too large, then that message is placed in the associated queue table.If a message is spilled because the total number of outstanding messages is too large, then that message is placed in the associated queue table. If a message has been held in memory too long, then that message is placed in the queue table.If a message has been held in memory too long, then that message is placed in the queue table. If the number of LCRs in a single transaction exceeds the LCR spill threshold, then all of those LCRs are placed in the spill table – SYS.STREAMS$_APPLY_SPILL_MSGS_PART.If the number of LCRs in a single transaction exceeds the LCR spill threshold, then all of those LCRs are placed in the spill table – SYS.STREAMS$_APPLY_SPILL_MSGS_PART.

31 Troubleshooting Two basic ways to troubleshoot Streams issues: Two basic ways to troubleshoot Streams issues: Query the internal Streams tables and views;Query the internal Streams tables and views; Search the alert log for messages. (This is particularly useful for capture process issues).Search the alert log for messages. (This is particularly useful for capture process issues).

32 Capture Process Tables and Views streams$_capture_process: lists all defined capture processes capture processes dba_capture: basic status, error info v$streams_capture: detailed status info dba_capture_parameters: configuration info

33 Propagate Process Tables and Views streams$_propagation_process: lists all defined propagate procs propagate procs dba_propagation: basic status, error info v$propagation_sender: detailed status v$propagation_receiver: detailed status

34 Apply Process Tables and Views streams$_apply_process: lists all defined apply processes processes dba_apply: basic status, error info v$streams_apply_reader: status of the apply reader v$streams_apply_server: status of apply server(s) v$streams_apply_coordinator: overall status, latency info latency info dba_apply_parameters: configuration info

35 Miscellaneous Tables and Views v$buffered_queues: view that displays the current and cumulative number of messages, for each buffered queue. sys.streams$_apply_spill_msgs_part: table that the apply process uses, to spill messages from large transactions to disk. system.logmnr_restart_ckpt$: table that holds capture process checkpoint information.

36 Types of Bottlenecks Two main types of bottlenecks: Type 1: Replication is completely stopped; i.e., no changes are being replicated at all. Type 2: Replication is running, but it is slower than the rate of DML; i.e., it is falling behind.

37 Capture Process Bottlenecks Capture Process Bottlenecks Capture process bottlenecks typically have to do with a capture process being unable to read necessary online or (especially) archive logs. Capture process bottlenecks typically have to do with a capture process being unable to read necessary online or (especially) archive logs. This will result in a Type 1 bottleneck – that is, no changes will be replicated at all (because no changes can be captured). This will result in a Type 1 bottleneck – that is, no changes will be replicated at all (because no changes can be captured). Almost all of the Type 1 bottlenecks that I have ever encountered have been due to archive log issues with capture processes. Almost all of the Type 1 bottlenecks that I have ever encountered have been due to archive log issues with capture processes.

38 Capture Process Bottlenecks Capture Checkpoints Capture Process Bottlenecks Capture Checkpoints The capture process writes its own checkpoint information to its data dictionary tables. The capture process writes its own checkpoint information to its data dictionary tables. This checkpoint info keeps track of the SCN values that the capture process has scanned. That information is used to calculate the required_checkpoint_scn value. This checkpoint info keeps track of the SCN values that the capture process has scanned. That information is used to calculate the required_checkpoint_scn value. Capture process checkpoint information is primarily stored in the following table: Capture process checkpoint information is primarily stored in the following table:SYSTEM.LOGMNR_RESTART_CKPT$

39 Capture Process Bottlenecks Capture Checkpoints (cont.) Capture Process Bottlenecks Capture Checkpoints (cont.) By default, the capture process writes checkpoints very frequently, and stores their data for a long time. On a very write-intensive system, this can cause the LOGMNR_RESTART_CKPT$ table to become extremely large, very quickly. By default, the capture process writes checkpoints very frequently, and stores their data for a long time. On a very write-intensive system, this can cause the LOGMNR_RESTART_CKPT$ table to become extremely large, very quickly. The amount of data stored in that table can be modified with these capture process parameters: The amount of data stored in that table can be modified with these capture process parameters: _checkpoint_frequency: number of megabytes captured which will trigger a checkpoint checkpoint_retention_time: number of days to retain checkpoint information

40 Capture Process Bottlenecks SCN checks Capture Process Bottlenecks SCN checks When a capture process starts up, it calculates its required_checkpoint_scn value. This value determines which redo logs must be scanned, before any new transactions can be captured. When a capture process starts up, it calculates its required_checkpoint_scn value. This value determines which redo logs must be scanned, before any new transactions can be captured. The capture process needs to do this check, in order to ensure that it does not miss any transactions that occurred while it was down. The capture process needs to do this check, in order to ensure that it does not miss any transactions that occurred while it was down. As a result, when a capture process starts up, the redo log that contains that SCN value – As a result, when a capture process starts up, the redo log that contains that SCN value – and every subsequent log – must be present in the log_archive_dest directory (or in online logs).

41 Capture Process Bottlenecks SCN checks (cont.) Capture Process Bottlenecks SCN checks (cont.) If any required redo logs are missing during a capture process restart, then the capture process will permanently hang during its startup. If any required redo logs are missing during a capture process restart, then the capture process will permanently hang during its startup. This issue will completely prevent the capture process from capturing any new changes. This issue will completely prevent the capture process from capturing any new changes. If the required redo log(s) cannot be restored, then the only way to resolve this situation is to completely rebuild the Streams environment. If the required redo log(s) cannot be restored, then the only way to resolve this situation is to completely rebuild the Streams environment. As a result, it is extremely important to keep track of redo logs that the capture process needs, before deleting any logs.

42 Capture Process Bottlenecks Which archive logs are needed? Capture Process Bottlenecks Which archive logs are needed? The following query can be used to determine the oldest archive log that will need to be read, during The following query can be used to determine the oldest archive log that will need to be read, during the next restart of a capture process. select a.sequence#, b.name from v$log_history a, v$archived_log b where a.first_change# <= (select required_checkpoint_scn from dba_capture (select required_checkpoint_scn from dba_capture where capture_name = ) where capture_name = ) and a.next_change# > (select required_checkpoint_scn from dba_capture (select required_checkpoint_scn from dba_capture where capture_name = ) where capture_name = ) and a.sequence# = b.sequence#(+); If no rows are returned from that query, then the SCN in question resides in an online log. If no rows are returned from that query, then the SCN in question resides in an online log.

43 Capture Process Bottlenecks Flow Control Capture Process Bottlenecks Flow Control In 10g, the capture process is configured with In 10g, the capture process is configured with automatic flow control. This feature prevents the automatic flow control. This feature prevents the capture process from spilling many messages. capture process from spilling many messages. If a large number of messages build up in the If a large number of messages build up in the capture processs buffered queue, then it will capture processs buffered queue, then it will pause capturing new messages, until some pause capturing new messages, until some messages are removed from the queue. messages are removed from the queue. A message cannot be removed from the capture A message cannot be removed from the capture processs queue until one of these items occurs: processs queue until one of these items occurs: The message is applied at the destination The message is applied at the destination The message is spilled at the destination The message is spilled at the destination

44 Propagate Process Bottlenecks I have heard of a few Type 2 propagate I have heard of a few Type 2 propagate bottlenecks in some extreme environments. bottlenecks in some extreme environments. Parameter changes to consider, to try to avoid Parameter changes to consider, to try to avoid propagate-related bottlenecks: propagate-related bottlenecks: Set the propagate parameter LATENCY to 0Set the propagate parameter LATENCY to 0 Set the propagate parameterSet the propagate parameter QUEUE_TO_QUEUE to TRUE (10.2 only) QUEUE_TO_QUEUE to TRUE (10.2 only) Set the init.ora parameter _job_queue_interval to 1Set the init.ora parameter _job_queue_interval to 1 Set the init.ora parameter job_queue_processes to 4 or moreSet the init.ora parameter job_queue_processes to 4 or more

45 Apply Process Bottlenecks Apply Process Bottlenecks Apply process bottlenecks usually deal with the apply process not keeping up with the rate of DML being executed on the source tables. I have generally only seen apply bottlenecks with batch DML (as opposed to OLTP DML). I have generally only seen apply bottlenecks with batch DML (as opposed to OLTP DML). The three main areas to be concerned with regarding apply process bottlenecks are: The three main areas to be concerned with regarding apply process bottlenecks are: - Commit frequency; - Number of apply servers; - Other Streams parameters.

46 Apply Process Bottlenecks Commit Frequency Apply Process Bottlenecks Commit Frequency The number of rows affected per transaction has The number of rows affected per transaction has an enormous impact on apply performance: an enormous impact on apply performance: The number of rows affected per transaction The number of rows affected per transaction should not be too large – due to spill, and to the should not be too large – due to spill, and to the one apply server per transaction restriction. one apply server per transaction restriction. The number of rows affected per transaction The number of rows affected per transaction should not be too small, either – due to DML should not be too small, either – due to DML degradation, and to apply transaction overhead. degradation, and to apply transaction overhead. From my experience, the optimal commit From my experience, the optimal commit frequency is around 500 rows. frequency is around 500 rows.

47 Apply Process Bottlenecks Number of Apply Servers Apply Process Bottlenecks Number of Apply Servers The number of parallel apply servers also has a large impact on the apply processs performance. The number of parallel apply servers also has a large impact on the apply processs performance. The number of parallel apply servers is set by the apply PARALLELISM parameter. The number of parallel apply servers is set by the apply PARALLELISM parameter. From my experience, for best throughput, the PARALLELISM parameter should be three times the number of CPUs on the host machine. From my experience, for best throughput, the PARALLELISM parameter should be three times the number of CPUs on the host machine. Note: setting PARALLELISM that high can potentially eat up all of the CPU cycles on the host, during intensive DML jobs. Note: setting PARALLELISM that high can potentially eat up all of the CPU cycles on the host, during intensive DML jobs.

48 Apply Process Bottlenecks Other Streams Parameters Apply Process Bottlenecks Other Streams Parameters Several other Streams-related parameters can Several other Streams-related parameters can also have an effect on apply performance: also have an effect on apply performance: Apply process parameter _HASH_TABLE_SIZE: Apply process parameter _HASH_TABLE_SIZE: set this relatively high (such as ), to set this relatively high (such as ), to minimize wait dependency bottlenecks. minimize wait dependency bottlenecks. Apply process parameter Apply process parameter TXN_LCR_SPILL_THRESHOLD: set to be a little TXN_LCR_SPILL_THRESHOLD: set to be a little bit higher than the maximum number of LCRs bit higher than the maximum number of LCRs generated per transaction, to prevent spill. generated per transaction, to prevent spill. Init.ora parameter aq_tm_processes: set to 1. Init.ora parameter aq_tm_processes: set to 1.

49 Apply Process Bottlenecks Turning Off Streams Apply Process Bottlenecks Turning Off Streams As mentioned previously, apply bottlenecks generally occur with batch-style DML. As mentioned previously, apply bottlenecks generally occur with batch-style DML. It is possible to configure a session so that Streams will not capture any DML from that session – and therefore, that sessions DML will not be propagated or applied. It is possible to configure a session so that Streams will not capture any DML from that session – and therefore, that sessions DML will not be propagated or applied. This technique allows batch-style DML to be run at each Streams database individually – rather than running the batch DML in one database, and then having that DML get replicated to all of the other databases. This technique allows batch-style DML to be run at each Streams database individually – rather than running the batch DML in one database, and then having that DML get replicated to all of the other databases.

50 Apply Process Bottlenecks Turning Off Streams (cont.) Apply Process Bottlenecks Turning Off Streams (cont.) Turning off Streams in a session is done by setting the sessions Streams tag to a non-NULL value. Here is an example of turning Streams off before a batch statement, and then back on: Turning off Streams in a session is done by setting the sessions Streams tag to a non-NULL value. Here is an example of turning Streams off before a batch statement, and then back on: exec dbms_streams.set_tag (tag => hextoraw(01)); commit; exec dbms_streams.set_tag (tag => NULL); Note: this technique will only work if the include_tagged_lcr parameter, in the Streams capture rules, is set to FALSE. (FALSE is the default value for that parameter in Streams rules.) Note: this technique will only work if the include_tagged_lcr parameter, in the Streams capture rules, is set to FALSE. (FALSE is the default value for that parameter in Streams rules.)

51 Apply Process Bottlenecks Apply Process Aborts! Apply Process Bottlenecks Apply Process Aborts! If there are any triggers on any replicated tables, If there are any triggers on any replicated tables, then the apply process can sometimes abort – and then the apply process can sometimes abort – and not restart – during periods of intense DML activity. not restart – during periods of intense DML activity. These aborts happen most frequently when These aborts happen most frequently when one particular apply server becomes one particular apply server becomes overloaded, and has to run too much overloaded, and has to run too much recursive SQL (triggers cause recursive SQL). recursive SQL (triggers cause recursive SQL). This issue is caused by Oracle bug # This issue is caused by Oracle bug # This bug is apparently fixed in ; and This bug is apparently fixed in ; and there are backport patches available for there are backport patches available for and and

52 Streams Spot Check Script The script called ssc.sql performs a spot check on a variety of Streams objects. Here is some sample output from it: ********** Streams report for dbprod on 03/28/ :33:54 ********** Capture process : STREAMS_CAPTURE_PROCESS CREATE LCR Propagate process: STREAMS_PROPAGATE_PROCESS ENABLED Apply reader : STREAMS_APPLY_PROCESS DEQUEUE MESSAGES Apply server : STREAMS_APPLY_PROCESS EXECUTE TRANSACTION Apply server : STREAMS_APPLY_PROCESS IDLE Apply coordinator: STREAMS_APPLY_PROCESS APPLYING Latency: 2 Buffered queue : STREAMS_CAPTURE_QUEUE Messages: 384 Spill msgs: 21 Buffered queue : STREAMS_APPLY_QUEUE Messages: 4721 Spill msgs: 0 Spill table rows : 0 Total streams pool memory : Free streams pool memory : *************** End of report ***************

53 Contact Information Brian Keating: Chris Lawson: This presentation, and the script mentioned in it, are available for download on the Oracle Magician website:


Download ppt "Diagnosing the Bottlenecks in your Streams Environment Brian Keating Chris Lawson May 17, 2007."

Similar presentations


Ads by Google