
Slide 2: Step 3 - Worklets

- Logical associations
  - Assemble the tasks that form a logical step into a worklet, e.g.:
    - Load Staging
    - Load Dimensions
    - Load Facts
- Parallel paths
  - Avoid the temptation of connecting all your sessions directly to the worklet Start task, even if they can run concurrently.
  - Know your system's glass ceiling and build accordingly.
  - Start with one path or two parallel paths and build up from there, while watching your system performance.
  - (Figure: don't start with every session wired in parallel; start with one or two parallel paths and add more if your hardware permits.)

Slide 3: Step 3 - Worklets

- Usability
  - Build your worklets as restartable units.
  - Nest worklets:
    - for groups of interdependent sessions
    - for sessions that must be executed in a prescribed order
    - for instance, a session doing a batch delete that must precede a session doing the batch insert
- Variables
  - A worklet-level variable can be set to the value of a workflow-level variable.
  - A workflow-level variable cannot be set to the value of a worklet-level variable.
  - (Figure: the Worklet task's Parameters tab assigns the value of a workflow variable to a worklet variable; see the sketch below.)
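For reference, a worklet variable can also be given a value directly in a parameter file, using the worklet heading format covered on the Parameter Files slides later in this deck. A minimal sketch, with hypothetical folder, workflow, worklet, and variable names:

    [myFolder.WF:wkf_nightly_load.WT:wklt_load_dims]
    $$LoadDate=11/26/2004 00:00:00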

Slide 4: Step 4 - Workflow

- Flow
  - Join your worklets to form the master workflow.
  - Link worklets in series when:
    - worklets already contain parallel paths
    - worklets must be executed in sequence
- Functionality
  - Add workflow-level functionality:
    - timed notifications
    - success email
    - suspend-on-error email
  - Modify default links between worklets:
    - set links to return false on error

Slide 5: Step 4 - Workflow

- Error handling: Suspend on Error
  - Stops the workflow if a task errors.
  - Only stops the tasks in the same execution path; other paths still run.
  - Can send an email on suspension. (The email task must be reusable; only one email is sent per suspension.)
  - Restart the workflow from the Workflow Monitor after the error is fixed.
  - The task that suspended the workflow is restarted from the top, unless you specified recovery for the failed session; in that case you can use 'Recover Workflow From Task'.
  - Works well with fully restartable mappings.

Slide 6: Step 4 - Workflow

- Error handling: using task properties and link conditions
  - The link condition between a failed task and the next task must evaluate to FALSE if you don't want the workflow to continue.
  - If a link evaluates to FALSE in a workflow with multiple execution branches, only the branch affected by the error is stopped.
  - (Figure: with a link condition testing the previous task's status for SUCCEEDED and 'Fail parent if this task fails' set, the top branch stops if the first worklet fails, and the workflow is marked as failed when the bottom branch completes. A link-condition sketch follows.)
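A minimal sketch of such a link condition, assuming the pre-defined Status task variable and a hypothetical worklet name:

    $wklt_load_staging.Status = SUCCEEDED

The link evaluates to FALSE when the worklet fails, so the downstream branch does not run.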

Slide 7: Step 4 - Workflow

- Error handling: using link conditions and Control tasks
  - Use a Control task if you want to stop or abort the entire workflow or an entire worklet.
  - As with session properties, setting the Control task to 'Fail Parent...' only marks the enclosing object as failed; it does not stop or abort it.
  - (Figure: a SUCCEEDED link condition feeds the normal path, while a FAILED condition feeds a Control task set to 'Stop top-level workflow'. This stops the execution of both branches, but sets the final status of the workflow to stopped, not failed.)

Slide 8: Step 5 - Triggers

- Are the sources available?
  - You will probably need to wait for some kind of triggering event before you can start the workflow.
- Event Wait
  - This task is the easiest way to implement triggers.
  - Trigger files must be directed to the Informatica server machine.
  - The 'Delete Filewatch File' property, when checked, removes the trigger file after the event fires.
  - If you want to archive trigger files instead of deleting them, clear the 'Delete Filewatch File' property and add a Command task (a sketch follows).
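A sketch of what the Command task's command might look like for the archive case, with hypothetical paths and file names:

    mv /data/triggers/orders.done /data/triggers/archive/orders.done.`date +%Y%m%d%H%M%S`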


Slide 10: Step 5 - Triggers

- Waiting for multiple source systems
  - One workflow branch for each system, each with an Event Wait task.
  - Re-join the branches at a Decision task or an Event Raise task (optional, but cleaner).
  - Make sure the task where all branches rejoin has the 'Treat links as' property set to 'AND' (the default).

Slide 11: Step 5 - Scheduler

- Availability
  - Reusable, as a special object.
  - Non-reusable, under Workflows -> Edit -> Scheduler.

Slide 12: Step 5 - Scheduler

- Basic properties (figure callouts):
  - Run the workflow when the server starts.
  - Default mode: not scheduled (run on demand).
  - Run continuously: run again as soon as the previous run completes.
  - Calendar-based run windows; the example runs every 15 minutes, starting 3/7/03 15:21 and ending 3/24/03.

Slide 13: Step 5 - Scheduler

- Custom repeats (figure callouts):
  - Repeat frequency.
  - Repeat on any day of the month, or several days a month; the example repeats every last Saturday of the month.
  - Repeat on any day of the week, or several days a week.

Slide 14: Step 5 - Scheduler

- Custom repeats
  - The scheduler cannot specify a time window within a day (e.g. run every day between 8 PM and 11 PM).
  - For this, use a link condition between the Start task and the next task, and schedule the workflow to run continuously or every (n) minutes.
  - (Figure: the link condition runs the rest of the workflow only if it started between 8:00 and 10:59 PM; a sketch follows.)
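A sketch of such a link condition for the 8:00-10:59 PM window, assuming the workflow expression language's SYSDATE and GET_DATE_PART are available in link conditions as they are in mapping expressions:

    GET_DATE_PART(SYSDATE, 'HH24') >= 20 AND GET_DATE_PART(SYSDATE, 'HH24') <= 22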

Slide 15: Step 6 - Testing

- One piece at a time
  - Verify the functionality of your worklets using the 'Start Task' command.
- Testing worklet tasks
  - To test and run individual tasks within a worklet, you can copy all the tasks in the worklet and paste them into a new, empty workflow.
  - These test workflows can also be used in production, if you have to rerun a worklet or part of a worklet while the main workflow is still running.

Slide 16: Step 6 - Testing

- Gantt chart view
  - Monitor this view and the server performance at the same time.
  - Identify workflow bottlenecks quickly (candidates for partitioning).
  - Monitor session performance when sessions run concurrently.


Slide 18: Session and Server Variables

Slide 19: Session & Server Variables

- Session variables
  - Some session properties can be parameterized with the following variable names:
    - $DBConnection_Name
    - $BadFile_Name
    - $InputFile_Name
    - $OutputFile_Name
    - $LookupFile_Name
    - $PMSessionLogFile
  - Use parameters in session properties to override:
    - source, target, lookup, or stored procedure connections
    - source, target, or lookup file names
    - reject file names
    - session log file names
  - Session parameters do not have default values: you must provide a value in a parameter file or the session will fail.

Slide 20: Session & Server Variables

- Using session variables in session properties
  1. Use parameter names in the session properties.
  2. Specify a parameter file name in the general properties.
  3. Add an entry for each parameter used in the session properties, e.g.:

       [ses_BrowserReport]
       $InputFile_dailylog_part1=daily_1.log
       $InputFile_dailylog_part2=daily_2.log

Slide 21: Session & Server Variables

- Server variables
  - Specify the default location of various folders on the Informatica server machine, such as:
    - root directory
    - session log directory
    - cache and temp directories
    - source, target, and lookup file directories
    - external procedures directory
  - Also provide default values for the following properties:
    - success or failure email user
    - session and workflow log count
    - session error threshold
  - These variables are set at the server level and cannot be overridden with a parameter file.

Slide 22: Session & Server Variables

- Using session and server variables in session components
  1. Select a session component, either non-reusable or reusable.
  2. Use either session or server parameters within the command.
  3. Add an entry for each session parameter used, e.g.:

       [ses_BrowserReport]
       $InputFile_dailylog_part1=daily_1.log
       $InputFile_dailylog_part2=daily_2.log
       $PMSessionLogFile=ses_BrowserReport.log

Slide 23: Parameter Files

- Workflow parameter files
  - Use to override workflow or worklet user-defined variables.
  - Can also contain values for parameters and variables used in sessions and mappings within the workflow.
  - The path to the parameter file can be set in the workflow properties or provided on the 'pmcmd' command line.
  - If you use both, the command-line argument takes precedence.

Slide 24: Parameter Files

- Format

      [Heading]
      parameterName=parameterValue

  - Names in headings are case-sensitive; parameter and variable names are not.
- Heading formats
  - Workflow: [folderName.WF:workflowName]
  - Worklet: [folderName.WF:workflowName.WT:workletName]
  - Nested worklet: [folderName.WF:workflowName.WT:workletName.WT:nestedWorkletName]
  - Session:
    - [folderName.WF:workflowName.ST:sessionName] (the workflow name is required if the session name is not unique within the folder)
    - [folderName.sessionName] (the folder name is required if the session name is not unique in the repository)
    - [sessionName]
- Values
  - Default date formats: mm/dd/yyyy hh24:mi:ss or mm/dd/yyyy.
  - Strings are not quoted.
  - Prefix mapplet variables with the mapplet name.
- Example (a sketch of a file combining workflow and session sections follows):

      [tradewind.WF:WKF_BrowserReport.ST:SES_loadFacts]
      $$lastLoadDate=10/2/2003 23:34:56
      $$Filter=California,Nevada
      Fact_Mapplet.$$maxValues=78694.2
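A sketch of a single parameter file feeding both the workflow and one of its sessions; the workflow variable and connection parameter are hypothetical, and the session section reuses the example above:

    [tradewind.WF:WKF_BrowserReport]
    $$LoadWindowDays=7

    [tradewind.WF:WKF_BrowserReport.ST:SES_loadFacts]
    $DBConnection_Target=DW_PROD
    $$lastLoadDate=10/2/2003 23:34:56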


Slide 26: Incremental Load

- Using mapping variables for incremental load
  - Speeds up the load by processing only the rows added or changed since the last load.
  - Requires a good knowledge of the source systems to figure out exactly what a new or changed row is.
  - You can use Informatica's mapping variables to implement incremental load parameters (a sketch follows).
  - For added safety, save the variables in parameter files. Informatica updates the variables in the repository upon completion of the load process; our process instead only updates the values in the parameter file if the data is valid (balanced). If it is not, we can rerun the load with the old parameter values.
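A minimal sketch of the technique, assuming a hypothetical mapping variable $$LastLoadDate and a hypothetical ORDERS source with an UPDATED_TS column. The source qualifier filter (Oracle-style TO_DATE shown here) limits the read to rows changed since the last run, and an Expression transformation advances the variable:

    Source qualifier filter:
      ORDERS.UPDATED_TS > TO_DATE('$$LastLoadDate', 'MM/DD/YYYY HH24:MI:SS')

    Expression transformation, variable port:
      SETMAXVARIABLE($$LastLoadDate, UPDATED_TS)

At the end of a successful run, the repository (or the parameter file, in the safer process above) holds the highest timestamp processed.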

Slide 27: Partitioning

Slide 28: Informatica Server Architecture

- Figure callouts:
  1. A session is started.
  2. The Load Manager finds a slot for the session.
  3. The Load Manager starts the DTM.
  4. The DTM fetches the session's mapping from the repository.
  5. The DTM creates and starts stage threads.

Slide 29: DTM Architecture

- Efficient, multi-threaded architecture
  - Data is processed in stages; each stage is buffered, and stage processes overlap (data is being transformed as more data is read, and written as more data is transformed).
  - At least one thread per reader, transformation, and writer stage.
  - Other threads control and monitor the overall process.
  - There is one DTM process for every running session (the process runs as pmdtm).
- User control
  - As a user, you have control over many performance aspects:
    - memory usage per session and, in some cases, per transformation
    - buffer block size (data is moved in memory in chunks)
    - disk allocations, for caches, indexes, and logs
    - server allocation, with PowerCenter

Slide 30: Partitioning Guidelines

- When to do it
  - After unit testing; the mapping should be free of errors and optimized.
  - Only mappings seen as bottlenecks during volume testing should be candidates for partitioning.
- How to do it
  - Look at the big picture:
    - what sessions are running concurrently with the session you want to partition
    - what other processes may be running at the same time as the partitioned session
  - A monster session you wish to partition should not be scheduled to run concurrently with other sessions.
  - Reserve some time on the server for your initial partitioning tests; if you have to compete with other processes, the test results may be skewed or meaningless.
  - Add one partition at a time:
    - monitor the system closely; look at RAM usage, CPU usage, and disk I/O
    - monitor the session closely; re-compute the total throughput at each step
    - each partition adds a new set of concurrent threads and a new set of memory caches
    - sooner or later you will hit your server's glass ceiling, at which point performance will degrade
  - Add partition points at transformations where you suspect there is a bottleneck; partition points redistribute the data among partitions and allow for process overlap.

Slide 31: Definitions

- Source pipeline
  - Data flowing from a source qualifier to transformations and targets.
  - There are two pipelines per Joiner transformation; the master source pipeline stops at the Joiner.
  - Pipelines are processed sequentially by the Informatica server.
- Partition point
  - Set at the transformation object level.
  - Defines the boundaries between DTM threads.
  - Default partition points are created for:
    - Source Qualifier or Normalizer transformations (reader threads)
    - target instances (writer threads)
    - Rank and unsorted Aggregator transformations
  - Data can be redistributed between partitions at each partition point.
- Pipeline stage
  - The area between partition points, where the partition threads operate.
  - Three default stages:
    - reader stage: reads the source data and brings it into the Source Qualifier
    - transformation stage: moves data from the Source Qualifier up to a target instance
    - writer stage: writes data to the target(s)
  - Adding a partition point creates one more transformation stage.
  - The processing in each stage can overlap, resulting in improved performance.

Slide 32: Flow Example

- 2 partitions, 5 stages, 10 threads (figure callouts):
  - The server opens one concurrent connection per partition, for relational sources and targets.
  - Data can be redistributed between partition threads at partition points, using these methods: pass-through, round-robin, hash keys, key/value range.


Slide 34: Partition Methods

- You can change the partitioning method at each partition point to redistribute data between stage threads more efficiently.
- All methods but pass-through come at the cost of some performance.
- Round-robin
  - Distributes data evenly between stage threads.
  - Use in the transformation stage, when reading from unevenly partitioned sources.
- Hash key
  - Keeps data belonging to the same group in the same partition, so the data is aggregated or sorted properly.
  - Use with Aggregator, Sorter, Joiner, and Rank transformations.
  - Hash auto keys: hash keys generated by the server engine, based on 'group by' and 'order by' ports in transformations.
  - Hash user keys: you define the ports to group by.

Slide 35: Partition Methods

- Key/value range
  - Define a key (one or more ports) and a range of values for each partition.
  - Use with relational sources or targets.
  - You can specify additional SQL filters for relational sources, or override the SQL entirely.
  - The Workflow Manager does not validate key ranges (missing or overlapping ranges).
- Pass-through
  - Use when you want to create a new pipeline stage without redistributing data.
  - If you want to set a partition point at an Aggregator with sorted input, pass-through is the only method available.
- DB target partitioning
  - Only available for DB2 targets.
  - Queries system tables and distributes output to the appropriate nodes.

Slide 36: Partitions and Caches

- Partitioned cache files
  - Each partitioned cache only holds the data needed to process that partition.
  - Caches are partitioned automatically for Aggregator and Rank transformations.
  - Joiner caches are partitioned if you set a partition point at the Joiner transformation.
    - When using a Joiner with sorted input and multiple partitions on both the master and detail sides, make sure all the data before the Joiner is kept in one partition to maintain the sort order, then use the hash auto-keys partition method at the Joiner.
    - To keep the data in one partition:
      - flat files: use a pass-through partition point at the source qualifier, with the flat file source connected to the first partition and dummy (empty) files connected to the other partitions
      - relational: use a key-range partition point at the source qualifier to bring the entire data set into the first partition
  - Lookup caches are partitioned if:
    - you set a hash auto-keys partition point at the Lookup transformation
    - you use only equality operators in the lookup condition
    - the database is set for case-sensitive comparisons
  - Sorter caches are not partitioned.

Slide 37: Limitations

- Partition points
  - You cannot delete default partition points at the reader or writer stages.
  - You cannot delete the default partition point at a Rank or unsorted Aggregator transformation unless:
    - there is only one partition for the session, or
    - there is a partition point upstream that uses hash keys
  - You cannot add a partition point at a Sequence Generator or an unconnected transformation.
  - A transformation can only receive input from one pipeline stage; you cannot add a partition point if it violates this rule.
- XML sources
  - You cannot partition a pipeline that contains an XML source.
- Joiners
  - You cannot partition the pipeline that contains the master source unless you create a partition point at the Joiner transformation.
- Mapping changes
  - After you partition a session, you could make changes to the underlying mapping that violate the partitioning rules above. These changes would not get validated in the Workflow Manager, and the session would fail.

Slide 38: Limitations

- Hash auto keys
  - Make sure the row grouping stays the same when one auto-keys partition point feeds data to several 'grouping' transformations, such as a Sorter followed by an Aggregator. If the grouping is different, you may not get the results you expect.
- External loaders
  - You cannot partition a session that feeds an external loader; the session may validate, but the server will fail it.
  - The exception is the Oracle external loader, under certain conditions.
  - One potential solution is to load the target data into a flat file, then use an external loader to push the data to the database.
    - On UNIX, the server pipes data through to the external loader as the output data is produced, and you would lose this advantage with this solution.
- Debugger
  - You cannot run a session with multiple partitions in the debugger.
- Resources
  - Partitioning can be a great help in speeding up sessions, but it can use up resources very quickly.
  - Review the session's performance in the production environment to make sure you are not hitting your system's glass ceiling.

Slide 39: Partitioning Demo

- Scenario
  - You have a mapping that reads data from web log files and aggregates values by user session in a daily table.
  - In addition, you produce a top-10 most active sessions report file.
  - You have three web servers, each dedicated to its own subject area:
    - data for one user session can be spread across several log files
    - log file sizes vary between servers
  - You have a persistent session ID in the logs.
  - (Figure: the mapping pipeline is log file reader -> filter out unwanted transactions -> sort input by session ID -> aggregate log data per session -> rank the top 10 most active sessions.)

Slide 40: Partition Demo - Strategy

- Define your strategy
  - Using partitions, you can process several log files concurrently, one log per partition.
  - Because the log files vary in size, you need to re-balance the data load across partitions.
  - To keep the Sorter and Aggregator working properly, you need to group the data by session ID across partitions. This way, data that belongs to the same user session is always processed in the same partition.
  - For the Rank transformation, you need all the data to be channeled through one partition, so it can extract the top 10 sessions from the entire data set.
- Resulting partition points (figure callouts):
  - Partition point 1: pass-through, to read each log file entirely; 3 partitions, each reading a separate log file.
  - Partition point 2: round-robin, to even out the load between partitions.
  - Partition point 3: hash auto keys; the server channels data based on session ID, the 'sort by' port.
  - Partition point 4: hash auto keys; this Rank does not use a 'group by' port, so all data is lumped into one default group and one partition.


Slide 42: Partition Demo - Implementation I

- Create one partition per source file
  1. Edit the session task properties in the Workflow Manager.
  2. Select the Mapping tab.
  3. Select the Partitions sub-tab.
  4. Select your source qualifier.
  5. Click 'Edit Partition Point'.
  6. Click 'Add' twice.
  - Note: a source qualifier partition point can only be 'Pass Through' for flat file sources, or 'Pass Through' and 'Key Range' for relational sources.

Slide 43: Partition Demo - Implementation II

- Specify the source files
  1. Select the Transformations sub-tab.
  2. Select your source qualifier.
  3. Type the file names.

Slide 44: Partition Demo - Implementation III

- Re-balance the data load
  1. Select the Partitions sub-tab.
  2. Select your filter.
  3. Click 'Add Partition Point'.
  4. Select 'Round Robin'.

Slide 45: Partition Demo - Implementation IV

- Re-group the data for the Sorter and Aggregator transformations
  1. Select your sorter.
  2. Click 'Add Partition Point'.
  3. Select 'Hash Auto Keys'.
  - Note: the Aggregator transformation has the 'Sorted Input' property set and therefore does not have a default partition point. Since we added a partition point at the Sorter, we don't need one at the Aggregator.

Slide 46: Partition Demo - Implementation V

- Ensure the default Rank transformation partition point is set correctly
  - Every Rank transformation gets a default partition point set to hash auto keys, and this is the behavior we want.

Slide 47: Partition Demo - Implementation VI

- Set the defaults for your Top 10 target file
  1. Select the Transformations sub-tab.
  2. Select your flat file target.
  - Note: when you write to a partitioned flat file target, data for each partition ends up in its own file. Click 'Merge Partitioned Files' to have the server merge those files into one.

Slide 48: Partition Demo - Implementation VII

- Set session performance parameters
  1. Select the Properties tab.
  2. Increase the total DTM buffer size if needed; it depends on the number of partitions and the number of sources and targets.
  3. Check the 'Collect Performance Data' box for a test run, to see how your partitioning strategy is performing.


Slide 50: Partition Demo - Test Run I

- Monitor session statistics through the Workflow Monitor (figure callouts):
  - Select the Transformation Statistics tab.
  - Number of input rows for each log file.
  - Number of output rows for the relational target table; the load is spread evenly across partitions.
  - Output rows sent to the flat file Top 10 target are confined to one partition.

Slide 51: Partition Demo - Test Run II

- Monitor session performance through the Workflow Monitor (figure callouts):
  - Select the Performance tab; it is only visible while the session is running and until you close the window. The numbers are saved in the '.perf' file.
  - Round-robin evens out the load at the Filter transformation.
  - In partition [1], the number of output rows at the Filter differs from the number of input rows at the Sorter: data was redistributed.
  - All output rows from the Aggregator end up in one partition to be ranked; a single group is created.

Slide 52: Partition Demo - Test Run III

- Examine pipeline stage thread performance in the session log (scroll down to the Run Info section)
  - Thread-by-thread performance, for each pipeline stage:
    - total run time
    - idle time
    - busy percentage
  - High idle time means a thread is waiting for data; look for a bottleneck upstream.

Slide 53: Performance & Tuning

Slide 54: Informatica Tuning 101

- Collect base performance data
  - Establish reference points for your particular system; your goal is to measure optimal I/O performance on your system.
  - Create pass-through mappings for each main source/target combination.
  - Make notes of the read and write throughput counters in the session statistics.
  - Time these sessions and compute MB/hour or GB/hour numbers (a worked example follows).
  - Do this for various combinations of file and relational sources and targets.
  - Try to have the system to yourself when you run your benchmarks.
- Collect performance data for your existing mappings, before tuning them
  - Collect read and write throughput data.
  - Collect MB/hour or GB/hour data.
- Identify and remove the bottlenecks in your mappings
  - Keep notes of what you do and how it affects performance.
  - Go after one problem at a time, and re-check performance after each change.
  - If a fix does not provide a speed improvement, revert to your previous configuration.
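A worked example with hypothetical numbers: a pass-through session that moves 6 GB of source data in 45 minutes gives a baseline of

    6 GB / 0.75 h = 8 GB/hour

for that source/target combination; tuned sessions can then be compared against this ceiling.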

Slide 55: Collecting Reference Data

- Use a pass-through mapping
  - a source definition
  - a source qualifier
  - a target definition
- No transformations
  - no transformation thread
  - best possible engine performance for this source and target combination

Slide 56: Identifying Bottlenecks

- Check the candidates in order (figure callouts):
  1. Writing to a slow target?
  2. Reading from a slow source?
  3. Transformation inefficiencies?
  4. Session inefficiencies?
  5. System not optimized?


Slide 58: Target Bottleneck

- Identifying a target bottleneck (figure): change the session's writer to a file writer; if the session runs significantly faster, the relational target is the bottleneck.

Slide 59: Target Bottleneck

- Common sources of problems
  - indexes or key constraints
  - database commit points too high or too low
- Common solutions
  - drop indexes and key constraints before loading, and rebuild them after loading
  - use bulk loading or external loaders when practical
  - experiment with the frequency of database commit points

Slide 60: Source Bottleneck

- Identifying a source bottleneck (figure: two alternative test setups).

Slide 61: Source Bottleneck

- Common sources of problems
  - inefficient SQL query
  - table partitioning that does not fit the query
- Common solutions
  - analyze the query issued by the Source Qualifier (it appears in the session log); most SQL tools let you view an execution plan for the query
  - consider database optimizer hints to make sure the correct indexes are used
  - consider indexing tables when you have 'order by' or 'group by' clauses
  - try database parallel queries, if supported
  - try partitioning the session, if appropriate
  - if you have table partitioning, make sure your query does not pull data across partition lines
  - if you have a query filter on non-indexed columns, try moving the filter out of the query and into a Filter transformation

Slide 62: Mapping Bottleneck

- Identifying a mapping bottleneck (figure callout: the relevant setting is under Properties -> Performance).

Slide 63: Mapping Bottleneck

- Common sources of problems
  - too many transformations
  - unused links between ports
  - too many input/output or output ports connected out of Aggregator, Rank, or Lookup transformations
  - unnecessary datatype conversions
- Common solutions
  - eliminate transformation errors
  - if several mappings read from the same source, try single-pass reading
  - optimize datatypes; use integers for comparisons
  - don't convert back and forth between datatypes
  - optimize lookups and lookup tables, using caches and indexed tables
  - put your filters early in the data flow, and use a simple filter condition
  - for Aggregators, use sorted input, integer 'group by' columns, and simplified expressions
  - if you use reusable Sequence Generators, increase the number of cached values
  - if you use the same logic in different data streams, apply it before the streams branch off
  - optimize expressions (see the sketch below):
    - isolate slow and complex expressions
    - reduce or simplify aggregate functions
    - use local variables to encapsulate repeated computations
    - integer computations are faster than character computations
    - use operators rather than the equivalent function; '||' is faster than CONCAT()
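A small expression sketch of the last point, with hypothetical port names (comments shown with the expression language's '--' syntax):

    -- slower: nested function calls
    CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

    -- faster: the equivalent operator
    FIRST_NAME || ' ' || LAST_NAME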

Slide 64: Session Bottleneck


Slide 66: Session Bottleneck

- Common sources of problems
  - inappropriate memory allocation settings
  - under-utilized or over-utilized resources (CPU and RAM)
  - error tracing override set to a high level
- Common solutions
  - experiment with the DTM buffer pool and buffer block size; a good starting point is 25 MB for the DTM buffer and 64 KB for the buffer block size (see the arithmetic sketch below)
  - make sure to keep data caches and indexes in memory; avoid paging to disk, but be aware of your RAM limits
  - run sessions in parallel, in parallel workflow execution paths, whenever possible; here also, be cautious not to hit your glass ceiling
  - if your mapping allows it, use partitioning
  - experiment with the database commit interval
  - turn off decimal arithmetic (it is off by default)
  - use the debugger rather than high error tracing, and reduce your tracing level for production runs; create a reusable session configuration object to store the tracing level and buffer block size
  - don't stage your data if you can avoid it; read directly from the original sources
  - look at the performance of your session components (run each separately)
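A quick arithmetic sketch of how the buffer block size bounds row movement (the 655-byte average row size is hypothetical):

    rows per block = 65,536 bytes / 655 bytes per row ≈ 100 rows

so wider rows or smaller blocks reduce how much data moves per operation.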

Slide 67: System Bottleneck

Slide 68: System Bottleneck

- Common sources of problems
  - slow network connections
  - overloaded or under-powered servers
  - slow disk performance
- Common solutions
  - get the best machines to run your server; better yet, use several servers against the same repository (PowerCenter only)
  - use multiple CPUs and session partitioning
  - make sure you have good network connections between the Informatica server and the database servers
  - locate the repository database on the Informatica server machine
  - shut down unneeded processes or network services on your servers
  - use 7-bit ASCII data movement (the default) if you don't need Unicode
  - evaluate hard disk performance; try locating sources and targets on different drives
  - use different drives for transformation caches, if they don't fit in memory
  - get as much RAM as you can for your servers

Slide 69: Using Statistics Counters

- View session statistics through the Workflow Monitor (figure callouts):
  - Select the Transformation Statistics tab.
  - Number of input rows for each source file.
  - Number of output rows for the relational target table; the load is spread evenly across partitions.
  - Output rows sent to the flat file Top 10 target are confined to one partition.
  - These numbers are available in real time; they are updated every few seconds.

Slide 70: Using Performance Counters

- Turning it on
  - In the Workflow Manager, edit the session:
    1. Select the Properties tab.
    2. Select the Performance section.
    3. Check the 'Collect Performance Data' box for a test run, to see how your partitioning strategy is performing.
  - Collecting performance data requires an additional 200 KB of memory per session.

Slide 71: Using Performance Counters

- Monitor session performance through the Workflow Monitor (figure callouts):
  - Select the Performance tab; it is only visible while the session is running and until you close the window. The numbers are saved in the '.perf' file.
  - Input rows and output rows counters for each transformation.
  - Error rows counters for each transformation.
  - Read from disk/cache and write to disk/cache counters for Ranks, Aggregators, and Joiners.

Slide 72: Using Performance Counters

- How to use the counters
  - Input & output rows: verify data integrity and how rows are redistributed at a partition point.
  - Error rows: did you expect this transformation to reject rows due to errors?
  - Read/write to disk: if these counters have non-zero values, your transformation is paging to disk.
  - Read/write to cache: use in conjunction with read/write to disk to estimate the cache size needed to hold everything in RAM.
  - New group key (Aggregator and Rank): the number of groups created. Does this number seem right? If not, your grouping condition may be wrong.
  - Old group key (Aggregator and Rank): the number of times a group was reused.
  - Rows in lookup cache (Lookup only): use to estimate the total cache size.


Slide 74: Using Run Info Counters

- Using the session log's Run Info section (scroll down in the log to find it)
  - Only available when the session is finished.
  - One entry per stage, per partition.
  - Counters:
    - run time: total run time for the thread
    - idle time: total time the thread spent doing nothing (included in the total run time)
    - busy percentage: a function of the two counters above (see the formula below)
  - Replaces the v5 buffer efficiency counters.
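Presumably the busy percentage is derived from the two counters as:

    busy % = 100 * (run time - idle time) / run time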

Slide 75: Using Run Info Counters

- Run Info busy percentage
  - You need to compare the values for each stage to properly evaluate where the bottleneck may be.
  - Look for a high value (busy) that stands out; this indicates a problem area.
  - High values across the board are indicators of an optimized session.
  - (Figure: reader/transform/writer busy percentages, with bottlenecks in red; a single high percentage next to low ones marks the bottleneck stage.)

Slide 76: Review Quiz

1. What is a benefit of buffered processing stages?
   a) Safety net against network errors
   b) Lower memory requirements
   c) Overlapping data processing
2. How do you identify a target bottleneck?
   a) By changing the output of the session to point to a flat file instead of a relational target
   b) By reading the Run Info section of the session log and looking for a low busy percentage at the writer stage
   c) By replacing the mapping with a pass-through mapping connected to the same target
3. Is the 'Collect Performance Data' option enabled by default?
   a) No, never
   b) Yes, always
   c) No, unless you run a debugging session

Slide 77: Review Quiz

4. You have shared session memory set to 25 MB and a buffer block size set to 64 KB. How many rows of data can the server move to memory in a single operation?
   a) 40,000 rows if the average row size is 655 bytes
   b) 100 rows if the average row size is 655 bytes
   c) 2,500 rows if the average row size is 64 KB
5. True or false: the Aggregator transformation's 'Write To Cache' counter gives the number of rows written to the disk cache.
   a) TRUE
   b) FALSE

Slide 78: Command Line Utilities

Slide 79: Overview

- pmcmd
  - Communicates with the Informatica server:
    - use with external scheduler tools or server scripts
    - use for remote administration when the Workflow Manager or Monitor GUI is not accessible
  - Located in the server install directory:
    - on Windows, in the 'bin' folder
    - on Unix, at the parent level
- pmrep & pmrepagent
  - Communicate with the Repository server:
    - use to back up the repository
    - use to change database connection parameters or server variables ($PM...)
    - use to perform security-related tasks
  - Located in the repository server install directory:
    - on Windows, in the 'bin' folder
    - on Unix, at the parent level

Slide 80: Working with pmcmd & pmrep (I)

- Two modes
  - Command line
    - Pass the entire command and parameters to the utility.
    - Use when writing server scripts that automate server or repository functions.
    - Connection parameters need flags (-u, -p, -s, ...); the main command (here, startworkflow) takes its main parameter without a flag. Example:

        >>pmcmd startworkflow -u User -p Password -s InfaServer:4001 wkf_LoadFactTables

  - Interactive
    - Maintains a connection to the server or repository until you type exit.
    - Use to enter a series of commands when operating the server or repository remotely.
    - Just type the utility name at the console to start interactive mode; type a command name without parameters and the utility prompts for them (not available for all commands). Example:

        >>pmcmd
        >>
        >>Informatica(TM) PMCMD 7.1 (1119)
        >>Copyright © Informatica Corporation 1994-2004
        >>All Rights Reserved
        >>
        >>Invoked at Fri Apr 25 13:14:23 2003
        >>
        >>pmcmd> connect
        >>username: User
        >>password:
        >>server address: InfaServer:4001


Slide 82: Working with pmcmd & pmrep (II)

- Getting help
  - Command line:
    - type pmcmd help | more to get a paged list of all pmcmd (or pmrep) commands and arguments
    - type pmcmd help followed by a command name to get help on a specific command
  - Interactive:
    - type help, or help followed by a command name, at the pmcmd or pmrep prompt
  - Example (getting help on the repository server's backup command, non-interactive mode):

        >>pmrep help backup
        >>backup
        >>-o
        >>-f (override existing output file)
        >>help completed successfully

- Terminating an interactive session
  - Type exit at the pmcmd or pmrep prompt.
  - You can also type quit at the pmcmd prompt.

Slide 83: Using pmcmd (I)

- Running workflows and tasks
  - Commands for starting, stopping, and aborting tasks and workflows (you can start a process in wait or nowait mode, and you can also specify a parameter file):
    - starttask, startworkflow
    - stoptask, stopworkflow
    - aborttask, abortworkflow
    - scheduleworkflow, unscheduleworkflow
  - Commands to resume a workflow or worklet:
    - resumeworkflow, resumeworklet
  - Commands to wait for a process to finish (the utility returns control to the user when the given process terminates):
    - waittask, waitworkflow
- Example

      >>pmcmd starttask -u joe -p 1234 -s InfaServer:4001 -f prodFolder -w wkf_loadFacts ses_LoadTrans

  - Specify the folder (-f) and workflow (-w) hosting the task.
  - Task names are fully qualified; if the task is within a worklet, use the syntax workletname.taskname.

Slide 84: Using pmcmd (II)

- Server administration
  - Interactive mode: connect, disconnect (you need a user name, a password, and a server address and port to connect).
  - Both modes: pingserver, shutdownserver, version.
- Gathering information
  - About the server:
    - getserverdetails: server and workflow status; you can get info about all, running, or scheduled workflows
    - getserverproperties: server name, type and version, and repository name
  - About the repository: getsessionstatistics, gettaskdetails, getworkflowdetails, getrunningsessionsdetails.
- Interactive mode only
  - Setting defaults and properties: setfolder, unsetfolder, setwait, setnowait (set a default folder or a run mode valid for the entire session).

Slide 85: Using pmcmd (III)

- Return codes
  - In command-line mode, 'pmcmd' returns a value indicating the status of the last command.
    - Zero: the command was successful. If starting a workflow or task in wait mode, zero indicates successful completion of the process; in no-wait mode, zero indicates that the server successfully received and processed the request.
    - Non-zero: an error status, such as an invalid user name or password, or wrong command parameters. See your Informatica documentation for the list of the latest return codes.
  - Catching return codes (a Unix shell sketch follows)
    - Within a DOS batch file, use the ERRORLEVEL variable. Since IF ERRORLEVEL n is true for any exit code >= n, check for exact values starting with the highest one, as in:

        pmcmd pingserver Infa61:4001
        IF ERRORLEVEL 1 GOTO error
        IF ERRORLEVEL 0 GOTO success

    - Within a Perl script, you can use the $? variable shifted right by 8 bits, as in:

        system('pmcmd pingserver Infa61:4001');
        $returnVal = $? >> 8;
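A Unix shell equivalent is a reasonable sketch (same exit-code semantics; the server address reuses the example above):

    pmcmd pingserver Infa61:4001
    if [ $? -ne 0 ]; then
        echo "pmcmd pingserver failed" >&2
        exit 1
    fi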

Slide 86: PMREP Commands

- Change management commands
  - Deployment group functions, to create, add to, clear, deploy, or delete a group (groups can be either static or dynamic):
    - CreateDeploymentGroup, AddToDeploymentGroup, ClearDeploymentGroup, DeployDeploymentGroup, DeleteDeploymentGroup
  - Label functions, to create, apply, or delete a label:
    - CreateLabel, ApplyLabel, DeleteLabel
  - Versioning:
    - Checkin, UndoCheckout, FindCheckout
  - DeployFolder: folder copy.
  - ExecuteQuery: executes an existing query.
  - Validate

Slide 87: PMREP Commands

- Persistent input files
  - You can create reusable input files for some repository and versioning commands; these files describe the objects that will be affected by the operations.
  - Input files can be created manually or by using repository commands.
  - Operations that support input files:
    - AddToDeploymentGroup
    - ApplyLabel
    - Validate
    - ObjectExport
    - ListObjectDependencies
  - Operations that can create a persistent input file:
    - ExecuteQuery
    - ListObjectDependencies
- Deployment control files
  - XML files written to specify deployment options such as 'Replace Folder' or 'Retain Mapping Persistent Values'.
  - Used with:
    - DeployFolder
    - DeployDeploymentGroup

Slide 88: PMREP Commands

- Repository commands
  - ListObjects: lists repository objects of a given type in a given folder.
  - ListTablesBySess: lists source and target table instance names for a given session.
  - ListObjectDependencies: lists objects dependent on another object (or objects, if you use an input file) of a given type in a given folder.
  - ObjectExport, ObjectImport: export and import repository objects as XML files.
  - UpdateSeqGenVals: changes values for non-reusable sequence generators in a mapping; for instance, you can reset dimension key generator start values to 1 when reloading data from scratch (a second initial load) in a data mart.
  - UpdateSrcPrefix, UpdateTargPrefix: change the value of a source or target owner name for a given table in a given session.


Slide 90: PMREPAGENT Commands

- Repository commands
  - Backup: back up a repository to a file; the repository must be stopped.
  - Create: create a new repository in a pre-configured database.
  - Delete: delete repository tables from a database.
  - Restore: restore a repository from a backup file to an empty database.
  - Upgrade: upgrade an existing repository to the latest version.

Slide 91: Repository MX Views

Slide 92: Repository Views

- Summary
  - Provided for reporting on the repository:
    - historical load performance
    - documentation
    - dependencies
  - These views take most of the complexity out of the production repository tables.
  - Use them whenever possible, rather than going against the production repository tables.
  - Never modify the production repository tables themselves.

Slide 93: Repository Views

- Accessing repository MX views
  - They cannot be accessed by Informatica directly:
    - direct access to these tables is prohibited by Informatica
    - you cannot import these table definitions in the Source Designer either
    - this can, however, be circumvented:
      - create a copy of the views under different names in the production repository (potentially dangerous)
      - create a copy of the views in a different database (safer but slower): use different view names, and create them with an account that has read permission on the production repository views
  - They can be queried by other database tools:
    - SQL*Plus for Oracle, or SQL Query Analyzer for MS SQL Server
    - Perl scripts using the DBI/DBD modules (see the sketch below)
    - PHP scripts
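A minimal Perl DBI sketch for querying one of the views (the connection string, read-only account, and folder name are hypothetical):

    use strict;
    use DBI;

    # Connect to the repository database (Oracle in this sketch).
    my $dbh = DBI->connect('dbi:Oracle:REPDB', 'mx_reader', 'secret',
                           { RaiseError => 1 });

    # List the mappings reading from each source in one folder.
    my $sth = $dbh->prepare(
        'select source_name, mapping_name from rep_src_mapping where subject_area = ?');
    $sth->execute('Tradewind');
    while (my ($source, $mapping) = $sth->fetchrow_array) {
        print "$source -> $mapping\n";
    }
    $dbh->disconnect;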

Slide 94: Repository Views

- All views at a glance
  - REP_DATABASE_DEFS: a list of source subfolders for each folder.
  - REP_SCHEMA: list of folders and version info.
  - REP_SESSION_CNXS: info about database connections in reusable sessions.
  - REP_SESSION_INSTANCES: info about session instances in workflows or worklets.
  - REP_SRC_FILE_FLDS: detailed info about flat file, ISAM & XML source fields.
  - REP_SRC_FILES: detailed info about flat file, ISAM & XML source definitions.
  - REP_SRC_FLD_MAP: info about data transformations at the field level for relational sources.
  - REP_SRC_MAPPING: sources for each mapping.
  - REP_SRC_TBL_FLDS: detailed info about relational source fields.
  - REP_SRC_TBLS: info about relational sources for each folder.
  - REP_TARG_FLD_MAP: info about data transformations at the field level for relational targets.
  - REP_TARG_MAPPING: targets for each mapping.
  - REP_TARG_TBL_COLS: detailed info about relational target columns.
  - REP_TARG_TBL_JOINS: primary/foreign key relationships between targets, per folder.
  - REP_TARG_TBLS: info about relational targets, per folder.
  - REP_TBL_MAPPING: list of sources & targets per mapping, with filters, group-bys, and SQL overrides.
  - REP_WORKFLOWS: limited info about workflows.
  - REP_SESS_LOG: historical data about session runs.
  - REP_SESS_TBL_LOG: historical load info for targets.
  - REP_FLD_MAPPING: describes the data path from source field to target field.

Slide 95: Repository Views - Dependencies

- Usage
  - You are changing a source or target table and need to know what mappings are affected.
  - You are changing a database connection and need to know which sessions are affected.
- Useful views
  - Source dependencies: REP_SRC_MAPPING (SOURCE_NAME, SRC_BUSNAME, SUBJECT_AREA, MAPPING_NAME, VERSION_ID, VERSION_NAME, MAPPING_COMMENT, MAPPING_LAST_SAVED).
  - Target dependencies: REP_TARG_MAPPING (TARGET_NAME, TARG_BUSNAME, SUBJECT_AREA, MAPPING_NAME, VERSION_ID, VERSION_NAME, SOURCE_FILTER, CONDITIONAL_LOAD, GROUP_BY_CLAUSE, SQL_OVERRIDE, DESCRIPTION, MAPPING_COMMENT, MAPPING_LAST_SAVED). This view also explains how the data is transformed from the source to the target.
  - Connection dependencies: REP_SESSION_INSTANCES (SUBJECT_AREA, WORKFLOW_NAME, SESSION_INSTANCE_NAME, IS_TARGET, VERSION_ID, CONNECTION_NAME, CONNECTION_ID).
- Notes
  - 'Subject area' is the name used for a folder in the repository tables and views.
  - VERSION_ID and VERSION_NAME refer to folder versioning.

Slide 96: Repository Views - Dependencies

- Example queries
  - Display all mappings in the Tradewind folder that have a source or a target named 'customers':

        select distinct mapping_name from rep_src_mapping
        where source_name = 'customers' and subject_area = 'Tradewind'
        union
        select distinct mapping_name from rep_targ_mapping
        where target_name like 'customers' and subject_area = 'Tradewind'

  - Display all workflows and worklets using a server target connection named 'Target_DB':

        select distinct workflow_name from rep_session_instances
        where connection_name = 'Target_DB' and is_target = 1


Slide 98: Repository Views - Session Performance

- Usage
  - Run a report on historical load performance for given targets.
  - Run a post-load check on warnings, errors, and rejected rows for all sessions within a folder. A post-load process can access the session logs and reject files via the BAD_FILE_LOCATION and SESSION_LOG_FILE columns.
- Useful views
  - Target performance: REP_SESS_TBL_LOG (SUBJECT_AREA, SESSION_NAME, SESSION_INSTANCE, TABLE_NAME, TABLE_BUSNAME, TABLE_INSTANCE_NAME, SUCCESSFUL_ROWS, FAILED_ROWS, LAST_ERROR, LAST_ERROR_CODE, START_TIME, END_TIME, BAD_FILE_LOCATION, SESSION_TIMESTAMP). START_TIME and END_TIME are the start and end times of the writer stage.
  - Session performance: REP_SESS_LOG (SUBJECT_AREA, SESSION_NAME, SESSION_INSTANCE, SUCCESSFUL_ROWS, FAILED_ROWS, FIRST_ERROR_CODE, FIRST_ERROR_MSG, LAST_ERROR_CODE, LAST_ERROR, ACTUAL_START, SESSION_TIMESTAMP, BAD_FILE_LOCATION, SESSION_LOG_FILE). ACTUAL_START is the time when the server received the start-session request; both the first and last error messages are available.

Slide 99: Repository Views - Session Performance

- Example queries
  - Display historical load data and elapsed load times for a target called 'T_Orders' in the folder 'Tradewind'. The method used to compute the elapsed time from start_time and end_time is database dependent:

        select successful_rows, failed_rows, start_time, end_time
        from rep_sess_tbl_log
        where table_name = 'T_Orders'
        and subject_area = 'Tradewind'
        order by session_timestamp desc

  - Display a post-load report showing sessions having error or warning messages. The load start time is taken from the repository production table opb_wflow_run; this query assumes there is only one workflow called 'DailyDatawarehouseLoad' in the repository:

        select session_instance_name, successful_rows, failed_rows, first_error_msg
        from rep_sess_log
        where subject_area = 'TradeWind'
        and first_error_msg != 'No errors encountered.'
        and session_timestamp >= (select max(start_time) from opb_wflow_run
                                  where workflow_name = 'DailyDatawarehouseLoad')

Slide 100: Repository Views - Documentation

- Usage
  - Document the schema (sources and targets with their respective fields) for a given folder or the entire repository.
- Useful views
  - REP_SRC_TBLS (TABLE_NAME, TABLE_BUSNAME, TABLE_ID, SUBJECT_AREA, DATABASE_TYPE, DATABASE_NAME, SCHEMA_NAME, FIRST_FIELD_ID, SOURCE_DESCRIPTION, VERSION_ID, VERSION_NAME, LAST_SAVED)
  - REP_TARG_TBLS (SUBJECT_AREA, TABLE_NAME, BUSNAME, VERSION_ID, VERSION_NAME, DESCRIPTION, FIRST_COLUMN_ID, TABLE_CONSTRAINT, CREATE_OPTIONS, FIRST_INDEX_ID, LAST_SAVED)
  - REP_SRC_TBL_FLDS (COLUMN_NAME, COLUMN_BUSNAME, COLUMN_ID, SUBJECT_AREA, TABLE_ID, TABLE_NAME, TABLE_BUSNAME, COLUMN_NUMBER, COLUMN_DESCRIPTION, KEY_TYPE, SOURCE_TYPE, DATA_PRECISION, DATA_SCALE, NEXT_COLUMN, VERSION_ID, VERSION_NAME)
  - REP_TARG_TBL_COLS (SUBJECT_AREA, TABLE_NAME, TABLE_BUSNAME, COLUMN_NAME, COLUMN_BUSNAME, COLUMN_NUMBER, COLUMN_ID, VERSION_ID, VERSION_NAME, DESCRIPTION, COLUMN_KEYTYPE, DATA_TYPE, DATA_TYPE_GROUP, DATA_PRECISION, DATA_SCALE, NEXT_COLUMN_ID, IS_NULLABLE, SOURCE_COLUMN_ID)

Slide 101: Repository Views - Documentation

- Usage
  - Document the path from each source field to each target column, with the data transformations in between.
  - For each mapping, document the source and target objects, including SQL overrides and group conditions.
- Useful views
  - Source & target level: REP_TBL_MAPPING (SOURCE_NAME, SRC_BUSNAME, TARGET_NAME, TARG_BUSNAME, SUBJECT_AREA, MAPPING_NAME, VERSION_ID, VERSION_NAME, SOURCE_FILTER, CONDITIONAL_LOAD, GROUP_BY_CLAUSE, MAPPING_COMMENT, MAPPING_LAST_SAVED, SQL_OVERRIDE, DESCRIPTION). SOURCE_FILTER and SQL_OVERRIDE come from Source Qualifier properties, CONDITIONAL_LOAD from Filter transformations, and GROUP_BY_CLAUSE from Aggregator transformations.
  - Field level: REP_FLD_MAPPING (SOURCE_FIELD_NAME, SRC_FLD_BUSNAME, SOURCE_NAME, SRC_BUSNAME, TARGET_COLUMN_NAME, TARG_COL_BUSNAME, SUBJECT_AREA, MAPPING_NAME, VERSION_ID, VERSION_NAME, TRANS_EXPRESSION, MAPPING_COMMENT, MAPPING_LAST_SAVED, USER_COMMENT, DBA_COMMENT). TRANS_EXPRESSION is a digest of all the transformations that occur between the source field and the target column; it is sometimes cryptic and hard to read.

Slide 102: Repository Views - Documentation

- Example queries
  - Display the source schema (all source definitions and field properties) for the folder 'TradeWind'. The output is sorted by column number to keep it in sync with the field order of each source definition:

        select table_name, column_name, source_type, data_precision, data_scale
        from rep_src_tbl_flds
        where subject_area = 'TradeWind'
        and version_name = '010000'
        and table_name in (select table_name from rep_src_tbls
                           where subject_area = 'TradeWind')
        order by table_name, column_number

  - Display the path of data from source field to target column, each with its concatenated transformation expression, for the mapping 'OrdersTimeMetric' in the folder 'TradeWind':

        select source_field_name, target_column_name, trans_expression
        from rep_fld_mapping
        where mapping_name = 'OrdersTimeMetric'
        and subject_area = 'Tradewind'

Slide 103: Repository Views - Documentation

- Sample output (source field -> target column -> trans_expression; ':SD.' is the prefix for a source definition, and note that the RequiredDate and ShippedDate source fields both feed the OnTime_Orders column):

      ShipCountry  -> Country        :SD.Orders.ShipCountry
      RequiredDate -> OnTime_Orders  SUM(IIF(DATE_COMPARE(iif(isnull(:SD.Orders.RequiredDate), :SD.Orders.ShippedDate, :SD.Orders.RequiredDate), :SD.Orders.ShippedDate) >= 0, 1, 0))
      ShippedDate  -> OnTime_Orders  SUM(IIF(DATE_COMPARE(iif(isnull(:SD.Orders.RequiredDate), :SD.Orders.ShippedDate, :SD.Orders.RequiredDate), :SD.Orders.ShippedDate) >= 0, 1, 0))
      OrderID      -> Late_Orders    COUNT(:SD.Orders.OrderID) - SUM(IIF(DATE_COMPARE(iif(isnull(:SD.Orders.RequiredDate), :SD.Orders.ShippedDate, :SD.Orders.RequiredDate), :SD.Orders.ShippedDate) >= 0, 1, 0))

