Agenda: ODI performance; ODI scheduling; ODI deployment/release
Uli Bethke: Dublin-based. Blog: www.bi-q.ie. Working with ODI since 2007. Reviewer of two ODI books. ODI articles on OTN. Deputy chair of the OUG BI SIG (next event: 11th June). ODI advanced trainer.
ODI performance: ODI is a metadata-driven (SQL) code generator that uses code templates (Knowledge Modules). A Java agent communicates with the source systems, the target systems and the repository over the network and moves data between them.
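A minimal sketch of the code-generator idea, assuming a made-up template syntax: Python's `string.Template` fills a generic SQL pattern with the metadata of one interface, which is roughly what a Knowledge Module does. Real KMs use ODI's own substitution API; the template, the placeholder names and the metadata dictionary below are hypothetical.

```python
from string import Template

# Hypothetical "knowledge module" step: a generic SQL pattern with placeholders.
# (Real ODI KMs use ODI's substitution API, not Python's string.Template.)
INSERT_TEMPLATE = Template(
    "INSERT INTO $target_table ($target_columns)\n"
    "SELECT $source_columns\n"
    "FROM $source_table"
)

def generate_sql(mapping):
    """Expand the template from hypothetical interface metadata."""
    targets = [target for target, _ in mapping["columns"]]
    sources = [source for _, source in mapping["columns"]]
    return INSERT_TEMPLATE.substitute(
        target_table=mapping["target"],
        target_columns=", ".join(targets),
        source_table=mapping["source"],
        source_columns=", ".join(sources),
    )

# Hypothetical metadata for one interface: (target column, source expression).
interface = {
    "source": "SRC_CUSTOMER",
    "target": "DIM_CUSTOMER",
    "columns": [("CUST_ID", "ID"), ("CUST_NAME", "UPPER(NAME)")],
}

print(generate_sql(interface))
```

Changing only the metadata (not the template) produces the code for another interface, which is why the quality of the generated SQL, and therefore performance, is decided largely in the KM.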
SQL (> 80%): more than 80% of ODI performance issues are SQL issues, so SQL is the main ODI skill. Perfect your SQL: advanced SQL, analytic functions. Know your database(s) inside out, in particular the target. Understand, write and modify Knowledge Modules.
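As an example of the analytic SQL meant here, the sketch below de-duplicates a staging table with `ROW_NUMBER()` in a single set-based statement. It uses Python's built-in `sqlite3` module only so that it runs anywhere (SQLite supports window functions from version 3.25); the table and column names are hypothetical, and the same pattern applies on an Oracle target.

```python
import sqlite3

# In-memory SQLite stands in for the target database (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (cust_id INTEGER, name TEXT, loaded_at INTEGER)")
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?, ?)",
    [(1, "Anna", 1), (1, "Anna B.", 2), (2, "Liam", 1)],
)

# Keep only the latest row per customer with an analytic function,
# instead of looping over the rows in a KM or procedure.
LATEST_PER_KEY = """
SELECT cust_id, name
FROM (
    SELECT cust_id, name,
           ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY loaded_at DESC) AS rn
    FROM stg_customer
)
WHERE rn = 1
ORDER BY cust_id
"""

print(conn.execute(LATEST_PER_KEY).fetchall())  # [(1, 'Anna B.'), (2, 'Liam')]
```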
Agent: lightweight, Java-based application, tied to the host OS. Generates code based on ODI metadata. Communicates with source, target and repository. JDBC data transport. XML. Jetty. Interpreters: Jython, JBS, JavaScript, Groovy. HSQLDB in-memory database. Scheduler. Sizing.
Agent on the target: fewest round trips over the network (JDBC, XML); one target database server only (DW). Agent on another server: ODBC drivers; JEE agent on WebLogic; no support for the target OS; resources on the target; DBA.
Interfaces: No!! KMs that use row-by-row processing. Use ODI functions rather than DB functions. Don't overuse CKMs (especially for large data volumes). Temporary indexes (I$). Gather statistics (C$, I$ and target tables, when applicable). Rule of thumb: use loader KMs or DB link KMs rather than JDBC KMs.
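The difference between row-by-row and set-based processing can be sketched with Python's built-in `sqlite3` module as a local stand-in for a target database. This is not ODI code: in a real setup the extra cost of the row-by-row variant comes mostly from per-row round trips through the agent, which an in-process database cannot show. Table names are hypothetical, and SQLite's `ANALYZE` is used only as a rough analogue of gathering statistics.

```python
import sqlite3

def make_db(n_rows):
    # In-memory SQLite stands in for a target database (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE tgt (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO src VALUES (?, ?)",
                     [(i, i * 1.5) for i in range(n_rows)])
    conn.commit()
    return conn

def load_row_by_row(conn):
    # Anti-pattern: one INSERT per row. Through a JDBC-based KM each row
    # (or small batch) would also cost a network round trip via the agent.
    for row in conn.execute("SELECT id, amount FROM src").fetchall():
        conn.execute("INSERT INTO tgt (id, amount) VALUES (?, ?)", row)
    conn.commit()

def load_set_based(conn):
    # Preferred: one set-based statement executed inside the database.
    conn.execute("INSERT INTO tgt (id, amount) SELECT id, amount FROM src")
    conn.commit()
    # Rough analogue of gathering optimizer statistics on the loaded table
    # (on Oracle this would be done with DBMS_STATS).
    conn.execute("ANALYZE tgt")

row_db, set_db = make_db(10_000), make_db(10_000)
load_row_by_row(row_db)
load_set_based(set_db)
check = "SELECT COUNT(*), SUM(amount) FROM tgt"
print(row_db.execute(check).fetchone() == set_db.execute(check).fetchone())  # True
```

Both variants load the same data; only the number of statements (and, in a client/server setup, round trips) differs.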
Source/target: keep schemas on the same database server, and define them as physical schemas of one data server rather than as separate data servers. Have sources physically close to the target. Minimize the impact on the source. Chunking.
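One common way to chunk an extract is to split it by key range, so that each chunk is a separate, smaller query against the source. A minimal sketch, assuming a numeric key that is roughly evenly distributed; the column name and bounds are hypothetical, and building SQL with string formatting is only acceptable here because the bounds are trusted integers.

```python
def key_range_chunks(min_key, max_key, chunk_size):
    """Split [min_key, max_key] into contiguous, non-overlapping key ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_key
    while start <= max_key:
        end = min(start + chunk_size - 1, max_key)
        yield start, end
        start = end + 1

def chunk_predicates(column, min_key, max_key, chunk_size):
    # One WHERE clause per chunk; chunks can be extracted one after another
    # (to limit the load on the source) or in parallel.
    return [f"{column} BETWEEN {lo} AND {hi}"
            for lo, hi in key_range_chunks(min_key, max_key, chunk_size)]

for predicate in chunk_predicates("ORDER_ID", 1, 25, 10):
    print(predicate)
```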
CRITICAL PATH. Network paths and path durations:
B > E > H: 6 + 2 + 11 = 19
B > D > F: 6 + 4 + 14 = 24
B > D > G: 6 + 4 + 10 = 20
A > C > G: 9 + 8 + 10 = 27 (critical path)
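The critical path is the longest path through the dependency network: no schedule can finish before its slowest chain does. A minimal Python sketch, assuming the node durations implied by the path sums above (A=9, B=6, C=8, D=4, E=2, F=14, G=10, H=11) and edges read off the listed paths; it uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Durations and dependencies reconstructed from the listed paths.
durations = {"A": 9, "B": 6, "C": 8, "D": 4, "E": 2, "F": 14, "G": 10, "H": 11}
predecessors = {
    "A": set(), "B": set(),
    "C": {"A"}, "D": {"B"}, "E": {"B"},
    "F": {"D"}, "G": {"C", "D"}, "H": {"E"},
}

def critical_path(durations, predecessors):
    """Return (path, length) of the longest path in a DAG of timed tasks."""
    finish = {}  # earliest finish time of each task
    via = {}     # predecessor on the longest path to each task
    for node in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(node, set())
        best = max(preds, key=lambda p: finish[p], default=None)
        finish[node] = durations[node] + (finish[best] if best is not None else 0)
        via[node] = best
    end = max(finish, key=finish.get)
    path = [end]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return list(reversed(path)), finish[end]

path, length = critical_path(durations, predecessors)
print(" > ".join(path), length)  # A > C > G 27
```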
Micro tuning: JDBC drivers; JVM; Type 4 or 5 JDBC drivers (DataDirect); array fetch size; DB packet size; network packet size.
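Array fetch size controls how many rows the driver returns per fetch, and with a client/server driver each fetch is roughly one network round trip. The sketch below shows the effect using the standard Python DB-API `cursor.arraysize` and `fetchmany()` as an analogy; it is not ODI code, the table is hypothetical, and because SQLite runs in-process it only illustrates how the number of fetches shrinks as the array size grows.

```python
import sqlite3

def count_fetch_batches(conn, query, array_size):
    """Fetch a result set in batches of `array_size` rows; count the batches.

    With a client/server driver each batch is roughly one network round trip.
    """
    cur = conn.execute(query)
    cur.arraysize = array_size      # default batch size used by fetchmany()
    batches = rows = 0
    while True:
        chunk = cur.fetchmany()
        if not chunk:
            break
        batches += 1
        rows += len(chunk)
    return batches, rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER)")
conn.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(1000)])

for size in (30, 250, 1000):
    print(size, count_fetch_batches(conn, "SELECT id FROM src", size))
```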
Performance monitoring: ODI log data mart (facts, dimensions, metrics, front end).
DBMS_SQLTUNE_UTIL0: dbms_sqltune_util0.sqltext_to_sqlid derives the SQL_ID from the SQL text, which links ODI-generated statements to the data dictionary tables.
Maciej Kocon: Dublin-based. Working with ODI since 2005 (Sunopsis). Reviewer of two ODI books. Blog: www.bi-q.ie. maciek@bi-q.ie
ORCHESTRATING DWH PROCESSES: orchestration of the data process flow; standard DWH process flow orchestration; packages in Oracle Data Integrator 10g; load plans in Oracle Data Integrator 11g; process flow use cases (efficiency analysis); benefits of alternative scheduling.
TYPICAL DATA FLOW IN A DWH: step 1, STAGE (E-LT data extract: loads data from sources); step 2, DIMs (label: provide structured labeling information); step 3, FACTS (consist of measurements, metrics or facts). These are the data transport & transform units; they are orchestrated with packages in ODI 10g and with load plans in ODI 11.
ORCHESTRATION – ODI PACKAGES. Using objects directly: PKG_ABC runs INT_A, PRC_B and INT_C; PKG_DE runs INT_D and INT_E. Using scenarios (compiled code), SYNCHRONOUS: PKG_ABCDE calls INT_A, PRC_B, INT_C and PKG_DE one after another, each call waiting for the previous one to finish. Using scenarios (compiled code), ASYNCHRONOUS: PKG_ABCDE starts INT_A, PRC_B, INT_C and PKG_DE without waiting for each to finish, so they can run in parallel.
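The difference between synchronous and asynchronous scenario calls can be sketched in Python: a synchronous call waits for the scenario to finish before the next one starts, while an asynchronous call starts it and moves on. The `run_scenario` function and the durations are hypothetical stand-ins (sleeps), not ODI's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_scenario(name, seconds):
    # Hypothetical stand-in for executing an ODI scenario; it only sleeps.
    time.sleep(seconds)
    return name

def run_synchronous(scenarios):
    # Each call waits for the previous scenario to finish.
    start = time.perf_counter()
    for name, seconds in scenarios:
        run_scenario(name, seconds)
    return time.perf_counter() - start

def run_asynchronous(scenarios):
    # All scenarios are started without waiting for each other;
    # this sketch waits only once, at the end, for all of them.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(scenarios)) as pool:
        futures = [pool.submit(run_scenario, n, s) for n, s in scenarios]
        for future in futures:
            future.result()
    return time.perf_counter() - start

scenarios = [("INT_A", 0.2), ("PRC_B", 0.1), ("INT_C", 0.1), ("PKG_DE", 0.1)]
sync_s = run_synchronous(scenarios)
async_s = run_asynchronous(scenarios)
print(f"synchronous ~{sync_s:.1f}s, asynchronous ~{async_s:.1f}s")
```

Synchronous time is the sum of the durations; asynchronous time approaches the longest single duration, as long as nothing depends on anything else.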
ODI 10g vs. ODI 11 (STAGE, DIMs, FACTS): in ODI 10g, package PKG_DM calls PKG_ABC, PKG_DE and PKG_FG, which contain the interfaces and procedures (INT_A, PRC_B, INT_C, PRC_D, INT_F, PRC_G, ...); in ODI 11, load plans orchestrate the same processes A to G. Same effect!
PROCESS FLOW EFFICIENCY ANALYSIS. Standard flow orchestration: Stage, (stop), DIMs, (stop), Facts. [Diagram: processes A to G, sequential vs. parallel execution, with durations.] Total time: 30 + 30 + 10 = 70. Downsides: possible inefficiencies (idle resources).
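The 70 on this slide is the sum of the slowest process per stage, because each stage waits for the previous one to finish. The sketch below computes that total plus the idle time it implies; the individual process durations are hypothetical, only the stage maxima of 30, 30 and 10 match the slide, and it assumes one execution slot per process.

```python
def stage_barrier_makespan(stages):
    """Elapsed time when every stage waits ('stop') for the previous one.

    Processes inside a stage run in parallel, so each stage lasts as long
    as its slowest process.
    """
    return sum(max(stage.values()) for stage in stages)

def idle_time(stages):
    """Time that slots spend waiting for the slowest process of their stage
    (assumes one slot per process)."""
    return sum(max(stage.values()) - d for stage in stages for d in stage.values())

# Hypothetical per-process durations; only the stage maxima match the slide.
stages = [
    {"A": 30, "B": 10, "C": 10},   # STAGE
    {"D": 10, "E": 30},            # DIMs
    {"F": 10, "G": 10},            # FACTS
]

print(stage_barrier_makespan(stages))  # 70
print(idle_time(stages))               # 60
```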
PROCESS FLOW EFFICIENCY ANALYSIS, OPTIMIZATION ATTEMPT. [Diagram: processes A to G regrouped, sequential vs. parallel execution, with durations.] Longest stream: 10 + 30 + 10 = 50 instead of 70. 70 / 50 = 1.4 times quicker! Upside: efficiency improved.
ADVANCED Data Flow example
Enterprise DWH Data Flow example
PROCESS FLOW EFFICIENCY ANALYSIS, OPTIMIZATION ATTEMPT: 70 / 50 = 1.4 times quicker! Upside: efficiency improved. Downsides: timings knowledge required; overall dependency knowledge required.
PROCESS FLOW EFFICIENCY ANALYSIS, OPTIMIZATION ATTEMPT: [Diagram: processes A to G, sequential vs. parallel execution, with durations.] The total time stays at 30 + 30 + 10 = 70. Downside: the inefficiency exists but can't be resolved; consumer waiting & impact.
Traditional scheduling: limitations. Possible inefficiencies (idle resources). Timings knowledge required. Overall dependency knowledge required. Inefficiency exists but can't be resolved. Consumer waiting & impact. SCHEDULER
DEPENDENCY DRIVEN scheduling. [Diagram: processes A to E executed in different orders, contrasted with PACKAGES & LOAD PLANS.]
PROCESS FLOW EFFICIENCY ANALYSIS: stage-by-stage orchestration of processes A to G takes 30 + 30 + 10 = 70; dependency-driven scheduling of the same processes takes 30. 70 / 30 = 2.3 times faster!
Dependency-driven scheduling: simplifies orchestrating the flow (only the immediate upstream dependencies need to be defined; execution timings are not relevant; the flow self-adapts in the most effective way). Improves overall E-LT performance: fewer idle resources, better utilization. Unveils its full potential in complex enterprise-class DWHs (Inmon).
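The core idea, that each process declares only its immediate upstream processes and starts as soon as all of them have finished, can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool. This is an illustrative sketch, not the scheduler described in the talk; the process names and the `run` callable are hypothetical.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_dependency_driven(upstream, run, max_workers=4):
    """Start every process as soon as all of its immediate upstream processes
    have finished; return the processes in completion order.

    `upstream` maps a process to the set of processes it directly depends on.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()                      # raises graphlib.CycleError on cycles
    finished = []
    running = {}                          # future -> process name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start everything whose upstream processes are all done.
            for name in sorter.get_ready():
                running[pool.submit(run, name)] = name
            # Block until at least one running process completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()           # propagate a failure instead of hiding it
                sorter.done(name)
                finished.append(name)
    return finished

def run(name):
    time.sleep(0.01)                      # hypothetical stand-in for a scenario run

# Hypothetical flow: only immediate upstream dependencies are declared.
upstream = {"D": {"A", "B"}, "E": {"C"}, "F": {"D", "E"}}
print(run_dependency_driven(upstream, run))
```

No timings and no overall stage layout are declared: F starts as soon as D and E are done, even if other, unrelated processes are still running.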
Dependency-driven scheduling: notifications (errors, with auto-restartability; finish summary); logging; multiple/overlapping E-LT streams (loads with different frequencies); parameterization; improved system stress control; process prioritization.
FIRST RUN: 10 processes. TODAY: 584 processes, 1389 dependencies, 132 231 scenarios run. Time: load plans 12h43m vs. scheduler 4h21m, 2.9 times faster.
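The speed-up figure follows from converting both run times to minutes:

$$
\frac{12\,\text{h}\,43\,\text{min}}{4\,\text{h}\,21\,\text{min}} = \frac{763\ \text{min}}{261\ \text{min}} \approx 2.92
$$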
Enterprise DWH data flow
Release 1.0
Release 2.0 TST
TESTING Release 2.0
Deploy Release 2.0 PRD
The Hot fix SITUATION
Release frequently
CI environment
The build master
AUTOMATE Stuff
ODI vs. Source control
ODI structure
Beyond intra-build dependencies