Presentation is loading. Please wait.

Presentation is loading. Please wait. Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. RMOUG Training Days 2005.

Similar presentations

Presentation on theme: " Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. RMOUG Training Days 2005."— Presentation transcript:

1 Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. RMOUG Training Days 2005

2 Who am I? C programmer since 1983 Databases since 1984, including BTrieve, C-ISAM, Unify, DEC VMS/RMS Oracle application developer since 1990 RMOUG Training Days presenter since 1993 Oracle DBA since 1994 Oracle data warehouses since 1994 RMOUG board member since 1995 With Oak Table (2004) With Gary Dodge (2000 and 1998)

3 Agenda DW utopia on Oracle Riding the virtuous cycle Sliding down the death spiral Partitioning, and the use of EXCHANGE PARTITION for ETL, is the differentiator

4 Four DW characteristics Non-volatile, time-variant, subject-oriented, integrated Bill Inmon Building the Data Warehouse 3rd Ed 2002 (Wiley) Think about what these mean? Consider the converse of these characteristics? Volatile? Static-image? Process-oriented? Application-specific? Time-variant dimensional data model implies: Insert, index, and analyze each row of data only once From an implementation perspective, this is vital to remember! Consider an extreme situation? Analytical database for quantum research in physics 50 Tbytes of data to load every day

5 The Virtuous Cycle Insert-only processing enables… Direct-path loads of data Partitioned tables/indexes stored in time-variant tablespaces Direct-path (a.k.a. append) loads enable… Larger volume loads with less overall impact Table compression NOLOGGING and PARALLEL operations Partitioned tables/indexes stored in time-variant tablespaces enable… EXCHANGE PARTITION during ETL READ ONLY tablespaces as data ages Performance scalability from partition pruning

6 EXCHANGE PARTITION during ETL enables… Bitmap indexes and bitmap-join indices Elimination of ETL load window and 24x7 availability for queries Bitmap indices enable… Star transformations on star (dimensional) schemas READ ONLY tablespaces enable… Near-line storage (i.e. NAS, SAMFS/HFS, etc) Right-sizing of storage to the need, classified by IOPS Backup efficiencies READ WRITE tablespaces scheduled for backup every week READ ONLY tablespaces scheduled for backup every year The Virtuous Cycle

7 The Death Spiral Volatile data presented in a static-image according to process-oriented concepts leads to… ETL using conventional-path INSERT, UPDATE, and DELETE operations (including MERGE and multi-table INSERT) trouble Conventional-path operations are trouble with: Bitmap indexes and bitmap-join indexes Forcing frequent complete rebuilds until they get too big Contention in Shared Pool, Buffer Cache, global structures Mixing of queries and loads simultaneously on table and indexes Periodic rebuilds/reorgs of tables if deletions occur Full redo logging and undo transaction tracking ETL will dominate the workload in the database Queries will consist mainly of dumps or extracts to downstream systems Query performance will be abysmal and worsening…

8 death UpSert/Merge logic during large-scale ETL represents death for scalability in a large DW The Death Spiral

9 Virtue is easier on everyone Use dimensional data models for presentation of data to users Anything goes during ETL staging, but users want simplicity and speed Oracle mechanisms optimize dimensional data models Star transformations using bitmap and bitmap-join indexes Partition pruning during queries Non-intrusive ETL processing (24x7 query operations) Direct-path bulk loading without interrupting queries Newly-loaded data published simultaneously to users Conserving resources Any UPDATE or DELETE logic can be converted to INSERT Table compression READ ONLY tablespaces

10 Direct-path loads Bulk loading feature first introduced in Oracle v6 in FASTLOAD utility on MVS to compete with DB2 Incorporated into SQL*Loader DIRECT=TRUE in v7.0 Extended to CREATE TABLE AS SELECT in v7.2 Extended to INSERT /*+ APPEND */ in v8.0 Enhanced in v8.1 to leave behind a direct-path log for use by MV fast refresh Loads data outside of managed space Load above the high-water mark in target table in SQL*Loader DIRECT=TRUE PARALLEL=FALSE After successful completion, high-water mark is raised to include newly-loaded rows in the table Load into TEMPORARY segments in all other load mechanisms After successful completion, TEMPORARY segments are merged into the table segment

11 Direct-path loads Fast bulk load mechanism bypasses: Buffer Cache Though which conventional-path INSERTs, and all UPDATE and DELETE operations pass Log Buffer and entire redo log generation process If keyword NOLOGGING is utilized Processes format blocks with newly-inserted rows in private process memory Writes them directly to datafiles

12 Direct-path loads NOLOGGING options exist for all direct-path or APPEND operations Available only for INSERT operations, never UPDATE or DELETE A potential performance enhancement If the redo logging stream is truly causing performance problems Dont assume that this is so, please verify! Flip side: NOLOGGING means no recoverability RMAN incremental backup capability can help here…

13 Exchange Partition EXCHANGE PARTITION is crucial to non-intrusive ETL Data is transformed, cleansed, loaded, indexed, analyzed offline from live tables and indexes Direct-path load operations are especially tough on live indexes In-flight queries continue to process during and after EXCHANGE PARTITION operations Oracles read-consistency mechanisms cause existing operations to use data that was exchanged away from the table, and new operations to use data exchanged into the table Local-partitioned indexes and statistics are exchanged as well Global-partitioned indexes are maintained during exchange operation

14 Exchange Partition EXCHANGE PARTITION can be transparent to in-flight queries DML locks prevent exchange on objects where INSERT, UPDATE, DELETE, and SELECT … FOR UPDATE in progress What happens to in-flight queries if standalone table TT is truncated or dropped immediately after the exchange completes? P11P12P13P14P15 P17P18 TT Queries started after EXCHANGE utilize the segment that is partition P18 after the exchange Queries that were in-flight before EXCHANGE continue to utilize the segment that was partition P18 before the exchange

15 Exchange Partition The basic technique of bulk-loading new data into a temporary load table, which is then indexed, analyzed, and then published all at once to end-users using the EXCHANGE PARTITION operation, should be the default load technique for all large tables in a data warehouse fact tables slowly-changing or quickly-changing dimensions Assumptions for this example: Composite partitioned fact table named TXN Range partitioned on DATE column TXN_DATE Hash partitioned on NUMBER column ACCT_KEY Data to be loaded into partition P on TXN

16 Exchange Partition 1.Create temporary table TXN_TEMP as a hash- partitioned table 2.Perform parallel, direct-path load of new data into TXN_TEMP 3.Gather CBO statistics on table TXN_TEMP 4.Create indexes on the temporary hash-partitioned table TXN_TEMP corresponding to the local indexes on TXN using PARALLEL, NOLOGGING, and COMPUTE STATISTICS options 5.alter table TXN exchange partition P with table TXN_TEMP including indexes without validation update global indexes; 6.Table TXN_TEMP is left ready for next load cycle

17 22-Feb Feb Feb 2004 (empty)25-Feb 2004 Composite-partitioned table TXN Hash-partitioned table TXN_TEMP 2. Bulk Loads 5. EXCHANGE PARTITION 4. Index Creates 3. Analyze Exchange Partition

18 Exchange Partition It is a good idea to encapsulate this logic inside PL/SQL packaged- or stored-procedures: SQL> execute exchpart.prepare(TXN_FACT,TMP_, FEB-2004,27-FEB-2004); SQL> alter session enable parallel dml; SQL> insert /*+ append nologging parallel(n,4) */ 2 into tmp_txn_fact n 3 select /*+ full(x) parallel(x,4) */ * 4 from stage_txn_fact x 5 where load_date >= 25-FEB and load_date < 28-FEB-2004; SQL> commit; SQL> execute exchpart.finish(TXN_FACT,TMP_); DDL for exchpart.sql posted at

19 Exchange Partition It is wise to encapsulate this partition-exchange functionality in a PL/SQL package- or stored-procedure Along with the related functionality to: Gather CBO statistics on the table Build indexes (in the proper related tablespaces with the proper parameters) Also, the use of stored procedures to encapsulate this logic is crucial for security You do NOT want to grant anybody the ability to ALTER TABLE or CREATE TABLE You do NOT want anybody connecting as the table owner schema! Stored procedures, once created, can be granted

20 Publishing Loaded Data Coordinating the final EXCHANGE PARTITION operation permits all of the newly-loaded data to appear to the end- users simultaneously Publishing data If newly-loaded data is becoming visible to users gradually Then a load window when new queries cannot be started becomes necessary Exchange Partition load techniques make load windows of restricted activity unnecessary

21 Star Transformations Why are star join-transformations desirable? Typical Oracle nested loops, sort/merge, or hash join methods tend to start the query from a dimension table and join into the fact table using only one index Further filtering is performed by joining to other dimensions Very sequential and not optimal Star transformations do the following: Using database statistics, identifies the pattern of one large table at the center of a query involving two or more smaller tables Resolves a result set from each of the dimension tables Merges all of the results sets from all of the dimensions Uses powerful BITMAP MERGE operation on fact table

22 Star Transformations 1.Drive query from one of the dimensions 2.Join to the fact from that dimension 3.Filter on the fact by joining to other dimensions

23 Star Transformations 1.Find result set in each dimension 2.Merge results from all dimensions 3.Join to the fact from merged result set, using BITMAP MERGE index scan

24 Star Transformations Enabling star join transformations in Oracle Parameter settings: COMPATIBLE = or higher OPTIMIZER_FEATURES_ENABLE = or higher STAR_TRANSFORMATION_ENABLED = TEMP_DISABLE The optimizer will consider performing a cost-based query transformation on the star query but will not use temporary tables in the star transformation. Lots of bugs associated with setting to TRUE BITMAP_MERGE_AREA_SIZE = Default is 1M. Set to 16M? 32M? 128M? 512M? HASH_JOIN_ENABLED = TRUE Bitmap indexes All fact table foreign-key columns must have bitmap indices (Optional) all dimension table non-key attribute columns should have bitmap indices ALTER TABLE … MINIMIZE RECORDS_PER_BLOCK

25 Star Transformations STAR_TRANSFORMATION_ENABLE = TEMP_DISABLE Setting to TRUE causes star transformation to create then drop a global temporary table to store intermediate results Buggy, nasty plan… ALTER TABLE … MINIMIZE RECORDS_PER_BLOCK Metalink note # provides explanation ALTER TABLE command scans rows in the table and calculates a value for maximum number of rows in a block, which is used by CREATE BITMAP INDEX operations If MINIMIZE RECORDS_PER_BLOCK not performed, a default max number of rows value is used instead MINIMIZE allows smaller bitmap indexes to be created

26 Star Transformations A star transformation is not named as such in the EXPLAIN PLAN display Instead, it is indicated by the following operations: TABLE ACCESS BY ROWID OF ( ) BITMAP AND BITMAP MERGE (data retrieved from a dimension) BITMAP MERGE (data retrieved from a dimension) BITMAP MERGE (data retrieved from a dimension)

27 Star Transformations Bitmap-join index (BJI) is an optimization of the initial phase of a star transformation Index itself is comprised of saved data from the initial merge structure BJI is an index on a table using data from one or more table joins Almost like a materialized view of one of the steps of a BITMAP MERGE operation during a star query SQL> CREATE BITMAP INDEX c_s_p_bjix1 2 ON SALES (c.region, p.category) 3 FROM SALES s, CUSTOMERS c, PRODUCTS p 4 WHERE c.cust_id = s.cust_id 5 AND p.prod_id = s.prod_id; Index created.

28 Partition Pruning Oracle offers a total of five ways to partition tables and indexes RANGE of data values LIST of specified data values HASH (pseudo-random distribution of data values) Composite RANGE-HASH Composite RANGE-LIST Oracle cost-based optimizer can prune partitions that will not be utilized from the query Explicit pruning (partition/subpartition name specified in query) Implicit pruning (partition-key columns are referenced in query) Dont scan what you dont need

29 Partition Pruning The decision of what columns to partition upon must be carefully considered Question: Which queries do we want to optimize? Try to choose outer RANGE partition-key column based on a frequently-queried DATE column Bear in mind that RANGE partitioning is very important to ETL processing also… EXCHANGE PARTITION permits a great deal of flexibility, so there is no need to choose to benefit ETL exclusively Choose the inner HASH or LIST subpartition-key column based on any other frequently-queried column Use HASH subpartitioning for open-ended data values Use LIST subpartitioning for bounded static data values

30 Partition Pruning Prune by RANGE partition key on TXN_DT Prune by HASH subpartition key on ACCT_ID Or, prune by both!

31 Partition Pruning Explicit partition pruning SELECT… FROMsales_fact PARTITION (sales_200403) WHERE… SELECT… FROMsales_fact SUBPARTITION (sales_200403_sp12) WHERE…

32 Implicit partition pruning select… fromtxn_fact whereposting_date between 03-May-2004 and 05-May-2004 and… select… fromtxn_fact wherestate = NJ and… Partition Pruning

33 Partition Pruning Implicit partition pruning can be disabled if there is an expression involving the partition-key column Partition pruning is disabled in this situation: select… fromsales_fact wheretrunc(posting_date) = 15-MAY-2004 and… Partitiong pruning is enabled in this situation: select… fromsales_fact whereposting_date >= 15-MAY-2004 andposting_date < 16-MAY-2004 and…

34 Partition Pruning EXPLAIN PLAN displays are a little confusing: PARTITION RANGE (ALL) PARTITION LIST (ALL) PARTITION HASH (ALL) No partition pruning occurring PARTITION RANGE (ITERATOR) PARTITION LIST (ITERATOR) PARTITION HASH (ITERATOR) Pruning involving two or more partitions No mention of partitions at all in the EXPLAIN PLAN indicates that pruning has occurred to include one and only one partition

35 Table Compression Available in Oracle9i Release 2 (v9.2.0) Physical storage attribute for tables and materialized views [ CREATE | ALTER ] TABLE … [ COMPRESS | NOCOMPRESS ] … Restrictions: Can be used for RANGE or LIST partitions But cannot be used with HASH partitions But cannot be used for HASH or LIST sub-partitions Can be specified for NESTED tables But cannot be used with any LOB construct Such as CLOB, BLOB, BFILE, and VARRAY Not valid for index-organized or external tables

36 Table Compression Storing repeated data values once in each block A symbol table of distinct data values created in each block The symbol table is stored as another table in the block Each column in a row in a block references back to an entry in the symbol table in the block Header & Tailer Table & Column Map ITL Symbol table Free Row data

37 Table Compression Only bulk-loading INSERT operations perform compression CREATE TABLE … AS SELECT … INSERT /*+ APPEND */ (single-threaded and parallel) ALTER TABLE … MOVE … ALTER TABLE … MOVE PARTITION … ALTER TABLE … MERGE PARTITION … ALTER TABLE … SPLIT PARTITION … SQL*Loader DIRECT=TRUE Conventional INSERT operations unaffected SELECT, UPDATE, and DELETE behavior also unaffected

38 Table Compression SELECT Impressive performance improvements!!! Less I/O due to fewer blocks Compression ratio is linear with performance improvements Better impact on FULL table scans Indexed scans still exhibit less-impressive improvements Conventional and direct-path INSERT Noticeable performance slowdown (2-3x) UPDATE Very negative performance impact observed (4-8x) DELETE Some performance improvements observed

39 Table Compression Columns cannot be added, renamed, modified, or dropped on compressed tables or partitioned tables with compressed partitions Might be fixed in 10g? Local partitioned indexes are marked UNUSABLE during compression Includes indexes on non-partitioned tables Must be rebuilt Global partitioned indexes can be maintained using UPDATE GLOBAL INDEXES Includes non-partitioned indexes on partitioned tables A rare situation when GLOBAL indexes can be more highly available than LOCAL indexes!

40 READ ONLY tablespaces Partitioning by a datetime value allows the time-variant nature of data to be exploited Within the same table, different partitions can exist in different tablespaces Different tablespaces can reside on different types of storage media Most-expensive (i.e. SSD) Very expensive (i.e. SAN) Less expensive (i.e. JBOD/HDD, NAS, SAMFS, etc) Set tablespaces to READ ONLY as soon as possible Verify read-only nature using V$SEGMENT_STATISTICS and/or V$FILESTAT Over time, the majority of data in any DW can be READ ONLY

41 READ ONLY tablespaces can be: Backed up less frequently (i.e. quarterly, annually, etc) than active READ WRITE tablespaces, with no compromise on recovery strategy Moved from faster, more-expensive storage to cheaper, less-expensive storage Without interrupting operations OS-level copy of datafiles can proceed without interruption Finalizing ALTER DATABASE RENAME FILE would actually interrupt in-flight queries Only way to scale to infinity from a storage perspective! READ ONLY tablespaces

42 Q&A Questions? Personal website: Corporate website:

Download ppt " Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. RMOUG Training Days 2005."

Similar presentations

Ads by Google