1  Scaling to Infinity: Partitioning Data Warehouses in Oracle
Tim Gorman
Evergreen Database Technologies, Inc. (http://www.EvDBT.com)
North Carolina Oracle Users Group
2  Speaker Qualifications
- Tim Gorman (chief, cook, and bottle-washer at EvDBT.com)
- Director of RMOUG "Training Days 2007" conference
  - Info online at
- Co-author (with Gary Dodge):
  - "Oracle8 Data Warehousing", 1998, John Wiley & Sons
  - "Essential Oracle8i Data Warehousing", 2000, John Wiley & Sons
- Co-author (with the Oak Table Network):
  - "Oracle Insights: Tales of the Oak Table", 2004, Apress
- Oracle data warehousing DBA since 1994
  - Technical manager at Oracle Consulting
  - Independent consultant since 1998
3  Agenda
I’ve seen the good, the bad, and the ugly…
- In the end, successful data warehouses are a combination of all three
- But in general, I see three major errors that result in doom:
  - Ignore basic requirements for DW and design what is familiar
  - Fail to portray data changes over time
  - Fail to utilize partitioning from the beginning
4  Ignoring the requirements
- Repeat after me: reporting and analysis applications do not enforce business rules!
- Reporting and analysis applications are responsible for presenting data in the format that works best for end-users and their query/analysis tools
- Very often, what end-users seem to want is a simple spreadsheet, hundreds of columns wide
  - GIVE THEM WHAT THEY WANT!
  - Conceal from them what it takes to provide what they want
- Do NOT build a data model to enforce referential integrity and/or business rules
5  Ignoring the requirements
- Third-normal form:
  - Eliminate repeating groups: every attribute is atomic and scalar
  - Eliminate functional dependencies on composite key components: every attribute is functionally dependent on the whole key
  - Eliminate functional dependencies on non-key components: every fact/attribute in the entity should rely on the whole key
- 4th, 5th, and 6th normal forms have been defined
  - But most entities that are in 3NF are also in 4th, 5th, and 6th NF
- Intended for use in process-oriented operational systems
  - Enforce data integrity according to business rules
  - Using referential-integrity constraint mechanisms in application code as well as databases
6  Ignoring the requirements
- Data is presented in a simplistic dimensional model rather than the 3rd-normal-form (3NF) entity-relationship model used by most operational systems
- Ralph Kimball discusses this in "The Data Warehouse Toolkit" (John Wiley & Sons, ISBN #)
- The dimensional model:
  - Provides immediate, on-demand, high-performance access to corporate or organizational subject data
  - Is comprised of fact tables containing varying levels of summarized data and dimension tables representing important subject areas
  - Is a very simple representation of data: a spreadsheet with one degree of normalization for flexibility
  - Is also known as a star schema, because diagrams generally represent the fact table as a hub and the dimensions as spokes
7  Ignoring the requirements
[Diagram: a transactional/operational entity-relational model (Customers, Suppliers, Orders, Products, Order Lines) contrasted with a dimensional model (Order Facts surrounded by Customers Dim, Suppliers Dim, Products Dim, Time Dim)]
8  Ignoring the requirements
- Fact tables:
  - More volatile
  - Contain columns for dimension keys and measures
  - In a spreadsheet or tabular report, dimension keys don’t appear at all; measures appear in the "cells" of the report
- Dimension tables:
  - Usually more static (although the dimension for people is usually quite volatile)
  - Contain attributes, which appear as "column headers" or "row headers" in a report
9  Time-variant data, who cares?
Two major types of queries come from business intelligence applications to data warehouse databases:
- Point in time
  - What is the present situation? What do the numbers look like now?
  - "Situational awareness" applications, also known as "dashboards" or "executive information systems"
  - Usually uses the present point in time, but could also use any specific point in time in the past
- Trend analysis (sketched below)
  - How do things look now versus 3 months ago? A year ago?
  - How have things changed day-by-day over the past quarter? Week-by-week over the past year? Month-by-month over the past 7 years?
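As an illustration of a trend-analysis query, here is a minimal sketch of a month-by-month rollup; the TXN_FACT table and its TXN_DATE and TXN_AMT columns are hypothetical stand-ins:

SELECT TO_CHAR(txn_date, 'YYYY-MM') AS txn_month,   -- bucket by calendar month
       SUM(txn_amt)                 AS total_amt
  FROM txn_fact
 WHERE txn_date >= ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -12)   -- past year only
 GROUP BY TO_CHAR(txn_date, 'YYYY-MM')
 ORDER BY txn_month;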
10  Time-variant data, who cares?
Consider this…
- Dimension tables are usually designed to be point-in-time, or "type-1"
  - People, items, products, etc.
  - Locations, time, etc.
- Fact tables are almost always designed to be time-variant, or "type-2"
  - Transactions
- What happens when you join transactions from years ago with dimensional attributes from the present?
  - For example, when analyzing purchases by location, does it make sense to summarize all transactions by a person’s present location?
  - Or should it reflect the person’s location at the time of the transaction?
11  Time-variant data, who cares?
- Every data warehouse has at least one slowly-changing dimension (SCD)
  - Usually involving "people" (i.e. accounts, customers, employees, parties, etc.)
- Static dimensions do not need to be time-variant
  - Identifying a "static" dimension: if a change is made to the dimension, should it be reflected across all time?
- SCDs should be represented as "type-2"
  - "Type-1" views of SCDs can be created as needed
  - "Type-1" views of fact tables can also be created, if necessary, to support point-in-time tactical reporting
12  Time-variant data, who cares?
[Diagram: the type-2 (time-variant) dimension table PERSON_DIM, with primary key (Person_key, Eff_dt) and attributes Last_name, First_name, Address_1, Address_2, City, etc., alongside the type-1 (point-in-time view) table CURR_PERSON_DIM, with primary key Person_key, optional Eff_dt, and the same attributes]
13  Time-variant data, who cares?
[Diagram: the fact table TXN_FACT carries both Person_key and Person_eff_dt, so it can join either to the type-2 PERSON_DIM on (Person_key, Eff_dt) or to the type-1 CURR_PERSON_DIM on Person_key alone]
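To make the diagram concrete, a minimal sketch of the two joins; the measure column TXN_AMT and the CITY attribute are hypothetical stand-ins:

-- Time-variant join: attributes as they were at the time of the transaction
SELECT f.txn_amt, p.city
  FROM txn_fact f, person_dim p
 WHERE p.person_key = f.person_key
   AND p.eff_dt     = f.person_eff_dt;

-- Point-in-time join: attributes as they are right now
SELECT f.txn_amt, c.city
  FROM txn_fact f, curr_person_dim c
 WHERE c.person_key = f.person_key;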
14  Time-variant data, who cares?
- Slowly-changing dimensions should always be "type-2"
  - With "type-1" views constructed from the just-loaded "type-2" data (sketched below)
- So, with this in mind…
  - Why do people so often treat time-variant tables as an afterthought?
  - Why do "extraction-transformation-loading" (ETL) processes so often focus "MERGE" logic ("if row doesn’t exist then INSERT else UPDATE") on the current point-in-time ("type-1") tables, and then insert change data as an afterthought…
  - …instead of inserting change data into the time-variant "type-2" table, from which point-in-time "type-1" views (as materialized views?) can be built for any point in time?
- Think about it: if users should be using "type-2" data for SCDs, who usually utilizes the "type-1" views of the SCDs? What are they good for?
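One way to construct such a "type-1" view over "type-2" data, sketched against the PERSON_DIM columns from the earlier slide (the slide materializes CURR_PERSON_DIM as a table, so treating it as a view here is an assumption):

CREATE OR REPLACE VIEW curr_person_dim AS
SELECT person_key, eff_dt, last_name, first_name,
       address_1, address_2, city
  FROM (SELECT p.*,
               -- rank each person's rows, most recent effective date first
               ROW_NUMBER() OVER (PARTITION BY person_key
                                      ORDER BY eff_dt DESC) AS rn
          FROM person_dim p)
 WHERE rn = 1;   -- keep only the latest row per person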
15  Four characteristics of a DW
- Non-volatile, time-variant, subject-oriented, integrated
  - Bill Inmon, "Building the Data Warehouse", 3rd Ed, 2002 (Wiley)
- Think about what these mean
  - Consider the converse of these characteristics: volatile? Static-image? Process-oriented? Application-specific?
- A time-variant, non-volatile database implies:
  - Insert, index, and analyze each row of data only once
  - From an implementation perspective, this is vital to remember! And often ignored completely!!!
- Consider an extreme situation:
  - An analytical database for quantum research in physics
  - 50 Tbytes of data to load every day
16  The Virtuous Cycle
- Insert-only processing enables…
  - Tables and indexes partitioned by time
  - Optionally sub-partitioned by other key values
- Partitioned tables/indexes enable…
  - Partition pruning during queries
  - Direct-path loads using EXCHANGE PARTITION
  - Time-variant tables/indexes and tablespaces
  - Purging using DROP or TRUNCATE partition instead of DELETE (sketched below)
- Partition pruning enables…
  - Infinite scalability for queries, regardless of how large the database becomes
- Direct-path (a.k.a. append) loads enable…
  - Ability to load more data, faster, more efficiently
  - Table compression
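A minimal sketch of partition-based purging versus conventional DELETE; the table and partition names are hypothetical:

-- Purge January 2004 in seconds by dropping its partition
ALTER TABLE txn DROP PARTITION p_200401 UPDATE GLOBAL INDEXES;

-- The conventional alternative: row-by-row DELETE, with full redo and undo
-- DELETE FROM txn WHERE txn_date < TO_DATE('01-FEB-2004','DD-MON-YYYY');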
17  The Virtuous Cycle
- Time-variant partitioned tables/indexes enable…
  - Time-variant tablespaces
- Time-variant tablespaces enable…
  - READ ONLY tablespaces for older, less-volatile data
- READ ONLY tablespaces enable…
  - Near-line storage (i.e. NAS, SAMFS/HFS, etc.)
  - "Right-sizing" of storage to the need, classified by IOPS
  - Backup efficiencies (sketched below):
    - READ WRITE tablespaces scheduled for backup every day or week
    - READ ONLY tablespaces scheduled for backup every quarter or year
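A sketch of the commands involved; the tablespace name is hypothetical, and the RMAN clause shown is the standard option for skipping read-only tablespaces:

-- Once a quarter's data is no longer modified, freeze its tablespace
ALTER TABLESPACE txn_2003q4 READ ONLY;

-- In RMAN: back it up once more, then omit it from routine backups
RMAN> BACKUP TABLESPACE txn_2003q4;
RMAN> BACKUP DATABASE SKIP READONLY;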
18  The Virtuous Cycle
- Using EXCHANGE PARTITION for loads enables…
  - Elimination of the ETL "load window" and 24x7 availability for queries
  - Direct-path loads
  - Bitmap indexes and bitmap-join indexes
- Bitmap indexes enable…
  - Star transformations on "star" (dimensional) schemas (sketched below)
- Star transformations enable…
  - Bitmap-join indexes
  - SUCCESS! The optimal query-execution plan for dimensional data models!
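A sketch of the two enabling pieces; the index name and fact-table column are hypothetical:

-- Star transformation is driven by bitmap indexes on the fact table's
-- foreign-key (dimension-key) columns; on a partitioned table they must be LOCAL
CREATE BITMAP INDEX txn_fact_acct_bix
    ON txn_fact (acct_key)
 LOCAL NOLOGGING PARALLEL;

-- Enable star transformation (also settable instance-wide)
ALTER SESSION SET star_transformation_enabled = TRUE;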
19  The Death Spiral
- Volatile data presented as a static image according to process-oriented concepts leads to…
  - ETL using "conventional-path" INSERT, UPDATE, and DELETE operations (including MERGE and multi-table INSERT)
- Conventional-path operations are trouble with:
  - Bitmap indexes and bitmap-join indexes, forcing frequent complete rebuilds until they get too big
  - Contention in the Shared Pool, Buffer Cache, and global structures
  - Mixing of queries and loads simultaneously on tables and indexes
  - Periodic rebuilds/reorgs of tables if deletions occur
  - Full redo logging and undo transaction tracking
- ETL will dominate the workload in the database
  - Queries will consist mainly of "dumps" or extracts to downstream systems
  - Query performance will be abysmal and worsening…
20  The Death Spiral
- Without partitioning:
  - Query performance worsens as tables/indexes grow larger
  - Loads must be performed into "live" tables
  - Users must be locked out during the "load cycle"
  - In-flight queries must be killed during the "load cycle"
  - Bitmap indexes must be dropped/rebuilt during the "load cycle"
  - Entire tables must be re-analyzed during the "load cycle"
  - The entire database must be backed up frequently
  - Data cannot be "right-sized" to storage options according to IOPS
- Everything just gets harder and harder to do…
- …and that stupid Oracle software is to blame…
- BRING ON TERADATA OR <insert-flavor-of-the-month>
21  Exchange Partition
- The basic technique: bulk-load new data into a temporary "load table", which is then indexed, analyzed, and "published" all at once to end-users using the EXCHANGE PARTITION operation
- This should be the default load technique for all large tables in a data warehouse:
  - Fact tables
  - Slowly-changing or quickly-changing dimensions
- Assumptions for this example (sketched below):
  - Composite-partitioned fact table named TXN
  - Range-partitioned on DATE column TXN_DATE
  - Hash-partitioned on NUMBER column ACCT_KEY
  - Data to be loaded into partition P on TXN
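A minimal DDL sketch matching these assumptions; the measure column, subpartition count, and partition bound are hypothetical:

CREATE TABLE txn
( txn_date  DATE    NOT NULL   -- RANGE partition key
, acct_key  NUMBER  NOT NULL   -- HASH sub-partition key
, txn_amt   NUMBER             -- hypothetical measure
)
PARTITION BY RANGE (txn_date)
SUBPARTITION BY HASH (acct_key) SUBPARTITIONS 8
( PARTITION p VALUES LESS THAN (TO_DATE('26-FEB-2004','DD-MON-YYYY'))
);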
23  Exchange Partition
- Create temporary table TXN_TEMP as a hash-partitioned table
- Perform a parallel, direct-path load of new data into TXN_TEMP
- Create indexes on the temporary hash-partitioned table TXN_TEMP corresponding to the local indexes on TXN
  - Using the PARALLEL, NOLOGGING, and COMPUTE STATISTICS options
- Gather CBO statistics on table TXN_TEMP
  - Only table and column stats; leave the computed index stats!
- Publish (see the sketch of the preceding steps below):

alter table TXN
    exchange partition P with table TXN_TEMP
    including indexes without validation
    update global indexes;
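A sketch of the first four steps in SQL; the staging-table name STAGE_TXN and the index column are hypothetical:

-- 1. Temporary load table, hash-partitioned to match TXN's subpartitioning
CREATE TABLE txn_temp
PARTITION BY HASH (acct_key) PARTITIONS 8
AS SELECT * FROM txn WHERE 1 = 0;   -- copy structure only, no rows

-- 2. Parallel, direct-path load
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ append parallel(t,4) */ INTO txn_temp t
SELECT /*+ full(s) parallel(s,4) */ * FROM stage_txn s;
COMMIT;

-- 3. Indexes matching TXN's local indexes
CREATE INDEX txn_temp_acct_i ON txn_temp (acct_key)
  LOCAL PARALLEL NOLOGGING COMPUTE STATISTICS;

-- 4. Table and column statistics only (the indexes already have theirs)
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(USER, 'TXN_TEMP', cascade => FALSE);
END;
/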
24  Exchange Partition
It is a good idea to encapsulate this logic inside PL/SQL packaged or stored procedures:

SQL> execute exchpart.prepare('TXN_FACT','TMP_', -
  2  '25-FEB-2004');
SQL> alter session enable parallel dml;
SQL> insert /*+ append parallel(n,4) */
  2  into tmp_txn_fact n
  3  select /*+ full(x) parallel(x,4) */ *
  4  from stage_txn_fact x
  5  where load_date >= '25-FEB-2004'
  6  and load_date < '28-FEB-2004';
SQL> commit;
SQL> execute exchpart.finish('TXN_FACT','TMP_');

DDL for "exchpart.sql" posted at
25  Exchange Partition
- Loading time-variant fact and dimension tables is not the only load activity in most data warehouses
- Often, some tables contain current or point-in-time data
  - Example: a type-1 dimension "snowflaked" from a type-2 dimension
- This is often an excellent situation for materialized views
  - But, as is often the case, the refresh mechanisms built in with materialized views might not be the most efficient
- With each load cycle, the current images need to be updated
  - Instead of performing transactional MERGE (i.e. update-or-insert) logic directly on the table…
  - …rebuild the table into a temporary table, then "swap" it in using EXCHANGE PARTITION
26  Exchange Partition
[Diagram: composite-partitioned table ACCOUNT_DIM alongside the hash-partitioned table CURR_ACCOUNT_DIM]

After the main composite-partitioned dimension table ACCOUNT_HISTORY_DIM has been loaded (as documented on the previous slide), it is necessary to reload or update the current-image "view" represented by the table ACCOUNT_DIM.

Since the main type-2 slowly-changing dimensions and the main fact tables represent the true image of time-variant data in the data warehouse, current-image dimensions and facts should be considered generically as "views" of these tables. These "views" are subsets of rows, with the subsets comprised of the latest versions of each data granule along the TIME dimension. Examples of this in CDCI include ACCOUNT_DIM (a.k.a. the current-image view of ACCOUNT_HISTORY_DIM) and ACCOUNT_FINANCIAL (a.k.a. the current-image view of several fact tables).

There are at least five main options for refreshing these current-image "views":
1. Use ETL utilities (i.e. Ab Initio) to perform INSERT/UPDATE (a.k.a. "up-sert") or "merge" logic on the current-image view/table.
2. Truncate and completely refresh the current-image view/table by extracting the entire set of data from the source (main) tables. For example, a full query of the source table which extracts only the most-recently updated rows would get the job done.
3. Capture changed rows in the source (main) table and incrementally update the current-image view/table.
4. Refresh a complete new copy of the current-image view/table, using a query that finds the most-recent record for each account solely from the entire ACCOUNT_HISTORY_DIM table, then use EXCHANGE PARTITION to publish the new data to end-users.
5. Refresh a complete new copy of the current-image view/table by merging the existing ACCOUNT_DIM table with the newly-inserted rows in ACCOUNT_HISTORY_DIM, then use EXCHANGE PARTITION to publish the new data to end-users.

Our recommendation is option #5. It holds to the basic ETL design principle of "only INSERTs and SELECTs, no UPDATEs or DELETEs" (which disqualifies options #1 and #3). This simple-yet-powerful design principle guarantees performance scalability as data volumes increase, and it is the basis of options #2, #4, and #5. Bulk INSERT operations are orders of magnitude faster than "conventional" SQL operations; introducing UPDATE or DELETE operations tends to inhibit scalable performance, and they should be used only as a last resort.

Option #5 is the best approach because it avoids a "complete" refresh from the large ACCOUNT_HISTORY_DIM table (i.e. option #2). It accomplishes this by utilizing the range-partitioning scheme on the ACCOUNT_HISTORY_DIM table as a "change-data capture" mechanism: we know that the rows just inserted reside only in the latest partition. Thus, we can easily query only those recently-changed rows using Oracle's "partition-pruning" mechanism and then merge that data against all of the existing rows in the ACCOUNT_DIM table. We thereby create a new copy of ACCOUNT_DIM in the ACCOUNT_DIM_TEMP table, which is almost a mirror-image of ACCOUNT_DIM. This is illustrated in the slide above.

Please note that ACCOUNT_DIM itself is a composite-partitioned table, with only a single range partition on EFFECTIVE_DATE with values less than MAXVALUE. From a functional perspective, ACCOUNT_DIM is really just a hash-partitioned table, but physically it should be range-partitioned and hash-subpartitioned.
The purpose of this seemingly-useless range partition is to simplify the EXCHANGE PARTITION logic to be seen on the next slide…

The following SQL statement can be used to perform the actual merge/build:

INSERT /*+ append nologging parallel(t, 16) */ INTO ACCOUNT_DIM_TEMP T
SELECT /*+ ordered full(x) parallel(x, 16) */
       LAST_VALUE(x.col1) OVER (PARTITION BY ACCOUNT_KEY ORDER BY EFFECTIVE_DATE) col1,
       LAST_VALUE(x.col2) OVER (PARTITION BY ACCOUNT_KEY ORDER BY EFFECTIVE_DATE) col2,
       …
       LAST_VALUE(x.colN) OVER (PARTITION BY ACCOUNT_KEY ORDER BY EFFECTIVE_DATE) colN
  FROM (SELECT col1, col2, …, colN FROM ACCOUNT_DIM
        UNION
        SELECT col1, col2, …, colN FROM ACCOUNT_HISTORY_DIM
         WHERE EFFECTIVE_DATE
               BETWEEN TO_DATE('25-FEB-2004 00:00:00','DD-MON-YYYY HH24:MI:SS')
                   AND TO_DATE('25-FEB-2004 23:59:59','DD-MON-YYYY HH24:MI:SS')
         ORDER BY ACCOUNT_KEY, EFFECTIVE_DATE) x;

The general idea is that the in-line view is comprised of the UNION of two SELECT statements. One SELECT statement retrieves the entire contents of the current ACCOUNT_DIM table; the other retrieves only the recently-changed rows in the ACCOUNT_HISTORY_DIM table. This code example assumes that only the most recent partition (i.e. the 25-Feb data) has been recently loaded. This may or may not be the case. If data was also loaded into earlier partitions, then it would make sense to query those as well. For example, if data was loaded as early as 21-Feb-2004, then the predicate might instead need to look like:

         WHERE EFFECTIVE_DATE
               BETWEEN TO_DATE('21-FEB-2004 00:00:00','DD-MON-YYYY HH24:MI:SS')
                   AND CDCI_LOAD_DATE

This allows partition pruning to limit the number of partitions scanned via the EFFECTIVE_DATE range-partition key column, while the date value in the CDCI_LOAD_DATE column limits the rows retrieved to only those loaded on 25-Feb.

Please turn to the next slide for the remaining steps…

[Diagram: merge/build operation across partitions 23-Feb-2004, 24-Feb-2004, and 25-Feb-2004]
27  Exchange Partition
[Diagram: the previous cycle's current-image data, in composite-partitioned table CURR_ACCOUNT_DIM with its single partition named PZERO, is swapped via EXCHANGE PARTITION with the new current-image data in hash-partitioned table CURR_ACCT_DIM_TEMP]

Continuing from the previous slide: now that the new current image has been constructed in the temporary ACCOUNT_DIM_TEMP table, our next task is to publish the data to the ACCOUNT_DIM table, which is of course visible to end-users, without interrupting the availability of the table. To do this, we once again use the ubiquitous EXCHANGE PARTITION operation.

Since ACCOUNT_DIM_TEMP is a hash-partitioned table and the ACCOUNT_DIM table is a composite range-partitioned and hash-subpartitioned table, a single EXCHANGE PARTITION command should suffice.

The EXCHANGE PARTITION mechanism has the advantage of supporting in-flight queries without interruption. Oracle's read-consistency mechanism will allow queries that are already in progress to continue referencing the rows that have been exchanged out to the standalone table. New queries initiated after the EXCHANGE PARTITION operation will, of course, reference the newly-exchanged and newly-visible partition.

Another nice side-effect of this technique is that the previous day's image of ACCOUNT_DIM is still stored in the standalone hash-partitioned ACCOUNT_DIM_TEMP table. Restoring the previous day's image can be done quite swiftly, if necessary.

The upshot is zero downtime for users of the ACCOUNT_DIM table, and it is much faster than any of the other options. In computing, it is not often that high availability and high performance go together; all too often, one must be traded for the other. But not in this case… :-)
28  Exchange Partition

INSERT /*+ append parallel(t, 8) */ INTO TMP_CURR_ACCOUNT_DIM T
SELECT /*+ full(x) parallel(x, 8) */
       0 partkey, acctkey, effdt, …(and so on for all columns)…
  FROM (SELECT acctkey, effdt, …(and so on for all columns)…,
               row_number() over (partition by acctkey
                                  order by effdt desc) ranking
          FROM (SELECT acctkey, effdt, …(and so on for all columns)…
                  FROM CURR_ACCOUNT_DIM
                 UNION ALL
                SELECT acctkey, effdt, …(and so on for all columns)…
                  FROM ACCOUNT_DIM partition (P))) x
 WHERE ranking = 1;
29  Exchange Partition

ALTER TABLE CURR_ACCOUNT_DIM
    exchange partition PZERO
    with table TMP_CURR_ACCOUNT_DIM
    [ with | without ] validation
    including indexes
    update global indexes;
30  Choosing partition keys
- The most important decision when partitioning is choosing the partition-key columns
  - All benefits of partitioning hinge upon this choice!!!
- Which columns to partition upon?
- If the table contains time-variant data:
  - Choose the RANGE partition key DATE column to optimize:
    - ETL according to load cycles
    - End-user access through partition pruning
  - Choose the HASH or LIST sub-partition key column to optimize end-user queries
- If the table does NOT contain time-variant data:
  - Choose the RANGE, HASH, or LIST partition key column to optimize end-user queries
31  Choosing partition keys
- When choosing columns to optimize ETL:
  - Choose a column which distinguishes different load cycles
  - Should be a DATE column
- When choosing columns to optimize end-user access:
  - Gather hard facts about usage; don't guess!
  - Oracle STATSPACK and Oracle10g AWR
  - Data dictionary table SYS.COL_USAGE$ (see the sketch below)
    - Populated automatically by the cost-based optimizer in Oracle9i and above
    - DDL script "dba_column_usage.sql" can be downloaded from
  - Ambeo Usage Tracker (http://www.ambeo.com)
  - Teleran iSight (http://www.teleran.com)
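A minimal sketch of querying SYS.COL_USAGE$ directly (run as SYS, or with SELECT privilege on the underlying dictionary tables); this is a simplified stand-in for the dba_column_usage.sql script mentioned above:

SELECT o.name  AS table_name,
       c.name  AS column_name,
       u.equality_preds,       -- predicate counts recorded by the CBO
       u.range_preds,
       u.like_preds
  FROM sys.col_usage$ u,
       sys.obj$       o,
       sys.col$       c
 WHERE o.obj#    = u.obj#
   AND c.obj#    = u.obj#
   AND c.intcol# = u.intcol#
 ORDER BY o.name, c.name;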
32  Choosing partition keys
- Example: fact table for credit-card processing
- The fact table is time-variant:
  - Use range partitioning on a DATE datatype to optimize ETL and queries
  - Use hash- or list-subpartitioning to optimize queries
- The fact table has four DATE columns:
  - TXN_DT (date on which the transaction occurred)
  - POST_DT (date on which the transaction was posted by the merchant)
  - PAID_DT (date on which the transaction was paid to the merchant)
  - LOAD_DT (date on which the transaction was loaded to the DW)
33  Choosing partition keys
- Which should be chosen? And why?
- LOAD_DT
  - Optimizes ETL perfectly, but does not benefit queries in any way…
  - Data is loaded by LOAD_DT
  - End-users don't query on LOAD_DT
- TXN_DT, POST_DT, and PAID_DT
  - Each benefits a different set of end-user queries
  - Each presents some problems for ETL processing
    - Each date loads mostly into the latest partition, then a little into each partition for the previous 2-4 days
  - This situation can be handled by iterating through the five steps of the basic EXCHANGE PARTITION algorithm, where each iteration processes a different LOAD_DT value (sketched below)
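A sketch of that iteration in PL/SQL, assuming the hypothetical exchpart package from the earlier slide and a staging table STAGE_TXN_FACT; each pass rebuilds and exchanges the partition touched by one slice of the load:

DECLARE
  -- one pass per affected partition-key date in the current load
  CURSOR c_dates IS
    SELECT DISTINCT TRUNC(txn_dt) AS txn_dt
      FROM stage_txn_fact
     ORDER BY 1;
BEGIN
  EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';
  FOR r IN c_dates LOOP
    -- create the TMP_ load table for this partition (hypothetical signature)
    exchpart.prepare('TXN_FACT', 'TMP_', TO_CHAR(r.txn_dt, 'DD-MON-YYYY'));

    INSERT /*+ append parallel(n,4) */ INTO tmp_txn_fact n
    SELECT /*+ full(x) parallel(x,4) */ *
      FROM stage_txn_fact x
     WHERE TRUNC(x.txn_dt) = r.txn_dt;
    COMMIT;   -- direct-path insert must be committed before the exchange

    -- index, analyze, and EXCHANGE PARTITION (hypothetical signature)
    exchpart.finish('TXN_FACT', 'TMP_');
  END LOOP;
END;
/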
34  Summary recommendations
- Use dimensional data models for the "presentation" to end-users
  - Don't "free-lance" and confuse the end-users
  - Understand the purpose of facts and dimensions
- Base the database design on time-variant data structures
  - Don't join "type-2" fact data to "type-1" dimension data
  - Load "type-2" data first, then rebuild "type-1" data from that
- Use partitioning
  - Enable the "virtuous cycle" of Oracle features that cascade from using partitioning intelligently…
36  Thank You!
- Rocky Mountain Oracle Users Group (www.rmoug.org)
  - "Training Days 2008", Denver CO, Tue-Thu, Feb 2008
    - Tues 12-Feb: 4-hour "university sessions"
    - Wed-Thu Feb: main conference
  - Thu-Sun, Feb 2008: informal ad-hoc ski weekend for attendees who wish to partake!!!
- Tim's contact info:
  - Web: