Www.semantec.de Oracle 8i/9i features which support Data Warehousing Author: Krasen Paskalev Certified Oracle DBA Semantec GmbH. D-71083 Herrenberg.


1 www.semantec.de
Oracle 8i/9i features which support Data Warehousing
Author: Krasen Paskalev, Certified Oracle DBA
Semantec GmbH, D-71083 Herrenberg

2 Agenda
ETL Features
Data Warehouse Management
Data Warehouse Querying
Parallel Operations

3 Agenda
ETL (Extraction, Transformation, Transportation and Loading)
  – Transportable Tablespaces
  – External Tables
  – Table Functions
  – MERGE Statement
Data Warehouse Management
Data Warehouse Querying
Parallel Operations

4 Transportable tablespaces
The fastest method for moving data between databases
The tablespaces, with all their data, are plugged into the data warehouse database
Diagram: Production -> Tablespace (ftp) -> Data Warehouse

5 External Tables
Can be directly queried and joined in SQL, PL/SQL and Java
Avoid data staging
One step loading and transformation
Save DB space
Diagram labels: External files (ASCII file, Excel sheet); Read-only virtual tables

6 Table Functions
Can take a set of rows as input
Can return a set of rows as output
Can be used in the FROM clause
Can be parallelized
Can be pipelined
User defined in PL/SQL, Java or C
Diagram: Sales -> Table Function -> result
Region   %
West     30
Central  50
East     20

7 Table Functions Pipelining
Data Transformation
Diagram: Source -> Table Function (Step 1) -> Table Function (Step 2) -> Target; a Log table is also shown
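The two-step pipeline on this slide can be modelled with Python generators. This is only an analogy for how pipelined table functions hand rows from one step to the next without staging them in between; the function names, the cleaning rule and the conversion rate are hypothetical and do not come from the presentation.

```python
from typing import Dict, Iterable, Iterator, List

def step1_clean(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Step 1 (hypothetical rule): drop rows that have no amount."""
    for row in rows:
        if row.get("amount") is not None:
            # Each row is handed on as soon as it is ready,
            # instead of materializing an intermediate table.
            yield row

def step2_convert(rows: Iterable[Dict], rate: float) -> Iterator[Dict]:
    """Step 2 (hypothetical rule): multiply the amount by a rate."""
    for row in rows:
        yield {**row, "amount": row["amount"] * rate}

# Hypothetical source rows.
source: List[Dict] = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 5.0},
]

# Source -> Step 1 -> Step 2 -> Target, with no staging between the steps.
target = list(step2_convert(step1_clean(source), rate=2.0))
```

Because both steps are generators, step 2 starts consuming rows before step 1 has finished, which is the property the slide calls pipelining.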

8 MERGE statement
MERGE INTO sales s
USING new_sales n
ON (s.id = n.id)
WHEN MATCHED THEN
  UPDATE SET s.amount = s.amount + n.amount
WHEN NOT MATCHED THEN
  INSERT (s.id, s.amount) VALUES (n.id, n.amount)

new_sales
id  amount
4   3000
8   1000
9   2000

sales (before)
id  amount
4   2000
7   3000
8   5000

sales (after)
id  amount
4   5000   UPDATE
7   3000
8   6000   UPDATE
9   2000   INSERT
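The effect of the MERGE on this slide can be checked with a small Python model. It assumes the figures on the slide read as id/amount pairs (for example id 4 with amount 3000); `merge_sales` is a hypothetical helper that mirrors only the update-or-insert logic, not how Oracle executes the statement.

```python
from typing import Dict

def merge_sales(sales: Dict[int, int], new_sales: Dict[int, int]) -> Dict[int, int]:
    """Return sales after merging new_sales: add to matching ids, insert the rest."""
    merged = dict(sales)  # leave the input untouched
    for sale_id, amount in new_sales.items():
        if sale_id in merged:
            # WHEN MATCHED THEN UPDATE SET amount = amount + new amount
            merged[sale_id] += amount
        else:
            # WHEN NOT MATCHED THEN INSERT
            merged[sale_id] = amount
    return merged

sales = {4: 2000, 7: 3000, 8: 5000}
new_sales = {4: 3000, 8: 1000, 9: 2000}
result = merge_sales(sales, new_sales)
```

Ids 4 and 8 are updated, id 9 is inserted and id 7 is left alone, all in a single pass over the new rows.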

9 MERGE Advantages
Single simple SQL statement
Can be parallelized
Can use Bulk DML
Fewer scans of the base table

10 More ETL Features
Direct-path Interface
  – SQL*Loader
  – CREATE AS SELECT
  – INSERT
  – Oracle Call Interface
Multi-table INSERTs

11 Agenda
ETL Features
Data Warehouse Management
  – Partitioning
  – Materialized Views
  – DBMS_STATS
Data Warehouse Querying
Parallel Operations

12 Partitioning
Table Sales
Partition   Tablespace
Jan'2002    0102
Feb'2002    0202
...
Dec'2002    1202

13 Advantages of Partitioning
Partition independence
  – LOAD, MOVE, PURGE and DROP partitions
  – MERGE, SPLIT, EXCHANGE partitions
  – BACKUP, RESTORE, SET READ ONLY
Partition elimination
  – SELECT or JOIN only the partition needed
Parallel Operations
  – SELECT, UPDATE, DELETE, MERGE

14 Partitioning Methods
Hash Partitioning
  – Even row distribution by hash function
Range Partitioning
  – <01.01.2002 | <01.02.2002 |... | <01.01.2003
List Partitioning
  – Stuttgart, Munich | Mannheim, Frankfurt |...
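The three methods differ only in how a row's key is mapped to a partition. The sketch below models that mapping in Python and uses the range mapping to show partition elimination. The monthly bounds and city lists follow the slide; the partition numbering, the use of `crc32` as the hash and the helper names are assumptions for illustration, and Oracle's own hash function is different.

```python
import bisect
import zlib
from datetime import date

# Range partitioning: exclusive upper bounds 01.01.2002, 01.02.2002, ..., 01.01.2003.
RANGE_BOUNDS = [date(2002, m, 1) for m in range(1, 13)] + [date(2003, 1, 1)]

# List partitioning: explicit value lists per partition.
LIST_PARTITIONS = {"Stuttgart": 0, "Munich": 0, "Mannheim": 1, "Frankfurt": 1}

def range_partition(day):
    """Index of the first partition whose upper bound is greater than day."""
    idx = bisect.bisect_right(RANGE_BOUNDS, day)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"{day} is beyond the highest partition bound")
    return idx

def list_partition(city):
    """Partition that lists this city explicitly."""
    return LIST_PARTITIONS[city]

def hash_partition(key, partitions=4):
    """Spread keys evenly; crc32 is only a stand-in for the real hash."""
    return zlib.crc32(str(key).encode("utf-8")) % partitions

def partitions_to_scan(first_day, last_day):
    """Partition elimination: only partitions overlapping the date range."""
    return list(range(range_partition(first_day), range_partition(last_day) + 1))

needed = partitions_to_scan(date(2002, 2, 10), date(2002, 3, 5))
```

A query restricted to 10 February through 5 March 2002 touches two of the thirteen partitions in this model; the rest are never read.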

15 Table Compression
Stores tables or partitions in compressed format
Reduces disk space requirements
Reduces memory requirements
Speeds up query execution
Speeds up backup and recovery
Very efficient for highly redundant data – the FACT table
2 to 4 times compression is usual

16 Materialized Views
Diagram: sales (region, month, invc_sum, ...) -> revenue_sum (region, month, revenue)
SELECT region, month, SUM(invc_sum) revenue
FROM sales
GROUP BY region, month

17 Advantages of Materialized Views
Improved query/reporting performance for:
  – Summaries
  – Aggregates
  – Joins
Fast Refresh
  – Data change tracking
  – Partition change tracking
No application change needed – their usage is automatic
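The idea behind fast refresh can be sketched in Python: the summary is built once, and afterwards only the tracked changes are applied instead of re-aggregating the whole table. The rows are hypothetical and the model covers inserted rows only; it says nothing about how Oracle stores its change logs.

```python
from collections import defaultdict

def full_build(sales):
    """Aggregate every row: SUM(invc_sum) GROUP BY region, month."""
    summary = defaultdict(int)
    for region, month, invc_sum in sales:
        summary[(region, month)] += invc_sum
    return dict(summary)

def fast_refresh(summary, change_log):
    """Apply only the newly inserted rows to an existing summary."""
    refreshed = dict(summary)
    for region, month, invc_sum in change_log:
        key = (region, month)
        refreshed[key] = refreshed.get(key, 0) + invc_sum
    return refreshed

# Hypothetical base rows and newly loaded rows.
sales = [("West", "2002-01", 100), ("West", "2002-01", 50), ("East", "2002-01", 70)]
new_rows = [("West", "2002-01", 25), ("East", "2002-02", 40)]

revenue_sum = fast_refresh(full_build(sales), new_rows)
```

The refreshed summary equals a full rebuild over all rows, but the work done is proportional to the number of changed rows.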

18 DBMS_STATS
New package for gathering table and index statistics
Gathers statistics in parallel
Can export and import statistics
Diagram: Statistics moved between the Production Data Warehouse and the Development Data Warehouse

19 More Data Warehouse Management Features
Index-organized tables
Online index rebuild
Online table rebuild

20 Agenda
ETL Features
Data Warehouse Management
Data Warehouse Querying
  – Bitmap Indexing
  – Star Query Transformation
  – Aggregation – ROLLUP, CUBE, Grouping Sets
  – Analytic functions
Parallel Operations

21 Bitmap Indexes
Diagram: one bitmap column per Region value, one bit per rowid
Region   east  central  west  NULL
rowid    1     0        0     0
rowid    0     1        0     0
rowid    0     0        1     0
...
rowid    0     0        0     1
Example: (1 0 0 0 OR 0 1 0 0) AND NOT (0 0 1 0) = 1 1 0 0

22 Advantages of Bitmap Indexes
Reduced response time for ad hoc queries
Uses much less space than a B-tree index
Dramatic performance gains for a large class of queries:
  – Multiple AND, OR and NOT conditions
  – IS NULL conditions
  – COUNT
  – NOT IN - Bitmap MINUS
  – BETWEEN - Bitmap UNION
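A bitmap index can be modelled in Python with one integer per distinct value, where bit i is set when row i holds that value. AND, OR, NOT, IS NULL and COUNT then reduce to bit operations. The region values are hypothetical, and the sketch ignores SQL's three-valued logic for NULL as well as the compression Oracle applies to real bitmaps.

```python
# Hypothetical Region column; list position plays the role of the rowid.
regions = ["east", "central", "west", None, "east", "west"]

def build_bitmap_index(values):
    """One integer bitmap per distinct value; bit i is set if row i has it."""
    index = {}
    for pos, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << pos)
    return index

def rows_of(bitmap):
    """Row positions whose bit is set in the bitmap."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]

index = build_bitmap_index(regions)
all_rows = (1 << len(regions)) - 1

# (region = 'east' OR region = 'central') AND NOT region = 'west'
east_or_central = index["east"] | index["central"]
not_west = all_rows & ~index["west"]
matches = rows_of(east_or_central & not_west)

# IS NULL is just another bitmap; COUNT is a count of set bits.
null_rows = rows_of(index[None])
count_east = bin(index["east"]).count("1")
```

No table row is touched until the final bitmap is converted back to row positions, which is why combining many conditions stays cheap.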

23 Star Query Transformation
The query is re-written for efficient execution
Fact table: sales (cust_id, prod_id, q_id, amount)
Dimension tables: customers (cust_id, name), products (prod_id, name), quarters (q_id, name)
Steps:
1. Filter all dimensions
2. Combine the bitmap indexes of the fact table's foreign keys
3. Retrieve the fact rows and the remaining dimension rows
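The three steps can be walked through on a toy star schema in Python. Table names follow the slide; the rows, the filter values and the use of Python sets in place of real bitmaps are assumptions made to keep the sketch short.

```python
# Hypothetical dimension tables: key -> name.
customers = {1: "Acme", 2: "Bolt"}
products = {10: "Disk", 11: "Tape"}
quarters = {1: "Q1", 2: "Q2"}

# Hypothetical fact table rows: (cust_id, prod_id, q_id, amount).
sales = [
    (1, 10, 1, 100),
    (1, 11, 1, 200),
    (2, 10, 2, 300),
    (1, 10, 2, 400),
]

def fk_index(column):
    """Map each foreign-key value to the set of fact row positions holding it."""
    index = {}
    for pos, row in enumerate(sales):
        index.setdefault(row[column], set()).add(pos)
    return index

def combine(index, keys):
    """OR together the row sets of all qualifying keys of one dimension."""
    rows = set()
    for key in keys:
        rows |= index.get(key, set())
    return rows

cust_idx, prod_idx, q_idx = fk_index(0), fk_index(1), fk_index(2)

# Step 1: filter each dimension down to the qualifying keys.
cust_keys = [k for k, name in customers.items() if name == "Acme"]
prod_keys = [k for k, name in products.items() if name == "Disk"]
q_keys = [k for k, name in quarters.items() if name in ("Q1", "Q2")]

# Step 2: OR within a dimension, AND across dimensions.
hits = combine(cust_idx, cust_keys) & combine(prod_idx, prod_keys) & combine(q_idx, q_keys)

# Step 3: fetch only the matching fact rows and join back to the dimensions.
result = [
    (customers[c], products[p], quarters[q], amount)
    for c, p, q, amount in (sales[pos] for pos in sorted(hits))
]
```

Only two of the four fact rows are ever fetched; the other two are ruled out by the index combination alone.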

24 Aggregation Operators
Oracle extends the GROUP BY clause by:
  – ROLLUP
  – CUBE
  – Grouping Sets
SELECT SUM(amount) FROM sales GROUP BY country, quarter
        UK     US     Total
Q1      1000   3000   4000
Q2      1500   5000   6500
Total   2500   8000   10500

25 ROLLUP and CUBE
ROLLUP – subtotals at increasing levels of aggregation – from right to left
CUBE – subtotals on all combinations
ROLLUP(country, department, quarter): n+1 groupings
  (country, department, quarter)
  (country, department)
  (country)
  () - Grand Total
CUBE(country, department, quarter): 2^n groupings
  (country, department, quarter)
  (country, department)
  (country, quarter)
  (department, quarter)
  (country)
  (department)
  (quarter)
  () - Grand Total
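The grouping counts n+1 and 2^n can be verified by enumerating the grouping sets in Python. The helper names are hypothetical; the sample figures are the UK/US, Q1/Q2 amounts from the preceding Aggregation Operators slide, and a column that is aggregated away is shown as None, much as SQL shows NULL in subtotal rows.

```python
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP drops columns from right to left: n+1 grouping sets."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE uses every combination of columns: 2**n grouping sets."""
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]

def aggregate(rows, cols, grouping_sets):
    """SUM(amount) for each grouping set; aggregated-away columns become None."""
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            totals[key] = totals.get(key, 0) + row["amount"]
    return totals

dims = ["country", "department", "quarter"]
rollup = rollup_sets(dims)
cube = cube_sets(dims)

rows = [
    {"country": "UK", "quarter": "Q1", "amount": 1000},
    {"country": "US", "quarter": "Q1", "amount": 3000},
    {"country": "UK", "quarter": "Q2", "amount": 1500},
    {"country": "US", "quarter": "Q2", "amount": 5000},
]
cols = ["country", "quarter"]
cube_totals = aggregate(rows, cols, cube_sets(cols))
```

For three columns this yields 4 rollup groupings and 8 cube groupings, and the cube over country and quarter reproduces every subtotal and the grand total of 10500.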

26 Aggregation Operators Advantages
Applicable on many aggregation functions:
  – SUM, AVG, COUNT
  – MIN, MAX
  – STDDEV, VARIANCE
Flexible aggregation groups and levels
Runs in parallel

27 Analytic functions
Significantly improved performance for complex reports such as:
  – Ranking – Find top 10 sales in each region
  – Moving aggregates – What is the 90 day moving sales average?
  – Period-over-period comparison – What are the revenues from January 2002 compared to January 2001?

28 Example – Moving Window
SELECT c.cust_id, t.month, SUM(amount_sold) SALES,
       AVG(SUM(amount_sold)) OVER (ORDER BY c.cust_id, t.month ROWS 2 PRECEDING) MOV_3_MONTH
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
  AND s.cust_id = c.cust_id
  AND t.year = 1999
  AND c.cust_id IN (6380)
GROUP BY c.cust_id, t.month
ORDER BY c.cust_id, t.month;

CUST_ID MONTH   SALES   MOV_3_MONTH
------- ------- ------- -----------
   6380 1999-01  19,642      19,642
   6380 1999-02  19,324      19,483
   6380 1999-03  21,655      20,207
   6380 1999-04  27,091      22,690
   6380 1999-05  16,367      21,704
   6380 1999-06  24,755      22,738
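The MOV_3_MONTH column can be reproduced in Python from the monthly SALES figures shown on this slide. The sketch models only the `ROWS 2 PRECEDING` window, which averages the current row and up to two rows before it; `moving_average` is a hypothetical helper, and rounding to whole numbers is assumed to match the displayed output.

```python
# Monthly sales for customer 6380, taken from the slide's output.
sales_by_month = [
    ("1999-01", 19642),
    ("1999-02", 19324),
    ("1999-03", 21655),
    ("1999-04", 27091),
    ("1999-05", 16367),
    ("1999-06", 24755),
]

def moving_average(values, preceding=2):
    """Average of each value and up to `preceding` values before it."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        result.append(sum(window) / len(window))
    return result

amounts = [amount for _, amount in sales_by_month]
mov_3_month = [round(avg) for avg in moving_average(amounts)]
```

The first two rows average over fewer than three months because the window is cut off at the start of the data, which explains why the first MOV_3_MONTH value equals the first SALES value.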

29 More Data Warehouse Querying Features
Function-based Indexes
Optimizer Plan Stability
Statistics for Long Running Operations
Resumable Statements
Full Outer Join
WITH Operator
Oracle Text
  – "Advanced Searching with Oracle Text", 14.11.2002, 2nd Conference day, 11:50-12:30, Konferenzraum EG (conference room, ground floor)

30 Agenda
ETL Features
Data Warehouse Management
Data Warehouse Querying
Parallel Operations

31 Parallel Operations
Dramatically reduce execution time of data intensive operations
Loading
  – Direct Path Load
DDL Statements
  – CREATE AS SELECT, CREATE INDEX
  – REBUILD INDEX, REBUILD INDEX PARTITION
  – MOVE, SPLIT, COALESCE PARTITION
DML Statements
  – INSERT AS SELECT
  – UPDATE, DELETE and MERGE

32 Parallel Operations
Access methods
  – Table and index range and full scans
Join methods
  – Nested loops, Sort merge, Hash, Star transformation
SQL operations
  – GROUP BY, ROLLUP, CUBE
  – DISTINCT, UNION, UNION ALL
  – Aggregate functions
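The general shape of a parallel GROUP BY can be sketched in Python: split the rows, aggregate each piece independently, then merge the partial results. This is a structural illustration only; the data and function names are hypothetical, Python threads do not give real CPU parallelism here, and the sketch is not a description of Oracle's parallel execution internals.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sums(rows):
    """Aggregate one piece of the data: SUM(amount) GROUP BY region."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def parallel_group_sum(rows, degree=2):
    """Split rows into `degree` pieces, aggregate each, merge the partials."""
    chunk = max(1, -(-len(rows) // degree))  # ceiling division
    pieces = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = list(pool.map(partial_sums, pieces))
    combined = {}
    for partial in partials:
        for region, amount in partial.items():
            combined[region] = combined.get(region, 0) + amount
    return combined

# Hypothetical rows: (region, amount).
rows = [("West", 30), ("Central", 50), ("East", 20), ("West", 10)]
totals = parallel_group_sum(rows, degree=2)
```

The merge step is what makes SUM, COUNT, MIN and MAX easy to parallelize: partial results from each piece combine into the same answer a single scan would give.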

33 Parallel System Requirements
Symmetric Multiprocessor Systems, Clusters or Massively Parallel Systems
Sufficient I/O Bandwidth
Sufficient (Underutilized) CPUs
Sufficient Memory

34 Summary
Effective handling of multi-terabyte Data Warehouses
Rich feature set for all Data Warehouse operations
Flexible aggregation and analytical features for high performance queries
Effective parallelism

35 Want to know more?
Meet us here -> booth 2C on the ground floor
Company: Semantec GmbH.
Name: Krasen Paskalev, Armin Singer, Peter Kopecki
Address: Benzstr. 32, D-71083 Herrenberg, Germany
Telephone / Fax: +49(7032)9130-0, +49(7032)9130-12, +49(7032)9130-22
E-Mail: krasen.paskalev@semantec.bg, singer@semantec.de
Internet: www.semantec.de

