Presentation is loading. Please wait.

Presentation is loading. Please wait.

Extreme Performance with Oracle Data Warehousing Andreas Katsaris Data warehouse and BI Practice Manager, Arisant LLC.

Similar presentations

Presentation on theme: "Extreme Performance with Oracle Data Warehousing Andreas Katsaris Data warehouse and BI Practice Manager, Arisant LLC."— Presentation transcript:

1 Extreme Performance with Oracle Data Warehousing Andreas Katsaris Data warehouse and BI Practice Manager, Arisant LLC

2 Source: IDC, July 2009 – Worldwide Data Warehouse Management Tools 2008 Vendor Shares Oracle #1 for Data Warehousing

3 Gartner Magic Quadrant for BI Platforms

4 Market Position Recognized as a leader since OWB ,000+ companies rely on OWB Optimized for Oracle environments 2005

5 Oracle Data Warehousing Complete, Open and Integrated Standard components Certified configurations Comprehensive security Higher availability Easier to manage Lower cost of ownership Smart Storage Operating System Database Data Models BI Tools BI Applications

6 Distributed Data Marts and Servers Expensive Data Warehouse Architecture DataMarts Data Mining OnlineAnalytics ETL

7 Consolidate onto a Data Warehouse Single Source of Truth on Low-Cost Servers and Storage Oracle Database 11g DataMarts Data Mining OnlineAnalytics ETL

8 Choice of Data Warehouse Solutions Database Machine Complete system, including software, servers, networking and storage Custom Solutions Flexibility for the most demanding data warehouse Reference Configurations Best-practice configurations for data warehousing

9 Drastically Simplified Deployments Database Machine eliminates the complexity of deploying database systems –Months of configuration, troubleshooting, tuning Database Machine is ready on day one –Pre-built, tested, standard, supportable configuration –Runs existing applications unchanged Extreme performance out of the box Months to Days

10 Best Data Warehouse Machine Massively parallel high volume hardware to quickly process vast amounts of data –Exadata runs data intensive processing directly in storage Most complete analytic capabilities –OLAP, Statistics, Spatial, Data Mining, Real-time transactional ETL, Efficient point queries Powerful warehouse specific optimizations –Flexible Partitioning, Bitmap Indexing, Join indexing, Materialized Views, Result Cache Dramatic new warehousing capabilities Data Mining OLAP ETL New

11 Sun Oracle Database Machine Hardware Improvements Same architecture as previous Database Machine –Same number and type of Servers, CPUs, Disks Plus Flash Storage! New Latest Technologies Faster Better 80% Faster CPUs 33% More SAS Disk Capacity 100% More SATA Disk Capacity 50% Faster Disk Throughput 100% Faster Networking 125% More Memory 200% Faster Memory 100% More Ethernet Connectivity Xeon 5500 Nehalem 600 GB SAS Disks 2 TB SATA Disks 6 Gb SAS Links 40 Gb InfiniBand 72 GB per DB Node DDR3 DRAM 4 Ethernet links per DB Node New

12 Exadata Database Processing in Storage Exadata storage servers implement data intensive processing in storage –Row filtering based on where predicate –Column filtering –Join filtering –Incremental backup filtering –Storage Indexing –Scans on encrypted data –Data Mining model scoring 10x reduction in data sent to DB servers is common No application changes needed –Processing is automatic and transparent –Even if cell or disk fails during a query

13 Simple Query Example What were my sales yesterday? Exadata Storage Grid SUM Scan compressed blocks in partitions/indexes Retrieve sales amounts for Sept 24 Scan compressed blocks in partitions/indexes Retrieve sales amounts for Sept 24 Optimizer Chooses Partitions and Indexes to Access Select sum(sales) where Date=24-Sept Select sum(sales) where Date=24-Sept 10 TB scanned 1 GB returned to servers 10 TB scanned 1 GB returned to servers Oracle Database Grid

14 Flash Flash storage more than doubles scan throughput –50 GB/sec Smart –Knows when to avoid caching based on object reusability and size –Accepts user directives at table, index and segment level Combined with Columnar compression –Up to 50 TB of data fits in flash –Queries on compressed data run up to 500 GB/sec 50 HITACHI USP V TERADATA 2550 NETEZZA TwinFin 12 SUN ORACLE Database Machine Query Throughput Uncompressed Data

15 50X Up To Exadata Hybrid Columnar Compression Data is stored by column and then compressed –the decompression of the data is offloaded to Exadata eliminating CPU overhead on the database servers Query Mode for data warehousing –Optimized for speed –10X compression ratio is typical –Scans improve proportionally Archival Mode for infrequently accessed data –Optimized to reduce space –15X compression is typical –Up to 50X for some data

16 Exadata Storage Index Transparent I/O Elimination with No Overhead Exadata Storage Indexes maintain summary information about table data in memory –Store MIN and MAX values of columns –Typically one index entry for every MB of disk Eliminates disk I/Os if MIN and MAX can never match where clause of a query Completely automatic and transparent ABCD Min B = 1 Max B =5 Table Index Min B = 3 Max B =8 Select * from Table where B<2 - Only first set of rows can match

17 Benefits Multiply 1 TB with compression 10 TB of user data Requires 10 TB of IO 100 GB with partition pruning 20 GB with Storage Indexes 5 GB Smart Scan on Memory or Flash Subsecond On Database Machine Data is 10x Smaller, Scans are 2000x faster

18 Database Machine Success Database Machine is succeeding in all geographies and industries against every competitor

19 Oracles Data Integration Strategy Pervasive Data Integration Embed Data Integration within Oracle Database –Integrated, optimized and Best for Oracle Database –Easiest way to load external information into Oracle Database Provide Comprehensive Data Integration –Comprehensive Heterogeneous Technology Foundation –Integrated Runtime, Data Management Tools and Administration –Best of Breed: significant architectural differentiators vs. competitors –Hot Pluggable: broad support of sources & packaged applications Pre-Integrate Solutions for Oracle Portfolio –Make data integration pervasive with Lower Cost & Complexity

20 ODI – Enterprise Edition Bundling E-LT Forces E-LT Transformation vs. E-T-L Real Time Change Data Capture Oracle Data Integrator & Oracle Warehouse Builder High Speed Batch Data Movement Data Warehouse Planning System OLTP Sources Application Sources Legacy Sources Set-based Data Transformations

21 OWB 11g Release 2 Heterogeneous connectivity with high performance native code Extensible heterogeneous connectivity –Connect to any JDBC, ODBC or gateway enabled data store –Enhanced support for Change Data Capture and real-time extracts –Advanced ERP/CRM/MDM connectors –Call any web service as a data provider, publish extracts as web services High-performance native code for any platform –Use extensible code templates to create native code –Leverage the database engine for best performance –Use native un-loaders/loaders for extremely fast bulk loading Enhanced Developer Productivity –New declarative user interface based on Fusion Client Platform –Rich metadata support for end-to-end data lineage and impact analysis –Oracle BI EE metadata generation

22 Unified dev team High-Level Roadmap Unified platform that is a superset of OWB and ODI OWB/ODI investments are fully protected No forced migrations Natural upgrade path No regressions in functionality Training and support to continue for both product releases 10gR2 10gR1 11gR1 10gR3 11gR1 11gR2 OWB ODI CY 2012Common pricing

23 Oracle Runs the Largest Databases Meteorological Research –220+ Terabyte Oracle database –Worlds largest database on Linux Website Personalization –500,000,000 unique users –200 Terabyte Data Warehouse How are these companies successful using Oracle?

24 Q : What was the total number of umbrellas sold in Boston during the month of May 2008 ? CustomersTimes Products Channel Sales Star Query Optimization Specific Data Warehouse Access Methods

25 Oracle Partitioning Optimized performance –Comprehensive partitioning strategies to address business problems One consistent way to manage all your data –Not just for data warehouse and high-end OLTP –New partitioning features bring partitioning to every application Reduced total cost of ownership –Place less used data on lower cost storage Proven functionality in 8th generation –Experience comes with age and customer usage

26 The Concept of Partitioning Simple yet powerful Large Table Difficult to Manage Partition Divide and Conquer Easier to Manage Improve Performance Composite Partition Higher Performance More flexibility to match business needs SALES JanFeb SALES JanFeb Europe USA

27 Q : What was the total sales amount for May 20 and May ? Select sum(sales_amount) From SALES Where sales_date between to_date(05/20/2009,MM/DD/YYYY) And to_date(05/22/2009,MM/DD/YYYY); Sales Table 5/17 5/18 5/19 5/20 5/21 5/22 Only the 2 relevant partitions are read Partition for Performance Partition Pruning

28 5% Active 95% Less Active SALES TABLE (7 years) High End Storage Tier 2003 Partition to Manage Data Growth Low End Storage Tier 2-3x less per terabyte

29 Partitioning - 11g enhancements Partition Advisor New composite combinations –list/range, range/range, list/hash, list/list Interval Partitioning –Automatic creation of range-based partitions REF Partitioning –Partition detail table based on the master-table key Virtual-Column Based Partitioning –Partition based on an expression

30 Interval Partitioning Minimizes periodic partition maintenance (No need to create new partitions) Partition segments allocated as soon as new data arrives Local indexes are created and maintained as well Requires at least one range partition Range key value determines the range high point Partitioning key can only be a single column, and either DATE or NUMBER datatype CREATE TABLE sales(…) PARTITION BY RANGE (sales_date) INTERVAL(NUMTOYMINTERVAL(1, 'MONTH')) ( PARTITION p1 VALUES LESS THAN (TO_DATE(' ', 'DD-MM-YYYY')) );

31 ILM – Information Lifecycle Management Why Bother –Compliance –Performance –Cost –Data Maintenance 5% - most active 35% - less active 60% - historical

32 ILM – Information Lifecycle Management Implementation of different tiers of storage Consider Oracle ILM Assistant (free!) –Leverages Oracle Partitioning –Uses Lifecycle Definitions –Calculates storage costs & savings –Simulates the impact of partitioning on a table –Advises how to partition a table –Generates scripts to move data when required

33 In-Memory Parallel Execution Challenge Traditionally Parallel Execution takes advantage of the IO capacity of a system Disk speeds are not keeping up with Moores law while CPU and Memory are Solution In-Memory Parallel Execution harness the memory capacity of the entire system An affinity algorithm places fragments of a object in memory on different RAC nodes New

34 In-Memory Parallel Execution SQL statement Determine the size of the table being looked at Read into the buffer cache on any node Table is extremely small Table is a good candidate for In-Memory Parallel Execution Always use direct read from disk Table is extremely Large Fragments of Table are read into each nodes buffer cache Only parallel server on the same RAC node will access each fragment

35 In-Memory Parallel Execution A single database machine has over 400GB of memory usable for caching Database release 11.2 introduces parallel query processing on memory cached data –Harnesses memory capacity of entire database cluster for queries –Foundation for world record 1TB TPC-H Exadata Hybrid Columnar Compression enables multi-terabyte tables or partitions to be cached in memory QphH: 1 TB TPC-H Faster than in-memory specialized startups Memory has 100x more bandwidth than Disk Source: Transaction Processing Council, as of 9/14/2009: Oracle on HP Bladesystem c-Class 128P RAC, 1,166,976 available 12/1/09. Exasol on PRIMERGY RX300 S4, 1,018,321 available 08/01/08. ParAccel on SunFire X ,842 $4.57 available 10/29/07.SunFire X4100

36 Automatic Degree of Parallelism Auto DOP Optimizer derives the DOP from the statement based on resource requirements for all scans operations Applies to all types of statements Query, DML, or DDL Explain plan has been enhanced to show DOP selected SQL Tune now uses Auto DOP to recommend parallelism

37 Parallel Statement Queuing SQL statements Statement is parsed and oracle automatically determines DOP If enough parallel servers available execute immediately If not enough parallel servers available queue statements FIFO Queue When the required number of parallel servers become available the first stmt on the queue is dequeued and executed

38 Summary management Materialized Views Separate database object –Stores pre-calculated and aggregated results Database supports sophisticated transparent query rewrite –Queries don't change – run against base tables –Join-backs, additional aggregations etc. supported Database supports incremental fast refresh –Based on the query definition

39 Enhanced MV Refresh Refresh time for conventional insert, aggregate MV On average 30 – 40% better refresh performance in 11g Release 2

40 Summaries Sales by Region Sales by Date Sales by Product Sales by Channel Query Rewrite Summary Management Today RegionDate Product Channel SQL Query Relational Star Schema

41 Cube Organized Materialized Views Summaries RegionDate Product Channel SQL Query Query Rewrite Automatic Refresh OLAP Cube

42 Cubes and Cube Organized MViews Stored in special db areas called Analytic Workspaces (which are stored in BLOBs) Manipulated via Analytic Workspaces Manager tool 11g- Best of both worlds –Rewrite and refresh features of regular MVs –Performance benefits of OLAP cubes –Expose OLAP cube as a relational object access via SQL –CUBE_TABLE function searches cube using SQL

43 Cubes and Cube Organized MViews SQL> explain plan for 2 select * from table(cube_table('GLOBAL.PRICE_CUBE')); PLAN_TABLE_OUTPUT Plan hash value: | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | | 0 | SELECT STATEMENT | | 2000 | 195K| 29 (0)| 00:00:01 | | 1 | CUBE SCAN PARTIAL OUTER| PRICE_CUBE | 2000 | 195K| 29 (0)| 00:00:01 | Query cube as if it was a relational object Refresh using DBMS_MVIEW.REFRESH

44 Customer Case Study Reporting Application Cube-organized MVs replaced table-based MVs Time to build aggregate data reduced by 89% Longer running queries reduced from 5 minutes to 12 seconds Transparent access to cube-MV No changes to reporting applications

45 Advanced Compression Table Compression –Table Scan Performance: 2x faster –Storage Savings: 2x smaller –DML Performance: 5% slower CREATE TABLE SALES_FACT (…) COMPRESS FOR ALL OPERATIONS; RMAN Compression –~40% faster than compressed backups in 10g –Slightly better compression ratio than in 10g RMAN> CONFIGURE COMPRESSION ALGORITHM zlib; RMAN> backup as COMPRESSED BACKUPSET database archivelog all; Data Pump Compression expdp hr FULL=y DUMPFILE=dpump_dir:full.dmp COMPRESS;

46 Advanced Compression

47 SQL Query Result Cache Caching of query results or PL/SQL function calls DML/DDL against dependent database objects invalidates cache Candidate queries –access many, many rows –return few rows (small result set) –executed many times

48 SQL Query Result Cache result_cache_mode init.ora parameter –AUTO (optimizer uses repetitive executions to determine if query will be cached) –MANUAL (need use /*+ RESULT_CACHE */ hint in queries) –FORCE (All results are stored in cache) result_cache_max_size init.ora parameter –default is dependent on other memory settings –(0.25% of memory_target or 0.5% of sga_target or 1% of shared_pool_size) –0 disables result cache –never >75% of shared pool (built-in restriction)

49 Q&A Thank you for attending If you have follow-up questions I will be here for the rest of the day or can be contacted by -

Download ppt "Extreme Performance with Oracle Data Warehousing Andreas Katsaris Data warehouse and BI Practice Manager, Arisant LLC."

Similar presentations

Ads by Google