The Oracle9i Multi-Terabyte Data Warehouse
Jeff Parker, Manager, Data Warehouse Development, Amazon.com
Session id:

The Challenges
  Rapidly evolving business
  Growing data volumes
  Do more with less

The Challenges
  Rapidly evolving business
    – New international markets
    – Continual innovation of features on Amazon
        Buy it used
        Magazine subscriptions
    – Marketplace Partnerships – Toys R Us, Target
  Growing data volumes
  Do more with less

The Challenges
  Rapidly evolving business
  Growing data volumes
    – 2X growth yearly over the past 5 years
    – Currently 10 Terabytes of raw data
  Do more with less

The Challenges
  Rapidly evolving business
  Growing data volumes
  Do more with less
    – Innovative use of technology and resources
    – Throwing money and people at the problem is not an option
    – Leverage existing investment in Oracle

Addressing the issues
  Rapidly evolving business
    – Denormalize only for performance reasons
    – Create a solution that allows new datasets to be brought into the DW rapidly, without high maintenance costs
  Growing data volumes
  Do more with less

Addressing the issues
  Rapidly evolving business
  Growing data volumes
    – Dual database approach to ETL
        Staging database for efficient transformation of large datasets. SQL and hash-joins allow transforms to scale in a non-linear fashion
        Second database optimized for analytics
    – Oracle as an API
        Simplifies ETL architecture
        Better scalability than traditional ETL tools
  Do more with less

Addressing the issues
  Rapidly evolving business
  Growing data volumes
  Do more with less
    – One DW schema supports all countries
    – Cut costs by eliminating unneeded software
    – Data driven Load functionality

The ETL Process
  Extract data from source
  The Load Process
  Dimensional Transforms

The ETL Process
  Extract data from source
    – Can create one or more files to be loaded
    – Must produce Metadata upon which the Load process can depend
  The Load Process
  Dimensional Transforms

Extract produced Metadata
  Describes each field in database type terms
  Changes as the dataset changes
  Can reference multiple files
  Very reliable
  No additional overhead

XML Based Metadata
  <COLUMN ID="dataset_id" DATA_TYPE="NUMBER" DATA_PRECISION="38" DATA_SCALE="0"/>
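As a rough illustration of how a load process might consume this metadata, the sketch below parses COLUMN elements into Oracle column definitions. Only the COLUMN element and its ID, DATA_TYPE, DATA_PRECISION and DATA_SCALE attributes come from the slide. The DATASET wrapper element, the DATA_LENGTH attribute and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the slide shows only a single <COLUMN/> entry.
# The <DATASET> wrapper and the DATA_LENGTH attribute are assumptions.
METADATA = """
<DATASET>
  <COLUMN ID="dataset_id" DATA_TYPE="NUMBER" DATA_PRECISION="38" DATA_SCALE="0"/>
  <COLUMN ID="dataset_name" DATA_TYPE="VARCHAR2" DATA_LENGTH="80"/>
  <COLUMN ID="creation_date" DATA_TYPE="DATE"/>
</DATASET>
"""


def column_definitions(xml_text):
    """Return (column_name, oracle_type) pairs, one per COLUMN element."""
    columns = []
    for col in ET.fromstring(xml_text.strip()).iter("COLUMN"):
        data_type = col.get("DATA_TYPE")
        if data_type == "NUMBER" and col.get("DATA_PRECISION"):
            scale = col.get("DATA_SCALE", "0")
            oracle_type = f"NUMBER({col.get('DATA_PRECISION')},{scale})"
        elif data_type == "VARCHAR2" and col.get("DATA_LENGTH"):
            oracle_type = f"VARCHAR2({col.get('DATA_LENGTH')})"
        else:
            # Types without size information (e.g. DATE) pass through unchanged.
            oracle_type = data_type
        columns.append((col.get("ID"), oracle_type))
    return columns


for name, oracle_type in column_definitions(METADATA):
    print(name, oracle_type)
```

Keeping the type information in database terms means the loader needs no type inference of its own; it can copy the declared types straight into DDL.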

The ETL Process
  Extract data from source
  The Load Process
    – Makes extensive use of External Tables
    – MERGE and Bulk Insert
    – Contains integrated DBA tasks
    – Every load is tracked in an operational database
  Dimensional Transforms

The Load Process

External Tables
  – Provide access to files on the operating system
  – Are a building block in a broader ETL process
MERGE & Bulk Insert
Integrated DBA tasks

The External Table
  Created by using Metadata from the Extract process
  Data is read-only
  No indexes
  Use DBMS_STATS to set number of rows
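Because an external table has no stored statistics of its own, the optimizer benefits from being told how many rows to expect. A minimal sketch of that step follows, assuming the row count is taken from the flat file itself. The owner name DW, the sample records and both function names are hypothetical; DBMS_STATS.SET_TABLE_STATS with its ownname, tabname and numrows parameters is the standard Oracle procedure.

```python
def count_records(lines):
    """Count newline-delimited records from any iterable of lines (e.g. an open file)."""
    return sum(1 for _ in lines)


def set_row_count_call(owner, table, num_rows):
    """Build a PL/SQL block that records num_rows as the table's row-count statistic."""
    return (
        "BEGIN\n"
        f"  DBMS_STATS.SET_TABLE_STATS(ownname => '{owner}', "
        f"tabname => '{table}', numrows => {num_rows});\n"
        "END;"
    )


# Hypothetical usage: three records destined for the external table.
sample = ["1\tsales\n", "2\tclicks\n", "3\treturns\n"]
print(set_row_count_call("DW", "XT_DATASETS_77909", count_records(sample)))
```

In practice the iterable would be the open flat file, so the count costs one sequential pass before the load starts.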

Example External Table
  1. Copy the data to the database server
     Data must reside in a file system location specified by the DBAs.
       create directory DAT_DIR as '/stage/flat'

Example External Table
  2. Create the external table using the DDL from the extract.
       CREATE TABLE XT_datasets_77909 (
         dataset_id    NUMBER,
         dataset_name  VARCHAR2(80),
         creation_date DATE,
         created_by    VARCHAR2(8)
       )
       ORGANIZATION EXTERNAL (
         TYPE ORACLE_LOADER
         DEFAULT DIRECTORY dat_dir
         ACCESS PARAMETERS (
           records delimited by newline characterset UTF8
           fields terminated by '\t'
         )
         LOCATION ('/flat/datasets_ _US.txt')
       )
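Since the external table is created from extract metadata, the DDL above lends itself to generation. The sketch below shows one way that could look. It is not the presenter's generator: the function name, the table name XT_DATASETS and the file name datasets.txt are placeholders chosen for the example.

```python
def external_table_ddl(table, columns, directory, filename, delimiter="\\t"):
    """Build CREATE TABLE ... ORGANIZATION EXTERNAL DDL.

    columns is a list of (name, oracle_type) pairs, for example as derived
    from the extract metadata.
    """
    col_sql = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE TABLE {table} (\n  {col_sql}\n)\n"
        "ORGANIZATION EXTERNAL (\n"
        "  TYPE ORACLE_LOADER\n"
        f"  DEFAULT DIRECTORY {directory}\n"
        "  ACCESS PARAMETERS (\n"
        "    RECORDS DELIMITED BY NEWLINE CHARACTERSET UTF8\n"
        f"    FIELDS TERMINATED BY '{delimiter}'\n"
        "  )\n"
        f"  LOCATION ('{filename}')\n"
        ")"
    )


# Hypothetical table and file names, used only for illustration.
ddl = external_table_ddl(
    "XT_DATASETS",
    [("dataset_id", "NUMBER"), ("dataset_name", "VARCHAR2(80)")],
    "dat_dir",
    "datasets.txt",
)
print(ddl)
```

Generating the statement as text keeps every parenthesis and quote under the generator's control, which avoids the hand-editing mistakes that are easy to make in nested ACCESS PARAMETERS clauses.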

The External Table
  No pre-staging of data
  Ability to describe a flat file to Oracle
  Handles horizontally partitioned files
  Good error messaging

The Load Process
  External Tables
  MERGE
    – Can be run in parallel
    – Combined with external table provides a powerful set of ETL tools
  Integrated DBA tasks

MERGE
  Allows for update or insert in a single statement
    – If key value already exists
        Yes, update row
        No, insert row
  MERGE statement is auto-generated
  Row level column transforms are supported
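The slide says the MERGE statement is auto-generated but does not show the generator. A minimal sketch of what such a generator might do is given below, assuming a single key column and an optional per-column SQL expression for row-level transforms. The function name, parameters and the tgt/src aliases are all hypothetical.

```python
def build_merge(target, source, key, columns, transforms=None):
    """Build an Oracle MERGE ("upsert") statement.

    columns    -- non-key columns copied from source to target
    transforms -- optional {column: SQL expression} applied while selecting
    """
    transforms = transforms or {}
    select_list = [key] + [
        f"{transforms[c]} AS {c}" if c in transforms else c for c in columns
    ]
    # The key is left out of the SET clause: Oracle does not allow columns
    # referenced in the ON condition to be updated.
    set_clause = ", ".join(f"tgt.{c} = src.{c}" for c in columns)
    insert_cols = ", ".join([key] + columns)
    insert_vals = ", ".join(f"src.{c}" for c in [key] + columns)
    return (
        f"MERGE INTO {target} tgt\n"
        f"USING (SELECT {', '.join(select_list)} FROM {source}) src\n"
        f"ON (tgt.{key} = src.{key})\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols})\n"
        f"VALUES ({insert_vals})"
    )


merge_sql = build_merge(
    "DATASETS",
    "XT_datasets_77909",
    "dataset_id",
    ["dataset_name", "created_by"],
    {"created_by": "NVL(created_by, 'nobody')"},
)
print(merge_sql)
```

Driving the generator from a column list is what lets a new dataset be added without hand-written SQL: only the metadata changes.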

MERGE

MERGE example
  MERGE INTO DATASETS ds
  USING (
    SELECT xt.dataset_id,
           xt.dataset_name,
           xt.creation_date,
           nvl(xt.created_by, 'nobody') AS created_by,
           sysdate AS last_updated
    FROM XT_datasets_77909 xt
  ) src
  ON ( src.dataset_id = ds.dataset_id )
  WHEN MATCHED THEN UPDATE SET
    ds.dataset_name  = src.dataset_name,
    ds.creation_date = src.creation_date,
    ds.created_by    = src.created_by,
    ds.last_updated  = sysdate
  WHEN NOT MATCHED THEN INSERT
    ( dataset_id, dataset_name, creation_date, created_by, last_updated )
  VALUES
    ( src.dataset_id, src.dataset_name, src.creation_date, src.created_by, sysdate )

MERGE
  Issues we faced
    – Duplicate records in the dataset
    – NESTED-LOOPS from external table
    – Parallelism is not enabled by default
    – Bulk Load partition determination
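The slides list these issues without saying how they were resolved. As one possible approach to the first and third, the sketch below wraps the source in a ROW_NUMBER() filter so that only one row per key reaches the MERGE (Oracle can reject a MERGE whose source holds several rows for the same key), and keeps the session statement that turns on parallel DML. The function name, the choice of ordering column and the "latest row wins" rule are assumptions.

```python
def dedupe_source(source, key, order_by, columns):
    """Wrap a source so only one row per key survives (the latest by order_by)."""
    col_list = ", ".join([key] + columns)
    return (
        f"SELECT {col_list} FROM (\n"
        f"  SELECT {col_list},\n"
        f"         ROW_NUMBER() OVER (PARTITION BY {key} "
        f"ORDER BY {order_by} DESC) AS rn\n"
        f"  FROM {source}\n"
        ") WHERE rn = 1"
    )


# Parallel DML is off by default and has to be enabled for each session.
ENABLE_PARALLEL_DML = "ALTER SESSION ENABLE PARALLEL DML"

dedupe_sql = dedupe_source(
    "XT_datasets_77909", "dataset_id", "creation_date", ["dataset_name"]
)
print(ENABLE_PARALLEL_DML)
print(dedupe_sql)
```

The deduplicating query can stand in for the plain external-table name in the USING clause of the MERGE.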

The Load Process
  External Tables
  MERGE
  Integrated DBA tasks
    – Reduces workload required by the DBA team
    – Streamlines the load process
    – Eliminates human error

Integrated DBA Tasks
  Provided by the DBA team
    – Managed by the DBA team
    – ETL team does not need special knowledge of table layout

Integrated DBA Tasks
  Truncate Partition
    developer makes call
      truncate_partition( 'TABLE-NAME', partition-key1, partition-key2, partition-key3 )
    DBA utility translates this and executes
      alter table TABLE-NAME drop partition dbi _101;

Integrated DBA Tasks
  Analyze Partition
    developer makes call
      analyze_partition( 'TABLE-NAME', partition-key1, partition-key2, partition-key3 )
    DBA utility translates this and executes
      dbms_stats.gather_table_stats(ownname, tabname, partname, cascade, estimate_percent, granularity);

Integrated DBA Tasks
  Return Partition Name
    developer makes call
      get_partition_name( 'TABLE-NAME', partition-key1, partition-key2, partition-key3 )
    DBA utility translates this and returns the appropriate name of the partition. This is very useful when bulk loading tables.
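To make the idea concrete, here is a small sketch of a partition-name lookup feeding a direct-path insert into a single partition. The slides do not show how the DBA utility maps keys to names, so the catalog dictionary, the ORDERS table, its partition names and the bulk_insert helper are all invented for the example. Only the get_partition_name call shape comes from the slide.

```python
# Hypothetical partition catalog: (table, partition keys) -> partition name.
# The real utility is owned by the DBA team; its mapping is not shown in the slides.
PARTITION_CATALOG = {
    ("ORDERS", ("US", 2003, 9)): "ORDERS_P101",
    ("ORDERS", ("US", 2003, 10)): "ORDERS_P102",
}


def get_partition_name(table, *keys):
    """Translate partition key values into the physical partition name."""
    try:
        return PARTITION_CATALOG[(table.upper(), keys)]
    except KeyError:
        raise LookupError(f"no partition for {table} with keys {keys}") from None


def bulk_insert(table, source_query, *keys):
    """Build a direct-path insert that targets exactly one partition."""
    partition = get_partition_name(table, *keys)
    return (
        f"INSERT /*+ APPEND */ INTO {table.upper()} "
        f"PARTITION ({partition}) {source_query}"
    )


print(bulk_insert("orders", "SELECT * FROM stage_orders", "US", 2003, 10))
```

The ETL developer passes business keys only; the physical partition layout can change without any ETL code changing.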

Integrated DBA Tasks
  Partitioning utilities
    – Helps to streamline the process
    – Reduces workload of DBA team
    – Helps to eliminate the problem of double loads for Snapshot tables and partitions
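The deck mentions that every load is tracked in an operational database and that double loads are a concern, without showing the mechanism. The sketch below illustrates one generic way a tracking table can refuse a second load of the same partition. It uses an in-memory SQLite database purely as a stand-in for the operational database, and the load_log table, its columns and both function names are hypothetical.

```python
import sqlite3


def open_tracker():
    """Create the tracking table in an in-memory stand-in database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE load_log ("
        " table_name TEXT NOT NULL,"
        " partition_name TEXT NOT NULL,"
        " PRIMARY KEY (table_name, partition_name))"
    )
    return conn


def claim_load(conn, table_name, partition_name):
    """Return True if this load is new, False if it was already recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO load_log VALUES (?, ?)",
                (table_name, partition_name),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejects a second claim for the same partition.
        return False


tracker = open_tracker()
print(claim_load(tracker, "ORDERS_SNAPSHOT", "P20030908"))
print(claim_load(tracker, "ORDERS_SNAPSHOT", "P20030908"))
```

Letting the primary key do the check keeps the test and the claim atomic, so two concurrent loaders cannot both pass it.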

The Load Process
  External Tables
    – Provides access to flat files outside the database
  MERGE
    – Parallel “upsert” simplifies ETL
    – Row level transforms can be performed in SQL
  Integrated DBA tasks
    – Reduces workload required by the DBA team
    – Streamlines the load process
    – Eliminates human error
  Loads are repeatable processes

Summary
  Reduction in time to integrate new subject areas
  Oracle parallelism scales well
  Eliminated unneeded software

Summary
  Oracle has delivered on the DW promise
    – Oracle External table combined with MERGE is a viable alternative to other ETL tools
    – ETL tools are ready today

Questions & Answers

Reminder – please complete the OracleWorld session survey. Thank you.