2 Agenda Data Warehouse Data Warehouse Concepts ETL Process Oracle Warehouse Builder(OWB)OWB ArchitectureData Sources and Data TargetsETL: MappingsETL: Process FlowsData Quality ManagementDemonstrationExtracting DataData Profiling and CleansingTransforming DataToday I like to talk about oracle warehouse builder. I’ll give you a short presentation about the basic concepts of warehouse building process and demonstrate it with Oracle warehouse builder tool.
3 Data Warehouse Oracle Warehouse Builder Oracle OLAP/ Data Miner As you know, data warehouse is a repository of an organization's electronically stored data which is designed to facilitate reporting and analysis.When you build a data warehouse system, you can consider the building process as two parts. One is to collect data from different data sources, understand it and integrate it into target platform. It is called ETL process. The other part is to find pattern, predict value and generate report after building one unified data repository.In the mining project, MCMS, we are using oracle warehouse builder as an ETL processing tool to build a data warehouse, and for later part, we are currently using oracle data miner. In this presentation I will briefly review oracle warehouse builder and then my collegue Ying’ will present about data mining.Oracle Warehouse BuilderOracle OLAP/ Data MinerFind PatternPredict Behaviour or value(Classification/ Regression)Generate ReportETL (Extract/ Transform/ Load)Data Quality ControlMeta data Management“one of the major ETL tools in the market “
4 ETL ProcessExtract: extract data from sources and put in a so-called Staging Area(SA),usually with the same structure as the source.Here, the abbreviation of ETL means extract, transform and load. extract data from sources and put it in the target platform usually with the same structure as the source. after extraction all data will be located at one platform so you can easily join and union tables and filters and sort the calculations. In this step, you can check on data quality and cleans the data if necessary. Many experts recommend cleans data in this step.Load. Finally data is loaded into a central warehouse, usually into fact and dimension tables.Transform: join and union tables, filter and sort the calculations. In thisstep, we can check on data quality and cleans the data if necessary.Load: finally, data is loaded into a central warehouse, usually into factand dimension tables.
5 OWB ArchitectureOWB, oracle warehouse builder is one of the major ETL tool in the market and you can use it as free for academic usage.The OWB architecture looks like this. In the client side, you have two application called design centre and repository browser. Repository borswer is web based repository viewing tool and design centre is GUI based application which is mainly used to design warehouse and conduct ETP process. In development time, you login to oracle database server through design centre, design metadata and ETL processes, and then all what you’ve done is saved into server repository called warehouse builder repository. If you deploy and execute your designs in runtime, they are implemented as database objects and scripts suchas PL/SQL and stored into oracle data base.
6 Design CentreThe design centre have many functionality for data modelling, data compliance, integration and quality management.Details for each function will be followed on the rest of this presentation.
7 Data sources and Data Targets OracleTables, Views, MViews, Queues, External Tables, Sqlloader, Transportable Tablespaces, Data Pump…DB2, Sybase, SQLServer, Informix, Mainframes, … (Oracle Transparent Gateways)ODBCFlat FilesXMLApplicationsOracle Ebusiness SuitePeopleSoftSAPSiebelOracleDB2, Sybase, SQLServer, Informix, Mainframes, … (Oracle Transparent Gateways)ODBCFlat FilesXMLOracle warehouse build allow you to define your data sources and targets of any form of this list. Data base including db2, sybase, flat files and xml. And even you can migrate the enterprise application data into data warehouse.
8 ETL: Mappings Declarative modeling of Data Flows Map from Source to TargetIntegrated Data QualityN&A standardizationMatch/MergeProfilingGenerates SQL & PL/SQLMerge, transportable tablespaces, data pump, sqlloader, xml data types, BLOBS/CLOBS, …Leverage custom data transformationsAfter defining the form of data sources and targets, you can start to do ETL processes called mapping.Here you can design how to map your data source object and target object with graphical notations. After then you can see generate sql loader scripts or PL/SQL scripts.
9 ETL: Process flows Declarative modeling of Process/work Flows Co-ordinate execution of Maps and other activitiesCreate complex transitionsSend , FTP source/target files, call any external process, SQL Plus, NotificationsGenerates Oracle Workflow, Oracle Scheduler & XPDLAnother form of ETL process in oracle warehouse builder is designing data process flows. It’s kind of work flow. You can design sequence of tasks including ETLmappings and activities such as sending , execute external process.
10 Data Quality Management Data ProfilingMissing or invalid valuesDistributions of the values in a specific columnData Rule for CleansingWhen you build a data warehouse system, mange data quality is the one of the most important thing. Oracle warehouse builder have a data profiling tool, you can check your data, any missing data or invalid data and then you can make some cleansing strategy here.Once you set the data rules to cleanse your data, the tool deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data. I will show you how it works later on demonstration session.Based on the data profiling results, Warehouse Builder derives a set of data rules that you can use to cleanse the source data. You can automatically generate corrections based on these data rules by performing data correction actions.
11 Metadata Management Dependency Management Data Lineage at attribute levelImpact Analysis at attribute levelMetadata SnapshotsChange Management (diff, merge and reconcile)Reporting (browser)APIs (Scripting, SQL, PL/SQL)Exchange (import/export)Finally, about metadata such as table and view, you can check dependencies between metadata and you can get some idea how any change of metadata will affect the others. And also you can audit the metadata changes.
12 Define Sources & Targets DemonstrationDefine Sources & TargetsExtractData Profiling1. Identifying datasources/ targets and importing metadata2. Import data and design and execute mappings (Extract)3. Data profiling and decide data cleansing strategy“Derived Data Rule”“Generated Code”TransformLoadnow let me give you a demonstration using OWB step by step. First I will define data sources and targets and then import data from source and design and execute mappings to extract the data. Before merging data from different sources, I will do data profiling first and make sure the data is cleansed. Finally, the data are cleansed, merged and ready to be used for reporting and mining.4. Design and executemappings (Merging) and cleansing5. Design dimension tables“Generated Code”