Chapter 9 DATA WAREHOUSING Transparencies © Pearson Education Limited 1995, 2005
Chapter 9 - Objectives
• Legacy systems.
• How data warehousing evolved.
• The main concepts and benefits associated with data warehousing.
• Online analytical processing (OLAP).
• Data mining.
Legacy System
• Systems that were developed in the early years of business processing.
• A rich source of historical data, but the data is difficult to retrieve because of non-standard features.
• This difficulty is one reason a data warehouse is needed.
Problems with Legacy Systems
• Accessing data from a legacy system may be difficult for several reasons. The system may:
  – have been developed for a different hardware or software platform
  – use a different data model
  – use a different DBMS
  – use different data definitions
  – use a different data format
• Together, these differences make it difficult to integrate and share data.
Data Definition Problems
• Synonyms – different field names are used to store the same data in different databases.
• Homonyms – the same field name is used to store different data in different databases.
• Domain integrity – the domain of the same field may differ between databases.
• Business rules – may differ between databases.
• Referential integrity – related records in different databases may be difficult to link.
• Concurrency control – problems arise when multiple users access a database that was designed for a single user.
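The naming problems above can be made concrete with a small sketch. It assumes two hypothetical source systems ("sales" and "billing") where the customer number is stored under different field names, and where a field called "code" means a country code in one system and a discount code in the other. All field names, values and the `to_warehouse_names` helper are invented for illustration.

```python
# Hypothetical rows from two source systems; none of these names come from the slides.
SALES_DB_ROW = {"cust_no": 101, "name": "Acme Ltd", "code": "UK"}        # "code" = country code
BILLING_DB_ROW = {"customer_id": 101, "name": "Acme Ltd", "code": "A7"}  # "code" = discount code

# Per-source mapping from each local field name to one shared warehouse field name.
FIELD_MAP = {
    "sales": {
        "cust_no": "customer_id",      # different name, same data as billing.customer_id
        "name": "customer_name",
        "code": "country_code",        # same name as billing.code, different data
    },
    "billing": {
        "customer_id": "customer_id",
        "name": "customer_name",
        "code": "discount_code",
    },
}


def to_warehouse_names(source, row):
    """Rename the fields of one source row to the shared warehouse names."""
    mapping = FIELD_MAP[source]
    return {mapping[field]: value for field, value in row.items()}


sales = to_warehouse_names("sales", SALES_DB_ROW)
billing = to_warehouse_names("billing", BILLING_DB_ROW)
print(sales)
print(billing)
```

Keeping the mapping per source is one possible design: the ambiguous "code" field can then be resolved differently for each system before the rows are combined.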
Data Warehouse Concepts
• A data warehouse is built by extracting and filtering data from diverse databases and using that data to build a new database.
• It stores information extracted from historical, operational and external databases.
• Its primary purpose is to provide information for management decision making.
Database vs Data Warehouse

Activity     Database                     Data warehouse
Function     Support business operation   Support decision making
Data         Process-oriented             Subject-oriented
Usage        Structured, repetitive       Unstructured, repetitive
Processing   Data entry                   End-user-initiated queries
Data Warehouse Architecture
• Operational database / external database layer
• Information access layer
• Data access layer
• Metadata layer
• Process management layer
• Application messaging layer
• Physical layer
• Data staging layer
Data Warehouse Implementation
• Data – includes operational, historical and external data.
• Extraction and transformation – extract and transform data held in different tables.
• Data warehouse storage – store the extracted and transformed data in separate tables.
• Historical data – used for forecasting purposes.
• Reports, statistics, data analysis and presentation – the output of the data warehouse, used to make decisions.
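The flow from extraction through storage to reporting can be sketched in a few lines. The example assumes an in-memory SQLite database as a stand-in for the warehouse, and a handful of made-up operational rows whose dates and region names are formatted inconsistently. The table name `sales_fact`, the columns and the `transform` helper are hypothetical.

```python
import sqlite3

# Hypothetical extracted rows: (date, region, amount), all as text in mixed formats.
operational_rows = [
    ("2004-01-15", "north", "120.50"),
    ("15/02/2004", "North ", "80.00"),
    ("2004-02-20", "south", "45.25"),
]


def transform(row):
    """Normalise one row: ISO date, trimmed lower-case region, numeric amount."""
    date, region, amount = row
    if "/" in date:  # convert dd/mm/yyyy to yyyy-mm-dd
        day, month, year = date.split("/")
        date = f"{year}-{month}-{day}"
    return date, region.strip().lower(), float(amount)


# Storage: load the transformed rows into a separate warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (sale_date TEXT, region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [transform(row) for row in operational_rows],
)

# Reporting: total sales per region, a summary meant for decision making.
report = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
print(report)
```

Without the transformation step, "north" and "North " would be reported as two different regions, which is the kind of integration problem the warehouse is meant to remove.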
Data Warehouse: Benefits and Risks
• Benefits:
  – Reduces reporting costs
  – Reduces data consolidation and integration costs
  – Increases efficiency and decision-making capabilities
• Risks:
  – May house the wrong data
  – Expensive to build and maintain
  – Requires organizational changes
Online Analytical Processing
• OLAP tools support data modeling and multidimensional data analysis.
• They share these characteristics:
  – Provide a user-friendly interface
  – Use multidimensional data analysis techniques
  – Provide advanced database support
  – Support a client/server architecture
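Multidimensional data analysis can be illustrated with a toy example. The sketch assumes a small list of sales facts with three dimensions (product, region, quarter) and shows two common operations: a roll-up, which aggregates over the dimensions that are dropped, and a slice, which fixes one dimension to a single value. The data and the `roll_up` and `slice_` helpers are hypothetical and not tied to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales).
facts = [
    ("widget", "north", "Q1", 100),
    ("widget", "south", "Q1", 150),
    ("widget", "north", "Q2", 120),
    ("gadget", "north", "Q1", 200),
    ("gadget", "south", "Q2", 50),
]
DIMENSIONS = ("product", "region", "quarter")


def roll_up(rows, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[3]
    return dict(totals)


def slice_(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in rows if row[position] == value]


by_product = roll_up(facts, ["product"])
by_region_quarter = roll_up(facts, ["region", "quarter"])
north_by_product = roll_up(slice_(facts, "region", "north"), ["product"])
print(by_product)
print(by_region_quarter)
print(north_by_product)
```

The same facts answer different questions depending on which dimensions are kept, which is the idea behind viewing the data as a cube.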
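• OLAP tools can be classified as:
  – Relational Online Analytical Processing (ROLAP) – provides OLAP functionality using an RDBMS
  – Multidimensional Online Analytical Processing (MOLAP) – extends OLAP functionality to a multidimensional DBMS (MDBMS)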
Data Mining
• Data mining is a decision support tool that enables a user to access large amounts of data directly and analyze them.
• Data mining is the set of activities used to find new, hidden, or unexpected patterns in data.
Data Mining Technique
• The data mining process has four phases:
  – Data preparation – the main data sets to be used are identified and cleaned
  – Data analysis and classification – common data characteristics or patterns are identified
  – Knowledge acquisition – a model that resembles the target data is developed
  – Prediction – the model is used to predict future behaviour and forecast business outcomes
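The four phases can be walked through on a toy data set. The slides do not name a specific modelling technique, so this sketch substitutes a simple least-squares line as the model. The records (advertising spend against sales) and the `predict` helper are invented for illustration.

```python
# Hypothetical monthly records: (advertising spend, sales); None marks a missing value.
raw = [(1.0, 12.0), (2.0, 14.0), (None, 9.0), (3.0, 16.0), (4.0, 18.0)]

# 1. Data preparation: identify the data set and drop records with missing values.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 2. Data analysis: summarise the cleaned data to look for a common pattern.
n = len(clean)
mean_x = sum(x for x, _ in clean) / n
mean_y = sum(y for _, y in clean) / n

# 3. Knowledge acquisition: fit a least-squares line y = intercept + slope * x
#    (an assumed stand-in for whatever model a real tool would build).
slope = sum((x - mean_x) * (y - mean_y) for x, y in clean) / sum(
    (x - mean_x) ** 2 for x, _ in clean
)
intercept = mean_y - slope * mean_x


# 4. Prediction: apply the model to a spend value that was not in the data.
def predict(x):
    return intercept + slope * x


print(slope, intercept, predict(5.0))
```

Real data mining tools automate these steps over far larger data sets, but the order of the phases is the same.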
Data Mining Tools
• Data mining tools today have the following characteristics:
  – Data preparation facilities
  – Selection of data mining operations
  – Product scalability and performance
  – Facilities for visualization of results
END