What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a.

Slides:



Advertisements
Similar presentations
Dimensional Modeling.
Advertisements

CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai.
Anindya Datta Debra VanderMeer Krithi Ramamritham Presented by –
OLAP Tuning. Outline OLAP 101 – Data warehouse architecture – ROLAP, MOLAP and HOLAP Data Cube – Star Schema and operations – The CUBE operator – Tuning.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
Data Warehousing M R BRAHMAM.
ICS 421 Spring 2010 Data Warehousing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/18/20101Lipyeow.
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
Data Warehousing - 3 ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension.
By N.Gopinath AP/CSE. Two common multi-dimensional schemas are 1. Star schema: Consists of a fact table with a single table for each dimension 2. Snowflake.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
CS346: Advanced Databases
Designing a Data Warehouse
An Overview of Data Warehousing and OLTP Technology Presenter: Parminder Jeet Kaur Discussion Lead: Kailang.
A Comparsion of Databases and Data Warehouses Name: Liliana Livorová Subject: Distributed Data Processing.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
Designing a Data Warehouse Issues in DW design. Three Fundamental Processes Data Acquisition Data Storage Data a Access.
Scalable Management of On-line E-commerce Interactions Krithi Ramamritham July 2000.
Data Warehousing.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts - 5 th Edition, Aug 26, 2005 Buzzword List OLTP – OnLine Transaction Processing (normalized,
1 Data warehousing & Data mining Presented by Siva Krishna Nilesh Kumar.
© 2007 by Prentice Hall 1 Introduction to databases.
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Data Warehouse. Design DataWarehouse Key Design Considerations it is important to consider the intended purpose of the data warehouse or business intelligence.
1 Data Warehouses BUAD/American University Data Warehouses.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
Ch3 Data Warehouse Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009.
Designing a Data Warehousing System. Overview Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
OLAP On Line Analytic Processing. OLTP On Line Transaction Processing –support for ‘real-time’ processing of orders, bookings, sales –typically access.
Pooja Sharma Shanti Ragathi Vaishnavi Kasala. BUSINESS BACKGROUND Lowe's started as a single hardware store in North Carolina in 1946 and since then has.
Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.
Improved Query Performance With Variant Indexes Patrick O’Neil, Dallan Quass Presented by Bo Han.
1 Copyright © 2009, Oracle. All rights reserved. Oracle Business Intelligence Enterprise Edition: Overview.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
I am Xinyuan Niu I am here because I love to give presentations. Data Warehousing.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
An Overview of Data Warehousing and OLAP Technology
Data Warehousing COMP3017 Advanced Databases Dr Nicholas Gibbins –
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
11/20/ :11 AMData Mining 1 Data Mining – CSE 9033 Chapter – 1; Data Warehousing Dr. Goutam Sarker, B.E., M.E., Ph.D.(Engineering), Fellow: IE(I),
Data Mining: Data Warehousing
Advanced Applied IT for Business 2
Data warehouse.
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Data Warehouse.
On-Line Analytic Processing
Data warehouse and OLAP
Physical Database Design and Performance
Chapter 13 The Data Warehouse
Summarized from various resources Modern Database Management
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Data Warehouse.
Database Vs. Data Warehouse
Data Warehousing: Data Models and OLAP operations
Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2009
Column-Stores vs. Row-Stores: How Different Are They Really?
DATA CUBES E0 261 Jayant Haritsa Computer Science and Automation
Retail Sales is used to illustrate a first dimensional model
Data Warehouse.
Dimensional Model January 16, 2003
Data Warehousing Concepts
Applying Data Warehouse Techniques
Applying Data Warehouse Techniques
Data Warehouse and OLAP Technology
Presentation transcript:

What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. [Barry Devlin]

Data Warehouse Architecture Data Warehouse Engine Optimized Loader Extraction Cleansing Analyze Query Metadata Repository Relational Databases Legacy Data Purchased Data ERP Systems

Data Warehouse for Decision Support & OLAP Putting Information technology to help the knowledge worker make faster and better decisions –Which of my customers are most likely to go to the competition? –What product promotions have the biggest impact on revenue? –How did the share price of software companies correlate with profits over last 10 years?

Why Separate Data Warehouse? Performance –Op dbs designed & tuned for known txs & workloads. –Complex OLAP queries would degrade perf. for op txs. –Special data organization, access & implementation methods needed for multidimensional views & queries. Function –Missing data: Decision support requires historical data, which op dbs do not typically maintain. –Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: op dbs, external sources. –Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.

OLTP vs. Data Warehouse OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries) –e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December

To summarize... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business

Schema Design Database organization –must look like business –must be recognizable by business user –approachable by business user –Must be simple Schema Types –Star Schema –Fact Constellation Schema –Snowflake schema

Dimension Tables Dimension tables –Define business in terms already familiar to users –Wide rows with lots of descriptive text –Small tables (about a million rows) –Joined to fact table by a foreign key –heavily indexed –typical dimensions time periods, geographic region (markets, cities), products, customers, salesperson, etc.

Fact Table Central table –mostly raw numeric items –narrow rows, a few columns at most –large number of rows (millions to a billion) –Access via dimensions

Star Schema A single fact table and for each dimension one dimension table Does not capture hierarchies directly T i m e prodprod custcust citycity factfact date, custno, prodno, cityname,...

Snowflake schema Represent dimensional hierarchy directly by normalizing tables. Easy to maintain and saves storage T i m e prodprod custcust citycity factfact date, custno, prodno, cityname,... regionregion

Fact Constellation –Multiple fact tables that share many dimension tables –Booking and Checkout may share many dimension tables in the hotel industry Hotels Travel Agents Promotion Room Type Customer Booking Checkout

De-normalization Normalization in a data warehouse may lead to lots of small tables Can lead to excessive I/O’s since many tables have to be accessed De-normalization is the answer especially since updates are rare

Indexing Techniques Exploiting indexes to reduce scanning of data is of crucial importance Bitmap Indexes Join Indexes Bitsliced Indexing Projection Indexing Positional Indexing

Indexing Techniques Bitmap index: –A collection of bitmaps -- one for each distinct value of the column –Each bitmap has N bits where N is the number of rows in the table –A bit corresponding to a value v for a row r is set if and only if r has the value for the indexed attribute

BitMap Indexes An alternative representation of RID-list Specially advantageous for low-cardinality domains Represent each row of a table by a bit and the table as a bit vector There is a distinct bit vector Bv for each value v for the domain Example: the attribute sex has values M and F. A table of 100 million people needs 2 lists of 100 million bits

Customer Query : select * from customer where gender = ‘F’ and vote = ‘Y’ Bitmap Index M F F F F M Y Y Y N N N

Bit Map Index Base Table Rating Index Region Index Customers where Region = W Rating = M And

BitMap Indexes Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time Significant reduction in space and I/O (30:1) Adapted for higher cardinality domains as well. Compression (e.g., run-length encoding) exploited Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3

Join Indexes Pre-computed joins A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute –e.g., a join index on city dimension of calls fact table –correlates for each city the calls (in the calls table) from that city

Join Indexes Join indexes can also span multiple dimension tables –e.g., a join index on city and time dimension of calls fact table

Star Join Processing Use join indexes to join dimension and fact table Calls C+T C+T+L +P Time Loca- tion Plan

Optimized Star Join Processing Time Loca- tion Plan Calls Virtual Cross Product of T, L and P Apply Selections

Bitmapped Join Processing AND Time Loca- tion Plan Calls Bitmaps

Indexing Techniques Variant Indexes [O’Neil & Quass, 97] –B + -tree (all systems): RID-list for each search-key value –Bitmapped (almost all systems): Bit-vectors (usually compressed) instead of RID-lists –Projection (Sybase IQ): Mirror copy of column –Bit-sliced (Sybase IQ): “bit-level” projection index Join Indexes [Valduriez et. al., 86] –Bitmapped-Join Indexes [O’Neil & Graefe ‘95] (Informix) Limitations: –These structures are maintained in addition to the base table. –Query response times are unacceptable in interactive contexts.

The DataIndex Both a storage and an access structure –Indexing comes for “free” Based on, and extends the notions of vertical partitioning and transposed files Two kinds presented here: –Basic DataIndex (BDI) –Join DataIndex (JDI)

The Basic DataIndex (BDI) Projection-like Index with matching column removed from the table Can have multiple columns (e.g., no TPC-D query asks for ExtPrice or Discount alone) CustKeyQty DiscountExtPrice CK1Q1 D1E1 CK2Q2 D2E2 CK3Q3 D3E3 CK4Q4 D4E4 Base Table CustKey CK1 CK2 CK3 CK4 Qty Q1 Q2 Q3 Q4 BDI DiscountExtPrice D1E1 D2E2 D3E3 D4E4 BDI

The Join DataIndex (JDI) –JDI is “BDI” of RIDs to foreign table. Tax T1 T2 T3 T4 Base Fact Table RIDs RID1 RID2 RID3 Tax T1 T2 T3 T4 JDI NameAddress N1A1 N2A2 N3A3 Base Dimension Table NameAddress N1A1 N2A2 N3A3 DiscountExtPrice D1E1 D2E2 D3E3 D4E4 CustKey CK1 CK2 CK3 CustKey CK1 CK2 CK3 DiscountExtPrice D1E1 D2E2 D3E3 D4E4 BDI CustKey CK1 CK2 CK3 BDI –Joins can be processed efficiently

–(Block ID, Slot Number) to Position: –Position to (Block ID, Slot Number): Order of records is conserved in DataIndexes A simple arithmetic mapping is used to associate fields of a record Records in each vertical partition can easily be mapped to blocks and vice-versa RID: (Block ID, Slot Number within that Block) Maintaining Logical Records

Query Processing with DataIndexes Two common classes of queries in data warehousing: –Range queries –Star join queries Example range query: SELECT CustKey FROM SALES WHERE Qty>10 SALES Table CustKey CK1 CK2 CK3 CK4 Qty BDI Steps: Rowset –Load display BDI(s) into memory CustKey CK1 CK2 CK3 CK4 BDI –Display values –Apply restrictions to form rowset(s)

Star Join Queries A fact table is joined with a set of dimension tables: SELECT Column-list FROM FactTable, DimensionTables WHERE SelectionPredicates AND JoinPredicates JoinPredicates: “ Fact.Attr1 = Dimension.Attr2 ” General Technique Used to Evaluate: 1. Apply SelectionPredicates on individual tables. 2. Perform Join on restricted set of rows or rowsets.