
Data Warehousing
Lecture-8: De-normalization Techniques
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head, Center for Agro-Informatics Research
National University of Computers & Emerging Sciences, Islamabad

De-normalization Techniques

Splitting Tables
[Figure: a table with columns ColA, ColB, ColC is split either vertically into Table_v1 (ColA, ColB) and Table_v2 (ColA, ColC), or horizontally into Table_h1 and Table_h2, both with columns ColA, ColB, ColC.]

Splitting Tables: Horizontal Splitting
Breaks a table into multiple tables based on the values of a common column. Example: campus-specific queries.
Goals:
• Spread rows across tables to exploit parallelism.
• Group data so that queries do not carry the unnecessary load of filtering out irrelevant rows in the WHERE clause.
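A minimal sketch of horizontal splitting using Python's built-in sqlite3 module. The `student` table, its columns, the campus codes and the `student_<campus>` / `student_all` names are all hypothetical, chosen only to illustrate the campus-specific example: each campus gets its own table with the same columns, so a campus query reads only that campus's rows and needs no campus filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source table; campus is the column used to split rows.
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, campus TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ali", "ISB"), (2, "Sara", "LHR"), (3, "Omar", "ISB"), (4, "Hina", "KHI")],
)

# Horizontal split: one table per campus, same columns, disjoint sets of rows.
campuses = [r[0] for r in conn.execute("SELECT DISTINCT campus FROM student ORDER BY campus")]
for campus in campuses:
    table = f"student_{campus.lower()}"  # assumes campus codes are safe SQL identifiers
    conn.execute(f"CREATE TABLE {table} (sid INTEGER PRIMARY KEY, name TEXT, campus TEXT)")
    conn.execute(
        f"INSERT INTO {table} SELECT sid, name, campus FROM student WHERE campus = ?",
        (campus,),
    )

# A campus-specific query touches only that campus's rows, with no campus predicate.
isb_rows = conn.execute("SELECT sid, name FROM student_isb ORDER BY sid").fetchall()
print(isb_rows)  # [(1, 'Ali'), (3, 'Omar')]

# A UNION ALL view can still present the pieces as one logical table when needed.
union_sql = " UNION ALL ".join(
    f"SELECT sid, name, campus FROM student_{c.lower()}" for c in campuses
)
conn.execute(f"CREATE VIEW student_all AS {union_sql}")
print(conn.execute("SELECT COUNT(*) FROM student_all").fetchone()[0])  # 4
```

In a real warehouse the separate tables would typically sit on different disks or nodes so that they can be scanned in parallel; SQLite is used here only to show the table layout.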

Splitting Tables: Horizontal Splitting
Advantages:
• Enhances the security of data.
• Allows tables to be organized differently for different queries.
• Graceful degradation of the database in case of table damage.
• Fewer rows result in flatter B-trees and faster data retrieval.
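The "flatter B-trees" point can be checked with a little arithmetic. The sketch below assumes an idealized, fully packed B-tree with a fixed fanout (keys per node); the fanout of 100 and the row counts are hypothetical, and real index heights also depend on key size and fill factor. It finds the smallest number of levels h with fanout^h ≥ rows.

```python
def btree_levels(rows: int, fanout: int) -> int:
    """Smallest number of levels h such that fanout**h >= rows (idealized, fully packed tree)."""
    levels, capacity = 1, fanout
    while capacity < rows:  # integer arithmetic avoids floating-point log rounding issues
        levels += 1
        capacity *= fanout
    return levels

# Hypothetical sizes: 10 million rows in one table versus one of 100 horizontal pieces.
whole = btree_levels(10_000_000, 100)  # single unsplit table
part = btree_levels(100_000, 100)      # one piece holding 1/100 of the rows
print(whole, part)  # 4 3
```

Under these assumptions each lookup in a split table traverses one fewer level, i.e. one fewer block read per index probe.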

Splitting Tables: Vertical Splitting
• Infrequently accessed columns become extra "baggage", thus degrading performance.
• Very useful for rarely accessed, large text columns with large headers.
• Header size is reduced, allowing more rows per block and thus reducing I/O.
• The table is split and distributed into separate files, with the primary key repeated in each.
• For an end user, the split appears as a single table through a view.
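A minimal sketch of a vertical split with sqlite3, assuming a hypothetical `course` entity whose large `syllabus` text column is rarely read. The frequently used columns stay in `course_v1`, the large column moves to `course_v2` with the primary key repeated, and a view named `course` re-joins them so users still see one table. All table, column and data values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Narrow table with the frequently accessed columns.
conn.execute("CREATE TABLE course_v1 (course_id TEXT PRIMARY KEY, title TEXT, credits INTEGER)")
# Rarely accessed large text column moved out; the primary key is repeated.
conn.execute(
    "CREATE TABLE course_v2 ("
    "course_id TEXT PRIMARY KEY REFERENCES course_v1(course_id), syllabus TEXT)"
)
conn.executemany(
    "INSERT INTO course_v1 VALUES (?, ?, ?)",
    [("CS101", "Databases", 3), ("CS102", "Data Warehousing", 3)],
)
conn.executemany(
    "INSERT INTO course_v2 VALUES (?, ?)",
    [("CS101", "long text " * 100), ("CS102", "more text " * 100)],
)

# The view re-joins both halves on the key; LEFT JOIN keeps rows that lack a syllabus.
conn.execute("""
    CREATE VIEW course AS
    SELECT v1.course_id, v1.title, v1.credits, v2.syllabus
    FROM course_v1 AS v1 LEFT JOIN course_v2 AS v2 ON v1.course_id = v2.course_id
""")

# Routine queries read only the narrow table (more rows per block)...
titles = conn.execute("SELECT title FROM course_v1 ORDER BY course_id").fetchall()
# ...while the occasional full-row query goes through the view.
full = conn.execute(
    "SELECT course_id, length(syllabus) FROM course WHERE course_id = 'CS101'"
).fetchone()
print(titles, full)  # [('Databases',), ('Data Warehousing',)] ('CS101', 1000)
```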

Pre-joining
• Identify frequent joins and append the tables together in the physical data model.
• Generally used for 1:M relationships such as master-detail. Referential integrity (RI) is assumed to exist.
• Additional space is required, as the master information is repeated in the new pre-joined table.

Pre-joining: Example
Normalized (Master to Detail is 1:M):
  Master: Sale_ID, Sale_date, Sale_person
  Detail: Tx_ID, Sale_ID, Item_ID, Item_Qty, Sale_Rs
De-normalized (pre-joined):
  Tx_ID, Sale_ID, Item_ID, Item_Qty, Sale_Rs, Sale_date, Sale_person
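A minimal sketch of pre-joining with sqlite3, using the column names from the example above. The table names `sale_master`, `sale_detail` and `sale_prejoined` and all sample rows are hypothetical. The master/detail join is executed once at load time, copying Sale_date and Sale_person into every detail row; an inner join is used because RI is assumed, i.e. every detail row has a matching master row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale_master (Sale_ID INTEGER PRIMARY KEY, Sale_date TEXT, Sale_person TEXT)")
conn.execute("""CREATE TABLE sale_detail (
    Tx_ID INTEGER PRIMARY KEY,
    Sale_ID INTEGER REFERENCES sale_master(Sale_ID),
    Item_ID INTEGER, Item_Qty INTEGER, Sale_Rs REAL)""")
conn.executemany("INSERT INTO sale_master VALUES (?, ?, ?)",
                 [(1, "2024-01-05", "Asad"), (2, "2024-01-06", "Noor")])
conn.executemany("INSERT INTO sale_detail VALUES (?, ?, ?, ?, ?)",
                 [(10, 1, 501, 2, 300.0), (11, 1, 502, 1, 150.0), (12, 2, 501, 5, 750.0)])

# Pre-join once: master columns are repeated in every matching detail row.
conn.execute("""
    CREATE TABLE sale_prejoined AS
    SELECT d.Tx_ID, d.Sale_ID, d.Item_ID, d.Item_Qty, d.Sale_Rs, m.Sale_date, m.Sale_person
    FROM sale_detail AS d JOIN sale_master AS m ON d.Sale_ID = m.Sale_ID
""")

# A query that previously needed the join now reads a single table.
by_person = conn.execute("""
    SELECT Sale_person, SUM(Sale_Rs) FROM sale_prejoined
    GROUP BY Sale_person ORDER BY Sale_person
""").fetchall()
print(by_person)  # [('Asad', 450.0), ('Noor', 750.0)]
```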

Pre-joining: Typical Scenario
• Typical of market basket queries.
• A join is ALWAYS required.
• Tables could have millions of rows.
• Squeeze the master into the detail.
• Repetition of facts. How much? The detail is typically 3-4 times the master, so each master fact is repeated about that many times.
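The cost of squeezing the master into the detail can be estimated with a simple fixed-width row model. In the sketch below, every size is a hypothetical assumption: 1 million sales, 4 detail rows per sale (the upper end of the 3-4 ratio above), an 8-byte Sale_ID, 20 bytes for Sale_date plus Sale_person, and a 40-byte detail row that already holds the foreign key.

```python
def storage_bytes(master_rows: int, detail_per_master: int, key_bytes: int,
                  master_payload_bytes: int, detail_row_bytes: int) -> tuple[int, int]:
    """Return (normalized_bytes, prejoined_bytes) under a fixed-width row model."""
    detail_rows = master_rows * detail_per_master
    # Normalized: master rows (key + payload) plus detail rows (which already hold the FK).
    normalized = master_rows * (key_bytes + master_payload_bytes) + detail_rows * detail_row_bytes
    # Pre-joined: the master table disappears, but its payload is copied into every detail row.
    prejoined = detail_rows * (detail_row_bytes + master_payload_bytes)
    return normalized, prejoined

normalized, prejoined = storage_bytes(1_000_000, 4, 8, 20, 40)
print(normalized, prejoined)  # 188000000 240000000
```

With these numbers the pre-joined table is roughly 28% larger; the trade-off is that the always-required join disappears from every query.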

Adding Redundant Columns
[Figure: Table_1 (ColA, ColB) and Table_2 (ColA, ColC, ColD, …, ColZ). After adding a redundant column, Table_1' (ColA, ColB, ColC) carries a copy of ColC, while Table_2 remains unchanged.]

Adding Redundant Columns
Columns can also be moved instead of being made redundant. This is very similar to pre-joining, as discussed earlier.
Example: a code is frequently referenced in one table, while its corresponding description is stored in another table.
• A join is required.
• To eliminate the join, a redundant attribute is added to the target entity; this attribute is functionally independent of the primary key.
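A minimal sketch of the code/description example with sqlite3. The hypothetical `enrollment` table holds a `dept_code`, while the description `dept_name` lives in a hypothetical `department` table. The description is copied once into a redundant column of the target table, so the frequent lookup no longer needs a join. The `department` table itself is left intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE enrollment (sid INTEGER PRIMARY KEY, "
    "dept_code TEXT REFERENCES department(dept_code))"
)
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("CS", "Computer Science"), ("EE", "Electrical Engineering")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, "CS"), (2, "EE"), (3, "CS")])

# Add the description as a redundant column in the target table and fill it once.
conn.execute("ALTER TABLE enrollment ADD COLUMN dept_name TEXT")
conn.execute("""
    UPDATE enrollment
    SET dept_name = (SELECT d.dept_name FROM department AS d
                     WHERE d.dept_code = enrollment.dept_code)
""")

# The frequent code-plus-description query no longer needs a join.
rows = conn.execute("SELECT sid, dept_code, dept_name FROM enrollment ORDER BY sid").fetchall()
print(rows)
# [(1, 'CS', 'Computer Science'), (2, 'EE', 'Electrical Engineering'), (3, 'CS', 'Computer Science')]
```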

Redundant Columns: Surprise
Note that:
• Redundant columns actually increase storage space and update overhead.
• Keeping the actual table intact and unchanged helps enforce the RI constraint.
• The age-old debate: should RI be ON or OFF?
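A minimal sketch of both points with sqlite3, using hypothetical `department` and `enrollment` tables where `enrollment.dept_name` is the redundant copy. A trigger (one possible approach, not the only one) illustrates the extra update overhead: every description change must be propagated to each copy. Because `department` is kept intact, a foreign key on `dept_code` can still enforce RI; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY constraints only when enabled
conn.execute("CREATE TABLE department (dept_code TEXT PRIMARY KEY, dept_name TEXT)")
conn.execute("""CREATE TABLE enrollment (
    sid INTEGER PRIMARY KEY,
    dept_code TEXT REFERENCES department(dept_code),
    dept_name TEXT)""")  # dept_name is the redundant copy
conn.execute("INSERT INTO department VALUES ('CS', 'Computer Science')")
conn.execute("INSERT INTO enrollment VALUES (1, 'CS', 'Computer Science')")

# Update overhead: each description change is pushed into every redundant copy.
conn.execute("""
    CREATE TRIGGER sync_dept_name AFTER UPDATE OF dept_name ON department
    BEGIN
        UPDATE enrollment SET dept_name = NEW.dept_name WHERE dept_code = NEW.dept_code;
    END
""")
conn.execute("UPDATE department SET dept_name = 'Computing' WHERE dept_code = 'CS'")
synced = conn.execute("SELECT dept_name FROM enrollment WHERE sid = 1").fetchone()[0]

# RI is still enforced on the code column because the department table was kept intact.
rejected = False
try:
    conn.execute("INSERT INTO enrollment VALUES (2, 'XX', 'Unknown')")
except sqlite3.IntegrityError:
    rejected = True  # SQLite reports "FOREIGN KEY constraint failed"
print(synced, rejected)  # Computing True
```

Turning RI "OFF" here would mean leaving the pragma disabled: loads get faster, but invalid codes are silently accepted.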

Derived Attributes: Example
Business data model: #SID, DoB, Degree, Course, Grade, Credits
DWH data model: #SID, DoB, Degree, Course, Grade, Credits, GP, Age
(DoB: Date of Birth)
Derived attributes are:
• Calculated once
• Used frequently
The GP (Grade Point) column in the data warehouse data model is included as a derived value, calculated as GP = Grade * Credits. Age is also a derived attribute, calculated as Current_Date – DoB (recalculated periodically).
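A minimal sketch of computing the derived columns at load time. It assumes Grade is stored as numeric grade points (e.g. 3.5) so that GP = Grade * Credits is meaningful, and it interprets Current_Date – DoB as age in completed years. The `StudentCourse` record, the helper functions and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class StudentCourse:
    """One row of the business data model (field names follow the slide)."""
    sid: int
    dob: date
    degree: str
    course: str
    grade: float  # assumed numeric grade points, e.g. 3.5
    credits: int

def age_on(dob: date, today: date) -> int:
    """Current_Date - DoB, truncated to completed years."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

def to_dwh_row(r: StudentCourse, today: date) -> dict:
    """Business-model row plus the derived GP and Age columns, calculated once at load time."""
    return {
        "SID": r.sid, "DoB": r.dob, "Degree": r.degree, "Course": r.course,
        "Grade": r.grade, "Credits": r.credits,
        "GP": r.grade * r.credits,    # GP = Grade * Credits
        "Age": age_on(r.dob, today),  # must be refreshed periodically
    }

row = to_dwh_row(StudentCourse(1, date(2000, 6, 15), "BS", "DWH", 3.5, 3), today=date(2024, 6, 14))
print(row["GP"], row["Age"])  # 10.5 23
```

GP never changes once loaded, whereas Age drifts with time, which is why the slide notes that Age is recalculated periodically.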