DENORMALIZATION CSCI 6442 © Copyright 2015, David C. Roberts, all rights reserved.



Agenda: review of normalization, goals of denormalization, a denormalization example, techniques, and guidelines.

Normalization
Normalization aims at reducing redundancy: a single representation of each fact and a single representation of each entity type.
Normalization tends to reduce storage size. At one time this was important; at 10¢ per gigabyte, it's often not an issue.
Redundant data can be inconsistent data. The initial designers (usually) know to update all copies, but later programmers can easily make mistakes.

...but Normalization
Normalization tends to increase the number of tables, tends to require more joins in queries, and tends to reduce performance.

Join
Database tables are independent.
Join operations are inherently complex and resource-intensive.
The goal of data independence relies on the independence of tables.
In a sense, the basic advantages of relational databases inflict performance penalties.

Where We Are
Normalization leads to tables with fewer attributes, hence more joins.
Joins are resource-intensive and tend to reduce performance.

Denormalization
Denormalization is the introduction of redundant information into a database in order to improve performance.
That redundancy is most often within a single database, but it may span multiple databases.
Special denormalized structures, such as star schemas and OLAP, have been devised for Business Intelligence applications.

A Simple Example
Consider a contact manager with one table of people's names and another table of their contact methods.
Each person may have many contact methods, such as cell phone, land line, pager, email, and so on, each with a separate row in the contact method table.

An Example: Normalized [diagram not reproduced in this transcript]

Denormalized [diagram not reproduced in this transcript]

Denormalized
All the information for a single person comes from a single table, so no join is required and the query is simpler.
Only two types of contact method, email and phone, are now allowed in order to limit complexity.
The denormalized data model is not as general or as powerful as the normalized data model, but it provides improved performance.
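To make the contrast concrete, here is a minimal sketch of both designs using Python's built-in sqlite3 module. The table names (person, contact_method, person_flat), the column names, and the sample data are assumptions made for this illustration; they are not taken from the slides' diagrams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: one row per person, one row per contact method.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_method (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        kind      TEXT NOT NULL,   -- any kind of contact method is allowed
        value     TEXT NOT NULL
    );
    -- Denormalized: contact methods become fixed columns of the person row.
    CREATE TABLE person_flat (
        person_id INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT,
        phone TEXT
    );
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO contact_method VALUES (1, 'email', 'ada@example.com');
    INSERT INTO contact_method VALUES (1, 'phone', '555-0100');
    INSERT INTO person_flat VALUES (1, 'Ada', 'ada@example.com', '555-0100');
""")

# Normalized read: a join, returning one row per contact method.
normalized_rows = conn.execute("""
    SELECT p.name, c.kind, c.value
    FROM person p JOIN contact_method c ON c.person_id = p.person_id
    WHERE p.person_id = 1
    ORDER BY c.kind
""").fetchall()

# Denormalized read: a single-table lookup, returning one row.
flat_row = conn.execute(
    "SELECT name, email, phone FROM person_flat WHERE person_id = 1"
).fetchone()
```

Adding a third kind of contact method needs only a new row in the normalized design, but a schema change in the flat one, which is the loss of generality noted above.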

Denormalization Techniques
Duplicate databases: if updates do not have to be available immediately and reading is the bulk of use, use multiple copies of the whole database. Update a copy that's not used for reading, and post updates at a time of low traffic.
Duplicate tables: especially useful if all reads can be directed to one table. For systems with lots of reads, there can be multiple copies for reading and one for writing that is replicated to the read copies.
Split tables: used if parts of tables are used by different applications. A split can be horizontal or vertical.
Combined tables: a join is eliminated by combining tables.
Repeating groups across rows: all repeating groups are placed in a single row.
Pre-calculation of derived results: used particularly for multi-row derivations.

Duplicate Tables
If OLTP processing and decision support access the same table, decision support can slow down OLTP.
The table can be duplicated, with the second copy reserved for read-only (BI) use; OLTP stays on the original because it requires perfect accuracy.
For large-scale BI processing, more than two copies of the data can be used.
The accuracy of the duplicate table can be traded against the resources needed to keep it up to date.
Cost: inaccuracy if the duplicate table is not continuously updated, or processing resources to keep it up to date at all times.
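The accuracy-versus-resources tradeoff can be sketched with a read-only copy that is refreshed on demand. This is only an illustration using Python's sqlite3 module; the table names (orders, orders_bi) and the helper refresh_bi_copy are hypothetical, and a production system would more likely rely on the DBMS's own replication features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    CREATE TABLE orders_bi (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.0);
""")

def refresh_bi_copy(conn):
    """Rebuild the read-only copy from the OLTP table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM orders_bi")
        conn.execute("INSERT INTO orders_bi SELECT order_id, amount FROM orders")

refresh_bi_copy(conn)

# OLTP keeps writing to the original table after the refresh.
conn.execute("INSERT INTO orders VALUES (3, 40.0)")
conn.commit()

# The BI copy is now out of date until the next refresh.
stale_total = conn.execute("SELECT SUM(amount) FROM orders_bi").fetchone()[0]
live_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

refresh_bi_copy(conn)
fresh_total = conn.execute("SELECT SUM(amount) FROM orders_bi").fetchone()[0]
```

Refreshing more often narrows the gap between the stale and live totals but costs more processing, which is the tradeoff described above.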

Split Tables
Vertical split: attributes are divided between two tables, with the primary key placed in both. This is particularly useful if one group of applications accesses some columns and another group accesses different columns, and it can be a practical approach.
Horizontal split: rows are divided between two tables, usually by range of key values. Because access time grows only as the logarithm of the number of rows, halving a table saves little, so this is often not a productive approach.
Cost: greater complexity of update, and slower operation of some applications.
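The logarithmic argument against horizontal splits can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that a keyed lookup behaves like a binary search over sorted keys; real indexes are usually B-trees with a much larger fan-out, which makes the saving even smaller. The function name comparisons is hypothetical.

```python
import math

def comparisons(rows: int) -> int:
    """Worst-case comparisons for a binary search over `rows` sorted keys."""
    return math.ceil(math.log2(rows + 1))

full = comparisons(1_000_000)  # lookup in the unsplit table
half = comparisons(500_000)    # lookup in one half after a horizontal split
```

Under this assumption, halving a million-row table reduces the worst case from 20 comparisons to 19, while every application now has to know which half to query.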

Combined Tables
If some join is a very popular query, combine the tables that are often joined.
The join is eliminated, and the previously joined information is obtained from a single table.
Cost: added complexity of update, and slower operation of applications that need just one or the other of the original tables.

Repeating Groups Across Rows
Repeating groups can be put into a single row, growing across rather than down.
All repeating groups are then obtained in a single row without a join.
Oracle has a feature for this called VARRAYs.
Cost: loss of flexibility, since an upper limit must be placed on the number of repeats.

Pre-calculation of Derived Results
If frequent queries use GROUP BY or complex calculations over multiple rows, the computation can be done in advance and the results retrieved from a summary table.
Oracle has a feature for this called materialized views.
Example: total income = salary + commission.
Cost: processing time is spent on the computation before it's needed.
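A summary table for a GROUP BY query might look like the following sketch, again using Python's sqlite3 module. The tables sale and sale_summary and the helper dept_total are assumptions made for this example; this is a hand-built summary table, not Oracle's materialized view feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (dept TEXT NOT NULL, amount REAL NOT NULL);
    INSERT INTO sale VALUES ('books', 10.0), ('books', 5.0), ('toys', 7.5);

    -- Pay for the GROUP BY once, ahead of time, and store the result.
    CREATE TABLE sale_summary (dept TEXT PRIMARY KEY, total REAL NOT NULL);
    INSERT INTO sale_summary
        SELECT dept, SUM(amount) FROM sale GROUP BY dept;
""")

def dept_total(conn, dept):
    """Read a precomputed total with a primary-key lookup, no aggregation."""
    row = conn.execute(
        "SELECT total FROM sale_summary WHERE dept = ?", (dept,)
    ).fetchone()
    return row[0] if row else 0.0

books_total = dept_total(conn, "books")
toys_total = dept_total(conn, "toys")
```

The summary is only as current as its last recomputation, so any new row in sale leaves it out of date until it is rebuilt.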

Summary
Denormalization can improve the performance of some operations and make others slower.
The tradeoff of accuracy vs. processing cost can introduce hard-to-diagnose problems.
Don't denormalize unless testing shows that performance standards cannot be met when normalized.
In the era of 3 GHz PCs, a DBMS on a typical laptop can easily query a table of more than 1,000,000 rows in one second.

Recommendation
First, normalize the database.
Carefully tune for performance using the tools that are available.
Consider denormalization only if performance requirements can't be met with a normalized database.
Avoid suggestions to denormalize before the database is even implemented.
Remember that “I denormalized for performance” is the argument used by designers who don't understand normalization. Shun such people! Better, teach them!

Assessment
As computer costs decrease and performance improves, the need for denormalization is greatly reduced.
Perhaps the era of denormalization is ending.