Schema Dennis Shasha and Philippe Bonnet, 2013.

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Index Dennis Shasha and Philippe Bonnet, 2013.
Chapter 3 : Relational Model
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
Manajemen Basis Data Pertemuan 7 Matakuliah: M0264/Manajemen Basis Data Tahun: 2008.
BTrees & Bitmap Indexes
Normalization DB Tuning CS186 Final Review Session.
Database Systems: A Practical Approach to Design, Implementation and Management International Computer Science S. Carolyn Begg, Thomas Connolly Lecture.
Chapter Physical Database Design Methodology Software & Hardware Mapping Logical Design to DBMS Physical Implementation Security Implementation Monitoring.
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
Physical Database Monitoring and Tuning the Operational System.
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
Tuning Relational Systems I. Schema design  Trade-offs among normalization, denormalization, clustering, aggregate materialization, vertical partitioning,
Multidimensional Data Many applications of databases are ``geographic'' = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
DBMS Internals: Storage February 27th, Representing Data Elements Relational database elements: A tuple is represented as a record CREATE TABLE.
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Team Dosen UMN Physical DB Design Connolly Book Chapter 18.
Chapter 14 & 15 Conceptual & Logical Database Design Methodology
Cloud Computing Lecture Column Store – alternative organization for big relational data.
Lecture 2 The Relational Model. Objectives Terminology of relational model. How tables are used to represent data. Connection between mathematical relations.
Chapter 4 The Relational Model.
Index tuning Performance Tuning.
Chapters 17 & 18 Physical Database Design Methodology.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
CSC271 Database Systems Lecture # 30.
Lecture 8 Index Organized Tables Clusters Index compression
Lecture 9 Methodology – Physical Database Design for Relational Databases.
Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.
Database Tuning Prerequisite Cluster Index B+Tree Indexing Hash Indexing ISAM (indexed Sequential access)
Chapter 16 Methodology – Physical Database Design for Relational Databases.
1 © Prentice Hall, 2002 Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott,
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Physical Database Design I, Ch. Eick 1 Physical Database Design I About 25% of Chapter 20 Simple queries:= no joins, no complex aggregate functions Focus.
10/10/2012ISC239 Isabelle Bichindaritz1 Physical Database Design.
Physical Database Design Transparencies. ©Pearson Education 2009 Chapter 11 - Objectives Purpose of physical database design. How to map the logical database.
Relational Database. Database Management System (DBMS)
SCUHolliday - coen 1788–1 Schedule Today u Modifications, Schemas, Views. u Read Sections (except and 6.6.6) Next u Constraints. u Read.
Schema Tuning. Outline Database design: Normalization –Problem of redundancy –Why? Functional dependency –How to solve? Decomposition –Objective of the.
Methodology – Physical Database Design for Relational Databases.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Lec 7 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
Chapter 5 Index and Clustering
CPSC 404, Laks V.S. Lakshmanan1 Overview of Query Evaluation Chapter 12 Ramakrishnan & Gehrke (Sections )
B. Information Technology (Hons.) CMPB245: Database Design Physical Design.
Physical Database Design DeSiaMorePowered by DeSiaMore 1.
CS 440 Database Management Systems Lecture 6: Data storage & access methods 1.
Huffman code and Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Chapter 3 The Relational Model. Objectives u Terminology of relational model. u How tables are used to represent data. u Connection between mathematical.
SQL Basics Review Reviewing what we’ve learned so far…….
Practical Database Design and Tuning
Module 11: File Structure
CS 540 Database Management Systems
Storage and Indexes Chapter 8 & 9
Physical Database Design and Performance
Database Management Systems (CS 564)
Methodology – Physical Database Design for Relational Databases
Physical Database Design for Relational Databases Step 3 – Step 8
Evaluation of Relational Operations
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE
Module 11: Data Storage Structure
Practical Database Design and Tuning
Lecture 19: Data Storage and Indexes
Chapter 13: Data Storage Structures
Index Tuning Additional knowledge.
Chapter 13: Data Storage Structures
Chapter 13: Data Storage Structures
Presentation transcript:

Schema Dennis Shasha and Philippe Bonnet, 2013

Outline Index Utilization – Heap Files – Definition: Clustered/Non clustered, Dense/Sparse – Access method: Types of queries – Constraints and Indexes – Locking Index Data Structures – B+-Tree / Hash – LSM Tree / Fractal tree – Implementation in DBMS Index Tuning – Index data structure – Search key – Size of key – Clustered/Non-clustered/No index – Covering

Database Schema A relation schema is a relation name and a set of attributes R(a int, b varchar[20]); A relation instance for R is a set of records over the attributes in the schema for R.

Some Schema are better than others Schema1: OnOrder1(supplier_id, part_id, quantity, supplier_address) Schema 2: OnOrder2(supplier_id, part_id, quantity); Supplier(supplier_id, supplier_address); Space – Schema 2 saves space Information preservation – Some supplier addresses might get lost with schema 1. Performance trade-off – Frequent access to address of supplier given an ordered part, then schema 1 is good. – Many new orders, schema 1 is not good.

Functional Dependencies X is a set of attributes of relation R, and A is a single attribute of R. – X determines A (the functional dependency X  A holds for R) iff: For any relation instance I of R, whenever there are two records r and r’ in I with the same X values, they have the same A value as well. OnOrder1(supplier_id, part_id, quantity, supplier_address) – supplier_id  supplier_address is an interesting functional dependency

Key of a Relation Attributes X from R constitute a key of R if X determines every attribute in R and no proper subset of X determines an attribute in R. OnOrder1(supplier_id, part_id, quantity, supplier_address) – supplier_id, part_id is not a key Supplier(supplier_id, supplier_address); – Supplier_id is a key

Normalization A relation is normalized if every interesting functional dependency X  A involving attributes in R has the property that X is a key of R OnOrder1 is not normalized OnOrder2 and Supplier are normalized Here normalized, refers to BCNF (Boyce-Codd Normal Form)

Example #1 Suppose that a bank associates each customer with his or her home branch. Each branch is in a specific legal jurisdiction. – Is the relation R(customer, branch, jurisdiction) normalized?

Example #1 What are the functional dependencies? – Customer -> branch – Branch  jurisdiction – Customer  jurisdiction – Customer is the key, but a functional dependency exists where customer is not involved. – R is not normalized.

Example #2 Suppose that a doctor can work in several hospitals and receives a salary from each one. Is R(doctor, hospital, salary) normalized?

Example #2 What are the functional dependencies? – doctor, hospital  salary The key is doctor, hospital The relation is normalized.

Example #3 Same relation R(doctor, hospital, salary) and we add the attribute primary_home_address. Each doctor has a primary home address and several doctors can have the same primary home address. Is R(doctor, hospital, salary, primary_home_address) normalized?

Example #3 What are the functional dependencies? – doctor, hospital  salary – Doctor -> primary_home_address – doctor, hospital  primary_home_address The key is no longer doctor, hospital because doctor (a subset) determines one attribute. A normalized decomposition would be: – R1(doctor, hospital, salary) – R2(doctor, primary_home_address)

Practical Schema Design Identify entities in the application (e.g., doctors, hospitals, suppliers). Each entity has attributes (an hospital has an address, a juridiction, …). There are two constraints on attributes: – An attribute cannot have attribute of its own (1FN) – The entity associated with an attribute must functionally determine that attribute.

Practical Schema Design Each entity becomes a relation To those relations, add relations that reflect relationships between entities. – Worksin (doctor_ID, hospital_ID) Identify the functional dependencies among all attributes and check that the schema is normalized: – If functional dependency AB →  C holds, then ABC should be part of the same relation.

Compression Index level – Key compression Tablespace level – All blocks in a tablespace are compressed – Blocks are compressed/uncompressed as they are transferred between RAM and secondary storage. Row level – All rows in a table are compressed

Compression Methods Lossless methods – Exploits redundancy to represent data concisely Differential encoding (difference of adjacent values in the same column or same tuple) Offset encoding (offset to a base value) – Prefix compression (index keys) Null suppression (omit lead zero bytes or spaces) Non adaptive dictionary encoding Adaptive dictionary encoding (LZ77) Huffman encoding (assign fewer bits to represent frequent characters) Arithmetic encoding (interval representation of a string according to probability of each character)

Compression Trade-offs CPU / IO – Compression/decompression is CPU intensive, reduces IO cost Fewer IOs to transfer data from secondary storage to RAM Overall DB footprint is smaller – Trade-off accounted for During calibration At run time: alter table to activate/deactivate compression. DBMS/SSD – Some SSD support compression. This should be turned off. Compression should be managed at DBMS level.

Tuning Normalization A single normalized relation XYZ is better than two normalized relations XY and XZ if the single relation design allows queries to access X, Y and Z together without requiring a join. The two-relation design is better iff: – Users access tend to partition between the two sets Y and Z most of the time – Attributes Y or Z have large values

Vertical Partitioning Three attributes: account_ID, balance, address. Functional dependencies: – account_ID  balance – account_ID  address Two normalized schema design: – (account_ID, balance, address) or – (account_ID, balance) – (account_ID, address) Which design is better?

Vertical Partitioning Which design is better depends on the query pattern: – The application that sends a monthly statement is the principal user of the address of the owner of an account – The balance is updated or examined several times a day. The second schema might be better because the relation (account_ID, balance) can be made smaller: – More account_ID, balance pairs fit in memory, thus increasing the hit ratio – A scan performs better because there are fewer pages.

Vertical Antipartitioning Brokers base their bond-buying decisions on the price trends of those bonds. The database holds the closing price for the last 3000 trading days, however the 10 most recent trading days are especially important. – (bond_id, issue_date, maturity, …) (bond_id, date, price) Vs. – (bond_id, issue_date, maturity, today_price, …10dayago_price) (bond_id, date, price)