Physical Database Design

Conceptual design -> logical design -> physical design (ER diagram -> relational database -> physical design)

Physical Database Design
Purpose: to translate the logical description of data into the technical specifications for storing and retrieving data.
Goal: to create a design for storing data that will provide adequate performance and ensure database integrity, security and recoverability.

Physical Design Information
Information needed for physical file and database design includes: normalized relations plus size estimates for them; definitions of each attribute; descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often; expectations and requirements for response time, data security, backup, recovery, retention and integrity; and descriptions of the technologies used to implement the database.

Physical Design Decisions
During this phase, decisions are made about: storage format, physical record composition, data arrangement, indexes, and query optimization and performance tuning.

Physical Design Decisions
During this phase, decisions are also made to:
create the base relations: name, attributes, primary key, foreign keys, alternate keys, indexes;
implement the integrity rules: domain, entity, referential and enterprise constraints. Referential actions for deleting and updating include no action, cascade, set null, set default and no check.
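
A minimal sketch of how these decisions look as SQL DDL, run through Python's built-in sqlite3 module. The Department/Employee tables and their columns are hypothetical. SQLite supports NO ACTION, RESTRICT, SET NULL, SET DEFAULT and CASCADE; the set of available referential actions varies between DBMSs.

```python
import sqlite3

# Hypothetical schema illustrating base relations and integrity rules.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (
    dept_no   INTEGER PRIMARY KEY,              -- primary key (entity integrity)
    dept_name TEXT NOT NULL UNIQUE              -- alternate key
);
CREATE TABLE Employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  NUMERIC CHECK (salary >= 0),        -- domain rule
    dept_no INTEGER REFERENCES Department(dept_no)
            ON DELETE SET NULL                  -- referential action on delete
            ON UPDATE CASCADE                   -- referential action on update
);
CREATE INDEX emp_dept_idx ON Employee(dept_no); -- index on the foreign key
""")
conn.execute("INSERT INTO Department VALUES (10, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (1, 'Adams', 5000, 10)")

# ON UPDATE CASCADE: changing the parent key updates the child row.
conn.execute("UPDATE Department SET dept_no = 20 WHERE dept_no = 10")
print(conn.execute("SELECT dept_no FROM Employee WHERE emp_no = 1").fetchone())  # (20,)

# ON DELETE SET NULL: deleting the parent clears the child's foreign key.
conn.execute("DELETE FROM Department WHERE dept_no = 20")
print(conn.execute("SELECT dept_no FROM Employee WHERE emp_no = 1").fetchone())  # (None,)
```

Choosing SET NULL rather than CASCADE on delete keeps the employee row when its department disappears; CASCADE would delete it.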

Storage Format
Choose the storage format (data type) of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database. The data type (format) is chosen to minimize storage space and maximize data integrity.

Objectives of Data Type Selection
Minimize storage space; represent all possible values; improve data integrity; support all data manipulations. The correct data type should, in minimal space, represent every possible value of the associated attribute (while eliminating illegal values) and support the required data manipulations (e.g. numerical or string operations).
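
To make the "minimal space, every possible value" trade-off concrete, a small Python sketch can pick the smallest fixed-size integer format that still covers an attribute's legal range. The struct format codes are standard Python; the attribute ranges are hypothetical. A type alone cannot eliminate every illegal value (a 1-byte integer still accepts 120 for a percentage), so a DBMS would add a CHECK constraint for that part.

```python
import struct

# Candidate signed integer formats, smallest first.
# '<' means standard sizes with no padding: b = 1, h = 2, i = 4, q = 8 bytes.
INT_FORMATS = ["<b", "<h", "<i", "<q"]

def smallest_int_format(lo, hi):
    """Return the smallest signed integer format able to represent every value in [lo, hi]."""
    for fmt in INT_FORMATS:
        bits = struct.calcsize(fmt) * 8
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return fmt
    raise ValueError("range does not fit in a 64-bit integer")

# Hypothetical attributes with known legal ranges.
print(smallest_int_format(0, 100))      # <b  (e.g. a percentage: 1 byte)
print(smallest_int_format(0, 50_000))   # <i  (50,000 exceeds the 2-byte maximum of 32,767)
```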

Choosing Data Types
Examples of data types (as found in, e.g., Oracle):
CHAR: fixed-length character string
VARCHAR: variable-length character string (memo)
LONG: long variable-length character data
NUMBER: positive or negative number
DATE: actual date
BLOB: binary large object (good for graphics, sound clips, etc.)

Designing Physical Records
A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit. Records can be either fixed-length or variable-length.

Data Storage
Data is stored in memory devices, which can be classified as: cache memory; primary (main) memory; secondary memory (disk, tape).

The Memory Hierarchy
Processor cache: access time about 10 ns; storage capacity about 512 KB; volatile.
Main memory (acts as a cache for the disk): storage capacity 256 MB to 1 GB; access time 10 to 100 ns; volatile.
Disk: persistent storage; capacity 10 to 100 GB; access time 10 to 15 ms.
Tape: transfer rate 1.5 MB/s; typical capacity 280 GB; only sequential access.

Main Memory
Fastest and most expensive storage (excluding cache). Today 512 MB is common even on PCs, so many databases could fit in memory. A newer industry trend is the main-memory database, e.g. TimesTen. The main issue is volatility.

Secondary Storage
Secondary storage means disks. Disks are slower and cheaper than main memory, and they are non-volatile: the data is stored permanently. The unit of disk I/O is the block, typically 4 KB; a disk block is also called a disk page or simply a page. The blocking factor (bfr) of a file is the average number of records stored in a disk block.
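
For fixed-length records that do not span blocks, the blocking factor is bfr = floor(B / R) for block size B and record size R, and a file of r records needs ceil(r / bfr) blocks. A small Python sketch; the 100-byte record and the 10,000-record file are assumed values.

```python
def blocking_factor(block_size, record_size):
    """bfr for fixed-length, unspanned records: how many whole records fit in one block."""
    return block_size // record_size

def blocks_needed(num_records, bfr):
    """Blocks required to store the file; the last block may be only partly full."""
    return (num_records + bfr - 1) // bfr   # integer ceiling division

B = 4096      # block size in bytes (the typical 4 KB)
R = 100       # record size in bytes (assumed)
r = 10_000    # number of records (assumed)

bfr = blocking_factor(B, R)
print(bfr)                     # 40
print(blocks_needed(r, bfr))   # 250
print(B - bfr * R)             # 96 bytes left unused in each block
```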

The Mechanics of Disk
Mechanical characteristics:
Rotation speed (e.g. 5400 RPM)
Number of platters (1-30)
Number of tracks (<= 10000)
Number of sectors (256/track)
Number of bytes/sector (2^9 = 512)
Block size (2^12 = 4096)
[Figure: disk with platters on a spindle, tracks, sectors, a cylinder, disk heads and the arm assembly with its arm movement.]

Important Disk Access Characteristics
Block access time = disk latency + transfer time
Disk latency = seek time + rotational latency
Seek time = time for the head to reach the right track, 10-40 ms
Rotational latency = time for the disk to rotate to the right sector
Time for one rotation = 60,000 ms / RPM, about 10 ms (e.g. 11.1 ms at 5400 RPM)
Average rotational latency = half a rotation, about 5 ms
Transfer rate is typically 5-10 MB/s, so transfer time = block size / transfer rate
Disks read and write one block at a time (typically 4 KB).
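
Putting the formulas together, a small Python calculation of one block access. The 10 ms seek, 6000 RPM, 4 KB block and 5 MB/s transfer rate are assumed values chosen within the ranges above, with MB taken as 10^6 bytes.

```python
def block_access_time_ms(seek_ms, rpm, block_bytes, transfer_rate_bytes_per_s):
    """Block access time = seek time + average rotational latency + transfer time."""
    rotation_ms = 60_000 / rpm                  # one full rotation
    rotational_latency_ms = rotation_ms / 2     # on average the sector is half a turn away
    transfer_ms = block_bytes / transfer_rate_bytes_per_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

t = block_access_time_ms(seek_ms=10, rpm=6000, block_bytes=4096,
                         transfer_rate_bytes_per_s=5_000_000)
print(round(t, 2))   # 15.82
```

The seek and rotational delays (15 ms) dwarf the transfer time (under 1 ms), which is why physical design aims to reduce the number of block accesses rather than the number of bytes read.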

Representing Data Elements
Relational database elements are declared in SQL, for example:

    CREATE TABLE Product (
        pid         INT PRIMARY KEY,
        name        CHAR(20),
        description VARCHAR(200),
        maker       CHAR(10) REFERENCES Company(name)
    )

A tuple is represented as a record.

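Record Formats: Fixed Length
All fields in the record have a fixed length, so the length of the record is fixed and all records are equal in length. Each field therefore starts at a constant offset from the record's base address B; for example, the third field F3 starts at address B + L1 + L2, where L1 and L2 are the lengths of F1 and F2.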
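
A sketch of the fixed-length layout using Python's struct module, applied to the fixed-size columns of the Product table. Leaving out the VARCHAR description column is an assumption made so that every record has the same length.

```python
import struct

# Fixed-length layout for the fixed-size fields of Product: pid INT, name CHAR(20),
# maker CHAR(10). The VARCHAR description is left out to keep every record the same size.
# '<' = standard sizes, no padding; i = 4-byte int; 20s / 10s = fixed-width byte strings.
PRODUCT_FMT = "<i20s10s"
FIELD_LENGTHS = [4, 20, 10]                  # L1, L2, L3

def field_offset(i):
    """Offset of field i (0-based) from base address B: the sum of the lengths before it."""
    return sum(FIELD_LENGTHS[:i])

rec = struct.pack(PRODUCT_FMT, 7, b"Gizmo", b"Acme")   # short strings are padded with NUL bytes
print(len(rec))                  # 34: every record has the same length
print(field_offset(2))           # 24 = L1 + L2, where maker starts
start = field_offset(2)
maker = rec[start:start + FIELD_LENGTHS[2]].rstrip(b"\0")
print(maker)                     # b'Acme'
```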

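Record Header
[Figure: a record header (pointer to the schema, record length, timestamp) followed by fields F1 to F4 of lengths L1 to L4.]
The header is needed because the schema may change, so for a while records in the new and the old format may coexist, and because records from different relations may be stored together.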

Variable Length Records
[Figure: a header (record length plus other header information) followed by fields F1 to F4 of lengths L1 to L4.]
Place the fixed-length fields first (F1, F2), then the variable-length fields (F3, F4).
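
One common way to lay out such a record, sketched in Python: a header holding the record length and the offset of each variable field, then the fixed fields, then the variable fields. The header layout (three 2-byte unsigned integers) and the field types are assumptions for illustration, not a format given in the slides.

```python
import struct

# Assumed layout:
#   header: total length, offset of F3, offset of F4 (2-byte unsigned ints)
#   fixed fields first: F1 = 4-byte int, F2 = 10-byte string
#   then the variable fields F3 and F4 as raw bytes
HEADER_FMT = "<HHH"
FIXED_FMT = "<i10s"

def encode(f1, f2, f3, f4):
    header_len = struct.calcsize(HEADER_FMT)
    fixed = struct.pack(FIXED_FMT, f1, f2)
    off3 = header_len + len(fixed)          # F3 starts right after the fixed part
    off4 = off3 + len(f3)
    total = off4 + len(f4)
    return struct.pack(HEADER_FMT, total, off3, off4) + fixed + f3 + f4

def decode(rec):
    total, off3, off4 = struct.unpack_from(HEADER_FMT, rec, 0)
    f1, f2 = struct.unpack_from(FIXED_FMT, rec, struct.calcsize(HEADER_FMT))
    return f1, f2.rstrip(b"\0"), rec[off3:off4], rec[off4:total]

r = encode(7, b"Gizmo", b"A small gadget", b"Acme Corp")
print(len(r))      # 43 = 6 (header) + 14 (fixed) + 14 + 9
print(decode(r))   # (7, b'Gizmo', b'A small gadget', b'Acme Corp')
```

Keeping the fixed fields first means they stay at constant offsets, so only the variable fields need offsets in the header.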

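Records With Referencing Fields
[Figure: a header (record length plus other header information) followed by fields F1 to F3 of lengths L1 to L3, including a field that refers to another record.]
Referencing fields are used, e.g., to represent one-to-many or many-to-many relationships.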

Storing Records in Blocks
Blocks have a fixed size (typically 4 KB), and each block holds several records.
[Figure: one block containing records R1, R2, R3 and R4.]

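Spanning Records Across Blocks
[Figure: consecutive blocks, each starting with a block header; records R1, R2 and R3 are stored in sequence, and record R2 is split across the block boundary.]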

BLOB
Binary large objects, e.g. images and sounds, are supported by modern database systems. Storage: the system attempts to keep the blocks of a BLOB clustered together.

Modifications: Insertion
If the file is unsorted, add the new record at the end. If the file is sorted:
Is there space in the right block? If yes, store the record there.
Otherwise, is there space in a neighboring block? Look 1-2 blocks to the left or right and shift records.
If everything else fails, create an overflow block.
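
A minimal sketch of these insertion rules for a sorted file, with blocks modelled as Python lists of keys. The block capacity of 3 is an assumption, blocks are assumed non-empty, and for brevity only the right-hand neighbor is checked before falling back to an overflow block.

```python
import bisect

CAPACITY = 3  # records per block (assumed, tiny for illustration)

class SortedFile:
    def __init__(self, blocks):
        self.blocks = blocks                   # list of sorted, non-empty key lists
        self.overflow = [[] for _ in blocks]   # one overflow block per data block

    def _home_block(self, key):
        # The last block whose first key <= key (block 0 if key is smaller than all).
        firsts = [b[0] for b in self.blocks]
        return max(bisect.bisect_right(firsts, key) - 1, 0)

    def insert(self, key):
        i = self._home_block(key)
        block = self.blocks[i]
        if len(block) < CAPACITY:              # space in the right block
            bisect.insort(block, key)
            return "home"
        nxt = i + 1
        if nxt < len(self.blocks) and len(self.blocks[nxt]) < CAPACITY:
            bisect.insort(block, key)          # insert, then shift the largest record right
            self.blocks[nxt].insert(0, block.pop())
            return "shifted"
        bisect.insort(self.overflow[i], key)   # last resort: overflow block
        return "overflow"

f = SortedFile([[10, 20, 30], [40, 50], [60, 70, 80]])
print(f.insert(25))   # shifted
print(f.blocks)       # [[10, 20, 25], [30, 40, 50], [60, 70, 80]]
print(f.insert(45))   # overflow
print(f.overflow)     # [[], [45], []]
```

The shifted record can go to the front of the next block because every key in a block is smaller than the first key of the block after it.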

Overflow Blocks
[Figure: blocks n-1, n and n+1, with an overflow block chained to block n.]
After a while the file becomes dominated by overflow blocks; then it is time to reorganize it.

Modifications: Deletions
Free the space in the block and shift records to close the gap. It may then be possible to eliminate an overflow block.

Modifications: Updates
If the new record is shorter than the previous one, the update is easy. If it is longer, records must be shifted or overflow blocks created.

Physical Addresses
Each block and each record has a physical address that consists of: the disk; the cylinder number; the track number; the block within the track; and, for records, an offset within the block.

Logical Addresses
A logical address is a string of bytes (10-16 bytes). It is more flexible, because blocks and records can be moved around, but it needs a translation table from logical to physical addresses (L1 -> P1, L2 -> P2, L3 -> P3).

Main Memory Address
When a block is read into main memory, it receives a main memory address. The buffer manager keeps another translation table between memory addresses and logical addresses (M1 -> L1, M2 -> L2, M3 -> L3).
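
The two translation tables can be sketched with Python dictionaries. The PhysicalAddress fields follow the components listed for physical addresses; all address values are made up for illustration, and a real buffer manager also handles pinning, page replacement and write-back, which are omitted here.

```python
from typing import NamedTuple

class PhysicalAddress(NamedTuple):
    disk: int
    cylinder: int
    track: int
    block: int
    offset: int          # offset of the record within the block

# Translation table: logical address -> physical address (hypothetical values).
map_table = {
    "L1": PhysicalAddress(disk=0, cylinder=12, track=3, block=5, offset=128),
    "L2": PhysicalAddress(disk=0, cylinder=40, track=1, block=2, offset=0),
}

# Buffer manager's table: logical address -> main-memory address of the loaded block.
buffer_table = {"L1": "M1"}

def move_record(logical, new_physical):
    """Relocating a record only changes the table; the logical address stays valid."""
    map_table[logical] = new_physical

def resolve(logical):
    """Use the in-memory copy if the block is buffered, otherwise go to disk."""
    if logical in buffer_table:
        return ("memory", buffer_table[logical])
    return ("disk", map_table[logical])

move_record("L2", PhysicalAddress(disk=1, cylinder=7, track=0, block=9, offset=256))
print(resolve("L1"))   # ('memory', 'M1')
print(resolve("L2"))   # ('disk', PhysicalAddress(disk=1, cylinder=7, track=0, block=9, offset=256))
```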

Physical Design
Interface 1: the user sends a request to the DBMS. The user presents a query, and the DBMS determines which physical databases are needed to resolve it.
Interface 2: the DBMS uses an internal-model access method to access the data stored in a logical database.
Interface 3: the internal-model access methods and the operating system's access methods access the physical records of the database.

Physical File Design
A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records.
Pointer: a field of data that can be used to locate a related field or record of data.
Access method: an operating system algorithm for storing and locating data in secondary storage.
Page: the amount of data read or written in one disk input or output operation.

Internal Model Access Methods
There are many types of access methods: physical sequential, indexed sequential, indexed random, direct and hashed. They differ in access efficiency and storage efficiency.

Physical Sequential
Key values of the physical records are in logical sequence. The main use is for “dump” and “restore”. The access method may be used for storage as well as retrieval. Storage efficiency is near 100%. Access efficiency is poor (unless the physical records are fixed-size).

Sequential File Organization
A sequential file is one in which the records are stored in sorted order of one or more key fields.

Sequential File Organization
Sequential access means that data is accessed in an ordered sequence. Sequential access is sometimes the only way of accessing the data, for example on tape, where records are usually stored and processed one after the other.

[Figure: sequential file]

Advantages
Simple file design. Very efficient when most of the records must be processed, e.g. payroll. Very efficient if the data has a natural order. Can be stored on inexpensive devices such as magnetic tape.

Disadvantages
The entire file must be processed even if only a single record is searched for; processing is slow because every record before the wanted one must be read. Transactions have to be sorted before processing.

Sequential File Organization
A collection of records stored in key sequence. Adding or deleting a record requires creating a new file, so that the sequence is maintained. Used as master files.
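
These slides explain why sequential files are updated in batches: the transactions are sorted, and a new master file is written. A minimal Python sketch of that classic sequential update, a merge of two key-ordered streams; in-memory lists stand in for files, and the transaction codes "A", "U", "D" and the sample records are hypothetical.

```python
def update_master(master, transactions):
    """Produce a new master file from a master file sorted by key and a batch of
    transactions ("A" add, "U" update, "D" delete). Assumes at most one transaction per key."""
    txns = sorted(transactions, key=lambda t: t[1])   # transactions must be in key order
    new_master, i = [], 0
    for op, key, *data in txns:
        # Copy unchanged master records whose keys come before this transaction.
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        exists = i < len(master) and master[i][0] == key
        if op == "A" and not exists:
            new_master.append((key, *data))           # add a new record
        elif op == "U" and exists:
            new_master.append((key, *data))           # write the updated record
            i += 1
        elif op == "D" and exists:
            i += 1                                    # delete: do not copy it
        # Any other combination (e.g. update of a missing key) is ignored in this sketch.
    new_master.extend(master[i:])                     # copy the remaining records
    return new_master

master = [(1, "Adams"), (3, "Becker"), (5, "Getta")]
txns = [("U", 5, "Harty"), ("A", 2, "Dumpling"), ("D", 3)]
print(update_master(master, txns))
# [(1, 'Adams'), (2, 'Dumpling'), (5, 'Harty')]
```

Each file is read once, front to back, which is why the approach works even on tape.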

Indexed Sequential
Key values of the physical records are in logical sequence. The access method may be used for storage and retrieval. An index of key values is maintained, with an entry for the highest key value in each block (or group of blocks). Access efficiency depends on the number of index levels, the storage allocated for the index, the number of database records, and the amount of overflow. Storage efficiency depends on the size of the index and the volatility of the database.

Indexed Sequential File
Each record of the file has a key field that uniquely identifies it. An index consists of keys and addresses, just like the index of a book: the pages of a book are stored sequentially, so you can read through it page by page, or you can look up the page you want in the index and turn straight to it.

Indexed Sequential File
An indexed sequential file is a sequential file (i.e. sorted in order of a key field) that has an index. A full index is one with an entry for every record. Because every record has an index entry, individual records can be accessed directly, without scrolling through all the records before them.

Indexed Sequential File
Indexed sequential files are important for applications where data needs to be accessed both sequentially, one record after another, and randomly, using the index.

An Example of an Indexed Sequential File
A company may store details about its employees as an indexed sequential file. Sometimes the file is accessed sequentially, for example when the whole file is processed to produce pay slips at the end of the month.

An Example of an Indexed Sequential File
Sometimes the file is accessed randomly, for example when an employee changes address, or marries and changes surname.

Indexed Sequential File
An indexed sequential file can only be stored on a random-access device, e.g. magnetic disk or CD, because it requires direct access to any record rather than the sequential access that magnetic tape allows.

Advantages
Provides flexibility for users who need both types of access to the same file. Faster than a purely sequential file for accessing individual records.

Disadvantages
Extra storage space is required for the index, just as in a book: a textbook with 372 pages of content might run to 380 pages once the index is added.

Indexed Sequential File: Index and Data File
[Figure: index entries (block number -> highest key value in the block): 1 -> Dumpling, 2 -> Harty, 3 -> Texaci, …; the data file blocks hold records such as Adams, Becker, Getta, Mobile and Sunoci in key order.]
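
A sketch of a lookup through the sparse index in the figure, in Python. The index entries (Dumpling, Harty and Texaci as the highest keys of blocks 1 to 3) come from the figure; which records sit in which data block is an assumption made so that the example runs.

```python
import bisect

# Sparse index: highest key in each block, in ascending order (values from the figure).
index_keys = ["Dumpling", "Harty", "Texaci"]
index_blocks = [1, 2, 3]

# Data blocks; the placement of records in blocks is assumed for illustration.
data_blocks = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}

def find(key):
    """Random access: use the index to pick one block, then scan only that block."""
    pos = bisect.bisect_left(index_keys, key)   # first block whose highest key >= key
    if pos == len(index_keys):
        return None                             # key is beyond the last block
    block = index_blocks[pos]
    return block if key in data_blocks[block] else None

def scan():
    """Sequential access: read the blocks in key order."""
    for b in index_blocks:
        yield from data_blocks[b]

print(find("Getta"))     # 2
print(find("Zeta"))      # None
print(list(scan())[:3])  # ['Adams', 'Becker', 'Dumpling']
```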

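Indexed Sequential: Two Levels
[Figure: indexed sequential file with two index levels; index blocks at addresses 7, 8, 9, … hold key values (e.g. 385, 678, 805) that lead to data blocks 1 to 6 holding records with keys such as 001, 003, 150, 251, 455, 480, 536, 605, 610, 705, 710, 785 and 791.]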

Indexed Random
Key values of the physical records are not necessarily in logical sequence. The index may be stored and accessed with the Indexed Sequential Access Method. The index has an entry for every database record, and these entries are in ascending order: the index keys are in logical sequence even though the database records are not necessarily in ascending sequence. The access method may be used for storage and retrieval.
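
A sketch of the indexed random idea in Python: the dense index is kept in key order, while the block numbers it points to follow no particular order. The block assignments below are assumed for illustration.

```python
import bisect

# Dense index: one (key, block number) entry per record, kept in ascending key order.
# The block numbers are assumed; records sit in whichever block had room.
index = [("Adams", 2), ("Becker", 1), ("Dumpling", 3), ("Getta", 1), ("Harty", 2)]

def lookup(key):
    """Binary-search the sorted index, then go straight to the record's block."""
    # (key,) sorts before every (key, block) tuple, so this finds the first entry for key.
    pos = bisect.bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None

def insert(key, block):
    """The record can go into any block with space; only its index entry must be placed in order."""
    bisect.insort(index, (key, block))

insert("Cole", 3)
print(lookup("Cole"))            # 3
print([b for _, b in index])     # [2, 1, 3, 3, 1, 2]: key order, not block order
```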

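Indexed Random
[Figure: the index lists key values Adams, Becker, Dumpling, Getta and Harty in ascending order, each with the number of the block (e.g. 2, 1, 3) holding the record, so the block numbers are not in sequence.]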

B-tree
[Figure: B-tree whose root node holds keys F | P | Z; its child nodes hold B | D | F, H | L | P and R | S | Z; the leaves hold records such as Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers and Seminoles.]

Direct (Random) File Organization
Records are read directly from, or written directly to, the file. Each record is stored at a known address, which is calculated by applying a mathematical function to the key field.

Direct
Key values of the physical records are not necessarily in logical sequence. There is a one-to-one correspondence between a record key and the physical address of the record. May be used for storage and retrieval. No duplicate keys are permitted.
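
A minimal sketch of direct organization in Python, assuming integer keys from a known, dense range so that the "mathematical function" can simply be the key minus the lowest key. The class name and the key range are hypothetical; when keys are sparse or non-numeric, a hash function is used instead.

```python
class DirectFile:
    """Direct file organization: the record's address is computed from its key.
    Assumes integer keys in a known range [low, low + size), so key -> address is one-to-one."""

    def __init__(self, low, size):
        self.low = low
        self.slots = [None] * size          # one slot (address) per possible key

    def address(self, key):
        addr = key - self.low               # the "mathematical function" applied to the key
        if not 0 <= addr < len(self.slots):
            raise KeyError(f"key {key} outside the file's key range")
        return addr

    def store(self, key, record):
        addr = self.address(key)
        if self.slots[addr] is not None:
            raise KeyError(f"duplicate key {key}")   # no duplicate keys permitted
        self.slots[addr] = record

    def retrieve(self, key):
        return self.slots[self.address(key)]

f = DirectFile(low=1000, size=100)   # hypothetical employee numbers 1000 to 1099
f.store(1042, "Adams")
print(f.address(1042), f.retrieve(1042))   # 42 Adams
```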

Hashing
A bucket is a unit of storage containing one or more records (a bucket is typically a disk block). In a hash file organization, the bucket of a record is obtained directly from its search-key value using a hash function. The hash function h maps the set of all search-key values K to the set of all bucket addresses B.

Hashing Organization
The hash function is used to locate records for access, insertion and deletion. Records with different search-key values may be mapped to the same bucket, so the entire bucket has to be searched sequentially to locate a record.

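Example: Insertion (2 records per bucket)
Insert a, b, c and d with h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0: bucket 0 holds d, bucket 1 holds a and c, and bucket 2 holds b. Inserting e with h(e) = 1 finds bucket 1 already full, so e is placed in an overflow block for bucket 1.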

Example: Deletion
[Figure: deleting e, f and c from a hash file with 2 records per bucket; when a deletion frees a slot, a record from the bucket's overflow block (e.g. “g”) may be moved up into it.]
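
The insertion example and the "move up" idea from the deletion example can be replayed with a small Python sketch. Buckets hold 2 records, as in the examples; the four-bucket file, the class name and the policy of moving the first overflow record up after a deletion are assumptions, and the hash function is simply a lookup of the example's h values.

```python
BUCKET_SIZE = 2   # records per bucket, as in the examples

class HashFile:
    def __init__(self, num_buckets, h):
        self.h = h                                          # hash function: key -> bucket number
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]    # one overflow block per bucket

    def insert(self, key):
        b = self.h(key)
        if len(self.buckets[b]) < BUCKET_SIZE:
            self.buckets[b].append(key)
        else:
            self.overflow[b].append(key)                    # bucket full: use its overflow block

    def search(self, key):
        b = self.h(key)
        # Several keys can hash to the same bucket, so the bucket is scanned sequentially.
        return key in self.buckets[b] or key in self.overflow[b]

    def delete(self, key):
        b = self.h(key)
        if key in self.buckets[b]:
            self.buckets[b].remove(key)
            if self.overflow[b]:                            # move a record up from overflow
                self.buckets[b].append(self.overflow[b].pop(0))
        elif key in self.overflow[b]:
            self.overflow[b].remove(key)

# h values from the insertion example; a real h would compute the bucket from the key.
h_values = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
f = HashFile(num_buckets=4, h=h_values.__getitem__)
for k in "abcde":
    f.insert(k)
print(f.buckets)    # [['d'], ['a', 'c'], ['b'], []]
print(f.overflow)   # [[], ['e'], [], []]
f.delete("c")
print(f.buckets[1], f.overflow[1])   # ['a', 'e'] []
```

Moving a record up after a deletion keeps overflow chains short, so later searches touch fewer blocks.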