Physical DataBase Design

Conceptual design -> logical design -> physical design
(ER diagram -> relational database -> physical design)

Physical Database Design
Purpose: to translate the logical description of data into the technical specifications for storing and retrieving data.
Goal: to create a design for storing data that will provide adequate performance and ensure database integrity, security and recoverability.

Physical Design Information
Information needed for physical file and database design includes:
- Normalized relations, plus size estimates for them
- Definitions of each attribute
- Descriptions of where and when data are used (entered, retrieved, deleted and updated), and how often
- Expectations and requirements for response time, data security, backup, recovery, retention and integrity
- Descriptions of the technologies used to implement the database

Physical Design Decisions
During this phase, decisions are taken on:
- Storage format
- Physical record composition
- Data arrangement
- Indexes
- Query optimization and performance tuning

Physical Design Decisions
During this phase, decisions are also taken on how to:
- Create base relations: name, attributes, primary key, foreign keys, alternate keys, indexes
- Implement integrity rules: domain, enterprise, referential (no action, cascade, set null, set default, or no check, for deleting and updating) and entity
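As a rough illustration of base relations with a primary key, a foreign key and a referential rule, here is a minimal sketch using Python's built-in sqlite3 module. It borrows the Product and Company relations from the CREATE TABLE example later in these slides; the sample rows and the choice of ON DELETE CASCADE are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Company (
        name CHAR(10) PRIMARY KEY
    )""")
conn.execute("""
    CREATE TABLE Product (
        pid   INT PRIMARY KEY,
        name  CHAR(20) NOT NULL,
        maker CHAR(10) REFERENCES Company(name) ON DELETE CASCADE
    )""")

# Sample rows (assumed, for illustration only).
conn.execute("INSERT INTO Company VALUES ('Acme')")
conn.execute("INSERT INTO Product VALUES (1, 'Widget', 'Acme')")

# Referential integrity: a product may not name a maker that does not exist.
try:
    conn.execute("INSERT INTO Product VALUES (2, 'Gadget', 'Nobody')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Cascade rule: deleting the company also deletes its products.
conn.execute("DELETE FROM Company WHERE name = 'Acme'")
remaining = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(rejected, remaining)
```

Swapping CASCADE for SET NULL or NO ACTION in the column definition shows the other referential behaviours listed above.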

Storage Format
Choose the storage format of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database. The data type (format) is chosen to minimize storage space and maximize data integrity.

Objectives of data type selection
- Minimize storage space
- Represent all possible values
- Improve data integrity
- Support all data manipulations
The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and support the required data manipulations (e.g. numerical or string operations).

Choosing Data Types
CHAR – fixed-length character
VARCHAR – variable-length character (memo)
LONG – large number
NUMBER – positive/negative number
DATE – actual date
BLOB – binary large object (good for graphics, sound clips, etc.)

Designing Physical Records
A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit. Records can be either fixed length or variable length.

Data Storage
Data is stored in memory, which can be classified as:
- Cache memory
- Primary memory
- Secondary memory: disk
- Tape

The Memory Hierarchy
Processor cache: access time 10 ns, storage capacity 512K
Main memory (= disk cache): volatile, storage capacity 256MB-1GB, access time in nanoseconds
Disk: persistent, storage capacity in GB, access time 10-15 ms
Tape: 1.5 MB/s transfer rate, 280 GB typical capacity, only sequential access

Main Memory
- Fastest, most expensive (excluding cache)
- Today: 512MB is common even on PCs
- Many databases could fit in memory
- New industry trend: main memory databases, e.g. TimesTen
- Main issue is volatility

Secondary Storage
Secondary storage means disks.
- Slower and cheaper than main memory
- Non-volatile: the data stays stored when the power is off
- The unit of disk I/O is the block; typically 1 block = 4k
- A disk block is also called a disk page or simply a page
- The blocking factor (bfr) for a file is the average number of records stored in a disk block
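The blocking factor can be worked out with a few lines of arithmetic. The sketch below assumes fixed-length records that are not split across blocks, so bfr is the block size divided by the record size, rounded down; the record size and record count are made-up figures.

```python
import math

def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole records that fit in one block (fixed-length, unspanned records)."""
    return block_size // record_size

def blocks_needed(num_records: int, bfr: int) -> int:
    """Number of blocks required to hold the whole file."""
    return math.ceil(num_records / bfr)

BLOCK_SIZE = 4096     # 1 block = 4k, as in the slide
RECORD_SIZE = 100     # assumed record length in bytes
NUM_RECORDS = 30_000  # assumed number of records in the file

bfr = blocking_factor(BLOCK_SIZE, RECORD_SIZE)
total_blocks = blocks_needed(NUM_RECORDS, bfr)
print(bfr, total_blocks)
```

With these figures each block holds 40 records and wastes 96 bytes, and the file occupies 750 blocks.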

The Mechanics of Disk
Mechanical characteristics:
- Rotation speed (5400 RPM)
- Number of platters (1-30)
- Number of tracks (<=10000)
- Number of sectors (256/track)
- Number of bytes per sector (2^9 = 512)
- Block size (2^12 = 4096)
[Figure: disk drive showing the spindle, platters, tracks, sectors, a cylinder, the disk head, the arm assembly and arm movement]

Important Disk Access Characteristics
Block access time = disk latency + transfer time
Disk latency = seek time + rotational latency
Seek time = time for the head to reach the right track: 10ms – 40ms
Rotational latency = rotation time to get to the right sector
Time for one rotation = 10ms
Average rotational latency = 10ms / 2 = 5ms (on average the disk must turn half a rotation)
Transfer rate is typically 5-10MB/s
Disks read/write one block at a time (typically 4kB)
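To see how the pieces add up, here is a small sketch of the block access time formula. It takes the average rotational latency to be half of one rotation, which is the usual convention, and plugs in values from the ranges quoted above; the function name is made up for illustration.

```python
def block_access_time_ms(seek_ms: float, rotation_ms: float,
                         block_bytes: int, transfer_mb_per_s: float) -> float:
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2  # assumption: half a rotation on average
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

# Best case from the slide's ranges: 10 ms seek, 10 ms per rotation,
# one 4 kB block at 10 MB/s.
t = block_access_time_ms(seek_ms=10, rotation_ms=10,
                         block_bytes=4096, transfer_mb_per_s=10)
print(round(t, 2), "ms")
```

The transfer of the block itself takes well under a millisecond, so seek time and rotational latency dominate. This is why keeping related records in the same or adjacent blocks matters so much.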

Representing Data Elements
Relational database elements:
    CREATE TABLE Product (
        pid INT PRIMARY KEY,
        name CHAR(20),
        description VARCHAR(200),
        maker CHAR(10) REFERENCES Company(name)
    )
A tuple is represented as a record.

Record Formats: Fixed Length
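All fields in the record are fixed in length, so the length of the record is fixed and all records are equal in length.
With base address B and field lengths L1, L2, ..., the address of a field is B plus the lengths of the fields before it, e.g. address = B + L1 + L2 for the third field.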
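The address calculation for fixed-length records can be sketched as follows. The base address and the four field lengths are invented for the example; only the rule (base address plus the lengths of the preceding fields) comes from the slide.

```python
from itertools import accumulate

def field_addresses(base: int, lengths: list[int]) -> list[int]:
    """Start address of each field: B, B+L1, B+L1+L2, ..."""
    offsets = [0] + list(accumulate(lengths))[:-1]
    return [base + off for off in offsets]

def record_address(file_base: int, record_length: int, n: int) -> int:
    """Start address of the n-th record (counting from 0) in a file of equal-length records."""
    return file_base + n * record_length

B = 1000                   # assumed base address of the record
LENGTHS = [4, 20, 10, 8]   # assumed lengths L1..L4 of fields F1..F4

addresses = field_addresses(B, LENGTHS)
record_length = sum(LENGTHS)
print(addresses, record_length)
```

Because every record has the same length, the start of any record can also be computed directly from its position in the file, with no searching.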

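Record Header
[Figure: a record made of a header followed by fields F1 to F4 of lengths L1 to L4; the header holds a pointer to the schema, the record length and a timestamp]
Need the header because:
- The schema may change; for a while new and old records may coexist
- Records from different relations may coexist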

Variable Length Records
[Figure: a record made of a header followed by fields F1 to F4 of lengths L1 to L4; the header holds the record length and other header information]
Place the fixed-length fields first: F1, F2
Then the variable-length fields: F3, F4
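One possible byte layout for such a record is sketched below with Python's struct module. The exact header contents (total length plus the start offsets of the two variable-length fields) and the field types are assumptions chosen for the illustration, not a layout prescribed by the slides.

```python
import struct

HEADER = struct.Struct("<HHH")  # record length, offset of F3, offset of F4 (assumed header)
FIXED = struct.Struct("<i10s")  # F1: 4-byte integer, F2: 10-byte character field

def pack_record(f1: int, f2: str, f3: str, f4: str) -> bytes:
    """Header, then the fixed-length fields, then the variable-length fields."""
    v3, v4 = f3.encode(), f4.encode()
    off3 = HEADER.size + FIXED.size
    off4 = off3 + len(v3)
    total = off4 + len(v4)
    return (HEADER.pack(total, off3, off4)
            + FIXED.pack(f1, f2.encode().ljust(10))  # pad F2 with spaces to 10 bytes
            + v3 + v4)

def unpack_record(buf: bytes):
    """Use the offsets stored in the header to slice out the variable-length fields."""
    total, off3, off4 = HEADER.unpack_from(buf, 0)
    f1, f2 = FIXED.unpack_from(buf, HEADER.size)
    return (f1, f2.decode().rstrip(),
            buf[off3:off4].decode(), buf[off4:total].decode())

rec = pack_record(7, "Acme", "gizmo", "a longer description")
print(len(rec), unpack_record(rec))
```

Putting the fixed-length fields first means their positions never change, so only the variable-length fields need offsets in the header.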

Records With Referencing Fields
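[Figure: a record made of a header followed by fields F1 to F3 of lengths L1 to L3; the header holds the record length and other header information]
Some fields hold references to other records, e.g. to represent one-many or many-many relationships.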

Storing Records in Blocks
Blocks have a fixed size (typically 4k).
[Figure: a block holding records R1, R2, R3 and R4]

Spanning Records Across Blocks
[Figure: two blocks, each starting with a block header, holding records R1, R2 and R3; record R2 is split across the two blocks]

BLOB
- Binary large objects
- Supported by modern database systems
- E.g. images, sounds, etc.
- Storage: attempt to cluster blocks together

Modifications: Insertion
File is unsorted: add the record to the end.
File is sorted:
- Is there space in the right block? Yes: we are lucky, store it there
- Is there space in a neighboring block? Look 1-2 blocks to the left/right, shift records
- If all else fails, create an overflow block
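The insertion strategy for a sorted file can be sketched as below. This is a simplified model, not a real storage engine: blocks are Python lists of keys, the block capacity is assumed to be 3, and only the immediate left and right neighbours are checked before falling back to an overflow block.

```python
import bisect

CAPACITY = 3  # assumed number of records per block

def insert_sorted(blocks, overflow, key):
    """Insert key into a sorted file held as a list of sorted blocks."""
    # The "right" block is the first one whose largest key is >= key
    # (or the last block when key is larger than everything stored).
    i = next((n for n, b in enumerate(blocks) if b and key <= b[-1]),
             len(blocks) - 1)
    if len(blocks[i]) < CAPACITY:
        bisect.insort(blocks[i], key)
        return "in place"
    if i + 1 < len(blocks) and len(blocks[i + 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i + 1].insert(0, blocks[i].pop())   # shift the largest key right
        return "shifted right"
    if i > 0 and len(blocks[i - 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i - 1].append(blocks[i].pop(0))     # shift the smallest key left
        return "shifted left"
    overflow.setdefault(i, []).append(key)         # last resort: overflow block
    return "overflow"

blocks = [[10, 20, 30], [40, 50], [70, 80, 90]]
overflow = {}
results = [insert_sorted(blocks, overflow, k) for k in (25, 45, 95)]
print(results, blocks, overflow)
```

Once neighbouring blocks fill up, every further insertion lands in an overflow block, which is the point at which a reorganization becomes worthwhile.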

Overflow Blocks
[Figure: Block n-1, Block n and Block n+1, with an overflow block]
After a while the file starts being dominated by overflow blocks: time to reorganize.

Modifications: Deletions
Free space in block, shift records
May be able to eliminate an overflow block

Modifications: Updates
If the new record is shorter than the previous one: easy
If it is longer: need to shift records, create overflow blocks

Physical Addresses
Each block and each record has a physical address that consists of:
- The disk
- The cylinder number
- The track number
- The block within the track
- For records: an offset in the block

Logical Addresses
Logical address: a string of bytes (10-16)
More flexible: can move blocks/records around
But need a translation table:
    Logical address    Physical address
    L1                 P1
    L2                 P2
    L3                 P3
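A translation table of this kind can be modelled as a simple mapping. In the sketch below the address components follow the physical address parts listed earlier (disk, cylinder, track, block); the concrete numbers and the helper names are made up.

```python
from typing import NamedTuple

class PhysicalAddress(NamedTuple):
    disk: int
    cylinder: int
    track: int
    block: int

# Translation table: logical address -> physical address (sample entries).
translation = {
    "L1": PhysicalAddress(disk=0, cylinder=12, track=3, block=7),
    "L2": PhysicalAddress(disk=0, cylinder=12, track=3, block=8),
}

def resolve(logical: str) -> PhysicalAddress:
    """Look up where a logical address currently lives on disk."""
    return translation[logical]

def move_block(logical: str, new_address: PhysicalAddress) -> None:
    """Relocating a block only changes its table entry; references to L1 stay valid."""
    translation[logical] = new_address

before = resolve("L1")
move_block("L1", PhysicalAddress(disk=1, cylinder=4, track=0, block=2))
after = resolve("L1")
print(before, after)
```

The cost of this flexibility is one extra lookup on every access.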

Main Memory Address
When a block is read into main memory, it receives a main memory address.
The buffer manager has another translation table:
    Memory address    Logical address
    M1                L1
    M2                L2
    M3                L3

Physical Design
Interface 1: User request to the DBMS. The user presents a query, and the DBMS determines which physical databases are needed to resolve the query.
Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.
Interface 3: The internal model access methods and OS access methods access the physical records of the database.

Physical File Design
A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records.
Pointer - a field of data that can be used to locate a related field or record of data
Access method - an operating system algorithm for storing and locating data in secondary storage
Page - the amount of data read or written in one disk input or output operation

Internal Model Access Methods
Many types of access methods:
- Physical Sequential
- Indexed Sequential
- Indexed Random
- Direct
- Hashed
They differ in:
- Access efficiency
- Storage efficiency

Physical Sequential
- Key values of the physical records are in logical sequence
- Main use is for "dump" and "restore"
- Access method may be used for storage as well as retrieval
- Storage efficiency is near 100%
- Access efficiency is poor (unless fixed size physical records)

Sequential File Organization
A sequential file is one in which the records are stored in sorted order of one or more key fields.

Sequential access means that data is accessed in an ordered sequence. Sequential access is sometimes the only way of accessing the data, for example on tape. Records are usually stored on tape and processed one after the other.

Sequential file

Advantages
- Simple file design
- Very efficient when most of the records must be processed, e.g. payroll
- Very efficient if the data has a natural order
- Can be stored on inexpensive devices like magnetic tape

Disadvantages
- The entire file must be processed even if only a single record is being searched for
- Transactions have to be sorted before processing
- Overall processing is slow, because you have to go through each record until you get to the one you want!

A sequential file is:
- A collection of records
- Stored in key sequence
- Adding/deleting a record requires making a new file (so that the sequence is maintained)
- Used as master files

Indexed Sequential
- Key values of the physical records are in logical sequence
- Access method may be used for storage and retrieval
- An index of key values is maintained, with entries for the highest key value per block (or group of blocks)
- Access efficiency depends on the levels of index, the storage allocated for the index, the number of database records and the amount of overflow
- Storage efficiency depends on the size of the index and the volatility of the database

Indexed sequential file
Each record of a file has a key field which uniquely identifies that record. An index consists of keys and addresses, just like an index in a book: the pages in a book are stored sequentially, so you can read through it page by page, OR you can look up the page you want in the index and flick straight to it.

An indexed sequential file is a sequential file (i.e. sorted into order of a key field) which has an index. A full index to a file is one in which there is an entry for every record. Because each record has an index entry, we can access individual records directly, without having to scroll through all the other records first.

Indexed sequential files are important for applications where data needs to be accessed sequentially, one record after another, OR randomly, using the index.

An example of an Indexed Sequential file
A company may store details about its employees as an indexed sequential file.
Sometimes the file is accessed sequentially, for example when the whole file is processed to produce pay slips at the end of the month.
Sometimes the file is accessed randomly, for example when an employee changes address, or gets married and changes surname.

An indexed sequential file can only be stored on a random access device, e.g. magnetic disk or CD. This is because we need a device that allows direct access to individual records, rather than the sequential access that magnetic tape allows.

Advantages
- Provides flexibility for users who need both types of access to the same file
- Faster than a purely sequential file when individual records are needed

Disadvantages
- Extra storage space for the index is required, just like in a book: your text book would be 372 pages without the index (go on, check!) but is 380 pages with the index.

Index Sequential
Index (highest key value in each block):
    Actual Value    Address (Block Number)
    Dumpling        1
    Harty           2
    Texaci          3
    ...
Data File:
    Block 1: Adams, Becker, Dumpling
    Block 2: Getta, Harty
    Block 3: Mobile, Sunoci, Texaci
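A lookup through an index that stores the highest key of each block can be sketched with a binary search. The names come from the figure above, but the assignment of names to blocks is an assumption, since the figure did not survive intact; the function name is also made up.

```python
import bisect

# Index: highest key in each block, and the block it points to.
index_keys = ["Dumpling", "Harty", "Texaci"]
index_blocks = [1, 2, 3]

# Data file: records in key sequence, grouped into blocks (assumed grouping).
data_file = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}

def lookup(key: str):
    """Return the block number holding key, or None if the key is not in the file."""
    # The first index entry whose highest key is >= key identifies the block.
    pos = bisect.bisect_left(index_keys, key)
    if pos == len(index_keys):
        return None                      # larger than every key in the file
    block_no = index_blocks[pos]
    # Only this one block has to be read and scanned.
    return block_no if key in data_file[block_no] else None

print(lookup("Getta"), lookup("Adams"), lookup("Carter"), lookup("Zed"))
```

A sequential pass is still possible by reading blocks 1, 2, 3 in order, which is what gives the organization both kinds of access.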

Indexed Sequential: Two Levels
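[Figure: two-level index. The top-level index holds key values 385, 678 and 805, pointing to index blocks at addresses 7, 8 and 9. Those second-level index blocks in turn point to data blocks 1 to 6, which hold the records in key sequence from 001 to 805.]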

Indexed Random
- Key values of the physical records are not necessarily in logical sequence
- The index may be stored and accessed with the Indexed Sequential Access Method
- The index has an entry for every database record. The entries are in ascending order, so the index keys are in logical sequence
- Database records are not necessarily in ascending sequence
- Access method may be used for storage and retrieval

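Indexed Random
[Figure: an index with one entry per record, in ascending order of actual value (Adams, Becker, Dumpling, Getta, Harty), each entry giving the block number of its record; the block numbers (2, 1, 3, ...) are not in key order]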

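B-tree
[Figure: B-tree index. The root node holds the keys F, P and Z; its three child nodes hold B, D, F; H, L, P; and R, S, Z. The leaf level points to records such as Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers and Seminoles.]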

Direct (Random) File Organization
Records are read directly from, or written directly to, the file. Each record is stored at a known address, which is calculated by applying a mathematical function to the key field.

Direct
- Key values of the physical records are not necessarily in logical sequence
- There is a one-to-one correspondence between a record key and the physical address of the record
- May be used for storage and retrieval
- No duplicate keys permitted

Hashing
A bucket is a unit of storage containing one or more records (a bucket is typically a disk block). In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function. A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.

Hashing Organization
The hash function is used to locate records for access, insertion and deletion. Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.

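EXAMPLE: 2 records/bucket
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
    Bucket 0: d
    Bucket 1: a, c
    Bucket 2: b
    Bucket 3: (empty)
Then h(e) = 1: bucket 1 is already full, so e goes into an overflow block.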
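The insertion example can be replayed with a small model of a hash file. The hash values for a to e are taken from the slide; the class itself, the choice of four buckets and the rule of moving an overflow record up after a deletion are assumptions made for the sketch.

```python
BUCKET_CAPACITY = 2   # 2 records/bucket, as in the example
NUM_BUCKETS = 4       # assumed: buckets 0 to 3

# Hash values given in the example, used here as a lookup table.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

class HashFile:
    def __init__(self, num_buckets, capacity, h):
        self.capacity = capacity
        self.h = h
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = [[] for _ in range(num_buckets)]  # one overflow list per bucket

    def insert(self, key):
        b = self.h(key)
        full = len(self.buckets[b]) >= self.capacity
        (self.overflow[b] if full else self.buckets[b]).append(key)

    def find(self, key):
        # The whole bucket (and its overflow) is searched sequentially.
        b = self.h(key)
        return key in self.buckets[b] or key in self.overflow[b]

    def delete(self, key):
        b = self.h(key)
        if key in self.buckets[b]:
            self.buckets[b].remove(key)
            if self.overflow[b]:                       # assumed policy: move a record up
                self.buckets[b].append(self.overflow[b].pop(0))
        elif key in self.overflow[b]:
            self.overflow[b].remove(key)

f = HashFile(NUM_BUCKETS, BUCKET_CAPACITY, H.__getitem__)
for k in "abcde":
    f.insert(k)
print(f.buckets, f.overflow)
```

Inserting a, b, c, d fills the primary buckets as in the example, and e ends up in the overflow area for bucket 1. Calling `f.delete("a")` afterwards frees a slot in bucket 1 and moves e up into it.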

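EXAMPLE: deletion
Delete: e, f, c
[Figure: buckets 0 to 3 holding records a to g, before and after the deletions. When a deletion frees a slot in a bucket, a record from its overflow block may be moved up into it, e.g. "maybe move g up".]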
