
Physical Database Design: conceptual design -> logical design -> physical design (ER diagram -> relational database -> physical design)



1 Physical Database Design

2 Conceptual design -> logical design -> physical design (ER diagram -> relational database -> physical design)

3 Physical Database Design
• Purpose: to translate the logical description of data into the technical specifications for storing and retrieving data
• Goal: to create a design for storing data that provides adequate performance and ensures database integrity, security and recoverability

4 Physical Design Information
Information needed for physical file and database design includes:
• Normalized relations, plus size estimates for them
• Definitions of each attribute
• Descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often
• Expectations and requirements for response time, data security, backup, recovery, retention and integrity
• Descriptions of the technologies used to implement the database

5 Physical Design Decisions
During this phase, decisions are taken on:
• Storage format
• Physical record composition
• Data arrangement
• Indexes
• Query optimization and performance tuning

6 Physical Design Decisions
During this phase, decisions are also taken to:
• Create the base relations: name, attributes, primary key, foreign keys, alternate keys, indexes
• Implement the integrity rules:
  • Domain
  • Enterprise
  • Referential (no action, cascade, set null, set default, or no check, for both deletes and updates)
  • Entity
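The integrity rules and referential actions listed above can be tried out in a small, runnable sketch. The slides do not name a DBMS; SQLite, through Python's standard `sqlite3` module, is an assumed stand-in, and the `Company`/`Product`/`Dealer` tables and their rows are hypothetical. A `CHECK` clause stands in for a domain rule, the primary keys for entity integrity, and the `ON DELETE` clauses for two of the referential actions (cascade and set null).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Company (
    name TEXT PRIMARY KEY NOT NULL          -- entity integrity
);
CREATE TABLE Product (
    pid   INTEGER PRIMARY KEY,
    price REAL CHECK (price >= 0),          -- domain rule
    maker TEXT REFERENCES Company(name)
          ON DELETE CASCADE                 -- delete the company's products too
);
CREATE TABLE Dealer (
    did   INTEGER PRIMARY KEY,
    maker TEXT REFERENCES Company(name)
          ON DELETE SET NULL                -- keep the dealer, clear the reference
);
""")

conn.execute("INSERT INTO Company VALUES ('Acme'), ('Globex')")
conn.execute("INSERT INTO Product VALUES (1, 9.99, 'Acme'), (2, 4.50, 'Globex')")
conn.execute("INSERT INTO Dealer VALUES (10, 'Acme')")

# Both referential actions fire when the parent row is deleted.
conn.execute("DELETE FROM Company WHERE name = 'Acme'")

# Rows that would break a domain or referential rule are rejected.
for bad_row in [(3, -1.0, "Globex"), (4, 1.0, "Nobody")]:
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as err:
        print("rejected", bad_row, ":", err)
```

After the delete, product 1 is gone (cascade) and dealer 10 remains with a `NULL` maker (set null).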

7 Storage Format
• Choose the storage format of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database
• The data type (format) is chosen to minimize storage space and maximize data integrity

8 Objectives of Data Type Selection
• Minimize storage space
• Represent all possible values
• Improve data integrity
• Support all data manipulations
The correct data type should, in minimal space, represent every possible value of the associated attribute (while eliminating illegal values) and support the required data manipulations (e.g. numerical or string operations).
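As a sketch of the trade-off above (the smallest type that still represents every legal value), the function below picks an integer type for a known value range. The type names and sizes (2-byte SMALLINT, 4-byte INTEGER, 8-byte BIGINT) follow common SQL systems such as PostgreSQL and are an assumption; the slides do not fix them.

```python
# (type, storage in bytes, smallest value, largest value); assumed sizes
INTEGER_TYPES = [
    ("SMALLINT", 2, -2**15, 2**15 - 1),
    ("INTEGER", 4, -2**31, 2**31 - 1),
    ("BIGINT", 8, -2**63, 2**63 - 1),
]

def choose_integer_type(lowest, highest):
    """Return (type, bytes) for the smallest type covering [lowest, highest]."""
    if lowest > highest:
        raise ValueError("empty range")
    for name, size, lo, hi in INTEGER_TYPES:   # ordered smallest first
        if lo <= lowest and highest <= hi:
            return name, size
    raise ValueError("range too large for any integer type")

print(choose_integer_type(0, 150))        # ('SMALLINT', 2), e.g. an age
print(choose_integer_type(0, 100_000))    # ('INTEGER', 4)
```

A range such as 0-150 for an age fits SMALLINT but the type alone still admits illegal values up to 32767; a `CHECK` constraint on the column is the usual way to eliminate those.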

9 Choosing Data Types
• CHAR – fixed-length character string
• VARCHAR – variable-length character string
• LONG – large variable-length character data
• NUMBER – positive or negative number
• DATE – date value
• BLOB – binary large object (good for graphics, sound clips, etc.)

10 Designing Physical Records
• A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit
• Records can be either fixed length or variable length

11 Data Storage
Data is stored in several kinds of memory, which can be classified as:
• Cache memory
• Primary (main) memory
• Secondary memory: disk and tape

12 The Memory Hierarchy
• Processor cache: access time about 10 ns; capacity about 512 KB
• Main memory (volatile; also acts as the disk cache): access time 10-100 ns; capacity 256 MB-1 GB
• Disk (persistent): access time 10-15 ms; capacity 10-100 GB
• Tape: transfer rate 1.5 MB/s; typical capacity 280 GB; sequential access only

13 Main Memory
• Fastest and most expensive storage (excluding cache)
• Today, 512 MB is common even on PCs
• Many databases could fit in memory
• New industry trend: main-memory databases, e.g. TimesTen
• The main issue is volatility

14 Secondary Storage
• Secondary storage is disks
• Disks are slower and cheaper than main memory
• They are non-volatile, i.e. the data is stored permanently
• The unit of disk I/O is the block; typically 1 block = 4 KB
• A disk block is also called a disk page, or simply a page
• The blocking factor (bfr) of a file is the average number of records stored in a disk block
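The blocking factor can be computed directly when records have a fixed length and do not span blocks (an assumption; for variable-length records bfr is an average). With block size B and record size R, bfr = floor(B / R), and a file of r records needs ceil(r / bfr) blocks. The 100-byte record and 10,000-record file below are hypothetical.

```python
import math

def blocking_factor(block_size, record_size):
    """bfr = floor(B / R) for fixed-length, unspanned records."""
    if record_size > block_size:
        raise ValueError("record larger than a block: it would have to span blocks")
    return block_size // record_size

def blocks_needed(num_records, block_size, record_size):
    """b = ceil(r / bfr): number of blocks needed to store r records."""
    return math.ceil(num_records / blocking_factor(block_size, record_size))

bfr = blocking_factor(4096, 100)              # 40 records per 4 KB block
b = blocks_needed(10_000, 4096, 100)          # 250 blocks
unused_per_block = 4096 - bfr * 100           # 96 bytes left over in each block
```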

15 The Mechanics of a Disk
Mechanical characteristics:
• Rotation speed (5400 RPM)
• Number of platters (1-30)
• Number of tracks (<= 10000)
• Number of sectors (256 per track)
• Number of bytes per sector (2^9 = 512)
• Block size (2^12 = 4096)
[Figure: disk anatomy showing platters, spindle, disk heads, arm assembly and arm movement, tracks, sectors and cylinders]

16 Important Disk Access Characteristics
• Block access time = disk latency + transfer time
• Disk latency = seek time + rotational latency
• Seek time = time for the head to reach the right track: 10-40 ms
• Rotational latency = time for the disk to rotate to the right sector. One rotation takes about 10 ms, so the average rotational latency is half a rotation, about 5 ms
• The transfer rate is typically 5-10 MB/s
• Disks read and write one block at a time (typically 4 KB)
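Putting these formulas together gives the expected time to read one block. The sketch assumes a seek time of 10 ms (the low end of the range above), 6000 RPM (which gives a 10 ms rotation), a 4096-byte block, and a transfer rate of 5 MB/s taken as 5,000,000 bytes per second.

```python
def block_access_time_ms(seek_ms, rpm, block_bytes, bytes_per_second):
    """Expected block access time = seek + average rotational latency + transfer."""
    rotation_ms = 60_000 / rpm                 # one full rotation
    rotational_latency_ms = rotation_ms / 2    # on average, wait half a rotation
    transfer_ms = block_bytes / bytes_per_second * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

t = block_access_time_ms(seek_ms=10, rpm=6000, block_bytes=4096,
                         bytes_per_second=5_000_000)
# 10 + 5 + 0.8192 = 15.8192 ms: seek and rotation dominate, not the transfer
```

Because seek and rotation dominate, reading blocks that lie next to each other on disk is much cheaper than reading the same number of scattered blocks.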

17 Representing Data Elements
• Relational database elements:
    CREATE TABLE Product (
        pid INT PRIMARY KEY,
        name CHAR(20),
        description VARCHAR(200),
        maker CHAR(10) REFERENCES Company(name)
    )
• A tuple is represented as a record

18 Record Formats: Fixed Length
• All fields in the record have a fixed length, so the length of the record is fixed and all records are equal in length
[Figure: a record starting at base address B, holding fields F1, F2, F3, F4 with lengths L1, L2, L3, L4]
• Address of field F3 = B + L1 + L2
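For the fixed-length case, the field address formula can be checked with Python's `struct` module. The layout is an assumption: the `Product` table from the previous slide without its variable-length `description`, i.e. F1 = pid (4-byte integer), F2 = name CHAR(20), F3 = maker CHAR(10), with the base address B taken as 0 within the record.

```python
import struct

# F1 = pid INT, F2 = name CHAR(20), F3 = maker CHAR(10); "<" means no padding.
PRODUCT = struct.Struct("<i20s10s")
FIELD_LENGTHS = [4, 20, 10]               # L1, L2, L3

def field_offset(k):
    """Offset of field Fk from the record's base address: L1 + ... + L(k-1)."""
    return sum(FIELD_LENGTHS[:k - 1])

def pack_product(pid, name, maker):
    # struct pads "s" fields with NUL bytes (and truncates longer values).
    return PRODUCT.pack(pid, name.encode(), maker.encode())

def read_maker(record):
    start = field_offset(3)               # B + L1 + L2, with B = 0
    return record[start:start + FIELD_LENGTHS[2]].rstrip(b"\0").decode()

rec = pack_product(7, "Widget", "Acme")
```

Every record packed this way is exactly 34 bytes, so field F3 is always found at offset 24 without reading F1 or F2.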

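19 Record Header
[Figure: a record header (pointer to the schema, record length, timestamp) followed by fields F1, F2, F3, F4 with lengths L1, L2, L3, L4]
The header is needed because:
• The schema may change; for a while, records in the new and old formats may coexist
• Records from different relations may coexist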

20 Variable Length Records
[Figure: a record header (record length, other header information) followed by fields F1, F2, F3, F4 with lengths L1, L2, L3, L4]
• Place the fixed-length fields first: F1, F2
• Then the variable-length fields: F3, F4
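One way to lay out such a record is sketched below: the header stores the total record length plus the byte offset of each variable-length field, so F3 and F4 can be located without scanning. Storing offsets in the header, the 2-byte header fields, and the field choice (F1 = pid, F2 = maker CHAR(10), F3 = name, F4 = description) are assumptions for illustration; the slide only says the header holds the length and other information.

```python
import struct

HEADER = struct.Struct("<HHH")   # total length, offset of F3, offset of F4
FIXED = struct.Struct("<i10s")   # F1 = pid (4-byte int), F2 = maker CHAR(10)

def encode(pid, maker, name, description):
    """Header, then the fixed-length fields, then the variable-length ones."""
    fixed = FIXED.pack(pid, maker.encode())
    f3, f4 = name.encode(), description.encode()
    off3 = HEADER.size + FIXED.size          # F3 starts right after F2
    off4 = off3 + len(f3)
    total = off4 + len(f4)
    return HEADER.pack(total, off3, off4) + fixed + f3 + f4

def decode(record):
    total, off3, off4 = HEADER.unpack_from(record, 0)
    pid, maker = FIXED.unpack_from(record, HEADER.size)
    return (pid, maker.rstrip(b"\0").decode(),
            record[off3:off4].decode(), record[off4:total].decode())

rec = encode(1, "Acme", "Widget", "A small widget")
```

Placing the fixed fields first means their offsets never change, and only the variable-length fields need entries in the header.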

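21 Records With Referencing Fields
[Figure: a record header (record length, other header information) followed by fields F1, F2, F3 with lengths L1, L2, L3, where a field refers to another record]
• E.g. to represent one-to-many or many-to-many relationships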

22 Storing Records in Blocks
• Blocks have a fixed size (typically 4 KB)
[Figure: records R1, R2, R3, R4 stored in a block]

23 Spanning Records Across Blocks
[Figure: two blocks, each with a block header; records R1 and R2 fit in the first block, while record R3 spans both blocks]

24 BLOB
• Binary large objects
• Supported by modern database systems
• E.g. images, sounds, etc.
• Storage: attempt to cluster the blocks together

25 Modifications: Insertion
• If the file is unsorted, add the record at the end
• If the file is sorted:
  • Is there space in the right block? If yes, store the record there
  • Otherwise, is there space in a neighboring block? Look 1-2 blocks to the left or right and shift records
  • If all else fails, create an overflow block
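The insertion steps for a sorted file can be sketched with the file modelled as a list of blocks, each a sorted list of keys. The block capacity of 3 records, checking only one neighbor on each side, and keeping unbounded overflow blocks in a dictionary keyed by block number are simplifying assumptions.

```python
import bisect

CAPACITY = 3   # records per block (a small, hypothetical value)

def find_block(blocks, key):
    """Index of the block whose key range should hold `key`."""
    for i, block in enumerate(blocks):
        if block and key <= block[-1]:
            return i
    return len(blocks) - 1                     # larger than every key: last block

def insert(blocks, overflow, key):
    """Insert `key` into a sorted file; return which case applied."""
    i = find_block(blocks, key)
    block = blocks[i]
    if len(block) < CAPACITY:                  # space in the right block
        bisect.insort(block, key)
        return "in place"
    if i + 1 < len(blocks) and len(blocks[i + 1]) < CAPACITY:
        bisect.insort(block, key)              # right neighbor has space:
        blocks[i + 1].insert(0, block.pop())   # shift the largest record right
        return "shifted right"
    if i > 0 and len(blocks[i - 1]) < CAPACITY:
        bisect.insort(block, key)              # left neighbor has space:
        blocks[i - 1].append(block.pop(0))     # shift the smallest record left
        return "shifted left"
    bisect.insort(overflow.setdefault(i, []), key)  # all else failed
    return "overflow"

blocks = [[10, 20, 30], [40, 50], [60, 70, 80]]
overflow = {}   # block number -> sorted overflow records for that block
```

Inserting 25 fills block 0, so 30 moves into block 1; a later 65 finds every nearby block full and lands in an overflow block for block 2.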

26 Overflow Blocks
• After a while the file starts being dominated by overflow blocks: time to reorganize
[Figure: blocks n-1, n and n+1, with an overflow block]

27 Modifications: Deletions
• Free the space in the block and shift records
• It may be possible to eliminate an overflow block

28 Modifications: Updates
• If the new record is shorter than the previous one, the update is easy
• If it is longer, records may need to be shifted, or overflow blocks created

29 Physical Addresses
• Each block and each record has a physical address that consists of:
  • The disk
  • The cylinder number
  • The track number
  • The block within the track
  • For records: an offset in the block

30 Logical Addresses
• Logical address: a string of bytes (10-16 bytes)
• More flexible: blocks and records can be moved around
• But a translation table is needed:
    Logical address | Physical address
    L1 | P1
    L2 | P2
    L3 | P3

31 Main Memory Address
• When a block is read into main memory, it receives a main memory address
• The buffer manager keeps another translation table:
    Memory address | Logical address
    M1 | L1
    M2 | L2
    M3 | L3

32 Physical Design
• Interface 1: user request to the DBMS. The user presents a query, and the DBMS determines which physical databases are needed to resolve it
• Interface 2: the DBMS uses an internal model access method to access the data stored in a logical database
• Interface 3: the internal model access methods and the OS access methods access the physical records of the database

33 Physical File Design
• A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records
• Pointer: a field of data that can be used to locate a related field or record of data
• Access method: an operating system algorithm for storing and locating data in secondary storage
• Page: the amount of data read or written in one disk input or output operation

34 Internal Model Access Methods
• There are many types of access methods:
  • Physical sequential
  • Indexed sequential
  • Indexed random
  • Direct
  • Hashed
• They differ in:
  • Access efficiency
  • Storage efficiency

35 Physical Sequential
• Key values of the physical records are in logical sequence
• Main use is for “dump” and “restore”
• The access method may be used for storage as well as retrieval
• Storage efficiency is near 100%
• Access efficiency is poor (unless the physical records are fixed size)
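The fixed-size caveat is what makes fast lookups possible: if every record is R bytes long, record i starts at byte i * R, so a sorted physical sequential file can be binary-searched instead of read from the start. A sketch under that assumption, with a hypothetical 16-byte record (a 4-byte integer key plus a 12-byte name):

```python
import struct

RECORD = struct.Struct("<i12s")   # hypothetical record: 4-byte key, 12-byte name

def build_file(rows):
    """Write (key, name) rows in key order, back to back, as one byte string."""
    return b"".join(RECORD.pack(key, name.encode()) for key, name in sorted(rows))

def find(data, key):
    """Binary search: record i starts at byte i * RECORD.size."""
    lo, hi = 0, len(data) // RECORD.size - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k, name = RECORD.unpack_from(data, mid * RECORD.size)
        if k == key:
            return name.rstrip(b"\0").decode()
        if k < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

data = build_file([(30, "Cole"), (10, "Adams"), (20, "Baker")])
```

With variable-length records the start of record i is unknown, which is why access efficiency is poor in that case.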

36 Sequential File Organization
• A sequential file is one in which the records are stored in sorted order of one or more key fields

37 Sequential File Organization
• Sequential access means that data is accessed in an ordered sequence
• Sequential access is sometimes the only way of accessing the data, for example on tape
• Records are usually stored on tape and processed one after the other

38 Sequential file

39 Advantages
• Simple file design
• Very efficient when most of the records must be processed, e.g. payroll
• Very efficient if the data has a natural order
• Can be stored on inexpensive devices such as magnetic tape

40 Disadvantages
• The file must be read from the start even to find a single record
• Transactions have to be sorted before processing
• Overall processing is slow, because you have to go through each record until you get to the one you want

41 Sequential File Organization
• A collection of records
• Stored in key sequence
• Adding or deleting a record requires creating a new file (so that the sequence is maintained)
• Used for master files
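The batch update described here (sort the transactions, then rewrite the whole master file) is classically done as one merge pass over both sorted files. A minimal sketch, assuming in-memory lists stand in for the tape files, at most one transaction per key, and a hypothetical `A`/`U`/`D` transaction code:

```python
def update_master(master, transactions):
    """Produce a new master file from the old one and a batch of transactions.

    master:       list of (key, data) records, sorted by key
    transactions: list of (key, op, data), sorted by key, at most one per key;
                  op is "A" (add), "U" (update) or "D" (delete)
    Invalid transactions (e.g. adding a key that already exists) are ignored.
    """
    new_master = []
    i = 0
    for key, op, data in transactions:
        # Copy every untouched master record that comes before this key.
        while i < len(master) and master[i][0] < key:
            new_master.append(master[i])
            i += 1
        exists = i < len(master) and master[i][0] == key
        if op == "A" and not exists:
            new_master.append((key, data))
        elif op == "U" and exists:
            new_master.append((key, data))
            i += 1
        elif op == "D" and exists:
            i += 1                      # skip it: not copied to the new file
    new_master.extend(master[i:])       # copy the rest of the old master
    return new_master

old_master = [(1, "Adams"), (3, "Baker"), (5, "Cole")]
batch = sorted([(5, "D", None), (2, "A", "Brown"), (3, "U", "Barker")],
               key=lambda t: t[0])      # transactions must be in key order
new_master = update_master(old_master, batch)
```

Each file is read once from start to end, which suits tape; the price is that even a single change rewrites the entire master file.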

42 Indexed Sequential
• Key values of the physical records are in logical sequence
• The access method may be used for storage and retrieval
• An index of key values is maintained, with an entry holding the highest key value in each block (or group of blocks)
• Access efficiency depends on the number of index levels, the storage allocated for the index, the number of database records and the amount of overflow
• Storage efficiency depends on the size of the index and the volatility of the database

43 Indexed Sequential File
• Each record of a file has a key field that uniquely identifies that record
• An index consists of keys and addresses, just like the index in a book: the pages of a book are stored sequentially, so you can read through it page by page, or you can look up the page you want in the index and flick straight to it

44 Indexed Sequential File
• An indexed sequential file is a sequential file (i.e. sorted into order of a key field) that has an index
• A full index to a file is one in which there is an entry for every record
• Because each record has an index entry, individual records can be accessed directly, without having to scroll through all the other records first

45 Indexed Sequential File
Indexed sequential files are important for applications where data needs to be accessed either:
• sequentially, one record after another, or
• randomly, using the index

46 An Example of an Indexed Sequential File
A company may store details about its employees as an indexed sequential file. Sometimes the file is accessed:
• Sequentially, for example when the whole file is processed to produce pay slips at the end of the month

47 An Example of an Indexed Sequential File
Sometimes the file is accessed:
• Randomly, for example when an employee changes address, or a female employee gets married and changes her surname

48 Indexed Sequential File
• An indexed sequential file can only be stored on a random access device, e.g. magnetic disk or CD
• This is because we need a device that allows direct access to individual records, rather than the sequential access that magnetic tape allows

49 Advantages
• Provides flexibility for users who need both types of access to the same file
• Faster than a sequential file for accessing individual records

50 Disadvantages
• Extra storage space is required for the index, just like in a book: your textbook would be 372 pages without the index (go on, check!) but is 380 pages with the index

51 Index Sequential Example
Index (highest key in each block -> block number):
    Dumpling -> 1
    Harty -> 2
    Texaci -> 3
    …
Data file:
    Block 1: Adams, Becker, Dumpling
    Block 2: Getta, Harty
    Block 3: Mobile, Sunoci, Texaci
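A lookup on this example can be sketched as a binary search over the index, which finds the first block whose highest key is at least the search key; only that one block is then scanned. In-memory lists standing in for disk blocks are an assumption; the names and block numbers are the ones in the example.

```python
import bisect

# Data file from the example: records sorted by key and packed into blocks.
data_blocks = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}
# Sparse index: (highest key in block, block number). Records in a block are
# sorted, so the last one is the highest.
index = [(recs[-1], blk) for blk, recs in sorted(data_blocks.items())]
index_keys = [key for key, _ in index]       # ["Dumpling", "Harty", "Texaci"]

def find_block(key):
    """Random access: return the number of the block holding `key`, or None."""
    i = bisect.bisect_left(index_keys, key)  # first entry with highest key >= key
    if i == len(index_keys):
        return None                          # key is past the last block
    blk = index[i][1]
    return blk if key in data_blocks[blk] else None  # search only that block

def scan_in_order():
    """Sequential access: read the blocks one after another."""
    for blk in sorted(data_blocks):
        yield from data_blocks[blk]
```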

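52 Indexed Sequential: Two Levels
Top-level index (highest key -> index block):
    385 -> 7
    678 -> 8
    805 -> 9
    …
Second-level index blocks (highest key -> data block):
    Block 7: 150 -> 1, 385 -> 2
    Block 8: 536 -> 3, 678 -> 4
    Block 9: 785 -> 5, 805 -> 6
Data blocks (… marks records not shown):
    Block 1: 001, 003, …, 150
    Block 2: 251, …, 385
    Block 3: 455, 480, …, 536
    Block 4: 605, 610, …, 678
    Block 5: 705, 710, …, 785
    Block 6: 791, …, 805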
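With two levels, a lookup reads one top-level index entry, then one second-level index block, and then the data block. The sketch uses the key values and block numbers from the example; keeping each index level as a sorted Python list is an assumption made for illustration.

```python
import bisect

# Top-level index: (highest key covered, second-level index block).
top_index = [(385, 7), (678, 8), (805, 9)]
# Second-level index blocks: (highest key in data block, data block).
second_level = {
    7: [(150, 1), (385, 2)],
    8: [(536, 3), (678, 4)],
    9: [(785, 5), (805, 6)],
}

def search_level(entries, key):
    """Return the block of the first entry whose highest key is >= key."""
    i = bisect.bisect_left([k for k, _ in entries], key)
    return entries[i][1] if i < len(entries) else None

def find_data_block(key):
    index_block = search_level(top_index, key)   # level 1: top-level index
    if index_block is None:
        return None                              # key beyond the last block
    return search_level(second_level[index_block], key)  # level 2
```

For key 455, the top level points to index block 8, whose first entry (536) points to data block 3.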

53 Indexed Random
• Key values of the physical records are not necessarily in logical sequence
• The index may be stored and accessed with the Indexed Sequential Access Method
• The index has an entry for every database record. The index entries are kept in ascending (logical) key sequence, but the database records themselves are not necessarily in ascending sequence
• The access method may be used for storage and retrieval

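54 Indexed Random Example
Index (every record, in key order -> block number):
    Adams -> 2
    Becker -> 1
    Dumpling -> 3
    Getta -> 2
    Harty -> 1
Data file (records not in key order):
    Block 1: Becker, Harty
    Block 2: Adams, Getta
    Block 3: Dumpling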
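Because the data records here are not in key order, the index must be dense: one entry per record. A sketch using the names and block numbers from the example, with Python lists standing in for disk blocks (an assumption):

```python
import bisect

# Data blocks from the example: records are NOT stored in key order.
data_blocks = {1: ["Becker", "Harty"], 2: ["Adams", "Getta"], 3: ["Dumpling"]}

# Dense index: one (key, block) entry per record, kept in ascending key order.
index = sorted((key, blk) for blk, recs in data_blocks.items() for key in recs)

def find_block(key):
    """Return the block holding `key`, or None."""
    i = bisect.bisect_left(index, (key,))  # (key,) sorts just before (key, blk)
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

def keys_in_order():
    """Logical-sequence access follows the index, not the physical order."""
    return [key for key, _ in index]
```

Reading in key order this way jumps between blocks 2, 1, 3, 2, 1, which is the cost of not keeping the data itself sorted.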

55 B-tree
[Figure: a B-tree whose root node holds the keys F, P, Z; its three child nodes hold B, D, F; H, L, P; and R, S, Z; the leaf level holds the records Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles]

56 Direct (Random) File Organization
• Records are read directly from, or written directly to, the file
• The records are stored at known addresses
• The address is calculated by applying a mathematical function to the key field

57 Direct
• Key values of the physical records are not necessarily in logical sequence
• There is a one-to-one correspondence between a record key and the physical address of the record
• May be used for storage and retrieval
• No duplicate keys are permitted

58 Hashing
• A bucket is a unit of storage containing one or more records (a bucket is typically a disk block)
• In a hash file organization, the bucket of a record is obtained directly from its search-key value using a hash function
• A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B

59 Hashing Organization
• The hash function is used to locate records for access, insertion and deletion
• Records with different search-key values may be mapped to the same bucket, so the entire bucket has to be searched sequentially to locate a record
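A minimal sketch of a hash file, assuming integer search keys, the hypothetical hash function h(k) = k mod 4, and buckets of two records each that overflow into chained blocks. A search must look through every record in the bucket's chain, because different keys can map to the same bucket.

```python
NUM_BUCKETS = 4        # hypothetical number of buckets
BUCKET_CAPACITY = 2    # records per block

def h(key):
    """Hash function: maps an integer search key to a bucket number."""
    return key % NUM_BUCKETS

class HashFile:
    def __init__(self):
        # Each bucket is a chain: a primary block followed by overflow blocks.
        self.buckets = [[[]] for _ in range(NUM_BUCKETS)]

    def insert(self, key):
        chain = self.buckets[h(key)]
        for block in chain:
            if len(block) < BUCKET_CAPACITY:
                block.append(key)
                return
        chain.append([key])              # every block full: add overflow block

    def search(self, key):
        # Different keys can hash to the same bucket, so scan the whole chain.
        return any(key in block for block in self.buckets[h(key)])

    def delete(self, key):
        b = h(key)
        chain = self.buckets[b]
        for block in chain:
            if key in block:
                block.remove(key)
                break
        # Keep the primary block; drop overflow blocks that became empty.
        self.buckets[b] = [chain[0]] + [blk for blk in chain[1:] if blk]

f = HashFile()
for k in (5, 9, 13, 2):
    f.insert(k)          # 5, 9 and 13 all hash to bucket 1
# f.buckets[1] is now [[5, 9], [13]]: 13 went to an overflow block
f.delete(13)             # the emptied overflow block is dropped
```

This sketch does not move records up from an overflow block after a deletion; doing so would keep chains shorter at the cost of extra block writes.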

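60 Example: Insertion
• 2 records per bucket; buckets 0-3
• Insert a, b, c, d with h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0. Result: bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b, bucket 3 is empty
• Insert e with h(e) = 1: bucket 1 is already full, so e goes into an overflow block for bucket 1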

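61 Example: Deletion
• Delete e and f
[Figure: buckets 0-3 holding records a, b, c, d, e, f, g before and after the deletions]
• When a deletion frees space in a bucket, a record from its overflow block (here g) may be moved up into the freed slot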
