3 Physical Database Design
- Purpose: to translate the logical description of data into the technical specifications for storing and retrieving data.
- Goal: to create a design for storing data that provides adequate performance and ensures database integrity, security, and recoverability.
4 Physical Design Information
Information needed for physical file and database design includes:
- Normalized relations, plus size estimates for them
- Definitions of each attribute
- Descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often
- Expectations and requirements for response time, data security, backup, recovery, retention, and integrity
- Descriptions of the technologies used to implement the database
5 Physical Design Decisions
During this phase, decisions are made on:
- Storage format
- Physical record composition
- Data arrangement
- Indexes
- Query optimization and performance tuning
6 Physical Design Decisions
During this phase, decisions are also made on:
- Creating base relations
  - Name
  - Attributes
  - Primary key
  - Foreign key
  - Alternate key
  - Indexes
- Implementing integrity rules
  - Domain
  - Enterprise
  - Referential (no action, cascade, set null, set default, or no check, for deletes and updates)
  - Entity
7 Storage Format
- Choose the storage format of each field (attribute). The DBMS provides a set of data types that can be used for the physical storage of fields in the database.
- The data type (format) is chosen to minimize storage space and maximize data integrity.
8 Objectives of Data Type Selection
- Minimize storage space
- Represent all possible values
- Improve data integrity
- Support all data manipulations
The correct data type should, in minimal space, represent every possible value of the associated attribute (while excluding illegal values) and support the required data manipulations (e.g. numerical or string operations).
9 Choosing Data Types
- CHAR – fixed-length character
- VARCHAR – variable-length character (memo)
- LONG – large number
- NUMBER – positive/negative number
- DATE – actual date
- BLOB – binary large object (good for graphics, sound clips, etc.)
10 Designing Physical Records
- A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit.
- Records can be either fixed length or variable length.
11 Data Storage
Data is stored in memory, which can be classified as:
- Cache memory
- Primary memory
- Secondary memory
  - Disk
  - Tape
12 The Memory Hierarchy
- Processor cache: access time 10 ns, storage capacity 512 KB
- Main memory (= disk cache): volatile, storage capacity 256 MB-1 GB, access time in nanoseconds
- Disk: persistent, storage capacity in GB, access time 10-15 ms
- Tape: 1.5 MB/s transfer rate, 280 GB typical capacity, sequential access only
13 Main Memory
- Fastest and most expensive (excluding cache)
- Today: 512 MB is common even on PCs
- Many databases could fit in memory
- New industry trend: main memory databases, e.g. TimesTen
- Main issue is volatility
14 Secondary Storage
- Secondary storage means disks
- Slower and cheaper than main memory
- Non-volatile, i.e. the data is stored permanently
- The unit of disk I/O is the block; typically 1 block = 4 KB
- A disk block is also called a disk page, or simply a page
- The blocking factor (bfr) of a file is the average number of records stored in a disk block
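The blocking factor can be illustrated with a small calculation. This is a minimal sketch that assumes fixed-length records that are never split across blocks, in which case bfr is simply the block size divided by the record size, rounded down. The function names and the example sizes are illustrative, not taken from the slides.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole fixed-length records that fit in one block (records are not spanned)."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record_size must be between 1 and block_size")
    return block_size // record_size


def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Number of blocks required to hold num_records records."""
    bfr = blocking_factor(block_size, record_size)
    return -(-num_records // bfr)  # ceiling division
```

For example, with 4096-byte blocks and 100-byte records, `blocking_factor(4096, 100)` gives 40 records per block, so 10,000 records need 250 blocks.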
15 The Mechanics of Disk
Mechanical characteristics:
- Rotation speed (5400 RPM)
- Number of platters (1-30)
- Number of tracks (<= 10000)
- Number of sectors (256 per track)
- Number of bytes per sector (2^9 = 512)
- Block size (2^12 = 4096)
[Figure: disk drive showing the spindle, platters, tracks, sectors, a cylinder, the disk head, the arm assembly, and the arm movement]
16 Important Disk Access Characteristics
- Block access time = disk latency + transfer time
- Disk latency = seek time + rotational latency
- Seek time = time for the head to reach the right track: 10 ms to 40 ms
- Rotational latency = rotation time to reach the right sector
  - Time for one rotation = 10 ms
  - Average rotational latency = 10 ms / 2 = 5 ms
- Transfer rate is typically 5-10 MB/s
- Disks read/write one block at a time (typically 4 KB)
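The access-time formulas can be combined into one estimate. The sketch below assumes the average rotational latency is half of one full rotation and that 1 MB = 1,000,000 bytes. The function name and parameters are illustrative.

```python
def block_access_time_ms(seek_ms: float, rotation_ms: float,
                         block_bytes: int, transfer_mb_per_s: float) -> float:
    """Estimate the time to read one block: seek + rotational latency + transfer."""
    rotational_latency_ms = rotation_ms / 2            # on average, half a rotation
    bytes_per_ms = transfer_mb_per_s * 1_000_000 / 1000
    transfer_ms = block_bytes / bytes_per_ms
    return seek_ms + rotational_latency_ms + transfer_ms
```

With a 10 ms seek, a 10 ms rotation, a 4096-byte block, and a 10 MB/s transfer rate, the estimate is about 15.4 ms. The seek and the rotational latency dominate; the transfer itself takes well under a millisecond.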
17 Representing Data Elements
Relational database elements:
    CREATE TABLE Product (
        pid INT PRIMARY KEY,
        name CHAR(20),
        description VARCHAR(200),
        maker CHAR(10) REFERENCES Company(name)
    )
A tuple is represented as a record.
18 Record Formats: Fixed Length
- Base address (B)
- Address = B + L1 + L2 (the address of the third field, where L1 and L2 are the lengths of the first two fields)
- All fields in the record have a fixed length, so the record length is fixed and all records are equal in length.
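The address arithmetic for fixed-length records can be sketched with Python's `struct` module. The layout below is a hypothetical simplification of the Product table (pid as a 4-byte integer, name as CHAR(20), maker as CHAR(10)). The helper names are illustrative, and a real DBMS would use its own storage format.

```python
import struct

# Hypothetical fixed-length layout: pid INT (4 bytes), name CHAR(20), maker CHAR(10).
FIELDS = [("pid", "i"), ("name", "20s"), ("maker", "10s")]
RECORD_FORMAT = "<" + "".join(code for _, code in FIELDS)  # "<" disables alignment padding
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)               # 4 + 20 + 10 bytes


def field_offset(field_name: str) -> int:
    """Sum of the lengths of all fields that precede field_name (L1 + L2 + ...)."""
    offset = 0
    for name, code in FIELDS:
        if name == field_name:
            return offset
        offset += struct.calcsize("<" + code)
    raise KeyError(field_name)


def pack_record(pid: int, name: str, maker: str) -> bytes:
    """Encode one record; struct pads short strings with zero bytes."""
    return struct.pack(RECORD_FORMAT, pid, name.encode("ascii"), maker.encode("ascii"))


def read_field(buffer: bytes, record_number: int, field_name: str) -> bytes:
    """Read one field of a record directly from its computed address."""
    base = record_number * RECORD_SIZE        # base address B of the record
    start = base + field_offset(field_name)   # B + L1 + L2 for the third field
    length = struct.calcsize("<" + dict(FIELDS)[field_name])
    return buffer[start:start + length]
```

Because every record has the same length, the maker field of record i always starts at `i * RECORD_SIZE + 24`, so no scanning is needed to find it.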
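19 Record Header
[Figure: record with a header followed by fields F1-F4 of lengths L1-L4; the header holds a pointer to the schema, the record length, and a timestamp]
The header is needed because:
- The schema may change; for a while, new and old versions may coexist
- Records from different relations may coexist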
20 Variable Length Records
[Figure: record with a header (record length and other header information) followed by fields F1-F4 of lengths L1-L4]
- Place the fixed-length fields first: F1, F2
- Then the variable-length fields: F3, F4
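One possible encoding of such a record is sketched below. It assumes a header that stores the total record length plus the length of each variable-length field, followed by the fixed-length field and then the variable-length fields. The field names and the 2-byte length encoding are assumptions made for the example, not a format described in the slides.

```python
import struct

HEADER_FORMAT = "<HHH"  # total record length, length of name, length of description
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_variable_record(pid: int, name: str, description: str) -> bytes:
    """Header first, then the fixed-length field, then the variable-length fields."""
    name_bytes = name.encode("utf-8")
    desc_bytes = description.encode("utf-8")
    fixed = struct.pack("<i", pid)
    total = HEADER_SIZE + len(fixed) + len(name_bytes) + len(desc_bytes)
    header = struct.pack(HEADER_FORMAT, total, len(name_bytes), len(desc_bytes))
    return header + fixed + name_bytes + desc_bytes


def unpack_variable_record(data: bytes):
    """Use the lengths stored in the header to locate each variable-length field."""
    _total, name_len, desc_len = struct.unpack_from(HEADER_FORMAT, data, 0)
    pos = HEADER_SIZE
    (pid,) = struct.unpack_from("<i", data, pos)
    pos += 4
    name = data[pos:pos + name_len].decode("utf-8")
    pos += name_len
    description = data[pos:pos + desc_len].decode("utf-8")
    return pid, name, description
```

Putting the fixed-length field first means its position never depends on the variable-length fields, which is the reason for the ordering on the slide.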
21 Records With Referencing Fields
[Figure: record with a header (record length and other header information) followed by fields F1-F3 of lengths L1-L3]
E.g. to represent one-many or many-many relationships.
22 Storing Records in Blocks
- Blocks have a fixed size (typically 4 KB)
[Figure: a block holding records R1, R2, R3, R4]
23 Spanning Records Across Blocks
[Figure: two blocks, each with a block header, holding records R1, R2, and R3; R2 is split across the two blocks]
24 BLOB
- Binary large objects
- Supported by modern database systems
- E.g. images, sounds, etc.
- Storage: attempt to cluster blocks together
25 Modifications: Insertion
- File is unsorted: add the record to the end
- File is sorted:
  - Is there space in the correct block? Yes: we are lucky, store it there
  - Is there space in a neighboring block? Look 1-2 blocks to the left/right and shift records
  - If all else fails, create an overflow block
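The sorted-file case can be sketched as follows. This is a simplified model, not a real storage engine: blocks are Python lists of keys, every block is assumed to be non-empty, only the immediate left and right neighbors are checked, and `BLOCK_CAPACITY` and `insert_sorted` are hypothetical names.

```python
import bisect

BLOCK_CAPACITY = 3  # hypothetical number of records per block


def insert_sorted(blocks, overflow, key):
    """Insert key into a sorted file; return a string saying where it went."""
    # The target is the first block whose largest key is >= key (else the last block).
    target = len(blocks) - 1
    for i, block in enumerate(blocks):
        if block[-1] >= key:
            target = i
            break
    block = blocks[target]

    if len(block) < BLOCK_CAPACITY:                   # space in the correct block
        bisect.insort(block, key)
        return "stored"

    right = target + 1
    if right < len(blocks) and len(blocks[right]) < BLOCK_CAPACITY:
        bisect.insort(block, key)
        blocks[right].insert(0, block.pop())          # shift largest record right
        return "shifted right"

    left = target - 1
    if left >= 0 and len(blocks[left]) < BLOCK_CAPACITY:
        bisect.insort(block, key)
        blocks[left].append(block.pop(0))             # shift smallest record left
        return "shifted left"

    overflow.setdefault(target, []).append(key)       # last resort: overflow block
    return "overflow"
```

Shifting a record to a neighbor keeps the main file in key order. A record placed in an overflow block is no longer in sequence with the main blocks, which is why a file with many overflow blocks eventually needs reorganizing.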
26 Overflow Blocks
[Figure: blocks n-1, n, and n+1, with an overflow block]
After a while the file starts being dominated by overflow blocks: time to reorganize.
27 Modifications: Deletions
- Free the space in the block, shift records
- May be able to eliminate an overflow block
28 Modifications: Updates
- If the new record is shorter than the previous one: easy
- If it is longer: need to shift records or create overflow blocks
29 Physical Addresses
Each block and each record has a physical address that consists of:
- The disk
- The cylinder number
- The track number
- The block within the track
- For records: an offset within the block
30 Logical Addresses
- Logical address: a string of bytes (10-16)
- More flexible: blocks/records can be moved around
- But a translation table is needed:
    Logical address   Physical address
    L1                P1
    L2                P2
    L3                P3
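The translation table can be pictured as a simple mapping. In this sketch the `PhysicalAddress` fields follow the components of a physical address (disk, cylinder, track, block, offset). The concrete values and the function names are made up for illustration.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    disk: int
    cylinder: int
    track: int
    block: int
    offset: int


# Hypothetical translation table: logical address -> physical address.
translation_table = {
    "L1": PhysicalAddress(disk=0, cylinder=12, track=3, block=7, offset=0),
    "L2": PhysicalAddress(disk=0, cylinder=12, track=3, block=7, offset=128),
    "L3": PhysicalAddress(disk=1, cylinder=40, track=1, block=2, offset=64),
}


def resolve(logical: str) -> PhysicalAddress:
    """Translate a logical address into the current physical address."""
    return translation_table[logical]


def move_record(logical: str, new_location: PhysicalAddress) -> None:
    """Moving a record only changes the table; the logical address stays valid."""
    translation_table[logical] = new_location
```

Anything that refers to the record by its logical address keeps working after `move_record`, at the cost of one extra lookup per access.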
31 Main Memory Address
- When a block is read into main memory, it receives a main memory address
- The buffer manager keeps another translation table:
    Memory address   Logical address
    M1               L1
    M2               L2
    M3               L3
32 Physical Design
- Interface 1: User request to the DBMS. The user presents a query, and the DBMS determines which physical databases are needed to resolve it.
- Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.
- Interface 3: The internal model access methods and OS access methods access the physical records of the database.
33 Physical File Design
- A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records.
- Pointer: a field of data that can be used to locate a related field or record of data
- Access method: an operating system algorithm for storing and locating data in secondary storage
- Page: the amount of data read or written in one disk input or output operation
34 Internal Model Access Methods
Many types of access methods:
- Physical Sequential
- Indexed Sequential
- Indexed Random
- Direct
- Hashed
They differ in:
- Access efficiency
- Storage efficiency
35 Physical Sequential
- Key values of the physical records are in logical sequence
- Main use is for "dump" and "restore"
- Access method may be used for storage as well as retrieval
- Storage efficiency is near 100%
- Access efficiency is poor (unless the physical records are of fixed size)
36 Sequential File Organization
A sequential file is one in which the records are stored in sorted order of one or more key fields.
37 Sequential File Organization
- Sequential access means that data is accessed in an ordered sequence.
- Sequential access is sometimes the only way of accessing the data, for example on tape.
- Records are usually stored on tape and processed one after the other.
39 Advantages
- Simple file design
- Very efficient when most of the records must be processed, e.g. payroll
- Very efficient if the data has a natural order
- Can be stored on inexpensive devices like magnetic tape
40 Disadvantages
- The file must be read from the start even when only a single record is being searched for
- Transactions have to be sorted before processing
- Overall processing is slow, because you have to go through each record until you get to the one you want
41 Sequential File Organization
- A collection of records
- Stored in key sequence
- Adding or deleting a record requires making a new file (so that the sequence is maintained)
- Used as master files
42 Indexed Sequential
- Key values of the physical records are in logical sequence
- Access method may be used for storage and retrieval
- An index of key values is maintained, with entries for the highest key value in each block (or group of blocks)
- Access efficiency depends on the number of index levels, the storage allocated for the index, the number of database records, and the amount of overflow
- Storage efficiency depends on the size of the index and the volatility of the database
43 Indexed sequential file
- Each record of a file has a key field which uniquely identifies that record.
- An index consists of keys and addresses, just like an index in a book:
  - The pages in a book are stored sequentially, so you can read through it page by page, OR
  - You can look up the page you want in the index and flick straight to it
44 Indexed sequential file
- An indexed sequential file is a sequential file (i.e. sorted into order of a key field) which has an index.
- A full index to a file is one in which there is an entry for every record.
- Because each record has an index entry, we can access individual records directly, without having to scroll through all the other records first.
45 Indexed sequential file
Indexed sequential files are important for applications where data needs to be accessed:
- sequentially, one record after another, OR
- randomly, using the index
46 An example of an Indexed Sequential file
A company may store details about its employees as an indexed sequential file. Sometimes the file is accessed sequentially, for example when the whole file is processed to produce pay slips at the end of the month.
47 An example of an Indexed Sequential file
Sometimes the file is accessed randomly, for example when an employee changes address, or gets married and changes surname.
48 Indexed sequential file
An indexed sequential file can only be stored on a random access device, e.g. magnetic disk or CD. This is because we need a device that allows direct access to individual records, rather than the sequential access that magnetic tape allows.
49 Advantages
- Provides flexibility for users who need both types of access to the same file
- Faster than sequential
50 Disadvantages
- Extra storage space for the index is required, just like in a book: your text book would be 372 pages without the index (go on, check!) but is 380 pages with the index.
51 Index Sequential
Index:
    Actual value   Address (block number)
    Dumpling       1
    Harty          2
    Texaci         3
    ...            ...
Data file:
    Block 1: Adams, Becker, Dumpling
    Block 2: Getta, Harty
    Block 3: Mobile, Sunoci, Texaci
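A lookup through an index that stores the highest key of each block can be sketched with a binary search. The block contents below are an assumed reading of the figure (the highest keys Dumpling, Harty, and Texaci pointing to blocks 1, 2, and 3), and `lookup` is a hypothetical helper.

```python
import bisect

# Assumed data file: block number -> records in key order.
blocks = {
    1: ["Adams", "Becker", "Dumpling"],
    2: ["Getta", "Harty"],
    3: ["Mobile", "Sunoci", "Texaci"],
}

# Index: the highest key in each block, with the matching block number.
index_keys = ["Dumpling", "Harty", "Texaci"]
index_blocks = [1, 2, 3]


def lookup(key: str):
    """Return the block number holding key, or None if the key is not in the file."""
    pos = bisect.bisect_left(index_keys, key)  # first index entry >= key
    if pos == len(index_keys):
        return None                            # key is larger than every stored key
    block_no = index_blocks[pos]
    return block_no if key in blocks[block_no] else None
```

Only one data block has to be read per lookup, while reading blocks 1, 2, 3 in order still yields the records in key sequence.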
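52 Indexed Sequential: Two Levels
[Figure: a two-level index. The top level lists key values 385, 678, 805 with addresses 7, 8, 9, ...; the lower level lists key values from 001 to 791 across blocks numbered 1 to 6.]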
53 Indexed Random
- Key values of the physical records are not necessarily in logical sequence
- The index may be stored and accessed with the Indexed Sequential Access Method
- The index has an entry for every database record. The index keys are in ascending logical sequence; the database records themselves are not necessarily in ascending sequence.
- Access method may be used for storage and retrieval
54 Indexed Random
[Figure: an index with one entry per record in key order (actual values Adams, Becker, Dumpling, Getta, Harty, ...), each pointing to a block number (2, 1, 3, ...)]
55 Btree
[Figure: B-tree with root node F | P | Z, second-level nodes B | D | F, H | L | P, and R | S | Z, and leaf records Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles]
56 Direct (Random) File Organization
- Records are read directly from, or written directly to, the file.
- The records are stored at known addresses.
- The address is calculated by applying a mathematical function to the key field.
57 Direct
- Key values of the physical records are not necessarily in logical sequence
- There is a one-to-one correspondence between a record key and the physical address of the record
- May be used for storage and retrieval
- No duplicate keys permitted
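A minimal sketch of direct organization follows, assuming integer keys and fixed-length records so that the address function is simply key times record size. An in-memory `io.BytesIO` stands in for the disk file, and `RECORD_SIZE` and the helper names are illustrative.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record length in bytes


def address_of(key: int) -> int:
    """One-to-one mapping from a record key to its byte address in the file."""
    return key * RECORD_SIZE


def write_record(f, key: int, payload: bytes) -> None:
    """Store the record at the address computed from its key."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload does not fit in one record")
    f.seek(address_of(key))
    f.write(payload.ljust(RECORD_SIZE, b" "))  # pad to the fixed record length


def read_record(f, key: int) -> bytes:
    """Fetch the record by jumping straight to its computed address."""
    f.seek(address_of(key))
    return f.read(RECORD_SIZE).rstrip(b" ")
```

Records can be written in any order, for example key 3 before key 0, and each is still found with a single seek. Because each key maps to exactly one address, a second record with the same key would overwrite the first.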
58 Hashing
- A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
- In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.
- A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
59 Hashing Organization
- The hash function is used to locate records for access, insertion, and deletion.
- Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.
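The hash file organization can be sketched as a list of buckets. The hash function here (sum of the key's bytes modulo the number of buckets) is a deliberately simple assumption chosen to make collisions easy to see; `NUM_BUCKETS` and the helper names are hypothetical.

```python
NUM_BUCKETS = 4  # hypothetical number of buckets (disk blocks)

buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key: str) -> int:
    """Map a search-key value to a bucket address."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


def insert(key: str, record: str) -> None:
    buckets[h(key)].append((key, record))


def lookup(key: str):
    """Search the whole bucket sequentially, since different keys can share it."""
    for stored_key, record in buckets[h(key)]:
        if stored_key == key:
            return record
    return None


def delete(key: str) -> bool:
    bucket = buckets[h(key)]
    for i, (stored_key, _) in enumerate(bucket):
        if stored_key == key:
            del bucket[i]
            return True
    return False
```

The keys "ab" and "ba" have the same byte sum, so they land in the same bucket; `lookup` still returns the correct record for each because it compares the stored key and not just the bucket address.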