Presentation is loading. Please wait.

Presentation is loading. Please wait.

CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 75 Database Systems II Record Organization.

Similar presentations


Presentation on theme: "CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 75 Database Systems II Record Organization."— Presentation transcript:

1 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 75 Database Systems II Record Organization

2 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 76 Introduction We have introduced secondary storage devices, in particular disks. Disks use blocks as basic units of transfer and storage. In a DBS, we have to manage entities, typically represented as records with attributes (relational model). Attribute values (data items) can be complex: texts, images, videos. How to organize records on disk?

3 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 77 Introduction Data Items Records Blocks Files Memory

4 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 78 Data Items Short Integer, unsigned (16 Bits) e.g., 35 is Short Integer, signed e.g., -35 is 0000000000100011 2 5 + 2 1 + 2 0 1000000000100011

5 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 79 Data Items Floating point (32 Bits, single precision) 1 bit for sign, m for exponent, n bits for mantissa value = (-1) sign x 2 exponent – bias x 1.fraction bias = 2 m-1 -1 = 127 = 0.15625 124 1.01 (binary) = 1.25 (decimal)

6 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 80 Data Items Characters various coding schemes suggested most popular is ASCII 1 Character = 1 Byte = 8 Bits examples A:1000001 a:1100001 5:0110101 LF:0001010

7 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 81 Data Items String of characters NULL-terminated e.g., length indicator e.g., fixed length do not need terminating NULL or length indicator ctamuoescta43muoes VARCHAR CHAR

8 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 82 Records Fixed format e.g., relational data model record is list of data items number and type of data items fixed vs. variable format e.g., XML number of data items variable Fixed length vs. variable length e.g., VARCHAR, blobs, repeating fields

9 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 83 Records Record header keeps general information about the record. Header contains (some of) the following: - pointer to schema, - record types, - record length (to skip record without consulting schema), - timestamp of last access, - pointers (offsets) to record attributes.

10 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 84 Fixed-Length Records Fixed length records are the simplest format. Header contains only pointer to schema, record length and timestamp. Example (1) id, 2 byte integer (2) name, 10 char. Schema (3) dept, 2 byte code 55 s m i t h 02 83 j o n e s 01 Records

11 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 85 Variable-Length Records Variable length records first store fixed length fields, followed by variable-length fields. Header contains also pointers (offsets) to variable-length fields (except first one). NULL values can be efficently represented by null pointers in the header. headerbirthdatenameaddress fixedvariable

12 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 86 Variable Format Records Variable format records are self-describing. Simplest representation as sequence of tagged fields. Record header contains number of fields. Tag contains - attribute name, - attribute type, - field length (if not apparent from attribute type).

13 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 87 Variable Format Records Example 4I524SDROF46 # Fields Code identifying field as E# Integer type Value Code for EName String type Length of str. Value

14 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 88 Introduction How to pack records into blocks? Need to address the following issues - separating records, - spanned vs. unspanned, - mixing record types, - ordering records, - addressing records, - BLOBs. Packing Records Into Blocks

15 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 89 Separating records for fixed-length records, no need to separate for variable-length records use special marker or store record lengths (or offsets) - within each record - in block header Packing Records Into Blocks

16 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 90 Spanned vs. unspanned Unspanned records on a single block block 1block 2 Spanned records split into fragments that are distributed over multiple blocks block 1block 2 Packing Records Into Blocks R1R2R3R4R5 R1R2 R3 (a) R3 (b) R6R5R4 R7 (a)

17 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 91 Packing Records Into Blocks Spanned vs. unspanned Spanning necessary if records do not fit in a block, e.g. if they have very long fields (VARCHAR, blob). Spanning useful for better space utilization, even if records fit in a block. 2050 bytes wasted 2046 2050 bytes wasted 2046 R1R2

18 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 92 Packing Records Into Blocks Spanned vs. unspanned Spanned records requires extra header information in records and record fragments: - is it fragment? (bit) - if fragment, is it first or last? (bit) - if applicable, pointers to previous/next fragment.

19 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 93 Packing Records Into Blocks Mixing record types Can we mix records of different types (relations)in same block? Why should we? Records that are frequently accessed together should be in the same block (clustering). Compromise: no mixing on same block, but keep related records in same cylinder. EMP e1 DEPT d1 DEPT d2

20 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 94 Packing Records Into Blocks Ordering records Want to efficiently read records in order Ordering records in file and block by some key value (sequential file) Implementation options - next record physically contiguous - link to next record Next (R1)R1 Next (R1)

21 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 95 Packing Records Into Blocks Ordering records Overflow area R1 R2 R3 R4 R5 header R2.1 R1.3 R4.7 records in sequence  links more flexible under modifications, but require random I/O

22 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 96 Packing Records Into Blocks Addressing records In memory records have virtual memory address. How to address a record on disk? Trade-off between efficiency of dereferencing and flexibility in moving records over time Many options, from purely physical to fully indirect (logical)

23 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 97 Packing Records Into Blocks Addressing records Purely physical direct address on disk Record address (record ID) consists of cylinder # track # block # offset in block  efficient access, but cannot move record within block or across blocks

24 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 98 Packing Records Into Blocks Addressing records Fully indirect (logical) address (record ID) can be any bit string, needs to be mapped to physical address  inefficient access, but can move record anywhere Physical addr. Rec ID Often 8-16 Bytes! Typically ≤ 8 Bytes map

25 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 99 Packing Records Into Blocks Addressing records Indirection in block  compromise between physical and fully indirect Header Free space R3 R4 R1R2 offsets

26 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 100 Packing Records Into Blocks BLOBs Sometimes, very large data items, e.g. images, videos,... Store them as Binary Large Objects (BLOBs) Stored as spanned sequence of blocks - preferably contiguous on disk, - possibly even striped over several disks.

27 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 101 Record Modifications Insertions If record order does not matter, just append record to file, i.e. find a block of that file with empty space. If order does matter, may be more complicated. First locate corresponding block. If block has enough space left, insert record there and rearrange records in block.

28 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 102 Record Modifications Insertions If block does not have enough space left, use one of the two following strategies: - find space in nearby block predecessor or successor in sort order - create overflow block and link it into chain of blocks  only if order implemented by links

29 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 103 Record Modifications Insertions Overflow blocks raise the following issues: - How much free space to leave in each block, track, cylinder? - How often to reorganize file and overflow area? Free space

30 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 104 Record Modifications Deletions Try to reclaim space. Either move around records within a block or maintain available-space list. Must avoid dangling pointers to deleted records. Typical approach: place tombstone in place of deleted record.

31 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 105 Record Modifications Deletions physical record IDs This spaceThis space can never re-usedbe re-used

32 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 106 Record Modifications Deletions logical record IDs IDLOC 7788 map Never reuse ID 7788 nor space in map...

33 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 107 Buffer Management Introduction The part of the DB relevant for the current transactions needs to reside in memory, i.e. in the DB buffer. DB MAIN MEMORY DISK disk block free frame BUFFER POOL choice of frame dictated by replacement policy

34 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 108 Buffer Management Block request If requested block is not in buffer pool, choose a frame for replacement. If this frame is dirty, write it to disk. Read requested page into chosen frame. Pin the page and return its address.

35 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 109 Buffer Management Block replacement Requestor of page must unpin it, as soon as block is no longer needed. Requestor must also indicate whether block has been modified: dirty bit is used for this. Block in pool may be requested many times, and a pin count is used. Block is candidate for replacement if and only if pin count = 0.

36 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 110 Buffer Management Replacement policy Frame is chosen for replacement by a replacement policy: - Least-recently-used (LRU), - Clock, efficient approximation of LRU - Most-recently-used (MRU) etc. Policy can have big impact on number of I/O’s, depends on the access pattern. Will discuss more in the context of query optimization.


Download ppt "CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 75 Database Systems II Record Organization."

Similar presentations


Ads by Google