2Introduction Files, file structures, DB files, indexes B-trees and B+-treesReference Book (in VUW Library): Fundamentals of Database Systems by Elmasri & Navathe (2006), (Chapters 13 and 14 only!)
3The Memory Hierarchy Registers Memory caches Primary memory: in CPU.Memory cachesVery fast, smallCopy of bits of primary memoryNot controllable.Primary memory:Fast, largeVolatileSecondary Storage (hard disks)Slow, very largePersistentData cannot be manipulated directly,data must be copied to main memory,modifiedwritten back to diskNeed to know how files are organised on disk
4Disk Organisation Data on a disk is organised into chunks: Sectors: physical organisation.hardware disk operations work on sector at a timetraditionally: 512 bytesmodern disks: 4096 bytesCan't address more than 4G sectors with 32 bit number⇒ at most 2TB with 512 byte sectorsBlocks:logical organisationoperating system retrieves and writes a block at a time512 bytes to 8192 bytes,⇒ For all efficient file operations, minimise block accesses
5Reading/Writing from files In memoryDiskProgramDiskBuffersDiskControllerHold oneblock ata time
6File Organisation Files typically much larger than blocks A file must be stored in a collection of blocks on diskMust organise the file data into blocksMust record the collection of blocks associated with the fileMust enable access to the blocks (sequential or random access)DiskIn memoryProgramDiskBuffersDiskControllerHold oneblock ata time
7Files of recordsFile may be a sequence of records (especially for DB files)record = logical chunk of informationeg a row from a DataBase tablean entry in a file system description…May have several records per block"blocking factor" = # records per blockcan calculate block number from record number (& vice versa) (if records all the same size)# records = # blocks bf
8Using B+ trees for organising Data B+ tree is an efficient index for very large data setsThe B+ tree must be stored on disk (ie, in a file)⇒ costly actions are accessing data from diskRetrieval action in the B+ tree is accessing a node⇒ want to make one node only require one block access⇒ want each the nodes to be as big as possible ⇒ fill the blockB+ tree in a file:one node (internal or leaf) per block.links to other nodes = index of block in file.need some header info.need to store keys an values as bytes.
9Implementing B Tree in a File To store a B Tree in a block-structured fileNeed a block for each node of the treeCan refer to blocks by their index in the file.Need an initial block (first block in file) with meta data:index of the root blocknumber of itemsinformation about the item sizes and types?Need a block for each internal nodetypechild item child key…. key childlink to nextkey-value key-value …. key-value……………index of block containing child node………
10Cost of B treeIf the block size is 1024 bytes, how big can the nodes be?Node requiressome header informationleaf node or internal nodenumber of items in node,internal node:mN x item sizemN+1 x pointer sizeleaf nodemL x item sizepointer to next leafHow big is an item?How big is a pointer?Must specify which node,ie which block of the fileLeaf nodes could hold more values than internal nodes!
11Cost of B tree If block has 1024 bytes If key is a string of up to 10 characters ⇒ 10 bytesif value is a string of up to 20 characters ⇒ 20 bytesIf child pointer is an int ⇒ 4 bytesInternal nodesize = (10 + 4) mN + 4⇒ mN ≤ (1024 – 9 )/ 14 = ⇒ 72 keys in internal nodesLeaf nodesize = ( ) mL⇒ mL ≤ (1024 – 9 )/ 30 = = ⇒ 33 key-value pairs in leaves