Presentation is loading. Please wait.

Presentation is loading. Please wait.

File Structures Dale-Marie Wilson, Ph.D.. Basic Concepts Primary storage Main memory Inappropriate for storing database Volatile Secondary storage Physical.

Similar presentations


Presentation on theme: "File Structures Dale-Marie Wilson, Ph.D.. Basic Concepts Primary storage Main memory Inappropriate for storing database Volatile Secondary storage Physical."— Presentation transcript:

1 File Structures Dale-Marie Wilson, Ph.D.

2 Basic Concepts Primary storage Main memory Inappropriate for storing database Volatile Secondary storage Physical storage e.g. magnetic disks Nonvolatile Cheaper

3 Basic Concepts 2° storage organized into files Each file has one or more records Each record has one or more fields Process User requests tuple e.g. SG37 DBMS maps logical record to physical record Physical record moved to DBMS buffers N.B. Physical record is unit of transfer between disk and primary storage

4 Basic Concepts Physical record typically consists of more than 1 logical record Logical record can correspond to more than 1 physical record Refer to physical record as blocks and pages staffNolNamepositionbranchNo SL21WhitemanagerB005 SG37BeechAssistantB003 SG14FordSupervisorB003 SA9HoweAssistantB007 SG5BrandManagerB003 SL41LeeAssistantB005 Page 1 2

5 Basic Concepts File organization Physical arrangement of data in file into records and pages in 2° storage Determines order records stored and accessed Types Heap (unordered) Records place on disk in no specific order Sequential (ordered) Records ordered by value of specific field Hash Records placement determined by hash function

6 Basic Concepts Access method Steps involved in storing and retrieving records from file

7 Heap Files Unordered files Aka heap files Simplest organization Records placed in same order inserted Linear search for retrieval Insertion efficient; retrieval not efficient Deletion process Relevant page identified Record marked as deleted Page rewritten to disk N.B. deleted record space not reused → performance deterioration Best suited for bulk loading data

8 Ordered Files Ordered files Aka sequential files Sorted on field – ordering field If ordering field = key → ordering key Binary search for retrieval Insertion and deletion problematic Need to maintain order of records Rarely used unless 1° index exists

9 Hash Files Hash files Aka random/direct files Hash function used to det. page address for storing record Chosen to provide most even distribution of records – min. collisions Examples: Folding – applying arithmetic function to hash field e.g. + 7 Division-remainder – uses mod function to det. field value Each address corresponds to a page/bucket Each bucket has slots for multiple records – placed in order of arrival Base field – hash field If hash field = key → hash key Collision Hash function does not calculate unique address for 2 or more records

10 Hash Files Collision management techniques Open addressing Unchained overflow Chained overflow Multiple hashing

11 Collision Management Open addressing Linear search performed to locate 1 st available slot Same procedure for searching for record Record doesn’t exist if empty slot found before record located

12 Collision Management Unchained overflow Overflow area maintained for collisions Improves over open addressing by minimizing collisions Staff SA9 record Staff SL21 record Staff SG5 record Staff SG14 record Staff SG37 record Bucket Staff SL41 record Bucket 0 1 2 3 4

13 Collision Management Chained overflow Overflow area maintained for collisions Uses synonym pointer Additional field that indicates whether collision occurred If collision, contains bucket address of overflow area Staff SA9 record Staff SL21 record Staff SG5 record Staff SG14 record Staff SG37 record Bucket Staff SG7 record Bucket 0 1 2 3 4 0 0 3

14 Collision Management Multiple hashing If collision occurs, new hash function performed 2 nd hash function typically used to place record in overflow area

15 Indexes Index Data structure that allows DBMS to locate particular records in file more quickly Similar to index in book Main types of indices: Primary index Index a key field Clustering index File sequentially ordered on non-key field i.e. more than record can correspond with index Secondary index Index defined on non-ordering field of data file

16 Indexes File can have: At most 1 primary or 1 clustering index Several secondary indices Index may be: Dense Index record for every search key value Sparse Index record for some key search values

17 Indexed Sequential Files Indexed sequential file Sorted data file with primary index Has: Primary storage area Separate index Overflow area

18 Multilevel Index Multilevel index Index treated as file and split into smaller indices Overcomes problems with large indices that span several pages

19 B + Trees Search Tree Used to guide search for a record, given the value of one of its fields Two types of Nodes Internal Nodes contain Key values and node pointers Leaf Nodes contain Key, Record-Pointer pairs Degree/order Max # children allowed B-tree – balanced tree Depth from root to leaf same for every leaf

20 B + Trees The structure of internal nodes in a B + tree of order p: Each internal node is of the form, where q <= p, each P i is a tree pointer Within each internal node, K 1 < K 2 <... < K q-1 For all values of X in the subtree pointed at by P i, we have K i-1 < X < K i for 1 < i < q, X < K i for i=q, and K i-1 < X for i=q Each internal node has at most p tree pointers Each internal node, except the root, has at least (p/2) tree pointers. The root node has at least two tree pointers if it is an internal node. An internal node with q pointers, q <= p, has q-1 search field values.

21 B + Trees

22 The structure of leaf nodes in a B + tree of order p: Each leaf node is of the form,,...,, P next >, where q <= p, each Pr i is a data pointer that points to a record or block of records Within each internal node, K 1 < K 2 <... < K q-1 Each leaf node, has at least (p/2) values All leaf nodes are at the same level The P next pointer points to the next leaf node in the tree This give efficient sequential access to data

23 B + Trees

24 Insertion example for B + Tree: When you insert into a leaf node that is full, you split and pass the rightmost value up to the parent When you insert into a full root, the root splits and a new root is created with the middle value from the child nodes Otherwise, values are inserted into openings at the lowest level

25

26

27 Appendix F Assignment #7


Download ppt "File Structures Dale-Marie Wilson, Ph.D.. Basic Concepts Primary storage Main memory Inappropriate for storing database Volatile Secondary storage Physical."

Similar presentations


Ads by Google