Database System Implementation CSE 507


1 Database System Implementation CSE 507
Introduction and File Structures. Some slides adapted from R. Elmasri and S. Navathe, Fundamentals of Database Systems, Sixth Edition, Pearson, and from A. Silberschatz, H. Korth, and S. Sudarshan, Database System Concepts, Sixth Edition.

2 Course Logistics Two classes per week
Tuesday (10:00 am in C11) and Thursday (11:30 am in C01).
Course textbooks: R. Elmasri and S. Navathe, Fundamentals of Database Systems; A. Silberschatz, H. Korth, and S. Sudarshan, Database System Concepts.
People involved. Instructor: Dr. Viswanath Gunturi. Teaching Assistants: Kapish Malik, Kanchanjot Kaur, Harbeer Singh, Aanchal Mongia, Priyanka Gupta, Naveen Kumar.

3 Deliverables
4 homework assignments (11% each), to be done in a team, each containing both textbook and programming parts. Mid-term exam (16%). Final exam (22%). 2 quizzes (9% each); the best two of three quizzes count.

4 Policies
The academic dishonesty policy of IIIT Delhi applies. Makeup exam or quiz policy: make-up exams will cover significantly more syllabus. Late submission policy on homeworks: more than 0 and up to 24 hours late → 30% reduction; 24 to 48 hours late → 50% reduction; more than 48 hours late → no score! No separate grading scheme for B.Techs and M.Techs. Course web page:

5 Overview of the Course

6 Schematic of a Database System
A database system is layered as: conceptual models, a logical model, and a physical model.

7 Schematic of a Database System
Conceptual models. Goal: capture the real-world concepts to be modeled in the application, e.g., ER diagrams.

8 Schematic of a Database System
Logical model. Goal: a mathematical representation of the application's concepts, e.g., relational operators (select, project, join), normal forms, SQL queries.

9 Schematic of a Database System
Physical model. Goal: implement the mathematical concepts as scalable system code that works for a variety of datasets.

10 Schematic of a Database System
Conceptual models, logical model, physical model: the physical model is the focus of this course!

11 Why is this a part of Data Engineering Stream?
In real-world systems, the "rules" governing scalability go beyond big-O asymptotic analysis; they depend on the nature of the data. There is no clear dominance among query processing algorithms: for example, an O(n^2) algorithm may be better than an O(n log n) one. The system needs to take decisions depending on the input data.

12 Why is this a part of Data Engineering Stream?
Take-away skill: the ability to think from a system's perspective. Under what conditions (properties of the data) would this algorithm work better? What parameters define this dominance zone?

13 Topics Covered Introduction and File Structures Index Structures
Query processing techniques Query Optimization Transactions Concurrency Control Recovery Database Security Distributed Databases

14 Topics Covered Introduction and File Structures Index Structures
Query processing techniques Query Optimization Transactions Concurrency Control Recovery Database Security Distributed Databases. For these topics, we will cover something from the textbook and some material from well-cited research papers reflecting the current state of the art.

15 Basics on Disk Storage

16 Memory Hierarchies and Storage Devices
Computer storage media form a storage hierarchy that includes: Primary storage: CPU cache (static RAM) and main memory (dynamic RAM). Fast but more expensive; both are volatile in nature.

17 Memory Hierarchies and Storage Devices
Computer storage media form a storage hierarchy that includes: Primary storage: CPU cache (static RAM) and main memory (dynamic RAM); fast but more expensive, and both are volatile in nature. Secondary storage: magnetic disks and optical disks (e.g., CD-ROMs, DVDs); less expensive, slower than primary storage, and non-volatile in nature.

18 Memory Hierarchies and Storage Devices
Newly emerging: flash memory. Non-volatile; speed-wise somewhere between DRAM and magnetic disks. Based on electrically erasable programmable read-only memory (EEPROM). Disadvantage: it supports only a finite number of erase cycles.

19 Storage of Databases
Databases are usually too large to fit in main memory. They also need to store data that must persist over time. Hence we prefer secondary storage devices, e.g., magnetic disks.

20 Storage of Databases Why do we need to be smart about storing databases? Databases are typically large. A poor design may lead to increased query, insert, delete, and recovery times. Imagine the requirements for systems like airline reservations and VISA transactions.

21 Secondary Storage Devices: Magnetic Disk
Data is stored as magnetized areas on magnetic disk surfaces. Each disk surface is divided into concentric circular tracks. A track is divided into smaller sectors; this division of a track into sectors is hard-coded on the disk surface. A sector is the portion of a track that subtends a fixed angle at the center; this angle can be the same for all tracks or can decrease as we move outward.

22 Secondary Storage Devices: Magnetic Disk

23 Secondary Storage Devices: Magnetic Disk

24 Secondary Storage Devices: Magnetic Disk
A track is divided into equal-sized disk blocks (or pages) by the operating system during formatting. Typical disk block sizes range from 512 to 8192 bytes. Whole blocks are transferred between disk and main memory for processing.

25 Accessing a Magnetic Disk
A disk is a random-access addressable device. Transfer between main memory and disk takes place in units of disk blocks. The hardware address of a block is a combination of: cylinder number, track number, and sector number within the track.

26 Accessing a Magnetic Disk
Step 1: Mechanically position the read/write head over the correct track/cylinder. Time required to do so → seek time. Step 2: The beginning of the desired block rotates into position under the read/write head. Time required to do so → rotational delay (latency). Step 3: A block's worth of data (possibly a series of sectors) is transferred. Time required to do so → block transfer time. Total time required = seek time + rotational delay + block transfer time. Seek time and rotational delay are much larger than the block transfer time.

27 Accessing a Magnetic Disk -- Example
Assume the following disk parameters: bytes per sector = 4096, with a total of 128 sectors per track. Total number of tracks in the disk = … . Disk rotation speed = 7200 rpm → time for one rotation = 8.33 millisec. Seek time = 1 millisec to start and stop + 1 millisec to travel every 1000 cylinders → cost to move one track = 1.001 millisec; total time to move across the entire disk = … millisec.

28 Accessing a Magnetic Disk -- Example
Assume the following disk parameters: bytes per sector = 4096, with a total of 128 sectors per track. Total number of tracks in the disk = … . Disk rotation speed = 7200 rpm → time for one rotation = 8.33 millisec. Seek time = 1 millisec to start and stop + 1 millisec to travel every 1000 cylinders → cost to move one track = 1.001 millisec; total time to move across the entire disk = … millisec. What is the minimum and maximum time to read a 16,384-byte block?

29 What is the min and max time to read a 16,384 byte block?
Minimum: happens when the disk head is already over the starting sector. With 4096 bytes per sector, one block occupies 16384/4096 = 4 sectors, so the head needs to pass over 4 sectors and 3 gaps. Assume each track has (on average) 128 sectors and 128 gaps, with gaps taking 10% and sectors 90% of the track. Total angle traveled = 36×(3/128) + 324×(4/128) ≈ 10.97 degrees. Therefore total transfer time = (10.97/360) × time-for-one-rotation ≈ 0.25 millisec (about 0.00025 seconds).
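As a quick check of the best-case arithmetic, here is the same calculation as a small Python sketch, using only the parameter values assumed above:

```python
# Best-case read of a 16,384-byte block (head already over the first sector).
bytes_per_sector = 4096
sectors_per_track = 128
rotation_ms = 60_000 / 7200                   # one full rotation at 7200 rpm ~ 8.33 ms

sectors_needed = 16_384 // bytes_per_sector   # 4 sectors
gaps_crossed = sectors_needed - 1             # 3 gaps between them

# Gaps take 10% (36 degrees total) and sectors 90% (324 degrees total) of the track.
gap_angle = 36 * (gaps_crossed / sectors_per_track)
sector_angle = 324 * (sectors_needed / sectors_per_track)
total_angle = gap_angle + sector_angle        # ~10.97 degrees

transfer_ms = (total_angle / 360) * rotation_ms   # ~0.25 ms
print(f"angle = {total_angle:.2f} degrees, transfer time = {transfer_ms:.3f} ms")
```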

30 What is the min and max time to read a 16,384 byte block?
Maximum: happens if we need to move the head across the entire disk and also wait for one full rotation of the disk. Total time = time for the head to move across the entire disk + time for one full rotation (8.33 millisec) + the best-case transfer time (≈ 0.25 millisec) = … millisec.

31 What is the min and max time to read a 16,384 byte block?
Maximum: happens if we need to move the head across the entire disk and also wait for one full rotation of the disk. Total time = time for the head to move across the entire disk + time for one full rotation (8.33 millisec) + the best-case transfer time (≈ 0.25 millisec) = … millisec. Seek time is usually the greatest component and dominates.

32 Seek Time is usually the greatest and dominates
Techniques to reduce it: disk scheduling (the elevator algorithm), cylinder-based organization, multiple disks, mirroring, and prefetching / double buffering.

33 Seek Time is usually the greatest and dominates
Techniques to reduce it: disk scheduling with the elevator algorithm. Similar to an elevator making sweeps between the ground floor and the top floor: as the head passes a cylinder, it stops if there are one or more requests for blocks on that cylinder, then proceeds in the same direction. The direction reverses when the head reaches an end.
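Not on the slides: a minimal Python sketch of the elevator policy, using the timing model of the example that follows (block transfer 0.25 ms, average rotational latency 4.17 ms, seek = 1 ms start/stop + 1 ms per 1000 cylinders). Function and variable names are illustrative. Running it on the example's request stream reproduces the service times worked out on the next slides.

```python
def elevator_schedule(requests, start_cyl=2000,
                      seek_ms=lambda d: 0 if d == 0 else 1 + d / 1000,
                      latency_ms=4.17, transfer_ms=0.25):
    """Serve (cylinder, arrival_time) requests with one-sweep elevator scheduling.

    Returns a dict: cylinder -> time of service (ms), under the example's
    timing model (seek, average rotational latency, block transfer time).
    """
    pending = list(requests)                  # (cylinder, arrival time in ms)
    head, clock, direction = start_cyl, 0.0, +1
    served = {}
    while pending:
        # Requests that have already arrived and lie in the sweep direction.
        candidates = [(c, t) for c, t in pending
                      if t <= clock and (c - head) * direction >= 0]
        if not candidates:
            if any(t <= clock for _, t in pending):
                direction = -direction        # nothing ahead: reverse the sweep
            else:
                clock = min(t for _, t in pending)   # idle until the next arrival
            continue
        # The nearest candidate in the sweep direction is the next one passed.
        cyl, arr = min(candidates, key=lambda ct: abs(ct[0] - head))
        clock += seek_ms(abs(cyl - head)) + latency_ms + transfer_ms
        served[cyl] = round(clock, 2)
        head = cyl
        pending.remove((cyl, arr))
    return served

# The example's request stream (cylinder, time of request in ms):
print(elevator_schedule([(2000, 0), (6000, 0), (14000, 0),
                         (4000, 10), (16000, 20), (10000, 30)]))
```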

34 Seek Time is usually the greatest and dominates
Disk scheduling elevator algorithm example. Assume the following disk parameters: time for transferring one block = 0.25 millisec; average rotational latency = 4.17 millisec; seek time = 1 millisec to start and stop + 1 millisec to travel every 1000 cylinders.

35 Seek Time is usually the greatest and dominates
Disk scheduling elevator algorithm example. Assume the following disk parameters: time for transferring one block = 0.25 millisec; average rotational latency = 4.17 millisec; seek time = 1 millisec to start and stop + 1 millisec to travel every 1000 cylinders.

36 Disk Scheduling Elevator Algorithm Example:
Assume the head is at cylinder number 2000 at the beginning.
Cylinder requested   Time of request (ms)   Time of service (ms)
2000                 0                      ?
6000                 0                      ?
14000                0                      ?
4000                 10                     ?
16000                20                     ?
10000                30                     ?

37 Disk Scheduling Elevator Algorithm Example:
Assume the head is at cylinder number 2000 at the beginning (requests as on the previous slide).
For 2000: 4.17 + 0.25 = 4.42 ms.
For 6000: 4.42 + (1 + 4000/1000) + 4.17 + 0.25 = 13.84 ms.
For 14000 (4000 is skipped, since the head is sweeping toward higher cylinders): 13.84 + (1 + 8000/1000) + 4.17 + 0.25 = 27.26 ms.
For 16000: 27.26 + (1 + 2000/1000) + 4.17 + 0.25 = 34.68 ms.
The head then turns around from cylinder 16000.

38 Compare it with first come first serve?
Disk scheduling elevator algorithm example:
Cylinder   Time of request (ms)   Time of service (ms)
2000       0                      4.42
6000       0                      13.84
14000      0                      27.26
4000       10                     57.52
16000      20                     34.68
10000      30                     46.10
Compare it with first-come-first-served?

39 Elevator algorithm vs. first-come-first-served (FCFS):
Cylinder   Time of request (ms)   Service time, elevator (ms)   Service time, FCFS (ms)
2000       0                      4.42                          4.42
6000       0                      13.84                         13.84
14000      0                      27.26                         27.26
4000       10                     57.52                         42.68
16000      20                     34.68                         60.10
10000      30                     46.10                         71.52

40 As the pool of requests grows large, the elevator algorithm gives much better results.

41 Seek Time is usually the greatest and dominates
Techniques to reduce it: cylinder-based organization. Store data that is likely to be accessed together (e.g., a relation) on the same cylinder, or on adjacent cylinders in the case of large relations. Disadvantage: not very good for random access!

42 Seek Time is usually the greatest and dominates
Techniques to reduce it: multiple disks. Better throughput for both random and "patterned" access. Cost can become an issue, but the approach is still used in high-end data warehouses.

43 Seek Time is usually the greatest and dominates
Techniques to reduce it: mirroring disks. Cost becomes an issue, since we pay twice for the same storage, and writing data can run into contention/locking issues.

44 Seek Time is usually the greatest and dominates
Prefetching and double buffering: we predict the order in which blocks will be processed, and load them into main memory before we use them. This is made possible because we typically have an independent I/O processor.
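A minimal sketch of the double-buffering idea, with a background thread standing in for the independent I/O processor; read_block and the timings here are made up for illustration:

```python
import queue
import threading
import time

def read_block(block_id):
    """Stand-in for a disk read; in a real system this would be an I/O request."""
    time.sleep(0.01)                          # pretend the transfer takes 10 ms
    return f"data-of-block-{block_id}"

def prefetched_blocks(block_ids, depth=2):
    """Yield blocks in order while the next ones are fetched in the background."""
    buffers = queue.Queue(maxsize=depth)      # depth=2 gives classic double buffering

    def producer():
        for bid in block_ids:
            buffers.put(read_block(bid))      # blocks when both buffers are full
        buffers.put(None)                     # end-of-stream marker

    threading.Thread(target=producer, daemon=True).start()
    while (block := buffers.get()) is not None:
        yield block                           # CPU works on one buffer while I/O fills the other

for block in prefetched_blocks(range(5)):
    print(block)
```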

45 Placing File Records on Disk

46 Types of Records Records contain fields which have values of a particular type E.g., amount, date, time, age Records may be of fixed length or of variable length. Variable Length Records can be due to: Variable length fields (e.g, varchar). Some fields may have multiple values. Some fields may be optional. We can have different kind of records.

47 How to put these on a disk?
Fixed-length records → each field can be located easily from the record's first byte. Handling variable-length records??

48 How to put these on a disk?
Fixed-length records → each field can be located easily from the record's first byte. Handling variable-length records: variable-length fields (e.g., varchar) → place a separator character after the field. Fields with multiple values. Optional fields. Different kinds of records.

49 How to put these on a disk?
Fixed-length records → each field can be located easily from the record's first byte. Handling variable-length records: variable-length fields (e.g., varchar). Fields with multiple values → use two separator characters (one between repeated values, one to end the field). Optional fields. Different kinds of records.

50 How to put these on a disk?
Fixed-length records → each field can be located easily from the record's first byte. Handling variable-length records: variable-length fields (e.g., varchar). Fields with multiple values. Optional fields → store <field-name, field-value> pairs. Different kinds of records.

51 How to put these on a disk?
Fixed-length records → each field can be located easily from the record's first byte. Handling variable-length records: variable-length fields (e.g., varchar). Fields with multiple values. Optional fields. Different kinds of records → include a record-type character.
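A small sketch (not from the slides) that combines the four tricks above: a separator character after each variable-length field, a second separator between repeated values, <field-name, field-value> pairs for optional fields, and a leading record-type character. The particular separator bytes and layout are illustrative choices, not a standard format:

```python
FIELD_SEP = "\x01"   # ends a variable-length field
VALUE_SEP = "\x02"   # separates multiple values within one field
PAIR_SEP = "\x03"    # separates a field name from its value

def encode_record(record_type, name, phones, optional_fields):
    """Encode one variable-length record as a single string."""
    parts = [record_type]                                  # record-type character
    parts.append(name + FIELD_SEP)                         # variable-length field (like varchar)
    parts.append(VALUE_SEP.join(phones) + FIELD_SEP)       # multi-valued field
    for fname, fvalue in optional_fields.items():          # optional fields as name/value pairs
        parts.append(fname + PAIR_SEP + fvalue + FIELD_SEP)
    return "".join(parts)

rec = encode_record("S", "Srinivasan", ["98100", "98200"], {"office": "C-11"})
print(rec.encode())   # b'SSrinivasan\x0198100\x0298200\x01office\x03C-11\x01'
```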

52 Blocking Blocking: Refers to storing a number of records in one block on the disk. Blocking factor (bfr) refers to the number of records per block. There may be empty space in a block if an integral number of records do not fit in one block. Spanned Records: Refers to records that exceed the size of one or more blocks and hence span a number of blocks. Variable vs Fixed length records.

53 Files of Records File records can be unspanned or spanned
Unspanned: no record can span two blocks Spanned: a record can be stored in more than one block The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed. Files of variable-length records require additional information to be stored in each record, such as separator characters and field types. Usually spanned blocking is used with such files.

54 Storage of Databases Primary File Organization
Primary file organization: how file records are physically stored on the disk, e.g., heap file, sorted file, or hashed file. Secondary file organization: an auxiliary access structure that allows efficient access to file records based on alternate fields; these mostly exist as indexes.

55 Files of Unordered Records
Also called a heap or pile file. New records are inserted at the end of the file. A linear search through the file records is necessary to search for a record; this requires reading and searching half the file blocks on average, and is hence quite expensive. Record insertion is quite efficient. Reading the records in order of a particular field requires sorting the file records. What about deletion? How can we make it a little more efficient?

56 Files of Ordered Records
File records are kept sorted by the values of an ordering field. Insertion is expensive: records must be inserted in the correct order. It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file. A binary search can be used to search for a record on its ordering field value. Reading the records in order of the ordering field is quite efficient. Deletion is handled through deletion markers and periodic re-organization. Updating a field? Key vs. non-key attribute.
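A minimal sketch of binary search over the blocks of an ordered file; the "file" here is just an in-memory list of sorted blocks for illustration. Roughly log2(b) blocks are read instead of the b/2 expected for a heap file:

```python
def search_ordered_file(blocks, key):
    """Binary search over blocks of records sorted on an ordering field.

    `blocks` is a list of blocks; each block is a sorted list of (key, record).
    Returns the matching record or None.
    """
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]                  # one disk block read
        if key < block[0][0]:
            hi = mid - 1                     # key lies in an earlier block
        elif key > block[-1][0]:
            lo = mid + 1                     # key lies in a later block
        else:
            return next((rec for k, rec in block if k == key), None)
    return None

blocks = [[(1, "a"), (3, "b")], [(5, "c"), (8, "d")], [(9, "e"), (12, "f")]]
print(search_ordered_file(blocks, 8))        # -> 'd'
```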

57 Files of Ordered Records

58 Hashing Techniques

59 Introduction to Hashing
Each data item with hash key value K is stored in location i, where i = h(K) and h is the hashing function. Search is very efficient on the hash key. Collisions occur when a new record hashes to an address that is already full; an overflow file is kept for storing such records.

60 Static Hashing A bucket is a unit of storage containing one or more records (a bucket is typically a disk block). In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function. Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B. Hash function is used to locate records for access, insertion as well as deletion. Records with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record.

61 Example File organization with Hashing
Hash file organization of the instructor file, using dept_name as the key (see figure on the next slide). There are 10 buckets. The binary representation of the ith character is assumed to be the integer i. The hash function returns the sum of the binary representations of the characters modulo 10, e.g., h(Physics) = 3 and h(Elec. Eng.) = 3 (the values of h(Music) and h(History) are shown in the figure).

62 Example File organization with Hashing
Hash file organization of instructor file, using dept_name as key (see previous slide for details).

63 Mapping to Secondary Memory

64 Desirable properties of a Hash Function
The worst hash function maps all search-key values to the same bucket. An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values. An ideal hash function is also random, so each bucket will have roughly the same number of records assigned to it irrespective of the actual distribution of search-key values in the file. Typical hash functions perform computation on the internal binary representation of the search key.

65 Handling Collisions Hashing
Bucket overflow can occur because of: insufficient buckets, or skew in the distribution of records. Skew can occur for two reasons: multiple records have the same search-key value, or the chosen hash function produces a non-uniform distribution of key values.

66 Handling Collisions Hashing
There are numerous methods for collision resolution. Open addressing: proceeding from the occupied position, check the subsequent positions in order until an unused position is found. Chaining: various overflow locations are kept, usually by extending the array with a number of overflow positions. Which of these is suitable for databases?
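A small in-memory sketch of the two strategies: open addressing probes the following slots of a fixed table, while chaining appends overflow entries to the home bucket; chaining (overflow buckets) is the variant typically used for disk-based files.

```python
def insert_open_addressing(table, key, value):
    """Linear probing: from the home slot, take the next unused slot."""
    n = len(table)
    home = hash(key) % n
    for step in range(n):
        slot = (home + step) % n
        if table[slot] is None:
            table[slot] = (key, value)
            return slot
    raise RuntimeError("table full")

def insert_chaining(buckets, key, value):
    """Chaining: overflow entries are simply appended to the home bucket's chain."""
    buckets[hash(key) % len(buckets)].append((key, value))

table = [None] * 8
buckets = [[] for _ in range(8)]
insert_open_addressing(table, "Music", 1)
insert_chaining(buckets, "Music", 1)
```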

67 Handling Collisions in Hashing

68 Lets Evaluate Static Hashing
Think in the following terms: time required for search and insert; space utilization?

69 Lets Evaluate Static Hashing
Think in the following terms: time required for search and insert; space utilization. What if the database grows or shrinks with time?

70 Lets Evaluate Static Hashing
In static hashing, the function h maps search-key values to a fixed set B of bucket addresses. Databases grow or shrink with time. If the initial number of buckets is too small and the file grows, performance degrades due to too many overflows.

71 Lets Evaluate Static Hashing
In static hashing, the function h maps search-key values to a fixed set B of bucket addresses. Databases grow or shrink with time. If the initial number of buckets is too small and the file grows, performance degrades due to too many overflows. If space is allocated for anticipated growth, a significant amount of space is wasted initially (buckets will be under-full). If the database shrinks, space is again wasted.

72 Lets Evaluate Static Hashing
In static hashing, the function h maps search-key values to a fixed set B of bucket addresses. Databases grow or shrink with time. If the initial number of buckets is too small and the file grows, there are too many overflows. If space is allocated for anticipated growth, a significant amount of space is wasted initially. One solution: periodic re-organization with a new hash function. This is expensive and disrupts normal operations.

73 Hashing For Dynamic File Extension
Extendible hashing is one form of dynamic hashing. The hash function generates values over a large range, typically b-bit integers with b = 32. At any time, only a prefix of the hash value is used to index into a table of bucket addresses.

74 Hashing For Dynamic File Extension
Extendible hashing: let the length of the prefix be i bits, 0 ≤ i ≤ 32. Bucket address table size = 2^i. Initially i = 0. The value of i grows and shrinks as the size of the database grows and shrinks. Multiple entries in the bucket address table may point to the same bucket (why?).

75 Extendible Hashing
(Figure: a bucket address table with its global depth, and the local depth stored with each bucket.)

76 Extendible Hashing Local Depth: Each bucket j stores a value ij as its local depth All the entries that point to the same bucket have the same values on the first ij bits.

77 Extendible Hashing To locate the bucket containing search-key K:
1. Compute h(K) = X. 2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket.
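A tiny sketch of this lookup step, assuming a 32-bit hash value whose i high-order bits index the bucket address table:

```python
b_bits = 32  # total number of bits produced by the hash function (b = 32 on the slides)

def directory_index(hash_value, i):
    """Take the first i high-order bits of a b-bit hash value as the table index."""
    return hash_value >> (b_bits - i) if i > 0 else 0

x = 0b1011_0000_0000_0000_0000_0000_0000_0000  # an example 32-bit hash value
print(directory_index(x, 2))                   # -> 0b10 = 2: follow the pointer in entry 2
```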

78 Extendible Hashing To insert a record with search-key value Knew
Follow the same procedure as look-up to locate the bucket, say j. If there is room in bucket j, insert the record there. Else the bucket must be split and the insertion re-attempted.

79 Splitting a bucket in Extendible Hash
Case: global depth > local depth, i.e., i > i_j (more than one pointer to bucket j). Allocate a new bucket z, and set i_j = i_z = (old i_j + 1). Update the second half of the bucket address table entries originally pointing to j so that they point to z. Remove each record in bucket j and reinsert it (into j or z). Recompute the bucket for K_new and insert the record there. Depending on the implementation logic, further splitting may or may not be done if the new bucket is still overflowing.

80 Splitting a bucket in Extendible Hash
Case: global depth (i) = local depth (only one pointer to bucket j). If i has reached some limit b (implementation-dependent), or too many splits have happened during this insertion, create an overflow bucket. Else (Idea 1 for bucket address table expansion): increment i and double the size of the bucket address table; each entry in the old table gives two new entries. Go through each bucket: if it was overflowing (due to some past choices), try to resolve the overflow by re-hashing its keys with the expanded bucket address table; else point both new entries to the bucket their parent entry was pointing to. Adjust the local depths. Insert K_new into the file with the expanded table. Depending on the implementation logic, a further split may or may not happen, or the simpler case (global depth > local depth) described previously on slide 79 may apply.

81 Splitting a bucket in Extendible Hash
Case: global depth (i) = local depth (only one pointer to bucket j). If i has reached some limit b (implementation-dependent), or too many splits have happened during this insertion, create an overflow bucket. Else (Idea 2 for bucket address table expansion): increment i and double the size of the bucket address table; each entry in the old table gives two new entries. Go through each bucket and re-hash all the keys; if a newly created entry still points to NULL, make it point to the bucket of its "parent" entry. Adjust the local depths. Insert K_new into the file with the expanded table. Depending on the implementation logic, a further split may or may not happen, or the simpler case (global depth > local depth) described previously on slide 79 may apply.

82 Splitting a bucket in Extendible Hash
Case: global depth = local depth (only one pointer to bucket j). If i has reached some limit b (implementation-dependent), or too many splits have happened during this insertion, create an overflow bucket. Else (Idea 3 for bucket address table expansion): increment i and double the size of the bucket address table, replacing each entry in the table by two entries pointing to the same bucket. Adjust the local depths. Recompute the bucket address table entry for K_new. Now i > i_j (global depth > local depth), so apply the first insertion case described previously on slide 79.
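Not from the slides: a compact Python sketch of extendible hashing in the spirit of idea 3. When the overflowing bucket's local depth equals the global depth, the directory doubles with both copies of an entry pointing at the old bucket, after which the ordinary split of slide 79 applies. For simplicity it indexes the directory with the low-order bits of Python's built-in hash (the slides use the high-order prefix), uses a bucket size of 2, and omits overflow buckets:

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}                              # key -> record


class ExtendibleHash:
    def __init__(self, bucket_size=2):
        self.global_depth = 0
        self.bucket_size = bucket_size
        self.directory = [Bucket(0)]                 # bucket address table

    def _index(self, key):
        # Low-order global_depth bits of the hash value index the directory.
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key, record):
        bucket = self.directory[self._index(key)]
        bucket.items[key] = record
        while len(bucket.items) > self.bucket_size:
            if bucket.local_depth == self.global_depth:
                # Idea 3: double the directory; both copies point at the old buckets.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)                      # now global depth > local depth
            bucket = self.directory[self._index(key)]

    def _split(self, bucket):
        # Slide-79 case: allocate a new bucket and redistribute the records.
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        distinguishing_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & distinguishing_bit):
                self.directory[i] = new_bucket       # half of the entries now point to the new bucket
        for key in list(bucket.items):
            if hash(key) & distinguishing_bit:
                new_bucket.items[key] = bucket.items.pop(key)


h = ExtendibleHash()
for dept, rec in [("Comp", 10101), ("Music", 15151), ("Finance", 12121),
                  ("Physics", 22222), ("Aero", 15100)]:
    h.insert(dept, rec)
print(h.global_depth, h.search("Physics"))           # final depth depends on the hash values
```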

83 Illustrating an Extendible Hash: Dataset

84 Illustrating an Extendible Hash

85 Illustrating an Extendible Hash
Initial Hash structure; bucket size = 2

86 Illustrating an Extendible Hash
Initial hash structure; bucket size = 2. (Figure: bucket address table with global depth 0 and a single empty bucket of local depth 0.) Insert the first record.

87 Illustrating an Extendible Hash
The record (10101, Srinivasan, Comp., 65000) is placed in the only bucket.

88 Illustrating an Extendible Hash
Insert the next record.

89 Illustrating an Extendible Hash
The record (15151, Mozart, Music, 4000) is placed in the same bucket, which is now full (bucket size = 2).

90 Illustrating an Extendible Hash
Insert the next record.

91 Illustrating an Extendible Hash
The new record does not fit, and local depth == global depth, so the bucket address table needs to expand.

92 Illustrating an Extendible Hash
Step 1: increase the directory size. The global depth becomes 1, and the directory now has two entries, with hash prefixes 0 and 1.

93 Illustrating an Extendible Hash
Step 2: re-hash all the old records plus the new record (using idea 2).

94 Illustrating an Extendible Hash
(15151, Mozart, Music, 4000) is re-hashed into one of the two buckets.

95 Illustrating an Extendible Hash
(Figure unchanged from the previous slide.)

96 Illustrating an Extendible Hash
(10101, Srinivasan, Comp., 65000) is re-hashed into the other bucket.

97 Illustrating an Extendible Hash
(Figure unchanged from the previous slide.)

98 Illustrating an Extendible Hash
The new record (12121, Wu, Finance, 90000) lands in the same bucket as Srinivasan.

99 Illustrating an Extendible Hash
Resulting state: global depth 1; the bucket {Mozart} and the bucket {Srinivasan, Wu} each have local depth 1.

100 Illustrating an Extendible Hash
Where will this Physics record go? Insert the record (22222, Einstein, Physics, 95000) into the structure above (global depth 1, buckets {Mozart} and {Srinivasan, Wu}).

101 Illustrating an Extendible Hash
The Physics record hashes to the full {Srinivasan, Wu} bucket, whose local depth equals the global depth, so one more directory split (doubling) is needed.

102 Illustrating an Extendible Hash
The global depth becomes 2; the directory now has four entries, with hash prefixes 00, 01, 10, and 11.

103 Illustrating an Extendible Hash
Re-insert the old records plus the new record (using idea 2).

104 Illustrating an Extendible Hash
After re-insertion: {Mozart} in one bucket, {Wu, Einstein} in another, and {Srinivasan} in a third.

105 Illustrating an Extendible Hash
(Figure unchanged from the previous slide.)

106 Illustrating an Extendible Hash
What will be the local depth of these buckets?

107 Illustrating an Extendible Hash
Global depth 2. The {Mozart} bucket has local depth 1 (two directory entries point to it); the {Wu, Einstein} bucket and the {Srinivasan} bucket each have local depth 2.

108 Illustrating an Extendible Hash
Insert one record, Raj, with the Aero dept. Assume H(Aero) = 010……

109 Illustrating an Extendible Hash
(15100, Raj, Aero, 3400) goes into the bucket with Mozart, the local-depth-1 bucket covering hash prefix 0.

110 Illustrating an Extendible Hash
Insert one record, Ramesh, with the Mech dept. Assume H(Mech) = 011……

111 Illustrating an Extendible Hash
The prefix-0 bucket overflows; since its local depth (1) is less than the global depth (2), it splits without growing the directory: {Mozart} stays under prefix 00 with local depth 2, while (16251, Ramesh, Mech, 4500) and (15100, Raj, Aero, 3400) sit under prefix 01 with local depth 2. The {Wu, Einstein} and {Srinivasan} buckets are unchanged.

112 Illustrating an Extendible Hash
Insert one record, Peter, with the Civil dept. Assume H(Civil) = 100……

113 Illustrating an Extendible Hash
The directory must expand, because the new record would go into the (already full) bucket holding the Finance and Physics records, whose local depth equals the global depth.

114 For this insertion, we will illustrate idea 1, where the keys of an old bucket are re-hashed only when that bucket is overflowing.
The global depth becomes 3, and the directory has eight entries with hash prefixes 000 to 111. After the overflowing bucket splits, (12000, Peter, Civil, 20000) and (22222, Einstein, Physics, 95000) share one bucket, while (12121, Wu, Finance, 90000) is alone in another. The buckets {Mozart}, {Ramesh, Raj}, and {Srinivasan} are untouched.

115 Idea 1: Notice that Ramesh and Raj stayed in the same bucket despite having hash prefixes 010 and 011. We didn't create a new bucket because the old one was not overflowing.

116 Illustrating an Extendible Hash
Final state: global depth 3. {Mozart} has local depth 2 (entries 000 and 001 point to it); {Ramesh, Raj} has local depth 2 (entries 010 and 011); {Peter, Einstein} has local depth 3 (entry 100); {Wu} has local depth 3 (entry 101); {Srinivasan} has local depth 2 (entries 110 and 111).

117 Illustrating an Extendible Hash
Assume two more records come to this bucket

118 Comments on Extendible Hash
Benefits of extendible hashing: hash performance does not degrade with growth of the file; minimal space overhead. Disadvantages of extendible hashing: an extra level of indirection to find the desired record; the bucket address table may itself become very big, and very large contiguous areas cannot be allocated on disk either; changing the size of the directory (a.k.a. the bucket address table) is expensive.

119 Comments on Extendible Hash
Expected type of queries: hashing is generally better at retrieving records having a specified value of the key. If range queries are common, ordered indices are to be preferred.

120 Linear Hashing
Allows the hash file to expand and shrink dynamically without needing a directory. Uses a family of hash functions: h_j(K) = K mod (2^j * M), where j = 0, 1, 2, … and M is the initial number of buckets. The file grows linearly → no bucket directory needed.
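A minimal Python sketch of linear hashing under the family h_j(K) = K mod (2^j * M): buckets split one at a time in linear order, tracked by a split pointer n, so no directory is needed. The split-on-any-overflow policy, M = 4, and integer keys are illustrative simplifications:

```python
class LinearHashFile:
    def __init__(self, M=4, bucket_size=2):
        self.M, self.j, self.n = M, 0, 0             # initial buckets, level, split pointer
        self.bucket_size = bucket_size
        self.buckets = [[] for _ in range(M)]        # integer keys only, for simplicity

    def _addr(self, key):
        a = key % (2 ** self.j * self.M)             # h_j(K)
        if a < self.n:                               # bucket a was already split this round,
            a = key % (2 ** (self.j + 1) * self.M)   # so use h_{j+1}(K) instead
        return a

    def insert(self, key):
        self.buckets[self._addr(key)].append(key)
        if len(self.buckets[self._addr(key)]) > self.bucket_size:
            self._split_next()                       # grow the file by exactly one bucket

    def _split_next(self):
        # Split bucket n with h_{j+1}; its keys go to bucket n or bucket n + 2^j * M.
        self.buckets.append([])
        old, self.buckets[self.n] = self.buckets[self.n], []
        for key in old:
            self.buckets[key % (2 ** (self.j + 1) * self.M)].append(key)
        self.n += 1
        if self.n == 2 ** self.j * self.M:           # round finished: move to the next level
            self.j, self.n = self.j + 1, 0

    def search(self, key):
        return key in self.buckets[self._addr(key)]


f = LinearHashFile()
for k in [3, 7, 11, 15, 19, 23, 4, 8]:
    f.insert(k)
print(len(f.buckets), f.search(19))                  # 8 buckets after the splits; True
```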

