Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.

Presentation transcript:

Quick Review of Apr 10 material
B+-Tree File Organization
– similar to B+-tree index
– leaf nodes store records, not pointers to records stored in an original file
– leaf and interior nodes are different
B-trees
– search key values appear only once
– pointer to record/bucket for that search key value is always stored with the search key itself, even in interior nodes
Hashing Overview
– hash functions are ideally uniform, random, and easy to compute

Today
Overflow
Hash file performance
Hash indices
Dynamic Hashing (Extendable Hashing)
Note: HW #3 due next class (April 17)
HW #4: due Thursday April 24 (9 days from now)
– Questions: 12.11, 12.12, 12.13, 12.16

Overflow
Overflow is when an insertion into a bucket can’t occur because the bucket is full.
Overflow can occur for the following reasons:
– too many records (not enough buckets)
– poor hash function
– skewed data:
  – multiple records might have the same search key
  – multiple search keys might be assigned the same bucket

Overflow (2)
Overflow is handled by one of two methods
– chaining of multiple blocks in a bucket, by attaching a number of overflow buckets together in a linked list
– double hashing: use a second hash function to find another (hopefully non-full) bucket
In theory we could also use the next bucket that has space; this is often called open hashing or linear probing (many other texts call it open addressing).
– often used to construct symbol tables for compilers
– useful where deletion does not occur
– deletion is very awkward with linear probing, so it isn’t useful in most database applications
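The chaining method above can be sketched in a few lines. This is only an illustration: the class names, the integer keys, and the `key % num_buckets` hash function are assumptions made for the example, not part of the slides. `search` also reports how many blocks (buckets in the chain) it had to read.

```python
class ChainedBucket:
    """One block of a bucket; overflow blocks are linked through `overflow`."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # (key, value) pairs stored in this block
        self.overflow = None   # next overflow block in the chain, if any


class StaticHashFile:
    """Hypothetical static hash file with overflow chaining."""
    def __init__(self, num_buckets, capacity):
        self.num_buckets = num_buckets
        self.capacity = capacity
        self.buckets = [ChainedBucket(capacity) for _ in range(num_buckets)]

    def insert(self, key, value):
        # Assumed hash function: integer key modulo the number of buckets.
        bucket = self.buckets[key % self.num_buckets]
        # Walk the chain to the first block with room, extending it if needed.
        while len(bucket.records) >= self.capacity:
            if bucket.overflow is None:
                bucket.overflow = ChainedBucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append((key, value))

    def search(self, key):
        """Return (value or None, number of blocks read)."""
        bucket = self.buckets[key % self.num_buckets]
        blocks_read = 0
        while bucket is not None:
            blocks_read += 1
            for k, v in bucket.records:
                if k == key:
                    return v, blocks_read
            bucket = bucket.overflow
        return None, blocks_read
```

With 3 buckets of capacity 2, inserting keys 0, 3, 6, 9, 12 sends every record to bucket 0 and builds a chain of three blocks; a search for a missing key that hashes to bucket 0 then has to read all three.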

Hashed File Performance Metrics
An important performance measure is the loading factor:
loading factor = (number of records)/(B*f)
– B is the number of buckets
– f is the number of records that will fit in a single bucket
When the loading factor gets too high (the file becomes too full), double the number of buckets and rehash.
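As a small worked sketch of the formula above: the function names and the 0.9 threshold below are assumptions chosen for illustration; the slide only says to double the number of buckets when the loading factor is "too high".

```python
def loading_factor(num_records, num_buckets, records_per_bucket):
    """Loading factor = (number of records) / (B * f)."""
    return num_records / (num_buckets * records_per_bucket)


def buckets_after_growth(num_records, num_buckets, records_per_bucket,
                         threshold=0.9):
    """Double B until the loading factor is no longer above `threshold`.

    The threshold value is an assumed tuning parameter, not from the slides.
    """
    while loading_factor(num_records, num_buckets, records_per_bucket) > threshold:
        num_buckets *= 2
    return num_buckets
```

For example, 95 records in 10 buckets of 10 records each give a loading factor of 0.95; doubling to 20 buckets brings it down to 0.475.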

Hashed File Performance
(Assume that the hash table is in main memory)
Successful search: best case 1 block; worst case every chained bucket; average case half of worst case
Unsuccessful search: always hits every chained bucket (best case, worst case, average case)
With loading factor around 90% and a good hashing function, average is about 1.2 blocks
Advantage of hashing: very fast for exact queries
Disadvantage: records are not sorted in any order. As a result, it is effectively impossible to do range queries

Hash Indices
Hashing can be used for index-structure creation as well as for file organization
A hash index organizes the search keys (and their record pointers) into a hash file structure
Strictly speaking, a hash index is always a secondary index
– if the primary file was stored using the same hash function, an additional, separate primary hash index would be unnecessary
– we use the term hash index to refer both to secondary hash indices and to files organized using hashing file structures

Example of a Hash Index
Hash index into file account, on search key account-number
Hash function computes sum of digits in account number modulo 7
Bucket size is 2
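The hash function in this example is easy to write down. The sketch below assumes account numbers are strings such as "A-217" (a hypothetical format; the slide's figure is not reproduced in the transcript) and uses the list position of each record as a stand-in for a record pointer.

```python
NUM_BUCKETS = 7   # hash value is taken modulo 7, as on the slide
BUCKET_SIZE = 2   # each bucket holds two index entries, as on the slide


def digit_sum_hash(account_number):
    """Sum of the decimal digits in the account number, modulo 7."""
    return sum(int(ch) for ch in account_number if ch in "0123456789") % NUM_BUCKETS


def build_hash_index(account_numbers):
    """Map each bucket number to a list of (search key, record pointer) entries.

    The record pointer here is just the record's position in the input list.
    """
    index = {b: [] for b in range(NUM_BUCKETS)}
    for record_id, acct in enumerate(account_numbers):
        index[digit_sum_hash(acct)].append((acct, record_id))
    return index


def overflowing_buckets(index):
    """Buckets holding more entries than BUCKET_SIZE (these would need overflow handling)."""
    return [b for b, entries in index.items() if len(entries) > BUCKET_SIZE]
```

For instance, "A-217" has digit sum 10, so it lands in bucket 10 mod 7 = 3.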

Static Hashing
We’ve been discussing static hashing: the hash function maps search-key values to a fixed set of buckets. This has some disadvantages:
– databases grow with time. Once buckets start to overflow, performance will degrade
– if we attempt to anticipate some future file size and allocate sufficient buckets for that expected size when we build the database initially, we will waste lots of space
– if the database ever shrinks, space will be wasted
– periodic reorganization avoids these problems, but is very expensive
By using techniques that allow us to modify the number of buckets dynamically (“dynamic hashing”) we can avoid these problems
– good for databases that grow and shrink in size
– allows the hash function to be modified dynamically

Dynamic Hashing
One form of dynamic hashing is extendable hashing
– hash function generates values over a large range, typically b-bit integers, with b being something like 32
– at any given moment, only a prefix of the hash value is used to index into a table of bucket addresses
– with the prefix length at a given moment being j (written i on the later slides), with 0 <= j <= 32, the bucket address table size is 2^j
– value of j grows and shrinks as the size of the database grows and shrinks
– multiple entries in the bucket address table may point to the same bucket
– thus the actual number of buckets is <= 2^j
– the number of buckets also changes dynamically due to coalescing and splitting of buckets
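The prefix idea can be made concrete with a short sketch. It assumes b = 32 and uses `zlib.crc32` purely as a convenient 32-bit hash for illustration; the slides do not say which hash function is used.

```python
import zlib

HASH_BITS = 32  # b, the width of the hash value


def hash32(key):
    """Assumed 32-bit hash of a string key (CRC-32, chosen only for illustration)."""
    return zlib.crc32(key.encode("utf-8"))


def prefix(hash_value, j):
    """The first j high-order bits of a 32-bit hash value (0 when j == 0)."""
    return hash_value >> (HASH_BITS - j)


def table_size(j):
    """Number of entries in the bucket address table for prefix length j."""
    return 2 ** j
```

A key's slot in the bucket address table is then `prefix(hash32(key), j)`, which always falls in the range 0 to `table_size(j) - 1`.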

General Extendable Hash Structure

Use of Extendable Hash Structure
Each bucket j stores a value i_j; all the entries that point to the same bucket have the same values on the first i_j bits
To locate the bucket containing search key K_j:
– compute H(K_j) = X
– use the first i high-order bits of X as a displacement into the bucket address table and follow the pointer to the appropriate bucket
To insert a record with search-key value K_j:
– follow the lookup procedure to locate the bucket, say j
– if there is room in bucket j, insert the record
– otherwise the bucket must be split and the insertion reattempted
  – in some cases we use overflow buckets instead (as explained shortly)

Splitting in Extendable Hash Structure
To split a bucket j when inserting a record with search-key value K_j
if i > i_j (more than one pointer into bucket j)
– allocate a new bucket z
– set i_j and i_z to the old value of i_j incremented by one
– update the bucket address table (change the second half of the set of entries pointing to j so that they now point to z)
– remove all the entries in j and rehash them so that they fall in either z or j
– reattempt the insert (K_j). If the bucket is still full, repeat the above.

Splitting in Extendable Hash Structure (2)
To split a bucket j when inserting a record with search-key value K_j
if i = i_j (only one pointer into bucket j)
– increment i and double the size of the bucket address table
– replace each entry in the bucket address table with two entries that point to the same bucket
– recompute the new bucket address table entry for K_j
– now i > i_j, so use the first case described earlier
When inserting a value, if the bucket is still full after several splits (that is, i reaches some preset value b), give up and create an overflow bucket rather than splitting the bucket address table further
– how might this occur?
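The lookup, insert, and two split cases can be put together in one compact sketch. Everything named here (`ExtendableHash`, `Bucket`, `max_depth`, the CRC-32 default hash) is an assumption made for illustration. For brevity, the overflow case simply lets a bucket grow past its capacity once its local depth i_j reaches `max_depth`, instead of allocating a separate overflow bucket.

```python
import zlib

HASH_BITS = 32  # b, the width of the hash value


def hash32(key):
    # Assumed 32-bit hash, used only for illustration.
    return zlib.crc32(str(key).encode("utf-8"))


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # i_j: number of prefix bits shared by all items
        self.items = []                 # (key, value) pairs


class ExtendableHash:
    def __init__(self, capacity=2, max_depth=16, hash_fn=hash32):
        self.capacity = capacity
        self.max_depth = max_depth      # preset limit on i; must be <= HASH_BITS
        self.hash_fn = hash_fn
        self.global_depth = 0           # i: prefix bits used by the address table
        self.table = [Bucket(0)]        # bucket address table with 2**i entries

    def _index(self, key):
        # First i high-order bits of H(key); shifting by 32 yields 0 when i == 0.
        return self.hash_fn(key) >> (HASH_BITS - self.global_depth)

    def lookup(self, key):
        bucket = self.table[self._index(key)]
        return [v for k, v in bucket.items if k == key]

    def insert(self, key, value):
        while True:
            bucket = self.table[self._index(key)]
            has_room = len(bucket.items) < self.capacity
            # Once i_j hits the limit we stop splitting and let the bucket overflow.
            if has_room or bucket.local_depth >= self.max_depth:
                bucket.items.append((key, value))
                return
            self._split(bucket)  # then reattempt the insert

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Case i == i_j: double the table; each entry becomes two adjacent entries.
            self.table = [b for b in self.table for _ in (0, 1)]
            self.global_depth += 1
        # Case i > i_j: allocate bucket z and give it the second half of j's entries.
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        entries = [n for n, b in enumerate(self.table) if b is bucket]
        for n in entries[len(entries) // 2:]:
            self.table[n] = new_bucket
        # Rehash the old contents; each item lands in either j or z.
        old_items, bucket.items = bucket.items, []
        for k, v in old_items:
            self.table[self._index(k)].items.append((k, v))
```

One answer to the slide's closing question shows up directly in this sketch: if more than `capacity` keys share the same hash value (or the same long hash prefix), no amount of splitting separates them, so the bucket stays full until the depth limit is reached.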

Deletion in Extendable Hash Structure
To delete a key value K_j
– locate it in its bucket and remove it
– the bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table)
– coalescing of buckets is possible
  – can only coalesce with a “buddy” bucket having the same value of i_j and the same (i_j - 1)-bit prefix, if one such bucket exists
– decreasing the bucket address table size is also possible
  – very expensive
  – should only be done if the number of buckets becomes much smaller than the size of the table
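The buddy condition can be sketched as a pair of helpers. Here a bucket is identified by its i_j-bit prefix (an integer) and its local depth i_j; the extra requirement that the combined records fit in one bucket is an assumption added for the example, since the slide does not state when coalescing is worthwhile.

```python
def buddy_prefix(prefix, local_depth):
    """Prefix of the buddy bucket: same first (i_j - 1) bits, last bit flipped.

    Returns None when local_depth is 0, since a single bucket has no buddy.
    """
    if local_depth == 0:
        return None
    return prefix ^ 1


def can_coalesce(prefix_a, depth_a, count_a, prefix_b, depth_b, count_b, capacity):
    """True if buckets a and b are buddies and their records fit in one bucket.

    The capacity check is an assumed policy, not taken from the slides.
    """
    return (depth_a == depth_b
            and depth_a > 0
            and buddy_prefix(prefix_a, depth_a) == prefix_b
            and count_a + count_b <= capacity)
```

After coalescing, the merged bucket would carry local depth i_j - 1 and all table entries for both old prefixes would point to it.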

Extendable Hash Structure Example Hash function on branch name Initial hash table (empty)

Extendable Hash Structure Example (2) Hash structure after insertion of one Brighton and two Downtown records

Extendable Hash Structure Example (3) Hash structure after insertion of Mianus record

Extendable Hash Structure Example (4) Hash structure after insertion of three Perryridge records

Extendable Hash Structure Example (5) Hash structure after insertion of Redwood and Round Hill records

Extendable Hashing vs. Other Hashing
Benefits of extendable hashing:
– hash performance doesn’t degrade with growth of file
– minimal space overhead
Disadvantages of extendable hashing:
– extra level of indirection (bucket address table) to find desired record
– bucket address table may itself become very big (larger than memory)
  – need a tree structure to locate desired record in the structure!
– changing size of bucket address table is an expensive operation
Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows

Comparison: Ordered Indexing vs. Hashing
Each scheme has advantages for some operations and situations. To choose wisely between different schemes we need to consider:
– cost of periodic reorganization
– relative frequency of insertions and deletions
– is it desirable to optimize average access time at the expense of worst-case access time?
– what types of queries do we expect?
  – hashing is generally better at retrieving records for a specific key value
  – ordered indices are better for range queries