2018, Spring Pusan National University Ki-Joune Li


File Processing : Hash

Index vs. Hash
Index
- Needs a data structure, such as a B+-tree, stored on disk
- Can be a primary or a secondary index
- The block number of a record is already known before its key is inserted into the index
Hash
- Needs a hash function h(v) = b (h: hash function, v: key value, b: block number)
- The block number b computed by the hash function determines both where to store a record and where to retrieve it
- Primary index only
(Figure: key value v → h → block number b → record)

Primary Index vs. Secondary Index
Secondary index (B+-tree)
- Step 1: Store a record in some block b (e.g. in sequential order)
- Step 2: Insert the key value v, together with b, into the B+-tree
- The block number is determined independently of the B+-tree
Primary index (hash)
- Step 1: Compute the hash function h(v) = b
- Step 2: Store the record in block b
- The block number is determined by the hash function
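The two insertion procedures can be contrasted with a minimal in-memory sketch. The file is modeled as a list of blocks, and a plain Python `dict` stands in for the B+-tree (a real B+-tree would also keep keys ordered and live on disk). `BLOCK_CAPACITY`, `NUM_BLOCKS`, `toy_hash`, `insert_with_index` and `insert_with_hash` are hypothetical names chosen for illustration.

```python
BLOCK_CAPACITY = 2  # hypothetical number of records per block
NUM_BLOCKS = 4      # hypothetical, fixed number of blocks for the hashed file


def toy_hash(key: str) -> int:
    """Hypothetical hash function: sum of character codes modulo NUM_BLOCKS."""
    return sum(ord(c) for c in key) % NUM_BLOCKS


def insert_with_index(blocks: list, index: dict, key: str, record: str) -> int:
    """Secondary index: store the record first, then insert (key, block) into the index."""
    # Step 1: place the record in sequential order (last block, or a new one if full).
    if not blocks or len(blocks[-1]) >= BLOCK_CAPACITY:
        blocks.append([])
    b = len(blocks) - 1
    blocks[b].append((key, record))
    # Step 2: insert the key value v together with b (a dict stands in for the B+-tree).
    index[key] = b
    return b


def insert_with_hash(blocks: list, key: str, record: str) -> int:
    """Hashed file: the block number comes from h(v); no separate index is kept."""
    b = toy_hash(key)                 # Step 1: compute h(v) = b
    blocks[b].append((key, record))   # Step 2: store the record in block b (overflow ignored)
    return b


seq_blocks, index = [], {}
hashed_blocks = [[] for _ in range(NUM_BLOCKS)]
for name in ["Romeo", "Juliet", "Hamlet"]:
    insert_with_index(seq_blocks, index, name, f"record of {name}")
    insert_with_hash(hashed_blocks, name, f"record of {name}")
```

With the index, finding a record means consulting `index` first; with hashing, `toy_hash(key)` alone names the block to read.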

Hash
- Different keys may map to the same block number, so one block may contain more than one record
- The same hash function is used for insertion, search, and deletion
- Two kinds: static hash and dynamic hash

Static Hash
- The number of available blocks is fixed
- h(v) specifies the block where the record is to be stored
(Figure: keys "Romeo", "Juliet" and "Hamlet" with h(v) = 35, 13 and 22 are mapped to buckets 2, 0 and 9 of the m buckets; the bucket number is added to the first block number, 120, to select one of the blocks b120 to b135.)
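A minimal sketch of a static hash file, assuming the file occupies a fixed range of blocks starting at block 120 (as in the figure) and that the block address is 120 + (h(v) mod M). The value `M = 16` (blocks b120 to b135), the character-sum hash function `h` and the class name `StaticHashFile` are assumptions for illustration; each block is a Python list, so overflow is not handled. The same address computation serves insertion, search and deletion.

```python
BASE_BLOCK = 120  # first block of the file, as in the figure
M = 16            # assumed number of blocks (b120 .. b135)


def h(key: str) -> int:
    """Hypothetical hash function: sum of character codes."""
    return sum(ord(c) for c in key)


class StaticHashFile:
    def __init__(self) -> None:
        # Block address -> list of (key, record); the set of blocks never changes.
        self.blocks = {BASE_BLOCK + i: [] for i in range(M)}

    def block_of(self, key: str) -> int:
        """Map a key to its block: base address plus h(v) mod M."""
        return BASE_BLOCK + h(key) % M

    def insert(self, key: str, record: str) -> None:
        self.blocks[self.block_of(key)].append((key, record))

    def search(self, key: str):
        # Only the one block named by the hash function is examined.
        for k, rec in self.blocks[self.block_of(key)]:
            if k == key:
                return rec
        return None

    def delete(self, key: str) -> bool:
        block = self.blocks[self.block_of(key)]
        for i, (k, _) in enumerate(block):
            if k == key:
                del block[i]
                return True
        return False


f = StaticHashFile()
for name in ["Romeo", "Juliet", "Hamlet"]:
    f.insert(name, f"record of {name}")
print(f.block_of("Romeo"), f.search("Juliet"))  # → 122 record of Juliet
```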

Handling of Block Overflow
Block overflow can occur because of:
- Insufficient buckets
- Skew in the distribution of records, which has two causes:
  - multiple records have the same search-key value
  - the hash function produces a non-uniform distribution
Overflow cannot be eliminated, although its probability can be reduced; overflow buckets are therefore needed.
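The first cause of skew can be seen by counting records per bucket. In this sketch the bucket count, bucket capacity, hash function and key lists are made-up values for illustration: distinct keys fit, but records sharing one search-key value all land in the same bucket, so that bucket overflows no matter how good the hash function is.

```python
from collections import Counter

NUM_BUCKETS = 4       # hypothetical number of buckets
BUCKET_CAPACITY = 2   # hypothetical records per bucket


def bucket_of(key: str) -> int:
    # Hypothetical hash function: sum of character codes modulo NUM_BUCKETS.
    return sum(ord(c) for c in key) % NUM_BUCKETS


def overflowing_buckets(keys: list) -> dict:
    """Return {bucket: records beyond capacity} for every bucket that overflows."""
    counts = Counter(bucket_of(k) for k in keys)
    return {b: n - BUCKET_CAPACITY for b, n in counts.items() if n > BUCKET_CAPACITY}


distinct = ["Romeo", "Juliet", "Hamlet", "Othello"]
duplicates = ["Hamlet"] * 5   # many records with the same search-key value

print(overflowing_buckets(distinct))    # → {}
print(overflowing_buckets(duplicates))  # → {3: 3}
```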

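Overflow Handling
- Overflow chaining: the overflow blocks of a bucket form a linked list; each block points to its next block
- Closed hashing: the next block to try is B + h(v) + n, for n = 1, 2, ..., where B is the first block of the file
(Figure: buckets 0, 1, 2, ..., with an overflow bucket chained to a full bucket.)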
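Both overflow strategies can be sketched in memory as below. Assumptions: integer keys with h(v) = v mod m, a hypothetical capacity of 2 records per block, and probe addresses written relative to the first block B and wrapped around modulo m, so the n-th probe is (h(v) + n) mod m. Class and method names are hypothetical, and deletion is omitted.

```python
CAPACITY = 2  # hypothetical number of records per block


class Block:
    def __init__(self) -> None:
        self.records = []
        self.next = None  # link to the next overflow block (chaining only)


class ChainedHashFile:
    """Overflow chaining: a full bucket gets a linked list of overflow blocks."""

    def __init__(self, m: int) -> None:
        self.buckets = [Block() for _ in range(m)]

    def insert(self, key: int, record: str) -> None:
        block = self.buckets[key % len(self.buckets)]
        while len(block.records) >= CAPACITY:
            if block.next is None:
                block.next = Block()  # allocate an overflow block
            block = block.next
        block.records.append((key, record))

    def chain_length(self, bucket: int) -> int:
        n, block = 0, self.buckets[bucket]
        while block is not None:
            n, block = n + 1, block.next
        return n


class ProbingHashFile:
    """Closed hashing (linear probing): try blocks h(v), h(v)+1, h(v)+2, ... modulo m."""

    def __init__(self, m: int) -> None:
        self.blocks = [[] for _ in range(m)]

    def insert(self, key: int, record: str) -> int:
        m = len(self.blocks)
        for n in range(m):
            b = (key % m + n) % m
            if len(self.blocks[b]) < CAPACITY:
                self.blocks[b].append((key, record))
                return b
        raise OverflowError("all blocks are full")

    def search(self, key: int):
        m = len(self.blocks)
        for n in range(m):
            b = (key % m + n) % m
            for k, rec in self.blocks[b]:
                if k == key:
                    return rec
            if len(self.blocks[b]) < CAPACITY:
                return None  # without deletions, a non-full block ends the probe sequence
        return None


chained = ChainedHashFile(m=4)
probing = ProbingHashFile(m=4)
for key in [1, 5, 9, 13]:  # all hash to bucket 1 when m = 4
    chained.insert(key, f"r{key}")
    probing.insert(key, f"r{key}")
print(chained.chain_length(1))            # → 2 (bucket 1 plus one overflow block)
print([len(b) for b in probing.blocks])   # → [0, 2, 2, 0]
```

Chaining keeps every record reachable from its home bucket; probing needs no pointers but lets one bucket's overflow occupy a neighbour's block.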

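Hash Function
- Worst case: the hash function maps all search-key values to the same bucket. Search time then becomes linear in the number of records, and hashing is pointless.
- Two conditions for a good hash function:
  - Uniformity: each bucket is assigned the same number of search-key values from the set of all possible values
  - Randomness: on average, each bucket receives the same number of records, regardless of the actual distribution of key values
- Typical hash functions operate on the internal binary representation of the search key. For example, for a string search key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets returned.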
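The string hash function described above fits in a few lines. This sketch assumes the UTF-8 bytes of the string serve as the "internal binary representation" of its characters, and contrasts it with the worst-case function that sends every key to bucket 0. Function names, keys and bucket count are illustrative.

```python
from collections import Counter


def sum_hash(key: str, num_buckets: int) -> int:
    """Add the byte values of the characters and return the sum modulo num_buckets."""
    return sum(key.encode("utf-8")) % num_buckets


def worst_hash(key: str, num_buckets: int) -> int:
    """Worst case: every search-key value maps to the same bucket."""
    return 0


def bucket_sizes(hash_fn, keys, num_buckets: int) -> list:
    counts = Counter(hash_fn(k, num_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


keys = ["Romeo", "Juliet", "Hamlet", "Othello"]
print(bucket_sizes(sum_hash, keys, 4))    # → [0, 1, 1, 2]
print(bucket_sizes(worst_hash, keys, 4))  # → [4, 0, 0, 0]: a search in bucket 0 is a linear scan
```

Summing characters ignores their order, so anagrams always collide; practical hash functions also mix in character positions.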

Discussion on Static Hash
- The number of buckets remains unchanged
- Advantages: simple, and an optimal hash function can be chosen for a static environment
- When the number of records is fixed: no problem, since a fixed number of blocks can be prepared
- When the number of records varies (the DB grows), it may exceed Nb × Bf (number of blocks × blocking factor, i.e. records per block). Then either
  - the file must be extended with more blocks, which requires an extendible (or dynamic) hashing mechanism, or
  - the file must be reorganized periodically
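The capacity limit Nb × Bf and a periodic reorganization can be sketched as follows. Reorganization is assumed here to mean allocating twice as many blocks and rehashing every record into the new file; the doubling policy, the identity hash `h` and all function names are hypothetical. Blocks are Python lists, so a single block exceeding Bf before the file is full is simply absorbed rather than handled as overflow.

```python
def capacity(num_blocks: int, blocking_factor: int) -> int:
    """Maximum number of records a static hash file can hold: Nb * Bf."""
    return num_blocks * blocking_factor


def h(key: int) -> int:
    return key  # hypothetical hash function on integer keys


def reorganize(blocks: list) -> list:
    """Rebuild the file with twice as many blocks, rehashing every record."""
    new_blocks = [[] for _ in range(2 * len(blocks))]
    for block in blocks:
        for key in block:
            new_blocks[h(key) % len(new_blocks)].append(key)
    return new_blocks


def insert(blocks: list, key: int, bf: int) -> list:
    """Insert a key; reorganize first if the file is already at capacity Nb * Bf."""
    if sum(len(b) for b in blocks) >= capacity(len(blocks), bf):
        blocks = reorganize(blocks)
    blocks[h(key) % len(blocks)].append(key)
    return blocks


blocks = [[] for _ in range(2)]  # Nb = 2
for k in range(5):               # Bf = 2, so the 5th record triggers reorganization
    blocks = insert(blocks, k, bf=2)
print(len(blocks), blocks)       # → 4 [[0, 4], [1], [2], [3]]
```

Every record is rewritten during reorganization, which is the cost that dynamic hashing is meant to avoid.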

Dynamic Hash
(Figure: a 32-bit hash value b31 b30 b29 … b2 b1 b0, of which only the first i bits are used.)
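Taking the first i bits of a 32-bit hash value b31 … b0 amounts to a right shift by 32 − i. A small sketch; the 32-bit width follows the slides, while the function name `prefix` and the example value are hypothetical.

```python
HASH_BITS = 32  # b = 32, as in the slides


def prefix(hash_value: int, i: int) -> int:
    """Return the first (most significant) i bits of a 32-bit hash value."""
    if not 0 <= i <= HASH_BITS:
        raise ValueError("i must be between 0 and 32")
    hash_value &= (1 << HASH_BITS) - 1    # keep exactly 32 bits
    return hash_value >> (HASH_BITS - i)  # i = 0 gives 0: a single table entry


v = 0b1011 << 28  # bits b31..b28 = 1011, all other bits 0
print([prefix(v, i) for i in range(5)])  # → [0, 1, 2, 5, 11]
```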

Dynamic Hash: Example
(Figure: bucket address table indexed by the first i bits of the hash value.)

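Dynamic Hash: Example (3 Records)
(Figure: an insertion overflows a bucket, so i is increased by 1 and the bucket is split; a later overflow increases i by 1 again.)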

Dynamic Hash: Example (4 Records)
(Figure: the overflowing bucket is split.)

Dynamic Hash
- Good for a database that grows and shrinks in size
- Allows the hash function to be modified dynamically
- Extendable hashing: one form of dynamic hashing
  - The hash function generates values over a large range, typically b-bit integers with b = 32
  - At any time, only a prefix of the hash value is used; let the prefix length be i bits, 0 ≤ i ≤ 32
  - Bucket address table size = 2^i; initially i = 0
  - The value of i grows and shrinks with the size of the database
  - Multiple entries in the bucket address table may point to the same bucket, so the actual number of buckets is ≤ 2^i
  - The number of buckets also changes dynamically as buckets are coalesced and split
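The extendable hashing scheme above can be sketched in memory as follows: a bucket address table of 2^i entries indexed by the first i bits, buckets that record their own prefix length, and table doubling when a bucket whose prefix length equals i overflows. Assumptions: a bucket capacity of 2, integer keys used directly as their 32-bit hash values (a hypothetical `h`), and no coalescing on deletion. All names are hypothetical.

```python
HASH_BITS = 32
CAPACITY = 2  # hypothetical bucket capacity (records per bucket)


def h(key: int) -> int:
    """Hypothetical hash: the key itself, truncated to 32 bits."""
    return key & 0xFFFFFFFF


class Bucket:
    def __init__(self, local_bits: int) -> None:
        self.local_bits = local_bits  # prefix length shared by all keys in this bucket
        self.keys = []


class ExtendableHash:
    def __init__(self) -> None:
        self.i = 0                # global prefix length; table size is 2**i
        self.table = [Bucket(0)]  # bucket address table

    def _slot(self, key: int) -> int:
        return h(key) >> (HASH_BITS - self.i)  # first i bits of h(key)

    def search(self, key: int) -> bool:
        return key in self.table[self._slot(key)].keys

    def insert(self, key: int) -> None:
        bucket = self.table[self._slot(key)]
        if len(bucket.keys) < CAPACITY:
            bucket.keys.append(key)
            return
        if bucket.local_bits == self.i:
            if self.i == HASH_BITS:
                raise OverflowError("cannot split further; an overflow bucket is needed")
            # Double the table: old entry j is copied to new entries 2j and 2j+1.
            self.table = [b for b in self.table for _ in (0, 1)]
            self.i += 1
        self._split(bucket)
        self.insert(key)  # retry; may split again if all keys went to one half

    def _split(self, bucket: Bucket) -> None:
        """Split a bucket whose prefix length is shorter than i."""
        new_bits = bucket.local_bits + 1
        zero, one = Bucket(new_bits), Bucket(new_bits)
        for key in bucket.keys:
            # Bit number new_bits of the hash value (from the left) picks the half.
            bit = (h(key) >> (HASH_BITS - new_bits)) & 1
            (one if bit else zero).keys.append(key)
        # Redirect every table entry that pointed to the old bucket.
        for j, b in enumerate(self.table):
            if b is bucket:
                bit = (j >> (self.i - new_bits)) & 1
                self.table[j] = one if bit else zero

    def num_buckets(self) -> int:
        return len({id(b) for b in self.table})


keys = [0x10000000, 0x90000000, 0x50000000, 0xD0000000, 0x30000000]
eh = ExtendableHash()
for key in keys:
    eh.insert(key)
print(eh.i, len(eh.table), eh.num_buckets())  # → 2 4 3
```

After these five insertions the table has 2^2 = 4 entries but only 3 buckets, because the entries for prefixes 10 and 11 still share one bucket with prefix length 1.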

Index vs. Hash
Index
- Needs a data structure such as a B+-tree
- Requires disk accesses, such as node accesses in a B+-tree
- Supports both range queries and exact-match queries
- Can be a secondary or a primary index
Hash
- Needs no data structure except the hash table, which is much lighter than a tree
- In general, needs no disk accesses beyond reading the target block
- Supports exact-match queries only
- For one-dimensional key values
- Primary index only
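The difference in supported queries can be illustrated with an in-memory analogy: a sorted list searched with `bisect` stands in for the ordered leaf level of a B+-tree, and a `dict` stands in for the hash table. This is not a disk-based implementation; the data and function names are made up.

```python
import bisect

keys = [3, 8, 15, 21, 42, 57]             # hypothetical search-key values
ordered = sorted(keys)                     # stands in for the B+-tree's ordered leaves
hashed = {k: f"record {k}" for k in keys}  # stands in for the hash table


def exact_match_hash(k: int):
    return hashed.get(k)  # one bucket lookup


def range_query_ordered(lo: int, hi: int) -> list:
    """Keys in [lo, hi]: locate lo, then read forward in key order."""
    start = bisect.bisect_left(ordered, lo)
    end = bisect.bisect_right(ordered, hi)
    return ordered[start:end]


def range_query_hash(lo: int, hi: int) -> list:
    """Hashing scatters keys, so a range query must examine every entry."""
    return sorted(k for k in hashed if lo <= k <= hi)


print(exact_match_hash(21))         # → record 21
print(range_query_ordered(10, 45))  # → [15, 21, 42]
print(range_query_hash(10, 45))     # → [15, 21, 42], but only after a full scan
```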