File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.

Presentation transcript:

File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li

Index vs. Hash
  Index
    - Needs a data structure, such as a B+-tree
        Stored on disk
        Primary or secondary index
    - Block number can be determined before the insertion into the index
  Hash
    - Needs a hash function h(v) = b (h: hash function, v: key value, b: block number)
    - Block number b is determined by the hash function
    - The hash function decides where to store data and where to retrieve data
    - Primary index only
  Figure: key value v -> hash function h -> block number b -> record

Primary Index vs. Secondary Index
  Secondary Index
    - Step 1: Store the record in a certain block b, e.g. in sequential order
    - Step 2: Insert the key value v into the B+-tree together with b
    - The block number is determined independently of the B+-tree
  Primary Index
    - Step 1: Compute the hash function h(v) = b
    - Step 2: Store the record in block b
    - The block number is determined by the hash function
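The two insertion procedures above can be contrasted in a minimal Python sketch. Everything here is an illustrative assumption rather than part of the slides: the blocks are in-memory lists, a plain dict stands in for the B+-tree, `zlib.crc32` stands in for the hash function h, and the constants `NUM_BLOCKS` and `BLOCK_CAPACITY` are arbitrary.

```python
import zlib

NUM_BLOCKS = 8       # assumed fixed number of blocks in the hashed file
BLOCK_CAPACITY = 2   # assumed records per block in the sequential file


def h(v: str) -> int:
    """Stand-in hash function h(v) = b (an assumption, not the slides' function)."""
    return zlib.crc32(v.encode("utf-8")) % NUM_BLOCKS


def insert_with_index(blocks, index, v, record):
    """Secondary index: place the record first, then record (v, b) in the index."""
    # Step 1: store the record in sequential order, opening a new block when full.
    if not blocks or len(blocks[-1]) >= BLOCK_CAPACITY:
        blocks.append([])
    b = len(blocks) - 1
    blocks[b].append((v, record))
    # Step 2: insert the key value with its block number (dict plays the B+-tree).
    index[v] = b
    return b


def insert_with_hash(blocks, v, record):
    """Hash: the block number is computed from the key itself."""
    b = h(v)                       # Step 1: compute h(v) = b
    blocks[b].append((v, record))  # Step 2: store the record in block b (overflow ignored here)
    return b


def search_with_hash(blocks, v):
    """Retrieval recomputes h(v) and scans only that block."""
    return [rec for key, rec in blocks[h(v)] if key == v]
```

In the index variant the block number exists before the index is touched; in the hash variant the same function h both places and finds the record, which is why no separate lookup structure is needed.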

Hash
  Different keys may map to the same block number
    - One block may contain more than one record
  The hash function is used for
    - Insertion
    - Search
    - Deletion
  Static Hash
  Dynamic Hash

Static Hash
  Number of available blocks: fixed
  h(v)
    - Specifies the block where this record is to be stored
  Figure: keys "Romeo", "Juliet", "Hamlet"; hash values h(v) = 35, h(v) = 13, h(v) = 22; labels 35/m = 2, 13/m = 0, 22/m = 9; blocks b120 b121 b122 b123 b124 b125 b126 b127 b128 b129 b130 b131 b132 b133 b134 b

Handling of Block Overflow
  Block overflow can occur because of
    - Insufficient buckets
    - Skew in the distribution of records
        Multiple records have the same search-key value
        The hash function produces a non-uniform distribution
  Overflow cannot be eliminated, although the probability of bucket overflow can be reduced
    - Overflow buckets are needed

Overflow Handling
  Overflow chaining
    - Linked list of overflow blocks
    - Closed hashing
  Next block
    - B + h(v) + n
  Figure: Bucket 0, Bucket 1, Bucket 2 and an overflow bucket
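Overflow chaining can be sketched as follows. This is a minimal in-memory illustration, not the slides' implementation: the `Bucket` class, the constants `NUM_BUCKETS` and `BUCKET_CAPACITY`, and the character-sum hash (chosen only so that collisions are easy to reproduce) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

NUM_BUCKETS = 4       # assumed fixed number of primary buckets
BUCKET_CAPACITY = 2   # assumed records per bucket


@dataclass
class Bucket:
    records: list = field(default_factory=list)
    overflow: Optional["Bucket"] = None  # next bucket in the overflow chain


def h(key: str) -> int:
    """Toy hash: sum of the key's bytes modulo the number of buckets."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


def insert(table, key, value):
    """Walk the chain of the home bucket and append to the first bucket with room."""
    bucket = table[h(key)]
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()  # allocate a new overflow bucket
        bucket = bucket.overflow
    bucket.records.append((key, value))


def search(table, key):
    """Scan the home bucket, then every overflow bucket linked to it."""
    bucket = table[h(key)]
    while bucket is not None:
        for k, v in bucket.records:
            if k == key:
                return v
        bucket = bucket.overflow
    return None


def chain_length(table, b):
    """Number of buckets (primary plus overflow) reachable from bucket b."""
    n, bucket = 0, table[b]
    while bucket is not None:
        n += 1
        bucket = bucket.overflow
    return n
```

A search costs one access for the home bucket plus one per overflow bucket in its chain, so long chains erode the single-access benefit of hashing.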

Hash Function
  Worst case
    - The hash function maps all search-key values to the same bucket
    - Search time becomes linear: hashing is then meaningless
  Two conditions
    - Uniformity
    - Randomness
  Typical hash functions
    - Computed on the internal binary representation of the search key
    - For example, for a string search key, the binary representations of all the characters in the string could be added, and the sum modulo the number of buckets could be returned.
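The string example above translates almost directly into code. This is a small sketch; the function name `string_hash` and the choice of UTF-8 bytes as the "binary representation" of the characters are assumptions.

```python
def string_hash(key: str, num_buckets: int) -> int:
    """Add the byte values of the key and return the sum modulo the bucket count."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return sum(key.encode("utf-8")) % num_buckets


# Keys that contain the same characters in a different order always collide,
# because addition ignores order.
same_bucket = string_hash("ab", 10) == string_hash("ba", 10)
```

The collision of reordered keys shows why such a simple function may fall short of the uniformity and randomness conditions on real data.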

Discussion on Static Hash
  Static hash
    - The number of buckets remains unchanged
  Advantages
    - Simple
    - Optimal hash function for a static environment
  When the number of records is fixed
    - No problem: we prepare a fixed number of blocks
  When the number of records is variable (the DB grows)
    - The number of records may exceed N_b * B_f
        Extension of blocks
        An extensible (or dynamic) hashing mechanism is necessary
        Or periodic reorganization
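The capacity bound can be checked with a few lines of arithmetic. The sketch below assumes that N_b denotes the number of blocks and B_f the blocking factor (records per block); the slides do not spell these symbols out, and the function names are hypothetical.

```python
def capacity(num_blocks: int, blocking_factor: int) -> int:
    """Maximum number of records a static hash file can hold: N_b * B_f."""
    return num_blocks * blocking_factor


def exceeds_capacity(num_records: int, num_blocks: int, blocking_factor: int) -> bool:
    """True when the file can no longer hold all records without overflow blocks."""
    return num_records > capacity(num_blocks, blocking_factor)
```

Because records are rarely spread perfectly evenly, individual buckets usually overflow well before the total reaches this bound, so the bound is a necessary limit rather than a safe operating point.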

Dynamic Hash
  Figure: hash value bits b31, b30, b29, ..., b2, b1, b0, with the first i bits marked

Dynamic Hash: Example
  Figure: example with prefix length i

Dynamic Hash: Example (3 Records)
  Figure labels: Overflow, +1, Split, Overflow, +1

Dynamic Hash: Example (4 Records)
  Figure label: Split

Dynamic Hash
  Good for databases that grow and shrink in size
    - Allows the hash function to be modified dynamically
  Extendable hashing: one form of dynamic hashing
    - The hash function generates values over a large range, typically b-bit integers with b = 32.
    - At any time, only a prefix of the hash value is used
        Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
        Bucket address table size = 2^i. Initially i = 0.
        The value of i grows and shrinks according to the size of the database
    - Multiple entries in the bucket address table may point to the same bucket.
        Thus, the actual number of buckets can be smaller than 2^i
        The number of buckets also changes dynamically due to coalescing and splitting of buckets.
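A compact sketch of extendable hashing along these lines is given below. It is an illustration under stated assumptions, not the slides' algorithm in full: `zlib.crc32` stands in for the 32-bit hash function, `BUCKET_CAPACITY` is arbitrary, buckets live in memory, each bucket keeps a `local_depth` (the number of prefix bits it actually distinguishes), and coalescing on deletion is left out.

```python
import zlib

HASH_BITS = 32        # b = 32, as on the slide
BUCKET_CAPACITY = 2   # assumed records per bucket


class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth  # prefix bits this bucket distinguishes
        self.items = {}


class ExtendableHash:
    def __init__(self):
        self.i = 0                # current prefix length
        self.table = [Bucket(0)]  # bucket address table of size 2**i

    @staticmethod
    def _hash(key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF

    def _slot(self, key: str) -> int:
        # Use only the first i bits of the 32-bit hash value.
        return self._hash(key) >> (HASH_BITS - self.i)

    def get(self, key: str):
        return self.table[self._slot(key)].items.get(key)

    def put(self, key: str, value) -> None:
        while True:
            bucket = self.table[self._slot(key)]
            if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
                bucket.items[key] = value
                return
            if bucket.local_depth == HASH_BITS:
                raise RuntimeError("identical hash values: an overflow bucket would be needed")
            if bucket.local_depth == self.i:
                self._double()    # the table must grow before this bucket can split
            self._split(bucket)

    def _double(self) -> None:
        # Entry j becomes entries 2j and 2j+1, both pointing to the old bucket.
        self.table = [b for b in self.table for _ in range(2)]
        self.i += 1

    def _split(self, bucket: Bucket) -> None:
        depth = bucket.local_depth + 1
        zero, one = Bucket(depth), Bucket(depth)
        # Redistribute records on the newly examined prefix bit.
        for k, v in bucket.items.items():
            bit = (self._hash(k) >> (HASH_BITS - depth)) & 1
            (one if bit else zero).items[k] = v
        # Repoint every table entry that referred to the old bucket.
        for slot, b in enumerate(self.table):
            if b is bucket:
                bit = (slot >> (self.i - depth)) & 1
                self.table[slot] = one if bit else zero

    def num_buckets(self) -> int:
        return len({id(b) for b in self.table})
```

Doubling the table only copies pointers, so several entries keep sharing a bucket until that bucket itself overflows; this is why the number of distinct buckets can stay below 2^i.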

Index vs. Hash
  Index
    - Needs a data structure such as a B+-tree
        Requires disk accesses, such as node accesses in the B+-tree
    - Range query and exact match query
    - Secondary and primary index
  Hash
    - Needs no data structure except the hash table: much lighter than a tree
        No disk accesses in general
    - Exact match query
        For 1-D key values
    - Primary index only