Spring 2003 ECE569 Lecture 04-2.1
ECE 569 Database System Engineering, Spring 2003
Yanyong Zhang, www.ece.rutgers.edu/~yyzhang



Access Paths
• Associative access can be realized via a scan.
• Algorithms and data structures that translate attribute values into TIDs (tuple identifiers), or into other types of internal addresses of the tuples having those attribute values, are called access paths.
• The techniques for associative access vary greatly depending on the kind of selection predicate to be supported.

Content addressability techniques
• Primary key access
  - A tuple of a relation must be retrieved efficiently via the value of its primary (unique) key(s), e.g., key-sequenced files and hashed files.
  - Point query vs. range query
• Secondary key access
  - A set of tuples is produced.
• Multi-table access
  - Tuple access is often based on relationships between different tuples.

Operations on files
• Assumptions
  - n = number of records in the file
  - R = number of records that fit in a block
• Lookup: given a key, find the corresponding record
  - On average, n / (2R) block accesses
• Insertion: add a record to the file (duplicates allowed)
  - Read the last block; a new block may need to be allocated. Requires approximately 2 accesses.
• Deletion: delete a record
  - Look up the record: n / (2R) accesses
  - Write the block back to disk: 1 access
  - Reorganize (unpinned): move a tuple from the last page to reuse the freed space (2 disk accesses)

Hashed Files
• The file is divided into B buckets.
• Hash function h maps elements of the key space to the range [0, B).
  - The key space is large and unevenly distributed.
    - SSNs as character strings
    - Each character takes on at most 10 of the possible 256 values
  - Hash function h must map key values evenly among a relatively small number of values.
• A bucket directory is an array of B pointers to the allocated buckets.
  - Small enough to fit entirely in memory
  - Buckets are allocated only as they are needed.

Hashed files (figure)

Hash-based associative access
(Figure: the range of potential key values, with shaded areas denoting used key values, is mapped by FOLDING to a range of positive integers, which is mapped by HASHING to the tuple address space.)

Folding
• Convert arbitrary data types to a positive integer that h can be applied to.
• Reduce the number of bits so that arithmetic is efficient.
• Example: Key is “Keefe” and
  - Key value is the concatenation of the byte representations of the individual fields:
      0x4b 0x65 0x65 0x66 0x65 0x0 0x0 0x0 0x41 0xa3
  - Folded value of key: partition the result into words (padding the last word with zeros) and combine using XOR:
      0x4b 0x65 0x65 0x66
      0x65 0x0  0x0  0x0
      0x41 0xa3 0x0  0x0
      ------------------- XOR
      0x6f 0xc6 0x65 0x66
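The XOR folding in the example above can be checked with a short sketch. The function name `fold`, the 4-byte word size, and the big-endian byte order are assumptions chosen to match the byte layout shown on the slide; they are not taken from the lecture itself.

```python
from functools import reduce


def fold(key_bytes, word_size=4):
    """Fold a byte string into one word by XOR-ing its word-sized chunks.

    Assumption: the last chunk is padded with zero bytes, and each chunk
    is read as a big-endian integer (first byte is most significant).
    """
    padded = key_bytes + bytes(-len(key_bytes) % word_size)
    words = [
        int.from_bytes(padded[i:i + word_size], "big")
        for i in range(0, len(padded), word_size)
    ]
    return reduce(lambda a, b: a ^ b, words, 0)


# Byte string from the slide: "Keefe", three zero bytes, then 0x41 0xa3.
key_value = bytes([0x4B, 0x65, 0x65, 0x66, 0x65, 0x00, 0x00, 0x00, 0x41, 0xA3])
folded = fold(key_value)
print(hex(folded))  # 0x6fc66566
```

The result matches the folded value 0x6f 0xc6 0x65 0x66 given in the example.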

Hashing
• Goal of hashing
• How to choose the hash function if all the key values are uniformly distributed?
• The critical issue is to produce a 1:1 mapping.
• Collision: different inputs are mapped to the same output.
• The criterion for a good hash function is to keep the number of collisions as small as possible.

Static Hashing
• Input: folded key values
• Output: bytes (relative to the beginning of the file) or blocks?
  - Bytes are not good because of the varying tuple size.
  - A block/page is called a bucket.
• H: {0 … } -> {0 … B-1}
  - Contiguous allocation
  - Fixed size: B pages are allocated at file creation time.
  - Insert
    - Determine the bucket
    - Check the bucket (a collision may happen)

How to find a good hash function
• Division / remainder (congruential hashing): $H(k_b) = k_b \bmod B$, where $k_b$ is the folded key value and $B$ is the number of buckets.
• Nth power
  - Compute $k_b^N$ and, from the resulting bit string ($N \times 31$ bits), take $\log_2 B$ bits from the middle.
• Base transformation
• Polynomial division
• Numerical analysis
• Encryption
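The first two techniques can be sketched as follows. This is an illustrative reading of the slide, not the lecture's own code: the function names are hypothetical, B is assumed to be a power of two for the Nth-power variant, and "the middle" is taken to mean that roughly equal numbers of bits are dropped on each side.

```python
def division_hash(kb, B):
    """Division / remainder hashing: H(kb) = kb mod B."""
    return kb % B


def nth_power_hash(kb, B, N=2, key_bits=31):
    """Raise kb to the Nth power and take log2(B) bits from the middle.

    Assumptions: B is a power of two, and kb fits in key_bits bits, so
    kb**N is treated as a bit string of N * key_bits bits.
    """
    out_bits = B.bit_length() - 1          # log2(B)
    if 1 << out_bits != B:
        raise ValueError("B must be a power of two")
    total_bits = N * key_bits
    shift = (total_bits - out_bits) // 2   # low-order bits to discard
    return ((kb ** N) >> shift) & (B - 1)


print(division_hash(0x6FC66566, 16))              # 6
print(nth_power_hash(12, 4, N=2, key_bits=4))     # 2
```

In the second call, 12 squared is 144 (binary 10010000); dropping the three low-order bits and keeping the next two yields binary 10, i.e. bucket 2.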

Performance
• Assumption
  - Perfect hash function (tuples are uniformly distributed over B buckets)
• Lookup
  - $\lceil \frac{1}{2} \times \frac{n}{R} \times \frac{1}{B} \rceil$: to find the first match
  - $\lceil \frac{n}{R} \times \frac{1}{B} \rceil$: if the tuple does not exist
• Insertion
  - $\lceil \frac{n}{R} \times \frac{1}{B} \rceil + 1$: with a test for duplicates
  - 1: otherwise
• Deletion
  - $\lceil \frac{1}{2} \times \frac{n}{R} \times \frac{1}{B} \rceil$: delete the first match
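A small calculator makes these estimates concrete and compares them with the unordered-file lookup estimate n / (2R) from the earlier slide on file operations. It assumes the cost expressions are ceilings of the products n/R × 1/B; the function names and the sample values of n, R, and B are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def heap_lookup_cost(n, R):
    """Unordered file: on average half of the n/R blocks are read."""
    return n / (2 * R)


def hashed_costs(n, R, B, check_duplicates=True):
    """Block accesses for a hashed file under a perfect hash function."""
    full_bucket = ceil_div(n, R * B)       # all blocks of one bucket
    half_bucket = ceil_div(n, 2 * R * B)   # half of them, on average
    return {
        "lookup_hit": half_bucket,
        "lookup_miss": full_bucket,
        "insert": full_bucket + 1 if check_duplicates else 1,
        "delete": half_bucket,
    }


n, R, B = 10_000, 10, 100  # hypothetical file: 10,000 records, 10 per block
print(heap_lookup_cost(n, R))  # 500.0
print(hashed_costs(n, R, B))
# {'lookup_hit': 5, 'lookup_miss': 10, 'insert': 11, 'delete': 5}
```

With these sample numbers, hashing reduces a successful lookup from about 500 block accesses to about 5.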

Collision
• Two keys collide if they hash to the same value.
• A bucket with room for R tuples can accommodate R - 1 collisions before it overflows.
  - Internal resolution: place overflow blocks in another bucket
    - (h(K) + 1) mod B: open addressing
    - h2(h1(K)): multiple hashing

Collision (continued)
  - External resolution: allocate an overflow block and link it into an overflow chain.
(Figure: buckets with chained overflow pages)
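External resolution can be illustrated with a minimal in-memory sketch of a static hashed file. The class names `Page` and `StaticHashFile` are hypothetical, division hashing (kb mod B) is assumed as the hash function, and `lookup` reports how many pages it had to read, which is the cost the overflow chain adds.

```python
class Page:
    """One bucket page holding up to R keys, plus a link to an overflow page."""

    def __init__(self):
        self.keys = []
        self.overflow = None


class StaticHashFile:
    def __init__(self, B, R):
        self.B = B                                  # number of primary buckets
        self.R = R                                  # keys per page
        self.buckets = [Page() for _ in range(B)]   # fixed at creation time

    def insert(self, kb):
        page = self.buckets[kb % self.B]
        # Walk the chain until a page with free space is found,
        # allocating a new overflow page at the end if necessary.
        while len(page.keys) >= self.R:
            if page.overflow is None:
                page.overflow = Page()
            page = page.overflow
        page.keys.append(kb)

    def lookup(self, kb):
        """Return (found, pages_read)."""
        page, reads = self.buckets[kb % self.B], 0
        while page is not None:
            reads += 1
            if kb in page.keys:
                return True, reads
            page = page.overflow
        return False, reads


f = StaticHashFile(B=4, R=2)
for key in (0, 4, 8, 12, 5):
    f.insert(key)

print(f.lookup(0))   # (True, 1): found in the primary page of bucket 0
print(f.lookup(12))  # (True, 2): found in the overflow page of bucket 0
print(f.lookup(16))  # (False, 2): the whole chain must be read
```

Keys 0, 4, 8, and 12 all hash to bucket 0, so the third and fourth land in an overflow page and cost one extra page read to find.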

Discussion
• How do you limit the number of pages accessed when retrieving a tuple, for both external and internal resolution?

How to locate a tuple in a page?
• Sequential search
• Page directory
• Hash

Extendible Hashing
• The number of buckets can grow and shrink.
• An intermediate data structure translates the hash results into page addresses. This data structure needs to be as compact as possible.
  - The hash value indexes into an array of pointers to buckets (the directory).
  - The array is small enough to be kept in memory.

Directory Growth
• To adapt to the dynamically varying size of the hash file, modify the directory size.
• Assume a hash function $h(k_b)$ that produces a bit string $s$.
• The directory is of size $2^d$; $d$ is called the global depth and is initially 0.
• Use the least significant $d$ bits of $s$ to determine the bucket to access.
• Each bucket has a corresponding local depth in the range $[0, d]$.

Example
• Insert 0x13, 0x10, 0x07, 0x00, 0x1f
• Each page can contain no more than 2 tuples

Example: insert 0x1f (figure)
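The insertion sequence from the example can be traced with a compact sketch of extendible hashing. This is one possible implementation under stated assumptions, not necessarily the one drawn on the slides: the inserted values are treated as already-hashed bit strings, the directory is indexed by the least significant d bits, and the class names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity          # tuples per page
        self.global_depth = 0             # d, initially 0
        self.directory = [Bucket(0)]      # 2**d pointers to buckets

    def _index(self, s):
        # Least significant d bits of the hash value s.
        return s & ((1 << self.global_depth) - 1)

    def lookup(self, s):
        return s in self.directory[self._index(s)].keys

    def insert(self, s):
        # Assumption: values are distinct; more than `capacity` identical
        # values could never be separated by splitting.
        while True:
            bucket = self.directory[self._index(s)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(s)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory: entries i and i + 2**d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.local_depth     # the bit that now distinguishes keys
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        old_keys = bucket.keys
        bucket.keys = [k for k in old_keys if not k & bit]
        sibling.keys = [k for k in old_keys if k & bit]
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling


table = ExtendibleHash(capacity=2)
for value in (0x13, 0x10, 0x07, 0x00, 0x1F):
    table.insert(value)

print(table.global_depth)  # 3
for i, b in enumerate(table.directory):
    print(format(i, "03b"), b.local_depth, [hex(k) for k in b.keys])
```

Under these assumptions, inserting 0x1f forces two consecutive directory doublings, because 0x13, 0x07, and 0x1f agree in their two least significant bits and only separate on the third.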

Performance
• Retrieving a tuple takes 2 steps.
• If we can keep the directory in memory, each retrieval is one page access.
• Assuming 4 bytes per entry, 4KB pages, and a 1GB hash file, what is the minimum buffer size if we want to keep the entire directory in memory?
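One way to approach the question is the back-of-the-envelope calculation below. It is a lower bound under stated assumptions: 1GB and 4KB are read as powers of two, every page of the file is a distinct bucket, and the directory has exactly one entry per bucket (in practice it can be larger when some local depths are below the global depth).

```python
file_size = 1 << 30      # 1 GB hash file
page_size = 4 << 10      # 4 KB pages
entry_size = 4           # bytes per directory entry

buckets = file_size // page_size                  # pages in the file
global_depth = (buckets - 1).bit_length()         # smallest d with 2**d >= buckets
directory_bytes = (1 << global_depth) * entry_size
directory_pages = directory_bytes // page_size

print(buckets)           # 262144
print(global_depth)      # 18
print(directory_bytes)   # 1048576 bytes, i.e. 1 MB
print(directory_pages)   # 256
```

Under these assumptions the directory alone occupies 256 buffer pages (1 MB), and at least one more page is needed to hold the bucket being read.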

Discussion
• How easy is it to keep the directory in memory?
• How do we reduce the structure when the file shrinks?