CPSC 461 Final Review I Hessam Zakerzadeh Dina Said.


9.1) What is the most important difference between a disk and a tape?

Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order. Disks support direct access to a desired page.

Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

Linear Hashing
- No directory.
- More flexibility with respect to the timing of bucket splits.
- Worse performance than Extendible Hashing if data is skewed.
- Uses a family of hash functions h_0, h_1, ... such that h_i(v) = h(v) mod (2^i * N).
- N is the initial number of buckets.
- If N is a power of 2, say N = 2^d0, then applying h_i amounts to applying h and looking at the last d_i bits, where d_i = d0 + i.
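The hash function family can be sketched in a few lines of Python. This is only an illustration: it assumes the base hash h is the identity on integer keys, and the names `h_level` and `last_bits` are made up for this sketch. It also shows that when N = 2^d0, taking the value mod (2^i * N) is the same as keeping the last d0 + i bits.

```python
def h_level(value: int, level: int, n_initial: int) -> int:
    """h_level(value) = h(value) mod (2**level * n_initial), assuming h(v) = v."""
    return value % ((2 ** level) * n_initial)


def last_bits(value: int, level: int, d0: int) -> int:
    """Equivalent form when n_initial == 2**d0: keep the last d0 + level bits."""
    d_i = d0 + level
    return value & ((1 << d_i) - 1)


if __name__ == "__main__":
    # Assumed setting: N = 4 initial buckets, so d0 = 2.
    for key in (44, 9, 43):
        print(key, h_level(key, 0, 4), h_level(key, 1, 4))
```

With N = 4, each key lands in one of four buckets under h_0 and in one of eight under h_1, which is what lets a split bucket hand part of its entries to its split image.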

Inserting a Data Entry in LH
Find the bucket by applying h_Level or h_Level+1 (h_Level+1 is used when h_Level maps to a bucket number below Next, i.e. a bucket already split in this round):
- If the bucket to insert into is full: add an overflow page and insert the data entry. (Maybe) split the Next bucket and increment Next.
- Else simply insert the data entry into the bucket.

Bucket Split
A split can be triggered by
- the addition of a new overflow page
- conditions such as space utilization
Whenever a split is triggered,
- the Next bucket is split,
- hash function h_Level+1 redistributes entries between this bucket (say bucket number b) and its split image;
- the split image is therefore bucket number b + N_Level, where N_Level = 2^Level * N is the number of buckets at the start of the round.
- Next ← Next + 1.
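The insert and split rules can be put together in a small in-memory sketch. It is not a disk-based implementation: each bucket is a plain Python list that stands for the primary page plus any overflow pages, the base hash is assumed to be the identity on integers, and a split is triggered only by the addition of an overflow page. The class name `LinearHashFile` and its parameters are hypothetical.

```python
class LinearHashFile:
    """In-memory sketch of Linear Hashing (assumed: h(v) = v, integer keys)."""

    def __init__(self, n_initial: int = 4, page_capacity: int = 4):
        self.n0 = n_initial            # N: initial number of buckets
        self.capacity = page_capacity  # entries per page
        self.level = 0
        self.next = 0
        self.buckets = [[] for _ in range(n_initial)]

    def _h(self, level: int, key: int) -> int:
        return key % ((2 ** level) * self.n0)

    def bucket_of(self, key: int) -> int:
        b = self._h(self.level, key)
        if b < self.next:              # bucket already split in this round
            b = self._h(self.level + 1, key)
        return b

    def insert(self, key: int) -> None:
        b = self.bucket_of(key)
        size = len(self.buckets[b])
        # All existing pages of the bucket are full: a new overflow page is needed.
        needs_overflow = size > 0 and size % self.capacity == 0
        self.buckets[b].append(key)
        if needs_overflow:
            self._split()

    def _split(self) -> None:
        n_level = (2 ** self.level) * self.n0
        old_entries = self.buckets[self.next]
        self.buckets.append([])        # split image: bucket Next + N_Level
        keep, move = [], []
        for k in old_entries:
            target = keep if self._h(self.level + 1, k) == self.next else move
            target.append(k)
        self.buckets[self.next] = keep
        self.buckets[self.next + n_level] = move
        self.next += 1
        if self.next == n_level:       # end of the round
            self.level += 1
            self.next = 0


if __name__ == "__main__":
    f = LinearHashFile(n_initial=4, page_capacity=4)
    for k in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11, 43]:
        f.insert(k)
    print(f.level, f.next, f.buckets)
```

Note that the bucket that overflows and the bucket that gets split are generally different: the split always applies to the Next bucket, whichever bucket received the insert.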

Example: Insert 44 (101100), 9 (01001)
[Figure: linear hashed file with Level=0, Next=0. Primary pages hold the data entries 32*, 44*, 36*, 9*, 25*, 5*, 14*, 18*, 10*, 30*, 31*, 35*, 7*, 11*. The h_1 and h_0 columns beside the buckets are for illustration only.]

Example: Insert 43 (101011)
[Figure, before the insert: Level=0, Next=0; primary pages hold 32*, 44*, 36*, 9*, 25*, 5*, 14*, 18*, 10*, 30*, 31*, 35*, 7*, 11*. Figure, after the insert: Level=0, Next=1; 43* sits on an overflow page. The h_1 and h_0 columns are for illustration only.]

Example: End of a Round
[Figure, before: Level=0, Next=3; the file holds 32*, 9*, 25*, 5*, 14*, 66*, 10*, 18*, 34*, 35*, 31*, 7*, 11*, 43*, 44*, 36*, 37*, 29*, 30*, 22* across primary and overflow pages. Figure, after Insert 50 (110010): Level=1, Next=0; the same entries plus 50*.]

Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

If we start with an index that has B buckets, all the buckets are split in order, one after the other, during a round. A hash function is expected to distribute the search key values uniformly over the buckets.
- A split can be triggered by conditions such as space utilization, so the length of the overflow chains is kept down.
- Therefore, the number of overflow pages per bucket is not expected to be more than 1, and the average search cost stays only slightly above one disk I/O.

Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value?

No. Overflow chains are part of the structure, so no such guarantee is provided.

Exercise 11.4 Answer the following questions about Linear Hashing: If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?

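Average number of records in each page = 0.8 * P
If all keys map to the same bucket:
- We will have N / (0.8 * P) pages in that bucket.
- An equality search must then scan the whole chain, so the worst-case cost is N / (0.8 * P) page I/Os.
- This is the actual search cost only when every record hashes to the same bucket.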
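As a quick check of the worst-case formula, the helper below evaluates N / (0.8 * P) with integer arithmetic. The function name and the sample values (one million records, 100 records per page) are assumptions chosen only for illustration.

```python
def worst_case_search_ios(n_records: int, records_per_page: int,
                          utilization_percent: int = 80) -> int:
    """Pages in a single chain holding all records: ceil(N / (utilization * P))."""
    numerator = n_records * 100
    denominator = utilization_percent * records_per_page
    return -(-numerator // denominator)  # ceiling division on integers


if __name__ == "__main__":
    # Hypothetical sizes: N = 1,000,000 records, P = 100 records per page.
    print(worst_case_search_ios(1_000_000, 100))
```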

Exercise 11.4 Answer the following questions about Linear Hashing: If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?

Space utilization = number of pages actually holding data / total number of pages
If data is skewed, in the extreme case all records are mapped to the same bucket. Suppose that we have m main pages:
- All records will be mapped to bucket 0.
- Each additional overflow page will cause a split.
- Suppose we added n overflow pages to bucket 0 → we also added n new (empty) buckets.
- Number of pages holding data = n + 1 (bucket 0 plus its n overflow pages).
- Total number of pages = m + n + n.
- Space utilization = (n+1) / (m+2n), which is below 50% whenever m > 2 → very bad.
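The ratio (n+1) / (m+2n) can be evaluated exactly for a few values to see how it behaves. The sketch below is only an arithmetic check of that formula under the extreme-skew assumption stated above; the function name and the sample values of m and n are hypothetical.

```python
from fractions import Fraction


def skewed_utilization(m: int, n: int) -> Fraction:
    """(n + 1) / (m + 2n): used pages over total pages when all keys hit bucket 0."""
    return Fraction(n + 1, m + 2 * n)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, float(skewed_utilization(4, n)))
```

As n grows the ratio approaches 1/2 from below (for m > 2), so roughly half of the allocated pages stay empty no matter how much data is inserted.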

Exercise 13.4

File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. The page is 4K.
For Pass 0:
- No. of runs = ceil(10*10^6 / 320) = 31250
- Read cost per run = (10 + 5 + 1*320) ms
- Write cost per run = (10 + 5 + 1*320) ms
- Total I/O cost = No. of runs * (cost of read + cost of write) = 31250 * 2 * (15+320) = 20,937,500 ms → cost of Pass 0
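The Pass 0 arithmetic can be reproduced with a short script. It assumes, as the slide does, that each run is read with one seek plus one rotational delay followed by a sequential transfer of all 320 pages, and written back the same way; the constant and function names are made up for this sketch.

```python
SEEK_MS = 10
ROTATION_MS = 5
TRANSFER_MS_PER_PAGE = 1


def block_io_ms(pages: int) -> int:
    """Cost of one sequential I/O of `pages` pages: seek + rotation + transfer."""
    return SEEK_MS + ROTATION_MS + TRANSFER_MS_PER_PAGE * pages


def pass0(file_pages: int, buffer_pages: int) -> tuple[int, int]:
    """Return (number of runs, total Pass 0 cost in ms)."""
    runs = -(-file_pages // buffer_pages)  # ceiling division
    cost = runs * 2 * block_io_ms(buffer_pages)  # one read and one write per run
    return runs, cost


if __name__ == "__main__":
    print(pass0(10 * 10**6, 320))
```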

Total cost for subsequent merges = No. of passes * (read cost + write cost)
- No. of passes = ceil(log_(No. of ways) 31250) = ceil(ln 31250 / ln(No. of ways))
- Read/write cost per pass = No. of blocks * (10 + 5 + 1 * No. of pages per block)
- No. of blocks = ceil(10*10^6 / No. of pages per block)
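The number of merge passes can also be counted directly, which avoids floating-point logarithms. The loop below repeatedly merges groups of `fan_in` runs until one run is left, which gives the same value as ceil(log_(fan_in) runs); the function name is hypothetical.

```python
def merge_passes(initial_runs: int, fan_in: int) -> int:
    """Number of merge passes needed to reduce `initial_runs` runs to one."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    runs = initial_runs
    while runs > 1:
        runs = -(-runs // fan_in)  # each pass merges groups of fan_in runs
        passes += 1
    return passes


if __name__ == "__main__":
    for ways in (4, 16, 256):
        print(ways, merge_passes(31250, ways))
```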

b) Create 256 ‘input’ buffers of 1 page each, create an ‘output’ buffer of 64 pages, and do 256-way merges.
- No. of passes = ceil(ln 31250 / ln 256) = 2
- Read cost per pass: pages are read one at a time, so 10*10^6 * (15 + 1) = 16*10^7 ms

b) Write cost per pass:
- No. of blocks = ceil(10*10^6 / No. of pages per block) = ceil(10*10^6 / 64) = 156250
- Write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block) = 156250 * (15+64) ms

b) Total cost for subsequent merges = No. of passes * (read cost + write cost) = 2 * (16*10^7 + 156250 * (15+64)) = 344,687,500 ms

e) Create four ‘input’ buffers of 64 pages each, create an ‘output’ buffer of 64 pages, and do four-way merges.
- No. of passes = ceil(ln 31250 / ln 4) = 8
- No. of blocks = ceil(10*10^6 / 64) = 156250
- Read cost per pass = write cost per pass = No. of blocks * (10 + 5 + 1 * No. of pages per block) = 156250 * (15+64) ms
- Total cost for subsequent merges = No. of passes * (read cost + write cost) = 8 * (2 * 156250 * (15+64)) = 197,500,000 ms
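The merge-phase totals for configurations b) and e) can be checked with the sketch below. It assumes every pass reads and writes the whole file once, in blocks whose size equals the size of one input or output buffer, and it takes the number of passes (2 and 8) as given above. All names are hypothetical.

```python
SEEK_MS, ROTATION_MS, TRANSFER_MS_PER_PAGE = 10, 5, 1
FILE_PAGES = 10 * 10**6


def transfer_cost_ms(file_pages: int, block_pages: int) -> int:
    """Cost of moving the whole file once in blocks of `block_pages` pages."""
    blocks = -(-file_pages // block_pages)  # ceiling division
    return blocks * (SEEK_MS + ROTATION_MS + TRANSFER_MS_PER_PAGE * block_pages)


def merge_phase_cost_ms(passes: int, read_block: int, write_block: int) -> int:
    """passes * (read cost + write cost), all in milliseconds."""
    per_pass = (transfer_cost_ms(FILE_PAGES, read_block)
                + transfer_cost_ms(FILE_PAGES, write_block))
    return passes * per_pass


if __name__ == "__main__":
    print("b)", merge_phase_cost_ms(passes=2, read_block=1, write_block=64))
    print("e)", merge_phase_cost_ms(passes=8, read_block=64, write_block=64))
```

Under these assumptions, configuration e) needs four times as many passes as b) yet costs less overall, because reading in 64-page blocks spreads the 15 ms of seek and rotational delay over many pages.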