Today Review of Directory of Slot Block Organizations Heap Files Program 1 Hints Ordered Files & Hash Files RAID.


Today
Review of Directory of Slot Block Organizations
Heap Files
Program 1 Hints
Ordered Files & Hash Files
RAID

Directory of Slots Example

Heap Files
Heap files store records in no particular order.
–The use of “heap” here is unrelated to the “free store” used for dynamic memory allocation.
The simplest organization.

Heap File Example

Records (Name, ID) are spread across block 1 of file, block 2 of file, ..., block N of file:
Paradise, Sal 231
Favor, Sue 123
Mach, Chris 401
Rodgers, Bill 616
Smith, Mary
Yost, Ned 819
Alm, Louis
Link, Steve
Patch, Linda
Jones, Jim
Ming, Yao
Turing, Alan

Assuming N data blocks, R records per block, an I/O cost of D per block, and a “record processing time” of C per record, what are the costs of:
–inserting a record
–deleting a record given its RID (ignore reclaiming space)
–a scan
–a search for “key” with an equality selection
–a search with a range selection

Costs shown on the slide: 2D+C, N(D+RC), N(D+RC)/2, N(D+RC)
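The cost question above can be written out as a small cost model. This is a sketch, not the slide's answer key: the slide lists four expressions for five operations, so the pairing of expressions with operations below follows the usual textbook analysis, and the delete cost in particular is an assumption (read the block, remove the record, write the block back). The struct and method names are made up for illustration.

```cpp
#include <cassert>

// Cost model for a heap file, using the slide's symbols:
//   N = number of data blocks, R = records per block,
//   D = cost of one block I/O, C = time to process one record.
struct HeapFileCost {
    double N, R, D, C;

    // Scan: read every block and process every record in it.
    double scan() const { return N * (D + R * C); }

    // Equality search: on average half the file is read before the
    // match is found (assumes exactly one matching record exists).
    double equalitySearch() const { return scan() / 2.0; }

    // Range search: matches may be anywhere, so the whole file is read.
    double rangeSearch() const { return scan(); }

    // Insert: read the last block, add the record, write the block back.
    double insert() const { return 2.0 * D + C; }

    // Delete given an RID (assumed): read that block, remove the
    // record, write the block back. Space is not reclaimed.
    double removeByRid() const { return 2.0 * D + C; }
};
```

With N = 100, R = 10, D = 16 and C = 0.5, a scan costs 100 * (16 + 5) = 2100 time units and an insert costs 32.5.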

Prog 1 Hints
Check slot numbers to make sure they are valid.
sizeof(), memcpy(), memmove()
–man is your friend
Test patterns
–make sure you handle error cases correctly
Error reporting
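Two of these hints can be illustrated with a short sketch. It assumes records are packed toward the end of the page's data array, and it uses a simplified, hypothetical slot entry; the value chosen for EMPTY_SLOT and the helper names slotInUse and closeHole are illustrative assumptions, not part of the assignment's interface.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified slot entry. The EMPTY_SLOT value is assumed.
const short EMPTY_SLOT = -1;
struct Slot { short offset; short length; };

// A slot number is valid only if it is in range and the slot is in use.
bool slotInUse(const Slot* slots, int slotCnt, int slotNo) {
    if (slotNo < 0 || slotNo >= slotCnt) return false;
    return slots[slotNo].length != EMPTY_SLOT;
}

// Close a hole of holeLen bytes at data[holeOff] by sliding the bytes in
// [usedPtr, holeOff) toward the end of the array. Source and destination
// can overlap, so memmove is used rather than memcpy.
// Returns the new usedPtr.
int closeHole(char* data, int usedPtr, int holeOff, int holeLen) {
    assert(usedPtr <= holeOff && holeLen >= 0);
    std::memmove(data + usedPtr + holeLen, data + usedPtr, holeOff - usedPtr);
    return usedPtr + holeLen;
}
```

After closeHole, every slot whose offset was below holeOff would also need its offset increased by holeLen; that bookkeeping is left out here.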

class HFPage {
    struct slot_t {
        short offset;
        short length;       // equals EMPTY_SLOT if slot is not in use
    };

    static const int DPFIXED = sizeof(slot_t) + 4 * sizeof(short) + 3 * sizeof(PageId);

    short slotCnt;          // number of slots in use
    short usedPtr;          // offset of first used byte in data[]
    short freeSpace;        // number of bytes free in data[]
    short type;             // an arbitrary value used by subclasses as needed
    PageId prevPage;        // backward pointer to data page
    PageId nextPage;        // forward pointer to data page
    PageId curPage;         // page number of this page
    slot_t slot[1];         // first element of slot array
    char data[MAX_SPACE - DPFIXED];

    // methods...

// **********************************************************
// page class constructor
void HFPage::init(PageId pageNo)
{
    nextPage = prevPage = INVALID_PAGE;
    slotCnt = 0;                                // no slots in use
    curPage = pageNo;
    usedPtr = sizeof(data);                     // offset of used space in data array
    freeSpace = sizeof(data) + sizeof(slot_t);  // amount of space available
                                                // (initially one unused slot)
}

init()
getNextPage(), setNextPage()
getPrevPage(), setPrevPage()
insertRecord(), deleteRecord()
firstRecord(), nextRecord()
getRecord(), returnRecord()
available_space()
empty()

int HFPage::available_space(void)
{
    // look for an empty slot. if one exists, then freeSpace
    // bytes are available to hold a record.
    int i;
    for (i = 0; i < slotCnt; i++) {
        if (slot[i].length == EMPTY_SLOT)
            return freeSpace;
    }

    // no empty slot exists. must reserve sizeof(slot_t) bytes
    // from freeSpace to hold new slot.
    return freeSpace - sizeof(slot_t);
}
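The same free-space accounting can be followed through an insert. The sketch below is a simplified stand-in, not the HFPage class: MiniPage, DATA_SIZE and the EMPTY_SLOT value are assumptions, and the slot directory is kept in a separate vector (while still being charged against freeSpace) so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

const short EMPTY_SLOT = -1;               // assumed sentinel value
struct Slot { short offset; short length; };

struct MiniPage {
    static const int DATA_SIZE = 64;       // hypothetical page payload size
    char data[DATA_SIZE];
    std::vector<Slot> slots;               // slot directory (kept apart for clarity)
    int usedPtr = DATA_SIZE;               // offset of first used byte in data[]
    int freeSpace = DATA_SIZE;             // bytes still free in data[]

    // Bytes available for a new record: a new slot entry must be paid
    // for unless an empty slot can be reused.
    int availableSpace() const {
        for (const Slot& s : slots)
            if (s.length == EMPTY_SLOT) return freeSpace;
        return freeSpace - static_cast<int>(sizeof(Slot));
    }

    // Copies the record to the low end of the used area.
    // Returns the slot number, or -1 if the record does not fit.
    int insertRecord(const char* rec, int len) {
        if (len <= 0 || len > availableSpace()) return -1;
        int slotNo = -1;
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (slots[i].length == EMPTY_SLOT) { slotNo = static_cast<int>(i); break; }
        if (slotNo == -1) {                // no reusable slot: add one
            slots.push_back({0, EMPTY_SLOT});
            slotNo = static_cast<int>(slots.size()) - 1;
            freeSpace -= static_cast<int>(sizeof(Slot));
        }
        usedPtr -= len;
        std::memcpy(data + usedPtr, rec, len);
        slots[slotNo] = {static_cast<short>(usedPtr), static_cast<short>(len)};
        freeSpace -= len;
        return slotNo;
    }
};
```

Inserting a 5-byte record into an empty MiniPage creates slot 0, moves usedPtr from 64 to 59, and reduces freeSpace by 5 plus sizeof(Slot).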

Ordered Files
Also called a sequential file.
File records are kept sorted by the values of an ordering field.
Insertion is expensive: records must be inserted in the correct order.
–It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
A binary search can be used to search for a record on its ordering field value.
–This requires reading and searching about log2(b) blocks on average for a file of b blocks, an improvement over linear search.
Reading the records in order of the ordering field is quite efficient.
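The binary search over blocks can be sketched as follows. The file is modeled in memory as a vector of blocks, each holding sorted integer keys; the Block alias, findBlock and the blocksRead counter are illustrative names, and each access to file[mid] stands in for one block read.

```cpp
#include <cassert>
#include <vector>

using Block = std::vector<int>;   // sorted ordering-field values in one block

// Binary search over the blocks of an ordered file.
// Returns the index of the block holding `key`, or -1 if it is absent.
// blocksRead reports how many blocks were fetched.
int findBlock(const std::vector<Block>& file, int key, int& blocksRead) {
    blocksRead = 0;
    int lo = 0;
    int hi = static_cast<int>(file.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const Block& b = file[mid];          // one block access
        ++blocksRead;
        assert(!b.empty());
        if (key < b.front()) {
            hi = mid - 1;                    // key can only be in an earlier block
        } else if (key > b.back()) {
            lo = mid + 1;                    // key can only be in a later block
        } else {
            for (int k : b)                  // key falls in this block's range
                if (k == key) return mid;
            return -1;
        }
    }
    return -1;
}
```

For a file of 8 blocks, a search touches at most 4 blocks, against up to 8 for a linear scan.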

File of Ordered Records

Hashed Files
Hashing for disk files is called External Hashing.
The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
–Typically, a bucket corresponds to one disk block (or a fixed number of disk blocks).
One of the file fields is designated to be the hash key of the file.
The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function.
Search is very efficient on the hash key.
Collisions occur when a new record hashes to a bucket that is already full.
–An overflow file is kept for storing such records.
–Overflow records that hash to each bucket can be linked together.
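A minimal in-memory sketch of this scheme, assuming integer hash keys and h(K) = K mod M. The HashFile type and its members are hypothetical; the per-bucket overflow vector stands in for a linked chain of overflow records.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct HashFile {
    std::size_t bucketCapacity;                 // records per primary bucket
    std::vector<std::vector<int>> primary;      // M primary buckets
    std::vector<std::vector<int>> overflow;     // overflow chain per bucket

    HashFile(std::size_t M, std::size_t cap)
        : bucketCapacity(cap), primary(M), overflow(M) {
        assert(M > 0 && cap > 0);
    }

    // h(K) = K mod M; keys are assumed to be non-negative.
    std::size_t h(int key) const {
        return static_cast<std::size_t>(key) % primary.size();
    }

    // A record goes to its primary bucket, or to that bucket's
    // overflow chain when the primary bucket is already full.
    void insert(int key) {
        std::size_t i = h(key);
        if (primary[i].size() < bucketCapacity) primary[i].push_back(key);
        else overflow[i].push_back(key);
    }

    // Search looks only at bucket h(key) and its overflow chain.
    bool contains(int key) const {
        std::size_t i = h(key);
        for (int k : primary[i]) if (k == key) return true;
        for (int k : overflow[i]) if (k == key) return true;
        return false;
    }
};
```

With M = 4 and a capacity of 2, keys 1, 5 and 9 all hash to bucket 1, so 9 lands in that bucket's overflow chain.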

Hashed Files (contd.)

There are numerous methods for collision resolution, including the following:
–Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
–Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
–Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
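Open addressing is the simplest of the three to sketch. The version below uses linear probing with wrap-around over an in-memory table of integer keys; ProbingTable and its members are illustrative names, and deletion (which would need tombstones to keep probe sequences intact) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ProbingTable {
    struct Cell { bool used = false; int key = 0; };
    std::vector<Cell> cells;

    explicit ProbingTable(std::size_t n) : cells(n) { assert(n > 0); }

    // Hash address; keys are assumed to be non-negative.
    std::size_t h(int key) const {
        return static_cast<std::size_t>(key) % cells.size();
    }

    // Starting at the hash address, check subsequent positions in
    // order until an unused one is found.
    // Returns the position used, or -1 if the table is full.
    int insert(int key) {
        std::size_t start = h(key);
        for (std::size_t step = 0; step < cells.size(); ++step) {
            std::size_t pos = (start + step) % cells.size();
            if (!cells[pos].used) {
                cells[pos].used = true;
                cells[pos].key = key;
                return static_cast<int>(pos);
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an unused cell ends the search.
    int find(int key) const {
        std::size_t start = h(key);
        for (std::size_t step = 0; step < cells.size(); ++step) {
            std::size_t pos = (start + step) % cells.size();
            if (!cells[pos].used) return -1;
            if (cells[pos].key == key) return static_cast<int>(pos);
        }
        return -1;
    }
};
```

In a table of 5 cells, keys 3, 8 and 13 all hash to position 3; they end up at positions 3, 4 and 0 respectively.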

Hashed Files (contd.)
To reduce overflow records, a hash file is typically kept 70-80% full.
The hash function h should distribute the records uniformly among the buckets.
–Otherwise, search time will be increased because many overflow records will exist.
Main disadvantages of static external hashing:
–Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
–Ordered access on the hash key is quite inefficient (requires sorting the records).
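The 70-80% guideline translates directly into a choice of M. A small sketch of the arithmetic, with bucketsNeeded as a hypothetical helper and the fill target given as an integer percentage to avoid floating-point rounding:

```cpp
#include <cassert>
#include <cstddef>

// Smallest number of buckets M such that `records` records occupy at
// most fillPercent percent of the total capacity M * recordsPerBucket.
std::size_t bucketsNeeded(std::size_t records,
                          std::size_t recordsPerBucket,
                          std::size_t fillPercent) {
    assert(recordsPerBucket > 0 && fillPercent > 0 && fillPercent <= 100);
    std::size_t num = records * 100;
    std::size_t den = recordsPerBucket * fillPercent;
    return (num + den - 1) / den;   // integer ceiling of num / den
}
```

For 10000 records at 10 records per bucket, an 80% target gives M = 1250 and a 70% target gives M = 1429. Because M is fixed in static hashing, this sizing has to anticipate growth up front.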

Hashed Files - Overflow handling

Fill in This Table

                    Heap    Sorted    Hashed
scan
equality search
range search
insert
delete