CS4432: Database Systems II

CS4432: Database Systems II Indexing-Basics

Locating Records: Table Scans
Query: Select ID, name, address From R Where ID = 1000;
Naïve way: a table scan. Open the heap file of relation R, access each data page, access each record in each page, and check the condition.
(Figure: a heap file whose header page holds a directory of the data pages, Page 1 through Page N.)

Locating Records: Table Scans (Cont'd)
Question: What is the least amount of memory needed for a table scan?
Answer: only one memory block. Each data page is read into that block and its records are checked. The block is then reused for the next page.
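The following is a minimal Python sketch of the table scan described above. It assumes the heap file is modeled as a list of pages, with each page a list of record dicts. The names `table_scan` and `R` are hypothetical. Only the current page is examined at any time, which mirrors the one-block memory requirement.

```python
from collections.abc import Iterator

def table_scan(heap_file: list[list[dict]], attr: str, value) -> Iterator[dict]:
    """Yield (ID, name, address) for every record whose `attr` equals `value`."""
    for page in heap_file:             # access each data page (one page in memory)
        for record in page:            # access each record in the page
            if record[attr] == value:  # check the condition
                yield {k: record[k] for k in ("ID", "name", "address")}

# Hypothetical relation R stored as two data pages.
R = [
    [{"ID": 7, "name": "Ann", "address": "1 Elm St"}],
    [{"ID": 1000, "name": "Bo", "address": "2 Oak St"},
     {"ID": 3, "name": "Cy", "address": "3 Pine St"}],
]
matches = list(table_scan(R, "ID", 1000))
```

Every page is touched no matter where the match lies. This cost is what the index structures in the following slides try to avoid.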

Locating Records: Index Scan
Query: Select ID, name, address From R Where ID = 1000;
A table scan is always an available option, but it is not efficient, especially for large relations. Indexes and index scans are more efficient ways to locate records. Whether you can use them depends on whether an index exists on the queried column.

Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
Search key: the attribute or set of attributes used to look up records in a file. In the query Select ID, name, address From R Where ID = 1000; the search key is ID.
An index file consists of records, called index entries, of the form <search-key, pointer>.

Basic Concepts (Cont'd)
Index files are typically much smaller than the original file, because each index entry holds only a search key and a pointer.
Types of indexes: dense vs. sparse; primary vs. secondary; one-level vs. multi-level.

Index Evaluation Metrics
Savings come in access time and in the access types supported, for example:
Equality search (x = 100): records with a specified value in the attribute.
Range search (10 < x < 100): records with an attribute value falling in a specified range of values.
Overheads come in insertion time, deletion time, and space.
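Below is a small sketch of the two access types, assuming the index keys are held in a sorted Python list. The function names are hypothetical. Python's standard `bisect` module finds the boundaries of an equality match and of an open range such as 10 < x < 100.

```python
import bisect

def equality_search(sorted_keys, x):
    """Positions of all keys equal to x (equality search)."""
    lo = bisect.bisect_left(sorted_keys, x)
    hi = bisect.bisect_right(sorted_keys, x)
    return list(range(lo, hi))

def range_search(sorted_keys, low, high):
    """Keys k with low < k < high (an open range, as in 10 < x < 100)."""
    lo = bisect.bisect_right(sorted_keys, low)   # first key strictly greater than low
    hi = bisect.bisect_left(sorted_keys, high)   # first key greater than or equal to high
    return sorted_keys[lo:hi]

keys = [5, 10, 20, 50, 100, 100, 120]            # hypothetical sorted index keys
```

Both searches need only logarithmic work to find their boundaries, and they do so because the keys are sorted.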

Sequential Files & Primary Indexes
Sequential file: a file whose records are ordered on the indexed column.

Dense Index on Ordered File
Ordered (sequential) file: records are stored sorted on the indexed attribute.
Dense index: has one entry for each data tuple.
(Figure: a dense index with keys 10 through 120, each pointing to its record in a sequential file stored two records per block.)

Dense Index on Ordered File (Cont'd)
The number of entries in the index equals the number of records in the file. The index is still much smaller than the file, because each entry stores only a key and a pointer rather than the whole record.
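Here is a minimal sketch of building a dense index over a sorted file. It assumes the file is modeled as a list of blocks of keys and that a record ID is a hypothetical `(block_no, slot)` pair. It produces exactly one entry per record.

```python
def build_dense_index(blocks):
    """Return one (key, record_id) entry per record; record_id = (block_no, slot)."""
    index = []
    for block_no, block in enumerate(blocks):
        for slot, key in enumerate(block):
            index.append((key, (block_no, slot)))
    return index  # already sorted, because the file is sorted on the key

# Sequential file, two records per block: 10 20 | 30 40 | 50 60 | 70 80 | 90 100 | 110 120
blocks = [[k, k + 10] for k in range(10, 130, 20)]
dense = build_dense_index(blocks)
```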

Dense Index: Locate Key = 100
Option 1, index scan: read each page of the index, search for key = 100, then follow the pointer (record ID) to the record.
Option 2, binary search: all keys in the index are sorted, so read the middle page of the index. Either you find the key there, or you move up or down and repeat.
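The binary-search option can be sketched in Python as follows. For simplicity, the sketch works on individual index entries rather than whole index pages, and `dense_lookup` is a hypothetical name. Because the index is dense, a miss in the index means the key is not in the file.

```python
def dense_lookup(index, key):
    """Binary search over a dense index of (key, record_id) pairs sorted by key."""
    lo, hi = 0, len(index) - 1
    while lo <= hi:
        mid = (lo + hi) // 2              # read the middle entry
        mid_key, record_id = index[mid]
        if mid_key == key:
            return record_id              # follow this pointer to the record
        if mid_key < key:
            lo = mid + 1                  # move toward larger keys
        else:
            hi = mid - 1                  # move toward smaller keys
    return None                           # dense index: not in index means not in file

# Dense index for keys 10, 20, ..., 120 stored two records per block.
index = [(k, (i // 2, i % 2)) for i, k in enumerate(range(10, 130, 10))]
```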

Sparse Index on Ordered File
Sparse index: an entry for only the first record in each data block. A sparse index is smaller than a dense index.
(Figure: a sparse index with keys 10, 30, 50, and so on up to 230, each pointing to a block of the sequential file; each block holds two records, e.g. 10 and 20.)
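This is a minimal sketch of building the sparse index from the figure, under the same assumed list-of-blocks model. It emits one `(first_key, block_no)` entry per data block instead of one per record.

```python
def build_sparse_index(blocks):
    """Return one (first_key, block_no) entry per non-empty data block."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks) if block]

# Sequential file of 12 blocks, two records each, from 10 20 up to 230 240.
blocks = [[k, k + 10] for k in range(10, 250, 20)]
sparse = build_sparse_index(blocks)
```

With two records per block, the sparse index has half as many entries as a dense index over the same file.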

Sparse Index on Ordered File (Cont'd)
Can we build a sparse index on an unordered file? No. Sparse indexes can be built only on ordered (sequential) files.

Sparse Index: Locate Key = 100
Binary search over the index still works. Either locate the search key in the index, or locate the largest key smaller than the search key. Then follow the pointer and check the data block.
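The lookup rule can be sketched as a binary search for the largest first key that is less than or equal to the search key, followed by a single data-block read. The list-of-blocks model and the name `sparse_lookup` are assumptions made for illustration.

```python
def sparse_lookup(sparse_index, blocks, key):
    """Return (block_no, slot) of `key`, or None if it is not in the file."""
    lo, hi, candidate = 0, len(sparse_index) - 1, None
    while lo <= hi:                       # binary search for the largest first_key <= key
        mid = (lo + hi) // 2
        first_key, block_no = sparse_index[mid]
        if first_key <= key:
            candidate = block_no          # best so far; keep looking to the right
            lo = mid + 1
        else:
            hi = mid - 1
    if candidate is None:
        return None                       # key is smaller than every key in the file
    block = blocks[candidate]             # one data-block read
    return (candidate, block.index(key)) if key in block else None

blocks = [[k, k + 10] for k in range(10, 250, 20)]              # 10 20 | 30 40 | ...
sparse_index = [(block[0], b) for b, block in enumerate(blocks)]
```

Unlike the dense case, a miss is only confirmed after the candidate data block has been read.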

Multi-Level Index
The 1st-level index file is just a file with sorted keys, so we can build a 2nd-level index on top of it.
(Figure: a sparse 2nd-level index with keys 10, 90, 170, 250, 330, 410, 490, 570 over the 1st-level index.)
Questions: Is the index file always sorted? Is the 2nd level sparse or dense? Can it be dense?

Multi-Level Index (Cont'd)
Is the index file always sorted? Yes.
Is the 2nd level sparse or dense? Can it be dense? The 2nd, 3rd, and higher levels have to be sparse. A dense upper level would have as many entries as the level below it, so there would be no savings.
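The following sketch builds a sparse 2nd level over a 1st-level index and searches both levels. The parameters are assumptions chosen to reproduce the figure's 2nd-level keys (10, 90, 170, through 570). The 1st level is taken to be a sparse index whose keys step by 20, and each index block is taken to hold 4 entries. All function names are hypothetical.

```python
import bisect

def build_upper_level(entries, entries_per_block):
    """Sparse level over a sorted list of (key, ptr): one entry per index block."""
    return [(entries[i][0], i // entries_per_block)
            for i in range(0, len(entries), entries_per_block)]

def floor_entry(entries, key):
    """Return the entry with the largest key <= key, or None."""
    i = bisect.bisect_right([k for k, _ in entries], key) - 1
    return entries[i] if i >= 0 else None

def two_level_lookup(level2, level1, entries_per_block, key):
    """Use level 2 to pick one level-1 block, then that block to pick a data block."""
    top = floor_entry(level2, key)
    if top is None:
        return None
    start = top[1] * entries_per_block
    hit = floor_entry(level1[start:start + entries_per_block], key)
    return hit[1] if hit else None        # number of the data block to read

PER_BLOCK = 4                                                 # assumed entries per index block
level1 = [(k, b) for b, k in enumerate(range(10, 650, 20))]   # sparse 1st level: 10, 30, 50, ...
level2 = build_upper_level(level1, PER_BLOCK)
```

Each level reads one index block. A lookup therefore costs one block per level plus one data block, instead of a binary search over the entire 1st level.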

Index without Pointers
Note: if the file is contiguous, we can omit the pointers. The index starts with a pointer to the first block, followed by a list of keys (one for each block). If we need key = K3, it is the 3rd key, so we check the 3rd block, located at: first pointer + (3 - 1) * 1024, where 1024 is the block size.
CS 4432 lecture #8
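The address arithmetic can be sketched as follows. The sketch assumes the 1024 in the slide's formula is the block size in bytes, and it uses a hypothetical starting address. Each stored key is taken to be the first key of its block, so the block number is the position of the largest key that is less than or equal to the search key.

```python
import bisect

BLOCK_SIZE = 1024   # block size from the slide's formula (assumed to be bytes)

def block_address(first_block_addr, keys, key):
    """keys[i] is the first key of block i + 1; the index stores no per-entry pointers."""
    n = bisect.bisect_right(keys, key)    # 1-based number of the block that may hold key
    if n == 0:
        return None                       # key is smaller than every key in the file
    return first_block_addr + (n - 1) * BLOCK_SIZE

keys = [10, 30, 50, 70]                   # hypothetical first keys of blocks 1 to 4
```

For example, a search key of 55 falls in the 3rd block, so with a first block at address 8192 the block is at 8192 + (3 - 1) * 1024 = 10240.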

Sparse vs. Dense Indexes
Sparse: less space; better for insertion; only for sorted files (or higher-level indexes).
Dense: more space; must be used for unsorted files (secondary indexes); can tell whether a record exists without checking the data file.

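Files with Duplicate Keys: Dense Index
There is an entry in the index for each record, so each duplicate key is repeated in the index. This wastes too much space.
(Figure: a dense index that repeats each duplicate key once per record, over a sequential file with duplicate keys 10, 20, 30, 40, 45.)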

Files with Duplicate Keys: Dense Index (Compact Design)
There is an entry in the index for each distinct value, pointing to the first record with that value.
How do we locate key 35? It does not exist in the index, and the index is dense, so there is no need to search the data file.
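Here is a minimal sketch of the compact design under the same assumed list-of-blocks model. The sample file is hypothetical. The index keeps one entry per distinct key, pointing at that key's first record. Because the index is dense, a membership check such as key 35 never touches the data file.

```python
import bisect

def build_compact_dense_index(blocks):
    """One (key, record_id) entry per DISTINCT key, pointing at its first occurrence."""
    index = []
    for block_no, block in enumerate(blocks):
        for slot, key in enumerate(block):
            if not index or index[-1][0] != key:   # first time we see this key (file is sorted)
                index.append((key, (block_no, slot)))
    return index

def in_file(index, key):
    """A dense index answers 'does key exist?' without reading the data file."""
    keys = [k for k, _ in index]
    i = bisect.bisect_left(keys, key)
    return i < len(keys) and keys[i] == key

blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]   # sorted, with duplicates
index = build_compact_dense_index(blocks)
```

To fetch all duplicates of a key, follow its pointer and scan forward through the sorted file until the key changes.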

Files with Duplicate Keys: Sparse Index
There is one index entry for the first record in each block.
Be careful when looking for 20 or 30: records with that key may begin in the block before the one whose index entry equals the key.
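One way to handle this pitfall is sketched below, on a hypothetical sorted file with duplicates. The scan starts at the last block whose first key is strictly smaller than the search key and continues while block first keys do not exceed it. A naive lookup of 30 that started at the block whose index entry is 30 would miss the 30 stored at the end of the previous block.

```python
import bisect

def sparse_find_all(blocks, key):
    """Return (block_no, slot) for every record with `key` in a sorted file."""
    first_keys = [blk[0] for blk in blocks]   # the sparse index: first key of each block
    # Start one block before the first index entry >= key (duplicates may begin there).
    start = max(bisect.bisect_left(first_keys, key) - 1, 0)
    hits = []
    for b in range(start, len(blocks)):
        if first_keys[b] > key:
            break                             # later blocks only hold larger keys
        hits.extend((b, s) for s, k in enumerate(blocks[b]) if k == key)
    return hits

blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]
```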

Sequential (Ordered) File Insertion/Deletion

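Sparse Index: Deletion
Delete record 40: the index requires no reorganization. The data block will have some empty space, which is good to have for later insertions.
(Figure: a sparse index 10, 30, 50, 70, 90, 110, 130, 150 over data blocks 10 20 | 30 40 | 50 60 | 70 80.)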

Sparse Index: Deletion
Delete record 30: the value 30 in the index changes to 40, the new first key of that block. Record 40 may or may not move within its block.

Sparse Index: Deletion
Delete records 30 and 40: in the data file, block 2 is deleted. In the index file, do not create empty slots in the middle; shift the following entries (50, 70, and so on) up. Empty slots are allowed at the end.
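The three deletion cases can be sketched in one function. The sketch assumes distinct keys, a sparse index held as a sorted list of `(first_key, block_no)` tuples, and data blocks held in a dict keyed by block number. The name `sparse_delete` is hypothetical. Removing a key from a Python list shifts the later records within that block, but no record moves across blocks.

```python
import bisect

def sparse_delete(index, blocks, key):
    """Delete the record with `key`; return True if it was found."""
    pos = bisect.bisect_right([k for k, _ in index], key) - 1
    if pos < 0:
        return False                   # key is smaller than every key in the file
    b = index[pos][1]
    block = blocks[b]
    if key not in block:
        return False
    block.remove(key)                  # frees space in this block only
    if not block:                      # block became empty: delete it and its index entry
        del blocks[b]
        index.pop(pos)                 # later entries shift up, so no hole in the middle
    else:
        index[pos] = (block[0], b)     # the block's first key may have changed (30 -> 40)
    return True

index = [(10, 0), (30, 1), (50, 2), (70, 3)]
blocks = {0: [10, 20], 1: [30, 40], 2: [50, 60], 3: [70, 80]}
```

Deleting 30 changes the entry 30 to 40. Deleting 40 afterwards empties block 1, so that entry is removed and the index becomes 10, 50, 70.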

Dense Index: Deletion
Delete record 30: the same ideas and mechanisms apply. Dense indexes may trigger more updates in the index. Record 40 may or may not move within its data block. The index cannot have free slots in the middle.

Sparse Index: Insertion
Insert record 34. It is good to have free space in each data block, especially if the file is ordered. DBMSs may keep x% of each block free (e.g., 10%) to make insertions easier.
Our lucky day: we have free space exactly where we need it!

Sparse Index: Insertion
Insert record 15.
Approach 1 (immediate reorganization): move data records within a block or across blocks to make space for the new record.
Are there other, cheaper variations?

Use of Overflow Blocks
Insert record 25: place it in an overflow block attached to the data block where it belongs, and reorganize later. The index does not change.
What about inserting 15 instead of 25? Record 20 moves to the overflow block and 15 takes its place in the data block. Still, the index does not change.
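The overflow approach can be sketched as follows. The sketch assumes two records per block, as in the figures, and models each data block as a hypothetical dict with sorted `records` and an `overflow` list. The new key is inserted into the block in sorted order. If the block is then too full, its largest record is pushed to the overflow block. This covers both cases on the slide: 25 itself overflows, whereas inserting 15 pushes 20 out.

```python
import bisect

CAPACITY = 2  # records per data block (assumed, matching the figures)

def insert_with_overflow(index, blocks, key):
    """Insert `key` into its data block, spilling the largest record to overflow."""
    keys = [k for k, _ in index]
    pos = max(bisect.bisect_right(keys, key) - 1, 0)   # block whose key range covers key
    b = index[pos][1]
    records = blocks[b]["records"]
    bisect.insort(records, key)                # keep the data block sorted
    if len(records) > CAPACITY:
        # Block is full: move its largest record to the overflow block.
        bisect.insort(blocks[b]["overflow"], records.pop())
    index[pos] = (records[0], b)               # changes only if key precedes every key in the file

index = [(10, 0), (30, 1), (50, 2)]
blocks = {b: {"records": [k, k + 10], "overflow": []} for k, b in index}
```

Lookups must now also check a block's overflow list. This extra cost is why the slide suggests reorganizing later.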

Insertion, dense index case: similar, but often more expensive.

Remember: Primary Index
A primary index is an index on the ordering column, the column on which the data file is sorted. It can be dense or sparse, and it can be one-level or multi-level.
Big advantage: records having the same key (or adjacent keys) are in the same (or adjacent) data blocks, which leads to sequential I/Os.

Back to the Bigger Picture

SQL Query
Select ID, name, address From R Where ID = 100;
Assume an index is built on column ID.
(Figure: a 2nd-level index over a 1st-level index over relation R, with each of the three stored as its own heap file.)

Unordered Files & Secondary Indexes
Unordered file: a file whose records are not ordered on the indexed column.

Secondary Indexes
Can we build a sparse index on an unordered column? No. We must have an index entry for each data record.
The file may be ordered on another column, say Name. An index on the Name column is a primary index, and it can be sparse or dense. An index on any other column, say ID, is called a secondary index, and it has to be dense.

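Secondary Indexes: A Sparse Index Does Not Make Sense!
(Figure: a sparse index holding the first key of each block of an unordered file. The keys are not ordered across blocks, so the index cannot tell which block holds a given key.)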

Secondary Indexes
A secondary index has an index entry for each data record.
(Figure: a sorted dense index 10, 20, 30, 40, and so on, pointing into an unordered file with keys 50, 30, 70, 20, 40, 80, 10, 100, 60, 90.)
The pointers cause random I/Os, even for the same or adjacent values.
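Here is a minimal sketch of building a secondary (dense) index. It assumes the figure's unordered file is stored two records per block, and the function names are hypothetical. It sorts one `(key, record_id)` entry per record and then lists the data blocks a small range query would visit. For brevity, the range filter scans the whole index linearly.

```python
def build_secondary_index(blocks):
    """Dense index: one (key, (block_no, slot)) entry per record, sorted by key."""
    entries = [(key, (b, s)) for b, block in enumerate(blocks) for s, key in enumerate(block)]
    entries.sort()
    return entries

def blocks_for_range(index, low, high):
    """Data blocks visited, in order, to fetch every key in [low, high]."""
    return [rid[0] for key, rid in index if low <= key <= high]

blocks = [[50, 30], [70, 20], [40, 80], [10, 100], [60, 90]]   # unordered on the key
idx = build_secondary_index(blocks)
```

The four adjacent keys 10, 20, 30, 40 land in four different blocks (3, 1, 0, 2). This is the random-I/O cost the slide warns about.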

Multi-Level Secondary Indexes
The lowest level is dense, and the other levels are sparse. The 2nd level can be sparse because the 1st-level index is a sorted file.
(Figure: a sparse 2nd-level index 10, 50, 90 over a dense 1st-level index 10, 20, 30, 40, 50, 60, 70, over the unordered data file.)

Duplicate Values & Secondary Indexes
(Figure: an unordered file containing repeated key values 10, 20, 30, and 40.)

Option 1: Follow the Rules
Keep one index entry per data record, repeating each duplicate key. Problem: excess overhead in disk space and search time, since repeated keys can be many.

Option 2: Variable-Size Index Entries
Store each key once, followed by a variable-size list of pointers to its records. Problem: variable-size records are harder to store, slower to read, and need more metadata.

Option 3: Indirection
The index has one entry for each distinct key, so each distinct value is stored once, which saves space. Each value points to a bucket of pointers to the records holding that value, and the buckets hold one entry for each data record.
Can we build a 2nd-level index now? How? Yes. The index has fixed-size entries with unique, sorted keys, so a sparse 2nd level can be built on top of it.
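The indirection scheme can be sketched as follows on a hypothetical unordered file. The index holds fixed-size `(key, bucket_no)` entries, one per distinct key. Each bucket holds the variable-length list of record pointers for that key. The function and variable names are assumptions for illustration.

```python
from collections import defaultdict

def build_indirect_index(blocks):
    """Return (index, buckets): index entries are (key, bucket_no), one per distinct key."""
    pointers = defaultdict(list)
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            pointers[key].append((b, s))          # one pointer per data record
    index, buckets = [], []
    for key in sorted(pointers):
        index.append((key, len(buckets)))         # fixed-length entry: key + bucket number
        buckets.append(pointers[key])             # variable-length list lives in the bucket
    return index, buckets

blocks = [[10, 20], [40, 20], [40, 10], [40, 10], [40, 30]]   # unordered, with duplicates
index, buckets = build_indirect_index(blocks)
```

The index itself is sorted with unique keys, so it can serve as the dense lowest level beneath a sparse 2nd level.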

Example: a secondary index (with record pointers) on a nonkey field, implemented using one level of indirection so that index entries are of fixed length and have unique field values.

Example: a two-level primary index.