Storage and Indexing, February 26th, 2003, Lecture 19

Storage and Indexing
How do we efficiently store large amounts of data? The appropriate storage depends on the kinds of access we expect to make to the data. We consider:
– primary storage of the data
– additional indexes (very, very important).

Cost Model for Our Analysis
As a good approximation, we ignore CPU costs:
– B: number of data pages
– R: number of records per page
– D: (average) time to read or write a disk page
– Measuring the number of page I/Os ignores the gains of pre-fetching blocks of pages; thus, even the I/O cost is only approximated.
– Average-case analysis, based on several simplistic assumptions.

File Organizations and Assumptions
Heap files:
– Equality selection on key; exactly one match.
– Insert always at the end of the file.
Sorted files:
– Files compacted after deletions.
– Selections on sort field(s).
Hashed files:
– No overflow buckets, 80% page occupancy.
Single-record insert and delete.

Cost of Operations
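The per-operation costs follow from the cost model and the assumptions above. As a hedged sketch (the function names and the `matching_pages` parameter are hypothetical, and only page I/Os times D are counted), the usual estimates can be written out as:

```python
import math

# Hypothetical helpers: page-I/O cost estimates for the three file
# organizations under the assumptions above. B = number of data pages,
# D = average time per page read/write; CPU costs are ignored.

def heap_costs(B: int, D: float) -> dict:
    search = 0.5 * B * D                    # exactly one match: scan half the file on average
    return {
        "scan": B * D,                      # read every page
        "equality": search,
        "range": B * D,                     # matches may be anywhere: full scan
        "insert": 2 * D,                    # read the last page, write it back
        "delete": search + D,               # find the record, write its page back
    }

def sorted_costs(B: int, D: float, matching_pages: int = 1) -> dict:
    search = D * math.log2(B)               # binary search over the B pages
    return {
        "scan": B * D,
        "equality": search,
        "range": search + matching_pages * D,
        "insert": search + B * D,           # shift the later half: read + write B/2 pages
        "delete": search + B * D,           # compaction shifts records the same way
    }

def hashed_costs(B: int, D: float) -> dict:
    return {
        "scan": 1.25 * B * D,               # 80% occupancy: 1.25 * B pages on disk
        "equality": D,                      # no overflow buckets: one bucket page
        "range": 1.25 * B * D,              # hashing gives no help for ranges
        "insert": 2 * D,                    # read the bucket page, write it back
        "delete": 2 * D,                    # find (D), then write the page back (D)
    }

print(heap_costs(1024, 1.0)["equality"], sorted_costs(1024, 1.0)["equality"],
      hashed_costs(1024, 1.0)["equality"])
```

With B = 1024 pages, an equality search costs about 512 page I/Os on a heap file, 10 on a sorted file and 1 on a hashed file, which is the contrast these assumptions are meant to expose.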

Indexes
"If you don't find it in the index, look very carefully through the entire catalog." -- Sears, Roebuck and Co., Consumer's Guide, 1897.

Indexes
An index on a file speeds up selections on the search key fields for the index.
– Any subset of the fields of a relation can be the search key for an index on the relation.
– A search key is not the same as a key (a minimal set of fields that uniquely identify a record in a relation).
An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.

Alternatives for Data Entry k* in Index
Three alternatives:
1. Data record with key value k
2. <k, rid of data record with search key value k>
3. <k, list of rids of data records with search key k>
The choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k.
– Examples of indexing techniques: B+ trees, hash-based structures.

Alternatives for Data Entries (2)
Alternative 1:
– If this is used, the index structure is a file organization for data records (like heap files or sorted files).
– At most one index on a given collection of data records can use Alternative 1. (Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency.)
– If data records are very large, the number of pages containing data entries is high. This typically implies that the size of the auxiliary information in the index is also large.

Alternatives for Data Entries (3)
Alternatives 2 and 3:
– Data entries are typically much smaller than data records, so these alternatives beat Alternative 1 when data records are large, especially if search keys are small.
– If more than one index is required on a given file, at most one index can use Alternative 1; the rest must use Alternatives 2 or 3.
– Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length.
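To make the three alternatives concrete, here is a minimal sketch over a hypothetical four-record heap file with 2 records per page, indexing on age. The `Record` type and the (page, slot) rid format are assumptions for illustration, not a real DBMS layout:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:            # hypothetical (name, age, sal) record
    name: str
    age: int
    sal: int

# A tiny heap file with 2 records per page; a rid is (page number, slot number).
records = [Record("bob", 12, 10), Record("cal", 11, 80),
           Record("joe", 12, 20), Record("sue", 13, 75)]
RECORDS_PER_PAGE = 2
rids = [(i // RECORDS_PER_PAGE, i % RECORDS_PER_PAGE) for i in range(len(records))]

# Alternative 1: the data entries ARE the data records (kept sorted by age).
alt1 = sorted(records, key=lambda r: r.age)

# Alternative 2: one <k, rid> entry per data record.
alt2 = sorted((r.age, rid) for r, rid in zip(records, rids))

# Alternative 3: one <k, list of rids> entry per distinct key value.
groups = defaultdict(list)
for r, rid in zip(records, rids):
    groups[r.age].append(rid)
alt3 = sorted(groups.items())

print(len(alt1), len(alt2), len(alt3))
```

Alternative 3 stores 3 entries instead of 4, but the entry for age 12 carries two rids, so its entries are variable-sized even though the key is a fixed-size integer.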

Index Classification
Primary vs. secondary: if the search key contains the primary key, the index is called a primary index.
Clustered vs. unclustered: if the order of data records is the same as, or "close to", the order of data entries, the index is called a clustered index.
– Alternative 1 implies clustered, but not vice versa.
– A file can be clustered on at most one search key.
– The cost of retrieving data records through an index varies greatly depending on whether the index is clustered or not!

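Clustered vs. Unclustered Index
[Figure: index entries in the index file direct the search to data entries, which point to data records in the data file; shown for a CLUSTERED index (data records in the same order as the data entries) and an UNCLUSTERED index.]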
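The cost gap between clustered and unclustered retrieval can be seen by counting page fetches. The sketch below assumes a hypothetical file of 1,000 records with integer keys, 10 records per page, and a buffer that holds only one page; the index hands back data entries in key order and we count how often a new page must be read:

```python
import random

RECORDS_PER_PAGE = 10
N = 1000  # hypothetical file: records with integer keys 0 .. N-1

def page_reads(file_order, lo, hi):
    """Page I/Os to fetch every record with lo <= key <= hi through an index
    (data entries come back in key order), assuming a one-page buffer."""
    page_of = {key: pos // RECORDS_PER_PAGE for pos, key in enumerate(file_order)}
    reads, current = 0, None
    for key in range(lo, hi + 1):
        page = page_of[key]
        if page != current:        # the next record lives on another page
            reads += 1
            current = page
    return reads

clustered_file = list(range(N))               # records stored in key order
unclustered_file = list(range(N))
random.Random(0).shuffle(unclustered_file)    # records stored in arbitrary order

print(page_reads(clustered_file, 100, 199), page_reads(unclustered_file, 100, 199))
```

For 100 matching records the clustered file needs exactly 10 page reads, while the unclustered one needs close to one read per record.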

Index Classification (Contd.)
Dense vs. sparse: if there is at least one data entry per search key value (in some data record), the index is dense.
– Alternative 1 always leads to a dense index.
– Every sparse index is clustered!
– Sparse indexes are smaller.
[Figure: a data file of (name, age, sal) records sorted by name: Ashby, 25, 3000; Basu, 33, 4003; Bristow, 30, 2007; Cass, 50, 5004; Daniels, 22, 6003; Jones, 40, 6003; Smith, 44, 3000; Tracy, 44, 5004. A sparse index on Name holds Ashby, Cass, Smith; a dense index on Age holds one entry per record.]
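A minimal sketch of the two index kinds over the example data, assuming (our assumption) that the file is sorted by name with 3 records per page. The sparse index keeps only the first name on each page; the dense index keeps one <age, rid> entry per record:

```python
import bisect

# The example data file, sorted by name, 3 records per page (page layout assumed).
pages = [
    [("Ashby", 25, 3000), ("Basu", 33, 4003), ("Bristow", 30, 2007)],
    [("Cass", 50, 5004), ("Daniels", 22, 6003), ("Jones", 40, 6003)],
    [("Smith", 44, 3000), ("Tracy", 44, 5004)],
]

# Sparse index on name: one entry per page. Only works because the file
# is sorted (clustered) on name.
sparse_name = [page[0][0] for page in pages]

# Dense index on age: one <age, rid> entry per record, rid = (page, slot).
dense_age = sorted((rec[1], (p, s)) for p, page in enumerate(pages)
                   for s, rec in enumerate(page))

def lookup_name(name):
    # Last page whose first name is <= name, then scan that one page.
    p = bisect.bisect_right(sparse_name, name) - 1
    if p < 0:
        return None
    return next((rec for rec in pages[p] if rec[0] == name), None)

def lookup_age(age):
    # (age,) sorts before every (age, rid) entry, so this finds the first match.
    i = bisect.bisect_left(dense_age, (age,))
    out = []
    while i < len(dense_age) and dense_age[i][0] == age:
        p, s = dense_age[i][1]
        out.append(pages[p][s])
        i += 1
    return out
```

The sparse index has 3 entries for 8 records, which is why it is smaller; a sparse index on age would not work here, because the file is not sorted by age.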

Index Classification (Contd.)
Composite search keys: search on a combination of fields.
– Equality query: every field value is equal to a constant value. E.g., w.r.t. a <sal,age> index: age=20 and sal=75.
– Range query: some field value is not a constant. E.g.: age=20; or age=20 and sal > 10.
[Figure: Examples of composite key indexes using lexicographic order. Data records (name, age, sal), sorted by name: bob, 12, 10; cal, 11, 80; joe, 12, 20; sue, 13, 75. Data entries in an index on <age,sal>: 11,80; 12,10; 12,20; 13,75. Data entries in an index on <sal,age>: 10,12; 20,12; 75,13; 80,11.]
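Python tuples compare lexicographically, so a sorted list of tuples is a convenient stand-in for a composite-key index. This sketch uses the four example records; the helper names `equality` and `prefix_range` are hypothetical:

```python
import bisect

# (name, age, sal) records from the example above.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

sal_age = sorted((sal, age) for _, age, sal in records)   # index on <sal,age>
age_sal = sorted((age, sal) for _, age, sal in records)   # index on <age,sal>

def equality(index, key):
    """All entries equal to a full composite key, e.g. <sal,age> = (75, 13)."""
    i = bisect.bisect_left(index, key)
    j = bisect.bisect_right(index, key)
    return index[i:j]

def prefix_range(index, first, lo_second=float("-inf"), hi_second=float("inf")):
    """Entries whose first field equals `first` and whose second field lies in
    (lo_second, hi_second]; they form one contiguous run in lexicographic order."""
    i = bisect.bisect_right(index, (first, lo_second))
    j = bisect.bisect_right(index, (first, hi_second))
    return index[i:j]

print(equality(sal_age, (75, 13)), prefix_range(age_sal, 12, 10))
```

A range query such as age = 12 and sal > 10 is one contiguous run on the <age,sal> index. On the <sal,age> index, age is not a prefix of the key, so its matches are scattered and the index cannot narrow the search in the same way.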

Tree-Based Indexes
"Find all students with gpa > 3.0"
– If the data is in a sorted file, do binary search to find the first such student, then scan to find the others.
– The cost of binary search can be quite high, since every probe is a page I/O on the (large) data file.
Simple idea: create an "index" file.
– Can do binary search on the (smaller) index file!
[Figure: an index file with keys k1, k2, ..., kN, one per page of the data file (Page 1, Page 2, Page 3, ..., Page N).]
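The "index file" idea can be sketched with a hypothetical sorted data file of (gpa, name) records, 2 per page. The index file holds the first key of each page; binary search over it picks the starting page, and a forward scan collects the matches:

```python
import bisect

# Hypothetical sorted data file of (gpa, name) records, 2 records per page.
pages = [
    [(2.1, "ann"), (2.5, "bo")],
    [(2.9, "cy"), (3.0, "di")],
    [(3.2, "ed"), (3.6, "flo")],
    [(3.8, "gus"), (4.0, "hal")],
]

# Index file: the first key of each data page (k1, k2, ..., kN).
index_file = [page[0][0] for page in pages]

def gpa_greater_than(threshold):
    # Binary search the (small) index file for the last page whose first key
    # is <= threshold; earlier pages cannot hold a match.
    start = max(bisect.bisect_right(index_file, threshold) - 1, 0)
    result = []
    # The file is sorted, so all matches lie on this page or later ones.
    for page in pages[start:]:
        result.extend(name for gpa, name in page if gpa > threshold)
    return result

print(gpa_greater_than(3.0))
```

Only the index file is binary-searched; the data file is touched starting at one page and then read sequentially.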

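Tree-Based Indexes (2)
Index entry layout of a non-leaf page: $P_0, K_1, P_1, K_2, P_2, \dots, K_m, P_m$
[Figure: a tree index whose root leads, through non-leaf pages, to leaf pages holding the data entries 10*, 15*, 20*, 27*, 33*, 37*, 40*, 46*, 51*, 55*, 63*, 97*.]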

B+ Tree: The Most Widely Used Index
Insert/delete at $\log_F N$ cost; keep the tree height-balanced ($F$ = fanout, $N$ = number of leaf pages).
Minimum 50% occupancy (except for the root): each node contains $d \le m \le 2d$ entries. The parameter $d$ is called the order of the tree.
[Figure: a B+ tree with index entries in the upper levels, starting at the root, and data entries in the leaves.]

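Example B+ Tree
Search begins at the root, and key comparisons direct it to a leaf.
Search for 5*, 15*, all data entries >= 24* ...
[Figure: example B+ tree with key 13 in the root and the data entries 3*, 5*, 7*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39* in the leaves.]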
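A minimal search sketch over a two-level tree. The root keys 13, 17, 24, 30 and the leaf entry 2* are our assumptions about the example figure, and the `Leaf`/`Inner` classes are hypothetical. Pointer $P_i$ leads to keys $K$ with $K_i \le K < K_{i+1}$, and range scans follow leaf sibling pointers:

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # sibling pointer used by range scans

@dataclass
class Inner:
    keys: List[int]                 # K1 .. Km
    children: list                  # P0 .. Pm

def find_leaf(node, k):
    # Descend: child i covers keys in [K_i, K_(i+1)).
    while isinstance(node, Inner):
        node = node.children[bisect.bisect_right(node.keys, k)]
    return node

def search(root, k):
    return k in find_leaf(root, k).keys

def range_from(root, k):
    """All data entries >= k: find the leaf, then follow sibling pointers."""
    leaf, out = find_leaf(root, k), []
    while leaf is not None:
        out.extend(x for x in leaf.keys if x >= k)
        leaf = leaf.next
    return out

leaves = [Leaf([2, 3, 5, 7]), Leaf([14, 16]), Leaf([19, 20, 22]),
          Leaf([24, 27, 29]), Leaf([33, 34, 38, 39])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Inner([13, 17, 24, 30], leaves)

print(search(root, 5), search(root, 15), range_from(root, 24))
```

The search for 15* ends in the leaf holding 14* and 16* and reports no match; the range search lands in the leaf starting at 24* and then walks the leaf chain to the right.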

B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
Typical capacities:
– Height 4: $133^4$ = 312,900,721 records
– Height 3: $133^3$ = 2,352,637 records
Can often hold the top levels in the buffer pool:
– Level 1 = 1 page = 8 KB
– Level 2 = 133 pages ≈ 1 MB
– Level 3 = 17,689 pages ≈ 138 MB
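The capacity and buffer-pool figures are simple powers of the fanout. This sketch reproduces them, assuming the slide's conventions (capacity = fanout to the power of the height, 8 KB pages); `height_for` is a hypothetical helper computing $\lceil \log_F N \rceil$ with integer arithmetic to avoid floating-point rounding:

```python
def capacity(fanout: int, height: int) -> int:
    # Records reachable with `height` levels of the given fanout (slide convention).
    return fanout ** height

def height_for(n: int, fanout: int) -> int:
    # Smallest h with fanout**h >= n, i.e. ceil(log_F n), using integers only.
    h, reach = 0, 1
    while reach < n:
        reach *= fanout
        h += 1
    return h

PAGE_KB = 8
for level in range(1, 4):
    pages = 133 ** (level - 1)
    print(f"Level {level}: {pages} pages = {pages * PAGE_KB} KB")
```

Level 3 comes to 17,689 × 8 KB = 141,512 KB, about 138 MB, which is still small enough to keep in a typical buffer pool.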

Inserting a Data Entry into a B+ Tree
Find the correct leaf L.
Put the data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2):
  – redistribute entries evenly, copy up the middle key;
  – insert an index entry pointing to L2 into the parent of L.
This can happen recursively.
– To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
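The difference between copying up and pushing up is easiest to see in isolation. A minimal sketch of the two split rules (the helper names are hypothetical; nodes are plain lists, and children are opaque labels):

```python
from typing import List, Tuple

def split_leaf(entries: List[int]) -> Tuple[List[int], List[int], int]:
    """Split an overflowing leaf L into L and L2.
    The middle key is COPIED up: it stays in L2 and is also returned for the parent."""
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0]

def split_index(keys: List[int], children: list):
    """Split an overflowing index node.
    The middle key is PUSHED up: it moves to the parent and appears in neither half."""
    mid = len(keys) // 2
    up = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, right, up

print(split_leaf([2, 3, 5, 7, 8]))
print(split_index([5, 13, 17, 24, 30], list("abcdef")))
```

After the leaf split, 5 appears both in the new right leaf and in the entry sent to the parent; after the index split, 17 appears only in the entry sent to the parent, and each half keeps one more child pointer than keys.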

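Insertion in a B+ Tree
Insert (K, P):
– Find the leaf where K belongs and insert it.
– If there is no overflow (2d keys or fewer), halt.
– If there is overflow (2d+1 keys), split the node and insert in the parent; if the node is a leaf, keep K3 in the right node too.
– When the root splits, the new root has only 1 key.
[Figure: with d = 2, an overflowing node with keys K1 K2 K3 K4 K5 and pointers P0 ... P5 splits into a node with K1 K2 (P0 P1 P2) and a node with K4 K5 (P3 P4 P5); (K3, pointer to the new node) goes to the parent.]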
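The insertion rules above fit in a short recursive sketch. This is a minimal illustration under stated assumptions, not a production B+ tree: the `Node` and `BPlusTree` classes are hypothetical, only keys are stored (no record pointers), leaves have no sibling pointers, and deletion is omitted:

```python
import bisect

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []        # K1 .. Km
        self.children = []    # P0 .. Pm (inner nodes only)

class BPlusTree:
    """Minimal B+ tree of order d: keys only, insertion only."""

    def __init__(self, d=2):
        self.d = d
        self.root = Node(leaf=True)

    def insert(self, k):
        split = self._insert(self.root, k)
        if split is not None:               # root split: new root has 1 key only
            up, right = split
            new_root = Node(leaf=False)
            new_root.keys = [up]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, k):
        """Insert k below node; return (key, new right node) if node split."""
        if node.leaf:
            bisect.insort(node.keys, k)
        else:
            i = bisect.bisect_right(node.keys, k)
            split = self._insert(node.children[i], k)
            if split is None:
                return None
            up, right = split
            node.keys.insert(i, up)          # (K, pointer to new node) into parent
            node.children.insert(i + 1, right)
        if len(node.keys) <= 2 * self.d:     # no overflow: halt
            return None
        return self._split(node)

    def _split(self, node):
        d = self.d                           # node has 2d+1 keys
        right = Node(node.leaf)
        up = node.keys[d]                    # the middle key (K3 when d = 2)
        if node.leaf:
            right.keys = node.keys[d:]       # leaf: keep K3 in the right node too
        else:
            right.keys = node.keys[d + 1:]   # inner node: K3 moves up only
            right.children = node.children[d + 1:]
            node.children = node.children[:d + 1]
        node.keys = node.keys[:d]
        return up, right

    def leaves(self):
        """Leaf contents, left to right."""
        out, stack = [], [self.root]
        while stack:
            n = stack.pop()
            if n.leaf:
                out.append(n.keys)
            else:
                stack.extend(reversed(n.children))
        return out

t = BPlusTree(d=2)
for k in [10, 20, 30, 40, 50]:
    t.insert(k)
print(t.root.keys, t.leaves())
```

Inserting the fifth key overflows the single leaf (2d+1 = 5 keys), so it splits into [10, 20] and [30, 40, 50] and 30 becomes the only key of the new root.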

Insertion in a B+ Tree
Insert K=19

Insertion in a B+ Tree
After insertion

Insertion in a B+ Tree
Now insert 25

Insertion in a B+ Tree
After insertion

Insertion in a B+ Tree
But now we have to split!

Insertion in a B+ Tree
After the split