Lecture 12: Indexing

Announcements
- Great work on the midterm! You're halfway done!
- This half of the class will be more relaxed
- Project Part 2 due next Wednesday
- Project Part 3 due on November 22nd (before Thanksgiving)
- What do you want to do on November 22nd?

What you will learn about in this section
1. Recap: Heap Files (all in order)
2. Why Indexes
3. Index Basics
4. Indexes in Practice

1. Recap: Heap Files

File Organization: Unordered (Heap) Files
- Simplest file structure: contains records in no particular order.
- As the file grows and shrinks, disk pages are allocated and de-allocated.
- To support record-level operations, we must:
  - keep track of the pages in a file: page id (pid)
  - keep track of free space on pages
  - keep track of the records on a page: record id (rid)
- There are many alternatives for keeping track of this information.
- Operations: create/destroy file, insert/delete record, fetch a record with a specified rid, scan all records

Heap File as a List
[Figure: a header page heads two linked lists of data pages, one of full pages and one of pages with free space.]
- (heap file name, header page id) is recorded in a known location
- Each page contains two pointers plus data; a pointer is a page ID (pid)
- Pages in the free space list have "some" free space
Q: What happens with variable-length records?
A: All pages are going to have free space, but we may have to go through a lot of them before we find one with enough space.
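A minimal Python sketch of the list-based heap file, to make the answer above concrete. It assumes a tiny page size and records given as byte strings; `HeapFile`, `Page`, and `PAGE_SIZE` are illustrative names, not from the lecture, and the two linked lists are modeled as plain Python lists of page ids. Delete is omitted.

```python
PAGE_SIZE = 64  # assumed page capacity in bytes (tiny, for illustration)


class Page:
    def __init__(self, pid):
        self.pid = pid
        self.records = []  # slot number -> record bytes

    def free_space(self):
        return PAGE_SIZE - sum(len(r) for r in self.records)


class HeapFile:
    def __init__(self):
        self.pages = {}      # pid -> Page (stands in for the disk)
        self.free_list = []  # pids of pages with some free space
        self.full_list = []  # pids of pages with no free space

    def insert(self, record):
        """Insert a variable-length record; return (rid, free pages examined)."""
        if len(record) > PAGE_SIZE:
            raise ValueError("record does not fit on a page")
        examined = 0
        for pid in self.free_list:
            examined += 1
            page = self.pages[pid]
            if page.free_space() >= len(record):
                break  # first page with enough room
        else:
            # No page on the free list had enough room: allocate a new one.
            page = Page(len(self.pages))
            self.pages[page.pid] = page
            self.free_list.append(page.pid)
        page.records.append(record)
        if page.free_space() == 0:
            self.free_list.remove(page.pid)
            self.full_list.append(page.pid)
        rid = (page.pid, len(page.records) - 1)
        return rid, examined

    def fetch(self, rid):
        pid, slot = rid
        return self.pages[pid].records[slot]

    def scan(self):
        for pid in sorted(self.pages):
            yield from self.pages[pid].records
```

With 60-byte records on 64-byte pages, every page keeps 4 free bytes and stays on the free list, so each further 60-byte insert has to examine all of those pages before allocating a new one.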

2. Why Indexes

“If you don’t find it in the index, look very carefully through the entire catalog” - Sears, Roebuck and Co., Consumers Guide, 1897

Real Motivation
Consider the following SQL query:
    SELECT * FROM Sales
    WHERE Sales.date = '02-11-2016'
For a heap file, we have to scan all the pages of the file to return the correct result.
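A small Python sketch of why the heap file forces a full scan. Pages are modeled as lists of dict records; the function name and the sample rows are made up for illustration.

```python
def scan_heap_file(pages, date):
    """Return (matching records, pages read) for an equality selection on date."""
    matches, pages_read = [], 0
    for page in pages:  # records are in no particular order, so no page can be skipped
        pages_read += 1
        matches.extend(r for r in page if r["date"] == date)
    return matches, pages_read


# Hypothetical Sales heap file: three pages, two records each.
pages = [
    [{"sid": 1, "date": "02-10-2016"}, {"sid": 2, "date": "02-11-2016"}],
    [{"sid": 3, "date": "02-12-2016"}, {"sid": 4, "date": "02-09-2016"}],
    [{"sid": 5, "date": "02-11-2016"}, {"sid": 6, "date": "02-13-2016"}],
]
matches, pages_read = scan_heap_file(pages, "02-11-2016")
```

Only two records match, yet `pages_read` equals the number of pages in the file.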

Alternative File Organizations
- We can speed up query execution by better organizing the data in a file.
- There are many alternatives:
  - sorted files
  - indexes: B+ tree, hash index, bitmap index

3. Index Basics

Indexes
- An index speeds up searches for a subset of records, based on values in certain (search key) fields
  - any subset of the fields of a relation can be the search key
  - a search key is not the same as the primary key
- An index contains a collection of data entries (each entry with enough info to locate the records)
- An index is a data structure that organizes records to optimize retrieval.

Example: Hash Index
- A hash index is a collection of buckets
  - bucket = primary page plus overflow pages
  - buckets contain data entries
- Uses a hash function h
  - h(r) = bucket in which (data entry for) record r belongs
- Good for equality search
- Not so good for range search (use tree indexes instead)
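A minimal Python sketch of a static hash index with overflow pages. The page capacity, the class names, and the use of Python's built-in `hash` as a stand-in for h are all assumptions made for illustration.

```python
PAGE_CAPACITY = 2  # assumed number of data entries that fit on one page


class Bucket:
    def __init__(self):
        self.pages = [[]]  # pages[0] is the primary page, the rest are overflow pages

    def add(self, entry):
        if len(self.pages[-1]) == PAGE_CAPACITY:
            self.pages.append([])  # last page is full: chain a new overflow page
        self.pages[-1].append(entry)


class HashIndex:
    def __init__(self, num_buckets=4):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _h(self, key):
        # Stand-in for the hash function h: maps a search key to a bucket number.
        return hash(key) % len(self.buckets)

    def insert(self, key, rid):
        self.buckets[self._h(key)].add((key, rid))

    def search(self, key):
        """Equality search: return (rids with this key, pages read)."""
        bucket = self.buckets[self._h(key)]
        rids = [rid for page in bucket.pages for (k, rid) in page if k == key]
        return rids, len(bucket.pages)
```

An equality search reads only the pages of one bucket. A range search gets no such help, because h scatters neighboring keys across buckets, so every bucket would have to be read.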

Example: B+ Tree Index
[Figure: non-leaf pages at the top direct searches down to the leaf pages, which are sorted by search key.]
- Leaf pages contain data entries, and are chained (prev & next)
- Non-leaf pages have index entries, used only to direct searches
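A simplified Python sketch of how a search uses the two kinds of pages. It bulk-builds a two-level tree from sorted data entries and assumes unique keys and a non-empty input. Inserts, splits, deeper trees, and the prev pointer are left out, so this is an illustration of the search path rather than a full B+ tree.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, entries):
        self.entries = entries  # sorted (key, rid) data entries
        self.next = None        # chain to the right sibling


class Inner:
    def __init__(self, keys, children):
        self.keys = keys          # index entries: separator keys
        self.children = children  # len(children) == len(keys) + 1


def build(entries, leaf_capacity=2):
    """Bulk-build a root over chained leaves from sorted, non-empty data entries."""
    leaves = [Leaf(entries[i:i + leaf_capacity])
              for i in range(0, len(entries), leaf_capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    # Each separator is the smallest key of the leaf to its right.
    keys = [leaf.entries[0][0] for leaf in leaves[1:]]
    return Inner(keys, leaves)


def find_leaf(root, key):
    # The non-leaf page only directs the search to the right child.
    return root.children[bisect_right(root.keys, key)]


def range_search(root, lo, hi):
    """Return rids for keys in [lo, hi] by walking the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for key, rid in leaf.entries:
            if key > hi:
                return out
            if key >= lo:
                out.append(rid)
        leaf = leaf.next
    return out
```

The range search descends once to the leaf that could hold `lo`, then follows the `next` pointers. This is why sorted, chained leaves suit range queries.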

Index Data Entries
- The actual data may not be in the same file as the index.
- In a data entry with search key k we have 3 alternatives of what to store:
  - Alternative 1: the record with key value k
  - Alternative 2: <k, rid of record with search key value k>
  - Alternative 3: <k, list of rids of records with search key k>
- The choice of alternative for data entries is independent of the indexing technique.
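To make the three alternatives concrete, here is a small Python sketch that builds the data entries for an index on date over the same three records. The rids, written as (page id, slot), and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Records stored at hypothetical rids (page id, slot number).
records = {
    (0, 0): {"sid": 1, "date": "02-11-2016", "price": 40},
    (0, 1): {"sid": 2, "date": "02-12-2016", "price": 75},
    (1, 0): {"sid": 3, "date": "02-11-2016", "price": 20},
}

# Alternative 1: the data entry is the record itself.
alt1 = sorted(records.values(), key=lambda r: r["date"])

# Alternative 2: one <k, rid> data entry per record.
alt2 = sorted((r["date"], rid) for rid, r in records.items())

# Alternative 3: one <k, list of rids> data entry per distinct key value.
groups = defaultdict(list)
for rid, r in sorted(records.items()):
    groups[r["date"]].append(rid)
alt3 = sorted(groups.items())
```

Alternative 3 has one entry per distinct date, but the length of its rid list varies from entry to entry.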

Alternatives for Data Entries
Alternative #1:
- index structure is a file organization for records
- at most one index on a given collection of data records (why?)
- if data records are very large, the number of pages containing data entries is high (slower search)

Alternatives for Data Entries
Alternatives #2 and #3:
- Data entries are typically much smaller than data records, so these are better than #1 with large data records, especially if search keys are small
- #3 is more compact than #2, but leads to variable-sized data entries even if search keys are of fixed length

More on Indexes
- A file can have several indexes
- Index classification:
  - Primary vs secondary
  - Clustered vs unclustered

Primary vs Secondary
- If the search key contains the primary key, it is called a primary index
- Any other index is called a secondary index
- If the search key contains a candidate key, it is called a unique index
  - a unique index can return no duplicates

Example
Sales (sid, product, date, price)
- An index on (sid) is a primary and unique index
- An index on (date) is a secondary, but not unique, index

Clustered Indexes
- If the order of records is the same as, or "close to", the order of data entries, it is a clustered index
  - alternative #1 implies clustered
  - in practice, clustered also implies #1
- A file can be clustered on at most one search key
- The cost of retrieving data records through the index varies greatly based on whether the index is clustered or not
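A rough Python sketch of the cost difference, under simplifying assumptions: data entries are <k, rid> pairs, four records fit on a page, and the cost is the number of distinct data pages touched. That count is optimistic for the unclustered case, since without enough buffer space a page may be re-read once per record. All names and the shuffled layout are illustrative.

```python
import random

RECORDS_PER_PAGE = 4  # assumed number of records per data page


def pages_fetched(data_entries, lo, hi):
    """Count distinct data pages touched when following rids for keys in [lo, hi]."""
    return len({pid for key, (pid, slot) in data_entries if lo <= key <= hi})


keys = list(range(40))

# Clustered: the record with key k sits at position k in the data file,
# so records with neighboring keys share pages.
clustered = [(k, (k // RECORDS_PER_PAGE, k % RECORDS_PER_PAGE)) for k in keys]

# Unclustered: same keys, but the records are placed in shuffled order.
rng = random.Random(0)
positions = keys[:]
rng.shuffle(positions)
unclustered = [(k, (pos // RECORDS_PER_PAGE, pos % RECORDS_PER_PAGE))
               for k, pos in zip(keys, positions)]

clustered_cost = pages_fetched(clustered, 8, 15)
unclustered_cost = pages_fetched(unclustered, 8, 15)
```

For the eight keys 8 through 15, the clustered layout touches exactly two data pages. The unclustered layout can touch up to eight, one per matching record.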

4. Indexes in Practice

Choosing Indexes
- What indexes should we create?
  - which relations should have indexes?
  - what field(s) should be the search key?
  - should we build several or one index?
- For each index, what kind of an index should it be?
  - clustered?
  - hash or tree?

Choosing Indexes
- Attributes in the WHERE clause are candidates for index keys
  - exact match condition suggests hash index
  - range query suggests tree index (B+ tree)
  - indexes also speed up joins (later in class)
- Multi-attribute search keys should be considered when a WHERE clause contains several conditions
  - order of attributes is important for range queries
  - such indexes can enable index-only strategies for queries

Choosing Indexes
Composite search keys: search on a combination of fields (e.g. <date, price>)
- equality query: every field value is equal to a constant value
  - date = "02-20-2015" and price = 75
- range query: some field value is not a constant
  - date = "02-20-2015"
  - date = "02-20-2015" and price > 40
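A short Python sketch of why attribute order matters. It assumes the data entries of a tree index on <date, price> are kept as a sorted list of tuples and searched with `bisect`. The sample entries and function names are made up.

```python
from bisect import bisect_left, bisect_right

# Data entries of a hypothetical index on <date, price>, sorted by date, then price.
entries = sorted([
    ("02-19-2015", 30), ("02-20-2015", 20), ("02-20-2015", 40),
    ("02-20-2015", 75), ("02-20-2015", 90), ("02-21-2015", 75),
])
INF = float("inf")


def equality(date, price):
    """date = d and price = p: every field is bound to a constant."""
    return entries[bisect_left(entries, (date, price)):
                   bisect_right(entries, (date, price))]


def date_only(date):
    """date = d: price is unconstrained, matches form one contiguous run."""
    return entries[bisect_left(entries, (date,)):
                   bisect_left(entries, (date, INF))]


def date_and_price_above(date, price):
    """date = d and price > p: still one contiguous run of the sorted entries."""
    return entries[bisect_right(entries, (date, price)):
                   bisect_left(entries, (date, INF))]
```

All three queries land on a single contiguous run because date comes first in the key. A condition on price alone (say price > 40) would match entries scattered across the whole list, so this index could not narrow the search.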

Indexes in SQL
    CREATE INDEX index_name
    ON table_name (column_name);
Example of simple search key:
    CREATE INDEX index1
    ON Sales (price);

Indexes in SQL
    CREATE UNIQUE INDEX index2
    ON Sales (sid);
- A unique index does not allow duplicate values of the indexed column(s) to be inserted into the table
- It can be used to check integrity constraints (a duplicate value will not be allowed to be inserted)
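A quick way to see the unique index reject a duplicate, using Python's built-in `sqlite3` module. SQLite is an assumption here, since the lecture does not name a DBMS, and the inserted rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (sid INTEGER, product TEXT, date TEXT, price REAL)")
conn.execute("CREATE UNIQUE INDEX index2 ON Sales (sid)")
conn.execute("INSERT INTO Sales VALUES (1, 'pen', '02-11-2016', 2.5)")

try:
    # Same sid as the existing row: the unique index rejects the insert.
    conn.execute("INSERT INTO Sales VALUES (1, 'ink', '02-12-2016', 7.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
```

The second insert raises `sqlite3.IntegrityError`, and the table still holds one row.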

Indexes in SQL
    CREATE INDEX index3
    ON Sales (date, price);
- Indexes with composite search keys are larger and more expensive to update
- They can be used if we have multiple selection conditions in our queries
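One way to check whether a query with multiple selection conditions uses the composite index is to compare query plans before and after creating it. The sketch below again assumes SQLite through Python's `sqlite3` module. The helper `plan` is illustrative, and the exact plan wording differs between SQLite versions, so only its general shape is relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (sid INTEGER, product TEXT, date TEXT, price REAL)")


def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM Sales WHERE date = '02-20-2015' AND price > 40"

before = plan(query)  # no index yet: the plan is a scan of the whole table
conn.execute("CREATE INDEX index3 ON Sales (date, price)")
after = plan(query)   # the plan now names index3
```

Printing `before` and `after` shows the plan change from a table scan to a search that uses `index3`.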

Summary
- Indexes: an alternative file organization
- Index classifications:
  - hash vs tree
  - clustered vs unclustered
  - primary vs secondary