Indexes: CSE2132 Database Systems, Week 11 Lecture



Consider

SELECT * FROM EMPLOYEE WHERE EMP_ID = 'E9'

Assuming EMP_ID is unique, we expect to retrieve 1 row. How many records did we have to access in order to retrieve that 1 row?

P1: E1 - Jones, E2 - Smith, E3 - Wong
P2: E4 - White, E5 - Bloggs
P3: E7 - Chen, E9 - Green

Indexes: an Overview

The minimum unit of data transfer between secondary storage and main memory is 1 page. Therefore the cost of accessing EMP_ID = 'E9' is measured by the number of pages we had to access to retrieve the record we required.

In the case of an unordered file, the number of accesses using a linear search will average n/2, where n is the number of pages.
With n = 3: BA (Blocks Accessed) = n/2 = 3/2 = 1.5, i.e. about 2 accesses.

The minimum possible number of database accesses equals the number of rows retrieved (or fewer if the rows are in the same block/page), i.e. 1 row = 1 page (block) accessed.

Indexes: an Overview

The use of an index may help with this, BUT indexes have their own overheads. Indexes may use up to 50% of the allocated file space of the database.

How can we reduce the cost of index use and the space used by indexes?
1. Use efficient file access methods, e.g. if we have an ordered file we can use a binary search rather than just a linear search.
2. Be wise when choosing to create an index.

An Example

r = 30,000 records
block size (or page size) = 1,024 bytes
record length = 100 bytes
BF = Blocking Factor = 1,024/100 = 10 records per block (rounded down)
Number of blocks n = r/BF = 30,000/10 = 3,000 blocks (or pages)

Using a linear search: BA = n/2 = 3,000/2 = 1,500 (on average)
Using a binary search (if the file is ordered): $BA = \lceil \log_2 n \rceil = \lceil \log_2 3000 \rceil = 12$
Retrieving 1 record still needs up to 12 accesses (this is a maximum).
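The figures on this slide can be checked with a short sketch. The function names are illustrative and not part of the lecture. The sketch assumes records do not span blocks and that the binary-search cost is rounded up to a whole access.

```python
import math

def blocking_factor(block_size, record_length):
    """Whole records that fit in one block (records do not span blocks)."""
    return block_size // record_length

def blocks_needed(records, bf):
    """Blocks required to hold all records at the given blocking factor."""
    return math.ceil(records / bf)

def linear_search_cost(blocks):
    """Average block accesses for a linear scan that stops at the match."""
    return blocks / 2

def binary_search_cost(blocks):
    """Maximum block accesses for a binary search over an ordered file."""
    return math.ceil(math.log2(blocks))

bf = blocking_factor(1024, 100)
n = blocks_needed(30_000, bf)
print(bf, n, linear_search_cost(n), binary_search_cost(n))
# prints: 10 3000 1500.0 12
```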

Hashing - The Number of Accesses

BA = 1, depending on the blocking factor and the fill factor.
1 record retrieved = 1 access (optimal)
10 records retrieved = 10 accesses
But what if they are 10 consecutive rows? With BF = 10, BA = 1 or 2 if the records are stored consecutively, rather than the 10 required for a hashed file organization.

The Situation When Using Indexes

Indexes rely on a key value for access. If we do not have a key we must sequentially search the file.
An index is an auxiliary file that makes it more efficient to search for a record in the data file.
An index is called an ACCESS PATH on a field.
An example of a simple index structure is a file of index entries ordered by the field value.

Index Operation

The pages can be moved around and the index address will still be correct, provided the address contained in the directory is updated.
The pointers provide direct access to the data records.
Ideally there should be very few accesses to traverse the index file, as all of these accesses are overhead.
BA = BA(index) + 1, where the extra access retrieves the data record.

One Way Indexes Reduce Accesses

Example 1 from Elmasri & Navathe (p108)
Data file: 3,000 blocks
Binary search: $BA = \lceil \log_2 3000 \rceil = 12$
Using an index to retrieve a data record, the number of accesses is:
$BA_{total} = BA_i + BA_d = \lceil \log_2 n \rceil + 1$, where n here is the number of index blocks
Size of index record = V + P, where V is the length of the key and P is the length of the pointer to the data value. Assume key = 9 bytes (e.g. SSN) and pointer = 6 bytes: 9 + 6 = 15 bytes
Blocking factor $BF_i$ = block size / index record size = 1,024/15 = 68 records per block (rounded down)
Number of index blocks = 30,000/68 = 442 (rounded up)

Total Accesses are Then

$BA_{i+d} = \lceil \log_2 442 \rceil + 1 = 9 + 1 = 10$ accesses
which is less than the 12 accesses using the ordered file alone.
NB: A non-ordered data file and a dense index are assumed.
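As a rough check of the dense-index arithmetic, the sketch below recomputes the index blocking factor, the number of index blocks, and the total accesses. The function name is hypothetical. The 6-byte pointer is taken from the 9 + 6 = 15 byte index entry used in the multi-level example later in the lecture, and the logarithm is rounded up to a whole access.

```python
import math

def index_lookup_cost(index_entries, block_size, key_len, pointer_len):
    """Return (entries per index block, index blocks, total block accesses).

    Total = binary search over the index blocks + 1 data block access.
    """
    entry_size = key_len + pointer_len
    bf_index = block_size // entry_size          # whole entries per block
    index_blocks = math.ceil(index_entries / bf_index)
    accesses = math.ceil(math.log2(index_blocks)) + 1
    return bf_index, index_blocks, accesses

# Dense index: one entry for each of the 30,000 data records.
print(index_lookup_cost(30_000, 1024, 9, 6))
# prints: (68, 442, 10)
```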

Another Way Indexes Reduce Accesses

The data file may be ordered, so our index may be sparsely populated, i.e. only 1 index entry per block of data records, and therefore fewer index records.

P1: E1 Jones, E2 Smith, E3 Wong
P2: E4 White, E5 Bloggs, E6 West
P3: E7 Chen, E8 Brown, E9 Green

For the file above, our index would only need to contain 3 entries instead of the 9 needed for a dense index. The number of index entries depends on the blocking factor of the data file.
An unordered data file requires 30,000 index entries, with one index entry per data record. The ordered file requires only one index entry per block, i.e. 3,000 index entries.
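A minimal sketch of a sparse (non-dense) index lookup over the three pages on this slide. The dictionary layout and the `lookup` function are assumptions made for illustration. Keys are compared as strings, which only orders correctly here because every employee number has a single digit.

```python
from bisect import bisect_right

# Data pages of an ordered file, three records per page (from the slide).
pages = {
    "P1": [("E1", "Jones"), ("E2", "Smith"), ("E3", "Wong")],
    "P2": [("E4", "White"), ("E5", "Bloggs"), ("E6", "West")],
    "P3": [("E7", "Chen"), ("E8", "Brown"), ("E9", "Green")],
}

# Sparse index: one (first key on the page, page id) entry per page.
sparse_index = [(records[0][0], page_id) for page_id, records in pages.items()]
anchor_keys = [key for key, _ in sparse_index]

def lookup(emp_id):
    """Find a record via the sparse index; return (page id, name) or None."""
    pos = bisect_right(anchor_keys, emp_id) - 1   # last anchor key <= emp_id
    if pos < 0:
        return None
    page_id = sparse_index[pos][1]
    for key, name in pages[page_id]:              # scan inside a single page
        if key == emp_id:
            return page_id, name
    return None

print(len(sparse_index))   # prints: 3
print(lookup("E9"))        # prints: ('P3', 'Green')
```

Only one data page is read per lookup, because the index entry already identifies the page that must hold the key.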

Total Accesses are Then

3,000 index records / 68 = 45 index blocks (ordered data file)
$BA = \lceil \log_2 45 \rceil + 1 = 6 + 1 = 7$ accesses for an indexed ordered file

NOTE:
Index files may be independent of the data file.
The index can be created and dropped independently of the data file.
The index file can be of any file organization (in Oracle it is a B+Tree).
There can be any number of index files for a data file.

Index Terminology

The attributes on which the index is built are called the INDEXED FIELD/S.
- If the index is built on the PRIMARY KEY, it is called the PRIMARY INDEX.
- If the index is built on any other attributes, it is called a SECONDARY INDEX, and the attribute values may be non-unique.

A SIMPLE CLUSTERING INDEX (this is not an option in Oracle): the index is built on a non-unique attribute and includes one index entry for each distinct value of the attribute. The index entry points to the first data block that contains records with that attribute value. The underlying file must be ordered on the chosen non-unique attribute.

A Primary Index on an Ordered File

[Figure: INDEX FILE (PRIMARY KEY VALUE, BLOCK POINTER) pointing into DATA FILE (ENO, ENAME, SALARY, BIRTDATE)]

This assumes ENO is unique and is being used as a primary key. It is a non-dense index, as the data file is ordered on the index key (ISAM).

A Dense Secondary Index on a Non-Ordering Column of the Data File

[Figure: INDEX FILE (INDEX FIELD VALUE, BLOCK POINTER) with entries Aaron, Abbot, Adams, Akers, Alexander, Alfred, Allen, Anderson, pointing into DATA FILE (ENAME, DNO, EMPNO, SALARY) with records Aaron, Adams, Akers, Allen, Abbot]

This assumes ENAME has been nominated as an alternative key, so an index entry is required for every record. The data file has been ordered on EMPNO.

A Clustering Index on a Non-Key Column

[Figure: INDEX FILE (CLUSTERING FIELD VALUE, BLOCK POINTER) pointing into DATA FILE (DNO, NAME, EMPNO, SALARY)]

It has been decided to order the data file on the department number. This is sometimes termed a clustering index, but it is different from a cluster in Oracle.

Consider Which Type of Index?

SELECT * FROM EMP WHERE E# = 'E1'
SELECT * FROM EMP WHERE DEPT = 'D1'
SELECT * FROM EMP WHERE ENAME = 'ABLE'
SELECT * FROM EMP WHERE ENAME LIKE 'A%'
SELECT * FROM EMP WHERE DEPT = 'D2' AND ENAME > 'CAIN'

EMP table columns:
E#: E1, E2, E3, E4
ENAME: ABLE, ADAMS, BLAKE, BROCK
DEPT: D1, D2

Multi Level Indexes

Because a single-level index is an ordered file, we can create a non-dense index to the index itself, i.e. a Second Level Index.
We can repeat this process of leveling until the highest level of our index can fit into main memory (probably 1 page in size). This will reduce the number of I/Os in traversing the index by 1.
This concept of a number of index levels decreasing in breadth is a tree structure.

Indexes as Tree structures

Tree structures have certain properties we can take advantage of:
1. The number of pointers at one level determines the number of values stored at the next level.
2. We can predict the number of levels, and therefore the number of accesses required for our data file.

If a node has p data fields, it has p + 1 pointers to p + 1 nodes, each of which has p values. For example, a node which can fit 2 data values has 3 pointers to three nodes, each with two data values and 3 pointers. The nine nodes on the third level can together hold 18 data values.

Indexes as Tree structures

[Figure: tree of index nodes with <= and > branches]
A 3 level index with 2 data values per node could point to 18 data values.
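The 18-value figure follows from counting nodes level by level. The helper below is a hypothetical illustration, assuming every node is full (p values and p + 1 pointers) and counting only the values held at the lowest level.

```python
def lowest_level_values(p, levels):
    """Values held at the lowest level of a full tree index.

    Each node stores p values and p + 1 pointers, so level k (root = 1)
    contains (p + 1) ** (k - 1) nodes.
    """
    nodes_at_lowest_level = (p + 1) ** (levels - 1)
    return nodes_at_lowest_level * p

print(lowest_level_values(2, 3))   # 9 nodes * 2 values, prints: 18
```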

Indexes as Tree structures

If a tree structure is balanced and self-maintaining, then the number of accesses to the data file is constant.
The number of accesses using a multi-level index is:
$BA = \lceil \log_b n \rceil + 1$
where b is the branching factor and n is the number of pages.
NOTE: The number of accesses will decrease as n becomes smaller (i.e. use a non-dense index) and as the branching factor b increases.

Indexes as Tree structures

(Example 3, Elmasri and Navathe Ch 5)
V = 9 bytes plus a 6 byte pointer, thus index entry Ri = 15 bytes
The blocking factor = 1,024/15 = 68. This is known as the fanout for a multi-level index.
The number of first level blocks = 30,000 / 68 = 442 blocks
The number of second level blocks = 442 / 68 = 7 blocks
The number of third level blocks = 7 / 68 = 1 block
BA = T (number of levels) + 1 = 3 + 1 = 4
or $BA = \lceil \log_{68} 30000 \rceil + 1 = 3 + 1 = 4$
Index size = b1 + b2 + b3 = 442 + 7 + 1 = 450 blocks
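The level-by-level block counts can be reproduced by repeatedly dividing by the fanout until a single block remains. This is a sketch with a hypothetical function name, assuming a dense first level (one entry per record) and rounding each division up to whole blocks.

```python
import math

def index_levels(first_level_entries, fanout):
    """Blocks at each index level, from the first level up to a single block."""
    if first_level_entries <= 0:
        raise ValueError("need at least one index entry")
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fanout)
        levels.append(blocks)
        if blocks == 1:
            break
        entries = blocks      # the next level has one entry per block below
    return levels

levels = index_levels(30_000, 68)
print(levels)             # prints: [442, 7, 1]
print(len(levels) + 1)    # index levels + 1 data access, prints: 4
print(sum(levels))        # total index size in blocks, prints: 450
```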

Index Usage

An index is used to optimize data retrieval.

SELECT * FROM EMP WHERE E# = 'E4'

Total Number of Accesses = Index Accesses + 1 Data Access = 1 + 1 = 2
Average Number of Serial Accesses for a Table with 3 pages is n/2 = 3/2 = 1.5, i.e. about 2

Index, Key: E1, E20, E
Index, Page #: P1, P2, P3
Data records: E1, E2, E10, E20, E30, E40, E45, E56

The Query Optimizer

(p. 210, Hoffer et al. 6th Edn.; Ch 18, Date)

The Query Optimizer may decide against using an index.
In some cases the optimizer will use only the index to answer a query and will not access any data pages, e.g.
SELECT COUNT(*) FROM EMP
Some relational products force the use of an index even though it may be inefficient to do so:
1. To support a PK (in Oracle, if a column is made a primary key an index is created)
2. To implement clustering

Good and Bad Candidates for Indexes

Create indexes on columns used in predicates:
- Read only and frequently accessed tables, if > 3 pages.
- The columns of a predicate in frequently executed transactions.
- High update tables can also use indexes, if > 6 pages.
- Columns used in joins are candidates for indexes.
- Columns on which aggregates are frequently calculated.
- Use indexes on FKs if using RI, so that integrity violations and cascades are worked out quickly.

Avoid creating indexes on:
- Attributes with a small number of unique values, e.g. Gender (M, F), although the Oracle BITMAP index is suitable in this situation.
- Keep indexes down to a reasonable number on high update attributes (tables): 2 or 3 if possible.

Other Issues for Indexes

The above criteria are not always mutually exclusive, so we must decide on index usage based on the most important requirements.
There are tricks which can be used to enhance the use of indexes:
- Place the index, or an index level, in memory.
- Place indexes on a fast device.
- Place indexes on a separate device from the data they reference.
- Use multiple column indexes where possible, e.g. INDEX C1, C2, C4

SELECT * FROM EMP WHERE C1 = 'A' AND C2 = 'B' AND C5 = 'E'
- will use both C1 and C2 (treats this as a substring of the index key)

SELECT * FROM EMP WHERE C1 = 'A' AND C4 = 'D' AND C5 = 'E'
- will use only one (C1)
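The two queries on this slide illustrate a leading-prefix rule for multiple column indexes. The sketch below models only that rule. The function name is hypothetical, only equality predicates are considered, and a real optimizer may apply further techniques that are not modelled here.

```python
def usable_index_columns(index_columns, equality_columns):
    """Leading index columns that the equality predicates can use.

    Matching stops at the first index column with no predicate, because
    the index is ordered on the concatenated key.
    """
    usable = []
    for column in index_columns:
        if column not in equality_columns:
            break
        usable.append(column)
    return usable

index = ["C1", "C2", "C4"]
print(usable_index_columns(index, {"C1", "C2", "C5"}))   # prints: ['C1', 'C2']
print(usable_index_columns(index, {"C1", "C4", "C5"}))   # prints: ['C1']
```

In the second query C4 is in the index but cannot be used, since C2 sits between C1 and C4 in the key and has no predicate.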

An Example

Oracle block may be 2,048 bytes (they vary with operating system)
Number of EMPS = 30,000
After the overhead bytes in each index block, about 2,000 bytes are usable.
Blocking factor of the index, or keys per index page = 2000/(key length + 6) = 2000/(9 + 6) = 133 in theory. The branching factor will be less than 133, as we will want to include free space.
b1 = number of records / 133 = 30,000 / 133 = 226
b2 = b1 / 133 = 2
b3 = b2 / 133 < 1, so a single block
t = 3 levels of index data, or $t = \lceil \log_{133} 30000 \rceil = \lceil 2.1 \rceil = 3$ (where $\log_{133} 30000 = \log 30000 / \log 133$)
Maximum number of records with a three level index with blocking factor as indicated = 133 x 133 x 133 = about 2.35 million
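The number of levels can also be obtained from the fanout directly, without counting blocks at each level. This is a sketch with a hypothetical function name. It assumes 2,000 usable bytes per index block and 15-byte entries, and finds the smallest t for which the fanout raised to the power t covers all records.

```python
import math

def levels_required(records, fanout):
    """Smallest t with fanout ** t >= records, using integer arithmetic."""
    t, capacity = 1, fanout
    while capacity < records:
        capacity *= fanout
        t += 1
    return t

fanout = 2000 // (9 + 6)
print(fanout)                                          # prints: 133
print(round(math.log(30_000) / math.log(fanout), 1))   # prints: 2.1
print(levels_required(30_000, fanout))                 # prints: 3
print(fanout ** 3)                                     # prints: 2352637
```

The last line is the capacity of a three level index at this fanout, roughly 2.35 million records.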