Presentation is loading. Please wait.

Presentation is loading. Please wait.

Indexes 11. 1 CSE2132 Database Systems Week 11 Lecture Indexes.

Similar presentations


Presentation on theme: "Indexes 11. 1 CSE2132 Database Systems Week 11 Lecture Indexes."— Presentation transcript:

1 Indexes 11. 1 CSE2132 Database Systems Week 11 Lecture Indexes

2 Indexes 11. 2 SELECT *FROM EMPLOYEE WHERE EMP_ID = 'E9' Assuming EMP_ID is unique we expect to retrieve 1 row. How many records did we have to access in order to retrieve that 1 row? P1 E1 - Jones E2 - Smith E3 - Wong P2 E4 - White E5 - Bloggs P3 E7 - Chen E9 - Green Indexes Consider

3 Indexes 11. 3 Indexes an Overview The minimum amount of data transfer between Secondary Storage and Main Memory is 1 page. Therefore the cost of accessing Emp_Id = 'E9' is measured using the number of pages we had to access to retrieve the record we required. In the case of an unordered file the number of accesses using a linear search will average n/2 - where n = the number of pages. n = 3 BA (Blocks Accessed) = n/2 = 3/2 = 2 accesses (1.5) The minimum possible number of data base accesses will be equal to the number of rows retrieved (or less if the rows are in the same block/page). i.e 1 row = 1 page (block) accessed

4 Indexes 11. 4 The use of index may aid in this BUT indexes have their own overheads Indexes may use up to 50% of the allocated file space of the data base How can we reduce the cost of index use and the space used by indexes ? 1. Use efficient file access methods. e.g. If we have an ordered file we can use a binary search rather than just a linear search. 2. Be wise when choosing to create an index. Indexes an Overview

5 Indexes 11. 5 r=30,000 records blocksize=1,024 bytes (or page size) rec length=100 bytes BF = Blocking Factor = 1,024/100 = 10 Number of Blocks = r/BF = 30,000/10 = 3,000 blocks (or pages) Using a Linear Search BA = n/2 = 3,000/2 = 1,500 (on average) Using a Binary Search (if the file is ordered) BA=log n BA=log 3,000=12 1 record still need 12 accesses (This is a maximum) An Example 2 2

6 Indexes 11. 6 BA = 1 - depending on the blocking factor and the fill factor 1 record retrieved = 1 access - optimal 10 records retrieved = 10 accesses But what if they are 10 consecutive rows? BF = 10 BA = 1 or 2 if the records are stored consecutively rather than 10 required for a hashed file organization Hashing - The Number of Accesses

7 Indexes 11. 7 Indexes rely on a key value for access. If we do not have a key we must sequentially search the file. An index is an auxiliary file that makes it more efficient to search for a record in the data file. An Index is called an ACCESS PATH on a field. An example of a simple index structure is ordered by the field value. The Situation When Using Indexes

8 Indexes 11. 8 The pages can be moved around and the index address will still be correct providing the address contained in the directory is updated The pointers provide direct access to the data records Optimally there should be very few accesses to traverse the index file as all of the accesses are overheads BA = BA (index) + 1 To retrieve the data record Index Operation

9 Indexes 11. 9 Example 1 from Elmasri & Navathe (p108) Data file 3000 blocks Binary Search BA=log 2 3000=12 Using an Index to retrieve a data record the number of accesses is :- BAtotal = BAi+BAd = (log n)+1 Size of index record - assume key = 9 bytes eg. SSN V+P V - length of key P - length of pointer to data value = 9 + 6 = 15 bytes Blocking Factor BF i =Blocksize / Indexrecordsize = 1024/15 = 68 records per block Number of Index Blocks = 30,000/68 = 442 One Way Indexes Reduce Accesses 2

10 Indexes 11. 10 Total Accesses are Then BA i+d = (log 2 442) + 1 = 9 + 1 = 10 accesses Which is less than the 12 accesses using the ordered file alone. Nb: A non-ordered data file is assumed and a dense index.

11 Indexes 11. 11 The data file may be ordered and therefore our index may be sparsely populated, i.e. only 1 index entry per block of data records - therefore less index records Another Way Indexes Reduce Accesses Our index would only need to contain 3 entries instead of 9 for the dense index - the number of index entries depends on the blocking factor of the data file. An unordered data file requires 30,000 index entries with one index entry per data record. The ordered file requires only one index entry per block i.e. 3000 index entries. P1P2P3 E1 Jones E2 Smith E3 Wong E4 White E5 Bloggs E6 West E7 Chen E8 Brown E9 Green

12 Indexes 11. 12 3000 index records/68 = 45 index blocks : ordered data file Total Accesses are Then NOTE: Index files may be independent of the data file The index can be created and dropped independently of the data file The index file can be of any file organization(in Oracle it is a B+Tree) There can be any number of index files for a data file BA = (log 2 BA = 5 + 1 = 6 accesses for an indexed ordered file 45) + 1

13 Indexes 11. 13 The attributes on which the index is/are built is/are called the INDEXED FIELD/S - if these attributes are built on the PRIMARY KEY then this index is called the PRIMARY INDEX - if an index is built on any other attributes it is called a SECONDARY INDEX and the attribute values may be non- unique Index Terminology A SIMPLE CLUSTERING INDEX (This is not an option in Oracle) - the index is built on a non-unique attribute and includes one index entry for each distinct value of the attribute. The index entry points to the first data block that contains records with that attribute value. The underlying file must be ordered on the chosen non-unique attribute.

14 Indexes 11. 14 INDEX FILE DATA FILE ENO ENAME SALARY BIRTDATE PRIMARY KEY VALUE BLOCK POINTER..... 1 3 13 14 25 26 39 1 13 25 39 41 53 A Primary Index on an Ordered File This assumes ENO is unique and is being used as a primary key. It is a non dense index as the data file is ordered on the index key. (ISAM)

15 Indexes 11. 15 INDEX FILE DATA FILE ENAME DNO EMPNO SALARY INDEX FIELD VALUE BLOCK POINTER..... A Dense Secondary Index on Non Data File Ordering Column 13791379 13 14 16 17 25 26 29 30 33 35 39 41 This assumes ENAME has been nominated as an alternative key so an index entry is required for every record.The data file has been ordered on empno Aaron Abbot Adams Akers Alexander Alfred Allen Anderson Aaron Adams Akers Allen Abbot

16 Indexes 11. 16 An Clustering Index on a Non_Key Column 1 2 3 4 5 1 4 1 8 1 27 2 15 2 11 3 33443344 55555555 INDEX FILE DATA FILE DNO NAME EMPNO SALARY CLUSTERING FIELD VALUE BLOCK POINTER..... It has been decided to order the data file on the department number. This is sometimes termed a clustering index but it is different to a cluster in Oracle

17 Indexes 11. 17 Consider Which Type of Index ? SELECT * FROM EMP WHERE E# = ‘E1’ SELECT * FROM EMP WHERE DEPT = ‘D1’ SELECT * FROM EMP WHERE ENAME = ‘ABLE’ SELECT * FROM EMP WHERE ENAME LIKE ‘A%’ SELECT * FROM EMP WHERE DEPT = ‘D2’ AND ENAME > ‘CAIN’ E# ENAME DEPT E1 E2 E3 E4 ABLE ADAMS BLAKE BROCK D1 D2

18 Indexes 11. 18 - because a single level index is an ordered file we can create a non-dense index to an index i.e A Second Level Index We can repeat this process of leveling until the highest level of our index can fit into main memory - probably 1 page in size - this will reduce the number of I/O's in traversing the index by 1 This concept of a number of index levels decreasing in breadth is a tree structure Multi Level Indexes

19 Indexes 11. 19 Tree structures have certain properties we can take advantage of :- 1. The number of pointers at one level will determine the number of values stored at the next level We can predict the number of levels and therefore the number of accesses required for our data file. If a node has p data fields it has p + 1 pointers to p + 1 nodes each of which has p values i.e A node which can fit 2 data values per index node has 3 pointers to three nodes each with two data values and 3 pointers which can point to 18 data values. Indexes as Tree structures

20 Indexes 11. 20 <=<= >> <=<= >> <=<= >> <=<= >> A 3 level index with 2 data values per node could point to 18 data values. Indexes as Tree structures

21 Indexes 11. 21 2. If a tree structure is balanced and self-maintaining then the number of accesses to the data file is constant The number of accesses using a multi-level index is :- Indexes as Tree structures - where b is the branching factor and n is the number of pages NOTE : The number of accesses will decrease as n becomes smaller (i.e use a non-dense index) and as the branching factor b increases BA = (log b n) + 1

22 Indexes 11. 22 (Example 3. Elmasri and Navathe Ch 5) V = 9 bytes + a 6 byte pointer thus Index entry Ri = 15 bytes The blocking factor = 1024/15 = 68 This is known as the fanout for a multi-level index The number of first level blocks = 30,000 / 68 = 442 blocks The number of second level blocks = 442 / 68 = 7 blocks The number of third level blocks = 7 / 68 = 1 block Indexes as Tree structures BA = T(number of levels) + 1 = 3 + 1 = 4 or BA = (log68 3000) + 1 = 3 + 1 = 4 Index size = b1 + b2 + b3 = 442 + 7 + 1 = 450 blocks

23 Indexes 11. 23 SELECT * FROM EMP WHERE E# = 'E4' Total Number of Accesses = Index Accesses+1 Data Access = 1 + 1 = 2 Average Number of Serial Accesses for a Table with 3 pages is n/2 = 2 Index Usage An index is used to optimize data retrieval KeyPage # E1 E20 E40 123123 P1 P2 P3 E1 E2 E10 E20 E30 E40 E45 E56

24 Indexes 11. 24 p. 210, 242-247 Hoffer et.al. 6th Edn. Ch 18 Date The Query Optimizer may decide against using an index In some cases the optimizer will use only the index to answer a query and will not access any data pages SELECT COUNT(*) FROM EMP Some Relational Products force the use of an index even though it may be inefficient to do so 1. To support a PK (in Oracle if a column is made a primary key an index is created) 2. To implement clustering The Query Optimizer

25 Indexes 11. 25 Create indexes on columns used in predicates :-. Read only and frequently accessed tables if > 3 pages. The columns of a predicate in frequently executed transactions. High update tables can also use indexes if > 6 pages. columns used in joins are candidates for indexes. columns in which aggregates are frequently calculated. use indexes on FK if using RI - will work out integrity violations or cascade quickly Avoid creating indexes on :-. attributes with a small number of unique values i.e. Gender M,F although the Oracle BITMAP index is suitable in this situation. keep indexes down to a reasonable number on high update attributes(tables) 2 or 3 if possible Good and Bad Candidates for Indexes

26 Indexes 11. 26 The above criteria are not always mutually exclusive - therefore must decide on index usage based on the most important requirements. There are tricks which can be used to enhance the use of indexes :-. place index or index level in memory. place indexes on a fast device. place indexes on a separate device from the data they reference. use multiple column indexes where possible i.e INDEX C1, C2, C4 SELECT * FROM EMP WHERE C1 = 'A' - will use both AND C2 = 'B' treats this as AND C5 = 'E' a substring SELECT * FROM EMP WHERE C1 = 'A' -will use only one AND C4 = 'D' AND C5 = 'E' Other Issues for Indexes

27 Indexes 11. 27 Oracle block maybe 2048 bytes (they vary with operating system) Number of EMPS = 30,000 48 bytes overhead per index block Blocking Factor of the index or keys per index page = 2000/(key length + 6)= 2000/(9+6)= 133 - in theory the branching factor will be less than 133 as we will want to include free space b1 = number of records / 133 = 30,000/ 133 = 225 b2 = b1 / 133 = 2 b3 = b2 / 133 < 1 t = 3 levels of index data or t = log 133 30000 = 2.1 = 3 (= log 10 30000 / log 10 133) Maximum Number of Records with a three level index with Blocking Factor as indicated = 133 x 133 x 133 = 2.5 million An Example


Download ppt "Indexes 11. 1 CSE2132 Database Systems Week 11 Lecture Indexes."

Similar presentations


Ads by Google