1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.

Slides:



Advertisements
Similar presentations
CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
Advertisements

CIS552Indexing and Hashing1 Cost estimation Basic Concepts Ordered Indices B + - Tree Index Files B - Tree Index Files Static Hashing Dynamic Hashing Comparison.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Indexing and.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
1 Lecture 8: Data structures for databases II Jose M. Peña
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
Chapter 9 of DBMS First we look at a simple (strawman) approach (ISAM). We will see why it is unsatisfactory. This will motivate the B+Tree Read 9.1 to.
1 Indexing and Hashing Indexing and Hashing Basic Concepts Dense and Sparse Indices B+Trees, B-trees Dynamic Hashing Comparison of Ordered Indexing and.
Chapter 8 File organization and Indices.
Data Indexing Herbert A. Evans. Purposes of Data Indexing What is Data Indexing? Why is it important?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part A Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Database Management Systems I Alex Coman, Winter 2006
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Primary Indexes Dense Indexes
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Ch12: Indexing and Hashing  Basic Concepts  Ordered Indices B+-Tree Index Files B+-Tree Index Files B-Tree Index Files B-Tree Index Files  Hashing Static.
1 CS 728 Advanced Database Systems Chapter 17 Database File Indexing Techniques, B- Trees, and B + -Trees.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Indexing dww-database System.
Storage and Indexing February 26 th, 2003 Lecture 19.
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b TR Lee, Sin-Min.
Chapter 14-1 Chapter Outline Types of Single-level Ordered Indexes –Primary Indexes –Clustering Indexes –Secondary Indexes Multilevel Indexes Dynamic Multilevel.
Index Structures for Files Indexes speed up the retrieval of records under certain search conditions Indexes called secondary access paths do not affect.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
©Silberschatz, Korth and Sudarshan12.1Database System Concepts B + -Tree Index Files Indexing mechanisms used to speed up access to desired data.  E.g.,
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
1 Index Structures. 2 Chapter : Objectives Types of Single-level Ordered Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
1 Chapter 2 Indexing Structures for Files Adapted from the slides of “Fundamentals of Database Systems” (Elmasri et al., 2003)
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
Nimesh Shah (nimesh.s) , Amit Bhawnani (amit.b)
Basic Concepts Indexing mechanisms used to speed up access to desired data. E.g., author catalog in library Search Key - attribute to set of attributes.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
Indexing and B+-Trees By Kenneth Cheung CS 157B TR 07:30-08:45 Professor Lee.
Storage and Indexing. How do we store efficiently large amounts of data? The appropriate storage depends on what kind of accesses we expect to have to.
Indexing Structures Database System Implementation CSE 507 Some slides adapted from R. Elmasri and S. Navathe, Fundamentals of Database Systems, Sixth.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
CS4432: Database Systems II
1 Query Processing Part 3: B+Trees. 2 Dense and Sparse Indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Insertions.
1 Ullman et al. : Database System Principles Notes 4: Indexing.
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
Indexing Structures for Files
Indexing and hashing.
CS 728 Advanced Database Systems Chapter 18
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Tree Indices Chapter 11.
Database System Implementation CSE 507
Lecture 20: Indexing Structures
Extra: B+ Trees CS1: Java Programming Colorado State University
File organization and Indexing
Chapter 11: Indexing and Hashing
Indexing and Hashing Basic Concepts Ordered Indices
Chapter 11 Indexing And Hashing (1)
Indexing 1.
Storage and Indexing.
Credit for some of the slides in this lecture goes to
General External Merge Sort
CS4433 Database Systems Indexing.
Chapter 11: Indexing and Hashing
Presentation transcript:

1 Indexing Structures for Files

2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire table based on a search key  Search Key  an attribute used to look up records in a file.

3 Index Structure  An index file consists of records (called index entries) of the form  Index entries  Search key value and a pointer to a row having that value  The values in the index are ordered.  Index files are typically much smaller than the original file  When a file is modified, every index on the file must be updated  Updating indices imposes overhead on database modification. search-key pointer

4 Index Evaluation Metrics  Indexing techniques evaluated on basis of:  Access types (queries) supported efficiently.  records with a specified value in the attribute  or records with an attribute value falling in a specified range of values.  Access/search time  Insertion time  Deletion time  Space overhead

5 Index Classification  primary index:  is specified on the ordering key field of an ordered file, where every record has a unique value for that field.  The index has the same ordering as the one of the file.  clustering index:  is specified on the ordering field of an ordered file.  The index has the same ordering as the one of the file.  An ordered file can have at most one primary index or one clustering index, but not both.  secondary index:  is specified on any nonordering field of the file.  The index has different ordering than the one of the file.  A file can have several secondary indices in addition to its primary/clustering index.

6 Primary Indices  Primary index is specified on the ordering key field of an ordered file.  There is one index entry (or index record) in the index file for each block in the data file.  Each index entry has the value of the primary key field for the first record in a block.  The total number of entries in the index file is the same as the number of disk blocks in the data file.  The index file for a primary index needs fewer blocks than does the data file.

7 Primary Indices

8  Finding a record is efficient – do a binary search  Records insertion and deletion is a major problem. We can avoid the problem by:  Using an unordered overflow file, or  Using a linked list of overflow records.

9 Index (sequential) continuous free space overflow area (not sequential) Primary Indices

10 Sparse Vs. Dense Indices  dense index  has index entry for every record in the file.  sparse (nondense) index  has index entries for only some of the search- key values.  A primary index is sparse (nondense) index.

11 Sparse Vs. Dense Indices Sparse primary index sorted on Id Dense secondary index sorted on Name Ordered file sorted on Id Id Name Dept

12 Sparse Vs. Dense Indices Ashby, 25, 3000 Smith, 44, 3000 Ashby Cass Smith Sparse primary index on Name Ordered file on Name Dense secondary index on Age 33 Bristow, 30, 2007 Basu, 33, 4003 Cass, 50, 5004 Tracy, 44, 5004 Daniels, 22, 6003 Jones, 40, 6003

13 Dense Indices  Pro:  Very efficient in locating a record given a key, if fits in the memory  Can tell if any record exists without accessing file  Con:  if too big and doesn’t fit into the memory, will be expense when used to find a record given its key

14 Sparse Indices  Sparse index contains index records for only some search-key values.  Some keys in the data file will not have an entry in the index file  Applicable when records are sequentially ordered on search-key (ordered files)  Normally keeps only one key per data block  To locate a record with search-key value K we:  Find index record with largest search-key value  K  Search file sequentially starting at the record to which the index record points

15 Ordered File Sparse/Primary Index Sparse Indices

16 Sparse Indices  Less space (can keep more of index in memory)  Support multi-level indexing structure  Less maintenance overhead for insertions and deletions.

17 Index Update: Deletion  If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.  Single-level index deletion:  Dense indices  deletion of search-key is similar to file record deletion.  Sparse indices  If an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order).  If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

18 Dense Index: Deletion

19 Dense Index: Deletion delete record 30 40

20 Sparse Index: Deletion

21 Sparse Index: Deletion delete record 40

22 Sparse Index: Deletion delete record 30 40

23 Sparse Index: Deletion delete records 30 &

24 Index Update: Insertion  Single-level index insertion:  Perform a lookup using the search-key value appearing in the record to be inserted.  Dense indices  if the search-key value does not appear in the index, insert it.  Sparse indices  if index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.  In this case, the first search-key value appearing in the new block is inserted into the index.

25 Sparse Index: Insertion

26 insert record Sparse Index: Insertion

27 insert record Illustrated: Immediate reorganization Variation: – insert new block (chained file) – update index Sparse Index: Insertion

28 insert record overflow blocks (reorganize later...) Sparse Index: Insertion

29 Dense Index: Insertion  Similar  Often more expensive...

30 Duplicate keys

31 Duplicate keys  Dense index

32 careful if looking for 20 or 30! Duplicate keys  Sparse index, one way? 40

33 – place first new key from block should this be 40? Duplicate keys  Sparse index, another way? (clustering index)

34 Clustering Indices  A clustering index can be used when the field (the clustering field) is non-key, and the data file is sorted by the clustering field.  A file can have at most one primary index or one clustering index, but not both.  A clustering file is also an ordered file with two fields:  Clustering field  pointer to the first block that has a record with that value for its clustering field.  There is one entry in the clustering index for each distinct value of the clustering field (rather than for every record).  Sparse index (nondense)

35 Clustering Indices  A clustering index on the DEPNo ordering nonkey field of an EMPLOYEE file.

36 Clustering Indices  Record insertion and deletion still cause problems  a solution; cluster of contiguous blocks  Good for range searches  Use location mechanism to locate index entry at start of range  This locates first data record.  Subsequent data records are contiguous if index is clustered (not so if unclustered)

37 Clustering Indices  Clustering index with a separate block cluster for each group of records that share the same value for the clustering field.

38 Secondary Indices  Secondary index:  is specified on any nonordering field of the file.  Non-ordering field can be a key (unique) or a non-key (duplicates)  The index has different ordering than the one of the file.  A file can have several secondary indices in addition to its primary index.  there is one index entry for each data record  index record points either to the block in which the record is stored, or to the record itself  Hence, such an index is dense

39 Secondary Indices  A secondary index usually needs more storage space and longer search time than does a primary index.  It has larger number of entries.  Sequential scan using primary index is efficient, but a sequential scan using a secondary index is expensive  each record access may fetch a new block from disk

40 Secondary Indices  A dense secondary index (with block pointers) on a nonordering KEY field.

41 Secondary Indices  A dense secondary index (with record pointers) on a non- ordering non-key field.

42 Index Types and Indexing Fields  Also, review Table Data file ordered by indexing field Data file not ordered by indexing field Indexing field is keyPrimary IndexSecondary index (Key) Indexing field is nonkeyClustering IndexSecondary index (NonKey)

43 Multilevel Indices  To search the index faster we can create an index for the index.  A multilevel index considers the index file as an ordered file and creates a primary index for the first level  outer index – a sparse index of primary index  inner index – the primary index file  The above process can be repeated for a higher level if the previous level needs more than one block of disk storage.  Read EXAMPLE 3

44 Multilevel Indices

45 B + -Tree Index  A B + -tree, of order f (fan-out --- maximum node capacity), is a rooted tree satisfying the following:  All paths from root to leaf are of the same length (balanced tree)  Each non-leaf node (except the root) has between  f/2  and up to f tree pointers (f-1 key values).  A leaf node has between  f/2  and f-1 data pointers (plus a pointer for sibling node).  If the root is not a leaf, it has at least 2 children.  If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and f-1 values.

46 B + -Tree Non-leaf Node Structure  K i are the search-key values, K 1  K 2  K 3  …  K f-1  all keys in the subtree to which P 1 points are  K 1.  all keys in the subtree to which P f points are  K f-1.  for 2  i  f-1, all keys in the subtree to which P i points have values  K i-1 and  K i.  P i are pointers to children nodes (tree nodes).

47 B + -Tree Leaf Node Structure  for i = 1, 2, …, f-1, pointer Pr i is a data pointer, that either points to a  file record with search-key value K i, or  block of record pointers that point to records having search-key value K i. (if search-key is not a key)  P next points to next leaf node in search-key order.  Within each leaf node, K 1  K 2  K 3  …  K f-1  If L i, L j are leaf nodes and i  j, then  L i ’s search-key values  L j ’s search-key values

48 Sample Leaf Node From non-leaf node to next leaf in sequence To record with key 57 To record with key 81 To record with key 95

49 Sample Non-Leaf Node to keysto keysto keys to keys  5757  k  8181  k  95 

50 Example of a B + -Tree Rootf=

51 Number of pointers/keys for B + -Tree Full nodemin. node Non-leaf Leaf f=

52 Observations about B + -Trees  In a B + -tree, data pointers are stored only at the leaf nodes of the tree  hence, the structure of leaf nodes differs from the structure of internal nodes.  The leaf nodes have an entry for every value of the search field, along with a data pointer to the record.  Some search field values from the leaf nodes are repeated in the internal nodes.

53 B + -Trees: Search  Let a be a search key value and T the pointer to the root of the tree that has f pointer.  Search(a, T)  If T is non-leaf node:  for the first i that satisfy a  K i, 1  i  f-1  call Search(a, P i ),  elsecall Search(a, P f ).  Else //T is a leaf node  if no value in T equals a, report not found.  else if K i in T equals a, follow pointer Pr i to read the record/block.

54 B + -Trees: Search  In processing a query, a path is traversed in the tree from the root to some leaf node.  If there are n search-key values in the file,  the path is no longer than  log  f/2  (n)  (worst case).  With 1 million search key values and f = 100, at most log 50 ( ) = 4 nodes are accessed in a lookup.  Contrast this with a balanced binary tree with 1 million search key values -- around 20 nodes are accessed in a lookup.

55 B + -Trees: Insertion  Find the leaf node in which the search-key value would appear  If the search-key value is found in the leaf node,  add the record to main file and if necessary  add to the block a pointer to the record  If the search-key value is not there,  add the record to the main file and then:  If there is room in the leaf node, insert (key- value, pointer) pair in the leaf node  Otherwise, split the node along with the new (key-value, pointer) entry

56 B + -Trees: Insertion  Splitting a node:  take the f (search-key value, pointer) pairs (including the one being inserted) in sorted order.  place the first  (f+1)/2  in the original node x, and the rest in a new node y.  let k be the largest key value in x.  insert (k, y) in the parent node in their correct sequence.  If the parent is full  the entries in the parent node up to P j, where j =  (f+1)/2  are kept, while the j th search value is moved to the parent, no replicated.  A new internal node will hold the entries from P j+1 to the end of the entries in the node.

57 B + -Trees: Insertion  The splitting of nodes proceeds upwards till a node that is not full is found.  In the worst case the root node may be split increasing the height of the tree by 1.

58 Insertion – Example 3  Insert key = 31 f=

59 Insertion – Example 3  Insert key = 7 f=

60 Insertion – Example 3  Insert key = 160 f=

61 Insertion – Example 3  New root, insert 45 f= new root

62

63 B + -Trees: Deletion  Find the record to be deleted, and remove it from the main file and from the bucket (if present).  Remove (search-key value, pointer) from the leaf node.  If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then  Insert all the search-key values in the two nodes into a single node (the one on the left), and delete the other node.

64 B + -Trees: Deletion  Delete the pair (K i-1, P i ), where P i is the pointer to the deleted node, from its parent, recursively using the above procedure.  Otherwise, if the node has too few entries due to the removal, and the entries in the node and a sibling DO NOT fit into a single node, then  Redistribute the pointers between the node and a sibling such that both have more than the minimum number of entries.  Update the corresponding search-key value in the parent of the node.

65 B + -Trees: Deletion  The node deletions may cascade upwards till a node which has  f/2  or more pointers is found.  If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root.

66 Merge with Sibling  Delete f=4 50

67 f=4 Redistribute Keys  Delete 40

68 new root f=4 Non-leaf Merging  Delete 37 30

69

70 Extra Reading  Read Examples 1 to 7.