File Processing: Index and Hash
Spring 2004, Pusan National University, Ki-Joune Li (Spatiotemporal Database Laboratory)


1 File Processing: Index and Hash. Spring 2004, Pusan National University, Ki-Joune Li

2 What is an index?
- Index in a book: keyword → pages.
  Without an index, finding a keyword requires an exhaustive search, which is too expensive.
- Index for a file or database: a function or mechanism that maps a predicate → blocks (block numbers on hard disk).
  e.g. find student records where student.GPA > 4.0

3 Data Retrieval Time
Data retrieval on disk has two phases:
- 1st phase: search with a condition (predicate). Search condition → { Block# }
- 2nd phase: data access. Block number → database on disk
Data access time depends on:
- File structure
- Disk placement
- Clustering, etc.

4 Blocking Factor
- Blocking factor B_f: the number of records in a block.
- Blocking factor and number of disk accesses: N_D = N_record / B_f
- By maximizing the blocking factor, we reduce the number of disk accesses.
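
The relation N_D = N_record / B_f can be checked with a few lines of Python. This is only a sketch: the function name and the record counts are made up for illustration, and rounding up with `math.ceil` is an assumption (the slide gives the plain quotient), added because a partly filled last block still costs one access.

```python
import math

def disk_accesses(n_records: int, blocking_factor: int) -> int:
    """Blocks that must be read to scan n_records records
    when each block holds blocking_factor records."""
    if blocking_factor <= 0:
        raise ValueError("blocking factor must be positive")
    # Assumption: round up, since a partially filled block is still one access.
    return math.ceil(n_records / blocking_factor)

# Hypothetical file of 10,000 records scanned with different blocking factors.
for bf in (10, 50, 100):
    print(f"B_f = {bf:3d} -> N_D = {disk_accesses(10_000, bf)}")
```

A larger blocking factor gives proportionally fewer block reads for the same number of records.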

5 How to Accelerate Phase 1?
Phase 1 can be accelerated by an index or by a hash.
Index vs. hash:
- Index: a type of data structure → needs additional data structures.
- Hash: a type of mechanism → may not need any additional data structure (not exactly true).

6 A Simple Idea on Index
- Mapping table from keywords to block numbers: an inverted file.
- Why is an inverted file better than nothing?
- If the table is too large to fit in main memory, it has to be stored on disk, so accessing the index itself costs disk accesses.

Keyword   Block#
Romeo     B26
Hamlet    B22
…         …
Carmen    B212
Juliet
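
A minimal sketch of the mapping-table idea, compared with an exhaustive scan. The three keyword-to-block entries come from the slide's table; the `blocks` dictionary (one record per block) and both function names are assumptions made only for illustration.

```python
# Hypothetical disk contents: block number -> records stored in that block.
blocks = {
    "B22": ["Hamlet"],
    "B26": ["Romeo"],
    "B212": ["Carmen"],
}

# Inverted file: keyword -> block number (entries taken from the slide's table).
inverted_file = {"Romeo": "B26", "Hamlet": "B22", "Carmen": "B212"}

def exhaustive_search(keyword):
    """Read blocks one by one until the keyword is found.
    Returns (block number or None, number of blocks read)."""
    for blocks_read, (block_no, records) in enumerate(blocks.items(), start=1):
        if keyword in records:
            return block_no, blocks_read
    return None, len(blocks)

def indexed_search(keyword):
    """Look the keyword up in the mapping table, then read only that block.
    Returns (block number or None, number of data blocks read)."""
    block_no = inverted_file.get(keyword)
    return block_no, (1 if block_no is not None else 0)

print(exhaustive_search("Carmen"))
print(indexed_search("Carmen"))
```

The sketch keeps the table in memory, so the lookup is free. Once the table has to live on disk, reading it costs block accesses too, which is the problem the slide raises.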

7 Searching Algorithms and Index
- A good way to accelerate searching: a tree, O(log n).
- Reorganize the inverted file into a tree.
- Binary search tree: branching factor = 2.
- Tree in memory space vs. in disk space:
  - Memory space: the cost is the number of comparisons.
  - Disk space: the cost is the number of block accesses.
Figure: binary search tree whose nodes hold (key, block number) pairs: (30, b27), (14, b17), (40, b26), (34, b17), (55, b26).
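
The following sketch builds a binary search tree from the (key, block number) pairs shown on the slide and counts how many nodes a search visits. The insertion order, and therefore the exact tree shape, is an assumption, as are the class and function names. If every node were stored in its own disk block, the visit count would equal the number of block accesses.

```python
class Node:
    def __init__(self, key, block):
        self.key = key        # search key
        self.block = block    # block number where the record lives
        self.left = None
        self.right = None

def insert(root, key, block):
    """Insert (key, block) into the tree rooted at root; return the root."""
    if root is None:
        return Node(key, block)
    if key < root.key:
        root.left = insert(root.left, key, block)
    elif key > root.key:
        root.right = insert(root.right, key, block)
    else:
        root.block = block    # same key: overwrite the block number
    return root

def search(root, key):
    """Return (block number or None, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return node.block, visited
        node = node.left if key < node.key else node.right
    return None, visited

# Pairs from the slide; the insertion order is assumed.
root = None
for key, block in [(30, "b27"), (14, "b17"), (40, "b26"), (34, "b17"), (55, "b26")]:
    root = insert(root, key, block)

print(search(root, 55))
```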

8 Paged Tree: m-way search tree
Figure: node layout of an m-way search tree, with fields labeled "number of delimiters", "delimiter", and "block number".
How to determine m?
- One node: one disk page.
- e.g. when one disk page is 4 KB: 4 + 4m + 8(m-1) = 4096 → m = 341
- A very fat tree.
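
The page-size calculation can be reproduced as below. The reading of the formula is an assumption: 4 bytes for the delimiter count, 4 bytes per child pointer (m of them), and 8 bytes per delimiter entry (m-1 of them, each taken to be a key plus a block number). The function names and the `min_levels` helper are hypothetical, added only to illustrate how shallow such a tree can be.

```python
def max_branching_factor(page_bytes, count_bytes=4, pointer_bytes=4, delimiter_bytes=8):
    """Largest m with count_bytes + pointer_bytes*m + delimiter_bytes*(m-1) <= page_bytes."""
    return (page_bytes - count_bytes + delimiter_bytes) // (pointer_bytes + delimiter_bytes)

def min_levels(n_delimiters, m):
    """Smallest number of levels whose full m-way tree can hold n_delimiters.
    A full m-way tree with h levels holds m**h - 1 delimiters."""
    levels = 0
    while m ** levels - 1 < n_delimiters:
        levels += 1
    return levels

m = max_branching_factor(4096)
print(m)                          # branching factor for a 4 KB page
print(min_levels(1_000_000, m))   # levels needed for one million delimiters
```

With m = 341, three levels are enough for a million entries in the best case, which is the sense in which the tree is "very fat".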

9 Problem of the m-way search tree
- Search performance is determined by the height.
- The tree is not balanced:
  - Average: O(log n)
  - Worst case: n / B_f → O(n)
  - The height is determined by the insertion order, e.g. insertion in ascending order.
- How to make it balanced? Balanced m-way search tree: B-tree.
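
The dependence of height on insertion order can be seen with the simplest case, m = 2 (a plain binary search tree without rebalancing). This is a sketch under that simplification, not the m-way structure itself; all names and the sample key orders are chosen for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key without any rebalancing; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # duplicate key: nothing to do

def height(root):
    """Number of levels in the tree, counted level by level."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child is not None]
    return levels

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

print(height(build(range(1, 8))))           # ascending insertion order
print(height(build([4, 2, 6, 1, 3, 5, 7]))) # order that happens to balance the tree
```

The same seven keys give a chain of seven levels when inserted in ascending order and only three levels in the second order, which is why a self-balancing structure such as the B-tree is needed.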

