File Processing : Index and Hash 2015, Spring Pusan National University Ki-Joune Li.


1 File Processing : Index and Hash 2015, Spring Pusan National University Ki-Joune Li

2 STEMPNU What is an index?
Index in a book
- Index: Keyword → Pages
- Without an index: exhaustive search, which is too expensive
Index for a file or database
- A function or mechanism $F_{Index} : S_{Predicate} \rightarrow B$ (block numbers on hard disk)
- e.g. find student records where student.GPA > 4.0

3 STEMPNU Data Retrieval Time
Data retrieval on disk: two phases
- 1st phase: search with a condition (predicate)
- 2nd phase: data access
[Figure: Search Condition → Search (1st phase) → { Block# } → Data Access (2nd phase) → Database on Disk]
Data access time:
- File structure
- Disk placement
- Clustering, etc.

4 STEMPNU Blocking Factor $B_f$
Blocking factor
- Number of records in a block
Blocking factor and number of disk accesses
- $N_D = N_{record} / B_f$
By maximizing the blocking factor, we reduce the number of disk accesses.
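As a small worked example of the relation $N_D = N_{record} / B_f$, the sketch below computes the number of block reads for a full scan. The function name `disk_accesses` and the record counts are illustrative assumptions, and the result is rounded up on the assumption that a partly filled last block still costs one access.

```python
def disk_accesses(n_records, blocking_factor):
    """Blocks to read for a full scan: N_record / B_f, rounded up (assumption)."""
    return (n_records + blocking_factor - 1) // blocking_factor

# A larger blocking factor means fewer disk accesses for the same file.
for bf in (10, 100):
    print(f"B_f = {bf:3d}: {disk_accesses(10_000, bf)} disk accesses")
```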

5 STEMPNU How to Accelerate Phase 1?
Phase 1 can be accelerated
- by index or by hash
Index vs. hash
- Index: a type of data structure; needs additional data structures
- Hash: a type of mechanism; may not need any additional data structure (not exactly true)

6 STEMPNU A Simple Idea on Index
Mapping table from keywords to block numbers
- Inverted file
- Why is an inverted file better than nothing?
If the table is too large to fit in main memory
- It has to be stored on disk
- Disk access is then needed for index access

Keyword   Block#
Romeo     B26
Hamlet    B22
…         …
Carmen    B212
Juliet
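A minimal sketch of the mapping-table idea, assuming the table fits in memory. The keywords and block numbers come from the slide's table; the `disk` contents and both function names are hypothetical, made up only to count block reads.

```python
# Hypothetical "disk": block number -> records stored in that block.
disk = {
    "B22": ["Hamlet"],
    "B26": ["Romeo"],
    "B212": ["Carmen"],
}

# Inverted file: keyword -> block number (entries taken from the slide).
inverted_file = {"Romeo": "B26", "Hamlet": "B22", "Carmen": "B212"}

def exhaustive_search(keyword):
    """Scan every block until the keyword is found; count block reads."""
    reads = 0
    for block_no, records in disk.items():
        reads += 1
        if keyword in records:
            return block_no, reads
    return None, reads

def indexed_search(keyword):
    """Look the block number up in the table, then read that one block."""
    block_no = inverted_file.get(keyword)
    return block_no, (1 if block_no is not None else 0)

print(exhaustive_search("Carmen"))  # scans all three blocks
print(indexed_search("Carmen"))     # reads a single block
```

If the table itself had to live on disk, each lookup would also pay for index block reads, which this sketch does not model.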

7 STEMPNU Searching Algorithms and Index
A good way to accelerate searching
- Tree: O(log n)
- Reorganize the inverted file into a tree
- Binary search tree: branching factor = 2
Tree in memory space vs. in disk space
- Memory space: number of comparisons
- Disk space: number of block accesses
[Figure: binary search tree with (key, block number) nodes (30, b27), (14, b17), (40, b26), (34, b17), (55, b26)]
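The binary search tree idea can be sketched as follows. The (key, block number) pairs are the ones in the figure; the insertion order, and therefore the exact tree shape, is an assumption, as are the names `BSTNode`, `bst_insert` and `bst_search`. The search returns the number of nodes visited, which corresponds to comparisons in memory or block accesses on disk.

```python
class BSTNode:
    def __init__(self, key, block):
        self.key, self.block = key, block
        self.left = self.right = None

def bst_insert(root, key, block):
    """Insert (key, block) iteratively and return the root."""
    if root is None:
        return BSTNode(key, block)
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, BSTNode(key, block))
            return root
        node = child

def bst_search(root, key):
    """Return (block number or None, number of nodes visited)."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        if key == node.key:
            return node.block, visited
        node = node.left if key < node.key else node.right
    return None, visited

root = None
for key, block in [(30, "b27"), (14, "b17"), (40, "b26"), (34, "b17"), (55, "b26")]:
    root = bst_insert(root, key, block)

print(bst_search(root, 34))  # found after visiting 30, 40, 34
print(bst_search(root, 20))  # not found
```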

8 STEMPNU Paged Tree: m-way search tree
[Figure: m-way search tree; each node stores the number of delimiters, followed by delimiters and block numbers]
How to determine m?
- One node: one disk page
- e.g. when 1 disk page is 4 KB: $4 + 4m + 8(m-1) = 4096 \Rightarrow m = 341$
- Very fat tree
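The page-size calculation can be checked with a few lines. The reading of the three terms (a 4-byte delimiter count, m 4-byte child pointers, and m-1 entries of 8 bytes each) is an assumption based on the labels in the figure, and `max_fanout` and its parameter names are hypothetical.

```python
def max_fanout(page_size, count_bytes=4, pointer_bytes=4, entry_bytes=8):
    """Largest m with count + m*pointer + (m-1)*entry <= page_size."""
    return (page_size - count_bytes + entry_bytes) // (pointer_bytes + entry_bytes)

m = max_fanout(4096)
used = 4 + 4 * m + 8 * (m - 1)
print(m, used)  # the node with m children must still fit in one 4096-byte page
```

Solving the equation exactly gives m = 4100 / 12, which is not an integer, so the value is rounded down to the largest fan-out that still fits in one page.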

9 STEMPNU Problem of m-way search tree
m-way search tree
- Search performance: determined by the height
- Not balanced
  - Average: O(log n)
  - Worst case: $n / B_f$, i.e. O(n)
- Height: determined by insertion order
  - e.g. insertion in ascending order
How to make it balanced?
- Balanced m-way search tree: B-tree
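The effect of insertion order on height can be illustrated with the simplest case, m = 2 (a plain binary search tree without rebalancing). This is a sketch under that assumption; `bst_height` is a hypothetical helper and the key sets are chosen only for illustration.

```python
def bst_height(keys):
    """Height (in nodes) of an unbalanced BST built by inserting keys in order."""
    children = {}  # key -> [left key, right key]
    depth = {}     # key -> depth of its node, root = 1
    root = None
    for k in keys:
        if root is None:
            root, children[k], depth[k] = k, [None, None], 1
            continue
        cur = root
        while k != cur:                      # duplicates are ignored
            side = 0 if k < cur else 1
            nxt = children[cur][side]
            if nxt is None:
                children[cur][side] = k
                children[k] = [None, None]
                depth[k] = depth[cur] + 1
                break
            cur = nxt
    return max(depth.values(), default=0)

print(bst_height([10, 20, 30, 40, 50, 60, 70]))  # ascending order: a chain
print(bst_height([40, 20, 60, 10, 30, 50, 70]))  # favorable order: balanced
```

With ascending insertion every key becomes the right child of the previous one, so the height grows linearly with the number of keys.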

10 STEMPNU B-tree
B-tree: balanced m-way search tree
- Root node: no child node, or more than one child node
- Internal node: ⌈m/2⌉ ~ m child nodes (block numbers)
- External node: data block number instead of child node
- Balanced: upward split, instead of the downward split of a binary tree

11 STEMPNU Downward Split
[Figure: suppose m = 3. Insert 10, 20; insert 30 causes an overflow, which triggers an upward split; then insert 40]

12 STEMPNU Downward Split
[Figure: the example continues with inserts of 50, 60 and 70, each shown with the resulting tree over the keys 10 to 70]
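The insertion example can be replayed with a small sketch of B-tree insertion with upward splits. Several things here are assumptions rather than the slides' own definition: a node overflows when it reaches m keys (so m = 3 allows two keys per node), nodes hold keys only (no data block numbers), and the names `Node`, `insert` and `leaf_depths` are hypothetical.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted keys stored in this node
        self.children = children or []  # empty list for a leaf

def insert(root, key, m=3):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    middle, right = split               # root was split: the tree grows upward
    return Node([middle], [root, right])

def _insert(node, key, m):
    """Insert below node; return (middle key, new right node) on overflow."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key, m)
        if split is not None:           # child split: absorb its middle key
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) < m:
        return None
    half = len(node.keys) // 2          # overflow: push the middle key upward
    middle = node.keys[half]
    right = Node(node.keys[half + 1:], node.children[half + 1:])
    node.keys = node.keys[:half]
    node.children = node.children[:half + 1]
    return middle, right

def leaf_depths(node, depth=1):
    """Set of depths at which leaves occur (a single value if balanced)."""
    if not node.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node.children))

root = Node()
for k in [10, 20, 30, 40, 50, 60, 70]:
    root = insert(root, k)
print(root.keys, [c.keys for c in root.children], leaf_depths(root))
```

Even though the keys arrive in ascending order, all leaves end up at the same depth, because the tree only grows when the root itself is split.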

13 STEMPNU Meaning of Downward Split
Always balanced
- Not much influenced by the order of insertions
Internal nodes: ⌈m/2⌉ ~ m child nodes (block numbers)
[Figure: B-tree over keys 10, 20, 30, 40, 50, 60, 70 with the root node, internal nodes and external nodes labeled]

14 STEMPNU Search by B-tree
[Figure: searching the B-tree over keys 10 to 70 for key 45; result: Not Found]

15 STEMPNU Performance of B-tree
Number of comparisons within a node: trivial
Number of nodes to visit: depth
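A sketch of exact-match search that counts visited nodes, to make the "number of nodes to visit: depth" point concrete. The tree shape below is an assumption (one B-tree over the keys 10 to 70 with m = 3, not confirmed by the transcript), and nodes are plain `(keys, children)` tuples chosen for brevity.

```python
import bisect

def leaf(*keys):
    """A node without children."""
    return (list(keys), [])

# Assumed tree shape: root 40, internal nodes 20 and 60, four leaves.
tree = ([40], [([20], [leaf(10), leaf(30)]),
               ([60], [leaf(50), leaf(70)])])

def search(node, key):
    """Return (found, number of nodes visited)."""
    visited = 0
    while True:
        keys, children = node
        visited += 1
        i = bisect.bisect_left(keys, key)   # comparison inside one node
        if i < len(keys) and keys[i] == key:
            return True, visited
        if not children:                    # reached a leaf without a match
            return False, visited
        node = children[i]

print(search(tree, 45))  # not found, after one node per level
print(search(tree, 70))  # found in a leaf
```

An unsuccessful search visits exactly one node per level, so its cost equals the depth of the tree.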

16 STEMPNU Problem of B-tree
Types of search
- Exact match search
- Range search, e.g. find students where 25 < student.GPA < 50
B-tree
- Good for exact match search
- Bad for range search
[Figure: B-tree over keys 10, 20, 30, 40, 50, 60, 70]

17 STEMPNU B+-tree
A variant of B-tree
- Duplicate all elements at leaf nodes (external nodes)
- Linked list of leaf nodes
Performance
- Exact match search and insertion: a small performance sacrifice
- Range search: much more powerful than B-tree

18 STEMPNU B+-tree: Example
[Figure: step-by-step construction of a B+-tree from keys 10, 20, 30, 40, 50, 60, with labels "overflow", "Linked List" and "Duplication"]

19 STEMPNU Range Search with B+-tree
Find students where GPA > 3.5
[Figure: four steps of the range search on the B+-tree with keys 10 to 60, using search key 35]
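Range search over linked leaves can be sketched as below. This is a simplified, assumed layout: a single index node of separator keys above a chain of leaves holding two keys each, which is not necessarily the tree drawn on the slides. `Leaf`, `build` and `range_search` are hypothetical names, and the bounds are treated as inclusive.

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # link to the next leaf in key order

def build(sorted_keys, leaf_size=2):
    """Pack sorted keys into linked leaves; return (separators, leaves)."""
    leaves = [Leaf(sorted_keys[i:i + leaf_size])
              for i in range(0, len(sorted_keys), leaf_size)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    # Each separator duplicates the first key of a leaf (except the first leaf).
    separators = [lf.keys[0] for lf in leaves[1:]]
    return separators, leaves

def range_search(separators, leaves, low, high):
    """Return (keys in [low, high], number of nodes accessed)."""
    accesses = 1  # the index node
    lf = leaves[bisect.bisect_right(separators, low)]
    result = []
    while lf is not None:
        accesses += 1
        for k in lf.keys:
            if k > high:
                return result, accesses
            if k >= low:
                result.append(k)
        lf = lf.next  # follow the linked list instead of going back up
    return result, accesses

separators, leaves = build([10, 20, 30, 40, 50, 60])
print(range_search(separators, leaves, 35, float("inf")))
```

After locating the first relevant leaf through the index, the scan only follows `next` links, which is why range search benefits from the leaf-level linked list.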

20 STEMPNU Performance of B+-tree
Performance
- Determined by the depth
- Exact match search and insertion (without split): d node (page) accesses
- Range search: [formula not preserved] node accesses ($n_q$: number of records to retrieve)

