File Processing : Index and Hash


File Processing : Index and Hash 2018, Spring Pusan National University Ki-Joune Li

What is an index?
Index in a book: Index : Keyword → Pages.
Without an index: exhaustive search, which is too expensive.
Index for a file or database: a function or mechanism F_Index : S_Predicate → B, mapping a search predicate to block numbers on the hard disk,
e.g. find student records where student.GPA > 4.0.

Data Retrieval Time
Data retrieval on disk has two phases.
1st phase: search with a condition (predicate). Search Condition → { Block# }.
2nd phase: data access. The blocks found in the 1st phase are read from the database on disk.
Data access time depends on file structure, disk placement, clustering, etc.

Blocking Factor
Blocking factor Bf: the number of records in a block.
Blocking factor and number of disk accesses: ND = Nrecord / Bf.
By maximizing the blocking factor, we reduce the number of disk accesses.
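The relation ND = Nrecord / Bf can be checked with a small calculation. This is only a sketch: the function name and the record counts are illustrative assumptions, and the result is rounded up because a partly filled block still costs one access.

```python
def disk_accesses(n_records: int, blocking_factor: int) -> int:
    """Blocks read in a full scan: ND = Nrecord / Bf, rounded up."""
    if blocking_factor <= 0:
        raise ValueError("blocking factor must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-n_records // blocking_factor)


# Hypothetical file of 1,000,000 records with growing blocking factors.
for bf in (10, 50, 100):
    print(bf, disk_accesses(1_000_000, bf))
```

A larger Bf packs more records into each block, so the same file is covered by fewer block reads.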

How to Accelerate Phase 1?
Phase 1 can be accelerated by an index or by a hash.
Index vs. hash:
Index: a type of data structure. It needs additional data structures.
Hash: a type of mechanism. It may not need any additional data structure (not exactly true).

A Simple Idea on Index
Inverted file: a mapping table from keywords to block numbers.
Why is an inverted file better than nothing? A table lookup replaces an exhaustive search of the data.
If the table is too large to fit in main memory, it has to be stored on disk, so accessing the index itself costs disk accesses.
Example table (Keyword, Block#): Juliet, Romeo B26, Hamlet B22, …, Carmen B212.
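As a minimal sketch of the mapping-table idea, an in-memory dictionary can stand in for the inverted file. The three entries are taken from the table on the slide. The `lookup` helper is an illustrative name, and the sketch assumes the whole table fits in memory, which is exactly the case the slide warns may not hold.

```python
# Inverted file: keyword -> block number (entries from the slide's table).
inverted_file = {"Romeo": "B26", "Hamlet": "B22", "Carmen": "B212"}


def lookup(keyword):
    """Phase 1: return the block number for a keyword, or None if absent."""
    return inverted_file.get(keyword)


print(lookup("Hamlet"))   # B22
print(lookup("Othello"))  # None: keyword not indexed
```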

Searching Algorithms and Index
A good way to accelerate searching is a tree: O(log n).
Reorganize the inverted file into a tree.
Binary search tree: branching factor = 2.
Tree in memory space vs. in disk space:
Memory space: the cost is the number of comparisons.
Disk space: the cost is the number of block accesses.
Figure: binary search tree with nodes (key, block number): (30, b27), (14, b17), (40, b26), (34, b17), (55, b26).
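The sketch below stores the five (key, block number) pairs from the figure in a binary search tree and counts the nodes visited during a search. The arrangement of the nodes (30 at the root, 14 and 40 below it, 34 and 55 below 40) is an assumption, since the figure itself did not survive. If every node sat in its own disk block, the visit count would be the number of block accesses.

```python
class Node:
    def __init__(self, key, block, left=None, right=None):
        self.key, self.block = key, block
        self.left, self.right = left, right


# Assumed layout of the slide's example tree.
root = Node(30, "b27",
            Node(14, "b17"),
            Node(40, "b26", Node(34, "b17"), Node(55, "b26")))


def search(node, key):
    """Return (block number or None, number of nodes visited)."""
    visited = 0
    while node is not None:
        visited += 1
        if key == node.key:
            return node.block, visited
        node = node.left if key < node.key else node.right
    return None, visited


print(search(root, 34))  # ('b17', 3)
print(search(root, 20))  # (None, 2)
```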

Paged Tree: m-way search tree
How to determine m? One node is one disk page.
e.g. when 1 disk page is 4 K bytes: 4 + 4m + 8(m-1) = 4096 → m = 341 (rounded down).
The result is a very fat tree.
Figure: node layout consisting of the number of delimiters, the delimiters (key, block number) such as (57, b27), (103, b28), …, (343, b14), and the child block numbers.
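The page-size equation can be solved for m with integer arithmetic. In this sketch the reading of the three terms is an assumption: 4 bytes for the delimiter count, 4 bytes per child block number, and 8 bytes per delimiter (key plus block number). The parameter names are illustrative only.

```python
def max_fanout(page_size=4096, header=4, child_ptr=4, delimiter=8):
    """Largest m with header + child_ptr*m + delimiter*(m-1) <= page_size."""
    return (page_size - header + delimiter) // (child_ptr + delimiter)


m = max_fanout()
print(m)                          # 341
print(4 + 4 * m + 8 * (m - 1))    # 4088 bytes used, fits in 4096
```

With m = 342 the node would need 4100 bytes, so 341 is the largest fan-out that fits a 4 KB page under these assumptions.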

Problem of m-way search tree
Search performance is determined by the height.
The tree is not balanced. Average: O(log n). Worst case: n / Bf → O(n).
The height is determined by the insertion order, e.g. insertion in ascending order.
How to make it balanced? Balanced m-way search tree: B-tree.
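The effect of insertion order on height can be seen with the simplest case, an unbalanced binary search tree (m = 2). This is a sketch under that simplifying assumption, not an m-way implementation. The key count of 200 and the random seed are arbitrary.

```python
import random


def insert(root, key):
    """Insert into an unbalanced BST stored as [key, left, right]."""
    new = [key, None, None]
    if root is None:
        return new
    node = root
    while True:
        i = 1 if key < node[0] else 2   # index of left or right child
        if node[i] is None:
            node[i] = new
            return root
        node = node[i]


def height(root):
    """Number of levels, computed level by level (no recursion)."""
    level, depth = ([root] if root else []), 0
    while level:
        depth += 1
        level = [c for n in level for c in n[1:] if c is not None]
    return depth


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


keys = list(range(1, 201))
shuffled = keys[:]
random.Random(0).shuffle(shuffled)

print(height(build(keys)))      # 200: ascending order degenerates to a list
print(height(build(shuffled)))  # much smaller for a random order
```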

B-tree
B-tree: balanced m-way search tree.
Root node: either no child node or at least two child nodes.
Internal node: m/2 ~ m child nodes (block numbers).
External node: data block numbers instead of child nodes.
Balanced: the tree grows by upward split, instead of the downward split of a binary tree.

Downward Split
Suppose m = 3.
Figure: insert 10, 20; insert 30, which causes an overflow resolved by an upward split; insert 40.

Downward Split
Figure: continuing the example, insert 50 and insert 60; the final tree holds the keys 10, 20, 30, 40, 50, 60, 70.
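The insertion sequence of the example can be replayed with a small B-tree sketch for m = 3. It rests on several assumptions: nodes hold keys only (no block numbers), a node overflows when it reaches m keys, the middle key moves up on a split, and duplicate keys are not handled. The class and function names are illustrative.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for an external node


class BTree:
    def __init__(self, m=3):
        self.m = m                       # a node holds at most m - 1 keys
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                        # root overflowed: tree grows upward
            mid, right = split
            self.root = Node([mid], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if not node.children:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split:                    # child split: absorb its middle key
                mid, right = split
                node.keys.insert(i, mid)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.m:
            return None                  # no overflow
        h = len(node.keys) // 2          # overflow: push the middle key up
        mid = node.keys[h]
        right = Node(node.keys[h + 1:], node.children[h + 1:])
        node.keys = node.keys[:h]
        node.children = node.children[:h + 1]
        return mid, right


def levels(tree):
    """Keys of every node, grouped by level from the root down."""
    out, level = [], [tree.root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


t = BTree(m=3)
for k in (10, 20, 30, 40, 50, 60, 70):
    t.insert(k)
print(levels(t))  # [[[40]], [[20], [60]], [[10], [30], [50], [70]]]
```

Although the keys arrive in ascending order, every external node ends up at the same depth, because new levels are only ever added at the root.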

Meaning of Downward Split
Always balanced.
Not much influenced by the order of insertions.
Internal nodes: m/2 ~ m child nodes (block numbers).
Figure: the example tree with its root node, internal nodes, and external nodes labeled.

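Search by B-tree
Figure: searching for 45 in the example tree (root 40; internal nodes 20 and 60; external nodes 10, 30, 50, 70). The search descends from 40 to 60 to 50 and ends with Not Found.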
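The search for 45 can be traced in code. This sketch hard-codes the example tree as nested (keys, children) pairs, which is an illustrative representation rather than an on-disk page format, and records the nodes visited on the way down.

```python
import bisect


def leaf(*keys):
    """External node: keys only, no children."""
    return (list(keys), [])


# Example tree: root 40; internal nodes 20 and 60; external nodes 10, 30, 50, 70.
tree = ([40], [([20], [leaf(10), leaf(30)]),
               ([60], [leaf(50), leaf(70)])])


def search(node, key):
    """Return (found, list of the key lists of the nodes visited)."""
    path = []
    while True:
        keys, children = node
        path.append(keys)
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return True, path
        if not children:
            return False, path
        node = children[i]               # descend into the i-th subtree


print(search(tree, 45))  # (False, [[40], [60], [50]])
print(search(tree, 30))  # (True, [[40], [20], [30]])
```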

Performance of B-tree
Number of comparisons within a node: trivial.
Number of nodes to visit: the depth of the tree.
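How small the depth stays can be estimated with logarithms. The sketch below is a rough approximation and not an exact B-tree height bound: it assumes every node has the same fan-out, either m (all nodes full) or m/2 rounded up (all nodes half full), and it ignores the special case of the root. The function name and the key count are illustrative.

```python
import math


def depth_range(n_keys, m):
    """Rough (full nodes, half-full nodes) level counts for n_keys keys."""
    best = math.ceil(math.log(n_keys, m))
    worst = math.ceil(math.log(n_keys, math.ceil(m / 2)))
    return best, worst


print(depth_range(1_000_000, 341))  # (3, 3)
print(depth_range(1_000_000, 4))    # (10, 20)
```

Under these assumptions a fan-out of 341 keeps a million keys within about three levels, so a search touches only a few pages.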

Problem of B-tree
Types of search: exact match search and range search,
e.g. find students where 25 < student.GPA < 50.
The B-tree is good for exact match search but bad for range search.
Figure: the example B-tree with keys 10, 20, 30, 40, 50, 60, 70.

B+-tree
A variant of the B-tree.
Duplicate all elements at the leaf nodes (external nodes).
Linked list of leaf nodes.
Performance:
Exact match search and insertion: a small fraction of performance is sacrificed.
Range search: much more powerful than the B-tree.

B+-tree: Example
Figure: a B+-tree built from the keys 10, 20, 30, 40, 50, 60, showing the duplication of keys at the leaf level, an overflow, and the linked list of leaf nodes.

Range Search with B+-tree
Find students where GPA > 3.5.
Figure: in a B+-tree with root key 40 and leaf keys 10, 20, 30, 40, 50, 60, the search key 35 is located at the leaf level, and the linked list of leaf nodes is then followed.
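The range search can be sketched with two linked leaves and a single separator key. The leaf contents [10, 20, 30] and [40, 50, 60] and the separator 40 are read off the example figure and should be treated as assumptions. A one-level list of separators stands in for the descent through the internal nodes, and the names are illustrative.

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None                 # link to the next leaf


left, right = Leaf([10, 20, 30]), Leaf([40, 50, 60])
left.next = right
separators, leaves = [40], [left, right]  # one-level index over the leaves


def range_search(low):
    """Return (keys greater than low, number of leaf nodes visited)."""
    # Descend once to the leaf that could contain the first qualifying key.
    node = leaves[bisect.bisect_right(separators, low)]
    result, visited = [], 0
    while node is not None:              # then follow the leaf linked list
        visited += 1
        result.extend(k for k in node.keys if k > low)
        node = node.next
    return result, visited


print(range_search(35))  # ([40, 50, 60], 2)
print(range_search(40))  # ([50, 60], 1)
```

After the single descent, the scan never returns to the internal nodes, which is what the leaf linked list buys over a plain B-tree.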

Performance of B+-tree
Determined by the depth d.
Exact match search and insertion (without split): d node (page) accesses.
Range search: the number of node accesses also depends on nq, the number of records to retrieve.