E.G.M. Petrakissearching1 Searching  Find an element in a collection in the main memory or on the disk  collection: (K 1,I 1 ),(K 2,I 2 )…(K N,I N )

Slides:



Advertisements
Similar presentations
Databasteknik Databaser och bioinformatik Data structures and Indexing (II) Fang Wei-Kleiner.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Hashing and Indexing John Ortiz.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A Downtown A Downtown A Mianus A Perry.
1 Lecture 8: Data structures for databases II Jose M. Peña
ICS 421 Spring 2010 Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 02/18/20101Lipyeow Lim.
E.G.M. PetrakisHashing1  Data organization in main memory or disk  sequential, binary trees, …  The location of a key depends on other keys => unnecessary.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
Tree-Structured Indexes. Introduction v As for any index, 3 alternatives for data entries k* : À Data record with key value k Á Â v Choice is orthogonal.
1 Tree-Structured Indexes Yanlei Diao UMass Amherst Feb 20, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
E.G.M. Petrakissorting1 Sorting  Put data in order based on primary key  Many methods  Internal sorting:  data in arrays in main memory  External.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
B-Trees Chapter 9. Limitations of binary search Though faster than sequential search, binary search still requires an unacceptable number of accesses.
1 abstract containers hierarchical (1 to many) graph (many to many) first ith last sequence/linear (1 to 1) set.
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
E.G.M. PetrakisB-trees1 Multiway Search Tree (MST)  Generalization of BSTs  Suitable for disk  MST of order n:  Each node has n or fewer sub-trees.
E.G.M. PetrakisTrees in Main Memory1 Balanced BST  Balanced BSTs guarantee O(logN) performance at all times  the height or left and right sub-trees are.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
Tree-Structured Indexes. Range Searches ``Find all students with gpa > 3.0’’ –If data is in sorted file, do binary search to find first such student,
Indexing structures for files D ƯƠ NG ANH KHOA-QLU13082.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Index Structures for Files Indexes speed up the retrieval of records under certain search conditions Indexes called secondary access paths do not affect.
Spring 2006 Copyright (c) All rights reserved Leonard Wesley0 B-Trees CMPE126 Data Structures.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Chapter 11 Indexing & Hashing. 2 n Sophisticated database access methods n Basic concerns: access/insertion/deletion time, space overhead n Indexing 
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
IT 60101: Lecture #151 Foundation of Computing Systems Lecture 15 Searching Algorithms.
7-1 Chapter 7 Searching. 7-2 nameno.age record 1BB616 record 2CC916 record 3AA818 record 4DD217 table(file) key internal key, embedded key 和整個 record.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
File Processing : Index and Hash 2015, Spring Pusan National University Ki-Joune Li.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
CSC 211 Data Structures Lecture 13
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
1 Searching Searching in a sorted linked list takes linear time in the worst and average case. Searching in a sorted array takes logarithmic time in the.
Chapter 12 B+ Trees CS 157B Spring 2003 By: Miriam Sy.
Index tuning-- B+tree. overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
Internal and External Sorting External Searching
Lecture 9COMPSCI.220.FS.T Lower Bound for Sorting Complexity Each algorithm that sorts by comparing only pairs of elements must use at least 
1 Tree-Structured Indexes Chapter Introduction  As for any index, 3 alternatives for data entries k* :  Data record with key value k   Choice.
Chapter 5 Record Storage and Primary File Organizations
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Database Applications (15-415) DBMS Internals- Part IV Lecture 15, March 13, 2016 Mohammad Hammoud.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
Tree-based Indexing Hessam Zakerzadeh.
CS522 Advanced database Systems
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Database Management Systems (CS 564)
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
Advanced Associative Structures
B+-Trees and Static Hashing
Tree-Structured Indexes
Indexing and Hashing Basic Concepts Ordered Indices
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Database Systems (資料庫系統)
2018, Spring Pusan National University Ki-Joune Li
Multiway Search Tree (MST)
Presentation transcript:

E.G.M. Petrakissearching1 Searching  Find an element in a collection in the main memory or on the disk  collection: (K 1,I 1 ),(K 2,I 2 )…(K N,I N )  given a query (I,K) locate (I i,K i ): K i = K  Primary key K i : identity of record  Secondary key: can be repeated  The search can be successful or unsuccessful

E.G.M. Petrakissearching2 Searching Methods  Sequential: data on lists or arrays  O(N) time, may be unacceptably slow  Indexed search:  tree indexing: data in trees  hashing or direct access : data on tables  Indexing requires preprocessing and extra space

E.G.M. Petrakissearching3 Important Factors  Ordered or unordered data  Known or unknown data distribution  some elements are searched more frequently  Data in main memory or disk  time depends on algorithmic steps or disk accesses  Dynamic (or static) data collections  Insertions & deletions are allowed (or not allowed)  Types of search operations allowed  random queries: search for records with key = k  range queries: search for records key low <= k <= key high

E.G.M. Petrakissearching4 Unordered Sequences  Lists or arrays of N elements  Number of comparisons:  p i : prob. to search for the i-th element  x i : number of comparisons when searching for the i-th element elements

E.G.M. Petrakissearching5 Equally Probable Elements  Cost of successful search  Cost to search for an element which may or may not be in the array  if p e : probability to search for the i-th element

E.G.M. Petrakissearching6 Other Cases  If p 1 >= p 2 >= … >= p N : move elements with higher probabilities to the front  If the probabilities are not known it is likely that some elements are searched more frequently than others element pipi

E.G.M. Petrakissearching7 I. Move to Front  Move the element to the front  e.g., if the user searches for 10  becomes:  Easy for lists, difficult for arrays: N-1 elements are moved 1 position to the left

E.G.M. Petrakissearching8 II. Transpositions  The element is shifted one position to the right  e.g., search(10)  becomes  Easy for arrays and lists

E.G.M. Petrakissearching9 Critique  Move to front adapts rapidly to the search conditions of the application  Transposition adapts slowly but is more intuitively correct  Combine the two techniques:  use initially move to front and  transposition later

E.G.M. Petrakissearching10 Searching Ordered Sequences  Sort the elements once  complexity: O(logN) instead of O(N)  Search techniques:  binary search  interpolation search  indexed sequential search

E.G.M. Petrakissearching11 I. Binary Search d: max number of comparisons d=2 levels

E.G.M. Petrakissearching12 Complexity  Maximum number or comparisons: a leaf is reached  Expected number of comparisons: tree searching stops before a leaf is reached

E.G.M. Petrakissearching13 II. Interpolation  Searching is guided by the values of the array  L: minimum value  U: maximum value  search position  Binary search always goes to the middle position

E.G.M. Petrakissearching14 Example  if x[h] = key element found; else search array on the left or on the right of h  e.g.  search(80): focuses on the 20% rightmost part of the array 0100

E.G.M. Petrakissearching15 Complexity  Average case: O(loglogN) uniform distribution of keys in the array  Worst case: O(N) on non uniform distribution  Binary search is O(logN) always!

E.G.M. Petrakissearching16 III. Indexed Sequential Search  A sorted index is set aside in addition to the array  Each element in the index points to a block of elements in the array  e.g., block of 10 or 20 elements  The index is searched before the array and guides the search in the array

E.G.M. Petrakissearching17 index array

E.G.M. Petrakissearching18 index1 index2 array

E.G.M. Petrakissearching19 File Searching  Access a data page, load it in the main memory and search for the key  unordered files: O(#blocks) disk accesses  ordered files: O(log#blocks) disk accesses  disk head moves back and forth  difficult to control the disk head moves especially in multi-user environments  leave 20% extra space for insertions

E.G.M. Petrakissearching20 Ordered Files  Optimize the performance using an auxiliary batch file  batch operations in ascending key order  process the operations one after the other  batch a 1 <= a 2 <= … <=a N file transactions new file a1a1 not searched

E.G.M. Petrakissearching21 ISAM  Data pages on the disk  Indices for faster retrievals  Pseudo Dynamic Scheme  Dynamic Schemes  B-trees  B+-trees, …

E.G.M. Petrakissearching22 Index Sequential Files (ISAM)  Random access based on primary key  Fast disk access through an index  Indices to data pages on the disk (8, ) (16, ) (27, ) (38, ) (46, ) index file

E.G.M. Petrakissearching23 ISAM Index  Master index: to disks - surfaces  Cylinder index: one per disk unit  Track index: one per cylinder cylinder surface track or block

E.G.M. Petrakissearching24 Retrieval  Locate cylinder: 1 st disk access  Locate surface: 2 nd disk access  Locate track: 3 rd disk access  Overflows will cause more disk accesses!! search Key block records cylinder index surface index

E.G.M. Petrakissearching25 Overflows  No space left on track  Solutions 1. chaining: 2. distribution of overflow space between neighboring primary pages  file reorganization necessary soon or later!!  Dependence on hardware!  Pseudo dynamic behavior! overflow pages

E.G.M. Petrakissearching26 Tree Search  The elements are stored in a Binary Search Tree

E.G.M. Petrakissearching27 Complexity  Average number of key comparisons or length of path traversed  average case: O(logN) comparisons  worst case: BST is reduced to list and search is O(N) !!  The form of a BST depends on the insertion sequence  the keys are ordered: BST becomes list

E.G.M. Petrakissearching28 Theorem  Testing for membership in a random BST takes O(logN) time (expected cost)  P(n): average number of nodes from root to a node  P(0)=0, P(1)=1  P(i): average height of left sub-tree  P(n-i-1): average height of right sub-tree α i N-i-1 < α > α> α

E.G.M. Petrakissearching29 Proof  Average number of comparisons  Average over all insertion sequences left sub-treeright sub-tree root

E.G.M. Petrakissearching30 Proof (cont.)  … because a can be inserted first, second, n-th element => n cases  N – i - 1  i =>  Prove by induction: P(N) <= 1 + 4logN  a more careful analysis shows that the constant is about 1.4 => P(N) <= 1.4logN

E.G.M. Petrakissearching31 TreesArrays/ListsHashing Main memory (Static)  Optimal Trees  Unsorted (move-to-front, transposition)  Sorted (binary search)  Rehashing  Coalesced  chaining Main memory (dynamic mem. allocation)  BST  AVL  SPLAY  Unsorted (move-to-front, transposition)  Separate chaining Disk (static)  Files with overflows  Indexed sequential Files (ISAM)  Table  Separate chaining Disk (dynamic mem. allocation)  M-trees  B-trees, B+- trees (VSAM)  Dynamic  Extendible  Linear