1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

Slides:



Advertisements
Similar presentations
CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
Advertisements

Hashing and Indexing John Ortiz.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
1 Lecture 8: Data structures for databases II Jose M. Peña
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
Chapter 15 B External Methods – B-Trees. © 2004 Pearson Addison-Wesley. All rights reserved 15 B-2 B-Trees To organize the index file as an external search.
CS CS4432: Database Systems II Basic indexing.
BTrees & Bitmap Indexes
B+-tree and Hashing.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #10.
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Efficient Storage and Retrieval of Data
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Primary Indexes Dense Indexes
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
CS 255: Database System Principles slides: B-trees
1 CS 728 Advanced Database Systems Chapter 17 Database File Indexing Techniques, B- Trees, and B + -Trees.
CS4432: Database Systems II
DBMS Internals: Storage February 27th, Representing Data Elements Relational database elements: A tuple is represented as a record CREATE TABLE.
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b TR Lee, Sin-Min.
Indexing structures for files D ƯƠ NG ANH KHOA-QLU13082.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
1 Physical Data Organization and Indexing Lecture 14.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Index tuning-- B+tree. overview © Dennis Shasha, Philippe Bonnet 2001 B+-Tree Locking Tree Traversal –Update, Read –Insert, Delete phantom problem: need.
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
Indexing.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu.
CS4432: Database Systems II Query Processing- Part 3 1.
CS 405G: Introduction to Database Systems 22 Index Chen Qian University of Kentucky.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Spring 2003 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2003 Yanyong Zhang
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
Spring 2004 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2004 Yanyong Zhang
CS 440 Database Management Systems Lecture 6: Data storage & access methods 1.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
1 CSCE 520 Test 2 Info Indexing Modified from slides of Hector Garcia-Molina and Jeff Ullman.
CS4432: Database Systems II
1 Query Processing Part 3: B+Trees. 2 Dense and Sparse Indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Insertions.
1 Ullman et al. : Database System Principles Notes 4: Indexing.
Select Operation Strategies And Indexing (Chapter 8)
Multidimensional Access Structures COMP3017 Advanced Databases Dr Nicholas Gibbins –
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8.
CPS216: Data-intensive Computing Systems
Indexing and hashing.
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
External Methods Chapter 15 (continued)
(Slides by Hector Garcia-Molina,
CPSC-310 Database Systems
Database Design and Programming
CSE 544: Lecture 11 Storing Data, Indexes
CPS216: Advanced Database Systems
Lecture 20: Indexes Monday, February 27, 2006.
Chapter 11: Indexing and Hashing
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu

Problem Relation: Employee (ID, Name, Dept, …) Relation: Employee (ID, Name, Dept, …) 10 M tuples 10 M tuples (Filter) Query: SELECT * FROM Employee WHERE Name = “Bob” (Filter) Query: SELECT * FROM Employee WHERE Name = “Bob”

3 Solution #1: Full Table Scan Storage: Storage: Employee relation stored in contiguous blocks Employee relation stored in contiguous blocks Query plan: Query plan: Scan the entire relation, output tuples with Name = “Bob” Scan the entire relation, output tuples with Name = “Bob” Cost: Cost: Size of each record = 100 bytes Size of each record = 100 bytes Size of relation = 10 M x 100 = 1 GB Size of relation = 10 M x 100 = 1 GB 20 MB/s ≈ 1 Minute 20 MB/s ≈ 1 Minute

4 Solution #2 Storage: Storage: Employee relation sorted on Name attribute Employee relation sorted on Name attribute Query plan: Query plan: Binary search Binary search

5 Solution #2 Cost: Cost: Size of a block: 1024 bytes Size of a block: 1024 bytes Number of records per block: 1024 / 100 = 10 Number of records per block: 1024 / 100 = 10 Total number of blocks: 10 M / 10 = 1 M Total number of blocks: 10 M / 10 = 1 M Blocks accessed by binary search: 20 Blocks accessed by binary search: 20 Total time: 20 ms x 20 = 400 ms Total time: 20 ms x 20 = 400 ms

6 Solution #2: Issues Filters on different attributes: SELECT * FROM Employee WHERE Dept = “Sales” Filters on different attributes: SELECT * FROM Employee WHERE Dept = “Sales” Inserts and Deletes Inserts and Deletes

7 Indexes Data structures that efficiently evaluate a class of filter predicates over a relation Data structures that efficiently evaluate a class of filter predicates over a relation Class of filter predicates: Class of filter predicates: Single or multi-attributes (index-key attributes) Single or multi-attributes (index-key attributes) Range and/or equality predicates Range and/or equality predicates (Usually) independent of physical storage of relation: (Usually) independent of physical storage of relation: Multiple indexes per relation Multiple indexes per relation

8 Indexes Disk resident Disk resident Large to fit in memory Large to fit in memory Persistent Persistent Updated when indexed relation updated Updated when indexed relation updated Relation updates costlier Relation updates costlier Query cheaper Query cheaper

Problem Relation: Employee (ID, Name, Dept, …) Relation: Employee (ID, Name, Dept, …) (Filter) Query: SELECT * FROM Employee WHERE Name = “Bob” (Filter) Query: SELECT * FROM Employee WHERE Name = “Bob” Name Single-Attribute Index on Name that supports equality predicates

10 Roadmap Motivation Motivation Single-Attribute Indexes: Overview Single-Attribute Indexes: Overview Order-based Indexes Order-based Indexes B-Trees B-Trees Hash-based Indexes Hash-based Indexes Extensible Hashing Extensible Hashing Linear Hashing Linear Hashing Multi-Attribute Indexes (Chapter 14 GMUW, May cover in future) Multi-Attribute Indexes (Chapter 14 GMUW, May cover in future)

Single Attribute Index: General Construction b 1 2 b i b n b a 1 2 a i a n a AB

b 1 2 b i b n b a 1 2 a i a n a a 1 2 a i a n a AB A = val A > low A < high

13 Exceptions Sparse Indexes Sparse Indexes Require specific physical layout of relation Require specific physical layout of relation Example: Relation sorted on indexed attribute Example: Relation sorted on indexed attribute More efficient More efficient

14 Single Attribute Index: General Construction b 1 2 b i b n b a 1 2 a i a n a a 1 2 a i a n a AB A = val A > low A < high Textbook: Dense Index

Single Attribute Index: General Construction a 1 2 a i a n a A = val A > low A < high How do we organize (attribute, pointer) pairs? Idea: Use dictionary data structures Issue: Disk resident?

16 Roadmap Motivation Motivation Single-Attribute Indexes: Overview Single-Attribute Indexes: Overview Order-based Indexes Order-based Indexes B-Trees B-Trees Hash-based Indexes Hash-based Indexes Extensible Hashing Extensible Hashing Linear Hashing Linear Hashing Multi-Attribute Indexes (Next class) Multi-Attribute Indexes (Next class)

17 B-Trees Adaptation of search tree data structure Adaptation of search tree data structure 2-3 trees 2-3 trees Supports range predicates (and equality) Supports range predicates (and equality)

Use Binary Search Tree Directly?

19 Use Binary Search Tree Directly? Store records of type Store records of type Remember position of root Remember position of root Question: will this work? Question: will this work? Yes Yes But we can do better! But we can do better!

20 Use Binary Search Tree Directly? Number of keys: 1 M Number of keys: 1 M Number of levels: log (2^20) = 20 Number of levels: log (2^20) = 20 Total cost index lookup: 20 random disk I/O Total cost index lookup: 20 random disk I/O 20 x 20 ms = 400 ms 20 x 20 ms = 400 ms B-Tree: less than 3 random disk I/O

21 B-Tree vs. Binary Search Tree k k1 k2k3k40 1 Random I/O prunes tree by half 1 Random I/O prunes tree by 40

22 B-Tree Example

23 B-Tree Example null

24 Meaning of Internal Node 8491 key < ≤ key < ≤ key

25 B-Tree Example null

26 Meaning of Leaf Nodes 6376 pointer to record 63pointer to record 76 Next leaf

27 Equality Predicates null key = 87

28 Equality Predicates null key = 87

29 Equality Predicates null key = 87

30 Equality Predicates null key = 87

31 Range Predicates null ≤ key < 95

32 Range Predicates null ≤ key < 95

33 Range Predicates null ≤ key < 95

34 Range Predicates null ≤ key < 95

35 Range Predicates null ≤ key < 95

36 Range Predicates null ≤ key < 95

37 General B-Trees Fixed parameter: n Fixed parameter: n Number of keys: n Number of keys: n Number of pointers: n + 1 Number of pointers: n + 1

38 B-Tree Example null n = 2

39 General B-Trees Fixed parameter: n Fixed parameter: n Number of keys: n Number of keys: n Number of pointers: n + 1 Number of pointers: n + 1 All leaves at same depth All leaves at same depth All (key, record pointer) in leaves All (key, record pointer) in leaves

40 B-Tree Example null n = 2

41 General B-Trees: Space related constraints Use at least Root: 2 pointers Internal:  (n+1)/2  pointers Leaf:  (n+1)/2  pointers to data Use at least Root: 2 pointers Internal:  (n+1)/2  pointers Leaf:  (n+1)/2  pointers to data

42 n= Internal Leaf Max Min

43 Leaf Nodes n key slots (n+1) pointer slots

44 Leaf Nodes n key slots (n+1) pointer slots kk k 1 2m k 3 …… unused record of k 1 2 …… m … next leaf

45 Leaf Nodes n key slots (n+1) pointer slots kk k 1 2m k 3 …… unused record of k 1 2 …… m … m ≥  (n+1) 2 next leaf

46 Internal Nodes n key slots (n+1) pointer slots

47 Internal Nodes n key slots (n+1) pointer slots kk k 1 2m k 3 key < k 1 k ≤ key < k 12 k ≤ key m unused

48 Internal Nodes n key slots (n+1) pointer slots kk k 1 2m k 3 key < k 1 k ≤ key < k 12 k ≤ key m unused (m+1) ≥ (n+1) 2

49 Root Node n key slots (n+1) pointer slots kk k 1 2m k 3 key < k 1 k ≤ key < k 12 k ≤ key m unused (m+1) ≥ 2

50 Limits Why the specific limits  (n+1)/2  and  (n+1)/2  ? Why the specific limits  (n+1)/2  and  (n+1)/2  ? Why different limits for leaf and internal nodes? Why different limits for leaf and internal nodes? Can we reduce each limit? Can we reduce each limit? Can we increase each limit? Can we increase each limit? What are the implications? What are the implications?