1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.

2 Problem
Relation: Employee (ID, Name, Dept, …)
10 M tuples
(Filter) Query: SELECT * FROM Employee WHERE Name = 'Bob'

3 Solution #1: Full Table Scan
Storage: Employee relation stored in contiguous blocks
Query plan: Scan the entire relation, output tuples with Name = 'Bob'
Cost:
  Size of each record = 100 bytes
  Size of relation = 10 M x 100 bytes = 1 GB
  Time @ 20 MB/s ≈ 1 minute
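
The cost estimate above can be reproduced with a few lines of arithmetic. This is only a sketch using the slide's figures; the constant names are made up for illustration, and 1 MB is taken as 10^6 bytes, which is an assumption.

```python
# Back-of-the-envelope cost of a full table scan, using the slide's figures.
NUM_TUPLES = 10_000_000              # 10 M tuples
RECORD_BYTES = 100                   # size of each record
SCAN_RATE_BYTES_PER_S = 20_000_000   # 20 MB/s, assuming 1 MB = 10**6 bytes

relation_bytes = NUM_TUPLES * RECORD_BYTES            # total bytes to read
scan_seconds = relation_bytes / SCAN_RATE_BYTES_PER_S  # sequential read time

print(f"relation size: {relation_bytes / 1e9:.1f} GB")
print(f"scan time: {scan_seconds:.0f} s")
```

Under these assumptions the scan takes 50 seconds, which the slide rounds to about one minute.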

4 Solution #2
Storage: Employee relation sorted on Name attribute
Query plan: Binary search

5 Solution #2
Cost:
  Size of a block: 1024 bytes
  Number of records per block: 1024 / 100 ≈ 10 (rounded down)
  Total number of blocks: 10 M / 10 = 1 M
  Blocks accessed by binary search: log2(1 M) ≈ 20
  Total time: 20 ms per block access x 20 = 400 ms
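
The same estimate as a small calculation. This is a sketch with hypothetical constant names; it assumes each binary-search probe costs one random block access of 20 ms, as the slide does.

```python
import math

NUM_TUPLES = 10_000_000
RECORD_BYTES = 100
BLOCK_BYTES = 1024
RANDOM_IO_MS = 20   # assumed cost of one random block access

records_per_block = BLOCK_BYTES // RECORD_BYTES   # whole records per block
num_blocks = NUM_TUPLES // records_per_block      # blocks in the sorted file
probes = math.ceil(math.log2(num_blocks))         # binary search halves the range each probe
total_ms = probes * RANDOM_IO_MS

print(records_per_block, num_blocks, probes, total_ms)
```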

6 Solution #2: Issues
Filters on different attributes: SELECT * FROM Employee WHERE Dept = 'Sales'
Inserts and deletes

7 Indexes
Data structures that efficiently evaluate a class of filter predicates over a relation
Class of filter predicates:
  Single or multi-attributes (index-key attributes)
  Range and/or equality predicates
(Usually) independent of physical storage of relation:
  Multiple indexes per relation

8 Indexes
Disk resident:
  Too large to fit in memory
  Persistent
Updated when indexed relation updated:
  Relation updates costlier
  Queries cheaper

9 Problem
Relation: Employee (ID, Name, Dept, …)
(Filter) Query: SELECT * FROM Employee WHERE Name = 'Bob'
Single-attribute index on Name that supports equality predicates

10 Roadmap
Motivation
Single-Attribute Indexes: Overview
Order-based Indexes
  B-Trees
Hash-based Indexes
  Extensible Hashing
  Linear Hashing
Multi-Attribute Indexes (Chapter 14 GMUW, may cover in future)

11 Single Attribute Index: General Construction
[Figure: relation with attributes A and B holding tuples (a1, b1), (a2, b2), …, (ai, bi), …, (an, bn)]

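12 Single Attribute Index: General Construction
[Figure: the relation with tuples (a1, b1), …, (an, bn), plus an index holding the values a1, a2, …, ai, …, an of attribute A, each pointing to its tuple. The index answers predicates of the form A = val, A > low, A < high]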

13 Exceptions
Sparse Indexes
  Require specific physical layout of relation
  Example: Relation sorted on indexed attribute
  More efficient

14 Single Attribute Index: General Construction
[Figure: the relation with tuples (a1, b1), …, (an, bn) and an index with one entry per value a1, …, an of attribute A, supporting A = val, A > low, A < high]
Textbook: Dense Index

15 Single Attribute Index: General Construction
[Figure: index entries a1, a2, …, ai, …, an, supporting A = val, A > low, A < high]
How do we organize (attribute, pointer) pairs?
Idea: Use dictionary data structures
Issue: Disk resident?
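
As a rough illustration of the (attribute, pointer) idea, the sketch below keeps one pair per tuple in a sorted in-memory list and answers the three predicate forms with binary search. The class name `DenseIndex`, its methods, and the sample rows are hypothetical; a "pointer" is simply a list position, and the sketch ignores the disk-residency issue the slide raises.

```python
from bisect import bisect_left, bisect_right

class DenseIndex:
    """Hypothetical in-memory dense index: one (key, pointer) pair per tuple."""

    def __init__(self, relation, attr):
        # A "pointer" here is just the tuple's position in the list.
        pairs = sorted((row[attr], pos) for pos, row in enumerate(relation))
        self.keys = [k for k, _ in pairs]
        self.ptrs = [p for _, p in pairs]

    def equal(self, val):          # A = val
        lo, hi = bisect_left(self.keys, val), bisect_right(self.keys, val)
        return self.ptrs[lo:hi]

    def greater_than(self, low):   # A > low
        return self.ptrs[bisect_right(self.keys, low):]

    def less_than(self, high):     # A < high
        return self.ptrs[:bisect_left(self.keys, high)]

# Made-up sample data for illustration only.
employees = [
    {"ID": 1, "Name": "Carol", "Dept": "Sales"},
    {"ID": 2, "Name": "Bob", "Dept": "Eng"},
    {"ID": 3, "Name": "Alice", "Dept": "Sales"},
    {"ID": 4, "Name": "Bob", "Dept": "Sales"},
]
name_index = DenseIndex(employees, "Name")
bobs = [employees[p] for p in name_index.equal("Bob")]
print([row["ID"] for row in bobs])
```

A sorted array is the simplest dictionary structure that supports both equality and range predicates, but inserts and deletes require shifting entries, which is one reason to look at tree-based organizations.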

16 Roadmap
Motivation
Single-Attribute Indexes: Overview
Order-based Indexes
  B-Trees
Hash-based Indexes
  Extensible Hashing
  Linear Hashing
Multi-Attribute Indexes (Next class)

17 B-Trees
Adaptation of search tree data structure
  2-3 trees
Supports range predicates (and equality)

18 Use Binary Search Tree Directly?
[Figure: the sorted keys 16, 32, 54, 71, 74, 83, 92 and a binary search tree built over them]

19 Use Binary Search Tree Directly?
Store records of type
Remember position of root
Question: will this work?
Yes
But we can do better!

20 Use Binary Search Tree Directly?
Number of keys: 1 M
Number of levels: log2(2^20) = 20
Total cost of index lookup: 20 random disk I/Os
20 x 20 ms = 400 ms
B-Tree: less than 3 random disk I/Os

21 B-Tree vs. Binary Search Tree
Binary search tree node (one key k): 1 random I/O prunes the tree by half
B-tree node (keys k1, k2, k3, …, k40): 1 random I/O prunes the tree by a factor of about 40
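
The effect of fan-out on tree height can be checked with a short calculation. This sketch uses a simplified model in which every node has exactly `fanout` children and every level costs one random I/O; the function name `levels_needed` is hypothetical.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= num_keys (integer arithmetic only)."""
    levels, reachable = 0, 1
    while reachable < num_keys:
        reachable *= fanout   # each extra level multiplies the reachable keys
        levels += 1
    return levels

NUM_KEYS = 2 ** 20   # about 1 M keys, as on the slide
for fanout in (2, 40):
    print(f"fan-out {fanout}: {levels_needed(NUM_KEYS, fanout)} levels")
```

Under this model a fan-out of 2 needs 20 levels and a fan-out of 40 needs 4. Getting below 3 random I/Os per lookup would additionally require a larger fan-out or keeping the upper levels in memory; the slides do not say which is intended, so treat that reading as an assumption.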

22 B-Tree Example
Keys: 15, 36, 57, 63, 76, 87, 92, 100

23 B-Tree Example
Root:            [63 | null]
Internal nodes:  [36]   [84 | 91]
Leaves:          [15] [36 57]   [63 76] [87] [92 100]

24 Meaning of Internal Node
Node with keys 84 and 91 has three pointers:
  First pointer: key < 84
  Second pointer: 84 ≤ key < 91
  Third pointer: 91 ≤ key
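
The rule for choosing a pointer in an internal node maps directly onto a binary search over the node's keys. A minimal sketch, assuming the keys are held in a sorted Python list; `child_index` is a hypothetical helper name.

```python
from bisect import bisect_right

def child_index(keys, search_key):
    """Index of the pointer to follow in an internal node.

    Pointer 0 covers search_key < keys[0]; pointer i covers
    keys[i-1] <= search_key < keys[i]; the last pointer covers
    search_key >= keys[-1].
    """
    return bisect_right(keys, search_key)

node_keys = [84, 91]
for k in (80, 84, 90, 91, 100):
    print(k, "-> pointer", child_index(node_keys, k))
```

`bisect_right` is used rather than `bisect_left` so that a search key equal to a separator goes to the right of it, matching the `84 ≤ key` and `91 ≤ key` conditions.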

25 B-Tree Example
[Figure: same example tree as slide 23]

26 Meaning of Leaf Nodes
Leaf with keys 63 and 76 holds:
  Pointer to record 63
  Pointer to record 76
  Pointer to next leaf

27-30 Equality Predicates (key = 87)
Same example tree as slide 23, shown over four steps. The search starts at the root: 87 ≥ 63, so follow the right pointer. In the internal node with keys 84 and 91, 84 ≤ 87 < 91, so follow the middle pointer. The leaf reached holds 87 and its record pointer.
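
An equality lookup on the example tree can be sketched as follows. The tree layout is reconstructed from the transcript, nodes are plain dictionaries, and record pointers are placeholder strings; all names here (`leaf`, `internal`, `lookup`) are hypothetical.

```python
from bisect import bisect_right

def leaf(*keys):
    # Each key is paired with a placeholder record pointer such as "rec87".
    return {"leaf": True, "keys": list(keys), "recs": [f"rec{k}" for k in keys]}

def internal(keys, children):
    return {"leaf": False, "keys": keys, "children": children}

# Example tree (n = 2), as reconstructed from the transcript.
root = internal([63], [
    internal([36], [leaf(15), leaf(36, 57)]),
    internal([84, 91], [leaf(63, 76), leaf(87), leaf(92, 100)]),
])

def lookup(node, key):
    """Return (record pointer or None, number of nodes visited)."""
    visited = 1
    while not node["leaf"]:
        # Follow the pointer whose key range contains the search key.
        node = node["children"][bisect_right(node["keys"], key)]
        visited += 1
    if key in node["keys"]:
        return node["recs"][node["keys"].index(key)], visited
    return None, visited

print(lookup(root, 87))
print(lookup(root, 84))
```

Looking up 84 returns no record even though 84 appears in an internal node: in this example it acts only as a separator, and only the leaves hold (key, record pointer) pairs.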

31-36 Range Predicates (57 ≤ key < 95)
Same example tree as slide 23, shown over six steps. Search for the lower bound 57 as in an equality lookup, which reaches the leaf holding 36 and 57. Then follow the next-leaf pointers, returning 57, 63, 76, 87, and 92, and stop at 100, the first key that is not below 95.
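
A range scan combines one root-to-leaf descent with a walk along the leaf chain. The sketch below rebuilds the example tree (as reconstructed from the transcript) with explicit next-leaf pointers; the class and function names are hypothetical, and only keys are returned, not record pointers.

```python
from bisect import bisect_left, bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None   # pointer to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

leaves = [Leaf([15]), Leaf([36, 57]), Leaf([63, 76]), Leaf([87]), Leaf([92, 100])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

root = Internal([63], [
    Internal([36], leaves[0:2]),
    Internal([84, 91], leaves[2:5]),
])

def range_scan(node, low, high):
    """Return all keys with low <= key < high."""
    # Step 1: descend to the leaf that would contain `low`.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, low)]
    # Step 2: walk the leaf chain until a key reaches `high`.
    result = []
    start = bisect_left(node.keys, low)
    while node is not None:
        for key in node.keys[start:]:
            if key >= high:
                return result
            result.append(key)
        node, start = node.next, 0
    return result

print(range_scan(root, 57, 95))
```

The internal nodes are touched only once; everything after the first leaf is sequential traversal of the leaf level.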

37 General B-Trees
Fixed parameter: n
Number of keys per node: n
Number of pointers per node: n + 1

38 B-Tree Example
[Figure: same example tree as slide 23, with n = 2]

39 General B-Trees
Fixed parameter: n
Number of keys per node: n
Number of pointers per node: n + 1
All leaves at same depth
All (key, record pointer) pairs in leaves

40 B-Tree Example
[Figure: same example tree as slide 23, with n = 2]

41 General B-Trees: Space related constraints
Use at least:
  Root: 2 pointers
  Internal: ⌈(n+1)/2⌉ pointers
  Leaf: ⌊(n+1)/2⌋ pointers to data
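
The minimum-occupancy rules can be written as a small function. This sketch assumes the rounding symbols lost in the transcript are a ceiling for internal nodes and a floor for leaves, which is how the GMUW textbook states the rule; `min_pointers` is a hypothetical name.

```python
def min_pointers(n: int, kind: str) -> int:
    """Minimum number of pointers a node must use, for fixed parameter n."""
    if kind == "root":
        return 2
    if kind == "internal":
        return (n + 2) // 2   # integer form of ceil((n + 1) / 2)
    if kind == "leaf":
        return (n + 1) // 2   # floor((n + 1) / 2), counting pointers to data only
    raise ValueError(f"unknown node kind: {kind}")

for n in (2, 3):
    print(n, {kind: min_pointers(n, kind) for kind in ("root", "internal", "leaf")})
```

For n = 3 both internal nodes and leaves need at least 2 pointers, which corresponds to 1 key in an internal node and 2 keys in a leaf. For n = 2 a leaf may hold a single key.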

42 Example: n = 3
            Max            Min
Internal    [5 15 21]      [15]
Leaf        [31 42 56]     [31 42]

43 Leaf Nodes
n key slots
(n+1) pointer slots

44 Leaf Nodes
n key slots: k1, k2, k3, …, km, followed by unused slots
(n+1) pointer slots: record of k1, record of k2, …, record of km, unused slots, and a final pointer to the next leaf

45 Leaf Nodes
n key slots: k1, k2, k3, …, km, followed by unused slots
(n+1) pointer slots: record of k1, record of k2, …, record of km, unused slots, and a final pointer to the next leaf
Constraint: m ≥ ⌊(n+1)/2⌋

46 Internal Nodes
n key slots
(n+1) pointer slots

47 Internal Nodes
n key slots: k1, k2, k3, …, km, followed by unused slots
(n+1) pointer slots: key < k1; k1 ≤ key < k2; …; km ≤ key; followed by unused slots

48 Internal Nodes
n key slots: k1, k2, k3, …, km, followed by unused slots
(n+1) pointer slots: key < k1; k1 ≤ key < k2; …; km ≤ key; followed by unused slots
Constraint: (m+1) ≥ ⌈(n+1)/2⌉

49 Root Node
n key slots: k1, k2, k3, …, km, followed by unused slots
(n+1) pointer slots: key < k1; k1 ≤ key < k2; …; km ≤ key; followed by unused slots
Constraint: (m+1) ≥ 2

50 Limits
Why the specific limits ⌈(n+1)/2⌉ and ⌊(n+1)/2⌋?
Why different limits for leaf and internal nodes?
Can we reduce each limit?
Can we increase each limit?
What are the implications?
