Chapter 7 Searching

Presentation transcript:

1 7-1 Chapter 7 Searching

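2 7-2 table (file):
          name  no.  age
record 1  BB    6    16
record 2  CC    9    16
record 3  AA    8    18
record 4  DD    2    17
internal key, embedded key: the key (here, no.) is stored together with the whole record.
external key: the keys form a separate table of their own, with a pointer to each record.
key table (no., pointer): (6, 1) (9, 2) (8, 3) (2, 4)
records (name, age): 1: BB 16, 2: CC 16, 3: AA 18, 4: DD 17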

3 7-3 Terminologies of searching
primary key: unique
secondary key: may not be unique
internal search: data stored in main memory
external search: data stored in auxiliary memory
retrieval: a successful search
a search and insertion algorithm:
    retrieve the data if the search is successful
    insert the data if the search is unsuccessful

4 7-4 Abstract data type
    typedef KEYTYPE...       // a type of key
    typedef RECTYPE...       // a type of record
    RECTYPE nullrec =...     // a "null" record
    KEYTYPE keyfunct(r)
    RECTYPE r;
    {... };
    abstract typedef [rectype] TABLE (RECTYPE);
    abstract member(tbl, k)
    TABLE(RECTYPE) tbl;
    KEYTYPE k;
    postcondition if (there exists an r in tbl such that keyfunct(r) == k)
                      then member = TRUE
                      else member = FALSE

5 7-5
    abstract RECTYPE search(tbl, k)
    TABLE(RECTYPE) tbl;
    KEYTYPE k;
    postcondition (not member(tbl, k) && (search == nullrec))
                  || (member(tbl, k) && keyfunct(search) == k);
    abstract insert(tbl, r)
    TABLE(RECTYPE) tbl;
    RECTYPE r;
    precondition member(tbl, keyfunct(r)) == FALSE
    postcondition inset(tbl, r);
                  (tbl - [r]) == tbl';
    abstract delete(tbl, k)
    TABLE(RECTYPE) tbl;
    KEYTYPE k;
    postcondition tbl == (tbl' - [search(tbl, k)]);

6 7-6 Sequential search (linear search)
Applied to an array or a linked list. The data are not sorted.
e.g. 9 5 6 8 7 2
(1) search 6: successful
(2) search 4: unsuccessful
(3) delete 6: 9 5 2 8 7
(4) insert 4: 9 5 2 8 7 4
time complexity:
successful search: (n+1)/2 comparisons on average = O(n)
unsuccessful search: n comparisons = O(n)

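7 7-7 Sequential search with C
algorithm:
    /* k[0..n-1] holds the keys; return the index of key, or -1 if absent */
    for (i = 0; i < n; i++)
        if (key == k[i])
            return(i);
    return(-1);
sentinel: an extra key inserted at the end of the array
    k[n] = key;                      /* needs room for n+1 keys */
    for (i = 0; key != k[i]; i++)
        ;                            /* always stops, at the sentinel at the latest */
    if (i < n)
        return(i);
    else
        return(-1);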

8 7-8 Move-to-front method
Let p(i) be the probability that record i is retrieved.
p(0) + p(1) + ... + p(n-1) = 1.
average number of comparisons: p(0) + 2p(1) + 3p(2) + ... + np(n-1)
This number is minimized if p(0) ≥ p(1) ≥ p(2) ≥ ... ≥ p(n-1).
move-to-front method: the retrieved record is moved to the head of the list.
e.g. 9 5 6 8 7 2
(1) search 6: 6 9 5 8 7 2
(2) search 8: 8 6 9 5 7 2
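The move-to-front step can be sketched in C for keys held in an array. This is a minimal illustration, not code from the slides: the function name search_move_to_front and the choice to shift array elements (rather than relink a list) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential search with move-to-front on an array of n keys.
 * Returns 0 (the new position of the key) if found, -1 if absent. */
int search_move_to_front(int k[], int n, int key)
{
    assert(k != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            /* Shift k[0..i-1] one place to the right ... */
            for (int j = i; j > 0; j--)
                k[j] = k[j - 1];
            /* ... and put the retrieved key at the head. */
            k[0] = key;
            return 0;
        }
    }
    return -1;
}
```

On an array the shift costs O(n) per retrieval; on a linked list the same move needs only a few pointer changes, which is why the method is usually described for lists.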

9 7-9 Transposition method
The retrieved record is interchanged with the preceding record.
e.g. 9 5 6 8 7 2
(1) search 6: 9 6 5 8 7 2
(2) search 8: 9 6 8 5 7 2
The transposition method is more efficient when the probability distribution does not change.
The move-to-front method is better for a small to medium number of requests and for a quickly changing probability distribution.
Mixed method: use the move-to-front method for the first s searches, then use the transposition method.
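A minimal C sketch of the transposition step on an array follows. The function name search_transpose and its return convention (the key's new position) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential search with transposition on an array of n keys.
 * Returns the new position of the key if found, -1 if absent. */
int search_transpose(int k[], int n, int key)
{
    assert(k != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            if (i == 0)
                return 0;            /* already at the head */
            /* Interchange the retrieved key with its predecessor. */
            int tmp = k[i - 1];
            k[i - 1] = k[i];
            k[i] = tmp;
            return i - 1;
        }
    }
    return -1;
}
```

Unlike move-to-front, the swap is O(1) on an array as well as on a list.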

10 7-10 Searching in an ordered table
linear (sequential) searching: about n/2 comparisons on average (successful or unsuccessful)
table (Key, Record), keys: 8 73 132 231 321 480 589 592 650 651 732 789 833 876

11 7-11 Indexed sequential search (1)
Indexed sequential file: sorted
index (key, pointer): 321 592 876
table (Key, Record), keys: 8 73 132 231 321 480 589 592 650 651 732 789 833 876
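One way to search with such an index is sketched below in C. It assumes that each index entry stores a key together with the table position of the record holding that same key; that layout, and the names indexed_search, kindex and pindex, are assumptions for this sketch rather than details given on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed sequential search.
 * k[0..n-1]            : sorted table of keys
 * kindex[0..isize-1]   : keys kept in the index (sorted)
 * pindex[i]            : position in k[] of the key kindex[i] (assumed layout)
 * Returns the position of key in k[], or -1 if absent. */
int indexed_search(const int k[], int n,
                   const int kindex[], const int pindex[], int isize,
                   int key)
{
    assert(k != NULL && n >= 0 && isize >= 0);
    /* Step 1: scan the (small) index for the first entry greater than key. */
    int i = 0;
    while (i < isize && kindex[i] <= key)
        i++;
    /* Step 2: search only the block of the table bounded by the index. */
    int lo = (i == 0) ? 0 : pindex[i - 1];
    int hi = (i == isize) ? n - 1 : pindex[i] - 1;
    for (int j = lo; j <= hi; j++) {
        if (k[j] == key)
            return j;
        if (k[j] > key)
            break;                   /* table is sorted: key cannot follow */
    }
    return -1;
}
```

With the table above and index entries 321, 592, 876 at positions 4, 7, 13, a search for 589 scans one index entry past 321 and then only positions 4 to 6 of the table.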

12 7-12 Indexed sequential search (2)
The use of an index is applicable to a sorted table stored as an array or a linked list.
Deletion: by a flag
Insertion:
1) shift some elements if there exist some deleted entries. (Pointers need to be changed in the index file.)
2) keep an overflow area

13 7-13 A secondary index
Secondary index: 591 742
Primary index: 321 485 591 647 706 742
Sequential table (Key, Record), keys shown: 321 485 591 647 706 742

14 7-14 Binary search
e.g. 2 5 6 7 8 9
search 7: needs 3 comparisons
Time complexity: O(log n)
Used only if the table is sorted and stored in an array.
An insertion or a deletion requires O(n) time.
Improvement: two arrays, one for flags, the other for the sorted keys and some "empty holes".
data: A * * D F * G I K
flag: f e e f f e f f f
e: empty, f: full
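A minimal C sketch of binary search on a sorted array is given below. The optional probes counter is an addition for illustration, so that the "3 comparisons" of the example can be checked; the function name is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Binary search in the sorted array k[0..n-1].
 * Returns the position of key, or -1 if absent.
 * If probes is not NULL, *probes receives the number of keys examined. */
int binary_search(const int k[], int n, int key, int *probes)
{
    assert(k != NULL && n >= 0);
    int low = 0, high = n - 1, count = 0, found = -1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of low + high */
        count++;
        if (key == k[mid]) {
            found = mid;
            break;
        }
        if (key < k[mid])
            high = mid - 1;                 /* continue in the left half */
        else
            low = mid + 1;                  /* continue in the right half */
    }
    if (probes != NULL)
        *probes = count;
    return found;
}
```

For the table 2 5 6 7 8 9, a search for 7 examines 6, then 8, then 7.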

15 7-15 Binary search tree
inorder traversal: 2 5 6 7 8 9
The binary search uses a sorted array as an implicit binary search tree. (The middle element of the array is the root.)
        6
      /   \
     2     8
      \   / \
       5 7   9

16 7-16 Insertion in a binary search tree
Insert 4: the inserted key is added to the tree as a leaf node.
before:                 after:
        6                       6
      /   \                   /   \
     2     8                 2     8
      \   / \                 \   / \
       5 7   9                 5 7   9
                              /
                             4
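Insertion as a leaf can be sketched in C as follows. The node layout, the name bst_insert, and the decision to ignore duplicate keys are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Insert key into the binary search tree as a new leaf.
 * Duplicate keys are ignored (an assumption of this sketch).
 * Returns the root, which changes only when the tree was empty. */
struct node *bst_insert(struct node *root, int key)
{
    struct node **link = &root;      /* the pointer that will receive the leaf */
    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    struct node *leaf = malloc(sizeof *leaf);
    assert(leaf != NULL);
    leaf->key = key;
    leaf->left = leaf->right = NULL;
    *link = leaf;
    return root;
}
```

Inserting 6, 2, 8, 5, 7, 9 in that order and then 4 places 4 as the left son of 5.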

17 7-17 Deletion in a binary search tree (1)
Case 1: The deleted node has no sons. Delete it directly.
Deleting node with key 15.
[Figure: a binary search tree with keys 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, before and after node 15 is removed.]

18 7-18 Deletion in a binary search tree (2)
Case 2: The deleted node has only one subtree. Delete it and move the subtree up.
Deleting node with key 5.
[Figure: the same tree before and after node 5 is removed; its subtree (6, 7) moves up.]

19 7-19 Deletion in a binary search tree (3)
Case 3: The deleted node has two subtrees. Its inorder successor s takes its place. The right son of s takes the place of s. (s has no left son.)
Deleting node with key 11.
[Figure: the same tree before and after node 11 is removed; 12 takes the place of 11, and 13 takes the place of 12.]

20 7-20 Deletion in a binary search tree (4)
Asymmetric deletion: replaced by inorder successors
Symmetric deletion: replaced by inorder predecessors and successors alternately.
Average search time in a binary search tree: O(log n)
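The three deletion cases can be sketched in C as asymmetric deletion (always using the inorder successor). The node layout and the names bst_insert and bst_delete are assumptions; the small insert routine is included only so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Helper: insert key as a leaf (duplicates ignored). */
struct node *bst_insert(struct node *root, int key)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        assert(root != NULL);
        root->key = key;
        root->left = root->right = NULL;
    } else if (key < root->key) {
        root->left = bst_insert(root->left, key);
    } else if (key > root->key) {
        root->right = bst_insert(root->right, key);
    }
    return root;
}

/* Delete key from the tree; returns the (possibly new) root. */
struct node *bst_delete(struct node *root, int key)
{
    struct node **link = &root;      /* pointer that refers to the node p */
    while (*link != NULL && (*link)->key != key)
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    struct node *p = *link;
    if (p == NULL)
        return root;                 /* key not present */

    if (p->left == NULL) {
        *link = p->right;            /* case 1 (no sons) or case 2 */
    } else if (p->right == NULL) {
        *link = p->left;             /* case 2: move the only subtree up */
    } else {
        /* Case 3: s is the leftmost node of the right subtree of p. */
        struct node **slink = &p->right;
        while ((*slink)->left != NULL)
            slink = &(*slink)->left;
        struct node *s = *slink;
        *slink = s->right;           /* right son of s takes the place of s */
        s->left = p->left;           /* s takes the place of p */
        s->right = p->right;
        *link = s;
    }
    free(p);
    return root;
}
```

Symmetric deletion would alternate between this successor-based replacement and the mirror-image predecessor-based one.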

21 7-21 Optimum binary search trees
e.g. sorted data: 2 3 5 7
[Figure: some binary search trees on the keys 2, 3, 5, 7.]
In an optimum binary search tree, the expected number of comparisons is minimized under a given set of keys and probabilities.

22 7-22
p_i: probability for successful search
q_i: probability for unsuccessful search
e.g. tree with root k_2, left son k_1 and right son k_3
expected number of comparisons: 2p_1 + p_2 + 2p_3 + 2q_0 + 2q_1 + 2q_2 + 2q_3
e.g. tree with root k_3, whose left son is k_1, whose right son is k_2
expected number of comparisons: 2p_1 + 3p_2 + p_3 + 2q_0 + 3q_1 + 3q_2 + q_3
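Both expressions are instances of one general formula, written out here as a worked equation. The notation level(.) and e_i is an assumption of this note: the root is at level 1, and e_i is the failure (external) node reached by an unsuccessful search for a key between k_i and k_{i+1}.

```latex
E \;=\; \sum_{i=1}^{n} p_i \cdot \mathrm{level}(k_i)
   \;+\; \sum_{i=0}^{n} q_i \cdot \bigl(\mathrm{level}(e_i) - 1\bigr)
```

In the first tree every failure node is at level 3, so each q_i has coefficient 2. In the second tree the failure nodes e_0, e_1, e_2, e_3 are at levels 3, 4, 4, 2, which gives the coefficients 2, 3, 3, 1.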

23 7-23 Construction of (near) optimum search trees (1)
(1) Balancing method
e.g.
key (data):                        1   2   3   4   5   6   7
frequencies of successful search:  2  10   3   1   4   8   9
partial sum:                       2  12  15  16  20  28  37
Select i as the root such that the difference of the costs on the left and the right is minimized.
The binary search tree can be constructed recursively.
Time complexity: O(n)
[Figure: the resulting tree; the root is key 5, with total frequency 16 on its left and 17 on its right.]
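The recursive construction can be sketched in C using the partial sums. This is an illustrative sketch under stated assumptions: keys are numbered 1..n, ties are broken toward the smaller key, the tree is returned in two arrays left[] and right[], and each range is scanned linearly, so the sketch does not achieve the O(n) bound quoted on the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* Build a near-optimum search tree over keys lo..hi (numbered from 1).
 * sum[i] is the partial sum of the frequencies of keys 1..i; sum[0] is 0.
 * left[i] and right[i] receive the sons of key i (0 means "no son").
 * Returns the root of the subtree, or 0 for an empty range. */
int build_balanced(int lo, int hi, const int sum[], int left[], int right[])
{
    assert(lo >= 1);
    if (lo > hi)
        return 0;
    int best = lo, best_diff = -1;
    for (int i = lo; i <= hi; i++) {
        int lcost = sum[i - 1] - sum[lo - 1];   /* total frequency left of i */
        int rcost = sum[hi] - sum[i];           /* total frequency right of i */
        int diff = abs(lcost - rcost);
        if (best_diff < 0 || diff < best_diff) {
            best = i;
            best_diff = diff;
        }
    }
    left[best] = build_balanced(lo, best - 1, sum, left, right);
    right[best] = build_balanced(best + 1, hi, sum, left, right);
    return best;
}
```

For the frequencies 2 10 3 1 4 8 9, key 5 is chosen as the root because the costs 16 and 17 on its two sides differ by only 1; the same rule is then applied to keys 1..4 and 6..7.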

24 7-24 Construction of (near) optimum search trees (2)
(2) Median split tree
e.g.
key (data):   1   2   3   4   5   6   7
frequencies:  2  10   3   1   4   8   9
The most frequent key is stored in the root.
The split key is the median of all remaining keys.
The binary search tree can be constructed recursively.
The tree is a balanced tree.
Time complexity: O(n log n)
tree (node key / split key):
root: 2 / 4
left son: 3 / 1, with leaves 1 and 4
right son: 7 / 5, with leaves 5 and 6
How to search?

25 7-25 Balanced binary tree (AVL tree)
The heights of the two subtrees of every node never differ by more than 1.
balance = (height of left subtree) - (height of right subtree)
Each node in a balanced binary tree has a balance of 1, -1, or 0.
[Figure: a balanced binary tree with the balance of each node shown.]

26 7-26 Rotations of a binary tree
The inorder traversal is the same after a rotation is performed.
(a) Original tree
       D
     /   \
    B     F
   / \   / \
  A   C E   G
(b) Right rotation
    B
   / \
  A   D
     / \
    C   F
       / \
      E   G
(c) Left rotation
        F
       / \
      D   G
     / \
    B   E
   / \
  A   C
left rotation (p is the root of the subtree being rotated):
    q = right(p)
    r = left(q)
    left(q) = p
    right(p) = r
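The four assignments of the left rotation translate directly into C. In this sketch the node layout and the function names are assumptions, and each function returns the new root of the rotated subtree so that the caller can reattach it.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    char key;
    struct node *left, *right;
};

/* Left rotation about p; returns q, the new root of the subtree. */
struct node *rotate_left(struct node *p)
{
    assert(p != NULL && p->right != NULL);
    struct node *q = p->right;
    struct node *r = q->left;
    q->left = p;
    p->right = r;
    return q;
}

/* Right rotation about p: the mirror image of rotate_left. */
struct node *rotate_right(struct node *p)
{
    assert(p != NULL && p->left != NULL);
    struct node *q = p->left;
    struct node *r = q->right;
    q->right = p;
    p->left = r;
    return q;
}
```

Applied to the root D of tree (a), rotate_left yields tree (c) and rotate_right yields tree (b); a left rotation followed by a right rotation about the new root restores the original tree.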

27 7-27 Insertion of an AVL tree (1)
Case 1: Node C is the first unbalanced node traced up from the newly inserted node.
[Figure, before: C has left son A; T1 and T2 are the subtrees of A and T3 is the right subtree of C, each of height n; the newly inserted node is in T1.]
right rotation on the subtree rooted at C
[Figure, after: A is the root of the subtree with right son C; A and C both have balance 0.]
The height of the subtree is not changed after the new insertion.

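28 7-28 Insertion of an AVL tree (2)
Case 2:
[Figure, before: C has left son A and right subtree T4 (height n); A has left subtree T1 (height n) and right son B; B has subtrees T2 and T3 (height n-1); the newly inserted node is below B.]
First rotation: left rotation on the subtree rooted at A
[Figure, after the first rotation: B is the left son of C, and A is the left son of B.]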

29 7-29
Second rotation: right rotation on the subtree rooted at C
[Figure, after the second rotation: B is the root of the subtree with balance 0, with sons A and C and subtrees T1, T2, T3, T4 from left to right.]
The height of the subtree is not changed after the new insertion.
Insertion requires at most 2 rotations.
Deletion is more complex; it may require O(log n) rotations.

30 7-30 Multiway search trees
A multiway search tree of order n:
at most n subtrees
at most n-1 keys in a node
[Figure: a multiway search tree with nodes A to H.]

31 7-31 B-trees
B-tree of order m:
⌊(m-1)/2⌋ ≤ # of keys in a nonroot node ≤ m-1
1 ≤ # of keys in the root node ≤ m-1
a B-tree of order 5:
(a) Initial portion of a B-tree
root node: 320 540
its son between 320 and 540: 430 480
sons of that node: 380 395 406 412 | 451 472 | 493 506 511

32 7-32
(b) After inserting 382
node: 395 430 480
its sons: 380 382 | 406 412 | 451 472 | 493 506 511
(c) After inserting 518 and 508
node: 395 430 480 508
its sons: 380 382 | 406 412 | 451 472 | 493 506 | 511 518

33 7-33 a B-tree of order 4:
(a) An initial B-tree twig
node: 87 140
its sons: 23 61 74 | 90 100 106 | 152 186 194
(b) Inserting 102 with a left bias
node: 87 102 140
its sons: 23 61 74 | 90 100 | 106 | 152 186 194

34 7-34
(c) Inserting 102 with a right bias
node: 87 100 140
its sons: 23 61 74 | 90 | 102 106 | 152 186 194

35 7-35 Deletion in multiway search trees
(1) The simplest method
a) Mark a deleted key; do not remove it.
b) Disadvantages:
   It wastes space.
   In a nonleaf node, only the same key can reuse the "deleted" space.
(2) A technique similar to binary search trees, used in an unrestricted multiway search tree
a) If the key has an empty left or right subtree, remove it. If it is the only key in the node, remove the node.
b) Otherwise, its successor takes its place. (The successor has an empty left subtree.)

36 7-36 Deletion in B-trees
(i) Shift a key from its father and its brother (borrow)
Delete key 113
before: father 80 120 150; A: 90 113; B: 126 135 142
after: father 80 126 150; A: 90 120; B: 135 142

37 7-37 (ii) Take a key from its father and combine with its brother
Delete key 120 and consolidate
before: father 80 126 150; sons shown: 68 73 | 90 120 | 135 142
after: father 80 150; sons shown: 68 73 | 90 126 135 142

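38 7-38 (iii) do (ii), then do (i) for its father
Delete 65, consolidate and borrow
before: root 60 170; second level: 30 50 | 80 150 | 180 220 280; leaves shown: 65 72 | 87 96 | 153 162 | 173 178 | 187 202
after: root 60 180; second level: 30 50 | 150 170 | 220 280; leaves shown: 72 80 87 96 | 153 162 | 173 178 | 187 202
(The figure labels the nodes involved A to E.)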

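39 7-39 (iv) do (ii), then do (ii) for its father.
Deleting 173 and a double consolidation
before: root 60 180 300; second level shown: 30 50 | 150 170 | 220 280; leaves shown: 153 162 | 173 178 | 187 202
after: root 60 300; second level shown: 30 50 | 150 180 220 280; leaves shown: 153 162 170 178 | 187 202
(The figure labels the nodes involved A to G.)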

40 7-40 Deletion in B-trees
This may be done up to the root.
If the root has more than one key => no problem.
If the root has only one key => remove the root.
Insertion, deletion or searching in a B-tree requires O(log n) time, where n denotes the number of nodes in the B-tree.

41 7-41 B+-tree
All keys are maintained in leaf nodes, and keys are also replicated in nonleaf nodes.
Finding the next record: O(1) time

42 7-42 Digital search tree
Keys: 180 185 1867 195 207 217 2174 21749 217493 226 27 274 278 279 2796 281 284 285 286 287 288 294 307 768
[Figure: the digital search tree for these keys; eok = end of key.]

43 7-43
[Figure: digital search tree, continued; eok = end of key.]

44 7-44 Trie
(1) This is one kind of digital search tree.
(2) Each node contains exactly m pointers. (Some of them are null.) e.g. m = 10 for numerical data.
(3) It is useful when the set of keys is dense.
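A trie with m = 10 pointers per node can be sketched in C as follows. The eok flag stored in each node (marking that a key ends there), the representation of keys as digit strings, and all names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define M 10                         /* one pointer per decimal digit */

struct trie {
    struct trie *son[M];             /* son[d] is NULL if no key continues with d */
    bool eok;                        /* true if a key ends at this node */
};

struct trie *trie_new(void)
{
    struct trie *t = malloc(sizeof *t);
    assert(t != NULL);
    for (int d = 0; d < M; d++)
        t->son[d] = NULL;
    t->eok = false;
    return t;
}

/* Insert a key given as a string of decimal digits. */
void trie_insert(struct trie *t, const char *digits)
{
    assert(t != NULL && digits != NULL);
    for (; *digits != '\0'; digits++) {
        int d = *digits - '0';
        assert(d >= 0 && d < M);
        if (t->son[d] == NULL)
            t->son[d] = trie_new();
        t = t->son[d];
    }
    t->eok = true;
}

/* Return true if the key is in the trie. */
bool trie_member(const struct trie *t, const char *digits)
{
    for (; t != NULL && *digits != '\0'; digits++) {
        int d = *digits - '0';
        if (d < 0 || d >= M)
            return false;
        t = t->son[d];
    }
    return t != NULL && t->eok;
}
```

A search examines one node per digit of the key, regardless of how many keys are stored. The cost is the m pointers per node, most of which are null when the key set is sparse.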

45 7-45 Hashing
hash function: transforms a key into a table index
e.g. data: 18 23 33 13 24 10
hash function: h(k) = k mod 10
hash collision: two records (keys) attempt to occupy the same position.
position:  0   1   2   3   4   5   6   7   8   9
key:      10          23  33  13  24      18

46 7-46 Resolution of hash collision
(1) open addressing (rehashing)
a) linear probing: to place the collided record in the next available position in the array
b) rehashing function:...
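Linear probing can be sketched in C as below. The table size of 10 and the hash function k mod 10 follow the hashing example; the EMPTY marker (which assumes nonnegative keys), the wrap-around at the end of the array, and the function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define LP_SIZE 10
#define EMPTY (-1)                   /* assumption: keys are nonnegative */

static int lp_hash(int key) { return key % LP_SIZE; }

/* Insert key; returns the position used, or -1 if the table is full.
 * Every slot of table[] must be set to EMPTY before the first insertion. */
int lp_insert(int table[], int key)
{
    assert(table != NULL && key >= 0);
    for (int i = 0; i < LP_SIZE; i++) {
        int pos = (lp_hash(key) + i) % LP_SIZE;   /* next position, wrapping */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Search for key; returns its position, or -1 if absent. */
int lp_search(const int table[], int key)
{
    assert(table != NULL && key >= 0);
    for (int i = 0; i < LP_SIZE; i++) {
        int pos = (lp_hash(key) + i) % LP_SIZE;
        if (table[pos] == key)
            return pos;
        if (table[pos] == EMPTY)
            return -1;               /* an empty slot ends the probe sequence */
    }
    return -1;
}
```

Inserting 18, 23, 33, 13, 24, 10 in that order places them at positions 8, 3, 4, 5, 6, 0. Because a search stops at the first empty slot, simply emptying a slot on deletion would cut off later keys in the same probe sequence, which is one reason deletion is awkward with open addressing.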

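47 7-47 (2) chaining
[Figure: a table with positions 0 to 9, each holding a pointer (possibly null) to a linked list of the records that hash to that position. Each list node has the fields k, r and next. Keys shown: 91, 42, 192, 372, 130, 75, 49, 66, 87, 16, 67, 227, 417, 40.]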
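Chaining can be sketched in C as a search-and-insertion routine over an array of list heads. The node fields k, r and next follow the figure; the hash function k mod 10, the integer record field, appending at the end of the chain, and the names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define CHAIN_SIZE 10

struct nodetype {
    int k;                           /* key */
    int r;                           /* record (an int here, for illustration) */
    struct nodetype *next;
};

static int chain_hash(int key) { return key % CHAIN_SIZE; }

/* Search for key in its chain. If it is absent, append a new node
 * holding (key, rec). Returns the node that holds key. */
struct nodetype *chain_search_insert(struct nodetype *bucket[], int key, int rec)
{
    assert(bucket != NULL && key >= 0);
    struct nodetype **link = &bucket[chain_hash(key)];
    while (*link != NULL) {
        if ((*link)->k == key)
            return *link;            /* successful search: retrieve */
        link = &(*link)->next;
    }
    struct nodetype *p = malloc(sizeof *p);
    assert(p != NULL);
    p->k = key;
    p->r = rec;
    p->next = NULL;
    *link = p;                       /* unsuccessful search: insert */
    return p;
}
```

With this layout a record is deleted by unlinking its node from the chain, and the table never becomes "full"; the price is the extra pointer stored with each record.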

48 7-48 Issues of hashing
How to choose a hash function?
the division method: h(key) = key mod m
It is best that the table size m is prime.
Advantage of hashing: faster than binary search
Disadvantages of hashing:
1. It needs more memory.
2. Deleting a record is difficult.

