# Binary Searching.

## Presentation on theme: "Binary Searching."— Presentation transcript:

Binary Searching

Binary Search Searching an ordered list.
First, compare the target to the key in the center of the list. If the target is smaller, restrict the search to the left half; otherwise restrict the search to the right half, and repeat. In this way, each step halves the length of the list still to be searched. Roughly lg(n) comparisons are needed, compared to n in sequential search.

Binary Search vs. Sequential
Binary search needs roughly log2(n) comparisons, compared to n for sequential search. For n = 1,000,000 that is about 20 comparisons instead of up to 1,000,000.
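The contrast can be made concrete with a small sketch. It assumes a plain sorted std::vector<int> in place of the slides' list type; the names sequential_search and halving_steps are illustrative and do not come from the slides.

```cpp
#include <cstddef>
#include <vector>

// Sequential search that also reports how many key comparisons it made.
// Returns the index of target, or -1 if target is not in keys.
int sequential_search(const std::vector<int>& keys, int target, int& comparisons)
{
    comparisons = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ++comparisons;                        // one key comparison per entry visited
        if (keys[i] == target) return static_cast<int>(i);
    }
    return -1;                                // all n keys were compared
}

// Number of times a list of length n can be halved before one entry remains.
// This equals floor(lg n) and approximates the number of steps in binary search.
int halving_steps(std::size_t n)
{
    int steps = 0;
    while (n > 1) {
        n /= 2;
        ++steps;
    }
    return steps;
}
```

An unsuccessful sequential search over 10 keys makes 10 comparisons, while a list of 1,000,000 entries can be halved only 19 times before a single entry is left.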

Algorithm Development
Our binary search algorithm uses two indices, top and bottom, to enclose the part of the list in which we are looking for the target key. At each iteration, we reduce the size of this part of the list by about half. To keep track of the progress of the algorithm, the following assertion needs to remain true: “The target key, provided it is present, will be found between the indices bottom and top, inclusive.” We establish the initial correctness of this assertion by setting bottom to 0 and top to the_list.size( ) - 1. To do binary search, we first calculate the index mid halfway between bottom and top as mid = (bottom + top)/2. Next, we compare the target key against the key at position mid, and then we change the appropriate one of the indices top or bottom so as to reduce the list to either its bottom or top half.
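One practical caveat, which the slides do not raise: with a fixed-width int, the sum bottom + top can overflow when both indices are very large, and signed overflow is undefined behavior in C++. A minimal sketch of an equivalent midpoint that avoids forming the sum, assuming 0 <= bottom <= top (the helper name midpoint is illustrative):

```cpp
// Midpoint of bottom..top, rounded down, computed without forming bottom + top.
// Assumes 0 <= bottom <= top, as holds inside the search loop.
int midpoint(int bottom, int top)
{
    return bottom + (top - bottom) / 2;
}
```

For the list sizes used in these slides both forms give the same index, for example 4 for bottom = 0 and top = 9.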

Algorithm Development
Next, we note that binary search should terminate when top <= bottom; that is, when the remaining part of the list contains at most one item, providing that we have not terminated earlier by finding the target. Finally, we must make progress toward termination by ensuring that the number of items remaining to be searched, top - bottom + 1, strictly decreases at each iteration of the process. Several slightly different algorithms for binary search can be written.

Algorithm
Initialization: set bottom = 0; top = the_list.size( ) - 1;
Compare target with the Record at the midpoint, mid = (bottom + top)/2;
Change the appropriate index, top or bottom, to restrict the search to the appropriate half of the list.
The loop terminates when top <= bottom, if it has not terminated earlier by finding the target.

Different versions of binary search
The Forgetful Version: binary_search_1( ) forgets the possibility that the Key target might be found quickly and continues to subdivide the list until what remains has length 1.
Recognizing Equality: binary_search_2( ). The forgetful version will often make unnecessary iterations, because it fails to recognize that it has found the target before continuing to iterate. Thus we might hope to save computer time with a variation that checks at each stage to see if it has found the target.

recursive_binary_1( )

    Error_code recursive_binary_1(const Ordered_list &the_list, const Key &target,
                                  int bottom, int top, int &position)
    {
       Record data;
       if (bottom < top) {              // List has more than one entry.
          int mid = (bottom + top)/2;
          the_list.retrieve(mid, data);
          if (data < target)            // Reduce to top half of list.
             return recursive_binary_1(the_list, target, mid + 1, top, position);
          else                          // Reduce to bottom half of list.
             return recursive_binary_1(the_list, target, bottom, mid, position);
       }
       else if (top < bottom)
          return not_present;           // List is empty.
       else {                           // List has exactly one entry.
          position = bottom;
          the_list.retrieve(bottom, data);
          if (data == target) return success;
          else return not_present;
       }
    } // end of recursive_binary_1( )

recursive_binary_1( )

    recursive_binary_1(the_list, target, bottom, top, position)
    {
       if (bottom < top) {              // List has more than one entry.
          Calculate mid;                // int mid = (bottom + top)/2;
          get data;                     // the_list.retrieve(mid, data);
          if (data < target)            // Reduce to top half of list.
             call recursive_binary_1(the_list, target, mid + 1, top, position);
          else                          // Reduce to bottom half of list.
             call recursive_binary_1(the_list, target, bottom, mid, position);
       }
       else if (top < bottom)
          return not_present;           // List is empty.
       else {                           // top == bottom; list has exactly one entry.
          position = bottom;
          get data;                     // the_list.retrieve(bottom, data);
          if (data == target)           // Only once a single entry remains
             return success;            // do we compare for equality.
          else return not_present;
       }
    } // end of recursive_binary_1( )

binary_search_1( )

    Error_code binary_search_1(const Ordered_list &the_list, const Key &target,
                               int &position)
    {
       Record data;
       int bottom = 0, top = the_list.size( ) - 1;
       while (bottom < top) {
          int mid = (bottom + top)/2;
          the_list.retrieve(mid, data);
          if (data < target) bottom = mid + 1;
          else top = mid;
       }
       if (top < bottom) return not_present;   // List is empty.
       else {                                  // One element remains.
          position = bottom;
          the_list.retrieve(bottom, data);
          if (data == target) return success;
          else return not_present;
       }
    }

Every time we fetch the data at mid, we compare it with the target only once (data < target), so we perform only one comparison for each mid. The single equality test is made after the loop, when one element remains.
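The slide code depends on the Ordered_list, Record, Key and Error_code types, which are not shown in this transcript. As a self-contained sketch of the same forgetful loop, the version below assumes a sorted std::vector<int> instead and returns an index (or -1) rather than an Error_code; the name forgetful_search is illustrative.

```cpp
#include <vector>

// Forgetful binary search on a sorted vector (a stand-in for Ordered_list).
// Returns the index of target, or -1 if target is absent.
int forgetful_search(const std::vector<int>& keys, int target)
{
    int bottom = 0;
    int top = static_cast<int>(keys.size()) - 1;
    while (bottom < top) {                         // more than one entry remains
        int mid = (bottom + top) / 2;              // same midpoint as the slides
        if (keys[mid] < target) bottom = mid + 1;  // target can only be above mid
        else top = mid;                            // target, if present, is at or below mid
    }
    if (top < bottom) return -1;                   // the list was empty
    return keys[bottom] == target ? bottom : -1;   // the single equality test
}
```

Because the loop never stops early, it narrows down to the first position whose key is not less than the target. With duplicate keys, for example {1, 3, 3, 3, 5, 7} and target 3, it therefore reports index 1, the first occurrence.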

binary_search_1( )
The division of the list into sublists is described in the following diagram. For each mid:

    if (data < target) bottom = mid + 1;
    else top = mid;

recursive_binary_2( )

    Error_code recursive_binary_2(const Ordered_list &the_list, const Key &target,
                                  int bottom, int top, int &position)
    {
       Record data;
       if (bottom <= top) {
          int mid = (bottom + top)/2;
          the_list.retrieve(mid, data);
          if (data == target) {
             position = mid;
             return success;
          }
          else if (data < target)
             return recursive_binary_2(the_list, target, mid + 1, top, position);
          else
             return recursive_binary_2(the_list, target, bottom, mid - 1, position);
       }
       else return not_present;
    }

Every time we fetch the data at mid, we first compare it for equality with the target and, if that fails, compare it again with <. We are performing two comparisons for each mid.

binary_search_2( )

    Error_code binary_search_2(const Ordered_list &the_list, const Key &target,
                               int &position)
    {
       Record data;
       int bottom = 0, top = the_list.size( ) - 1;
       while (bottom <= top) {
          position = (bottom + top)/2;
          the_list.retrieve(position, data);
          if (data == target) return success;
          if (data < target) bottom = position + 1;
          else top = position - 1;
       }
       return not_present;
    }

Every time we fetch the data at mid, we first compare it for equality with the target and, if that fails, compare it again with <. We are performing two comparisons for each mid.
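For comparison, here is a self-contained sketch of the equality-recognizing loop under the same assumption of a sorted std::vector<int> in place of Ordered_list, returning an index or -1; the name equality_search is illustrative.

```cpp
#include <vector>

// Binary search that checks for equality at every step.
// Returns the index of some occurrence of target, or -1 if target is absent.
int equality_search(const std::vector<int>& keys, int target)
{
    int bottom = 0;
    int top = static_cast<int>(keys.size()) - 1;
    while (bottom <= top) {                        // at least one entry remains
        int mid = (bottom + top) / 2;
        if (keys[mid] == target) return mid;       // first comparison: found it
        if (keys[mid] < target) bottom = mid + 1;  // second comparison: pick a half
        else top = mid - 1;
    }
    return -1;                                     // range became empty
}
```

This version stops at whichever matching entry it probes first, so with duplicate keys such as {1, 3, 3, 3, 5, 7} and target 3 it returns index 2, not necessarily the first occurrence.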

binary_search_2( )
The division of the list into sublists is described in the following diagram. For each mid:

    if (data == target) return success;
    if (data < target) bottom = position + 1;
    else top = position - 1;

Comparison Trees The comparison tree of an algorithm is obtained by tracing the action of the algorithm, representing each comparison of keys by a vertex of the tree (which we draw as a circle). Inside the circle we put the index of the key against which we are comparing the target key. The number of comparisons done by an algorithm in a particular search is the number of internal vertices traversed in going from the top of the tree, called its root, down the appropriate path to a leaf.

Comparison Tree (figure labels: root, branch, internal vertices, leaf)

Level & height The number of branches traversed to reach a vertex from the root is called the level of the vertex. Thus the root itself has level 0, the vertices immediately below it have level 1, and so on. The largest level that occurs is called the height of the tree.

path length The external path length of a tree is the sum of the number of branches traversed in going from the root once to every leaf in the tree. The internal path length is defined to be the sum, over all vertices that are not leaves, of the number of branches from the root to the vertex. We call the vertices immediately below a vertex v the children of v and the vertex immediately above v the parent of v.

Comparison tree for sequential search
In an unsuccessful search the number of comparisons is n, that is, one more than the number of branches (n - 1) on the path from the root to the last internal vertex of the tree.

Comparison tree for binary_1( ); Forgetful Version; n = 10
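If the target is less than or equal to the key at mid, search from 1 to mid; if it is greater, search from mid + 1 to top. (Figure: comparison tree with root 5 and branches labeled <= and >.)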

Comparison tree for binary_1( ) Forgetful Version
(Figure: after the root, if the target is greater than the data at mid, the next midpoint is mid = (6 + 10)/2; otherwise it is mid = (1 + 5)/2.)

Comparison tree for binary_2( ), n = 10

Comparison Count for binary_search_1; n = 10
In binary_search_1, every search terminates at a leaf; to obtain the average number of comparisons for both successful and unsuccessful searches, we need what is called the external path length of the tree: the sum of the number of branches traversed in going from the root once to every leaf in the tree. The external path length is (4 x 5) + (6 x 4) + (4 x 5) + (6 x 4) = 88. Half the leaves correspond to successful searches, and half to unsuccessful searches. Hence the average number of comparisons needed for either a successful or unsuccessful search by binary_search_1( ) is 44/10 = 4.4 when n = 10.

external path length for binary_1( ) Forgetful Version

Comparison Count for binary_search_2; n = 10
In binary_search_2, all the leaves correspond to unsuccessful searches; hence the external path length leads to the number of comparisons for an unsuccessful search. The external path length is (5 x 3) + (6 x 4) = 39. We shall assume for unsuccessful searches that the n + 1 intervals (less than the first key, between a pair of successive keys, or greater than the largest) are all equally likely; for the diagram we therefore assume that any of the 11 failure leaves are equally likely. Since two comparisons are made at each internal vertex on the way down, the average number of comparisons for an unsuccessful search is (2 x 39)/11, or about 7.1.

external path length for binary_2( ) Equality Version

Comparison Count for binary_search_2; n = 10
For successful searches, we need the internal path length, which is defined to be the sum, over all vertices that are not leaves, of the number of branches from the root to the vertex. The tree has 1 internal vertex on level 0, 2 on level 1, 4 on level 2, and 3 on level 3, so the internal path length (number of branches) is (1 x 0) + (2 x 1) + (4 x 2) + (3 x 3) = 19. Recall that binary_search_2 does two comparisons for each non-leaf except for the vertex that finds the target, and note that the number of these internal vertices traversed is one more than the number of branches (for each of the n = 10 internal vertices). We thereby obtain the average number of comparisons for a successful search to be 2 x (19/10 + 1) - 1 = 4.8. The subtraction of 1 corresponds to the fact that one fewer comparison is made when the target is found.
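The counts for n = 10 can be checked by instrumenting both loops. The sketch below is an assumption-laden illustration: it uses a sorted std::vector<int> in place of the slides' list type, counts only key comparisons, and uses the illustrative names count_forgetful, count_equality and total.

```cpp
#include <vector>

// Key comparisons made by the forgetful loop (binary_search_1) for one target.
int count_forgetful(const std::vector<int>& keys, int target)
{
    int comparisons = 0;
    int bottom = 0, top = static_cast<int>(keys.size()) - 1;
    while (bottom < top) {
        int mid = (bottom + top) / 2;
        ++comparisons;                             // data < target
        if (keys[mid] < target) bottom = mid + 1;
        else top = mid;
    }
    if (bottom == top) ++comparisons;              // final data == target
    return comparisons;
}

// Key comparisons made by the equality-checking loop (binary_search_2).
int count_equality(const std::vector<int>& keys, int target)
{
    int comparisons = 0;
    int bottom = 0, top = static_cast<int>(keys.size()) - 1;
    while (bottom <= top) {
        int mid = (bottom + top) / 2;
        ++comparisons;                             // data == target
        if (keys[mid] == target) return comparisons;
        ++comparisons;                             // data < target
        if (keys[mid] < target) bottom = mid + 1;
        else top = mid - 1;
    }
    return comparisons;
}

// Sum of counter(keys, t) over every target t.
template <typename Counter>
int total(Counter counter, const std::vector<int>& keys,
          const std::vector<int>& targets)
{
    int sum = 0;
    for (int t : targets) sum += counter(keys, t);
    return sum;
}
```

With the ten keys 1, 3, ..., 19, searching for every key gives a total of 44 comparisons for the forgetful loop (average 4.4) and 48 for the equality-checking loop (average 4.8). Searching for the eleven absent values 0, 2, ..., 20 with the equality-checking loop gives 78 comparisons (average about 7.1).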

Generalization What happens when n is larger than 10?
For longer lists, it may be impossible to draw the complete comparison tree, but from the examples with n = 10, we can make some observations that will always be true.

2-Trees
A 2-tree is a tree in which every vertex except the leaves has exactly two children.
Lemma 7.1 The number of vertices on each level of a 2-tree is at most twice the number on the level immediately above. Hence, in a 2-tree, the number of vertices on level t is at most $2^t$ for t >= 0.
Lemma 7.2 If a 2-tree has k vertices on level t, then t >= lg k, where lg denotes a logarithm with base 2.

Floor & ceiling
The floor of a real number x is the largest integer less than or equal to x, and the ceiling of x is the smallest integer greater than or equal to x. We denote the floor of x by $\lfloor x \rfloor$ and the ceiling of x by $\lceil x \rceil$. For example, if x = 5.46, then $\lfloor x \rfloor = 5$ and $\lceil x \rceil = 6$.
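In C++ these two operations are available as std::floor and std::ceil from <cmath>. The sketch below wraps them and adds an integer-only ceiling of lg k, which is the form the analysis of the comparison trees relies on; the helper names floor_of, ceiling_of and ceiling_lg are illustrative.

```cpp
#include <cmath>
#include <cstdint>

// Floor and ceiling of a real number, returned as whole numbers.
long floor_of(double x)   { return static_cast<long>(std::floor(x)); }
long ceiling_of(double x) { return static_cast<long>(std::ceil(x)); }

// Smallest t with 2^t >= k, that is, the ceiling of lg k.
// Uses integers only, so there is no floating-point rounding.
// Assumes 1 <= k <= 2^63 so that the running power does not overflow.
int ceiling_lg(std::uint64_t k)
{
    int t = 0;
    std::uint64_t power = 1;
    while (power < k) {
        power *= 2;
        ++t;
    }
    return t;
}
```

For x = 5.46 the helpers give 5 and 6. For the 20 leaves of the forgetful comparison tree with n = 10, ceiling_lg(20) is 5, which matches the level of the deepest leaves counted in the external path length.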

Analysis of binary_search_1
We can now turn to the general analysis of binary_search_1 on a list of n entries. The final step done in binary_search_1 is always a check for equality with the target; hence both successful and unsuccessful searches terminate at leaves, and so there are exactly 2n leaves altogether. As illustrated in Figure 7.3 for n = 10, all these leaves must be on the same level or on two adjacent levels. (This observation can be proved by using mathematical induction to establish the following stronger statement: If T1 and T2 are the comparison trees of binary_search_1 operating on lists L1 and L2 whose lengths differ by at most 1, then all leaves of T1 and T2 are on the same or adjacent levels. The statement is clearly true when L1 and L2 are lists with length at most 2 ( L1=1 & L2=2). Moreover, if binary_search_1 divides two larger lists whose sizes differ by at most one, the sizes of the four halves also differ by at most 1, and the induction hypothesis shows that their leaves are all on the same or adjacent levels.)

Analysis of binary_search_1
From Lemma 7.2 it follows that the maximum level t of leaves in the comparison tree satisfies $t = \lceil \lg 2n \rceil$. Since one comparison of keys is done at the root (which is level 0), but no comparisons are done at the leaves (level t), it follows that the maximum number of key comparisons is also $t = \lceil \lg 2n \rceil$. Furthermore, the maximum number is at most one more than the average number, since all leaves are on the same or adjacent levels. Since lg 2n = lg n + lg 2 = lg n + 1, the maximum number of key comparisons is about lg n + 1 (worst case). When n is a power of 2, all leaves are on the same level and every search makes exactly lg n + 1 comparisons.

binary search 1

Analysis of binary_search_2, Unsuccessful Search
To count the comparisons made by binary_search_2 for a general value of n for an unsuccessful search, we shall examine its comparison tree. For reasons similar to those given for binary_search_1, this tree is again full at the top, with all its leaves on at most two adjacent levels at the bottom. For binary_search_2, all the leaves correspond to unsuccessful searches, so there are exactly n + 1 leaves, corresponding to the n + 1 unsuccessful outcomes: less than the smallest key, between a pair of keys, and greater than the largest key.

Analysis of binary_search_2, Unsuccessful Search
Since these leaves are all at the bottom of the tree, Lemma 7.1 implies that the number of leaves is approximately $2^h$, where h is the height of the tree: $2^h \approx n + 1$. Taking (base 2) logarithms, we obtain that $h \approx \lg(n + 1)$. This value is the approximate distance from the root to one of the leaves. Since, in binary_search_2, two comparisons of keys are performed for each internal vertex, the number of comparisons done in an unsuccessful search is approximately $2 \lg(n + 1)$.

Binary search 2, Unsuccessful Search

Analysis of binary_search_2, Successful Search
In the comparison tree of binary_search_2, the distance to the leaves is about lg(n + 1), as we have seen. The number of leaves is n + 1, so the external path length E is about the number of leaves times the height: $E \approx (n + 1)\lg(n + 1)$. Theorem 7.3 ($E = I + 2q$, so $I = E - 2q$, with q = n non-leaves) then shows that the internal path length is about $I \approx (n + 1)\lg(n + 1) - 2n$. To obtain the average number of comparisons done in a successful search, we must first divide by n (the number of non-leaves) and then add 1 and double, since two comparisons were done at each internal node. Finally, we subtract 1, since only one comparison is done at the node where the target is found. The result is about $\frac{2(n + 1)}{n}\lg(n + 1) - 3$ comparisons of keys.

Analysis of binary_search_2, Successful Search
Average number of comparisons = 2 x (internal path length / number of internal vertices + 1) - 1, where the + 1 appears because a path with x branches passes through x + 1 vertices:

$$
2\left(\frac{(n + 1)\lg(n + 1) - 2n}{n} + 1\right) - 1 = \frac{2(n + 1)}{n}\lg(n + 1) - 2 - 1 = \frac{2(n + 1)}{n}\lg(n + 1) - 3
$$

If n is large, (n + 1)/n is close to 1, so the average number of comparisons is approximately $2 \lg n - 3$.
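As a sanity check, the general formulas can be evaluated at n = 10 and set against the exact counts read off the comparison trees earlier (4.8 for a successful search and about 7.1 for an unsuccessful one). The evaluation below uses lg 11 ≈ 3.459.

$$
\frac{2(n + 1)}{n}\lg(n + 1) - 3 = \frac{2 \cdot 11}{10}\lg 11 - 3 \approx 2.2 \times 3.459 - 3 \approx 4.61
$$

$$
2\lg(n + 1) = 2\lg 11 \approx 6.92
$$

Both values fall slightly below the exact averages. That is expected, since the derivation treats every leaf as if it sat at distance lg(n + 1) from the root, whereas the real leaves lie on two adjacent levels.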

Binary search 2

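The Path-Length Theorem
Theorem 7.3 Denote the external path length of a 2-tree by E, the internal path length by I, and let q be the number of vertices that are not leaves. Then E = I + 2q.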

Proof To prove the theorem we use the method of mathematical induction, using the number of vertices in the tree to do the induction. If the tree contains only its root, and no other vertices, then E = I = q = 0, and the base case of the theorem is trivially correct. Now take a larger tree, and let v be some vertex that is not a leaf, but for which both the children of v are leaves. Let k be the number of branches on the path from the root to v. See Figure 7.6.

Proof (continued) Now let us delete the two children of v from the 2-tree. Since v is not a leaf but its children are, the number of non-leaves goes down from q to q - 1. The internal path length I is reduced by the distance to v (= k); that is, to I - k. The distance to each child of v is k + 1, so the external path length is reduced from E to E - 2(k + 1), but v is now a leaf, so its distance, k, must be added, giving a new external path length of E - 2(k + 1) + k = E - k - 2. Since the new tree has fewer vertices than the old one, by the induction hypothesis we know that E - k - 2 = (I - k) + 2(q - 1). Rearrangement of this equation gives the desired result. End of proof.
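The theorem can also be checked numerically on the comparison trees of the equality-checking search. The sketch below walks the tree implied by the midpoint rule mid = (bottom + top)/2 without building it, accumulating I, E and q; the struct TreeStats and the functions measure and comparison_tree_stats are illustrative names, not part of the slides.

```cpp
// Path-length totals for a 2-tree.
struct TreeStats {
    long internal_path = 0;   // I: sum of levels of all non-leaves
    long external_path = 0;   // E: sum of levels of all leaves
    long non_leaves = 0;      // q: number of vertices that are not leaves
};

// Walk the comparison tree of the equality-checking search over bottom..top.
void measure(int bottom, int top, int level, TreeStats& stats)
{
    if (bottom > top) {                     // empty range: a leaf (failed search)
        stats.external_path += level;
        return;
    }
    int mid = (bottom + top) / 2;           // internal vertex comparing with key mid
    stats.internal_path += level;
    ++stats.non_leaves;
    measure(bottom, mid - 1, level + 1, stats);   // left child
    measure(mid + 1, top, level + 1, stats);      // right child
}

// Totals for the comparison tree on a list of n entries.
TreeStats comparison_tree_stats(int n)
{
    TreeStats stats;
    measure(0, n - 1, 0, stats);
    return stats;
}
```

For n = 10 this yields I = 19, E = 39 and q = 10, so E = I + 2q holds as 39 = 19 + 20.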

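Comparison of Methods
The number of comparisons of keys done by binary searches in searching a list of n items is approximately:
binary_search_1: lg n + 1 for both successful and unsuccessful searches.
binary_search_2: (2(n + 1)/n) lg(n + 1) - 3 for a successful search and 2 lg(n + 1) for an unsuccessful search.
For a high value of n, binary_search_2 takes about 2 lg n - 3 comparisons for a successful search and about 2 lg n for an unsuccessful search.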

Comparison of Methods for even higher value of n:

Interpolation search
To find the key target, interpolation search estimates, according to the magnitude of the number target relative to the first and last entries of the list, about where target would be in the list, and looks there. It then reduces the size of the list according as target is less than or greater than the key examined. It can be shown that on average, with uniformly distributed keys, interpolation search will take about lg lg n comparisons of keys, which, for large n, is somewhat fewer than binary search requires. If, for example, n = 1,000,000, then binary_search_1 will require about $\lg 10^6 + 1 \approx 21$ comparisons, while interpolation search may need only about $\lg \lg 10^6 \approx 4.32$ comparisons.
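The slides give no code for interpolation search, so the following is only a minimal sketch of the idea. It assumes integer keys in a sorted std::vector<int> and estimates the probe position by linear interpolation between the first and last keys of the remaining range; the name interpolation_search is illustrative.

```cpp
#include <vector>

// Interpolation search on a sorted vector of integer keys.
// Returns the index of target, or -1 if target is absent.
int interpolation_search(const std::vector<int>& keys, int target)
{
    int low = 0;
    int high = static_cast<int>(keys.size()) - 1;
    // Continue only while target can lie between the end keys of the range.
    while (low <= high && keys[low] <= target && target <= keys[high]) {
        if (keys[low] == keys[high])          // all remaining keys are equal,
            return low;                       // and they equal target
        // Estimate where target would sit if the keys were evenly spread.
        long long offset =
            (static_cast<long long>(target) - keys[low]) * (high - low)
            / (static_cast<long long>(keys[high]) - keys[low]);
        int guess = low + static_cast<int>(offset);
        if (keys[guess] == target) return guess;
        if (keys[guess] < target) low = guess + 1;   // discard the lower part
        else high = guess - 1;                       // discard the upper part
    }
    return -1;
}
```

On evenly spaced keys such as 10, 20, ..., 100 the first estimate for target 70 lands directly on index 6. On badly skewed keys the estimates can be poor and the search may probe many entries, which is why the lg lg n figure is stated only for uniformly distributed keys.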