
 1 Searching. Why searching? Searching is everywhere. Searching is an essential operation for a dictionary and other data structures. Understand the.


1 Searching

2 Why searching? Searching is everywhere. Searching is an essential operation for a dictionary and other data structures. Understand the differences in efficiency between searching algorithms.

3 Back to counting
Counting 2’s in a given list:
    counter = 0
    for e in L:
        if e == 2:
            counter += 1
Counting is repeated searching: search for a 2, remove it, search for another 2, remove it, and so on. The number of repetitions is the number of 2s.
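
The counting loop above can be wrapped in a small function. This is a minimal sketch, assuming the intended comparison is against the value being counted (2 on the slide); the name count_value and the sample list are illustrative only.

```python
def count_value(L, target):
    """Count how many elements of L equal target by scanning every element."""
    counter = 0
    for e in L:
        if e == target:
            counter += 1
    return counter


sample = [2, 5, 2, 7, 2, 1]  # hypothetical data
print(count_value(sample, 2))  # 3, the same as sample.count(2)
```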

4 Searching Searching is an operation on given data, basically a list. You search (find) within a file. A file is equivalent to a big string. A string is a kind of tuple of characters. A tuple is essentially a non-modifiable list! =) A search operation finds the set of elements that satisfy a given condition, expressed as a cond() function: cond(e) → True for each matching element e.
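
To illustrate that lists, tuples, and strings can all be searched with the same condition-based idea, here is a small sketch. The helper name search_all and the sample data are assumptions made for this example, not part of the slides.

```python
def search_all(seq, cond):
    """Return (position, element) pairs for every element where cond(element) is True."""
    return [(j, e) for j, e in enumerate(seq) if cond(e)]


# The same function works on a list, a tuple, and a string.
print(search_all([3, 8, 5, 10], lambda e: e % 2 == 0))  # [(1, 8), (3, 10)]
print(search_all((3, 8, 5, 10), lambda e: e > 4))       # [(1, 8), (2, 5), (3, 10)]
print(search_all("banana", lambda ch: ch == "a"))       # [(1, 'a'), (3, 'a'), (5, 'a')]
```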

5 Linear Search A basic search operation examines each element in order. Advantages: Simple. No extra memory is needed to remember previously searched elements. Essentially no different from random search (we will prove it!).

6 Linear Search
    def linear_search(L, cond):
        for (j, e) in enumerate(L):
            if cond(e):  # cond(e) == True
                return (j, e)
        return (-1, None)  # no element satisfies cond
A search operation should give the element’s position as well as its value: when we look for a specific value we have the value already, so the position is the new information.
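
A short usage sketch for linear_search follows. The function is repeated so the example runs on its own; the sample list and the lambda conditions are hypothetical.

```python
def linear_search(L, cond):
    """Return (position, element) of the first element satisfying cond, or (-1, None)."""
    for j, e in enumerate(L):
        if cond(e):
            return (j, e)
    return (-1, None)


L = [17, 4, 23, 8, 15]  # hypothetical data
print(linear_search(L, lambda e: e == 23))     # (2, 23)
print(linear_search(L, lambda e: e % 2 == 0))  # (1, 4): the first even number
print(linear_search(L, lambda e: e < 0))       # (-1, None): nothing matches
```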

7 Linear Search Disadvantage: better methods are possible with a few assumptions. The number of comparisons (the number of cond() calls) is proportional to the number of items. Let’s do some mathematics.

8 Linear Search Given a list L of n distinct numbers, let’s search for each element in turn, starting from the first.
    linear_search(L, lambda e: e == L[0])    # 1 cond() call
    linear_search(L, lambda e: e == L[1])    # 2 cond() calls
    linear_search(L, lambda e: e == L[2])    # 3 cond() calls
    …
    linear_search(L, lambda e: e == L[n-1])  # n cond() calls

9 Linear Search If each of the n items is equally likely to be the target, the expected number of comparisons is (1 + 2 + … + n) / n = (n + 1) / 2.
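
The average can be checked by brute force. The sketch below assumes every item is equally likely to be the target and counts comparisons over a list of the numbers 1 to 99; the helper name count_cond_calls is made up for this example.

```python
def count_cond_calls(L, target):
    """Linear search for target; return how many comparisons were made."""
    calls = 0
    for e in L:
        calls += 1
        if e == target:
            break
    return calls


n = 99
L = list(range(1, n + 1))
total = sum(count_cond_calls(L, x) for x in L)  # 1 + 2 + ... + n
average = total / n
print(total, average)  # 4950 50.0, and (n + 1) / 2 == 50.0
```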

10 Better way? Let’s play a number game. A has a number between 1 and 99, and B guesses a number. If A’s number is smaller than B’s guess, A says ‘down’; otherwise, A says ‘up’. Repeat this until B finds A’s number.

11 Do you need 45 questions on average? You can ask only multiples of 2 or multiples of 3 to skip a few numbers.

12 Skip Search Let’s call finding an element by asking only at multiples of m a skip search! However, instead of the cond() function we need a comp() function, which tells ‘up’, ‘down’, or ‘bingo!’. Here n is the number to find and c is the number asked.
    def comp(n, c):
        return "bingo!" if n == c else ("up" if n > c else "down")

13 Skip Search The structure of skip_search: starting from 1, we ask every m-th number until the comp() function says “down”. While the answer is “up”, each question skips m-1 numbers. If comp() says “bingo!”, we saved questions. If comp() says “down”, we run linear_search starting just after the previous number asked.

14 Skip Search Target number: 23. Asking numbers: 1, 6, 11, 16, 21, 26, 22, 23. Compared to linear search (23 questions), only 8 questions were required.

15 Skip Search What is the expected number of questions?
To find 1 → 1 question
To find 2 → 2 questions (skip) + 1 question (linear)
To find 3 → 2 questions (skip) + 2 questions (linear)
To find 4 → 2 questions (skip) + 3 questions (linear)
To find 5 → 2 questions (skip) + 4 questions (linear)
To find 6 → 2 questions (skip)
…

16 Skip Search What is the expected number of questions? 1: 1, 2: 2+1, 3: 2+2, 4: 2+3, 5: 2+4, 6: 2, 7: 3+1, 8: 3+2, 9: 3+3, 10: 3+4, 11: 3, …
Summing over the numbers 1 to 91: 1 + 2*5+(1+2+3+4) + 3*5+(1+2+3+4) + … + 19*5+(1+2+3+4) = 1 + (2+3+4+…+19)*5 + (1+2+3+4)*18 = 1 + sum(range(2,20))*5 + 10*18 = 1126, that is, 1126 / 91 ≈ 12.4 questions on average.
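
The question counts above can be reproduced with a short simulation. This is a sketch of one way to read the skip-search rule (ask 1, 1+m, 1+2m, … and fall back to a linear scan after the first “down”), with m = 5 as in the example; the function name skip_search and its return value (the list of numbers asked) are choices made for this illustration.

```python
def comp(n, c):
    """Answer given for secret number n when the number c is asked."""
    return "bingo!" if n == c else ("up" if n > c else "down")


def skip_search(secret, m=5):
    """Return the numbers asked, in order, until secret is found (secret >= 1 assumed)."""
    asked = []
    c = 1
    # Skip phase: ask 1, 1+m, 1+2m, ... while the answer is "up".
    while True:
        asked.append(c)
        answer = comp(secret, c)
        if answer == "bingo!":
            return asked
        if answer == "down":
            break
        c += m
    # Linear phase: start just after the previous number asked.
    c = c - m + 1
    while True:
        asked.append(c)
        if comp(secret, c) == "bingo!":
            return asked
        c += 1


print(skip_search(23))                                  # [1, 6, 11, 16, 21, 26, 22, 23]
print(sum(len(skip_search(k)) for k in range(1, 92)))   # 1126
```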

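17 Skip Search Compared to 45 questions in linear search, skip search needs fewer questions. In general, however, with a fixed skip m the number of questions still grows in proportion to the number of items. Asymptotically, it didn’t improve.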

18 The matter of efficiency In general, it doesn’t matter, and linear performance is acceptable. However, what if the number of items is in the millions, billions, or trillions? Even worse, what if a nested search is required?
    for l1 in L1:
        L2 = search(X, l1)
        for l2 in L2:
            x = search(X, l2)

19 Adaptive Search The cause of the inefficiency is the fixed number of skips for each “up” answer. In the equation, m << n, so m is effectively nothing but 1. We could instead skip questions in an adaptive or incremental way: skip 1, 2, 4, 8, 16, … numbers for each successive “up” answer.

20 Adaptive Search If the answer is “bingo!”, we are done. If the answer is “down”, repeat the adaptive search from the last number that got “up”. Let’s see how it works.

21 Adaptive Search Target number: 88. Asking numbers: 1, 2, 4, 8, 16, 32, 64, 128, then 65, 67, 71, 87, 119, 88. Even for a relatively high number such as 88, it needs only 14 questions (19 for the skip search, 88 for the linear search).
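
The adaptive idea can be sketched in a few lines. This is only one possible reading of the rule: in each round the numbers asked are base+1, base+2, base+4, …, and after a “down” the search restarts from the last number that got “up”. The exact sequence of numbers asked, and so the question count, need not match the slide’s example. The name adaptive_search is an assumption for this illustration.

```python
def comp(n, c):
    """Answer given for secret number n when the number c is asked."""
    return "bingo!" if n == c else ("up" if n > c else "down")


def adaptive_search(secret):
    """Return the numbers asked, in order, until secret is found (secret >= 1 assumed)."""
    asked = []
    base = 0  # largest number known to be smaller than the secret
    while True:
        step = 1
        low = base
        while True:
            c = base + step
            asked.append(c)
            answer = comp(secret, c)
            if answer == "bingo!":
                return asked
            if answer == "down":
                break          # overshoot: restart from the last "up"
            low = c            # "up": remember it and double the step
            step *= 2
        base = low


questions = adaptive_search(88)
print(questions)       # ends with 88
print(len(questions))  # far fewer than the 88 questions of a linear scan
```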

22 Adaptive Search Analysis
to find 1 → 1 question
to find 2 → 2 questions
to find 3 → 3 + 1 questions
…
to find 2^n → n + 1 questions
to find 2^n + 1 → (n + 2) + 1 questions
The analysis is difficult due to its recursive structure.

23 Adaptive Search Approximate Analysis Suppose that we needed n questions until we got the first “down”. We skipped 1 + 2 + … + 2^(n-1) numbers and learned that the answer is less than 2^n. For the second round, the search range is reduced to 2^(n-1) - 1 numbers, because 2^(n-1) + 2^(n-1) = 2^n.

24 Adaptive Search Approximate Analysis To find a number among 2^(n-1) - 1 numbers, we need at most n-1 questions. If we repeat, the next round needs at most n-2 questions, and so on. So the successive rounds need 1, 2, …, n questions to find a number within the range from 1 to 2^n - 1.

25 Adaptive Search Approximate Analysis We need at most 1+2+…+n = n(n+1)/2 questions to find a number between 1 and 2^n - 1. If we are looking within a range between 1 and N, we can take the log() function, since log2(2^n) = n. The approximate number of questions required is about log2(N) * (log2(N) + 1) / 2.

26 Adaptive Search Our adaptive (or incremental) search is more efficient than the linear search or the skip search! Growing the skips as 3^n or 5^n will not change its efficiency asymptotically. In fact, this is the upper bound for search in an ordered sequence. There is a formalized way to do this, called binary search.

27 Binary Search

28 Adaptive Search vs. Binary Search The efficiency is essentially the same. The assumption is that the sequence must be ordered before any search operation. There must be a comp() function that answers your question with one of “bingo!”, “up”, or “down”. Fortunately, we have one for numbers and strings: the ‘<’ and ‘==’ operators will answer your question.

29 Binary Search We know the number of candidate items to search. The idea is to begin with all the candidates and reduce their number by a factor of 2 at each step. In other words, each question narrows the search to either the lower half or the upper half. The search range halves each time: 100, 50, 25, 12, 6, 3, 1.
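
The halving sequence above can be generated directly, which also shows how few steps are needed. This is a small illustrative sketch; the helper name halving_steps is made up here.

```python
def halving_steps(n):
    """Sizes of the candidate range as it is halved (integer division) down to 1."""
    sizes = [n]
    while n > 1:
        n //= 2
        sizes.append(n)
    return sizes


print(halving_steps(100))           # [100, 50, 25, 12, 6, 3, 1]
print(len(halving_steps(100)) - 1)  # 6 halvings for 100 candidates
```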

30 Binary Search Let’s look at a visualization. Find 24 among 10 numbers.
Index: 0 1 2 3 4 5 6 7 8 9
Value: 1 7 9 18 22 24 88 92 94 99

31 Binary Search How many variables do we need? The indices of the lower and upper bounds: begin and limit. What are the initial values of begin and limit, respectively?
    begin = 0
    limit = len(L) - 1
What is the middle index, mid?
    mid = (begin + limit) // 2  # integer division

32 Binary Search (begin, limit) = [0, 9], mid = (0+9)//2 = 4 (integer way!). Call the comp() function with the target first: comp(24, L[mid]) → comp(24, L[4]) → comp(24, 22) → it says “up”. The next range will be (begin, limit) = [mid+1, 9] = [5, 9] because we need the upper half.

33 Binary Search (begin, limit) = [5, 9], mid = (5+9)//2 = 7 (integer way!). Call the comp() function: comp(24, L[mid]) = comp(24, L[7]) = comp(24, 92) → it says “down”. The next range will be (begin, limit) = [5, mid-1] = [5, 6] because we need the lower half.

34 Binary Search (begin, limit) = [5, 6], mid = (5+6)//2 = 5 (integer way!). Call the comp() function: comp(24, L[mid]) = comp(24, L[5]) = comp(24, 24) → it says “bingo!”
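
The three steps of this walkthrough can be reproduced with an iterative sketch that records (begin, limit, mid) at each step. The list values are a reading of the slide’s figure, and the function name binary_search_trace is an assumption for this example.

```python
def binary_search_trace(L, e):
    """Iterative binary search on a sorted list; returns (index, trace of (begin, limit, mid))."""
    begin, limit = 0, len(L) - 1
    trace = []
    while begin <= limit:
        mid = (begin + limit) // 2
        trace.append((begin, limit, mid))
        if L[mid] == e:
            return mid, trace
        if e < L[mid]:
            limit = mid - 1   # "down": keep the lower half
        else:
            begin = mid + 1   # "up": keep the upper half
    return -1, trace


L = [1, 7, 9, 18, 22, 24, 88, 92, 94, 99]  # values as read from the slide's figure
print(binary_search_trace(L, 24))  # (5, [(0, 9, 4), (5, 9, 7), (5, 6, 5)])
```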

35 Binary Search The binary search is recursive, with a modified range at each call.
    def binary_search(L, e, begin=None, limit=None):
        begin = 0 if begin is None else begin
        limit = len(L) - 1 if limit is None else limit
        if begin > limit:
            return (-1, e)  # empty range: no item in L
        mid = (begin + limit) // 2  # integer division
        if L[mid] == e:
            return (mid, e)
        elif e < L[mid]:  # down: search the lower half
            return binary_search(L, e, begin, mid - 1)
        return binary_search(L, e, mid + 1, limit)  # up: search the upper half
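
For comparison, Python’s standard library offers binary search on sorted lists through the bisect module. The sketch below is not from the slides; it wraps bisect_left so that it returns the same kind of (position, value) pair, and the wrapper name bisect_search is made up for this example.

```python
from bisect import bisect_left


def bisect_search(L, e):
    """Return (index, e) if e is in the sorted list L, else (-1, e)."""
    i = bisect_left(L, e)  # leftmost position where e could be inserted
    if i < len(L) and L[i] == e:
        return (i, e)
    return (-1, e)


L = [1, 7, 9, 18, 22, 24, 88, 92, 94, 99]  # values as read from the slide's figure
print(bisect_search(L, 24))  # (5, 24)
print(bisect_search(L, 25))  # (-1, 25)
```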

36 Root Finding Bisection

