Searching for Data Relationship between searching and sorting Simple linear searching Linear searching of sorted data Searching for string or numeric data Binary searching of sorted data Search efficiency Stdlib library binary split search function
Relationship between searching and sorting Searching and sorting are closely related operations. To understand why this is so, think about why it is quicker to find someone's telephone number in a phone directory if you already know their name than it would be to find their name if you knew their phone number, e.g. by dialling. It would similarly be much more difficult to find a word in a dictionary corresponding to a definition than it is to look up the definition knowing the word. Phone directories and dictionaries are sorted in the order they are because an assumption has been made about the preferred search key. Searching for a key within a sorted list of keys is much easier than searching for a key within an unsorted list.
Simple linear searching 1 When searching an unsorted array, it is not until the last element used has been examined that it is proven that a particular value does not appear in the array. Thus searching must continue either until the required value is found or until all elements of the array in use have been examined.
Simple linear searching 2
Array index starts at 0
WHILE there are more items to search:
    IF record key at index same as search key:
        RETURN array index of record key
    END IF
    index = index + 1
END WHILE
RETURN -1 meaning key not found
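The linear search can be sketched in C roughly as follows. This is a minimal illustration, not the code from the original slides: the record type struct entry, its field sizes, and the function name find_name are assumptions made for the example.

```c
#include <string.h>

/* Hypothetical phone book record; field names and sizes are assumptions. */
struct entry {
    char name[32];
    int extension;
};

/* Simple linear search of an unsorted array.
   Returns the index of the first record whose name matches key,
   or -1 if none of the `used` records matches. */
int find_name(const struct entry pb[], int used, const char *key)
{
    for (int i = 0; i < used; i++) {
        if (strcmp(pb[i].name, key) == 0) {
            return i;               /* record key same as search key */
        }
    }
    return -1;                      /* every element in use was examined */
}
```

Note that the function can only report "not found" after the loop has visited every element in use.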
Simple linear search 3
Simple linear searching 4
Linear searching of sorted data 1 Using the algorithm above is inefficient if the keys within the array are in ascending order and the key we are searching for isn't present in the array. This is because we continue searching even when it is obvious that the key isn't going to be found: once we have compared all keys in the array less than or equal to the search key, every remaining key must be greater. It is possible to stop searching as soon as we know the key can't be in the array.
Linear searching of sorted data 2
STRCMP(3) Linux Programmer’s Manual
NAME
    strcmp - compare two strings
SYNOPSIS
    #include <string.h>
    int strcmp(const char *s1, const char *s2);
DESCRIPTION
    The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.
Linear searching of sorted data 3 The integer cmp is given an initial value of -1. So long as the key in the array is less than the search key, strcmp() returns a negative value, so cmp stays negative. The search stops either when the search key is found (cmp is zero) or when cmp becomes positive because pb[i].name is greater than key.
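A possible C version of this early-stopping search is sketched below. The names pb, cmp and key follow the description above; the struct entry layout and the function name find_name_sorted are assumptions, and the original slide code may have been structured differently.

```c
#include <string.h>

/* Hypothetical phone book record; field names and sizes are assumptions. */
struct entry {
    char name[32];
    int extension;
};

/* Linear search of an array sorted by name in ascending order.
   Stops as soon as a name greater than or equal to key is reached.
   Returns the index of the match, or -1 if key is not present. */
int find_name_sorted(const struct entry pb[], int used, const char *key)
{
    int cmp = -1;   /* negative: no name >= key has been seen yet */
    int i = 0;

    /* strcmp() is only called while i is a valid index. */
    while (i < used && (cmp = strcmp(pb[i].name, key)) < 0) {
        i++;
    }
    return (cmp == 0) ? i : -1;
}
```

If the loop runs off the end of the array, cmp is still negative from the last comparison, so the function correctly reports that the key is absent.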
Searching for string or numeric data 1 In the examples above we were searching for a person's name in the phone book. How would our find function be changed if we were to search for a particular phone extension number stored as an integer instead? Presumably, the phone book is sorted by name, not by extension number, so we'll have to check all records before knowing an extension isn't present. Warning: it isn't a good idea to search for an exact floating point value, because decimal to binary conversion is likely to introduce rounding errors. You also can't match floating point values in a case statement: in C, case labels must be integer constants.
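As a rough sketch of both points, the code below searches by integer extension with == and shows one common alternative to exact floating point matching, which is to compare within a tolerance. The names struct entry, find_extension and nearly_equal are hypothetical, and the tolerance is something the caller has to choose to suit the data.

```c
/* Hypothetical phone book record; field names and sizes are assumptions. */
struct entry {
    char name[32];
    int extension;
};

/* The array is sorted by name, not by extension, so every record
   in use must be checked before reporting that none matches. */
int find_extension(const struct entry pb[], int used, int extension)
{
    for (int i = 0; i < used; i++) {
        if (pb[i].extension == extension) {   /* integers compare exactly */
            return i;
        }
    }
    return -1;
}

/* Returns 1 if a and b differ by no more than tolerance, else 0.
   Used instead of a == b for floating point values. */
int nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;
    if (diff < 0) {
        diff = -diff;
    }
    return diff <= tolerance;
}
```

For example, nearly_equal(0.1 + 0.2, 0.3, 1e-9) accepts two values that an exact comparison may treat as different.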
Searching for string or numeric data 2
Binary searching of sorted data 1 A sequential search is very inefficient when a large list of items is to be searched. It is, however, the only solution if the data is unsorted. For sorted data the binary search is more efficient. A binary search is analogous to the way you look up a word in a dictionary or a name in a telephone directory. You might grasp the next paragraph more easily if you first think of a word and look it up in a dictionary, or look up the telephone number of a friend or business in the phone book. As you carry out your search, write down exactly how you decide where in the dictionary or phone book to look next.
Binary searching of sorted data 2 The middle element of the list to be searched is compared with the required value, from which it can be determined which half of the list the required value is in. The process is then repeated on the relevant half of the list. Each iteration of the process reduces the number of items still to be searched by a factor of 2, until eventually only one item remains.
Binary searching of sorted data 3
HIGH = highest array index used
LOW = lowest array index used
WHILE LOW <= HIGH: // one or more records remain to be searched
    MID = (HIGH + LOW) / 2 // integer result
    IF search key < array[MID]:
        HIGH = MID - 1
    ELSE IF search key > array[MID]:
        LOW = MID + 1
    ELSE:
        RETURN MID // search key is equal to array[MID]
    END IF
END WHILE
RETURN -1 // search key is not in array
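A minimal C sketch of binary search over a sorted array of ints follows. The function name binary_search and the choice of int keys are assumptions for illustration. This version tests for equality inside the loop, so it returns as soon as the key is found and cannot loop forever on a match.

```c
/* Binary search of an int array sorted in ascending order.
   Returns the index of key, or -1 if key is not in the array. */
int binary_search(const int a[], int used, int key)
{
    int low = 0;
    int high = used - 1;

    while (low <= high) {                   /* records remain to be searched */
        int mid = low + (high - low) / 2;   /* same as (low+high)/2, avoids overflow */

        if (key < a[mid]) {
            high = mid - 1;                 /* key can only be in the lower half */
        } else if (key > a[mid]) {
            low = mid + 1;                  /* key can only be in the upper half */
        } else {
            return mid;                     /* key found */
        }
    }
    return -1;                              /* range is empty: key not present */
}
```

Computing mid as low + (high - low) / 2 gives the same integer result as (high + low) / 2 but cannot overflow for very large arrays.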
Binary searching of sorted data 4
Search efficiency 1 The time taken for a search routine to find a key (or to determine that the key is not in the array) will depend on the actual data used. Thus all figures will be average figures assuming randomly distributed data. The time will be proportional to the number of comparisons required, which in turn will depend on the number of items to be searched (n). The number of times it is possible to divide n records by 2 using integer division is approximately log₂ n.
Search efficiency 2
Search efficiency 3 Thus the sequential search takes on average n/2 comparisons to find a key, i.e. 500,000 comparisons to search 1,000,000 items. The binary search takes about 20 comparisons to search the same number of items. Frequently we will simply quote the order of the algorithm, i.e. the sequential search is of order n, which can be written O(n), whilst the binary search is O(log₂ n).
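The figure of about 20 can be checked with a short C sketch that counts how many times n can be halved by integer division before nothing is left, which matches the largest number of probes a binary search needs. The function name max_probes is hypothetical.

```c
/* Counts how many times n can be halved with integer division
   before reaching zero. For n >= 1 this is floor(log2(n)) + 1,
   the worst-case number of probes in a binary search of n items. */
int max_probes(long n)
{
    int count = 0;
    while (n > 0) {
        n /= 2;
        count++;
    }
    return count;
}
```

For n = 1,000,000 this gives 20, compared with an average of 500,000 comparisons for the sequential search.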
Library binary split search function 1 For a library function to be able to search for any type of value in any type of sorted array, the prototype will have to be general purpose. So void (i.e. untyped) pointers are used to indicate the search key and array base addresses. A custom comparison function also has to be passed into the library binary search function. We'll be looking in detail at how generic (i.e. general purpose) functions are designed later in this module. The next 2 slides are from the documentation of the stdlib.h library bsearch function. Don't worry if this seems a little bit cryptic!
Library binary split search function 2
BSEARCH(3) Linux Programmer’s Manual
NAME
    bsearch - binary search of a sorted array
SYNOPSIS
    #include <stdlib.h>
    void *bsearch(const void *key, const void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
Library binary split search function 3
DESCRIPTION
    The bsearch() function searches an array of nmemb objects, the initial member of which is pointed to by base, for a member that matches the object pointed to by key. The size of each member of the array is specified by size. The contents of the array should be in ascending sorted order according to the comparison function referenced by compar. The compar routine is expected to have two arguments which point to the key object and to an array member, in that order, and should return an integer less than, equal to, or greater than zero if the key object is found, respectively, to be less than, to match, or be greater than the array member.
RETURN VALUE
    The bsearch() function returns a pointer to a matching member of the array, or NULL if no match is found. If there are multiple elements that match the key, the element returned is unspecified.
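To make the man page less cryptic, here is one way bsearch() might be called on a phone book sorted by name. The record type struct entry and the helper names compare_name and lookup are assumptions for this sketch; only bsearch() and strcmp() are library functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical phone book record; field names and sizes are assumptions. */
struct entry {
    char name[32];
    int extension;
};

/* Comparison function in the form bsearch() expects: the first argument
   points to the key (here a C string), the second to an array member. */
static int compare_name(const void *key, const void *member)
{
    const char *name = key;
    const struct entry *e = member;
    return strcmp(name, e->name);
}

/* Looks up name in pb, which must be sorted by name in ascending order.
   Returns a pointer to the matching record, or NULL if there is none. */
const struct entry *lookup(const struct entry pb[], size_t used, const char *name)
{
    return bsearch(name, pb, used, sizeof pb[0], compare_name);
}
```

The void pointers are converted back to their real types inside compare_name, which is what lets one library function serve arrays of any element type.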