Presentation is loading. Please wait.

Presentation is loading. Please wait.

Search Algorithm Lecture-14 1. Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection:

Similar presentations


Presentation on theme: "Search Algorithm Lecture-14 1. Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection:"— Presentation transcript:

1 Search Algorithm Lecture-14 1

2 Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection: S1. Linear search (brute force): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition. S2. Binary search : If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used. S3. Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index (or a hash key), use the primary index (or the hash key) to retrieve the record.

3 Chapter 15-3 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation (cont.): Search Methods for Simple Selection: S4. Using a primary index to retrieve multiple records: If the comparison condition is >, ≥, <, or ≤ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, then retrieve all subsequent records in the (ordered) file. S5. Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition. S6. Using a secondary (B+-tree) index : On an equality comparison, this search method can be used to retrieve a single record if the indexing field has unique values (is a key) or to retrieve multiple records if the indexing field is not a key. In addition, it can be used to retrieve records on conditions involving >,>=, <, or <=. (FOR RANGE QUERIES)

4 Chapter 15-4 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation (cont.): Search Methods for Complex Selection: S7. Conjunctive selection : If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S6, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition. S8. Conjunctive selection using a composite index : If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index (or hash structure) exists on the combined field, we can use the index directly.

5 Chapter 15-5 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation (cont.): Search Methods for Complex Selection: S9. Conjunctive selection by intersection of record pointers : This method is possible if secondary indexes are available on all (or some of) the fields involved in equality comparison conditions in the conjunctive condition and if the indexes include record pointers (rather than block pointers). Each index can be used to retrieve the record pointers that satisfy the individual condition. The intersection of these sets of record pointers gives the record pointers that satisfy the conjunctive condition, which are then used to retrieve those records directly. If only some of the conditions have secondary indexes, each retrieved record is further tested to determine whether it satisfies the remaining conditions.

6 Chapter 15-6 Algorithms for SELECT and JOIN Operations (7) Implementing the SELECT Operation (cont.): Whenever a single condition specifies the selection, we can only check whether an access path exists on the attribute involved in that condition. If an access path exists, the method corresponding to that access path is used; otherwise, the “brute force” linear search approach of method S1 is used. For conjunctive selection conditions, whenever more than one of the attributes involved in the conditions have an access path, query optimization should be done to choose the access path that retrieves the fewest records in the most efficient way.

7 Chapter 15-7 Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL JOIN) – two–way join: a join on two files e.g. R A=B S – multi-way joins: joins involving more than two files. e.g. R A=B S C=D T Examples (OP6): EMPLOYEE DNO=DNUMBER DEPARTMENT (OP7): DEPARTMENT MGRSSN=SSN EMPLOYEE

8 Chapter 15-8 Algorithms for SELECT and JOIN Operations (9) Implementing the JOIN Operation (cont.): Methods for implementing joins: J1.Nested-loop join (brute force): For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B]. J2.Single-loop join (Using an access structure to retrieve the matching records): If an index (or hash key) exists for one of the two join attributes — say, B of S — retrieve each record t in R, one at a time, and then use the access structure to retrieve directly all matching records s from S that satisfy s[B] = t[A].

9 Chapter 15-9 Algorithms for SELECT and JOIN Operations (10) Implementing the JOIN Operation (cont.): Methods for implementing joins: J3.Sort-merge join: If the records of R and S are physically sorted (ordered) by value of the join attributes A and B, respectively, we can implement the join in the most efficient way possible. Both files are scanned in order of the join attributes, matching the records that have the same values for A and B. In this method, the records of each file are scanned only once each for matching with the other file— unless both A and B are non-key attributes, in which case the method needs to be modified slightly.

10 Chapter 15-10 Algorithms for SELECT and JOIN Operations (11) Implementing the JOIN Operation (cont.): Methods for implementing joins: J4.Hash-join: The records of files R and S are both hashed to the same hash file, using the same hashing function on the join attributes A of R and B of S as hash keys. A single pass through the file with fewer records (say, R) hashes its records to the hash file buckets. A single pass through the other file (S) then hashes each of its records to the appropriate bucket, where the record is combined with all matching records from R.

11 BTREE fundamentals Lecture -15 11

12 B-TREE A B-tree is a specialized multiway tree designed especially for use on disk In a B-tree each node may contain a large number of keys Only a small number of nodes must be read from disk to retrieve an item The goal is to get fast access to the data 12

13 A multiway tree of order m is an ordered tree where each node has at most m children. For each node, if k is the actual number of children in the node, then k - 1 is the number of keys in the node. If the keys and subtrees are arranged in the fashion of a search tree, then this is called a multiway search tree of order m. For example, the following is a multiway search tree of order 4. Note that the first row in each node shows the keys, while the second row shows the pointers to the child nodes. Of course, in any useful application there would be a record of data associated with each key, so that the first row in each node might be an array of records where each record contains a key and its associated data. Another approach would be to have the first row of each node contain an array of records where each record contains a key and a record number for the associated data record, which is found in another file. This last method is often used when the data records are large. 13 B-TREE

14 Hashing Functions Lecture -16 14

15 Hash tables hash table: an array of some fixed size, that positions elements according to an algorithm called a hash function 15 elements (e.g., strings) 0 … length –1 hash func. h(element) hash table

16 Hashing, hash functions The idea: somehow we map every element into some index in the array ("hash" it); this is its one and only place that it should go – Lookup becomes constant-time: simply look at that one slot again later to see if the element is there – add, remove, contains all become O(1) ! For now, let's look at integers ( int ) – a "hash function" h for int is trivial: store int i at index i (a direct mapping) if i >= array.length, store i at index (i % array.length) – h(i) = i % array.length 16

17 Hash function in action Add these elements to the hash table: – 89 – 18 – 49 – 58 – 9 17

18 Hash function example elements = Integers h(i) = i % 10 add 41, 34, 7, and 18 constant-time lookup: – just look at i % 10 again later We lose all ordering information: – getMin, getMax, removeMin, removeMax – the various ordered traversals – printing items in sorted order 18 0 141 2 3 434 5 6 77 818 9

19 Hash collisions collision: the event that two hash table elements map into the same slot in the array example: add 41, 34, 7, 18, then 21 – 21 hashes into the same slot as 41! – 21 should not replace 41 in the hash table; they should both be there collision resolution: means for fixing collisions in a hash table 19 0 121 2 3 434 5 6 77 818 9

20 Linear probing linear probing: resolving collisions in slot i by putting the colliding element into the next available slot (i+1, i+2,...) – add 41, 34, 7, 18, then 21, then 57 21 collides (41 is already there), so we search ahead until we find empty slot 2 57 collides (7 is already there), so we search ahead twice until we find empty slot 9 – lookup algorithm becomes slightly modified; we have to loop now until we find the element or an empty slot what happens when the table gets mostly full? 20 0 141 221 3 434 5 6 77 818 957

21 Clustering problem clustering: nodes being placed close together by probing, which degrades hash table's performance – add 89, 18, 49, 58, 9 – now searching for the value 28 will have to check half the hash table! no longer constant time... 21 049 158 29 3 4 5 6 7 818 989

22 Quadratic probing quadratic probing: resolving collisions on slot i by putting the colliding element into slot i+1, i+4, i+9, i+16,... – add 89, 18, 49, 58, 9 49 collides (89 is already there), so we search ahead by +1 to empty slot 0 58 collides (18 is already there), so we search ahead by +1 to occupied slot 9, then +4 to empty slot 2 9 collides (89 is already there), so we search ahead by +1 to occupied slot 0, then +4 to empty slot 3 – clustering is reduced – what is the lookup algorithm? 22 049 1 258 39 4 5 6 7 818 989

23 Quadratic probing in action 23

24 Chaining chaining: All keys that map to the same hash value are kept in a linked list 24 0 1 2 3 4 5 6 7 8 9 10 107 221242

25 Load factor load factor: ratio of elements to capacity – load factor = size / capacity = 6 / 10 = 0.6 25 0 141 221 3 434 5 6 77 818 957

26 Rehashing, hash table size rehash: increasing the size of a hash table's array, and re-storing all of the items into the array using the hash function – can we just copy the old contents to the larger array? – When should we rehash? Some options: when load reaches a certain level (e.g., = 0.5) when an insertion fails What is the cost (Big-Oh) of rehashing? what is a good hash table array size? – how much bigger should a hash table get when it grows? 26

27 Hash table removal lazy removal: instead of actually removing elements, replace them with a special REMOVED value – avoids expensive re-shuffling of elements on remove – example: remove 18 --> – lookup algorithm becomes slightly modified what should we do when we hit a slot containing the REMOVED value? – keep going – add algorithm becomes slightly modified what should we do when we hit a slot containing the REMOVED value? – use that slot, replace REMOVED with the new value – add(17) --> slot 8 27 0 141 221 3 434 5 6 77 8 REMOVED 957

28 Hashing practice problem Draw a diagram of the state of a hash table of size 10, initially empty, after adding the following elements: 7, 84, 31, 57, 44, 19, 27, 14, and 64 Assume that the hash table uses linear probing. Assume that rehashing occurs at the start of an add where the load factor is 0.75. Repeat the above problem using quadratic probing. 28

29 Writing a hash function If we write a hash table that can store objects, we need a hash function for the objects, so that we know what index to store them We want a hash function to: 1.be simple/fast to compute 2.map equal elements to the same index 3.map different elements to different indexes 4.have keys distributed evenly among indexes 29

30 Hash function for strings elements = Strings let's view a string by its letters: – String s : s 0, s 1, s 2, …, s n-1 how do we map a string into an integer index? (how do we "hash" it?) one possible hash function: – treat first character as an int, and hash on that h(s) = s 0 % array.length is this a good hash function? When will strings collide? 30

31 Better string hash functions view a string by its letters: – String s : s 0, s 1, s 2, …, s n-1 another possible hash function: – treat each character as an int, sum them, and hash on that h(s) = % array.length what's wrong with this hash function? When will strings collide? a third option: – perform a weighted sum of the letters, and hash on that – h(s) = % array.length 31

32 Analysis of hash table search load: the load of a hash table is the ratio:  no. of elements  array size analysis of search, with linear probing: – unsuccessful:  – successful:  32

33 Analysis of hash table search analysis of search, with chaining: – unsuccessful: (the average length of a list at hash(i)) – successful: 1 + ( /2) (one node, plus half the avg. length of a list (not including the item)) 33

34 Compound collections Collections can be nested to represent more complex data example: A person can have one or many phone numbers – want to be able to quickly find all of a person's phone numbers, given their name implement this example as a HashMap of Lists – keys are Strings (names) – values are Lists (e.g ArrayList) of Strings, where each String is one phone number StringList name -->list of phone numbers "Phil" -->["234-8793", "439-8575",...] 34

35 35 The Tuning Cycle Lecture -17

36 Performance tuning Performance tuning is the main activity associated with performance management. Reduced to its most basic level, tuning consists of finding and eliminating bottlenecks — a condition that occurs, and is revealed, when a piece of hardware or software in a server approaches the limits of its capacity. 36

37 The Tuning Cycle The tuning cycle, which is an iterative series of controlled performance experiments.. Repeat the four phases of the tuning cycle shown below until you achieve the performance goals that you established prior to starting the tuning process. Tuning Cycle: 37

38 Collecting The Collecting phase is the starting point of any tuning exercise. Gathering data with the collection of performance counters that are chosen for a specific part of the system. Analyzing After you have collected the performance data that you require for tuning the selected part of the system, you need to analyze the data to determine bottlenecks. 38 Tuning Cycle

39 Configuring After you have collected your data and completed the analysis of the results, you can determine which part of the system is the best candidate for a configuration change and implement this change. Testing After implementing a configuration change, you must complete the appropriate level of testing to determine the impact of the change on the system that you are tuning. 39

40 When testing, be sure to: Check the correctness and performance of the application that you are using for testing by looking for memory leaks and inordinate delays in response to client requests. Ensure that all tests are working correctly. Make sure you can repeat all tests by using the same transaction mix and the same clients generating the same load. Document changes and results. 40 Tuning Cycle

41 Database Performance Tuning DBMS Installation – Setting installation parameters Memory Usage – Set cache levels – Choose background processes Input / Output (I/O) Contention – Distribution of heavily accessed files CPU Usage – Monitor CPU load Application tuning – Modification of SQL code in applications 41


Download ppt "Search Algorithm Lecture-14 1. Chapter 15-2 Algorithms for SELECT and JOIN Operations Implementing the SELECT Operation : Search Methods for Simple Selection:"

Similar presentations


Ads by Google