1 HEINZ NIXDORF INSTITUTE University of Paderborn Algorithms and Complexity Christian Schindelhauer Search Algorithms Winter Semester 2004/2005 11 Oct.

1 HEINZ NIXDORF INSTITUTE, University of Paderborn, Algorithms and Complexity
Search Algorithms, Winter Semester 2004/2005
1st Lecture, 11 Oct 2004
Christian Schindelhauer, schindel@upb.de

2 Contents
• The many different aspects of search in computer science
• For example:
  - Searching text
  - Searching the Web
  - Searching the DNS
  - Searching for the exit of a maze (labyrinth)
  - Searching for a man overboard
  - Trade-offs between time and space in search
  - Searching or deciding: which one is harder?
• Language: English
  - Examinations can also be taken in German on request

3 Organisation (I)
• Lecture:
  - Monday, 11 am - 1 pm, FU 116 (Beethoven)
• Exercise classes (Übungen)
  - Start next week
  - Participation in the exercise classes is mandatory
  - Monday, 1 pm - 2 pm, Stefan Rührup
  - Wednesday, 1 pm - 2 pm, Christian Schindelhauer
• Registration for exercise classes
  - Via the StudInfo system
  - See the web page: http://wwwcs.upb.de/cs/ag-madh/WWW/Teaching/2004WS/SearchAlg/
  - The web page is linked from my home page: http://www.upb.de/cs/schindel.html
• Register for the exercise classes as soon as possible!

4 Organisation (II)
• Material available on the web site
  - Slides of the lectures in MS PowerPoint and PDF format
  - Lecture notes (with possible exam questions)
  - Exercises
  - Schedule of the lecture (with upcoming topics and examination dates)
  - Literature links
• Material not available on the web site
  - Solutions to the exercises
  - Solutions to the exam questions
  - Names of students registered for exercise classes or examinations

5 Examinations
• Two exams:
  - 1st exam, written (45 minutes): Wednesday, 8 Dec. 2004, 12 am, in F0.530. Contents: lectures and exercises of October and November 2004
  - 2nd exam, oral (25 minutes): in the week from 7 Feb to 11 Feb 2005, in F2.315
  - Each exam covers one half of the lecture
  - The overall grade is the mean of both examination grades
• Exercise rebate
  - If a student does not participate in the exercise class:
    1 extra examination question in the first exam
    1 extra hour for solving an exercise prior to the 2nd (oral) exam

6 Exercises
• Successful participation includes:
  - Registering for one of the exercise classes
  - Attending the exercise classes regularly
  - Solving at least two exercises (one in the first half and one in the second half)
  - Presenting these solutions in the exercise class
  - Written write-ups of these solutions (submitted before the exams)
• Reserving exercises for presentation
  - Can be done via the StudInfo system

7 Chapter I: Searching Text (10 Oct 2004)

8 Search Text (Overview)
• The task of string matching
  - Easy as pie
• The naive algorithm
  - How would you do it?
• The Rabin-Karp algorithm
  - Ingenious use of primes and number theory
• The Knuth-Morris-Pratt algorithm
  - Let a (finite) automaton do the job
  - This is optimal
• The Boyer-Moore algorithm
  - Bad letters allow us to jump through the text
  - This is even better than optimal (in practice)
• Literature
  - Cormen, Leiserson, Rivest, “Introduction to Algorithms”, chapter 36, string matching, The MIT Press, 1989, 853-885.

9 The task of string matching
• Given
  - A text T of length n over a finite alphabet Σ
  - A pattern P of length m over the same finite alphabet Σ
• Output
  - All occurrences of P in T, i.e. all shifts s with T[s+1..s+m] = P[1..m]
[Figure: the text T[1..n] = amnmaaanptaiiptpii and the pattern P[1..m] = ptai; P is aligned under T at shift s, where T[s+1..s+m] = P[1..m].]

10 The Naive Algorithm
Naive-String-Matcher(T,P)
  n ← length(T)
  m ← length(P)
  for s ← 0 to n-m do
    if P[1..m] = T[s+1..s+m] then
      output “Pattern occurs with shift s”
    fi
  od
Fact:
• The naive string matcher has worst-case running time O((n-m+1) m)
• For n = 2m this is O(n^2)
• The naive string matcher is not optimal, since string matching can be done in time O(m + n)
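The naive matcher can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code. The function name `naive_string_matcher` is hypothetical, and shifts are 0-based, so the slide's T[s+1..s+m] corresponds to `text[s:s+m]`. The sketch collects every valid shift instead of stopping at the first one.

```python
def naive_string_matcher(text: str, pattern: str) -> list:
    """Return all shifts s (0-based) with text[s:s+m] == pattern.

    Hypothetical helper for illustration; it tries every shift
    and compares up to m characters each time, hence O((n-m+1) m).
    """
    n, m = len(text), len(pattern)
    shifts = []
    # range is empty when the pattern is longer than the text
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    # Text and pattern taken from the slide's figure.
    print(naive_string_matcher("amnmaaanptaiiptpii", "ptai"))
```

On the text and pattern from the figure, the only valid shift is 8.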

11 The Rabin-Karp Algorithm
• Idea: Compute
  - a checksum for the pattern P and
  - a checksum for each substring of T of length m
[Figure: the text amnmaaanptaiiptpii with a row of checksums, one per substring of length m (extracted as 423142311323110), and the pattern ptai with checksum 3. A position whose checksum equals the pattern's is either a valid hit (the substring equals P) or a spurious hit (it does not).]

12 The Rabin-Karp Algorithm
• Computing the checksum:
  - Choose a prime number q
  - Let d = |Σ|
• Example:
  - Σ = {0, 1, 2, ..., 9}
  - Then d = 10, q = 13
  - Let P = 0815
    S_4(0815) = (0·1000 + 8·100 + 1·10 + 5·1) mod 13 = 815 mod 13 = 9
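The checksum in the example can be checked with a few lines of Python. This is a sketch under the assumption that the pattern is given as a list of digit values, with the first symbol weighted by d^(m-1) as in the worked example. The name `checksum_direct` is hypothetical.

```python
def checksum_direct(digits: list, d: int, q: int) -> int:
    """Read `digits` as a base-d number and reduce it modulo q.

    Hypothetical helper: the first symbol gets weight d**(m-1),
    the last symbol gets weight d**0, as in the 0815 example.
    """
    m = len(digits)
    total = sum(x * d ** (m - 1 - i) for i, x in enumerate(digits))
    return total % q


if __name__ == "__main__":
    # P = 0815 with d = 10 and q = 13, as on the slide.
    print(checksum_direct([0, 8, 1, 5], d=10, q=13))
```

This version builds the full number before reducing it, so the intermediate value grows with m.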

13 How to Compute the Checksum: Horner’s rule
• Compute S_m(P) = (P[1]·d^(m-1) + P[2]·d^(m-2) + ... + P[m]) mod q by using S_i(P[1..i]) = (d·S_{i-1}(P[1..i-1]) + P[i]) mod q, starting from the checksum 0 of the empty string
• Example:
  - Σ = {0, 1, 2, ..., 9}
  - Then d = 10, q = 13
  - Let P = 0815
    S_4(0815) = ((((0·10+8)·10)+1)·10+5) mod 13
              = (((8·10)+1)·10+5) mod 13
              = ((3·10)+5) mod 13
              = 9
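Horner's rule reduces modulo q after every step, so no intermediate value exceeds d·q. The sketch below assumes digit values as input. The name `horner_trace` is hypothetical. It returns all intermediate checksums so they can be compared with the worked example.

```python
def horner_trace(digits: list, d: int, q: int) -> list:
    """Return the checksums of all prefixes, computed by Horner's rule.

    Hypothetical helper: each step applies s = (d * s + x) mod q,
    so the last entry is the checksum of the whole sequence.
    """
    s = 0
    trace = []
    for x in digits:
        s = (d * s + x) % q
        trace.append(s)
    return trace


if __name__ == "__main__":
    # P = 0815, d = 10, q = 13: prefixes 0, 08, 081, 0815.
    print(horner_trace([0, 8, 1, 5], d=10, q=13))
```

For P = 0815 the prefix checksums are 0, 8, 3 and 9, which matches the intermediate values 8, 3 and 9 in the example.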

14 How to Compute the Checksums of the Text
• Start with S_m(T[1..m])
[Figure: the text amnmaaanptaiiptpii with a window of length m sliding one position to the right, from checksum S_m(T[1..m]) to checksum S_m(T[2..m+1]).]
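Each further window checksum can be derived from the previous one in constant time. The sketch below uses the update rule from the Rabin-Karp-Matcher pseudocode of this lecture, with h = d^(m-1) mod q. The function name `window_checksums` and the digit text 3108157 are made up for illustration.

```python
def window_checksums(digits: list, m: int, d: int, q: int) -> list:
    """Return the checksum of every length-m window of `digits`.

    Hypothetical helper. The first checksum uses Horner's rule;
    each later one drops the leading symbol (weight h) and appends
    the next symbol, all modulo q.
    """
    n = len(digits)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # weight of the leading symbol, reduced mod q
    t = 0
    for x in digits[:m]:
        t = (d * t + x) % q
    sums = [t]
    for s in range(n - m):
        # Python's % yields a non-negative result for positive q,
        # even when the bracket is negative.
        t = (d * (t - digits[s] * h) + digits[s + m]) % q
        sums.append(t)
    return sums


if __name__ == "__main__":
    # Windows 3108, 1081, 0815, 8157 with d = 10, q = 13.
    print(window_checksums([3, 1, 0, 8, 1, 5, 7], m=4, d=10, q=13))
```

The third window is 0815, so its checksum is 9, as in the earlier example.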

15 The Rabin-Karp Algorithm
Rabin-Karp-Matcher(T,P,d,q)
  n ← length(T)
  m ← length(P)
  h ← d^(m-1) mod q
  p ← 0
  t_0 ← 0
  for i ← 1 to m do
    p ← (d p + P[i]) mod q              (checksum of the pattern P)
    t_0 ← (d t_0 + T[i]) mod q          (checksum of T[1..m])
  od
  for s ← 0 to n-m do
    if p = t_s then                     (checksums match)
      if P[1..m] = T[s+1..s+m] then     (now test for false positive)
        output “Pattern occurs with shift” s
      fi
    fi
    if s < n-m then
      t_{s+1} ← (d(t_s - T[s+1] h) + T[s+m+1]) mod q
                                        (update: checksum of T[s+2..s+m+1] from the checksum of T[s+1..s+m])
    fi
  od
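A runnable Python sketch of the matcher follows. It makes several assumptions that are not in the lecture: characters are mapped to numbers with `ord`, the defaults d = 256 and q = 101 are arbitrary illustrative choices, shifts are 0-based, and all valid shifts are collected in a list. The function name `rabin_karp_matcher` is hypothetical.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 256, q: int = 101) -> list:
    """Return all 0-based shifts at which `pattern` occurs in `text`.

    Sketch only: d and q are illustrative defaults. Every checksum
    match is verified by a direct comparison, so the result is
    correct for any positive q; q only affects the running time.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # weight of the leading character
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q  # checksum of the pattern
        t = (d * t + ord(text[i])) % q     # checksum of the first window
    shifts = []
    for s in range(n - m + 1):
        # Checksums match: rule out a spurious hit by comparing.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Slide the window one position to the right.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


if __name__ == "__main__":
    print(rabin_karp_matcher("amnmaaanptaiiptpii", "ptai"))
```

In this sketch an empty pattern yields an empty result, which is a design choice of the example rather than part of the pseudocode.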

16 Performance of Rabin-Karp
• The worst-case running time of the Rabin-Karp algorithm is O(m (n-m+1))
• Probabilistic analysis
  - The probability of a false positive hit for a random input is 1/q
  - The expected number of false positive hits is O(n/q)
  - The expected running time of Rabin-Karp is O(n + m (v + n/q)), where v is the number of valid shifts (hits)
• If we choose q ≥ m and have only a constant number of hits, then the expected running time of Rabin-Karp is O(n + m).
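The effect of q on false positives can be seen by counting valid and spurious hits separately. The sketch below assumes decimal digit strings. The name `count_hits`, the text 08280815 and the larger prime 10007 are made up for illustration. The window 0828 is a spurious hit for q = 13 because 828 = 815 + 13.

```python
def count_hits(text: str, pattern: str, d: int, q: int) -> tuple:
    """Return (valid_hits, spurious_hits) for digit strings.

    Hypothetical helper: a hit is any window whose checksum equals
    the pattern's; it is valid if the window equals the pattern.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return (0, 0)
    t_digits = [int(c) for c in text]
    p_digits = [int(c) for c in pattern]
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + p_digits[i]) % q
        t = (d * t + t_digits[i]) % q
    valid = spurious = 0
    for s in range(n - m + 1):
        if p == t:
            if text[s:s + m] == pattern:
                valid += 1
            else:
                spurious += 1  # same checksum, different window
        if s < n - m:
            t = (d * (t - t_digits[s] * h) + t_digits[s + m]) % q
    return (valid, spurious)


if __name__ == "__main__":
    print(count_hits("08280815", "0815", d=10, q=13))
    print(count_hits("08280815", "0815", d=10, q=10007))
```

With q = 13 the sketch finds one valid and one spurious hit. With q = 10007, which exceeds every four-digit value, no spurious hit is possible.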

17 Knuth-Morris-Pratt: The Principle
[Figure: the pattern mmaa is shown at seven successive positions under the text amnmaaamptaiipt.]

18 Thanks for your attention
End of 1st lecture
Next lecture: Mon 18 Oct 2004, 11 am, FU 116
Next exercise class: Mon 18 Oct 2004, 1 pm, F0.530 or Wed 20 Oct 2004, 1 pm, F1.110

