The chromosomes contains the set of instructions for alive beings

Slides:



Advertisements
Similar presentations
1 Average Case Analysis of an Exact String Matching Algorithm Advisor: Professor R. C. T. Lee Speaker: S. C. Chen.
Advertisements

Space-for-Time Tradeoffs
MSc Bioinformatics for H15: Algorithms on strings and sequences
Factor Oracle, Suffix Oracle 1 Factor Oracle Suffix Oracle.
1 String Matching of Bit Parallel Suffix Automata.
1 CSC 421: Algorithm Design & Analysis Spring 2013 Space vs. time  space/time tradeoffs  examples: heap sort, data structure redundancy, hashing  string.
Boyer Moore Algorithm String Matching Problem Algorithm 3 cases Searching Timing.
1 Fastest Approach to Exact Pattern Matching Date:102/3/13 Publisher:Information and Emerging Technologies (ICIET), 2010 Information and Emerging Technologies.
1 A simple fast hybrid pattern- matching algorithm Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.
Recuperació de la informació Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002)
Advisor: Prof. R. C. T. Lee Speaker: Y. L. Chen
1 The Colussi Algorithm Advisor: Prof. R. C. T. Lee Speaker: Y. L. Chen Correctness and Efficiency of Pattern Matching Algorithms Information and Computation,
1 Advisor: Prof. R. C. T. Lee Speaker: G. W. Cheng Two exact string matching algorithms using suffix to prefix rule.
Boyer-Moore string search algorithm Book by Dan Gusfield: Algorithms on Strings, Trees and Sequences (1997) Original: Robert S. Boyer, J Strother Moore.
Knuth-Morris-Pratt Algorithm left to right scan like the naïve algorithm one main improvement –on a mismatch, calculate maximum possible shift to the right.
Master Course MSc Bioinformatics for Health Sciences H15: Algorithms on strings and sequences Xavier Messeguer Peypoch (
Algorismes de cerca Algorismes de cerca: definició del problema (text,patró) depèn de què coneixem al principi: Cerca exacta: Cerca aproximada: 1 patró.
Master Course MSc Bioinformatics for Health Sciences H15: Algorithms on strings and sequences Xavier Messeguer Peypoch (
Smith Algorithm Experiments with a very fast substring search algorithm, SMITH P.D., Software - Practice & Experience 21(10), 1991, pp Adviser:
Recuperació de la informació Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002)
1 KMP algorithm Advisor: Prof. R. C. T. Lee Reporter: C. W. Lu KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R.,, Fast pattern matching in strings, SIAM Journal.
Quick Search Algorithm A very fast substring search algorithm, SUNDAY D.M., Communications of the ACM. 33(8),1990, pp Adviser: R. C. T. Lee Speaker:
Recuperació de la informació Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002)
Exact and Approximate Pattern in the Streaming Model Presented by - Tanushree Mitra Benny Porat and Ely Porat 2009 FOCS.
Recuperació de la informació Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002)
Backward Nondeterministic DAWG Matching Algorithm
String Matching String matching: definition of the problem (text,pattern) depends on what we have: text or patterns Exact matching: Approximate matching:
Raita Algorithm T. RAITA Advisor: Prof. R. C. T. Lee
Indexing and Searching
Aho-Corasick Algorithm Generalizes KMP to handle sets of strings New ideas –keyword trees –failure functions/links –output links.
1 Exact Matching Charles Yan Na ï ve Method Input: P: pattern; T: Text Output: Occurrences of P in T Algorithm Naive Align P with the left end.
String Matching String matching: definition of the problem (text,pattern) depends on what we have: text or patterns Exact matching: Approximate matching:
Master Course MSc Bioinformatics for Health Sciences H15: Algorithms on strings and sequences Xavier Messeguer Peypoch (
A Fast Algorithm for Multi-Pattern Searching Sun Wu, Udi Manber May 1994.
Recuperació de la informació Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002)
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
String Matching Using the Rabin-Karp Algorithm Katey Cruz CSC 252: Algorithms Smith College
String Matching String matching: definition of the problem (text,pattern) depends on what we have: text or patterns Exact matching: Approximate matching:
KMP String Matching Prepared By: Carlens Faustin.
Chapter 7 Space and Time Tradeoffs James Gain & Sonia Berman
1 Speeding up on two string matching algorithms Advisor: Prof. R. C. T. Lee Speaker: Kuei-hao Chen, CROCHEMORE, M., CZUMAJ, A., GASIENIEC, L., JAROMINEK,
MA/CSSE 473 Day 24 Student questions Quadratic probing proof
MCS 101: Algorithms Instructor Neelima Gupta
Exact String Matching Algorithms: A Survey Mehreen Ali, Hina Naz Khan, Shumaila Sayyab, Nadeem Iftikhar Department of Bio-Science Mohammad Ali Jinnah University,
String Matching of Regular Expression
Book: Algorithms on strings, trees and sequences by Dan Gusfield Presented by: Amir Anter and Vladimir Zoubritsky.
Bioinformatics PhD. Course Summary (approximate) 1. Biological introduction 2. Comparison of short sequences (
String Matching String matching: definition of the problem (text,pattern) depends on what we have: text or patterns Exact matching: Approximate matching:
Hannu Peltola Jorma Tarhio Aalto University Finland Variations of Forward-SBNDM.
CS5263 Bioinformatics Lecture 15 & 16 Exact String Matching Algorithms.
Bioinformatics PhD. Course 1. Biological introduction Exact Extended Approximate 6. Projects: PROMO, MREPATT, … 5. Sequence assembly 2. Comparison of short.
Design and Analysis of Algorithms – Chapter 71 Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu RAIK 283: Data Structures.
Recuperació de la informació Modern Information Retrieval (1999) Ricardo-Baeza Yates and Berthier Ribeiro-Neto Flexible Pattern Matching in Strings (2002)
CSG523/ Desain dan Analisis Algoritma
Advanced Data Structure: Bioinformatics
Source : Practical fast searching in strings
Exact string matching: one pattern (text on-line)
Recuperació de la informació
13 Text Processing Hongfei Yan June 1, 2016.
Adviser: R. C. T. Lee Speaker: C. W. Cheng National Chi Nan University
Chapter 7 Space and Time Tradeoffs
Tècniques i Eines Bioinformàtiques
Recuperació de la informació
String Matching 11/04/2019 String matching: definition of the problem (text,pattern) Exact matching: depends on what we have: text or patterns The patterns.
Knuth-Morris-Pratt Algorithm.
Chap 3 String Matching 3 -.
Tècniques i Eines Bioinformàtiques
Space-for-time tradeoffs
Improved Two-Way Bit-parallel Search
MA/CSSE 473 Day 27 Student questions Leftovers from Boyer-Moore
Presentation transcript:

The chromosomes contains the set of instructions for alive beings Genome Cell Nucleus Tissue The chromosomes contains the set of instructions for alive beings The chromosomes are the volumes of an encyclopedia called Genome

Chromosome >human chromosome TACGTATACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGATCGTCGATCGTCAGCTCGATACGTTACGATCTACGATTACGATCATCTATACTATACTATACGATATATCTAGATATCGATCTA.ACTCCATTCTTTAAACCGTACTACACACACTACTGATCGACGATTACGACGACGAAAGGGCCATATCGGCTAACTACATCATAGACAACATCACGGATCGTCTAAGGCCGAGTTAGGTACGATTAACGTACGACTACCTATCGTATATACATCACGGATATAACCTATCTACTACGATTAACACGATCTATCGTACGGCATATGCATCGTATAGCATCGATTAGAATACGTATACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGCGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGATGCATGCTAGCGATGCTACGACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGATGCTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACTGCATCGATGCTATACGACGATCGTAGCTACGTACGATCGTACGACGTACGTTACGTACGATCGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTGTCACGTAGCATGCTGACGTACGATCGATTCGATCGATCGTACGATCGTAGCTAGCTAGTCGTAGCGACGTAGGATTCACGTAGCGATGCGTAGCGTAGCATGCTGACGATGCATCGATCGATGCATCATGCTAGCGTAGCTAGCTAGCATGACTGATCGATTAACGGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGGTACACCGCGCACGATCACACGATGCGACGATGCGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGCTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGATCGTACGACTGCTAGCTACGCATGCCTACGTACGTATCCTACGTACGATCGTGCAGCATCGATGCTACGTACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGACGACGATCGATATTAATGCAATCATGCAGCTGCATGCTAGCGATGCTACGTACGATCGTATGCTAGCTAGCATGCATGCATGCATGCAT ………..

Recuperació de la informació 16/04/2017 Bioinformatics. Sequence and genome analysis David W. Mount Flexible Pattern Matching in Strings (2002) Gonzalo Navarro and Mathieu Raffinot Algorithms on strings (2001) M. Crochemore, C. Hancart and T. Lecroq http://www-igm.univ-mlv.fr/~lecroq/string/index.html

String Matching 16/04/2017 String matching: definition of the problem (text,pattern) Exact matching: depends on what we have: text or patterns The patterns ---> Data structures for the patterns 1 pattern ---> The algorithm depends on |p| and || k patterns ---> The algorithm depends on k, |p| and || Extensions Regular Expressions The text ----> Data structure for the text (suffix tree, ...) Approximate matching: Dynamic programming Sequence alignment (pairwise and multiple) Sequence assembly: hash algorithm Probabilistic search: Hidden Markov Models

Exact string matching: one pattern 16/04/2017 How does the string algorithms made the search? For instance, given the sequence CTACTACTACGTCTATACTGATCGTAGCTACTACATGC search for the pattern ACTGA. and for the pattern TACTACGGTATGACTAA As you have seen this morning ....

Exact string matching: Brute force algorithm 16/04/2017 Example: Given the pattern ATGTA, the search is G T A C T A G A G G A C G T A T G T A C T G ... A T G T A A T G T A A T G T A A T G T A A T G T A A T G T A As you have seen this morning ....

Exact string matching: Brute force algorithm 16/04/2017 Which is the next position of the window? How the comparison is made? Text : Pattern : From left to right: prefix Text : Pattern : As you have seen this morning .... The window is shifted only one cell

Exact string matching: one pattern 16/04/2017 How does the matching algorithms made the search? There is a sliding window along the text against which the pattern is compared: Pattern : Text : At each step the comparison is made and the window is shifted to the right. As you have seen this morning .... Which are the facts that differentiate the algorithms? How the comparison is made. The length of the shift.

Exact string matching: one pattern (text on-line) 16/04/2017 Experimental efficiency (Navarro & Raffinot) BNDM : Backward Nondeterministic Dawg Matching | | BOM : Backward Oracle Matching 64 32 16 Horspool 8 BOM BNDM 4 2 Long. pattern w 2 4 8 16 32 64 128 256

Horspool algorithm How the comparison is made? 16/04/2017 Which is the next position of the window? How the comparison is made? Text : Pattern : Sufix search Pattern : Text : a As you have seen this morning .... Shift until the next ocurrence of “a” in the pattern: a We need a preprocessing phase to construct the shift table.

Horspool algorithm : example 16/04/2017 Given the pattern ATGTA The shift table is: A C G T As you have seen this morning ....

Horspool algorithm : example 16/04/2017 Given the pattern ATGTA The shift table is: A 4 C G T As you have seen this morning ....

Horspool algorithm : example 16/04/2017 Given the pattern ATGTA The shift table is: A 4 C 5 G T As you have seen this morning ....

Horspool algorithm : example 16/04/2017 Given the pattern ATGTA The shift table is: A 4 C 5 G 2 T As you have seen this morning ....

Horspool algorithm : example 16/04/2017 Given the pattern ATGTA The shift table is: A 4 C 5 G 2 T 1 As you have seen this morning ....

Horspool algorithm : example 16/04/2017 Given the pattern ATGTA The shift table is: A 4 C 5 G 2 T 1 The searching phase: G T A C T A G A G G A C G T A T G T A C T G ... A T G T A A T G T A A T G T A A T G T A A T G T A A T G T A As you have seen this morning ....

Exemple algorisme de Horspool 16/04/2017 Given the pattern ATGTA The shift table is: A 4 C 5 G 2 T 1 The searching phase: G T A C T A G A G G A C G T A T G T A C T G ... A T G T A A T G T A A T G T A A T G T A A T G T A A T G T A As you have seen this morning .... A T G T A

Qüestions sobre l’algorisme de Horspool 16/04/2017 Given the pattern ATGTA, the shift table is A 4 C 5 G 2 T 1 Given a random text over an equally likely probability distribution (EPD): 1.- Determine the expected shift of the window. And, if the PD is not equally likely? 2.- Determine the expected number of shifts assuming a text of length n. As you have seen this morning .... 3.- Determine the expected number of comparisons in the suffix search phase

Exact string matching: one pattern (text on-line) 16/04/2017 Experimental efficiency (Navarro & Raffinot) BNDM : Backward Nondeterministic Dawg Matching | | BOM : Backward Oracle Matching 64 32 16 Horspool 8 BOM BNDM 4 2 Long. pattern w 2 4 8 16 32 64 128 256

BNDM algorithm How the comparison is made? 16/04/2017 Which is the next position of the window ? How the comparison is made? Text : Pattern : Search for suffixes of T that are factors of Once the next character x is read D3 = D2<<1 & B(x) B(x): mask of x in the pattern P. For instance, if B(x) = ( 0 0 1 1 0 0 0) D = (0 0 0 1 0 0 0) & (0 0 1 1 0 0 0 ) = (0 0 0 1 0 0 0 ) x That is denoted as D2 = 1 0 0 0 1 0 0 Depends on the value of the leftmost bit of D As you have seen this morning ....

BNDM algorithm: exaple 16/04/2017 Given the pattern ATGTA The mask of characters is: B(A) = ( 1 0 0 0 1 ) B(C) = ( 0 0 0 0 0 ) B(G) = ( 0 0 1 0 0 ) B(T) = ( 0 1 0 1 0 ) The searching phase: G T A C T A G A G G A C G T A T G T A C T G ... A T G T A A T G T A A T G T A A T G T A D1 = ( 0 1 0 1 0 ) D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 ) D1 = ( 0 0 1 0 0 ) D2 = ( 0 1 0 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 0 0 0 ) As you have seen this morning .... D1 = ( 1 0 0 0 1 ) D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 ) D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0) = ( 0 0 1 0 0 ) D4 = ( 0 1 0 0 0 ) & ( 0 0 0 0 0) = ( 0 0 0 0 0 )

Exemple algorisme BNDM 16/04/2017 Given the pattern ATGTA The mask of characters is : The searching phase: G T A C T A G A G G A C G T A T G T A C T G ... A T G T A B(A) = ( 1 0 0 0 1 ) B(C) = ( 0 0 0 0 0 ) B(G) = ( 0 0 1 0 0 ) B(T) = ( 0 1 0 1 0 ) D1 = ( 1 0 0 0 1 ) A T G T A D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 ) D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 ) D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 ) D5 = ( 1 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 ) As you have seen this morning .... D6 = ( 0 0 0 0 0 ) & ( * * * * * ) = ( 0 0 0 0 0 ) Trobat!

Exemple algorisme BNDM 16/04/2017 Given the pattern ATGTA The mask of characters is : B(A) = ( 1 0 0 0 1 ) B(C) = ( 0 0 0 0 0 ) B(G) = ( 0 0 1 0 0 ) B(T) = ( 0 1 0 1 0 ) How the shif is determined? The searching phase: G T A C T A G A A T A C G T A T G T A C T G ... A T G T A A T G T A A T G T A D1 = ( 0 1 0 1 0 ) D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 ) As you have seen this morning .... D1 = ( 0 1 0 1 0 ) D2 = ( 1 0 1 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 ) D3 = ( 0 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )