Dipankar Ranjan Baisya, Mir Md. Faysal & M. Sohel Rahman CSE, BUET Dhaka 1000 Degenerate String Reconstruction from Cover Arrays (Extended Abstract) 1.

Presentation transcript:

Dipankar Ranjan Baisya, Mir Md. Faysal & M. Sohel Rahman CSE, BUET Dhaka 1000 Degenerate String Reconstruction from Cover Arrays (Extended Abstract) 1

Motivation Degenerate strings are used extensively in molecular biology. They are used to model biological sequences (e.g., the FASTA format for representing either nucleotide sequences or peptide sequences). They are very efficient at expressing polymorphism in biological sequences (e.g., Single Nucleotide Polymorphism (SNP)). 2

Goal of the Paper Present efficient algorithms to reconstruct a degenerate string from a valid cover array. We have derived two algorithms for that purpose: one using an unbounded alphabet and the other using a minimum-sized alphabet. 3

Degenerate String Let Σ = {A, C, G, T}. Then we can get 2^4 - 1 = 15 non-empty sets of letters. At each position of a degenerate string we have one of those sets. 6
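As a rough illustration of this definition, the sketch below enumerates the non-empty subsets of the alphabet and represents a degenerate string as a list of such subsets. The names `SIGMA` and `nonempty_subsets` and the sample string are illustrative assumptions, not notation from the slides.

```python
from itertools import combinations

SIGMA = ("A", "C", "G", "T")  # the alphabet used on the slide

def nonempty_subsets(alphabet):
    """Return every non-empty subset of the alphabet as a frozenset."""
    return [frozenset(chosen)
            for size in range(1, len(alphabet) + 1)
            for chosen in combinations(alphabet, size)]

subsets = nonempty_subsets(SIGMA)
print(len(subsets))  # 15, i.e. 2**4 - 1

# A hypothetical degenerate string of length 3: one subset per position.
example = [frozenset("A"), frozenset("CG"), frozenset("ACGT")]
print(all(position in subsets for position in example))  # True
```

A regular string is the special case in which every position holds a singleton set.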

Degenerate String 7

Degenerate String Example 8

Degenerate String: Equality/Match 9
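The definition shown on this slide is not preserved in the transcript. A minimal sketch, assuming the usual convention for degenerate strings: two positions match when their letter sets share at least one symbol, and two strings match when they have equal length and match position by position. The helper names `letters_match` and `strings_match` are hypothetical.

```python
def letters_match(p, q):
    """Two degenerate letters (sets of symbols) match if they intersect."""
    return not p.isdisjoint(q)

def strings_match(x, y):
    """Degenerate strings match if they have equal length and match letterwise."""
    return len(x) == len(y) and all(letters_match(p, q) for p, q in zip(x, y))

x = [{"a"}]
y = [{"a", "b"}]
z = [{"b"}]
print(strings_match(x, y))  # True
print(strings_match(y, z))  # True
print(strings_match(x, z))  # False
```

Under this assumed definition matching is not transitive, as the three one-letter strings show, which is why properties of regular strings do not carry over to degenerate strings automatically.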

10

Cover Array of Regular String We say that a string S covers a string U if every letter of U is contained in some occurrence of S as a substring of U. Then S is called a cover of U. The cover array C of a regular string X[1..n] is a data structure that stores the length of the longest proper cover of every prefix of X. 12

Cover Array of Regular String So, for all 1 ≤ i ≤ n, C[i] stores the length of the longest proper cover of X[1..i], or 0 if there is none. Since every cover of any cover of X is also a cover of X, the cover array C compactly describes all the covers of every prefix of X. 14
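The definitions above can be sketched directly in code. This is a brute-force illustration written from the definition, not the method of the paper, and the function names `covers` and `cover_array` are assumptions. It uses 0-based Python lists, so `c[i - 1]` plays the role of C[i].

```python
def covers(s, u):
    """Return True if every position of u lies inside some occurrence of s."""
    m = len(s)
    if m == 0 or m > len(u):
        return False
    covered = 0  # u[0:covered] is covered by the occurrences seen so far
    for start in range(len(u) - m + 1):
        if start > covered:
            return False  # a position was skipped, so there is a gap
        if u.startswith(s, start):
            covered = start + m
    return covered == len(u)

def cover_array(x):
    """c[i - 1] is the length of the longest proper cover of x[:i], or 0."""
    c = []
    for i in range(1, len(x) + 1):
        prefix = x[:i]
        longest = 0
        for length in range(i - 1, 0, -1):  # try longer candidates first
            if covers(prefix[:length], prefix):
                longest = length
                break
        c.append(longest)
    return c

print(cover_array("abaaba"))  # [0, 0, 0, 0, 0, 3]
```

The sketch favours clarity over speed; it re-checks every candidate length for every prefix.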

Example of Cover Array of Regular String Table 2: Cover array of abaababaabaababaaba 15

Example of Cover Array of Regular String From Table 2 we can see that the cover array of X has entries representing the three covers of X, of lengths 11, 6 and 3 respectively. 16
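The three covers can be read off a cover array by repeatedly following the entry for the current cover length. The sketch below shows this for the example string. The array values were recomputed by hand from the definition, which is an assumption on my part because Table 2 itself is not preserved in this transcript; `all_covers` is a hypothetical helper name.

```python
# Cover array of abaababaabaababaaba, recomputed from the definition
# (assumption: the original Table 2 is not available in this transcript).
C = [0, 0, 0, 0, 0, 3, 0, 3, 0, 5, 6, 0, 5, 6, 0, 8, 9, 10, 11]

def all_covers(c, i):
    """Lengths of all proper covers of the prefix of length i (1-based)."""
    lengths = []
    length = c[i - 1]
    while length > 0:
        lengths.append(length)
        length = c[length - 1]  # the next cover is the longest cover of this one
    return lengths

print(all_covers(C, 19))  # [11, 6, 3]
```

The chain stops at length 3 because the prefix aba has no proper cover.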

Cover Array for Degenerate String For a degenerate string X, C[i] stores the list of the lengths of the covers of X[1..i]. More precisely, each C[i] is a list whose p-th entry is the length of the p-th largest cover of X[1..i]. 18

Example of Cover Array for Degenerate String Table 3. Cover Array of aba[ab][ab]a 19
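Table 3 is not preserved in this transcript, so the sketch below recomputes a cover array for aba[ab][ab]a by brute force. It rests on two assumptions that the slides may state differently: a prefix occurs at a position when each of its letters shares a symbol with the corresponding letter of the text, and a prefix with no proper cover gets an empty list. All function names are hypothetical.

```python
def parse(text):
    """Parse a string such as 'aba[ab][ab]a' into a list of symbol sets."""
    letters, i = [], 0
    while i < len(text):
        if text[i] == "[":
            j = text.index("]", i)
            letters.append(set(text[i + 1:j]))
            i = j + 1
        else:
            letters.append({text[i]})
            i += 1
    return letters

def occurs_at(x, length, start):
    """Does the prefix x[:length] match x[start:start + length] letterwise?"""
    return all(not x[k].isdisjoint(x[start + k]) for k in range(length))

def prefix_covers(x, length, i):
    """Does the prefix x[:length] cover the prefix x[:i]?"""
    covered = 0
    for start in range(i - length + 1):
        if start > covered:
            return False  # gap between consecutive occurrences
        if occurs_at(x, length, start):
            covered = start + length
    return covered == i

def degenerate_cover_array(x):
    """For each prefix, list the lengths of its proper covers, largest first."""
    return [[length for length in range(i - 1, 0, -1)
             if prefix_covers(x, length, i)]
            for i in range(1, len(x) + 1)]

print(degenerate_cover_array(parse("aba[ab][ab]a")))
# [[], [], [], [2], [3, 2], [4, 3]]
```

Each entry lists every cover length explicitly, which is what the list-valued definition above calls for.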

Basic Validation of a Cover Array for we define as of a Cover of. candidate-lengths of covers of is 21

Basic Validation of a Cover Array 22

Basic Validation of a Cover Array Further we assume that a symbol is required by. Notably. k 24

Basic Validation of a Cover Array Lemma 1. Suppose is the cover array of a string. If, then Proof. proof will be provided in the journal version. 26

Basic Validation of a Cover Array Theorem 2. Proof. proof will be provided in the journal version. 27

Our Algorithms 28

Our Algorithms 29

CrAyDSRUn we denote by the new set of characters introduced in, i.e.,. we denote by the set of symbols that are not allowed only at Position, i.e., denotes the set of symbols that can be assigned to. 32

CrAyDSRUn Lemma 3. Proof. proof will be provided in the journal version. 33

CrAyDSRUn: Basic Steps Suppose, we have a cover array where is the product of maximum string length and maximum list length of cover array. so, 35

CrAyDSRUn: Basic Steps Step 1: validity checking. Step 2: find. Step 3: find. Step 4: place new characters in their proper positions. 39

CrAyDSRUn: Step 1 Scan one position at a time from left to right, i.e., in an online fashion. Check validity at each position of the input array according to the “Basic Validation of a Cover Array” section. 40

CrAyDSRUn: Step 1 As soon as the algorithm finds a violation of validity, it returns the input array as invalid. For further validation see Lemma 4 and Lemma 5 (slides 44 and 45). 41
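The exact validation conditions are not preserved in this transcript, so the following is only a sketch of the online scan itself. It checks two conditions that follow directly from the definition of a cover array (every listed length is a proper length for that prefix, and each list is strictly decreasing) and stops at the first violation. These conditions are necessary but not sufficient, and `first_invalid_position` is a hypothetical name.

```python
def first_invalid_position(c):
    """Scan a candidate cover array left to right.

    Returns the 1-based index of the first position that fails a
    necessary condition, or None if no violation is found.
    """
    for i, lengths in enumerate(c, start=1):
        # A proper cover of the prefix of length i has length in 1..i-1.
        in_range = all(1 <= length < i for length in lengths)
        # The list is ordered from the largest cover to the smallest.
        decreasing = all(a > b for a, b in zip(lengths, lengths[1:]))
        if not (in_range and decreasing):
            return i  # reject as soon as a violation appears
    return None

print(first_invalid_position([[], [], [], [2], [3, 2], [4, 3]]))  # None
print(first_invalid_position([[], [2]]))                          # 2
```

An array that passes this scan may still be invalid; the further checks the slide refers to would have to be added at the marked position in the loop.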

CrAyDSRUn: Step 2 for each, our algorithm finds for each. Our algorithm finds with the help of Lemma 4 and Lemma 5 (see the next slides). 43

CrAyDSRUn: Step 2 Lemma 4. Proof. proof will be provided in the journal version. 44

CrAyDSRUn: Step 2 Lemma 5. Proof. proof will be provided in the journal version. 45

CrAyDSRUn: Step 2 We can see from Lemma 4 and Lemma 5 that we need to perform an intersection operation. Finding the intersection will take. So for each position step 2 takes. 46

CrAyDSRUn: Step 3 Similar to step 2. Find using Lemma 3 (see slide 33). For that, our algorithm finds a set of in a similar way to finding in the previous step, except that instead of there will be iterations for each. then. 48

CrAyDSRUn: Step 3 So for each position step 3 takes. 49

CrAyDSRUn: Step 4 if where then if then new characters will be placed in proper positions. 51

CrAyDSRUn: Running Time for each position i, Construction of so for positions, we need 53

CrAyDSRin Lemma 6. for every degenerate string if is the only entry in, then and Proof. proof will be provided in the journal version. 54

CrAyDSRin works exactly like CrAyDSRUn except it computes slightly differently. It computes following Lemmas 3.a and 6 (instead of Lemma 3.b). 56
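To make the minimum-alphabet goal concrete, here is an exhaustive baseline that tries alphabets of growing size and returns the first degenerate string whose cover array equals the input. It is explicitly not CrAyDSRin: it does not use the lemmas, it is exponential in the string length, and it is only usable on tiny inputs. The occurrence definition (letters match when their sets intersect), the empty-list convention and all names are assumptions.

```python
from itertools import combinations, product

def cover_lengths(x, i):
    """Lengths (largest first) of the proper prefixes of x[:i] covering x[:i]."""
    result = []
    for length in range(i - 1, 0, -1):
        covered = 0
        for start in range(i - length + 1):
            if start > covered:
                break  # gap, so this length cannot be a cover
            if all(x[k] & x[start + k] for k in range(length)):
                covered = start + length
        if covered == i:
            result.append(length)
    return result

def brute_force_reconstruct(c, max_alphabet=3):
    """Return (alphabet size, string) for the smallest alphabet that works,
    or None if no string over at most max_alphabet symbols realises c."""
    n = len(c)
    for size in range(1, max_alphabet + 1):
        symbols = "abcdefghijklmnopqrstuvwxyz"[:size]
        letters = [frozenset(chosen)
                   for r in range(1, size + 1)
                   for chosen in combinations(symbols, r)]
        for x in product(letters, repeat=n):
            if all(cover_lengths(x, i) == c[i - 1] for i in range(1, n + 1)):
                return size, x
    return None

target = [[], [], [], [2], [3, 2], [4, 3]]
size, x = brute_force_reconstruct(target)
print(size)  # 2
print("".join("[" + "".join(sorted(s)) + "]" if len(s) > 1 else min(s) for s in x))
```

The string it prints need not be aba[ab][ab]a, since several strings can share one cover array; only the alphabet size is determined.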

CrAyDSRin: Running Time Runtime analysis follows readily from the analysis of CrAyDSRUn. Only extra calculation which can be done in. Therefore overall runtime complexity. 58

Thank you 59