Cyclic string-to-string correction

Presentation transcript:

Cyclic string-to-string correction. Vida Movahedi, Elderlab, October 2009

Contents: Problem definition; Linear string-to-string correction; Dynamic programming; Cyclic strings; A faster approach; Application: curve similarity

Problem Definition: Two strings, A (length n) and B (length m). Edit operations (delete, change, insert) take A to B.

Linear string-to-string correction: Each edit has a cost. Example: edit ‘high’ to ‘low’. Edit sequence: delete ‘h’, change ‘i’ to ‘l’, delete ‘g’, change ‘h’ to ‘o’, insert ‘w’. Goal: find the edit sequence with minimum cost.
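The edit sequence on this slide can be replayed mechanically. The sketch below assumes a unit cost for every operation (the slides do not give the cost values), and the names `UNIT_COST` and `apply_edits` are hypothetical, introduced only for illustration.

```python
# Assumption: every operation costs 1; the slides only say each edit has a cost.
UNIT_COST = {"delete": 1, "change": 1, "insert": 1}


def apply_edits(source, edits, cost=UNIT_COST):
    """Apply (op, old_char, new_char) edits left to right.

    Returns the edited string and the summed cost of the edits.
    """
    out = []    # characters of the result built so far
    pos = 0     # next unread position in `source`
    total = 0
    for op, old, new in edits:
        if op in ("delete", "change"):
            # Both operations consume one character of the source.
            if pos >= len(source) or source[pos] != old:
                raise ValueError(f"cannot {op} {old!r} at position {pos}")
            pos += 1
        if op in ("change", "insert"):
            # Both operations emit one character of the target.
            out.append(new)
        elif op != "delete":
            raise ValueError(f"unknown operation {op!r}")
        total += cost[op]
    out.extend(source[pos:])  # untouched tail is copied unchanged
    return "".join(out), total


EDITS = [
    ("delete", "h", None),
    ("change", "i", "l"),
    ("delete", "g", None),
    ("change", "h", "o"),
    ("insert", None, "w"),
]

print(apply_edits("high", EDITS))  # ('low', 5)
```

Under the unit-cost assumption this particular sequence costs 5, while changing ‘h’, ‘i’, ‘g’ to ‘l’, ‘o’, ‘w’ and deleting the final ‘h’ costs 4, which is why the goal is stated as finding the minimum-cost sequence rather than any valid one.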

Edit Graph, path and trace

Dynamic Programming: Why is dynamic programming an option? Complexity: O(nm)
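Dynamic programming applies because the cheapest way to edit a prefix of A into a prefix of B is built from the cheapest solutions for shorter prefixes. A minimal sketch of the standard table-filling recurrence follows; the unit default costs and the function name `edit_distance` are assumptions, not taken from the slides.

```python
def edit_distance(a, b, ins=1, dele=1, sub=1):
    """Minimum cost of editing string a into string b.

    d[i][j] holds the minimum cost of editing a[:i] into b[:j].
    Filling the (n+1) x (m+1) table takes O(nm) time.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + dele        # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins         # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            change = 0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + dele,         # delete a[i-1]
                d[i][j - 1] + ins,          # insert b[j-1]
                d[i - 1][j - 1] + change,   # keep or change a[i-1]
            )
    return d[n][m]


print(edit_distance("high", "low"))  # 4
```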

Cyclic strings: Cyclic shifts. Edit cost under cyclic shifts. m possible shifts, m runs of dynamic programming: O(nm^2)
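The brute-force approach described here can be sketched as one dynamic-programming run per cyclic shift of B, keeping the cheapest result. Unit edit costs and the function names are assumptions made for the illustration.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between a and b, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    prev = list(range(m + 1))               # row for the empty prefix of a
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + change)
        prev = cur
    return prev[m]


def cyclic_edit_distance_naive(a, b):
    """Minimum edit distance from a to any cyclic shift of b.

    Runs the O(nm) dynamic program once per shift: O(n m^2) overall.
    """
    shifts = range(max(1, len(b)))          # an empty b has one (trivial) shift
    return min(edit_distance(a, b[s:] + b[:s]) for s in shifts)


print(cyclic_edit_distance_naive("abcd", "cdab"))  # 0
```

Here "cdab" is "abcd" shifted by two positions, so the cyclic cost is 0 even though the linear edit distance between the two strings is positive.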

A faster approach: The edit graphs for all cyclic shifts of B are included in the edit graph of A and BB (let’s call it graph H).
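One way to see the inclusion: shift s of B is the length-m window of BB starting at column s, so the edit graph for that shift is the sub-grid of H spanning columns s to s + m. The sketch below, with unit costs and the hypothetical name `shift_cost_in_H`, computes the cost for one shift by indexing directly into BB.

```python
def shift_cost_in_H(a, b, s):
    """Cost of editing a into b shifted by s, read off the graph of a and b+b.

    d[i][j] is the cheapest path in H from node (0, s) to node (i, s + j).
    Unit costs are an assumption made for this sketch.
    """
    n, m = len(a), len(b)
    bb = b + b                               # doubled string: columns of H
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                          # vertical edges: deletions
    for j in range(1, m + 1):
        d[0][j] = j                          # horizontal edges: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Column s + j of H carries character bb[s + j - 1].
            change = 0 if a[i - 1] == bb[s + j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + change)
    return d[n][m]


a, b = "abcd", "cdab"
print([shift_cost_in_H(a, b, s) for s in range(len(b))])
```

Each call still costs O(nm), so this alone does not beat the brute-force bound; it only shows that a single graph H hosts all m sub-problems.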

Non-crossing Paths: Consider shifts j, k, l. Traces corresponding to the optimal edit sequences are non-crossing on graph H: P(j), P(k), P(l). This reduces the necessary calculations.

Non-crossing paths

O(nm log m) algorithm
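A rough cost accounting for this bound, as a sketch under stated assumptions rather than the derivation from the slides: suppose the optimal paths are computed by repeated halving of the shift range, where each new path P(k) is searched only in the part of H lying between two already known paths P(j) and P(l). If the non-crossing property keeps the total work at each level of the halving within the size of H, which is O(nm), and boundary overheads are ignored, then

```latex
\[
T(n, m) \;=\; \sum_{d=0}^{\lceil \log_2 m \rceil} O(nm) \;=\; O(nm \log m),
\qquad \text{compared with} \qquad
T_{\text{naive}}(n, m) \;=\; m \cdot O(nm) \;=\; O(nm^2).
\]
```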

An Application: Curve Similarity: Two curves as two strings A and B. Edit cost: Euclidean distance. Minimum edit cost corresponds to optimal matching. Symmetric cost for each edit operation → symmetric distance. Contour mapping distance = 7.73
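A minimal sketch of this application, assuming each curve is a list of 2D points, a change costs the Euclidean distance between the two points, and insertions and deletions share one fixed `gap` cost. The `gap` parameter and both function names are assumptions; the slides do not state how insertions and deletions are priced.

```python
import math


def curve_edit_cost(a, b, gap=1.0):
    """Minimum edit cost between two point sequences a and b.

    Changing point p into q costs their Euclidean distance; inserting or
    deleting a point costs `gap` (an assumed value, equal for both
    operations so that the resulting distance is symmetric).
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + gap
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            change = math.hypot(ax - bx, ay - by)
            d[i][j] = min(d[i - 1][j] + gap, d[i][j - 1] + gap, d[i - 1][j - 1] + change)
    return d[n][m]


def closed_curve_cost(a, b, gap=1.0):
    """Brute-force cyclic version: try every starting point on contour b."""
    shifts = range(max(1, len(b)))
    return min(curve_edit_cost(a, b[s:] + b[:s], gap) for s in shifts)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rotated = square[2:] + square[:2]            # same contour, different start point
print(closed_curve_cost(square, rotated))    # 0.0
```

Because every operation has the same cost in both directions, `curve_edit_cost(a, b)` equals `curve_edit_cost(b, a)`, which is the symmetry the slide refers to.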

References Maurice Maes (1990), “On a cyclic string-to-string correction problem”, Information Processing Letters, vol. 35, pp. 73-78.