# Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam.


Outline
- Greedy approach to motif searching
- Genome rearrangements
- Sorting by reversals
- Greedy algorithms for sorting by reversals
- Approximation algorithms
- Breakpoint reversal sort

Greedy motif searching
- Developed by Gerald Hertz and Gary Stormo in 1989
- CONSENSUS is a tool based on this greedy algorithm
- Faster than the brute-force and simple motif search algorithms
- An approximation algorithm with an unknown approximation ratio

Greedy motif search – Pseudocode

Greedy motif search – Steps
- Input: DNA (a set of t sequences, each of length n) and l (the length of the motif to search for)
- Output: a set of starting positions of l-mers, one per sequence
- Performs an exhaustive search, using Hamming distance, on the first two sequences of the DNA
- Forms a 2 × l seed matrix from the two closest l-mers
- Scans each of the remaining t − 2 sequences to find the l-mer that best matches the seed, and adds it as the next row of the seed matrix
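The steps above can be sketched in Python as follows. This is a minimal illustration, not the CONSENSUS implementation: the function names are hypothetical, all sequences are assumed to have the same length, and "best matches the seed" is assumed to mean "maximizes the consensus score of the seed matrix". For two rows, maximizing that score is the same as minimizing the Hamming distance.

```python
from collections import Counter
from itertools import product


def score(lmers):
    """Consensus score: per column, the count of the most common letter, summed."""
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*lmers))


def greedy_motif_search(dna, l):
    """Return 0-based start positions of the chosen l-mers, one per sequence.

    Assumes every sequence in `dna` has the same length n (an assumption of
    this sketch).
    """
    n = len(dna[0])
    positions = range(n - l + 1)

    # Exhaustive search over all pairs of l-mers in the first two sequences.
    s1, s2 = max(
        product(positions, positions),
        key=lambda p: score([dna[0][p[0]:p[0] + l], dna[1][p[1]:p[1] + l]]),
    )
    starts = [s1, s2]
    seed = [dna[0][s1:s1 + l], dna[1][s2:s2 + l]]

    # Sequential scan: add the best-matching l-mer of each remaining sequence.
    for seq in dna[2:]:
        best = max(positions, key=lambda s: score(seed + [seq[s:s + l]]))
        starts.append(best)
        seed.append(seq[best:best + l])
    return starts


print(greedy_motif_search(["TTACGTTT", "GACGTGGG", "CCCCACGT"], 4))  # [2, 1, 4]
```

Because each later row is chosen against the seed built so far, an unlucky choice in the first two sequences cannot be undone, which is why the greedy search may miss the optimal motif.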

Complexity
- The exhaustive search on the first two sequences requires $l(n-l+1)^2$ operations, which is $O(ln^2)$
- The sequential scan of the remaining $t-2$ sequences requires $l(n-l+1)(t-2)$ operations, which is $O(lnt)$
- Thus the running time of greedy motif search is $O(ln^2 + lnt)$
- If $t$ is small compared to $n$, the algorithm behaves as $O(ln^2)$

Consensus tool
- The greedy motif algorithm may miss the optimal motif
- The CONSENSUS tool saves a large number of seed matrices
- The CONSENSUS tool can process the sequences in random order
- The CONSENSUS tool is less likely to miss the optimal motif

Genome rearrangements
- Gene rearrangements result in a change of gene order
- A series of gene rearrangements can alter the genomic architecture of a species
- Cabbage and turnip genes are 99% similar
- Fewer than 250 genomic rearrangements have occurred since the divergence of humans and mice

History of Chromosome X
[Figure. Source: Rat Consortium, Nature, 2004]

Types of Rearrangements
- Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
- Translocation: 1 2 3 and 4 5 6 → 1 2 6 and 4 5 3
- Fusion: 1 2 3 and 4 5 6 → 1 2 3 4 5 6
- Fission: 1 2 3 4 5 6 → 1 2 3 and 4 5 6

Greedy algorithms in Gene Rearrangements
- Biologists are interested in finding the smallest number of reversals in an evolutionary sequence
- This number gives a lower bound on the number of rearrangements and a measure of the similarity between two species
- Two greedy algorithms are used:
  - Simple reversal sort
  - Breakpoint reversal sort

Gene Order
- Gene order is represented by a permutation $\pi = \pi_1 \cdots \pi_{i-1}\, \pi_i\, \pi_{i+1} \cdots \pi_{j-1}\, \pi_j\, \pi_{j+1} \cdots \pi_n$
- Reversal $\rho(i, j)$ reverses (flips) the elements from $i$ to $j$ in $\pi$:

$$\pi \cdot \rho(i, j) = \pi_1 \cdots \pi_{i-1}\, \pi_j\, \pi_{j-1} \cdots \pi_{i+1}\, \pi_i\, \pi_{j+1} \cdots \pi_n$$

Reversal example
- $\pi$ = 1 2 3 4 5 6 7 8
- After $\rho(3, 5)$: 1 2 5 4 3 6 7 8
- After $\rho(5, 6)$: 1 2 5 4 6 3 7 8
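The reversal operation is easy to express with list slicing. The sketch below uses a hypothetical helper name, `reversal`, with 1-based inclusive positions to match the slide notation, and replays the two reversals of this example.

```python
def reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [1, 2, 3, 4, 5, 6, 7, 8]
step1 = reversal(pi, 3, 5)     # [1, 2, 5, 4, 3, 6, 7, 8]
step2 = reversal(step1, 5, 6)  # [1, 2, 5, 4, 6, 3, 7, 8]
print(step1)
print(step2)
```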

Reversal distance problem
- Goal: Given two permutations, find the shortest series of reversals that transforms one into the other
- Input: Permutations $\pi$ and $\sigma$
- Output: A series of reversals $\rho_1, \ldots, \rho_t$ transforming $\pi$ into $\sigma$, such that $t$ is minimum
- $t$: the reversal distance between $\pi$ and $\sigma$
- $d(\pi, \sigma)$: the smallest possible value of $t$, given $\pi$ and $\sigma$

Sorting by reversal
- Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation
- Input: Permutation $\pi$
- Output: A series of reversals $\rho_1, \ldots, \rho_t$ transforming $\pi$ into the identity permutation, such that $t$ is minimum

Sorting by reversal - Greedy algorithm
- When sorting the permutation $\pi$ = 1 2 3 6 4 5, the first three elements are already in order, so it does not make sense to break them
- The length of the already sorted prefix of $\pi$ is denoted prefix($\pi$); here prefix($\pi$) = 3
- This suggests a greedy algorithm: increase prefix($\pi$) at every step

Simple Reversal sort – Pseudocode
A very general approach leads to an algorithm that sorts by moving the i-th element to the i-th position:

    SimpleReversalSort(π)
      for i ← 1 to n – 1
        j ← position of element i in π (i.e., π_j = i)
        if j ≠ i
          π ← π · ρ(i, j)
          output π
        if π is the identity permutation
          return
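A direct Python translation of this pseudocode might look like the sketch below. The function name and the choice to return the list of intermediate permutations (instead of printing them) are assumptions made for illustration.

```python
def simple_reversal_sort(perm):
    """Sort a permutation of 1..n by reversals.

    Returns the permutation after each reversal that was applied.
    """
    perm = list(perm)
    n = len(perm)
    identity = list(range(1, n + 1))
    outputs = []
    for i in range(1, n):                  # i = 1 .. n-1, as in the pseudocode
        j = perm.index(i) + 1              # 1-based position of element i
        if j != i:
            perm[i - 1:j] = perm[i - 1:j][::-1]   # apply reversal rho(i, j)
            outputs.append(list(perm))
        if perm == identity:
            break
    return outputs


for step in simple_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```

On the input 6 1 2 3 4 5 this performs five reversals, one for each element that has to be moved into place.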

Example – SimpleReversalSort is not optimal
- Input: 612345
- SimpleReversalSort: 612345 → 162345 → 126345 → 123645 → 123465 → 123456
- Greedy SimpleReversalSort takes 5 steps, whereas an optimal solution takes only 2 steps: 612345 → 543216 → 123456
- A related problem is the "pancake flipping problem", in which only prefix reversals are allowed

Approximation Ratio
- These algorithms produce an approximate solution rather than an optimal one
- The approximation ratio of an algorithm A on input $\pi$ is $A(\pi) / OPT(\pi)$, where $A(\pi)$ is the solution produced by A and $OPT(\pi)$ is the optimal solution
- For an algorithm A that minimizes the objective function (minimization algorithm), the performance guarantee is $\max_{|\pi| = n} A(\pi) / OPT(\pi)$
- For a maximization algorithm, it is $\min_{|\pi| = n} A(\pi) / OPT(\pi)$

Breakpoints – A different face of greed
- In a permutation $\pi = \pi_1 \cdots \pi_n$:
  - if $\pi_i$ and $\pi_{i+1}$ are consecutive numbers, the pair is an adjacency
  - if $\pi_i$ and $\pi_{i+1}$ are not consecutive numbers, the pair is a breakpoint
- Example: $\pi$ = 1 | 9 | 3 4 | 7 8 | 2 | 6 5
  - Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints
  - Pairs (3,4), (7,8) and (6,5) form adjacencies
- $b(\pi)$: the number of breakpoints in permutation $\pi$
- Our goal is to eliminate all breakpoints, thus forming the identity permutation
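Counting breakpoints amounts to checking every neighbouring pair. In the sketch below, the function name and the `extend` flag are hypothetical: with `extend=False` only the pairs inside the permutation are counted, as in the example on this slide, and with `extend=True` the permutation is first framed with 0 and n+1, as Breakpoint Reversal Sort does.

```python
def breakpoints(perm, extend=False):
    """Count neighbouring pairs of perm that are not consecutive numbers.

    With extend=True, 0 is prepended and n+1 appended before counting.
    """
    seq = list(perm)
    if extend:
        seq = [0] + seq + [len(seq) + 1]
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) != 1)


print(breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5]))        # 5
print(breakpoints([2, 3, 1, 4, 6, 5], extend=True))    # 5
```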

Breakpoint Reversal Sort – Steps
- Put two elements $\pi_0 = 0$ and $\pi_{n+1} = n+1$ at the ends of $\pi$
- Eliminate breakpoints using reversals
- Each reversal eliminates at most 2 breakpoints
- This implies reversal distance ≥ #breakpoints / 2
- Example: $\pi$ = 2 3 1 4 6 5
  - 0 2 3 1 4 6 5 7, $b(\pi)$ = 5
  - 0 1 3 2 4 6 5 7, $b(\pi)$ = 4
  - 0 1 2 3 4 6 5 7, $b(\pi)$ = 2
  - 0 1 2 3 4 5 6 7, $b(\pi)$ = 0
- Not efficient, as the algorithm may run forever

Pseudocode – Breakpoint Reversal Sort

    BreakpointReversalSort(π)
      while b(π) > 0
        among all possible reversals, choose the reversal ρ minimizing b(π · ρ)
        π ← π · ρ(i, j)
        output π
      return

Using strips
- A strip is an interval between two consecutive breakpoints in a permutation
- Decreasing strip: a strip of elements in decreasing order
- Increasing strip: a strip of elements in increasing order
- Example: 0 1 9 4 3 7 8 2 5 6 10
- A single-element strip can be declared either increasing or decreasing. We will declare them decreasing, with the exception of the strips containing 0 and n+1
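As an illustration, the strip decomposition of the example permutation can be computed as follows. The helper names `strips` and `is_decreasing` are hypothetical, and the input is assumed to be already extended with 0 and n+1.

```python
def strips(ext):
    """Split an extended permutation (0 ... n+1) into its strips."""
    result, current = [], [ext[0]]
    for a, b in zip(ext, ext[1:]):
        if abs(a - b) == 1:        # adjacency: the strip continues
            current.append(b)
        else:                      # breakpoint: a new strip starts
            result.append(current)
            current = [b]
    result.append(current)
    return result


def is_decreasing(strip, n):
    """Single-element strips are decreasing, except those holding 0 or n+1."""
    if len(strip) == 1:
        return strip[0] not in (0, n + 1)
    return strip[0] > strip[1]


ext = [0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]
all_strips = strips(ext)
print(all_strips)
# [[0, 1], [9], [4, 3], [7, 8], [2], [5, 6], [10]]
print([s for s in all_strips if is_decreasing(s, n=9)])
# [[9], [4, 3], [2]]
```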

Reducing breakpoints
- Among the decreasing strips, choose the one containing the smallest element k in $\pi$
- Find k − 1 in the permutation
- Reverse the segment between k and k − 1
- Example: $\pi$ = 1 4 6 5 7 8 3 2
  - 0 1 4 6 5 7 8 3 2 9, $b(\pi)$ = 5
  - 0 1 2 3 8 7 5 6 4 9, $b(\pi)$ = 4
  - 0 1 2 3 4 6 5 7 8 9, $b(\pi)$ = 2
  - 0 1 2 3 4 5 6 7 8 9, $b(\pi)$ = 0

ImprovedBreakpointReversalSort
Sometimes a permutation may not contain any decreasing strips. In that case an increasing strip has to be reversed so that it becomes a decreasing strip. Taking this into consideration, we have an improved algorithm:

    ImprovedBreakpointReversalSort(π)
      while b(π) > 0
        if π has a decreasing strip
          among all possible reversals, choose the reversal ρ that minimizes b(π · ρ)
        else
          choose a reversal ρ that flips an increasing strip in π
        π ← π · ρ
        output π
      return
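The pseudocode can be turned into a small runnable sketch. The function names are hypothetical, the search over all reversals is done by brute force, and when no decreasing strip exists the first inner increasing strip is flipped; the pseudocode leaves that choice open, so this is only one possible reading.

```python
def breakpoints(ext):
    """Number of neighbouring pairs in ext that are not consecutive numbers."""
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def strips(ext):
    """Return the strips of ext as (start, end) index pairs, inclusive."""
    spans, start = [], 0
    for k in range(1, len(ext)):
        if abs(ext[k] - ext[k - 1]) != 1:   # breakpoint between k-1 and k
            spans.append((start, k - 1))
            start = k
    spans.append((start, len(ext) - 1))
    return spans


def reverse(ext, i, j):
    """Return a copy of ext with positions i..j (inclusive) reversed."""
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def improved_breakpoint_reversal_sort(perm):
    """Return the permutation after each reversal, ending at the identity."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]        # frame with 0 and n+1
    history = []
    while breakpoints(ext) > 0:
        # Strips that contain neither 0 nor n+1.
        inner = [(s, e) for s, e in strips(ext) if s > 0 and e < n + 1]
        # Single-element inner strips count as decreasing.
        decreasing = [(s, e) for s, e in inner if s == e or ext[s] > ext[e]]
        if decreasing:
            candidates = [(i, j) for i in range(1, n + 1)
                          for j in range(i + 1, n + 1)]
            i, j = min(candidates,
                       key=lambda ij: breakpoints(reverse(ext, *ij)))
        else:
            i, j = inner[0]                 # flip an increasing strip
        ext = reverse(ext, i, j)
        history.append(ext[1:-1])
    return history


for step in improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4]):
    print(step)
```

The brute-force search over all reversals keeps the sketch close to the pseudocode; it costs more time per step than the strip-based rule of reversing the segment between k and k − 1.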

Example – ImprovedBreakpointReversalSort
- There are no decreasing strips in $\pi$ = 0 1 2 | 5 6 7 | 3 4 | 8, with $b(\pi)$ = 3
- $\pi \cdot \rho(6, 7)$ = 0 1 2 | 5 6 7 | 4 3 | 8, with $b(\pi \cdot \rho(6, 7))$ = 3
- $\rho(6, 7)$ does not change the number of breakpoints
- $\rho(6, 7)$ creates a decreasing strip, thus guaranteeing that the next step will decrease the number of breakpoints

Approximation Ratio - ImprovedBreakpointReversalSort
- The approximation ratio is 4
- The algorithm eliminates at least one breakpoint in every two steps, so it takes at most $2b(\pi)$ steps
- Approximation ratio: $2b(\pi) / d(\pi)$
- An optimal algorithm eliminates at most 2 breakpoints in every step: $d(\pi) \geq b(\pi) / 2$
- Performance guarantee: $\frac{2b(\pi)}{d(\pi)} \leq \frac{2b(\pi)}{b(\pi)/2} = 4$

References
- An Introduction to Bioinformatics Algorithms - Neil C. Jones and Pavel A. Pevzner
- http://bix.ucsd.edu/bioalgorithms/slides.php# Ch5

Questions