MATH 224 – Discrete Mathematics

This material is not in your text (except as exercises).

Sequence Comparisons
–Problems in molecular biology involve finding the minimum number of edit steps required to change one string into another.
–Three types of edit steps: insert, delete, replace.
–Example: abbc → babb
 abbc → bbc → bbb → babb (3 steps)
 abbc → babbc → babb (2 steps)
–We are trying to minimize the number of steps.

Idea: look at making just one position right, and find all the ways you could do it.
–Count how long each way would take and recursively figure the total cost.
–This is an orderly way of limiting the exponential number of combinations to think about.
–For ease in coding, we make the last character right (rather than any other). Then we can tell the routine the beginning address and the number of positions without having the string begin in a different location. (In C it would be no problem, but other languages use a different technique.)

Let C(n,m) be the cost of changing the first n characters of A into the first m characters of B. There are four possibilities (pick the cheapest):
1. If we delete a_n, we still need to change A(n-1) to B(m). The cost is C(n,m) = C(n-1,m) + 1.
2. If we insert a new value at the end of A(n) to match b_m, we still have to change A(n) to B(m-1). The cost is C(n,m) = C(n,m-1) + 1.
3. If we replace a_n with b_m, we still have to change A(n-1) to B(m-1). The cost is C(n,m) = C(n-1,m-1) + 1.
4. If a_n already matches b_m, we still have to change A(n-1) to B(m-1). The cost is C(n,m) = C(n-1,m-1).

–We have turned one problem into three problems, each just slightly smaller.
–A bad situation, unless we can reuse results: Dynamic Programming.
–We store the results of C(i,j) for i = 1..n and j = 1..m.
–If we need to reconstruct how we would achieve the change, we store both the cost and an indication of which subproblem was used.

–We keep a second table M(i,j) indicating which of the four decisions led to the best result.
–Complexity: O(mn) time, but O(mn) space as well.
–Consider changing do to redo (2 steps) or mane to mean (2 steps).
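A minimal bottom-up sketch of the cost table in Python (the function name and layout are illustrative, not from the notes; the match case folds into the replace case by charging 0 when the characters already agree):

```python
def edit_distance(a, b):
    """Minimum insert/delete/replace steps to turn string a into string b."""
    n, m = len(a), len(b)
    # C[i][j] = cost of changing the first i chars of a into the first j chars of b
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        C[i][0] = i          # delete all i remaining characters
    for j in range(m + 1):
        C[0][j] = j          # insert all j remaining characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delete  = C[i - 1][j] + 1
            insert  = C[i][j - 1] + 1
            replace = C[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            C[i][j] = min(delete, insert, replace)
    return C[n][m]

print(edit_distance("abbc", "babb"))  # 2
print(edit_distance("do", "redo"))    # 2
print(edit_distance("mane", "mean"))  # 2
```

To reconstruct the actual edit sequence, a second table recording which of the three terms won the min would be stored alongside C, as described above.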

Longest Increasing Subsequence
Find the longest increasing subsequence in a sequence of distinct integers.
Idea 1. Given a sequence of size less than m, we can find its longest increasing subsequence (by induction). The problem is that we don't know how to increase the length when the next element arrives:
Case 1: it either can be added to the longest subsequence or not.
Case 2: it may be addable to a non-selected subsequence, creating a sequence of equal length but with a smaller ending point.
Case 3: it may be addable to a non-selected subsequence, creating a sequence of smaller length whose successors make it a good choice.

Idea 2. Given a sequence of size < m, we know how to find all the longest increasing subsequences.
–Hard: there are many of them, and we would need them for every length.

Idea 3. Given a sequence of size < m, we can find the longest increasing subsequence with the smallest ending point.
–We might have to create a smaller subsequence before we can create a longer one.

Idea 4. Given a sequence of size < m, we can find the best increasing subsequence (BIS) ending value for every length k < m: BIS(k) is the smallest value that can end an increasing subsequence of length k.
–For each new item in the sequence, we ask: if we add it to the sequence of length 3, is the result better than the current sequence of length 4?

For s = 1 to n (or recursively the other way)
 For k = s downto 1, until the correct spot is found
  If BIS(k) > A_s and BIS(k-1) < A_s then BIS(k) = A_s
(Treat BIS(0) as -infinity and any still-undefined BIS(k) as +infinity, so that a new longest length can be created.)
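The loop above can be sketched as follows (Python; a hypothetical helper using the linear scan for the spot, so O(n^2) worst case):

```python
def lis_length_linear(a):
    """Length of the longest increasing subsequence of distinct integers."""
    bis = []  # bis[k-1] = smallest value ending an increasing subsequence of length k
    for x in a:
        for k in range(len(bis)):
            if bis[k] >= x:   # first spot whose ending value is too big
                bis[k] = x    # x ends a length-(k+1) subsequence more cheaply
                break
        else:
            bis.append(x)     # x extends the longest subsequence found so far
    return len(bis)

print(lis_length_linear([5, 1, 2, 8, 3, 4, 6]))  # 5  (1, 2, 3, 4, 6)
```

Because the bis array is always sorted, the first index with bis[k] >= x automatically has bis[k-1] < x, matching the condition in the pseudocode.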

Actually, we don't need the sequential search: the BIS values are sorted, so we can binary search for the correct spot, giving O(n log n) overall.
–To output the actual subsequence would be difficult, as you don't know where the sequence is; you would have to store extra information and reconstruct it.
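A sketch of the binary-search version, using the standard-library bisect module (names are mine):

```python
from bisect import bisect_left

def lis_length(a):
    """O(n log n): binary-search the sorted BIS array for each new element."""
    bis = []  # bis[k-1] = smallest value ending an increasing subsequence of length k
    for x in a:
        k = bisect_left(bis, x)   # first index with bis[k] >= x
        if k == len(bis):
            bis.append(x)         # x extends the longest subsequence so far
        else:
            bis[k] = x            # x is a cheaper ending for length k+1
    return len(bis)

print(lis_length([5, 1, 2, 8, 3, 4, 6]))  # 5
```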

Probabilistic Algorithms
–Suppose we wanted to find a number that is greater than the median (the number for which half are bigger).
–We could sort them, O(n log n), and then select one.
–We could find the biggest, but stop looking half way through: O(n/2).
–We cannot guarantee finding one in the upper half in fewer than n/2 comparisons.
–What if you just wanted good odds?
–Pick two numbers and keep the larger one. What is the probability it is in the lower half?

There are four equally likely possibilities:
–both are lower
–the first is lower, the other higher
–the first is higher, the other lower
–both are higher
We will be right 75% of the time! We only lose if both are in the lower half.

Select k elements and pick the biggest; the probability of success is 1 - 1/2^k. Good odds, controlled odds.
–This is termed a Monte Carlo algorithm: it may give the wrong result, with very small probability.
–Another type of probabilistic algorithm never gives a wrong result, but its running time is not guaranteed.
–This is termed a Las Vegas algorithm, as you are guaranteed success if you try long enough.
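A minimal Monte Carlo sketch, assuming we may sample elements uniformly at random with replacement (the function name and default k are illustrative):

```python
import random

def probably_above_median(items, k=10):
    """Return an element greater than the median with probability
    about 1 - 1/2**k. It may be wrong, but it never runs long."""
    return max(random.choice(items) for _ in range(k))

vals = list(range(1, 1001))            # median is 500.5
x = probably_above_median(vals, k=10)  # wrong with probability ~1/1024
```

Each sample lands in the lower half with probability 1/2, so all k do with probability 1/2^k; the maximum fails only in that case.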

A Coloring Problem: Las Vegas Style
–Let S be a set with n elements. (n only affects the complexity, not the algorithm.)
–Let S_1, S_2, ..., S_k be a collection of distinct subsets of S, each containing exactly r elements, such that k ≤ 2^(r-2). (We use this bound below.)
–Color each element of S with one of two colors (red or blue) such that each subset S_i contains at least one red and one blue element.

Idea
–Try coloring the elements randomly and then just check whether you happened to win. Checking is fast: you can quit checking a subset when you see one element of each color, and you can quit checking the collection when any single-color subset is found.
–What is the probability that all items in one set are red? 1/2^r, as each color is assigned with equal probability and there are r items in the set.

What is the probability that any one of the collection is all red?
–At most k/2^r: since we are looking for the or of the events, we add the probabilities (the union bound).
–k is bounded by 2^(r-2), so k/2^r ≤ 1/4.
–The probability that some set is all blue or all red is therefore at most one half (double the all-red bound).
–If our random coloring fails, we simply try again until success.
–Our expected number of tries is at most 2.
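A Las Vegas sketch of the recoloring loop (Python; the example sets are mine, chosen so that k = 4 ≤ 2^(r-2) with r = 4):

```python
import random

def color_until_valid(S, subsets):
    """Randomly 2-color S until every subset contains both colors.
    Always correct; expected number of tries is at most 2 when the
    number of subsets is at most 2**(r-2)."""
    while True:
        color = {x: random.choice(("red", "blue")) for x in S}
        # valid iff every subset sees both colors
        if all(len({color[x] for x in sub}) == 2 for sub in subsets):
            return color

S = range(8)
subsets = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6, 7}, {1, 3, 5, 7}]
coloring = color_until_valid(S, subsets)
```

The result is always a valid coloring; only the number of iterations is random.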

Finding a Majority
–Let E be a sequence of integers x_1, x_2, x_3, ..., x_n. The multiplicity of x in E is the number of times x appears in E. A number z is a majority in E if its multiplicity is greater than n/2.
–Problem: given a sequence of numbers, find the majority in the sequence or determine that none exists.

–For example, suppose there is an election. Candidates are represented as integers, and the votes are a list of candidate numbers.
–We are assuming no limit on the number of possible candidates.

Ideas
1. Sort the list: O(n log n).
2. With a balanced tree of candidate names, O(n log c) (where c is the number of candidates). Note: if we don't know how many candidates there are, we can't give them indices.
3. See if the median (via the kth-largest algorithm) occurs more than n/2 times: O(n).
4. Take a small sample, find its majority, then count how many times that value occurs in the whole list. This is a probabilistic approach.
5. Make one pass, discarding elements that won't affect the majority.

Note:
–If x_i ≠ x_j and we remove both of them, then a majority in the old list is still a majority in the new list: if x_i is a majority there are m copies of x_i out of n, where m > n/2, and after removing two elements m - 1 > (n-2)/2.
–The converse is not true. If there is no majority, removing two elements may create a majority in the smaller list: 1, 2, 4, 5, 5.

Thus, our algorithm finds only a possible majority.
–Algorithm: find two unequal elements and delete them; find the majority in the smaller list; then check whether it is a majority in the original list.
–How do we remove elements? It is easy: we scan the list in order, looking for a pair to eliminate.
–Let i be the current position. All the items before x_i which have not been eliminated have the same value, so all we really need to keep is that value (the candidate C) and the number of its occurrences that have not been deleted (Occurs).

For example, scanning a list we record, at each position, the current candidate and its Occurs count, marking eliminated positions with X; the final candidate is a possible majority but need not be a majority of the whole list.
–Complexity: n-1 compares to find a candidate and n-1 compares to test whether it is a majority.
–So why do this over the other ways? It is simple to code: no different in terms of complexity, but interesting to think about.
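The one-pass candidate scan plus verification can be sketched as follows (Python; this is the pairing idea above, with a new candidate started whenever Occurs drops to zero):

```python
def find_majority(seq):
    """One pass finds the only possible majority candidate; a second
    pass verifies it. Returns None when no majority exists."""
    candidate, occurs = None, 0
    for x in seq:
        if occurs == 0:
            candidate, occurs = x, 1   # start a new run with x as candidate
        elif x == candidate:
            occurs += 1                # another copy of the candidate
        else:
            occurs -= 1                # eliminate the pair (candidate, x)
    if candidate is not None and seq.count(candidate) > len(seq) / 2:
        return candidate
    return None

print(find_majority([3, 3, 4, 3]))     # 3
print(find_majority([1, 2, 4, 5, 5]))  # None (5 survives the scan but fails the check)
```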