Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.


1 Strings Basic data type in computational biology. A string is an ordered succession of characters or symbols from a finite set called an alphabet. Sequence is synonymous with string. Example: s = AATGCA; length |s| = 6; s[1] = A (indices start at 1). The empty string has length 0.

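2 Strings Substring: t is a substring of s if it consists of consecutive characters of the parent string s. Superstring: s is a superstring (parent string) of its substring t. s[i,j] indicates the characters of string s between indices i and j. The concatenation of two strings s and t is written st. A prefix of s is a substring that starts at its first character; a suffix is a substring that ends at its last character.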
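The string notation on the first two slides can be tried out directly. This is a minimal Python sketch, assuming that s[i,j] is 1-based and inclusive at both ends (the slides do not say whether the endpoints are included); the helper names `char_at` and `substring` are hypothetical.

```python
def char_at(s, i):
    """Return s[i] using the slides' 1-based indexing."""
    return s[i - 1]

def substring(s, i, j):
    """Return s[i,j]: characters i through j (assumed 1-based, inclusive)."""
    return s[i - 1:j]

s = "AATGCA"
t = "TGC"

print(len(s))               # length |s|
print(char_at(s, 1))        # first character of s
print(substring(s, 3, 5))   # consecutive characters of s
print(t in s)               # is t a substring of s?
print(s.startswith("AAT"))  # prefix test
print(s.endswith("GCA"))    # suffix test
print(s + t)                # concatenation st
print(len(""))              # the empty string has length 0
```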

3 Graphs A graph consists of two sets: V, the set of nodes or vertices, and E, the set of edges (pairs of vertices); G = (V, E). Simple graph: no loops. Directed graphs: directed edges. Valence: the degree of a vertex (in-degree and out-degree in a directed graph). Weighted graphs.
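As an illustration of G = (V, E) and of in- and out-degree, here is a small Python sketch. The edge-list representation and the function name `in_out_degrees` are assumptions made for this example, not something the slides prescribe.

```python
from collections import Counter

def in_out_degrees(vertices, edges):
    """Return (in_degree, out_degree) dicts for a directed graph G = (V, E).

    Each edge is a pair (u, v) meaning an edge directed from u to v.
    """
    out_count = Counter(u for u, _ in edges)
    in_count = Counter(v for _, v in edges)
    in_deg = {v: in_count[v] for v in vertices}
    out_deg = {v: out_count[v] for v in vertices}
    return in_deg, out_deg

V = ["a", "b", "c"]
E = [("a", "b"), ("a", "c"), ("b", "c")]

in_deg, out_deg = in_out_degrees(V, E)
print(in_deg)
print(out_deg)
```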

4

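5 Connectedness Cycle: a path that returns to its start without repeating an edge. Acyclic: no cycles. Complete: every possible edge is present. Bipartite: vertices separated into two disjoint subsets, with every edge joining one subset to the other. Tree: acyclic and connected graph (root, leaves). Interval graph: one vertex per interval in a collection of intervals of the real line, with an edge wherever two intervals have a nonempty intersection.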
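The interval graph definition translates almost directly into code. The sketch below assumes closed intervals given as (start, end) pairs with start <= end; the function name `interval_graph` and the example intervals are made up for illustration.

```python
from itertools import combinations

def interval_graph(intervals):
    """Build the edge set of an interval graph.

    intervals maps a vertex name to a closed interval (start, end).
    Two vertices are joined when their intervals intersect.
    """
    edges = set()
    for (u, (a1, b1)), (v, (a2, b2)) in combinations(intervals.items(), 2):
        # Closed intervals intersect iff the larger start is <= the smaller end.
        if max(a1, a2) <= min(b1, b2):
            edges.add((u, v))
    return edges

intervals = {"A": (0, 3), "B": (2, 5), "C": (6, 8)}
print(interval_graph(intervals))
```

Only A and B overlap here, so the graph has a single edge and C is an isolated vertex.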

6 Graph Problems Hamiltonian: a cycle that visits every vertex exactly once. Eulerian: a cycle that uses every edge exactly once. Coloring: the minimum number of colors so that no two adjacent vertices have the same color. Matching: a subset M of edges such that no two edges in M share an endpoint. Adjacency matrix.
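A short Python sketch of the adjacency matrix and two of the checks above, for an undirected graph with vertices numbered 0 to n-1 (an assumption of this example). The even-degree test relies on Euler's theorem: a connected graph has an Eulerian cycle exactly when every vertex has even degree. The sketch does not test connectedness, and all function names are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 adjacency matrix of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def all_degrees_even(A):
    """Degree condition for an Eulerian cycle (graph assumed connected)."""
    return all(sum(row) % 2 == 0 for row in A)

def is_matching(M):
    """True if no two edges in M share an endpoint."""
    endpoints = [v for edge in M for v in edge]
    return len(endpoints) == len(set(endpoints))

square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
A = adjacency_matrix(4, square)
for row in A:
    print(row)
print(all_degrees_even(A))
print(is_matching([(0, 1), (2, 3)]))
print(is_matching([(0, 1), (1, 2)]))
```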

7 Finite Automata A finite collection of states Q. A finite alphabet Σ of input signals. A function δ which, for every possible combination of current state and input, determines a new state. Two special states: the initial state and the final (accepting) state.

8 The FA accepts any sequence of symbols that puts it in an accepting state. The set of all such sequences is the language of the automaton. [Figure: automaton with Input, Reset, and Accept labels]
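The definition above (states, alphabet, transition function, initial and accepting states) can be sketched as a few lines of Python. The example automaton, which accepts binary strings containing an even number of 1s, is a hypothetical illustration and not the machine drawn on the slides.

```python
def accepts(delta, start, accepting, word):
    """Run a deterministic finite automaton on word.

    delta maps (state, symbol) to the next state.
    """
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

# Hypothetical FA: accepts strings over {0, 1} with an even number of 1s.
delta = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

for w in ["", "1001", "10", "111"]:
    print(repr(w), accepts(delta, "even", {"even"}, w))
```

The language of this automaton is the set of all words for which `accepts` returns True.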

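9 State Transition Diagram [Figure: state transition diagram with states 0 to 5 and transitions labeled 0 and 1; the individual edges are not recoverable from the transcript]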

10 Regular Expressions Example: 01(001)*01. A regular expression denotes a language accepted by an FA. Pumping Lemma: if L is a regular language, then there is a constant n such that for each word W in L with length >= n, there are words X, Y, Z such that W = XYZ, length of XY <= n, length of Y >= 1, and XY^kZ is in L for every integer k >= 0.
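The example expression 01(001)*01 can be checked with Python's `re` module, whose syntax for this pattern matches the slide's notation. `fullmatch` is used so that the whole word, not just a part of it, must belong to the language; the helper name `in_language` is hypothetical.

```python
import re

PATTERN = re.compile(r"01(001)*01")

def in_language(word):
    """True if the entire word matches 01(001)*01."""
    return PATTERN.fullmatch(word) is not None

for w in ["0101", "0100101", "0100100101", "01001", ""]:
    print(repr(w), in_language(w))
```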

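11 The pumping lemma is used to tell when a language is not in a particular class. Let L be the language of all palindromes over {a, b}, for example abbababba (symmetric about its midpoint). Is L regular? Take W = a^n b a^n, a palindrome of length >= n. In any split W = XYZ with length of XY <= n, XY lies inside the leading a^n, so Y = a^j with j >= 1. If L were regular, the pumping lemma would put XY^2Z = a^m b a^n in L, with m = n + j > n. But a^m b a^n is not a palindrome, so it is not in L. Hence L is not regular.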
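The palindrome argument can be checked mechanically for a fixed n. The sketch below enumerates every split W = XYZ allowed by the pumping lemma (length of XY at most n, Y nonempty) and confirms that pumping Y once more always breaks the palindrome. The value n = 5 is an arbitrary choice for illustration; the code demonstrates the argument for that n and is not a proof.

```python
def is_palindrome(w):
    return w == w[::-1]

def pumped_words(w, n, k=2):
    """Yield X Y^k Z for every split w = XYZ with |XY| <= n and |Y| >= 1."""
    for xy_len in range(1, n + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            yield x + y * k + z

n = 5  # stands in for the pumping constant
w = "a" * n + "b" + "a" * n

print(is_palindrome(w))
print(any(is_palindrome(p) for p in pumped_words(w, n)))
```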

12 Chomsky Hierarchy

13 Turing Machine [Figure: control box with Start and Reset, and a Read/Write head over a tape of 0s and 1s]

14 Turing machine M. x is a string over M’s alphabet Σ. Initially the R/W head is over the leftmost symbol in x and M is in its start state. The R/W head communicates the symbol on the tape to the control mechanism in the box. M can read a symbol, replace a symbol, and move the tape right or left one cell at a time. If M halts (final state), the string y on the tape is M’s output corresponding to input x. M doesn’t necessarily halt for every x, so it computes a partial function f: Σ* → Σ*. M is the same thing as its program, which is a set of quintuples (q, s, q’, s’, d), where q is the current state, s is the current symbol, q’ is the next state, s’ is the symbol to be written, and d is the direction to move. TMs compute a particular class of functions over the integers called partial recursive functions.
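The quintuple description lends itself to a very small simulator. This Python sketch makes several assumptions the slides leave open: the blank symbol is written "_", directions are "L" and "R" and move the head rather than the tape, and a step limit stands in for the fact that M need not halt. The function name `run_tm` and the bit-flipping example machine are hypothetical.

```python
def run_tm(quintuples, tape, start, finals, blank="_", max_steps=10_000):
    """Simulate a Turing machine given as quintuples (q, s, q2, s2, d).

    Returns the tape contents when a final state is reached, or None if
    no quintuple applies before that.
    """
    rules = {(q, s): (q2, s2, d) for q, s, q2, s2, d in quintuples}
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    state, pos, steps = start, 0, 0
    while state not in finals:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; M may not halt on this input")
        key = (state, cells.get(pos, blank))
        if key not in rules:
            return None  # M is stuck without reaching a final state
        state, cells[pos], d = rules[key]
        pos += 1 if d == "R" else -1
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# Hypothetical machine: flip every bit, then halt at the first blank.
FLIP = [
    ("q0", "0", "q0", "1", "R"),
    ("q0", "1", "q0", "0", "R"),
    ("q0", "_", "halt", "_", "R"),
]

print(run_tm(FLIP, "0110", "q0", {"halt"}))
```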

15 Church-Turing Thesis All notions of effective computability are equivalent. Therefore, all computers are created equal. Other schemes: Lambda calculus, General Recursive Functions, etc...

16 Universal Turing Machine Fixed program in finite control. The program reads the description of a Turing machine from one tape and simulates its behavior on another tape (two tapes). Universal machine U; machine to be simulated T.

17 The fixed program for U is like an interpreter. Tape 1 contains the quintuples defining T. Tape 2 is initially blank; the same output as T appears here. Given T’s current state and input symbol, find the quintuple (q, s, q’, s’, d) in the description of T that applies. Record the new state q’, write the new symbol s’ on tape 2, move in direction d, read the new symbol on tape 2, and record it beside q’.
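The interpreter loop described above can be sketched in Python. Everything about the encoding here is an assumption made for illustration: tape 1 is a string in which quintuples are separated by ";" and their fields by ",", tape 2 is passed in already holding T's input, the blank is "_", and the state names "q0" and "halt" are fixed by convention. The function name `universal` is hypothetical.

```python
def universal(tape1, tape2, start="q0", halt="halt", blank="_", max_steps=10_000):
    """Interpret the machine T described on tape1, working on tape2."""
    # Decode T's description into a list of quintuples (q, s, q2, s2, d).
    program = [tuple(item.split(",")) for item in tape1.split(";")]
    cells = list(tape2) or [blank]
    state, pos = start, 0
    for _ in range(max_steps):
        if state == halt:
            return "".join(cells).strip(blank)
        symbol = cells[pos]
        # Scan the description for the quintuple that applies.
        for q, s, q2, s2, d in program:
            if (q, s) == (state, symbol):
                break
        else:
            return None  # no quintuple applies: T is stuck
        cells[pos] = s2                   # write the new symbol on tape 2
        state = q2                        # record the new state
        pos += 1 if d == "R" else -1      # move in direction d
        if pos < 0:                       # extend the tape with blanks as needed
            cells.insert(0, blank)
            pos = 0
        elif pos == len(cells):
            cells.append(blank)
    raise RuntimeError("step limit reached; T may not halt on this input")

# Hypothetical T: append one more 1 to a unary number.
INCREMENT = "q0,1,q0,1,R;q0,_,halt,1,R"
print(universal(INCREMENT, "111"))
```

The same `universal` function runs any machine whose description is placed on tape 1, which is the sense in which its program is fixed.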

18 Halting Problem What is not effectively computable? Is there a TM, M, that does the following: given an arbitrary TM, T, as input, and an equally arbitrary tape, t, decide whether T halts on t? This is equivalent to asking whether T accepts t. Undecidable.

19 Diagonalization

20 Diagonal set: _ X X _ X _. Its complement: X _ _ X _ X. The complement of the diagonal differs from every row. This can be extended to infinite sets. It is used to show that there are languages that are not accepted by any TM. Therefore, there can be no TM that decides whether arbitrary strings are accepted by arbitrary Turing machines. Since we can represent TMs by strings, after some work it follows that there can be no TM that decides the halting problem. Therefore, there are problems that admit no algorithmic solution.
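The finite version of the diagonal argument is easy to check by machine. The table below is a hypothetical 6 x 6 example chosen so that its diagonal matches the slide (_ X X _ X _); the slide's own table is not in the transcript. The complement of the diagonal disagrees with row i in column i, so it cannot equal any row.

```python
# Hypothetical table; only its diagonal is taken from the slide.
ROWS = [
    "_X_X_X",
    "XX__X_",
    "__XX__",
    "X_X_XX",
    "_XX_X_",
    "XXXXX_",
]

def diagonal(rows):
    return "".join(row[i] for i, row in enumerate(rows))

def complement(s):
    return "".join("X" if c == "_" else "_" for c in s)

diag = diagonal(ROWS)
anti = complement(diag)
print(diag)
print(anti)
# The complement differs from row i at position i, for every i.
print(all(anti[i] != row[i] for i, row in enumerate(ROWS)))
print(anti in ROWS)
```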

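21 Complexity Classes P: problems with efficient (polynomial-time) algorithms. NP: a proposed solution can be checked in polynomial time, but for many problems in NP no efficient algorithm has been found. Any problem in NP (P is a subset of NP) can be transformed to an NP-complete problem in polynomial time. P = NP ???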

22 Satisfiability (SAT) Boolean expression: (x1 + ~x3 + x4)(~x1 + ~x2 + ~x4)(~x2 + x3)(~x1 + x2 + x4). Which combinations of variable values (0 or 1) make the statement true (1) or false (0)? With n variables there are 2^n combinations. Decision problem: is the formula satisfiable?
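For the four-variable expression above, trying all 2^n combinations is feasible. In this Python sketch each clause is a list of integers, with i standing for x_i and -i for ~x_i; that encoding and the function names are assumptions of the example. Evaluating one assignment takes time linear in the size of the formula, while the search loop is exponential in n.

```python
from itertools import product

# (x1 + ~x3 + x4)(~x1 + ~x2 + ~x4)(~x2 + x3)(~x1 + x2 + x4)
CLAUSES = [[1, -3, 4], [-1, -2, -4], [-2, 3], [-1, 2, 4]]

def evaluate(clauses, assignment):
    """Check one assignment; assignment[i - 1] is the value of x_i."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_sat(clauses, n):
    """Try all 2^n assignments; return a satisfying one or None."""
    for assignment in product([False, True], repeat=n):
        if evaluate(clauses, assignment):
            return assignment
    return None

solution = brute_force_sat(CLAUSES, 4)
print(solution)
```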

23 NP-complete NP: Nondeterministic Polynomial Time. In 1971, Cook found a way to transform every problem in NP to a single, complete problem (satisfiability). The transformation takes polynomial time. An instance of one problem has a solution if and only if the corresponding instance of the other problem does. Solving any instance of any problem in NP is equivalent to solving some instance of SAT.

24 NP-Complete P and NP are classes of decision problems (answer yes or no). Optimization problems (minimize or maximize an objective function) can be NP-hard: at least as hard as an NP-complete decision problem.

25 What to do? Solve efficiently or prove NP-complete. To prove a problem X NP-complete: –Is X in NP? Check a solution in polynomial time. –Transform a known NP-complete problem Y to X, so that solving X in polynomial time would also solve Y in polynomial time. Otherwise: solve on specific, easier instances; exhaustive search; approximate in polynomial time; heuristics; quantum computer.
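Three of the options above (check a solution in polynomial time, exhaustive search, heuristics) can be contrasted on graph coloring. This is an illustrative Python sketch; the greedy rule, the function names, and the four-vertex path are choices made for the example. The vertex order is picked deliberately so that the greedy heuristic uses more colors than necessary.

```python
from itertools import product

def is_proper(edges, coloring):
    """Polynomial-time check that no edge joins two same-colored vertices."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def greedy_coloring(vertices, edges):
    """Heuristic: give each vertex, in order, the smallest unused color."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    coloring = {}
    for v in vertices:
        used = {coloring[w] for w in neighbors[v] if w in coloring}
        coloring[v] = next(c for c in range(len(vertices)) if c not in used)
    return coloring

def exhaustive_coloring(vertices, edges):
    """Exhaustive search: try k = 1, 2, ... colors until one works."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, colors))
            if is_proper(edges, coloring):
                return coloring
    return {}

# Path a - b - c - d, visited in an order that misleads the greedy rule.
vertices = ["a", "d", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]

greedy = greedy_coloring(vertices, edges)
best = exhaustive_coloring(vertices, edges)
print(len(set(greedy.values())), is_proper(edges, greedy))
print(len(set(best.values())), is_proper(edges, best))
```

The heuristic runs in polynomial time but is not guaranteed to be optimal, while the exhaustive search is optimal but takes time exponential in the number of vertices.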

