Genome Assembly, Charles Yan, 2008. Fragment assembly: given a large number of fragments, such as ACC AC AT AC AT GG …, the goal is to figure out the original sequence.


1 Genome Assembly Charles Yan 2008

2 Fragment Assembly: Given a large number of fragments, such as ACC AC AT AC AT GG …, the goal is to reconstruct the original sequence, which contains each and every one of the fragments.

3 Overlaps: The overlap of strings S and T, written overlap(T, S), is the longest suffix of S that is also a prefix of T. Example: S = ATCGATCCG, T = CGATCCGATTAT, overlap(T, S) = CGATCCG.
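A minimal Python sketch of the overlap definition, keeping the slide's argument order overlap(T, S). The function name `overlap` is hypothetical; it simply tries the longest candidate suffix first and shrinks until one matches.

```python
def overlap(t: str, s: str) -> str:
    """Return overlap(T, S): the longest suffix of s that is also a prefix of t."""
    # Try the longest candidate length first and shrink until a match is found.
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return t[:k]
    return ""  # no non-empty suffix of s is a prefix of t


print(overlap("CGATCCGATTAT", "ATCGATCCG"))  # CGATCCG
```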

4 A Simplified Problem: The shortest common superstring problem. Given a set of strings, find a string S of minimal length such that each and every input string appears as a substring of S.
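To make the problem statement concrete, here is a brute-force sketch (exponential time, so only usable for a handful of fragments). It assumes duplicates and fragments contained in another fragment can be dropped first, since any superstring of the remaining fragments already contains them; it then tries every ordering and merges each next fragment on its maximal overlap. The function names are hypothetical.

```python
from itertools import permutations


def overlap_len(s: str, t: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def shortest_common_superstring(fragments: list[str]) -> str:
    # Drop duplicates and any fragment contained in another fragment.
    uniq = sorted(set(fragments))
    frags = [f for f in uniq if not any(f != g and f in g for g in uniq)]
    if not frags:
        return ""
    best = None
    # Try every ordering; append each next fragment minus its maximal overlap.
    for order in permutations(frags):
        s = order[0]
        for f in order[1:]:
            s += f[overlap_len(s, f):]
        if best is None or len(s) < len(best):
            best = s  # keep the first shortest candidate found
    return best


print(shortest_common_superstring(["CGT", "ACC", "CCG"]))  # ACCGT
```

Because the fragments are substring-free after filtering, the overlap of the accumulated string with the next fragment equals the overlap of the last fragment with it, so each ordering yields the shortest superstring for that order.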

5 Directed Graph Model: Nodes: each input fragment is a node, labeled with that fragment. Edges: edge (v, w) is labeled with overlap(W, V), where V and W are the labels of v and w respectively; its weight is |overlap(W, V)|. Finding a superstring amounts to finding a directed path that visits each and every node exactly once (the Hamiltonian path problem). Shortest superstring: a Hamiltonian path with the maximal sum of edge weights (assuming no fragment is a substring of another).
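The graph model can be sketched directly: build the overlap graph, then find a maximum-weight Hamiltonian path with a dynamic program over subsets of nodes (Held-Karp style, O(2^n · n^2) time). This DP is a standard exact technique chosen here for illustration; the slides do not name a method. All function names are hypothetical, and the input is filtered to be substring-free so that the maximal-weight path gives the shortest superstring.

```python
def overlap_len(v: str, w: str) -> int:
    """|overlap(W, V)|: length of the longest suffix of v that is a prefix of w."""
    for k in range(min(len(v), len(w)), 0, -1):
        if v.endswith(w[:k]):
            return k
    return 0


def max_weight_hamiltonian_path(n: int, weight: dict) -> list[int]:
    """Held-Karp style DP: best[(mask, j)] is the maximum weight of a path
    that visits exactly the nodes in bitmask `mask` and ends at node j."""
    best = {(1 << j, j): 0 for j in range(n)}
    parent = {}
    # Masks only grow along a path, so increasing numeric order is a valid order.
    for mask in range(1, 1 << n):
        for j in range(n):
            if (mask, j) not in best:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue  # k is already on the path
                state = (mask | (1 << k), k)
                cand = best[(mask, j)] + weight[(j, k)]
                if cand > best.get(state, -1):
                    best[state] = cand
                    parent[state] = j
    full = (1 << n) - 1
    j = max(range(n), key=lambda e: best[(full, e)])
    # Walk the parent pointers back to the start node.
    path, mask = [j], full
    while (mask, j) in parent:
        prev = parent[(mask, j)]
        mask ^= 1 << j
        j = prev
        path.append(j)
    return path[::-1]


def superstring_via_graph(fragments: list[str]) -> str:
    uniq = sorted(set(fragments))
    frags = [f for f in uniq if not any(f != g and f in g for g in uniq)]
    if not frags:
        return ""
    n = len(frags)
    # Edge (v, w) for every ordered pair of distinct nodes, weighted by overlap length.
    weight = {(i, j): overlap_len(frags[i], frags[j])
              for i in range(n) for j in range(n) if i != j}
    path = max_weight_hamiltonian_path(n, weight)
    s = frags[path[0]]
    for a, b in zip(path, path[1:]):
        s += frags[b][weight[(a, b)]:]  # append w minus its overlap with v
    return s


print(superstring_via_graph(["ATCGATCCG", "CGATCCGATTAT"]))  # ATCGATCCGATTAT
```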

6 Directed Graph Model: Finding the shortest superstring is NP-complete (NPC), so no efficient algorithm is known that gives exact results for all cases. Heuristics are used instead.
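One widely used heuristic for this problem is greedy merging; it is named here for illustration, since the slides do not say which heuristic they mean. Repeatedly merge the pair of fragments with the largest overlap until one string remains. The result is always a superstring but is not guaranteed to be the shortest one. The function names are hypothetical.

```python
def overlap_len(s: str, t: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    uniq = sorted(set(fragments))
    frags = [f for f in uniq if not any(f != g and f in g for g in uniq)]
    if not frags:
        return ""
    while len(frags) > 1:
        # Find the ordered pair (i, j) with the largest overlap; the first pair wins ties.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap_len(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        # Replace the two merged fragments with their merge.
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


print(greedy_superstring(["ACC", "CCG", "CGT"]))  # ACCGT
```

Each pass is quadratic in the number of remaining fragments, which is why greedy merging scales far better than the exact subset DP, at the cost of optimality.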

7 Genome Assembly Difficulties: repeats; the bidirectional nature of DNA (a fragment may come from either strand); errors in the fragments.
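Two of these difficulties show up directly in the overlap computation. Because a fragment may come from either strand, an overlap should also be tried against the reverse complement of the other fragment; and because fragments can contain errors, an exact-match overlap may be too strict. The sketch below handles both under simple assumptions: only A/C/G/T characters, substitution errors only, and a fixed mismatch budget `max_mismatches` with a minimum overlap length `min_len` (both hypothetical parameters). Repeats are not addressed here.

```python
# Map each base to its complement on the opposite strand (A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The same DNA region read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_len(s: str, t: str, max_mismatches: int = 0, min_len: int = 1) -> int:
    """Longest k >= min_len such that the length-k suffix of s matches the
    length-k prefix of t with at most max_mismatches substitutions."""
    for k in range(min(len(s), len(t)), max(min_len, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(s[-k:], t[:k]))
        if mismatches <= max_mismatches:
            return k
    return 0


def best_oriented_overlap(s: str, t: str, max_mismatches: int = 0,
                          min_len: int = 1) -> tuple[int, str]:
    """Compare t as given ('+') and as its reverse complement ('-')."""
    fwd = overlap_len(s, t, max_mismatches, min_len)
    rev = overlap_len(s, reverse_complement(t), max_mismatches, min_len)
    return (fwd, "+") if fwd >= rev else (rev, "-")


read = "CGATCCGATTAT"
print(best_oriented_overlap("ATCGATCCG", reverse_complement(read)))  # (7, '-')
print(overlap_len("ATCGATCCG", "CGTTCCGATTAT"))                      # 2
print(overlap_len("ATCGATCCG", "CGTTCCGATTAT", max_mismatches=1))    # 7
```

With a mismatch budget, short overlaps match almost by chance, which is why a minimum overlap length matters in practice; the value to use depends on fragment length and error rate.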

