RNA Sequence Assembly WEI Xueliang. Overview Sequence Assembly Current Method My Method RNA Assembly To Do.


1 RNA Sequence Assembly WEI Xueliang

2 Overview Sequence Assembly Current Method My Method RNA Assembly To Do

3 Sequence Assembly Goal: recover the DNA/RNA sequence. Sequencing machines cannot read a whole genome in one go; they read small pieces of between 20 and 1000 bases. Definition: Read = Tag = Fragment.

4 De novo sequence assembly

5 MAPPING Map each tag to the reference genome (resequencing).
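The mapping step can be illustrated with a minimal sketch that finds every exact occurrence of a tag in a reference string. The function name `map_tag` and the toy reference are assumptions made for illustration; real mappers use an index and tolerate mismatches, which this sketch does not attempt.

```python
def map_tag(reference: str, tag: str) -> list[int]:
    """Return every 0-based start position where tag occurs exactly in reference."""
    positions = []
    start = reference.find(tag)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping occurrences are found too.
        start = reference.find(tag, start + 1)
    return positions


if __name__ == "__main__":
    reference = "AAGACTCCGACTGG"  # hypothetical toy reference
    print(map_tag(reference, "GACT"))  # the tag occurs twice, so it maps ambiguously
    print(map_tag(reference, "CCG"))
```

A tag that maps to more than one position, like the repeated "GACT" here, cannot be placed unambiguously by exact matching alone.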

6 Overview Sequence Assembly Current Method My Method RNA Assembly To Do

7 De novo sequence assembly Computing the overlaps between all pairs of tags takes a huge amount of time.
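To make the cost concrete, here is a minimal sketch of naive all-pairs overlap detection. The function names, the `min_len` threshold, and the toy tags are assumptions made for illustration. Every ordered pair of tags is compared, so the work grows quadratically with the number of tags.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def all_overlaps(tags: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Compare every ordered pair of tags: O(n^2) pairs for n tags."""
    result = {}
    for i, a in enumerate(tags):
        for j, b in enumerate(tags):
            if i != j:
                n = overlap(a, b, min_len)
                if n:
                    result[(i, j)] = n
    return result


if __name__ == "__main__":
    tags = ["AAGACT", "GACTCC", "CTCCGT"]  # hypothetical toy tags
    print(all_overlaps(tags))
```

With millions of tags the number of pairs alone becomes prohibitive, which is the motivation for the k-mer graph on the following slides.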

8 DE BRUIJN GRAPH K-mer: a length-k substring of a tag. Each node has out-degree at most 4. Hashing a node: "CTG" => (132)_4 = (30)_10. Moving from "CTG" to "TGG": shift left, (1320)_4; take it modulo (1000)_4, (320)_4; add (2)_4 for 'G', giving (322)_4.
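The base-4 hashing can be sketched as follows. The codes C=1, G=2, T=3 are read off the slide's "CTG" => (132)_4 example; A=0 is an assumption, and the function names `encode` and `roll` are made up for this sketch.

```python
# Base-4 code per nucleotide. C, G, T follow the "CTG" => (132)_4 example;
# A = 0 is an assumption.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer: str) -> int:
    """Interpret a k-mer as a base-4 number."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value


def roll(value: int, k: int, next_base: str) -> int:
    """Hash of the successor k-mer: shift left, drop the leading digit, append the new base."""
    return (value * 4) % (4 ** k) + CODE[next_base]


if __name__ == "__main__":
    h = encode("CTG")
    print(h)                 # (132)_4 in decimal
    print(roll(h, 3, "G"))   # hash of "TGG", computed without re-reading the k-mer
    print(encode("TGG"))     # same value, computed from scratch
```

Because a node has at most four successors, calling `roll` once per base in "ACGT" enumerates every possible neighbour in constant time each.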

9 DE BRUIJN GRAPH (CONT’) If the sequence contains a repeat, such as "GACT", a 3-mer De Bruijn graph cannot tell which path is the correct one. A 6-mer graph can recover the correct sequence. The larger K is, the better the result.
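The effect of K on repeats can be checked with a small sketch. The toy sequence (containing "GACT" twice), the tag length of 8, and the function names are assumptions for illustration. Nodes are k-mers, and an edge links two k-mers that are adjacent in some tag.

```python
from collections import defaultdict


def de_bruijn(tags: list[str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of k-mers that directly follow it in some tag."""
    graph = defaultdict(set)
    for tag in tags:
        for i in range(len(tag) - k):
            graph[tag[i:i + k]].add(tag[i + 1:i + k + 1])
    return graph


def branching_nodes(graph: dict[str, set[str]]) -> set[str]:
    """Nodes with more than one successor, i.e. places where the path is ambiguous."""
    return {node for node, successors in graph.items() if len(successors) > 1}


if __name__ == "__main__":
    sequence = "AAGACTCCGACTGG"  # hypothetical sequence with the repeat "GACT"
    # One tag of length 8 starting at every position (idealised full coverage).
    tags = [sequence[i:i + 8] for i in range(len(sequence) - 7)]
    print(branching_nodes(de_bruijn(tags, 3)))  # ambiguous after the repeat
    print(branching_nodes(de_bruijn(tags, 6)))  # no ambiguity left
```

With K = 3 the node "ACT" has two successors ("CTC" and "CTG"); with K = 6 every node has a single successor because each 6-mer spans the repeat plus enough context.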

10 De novo sequence assembly Suppose K equals the tag length (20-mers). After the tag TGACGTAGCTATGTATTTTG, the next node would be GACGTAGCTATGTATTTTGT, but no tag supplies that 20-mer. Coverage is not enough to support a large K.
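A small sketch of why sparse coverage breaks a large K: here nodes are the k-mers seen in the tags, and an edge exists when one k-mer's last k-1 bases match another's first k-1 bases. The toy sequence, the sampling of one tag every 2 bases, and the function names are assumptions for illustration.

```python
def kmers(tags: list[str], k: int) -> set[str]:
    """All distinct k-mers that occur in any tag."""
    return {tag[i:i + k] for tag in tags for i in range(len(tag) - k + 1)}


def edge_count(tags: list[str], k: int) -> int:
    """Count node pairs (u, v) where v extends u by exactly one base."""
    nodes = kmers(tags, k)
    return sum(1 for u in nodes for base in "ACGT" if u[1:] + base in nodes)


if __name__ == "__main__":
    sequence = "AAGACTCCGTGG"  # hypothetical sequence
    # Tags of length 6 starting at every second position: sparse coverage.
    tags = [sequence[i:i + 6] for i in range(0, len(sequence) - 5, 2)]
    print(tags)
    print(edge_count(tags, 6))  # K = tag length: the graph falls apart
    print(edge_count(tags, 3))  # smaller K: the k-mers chain together
```

With K equal to the tag length, two nodes are connected only if some tag starts exactly one base after another, which sparse coverage does not provide.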

11 Overview Sequence Assembly Current Method My Method RNA Assembly To Do

12 MY METHOD Tag length = 6, K = 3. When the sequence so far is AAGACT, what comes next? Try every extension: AAGACTC, AAGACTT, AAGACTG. Check against the tags: the tag AGACTC exists, so the correct extension is AAGACTC.
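One possible reading of this slide, as a minimal sketch: the small-K graph proposes candidate next bases, and a candidate is kept only if the last tag-length window of the extended sequence is an observed tag. The function name `extend`, its parameters, and the toy tag set are assumptions; this is not confirmed to be the author's actual implementation.

```python
def extend(contig: str, tags: set[str], tag_len: int, candidates: str = "ACGT") -> list[str]:
    """Return the extensions of contig whose final tag_len bases form a known tag.

    Assumes tag_len >= 2 and len(contig) >= tag_len - 1.
    """
    prefix = contig[-(tag_len - 1):]  # last tag_len - 1 bases of the contig
    return [contig + base for base in candidates if prefix + base in tags]


if __name__ == "__main__":
    tags = {"AAGACT", "AGACTC", "GACTGG"}  # hypothetical tag set
    # The 3-mer graph offers C, T and G after "ACT"; only C is backed by a full tag.
    print(extend("AAGACT", tags, tag_len=6, candidates="CTG"))
```

The full tag acts as a longer-range check on a branch that the short k-mer alone cannot resolve, so a small K can be kept for connectivity while the tags supply the disambiguation.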

13 Overview Sequence Assembly Current Method My Method RNA Assembly To Do

14 RNA ASSEMBLY

15 ALTERNATIVE SPLICING The graph All cDNA sequences.

16 RNA ASSEMBLY’S PROBLEM Merge? Index the sequence.

17 RNA ASSEMBLY’S PROBLEM(CONT’) Solution?

18 RNA ASSEMBLY’S PROBLEM(CONT’) Index Tags

19 RNA ASSEMBLY’S PROBLEM(CONT’) Solution? Speed?

20 SINGLE TAG’S LIMITATION |Yellow sequence| >= length of tag. Tag length is 25-100 bp. A single tag is not enough!

21 DATASET - PAIRED END TAGS Fragment length is usually > 1k. Some RNA sequences are shorter than 1k.

22 TO DO Handle large data sets (10G). Improve accuracy. Use PET (paired-end tag) data.

23 Thanks!!

