Published by Pauline Fox. Modified over 9 years ago.
1
RNA Assembly Using an Extending Method. Wei Xueliang, 2010-04-07
2
Overview Why abandon deBruijn. Why abandon Extended deBruijn. Introduction to current method. Handle the old problem. The new problem. Todo
3
Why abandon deBruijn. De Bruijn graph's (dis)advantages: – Very fast. – The result depends heavily on the coverage distribution and the K value. Key: coverage is not uniformly distributed in RNA assembly. – No best K value.
4
Why abandon deBruijn. The length of the red part is 27.
5
deBruijn Graph of K = 28
6
deBruijn Graph of K = 29
7
deBruijn Graph of K = 30
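The dependence on K can be illustrated with a small sketch. It is not the code behind the figures above: the reads are hypothetical toy data with a 5-base shared segment (not the 27-base one from the slides), and nodes are taken to be (K-1)-mers, which is one common convention and an assumption here. A node with more than one successor marks a point where the assembler cannot tell which path to follow.

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Count nodes with more than one successor (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)

# Hypothetical toy reads: the segment "ACGTA" (5 bases) occurs in two contexts.
reads = ["TTGACGTACCA", "GGCACGTATTC"]

for k in (4, 6, 7):
    print(k, branching_nodes(de_bruijn_edges(reads, k)))
```

In this toy case the two reads merge into one branching path while the node length (K-1) is at most the length of the shared segment, and they separate once the node is longer. A transcript set with shared segments of many different lengths and uneven coverage would need a different K in each place, which is the "no best K value" problem.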
8
Why abandon deBruijn. Key: coverage is not uniformly distributed in RNA assembly. – No best K value. Can we run the program many times with different K values? That is not the job of a de novo assembler. – Time. – It should provide highly accurate contigs within a limited time. – Scaffolding programs.
9
Why abandon Extended deBruijn. My Extended de Bruijn method: – Use two or more K values at the same time.
10
Why abandon Extended deBruijn. The rate of change of coverage is higher than I expected. Many K values are needed. Converting between different K values is difficult. Memory problem for large K: when K > 32, each K-index needs > 50G (with a 10G data set). Throw the K away.
11
Introduction to the new method. Based on Pramila's genome assembly method. Start from any tag and correct it. If the correction succeeds, continue.
12
Introduction to the new method. Find all the tags that overlap by at least 24 bp (magic number). Use these overlapping tags to extend the base, then continue adding more tags.
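A minimal sketch of one extension step, under stated assumptions: overlaps are matched exactly (the slides allow mismatches), and the next base is chosen by a simple majority vote among the overlapping tags. The vote rule and the names `overlap_len` and `extend_once` are hypothetical, not taken from the talk.

```python
def overlap_len(base, tag, min_overlap):
    """Length of the longest suffix of `base` equal to a prefix of `tag`.

    Returns 0 if that overlap is shorter than `min_overlap`. The tag must
    reach at least one base past the end of `base` to be useful.
    """
    longest = min(len(base), len(tag) - 1)
    for n in range(longest, min_overlap - 1, -1):
        if base.endswith(tag[:n]):
            return n
    return 0

def extend_once(base, tags, min_overlap=24):
    """Append one base chosen by majority vote of the overlapping tags.

    Returns `base` unchanged when no tag overlaps by `min_overlap` or more.
    """
    votes = {}
    for tag in tags:
        n = overlap_len(base, tag, min_overlap)
        if n:
            next_base = tag[n]  # first base of the tag beyond the overlap
            votes[next_base] = votes.get(next_base, 0) + 1
    if not votes:
        return base
    return base + max(votes, key=votes.get)

# Toy usage with a small threshold so the strings stay readable.
base = "TTACGG"
tags = ["ACGGT", "TACGGT", "ACGGA", "GGGGG"]
print(extend_once(base, tags, min_overlap=4))
```

Calling `extend_once` in a loop until it returns its input unchanged gives the "extend and continue adding tags" behaviour in its simplest form.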
13
Introduction to the new method. How to find the overlapping tags quickly while allowing mismatches? Index and union: {Tag3}, {Tag2, Tag3}, {Tag3, Tag4} Union => {Tag1, Tag2, Tag3, Tag4}
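One way to read "index and union" is as a seed lookup: cut the query into short pieces, look each piece up in an exact-match index, and take the union of the hits. A tag that differs from the query by a few mismatches still shares at least one untouched piece, so it survives into the union. The sketch below assumes that reading; the seed length, the function names, and the choice to index every substring of every tag are all hypothetical.

```python
from collections import defaultdict

def build_seed_index(tags, seed_len):
    """Map every substring of length `seed_len` to the ids of tags containing it."""
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for i in range(len(tag) - seed_len + 1):
            index[tag[i:i + seed_len]].add(tag_id)
    return index

def candidate_tags(index, query, seed_len):
    """Union of the tag ids hit by the non-overlapping seeds of `query`."""
    hits = set()
    for i in range(0, len(query) - seed_len + 1, seed_len):
        hits |= index.get(query[i:i + seed_len], set())
    return hits

# Toy usage: tag 0 differs from the query by one base but is still found.
tags = ["ACGTACGTAA", "ACGTTCGTAA", "GGGGGGGGGG"]
index = build_seed_index(tags, seed_len=4)
print(candidate_tags(index, "ACGTTCGT", seed_len=4))
```

The union only yields candidates. Each one still has to be aligned against the base to confirm the overlap and count the actual mismatches.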
14
Introduction to the new method. How to find the next overlapping tags quickly while allowing mismatches? V1 <= U3; V2 <= (U1 << 1) + 0; V3 <= (U2 << 1) + 0
15
Handle the old problem. What if the overlapping part is shorter than 24 bp?
16
Handle the old problem. Check the tags one by one, in descending order of overlap length.
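The ordering step can be sketched as follows, assuming exact suffix-prefix overlaps and no stopping rule (the slides do not state one). The function names are hypothetical.

```python
def suffix_prefix_overlap(base, tag):
    """Length of the longest suffix of `base` that is also a prefix of `tag`."""
    for n in range(min(len(base), len(tag)), 0, -1):
        if base.endswith(tag[:n]):
            return n
    return 0

def tags_by_overlap(base, tags):
    """Return (overlap, tag) pairs, longest overlap first, dropping non-overlapping tags."""
    scored = [(suffix_prefix_overlap(base, tag), tag) for tag in tags]
    scored = [(n, tag) for n, tag in scored if n > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy usage: the caller would examine the tags in this order.
for n, tag in tags_by_overlap("TTACGG", ["GGA", "ACGGT", "CCC", "TACGGA"]):
    print(n, tag)
```

Visiting the longest overlaps first means the most specific evidence is seen before the short, more ambiguous overlaps.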
17
Handle the old problem.
Overlap   A Count   A %       G Count   G %
60        1         6.67%     1         4.76%
52        3         20.00%    1         4.76%
44        6         40.00%    2         9.52%
36        10        66.67%    10        47.62%
30        11        73.33%    16        76.19%
24        15        100.00%   21        100.00%
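The percentage columns are consistent with reading each count as cumulative (the number of tags with at least that much overlap) and dividing by the count in the last row. That reading is an assumption, not something the slide states. A short check on the A and G counts from the table above:

```python
def cumulative_percent(counts):
    """Express each cumulative count as a percentage of the final total."""
    total = counts[-1]
    return [round(100.0 * count / total, 2) for count in counts]

# Counts per overlap threshold (60, 52, 44, 36, 30, 24), copied from the table.
a_counts = [1, 3, 6, 10, 11, 15]
g_counts = [1, 1, 2, 10, 16, 21]

print(cumulative_percent(a_counts))
print(cumulative_percent(g_counts))
```

Read this way, the table shows how the support for each candidate base grows as shorter overlaps are admitted: A leads at the long overlaps, while G only catches up once overlaps of 36 bp and below are counted.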
18
Handle the old problem. (G: High Exp)
Overlap   A Count   A %       G Count   G %
56        1         6.67%     5         2.50%
50        3         20.00%    10        5.00%
44        6         40.00%    20        10.00%
36        10        66.67%    120       60.00%
30        11        73.33%    150       75.00%
24        15        100.00%   200       100.00%
19
Handle the old problem. Degree of approximation.
20
Handle the old problem. Fewer tips. No bubbles. – Because overlaps are computed with mismatches allowed. – Whole tags are used.
21
The new problem. Speed. The tail of a tag often has more errors. – Reverse Extending Problem.
22
Todo. Handle the Reverse Extending Problem. Speed. Finish the comparison between the deBruijn method (Velvet) and my method. Paired End Tag.
23
Thank you very much for your attention.