

RNA Sequence Assembly WEI Xueliang

Overview: Sequence Assembly; Current Method; My Method; RNA Assembly; To Do

Sequence Assembly. Goal: reconstruct the DNA/RNA sequence. A sequencing machine cannot read a whole genome in one go. Instead it reads small pieces between 20 and 1000 bases long. Terminology: read = tag = fragment.

De novo sequence assembly

MAPPING. Map each tag to a reference genome (resequencing).
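Below is a minimal sketch of mapping. It assumes exact matches only, whereas real mappers also tolerate mismatches and indels. The approach is to index every tag-length substring of the reference in a dictionary and then look each tag up. The function names and the toy reference are hypothetical.

```python
from collections import defaultdict

def build_index(reference: str, tag_len: int) -> dict[str, list[int]]:
    """Map every substring of length tag_len to its start positions in the reference."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(reference) - tag_len + 1):
        index[reference[i:i + tag_len]].append(i)
    return index

def map_tags(reference: str, tags: list[str]) -> dict[str, list[int]]:
    """Exact-match positions of each tag (all tags assumed to have the same length)."""
    index = build_index(reference, len(tags[0]))
    # .get does not insert missing keys into the defaultdict.
    return {tag: index.get(tag, []) for tag in tags}

reference = "TGACGTAGCTATGTATTTTG"  # hypothetical reference
print(map_tags(reference, ["ACGTAG", "TATTTT", "GGGGGG"]))
# {'ACGTAG': [2], 'TATTTT': [13], 'GGGGGG': []}
```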


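De novo sequence assembly. Calculating the overlaps between tags takes a huge amount of time, because the number of tag pairs to compare grows quadratically with the number of tags.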
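The sketch below shows why the cost grows so quickly. It is a naive overlap computation meant as an illustration, not as any particular assembler's implementation. Every ordered pair of tags is compared, so n tags require n*(n-1) comparisons. The tags and the minimum overlap length are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def all_pairs_overlaps(tags: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Compare every ordered pair: n*(n-1) comparisons, each O(L^2) in this naive form."""
    overlaps = {}
    for i, a in enumerate(tags):
        for j, b in enumerate(tags):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen:
                    overlaps[(i, j)] = olen
    return overlaps

tags = ["TGACGTAG", "CGTAGCTA", "GCTATGTA"]  # hypothetical tags
print(all_pairs_overlaps(tags))  # {(0, 1): 5, (1, 2): 4}
```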

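DE BRUIJN GRAPH. K-mer: a substring of length k of a tag. Each node has at most 4 outgoing edges, one per possible next base. Hashing the node: encode A, C, G, T as the base-4 digits 0, 1, 2, 3, so "CTG" => $(132)_4 = (30)_{10}$. Following the edge "CTG" => "TGG" takes three steps. First shift left, $(132)_4 \to (1320)_4$. Then take the result modulo $(1000)_4$ to drop the first base, giving $(320)_4$. Finally add $(2)_4$ for the new 'G', giving $(322)_4 = (58)_{10}$.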
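Here is a sketch of this node hashing in Python, assuming the A=0, C=1, G=2, T=3 encoding above. The function names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(kmer: str) -> int:
    """Read the k-mer as a base-4 number, e.g. "CTG" -> (132)_4 = 30."""
    value = 0
    for base in kmer:
        value = value * 4 + BASE_CODE[base]
    return value

def roll(value: int, k: int, next_base: str) -> int:
    """Successor code: shift left, drop the leading digit (mod 4^k), add the new base."""
    return (value * 4) % (4 ** k) + BASE_CODE[next_base]

def successors(value: int, k: int) -> list[int]:
    """A node has at most 4 out-edges, one per possible next base."""
    return [roll(value, k, b) for b in "ACGT"]

ctg = encode("CTG")
print(ctg)                # 30
print(roll(ctg, 3, "G"))  # 58, i.e. (322)_4 = "TGG"
print(encode("TGG"))      # 58
```

Because 4 is a power of two, `(value * 4) % (4 ** k)` is the same as `(value << 2) & (4 ** k - 1)`. Each base therefore costs 2 bits, and moving to the next node is a constant-time update.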

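DE BRUIJN GRAPH (CONT’). Suppose the sequence contains a repeat such as "GACT". The 3-mer de Bruijn graph then cannot tell which of the paths leaving the repeat is the correct one. With 6-mers, which are longer than the repeat, the correct sequence can be recovered. In general, a larger K resolves longer repeats.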
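The sketch below makes the repeat problem concrete. It assumes an error-free, hypothetical sequence in which "GACT" occurs twice and every k-mer is observed. It builds the k-mer graph described above and lists the nodes with more than one outgoing edge, that is, the places where extension is ambiguous.

```python
def kmer_graph(seq: str, k: int) -> dict[str, set[str]]:
    """Nodes are the distinct k-mers; edge u -> v if v = u[1:] + base is also a k-mer."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {u: {u[1:] + b for b in "ACGT" if u[1:] + b in kmers} for u in kmers}

def branching_nodes(seq: str, k: int) -> list[str]:
    """k-mers with more than one possible next k-mer."""
    graph = kmer_graph(seq, k)
    return sorted(u for u, nexts in graph.items() if len(nexts) > 1)

seq = "TTGACTCAAGACTGG"  # hypothetical sequence; "GACT" occurs twice
print(branching_nodes(seq, 3))  # ['ACT', 'CTG', 'TTG']
print(branching_nodes(seq, 6))  # []
```

At k = 3, node ACT can continue to CTC or CTG, which is exactly the ambiguity after the two copies of GACT. At k = 6, every node has a single successor.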

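De novo sequence assembly. Suppose we use K = length of tag (20-mers). After the tag TGACGTAGCTATGTATTTTG, the next 20-mer, GACGTAGCTATGTATTTTGT, exists only if some tag starts exactly one base later. Here no tag contains it (no 20-mer), so the extension stops. Coverage is not high enough to support such a large K.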
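The coverage requirement can be checked with a sketch like the one below. It assumes a hypothetical 12-bp sequence sampled by 6-bp tags that start every 2 bases, rather than at every position. A k-mer walk breaks wherever a k-mer of the sequence is absent from all tags.

```python
def kmers_of(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def missing_kmer_positions(genome: str, tags: list[str], k: int) -> list[int]:
    """Positions i where genome[i:i+k] occurs in no tag, i.e. where the k-mer walk breaks."""
    observed = set().union(*(kmers_of(t, k) for t in tags))
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] not in observed]

genome = "TGACGTAGCTAT"                           # hypothetical 12-bp sequence
tags = [genome[s:s + 6] for s in range(0, 7, 2)]  # 6-bp tags starting every 2 bases
for k in (4, 5, 6):
    print(k, missing_kmer_positions(genome, tags, k))
# 4 []
# 5 []
# 6 [1, 3, 5]
```

When K equals the tag length, each tag contributes only one k-mer, so a consecutive k-mer exists only where a tag happens to start. A smaller K tolerates gaps between tag start positions.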


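MY METHOD. Tag length = 6, K = 3. When we have AAGACT, which base comes next? Try every way the 3-mer graph allows: AAGACTC, AAGACTT, AAGACTG. Then check the tags, looking at the last 5 bases plus the new base. AGACTC is a tag, so the correct way is AAGACT + C = AAGACTC.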
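Below is a minimal sketch of this extension step. It assumes exact, error-free tags stored in a Python set; the tag set and function names are hypothetical. The 3-mer graph alone allows three extensions of AAGACT. Requiring the last 5 bases plus the new base to form an observed 6-bp tag leaves only C.

```python
def kmer_candidates(contig: str, tags: set[str], k: int) -> list[str]:
    """Bases allowed by the k-mer graph: contig[-(k-1):] + base must occur inside some tag."""
    suffix = contig[-(k - 1):]
    return [b for b in "ACGT" if any(suffix + b in t for t in tags)]

def tag_checked_candidates(contig: str, tags: set[str], tag_len: int) -> list[str]:
    """Keep only bases for which the last (tag_len - 1) bases plus the base form a whole tag."""
    window = contig[-(tag_len - 1):]
    return [b for b in "ACGT" if window + b in tags]

tags = {"AAGACT", "AGACTC", "TTCTTA", "GGCTGA"}  # hypothetical 6-bp tags
contig = "AAGACT"
print(kmer_candidates(contig, tags, k=3))               # ['C', 'G', 'T']
print(tag_checked_candidates(contig, tags, tag_len=6))  # ['C']
```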


RNA ASSEMBLY

ALTERNATIVE SPLICING. The splicing graph, and all the cDNA sequences (isoforms) it encodes.
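To make the idea concrete, the sketch below assumes a hypothetical gene whose exons form an acyclic splicing graph, with exons as nodes and splice junctions as edges. Each path from the first exon to the last spells one cDNA sequence.

```python
def enumerate_isoforms(exons: dict[str, str], edges: dict[str, list[str]],
                       start: str, end: str) -> list[str]:
    """Every start-to-end path in the acyclic splicing graph spells one cDNA sequence."""
    isoforms: list[str] = []

    def walk(node: str, path: list[str]) -> None:
        path = path + [node]
        if node == end:
            isoforms.append("".join(exons[n] for n in path))
            return
        for nxt in edges.get(node, []):
            walk(nxt, path)

    walk(start, [])
    return isoforms

# Hypothetical gene: exon E2 may be skipped (exon skipping).
exons = {"E1": "ATGGC", "E2": "TTAC", "E3": "GGATAA"}
edges = {"E1": ["E2", "E3"], "E2": ["E3"]}
print(enumerate_isoforms(exons, edges, "E1", "E3"))
# ['ATGGCTTACGGATAA', 'ATGGCGGATAA']
```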

RNA ASSEMBLY’S PROBLEM. Merge? Index the sequence.

RNA ASSEMBLY’S PROBLEM (CONT’). Solution?

RNA ASSEMBLY’S PROBLEM (CONT’). Index tags.

RNA ASSEMBLY’S PROBLEM (CONT’). Solution? Speed?
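One possible reading of "index tags" is sketched below; this is an assumption, since the slides do not spell out the data structure. Each tag is keyed by its first L-1 bases, so the tag check in the extension step becomes one hash lookup rather than a scan over all tags. When more than one base is supported, as for AAGACT below, the assembly forks, for example at an alternative splice site. The tags and function names are hypothetical.

```python
from collections import defaultdict

def index_tags(tags: list[str]) -> dict[str, set[str]]:
    """Map each tag's first (L-1) bases to the set of bases that follow them."""
    index: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        index[tag[:-1]].add(tag[-1])
    return index

def next_bases(contig: str, index: dict[str, set[str]], tag_len: int) -> list[str]:
    """Extensions supported by a whole tag, found with a single dictionary lookup."""
    return sorted(index.get(contig[-(tag_len - 1):], set()))

tags = ["AAGACT", "AGACTC", "AGACTG", "GACTCA"]  # hypothetical 6-bp tags
index = index_tags(tags)
print(next_bases("AAGACT", index, 6))  # ['C', 'G']
print(next_bases("TTTTTT", index, 6))  # []
```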

SINGLE TAG’S LIMITATION. If |yellow sequence| >= length of tag (in bp), no single tag covers the yellow sequence together with the bases on both sides of it. A single tag is not enough!
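The slide's figure is missing. If the yellow sequence is read as a segment shared by two isoforms (an assumption), the limitation can be reproduced with the sketch below. It uses hypothetical isoforms with idealised, error-free tags at every position. Once the contig reaches the end of the shared segment, short tags support both continuations. Only tags long enough to span the segment plus one base on each side pick the right one.

```python
def all_tags(seqs: list[str], tag_len: int) -> set[str]:
    """Every tag_len window of every sequence (idealised full coverage, no errors)."""
    return {s[i:i + tag_len] for s in seqs for i in range(len(s) - tag_len + 1)}

def supported_bases(contig: str, tags: set[str], tag_len: int) -> list[str]:
    """Bases b such that the last (tag_len - 1) bases of the contig plus b form a tag."""
    window = contig[-(tag_len - 1):]
    return [b for b in "ACGT" if window + b in tags]

shared = "TTGCA"              # hypothetical shared ("yellow") segment, 5 bp
iso1 = "CC" + shared + "AA"   # hypothetical isoform 1
iso2 = "GG" + shared + "TT"   # hypothetical isoform 2
contig = "CC" + shared        # assembled from iso1 up to the end of the shared segment

print(supported_bases(contig, all_tags([iso1, iso2], 4), 4))  # ['A', 'T']: ambiguous
print(supported_bases(contig, all_tags([iso1, iso2], 8), 8))  # ['A']: a tag spans the segment
```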

DATASET: PAIRED-END TAGS (PETs). The fragment length is usually > 1 kb, but some RNA sequences are shorter than 1 kb.

TO DO. Handle large datasets (10G). Improve accuracy. Use PET data.

Thanks!!