Jan Pačes Institute of Molecular Genetics AS CR


1 hard assembly
Jan Pačes, Institute of Molecular Genetics AS CR

2 problems
genomes: high GC content; repetitions (short, with low informational content, and long); polymorphic; "unreadable" sequences; "weird" structures
technologies: nonrandom libraries; wrong sizes; erroneous or chimeric reads

3 sequencing technologies
ABI (Sanger); 454 (pyrosequencing); Solexa (reversible terminator); SOLiD (2-base ligation); PacBio (SMRT)

4 example of errors in one technology

5 high GC regions are underrepresented
Aird et al. Genome Biology 2011

6 protocol optimization for high GC content
Aird et al. Genome Biology 2011
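
A rough way to check for the GC bias described on slides 5 and 6 is to compute GC content in windows along a reference and compare it with read coverage in the same windows. The sketch below is only an illustration: the function names, the window size and the binning are assumptions, not anything taken from the slides or from Aird et al.

```python
def gc_fraction(seq):
    """Return the fraction of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window=100, step=50):
    """Yield (start, gc_fraction) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def mean_coverage_by_gc(gc_and_coverage, n_bins=10):
    """Group (gc, coverage) pairs into equal-width GC bins.

    Returns {bin_index: mean coverage}; a drop in the highest bins
    would suggest that GC-rich regions are underrepresented.
    """
    sums = {}
    counts = {}
    for gc, coverage in gc_and_coverage:
        # gc == 1.0 would fall outside the last bin, so clamp it
        b = min(int(gc * n_bins), n_bins - 1)
        sums[b] = sums.get(b, 0.0) + coverage
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}
```

Per-window coverage would have to come from an aligner and a depth tool of your choice; the code above only handles the GC side and the binning.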

7 repetitions
scaffold, repetition (figure labels)

8 repetitions

9 repetitions recognition
RepeatMasker
RepeatModeler (RECON and RepeatScout)
position-aware assemblers: MIRA, MaSuRCA, SPAdes

10 k-mer distribution

11 k-mer analysis
JELLYFISH: fast, parallel k-mer counting for DNA
Quake: a package to correct substitution sequencing errors in experiments with deep coverage
khmer: trim off likely erroneous k-mers
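
The idea behind the k-mer distribution and k-mer based trimming can be sketched in a few lines. This is a toy illustration, not how JELLYFISH, Quake or khmer are implemented: it keeps all counts in an in-memory `Counter`, ignores reverse complements, and the function names and the `min_count` threshold are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (without N) across all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map abundance -> number of distinct k-mers with that abundance."""
    return Counter(counts.values())


def trim_at_low_abundance(read, counts, k, min_count=2):
    """Cut the read just before the last base of its first rare k-mer.

    K-mers seen fewer than min_count times are treated as likely errors.
    """
    read = read.upper()
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] < min_count:
            return read[:i + k - 1]
    return read
```

In a spectrum from real data, the peak at abundance 1 or 2 is usually dominated by sequencing errors, the main peak reflects the sequencing depth, and peaks at multiples of the main peak point to repeats.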

12 repetitions
repetition, scaffold (figure labels)

13 filling gaps
GapCloser (part of SOAPdenovo)
GapFiller (part of SSPACE)

14 454 multiplicates

15 contig coverage by large libraries

16 illumina pe and mate-pairs libraries

17 highly polymorphic genomes
two copies of polymorphic contigs, scaffold (figure labels)

18 polymorphic assembly workflow
normal assembly
condensing alternative contigs
mapping to identify SNPs
"repair" reads
second "polymorphic" assembly

19

20 G-quadruplex

21 Chicken p53 – coverage from RNAseq data
AGCGACCCCCCCCCACCACCGCCACCACCACCTCTGCCATTGGCCGCCGCCGCCCCCCCCCCATTAAACCCCCCCACCCCCCCCCGCGCTGCCCCCTCCCCGGTGG
Coverage > 13,000X

22 Chicken erythropoietin (EPO) – coverage from RNAseq data
CCCGCCCACCCCCACCCCCACCCGCACCCCCCACTCTCCCACCCCCACCCCCTTTTCTCCCACCCCCTCTTCTCCCACCCCCTTTTCCCCCCCTTCCTCCCCCCACTCCG
CCCCCCCCCCGCCCCCTCCCCCCCCCCAGGTGAGGACCCT
Coverage > 500X from RNAseq
(*EPO locus not completed even from 1000X coverage genomic Illumina data!)
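
The C-rich tracts in the p53 and EPO examples are G-rich on the complementary strand, which is the kind of sequence where G-quadruplexes can form. A minimal scan for candidate motifs on both strands might look like the sketch below. The pattern (four runs of at least three G separated by loops of 1 to 7 bases) is a commonly used heuristic and is an assumption here; the slides do not state which definition was used, and a regex hit is only a candidate, not evidence that a structure forms.

```python
import re

# Assumed heuristic: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_g4_candidates(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping hits.

    Coordinates are 0-based, end-exclusive, and always given on the
    input (plus) strand, also for hits found on the minus strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), m.end(), "+") for m in G4_PATTERN.finditer(seq)]
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        # convert minus-strand coordinates back to the plus strand
        hits.append((n - m.end(), n - m.start(), "-"))
    return sorted(hits)
```

Calling `find_g4_candidates` on the sequences from slides 21 and 22 would report any hits with strand "-", since those sequences are shown in their C-rich orientation.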

23 chicken missing genes

24 that’s it, thank you
many thanks also to: Daniel Elleder, Tomáš Hron, Michal Kolář, Hynek Strnad

