Microbial Genome Assembly
Pamela Ferretti, Laboratory of Computational Metagenomics, Centre for Integrative Biology, University of Trento, Italy

Slide 2: Outline
1. Quick introduction
2. Genome assembly
3. Assembly strategies
4. Case study

Slide 3: DNA packaging

Slide 5: Outline (section 2: Genome assembly)

Slide 6: Next-generation sequencing
[Figure: short sequencing reads (TCTTATTGTGACC, TAGGCTAGCTTAG, GCAATGCAGTAAC, TCCAGCTAGGTTC) drawn from a longer sequence (ACGTAGGCTAGCGTTAGCGA...)]

Slide 7: Genome assembly
Workflow: 1. Genome sequencing, 2. Preliminary analysis, 3. Assembly, 4. Advanced bioinformatic analysis
[Figure: overlapping sequence alignment]

Slide 8: On the feasibility of sequence assembly
Weber and Myers: sequencing the human genome with shotgun sequencing + assembly is the only feasible strategy. (Weber, James L., and Eugene W. Myers. "Human whole-genome shotgun sequencing." Genome Research 7.5 (1997): 401-409.)
Green: computational assembly of shotgun sequencing data is simply infeasible, and a bad idea anyway. (Green, Philip. "Against a whole-genome shotgun." Genome Research 7.5 (1997): 410-417.)
They were both right! (…well, Weber and Myers were a bit more right from the practical viewpoint…)

Slide 9: Outline (section 3: Assembly strategies)

Slide 10: Genome assembly strategies
- Greedy approach → SSAKE
- De Bruijn graph (DBG) → Velvet, SOAPdenovo
- Overlap-Layout-Consensus (OLC) → MIRA
- Mixed approaches → MaSuRCA
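To make the greedy strategy concrete, here is a minimal Python sketch of the classic greedy merge: repeatedly join the two sequences with the longest suffix-prefix overlap until no overlap is left. It is a generic textbook heuristic, not SSAKE's actual algorithm; the function names, the minimum overlap of 3 bases and the toy reads are hypothetical choices for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a suffix-prefix match
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Reads fully contained in another read add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:  # no overlaps left: the remaining sequences are the contigs
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATTAGACC", "AGACCTGC", "CTGCCGGA", "CCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Because each merge is chosen locally, a greedy assembler can commit to a wrong join when a repeat produces a long but misleading overlap; the toy reads above contain no repeats.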

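Slide 11: Genome assembly strategies: de Bruijn graph approach (DBG)
Assemblers: Velvet, SOAPdenovo2
Nodes = (k-1)-mers, i.e. overlapping subsequences of uniform length taken from the reads
Edges = k-mers (the distinct length-k subsequences within the reads), each joining its (k-1)-mer prefix to its (k-1)-mer suffix
Assembly = finding an Eulerian path (a path that uses every edge exactly once)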
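A minimal Python sketch of the de Bruijn graph idea, assuming error-free reads and a repeat-free toy genome (both hypothetical): every distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and Hierholzer's algorithm finds the Eulerian path that spells the sequence. Real assemblers such as Velvet or SOAPdenovo2 add error correction, graph simplification and much more, none of which is modelled here; the function names are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; every distinct k-mer becomes one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # each k-mer is stored only once
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes the graph actually has an Eulerian path."""
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # A path starts at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: node is finished
    return path[::-1]


def spell(path: list[str]) -> str:
    """Glue consecutive (k-1)-mers, which overlap by all but one character."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATTAGACC", "AGACCTGC", "CTGCCGGA", "CCGGAATAC"]
print(spell(eulerian_path(de_bruijn_graph(reads, k=4))))  # ATTAGACCTGCCGGAATAC
```

Finding the Eulerian path takes time linear in the number of edges, which is one reason the DBG formulation scales to large short-read data sets.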

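Slide 12: Genome assembly strategies: Overlap-Layout-Consensus approach (OLC)
Assembler: MIRA
Nodes = reads
Edges = overlaps between reads
Steps: 1. Overlap, 2. Layout, 3. Consensus
Assembly = finding a Hamiltonian path (a path that visits every node exactly once)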
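The three OLC steps can be sketched in Python on a toy input. This illustrates the general scheme under stated assumptions, not MIRA's method: overlaps are exact suffix-prefix matches of at least 3 bases, the layout is found by brute force over all read orders (feasible only for a handful of reads, which is the practical face of the Hamiltonian path problem), and the consensus is a per-column majority vote. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def overlap_stage(reads: list[str], min_len: int) -> dict[tuple[str, str], int]:
    """1. Overlap: one weighted edge (a, b) per suffix-prefix overlap."""
    edges = {}
    for a in reads:
        for b in reads:
            if a != b:
                olen = overlap(a, b, min_len)
                if olen:
                    edges[(a, b)] = olen
    return edges


def layout_stage(reads, edges):
    """2. Layout: brute-force Hamiltonian path maximising total overlap (exponential time)."""
    best, best_score = None, -1
    for order in permutations(reads):
        pairs = list(zip(order, order[1:]))
        if all(p in edges for p in pairs):
            score = sum(edges[p] for p in pairs)
            if score > best_score:
                best, best_score = order, score
    return best  # None if no path visits every read


def consensus_stage(order, edges) -> str:
    """3. Consensus: place reads at their offsets, then majority vote per column."""
    offsets = [0]
    for a, b in zip(order, order[1:]):
        offsets.append(offsets[-1] + len(a) - edges[(a, b)])
    length = max(off + len(r) for off, r in zip(offsets, order))
    columns = [Counter() for _ in range(length)]
    for off, read in zip(offsets, order):
        for i, base in enumerate(read):
            columns[off + i][base] += 1
    return "".join(col.most_common(1)[0][0] for col in columns)


reads = ["ATTAGACC", "AGACCTGC", "CTGCCGGA", "CCGGAATAC"]
edges = overlap_stage(reads, min_len=3)
order = layout_stage(reads, edges)
print(consensus_stage(order, edges))  # ATTAGACCTGCCGGAATAC
```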

Slide 13: Genome assembly strategies

Slide 14: Genome assembly strategies: DBG vs OLC
Advantages
  DBG: very sensitive to repeats; each k-mer stored just once; Eulerian cycle (found in linear time); never explicitly computes pairwise alignments
  OLC: modular algorithmic design; flexibility and robustness
Disadvantages
  DBG: sensitive to sequencing errors (each error creates new k-mers); large memory requirements
  OLC: Hamiltonian cycle (NP-hard in general); overlap stage is time-consuming; genome-size limitations
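The DBG sensitivity to sequencing errors can be shown with a few lines of Python: a single substitution at least k bases from both read ends turns every k-mer that covers it into a new k-mer, so one error can add up to k spurious k-mers (and graph edges). The sequence and the error position below are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(true_seq: str, observed: str, k: int) -> set[str]:
    """k-mers present in the observed read but absent from the true sequence."""
    return kmers(observed, k) - kmers(true_seq, k)


true_seq = "ATTAGACCTGCCGGAATAC"
# Hypothetical read with one substitution error (G -> A) at position 9.
observed = true_seq[:9] + "A" + true_seq[10:]

for k in (4, 8):
    print(k, len(novel_kmers(true_seq, observed, k)))
# 4 4
# 8 8
```

The count grows with k, which is one reason DBG assemblers usually filter low-frequency k-mers or correct reads before building the graph.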

Slide 15: Genome assembly strategies (recap): greedy approach → SSAKE; de Bruijn graph (DBG) → Velvet, SOAPdenovo; Overlap-Layout-Consensus (OLC) → MIRA; mixed approaches → MaSuRCA

Slide 16: Genome assemblers
Metrics compared: average coverage, number of contigs, number of contigs > 1 kb, N50 contig size, fraction of reads assembled, total consensus (nt), number of scaffolds, N50 scaffold size
Ion Torrent PGM → MIRA 3.9
Illumina → MaSuRCA
For Illumina data, MIRA 3.9 also produced good-quality results, but it has a longer execution time and becomes unstable with large numbers of short reads.
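Several of the metrics compared on this slide are simple functions of the contig lengths. A hedged Python sketch, assuming the usual N50 definition (the contig length at which the longest contigs first cover at least half of all assembled bases) and approximating "total consensus" as the sum of contig lengths; the lengths are made-up numbers, not results from this study. The same n50 function applies to scaffold lengths for the N50 scaffold size.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def contig_stats(lengths: list[int]) -> dict[str, int]:
    """Contig-level metrics; total consensus is approximated by the summed contig length."""
    return {
        "contigs": len(lengths),
        "contigs_over_1kb": sum(1 for n in lengths if n > 1000),
        "total_consensus_nt": sum(lengths),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths (nt), not data from the case study.
print(contig_stats([5000, 3000, 2000, 1500, 800, 400, 300]))
# {'contigs': 7, 'contigs_over_1kb': 4, 'total_consensus_nt': 13000, 'n50': 3000}
```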

Slide 17: Outline (section 4: Case study)

Slide 18: Mycobacteria assembly: case study
Mycobacteria are responsible for many animal and human diseases:
- M. tuberculosis and M. leprae (TM)
- M. fortuitum (NTM, non-tuberculous mycobacteria): outbreak linked to a nail salon (2002)
- M. chelonae (NTM): outbreak linked to face lifts (2004)
Data: Illumina HiSeq sequencing (NGS Facility, CIBIO/UNITN) of twenty mycobacterial strains from 20 different Mycobacterium species, assembled with MaSuRCA
Towards novel clinical tests for mycobacteria detection

Slide 19: Raw data quality assessment and pre-processing
The fastq-mcf tool removes:
- poor-quality read ends
- Ns, duplicates and sequencing adapters
- reads that are too short
Data reduction: up to 73%
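The filtering steps listed above can be sketched in Python. This is a simplified illustration, not how fastq-mcf is implemented: it assumes Phred+33 quality strings, exact adapter matching, quality trimming from the 3' end only, and hypothetical thresholds (Q20, 30 nt minimum length). The adapter AGATCGGAAGAGC is a commonly cited Illumina adapter prefix, used here only as an example.

```python
def phred(ch: str, offset: int = 33) -> int:
    """Convert one FASTQ quality character to a Phred score (Phred+33 assumed)."""
    return ord(ch) - offset


def clean_read(seq: str, qual: str, adapter: str, min_q: int) -> tuple[str, str]:
    # 1. Cut the read at the first exact occurrence of the adapter.
    pos = seq.find(adapter)
    if pos != -1:
        seq, qual = seq[:pos], qual[:pos]
    # 2. Trim low-quality bases and Ns from the 3' end.
    end = len(seq)
    while end > 0 and (phred(qual[end - 1]) < min_q or seq[end - 1] == "N"):
        end -= 1
    # 3. Trim Ns from the 5' end.
    start = 0
    while start < end and seq[start] == "N":
        start += 1
    return seq[start:end], qual[start:end]


def preprocess(reads, adapter="AGATCGGAAGAGC", min_q=20, min_len=30):
    """Clean each (sequence, quality) pair, then drop short reads and exact duplicates."""
    seen: set[str] = set()
    kept = []
    for seq, qual in reads:
        seq, qual = clean_read(seq, qual, adapter, min_q)
        if len(seq) < min_len or seq in seen:
            continue
        seen.add(seq)
        kept.append((seq, qual))
    return kept


reads = [
    ("ACGT" * 10, "I" * 40),                               # good read: kept
    ("ACGT" * 10, "I" * 40),                               # exact duplicate: dropped
    ("ACGT" * 5 + "AGATCGGAAGAGC" + "ACGTACG", "I" * 40),  # adapter cut -> 20 nt: dropped
    ("TTGCA" * 8, "I" * 32 + "#" * 8),                     # Q2 tail trimmed -> 32 nt: kept
]
kept = preprocess(reads)
print(len(kept), [len(s) for s, _ in kept])  # 2 [40, 32]
```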

Slide 20: Assembly parameter settings
K-mers: strings of a given length k, shorter than the entire reads
Best empirical k-mer length: 91 bases
High coverage
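Longer k-mers help resolve repeats, but each read contributes fewer of them. A common rule of thumb, given for example in the Velvet manual, relates base coverage C, read length L and k-mer coverage C_k as C_k = C (L - k + 1) / L. The Python sketch below assumes hypothetical 150-nt reads at 100x coverage, not the actual run parameters; it shows that at k = 91 the k-mer coverage is half of that at k = 31, which is why such long k-mers are generally workable only with high coverage.

```python
def kmers_per_read(read_length: int, k: int) -> int:
    """A read of length L contains L - k + 1 overlapping k-mers (0 if k > L)."""
    return max(read_length - k + 1, 0)


def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage C_k = C * (L - k + 1) / L."""
    return base_coverage * kmers_per_read(read_length, k) / read_length


# Hypothetical values: 100x base coverage, 150-nt reads.
for k in (31, 91):
    print(k, kmers_per_read(150, k), kmer_coverage(100, 150, k))
# 31 120 80.0
# 91 60 40.0
```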

Slide 21: MaSuRCA results for the mycobacteria
Abnormal GC content
Genome size too high
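Both warning signs on this slide can be checked automatically. A minimal Python sketch, assuming the expected genome size and GC range for the species are known (here replaced by toy values scaled down to a 100 bp example): an assembly that is both too large and shifted in GC content is consistent with foreign sequence having been assembled alongside the target genome. Function names, ranges and contigs are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def check_assembly(contigs: list[str], size_range: tuple[int, int],
                   gc_range: tuple[float, float]) -> list[str]:
    """Return warnings when total size or overall GC fall outside the expected ranges."""
    total = sum(len(c) for c in contigs)
    gc = gc_content("".join(contigs))
    warnings = []
    lo, hi = size_range
    if not lo <= total <= hi:
        warnings.append(f"assembly size {total} bp outside {lo}-{hi} bp")
    lo_gc, hi_gc = gc_range
    if not lo_gc <= gc <= hi_gc:
        warnings.append(f"GC content {gc:.1%} outside {lo_gc:.0%}-{hi_gc:.0%}")
    return warnings


# Toy contigs and ranges, scaled down for illustration (hypothetical values).
myco_like = "GCGCGCGCGCGCGATATATA" * 3  # 60 bp, 65% GC
contaminant = "ATATATGC" * 5            # 40 bp, 25% GC
print(check_assembly([myco_like], (30, 80), (0.60, 0.70)))  # []
print(check_assembly([myco_like, contaminant], (30, 80), (0.60, 0.70)))
# ['assembly size 100 bp outside 30-80 bp', 'GC content 49.0% outside 60%-70%']
```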

Slide 22: GC content-based quality analysis
Examples of environmental contamination: Staphylococcus epidermidis
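Contamination such as S. epidermidis can be spotted per contig because mycobacterial genomes are GC-rich, while Staphylococcus genomes are AT-rich. A hedged Python sketch that flags contigs whose GC fraction differs from an expected value by more than a tolerance; the expected value (0.65), the tolerance (0.08), the function names and the toy contigs are all assumptions for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def flag_contigs(contigs: dict[str, str], expected_gc: float,
                 tolerance: float = 0.08) -> list[tuple[str, float]]:
    """Return (name, GC) for contigs whose GC deviates from expected_gc by more than tolerance."""
    flagged = []
    for name, seq in contigs.items():
        gc = gc_content(seq)
        if abs(gc - expected_gc) > tolerance:
            flagged.append((name, round(gc, 3)))
    return flagged


# Hypothetical contigs: one GC-rich (65%), one AT-rich (25%).
contigs = {"contig_1": "GCGCGCGCGCGCGATATATA" * 3, "contig_2": "ATATATGC" * 5}
print(flag_contigs(contigs, expected_gc=0.65))  # [('contig_2', 0.25)]
```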

Slide 23: Thanks
http://gcat.davidson.edu/phast/#methods

