Next generation sequence data and de novo assembly For human genetics By Jaap van der Heijden.


1 Next generation sequence data and de novo assembly For human genetics By Jaap van der Heijden

2 De novo assembly

3 Overall idea

4

5 Repeats and non-random shearing

6 Scaffolding: multiple libraries; contigs are ordered and oriented by mate pairs -> scaffolding

7 Four types of assemblers: greedy algorithms; overlap-layout-consensus; align-layout-consensus; BAC-by-BAC sequencing

8 Types of assemblers I: Greedy algorithms. Joins similar reads; easily confused by repeats.
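The greedy strategy on this slide can be illustrated with a minimal sketch. It assumes error-free reads from a single strand and exact suffix-prefix matches; the function names and the `min_len` threshold are illustrative choices, not taken from the presentation or from any particular assembler.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(seqs) > 1:
        best_k, best_i, best_j = 0, None, None
        # All-pairs comparison: quadratic in the number of sequences.
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap_len(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: the remaining sequences are separate contigs
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)
    return seqs


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))  # ['ATGGCGTACCTTA']
```

Because each merge is chosen only by overlap length, two reads that end in copies of the same repeat can be joined even when they come from different places in the genome, which is why repeats confuse this approach.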

9 Types of assemblers II: Overlap-layout-consensus assembler. Nodes represent read ends; edges represent similarity between reads (overlap); the layout step removes redundant information; the consensus step builds the genome sequence.
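The three stages can be sketched on a toy example. This is a simplification under stated assumptions: each node is a whole read (rather than a read end, as on the slide), reads are unique and error-free, overlaps are exact, and the "consensus" step simply spells out the path instead of voting over a multiple alignment. All function names are illustrative.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len):
    """Longest suffix of a matching a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Overlap step: directed edge a -> b weighted by the overlap length."""
    graph = {r: {} for r in reads}
    for a, b in permutations(reads, 2):
        k = suffix_prefix_overlap(a, b, min_len)
        if k:
            graph[a][b] = k
    return graph


def remove_transitive_edges(graph):
    """Layout step: drop a -> c when a -> b and b -> c already imply it."""
    reduced = {a: dict(nbrs) for a, nbrs in graph.items()}
    for a, nbrs in graph.items():
        for b in nbrs:
            for c in graph[b]:
                if c in nbrs:
                    reduced[a].pop(c, None)
    return reduced


def spell_path(graph):
    """Consensus stand-in: walk the single unbranched path and concatenate reads."""
    targets = {b for nbrs in graph.values() for b in nbrs}
    starts = [r for r in graph if r not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one read without incoming edges")
    current, visited = starts[0], {starts[0]}
    sequence = current
    while graph[current]:
        if len(graph[current]) != 1:
            raise ValueError("branching layout; a real assembler would end the contig here")
        nxt, k = next(iter(graph[current].items()))
        if nxt in visited:
            raise ValueError("cycle in layout")
        sequence += nxt[k:]
        visited.add(nxt)
        current = nxt
    return sequence


reads = ["ATGGCGTA", "GGCGTACC", "CGTACCTT"]
layout = remove_transitive_edges(build_overlap_graph(reads))
print(spell_path(layout))  # ATGGCGTACCTT
```

In this example the first and third reads overlap directly, but that edge is redundant once both are linked through the middle read, so the layout step removes it.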

10 Types of assemblers III: Align-layout-consensus. The process is called comparative assembly. The overlap stage of assembly is replaced by an alignment step. The layout stage is also greatly simplified due to the additional constraints provided by the alignment to the reference.

11 Types of assemblers IV: BAC-by-BAC sequencing. The genome is broken into fragments; each BAC's location is determined in the lab; a minimum tiling path is chosen (the whole genome is covered by at least one BAC); the BACs are sequenced.

12 Lander-Waterman equation: “rain drops” to cover a tile. 8-10 fold coverage -> 5 contigs for a 1 Mb genome
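The figure on this slide can be checked against the standard Lander-Waterman expectation, in which N reads at coverage c are expected to form about N * exp(-c * sigma) contigs, where sigma = 1 - T/L for read length L and minimum detectable overlap T. The sketch below assumes a read length of 500 bases and T = 0; neither value is given in the presentation.

```python
import math


def expected_contigs(genome_size, read_length, coverage, min_overlap=0):
    """Lander-Waterman expected number of contigs (islands).

    read_length and min_overlap are assumptions of this sketch; the slide
    only states the coverage and the genome size.
    """
    n_reads = coverage * genome_size / read_length
    sigma = 1 - min_overlap / read_length
    return n_reads * math.exp(-coverage * sigma)


for c in (8, 10):
    print(c, round(expected_contigs(1_000_000, 500, c), 2))
```

Under these assumptions, 8-fold coverage of a 1 Mb genome gives roughly 5 expected contigs, and 10-fold coverage gives fewer than one expected gap-separated extra contig, which is consistent with the numbers on the slide.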

13 Timeline: 1975 Sanger sequencing; 1990 first shotgun/EST assemblers (overlap-layout-consensus approach); 2000 human shotgun assembly; 2001 mouse shotgun assembly; 2005 454 (Roche) available; 2006 Solexa available; 2007 short-read assemblers (de Bruijn graphs)

14 The complexity of sequence assembly. Long reads: better identification, much slower. Short reads: faster to align, more difficult with repeats. Factors: number of reads, length of reads, mismatches. Algorithms can show quadratic or even exponential complexity.

15

16 Three NGS projects: dragonfly; medicinal maggots; EST comparison

17 Dragonfly (Dutch: libelle). Order Odonata; 3000 species, 90 in Europe; undergo metamorphosis

18 Pilot study for African dragonfly: metamorphosis; some migrate, others don't; genetically divergent; contain lots of introns in their genome

19 Project questions: What are the homologies with other species? How big is the genome? Are there already sequences in GenBank, and are they present in the data?

20 Dragonfly project data. Genomic: single end; 1 x 1147762 reads; trimmed to 34/51 nucleotides; 39,023,908 nucleotides sequenced. cDNA: paired end; 2 x 1291901 reads; read length = 51; 131,773,902 nucleotides sequenced

21 Dragonfly methods: assemble cDNA; BLAST resulting contigs to determine homologies; align genomic DNA to contigs; calculate genome size

22 Dragonfly assembly results: total contigs: 3898; average length of contigs: 176; average coverage of contigs: 24; contigs larger than 300 nucleotides: 800; average length of contigs larger than 300: 508; average coverage of contigs larger than 300: 15
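Summary figures of this kind can be derived from a list of contigs. The sketch below assumes each contig is available as a (length, coverage) pair and that averages are plain unweighted means; the presentation does not say how its averages were computed, and the function name is illustrative.

```python
from statistics import mean


def summarize_contigs(contigs, min_length=300):
    """Summarize (length, coverage) pairs for all contigs and for those longer than min_length."""

    def stats(items):
        if not items:
            return {"count": 0, "avg_length": 0.0, "avg_coverage": 0.0}
        return {
            "count": len(items),
            "avg_length": mean(length for length, _ in items),
            "avg_coverage": mean(cov for _, cov in items),
        }

    large = [c for c in contigs if c[0] > min_length]
    return {"all": stats(contigs), "large": stats(large)}


# Toy input, not data from the presentation.
print(summarize_contigs([(100, 30), (200, 20), (600, 10)]))
```

Reporting the same statistics with and without short contigs, as the slide does, shows how much of an assembly consists of fragments that are barely longer than a few reads.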

23 Dragonfly genes and homologies: Libellula pulchella; Enallagma aspersum; Erythromma najas; Ischnura verticalis; many Drosophila species. The criteria used in this analysis were an e-value of less than 1*10^-40 and a score of more than 200. The COII gene with accession number GQ256052.1 (partial), the COI gene with accession number GQ256032.1 (partial), and the NDI gene with accession number GQ255994.1 (partial) were found in the cDNA contigs.
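The selection criteria can be applied programmatically to BLAST results. The sketch below assumes the hits are in BLAST's standard 12-column tabular format (e-value in column 11, bit score in column 12) and that the "score" on the slide refers to the bit score; the presentation does not state either. The query and subject names in the example are made up.

```python
def filter_blast_hits(lines, max_evalue=1e-40, min_score=200.0):
    """Keep hits with e-value < max_evalue and bit score > min_score.

    Expects tab-separated lines with the 12 default tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue < max_evalue and bitscore > min_score:
            hits.append((fields[0], fields[1], evalue, bitscore))
    return hits


# Hypothetical example rows, not results from the presentation.
example = [
    "contig_1\tsubject_A\t98.5\t400\t6\t0\t1\t400\t10\t409\t1e-120\t450",
    "contig_2\tsubject_B\t80.0\t90\t18\t0\t1\t90\t5\t94\t1e-10\t60.2",
]
print(filter_blast_hits(example))
```

Only the first example row passes both thresholds; the second fails on e-value and on score.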

24 Dragonfly genome size: 30 genomic genes selected after BLAST search; size 300-1500; alignment with Bowtie; “calculation”
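The slide does not spell out the calculation. One common way to get a rough estimate, shown here purely as an assumption, is to measure the average sequencing depth over regions expected to be single-copy (such as the selected genes) and divide the total number of sequenced bases by that depth. The function and parameter names are hypothetical.

```python
def estimate_genome_size(total_sequenced_bases, aligned_bases, target_length):
    """Rough genome size estimate from depth over single-copy target regions.

    depth       = aligned_bases / target_length
    genome size = total_sequenced_bases / depth
    This is an assumed method, not necessarily the one used in the presentation.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    depth = aligned_bases / target_length
    if depth <= 0:
        raise ValueError("no bases aligned to the target regions")
    return total_sequenced_bases / depth


# Hypothetical inputs: 15,000 aligned bases over 30,000 bases of target sequence.
print(estimate_genome_size(39_023_908, 15_000, 30_000))
```

Such an estimate is only as good as the single-copy assumption: genes with paralogs, or targets containing repeats, attract extra reads and make the genome look smaller than it is.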

25 Medicinal maggots: used to treat non-healing wounds. Genes revealed: signaling proteins (inhibitor of apoptosis protein 2); digestive enzymes (lipases, proteinases); antimicrobial peptides (AMPs) (Lucilia defensin, diptericin)

26 Medicinal maggots data: 5 degenerate peptide sequences -> 36 peptides. cDNA: 8,199,983 reads; read length 32; 2,623,994,560

27 Medicinal maggots question: Have we sequenced (pieces of) the genes corresponding to the peptides?

28 Medicinal maggots methods: build local library of peptides; assemble contigs (CLCbio, Nextgene, Velvet); BLAST contigs to peptides; find hits; make coverage plot

29 Nextgene assembly maggots: number of contigs = 59048; average length = 59; average coverage = 11; number of contigs >300 = 719; average length >300 = 661; average coverage >300 = 64

30 CLC assembly: number of contigs = 78; average length = 2282; average coverage = 514

31 Velvet assembly: total contigs: 586; length of contigs: 168; coverage of contigs: 55; contigs larger than 300 nucleotides: 62; length of contigs larger than 300: 779; coverage of contigs larger than 300: 63

32 Found genes maggots: C. vicina mRNA for arylphorin subunit A4 (Velvet); Drosophila willistoni GK21455 (Dwil\GK21455) mRNA (Nextgene); Lucilia cuprina clone sbsp9 serine proteinase mRNA (Nextgene)

33 EST comparison: traditional EST sequencing; known library; assemblers: CLCbio, Nextgene, Velvet

34 EST comparison method: assemble cDNA and match with known ESTs

35 EST results

36 Conclusions: big differences between assemblers (coverage, length, number of nodes); sequence x performs best on EST test

37 Questions?

