Next-generation sequence data and de novo assembly for human genetics, by Jaap van der Heijden


De novo assembly

Overall idea

Repeats and non-random shearing

Scaffolding
- Multiple libraries
- Contigs are ordered and oriented by mate pairs -> scaffolding

4 types of assemblers
- Greedy algorithms
- Overlap-layout-consensus
- Align-layout-consensus
- BAC-by-BAC sequencing

Types of assemblers I: Greedy algorithms
- Joins similar reads
- Easily confused by repeats
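
The greedy idea can be sketched in a few lines. This is a minimal illustration, not any particular assembler's implementation: the function names, the `min_len` threshold, and the toy reads are assumptions made for the example, and reverse complements and sequencing errors are ignored.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences that share the largest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return seqs  # nothing left to merge
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


# Hypothetical toy reads that tile one short sequence.
contigs = greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"])
```

Because the merge always takes the locally best overlap, two reads that end in the same repeat look equally good as partners, which is where the confusion by repeats comes from.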

Types of assemblers II: Overlap-layout-consensus assembler
- Nodes represent ends of reads
- Edges represent similarity between reads (overlap)
- The layout step removes redundant information
- The consensus step builds the genome sequence
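
A rough sketch of the overlap and layout steps follows. It is a simplification under stated assumptions: whole reads are used as nodes (rather than read ends), overlaps must be exact, and "removing redundant information" is shown as dropping transitive edges. All names and the toy reads are hypothetical.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a matching a prefix of b, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads: list[str], min_len: int = 3) -> dict:
    """Overlap step: one directed edge (a, b) per suffix-prefix overlap."""
    edges = {}
    for a, b in permutations(reads, 2):
        n = suffix_prefix_overlap(a, b, min_len)
        if n:
            edges[(a, b)] = n
    return edges


def remove_transitive_edges(edges: dict) -> dict:
    """Layout step (simplified): drop a->c when a->b and b->c both exist."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    reduced = dict(edges)
    for a, c in edges:
        if any(c in successors.get(b, ()) for b in successors[a] if b != c):
            reduced.pop((a, c), None)
    return reduced


# Hypothetical toy reads: r0 overlaps r1, r1 overlaps r2, and r0 also overlaps r2.
r0, r1, r2 = "ATTAGACC", "TAGACCTG", "GACCTGCC"
graph = overlap_graph([r0, r1, r2])
layout = remove_transitive_edges(graph)
```

After the reduction only the chain r0 -> r1 -> r2 remains, and a consensus step would read the sequence off that path.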

Types of assemblers III: Align-layout-consensus
- Process called comparative assembly
- The overlap stage of assembly is replaced by an alignment step
- The layout stage is also greatly simplified due to the additional constraints provided by the alignment to the reference

Types of assemblers IV: BAC-by-BAC sequencing
- Genome is broken into fragments
- Each BAC's location is determined in the lab
- Minimum tiling path (the whole genome is covered by at least one BAC)
- BACs are sequenced

Lander-Waterman equation
- Reads fall like “rain drops” covering a tile
- 8-10 fold coverage -> 5 contigs for a 1 Mb genome
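
The figure on this slide can be checked with the Lander-Waterman estimate for the expected number of contigs, N * exp(-c * sigma), where N is the number of reads, c the coverage, and sigma = 1 - T/L for minimum detectable overlap T and read length L. The sketch below assumes 600-nucleotide reads and T = 0; neither value appears on the slide.

```python
import math


def expected_contigs(genome_len: float, read_len: float,
                     coverage: float, min_overlap: float = 0.0) -> float:
    """Lander-Waterman expected number of contigs.

    read_len and min_overlap are assumptions of this sketch; the slide
    only gives the coverage range and the genome size.
    """
    n_reads = coverage * genome_len / read_len
    sigma = 1.0 - min_overlap / read_len
    return n_reads * math.exp(-coverage * sigma)


# 1 Mb genome, assumed 600 nt reads.
at_8x = expected_contigs(1_000_000, 600, 8)
at_10x = expected_contigs(1_000_000, 600, 10)
```

Under these assumptions 8-fold coverage gives roughly 4.5 expected contigs, in line with the 5 contigs quoted, and 10-fold coverage gives fewer than one expected gap-bounded contig beyond the first.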

Timeline
- 1975 Sanger sequencing
- 1990 First shotgun/EST assemblers -> overlap-layout-consensus approach
- 2000 Human shotgun assembly
- 2001 Mouse shotgun assembly
- Roche available
- 2006 Solexa available
- 2007 Short read assemblers -> de Bruijn graphs

The complexity of sequence assembly
- Long reads
  - better identification
  - much slower
- Short reads
  - faster to align
  - more difficult with repeats
- Number of reads
- Length of reads
- Mismatches
- Algorithms can show quadratic or even exponential complexity

3 NGS projects
- Dragonfly
- Medicinal maggots
- EST comparison

Dragonfly (Dutch: libelle)
- Order Odonata
- 3000 species -> 90 in Europe
- Undergo metamorphosis

Pilot study for African dragonfly
- Metamorphosis
- Some migrate, others don't
- Genetically divergent
- Contain lots of introns in their genome

Project questions
- What are the homologies with other species?
- How big is the genome?
- Are there already sequences in GenBank, and are they present in the data?

Dragonfly project data
- Genomic
  - Single end
  - 1 x reads
  - Trimmed to 34/51 nucleotides
  - nucleotides sequenced
- cDNA
  - Paired end
  - 2 x reads
  - Read length = 51
  - nucleotides sequenced

Dragonfly methods
- Assemble cDNA
- Blast resulting contigs to determine homologies
- Align genomic DNA to contigs
- Calculate genome size

Dragonfly assembly results
- total contigs: 3898
- average length of contigs: 176
- average coverage of contigs: 24
- contigs larger than 300 nucleotides: 800
- average length of contigs larger than 300: 508
- average coverage of contigs larger than 300: 15
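
Summary numbers of this kind can be derived from per-contig lengths and coverages, as in the sketch below. The function name and the input values are made up for illustration, and the averages are taken as plain unweighted means, which is an assumption: the slides do not say how they were computed.

```python
def contig_stats(lengths: list[int], coverages: list[float],
                 min_len: int = 300) -> dict:
    """Summarise an assembly from per-contig lengths and coverages."""

    def mean(values: list) -> float:
        return sum(values) / len(values) if values else 0.0

    # Keep (length, coverage) pairs for contigs longer than min_len.
    large = [(n, c) for n, c in zip(lengths, coverages) if n > min_len]
    return {
        "total_contigs": len(lengths),
        "avg_length": mean(lengths),
        "avg_coverage": mean(coverages),
        "contigs_over_min": len(large),
        "avg_length_over_min": mean([n for n, _ in large]),
        "avg_coverage_over_min": mean([c for _, c in large]),
    }


# Hypothetical example values, not the dragonfly data.
stats = contig_stats([100, 200, 400, 600], [10, 20, 30, 40])
```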

Dragonfly genes and homologies
- Homologies with:
  - Libellula pulchella
  - Enallagma aspersum
  - Erythromma najas
  - Ischnura verticalis
  - many Drosophila species
- Criteria used in this analysis: an e-value of less than 1*10^-40 and a score of more than 200
- Found in the cDNA contigs:
  - COII gene with accession number GQ (partial)
  - COI gene with accession number GQ (partial)
  - NDI gene with accession number GQ (partial)
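
The hit criteria can be applied to BLAST results roughly as follows. The sketch assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns, where e-value and bit score are columns 11 and 12, and it assumes that "score" on the slide means bit score. The slides state neither, and the contig and subject names below are invented.

```python
def filter_hits(lines, max_evalue: float = 1e-40, min_score: float = 200.0):
    """Keep hits with e-value < max_evalue and bit score > min_score.

    Expects tab-separated lines in the default 12-column BLAST+ tabular
    layout (an assumption of this sketch).
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < max_evalue and bitscore > min_score:
            hits.append((fields[0], fields[1], evalue, bitscore))
    return hits


# Hypothetical rows: only the first passes both thresholds.
example = [
    "contig_1\tsubj_A\t98.5\t300\t4\t0\t1\t300\t10\t309\t1e-150\t540",
    "contig_2\tsubj_B\t85.0\t120\t18\t0\t1\t120\t5\t124\t2e-20\t95.3",
    "contig_3\tsubj_C\t91.0\t150\t13\t0\t1\t150\t1\t150\t1e-45\t180",
]
kept = filter_hits(example)
```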

Dragonfly genome size
- 30 genomic genes selected after blasting
- Size
- Alignment with Bowtie
- “calculation”

Medicinal maggots
- Applied to non-healing wounds
- Genes revealed
  - Signaling proteins: inhibitor of apoptosis protein 2
  - Digestive enzymes: lipases, proteinases
  - Antimicrobial peptides (AMPs): Lucilia defensin, diptericin

Medicinal maggots data
- 5 degenerate peptide sequences -> 36 peptides
- cDNA
  - reads
  - read length 32

Medicinal maggots question
- Have we sequenced (pieces of) the genes corresponding to the peptides?

Medicinal maggots methods
- Build local library of peptides
- Assemble contigs
  - CLCbio
  - Nextgene
  - Velvet
- Blast contigs to peptides
- Find hits
- Make coverage plot
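
The data behind a coverage plot can be computed from hit coordinates as sketched here. The slides do not say which sequence the plot is drawn over or how it was made; this example simply assumes 1-based, inclusive hit intervals on one reference sequence of known length, with hypothetical values.

```python
def coverage_profile(length: int, intervals: list[tuple[int, int]]) -> list[int]:
    """Per-position depth for 1-based, inclusive intervals on a sequence.

    Intervals may be given in either orientation (start > end for hits on
    the reverse strand) and are assumed to lie within 1..length.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        lo, hi = min(start, end), max(start, end)
        diff[lo - 1] += 1  # depth rises at the first covered position
        diff[hi] -= 1      # and falls just after the last one
    depth, running = [], 0
    for change in diff[:length]:
        running += change
        depth.append(running)
    return depth


# Hypothetical hits on a sequence of length 10.
profile = coverage_profile(10, [(1, 5), (3, 8), (8, 6)])
```

The resulting list can be handed to any plotting tool, with position on the x-axis and depth on the y-axis.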

Nextgene assembly maggots
- number of contigs =
- average length = 59
- average coverage = 11
- number of contigs >300 = 719
- average length >300 = 661
- average coverage >300 = 64

CLC assembly
- number of contigs = 78
- average length = 2282
- average coverage = 514

Velvet assembly made
- total contigs: 586
- length of contigs: 168
- coverage of contigs: 55
- contigs larger than 300 nucleotides: 62
- length of contigs larger than 300: 779
- coverage of contigs larger than 300: 63

Found genes (maggots)
- C. vicina mRNA for arylphorin subunit A4 -> Velvet
- Drosophila willistoni GK21455 (Dwil\GK21455) mRNA -> Nextgene
- Lucilia cuprina clone sbsp9 serine proteinase mRNA -> Nextgene

EST comparison
- Traditional EST sequencing
- Known library
- Assemblers
  - CLCbio
  - Nextgene
  - Velvet

EST comparison method
- Assemble cDNA and match with known ESTs

EST results

Conclusions
- Big differences between assemblers
  - coverage
  - length
  - number of nodes
- sequence x performs best on EST test

Questions?