Jan Pačes Institute of Molecular Genetics AS CR


hard assembly
Jan Pačes, Institute of Molecular Genetics AS CR

problems
genomes
  high GC content
  repetitions (short, with low information content; long)
  polymorphic
  "unreadable" sequences, "weird" structures
technologies
  nonrandom libraries
  wrong sizes
  erroneous or chimeric reads

sequencing technologies
  ABI (Sanger)
  454 (pyrosequencing)
  Solexa (reversible terminator)
  SOLiD (2-base ligation)
  PacBio (SMRT)

example of errors in one technology http://chevreux.org/mira_ex_454sanger.html

high GC regions are underrepresented (Aird et al., Genome Biology 2011)

protocol optimization for high GC content (Aird et al., Genome Biology 2011)
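
To check whether a region falls into the underrepresented high-GC range, GC content can be computed in windows along a sequence. The following is a minimal sketch, not taken from the slides; the function names and the default window and step sizes are assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences with no unambiguous bases (empty, or all N) get 0.0.
    return gc / total if total else 0.0


def windowed_gc(seq, window=100, step=50):
    """Return (start, gc_fraction) for every full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

Comparing these per-window values with read depth over the same windows is one way to see whether coverage drops as GC content rises.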

repetitions (figure labels: scaffold, repetition)

repetitions

repeat recognition
  RepeatMasker http://www.repeatmasker.org/
  RepeatModeler (RECON and RepeatScout) http://www.repeatmasker.org/RepeatModeler.html
position-aware assemblers
  MIRA http://sourceforge.net/projects/mira-assembler/
  MaSuRCA http://www.genome.umd.edu/masurca.html
  SPAdes http://bioinf.spbau.ru/spades

k-mer distribution

k-mer analysis
  JELLYFISH: fast, parallel k-mer counting for DNA http://www.cbcb.umd.edu/software/jellyfish/
  Quake: a package to correct substitution sequencing errors in experiments with deep coverage http://www.cbcb.umd.edu/software/quake/
  khmer: trims off likely erroneous k-mers https://khmer-protocols.readthedocs.org/en/v0.8.2/
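
The k-mer distribution these tools work with can be illustrated in a few lines. This is a toy sketch for small inputs, not how JELLYFISH is implemented; the function names are hypothetical, and unlike a typical production run it does not merge a k-mer with its reverse complement.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    spectrum = kmer_spectrum(count_kmers(["ACGTACGT"], 4))
    for multiplicity in sorted(spectrum):
        print(multiplicity, spectrum[multiplicity])
```

In a real data set the spectrum is usually read as follows: k-mers seen only once or twice are mostly sequencing errors, the main peak sits near the sequencing depth, and k-mers far above the main peak come from repeats.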

repetitions (figure labels: repetition, scaffold)

filling gaps
  GapCloser (part of SOAPdenovo) http://soap.genomics.org.cn/soapdenovo.html
  GapFiller (part of SSPACE) http://www.baseclear.com/lab-products/bioinformatics-tools/gapfiller/
  GapFiller http://sourceforge.net/projects/gapfiller/

454 multiplicates

contig coverage by large libraries

Illumina PE and mate-pair libraries

highly polymorphic genomes (figure labels: two copies of polymorphic contigs, scaffold)

polymorphic assembly workflow
  1. normal assembly
  2. condensing alternative contigs
  3. mapping to identify SNPs
  4. "repair" reads
  5. second "polymorphic" assembly
http://www.fishbrowser.org/software/L_RNA_scaffolder

G-quadruplex

Chicken p53 – coverage from RNA-seq data
AGCGACCCCCCCCCACCACCGCCACCACCACCTCTGCCATTGGCCGCCGCCGCCCCCCCCCCATTAAACCCCCCCACCCCCCCCCGCGCTGCCCCCTCCCCGGTGG
Coverage > 13,000X

Chicken erythropoietin (EPO) – coverage from RNA-seq data
CCCGCCCACCCCCACCCCCACCCGCACCCCCCACTCTCCCACCCCCACCCCCTTTTCTCCCACCCCCTCTTCTCCCACCCCCTTTTCCCCCCCTTCCTCCCCCCACTCCG
CCCCCCCCCCGCCCCCTCCCCCCCCCCAGGTGAGGACCCT
Coverage > 500X from RNA-seq (*EPO locus not completed even from 1000X coverage genomic Illumina data!)
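
Both fragments are C-rich on the strand shown, so the complementary strand is G-rich. One rough way to flag candidate G-quadruplex sites is to scan both strands for four runs of at least three G separated by loops of 1 to 7 bases. This pattern is a commonly used folding rule assumed here for illustration, not something taken from the slides, and the function names are hypothetical.

```python
import re

# Four G-runs (>= 3 bases each) separated by loops of 1-7 bases.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Return (strand, start, motif) for non-overlapping candidate motifs.

    For the "-" strand, start is an offset into the reverse complement,
    not into the input sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in G4_PATTERN.finditer(s):
            hits.append((strand, match.start(), match.group()))
    return hits
```

A match only marks a sequence that could fold into a G-quadruplex; it does not show that the structure forms or that it explains the missing coverage.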

chicken missing genes

that’s it, thank you
many thanks also to: Daniel Elleder, Tomáš Hron, Michal Kolář, Hynek Strnad