Discovery of Structural Variation with Next-Generation Sequencing Alexandre Gillet-Markowska Gilles Fischer Team – Biology.

Presentation transcript:

Discovery of Structural Variation with Next-Generation Sequencing Alexandre Gillet-Markowska Gilles Fischer Team – Biology of Genomes UMR7238 Laboratory of Computational and Quantitative Biology Université Pierre et Marie-Curie, Paris

Structural Variations (SV): outline
(i) Structural variations (SV)
(ii) SV detection technologies
(iii) Read pairs: 2 types of Illumina genomic DNA libraries
(iv) SV detection using read pairs
(v) Polymorphic SV

Structural Variations (SV)
(1) Yes, the minimal size is arbitrary…

Balanced SV versus Unbalanced SV
Balanced SV: INVERSION (INV), RECIPROCAL TRANSLOCATION (RT)
Unbalanced SV (CNV): INSERTION (INS), DELETION (DEL), TANDEM DUPLICATION (DUP)
Each panel compares the reference (ref) with the rearranged sequence (SV) and distinguishes intrachromosomal SV from interchromosomal SV.
Pictures adapted from Feuk et al., 2006 Nature Reviews; Calvin Blackman Bridges, Science

Why discover SV?
- involved in > 30 diseases (psoriasis, Crohn disease, ASD…)
- chromosomal instability detected in the vast majority of cancers
- powerful mechanism of adaptation and evolution

SV detection technologies

Timeline of technologies used to discover SV (Structural Variations known since 1936)
Comparative cytogenetics:
- Calvin Blackman Bridges, Science 1936
- Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci 1959
- Smith et al, Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet 1986

Timeline of technologies used to discover SV (Structural Variations known since 1936)
Comparative cytogenetics:
- Calvin Blackman Bridges, Science 1936
- Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci 1959
- Smith et al, Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet 1986
Microarrays (200 and 221 CNV; 360 Mb of CNVR, 12% of the human genome):
- Iafrate, Detection of large-scale variation in the human genome, Nature Genetics 2004
- Sebat, Large-scale copy number polymorphism in the human genome, Science 2004
- Redon, Global variation in copy number in the human genome, Nature 2006

Timeline of technologies used to discover SV (Structural Variations known since 1936)
Comparative cytogenetics:
- Calvin Blackman Bridges, Science 1936
- Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci 1959
- Smith et al, Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet 1986
Microarrays (200 and 221 CNV; 360 Mb of CNVR, 12% of the human genome):
- Iafrate, Detection of large-scale variation in the human genome, Nature Genetics 2004
- Sebat, Large-scale copy number polymorphism in the human genome, Science 2004
- Redon, Global variation in copy number in the human genome, Nature 2006
NGS:
- Korbel et al, Paired-end mapping reveals extensive structural variation in the human genome, Science 2007
- HGP, A map of human genome variation from population-scale sequencing, Nature

‘Range of usability’ of technologies
- Size limit
- SV type limit

SV detection with NGS data

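How to detect SV with NGS data?
(Quinlan & Hall 2011 Trends in Genetics; Li 2011 Nature)
Columns: Breakpoint resolution | SV size range | CNV | Balanced SV | FDR | Missing rate
Rows as extracted (the method label of each row and some cells did not survive):
>100 bp | > insert size | Yes | Variable
1 bp | 1 bp–50 kbp | Yes | >10% | >25%
1-10 bp | >10 bp | Yes | No | High?
1 bp | >1 bp | Yes | Low | High?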

Read pairs: 2 types of Illumina genomic DNA libraries
1) Illumina Paired-End
2) Illumina Mate-Pair

1) Illumina Paired-End

2) Illumina Mate-Pair

Illumina Paired-End (PE) vs Mate-Pair (MP)
- MP allows a better genome assembly than PE
- MP allows the detection of SV that involve repeated elements

Illumina Paired-End vs Mate-Pair
[Figure: insert-size distribution of 100,000 read pairs; x-axis: insert size (bp); annotation: 5,000 (or much less…)]

Illumina Paired end vs Mate-Pair

SV detection with read pairs
1) trim the data
2) align the data to the reference genome
3) remove PCR duplicates
4) SV calling

Trim the data
First criterion: Chargaff's rule

Trim the data
First criterion: %A ≈ %T and %G ≈ %C on each DNA strand
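The criterion above can be checked by computing the base composition at each read position (sequencing cycle) and flagging positions where the parity is broken. The sketch below is only an illustration under assumptions: reads are plain strings, and the function names and the 5% tolerance are hypothetical choices, not values from the slides.

```python
from collections import Counter

def base_composition_by_cycle(reads):
    """Return one {base: fraction} dict per read position (cycle)."""
    n_cycles = max(len(read) for read in reads)
    counts = [Counter() for _ in range(n_cycles)]
    for read in reads:
        for i, base in enumerate(read.upper()):
            counts[i][base] += 1
    fractions = []
    for cycle_counts in counts:
        total = sum(cycle_counts.values())
        fractions.append({b: cycle_counts[b] / total for b in "ACGTN"})
    return fractions

def cycles_violating_parity(fractions, tolerance=0.05):
    """Positions where |%A - %T| or |%G - %C| exceeds the tolerance.

    The tolerance is an illustrative assumption, not a published threshold.
    """
    return [
        i for i, f in enumerate(fractions)
        if abs(f["A"] - f["T"]) > tolerance or abs(f["G"] - f["C"]) > tolerance
    ]

# Toy data: every read starts with G, so position 0 is strongly biased.
toy_reads = ["GATC", "GTAG", "GCGA", "GGCT"]
biased_cycles = cycles_violating_parity(base_composition_by_cycle(toy_reads))
print(biased_cycles)
```

Positions flagged at the start or end of the reads are natural candidates for trimming; on real data the check would be run on millions of reads rather than four.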

Trim the data
Second criterion: nucleotide quality
Trimming tools: Bcbio-nextgen, Btrim, CANGS, Chipster, Clean reads, ConDeTri, Ea-utils, Fastx, Flexbar, PRINSEQ, Reaper, SeqTrim, Skewer, SolexaQA, TagCleaner, Trimmomatic
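As a rough idea of what quality trimming does, the sketch below removes low-quality bases from the 3' end of a read. It assumes Phred+33 encoded quality strings and a hypothetical threshold of Q20; the tools listed above use more elaborate strategies (sliding windows, adapter removal), so this is not how any of them is implemented.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_string]

def trim_3prime(sequence, quality_string, min_quality=20):
    """Drop trailing bases whose quality is below min_quality.

    min_quality=20 is an illustrative default, not a recommendation.
    """
    scores = phred33_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]

# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
print(trimmed_seq, trimmed_qual)
```

Only the 3' tail is examined here, so a low-quality base in the middle of a read is left in place.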

Align the data to reference genome

Remove PCR duplicates
PCR duplicates annotation tools:
- samtools rmdup (only intra-molecular duplicates)
- markduplicates.jar (picard tools)
- FastUniq
- …
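The principle behind duplicate annotation can be sketched as follows: read pairs whose two ends map to the same coordinates and strands are assumed to derive from the same original fragment, and all but the best-scoring one are flagged. The `ReadPair` record and the `mark_duplicates` function are hypothetical names for illustration; the tools listed above work on alignment or FASTQ files and differ in their details.

```python
from typing import NamedTuple

class ReadPair(NamedTuple):
    """Hypothetical minimal record for an aligned read pair."""
    name: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    base_quality_sum: int

def mark_duplicates(pairs):
    """Return the names of pairs flagged as duplicates.

    Pairs sharing both mapping coordinates and both strands are grouped;
    the pair with the highest summed base quality is kept.
    """
    best_by_key = {}
    duplicates = set()
    for pair in pairs:
        key = (pair.chrom1, pair.pos1, pair.strand1,
               pair.chrom2, pair.pos2, pair.strand2)
        kept = best_by_key.get(key)
        if kept is None:
            best_by_key[key] = pair
        elif pair.base_quality_sum > kept.base_quality_sum:
            duplicates.add(kept.name)
            best_by_key[key] = pair
        else:
            duplicates.add(pair.name)
    return duplicates

toy_pairs = [
    ReadPair("p1", "chrI", 100, "+", "chrI", 480, "-", 3500),
    ReadPair("p2", "chrI", 100, "+", "chrI", 480, "-", 3900),  # same fragment as p1
    ReadPair("p3", "chrI", 250, "+", "chrI", 640, "-", 3700),
]
flagged = mark_duplicates(toy_pairs)
print(flagged)
```

Keying on both ends of the pair lets pairs that span two chromosomes be compared as well.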

SV signatures
SV have nearly identical signatures with MP and PE

SV signatures Gillet-Markowska, 2014, Bioinformatics

SV signatures
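The signature figures did not survive in this transcript. As a reminder of the commonly described read-pair signatures, the sketch below classifies a single aligned pair. It assumes a library whose concordant pairs map in forward-reverse orientation (the usual paired-end layout; mate-pair reads are assumed to have been converted to that orientation beforehand), and the function name, labels and insert-size bounds are hypothetical. It is not the method of any specific SV caller.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  min_insert, max_insert):
    """Assign one read pair to a candidate SV signature.

    pos1 and pos2 are leftmost mapping positions; min_insert and max_insert
    bound the insert sizes regarded as concordant (assumed to be derived
    from the library's insert-size distribution).
    """
    if chrom1 != chrom2:
        return "translocation"
    # Order the two reads by mapping position.
    if pos1 > pos2:
        pos1, strand1, pos2, strand2 = pos2, strand2, pos1, strand1
    if strand1 == strand2:
        return "inversion"
    if strand1 == "-" and strand2 == "+":
        # Everted pair: the reads point away from each other.
        return "tandem_duplication"
    span = pos2 - pos1
    if span > max_insert:
        return "deletion"
    if span < min_insert:
        return "insertion"
    return "concordant"

print(classify_pair("chrI", 100, "+", "chrI", 500, "-", 300, 500))
print(classify_pair("chrI", 100, "+", "chrI", 5000, "-", 300, 500))
```

A real caller would cluster many discordant pairs with the same signature before reporting an SV, since a single pair can also be a mapping or library artifact.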

Inter-tool variability is immense

Adapted from ICGC-TCGA challenge

Inter-tool variability is immense

SV examples

SV in the human genome
Korbel et al, Science 2007

Not-so-identical monozygotic twins
Bruder, C. E. G. et al. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82, 763–771 (2008)

Butterfly mimicry

Livestock phenotypes caused by CNV

Polymorphic SV Structural Variations (SV)

Polymorphic SV
- Individual (germ line): SV in 100% of cells of each individual
- Tissue (somatic): SV in one tissue / in a few cells

Can we detect de novo SV occurring in a single cell culture by high-throughput sequencing?
Sequencing a single culture (S. cerevisiae): DNA extraction, then sequencing.
The physical coverage (theoretically) sets the detection threshold.
[Figure: # cells versus # generations, with successive bottlenecks (Bottleneck 1 to Bottleneck 5), each followed by DNA extraction and sequencing (n=80); other labels as extracted: 30, ,000X, 700X]

Sequencing with high physical coverage
Paired-End sequencing: insert size ~ 400 bp
[Figure: read pairs from Cell 1 to Cell 10 aligned to the reference]

Sequencing with high physical coverage
Paired-End sequencing: insert size ~ 400 bp
Coverage (sequence): cov seq = 0.5X
[Figure: read pairs from Cell 1 to Cell 10 aligned to the reference]

Sequencing with high physical coverage
Paired-End sequencing: insert size ~ 400 bp
Coverage (sequence): cov seq = 0.5X
Coverage (physical): cov phys = 0.85X
[Figure: read pairs from Cell 1 to Cell 10 aligned to the reference]

Sequencing with high physical coverage
Paired-End sequencing: insert size ~ 400 bp
Coverage (sequence): cov seq = 0.5X
Coverage (physical): cov phys = 0.85X
cov SV = 0
[Figure: read pairs from Cell 1 to Cell 10 aligned to the reference]

Sequencing with high physical coverage
Mate-Pair sequencing: insert size ~ 1 to 20 kb
[Figure: read pairs from Cell 1 to Cell 10 aligned to the reference; legend: discordant paired sequence]

Sequencing with high physical coverage
Mate-Pair sequencing: insert size ~ 1 to 20 kb
Coverage (sequence): cov seq = 0.5X
Coverage (physical): cov phys = 5X
cov SV = 1
[Figure: read pairs from Cell 1 to Cell 10 aligned to the reference; legend: discordant paired sequence]
Mate-Pair sequencing increases the sensitivity of SV detection
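The difference between the two coverages can be made concrete with a small calculation. The sketch below uses the common definitions (sequence coverage counts sequenced bases, physical coverage counts the bases spanned by whole inserts); the function names, the 100 bp read length, the 12 Mb genome size and the number of pairs are illustrative assumptions, so the output is not meant to reproduce the 0.85X and 5X values of the schematic above.

```python
def sequence_coverage(n_pairs, read_length, genome_size):
    """Average number of times a base is read: two reads per pair."""
    return 2 * n_pairs * read_length / genome_size

def physical_coverage(n_pairs, insert_size, genome_size):
    """Average number of inserts spanning a base, sequenced or not."""
    return n_pairs * insert_size / genome_size

GENOME_SIZE = 12_000_000   # approximate S. cerevisiae genome size (assumed here)
N_PAIRS = 30_000           # hypothetical number of read pairs
READ_LENGTH = 100          # hypothetical read length in bp

cov_seq = sequence_coverage(N_PAIRS, READ_LENGTH, GENOME_SIZE)
cov_phys_pe = physical_coverage(N_PAIRS, 400, GENOME_SIZE)     # paired-end insert
cov_phys_mp = physical_coverage(N_PAIRS, 4_000, GENOME_SIZE)   # mate-pair insert

print(f"cov seq = {cov_seq}X")
print(f"cov phys (PE, 400 bp) = {cov_phys_pe}X")
print(f"cov phys (MP, 4 kb) = {cov_phys_mp}X")
```

For the same amount of sequencing, the physical coverage grows in proportion to the insert size, which is why a breakpoint is more likely to be spanned by at least one discordant pair in a mate-pair library.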

Illumina Paired-End