Jin Zhang, Jiayin Wang and Yufeng Wu


1 Jin Zhang, Jiayin Wang and Yufeng Wu
An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data
Jin Zhang, Jiayin Wang and Yufeng Wu
Department of Computer Science and Engineering, University of Connecticut
RECOMB-seq 2012

2 Structural Variation (SV)
[Figure: deletion and insertion, Reference vs. Alternative; mean insert size + 3σ]
SV calling using HTS sequencing data
Methods (compared by pair or single reads, coverage, exact breakpoints):
Assembly: higher coverage
Read depth: no exact breakpoints
Read pair: pair only
Split read
[Figure: Reference vs. Alternative for a deletion, with the exact breakpoint marked]
Exact breakpoints: Mills et al. (Nature, 2011): "…, which facilitated analysing their origin and functional impact." Lam et al. (Nature Biotechnology, 2010): classification and annotation.
Problem: finding SVs with exact breakpoints using low-coverage paired-end reads.

3 Split-read mapping (e.g. Deletion)
Read-mapping tools either do not map a split read or soft-clip it.
[Figure: Reference vs. Alternative for a deletion, showing the focal region and the maximum event size]
Because of sequence and repeats, a longer maximum event size (e.g. 1 Mbp) may cause false positives.
Different ways of splitting a read may cause even more false positives.
A shorter maximum event size may reduce false positives, but may also fail to find some larger deletions.
Split-read methods (compared by algorithm, max deletion size cutoff, insert size, focal region):
Pindel, Ye et al. (Bioinformatics 2009): pattern growth; yes
SVseq1, Zhang et al. (Bioinformatics 2011): BWT
SVseq2 (this work, RECOMB-seq 2012): dynamic programming
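The trade-off around the maximum event size can be illustrated with a small sketch. It is not how Pindel or SVseq search; it assumes exact string matching of the soft-clipped portion, and the function name `split_candidates` and the toy sequence are hypothetical. It only shows that a larger search window admits more candidate placements in a repetitive reference.

```python
def split_candidates(reference, anchor_end, clipped, max_event_size):
    """Return start positions where the soft-clipped portion matches exactly,
    at most max_event_size bases downstream of the anchored portion.

    Exact matching is an assumption made for illustration only.
    """
    hits = []
    # Last admissible start position for the clipped portion.
    last_start = min(anchor_end + max_event_size, len(reference) - len(clipped))
    pos = reference.find(clipped, anchor_end)
    while pos != -1 and pos <= last_start:
        hits.append(pos)
        pos = reference.find(clipped, pos + 1)
    return hits


# Toy reference containing the same 7-mer three times (a repeat).
reference = "TTTT" + "GATTACA" + "CCCC" + "GATTACA" + "GGGG" + "GATTACA"
small_window = split_candidates(reference, 0, "GATTACA", max_event_size=10)
large_window = split_candidates(reference, 0, "GATTACA", max_event_size=30)
```

With the small window only one placement is found; with the large window every copy of the repeat becomes a candidate deletion breakpoint.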

4 SVseq2: a pattern for deletion calling:
Finding the focal region with the help of a spanning pair
li: library mean insert size; σ: standard deviation; l: read length
[Figure: a spanning pair on Reference and Alternative; the known breakpoint and the unknown breakpoint are the same breakpoint on Alternative; focal regions of length li + 3σ - 2l]
E.g. li + 3σ - 2l = 350 bp. Note that the maximum event size can be 1 Mbp.
(a) Within length li + 3σ - 2l from [...], find [...].
(b) Find [...] by using [...], because they are a pair.
(c) Find [...] by mapping the soft-clipped portion within length li + 3σ - 2l of [...].
Using the focal region: (1) search in a much smaller space; (2) reduce the number of ways to split; (3) able to find large deletions.
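The focal-region length li + 3σ - 2l is simple arithmetic, sketched below. The function names and the input values (mean insert size 400, standard deviation 50, read length 100) are hypothetical and chosen only so the result is easy to check; they are not taken from the slides.

```python
def focal_region_length(mean_insert, sigma, read_len):
    """Length of the focal region: li + 3*sigma - 2*l."""
    return mean_insert + 3 * sigma - 2 * read_len


def focal_region(anchor_pos, mean_insert, sigma, read_len, downstream=True):
    """Half-open interval to search, starting at (or ending at) anchor_pos.

    The orientation handling is an illustrative assumption.
    """
    span = focal_region_length(mean_insert, sigma, read_len)
    if downstream:
        return (anchor_pos, anchor_pos + span)
    return (max(0, anchor_pos - span), anchor_pos)


length = focal_region_length(400, 50, 100)
window = focal_region(10_000, 400, 50, 100)
```

The window stays a few hundred bases wide regardless of how large the deletion itself is, which is why the search space is much smaller than a 1 Mbp maximum event size.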

5 SVseq2: another pattern in deletion calling
[Figure: Reference vs. Alternative for a deletion; the anchor and a focal region of length li + 3σ - 2l]
The pair itself is also a spanning pair.
Dynamic programming alignment (semi-global)
Similarity: 1 for matches and -1 for mismatches.
Penalty: applied to gaps inside the sequence, 0 outside.
Example alignment:
GTTCTAAGCCAGTGGTTCTACCAACTTGAGTATGCATCAGAATCACTTGGA
          AGTGGTTCT-CCAACTTGAGAATGCATCA
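A minimal sketch of semi-global alignment with this scoring follows. It is a textbook dynamic program, not SVseq2's code. The internal gap penalty did not survive in the slide text, so `gap=-1` is an assumption, and the function name `semiglobal_score` is hypothetical. Gaps outside the read (unaligned reference on either side) cost 0.

```python
def semiglobal_score(read, ref, match=1, mismatch=-1, gap=-1):
    """Best score of aligning the whole read somewhere inside ref.

    Leading and trailing reference bases are free; gaps inside the
    alignment cost `gap` each (assumed value, see lead-in).
    Returns (score, end), where end is the exclusive reference end position.
    """
    n, m = len(read), len(ref)
    prev = [0] * (m + 1)  # row 0: skipping reference prefix is free
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m  # read bases with no reference cost a gap
        for j in range(1, m + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,   # match or mismatch
                          prev[j] + gap,     # read base against a gap
                          curr[j - 1] + gap) # reference base against a gap
        prev = curr
    end = max(range(m + 1), key=lambda j: prev[j])  # free reference suffix
    return prev[end], end


ref = "GTTCTAAGCCAGTGGTTCTACCAACTTGAGTATGCATCAGAATCACTTGGA"
read = "AGTGGTTCTCCAACTTGAGAATGCATCA"  # the slide's read, gap character removed
score, end = semiglobal_score(read, ref)
```

Under these assumed penalties the slide's example has 27 matches, one mismatch and one internal gap.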

6 SVseq2: Type III pattern for Insertion calling:
[Figure: Read 1 and Read 2 on Alternative with their overlap; Region 1 on Reference; Portions 1 to 4]
Mapping score: penalties are the same as in the deletion case.
Calling: score / length of overlap < threshold.
SVseq2 currently does not reconstruct inserted sequences and still uses a cutoff.
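The calling rule can be written as a one-line check. This is only a restatement of the inequality above; the function name and the threshold value used in the example are hypothetical, since the slide does not give the threshold.

```python
def meets_insertion_rule(score, overlap_len, threshold):
    """True when score / overlap_len is below the threshold."""
    if overlap_len <= 0:
        raise ValueError("overlap length must be positive")
    return score / overlap_len < threshold


# Hypothetical numbers: overlap of 50 bases, threshold 0.5.
low_score_call = meets_insertion_rule(10, 50, 0.5)
high_score_call = meets_insertion_rule(45, 50, 0.5)
```

Normalizing by the overlap length keeps the rule comparable across read pairs whose overlaps differ in size.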

7 Results Simulation on deletions
Simulation on chromosome 15 (100,338,915 bp).
Deletions with exact breakpoints are introduced from the 1000 Genomes Project release: union deletions.genotypes.vcf.gz (132 deletions), 45 individuals.
Reads are simulated with wgsim (error rate 0.02): paired-end reads of length 100, outer distance 500.
Mapped by BWA.
Cutoff: SVseq2 cutoff 3; SVseq1 cutoff 3; Pindel 0.2.4d cutoff 3.
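As a rough sketch, the simulation settings above map onto wgsim options as follows: `-e` is the base error rate, `-d` the outer distance, `-1` and `-2` the read lengths, and `-N` the number of read pairs. The coverage is not stated on the slide, so the value 4 used below is a hypothetical placeholder, as are the file names and the helper name `wgsim_command`. Other wgsim options are left at their defaults, which may differ from what the authors used.

```python
def wgsim_command(ref_fasta, out1, out2, coverage,
                  genome_len=100_338_915, read_len=100,
                  outer_dist=500, error_rate=0.02):
    """Build a wgsim argument list for a target coverage.

    The number of pairs is coverage * genome_len / (2 * read_len).
    """
    n_pairs = int(coverage * genome_len / (2 * read_len))
    return ["wgsim",
            "-e", str(error_rate),
            "-d", str(outer_dist),
            "-1", str(read_len),
            "-2", str(read_len),
            "-N", str(n_pairs),
            ref_fasta, out1, out2]


# Hypothetical inputs; coverage 4 is an assumed low-coverage setting.
cmd = wgsim_command("chr15.fa", "reads_1.fq", "reads_2.fq", coverage=4)
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where wgsim is installed.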

8 Real data
Illumina datasets of 18 individuals on chromosome 20 (9 CEU, 9 YRI), mapped by BWA on NCBI37.
Benchmark: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/working/
The benchmark contains SVs called using BreakDancerMax1.1, CNVnator, GenomeStrip, EMBL/Delly and Pindel (with data of 1094 individuals).
Cutoffs: SVseq2 cutoff 3 (no cutoff for type I) and 4; SVseq1 and Pindel 0.2.4d cutoff 3.
[Tables: Individual data; Pooled data]
F: findings; SE: supported by exact breakpoint; SO: supported by overlap.
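One way to tally F, SE and SO against a benchmark is sketched below. The slides do not define the matching criteria, so two assumptions are made here: "exact" means both breakpoints are identical to a benchmark call, and SO counts only overlapping calls that are not exact. Calls are represented as hypothetical (start, end) half-open intervals.

```python
def classify_call(call, benchmark):
    """Return 'SE', 'SO' or None for one (start, end) call."""
    start, end = call
    if any((b_start, b_end) == (start, end) for b_start, b_end in benchmark):
        return "SE"  # both breakpoints identical (assumed criterion)
    if any(start < b_end and b_start < end for b_start, b_end in benchmark):
        return "SO"  # intervals overlap but breakpoints differ
    return None


def summarize(calls, benchmark):
    """Count findings (F) and how many are supported exactly or by overlap."""
    labels = [classify_call(c, benchmark) for c in calls]
    return {"F": len(calls),
            "SE": labels.count("SE"),
            "SO": labels.count("SO")}


# Hypothetical coordinates for illustration.
benchmark = [(100, 200), (500, 900)]
summary = summarize([(100, 200), (150, 250), (300, 400)], benchmark)
```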

9 Running time and Acknowledgement
Running time: NA19312, one thread.
Acknowledgement: supported by NSF grant IIS

