Reconstruction of Haplotype Spectra from NGS Data Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science & Engineering.


1 Reconstruction of Haplotype Spectra from NGS Data
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
University of Connecticut

2 Haplotype Spectra Reconstruction
Given NGS reads, reconstruct:
– Full length sequences
– Sequence frequencies
Example applications:
– Single individual haplotyping
– Allele specific transcriptome reconstruction
– Viral quasispecies reconstruction

3 Single Individual Haplotyping
Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
– Heterozygous loci found by mapping reads to reference genome
– Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]

4 Single Individual Haplotyping
Input: Matrix M of m fragments covering n loci

Locus  1 2 3 4 5 ... n
f1     * 0 1 1 0 ... 0
f2     1 1 0 * 1 ... 1
f3     0 0 0 1 1 ... *
fm     * * 1 * 1 ... 1


8 RefHap Algorithm [Duitama et al. 12]
Reduce the problem to Max-Cut
Solve Max-Cut
Build haplotypes according to the cut

Locus  1 2 3 4 5
f1     * 0 1 1 0
f2     1 1 0 * 1
f3     1 * * 0 *
f4     * 0 0 * 1

[Figure: graph on fragments f1, f2, f3, f4 with edge weights 3, 1, 1]

h1     0 0 1 1 0
h2     1 1 0 0 1

Chr. 22, 32k SNPs, 14k fragments
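The three steps on this slide can be illustrated on its four example fragments. The following is a minimal sketch, not the RefHap implementation: it assumes the edge weight between two fragments is the number of mismatching loci minus the number of matching loci among the loci both cover, it solves Max-Cut by brute force (usable only for a handful of fragments), and it calls each allele by majority vote. The function names are hypothetical.

```python
from itertools import combinations, product

def weight(f, g):
    """Assumed edge weight: mismatches minus matches over loci covered by both fragments."""
    w = 0
    for a, b in zip(f, g):
        if a == "*" or b == "*":
            continue  # locus not covered by one of the fragments
        w += 1 if a != b else -1
    return w

def max_cut_brute_force(frags):
    """Try every bipartition; fragment 0 is pinned to side 0 to skip mirrored cuts."""
    n = len(frags)
    w = {(i, j): weight(frags[i], frags[j]) for i, j in combinations(range(n), 2)}
    best_side, best_val = None, float("-inf")
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest
        val = sum(wij for (i, j), wij in w.items() if side[i] != side[j])
        if val > best_val:
            best_side, best_val = side, val
    return best_side, best_val

def haplotypes_from_cut(frags, side):
    """Majority vote per locus; fragments on side 1 vote for the complementary allele."""
    h1 = []
    for k in range(len(frags[0])):
        votes = 0
        for f, s in zip(frags, side):
            if f[k] == "*":
                continue
            allele = int(f[k]) if s == 0 else 1 - int(f[k])
            votes += 1 if allele == 1 else -1
        h1.append("1" if votes > 0 else "0")  # ties are called as 0
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)
    return h1, h2

frags = ["*0110", "110*1", "1**0*", "*00*1"]  # f1..f4 from the slide
side, cut_value = max_cut_brute_force(frags)
h1, h2 = haplotypes_from_cut(frags, side)
print(side, cut_value, h1, h2)
```

Under these assumptions the edges leaving f1 get weights 3, 1 and 1, the best cut separates f1 from the other three fragments, and the vote yields 00110 and 11001, matching h1 and h2 on the slide. Real instances (32k SNPs, 14k fragments) need a Max-Cut heuristic rather than enumeration.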

9 Haplotype Spectra Reconstruction
Given short sequence fragments, reconstruct:
– Full length sequences
– Sequence frequencies
Example applications:
– Single individual haplotyping
– Allele specific transcriptome reconstruction
– Viral quasispecies reconstruction

10 Transcriptome Reconstruction Challenge: Alternative Splicing [Griffith and Marra 07]

11 [Figure: alternative transcripts over exons labeled 1 7 4 2 3 6 5]
t1: 1 7 4 3 6 5
t2: 1 7 4 2 3 5
t3: 1 7 4 3 5
t4: 1 7 4 2 3 6 5

12 TRIP: Transcriptome Reconstruction using Integer Programming
Map the RNA-Seq reads to genome
Construct Splice Graph - G(V,E)
– V: exons
– E: splicing events
Generate candidate transcripts
– Depth-first-search (DFS)
Filter candidate transcripts
– Fragment length distribution (FLD)
– Integer programming
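The candidate-generation step can be sketched as a path enumeration over the splice graph. This is an illustration under stated assumptions, not TRIP's code: the graph is assumed to be acyclic (exons ordered along the genome), the adjacency list below is a made-up example, and `enumerate_transcripts` is a hypothetical name.

```python
def enumerate_transcripts(splice_graph, sources, sinks):
    """Yield every source-to-sink exon path using an iterative depth-first search.

    splice_graph maps an exon to the exons it can be spliced to.
    The graph is assumed to be acyclic, so the search terminates.
    """
    stack = [(s, (s,)) for s in sorted(sources, reverse=True)]
    while stack:
        exon, path = stack.pop()
        if exon in sinks:
            yield path
        # Push neighbors in reverse order so the smallest exon is explored first.
        for nxt in sorted(splice_graph.get(exon, ()), reverse=True):
            stack.append((nxt, path + (nxt,)))

# Hypothetical splice graph: exons 2 and 4 are optional (cassette) exons.
splice_graph = {1: [2, 3], 2: [3], 3: [4, 5], 4: [5]}
candidates = list(enumerate_transcripts(splice_graph, sources={1}, sinks={5}))
for path in candidates:
    print("-".join(map(str, path)))
```

The number of paths can grow exponentially with the number of alternative splicing events, which is one reason the candidates are filtered afterwards.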

13 How to filter?
Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distributions
– empirically determined during library preparation
– implied by “mapping” read pairs
[Figure labels: 1 3; 1 2 3; 500; 300; 200]
Mean: 500; Std. dev.: 50
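To make "implied by mapping read pairs" concrete, the sketch below computes the fragment length that one read pair implies under two candidate transcripts and compares it with the library distribution (mean 500, standard deviation 50, as on the slide). The exon coordinates and read-pair positions are invented for illustration, and the z-score is only a stand-in for the statistical fit; the integer program that selects the transcript set is not shown.

```python
def transcript_offset(exons, pos):
    """Map a genomic position to a transcript coordinate.

    exons is a list of (start, end) genomic intervals in transcript order.
    """
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start)
        offset += end - start  # skip the whole exon
    raise ValueError(f"position {pos} is not inside any exon of this transcript")

def implied_fragment_length(exons, left_start, right_end):
    """Fragment length in transcript coordinates, i.e. with introns spliced out."""
    return transcript_offset(exons, right_end) - transcript_offset(exons, left_start)

MEAN, STD = 500.0, 50.0  # library fragment length distribution from the slide

# Hypothetical coordinates: exon 2 is 200 bases long and may be skipped.
EXON1, EXON2, EXON3 = (0, 200), (300, 500), (600, 800)
candidate_transcripts = {"1-2-3": [EXON1, EXON2, EXON3], "1-3": [EXON1, EXON3]}

# Hypothetical read pair: left read starts in exon 1, right read ends in exon 3.
left_start, right_end = 50, 750

lengths, z_scores = {}, {}
for name, exons in candidate_transcripts.items():
    lengths[name] = implied_fragment_length(exons, left_start, right_end)
    z_scores[name] = (lengths[name] - MEAN) / STD

print(lengths)   # implied fragment length per candidate transcript
print(z_scores)  # distance from the library mean in standard deviations
```

With these made-up coordinates the pair implies a 500-base fragment under transcript 1-2-3 and a 300-base fragment under transcript 1-3, so the pair is far more plausible under the transcript that retains exon 2.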

14 Allele Specific Expression

15 Haplotype Spectra Reconstruction
Given short sequence fragments, reconstruct:
– Full length sequences
– Sequence frequencies
Example applications:
– Single individual haplotyping
– Allele specific transcriptome reconstruction
– Viral quasispecies reconstruction

16 RNA Virus Replication
High mutation rate (~10^-4)
Lauring & Andino, PLoS Pathogens 2011

17 How Are Quasispecies Contributing to Virus Persistence and Evolution?
Variants differ in
– Virulence
– Ability to escape immune response
– Resistance to antiviral therapies
– Tissue tropism
Lauring & Andino, PLoS Pathogens 2011

18 Shotgun vs. Amplicon Reads
Shotgun reads: starting positions distributed ~uniformly
Amplicon reads: predefined start/end positions covering fixed overlapping windows

19 Reconstruction from Shotgun Reads: ViSpA
Input: Shotgun reads
– Read Error Correction
– Read Alignment
– Preprocessing of Aligned Reads
– Read Graph Construction
– Contig Assembly
– Frequency Estimation
Output: Quasispecies sequences w/ frequencies

20 Reconstruction from Amplicon Reads: VirA
Input: Reference in FASTA format; error-corrected read data in SAM/BAM
– Estimate Amplicons
– Amplicon Read Graph
– Max-Bandwidth Paths
– Frequency Estimation
Output: Viral population variants with frequencies

21 Amplicon Read Graph
K amplicons represented by K-layer read graph
Vertices ⇔ distinct reads
Edges ⇔ reads with consistent overlap
Vertices have count function c(v)
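A small sketch of the K-layer read graph and of a max-bandwidth path through it follows. Several things here are assumptions rather than VirA's definitions: reads are plain strings, consecutive amplicons overlap by a fixed number of bases, two reads overlap consistently when the suffix of one equals the prefix of the other, and the bandwidth of a path is the smallest count c(v) along it. The function names and the toy reads are hypothetical.

```python
from collections import Counter

def build_read_graph(layers, overlap):
    """layers[k] lists the reads of amplicon k; returns (counts, edges).

    counts[k][v] is c(v) for distinct read v in layer k.
    edges[(k, u)] lists the reads in layer k + 1 that overlap consistently with u.
    """
    counts = [Counter(reads) for reads in layers]
    edges = {}
    for k in range(len(layers) - 1):
        for u in counts[k]:
            edges[(k, u)] = [v for v in counts[k + 1] if u[-overlap:] == v[:overlap]]
    return counts, edges

def max_bandwidth_path(counts, edges):
    """Layer-by-layer dynamic program maximizing the minimum count on the path."""
    # best[v] = (bandwidth, path) of the best path ending at read v in the current layer
    best = {v: (c, (v,)) for v, c in counts[0].items()}
    for k in range(len(counts) - 1):
        nxt = {}
        for u, (bw, path) in best.items():
            for v in edges[(k, u)]:
                cand = (min(bw, counts[k + 1][v]), path + (v,))
                if v not in nxt or cand[0] > nxt[v][0]:
                    nxt[v] = cand
        best = nxt
    return max(best.values(), key=lambda t: t[0]) if best else None

# Toy data: three amplicons overlapping by two bases.
layers = [
    ["ACGT"] * 6 + ["ACGA"] * 2,
    ["GTTC"] * 5 + ["GATC"] * 2 + ["GTTA"],
    ["TCAA"] * 5 + ["TCAG"] * 2 + ["TAAA"],
]
counts, edges = build_read_graph(layers, overlap=2)
bandwidth, path = max_bandwidth_path(counts, edges)
print(bandwidth, path)
```

On the toy data the widest path follows the most abundant read in every layer, with bandwidth 5. A full reconstruction would typically extract several such paths and then estimate their frequencies, which this sketch does not attempt.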

22 Read Graph Transformation
Heuristic to reduce edges in dense graphs
Replace bipartite cliques with star subgraphs
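The edge saving from this transformation is easy to see in code. The sketch below assumes the bipartite clique has already been found (the slide does not say how) and replaces its |A|·|B| edges with |A|+|B| edges through a new hub vertex, so every original connection u → v survives as u → hub → v. The function and vertex names are hypothetical.

```python
def replace_biclique_with_star(edges, left, right, hub):
    """Return a new edge set with the complete bipartite subgraph left x right
    replaced by a star centered at hub."""
    clique = {(u, v) for u in left for v in right}
    if not clique <= edges:
        raise ValueError("left x right is not a complete bipartite subgraph")
    star = {(u, hub) for u in left} | {(hub, v) for v in right}
    return (edges - clique) | star

left = {"a1", "a2", "a3"}
right = {"b1", "b2", "b3", "b4"}
edges = {(u, v) for u in left for v in right} | {("a0", "b1")}  # 12 clique edges + 1 other

reduced = replace_biclique_with_star(edges, left, right, hub="hub0")
print(len(edges), "->", len(reduced))
```

Here 13 edges shrink to 8; the saving grows with the clique size, since the clique contributes a product of the two side sizes while the star contributes only their sum.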

23 Challenges
Scalability
– Exploit inherent sparsity of biological instances
– E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees
Flexibility
– Long (noisy) reads + short
– Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq
Quantifying reconstruction uncertainty
– Compute intensive, e.g., bootstrapping

24 Acknowledgements
Jorge Duitama, Sahar Al Seesi, Mazhar Kahn, Rachel O’Neill, Alexander Artyomenko, Adrian Caciula, Nicholas Mancuso, Serghei Mangul, Bassam Tork, Alex Zelikovsky, Irina Astrovskaya, Pavel Skums

