Marius Nicolae Computer Science and Engineering Department University of Connecticut Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky.

Presentation transcript:

1
Marius Nicolae
Computer Science and Engineering Department
University of Connecticut
Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky

2
• Introduction
• EM Algorithm
• Results
• Conclusions and future work

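3
[Figure: gene with parts labeled A, B, C, D, E. Steps shown: make cDNA and shatter into fragments; sequence fragment ends; map reads. Applications shown: Gene Expression (GE), Isoform Discovery (ID), Isoform Expression (IE). Other surviving labels: ABC, AC, DE.]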

4
• Read ambiguity (multireads)
• What is the gene length?
[Figure; surviving labels: A, B, C, D, E]

5
• Ignore multireads
• [Mortazavi et al. 08]
  ◦ Fractionally allocate multireads based on unique read estimates
• [Pasaniuc et al. 10]
  ◦ EM algorithm for solving ambiguities
• Gene length: sum of lengths of exons that appear in at least one isoform
• Underestimate expression levels for genes with 2 or more isoforms [Trapnell et al. 10]
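The gene-length definition on this slide (sum of lengths of exons that appear in at least one isoform) can be illustrated with a minimal Python sketch. The interval representation, the example coordinates, and the function name `gene_length` are assumptions made for illustration; the slides do not specify them. Exons are taken as half-open (start, end) coordinates, and overlapping exons from different isoforms are merged so that shared bases are counted once.

```python
def gene_length(isoforms):
    """Length of the union of all exon intervals over all isoforms.

    isoforms: list of isoforms, each a list of half-open (start, end) exons.
    """
    exons = sorted(exon for iso in isoforms for exon in iso)
    total = 0
    cur_start, cur_end = None, None
    for start, end in exons:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval, if any, and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching exon: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical gene with exons A=[0,100), B=[200,260), C=[400,550).
iso_abc = [(0, 100), (200, 260), (400, 550)]
iso_ac = [(0, 100), (400, 550)]
length = gene_length([iso_abc, iso_ac])
```

With these made-up coordinates, exon B counts toward the gene length even though isoform AC skips it.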

6
[Figure; surviving labels: ABCDE, AC]

7
• [Jiang & Wong 09]
  ◦ Poisson model, single reads only
• [Li et al. 10]
  ◦ EM algorithm, single reads only
• [Feng et al. 10]
  ◦ Convex quadratic program, pairs used only for ID
• [Trapnell et al. 10]
  ◦ Extends Jiang's model to paired reads
  ◦ Fragment length distribution

8
• EM algorithm for IE
  ◦ Single and paired reads
  ◦ Fragment length distribution
  ◦ Strand information
  ◦ Base quality scores
• Solving GE by adding isoform levels
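"Solving GE by adding isoform levels" amounts to summing the estimated levels of all isoforms that belong to the same gene. A minimal sketch follows; the function name `gene_levels`, the dictionary layout, and the identifiers in the example are hypothetical.

```python
from collections import defaultdict


def gene_levels(isoform_levels, isoform_to_gene):
    """Sum isoform expression levels per gene.

    isoform_levels: dict isoform id -> estimated level
    isoform_to_gene: dict isoform id -> gene id
    """
    totals = defaultdict(float)
    for isoform, level in isoform_levels.items():
        totals[isoform_to_gene[isoform]] += level
    return dict(totals)


# Hypothetical example: gene g1 has isoforms ABC and AC, gene g2 has one isoform.
levels = {"g1.ABC": 0.5, "g1.AC": 0.25, "g2.DE": 0.25}
mapping = {"g1.ABC": "g1", "g1.AC": "g1", "g2.DE": "g2"}
per_gene = gene_levels(levels, mapping)
```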

9
• Introduction
• EM Algorithm
• Results
• Conclusions and future work

10

11
• Paired reads
• Single reads
[Figure; surviving labels: ABC, AC]

12
[Slide content not captured; surviving labels: E-step, M-step]
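The update formulas on this slide did not survive in the transcript. As a rough illustration of how an E-step and M-step can alternate for isoform frequency estimation, here is a generic EM sketch in Python. It is not the authors' exact update: the function name `em_isoform_frequencies`, the read representation (a dict mapping isoform index to a compatibility weight), and the use of plain isoform length for normalization are all assumptions.

```python
def em_isoform_frequencies(reads, lengths, iterations=100):
    """Generic EM sketch for isoform frequencies (illustrative only).

    reads: list of dicts {isoform_index: weight}, one dict per read,
           listing the isoforms the read is compatible with.
    lengths: list of isoform lengths (assumed normalization).
    """
    n_iso = len(lengths)
    freq = [1.0 / n_iso] * n_iso  # start from uniform frequencies
    for _ in range(iterations):
        # E-step: split each read among its compatible isoforms in
        # proportion to current frequency times compatibility weight.
        counts = [0.0] * n_iso
        for compat in reads:
            denom = sum(freq[j] * w for j, w in compat.items())
            if denom == 0.0:
                continue  # read cannot be explained by any isoform
            for j, w in compat.items():
                counts[j] += freq[j] * w / denom
        # M-step: frequencies proportional to expected reads per base.
        rates = [counts[j] / lengths[j] for j in range(n_iso)]
        total = sum(rates)
        if total == 0.0:
            break
        freq = [r / total for r in rates]
    return freq


# Hypothetical data: two equal-length isoforms, 3 reads unique to isoform 0,
# 1 read unique to isoform 1, and 4 multireads compatible with both.
reads = [{0: 1.0}] * 3 + [{1: 1.0}] + [{0: 1.0, 1: 1.0}] * 4
freq = em_isoform_frequencies(reads, lengths=[1000, 1000])
```

In this toy case the multireads end up divided in the same 3:1 ratio as the unique reads, which is the fixed point of the iteration.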

13
• Introduction
• EM Algorithm
• Results
• Conclusions and future work

14
• Human genome UCSC known isoforms
• GNFAtlas2 gene expression levels
  ◦ Uniform/geometric expression of gene isoforms
• Normally distributed fragment lengths
  ◦ Mean 250, std. dev. 25
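To show why a fragment length distribution can help resolve ambiguity, the sketch below scores one hypothetical read pair against two isoforms using the normal distribution from this slide (mean 250, standard deviation 25). The implied fragment lengths (240 and 140) and the variable names are made up for illustration; how the actual method turns these probabilities into read weights is not stated in the transcript.

```python
import math


def normal_pdf(x, mean=250.0, sd=25.0):
    """Density of a normal distribution with the given mean and std. dev."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical read pair: mapped to isoform ABC it implies a 240-base
# fragment; mapped to isoform AC (exon B skipped) it implies 140 bases.
w_abc = normal_pdf(240)
w_ac = normal_pdf(140)

# Relative support for each isoform from fragment length alone.
share_abc = w_abc / (w_abc + w_ac)
```

Under these assumed numbers almost all of the support goes to isoform ABC, because a 140-base fragment lies more than four standard deviations below the mean.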

15
• Error Fraction (EF)
  ◦ Percentage of isoforms (or genes) with relative error larger than given threshold t
• Median Percent Error (MPE)
  ◦ Threshold t for which EF is 50%
• r²
  ◦ Coefficient of determination
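The three accuracy measures on this slide can be sketched in Python as follows. The function names are hypothetical, relative error is taken as |estimate - truth| / truth (which assumes nonzero true values), and r² is computed as the squared Pearson correlation, which is one common reading of "coefficient of determination" and may differ from the exact definition used in the talk.

```python
import statistics


def relative_errors(true, est):
    """Relative error per isoform (or gene); assumes true values are nonzero."""
    return [abs(e - t) / t for t, e in zip(true, est)]


def error_fraction(true, est, t):
    """Percentage of items whose relative error exceeds threshold t."""
    errs = relative_errors(true, est)
    return 100.0 * sum(err > t for err in errs) / len(errs)


def median_percent_error(true, est):
    """Median relative error, in percent (the threshold where EF is 50%)."""
    return 100.0 * statistics.median(relative_errors(true, est))


def r_squared(true, est):
    """Squared Pearson correlation between true and estimated levels."""
    mt, me = statistics.fmean(true), statistics.fmean(est)
    cov = sum((t - mt) * (e - me) for t, e in zip(true, est))
    var_t = sum((t - mt) ** 2 for t in true)
    var_e = sum((e - me) ** 2 for e in est)
    return cov * cov / (var_t * var_e)


# Hypothetical true and estimated expression levels.
true_levels = [10.0, 20.0, 30.0, 40.0]
est_levels = [10.0, 22.0, 24.0, 60.0]
ef = error_fraction(true_levels, est_levels, t=0.15)
mpe = median_percent_error(true_levels, est_levels)
```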

16
• 30M single reads of length 25
• Main difference between IsoEM and RSEM is fragment length modeling

17
• 30M single reads of length 25

18
• Fixed sequencing throughput (750Mb)
• 50bp reads better than 100bp!
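At a fixed throughput, shorter reads mean more reads. The arithmetic can be sketched as below, assuming that 750Mb means 750 million sequenced bases and counting individual reads rather than pairs; the function name is hypothetical. This only shows the read-count side of the trade-off and does not by itself explain the accuracy result on the slide.

```python
def reads_for_throughput(throughput_bases, read_length):
    """Number of reads of a given length that fit in a fixed base budget."""
    return throughput_bases // read_length


THROUGHPUT = 750_000_000  # 750 Mb, read as 750 million bases (assumption)
reads_50 = reads_for_throughput(THROUGHPUT, 50)    # 15,000,000 reads
reads_100 = reads_for_throughput(THROUGHPUT, 100)  # 7,500,000 reads
```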

19
• 1-60M 75bp reads
• Pairs help, strand info doesn't
• [Trapnell et al. 10] r² = 0.95 for 13M paired-end (PE) reads

20
• Introduction
• EM Algorithm
• Results
• Conclusions and future work

21
• Presented EM algorithm for isoform frequency estimation that exploits fragment length distribution for both single and paired reads
  ◦ Significant accuracy improvement over existing methods
  ◦ Code and datasets to be released publicly soon
• Ongoing extensions
  ◦ Confidence intervals
  ◦ Allele-specific isoform expression
  ◦ Testing for novel isoforms
  ◦ Integration with isoform discovery

22

