8 Q1-Q2-Q3: Projected filtered RF distribution (internal = overlaps target gene; projection done by pool)
● 39% internal
  - 46% exonic
  - 54% intronic
● 61% external
  - 71% genic
    - 79% exonic
    - 22% overlap most 5' exon of transcript
    - 21% intronic
  - 29% intergenic

● 86% internal
  - 88% exonic
  - 12% intronic
● 14% external
  - 78% genic
    - 88% exonic
    - 47% overlap most 5' exon of transcript
    - 12% intronic
  - 22% intergenic

● 21% internal
  - 47% exonic
  - 53% intronic
● 79% external
  - 78% genic
    - 69% exonic
    - 23% overlap most 5' exon of transcript
    - 31% intronic
  - 22% intergenic

Q1 Q3 Q2
chimeric transcripts?
9 Why are Q3 RF mostly external (79%)? Is there a systematic swap between certain pairs of pools? For each RF, we computed the overlap with all Q3 genes and then compared the pool of the RF with the pool of the gene it overlaps.
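The comparison described on this slide can be sketched as a tally of (RF pool, overlapped gene pool) pairs. This is a minimal illustration only: the `Interval` record, the half-open coordinates and the toy data are assumptions, not the format used in the actual analysis. A systematic swap between two pools would show up as a large off-diagonal count for that pair.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical record for an RF or a gene (half-open coordinates)."""
    chrom: str
    start: int
    end: int
    pool: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def pool_pair_counts(rfs, genes) -> Counter:
    """Count (RF pool, gene pool) pairs over all RF/gene overlaps."""
    counts = Counter()
    for rf in rfs:
        for gene in genes:
            if overlaps(rf, gene):
                counts[(rf.pool, gene.pool)] += 1
    return counts


# Toy data, invented for illustration.
genes = [Interval("chr21", 100, 500, 6), Interval("chr22", 1000, 2000, 10)]
rfs = [
    Interval("chr21", 150, 200, 6),     # same pool as the gene it overlaps
    Interval("chr21", 300, 350, 10),    # pool 10 RF on a pool 6 gene
    Interval("chr22", 1500, 1600, 6),   # pool 6 RF on a pool 10 gene
    Interval("chr22", 5000, 5100, 10),  # overlaps no gene
]
pair_counts = pool_pair_counts(rfs, genes)
for (rf_pool, gene_pool), n in sorted(pair_counts.items()):
    print(f"RF pool {rf_pool} -> gene pool {gene_pool}: {n}")
```

The nested loop is quadratic; for chromosome-scale data an interval index would be the usual replacement, but the tally itself stays the same.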
31 Pool unspecific unique RF (USPP-filtered)
Most pool unspecific unique RF are:
● Q1: internal exonic (72%)
● Q2: internal exonic (87%)
● Q3: external (91%), of which 63% are exonic
20 unique RF are in more than 4 pools
32 Pool unspecific unique Q3 RF (filtered)
- Hits found by blat.
- Needs to be redone using our highlighted probe simulator.
33 Q1-Q3: Number of pools a unique RF appears in (unfiltered/filtered) [panels: Q3, Q1, Q2]
34 Pool-unspecific RFs in Q3 Possibly due to...
1 - Cross-hybridization? Is there a correlation between the number of pools an RF is found in and the number of non-unique probes it overlaps? No.
Incidentally, 135,380 / 2,191,331 (6%) of the probes on the chr21/22 chip have multiple perfect matches in the genome.
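As a quick check of the probe figure quoted on this slide, and as a sketch of how the pools-versus-non-unique-probes question could be tested, the snippet below computes the percentage and defines a plain Pearson correlation. The slide does not say which statistic was used, so Pearson is an assumption here, and the per-RF values are invented toy data.

```python
from math import sqrt

# Figures taken from the slide.
multi_match_probes = 135_380
total_probes = 2_191_331
multi_match_pct = 100 * multi_match_probes / total_probes  # about 6%


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy per-RF values, invented: pools the RF is found in, and
# number of non-unique probes the RF overlaps.
pools_per_rf = [1, 2, 3, 4, 5]
nonunique_probes_per_rf = [2, 0, 3, 1, 2]
r = pearson(pools_per_rf, nonunique_probes_per_rf)
print(f"{multi_match_pct:.1f}% multi-match probes, r = {r:.2f}")
```

A coefficient near zero on the real data would be consistent with the "no" on the slide.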
35 Pool-unspecific RFs in Q3 Possibly due to... 2 - high GC content? -> Answer: NO!
36 Pool-unspecific RFs in Q3 Possibly due to...
3 - Mis-priming on unknown transcripts of chr21 or chr22 (missed by the simulator)?
4 - Genuine chimeric transcripts?
5 - Pooling errors: the same gene is present in >1 pool because it has 2 different identifiers (UCSC known genes / RefSeq nomenclature discrepancy). We found a few cases like this; it is not yet clear how widespread they are (systematic survey to come).
37 Genes present in several pools
5 genes present in 2 pools:
● RP5-1042K10.2, NM_015705 (pools 14, 15)
● CHODL, NM_024944 (pools 6, 10)
● NM_005446, P2RXL1 (pools 8, 9)
● ZNF74, NM_003426 (pools 10, 13)
● NM_015367, BCL2L13 (pools 12, 3)
1 gene present in 3 pools:
● NM_021090, NM_001013676, MTMR3 (pools 15, 14, 16)
Eliminate RF present in these pairs/triplets of pools (problematic pool RF)
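The elimination step can be sketched as follows. The pool groups are those listed on this slide; the exact rule is an assumption: here an RF is treated as problematic when it appears in more than one pool and all of its pools fall inside a single listed pair or triplet. The RF names and pool sets in the example are invented.

```python
# Pairs/triplets of pools sharing a gene, as listed on the slide.
PROBLEMATIC_GROUPS = [
    frozenset(group)
    for group in ({14, 15}, {6, 10}, {8, 9}, {10, 13}, {3, 12}, {14, 15, 16})
]


def is_problematic(rf_pools) -> bool:
    """Assumed rule: a multi-pool RF whose pools all lie in one listed group."""
    pools = frozenset(rf_pools)
    return len(pools) > 1 and any(pools <= group for group in PROBLEMATIC_GROUPS)


def filter_problematic(rf_to_pools: dict) -> dict:
    """Keep only the RF that are not explained by a gene shared between pools."""
    return {rf: pools for rf, pools in rf_to_pools.items()
            if not is_problematic(pools)}


# Toy input, invented: RF identifier -> set of pools it was found in.
rf_to_pools = {
    "rf_a": {8, 9},    # inside the 8-9 pair: removed
    "rf_b": {8},       # single pool: kept
    "rf_c": {6, 13},   # not inside any single listed group: kept
    "rf_d": {14, 16},  # inside the 14-15-16 triplet: removed
}
kept = filter_problematic(rf_to_pools)
print(sorted(kept))
```

A looser rule (removing any RF whose pools include a listed pair) would remove more RF; which rule was applied is not stated on the slide.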
38 Effect of filtering problematic pool RF on Q3 pool unspecificity -48 Genes present in several pools do not explain all pool unspecific RF of Q3
39 Distribution of pool specific and pool unspecific unique Q3 RF
Compared to pool specific Q3 RF, pool unspecific Q3 RF are more often:
● external to Q3 genes
● exonic
40 Pool specific and unspecific RF regarding gene overlap Pool specific RF overlap their target gene more than pool unspecific RF
41 Two other criteria for comparing Q3 pool specific and unspecific RF
● Overlap with gene in same orientation as target gene: pool unspecific RF behave similarly to pool specific RF
● Distance to target gene: pool unspecific RF are more distant from their target gene
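Both criteria can be computed per RF along the lines below. This is a sketch under assumptions: the `Feature` record, half-open coordinates, and the use of "." for the unstranded RF are illustrative choices, not the conventions of the original analysis, and the coordinates are invented.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical genomic feature (half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+", "-", or "." when unstranded (e.g. an RF)


def distance_to_target(rf: Feature, target: Feature) -> Optional[int]:
    """Gap in bp between RF and target gene; 0 if they overlap or touch,
    None if they are on different chromosomes."""
    if rf.chrom != target.chrom:
        return None
    return max(0, target.start - rf.end, rf.start - target.end)


def overlaps_gene_on_target_strand(rf: Feature, target: Feature, genes) -> bool:
    """True if the RF overlaps any gene annotated on the target gene's strand."""
    return any(
        g.chrom == rf.chrom
        and g.start < rf.end
        and rf.start < g.end
        and g.strand == target.strand
        for g in genes
    )


# Toy example, invented coordinates.
target = Feature("chr21", 1000, 2000, "+")
other_gene = Feature("chr21", 5000, 6000, "-")
rf = Feature("chr21", 5200, 5300, ".")

dist = distance_to_target(rf, target)
same_strand = overlaps_gene_on_target_strand(rf, target, [target, other_gene])
print(dist, same_strand)
```

Comparing the distributions of these two values between pool specific and pool unspecific RF gives the two comparisons named on the slide.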
42 6 genes of Q3 are present in more than one pool, which generates pool unspecific RF
Problematic pools are:
● 6-10-13
● 8-9
● 12-3
● 14-15-16
43 Impact of index exon position on RF coverage
46 The USPP filter removes more intergenic than genic RF
Q1: Proportion of exonic, intronic and intergenic RF before and after USPP-based filtering
47 The USPP filter removes more RF located at these distances from the closest gene within the pool:
- from 100 to 200 kb
- from 1 to 5 Mb
Q1: Distance of RF to closest gene within pool before and after the USPP-based filter
48 Q2: Class 0, 1, 3, 5 RF removed by USPP-based filter (using 0, 1 and 2 RACE/probe mismatches)
The USPP filter:
- removes 37 times more 3' RF than 5' RF
- is roughly independent of the number of RACE/probe mismatches
49 Proportion of RF and projected RF eliminated by the USPP-based filter (projections made by pool)
50 Proportion of RF and projected RF eliminated by the USPP-based filter (projections made by pool)