1 Q1-Q3 results. 2 RF lengths 3 Filtered RF length distribution.


1 Q1-Q3 results

2 RF lengths

3 Filtered RF length distribution

4 Q1 filtered RF length distribution

5 Q2 filtered RF length distribution

6 Q3 filtered RF length distribution

7 RF position when compared to genes and exons

8 Q1-Q2-Q3: Projected filtered RF distribution (internal = overlaps target gene; projection done by pool)

Q1
● 39% internal: 46% exonic, 54% intronic
● 61% external: 71% genic, 29% intergenic
● External genic: 79% exonic (22% overlap most 5' exon of transcript), 21% intronic

Q2
● 86% internal: 88% exonic, 12% intronic
● 14% external: 78% genic, 22% intergenic
● External genic: 88% exonic (47% overlap most 5' exon of transcript), 12% intronic

Q3
● 21% internal: 47% exonic, 53% intronic
● 79% external: 78% genic, 22% intergenic
● External genic: 69% exonic (23% overlap most 5' exon of transcript), 31% intronic

-> chimeric transcripts?

9 Why are Q3 RF mostly external (79%)? Is there a systematic swap between certain pairs of pools? For each RF, we computed its overlap with all Q3 genes and then compared the RF's pool with the pool of the gene it overlaps.
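The pool comparison described on this slide can be sketched as a small cross-tabulation. This is a minimal illustration only: the tuple layout `(chrom, start, end, pool)`, the closed-interval convention, the function names and the toy coordinates are all assumptions, not the pipeline used for the slides.

```python
from collections import Counter

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def pool_confusion(rfs, genes):
    """Count (RF pool, overlapped gene pool) pairs.

    rfs, genes: iterables of (chrom, start, end, pool) tuples
    (hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for r_chrom, r_start, r_end, r_pool in rfs:
        for g_chrom, g_start, g_end, g_pool in genes:
            if r_chrom == g_chrom and overlaps(r_start, r_end, g_start, g_end):
                counts[(r_pool, g_pool)] += 1
    return counts

# Toy, made-up coordinates (not data from the slides)
rfs = [("chr21", 100, 200, 1), ("chr21", 500, 600, 1), ("chr22", 50, 80, 2)]
genes = [("chr21", 150, 400, 1), ("chr21", 550, 900, 2), ("chr22", 10, 60, 2)]
table = pool_confusion(rfs, genes)
```

A systematic swap between two pools would show up as large off-diagonal counts for that pair, for example many `(1, 2)` and `(2, 1)` entries and few `(1, 1)` or `(2, 2)` ones.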

10 RF overlapping gene pool

11 Q3 RF compared to Q3 genes -> Q3 RF overlap genes of their own pool more than genes of other pools (no clear pool swap)

12 6 genes of Q3 are in two different pools -> generates pool unspecific RF
Problematic pools are:
● 6-10-13
● 8-9
● 12-3
● 14-15-16

13 Q3 RF overlapping Q3 genes

14 Position of Q3 filtered projected RF when filtering RF shorter than a threshold

15 Q2 vs Encode 2005

Q2
● 86% internal: 88% exonic, 12% intronic
● 14% external: 78% genic, 22% intergenic
● External genic: 88% exonic (47% overlap most 5' exon of transcript), 12% intronic

Encode 2005
● 68% internal: 49% exonic, 51% intronic
● 32% external: 80% genic, 20% intergenic
● External genic: 70% exonic (23% overlap most 5' exon of transcript), 30% intronic

Novel projected RF, in the order the panels are listed:
● 433 out of 1577 (27.5%) are novel projected RF
● 2859 out of 4951 (57.8%) are novel projected RF

16 Distance of RF to closest gene within pool (target gene)

17 Q1, Q3: proportion of RF > 3 Mb away from target gene
Q1: 983/10387 = 9.4% of filtered RF are > 3 Mb away from target gene
Q3: 1789/3411 = 52.4% of RF are > 3 Mb away from target gene
    839/1249 = 67.2% of external non-exonic RF are > 3 Mb away from target gene
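As a rough sketch of how such a proportion could be obtained, the snippet below computes the distance from each RF to the closest gene of its own pool and counts those beyond 3 Mb. The tuple layout `(chrom, start, end, pool)`, the function names and the toy coordinates are assumptions made for illustration; only the 3 Mb cut-off comes from the slide.

```python
THRESHOLD_BP = 3_000_000  # 3 Mb, the cut-off quoted on the slide

def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two intervals; 0 if they overlap."""
    return max(0, b_start - a_end, a_start - b_end)

def distance_to_target(rf, genes):
    """Distance from an RF to the closest gene of the same pool and chromosome.

    Returns None when the pool has no gene on that chromosome.
    """
    chrom, start, end, pool = rf
    dists = [gap(start, end, g_start, g_end)
             for g_chrom, g_start, g_end, g_pool in genes
             if g_chrom == chrom and g_pool == pool]
    return min(dists) if dists else None

def count_far(rfs, genes, threshold=THRESHOLD_BP):
    """Return (number of RF farther than threshold, number of RF with a distance)."""
    dists = [distance_to_target(rf, genes) for rf in rfs]
    dists = [d for d in dists if d is not None]
    return sum(d > threshold for d in dists), len(dists)

# Toy, made-up coordinates (not data from the slides)
genes = [("chr21", 1_000_000, 1_050_000, 1)]
rfs = [
    ("chr21", 1_010_000, 1_010_200, 1),  # inside the gene
    ("chr21", 2_000_000, 2_000_100, 1),  # 950 kb downstream
    ("chr21", 5_000_000, 5_000_300, 1),  # 3.95 Mb downstream
]
far, total = count_far(rfs, genes)
percent_far = 100 * far / total
```

RF whose pool has no gene on the same chromosome are left out of the denominator here; whether the slide's counts treat them that way is not stated.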

21 Proportion of Q3 filtered RF > 3 Mb away from target gene

25 Do external exonic projected RF overlap the most 5' exons of transcripts more than other exons of transcripts?

26 Proportion of external exonic projected RF overlapping most 5' exons of transcripts

Q1
● Real: 22.3% (63); same strand: 68.3% (43); opposite strand: 31.7% (20)
● Random: 19.8% (56); same strand: 41.1% (23); opposite strand: 58.9% (33)

Q3
● Real: 23.0% (335); same strand: 62.1% (208); opposite strand: 37.9% (127)
● Random: 15.8% (230); same strand: 49.1% (113); opposite strand: 50.9% (117)

Q2
● Real: 46.5% (206); same strand: 45.6% (94); opposite strand: 54.4% (112)
● Random: 30.7% (136); same strand: 54.4% (74); opposite strand: 45.6% (62)

27 Does the most 5' RF of a particular gene and a particular tissue overlap most 5' exons of transcripts more than other RF?

28 Correlation of most 5' RF with CAGE tags

29 Correlation of most 5' racefrags with CAGE tags [figure labels: Most 5' RF, 5']

30 Pool unspecific RF

31 Pool unspecific unique RF (USPP-filtered)
Most pool unspecific unique RF are:
● Q1: internal exonic (72%)
● Q2: internal exonic (87%)
● Q3: external (91%), of which 63% are exonic
20 unique RF are in more than 4 pools

32 Pool unspecific unique Q3 RF (filtered)
- Hits found by BLAT.
- Needs to be done again using our highlighted probe simulator.

33 Q1-Q3: Number of pools a unique RF appears in (unfiltered/filtered) [panels: Q3, Q1, Q2]

34 Pool-unspecific RFs in Q3
Possibly due to...
1 - cross-hybridization? Is there a correlation between the number of pools an RF is found in and the number of non-unique probes it overlaps? No.
By the way, 135,380 / 2,191,331 (6%) of probes from the chr21/22 chip have multiple perfect matches in the genome.
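One way to check such a relationship is a Pearson correlation between the two per-RF counts. The slides do not say which statistic was used, so this is only a sketch under that assumption; the function name and the toy lists are hypothetical. The probe fraction at the end uses the two counts quoted on the slide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient; None if either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

# Toy, made-up values: one entry per unique RF
pools_per_rf = [1, 1, 2, 3, 5]             # number of pools the RF is found in
non_unique_probes_per_rf = [4, 0, 3, 1, 2]  # non-unique probes it overlaps
r = pearson_r(pools_per_rf, non_unique_probes_per_rf)

# Fraction of chr21/22 chip probes with multiple perfect matches (slide counts)
multi_match_fraction = 135_380 / 2_191_331
```

A value of `r` close to 0 would be consistent with the "no" on the slide; a rank-based measure could be substituted if the counts are heavily skewed.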

35 Pool-unspecific RFs in Q3
Possibly due to...
2 - high GC content? -> Answer: NO!
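For reference, GC content of an RF sequence can be computed as below before comparing pool-specific and pool-unspecific RF. This is a minimal sketch; the function name is hypothetical, and ignoring non-ACGT characters such as N is an assumption of this example rather than something stated on the slide.

```python
def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases of a sequence.

    Returns None if the sequence contains no A/C/G/T base at all.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt

# Toy sequences (not data from the slides)
example_values = [gc_content(s) for s in ("GGCC", "ATGC", "atgcnn")]
```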

36 Pool-unspecific RFs in Q3
Possibly due to...
3 - mis-priming on unknown transcripts of chr21 or chr22 (missed by the simulator)?
4 - genuine chimeric transcripts?
5 - pooling errors: the same gene is present in >1 pool because it has 2 different identifiers (UCSC known genes / RefSeq nomenclature discrepancy). We found a few cases like this; not sure yet how widespread it is (systematic survey to come).

37 Genes present in several pools
5 genes present in 2 pools:
● RP5-1042K10.2, NM_015705 (pools 14, 15)
● CHODL, NM_024944 (pools 6, 10)
● NM_005446, P2RXL1 (pools 8, 9)
● ZNF74, NM_003426 (pools 10, 13)
● NM_015367, BCL2L13 (pools 12, 3)
1 gene present in 3 pools:
● NM_021090, NM_001013676, MTMR3 (pools 15, 14, 16)
Eliminate RF present in these pairs/triplets of pools (problematic pool RF)
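The elimination step could look like the sketch below. The pool groups are the pairs and the triplet listed on this slide; the exact rule is an assumption of this example: a unique RF is dropped when it appears in more than one pool and all of its pools fall inside one listed group. The names `is_problematic` and `drop_problematic`, and the RF identifiers, are hypothetical.

```python
# Pairs and triplet of pools sharing a gene, as listed on the slide
PROBLEMATIC_POOL_GROUPS = [
    frozenset({14, 15}),
    frozenset({6, 10}),
    frozenset({8, 9}),
    frozenset({10, 13}),
    frozenset({3, 12}),
    frozenset({14, 15, 16}),
]

def is_problematic(pools):
    """Assumed rule: RF seen in >1 pool, all within one problematic group."""
    pools = frozenset(pools)
    return len(pools) > 1 and any(pools <= group for group in PROBLEMATIC_POOL_GROUPS)

def drop_problematic(rf_pools):
    """Keep only RF whose pool set is not explained by a shared gene.

    rf_pools: dict mapping a unique RF identifier to the set of pools it appears in.
    """
    return {rf: pools for rf, pools in rf_pools.items() if not is_problematic(pools)}

# Toy, made-up RF identifiers
rf_pools = {
    "rfA": {14, 15},    # dropped: pair sharing a gene
    "rfB": {1},         # kept: pool specific
    "rfC": {2, 7},      # kept: unspecific, but not a listed group
    "rfD": {14, 16},    # dropped: inside the 14-15-16 triplet
}
kept = drop_problematic(rf_pools)
```

RF that remain pool unspecific after this step, such as `rfC` in the toy data, are the ones that shared genes cannot explain.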

38 Effect of filtering problematic pool RF on Q3 pool unspecificity [figure annotation: -48]
Genes present in several pools do not explain all pool unspecific RF of Q3

39 Distribution of pool specific and pool unspecific unique Q3 RF
Compared to pool specific Q3 RF, pool unspecific Q3 RF are more:
● external to Q3 genes,
● exonic

40 Pool specific and unspecific RF regarding gene overlap
Pool specific RF overlap their target gene more than pool unspecific RF

41 Two other criteria for comparing Q3 pool specific and unspecific RF
● Overlap with gene in same orientation as target gene: pool unspecific RF behave similarly to pool specific RF
● Distance to target gene: pool unspecific RF are more distant from their target gene

42 6 genes of Q3 are in two different pools -> generates pool unspecific RF
Problematic pools are:
● 6-10-13
● 8-9
● 12-3
● 14-15-16

43 Impact of index exon position on RF coverage

45 USPP filter results

46 The USPP filter removes more intergenic than genic RF
Q1: proportion of exonic, intronic and intergenic RF before and after USPP-based filtering

47 The USPP filter removes more RF located:
- from 100 to 200 kb
- from 1 to 5 Mb
from the closest gene within pool
Q1: Distance of RF to closest gene within pool before and after the USPP-based filter

48 Q2: Class 0, 1, 3, 5 RF removed by USPP-based filter (using 0, 1 and 2 RACE/probe mismatches)
The USPP filter:
- removes 37 times more 3' RF than 5' RF
- is ~ independent of the number of RACE/probe mismatches

49 Proportion of RF and projected RF eliminated by the USPP-based filter (projections made by pool)

50 Proportion of RF and projected RF eliminated by the USPP-based filter (projections made by pool)

51 Tissue specificity results

52 Q1: Number of tissues a unique RF appears in (unfiltered/filtered)

53 Q2: Number of tissues a unique RF appears in (unfiltered/filtered)

54 Generating RF from probes

55 Generating RF from probes

56 Comparison between Encode 2005 and Q2

57 Intersection between Encode 2005 and Q2 RF sets

58 Comparison between Q1 and Q3

59 Overlap between Q1 and Q3 RF assigned to genes common to Q1-Q3
40% overlap between Q1 and Q3 RF assigned to genes common to both experiments -> problem in gene assignment?

