Discussion Points for 2nd Pseudogene Call, Mark Gerstein, 2005-09-22, 11:00 EST

Intersection of Pseudogenes from Three Groups: Original

Havana-Gencode: 167 pseudogenes
Yale: 184 pseudogenes
UCSC retrogenes: 15 expressed (7-8 pseudogenes) + 143 not expressed (all pseudogenes)

[Figure: Venn diagram of the three sets, provided by France]

Of the original Havana pseudogenes, 86 overlap at least one Yale pseudogene, and 87 Yale pseudogenes overlap at least one Havana pseudogene (likewise for the retrogenes). These are global counts: at some loci three Havana pseudogenes may overlap a single Yale pseudogene, while at others several Yale pseudogenes overlap one Havana pseudogene.
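The 86 and 87 above differ because each side is counted independently: every pseudogene that touches any pseudogene in the other set is counted once. A minimal Python sketch of that counting, assuming each pseudogene is reduced to a hypothetical `(chrom, start, end)` tuple in half-open coordinates with strand ignored; the toy coordinates are illustrative only, not ENCODE data:

```python
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end), half-open, strand ignored.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query: List[Interval], target: List[Interval]) -> int:
    """Number of query pseudogenes that overlap at least one target pseudogene."""
    return sum(1 for q in query if any(overlaps(q, t) for t in target))


# Toy example: three Havana calls all overlap one long Yale call.
havana = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 350, 450)]
yale = [("chr1", 150, 420)]
print(count_overlapping(havana, yale))  # 3
print(count_overlapping(yale, havana))  # 1
```

The quadratic scan is adequate for a few hundred pseudogenes in the ENCODE regions; a sorted sweep or interval tree would be the usual choice at genome scale.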

Intersection of Pseudogenes from 4 Groups: Updated

Havana-Gencode: 167 pseudogenes
Yale: 164 pseudogenes
UCSC retrogenes: 146 not expressed

[Figure: Venn diagram of the three sets; region counts, with GIS pseudogenes in parentheses: 82 (34), 17 (7), 33 (1), 15 (1), 14 (2), 16 (0), 52 (2)]

- The numbers in parentheses are pseudogenes from GIS.
- All data are from http://pseudogene.org/ENCODE/cross-ref
- Pseudo-exons were merged to form pseudogenes for this comparison (each pseudogene now has a single start and end).
- Strand information is ignored.
- There are 229 pseudogenes in the union (the seven region counts sum to 229).
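Merging pseudo-exons into a single start and end per pseudogene, with strand discarded, can be sketched in Python as follows. The per-exon record layout and the IDs are hypothetical, and the sketch assumes all pseudo-exons of one pseudogene lie on one chromosome (it raises otherwise):

```python
from typing import Dict, List, Tuple

# Hypothetical per-pseudo-exon record: (pseudogene_id, chrom, start, end, strand).
Exon = Tuple[str, str, int, int, str]
Span = Tuple[str, int, int]


def merge_pseudo_exons(exons: List[Exon]) -> Dict[str, Span]:
    """Collapse each pseudogene's pseudo-exons into one (chrom, start, end) span.

    The span runs from the smallest exon start to the largest exon end, so
    gaps between pseudo-exons are absorbed; strand is dropped.
    """
    spans: Dict[str, Span] = {}
    for pid, chrom, start, end, _strand in exons:
        if pid not in spans:
            spans[pid] = (chrom, start, end)
        else:
            c, s, e = spans[pid]
            if c != chrom:
                raise ValueError(f"{pid} has pseudo-exons on {c} and {chrom}")
            spans[pid] = (c, min(s, start), max(e, end))
    return spans


exons = [
    ("pg1", "chr22", 300, 420, "+"),
    ("pg1", "chr22", 100, 150, "+"),
    ("pg2", "chr22", 1000, 1100, "-"),
]
print(merge_pseudo_exons(exons))
# {'pg1': ('chr22', 100, 420), 'pg2': ('chr22', 1000, 1100)}
```

Because the merged span absorbs introns and gaps, two pseudogenes can overlap after merging even if none of their pseudo-exons do, which is worth keeping in mind when reading the intersection counts.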

Intersection of Pseudogenes from 4 Groups: Non-processed Consensus

[Figure: Venn diagram of Havana-Gencode (167 pseudogenes), Yale (164 pseudogenes) and UCSC retrogenes (146 not expressed); region counts, with GIS pseudogenes in parentheses: 82 (34), 17 (7), 33 (1), 15 (1), 14 (2), 16 (0), 52 (2)]

Processed / non-processed classification (columns: GENCODE Processed, GENCODE Non-Processed):
Yale Processed: 7 / 85 / 5
Yale Non-Processed: 4 / 439 / 37

Rough agreement now: 82 + 52 - 7 = 127 of 229 total. What should we do with the remaining 102?

How to Pick Pseudogenes for RT-PCR?

- Start with the intersection of 127.
- Duplicated vs. processed: how many of each? (2:1?)
- Rank pseudogenes:
  – by likelihood of being transcribed according to ENCODE evidence: ditag first, then CAGE, then tiling array
  – by their uniqueness in the genome (for good primers and non-cross-hybridizing probes)
- How do we get a consistent rank?
- Who will do the RT-PCR?
- What coordinates should we use?
- (Ignore the 1 processed pseudogene already being sequenced by the GIS group.)
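One way to make the ranking reproducible is a lexicographic sort key: strongest ENCODE evidence first (ditag, then CAGE, then tiling array, in the order listed above), then genomic uniqueness, then an ID tie-break. In this Python sketch the `Candidate` fields, the evidence labels, and the use of a near-identical copy count as the uniqueness measure are all hypothetical choices for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Set

# Evidence priority from the slide: ditag first, then CAGE, then tiling array.
EVIDENCE_PRIORITY = ["ditag", "CAGE", "tiling_array"]


@dataclass
class Candidate:
    pid: str
    # Hypothetical: ENCODE evidence types supporting transcription of this pseudogene.
    evidence: Set[str] = field(default_factory=set)
    # Hypothetical uniqueness proxy: near-identical copies in the genome (fewer is better).
    genomic_copies: int = 1


def evidence_rank(c: Candidate) -> int:
    """Index of the strongest evidence type present; len(EVIDENCE_PRIORITY) if none."""
    for i, kind in enumerate(EVIDENCE_PRIORITY):
        if kind in c.evidence:
            return i
    return len(EVIDENCE_PRIORITY)


def rank_for_rtpcr(cands: List[Candidate]) -> List[Candidate]:
    """Evidence first, then more unique loci, then ID so ties resolve the same way everywhere."""
    return sorted(cands, key=lambda c: (evidence_rank(c), c.genomic_copies, c.pid))


cands = [
    Candidate("a", {"CAGE"}, 1),
    Candidate("b", {"ditag", "tiling_array"}, 3),
    Candidate("c", set(), 1),
    Candidate("d", {"CAGE"}, 2),
]
print([x.pid for x in rank_for_rtpcr(cands)])  # ['b', 'a', 'd', 'c']
```

Since every group can compute the same key from shared evidence tables, the order does not depend on who runs it; how strongly evidence should outweigh uniqueness is still a choice for the group to agree on.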

How to generate a consensus for the remaining 102 pseudogenes?

- Stick with the intersection of 127.
- Develop consistent criteria for identifying pseudogenes and apply them uniformly to ENCODE
  – e.g. protein matches with disablements found by a pipeline
  – ignores tricky cases flagged by manual annotation
- Take a simple union of UCSC, Havana & Yale, giving 229
  – GIS is a subset of the other 3
  – Describe pseudogenes as identified by multiple approaches, then explicitly flag each group's unique ones in the final annotation
  – Easy, but may bias the statistics
- Take a qualified union
  – Allow each group to "question" particular pseudogenes in another group's set
  – Circulate the questions, then hold a call to sort out differences
  – Need a way to arbitrate, e.g. we could demand an obvious disablement
  – We might learn something!
- How do we represent this in the browser and in the statistics?
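The simple-union and qualified-union options can be prototyped with a sorted sweep that merges overlapping calls into loci and records which groups called each locus. In this Python sketch the group names, toy spans, the `questioned` set (keyed by merged locus span) and the `has_disablement` lookup are hypothetical stand-ins for whatever the groups actually exchange:

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]    # (chrom, start, end), half-open, strand ignored
Locus = Tuple[Span, Set[str]]  # merged span plus the groups that called it


def union_with_provenance(calls: Dict[str, List[Span]]) -> List[Locus]:
    """Merge overlapping calls from all groups into loci with a sorted sweep."""
    tagged = sorted((span, group) for group, spans in calls.items() for span in spans)
    loci: List[Locus] = []
    for (chrom, start, end), group in tagged:
        if loci and loci[-1][0][0] == chrom and start < loci[-1][0][2]:
            (c, s, e), groups = loci[-1]
            loci[-1] = ((c, s, max(e, end)), groups | {group})
        else:
            loci.append(((chrom, start, end), {group}))
    return loci


def flag_unique(loci: List[Locus]) -> Dict[str, List[Span]]:
    """Loci called by exactly one group, keyed by that group, for explicit flagging."""
    flags: Dict[str, List[Span]] = {}
    for span, groups in loci:
        if len(groups) == 1:
            (only,) = groups
            flags.setdefault(only, []).append(span)
    return flags


def qualified_union(loci: List[Locus], questioned: Set[Span],
                    has_disablement: Dict[Span, bool]) -> List[Locus]:
    """Hypothetical arbitration rule: keep a questioned locus only if an obvious disablement was found."""
    return [(span, groups) for span, groups in loci
            if span not in questioned or has_disablement.get(span, False)]


calls = {
    "Havana": [("chr1", 100, 200), ("chr1", 500, 600)],
    "Yale": [("chr1", 150, 250), ("chr1", 900, 1000)],
    "UCSC": [("chr1", 180, 220)],
}
loci = union_with_provenance(calls)
print(len(loci))          # 3
print(flag_unique(loci))  # {'Havana': [('chr1', 500, 600)], 'Yale': [('chr1', 900, 1000)]}
kept = qualified_union(loci, questioned={("chr1", 900, 1000)}, has_disablement={})
print(len(kept))          # 2
```

The sweep merges transitively, so a chain of partially overlapping calls becomes a single locus; this is the same many-to-one effect noted for the original intersection, and a stricter rule (for example a minimum overlap fraction) would change the counts.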

Once we have a consensus, how do we agree on pseudogene boundaries?

- Keep each group's boundaries unchanged
  – If pseudogenes overlap, take the largest region (union) or the smallest
- Develop uniform criteria for assigning pseudogene boundaries and apply them to each pseudogene in the consensus set
  – Could simply take each pseudogene in the consensus and have one group realign it against its parent
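For overlapping calls of the same consensus pseudogene, the "keep each group's boundaries" options reduce to simple interval arithmetic. This Python sketch reads "smallest" as the region shared by all calls (their intersection); that reading is an assumption, since it could also mean the shortest individual call:

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (chrom, start, end), half-open


def _chrom(spans: List[Span]) -> str:
    """The single chromosome shared by all calls; raises if there is not exactly one."""
    chroms = {c for c, _, _ in spans}
    if len(chroms) != 1:
        raise ValueError("calls must lie on a single chromosome")
    return chroms.pop()


def largest_region(spans: List[Span]) -> Span:
    """Union of the calls: earliest start to latest end."""
    return (_chrom(spans), min(s for _, s, _ in spans), max(e for _, _, e in spans))


def smallest_region(spans: List[Span]) -> Optional[Span]:
    """Intersection of the calls, or None if no base is shared by all of them."""
    chrom = _chrom(spans)
    start = max(s for _, s, _ in spans)
    end = min(e for _, _, e in spans)
    return (chrom, start, end) if start < end else None


calls = [("chr22", 1000, 1800), ("chr22", 1100, 2000)]
print(largest_region(calls))   # ('chr22', 1000, 2000)
print(smallest_region(calls))  # ('chr22', 1100, 1800)
```

The intersection can be empty when calls only chain together (A overlaps B, B overlaps C, but A and C are disjoint), which is one argument for the alternative of realigning each consensus pseudogene against its parent.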

