1 Do not reproduce without permission. Gerstein.info/talks (c) 2004. (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu
Permissions Statement: This presentation is copyright Mark Gerstein, Yale University, 2004. Feel free to use images in it with PROPER acknowledgement.

2 Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of Transcription and Evolution
Deyou Zheng, Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Siew Woh Choo, Yontao Lu, France Denoeud, Stylianos Antonarakis, Michael Snyder, Yijun Ruan, Chia-Lin Wei, Thomas Gingeras, Roderic Guigo, Jennifer Harrow, Mark Gerstein
Yale, Sanger, UCSC, GIS, AFFX, U Geneva, IMIM
A GT effort with great thanks to MSA, VAR, TR
Talk at ENCODE 2006,07.05 12' in 20:30-21:30

3 Pseudogenes are among the most interesting intergenic elements
Regulatory regions, repeats, non-coding RNA, origins of replication…
Formal Properties of Pseudogenes (ΨG)
- Inheritable
- Homologous to a functioning element
- Non-functional*
No selection pressure, so free to accumulate mutations
– Frameshifts & stops
– Small indels
– Inserted repeats (LINE/Alu)
* What does this mean? No transcription, no translation?…
[Mighell et al. FEBS Letts, 2000]
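The disablements listed on this slide (frameshifts and premature stops) can be illustrated with a minimal sketch. It assumes a pseudogene sequence already placed in the parent gene's reading frame and, for frameshifts, a gapped pairwise alignment that uses `-` for gaps. The function names `premature_stops` and `frameshift_gaps` are hypothetical and are not taken from any of the pipelines in the talk.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(seq):
    """Return 0-based nucleotide offsets of in-frame stop codons,
    ignoring a stop that sits in the final codon position."""
    seq = seq.upper()
    n_codons = len(seq) // 3
    return [3 * i for i in range(n_codons - 1)
            if seq[3 * i:3 * i + 3] in STOP_CODONS]

def frameshift_gaps(aligned_parent, aligned_pseudo):
    """Return sorted (start, length) pairs for gap runs in either row of a
    gapped alignment whose length is not a multiple of 3 (frame-breaking)."""
    if len(aligned_parent) != len(aligned_pseudo):
        raise ValueError("aligned rows must have equal length")
    shifts = []
    for row in (aligned_parent, aligned_pseudo):
        for m in re.finditer(r"-+", row):
            if len(m.group()) % 3 != 0:
                shifts.append((m.start(), len(m.group())))
    return sorted(shifts)
```

A 3-base gap is reported by neither function, since it removes a codon without breaking the frame.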

4 Pseudogenes (ΨG) as Disabled Homologies
[Figure labels: Cyc gene; A pseudogene]

5 Why Study Pseudogenes?
- Important for doing accurate gene annotation
  – Abundant: > 8000 retropseudogenes in human
  – High sequence similarity with genes
  – 25% in C. elegans? [Mounsey, Genome Research, 2002]
- Interfere with the study of functional genes
  – Cross-hybridization in microarray and RT-PCR [Ruud, Int. J. Cancer 1999]
  – Some pseudogenes have regulatory roles
- ΨG are “genomic fossils”
  – Study the evolution of genes and genomes
  – Measure mutation/insertion rates

6 Why Study Pseudogenes?
- Cause errors in sequence databases
  – > 8000 retropseudogenes in human
  – Contamination in Ensembl
  – 25% in C. elegans? [Mounsey, Genome Research, 2002]
- "Interfere" with functional genes
  – Cross-hybridization in microarray and PCR (Cytokeratin 19, Int. J. Cancer 1999)
  – Very rarely this gives some pseudogenes regulatory roles
- ΨG are “genomic fossils”
  – Study the evolution of genes and genomes
  – Measure mutation/insertion rates
In mouse, a pseudogene up-regulates gene expression of Makorin1 by binding to a transcriptional repressor or an RNA-digesting enzyme [Hirotsune et al. Nature 423 2003]

7 Why Study Pseudogenes?
- Cause errors in sequence databases
  – > 8000 retropseudogenes in human
  – Contamination in Ensembl
  – 25% in C. elegans? [Mounsey, Genome Research, 2002]
- Interfere with the study of functional genes
  – Cross-hybridization in microarray and RT-PCR [Ruud, Int. J. Cancer 1999]
  – Some pseudogenes have regulatory roles
- ΨG are “genomic fossils”
  – Study the evolution of genes and genomes
  – Illuminate important genomic remodeling processes of duplication and retrotransposition
  – Measure mutation/insertion rates

8 Duplicated Pseudogenes
[Diagram labels: Original Gene; Gene Duplication; Mutations]
- Retains intron/exon structure
- e.g. globins, Hox cluster and Arabidopsis genome
- Sometimes can be transcribed

9 Retro-pseudogenes (Processed ΨG)
[Diagram labels: Original Gene; LINE-1 mediated retrotransposition]
- Mostly dead-on-arrival (DOA)
- Intronless, poly-A tail, direct repeats
- Target-primed reverse-transcription: -TT|AAA- AACATA AAAAAA
- Other types: Numt (nuclear mitochondrial DNA)
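One of the hallmarks named on this slide, the poly-A tail, lends itself to a small illustration. The sketch below looks for a run of A's in the sequence immediately downstream of a candidate's 3' end. The function name `find_polya` and the thresholds (`min_run=10`, `window=50`) are illustrative assumptions, not values from the talk.

```python
import re

def find_polya(downstream_seq, min_run=10, window=50):
    """Return (start, length) of the first run of at least `min_run` A's
    within the first `window` bases of `downstream_seq`, or None."""
    region = downstream_seq[:window].upper()
    m = re.search("A{%d,}" % min_run, region)
    return (m.start(), len(m.group())) if m else None
```

An old insertion may have a decayed, interrupted tail, so a strict run of A's like this would tend to miss older processed pseudogenes.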

10 Overlap of Pseudogenes by 5 Different Methods
4 automatic pipelines (comparing protein or transcript v. genomic DNA, filtering, application of rules) + HAVANA manual annotation
[Figure label: GIS]
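As a rough illustration of comparing calls from several methods, the sketch below merges overlapping intervals and records which methods support each merged locus. It assumes half-open `(start, end)` coordinates on a single chromosome. The function name `cluster_calls` and the method labels in the usage example are hypothetical, and this is not the consensus procedure actually used for the ENCODE annotation.

```python
def cluster_calls(calls_by_method):
    """Merge overlapping calls from all methods.

    calls_by_method: {method_name: [(start, end), ...]}, half-open coordinates.
    Returns a list of (start, end, methods) tuples, where `methods` is the set
    of method names contributing at least one call to that merged interval.
    """
    tagged = sorted((s, e, m)
                    for m, intervals in calls_by_method.items()
                    for s, e in intervals)
    clusters = []
    for s, e, m in tagged:
        if clusters and s < clusters[-1][1]:
            # Overlaps the current cluster: extend it and add the method.
            cs, ce, methods = clusters[-1]
            clusters[-1] = (cs, max(ce, e), methods | {m})
        else:
            clusters.append((s, e, {m}))
    return clusters

calls = {
    "pipelineA": [(100, 200), (500, 600)],
    "pipelineB": [(150, 250)],
    "manual": [(180, 220), (900, 950)],
}
clusters = cluster_calls(calls)
```

Counting `len(methods)` per cluster then gives the number of methods agreeing on each locus, which is the kind of quantity an overlap diagram summarizes.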

11 Complexities in Pseudogene Annotation
ΨHNRPA1: ribonucleoprotein A1 processed pseudogene
ΨMTND2, ΨMTND4, ΨCYTB: inserted mitochondrial sequence resulting in 3 pseudogenes

13 Regional Distribution
201 pseudogenes: 77 non-processed, 124 processed
[Figure label: OR]

14 Example: Pseudogene Intersecting Transcriptional Evidence
[Figure tracks: TARS, CAGE, diTAG, ChIP-chip]

15 Intersection of Pseudogenes with Transcriptional Evidence

16 Targeted Transcription Expts.
RACE expts
- Interrogated 160 pseudogenes (49 non-processed & 111 processed)
- In 51 cases (26 non-processed and 25 processed pseudogenes), could design distinguishing primers (>4 mismatched bp v. parent)
- The resulting data supported transcription from 14 (8 processed and 6 non-processed) of the 160 pseudogenes (9 with pseudogene-specific primers)
- These numbers might represent a conservative estimate, since a RACEfrag was assigned to its parent gene by default if it could be mapped to both a parent locus and a pseudogene locus.
RACE expts + sequencing (CAGE, PET, EST and mRNA)
- Unambiguous evidence for pseudogene transcription
- Altogether, these data indicate that 38 of 201 pseudogenes are the source of novel RNA transcripts
- 5 of these had cryptic promoters (from TR analysis)
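The primer criterion on this slide (more than 4 mismatched bp against the parent) can be sketched as a simple mismatch count. This assumes the primer and the corresponding parent-gene site are already aligned without gaps and have equal length. The function names are hypothetical, and real primer design would weigh further factors that the slide does not describe.

```python
def count_mismatches(primer, parent_site):
    """Count positions where an ungapped primer differs from the parent site."""
    if len(primer) != len(parent_site):
        raise ValueError("primer and parent site must have equal length")
    return sum(a != b for a, b in zip(primer.upper(), parent_site.upper()))

def is_distinguishing(primer, parent_site, threshold=4):
    """True if the primer has more than `threshold` mismatches to the parent."""
    return count_mismatches(primer, parent_site) > threshold

primer = "ACGTACGTACGTACGTACGT"
parent_5mm = "TCGTTCGTTCGTTCGTTCGT"  # differs at positions 0, 4, 8, 12, 16
parent_4mm = "TCGTTCGTTCGTTCGTACGT"  # differs at positions 0, 4, 8, 12
```

With these inputs, `is_distinguishing(primer, parent_5mm)` is true and `is_distinguishing(primer, parent_4mm)` is false, matching the strict ">4" wording.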

17 History of Pseudogene Preservation
[Figure legend: Absent; Present with Disablement; Present without Disablement]

18 Retrotransposition within Last 45 MYA Created Many Processed Pseudogenes

19 Sequence Decay of Pseudogenes, Approximately Neutral

20 Sequence Decay of Pseudogenes Relative to their Immediate Genomic Context

21 Scaling Issues
201 pseudogenes × 100 = ~20K, which agrees with previous est. for whole genome
Interplay between manual annotation and automatic pipelines
- Dynamic interplay with gene annotation (can't overlap)
- Need to have a protein alignment
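The scaling estimate on this slide is simple arithmetic, written out below as a small sketch. The factor of 100 is the slide's own. Reading it as "the ENCODE pilot regions cover roughly 1% of the genome" is an assumption made here, as is the premise that pseudogene density in those regions is representative of the whole genome.

```python
ENCODE_NON_PROCESSED = 77   # from the Regional Distribution slide
ENCODE_PROCESSED = 124      # from the Regional Distribution slide
SCALE_FACTOR = 100          # the slide's multiplier (assumed: regions ~1% of genome)

def extrapolate(count, scale=SCALE_FACTOR):
    """Scale a count observed in the sampled regions to the whole genome."""
    return count * scale

encode_total = ENCODE_NON_PROCESSED + ENCODE_PROCESSED
genome_estimate = extrapolate(encode_total)
```

Here `encode_total` is 201 and `genome_estimate` is 20100, i.e. the "~20K" quoted on the slide.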

22 Using phastOdd value to examine neutral evolution of pseudogenes

