ENCODE Pseudogene Summary for G&T Call. Mark Gerstein, 2005.10.28, 11:00 EDT. Summary of 6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27.

Presentation transcript:

2 Developed Consensus Set of 198 Pseudogenes
A. Derived from a qualified union of GIS, Havana, UCSC, and Yale calls, with uniform criteria on boundaries:
1. Identify a “good” set of human proteins (the HAVANA set?).
2. Remove pseudogenes (from all 4 groups) that overlap current GENCODE exons (does GENCODE have an updated version?).
3. Create a union of the remaining pseudogenes.
4. Find the “best” matching protein for each pseudogene; remove entries without a BLAST hit (e-value cutoff issue?).
5. Realign each pseudogene to its parent protein to produce a uniform alignment and to define the start and end coordinates.
6. Apply a threshold to sequence identity and coverage? (No.)
7. Classify pseudogenes into processed and non-processed (how?).
B. Overall 222 pseudogenes; applying the above recipe gives 198 consensus pseudogenes (the intersection set of the above is 81 processed + 49 non-processed).
C. Currently on the test browser and the ENCODE wiki.
From Deyou Z. + Robert B.
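Steps 2 and 3 of the recipe are plain interval operations. The sketch below shows one way to do them and is not the group's actual pipeline. It assumes each call is a `(chrom, start, end, source)` record in 0-based, half-open coordinates, and that "union" means merging overlapping calls into one locus. `Interval`, `remove_exon_overlaps` and `union_calls` are hypothetical names. The BLAST, realignment and classification steps (4 to 7) are not modeled.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (assumed convention)
    end: int     # exclusive
    source: str  # e.g. "Yale", "Havana", "UCSC", "GIS", "GENCODE"

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def remove_exon_overlaps(pgenes, exons):
    """Step 2 (sketch): drop pseudogene calls overlapping any annotated exon."""
    return [p for p in pgenes if not any(overlaps(p, e) for e in exons)]

def union_calls(pgenes):
    """Step 3 (sketch): merge overlapping calls from all groups into loci.
    Returns (chrom, start, end, sorted list of contributing sources)."""
    merged = []
    for p in sorted(pgenes, key=lambda x: (x.chrom, x.start)):
        if merged and merged[-1][0] == p.chrom and p.start < merged[-1][2]:
            chrom, start, end, sources = merged[-1]
            merged[-1] = (chrom, start, max(end, p.end), sources | {p.source})
        else:
            merged.append((p.chrom, p.start, p.end, {p.source}))
    return [(c, s, e, sorted(src)) for c, s, e, src in merged]

# Toy input, not ENCODE data.
calls = [
    Interval("chr1", 100, 400, "Yale"),
    Interval("chr1", 350, 500, "Havana"),
    Interval("chr1", 1000, 1200, "UCSC"),
    Interval("chr1", 2000, 2100, "GIS"),
]
exons = [Interval("chr1", 2050, 2080, "GENCODE")]
loci = union_calls(remove_exon_overlaps(calls, exons))
print(loci)  # the GIS call is dropped; Yale and Havana merge into one locus
```

Recording which groups contributed to each merged locus is what later allows the intersection and Venn counts to be derived from the same union.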

3 Interesting Complexities of Pseudogene Annotation: Insertion of One Pseudogene into Another
- One heterogeneous nuclear ribonucleoprotein A1 (HNRPA1) pseudogene (parent on Chr12)
- NADH dehydrogenase 2 (MTND2) pseudogene (parent mitochondrial)
- NADH dehydrogenase 4 (MTND4) pseudogene (parent mitochondrial)
- Cytochrome b (CYTB) pseudogene (parent mitochondrial)
Figure labels: first insertion event; remnant of a second, mitochondrial insertion event (has post-insertion deletions); protein evidence.
From Adam F.

4 EST Evidence of Expression from a Pseudogene at the 5’ UTR of a Known Gene
The upstream pseudogene corresponds to exons 1-3 of LILR family genes; the 3’ exons have been lost. EST evidence supports expression from the pseudogene locus extending to the known gene LILRA3.
Figure labels: frameshift; LILRA3; LILR pseudogene; EST.
From Adam F.

5 TAR/Transfrag Evidence for Transcription in the 198 Consensus Pseudogenes
- # of 198 overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- # of 198 overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
=> There is evidence of transcription (from TARs or transfrags) of the pseudogene, or of the parent gene (if there is cross-hybridization), for 53.5% of the consensus pseudogenes. This is an upper bound on transcription.
- # overlapping CAGE tags: 11 (5.6%)
- # overlapping ditags: 1 (0.5%) (83 (41.9%) are overlapped by full-length ditags)
From France D.
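The slide reports each count against two denominators: all 198 consensus pseudogenes, and the 180 that fall in array-interrogated regions. The minimal sketch below reproduces that arithmetic. It assumes one-decimal rounding, which matches the TAR/transfrag figures. `pct` is a hypothetical helper.

```python
def pct(n: int, d: int) -> float:
    """Percentage of n out of d, rounded to one decimal place."""
    return round(100 * n / d, 1)

TOTAL = 198         # consensus pseudogenes
INTERROGATED = 180  # those overlapped by array-interrogated regions

print(pct(180, TOTAL))                          # share on the arrays: 90.9
print(pct(106, TOTAL), pct(106, INTERROGATED))  # TARs or transfrags: 53.5 58.9
print(pct(11, TOTAL), pct(1, TOTAL))            # CAGE tags, ditags: 5.6 0.5
```

Reporting both denominators separates absence of signal from absence of probes. A pseudogene outside the interrogated regions could not have been detected by the arrays at all.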

6 Example Pseudogene Overlapped by TARs/Transfrags and Tags: ENCODE_consensus_187
However, the pseudogene is 93% similar to its parent, so the signal may come from the parent gene through cross-hybridization.
From France D.
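The slide does not say how the 93% similarity was measured. One common definition is percent identity over the aligned columns where neither sequence has a gap, sketched below. `percent_identity` is a hypothetical helper, and the inputs are assumed to be an already-computed pairwise alignment.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence is gapped.
    Assumes aln_a and aln_b are rows of the same pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not cols:
        return 0.0
    same = sum(a.upper() == b.upper() for a, b in cols)
    return 100.0 * same / len(cols)

print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 9 of 10 columns match
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))    # gap column ignored
```

At identities this high, many 25-mer probes are likely to match the parent and the pseudogene equally well. This is why an overlapping TAR or transfrag here is weak evidence on its own.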

7 Consensus Pseudogenes with ≥2 ChIP-chip Hits
Look for ChIP-chip hits upstream of the pseudogenes.
[Table: columns Pgene-ID, Pgene-type (processed or non-processed), E2F, H3K4me3 (0h & 30h), Sp3, STAT1; most cell values were lost in extraction. Highlighted entry [177]: has transcriptional evidence (intersects a GENCODE transcript).]
From Deyou Z.
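"Upstream" has to respect strand: for a minus-strand pseudogene, the upstream region lies past its end coordinate. The sketch below shows one way to collect the factors with hits upstream of a pseudogene. It assumes a fixed 5 kb window (not stated on the slide), half-open coordinates, and that "≥2 hits" means at least two distinct factors. All names are hypothetical.

```python
UPSTREAM_WINDOW = 5_000  # hypothetical window size in bp; not from the slides

def upstream_region(start, end, strand, window=UPSTREAM_WINDOW):
    """Half-open [lo, hi) region upstream of a feature, respecting strand."""
    if strand == "+":
        return max(0, start - window), start
    return end, end + window

def factors_with_upstream_hits(pgene, hits, window=UPSTREAM_WINDOW):
    """Factors with at least one ChIP-chip hit overlapping the upstream region.
    pgene: (chrom, start, end, strand); hits: iterable of (chrom, start, end, factor)."""
    chrom, start, end, strand = pgene
    lo, hi = upstream_region(start, end, strand, window)
    return {f for c, s, e, f in hits if c == chrom and s < hi and lo < e}

# Toy hits, not ENCODE data.
hits = [
    ("chr22", 8_000, 8_500, "E2F"),
    ("chr22", 9_500, 9_900, "STAT1"),
    ("chr22", 12_000, 12_300, "Sp3"),
]
found = factors_with_upstream_hits(("chr22", 10_000, 11_000, "+"), hits)
print(sorted(found), len(found) >= 2)  # ['E2F', 'STAT1'] True
```

The same locus on the minus strand would pick up only the Sp3 hit, because its upstream window is 11,000 to 16,000.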

8 Potentially Transcribed Pseudogene (#177) with Upstream ChIP-chip Hits
From Deyou Z.

9 Experiments to Validate Expression of ENCODE Pseudogenes
- Selected ENCODE pseudogenes from the intersection part of the consensus set: 49 non-processed, 125 processed
- Designed oligos (25-mer, Tm 70°C), either specific to the pseudogene or shared between the parent gene and the pseudogene
- Doing 5’ RACE in 12 human tissues: brain, heart, kidney, spleen, liver, colon, small intestine, muscle, lung, stomach, testis, placenta
  - 5’ RACE done for the first 96 pseudogenes in all 12 tissues
  - The last 78 will be done next week
- To do: pool multiple RACE reactions, send them to Santa Clara, and hybridize to Affymetrix ENCODE 20-nucleotide-resolution arrays
From Alex R.
Stylianos Antonarakis, Robert Baertsch, Jorg Drenkow, Tom Gingeras, Charlotte Henrichsen, Philipp Kapranov, Catherine Ucla, Alexandre Reymond (Affymetrix, UCSC, University of Geneva, University of Lausanne)
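The distinction between pseudogene-specific and shared oligos can be illustrated by checking each candidate 25-mer against the parent gene on both strands. This is a simplified sketch, not the actual design procedure. It does not model the Tm constraint or genome-wide uniqueness. `classify_oligos` and `revcomp` are hypothetical helpers.

```python
OLIGO_LEN = 25  # 25-mers, as on the slide

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def classify_oligos(pseudogene: str, parent: str, k: int = OLIGO_LEN) -> dict:
    """Label each k-mer start position in the pseudogene as 'shared' (the k-mer
    also occurs in the parent, on either strand) or 'specific' (it does not).
    Ignores Tm and matches elsewhere in the genome (simplifying assumption)."""
    pseudogene, parent = pseudogene.upper(), parent.upper()
    parent_kmers = set()
    for s in (parent, revcomp(parent)):
        parent_kmers.update(s[i:i + k] for i in range(len(s) - k + 1))
    return {
        i: "shared" if pseudogene[i:i + k] in parent_kmers else "specific"
        for i in range(len(pseudogene) - k + 1)
    }

# Toy example with k=5 so the sequences stay short.
print(classify_oligos("ACGTTGGAAC", "ACGTTGCAAC", k=5))
```

In this sketch, specific oligos can only come from regions where the pseudogene has diverged from its parent. For a pseudogene that is 90% or more identical to its parent, few 25-mers may qualify.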

10 Extra Slides

11 Pseudogene Group
Core people: Jennifer Harrow, Chia-Lin Wei, Adam Frankish, Sujit Dike, Robert Baertsch, imim.es, Deyou Zheng, Yontao Lu, unige.ch
Others: Tara L. Hoyem, Roderic Guigó Serra, Tom Gingeras, Suganthi Balasubramanian
6 calls: Sept. 15, 22; Oct. 6, 13, 20, 27

12 Refresher: many repetitions of the “Venn analysis” below
[Venn diagram: Havana-Gencode, 165 pseudogenes; Yale, 167 pseudogenes; UCSC retrogenes (not expressed), 146. Region counts as extracted, without their region labels: 81 (34), 16 (7), 33 (1), 15 (1), 17 (2), 16 (0), 54 (2).]
Notes on two of the regions:
- 7 Havana agrees to be added (8, 11, 40, 59, 139, 152, 169); 4 at coding loci [Yale agrees to delete]; 1 with weak sequence identity*; 5 with “non-real” proteins*.
- 9 Havana agrees to be added; 2 at coding loci [Yale agrees to delete]; 1 with weak sequence identity*; 2 with “non-real” proteins*.
* Solved by a consistent protein set and threshold.
Numbers according to Adam’s note.
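Each "Venn analysis" assigns every pseudogene to exactly one region, according to the set of groups that called it. The sketch below computes those region counts. It assumes calls from different groups have already been matched to shared locus IDs (for example, by genomic overlap). The toy IDs are not the ENCODE data, and `venn_regions` is a hypothetical helper.

```python
from collections import Counter

def venn_regions(sets: dict) -> dict:
    """Count items in each exclusive Venn region.
    sets: name -> set of locus IDs. Each key in the result is the tuple of
    set names an item belongs to (and only those), in input order."""
    names = list(sets)
    universe = set().union(*sets.values())
    return dict(Counter(tuple(n for n in names if item in sets[n]) for item in universe))

# Toy locus IDs, not ENCODE data.
sets = {"Havana": {1, 2, 3, 4}, "Yale": {3, 4, 5}, "UCSC": {4, 6}}
for region, n in sorted(venn_regions(sets).items()):
    print(region, n)
```

Rerunning this after each round of deletions and additions is what makes the "many repetitions" cheap. The region totals always add up to the size of the union.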

13 Rearranged Exon Order in an Unprocessed Pseudogene
Adaptor-related protein complex 1, beta 1 subunit (AP1B1) pseudogenes, supported by protein evidence. Following duplication of the AP1B1 locus, rearrangements/duplications have produced two unprocessed pseudogenes corresponding to exons 6 and 3 of the parent gene; their splice sites are the same as in the parent gene.
Figure labels: Exon 6; Exon 3; dot plot of protein evidence vs. genome.
From Adam F.

14 Rearrangement of a Processed Pseudogene
Pseudogene similar to part of ribosomal protein L3 (RPL3). Following insertion, one end of the RPL3 pseudogene has been flipped onto the opposite strand (with some loss of internal sequence).
Figure panels: protein dot plot; mRNA dot plot.
From Adam F.
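A dot plot shows a strand flip directly. Matches on the same strand fall along a diagonal. Matches to the reverse complement form an anti-diagonal over the flipped segment. The slides use protein and mRNA dot plots. The nucleotide word-match sketch below is a simplified stand-in, with an arbitrary word size of k=4 chosen for a toy sequence. `dot_plot` is a hypothetical helper.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def dot_plot(a: str, b: str, k: int = 4):
    """Word-match dot plot between sequences a and b.
    Returns (forward, reverse) lists of (i, j) where a[i:i+k] equals b[j:j+k]
    (same strand) or equals the reverse complement of b[j:j+k] (flipped)."""
    b_words = [b[j:j + k] for j in range(len(b) - k + 1)]
    b_rc = [revcomp(w) for w in b_words]
    fwd, rev = [], []
    for i in range(len(a) - k + 1):
        word = a[i:i + k]
        for j, (w, rc) in enumerate(zip(b_words, b_rc)):
            if word == w:
                fwd.append((i, j))
            if word == rc:
                rev.append((i, j))
    return fwd, rev

# Toy "parent" whose 3' half is flipped onto the other strand in the "pseudogene".
parent = "GATTACAGGC"
pseudo = parent[:5] + revcomp(parent[5:])
fwd, rev = dot_plot(pseudo, parent)
print(fwd)  # diagonal: unflipped 5' end
print(rev)  # anti-diagonal: flipped 3' end
```

Real dot plots usually tolerate mismatches (for example, through a windowed identity threshold). Exact word matching keeps this sketch short but misses diverged regions.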

15 Overlaps by TAR/Transfrag Subset
- # overlapped by interrogated regions (Affy arrays): 180 (90.9%)
- # overlapped by Yale TARs or Affy transfrags (union): 106 (53.5% of all; 58.9% of interrogated)
- # overlapped by Yale TARs (union): 84 (42.4% of all; 46.7% of interrogated)
- # overlapped by Affy transfrags (union): 102 (51.5% of all; 56.7% of interrogated)
- # overlapped by polyA+ TARs/transfrags (union): 105 (53% of all; 58.3% of interrogated)
- # overlapped by total RNA TARs (union): 61 (30.8% of all; 33.9% of interrogated)
From France D.
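The slide lists counts for Yale TARs, for Affy transfrags, and for their union. From these, inclusion-exclusion gives the number supported by both sources. Let T be the pseudogenes overlapped by Yale TARs and A those overlapped by Affy transfrags. The following figures are derived arithmetic, not numbers reported on the slide:

```latex
|T \cap A| = |T| + |A| - |T \cup A| = 84 + 102 - 106 = 80
|T \setminus A| = 84 - 80 = 4, \qquad |A \setminus T| = 102 - 80 = 22
```

So 80 of the 198 consensus pseudogenes (40.4%) have support from both platforms, and the three disjoint parts add back up to the union: 80 + 4 + 22 = 106.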

16 Expression from a Pseudogene Locus (1): Putative Novel Transcript
HAVANA sialyltransferase pseudogene (RP3-477O4.5), supported by protein evidence. A putative novel transcript is supported by a single EST that has a polyA site and signal. There appears to be some transcription from this locus, supported at the 3’ end by a single EST.
Figure labels: supporting EST (100% ID); aligned proteins (column collapsed); polyA site and signal.
From Adam F.
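A polyA signal near an EST's 3' end is evidence that the EST reflects a genuine transcript end rather than an internal priming artifact. The sketch below scans for the canonical AATAAA hexamer and its common variant ATTAAA near the 3' end. The canonical signal usually lies about 10 to 30 nt upstream of the cleavage site. The 40 nt search window and the two-motif set are assumptions, and `find_pas` is a hypothetical helper. The input is assumed to already have its polyA tail trimmed.

```python
import re

PAS_MOTIFS = ("AATAAA", "ATTAAA")  # canonical signal plus common variant (assumed set)

def find_pas(seq: str, search_window: int = 40):
    """Return sorted (offset, motif) pairs for polyA signal hexamers found in
    the last `search_window` bases of seq (polyA tail already trimmed).
    Offsets are 0-based positions in seq; overlapping hits are reported."""
    seq = seq.upper()
    start = max(0, len(seq) - search_window)
    hits = []
    for motif in PAS_MOTIFS:
        # Zero-width lookahead so overlapping occurrences are all found.
        for m in re.finditer(f"(?={motif})", seq[start:]):
            hits.append((start + m.start(), motif))
    return sorted(hits)

# Toy 3' end: the hexamer ends 16 nt before the (assumed) cleavage site.
est_3prime = "CCGTAGCATG" + "AATAAA" + "TGCATGCATGCATGCA"
print(find_pas(est_3prime))  # [(10, 'AATAAA')]
```

A hit only supports a transcript end. It says nothing about whether the transcript is functional, so it complements the protein and EST evidence on the slide rather than replacing it.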

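17 Intersect Consensus Pseudogenes with ChIP-chip Hits
[Table: hit counts per factor. Factors: E2F, H3K4me3 (0h), H3K4me3 (30h), Sp3, STAT1; groups: UC Davis, UCSD, Stanford, Yale. Rows: Total Hits, Known Genes (405), Pseudogenes (198). Cell values were lost in extraction.]
From Deyou Z.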