ENCODE Pseudogenes and Transcription

Deyou Zheng, Yale University
7-05-05, ENCODE-GT

Pseudogenes in ENCODE Regions
211 pseudogenes were identified using an updated computational pipeline (Zhang et al. 2003) and manual curation.
Yale pseudogenes were compared with pseudogenes from the Vega and ENSEMBL groups.
[Figure: Venn diagram of Yale, Vega and ENSEMBL pseudogenes; region counts shown: 113, 40, 75, 23, 2, 9]
There are 211 Yale pseudogenes and 178 Vega pseudogenes; 135-136 are common to both sets.
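The slide does not say how a Yale pseudogene was matched to a Vega or ENSEMBL one. The Python sketch below assumes that any coordinate overlap on the same chromosome counts as a match and that intervals are 0-based and half-open; both are assumptions, and `Interval`, `overlaps` and `count_shared` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; start is 0-based inclusive, end is exclusive."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_shared(set_a: List[Interval], set_b: List[Interval]) -> int:
    """Count intervals in set_a that overlap at least one interval in set_b.

    Brute force, O(len(set_a) * len(set_b)); adequate for a few hundred
    pseudogenes per annotation set.
    """
    return sum(1 for a in set_a if any(overlaps(a, b) for b in set_b))


if __name__ == "__main__":
    # Hypothetical toy coordinates, not ENCODE data.
    yale = [Interval("chr1", 0, 100), Interval("chr1", 200, 300), Interval("chr2", 0, 50)]
    vega = [Interval("chr1", 50, 60), Interval("chr2", 50, 80)]
    print(count_shared(yale, vega))  # 1 (chr2 0-50 and 50-80 only abut)
```

The count is directional: one interval can overlap several intervals in the other set, so `count_shared(a, b)` and `count_shared(b, a)` need not agree.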

Breakdown of Yale Pseudogenes
[Figure: number of pseudogenes vs. number of genes per ENCODE region, for manually picked (ENm*) and randomly picked (ENr*) regions; r2 = 0.31]
In the 44 ENCODE regions, the 211 pseudogenes can be separated into 104 processed, 19 duplicated and 88 others. "Others" are those that cannot be clearly binned as processed or duplicated, e.g., fragments.
The manually picked regions contain more pseudogenes.
Numbers of genes and pseudogenes are only weakly correlated across ENCODE regions (r2 = 0.31).
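The r2 = 0.31 above is presumably the squared Pearson correlation between per-region gene counts and pseudogene counts; the slide does not name the correlation measure, so that is an assumption. A minimal Python sketch of the calculation, where `r_squared` is a hypothetical helper and the inputs are per-region counts you supply:

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-region counts, not ENCODE data.
    genes = [1, 2, 3, 4]
    pseudogenes = [1, 3, 2, 4]
    print(r_squared(genes, pseudogenes))  # 0.64
```

For a simple linear fit, r2 = 0.31 means gene count explains about 31% of the variance in pseudogene count across regions, which matches the "weakly correlated" reading.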

Intersection of Pseudogenes with Transcription Data
An interesting aspect of pseudogenes is transcription.
[Figure: ENCODE region ENm004 with tracks for Yale TARs (transcriptionally active regions) from oligonucleotide microarrays, Affymetrix TARs from oligonucleotide microarrays, GIS-PET, CAGE, ESTs, transcription factor binding sites from ChIP-chip, and sequence conservation in rat, mouse and chimp]

Example of a Pseudogene with Various Kinds of Transcription Evidence
[Figure: Yale_Pgene_58]

Intersection of Pseudogenes with Transcription Data

                     Yale pseudogenes          Vega pseudogenes
                     ENm*    ENr*    Total     ENm*    ENr*    Total
Total pseudogenes    136     75      211       112     66      178
Yale TARs            54-61   33-39   87-98     47-57   36-47   83-103
Affy TARs            28-48   26-36   54-84     27-43   35-43   62-85

Counts for the other evidence types, as listed: GIS-PET 1, 2, 3, 5; CAGE 10, 15, 7, 12; EST 9, 6.

By random chance, 20-30 Yale pseudogenes would be expected to intersect TARs, yet ~40% of ENCODE pseudogenes intersect TARs. Why is the percentage so high?
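The slide does not say how the 20-30 chance overlaps were estimated. One common approach, sketched here as an assumption rather than the method used, is a permutation test: relocate each pseudogene to a uniformly random position in the same region, keeping its length, and count how many then overlap a TAR. The sketch treats the region as one contiguous 0-based, half-open coordinate range and ignores gaps, repeats and array tiling; `count_hits` and `null_hit_counts` are hypothetical names.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), 0-based, end exclusive


def count_hits(features: List[Span], tars: List[Span]) -> int:
    """Number of features overlapping at least one TAR."""
    return sum(
        1 for s, e in features if any(s < te and ts < e for ts, te in tars)
    )


def null_hit_counts(
    pseudogenes: List[Span],
    tars: List[Span],
    region_len: int,
    n_iter: int = 1000,
    seed: int = 0,
) -> List[int]:
    """Overlap counts after randomly relocating every pseudogene, n_iter times."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        shuffled = []
        for s, e in pseudogenes:
            length = e - s
            if length > region_len:
                raise ValueError("pseudogene longer than region")
            # Uniform start so the relocated pseudogene stays inside the region.
            new_start = rng.randrange(0, region_len - length + 1)
            shuffled.append((new_start, new_start + length))
        counts.append(count_hits(shuffled, tars))
    return counts


if __name__ == "__main__":
    # Hypothetical toy inputs, not ENCODE data.
    pgenes = [(100, 1100), (5000, 5800), (20000, 21500)]
    tars = [(0, 2000), (30000, 32000)]
    null = null_hit_counts(pgenes, tars, region_len=50000)
    observed = count_hits(pgenes, tars)
    # Empirical one-sided p-value with the usual +1 correction.
    p_value = (1 + sum(c >= observed for c in null)) / (1 + len(null))
    print(observed, sum(null) / len(null), p_value)
```

Comparing the observed count with this null distribution turns "so high a percentage?" into a quantitative question; a more faithful null would likely restrict random placements to sequence the arrays actually tile.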

Intersection of TARs with Pseudogenes
[Figure: number of TARs, and number of TARs overlapping a pseudogene, for Affy-unique, Yale-unique, Affy-not-unique and Yale-not-unique TARs]
A not-"unique" TAR is one containing a 60-bp sequence (~3 probes) that maps to more than one genomic location at ≥ 95% identity.
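The non-uniqueness rule above can be sketched as a brute-force scan: slide a 60-bp window along the TAR and flag the TAR if any window matches more than one genomic position at ≥ 95% identity (at most 3 mismatches in 60 bp). This is a toy sketch under stated assumptions: ungapped matching, forward strand only, and a genome held in a Python string, which is far too slow for a real genome, where an indexed aligner would be used instead. The TAR's own locus counts as one hit, hence the "> 1" test. `window_hits` and `is_non_unique` are hypothetical names.

```python
def window_hits(window: str, genome: str, min_identity_pct: int = 95) -> int:
    """Count genome positions where `window` matches ungapped at >= min_identity_pct."""
    k = len(window)
    hits = 0
    for i in range(len(genome) - k + 1):
        matches = sum(a == b for a, b in zip(window, genome[i:i + k]))
        # Integer comparison avoids floating-point rounding at the threshold.
        if matches * 100 >= min_identity_pct * k:
            hits += 1
    return hits


def is_non_unique(tar_seq: str, genome: str, window: int = 60,
                  min_identity_pct: int = 95) -> bool:
    """True if any `window`-bp piece of the TAR maps to more than one position.

    Forward strand only (an assumption); a TAR shorter than `window`
    is reported as unique because no full window fits.
    """
    for i in range(len(tar_seq) - window + 1):
        if window_hits(tar_seq[i:i + window], genome, min_identity_pct) > 1:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical toy sequences with an 8-bp window for readability.
    unit = "ACGGTCAT"
    genome = "TTTTTTTT" + unit + "TTTTTTTT"
    print(is_non_unique(unit, genome, window=8))         # False
    print(is_non_unique(unit, genome + unit, window=8))  # True
```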

Summary
211 Yale pseudogenes (253 in the Yale + Vega union) in ENCODE regions.
Some pseudogenes (< 7%) might be transcribed, based on GIS-PET, CAGE or EST data.
About one half of pseudogenes overlap TARs.
Non-unique TARs intersect pseudogenes 5 times more often than unique TARs, probably due to cross-hybridization.

Comparison with previous analyses:
A more detailed survey found that 12-16% of chr22 pseudogenes intersected TARs from tiling microarrays (Zheng et al., 2005).
Both a chr22 analysis and a whole-genome analysis showed that ~5% of human pseudogenes are likely transcribed (Zheng et al., 2005; Harrison et al., 2005).
Cheng et al. (2005) also reported that pseudogene-overlapping TARs are usually not unique. We repeated their analysis with ENCODE pseudogenes and found the same.

References
Cheng et al., 2005. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725):1149-54.
Harrison et al., 2005. Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res. 33(8):2374-83.
Zheng et al., 2005. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J. Mol. Biol. 349(1):27-45.