Breakdown of 244 Total (Yale + Vega) Pseudogenes Amongst Various ENCODE Regions


Breakdown of 244 Total (Yale + Vega) Pseudogenes Amongst Various ENCODE Regions
211 Yale, 178 Vega; union is 244.
More pseudogenes in the manually picked regions.
Numbers of genes and pseudogenes are weakly correlated in ENCODE regions (r² = 0.35).
Half of the total are processed pseudogenes. Note: many pseudogenes in ENm007 and ENm009 are fragment-like.
[Figure: No. of Genes and No. of Pseudogenes per ENCODE region, manual and random picks; r² = 0.35]
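The set sizes on this slide imply how many pseudogenes the two annotations share. A minimal sketch of that arithmetic follows, assuming "union" means a plain set union in which each shared pseudogene is matched one-to-one between Yale and Vega; the function name `shared_count` is illustrative and not from the slides.

```python
def shared_count(size_a: int, size_b: int, size_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    shared = size_a + size_b - size_union
    # Reject sizes that cannot come from two real sets.
    if shared < 0 or shared > min(size_a, size_b):
        raise ValueError("inconsistent set sizes")
    return shared


# Sizes quoted on the slide.
yale, vega, union = 211, 178, 244
both = shared_count(yale, vega, union)
yale_only = yale - both
vega_only = vega - both
print(both, yale_only, vega_only)
```

Under that assumption the three parts (shared, Yale-only, Vega-only) add back up to the union of 244.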

Intersection of Pseudogenes with Transcriptional Data
[Figure: ENm004, with tracks: Pseudogenes; Genes; Transcription factor binding sites from ChIP-chip; Sequence conservation in rat, mouse and chimp; EST; Affymetrix TARs using oligo-microarray; Yale TARs using oligo-microarray; GIS-PET; CAGE]

Example of a Pseudogene with Various Transcriptional Evidence
[Figure: Yale_Pgene_58]

Intersection of Pseudogenes with Transcriptional Data: Comparison of Duplicated and Processed Pseudogenes (just Yale set)

Table columns: Duplicated (confident assignment), Processed (confident assignment), Rest, Total
#
Yale TARs (in all 5 expts): 12 (-15), 47 (-60), 87 (-98)
Affy Transfrags (in all 6 expts): 9 (-13), 43 (-59), 54 (-84)

Some pseudogenes can be confidently annotated as "duplicated" (having intron and exon structure) or "processed" (retrotransposed and disabled).
These can be intersected with transcriptional data from 11 experiments at Yale and Affy. (Total intersection over all is shown in big type. Just occurring in one experiment is shown in small type.)
By random chance, of 211 Yale pseudogenes will intersect with TARs.
Expectation (confirmed) is that duplicated pseudogenes will have higher transcription.
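The intersection described on this slide can be sketched as an interval-overlap count. This is an illustration only: the coordinates, the experiment names, and the helper functions are hypothetical, intervals are assumed to be half-open (chromosome, start, end), and the "any" versus "all" tallies are one possible reading of the slide's per-experiment counts, not the authors' pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def experiments_hit(pgene: Interval, tars_by_expt: Dict[str, List[Interval]]) -> int:
    """Number of experiments in which the pseudogene overlaps at least one TAR."""
    return sum(
        1
        for tars in tars_by_expt.values()
        if any(overlaps(pgene, t) for t in tars)
    )


def tally(pgenes: List[Interval], tars_by_expt: Dict[str, List[Interval]]) -> Dict[str, int]:
    """Count pseudogenes hit in at least one experiment and in every experiment."""
    n_expts = len(tars_by_expt)
    hits = [experiments_hit(p, tars_by_expt) for p in pgenes]
    return {
        "any": sum(1 for h in hits if h >= 1),
        "all": sum(1 for h in hits if h == n_expts),
    }


# Hypothetical coordinates, for illustration only.
pgenes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
tars_by_expt = {
    "expt_1": [("chr1", 150, 160), ("chr1", 590, 700)],
    "expt_2": [("chr1", 190, 250)],
}
result = tally(pgenes, tars_by_expt)
print(result)
```

The nested scan is quadratic; for genome-scale inputs a sorted sweep or an interval tree would be the usual replacement.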

Intersection of Pseudogenes with Transcriptional Data: Manual vs Random Picks, Vega v Yale

Table columns: Yale Pseudogenes (ENm*, ENr*, Total); Vega Pseudogenes (ENm*, ENr*, Total)
Table rows: Pseudogenes, Yale-TARs, Affy-TARs, GIS-PET, CAGE, EST

By random chance, Yale pseudogenes will intersect with TARs.
~40% of ENCODE pseudogenes intersect with TARs.

EXTRA

Intersection of TARs with Pseudogenes
Not-"unique" TAR: one with a sequence of 60 bp (~3 probes) mapping to more than one genomic location (≥ 95% identity).
[Figure: No. of TARs and No. of TARs Overlapping a Pseudogene, for Yale-Unique-TAR, Yale-not-Unique-TAR, Affy-Unique-TAR and Affy-not-Unique-TAR]
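The not-"unique" rule above can be written as a small predicate. This is a sketch under stated assumptions: each TAR is assumed to have already been cut into 60 bp windows and aligned to the genome, and the input (a list of percent identities per window, including the window's own locus) is a hypothetical representation rather than the format used in the original analysis.

```python
from typing import List


def is_unique_tar(window_hits: List[List[float]], min_identity: float = 95.0) -> bool:
    """Return False if any 60 bp window maps to more than one genomic location.

    window_hits[i] holds the percent identities of all genomic alignments
    found for the i-th 60 bp window of the TAR (hypothetical input format).
    """
    for identities in window_hits:
        strong_hits = sum(1 for pid in identities if pid >= min_identity)
        if strong_hits > 1:
            return False
    return True


# Hypothetical alignments, for illustration only.
tar_a = [[100.0], [100.0, 90.0]]   # second hit is below the 95% cutoff
tar_b = [[100.0, 96.5], [100.0]]   # first window has two strong hits
print(is_unique_tar(tar_a), is_unique_tar(tar_b))
```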

Summary
211 pseudogenes (253, Yale + Vega) in ENCODE regions.
Some pseudogenes (< 7%) might be transcribed, based on a combination of GIS-PET, CAGE or EST data.
About one half of pseudogenes overlap with TARs.
Non-unique TARs intersect with pseudogenes 5 times more often than unique TARs, probably due to cross-hybridization.

Comparison with previous analysis:
A more detailed survey found that 12-16% of chr22 pseudogenes intersected with TARs from tiling microarrays (Zheng et al., 2005).
Both a chr22 and a whole-genome analysis showed that ~5% of human pseudogenes are likely transcribed (Zheng et al., 2005; Harrison et al., 2005).
Cheng et al. (2005) also reported that pseudogene-overlapping TARs are usually not unique. We repeat their analysis using ENCODE pseudogenes and find the same.

Refs:
Cheng et al., 2005, Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 308(5725).
Harrison et al., 2005, Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res. 33(8).
Zheng et al., 2005, Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol. 349(1).

Pseudogenes with Transcription Evidence, Merging Vega and Yale Pseudogenes

              ENm*   ENr*   Total
Yale-TARs (a)
Affy-TARs (b)
Union-TARs
GIS-PET          2      2       4
CAGE             6     10      16
EST             11      7      18

By random chance, ~25 pseudogenes will intersect with TARs.
~40% of ENCODE pseudogenes intersect with TARs. Why so high a percentage?
a, b: average of TARs from 5, individual experiments.

Pseudogenes in ENCODE Regions
211 pseudogenes were identified using an updated computational pipeline (Zhang et al. 2003) and manual curation.
Compare Yale pseudogenes with pseudogenes from the VEGA group and the ENSEMBL group.
[Figure: comparison of the Yale, VEGA and ENSEMBL pseudogene sets; values shown: 113, 40, 2, 9]

Breakdown of 211 Yale Pseudogenes Among ENCODE Regions
More pseudogenes in the manually picked regions.
The 211 pseudogenes can be separated into 104 processed, 19 duplicated and 88 others.
Others: those that can't be clearly binned as processed or duplicated, e.g., fragments.
Numbers of genes and pseudogenes are weakly correlated in ENCODE regions (r² = 0.31).
[Figure: No. of Genes and No. of Pseudogenes per ENCODE region, Manually Picked (ENm*) and Randomly Picked (ENr*); r² = 0.31]
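The r² quoted for genes versus pseudogenes is presumably the squared Pearson correlation across regions; the slides do not state the method, so that is an assumption. A minimal sketch of the calculation, using made-up per-region counts rather than the ENCODE data:

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-region counts, for illustration only.
genes_per_region = [5, 12, 8, 20, 3]
pgenes_per_region = [2, 4, 9, 6, 1]
print(round(r_squared(genes_per_region, pgenes_per_region), 2))
```

A value near 0.3 means roughly a third of the variance in pseudogene counts is explained by a linear relationship with gene counts, which is why the slide calls the correlation weak.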