Proposed redefinition of “gene” requires it to have a biological role. Gerstein MB, …, Snyder M. 2007. Genome Res 17: 669-681


1 proposed redefinition of “gene” requires it to have a biological role
Gerstein MB, …, Snyder M. 2007. Genome Res 17: 669-681
example of complexities observed by ENCODE:
(A) annotated exons (black rectangles) and novel transcriptionally active regions, or TARs (hollow rectangles); conventional annotation identifies only 4 genes, or just a fraction of the transcripts reported (dashed lines are introns)
(B) observed transcripts are shown alongside the sequences that regulate them (gray circles); note that some of the enhancers are actually promoters for novel splice isoforms

2 a redefinition of the “gene”
1. a gene is a genomic sequence directly encoding functional product molecules, either RNAs or proteins
2. when there are several functional products that share overlapping regions, take the union of all overlapping genomic sequences encoding them
3. this union must be coherent, done separately for protein and RNA products, but it does not require that all the products necessarily share a common subsequence
concisely summarized: a gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products
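The three rules above can be sketched in code. This is a minimal illustration, not the authors' procedure: it assumes each functional product is given as a list of half-open genomic segments on one strand, and it reads "coherent" as "linked through a chain of overlaps within the same product type". The function name `group_products_into_genes` and the toy coordinates are hypothetical.

```python
from collections import defaultdict

def group_products_into_genes(products):
    """Group products into genes by transitive overlap, per product type.

    products: dict mapping a product name to (kind, segments), where kind is
    "protein" or "rna" and segments is a list of half-open (start, end) pairs.
    Returns a sorted list of genes, each a sorted list of product names.
    """
    parent = {name: name for name in products}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Rule 3: protein and RNA products are handled separately.
    by_kind = defaultdict(list)
    for name, (kind, segments) in products.items():
        for start, end in segments:
            by_kind[kind].append((start, end, name))

    # Rule 2: sweep the sorted segments and join products whose segments overlap.
    for segs in by_kind.values():
        segs.sort()
        cur_end, cur_name = None, None
        for start, end, name in segs:
            if cur_end is not None and start < cur_end:
                parent[find(name)] = find(cur_name)
                cur_end = max(cur_end, end)
            else:
                cur_end, cur_name = end, name

    genes = defaultdict(set)
    for name in products:
        genes[find(name)].add(name)
    return sorted(sorted(members) for members in genes.values())

# Hypothetical toy locus: p1 and p2 share a coding segment, p3 stands alone,
# and the noncoding RNA r1 overlaps protein-coding sequence.
toy_products = {
    "p1": ("protein", [(0, 10), (20, 30)]),
    "p2": ("protein", [(20, 30), (40, 50)]),
    "p3": ("protein", [(60, 70)]),
    "r1": ("rna", [(25, 65)]),
}
print(group_products_into_genes(toy_products))
```

In this toy locus the RNA forms its own gene even though it overlaps protein-coding sequence, because the union is taken separately for protein and RNA products. Products linked only through an intermediate product still land in one gene, since no common subsequence is required.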

3 four genes defined in this one locus
there are three primary transcripts, two of which encode five proteins, while the third encodes a noncoding RNA; two primary transcripts share a 5’ untranslated region, but they are considered different genes because their translated regions (D and E) do not overlap; there is a noncoding RNA, but the fact that it shares its genomic sequence (X and Y) with the protein-coding genomic segments A and E does not make it a co-product of these genes; there are four genes in this one locus by the new definition

4 gene number estimates as a function of time and methodology
[figure: axis labels "time" and "genes"; annotations "sequence annotation", "observed transcripts", "genome is sequenced", and "dark matter"]
dark matter is reproducible, but it is poorly transcribed, poorly conserved, non-protein-coding, and outnumbers validated microRNAs by ~1000-fold

5 cDNA sequencing reveals an abundance of non-coding genes
mouse cDNAs: Okazaki Y, …, Hayashizaki Y. 2002. Nature 420: 563
human cDNAs: Imanishi T, …, Sugano S. 2004. PLoS Biol 2: e162

6 neutral evolution of non-coding cDNAs from mouse transcriptome
ncRNAs are known RNA genes; intron1 and intergenic are negative controls
communications arising: Wang J, …, Wong GK. 2004. Nature 431: after p757

7 tiling array data are riddled with unexplained signal anomalies too
do not assume that non-coding cDNAs are tiling arrays
[figure: human thymus polyA+ cDNAs profiled at the locus of the Ewing sarcoma breakpoint region 1 gene; labels "exons" and "mystery BURST"; from Johnson JM, …, Schadt EE. 2005. Trends Genet 21: 93]

8 indications of biological relevance: transcription, conservation, both lines of evidence, or neither?
[figure: grid of transcription (poorly transcribed vs. highly transcribed) against conservation (highly conserved vs. poorly conserved), with regions labeled "most biology" and "dark matter"]
possible dark matter explanations:
1. biological noise, i.e. real transcripts with no biological roles
2. RNA genes unique to a species
3. long RNAs are precursors for short (and conserved) RNAs
NB: dark matter based on tiling arrays with 150 bp exons is not equivalent to cDNA sequences with 1800 bp exons

9 hypothesis: unannotated long RNAs are precursors for short RNAs
Kapranov P, …, Gingeras TR. 2007. Science 316: 1484-1488
nuclear and cytosolic polyadenylated RNAs longer than 200 nt (long RNAs, lRNAs) and whole-cell RNAs shorter than 200 nt (short RNAs, sRNAs) were profiled for the non-repetitive portion of the human genome; 64% of poly(A)+ transcription (nucleus and cytosol) does not align with annotated exons, but some 80% of the 265,237 annotated exons are detected

10 lRNAs that overlap with sRNAs are more PhastCons conserved (i)
PhastCons identifies evolutionarily conserved elements from a multi-species sequence alignment, given their phylogenetic tree, and based on a statistical model of evolution called a phylogenetic hidden Markov model (phylo-HMM)

11 lRNAs that overlap with sRNAs are more PhastCons conserved (ii)
quantile-quantile plot of PhastCons scores for long RNAs that do (x axis) and do not (y axis) overlap with short RNAs; conservatively, 3.1% of HepG2 and 2.4% of HeLa nuclear lRNA transfrags might be parts of precursors of sRNAs
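For readers unfamiliar with quantile-quantile plots, the sketch below shows how the plotted points could be computed from two sets of scores. It is an illustration only: the function name `qq_points` and the score values are invented, and the slide does not say how the original plot was produced.

```python
from statistics import quantiles

def qq_points(overlapping_scores, non_overlapping_scores, n=100):
    """Pair up matching quantiles of two score samples.

    Returns n - 1 points (x, y): x is a quantile of the scores for lRNAs that
    overlap sRNAs, y the same quantile for lRNAs that do not, matching the
    axis assignment described on the slide.
    """
    qx = quantiles(overlapping_scores, n=n, method="inclusive")
    qy = quantiles(non_overlapping_scores, n=n, method="inclusive")
    return list(zip(qx, qy))

# Hypothetical scores, chosen only to show the shape of the output.
overlapping = [0.2, 0.4, 0.5, 0.7, 0.9]
non_overlapping = [0.1, 0.3, 0.4, 0.6, 0.8]
for x, y in qq_points(overlapping, non_overlapping, n=4):
    print(x, y)
```

If the two distributions were identical, every point would fall on the diagonal x = y. Points with x larger than y at every quantile indicate that the overlapping set has systematically higher scores.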

12 sRNAs associate with 5’ and 3’ boundaries of annotated transcripts
enrichment over random expectation is plotted as a function of distance from the 5’ and 3’ termini for sRNAs on the same (sense) or opposite (antisense) strand as the annotated transcripts; the comparison is made against random regions with matched G+C content
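One way to read "enrichment over random expectation" is as a per-bin ratio of observed to control frequencies. The sketch below assumes that reading; it is not taken from the paper. It also assumes the signed distances (in bp) from each sRNA to the nearest transcript terminus have already been computed, and that the control distances come from the G+C-matched random regions. The function name, the bin size, and the numbers are hypothetical.

```python
from collections import Counter

def enrichment_by_distance(observed, control, bin_size=100):
    """Ratio of observed to control frequency in each distance bin.

    observed, control: non-empty lists of signed integer distances (bp) to a
    transcript terminus. Bins are keyed by their lower edge. A bin with no
    control counts has no defined ratio and is reported as None.
    """
    obs = Counter((d // bin_size) * bin_size for d in observed)
    ctl = Counter((d // bin_size) * bin_size for d in control)
    result = {}
    for b in sorted(set(obs) | set(ctl)):
        obs_frac = obs[b] / len(observed)
        ctl_frac = ctl[b] / len(control)
        result[b] = obs_frac / ctl_frac if ctl_frac > 0 else None
    return result

# Hypothetical distances: observed sRNAs cluster near the terminus (bin 0).
observed_distances = [10, 20, 30, 150]
control_distances = [10, 150, 250, 350]
print(enrichment_by_distance(observed_distances, control_distances))
```

A ratio above 1 in the bins closest to zero would correspond to the association with transcript boundaries that the slide describes. Sense and antisense sRNAs would be run through the function separately.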

