Presentation is loading. Please wait.

Presentation is loading. Please wait.

Landscape of transcription in human cells S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger,

Similar presentations


Presentation on theme: "Landscape of transcription in human cells S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger,"— Presentation transcript:

1 Landscape of transcription in human cells S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger, C. Xue, G. K. Marinov, J. Khatun, B. A. Williams, C. Zaleski, J. Rozowsky, M. R¨oder, F. Kokicinski, R. F. Abdelhamid, T. Alioto, I. Antoshechkin, M. T. Baer, N. S. Bar, P. Batut, K. Bell, I. Bell, S. Chakrabortty, X. Chen, J. Chrast, J. Curado, T. Derrien, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, E. Falconnet, M. Fastuca, K. Fejes-Toth, P. Ferreira, S. Foissac, M. J. Fullwood, H. Gao, D. Gonzalez, A. Gordon, H. Gunawardena, C. Howald, S. Jha, R. Johnson, P. Kapranov, B. King, C. Kingswood, O. J. Luo, E. Park, K. Persaud, J. B. Preall, P. Ribeca, B. Risk, D. Robyr, M. Sammeth, L. Schaffer, L.-H. See, A. Shahab, J. Skancke, A. M. Suzuki, H. Takahashi, H. Tilgner, D. Trout, N. Walters, H. Wang, H. Wrobel, Y. Yu, X. Ruan, Y. Hayashizaki, J. Harrow, M. Gerstein, T. Hubbard, A. Reymond, S. E. Anonarakis, G. Hannon, M. C. Giddings, Y. Ruan, B. Wold, P. Carninci, R. Guig´o, & T. R. Gingeras. Nature 489 (2012) 101–108. Aziz Khan, Ning Wang April 22, 2013 GROUP: 3

2 Outline Motivation & goal Workflow Data generation Results ◦Long RNA expression landscape ◦Short RNA expression landscape ◦RNA editing & allele-specific expression ◦Repeat region transcription ◦Characterization of enhancer RNA Conclusion Landscape of transcription in human cells, Djebali et al. Nature 2012 2

3 Motivation and goal ENCODE pilot phase (2003–2007) ◦Examine 1% of the human genome The ENCODE second phase (2007-2012) ◦To interrogate the complete human genome ◦80% of the human genome have at least one biochemical function Goal: ◦Provide a genome-wide catalogue of the produced RNAs ◦Identify the subcellular localization for the produced RNAs Landscape of transcription in human cells, Djebali et al. Nature 2012 3

4 Experimental workflow K562 cell line Landscape of transcription in human cells, Djebali et al. Nature 2012 4 CAGE(cap analysis of gene expression) sites of initiation of transcription RNA-PET(pair end tags) sites of 5’ & 3’ transcripts termini

5 15 ENCODE cell lines Landscape of transcription in human cells, Djebali et al. Nature 2012 5

6 RNA data processing Landscape of transcription in human cells, Djebali et al. Nature 2012 6

7 RNA data & processing software The mapped data was used to assemble and quantify annotated GENCODE v7 elements Elements and quantifications were further assessed for reproducibility between replicates using a non-parametric version of IDR Landscape of transcription in human cells, Djebali et al. Nature 2012 7

8 Long RNA expression landscape.

9 Detection of annotated transcripts 70% of annotated splice junctions, transcripts and genes were cumulatively detected ◦~ 85% of annotated exons with an average of coverage 96% (by RNA-seq). Small variation in the proportion of detected GENCODE elements Only a small proportion of GENCODE elements are detected exclusively in the Poly-A− RNA fraction Landscape of transcription in human cells, Djebali et al. Nature 2012 9 GENCODE elements are detected by RNA-seq data

10 Detection of novel transcripts The identified novel elements covered 78% of the intronic nucleotides and 34% of the intergenic sequences Used Cufflinks to predict (over all long RNA-seq samples) the following elements in intergenic and antisense regions: ◦94,800 exons (19%) ◦69,052 splice junctions (22%) ◦73,325 transcripts (45%) ◦41,204 genes (80%) Landscape of transcription in human cells, Djebali et al. Nature 2012 10

11 Independent validation of multi- exonic transcript models Landscape of transcription in human cells, Djebali et al. Nature 2012 11 Using overlapping targeted Roche FLX 454 paired-end reads and mass spectrometry ◦~ 3,000 intergenic and antisense transcript models tested ◦ Validation rates from70%to 90%were observed ◦These experiments identified >22,000 novel splice ◦Almost 8x increase compared to the sites originally detected with RNA-seq Thus, most novel transcripts seem to lack protein-coding capacity

12 The transcriptome of nuclear subcompartments (K562) For K562 cell line, total RNA isolated from: ◦Chromatin, Nucleolus and Nucleoplasm 51.64% (18,330) of the GENCODE (v7) annotated genes detected for all 15 cell lines (35,494) were identified Only a small fraction of annotated/novel elements was unique to that compartment Landscape of transcription in human cells, Djebali et al. Nature 2012 12 Thus, by analysing short and long RNAs in the different subcellular compartments: They confirm that splicing predominantly occurs during transcription.

13 Co-transcriptional splicing Short read mappings for exon-based splicing completion Landscape of transcription in human cells, Djebali et al. Nature 2012 13 a, b = exon inclusion c = exon exclusion d, e = exons not being completed The complete splicing index (coSI): coSI =

14 Co-transcriptional splicing Distribution of coSI scores computed on GENCODE internal exons Landscape of transcription in human cells, Djebali et al. Nature 2012 14 Distribution in the total chromatin RNA fractionDistribution in cytosolic Ploy-A+ RNA fraction

15 Gene expression across cell lines Landscape of transcription in human cells, Djebali et al. Nature 2012 15 Abundance of gene types in cellular compartments The distribution of gene expression is very similar across cell lines Protein-coding genes having on average higher expression levels than lncRNAs. Abundance of gene types in cellular compartments lncRNAs contributes more to cell-line specificity than protein-coding genes

16 Isoform expression within a gene Landscape of transcription in human cells, Djebali et al. Nature 2012 16 Cannot distinguish whether this is the result of multiple isoforms expressed in the same cell or of different isoforms expressed in different cells.

17 Transcription initiation Workflow of CAGE processing and elements: Raw CAGE reads  mapped to hg19 genome (using Delve) ◦Delve: a probabilistic mapper using HMM ◦Reads with bad mapping quality were discarded Mapped reads  clustered using paraclu ◦Clusters shorter than 200bp were selected Using TSS predictor: ◦A non-supervised classifier based on modeling sequences surrounding CAGE regions via HMMs. ◦To capture sequence motifs of length 2–8 present at a certain distance from the middle of each cluster Landscape of transcription in human cells, Djebali et al. Nature 2012 17

18 Transcription initiation Landscape of transcription in human cells, Djebali et al. Nature 2012 18 82,783 non-redundant TSSs were identified ~ 48% of the CAGE-identified TSSs are located within 500bp of an annotated RNA- seq detected GENCODE TSS. 3% are within 500bp of a novel TSS.

19 Transcription termination 128,824 sites mapping within annotated GENCODE transcripts were identified. ◦Trim unmapped RNA-seq reads with long terminal poly-As first. ◦~ 20% mapped proximal to annotated poly-A sites (PAS). ◦~ 80% correspond to novel PAS. It increased the average number of PAS per gene from 1.1 to 2.5 A cell-specific preference for proximal PAS in the cytosol compared to the nucleus. Landscape of transcription in human cells, Djebali et al. Nature 2012 19

20 Short RNA expression landscape

21 Annotated small RNAs Expression of GENCODE (v7) annotated small RNA genes Gene type*GENCODE total Detected genes (% detected) No. genes expressed in only one cell line (% detected) No. genes expressed in 12 cell lines (% detected) miRNA guide fragment‡ miRNA passenger fragment§ Internal fragments ∥ of annotated small RNA (average per detected gene) miRNA 1,756497 (28)59 (12)147 (30)454 (454)175 (175)18 snoRNA 1,521458 (30)73 (16)223 (49)NA 60 snRNA 1,944378 (19)123 (33)41 (11)NA 36 tRNA 624465 (75)29 (6)197 (42)NA 52 Other†† 1,209191 (16)69 (36)24 (13)NA 32 Total GENCODE 7,054 1,989 (28) 353 (18)632 (32)NA 40 Landscape of transcription in human cells, Djebali et al. Nature 2012 21 * Includes all other GENCODE small transcript biotypes except for pseudogenes. † All elements that have passed npIDR (0.1). ‡ Number of detected miRNAs with an expressed annotated guide (with an annotated guide in mirbase). § Number of detected miRNAs with an expressed annotated passenger (with an annotated passenger in mirbase). ∥ Short RNA-seq mapping for which the 5′ end starts 5 bp after the start and ends 5 bp before the end of a detected gene.

22 Annotated small RNAs (K562) Landscape of transcription in human cells, Djebali et al. Nature 2012 22

23 Unannotated short RNAs Two types: 1. Subfragments of annotated small RNAs 2. Second and largest source of unannotated short RNAs corresponds to novel short RNAs, associated with promoter, terminator regions of annotated genes,and termini-associated short RNAs. Landscape of transcription in human cells, Djebali et al. Nature 2012 23

24 Genealogy of short RNAs Landscape of transcription in human cells, Djebali et al. Nature 2012 24

25 Genealogy of short RNAs About 6% of all annotated long transcripts overlap with small RNAs and are probably precursors to these small RNAs. Many long RNAs seem to have dual roles, as functional RNAs, and as precursors for many important classes of small RNAs. Landscape of transcription in human cells, Djebali et al. Nature 2012 25

26 RNA editing & allele-specific expression.

27 RNA editing Cells can make discrete changes to specific nucleotide sequences within a RNA molecule after it has been generated by DNA A-I editing (main form of RNA editing in mammals) and C-U editing They found: A-I editing (88%) and T-C (5%) Notice : Their results do not support a recent report of a substantial number of non-canonical SNV edits in the RNA of human lymphoblastoid cells30. Landscape of transcription in human cells, Djebali et al. Nature 2012 27

28 Allele-specific expression (GM12878 RNA-seq datasets) 18% of both GENCODE annotated protein-coding & non- coding genes exhibit allele-specific expression. Similar proportion of genes with allele-specific expression in whole-cell, cytoplasm, & nucleus. Used AlleleSeq pipeline [Rozowsky et al. Mol. Syst. Biol. 2011] Landscape of transcription in human cells, Djebali et al. Nature 2012 28

29 Repeat region transcription

30 18% (14,828) of CAGE-defined TSS regions overlap repetitive elements. They found that CAGE clusters mapping to repeat regions were noticeably more narrowly expressed than CAGE clusters mapping within genic regions ( Shown by Shannon Entropy ) Cell-line specificity Indicate they may have real functions Landscape of transcription in human cells, Djebali et al. Nature 2012 30

31 Characterization of enhancer RNA

32 Transcription at enhancers RNA polymerase II binds some distal enhancer regions and produce enhancer-associated transcripts (eRNAs) Landscape of transcription in human cells, Djebali et al. Nature 2012 32

33 Pattern of RNA elements around enhancer predictions Landscape of transcription in human cells, Djebali et al. Nature 2012 33

34 Enhancer transcripts differ from promoter transcripts Landscape of transcription in human cells, Djebali et al. Nature 2012 34

35 Chromatin state at transcribed enhancers Landscape of transcription in human cells, Djebali et al. Nature 2012 35

36 Enhancer activity & transcription is cell-type specific Landscape of transcription in human cells, Djebali et al. Nature 2012 36

37 Summary Extend the current genome-wide annotated catalogue of long poly- adenylated & small RNAs of GENCODE 62.1% and 74.7% of the human genome are covered by either processed or primary transcripts ◦Primary = contigs + introns + GENCODE genes ◦Processed = contigs + GENCODE exons ◦On average one cell line transcribe 39% of the genome ◦No cell line transcribes more than 56.7% of the transcriptomes across all cell lines ◦The consequent reduction in the length of intergenic regions leads to a significant overlapping of neighbouring genic regions and prompts a redefinition of a gene Landscape of transcription in human cells, Djebali et al. Nature 2012 37

38 Summary A tendency for genes to express many isoforms simultaneously ◦(plateau: 10–12 expressed isoforms per gene per cell line) Cell-type-specific enhancers are promoters differentiable from other regulatory regions Coding & non-coding transcripts are predominantly localized in the cytosol and nucleus, respectively ~ 6% of all annotated coding and non-coding transcripts overlap with small RNAs and probably precursors to these small RNAs Landscape of transcription in human cells, Djebali et al. Nature 2012 38

39 Redefinition of the concept of a gene Landscape of transcription in human cells, Djebali et al. Nature 2012 39 Novel genes increase the proportion of small intergenic regions

40 Thank you! Landscape of transcription in human cells, Djebali et al. Nature 2012 40


Download ppt "Landscape of transcription in human cells S. Djebali, C. A. Davis, A. Merkel, A. Dobin, T. Lassmann, A. Mortazavi, A. Tanzer, J. Lagarde, W. Lin, F. Schlesinger,"

Similar presentations


Ads by Google