Oryza Arjan van Zeijl Claire Lessa Alvim Kamei Robert van Loo Ruud Heshof BIF-30806 8-3-2013.


1 Oryza Arjan van Zeijl Claire Lessa Alvim Kamei Robert van Loo Ruud Heshof BIF-30806 8-3-2013

2 Goal
Generate a platform to analyze gene expression of Saccharomyces cerevisiae using RNAseq data.
Compare highly expressed genes with lowly expressed genes on exon-intron length, GC content, and codon usage.

3 MoSCoW
Must: TopHat, Cufflinks
Should: Exon-Intron length, GC content
Could: GO-annotation, Codon-usage, Palindromes
Would: Chemostat analysis, Cytoscape

4 Pipeline
Diagram labels: RNAseq data (Trimmed, Untrimmed), TopHat, Cufflinks, Sequence retrieval, NCBI data, Exon-Intron length, GC content, Palindrome, Codon-usage, GO-terms, Validation

5 RNAseq data
Selected the top 100 genes per 20% batch of total genes
Batches: 0-20%, 20-40%, 40-60%, 60-80%, 80-100% (100 genes each)
Data output: Perc 1, Perc 2, Perc 3, Perc 4, Perc 5 (FPKM-value)
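The batch selection on this slide can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `top_genes_per_batch`, the input layout (a dict of gene ID to FPKM), and the choice to rank from highest to lowest FPKM are all assumptions.

```python
def top_genes_per_batch(fpkm_by_gene, n_batches=5, top_n=100):
    """Rank genes by FPKM, split the ranking into equal batches,
    and keep the first top_n genes of every batch.

    fpkm_by_gene: dict mapping gene ID -> FPKM value (assumed layout).
    Returns a list of n_batches lists of gene IDs; batch 0 holds the
    highest-expressed genes (an assumption about the slide's ordering).
    """
    ranked = sorted(fpkm_by_gene, key=fpkm_by_gene.get, reverse=True)
    total = len(ranked)
    batches = []
    for i in range(n_batches):
        # Integer bounds so that every gene falls in exactly one batch.
        start = i * total // n_batches
        end = (i + 1) * total // n_batches
        batches.append(ranked[start:end][:top_n])
    return batches


if __name__ == "__main__":
    # Ten made-up genes, five batches, one gene kept per batch.
    demo = {f"g{i}": 10 - i for i in range(10)}
    print(top_genes_per_batch(demo, n_batches=5, top_n=1))
```

With the roughly 6,000 to 7,000 genes of a yeast annotation, each 20% batch holds well over 100 genes, so `top_n=100` always returns full lists.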

6 NCBI data
LOCUS       NP_014825                 63 aa            linear   PLN 25-FEB-2013
DEFINITION  ribosomal 40S subunit protein S30B [Saccharomyces cerevisiae S288c].
ACCESSION   NP_014825
VERSION     NP_014825.3  GI:398365605
DBSOURCE    REFSEQ: accession NM_001183601.3
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae S288c
  ORGANISM  Saccharomyces cerevisiae S288c
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Saccharomycetaceae;
            Saccharomyces.
...

7 Exon - Intron length
ID       SHORT   EXON  INTRON  FPKM     CDS  GC_CDS  L_PALIN  GC_PALIN
YOR182C  RPS30B  192   412     15623.7  189  41.27   0        -
Ribosomal 40S subunit protein S30B

8 GC content Claire
Does more GC mean more mRNA?
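For reference, the GC percentage reported in columns such as GC_CDS can be computed as below. This is a minimal sketch; the function name `gc_content` is illustrative, and the slides do not say how ambiguous bases were handled (here they simply count toward the length).

```python
def gc_content(seq):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid division by zero on empty input
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGC"))
```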

9 GC content & CDS length

10 Palindrome
Humphrey-Dixon EL, Sharp R, Schuckers M, Lock R (2011). Comparative genome analysis suggests characteristics of yeast inverted repeats that are important for transcriptional activity. Genome 54(11):934-42.
IR: at least 6 bp long, spacers maximum 10 bp
Conservation: IR must be identical, spacer not
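The IR criteria above (arms of at least 6 bp, spacer of at most 10 bp) can be illustrated with a small brute-force scan. This is a sketch under stated assumptions, not the method of the cited paper or of the presenters: the names `reverse_complement` and `find_inverted_repeats` are hypothetical, the input is assumed to contain only A, C, G and T, and only arms of exactly the minimum length are reported (a longer arm shows up as several overlapping hits).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, min_arm=6, max_spacer=10):
    """Return (start, spacer) pairs where seq[start:start+min_arm] is
    followed, after `spacer` bases, by its reverse complement."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * min_arm + 1):
        target = reverse_complement(seq[start:start + min_arm])
        for spacer in range(max_spacer + 1):
            right_start = start + min_arm + spacer
            right = seq[right_start:right_start + min_arm]
            if len(right) < min_arm:
                break  # right arm would run past the end of the sequence
            if right == target:
                hits.append((start, spacer))
    return hits


if __name__ == "__main__":
    # Left arm ACGTTG, 3 bp spacer, right arm CAACGT (its reverse complement).
    print(find_inverted_repeats("TTACGTTGAAACAACGTTT"))
```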

11 Palindrome
Comparative analysis in 4 Saccharomyces genomes: S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus
IR in S. cerevisiae, conserved in the 4 species
Crossed the top 100 gene lists with the palindrome list to create 3 hash tables using the gene ID as keys: %gene_palin; %gene_palinseq; %GC_palin
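The crossing step can be sketched with dictionaries standing in for the three hash tables. The slide gives only the hash names, so their contents here are guesses: `gene_palin` is assumed to hold the palindrome length, `gene_palinseq` the palindrome sequence, and `gc_palin` its GC percentage. The input layout and the gene IDs in the demo are made up.

```python
def cross_with_palindromes(top_gene_ids, palindromes):
    """Build three gene-ID-keyed tables for genes that are in the top list
    and also have a palindrome.

    palindromes: dict gene ID -> palindrome sequence (assumed layout).
    Returns (gene_palin, gene_palinseq, gc_palin); the meaning of each
    table is an assumption based on the hash names on the slide.
    """
    gene_palin, gene_palinseq, gc_palin = {}, {}, {}
    for gene_id in top_gene_ids:
        seq = palindromes.get(gene_id, "").upper()
        if not seq:
            continue  # gene has no palindrome in the list
        gc = seq.count("G") + seq.count("C")
        gene_palin[gene_id] = len(seq)
        gene_palinseq[gene_id] = seq
        gc_palin[gene_id] = 100.0 * gc / len(seq)
    return gene_palin, gene_palinseq, gc_palin


if __name__ == "__main__":
    tables = cross_with_palindromes(
        ["GENE_A", "GENE_B"],
        {"GENE_A": "GGATCC", "GENE_C": "AAATTT"},
    )
    print(tables)
```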

12 Palindrome length

13 Percentiles 1

14 GC Palindrome & CDS

15 Codon usage
Previous studies indicated a more extreme codon usage preference in highly expressed genes (Sharp, 1986; Plotkin, 2011)
Codon usage bias was shown to correlate with tRNA abundance (Sharp, 1986)
Non-optimal codons might slow down translation to allow correct protein folding (Pechmann, 2013)
HOT TOPIC: 2 papers in Nature this week: non-optimal codon usage is important for circadian clock rhythms

16 Codon usage
MEASURE: Relative Synonymous Codon Usage (RSCU)
Took the mean RSCU over the genes in the top 100 for each class
Annotation problem: CDS length is not always divisible by three
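RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, that is, count x (number of synonymous codons) / (total count for the amino acid). The sketch below computes it for one CDS using the standard genetic code. The function name `rscu` is illustrative, and rejecting a CDS whose length is not divisible by three is only one possible way to deal with the annotation problem; the slides do not say how the authors handled it.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for the amino acids that occur in the CDS.

    Stop codons and amino acids encoded by a single codon (M, W) are
    skipped. Raises ValueError if the length is not a multiple of three.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not divisible by three")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        for codon in codons:
            values[codon] = counts[codon] * len(codons) / total
    return values


if __name__ == "__main__":
    # Alanine only: GCT three times, GCC once.
    print(rscu("GCTGCTGCTGCC"))
```

Averaging these per-gene values over the 100 genes of a class gives the mean RSCU mentioned on the slide.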

17 Codon usage

18 GO term enrichment
No.  GOBPID      Pvalue    OddsRatio  ExpCount  Count  Size  Term
1    GO:0002181  2.54E-85  147.8      2.4       64     171   cytoplasmic translation
2    GO:0009058  1.10E-28  18.6       27.8      77     1968  biosynthetic process
3    GO:0044267  9.87E-27  12.1       21.7      69     1570  cellular protein metabolic process
4    GO:0010467  4.36E-26  16.3       9.4       47     731   gene expression
5    GO:0034645  2.27E-21  10.6       16.8      55     1386  cellular macromolecule biosynthetic process
6    GO:0006407  1.40E-14  88.3       0.3       10     19    rRNA export from nucleus
7    GO:0071843  1.46E-13  6.3        8.4       34     598   cellular component biogenesis at cellular level
8    GO:0044237  1.62E-13  9.3        22.6      50     1471  cellular metabolic process
9    GO:0000462  5.11E-11  22.5       0.7       11     52    maturation of SSU-rRNA from tricistronic rRNA transcript (SSU-rRNA, 5.8S rRNA, LSU-rRNA)
10   GO:0042274  1.22E-10  12.7       1.5       14     111   ribosomal small subunit biogenesis
11   GO:0044238  3.89E-10  6.0        53.7      80     3842  primary metabolic process
12   GO:0050658  8.76E-08  8.8        1.7       12     121   RNA transport
13   GO:0051168  8.76E-08  8.8        1.7       12     121   nuclear export
14   GO:0006403  3.20E-07  7.7        1.9       12     136   RNA localization
15   GO:0015931  6.94E-07  7.2        2.0       12     146   nucleobase-containing compound transport
16   GO:0019320  1.30E-06  12.1       0.8       8      59    hexose catabolic process
17   GO:0070925  1.38E-06  9.8        1.1       9      80    organelle assembly
18   GO:0000028  1.84E-06  33.9       0.2       5      16    ribosomal small subunit assembly
19   GO:0006364  2.17E-06  5.4        3.1       14     235   rRNA processing
20   GO:0051169  2.25E-06  6.3        2.3       12     163   nuclear transport
21   GO:0006450  2.44E-06  31.5       0.2       5      17    regulation of translational fidelity
22   GO:0006094  5.12E-06  16.8       0.5       6      33    gluconeogenesis
23   GO:0006096  5.12E-06  16.8       0.5       6      33    glycolysis
24   GO:0046364  7.34E-06  15.6       0.5       6      35    monosaccharide biosynthetic process
25   GO:0034660  2.54E-05  3.4        6.7       19     480   ncRNA metabolic process
26   GO:0006417  5.20E-05  6.9        1.3       8      97    regulation of translation
27   GO:0006414  6.39E-05  3.7        4.8       15     345   translational elongation
28   GO:0071826  0.000105  5.5        1.9       9      136   ribonucleoprotein complex subunit organization
29   GO:0016052  0.000327  5.2        1.7       8      125   carbohydrate catabolic process
Long list for the top 100
Basically two processes, components, functions:
Ribosome and translation related
Glycolysis/gluconeogenesis related
Zoom in on part of the table

19 Top 100 GO-terms
No.  GOBPID      Pvalue    OddsRatio  ExpCount  Count  Size  Term
1    GO:0002181  2.54E-85  147.8      2.4       64     171   cytoplasmic translation
2    GO:0009058  1.10E-28  18.6       27.8      77     1968  biosynthetic process
3    GO:0044267  9.87E-27  12.1       21.7      69     1570  cellular protein metabolic process
8    GO:0044237  1.62E-13  9.3        22.6      50     1471  cellular metabolic process
11   GO:0044238  3.89E-10  6.0        53.7      80     3842  primary metabolic process
16   GO:0019320  1.30E-06  12.1       0.8       8      59    hexose catabolic process
22   GO:0006094  5.12E-06  16.8       0.5       6      33    gluconeogenesis
23   GO:0006096  5.12E-06  16.8       0.5       6      33    glycolysis
24   GO:0046364  7.34E-06  15.6       0.5       6      35    monosaccharide biosynthetic process
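The columns ExpCount, Count and Size are consistent with a hypergeometric over-representation test, although the slides do not name the tool or the size of the gene universe. The sketch below shows how such an expected count and p-value could be obtained for one term; the function name `enrichment` and all parameter names are assumptions, and the universe size must be supplied by the user.

```python
from math import comb


def enrichment(count, size, selected, universe):
    """Hypergeometric over-representation test for one GO term.

    count:    selected genes annotated with the term
    size:     genes in the universe annotated with the term
    selected: number of selected genes (e.g. a top-100 list)
    universe: total number of annotated genes (not given on the slides)
    Returns (expected_count, p_value) with p_value = P(X >= count).
    """
    expected = size * selected / universe
    upper = min(size, selected)
    # Sum the upper tail with exact integers, divide once at the end.
    tail = sum(
        comb(size, k) * comb(universe - size, selected - k)
        for k in range(count, upper + 1)
    )
    return expected, tail / comb(universe, selected)


if __name__ == "__main__":
    # Toy numbers only: 3 of 4 selected genes hit a term of size 5 in 20 genes.
    print(enrichment(count=3, size=5, selected=4, universe=20))
```

Tools that test GO terms conditionally on their child terms adjust the counts, so values from this plain version need not match the table exactly.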

20 Blast2GO

21

22 GO terms top 100

23 KEGG pathways

24 Validation
Technical validation: use 4 paired-end RNA-seq reads
Create multiple copies (200 in total, 25% each)
Run the pipeline: 5 hits found (one read maps to two homologous genes on two chromosomes)
FPKM values are not equal (large length differences), which is the expected result
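Why equal read counts give unequal FPKM values follows from the plain FPKM definition: fragments per kilobase of transcript per million mapped fragments. The sketch below uses that textbook formula with made-up lengths; Cufflinks itself estimates abundances with a statistical model, so this is an illustration of the length effect rather than a reproduction of its output.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 200 fragments in total, 50 per gene; only the lengths differ
    # (both lengths are invented for the example).
    print(fpkm(50, 1000, 200))
    print(fpkm(50, 500, 200))
```

With the same 50 fragments each, the shorter transcript gets twice the FPKM of the longer one, which matches the observation on the slide.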

25 Conclusion
- Highly expressed genes have a high chance of containing introns.
- There is a correlation between palindrome length and gene expression.
- There is a preference for codon usage in highly expressed genes.
- Highly expressed genes are richer in GC content and are shorter.
- Large differences exist in GC content, intron/exon length, palindromes and GO terms between the top 100 and the rest.

26 Questions

