Comparative transcriptomic analysis of fungi Group Nicotiana Daan van Vliet, Dou Hu, Joost de Jong, Krista Kokki.

Research objective
To study differences in gene expression between related fungal species.
Criteria for the studied species:
- Reference genome
- RNA reads > 100 bp
- Preferably paired-end reads
- Related species
- Similar conditions

Comparison
Comparison between different species:
- Saccharomyces cerevisiae (yeast)
- Komagataella pastoris (Pichia, yeast)
- Aspergillus oryzae (fungus)

Methods – Data [Daan]
- RNA-seq: SRA
- Genome and annotation: Ensembl Fungi
- Read quality analysis: FastQC

Methods – Data processing
- Cleaning reads: SolexaQA
- Mapping reads: TopHat
- Assembly/quantification: Cufflinks
- Optional replicate assembly: Cuffmerge
- Extracting transcript sequences: gffread
- Selection of top 100 genes: Linux
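The last step, selecting the top 100 genes, was done with Linux tools according to the slide. As a rough illustration only, the same selection could be sketched in Python as below. The tab-delimited layout and the column names `gene_id` and `FPKM` are assumptions made for this sketch. The slides do not say which file or column was used.

```python
import csv
import io


def top_expressed_genes(table_text, n=100, id_column="gene_id", value_column="FPKM"):
    """Return the n (gene, expression) pairs with the highest expression.

    table_text holds a tab-delimited table with a header row.
    The default column names are assumptions, not taken from the slides.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = []
    for record in reader:
        try:
            gene = record[id_column]
            value = float(record[value_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric value
        rows.append((gene, value))
    # Highest expression first; ties keep their original file order.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# Hypothetical four-gene table, used only to show the call.
EXAMPLE_TABLE = (
    "gene_id\tFPKM\n"
    "geneA\t12.5\n"
    "geneB\t830.0\n"
    "geneC\t0.0\n"
    "geneD\t97.1\n"
)

if __name__ == "__main__":
    for gene, value in top_expressed_genes(EXAMPLE_TABLE, n=2):
        print(gene, value)
```

Passing the column names as parameters keeps the sketch independent of any one quantification output format.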

Methods – Gene properties
Property | Explanation | Tool (input data file)
Expression | Count of mapped reads | Perl script (fasta)
Length | Count of base pairs of whole gene | Perl script (fasta)
Intron length | Count of base pairs within introns | Perl script (gtf)
GC content | GC count / length | Perl script (fasta)
Nc | Effective number of codons, range 20-61: 20 = one codon per amino acid, 61 = random codon use | CodonW (fasta)
GC3s | GC content at the 3rd position of synonymous codons | CodonW (fasta)
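The group computed these properties with Perl scripts and CodonW. The following Python sketch is not their code. It only illustrates how length, GC content and a simplified GC3s could be computed for a single coding sequence. It assumes the standard genetic code and an in-frame sequence, and it treats GC3s as the G/C fraction at the third position after dropping Met, Trp and stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SINGLE_CODON_AMINO_ACIDS = {"ATG", "TGG"}  # Met and Trp have no synonymous codons


def gc_content(sequence):
    """Fraction of G and C bases in the sequence (GC count / length)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3s(coding_sequence):
    """G/C fraction at the third position of synonymous codons.

    Assumes the sequence starts in frame; trailing bases that do not
    fill a codon are ignored.
    """
    seq = coding_sequence.upper()
    usable_length = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable_length, 3)]
    synonymous = [
        codon for codon in codons
        if codon not in STOP_CODONS and codon not in SINGLE_CODON_AMINO_ACIDS
    ]
    if not synonymous:
        return 0.0
    gc_third = sum(1 for codon in synonymous if codon[2] in ("G", "C"))
    return gc_third / len(synonymous)


if __name__ == "__main__":
    toy_gene = "ATGGCCAAATAA"  # hypothetical sequence: Met, Ala, Lys, stop
    print(len(toy_gene), gc_content(toy_gene), gc3s(toy_gene))
```

The effective number of codons (Nc) is left out here because it needs per-amino-acid codon frequencies, which is the part CodonW handles.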

Methods – Interaction Top 100 genes were mapped to the interactome file and visualised through Cytoscape.
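As a minimal sketch of the mapping step, the code below keeps only the interactome edges that involve at least one top gene and reports which top genes have no partner. The edge format (gene, gene, score) and the gene names are hypothetical. The actual interactome file layout is not given in the slides, and the Cytoscape visualisation itself is not covered.

```python
def edges_touching(edges, genes_of_interest):
    """Keep edges in which at least one endpoint is a gene of interest."""
    wanted = set(genes_of_interest)
    return [edge for edge in edges if edge[0] in wanted or edge[1] in wanted]


def genes_without_partner(edges, genes_of_interest):
    """Genes of interest that do not occur in any edge, sorted by name."""
    seen = set()
    for gene_a, gene_b, *_ in edges:
        seen.add(gene_a)
        seen.add(gene_b)
    return sorted(set(genes_of_interest) - seen)


# Hypothetical interactome: (gene, gene, score) tuples.
EXAMPLE_EDGES = [
    ("geneA", "geneB", 2.1),
    ("geneB", "geneC", 1.4),
    ("geneD", "geneE", 3.0),
]
EXAMPLE_TOP_GENES = ["geneA", "geneC", "geneZ"]

if __name__ == "__main__":
    print(edges_touching(EXAMPLE_EDGES, EXAMPLE_TOP_GENES))
    print(genes_without_partner(EXAMPLE_EDGES, EXAMPLE_TOP_GENES))
```

The filtered edge list could then be written out as a simple tab-delimited table for import into a network viewer.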

Hypotheses for yeast – Validation
- GC content correlates positively with gene length.
- Gene length correlates negatively with the degree of codon bias.
- Codon bias is more extreme in highly expressed genes.
- Genes with longer introns show higher bias in codon usage.
- The overall codon usage matches the known bias.

GO terms and gene locations
The top 5 most over-represented GO terms for all the found genes (table columns: GOBPID, Pvalue, OddsRatio, ExpCount, Count, Size, Term; numeric values not preserved in this transcript):
1. cytoplasmic translation
2. primary metabolic process
3. cellular component biogenesis at cellular level
4. rRNA export from nucleus
5. organelle assembly
The chromosomes the genes are found in (table rows: Chromosome, No. of genes; counts not preserved in this transcript): I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIV, XV, XVI

Results – Correlations: Gene expression vs. gene length (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Gene expression vs. intron length (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Gene expression vs. effective number of codons (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Effective number of codons vs. GC content at 3rd position (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Gene length vs. effective number of codons (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Gene length vs. GC content (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Gene length vs. intron length (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations: Intron length vs. Nc (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)

Results – Correlations
Overall:
- Within species: few correlations between gene properties
- Between species: different patterns (?)
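The slides do not state which correlation measure was used for these property pairs. As one possible approach, and purely as an assumption, the sketch below computes a Spearman rank correlation, which is a common choice for skewed quantities such as expression and gene length. The function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical values for four genes: expression and gene length.
    expression = [830.0, 97.1, 12.5, 0.4]
    gene_length = [600, 1500, 1200, 3100]
    print(spearman(expression, gene_length))
```

Running it once per species and per property pair would give one coefficient for each panel listed above.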

Cytoscape GO terms
The top 100 genes show different interaction networks in GO terms.

Results – First choice: Yeast Interactome Project for S. cerevisiae
- High-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information.
- The high-throughput Y2H dataset covers ~20% of all yeast binary interactions.
- This binary map is enriched for transient signalling interactions and inter-complex connections, with highly significant clustering between essential proteins.

Database choice
Interactions from CCSB-YI1: 1,809 interactions among 1,278 proteins

Second choice: YeastNet v.2
- A probabilistic functional gene network of yeast genes, constructed from ~1.8 million experimental observations from DNA microarrays, physical protein interactions, genetic interactions, literature, and comparative genomics methods.
- In total, YeastNet v.2 covers 102,803 linkages among 5,483 yeast proteins.
- Built by a modified Bayesian integration of diverse data types, with each data type weighted according to how well it links genes that are known to share functions (LLS).

Database choice
All of the top 100 genes have interactors in YeastNet v.2. We found 9,896 possible interactions among the 102,803 linkages.

The end Questions?

Results – Correlations: Gene expression vs. GC content (panels: Saccharomyces cerevisiae, Aspergillus oryzae, Komagataella pastoris)