Transcriptome analysis

Transcriptome analysis Edouard Severing

Overview. Introduction: transcriptome complexity. Transcriptome reconstruction: without a genome; with a genome. Transcript abundances. Differential expression. Transcript abundance models (maximum likelihood).

Gene expression and phenotypes. What are the gene expression differences that underlie these phenotypic differences? Gene expression is measured by assessing the abundance of mRNA molecules.

Transcriptome vs. genome. Initial assumption: N protein-coding genes give N mRNA molecules and N proteins. This assumption is based on studies that were performed on bacterial systems.

Complexity and gene count. [Figure: two organisms compared, one with 20,000 genes and one with 25,000 genes.]

Transcriptome vs. genes in eukaryotes. Current view: N protein-coding genes do not simply give N mRNA molecules, and the number of proteins is uncertain. What happens between the genes and the mRNA molecules?

Splicing. [Figure: a gene with exon, intron and exon; the pre-mRNA (drawn 5' to 3') is spliced so that the intron is removed.]

Alternative splicing. [Figure: the same pre-mRNA (drawn 5' to 3') is spliced in two different ways, giving two different transcripts.]

Complexity and AS. In humans, about 90% of genes have AS; in plants, about 42% of genes have AS. The average number of transcripts produced by human genes is also higher than the average number of transcripts produced by plant genes.

Extremes: the Drosophila Dscam gene can produce over 35,000 different transcripts.

AS type difference: humans vs. plants. In humans, exon skipping is the most frequent AS event type. In plants, intron retention is the most common AS event type.

RNA editing (base modification). [Figure: the primary transcript (predicted sequence) next to the transcript after editing (observed sequence), in which a base has been changed.] Difficulty: distinguishing genuine RNA editing from sequencing errors.

Translation or decay. A large fraction (>30%) of the transcripts of protein-coding genes are degraded by the nonsense-mediated decay (NMD) pathway. The position of the stop codon is used to predict whether a transcript is likely to be degraded by the NMD pathway.

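NMD target prediction. [Figure: pre-mRNA and spliced mRNA (drawn 5' to 3') with the exon/exon junctions marked; the open reading frame runs from the start (M) to the stop codon; d is the distance between the stop codon and the last exon/exon junction.] Transcripts containing a stop codon more than 55 nt upstream of the last exon/exon junction are predicted to be targets of the NMD pathway.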
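The 55-nt rule above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular tool. The function name, the argument layout, and the choice to measure the distance from the last base of the stop codon are all assumptions made for the example.

```python
def is_predicted_nmd_target(exon_lengths, stop_codon_end, threshold_nt=55):
    """Apply the 55-nt rule to one spliced transcript.

    exon_lengths   -- exon lengths in nt, in 5'->3' order (hypothetical input format)
    stop_codon_end -- 1-based mRNA position of the last base of the stop codon
    threshold_nt   -- minimum distance to the last junction (55 nt in the slides)
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no exon/exon junction, so the rule cannot apply.
        return False
    # The last exon/exon junction sits right after the final base of the
    # second-to-last exon, i.e. at the summed length of all exons but the last.
    last_junction = sum(exon_lengths[:-1])
    # Distance d from the stop codon to the last junction; it is negative
    # when the stop codon lies in the last exon.
    distance = last_junction - stop_codon_end
    return distance > threshold_nt


# Three exons of 200, 150 and 300 nt: the last junction is at position 350.
print(is_predicted_nmd_target([200, 150, 300], stop_codon_end=250))  # d = 100
print(is_predicted_nmd_target([200, 150, 300], stop_codon_end=500))  # stop in last exon
```

A stop codon in the last exon always gives a negative distance, so such transcripts are never flagged by this sketch.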

Remember: the number of unique mRNA molecules is much larger than the number of genes. A large fraction of the mRNA molecules is degraded by the NMD pathway. NMD provides a means to regulate gene expression at the post-transcriptional level.

Transcriptome analysis. (1) Reconstruction of the expressed transcripts from the (fragmented) sequencing data: without a reference genome (Trinity, Trans-ABySS and Velvet); with a reference genome (Cufflinks, Scripture). (2) Determining the relative abundances of the predicted transcripts (Cufflinks). (3) Differential analysis (Cufflinks): gene expression and alternative splicing.

Without genome I

Without genome II

With a genome (spliced alignment). [Figure: an mRNA, drawn 5' to 3'.]

With a genome

With Genome II

Assignment: transcriptome reconstruction. Mapping of reads to the genome using TopHat. Reconstruction of the transcriptome using Cufflinks. BLAST analysis of the assembly result.
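The first two steps of the assignment could be scripted roughly as follows. This is a sketch under stated assumptions: the index name, read files and output directories are hypothetical placeholders, and only the basic `-p` (threads) and `-o` (output directory) options of TopHat and Cufflinks are used. The real file names are given in the assignment text. The BLAST step is not covered.

```python
import shlex


def tophat_command(bowtie_index, reads, output_dir="tophat_out", threads=1):
    """Build a TopHat call: tophat [options] <bowtie_index> <reads> [<mate reads>]."""
    return ["tophat", "-p", str(threads), "-o", output_dir, bowtie_index, *reads]


def cufflinks_command(alignments, output_dir="cufflinks_out", threads=1):
    """Build a Cufflinks call: cufflinks [options] <aligned reads>."""
    return ["cufflinks", "-p", str(threads), "-o", output_dir, alignments]


# Hypothetical inputs, for illustration only.
map_reads = tophat_command("genome_index", ["reads_1.fq", "reads_2.fq"], threads=4)
# TopHat writes its spliced alignments to accepted_hits.bam in the output directory.
assemble = cufflinks_command("tophat_out/accepted_hits.bam", threads=4)

for command in (map_reads, assemble):
    # Print the commands; pass each list to subprocess.run(command, check=True) to execute.
    print(shlex.join(command))
```

Building the commands as argument lists keeps them easy to inspect before anything is run on the course server.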

Your login: barshap, berryk, cizara, dennisv, dirkv, dunyac, giorgiot, heleenw, hildam, ioannism, jitskel, joelk, kamleshs, leilas, luigif, mushtar, patricial, peterve, roberte, seyeda, taox, tristanj, weic, xiaoxues, yanickh. All accounts have the same password: wvdABcv12

Change password. Log in with ssh <yourlogin>@137.224.100.201 and enter your password. Run passwd to change it: type the new password, then type the new password again. Finish with exit.

Details. Log in with ssh -X <yourlogin>@137.224.100.212, then cd /mnt/geninf15/work/bif_course_2012. The assignments are in assignment.txt.

Estimating expression levels. This would be easy if only full-length transcripts were recovered. However, we have transcript fragments. Simply counting the number of reads mapping to a gene or transcript is not good enough (normalization is needed). The number of fragments that can be produced from a transcript depends not only on its abundance but also on its length.

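Expression levels. FPKM (fragments per kilobase of transcript per million mapped fragments) is analogous to RPKM (reads per kilobase per million mapped reads): FPKM counts one fragment where RPKM counts one read.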
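The normalization behind FPKM can be written out as a short function. This is a minimal sketch of the standard definition, assuming plain fragment counts and the full transcript length. Cufflinks itself estimates abundances with a statistical model, so its values will not match this arithmetic exactly. The function and argument names are chosen for the example.

```python
def fpkm(fragment_count, transcript_length_nt, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_nt <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # Dividing by (length / 1e3) corrects for transcript length and dividing by
    # (total / 1e6) corrects for sequencing depth; together that is a factor 1e9.
    return fragment_count * 1e9 / (transcript_length_nt * total_mapped_fragments)


# Same fragment count, same library size, different transcript lengths.
print(fpkm(500, 2000, 10_000_000))  # 25.0
print(fpkm(500, 1000, 10_000_000))  # 50.0
```

The two calls show why raw counts mislead: both transcripts collect 500 fragments, but the shorter one must be twice as abundant to do so.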

Back to gene level expression (I)

Back to gene level expression (II)

Differential expression analysis. A gene is differentially expressed under two conditions if its expression difference is statistically significant, that is, larger than you would expect based on random natural variation. In order to estimate the variance it is important to have experimental replicates. (Variation between biological replicates is larger than that between technical replicates.)
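Why replicates matter can be illustrated with a simple two-sample statistic. The sketch below uses Welch's t statistic on log2 expression values as a stand-in. This is a swapped-in textbook technique, not the model used by Cuffdiff, and the replicate values are invented for the example.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two groups of replicate expression values."""
    if len(sample_a) < 2 or len(sample_b) < 2:
        raise ValueError("each condition needs at least two replicates")
    # The sample variances estimate the random variation within each condition.
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    if standard_error == 0:
        raise ValueError("no variation between replicates; statistic is undefined")
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error


# Invented log2 expression values for one gene, three replicates per condition.
roots_tight = [5.0, 5.2, 4.8]
buds_tight = [7.0, 7.3, 6.7]
roots_noisy = [3.0, 5.0, 7.0]
buds_noisy = [5.0, 7.0, 9.0]

print(welch_t(roots_tight, buds_tight))  # large |t|: difference exceeds the variation
print(welch_t(roots_noisy, buds_noisy))  # same mean difference, much smaller |t|
```

Both comparisons have the same difference in means. Only the replicate-to-replicate variation decides whether that difference stands out, which is why it cannot be judged from a single sample per condition.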

Expression assignment. Estimate the expression levels of predicted transcripts and genes in Arabidopsis roots and flower buds (Cufflinks). Perform a differential expression analysis of transcript abundances in Arabidopsis roots and flower buds (Cuffdiff).