RNA surveillance and degradation: the Yin Yang of RNA
[Diagram: RNA Pol II (production), polyadenylated RNA, ribosome, destruction]


MODEL: Degradation of hypomodified tRNAiMet
[Diagram: hypomodified tRNAiMet; polyadenylation by Trf4p; degradation by the exosome]
Hypothetical diagram of the exosome: Mtr3p, Rrp41p, Rrp45p, Rrp40p, Rrp46p, Rrp42p, Rrp4p, Rrp43p, Rrp44p, Csl4p, Rrp6p, Trf4p, Mtr4

Workflow
- Knockdown mMtr4
- Library Construction
- PolyA-Seq
- Mapping
- Remove Internal A
- Aggregate
- Normalize
- Connect & Compare
- Collect
- Visualize

Next Gen sequencing: PolyA-Seq
[Diagram: siRNA knockdown; TRAMP complex (Mtr4, Papd5, ZCCHC7); polyadenylated RNA]

Library creation for NGS

Map paired end reads to genome
BWA (Burrows-Wheeler Aligner)
- Algorithm used to map each pair of reads to the genome
- Report each pair of reads as a single nucleotide position within the genome where polyadenylation was detected in an RNA sample
- Average insert size 300; read size ~45
[Diagram: TTTp-5’ primer annealed to the AAAA-3’ tail]

Raw reads vs Mapped reads
Data type / KD type    Raw reads     Mapped reads   Positions
Replicate data
  Mtr4                 15,135,078    10,853,534       651,551
  Ctrl                 16,348,780    11,708,310       652,128
  Rrp6                 15,971,926    12,388,266       705,173
Original data
  Mtr4                 ND            34,204,534     1,124,968
  Ctrl                 ND             7,195,942       582,256
  Rrp6                 ND             8,241,505       597,672
ND: not determined
Normalization of data: reads per million (rpm)
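
The reads-per-million normalization named on this slide can be sketched as follows. This is a minimal illustration, assuming the mapped-read total is used as the denominator; the function name `reads_per_million` and the raw count of 120 reads are hypothetical, while the mapped-read totals are the replicate values listed above.

```python
def reads_per_million(count_at_position, total_mapped_reads):
    """Scale a raw read count to reads per million (rpm)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return count_at_position * 1_000_000 / total_mapped_reads


# Mapped-read totals from the replicate data listed above.
MAPPED = {"Mtr4": 10_853_534, "Ctrl": 11_708_310, "Rrp6": 12_388_266}

# Hypothetical raw count of 120 reads at one pA position in each sample.
rpm = {sample: reads_per_million(120, total) for sample, total in MAPPED.items()}
```

The same raw count gives a slightly different rpm in each sample, which is the point of normalizing before comparing knockdown and control.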

Analysis
Starting with RefSeq database
- Raw read counts converted to reads per million: (reads at position / total reads in sample) x 1,000,000
- Remove all non-coding RNAs
- From each sample collect normalized reads mapping at the 3’ end +/- 50 bases of each protein-coding RefSeq entry
- Dot plot of normalized reads on log scale, X axis = control and Y axis = mMtr4 KD
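
Collecting normalized reads within 50 bases of an annotated 3’ end can be sketched as below. This is an illustration only, assuming the normalized reads are held as a position-to-rpm mapping for one chromosome and strand; the function name, the positions, and the rpm values are all hypothetical.

```python
def sum_reads_near_3prime_end(rpm_by_position, three_prime_end, flank=50):
    """Sum normalized reads at positions within +/- flank bases of a 3' end."""
    lo, hi = three_prime_end - flank, three_prime_end + flank
    return sum(v for pos, v in rpm_by_position.items() if lo <= pos <= hi)


# Hypothetical rpm values keyed by genomic position (one chromosome, one strand).
control = {1000: 2.0, 1040: 1.5, 1200: 9.0}
knockdown = {1000: 2.2, 1051: 4.0, 1200: 8.5}

# Hypothetical annotated 3' end at position 1000.
x = sum_reads_near_3prime_end(control, 1000)    # 3.5 (positions 1000 and 1040)
y = sum_reads_near_3prime_end(knockdown, 1000)  # 2.2 (position 1051 is outside)
```

Each gene then contributes one (x, y) point to the control versus knockdown dot plot.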

mRNA polyadenylation does not change between Mtr4 and control KD R2=

Problems encountered
Sequencing read depth very different in the original data
- 34 million mapped reads in one sample, 8 million in another
Lack of 3 replicates for robust statistical analysis of data
Removal of internal A
- Seq reads that map to an oligoadenylate tract in the genome
- Algorithm developed misses many
- Manual removal takes too much time

Remove Internal A
[Diagram: genomic oligo(A) tract (AAAAAAAA) paired with oligo(T) (TTTTTTTTT)]
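
One common way to flag candidate internal-A sites is to inspect the genomic bases just downstream of the mapped position. The sketch below illustrates that idea only and is not the algorithm developed for this project; the 10-base window, the 7-A cutoff, the function name, and the toy sequence are all assumptions, and only plus-strand sites are handled.

```python
def looks_like_internal_a(genome, site, window=10, min_a=7):
    """Flag a plus-strand pA site whose downstream genomic bases are A-rich.

    genome: plus-strand sequence; site: 0-based index of the last base
    before the putative poly(A) tail. window and min_a are illustrative
    values, not settings taken from the presentation.
    """
    downstream = genome[site + 1 : site + 1 + window].upper()
    return downstream.count("A") >= min_a


# Toy sequence: 10 bases, then a genomic run of 10 As, then 16 more bases.
genome = "GATTACAGGC" + "AAAAAAAAAA" + "GTCCGTGCATGCCGTA"

flagged = looks_like_internal_a(genome, 9)     # True: followed by 10 genomic As
retained = looks_like_internal_a(genome, 20)   # False: only 1 A downstream
```

A fixed cutoff like this trades missed sites against wrongly discarded ones, which is consistent with the difficulty described above.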

How to mine the data based on a hypothesis
Hypothesis: PolyA+ RNAs of unknown identity will accumulate upon depletion of mMtr4 vs. the control.
- How can the transcriptome be queried?
- How detailed should a query be? Every pA position, or only those exhibiting greater than x number of raw/normalized reads?
How do we find significant differences with one sample, or possibly two?
How can repetitive elements be accounted for in the data?

Custom annotation to remove bias from existing annotations
Data mapped with Bowtie to mouse genome mm10 build
Mapped data from KD and control compared using Cufflinks to explore gene expression differences using a custom annotation
Custom annotation
- 1000 base pair genes with 500 base pair overlap with next gene
This did not work well
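
The tiling behind this custom annotation (1000 bp pseudo-genes, each overlapping the next by 500 bp) can be sketched as below. This only illustrates the window arithmetic, assuming 0-based half-open coordinates and a truncated final window; `tile_windows` is a hypothetical helper, not the script used for the analysis.

```python
def tile_windows(chrom_length, size=1000, step=500):
    """Return (start, end) windows of `size` bases, each shifted by `step`.

    Coordinates are 0-based, half-open. With size=1000 and step=500 each
    window overlaps the next by 500 bases. The last window is truncated at
    the chromosome end.
    """
    windows = []
    start = 0
    while start < chrom_length:
        end = min(start + size, chrom_length)
        windows.append((start, end))
        if end == chrom_length:
            break
        start += step
    return windows


# Toy 2500-base chromosome.
toy = tile_windows(2500)
# [(0, 1000), (500, 1500), (1000, 2000), (1500, 2500)]
```

Because every base away from the chromosome ends falls in two windows, a read there belongs to two pseudo-genes at once.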

Problems with using custom annotation
The first real problem was that no available computing resource could handle more than 5000 genes of the custom annotation at a time
- One chromosome had 147K genes
There was a problem with assignment when the reads overlapped
- Cuffdiff would randomly assign the reads to only one of the genes
Overlaps were split into two fasta files, but we could not capture differences in the data that we knew exist
- Cuffdiff collects data from the entire 1000 bp gene and compares between 2 samples
- This method leads to false negatives for pA data, where the focus is on one or a few positions as a pA event

What next?
Mapping
- Map raw reads against mm10 assembly with Bowtie2/TopHat
Strand and 3’ end selection
- Select alignments on positive and negative strand
- Select 3’ read of paired reads to define site of polyadenylation
Custom annotation preparation and count
- Run F-Seq to identify the mode of all peaks
- Normalize data, then collect reads at mode (+/- 5-10 nucleotides)
Statistical test with DESeq, an R package
- Negative binomial model
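
The strand and 3’ end selection step above reduces each alignment to a single coordinate. The sketch below shows one way that could look, assuming ungapped alignments described by a 1-based leftmost position, an aligned length, and a strand, and assuming the read is in the same sense as the RNA. The function name and the example alignments are hypothetical.

```python
from collections import Counter


def three_prime_end(leftmost, aligned_length, strand):
    """Return the 1-based genomic coordinate of an alignment's 3' end.

    Assumes an ungapped alignment (no indels, introns, or clipping).
    """
    if strand == "+":
        return leftmost + aligned_length - 1
    if strand == "-":
        return leftmost
    raise ValueError("strand must be '+' or '-'")


# Hypothetical alignments: (leftmost position, aligned length, strand).
alignments = [(100, 45, "+"), (100, 45, "+"), (102, 43, "+"), (100, 45, "-")]

# Count reads per (strand, 3' end) so each strand is tallied separately.
site_counts = Counter((s, three_prime_end(p, n, s)) for p, n, s in alignments)
```

Here the three plus-strand reads all end at position 144, so they count toward the same candidate pA site.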

F-Seq
Uses sequence tags to identify specific sequence features for different library preparations (ChIP-seq, DNase-seq and pA-seq).
Summarizes and displays individual sequence data as an accurate and interpretable signal by generating a continuous tag sequence density estimation.

Generating Peaks with F-Seq
1. Estimate kernel density to estimate pdf
2. Compute threshold
- n_w = nw/L
- x_c
- Repeat step 2 k times
- s SDs above the mean
2.1 Threshold output module is modifiable
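
The two steps above can be illustrated with a small Gaussian kernel density sketch. It is one reading of the slide rather than F-Seq's own implementation: it assumes n_w = nw/L is the average number of tags in a window of width w for n tags over a region of length L, that x_c is a fixed point in that window where a background density is evaluated for k uniform random placements, and that the threshold lies s standard deviations above the mean of those background values. All function names, parameter defaults, and tag positions are hypothetical.

```python
import math
import random
import statistics


def kernel_density(x, positions, bandwidth, n_total):
    """Gaussian kernel density at x, normalized by the total tag count."""
    norm = n_total * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions) / norm


def background_threshold(n_total, region_length, window, bandwidth,
                         s=4.0, k=200, seed=0):
    """Mean + s SDs of the density at x_c over k uniform random placements."""
    n_w = max(1, round(n_total * window / region_length))  # average tags per window
    x_c = window / 2  # fixed point where the background density is evaluated
    rng = random.Random(seed)
    sims = []
    for _ in range(k):
        tags = [rng.uniform(0, window) for _ in range(n_w)]
        sims.append(kernel_density(x_c, tags, bandwidth, n_total))
    return statistics.mean(sims) + s * statistics.pstdev(sims)


# Hypothetical tags: sparse background every 1000 bases plus a cluster near 5000.
tags = [i * 1000 for i in range(100)] + [5000 + (i % 11) - 5 for i in range(50)]
threshold = background_threshold(len(tags), 100_000, window=2000, bandwidth=50)
peak = kernel_density(5000, tags, 50, len(tags))    # inside the cluster
quiet = kernel_density(2500, tags, 50, len(tags))   # between background tags
```

In this toy data the density at the cluster exceeds the simulated threshold while the density between background tags does not, so only the cluster would be called a peak.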

Magnitude of data: one sample, both strands
[Figure panel titles: "51 million bases of Chromosome"; "thousand bases of Chromosome 12"]
Chromosome 12 is 121 million base pairs long

rRNA workflow
Mapping
- Map raw reads against 13kb rDNA with Bowtie2
Strand and 3’ end selection
- Select alignments on positive strand
- Select 3’ read of a pair and 3’ end of a read
Density estimation and visualization
- Density estimation with F-Seq, a peak calling tool

pA reads intersecting 45S pre-rRNA
[Figure labels: 18S, 28S, 5.8S]

[Figure labels: 18S, 28S, 5.8S]

Accumulation of microRNA processed 5’ leader upon depletion of Mtr4
- Comparison of Mtr4 vs. control KD
- Abundant polyA found near 5’ end of annotated Mir322
- Confirmed using molecular technique