Sequence on SOLiD The size-selected, bar-coded libraries are sequenced on the SOLiD 5500. Reads are from single end, 50 bp.
Target Read Counts for miRNA The vast majority of miRNA-seq reads do map successfully to miRNA (~90%) Target read counts will be a function of how well resolved low abundance miRNA need to be resolved Large shift or shifts in abundant miRNAs do not necessitate many reads. We aimed for about 10 million reads per condition, which was achievable for 9 samples on one multiplexed lane
Only a few miRNAs tend to dominate the population # miRNAs Cumulative % of Reads 80% of reads from 30 miRNA; 90% from 54 ~340 miRNAs were described by populations of 1000+ reads across conditions in our experiment
Treating Raw miRNA Data Due to the short length of inserts, trimming of adapter sequence is required. Due to a high level of redundancy, it’s often advisable to collapse identical reads to speed alignment. Unique sequences align only once rather than aligning the same sequence thousands of times. Retain count information for quantitation following alignment.
Aligning miRNA reads Alignment is often performed in two stages 1 st against a prepared reference containing ONLY known miRNA sequences for the appropriate organism (miRBase or elsewhere). 2 nd against the genome for identification of novel small RNA. Any typical aligner works well for this purpose Novocraft, Bowtie(1), BWA, etc Other packages exist that ease this process and identification of novel miRNA such as miRanalyzer.
miRanalyzer Available via command-line or by a webapp (common organisms). http://bioinfo5.ugr.es/miRanalyzer/miRanalyzer.php
Novel miRNA and Quantitation Novel identified sequences need to be evaluated for the possibility of forming hairpin structures miRanalyzer does this already, scoring novel alignment regions for the possibility of forming miRNAs Read count tables are produced for further analysis and comparison Reads per miRNA Novel miRNA are only really comparable between experiments in which the same species are observed and are typically kept separately
Comparison Between Conditions Normal RNAseq tools for identifying differential expression from quantitated data tables is the preferred method. DESeq, edgeR, baySeq, limma, etc DESeq was utilized on count tables produced from miRanalyzer (and is also a part of the webapp package). Triplicates from three experimental conditions were compared pairwise for differential expression of miRNA. p-values for exact test of change between conditions are generated padj values result from Benjamini-Hochberg multiple testing to determine a FDR (cutoff of 0.1 is typically applied here). Output varies depending on tool used.
Additional tasks Target Database/Prediction mining of differentially expressed miRNAs miRbase, miRanda, TarBase (experimental observations), etc Validation of DE of miRNA and targets Enrichment analysis