CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification


1 CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification
Tamar Hashimshony, Florian Wagner, Noa Sher, Itai Yanai. Cell Reports, Volume 2, Issue 3, Pages 666-673 (September 2012). DOI: 10.1016/j.celrep.2012.08.003. Copyright © 2012 The Authors.

2 Cell Reports 2012 2, 666-673. DOI: 10.1016/j.celrep.2012.08.003
Copyright © 2012 The Authors

3 Figure 1 The CEL-Seq Method
(A) Individual cells are added to tubes, each containing a uniquely bar-coded primer for reverse transcription. After second-strand synthesis, the reactions are pooled for in vitro transcription (IVT). The amplified RNA is then fragmented and purified before entering a modified version of the Illumina directional RNA protocol; molecules carrying both Illumina adaptors are selected, and the DNA library is sequenced with paired-end reads. (B) Nucleotide distribution in the sequenced paired-end reads. Each nucleotide position is represented by one column, with the first base on the left. (C) Barcode distribution of one IVT reaction after demultiplexing. The cells from three two-cell stage C. elegans embryos (denoted P1 and AB) and a single one-cell stage embryo (denoted P0) were amplified together in a single multiplexed IVT reaction. (D) Distribution of the reads mapping to the C. elegans genome in the six AB/P1 cells. Error bars indicate the SD. (E) Correlation between biological AB replicates. See also Figure S1C.
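The demultiplexing step behind panel (C) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the cell barcode sits at the start of read 1, that it is 8 nt long, and that only exact matches are assigned. The barcode sequences, sample names, and the `demultiplex` helper are hypothetical.

```python
from collections import Counter

def demultiplex(read_pairs, barcode_to_sample, barcode_len=8):
    """Count read pairs per sample by exact match of the barcode prefix of read 1.

    Assumption: the barcode occupies the first `barcode_len` bases of read 1.
    """
    counts = Counter()
    for read1, _read2 in read_pairs:
        barcode = read1[:barcode_len]
        # Reads whose barcode is not in the table are kept apart, not guessed.
        sample = barcode_to_sample.get(barcode, "unassigned")
        counts[sample] += 1
    return counts

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"AGACTCAG": "AB_1", "TCGATGCA": "P1_1"}
pairs = [
    ("AGACTCAGTTTTTT", "GATTACA"),
    ("TCGATGCATTTTTT", "CCGGAAT"),
    ("AGACTCAGTTTTTT", "TTGACCA"),
    ("NNNNNNNNTTTTTT", "ACGTACG"),
]
sample_counts = demultiplex(pairs, barcodes)
```

Plotting `sample_counts` per barcode would give a distribution of the kind shown in panel (C).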

4 Figure 2 Benchmarking of CEL-Seq on Mouse ES and MEF Cells
(A) CEL-Seq Pearson’s correlation coefficients among the ES and MEF cells (on log10 tpm values), computed as previously described by Islam et al. (2011) on the 1,000 genes most highly expressed in ES cells and the 1,000 genes most highly expressed in MEF cells (1,385 genes in the union). (B) Mean number of genes detected above two thresholds (10 and 100 tpm) across the ES and MEF cell types using CEL-Seq and the PCR-based “STRT” method (Islam et al., 2011). Error bars indicate 95% confidence intervals. (C) Reproducibility according to expression level. For each gene, the coefficient of variation was computed across the log10 tpm values in the ES cells for the STRT and CEL-Seq methods. The genes were then ranked by expression level and grouped into bins of 200, from high to low. For each bin, the mean and SD of the coefficients of variation of the genes are shown. See also Figure S2 for additional analyses.
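The gene selection and correlation in panel (A) can be illustrated with a small sketch. It assumes a pseudocount of 1 before taking log10 (the caption does not say how zeros are handled) and uses a toy top-2 selection in place of the top 1,000. All gene names, values, and helper names are hypothetical.

```python
import math

def top_genes(mean_expr, n):
    """Return the n genes with the highest mean expression."""
    return set(sorted(mean_expr, key=mean_expr.get, reverse=True)[:n])

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log_profile(cell, genes, pseudocount=1.0):
    """log10(tpm + pseudocount) for each gene; the pseudocount is an assumption."""
    return [math.log10(cell.get(g, 0.0) + pseudocount) for g in genes]

# Toy mean expression per cell type (tpm), hypothetical values.
es_mean = {"g1": 500, "g2": 300, "g3": 5, "g4": 1}
mef_mean = {"g1": 400, "g2": 2, "g3": 3, "g4": 250}

# Union of the top genes of each cell type; shared genes are counted once,
# which is why 1,000 + 1,000 genes can yield fewer than 2,000.
genes = sorted(top_genes(es_mean, 2) | top_genes(mef_mean, 2))

cell_a = {"g1": 520, "g2": 310, "g3": 4, "g4": 0}
cell_b = {"g1": 480, "g2": 290, "g3": 6, "g4": 2}
r = pearson(log_profile(cell_a, genes), log_profile(cell_b, genes))
```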

5 Figure 3 Sensitivity and Reproducibility of CEL-Seq
(A) CEL-Seq achieves a linear response over the entire detection range. The plot shows the levels of the 92 ERCC spike-ins (Baker et al., 2005) in six 10 pg replicate samples. Averages for each of the 17 groups of spike-ins with the same nominal concentration are shown as larger circles. For each sample, the spike-ins were normalized such that the average expression of the 12th spike-in group, containing 1,000 molecules, was set to 1,000. The fraction of spike-ins in each group without detected expression is shown at the bottom of the figure. The line indicates an idealized linear relationship. Systematic deviations from the line for some spike-ins with high concentrations are likely due to differences in G/C content. (B) C. elegans transcript counts per sample, based on linear regression of spike-in expression levels (5 pg, n = 5; 10 pg, n = 6; 20 pg, n = 5; 40 pg, n = 3). The calculated number of molecules is directly proportional to the amount of input RNA. Error bars indicate the SD. (C) CEL-Seq sensitivity. For different amounts of input RNA, the bars indicate the percentage of genes detected (at least one read) as a function of their absolute copy number, calculated based on the 1 ng reference sample. Error bars indicate the SD. See Figure S3 for additional analyses.
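One way to turn spike-in reads into molecule estimates, as in panel (B), is a least-squares line through the origin relating known spike-in molecules to observed reads. The sketch below is an assumption about the form of the regression, not the authors' exact procedure; the 10-read cutoff follows the Figure S3B caption, and the function names and numbers are hypothetical.

```python
def reads_per_molecule(spike_molecules, spike_reads, min_reads=10):
    """Least-squares slope through the origin: reads = slope * molecules.

    Spike-ins with fewer than `min_reads` reads are excluded from the fit.
    """
    pairs = [(m, r) for m, r in zip(spike_molecules, spike_reads) if r >= min_reads]
    numerator = sum(m * r for m, r in pairs)
    denominator = sum(m * m for m, _ in pairs)
    return numerator / denominator

def estimate_molecules(gene_reads, slope):
    """Convert per-gene read counts into estimated molecule counts."""
    return {gene: reads / slope for gene, reads in gene_reads.items()}

# Hypothetical spike-in data: nominal molecules added and reads observed.
molecules = [10, 100, 1000, 10000]
reads = [1, 20, 200, 2000]
slope = reads_per_molecule(molecules, reads)

estimates = estimate_molecules({"geneA": 50, "geneB": 400}, slope)
total_molecules = sum(estimates.values())
```

Summing the estimates over all genes gives a per-sample transcript count that can be compared across input amounts.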

6 Figure 4 Dissecting the Early C. elegans Embryo with CEL-Seq
(A) Differential expression analysis in the two-cell stage blastomeres, AB and P1. A t test was performed for the 137 genes with expression >100 tpm and at least a 2-fold change between the means of the triplicates. The 17 genes with p < 0.05 (FDR corrected) are shown, with the mean expression on the right and the standardized expression of the triplicates on the left (mean subtracted and divided by the SD). (B) The blastomeres examined in this study are abstracted in the cell lineage. The number of new transcripts is indicated under the vertical arrows. The two-sided arrows indicate the number of differentially expressed genes in the anterior versus the posterior blastomeres, respectively. A list of all differentially expressed and new transcripts is in Tables S2 and S3, respectively. (C) Gene expression levels (log10 tpm; see color scale on right) for the indicated genes; cell lineage is as in (B). (D) Classification of the AB and P1 blastomeres. For the indicated number of replicates used in the training data, the performance of the machine-learning classifier (see Experimental Procedures) in assigning blastomere identities is shown as a function of the number of genes included for prediction. See Figure S4 for additional analyses.
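The gene pre-filter and the FDR correction in panel (A) can be sketched as below. The caption does not name the FDR method, so the Benjamini-Hochberg procedure is used here as an assumption; applying the 100 tpm threshold to the higher of the two group means is also an assumption. The p-values are supplied directly rather than computed, and all names and numbers are hypothetical.

```python
from statistics import mean

def passes_filter(ab, p1, min_tpm=100.0, min_fold=2.0):
    """Keep a gene if its higher group mean exceeds min_tpm and the
    fold change between the group means is at least min_fold."""
    hi, lo = max(mean(ab), mean(p1)), min(mean(ab), mean(p1))
    if hi <= min_tpm:
        return False
    return lo == 0 or hi / lo >= min_fold

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical triplicate expression values (tpm) for two genes.
keep_a = passes_filter([300, 320, 310], [100, 120, 110])
keep_b = passes_filter([150, 160, 155], [100, 110, 105])

# Hypothetical raw p-values from per-gene t tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.05 for q in adjusted]
```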

7 Figure S1 The CEL-Seq Method, Related to Figure 1
(A) Expression of histone genes. Core histone mRNAs are not thought to be poly-adenylated, and therefore CEL-Seq should not detect strong expression for those genes. A list of histone genes was obtained from WormBase, version WS229, and the expression level of each gene in one of the AB samples is shown. The two highly expressed core histone genes are his-72 and his-74, two histone H3 variants required for embryonic viability. Interestingly, some histone H3 variants have been suggested to be specifically poly-adenylated (Keall et al., 2007). Excluding these two outliers, the average expression of core histones is 8.6 tpm, as opposed to 433 tpm for the (poly-adenylated) linker histone mRNAs. This supports the specificity of CEL-Seq for poly-adenylated transcripts. Detection of spurious core histone expression is consistent with the observation that a limited degree of poly-adenylation can be detected for most C. elegans core histones (Mangone et al., 2010). (B) Transcript coverage in relation to distance from the 3′ end, showing CEL-Seq’s 3′-end specificity. For each C. elegans gene, the longest annotated transcript variant (i.e., mRNA) was selected. Transcripts longer than 2.5 kb are not shown. Transcripts were binned according to their length and aligned at their 3′ ends. Average coverage profiles for each bin are shown. This plot is based on data from one of the P1 cells but is representative of all six AB and P1 samples, and the results are comparable when the shortest transcript variants are chosen instead. The 3′-end bias of 90% of the reads (Figure 1D) indicates the specificity of the method for mRNA as opposed to genomic DNA. (C) CEL-Seq reproducibility. Pair-wise scatter plots and correlation coefficients between log10 expression levels of the six AB/P1 blastomeres described in Figure 1. (D) A negative control (mock transfer of a blastomere from the EGM medium used in other experiments) was analyzed in the same IVT with eight C. elegans cell samples. The negative control yielded very few reads overall, most of which did not map to the C. elegans genome.

8 Figure S2 Benchmarking of CEL-Seq on Mouse ES and MEF Cells, Related to Figure 2
Application of CEL-Seq to mouse embryonic stem cells (ES) and mouse embryonic fibroblasts (MEF) and comparison with a previously published PCR-based method (STRT) (Islam et al., 2011). Twenty RT reactions, each containing an ES or MEF cell, were performed with 20 pg of total C. elegans RNA added to each to meet the minimal RNA requirement for the IVT step (see also Figure S3 for additional information). Reads mapping to the mouse genome were counted, and only samples with at least 1 million reads were analyzed; these comprised nine ES cells and seven MEF cells. The STRT data (Islam et al., 2011), comprising 48 ES cells and 44 MEF cells, were retrieved, and the seven cells originally described as being of low quality were also excluded here. (A) Distribution of expression levels of each single-cell transcriptome across methods (STRT, left; CEL-Seq, right) and cell types (ES, top; MEF, bottom). CEL-Seq shows more reproducible distributions of expression. (B) Analogous to Figure 2A, showing correlation coefficients on log10 values for ES and MEF cells processed with the STRT method. Correlations are computed as previously described (Islam et al., 2011) on the 1000 genes most highly expressed in ES cells and the 1000 genes most highly expressed in MEF cells (1398 genes). (C) Analogous to Figure 2C, indicating the reproducibility according to expression level for MEF cells. For each gene, the coefficient of variation was computed for each method (STRT, CEL-Seq). The genes were then ranked by expression level and grouped into bins of 200, from high to low. For each bin, the mean and standard deviation of the coefficients of variation of the genes are shown. CEL-Seq shows less variation among MEF cells than STRT. (D) Principal components analysis. The same genes used in part (C) for STRT and those genes analyzed in Figure 2A for CEL-Seq were used to compute the PCA on the log10 tpm values. The cells are more clearly separated using CEL-Seq in both the first and second principal components. (E) CEL-Seq exhibits lower technical variation than STRT, based on spike-in expression data. Spike-ins from the five samples of 5 pg C. elegans reference RNA were used to quantify technical variation. To this end, spike-in expression in all samples was standardized by setting the average of the spike-in group with the concentration closest to 1000 molecules (1130 molecules for the CEL-Seq data) to a fixed value. Then, for each spike-in, the sample standard deviation across the replicates was calculated, and the mean of the standard deviations of all spike-ins with the same nominal concentration is shown. Error bars indicate standard deviations. Note that the standard deviation is shown on a log scale, i.e., at 1, 10, and 100 spike-in molecules, CEL-Seq exhibits about four times less variation than STRT. At 1000 molecules, the difference constitutes an entire order of magnitude. The STRT values are taken from Figure 4D of the STRT publication (Islam et al., 2011). Note also that the STRT spike-in data are derived from experiments on cells, while the CEL-Seq data were obtained by analyzing single-cell amounts of diluted reference RNA. However, standardizing expression based on one of the spike-in groups ensures that differences in the efficiency with which diluted RNA is analyzed do not skew the results.
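The technical-variation measure in panel (E) can be sketched as: rescale each replicate so that a reference spike-in group averages to a fixed value, take the sample standard deviation of each spike-in across replicates, then average those standard deviations within each nominal concentration. The target of 1000 is an assumption borrowed from the Figure 3A caption; the spike-in names, concentrations, and helper functions are hypothetical.

```python
from statistics import mean, stdev

def standardize(sample, groups, ref_conc, target=1000.0):
    """Rescale one replicate so the reference group averages to `target`.

    sample: spike-in id -> expression; groups: spike-in id -> nominal molecules.
    """
    ref_mean = mean([v for s, v in sample.items() if groups[s] == ref_conc])
    scale = target / ref_mean
    return {s: v * scale for s, v in sample.items()}

def group_sd(samples, groups, ref_conc):
    """Mean across-replicate SD of spike-ins, per nominal concentration."""
    scaled = [standardize(s, groups, ref_conc) for s in samples]
    per_spike = {s: stdev([rep[s] for rep in scaled]) for s in groups}
    result = {}
    for conc in sorted(set(groups.values())):
        result[conc] = mean([per_spike[s] for s in groups if groups[s] == conc])
    return result

# Hypothetical spike-ins: one at 100 molecules, two at 1000 molecules.
groups = {"s1": 100, "s2": 1000, "s3": 1000}
replicates = [
    {"s1": 10, "s2": 90, "s3": 110},
    {"s1": 30, "s2": 190, "s3": 210},
]
sd_by_conc = group_sd(replicates, groups, ref_conc=1000)
```

Because every replicate is rescaled to the same reference group first, differences in overall capture efficiency between replicates do not inflate the resulting standard deviations.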

9 Figure S3 Sensitivity and Reproducibility of CEL-Seq, Related to Figure 3
(A) The number of C. elegans reads obtained depends on the amount of C. elegans RNA used in the RT reaction and is not affected by the addition of carrier RNA. In the 5 pg and 10 pg samples there is no effect of adding carrier RNA, while the difference in the 20 pg samples is likely an outlier effect due to the small sample size. Carrier RNA would be expected to show the strongest effect in the low-end input RNA samples (5 pg and 10 pg). (B) CEL-Seq achieves a linear response over the entire spike-in detection range. Spike-in expression levels in a sample containing 10 pg C. elegans RNA are shown. The dashed line indicates the expression level corresponding to 10 reads. Spike-ins below this threshold were not used in the linear regression, since low read counts lead to increased technical noise. (C) Reproducibility at different absolute expression levels, quantified as the median coefficient of variation for all genes in a bin. Error bars indicate inter-quartile ranges. (D) Performance of the method at different sequencing depths. Total RNA from C. elegans embryos of mixed stages was prepared, and CEL-Seq was performed on 12 samples of 20 pg aliquots. Collectively, the 12 samples identified 13,529 genes, which we took to represent the pooled sample. For different ranges of the pooled expression, we computed the fraction of genes with expression within 20% of the pooled log10 expression in each of the 12 samples. The mean and standard deviation across the 12 samples are shown for the different pooled expression levels. For example, for samples with 5 million (M) reads, RPM values greater than 100 are 91% likely to be within 20% of the pooled values. We repeated the analysis for different numbers of reads by jack-knifing from the original data and found that the sensitivity remains roughly constant beyond 1 million reads.
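The agreement measure in panel (D) can be sketched as the fraction of genes, within a range of pooled expression, whose per-sample log10 value lies within 20% of the pooled log10 value. Interpreting "within 20%" as a relative difference on the log10 scale is an assumption, as is counting undetected genes as disagreeing; the gene names, RPM values, and the `fraction_within` helper are hypothetical.

```python
import math

def fraction_within(sample_rpm, pooled_rpm, lo, hi, tol=0.2):
    """Fraction of genes with pooled RPM in [lo, hi) whose sample log10 RPM
    is within `tol` (relative) of the pooled log10 RPM."""
    genes = [g for g, v in pooled_rpm.items() if lo <= v < hi]
    if not genes:
        return float("nan")
    agreeing = 0
    for g in genes:
        value = sample_rpm.get(g, 0.0)
        if value <= 0:
            continue  # undetected in this sample: counted as not agreeing
        pooled_log = math.log10(pooled_rpm[g])
        if abs(math.log10(value) - pooled_log) <= tol * abs(pooled_log):
            agreeing += 1
    return agreeing / len(genes)

# Hypothetical pooled and single-sample expression values (RPM).
pooled = {"a": 1000, "b": 100, "c": 200, "d": 5}
sample = {"a": 900, "b": 10, "c": 250}
frac_high = fraction_within(sample, pooled, lo=100, hi=float("inf"))
```

Repeating this on randomly subsampled reads at several depths would give curves of the kind summarized in the caption.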

10 Figure S4 Dissecting the Early C. elegans Embryo with CEL-Seq, Related to Figure 4
(A) Validation of CEL-Seq results with two additional high-throughput methods. (Top) Comparison with microarrays. We used Agilent gene expression microarrays to assay gene expression levels in eight pooled cells for each of four blastomeres (AB, P1, EMS, and P2) using a previously described IVT approach (Yanai and Hunter, 2009). For the blastomeres of the 2-cell stage, we examined the differences in expression for those genes determined to be differentially expressed in Figure 4A. In almost all cases, the difference in expression between AB and P1 was in the same direction (left). We repeated this analysis for EMS and P2 with similar results (right). (Bottom) Comparison with standard RNA-Seq. Four pooled AB, P1, EMS, and P2 blastomeres were subjected to two rounds of IVT and standard Illumina RNA-Seq, with similar results (left and right, same format as top). (B) Venn diagram indicating the intersection of genes newly expressed in both EMS and C, with transcription factors indicated in red. (C) Classification of the EMS/P2 and C/P3 blastomere pairs in the same format as Figure 4D. For the indicated number of replicates used in the training data, the performance of the machine-learning classifier (see Experimental Procedures) in assigning blastomere identities is shown as a function of the number of genes included for prediction.

