Received 25 May 2016; accepted 27 June 2016; published 30 June 2016
In the last decade, three next-generation sequencing (NGS) technologies (the Roche/454 GS FLX, llumina/ Solexa Genome Analyzer and Applied Biosystems SOLiD system) have been released revolutionizing the field of high throughput data collection   . RNA-seq, one major application of these technologies, opens new horizons for large scale gene expression analysis in providing global transcription profiles through the sequencing of millions of nucleic acid fragments  . Illumina sequencing technology that currently generates hundreds millions of reads of 40 - 100 bases per flow cell lane is particularly well adapted to investigate small RNA transcriptomes.
MicroRNAs (miRNAs) are 18 - 24 bases non-coding RNAs that act as key regulators of gene expression in eukaryotes  . miRNAs mediate post-transcriptional regulation of target messenger RNAs (mRNAs) by impairing their translation and/or stability. The latest version (release 21) of the miRBase database  , a central repository for miRNAs, lists sets of 256, 495, 1193 and 1881 miRNA genes (miR genes) in fly, rat, mouse and human genomes, respectively. The relatively limited number of miRNAs, allied to the high sequencing capacity of Illumina technology, allows multiplex sequencing to be performed thus increasing the number of individual miRNA profiles. This is a great advantage given the fact that profiling of individual miRNA transcriptomes is now a major issue to recover complete genetic and epigenetic information. Furthermore, multiplex sequencing gives access to any miRNA transcriptome at a relatively low cost.
miRNA expression profiling with Illumina technology requires cDNA library construction consisting of four steps: ligation of a DNA oligonucleotide (3’ adaptor) to the 3’ end of RNAs, ligation of a RNA or chimerical DNA/RNA oligonucleotide (5’ adaptor) to the 5’ end of RNAs, reverse transcription (RT) of the resulting molecules and double-strand cDNA production by polymerase chain reactions (PCR). cDNA library tagging required for sample identification from multiplex sequencing data can virtually be done at any of the four steps by barcoding either adaptors or primers.
Three challenges are faced when comparing miRNA expression profiles from multiplexed cDNA libraries. First, we must ensure that miRNA expression changes are not artefactual, i.e. not due to differences in ligation efficiency of barcoded adaptors with some miRNAs in “by-ligation” multiplexing strategies, or differences in amplification efficiency of the barcoded primers with some single-strand cDNAs in “by-amplification” multiplexing strategies. Second, we would like to identify miRNAs of low expression as the level of miRNA expression in profiles is highly dependent on the sequences of the oligonucleotides used to build cDNA libraries. As a consequence the level of miRNA expression in profiles does not reflect the real abundance of miRNAs in samples  . Third, we aim to detect discrete changes of miRNA expression levels. Indeed, a moderate change in miRNA expression level may be biologically relevant in triggering significant cellular responses  .
The first challenge or objective is now accomplished using by-amplification multiplexing strategies, however the two others have to be considered when analysing data. Here we describe data obtained from rat brain and liver addressing these issues for improvement of downstream analyses.
2. Material and Methods
2.1. Collection of Samples
Wistar rats (virgin females and males) were obtained from Janvier (CER France) and housed in cages in standard conditions of temperature (20˚C - 22˚C) and hygrometry (40%), in a 12 h light/dark cycle. Animals had free access to water and standard food (280 kcal/100g; formula 113; 18%, 23% and 59% of the energy content derived from lipids, proteins and carbohydrates, respectively, from Safe (Augy, France)) for 4 weeks before breeding. From day 1 of pregnancy and until weaning, females were shifted to an unbalanced food (440 kcal/ 100g; formula 235; 46%, 16% and 38% of the energy content derived from lipids, proteins and carbohydrates,
Figure 1. Schematic overview of strategies and protocols used for cDNA library tagging. To construct RNA-derived cDNA libraries, adaptors are successively ligated to RNA 3’ and 5’ ends, ligated molecules are reverse transcribed and cDNA single strands are eventually amplified by polymerase chain reaction (PCR). (a) “Ligation-based” strategies incorporate barcodes (circles) through adaptors (usually upstream the PA sequence used as the 3’ adaptor) whereas (b) “Amplification- based” strategies incorporate barcodes (circles) through amplification, either downstream the RA3 sequence (TruSeq indexes) or as bulged nucleotides (Bulged Primers).
respectively, from Safe (Augy, France)). At birth, litters were sized to ten pups to prevent under- or overnutrition during lactation. At weaning, all animals were fed the standard food. From 4.5 months, half of each progeny was shifted to the unbalanced food for 10 weeks. Animals of 2.5 or 7 months were killed by decapitation under isoflurane deep anesthesia between 9 am and 10 am, following a 15-16-hour fasting. Brain was removed from males of 2.5 months born to dams fed a standard food, cerebellum was dissected and immediately homogenized in a ceramic bead tube (Ozyme) containing 2 ml of QIAzol lysis reagent (Qiagen) using a PreCellys homogenizer (PreCellys 24/Cryolys) for 20 s at 10˚C. Livers from males of 7 months were taken, immediately frozen in liquid nitrogen and stored at −80˚C. Samples were homogenized as described for cerebellum.
2.2. Purification of miRNA-Enriched Small RNA Fractions
Small (<150 - 200 bases) and long (>150 - 200 bases) RNA fractions of cerebellum were separated from QIAzol homogenates using the RNeasy Mini Kit (Qiagen) following manufacturer’s instructions and recovered in RNase-free H2O (Invitrogen). For liver, nuclei acids in the aqueous phase of the Qiazol extraction were precipitated by adding 1 volume of isopropanol. Pellets were resuspended in RNAse-free H2O. Part of small RNAs was heated in 50% (v/v) formamide, for 3 min at 70˚C, and size-fractionated on a 17% Tris-Borate-EDTA (TBE) urea (8 M) polyacrylamide gel. RNAs of 16 - 40 bases were eluted in 0.4 M NaCl overnight at 4˚C and precipitated in presence of glycogen (0.04 μg/μl; Invitrogen) by addition of 3 volumes of EtOH. After centrifugation at 13,200 rpm for 30 min at 4˚C, pellets were rinsed twice with cold 70% EtOH, air dried at room temperature and resuspended in RNase-free H2O.
2.3. Construction of Tagged cDNA Libraries
All DNA and RNA oligonucleotides (Table 1) used in the preparation of cDNA libraries were purchased from Sigma (France). 3’ adaptors were bought phosphorylated at the 5’ end and blocked at the 3’ end by a C7-Amine residue. They were adenylated as described  .
2.3.1. Tagging with Bulged Primers
For each library, 16 - 40 bases RNA (corresponding to the RNA obtained from 300 - 600 ng of total RNA) was ligated to 0.25 pmoles of the adenylated BC8 3’ adaptor. Ligation reaction was performed in a final volume of 10 μl containing 2.4 μl of PEG 8000 (Biolabs), 1X truncated T4 RNA ligase 2 buffer (Biolabs) and 0.4 μl (20 U/μl) of T4 RNA ligase 2 truncated (Biolabs). The ligation reaction was incubated for 75 min at 25˚C. The 5’ RNA adaptor was subsequently ligated to RNA by adding 1 pmole of adaptor to the mixture. After 3 min denaturation at 70˚C, the mixture was kept on ice while adding 1.0 μl of (20 U/μl) T4 RNA ligase 1 (Biolabs) and 10 pmoles of ATP. The ligation reaction was incubated for 60 min at 25˚C. The ligated RNA was converted to single stranded cDNA using 130 units of Superscript II reverse transcriptase (Life technology) and 50 pmoles of RT1-primer at 50˚C for 90 min in a final volume of 30 μl. Following reverse transcription, a PCR mix was directly added to the RT reaction tube that contained 30 μl of 2X Master Mix Phusion enzyme (Biolabs), 60 pmoles of PB primer and 100 pmoles of bulged primer. The resulting 60 μl reaction volume was distributed into 4 × 15 μl to enhance thermic exchange, denaturated for 1 mn at 98˚C and submitted to 16 PCR cycles (20 s at 98˚C, 30 s at 55˚C, 25 s at 72˚C). PCR products were EtOH precipitated in sodium acetate 0.3 M final and resuspended before size-fractionation on 6% TBE polyacrylamide gel to purify the small RNA-derived cDNAs of 90 - 95 base pair (bp) from the adaptor dimer byproducts of 74 bp. cDNAs were eluted as described above and resuspended in H2O.
2.3.2. Tagging with Barcoded Primers (TruSeq)
For each library, 4 μl of 16 - 40 bases RNA was ligated to 0.25 pmoles of the RA3 3’adaptor. Following ligation to the 5’ adaptor (1 pmole) and reverse transcription with the RT2-primer (50 pmoles), PCR amplification was performed in presence of 100 pmoles of TruSeq primer under the same thermal cycling conditions as above. PCR products of 130 - 140 bp were purified from the 112 bp dimer adaptor byproducts as described above.
2.4. Sequencing Data Analysis
A GAIIx sequencing lane yields approximately 30 - 32 millions quality-filtered reads. The TruSeq barcoded reads were obtained demultiplexed from Imagif platform (Centre de Recherche de Gif http://www.imagif.cnrs.fr). The barcoded BP reads were demultiplexed using our script written for this purpose. This script retrieved sequences having the first eleven nucleotides of the 3’ adaptor. The filtered reads were trimmed from the adaptor
Table 1. Names and sequences (5’-3’) of oligonucleotides used in the three methods of small-RNA library preparation. Barcodes of the cDNA amplification primers are in bold. The canonical sequence of the reverse PCR bulging primers is given in which NN symbolizes different dinucleotides.
sequence and their length distribution was recorded. Reads > 16 bases were analyzed using the miRanalyzer (or its latter version sRNAbench)   . Reads were mapped onto the BN/SsNHsd/Mcwi reference rat genome allowing 2 mismatches. This server uses the miRBase (http://www.mirbase.org) and additional databases to classify the mapped reads into miRNAs, miRNA-related and non-miRNA sequences. Sequences that do not fall into these annotation categories but that match on the reference genome are tested for the encoding of putative novel miRNAs. Concerning former categories, reads are assigned to the miRNA genes that have been identified in the rat genome. Detailed counts of known mature miRNA and mature miRNA-star (the less stable strand of the RNA duplex precursor) sequences as well as detailed counts of new variant sequences not yet annotated in the miRBase database are provided. The former designation of mature miRNA/miRNA* has been replaced by the miRNA-3p/miRNA-5p designation according to the hairpin arm precursor miRNA arise from. The variant sequences denoted “Hairpin rno-miR’” differ from known miRNAs at the 5’ end by one base and define 5’_isomiRs. miRNA sequence-based profiles were then constructed and compared. To account for library size differences, we computed the expression of each miRNA the DESeq normalization methods  . The raw data files have been submitted to SRA database (NCBI) under the accession numbers (SRX363287-90, SRX363303-18, SRS1443267 and SRS1443207). Individual group statistics were calculated using Mann and Whitney tests and corrected for multiple testing.
3. Results and Discussion
3.1. First Challenge: No Barcode Bias
As illustrated in Figure 2, the introduction of barcodes is no more a source of technical variability when barcoding through cDNA amplification with tagged primers (TruSeq indexes or bulged primers). As mentioned in the Introduction, significant variability used to be introduced when using a ‘by-ligation’ multiplexing strategy. This is important to recall because by-ligation protocols were still in use in the scientific community  . It is also important to take into account when re-examining early published data.
3.2. Second Challenge: MicroRNAs of Low Expression
In sequencing data, moderately or even weakly expressed miRNAs can actually correspond to abundant miRNAs in the samples because of adaptor bias. Figure 3 highlights the strong 3’ adaptor-dependency of
Figure 2. Lack of barcoding bias. Comparison of miRNA expression levels between barcoded libraries. (a) Scatter plot comparing the expression of miRNAs between pairs of profiles resulting from the use of TruSeq indexes in cDNA library construction. Read counts in profiles I 1 - 4 were compared to reads counts in profile I1. All read counts align along the diagonal showing a lack of technical bias. (b) Scatter plot comparing the expression of miRNAs between pairs of profiles resulting from the use of bulged primers in cDNA library construction. Read counts in profiles BP 1 - 8 were compared to reads counts in profile BP1. All read counts align along the diagonal showing a lack of technical bias in that case too.
Figure 3. Adaptor bias. miRNA expression profiles of rat cerebellum were constructed either using BC8 (BP protocol) or RA3 (TruSeq protocol) 3’ adaptors. (a) Impact of 3’ adaptor sequence on the top 40 expressed miRNAs in BP profiles. For each miRNA, the mean BP and TruSeq expression is shown. miRNA expression are given in reads per million (RPM) following DESeq normalisation. Y-axis is drawn using a log2 scale. (b) miRNA expression in the mean TruSeq profile are plotted against miRNA expression in the mean BP profile. miRNA expression differ by a factor of 1 - 3 (Red areas) for 47 miRNAs, 3 - 10 (blue areas) for 52 miRNAs, 10 - 50 (green areas) for 23 miRNAs, and >50 for 9 miRNAs. X- and Y-axes are drawn using log10 scales.
miRNA expression profiles. Studies devoted to RNA ligase activities have shown that, for a given adaptor, RNA ligases favour some miRNA species over others   . This selective capture impairs measurement of the true (absolute) expression level of miRNAs and less biased library preparation remains an important issue as testified by the recent publications in the field of small-RNA deep sequencing  -  . Our study markedly illustrates how much miRNA expression profiles are dependent of the adaptor sequence per se. From the same small RNA sample, we have built eight multiplexed BP libraries and four TruSeq multiplex cDNA libraries. These two protocols used the same 5’ adaptor but two 3’ adaptors of unrelated sequences (BC8 or RA3). Following the construction of mean profiles specific of the BC8 or RA3 oligonucleotides, miRNA expression were established in reads per million (RPM). Both adaptors approximately capture the same number of miRNAs with a common list of 268 miRNAs, “the 268-set”. However, these miRNAs displayed important variation that can be two orders of magnitude even for miRNAs of high expression ranks (Figure 3(a)). For instance we obtained in BP and TruSeq mean profiles respectively, 760,000 versus 8000 for let-7d-5p, 20,000 versus 204,000 for miR-26a-5p, 1500 versus 78,000 for miR-30a-5p. In Figure 3(b), we plot for each miRNA its expression value in the mean BP profile versus the RA3 one. Four categories of miRNAs can be detected according to expression fold changes: 1 to 3, 3 to 10, 10 to 50, and beyond. Our data confirmed the strong sequence and/or structural-fold preference of RNA ligases ultimately leading to 3’- or 5’-adaptor-dependent miRNA profiles  . To solve this problem and allow a more accurate measurement of the absolute expression level of miRNA, a strategy had been proposed using pools of adaptors degenerated at their first positions     . However, this cannot be sufficient to achieve the “true” miRNome characterization as we reported that profiles recovered with a family of barcoded adaptors displayed variation far below the variation observed between adaptors differing over their entire length for a large number of miRNAs (personal data).
So it is important to estimate the abundance of a miRNA using alternative quantification techniques such as in situ hybridization or calibration curve in real-time RT-qPCRs.
3.3. Third Challenge: Quantification of Moderate Expression Changes
miRNA repertoires and miRNA expression profiles are specific of tissues and/or physiological/pathophysio- logical status. The range of miRNA expression changes is expected to greatly differ, depending on their biological status: small miRNA expression differences (<3) often characterize a tissue under different physiological conditions while miRNA expression differences of several orders of magnitude are observed between different tissues, or healthy tissues versus neoplastic tumors. In former cases, estimates of expression change will be highly dependent on the accuracy of expression level between sample groups. As shown in  , increasing the number of biological replicates is powerful to increase accuracy of expression level of highly or moderately expressed genes. For lowly expressed genes, adding replicates and sequencing depth both participate to increase accuracy of expression levels. From our experience on miRNA populations of rodent tissues and body fluids, we recommend to sequence 5 - 7 biological replicates at a sequencing depth of at least 1 - 2 millions miRNA-related reads  -  . As an example we provide here comparison of miRNA expression profiles of liver of 7 months old rat fed standard diet (7 biological replicates) versus high-fat diet (6 biological replicates). The 13 libraries were sequenced at a depth of 3.1 ± 0.2 millions reads. More than six hundreds miRNAs were detected giving a total of 181 miRNA gene families. Eleven miRNAs were identified as differentially expressed (padj-values < 0.1). Fold-changes of expression ranged from 2 to 4. Reducing the sample size to 3 - 4 replicates markedly deceased the number of identified differentially expressed miRNAs (Figure 4). In one case, only 4 differentially expressed miRNAs could be identified, in another one, only 1 differentially expressed miRNAs could be identified while in two other cases, no differentially expressed miRNA was identified. In addition, we would like to draw attention to the fact that very different statistical tools are used for the characterization of false-discovery- rates (FDR)  -  . Figure 5 illustrates the fact that different FDR corrections give very different lists of putatively differentially expressed miRNAs.
In summary, this work shows 1) the dependency of miRNA expression profiles from adaptors. This questions the statement of abundance widely used in the literature from sequence data. This may in turn influence our vision of relationships between miRNA expression and function; 2) the impact of sample size on comparative expression analyses. This should be strengthen because profiling individual miRNA expression of complex tissue is becoming a major breakthrough notably in brain specialized structures or substructures which express a high number of ubiquitous and specific miRNAs  -  .
This work has benefited from the facilities and expertise of the high throughput sequencing platform of IMAGIF
Figure 4. Impact of replicate numbers. miRNA expression profiles of liver of rats fed a control (c) or high-fat (HF) diet were constructed using the TruSeq protocol. Samples have been compared with the DESeq procedure, using either the whole sets of replicates or different replicate subsets. padj-values < 0.10 were retained as significant. (a) Nine miRNAs were identified as differentially expressed when using whole sets of replicates. No miRNA was identified in three subset combinations and 4 miRNAs, in one combination. Numbers in parenthesis refer to numbers of biological replicates. Letters in parenthesis refer to replicate subsets. The identification of differently expressed miRNAs greatly depends on the number of replicates. (b) Fold-change of expression. Despite differences in padj-values, fold-changes of expression were similar in all comparisons for the 9 miRNAs.
Figure 5. Impact of statistical tools. miRNA expression profiles of rat liver were constructed using the TruSeq protocol. Samples have been compared using a DESeq-like procedure and methods described by Holm (padj-values(H))  , Benjamini and Hochberg (padj-values(BH))  or Benjamini and Yekutieli (padj-values(BY))  to characterize false- discovery-rates and whole sets of replicates (7 for sample C and 6 for sample HF). Out of 256 miRNAs identified by more than 10 reads, 73 displayed fold-change of expression with p-values < 0.05. Two miRNAs displayed fold-change of expression with a padj-values (BH) < 0.05, 10 miRNAs, with a padj-values (H) < 0.05. No miRNA displayed fold-change of expression with a padj-values (BY) < 0.05. Depending on the test, 0 - 10 miRNAs therefore appeared putatively differentially expressed between samples C and HF.
(Centre de Recherche de Gif http://www.imagif.cnrs.fr). This work was supported by the Centre National de la Recherche Scientifique, the University Paris-Sud, XI and grant from Danone/FRM 2011.