*These authors contributed equally.
Rosa rugosa is an important ornamental plant belonging to the genus Rosa in the family Rosaceae. It is native to China and is widely distributed throughout the world. Because of its distinctive fragrance, color, cold resistance and drought resistance, it has great potential for garden use. There are many rose varieties, but most have traditional colors such as pink and purple; a few are white, and yellow, bright red, orange and multicolored varieties are lacking. Innovation in rose color has therefore become a main goal of breeders. Analyzing the pigment composition of rose and characterizing the expression of the genes that encode the key enzymes of pigment synthesis are important prerequisites for color-oriented molecular breeding. Anthocyanins largely determine the color of higher plant organs. The structural genes of their biosynthesis pathway (CHS, CHI, F3H, F3’H, DFR, ANS, 3GT, etc.) and the regulatory genes (MYB, mostly R2R3-MYB, bHLH and WD40 classes) have been cloned, sequenced and functionally characterized in many plants, such as petunia, maize and snapdragon, but little research has been done on R. rugosa.
Anthocyanins, the products of the anthocyanin biosynthesis pathway, are the largest group of water-soluble plant flavonoids found in plant organs. Anthocyanins are unstable in plants and exist mainly as glycosides in the vacuole. They play important roles in insect pollination, auxin transport, protection of leaves from ultraviolet radiation, and resistance to diseases and insect pests. In addition, as safe, non-toxic natural food pigments, anthocyanins have antioxidant, anti-cancer and anti-arteriosclerosis functions.
The anthocyanin biosynthesis pathway is one of the secondary metabolic pathways in plants, and it is now relatively well understood. First, naringenin is formed from coumaroyl-CoA and malonyl-CoA through catalysis by CHS and CHI. Dihydroflavonols are then formed by the action of F3H. Next, colored anthocyanidins are formed by the action of DFR and ANS. Finally, glycosylation, methylation, acylation, hydroxylation and other modifications produce a variety of anthocyanins with stable structures. The flavonoid 3-O-glycosyltransferase (3GT) gene is a downstream gene in the anthocyanin synthesis pathway. Its product transfers the glucosyl group of UDP-glucose to the 3-hydroxyl group of the anthocyanidin, producing colored and stable anthocyanins. This glycosylation is also reported to shift the absorption maximum toward the ultraviolet end, thus increasing the blue tone of anthocyanins. Glycosylation can change the hydrophilicity, biochemical activity and subcellular localization of anthocyanins, which benefits their transport and storage in cells and organisms.
Some studies have shown that anthocyanin content decreases significantly in plants lacking 3GT. The 3GT gene belongs to glycosyltransferase (GT) family 1, whose enzymes carry a conserved domain of about 44 amino acids at the C-terminus, known as the plant secondary product glycosyltransferase (PSPG) box. To date, 3GT has been cloned and analyzed in many plants, such as Zea mays, Vitis vinifera, Gentiana triflora and Petunia hybrida, which has laid a foundation for understanding the metabolic regulation of the anthocyanin synthesis pathway.
At present, studies on R. rugosa focus mainly on morphological classification, geographical distribution, essential oil extraction and food quality evaluation. There are few reports on its anthocyanin biosynthesis mechanism, which therefore remains unclear. In this study, based on R. rugosa transcriptome data, we cloned and identified the RrGT1 gene from the petals of R. rugosa ‘Zizhi’ for the first time. We carried out detailed bioinformatics analysis, homology analysis and temporal and spatial expression pattern analysis of the RrGT1 gene in order to provide useful information for subsequent color improvement of R. rugosa.
2. Materials and Methods
2.1. Plant Materials
Three R. rugosa varieties (R. rugosa ‘Zizhi’, R. rugosa ‘Fenzizhi’ and R. rugosa ‘Baizizhi’) cultivated in the rose germplasm nursery of Shandong Agricultural University were used as test materials. We collected petals at the budding stage, initial opening stage, half opening stage, full opening stage and wilting stage in the forenoon on sunny days from 20 April to 10 May 2017. Seven tissues (roots, stems, leaves, petals at the budding stage, sepals, stamens and pistils) of R. rugosa ‘Zizhi’ were collected at the same time. All samples were collected in three replicates, flash-frozen in liquid nitrogen and stored at −80°C.
2.2. Extraction of Total RNA and Synthesis of the First-Strand cDNA
Total RNA was extracted according to the instructions of the EASY spin plant RNA rapid extraction kit (Aidlab Biotech, Beijing, China). RNA integrity was checked by electrophoresis on a 1.0% non-denaturing agarose gel, and RNA purity and concentration were measured with a Nanodrop2000C ultramicro spectrophotometer (Thermo Fisher Scientific, Wilmington, Delaware, USA). Qualified RNA was stored at −80°C. First-strand cDNA was synthesized following the protocol of the 5× All-In-One RT MasterMix reverse transcription kit (ABM Company, Vancouver, Canada), according to the requirements of RT-PCR and qRT-PCR.
2.3. Full-Length CDS Cloning of the RrGT1 Gene
3' RACE-specific primers were designed based on the sequence information provided by the Rosa transcriptome data, and the 3' terminal cDNA sequence of the target gene was amplified by 3' RACE. The known target gene sequence from the transcriptome data and the 3' terminal sequence obtained by RACE were assembled with DNAMAN software to obtain the full-length cDNA sequence of the RrGT1 gene. Based on the assembled sequence, the upstream primer RrGT1-F, containing the start codon of the RrGT1 gene, and the downstream primer RrGT1-R, containing the stop codon, were designed and used for amplification. The expected length of the amplification product was 1161 bp.
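As a quick consistency check on the expected amplicon, the length of a coding sequence that runs from the start codon through the stop codon fixes the length of the encoded protein. The sketch below is illustrative only; the function name is hypothetical and it assumes the 1161 bp product is exactly the CDS, start and stop codons included.

```python
def protein_length_from_cds(cds_length_bp: int) -> int:
    """Return the number of amino acids encoded by a CDS of the given length.

    Assumes the CDS runs from the start codon through the stop codon,
    so the final codon does not contribute an amino acid.
    """
    if cds_length_bp < 6 or cds_length_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3 and at least 6 bp")
    codons = cds_length_bp // 3
    return codons - 1  # drop the stop codon


if __name__ == "__main__":
    # 1161 bp is the expected amplicon length stated in the text.
    print(protein_length_from_cds(1161))
```

For a 1161 bp CDS this gives 387 codons, that is, 386 amino acids plus the stop codon.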
2.4. Bioinformatics Analysis of the RrGT1 Gene
Bioinformatics software and online tools were used to predict the physicochemical properties, structure and function of the protein encoded by the RrGT1 gene, providing a reference for future research on and application of the gene. The basic physical and chemical properties of the protein were analyzed with the ProtParam tool in ExPASy (http://web.expasy.org/protparam/). The CD-Search function of NCBI (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) was used to predict the conserved domains of the protein. The 3D structure of the RrGT1 protein was predicted with the online tool SWISS-MODEL. DNAMAN software and the NCBI BLAST tool were used to analyze the amino acid sequence encoded by the RrGT1 gene. The phylogenetic tree was constructed with MEGA5.0 software.
2.5. qRT-PCR Detection
We analyzed gene expression by qRT-PCR on a Bio-Rad CFX96™ Real-Time PCR instrument (Bio-Rad, Inc., USA). The qRT-PCR mixture (20 μL total volume) contained 10 μL of SYBR® Premix Ex Taq™ (TaKaRa, Inc., Japan), 8.2 μL of ddH2O, 0.4 μL of each primer and 1 μL of cDNA. The PCR program consisted of an initial step of 95°C for 30 s; 40 cycles of 95°C for 5 s and 60°C for 30 s; and a dissociation stage of 95°C for 10 s, 65°C for 5 s and 95°C for 5 s. Each gene was assessed with three biological replicates. Relative expression levels were calculated by the 2^(−ΔΔCt) method.
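The 2^(−ΔΔCt) calculation can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name and the Ct values are hypothetical, replicate Ct values are simply averaged, and the method assumes roughly 100% amplification efficiency for both the target and the reference gene (RrGAPDH serves as the internal control in this study).

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    target_ct: Sequence[float],
    reference_ct: Sequence[float],
    calibrator_target_ct: Sequence[float],
    calibrator_reference_ct: Sequence[float],
) -> float:
    """Relative expression of a target gene by the 2^(-ddCt) method.

    target_ct / reference_ct: replicate Ct values in the sample of interest.
    calibrator_*: replicate Ct values in the calibrator (baseline) sample.
    """
    # dCt normalizes the target gene to the internal control in each sample.
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    # ddCt compares the sample with the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** -delta_delta_ct


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    fold = relative_expression(
        target_ct=[22.1, 21.9, 22.0],
        reference_ct=[18.0, 18.1, 17.9],
        calibrator_target_ct=[24.0, 24.1, 23.9],
        calibrator_reference_ct=[18.0, 18.0, 18.0],
    )
    print(f"Relative expression: {fold:.2f}")
```

A ΔΔCt of −2 corresponds to a fourfold higher expression than in the calibrator, and the calibrator itself always has a relative expression of 1.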
3.1. Cloning of RrGT1 and Sequence Analysis
We obtained a 277 bp 3' terminal sequence of the RrGT1 gene by nested PCR (Figure 1(a)). The full-length sequence (Figure 1(b)) of the RrGT1 gene was cloned with the full-length primers (Table 1) and confirmed by sequencing. The RrGT1 gene has a complete open reading frame from the start codon ATG to the stop codon TAA, encodes a protein of 386 amino acids and has a poly(A) tail. Sequence alignment against the NCBI database showed that the RrGT1 protein is a member of the GTB superfamily and has a typical plant secondary product glycosyltransferase (PSPG) conserved domain of 44 amino acid residues at the C-terminus. According to online software prediction, the molecular formula of the RrGT1 protein is C1879H2964N494O556S14, the relative molecular mass is 41,820.02 Da, and the theoretical isoelectric point (pI) is 5.03.
Figure 1. The results of 3' RACE amplification and full-length CDS amplification of the RrGT1 gene. (a) 3' RACE amplification product of the RrGT1 gene; (b) Full-length CDS amplification product of the RrGT1 gene.
Table 1. Primers used in the present study.
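The reported molecular formula and relative molecular mass can be cross-checked against each other. The sketch below is not how ProtParam computes its values; it simply sums standard average atomic weights (an assumption introduced here, rounded to a few decimals) over the element counts in the formula. The function names are hypothetical.

```python
import re

# Standard average atomic weights (rounded); an assumption of this sketch.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(count or 1)
    return counts


def molecular_mass(formula: str) -> float:
    """Sum average atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    # Formula as reported for the RrGT1 protein.
    mass = molecular_mass("C1879H2964N494O556S14")
    print(f"Mass from formula: {mass:.2f} Da")
```

With these atomic weights, the formula gives a mass within a fraction of a dalton of the reported 41,820.02 Da; the small residual depends on the precision of the atomic weights used.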
3.2. Protein 3D Model Construction
To construct the 3D model of the RrGT1 protein, a template search with BLAST and HHBlits was performed against the SWISS-MODEL template library. RrGT1 showed the highest sequence identity (47.01%) with the UDP-glucose:anthocyanidin 3-O-glucosyltransferase protein model in the database, so the RrGT1 3D model (Figure 2) was built on that template.
3.3. Homology Analysis
Figure 2. The protein modelling results and quality estimation of RrGT1. (a) Visualizable 3D model of RrGT1; (b) Local quality estimation of the protein; (c) As a statistical value, the Z-score can be used to represent the degree of matching between the template protein and the protein to be tested. QMEAN: A comprehensive scoring function for model quality assessment. All Atom: Normalized all-atom potential energy of the residue calculated from the short-range statistical potentials. CBeta: Normalized cbeta potential energy of the residue calculated from the short-range statistical potentials. Solvation: Normalized solvation potential energy of the residue calculated from the short-range statistical potentials. Torsion: Torsion energy of the residue exposed relative solvent accessibility, calculated by dividing the maximally accessible surface area (ASA) of a residue by the observed value; (d) Comparison with a non-redundant set of PDB (Protein Data Bank) structures; (e) Alignment between the model and the template.
Multiple amino acid sequence alignment across species (Figure 3) showed that the RrGT1 protein is highly species-specific in the N-terminal region and contains the conserved PSPG domain in the C-terminal region. Using MEGA5.0 software, a phylogenetic tree (Figure 4) was constructed from 21 plant amino acid sequences, including R. rugosa RrGT1. The results showed that the RrGT1 gene was most closely related to the EsUF3GT gene, with a homology of 59%, which indicates that GT genes are highly specific among different species.
Figure 3. Amino acid sequence homology analysis of the RrGT1 gene and related genes from other species. Alignments were performed using DNAMAN. The related genes of 21 species, including the RrGT1 gene, were selected for comparative analysis. The blue box shows the PSPG domains. Black triangles (from left to right) indicate amino acid positions 22, 23 and 44 in the PSPG box.
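Percentages such as those quoted for the template and for EsUF3GT depend on how matches are counted over an alignment. As a minimal sketch, and not the scoring implemented in DNAMAN or SWISS-MODEL (which is not described here), pairwise percent identity can be computed over the gap-free columns of two pre-aligned sequences. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("No gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    print(percent_identity("MKT-AL", "MRTQAL"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different tools are not always directly comparable.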
3.4. Temporal and Spatial Expression Patterns of the RrGT1 Gene
The expression level of the RrGT1 gene was assessed at five flowering stages and differed significantly among them. In R. rugosa ‘Zizhi’ (Figure 5(a)), expression was highest at the full opening stage and lowest at the budding stage. In R. rugosa ‘Fenzizhi’ (Figure 5(b)), expression was highest at the budding stage and relatively low at the other four stages. In R. rugosa ‘Baizizhi’ (Figure 5(c)), expression was also highest at the full opening stage but lowest at the half opening stage. The expression patterns of the RrGT1 gene in R. rugosa ‘Zizhi’ and R. rugosa ‘Baizizhi’ showed the same trend.
Figure 4. Phylogenetic tree of RrGT1 and GT members from other plant species. The tree was constructed by the neighbor-joining method using MEGA 5.0 software. Branch numbers represent bootstrap values as percentages of 1000 sampling replicates, and the scale indicates branch lengths.
Figure 5. The temporal and spatial expression patterns of RrGT1. (a) The relative expressions of the RrGT1 gene in five flowering stages of R. rugosa ‘Zizhi’; (b) R. rugosa ‘Fenzizhi’; (c) R. rugosa ‘Baizizhi’; (d) Comparison of relative expressions of RrGT1 in five flowering stages of three varieties above; (e) The relative expression of RrGT1 in seven different tissues of R. rugosa ‘Zizhi’. RrGAPDH was used as the internal control. Error bars represent the SDs of triplicate reactions. The experiment was repeated three times with similar results.
The expression levels of the RrGT1 gene in the three varieties were then compared at the five flowering stages (Figure 5(d)). At the budding stage, the order was R. rugosa ‘Fenzizhi’ > R. rugosa ‘Baizizhi’ > R. rugosa ‘Zizhi’, whereas at the initial opening, half opening, full opening and wilting stages, the order was consistent across stages: R. rugosa ‘Zizhi’ > R. rugosa ‘Baizizhi’ > R. rugosa ‘Fenzizhi’.
The expression level of the RrGT1 gene was also assessed in seven tissue types of R. rugosa ‘Zizhi’ (Figure 5(e)) and differed significantly among them. Expression was relatively high in the leaves, stems and flower buds and relatively low in the other tissues.
Although many genes have been reported to regulate the formation of flower color, there are few reports on downstream structural genes such as GTs. The final formation of anthocyanins depends on glycosylation by GTs, so it is important to elucidate the function of the RrGT1 gene and its influence on color formation in R. rugosa. In this study, we successfully cloned the RrGT1 gene, with a full-length cDNA of 1161 bp encoding 386 amino acids, from the petals of R. rugosa ‘Zizhi’. The protein encoded by the RrGT1 gene was predicted to have the molecular formula C1879H2964N494O556S14, a relative molecular mass of 41,820.02 Da and a theoretical isoelectric point (pI) of 5.03, and to belong to the GTB superfamily. During construction of the RrGT1 3D model, the RrGT1 protein showed the highest sequence identity (47.01%) with the existing UDP-glucose:anthocyanidin 3-O-glucosyltransferase protein model, which suggests that the protein encoded by the RrGT1 gene is related to the glycosylation of anthocyanins.
The evolutionary analysis of flavonoid GTs by Sawada et al. showed that GTs catalyzing the glycosylation of flavonoids at different positions (3-O, 5-O, 7-O) fall into distinct evolutionary branches (F3GlyT, F5GlyT, F7GlyT) regardless of species. This suggests that the regiospecificity of flavonoid GTs toward glycosyl acceptors (catalytic site specificity) arose before species differentiation, whereas the ability to use particular UDP-sugars (UDP-glucose, UDP-rhamnose, UDP-galactose) was acquired in the course of evolution. Phylogenetic tree analysis showed that RrGT1 clustered with EsUF3GT, which is involved in the 3-O-glycosylation of flavonoids, suggesting that RrGT1 may participate in glycosylation at the 3-O position of flavonoids.
The alignment of amino acid sequences between RrGT1 and glycosyltransferases from 20 other species indicated that RrGT1 possesses the PSPG motif common to the glycosyltransferase superfamily (Figure 3). Previous studies have shown that the conserved PSPG region is related to substrate recognition and the catalytic activity of the enzymes. If the 44 amino acids of the PSPG domain are numbered, the amino acids at positions 22, 23 and 44 play an important role in the selection of sugar donors. Tryptophan (Trp, W) at position 22 correctly positions UDP-glucose, while arginine (Arg, R) at this position correctly positions UDP-glucuronic acid; serine (Ser, S) at position 23 is highly conserved in UDP-glucuronosyltransferases; and glutamine (Gln, Q) and histidine (His, H) at position 44 are strongly conserved in glucosyltransferases and galactosyltransferases, respectively. In the PSPG domain of RrGT1, the amino acids at positions 22, 23 and 44 are cysteine (Cys, C), asparagine (Asn, N) and histidine (His, H), respectively. We therefore speculate that RrGT1 uses UDP-glucose or UDP-galactose as its main glycosyl donor and has no glucuronosyltransferase activity.
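The residue rules summarized above can be expressed as a small lookup. This is a hedged sketch of the reasoning only, not a validated predictor of donor specificity: the function names are hypothetical, the rules are limited to those stated in the text, and the placeholder PSPG sequence is invented solely to carry the three reported residues (the actual RrGT1 PSPG sequence is not reproduced here).

```python
PSPG_LENGTH = 44


def key_residues(pspg_box: str) -> tuple:
    """Return the residues at positions 22, 23 and 44 (1-based) of a PSPG box."""
    if len(pspg_box) != PSPG_LENGTH:
        raise ValueError(f"PSPG box must be {PSPG_LENGTH} residues long")
    return pspg_box[21], pspg_box[22], pspg_box[43]


def donor_hints(res22: str, res23: str, res44: str) -> list:
    """Map key PSPG residues to the sugar-donor hints described in the text."""
    hints = []
    if res22 == "W":
        hints.append("Trp22: positions UDP-glucose")
    elif res22 == "R":
        hints.append("Arg22: positions UDP-glucuronic acid")
    if res23 == "S":
        hints.append("Ser23: conserved in UDP-glucuronosyltransferases")
    if res44 == "Q":
        hints.append("Gln44: conserved in glucosyltransferases")
    elif res44 == "H":
        hints.append("His44: conserved in galactosyltransferases")
    return hints


if __name__ == "__main__":
    # Placeholder 44-residue box carrying only the reported C22, N23 and H44.
    placeholder_box = "A" * 21 + "CN" + "A" * 20 + "H"
    print(donor_hints(*key_residues(placeholder_box)))
```

For the residues reported for RrGT1 (Cys22, Asn23, His44), neither glucuronic acid rule fires, which matches the inference that the enzyme lacks glucuronosyltransferase activity; only the His44 rule applies.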
We investigated the expression of the RrGT1 gene during flower development and in different tissues. RrGT1 expression followed different trends across the flowering stages in the three R. rugosa varieties, indicating that its expression is developmentally regulated during anthocyanin biosynthesis. Studies have shown that anthocyanin accumulation in red-skinned sand pear, strawberry and litchi is positively correlated with UF3GT activity. Boss et al. also detected UF3GT expression in the skins of red grapes, which accumulate anthocyanins, but not in other red grape tissues or in white grapes, which do not. Gong et al. showed that some structural genes of the anthocyanin metabolic pathway in Perilla frutescens were expressed only in the leaves of red varieties, with no or very low expression in the leaves of green varieties. Regarding tissue-specific expression in R. rugosa ‘Zizhi’, in addition to the high expression in flowers, it is worth noting that the stems of R. rugosa ‘Zizhi’ are purple, which is consistent with the high expression of the RrGT1 gene in stems. RrGT1 was also highly expressed in the leaves, so we infer that RrGT1 is also involved in the glycosylation of secondary metabolites in leaves, where it may play an important role.
In conclusion, the cloning and expression analysis of the RrGT1 gene contributes to understanding the molecular mechanisms of anthocyanin synthesis and regulation, and provides important information for the future improvement of R. rugosa flower color.
This project was supported by the Agricultural Seed Project of Shandong Province ( No. 96).