The plant family Fabaceae or Leguminosae is considered the third largest family of flowering plants and is well known for its important ecological function in fixing atmospheric nitrogen. It is comprised of three subfamilies Caesalpinioideae, Mimosoideae and Faboideae. Within the Genisteae tribe of the legume subfamily Faboideae, the genus Lupinus or lupine encompasses more than 280 species of annual herbs and perennial herbaceous and woody shrubs distributed mainly in South and Western North America, the Andes, the Mediterranean regions and Africa  . Because of this ability to establish symbiotic associations with bacteria that can fix atmospheric nitrogen in root nodules, members of the genus Lupinus thrive in nutrient poor soils. The rhizobial requirement of Lupinus has been thought to be somewhat specific, with literature indicating that lupines are mainly nodulated by soil bacteria classified in the genus Bradyrhizobium, although some other rhizobial and endophytic genera nodulating lupines have been identified as Mesorhizobium, Rhizobium, Microvirga, Paenibacillus, Micromonospora, Bosea, Ochrobactrum and Cohnella  -  . In addition, members of the genus Burkholderia (in class Betaproteobacteria) are known as endophytic bacteria in lupine  and considered the major inhabitants of white lupine cluster roots  .
Previously, the 16S rRNA gene sequence was most commonly used in bacterial phylogenetic studies because of its slow mutation rate due to functional importance. The 16S rRNA sequence is genus specific and hence provides genus identification in more than 90% of the cases   . However, there are some drawbacks associated with the use of 16S rRNA gene sequences such as the presence of mosaicism in the 16S rRNA gene due to horizontal gene transfer and recombination events, the presence of multiple copies of the rRNA operon and the low resolution of closely related species  . Multilocus sequence analysis (MLSA), which combines analysis of several conserved housekeeping genes  , provides improved discriminatory power over the use of single locus sequence and thus, it has been increasingly used in phylogenetic analysis of prokaryotes      . For the genus Bradyrhizobium, a database for the taxonomic and phylogenetic identification using MLSA is available online at http://mlsa.cnpso.embrapa.br  .
In this study, we analyzed 17 bacterial strains, of which 13 had been isolated from root nodules of Lupinus spp. from different geographic locations at various times (kindly provided by the National Rhizobium Germplasm Resource Collection, USDA). Additionally, two samples each from root nodules of native Sundial lupine (Lupinus perennis) and domestic soybean (Glycine max) were isolated from plants grown in OH, USA. We applied MLSA to assess the genetic diversity and phylogenetic relationships of these strains using sequence analysis of the 16S rRNA gene and three conserved housekeeping genes (atpD, dnaK and glnII). These housekeeping genes have been widely used in phylogenetic analyses due to their sequence and functional conservation. Additionally, there are a large number of sequences available in the databases. The atpD gene encodes the beta subunit of ATP synthase that produces ATP from ADP in the presence of a proton gradient across the membrane   . The dnaK gene encodes the DnaK protein, which functions as a molecular chaperone responsible for various cellular processes, such as folding of nascent polypeptides, assembly and disassembly of protein complexes, protein degradation and membrane translocation of secreted proteins   . The glnII gene encodes glutamine synthetase II which catalyzes the condensation of glutamate and ammonia to form glutamine     . Our study is distinct because all the reference gene sequences were extracted from the existing complete genomes that are available via the Integrated Microbial Genomes database  . The phylogenetic tree based on the concatenated sequences comparing the 17 strains with 30 reference strains suggests that MLSA provides improved taxonomic relationships of these bacterial strains.
2. Materials and Methods
2.1. Bacterial Strains
A total of 17 strains from at least ten geographic locations were examined in this study (Table 1), including USA, Yugoslavia and Brazil; 13 strains were obtained from the National Rhizobium Germplasm Resource Collection, USDA. Two strains each were isolated from nodules of locally grown (Bowling Green, OH,
Table 1. Bacterial strains.
aSeed company; bPreviously reported as Bradyrhizobium.
USA) Lupinus perennis and Glycine max. The USDA strains had been collected between 1922-73, while the nodules from native lupine and domestic soybean were collected in October 2014 and July 2014, respectively. To isolate bacteria from root nodules, the collected nodules were surface sterilized by the method described by Deng  , with some modifications. Briefly, the nodules were washed with sterile distilled water three times for 30 sec, soaked in 10% Clorox for 30 sec, followed by rinsing with distilled water for 30 sec, then with 70% ethanol for 10 min, after which they were rinsed three times with sterile distilled water. A sterile glass rod was used to crush root nodules from each sample followed by streaking the suspension onto modified arabinose gluconate medium  with a sterile inoculating loop. The cultures were incubated at 30˚C. Initial verification was conducted by colony morphology and Gram stain. All the strains were maintained at 4˚C for temporary storage and in 20% glycerol at −80˚C for long term storage.
2.2. DNA Extraction, PCR and Sequencing
Total genomic DNA of bacterial strains was extracted using ZR Fungal/Bacterial DNA MiniPrep™ kit (Zymo research, CA) following the manufacturer’s recommendations. The 16S rRNA gene and housekeeping genes (atpD, dnaK and glnII) were amplified using primers and PCR conditions       listed in Table 2. The PCR products were purified using the Qiagen MinElute PCR Purification kit (QIAGEN, Germany) and were quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, PA). Sequencing of the PCR amplicons was conducted by DNA Analysis LLC (Cincinnati, OH).
2.3. Phylogenetic Analyses
Multiple nucleotide sequence alignments were generated using MUSCLE version 3.5  . The sequences of three housekeeping genes atpD, dnaK and glnII were concatenated using Sequence Matrix version 1.0  . The phylogenies of each gene and the concatenated sequences were constructed by the maximum-likelihood method using MEGA version 6.06  , with default parameters. Statistical support for tree nodes was determined by bootstrap analysis of 1000 replicates. We used MEGA version 6.06 for identification of conserved, variable and parsimony informative regions of consensus sequences (Table 3).
For phylogenetic analyses, the sequences encoding 16S rRNA and three housekeeping genes of 30 reference strains representing Bradyrhizobium, Burkholderia, Mesorhizobium, Microvirga, Rhizobium and Rhodopseudomonas genera were extracted from the existing complete genomes available via the Integrated Microbial Genomes database of the Joint Genome Institute (http://img.jgi.doe.gov)  and the GenBank database of the NCBI (http://www.ncbi.nlm.nih.gov)  . Sequences of Campylobacter jejuni NCTC 11168 and Helicobacter pylori OK113 were used as outgroups. Accession information of all the sequences used in this study is listed in Table 4.
Table 2. Primers and PCR conditions.
Primers used for: aUSDA 3057a, bL_3D52, cUSDA 3043 and L_3D52, dL_OO, eUSDA 3063, USDA 3715 and USDA 3717.
Table 3. Sequence information. 17 (16S rRNA, dnaK), 15 (glnII) and 14 (atpD) strains were analyzed, together with 30 reference strains.
*Mean number of nucleotides amplified/number of sites analyzed, including gaps.
Table 4. Accession numbers of sequences of the reference strains.
aGene sequences from NCBI GenBank database. bGene sequences from Integrated Microbial Genomes database, United States Department of Energy.
3.1. Phylogeny Based on the 16S rRNA Sequence
PCR amplification of the 16S rRNA gene yielded a single amplicon for each strain ranging in size from 880 bp to 1600 bp. The estimated mean frequencies of T, C, A, and G nucleotides within this region were 19.9%, 23.6%, 24.5%, and 32%, respectively, as shown in Table 3. The consensus 16S rRNA sequence of all 47 strains spanned 1734 positions, out of which 688 were conserved, 918 were variable and 612 were parsimony-informative. A position was considered parsimony-informative if it contained at least two types of nucleotides, and at least two of them occurred with a minimum frequency of two.
The phylogenetic tree constructed with 16S rRNA gene sequences was fairly well resolved and split the strains into two clades: one large group and one relatively smaller group (Groups I and II, Figure 1). Group I, with a bootstrap support of 98%, was further divided into two subgroups that clustered strains primarily belonging to the genera of Bradyrhizobium and Microvirga (clades 1 and 2), or Mesorhizobium and Rhizobium genera (clades 3 and 4) with 96%, 38%, 87%, and 94% bootstrap support, respectively. USDA strains 3040, 3709, 3042 and 3051 together with SB_J (soybean) and SB_5 (soybean) were clustered with Bradyrhizobium, whereas USDA strains 3057a, 3048, 3060, 3715 and 3044 were placed with Microvirga. USDA strains 3717 and 3063 were grouped with Mesorhizobium whereas L_3D52 (lupine) was grouped with Rhizobium. The second smaller group of the 16S rRNA gene tree contained a subgroup with 99% bootstrap support, comprised of five reference strains of Burkholderia, USDA 3504, USDA 3043 and L_OO (lupine).
3.2. Phylogenies Based on Housekeeping Gene Sequences
The three housekeeping genes selected for this study are highly conserved among bacteria of the order Rhizobiales. The mean lengths of the fragments of atpD, dnaK, and glnII genes used in phylogenetic analysis were 1175 bp, 1372 bp, and 957 bp, respectively (Table 3). When sequence conservation at the DNA level was considered, the lowest level of conservation (24%) was observed with glnII, while sequence conservation of atpD and dnaK genes was 41.5% and 39.8%, respectively.
In general, the maximum-likelihood phylogenetic trees of the housekeeping genes (Figures 2-4) were similar to that of the 16S rRNA gene. While most of the clustering pattern was consistent among the single gene trees, there were some exceptions. These exceptions include USDA strains 3044, 3048, 3715 and 3060, which grouped with Microvirga in the 16S rRNA gene tree, but clustered with Bradyrhizobium in all three housekeeping gene trees as shown in Figures 2-4. Furthermore, USDA strain 3504, which was grouped closer to Burkholderia in the 16S rRNA gene tree, was clustered with Rhizobium in the glnII gene tree (Figure 4) but with Bradyrhizobium in the atpD and dnaK gene trees (Figure 2 and Figure 3).
Figure 1. Maximum-likelihood phylogenetic tree showing the relationships of 49 strains based on the 16S rRNA gene sequences. Strains examined in the present study are shown in boldface. Clades I and II correspond to the two main groups identified and Arabic numbers represent the subclades. Bootstrap values ≥ 50% (1000 replicates) are given at the branching points. The scale bar indicates the number of substitutions per site. Campylobacter jejuni NCTC 11168 and Helicobacter pylori OK 113 were used as outgroups. Phylogenetic analyses were conducted using MEGA version 6.06.
Figure 2. Maximum-likelihood phylogenetic trees showing the relationships of 46 strains based on the atpD gene sequences. Strains examined in the present study are shown in boldface. Clades I and II correspond to the two main groups identified and Arabic numbers represent the subclades. Bootstrap values ≥ 50% (1000 replicates) are given at the branching points. The scale bar indicates the number of substitutions per site. Campylobacter jejuni NCTC 11168 and Helicobacter pylori OK 113 were used as outgroups. Phylogenetic analyses were conducted using MEGA version 6.06.
Figure 3. Maximum-likelihood phylogenetic trees showing the relationships of 49 strains based on the dnaK gene sequences. Strains examined in the present study are shown in boldface. Clades I and II correspond to the two main groups identified and Arabic numbers represent the subclades. Bootstrap values ≥ 50% (1000 replicates) are given at the branching points. The scale bar indicates the number of substitutions per site. Campylobacter jejuni NCTC 11168 and Helicobacter pylori OK 113 were used as outgroups. Phylogenetic analyses were conducted using MEGA version 6.06.
Figure 4. Maximum-likelihood phylogenetic trees showing the relationships of 47 strains based on the glnII gene sequences. Strains examined in the present study are shown in boldface. Clades I and II correspond to the two main groups identified and Arabic numbers represent the subclades. Bootstrap values ≥ 50% (1000 replicates) are given at the branching points. The scale bar indicates the number of substitutions per site. Campylobacter jejuni NCTC 11168 and Helicobacter pylori OK 113 were used as outgroups. Phylogenetic analyses were conducted using MEGA version 6.06.
3.3. Phylogeny Based on Concatenated atpD, dnaK and glnII Sequences
When the three housekeeping genes, atpD, dnaK and glnII, were used to generate a concatenated phylogenetic tree, a consensus sequence of 3388 bp was produced; of the 5393 positions analyzed, 35.4% was conserved, 61% variable and 49.5% parsimony-informative. The calculated mean frequencies of T, C, A, and G nucleotides were 16.2%, 32%, 21.3%, and 30.6%, respectively (Table 3).
Two groups were resolved in the concatenated tree; one large group and one relatively smaller group (Figure 5) as found in the 16s rRNA tree. Both groups had a bootstrap support of 84%. The larger group contained four subgroups, each of which included reference strains of Bradyrhizobium, Microvirga, Mesorhizobium, and Rhizobium genera with 87%, 100%, 97%, and 100% bootstrap support, respectively. USDA strains 3040, 3709, 3042, 3044, 3048, 3060, 3504, 3715 and 3051 from Lupinus together with SB_J and SB_5 from Glycine max were placed with Bradyrhizobium, whereas only USDA strain 3057a grouped with Microvirga. USDA strains 3717 and 3063 were placed with Mesorhizobium whereas L_3D52 was included in the subgroup containing Rhizobium. The second smaller group of the concatenated gene tree comprised five reference strains of Burkholderia, USDA 3043 and L_OO (Figure 5).
When the grouping patterns were compared, the concatenated gene tree was more like the trees generated from each of the single housekeeping genes (Figures 2-4) than to the one generated from 16S rRNA gene (Figure 1). The clustering of most strains was consistent with the single gene trees and the concatenated gene tree with few exceptions. For example, the USDA 3504 strain grouped with Burkholderia when only the 16S rRNA gene was used, and with Rhizobium in glnII gene analyses; however, it was classified with Bradyrhizobium both when atpD alone and concatenated sequences were used. Other exceptions include strains USDA 3044, 3048, 3715 and 3060 that were grouped with Microvirga in the 16S rRNA gene tree, but with Rhizobium when glnII was analyzed. These strains clustered with Bradyrhizobium in the atpD, dnaK and concatenated trees.
DNA sequence analysis of evolutionarily stable marker genes is commonly used for the identification and classification of bacterial species  . The past two decades, microbiologists have primarily relied on 16S rRNA gene sequences for bacterial classification because of conservation of its ribosomal operon size, nucleotide sequence and secondary structure within a bacterial species  . However, drawbacks associated with the use of 16S rRNA gene sequences have limited its applicability for bacterial phylogenetic analyses  . Recent studies suggest the possibility of occurrence of horizontal gene transfer and recombination events within the 16S rRNA gene   , including reports of horizontal gene transfers and mosaic-like structures within the 16S rRNA gene in bacterial
Figure 5. Maximum-likelihood phylogenetic tree showing the relationships of 49 strains based on the concatenated sequences of atpD, dnaK and glnII genes. Strains examined in the present study are shown in boldface. Clades I and II correspond to the two main groups identified and Arabic numbers represent the subclades. Bootstrap values ≥ 50% (1000 replicates) are given at the branching points. The scale bar indicates the number of substitutions per site. Campylobacter jejuni NCTC 11168 and Helicobacter pylori OK 113 were used as outgroups. Phylogenetic analyses were conducted using MEGA version 6.06.
genera such as Rhizobium, Aeromonas, Bradyrhizobium, Streptococcus and Actinomycetes     . It has been also reported that the polymorphic nucleotide distribution pattern within the 16S rRNA sequence is highly variable among different species. This could lead to phylogenies that are inconsistent with other genes as evident in causing uncertain phylogenetic placement of Rhizobium galegae  . Therefore, phylogenetic analysis of rhizobial species based on the analysis of partial 16S rRNA gene sequences may lead to distorted phylogenies and misidentification, because it may represent only a part of the mosaic- like structure   .
The presence of multiple copies of the rRNA operon and intra-genomic heterogeneity is another factor that makes 16S rRNA gene an imperfect choice for phylogenetic analysis. The copy number of the rRNA genes can vary from one to 15 within a single bacterial genome  . Although these multiple copies are mostly identical or nearly identical, there are reports of divergent copies within a single genome  . When Rhizobia are considered, the slow growing Bradyrhizobium strains have only a single copy of 16S rRNA gene, whereas the faster growing Rhizobium spp. possess three copies  . Copy number variations and potential of horizontal gene transfer events limit the use of 16S rRNA gene sequence in taxonomy as well as in MLSA  . In addition, high degree of conservation and sequence similarity among the species of the genus Bradyrhizobium have been reported  . All these facts signify that the 16S rRNA sequence is an inferior candidate for phylogenetic inference of rhizobia and emphasize the importance of employing alternative, multi-locus strategies.
For MLSA, we selected a set of housekeeping genes based on the sequence variability among the particular species of bacteria. The housekeeping genes atpD, dnaJ, dnaK, gap, glnA, glnII, gltA, gyrB, pnp, recA, rpoA, rpoB and thrC have been used in MLSA of rhizobial species  . It is also important to determine the number of housekeeping genes that should be used for MLSA. In general, three or more housekeeping genes have been commonly used in MLSA approach for the inference of phylogenetic relationships of rhizobia     -  . In this study, we used three housekeeping genes, atpD, dnaK and glnII, to achieve a good resolution. Since concatenated sequences can yield more accurate phylogenetic trees than consensus of separate gene phylogenies even for sequences with different substitution patterns  , concatenated sequences of the three housekeeping genes were used in phylogenetic analysis.
Out of the 13 USDA strains, 10 strains (USDA 3040, 3042, 3043, 3044, 3048, 3051, 3057a, 3060, 3063 and 3709) have been previously reported as Bradyrhizobium    , USDA 3717 has been classified as Mesorhizobium  , while USDA 3504 and 3715 have not been previously studied. Our MLSA research showed that seven strains (USDA 3040, 3042, 3044, 3048, 3051, 3060, and 3709) are Bradyrhizobium, confirming the previous reports. The strain USDA 3051, previously identified as Rhizobium lupini, was recently reclassified by Peix  as Bradyrhizobium lupini based on MLSA of 16S rRNA, recA and glnII gene sequences. Consistent with the latter report, we identified USDA 3051 as Bradyrhizobium. We used the same approach to analyze two of the same marker genes (16S rRNA and glnII) with two additional housekeeping genes (dnaK and atpD) to support this classification. According to the topologies of both 16S rRNA and concatenated gene phylogenies (Figure 1 and Figure 5), the strains USDA 3043, 3057a, and 3063 grouped with Burkholderia, Microvirga, and Mesorhizobium, respectively. Thus, we recommend to reclassify these three strains, which were previously reported as Bradyrhizobium  . The USDA 3717 was classified as Mesorhizobium in this study, in agreement with previous identification  . To our knowledge, this is the first study to identify USDA 3504 and 3715 and we propose to classify them as Bradyrhizobium (Table 1).
Of the strains obtained from locally grown plants (OH, USA), the two soybean strains (SB_J and SB_5), were identified as Bradyrhizobium, whereas the new lupine strains, L_OO and L_3D52 were identified as Burkholderia and Rhizobium, respectively. These identifications are consistent among all trees obtained from 16S rRNA, single housekeeping and concatenated gene phylogenies. No relationships between the geographical origins and the patterns of gene sequences were apparent in this study, most likely due to this small sample and its limited geographic origins.
Although members of the genus Burkholderia are reported as endophytes of lupines   , there is a lack of evidence about them forming nodules. However, some species of Burkholderia such as B. phymatum, B. tuberum, B. vietnamiensis and B. cepacia are known to effectively nodulate certain other important legumes including common bean and fix nitrogen    -  . These nodulating Burkholderia species contain nod and nif genes that are very similar to those of alphaproteobacteria, suggesting a common origin  . In this MLSA study, the two strains USDA 3043 and L_OO, both of which were isolated from L. perennis, were identified as Burkholderia.
The sequences of atpD and glnII genes of USDA 3043 and L_OO could not be amplified after several attempts using primers specific for rhizobial species listed in Table 2. Since both these strains were identified as Burkholderia with the 16S rRNA and dnaK gene phylogenies, we developed Burkholderia-specific primers for the glnII gene. For the atpD gene, primers specific for Burkholderia by Estrada-De Los Santos  were employed. In addition, the atpD gene of the strain SB_J, which was identified as Bradyrhizobium in all the other trees, could not be amplified. There is compelling evidence of intragenic recombination in the atpD gene, which might be the underlying reason behind the inability to amplify this gene in USDA 3043, L_OO and SB_J even when using genus-specific primers     . Furthermore, it has been recommended that the atpD gene should be used with caution in studying the phylogeny of bacteria belonging to the genus Bradyrhizobium  .
Among the single housekeeping gene trees and the concatenated gene tree, the evidence producing a phylogeny that was incongruent with others was the glnII gene tree, in which the sequences from USDA 3504 clustered with Rhizobium. The glnII gene sequence also harbored the greatest variability (71.3%, Table 3). This high genetic heterogeneity could be attributed to genetic recombination within the glnII gene  . In addition, the two outgroups used in this study placed close to the cluster of Burkholderia in both 16S rRNA and the glnII gene trees, further suggesting the possibility of genetic recombination (Figure 1 and Figure 4).
In conclusion, this study employed MLSA of concatenated housekeeping genes are further authentication for this approach as a more robust method for phylogenetic analysis of rhizobia over the analysis of 16S rRNA gene sequences alone   . According to the phylogeny of the concatenated dataset, we propose that USDA strains 3063, 3717, 3043 and 3057a should be considered for reclassification. Also, despite evidence that lupines are mainly nodulated by members of the genus Bradyrhizobium  , the two strains isolated from lupines in this study were shown to be Burkholderia sp. and Rhizobium sp. However, additional studies are required to confirm symbiotic nodulation, especially of Burkholderia spp., by 1) re-inoculation on lupine, 2) sequencing of symbiosis-essential genes such as nod and nif, and 3) testing nitrogen fixation ability by culturing on a N-free semisolid medium such as BAz medium and an acetylene reduction activity assay   .
This research was funded in part by the Center for Undergraduate Research Scholarship at Bowling Green State University and by the United States Department of Agriculture National Institute of Food and Agriculture (USDA-NIFA) Agriculture and Food Research Initiative Oomycete-Soybean CAP (award 2011-68004-30104). We thank Patrick Elia, the National Rhizobium Germplasm Resource Collection, USDA, for providing bacterial strains and a protocol for growing and maintaining the cultures. We are also grateful to Jacob Sublett for assistance with sample collections, Gayathri Beligala for help with maintaining cultures, Madushanka Manathunga for computer technical support, and Drs. Michael E. Geusz and Paul F. Morris for comments that greatly improved the manuscript.