Recent findings have led to a greater awareness that only a small fraction of proteins function in isolation, whereas the majority of soluble and membrane-bound proteins in modern cells are symmetrical oligomeric complexes of two or more identical or very similar chains. It has been argued that the evolution of protein complexes offers several potential advantages, such as increased structural size and diversity and more opportunities for allosteric regulation and protein activation. Among these complexes, cage architectures have been observed in the self-assembled structures of viral capsid, vault, heat shock, and ferritin proteins. In this paper, we focus on ferritin, which is found ubiquitously in nature.
Ferritin plays a key role in iron detoxification and storage, sequestering excess cellular iron as mineralized hydrous ferric oxide in its cavity. Ferritin proteins can self-assemble into multi-subunit, nanoscale cages. They are also highly amenable to genetic and chemical modification, which has attracted much recent attention in drug delivery and nanomaterial science. In our previous research, site-directed mutagenesis of the N- and C-termini of ferritin 3kx9 led to an increase in the volume of the self-assembled protein cages. To account for this observation, in this paper we examine the relationship between the number of ferritin subunits and the outer diameter of the cage.
All protein structural data used in this calculation were obtained from the PDB website (http://www.rcsb.org/). On the website, we identified all proteins whose amino acid sequences share more than 40% similarity with the sequence of 3kx9. We found a total of 25 such proteins, including 3kx9 itself. For each protein, the 3D structure file (PDB format) and the sequence file (FASTA format) were downloaded.
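Retrieving the two files for each entry can be scripted. The sketch below is an illustration under stated assumptions, not necessarily how the files were obtained for this study: it assumes the RCSB download pattern `https://files.rcsb.org/download/<ID>.pdb` and the FASTA endpoint `https://www.rcsb.org/fasta/entry/<ID>`, and the helpers `entry_urls` and `download_entry` are hypothetical names introduced only here. The 40% similarity search itself is not reproduced. Very large entries are distributed only in mmCIF format, in which case the `.pdb` request would fail.

```python
import urllib.request
from pathlib import Path

# Assumed RCSB URL patterns (see lead-in); IDs are upper-cased for the request.
PDB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"
FASTA_URL = "https://www.rcsb.org/fasta/entry/{pdb_id}"


def entry_urls(pdb_id: str) -> dict[str, str]:
    """Return the structure (PDB format) and sequence (FASTA) URLs for one entry."""
    pid = pdb_id.strip().upper()
    return {"pdb": PDB_URL.format(pdb_id=pid), "fasta": FASTA_URL.format(pdb_id=pid)}


def download_entry(pdb_id: str, out_dir: str = "structures") -> list[Path]:
    """Download both files for one entry into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for ext, url in entry_urls(pdb_id).items():
        target = out / f"{pdb_id.strip().lower()}.{ext}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            target.write_bytes(resp.read())
        saved.append(target)
    return saved
```

For example, `download_entry("3kx9")` would save `structures/3kx9.pdb` and `structures/3kx9.fasta`; looping over the 25 identifiers would collect the full data set.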
In our previously published paper, two methods for calculating the outer diameter were described and their results compared. Because the other method requires considerably more computation, the method based on the center of the sphere is used here. To improve the reliability of the results, we calculated 80%, 85%, 90%, and 95% of the maximum distance from the sphere center, averaged these values, and multiplied the average by 2 to obtain the outer diameter. After obtaining the outer diameters, we used linear regression to establish the relationship between the outer diameter of the 25 ferritins and their number of sub-chains.
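The center-of-sphere calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact code from our previous paper: the sphere center is taken as the centroid of all ATOM coordinates (HETATM records such as waters and iron ions are skipped), and `read_atom_coords` and `outer_diameter` are hypothetical helpers. By default the text above is read literally, as scaling the maximum center-to-atom distance by 0.80, 0.85, 0.90, and 0.95; a percentile mode is included for the alternative reading in which each fraction is the share of atoms lying within the radius.

```python
import numpy as np


def read_atom_coords(pdb_text: str) -> np.ndarray:
    """Return an (N, 3) array of x, y, z from the ATOM records of PDB-format text.

    Uses the fixed PDB columns 31-38 (x), 39-46 (y) and 47-54 (z).
    HETATM records (waters, metal ions, ligands) are deliberately skipped.
    Multiple MODEL blocks and alternate locations are not handled.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM records found")
    return np.array(coords, dtype=float)


def outer_diameter(coords: np.ndarray,
                   fractions=(0.80, 0.85, 0.90, 0.95),
                   percentile_mode: bool = False) -> float:
    """Estimate the outer diameter of a roughly spherical cage.

    Assumption: the sphere center is the centroid of all atom coordinates.
    Default (literal reading): each radius is a fraction of the maximum
    center-to-atom distance. percentile_mode=True: each radius is the
    distance within which that fraction of atoms lies.
    The radii are averaged and doubled to give the diameter.
    """
    center = coords.mean(axis=0)
    dists = np.linalg.norm(coords - center, axis=1)
    if percentile_mode:
        radii = np.percentile(dists, [100.0 * f for f in fractions])
    else:
        radii = np.asarray(fractions, dtype=float) * dists.max()
    return 2.0 * float(np.mean(radii))
```

For a downloaded structure, `outer_diameter(read_atom_coords(open("3kx9.pdb").read()))` would return a diameter in the file's coordinate units (ångströms for PDB files); the file name is illustrative. In the literal mode the default fractions reduce to 1.75 times the maximum distance, whereas the percentile mode is less sensitive to a few outlying atoms.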
The outer diameters and numbers of sub-chains of the 25 ferritins are shown in Table 1.
Figure 1 shows the outer diameter density distribution of ferritins with 6, 8 and 24 sub-chains.
We used linear regression to model the relationship between the outer diameter and the number of sub-chains of the 25 ferritins. Figure 2 shows the results of the linear regression.
In Figure 2, the abscissa is the number of sub-chains and the ordinate is the corresponding outer diameter. The coefficient of determination is R² = 0.6469, indicating that the model explains the data only to a limited extent. In particular, the outer diameters of the 12-sub-chain ferritins are widely spread, ranging from about 140 to 210. In addition, the outer diameters of the 24-sub-chain ferritins are considerably lower than the values predicted by the model.
Table 1. Outer diameter (OD) and number of sub-chains of the 25 ferritins.
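The regression and its R² can be reproduced with a few lines of NumPy. The sketch below is illustrative: `fit_line` is a hypothetical helper, and R² is computed as the coefficient of determination, 1 - SS_res/SS_tot, of an ordinary least-squares straight-line fit. The toy numbers in the example are synthetic and are not the Table 1 data.

```python
import numpy as np


def fit_line(n_subchains, diameters):
    """Ordinary least-squares fit of diameter = slope * n + intercept.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination 1 - SS_res / SS_tot of the fitted line.
    """
    x = np.asarray(n_subchains, dtype=float)
    y = np.asarray(diameters, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    y_hat = slope * x + intercept
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), r2


if __name__ == "__main__":
    # Synthetic toy values for illustration only; these are NOT the Table 1 data.
    n = [1, 2, 3, 4]
    d = [1.0, 3.0, 2.0, 4.0]
    print(fit_line(n, d))  # slope 0.8, intercept 0.5, r2 0.64 (up to rounding)
```

The same function can be applied separately to each phylogenetic group, which is how a per-group fit such as those in Figures 4 and 5 could be computed.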
5.1. Ways to Improve
We found that each of the 25 ferritins is composed of sub-chains with identical sequences. We used the MEGA software to construct a phylogenetic tree of the sub-chain sequences of the 25 proteins. The results are shown in Figure 3.
As Figure 3 shows, the 25 ferritins can be roughly divided into two groups. The lower group contains 11 proteins with 8, 12, 24, or 36 sub-chains; the upper group contains 14 proteins with 1, 3, 6, or 12 sub-chains. The outer diameters of the lower group are clearly larger than those of the upper group. Both groups contain proteins with 12 sub-chains, which explains why the outer diameters of the 12-sub-chain ferritins vary the most.
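The two-group split can be illustrated with a simple distance-based clustering. This is a swapped-in stand-in, not the MEGA procedure used above: it assumes the sub-chain sequences have already been aligned to equal length, uses p-distances (gap columns ignored), and stops UPGMA-style average-linkage merging when two clusters remain. The helper names are hypothetical, and the result may differ from Figure 3 if a different tree-building method was used.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def split_into_two_groups(aligned: dict[str, str]) -> list[set[str]]:
    """UPGMA-style (average-linkage) agglomeration stopped at two clusters.

    The two clusters that remain correspond to the two clades below the
    root of a UPGMA tree built from the same p-distances.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    names = list(aligned)
    dist = {frozenset((a, b)): p_distance(aligned[a], aligned[b])
            for a, b in combinations(names, 2)}
    clusters = [frozenset([n]) for n in names]

    def average(c1: frozenset, c2: frozenset) -> float:
        # Unweighted mean of all pairwise distances between the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 2:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [set(c) for c in clusters]
```

Passing a dictionary that maps each PDB identifier to its aligned sub-chain sequence would return two sets of identifiers, which could then be compared with the upper and lower groups of Figure 3.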
5.2. Identical Amino Acid Sequences with Different Numbers of Sub-Chains
We found that the sequences of 3kx9 and 1s3q are identical, but the two proteins have different numbers of sub-chains: 3kx9 has 24 sub-chains, whereas 1s3q has only 12. This indicates that protein self-assembly is a complex kinetic process: under different conditions, sub-chains of the same sequence can assemble into ferritins with different numbers of sub-chains.
Figure 1. Outer diameter density distribution. (a) 6 sub-chains; (b) 8 sub-chains; (c) 24 sub-chains.
Figure 2. Relationship between outer diameter and number of sub-chains.
Figure 3. Phylogenetic tree of the sub-chain sequences of the 25 ferritins.
Figure 4. Relationship between outer diameter and number of sub-chains (lower group in the phylogenetic tree, 11 proteins).
Interestingly, 3kx9 has twice as many sub-chains as 1s3q, yet the two have almost the same volume. Does this mean that the amino acid density of 3kx9 is double that of 1s3q? This question deserves further exploration.
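The question can be framed with a simple estimate. Assume, for illustration only, that each cage is a sphere whose volume V is set by its outer diameter D (cavity included), and that it contains n identical sub-chains of L residues each. The apparent residue density and the ratio for the two proteins are then:

```latex
\rho = \frac{nL}{V} = \frac{6\,nL}{\pi D^{3}}, \qquad
\frac{\rho_{\mathrm{3kx9}}}{\rho_{\mathrm{1s3q}}}
  = \frac{n_{\mathrm{3kx9}}}{n_{\mathrm{1s3q}}}
    \left(\frac{D_{\mathrm{1s3q}}}{D_{\mathrm{3kx9}}}\right)^{3}
  = \frac{24}{12}\cdot 1^{3} = 2
```

Because the sequences are identical, L cancels, so with equal diameters the ratio reduces to the ratio of sub-chain numbers. Since volume scales with D³, small diameter differences matter: a 10% larger diameter for 3kx9 would lower the ratio to 2/1.331, or about 1.50.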
5.3. The Increased Volume of the Mutant
In their work, Williams and co-workers demonstrated that a mutation at a critical interface in the DNA-binding protein from starved cells (DPS) alters its assembly from the canonical 12-mer to a ferritin-like 24-mer under crystallization conditions. Based on the size distribution curve of 36-mer ferritin (Figure 6), it is possible that the increase in diameter from 12 to 18 nm observed after site-directed mutagenesis of ferritin 3kx9 is due to a structural switch from a 24-mer to a 36-mer. If this speculation is confirmed by experimental data, it would introduce a new concept of mutational switching between related protein subfamilies.
Figure 5. Relationship between outer diameter and number of sub-chains (upper group in the phylogenetic tree, 14 proteins).
Figure 6. The size distribution curve of 36-mer ferritin.
In this paper, we found a close relationship between the number and type of ferritin subunits and the outer diameter. After dividing the 25 ferritins into two groups based on their evolutionary relationships, we substantially improved the accuracy of the model and showed a strong positive linear correlation between subunit number and outer diameter within each group, providing new insight into the structural features of ferritin.