ock) protein synthesis), protein is not synthesized in the alternative reading frames. This led to the conclusion that a powerful biological protection exists: nature does not need ephemeral proteins, and it does not synthesize proteins corresponding to shifted positions (for example, if the start point moves as a result of mutations). Thus, two kinds of reading frame (RF) were introduced: an open reading frame (ORF), a sequence of codons that contains no ter codons, and a blocked reading frame (BRF), in which such codons occur. In Figure 2 the protein corresponds to the ORF, and the positions shifted by one nucleotide, both +1 and −1, correspond to BRFs.
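The distinction between an open and a blocked reading frame can be illustrated with a minimal Python sketch. The gene fragment below is hypothetical (it is not the fragment of Figure 2), the standard code table is assumed, and the shift −1 is read here as offset 2, which is the same frame modulo 3.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering; '*' marks a ter codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, shift=0):
    """Translate complete triplets of seq starting at offset `shift`."""
    return "".join(CODE[seq[i:i + 3]] for i in range(shift, len(seq) - 2, 3))


def is_blocked(seq, shift):
    """A frame is blocked (BRF) if it contains at least one ter codon."""
    return "*" in translate(seq, shift)


# Hypothetical fragment starting with ATG (Met).
gene = "ATGAATAGCTTAACG"
for shift in (0, 1, 2):
    kind = "BRF" if is_blocked(gene, shift) else "ORF"
    print(shift, translate(gene, shift), kind)
```

For this fragment only the frame starting at ATG is open; both shifted frames contain ter codons.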
In the shifted frames we obtain other codon sequences (two other RFs), in each of which ter codons occur repeatedly; in Figure 2 they are marked with the symbol *. These are two blocked RFs (BRFs). It can be seen that three nucleotide substitutions by the same nucleotide C (the positions of the substitutions are indicated below the gene text) would remove all three ter codons (symbol *) from this section of the gene, and the protein sequence would not change, because these three substitutions replace codons with synonyms. However, a typical gene is arranged so that these shifts give exactly two BRFs. It turned out that this prohibition is absent only for overlapping genes. This effect was first established experimentally in 1976 during the reading of the first whole genome, that of the bacterial virus ΦX174. After these studies, their leader F. Sanger became a two-time winner of the Nobel Prize in Chemistry, the only one at that time. F. Sanger showed interest in one of our first works: in a letter to him I formulated a new property of the first whole genome. It consists in the fact that to record such a genome it is necessary to use all 61 sense codons; this is due to the overlap of genes first discovered in this genome. I was asked to present the result in Nature. His answer is given in the Sanger file. The complete nucleotide sequence of the circular single-stranded DNA (apparently, the single-stranded nature of the DNA, established earlier experimentally, was decisive for reading the first whole genome) contains 5386 nucleotides, but the total number of amino acid residues in the sequences of all proteins, multiplied by 3 (taking into account the non-coding regions), exceeds this number of nucleotides. It has been shown that gene E contains 273 nucleotides and is located entirely within gene D. This is the first experimentally detected overlap; it is shown in Figure 3.
Figure 2. A fragment of a protein sequence (the first amino acid is Met) encoded by a gene starting with ATG (the first nucleotide of this triplet is marked with a bold dot), and the cases of shifts by +1 or −1 nucleotide.
Overlapping genes are currently regarded as an unusual but still fairly common element of genome organization. About 1700 genetic overlaps have been found in the decoded human genome. The extensive material accumulated on genetic overlaps calls for a thorough and comprehensive analysis. Let us focus on some of the results that we obtained by mathematical analysis.
It can be seen that there are only 5 different cases of overlapping genes permitted by the structure of DNA (Figure 4). The first two relate to overlaps of genes from the same DNA chain, and the remaining three to overlaps of genes from different DNA chains.
Figure 4 shows only small fragments of real overlaps; the total length of some of them reaches almost 1300 nucleotides. Moreover, the total length of the overlaps can exceed half of the genome size (GSHV virus).
It should be emphasized that the analysis of the multiple relationships of codons in genetic overlaps is the main tool of this research.
4. Degeneracy of Code
One of the tasks we set refers to a fundamental problem of the genetic code: why is the degeneracy of the code needed, when the same amino acid usually has more than one encoding, up to six? We investigated the participation of all sense codons in overlapping genes. It turned out that there are many genomes with overlaps in which all 61 sense codons must necessarily participate: if at least one codon is excluded, the genetic overlaps found in experiments cannot be recorded. One of these genomes is the first whole genome, that of the bacterial virus ΦX174, which contains overlaps totalling 814 nucleotides. Our article, as well as the article accompanying it, were cited at least 100 times each (see citation file 300).
Figure 3. The first genetic overlap found experimentally. The figure is presented in the format of the publication of the full text of the first whole genome, that of the bacteriophage ΦX174. We see that from position 568 to position 840 the coding of the new protein E was established within the nucleotide sequence encoding another protein, protein D.
Figure 4. Five possible cases of overlaps of genes belonging to a single DNA chain (1, 2) or to two DNA chains (3-5). The texts are read in different directions (indicated by arrows): from left to right for B11, B12, B21, B22, B31, B41, B51 and from right to left for B32, B42, B52. These fragments contain only the canonical DNA pairs: CG and AT.
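The degeneracy discussed above can be read directly off the standard code table. The following short Python sketch, which assumes the standard table in the usual TCAG ordering, counts the sense codons and the number of encodings per amino acid.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the usual TCAG ordering; '*' marks a ter codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Sense codons are all codons that encode an amino acid.
sense = {codon: aa for codon, aa in CODE.items() if aa != "*"}

# Degeneracy: number of different encodings of each amino acid.
degeneracy = Counter(sense.values())

print(len(sense))                       # number of sense codons
print(sorted(degeneracy.items(), key=lambda item: -item[1]))
```

The amino acids with six encodings are Leu, Ser and Arg (one-letter symbols L, S, R), while Met and Trp have a single encoding each.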
5. Irregular Code
Our next task was related to the analysis of the codons that make the genetic code deviate from a homogeneous structure. This is one of the most mysterious features of the genetic code. Mathematical analysis showed that the additional codon representations from Table 1, or irregularities (AGY for Ser, TTX for Leu, AGX for Arg), make it possible in principle to "organize" overlaps in a number of genomes, or significantly expand the range of genetic overlaps, both double (by a factor of 7 for the ΦX174 genome) and triple (by a factor of 5 for HIV-2), compared with what would be possible with a homogeneous code, that is, a code without irregularities.
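The three irregularities named above can be located mechanically. The sketch below assumes one possible reading of "homogeneous": all codons of an amino acid share the same first two nucleotides. Under that assumption, the amino acids whose codons use more than one two-letter root are found, and the smaller group is reported as the additional representation.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the usual TCAG ordering; '*' marks a ter codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Group the codons of every amino acid by their two-letter root.
roots = defaultdict(lambda: defaultdict(list))
for codon, aa in CODE.items():
    if aa != "*":
        roots[aa][codon[:2]].append(codon)

# An amino acid is "irregular" here if its codons use more than one root;
# the smaller group is taken as the additional representation.
irregular = {}
for aa, by_root in roots.items():
    if len(by_root) > 1:
        irregular[aa] = sorted(min(by_root.values(), key=len))

print(irregular)
```

With the symbols of Figure 10 (X: A, G; Y: T, C), the three groups found are AGY for Ser, TTX for Leu and AGX for Arg.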
6. The Potential of the Code for Many Genetic Overlaps
Next, we asked what potential the genetic code has for creating all these cases of overlap. The answer was: a phenomenal potential! This result was appraised by the Nobel Prize laureate de Duve in his reply to me (see the file). It turned out that only 16 amino acid pairs out of a possible 400 can create obstacles to the construction of all 5 cases of overlap. These amino acid pairs arise in only three cases of overlap:
in case 2, 5 pairs:
MetMet, MetAsn, MetLys, MetIle, MetThr, (2)
in case 3, 6 pairs:
PheTyr, TyrTyr, HisTyr, AsnTyr, AspTyr, CysTyr, (3)
in case 5, 5 pairs:
PheMet, PheAsn, PheLys, PheIle, PheThr. (4)
In other words, it seems that the genetic code was "chosen" for overlapping. Is that so? The answer to this question will be given below.
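The sets (2)-(4) can be reproduced by direct enumeration. The Python sketch below rests on an assumption that is not spelled out in the text: a pair of adjacent amino acids is treated as forbidden in a given frame when every encoding of the pair places a ter codon in the triplet that the second gene would read across the pair. The frame labels are ours; their correspondence to overlap cases 2, 3 and 5 is inferred from the resulting sets.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering; '*' marks a ter codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
TER = {c for c, a in CODE.items() if a == "*"}
NAMES = dict(zip(
    "FLSYCWPHQRIMTNKVADEG",
    "Phe Leu Ser Tyr Cys Trp Pro His Gln Arg Ile Met Thr Asn Lys "
    "Val Ala Asp Glu Gly".split()))
COMP = str.maketrans("ACGT", "TGCA")

CODONS = {}
for codon, aa in CODE.items():
    if aa != "*":
        CODONS.setdefault(aa, []).append(codon)


def revcomp(s):
    """Reverse complement: the triplet as read on the other DNA chain."""
    return s.translate(COMP)[::-1]


# Triplets of the second gene lying inside a dicodon d (6 nucleotides).
FRAMES = {
    "same strand, shift 1": lambda d: [d[1:4]],
    "same strand, shift 2": lambda d: [d[2:5]],
    "other strand, shift 0": lambda d: [revcomp(d[0:3]), revcomp(d[3:6])],
    "other strand, shift 1": lambda d: [revcomp(d[1:4])],
    "other strand, shift 2": lambda d: [revcomp(d[2:5])],
}


def forbidden_pairs():
    """Pairs for which every encoding puts a ter codon into the same frame."""
    found = {}
    for frame, triplets in FRAMES.items():
        for a, b in product(CODONS, repeat=2):
            encodings = product(CODONS[a], CODONS[b])
            if all(TER.intersection(triplets(c1 + c2)) for c1, c2 in encodings):
                found.setdefault(frame, []).append(NAMES[a] + NAMES[b])
    return {frame: sorted(pairs) for frame, pairs in found.items()}


forbidden = forbidden_pairs()
for frame, pairs in forbidden.items():
    print(frame, pairs)
print(sum(len(pairs) for pairs in forbidden.values()))
```

Under these assumptions only three frames yield forbidden pairs: the five Met pairs of (2), the six Tyr pairs of (3) and the five Phe pairs of (4), 16 pairs in total.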
So, we have established the first integral characteristic of the genetic code, which is denoted by p and which is equal to 16 for the standard code:
p = 16. (5)
7. Common Property of All Natural Codes
This result raised a number of new questions. What is the value of p for the non-standard (deviant) codes, of which there are 14 and whose number continues to grow? Note that the first non-standard code was discovered in 1979 in a human cell, in a separate organelle, the mitochondrion: the genes of mitochondrial DNA (mtDNA) are recorded by such a code. Only 4 codons were reinterpreted. Calculations have shown that the value of p for all 14 deviant codes does not exceed 22, or about 5% of the total number of amino acid pairs, see Table 2. The prohibitions arise in the same three cases of overlap as for the standard code; in addition, a code with a zero value of p was discovered. Thus, all natural genetic codes impose only a small number of prohibitions on the construction of genetic overlaps. This may be seen as a common property of all natural codes known to date. The question arises: why do natural codes have this property, while the number of genes recorded with overlaps is immeasurably smaller than the number of ordinary non-overlapping genes, and what is the role of the reinterpreted codons? Both issues were resolved.
8. One Mathematical Analogy
In solving the first of these issues, an important mathematical analogy was established between gene overlaps from different DNA chains and the most important structural units of RNA. As is known, a gene is read from DNA, every nucleotide T in its text is replaced with U (uracil), and mRNA is formed, which in turn is structured. The most important elements of this structure are stems, fragments containing bonds similar to those in DNA. Figure 5 shows one of the stems of the best-known secondary structure of a messenger RNA, that of MS2 RNA, which contains more than 130 stems. The structure was completely analyzed by us, and the results are presented in my 2014 monograph.
Figure 5 shows a single stem of this secondary structure (b), together with fragment (a), the range 3022-3048 of the primary structure, in the style of the cited article. In fragment (b), however, not only the nucleotides but also the corresponding amino acids are given. The stem in Figure 5(b) corresponds to an overlap of fragments as if they were taken from different DNA chains. The reading direction along the stem (arrows) is reversed, so this fragment of the secondary structure is equivalent to an overlap of the regions shaded in (a), as if taken from different DNA chains (overlap case 4). However, there are no different chains here. The reversal of the original reading direction (→) to (←) in the bottom line of Figure 5(b) is due to the presence of the so-called hairpin loop, the sequence UCUAUAA, which does not belong to the stem.
What is the role of the smallness of the first characteristic p? Its small value makes it possible to build a phenomenal variety of genetic overlaps, and thus a phenomenal variety of secondary structures, including functionally significant regions of the secondary structures of mRNA, for all genes recorded by the standard code or by any of the known deviant codes. An increase of this characteristic for a code that deviates from the standard one (for example, by an order of magnitude, as was shown for the hypothetical code from the monograph) leads to a significant reduction of this diversity.
Figure 5. A stem of the secondary structure of MS2 messenger RNA. At the top (a), a linear text is given: a fragment in the range 3022-3048; the shaded areas correspond to the stems of the secondary structure. Under the text (b), a fragment of the secondary structure is shown. Note the presence of the noncanonical pair GU (such pairs were discovered experimentally in RNA structures), in addition to the pair CG, canonical in DNA, and the pair AU, the analogue of the canonical pair AT.
Thus, it is established that the small value of p performs two functions: it makes it possible to build both a phenomenal variety of genetic overlaps and a phenomenal variety of secondary structures of messenger RNAs for all genes.
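The analogy between a stem and an overlap of fragments from opposite chains can be made concrete with a small Python sketch. It checks whether the two arms of a hairpin pair antiparallel, allowing the pairs AU, GC and the noncanonical GU mentioned in the caption of Figure 5. The arms used here are hypothetical; only the loop sequence UCUAUAA is taken from the text.

```python
# Pairs allowed in an RNA stem: canonical AU and GC plus the wobble pair GU.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def forms_stem(left, right):
    """True if `left` pairs base by base with `right` read in reverse.

    Both arms are written 5'->3'; reversing `right` models the opposite
    reading direction of the second arm of the stem.
    """
    return len(left) == len(right) and all(
        (x, y) in PAIRS for x, y in zip(left, reversed(right)))


def hairpin(seq, stem_len):
    """Split seq into left arm, loop and right arm and test the stem."""
    left, loop, right = seq[:stem_len], seq[stem_len:-stem_len], seq[-stem_len:]
    return forms_stem(left, right), loop


# Hypothetical arms GGCAU / GUGCC around the loop UCUAUAA.
print(hairpin("GGCAUUCUAUAAGUGCC", 5))
```

The last position of the hypothetical stem is a GU pair, so the check succeeds only because the wobble pair is admitted.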
9. On the Role of Reinterpreted Codons
Let us now consider the role of the reinterpreted codons. We raised the question of a possible relationship between the limitations on overlaps (2)-(4) and the code variability observed in a number of organisms. The analysis showed that such a relationship exists: for a number of deviant codes (examples for some of them, found in mitochondrial DNA, are shown in Figure 6), the reinterpreted codons make it possible to construct genetic overlaps that are forbidden for the standard code.
Each of the four overlap fragments shown in Figure 6 demonstrates the role of the same permutation: TGA(ter) → Trp. This natural permutation is observed for the three deviant codes that correspond to the given fragments; the second and fourth fragments are written in the same deviant code. Moreover, this permutation is present in all three deviant codes. It turned out that the permutation made possible overlaps for the pairs MetAsn (Figure 6(a); this case corresponds to human mitochondrial DNA), MetMet (Figure 6(b)), MetThr, twice (Figure 6(c)), and MetLys (Figure 6(d)), which are forbidden for the standard code, see (2). These amino acid pairs and the reinterpreted codons are highlighted. Thus, the size of genomes is reduced owing to the possibility of building gene overlaps that are not possible for the standard code. For a living cell such a reduction can be quite large, because the number of mitochondria is, as a rule, greater than one and can reach a million. The study was cited at least 100 times (see citation file 300).
Figure 6. Fragments of genetic overlaps found in the mitochondria of four organisms whose genes are recorded by codes that deviate from the standard code. These are overlaps within one DNA chain. The fragments and the names of the proteins are given according to the publications. The number indicates the nucleotide number in the genome.
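The effect of the permutation TGA(ter) → Trp on the set (2) can be checked by enumeration. The sketch below is a simplified illustration, not the calculation behind Table 2: it considers only the frame shifted by one nucleotide on the same chain, and it builds the human mitochondrial code (called K1 later in the text) from the four reassignments listed there. One-letter amino acid symbols are used.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering; '*' marks a ter codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Human mtDNA code: ATA(Ile) -> Met, TGA(ter) -> Trp, AGA/AGG(Arg) -> ter.
K1 = {**STANDARD, "ATA": "M", "TGA": "W", "AGA": "*", "AGG": "*"}


def forbidden_shift1(code):
    """Pairs whose every encoding gives a ter codon in the frame shifted by 1."""
    ter = {c for c, a in code.items() if a == "*"}
    codons = {}
    for codon, aa in code.items():
        if aa != "*":
            codons.setdefault(aa, []).append(codon)
    return sorted(
        a + b for a, b in product(codons, repeat=2)
        if all((c1 + c2)[1:4] in ter
               for c1, c2 in product(codons[a], codons[b])))


print(forbidden_shift1(STANDARD))   # the five Met pairs of (2)
print(forbidden_shift1(K1))         # no forbidden pairs in this frame
```

Under this reading, the five Met pairs are forbidden for the standard code only because ATG followed by A produces TGA; once TGA encodes Trp, the prohibition in this frame disappears.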
The results obtained allowed us to turn to the analysis of experimental data on all deviant genetic codes, that is, codes that deviate from the standard code. However, within the framework of genetic overlaps alone, I was not able to explain the functional significance of all reinterpreted codons in all deviant codes. The required solution was found in the study of the DNA regions where genes do not overlap; such genes are the vast majority.
10. Two Integral Characteristics of the Code
We are talking about the natural blocking of genes, when all five alternative codon sequences of a gene, that is, its alternative reading frames (RFs), contain multiple codons that stop protein synthesis, the ter codons from the set (1). Figure 7 shows this for a portion of a gene; it extends the earlier, simplified Figure 2.
Figure 7. Six RFs for a gene fragment beginning with the ATG (Met) codon; the reading direction is indicated by the arrow →. One of them is open, ORF0 (it has 17 sense codons), and the 5 alternative RFs are blocked: BRF1-BRF5. BRF3-BRF5 correspond to the other DNA chain, and their codon sequences are read in the reverse direction (←). Figures in brackets show the shift in nucleotides relative to ORF0. The symbol * denotes each of the three ter codons of (1).
The potential of such blocking was established for the standard genetic code: it was shown that only 210 of the 400 possible amino acid pairs participate in the blocking process. Of these, 31 pairs give inevitable blocking, which takes place for all encodings of the amino acids contained in these pairs. A second integral characteristic of the genetic code, q, containing two components, was introduced into consideration (6); one of its components, qmin = 31, is the number of pairs that give inevitable blocking.
With this in mind, possible blocks, which arise only for a limited number of encodings, were introduced in addition to the inevitable ones. Possible blocks make up the main part of the 210 blocks, which also include the 31 inevitable blocks (see Figure 8 and Table 3).
Figure 8. Inevitable blocking for the amino acid pair ValLys (left) and possible blocking for the amino acid pair ValGlu (right). The ter codons of (1) are shaded. Arrows indicate the reading direction. The full list of inevitable blocks is presented in Table 3.
Table 3. Complete list of amino acid pairs that cause an inevitable block (column 1), with specification of the numbers of RFs that are blocked (column 2).
Column 2 of this table lists the numbers of the RFs in which a block may (potentially) occur. The number of such RFs varies from 1 to 4, depending on the pair. Let us divide the set of these pairs into two subsets. The first includes the pairs that inevitably block the same RF. There are only 16 such pairs; these are the first 16 pairs of Table 3. RF2 is inevitably blocked by 5 amino acid pairs, which coincide with the set (2) defined above, since the ter codon TGA is formed in RF2. RF3 is blocked by 6 pairs, which coincide with the set (3); in RF3 one of the ter codons TAA or TAG is formed. Of particular note is the blocking for the 5 pairs numbered 12-16, which coincide with the pairs of (4). Each of these pairs inevitably blocks RF5, since one of the ter codons TAA or TGA is formed in RF5. For the pairs PheAsn and PheLys, Table 3 lists RF1 in addition to RF5; however, the blocking of RF1, unlike that of RF5, is not inevitable. Thus, pairs 1-16 of Table 3 form the set of amino acid pairs forbidden for the overlap of two genes, which was established above. Earlier we introduced the numerical characteristic p, equal to the number of different blocking pairs from (2)-(4); its value is given in (5). Thus, considering the subset of inevitable blocks connects the two characteristics under study by the inequality:
p ≤ qmin. (7)
In other words, the integral characteristic of the genetic code p is not independent, but is determined by the choice of the characteristic qmin, which is used in solving a completely different problem―in blocking non-overlapping genes.
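The numbers of blocking pairs can be recomputed by enumerating all encodings of all 400 amino acid pairs. The Python sketch below assumes that a pair is "blocked" for a given encoding when a ter codon appears in any of the five alternative frames inside the six nucleotides of the pair; a pair is inevitable if this holds for every encoding and possible if it holds for at least one. These definitions are our reading of the text and Figure 8.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering; '*' marks a ter codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
TER = {c for c, a in CODE.items() if a == "*"}
COMP = str.maketrans("ACGT", "TGCA")

CODONS = {}
for codon, aa in CODE.items():
    if aa != "*":
        CODONS.setdefault(aa, []).append(codon)


def alt_triplets(dicodon):
    """Triplets of the five alternative frames lying inside a dicodon."""
    same = [dicodon[1:4], dicodon[2:5]]                 # same chain, shifts 1, 2
    other = [dicodon[i:i + 3].translate(COMP)[::-1]     # other chain, all shifts
             for i in range(4)]
    return same + other


def blocked(c1, c2):
    """True if this encoding puts a ter codon into some alternative frame."""
    return any(t in TER for t in alt_triplets(c1 + c2))


inevitable, possible = set(), set()
for a, b in product(CODONS, repeat=2):
    hits = [blocked(c1, c2) for c1, c2 in product(CODONS[a], CODONS[b])]
    if all(hits):
        inevitable.add((a, b))
    if any(hits):
        possible.add((a, b))

print(len(inevitable), len(possible))
```

Under these assumptions the counts agree with the values quoted above: 31 inevitable pairs among 210 blocking pairs. ValLys falls into the inevitable set and ValGlu only into the possible one, as in Figure 8.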
When considering only one problem, the overlapping of pairs of genes, one could conclude that the genetic code was "chosen" for overlapping genes, since only 16 of the 400 possible pairs are unsuitable for overlapping. This is true for all 5 ways of pairwise overlapping of genes permitted by the structure of DNA. However, when considering two problems and two integral characteristics p and q, it was found that the genetic code was focused on the "choice" of the two-component integral characteristic (6), one of whose components, according to the inequality (7), determines the range of values of the other integral characteristic p. Thus, the smallness of the integral characteristic p is a consequence of a more general principle associated with the selection of the whole set of inevitable amino acid pairs that create blocks. That is, the amino acid pairs corresponding to the characteristic p can be "selected" only from the limited set corresponding to qmin, and not from the complete set of 400 amino acid pairs. The criterion by which this "choice" of the genetic code was made remains unknown.
In connection with the analysis of the blocking problem, we have investigated a number of genomes with a total of more than 200,000 genes. The story of this work requires separate consideration; a more detailed analysis is presented in the cited work. Let us note only one result, obtained for the human genome. Of the 25,613 genes in this genome, three genes do not contain any blocks: for each of them, figures similar to Figure 8 do not contain a single termination codon in any of the five alternative RFs. These are the MT1M and MT1G genes of chromosome 16 and KLK8 of chromosome 19 (see Figure 9).
In connection with this result, a number of hypotheses were put forward, of which the simplest is that these are nothing other than cases of overlap of 6 genes.
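A scan of this kind can be sketched in a few lines of Python. The function below reports, for a gene given as a DNA string, which of the five alternative RFs contain a ter codon. The frame labels (chain and offset) are ours and do not follow the RF1-RF5 numbering of Figure 7, and both test sequences are hypothetical; they are not the MT1M, MT1G or KLK8 genes.

```python
TER = {"TAA", "TAG", "TGA"}
COMP = str.maketrans("ACGT", "TGCA")


def has_ter(seq, offset):
    """True if the frame of seq starting at `offset` contains a ter codon."""
    return any(seq[i:i + 3] in TER for i in range(offset, len(seq) - 2, 3))


def alternative_frames(gene):
    """Blocking status of the five alternative frames of a gene."""
    other = gene.translate(COMP)[::-1]      # the other DNA chain, read 5'->3'
    return {
        ("same", 1): has_ter(gene, 1),
        ("same", 2): has_ter(gene, 2),
        ("other", 0): has_ter(other, 0),
        ("other", 1): has_ter(other, 1),
        ("other", 2): has_ter(other, 2),
    }


def is_unblocked(gene):
    """True if no alternative frame contains a ter codon."""
    return not any(alternative_frames(gene).values())


print(alternative_frames("ATGAATAGCTTAACG"))   # hypothetical, partly blocked
print(is_unblocked("ATGGCCGCCGCC"))            # hypothetical, no blocks at all
```

Applied to every gene of a genome, a filter like `is_unblocked` would single out the genes for which all five alternative RFs are open.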
11. Mathematical Analysis of Code Deviance
The study of blocks for deviant codes made it possible to complete the analysis of one of the most important fundamental problems, the role of all the reinterpreted codons. Data were obtained for the mtDNA of two organisms: H. sapiens (code K1) and A. mellifera (code K2); see Table 4.
It follows from the table that for the mtDNA of H. sapiens (K1) all the reinterpreted codons (ATA(Ile) → Met, TGA(ter) → Trp, AGA(Arg) → ter, AGG(Arg) → ter) participate in the blocking process, and the reinterpreted TGA(ter) → Trp codon is also involved in gene overlaps forbidden for the standard code. For the mtDNA of A. mellifera (K2), all the reinterpreted codons (AGA(Arg) → Ser, AGG(Arg) → Ser, ATA(Ile) → Met) participate in the blocking process; in addition, two such codons participate in overlaps: TGA(ter) → Trp and ATA(Ile) → Met. Thus, we have shown that all the reinterpreted codons in each of these two deviant codes are used either in the process of blocking, or in the process of overlapping genes (forbidden, of course, for the standard code), or in both processes.
Figure 9. The KLK8 gene from the human genome and its 5 alternative RFs. None of these RFs contains a single termination codon; compare with Figure 7. Here, the standard one-letter coding is used for amino acids.
Table 4. Summary table of the participation of reinterpreted codons in two functions: in blocking (column 1; the sign + corresponds to participation, the sign − to non-participation) and in genetic overlaps (column 2). Data were obtained for the mtDNA of two organisms: H. sapiens (code K1) and A. mellifera (code K2). The reinterpreted codons are indicated in the column "deviations from the standard code".
12. Is the Code Arbitrary?
The results obtained lead to the conclusion that the deviations of codes from the standard one are not random but bear a very clear functional load (cf. "codon reinterpretation indicates that random changes can occur in the genetic code of mitochondria", see the cited work). In the latest monograph we also read: "the code seems to have been selected arbitrarily…". From our results it follows that, for all families of sense codons, two protein sequences can be recorded almost without interference on the same DNA region, and for this purpose the most favorable of the five variants of such a compact record of genes (the 5 cases of overlap) can be used, depending on the combination of amino acids in the overlap. A categorical prohibition exists for no more than about 5% of amino acid pairs, both for the standard code and for all 14 known non-standard codes. That is, all 15 code tables share the same general property. This leaves no room for arbitrariness.
13. The Sets of Elementary Overlapping
The main working sets in this theory are the sets of elementary genetic overlaps, which were first presented on pages 13-27 of the cited work; examples are given in Figure 10. An elementary overlap is an overlap for single amino acids.
Such complete sets have been used repeatedly in the construction of this theory: first in the proof of the theorem for the genetic code, and then these sets were modified 14 times (according to the number of deviant codes) to obtain the first integral characteristic of these 14 codes, presented in Table 2. The most important stage of the research was the mathematical analysis of ambiguities in these sets. An ambiguity is a case in which more than one elementary overlap exists for the same pair of amino acids. Like all special cases in mathematics, this phenomenon attracted our attention. It is important to note that the results obtained are applicable to the whole diversity of living nature, whose proteins are recorded by almost the same genetic code. The analysis showed that the ambiguities occur only in cases of overlapping genes belonging to different DNA chains. Their total number turned out to be relatively small: only 6. The study revealed three possible functions of these ambiguities. One of them was established within the new model proposed earlier by the author: overlapping pairs of genes belonging to different DNA chains are mathematical analogues of the stems of the secondary structure of messenger RNA. It is shown that, owing to the ambiguities, it is possible to "regulate" the free energy of a stem, a functionally significant biochemical characteristic. Now about the other two functions. The first is related to the problem of the potential positions of silent mutations in overlaps of genes belonging to different DNA chains. The second is related to extending the possibilities of constructing genetic overlaps of more than two genes; the structure of possible overlaps of 6 genes in the human genome is analyzed.
Figure 10. Some elementary overlaps from the five sets corresponding to the five cases of possible overlaps of gene pairs from Figure 4. Symbols N: A, C, T, G; M: A, T, C; X: A, G; Y: T, C; Z: A, C.
The study of the spatial structure of DNA showed that, in addition to the three families of double helices with antiparallel orientation of the strands, DNA double helices with parallel orientation of the strands can also form. A mathematical analysis of such cases is given in section 4.4 of the cited work, where we analyzed all three new cases of overlapping pairs of genes.
On the basis of the constructed theory, one of the breakthrough problems, the problem of calculating the genetic code, was solved. Problems of this kind were previously unknown and could be posed only in the 21st century. One of the approaches to solving this problem is given in the cited article. The mathematical theory of the genetic code is constantly evolving. We would like to point out the last two works in this direction.
The mathematical analysis of large genomes is a topical problem in connection with the development of genome decoding methods. By now, the genomes of man and some other organisms have been decoded. The first paper presents a numerical analysis of some characteristics of the genetic code common to all these genomes. The results make it possible to formulate a new property of the genetic code for the overlap of 6 and of 3 genes from one DNA chain: the choice of the three terminator codons from the 64 possible triplets of the genetic code has virtually no effect on the cardinality of the sets of nucleotide chains that allow six-fold or three-fold overlap of genes.
The second article is connected with a mathematical analysis that made it possible to formulate the following property of the three terminator codons of the standard genetic code (these codons stop protein synthesis and do not encode any amino acid) in comparison with other theoretically possible triples. For any choice of three terminator codons, one can find a DNA chain of length 11 nucleotides on which translation is completely forbidden. For the three terminator codons of the standard genetic code, on any DNA chain of length 10 nucleotides or less at least one reading frame is open, i.e. translation is possible in at least one of the three reading frames. This length is the maximum possible; for another choice of three terminator codons it may be smaller. The number of terminator triples with this property is 2280 (out of the 41,664 possible triples), and the probability that a randomly selected triple of terminator codons falls into this group is less than 0.06.
The author thanks the brilliant translator O. N. Kozlova, who translated this text from Russian.
The work was supported by the Russian Foundation for Basic Research (project codes 16-01-00018, 17-01-00053).
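The stated property of the standard terminator triple can be checked with a small search. The Python sketch below is our own illustration, not the method of the cited article: it finds the shortest chain on which all three reading frames of one DNA chain contain a terminator codon, by extending chains one nucleotide at a time and remembering only the last two nucleotides and the set of frames already blocked. The second triple in the usage lines is hypothetical.

```python
from itertools import product


def min_blocked_length(ter, max_len=30):
    """Shortest chain length at which all three frames can contain a ter codon.

    A state is (last two nucleotides, bit mask of blocked frames); the triplet
    that ends a chain of length n starts at index n - 3, i.e. in frame (n - 3) % 3.
    Returns None if no such chain of length <= max_len exists.
    """
    ter = set(ter)
    states = {("".join(p), 0) for p in product("ACGT", repeat=2)}
    for n in range(3, max_len + 1):
        frame_bit = 1 << ((n - 3) % 3)
        next_states = set()
        for tail, mask in states:
            for nt in "ACGT":
                triplet = tail + nt
                new_mask = mask | frame_bit if triplet in ter else mask
                if new_mask == 7:           # frames 0, 1 and 2 are all blocked
                    return n
                next_states.add((triplet[1:], new_mask))
        states = next_states
    return None


print(min_blocked_length({"TAA", "TAG", "TGA"}))   # standard terminator codons
print(min_blocked_length({"AAA", "CCC", "GGG"}))   # hypothetical triple
```

For the standard triple the search returns 11, so every chain of 10 nucleotides or less leaves at least one frame open; the reason is that the standard terminator codons cannot overlap one another, since T occurs only in their first position. For the hypothetical triple containing AAA the chain AAAAA already blocks all three frames.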
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.
 Sanger, F., Coulson, A.R., Friedmann, T., Air, G.M., Barrell, B.G., Brown, N.L., Fiddes, J.C., Hutchison III, C.A., Slocombe, P.M. and Smith, M. (1978) The Nucleotide Sequence of Bacteriophage ΦX174. Journal of Molecular Biology, 125, 225-246.
 Guyader, M., Emerman, M., Sonigo, P., Clavel, F., Montagnier, L. and Alizon, M. (1987) Genome Organization and Transactivation of the Human Immunodeficiency Virus Type 2. Nature, 326, 662-669.
 Kozlov, N.N. (2014) One Integral Characteristic of the Set of Genetic Codes. The Property of All Known Natural Codes. Mathematical Models and Computer Simulations, 6, 622-630.
 Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaekers, A., Van den Berghe, A., Volckaert, G. and Ysebaert, M. (1976) Complete Nucleotide Sequence of Bacteriophage MS2 RNA: Primary and Secondary Structure of the Replicase Gene. Nature, 260, 500-507.
 Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H.L., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.H., Staden, R. and Young, I.G. (1981) Sequence and Organization of the Human Mitochondrial Genome. Nature, 290, 457-464.
 Clary, D.O. and Wolstenholme, D.R. (1985) The Mitochondrial DNA Molecule of Drosophila yakuba: Nucleotide Sequence, Gene Organization, and Genetic Code. Journal of Molecular Evolution, 22, 252-271.
 Cantatore, P., Roberti, M., Rainaldi, G., Gadaleta, M.N. and Saccone, C. (1989) The Complete Nucleotide Sequence, Gene Organization, and Genetic Code of the Mitochondrial Genome of Paracentrotus lividus. The Journal of Biological Chemistry, 264, 10965-10975.