Advances in antiretroviral therapy have turned HIV infection from a fatal disease into a chronic, manageable one. However, treatment is effective only until HIV develops resistance to the administered drugs. The fundamental problem is that retroviruses exploit their genetic instability as an evolutionary advantage that promotes adaptive mutations. HIV has high genetic variability, which results from its fast replication cycle coupled with a high mutation rate. Consequently, HIV can respond rapidly to the selective pressures imposed by the immune system and antiretroviral drugs. Drugs target specific molecules, which are almost always proteins. Because each drug is so specific, a mutation in its target molecule can interfere with or negate its effect, resulting in drug resistance.
Biological systems are prime examples of complex dynamical systems, so the emergence of mutations is one of their fundamental properties. The proposed model uses a linear equation to draw a mutability map for simple genomes such as that of the human immunodeficiency virus. The ability to calculate the probability of spontaneous mutation in a specific gene gives an overview of how likely resistance is to emerge in the protein translated from that gene, which is useful during antiviral drug development. Likewise, this ability is useful when designing combination therapy protocols, to help predict in vivo antiviral drug activity and to deal with mutation-induced drug resistance.
2. Model Equation
The mutability map of the HIV-1 genetic pool is formulated from a linear relation in which the probability of spontaneous mutation emergence in a specific gene per duplicate (Pg) is directly proportional to the ratio of the gene length (g) to the whole genome length (G):
$$P_g = S_i \times \frac{g}{G}$$
wherein (Si) is the stability index, which is a genome-specific fixed value. For the whole genome, g = G, and so:
$$S_i = P_G$$
wherein (PG) is the probability of spontaneous mutation emergence in the genome per duplicate. Thus, the stability index represents the degree of stability of a genome.
The mutation rate of HIV-1 is approximately 3 × 10⁻⁵ per nucleotide base per cycle of replication. The HIV-1 genome contains 9181 bases and, accordingly, the stability index of the HIV-1 genome is:
$$S_i = P_G = 3 \times 10^{-5} \times 9181 \approx 0.275$$
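As a minimal sketch of the calculation described above, the following Python snippet computes the stability index from the quoted mutation rate and genome length, then applies the linear relation in which Pg equals Si multiplied by g/G. It assumes that Si is taken as the whole-genome probability PG (the case g = G). The function names and the 1000-base gene are hypothetical, chosen only for illustration; they are not values from the tables.

```python
# Values quoted in the text for HIV-1.
MUTATION_RATE = 3e-5   # mutations per base per replication cycle
GENOME_LENGTH = 9181   # bases (G)


def stability_index(mutation_rate: float, genome_length: int) -> float:
    """Si, assumed here to equal the whole-genome probability PG (case g = G)."""
    return mutation_rate * genome_length


def gene_mutation_probability(si: float, gene_length: int, genome_length: int) -> float:
    """Pg = Si * (g / G): the linear relation proposed in the text."""
    if not 0 < gene_length <= genome_length:
        raise ValueError("gene length must be between 1 and the genome length")
    return si * gene_length / genome_length


si = stability_index(MUTATION_RATE, GENOME_LENGTH)

# Hypothetical 1000-base gene, used only to illustrate the call.
pg_example = gene_mutation_probability(si, 1000, GENOME_LENGTH)

print(f"Si = {si:.5f}")
print(f"Pg for a hypothetical 1000-base gene = {pg_example:.5f}")
```

Because Si is itself the product of the per-base rate and G, the result for a gene reduces to the per-base rate multiplied by the gene length, which is a quick way to sanity-check any table entry.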
A mathematical analysis using the proposed equation was performed, and the data are collected in three tables. Table 1 describes the analysis of the HIV-1 genetic pool, which indicates that the probability of spontaneous mutation emergence is lowest for tat, vpr and vpu. Table 2 describes the detailed analysis of the probability of spontaneous mutation emergence in the components of the pol gene: the reverse transcriptase, integrase and protease genes. These genes are translated into the main target proteins of antiretroviral therapy. The analysis indicates that the reverse transcriptase (RT) gene is the most mutation-prone of the pol components and the protease (PROT) gene is the least. Table 3 describes the analysis of the structural genes gag and env, which indicates that gp120 is more susceptible to spontaneous mutation emergence, and so has a higher diversity, than gp41.
Table 1. Analysis of the probability of spontaneous mutation in the HIV-1 genetic pool, wherein (g) is the gene length, (Pg) is the probability of spontaneous mutation emergence in a specific gene and (% PG) is the gene's percentage of the spontaneous mutation probability of the whole genome.
Table 2. Probability of spontaneous mutations in the RT, INT and PROT genes of HIV-1, which are translated into the main target proteins of antiretroviral therapy.
Table 3. Analysis of the probability of spontaneous mutations in the structural genes gag and env.
Spontaneous mutation can arise from a variety of sources and, whatever the cause, a large gene provides a large target and tends to mutate more frequently. Thus, the probability of spontaneous mutation is related to the ratio between the gene length and the whole genome length. This basic linear relation is used to formulate an equation that calculates the probability of emergence of spontaneous mutation in a certain gene per duplicate, depending on the ratio between the gene length and the whole genome length (g/G) in addition to the fixed genome-specific stability index (Si).
The obstacles that halt the development of HIV vaccines are the high mutability and variability of the virus. The mathematical analysis of each gene in the HIV-1 genome (Table 1) indicates that tat, vpr and vpu are the least mutation-prone genes per duplicate, so they are the best candidates for HIV-1 recombinant subunit vaccines or as part of “prime and boost” vaccine combinations. The analysis also indicates that the probability percentage of spontaneous mutation emergence in the three major genes of the HIV-1 genome, gag, pol and env, is 16.36%, 32.8%, and 28%, respectively. The pol gene, which is translated into the viral enzymes, is therefore the gene most susceptible to spontaneous mutations. These enzymes are currently the main targets of antiretroviral therapy, and further analysis of the pol gene indicates that the reverse transcriptase (RT) gene is the most mutation-prone of its components.
The probability percentage of spontaneous mutations in RT accounts for 22.63% of the total probability of spontaneous mutation emergence of the whole HIV-1 genome. Despite the high mutability of their target, reverse transcriptase inhibitors should remain a backbone of any highly active antiretroviral therapy (HAART). Reverse transcriptase, owing to its recombinogenic properties and its lack of proofreading activity, is the core source of mutations in the HIV replication cycle. On the other hand, the protease (PROT) gene is the least mutation-prone of the pol components. The probability percentage of spontaneous mutations in PROT accounts for 3.23% of the total probability of spontaneous mutation emergence of the whole HIV-1 genome (Table 2). Accordingly, protease inhibitors are better candidates as a base for antiretroviral combination therapy, and a protease inhibitor-based regimen represents a high genetic barrier for HIV to overcome.
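The percentages quoted in the discussion can be converted back into per-duplicate probabilities by multiplying each share by the stability index. The sketch below does this for the five values stated in the text, assuming Si is the product of the quoted mutation rate (3 × 10⁻⁵) and genome length (9181 bases). The dictionary and function names are hypothetical.

```python
# Stability index, assumed to be mutation rate x genome length as quoted in the text.
SI = 3e-5 * 9181

# Share of the whole-genome mutation probability (% PG) as stated in the discussion.
percent_of_genome = {
    "gag": 16.36,
    "pol": 32.8,
    "env": 28.0,
    "RT": 22.63,
    "PROT": 3.23,
}


def pg_from_percent(si: float, percent: float) -> float:
    """Convert a share of the whole-genome probability into Pg per duplicate."""
    return si * percent / 100.0


per_gene = {name: pg_from_percent(SI, pct) for name, pct in percent_of_genome.items()}

# List the genes from most to least mutation-prone under the linear model.
for name, pg in sorted(per_gene.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>4}: Pg = {pg:.4f}")

# Relative mutability of RT versus PROT implied by the two quoted percentages.
rt_to_prot = percent_of_genome["RT"] / percent_of_genome["PROT"]
print(f"RT / PROT ratio = {rt_to_prot:.2f}")
```

Under these assumptions the model places RT at roughly seven times the per-duplicate mutation probability of PROT, which is the quantitative basis for the genetic-barrier argument above.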
The proposed mathematical analysis is supported by clinical data. For example, the United Kingdom has one of the highest reported rates of primary resistance to HIV drugs worldwide. The UK Group on Transmitted HIV Drug Resistance reported that the prevalence of resistance to any antiretroviral drug, to nucleoside or nucleotide reverse transcriptase inhibitors (NRTI), to non-nucleoside reverse transcriptase inhibitors (NNRTI), and to protease inhibitors (PI) was 19.2%, 12.4%, 8.1%, and 6.6%, respectively. In Spain, a study reported a prevalence of 5.8% for NRTI, 5% for NNRTI and 3.8% for PI. In Turkey, a study reported that the percentage of HIV-1 primary drug resistance mutations in antiretroviral therapy-naive patients was 5.2% for NRTI, 3.1% for NNRTI and 2.1% for PI. In Djibouti, a study indicated that among 16 patients with first-line ART failure, 56.2% showed reverse transcriptase inhibitor-resistant HIV-1 strains, whereas no protease inhibitor-resistant strains were detected. All these findings indicate that resistance to protease inhibitors emerges much less frequently than resistance to reverse transcriptase inhibitors.
In a wider scope, the main advantage of the proposed mathematical approach is that it provides a linear equation for calculating the probability of spontaneous mutation per duplicate for simple viral genomes. Further analysis is needed, however, before this equation is used to make a mutability map for more complicated bacterial or eukaryotic genomes. If the equation is applicable to these more complex genomes, it will indicate that noncoding genome segments, which are present in the genomes of prokaryotes and eukaryotes in different proportions, not only perform regulatory functions but also protect the genetic information of the coding genome by providing a wider genetic pool.
Moreover, the proposed equation is useful for interpreting antiviral drug activity, as the mutability of the targeted protein plays an integral role in determining in vivo drug activity. Furthermore, the equation provides a general picture of the mutability of each gene in a targeted viral genome. This can be helpful during drug development research and when designing combination therapy protocols: developers can target proteins translated from the genes that mutate relatively less often.
On the other hand, the main disadvantage of the proposed equation is the bias that can arise when numerical values are assigned to its terms. As an example, the numerical value of the stability index for the HIV-1 genome can be biased. Mansky and Temin reported that the forward mutation rate for HIV-1 was 3.4 × 10⁻⁵ mutations per bp per cycle, while Cuevas et al. used the intrapatient frequency of premature stop codons to quantify the HIV-1 genome-wide rate of spontaneous mutation in DNA sequences from peripheral blood mononuclear cells, which revealed a mutation rate of (4.1 ± 1.7) × 10⁻³ per base per cell.
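The size of this numerical bias can be illustrated by recomputing the product of mutation rate and genome length for each reported rate. The sketch below assumes the stability index is obtained as that product, uses only the central estimate from Cuevas et al., and ignores the fact that the two studies report rates in different units (per cycle versus per cell). The labels and variable names are hypothetical.

```python
GENOME_LENGTH = 9181  # bases, as quoted in the text

# Mutation rates per base mentioned in the text.
reported_rates = {
    "approximate value used for the model": 3e-5,
    "Mansky and Temin": 3.4e-5,
    "Cuevas et al. (central estimate)": 4.1e-3,
}

# Product of rate and genome length, i.e. the candidate stability index.
products = {label: rate * GENOME_LENGTH for label, rate in reported_rates.items()}

for label, value in products.items():
    # A value above 1 cannot be a probability; it can only be read as an
    # expected number of mutations per genome.
    note = "" if value <= 1 else "  (exceeds 1, not usable as a probability)"
    print(f"{label}: rate x G = {value:.4f}{note}")
```

With the higher rate, the product exceeds 1 and no longer behaves as a probability, which shows how strongly the numerical form of the equation depends on the chosen mutation rate.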
In addition, the emergence of antiviral drug resistance is a multifactorial process, and the proposed equation only provides the probability of spontaneous mutation emergence in a specific gene; it does not determine which of these emergent mutations are lethal and which are not. Lethally mutated viral genomes fail to reach the plasma, so a large share of the emergent mutations is eliminated.
The mathematical analysis of the HIV-1 genome indicates that tat, vpr and vpu are the least mutation-prone genes per duplicate, so they are the best candidates for HIV-1 recombinant subunit vaccines or as part of “prime and boost” vaccine combinations. In addition, protease inhibitors are better candidates as a base for antiretroviral combination therapy, and a protease inhibitor-based regimen represents a high genetic barrier for HIV to overcome. In a wider scope, the proposed equation offers drug developers and those designing drug combination protocols a wider array of options to help predict in vivo antiviral drug activity and to deal with mutation-induced drug resistance.
Availability of Data and Materials
All datasets, on which the conclusions of the manuscript rely, are presented in the main paper.