An Optimal Control Approach to Structured Treatment Interruptions for HIV Patients: A Personalized Medicine Perspective

Show more

1. Introduction

There are few diseases in our society today that are more polarizing, politicized, and personal than human immunodeficiency virus/acquired immune deficiency syndrome, or HIV/AIDS. According to the Joint United Nations Programme on HIV and AIDS (UNAIDS), there were 34 million people living with HIV/AIDS in 2010, 22.9 million of which live in sub-Saharan Africa. Despite the prevalence and availability of effective treatment of HIV, even as late as 2010 the best practices for management of acute HIV infection are still unknown [1] . The time to initiate treatment and the manner of treatment are still the subject of many ongoing clinical trials that are investigating the treatment benefits versus risks.

Throughout the last few decades, mathematical models have been developed to further the understanding of the progression of the virus as it replicates in-host. These models focus on the interaction between the virus and the CD4^{+} cells of the immune system, one of the virus’ primary targets [2] [3] [4] . The early models of progression were critical in the understanding of the virus and the development of early treatments, as the science and understanding were lagging behind the impact of the spreading epidemic.

The models of in-vivo dynamics are most often formulated as a system of nonlinear ordinary differential equations (ODE), though recently several size structured partial differential equation models have been developed. These models allow for variations in viral production rates as a function of time, since it is thought that as infected cells age their overall fitness decreases [5] [6] . The ordinary differential equation models that we utilize are based on a compartmental and mass-balance analysis of the biological system. These models are developed to incorporate as many relevant physiological principles as possible, though at the cost of increased model complexity.

A primary goal of using modern models of HIV is to influence treatment decisions and construct better treatment protocols for infected patients. The construction of optimal control methodologies has been investigated by many authors, [7] - [16] , however in almost all of these cases the authors worked with simulated data and were not patient specific in nature. To use these dynamic models in a patient specific capacity, the models must first be calibrated to each individual patient. While much attention has been paid to the forward simulation problem over the years, less attention has been focused on the so-called inverse problem. Inverse problems are formulated when we wish to determine the parameters that characterize the system using measurements. One of the main obstacles in solving the inverse problem is that biological and physiological models are often nonlinear and contain many parameters. Additionally, the data for validation is often sparse, or possibly representative of only a portion of the model. Before parameter estimation methods can be applied to an ODE model, an investigation into the identifiability of the parameters, given available data, must be performed.

The investigation into parameter identifiability is not unique to the mathematical sciences, and many papers in engineering, statistics, and biomedical engineering have shed light into this area [17] [18] . Identifiability in an HIV model context has been examined in [19] [20] where structural and observer perspectives were used with a less complex model. In this paper we consider the parameter identifiability of a seven state, twenty two parameters HIV model based on clinically obtained measurements of both CD4^{+} count and viral load. We utilize both sensitivity analysis and identifiability analysis methods to compute patient specific best-estimatable subsets of parameters. In [21] an identifiability analysis was conducted on an identical model, however the authors used a different, ad-hoc subset selection procedure.

Finally, the paper investigates the construction of optimal control and treatment therapies in a patient specific context, i.e. in the setting of personalized medicine or precision medicine. We develop treatments known as structured treatment interruptions (STI), wherein the effective treatment is terminated at specified times and then restarted in an attempt to increase an in-host immune response and provide a drug holiday for the patient. These treatment methods have been studied in detail, though a lack of double-blind, randomized clinical trials has lead to a lack of consensus on the effectiveness of these types of treatments [22] [23] [24] [25] . We implement the optimal STI using the receding horizon control (RHC) methodology [12] . Since RHC is a feedback control, it allows for adjustment for unexpected perturbations to the treatment schedule, e.g. the patient decides to go off-treatment for a period of time not otherwise prescribed.

The organization of the paper is as follows. Section 2 describes the mathematical model of the HIV dynamics, including the control mechanisms. This section also describes the a-priori identifiability analysis, the available data and its incorporation into the parameter estimation problem, and the results of the model validation. In particular, we show how the model can well-fit both the CD4^{+} count and viral load clinical data. The optimal control formulation to compute optimal treatment strategies of an STI type is considered in Section 3. In this section we also provide computed optimal control simulations along side the clinical data and discuss the relevance.

2. An HIV Model

There have been numerous advances in modeling HIV viral dynamics in recent years. An excellent general overview of the modeling challenges of HIV, especially with regards to treatment protocols and stability analyses is given in [3] . In this section, we consider a model developed in [26] .

The dynamics of the HIV model are described by the system of nonlinear ordinary differential equations

$\begin{array}{l}{\stackrel{\dot{}}{T}}_{1}={\lambda}_{1}-{d}_{1}{T}_{1}-\left(1-{\epsilon}_{1}\right){k}_{1}{V}_{I}{T}_{1},\text{\hspace{0.17em}}{\stackrel{\dot{}}{T}}_{2}={\lambda}_{2}-{d}_{2}{T}_{2}-\left(1-f{\epsilon}_{1}\right){k}_{2}{V}_{I}{T}_{2}\\ {\stackrel{\dot{}}{T}}_{1}^{\ast}=\left(1-{\epsilon}_{1}\right){k}_{1}{V}_{I}{T}_{1}-\delta {T}_{1}^{\ast}-{m}_{1}{T}_{1}^{*}E,\text{\hspace{0.17em}}{\stackrel{\dot{}}{T}}_{2}^{\ast}=\left(1-f{\epsilon}_{1}\right){k}_{2}{V}_{I}{T}_{2}-\delta {T}_{2}^{\ast}-{m}_{2}{T}_{2}^{*}E\\ {\stackrel{\dot{}}{V}}_{I}=\left(1-{\epsilon}_{2}\right){N}_{T}\delta \left({T}_{1}^{\ast}+{T}_{2}^{\ast}\right)-\left(c+\left(1-{\epsilon}_{1}\right){\rho}_{1}{k}_{1}{T}_{1}+\left(1-f{\epsilon}_{1}\right){\rho}_{2}{k}_{2}{T}_{2}\right){V}_{I}\\ {\stackrel{\dot{}}{V}}_{NI}={\epsilon}_{2}{N}_{T}\delta \left({T}_{1}^{\ast}+{T}_{2}^{\ast}\right)-c{V}_{NI},\text{\hspace{0.17em}}\stackrel{\dot{}}{E}={\lambda}_{E}+{b}_{E}\frac{{T}_{1}^{\ast}+{T}_{2}^{\ast}}{{T}_{1}^{\ast}+{T}_{2}^{\ast}+{K}_{b}}E-{d}_{E}\frac{{T}_{1}^{\ast}+{T}_{2}^{\ast}}{{T}_{1}^{\ast}+{T}_{2}^{\ast}+{K}_{d}}E-{\delta}_{E}E,\end{array}$ (1)

with an initial condition vector

${\left[{T}_{1}\left(0\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}{T}_{2}\left(0\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}{T}_{1}^{\ast}\left(0\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}{T}_{2}^{\ast}\left(0\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}{V}_{I}\left(0\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}{V}_{NI}\left(0\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}E\left(0\right)\right]}^{{\rm T}}\mathrm{.}$

The model compartments are
${T}_{1}$ (type 1 target cells, e.g. CD4^{+} T-cells, cells/ ml blood),
${T}_{2}$ (type 2 target cells, e.g. macrophages, cells/ml blood),
${V}_{I}$ (infectious free viron particles, RNA copies/ml blood),
${V}_{NI}$ (non-infectious free viron particles, RNA copies/ml blood) and
$E$ (cytotoxic T-lymphocytes (CTL), cells/ml blood). A superscript asterisk denotes infected cells. This model has two important features. The presence of a second target cell type,
${T}_{2}$ satisfies a modeling requirement suggested in [4] : a reasonable HIV model predicts a non- zero steady-state viral load even in the presence of effective therapy. Because of this, one can not expect the viral load to be driven to zero, only to be reduced significantly. This HIV specific immune response state
$E$ consists of CTLs lysing antigens to HIV infected cells, killing them at rates
${m}_{1}$ and
${m}_{2}$ for each class of target cell. The immune response model used in this work does not directly clear viral particles, and so there is no interaction between the immune compartment and the virus compartments. In [27] the authors extended the model by adding a second immune effector state that represents CTL cells with memory. While that model incorporates a greater degree of HIV dynamics, it is shown in [21] that the model used in this paper is sufficient for clinical data fitting and model prediction.

External mechanisms by which the virus is controlled are incorporated into the model. We allow for therapeutic treatment by reverse transcriptase inhibitors (RTIs) and protease inhibitors (PIs). RTIs interfere with the viral RNA to DNA synthesis; when reverse transcriptase is inhibited HIV can enter a cell but the host cell will not become infected. PIs cause infected cells to produce non- infectious virions. Thus when considering treatment that includes PIs, we consider both infectious and noninfectious viral particles. Noninfectious viral particles can remain present even in the absence of PIs. The prevailing paradigm of HIV treatment is to prescribe a highly active antiretroviral therapy (HAART) consisting of both PIs and RTIs. The treatment factors are given as ${\epsilon}_{1}\left(t\right)={\stackrel{\xaf}{\epsilon}}_{1}u\left(t\right)$ and ${\epsilon}_{2}\left(t\right)={\stackrel{\xaf}{\epsilon}}_{2}u\left(t\right)$ , consisting of efficacy ${\stackrel{\xaf}{\epsilon}}_{1}\in \left[\mathrm{0,1}\right]$ corresponding to a reverse transcriptase inhibitor and ${\stackrel{\xaf}{\epsilon}}_{2}\in \left[\mathrm{0,1}\right]$ modeling the effectiveness of a protease inhibitor with a time dependent function $u\left(t\right)\mathrm{,0}\le u\left(t\right)\le 1.$ We assume that the treatment has a reduced efficacy for the population of ${T}_{2}$ , where the efficacy is $f{\epsilon}_{1}\left(t\right)\mathrm{,}f\in \left(\mathrm{0,1}\right)\mathrm{.}$ Treatment protocols are always implemented as a combination of both PIs and RTIs, so monotherapy is not considered.

The model (1) contains 22 biologically relevant parameters that must be specified before numerical simulations can be performed. We evaluate the model at a nominal set of parameter values that is justified through literature and past clinical studies; these values are given in Table 1. The parameter values are taken primarily from work done by [4] [28] . Several of the parameters are not available from human or animal data. The nominal values for ${\lambda}_{1}\mathrm{,}{k}_{1}\mathrm{,}{\lambda}_{2}$ and ${k}_{2}$ are

Table 1. HIV model parameters and their nominal values.

chosen so that several conditions on viral load and target cell equilibria are satisfied. These conditions are actually not reflected in our model since we have omitted the chronically infected state from the Callaway-Perelson model, though it is believed that small adjustments to the parameter values will obtain the same qualitative behavior [26] .

The parameters used in the immune response state are not well known, and are chosen primarily to exhibit expected model behavior in the simulations. The parameters ${m}_{1}\mathrm{,}{m}_{2}$ represent the effectiveness that the immune response $E$ clears the infected cells of the target populations. The commonly used values are taken from [4] .

Because the range of both the state variables and parameter values can vary over several orders of magnitude, we use a ${{\displaystyle log}}_{10}$ transformed system for the state variables. We define the new state variable ${x}_{i}\left(t,q\right)={\mathrm{log}}_{10}{\stackrel{\xaf}{x}}_{i}\left(t,q\right)$ , where ${\stackrel{\xaf}{x}}_{i}$ represents the original, non-log transformed state. We also log-transform the parameters, $q\to {\mathrm{log}}_{10}q\mathrm{,}$ as they also varying widely in magnitude (see Table 1). The new system of equations are then given by

${\stackrel{\dot{}}{x}}_{i}\left(t,q\right)=\frac{{10}^{-{x}_{i}}}{\mathrm{ln}10}f\left({10}^{{x}_{i}\left(q\right)},{10}^{q}\right)$ (2)

where $f$ is the right hand side of (1). Performing this substitution puts both the states and parameters within an order of (log) magnitude of each other, allowing for a more robust estimation procedure. This transformation also resolves issues of physically unrealistic negative solutions that occur due to round- off error.

2.1. Sensitivity and Identifiability Analysis

A first step in the a-priori identifiability analysis of a system is to determine which parameters contribute greatest to the output of the system. For our model, the output of the system is
${y}_{1}={T}_{1}+{T}_{1}^{\ast}$ representing the CD4^{+} count and
${y}_{2}={V}_{I}+{V}_{NI}$ representing the viral load. Sensitivity analysis has traditionally primarily been used in the analysis of the forward problem when one needed to understand the effects of parameter perturbation on the output. In recent years sensitivity analysis has become an important component of inverse problem analysis, partly because it directly aids in uncertainty analysis. When physiologically based mathematical models are developed, the parameters typically have biological relevance, and thus one may wish to estimate these parameters from data. If these parameters are insensitive then they could be difficult to estimate despite their relevance. Sensitivity analysis determines the degree to which parameters affect system output. Sensitivity analysis can also aid in the optimal design of experiments and data collection [29] , possibly maximizing parameter sensitivity.

Sensitivity analysis constitutes a subset of techniques within identifiability analysis. An excellent survey of methods for identifiability in ODE models is given in [30] . In that paper, the authors define a system to be identifiable if the parameter $q$ can be uniquely determined from the system input $u\left(t\right)$ and the measured system output $y\left(t\right)$ . The system is said to be unidentifiable if $q$ can not be uniquely identified. A wide variety of techniques have been developed for identifiability analysis, including power series expansions and similar transforms, the use of differential algebras, and the implicit function theorem. These methods are not dependent on experimental observations, and are termed methods of structural identifiability. We concern ourselves primarily with practical identifiability analysis, which combines both the model structure information and experimental observations to determine which parameters are able to be identified. For a detailed discussion of our implementation of the practical identifiability analysis we refer the reader to [31] .

2.2. Clinical Data

The data available have come from a clinical study at Massachusetts General Hospital of over 100 adults with acute HIV infection between 1996 and 2004. The data collected consists of CD4^{+} T-lymphocyte count (cells/ml) and RNA viral load (RNA copies/ml) represented by
${y}_{1}={T}_{1}+{T}_{1}^{\ast}$ and
${y}_{2}={V}_{I}+{V}_{NI}$ , respectively. The viral load is quantified through different assay techniques; the standard assay can detect viral loads in the range of 400 to 750,000 copies/ml, while an ultra-sensitive assay has a detection range of 50 to 100,000 copies/ml. The ultra-sensitive assay is used when a measurement is below the 400 copies/ml lower detection limit for the standard assay. This can often be the case when patients are successfully suppressing the virus due to HAART.

Due to the range sensitivity limits of the assays, the clinical viral load effectively have upper and lower limits of quantification. The upper limit of quantification is addressed by repeatedly diluting the sample until it falls within the detection range. The lower limit, or left censoring point, directly affects the collected data and how it is used within the inverse problem. When a datum is returned at a censor point, ${L}_{1}=400$ copies/ml or ${L}_{2}=50$ copies/ml, the only available knowledge is that the true measurement is between zero and the left quantification limit. These limits are illustrated in the longitudinal data set from patient identified by the number 1, shown in Figure 1. The data points appearing directly on the horizontal reference lines are said to be censored.

In addition to the data being censored at certain time values, the data are also collected at different time intervals. These intervals also differ between patients. This is visible in Figure 1 as well; the observations of viral load and CD4^{+} may not have been made at the same time points. In general for patient
$j$ , we have CD4^{+} data pairs
$\left({t}_{1}^{ij},{y}_{1}^{ij}\right),i=1,\cdots ,{N}_{1}^{j}$ and possibly different viral RNA pairs
$\left({t}_{2}^{ij},{y}_{2}^{ij}\right),i=1,\cdots ,{N}_{2}^{j}$ .

2.3. Model Validation

Before we use the HIV model given in (1) to design the optimal treatment for

Figure 1. Patient 1 data set, with the censor points indicated by horizontal lines, ${L}_{1}={\mathrm{log}}_{10}400,{L}_{2}={\mathrm{log}}_{10}50$ . The green and red bars on the axis indicate patient treatment adherence and non-adherence, respectively.

HIV patients in the context of personalized medicine, we first calibrate the model parameters to the clinical data through the parameter estimation formulation.

The parameter estimation problem is patient specific; we use data from a single patient j in order to estimate the patient specific parameter value ${q}^{j}$ . A common approach for conducting parameter estimation in ODE models is through minimization of an ordinary least squares cost function [33] . To this end, we let ${z}_{s}\left({t}_{s}^{ij}\right),s=1,2$ denote the model observation for patient j, output s, at time index i. That is, ${z}_{1}\left({t}_{1}^{ij}\right)={T}_{1}\left({t}_{1}^{ij}\right)+{T}_{1}^{\ast}\left({t}_{1}^{ij}\right)$ and ${z}_{2}\left({t}_{2}^{ij}\right)={V}_{I}\left({t}_{2}^{ij}\right)+{V}_{NI}\left({t}_{2}^{ij}\right)$ . We define the cost function

$J\left(q\right)={\displaystyle \underset{s=1}{\overset{2}{\sum}}}\frac{1}{{N}_{s}^{j}}{\displaystyle \underset{i=1}{\overset{{N}_{s}^{j}}{\sum}}}{\left|{z}_{s}\left({t}_{s}^{ij};q\right)-{y}_{s}^{ij}\right|}^{2},$ (3)

over an admissible parameter space $\Omega \subseteq {\mathbb{R}}^{\mathcal{l}}$ , where $\mathcal{l}$ is the number of parameters being identified by the practical identifiability analysis. The parameter estimation problem is to compute the unique minimizer of J, ${q}^{\star j}=argminJ\left(q\right)$ , if it exists. Note that uniqueness of the minimizing parameter vector is an issue addressed by the identifiability analysis. Assuming the data are uncensored and produced by the model with additive and independent Gaussian noise, the ordinary least squares estimate is also the maximum likelihood estimate. However, in our case there is censoring in the data as discussed in section 2.2, and least squares does not suffice as a statistically rigorous methodology. Instead we compute maximum likelihood parameter estimates using an expectation maximization (EM) algorithm [32] . Details regarding the use of this algorithm with ODE models can be found in [31] .

We begin the parameter estimation problem by performing for each patient an a-priori sensitivity analysis at the nominal parameter values ${q}^{\ast}$ given in Table 1. As each patient model is evaluated at the same ${q}^{\ast}$ , the only factor that affects parameter identifiability is the patient adherence to treatment.

Before a subset specific estimate is formed, we first attempt to estimate all 22 parameters using the global optimization algorithm DIRECT with the EM algorithm. DIRECT is a bounded, sampling method based on dividing hyper-rec- tangles [34] . The termination criterion for DIRECT is only the number of iterations, and we only allow 3. DIRECT is sometimes used to determine a “better” local initial iterate for gradient based optimizers, and our approach is similar in this regard. The parameters that are not being estimated are fixed to the output of the DIRECT optimization, and it is at this point we begin the gradient based method with EM to further refine the parameter estimates on the identifiable subset.

The number of identifiable parameters range from $k=8$ to $k=10$ . In addition to these parameters, all seven initial conditions were estimated for each patient. Each patient subset is given in Table 2.

Representative of the parameter estimation results are depicted in Figure 2 and Figure 3 for patients identified by number 1 and 3, respectively (for the fit of the model to other patient data see [31] ). In each figure the collected clinical

Table 2. Patient specific identifiable subsets.

Figure 2. Collected and adjusted data with model prediction for patient 1. The red and green bars along the axis indicate patient adherence to therapy. Clinically collected data are in blue, and EM adjusted data are in black. The red trajectory is the calibrated model prediction.

data are given in blue dots as well as the adjusted data (black circles) calculated through the expectation maximization algorithm. The calibrated model prediction is the solid red line. Near the bottom of the figures are structured treatment

Figure 3. Collected and adjusted data with model prediction for patient 3. The red and green bars along the axis indicate patient adherence to therapy. Clinically collected data are in blue, and EM adjusted data are in black. The red trajectory is the calibrated model prediction.

interruption as prescribed by the medical doctors for each patient (green for on-treatment and red for off-treatment). In general, the model fits the clinical data reasonably well.

3. Optimal STI Therapy

Highly Active Antiretroviral Therapy (HAART) has changed the course of HIV treatments since its introduction. Under HAART, viral load decreases exponentially in the weeks after the start of treatment and is maintained at low levels so long as treatment is continued. Despite this exponential decline in viral load, these treatments do not completely clear the virus from the patient, and a life long therapy is necessary to maintain low viral load. Even with this kind of success, the best way to treat acutely infected HIV patients is still an open question with many approaches under investigation [1] . For many patients, long term continuous HAART is expensive and can include problems with drug toxicity and side effects, as well as increased drug resistance [35] . It has been shown that approximately 80% of patients who adhered to less than 70% of the prescribed treatment regimen experienced treatment failure, whereas patients of high adherence (>95%) experienced the best results [36] . Long term use of protease inhibitors has been associated with insulin intolerance, cholesterol elevation, and the redistribution of body fat. Because of these reasons, some HIV infected patients will voluntarily terminate HAART. Some of these patients will also interrupt the continuous prescribed therapies for short or long periods. After discontinuing HAART, patients will usually experience a rapid increase in viral load coupled with a immediate decline in CD4^{+} counts [22] .

The canonical example of a patient undergoing unsupervised breaks in HAART is that of the “Berlin patient” [23] . In this case, the patient was able to control viral load in the absence of treatment by cycling HAART on and off due to non-related infections. When treatment was fully stopped after day 176, the patient’s viral load did not immediately rebound as is consistent with going off treatment. Due to this patient, there have been a lot of interest in the use of structured treatment interruptions (STI) as a mechanism to regulate an HIV infection.

One of the key variables in the design of any STI based protocol is the criteria in which patients are removed or placed back on treatment. In some studies the viral load was the primary indicator as to when treatment would resume, whereas in other studies off periods were of a fixed length, followed by a fixed length on period. The former approach is simpler, but harder to administer as it requires a higher level of patience scrutiny with more expensive and inconvenient blood tests. The latter strategy is easier to implement in larger patient groups, but the on/off periods are still up for debate and the subject of many clinical trials. Currently the period durations are determined either in an ad-hoc style, or by trying to adapt the methodologies of other trials.

The use of mathematical modeling to determine optimal STI regimes has flourished in recent years. The survey paper [37] details the design of STI strategies using HIV models and emphasizes that the use of models may be beneficial in a clinical design scenario. In [38] the authors build upon the original work of [39] and use an in-host HIV model to simulate many possible different STI treatments. In contrast to our work however, there is no patient specific data or parameter estimates used and the STI treatments are not calculated, only evaluated; several insights into the effects of STI treatments are gained through the “virtual STI” trials.

3.1. A RHC Approach to STI Therapy

Although several control paradigms exist for the control and simulation of nonlinear systems, not many nonlinear controllers can be applied to the discrete sample-and-hold methodology of STI treatment. One method that is highly promising and is well suited to finding an optimal treatment amenable to HIV treatment is the receding horizon control (RHC) [40] . Receding horizon control has also explicitly been used to compute optimal treatment schedules in [41] , though with a very different, immune response centric model. The authors were some of the first investigators to apply RHC methodologies to biological systems.

Receding horizon, or model predictive control seeks to gain the benefits of a feedback control while maintaining the overall existing control methodology (i.e. STI) and model structure. The basic structure of the RHC methodology is shown in Figure 4. RHC is a feedback control system that operates by computing the

Figure 4. Schematic diagram of the receding horizon control.

current control from solving a finite time open loop control problem, using the current state of the system as the initial state. This implicit optimization then yields a control for future time, and the first control in the sequence is applied to the system for a specified control window. This is the primary difference with the open loop methodologies, where the controls are computed for the entire simulation interval before simulation.

This framework is uniquely suited for the computation of controls that fall within the STI paradigm under HAART. Since the number of elements in the control sequence is finite due to the on-off nature of the STI control, the existence of an optimal control on each window is guaranteed. Moreover, this approach is model invariant, in that the HIV model can be updated or swapped out with a model with minimal modifications to the control setup. Lastly the RHC parameters such as horizon and window length, discussed later, can be easily modified based on clinical needs and observations.

There are several basic elements to the RHC system, outlined in [42] :

1) Model equations which govern the system dynamics.

2) The calculation of a sequence of optimal control laws, subject to the cost function.

3) A “receding horizon” strategy, so that on each interval or horizon, that the control is computed is shifted forward in time. The control that is computed is then only used for a portion of the horizon length.

To solidify the RHC methodology, we present the following control problem formulation. Together with the model equations (1), we consider a control problem defined by the objective function,

$J\left(u\left(t\right)\right)={\displaystyle {\int}_{0}^{{t}_{f}}}\left[QV\left(t\right)+{R}_{1}{\epsilon}_{1}{\left(t\right)}^{2}+{R}_{2}{\epsilon}_{2}{\left(t\right)}^{2}-SE\left(t\right)\right]\text{d}t,$ (4)

where ${\epsilon}_{1}\left(t\right)={\epsilon}_{1}u\left(t\right)$ , an RTI, and ${\epsilon}_{2}\left(t\right)={\epsilon}_{2}u\left(t\right)$ , a PI. The values ${\epsilon}_{1}$ and ${\epsilon}_{2}$ are the patient specific values estimated in Section 2. The parameters $Q\mathrm{,}{R}_{1}\mathrm{,}{R}_{2}$ and $S$ represent the control weights for the virus, control input, and immune response, respectively. By varying these weights, the optimal control can be customized in a patient specific sense, and exact treatments can be designed. By including the control terms in (4) we attempt to minimize the systematic cost of the drugs both in the sense of physiologic side effects as well as the monetary treatment cost.

For the RHC formulation, we denote $\left[{t}_{i}\text{\hspace{0.17em}}{t}_{i+1}\right]$ a sequence of time intervals. Let ${t}_{hor\mathrm{,}i}$ such that ${t}_{hor,i}\ge {t}_{i+1}-{t}_{i}$ be the control horizon. We denote ${t}_{i+1}-{t}_{i}$ to be ${t}_{win}$ , the current control window. Consider a sequence of control problems ${P}_{i}$ ,

${u}^{*}\in \mathcal{U}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{s}\text{.t}\text{.}\text{\hspace{0.17em}}{u}^{*}=\mathrm{arg}\mathrm{min}J\left(u\right)={\displaystyle {\int}_{{t}_{i}}^{{t}_{i}+{t}_{hor,i}}}\left[QV\left(t\right)+{R}_{1}{\epsilon}_{1}{\left(t\right)}^{2}+{R}_{2}{\epsilon}_{2}{\left(t\right)}^{2}-SE\left(t\right)\right]\text{d}t,$ (5)

on the interval $\left[{t}_{i}\text{\hspace{0.17em}}{t}_{i+1}\right]$ subject to the model equations given by (1). The solution to each control problem ${P}_{i}$ is computed by the direct search routine as outlined in [10] . In this context, the control functions are represented as ${\epsilon}_{1}\left(t\right)={\epsilon}_{1}u\left(t\right)$ and ${\epsilon}_{2}\left(t\right)={\epsilon}_{2}u\left(t\right)$ , where $u\left(t\right)$ is a binary treatment function, with a 1 representing a patient taking the drug during the given treatment interval, and a 0 meaning no treatment taken for that specific time period. The quality of control can depend on the treatment interval, which is the minimum of time necessary to change the treatment protocol. A sample control input is shown in Figure 5. Computing control functions of the form shown in Figure 5 is akin to calculating a series of structured treatment interruptions, rather than a continuous treatment.

Mathematically, we represent the control numerically with a pair of vectors describing the control function $u\left(t\right)$ and the corresponding time intervals the control applies to. For example, for a treatment regimen with a treatment interval of $m=15$ days, a possible control could be $u\left(t\right)=\left[1\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}0\text{\hspace{0.17em}}\text{\hspace{0.17em}}0\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\right]$ . This corresponds to an overall treatment length of 75 days with on treatment for the first 30 days, off for the next 30, and then back on for the final 15 days of treatment. For any treatment interval with n switchings of treatment, there exists a possible ${2}^{n}$ ways to define a treatment. Because this is a finite set, there exists an optimal control that minimizes (5). We denote the set of all control vectors $\mathcal{U}$ , with $\left|\mathcal{U}\right|={2}^{n}$ . We then seek the optimal control,

${u}^{*}\in \mathcal{U}=\mathrm{arg}\mathrm{min}J\left({\epsilon}_{1}\left(t\right),{\epsilon}_{2}\left(t\right)\right).$ (6)

On average, the data sets contain four years of longitudinal patient data,

Figure 5. A sample control input for the HIV model. The switching of treatment represents a structured treatment inter- ruption.

therefore $n\approx 1400$ days. Because n is relatively large, modeling daily changes in treatment is both impractical from a computational and clinical stand point. In practice, an interval of two to three weeks is clinically appropriate and is what we simulate.

If we assume a 30 day treatment interval, modeling the entire control interval could involve the calculation of nearly 50 control values, or switches. As the problem is defined, $m=30$ and $n\approx 1400$ would involve testing upwards of ${2}^{50}$ possible treatment protocols; clearly this is not computationally tractable. To mitigate this issue, we seek to reduce the number of iterations necessary to find the optimal control. One strategy is to break the treatment interval into bins and calculate independent controls on these intervals before combining all of these controls for the entire interval. For instance, for 30 day segments we first compute an optimal control over the set $\left[\mathrm{0,30,60,90,120,150,180}\right]$ , a calculation that involves the testing of ${2}^{7}=128$ distinct possible controls. Once this control is computed, the resulting trajectories are integrated to $t=180$ , and a control is computed over the set $\left[\mathrm{180,210,240,}\cdots \mathrm{,360}\right]$ . This control is then appended to the prior control vector, and this sequence is repeated until the final time ${t}_{f}$ is reached.

Because we divide the treatment interval into several separate bins, the resulting control is only suboptimal; however in [10] they show that these results are reasonable approximations to a fully efficacious continuous therapy. In our control synthesis, we compare 10, 20, and 30 day treatment intervals.

This process is extended by RHC, outlined as follows:

1) Given an initial condition $x\left({t}_{i}\right)$ solve the optimal control problem ${P}_{i}$ on the horizon interval $\left[{t}_{i}\text{\hspace{0.17em}}{t}_{hor\mathrm{,}i}\right]$ .

2) Use the control calculated in the prior step to compute the trajectory over the current control window $\left[{t}_{i}\text{\hspace{0.17em}}{t}_{i+1}\right]$ .

3) Repeat this process by extending to the next control horizon to compute the next control. Terminate when the next ${t}_{i}>{t}_{f}$ .

A schematic overview of this process is given in Figure 6. Note that in this system, each successive control computed on each new window is now implicitly a function of the prior window’s final state-now the current window’s initial condition. The window length and horizon length vary per simulation given different patient parameters, and are listed with each simulation presented.

Receding control depends on having full state knowledge in the observations of the system. Our control functions are computed after the completion of the inverse problem, so we assume our model to be accurate and predictive for each patient, therefore yielding full state knowledge at each time step.

For problems where there exists only a reduced observation of the system, a (typically) nonlinear estimation of the true state values is needed for the RHC to operate successfully. In cases such as this, the estimator is installed between the system model and the optimizer, as shown in Figure 7.

3.2. Optimal STI Results

In this section we present optimal controls and the resulting state solutions for

Figure 6. An example of receding horizon control. Future controls are calculated by computing controls on successive intervals.

Figure 7. Schematic diagram of the receding horizon control with a non- linear state estimator.

patient 3 under utilizing differing control weights and treatment intervals. We wish to illustrate that treatment protocols can be swayed and adjusted on a patient specific basis using the control weights. We also show cases with an “unexpected” perturbation to the optimal treatment, e.g. forced off treatment to demonstrate the feedback capability of the RHC. The control weights are presented as the vector $\left[Q\text{\hspace{0.17em}}\text{\hspace{0.17em}}S\text{\hspace{0.17em}}\text{\hspace{0.17em}}{R}_{1}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{R}_{2}\right]$ . In all of the following plots, the green and red bars at the bottom of the plot indicate clinical adherence to control, whereas the cyan and magenta at the top of the plot indicate the optimally computed control of on treatment and off treatment, respectively.

3.3. Varying Treatment Intervals

We first investigate the impact of different treatment intervals has on viral load. Figure 8 and Figure 9 differ only in the treatment interval. We first see a 20 day schedule, which results in an average viral of load of less than 1000 RNA copies/ml. The control is on for alternating periods of 60 days, followed by 40 days off, then 60 days on again.

When the interval is increased to 30 days in Figure 9 the average viral load increases to over 1500 RNA copies/ml. The control is on for the first 60 days, followed by an off treatment for 30 days, repeating periodically. Despite having a greater number of switching times, the control is off for fewer day, 30 off, versus

Figure 8. Patient 3, with weights $\left[0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.1\right]$ , with 20 day treatment intervals, 100 day control window, and 720 day control horizon.

Figure 9. Patient 3, with weights $\left[0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.1\right]$ , with 30 day treatment intervals, 90 day control window, and 720 day control horizon.

40 off with $m=20$ in Figure 8. With 30 day treatment intervals, the viral load peaks to a level less than that of using 20 day intervals. The on/off treatment doesn’t decrease the viral load to points achieved due to clinical adherence, which is constant on for nearly two years. However, when the patient goes off treatment even for a matter of days, the viral load spikes to nearly 40,000 RNA copies/ml. This kind of spike does not occur with the STI style regime produced by the RHC.

Compare these results with Figure 10, where we use a 15 day treatment interval. Treatment begins with 45 days of on treatment, before entering into a periodic switching of 30 days off, 15 days on, 15 days off, 30 days on. Due to the greater amount of time spent off control, the lowest viral load is greater than that of other treatment intervals. Through these results we see that having access to a greater number of switchings, or shorter treatment intervals, does not necessarily produce better results overall. Further, we see in general a greater amount of control used in the first 100 or so days, indicating the need to control HIV greater in the acute infection regimes.

3.4. Varying Control Weights

One advantage of the control methodology presented is being able to customize a treatment for a given patient. This can be done by varying the control weights to emphasize or deemphasize specific terms in the goal function (4). Between Figure 8 and Figure 11 we see the effect of changing the weight on viral load reduction by a factor or 10. Nearly twice as much control is used when $Q=0.01$ versus $Q=0.001$ . There are currently ongoing clinical trials that are exploring STI treatments by using more frequent on-treatments compared to prior studies.

Figure 10. Patient 3, with weights $\left[0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.1\right]$ , with 15 day treat- ment intervals, 90 day control window, and 720 day control horizon.

3.5. Unexpected Treatment Perturbations

In Figure 12 we force an off treatment protocol for 100 days early in the simulation. The control returns the viral load to the same level in Figure 11, where treatment is not modified.

4. Concluding Remarks

In this work we have examined an in-vivo model describing the acute and chronic infection regimes of HIV. With this model, we first have performed a

Figure 11. Patient 3, with weights $\left[0.001\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\right]$ , with 20 day treat- ment intervals, 100 day control window, and 720 day control horizon.

Figure 12. Patient 3, with weights $\left[0.001\text{\hspace{0.17em}}\text{\hspace{0.17em}}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\text{\hspace{0.17em}}\text{\hspace{0.17em}}0.01\right]$ , with 20 day treat- ment intervals, 100 day control window, and 720 day control horizon. The control was fixed to zero for 100 days early on in the simulation.

patient specific model calibration to clinically collected data sets consisting of CD4^{+} count as well as viral load, which are censored. We then take the first steps in investigating the construction of optimal treatment strategies, noting the lack of consensus in the best standards of treatment for acutely infected patients. To construct the treatment protocols, we use receding horizon control which computes several successive controls over a shifting interval. The broader controls are then used on short time intervals, allowing for the control to adjust to unexpected physical perturbations to the system. These controls mimic the form of structured treatment interruptions, and there are several ongoing clinical studies to assess their effectiveness against standard HAART therapy. In our control implementation, we assumed full state knowledge of the dynamic system. This is an unrealistic assumption and the use of nonlinear estimators would be beneficial. In addition, calibration and validation of more advanced HIV models could lead to greater insight into the replication processes during acute infection. For example, the development of new immune response models, an area that is currently poorly understood, could shed light into the complicated immune pro- cesses that occur post-infection. There are CD8 immune response data available, but we do not use them in this work, as the data do not match the type of immune response that we have modeled.

Acknowledgements

We thank the Editor and the referee for their comments. Research of H. Tran is funded in part by the National Science Foundation grant DMS 1022688 and by the National Institute of Allergy and Infectious Diseases grant NIAID 9R01AI071915.

References

[1] Bell, S.K., Little, S.J. and Rosenberg, E.S. (2010) Clinical Management of Acute HIV Infection: Best Practice Remains Unknown. The Journal of Infectious Diseases, 202, S278-S288.

https://doi.org/10.1086/655655

[2] Perelson, A.S., Neumann, A.U., Markowitz, M., Leonard, J.M. and Ho, D.D. (1996) HIV-1 Dynamics in Vivo: Virion Clearance Rate, Infected Cell Life-Span, and Viral Generation Time. Science, 271, 1582-1586.

https://doi.org/10.1126/science.271.5255.1582

[3] Perelson, A.S. and Nelson, P.W. (1999) Mathematical Analysis of HIV-1 Dynamics in Vivo. SIAM Review, 41, 3-44.

https://doi.org/10.1137/S0036144598335107

[4] Callaway, D. and Perelson, A.S. (2002) HIV-1 Infection and Low Steady State Viral Loads. Bulletin of Mathematical Biology, 64, 29-64.

https://doi.org/10.1006/bulm.2001.0266

[5] Nelson, P.W., Gilchrist, M.A., Coombs, D., Hyman, J.M. and Perelson, A.S. (2004) An Age-Structured Model of HIV Infection That Allows for Variations in the Production Rate of Viral Particles and the Death Rate of Productively Infected Cells. Mathematical Biosciences and Engineering, 1, 267-288.

https://doi.org/10.3934/mbe.2004.1.267

[6] Rong, L., Feng, Z. and Perelson, A.S. (2007) Mathematical Analysis of Age-Structured HIV-1 Dynamics with Combination Antiretroviral Therapy. SIAM Journal on Applied Mathematics, 3, 731-756.

https://doi.org/10.1137/060663945

[7] Brandt, M.E. and Chen, G. (2001) Feedback Control of a Biodynamical Model of HIV-1. IEEE Transactions on Biomedical Engineering, 48, 754-759.

https://doi.org/10.1109/10.930900

[8] Joshi, H.R. (2002) Optimal Control of an HIV Immunology Model. Optimal Control Applications and Methods, 23, 199-213.

https://doi.org/10.1002/oca.710

[9] Ge, S.S., Tian, Z. and Lee, T.H. (2005) Nonlinear Control of a Dynamic Model of HIV-1. IEEE Transactions on Biomedical Engineering, 52, 353-361.

https://doi.org/10.1109/TBME.2004.840463

[10] Adams, B.M., Banks, H.T., Kwon, H.D. and Trani, H.T. (2004) Dynamic Multidrug Therapies for HIV: Optimal and STI Control Approaches. Mathematical Biosciences and Engineering, 2, 223-241.

[11] David, J.A., Banks, H.T. and Tran, H.T. (2009) HIV Model Analysis under Optimal Control Based Treatment Strategies. International Journal of Pure and Applied Mathematics, 57, 357-392.

[12] David, J.A., Banks, H.T. and Tran, H.T. (2011) Receding Horizon Control of HIV. Optimal Control Applications and Methods, 32, 681-699.

https://doi.org/10.1002/oca.969

[13] Banks, H.T., Kwon, H.D., Toivanen, J.A. and Tran, H.T. (2006) A State-Dependent Riccati Equation-Based Estimator Approach for HIV Feedback Control. Optimal Control Applications and Methods, 27, 93-121.

https://doi.org/10.1002/oca.773

[14] Roy, P.K. and Chatterjee, A.N. (2011) Effect of HAART on CTL Mediated Immune Cells: An Optimal Control Theoretic Approach. In: Sio Long Ao, Ed., Electrical Engineering and Applied Computing, Len Gelman Springer, New York, Vol. 90, 595-607.

[15] Duwal, S., Winkelmann, S., Schutte, C. and von Kleist, M. (2015) Optimal Treatment Strategies in the Context of “Treatment for Prevention” against HIV-1 in Resource-Poor Setting. PLOS Computational Biology, 11, e1004200.

https://doi.org/10.1371/journal.pcbi.1004200

[16] Hernandez-Vargas, E.A., Colaneri, P. and Middleton, R.H. (2013) Optimal Therapy Scheduling for a Simplified HIV Infection Model. Automatica, 49, 2874-2880.

[17] Burth, M., Verghese, G. and Velez-Reyes, M. (1999) Subset Selection for Improved Parameter Estimation in On-Line Identification of a Synchronous Generator. IEEE Transactions on Power Systems, 14, 218-225.

https://doi.org/10.1109/59.744536

[18] Audoly, S., Bellu, G., D’Angio, L., Saccomani, M.P. and Cobelli, C. (2001) Global Identifiability of Nonlinear Models of Biological Systems. IEEE Transactions on Biomedical Engineering, 48, 55-65.

https://doi.org/10.1109/10.900248

[19] Xia, X. and Moog, C.H. (2003) Identifiability of Nonlinear Systems with Application to HIV/AIDS Models. IEEE Transactions on Automatic Control, 48, 330-336.

https://doi.org/10.1109/TAC.2002.808494

[20] Xia, X. (2003) Estimation of HIV/AIDS Parameters. Automatica, 39, 1983-1988.

[21] Adams, B.M., Banks, H.T., Davidian, M. and Rosenberg, E. (2007) Estimation and Prediction with HIV-Treatment Interruption Data. Bulletin of Mathematical Biology, 69, 563-584.

https://doi.org/10.1007/s11538-006-9140-6

[22] Lisziewicz, J. and Lori, F. (2002) Structured Treatment Interruptions in HIV/AIDS Therapy. Microbes and Infection, 4, 207-214.

[23] Lisziewicz, J., Rosenberg, E., Lieberman, J., Jessen, H., Lopalco, L., Siliciano, R., Walker, B. and Lori, F. (1999) Control of HIV Despite the Discontinuation of Antiretroviral Therapy. New England Journal of Medicine, 340, 1683-1683.

https://doi.org/10.1056/NEJM199905273402114

[24] Lori, F., Lewis, M.G., Xu, J., Varga, G., Zinn, D.E., Crabbs, C., Wagner, W., Greenhouse, J., Silvera, P., Yalley-Ogunro, J., Tinelli, C. and Lisziewicz, J. (2000) Control of SIV Rebound through Structured Treatment Interruptions during Early Infection. Science, 290, 1591-1593.

https://doi.org/10.1126/science.290.5496.1591

[25] Lori, F. and Lisziewicz, J. (2001) Structured Treatment Interruptions for the Management of HIV Infection. JAMA: The Journal of the American Medical Association, 286, 2981-2987.

https://doi.org/10.1001/jama.286.23.2981

[26] Adams, B.M. (2005) Non-Parametric Parameter Estimation and Clinical Data Fitting with a Model of HIV Infection. PhD Dissertation, North Carolina State University.

[27] Banks, H.T., Davidian, M., Hu, S., Kepler, G.M. and Rosenberg, E.S. (2008) Modeling HIV Immune Response and Validation with Clinical Data. Journal of Biological Dynamics, 2, 357-385.

https://doi.org/10.1080/17513750701813184

[28] Bonhoeffer, S., Rembiszewski, M., Ortiz, G.M. and Nixon, D.F. (2000) Risks and Benefits of Structured Antiretroviral Drug Therapy Interruptions in HIV-1 Infection. AIDS, 14, 2313-2322.

https://doi.org/10.1097/00002030-200010200-00012

[29] Banks, H.T., Dedui, S. and Ernstberger, S.L. (2007) Sensitivity Functions and Their Uses in Inverse Problems. Journal of Inverse and Ill-Posed Problems, 15, 1-26.

https://doi.org/10.1515/JIIP.2007.001

[30] Miao, H., Xia, X., Perelson, A.S. and Wu, H. (2011) On Identifiability of Nonlinear ODE Models and Applications in Viral Dynamics. SIAM Review, 53, 3-39.

[31] Attarian, A.R. (2012) Patient Specific Subset Selection, Estimation and Validation of an HIV-1 Model with Censored Observations under an Optimal Treatment Schedule. PhD Dissertation, North Carolina State University.

[32] Dempster, A., Laird, N. and Rubin, D. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B, 39, 1-38.

[33] Banks, H.T. and Tran, H.T. (2009) Mathematical and Experimental Modeling of Physical and Biological Processes. Chapman & Hall/CRC, Boca Raton.

[34] Finkel, D.E. and Kelley, C.T. (2004) Convergence Analysis of DIRECT Algorithm. OPT Online, 14, 1-10.

[35] Phillips, A. (2005) Long Term Probability of Detection of HIV-1 Drug Resistance after Starting Antiretroviral Therapy in Routine Clinical Practice. AIDS, 19, 487-494.

https://doi.org/10.1097/01.aids.0000162337.58557.3d

[36] Paterson, D.L., Swindells, S., Mohr, J., Brester, M., Vergis, E.N., Squier, C., Wagener, M.M. and Singh, N. (2000) Adherence to Protease Inhibitor Therapy and Outcomes in Patients with HIV Infection. Annals of Internal Medicine, 133, 21-30.

https://doi.org/10.7326/0003-4819-133-1-200007040-00004

[37] Rosenberg, E.S., Davidian, M. and Thomas Banks, H. (2007) Using Mathematical Modeling and Control to Develop Structured Treatment Interruption Strategies for HIV Infection. Drug and Alcool Dependence, 88, S41-S51.

[38] Bajaria, S.H., Webb, G. and Kirschner, D.E. (2004) Predicting Differential Responses to Structured Treatment Interruptions during HAART. Bulletin of Mathematical Biology, 66, 1093-1118.

[39] Ortiz, G.M., Wellons, M., Jason, B., Vo, H.T.T., Zinn, R.L., Clarkson, D.E., Loon, K.V., Bonhoeffer, S., Miralles, G.D., Montefiori, D., Bartlett, J.A. and Nixon, D.F. (2001) Structured Antiretroviral Treatment Interruptions in Chronically HIV-1-Infected Subjects. Proceedings of the National Academy of Sciences of the United States of America, 98, 13288-13293.

https://doi.org/10.1073/pnas.221452198

[40] Garcia, C.E., Prett, D.M. and Morari, M. (1989) Model Predictive Control: Theory and Practice—A Survey. Automatica, 25, 335-348.

[41] Zurakowski, R. and Teel, A.R. (2006) A Model Predictive Control Based Scheduling Method for HIV Therapy. Journal of Theoretical Biology, 230, 368-382.

[42] Camacho, E.F. and Bordons, C. (2004) Model Predictive Control Advanced Textbook in Control and Signal Processing. Springer, Berlin.