Survival of HIV/AIDS patients depends crucially on comprehensive and targeted medical interventions. Health professionals monitor patients’ health status using disease markers such as CD4 T-cell counts. Disease progression, as indicated by the longitudinal CD4 measures, may affect the time to an event of interest, which in this case is the death of a patient. The main interest of inference is the association between the longitudinal and survival processes. Joint models for longitudinal and time-to-event data are based on the joint distribution of the two processes. The joint analysis is appropriate when the longitudinal variable is correlated with the patient’s health status, and it incorporates all information simultaneously so as to provide valid and efficient inferences.
The traditional approach to the analysis of survival data assumes a homogeneous population in which all individuals have the same health risks. In practice, individual patients may differ in health risks such as their vulnerability to causes of death, responses to treatments, and the influence of various risk factors. Joint modelling of the two types of data often assumes normal distributions in the linear mixed models. It is therefore of interest to look for alternative distributions that can accommodate data that may not be normally distributed.
The current study considers the longitudinal measure in terms of its rate of growth. The rate of growth is an important concept in studying change. If the level of growth is viewed as the current status of a process at a specific time, the rate of growth measures how fast the process is changing at that time. Earlier work extends the usual growth models using the generalized error distribution (GED) and estimates its parameters under a Bayesian framework. That work studied a general form of the linear growth model in which the growth observation for individual i at time t has an error term that follows the generalized error distribution and is independent of the random effects. The GED has a shape parameter that may vary across time but is assumed constant here. The model can handle data with both leptokurtic and platykurtic errors.
Markos et al. studied joint modelling of survival time and longitudinal CD4 cell counts of HIV/AIDS patients using Bayesian methods. The authors compared various Bayesian joint models involving the Weibull, lognormal and loglogistic AFT distributions, with a normality assumption for the longitudinal CD4 measure, using two data sets. They recommended the Bayesian joint loglogistic model for one data set, collected from Shashemene referral hospital, and the Bayesian joint lognormal model for the second data set, collected from Bale Robe general hospital. These models have the same hazard rate functions as those of the respective data sets. In the current study, we further analyze the same data sets with newly defined Bayesian joint models that use the generalized error distribution for the longitudinal measure instead of the normal distribution.
2.1. Description of Data
The study considers two data sets that were collected from two hospitals under similar settings and that were analyzed in earlier work. The data are extracted from patients’ charts, which contain epidemiological, laboratory and clinical information on the patients. Patients younger than 16 years and those who started ART before the defined study period were not included in this study.
Data 1: The first data set is a random sample of 354 HIV/AIDS patients who had been under ART follow-up from January 2006 to December 2012 at the Shashemene referral hospital. There are two outcome variables in each data set. The longitudinal measure is the square root of the CD4 cell count per mm³ of blood, measured repeatedly at intervals of approximately 6 months. The survival outcome is the time in months from the start of follow-up to the death of a patient. Explanatory variables for the longitudinal response are: visit time, square of visit time, sex, functional status, alcohol use, tobacco use and number of opportunistic infections. Predictors for the survival time are: TB infection status at baseline, awareness about ART, condom use, number of opportunistic infections and number of living rooms at home.
Data 2: The second data set is a random sample of 400 HIV/AIDS patients who had been under ART follow-up from January 2008 to March 2015 at the Bale Robe general hospital. The outcome variables are the longitudinal measure, which is the square root of the CD4 cell count per mm³ of blood measured repeatedly at intervals of approximately 6 months, and the survival time in months to the death of a patient. Explanatory variables for the longitudinal measure are: visit time, square of visit time, sex, age, weight and number of opportunistic infections. Those for the survival time are: age, weight, functional status, tobacco use and condom use. Descriptions and codes of the explanatory variables are given in Table 1.
2.2. Linear Mixed Models
The longitudinal data, CD4 T-cell counts, are measurements of the response variable taken from the same individuals at several observation times. Thus the set of observations on a subject tends to be inter-correlated. Two sources of variation are expected: within-patient and between-patient variation.
Table 1. Explanatory variables with codes.
Analysis of within-patient variation allows the study of changes in CD4 counts over time, while analysis of between-patient variation allows differences between patients to be understood.
Here we assume that the longitudinal CD4 measure follows the generalized error distribution instead of the normal distribution. For any variable Y that follows the generalized error distribution, the density function has three parameters (location, scale and shape), as given in Equation (1).
Here µ is the location parameter and the second parameter is the scale. The third parameter, γ, is the shape parameter of the GED; it is related to the kurtosis of the distribution and characterizes the non-normality of Y. The GED can model the error distribution more flexibly than the normal distribution.
The generalized error distribution generalizes the normal distribution. The normal distribution is a special case of the GED when γ = 0. Other special cases include the Laplace (double exponential) distribution when γ = 1 and the uniform distribution in the limit as γ approaches -1. The distribution is leptokurtic, with fatter tails than the normal distribution, when γ > 0, and has thinner tails than the normal distribution when γ < 0. Choy and Smith derived the GED as a scale mixture of normal distributions for a restricted range of the shape parameter.
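The special cases above can be checked numerically. The sketch below is only an illustration: it assumes the Box and Tiao exponential power form of the density, f(y) = (ω/σ) exp(-c |(y - µ)/σ|^(2/(1+γ))) with -1 < γ ≤ 1, which is consistent with the special cases described but is not confirmed as the exact form used in this paper. The function names are hypothetical.

```python
import math


def ged_pdf(y, mu=0.0, sigma=1.0, gamma=0.0):
    """GED density in the (assumed) Box-Tiao form; sigma is the standard deviation."""
    if not -1.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (-1, 1]")
    a = (1.0 + gamma) / 2.0
    g1 = math.gamma(a)        # Gamma((1 + gamma) / 2)
    g3 = math.gamma(3.0 * a)  # Gamma(3 (1 + gamma) / 2)
    c = (g3 / g1) ** (1.0 / (1.0 + gamma))
    omega = math.sqrt(g3) / ((1.0 + gamma) * g1 ** 1.5)
    z = abs((y - mu) / sigma)
    return omega / sigma * math.exp(-c * z ** (2.0 / (1.0 + gamma)))


def ged_kurtosis(gamma):
    """Kurtosis of the GED; equals 3 for the normal case gamma = 0."""
    a = (1.0 + gamma) / 2.0
    return math.gamma(5.0 * a) * math.gamma(a) / math.gamma(3.0 * a) ** 2


def normal_pdf(y, mu, sigma):
    """Reference normal density with standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(y, mu, sigma):
    """Reference Laplace density with standard deviation sigma (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(y - mu) / b) / (2.0 * b)
```

Under this assumed form, `ged_pdf(y, mu, sigma, 0.0)` agrees with `normal_pdf` and `ged_pdf(y, mu, sigma, 1.0)` agrees with `laplace_pdf`, while `ged_kurtosis` exceeds 3 for positive γ and falls below 3 for negative γ.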
The GED in Equation (1) is expressed in a simpler form by  as follows:
The normal distribution is a special case of this form of the GED when s = 2. In many situations, however, data are assumed to be normal although normality may not be an appropriate assumption. A statistical package such as fitdistrplus can be used to check whether the measurement errors of the data at hand are normally distributed.
The generalized error distribution was first introduced by Subbotin as a class of symmetric distributions with varying kurtosis. The distributions have many structural properties close to those of the normal distribution. Many researchers have studied the GED and its applications, but not in the joint models studied here. Nelson developed linear regression and time series models with heavy tails, assuming the underlying distribution to be the GED. The GED can be used in statistical modelling when the observation errors are not necessarily normally distributed. Zhang proposed and studied linear mixed growth models for longitudinal data with the GED so as to handle leptokurtic and platykurtic errors. The author reported that such models fit the data better than the corresponding models with normality assumptions.
In our case, we first analyzed the CD4 count data using the fitdistrplus package and found that the measurement errors of the longitudinal CD4 data do not appear to be normally distributed. We therefore define the linear mixed model using the generalized error distribution. The linear mixed model for the longitudinal CD4 measurements of the ith patient at successive observation times, with the generalized error distribution assumed for the error term, is given in Equation (3).
In this model, the first term is the time-dependent mean response, a function of the predictors and their coefficients; the second term is the time-dependent subject-specific random effect, which is normally distributed with mean zero; and the last term is a random error that follows the generalized error distribution. The shape parameter of the GED is an important parameter to be studied here.
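For orientation, one common way to write such a model is sketched below. The symbols are illustrative assumptions rather than the paper's own notation, and the random intercept and slope structure is inferred from the description of the joint model in Section 3.1.

```latex
\begin{aligned}
Y_{ij} &= m_i(t_{ij}) + W_{1i}(t_{ij}) + \varepsilon_{ij}, \\
m_i(t_{ij}) &= \mathbf{x}_{1i}(t_{ij})^{\top} \boldsymbol{\beta}, \\
W_{1i}(t_{ij}) &= U_{1i} + U_{2i}\, t_{ij}, \\
\varepsilon_{ij} &\sim \mathrm{GED}(0, \sigma, \gamma).
\end{aligned}
```

Here $Y_{ij}$ denotes the square root of the CD4 count of patient $i$ at visit time $t_{ij}$, $U_{1i}$ and $U_{2i}$ are a random intercept and a random slope, and $\gamma$ is the GED shape parameter.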
2.3. Survival Models
The survival time is a random variable defined on the non-negative real numbers. The observed time is taken as the minimum of the time to event and the time to censoring. The time variable is modeled with two AFT distributions (lognormal and loglogistic), as considered in previous studies.
We first assume that the survival time follows a lognormal distribution. Its probability density function, survival function and hazard function are expressed in terms of two parameters, and a regression model links the distribution to the covariates of each individual i.
Alternatively, assuming that the survival time follows a loglogistic distribution, its probability density function, survival function and hazard function are likewise expressed in terms of two parameters, and the regression model is again linked with the covariates of each individual.
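The density, survival and hazard functions of the two AFT distributions can be evaluated directly. The sketch below is a minimal illustration under assumed parametrizations: a lognormal with log-scale mean `mu` and precision `tau` (the convention of WinBUGS's `dlnorm`), and a loglogistic with survival function 1 / (1 + (lam t)^rho). The names `mu`, `tau`, `lam` and the function names are hypothetical; only the shape symbol ρ appears in the paper.

```python
import math


def _std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_fsh(t, mu, tau):
    """Return (density, survival, hazard) at t > 0 when log(T) ~ N(mu, 1 / tau)."""
    sd = 1.0 / math.sqrt(tau)
    z = (math.log(t) - mu) / sd
    f = _std_normal_pdf(z) / (t * sd)
    s = 1.0 - _std_normal_cdf(z)
    return f, s, f / s


def loglogistic_fsh(t, lam, rho):
    """Return (density, survival, hazard) at t > 0 for S(t) = 1 / (1 + (lam t)^rho)."""
    u = (lam * t) ** rho
    f = lam * rho * (lam * t) ** (rho - 1.0) / (1.0 + u) ** 2
    s = 1.0 / (1.0 + u)
    return f, s, f / s
```

In both cases the hazard is computed as the ratio of the density to the survival function, so the median survival time is exp(mu) for the lognormal and 1 / lam for the loglogistic under these assumptions.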
AFT models describe the direct effect of covariates on the survival time rather than on the hazard rate. Given a vector of predictors, the AFT model for the survival time of an individual patient is written in log-linear form.
In this form, the log survival time is the sum of a linear predictor with a vector of unknown coefficients, a subject-specific random effect having a normal distribution, and an error term. The error terms are mutually independent and correspond to the chosen AFT distribution, in this case the lognormal or the loglogistic.
3. Bayesian Joint Models
3.1. Likelihood Model
The association between the longitudinal and survival processes is assumed to arise through stochastic dependence between the subject-specific random effects of the two sub-models. There are many ways of making this linkage. Here we consider the links used in earlier work. The joint model thus links the GED-based model of the longitudinal process in Equation (3) to the AFT-based model of the survival process in Equations (4)-(6).
The link involves association parameters. The latent variables are independent subject-specific random effects having a bivariate Gaussian distribution with zero means and constant variances. These effects are assumed to be carried from the longitudinal process to the time-to-event process through the random intercept and random slope terms in the linear mixed model.
We assume that Y and T are conditionally independent given the random effects and the model parameters. There are two sets of parameters: one for the linear mixed model and one for the survival model, in either the lognormal or the loglogistic case. The joint likelihood function of the data from the two processes is the product over patients of the longitudinal and survival contributions.
In the survival contribution, each patient has an event indicator that equals 1 if the death event occurs and 0 if the observation is censored.
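A generic form of such a joint likelihood, written under the conditional independence assumption, is sketched below. The notation ($\theta_1$, $\theta_2$, $U_i$, $\delta_i$, $n_i$) is assumed for illustration and may differ from the paper's own equation.

```latex
L(\theta \mid y, t, \delta)
  = \prod_{i=1}^{n} \int
      \Big[ \prod_{j=1}^{n_i} f(y_{ij} \mid U_i, \theta_1) \Big]
      \, f(t_i \mid U_i, \theta_2)^{\delta_i}
      \, S(t_i \mid U_i, \theta_2)^{1 - \delta_i}
      \, f(U_i) \, dU_i
```

A patient who dies ($\delta_i = 1$) contributes the density of the survival time, while a censored patient ($\delta_i = 0$) contributes the survival function evaluated at the censoring time.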
3.2. Prior Distributions
A non-informative joint prior distribution of the parameters is considered. The individual coefficients β and α are assumed to be independently and identically normally distributed with mean zero and a large variance of 1000. The association parameters are likewise each assigned a normal prior with mean zero and variance 1000. The shape parameter of the GED, the shape parameter of the loglogistic distribution and the precision parameters are all assumed to follow a Gamma(2, 0.5) distribution.
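When such priors are coded in WinBUGS, the normal distribution is specified by its precision rather than its variance, and the gamma distribution by a shape and a rate. The short sketch below shows the corresponding arithmetic; reading Gamma(2, 0.5) as shape 2 and rate 0.5 is an assumption based on the WinBUGS convention, and the function names are hypothetical.

```python
def normal_precision(variance):
    """Precision (1 / variance) of a normal prior, as expected by BUGS-style dnorm."""
    return 1.0 / variance


def gamma_mean_variance(shape, rate):
    """Mean and variance of a gamma distribution in shape-rate form."""
    return shape / rate, shape / rate ** 2


# Priors described in the text, under the stated assumptions.
coefficient_precision = normal_precision(1000.0)       # precision of the N(0, 1000) priors
gamma_mean, gamma_var = gamma_mean_variance(2.0, 0.5)  # moments of Gamma(2, 0.5)
```

Under this reading, the normal priors have precision 0.001, and the gamma prior has mean 4 and variance 8.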
3.3. Posterior Distribution
The Bayesian model is defined by the posterior distribution of the model parameters and random effects given the data, which equals the product of the likelihood function and the prior probability distribution divided by the normalizing constant.
The main challenge here is computational. The standard maximum likelihood method involves integrating the latent variables out of the log-likelihood function, which is difficult when the parameters are high-dimensional. Simulation simplifies this computational challenge. Here the Bayesian model in Equation (10) is computed using Markov chain Monte Carlo methods with the Gibbs sampler algorithm, which is based on the full conditional distributions of the parameters. The Gibbs sampler algorithm is implemented in the WinBUGS software, version 1.4. The final inferences are based on independent samples taken from the posterior distribution after convergence of the realizations. Time series plots, auto-correlations and Gelman-Rubin statistics are used to assess and confirm convergence.
4. Results and Discussion
The objective of this study is to model the longitudinal CD4 measurements and the associated time-to-death data using a Bayesian joint modelling approach. The generalized error distribution is assumed for the square root of the CD4 T-cell counts, while lognormal and loglogistic distributions are assumed for the survival time. Two data sets collected from two hospitals are analyzed using four Bayesian joint models, and the findings from all four models are interpreted.
4.1. Descriptive Analysis
For Data 1, taken from Shashemene referral hospital, the average baseline CD4 cell count is estimated to be 156.9 per mm³ of blood, with a standard deviation of 92.5. By the end of the study period, about 5.9% of the patients had died. The average survival time of the patients is estimated to be 48.7 months, with a standard deviation of 21.3 months.
For Data 2, taken from Bale Robe general hospital, the average baseline CD4 count is about 177.6 per mm³ of blood, with a standard deviation of 104.8. Among the sampled patients, about 11.5% died. The average survival time is about 55.1 months, with a standard deviation of 21.8 months.
The baseline CD4 counts show the same variability in the two studies, about 59% as measured by the coefficient of variation. However, there is a slight difference in the variability of the time-to-event data, with 44% for Data 1 and 40% for Data 2.
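The coefficients of variation quoted above follow directly from the reported means and standard deviations. The snippet below reproduces the arithmetic; the helper name `cv_percent` is hypothetical.

```python
def cv_percent(mean, sd):
    """Coefficient of variation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


# Reported summary statistics: (mean, standard deviation).
baseline_cd4 = {"Data 1": (156.9, 92.5), "Data 2": (177.6, 104.8)}
survival_months = {"Data 1": (48.7, 21.3), "Data 2": (55.1, 21.8)}

cd4_cv = {name: round(cv_percent(m, s)) for name, (m, s) in baseline_cd4.items()}
time_cv = {name: round(cv_percent(m, s)) for name, (m, s) in survival_months.items()}
```

Rounded to whole percentages, this gives 59% for the baseline CD4 counts in both data sets, and 44% and 40% for the survival times in Data 1 and Data 2 respectively.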
To understand the relationship between the longitudinal measure and follow-up time, mean structures are plotted in Figure 1. The plots show that the average of square root of CD4 counts may have a quadratic relationship with patient’s follow-up time. We thus include both observation time and its square in the linear mixed models as predictor variables.
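A quadratic mean structure of this kind can be written as follows. The coefficient symbols are assumed for illustration, and the remaining covariates are omitted for brevity.

```latex
\begin{aligned}
m(t) &= \beta_0 + \beta_1 t + \beta_2 t^2, \\
\frac{dm}{dt} &= \beta_1 + 2 \beta_2 t, \\
t^{*} &= -\frac{\beta_1}{2 \beta_2} \quad \text{(time of the maximum when } \beta_1 > 0 \text{ and } \beta_2 < 0\text{)}.
\end{aligned}
```

The derivative is the rate of growth of the mean square-root CD4 count: it is positive before $t^{*}$ and negative afterwards, which corresponds to a marker that first improves and later declines.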
4.2. Inferential Analysis in the Case of Data 1
In the analysis of Data 1, twenty-one parameters are estimated using the two defined Bayesian joint models, based on the GED-lognormal and GED-loglogistic distributions. The results of the analysis are displayed in Tables 2 and 3.
Table 2. Parameter estimations for the Bayesian Joint GED Lognormal Model in the case of Data 1.
*Significant at 5% significance level; SD: Standard deviation; CI: Credible interval.
Table 3. Parameter estimations for the Bayesian Joint GED Loglogistic Model in the case of Data 1.
*Significant at 5% significance level; SD: Standard deviation; MC: Monte Carlo; CI: Credible interval.
Figure 1. Plots of mean of square roots of CD4 Counts over observed time for the two studies: (a) Data 1 and (b) Data 2.
4.2.1. Bayesian GED Lognormal Analysis
The results of the analysis are displayed in Table 2. They reveal that the shape parameter of the GED of the longitudinal measure is significantly different from zero at the 5% level of significance, because its 95% credible interval does not contain zero. The standard deviation of the GED is significant as well.
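The decision rule used throughout this section, that a parameter is called significant when its 95% credible interval excludes zero, can be sketched as follows. This is a minimal illustration using simple empirical quantiles of posterior draws; the function names are hypothetical, and the quantile convention of WinBUGS may differ slightly.

```python
import math


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws (empirical quantiles)."""
    xs = sorted(draws)
    n = len(xs)
    tail = (1.0 - level) / 2.0
    lower = xs[int(math.floor(tail * (n - 1)))]
    upper = xs[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def excludes_zero(interval):
    """True when the whole interval lies strictly on one side of zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0
```

For parameters that are restricted to positive values, the interval can never contain zero, so this check is most informative for regression coefficients and association parameters.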
The subject-specific random effects are found to be significant, since their respective precision parameters are significant. For the AFT model, the precision parameter of the lognormal distribution is significant. However, the two association parameters in the joint model are not significant. Thus the Bayesian joint GED lognormal model is not significant.
4.2.2. Bayesian GED Loglogistic Analysis
Analysis results of the Bayesian GED loglogistic model for Data 1 are displayed in Table 3. Both parameters of the generalized error distribution, the shape parameter and the scale parameter, are significant at the 5% significance level. The random effects are significant as well, since their respective precision parameters are significant. In the survival sub-model, the shape parameter of the loglogistic distribution is significant. As in the lognormal case, the Bayesian joint GED loglogistic model is not significant, since both association parameters are insignificant.
Comparing their total DIC values, the Bayesian GED loglogistic model fits Data 1 better than the lognormal model. From the analysis of the Bayesian GED loglogistic model, the covariates found to affect the longitudinal CD4 measure are observed time, squared observed time and gender. These imply that the disease marker improves over time, later reaches a maximum, and then declines. Female patients gain more CD4 counts than male patients. The effects of functional status, alcohol use, tobacco use and opportunistic infection status are not statistically significant. From the survival sub-model, the results show that TB infection status at baseline, awareness about ART and condom use have significant effects on the survival time of a patient.
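For reference, the deviance information criterion used in such comparisons has the standard definition below. Reading the "total" DIC as the sum over the longitudinal and survival components reported by WinBUGS is an assumption.

```latex
\begin{aligned}
D(\theta) &= -2 \log L(\theta \mid \text{data}), \\
\bar{D} &= \mathrm{E}\left[ D(\theta) \mid \text{data} \right], \\
p_D &= \bar{D} - D(\bar{\theta}), \\
\mathrm{DIC} &= \bar{D} + p_D .
\end{aligned}
```

Here $\bar{\theta}$ is the posterior mean of the parameters and $p_D$ is the effective number of parameters. The model with the smaller DIC is preferred.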
4.3. Inferential Analysis in the Case of Data 2
4.3.1. Bayesian GED Lognormal Analysis
The results of the analysis in Table 4 show that the shape parameter of the generalized error distribution is significantly different from zero. The standard deviation of the GED is significant and relatively larger than that obtained for Data 1. The subject-specific random effects are significant, since their respective precision parameters are significant.
For the survival sub-model, the precision of the lognormal distribution is significant. Again, the association parameters are not significant, and hence the Bayesian joint GED lognormal model is not significant in the case of Data 2.
The covariates observed time, square of observed time, sex, age, weight and number of opportunistic infections have statistically significant effects on the CD4 counts of the HIV/AIDS patients. For the survival part, age, functional status, tobacco use and condom use have effects on the survival time of the patients. Patient’s weight is not significant in this case.
4.3.2. Bayesian GED Loglogistic Analysis
The results are displayed in Table 5. The shape parameter of the generalized error distribution is significant, and its standard deviation is also significant. The subject-specific random effects are significant because their respective precision parameters are significant.
Table 4. Parameter estimations for the Bayesian Joint GED Lognormal Model in the case of Data 2.
*Significant at 5% significance level; SD: Standard deviation; MC: Monte Carlo; CI: Credible interval.
Table 5. Parameter estimations for the Bayesian Joint GED Loglogistic Model in the case of Data 2.
*Significant at 5% significance level; SD: Standard deviation; MC: Monte Carlo; CI: Credible interval.
For the survival sub-model, the precision parameter of the loglogistic distribution is significant. Of the association parameters, only the slope term is significant; the intercept term is not. The Bayesian joint GED loglogistic model is therefore significant, implying that the joint model is important for this data set.
The Bayesian joint GED loglogistic model fits the data better than the lognormal model, based on the total DIC values. From the analysis of Data 2 with the selected loglogistic model, the covariates found to significantly affect the longitudinal CD4 measure of an HIV/AIDS patient are observed time, square of observed time, sex, age, weight and number of opportunistic infections. With increasing ART follow-up time, the mean CD4 count of a patient follows a parabolic trajectory that reaches a maximum. Female patients achieve higher CD4 counts on average than males. Moreover, the mean CD4 count of a patient declines as the patient’s age increases, weight decreases and number of opportunistic infections increases.
For the survival sub-model, the results reveal that improving a patient’s weight improves her or his survival time. Note that weight is an important variable in explaining both the longitudinal and the survival processes of a patient. Healthy functional status and condom use during sexual intercourse also have positive effects on survival time. Age and tobacco use are not significant for this data set.
4.4. Assessment of Convergence
For each of the Bayesian models, three parallel sampling chains of 60000 iterations with different starting values are generated. Some plots are given in Figure 2. Inferences are based on samples of the posterior distributions taken with a thinning of 10 after a burn-in of 25000. Time series plots of the history of the simulations show a reasonable degree of randomness, and the chains appear to converge to the same values. Auto-correlations and Gelman-Rubin statistics are also used to assess convergence. Finally, independent samples are taken from the posterior distribution after convergence of the realizations, with the specified burn-in and thinning values, and all inferences are made using those samples.
Figure 2. Plots from analysis of Data 1 using the Bayesian GED Loglogistic. (a) Time Series plots of simulations of shape parameter γ of GED and shape parameter ρ of loglogistic distribution; (b) Autocorrelation plots; (c) Plots of Gelman-Rubin statistics.
Assessment plots from the analysis of Data 1 using the Bayesian GED loglogistic model are displayed in Figure 2. Results for two parameters, the shape parameter of the GED and the shape parameter of the loglogistic distribution, are illustrated. They show that the simulations converge.
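The sampling scheme and the convergence diagnostic can be illustrated with a short sketch. It assumes that the burn-in of 25000 is counted within the 60000 iterations of each chain, and it implements only the basic form of the Gelman-Rubin potential scale reduction factor, which may differ in detail from the version plotted by WinBUGS. The function names are hypothetical.

```python
import random
import statistics


def retained_draws(iterations, burn_in, thin, chains):
    """Number of draws kept per chain and in total after burn-in and thinning."""
    per_chain = len(range(burn_in, iterations, thin))
    return per_chain, per_chain * chains


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Synthetic example: three chains sampling the same distribution.
rng = random.Random(0)
demo_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
demo_r_hat = gelman_rubin(demo_chains)
```

Under the stated assumption, `retained_draws(60000, 25000, 10, 3)` gives 3500 draws per chain and 10500 in total. Values of the statistic close to 1 indicate that the chains are sampling the same distribution.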
The findings reveal that the generalized error distribution for both data sets has a positive estimate of the shape parameter and therefore has fatter tails than the normal distribution. There is higher variation in the CD4 T-cell counts for the data from Bale Robe general hospital (about 49%) than for the data from Shashemene referral hospital (about 10%).
The posterior distributions estimated under the selected models, the Bayesian GED loglogistic models, are the solutions required in this analysis. Although the association parameter in the joint model is significant for one data set but not for the other, both fitted models are still important to consider, as they are newly defined models implementing the generalized error distribution. The respective results are therefore used to report the parameter estimates and the findings on how the longitudinal CD4 counts and the survival time of a patient are related.
The findings from this study are fairly consistent with those of Markos et al., except for the type of models selected. They suggested different models for the two data sets, while only one is recommended here. This may be expected because the GED is involved in this study and, as previously reported, the GED model can give more insight into the error distributions than the normal growth curve models.
The current study focuses on developing Bayesian joint models with the assumption of a generalized error distribution for the longitudinal CD4 observations and of two AFT distributions for the survival time of HIV/AIDS patients. Analyses of two different data sets show that the measurement errors of the longitudinal CD4 variable are not normally distributed and so are modelled by the generalized error distribution. The fitted distributions have fatter tails than the normal distribution. The Bayesian joint GED loglogistic models are found to be important models for fitting the data sets. Fairly consistent parameter estimates and more insight into the data are obtained from the models. In one of the data sets, the survival time of a patient is found to be affected by the latent variables generated from the longitudinal CD4 T-cell counts.
Covariates with significant effects are identified from the analysis of the Bayesian GED loglogistic models. The findings reveal that under ART follow-up, patients’ health can improve over time. Female patients gain more CD4 counts than male patients. The survival time of a patient is negatively affected by TB infection. Awareness about ART is an important factor as well.
An increase in the number of opportunistic infections implies a decline in CD4 counts. The age of a patient negatively affects the disease marker but has no effect on the survival time. Weight loss is related to a decline in CD4 counts and to a shortening of survival time on average. Improving a patient’s weight may improve her or his survival time. Condom use is positively related to the survival time of a patient.
Bayesian joint models with GED and AFT distributions are found to be useful in modelling the longitudinal and survival processes. Bayesian computations of such complex models are well handled in the WinBUGS software by providing simulations and estimations of the posterior distributions. We recommend Bayesian joint models with generalized error distributions when measurement errors of the longitudinal data are not necessarily normally distributed. Further studies may investigate the models with various types of shared random effects, more covariates and prediction of future observations.
The authors would like to sincerely thank the Shashemene Referral and Bale Robe General Hospitals for providing us with the data sets used in this study. We acknowledge the anonymous reviewers for their detailed comments and suggestions.