To model hierarchical data when the distribution is not known is a big problem and has affected many researchers dealing with big data  . This is because of the disparity within the data, to account for the heterogeneity a Bayesian non-parametric model is necessary as it leads to flexible density estimates which are capable of identifying clusters of individuals with similar biomarker characteristics. Bayesian non-parametric mixture model is a good fit to model biological markers because it exhibits flexibility when modeling data which has a skewed and multi-modal distribution. The reason behind this is because data sets become bigger every day and require flexible models which can expand with the data. Mixture methods approach allows for probabilistic approach of clustering data points to different clusters  . The model also gives support to out of sample cluster assignments through computing the posterior probabilities for new data points.
In clinical trials, the importance of a treatment is either to decrease the burden of the disease for the patient or to eliminate the disease. To identify a biomarker which is changed by a treatment is not easy due to difficulties associated with the disease mechanisms. If a biomarker which is affected by the treatment has been identified, coming up with the association of the biomarker and the outcome is not easy because of the changes in the variability of the biomarker, patient response, and evaluation methods used. Thus, it is important to identify the changes each individual exhibit and whether there are changes or no changes as a result of the treatment  . The responses of individuals to treatment may be related, and identifying of groups of individuals sharing similar characteristics is of important.
Many authors have applied the Bayesian non-parametric procedures to study various categories of biomarkers ranging from prognostic, predictive, phamacodynamic, and surrogate endpoints. For example,  studied the prognostic biomarkers and showed how they related to the clinical outcome using the Bayesian non-parametric procedures. Additionally,  studied the prognostic biomarkers using Bayesian parametric procedures, and finally  studied the surrogate endpoints using the Bayesian methods. These studies identified the need to study biomarkers and determine how they are related with the clinical outcome.
Bayesian non-parametrics have a wide application in many areas especially big data analytics. Bayesian non-parametric methods are widely used to solve problems where the size of the data changes leading to growth of the dimension of interest, for instance, in problems where the number of features varies with increase in the observed data. Also, they are commonly used in clustering and the number of clusters depends on the data being used. In general, in Bayesian non-parametrics models the number of parameters increases as the size of the data grows.
The study of  applied the Bayesian non-parametrics in modeling biological markers. In the study the model assumed measurements of the biomarkers were taken continuously before the subjects under study are introduced to treatment and after the patient has been given some treatment. In the study the measurements were not depended on covariates and the survival result was due to measurement of the change, though, different distributions could give the same outcome.
Accordingly,  developed an integrative Bayesian predictive modeling framework to identify individual pathological brain states depending on the choice of fluoro-deoxyglucose positron emission tomography (PET) imaging Biomarkers and evaluated the relation of the states with a clinical outcome. The study would identify patient subgroup characterized by different biomarkers to produce the clinical outcome. The strategy also identified imaging Biomarkers with pathological states of the individuals and assumed that the latent individual state gets its values from one of the pathological states, and one of the states was a reference point. The latent random variables were independent and identically distributed taking a multinomial distribution. On the mixture weights a Dirichlet prior was used, considering a where the Gaussian distribution was considered, the mean was taken as one of the parameters to model the latent state specific random effect and to characterize the mean metabolic profile for individuals within the latent state. The Variance-covariance matrix captured the association between regions for individuals with latent state. A likelihood function was also established.
Additionally,  developed a Bayesian model to sample inference with availability of inverse-probability weights. The study used a hierarchical method where the distribution of the weights from the non-sampled units was modeled and included predictors in a non-parametric Gaussian process. Simulation study was used to check how the procedure performed and compared to the classical design-based estimator. The study concluded that Bayesian non-parametric finite population estimator is more appropriate compared with the classical estimator. Also,  compared the hierarchical Bayes model for biomarker subset effects in clinical trials to the profile likelihood method, to make references to the threshold parameter using bootstrap. The method provided improved sample properties for probability coverage at 95% confidence interval.
Therefore, the importance of modeling surrogate markers in this study is to be able to determine the relationship between the baseline biomarker and the samples taken after an individual has been given some treatment. Bayesian non-parametric methods are flexible methods and will accurately indicate the relationship to show whether there is any change and be able to identify groups of individuals which have similar characteristics through clustering  . Also, the method is capable of showing whether after treatment the distribution of the biomarker changed through increase, decrease or it did not change at all.
The other parts of the paper are arranged as follows; in Section 2, discussion of the general modeling framework is done. Section 3, discusses the proposed model by detailing the nested Dirichlet process model for characterizing patient profiles. In Section 4, the hierarchical model is formulated. Section 5, describes the posterior computation. Section 6, is a simulation study to assess the performance of the model. Finally, the conclusion is in Section 7.
2. General Modeling Framework
Let T denote the treatment effect, X represent the baseline biomarker, Y denote the post treatment values, and E the clinical outcome, and Z are the covariates which are present. If p(.) is a distribution, for instance, , is a conditional distribution. If the treatment impact T is put into consideration, then the biomarker distribution will be affected. To address this then the inpatient change from X to Y is necessary. To assess the inpatient change, then putting into consideration of the relationship between X and Y because of the inpatient effects is necessary. Due to the effect the treatment has on Y and the effect of the covariate to X or Y thus it leads to, and , though the distribution can either be highly disperse and complex. The model in this study will involve representation of a biomarker profile as , to symbolize the change made on the biomarker because of treatment, incorporating them to the model to include the impact of the change on the outcome E. The model is also able to classify groups of individuals with various changes in Biomarker profiles depending on how the impacts of T and the change ∆ have on E. Thus, employing the probabilistic factorization then;
From Equation (1), the following assumptions are made;
1) , which implies, with the effect of the covariates and the treatment, the impact of the (X, Y) on E is indicated by the change.
2) Also, the distribution of X and Y may be depended on the covariate, then the study assumes that both do not depend on the covariates.
A hierarchical Bayesian non-parametric model is employed for and for the ; a non-parametric regression model in the Bayesian case is employed, to give adaptable cluster estimates for individual’s specific distributions of ∆ and their clusters. A hierarchical structure is obtained through making assumption of the individual’s specific Dirichlet processes being samples that are conditionally independent and obtained from a hyperprior which is also a Dirichlet process.
3. Proposed Model
Here the structure of the data is developed and the general model introduced. The subjects are indexed by . Assuming Ei is time-to-event outcome, let be the observed time of the event with , if , and 0 if . For , , and be the baseline covariates with . For the ith individual let ni and mi be the measurement frequencies of the levels of the biomarker obtained before treatment and after. Let and be the individuals pre and post-treatment biomarker values, where, and .
The functional , is a representation of the individual change for the levels of the biomarker before and after treatment. Where Ti is the treatment given to the ith individual and ∆ is some measure of distributional distance. The distributional distance is defined on a sample space cumulative density function (Cdf) of one-dimensional random variables, which is the distributional distance between the two cdf’s FX and FY in the space of cumulative density function. The vertical quantile function is;
where, Equation (2) is a quantile function of order p which is a representation of the functional for the density plot. The quantile function allows for comparison of various functions for all the distributions. For instance, is used in median tests. Also, the vertical quantile function is associated with the Receiver Operating Characteristic (ROC) curve represented as
where FX and FY are the cdf’s of the diagnostic variables in the populations. Here, the interest is not to assess the diagnostic performance for a biomarker; however, to evaluate the targeted treatment, the vertical quantile function is estimated by taking into consideration the distribution functions FXi and FYi for the subject levels of biomarker for different individuals. Therefore, the distributional change is;
Equation (3) corresponds to the area under the curve which is majorly applied in diagnostic studies. Thus, ∆ represents the change of the distribution of the biomarkers for the ith individual, because of the treatment administered to the subject. A posterior estimate with , means that the individual’s distribution has moved to the right, that is, there is a biomarker increase. Also, , shows a change to the left side, hence a decrease in the biomarker levels, and , indicates no remarkable change. Thus, from Equation (1) the patient level data likelihood is;
For β a vector of parameters for regression modeling and θ parameterizes the hierarchical model.
Thus, T0, follows one of the common distribution like the log-normal, where the linear component is a function of the change (∆), the treatment (T) and the covariates (Z). The model should be adaptable so as to cover many biomarker distributions which are possible and can either be skewed or multimodal and take account of the variability between subjects. To explain the variability then a hierarchical Bayesian non-parametric framework is used. The model allows flexible density which can identify groups of subjects characterized by individual’s biomarker profile. Therefore, the study assumes that measurements of the biomarker are samples are obtained from unknown individuals distributions with and , Where Xi and Yi are vectors of subject specific measurements. FXi and FYi are modeled separately using mixtures of Gaussian distribution denoted by mean µ and standard deviation δ, that is . The pdf and cdf are denoted by and . The mixture components are defined as wi for each component with the constraint such that, , implying that the total probability distribution will normalize to 1. Thus, the Gaussian mixture model is represented as;
Assuming a DP with a concentration parameter and a base distribution G0. Then, for each individual it follows that;
where, , and .
Let and . Under the mixture model and are sampled from some mixing distributions GXi and GYi as follows;
This means that the conditionals on the realizations of GXi and GYi, thus, the distributions for the Xi and Yi are the following mixtures;
To assess the change in the distribution of Yi verses Xi in terms of so as to be able to investigate the association of the change with the outcome and classify groups of subjects which have the same biological responses. Additionally, a prior model is defined on GXi and GYi and it involves the Dirichlet process (DP), which is commonly preferred prior probability model due to its clustering capability.  expressed this as , which is a random distribution G following a DP that has a base distribution and a concentration parameter . Shows significant properties, that is, how G varies about the mean (base distribution), where a smaller value of shows high uncertainty and vice versa.
Since G are discrete samples, they have some positive probability ties, as some ’s and ’s in Equation (8) may be equal. A DP can be easily used to estimate GXi and GYi for each subject, though it lacks the clustering properties of the distributions for all individuals or among the pre and post treatment values obtained. Clustering is necessary so as to identify the change after the treatment for all the individuals. Thus, the change is obtained by assuming that GXi and GYi are realizations of a common Dirichlet process mixture model (DPMM). In a DPMM the individual’s realizations of GXi and GYi are shared across and for each subject’s pre and post treatment values. Therefore, GXi and GYi are independent conditional samples from the same the Dirichlet process, then;
where G0k is a realization from a common DP prior that is which has a base distribution and a concentration parameter , then;
Here . Therefore each GXi and GYi is automatically obtained from a collection of different distributions that is the G0k’s.
4. Formulation of the Hierarchical Model
The hierarchical model is formulated using the nested Dirichlet Process (nDP) which is as follows;
In the earlier discussions, it is clearly expressed that is a functional of , which in the nDP is determined by the realizations GXi and GYi in Equation (10). Since the Dirichlet process given by Equation (10) has a discrete support and ’s in the equation cannot be neglected, then it shows a non-trivial probability where , which means that the treatment has no biological impact on patient i, this is clearly shown through the posterior estimate of the . Additionally, there is also non-trivial probability that for which implies , implying the biomarkers profiles for individuals i and i' are in one cluster.
To complete the model the base distribution is specified and it is defined as a Normal-Inverse Gamma (N-IG) distribution for the mean and precision parameters in the Normal model and and are assigned independent Gamma priors, thus, the hierarchical probabilistic model.
4.1. Biomarker Profiles Likelihood
The Biomarker Profiles Likelihood is as below;
4.2. The Model and the Priors
where, the fixed hyper parameters are; μ0, k0, a0, d0, μ1, k1, a1, d1.
5. Posterior Computation
To compute the joint posterior distribution for model parameters, this is done computationally. Thus Markov Chain Monte Carlo (MCMC) algorithm for posterior inference is used. The full conditional to update the nDP are gotten using the method described by  depending on a truncated Dirichlet process. At each iteration, for the baseline distribution , parameters are continuously updated based on all the samples represented by the biomarker values. The algorithm is developed using a truncation of a Dirichlet process to give approximate truncation to the stick breaking process of a Dirichlet process leading to method of computation in finite mixture models.
This assumes that, individuals are clustered into K groups and for every individual the observations on the biomarker level can be clustered into L groups. To provide support for the estimation of both clinical and biological effects together, the proposed model accounts completely for the uncertainty of the random quantities, together with variability of the ’s to express the variation of the population. In every iteration the Gibbs sampling algorithm gives samples of the distribution of the biomarker (GXi, GYi) for every individual, used to get the biomarker profile . This can be easily illustrated by Considering the model as in Equation (13) and Equation (14), to obtain samples from the posterior after the burn in, every value that is sampled ( ) is obtained by getting the average for the estimates of the posterior of the subjects distributions of biomarkers. From Equation (9) and Equation (10), it is clear that every mixing distribution G0k, for or , to obtain the biomarker profile for the posterior then, Equation (15) is applicable;
where and thus the estimate of the posterior biomarker profile is obtained by dividing the mean with the posterior values which is;
6. Simulation Study
In this section, a simulation study is presented to show the capability as well the ability of the nested Dirichlet Process when modeling biological markers to give accurate density estimates by obtaining strength from different centers. In the simulation, N samples are obtained from a mixture of four Gaussian distribution.
Equation (17), is a representation of a mixture of Gaussian distribution with wi mixing weight for every component, and is the component which can be represented by any distribution. Here, the components are represented by a normal distribution such that the mixture distribution becomes;
The true distributions are plotted in Figure 1.
Distribution S1 and S2 are asymmetric with a mixture of two Gaussian components with different weights. For distribution S3 and S4, they share three
Table 1. The components of various Gaussian distributions.
Figure 1. Plot for the true distributions used in the simulation study.
mixture components which are located at the origin, with difference only on the fourth component of the distribution S4.
The true cluster memberships are plotted in Figure 2. It represents the true cluster membership si for J = 40 samples through plotting I (si = sj) for all the pairs. The samples are ordered by their true clusters. Therefore, these are simulation conditions with well separated true distributional clusters.
The same number of samples has been simulated for the each of the four true distributions. To obtain the posterior simulation, then; the precision parameters and are both fixed to 1 and the Normal-Inverse Gamma distribution which is the baseline measure (base distribution) are , , , and , such that; NIG(0, 0.01, 3, 1). Therefore, a priori , , , and .
The algorithm described in Section 5 is used to obtain the samples of the posterior distribution using the nested Dirichlet Process. The study runs MCMC chain with 12,000 iterations, discarding the first 2000 iterations and thinning out to save one in every 10 iterations.
The estimated distributions for each distributional cluster are represented in Figure 3. Figure 3 is an image of Figure 1. This is a clear indication that the prior and the posterior samples obtained after the MCMC draws are the same and reflect the distribution where each of the observation has been obtained from. The posterior draws are drawn from all the distributions with all the components. Hence, the posterior and the prior distribution are the same. Thus, in this case when using the Bayesian non-parametric mixture model it reflects the individual biomarker distributions before treatment taking the same form as the after treatment measurements drawn from different centers.
Also, the posterior cluster memberships takes the same form as the true cluster memberships as clearly shown in Figure 4.
The posterior co-clustering probabilities take the same form as the true cluster membership. The model developed is able to classify groups of individuals from different centers (distributions) to one group. The individuals are placed into the groups as per the prior information which is available. Hence, the diagram displays four clusters similar to the estimated distribution as shown in Figure 4.
Figure 2. Representation of the true cluster membership.
Figure 3. Representation of the estimated distribution.
Figure 4. Representation of the posterior co-clustering probabilities for all the distributions.
We introduced a model using the truncated nested Dirichlet process to identify groups of individuals who respond similarly to the same treatment for a specified biological marker. An MCMC algorithm has been used to estimate the posterior inference. Since the nDP is a non-parametric model, it has the capability of grouping all the observations from the mixture depending on the entire distribution, rather than selecting particular features of the distribution. In the simulation study the proposed method for biological markers showed a good performance in differentiating the unimodal distributions from the multimodal distributions.
The proposed procedure in this paper reveals that Bayesian non-parametric mixture model can be used to obtain flexible estimates of the individual biomarkers and characterize the heterogeneity of how the subjects are responding to treatment. The proposed procedure only considers measurements taken before introduction of treatment and after the treatment. Biomarkers sometimes can change with time, thus a more structured model can be developed by use of longitudinal biomarker values to account for individuals biomarker processes. This work can be extended to model the relationship between two or more groups of data after the individuals have been clustered. Also, the procedure did not take into consideration of the covariates which might affect the biomarkers. This can also be incorporated so as to see whether they have any effect.
We thank the Editor and the referee for their comments. Our research is funded by the African Union, this support is greatly appreciated.
 Chiang, S., Guindani, M., Yeh, H.J., Dewar, S., Haneef, Z., Stern, J.M. and Vannucci, M. (2017) A Hierarchical Bayesian Model for the Identification of Pet Markers Associated to the Prediction of Surgical Outcome after Anterior Temporal Lobe Resection. Frontiers in Neuroscience, 11, 669.
 Morita, S., Thall, P.F., Bekele, B.N. and Mathew, P. (2010) A Bayesian Hierarchical Mixture Model for Platelet-Derived Growth Factor Receptor Phosphorylation to Improve Estimation of Progression-Free Survival in Prostate Cancer. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59, 19-34.
 Graziani, R., Guindani, M. and Thall, P.F. (2015) Bayesian Nonparametric estimation of Targeted Agent Effects on Biomarker Change to Predict Clinical Outcome. Biometrics, 71, 188-197. https://doi.org/10.1111/biom.12250
 Chen, B.E., Jiang, W. and Tu, D. (2014) A Hierarchical Bayes Model for Biomarker Subset Effects in Clinical Trials. Computational Statistics and Data Analysis, 71, 324-334. https://doi.org/10.1016/j.csda.2013.05.015