Received 21 February 2016; accepted 19 June 2016; published 22 June 2016
The global spread of infectious diseases threatens the well-being of human population as well as that of domestic and wild animals. Therefore, a proper understanding of the pathways and global spread of communicable and infectious diseases is an important aspect of disease management and policy making. However, the data collected with the object of understanding these patterns are subjected to complications brought about by heterogeneity that can be of spatial or non-spatial in nature   . Ignoring these complexities is likely to mislead inference which may results in erroneous conclusion   . Disease data can be case-event or counts from non-over- lapping regions. Since the data provided for this study only have county level spatial resolution, the approach will be to analyses the data as regional or county specific count data.
There exists a vast amount of literature concerning the development and application of disease mapping approaches  -  . Spatial distribution of a disease is often understood through application of statistical methods to the data and creating maps that visually describes spatial variation of disease risk  . However, disease counts maps are subjected to numerous problems. One such problem is the Modifiable Areal Unit Problem (MAUP), which occurs when inference at the areal level differs from that which is observed at the basic observational unit. This is likely to change conclusions drawn from a study of a count data. The MAUP can be addressed by scaling up to ensure smoothing or averaging of data and making inference at high aggregate level than that used in the analysis. MAUP can also be addressed by scaling down to enable inference at lower level than that used in the analysis. Multi-scale Analysis can also be used to addressed MAUP. This analysis concerns spatial units that are completely matched when aggregated  . Also differences in population between regions result in differences in variance of regional estimates. This problem is addressed by employing a hierarchical Bayesian model that smooths the risk from neighboring regions and clearly accounts for population difference by using a Poisson distribution for outcomes.
Bayesian methods are widely used in disease mapping.  applied the the Empirical Bayes (EB) methods for smoothing a map of lip cancer rates. They assumed a multivariate normal for the log relative risks and allowed for spatial correlation via conditional autoregressive model. Their model could not be considered to be a “fully Bayesian”, since a quadratic approximation was used for the likelihood and this did not account for the uncertainty in the estimates of the hyper-parameters.  was the first example of fully Bayesian disease mapping. They used the convolution prior model described in Section 6.1 to model the log relative risk. They found that the model shrunk extreme disease rates towards the mean and detected spatial association that was apparent in the raw data. According to  , the fully Bayesian model produced more accurate estimates than that produced by Clayton and Kaldor approach.
 ecological study applied Bayesian hierarchical regression model to evaluate the urban spatial and spatio- temporal distribution of TB in Rilirão Preto, state of São Paulo, Southeast Brazil between 2006-2009 and to evaluate TB risk determinants. The study reveals that TB rates are correlated with measures of income, education and social vulnerability. They state that complex relationship may exist between TB incidence and a wide range of environmental and intrinsic factors, which need to be studied in future research.  applied both the Bayesian approach and the generalized mixed model to produce smooth relative risk maps of TB and to model relationship between TB new cases and national TB control program indicators. Their study discovered that high TB risk areas were clustered and TB distribution found to be associated with the number of patients lost to follow-up and the number house holds with more than one case.  study on assessing the prevalence of TB in New York from 1970-2000 using Bayesian analysis approach stated that decline in TB incidence could probably be as a result of good control programs and raised in TB prevalence could be attributed to social disruptions such as homelessness, drug abuse, poverty, and overcrowding. Their study confirmed that increase in TB is mainly due to HIV epidemic.  study of TB pattern in India, using the Bayesian conditional autore- gressive model revealed that north-eastern states have high risk of TB than other regions.
In this paper our focus is to propose best fitting Hierarchical Bayesian approaches for modeling and mapping relative risk tuberculosis in Kenya. Specifically, we determine suitable spatial and non-spatial models for modeling TB in Kenya.
In Section 2, we describe the data used followed by a Bayesian modeling framework in Section 3. In Section 4, we describe two relative risk detection methods. In Section 5, we described Bayesian non-spatial models and spatial models in Section 6 for disease mapping. We give discussion and conclusion in Section 7.
2. Data Description
The data used in this study is routine data from Kenya Demographic and Health Surveys (DHS). The data contain records of Kenya’s population size, tuberculosis cases, and some suspected determinants of tuberculosis for each period from 2002-2009 and for each 67 districts. To study the risk of TB infection in each county, the data from the 67 districts were aggregated to provide county level summaries. Some of the determinants of TB that were recorded are: HIV prevalence, poverty prevalence, illiteracy, population less then 5 km to health facility, firewood, altitude and mean house hold size. Summary Statistics for these determinants of TB are shown in Table 1.
3. Bayesian Modeling Framework
Bayesian methods define the posterior distribution which is the distribution of the parameter(s) given the observed data. This implies that we require a likelihood function for the observed data and a prior distribution for the unknown parameters. Let be a random variable with probability density function, where is a vector of relative risk parameters. The likelihood function of is defined as
Equation (1) is based on the assumption that the sample values of given the parameters are independent  . Once the data model (likelihood function) is chosen, a Bayesian analysis requires the assertion of a prior distribution for the unknown model parameters. The prior distribution can be viewed as representing the current state of knowledge, or current description of uncertainty, about the model parameters prior to data being observed. A prior distribution is a distribution assigned to the parameter before the data are observed  . All parameters within the Bayesian models are stochastic and assigned appropriate prior distribution  . Given a single parameter, , the the prior distribution is denoted as, while for a parameter vector, , the joint prior distribution is denoted as.
The posterior distribution is a probability distribution of the parameters given the data. The posterior distribution which is proportional to the product of the likelihood function and the prior distribution is defined as
where is called the normalizing constant. It have been shown that posterior distribution of
the parameters (2) can be written as
Table 1. Summary statistics for TB determinants.
from which parameter estimates are drawn using Markov Chain Monte Carlo (MCMC) vis Gibbs sampling  . True parameter estimates are obtained when the the Markov chain converges. Convergence diagnostics for each parameter estimate under each implementation is shown.
4. Cluster Detection
This section discusses methods used in disease mapping to detect elevated risk. One such methods is the use of the posterior measures in posterior distribution (3). Another approach for risk detection is the use of the exceedence probability which relies on the posterior measures for cluster detection  . The most commonly criteria for cluster detection method is the exceedence probability in relation with the relative estimates for individual areas or counties  . The exceedence probability is defined as the probability that the relative risk exceeds some threshold level, defined by. Exceedence probability is computed
from the posterior sample values through, where
In evaluating, and the threshold probability must be chosen such that. By convention, k can take the values of 0.95, 0.975, 0.99, etc.  . According to  , exceedence probability is only capable of detecting hot spot cluster and does not consider any other information concerning possible forms of cluster or even neighborhood information.  defined Hot spot as any area displaying “excess” or “unusual” risk. According  , Hossain and Lawson have made some attempts to enhance the exceedence probability by inclusion of neighborhoods. They stated that, for the neighborhood of the area defined as and the number of neighbors as, then
These measures can be used to detect different forms of clustering  . However, the usefulness of the execeedence probability depends on the model that has been fitted to the data and that any poorly fitting model will not demonstrate exceedence relate to clustering  .
5. Bayesian Hierarchical Non-Spatial Models for Disease Mapping
This section presents two non-spatial models used in disease modeling and mapping. These are the Poisson- Gamma (PG) and the Poisson Log-Normal (LN) models. These models are often used to model small area count data and are appropriate when there is relatively low count of disease and the target population is relatively large in each small area. We present the PG model first followed by the LN model.
5.1. The Poisson-Gamma Model (PG)
The Poisson-Gamma model for relative risk estimation uses a gamma prior distribution for the relative risk combines with the Poisson likelihood function for the disease counts which gives a gamma posterior distribution for the relative risk. The Poisson-Gamma model is widely used in disease mapping to account for extra variability in the data through the prior distribution  .
5.2. Model Description
Suppose that the unknown risk of TB in region i is given as. Let and denote the number of TB cases and the population at risk respectively in region i. The expected number of TB cases in
region i can then be written as, where is the overall disease risk in the study population.
Under the frequentist paradigm, we assume that. Based on the sample, the likelihood function and the corresponding log-likelihood function are expressed as
The maximum likelihood estimator of is obtained via and is given by
This estimator, is referred to as the standardized mortality ratio in region i. Under the Bayesian paradigm, assume that, where is is the Poisson mean and has a gamma distribution with shape and scale parameters a and b respectively. The likelihood function for is denoted by
and prior distribution for is denoted by
5.3. Parameter Estimation
We used Bayesian hierarchical methods for parameters estimation in the Poisson-Gamma model. That is, if in addition, a and b are given prior distributions such that and, where and are the hyperprior distribution with hyper-parameters and for a and b re- spectively, then we can obtain parameters using Bayesian hierarchical methods. This is a second stage hierarchical modeling using the Poisson-Gamma model.
In this paper, we defined (as exponential distribution) and as a gamma distribution. Therefore, the posterior distribution is given by
Parameters estimation of the Poisson-Gamma model was carried out using MCMC via Gibbs Sampling. Convergence of the Chain occurs at 40,000 iterations after a burnin period of 1000 sample and thinning of every 30th element of the sample. Figure 1 presents the MCMC convergence diagnostics plots.
5.4. Markov Chain Monte Carlo Diagnosis
Figure 1 presents Gelman and Rubin convergence diagnostics of the Poisson-Gamma model: Column-wise from the top left, Figures 1(a)-(j) are trace plots for a, b, the mean and variance respectively. Figures 1(b)-(k) are posterior marginal density plots for b, a, the mean and the variance respectively. Figures 1(c)-(l) are auto- correlation plots for b, a, the mean and the variance respectively. The Gelman and Rubin trace plots show the convergence of the two parallel chains (Chains with different initial values). “Vanishing” autocorrelation function (ACF) plots show that there is low correlation among parameters that constitute the chain. More satis- factory kernel density plots for parameters of interest would more bell-shaped or symmetric. The density plots for the parameters show that convergence of the chain has reached.
Figure 1. Poisson-gamma model: convergence diagnosis of Markov Chain Monte Carlo.
Table 2. Posterior statistics of the poisson-gamma model.
the posterior relative risk is 0.93 (95% credible interval = 0.82 - 1.08). The posterior mean is approximately the same as the mean of the standardized mortality ratio 0.93 (95% credible interval = 0.62 - 1.05) (Equation (8)). The standard deviation of the relative risk, 0.42 (95% credible interval = 0.35 - 0.56), is lower than the standardized mortality ratio's standard deviation 0.49 (95% credible interval = 0.62 - 1.05). Thus their standard deviation has been reduced by 82% by the Poisson-Gamma model. The significance of the variance indicates variation in risk among counties. In a situation of rare cases, standard deviation of the Poisson-Gamma model is expected to be much lower than that of the standardized mortality ratio  .
Figure 2 shows standardized mortality ratio for TB prevalence in the counties of Kenya for 2002-2009. The
Figure 2. Kenya county level standardized mortality ratio’s maps: (a) The mean of the SMR and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
SMRs vary around their mean, 0.93 with standard deviation, 0.49 (as discussed in Table 2). There is some suggestion of high TB prevalence in the North, West, North-West and Central counties of Kenya and low TB prevalence in the South-East counties except Mombasa (SMR > 2.0).
From Figure 3, we observed high risk of TB prevalence in the North, West, North-West and Central counties of Kenya and low risk in the South-East counties except Mombasa. Nairobi and Mombasa have the highest relative risk (RR > 2.0) and Laikipia, Nandi, Narok, Nyamira, and Vihiga have the lowest risk (RR < 0.5). The mean of the posterior relative risk and the SMR are the same. The range of the posterior relative risk of the Poisson-Gamma remains the same as the SMR, each having lowest relative risk estimated at 0.40 and the highest risk at 2.38. Variability in risk remains the same due to abundant of information or data.
The map of the exceedence probability in Figure 4 revealed 13 counties that exhibit high risk of TB above the national risk (RR > 1). These counties are: Nairobi, Mombasa, Kisumu, Turkan, Migori, Homa bay, Uasin
Figure 3. Kenya county level poisson-gamma posterior mean relative risk maps: (a) The mean of the posterior relative risk and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
Gishu, Isiolo, Marsabit, Siaya, Tharaka-Nithi, Mandera, and Embu. This map confirms with Figure 3 concern- ing high and low TB prevalence areas.
Despite the fact that assigning a gamma prior distribution for is mathematically convenient, it is likely to be restrictive since covariate adjustment is difficult and there is no possibility for allowing spatial correlation between risk in nearby areas  . We therefore present models that nullify theses limitations in the next sections.
5.4.1. Poisson-Log-Normal Model
We now present a model which allows for flexibility of covariates adjustments or incorporation.  noted that in disease mapping, the log-normal model is important as it provides a specification that allows for incorporation of covariates. In section we consider LN model without incorporating random effects and covariates and LN model which takes into account random effects and covariates. The random effect term captures or explains
Figure 4. Poisson-Gamma posterior relative risk exceedence probability map: Row-wise from the top left figure: (a) the posterior mean relative risk exceedence probability and (b) its 3 2.5% quantile for relative risk, (c) median for the relative risk and (d) 97.5% quantile for the relative risk.
heterogeneity in relative risks of TB among counties. The LN mode with the random effect is often refer to as the unstructured heterogeneity (UH) model.
5.4.2 Model Description
Let and be the observed number and the expected number of disease counts in region respectively. Further let be the relative risk of disease in region i. We first consider a situation of a Poisson LN model with no area-specific random effect and covariate. As stated in the previous section, , where is the exponential of the linear link function and is the Poisson mean. Fitting a generalized linear model with a log-link function, we have
. By Bayesian paradigm, we assumed that and its hyper-parameters, and. Parameter estimation was carried out using Bayesin Markov
Chain Monte Carlo via Gibbs Sampling. Convergence of the MCMC was reached at 11,000 iteration after a burnin period of 10,000 sample and thinning of every 30th element of the chain. Convergence diagnosis plots are presented in Figure 5 and posterior statistics of parameters are presented in Table 3.
We now consider a Poisson LN model with area-specific random effect or uncorrelated heterogeneity (UH) effect and c covariate(s) for region i denoted by. Let represents the covariates matrix. The Poisson LN non-spatial model is given by
where is the linear link function, are the residual random effects that capture the
residual unexplained log relative risk in region i and is the precision variance. This implies that
From Equation (13), we can write the relative risk as
Figure 5. Poisson log-normal model: convergence diagnosis of Markov Chain Monte Carlo.
Table 3. Posterior statistics of Poisson log-normal model without covariate and random effect.
where are the relative risk of region i, are regression parameters and is the intercept
or the overall risk effect. Here, the mean of the Poisson distribution is
. Fitting a generalized linear model with a log-link function, we have
5.4.3. Parameter Estimation
Since, the likelihood function of is defined by
We would need the prior distributions for and to obtain the posterior distribution for the parameters of interest. It is assumed that with mean zero and variance. We assume a conjugate prior of the Gaussian distribution for the parameter defined as
We now show that the posterior distribution of has the Gaussian distribution Section 5.2.
5.5. The Likelihood Function of a Regression with Gaussian Random Effect
Consider a response random variable, where is the variance-covariance matrix. The linear
regression function of on is defined by. Therefore, the Gaussian process of
regression density or likelihood function for is given by
where is a design matrix of the covariates. Just to simplify analytic calculation, we can alternatively write the Gaussian linear model Equation (18) as
where, , and are the diagonal covariance matrix with the
standard model:. Consider a general likelihood function, and let us take a second order Taylor expansion of the log-likelihood around its maximum, then we have
where is the maximum likelihood estimator of. Letting and taking of Eq-
uation (19), we have
Finding the derivative of Equation (21), it follows that
Solving Equation (22), we have
Now finding the second derivative of Equation (25), we have:
Hence, we can rewrite Equation (22) as
From the Taylor’s expansion in Equation (20), and by the Maximum Likelihood Principle (MLP) that
it follows that
Taking exponent on both side of Equation (26), we have
5.6. Posterior Function of Gaussian Process Regression
Assuming that the prior distribution of is
Then writing only the terms from the likelihood and prior which depend on the weights, and “completing the squares” for Multiple parameters model, the posterior function is defined as
Simplifying Equation (31), it follows that
where and are the covariance of the likelihood function and the prior function respectively. The Equation (32) above is indeed Gaussian with the constant term. In “completing the squares”, we are given a quadratic form defining the exponent terms in a Gaussian distribution, and we need to determine the corresponding mean and covariance. To avoid computational complexity with “completing of squares”, we sort to using “kernel’s trick” TB324. The exponent of a general Gaussian distribution defined as where is the precision matrix can be expressed as
Comparing Equation (32) with Equation (33), we have
That is. Note that must be invertible, that is. The maximiser of the likelihood is the mean which is again the mode of the likelihood. Therefore, the Maximum Posterior (MAP) is given by
In fact the MAP is similar to the maximum likelihood value. Therefore, the posterior mean and Covariance are respectively defined by
It follows that the posterior distribution for is also Gaussian defined by
Therefore, the prior distribution of is assumed to be normally distributed as
and the prior distribution for the area-specific random effect defined by
Therefore, the posterior distribution is defined as
The prior distribution for the linear regression coefficients is given by. The corresponding conjugate prior distribution for is the inverse-gamma   defined as
The Equation (42) is the hyperprior distribution for with hyper-parameters. We defined and modeled the random effect and hyper-prior distribution for the precision parameter.
We first consider the LN without covariates. Parameter estimation was carried out using Bayesian Markov Chain Monte Carlo via Gibbs Sampling. MCMC convergence was reached at 100,000 iterations after a burnin period of 10,000 sample and thinning of every 30th element in the sample. Figure 5 presents convergence diagnostics plots of this model. Table 3 presents the Posterior statistics of the LN model.
Table 3, revealed that the overall mean of the posterior relative risk is −0.17 (95% credible interval = (−0.31, −0.046). This indicates that the overall TB risk effect in Kenya estimated by the Poisson Log-Normal model decreases keeping all other determinants of TB constants. The standard deviation of the relative risk is 0.46 (95% credible interval = 0.37 - 0.56) with precision variation (95% credible interval = 3.18 - 7.24) indicating significance of variability of TB risk among counties. In a situation of rare TB cases, standard deviation of the Log-Normal model is expected to be much lower than that of the standardized mortality ratio 0.49  .
Figure 6, revealed that TB risk is expected to be high in the North,West, North-West and central counties of
Figure 6. Kenya County level TB prevalence counts: Poisson log-normal model without covariate and random effect. (a) the posterior relative risk map and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
Kenya and low risk in the South-West counties. According to this model, Nairobi and Mombasa are expected to have the highest TB risk (RR > 2.0) and Nandi, Narok, Nyamira, Vihiga, and laikipia are expected to have the lowest TB risk (RR < 0.5). Counties with relative risk above the national risk (RR = 1) are apparent from Figure 7. Again, due to abundant of data or information, the range of the posterior relative risk of the Poisson Log- Normal remains the same to that of the SMR. The lowest estimated risk is 0.40 and highest estimated risk is 2.38. There is no reduction of relative risk range compare to SMR's risk as would be expected in a case of rare information or data.
The exceedence probability map Figure 7 confirmed with the Poisson-Gamma model that 13 counties have their TB risk above the national risk. These counties are: Nairobi, Mombasa, Kisumu, Turkan, Migori, Homa bay, Uasin Gishu, Isiolo, Marsabit, Siaya, Tharaka-Nithi, Mandera, and Embu. Figure 7 confirms with Figure 6 that high risk of TB prevalence is observed in the North, West, North-West and central counties and low risk in the South-East counties except Mombasa.
Figure 7. Kenya County level TB prevalence counts: poisson log-normal model without covariate and random effect. (a) the posterior relative risk exceedence probability map and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
We now present the results of the Poisson log-normal model with with UH and covariates effects. Figure 8 presents the Rubin and Gelman convergence diagnostics plots of this mode. Table 4 revealed that the overall level of relative risk effect estimated is (95% credible interval = (−0.30, −0.06)). The overall risk effect is significantly different from zero and negative. This indicates that overall TB risk effect would be decreasing keeping all other determinants of TB constants. Among the cova- riates considered as TB determinants, only HIV parameters is significantly different from zero 1.20 (95% credible interval = 0.50 - 2.60) but positive. This indicates that TB risk increases with increasing HIV prevalence.  noted that, the higher the, the higher the variability of TB risk among counties and and the lower it is the lower the variability. This means that, a very small will indicate possibility of risk similarity between neighbouring counties. The precision for the UH (95% credible interval = 3.35 - 7.76) is significant, indicating that there exist variation in risk among counties. The UH revealed high variability of relative risk compare to the Poisson Log-Normal without random and covariate effects (95% cre- dible interval = 3.18 - 7.24).
Figure 9 also confirmed that out of the 47 counties in Kenya, 13 exhibit TB relative risk higher than the national risk (RR = 1). Table 5 presents counties in groups according to their relative risk level. High TB risk can again be observed that in the North, West, North-West and central counties of Kenya and low TB risk in the South-East counties except Mombasa.
Table 5 presents the results of the UH model with counties categorised according to their range of relative risk. The results showed that 14 counties have their relative risk above 1 and the lowest risk counties are 5. The exceedence probability map in Figure 10 visually presents counties with risk above 1. Figure 10 shows 14 counties having elevated risk of TB. These maps again confirmed high TB risk in the North, West, North-West and central counties of Kenya and low risk in South-East counties of Kenya except Mombasa.
Figure 11 is the maps of the area-specific random effect, which shows variation of risk among counties in Kenya. This map captures and displays true TB excess risk surface after covariates and confounding factors are considered TB21. Excess risk of TB is observed in Marsbit, Embu, Migori and Kisumu.
Table 4. Posterior statistics of the poisson log-normal model with UH effect.
Table 5. UH results indicating counties with high and low TB risk.
Figure 8. Poisson log-normal model with UH and covariates effects: convergence diagnostics of Markov Chain Monte Carlo.
5.7. Summary of the Non-Spatial Models
Though the Poisson-Gamma model provides good information about the TB prevalence in Kenya, one of its shortcomings is that it is unable to handle problem of spatial correlation and incorporation of covariates  . The Poisson Log-Normal model provides specifications that allow for incorporation of covariates. The Poisson Log-Normal model also enable us to capture the area random effect and to explain the extend of risk variability among counties through the unstructured random effect term.
Though thinning reduces the speed of the MC but it significantly reduces the number of iterations and solves the issue of autocorrelation among parameters that form the chain. Thinning reduces storage demand while preserving the integrity of the MC process  .  noted that the value of every element to be sampled is determined by the researcher and out most care must be taken since extremely large k value may results in lost of potentially an important information.
The slower the chain to converge, the more careful one should be about the burn-in period. However, it should be noted that there is no standard, systematic or guaranteed way of determining the length of the burn-in period  . Nevertheless, considerable work on convergence diagnostics has been done to make specific recommendations and identify tests  .
HIV is identified as significant among the covariates considered. Reason being that HIV patients have their immune system weakened or destroyed by the HIV rendering the body natural defence incapable of carrying out its function of protecting the body against other diseases. Since HIV has this capacity to weaken the immune system, it also implies that it has the effect of re-activating latent TB to active TB in individuals who are latently infected. Models that allow for handling spatial correlation are discussed in Chapter 6.
Figure 9. Kenya county level TB prevalence counts: UH smooth relative risk map (a) and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
6. Bayesian Hierarchical Spatial Model for Disease Mapping
This section presents spatial models used to identify and detect clustering of disease risk in the study area of interest. Spatial data are directly or indirectly referenced to a location on the surface of the earth. These models would allow for borrowing of strength between neighbouring counties such that neighbouring counties shall have similar risk whiles distant counties are expected to show variation in risk. The idea of spatial auto- correlation in spatial data analysis is that values of variables in near-by locations are more similar or related than those far apart. Waldo Tobler’s first law of spatial analysis states that “everything is related to everything else but near-by things are more related than distant things”  . In particular, we investigate the statistical pro- perties of the Conditional Autoregressive (CAR) model and the  models.
Figure 10. Kenya county level TB prevalence counts: UH smooth relative risk exceedence probability map (a) and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
6.1. Conditional Autoregressive (CAR) Model
Though conditional autoregressive models where introduced decades ago by  , they were not widely used until the 1990s. Since they are defined conditionally, they are particularly suited for use with the Gibbs sampler  . The Conditional Autoregressive (CAR) models have been used extensively to identify and detect clustering in diseases risk. In these models, risks of disease at any given area is affected by the risk in the neighbouring areas. These models have been referred to as the structured model or the Correlated Heterogeneity (CH) models. That is, estimation of risk in any given area depends on risk at neighbouring areas  . The distances or boundaries between the regions are often used in determine neighbourhood properties within CAR models  .
Generally, the CAR model is a continuous Markov random field with a conditional probability density function characterization and designed to model spatial phenomena that are highly related to a specific local context   . Application of CAR models can found in    . Let represents the
Figure 11. Area-specific random effect: the posterior map (a) and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
area to be studied. Let denote the set of all regions that are neighbouring region i. Let
be a random variable. We define the corresponding random field as the vector.
In the Gaussian CAR model, we often assume that each observation of the outcome variable has a conditional distribution defined by
These are full conditionals where is the weight of each observation on the mean of and also denotes the spatial dependence parameter. The is non-zero only if. Conventionally, we set since we do not want to regress any observation on itself. Hence no region is a neighbour of itself. The denotes a vector of all observation except. Note that depends only on a set neighbours only if location j is a neighbourhood set of. The is a potential unique variance for. For instance, if state i has M
neighbours and for every state that is a neighbour, and otherwise, then the conditional
expectation of a state's observation is the mean of all neighbours observations  . The Gaussian processes are specified by their mean and covariance function  . Assuming that each conditional distribution is Gaussian, we will need the mean and the variance-covariance to define the CAR model. The mean and the variance- covariance are respectively defined as
Therefore, conditional probability density function of a CAR random variable is has the form 
where, ,. The conditional joint probability density function of all the observations is
where, , invertible matrix defined as
diagonal matrix; such that is symmetric. It follows that the joint multivariate Gaussian distribution for with has covariance matrix which is
symmetric such that. Thus, a conditional autoregressive model in (45) has a pro-
bability density function defined as
and the joint probability density function in (46) becomes
The necessary and sufficient condition for (49) to be a valid joint probability density function is that its covariance matrix should not only be symmetric, but also positive definite (that is, its eigenvalues) TB223. For to be a Gaussian random variable, we need to show that is symmetric.
To show that is symmetric, we defined a symmetric weighted adjacency matrix, where
where is a measure that quantifies the proximity between region i and region j; if then i and j share a common boundary (neighbours). The could be the distance between the centroids of region i and j. Also, if, then j is one of the h nearest neighbours of i. Let be the diagonal of the adjacency matrix. The adjacency matrix of normalization or standardization is defined as
then we define a matrix of interaction, to be a normalized adjacency matrix defined as
Suppose again that the matrix corresponding to a constant diagonal matrix normalized as (53), then we have
It follows that the conditional joint probability density function can be rewritten as
where and. Hence, the CAR model structure for is defined as
and the Equation (49) can be alternatively defined as
The controls the overall variability of, while represents the overall effect of spatial dependence. The value of is should be chosen appropriately  .
The row stochasticity of indicates that the distribution is improper. This impropriety can be fixed by the parameter. Redefining and choose such that is non singular, preferably with, where are the ordered eigenvalues of. Simplifying
the bounds, we replace by. It follows that
If, then is non singular. Non-singularity is guaranteed if where is
the minimum eigenvalue of. The bound mostly preferred is. This is a proper Intrinsic
Autoregressive model which add parametric flexibility and is an indication of independence. is the additional parameter which makes independent when it is equal to zero. An improper choice of may enable wider scope for posterior spatial pattern and may be preferable  .
6.2. Parameter Estimation
In this study, we estimate parameters in the CAR model using Bayesian hierarchical methods. In disease mapping, we assumed that disease counts
The relative risk is defined as
Fitting the generalized linear model with a log-link function, we have. Under the Bayesian method, given the likelihood function of defined as
the prior distribution for is
and the prior distribution for the CAR random effect is defined by
The posterior distribution is defined as
The hyperperprior distribution for the precision parameters and are and respectively. The linear regression coefficient distribution is defined by. Parameter estimation was carried out using Bayesin Markov Chain Monte Carlo via Gibbs Sampling. Con- vergence of the MCMC was reached at 11,000 iteration after a burnin period of 10,000 sample and thinning of every 30th element of the chain. Convergence diagnosis plots are presented in Figure 12 and posterior statistics of parameters presented in Table 6. We compare these results with the  model results in Section 6.3.
6.3. The Besag, York and Mollié (BYM) Model
Among the models proposed for performing risk smoothing which have appeared in literature, the  model has found most extensive application. The BMY model is divided into two components; the CAR model component, and the UH component, (discussed in Section 6.1 and Section 5.1.1 respectively). The BYM model was introduced by  and latter developed by  . The BYM or convolution model is defines as
As noted earlier, we assume
The linear link function. The log relative risk. Therefore, the relative risk for
Figure 12. Poisson log-normal model with CAR model: convergence diagnosis of Markov Chain Monte Carlo.
Table 6. Posterior statistics of the CAR and BYM models.
the i area is defined by
The log log-link function is defined as
where and are vectors of the covariate, the associated parameters, the expected number of cases, and the relative risks of TB prevalence respectively. The is the county level random effect capturing the residual log RR of disease in county i. The (UH) is sometime thought of as a latent variable which captures the effect of unknown or unmeasured area level covariates and has a CAR model structure.
We defined the likelihood function as
The prior distribution for is
prior distribution for the area-specific random effect is defined by
and prior distribution for the CAR structure is
Therefore the posterior distribution for the parameters of interest is defined as
The hyperprior disribution for the precision parameters, and are, and respectively. The linear regression coefficient are as- sumed to have normal distribution defined by. The reflects the amount of extra-poisson variation in the data  . The precision variances and control the variability of and respec- tively. Parameter estimation was carried out using Bayesin Markov Chain Monte Carlo via Gibbs Sampling. Convergence of the MCMC was reached at 11,000 iteration after a burn-in period of 10,000 sample and thinning of every 90th element of the chain.
Posterior statistics of the CAR and the BYM model are presented in Table 6. Table 6 revealed that the estimated overall relative risk effect of the CAR model is (95% credible interval = (−0.180, −0.174)) and BYM model (95% credible interval = (−0.267, −0.0908)). Each model’s overall risk effect is significantly different from zero and negative. These models confirmed with the UH model that overall TB risk would be decreasing keeping all determinants of TB constant. Again, only the HIV variable is signi- ficant and positive for the CAR model and the BYM model with parameter estimates 1.812 (95% credible interval = 0.7735 - 2.758) and 1.41 (95% = 0.488 - 2.34) respectively. We therefore infer that the relative risks of TB increases as HIV prevalence increases.
As noted previously, the smaller the precision variance, the risk in any given area is similar to that in the neighbouring areas. The CAR model’s precision variance, (95% credible interval = 0.94 - 2.19) indicates high similarities of TB relative risk between neighbouring counties than the BYM model’s precision variance, (95% credible interval = 2.1 8- 40.80). High variation of TB risk exhibited by in the BYM model could be due to the presence of the term with precision variation (95% credible interval = 5.79 - 40.00).
Although the CAR model and BYM model each provides important information about TB relative risk behaviour, we recommend the CAR model as the best fitting spatial model to Kenya TB data since it yields lower DIC (627.21) and lower pD (49.19) than the BYM model with DIC (630.76) and pD (50.97). The BYM model, though robust, its robustness as a spatial model is lost a in situation where there is over-fitting  . That is, adding spatially structured extra-variability to the data when such variability doe not actually exit, con- ditionally on the covariates included in the model, leads to over-fitting, and may bias the estimations of the medical association between covariates and relative risk towards the hypothesis that it has no significant effect. In other words, not accounting for an actual spatial variability may lead to major biases but if spatially, variability of health indicators is completely explained by that of the socio-economic and environmental factors taken into consideration, regression residuals could results to a biased estimate of the medical association. We therefore presents detailed results of the CAR model (43). Figure 12 shows the convergence diagnostics plots of the CAR model. The convergence diagnostics of the BYM model can be found in Figure 13 and Figure 14.
Figure 15 also shows that out of the 47 counties in Kenya, 13 exhibit TB relative risk higher than the national risk (RR = 1). Table 7 grouped counties according to their respective relative risk ranges. The pattern of risk behaviour is similar to that reported in the previous models. High TB risk occurs in the North, West, North-West and central counties of Kenya and low risk in the South-East counties except Mombasa.
Table 7 shows 4 counties (Nairobi, Mombasa, Isiolo, and Marsabit) having highest relative risk (RR > 2.0). Out of the 47 counties, 13 counties show high relative risk above 1. Counties with high TB relative risk are visually shown by the exceedence probability map Figure 16.
Figure 16 confirms with the UH model’s results that there are 13 counties elevated risk (RR > 1). These counties exhibiting high relative risk are: Nairobi, Mombasa, Kisumu, Turkan, Migori, Homa bay, Uasin Gishu, Isiolo, Marsabit, Siaya, Tharaka-Nithi, Mandera, and Embu. Again, it can be observed that North, West, North- West and central counties of Kenya exhibit high TB prevalence and low prevalence in the South-West counties except Mombasa. Figure 17 captures areas with potential clusters of disease risk. Clusters of TB risk are suspected in Marsabit, Embu, Migori and Kisumu.
6.4. Summary of the Spatial Models
The CAR and the BYM models are used to capture clustering or clusters information about disease risk. Each model identified HIV as a major cause of high TB prevalence in Kenya. Each model revealed significance of risk similarities between neighbouring counties. Local clusters of TB risk occurs in neighbouring counties with high TB relative risk. Though each of the CAR model and the BYM provides interesting information about Kenya’s TB data, CAR model appears to provide best fit since it yields lower DIC (49.19) and lower pD (627.21) than the BYM model with DIC (50.97) and pD (630.76).
7. Discussion and Conclusion
This thesis provides a framework for application of non-spatial and spatial models for modeling and mapping TB in Kenya. We discuss the non-spatial and spatial models methodology as well as illustrate application of the non-spatial and spatial methods to TB in Kenya. We have also evaluated models performance using the DIC approach proposed by  . Our findings reveal that the CAR and the unstructured heterogeneity models may be suitable for modeling and mapping relative risk of TB in Kenya. Table 8 presents comparison of the non-spatial models (PG, LN and UH), spatial model (CAR) and BYM model (combination of CAR and UH models) used in this study. Though the Poisson-Gamma (PG) model yields the lowest DIC, it does not allow for incorporation of spatial structure. The overall relative risk estimated by the PG is 0.94 (95% credible interval = 0.82 - 1.10). The Log-Normal (LN) provides specifications that can be extended to include spatial structures. The UH, CAR, and BYM models confirm that with all determinants of TB kept constant, overall relative risk of TB will be decreasing. Also, the UH, CAR, and BYM confirm HIV as a major TB determinant and that TB prevalence in
Figure 13. BYM model: Rubin and gelman convergence diagnostics.
Figure 14. BYM model: Rubin and gelman convergence diagnostics cont.
Figure 15. Kenya county level TB prevalence counts: The CAR model’s posterior mean of the relative risk map (a) and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
Kenya increases with increasing HIV. Generally clustering of risk and elevated risk are observed in the North, West, North-West and the central counties of Kenya and low clustering and elevated risk in the South-West counties.
The UH model captures and displays variability of relative in the study area through the area-specific random effect whiles the CAR model and the BYM model provide evidence of risk similarities between neighboring counties. Among the LN, the UH, the CAR and the BYM models, the UH model yielded the lowest DIC (622.75), hence considered as the best fitting model when fitted to Kenya TB data for 2002-2009. However, using the acceptable criteria that a DIC difference between two models greater than 10 implied significant difference while a DIC less than 5 implied a negligible difference  , one could use any of the non-spatial and spatial models for fitting Kenya TB data for 2002-2009 depending on the issue at hand. Although these approaches are less than ideal, we hope that our results provide a useful starting point into the development of
Table 7. Poisson log-normal model with CAR model results indicating counties with high and low TB risk.
Figure 16. Kenya county level TB prevalence counts: The CAR model posterior mean of the relative risk exceedence probability map (a) and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
Figure 17. The correlated heterogeneity effect’s posterior map (a) and its 2.5% quantile (b), median (c) and 97.5% quantile (d).
spatial methodology for modeling and mapping RR of TB in Kenya.
Modeling of risk in space and time is important but is quite a challenging task. Further research is required determined suitable spatio-temporal models for modeling and mapping relative risk of TB in Kenya. This will allow to explain evolution of the relative risk of TB in space and time.
Limitation of the study is that the specification of the adjacency matrix with 0 and 1 in the CAR model (43) is not internally consistent in a case in which the number of neighbors varies (occurs with most irregular lattices). In the CAR model, when is fixed at 1, the CAR models’ specification becomes an “intrinsic” CAR model (which is prevalent in empirical studies), and requires less computation time but presents theoretical and conceptual issues that undermine its validity  . For instance, the precision parameter is unknown (which is always the case), the functional from of the joint distribution of the spatial random effects (), are not identifiable under the “intrinsic” CAR specification. Thus one cannot be confident that his/her estimates, nor
Table 8. Posterior statistics of non-spatial and spatial models.
convergence of the parameters draw, due to potentially improper distributional assumptions. Conceptually, not including in the model blurs one's estimates and can lead to counter-intuitive interpretation  .