Forecasting Diabetes Patients Attendance at Al-Baha Hospitals Using Autoregressive Fractional Integrated Moving Average (ARFIMA) Models

Salem Al Zahrani^{1},
Fath Al Rahman Al Sameeh^{2},
Abdulaziz C. M. Musa^{3},
Ashaikh A. A. Shokeralla^{4}^{*}


1. Introduction

The growing number of diabetic patients has become a major challenge for the General Directorate of Health Affairs, Al-Baha, Kingdom of Saudi Arabia, which makes the study of this phenomenon important. Diabetes is a common disease worldwide that can lead to various systemic diseases and high mortality. It is characterized by high sugar levels in the blood and urine, and it is usually diagnosed by means of a glucose tolerance test (GTT). There are three kinds of diabetes mellitus [1]. The first kind (type 1) results from the body’s failure to produce sufficient insulin and often occurs among young children. Type 2 diabetes mellitus results from resistance to insulin, often initially with normal or increased levels of circulating insulin. The third kind, gestational diabetes, occurs when pregnant women without a previous history of diabetes develop a high blood glucose level. Diabetes is a major health challenge. Globally, the estimated number of people with diabetes is approximately 463 million, and the disease causes about 4.2 million deaths per year [2]. Al-Baha Health Affairs has launched the “Diabetes Friend” initiative, targeting 1500 diabetics, including children, school students, and the elderly across the region. The directorate serves about 20,000 diabetics through the Diabetes Center of King Fahad Hospital-Al-Baha and the diabetes clinics of the region’s hospitals, in addition to follow-up at healthcare centers [3].

Researchers in Saudi Arabia have made growing efforts to study and analyze the incidence of diabetes, especially in the Al-Baha region. In this study, autoregressive fractionally integrated moving average (ARFIMA) time series models are applied to data on diabetes patients at Al-Baha hospitals, with the objective of deciding which of these models provides the most accurate prediction of diabetes patient numbers in the Kingdom of Saudi Arabia, based on model selection criteria such as AIC and BIC. The study also hypothesizes that diabetes patient attendance at Al-Baha hospitals trends upward over time, which leads to the long memory characteristic in the data. We identify the order of the ARFIMA models, estimate the parameters, and produce forecasts based on the models. The paper is organized as follows. Section 2 reviews the literature. Section 3 briefly presents the theoretical framework of ARFIMA models, and Sections 4 and 5 describe the estimation of the Hurst parameter. Empirical results are discussed in Section 6. Finally, the conclusion is presented in Section 7.

2. Literature Review

Numerous forecasting models have been proposed for modeling and forecasting the number of patients with many diseases; however, very few papers on diabetes incidence use time series models.

Earnest et al. (2005) used autoregressive integrated moving average (ARIMA) models to predict the number of beds occupied during a severe acute respiratory syndrome (SARS) outbreak in a tertiary hospital. They used hospital admission and occupancy data for isolation beds from Tan Tock Seng Hospital for the period from 14 March 2003 to 31 May 2003. They found that the ARIMA(1, 0, 3) model described and predicted the number of beds occupied during the SARS outbreak well. They also provided three-day forecasts of the number of beds required [4].

Appiah et al. (2015) used time series analysis to forecast malaria cases in Ejisu Juaben Municipality. They found that an ARIMA(2, 1, 1) model, that is, an autoregressive process of order 2 with differencing of order 1 and a moving average of order 1, was the best fit for the secondary data. Using this model, they produced forecasts for the two years from 2014 to 2016. Pan et al. [5] (2016) used the ARIMA model to forecast the number of patients with an epidemic disease. They used actual daily patient numbers between January and August 2014, 223 days in total, obtained from a real-life CDC (center of disease control). They identified ARIMA(7, 1, 0) as the time series model that best fit the data. Villani et al. [6] (2017) used time series modeling to forecast diabetic emergencies attended by prehospital Emergency Medical Services (EMS) in the period from January 2009 to December 2015. The SARIMA (0, 1, 0) × (0, 1, 0) 12 model provided the best fit for the number of patients [7].

3. ARFIMA Models

The classical approach to modeling time series data is to apply the Box–Jenkins methodology, depending on whether the series is stationary or not. If the series shows the long memory property, predictions based on the identified and estimated Box–Jenkins models may not be dependable [8] and [9]. Time series data exhibiting long memory can be better modeled using the ARFIMA(p, d, q) model, which was presented by Granger and Joyeux [10]. They showed that it is possible to model long memory series of any span using the Extended Maximum Likelihood (EML) estimation method. In general, the Box–Jenkins ARIMA process is as follows [11] and [12]:

${\varnothing}_{k}\left(B\right){\left(1-B\right)}^{d}{x}_{t}=\theta \left(B\right){e}_{t}$ (1.1)

where ${x}_{t}$ is the time series data, d is a nonnegative integer representing the order of differencing needed to achieve stationarity, B is the backward shift (lag) operator, $\varnothing $ denotes the autoregressive parameters, and $\theta $ denotes the moving average parameters. A long memory process is stationary with an autocorrelation function ${\rho}_{k}$ that decreases gradually as the lag $k\to \infty $, so that its absolute values are not summable:

${\sum}_{k=0}^{\infty}\left|{\rho}_{k}\right|=\infty $ (1.2)

Let $\left\{{X}_{t}\right\};t=1,\cdots ,T$ be a stochastic process. An ARFIMA process of order (p, d, q) [13], denoted by ARFIMA(p, d, q), with mean zero and constant variance can be written using backward shift operator notation as [14]

$\varnothing \left(B\right){\left(1-B\right)}^{d}{X}_{t}=\theta \left(B\right){e}_{t}$ (1.3)

where $\varnothing \left(B\right)=\left(1-{\varnothing}_{1}B-{\varnothing}_{2}{B}^{2}-\cdots -{\varnothing}_{p}{B}^{p}\right)$ and $\theta \left(B\right)=\left(1-{\theta}_{1}B-{\theta}_{2}{B}^{2}-\cdots -{\theta}_{q}{B}^{q}\right)$ are polynomials in B with no common factors and with roots outside the unit circle; ${\varnothing}_{i},\left(i=1,\cdots ,p\right)$ and ${\theta}_{i},\left(i=1,\cdots ,q\right)$ are the autoregressive and moving average parameters, respectively; ${e}_{t}$ is a white noise process with zero mean and constant variance ${\sigma}_{e}^{2}$; and B is the backward-shift operator.

The fractional differencing operator is [15]:

${\left(1-B\right)}^{d}={\nabla}^{d}$ (1.4)

$={\displaystyle {\sum}_{k=0}^{\infty}\left(\begin{array}{c}d\\ k\end{array}\right){\left(-1\right)}^{k}{B}^{k}}$ (1.5)

$={\displaystyle {\sum}_{k=0}^{\infty}\frac{\Gamma \left(k-d\right){B}^{k}}{\Gamma \left(-d\right)\Gamma \left(k+1\right)}}$ (1.6)

with $\Gamma (.)$ denoting the gamma function. The parameter d is allowed to take any real value, so it need not be an integer (the process is then fractionally integrated). When $d=0$, the process $\left\{{X}_{t}\right\};t=1,\cdots ,T$ reduces to a stationary ARMA(p, q) process [15] with geometrically bounded autocorrelations. For $-0.5<d<0.5$ the time series is stationary and invertible [8]. A long memory ARFIMA(p, d, q) process with $0<d<0.5$ is a stationary process with a slowly decreasing autocorrelation function ${\rho}_{k}$ at lag k as $k\to \infty $.
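As a worked illustration of Equations (1.5) and (1.6), the first few coefficients of the expansion can be written out for $d=0.44$, the value estimated in Section 6. The symbol $\pi_k$ is notation introduced here only for this illustration; the coefficients follow from the recursion $\pi_0=1$, $\pi_k=\pi_{k-1}(k-1-d)/k$, which is equivalent to the gamma-function form.

```latex
\begin{align*}
(1-B)^{d} &= \sum_{k=0}^{\infty}\pi_k B^{k},
  \qquad \pi_k=\frac{\Gamma(k-d)}{\Gamma(-d)\,\Gamma(k+1)},\\
\pi_0 &= 1,\quad \pi_1=-d,\quad \pi_2=-\frac{d(1-d)}{2},\quad
  \pi_3=-\frac{d(1-d)(2-d)}{6},\\
(1-B)^{0.44} &\approx 1-0.44B-0.1232B^{2}-0.0641B^{3}-0.0410B^{4}-\cdots
\end{align*}
```

The coefficients shrink slowly rather than vanishing after a fixed lag, so every past observation keeps some weight. This is the long memory behaviour that an integer order of differencing cannot represent.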

According to [16], the autocorrelation function of the fractional ARIMA process can be expressed as follows:

${\rho}_{1}=\frac{d}{1-d}$, ${\rho}_{2}=\frac{d\left(1+d\right)}{\left(1-d\right)\left(2-d\right)}$ and, in general, ${\rho}_{k}={\displaystyle \prod_{i=1}^{k}}\left(\frac{i-1+d}{i-d}\right)$, such that

${\rho}_{k}=\frac{\Gamma \left(k+d\right)\Gamma \left(1-d\right)}{\Gamma \left(d\right)\Gamma \left(k-d+1\right)}$ (1.7)

$\approx \frac{\Gamma \left(1-d\right)}{\Gamma \left(d\right)}{k}^{2d-1}$

${\rho}_{k}\approx m{k}^{2d-1}$, where $m=\Gamma \left(1-d\right)/\Gamma \left(d\right)$ (1.8)
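The autocorrelation formulas above can be checked numerically. The following is a minimal Python sketch, not the authors' code (the paper reports using R). The function names are hypothetical. The product is taken from i = 1, the lower limit that reproduces the stated values of ${\rho}_{1}$ and ${\rho}_{2}$, and $0<d<0.5$ is assumed so that every gamma argument is positive.

```python
import math


def acf_product(k, d):
    """rho_k as the product of (i - 1 + d) / (i - d) for i = 1..k."""
    rho = 1.0
    for i in range(1, k + 1):
        rho *= (i - 1 + d) / (i - d)
    return rho


def acf_gamma(k, d):
    """rho_k from Equation (1.7), using log-gamma to avoid overflow at large k."""
    log_rho = (math.lgamma(k + d) + math.lgamma(1 - d)
               - math.lgamma(d) - math.lgamma(k - d + 1))
    return math.exp(log_rho)


def acf_asymptotic(k, d):
    """Large-lag approximation (1.8): m * k**(2d - 1), m = Gamma(1-d)/Gamma(d)."""
    m = math.gamma(1 - d) / math.gamma(d)
    return m * k ** (2 * d - 1)


if __name__ == "__main__":
    d = 0.44  # value of d reported in Section 6
    for k in (1, 2, 10, 100):
        print(k, acf_product(k, d), acf_gamma(k, d), acf_asymptotic(k, d))
```

The exact and asymptotic values agree closely at large lags. Because the exponent 2d - 1 lies between -1 and 0 for this range of d, the autocorrelations decay too slowly to be summable, in line with Equation (1.2).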

The partial autocorrelation function of the fractional ARIMA process can be expressed as follows [17]:

${\varnothing}_{kk}=\frac{d}{k-d}$ (1.9)

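The Hurst parameter (H) measures the strength of long memory in a time series and is related to the fractional differencing parameter by $d=H-0.5$. An ARFIMA(p, d, q) process with $0<d<0.5$ has long memory, and the process is not stationary if $d\ge 0.5$.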

4. Estimating the Hurst Parameter

There are several methods for estimating the Hurst parameter of a FARIMA model; the most important is the R/S method [18].

5. The Rescaled Range R/S Method

The rescaled range (R/S) technique was first presented by Hurst, who defined the range $R\left(t,m\right)$ as [19]:

$R\left(t,m\right)={\mathrm{max}}_{1\le t\le m}Y\left(t,m\right)-{\mathrm{min}}_{1\le t\le m}Y\left(t,m\right)$ (1.10)

where t is the discrete integer-valued time and m is the time span. The standard deviation of the process, $S\left(t,m\right)$, is:

$S\left(t,m\right)=\sqrt{\frac{1}{m}{\displaystyle {\sum}_{t=1}^{m}{\left(Y\left(t,m\right)-\overline{Y}\left(m\right)\right)}^{2}}}$ (1.11)

The R/S ratio allows the ranges of different processes to be compared over long periods. Hurst found that the ratio of the range $R\left(t,m\right)$ to the standard deviation $S\left(t,m\right)$ follows a power law:

$E\left[R\left(m\right)/S\left(m\right)\right]=c{m}^{H}$ as $m\to \infty $ (1.12)

where H is the Hurst parameter ($0<H<1$) and c is a finite positive constant that does not depend on the time span m. Taking logarithms of Equation (1.12) gives [20]:

$\mathrm{log}\left\{E\left[R\left(m\right)/S\left(m\right)\right]\right\}=C+H\mathrm{log}\left(m\right)+e\left(m\right)$ (1.13)

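Equation (1.13), in which $C=\mathrm{log}\left(c\right)$, is the basis of the pox diagram of R/S: plotting $\mathrm{log}\left(R/S\right)$ against $\mathrm{log}\left(m\right)$ gives a line whose slope estimates H.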
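The R/S procedure of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation. It assumes the classical Hurst formulation, in which Y is the cumulative sum of deviations from the block mean and S is the standard deviation of the raw block. The block sizes (doubling from 8) and the function names are choices made here for illustration only.

```python
import math
import random


def rescaled_range(block):
    """R/S of one block: range of the cumulative mean-adjusted sums over the block's std."""
    m = len(block)
    mean = sum(block) / m
    cum, y = 0.0, []
    for x in block:
        cum += x - mean
        y.append(cum)
    r = max(y) - min(y)
    s = math.sqrt(sum((x - mean) ** 2 for x in block) / m)
    return r / s if s > 0 else None


def hurst_rs(series, min_block=8):
    """Estimate H as the least-squares slope of log(mean R/S) on log(m)."""
    n = len(series)
    log_m, log_rs = [], []
    m = min_block
    while m <= n // 2:
        ratios = []
        for start in range(0, n - m + 1, m):  # non-overlapping blocks of length m
            rs = rescaled_range(series[start:start + m])
            if rs is not None and rs > 0:
                ratios.append(rs)
        if ratios:
            log_m.append(math.log(m))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        m *= 2
    mx = sum(log_m) / len(log_m)
    my = sum(log_rs) / len(log_rs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_m, log_rs))
    sxx = sum((a - mx) ** 2 for a in log_m)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # synthetic data, not the hospital series
    h = hurst_rs(noise)
    print("H estimate:", h, "implied d = H - 0.5:", h - 0.5)
```

For uncorrelated noise the estimate should lie near 0.5, although R/S is known to be biased upward in short samples. Values well above 0.5 point to long memory.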

6. Empirical Results

This section discusses the empirical results of applying ARFIMA models to data on diabetes patients who attended Al-Baha hospitals during the period from January 2006 to November 2016. The analysis covers testing for long memory, identification, estimation, and diagnostic checking, and was carried out using the R statistical software.

The sequence chart of diabetes patients who attended Al-Baha hospitals from January 2006 to November 2016 is shown in Figure 1.

The number of diabetes patients fluctuates, showing a slight increase from 2008 up to 4308 patients in February 2014, then a decrease to 245 patients in June 2014, after which it fluctuated steadily until the end of the study interval in 2016. The descriptive statistics of diabetes patients who attended Al-Baha hospitals during the period from January 2006 to November 2016 are reported in Table 1.

From Table 1, the mean and standard deviation of diabetes patients who attended Al-Baha hospitals are 878.18 and 651.786, respectively. The value of the Jarque-Bera test is 130.14 with a significant probability value of 0.000, which indicates that the distribution of diabetes patients who attended Al-Baha hospitals is not normal.
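For readers unfamiliar with the Jarque-Bera test, the statistic combines sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom. The sketch below is a minimal Python illustration with a hypothetical function name and toy data. It does not reproduce the value in Table 1, because the underlying monthly counts are not listed in the paper.

```python
import math


def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-square with 2 df
    return jb, p_value


if __name__ == "__main__":
    toy = [-1.0, 0.0, 1.0]  # toy data for illustration only
    print(jarque_bera(toy))
```

Under the chi-square approximation, a statistic as large as the reported 130.14 corresponds to a p-value far below 0.001, which is consistent with the reported probability of 0.000.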

Figure 2 shows the autocorrelation function and partial autocorrelation function of diabetes patients who attended Al-Baha hospitals from January 2006 to November 2016.

The autocorrelation function starts with large positive peaks and decays gradually to zero at increasing lags, while the partial autocorrelation function shows large positive peaks that cut off to zero after lag 5. These results confirm that the series of diabetes patients who attended Al-Baha hospitals is non-stationary and shows long memory. To check whether the series is stationary, both the Augmented Dickey-Fuller (ADF) test and the Phillips-Perron (PP) test were applied to the level of the series (January 2006 to November 2016) and to its first difference; the empirical results are reported in Table 2.

Figure 1. Sequence chart of diabetes patients attended Al Baha hospitals.

Figure 2. Partial autocorrelation function of diabetes patients attended Al-Baha hospitals.

Table 1. Sequence chart and descriptive statistics of diabetes patients attended Al Baha Hospitals.

For the series level in Table 2, the ADF and PP test statistics in absolute terms are 2.15 (probability 0.51) and 8.74 (probability 0.21), respectively, which confirms that the level of the diabetes patients series is nonstationary. Applied to the first difference of the series, the two tests give absolute statistics of 6.65 (probability 0.01) and 14.52 (probability 0.01), respectively, so the first difference of the diabetes patients series is stationary. To test whether the series of diabetes patients who attended Al-Baha hospitals from January 2006 to November 2016 has long memory, the Hurst exponent test was applied to the data; the empirical results are reported in Table 3.

The Hurst exponent test results reported in Table 3 confirm that there is long memory in the diabetes patients data. Once both the correlogram and the Hurst exponent test produced by rescaled range analysis have confirmed the presence of long memory in the data of diabetes patients who attended Al-Baha hospitals, the fractional difference parameter d must be estimated.

Three estimation methods, namely Sperio, Geweke and Porter-Hudak (GPH), and R/S analysis ($d=H-0.5$), were applied to the data on diabetes patients who attended Al-Baha hospitals to estimate the fractional difference d; the findings are shown in Table 4.

The estimated values of the fractional difference d are reported in Table 4; the fractional differencing parameter $d=H-0.5=0.44$ is used to estimate the ARFIMA model. To construct an autoregressive fractionally integrated model for forecasting the diabetes patients data, the estimated value $d=0.44$ was used to generate the fractionally differenced diabetes patients series. Table 5 reports the results of the ADF and PP tests applied to the fractionally differenced diabetes patients data.
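Generating the fractionally differenced series amounts to applying the filter $(1-B)^{d}$ of Equation (1.5) to the data. The following minimal Python sketch shows one way to do this, with the infinite sum truncated at the start of the sample. The function names and the example counts are hypothetical, and the paper does not state whether the series was mean-adjusted before filtering.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients of (1 - B)^d via the recursion w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, using only the observations available at each time."""
    w = frac_diff_weights(d, len(series))
    out = []
    for t in range(len(series)):
        out.append(sum(w[k] * series[t - k] for k in range(t + 1)))
    return out


if __name__ == "__main__":
    counts = [812.0, 905.0, 760.0, 1010.0, 880.0]  # hypothetical monthly counts
    print(frac_diff(counts, 0.44))
```

Setting d = 1 reproduces ordinary first differencing after the first observation, and d = 0 leaves the series unchanged, which gives a quick sanity check on the filter.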

Both the ADF and PP tests were applied to the fractionally differenced diabetes series ( $d=0.44$ ), and the empirical findings confirm that the fractionally differenced series is stationary. Since the correlogram and the Hurst exponent test produced by rescaled range analysis confirmed the presence of long memory in the data of diabetes patients who attended Al-Baha hospitals, and the fractional difference was estimated, the findings indicate that an autoregressive fractionally integrated time series model is appropriate for modeling and forecasting the diabetes patients data. To build the ARFIMA model, the fractional difference value $d=0.44$, which satisfies $0<d<0.5$, is used for estimation, and the fractionally differenced diabetes patients data have been generated. Numerous ARFIMA(p, 0.44, q) models with the fractional parameter fixed are estimated and compared in Table 6 in order to select an appropriate and parsimonious candidate model for forecasting the time series data.

Table 2. ADF test results of diabetes patients attended Al-Baha hospitals.

Table 3. Hurst exponent test results of diabetes patients attended Al-Baha hospitals.

Table 4. Fractional differenced d estimation results.

Table 5. ADF test results of frac diff (diabetes) patients attended Al-Baha hospitals.

Table 6 shows that the ARFIMA(1, 0.44, 0) model has the smallest values of the AIC and BIC model selection criteria. This model assumes that the diabetes patients data follow an autoregressive process of order 1, a moving average of order 0, and a fractional difference of order 0.44. Table 7 reports the parameter estimates of the ARFIMA(1, 0.44, 0) model for diabetes patients who attended Al-Baha hospitals. The autoregressive parameter estimate is statistically significant at the 0.05 significance level; therefore this model appears to be a good fit (Figure 3).

Table 6. Simulation of ARFIMA models for diabetes patients attended Al-Baha hospitals.

Table 7. ARFIMA(1, 0.44, 0) Parameter estimate for diabetes patients attended Al Baha Hospitals.

Figure 3. Forecasting values of diabetes patients attendance at Al-Baha hospitals for 24 months.
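The model selection step can be sketched as follows, assuming the usual definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations. The log-likelihood values, the parameter counts, and n = 131 in this Python example are hypothetical placeholders, not the figures behind Table 6.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return n_params * math.log(n_obs) - 2 * log_lik


def select_model(candidates, n_obs):
    """candidates maps a label to (log-likelihood, number of estimated parameters)."""
    scored = {label: (aic(ll, k), bic(ll, k, n_obs))
              for label, (ll, k) in candidates.items()}
    best_aic = min(scored, key=lambda label: scored[label][0])
    best_bic = min(scored, key=lambda label: scored[label][1])
    return scored, best_aic, best_bic


if __name__ == "__main__":
    # Hypothetical log-likelihoods for illustration; NOT the values in Table 6.
    candidates = {
        "ARFIMA(1, 0.44, 0)": (-1000.0, 2),
        "ARFIMA(0, 0.44, 1)": (-1001.5, 2),
        "ARFIMA(1, 0.44, 1)": (-999.8, 3),
    }
    scored, best_aic, best_bic = select_model(candidates, n_obs=131)
    for label, (a, b) in scored.items():
        print(label, round(a, 2), round(b, 2))
    print("Smallest AIC:", best_aic, "| smallest BIC:", best_bic)
```

Both criteria penalize additional parameters, and BIC does so more heavily as n grows. A richer model is therefore preferred only when its likelihood gain outweighs the penalty.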

Figure 3 shows the forecast values of diabetes patient attendance at Al-Baha hospitals for 24 months generated from the fitted ARFIMA(1, 0.44, 0) model, together with 95% confidence limits. These forecasts are close to the original data under consideration.

The estimated equation of ARFIMA(1, 0.44, 0) is expressed as follows:

$\left(1+0.023B\right){\left(1-B\right)}^{0.44}Diabete{s}_{t}={\epsilon}_{t}$
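To indicate how point forecasts follow from such an equation, the sketch below expands the product of the autoregressive polynomial and $(1-B)^{d}$ into an AR(∞) form and forecasts recursively. It is a minimal Python illustration under explicit assumptions: the estimated equation is read in the form of Equation (1.3) as (1 + 0.023B)(1 − B)^0.44 X_t = ε_t, the series is mean-adjusted, and the history values are hypothetical. It does not produce the confidence limits shown in Figure 3, and it is not the R routine used by the authors.

```python
def frac_diff_weights(d, n):
    """Coefficients of (1 - B)^d via w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def ar_infinity_coeffs(ar_poly, d, n):
    """First n coefficients c_k of ar_poly(B) * (1 - B)^d (polynomial product)."""
    w = frac_diff_weights(d, n)
    c = [0.0] * n
    for i, a in enumerate(ar_poly):
        for k in range(n - i):
            c[i + k] += a * w[k]
    return c


def forecast(history, ar_poly, d, horizon):
    """Recursive point forecasts for a zero-mean series: x_hat = -sum_{k>=1} c_k * x_{t-k}."""
    values = list(history)
    c = ar_infinity_coeffs(ar_poly, d, len(values) + horizon)
    preds = []
    for _ in range(horizon):
        t = len(values)
        x_hat = -sum(c[k] * values[t - k] for k in range(1, t + 1))
        values.append(x_hat)  # later steps treat earlier forecasts as data
        preds.append(x_hat)
    return preds


if __name__ == "__main__":
    # Hypothetical mean-adjusted monthly counts, for illustration only.
    history = [-66.0, 27.0, -118.0, 132.0, 2.0, 40.0]
    # ar_poly = [1, 0.023] encodes the factor (1 + 0.023B) as read from the equation above.
    print(forecast(history, ar_poly=[1.0, 0.023], d=0.44, horizon=3))
```

With d = 0 the recursion reduces to an ordinary AR(1) forecast. With d = 0.44, every past observation contributes through the slowly decaying weights.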

7. Conclusions

An increasing number of diabetes patients is a great challenge to the General Directorate of Health Affairs, Al Baha, Kingdom of Saudi Arabia, and this paper uses ARFIMA models to model diabetes patients who attended Al-Baha hospitals. Monthly records of diabetes patients covering the period from January 2006 to December 2016 were collected from the General Directorate of Health Affairs, Al-Baha region. The empirical results indicate that there is a slight increase in the number of diabetes patients in the region, and the correlogram shows large, significant positive autocorrelations that decay slowly at increasing lags.

Both ADF and PP tests confirmed that the series level is non-stationary; however, the first difference is stationary. The Hurst test results and the ACF confirmed that there is long memory behavior in the diabetes patients’ data, and the estimated fractional difference of the diabetes series is $d=0.44$; unit root tests also indicated that the fractionally differenced diabetes series is stationary. Numerous models were considered for the diabetic patients’ data. According to the model selection criteria, the ARFIMA(1, 0.44, 0) model shows the smallest values of AIC and BIC, hence this model is chosen to represent the data. Diagnostic checking confirms that ARFIMA(1, 0.44, 0) is an adequate and parsimonious model for diabetes patients who attended Al-Baha hospitals. These findings indicate that, for this particular type of data, ARFIMA is highly recommended for modeling and forecasting diabetic patients’ data.

Acknowledgements

The authors are grateful to Professor Muhammad Osman Abdullah, Al-Baha University, for valuable comments.

References

[1] Classification of Diabetes Mellitus 2019.

file:///C:/Users/sshok/Downloads/9789241515702-eng.pdf

[2] (2020) International Diabetes Federation.

https://www.idf.org/our-activities/congress/idf-complications-congress-2020/abstract-submission.html?gclid=Cj0KCQjwz4z3BRCgARIsAES_OVfz2BXcgbH9U2VZ9UCzEJowzWWaBeMLOJ9vH1M8FmqTGcOZxvqt65EaAujLEALw_wcB

[3] World Health Organization. Global Report on Diabetes.

http://www.who.int/diabetes/global-report/en/

[4] Earnest, A., Chen, M.I., Ng, D. and Sin, L.Y. (2005) Using Autoregressive Integrated Moving Average (ARIMA) Models. BMC Health Services Research, 5, Article No. 36.

https://bmchealthservres.biomedcentral.com/track/pdf/10.1186/1472-6963-5-36?site=bmchealthservres.biomed

https://doi.org/10.1186/1472-6963-5-36

[5] Singye, T. and Unhapipat, S. (2018) Time Series Analysis of Diabetes Patients: A Case Study of Jigme Dorji Wangchuk National Referral Hospital in Bhutan. Journal of Physics: Conference Series, 1039, Article ID: 012033.

https://doi.org/10.1088/1742-6596/1039/1/012033

[6] Villani, M., Earnest, A., Nanayakkara, N., Smith, K., De Courten, B. and Zoungas, S. (2016) Time Series Modelling to Forecast Prehospital EMS Demand for Diabetic Emergencies. BMC Health Services Research, 17, Article No. 332.

https://doi.org/10.1186/s12913-017-2280-6

[7] Villani, M., Nanayakkara, N., Ranasinha, S., Earnest, A., Smith, K., Soldatos, G., Teede, H. and Zoungas, S. (2017) Utilisation of Prehospital Emergency Medical Services for Hyperglycaemia: A Community-Based Observational Study. PLoS ONE, 12, e0182413.

https://doi.org/10.1371/journal.pone.0182413

[8] Box, G.E.P. and Jenkins, G.M. (1976) Time Series Analysis Forecasting and Control. 2nd Edition, Holden-Day, San Francisco.

[9] Hosking, J.R.M. (1984) Modeling Persistence in Hydrological Time Series Using Fractional Differencing. Water Resources Research, 20, 1898-1908.

https://doi.org/10.1029/WR020i012p01898

[10] Granger, C.W.J. and Joyeux, R. (1980) An Introduction to Long-Memory Time Series and Fractional Differencing. Journal of Time Series Analysis, 1, 15-29.

https://doi.org/10.1111/j.1467-9892.1980.tb00297.x

[11] Galbraith, J. and Zinde-Walsh, V. (2001) Autoregression-Based Estimators for ARFIMA Models. CIRANO, Montreal, 1-45.

[12] Makraidkis, S., Wheelright, S. and McGee, E. (1983) Forecasting: Methods and Application. 2nd Edition, John Wiley & Sons, Hoboken.

[13] Purohit, S., Kelkar, S. and Simha, V. (1998) Time Series Analysis of Patients with Rotavirus Diarrhoea in Pune, India. Journal Diarrhoeal Disease Research Bangladesh, 16, 74-83.

[14] Peters, E.E. (1991) Fractal Market Analysis. Wiley, New York.

[15] Sowell, F. (1992) Modeling Long-Run Behavior with the Fractional ARIMA-Model. Journal of Monetary Econometrics, 29, 277-302.

https://doi.org/10.1016/0304-3932(92)90016-U

[16] Teverovsky, V., Taqqu, M.S. and Willingerb, W. (1999) A Critical Look at Lo’s Modified R/S Statistic. Journal of Statistical Planning and Inference, 80, 211-227.

https://doi.org/10.1016/S0378-3758(98)00250-X

[17] Hurst, H.E. (1951) Long-Term Storage of Reservoirs. Transactions of the American Society of Civil Engineers, 116, 770-799.

[18] Reisen, V., Abraham, B. and Lopes, S. (2001) Estimation of Parameters in ARFIMA Processes: A Simulation Study. Communication in Statistics—Simulation and Computation, 30, 787-813.

[19] Reisen, V.A. (1994) Estimation of the Fractional Difference Parameter in the ARIMA(p,d,q) Model Using the Smoothed Periodogram. Journal of Time Series Analysis, 15, 335-350.

https://doi.org/10.1111/j.1467-9892.1994.tb00198.x

[20] Sheng, H., Chen, Y.Q. and Qiu, T. (2011) On the Robustness of Hurst Estimators. IET Signal Process, 5, 209-225.

https://doi.org/10.1049/iet-spr.2009.0241