The World Health Organization (WHO) has stated that diabetes is an important public health problem and one of the four priority noncommunicable diseases targeted by world leaders. WHO also estimated that 422 million adults, or 8.5% of the adult population, were living with diabetes in 2014, up from 108 million, or 4.7% of the adult population, in 1980. It also estimated that 1.5 million deaths in 2012 were due to diabetes, with an additional 2.2 million deaths due to the increased risks of cardiovascular and other diseases caused by higher-than-optimal blood glucose levels. The International Diabetes Federation (IDF) estimated that the number of people (aged 20-79) with diabetes would increase from 415 million in 2015 to 642 million in 2040. The IDF also estimated that 5.0 million deaths were caused by diabetes and that the cost of diabetes might have been between $673 billion and $1197 billion in 2015 ($ refers to US$). The NCD Risk Factor Collaboration studied trends in the diabetes population since 1980 and concluded that, if post-2000 trends continued, only a limited number of countries could achieve the global target of halting the rise in the prevalence of diabetes by 2025. The direct annual global cost of diabetes was estimated to be $825 billion, led by China ($170 billion), the United States ($105 billion), India ($73 billion), and Japan ($37 billion), based on the number of people with diabetes in 2014. The American Diabetes Association (ADA) estimated that the total cost of diabetes in the United States was $245 billion in 2012.
More recently, Bommer et al. estimated that the worldwide cost of diabetes was $1.31 trillion, or 1.8% of the world gross domestic product (GDP), in 2015 (also see Zhang and Gregg for comments on this study). They reported that two-thirds of the costs were direct medical costs ($857 billion) and one-third were indirect costs, such as reductions in productivity. Diabetes as a comorbidity prolongs the length of stay (LOS) in a hospital. The costs and economic burden of diabetes are a serious worldwide issue and have been the subject of various studies. Diabetes may cause various complications. WHO stated that “Possible complications include heart attack, stroke, kidney failure, leg amputation, vision loss and nerve damage.” It has also been pointed out that diabetes increases the risk of cancer.
In Japan, according to a patient survey by the Ministry of Health, Labour and Welfare, the number of diabetes patients treated regularly was 3.16 million (1.77 million males and 1.40 million females) in 2014, an increase of 0.46 million from the previous survey in 2011. On the other hand, the National Health and Nutrition Survey reported that 19.5% of males and 9.2% of females were living with diabetes in 2015 in Japan. As a result, the medical costs for diabetes reached 1219 billion yen, or 3.0% of Japan’s total medical expenditure (40.8 trillion yen, or 8.33% of Japanese GDP), in fiscal year 2015 (the Japanese fiscal year runs from April to March of the next year).
Nawata and Kawabuchi analyzed the LOS and daily medical expenditures of type 2 diabetic patients. (Diabetes is classified as type 1 or type 2, and 90% or more of diabetes cases are classified as type 2 diabetes.) They found that there were large differences in average LOS (ALOS) among hospitals. On the other hand, the differences in daily medical expenditures among hospitals were relatively small, and ALOS accounted for the largest part of total medical expenditures for diabetes. The problem with these studies is that only diabetic inpatients at Diagnosis Procedure Combination (DPC) hospitals were analyzed (for details regarding DPC hospitals, see Nawata et al.). The medical expenditures of outpatients and of patients in non-DPC hospitals were not considered. To evaluate the total cost and economic burden of diabetes, it is necessary to analyze all diabetic patients and to compare them with healthy non-diabetic persons. To prevent diabetes, it is also necessary to determine what factors affect diabetes. For this purpose, it is necessary to investigate a dataset that includes normal and healthy persons and to compare them with diabetic persons. However, it is very difficult and costly to obtain a large-scale individual dataset that includes many normal and healthy individuals, because they do not go to hospitals or clinics voluntarily. Moreover, whether a person is diagnosed with diabetes is a binary variable, so standard regression analysis cannot be used.
In Japan, health insurance societies are formed by private companies and central and local governments for their employees. The health insurance societies pay the medical expenses of their members. Moreover, yearly medical checkups (hereafter, checkups) are required for most workers aged 40 or older in Japan. This means that the health insurance societies hold all of the health and medical information of their members, including normal and healthy persons. The monthly reports of medical payments that are sent from medical institutions to health insurance societies, including types of treatments, institutions used and amounts paid for medical care, are called “receipts” in Japan. Nawata et al. and Nawata and Kimura analyzed blood pressures using a dataset containing 113,979 checkups obtained from 48,022 persons with the cooperation of one health insurance society. However, information from the receipts was not used in the analysis.
In this paper, we first analyze the total costs of diabetes using a dataset combining both checkups and receipts. The dataset contains 113,979 checkups and 3,671,783 monthly medical, dental, care-giving and pharmacy receipts from fiscal year 2013 to fiscal year 2015. Since the outcome (diabetes or not) is a binary variable, we evaluate the factors affecting diabetes using probit models.
2. Data and Analysis
In this study, we first constructed a dataset combining checkups and receipts with the cooperation of the health insurance society of one large Japanese corporation. The dataset was anonymized at the society. The dataset contained information regarding checkups and all receipts from fiscal year 2013 to fiscal year 2015 (i.e., April 2013 to March 2016). It included 113,979 checkups obtained from 48,022 persons, including employees of the corporation and their family members (on a voluntary basis). The receipts contained monthly information on diseases, treatments, institutions used, and payments for all members of the society (employees and their family members satisfying determined conditions). The receipts were classified into five categories: dental, inpatients of DPC hospitals (hereafter, DPC), outpatients and inpatients of non-DPC hospitals (hereafter, outpatient & non-DPC), care-giving, and pharmacies. Among these receipts, we used the information from the DPC, outpatient & non-DPC, and pharmacy receipts for the analysis of diabetes. The total of the values for these three categories is referred to as the “medical payment” in the rest of this paper. (Although we also analyzed the dental payments, no significant difference was observed between diabetic and non-diabetic persons.) The numbers of DPC, outpatient & non-DPC, and pharmacy receipts were 15,652, 1,986,494 and 1,169,920, respectively. The total number of these receipts was 3,172,066 during the sample period. (There were also 498,200 dental and 1517 care-giving receipts.) These receipts were aggregated by fiscal year and combined with the checkup results. We used the information for people who underwent checkups during the fiscal year. A total of 113,979 cases for which both the checkup results and the receipts were available for the same fiscal year were used, and their relationships were analyzed. The ages in the dataset ranged from 39 to 74 years.
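The aggregation step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical receipt layout (the field names `person_id`, `year`, `month`, `category` and `points`, the category labels, and the sample records are all invented for illustration and are not taken from the actual dataset). It maps each monthly receipt to its Japanese fiscal year (April to March) and sums the three categories that make up the “medical payment”.

```python
from collections import defaultdict

# Hypothetical category labels for the three receipt types that form
# the "medical payment"; dental and care-giving receipts are excluded.
MEDICAL_CATEGORIES = {"dpc", "outpatient_non_dpc", "pharmacy"}


def fiscal_year(year, month):
    """Japanese fiscal year: April of `year` through March of `year + 1`."""
    return year if month >= 4 else year - 1


def yearly_medical_payment(receipts):
    """Sum points per (person, fiscal year) over the medical categories."""
    totals = defaultdict(int)
    for r in receipts:
        if r["category"] in MEDICAL_CATEGORIES:
            key = (r["person_id"], fiscal_year(r["year"], r["month"]))
            totals[key] += r["points"]
    return dict(totals)


# Invented example records (not real data).
receipts = [
    {"person_id": "A", "year": 2013, "month": 4, "category": "pharmacy", "points": 300},
    {"person_id": "A", "year": 2014, "month": 3, "category": "dpc", "points": 5000},
    {"person_id": "A", "year": 2014, "month": 4, "category": "outpatient_non_dpc", "points": 700},
    {"person_id": "A", "year": 2014, "month": 5, "category": "dental", "points": 900},
]
totals = yearly_medical_payment(receipts)
print(totals)
```

In this sketch the March 2014 receipt falls in fiscal year 2013, and the dental receipt does not enter the medical payment. The yearly totals could then be joined to the checkup record of the same person and fiscal year.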
Note that we did not separate type 1 and type 2 diabetes, since distinguishing between these types is difficult in adults and the type of diabetes is not reported in many cases.
We first compared the average medical payments per person in a fiscal year for all cases and for diabetic cases. However, other factors might affect the medical payments. For example, medical expenditures increase as a person becomes older. This means that if most diabetic persons were old, their medical payments would be higher because of age, and diabetes itself might not be an important factor. To remove the effects of age and gender, we used regression analysis.
Next, we evaluate the factors that affect being diagnosed with diabetes using the probit model, which is widely used for the analysis of binary response variables. Let y be a dummy variable that takes 1 if a person is diagnosed with diabetes and 0 otherwise. Suppose that an unobserved health condition indicator of a person is given by
\[ y^{*} = x'\beta + u, \]
where \(x\) is a vector of explanatory variables and \(u\) is an error term that follows the standard normal distribution. Let y = 1 (with diabetes) if \(y^{*} > 0\) and y = 0 (without diabetes) otherwise. Since \(x\) can include a constant term and the sign of \(y^{*}\) does not change if \(y^{*}\) is multiplied by a positive constant, we can assume that the mean and variance of \(u\) are 0 and 1 without loss of generality. The probabilities are given by
\[ P(y = 1) = \Phi(x'\beta), \qquad P(y = 0) = 1 - \Phi(x'\beta), \]
where \(\Phi\) is the standard normal distribution function. The model is estimated by the maximum likelihood method. Explanatory variables include basic characteristics of individuals and risk factors pointed out in previous studies.
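As an illustration of what the maximum likelihood method maximizes, the following Python sketch evaluates the probit log-likelihood on simulated data. Everything in it is an assumption made for illustration: the data are randomly generated, the coefficient values (-1.0 and 0.5) are arbitrary, and the function name `probit_log_likelihood` is hypothetical. It is not the estimation code used in this study.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def probit_log_likelihood(beta, rows, outcomes):
    """Sum of log P(y | x) with P(y = 1 | x) = PHI(x'beta)."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        index = sum(b * xi for b, xi in zip(beta, x))
        # Clamp to avoid log(0) for extreme index values.
        p = min(max(PHI(index), 1e-12), 1 - 1e-12)
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total


# Simulated data (assumption): constant term plus one regressor.
rng = random.Random(0)
true_beta = [-1.0, 0.5]  # arbitrary illustrative values
rows, outcomes = [], []
for _ in range(2000):
    x = [1.0, rng.gauss(0.0, 1.0)]
    latent = sum(b * xi for b, xi in zip(true_beta, x)) + rng.gauss(0.0, 1.0)
    rows.append(x)
    outcomes.append(1 if latent > 0 else 0)  # y = 1 if latent index > 0

ll_true = probit_log_likelihood(true_beta, rows, outcomes)
ll_null = probit_log_likelihood([0.0, 0.0], rows, outcomes)
print(ll_true, ll_null)
```

The log-likelihood at the data-generating coefficients is higher than at the all-zero vector, which assigns probability 0.5 to every outcome. A numerical optimizer would search for the coefficient vector that maximizes this function.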
3.1. Medical Payments
The average medical payment per person in a fiscal year was 13,672 points with a standard deviation (SD) of 39,756 points (medical payments are measured in points in Japan, and 10 yen per point is paid to medical institutions; for details, see Nawata et al.). In 21,573 (18.9%) cases, medical payments were zero. On the other hand, in 1945 (1.7%) cases, medical payments exceeded 100,000 points. The total medical expenditure was 1,558 million points over the three fiscal years. In 3353 (2.9%) cases, persons were diagnosed with diabetes. The average medical payment for these cases was 37,112 points with an SD of 61,242 points; thus, the medical payments for diabetic persons were 2.7 times the overall average, or 23,439 points higher. As a result, the total medical payments to diabetic persons over the three fiscal years were 124.4 million points, or 8.0% of the total medical payments. In the national survey, the total payment to hospitals, clinics, and pharmacies was 36.5 trillion yen, and the medical expenditure for diabetes was 3.3% of this total; this implies that diabetes might be a much costlier disease than generally thought.
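The totals and shares quoted above follow directly from the reported case counts and averages. The short Python sketch below reproduces the arithmetic using only figures stated in the text; the variable names are chosen here for illustration.

```python
# Figures reported in the text (points per person per fiscal year).
n_cases = 113_979        # all cases
mean_all = 13_672        # average medical payment, all cases
n_diabetic = 3_353       # cases diagnosed with diabetes
mean_diabetic = 37_112   # average medical payment, diabetic cases
YEN_PER_POINT = 10       # 10 yen is paid per point

total_points = n_cases * mean_all
diabetic_points = n_diabetic * mean_diabetic

prevalence = n_diabetic / n_cases
payment_ratio = mean_diabetic / mean_all
payment_share = diabetic_points / total_points

print(f"total: {total_points / 1e6:.0f} million points")
print(f"diabetic total: {diabetic_points / 1e6:.1f} million points")
print(f"prevalence: {prevalence:.1%}, ratio: {payment_ratio:.1f}, share: {payment_share:.1%}")
print(f"average diabetic payment: {mean_diabetic * YEN_PER_POINT:,} yen")
```

Because the averages are rounded, totals recomputed this way can differ slightly in the last digit from totals computed on the underlying records.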
3.2. Regression Analysis of Medical Payments
The results of the previous section suggest that diabetes is a costly disease. As mentioned earlier, other factors might affect the medical payments. For example, the total medical expenditures per person were 278 thousand yen for those aged 45-64 and 724 thousand yen for those aged 65 or over in fiscal year 2015. In fact, the average ages were 49.7 and 53.4 years for all cases and for diabetic cases, respectively. To remove the effects of age and gender, we consider the regression model (Model A),
\[ \text{Payment} = \beta_1 + \beta_2\,\text{Age} + \beta_3\,\text{Female} + \beta_4\,\text{Diabetes} + \beta_5\,\text{F2014} + \beta_6\,\text{F2015} + u, \]
where Payment is the medical payment (in points), Female (1: female 22.3%, 0: male 77.7%), Diabetes (1: diabetes 2.9%, 0: non-diabetes 97.1%), F2014 (1: fiscal year 2014 33.3%, 0: otherwise) and F2015 (1: fiscal year 2015 34.6%, 0: otherwise) are dummy variables, and u is an error term with a mean of zero.
All 113,979 cases were assessed using this model and the results of the estimation were as follows:
(830) (16) (279) (690) (287) (284)
The standard errors (SEs) are in parentheses. Even after controlling for age, gender and fiscal year, a large difference (21,716 points) remained between the medical payments for diabetic and non-diabetic persons.
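The reason for controlling for age can be illustrated with a small simulation. The Python sketch below is not the estimation used in this study: the data are simulated, the parameter values (a payment increase of 500 points per year of age and a diabetes effect of 20,000 points) are arbitrary assumptions, and the helper `ols` is a hypothetical pure-Python least-squares routine. It shows that, when diabetic persons tend to be older, the raw difference in average payments overstates the effect of diabetes, while a regression that includes age recovers a value close to the simulated effect.

```python
import random
from statistics import fmean


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Simulated data (all values are illustrative assumptions).
rng = random.Random(0)
X, y = [], []
for _ in range(20000):
    age = rng.uniform(40, 74)
    # Older persons are more likely to be diabetic in this simulation.
    diabetes = 1 if rng.random() < 0.01 + 0.002 * (age - 40) else 0
    payment = 1000 + 500 * (age - 40) + 20000 * diabetes + rng.gauss(0, 5000)
    X.append([1.0, age, float(diabetes)])
    y.append(payment)

beta = ols(X, y)  # [constant, Age, Diabetes]
raw_gap = (fmean(p for row, p in zip(X, y) if row[2] == 1)
           - fmean(p for row, p in zip(X, y) if row[2] == 0))
adjusted_gap = beta[2]
print(f"raw gap: {raw_gap:.0f}, age-adjusted gap: {adjusted_gap:.0f}")
```

In practice a statistics package would be used, and the gender and fiscal-year dummies would enter as additional columns of `X` in the same way.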
3.3. Evaluation of Factors Affecting Diabetes by the Probit Model
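Next, we evaluate the factors that might affect being diagnosed with diabetes using probit models. We first consider Model B:
\[
\begin{aligned}
y^{*} = {} & \beta_1 + \beta_2\,\text{Age} + \beta_3\,\text{Female} + \beta_4\,\text{Height} + \beta_5\,\text{BMI} + \beta_6\,\text{SBP} + \beta_7\,\text{DBP} \\
& + \beta_8\,\text{Eat\_fast} + \beta_9\,\text{Late\_Supper} + \beta_{10}\,\text{After\_supper} + \beta_{11}\,\text{No\_breakfast} \\
& + \beta_{12}\,\text{Exercise} + \beta_{13}\,\text{Daily\_activity} + \beta_{14}\,\text{Walk\_fast} + \beta_{15}\,\text{Smoke} \\
& + \beta_{16}\,\text{Alcohol\_freq} + \beta_{17}\,\text{Alcohol\_amount} + \beta_{18}\,\text{Sleep} + \beta_{19}\,\text{F2014} + \beta_{20}\,\text{F2015} + u,
\end{aligned}
\]
\(y = 1\) if \(y^{*} > 0\) and 0 otherwise.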
If a person is diagnosed with diabetes, y = 1, and y = 0 otherwise. This model gives useful information about effective methods of preventing diabetes. A total of 95,366 cases (2788 diabetes and 92,758 non-diabetes) that had no missing values for any of the variables were used in the analysis. The explanatory variables were defined as follows: Age, Female, Height, BMI, SBP (systolic blood pressure), DBP (diastolic blood pressure), Eat_fast (1: eating faster than other people, 0: otherwise), Late_Supper (1: eating supper within two hours before bedtime three times or more in a week, 0: otherwise), After_supper (1: eating snacks after supper three times or more in a week, 0: otherwise), No_breakfast (1: not eating breakfast three times or more in a week, 0: otherwise), Exercise (1: doing exercise for 30 minutes or more twice or more in a week for more than a year, 0: otherwise), Daily_activity (1: doing physical activities (walking or equivalent) for one hour or more daily, 0: otherwise), Walk_fast (1: walking faster than other people of a similar age and the same gender, 0: otherwise), Smoke (1: smoking, 0: otherwise), Alcohol_freq (0: not drinking alcoholic drinks, 1: sometimes, 2: every day), Alcohol_amount (0: not drinking, 1: drinking less than 180 ml of Japanese sake (about 15% alcohol) or the equivalent amount of alcohol in a day when drinking, 2: drinking 180-360 ml, 3: drinking 360-540 ml, 4: drinking 540 ml or over), Sleep (1: sleeping well, 0: otherwise), F2014 and F2015. Age, Female and Height represent basic characteristics of a person. Previous studies pointed out that obesity, blood pressure and smoking were important risk factors. Eat_fast, Late_Supper, After_supper, No_breakfast, Exercise, Daily_activity, drinking alcohol and sleeping represent persons’ lifestyles. These factors are important for giving practical advice on improving lifestyles.
Since two Height values and three BMI values were too large, they were recalculated using the formula BMI = Weight (kg) / Height (m)^2.
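The BMI formula can be written as a small Python helper. This is a minimal sketch: the function name `bmi` and the choice of centimetres as the input unit for height are assumptions made here for illustration, and the weight in the example is simply the value that yields a BMI of 24 at a height of 170 cm.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Illustrative values only: 69.36 kg at 170 cm gives a BMI of about 24.
print(round(bmi(69.36, 170), 1))
```

A height entered in the wrong unit (for example, metres where centimetres are expected) produces an implausibly large BMI, which is one way such outliers can arise.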
The summary of these variables is given under Model B in Table 1. Note that the models considered by Nawata et al. used information on anamnesis as an explanatory variable. However, diabetes itself is an important anamnesis, so “anamnesis” was not included in this analysis. The results of the estimation are given under Model B in Table 2. The estimates of Age, BMI, SBP, Eat_fast, Late_Supper and Smoke were positive and significant at the 1% level. On the other hand, the estimates of Female, DBP, No_breakfast, Alcohol_freq and Sleep were negative and significant at the 1% level. Exercise was positive and significant at the 5% level. These variables were considered to be important variables that affected being diagnosed with diabetes.
Table 1. Summary of explanatory variables.
SD: standard deviation.
The relationship between socioeconomic status and diabetes is an important and controversial issue, and many studies have been done. Therefore, we consider a model that includes Wage (average wage between April and June of the fiscal year for employees of the corporation; mean 368 thousand yen and SD 138 thousand yen). This variable represents the socioeconomic status of a person. We could not obtain other variables representing socioeconomic status. Since Wage was available only for workers, 82,732 cases (2636 diabetes and 80,096 non-diabetes) were used in the analysis. The model (Model C) is as follows:
\[ y^{*} = \beta_1 + \beta_2\,\text{Age} + \cdots + \beta_{20}\,\text{F2015} + \beta_{21}\,\text{Wage} + u, \]
\(y = 1\) if \(y^{*} > 0\) and 0 otherwise, where the terms in \(\cdots\) are the remaining explanatory variables of Model B.
Table 2. Results of estimation of the probit models.
*: significant at the 5% level, **: significant at the 1% level, SE: standard error.
The summary of the explanatory variables is given under Model C in Table 1. The results of the estimation are given under Model C in Table 2. Nawata et al. showed that a person’s wage is an important variable affecting blood pressure. However, it was not significant in the present study (the t-value is 0.655) and is not considered to be an important variable for diabetes. Additional surveys might be necessary to find proper variables representing socioeconomic status (such as positions and types of work at the company). The significance of the other estimates was similar to that in Model B, except that Exercise became insignificant in this model.
Since wages are not a significant factor in being diagnosed with diabetes, we use the results of Model B in this section. Unlike previous studies, we can calculate the probability that a person might have diabetes by the formula \(P(y = 1) = \Phi(x'\hat{\beta})\), where \(\hat{\beta}\) is the maximum likelihood estimator of \(\beta\) in Model B. For comparison of probabilities, we consider as a base case a male person aged 50, with height 170 cm, BMI 24, SBP 125 mmHg, and DBP 80 mmHg; the values of all other variables are set to zero. The probability of this person having diabetes is 3.3%. We consider the effects of the variables that were significant in Model B, except age and gender. Since there have been arguments about the effectiveness of high-intensity exercise, we did not evaluate Exercise in this study. If BMI drops to 20, the probability is reduced to 1.9%. On the other hand, the probability becomes 7.0% if the BMI is 30. Hence, obesity is a very important factor in controlling diabetes, as has been widely suggested. In other words, controlling overweight may be an effective way to reduce the prevalence of diabetes.
With SBP values of 140, 160 and 180 mmHg (these values correspond to the SBP criteria for grades 1, 2 and 3 hypertension, respectively), the probabilities become 4.5%, 6.5% and 9.1%, respectively. High SBP, or SBP hypertension, is another important factor. For hypertensive persons with an SBP of 180 mmHg, the risk is about three times that of normotensive persons. Hypertension raises both vascular mortality and diabetes rates. Therefore, controlling SBP may be more important than previously thought. On the other hand, with DBP values of 70 and 60 mmHg, the probabilities increase to 4.3% and 5.5%, respectively. Although the risk of postural hypotension for diabetic persons has been reported, low DBP, or DBP hypotension, might itself also be an important factor for diabetes.
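The BMI and SBP scenarios above can be related back to the probit index \(x'\hat{\beta}\). The following Python sketch is a rough back-of-the-envelope check, not part of the original analysis: it inverts the reported (rounded) probabilities with the standard normal quantile function and divides the change in the index by the change in BMI or SBP. The resulting slopes are only approximations implied by the rounded probabilities, not the estimates reported in Table 2.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Reported probabilities from the text (rounded to one decimal place).
BASE_PROB = 0.033                  # base case: BMI 24, SBP 125 mmHg
bmi_scenarios = {20: 0.019, 30: 0.070}
sbp_scenarios = {140: 0.045, 160: 0.065, 180: 0.091}


def implied_slope(base_value, value, prob):
    """Change in the probit index per unit change in the variable."""
    index_change = std_normal.inv_cdf(prob) - std_normal.inv_cdf(BASE_PROB)
    return index_change / (value - base_value)


bmi_slopes = {v: implied_slope(24, v, p) for v, p in bmi_scenarios.items()}
sbp_slopes = {v: implied_slope(125, v, p) for v, p in sbp_scenarios.items()}
risk_ratio_sbp180 = sbp_scenarios[180] / BASE_PROB

print(bmi_slopes)
print(sbp_slopes)
print(round(risk_ratio_sbp180, 1))
```

The implied slopes are nearly constant across scenarios (roughly 0.06 per BMI unit and roughly 0.009 per mmHg of SBP), as expected when only one variable changes in a probit index that is linear in that variable. The probability ratio for an SBP of 180 mmHg is about 2.8, in line with the "about three times" stated above.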
As for eating habits, eating quickly and having a late supper increase the probability to 3.8% and 3.9%, respectively. However, having no breakfast reduces the probability to 1.8%. Smoking increases the probability to 4.3%, but drinking alcohol decreases it. If a person drinks alcohol every day, the probability falls to 1.8%. However, proper guidelines for drinking alcohol should be followed. Sleeping well slightly reduces the probability to 2.9%.
The reasons for the effects of some factors are unclear. Only limited information on lifestyle habits was available in this study. Therefore, further surveys are necessary to obtain detailed information on personal lifestyles. For example, to give proper and practical advice on preventing diabetes, we need to know the total calories consumed, the types (carbohydrates, lipids and protein) and amounts of food eaten, and details of exercise and physical activities. This is a clear limitation of the study.
In this paper, we first analyzed the medical payments made for persons diagnosed with diabetes. A total of 2.9% of cases were diagnosed with diabetes; in these cases, the average medical payment per person was 2.7 times the overall average medical payment per person. As a result, they accounted for 8.0% of medical payments, a much larger share than in the national survey data. This result did not change much even when age and gender were considered.
Next, we evaluated the factors that affected whether a person was diagnosed with diabetes using the probit models. BMI, high SBP, low DBP, physical activities, eating habits, smoking, drinking alcohol, and sleeping were important factors affecting diabetes. This means that the prevalence of diabetes could be reduced by efforts such as the prevention of overweight and obesity. However, the reasons underlying some of these factors are unclear, and further studies of persons’ lifestyles are necessary to determine effective methods of preventing diabetes. Although “Wage” was used to represent socioeconomic status, it may not be a proper variable. It may be necessary to find other variables for the evaluation of socioeconomic status. We have succeeded in obtaining information, including other socioeconomic variables, from other health insurance societies and are constructing a dataset that is much bigger than those used in previous studies. An analysis of that dataset is necessary. These are subjects to be studied in the future.
This study was supported by a Grant-in-Aid for Scientific Research, “Analyses of Medical Checkup Data and Possibility of Controlling Medical Expenses” (Grant Number: 17H22509), from the Japan Society for the Promotion of Science. This study was approved by the Institutional Review Boards of the University of Tokyo (number: KE17-10). The authors would like to thank the health insurance society that provided the data for its sincere cooperation. We would also like to thank two anonymous referees for their helpful comments and suggestions.