The World Health Organization (WHO) states that high blood pressure (BP), or hypertension, is one of the most important health risk factors. According to the Ministry of Health, Labour and Welfare, medical expenditures for high BP and related diseases in Japan were 1.85 trillion yen in fiscal year 2015, accounting for 4.36% of total medical expenditures (42.36 trillion yen). Moreover, hypertension increases the medical expenditures of patients with other diseases such as diabetes, and it reduces happiness and life satisfaction. The true cost of hypertension is therefore considered to be much higher than its direct cost.
Many medicines for controlling high BP (hereafter, BP medicines) are widely used, so proper evaluation of their effects is very important. In the hierarchy of evidence for trial designs, systematic reviews of randomized controlled trials (RCT) are considered the most reliable, followed by individual RCT, other controlled clinical trials, observational studies (cohort and case-control), case studies, anecdotes, and personal opinions. In RCT, participants are randomly divided into two (or more) groups. One group is treated with the medicine in question (treated group), and the other with a placebo or a standard medicine (control group).
To control for the effects of participant characteristics such as gender, age, and health conditions, a large sample is often needed, especially for long-term trials. Systematic reviews of RCT, which increase the number of participants by combining various studies, are one solution, but their biases must be considered. These include publication bias, whereby trials with positive results are more likely to be published than those with negative or inconclusive results. Researchers themselves might not have strong incentives to publish when the expected results are not obtained. Sponsor bias is also possible: many studies are sponsored by the pharmaceutical companies that make the products under study, and bias toward the sponsor's products and conflicts of interest have been suggested. In the worst case, researchers might conduct trials dishonestly.
The Diovan scandal is well known in Japan. Diovan is the brand name of valsartan, a medicine widely used to treat hypertension, sold by Novartis Pharma, the Japanese subsidiary of Novartis International AG. RCT were conducted at five university hospitals to study the effects of valsartan on the cardiovascular and cerebrovascular risks of hypertensive patients. The initial publications emphasized the effectiveness of the medicine; however, fraud, improper handling of the datasets, and conflicts of interest were subsequently revealed. Data had been altered in favor of the medicine, and a company employee had analyzed the data without disclosing his affiliation. As a result, many papers published in international journals such as the European Heart Journal, Circulation, International Journal of Cardiology, American Journal of Cardiology, and Lancet had to be retracted, and researchers were penalized. The Japanese Society of Hypertension issued a special report on the Diovan scandal, and Novartis Pharma itself had to issue statements to restore its credibility.
Moreover, trials are often terminated early for various reasons, which can cause termination (or endpoint) bias. These facts suggest that while systematic reviews of RCT can be useful, performing and evaluating RCT properly is not an easy task. Double-blind RCT, in which neither participants nor researchers know who is receiving the study medicine, are usually used, and numerous studies on treatments for hypertension and related diseases have been conducted with this design. However, the ethical problems of double-blind RCT have also been widely discussed. The major arguments are that the control group does not receive the benefit of the medicine when it is effective, and that the double-blind design does not allow rapid detection of adverse effects related to the study medicine. Informed consent is essential for satisfying ethical requirements and protecting participants, but individuals might hesitate to take part in trials if they know that they might receive a placebo. As a result, there have been strong concerns about the reliability of findings.
While RCT are a very important tool for evaluating the effectiveness of medicines, they are far from perfect. Therefore, methods for verifying and evaluating the results of costly and time-consuming RCT are needed. In this paper, we propose methods to evaluate the effects of medicines that do not depend on RCT. In Japan, most employees aged 40 or older are required by the Industrial Safety and Health Act to have health and medical checkups once a year. Private companies and central and local governments form health insurance associations for their workers, and these associations collect health and medical checkup information, including BP and treatment with BP medicines. Since the dataset is not intended to evaluate particular medicines, it is free from the biases described above. However, because its characteristics are quite different from those of data collected by RCT, it must be analyzed with great care. We consider four different statistical methods to analyze the effects of BP medicines.
2. Data and Methods
In this study, we analyzed an anonymized dataset of health checkups obtained from one health insurance society. The dataset includes BP levels and information about treatment with BP medicines. First, we compared the BP levels of two groups: those taking BP medicines (hereafter, with BP medicines) and those not taking them (without BP medicines). Second, because individual characteristics and health conditions can affect BP levels, we controlled for these factors with a regression that includes a dummy variable for taking BP medicines (BP_medicine dummy). Third, to deal with the endogeneity problem, the BP_medicine dummy was replaced by its expected value in the regression model. Finally, we analyzed the data of individuals who took BP medicines in some periods but not in others; in this model, sample selection bias was also considered.
The dataset consisted of 113,979 health and medical checkup cases from 48,022 individuals in the society (all employees aged 40 or over, plus family members who participated voluntarily) over three fiscal years (April 2013 through March 2016). It contained various health and medical information for each individual, including BP levels and treatment with BP medicines. For details of the dataset, see Nawata et al. and Nawata and Kimura. In this study, we considered only systolic BP (SBP) to simplify the argument.
We began by comparing the SBP of two groups: those treated with BP medicines and those not treated. As Nawata et al. suggested, various factors such as age, gender, eating habits, daily activities, smoking, drinking alcohol, and sleeping habits can affect BP levels. To remove such effects, let BP_Medicine be a dummy variable that takes 1 if an individual took BP control medicines in that fiscal year and 0 otherwise, and let x be the vector of other explanatory variables. We considered the following regression model (Model 1):

$$\mathrm{SBP}_i = \alpha + \beta\,\text{BP\_Medicine}_i + x_i'\gamma + u_i, \qquad (1)$$

where $u_i$ is an error term.
The other explanatory variables were Height (cm), BMI (body mass index), Anamnesis (1: with anamnesis; 0: otherwise), Eat_fast (1: eating faster than other people; 0: otherwise), Late_Supper (1: eating supper within 2 hours before bed 3 or more times a week; 0: otherwise), After_supper (1: eating snacks after supper 3 or more times a week; 0: otherwise), No_breakfast (1: skipping breakfast 3 or more times a week; 0: otherwise), Exercise (1: exercising for 30 minutes or more 2 or more times a week for more than a year; 0: otherwise), Daily_activity (1: doing physical activities [walking or equivalent] for 1 hour or more daily; 0: otherwise), Walk_fast (1: walking faster than other people of similar age and the same gender; 0: otherwise), Smoke (1: smoking; 0: otherwise), Alcohol_freq (0: not drinking alcohol; 1: sometimes; 2: every day), Alcohol_amount (0: not drinking; 1: drinking less than 180 ml of Japanese sake [about 15% alcohol] or equivalent per drinking day; 2: 180 to 360 ml; 3: 360 to 540 ml; 4: 540 ml or more), Sleep (1: sleeping well; 0: otherwise), Trend (the annual time trend, given by the year of the checkup − 2013), and Weight_year (1: weight change of more than 3 kg from the previous year; 0: otherwise).
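Some of the codings above are thresholds that are easy to get wrong at the boundaries. The helpers below are a minimal sketch under hypothetical assumptions: that the raw inputs are a sake-equivalent volume per drinking day and measured weights in consecutive years, and that a value exactly on a boundary (180, 360, or 540 ml; exactly 3 kg) falls into the higher or "no change" category as written. The paper does not say how questionnaire answers were recorded, so these function names and boundary rules are illustrative only.

```python
def alcohol_amount(ml_per_drinking_day):
    """Alcohol_amount code from sake-equivalent ml per drinking day (0 or less: non-drinker).

    Assumption: intervals are half-open, e.g. exactly 180 ml is coded 2.
    """
    if ml_per_drinking_day <= 0:
        return 0
    for code, upper in ((1, 180), (2, 360), (3, 540)):
        if ml_per_drinking_day < upper:
            return code
    return 4


def trend(fiscal_year):
    """Trend variable: fiscal year of the checkup minus 2013."""
    return fiscal_year - 2013


def weight_year(weight_kg, prev_weight_kg):
    """Weight_year: 1 if weight changed (up or down) by more than 3 kg since last year."""
    return int(abs(weight_kg - prev_weight_kg) > 3.0)
```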
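Since individuals take BP medicines because their BP levels are high, an endogeneity problem could exist. Let $\mathrm{SBP}_i^*$ be the unobserved SBP of individual $i$ without BP medicines. An individual takes BP medicines if $\mathrm{SBP}_i^*$ is higher than a critical value that depends on individual characteristics. Since $\mathrm{SBP}_i^*$ is not observable if an individual is taking BP medicines, we assume

$$\mathrm{SBP}_i^* = z_i'\delta + \varepsilon_i, \qquad (2)$$

where $z_i$ is a vector of individual characteristics, and the critical value of taking BP medicines or not is given by

$$c_i = z_i'\theta + \eta_i. \qquad (3)$$

An individual takes BP medicines if $\mathrm{SBP}_i^* > c_i$, and does not take them otherwise. Assuming that the error terms are normally distributed, with $\varepsilon_i - \eta_i \sim N(0, \sigma^2)$, we get

$$P(\text{BP\_Medicine}_i = 1) = P\left(\eta_i - \varepsilon_i < z_i'(\delta - \theta)\right).$$

Therefore, we get

$$E(\text{BP\_Medicine}_i) = P(\text{BP\_Medicine}_i = 1) = \Phi(z_i'\pi), \qquad \pi = (\delta - \theta)/\sigma, \qquad (4)$$

where $\Phi$ is the distribution function of the standard normal distribution. Equation (4) is estimated by the probit maximum likelihood method, and the expected values are calculated from the estimates. Replacing the BP_Medicine dummy in Model 1 by its expected value, we get the regression model (Model 2) given by

$$\mathrm{SBP}_i = \alpha + \beta\,\widehat{E}(\text{BP\_Medicine}_i) + x_i'\gamma + \tilde{u}_i. \qquad (5)$$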
Since Model 2 does not satisfy the standard assumptions of the ordinary regression model, White's heteroskedasticity-consistent method is used to calculate the standard errors.
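To see why the simple dummy regression and the expected-value regression can disagree, here is a sketch on simulated data. Everything in it is an assumption for illustration and is not the authors' code: the data-generating process, the hypothetical variable `q` that shifts medicine use but not SBP (an exclusion restriction the text does not state), the Fisher-scoring probit, and the HC0 form of White's standard errors. As with White's method in the text, these standard errors do not correct for the first-stage estimation of the fitted probabilities.

```python
import math
import numpy as np

# Standard normal CDF via math.erf; otypes lets np.vectorize accept empty arrays.
_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(t):
    return 0.5 * (1.0 + _erf(np.asarray(t, dtype=float) / math.sqrt(2.0)))


def norm_pdf(t):
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)


def probit_fit(y, Z, iters=50, tol=1e-10):
    """Probit ML estimate of pi in P(y = 1 | z) = Phi(z'pi), by Fisher scoring."""
    pi = np.zeros(Z.shape[1])
    for _ in range(iters):
        idx = Z @ pi
        p = np.clip(norm_cdf(idx), 1e-12, 1 - 1e-12)
        f = norm_pdf(idx)
        w = f ** 2 / (p * (1 - p))                      # Fisher-information weights
        score = Z.T @ (f * (y - p) / (p * (1 - p)))
        info = (Z * w[:, None]).T @ Z
        step = np.linalg.solve(info, score)
        pi = pi + step
        if np.max(np.abs(step)) < tol:
            break
    return pi


def ols_white(y, X):
    """OLS coefficients with White (HC0) heteroskedasticity-consistent standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b
    meat = (X * (e ** 2)[:, None]).T @ X
    cov = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(cov))


# Hypothetical simulated data: people with a high unobserved SBP shock e take medicine.
rng = np.random.default_rng(1)
n = 40_000
x = rng.normal(size=n)                  # observed characteristic affecting SBP and medicine use
q = rng.normal(size=n)                  # hypothetical variable shifting medicine use only
e = rng.normal(scale=15.0, size=n)      # unobserved SBP shock
med = (0.5 * x + q + e / 15.0 + 0.5 * rng.normal(size=n) > 0).astype(float)
true_effect = -8.0
sbp = 130.0 + 4.0 * x + true_effect * med + e

# Model 1 analogue: OLS on the raw (endogenous) dummy.
b1, se1 = ols_white(sbp, np.column_stack([np.ones(n), x, med]))

# Model 2 analogue: replace the dummy by its fitted probit probability.
Z = np.column_stack([np.ones(n), x, q])
p_hat = norm_cdf(Z @ probit_fit(med, Z))
b2, se2 = ols_white(sbp, np.column_stack([np.ones(n), x, p_hat]))

print(f"dummy coefficient (Model 1 analogue): {b1[2]:.2f}, SE {se1[2]:.2f}")
print(f"E(dummy) coefficient (Model 2 analogue): {b2[2]:.2f}, SE {se2[2]:.2f}")
```

Under this data-generating process the dummy coefficient comes out positive even though the true effect is −8, while the expected-value regression recovers a value near −8. This mirrors the sign reversal between Models 1 and 2; the exact printed values depend on the seed.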
The dataset contains the data of individuals who took BP medicines in some periods but not in others. We therefore consider a model that uses only the data of these individuals (Model 3). As before, individual characteristics must be controlled for, and Model 3 has the same form as Model 1 except that it is estimated on this subset of the dataset. Since only part of the dataset is used, a bias due to sample selection might exist. We therefore check the results with Heckman's sample selection model (for details, see Appendix A). Since Heckman's two-step estimator sometimes behaves poorly, we used the maximum likelihood estimator in EViews (version 9).
3. Results
3.1. Comparisons of BP Levels with and without BP Medicines
Figure 1 shows the distributions of SBP with and without BP medicines. SBP levels without BP medicines were lower than those with BP medicines. We excluded cases where BP values were too large or too small (over 300 or under 10, respectively) or where information regarding BP medicines was not available, leaving 113,960 cases. The basic statistics of SBP and the characteristics of the two groups are presented in Table 1.
The numbers of cases were 95,551 without and 18,409 with BP medicines. The mean and standard deviation (SD) of SBP were 126.0 and 16.0 mmHg without BP medicines, and 134.8 and 15.91 mmHg with BP medicines. The difference of the means was 11.2 mmHg. The t-value for the null hypothesis that the means of the two groups are equal is 86.88, so the hypothesis is rejected at any reasonable significance level: SBP with BP medicines is higher than SBP without.
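The text reports the t-value but not which two-sample t statistic was used. As an illustration only, the two usual forms can be computed from group means, SDs, and sizes as below; with groups as large as those in Table 1 and similar SDs, the two give close values. The function names are hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic for H0: mean1 == mean2 without assuming equal variances."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Classical two-sample t statistic using the pooled variance (equal variances assumed)."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
```

For this comparison the first group would be the cases with BP medicines and the second the cases without, so a positive t means higher mean SBP with BP medicines.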
3.2. Regression Analysis using BP_Medicine Dummy
The previous results suggest that the SBP of individuals with BP medicines is higher than that of those without. However, as Nawata et al. suggested, individual characteristics such as gender, age, and health conditions affect BP levels, and the group with BP medicines might consist of individuals in higher BP categories. To control for these effects, we use Model 1. Cases where body mass index (BMI) was too large (over 100) or BP values were too large or too small (over 300 or under 10) were excluded, leaving 95,212 cases with no missing values in any explanatory variable for the analysis. Of these, 15.8% took BP medicines. A summary of the explanatory variables is presented in Table 2, and the estimation results are given in Table 3.
Figure 1. Distributions of SBP with and without BP medicines.
Table 1. Basic statistics of SBP levels (mmHg) and characteristics of the two groups.
SD: standard deviation. Mean and SD (in parentheses) are given for Age, Height, and BMI.
Table 2. Summary of explanatory variables.
SD: Standard Deviation.
The estimated coefficient of BP_medicine is 5.1 mmHg in Model 1, with a t-value of 34.82. Although this is only about half the difference found in the previous section, it means that the SBP of individuals with BP medicines is still higher than that of those without, even when controlling for various individual factors. A direct interpretation might suggest that BP medicines raise BP levels, that is, that the medicines are harmful rather than merely ineffective. However, individuals take BP medicines because their BP levels are high, so the BP_medicine dummy may be endogenous. We consider this endogeneity problem in the next section.
3.3. Regression Analysis Using Expected Value of BP_Medicine Dummy
Equations (2)-(4) suggest that BP_medicine may be correlated with the error term of Equation (1). It is well known that the ordinary least squares (OLS) estimator is inconsistent in this situation. We therefore use the expected value instead of the original variable (Model 2) and obtain the consistent estimates presented in Table 4. (The results of the probit maximum likelihood estimation used to calculate the expected values of BP_medicine are given in Table 5.) In this model, the estimated coefficient of E(BP_medicine) is −6.9 mmHg, with a t-value of −9.279; that is, taking BP medicines reduced SBP significantly at any reasonable significance level, indicating that the medicines were effective. Note that since the model contains various individual characteristics, this reduction is net of their effects and can be attributed to the medicines.
Table 3. Results of estimation (Model 1).
*Significant at the 5% level; **Significant at the 1% level; SE: standard error.
3.4. Analyses Using the Data of Individuals Who Took BP Medicines in Some Periods and Not in Others
If BP medicines are effective, SBP should be lower when individuals take BP medicines and higher when they do not. Therefore, we can evaluate the effects of BP medicines by analyzing the SBP levels of individuals who took BP medicines in some sample periods but not in others. Since individual characteristics change every year (for example, age increases by one year), a regression analysis is needed. There were 4315 such cases.
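Selecting the cases used for Model 3 amounts to keeping every checkup of a person whose BP medicine status varies across fiscal years. A minimal sketch, assuming a hypothetical record layout of `(person_id, fiscal_year, bp_medicine)` tuples in which a missing status is `None` (the actual dataset layout is not described here):

```python
from collections import defaultdict


def switcher_ids(records):
    """IDs whose BP medicine status is 1 in some fiscal years and 0 in others.

    `records` holds hypothetical (person_id, fiscal_year, bp_medicine) tuples;
    rows with a missing status (None) are ignored.
    """
    statuses = defaultdict(set)
    for pid, _year, med in records:
        if med is not None:
            statuses[pid].add(int(med))
    return {pid for pid, seen in statuses.items() if seen == {0, 1}}


def switcher_cases(records):
    """Keep every non-missing checkup case that belongs to a switcher."""
    records = list(records)
    ids = switcher_ids(records)
    return [r for r in records if r[0] in ids and r[2] is not None]
```

Because the same person appears in both states, Model 3 compares SBP with and without medicines largely within individuals, while the regressors still absorb year-to-year changes such as age.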
Table 4. Results of estimation (Model 2).
*Significant at the 5% level; **Significant at the 1% level; SE: standard error.
The results of Model 3 using these data (Model 3 has the same form as Model 1 but uses a subset of the dataset) are presented in Table 6. Since only part of the dataset was used, sample selection bias was possible; therefore, the estimation results of Heckman's sample selection model by the maximum likelihood method are also presented in Table 6. Although the results for some variables, such as Anamnesis, differ somewhat, the estimates of BP_medicine are similar under the two methods: −9.2 mmHg by OLS and −7.9 mmHg by the sample selection model. This implies that the BP medicines effectively reduced SBP by about 8 to 9 mmHg.
4. Discussion
RCT (especially double-blind RCT) are a very important and widely used tool for evaluating the effectiveness of medicines. However, such trials are costly and time-consuming. At the development stage, pharmaceutical companies finance RCT, but once a medicine is approved for public use, finding private sponsors is not easy. Public funds are also limited, and researchers face difficulty performing further RCT. Moreover, as previously mentioned, RCT are subject to biases and ethical problems. In other words, RCT are not a perfect method.
Table 5. Results of probit estimation (Equation (4)).
*Significant at the 5% level; **Significant at the 1% level; SE: standard error.
In this paper, we considered methods for evaluating the effects of BP medicines using a dataset of the health and medical checkups conducted every fiscal year in Japan, obtained from one health insurance society. The data collection method is completely different from that of RCT. Therefore, the dataset did not require the costs and long duration of a trial, and it was free from the various biases and ethical problems of RCT.
However, such data must be analyzed very carefully. Here, individuals were divided into two groups (with and without BP medicines) by a non-random mechanism: individuals with high BP are more likely to take BP medicines. Accordingly, in a simple comparison, we found the mean SBP of the group with BP medicines to be 11.2 mmHg higher than that of the group without. Even when controlling for individual characteristics by regression analysis, the mean SBP of the group with BP medicines was still 5.1 mmHg higher. This is considered to reflect the endogeneity of the BP_medicine dummy. We then addressed the problem by replacing this explanatory variable with its expected value. Using E(BP_medicine), the estimated coefficient was −6.8 mmHg with a t-value of −9.28, indicating that BP medicines were effective in this model. This result is confirmed by the regression analyses of individuals who took BP medicines in some sample periods but not in others: by the OLS method, SBP was on average 9.2 mmHg lower when BP medicines were taken. Since less than 4% of all cases were used in this analysis, it may have suffered from sample selection bias. However, we obtained a similar estimate (−7.9 mmHg) with Heckman's sample selection model, which suggests that the result is not driven by sample selection.
Table 6. Results of estimation (Model 3).
5. Conclusions
In this paper, we considered a different approach to evaluating the effects of BP medicines. The dataset used was obtained from the health and medical checkups required of most employees aged 40 or over in Japan; thus, it was collected independently of any evaluation of the effects of medicines, and it was neither costly nor time-consuming to obtain. The results of the analyses are free from the various problems of (double-blind) RCT, such as cost, long trial periods, various biases, sponsorship, and ethical issues.
Since the dataset was not designed to evaluate the effects of medicines, careful statistical methods were required. As shown in this paper, even with the same dataset, the results were contradictory depending on the analysis method: the estimated effect of BP medicines even changed sign. Only with careful statistical approaches can such data be used to verify the results of RCT. Such approaches could also help in designing RCT properly and effectively before the costly and time-consuming trials begin.
We considered only BP medicines, and the data covered only three fiscal years from one health insurance society. To verify the method, it will be necessary to evaluate various medicines and treatments and to collect more data over longer periods from various health insurance societies. We are currently negotiating with several health insurance societies to obtain their data. The development of more reliable and standardized statistical methods may also be necessary. These are subjects for future study.
This study was supported by a Grant-in-Aid for Scientific Research, “Analyses of Medical Checkup Data and Possibility of Controlling Medical Expenses (Grant Number: 17H22509)”, from the Japan Society for the Promotion of Science. This study was also conducted as part of the project “Exploring Inhibition of Medical Expenditure Expansion and Health-oriented Business Management Based on Evidence-based Medicine” undertaken at the Research Institute of Economy, Trade and Industry (RIETI). The dataset was anonymized at the health insurance society. This study was approved by the Institutional Review Board of the University of Tokyo (number: KE17-30). The authors would like to thank the health insurance society for its cooperation in providing the data. We would also like to thank an anonymous referee for helpful comments and suggestions.
Appendix A: Heckman’s Sample Selection Model
To evaluate the effects of BP medicines using the data of individuals who took BP medicines in some sample periods but not in others, we considered the model

$$\mathrm{SBP}_i = \alpha + \beta\,\text{BP\_Medicine}_i + x_i'\gamma + u_i. \qquad (6)$$

However, we used less than 4% of all cases for the estimation, which means that we might have selected a non-representative group of individuals. In this situation, Heckman's sample selection model is widely used to verify the estimation results. We consider a latent variable

$$s_i^* = z_i'\kappa + w_i, \qquad (7)$$

and an individual takes BP medicines if

$$s_i^* > 0, \qquad (8)$$

where $w_i$ is an error term with mean zero and variance 1, and the correlation coefficient of $u_i$ and $w_i$ is $\rho$. For details of the model and likelihood function, see Amemiya and Nawata. The estimation results of the selection equation (Equation (8)) are given in Table A1. The estimate of $\rho$ is 0.926 with a t-value of 117.08, and the effect of sample selection bias is taken into account in this model.
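For reference, the log-likelihood maximized in this kind of model has a standard closed form (Amemiya's Type 2 Tobit model). The function below only evaluates that likelihood and is a sketch with hypothetical names, not the EViews implementation the authors used: `y` holds the outcome (values in unselected rows are ignored), `X` and `Z` are the regressors of the outcome and selection equations, `s` is the selection indicator, and `sigma` is the standard deviation of u. Maximizing it would require a numerical optimizer, which is not shown.

```python
import math
import numpy as np

# Standard normal CDF via math.erf; otypes lets np.vectorize accept empty arrays.
_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(t):
    return 0.5 * (1.0 + _erf(np.asarray(t, dtype=float) / math.sqrt(2.0)))


def norm_logpdf(t):
    t = np.asarray(t, dtype=float)
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * t ** 2


def heckman_loglik(beta, sigma, kappa, rho, y, X, Z, s):
    """Log-likelihood of Heckman's sample selection model (Type 2 Tobit).

    Outcome:   y_i = X_i beta + u_i, u_i ~ N(0, sigma^2), used only where s_i is True
    Selection: s_i = 1 if Z_i kappa + w_i > 0, w_i ~ N(0, 1), corr(u_i, w_i) = rho
    """
    s = np.asarray(s, dtype=bool)
    zk = Z @ kappa
    # Unselected rows contribute log P(s_i = 0) = log Phi(-Z_i kappa).
    ll_out = np.log(norm_cdf(-zk[~s]))
    # Selected rows: log density of u_i plus log P(s_i = 1 | u_i).
    r = (y[s] - X[s] @ beta) / sigma
    arg = (zk[s] + rho * r) / math.sqrt(1.0 - rho ** 2)
    ll_in = norm_logpdf(r) - math.log(sigma) + np.log(norm_cdf(arg))
    return float(ll_out.sum() + ll_in.sum())
```

At ρ = 0 the likelihood separates into a probit part and a normal regression part, so the selection correction only matters when ρ differs from zero, which is a convenient check on any implementation.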
Table A1. Results of estimation: selection model (Equation (8)).
*Significant at the 5% level; **Significant at the 1% level; SE: standard error.