Breast cancer is a heavy burden for women all over the world. Around 1.7 million women worldwide (12% of all new cancer cases) were diagnosed with breast cancer in 2012. According to the National Cancer Institute (NCI), breast cancer is common among women in the United States: the estimated number of new cases for 2015 is 231,840, representing 29% of all new cancer cases in women. African American (AA) breast cancer patients experience lower survival rates than other groups in the United States. According to NCI statistics, 121 of every 100,000 African American women are diagnosed with breast cancer. Compared to Caucasian women, AA women have a 10% lower incidence rate but a 37% higher death rate. African Americans’ socio-economic status and knowledge also make them one of the most vulnerable patient groups.
In studying patient survival, parametric models are useful alternatives to the Kaplan-Meier and Cox-PH models. Because a parametric model provides the time ratio, its results are easier to interpret and are more informative and relevant to clinicians. Previous investigations have pointed out the significance of causes of death other than breast cancer itself among breast cancer patients and have used competing risk methods to study them. Moreover, in studying competing risks, parametric analysis takes all possible risks into account simultaneously and may provide better findings.
The aims of this study are: 1) to perform a parametric competing risk analysis of AA breast cancer patients in the USA from 1973 to 2012; 2) to apply the parametric mixture model method to observe overall survival; and 3) to compare the parametric and non-parametric survival models for a specific group of patients.
2. Materials and Methods
The data set for this study was provided by the Surveillance, Epidemiology, and End Results (SEER) Program (SEER, 2012). It consists of 57,181 African American women diagnosed with breast cancer by histology, cytology, or microscopic confirmation during the period 1973-2012. Our study includes African American breast cancer patients aged above 20 at diagnosis (cancer site labeled breast by ICD-O-3 codes [C500-506 and C508-509]). After removing patients with insufficient information, 47,016 subjects remain: 39,446 with malignant tumors and 7570 with carcinoma tumors. Of the 7570 patients with carcinoma tumors, 1513 (20%) died. Of the 39,446 patients with malignant tumors, 19,950 (50.6%) died.
Patients’ actual ages in years were stored as a numerical variable ranging from 20 to 106. Disease stage was categorized using the SEER simplified version of stage. There are four stage categories: in situ, localized, regional, and distant (coded as 0, 1, 2, and 4, respectively, by SEER Historic Stage A). Information on radiation and surgical therapies is available in the SEER data sets. Information about chemotherapy and hormone therapy can be obtained by linking the SEER and Medicare claims datasets, but the quality of those data is questionable. Therefore, only radiotherapy and surgical therapy are included in this study. Tumor histologic grades were classified as well differentiated (grade 1), moderately differentiated (grade 2), poorly differentiated (grade 3), undifferentiated (grade 4), and unknown grade. In the data, all breast cancer patients at the in situ stage have carcinoma tumor behavior, and all patients at other stages are classified as having malignant tumors. Tumor behavior is therefore not included in this study, because this information is already captured by cancer stage. Patients’ marital status is categorized as single, married, separated, divorced, or widowed. Socioeconomic factors are also of interest. Insurance classification is the only socio-economic factor in the SEER database; however, more than 80% of the insurance classification data are unknown. Therefore, we decided not to include this factor in our analysis.
First, the SEER Cause-Specific Death Classification and Other Cause of Death Classification items are used to obtain the vital status and cause of death of the studied patients. However, these two items do not categorize the causes of death other than breast cancer. To investigate other significant causes of death, we used the item Cause of Death to SEER Site Recode (ICD_5DIG). We observed that 54.35% of the patients are still alive, 23.67% have died of breast cancer, and 6.98% have died of heart diseases. Therefore, in this study, we analyze three competing causes of death: breast cancer, heart disease, and other causes. We observed that the distributions of survival times in years for all of these causes of death follow the Weibull distribution.
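Suppose that the data consist of observations on the survival times of n patients. Associate with each individual i a survival time $T_i$ and a random variable $B_i$ that classifies the cause of death:
$$B_i = j \quad \text{if individual } i \text{ dies of cause } j, \qquad j = 1, 2, 3.$$
The probability that individual i dies of cause j is $p_j = P(B_i = j)$, with $\sum_{j} p_j = 1$.
Following the cause-specific distributions approach in Prentice et al. (1978) and Kalbfleisch and Prentice (1980), the competing risk model starts with cause-specific hazards:
$$\lambda_j(t) = \lim_{\Delta t \to 0} \frac{P(t \le T_i < t + \Delta t,\ B_i = j \mid T_i \ge t)}{\Delta t}.$$
The conditional cumulative distribution function (CDF) of $T_i$, given that the death is of type j, is:
$$F_j(t) = P(T_i \le t \mid B_i = j).$$
Then the overall CDF is given by
$$F(t) = \sum_{j} p_j F_j(t).$$
This produces the hazard function associated with the CDF as developed by Maller and Zhu:
$$h(t) = \frac{f(t)}{1 - F(t)}, \qquad f(t) = \frac{dF(t)}{dt}.$$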
Since all the three cause-specific survival functions are the accelerated failure time models with Weibull distribution, the overall hazard and survival function of the competing risk model with the adjustment of the applicable covariates given in formula (1) can be written as:
where X and B are the covariate and parameter matrices.
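As a rough numerical illustration of the mixture formulation described above, the following sketch combines three Weibull cause-specific distributions into an overall CDF and the associated hazard f(t) / (1 - F(t)). The mixing probabilities and the Weibull shape and scale values are hypothetical placeholders, not estimates from this study, and the function names are assumptions made for illustration. Covariates are omitted for brevity.

```python
import math

def weibull_cdf(t, shape, scale):
    """Weibull CDF: F(t) = 1 - exp(-(t / scale) ** shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_pdf(t, shape, scale):
    """Weibull density, valid for t > 0."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def overall_cdf(t, probs, params):
    """Mixture CDF: sum over causes of p_j * F_j(t)."""
    return sum(p * weibull_cdf(t, a, b) for p, (a, b) in zip(probs, params))

def overall_hazard(t, probs, params):
    """Hazard of the mixture: f(t) / (1 - F(t)), for t > 0."""
    density = sum(p * weibull_pdf(t, a, b) for p, (a, b) in zip(probs, params))
    return density / (1.0 - overall_cdf(t, probs, params))

# Hypothetical values for illustration only (NOT the fitted estimates):
# probability of eventually dying of each cause, and (shape, scale) in years.
probs = [0.5, 0.2, 0.3]                           # breast cancer, heart disease, other
params = [(0.9, 20.0), (1.4, 30.0), (1.4, 25.0)]

for year in (1, 5, 10, 20):
    print(year, round(overall_cdf(year, probs, params), 4),
          round(overall_hazard(year, probs, params), 4))
```

With a shape below 1 for one cause and above 1 for the others, the mixture hazard can fall at first and rise later, which is the qualitative behavior a competing risk model of this kind is able to capture.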
The cause-specific and overall survival estimates from the competing risk models are compared with the survival plots derived from the Kaplan-Meier method for a specified group of patients. The overall hazard rate estimated by the competing risk model is also compared to the overall hazard estimated by the Kaplan-Meier method. Because the competing risk model depends on the covariate values, a meaningful comparison requires fixing those values. The group chosen for this comparison consists of women who are older than 45, married, with a third histological grade and localized stage, and who received only surgery. This is the most frequent covariate combination in the data, with 904 patients. All analyses were executed using SAS statistical software version 9.3, and the code is available from the authors upon request. The models’ parameter estimates are presented in Appendix Tables A1-A3 for the three competing risks. In each table, a positive covariate coefficient B lowers the hazard rate and a negative coefficient raises it. The Weibull shape parameter is denoted by α.
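For readers unfamiliar with the non-parametric benchmark used in this comparison, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It is not the SAS code used in this study; the function name and the toy data are hypothetical. For a cause-specific curve, one common convention (assumed here) is to code deaths from other causes as censored.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct observed death time.

    times  -- follow-up time for each patient
    events -- 1 if the death of interest was observed, 0 if censored
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Toy data (hypothetical): follow-up in years, 1 = died, 0 = censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

The estimate changes only at observed death times, so the resulting curve is a step function, in contrast to the smooth curve produced by a parametric model.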
The competing risk models and the overall survival model described above were fitted. The parameter estimates, their p-values, and 95% CIs are presented in the Appendix. For the risk of dying from breast cancer, the estimated scale parameter is 1.1471, which means that the risk of dying from breast cancer itself decreases with time. For the risk of dying from heart diseases, older age decreases the expected survival time. Compared with in situ patients, patients at each successive stage are at risk of dying sooner. Our study shows that using only radiation is likely to decrease the survival time of patients. The estimated scale parameter is 0.7301, so the risk of dying from heart disease for breast cancer patients increases at a decreasing rate. Patients at risk of other causes have a parameter pattern similar to that of patients at risk of heart diseases. The estimated scale parameter is 0.7117, which means that the risk of dying from other causes for breast cancer patients also increases at a decreasing rate.
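The link between the reported scale estimates and the direction of the hazard can be checked with a short calculation. The sketch below assumes the SAS-style accelerated failure time parameterization, in which the reported scale is the reciprocal of the Weibull shape α (as in PROC LIFEREG; the procedure is not named in the text, so this is an assumption). The function names are hypothetical. Because the Weibull hazard is proportional to t^(α-1), it decreases when α < 1 and increases at a decreasing rate when 1 < α < 2.

```python
def weibull_shape_from_scale(scale):
    """Weibull shape alpha = 1 / scale (assumed SAS-style AFT scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 1.0 / scale

def hazard_trend(shape):
    """Describe how a Weibull hazard, proportional to t**(shape - 1), changes over time."""
    if shape < 1.0:
        return "decreasing"
    if shape == 1.0:
        return "constant"
    if shape < 2.0:
        return "increasing at a decreasing rate"
    if shape == 2.0:
        return "increasing linearly"
    return "increasing at an increasing rate"

# Scale estimates as reported in the text for the three competing risks.
reported_scales = {
    "breast cancer": 1.1471,
    "heart disease": 0.7301,
    "other causes": 0.7117,
}

for cause, scale in reported_scales.items():
    shape = weibull_shape_from_scale(scale)
    print(f"{cause}: shape = {shape:.3f}, hazard is {hazard_trend(shape)}")
```

Under this assumption, the three reported scales reproduce the interpretations given in the text: a decreasing hazard for breast cancer and hazards that increase at a decreasing rate for heart disease and other causes.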
The cause-specific hazard rates are plotted in Figure 1. Cancer patients generally focus on cancer more than on any other disease they have. However, this result indicates that the rate of death due to other causes is higher than that due to breast cancer after the first 5 years. This suggests that, in the long run, breast cancer patients should also pay attention to other diseases.
The overall hazard function is plotted in Figure 2. The hazard plot for all causes of death estimates that the rate of death of African American breast cancer patients declines slightly from 0.0385% to 0.0375% in the first year after diagnosis. After that, the death rate increases gradually. The overall survival rate of patients decreases gradually by approximately 3.5% per year. The 10-year survival probability is about 65%.
Figure 1. Hazard rate of each cause of death.
Figure 2. Overall rate of death of African American breast cancer patients.
Comparison between Competing Risk Models and Kaplan-Meier Estimation
We also compare the Competing Risk Models with the Kaplan-Meier curves for the specified sample with the fixed covariate values as described in section 2.2. Estimates for both methods were computed and plotted on the same graphs. Results are presented in Figure 3.
Figure 3 shows that, in comparison to the Kaplan-Meier estimation, the Accelerated Failure Time Model overestimates the survival probability from year 2 to 18. The largest difference is about 4%. The estimate from the Accelerated Failure Time Model for the risk of heart diseases is slightly lower than the Kaplan-Meier estimate from year 5 to 15. The parametric model for other risks shows strong agreement with the Kaplan-Meier estimation. However, the parametric model provides a continuous function and thus a smooth survival curve. In addition, it incorporates the effect of covariates and thus makes the survival prediction more flexible. The small deviation after year 18 can be explained by the lack of data points.
Figure 3. Comparison of the cause-specific and overall survival curves computed by the two methods.
The hazard estimates from the competing risk and non-parametric models are compared in Figure 4. The competing risk model’s hazard rate estimate was presented in Equation (2). The parametric model shows an upward trend in the rate of death over time, while the Kaplan-Meier estimation provides a fluctuating hazard rate. This characteristic of the non-parametric model makes inference about the hazard rate unrealistic: the rate of death in a population due to a particular disease does not usually change so often. The consistency of the competing risk model makes inference about the hazard rate more reasonable. However, the wavy pattern of the Kaplan-Meier curve may reflect some important factor that is not included in the parametric model, such as drug resistance.
In this study, we found several important characteristics of the rate of death of the patients and the effects of some factors on the survival time of AA breast cancer patients. The overall death rate of patients decreases during a short initial period and increases thereafter (Figure 2). However, we discovered that only the death rate due to breast cancer decreases in the initial period, as opposed to the death rates due to heart diseases and other causes (Figure 1). This clearly indicates the importance of competing risk models in the study of African American breast cancer patients.
Our study shows the significant risk that heart diseases and other causes pose to breast cancer patients (Figure 1). The risk of dying of breast cancer is highest in the first five years, but it gradually decreases over time while the risk of dying of other causes increases. Therefore, our model implies that other diseases should not be underestimated when treating African American breast cancer patients. This result, however, may be confounded by other factors such as age. It is possible that deaths after 5 years are primarily caused by aging.
Figure 4. Hazard rate estimation by competing risk model and Kaplan-Meier method.
In Figure 2, the risk of dying goes down in the first month after diagnosis and goes up significantly after that. This curve includes all patients in this study, regardless of treatment, age, or tumor behavior. The explanation for this phenomenon can be found in Figure 1. Our model shows that the risk from breast cancer decreases sharply in the first few months while the risks from other threats increase. The short decline in the aggregated hazard rate is attributed to the sharp decline in breast cancer risk, and the later rise is attributed to the increase in other risks.
The comparison between the two methods for a specified group of patients is shown in Figure 3. Because the model includes covariates and yields a consistent hazard rate, the parametric competing risk model can provide more reasonable results than the Kaplan-Meier estimate when dealing with different populations. Furthermore, when the sample is too small, the Kaplan-Meier estimate stays constant over long stretches of time. The survival of a new group of patients can be estimated easily from the parametric model by simply changing the values of the covariates. These findings suggest that the parametric model may stimulate further study of breast cancer in African Americans.
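The idea of re-using a fitted parametric model for a new group of patients by changing covariate values can be sketched as follows. The example assumes a Weibull accelerated failure time model of the form log T = intercept + x'B + scale * error; the intercept, coefficients, and scale used here are hypothetical placeholders rather than the estimates in Tables A1-A3, and the function names are assumptions for illustration.

```python
import math

def aft_weibull_survival(t, intercept, coefs, x, scale):
    """Survival probability S(t | x) under a Weibull AFT model.

    S(t | x) = exp(-(t / exp(intercept + x'B)) ** (1 / scale))
    """
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return math.exp(-((t / math.exp(linear_predictor)) ** (1.0 / scale)))

def time_ratio(coef):
    """Multiplicative change in survival time per unit increase of a covariate."""
    return math.exp(coef)

# Hypothetical fitted values (for illustration only).
intercept = 3.0
coefs = [-0.02, 0.40]        # e.g. age in years, indicator for surgery
scale = 0.9

group_a = [50.0, 1.0]        # age 50, surgery
group_b = [70.0, 0.0]        # age 70, no surgery

for years in (5, 10):
    sa = aft_weibull_survival(years, intercept, coefs, group_a, scale)
    sb = aft_weibull_survival(years, intercept, coefs, group_b, scale)
    print(years, round(sa, 3), round(sb, 3))

print("time ratio for surgery:", round(time_ratio(coefs[1]), 3))
```

Only the covariate vector changes between the two groups; the fitted parameters are shared. A positive coefficient gives a time ratio above 1, that is, a longer expected survival time and a lower hazard.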
The contribution of tumor differentiation is usually overlooked in the clinical setting. Some researchers believe that the degree of differentiation is not always an indication of the level of tumor invasiveness. The study of Jogi et al. (2012) supported the idea that the differentiation grade is associated with tumor behavior. Our model supports this idea. In the model, the parameters for less differentiated tumors are more negative than those for more differentiated ones. This indicates that the degree of differentiation contributes significantly to patients’ survival time: the less differentiated the tumor is, the higher the patient’s risk.
We observed two notable results about the effects of age and treatment. First, our observation that survival decreases as age at diagnosis increases contradicts the conclusion of Keegan et al. (2012), in which adolescents and young adults had a 44 percent higher risk of dying from breast cancer than patients aged 40 to 64. Colzani et al. (2011) also concluded that women aged less than 45 have a 95% probability of death, whereas this percentage is only 44.5% in patients aged 65 to 74. We investigated this phenomenon further using smoothed hazard curves for two age groups: patients younger than 45 and patients older than 45.
Figure 5 shows that the younger patients have a higher risk in the first 10 years, which is similar to the two findings mentioned above. The largest difference in the rate of death is about 1.4%. From 15 years after diagnosis, the rate of death of patients older than 45 becomes higher. This supports the finding from our models.
The second notable result is that radiation treatment has no significant effect on survival time with respect to the risk of breast cancer. This can also be verified by the smoothed hazard rate estimates presented in Figure 6.
Patients who received both radiation and surgery and those who received surgery only had roughly the same death rate in the first 15 years. After that, patients who received only surgery had a slightly lower death rate. Patients who received only radiation had the highest mortality rate. The rate of death of these patients is about 20% in the early years but declines rapidly over time. Patients who received neither radiation nor surgery also have a high mortality rate in the early years, but their rate of death also declines over time. It cannot be said for certain that using only radiation has a negative effect on breast cancer patients’ survival. Treatment options are chosen based on many different factors, such as cancer stage, tumor size, and patient preference. Patients who chose radiation therapy only in this dataset may already have had a high risk of death. Choice of treatment may also reflect socio-economic status; it is possible that patients who choose radiation only have limited financial access to treatment for other diseases. The rapid decline in the hazard rate of the radiation-only group is not attributed to radiation therapy, since patients who did not receive it show the same pattern of decline. The reason for this drop is the decrease in the risk of breast cancer shown in Figure 1. This finding contradicts Clark et al., who suggested that breast irradiation does not affect survival, and Whelan et al., who suggested that radiation reduces risk. Steward et al.’s study showed that adjuvant radiation improved the survival of patients undergoing breast-conserving therapy. This is similar to our case, where the combination of surgery and radiation shows a significant improvement in survival. However, Steward et al. did not present any result for radiation-only cases that can be compared with our finding. In addition, our study focuses on African American patients, while Steward et al.’s does not differentiate by race. Further studies on the effectiveness of radiation alone, to confirm or disprove this finding, will be helpful for physicians.
Figure 5. Hazard rate of the risk of breast cancer for the two age groups.
Figure 6. Hazard rate of the risk of breast cancer for the four types of treatments.
Beyond its interest for statistical methodology, our study presents the following notable findings that may be useful to clinicians.
・ Patients have the highest risk of dying from breast cancer in the first five years after diagnosis. After that, other diseases pose bigger threats.
・ Our findings support the idea of previous studies that the differentiation grade is associated with tumor behavior. The less differentiated the tumor is, the more dangerous it is.
・ Younger patients have higher risk than older ones in the first 10 years after diagnosis. The difference diminishes after this period. Previous studies presented mixed results about this phenomenon.
・ Our study shows that patients who received only radiation have a higher risk of dying than patients who received other types of treatment, and a risk similar to that of patients who received no treatment. This finding contradicts several previous studies and needs further investigation to confirm or disprove it.
Table A1. Acceleration factor ratios for deaths by breast cancer with 95% confidence intervals and p-values.
Table A2. Acceleration factor ratios for deaths by heart diseases with 95% confidence intervals and p-values.
Table A3. Acceleration factor ratios for deaths by other causes with 95% confidence intervals and p-values.