With the development of the Internet, high-dimensional data are now collected routinely, especially in medical research and finance, where the responses are often censored; the study of high-dimensional censored data is therefore meaningful. However, because of the "curse of dimensionality", the analysis of high-dimensional data becomes extremely difficult, and special methods must be adopted. As the number of dimensions increases, the performance of methods designed for low-dimensional data declines rapidly. In low-dimensional spaces we often use the Euclidean distance to measure similarity between observations, but in high-dimensional spaces such similarity measures lose their discriminating power, which makes mining high-dimensional data severely challenging. On the one hand, the performance of index-structure-based data mining algorithms degrades; on the other hand, many mining methods based on full-space distance functions fail. By reducing the number of dimensions, the data can be mapped from a high-dimensional to a low-dimensional space, where low-dimensional processing methods apply. Therefore, the study of effective dimensionality reduction methods is significant in statistics.
In many studies, the main results or responses of survival data are censored. Survival analysis is another important theme in statistics and has been widely used in medical research and finance, so the study of survival data has attracted much attention. The Cox model  is the most commonly used regression model for survival data. An alternative to the proportional hazards (PH) model is the accelerated failure time (AFT) model, which directly relates the logarithm of the failure time to the covariates; it resembles the traditional linear model and is easier to interpret than the PH model.  considered both the Lasso and threshold-gradient-directed regularization for estimation and variable selection in high-dimensional AFT models.  used partial least squares (PLS) and Lasso methods to select variables in AFT models with high-dimensional covariates.  proposed a robust weighted least-absolute-deviation method to estimate the high-dimensional AFT model.  used the COSSO penalty in the nonparametric AFT model for variable selection.  studied estimation in the high-dimensional nonparametric AFT model with a reproducing kernel Hilbert space norm penalty, and proposed a new boosting algorithm for censored time data that is suitable for fitting parametric accelerated failure time models.  studied the elastic net method for variable selection under the Cox proportional hazards model and the AFT model with high-dimensional covariates.  developed a robust prediction model for event time outcomes through Lasso regularization, aimed at Gehan estimation of the AFT model with high-dimensional predictors.  extended rank-based Lasso estimation to estimation and variable selection in the high-dimensional partly linear accelerated failure time model.  used the bridge penalty for regularized estimation and parameter selection in high-dimensional AFT models.
Based on the high-dimensional semiparametric accelerated failure time model,  proposed a doubly penalized Buckley-James method, which performs variable selection and parameter estimation simultaneously.  developed a method for fast variable selection and shrinkage estimation in AFT models with high-dimensional predictors; the method is related to the relevance vector machine (RVM) and relies on maximum a posteriori estimation to obtain sparse estimates quickly.  proposed a semiparametric regression model whose covariate effects contain parametric and nonparametric parts: the selection of parametric covariates is achieved by an iterative Lasso method, the nonparametric components are estimated using the sieve method, and, based on Kullback-Leibler geometry, an empirical model selection tool for the nonparametric components was obtained; however, some theoretical issues were left unresolved.  considered estimation and variable selection with the Lasso and MCP in AFT models with high-dimensional covariates.  implemented L1/2 regularization in the high-dimensional AFT model for variable selection.  proposed a covariate-adjusted screening and variable selection procedure under the accelerated failure time model, which also appropriately adjusts for low-dimensional confounding factors to achieve more accurate estimation of the regression coefficients.  proposed an adaptive elastic net and a weighted elastic net for variable selection in the AFT model with censored, high-dimensional data.  proposed applying a tensorized recurrent neural network architecture to extract latent representations from entire patient medical records in the high-dimensional AFT model.  considered a novel sparse L2 boosting algorithm for model prediction and variable selection in a semiparametric varying-coefficient accelerated failure time model for right-censored survival data with high-dimensional covariates.
developed a variable selection method for AFT models with high-dimensional predictors, consisting of a set of algorithms that synthesize two techniques widely used for variable selection in survival analysis: the Buckley-James method and the Dantzig selector.
In this article, based on potential predictors, we apply the GMCP (Group Minimax Concave Penalty) method for the first time to a high-dimensional nonparametric accelerated failure time additive regression model (2.1) (MCP,  ). The weighted least squares solution of the model under the GMCP penalty is given, and we derive the group coordinate descent algorithm used to compute the GMCP estimate in this model. Our simulation results show that the weighted least squares estimator based on the GMCP penalty works well in the high-dimensional nonparametric accelerated failure time additive regression model and is superior to the GLasso (Group Least Absolute Shrinkage and Selection Operator) penalty method.
The rest of the paper is organized as follows. In Section 2, we describe the nonparametric accelerated failure time additive regression (NP-AFT-AR) model and our methods. In Section 3, we give the asymptotic oracle property of the GMCP estimator. Simulation results are given in Section 4. An application to real data is given in Section 5. Concluding remarks are given in Section 6.
2. Models and Methods
In this paper, we study the following nonparametric accelerated failure time additive regression (NP-AFT-AR) model to describe the relationship between the independent predictors or covariates Xj’s and the failure time T:
where is the intercept, is a vector of covariates, the fj's are unknown smooth functions with zero means, i.e., and is the random error term with mean zero and a finite variance . We consider the setting where the sample size is small relative to the number of predictors. Assuming that some additive components are zero, the main purpose of our research is to identify the nonzero and zero components; the second goal is to find the specific functional form of the nonzero components in order to propose a more parsimonious model. In this study, we apply the GMCP penalty in the proposed NP-AFT-AR model for component selection and estimation. We use B-splines to parameterize the nonparametric components and then invoke the inverse probability-of-censoring weighted least squares method to achieve these goals. We treat the spline approximation of each component as a group of variables subject to selection. Under the GMCP penalty approach, we show that the proposed method can select the significant component functions by choosing the nonzero spline basis functions.
2.2. Weighted Least Squares Estimation
We define as the ith subject's survival time, let denote the censoring time, and let denote the event indicator, i.e., , which takes value 1 if the event time is observed and 0 if it is censored. Define as the minimum of the survival time and the censoring time, i.e., . Then the observed data are of the form , , which are assumed to be an independent and identically distributed (i.i.d.) sample from .
Let be the order statistics of Yi’s, and are the associated censoring indicators and covariates. Let F be the distribution of T and be its Kaplan-Meier estimator , where the ’s are Kaplan-Meier weights (  ) calculated by
showed that the weights, 's, are the jumps in the Kaplan-Meier estimator. These are equivalent to the inverse probability-of-censoring weights (   ), , where is the Kaplan-Meier estimator of G, the distribution function of C. Stute's weighted least squares loss function for the NP-AFT-AR model (2.1) is defined as
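As a small sketch, the Kaplan-Meier (Stute) weights can be computed by the standard product formula for the jumps of the Kaplan-Meier estimator; the function name below is ours, not from the paper.

```python
import numpy as np

def stute_weights(y, delta):
    """Kaplan-Meier (Stute) jump weights.

    y: observed times; delta: 1 if the event is observed, 0 if censored.
    Returns weights w_i equal to the jumps of the Kaplan-Meier estimator
    at the observations, aligned with the original (unsorted) order.
    """
    order = np.argsort(y)
    d = np.asarray(delta, dtype=float)[order]
    n = len(d)
    w = np.zeros(n)
    surv = 1.0  # running product prod_{j<i} ((n-j)/(n-j+1))^{delta_(j)}
    for i in range(n):  # i is 0-based, so rank is i+1
        w[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    w_orig = np.zeros(n)
    w_orig[order] = w  # undo the sort
    return w_orig
```

With no censoring, all weights reduce to 1/n, and the weighted least squares loss reduces to the ordinary one; censored observations receive weight zero, and their mass is redistributed to later event times.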
Here, we use B-spline basis functions to approximate the unknown functions fj's. For every function component, we assume that is bounded and . The basis functions are determined by the order and the number of interior knots . The total number of B-spline basis functions for each function component would be . For identifiability, satisfy ; we take the total number of basis functions to be only and center all the basis functions at their means. Then the B-spline approximation of each function component, , is given by
where are the B-spline basis functions and is the corresponding coefficient parameter vector. Let denote the design matrix of B-spline basis of the jth predictor and be its ith row vector corresponding to the sorted data. Denote the design matrix as ; the ith row of as ; and the corresponding parameter vector as . Then we have
By plugging Equation (2.3) into Equation (2.2), we obtain the new loss function as follows:
By centering and with their -weighted means, the intercept becomes 0. Denote and ; where and Let denote the L2 norm of any vector . For simplicity, we use and . Then we can rewrite the Stute’s weighted least squares loss function Equation (2.4) as
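As a sketch of the spline parameterization above, the following evaluates a B-spline basis by the Cox-de Boor recursion. With order 4 (cubic) and five evenly spaced interior knots on [0, 1] it yields nine basis functions, matching the setup used later in the simulations (one basis function is then dropped by the centering constraint). The function name and the clamped knot-padding convention are our own assumptions.

```python
import numpy as np

def bspline_basis(x, knots, order=4):
    """Evaluate B-spline basis functions by the Cox-de Boor recursion.

    knots: full (padded) knot sequence; order = degree + 1 (4 = cubic).
    Returns an array of shape (len(x), len(knots) - order)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # order-1 (piecewise constant) basis on half-open intervals
    B = np.array([(t[i] <= x) & (x < t[i + 1])
                  for i in range(len(t) - 1)], dtype=float).T
    for k in range(2, order + 1):
        B_new = np.zeros((len(x), len(t) - k))
        for i in range(len(t) - k):
            term = np.zeros(len(x))
            d1 = t[i + k - 1] - t[i]
            if d1 > 0:  # guard repeated (clamped) knots
                term += (x - t[i]) / d1 * B[:, i]
            d2 = t[i + k] - t[i + 1]
            if d2 > 0:
                term += (t[i + k] - x) / d2 * B[:, i + 1]
            B_new[:, i] = term
        B = B_new
    return B
```

Centering each column at its weighted mean, as described above, removes the intercept confounding and reduces the nine basis functions to eight effective ones per component.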
2.3. Weighted Least Squares Estimation with the GMCP Penalty
The B-spline approximation of the unknown functions transforms the nonparametric regression into a parametric regression, which makes variable selection and parameter estimation easier. Meanwhile, the grouped variables in , i.e., for each , are all related to the variable , so we can consider the B-spline basis functions of each nonparametric function to be a group. Instead of selecting the significant nonparametric functions, our task becomes choosing the significant B-spline basis functions from , or the nonzero coefficients from .
This allows variable selection to be carried out at the group and individual-variable levels simultaneously. In our case, the GMCP penalty function is
where is a parameter that controls the concavity of and is the penalty parameter. Here . We require and . The term MCP comes from the fact that it minimizes the maximum concavity measure defined in (2.2) of , subject to unbiasedness and selection conditions. The MCP can be easily understood by considering its derivative
where for any vector , is the L1 norm , is the penalty tuning parameter, and . In our case, each represents the jth group of basis functions, i.e., ; the values of the basis functions for one nonparametric function may differ from those for another function , and when we assume there is no overlap between groups. Combining the objective function in Equation (2.5) and the penalty function in Equation (2.6), we have the penalized weighted least squares objective function for the proposed NP-AFT-AR model as follows:
We can conduct group or component selection and estimation by minimizing . If , the function component is deleted; otherwise it is selected. Furthermore, the individual basis functions within a group can also be selected.
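For concreteness, the MCP in its standard form (regularization parameter λ, concavity parameter γ > 1) and its derivative can be sketched numerically; the function names below are ours, not from the paper.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard MCP rho(t; lam, gamma) for t >= 0:
    lam*t - t^2/(2*gamma) on [0, gamma*lam], constant gamma*lam^2/2 beyond."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

def mcp_deriv(t, lam, gamma):
    """Derivative rho'(t) = (lam - t/gamma)_+; it starts at lam (like the
    Lasso) and decreases to zero at t = gamma*lam, after which the penalty
    applies no shrinkage at all."""
    t = np.asarray(t, dtype=float)
    return np.maximum(lam - t / gamma, 0.0)
```

The vanishing derivative beyond γλ is what gives the MCP its nearly unbiased estimates of large coefficients, in contrast to the constant shrinkage of the Lasso.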
We derive a group coordinate descent algorithm for computing . This algorithm is a natural extension of the standard coordinate descent algorithm (  ). It has also been used in calculating the penalized estimates based on concave penalty functions (  ).
The group coordinate descent algorithm optimizes the objective function with respect to a single group at a time, iteratively cycling through all groups until convergence. It is particularly suitable for computing , since there is a simple closed-form expression for a single-group model; see (2.11) below.
We write for an upper triangular matrix via the Cholesky decomposition. Let and . Simple algebra shows that
Note that . and
Let . For , it can be verified that the value that minimizes is
In particular, when , we have
which is the GLasso estimate for a single-group model (  ).
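For a single orthonormalized group, the GMCP solution in (2.11) is a group "firm-thresholding" operator: group soft-thresholding rescaled by γ/(γ − 1) when the unpenalized estimate lies within radius γλ, and the unpenalized estimate itself beyond that radius. A minimal sketch, assuming the group design has been orthonormalized; the function name is ours.

```python
import numpy as np

def group_firm_threshold(z, lam, gamma):
    """GMCP solution for a single orthonormalized group (gamma > 1).

    z: unpenalized least-squares estimate for the group.
    Inside radius gamma*lam, apply group soft-thresholding and rescale
    by gamma/(gamma - 1); outside, return z unshrunk."""
    z = np.asarray(z, dtype=float)
    norm = np.linalg.norm(z)
    if norm <= gamma * lam:
        shrink = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        return (gamma / (gamma - 1.0)) * shrink * z
    return z.copy()
```

Dropping the rescaling and the unshrunk branch recovers the GLasso single-group estimate of (2.12): pure group soft-thresholding, which shrinks even large groups.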
The group coordinate descent algorithm can now be implemented as follows. Suppose the current values for the group parameter are given. We want to minimize with respect to . Let
and write and . Let be the minimizer of . When , we have , where M is defined in (2.11).
For any given , we use (2.11) to cycle through one component at a time. Let be the initial value. The proposed coordinate descent algorithm is as follows.
Initialize the vector of residuals , where . For , carry out the following calculation until convergence: for , repeat the following steps.
Step 1: Calculate .
Step 2: Update .
Step 3: Update and .
The last step ensures that r holds the current values of the residuals. Although the objective function is not necessarily convex, it is convex with respect to a single group when the coefficients of all the other groups are fixed.
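Steps 1-3 above can be sketched as follows, assuming each group's design block has been orthonormalized (Z_j'Z_j = nI) so that the single-group update has a closed form as in (2.11); the function and variable names are ours, not the paper's.

```python
import numpy as np

def gmcp_group_cd(Z_groups, y, lam, gamma=3.0, max_iter=200, tol=1e-6):
    """Group coordinate descent for GMCP-penalized least squares (sketch).

    Z_groups[j]: n x d_j orthonormalized design block (Z_j'Z_j = n*I).
    Cycles through groups, applying the single-group firm-thresholding
    update and keeping the residual vector current after each group."""
    n = len(y)
    betas = [np.zeros(Z.shape[1]) for Z in Z_groups]
    r = y - sum(Z @ b for Z, b in zip(Z_groups, betas))
    for _ in range(max_iter):
        max_change = 0.0
        for j, Z in enumerate(Z_groups):
            # Step 1: unpenalized estimate for group j from partial residuals
            z = Z.T @ r / n + betas[j]
            # Step 2: single-group GMCP (firm-thresholding) update
            norm = np.linalg.norm(z)
            if norm <= gamma * lam:
                shrink = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
                new_b = (gamma / (gamma - 1.0)) * shrink * z
            else:
                new_b = z
            # Step 3: update residuals with the change in group j
            r -= Z @ (new_b - betas[j])
            max_change = max(max_change, float(np.max(np.abs(new_b - betas[j]))))
            betas[j] = new_b
        if max_change < tol:
            break
    return betas
```

Weighted least squares fits this template after absorbing the square roots of the Kaplan-Meier weights into y and the design blocks.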
3. Asymptotic Oracle Properties of GMCP
Let denote the cardinality of any set and . Define
Here is the dimensional sub-design matrix corresponding to the variables in A. Denote . We make the following assumptions.
Similar to , we assume:
(C1) T and C are independent.
(C2) Given the failure time T, the censoring indicator is conditionally independent of the covariates.
(C3) and .
(C4) Denote and as the least upper bounds of T and C, respectively. Then or .
(C5) has finite envelope function.
(C6) for .
These assumptions correspond to the conditions in . In the random censorship model, (C1) is a basic assumption. (C2) states that, given the failure time T, the censoring indicator is independent of the covariates. (C3) provides the second moments needed in least-squares estimation. (C4) assumes that the probability of an event being observed is greater than zero, which guarantees the consistency of the estimator. (C5) is a fundamental condition for the consistency and convergence rate in the proofs and is used in the entropy calculation. (C6) guarantees that is identifiable.
(C7) There exist constants and , where , such that
(C8) There is a small constant such that .
(C9) The random errors are independent and identically distributed as , where and ; moreover, the tail probabilities satisfy for and some constants C and K.
(C10) There exists a positive constant M such that .
(C7) is the sparse Riesz condition (SRC) formulated for the nonparametric AFT model (2.1), which controls the range of the eigenvalues of the matrix Z. This condition was introduced by to study the properties of the Lasso in the linear regression model. (C8) assumes that the unimportant predictors are small in the sense, but they do not need to be exactly zero. If , (C8) becomes for all , and the problem of variable selection is equivalent to distinguishing nonzero functions from zero functions. (C9) assumes that the distribution of the error terms has sub-Gaussian tails; this holds when the error distribution is normal. (C10) assumes that all the predictors are uniformly bounded, which is satisfied in many practical situations.
In this subsection, we simply write for the GMCP estimator. Let and set if is empty. Define
This is the oracle least squares estimator. Of course, it is not a real estimator, since the oracle set is unknown.
We first consider the case where the 2-norm GMCP objective function is convex. This necessarily requires , where is the smallest eigenvalue of ; recall . As in , define the function
This function arises from an upper bound on the tail probabilities of the chi-square distribution given in Lemma 1 in the Appendix, which is derived from an exponential inequality for chi-square random variables of .
Theorem 3.1. Suppose are independent and identically distributed as and that (C1)-(C10) hold. Then for any satisfying , and , we have
where and .
We give the proof of Theorem 3.1 in the Appendix. It provides an upper bound on the probability that is not equal to the oracle estimator in terms of the tail probability function h in (3.2). The key condition ensures that the 2-norm GMCP criterion is strictly convex. Nonetheless, this result is a starting point for a similar result in the high-dimensional case. The following corollary is an immediate consequence of Theorem 3.1.
Corollary 1. Suppose that the conditions of Theorem 3.1 are satisfied. Also suppose that for as . If , then
By Corollary 1, the 2-norm GMCP estimator equals the oracle least squares estimator with probability converging to one. This implies it is group selection consistent. We now consider the high-dimensional case where . Under condition (C7), let , and . Define
Theorem 3.2. Suppose are independent and identically distributed as and that B satisfies the SRC in (C7) with , and . Then we have
where and .
Corollary 2. Suppose that the conditions of Theorem 3.2 are satisfied. Also suppose that for as . If , then
Theorem 3.2 and Corollary 2 provide sufficient conditions for the asymptotic oracle property of the global 2-norm GMCP estimator in high-dimensional settings. Here we allow , so p can be greater than n. The condition is stronger than the corresponding condition in Theorem 3.5 of (  ). The condition ensures that the GMCP criterion is convex in any q-dimensional subspace; it is stronger than the minimal sufficient condition for convexity in q-dimensional subspaces. This is the price we pay in searching for a lower-dimensional space that contains the true model.
4. Numerical Simulation
In this section, we conduct simulation studies to evaluate the performance of the GMCP and GLasso penalties in the high-dimensional NP-AFT-AR model with limited samples. We focus on comparisons of the group selection methods, with the tuning parameter selected by the BIC (  ), given by:
where RSS is the sum of squared residuals and df is the number of selected variables given . We choose from the increasing sequence in Section 5; for any given value of , we choose from a sequence of 100 values from to , where corresponds to the covariate "design" matrix and is the maximum penalty value, which shrinks all estimated coefficients to zero.
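A minimal sketch of the BIC-based tuning parameter selection described above, assuming the common form BIC(λ) = n·log(RSS/n) + df·log(n); the fitting routine is abstracted as a callable, and the function name is ours.

```python
import numpy as np

def bic_select(lambdas, fit_fn, y):
    """Pick the lambda minimizing a BIC-type criterion.

    fit_fn(lam) must return (residuals, df) for the model fitted at lam.
    BIC(lam) = n*log(RSS/n) + df*log(n), one common form of the criterion."""
    n = len(y)
    best_lam, best_bic = None, np.inf
    for lam in lambdas:
        r, df = fit_fn(lam)
        rss = float(np.sum(np.asarray(r) ** 2))
        bic = n * np.log(rss / n) + df * np.log(n)
        if bic < best_bic:
            best_lam, best_bic = lam, bic
    return best_lam
```

The log(n) penalty on df makes the BIC favor sparser models than cross-validation typically does, which matches its use here for group selection.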
We compute the empirical prediction mean square error (MSE) to assess estimation accuracy. Let be the estimator of ; we define the MSE as
Three scenarios are considered below, in which some nonzero components are linear and the response variable is subject to various censoring rates. The sample size and a total of 100 simulation runs are used. The logarithm of the censoring time is generated from a uniform distribution , where and are determined by a Monte-Carlo method to achieve censoring rates of 35% and 40%, respectively. For example, the censoring rate is approximated by , where is drawn from the proposed model (2.1), is drawn from , and M is the number of Monte-Carlo runs used to compute cr. When , we choose , which gives the desired censoring rate. To balance computational efficiency and accuracy, we use cubic B-splines with five evenly distributed interior knots for all the functions , which gives basis functions for each nonparametric component. Due to the identifiability constraint, the actual number of basis functions used is 8. This choice is made because our simulation studies indicated that using a larger number of knots does not improve the finite-sample performance (results not shown).
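The Monte-Carlo calibration of the uniform bounds can be sketched as follows. Here we fix the lower bound and grid-search the upper bound against the target censoring rate, which is one simple way to implement the search; the authors' exact search procedure is not specified, and all names below are ours.

```python
import numpy as np

def censoring_rate(log_t, a, b, rng):
    """Monte-Carlo censoring rate when log censoring times are U(a, b):
    an observation is censored when the log failure time exceeds it."""
    c = rng.uniform(a, b, size=len(log_t))
    return float(np.mean(log_t > c))

def calibrate_upper_bound(log_t, a, target, rng, grid):
    """Search a grid of upper bounds b for the one whose Monte-Carlo
    censoring rate is closest to the target rate."""
    rates = [censoring_rate(log_t, a, b, rng) for b in grid]
    return grid[int(np.argmin(np.abs(np.array(rates) - target)))]
```

Because the censoring rate decreases monotonically (in expectation) as the upper bound grows, a coarse grid followed by a finer one around the best value converges quickly to the 35% or 40% targets.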
4.1. Scenario 1 (Covariates Are Independent)
In this scenario, we consider independent covariates and set the intercept . The logarithm of the failure times, , is generated from
The predictors are sampled from the .
We set and consider the cases , respectively, to see the performance of the proposed methods as the sample size increases. The penalty parameters are selected as described above.
The results for the GMCP, GSCAD, and GLasso methods based on 100 replications are given in Table 1 and Table 2. The columns of Table 1 report the average number of variables selected (NV), the model error (ER), the percentage of occasions on which the correct variables are included in the selected model (%IN), and the percentage of occasions on which exactly the correct variables are selected (%CS), with standard errors in parentheses. Table 2 summarizes the mean square errors for the six important functions, with standard errors in parentheses.
Several observations can be made from Tables 1-4. The models selected by the GMCP are better than those selected by the GLasso in terms of model error, the percentage of occasions on which the true variables are selected, and the mean square errors for the important coefficient functions. The GMCP includes the correct variables with high probability. As the sample size increases, the performance of both methods improves, as expected. To examine the estimated nonparametric functions from the concave group selection methods, we plot the GMCP estimates along with the true function components in Figure 1 and Figure 2, based on one run of the GMCP method when the sample size is 100. From the graphs, the estimated curves are close to the true functions.
Table 1. Simulation results. NV, number of selected variables; ER, model error; IN%, percentage of occasions on which the correct variables are included in the selected model; CS%, percentage of occasions on which exactly correct variables are selected, averaged over 100 replications. Enclosed in parentheses are the corresponding standard errors.
Table 2. Simulation results. Mean Square errors for the important coefficient functions based on 100 replications. Enclosed in parentheses are the corresponding standard errors.
Table 3. Simulation results. NV, number of selected variables; ER, model error; IN%, percentage of occasions on which the correct variables are included in the selected model; CS%, percentage of occasions on which exactly correct variables are selected, averaged over 100 replications. Enclosed in parentheses are the corresponding standard errors.
Table 4. Simulation results. Mean Square errors for the important coefficient functions based on 100 replications. Enclosed in parentheses are the corresponding standard errors.
Figure 1. , the solid black line is the real function, the dotted red line is the GMCP estimation, CR = 35%.
Figure 2. , the solid black line is the real function, the dotted red line is the GMCP estimation, CR = 40%.
4.2. Scenario 2 (Covariates Are Correlated)
In this scenario, we consider correlated covariates and set the intercept . The logarithm of the failure times, , is generated from
where the covariates are generated as , with and U i.i.d. . This design has a correlation coefficient of 0.5 between all pairs of covariates.
The simulation results are reported in Tables 5-8. The conclusions for Scenario 2 are very similar to those for Scenario 1. When the censoring rate increases, the estimation and selection performance of all methods decreases. The results in Table 6 and Table 8 show that the GMCP estimator is more accurate than the GLasso estimator for both the individual component functions and the full
Table 5. Simulation results. NV, number of selected variables; ER, model error; IN%, percentage of occasions on which the correct variables are included in the selected model; CS%, percentage of occasions on which exactly correct variables are selected, averaged over 100 replications. Enclosed in parentheses are the corresponding standard errors.
Table 6. Simulation results. Mean Square errors for the important coefficient functions based on 100 replications. Enclosed in parentheses are the corresponding standard errors.
Table 7. Simulation results. NV, number of selected variables; ER, model error; IN%, percentage of occasions on which the correct variables are included in the selected model; CS%, percentage of occasions on which exactly correct variables are selected, averaged over 100 replications. Enclosed in parentheses are the corresponding standard errors.
Table 8. Simulation results. Mean Square errors for the important coefficient functions based on 100 replications. Enclosed in parentheses are the corresponding standard errors.
model, since the MSE under the GMCP approach is always smaller than that under the GLasso approach. The results in Table 5 and Table 7 show that the GMCP method conducts component selection more precisely than the GLasso method, whereas the GLasso method selects many zero component functions as nonzero. To examine the estimated nonparametric functions from the GMCP, we plot them along with the true function components in Figure 3 and Figure 4, based on one run of the GMCP method when . Although the estimation and selection accuracy decrease when the covariates are correlated, the estimated curves under the GMCP method remain closer to the true curves than those under the GLasso method.
5. Application to the NP-AFT-AR Model
In this section, we use the lung adenocarcinoma data of Shedden (2008, for short) to conduct an empirical analysis illustrating the proposed method; for more information, see . Retrospective data on 442 lung adenocarcinoma patients were collected at multiple sites, including their survival times, other clinical and demographic data, and the expression levels of 22,283 genes from tumor samples. However, most genes show little variation across samples. Therefore, in our application, we randomly select 321 samples and the first 500 and 1000 genes. Thus , and the survival rate is 35.8%.
Figure 3. , the solid black line is the real function, the dotted red line is the group MCP estimation, the dotted blue line is the group SCAD estimation, and the black line is the group lasso estimation, CR = 35%.
Figure 4. , the solid black line is the real function, the dotted red line is the group MCP estimation, the dotted blue line is the group SCAD estimation, and the black line is the group lasso estimation, CR = 40%.
Figure 5. The estimation function graph based on GLasso and GMCP approximates the corresponding 200746_s_at by using the same covariate, where the red dotted line is the GMCP estimate and the gray dotted line is the GLasso estimate.
Here, we are interested in the effect of tumor gene expression levels on the survival time of lung adenocarcinoma patients. Since the linearity assumption is questionable in high dimensions, the proposed method may be more suitable for feature selection problems with nonlinear effects. In our analysis, we set the spline basis for each gene. The proposed method selects one gene locus under the GMCP (i.e., 200746_s_at). However, when , the GLasso penalized regression selected 6 and 10 genes, respectively.
From Figure 5, we find that the larger the dimension, the worse the GLasso estimate becomes, whereas the dimension has little effect on the GMCP estimate. The real-data analysis therefore shows that the GMCP penalty outperforms the GLasso penalty, with higher accuracy at the same computational cost. Under the same conditions, the GMCP method is more suitable than the GLasso.
6. Concluding Remarks
In this paper, we study the weighted least squares estimation and selection properties of the GMCP in the NP-AFT-AR model with high-dimensional data. Our simulation results show that the GLasso tends to select some unimportant variables. In contrast, the GMCP possesses the asymptotic oracle property, which implies that it is also selection consistent.
Lemma 1. Let be a random variable with chi-square distribution with k degrees of freedom. For , , where is defined in (3.2).
This lemma is a restatement of the exponential inequality for chi-square distributions of .
Proof of Theorem 3.1. Since is the oracle least squares estimator, we have and
If , then by the definition of the MCP, . Since , the criterion (2.8) is strictly convex. By the Karush-Kuhn-Tucker (KKT) conditions, the equality holds in the intersection of the events
We first bound . Let . By (A.1)  and using
It follows that , where . Because , is distributed as a chi-square distribution with degrees of freedom. We have, for
where we used Lemma 1 in the third line. Now consider . Recall . If for all , then . This implies
Let be a matrix with a identity matrix in the pth block and 0’s elsewhere. Then . Note that
and is distributed as a chi-square distribution with q degrees of freedom. Therefore, similarly to , we have, for ,
Combining and , we have
Since , we can obtain . This completes the proof.
For any and , define
Lemma 2. Suppose . We have
Proof. For any , we have . Thus
Since is a projection matrix, , where . Since there are ways to choose A from , we have
This and Lemma 1 imply that
Here we used the inequality . This completes the proof.
Define I as any set that satisfies
Lemma 3. Suppose that satisfies that , , and . Let . Then for any with , we have .
Proof. This lemma can be proved along the lines of the proof of Lemma 1 of  and is omitted.
Proof of Theorem 3.2. By Lemma 3, in the event
we have . Thus, in the event (6.4), the original model with p groups reduces to a model with at most groups. In this reduced model, the conditions of Theorem 3.2 imply those of Theorem 3.1. By Lemma 2,
Therefore, combining (6.5) and Theorem 3.1, we have . Since , we can obtain . This proves Theorem 3.2.
 Huang, J., Ma, S. and Xie, H.L. (2006) Regularized Estimation in the Accelerated Failure Time Model with High-Dimensional Covariates. Biometrics, 62, 813-820.
 Datta, S., Jennifer, L.R. and Datta, S. (2007) Predicting Patient Survival from Microarray Data by Accelerated Failure Time Modeling Using Partial Least Squares and LASSO. Biometrics, 63, 259-271.
 Leng, C. and Ma, S. (2007) Accelerated Failure Time Models with Nonlinear Covariates Effects. Australian and New Zealand Journal of Statistics, 49, 155-172.
 David, E. and Li, Y. (2009) Survival Analysis with High-Dimensional Covariates: An Application in Microarray Studies. Statistical Applications in Genetics and Molecular Biology, 8, 1-122.
 Johnson, B.A. (2009) Rank-Based Estimation in the l1-Regularized Partly Linear Model for Censored Outcomes with Application to Integrated Analyses of Clinical Predictors and Gene Expression Data. Biostatistics, 10, 659-666.
 Wang, Z. and Wang, C.Y. (2010) Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data. Statistical Applications in Genetics and Molecular Biology, 9, Article No. 24.
 Ma, S. and Du, P. (2012) Variable Selection in Partly Linear Regression Model with Diverging Dimensions for Right Censored Data. Statistica Sinica, 22, 1003-1020.
 Hu, J.W. and Chai, H. (2013) Adjusted Regularized Estimation in the Accelerated Failure Time Model with High Dimensional Covariates. Journal of Multivariate Analysis, 122, 96-114.
 Chai, H., Liang, Y. and Liu, X.Y. (2015) The L1/2 Regularization Approach for Survival Analysis in the Accelerated Failure Time Model. Computers in Biology and Medicine, 64, 283-290.
 Xia, X.C., Jiang, B.Y., Li, J.L. and Zhang, W.Y. (2016) Low-Dimensional Confounder Adjustment and High-Dimensional Penalized Estimation for Survival Analysis. Lifetime Data Analysis, 22, 547-569.
 Khan, M.H.R. and Shaw, J.E.H. (2016) Variable Selection for Survival Data with a Class of Adaptive Elastic Net Techniques. Statistics and Computing, 26, 725-741.
 Yang, Y.C., Fasching, P.A. and Tresp, V. (2017) Modeling Progression Free Survival in Breast Cancer with Tensorized Recurrent Neural Networks and Accelerated Failure Time Models. Proceedings of Machine Learning Research, 68, 164-176.
 Khan, M.H.R. and Shaw, J.E.H. (2019) Variable Selection for Accelerated Lifetime Models with Synthesized Estimation Techniques. Statistical Methods in Medical Research, 28, 937-952.
 Satten, G.A. and Datta, S. (2001) The Kaplan-Meier Estimator as an Inverse-Probability-of-Censoring Weighted Average. The American Statistician, 55, 207-210.
 Breheny, P. and Huang, J. (2011) Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Applications to Biological Feature Selection. The Annals of Applied Statistics, 5, 232-253.
 Yuan, M. and Lin, Y. (2006) Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society: Series B, 68, 49-67.
 Yang, G.R., Huang, J. and Zhou, Y. (2014) Concave Group Methods for Variable Selection and Estimation in High-Dimensional Varying Coefficient Models. Science China Mathematics, 57, 2073-2090.
 Director’s Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma (2008) Gene Expression-Based Survival Prediction in Lung Adenocarcinoma: A Multi-Site, Blinded Validation Study. Nature Medicine, 14, 822-827.