In the United States, there is a well-documented high rate of lung cancer and pulmonary nodules. In recent years, computed tomography (CT) has come into widespread use following the National Lung Screening Trial finding that annual CT screening reduces lung cancer mortality by 20% compared with chest x-ray. As a result, over 50% of CT-screened patients have at least one non-calcified nodule, and over 96% of nodules larger than 4 mm are benign (false positives). There is also a high false-positive rate, and morbidity is associated with biopsy or resection of these benign nodules. Consequently, a CT screening program needs substantial resources to follow up every pulmonary nodule detected. The likelihood of malignancy has been found to increase with nodule size, and waiting for a nodule to grow tends to result in later-stage detection of lung cancer. There is therefore a clear need for a complementary test, ideally a biomarker, to aid evaluation of the malignant potential of nodules, reduce the false-positive rate of CT screening for lung cancer, and enable earlier detection.
Tumor-associated (TA) antigens are currently the most commonly measured biomarkers in the management of cancer patients. However, they are often of little use in early disease because their serum levels correlate strongly with tumor burden. Levels are therefore generally elevated only in later-stage disease, which limits their clinical use to monitoring treatment and disease recurrence. TA autoantibodies have been identified for a range of solid tumors and are emerging as strong candidates for clinically useful cancer biomarkers. The mechanisms of their secretion have not been clearly determined, but their production is thought to result from increased immunogenicity of the corresponding antigen. They are produced early in tumorigenesis and can be measured up to 5 years before clinical symptoms develop. As antibodies, they are biologically amplified markers that increase the detectable signal for the corresponding antigen. They also persist in the circulation, with half-lives typically of up to 30 days, and are more stable outside the body than other biomarkers.
EarlyCDT-Lung (Oncimmune Holdings plc.) detects TA autoantibodies to a panel of seven lung cancer-associated antigens using an indirect enzyme-linked immunosorbent assay (ELISA). A sample is positive when at least one of the panel of TA autoantibodies is elevated above a predetermined cut-off. The test has been both technically and clinically validated in seven independent validation cohorts, including high-risk control groups matched for age, sex, and smoking history. Its performance characteristics have been further validated in the commercial setting by an audit of clinical outcomes for the first 1599 patients who had a valid EarlyCDT-Lung result and unknown nodule status. The test consistently identifies lung cancer with 92% accuracy (compared with 50% for CT). Its sensitivity (true positive rate) is ~40% for all stages and types of lung tumor (small cell and non-small cell), and its specificity (true negative rate) is ~93% for all cohorts, which shows the robustness and reproducibility of this assay system.
The test was launched commercially in 2012 for the early detection of lung cancer in a high-risk screening scenario. However, it has also gained some clinical acceptance for follow-up of patients who had a positive CT result. It has recently been demonstrated that a positive test result reflects a significantly increased risk of malignancy in lung nodules 4 to 20 mm in diameter. This confirms that EarlyCDT-Lung may have clinical value in assessing the malignancy risk of indeterminate pulmonary nodules.
The cut-offs that optimize the diagnostic performance of EarlyCDT-Lung in the screening setting may not be appropriate for the indeterminate nodule setting. Performance is determined by the particular set of cut-offs chosen. Varying the cut-offs therefore adjusts the specificity and sensitivity, and hence the false-positive and false-negative rates. The purpose of the present paper is largely technical: it describes the construction and use of a receiver-operating characteristic (ROC) curve for EarlyCDT-Lung. Because this requires a relatively large number of cancers, we used a case-control cohort previously used for validation of EarlyCDT-Lung (designated the optimization cohort). Its performance was compared with that of two datasets derived from our commercial operations: the audit cohort and a subset of it, the nodule cohort.
2. Materials and Methods
2.1. Patient Cohorts
2.1.1. Optimization Cohort
This cohort has been described elsewhere and was previously used to set the current commercial cut-offs for EarlyCDT-Lung. Briefly, serum samples from 235 patients with lung cancer were assayed. The samples were obtained at or just after histopathological confirmation of the tumor. The lung cancers comprised 178 non-small cell lung cancers (75.7%), 53 small cell lung cancers (22.6%), and 4 others (1 sarcoma, 2 bronchogenic carcinomas, and 1 undefined lung cancer). The cohort was representative of a high-risk population, with a mean and median age of 65 years and a high proportion of smokers (49%) and ex-smokers (29%). The controls were 266 healthy volunteers, 235 of whom were individually matched to the lung cancer patients for age, gender, and smoking status. These controls had no evidence of any current or prior cancer, including non-melanoma skin cancer.
2.1.2. Audit Patient Cohort
This cohort has been described previously. Briefly, EarlyCDT-Lung was launched commercially in November 2010, and physicians in routine practice across the USA ordered the test on behalf of their patients. The cohort was assembled for an audit of clinical practice and reports the physicians' use of the test. It is not a prospective study in a population defined by inclusion and exclusion criteria. The cohort comprises the first 861 patients tested with the seven-antigen test, with some exclusion criteria applied. It was followed for clinical outcomes, and a total of 35 lung cancers were recorded.
2.1.3. Nodule Patient Cohort
This cohort was derived from the audit cohort by reviewing CT imaging reports from within 6 months of the autoantibody test for the presence of pulmonary nodules. The size of the largest non-calcified nodule was recorded. Exclusion criteria were then applied: invalid autoantibody test, lost to follow-up, or previous history of cancer. We used the “Exclusive cohort” (n = 166), which was tested with the seven-antigen panel and had cancer confirmed by pathology reports. Nodules were categorized by size (<4 mm, 4 mm - 20 mm, >20 mm).
2.2. Comparison of Cohorts
The diagnostic performance of the optimization, audit and nodule cohorts was compared statistically, with the nodule cohort split by nodule size (<4 mm, 4 mm - 20 mm and >20 mm). Fisher's exact test was applied separately to the frequency tables for specificity and sensitivity. The nodule cohort is a subset of the audit cohort, so the cohorts are not independent, and the test is slightly biased towards non-significance.
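Fisher's exact test on a table with more than two rows (here, several cohorts by test result) is often approximated by Monte Carlo sampling of tables with the same margins. The sketch below applies this approximation to an r × 2 table in which `x` holds the test-positive counts and `n` the row totals. It is an illustration under that swapped-in approximation, not necessarily how the reported p-values were computed. The function names are hypothetical.

```python
import math
import numpy as np

def log_table_prob(x, n):
    """Log hypergeometric probability of an r x 2 table with x[i] positives out of n[i],
    conditional on the row totals n and the overall number of positives."""
    def log_comb(a, b):
        return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)
    N, K = sum(n), sum(x)
    return sum(log_comb(ni, xi) for ni, xi in zip(n, x)) - log_comb(N, K)

def fisher_rx2_monte_carlo(x, n, sims=20000, seed=0):
    """Monte Carlo p-value: share of null tables no more probable than the observed one."""
    rng = np.random.default_rng(seed)
    observed = log_table_prob(x, n)
    # Under the null hypothesis, the positives are spread over the rows
    # by a multivariate hypergeometric draw with the margins held fixed.
    tables = rng.multivariate_hypergeometric(list(n), sum(x), size=sims)
    hits = sum(log_table_prob(t, n) <= observed + 1e-7 for t in tables)
    return (hits + 1) / (sims + 1)
```

For the specificity comparison, `x` would hold the number of test-positive controls per cohort and `n` the number of controls. Counting the observed table among the simulations (the `+ 1` terms) keeps the estimated p-value strictly positive.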
2.3. Construction of ROC Curve
When a diagnostic test is based on a single test result or score, constructing a ROC curve is straightforward. The cut-off is varied from the minimum to the maximum possible value, and the specificity and sensitivity are read off at a series of points. EarlyCDT-Lung, however, is effectively a multivariate test, so constructing a ROC curve is less straightforward. Because the cut-offs for EarlyCDT-Lung are based on quantitative results, one method is to re-optimize the test at every point along a range of selected specificity levels. For the optimization cohort, the individual antigen cut-offs were re-optimized to maximize sensitivity at each specificity level, which allowed a complete ROC curve to be constructed. The optimization was a paper exercise and did not involve re-assay of the samples.
A Monte-Carlo direct search was used to explore the large number of possible combinations of cut-off levels, as follows. For a set of n controls:
1) Choose the specificity.
2) Calculate the number of control subjects that can be specified as positive, i.e. s = n(1 - specificity).
3) Select an autoantibody at random.
4) Select at random a number r between 1 and s. Declare positive the r control samples with the highest optical density (OD) for the autoantibody selected in step 3, ignoring those already declared positive.
5) Set each autoantibody's cut-off above the highest OD, for that autoantibody, among the control samples not declared positive.
6) Repeat steps 3 and 4, up to 1000 times, until the total number of control samples specified as positive equals the number set in step 2.
7) Calculate sensitivity.
8) Repeat the process for the next value of specificity.
9) Plot the ROC curve, i.e. sensitivity vs. (1-specificity) for all values of specificity.
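The steps above can be sketched as follows. Several points here are assumptions rather than details stated in the text:

- The "1000 times" of step 6 is read as the number of random search trials per specificity.
- The trial with the highest sensitivity is kept, since Figure 1 is described as a maximum true positive rate ROC curve.
- The random r of step 4 is capped at the number of positives still to be assigned.
- A sample is called positive if any OD is strictly greater than its cut-off.

The function names are hypothetical, and the OD values in the usage lines are simulated.

```python
import numpy as np

def search_cutoffs(controls, cases, specificity, trials=1000, rng=None):
    """Monte-Carlo direct search for per-autoantibody cut-offs at a fixed specificity.

    controls, cases: arrays of shape (n_samples, n_autoantibodies) holding OD values.
    Returns (best sensitivity, cut-offs) over `trials` random searches.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, k = controls.shape
    s = int(round(n * (1 - specificity)))            # step 2: controls allowed positive
    best_sens, best_cut = -1.0, None
    for _ in range(trials):
        positive = np.zeros(n, dtype=bool)
        chosen = set()
        while positive.sum() < s:                     # step 6
            a = int(rng.integers(k))                  # step 3: random autoantibody
            remaining = s - int(positive.sum())
            r = int(rng.integers(1, remaining + 1))   # step 4 (capped, an assumption)
            free = np.flatnonzero(~positive)
            top = free[np.argsort(controls[free, a])[::-1][:r]]
            positive[top] = True                      # r highest ODs not yet positive
            chosen.add(a)
        cut = np.full(k, np.inf)                      # unused autoantibodies never fire
        for a in chosen:                              # step 5
            rest = controls[~positive, a]
            cut[a] = rest.max() if rest.size else -np.inf
        sens = float(np.mean((cases > cut).any(axis=1)))   # step 7
        if sens > best_sens:
            best_sens, best_cut = sens, cut
    return best_sens, best_cut

def roc_points(controls, cases, specificities, trials=1000, seed=0):
    """Steps 8-9: one (1 - specificity, max sensitivity) point per specificity."""
    rng = np.random.default_rng(seed)
    return [(1 - sp, search_cutoffs(controls, cases, sp, trials, rng)[0])
            for sp in specificities]

# Usage with simulated (hypothetical) OD values for 266 controls, 235 cases, 7 antigens.
sim = np.random.default_rng(1)
controls = sim.normal(0.0, 1.0, size=(266, 7))
cases = sim.normal(0.5, 1.0, size=(235, 7))
print(roc_points(controls, cases, [0.98, 0.91, 0.49], trials=200))
```

With continuous OD values and no ties, exactly s controls end up positive. A declared positive exceeds every control still negative at the time it is declared. Each cut-off is set only after the search, from the final set of negatives.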
To complete the description of diagnostic performance, we calculated the positive predictive value (PPV), negative predictive value (NPV) and relative risk (RR), assuming a 20% lung cancer risk for patients with nodules (of any size). The positive and negative diagnostic likelihood ratios (DLRp and DLRn) were also calculated.
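The calculation at a fixed 20% prevalence can be sketched as follows. The sketch assumes two standard definitions: DLRp = sens/(1 - spec) and DLRn = (1 - sens)/spec. It takes RR as PPV/(1 - NPV), following the relative-risk footnote of Table 3. The function name is hypothetical.

```python
def diagnostic_metrics(sens, spec, prevalence=0.20):
    """PPV, NPV, RR, DLRp and DLRn from sensitivity and specificity (proportions)."""
    tp = sens * prevalence              # expected true-positive fraction
    fp = (1 - spec) * (1 - prevalence)  # expected false-positive fraction
    fn = (1 - sens) * prevalence        # expected false-negative fraction
    tn = spec * (1 - prevalence)        # expected true-negative fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "PPV": ppv,
        "NPV": npv,
        "RR": ppv / (1 - npv),          # risk if test positive vs. if test negative
        "DLRp": sens / (1 - spec),      # undefined at 100% specificity
        "DLRn": (1 - sens) / spec,
    }

print(diagnostic_metrics(0.28, 0.98))
```

For the 98%/28% version this gives PPV ≈ 0.78, RR ≈ 5.0, DLRp = 14.0 and DLRn ≈ 0.73. For the 49%/80% version it gives NPV ≈ 0.907. Both sets of values are consistent with those reported in Section 3.2.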
2.4. Shift of Risk Category
Patients or their nodules are typically classified into three groups by risk of malignancy: low (0% - 10%), intermediate (10% - 65%) and high (>65%), although these ranges vary between sources. A biomarker test such as EarlyCDT-Lung changes a patient's pre-test risk to a post-test risk, which may shift the patient into a different risk group. The low and high risk groups have clear monitoring and intervention paths, whereas nodules at intermediate risk are harder to manage clinically. We therefore calculated the percentage of intermediate nodules that are reclassified by each version of the EarlyCDT-Lung test represented on the ROC curve.
We took the current commercial test (91% specificity and 40% sensitivity in the optimization cohort) and determined two further versions from the optimization cohort. The high specificity version has 98% specificity and 28% sensitivity, and the low specificity version has 49% specificity and 80% sensitivity. First, we used the Swensen/Mayo nodule-based risk model to derive the distribution of pre-test risk for all nodules in the nodule cohort. Next, we calculated the frequency of nodules in each 5% risk category. For each category with midpoint risk r, we then calculated the predicted post-test risk: the PPV given a positive marker test and (1 - NPV) given a negative test. We used the usual formulae, where sens is sensitivity, spec is specificity, and all quantities are expressed as proportions rather than percentages:

$$\text{PPV} = \frac{\text{sens}\cdot r}{\text{sens}\cdot r + (1-\text{spec})(1-r)}, \qquad 1-\text{NPV} = \frac{(1-\text{sens})\,r}{(1-\text{sens})\,r + \text{spec}\,(1-r)}$$
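Next, we calculated the predicted numbers (no.) of nodules testing positive and negative in each category, where N is the number of nodules in the category:

$$\text{no. positive} = N\,[\text{sens}\cdot r + (1-\text{spec})(1-r)], \qquad \text{no. negative} = N\,[(1-\text{sens})\,r + \text{spec}\,(1-r)]$$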
The number shifted up to a higher risk group is therefore the total of true positives and false positives (all test-positive nodules) in categories where the post-test risk after a positive test falls in a higher risk group. Similarly, the number shifted down to a lower risk group is the total of true negatives and false negatives (all test-negative nodules) in categories where the post-test risk after a negative test falls in a lower risk group.
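The reclassification calculation for intermediate nodules can be sketched as follows. The sketch uses 5% pre-test risk bins evaluated at their midpoints and the 10% and 65% group boundaries given above. Placing a risk of exactly 65% in the intermediate group is an assumption. The `counts` input is a hypothetical frequency per bin, for example from the Swensen/Mayo model. Nodules shifted up are counted as all expected test positives (true plus false positives), and nodules shifted down as all expected test negatives (true plus false negatives). The function names are hypothetical.

```python
def risk_group(risk):
    """0 = low (<10%), 1 = intermediate (10%-65%), 2 = high (>65%)."""
    if risk < 0.10:
        return 0
    return 1 if risk <= 0.65 else 2

def post_test_risks(r, sens, spec):
    """Post-test risk after a positive test (PPV) and after a negative test (1 - NPV)."""
    pos = sens * r / (sens * r + (1 - spec) * (1 - r))
    neg = (1 - sens) * r / ((1 - sens) * r + spec * (1 - r))
    return pos, neg

def intermediate_shift(counts, sens, spec, width=0.05):
    """Fractions of intermediate nodules shifted up and down by the marker test.

    counts[i] is the number of nodules with pre-test risk in [i*width, (i+1)*width);
    at least one intermediate nodule is assumed (otherwise division by zero).
    """
    up = down = total = 0.0
    for i, n in enumerate(counts):
        r = (i + 0.5) * width                          # bin midpoint risk
        if risk_group(r) != 1:
            continue                                   # only intermediate bins
        total += n
        pos, neg = post_test_risks(r, sens, spec)
        n_pos = n * (sens * r + (1 - spec) * (1 - r))  # expected TP + FP
        n_neg = n - n_pos                              # expected TN + FN
        if risk_group(pos) > 1:
            up += n_pos
        if risk_group(neg) < 1:
            down += n_neg
    return up / total, down / total
```

For example, one nodule in the 10% - 15% bin tested with the commercial version (40%/91%) is expected to move to the low risk group after a negative test (post-test risk ≈ 8.6%). It stays intermediate after a positive test (post-test risk ≈ 39%).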
2.5. Diagnostic Likelihood Ratios (DLRp and DLRn)
In a clinical setting, pre-test and post-test risk estimates are generally more useful than specificity and sensitivity themselves. With a pre-test risk r, and assuming statistical independence of nodule size and marker test, the process equates to applying the positive diagnostic likelihood ratio (DLRp) to the pre-test odds for test-positive subjects:

$$\frac{\text{PPV}}{1-\text{PPV}} = \text{DLRp}\times\frac{r}{1-r}, \qquad \text{DLRp} = \frac{\text{sens}}{1-\text{spec}}$$
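There is an equivalent formula for test-negative subjects. It converts the pre-test risk to the post-test risk (1 - NPV) using the negative diagnostic likelihood ratio (DLRn):

$$\frac{1-\text{NPV}}{\text{NPV}} = \text{DLRn}\times\frac{r}{1-r}, \qquad \text{DLRn} = \frac{1-\text{sens}}{\text{spec}}$$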
Expressing the conversion from pre- to post-test risk in this way emphasizes that these estimates of post-test risk rely on the assumption of strict statistical independence of nodule size and marker test under a “both-positive rule”.
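A small worked example of the odds form follows. It converts a pre-test risk into post-test odds by multiplying by the likelihood ratio, then converts back to a risk. It also checks that the result equals the direct Bayes formula for PPV and 1 - NPV. The inputs (commercial version, 40%/91%, and a 30% pre-test risk) are illustrative only. The function name is hypothetical.

```python
def post_test_via_odds(r, dlr):
    """Post-test risk: pre-test odds times the likelihood ratio, converted back to a risk."""
    odds = r / (1 - r) * dlr
    return odds / (1 + odds)

sens, spec, r = 0.40, 0.91, 0.30       # illustrative values
dlrp = sens / (1 - spec)               # about 4.4
dlrn = (1 - sens) / spec               # about 0.66
ppv = post_test_via_odds(r, dlrp)
risk_neg = post_test_via_odds(r, dlrn)  # this is 1 - NPV
# The odds form agrees with Bayes' rule written directly.
assert abs(ppv - sens * r / (sens * r + (1 - spec) * (1 - r))) < 1e-12
assert abs(risk_neg - (1 - sens) * r / ((1 - sens) * r + spec * (1 - r))) < 1e-12
print(round(ppv, 3), round(risk_neg, 3))
```

Here a positive result moves a 30% pre-test risk to roughly 66% (just into the high risk group). A negative result moves it to roughly 22%.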
3. Results
3.1. Comparison of Test Performance for the Patient Cohorts
The diagnostic metrics (specificity, sensitivity and positive diagnostic likelihood ratio) were first tabulated (Table 1). The performance of the current commercial EarlyCDT-Lung in the nodule cohort (specificity 85.6%, sensitivity 37.8%) was well in line with that in the optimization cohort (specificity 90.6%, sensitivity 41.3%), bearing in mind the confidence intervals. Performance was also maintained when the nodule cohort was assessed by nodule size. For nodules of 4 mm - 20 mm, specificity was 83.9% and sensitivity 40.0%; for nodules >20 mm, specificity was 91.7% and sensitivity 36.4%. There were no nodules <4 mm. Differences in performance between the cohorts did not approach significance for either specificity (p = 0.28) or sensitivity (p = 0.93), despite the varying cancer rate (2.3% to 47.8%).
The performance was also compared statistically across the cohorts by calculating DLRp (see Table 1), to assess whether the cohorts were similar enough to be used for this study. For the audit and nodule cohorts, the relatively low number of cancers means that DLRp is not estimated with high precision. The confidence intervals are therefore quite wide, but the estimates were consistent across cohorts.
Table 1. Summary of test performance for the patient cohorts.
ᵃ 95% confidence intervals for specificity, sensitivity and DLRp are shown in brackets. ᵇ Comparison of the first four cohorts using Fisher's exact test: specificity p = 0.28, sensitivity p = 0.93. ᶜ Nodule subset, size range in brackets.
The diagnostic performance of EarlyCDT-Lung in the nodule and audit cohorts was thus judged to be similar to that in the optimization cohort. The ROC curve constructed from the optimization cohort can therefore be used for the nodule and audit cohorts in two ways. It can predict the performance of higher and lower specificity versions of EarlyCDT-Lung, and it can support the subsequent analysis of nodule risk category shift.
3.2. ROC Curve
The performance statistics were tabulated for 100 specificity values from 0% to 100%, which allowed the whole ROC curve to be constructed (Figure 1). A selection of the higher specificity points (49% to 98%, with sensitivity ranging from 80% to 28%) was also tabulated with the relevant performance metrics (Table 2).
The estimated area under the ROC curve (AUC) was 0.743 (p < 0.001 vs. randomness). The PPV, relative risk, DLRp and DLRn all increase as specificity increases. Between 90% and 98% specificity, diagnostic performance increases considerably: PPV rises from 54% to 78%, relative risk from 3.7 to 5.0 and DLRp from 4.2 to 14.0. The increase in DLRn was more modest, from 0.64 to 0.73. NPV increases steadily as specificity decreases, reaching 90.7% at a specificity of 49%. These changes arise from the combination of high specificity and a relatively low cancer rate, which causes the number of false positives to fall rapidly towards the top of the table. A high PPV is useful to a clinician faced with a positive biomarker test. A higher DLRp means that a positive biomarker test has a greater relative effect on the pre-test risk.
Figure 1. Maximum true positive rate ROC curve constructed from multivariate data from the optimization cohort. AUC = 0.743 (p < 0.001 vs. randomness).
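The text does not state how the AUC of 0.743 was computed. A common choice for an empirical, non-parametric ROC curve is the trapezoidal rule over the tabulated (1 - specificity, sensitivity) points, sketched below as an assumption. It adds the corners (0, 0) and (1, 1) and sorts the points by false positive rate. The function name is hypothetical, and the significance test against randomness is not covered.

```python
import numpy as np

def auc_trapezoid(specificities, sensitivities):
    """Area under an empirical ROC curve by the trapezoidal rule.

    Points are (1 - specificity, sensitivity); the corners (0, 0) and (1, 1) are added.
    """
    fpr = 1.0 - np.asarray(specificities, dtype=float)
    tpr = np.asarray(sensitivities, dtype=float)
    order = np.lexsort((tpr, fpr))           # sort by FPR, then TPR for ties
    x = np.concatenate(([0.0], fpr[order], [1.0]))
    y = np.concatenate(([0.0], tpr[order], [1.0]))
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

Applied to the 100 tabulated specificity values and their maximized sensitivities, this would give a single summary of the curve. The result depends on how finely the specificity grid is sampled.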
Table 2. Tabulated performance characteristics of EarlyCDT-Lung for a range of specificitiesᵃ.
ᵃ Based on the 20% cancer prevalence expected in a nodule cohort.
3.3. Shift of Risk Category
The risk groups and performance statistics for three test versions of interest, high specificity (98% specificity and 28% sensitivity for the optimization cohort), current commercial test (91%/40%) and low specificity (49%/80%) were tabulated (Table 3) with performance metrics. Any nodule category which had switched risk group was highlighted. For the high specificity version a total of 26.9% of intermediate nodules were re-classified either to a higher (10.1%) or lower (16.8%) risk group, whilst for the current commercial test a total of 25.9% were re-classified, either to a higher (10.4%) or lower (15.5%) group, and for the low specificity version of the test a total of 22.5% were re-classified, either to a higher (7.6%) or lower (14.9%) group.
The performance of the current commercial EarlyCDT-Lung in a pre-imaging screening scenario is quoted at 90.6% specificity and 41.3% sensitivity, with the specificity unadjusted for occult cancer in the optimization cohort. This performance was determined by fixing the specificity and then optimizing sensitivity. A specificity of around 90% was chosen to restrict the false-positive rate to 10%. This rate is quite high, but it is justified because positive patients are followed up by imaging rather than by anything more invasive. Where a patient has already undergone imaging and a nodule has been observed, the next stage may be more invasive and could involve biopsy or surgery. In this setting the false-positive rate needs to be kept low to avoid unnecessary invasive procedures, so the specificity would be set higher, perhaps at 95% or more.
Table 3. EarlyCDT-Lung performance characteristics of the nodule cohort by risk category with predicted category shift.
ᵃ Pre-test risk based on the Swensen/Mayo nodule-based risk model, with the lower figure inclusive (e.g. 0.00) and the higher figure exclusive (e.g. 4.99). ᵇ Frequency of nodule risk in the nodule cohort. ᶜ (Specificity/sensitivity) calculated for the optimization cohort. ᵈ Post-test risk given a positive test result. ᵉ Post-test risk given a negative test result. ᶠ Relative risk = PPV/(100 - NPV). Colors indicate risk groups of low (green), mid (orange) and high (red). The first column shows the pre-test risk grouping. Elsewhere in the table, color coding indicates the shifted post-test risk group and is applied only to groups that shifted.
The diagnostic performance of EarlyCDT-Lung in the nodule cohort was found to be similar to that in the optimization cohort. The ROC curve constructed from the optimization cohort can therefore be used to predict the performance of higher and lower specificity versions of EarlyCDT-Lung in the nodule context. Note that the ROC curve presented here is effectively non-parametric, unlike a model-based curve derived, for example, by logistic regression. This allows multivariate optimization without the assumptions that a model-based method requires.
Clinicians consider many factors besides nodule size when interpreting indeterminate pulmonary nodules. Some specialist centers use nodule-based risk models that incorporate both demographic and nodule characteristics. We combined the EarlyCDT-Lung biomarker test with a nodule-based risk estimator (Swensen/Mayo in this case) to predict the percentage of intermediate-risk pulmonary nodules that switched risk group after a positive or negative biomarker test. Reclassification reached 27% for the high specificity test, 26% for the commercial test and 23% for the low specificity test. For all tests, most reclassification was towards lower risk.
For nodules of moderate size, it is recommended that clinicians estimate the pre-test probability of malignancy either qualitatively, using clinical judgment, or quantitatively, using a validated model. The Fleischner Society guidelines describe the recommended follow-up for patients with a pulmonary nodule identified by CT. They balance early cancer detection and treatment against overuse of invasive diagnostic procedures. A biomarker that indicated an increased or decreased risk of malignancy while the nodule was still relatively small (e.g. 4 mm - 20 mm) would be clinically useful, as it could lead to earlier detection and decreased mortality. For larger nodules there is less need for a biomarker test, since the clinical pathway is better established.
We have shown that it is possible to derive a receiver-operating characteristic (ROC) curve for a multivariate assay by re-optimizing the cut-offs at every fixed specificity value. This allows different versions of the EarlyCDT-Lung biomarker test to be deployed in different clinical situations, depending on the relative costs of false positives and false negatives. In particular, higher and lower specificity versions would seem to be more appropriate for a nodule interpretation scenario where subsequent intervention may well be invasive.