The incidence of thyroid cancer has increased in recent years. This increase in detection is mainly due to the use of diagnostic methods such as ultrasound (US). Early diagnosis and management have helped reduce mortality rates, and the worldwide mortality rate has also decreased. In 2010, mortality rates in most countries ranged from 0.20 to 0.40 per 100,000 among men and from 0.20 to 0.60 per 100,000 among women. Nevertheless, mortality among women living in Ecuador, Colombia and Israel was over 0.60/100,000.
Ultrasound assessment remains the most important examination in the diagnostic approach to thyroid nodules because it can detect potentially malignant nodules that are often non-palpable and unsuspected. Up to 67% of the population may present a thyroid nodule; however, fewer than 10% are malignant.
During the last decade, various authors, associations and international societies have proposed systems for the ultrasonographic categorization of thyroid nodules in order to classify, select and detect potentially malignant lesions that require confirmation through fine needle aspiration biopsy (FNAB).
Similar to BI-RADS®, some of these proposed models aim to stratify, standardize and facilitate communication of results among radiologists and clinicians in order to start treatment of malignant lesions, or to perform an FNAB and begin follow-up of lesions of intermediate suspicion.
Numerous authors have proposed different versions of TIRADS (Thyroid Imaging Reporting and Data System) that have been validated in a wide variety of populations. Nevertheless, these systems are often criticized for their lack of practicality or reproducibility. For this reason, none of the current TIRADS classification systems has been universally accepted.
The aim of this study was to develop a diagnostic predictor model based on qualitative ultrasound features of thyroid nodules that underwent FNAB, in order to discriminate between benign nodules and nodules that are suspicious of malignancy.
2. Materials and Methods
2.1. Study Design and Population
This retrospective study included 429 thyroid nodules that underwent US-guided FNAB between January 2014 and November 2017 at our imaging institute (ALPHA Imagen, Quito, Ecuador), a tertiary referral center that receives patients from health institutions all over the country.
The present study modifies and improves the initial classification proposed by one of our co-authors, which was presented in 2014 as a retrospective study of 1135 thyroid nodules analyzed at our center and published as part of a radiodiagnosis dissertation at the Postgraduate Institute of the Central University of Ecuador (Mosquera M. & Luzuriaga M., 2014).
This study was approved by our institutional review board. Informed consent for the FNAB was obtained from all subjects. No minimum nodule size was required for inclusion in this study.
The histopathological results of the samples obtained by FNAB were reported according to the Bethesda classification system. Nodules were included as cases if they had a confirmed malignant cytopathological diagnosis (category V or VI). Control nodules were required to have a confirmed benign diagnosis and were randomly selected among the 429 samples obtained during this period. Thyroid nodules with an undetermined diagnosis (categories I, III and IV) were excluded, as were patients under 18 years of age and records with incomplete or missing data.
2.2. Ultrasound Assessment and US-Guided FNAB Procedure
US examinations and US-guided FNAB were performed by one expert radiologist (20 years of experience) and by three radiologists with less than 10 years of experience under the expert's supervision.
Thyroid US examination was performed with a 10-14 MHz high-frequency transducer (MINDRAY, DC8 model) that provides high-resolution 2D imaging, tissue harmonics, color and power Doppler, spectral Doppler and strain elastography. Real-time US-guided FNA was performed with a 1¼ inch long 23 G needle on a 20 mL vacuum aspiration system (MD. TECH®).
Ultrasound findings were reported and classified immediately by the radiologist in an electronic form. Nodules were assessed according to the presence or absence (Yes vs. No) of the characteristics proposed in our Thyroid Ultrasound Predictor Model (ALPHA Score), which were chosen for their practicality (Table 1). US features from the ACR-TIRADS 2017 score were also recorded.
Table 1. Suspicious ultrasound features used to create the predictor model of malignancy in thyroid nodules (ALPHA SCORE).
FNA samples were fixed, labeled and sent to the pathologist at a single laboratory (Axxis Medicine Laboratories, Quito, Ecuador). The samples were analyzed by 3 expert pathologists. The results were reported using The Bethesda System for Reporting Thyroid Cytopathology (2011).
2.3. Statistical Analysis
The first stage of analysis was performed in a sample of 429 nodules. Continuous variables are described as mean ± standard deviation (SD). The frequency of malignancy across groups and US features was compared using the chi-squared test (χ²) and Fisher's exact test.
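As an illustration of the two tests named above, the following minimal sketch computes a Pearson chi-squared test and a two-sided Fisher's exact test for a single 2x2 table (feature present/absent by malignant/benign). It is not the SAS procedure used in the study; the function names and the example counts are hypothetical and chosen only for illustration.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sums the hypergeometric probabilities
    of all tables (with the same margins) that are no more likely than the
    observed one."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: rows = feature present/absent, columns = malignant/benign
chi2, p_chi2 = chi2_2x2(30, 20, 10, 40)
p_fisher = fisher_exact_2x2(30, 20, 10, 40)
print(f"chi2 = {chi2:.2f}, p = {p_chi2:.2g}; Fisher p = {p_fisher:.2g}")
```

Fisher's exact test is usually preferred when expected cell counts are small, which is presumably why both tests are reported.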
A univariate logistic regression analysis was applied to estimate the association between each US feature and the cytopathological findings and to obtain a likelihood predictor. The odds ratio (OR) with its 95% confidence interval (CI) was calculated for each US feature. The second stage of this study consisted of a paired analysis with a randomly selected sample of confirmed malignant (cases) and benign (controls) nodules. A univariate analysis and a logistic regression model were also applied.
The prediction model and score design used the OR obtained for each US feature in the logistic regression analysis. Different values (0, 1 or 2) were assigned to each feature, and the final score allowed further classification of the nodule based on the estimated risk. The receiver operating characteristic (ROC) curve was also obtained (95% CI).
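For a single binary US feature, the OR from a univariate logistic regression equals the cross-product ratio of the corresponding 2x2 table, and the Wald 95% CI on the regression coefficient coincides with the log-scale (Woolf) interval. The sketch below shows that calculation; the function name and the counts are hypothetical and do not reproduce the study data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: feature present, malignant    b: feature present, benign
    c: feature absent,  malignant    d: feature absent,  benign
    Assumes all four cells are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
or_, lower, upper = odds_ratio_ci(30, 20, 10, 40)
print(f"OR = {or_:.1f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 corresponds to a feature significantly associated with malignancy at the 0.05 level.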
For all of the statistical analyses, we used the statistical software SAS University Edition (version 6p.2/6p.2.688de4662a09-1-1; SAS Institute Inc., Cary, NC, USA). Statistical significance was assumed with a p value less than 0.05.
Among the 429 nodules evaluated, 326 (76%) were classified as benign and 103 (24%) as malignant according to the cytopathological results. Table 2 summarizes the main demographic results and the ultrasound findings. The mean age of the patients was 53.9 ± 14.7 years (range, 18 to 87 years). There was no statistically significant association between sex and the cytopathological diagnosis (p = 0.556).
Malignancy was significantly more frequent when the following US findings were present: hypoechogenicity, solid composition, irregular margins and absence of a halo (p < 0.001). Bivariate analysis demonstrated the association between each US feature and the likelihood of malignancy as shown by the cytopathological results.
In the subsequent logistic regression analysis of the paired group (cases/controls), the same ultrasonographic characteristics were associated with the probability of malignancy, and those that reached significance (p < 0.001) were identified as independent predictors of malignancy.
Table 2. Statistical Analysis.
Intranodular vascularity (central flow) was the only US finding that did not reach significance (p = 0.130). No significant difference was found for age or sex in this subgroup analysis (p = 0.84 and p = 0.57, respectively).
Malignancy Predictor Model: ALPHA Score
We developed a predictor model based on the logistic regression analysis of the ultrasound features, with a specific score assigned according to their association with the likelihood of malignancy (Table 2). In this qualitative score, points were assigned to each US feature according to its presence or absence and to the strength of its association with malignancy. The maximum possible score was 10 and the minimum was 0. Nodules with a malignant result scored higher than benign ones (7.24 ± 1.87 vs. 3.74 ± 1.83). The ROC curve yielded an area under the curve (AUC) of 0.9009, with a sensitivity of 47.6% and a specificity of 98.1% (Figure 1 and Figure 2).
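The quantities reported above can be illustrated with a short sketch. The AUC is computed here as the probability that a randomly chosen malignant nodule scores higher than a randomly chosen benign one (ties counted as one half), and sensitivity and specificity are computed at a given cutoff, assuming that a score at or above the cutoff counts as a positive test. The function names and the score lists are hypothetical and do not reproduce the study data.

```python
def auc(malignant_scores, benign_scores):
    """Area under the ROC curve via pairwise comparison (ties count 1/2)."""
    wins = sum(
        (m > b) + 0.5 * (m == b)
        for m in malignant_scores
        for b in benign_scores
    )
    return wins / (len(malignant_scores) * len(benign_scores))

def sens_spec(malignant_scores, benign_scores, cutoff):
    """Sensitivity and specificity, assuming score >= cutoff is a positive test."""
    sensitivity = sum(s >= cutoff for s in malignant_scores) / len(malignant_scores)
    specificity = sum(s < cutoff for s in benign_scores) / len(benign_scores)
    return sensitivity, specificity

# Hypothetical scores on the 0 to 10 scale, for illustration only
malignant = [9, 8, 7, 7, 6, 5]
benign = [1, 2, 3, 4, 5, 7]
print(f"AUC = {auc(malignant, benign):.3f}")
print("sensitivity, specificity at cutoff 7:", sens_spec(malignant, benign, 7))
```

Raising the cutoff trades sensitivity for specificity, which is consistent with a high-specificity, lower-sensitivity operating point such as the one reported.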
Prevalence of malignancy in nodules that scored 2 to 3 in the model was less than 1.9% (n = 2), and these nodules were categorized as low suspicion of malignancy. The cutoff point that separates the highly suspicious of malignancy category from the moderately suspicious category was 7, which was the median score obtained by malignant nodules according to the Bethesda system. Nodules that scored 0 to 1 were classified as benign (Table 3).
Figure 1. Box plot and receiver operating characteristic (ROC) curve. (A) Scores obtained with the predictor model based on US features, grouped by histopathological result according to the Bethesda system. (B) ROC curve obtained to determine the cutoff with the highest sensitivity and specificity for the risk stratification proposed in the score.
Figure 2. ALPHA score ultrasound features: Solid consistency (+2p), Irregular margins (+2p), Microcalcifications (+2p), Hypoechoic appearance (+1p), Absence of a halo (+1p), Diameter ≥ 10 mm (+1p), Central vascular flow (+1p). © Images property of ALPHA Image, Radiology and Interventionism Institute, Quito, Ecuador. (A) Central flow (+1p). TOTAL 1/10; (B) Central flow (+1p), Diameter ≥ 10 mm (+1p). TOTAL 2/10; (C) Irregular margins (+2p), Diameter ≥ 10 mm (+1p). TOTAL 3/10; (D) Solid consistency (+2p), Central flow (+1p), Diameter ≥ 10 mm (+1p). TOTAL 4/10; (E) Solid consistency (+2p), Hypoechoic appearance (+1p), Central flow (+1p), Diameter ≥ 10 mm (+1p). TOTAL 5/10; (F) Solid consistency (+2p), Hypoechoic appearance (+1p), Absence of a halo (+1p), Central flow (+1p), Diameter ≥ 10 mm (+1p). TOTAL 6/10; (G) Solid consistency (+2p), Irregular margins (+2p), Hypoechoic appearance (+1p), Central flow (+1p), Diameter ≥ 10 mm (+1p). TOTAL 7/10; (H) Solid consistency (+2p), Irregular margins (+2p), Hypoechoic appearance (+1p), Absence of a halo (+1p), Central flow (+1p), Diameter ≥ 10 mm (+1p). TOTAL 8/10; (I) Solid consistency (+2p), Irregular margins (+2p), Microcalcifications (+2p), Hypoechoic appearance (+1p), Absence of a halo (+1p), Diameter ≥ 10 mm (+1p). TOTAL 9/10; (J) Solid consistency (+2p), Irregular margins (+2p), Microcalcifications (+2p), Hypoechoic appearance (+1p), Absence of a halo (+1p), Diameter ≥ 10 mm (+1p), Central flow (+1p). TOTAL 10/10.
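The scoring logic can be sketched as follows, using the point values shown in the Figure 2 panels. The feature keys and function names are hypothetical. The category boundaries are an assumption drawn from the text (0 to 1 benign, 2 to 3 low suspicion, cutoff of 7 for high suspicion); in particular, treating 4 to 6 as moderate suspicion and a score of exactly 7 as high suspicion is our reading and should be checked against Table 3.

```python
# Points per US feature, as listed in the Figure 2 panels (maximum total: 10)
ALPHA_POINTS = {
    "solid": 2,
    "irregular_margins": 2,
    "microcalcifications": 2,
    "hypoechoic": 1,
    "absent_halo": 1,
    "central_flow": 1,
    "diameter_ge_10mm": 1,
}

def alpha_score(features):
    """Sum the points of the features that are present (each counted once)."""
    present = set(features)
    unknown = present - set(ALPHA_POINTS)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return sum(ALPHA_POINTS[name] for name in present)

def alpha_category(score):
    """Map a score to a risk category (4-6 and >= 7 boundaries are assumed)."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 1:
        return "benign"
    if score <= 3:
        return "low suspicion"
    if score < 7:
        return "moderate suspicion"  # assumed range: 4 to 6
    return "high suspicion"          # assumed range: 7 to 10

# Example corresponding to panel (G) of Figure 2
panel_g = ["solid", "irregular_margins", "hypoechoic", "central_flow", "diameter_ge_10mm"]
score = alpha_score(panel_g)
print(score, alpha_category(score))
```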
A sub-analysis was then performed to compare the ACR TI-RADS 2017 score of each nodule with its classification under our scoring system. Nodules classified as highly suspicious of malignancy by our score had a mean ACR TI-RADS score of 4.4 (±0.62), whereas nodules of moderate and low suspicion scored 3.6 (±0.79) and 2.5 (±0.91), respectively. Benign nodules scored 1.8 (±0.66) with ACR TI-RADS (Figure 3).
Table 3. Risk stratification categories for thyroid nodules suggested by the ALPHA Score according to US assessment.
Figure 3. Distribution of nodules classified according to ALPHA Score risk categories (X axis) and the respective score obtained using ACR-TIRADS 2017 (Y axis). Risk categories: (A) Benign, (B) Low suspicion of malignancy, (C) Moderate suspicion of malignancy, (D) High suspicion of malignancy.
This study allowed us to observe the association between a new US scoring system (ALPHA Score) and the likelihood of malignancy. Since the current use of US has increased the detection of thyroid nodules, FNA procedures to exclude malignancy have also increased.
Limited familiarity with TIRADS classification systems and other prediction models among physicians might lead to overuse of FNAB, while the excess of TIRADS publications may have generated confusion about the approach to, and optimal use of, FNAB as a diagnostic examination.
Although several international guidelines focus on ultrasound assessment of the thyroid nodule before deciding whether to aspirate it, various authors agree that better and simpler alternative guidelines are needed to facilitate communication between patients and physicians.
However, TIRADS was developed based on BI-RADS®, a well-known classification system disseminated by the American College of Radiology (ACR) and initially applied to breast imaging. BI-RADS® was designed to standardize reports and radiological terminology, facilitate evaluation, reduce the number of biopsies and suggest a specific therapeutic approach. Nevertheless, intermediate categories remain a challenge because management usually includes imaging follow-up and confirmation through biopsy. In fact, category IV of BI-RADS® and TIRADS, with its subcategories A, B and C, has drawn much criticism regarding its application, reproducibility and diagnostic effectiveness, and it significantly increases the time needed to select and apply the score.
During the last few years, several authors have focused their research on the reliability and accuracy of the most widely used TIRADS. In 2014, in France, Moifo et al. used the Russ modified TIRADS (2013) to assess its capacity to predict malignancy. Their results validated the proposed classification and identified the 4 ultrasound features most strongly associated with malignancy: irregular margins (OR: 22.4), taller-than-wide shape (OR: 19.5), microcalcifications (OR: 15.2) and marked hypoechogenicity (OR: 12.7). These values are very similar to our findings; however, the authors concluded that more evidence is needed to support wider application. This background is interesting because, as Kwak et al. stated in 2011, a single ultrasound feature with a high OR correlates more strongly with malignancy than two minor features together.
In 2017, in Brazil, Delfim et al. presented a new modified TIRADS after analyzing the predictive power of 23 ultrasound characteristics. Their results showed at least 9 features highly associated with malignancy in the bivariate analysis that did not remain significant predictors after multivariate analysis. These features included non-ovoid shape, macrocalcifications, absence of a halo, regular or irregular thick halo, crystal colloid, hyperechogenicity and spongiform appearance, whereas a blurred margin was negatively associated with the likelihood of malignancy. Only absence of a halo proved to be an independent predictor after logistic regression analysis, while nodular vascularity and its criteria might need to be reformulated in further research despite the evidence presented in other studies.
As shown in studies by Horvath et al., Russ et al. and others, in scoring systems that stratify risk into 5 or more categories, a higher frequency of malignancy is usually found in the intermediate to high risk categories (TIRADS 3, 4A, 4B). This tendency was also present in our study, although we used only 4 risk categories, of which the first 2 were very specific for excluding malignancy.
In this regard, our findings allowed us to infer that most malignant nodules will present with the most common malignant US patterns, leading to the decision to perform an FNA. This interpretation should be taken with caution because, according to a multi-center study conducted in Korea, 7.3% of malignant nodules did not present any suspicious feature on ultrasound. That finding does not necessarily match our experience, or possibly that of other authors, who agree that malignant nodules usually present a wide variety of US predictors.
A limitation of this study was the number of nodules included (selection bias). Although the distribution of malignant and benign results is similar to that of other studies, we excluded multiple nodules and results classified as Bethesda I, III and IV. These categories deserve further analysis in the future, given the evidence that links some US findings (e.g. intranodular/central flow) to malignancy in follicular neoplasms as strong predictors.
In addition, our study did not include elastography, which can be considered a limitation given the widely accepted tendency to use this method for thyroid nodule assessment. However, our objective is that the score remains simple to apply and interpret, both in our country and in other countries where standardized equipment is not fully available. Moreover, elastography is not included in other classification systems or in the latest editions of the American Thyroid Association (ATA) or ACR guidelines, probably due to the lack of evidence on the use of strain and quantitative elastography (shear wave and similar).
One strength of our study was the availability of cytopathological studies for all samples obtained. Unfortunately, the design of this study did not contemplate having a long-term follow-up since the main objective for its use is the adequate selection of suspicious nodules to undergo an FNA procedure. We will consider a future study that contemplates the correlation of our results with post-surgical outcomes in selected groups to obtain further information about the accuracy of our prediction model.
In conclusion, the proposed model proved to be useful and easy to apply when stratifying thyroid nodule risk of malignancy using US features presented here and applying the proposed risk categories. The replication of this study needs to be considered within a multi-center design to validate it and possibly facilitate therapeutic decisions in the management of thyroid disease.