Automated technologies and methods have become commonplace for hearing screening in adults, especially outside of North America. Self-report measures, such as the Hearing Handicap Inventory for the Elderly-Screening, are available online and may be considered a low-tech approach to hearing screening. High-tech approaches to hearing screening generally involve self-administered tests with stimuli presented under variable and uncontrolled listening conditions, and include screening with the use of land-line and cellular telephones, the internet, and hand-held consumer-electronic devices such as smartphones and tablet computers. Software applications have been developed for screening with the use of pure-tones, but results have proven problematic, and the calibration of devices and earphones remains a formidable challenge even for the most promising automated tests. However, progress is being made in addressing the calibration problem.
Since calibration problems make it difficult to deliver pure-tone tests over the internet and on mobile devices, stimulus materials consisting of speech in noise are used in some newer screening tests. Although speech-in-noise screening provides a good measure of functional hearing capabilities, the correlation between these measures and pure-tone thresholds tends to be moderate. Moreover, the outcome of some speech-in-noise screening measures differs qualitatively from pure-tone screening results. Instead of a dichotomous Pass-Fail outcome, the Speech Understanding in Noise screening test [13, 14], for example, places respondents into one of three outcome categories: Pass, Hearing Check Advised, and Hearing Check Recommended.
The American Speech-Language-Hearing Association (ASHA) guidelines for hearing screening in adults stipulate that the stimulus materials consist of pure-tones at 1, 2 and 4 kHz presented at 25 dB HL. In order to pass the screening, the examinee must respond to all of the stimuli. Failure to respond at any one of the three frequencies results in a referral for a complete audiological evaluation. Since the stimuli are pure-tones presented at 25 dB HL, they must be delivered via a properly calibrated audiometer. Without a way to control the sound level and the user’s selection of a transducer (earphones or speakers), it is not currently possible to develop a screening test that reflects the ASHA criteria and can be delivered over the internet. This is a formidable challenge facing the practice of tele-audiology and, in this regard, the ability to conduct hearing screening and simultaneously evaluate speech recognition ability may provide a useful alternative for testing conducted outside the clinical setting.
Using data obtained by the authors in earlier work  , the present study investigates the degree to which an online adaptive test of speech recognition can serve as a proxy for pure-tone hearing screening methods. The investigation sought to quantify the accuracy of the NTID Speech Recognition Test (NSRT®) as a screening measure with regard to the American Speech-Language-Hearing Association (ASHA) criteria for hearing screening in adults  . Specifically, the present study sought to evaluate the effectiveness of the NSRT® to separate test-takers who suffered from sensorineural hearing loss from those who were normally hearing, using the test criterion established by ASHA (i.e., the uniform screening level of 25 dB HL at pure-tone frequencies of 1, 2 and 4 kHz). Results of audiometric pure-tone testing in the clinical setting were compared to those predicted from online NSRT® testing. The outcome of each screening (i.e., traditional vs. NSRT®) was expressed as one of two possibilities: pass or refer a respondent for further evaluation.
2. Materials and Methods
Test Materials and Protocol: The NSRT® is a computer-based adaptive test of speech recognition. Results of validity and reliability studies have been reported previously, as have various applications of the test   . These results indicate high reliability coefficients for the NSRT® and moderate correlations with pure-tone thresholds across the octave frequencies 0.5 to 8 kHz (PTA and HFPTA) and other speech recognition measures (W-22, QuickSIN and SRT). Moreover, NSRT® scores have been shown to provide insight into the phonetic errors that affect speech understanding in adults who suffer from sensorineural hearing loss. When combined with chronological age and self-report of hearing handicap in a multiple linear regression procedure, performance on the NSRT® has been shown to be closely related to pure-tone thresholds for individuals having hearing losses < 55 dB HL. The close relationship between hearing thresholds and the combination of NSRT® scores, age and self-report of hearing handicap enables the multiple regression algorithms to predict hearing thresholds in individuals having hearing sensitivity ranging from normal to moderate hearing loss.
The NSRT® is composed of sentence-length utterances containing phonetic contrasts, primarily minimal pairs. The test protocol uses a paired comparison discrimination task in which a standard sentence is paired with two comparison sentences. Respondents must indicate if comparison sentences are the same or different from the standard sentence. Responses to the discrimination tasks are scored correct/incorrect.
The NSRT® is a computerized, adaptive test. It administers items utilizing an “up-down” method that selects items for presentation to respondents on the basis of their “information” value, a statistical concept. Like the stimuli used in adaptive psychophysical procedures, the stimuli used in adaptive testing are scaled along a continuum extending from low to high degrees of magnitude. However, rather than representing a physical construct such as the intensity of a sound, the continuum in this instance represents a domain of human performance, speech recognition ability   .
The “up-down” method of item selection ensures that the items administered to each respondent are selected from a narrow range of difficulty spanning their level of ability, which is itself constantly updated as items are administered during the testing process. The increase/decrease in item difficulty is associated with variation in the phonetic and acoustic properties of the utterances. Items continue to be administered until one of several test termination criteria has been met. These criteria include a predetermined time limit and number of items, as well as a pre-selected level of reliability. In addition, testing is terminated when the number of items within a specified range of difficulty has been exhausted. For additional detail and background information regarding the item content and design of the NSRT® application, the interested reader is referred to our earlier work. The NSRT® application itself is accessible at https://apps.ntid.rit.edu/NSRT/.
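The general shape of such an up-down procedure can be sketched in a few lines of Python. This is a simplified illustration only, not the NSRT® item-selection algorithm: the fixed step size, the difficulty window, and all names (`run_up_down`, `respond`, `window`) are assumptions made for the example, and the information-based selection, time limit and reliability criterion described above are omitted.

```python
def run_up_down(item_difficulties, respond, start=0.0, step=1.0,
                window=1.5, max_items=10):
    """Simplified up-down staircase (illustrative; not the NSRT algorithm).

    item_difficulties: difficulty values of the available items (hypothetical scale)
    respond: callable taking an item difficulty, returning True if answered correctly
    """
    remaining = sorted(item_difficulties)
    ability = start          # running estimate of the respondent's ability
    history = []             # (item difficulty, correct?) for each administered item

    while remaining and len(history) < max_items:
        # Only items within a narrow range around the current estimate are eligible.
        candidates = [d for d in remaining if abs(d - ability) <= window]
        if not candidates:
            break            # items in the specified range are exhausted
        item = min(candidates, key=lambda d: abs(d - ability))
        remaining.remove(item)

        correct = respond(item)
        history.append((item, correct))
        # Move the estimate up after a correct response, down after an error.
        ability += step if correct else -step

    return ability, history


# Toy respondent who answers correctly whenever difficulty is at most 0.5.
estimate, administered = run_up_down([-2, -1, 0, 1, 2], lambda d: d <= 0.5)
print(estimate, administered)
```

In this toy run the estimate oscillates around the point where the simulated respondent starts to fail, and testing stops once no unused item lies within the window.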
Participants and Procedure: Data used in this research were obtained from earlier studies conducted by the authors   . Data collection occurred at the audiology clinics at the National Technical Institute for the Deaf (NTID) at the Rochester Institute of Technology, the University at Buffalo and Syracuse University. Subjects in the studies were volunteers recruited from the campuses/communities served by the three clinics. One hundred twenty adults (54 males and 66 females) aged 18 - 88 years (mean = 55.0 years, sd = 23.0) participated in the study. The participants were all in good health, had sensorineural hearing loss with no evidence of any other handicapping condition, and were native speakers of English.
Pure-tone audiometric thresholds were obtained using standard audiometric procedures at octave frequencies between 0.25 and 8 kHz. Air-bone gaps were ≤ 10 dB for all listeners. The signals were presented to participants’ right/preferred ear through EAR-3A insert earphones at the NTID Audiology Clinic and the University at Buffalo, and TDH-50 supra-aural earphones at Syracuse University. Individuals with conductive hearing loss, as indicated by an air-bone gap > 10 dB at any test frequency or failure to show a peak in the tympanogram between -200 and +200 daPa of ambient pressure in the test ear, were not included in the study.
Table 1 summarizes the average hearing thresholds of all study participants computed across three different frequency ranges (i.e., lower, middle, and higher frequency ranges). Speech recognition threshold (SRT) data are also shown in the table. In this study, we focus on hearing screening at pure-tone frequencies of 1, 2 and 4 kHz, the criterion established by ASHA.
In addition to pure-tone testing, all participants were administered the NSRT® both in quiet and in background noise (multi-talker babble) at a +5 dB signal-to-noise ratio (SNR). The NSRT® was presented monaurally at 70 dB SPL, or at the participant’s most comfortable loudness level (MCL) if 70 dB SPL was less than 10 dB above their PTA.
Table 1. Summary of observed clinical characteristics of study participants (n = 120).
Note: All values reported in the table above are in dB HL.
Data Analysis: In our earlier published work on the NSRT®, we employed ordinary least squares (OLS) linear regression procedures that combine the information obtained from an interactive testing experience in a manner that enables prediction (estimation) of pure-tone hearing thresholds across the octave frequencies 0.5 to 8 kHz. The data necessary to estimate these frequency-specific thresholds are: (1) an average of NSRT® test performance under two conditions (quiet and +5 dB SNR background noise); (2) age reported by a respondent; and (3) a binary indicator variable reflecting the respondent’s perception of whether they suffer from hearing impairment (1 = yes, 0 = no).
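The structure of such a frequency-specific linear predictor can be sketched as follows. The function and parameter names are hypothetical, and the coefficient values in the usage lines are made-up placeholders chosen only to show the arithmetic; the fitted regression coefficients themselves are not reported in this paper.

```python
def predict_threshold_db_hl(nsrt_quiet, nsrt_noise, age_years,
                            reports_impairment, coef):
    """Linear prediction of one pure-tone threshold from the three predictors.

    coef: dict with keys "intercept", "nsrt", "age", "self_report"
          (one such set would be needed per audiometric frequency).
    """
    # Predictor 1: average NSRT performance across quiet and +5 dB SNR noise.
    nsrt_mean = (nsrt_quiet + nsrt_noise) / 2.0
    # Predictor 3: binary self-report indicator (1 = yes, 0 = no).
    indicator = 1 if reports_impairment else 0
    return (coef["intercept"]
            + coef["nsrt"] * nsrt_mean
            + coef["age"] * age_years
            + coef["self_report"] * indicator)


# Placeholder coefficients, NOT the published model; for illustration only.
demo_coef = {"intercept": 10.0, "nsrt": -2.0, "age": 0.5, "self_report": 5.0}
print(predict_threshold_db_hl(3.0, 1.0, 60, True, demo_coef))  # 41.0 with these made-up values
```

Applying one coefficient set per frequency yields the set of estimated thresholds that the screening decision is based on.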
To evaluate the effectiveness of the NSRT® as a hearing screening measure, the present study focuses on pure-tone frequencies at 1, 2 and 4 kHz. ASHA screening guidelines stipulate that hearing thresholds in excess of 25 dB HL at any one of these frequencies is indicative of hearing loss, warranting referral for further evaluation. In this study, we compare participants’ hearing thresholds obtained in a clinical setting, with pure-tone stimuli, with those estimated from the NSRT® testing protocol at 1, 2 and 4 kHz. Strictly speaking, the NSRT® data in this study were not obtained in the “home” setting. Consequently, the listening conditions under which these data were obtained (i.e., a laboratory setting) should be regarded as optimal.
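The pass/refer rule described above is simple enough to state as code. The sketch below assumes thresholds are supplied as a mapping from frequency in kHz to dB HL; the function name and data layout are illustrative choices, not part of the ASHA guidelines or the NSRT® software.

```python
ASHA_SCREEN_FREQS_KHZ = (1, 2, 4)   # screening frequencies
ASHA_SCREEN_LEVEL_DB_HL = 25        # uniform screening level


def asha_screen(thresholds_db_hl):
    """Return "refer" if any threshold at 1, 2 or 4 kHz exceeds 25 dB HL, else "pass".

    thresholds_db_hl: dict mapping frequency (kHz) to threshold (dB HL),
    either clinically observed or estimated from the screening test.
    """
    for freq in ASHA_SCREEN_FREQS_KHZ:
        if thresholds_db_hl[freq] > ASHA_SCREEN_LEVEL_DB_HL:
            return "refer"
    return "pass"


print(asha_screen({1: 10, 2: 20, 4: 25}))  # pass
print(asha_screen({1: 10, 2: 30, 4: 20}))  # refer
```

The same function can be applied to observed and to predicted thresholds, which gives the two binary outcomes that are compared in the Results.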
3. Results and Discussion
In the United States and Canada, the ASHA criteria for hearing screening are widely accepted as the gold standard. The ASHA standard for hearing screening in adults is the criterion against which we evaluate the NSRT®.
The effectiveness of a screening test is evaluated, in part, on the basis of its success in separating individuals with a target condition (i.e., hearing loss) from those without the same condition. The stimulus materials specified by ASHA for hearing screening in adults consist of pure-tones at 1, 2 and 4 kHz presented at 25 dB HL. To pass the screening criterion, test takers must respond to all of the stimuli. Failure at any one frequency results in a referral for audiological evaluation. The outcome of hearing screening, in accordance with ASHA guidelines, is a binary classification (pass or refer).
For the 120 participants in the current study, pure-tone hearing thresholds observed in the clinical setting at 1, 2 and 4 kHz provided the criterion measure (clinical outcomes) to which the screening test results were compared. Proxy thresholds estimated from the NSRT® data provided by the same participants served as the screening outcome. The question of interest here centered on the agreement of the screening test outcome and criterion measure, the former obtainable via self-administration of an online testing procedure, the latter obtained under controlled clinical testing conditions.
Measurements of sensitivity and specificity are commonly used to evaluate the efficacy of screening tests. When the clinically-obtained thresholds were compared to the online hearing screening test outcomes, the sensitivity of the NSRT® was found to be 95%. The specificity of the NSRT® was found to be 87%. The NSRT® protocol was designed first and foremost to identify those individuals with age-related hearing loss who, through a simple screening procedure, could benefit from an intervention that might, otherwise, not have been realized owing to more restricted, costly audiological testing procedures. Hence, the sensitivity of the NSRT® was primary in importance here. Overall diagnostic accuracy for hearing screening using the NSRT® was 91%.
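For readers less familiar with these measures, the following sketch shows how sensitivity, specificity and overall accuracy are computed from paired pass/refer outcomes. The function name and the eight toy outcomes are invented for illustration and are not the study data; the sketch also assumes that both outcome classes occur in the reference list.

```python
def screening_metrics(reference, screening):
    """Compare screening outcomes with reference (clinical) outcomes.

    Both arguments are equal-length lists of "pass" / "refer" strings.
    """
    pairs = list(zip(reference, screening))
    tp = sum(1 for r, s in pairs if r == "refer" and s == "refer")  # hits
    fn = sum(1 for r, s in pairs if r == "refer" and s == "pass")   # misses
    tn = sum(1 for r, s in pairs if r == "pass" and s == "pass")    # correct passes
    fp = sum(1 for r, s in pairs if r == "pass" and s == "refer")   # over-referrals
    return {
        "sensitivity": tp / (tp + fn),        # proportion of true referrals detected
        "specificity": tn / (tn + fp),        # proportion of true passes confirmed
        "accuracy": (tp + tn) / len(pairs),   # overall agreement
    }


# Toy data only (not the study sample).
clinical = ["refer", "refer", "refer", "refer", "pass", "pass", "pass", "pass"]
predicted = ["refer", "refer", "refer", "pass", "pass", "pass", "pass", "refer"]
print(screening_metrics(clinical, predicted))
```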
Figure 1 illustrates visually the distribution of average hearing threshold across the frequencies 1, 2 and 4 kHz for participants’ observed (clinical, upper panel) and predicted (NSRT®, lower panel) hearing thresholds. The distributions are mirrored for comparative purposes. While this illustration uses average hearing threshold data, deviating from the ASHA reliance on discrete measurements of hearing sensitivity at specific frequencies, the overlap of the distributions shown in the figure indicates clearly that hearing screening in the region defined by 1, 2 and 4 kHz is effectively measured by the NSRT® application. A test of the difference between the means shown in the distributions yielded a t-ratio = 0.55 (p > 0.58).
Figure 2 elaborates the statistical information provided in Figure 1 by evaluating the congruence of observed and predicted hearing thresholds across subgroups of participants. Figure 2 presents a boxplot display for each of four groups of respondents defined by degree of hearing loss. Normal hearing sensitivity was defined as an average observed PTA ≤ 25 dB HL across the frequency range 1, 2 and 4 kHz. Average hearing losses falling in the 26 - 40 dB HL range for pure-tones 1, 2 and 4 kHz were defined as mild, those in the 41 - 55 dB HL range were defined as moderate, and average hearing losses of 56+ dB HL were defined as severe.
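The grouping can be expressed as a small helper. The names below are illustrative, and the handling of fractional averages that fall between the stated integer ranges (for example 25.3 dB HL, assigned here to the mild group) is an assumption, since the text gives only whole-number boundaries.

```python
def pta_1_2_4(thresholds_db_hl):
    """Average threshold (dB HL) across 1, 2 and 4 kHz."""
    return sum(thresholds_db_hl[f] for f in (1, 2, 4)) / 3.0


def severity_group(pta_db_hl):
    """Map an average threshold to the four groups used in Figure 2."""
    if pta_db_hl <= 25:
        return "normal"
    if pta_db_hl <= 40:      # assumption: values between 25 and 26 count as mild
        return "mild"
    if pta_db_hl <= 55:
        return "moderate"
    return "severe"


print(severity_group(pta_1_2_4({1: 20, 2: 30, 4: 40})))  # mild (average = 30 dB HL)
```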
In Figure 2, the center line within a box represents the median PTA across the frequency range 1 to 4 kHz observed vs. predicted for each of the subgroups of respondents. The bottom edge of a box represents the PTA corresponding to the 25th percentile; the top edge corresponds to the PTA value at the 75th percentile. The whiskers extending beyond the boxes represent 1.5 times the distance between the 25th and 75th percentiles, adjusted for minimum/maximum observed or predicted values.
Figure 1. Distribution of observed vs. predicted pure-tone averages.
The percentages reported above the boxplot displays reflect the sizes of the subgroups of respondents, relative to the total sample of 120 participants. Figure 2 illustrates that the predicted PTA across the frequency range 1 to 4 kHz are most closely related to the PTA actually observed for individuals having hearing losses < 55 dB HL. The correspondence between observed and predicted PTA values is further reflected in a Pearson r = 0.90 (p < 0.01) across all 120 respondents.
Finally, hearing threshold data can also be represented by vectors, which can be manipulated mathematically. In this study, we employ the cosine similarity (cos(θ)) index, a method used in information technology for text matching, to evaluate the congruence of hearing thresholds observed and predicted across the frequency range 1 to 4 kHz. To constrain the cosine similarity index to a 0 to 1 range, hearing thresholds observed as negative decibels across the frequency range 1 to 4 kHz were transformed to a value of 0 dB, the normal audiometric standard at any frequency. The cosine similarity index provides a good indication of resemblance, here the similarity of two vectors A and B (corresponding to vectors with data entries representing hearing thresholds across the octave frequencies 1 to 4 kHz for observed [A] vs. predicted [B] values).
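For reference, the standard definition of the cosine similarity index can be written in the dot-product notation used in this section. The element symbols a_i and b_i (the observed and predicted thresholds at frequency i) are introduced here for illustration only.

```latex
\[
\cos(\theta)
  = \frac{dp_{AB}}{\sqrt{dp_{AA}}\,\sqrt{dp_{BB}}}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}}\;\sqrt{\sum_{i} b_i^{2}}},
  \qquad i \in \{1, 2, 4\ \text{kHz}\}
\]
```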
The numerator of the equation for the cosine similarity index is the dot product (dp) of A and B. It holds information about the direction of the vectors. If dpAB > 0, A and B form an angle < 90 degrees. If dpAB = 0, A and B form an angle that is exactly 90 degrees, indicating orthogonality. If dpAB < 0, A and B form an angle > 90 degrees, which is not applicable here.
Figure 2. Prediction accuracy conditioned on severity of hearing loss.
An angle of 0 degrees means that cos(θ) = 1, indicating that the vectors have identical directions. An angle of 90 degrees means that cos(θ) = 0, indicating that the vectors are perpendicular to one another.
The square roots of dpAA and dpBB give the lengths of A and B, respectively. The formula for the computation of cos(θ) is therefore the ratio of the dot product to the product of the lengths, measuring the direction-length resemblance between two vectors representing corresponding data sets (i.e., observed vs. predicted hearing thresholds across the frequency range 1 to 4 kHz).
Figure 3 confirms the similarity in magnitude and direction of hearing threshold values observed and predicted across the frequency range 1 to 4 kHz for individuals included in the study, as evidenced by a preponderance of cos(θ) indices approaching the upper limit = 1. Note that this analysis is based upon the test records of 107 of the 120 study participants. The reduction in number of participants included in these computations is necessitated when dpAA or dpBB = 0 (i.e., division by zero attempted).
It is also worth noting that the eight individuals with cos(θ) indices at the lower end of the frequency distribution (<0.70) had observed hearing thresholds = 0 dB at one or two of the frequencies ranging 1 to 4 kHz, constraining the upper limit of the cos(θ) calculation. The participant with the lowest cos(θ) index = 0 had observed hearing thresholds across the frequency range 1 to 4 kHz of 5 dB, 5 dB, and 0 dB, respectively; predicted thresholds were 0 dB, 0 dB, and 5 dB, respectively. These vectors are orthogonal, yet they are very similar in their indication of normal hearing sensitivity. They illustrate the need for caution in interpreting the cos(θ) index in those cases where the number of attributes in a vector is small and zero data entries are significant.
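The computation, including the two special cases just discussed, can be reproduced with a short Python sketch. The function name and the choice to return `None` for an undefined index are illustrative assumptions; the example vectors are the ones reported above for the participant with the lowest index.

```python
import math


def cosine_similarity(observed, predicted):
    """cos(theta) between two threshold vectors (dB HL at 1, 2 and 4 kHz).

    Negative thresholds are set to 0 dB first, which keeps the index in the
    0 to 1 range. Returns None when either vector has zero length, the case
    in which the ratio is undefined (division by zero).
    """
    a = [max(0.0, x) for x in observed]
    b = [max(0.0, x) for x in predicted]
    dp_ab = sum(x * y for x, y in zip(a, b))   # direction information
    dp_aa = sum(x * x for x in a)              # squared length of A
    dp_bb = sum(y * y for y in b)              # squared length of B
    if dp_aa == 0 or dp_bb == 0:
        return None
    return dp_ab / (math.sqrt(dp_aa) * math.sqrt(dp_bb))


# Observed 5, 5, 0 dB vs. predicted 0, 0, 5 dB: orthogonal vectors.
print(cosine_similarity([5, 5, 0], [0, 0, 5]))   # 0.0
# An all-zero vector cannot be scored.
print(cosine_similarity([0, 0, 0], [5, 10, 5]))  # None
```

The first call shows why an index of 0 can arise for two audiograms that both indicate normal hearing; the second shows why some test records must be excluded from the analysis.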
Figure 3. Similarity of observed vs. pseudo audiograms across the frequency range 1 to 4 kHz.
In this study, we evaluate the effectiveness of a hearing screening protocol that uses naturalistic speech-based stimulus materials to estimate hearing thresholds in individuals having hearing sensitivity ranging from normal to moderate hearing loss. The NSRT® is a self-administered internet-based application accessible on home and office computers, as well as other wireless devices. The application is targeted at individuals who suffer from age-related, progressive, mild to moderate hearing loss. As such, it has the potential to reach tens of millions of individuals who might not otherwise avail themselves of audiological assessment procedures provided in a formal clinical setting.
A recent report by USA President Obama’s Council of Advisors on Science and Technology (PCAST) indicates that hearing loss in an aging population is now a “substantial national problem,” citing cost as the largest barrier to hearing technology access by individuals who could benefit from it, simultaneously advocating changes in FDA regulation that include disruptive improvements in both the assessment and treatment of hearing loss  . In the report PCAST indicates that, whereas the hearing healthcare needs of a growing population of Americans are not being met by existing market models, Americans might be “better served if non-surgical air-conduction devices intended to address bilateral, gradual-onset, mild-to-moderate age-related hearing loss were available over-the-counter (OTC).” The report goes on to state that “Simple hearing tests to aid consumers in purchasing such OTC hearing aids should also be available OTC, including on-line and in stores.”
Data presented in this study clearly indicate that the NSRT® can serve as a proxy for pure-tone hearing screening methods. These data are especially impressive in light of the fact that differences on the order of +10 dB have been reported between thresholds obtained on hearing-impaired persons with automated versus manual audiometry and test-retest reliabilities in standard audiometry. The NSRT® is most suitable for screening where the primary cause of hearing loss is age-related, the vast majority of cases. The application was developed using linear statistical methods. As such, the NSRT® application focuses on detection of gradually-sloping or ski-slope hearing losses, by far the most common kind of hearing loss configuration. It was not intended to diagnose more complex kinds of hearing loss.
Beyond the obvious performance criteria that hearing screening tests must be sensitive and specific, demonstrated herein for the NSRT®, ASHA guidelines for hearing screening tests require that such tests be: (1) easy to administer; (2) comfortable for the test taker; (3) short in duration; and (4) inexpensive. The NSRT® is self-administering and can be taken in the privacy of one’s home; it is short in duration (5 - 7 minutes) and freely available for use; and it provides respondents with informative reports of test performance immediately following a testing session. Because the application was developed for use primarily by persons aged 60+ years, who may face declines in cognitive and physical abilities associated with the use of electronic technologies, the NSRT® was designed in accordance with guidelines established by the National Institute on Aging and the National Library of Medicine in the USA. The NSRT® application is accessible at https://apps.ntid.rit.edu/NSRT/.
Funding: This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
Conflict of Interest: The authors declare that there is no conflict of interest regarding the publication of this paper.