Recognizing emotions in others is an important aspect of our daily communication and therefore of our emotional competence. Emotions are multidimensional phenomena and, by definition, lead to changes on different levels (Scherer, 2005; Traue & Kessler, 2003): there is an appraisal process at the cognitive level, accompanied by a subjective feeling. Almost in parallel, there are measurable physiological changes within the central and autonomic nervous systems as well as behavioral responses such as facial expressions.
Most researchers use photographs to measure and understand the ability to decode emotions via nonverbal signals and interpret their results based on the basic emotions defined by Ekman (Ekman, 1992). The definition of these emotions is based on the fundamental idea that emotions are discrete and have evolved biologically (through adaptation to our environments), which in turn makes them universal (i.e., they are expressed and identified in every culture) (Ekman, 1992; Ekman & Cordaro, 2011).
If there is a biological basis for understanding emotions and decoding facial signals, it is worthwhile to look for possible mediators and the responsible regions within the central nervous system, especially limbic structures such as the amygdala (Kreibig, 2010; Murphy et al., 2003). These areas of the brain are known to be influenced in particular by sex hormones such as estrogen, progesterone and testosterone (Ter Horst, 2010). A number of studies have been published on testosterone and emotion processing. Many of these were performed with female subjects who received a sublingual application of exogenous testosterone, and they reported a lower recognition rate for angry expressions (van Honk & Schutter, 2007) or poorer accuracy rates in the “Reading the Mind in the Eyes” test (Bos et al., 2016). Goetz and colleagues found a positive correlation in men between the endogenous testosterone concentration and the reactivity of the amygdala, the hypothalamus and the periaqueductal grey to angry facial expressions (Goetz et al., 2014). Ackermann and colleagues even report higher amygdala activity in men with higher endogenous testosterone concentrations while viewing neutral pictures (Ackermann et al., 2012).
Whether circulating testosterone levels affect sensitivity in terms of emotion recognition remains unknown. Therefore, the aim of our study was to investigate the influence of endogenous testosterone in males on the ability to recognize emotions using the FEEL test (Kessler et al., 2002) with picture material from the JACFEE dataset (Matsumoto & Ekman, 1988) presenting the emotions at intensities of 50% and 100%. Based on the literature reported above, we hypothesized that:
1) Testosterone impacts emotion recognition accuracy leading to a negative correlation between testosterone and emotion recognition accuracy.
2) If testosterone had an impact on emotion recognition, we expected it to be more prominent in the pictures at 50% intensity because the task of emotion recognition is more difficult and the variance higher in these cases.
To test these hypotheses, we used the following methods.
2. Material and Methods
A total of 40 right-handed, healthy males aged 20 - 30 years (mean = 24.1 years; SD = 2.6 years; see Table 1) were recruited for our study by means of posters distributed throughout the Ulm University campus. At the end of the testing sessions, all participants were reimbursed for their time with either 10 € or 2 participation hours (required for psychology students). The study was designed in accordance with the ethical guidelines set out in the WMA Declaration of Helsinki and approved (#245/08-UBB/se) by the ethics committee of the University of Ulm (Helmholtzstraße 20, 89081 Ulm, Germany). Each experiment was undertaken with the understanding and written consent of each participant. To be eligible, subjects had to have no history of psychiatric illness, not be taking hormonal preparations, not be engaging in extreme physical training (such as bodybuilding), not be obese, and be heterosexual. The last criterion was necessary for a second part of our study, an emotion induction task using pictures from the International Affective Picture System (Lang et al., 1997; Lang et al., 2005), which was not part of the analysis for the present study, see Figure 1.
All participants were instructed not to eat or drink for 30 minutes prior to the experiment to prevent any salivary contamination. After a short introduction to the lab, all participants filled out several questionnaires (TAS-20 Scale (Bach et al., 1996), BSRI (Bem, 1974), Empathizing/Systemizing (Samson & Huber, 2010), TIPI (Muck et al., 2007)) for further analyses. Because we used only the TAS-20 scale for this part of the study (see below), the other three personality and gender role questionnaires will not be further explained here. The German paper version of the TAS-20 scale (Toronto Alexithymia Scale-20) was used to check whether any member of our “healthy” male sample showed signs of alexithymia, which typically include a deficiency in recognizing and describing emotions and externally oriented thinking (Kessler et al., 2006). Based on the TAS-20 scale, we determined that 4 males in our sample had alexithymia (see results for further details).
The endogenous testosterone concentration was measured 3 times via salivary samples, see Figure 1. Because testosterone is secreted in a pulsatile manner, repeated sampling was chosen to prevent a “falsely” high or low measurement (directly after or prior to a secretion pulse). This collection method enabled us to measure the testosterone level as a mean value over a period of about 2 hours rather than at a single instant during the experiment. The measurements will be discussed in Section 4.
All subjects were instructed to fill a small vial three times with a collection sample of approx. 0.5 ml each time, using a small straw to induce the saliva flow. The vials were stored in a refrigerator after each measurement. All samples were analyzed by SwissHealthMed, Feldkirchen (Germany) using the enzyme-linked immunoassay DEMEDITEC (Testosterone free in Saliva ELISA) and repeated measurements were performed (2 internal controls). The lowest detectable level of testosterone that can be distinguished from zero is 2.2 pg/ml at a confidence interval of 2 SD (denoted intra-assay coefficients of variation: 5.6% - 9.7%; inter-assay variation: 7.0% - 8.0%; determined by 20 replicate measurements of 3 saliva samples and duplicate measurements of saliva samples over 10 days).
Table 1. Physiological and psychometric data.
Figure 1. The ERECT study procedure: Every 30 min. a salivary sample of 0.5 ml was collected, resulting in a total saliva sample of 1.5 ml. (Since the questionnaires were not part of our analysis, they will not be explained in further detail).
Because testosterone concentration is subject to a diurnal rhythm (high in the morning, low in the evening) (Bribiescas & Hill, 2010), we randomized the participants into 2 groups, inviting them to participate in the experiment at 8:00 AM, 10:00 AM, 2:00 PM or 4:00 PM (20 subjects in the morning and 20 in the afternoon) to ensure a high variance of the testosterone concentration.
After the first salivary collection and the completion of the questionnaires, the subjects were brought to the computer for the FEEL (Facially Expressed Emotion Labeling) test. This computer-based test was first published in 2002 by Kessler and colleagues (Kessler et al., 2002) and measures recognition accuracy for the six basic emotions defined by Ekman: anger, fear, sadness, disgust, surprise and happiness. In every round, one picture of a facial expression is presented to the subject, who must decide which of the six categories it depicts (forced choice paradigm). The picture material was taken from the JACFEE (Japanese and Caucasian Facial Expressions of Emotions) dataset (Matsumoto & Ekman, 1988) and was presented in a randomized order to prevent sequential effects.
All participants read the instructions given on the screen and clicked “Okay” to proceed. The first six rounds were part of a trial run, giving participants the chance to ask questions about the test and the forced choice screen. Afterwards, the experimenter left the room and the participant started the test. A brief sound was played when each fixation cross (see Figure 2) was displayed to increase the focus of the viewer on the screen. After a short pause, during which a blank screen was displayed for 250 ms, one of the stimuli was presented for 1000 ms. The user was then given 10 seconds to assign the picture to a category. Then, the next fixation cross with sound was presented and so forth.
Figure 2. FEEL test: Presentation order (top) and an enlarged view of the decision screen (bottom).
Each FEEL session took about 15 - 20 minutes, and every subject was shown his individual accuracy rates for each emotion and his overall accuracy rate on the screen (96 stimuli = 96 points = 100%). The FEEL results were exported as .xls files and prepared for statistical analysis using SPSS v.23, see Section 2.3.
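The scoring rule described above (one point per correctly labeled stimulus, expressed as a percentage) can be illustrated with a minimal Python sketch. This is not the FEEL software itself. The function name `score_feel`, the trial representation as (shown, chosen) pairs, and the treatment of a missing response as an error are assumptions made for illustration only.

```python
from collections import defaultdict


def score_feel(trials):
    """Return (per-emotion accuracy in %, overall accuracy in %).

    `trials` is a list of (shown_emotion, chosen_emotion) pairs.
    Assumption: a missing response (None) is counted as an error.
    """
    hits = defaultdict(int)   # correct answers per shown emotion
    shown = defaultdict(int)  # number of stimuli per shown emotion
    for target, response in trials:
        shown[target] += 1
        if response == target:
            hits[target] += 1
    per_emotion = {e: 100.0 * hits[e] / shown[e] for e in shown}
    overall = 100.0 * sum(hits.values()) / len(trials)
    return per_emotion, overall
```

With 96 stimuli, each correct answer contributes 1/96 of the overall score, which matches the "96 stimuli = 96 points = 100%" convention stated in the text.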
2.3. Statistical Analysis
We used various statistical methods to analyze our data and test our hypotheses; all calculations were performed using IBM SPSS Statistics v. 23 (IBM, Inc., Armonk, NY, USA). The effect sizes in each case were determined using G*Power software (Faul et al., 2007).
First, we used a general linear model because of the structure of our data. The within-subject factor was emotion (anger, surprise, sadness, happiness, disgust, fear), the predictor was intensity and the covariate was testosterone. For the Spearman’s correlation, we used the means (SD) of the emotion accuracy scores as well as the testosterone concentration.
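For readers unfamiliar with rank correlation, the following sketch shows how Spearman’s rho can be computed in plain Python: both variables are converted to ranks (ties receive the mean of their ranks) and a Pearson correlation is computed on the ranks. It is an illustration only and not the SPSS procedure used in this study. The function names `average_ranks` and `spearman_rho` are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A negative rho would indicate that subjects with higher testosterone concentrations tend to have lower accuracy ranks.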
Finally, we conducted two Mann-Whitney U tests: one to look for significant differences between the two subgroups (median split) in age, emotion regulation and alexithymia scales, and one to look for the influence of testosterone on emotion recognition ability.
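The median split and the group comparison can be sketched as follows. This is a simplified illustration, not the SPSS implementation: the normal approximation below uses neither a tie correction nor a continuity correction, so its z values may differ slightly from those reported by statistical packages. Assigning subjects whose value equals the median to the low group is an assumption, and all function names are hypothetical.

```python
from math import sqrt
from statistics import median


def median_split(testosterone, scores):
    """Split `scores` into (low, high) groups by the median of `testosterone`.

    Assumption: values equal to the median are assigned to the low group.
    """
    cut = median(testosterone)
    low = [s for t, s in zip(testosterone, scores) if t <= cut]
    high = [s for t, s in zip(testosterone, scores) if t > cut]
    return low, high


def mann_whitney_u(a, b):
    """U statistic for sample `a`: pairs with a_i > b_j count 1, ties 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def z_approx(u, n1, n2):
    """Normal approximation of U (no tie or continuity correction)."""
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean_u) / sd_u
```

With an odd number of subjects and distinct testosterone values, this split yields a low group that is one subject larger than the high group.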
Because we performed multiple tests (testosterone and emotion intensity as the independent variables), we used Bonferroni corrections to adjust the significance level for the purposes of the specific analysis.
In the case of the correlational analysis, we adjusted the significance level for 7 × 2 = 14 tests to p = 0.003, which remained the same for the median split analysis. Where p-values did not reach the Bonferroni-corrected significance level, we decided to still report the results with the corresponding effect sizes and the statistical power achieved (Nakagawa, 2004).
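The Bonferroni adjustment divides the family-wise significance level by the number of tests. The sketch below assumes an uncorrected level of 0.05, which the text does not state explicitly; the function and constant names are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at `alpha`."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05     # assumption: conventional uncorrected significance level
N_TESTS = 7 * 2  # number of tests as stated in the text
ALPHA_CORRECTED = bonferroni_alpha(ALPHA, N_TESTS)  # 0.05 / 14, about 0.0036


def survives_correction(p_value, alpha_corrected=ALPHA_CORRECTED):
    """True if an uncorrected p-value is significant after correction."""
    return p_value <= alpha_corrected
```

Under this assumption the corrected level is 0.05 / 14 ≈ 0.0036, given as 0.003 in the text. An uncorrected p-value of 0.013, for example, does not survive the correction.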
Although all subjects followed the instructions they were given, the salivary sample of one subject was contaminated and no valid testosterone analysis could be conducted. Accordingly, this person was excluded from further analyses. All results presented here were derived from a sample of n = 39 subjects. Height and weight data were used to calculate the body mass index, see Table 1. (Unfortunately, the data of two persons are missing.)
As we aimed for a high overall variance of testosterone concentrations, we conducted our experiments both in the morning and in the afternoon. A comparison of the two median split groups using a Mann-Whitney U test revealed no significant differences in age, personality scales, emotion regulation or alexithymia scales.
The 4 subjects who were found to have alexithymia were not excluded from the final analysis because the TAS scale mean of 45.05 (SD = 9.76) was comparable to those of other studies (Nyklíček & Vingerhoets, 2000). A second analysis showed that their exclusion from the sample did not change the results.
3.1. Emotion Recognition
All participants achieved an overall (for all emotions) accuracy rate of 77.03%. Accuracy rates differed for each emotion and among intensities. As expected, the accuracy rates for the 50% intensity pictures were significantly lower than those for the full-blown expressions (with the exception of disgust), see Figure 3. We calculated a general linear model with respect to the structure of our data. The analyses revealed that the intensity had a substantial influence on both the recognition rate (Wald χ2(1, N = 39) = 39.011, p < 0.001) and the perceived intensity of the stimulus material (Wald χ2(3, N = 39) = 26.224, p < 0.001). Post-hoc analyses revealed significant decreases for all emotions besides disgust, see Figure 3. The means calculated are comparable to those of other studies using the FEEL Test with a healthy sample and utilizing different emotional intensities (Hoffmann et al., 2010).
3.2. Testosterone and Emotion Recognition
The mean testosterone concentration of all participants was 194.2 pg/ml, which is comparable to values from other studies measuring free testosterone in salivary samples (Smith et al., 2013). We used the same general linear model described above to look for an influence of testosterone on the recognition rates at different intensities. The model revealed no significant influence of testosterone on emotion recognition accuracy (Wald χ2(1, N = 39) = 0.373, p = 0.541).
In order to look further for a potential influence of testosterone on the accuracy rate, we pursued two additional analyses: first, a correlation was calculated between testosterone and emotion recognition, and secondly, a Mann-Whitney U test was used to obtain more information using a median split sample (low and high testosterone groups; low median group: n = 20, mean concentration = 135.71 pg/ml; high median group: n = 19, mean concentration = 255.84 pg/ml), see below.
Figure 3. Emotion recognition rates (mean and SD) for 50% and 100% intensity. *p ≤ 0.05.
The Spearman’s rho correlation between the testosterone concentration and all emotions displayed at intensities of 50% and 100% revealed one negative correlation: with increasing testosterone levels, the emotion recognition rate for disgust displayed at 50% intensity decreased (r = −0.395; p = 0.013; d = 0.6), see Figure 4. This correlation, however, missed the corrected significance level. Additionally, we found no significant correlation between testosterone and any other emotion or the total score for all emotions.
To substantiate the correlation outlined above, we compared two groups from our sample by means of the Mann-Whitney U test using a median split, which gave rise to one subsample of subjects with low testosterone levels and one of subjects with high testosterone levels (see Table 1). Our results revealed few differences between the two groups and also substantiated the correlational association (see Figure 5(b) for 50% and Figure 5(a) for 100% intensity).
Sadness shown at full intensity (Z = −2.11; p = 0.05; Cohen’s d = 0.55) was recognized more poorly by males with high testosterone concentrations as a statistical trend, and the overall recognition rate was also lower in this group (Z = −2.16; p = 0.03; Cohen’s d = 0.35). In addition, we found higher recognition rates for fear (Z = −1.84; p = 0.07; Cohen’s d = 0.61) and disgust (Z = −2.21; p = 0.028; Cohen’s d = 0.76) at subtle intensities in males with lower testosterone levels. Due to the Bonferroni correction, none of the results presented above can be considered significant. However, we report them as statistical trends because the effect sizes are medium.
In conclusion, our results show that with increasing testosterone levels, men tend to show lower emotion recognition rates for full-blown emotions in general and especially for sadness. Furthermore, men with high testosterone concentration show an impaired ability to recognize disgust, albeit on a subtle level.
Figure 4. Spearman correlation between disgust 50% and salivary testosterone.
Figure 5. Emotion recognition rates (mean and SD) for the median split sample with Tlow = low testosterone and Thigh = high testosterone for (a) 100% intensity and (b) 50% intensity. ++p ≤ 0.05; +p ≤ 0.1.
Emotion recognition plays a crucial part in all our communications as it enables us to react adequately to our conversation partners and their emotions. Humans use facial expressions in particular to transmit the so-called basic emotions defined by Ekman (Ekman & Cordaro, 2011). Theoretically, they are presented facially and recognized correctly within practically every culture on this planet. At this point, we would like to emphasize the term “practically” because there are some findings in the literature indicating that specific variables may have an impact on emotion recognition ability (e.g., gender and sex hormones).
The aim of our study was to investigate the influence of the sex hormone testosterone in young males on emotion recognition ability in general, for each basic emotion and also for emotional facial expressions presented at different intensities (50% vs. 100%). We decided to compare only these two intensities, as 100% is usually used in emotion recognition studies and 50% should enable us to find differences on a subtle level of intensity. Although it would have been possible to present 60% - 90% intensity material as well (as used in our former study (Hoffmann et al.)), we rejected this option because there was a high risk of a learning effect with our design. For example, a participant first identifies 100% anger as displayed by one actor and afterwards (randomly selected) sees the same actor displaying 60% anger. This could lead to improved recognition, not because of the intensity but because the participant remembers the face and the emotion presented at 100%. There were two possible solutions to this problem: a) expand the range of the picture material to include different intensities presented by different actors or b) present only subsets of pictures to a large sample group, as was done in (Hoffmann et al., 2010). Due to financial restrictions and our specific interest in hormonal measurements, we decided to use only 50% and 100% with the current design.
Our results only partially confirmed our hypothesis that emotion recognition ability decreases with increasing testosterone levels: in particular, the negative correlation we had hypothesized only held true for disgust presented at an intensity of 50%. Although our sample size was only n = 39, the correlation coefficient of r = −0.39 corresponds to a mid-sized effect of d = 0.61, calculated according to Sedlmeier & Renkewitz (Sedlmeier & Renkewitz, 2013). Additional median split comparisons revealed further statistical trends: the total recognition rate for emotions shown at 100% intensity, the recognition of sadness shown at 100%, and the recognition of disgust and fear shown at 50% intensity were lower in males with high testosterone levels, but not at a Bonferroni-corrected level. The effect sizes, however, indicate that testosterone has an observable negative influence on the recognition of specific emotions. We are aware that median splits are sometimes criticized (Irwin & McClelland, 2003), but a number of researchers legitimize their use as long as the independent variables do not correlate (Iacobucci et al., 2015). Because our independent variables did not correlate, we conducted the median split and analyzed our data with respect to this dichotomization and with respect to Bonferroni corrections.
Although our results substantiate the impression given in the literature, namely that testosterone impairs emotion recognition, they do not confirm the thesis that it makes humans or males less empathic and more antisocial. This impression arises when reading the literature about testosterone and behavior. For example, females perform more poorly at recognizing angry facial expressions after an exogenous testosterone administration (van Honk & Schutter, 2007). Van Honk and Schutter report a reduced sensitivity to threat signals shown in faces (fear, disgust and anger). However, their study only revealed significant results for anger and not for fear or disgust, which had also been defined as threat signals and for which effects had been expected. Interestingly, the latter emotions are the ones for which we found differences in the median comparisons in this study.
How can this be so? It is not yet possible to answer this question adequately, because there is one main difficulty when it comes to comparing the two studies: the van Honk and Schutter study was conducted with females and used a single administration of exogenous testosterone. Our results are not directly comparable with theirs because we used the endogenous testosterone levels of young males instead. Derntl and colleagues (Derntl et al., 2009) report a positive correlation between amygdala reactivity and testosterone during the processing of fearful and angry faces. However, the design of their study (two-alternative responses) also limits a comparison with ours.
There is, however, one other explanation for these results, following the interpretation of Mazur and colleagues: testosterone makes humans more dominant (Mazur & Booth, 1998), which is important when attempting to secure one’s social status. We found a decrease in the emotion detection accuracy rates for sadness and subtle disgust in males with high testosterone concentrations. Both emotions are crucial for specific nuances of communication, especially when one conversation partner wants to be perceived as dominant. For example, showing disgust to a communication partner can indicate disrespect for the words spoken or the person speaking them. This nonverbal signal would not be perceived by the dominant person because his sensitivity to this emotion is decreased. This in turn leads to a feeling of dominance and perhaps also to dominant behavior. The same would apply to the emotion of contempt. Although the two emotions (disgust and contempt) are sometimes confused, according to Ekman they are specific and distinguishable (Ekman, 2003). As we did not measure the emotion category of contempt in our study, it would be interesting to take it into account in future work, especially as contempt is based on a feeling of being superior to one’s communication partner and is therefore important for the feeling of dominance and social status. The same is true of sadness. Since this emotion is a non-threat signal according to van Honk and Schutter, it might not be necessary for status-seeking males with high testosterone levels to recognize it. A reciprocal association between sadness, tears and testosterone was shown in a 2011 study published in Science (Gelstein et al., 2011), which reported a measurable decrease in testosterone in males after sniffing female tears. However, this decrease was interpreted as being induced by chemosignals.
To the best of our knowledge, no other study has reported a decreased emotion recognition rate for sadness connected with testosterone. It would therefore be interesting for future studies to try to replicate this finding.
Finally, we discovered a trend indicating a lower recognition rate for subtle fear in males with high testosterone levels. In contrast to disgust and sadness, fear is an emotion that provokes helping behaviors in the observer. This emotion was also defined as a threat stimulus by van Honk and Schutter (van Honk & Schutter, 2007). They also report a declining sensitivity to these signals with increasing testosterone levels, which can be interpreted as indicating that recognition of such signals is not necessary for males wishing to retain their status.
However, at this point we think it is important not to draw conclusions that are too general. Our results can be interpreted as indicating that the decreased emotion recognition ability is influenced by testosterone, but there must be additional moderating variables (e.g., psychometric variables or other hormones). Therefore, we would like to encourage future studies to include additional hormonal measurements to reveal the effect of hormones on the human processing of emotions as they play this important role in our everyday lives.
There are two additional points we would like to address here. First, we are aware of the advantages and disadvantages of the salivary testosterone measurement method we used for this study. Although we measured testosterone levels three times and believe we have a valid mean value for the testosterone concentration for the duration of the tests, we do not know the long-term testosterone concentration of our sample. As we believe that the long-term concentration has a stronger effect on emotional skills, we propose that this effect be analyzed over a longer period. As a further comment concerning the measurement method, one could argue that due to a high anticipatory baseline, testosterone levels decrease during an experiment. Nonetheless, we rate our measurement as valid because we obtained relatively stable values. Therefore, we consider this an adequate way of measuring the endogenous testosterone level.
Secondly, it is important to note that in our study, there was no “emotion not identified” option in the FEEL test. This would have given us the opportunity to differentiate better between the recognition accuracies for all emotions.
In conclusion, the results of our study indicate a decreasing ability in emotion recognition with increasing endogenous testosterone level in young healthy males. This was especially true for disgust presented with an intensity of 50% and sadness presented with an intensity of 100%. Further studies should consider this impact of testosterone on emotion recognition and go more into detail to get an idea of how endocrinological parameters can modulate behavior and empathy.
This research was part of the SFB/Transregio 62-Companion Technology for Cognitive Systems project funded by the German Research Foundation and by the Ulm University “Baustein-Programm” scholarship awarded by the Medical Faculty to Dr. Stefanie Rukavina. We would like to thank Prof. Derntl for her helpful comments at the beginning of the experiment and all the subjects who participated for making this study possible.
#Both of these authors contributed equally to this work.