Over the past decades, the concept of subjective well-being (SWB) has been extensively studied (e.g. Diener et al., 1999; Keyes, 2006; Ryan & Deci, 2001), and, given its significance, considerable effort has been devoted to developing appropriate tools for its measurement (e.g. Bai et al., 2011; Diener, 1984; Diener et al., 1999; Silva & Caetano, 2013; Keyes & Magyar-Moe, 2003; Norrish & Vella-Brodrick, 2008; Veenhoven, 2013).
Affect describes the emotional experience that along with life satisfaction can decisively influence subjective well-being (Diener, Scollon, & Lucas, 2009) . According to Watson and Tellegen (1985) , affect comprises two highly distinctive dimensions, Positive Affect (PA) and Negative Affect (NA). PA refers to individuals’ experiences in positive mood states such as excitement, enthusiasm, cheer, alertness, vigilance, activeness, curiosity, and trust, while NA reflects aversive emotional states such as fear, anger, sorrow, depression, guilt, abasement, hatred, disgust, contempt, nervosity, underestimation, and scorn (Russell, 1980; Watson, Clark, & Tellegen, 1988; Diener, Scollon, & Lucas, 2009; Watson, 2002) .
Positive and negative affect can either describe a state, meaning that they may reflect relatively temporary mood fluctuations, or a trait that describes a permanent, stable and rigid personal characteristic, referred to as positive and negative affectivity or trait PA and trait NA respectively (Watson & Pennebaker, 1989) . Studies have shown that trait PA is linked to extroversion and therefore individuals with high positive affectivity are found to be happier and more energetic, whereas those with low levels of positive affectivity are more vulnerable to depression. On the other hand, trait NA is associated with anxiety and neuroticism and thus individuals who demonstrate high negative affectivity are more likely to experience high levels of distress, while those who report low negative affectivity tend to be more content (Tellegen, 1985; Watson & Pennebaker, 1989; Watson & Clark, 1984; Costa & McCrae, 1980; Clark & Watson, 1991; Clark, Steer, & Beck, 1994; Jolly et al., 1994) .
Given the importance of well-being in people’s lives as well as the two-factor structure of affect, an abundance of scales has been developed to measure pleasant and unpleasant emotions. The Positive and Negative Affect Schedule (PANAS; Watson, Clark, & Tellegen, 1988) has been adopted as the predominant instrument for assessing positive and negative affect, mainly because of its superior psychometric properties (Thompson, 2007; Gray & Watson, 2007; Diener et al., 2010). The PANAS consists of two scales, one measuring PA and one measuring NA. Although it is the most widely used measure of well-being, and despite its favorable validity and reliability, the PANAS is not without limitations. Among the most common criticisms of the instrument is that it includes several items that are not considered to be feelings, especially within the PA scale (e.g. active, alert, strong, determined), and some others that are infrequent (e.g. inspired), while several core emotions (e.g. happy, sad) are omitted from the general scales (Diener et al., 2010). The scale has also been criticized for disregarding differences in the desirability of feelings across cultures and contexts (Schimmack, Diener, & Oishi, 2002; Oishi, Schimmack, & Colcombe, 2003; Diener et al., 2010; Tsai, Knutson, & Fung, 2006), as well as for overrepresenting some feelings (e.g. anxiety) by using several similar adjectives to describe them (e.g. jittery, nervous, scared and afraid all represent anxiety), meaning that the scale is weighted towards a certain type of feeling (Thompson, 2007; Crawford & Henry, 2004).
In order to overcome the drawbacks of the PANAS and the other existing scales, Diener and his colleagues (2010) developed the Scale of Positive and Negative Experience (SPANE). SPANE measures a number of both pleasant and unpleasant experiences related to feelings of well-being and ill-being by asking individuals to recall their activities and experiences during the past four weeks and to report how often they experienced the corresponding feelings. Although the initial psychometric analysis (Diener et al., 2010) as well as subsequent evaluations across several different cultures (Silva & Caetano, 2013; Sumi, 2014) did not establish whether the SPANE was advantageous relative to the PANAS, the study conducted by Jovanović (2015) confirmed its incremental validity over the latter.
The authors of SPANE (Diener et al., 2010) suggest that the scale has various advantages in comparison to the existing scales. First, it gauges a full set of both general and specific emotions felt by the respondents, as it includes a wide range of emotional states using adjectives of high significance. In addition, SPANE does not ask respondents to rate each feeling in terms of intensity, as the PANAS does; instead, it asks them to estimate the amount of time they spent in a particular state in the previous four weeks. This approach is consistent with the view that overall perceptions of subjective well-being are based more on the frequency of an experience than on its intensity (Diener, Sandvik, & Pavot, 1991; Diener, Colvin, Pavot, & Allman, 1991). Moreover, the scale captures feelings that were ignored by most previous instruments and reflects all levels of arousal, so it can be applied, and is expected to perform well, across different cultures. Finally, the four-week period adopted by SPANE strikes a balance between memory efficiency and experience sampling.
Generally, two leading approaches have emerged in the literature on well-being: one that deals with subjective happiness and focuses on attaining pleasure and avoiding pain (hedonic well-being; e.g. Ryan & Deci, 2001; Maltby, Day, & Barber, 2005), and one that deals with human potential, self-realization and meaning in life (eudaimonic well-being; Ryan & Deci, 2001; Ryff, 1989; Ryff & Keyes, 1995). The two constructs are considered to be related but distinct. For this reason, while hedonic well-being scales elicit participants’ evaluations based on feelings, eudaimonic well-being scales assess individuals’ functioning based on cognitive processes rather than emotional experiences (Kashdan, Biswas-Diener, & King, 2008). SPANE was developed to gauge hedonic well-being, as opposed to, e.g., the Flourishing Scale (FS; Diener et al., 2010), which measures eudaimonic well-being. It is a brief and easily comprehensible scale that can assess a wide range of emotions using a limited number of items. Specifically, it comprises only twelve items in total, six devoted to positive feelings (SPANE-P subscale) and six related to negative feelings (SPANE-N subscale), while it also assesses the balance between the two (SPANE-B) by subtracting SPANE-N scores from SPANE-P scores. For both positive and negative feelings, three items are general (e.g. positive, negative), and three are more specific (e.g. happy, sad).
The scale showed good psychometric properties in the original scale validation study as it reported internal consistency reliability coefficients (Cronbach’s alpha) of .87 for SPANE-P, .81 for SPANE-N, and .89 for SPANE-B, while test-retest reliability coefficients over one month were shown to be .62, .63 and .68 for SPANE-P, SPANE-N and SPANE-B respectively. A single factor structure is supported for both SPANE-P and SPANE-N based on the results from a principal axis factor analysis. The factor loadings of all the items of both subscales ranged from .58 to .81 and from .49 to .78 for SPANE-P and SPANE-N respectively. Also, the two subscales correlated significantly and negatively with each other (r = −.60). The scale performed well in terms of convergent validity with other measures of emotion, happiness, and life satisfaction such as the Satisfaction with Life Scale, and the construct validity of SPANE-P, SPANE-N and SPANE-B was good, with moderate to extremely high correlations with scores of several other well-being measures. Finally, the scale has been translated into more than 9 languages and has been investigated for its psychometric properties across several cultures including Portugal (Silva & Caetano, 2013) , China (Li, Bai, & Wang, 2013; Tong & Wang, 2017) , Japan (Sumi, 2014) , Turkey (Telef, 2015) , Italy (Giuntoli et al., 2017) , Germany (Rahm, Heise, & Schuldt, 2017) , India (Singh, Junnarkar, & Jaswal, 2017) , and Serbia (Jovanović, 2015) .
In this study, during the evaluation of the original SPANE, a second shorter version of SPANE (SPANE-8) emerged, containing 8 items (4 in SPANE-P and 4 in SPANE-N). SPANE-8 is a revised structure containing one general feeling per dimension instead of three in the original SPANE (Diener et al., 2010: p. 145) . We removed the 2 out of 3 general feelings with the weakest factor loadings per factor. The logic behind this removal is that a general feeling is adequately measured with one item for the positive and one for the negative feelings.
The purpose of this study is to validate the SPANE in the Greek context and explore its psychometric properties in a non-clinical Greek population. More specifically, the objectives of this study are the following: 1) to establish the construct validity of the Scale of Positive and Negative Experience, Greek Version, using both exploratory and confirmatory factor analysis techniques such as Bifactor EFA, Bifactor CFA, ESEM and Bifactor ESEM; 2) to examine measurement invariance of SPANE across gender; 3) to study the internal consistency reliability, construct reliability (Hoque et al., 2017) and AVE-based convergent validity (Average Variance Extracted; Fornell & Larcker, 1981) of the SPANE; 4) to evaluate the convergent and discriminant validity of the SPANE with other constructs; 5) to provide normative data; 6) to evaluate a subjective well-being model (Linley et al., 2009) using SPANE to measure affect. All the above analyses were performed twice, once for the original SPANE (SPANE-12) and once for the new, shorter version (SPANE-8).
The sample is part of a study about quality of life in relation to well-being. The sample comprised 2272 Greek adults from the general population. Males made up 37% of the sample and females 63%, with a mean age of M = 35.54 years (SD = 12.35). Most participants were single (51%) or married (41%), followed by divorced (8%). The majority of the participants (59%) did not have children; the rest had 1 child (14%), 2 children (22%), or more (5%). Most participants had a Bachelor’s degree (42%), followed by those who had finished high school (24%), had a postgraduate degree (19%), were university students (14%), or had received primary education (1%).
2.2.1. Scale of Positive and Negative Experience (SPANE-12)
This 12-item scale is a subjective well-being measure by Diener et al. (2009, 2010). It was translated into Greek by Stalikas, Kyriazos and Petratou (2017) with the translation/back-translation method (Brislin, 1970). It has 2 opposite dimensions: positive experiences (using 6 one-word items, e.g., “Positive”, “Happy”) and negative experiences (using 6 one-word items, e.g., “Negative”, “Sad”). Three positive feelings and 3 negative feelings are general, and the rest are specific (Diener et al., 2010: p. 145). Items are scored on a 5-point Likert scale ranging from 1 (very rarely or never) to 5 (very often or always). Experiences are evaluated over a 4-week time frame. The positive score (SPANE-P) and the negative score (SPANE-N) can each range from 6 to 30. Their difference (Affect Balance or SPANE-B) can vary from −24 to 24.
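The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' scoring procedure: the function name `score_spane` and the example respondent are hypothetical, and the item positions are assumed to follow the general/specific item numbering given for SPANE-12 in the CFA section of this paper (positive items 1, 3, 5, 7, 10, 12; negative items 2, 4, 6, 8, 9, 11).

```python
# Assumed item positions (taken from the item grouping reported for SPANE-12):
# positive items 1, 3, 5, 7, 10, 12 and negative items 2, 4, 6, 8, 9, 11.
POSITIVE_ITEMS = (1, 3, 5, 7, 10, 12)
NEGATIVE_ITEMS = (2, 4, 6, 8, 9, 11)


def score_spane(responses):
    """Return (SPANE-P, SPANE-N, SPANE-B) for one respondent.

    `responses` maps item number (1-12) to an integer rating from 1 to 5.
    """
    if set(responses) != set(range(1, 13)):
        raise ValueError("expected ratings for items 1-12")
    if any(rating not in (1, 2, 3, 4, 5) for rating in responses.values()):
        raise ValueError("ratings must be integers from 1 to 5")
    spane_p = sum(responses[i] for i in POSITIVE_ITEMS)  # range 6 to 30
    spane_n = sum(responses[i] for i in NEGATIVE_ITEMS)  # range 6 to 30
    return spane_p, spane_n, spane_p - spane_n           # balance: -24 to 24


# Hypothetical respondent: every positive item rated 4, every negative item 2.
example = {i: (4 if i in POSITIVE_ITEMS else 2) for i in range(1, 13)}
print(score_spane(example))  # (24, 12, 12)
```

Rating every positive item 5 and every negative item 1 gives the maximum balance score of 24, which matches the ranges stated above.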
2.2.2. Scale of Positive and Negative Experience 8 (SPANE-8)
In addition to the original version (SPANE-12), this study also proposed a second version (SPANE-8) containing 8 items (4 in SPANE-P and 4 in SPANE-N). SPANE-8 is a revised structure containing one general feeling per dimension (the item pleasant for positive experiences and the item bad for negative ones) instead of 3 in the original SPANE (Diener et al., 2010: p. 145). Thus, 2 general feelings per factor were excluded as overlapping content, leaving a total of 8 items in the scale instead of 12. Among the general positive and negative feeling items, those with the lowest factor loadings during CFA were excluded. This resulted in a briefer and more parsimonious structure with 4 positive items (Pleasant, Happy, Joyful, Contented) and 4 negative items (Bad, Sad, Afraid, Angry).
2.2.3. Flourishing Scale (FS)
The FS (Diener et al., 2009, 2010) is an 8-item unidimensional measure of general aspects of positive human functioning (e.g., “People respect me”). Items are answered on a 7-point Likert scale from strong disagreement (1) to strong agreement (7). All items are positively phrased. Scores range from 8 (minimum flourishing) to 56 (maximum flourishing). Diener et al. (2010) reported an internal consistency reliability of α = .87. Internal consistency reliability in this study was α = .81.
2.2.4. Warwick-Edinburgh Mental Well-Being Scale (WEMWBS)
The WEMWBS was developed by the Universities of Warwick and Edinburgh (Tennant et al., 2007) . It is a 14-item unidimensional, self-report scale tapping on positive aspects of mental well-being and psychological functioning (e.g., “I’ve been feeling cheerful”). All items are positively phrased. They are rated on a 5-point Likert scale from “None of the time” to “All of the time” indicating frequency of positivity in mental state. Responses are summed to a score ranging from 14 to 70. WEMWBS has been reported to have adequate internal consistency reliability in student samples and general population samples, with alpha values of .89 and .91 respectively (Tennant et al., 2007) . Internal consistency reliability in this study was α = .91.
2.2.5. Brief Resilience Scale (BRS)
BRS (Smith, Dalen, Wiggins, Tooley, Christopher, & Bernard, 2008) is a brief measure of resilience. It contains 6 items tapping on the ability to bounce back from stress and difficulties (e.g., “I tend to bounce back quickly after hard times”). Items are rated on a 5-point Likert scale ranging from Strongly Disagree (1) to Strongly Agree (5). Smith et al. (2008) reported an adequate reliability, with Cronbach’s alpha ranging from .80 - .91. Internal reliability in this study was α = .80.
2.2.6. Mental Health Continuum-Short Form (MHC-SF)
Mental Health Continuum-Short Form (Keyes et al., 2008) is a self-report 14-item questionnaire measuring the three aspects of well-being proposed by Keyes (2002): emotional (EWB), social (SWB) and psychological (PWB), e.g., “How often did you feel interested in life?”. Items are rated on a 6-point Likert scale indicating the frequency of experiences (never, once or twice a month, about once a week, two or three times a week, almost every day, every day) in the past month. Internal consistency reliability (Cronbach’s alpha) for the total MHC-SF scale was reported by Keyes (2005) to be adequate (>.80). Internal reliability for the total scale in this study was α = .90.
2.2.7. The Gratitude Questionnaire (GQ-6)
The GQ-6 (McCullough, Emmons, & Tsang, 2002) is a 6-item self-report questionnaire evaluating proneness to experience gratitude in everyday life (e.g., “I have so much in life to be thankful for”). Respondents answer each item on a 7-point Likert scale (from 1 = strongly disagree to 7 = strongly agree). GQ-6 has a single factor structure. Scores range from 6 (least grateful) to 42 (most grateful) after reversing items 3 and 6. The internal consistency reliability of the scale was reported to be α = .82 (McCullough, Emmons, & Tsang, 2002). Internal reliability in this study was α = .68.
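As an illustration of the reverse-keying step mentioned above, the following minimal sketch sums six ratings on the 1 to 7 scale after reversing items 3 and 6 (a reversed rating on this scale is 8 minus the original rating). The function name `score_gq6` and the example ratings are hypothetical and are not taken from the GQ-6 manual.

```python
def score_gq6(ratings, reversed_items=(3, 6), scale_min=1, scale_max=7):
    """Sum six ratings after reverse-keying the listed item positions.

    `ratings` is a sequence of six integers in item order (item 1 first).
    """
    if len(ratings) != 6:
        raise ValueError("expected six ratings")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if not scale_min <= rating <= scale_max:
            raise ValueError("rating outside the response scale")
        if item_number in reversed_items:
            # On a 1-7 scale this maps 1 -> 7, 2 -> 6, ..., 7 -> 1.
            rating = scale_min + scale_max - rating
        total += rating
    return total


# Hypothetical respondent with the most grateful possible answer pattern.
print(score_gq6([7, 7, 1, 7, 7, 1]))  # 42
```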
2.2.8. Meaning in Life Questionnaire (MLQ)
The MLQ (Steger et al., 2006) measures presence of and search for meaning in life, with a total of 10 items rated on a 7-point Likert scale (from “Absolutely True” to “Absolutely Untrue”). An example item is “I understand my life’s meaning”. Internal reliability in this study was α = .78.
2.2.9. Satisfaction with Life Scale (SWLS)
The Satisfaction with Life Scale (Diener, Emmons, Larsen, & Griffin, 1985) is a scale of perceived life satisfaction (e.g., “So far I have gotten the important things I want in life”). Five items are rated on a 7-point Likert scale from 1 (Strongly Disagree) to 7 (Strongly Agree). Internal consistency reliability (Cronbach’s alpha) has been reported to range from .79 to .89 (Pavot & Diener, 1993). Internal reliability in this study was α = .88.
2.2.10. Trait Hope Scale (HS)
Trait Hope Scale (Snyder et al., 1991) is an assessment tool of trait hope (e.g., “I meet the goals that I set for myself”). HS has two factors: Agency and Pathways. Responses are rated from 1 (Definitely False) to 8 (Definitely True) with a score range from 8 to 64. Snyder et al. (1991) reported that Cronbach’s alphas for the total scale varied from .74 to .84. Internal reliability in this study was α = .89.
2.2.11. World Health Organization Quality of Life-Brief Scale (WHOQOL-BREF)
WHO Quality of Life-Brief scale (WHOQOL Group, 1998a, 1998b) is a self-report assessment tool measuring aspects of perceived quality of life. It is the short version of the WHOQOL-100 (c.f. Skevington, 1999). It contains 26 items (e.g., “How would you rate your quality of life?”) reflecting all 24 facets of life quality that WHOQOL-100 covers. Answers are rated on four different types of 5-point Likert scale indicating intensity, capacity, frequency, or judgement (Skevington et al., 2004). The minimum possible rating on every Likert scale is 1 (minimum perceived QOL) and the maximum is 5 (maximum perceived QOL). The instrument is divided into four domains (Physical health, Psychological health, Social Relations and Environment) with satisfactory Cronbach’s alphas of .82, .81, .68, and .80 respectively (Skevington, Lotfy, & O’Connell, 2004). Internal reliability in this study was α = .91.
2.2.12. Depression Anxiety Stress Scale, Short Form (DASS-9)
DASS-9 ( Yusoff, 2013 and in Greek by Kyriazos, Stalikas, Prassa, Yotsidi, in press ) is a short form of DASS-21 (Lovibond & Lovibond, 1995) . This version is a post hoc measure, empirically derived by Yusoff (2013) as a faster screening tool. DASS-9 measures emotional distress in three 3-item dimensions: 1) depression (e.g., “I felt that I had nothing to look forward to”); 2) anxiety and 3) stress. All 9 items are rated on a 4-point Likert scale evaluating both intensity and frequency of symptoms over the last week (from 0 = did not apply to me at all to 3 = applied to me very much, or most of the time). The higher the score the more intense/frequent the emotions of distress. Each factor has a discrete score varying from 0 to 9. Cronbach’s alpha for the total DASS-9 was reported by Yusoff (2013) equal to .72 whereas for Depression, Anxiety and Stress factors, alphas were .52, .57, and .55 respectively. In this study, internal reliability for Depression, Anxiety, Stress and Total was .79, .77, .73 and .89 correspondingly (c.f. Kyriazos, Stalikas, Prassa, Yotsidi, in press ).
Data were collected with the assistance of 150 psychology students from 2 different psychology courses (Psychometrics and Research Methods in Psychology). Students forwarded a link to an electronic test battery (in Google Forms© format) to 15 - 20 adults from their social environment. Students were extra-credited for their participation in the study. All the fields in the form of the digital battery were required to eliminate missing values. A letter was included in the test battery to inform participants about the purpose of this study. The process of data collection was the following. First, students received a short, free workshop on the administration of digital psychology questionnaires by the research team. Then, a period of pilot-testing the digital test battery followed to track any flaws in the digital procedure and to record the time required to complete the battery (approximately 15 minutes). Finally, after successful pilot testing, students were provided with a link to the official study.
2.4. Research Design
The sample was split into three parts to study the construct validity of SPANE-12 and SPANE-8 in 3 different subsamples (see Table 1). The research was applied at two levels: 1) on three subsamples (EFA, CFA1 and CFA2) to evaluate construct validity and confirm it; 2) on the full sample (Total sample), to evaluate strict measurement invariance across gender. This is a construct validation procedure we called the “3-faced construct validation method” (see also Kyriazos et al., in press). More specifically, in this method the sample is randomly split into three parts (20%, 40%, 40%). In all three emerging subsamples (20% EFA, 40% CFA1, 40% CFA2) the threshold for the sample-to-variable ratio (N/p) is set to 5/1 for EFA (Osborne & Costello, 2004; Singh et al., 2016) and to 10/1 for CFA (DeVellis, 2017). After splitting the sample, the method consists of the following phases. In the first phase, an EFA is carried out in 20% of the sample to establish a structure. Then, in the second phase, a CFA (CFA1) follows in the second part of the sample (40%), evaluating multiple models. In this phase it is good practice to evaluate, at a minimum, single-factor, multifactor, Bifactor, and higher-order CFA models. Next, in the third phase, the optimal model from CFA1 is replicated in a different subsample of equal power (40%). CFA2 is designed to crosscheck the findings of CFA1. Finally, an MGCFA finalizes the validation procedure to establish measurement invariance across gender and/or age, using the optimal model as a baseline model. If either CFA2 or the measurement invariance analysis fails to revalidate the optimal CFA1 model, then the second-best model of CFA1 is crosschecked, and so on (see Table 1 for an overview of the method). Note that if the sample is not adequate, then only CFA is implemented (keeping the N/p ratio threshold > 10/1) and measurement invariance with the Multiple Indicators Multiple Causes method (MIMIC) follows. The above steps are a generally suggested process of construct validation (c.f. Kyriazos et al., in press).
Table 1. Overview of the “3-faced construct validation method”.
EFA = Exploratory Factor Analysis, ICM-CFA = Independent Cluster Model Confirmatory Factor Analysis. EFA Bifactor = Bifactor Exploratory Factor Analysis.
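The random 20%/40%/40% split at the start of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in this study (the analyses were run in SPSS, Stata and Mplus): the function name `split_three_ways`, the seed, and the rounding rule are all hypothetical, so the resulting subsample sizes need not match the ones reported in this paper exactly.

```python
import random


def split_three_ways(case_ids, proportions=(0.2, 0.4, 0.4), seed=2018):
    """Randomly partition cases into EFA, CFA1 and CFA2 subsamples.

    The first two proportions set the EFA and CFA1 sizes (rounded);
    CFA2 receives all remaining cases, so no case is lost or duplicated.
    """
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_efa = round(n * proportions[0])
    n_cfa1 = round(n * proportions[1])
    efa = shuffled[:n_efa]
    cfa1 = shuffled[n_efa:n_efa + n_cfa1]
    cfa2 = shuffled[n_efa + n_cfa1:]
    return efa, cfa1, cfa2


# Hypothetical case identifiers 1..2272 standing in for the full sample.
efa, cfa1, cfa2 = split_three_ways(range(1, 2273))
print(len(efa), len(cfa1), len(cfa2))
```

Assigning the remainder to the last subsample keeps the three parts disjoint and exhaustive, which matters because the CFA2 subsample is meant to be an independent replication of CFA1.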
Regarding the factor analysis methods used in this study, in the first subsample (EFA subsample, NEFA = 452), Exploratory Factor Analysis (EFA) and Bifactor Exploratory Factor Analysis (Bifactor EFA) were carried out. Independent Cluster Model Confirmatory Factor Analysis (ICM-CFA), Bifactor Confirmatory Factor Analysis (Bifactor CFA), Exploratory Structural Equation Modeling (ESEM), and Bifactor Exploratory Structural Equation Modeling (Bifactor ESEM) were applied in the second subsample (CFA1 subsample, NCFA1 = 910), testing alternative solutions for SPANE-12 and SPANE-8. The optimal model that emerged from the CFA1 subsample was cross-validated in a different subsample of equal power (CFA2). Then, a multi-group CFA (MGCFA) was carried out in the entire sample (N = 2272) using the CFA 2 optimal model as a baseline model, to test for strict measurement invariance across gender (see Table 1 for an overview of this method). A reliability analysis (α and ω) followed in the entire sample. AVE Convergent validity and Convergent/Discriminant validity based on correlation analysis were performed in the total sample using measures of mental distress, well-being, positivity and quality of life. Next, a Bifactor CFA Subjective Well-being Model was evaluated, using SPANE to measure affect. Finally, normative data were calculated over the entire sample. Data were collected electronically on Google Forms® and were analyzed with SPSS Version 25 (IBM, 2017) , Stata Version 14.2 (StataCorp, 2015) and MPlus Version 7.0 (Muthen & Muthen, 2012) .
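For readers unfamiliar with the α coefficient used in the reliability analysis, the standard formula is α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below implements this textbook formula only; it is not the SPSS/Stata routine used in this study, it does not cover ω, and the function name and toy ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    `item_scores` is a list with one entry per item; each entry is the
    list of ratings given to that item by the same respondents, in order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("every item needs one rating per respondent")
    # Total score of each respondent across all items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: three hypothetical items answered by five respondents.
toy_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 5],
    [1, 3, 3, 5, 5],
]
print(round(cronbach_alpha(toy_items), 3))
```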
3.1. Data Management
The full sample contained N = 2272 cases. All variables had no missing values because all the digital test-battery fields were required (see details in Procedure section).
To validate the factor structure of SPANE, the total sample (N = 2272) was randomly split into three parts (20%, 40% and 40%) to implement a construct validation procedure we call the “3-faced construct validation method” (see also Kyriazos et al., in press). Although splitting a sample is generally suggested for the development of new models (Schumacker & Lomax, 2015), the above analysis strategy was adopted because this sample was large enough to maintain adequate power after splitting. Specifically, with the p = 12 items of SPANE-12, the sample-to-variable ratio (N/p) was 37.67/1 for the EFA subsample (NEFA = 452) and 75.83/1 for each of the CFA1 (NCFA1 = 910) and CFA2 (NCFA2 = 910) subsamples. An acceptable sample-to-variable ratio can range anywhere from 10/1 (Osborne & Costello, 2004; Singh et al., 2016) to 20/1 (Schumacker & Lomax, 2015). Alternatively, a sample of 500 to 1000 cases is generally regarded as satisfactory to excellent (Comrey & Lee, 1992).
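The sample-to-variable ratios quoted above follow directly from dividing each subsample size by the number of items. A minimal check, assuming p = 12 (the SPANE-12 item count) and using a hypothetical helper name:

```python
def sample_to_variable_ratio(n_cases, n_items):
    """Return the N/p ratio for a sample of n_cases and n_items variables."""
    return n_cases / n_items


# Subsample sizes as reported in the text; 12 is the SPANE-12 item count.
subsamples = {"EFA": 452, "CFA1": 910, "CFA2": 910}
for name, n_cases in subsamples.items():
    ratio = sample_to_variable_ratio(n_cases, 12)
    print(f"{name}: {ratio:.2f}/1")
# EFA: 37.67/1
# CFA1: 75.83/1
# CFA2: 75.83/1
```

For the 8-item version the same subsamples give larger ratios (56.50/1 and 113.75/1), so the stated thresholds are also met for SPANE-8.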
3.2. Univariate and Multivariate Normality
The data in both the total sample and the three subsamples violated the normality assumption. Kolmogorov-Smirnov tests (Massey, 1951) on each individual item of SPANE-P and SPANE-N were statistically significant without exception (p < .001), indicating that items were non-normally distributed. Specifically, the Kolmogorov-Smirnov statistic ranged from D (2272) = .172 to D (2272) = .279 for the total sample, from D (452) = .183 to D (452) = .232 for the EFA subsample, from D (910) = .187 to D (910) = .327 for the CFA1 subsample, and from D (910) = .178 to D (910) = .287 for the CFA2 subsample, with p < .001 in all samples. Next, we evaluated multivariate normality with the following four tests: 1) Mardia’s multivariate kurtosis test (Mardia, 1970); 2) Mardia’s multivariate skewness test (Mardia, 1970); 3) Henze-Zirkler’s consistent test (Henze & Zirkler, 1990), and 4) Doornik-Hansen omnibus test (Doornik & Hansen, 2008). The null hypothesis was rejected for all four tests (all p values < .0001), suggesting a violation of multivariate normality in all four samples (Total, EFA subsample, CFA1 subsample, CFA2 subsample). The values of all multivariate tests are presented in Table 2.
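To make the D statistic reported above concrete, the sketch below computes the Kolmogorov-Smirnov distance between the empirical distribution of one item and a normal distribution whose mean and standard deviation are estimated from the same data. It is an assumption-laden illustration: it returns only D (no p value, and no correction for estimating the parameters from the sample), the function name and ratings are hypothetical, and it is not the SPSS routine used in this study.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = reference.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after
        # (i / n) and just before ((i - 1) / n) each observation.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical ratings on a 1-5 scale, piled up at the low end.
skewed_item = [1] * 12 + [2] * 5 + [3] * 2 + [5] * 1
print(round(ks_statistic_normal(skewed_item), 3))
```

Coarse, skewed Likert-type items such as this one tend to produce large D values, which is in line with the item-level results reported above.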
3.3. Exploratory Factor Analysis (EFA)
EFA was applied with the MLR estimator (c.f. Muthen & Muthen, 2012). MLR is a rescaling-based estimation method appropriate for non-normal distributions and, unlike similar estimators, it produces standard errors and a chi-square test statistic (Wang & Wang, 2012). Additionally, MLR is appropriate for sample sizes ranging from small to medium (Bentler & Yuan, 1999; Muthen & Asparouhov, 2002; Wang & Wang, 2012), like this split sample. The factors were rotated with Geomin rotation in the standard EFA model. For the EFA Bifactor model, the technique proposed by Jennrich and Bentler (2011) was applied. EFA model fit was evaluated by the standards proposed by Hu and Bentler (1999) and Brown (2015): RMSEA (≤ .06, 90% CI ≤ .06), SRMR (≤ .08), CFI (≥ .95), TLI (≥ .95), and a chi-square/df ratio of less than 3 (Kline, 2016). Using multiple indices together provided a more conservative and reliable evaluation of the results (Brown, 2015).
Table 2. Multivariate normality tests.
All p values < .0001.
For both SPANE-12 and SPANE-8, a total of 5 EFA models were evaluated in the EFA subsample (N = 452). For SPANE-12, the following models were tested. MODEL 1a was proposed by Diener et al. (2010) and contains only the 6 positive items of SPANE-12 (SPANE-P). Correspondingly, MODEL 1b contains only the 6 negative items of SPANE-12 (SPANE-N; Diener et al., 2010), to test the assumption that SPANE-P and SPANE-N are independent measures of PA and NA (Crawford & Henry, 2004). MODEL 2 is a two-dimensional EFA model with SPANE-P and SPANE-N in two separate factors (proposed by Singh et al., 2017; attributed to Diener et al., 2010). This EFA model also served as a benchmark for the subsequent Bifactor EFA model. MODEL 3 is a Bifactor EFA model (Jennrich & Bentler, 2011). According to Reise et al. (2007), the evaluation of a Bifactor model is generally recommended when evaluating construct validity (c.f. Hammer & Toland, 2016). The Schmid-Leiman method (Schmid & Leiman, 1957) had been restricted to Bifactor CFA models until Jennrich and Bentler (2011) proposed the Exploratory Bifactor Analysis method applied here. Additionally, this Bifactor EFA model attempts to reproduce the hierarchical EFA structure for affect proposed by Tellegen et al. (1999), with a General Happiness/Sadness factor and PA and NA as specific factors. Generally, Bifactor models can successfully reproduce higher-order structures (Howard et al., 2016). SPANE-12 models showed the following fit. The fit for MODELS 1a and 1b was decent, with all measures within acceptable or almost acceptable bounds. For the 2-factor EFA model (MODEL 2) and the EFA Bifactor model (MODEL 3), fit measures (see Table 3) were at or near the required limits: Chi-square = 135.82, Chi-square/df = 3.16, RMSEA = .069, CFI = .952, TLI = .926, SRMR = .033, factor loadings .358 - .850, and factor correlation −.724.
For SPANE-8 the following alternative models were tested. MODEL 1a and MODEL 1b contained only the 4 items of SPANE-P and SPANE-N respectively (following Diener et al., 2010 for SPANE-12), to test the assumption that PA and NA are independent constructs (Watson et al., 1988; Crawford & Henry, 2004). MODEL 2 was the two-factor structure with SPANE-P and SPANE-N in two different factors. Regarding the fit of the above SPANE-8 models, all fit indices showed acceptable fit. The 2-factor solution achieved the following fit measures: Chi-square = 17.64, Chi-square/df = 1.36, RMSEA = .028, CFI = .995, TLI = .989, SRMR = .018, factor loadings .390 - .893, and factor correlation −.584. Table 3 contains the fit statistics for all models.
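The fit criteria adopted in this section can be written as a simple checklist. The sketch below is illustrative only: the function name `check_fit` is hypothetical, the cutoffs are the ones listed above for EFA model fit (Hu & Bentler, 1999; Brown, 2015; Kline, 2016), the input values are those reported for the SPANE-8 two-factor EFA solution, and the 90% confidence interval of RMSEA is not checked because its bounds are not quoted in the text.

```python
def check_fit(chi2_df, rmsea, srmr, cfi, tli):
    """Compare fit indices against the cutoffs adopted in this study."""
    return {
        "chi-square/df < 3": chi2_df < 3,
        "RMSEA <= .06": rmsea <= 0.06,
        "SRMR <= .08": srmr <= 0.08,
        "CFI >= .95": cfi >= 0.95,
        "TLI >= .95": tli >= 0.95,
    }


# Values reported for the SPANE-8 two-factor EFA solution.
spane8_two_factor = check_fit(
    chi2_df=1.36, rmsea=0.028, srmr=0.018, cfi=0.995, tli=0.989
)
for criterion, met in spane8_two_factor.items():
    print(f"{criterion}: {'met' if met else 'not met'}")
print(all(spane8_two_factor.values()))  # True
```

Reporting each criterion separately, rather than a single pass/fail flag, reflects the point made above that several indices are read together.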
3.4. Confirmatory Factor Analysis (CFA)
In this phase of the 3-faced construct validation method, we examined both SPANE-12 and SPANE-8 dimensionality further with Confirmatory Factor Analysis in a different subsample (40%, N = 910). The following criteria were applied to evaluate CFA model fit (Hu & Bentler, 1999; Brown, 2015) : RMSEA (≤ .06, 90% CI ≤ .06), SRMR (≤ .08), CFI (≥ .95), TLI (≥ .95), and the chi-square /df ratio less than 3 (Kline, 2016) .
For SPANE-12 the following 9 models were tested (see Table 4). MODEL 1 has both positive and negative experiences collapsed into one factor. It is a standard practice to test the assumption of maximum parsimony with a unidimensional ICM-CFA model (Crawford & Henry, 2004; Brown, 2015) . However, in this case, the model is also based on the hypothesis that PA and NA are opposite ends of a single dimension (Russell & Carroll, 1999; Crawford & Henry, 2004) . MODEL 2 has positive and negative experiences separated into two factors (also proposed by Giuntoli et al., 2017; Singh et al., 2017; Rahm et al., 2017; Silva & Caetano, 2013; Sumi, 2014 ). MODEL 3 is a variation of MODEL 2 with three error covariances added. MODEL 4 is a model with multiple, theoretically based, error covariances (see Figure 1(a)) to reflect the four content categories of SPANE items: 1) SPANE-P General Positive (items 1, 3, 5); 2) SPANE-P Specific Positive (items 7, 10, 12); 3) SPANE-N General Negative (items 2, 4, 6); 4) SPANE-N Specific Negative (items 8, 9, 11), see Figure 1. A model with multiple error covariances was also proposed by Li et al. (2013) . MODEL 5 is an ESEM model with positive and negative experiences separated into two factors to reproduce the bipolar structure of SPANE (never tested before). ESEM (Asparouhov & Muthen, 2009) is an integration of EFA, CFA, and SEM. ESEM potentially resolves misspecifications and inflated factor loadings, inherent in ICM-CFA due to constraining secondary factor loadings to zero (Marsh et al., 2014) . MODEL 6 is a variation of MODEL 5 with error covariances added. MODEL 7 is a Bifactor (Schmid & Leiman, 1957) CFA model suggested by Li et al. (2013) .
Figure 1. Path Diagrams of 2 alternative models of the total 9 tested for SPANE-12. (a) 2-factor ICM-CFA model with theoretical covariances between general and specific items; (b) A 4-factor model proposed (P General, P Specific, N General and N specific).
Table 3. EFA and bifactor EFA Fit statistics for SPANE-12 and SPANE-8.
Estimator = MLR, EFA Factor rotation = Geomin, EFA Bifactor rotation = BiGeomin. *G = General Factor, S = Specific factor, H/S = Happiness/Sadness. **SPANE-8 N (N4) = Bad, Sad, Afraid, Angry. **SPANE-8 P (P4) = Pleasant, Happy, Joyful, Contented.
Table 4. CFA Fit statistics for SPANE-12 and SPANE-8.
Estimator = MLR. *G = General Affect Factor, S = Specific Positive and Negative factors, H/S = Happiness/Sadness. **SPANE-8 N (N4) = Bad, Sad, Afraid, Angry. **SPANE-8 P (P4) = Pleasant, Happy, Joyful, Contented.
MODEL 8 is a Bifactor ESEM model (c.f. Reise, 2012; Marsh et al., 2013), never tested before. MODEL 9 is a 4-factor structure (never tested before). Factor 1 has 3 general positive items, factor 2 has 3 specific positive items, factor 3 has 3 general negative items and factor 4 has 3 specific negative items. This structure incorporates the additional differentiation of items into general and specific (Diener et al., 2010), apart from positive and negative (see Figure 1(b)). A higher order model was not tested due to misspecification, i.e. under-identification (Wang & Wang, 2012; Brown, 2015).
Regarding the fit of the SPANE-12 models tested, MODEL 1 had a poor fit. MODEL 2 showed a tolerably acceptable fit. MODEL 3 was the variation of MODEL 2 with error covariances permitted between items 3 and 5, 7 and 12, and 2 and 8, showing a very good fit (Chi-square = 150.20, Chi-square/df = 3.07, RMSEA = .048, CFI = .975, TLI = .967, SRMR = .026; factor loadings .647 - .801 for SPANE-12 P and .384 - .828 for SPANE-12 N; factor correlation −.724). In MODEL 4 error covariances were permitted in specific SPANE-P and SPANE-N items, based on the categorization of items either as General or Specific (see Figure 1(a)). The fit of this model was acceptable; however, MODEL 4 was less parsimonious than MODEL 3. MODELS 5 and 6 were the ESEM models, which also showed an acceptable fit. The 12 items of SPANE-12 were allowed to load on both the Positive and Negative Experiences factors in these two ESEM (Asparouhov & Muthen, 2009) models. Nevertheless, in this case there were no cross-loadings and the fit of the ICM and ESEM models was equally satisfactory. Note that the primary factor loadings of the 2-factor ESEM model ranged from .647 to .805 (SPANE-P) and from .418 to .895 (SPANE-N), and the factor correlation was −.628. ESEM is preferable to the corresponding ICM-CFA model only when it fits the data better; otherwise the ICM-CFA model is preferable as more parsimonious (Marsh et al., 2014). MODELS 7 and 8 were a Bifactor (Schmid & Leiman, 1957) CFA and a Bifactor ESEM (c.f. Reise, 2012) model respectively (Figure 1), also with adequate fit to the data. However, Bifactor models tend always to show acceptable fit, and have therefore been criticized as doubtful (Joshanloo, Jose, & Kielpikowski, 2017; Joshanloo & Jovanovic, 2016). MODEL 9 showed a decent fit with measures within acceptable limits but with a less parsimonious structure (see Figure 1(b)).
Regarding the 2-factor ICM-CFA model tested for SPANE-8, it showed a good fit to the data, Chi-square = 26.05, Chi-square/df = 1.37, RMSEA = .020, CFI = .996, TLI = .994, and SRMR = .020. The factor loadings for SPANE-8 P ranged from .639 to .771, and for SPANE-8 N from .413 to .811, with a factor correlation of −.662. Table 4 contains all the fit statistics for the 10 alternative CFA models tested for both SPANE versions.
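The chi-square/df ratio reported for the SPANE-8 model can be traced back to the model's degrees of freedom. The sketch below counts them for a simple-structure CFA under standard assumptions that are not stated in the text: each item loads on one factor only, factors are freely correlated, and each factor is identified by fixing either its variance or one loading. The function name is illustrative.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int,
                           n_error_covariances: int = 0) -> int:
    """Degrees of freedom of an ICM-CFA model fitted to a covariance matrix.

    Assumes one loading per item and one identification constraint per factor,
    so loadings plus factor variances contribute n_items free parameters.
    """
    observed_moments = n_items * (n_items + 1) // 2
    loadings_and_factor_variances = n_items
    residual_variances = n_items
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (loadings_and_factor_variances + residual_variances
                       + factor_covariances + n_error_covariances)
    return observed_moments - free_parameters


df_spane8 = cfa_degrees_of_freedom(n_items=8, n_factors=2)
ratio = 26.05 / df_spane8  # chi-square reported for SPANE-8 in the CFA 1 subsample
print(df_spane8, round(ratio, 2))  # 19 1.37
```

Under these assumptions the 8-item, 2-factor model has 19 degrees of freedom, which reproduces the reported ratio of 1.37.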
To summarize the findings in the CFA 1 subsample (N = 910), the following models showed an overall optimal fit after considering fit measures, factor loadings and factor intercorrelations: 1) for SPANE-12, the 2-factor ICM-CFA with 3 error covariances permitted in items 3 - 5, 7 - 12 and 2 - 8 (MODEL 3); 2) for SPANE-8, the 2-factor ICM-CFA. This bi-dimensional structure has also been proposed by many studies as optimal (Giuntoli et al., 2017; Singh et al., 2017; Rahm et al., 2017; Silva & Caetano, 2013; Sumi, 2014) , supporting these preliminary findings further.
3.5. Cross-Validating the Optimal CFA Model in a Different Subsample
After determining that the 2-factor ICM-CFA model was the optimal structure emerging from the CFA 1 subsample (N = 910), for both SPANE-12 and SPANE-8, a crosscheck of this model followed to verify model fit in a second subsample (CFA2, N = 910) of equal power. See Table 1 for details about the “3-faced construct validation method” (see also Kyriazos et al., in press ).
Table 5 contains the results of this crosscheck CFA for SPANE-12 and SPANE-8. The 2-factor ICM-CFA model showed adequate fit, with all indices falling within the acceptable range with good fit for SPANE-12 (see Figure 2(a)) and achieving an equally good fit on all indices for SPANE-8 (see Figure 2(b)). Moreover, fit measures for both versions were also very stable across the two subsamples. Factor loadings were also adequate, ranging for SPANE-12 P from .656 to .811 and for SPANE-12 N from .467 to .815, with factors intercorrelated at −.676. SPANE-8 factor loadings were equally adequate, both for SPANE-8 P (.678 - .800) and N (.493 - .802), with factors intercorrelated at −.617 (see Figure 2). All items loaded on the intended latent factors. After taking into consideration the above findings (fit statistics and factor loadings) for both SPANE versions, we used this successfully cross-validated 2-factor structure as a baseline model to evaluate measurement invariance across gender.
Table 5. CFA Fit statistics for SPANE-12 and SPANE-8 in the CFA2 subsample.
Estimator = MLR. *SPANE-8 N (N4) = Bad, Sad, Afraid, Angry. *SPANE-8 P (P4) = Pleasant, Happy, Joyful, Contented.
3.6. Measurement Invariance
For both SPANE versions, we examined measurement invariance across gender in the entire sample (N = 2272). Following Chen (2007), invariance at each step was considered tenable when, relative to the preceding less constrained model, CFI decreased by less than .01 and RMSEA increased by less than .015. First, for SPANE-12, the 2-factor ICM-CFA model with error covariances was fitted separately in each gender group (males, N = 832 versus females, N = 1440), to establish a baseline model. This model had a good fit for males (Chi-square = 185.57, Chi-square/df = 3.71, RMSEA = .057, CFI = .965) and equally good for females (Chi-square = 202.77, Chi-square/df = 4.22, RMSEA = .047, CFI = .978). Then, this bi-dimensional baseline solution was tested in both gender groups simultaneously (M1), showing adequate fit (see Table 6); thus configural invariance was supported. Next, to examine weak invariance, factor loadings were constrained to equality. As shown in Table 6, ΔCFI and ΔRMSEA for this constrained model (M2) confirmed weak invariance. Then, intercepts were forced to be equal (M3), and both ΔCFI and ΔRMSEA suggested that strong invariance was supported. Finally, to test strict invariance, the ultimate test of measurement invariance (Wang & Wang, 2012), error variances were forced to be equal (M4). ΔCFI and ΔRMSEA suggested strict measurement invariance.
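The stepwise comparison of nested models can be sketched as follows. This is an illustration only: the CFI and RMSEA values are hypothetical placeholders rather than the entries of Table 6, the function name is invented for the example, and treating a CFI drop smaller than .01 together with an RMSEA rise smaller than .015 as support for invariance is one common reading of Chen (2007).

```python
from typing import List, Tuple

# (label, CFI, RMSEA) for each nested model, ordered from least to most constrained.
FitRow = Tuple[str, float, float]


def invariance_steps(models: List[FitRow],
                     max_cfi_drop: float = 0.01,
                     max_rmsea_rise: float = 0.015):
    """Compare each model with the preceding, less constrained one."""
    steps = []
    for (_, prev_cfi, prev_rmsea), (label, cfi, rmsea) in zip(models, models[1:]):
        delta_cfi = cfi - prev_cfi        # negative when fit worsens
        delta_rmsea = rmsea - prev_rmsea  # positive when fit worsens
        supported = delta_cfi > -max_cfi_drop and delta_rmsea < max_rmsea_rise
        steps.append((label, round(delta_cfi, 3), round(delta_rmsea, 3), supported))
    return steps


# Hypothetical values, for illustration only.
hypothetical_models = [
    ("M1 configural", 0.972, 0.052),
    ("M2 weak", 0.971, 0.050),
    ("M3 strong", 0.968, 0.051),
    ("M4 strict", 0.965, 0.050),
]
steps = invariance_steps(hypothetical_models)
for label, d_cfi, d_rmsea, ok in steps:
    print(f"{label}: dCFI = {d_cfi}, dRMSEA = {d_rmsea}, supported = {ok}")
```

Each level is judged against the model immediately before it, so a failure at one step (for example, intercepts) would stop the sequence before stricter constraints are interpreted.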
Table 6. Fit Measures of the nested models tested to validate measurement invariance.
MLR estimator was used in all models. Error covariances for men and women differ across models (c.f. Wang & Wang, 2012 ).
Figure 2. Path Diagrams of the Optimal 2-factor ICM-CFA models found from the CFA2 subsample, having equal power with CFA1 subsample (see 3-faced construct validation method). (a) Optimal SPANE-12 model; (b) optimal SPANE-8 Model.
Then, to test measurement invariance for SPANE-8, the above procedure was repeated. The baseline bi-dimensional optimal solution presented very good fit for males (Chi-square = 31.69, Chi-square/df = 1.76, RMSEA = .030, CFI = .992), and equally adequate fit for females (Chi-square = 43.694, Chi-square/df = 2.30, RMSEA = .030, CFI = .993). Then, this model was evaluated in both gender groups at the same time. The fit of this model (M1) confirmed configural invariance (see Table 6). Then, factor loadings (MODEL 2), intercepts (MODEL 3) and error variances (MODEL 4) were consecutively constrained to equality. Model comparison indicated that fit measures were adequate, with ΔCFI and ΔRMSEA within acceptable range for MODEL 2 versus MODEL 1 (weak invariance), within tolerably acceptable range for MODEL 3 versus MODEL 2 (strong invariance), and equally acceptable for MODEL 4 versus MODEL 3 (strict invariance).
3.7. Reliability and AVE Validity
Next, we evaluated the reliability and validity of SPANE-12 and SPANE-8 over the entire sample (N = 2272) using the following measures: 1) Cronbach’s alpha (α; Cronbach, 1951) to examine internal consistency of the responses. Alpha values above .70 are generally acceptable (Hair et al., 2010), and above .80 adequate (Kline, 2000; Nunnally & Bernstein, 1994); 2) Omega Total coefficient (ω total; McDonald, 1999; Werts, Lim, & Joreskog, 1974) to examine construct reliability (Hoque et al., 2017). Omega corresponds either to the variance accounted for by all factors or by each latent factor separately (Brunner et al., 2012). For omega, a value of .70 or greater is acceptable (Hair et al., 2010); 3) Average Variance Extracted (AVE; Fornell & Larcker, 1981) to evaluate convergent validity. Malhotra & Dash (2011) note that ω alone is a weak criterion, potentially allowing an error variance as high as 50%. Therefore, AVE in combination with the ω coefficient offers a more conservative estimation of convergent validity (Malhotra & Dash, 2011). The threshold for AVE is .50 (Fornell & Larcker, 1981; Hair et al., 2010; Awang et al., 2015).
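The three coefficients can be computed from item scores and standardized loadings as sketched below. This is a minimal illustration using only the Python standard library; the function names and all example values are hypothetical, and the omega formula shown is the usual composite-reliability expression for a single factor with uncorrelated errors, which may differ from the exact variant used in the study.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def omega_from_loadings(loadings):
    """Composite reliability of one factor from standardized loadings."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)


def ave_from_loadings(loadings):
    """Average Variance Extracted: mean squared standardized loading."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Hypothetical factor with four items, each loading .70.
hypothetical_loadings = [0.70, 0.70, 0.70, 0.70]
omega = omega_from_loadings(hypothetical_loadings)
ave = ave_from_loadings(hypothetical_loadings)
print(round(omega, 2), round(ave, 2))
```

With four loadings of .70, omega is about .79 and passes the .70 threshold, while AVE is about .49 and falls just short of .50. This is the situation the combined use of ω and AVE is meant to catch: reliability looks acceptable although roughly half of the item variance is error.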
The internal reliability of SPANE-12 total was high. On average, for each SPANE-12 factor, alpha coefficients were adequate (M = .88). Similarly, for SPANE-8 total internal reliability was satisfactory and on average alpha coefficients for the two factors were also good (M = .80); see Table 7. Likewise, the omega reliability coefficient for the total SPANE-12 was substantial (.94) and for each factor on average satisfactory, and equal to alpha, M = .88. This value suggests that the mean percentage of variance explained by each SPANE-12 factor score is 88%. For the total SPANE-8, the omega coefficient was also high (.90) and for each factor on average sufficient (M = .81), indicating a mean percentage of explained variance per factor score of 81% (see Table 7). Regarding AVE, Total AVE was adequate for both versions. For SPANE-12 per factor AVE was on average adequate, M = .56, and for SPANE-8 equally adequate, M = .53 (Table 7).
3.8. Correlation Analysis to Examine Convergent and Discriminant Validity
The relationship between SPANE and other measures was examined in the entire sample (N = 2272). Dimensions evaluated are separated into four groups: 1) mental distress (with the 3 dimensions of DASS-9; Yusoff, 2013; Kyriazos, Stalikas, Prassa, & Yotsidi, in press); 2) well-being, including WEMWBS (Tennant et al., 2007), MHC-SF (Keyes et al., 2008), Flourishing Scale (FS; Diener et al., 2010), and Satisfaction with Life Scale (SWLS; Diener et al., 1985); 3) positivity scales comprising trait HOPE (Snyder et al., 1991), Brief Resilience Scale (BRS; Smith et al., 2008), Meaning in Life Questionnaire (MLQ; Steger et al., 2006) and Gratitude 6 Questionnaire (McCullough et al., 2002); 4) aspects of life quality like Physical health, Psychological health, Social Relations and Environment (from WHOQOL-BREF; WHOQOL Group, 1998a, 1998b). We examined results both for SPANE-12 and SPANE-8 as well as the correlation between them (presented in Table 8). Concerning the correlations between SPANE and Group 1 (Mental Distress Scales), correlations of SPANE-12 P, SPANE-12 N and SPANE-12 B with DASS-9 ranged from moderate to strong, M = −.38, M = .45 and M = −.47 respectively. The correlations of SPANE-8 P, SPANE-8 N and SPANE-8 B with DASS-9 were also moderately strong (M = −.37, M = .44, M = −.46).
Table 7. Reliability and AVE Convergent Validity of SPANE-12 and SPANE-8.
SPANE-8 N = Bad, Sad, Afraid, Angry. SPANE-8 P = Pleasant, Happy, Joyful, Contented.
The correlations between SPANE-12 P and Group 2 (Well-Being Scales) were on average strong (M = .60). Note that WEMWBS and Emotional WB showed the strongest correlations. The correlations between SPANE-12 N and Group 2 were on average moderate to strong (M = −.44). SPANE-12 B had a strong, positive correlation with Group 2 (M = .56). Similarly, correlations between SPANE-8 P and Group 2 were on average strong (M = .58), with WEMWBS and Emotional WB being the strongest. The correlations between SPANE-8 N and Group 2 were negative and on average moderately strong (M = −.40). The correlations between SPANE-8 B and Group 2 were positive and on average strong (M = .55).
Concerning the correlations of SPANE with Group 3 (Positivity Scales), SPANE-12 P, SPANE-12 N and SPANE-12 B had on average a moderate to strong correlation of .40, −.31, and .38 respectively. SPANE-8 P, SPANE-8 N and SPANE-8 B had on average a moderate correlation with Group 3 (M = .39, M = −.28, M = .37). A consistent pattern of relations emerged in this Group for both SPANE versions, since HOPE and Search for Meaning were invariably at the highest and lowest positive range for both SPANE P and SPANE B. Likewise, Resilience and Search for Meaning had the strongest negative and weakest positive relation with SPANE-N for both SPANE versions.
Table 8. Bivariate Correlations between SPANE-12, SPANE-8 and other measures.
All p values were < .01. P = SPANE POSITIVE, N = SPANE NEGATIVE, B = SPANE AFFECT BALANCE. Bold indicates correlations of equivalent factors on SPANE-12 and SPANE-8.
Regarding the correlations of SPANE with Group 4 (Quality of life Scales), SPANE-12 P, SPANE-12 N and SPANE-12 B had on average strong correlations (M = .51, M = −.42, M = .51). SPANE-8 P, SPANE-8 N and SPANE-8 B also had on average a strong correlation with Group 4 (M = .50, M = −.39, M = .50). Finally, about the correlations between the two SPANE versions, SPANE-12 P, SPANE-12 N and SPANE-12 B were strongly correlated with their equivalent SPANE-8 factors with values as high as .97, .97 and .98 respectively, all p < .01 (see Table 8).
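A correlation coefficient can be translated into a proportion of shared variance by squaring it. The short sketch below applies this to the cross-version correlations reported above; the dictionary keys are shorthand for the Positive, Negative and Balance scores.

```python
# Correlations between equivalent SPANE-12 and SPANE-8 scores, as reported in the text.
cross_version_r = {"P": 0.97, "N": 0.97, "B": 0.98}

# Shared variance is the squared correlation (coefficient of determination).
shared_variance = {scale: round(r ** 2, 2) for scale, r in cross_version_r.items()}
print(shared_variance)  # {'P': 0.94, 'N': 0.94, 'B': 0.96}
```

Correlations of .97 to .98 therefore correspond to roughly 94% to 96% of variance shared between the two versions.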
3.9. Evaluation of the Bifactor CFA Subjective Well-Being Model
For the measurement of subjective well-being, empirical literature (Linley et al., 2009) suggests using two constructs: 1) positive/negative affect, and 2) life satisfaction. The current study examined whether this hierarchical model of subjective well-being (tested empirically by Linley et al., 2009; also postulated by Diener, Suh, Lucas, & Smith, 1999, as quoted in Ruini, 2017) is confirmable in the current sample with a Bifactor CFA structure. The Bifactor structure is an alternative to the originally proposed higher-order CFA model (Linley et al., 2009) because it successfully reproduces higher order structures (Howard et al., 2016); the higher-order model itself is inapplicable in this case due to misspecification (Wang & Wang, 2012; Brown, 2015).
To test this Bifactor SWB model (see Figure 3) we used the Satisfaction with Life Scale (Diener et al., 1985) to measure life satisfaction and the two newly validated SPANE versions (SPANE-12 and SPANE-8) to measure affect. Specifically, we tested two Bifactor CFA models. The first had a general subjective well-being factor and SWLS, SPANE-12 P and SPANE-12 N as specific factors (Figure 3(a)). The second had a general subjective well-being factor and SWLS, SPANE-8 P and SPANE-8 N as specific factors (Figure 3(b)). The SWB model of both SPANE versions had an adequate fit (see Table 9), with all measures within acceptable limits, and chi-square/df on the verge of acceptability. The satisfactory fit of the model to the data suggests that the proposed subjective well-being model is tenable using SPANE-12 or SPANE-8 to measure the affect component (see Figure 3).
3.10. Normative Data for SPANE-12 & SPANE-8
For both SPANE versions tested the means and ranges for the SPANE-P, SPANE-N and SPANE-B dimensions for the total sample (N = 2272) are presented in Table 10.
Figure 3. The subjective well-being measure using the Bifactor CFA method. (a) With SPANE-12 and SWLS as specific factors and a general subjective well-being factor; (b) with SPANE-8 and SWLS as specific factors and a general subjective well-being factor.
Table 9. Fit statistics for the subjective well-being Bifactor CFA Model for SPANE-12 and SPANE-8.
Estimator = MLR, *SPANE-8 N = Bad, Sad, Afraid, Angry, *SPANE-8 P = Pleasant, Happy, Joyful, Contented.
Table 10. Summary statistics and raw scores of SPANE converted to percentiles.
Means are not informative of individual scores, given the non-normality of the data (Crawford & Henry, 2004). Therefore, Table 10 converts SPANE-12 and SPANE-8 scores to percentiles. For SPANE-12, 50% of the respondents in this sample scored ≤ 23, ≤ 15 and ≤ 8 in Positive Experiences, Negative Experiences and Affect Balance respectively. For the original SPANE, more than half of the respondents in the US scored 22 on SPANE-P (51%), 15 on SPANE-N (52%) and 7 on SPANE-B (53%), as reported by Diener et al. (2010). In turn, for SPANE-8, 50% of the respondents in our sample had a score ≤ 15, ≤ 10 and ≤ 5 in Positive Experiences, Negative Experiences and Affect Balance respectively (see Table 10).
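Converting raw scores to percentiles amounts to computing, for each observed score, the cumulative percentage of respondents at or below it. A minimal sketch follows; the function name and the scores are hypothetical, and the study's normative tables may have used a different percentile definition (for example, mid-rank percentiles).

```python
from collections import Counter


def percentile_table(scores):
    """Map each observed raw score to the cumulative percentage of
    respondents scoring at or below it."""
    n = len(scores)
    counts = Counter(scores)
    table = {}
    cumulative = 0
    for raw in sorted(counts):
        cumulative += counts[raw]
        table[raw] = round(100 * cumulative / n, 1)
    return table


# Hypothetical raw scores for ten respondents, for illustration only.
hypothetical_scores = [12, 15, 15, 18, 20, 22, 23, 23, 25, 28]
table = percentile_table(hypothetical_scores)
for raw, pct in table.items():
    print(raw, pct)
```

In this toy sample a raw score of 20 sits at the 50th percentile, which is how statements such as "50% of the respondents scored ≤ 23" are read off a normative table.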
The purpose of this study was to examine the construct validity of two versions of SPANE: SPANE-12 (Diener et al., 2010) and SPANE-8. SPANE-8 is a new, shorter version proposed by the present study. SPANE-8 contains 1 instead of 3 items of general affect in each original SPANE factor (Diener et al., 2010: p. 145). Among the general positive and negative items, the 2 items with the lowest CFA factor loadings in each factor were excluded. This resulted in a briefer and more parsimonious structure with a total of 4 positive (Pleasant, Happy, Joyful, Contented) and 4 negative (Bad, Sad, Afraid, Angry) items. Items excluded were: 1) in SPANE-P, items 1 (positive) and 3 (good), and 2) in SPANE-N, items 2 (negative) and 6 (unpleasant).
The main findings that emerged for both SPANE versions are: 1) the two factor model (with SPANE-P and SPANE-N as dimensions) showed the optimal fit (also proposed by Giuntoli et al., 2017; Singh et al., 2017; Rahm et al., 2017; Silva & Caetano, 2013; Sumi, 2014 and attributed to Diener et al., 2010); 2) full, strict measurement invariance across gender was supported; 3) alpha and omega reliability and AVE convergent validity were adequate; 4) a consistent, overarching pattern of relationships was present from the correlation analysis, common for both versions; 5) SPANE-12 and SPANE-8 were very highly correlated (r = .97 to .98 between equivalent scores), indicating that roughly 94% to 96% of the variance in SPANE-12 scores is shared with SPANE-8; 6) a Bifactor CFA Subjective well-being model (in a similar vein to Linley et al., 2009) was tenable in our data using both SPANE-12 and SPANE-8 to measure affect.
To establish the construct validity of SPANE we used a multifaceted crosscheck procedure we call the “3-faced construct validation method”, based on sample-splitting. Sample-splitting (Guadagnoli & Velicer, 1988; MacCallum, Browne, & Sugawara, 1996) is generally regarded as a cross-validation method of building construct validity because factor analysis findings are replicated in a different subsample (Byrne, 2010; Brown, 2015; Schumacker & Lomax, 2015; Singh et al., 2016; DeVellis, 2017). In the “3-faced construct validation method” (see also Kyriazos et al., in press and Table 1 for an overview), the sample is randomly split in three parts (20%, 40% and 40%), keeping the N/p ratio threshold at 5/1 for EFA and at 10/1 for CFA. The first 20% is used for EFA. The second 40% is used for an explorative CFA (CFA 1) to test a minimum of the following alternative models: a single-factor ICM-CFA, a multifactor ICM-CFA, a Bifactor CFA and a Higher-order CFA (if applicable). In this study, additional alternative models tested in CFA 1 were an ESEM model and a Bifactor ESEM model. Next, the third 40% is used for a crosscheck CFA (CFA 2) to verify the optimal model (or competing optimal models) of CFA 1 in a different subsample of equal power. If CFA 2 fails to revalidate the optimal CFA 1 model, then the second best model is crosschecked, and so on. Measurement invariance using the cross-validated model as a baseline model is the final phase of the method. Note that if the sample is not adequate, only CFA is performed (keeping the N/p ratio threshold > 10/1) and measurement invariance with the Multiple Indicators Multiple Causes method (MIMIC) follows. In this study the sample size was adequate to permit sample splitting.
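The random 20%/40%/40% split and the N/p checks described above can be sketched as follows. The function names, the seed and the rounding rule are assumptions made for illustration; the authors' exact splitting procedure is not described, so the resulting subsample sizes need not match the reported ones (for example, N = 910 for each CFA subsample).

```python
import random


def three_way_split(case_ids, proportions=(0.2, 0.4, 0.4), seed=0):
    """Randomly split cases into EFA, CFA 1 and CFA 2 subsamples."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_efa = round(len(ids) * proportions[0])
    n_cfa1 = round(len(ids) * proportions[1])
    efa = ids[:n_efa]
    cfa1 = ids[n_efa:n_efa + n_cfa1]
    cfa2 = ids[n_efa + n_cfa1:]       # remainder, approximately the last 40%
    return efa, cfa1, cfa2


def meets_ratio(n_cases, n_items, min_ratio):
    """Check the cases-per-observed-variable (N/p) threshold."""
    return n_cases / n_items >= min_ratio


efa, cfa1, cfa2 = three_way_split(range(2272))
print(len(efa), len(cfa1), len(cfa2))
print(meets_ratio(len(efa), 12, 5),    # EFA threshold of 5/1 for 12 items
      meets_ratio(len(cfa1), 12, 10),  # CFA threshold of 10/1
      meets_ratio(len(cfa2), 12, 10))
```

Because the three subsamples are disjoint, a model selected in CFA 1 is re-estimated in CFA 2 on cases that played no part in selecting it, which is what gives the crosscheck its value.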
Regarding sample power, all three samples were far beyond the suggested threshold of 5 to 10 cases for each observed variable (Osborne & Costello, 2004; Singh et al., 2016) or even the stricter 20 cases for each observed variable (Schumacker & Lomax, 2015).
In the first phase of the “3-faced construct validation method”, a total of five alternative EFA models were tested for both versions (3 for SPANE-12 and 2 for SPANE-8) in the 20% subsample. Models tested included single and two factor structures. Single structures had only the SPANE-P and SPANE-N as two independent structures to imitate the factorial structure proposed by Diener et al. (2010), with adequate results for both SPANE versions (12 & 8). For SPANE-12 we also tested a 2-factor Bifactor EFA model. In EFA, the 2-factor solution (also tested by Singh et al., 2017) achieved acceptable fit in all fit indices and the 2-factor Bifactor model showed comparably adequate fit. However, a predominant general Happiness/Sadness factor is not necessarily present, since dimensionality based only on Bifactor analysis is often regarded as questionable (Joshanloo et al., 2017; Joshanloo & Jovanovic, 2016).
To confirm the results of the EFA in the second phase of the “3-faced construct validation method”, we used a new 40% subsample to explore alternative structures for both SPANE-12 and SPANE-8 versions. CFA has turned into the ubiquitous test of construct validity in psychological measurement (Howard et al., 2016). We tested a total of 9 alternative SPANE-12 models and the 2-factor model for SPANE-8. Crucially, of the 9 CFA models tested, three were new additions to the empirical CFA literature: 1) a 2-factor ESEM model. ESEM is an integration of EFA, CFA, and SEM that potentially resolves inflated factor loadings, inherent in ICM-CFA (Marsh et al., 2014); 2) a 2-factor Bifactor ESEM model with a General “Happiness/Sadness” factor and two specific factors, SPANE-P and SPANE-N; 3) a 4-factor structure with factor 1 having 3 general positive items, factor 2 containing 3 specific positive items, factor 3 tapping 3 general negative items and factor 4 containing 3 specific negative items. This structure was based on an additional differentiation of items into “general” and “specific” (Diener et al., 2010: p. 145), apart from positive and negative. We did not test a higher order model due to misspecification constraints (Wang & Wang, 2012; Brown, 2015). However, Bifactor results have been criticized as questionable under certain conditions (Reise, 2012; Joshanloo et al., 2017).
Considering fit measures and factor loadings, the 2-factor model with 3 error covariances showed the optimal fit. Thus, reaching the 3rd phase of the “3-faced construct validation method”, the optimal CFA 1 model was subsequently cross-validated with satisfactory results in the third subsample (also 40%). This cross-validation was performed for both versions of SPANE (12 & 8). This bi-dimensional, cross-validated solution for SPANE-12 is in line with many other similar empirical findings (Giuntoli et al., 2017; Singh et al., 2017; Rahm et al., 2017; Silva & Caetano, 2013; Sumi, 2014, attributed to Diener et al., 2010) and it was supported for SPANE-8 as well. Also, in line with previous CFA research, the two SPANE factors were intercorrelated with a strong negative correlation (Giuntoli et al., 2017; Singh et al., 2017; Silva & Caetano, 2013; Li et al., 2013).
The results of measurement invariance across gender (c.f. Brown, 2015; Byrne, 2010; Kline, 2016; Schumacker & Lomax, 2015) of this cross-validated bi-dimensional ICM-CFA model suggested that all SPANE-12 and SPANE-8 items were invariant (full, strict measurement invariance). Furthermore, the results supporting measurement invariance across gender add to previous research findings in several diverse cultures (Li et al., 2013; Giuntoli et al., 2017; Singh et al., 2017). Damasio & Koller (2015) comment that invariance adds a significant quality advantage, enabling valid group comparisons across gender, free from response bias.
Next, we evaluated the internal consistency of the responses, the construct reliability with omega total (Awang et al., 2015) and Average Variance Extracted convergent validity (Fornell & Larcker, 1981). Findings showed adequacy and stability across the two SPANE versions (12 and 8). Additionally, the relationship between SPANE and 10 other measures was evaluated. Dimensions used can be divided into four groups: 1) mental distress (with the 3 dimensions of DASS-9; Yusoff, 2013; Kyriazos, Stalikas, Prassa, & Yotsidi, in press); 2) 4 well-being measures; 3) 4 positivity scales covering the constructs of Hope, Resilience, Meaning in life and Gratitude; 4) aspects of life quality. A consistent pattern of relations emerged for both SPANE versions, since HOPE (Snyder et al., 1991) and Search for Meaning (MLQ; Steger et al., 2006) were invariably at the highest and lowest positive range for both SPANE P and SPANE B. Likewise, resilience and Search for Meaning showed the strongest negative and weakest positive relation with SPANE-N for both SPANE versions. We also examined the correlation between SPANE-12 and SPANE-8, which was very high between equivalent factors (.97 - .98). Moreover, normative data were calculated both for SPANE-12 and SPANE-8.
Finally, we successfully tested a Bifactor CFA Subjective Well-being model using SPANE (originally proposed by Linley et al., 2009 as a higher order structure). Subjective well-being is operationalized as having an affective (emotional) component and a cognitive, judgmental component about life satisfaction (Linley et al., 2009). Thus, the measurement of subjective well-being consists of a positive/negative affect measure (here SPANE) and a life satisfaction measure (here SWLS by Diener et al., 1985; c.f. Govindji & Linley, 2007; Linley et al., 2009). We built two alternative models using either SPANE-12 or SPANE-8 for the affective component and SWLS for the cognitive life satisfaction component. The adequate fit of both models, especially the SPANE-8 model, is an indication that the SWB model is tenable in this sample.
We have addressed the issue of construct validity and full strict measurement invariance for two alternative versions of SPANE. We concluded that both the original SPANE-12 and the newly established SPANE-8 confirmed the bi-dimensional model with the two components of affect in two factors. Additionally, SPANE-12 and SPANE-8 are measurement invariant across gender (strict invariance). Our findings confirmed the robustness of SPANE-12, suggesting at the same time that SPANE-8 has equally sound psychometric properties, and a more parsimonious structure, being shorter. The two versions are strongly correlated. Normative data were also calculated for both versions.
Nevertheless, this study has certain limitations that should be taken into account when interpreting the results. First, psychology students participated in data administration through an electronic test battery. The effects of this procedure are unspecified. Second, SPANE-8 findings are encouraging; however, additional evaluation of its structure in different samples is necessary. Third, the measurement invariance analysis across gender relied on a rather imbalanced sample of men versus women. All the above limitations considered, the present study adds to affect research by: 1) providing evidence that the Greek version of the original 12-item SPANE (Diener et al., 2010) is a robust measure of affect. The construct validation was implemented using a new sample-splitting based procedure we call the “3-faced construct validation method”; 2) providing a newer, briefer version, the SPANE-8, with sound psychometric properties and normative data; 3) supporting the strict measurement invariance of both above versions; 4) modeling Subjective well-being with a Bifactor CFA model using SPANE to measure the affect component; 5) calculating normative data for the Greek cultural context.