In recent years, the construct of meaning in life has been studied extensively by social scientists, alongside the growing emphasis on other variables of positive psychology (Ryan & Deci, 2001; Seligman & Csikszentmihalyi, 2000). Meaning in life concerns the degree to which people think their life matters or has a purpose (Steger & Kashdan, 2006) and is a prominent indicator of well-being (Ryff, 1989; Steger & Frazier, 2005). The relation between meaning in life and well-being has been consistently documented across several studies (e.g., Reker, Peacock, & Wong, 1987; Steger & Frazier, 2005). Those who perceive their life as meaningful display higher levels of optimism and self-actualization (Compton, Smith, Cornish, & Qualls, 1996), self-esteem (Steger et al., 2006) and positive affect (King, Hicks, Krull, & Del Gaiso, 2006). On the other hand, the failure to achieve meaning in life may lead to depression and anxiety (Steger et al., 2006; Debats, van der Lubbe, & Wezeman, 1993), substance abuse (Steger et al., 2008a; Steger & Kashdan, 2006), increased suicidal ideation (Harlow, Newcomb, & Bentler, 1986) and other mental health problems, mainly due to the experience of boredom, emptiness and apathy (Frankl, 1963).
Following several decades of empirical and theoretical work on the significance of meaning in life (e.g., Frankl, 1963; Ryff & Singer, 1998; Steger, 2012), two important dimensions of the concept have emerged (e.g., Crumbaugh, 1977; Steger et al., 2006): 1) presence of meaning and 2) search for meaning. Presence of meaning in life refers to the extent to which people perceive their life as comprehensible, important and meaningful (Steger & Kashdan, 2006) and is recognized as a construct that supports positive functioning (Steger & Shin, 2010). Individuals who feel that their life is meaningful experience greater well-being and lower levels of psychological distress (Steger et al., 2006). Moreover, they adjust more smoothly after traumatic events (Steger et al., 2008b) and show lower levels of neuroticism (Zika & Chamberlain, 1992). They report better health (Steger, Mann, Michels, & Cooper, 2009) and have a reduced likelihood of cognitive decline (Boyle, Buchman, Barnes, & Bennett, 2010).
The second dimension, search for meaning in life, concerns the drive to intently seek a valued life purpose or mission and a greater meaning in life, and originates from intrinsic human motivation (Frankl, 1963). Although search for meaning has been found to result in resilient responses among people facing negative events (e.g., Skaggs & Barron, 2006; Lee, 2008; Chan & Chan, 2011), in non-stressful situations it is linked to higher levels of depression and anxiety (Steger et al., 2009), negative affect (Steger et al., 2008b) and reduced levels of happiness and life satisfaction (Park, Park & Peterson, 2010). In most studies (e.g., Steger et al., 2006; Damasio & Koller, 2015; Boyraz, Lightsey Jr., & Can, 2013; Temane, Khumalo, & Wissing, 2014), negative correlations have been documented between the two dimensions. This can be attributed to the fact that perceived meaning is associated with less of a need to pursue further meaning in life (Schulenberg, Strack, & Buchanan, 2011). Nevertheless, some studies have reported a positive relationship between the two dimensions, especially in collectivistic cultures such as China and Japan (Steger et al., 2008c; Wang & Dai, 2008). This finding may result from differences in social and value orientations (Steger et al., 2008c). As interest in understanding human well-being continues to grow (Seligman, Steen, Park, & Peterson, 2005; Seligman & Csikszentmihalyi, 2000), the use of psychometrically sound research tools and rigorously validated measures for assessing such constructs has become imperative (Diener & Seligman, 2004). Meaning in life is one of the most critical components in providing the conditions from which happiness arises (Lent, 2004; Ryff & Singer, 1998), and thus the development of psychometric scales for its measurement has been deemed indispensable.
Several measures of meaning in life have been developed in previous work, such as the Purpose in Life Test (PIL; Crumbaugh & Maholick, 1964), the Life Regard Index (LRI; Battista & Almond, 1973) and the Sense of Coherence Scale (Antonovsky, 1987). However, these scales have been criticized for being confounded with several of the variables they correlate with (Frazier, Oishi, & Steger, 2003) as well as for having poor content validity (Dyck, 1987) and structural properties (Steger, 2007). In an effort to overcome these pitfalls and provide a psychometrically robust measure of the presence of and search for meaning in life, Steger et al. (2006) developed the Meaning in Life Questionnaire (MLQ). MLQ scores have greater stability and better discriminant validity than scores of other scales. In a review of fifty-nine instruments assessing meaning in life, Brandstätter et al. (2012) concluded that the MLQ is one of the best scales in terms of concept definition, sampling, item development and forms of analysis.
The Meaning in Life Questionnaire is a self-report inventory that assesses two dimensions of meaning in life. The Presence of Meaning subscale comprises 5 items (items 1, 4, 5, 6, and 9) and measures respondents’ perceived meaning in their life while the Search for Meaning subscale also consists of 5 items (items 2, 3, 7, 8, and 10) and measures their motivation to discover meaning in their life. The questionnaire takes about 3 - 5 minutes to complete. The MLQ does not have cut scores but it is intended to measure meaning in life across the full range of human functioning.
Steger et al. (2006) reported internal consistency reliability coefficients (Cronbach’s alpha) ranging between .81 and .86 for the Presence subscale and between .84 and .92 for the Search subscale. One-month test-retest reliability coefficients were .70 for Presence and .73 for Search. Other studies also provide support for the internal consistency and test-retest reliability of MLQ scores. Currently, the instrument has been translated into more than 20 languages for international use and has been validated in several cultures including Greece (Pezirkianidis et al., 2016), Brazil (Damasio & Koller, 2015), Chile (Steger & Samman, 2012), Hong Kong (Chan, 2014), Turkey and USA (Boyraz et al., 2013), China (Jiang, Bai, & Xue, 2016), Japan (Steger et al., 2008c), and South Africa (Temane et al., 2014). During the last few years, the MLQ has also been used in diverse and special populations within the United States, such as individuals experiencing grief (Boyraz & Efstathiou, 2011; Boyraz, Horne, & Sayger, 2010), patients suffering from serious mental illnesses (Schulenberg et al., 2011), smoking cessation patients (Steger et al., 2009), and individuals from different ethnic backgrounds (Kiang & Fuligni, 2010). Regarding demographic variables, no significant gender differences have been found, while older participants report higher scores on the Presence subscale and lower scores on the Search subscale (Steger et al., 2006).
The purpose of this study is to validate the MLQ in the Greek context and explore its psychometric properties in a non-clinical Greek population. More specifically, the objectives of this study are the following: 1) to evaluate the construct validity of the Greek version of the Meaning in Life Questionnaire (Steger et al., 2006) using both exploratory and confirmatory factor analysis techniques, such as Bifactor EFA, Bifactor CFA and ESEM; 2) to examine measurement invariance of the MLQ across gender; 3) to study the internal consistency reliability of the MLQ; and 4) to evaluate the convergent and discriminant validity of the MLQ with the constructs of well-being, hope, anxiety, depression, stress and resilience.
The sample is a subset of a larger dataset. The data used in this study were collected from 2016 onwards. The sample comprised 1561 Greek adults of the non-clinical population. The participants were on average 39.7 years of age (SD = 12.81) and mostly female (62%). Most of them were married (51%), followed by single (41%), divorced (5%), widowed (2%), and 1% endorsing “other”.
2.2.1. Meaning in Life Questionnaire (MLQ)
The MLQ (Steger et al., 2006) is a ten-item measure of perceived meaning and purpose in life. More specifically, it is a two-dimensional scale with five items in each factor. The first factor (Presence of Meaning) includes items about the perceived existence of meaning (e.g., “I have a good sense of what makes my life meaningful”). The second factor (Search for Meaning) examines one’s perceived quest for purpose (e.g., “I am always looking to find my life’s purpose”). Items are rated on a 7-point Likert scale (from 1 = “Absolutely Untrue” to 7 = “Absolutely True”). Item nine (“My life has no clear purpose”) is reverse-scored. Possible factor scores range from 5 to 35, with higher scores suggesting greater Presence of or Search for meaning; scores higher than 24 on a factor indicate high Presence or high Search. The two factors are considered related but distinct, and combining their scores yields four possible groups of respondents (Steger, 2010).
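The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official scoring routine: it assumes responses are coded 1 to 7 with higher values meaning "more true", uses the item keys given earlier in the text, and the function and constant names are hypothetical.

```python
# Item keys as described in the text; ratings are assumed to be coded 1-7.
PRESENCE_ITEMS = (1, 4, 5, 6, 9)
SEARCH_ITEMS = (2, 3, 7, 8, 10)
REVERSED_ITEMS = (9,)  # "My life has no clear purpose"


def score_mlq(responses):
    """Return (presence, search) subscale totals, each between 5 and 35.

    `responses` maps item number (1-10) to a rating from 1 to 7.
    """
    if set(responses) != set(range(1, 11)):
        raise ValueError("expected ratings for items 1-10")
    if any(not 1 <= rating <= 7 for rating in responses.values()):
        raise ValueError("ratings must lie between 1 and 7")

    def keyed(item):
        # Reverse-scored items are flipped on the 1-7 scale (1 <-> 7, 2 <-> 6, ...).
        rating = responses[item]
        return 8 - rating if item in REVERSED_ITEMS else rating

    presence = sum(keyed(item) for item in PRESENCE_ITEMS)
    search = sum(keyed(item) for item in SEARCH_ITEMS)
    return presence, search


# A respondent who answers 7 to every item, including the reversed item 9.
example = {item: 7 for item in range(1, 11)}
print(score_mlq(example))
```

Because item nine is flipped before summing, a uniform "Absolutely True" response pattern does not reach the maximum Presence score.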
2.2.2. Satisfaction with Life Scale (SWLS)
The Satisfaction with Life Scale (Diener, Emmons, Larsen, & Griffin, 1985) is a five-item measure of perceived life satisfaction. Its items concern cognitive appraisals of one’s life (e.g., “The conditions of my life are excellent”) and are rated on a 7-point Likert scale, from 1 = strongly disagree to 7 = strongly agree. The higher the score, the greater the perceived satisfaction. Possible scores range from 5 (lowest satisfaction with life) to 35 (highest satisfaction with life). Most people score between 23 and 28 (Pavot & Diener, 1993). The SWLS has been used both in community samples (cf. Pavot & Diener, 2008) and clinical samples (Arrindell, Meeuwesen, & Huyse, 1991). Internal consistency reliability (Cronbach’s alpha) was satisfactory, ranging from .79 to .89 (Pavot & Diener, 1993; Adler & Fagley, 2005; Steger et al., 2006; Alfonso & Allison, 1992a). The SWLS is negatively correlated with depression (Blais, Vallerand, Pelletier, & Briere, 1989) and with negative affect (NA; Larsen, Diener, & Emmons, 1985).
2.2.3. Subjective Happiness Scale (SHS)
The SHS (Lyubomirsky & Lepper, 1999) is a brief and widely used self-report measure of the degree to which the respondent feels happy. Four items are rated on a 7-point Likert scale (from 1 = not a very happy person to 7 = very happy person). Higher scores suggest higher mean happiness. Lyubomirsky and Lepper reported internal consistency reliability ranging from .79 to .94 (M = .86). In the same study, test-retest reliability ranged from .55 to .90. Finally, convergent validity with well-being measures varied from .52 to .72.
2.2.4. Trait Hope Scale (HS)
The Trait Hope Scale (Snyder et al., 1991) is a 12-item, self-report measure of dispositional hope (e.g., “I can think of many ways to get out of a jam”). Items tap two factors: Agency and Pathways (confidence in pursuing goals and the ability to pursue them, respectively). Agency and Pathways represent distinct but related aspects of hope (Bronk et al., 2009). Responses are rated on an 8-point Likert scale, from 1 (Definitely False) to 8 (Definitely True); since four items are fillers, scores can range from a low of 8 to a high of 64. Generally, higher scores suggest a greater sense of hope. In the older version of the HS (Snyder et al., 1991), responses were rated on a four-point Likert scale (1 = Definitely False, 4 = Definitely True). Snyder et al. (1991) reported that for the total scale, Cronbach’s alphas varied from .74 to .84.
2.2.5. Depression Anxiety Stress Scale (DASS)
The DASS (Lovibond & Lovibond, 1995) measures emotional distress in three 7-item factors, namely depression (e.g., “I couldn’t seem to experience any positive feeling at all”), anxiety (e.g., “I was aware of dryness of my mouth”) and stress (e.g., “I found it hard to wind down”). The 21 items are rated on a four-point Likert scale assessing intensity/frequency of distress (from 0 = did not apply to me at all to 3 = applied to me very much, or most of the time) over the past week. The higher the score, the more intense or frequent the emotional distress. Each factor has a distinct score ranging from 0 to 21. Scores greater than 14, 10 and 17 suggest extremely severe Depression, Anxiety and Stress respectively (Lovibond & Lovibond, 1995). Internal consistency reliability was reported as α = .97 for adults of the general population (Henry & Crawford, 2005), and for each factor alphas ranged between .81 and .97 (McDowell, 2006, cited in Yusoff, 2013).
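As an illustration of how the cutoffs quoted above apply to the three subscale scores, the following sketch flags scores in the "extremely severe" range. The function and constant names are hypothetical, and only the three thresholds and the 0 to 21 range stated in the text are used; the intermediate severity bands are not reproduced here.

```python
# Thresholds quoted in the text: scores *greater than* these values
# suggest extremely severe levels on the corresponding subscale.
EXTREME_CUTOFFS = {"depression": 14, "anxiety": 10, "stress": 17}


def flag_extremely_severe(scores):
    """Map each subscale name to True if its score exceeds the cutoff."""
    flags = {}
    for scale, cutoff in EXTREME_CUTOFFS.items():
        score = scores[scale]
        if not 0 <= score <= 21:
            raise ValueError(f"{scale} score must lie between 0 and 21")
        flags[scale] = score > cutoff
    return flags


print(flag_extremely_severe({"depression": 15, "anxiety": 10, "stress": 3}))
```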
2.2.6. The Connor-Davidson Resilience Scale (CD-RISC)
The Connor-Davidson Resilience Scale (CD-RISC; Connor & Davidson, 2003) includes 25 items measuring psychological resilience (e.g., “Can handle unpleasant feelings”). Items are rated on a 5-point Likert scale, from “not true at all” (0) to “true nearly all of the time” (4). The perceived emotional states are rated over the past month. Possible scores range from 0 to 100. Higher scores suggest greater resilience, and scores higher than 92 suggest high-resilience individuals. Connor & Davidson (2003) reported a Cronbach’s alpha of .89 for the entire scale. The CD-RISC was primarily developed to measure stress-coping ability. Therefore, it is negatively correlated with perceived stress and positively correlated with social support (Connor & Davidson, 2003).
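The score ranges reported for the summed instruments above follow directly from the number of scored items and the rating anchors. The short check below recomputes them; it is only a sanity check on the arithmetic, and the dictionary and function names are hypothetical.

```python
# (number of scored items, lowest rating, highest rating), as given in the text.
INSTRUMENTS = {
    "MLQ subscale": (5, 1, 7),
    "SWLS": (5, 1, 7),
    "Trait Hope Scale": (8, 1, 8),  # 12 items, of which 4 fillers are not scored
    "DASS subscale": (7, 0, 3),
    "CD-RISC": (25, 0, 4),
}


def score_range(n_items, low, high):
    """Lowest and highest possible total for a simple summed scale."""
    return n_items * low, n_items * high


for name, spec in INSTRUMENTS.items():
    print(name, score_range(*spec))
```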
Data were collected with the help of psychology students who voluntarily administered the test battery to 15 adult persons of their social environment. About 100 students participated in the study receiving extra credit. All participants were voluntarily recruited by the students on the condition they were older than 18 years. A letter was included in the test battery to inform participants about the purpose of this study. More specifically, the following process took place. First, students received a training course on the administration of psychology questionnaires by the research-team members. Then, a period of pilot-testing the battery followed to track any ambiguities. During pilot testing, the time needed to complete the test battery was estimated (approximately 20 minutes). Finally, each student was supplied with 15 copies of the test battery in paper and pencil form to administer them to adults in their social environment individually.
2.4. Design of the Research
The sample was split into three parts to study the construct validity of the MLQ in different samples. More specifically, all analyses were carried out on two levels: 1) on three sub-samples (EFA, CFA1 and CFA2) to examine construct validity and cross-validate it; 2) on the entire sample (Total sample), to evaluate measurement invariance across gender, internal consistency reliability and convergent/discriminant validity. In the first sample (EFA Sample), Exploratory Factor Analysis and Bifactor Exploratory Factor Analysis were carried out. Independent Cluster Model Confirmatory Factor Analysis (ICM-CFA), Bifactor Confirmatory Factor Analysis and Exploratory Structural Equation Modeling Analysis followed in the second sample (CFA1 Sample), testing seven alternative solutions. The third sample (CFA2 Sample) was used for cross-validation of the optimal CFA model established from the second sample. Then, a multi-group CFA (MGCFA) was carried out in the entire sample (N = 1561) to test for the measurement invariance of the MLQ across gender. Reliability analysis followed in the entire sample to examine internal consistency. Finally, the relation of the MLQ to well-being (namely to life satisfaction and subjective happiness), trait hope, depression, anxiety, stress, and resilience was examined in the total sample. Data collected were coded and analyzed with SPSS Version 25.0 (IBM, 2017), Stata Version 14.2 (StataCorp, 2015) and MPlus Version 7.0 (Muthen & Muthen, 2012).
3.1. Data Screening
The total sample comprised N = 1561 cases. Missing values did not exceed 2% in any variable. Missing data analysis followed to examine whether values were missing completely at random (MCAR). Little’s MCAR test (Little, 1988) was not significant, Chi-Square (14,972, N = 1561) = 15,128.87, p = .182, suggesting that values were missing completely at random. Thus, missing values in the dataset were estimated with the Expectation-Maximization (EM) algorithm. This method assumes a distribution for the missing values and makes likelihood-based inferences under that distribution (IBM, 2016). EM then computes a matrix of means and covariances that is used to estimate the missing values (Soley-Bori, 2013).
To validate the MLQ factor structure, the total sample (N = 1561) was randomly split into three parts (NEFA = 313, NCFA1 = 624, NCFA2 = 624) with the random number generator algorithm of SPSS version 25 (IBM, 2017). Care was taken to retain an adequate sample size in all three subsamples (EFA, CFA1, CFA2). A sample-to-variable ratio of 10:1 (Osborne & Costello, 2005; Singh et al., 2016) or, alternatively, more than 300 cases are generally considered adequate for factor analysis (Tabachnick & Fidell 1996; Comrey & Lee, 1992). With 10 MLQ items, the sample-to-variable ratios for the EFA sample (N = 313), CFA1 sample (N = 624) and CFA2 sample (N = 624) were 31.3, 62.4 and 62.4 respectively. Sample-splitting (Guadagnoli & Velicer, 1988; MacCallum, Browne & Sugawara, 1996) is generally used as a cross-validation method (Byrne, 2010; Brown, 2015).
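A random three-way split of this kind, together with the resulting sample-to-variable ratios, can be sketched as follows. This is not the SPSS procedure used in the study: the seed, the case identifiers and the function name are assumptions made purely for illustration.

```python
import random


def split_sample(case_ids, sizes, seed=2016):
    """Shuffle case ids and cut them into consecutive, non-overlapping subsamples."""
    case_ids = list(case_ids)
    if sum(sizes.values()) != len(case_ids):
        raise ValueError("subsample sizes must add up to the total N")
    random.Random(seed).shuffle(case_ids)  # arbitrary seed, for reproducibility only
    parts, start = {}, 0
    for name, size in sizes.items():
        parts[name] = case_ids[start:start + size]
        start += size
    return parts


SIZES = {"EFA": 313, "CFA1": 624, "CFA2": 624}
N_ITEMS = 10  # the MLQ has ten observed variables

parts = split_sample(range(1, 1562), SIZES)
ratios = {name: len(ids) / N_ITEMS for name, ids in parts.items()}
print(ratios)
```

Each ratio is well above the 10:1 guideline cited in the text, and each subsample exceeds 300 cases.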
3.2. Univariate and Multivariate Normality
The data in all four samples (Total, EFA, CFA1 and CFA2) violated the assumption of univariate normality. Specifically, Kolmogorov-Smirnov tests (Massey, 1951) on each item of the MLQ were statistically significant for all 10 items (p < .001). Four tests were used to examine multivariate normality: 1) Mardia’s multivariate kurtosis test (Mardia, 1970); 2) Mardia’s multivariate skewness test (Mardia, 1970); 3) Henze-Zirkler’s consistent test (Henze & Zirkler, 1990), and 4) Doornik-Hansen omnibus test (Doornik & Hansen, 2008). All four tests rejected the null hypothesis (all p < .0001), suggesting a violation of multivariate normality of the MLQ scores in all samples.
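For reference, Mardia's two statistics can be written out as below. This is the standard textbook formulation, not a reproduction of the software output used in the study; here n is the sample size, p the number of items (10 for the MLQ), and S the maximum-likelihood covariance matrix (divisor n).

```latex
\begin{align*}
b_{1,p} &= \frac{1}{n^{2}} \sum_{i=1}^{n} \sum_{j=1}^{n}
  \left[ (x_i - \bar{x})^{\top} S^{-1} (x_j - \bar{x}) \right]^{3},
  & \frac{n\, b_{1,p}}{6} &\;\overset{a}{\sim}\; \chi^{2}_{\,p(p+1)(p+2)/6}, \\
b_{2,p} &= \frac{1}{n} \sum_{i=1}^{n}
  \left[ (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x}) \right]^{2},
  & \frac{b_{2,p} - p(p+2)}{\sqrt{8p(p+2)/n}} &\;\overset{a}{\sim}\; N(0, 1).
\end{align*}
```

With p = 10, the skewness statistic is referred to a chi-square distribution with 220 degrees of freedom, and the expected kurtosis under multivariate normality is p(p + 2) = 120.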
3.3. Establishing Construct Validity with Exploratory Factor Analysis (EFA)
For both EFA and CFA, MPlus (Muthen & Muthen, 2012) uses robust rescaling-based estimators (like robust MLR). MLR is appropriate for non-normal distributions, estimating standard errors and chi-square test statistics. Finally, MLR can handle small to medium-sized samples ( Bentler & Yuan, 1999 ; Muthen & Asparouhov, 2002 as quoted in Wang & Wang, 2012 ). Considering all the above properties, the robust MLR was used as an estimator for EFA and CFA.
Two exploratory factor analyses were executed in the EFA sample (N = 313): 1) a standard EFA, and 2) a Bifactor EFA (Schmid & Leiman, 1957). MLR was used for parameter estimation, with Geomin factor rotation for the standard EFA and Bi-Geomin factor rotation for the Bifactor EFA. The standard EFA was carried out to establish a factor structure and provide a baseline model for comparison with the Bifactor EFA model tested subsequently. Generally, relying on an EFA measurement model is a prerequisite for examining construct-relevant multidimensionality (Morin et al., 2016a, cited in Howard et al., 2016). Regarding bifactor analysis, Reise et al. (2007) recommended that a bifactor model always be tested when checking the dimensionality of a construct (cited in Hammer & Toland, 2016). The Schmid-Leiman method (Schmid & Leiman, 1957) has been used to test bifactor CFA models. Nevertheless, Jennrich & Bentler (2011) recently introduced the Exploratory Bifactor Analysis method used in this study. EFA model fit was evaluated by the following criteria (Hu & Bentler, 1999; Brown, 2015): RMSEA (≤.06, 90% CI ≤ .06), SRMR (≤.08), CFI (≥.95), TLI (≥.95), and a chi-square/df ratio of less than 3 (Kline, 2016). Fit statistics (see Table 1) suggested that both models achieved acceptable fit to the data, with CFI and TLI > .95 and SRMR slightly better for the Bifactor solution. Both chi-square/df ratios were < 3, indicating adequate fit to the data (Kline, 2016).
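The fit criteria listed above amount to a simple checklist that can be applied to any fitted model. The sketch below is an illustration only: the function name is hypothetical, the "90% CI ≤ .06" criterion is read here as a condition on the upper confidence bound, and the example values are invented rather than taken from Table 1.

```python
def evaluate_fit(chi2, df, rmsea, rmsea_ci_upper, srmr, cfi, tli):
    """Check each fit index against the cutoffs listed in the text."""
    return {
        "chi2/df < 3": chi2 / df < 3,
        "RMSEA <= .06": rmsea <= 0.06,
        "RMSEA 90% CI upper <= .06": rmsea_ci_upper <= 0.06,
        "SRMR <= .08": srmr <= 0.08,
        "CFI >= .95": cfi >= 0.95,
        "TLI >= .95": tli >= 0.95,
    }


# Hypothetical fit statistics, not results from this study.
checks = evaluate_fit(chi2=52.0, df=26, rmsea=0.045, rmsea_ci_upper=0.058,
                      srmr=0.030, cfi=0.980, tli=0.970)
for criterion, passed in checks.items():
    print(f"{criterion}: {'met' if passed else 'not met'}")
```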
Table 2 contains the Geomin and bi-Geomin factor loadings for both EFA solutions tested. Factor loadings for the standard EFA ranged from .493 (item 9) to .833 (item 4) for the Presence factor and from .633 (item 10) to .824 (item 3) for the Search factor. All items had significant primary factor loadings on the intended factor, with no cross-loading items. The Geomin factor correlation between the Presence factor and the Search factor was .149 (p < .05). By contrast, Steger et al. (2006) reported a weak negative correlation between the two MLQ factors. In the bifactor solution, all factor loadings were much higher on the general factor than on the intended specific group factor, raising the possibility that the MLQ items measure a higher-order construct of meaning. The marginally better fit statistics of the bifactor solution further support this suggestion. However, bifactor models always tend to support unidimensionality (Joshanloo, Jose, & Kielpikowski, 2017). Thus, we decided to examine the MLQ factor structure further with Confirmatory Factor Analysis.
Table 1. Model fit statistics for EFA and EFA bifactor.
Table 2. Factor loadings & factor correlations for the EFA models tested.
Bold indicates significant primary loading in EFA and weaker factor loadings for Specific factors in Bifactor EFA. *Significant at 5% level.
3.4. Confirming Construct Validity with Confirmatory Factor Analysis (CFA)
MLR was also used to estimate the models in all Confirmatory Factor Analyses. Goodness of fit was evaluated by the Standardized Root Mean Square Residual (SRMR), the Root Mean Square Error of Approximation (RMSEA, 90% CI), the Comparative Fit Index (CFI), and the Tucker-Lewis index (TLI). Model fit was evaluated by the following criteria (Hu & Bentler, 1999; Brown, 2015): RMSEA (≤.06, 90% CI ≤ .06), SRMR (≤.08), CFI (≥.95), TLI (≥.95), and a chi-square/df ratio of less than 3 (Kline, 2016). The models evaluated were the following: 1) a single-factor Independent Cluster Model of CFA (ICM-CFA), which is generally recommended (Brown, 2015; Crawford & Henry, 2004) to examine the hypothesis of maximum parsimony; 2) a two-factor ICM-CFA model proposed by Steger et al. (2006). A variation of this model was also tested by adding a covariance between items 2 and 3 and a covariance between items 7 and 8; 3) a CFA bifactor model (Schmid & Leiman, 1957; cf. Reise, 2012), with presence and search as two factors and all items simultaneously tapping a general factor of life meaning, according to Reise et al. (2007, as quoted in Hammer & Toland, 2016). A variation of this model was also tested by adding a covariance between items 2 and 3 and a covariance between items 7 and 8; 4) a two-factor Exploratory Structural Equation Model (ESEM; Asparouhov & Muthen 2009) with all 10 MLQ items loading on the two MLQ factors simultaneously. A variation of this model was also tested by adding a covariance between items 2 and 3 and a covariance between items 7 and 8. ESEM produces less biased and more accurate models than ICM-CFA because secondary loadings are not constrained to zero as they are in ICM-CFA (Asparouhov & Muthen, 2009). Therefore, in ESEM models factor correlations are more accurate even with trivial secondary loadings (Howard, Gagne, Morin, Wang & Forest, 2016).
We did not test a higher order model. For a two first-order factorial structure, like MLQ, evaluating if the second-order factor improves the model fit when compared to the first-order solution is not possible because of under-identification of the higher order model (Wang and Wang, 2012) . The fit of all seven alternative CFA models for each sample is presented in Table 3.
Regarding model fit, the single-factor ICM-CFA (simple CFA) model performed poorly. The two-factor ICM-CFA model presented almost acceptable fit, both with and without covariances. The bifactor models showed notably better fit than the two-factor models, with all fit statistics within acceptable limits (see Table 3). Among them, the bifactor model with covariances showed the best fit to the data. However, the ESEM two-factor models also presented equally adequate fit, with all fit measures notably above the fit criteria. Unsurprisingly, the ESEM two-factor model with covariances showed better fit than the equivalent ESEM model without covariances. Consequently, two competing optimal models emerged: 1) the two-factor bifactor model with covariances, and 2) the two-factor ESEM model with covariances (see all fit statistics in Table 3 and Figure 1 for path diagram).
Table 3. Model Fit statistics for the CFA and ESEM models tested with the MLR estimator in the confirmation sample.
Figure 1. A path diagram of the two optimal models found: ESEM model (Left) and CFA bifactor model (right). Conventionally, latent factors are represented by circles, errors as small arrows pointing on rectangles that represent manifest variables. Single-headed arrows connecting the variables represent a causal path while double-headed arrows on latent variables denote correlation. Double headed arrows between manifest variables denote error covariance.
However, like ICM-CFA methods, bifactor analysis ignores cross-loadings, resulting in a general factor with an overestimated variance (Morin et al., 2016a). Even trivial, unaccounted secondary loadings can inflate factor correlations, leading to misspecification (Marsh et al., 2014; Asparouhov & Muthen 2009; Howard et al., 2016). Therefore, unidimensionality based only on bifactor analysis is often questionable (Joshanloo et al., 2017; Joshanloo & Jovanovic, 2016). Thus, we suggest that the two-factor ESEM model with covariances is preferable to the bifactor model with covariances (see Figure 1 for more details). Finally, it should be noted that in the ICM-CFA model tested, the correlation between the Presence and Search factors was weak but negative (−0.10), in line with Steger et al. (2006).
3.5. Cross-Validation CFA in a Different Sample
After determining the optimal model, cross-validation of this model followed (Byrne, 2010; Brown, 2015) to test whether its fit held in a different sub-sample of the dataset (N = 624). Cross-validation of the optimal two-factor ESEM model with covariances was implemented using the MLR estimator (Muthen & Muthen, 2012). As shown in Table 4, fit statistics showed an adequate fit to the cross-validation sample, with all measures within acceptable limits. Additionally, all fit measures in the CFA2 sample had values comparable to the corresponding fit measures of the model that emerged in the CFA1 sample (see Table 4 for comparison).
Table 4. Model Fit comparison for the optimal two-factor ESEM model with covariances in a validation and a cross-validation sample.
3.6. Measurement Invariance
To test for measurement invariance across gender groups, the baseline two-factor ESEM model with covariances was first tested separately in each gender group (males, N = 599 vs. females, N = 962). To compare nested models for invariance across gender, we used the ΔCFI and ΔRMSEA criteria proposed by Cheung & Rensvold (2002). The model fitted the data very well in females (Chi-square = 80.82, p < .0001, RMSEA = .051, CFI = .981) and equally well in males (Chi-square = 65.09, p < .0001, RMSEA = .055, CFI = .975). To evaluate measurement invariance, the model was then tested in both gender groups simultaneously. This model (M1) showed acceptable fit (see Table 5), suggesting that configural invariance was supported. Equality constraints were then imposed on all factor loadings across the two gender groups. As shown in Table 5, both ΔCFI and ΔRMSEA in this constrained model (M2) suggested full metric invariance. Finally, all intercepts were constrained to be equal (M3), and both ΔCFI and ΔRMSEA showed full scalar invariance. Thus, we conclude that the MLQ is invariant across gender.
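The sequence of nested comparisons (configural, metric, scalar) can be illustrated with a small helper that computes ΔCFI and ΔRMSEA between successive models. The cutoffs used here (a CFI drop of no more than .01 and an RMSEA increase of no more than .015) are commonly cited conventions and are an assumption, since the text does not state the exact thresholds applied; the fit values are invented and do not come from Table 5.

```python
def invariance_steps(models, max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Compare each model with the previous, less constrained one.

    `models` is an ordered list of (name, {"cfi": ..., "rmsea": ...}) pairs.
    Returns a list of (name, delta_cfi, delta_rmsea, supported) tuples.
    """
    results = []
    for (_, previous), (name, current) in zip(models, models[1:]):
        delta_cfi = current["cfi"] - previous["cfi"]
        delta_rmsea = current["rmsea"] - previous["rmsea"]
        supported = delta_cfi >= -max_cfi_drop and delta_rmsea <= max_rmsea_rise
        results.append((name, round(delta_cfi, 3), round(delta_rmsea, 3), supported))
    return results


# Hypothetical fit values for the three nested models (assumed, not from Table 5).
hypothetical_models = [
    ("M1 configural", {"cfi": 0.980, "rmsea": 0.050}),
    ("M2 metric", {"cfi": 0.979, "rmsea": 0.048}),
    ("M3 scalar", {"cfi": 0.975, "rmsea": 0.049}),
]
for step in invariance_steps(hypothetical_models):
    print(step)
```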
3.7. Internal Consistency Reliability
Internal consistency reliability of the MLQ was evaluated with Cronbach’s alpha coefficient (Cronbach, 1951). Values ≥ .70 are generally considered acceptable and ≥ .80 adequate (Kline, 2016; Nunnally & Bernstein, 1995; Nunnally, 1978). Alpha coefficients for the total MLQ, the Presence factor and the Search factor were .76, .85, and .86 respectively. Taken together, these results (values from .76 to .86) suggest that the scale shows adequate internal consistency.
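For readers unfamiliar with the coefficient, the following sketch computes Cronbach's alpha from raw item scores using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the toy data are illustrative assumptions and are unrelated to the study's dataset.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: four respondents answering three items.
toy_scores = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]
print(round(cronbach_alpha(toy_scores), 3))
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.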
Table 5. MLQ Measurement invariance of optimal 2-factor ESEM model across gender.
3.8. Convergent and Discriminant Validity
Convergent and discriminant validity was assessed using the total sample (N = 1561). The Presence of meaning was correlated with search for meaning with a positive, weak and non-significant correlation (r = .23). The correlations of Presence of meaning with measures of well-being were positive and significant, ranging from weak (r = .35, p < .01, with Hope pathways) to moderate (r = .53 with life satisfaction, p < .01), M = .45. Similarly, the presence of meaning had positive and significant correlations (all ps < .01) with the dimensions of the CD-RISC, varying from weak (r = .23, Spiritual influences) to moderate (r = .55, control), M = .39. As expected, the correlations of presence of meaning with Depression, Anxiety and Stress were all negative and weak but significant (ps < .01), from r = −.28 with Stress to r = −.40 with Depression, M = −.33. Search for meaning had a weak, non-significant negative correlation with happiness. Search for meaning had no correlation with life satisfaction and weak but significant positive correlations with both Agency and Pathways (r = .12, ps < .01). The correlations of search for meaning with the CD-RISC dimensions were also positive and significant (p < .01) but equally faint, from .07 to .15, M = .11. The same was true for Search for meaning and Stress, Anxiety and Depression, with very weak, positive correlations (M = .06) of mixed significance (see Table 6 for details).
3.9. Descriptive Statistics of MLQ Scores
In comparison with the scores of the Search factor (M = 23.08, SD = 6.99), the Presence factor scores were higher and less variable (M = 25.69, SD = 5.73). Most participants scored high on both the Presence factor and the Search factor (>24), followed by those who scored high on the Presence factor (>24) but not equally high on the Search factor (see Table 7 for details).
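The four respondent groups implied by the >24 cutoff on each factor can be tabulated with a short helper such as the one below. The group labels are descriptive placeholders rather than terminology taken from Steger (2010), and the scores in the example are invented.

```python
from collections import Counter


def mlq_group(presence, search, cutoff=24):
    """Assign a respondent to one of four groups using the >24 cutoff."""
    presence_level = "high" if presence > cutoff else "lower"
    search_level = "high" if search > cutoff else "lower"
    return f"{presence_level} presence / {search_level} search"


# Invented (presence, search) score pairs, for illustration only.
hypothetical_scores = [(30, 28), (27, 20), (18, 31), (22, 15), (26, 25)]
group_counts = Counter(mlq_group(p, s) for p, s in hypothetical_scores)
for group, count in group_counts.most_common():
    print(group, count)
```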
The purpose of this study was to evaluate the following: 1) the construct validity of the Greek version of the Meaning in Life Questionnaire (Steger et al., 2006), using different exploratory and confirmatory factor analysis approaches such as Bifactor EFA, ICM-CFA, Bifactor CFA and ESEM; 2) the measurement invariance of the MLQ across gender; 3) the internal consistency reliability of the MLQ; and 4) the convergent and discriminant validity of the MLQ with measures of well-being and mental distress. The results suggested that the data fit the ESEM representation better than both the ICM-CFA model and the bifactor model.
More specifically, the sample was split into three different sub-samples (Guadagnoli & Velicer, 1988; MacCallum et al., 1996) , maintaining enough sample power in each sample to ensure robustness of the models found. Generally, sample-splitting is used as a cross-validation method (Brown, 2015; Byrne, 2010) . Regarding sample power, all three samples were far beyond the suggested 10 cases for each observed variable threshold (Osborne & Costello, 2005; Singh et al., 2016) and larger than 300 or 500 cases (Tabachnick & Fidell 1996; Comrey & Lee, 1992), suggested as sufficient sample size for factor analysis.
Table 6. Bivariate correlation of MLQ on the total sample.
**p < .01; *p < .05.
Table 7. Summary statistics for MLQ score for the total sample.
First, two alternative models were tested using exploratory factor analysis (EFA): 1) a standard EFA to examine the factor structure and to provide a baseline model for EFA comparisons, and 2) a bifactor EFA. Both solutions showed adequate fit to the data, with all fit indices within the acceptability criteria and all factor loadings significant and on their intended factor, with no exceptions. The fit of the two models was very comparable, with the bifactor model showing an overall better fit than the standard EFA model. This does not necessarily suggest that a predominant general meaning-in-life factor is present, since conclusions about unidimensionality based only on bifactor analysis are unstable (Joshanloo et al., 2017; Joshanloo & Jovanovic, 2016).
To examine this assumption further, a confirmatory factor analysis (CFA) followed in a second sample to verify the models that emerged from the EFA and the bifactor EFA. All CFA models with error covariances fit better than their counterparts without covariances. Two error covariances were added between items of the Search factor (see Figure 1): one between item two (“I am looking for something that makes my life feel meaningful”) and item three (“I am always looking to find my life’s purpose”), and one between item seven (“I am always searching for something that makes my life feel significant”) and item eight (“I am seeking a purpose or mission for my life.”). Finally, two CFA models showed optimal fit among the seven alternative models tested: 1) the two-factor ESEM model with covariances, and 2) the two-factor bifactor model with covariances. Of these two approaches, bifactor analysis is a CFA subcategory that allocates the variance of items to both a general factor and sub-factors. Each item is specified to load on the general factor and also on its target group factor (Reise, 2012), here “Presence of meaning” and “Search for meaning”. Bifactor analysis allows for an examination of the common variance shared by the two MLQ factors and the unique variance specific to each of them.
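The bifactor specification described above can be written out as a worked equation. The notation is introduced here for illustration and is not taken from the original analysis; it assumes the standard bifactor conventions of standardized, mutually orthogonal factors.

```latex
% Item x_i loads on the general factor G and on exactly one group factor F_{k(i)},
% where k(i) is either P (Presence of meaning) or S (Search for meaning).
\begin{align}
  x_i &= \lambda_i^{G}\, G + \lambda_i^{k(i)}\, F_{k(i)} + \varepsilon_i,
        \qquad k(i) \in \{P, S\}, \\
  \operatorname{Var}(x_i) &= \bigl(\lambda_i^{G}\bigr)^{2}
        + \bigl(\lambda_i^{k(i)}\bigr)^{2} + \theta_i .
\end{align}
% Assumed: Var(G) = Var(F_P) = Var(F_S) = 1, all factors uncorrelated with each
% other and with the residuals, and \theta_i = Var(\varepsilon_i).
```

Under these assumptions, the first term of the variance decomposition is the common variance shared across both MLQ factors, the second is the variance specific to the item's own factor, and the third is residual variance.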
However, bifactor analysis has received some criticism (Reise et al., 2013; Reise, 2012; Joshanloo et al., 2017). More specifically, relying solely on the results of bifactor analysis to decide whether a psychological scale is unidimensional or multidimensional may be questionable (Joshanloo et al., 2017). Additionally, constraining non-zero cross-loadings to zero can inflate the variance attributed to the general factor in bifactor analysis (Morin et al., 2016; Joshanloo et al., 2017). Given these considerations and the frequency of non-trivial secondary loadings in construct validation, bifactor analysis is expected, more often than not, to support unidimensionality (Joshanloo & Jovanovic, 2016). With all these limitations considered, we suggest that the two-factor ESEM model with covariances is preferable to the bifactor model.
Additionally, in the CFA, a noteworthy difference was found between this study and previous research. In contrast to this study, previous research generally reported relatively high RMSEA values (Damasio & Koller, 2015; Steger et al., 2006). The improvement in RMSEA values in this study relative to previously reported empirical research could possibly be attributed to the ESEM and bifactor techniques used here.
The MLQ factors in the EFA were positively but weakly correlated, at the 5% level of significance. However, the correlation of the Presence factor with the Search factor in the ICM-CFA model tested was weak and marginally negative. This difference in the relationship of the two factors between EFA and CFA may reflect the mixed influences of both collectivistic and individualistic cultures (Hofstede, 2001; Triandis, 1995) on the Greek sociocultural context, since Greece (along with Cyprus) is situated on the south-eastern border of Europe. This finding is compatible with previous research suggesting that in collectivistic cultures (e.g., Japan: Steger et al., 2008c; China: Wang & Dai, 2008) presence of meaning and search for meaning are positively correlated, whereas the opposite is true in individualistic cultures (e.g., US: Steger et al., 2006). Equally, ways of finding personal meaning and the relationship between Search and Presence could be affected by cognitive orientation at the individual level (Steger et al., 2008c) and by differences in social orientation across cultures at the group level (Boyraz et al., 2013).
The results of measurement invariance testing (cf. Brown, 2015; Byrne, 2010; Kline, 2016; Schumacker & Lomax, 2015) of the cross-validated model across gender suggested that all 10 items of the MLQ were invariant (full invariance) when used by both males and females. In other words, results from this multi-group CFA suggest that, across the two genders, 1) the pattern of fixed and free parameters of the MLQ was equivalent (configural invariance); 2) corresponding factor loadings of the 10 MLQ items were comparable (metric invariance or weak factorial invariance); and 3) corresponding indicator means (intercepts) were equivalent (full scalar invariance or strong factorial invariance). Furthermore, the results supporting the measurement invariance of the MLQ across gender add to previous research findings in several diverse cultures (e.g., Damasio & Koller, 2015; Boyraz et al., 2013). According to Damasio & Koller (2015), invariance is a significant quality indicator for the MLQ, enabling valid group comparisons between genders, free from response bias.
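The three nested levels of invariance can be summarized in equation form. The notation below is introduced for illustration only and follows the common multi-group CFA measurement model, with group index g taking the values m (males) and f (females).

```latex
% Measurement model for item i in gender group g, with factors \xi_{kg}
% (k = Presence, Search), intercepts \tau_{ig} and loadings \lambda_{ikg}.
\begin{align}
  x_{ig} &= \tau_{ig} + \sum_{k} \lambda_{ikg}\, \xi_{kg} + \varepsilon_{ig},
           \qquad g \in \{m, f\} \\[4pt]
  \text{Configural:}\quad & \text{same pattern of fixed and free } \lambda_{ikg}
           \text{ in both groups} \\
  \text{Metric (weak):}\quad & \lambda_{ik,m} = \lambda_{ik,f}
           \quad \text{for all } i, k \\
  \text{Scalar (strong):}\quad & \lambda_{ik,m} = \lambda_{ik,f}
           \ \text{ and } \ \tau_{i,m} = \tau_{i,f} \quad \text{for all } i, k
\end{align}
```

Each level adds equality constraints to the previous one, which is why scalar invariance is the level usually required before latent means are compared across groups.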
Additionally, the internal consistency reliability of the Greek MLQ is adequate. Specifically, Cronbach’s alpha values were comparable to the results reported by Steger et al. (2006) and by other studies (e.g., Steger & Samman, 2012; Chan, 2014; Jiang, Bai, & Xue, 2016; Steger et al., 2008c). Lastly, the convergent and discriminant validity of the Greek MLQ was also examined. Expected correlations were found between the MLQ and dimensions of well-being, hope, resilience, stress, anxiety and depression. Presence of meaning had significant correlations with all the above measures, with the magnitude of the relationships ranging from low to strong. In contrast, the relationships of Search for meaning with the above constructs were weak and of mixed significance. The results of the correlation analysis were expected, since the MLQ was reported to have overlapping content with other related variables (Steger et al., 2006). The Presence of meaning factor differed from the Search for meaning factor in both the magnitude and the direction of its relationships with the other constructs tested. This is not surprising, since the two factors of the MLQ are reported to have a weak, negative correlation according to Steger et al. (2006).
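For readers unfamiliar with the coefficient, Cronbach's alpha can be computed from item-level responses as in the minimal sketch below. The function name and the response matrix are hypothetical (five respondents answering five items on a 1 to 7 scale); none of the numbers come from the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses to one five-item subscale (not real data).
responses = [
    [6, 7, 6, 5, 6],
    [3, 4, 3, 4, 3],
    [5, 5, 6, 5, 5],
    [2, 3, 2, 2, 3],
    [7, 6, 7, 6, 7],
]
alpha = cronbach_alpha(responses)
```

In practice the coefficient would be computed separately for the Presence and Search subscales, since each is scored as its own total.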
Finally, the general conclusion of this work is that the two-factor structure of the MLQ established by Steger et al. (2006) is confirmed in the Greek cultural environment, because all alternative two-factor models tested with different factor-analytic techniques (Bifactor EFA, ICM-CFA, Bifactor CFA and ESEM) showed adequate fit to the data. Nevertheless, the optimal model among all the two-factor models tested was the ESEM model with error covariances. Considering all the above findings, we conclude that the Greek version of the Meaning in Life Questionnaire is a valid and reliable measure for use in the Greek context. A second important finding of this research is that the Greek MLQ is gender equivalent; thus, it can be used without bias by both men and women.
This study has certain limitations that should be taken into consideration. First, trained psychology students were involved in the data collection process. The impact of this method, if any, is unknown. Consequently, any generalization to other populations should be made with caution. Second, missing values in the dataset were estimated with the Expectation-Maximization (EM) algorithm. Although EM is particularly appropriate for factor analysis, it is unclear whether this method assumes normally distributed data (IBM, 2016) or not (Soley-Bori, 2013).
Moreover, the error covariances used in the optimal model possibly suggest overlapping item content (Brown, 2015). A similar issue was reported by Damasio & Koller (2015) for the Brazilian version of the MLQ. Further research is therefore necessary to examine this issue in yet another sample and to evaluate whether it is a culture-specific effect. Future research could also evaluate newer confirmatory factor analysis techniques such as Bifactor ESEM. Finally, the invariance of the MLQ across age is another possible direction for future research. Construct validity is built over time, so multiple studies should be carried out on different samples to progressively build more robust evidence of construct validity (Crocker & Algina, 1986; Messick, 1995).
Despite the above limitations, the contribution of the present study is that it provides strong evidence for the construct validity, measurement invariance across gender, reliability and convergent/discriminant validity of the Greek version of the MLQ. Using Bifactor EFA, ICM-CFA, Bifactor CFA and ESEM factor analysis techniques, the Greek MLQ is confirmed to be a valid, reliable and gender-equivalent measure of well-being.