Slavs are the most numerous Indo-European ethno-linguistic group in Europe (Kipfer, 2000). Their proposed homeland is in the middle Dnieper basin: the area north-east of the Carpathians, along the upper reaches of the Bug and Dniester rivers and mainly around the Pripet river (Mutafchiev, 1943; Rebala et al., 2007). From the 6th century AD they spread to inhabit the whole of Eastern Europe and parts of Central and South-Eastern Europe.
Roman writers mentioned Slavs as Venethi or Venedi (Curta, 2001). The Southern European Slavs were first named as Sclavenes by East Roman (Byzantine) authors in the 6th century AD (Džino, 2010; Smith, 2005).
Extant Slavic languages show great variety and fall into three groups: West (Czech, Slovak and Polish), East (Russian, Belarusian and Ukrainian) and South (Serbian, Croatian, Bulgarian and Slovenian). The Bulgarian literary language differs from other Slavic languages in the almost complete loss of grammatical case; the development of a definite article for nouns (in the form of a suffix added to the stem); analytical comparative and superlative forms (built with particles); and a complex tense system in which the infinitive is completely lost (Aepli, von Waldenfels, & Samardzic, 2014; Kushniarevich et al., 2015; Raykov, 2005).
Orthodox Christian Slavs use the Cyrillic alphabet, while Roman Catholic Slavs and Bosniaks use the Latin alphabet.
Genetic and genomic analyses of Slavs from different countries have been the object of many previously published studies; however, comparative genetic investigations among different Slavic groups are few (Grzybowski et al., 2007; Malyarchuk et al., 2008; Mielnik-Sikorska et al., 2013; Rebala et al., 2007). To the best of our knowledge, the most comprehensive study of Slavic genetic heritage to date (Kushniarevich et al., 2015) reveals that Slavic genetic diversity was formed through assimilation of preexisting regional genetic components and in situ gene pool shaping. It also identifies an apparent genetic homogeneity of the majority of West and East Slavs and a substantial genetic difference between them and South Slavs.
To contribute further to the understanding of the correlation between language and genetic origin in Slavs, the present study analyzes for the first time the matrilineal and patrilineal relationships among the populations of European Slavic-speaking countries and also illustrates their position in the European uniparental genetic landscape.
2. Materials and Methods
We collected previously published data on the frequencies of mitochondrial DNA (mtDNA) and Y-chromosome haplogroups in Slavic speaking and other European populations. In these studies, the mtDNA haplogroup assignment was based on partial or entire control region sequences and/or coding region markers, and the Y-chromosome haplogroup classification was performed by genotyping of informative biallelic markers. To analyze comparable results for a larger number of populations, the data were normalized to the highest possible level of phylogenetic resolution.
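The normalization step can be illustrated with a minimal sketch. It assumes that "the highest possible level of phylogenetic resolution" means the most detailed haplogroup labels that all source studies have in common, so that finer sub-haplogroups are summed into their parent clade before frequencies are computed. The parent links, haplogroup counts and function names below are hypothetical and are not taken from the supplementary tables.

```python
from collections import Counter

# Hypothetical parent links in a haplogroup tree (child -> parent); illustrative only.
PARENT = {"H1": "H", "H2": "H", "U5a": "U5", "U5b": "U5", "U5a1": "U5a"}


def collapse(counts, shared_labels, parent=PARENT):
    """Sum sub-haplogroup counts into the labels that every study reports."""
    merged = Counter()
    for haplogroup, n in counts.items():
        label = haplogroup
        # Walk up the tree until a shared label is reached (or no parent is known).
        while label not in shared_labels and label in parent:
            label = parent[label]
        merged[label if label in shared_labels else "other"] += n
    return dict(merged)


def to_frequencies(counts):
    """Convert absolute counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {haplogroup: n / total for haplogroup, n in counts.items()}


# Hypothetical counts: study A typed sub-haplogroups, study B only major clades.
study_a = {"H1": 30, "H2": 10, "U5a1": 8, "U5b": 2}
study_b = {"H": 45, "U5": 5}
shared = {"H", "U5"}

collapsed_a = collapse(study_a, shared)
collapsed_b = collapse(study_b, shared)
freq_a = to_frequencies(collapsed_a)
freq_b = to_frequencies(collapsed_b)
print(freq_a)
print(freq_b)
```

After collapsing, both hypothetical studies describe the same set of haplogroups, so their frequency profiles can be placed in one table and compared directly.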
We included 18 and 27 (sub-)populations in the mtDNA and Y-chromosome haplogroup analyses of Slavic speaking populations (Supplementary Table S1 and Table S2), and 41 and 42 (sub-)populations in the mtDNA and Y-chromosome haplogroup analyses of European populations (Supplementary Table S3 and Table S4), respectively.
The mtDNA and Y-chromosome relationships among the populations were depicted by Principal Component Analysis (PCA) performed using XLSTAT.
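For readers without access to XLSTAT, the kind of analysis described here can be sketched in plain Python. This is not the XLSTAT procedure: it is a covariance-based PCA computed by power iteration, whereas the settings used in XLSTAT (for example covariance versus correlation PCA) are not stated in the text. The population names and frequencies are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def normalize(vector):
    """Scale a vector to unit length; return None for a zero vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else None


def pca(rows, n_components=2, iterations=500):
    """Covariance PCA by power iteration; returns (scores, variance_ratios)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix of the haplogroup frequencies (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    total_variance = sum(cov[j][j] for j in range(p))

    components, eigenvalues = [], []
    for _component in range(n_components):
        v = normalize([1.0 / (j + 1) for j in range(p)])  # fixed start vector
        for _step in range(iterations):
            w = normalize(matvec(cov, v))
            if w is None:  # no variance left in this direction
                break
            v = w
        eigenvalue = sum(x * y for x, y in zip(v, matvec(cov, v)))
        components.append(v)
        eigenvalues.append(eigenvalue)
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
               for a in range(p)]

    scores = [[sum(c[j] * comp[j] for j in range(p)) for comp in components]
              for c in centered]
    ratios = [value / total_variance for value in eigenvalues]
    return scores, ratios


# Hypothetical haplogroup frequency profiles (each row sums to 1).
frequencies = {
    "Population A": [0.50, 0.30, 0.20],
    "Population B": [0.45, 0.35, 0.20],
    "Population C": [0.20, 0.30, 0.50],
    "Population D": [0.15, 0.35, 0.50],
}

scores, ratios = pca(list(frequencies.values()))
for name, (pc1, pc2) in zip(frequencies, scores):
    print(f"{name}: PC1={pc1:+.3f}, PC2={pc2:+.3f}")
print("Share of variance per component:", [round(r, 3) for r in ratios])
```

Each population receives one coordinate per principal component, and the share of variance per component corresponds to the percentages reported in brackets on the axes of a PCA plot. The sign of each component is arbitrary, so only the relative positions of populations are meaningful.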
3. Results and Discussion
The plot of the PCA performed on mtDNA haplogroup frequencies in Slavic speaking populations is presented in Figure 1. In the Slavic mtDNA landscape, the East Slavic populations (European Russians, Ukrainians) predominantly occupy areas with negative values of PC2. It should be noted that European Russians are separated into two distant groups: regions of the Central (Vladimir, Yaroslavl, Tula and Kaluga) and of the North-Western part (Pskov, Velikii Novgorod and Volot) of the country. Ukrainians and Belarusians are interspersed with West Slavic populations (Poles, Slovaks and Czechs). South Slavic speaking populations (Serbians, Bulgarians, Slovenians, Croatians and the populations of Bosnia and Herzegovina) are quite dispersed despite the comparatively small geographic area of the Balkan Peninsula. Nevertheless, they are positioned in an area of positive values of PC2.
Patrilineal relationships of Slavic speaking populations based on Y-chromosome haplogroup frequencies are depicted in the PCA plot in Figure 2. Unlike in the case of mtDNA, the PC analysis of Y-chromosome haplogroup frequencies shows clear-cut positions: South Slavic speaking populations (Bulgarians, Serbs, Bosnians, Bosnia-Serbs, Bosnia-Croats, Croats and Slovenians) are again widely dispersed, but all of them are located in the area of positive PC1 values. Most East Slavic speaking populations lie in the negative part of PC2. West Slavic speaking populations (Slovaks, Czechs and Poles) are in between South and East Slavs, but are closer to East Slavs, being interspersed among them. Certain Russian populations (Vologda, Arkhangel and Orel) are markedly distant.
In the PC analyses of both mtDNA and Y-chromosome haplogroup frequencies of Slavic speaking populations, South Slavs are quite dispersed. This is probably due to their different pasts. It is established that the ancestors of Croatians and Serbians migrated to the Balkan Peninsula in the 8th century AD from territories in Central Europe (White Croats and White Serbs) (Borri, 2011; Chadwick, 2014). The area from which they dispersed lies in the territory of the present-day West Slavic countries. The story of the contemporary Bulgarians is very different. South Slavic tribes (Sclavenes) and Proto-Bulgarians arrived almost simultaneously on the Balkan Peninsula in the 7th century AD, when the
Figure 1. PCA plot of Slavic speaking populations based on mtDNA haplogroup frequencies. The variance of the first and second principal components (F1 and F2, respectively) is given in brackets. Markers distinguish West, East and South Slavic speaking populations.
Figure 2. PCA plot of Slavic speaking populations based on Y-chromosome haplogroup frequencies. The variance retained by the first and second principal components (F1 and F2, respectively) is shown in brackets. Markers distinguish West, East and South Slavic speaking populations.
The position of the mtDNA haplogroup frequency profiles of Slavic speaking populations in the European context is represented in Figure 3. In this PCA plot, South Slavs (Bulgarians, Slovenians, Serbians, Croats and populations of Bosnia and Herzegovina) are grouped alongside Balkan (Northern Greeks and Romanians) and Northern Italian populations. West Slavic populations (Slovaks, Czechs and most of the Poles) are adjacent to some North European non-Slavic populations (from Finland and Sweden). The majority of East Slavic populations (from European Russia and Belarus) are scattered, while some European Russians (from Vladimir and Yaroslavl) and Ukrainians are close to West Slavic populations. Furthermore, Germanic and Romance speaking populations (from Germany and Austria, and from Iberia, France and Italy, respectively) are located separately, in the positive part of PC1.
The comparison of the Y-chromosome haplogroup frequencies in Slavic speaking and remaining European populations performed by PC analysis is given in Figure 4. Compared to the PCA of mtDNA haplogroup frequencies, it again shows a more clear-cut grouping of most South Slavs (Serbs, Bulgarians, Croats, Bosnia-Croats, Bosnia-Serbs, Bosnians) with neighboring populations from the Balkans (Romanians, Greeks and Macedonian Greeks). On the other hand, West (Czechs, Poles and Western Slovaks) and East Slavs (Ukrainians, Belarusians and European Russians) are located separately. Among the non-Slavic populations, Swedish Saami and Finns are almost outliers; the two populations from Germany almost overlap, whereas Italian populations form a cluster which embraces Catalonia.
In general, the obtained results show that, based on the distribution of mtDNA and Y-chromosome haplogroups, West and East Slavic speaking populations are located separately from South Slavic populations. Furthermore, in the European uniparental landscape South Slavic speaking populations are positioned closer to neighboring Balkan non-Slavic populations and North Italian populations than to other Slavic populations. This hints that the linguistic resemblance of South Slavic speaking populations to East and West Slavic groups is not paralleled to a similar extent by a genetic one, which is in line with previous findings demonstrating that the basis of the gene pool of West-East and South Slavic speaking populations is different (Kushniarevich et al., 2015).
When considering the uniparental diversity of Slavic speaking populations, one should highlight a peculiarity in the East Slavs’ gene pool, namely the prevalence of Y-chromosome haplogroup N, which is typical for certain Asian populations: China-Naxi―25%, China-Oroqen―28.5%, China-Tu―28.5%, Cambodians―16.7% (Sengupta et al., 2006), Mongols―8.7% (Derenko et al., 2007; Hammer et al., 2005), Manchurians―4.9% (Hammer et al., 2005), Koreans―4.3% (Hammer et al., 2005; Zhong et al., 2010), Vietnamese―2.8% (Hammer et al., 2005); Central Asian Turkic populations, such as Khakassians―50% and Altaians―9.2% (Derenko et al., 2007); and several Siberian populations: Khanty―81.5% (Mirabal et al., 2009), Yakuts―80.0% (Derenko et al., 2007), Buryats―30.9% (Hammer et al., 2005), Evenks―29.3%, Tuvinians―29.1% and Koryaks―25% (Derenko et al., 2007).
Figure 3. PCA plot of European populations based on mtDNA haplogroup frequencies. The variance captured by the first and second principal components (F1 and F2, respectively) is written in brackets. NE: northeast; NW: northwest. Markers distinguish West Slavic, East Slavic, South Slavic and non-Slavic populations.
Figure 4. PCA plot of European populations based on their Y-chromosome haplogroup frequencies. The variance explained by the first and second principal components (F1 and F2, respectively) is shown in brackets. Markers distinguish West, East and South Slavic speaking and non-Slavic populations.
In Europe, Y-chromosome haplogroup N shows a high frequency in Saami―44.7% (Karlsson et al., 2006). The frequency values for European Russians are striking: Arkhangel―41.1% (Balanovsky et al., 2008; Mirabal et al., 2009), Vologda―38.8%, Pskov―24.2%, Smolensk―14.0%, Kostroma―13.5% (Balanovsky et al., 2008), Kursk―12.9% (Balanovsky et al., 2008; Mirabal et al., 2009), Belgorod―12.6% (Balanovsky et al., 2008), Tver―11.7% (Balanovsky et al., 2008; Mirabal et al., 2009), Voronezh―6.3%, Orel province―5.5% (Balanovsky et al., 2008) and Kalmyks―3.3% (Derenko et al., 2007). The values for Belarusians are 9.6% (Kushniarevich et al., 2013; Kushniarevich et al., 2015) and for Ukrainians―5.7% (Battaglia et al., 2009; Kushniarevich et al., 2015).
In the Y-chromosome gene pool of any West or South Slavic speaking population, haplogroup N does not exceed 2-3% in frequency: Poles―2.7% (Battaglia et al., 2009; Rebala et al., 2013), Czechs―2.7%, Croats―1.7% (Battaglia et al., 2009), Bulgarians―0.5% (Karachanak et al., 2013), Serbs―1.9% (Regueiro et al., 2012), Slovenians―0% and Bosniaks―0% (Battaglia et al., 2009). The same pattern is observed in the remaining populations in Europe, in the Caucasus (Balkarians―0% and Georgians―0%; Battaglia et al., 2009) and in South Asia (Sengupta et al., 2006). Obviously, this characteristic in the patrilineal gene pool of East Slavs can be explained by the Mongol invasion and the presence of Mongols in Russia (called Tatars by Russians) between 1237 and 1480 AD (Karamzin, 1811/2016; Deynichenko, 2003).
In conclusion, as illustrated by the PC analysis of mtDNA and Y-chromosome haplogroup frequencies, West-East and South Slavic speaking populations, traditionally called “Slavs” (a term introduced in the 16th century AD) (Šafařik, 1848), are heterogeneous in their uniparental genetic diversity, which shows that they do not share substantial common genetic ancestry and that there is great genetic variety within the Slavic linguistic unity.
We would like to thank Prof. Antonio Torroni, Prof. Ornella Semino and their collaborators (Department of Biology and Biotechnology, University of Pavia, Italy) for their contribution to the development of the uniparental genetic study in Bulgaria.
Table S1. Number of individuals belonging to each mtDNA haplogroup in European Slavic speaking populations.
a Kushniarevich et al., 2013; b Malyarchuk et al., 2003; c Sarac et al., 2014; d Karachanak et al., 2012; e Malyarchuk et al., 2008; f Malyarchuk et al., 2004; g Grzybowski et al., 2007; h Davidovic et al., 2015; i Pshenichnov et al., 2013.
Table S2. Absolute frequencies of Y-chromosome haplogroups in Slavic speaking populations.
a Balanovsky et al., 2008; b Mirabal et al., 2009; c Kushniarevich et al., 2013; d Kushniarevich et al., 2015; e Battaglia et al., 2009; f Battaglia et al., 2009; g Rebala et al., 2013; h Regueiro et al., 2012.
Table S3. Number of individuals per mtDNA haplogroup in European populations.
a Hernández et al., 2014; b Brandstätter et al., 2007; c Kushniarevich et al., 2013; d Boattini et al., 2013; e Malyarchuk et al., 2003; f Sarac et al., 2014; g Karachanak et al., 2012; h González et al., 2003; i Malyarchuk et al., 2008; j Hedman et al., 2007; k Richard et al., 2007; l Tetzlaff et al., 2007; m Malyarchuk et al., 2004; n Irwin et al., 2008; o Grzybowski et al., 2007; p Hervella et al., 2014; q Davidovic et al., 2015; r Tillmar et al., 2010; s Pshenichnov et al., 2013.
Table S4. Absolute frequencies of haplogroups of the Y-chromosome in European populations.
a Balanovsky et al., 2008; b Mirabal et al., 2009; c Kushniarevich et al., 2013; d Kushniarevich et al., 2015; e Boattini et al., 2013; f Battaglia et al., 2009; g Karachanak et al., 2013; h Solé-Morata et al., 2015; i King et al., 2008; j Karlsson et al., 2006; k Rebala et al., 2013; l Martinez-Cruz et al., 2012; m Varzari et al., 2013; n Regueiro et al., 2012.