1. A Communication Channel Approach to the Theory of Translation
Translation is the replacement of textual material in one language by equivalent textual material in another language. It transfers meaning from one set of patterned symbols into another set of patterned symbols. Translation was formerly studied as a language-learning methodology or as part of comparative literature. Over time, however, the interdisciplinarity and specialization of the subject have become more evident, and theories and models have continued to be imported from other disciplines. Most of the literature reports results that are not based on a mathematical analysis of texts, as is the case with the theory proposed here. When a mathematical approach is used, most studies concern neither the aspects of Shannon’s communication theory nor the fundamental connection that some linguistic variables have with the reader’s reading ability and short-term memory capacity, which are considered in this paper. In fact, these studies are mainly concerned with machine translation, not with the response of human readers, and very often they refer to only one linguistic variable, e.g. phrases. Statistical machine translation has been described as a process in which the text to be translated is “decoded” by eliminating the noise, i.e. by adjusting lexical and syntactic divergences to reveal the intended message. In this paper, on the contrary, what we define as “noise” (given by quantitative differences between source text and translated text) must not be eliminated, because it makes the translation readable and matched to the reader’s short-term memory capacity, a connection never considered in the mentioned references.
The aim of this paper is to show that there seems to be a mathematical/statistical background that unifies all alphabetical languages, despite the spreading of the statistics of linguistic variables from language to language, described by parallel communication channels, one for each linguistic variable. The differences between translations seem to be mostly due to differences in the expected reader’s reading ability—quantified by readability formulae—and reader’s short-term memory capacity—quantified by Miller’s 7 ± 2 law  —assumed by the translators, not to a particular language. In other words, it seems that the mythical biblical Tower of Babel has produced a lot of “noise”, but has not destroyed this common background.
This unifying picture is mainly assessed by defining firstly an ideal translation channel, and secondly by comparing the actual translation channels to it, according to communication theory. Shannon has set the fundamental mathematics of the main parts of a communication channel  : the source of information (input) and the channel to which this information is delivered, with its response (output) to the input. The source is characterized by its entropy, i.e. the minimum average number of bits necessary for coding a symbol randomly produced by the source; the channel is characterized by the signal-to-noise ratio, which determines its capacity (in bits per symbol).
In this paper, we study the translation channel, after suitably defining input and output symbols. Compared to Shannon’s channels, our linguistic channels work at a different level because, for equal meaning, both input and output texts are structured in such a way as to match the reader’s expected reading ability and short-term memory capacity. In other words, these channels do not communicate with a machine, but with human beings, who may have serious difficulties in understanding what they read if the text is not matched to their own reading ability and short-term memory capacity. This peculiarity of the linguistic channels makes the translation language less important. None of the previous studies has considered this unified approach.
The main mathematical/statistical characteristics are determined by studying the translation of a large selection of New Testament (NT) books (Matthew, Mark, Luke, John, Acts, Epistle to the Romans, Apocalypse, for a total of 155 chapters according to the traditional subdivision of the original Greek texts) from Greek into Latin and 35 modern languages, 36 translations in total. The theory does not include meaning.
The rationale for studying the NT translations is their accuracy: each translation is done by a team of experts whose aim is to render the same meaning as the original Greek texts to their readers, regardless of the language used. These translations strictly respect the subdivision into chapters and verses of the Greek texts, therefore they can be studied at least at these two levels (chapters and verses) by comparing how a variable varies quantitatively from translation to translation. Notice that “translation” should not be confused with “language”, because language plays only one of the roles in a translation. For our analysis, we have chosen the chapter level because the amount of text is sufficiently large to yield reliable statistics. Therefore, considering the Greek texts and their 36 translations, we have a database of 155 × 37 = 5735 samples for each stochastic variable, sufficiently large to give reliable results.
After this introduction, the rest of the paper is organized as follows. In Section 2 we list the NT translations and some fundamental statistics; in Section 3 we define the ideal translation and the real translation; in Section 4 we define the communication channel, its noise-to-signal power ratio and describe its geometrical representation; in Section 5 we define the linguistic communication channels and study them; in Section 6 we deal with channel capacity according to communication theory; in Section 7 we relate the number of words per interpunctions (also termed the word interval) to human short-term memory capacity; in Section 8 we define a readability formula applicable to any alphabetical language; in Section 9 we examine different versions of the NT translation in the same language; in Section 10 we compare the translations of a novel from English to some modern languages and compare their statistics with the NT translation statistics from English to the same languages; in Section 11 we propose a general theory of language translation; finally, in Section 12 we summarize the main results and draw some conclusions. Some appendices report more details.
2. Translations: Babel of Different Statistical Results
Following our statistical study of a large corpus of literary texts taken from the Italian Literature spanning seven centuries  , we use the same stochastic variables to study the NT translations, namely, the number of words nW, sentences nS and interpunctions nI per chapter, the number of characters per word CP, the number of words per sentence PF, the number of words per interpunctions IP—a variable that seems to be related to the short-term memory capacity  —the number of interpunctions per sentence MF, which gives also the number of word intervals per sentence, and the total number of words W, sentences S and interpunctions I (Appendix A lists all mathematical symbols). How interpunctions were inserted into the scriptio continua of Greek and Latin texts is reviewed and discussed in  .
Table 1 lists the NT translations considered in our study, with some first-order statistics. We have considered only alphabetical languages, listed according to their linguistic family for visualizing possible similarities. Esperanto is an artificial (constructed) language based on European languages.
We downloaded each translation text from the web sites reported in Table 1, and saved the text in WinWord format. Then, for each chapter we counted words, sentences and interpunctions (full-stops, question marks, exclamation marks, commas, colons, semicolons) after deleting all extraneous characters added to the original text by translators/commentators, such as titles, footnotes et cetera. At the end of this lengthy and laborious work, only the original text sine glossa was left to be studied. Of course, we do not need to understand any of the translation languages because the process consists in just counting characters and sequences of characters.
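The counting step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used for Table 1: the function name chapter_statistics, the word pattern, and the choice to count every listed mark individually (so that an ellipsis would count as three full stops) are assumptions made here.

```python
import re

SENTENCE_MARKS = ".?!"      # marks assumed to close a sentence
INTERPUNCTIONS = ".?!,:;"   # the six marks listed in the text

# Assumption: a word is a run of letters, optionally joined by an
# apostrophe or a hyphen (digits and underscores are excluded).
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def chapter_statistics(text: str) -> dict:
    """Count words, sentences, interpunctions and derive CP, PF, IP, MF."""
    words = WORD_RE.findall(text)
    n_w = len(words)
    n_chars = sum(1 for w in words for ch in w if ch.isalpha())
    n_s = sum(text.count(mark) for mark in SENTENCE_MARKS)
    n_i = sum(text.count(mark) for mark in INTERPUNCTIONS)
    if n_w == 0 or n_s == 0 or n_i == 0:
        raise ValueError("chapter must contain words and punctuation")
    return {
        "nW": n_w,             # words per chapter
        "nS": n_s,             # sentences per chapter
        "nI": n_i,             # interpunctions per chapter
        "CP": n_chars / n_w,   # characters per word
        "PF": n_w / n_s,       # words per sentence
        "IP": n_w / n_i,       # words per interpunction (word interval)
        "MF": n_i / n_s,       # interpunctions per sentence
    }
```

Because the function only counts characters and sequences of characters, it can be applied to any of the alphabetical languages without understanding them, which is the point made in the paragraph above.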
The first impression arising after reading these statistics is their large variety. Words, sentences and their distribution within chapters (for sentences) and within sentences (for words) can be very different from translation to translation. Even though all these texts convey the same meaning, the spread—i.e. the scattering of the values—is large. For example, the number of total words ranges from 90,799 (Latin) to 152,823 (Haitian), a spread of 62,024 words which represents 61.9% of the total number of words in Greek, 100,145; the number of total sentences ranges from 5370 (Latin) to 10,429 (Haitian), a spread of 106.3% of the total number of sentences in Greek, 4759. However, a ranking is evident as some translations are closer to the Greek originals than others. A similar spread is also noticeable in the average and standard deviation of the number of words per sentence PF (from Hebrew to Welsh, range 52.4%), words per interpunction IP—word interval—(from Russian to Cebuano, range 60.8%) and interpunctions per sentence MF (from Cebuano to Esperanto, range 79.5%), Table 2.
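The two spreads quoted above can be checked with a short calculation. The helper name spread_percent is hypothetical; the totals are those quoted in the text for Latin, Haitian and Greek.

```python
def spread_percent(lowest: float, highest: float, reference: float) -> float:
    """Spread (highest - lowest) as a percentage of the reference value."""
    return 100.0 * (highest - lowest) / reference


# Totals quoted in the text (Latin = lowest, Haitian = highest, Greek = reference).
words_spread = spread_percent(90_799, 152_823, 100_145)
sentences_spread = spread_percent(5_370, 10_429, 4_759)

print(f"words: {words_spread:.1f}%  sentences: {sentences_spread:.1f}%")
```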
Table 1. List of languages used in the NT translations (Matthew, Mark, Luke, John, Acts, Epistle to the Romans, Apocalypse). Total number of words W, sentences S and interpunctions I. Average values of characters per word CP, and words nW, sentences nS, interpunctions nI per chapter. In brackets: standard deviation. Last access to the indicated web sites was in the week October 5 to 9, 2020.
Table 2. Words per sentence PF, words per interpunction IP (word interval) and interpunctions per sentence MF (word intervals per sentence). The first number gives the average value, the number in brackets gives the standard deviation, and the third number gives the correlation coefficient between the two stochastic variables that define the parameter.
To avoid misuse of the results reported in Tables 1 and 2, notice that the average values shown in Table 2 do not coincide with averages calculable from Table 1 because, in general, the average value of a ratio is not equal to the ratio calculated from the total values. For example, for Greek the total number of words divided by the total number of sentences (i.e., an estimate of the average value of the variable “words per sentence”) from Table 1 is 100,145/4759 = 21.04, while the average value of the ratio of the number of words per chapter divided by the number of sentences per chapter is 23.07 (Table 2).
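A toy calculation makes the distinction concrete. The three chapters below are invented for illustration only and are not taken from Table 1.

```python
# Hypothetical word and sentence counts for three chapters (not real data).
words_per_chapter = [300, 900, 600]
sentences_per_chapter = [10, 60, 20]

# Ratio of totals: what one would compute from the totals W and S.
ratio_of_totals = sum(words_per_chapter) / sum(sentences_per_chapter)

# Average of per-chapter ratios: the kind of average a per-chapter table reports.
per_chapter = [w / s for w, s in zip(words_per_chapter, sentences_per_chapter)]
mean_of_ratios = sum(per_chapter) / len(per_chapter)

print(ratio_of_totals, mean_of_ratios)  # the two estimates differ
```

Long chapters weigh more in the ratio of totals, while every chapter weighs the same in the average of ratios, which is why the two estimates differ.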
The correlation r between the number of characters and the number of words is not reported because, as for Italian, for all languages r > 0.990. Finally, notice that the lists of names (Genealogy) in Matthew 1.1 - 1.17 and in Luke 3.23 - 3.38 have been deleted so as not to bias the statistics of the linguistic variables. In the following sections we investigate all these variables in depth.
3. The Ideal Translation and the Real Translation
When a text written in a language is translated into a text written in another language, all linguistic variables change numerically. Besides the total number of words W, sentences S and interpunctions I, the other main linguistic variables are the number of words nW, sentences nS and interpunctions nI per chapter. To them we add the number of characters per word CP, words per sentence PF, words per interpunction IP and interpunctions per sentence MF. We refer to this latter set of variables as the deep-language variables. All these variables of language Y can be statistically compared to those of a reference language X (Greek) by calculating the correlation coefficient$^{38}$ r between any couple of variables y of language/translation Y and x of the reference language/translation X (in the following, where no confusion is possible, we refer to a variable and to the language/translation with the same mathematical symbol), and their expected regression line (i.e., the relationship between averages):
$$y = mx \qquad (1)$$
with m the slope of the line. Of course, we expect, and it is so in the following, that no translation can yield r = 1 and m = 1, a case referred to as the ideal translation. In practice, we always find $r < 1$ and $m \neq 1$. The slope m measures the multiplicative “bias” of the dependent variable compared to the independent variable; the correlation coefficient measures how “precise” the linear fit is. Even though the ideal translation is never found, it is useful as a reference model to which real translations can be compared. In the following we refer to it as the self-translation channel.
$^{38}$The correlation coefficient r between two variables x, y, with averages $m_x$, $m_y$ and standard deviations $s_x$, $s_y$, is given by $r = s_{xy}/(s_x s_y)$, where $s_{xy} = \langle (x - m_x)(y - m_y) \rangle$ is the covariance; $\langle \cdot \rangle$ indicates average value.
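As a minimal sketch, the two parameters of a translation channel can be estimated from per-chapter values as follows. The function name slope_and_correlation is hypothetical, and the slope is taken here as the usual least-squares value m = r·sy/sx; whether the regression lines of the paper were fitted exactly in this way is an assumption.

```python
from math import sqrt


def slope_and_correlation(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (m, r) for output values y against reference values x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)                    # proportional to s_x^2
    syy = sum((b - my) ** 2 for b in y)                    # proportional to s_y^2
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # proportional to the covariance
    r = sxy / sqrt(sxx * syy)
    m = sxy / sxx   # equal to r * s_y / s_x
    return m, r
```

For the ideal translation (y identical to x) the function returns m = 1 and r = 1; a translation that uses exactly 10% more words in every chapter returns m = 1.1 and r = 1, i.e. it is biased but perfectly correlated.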
Figure 1 shows the scatterplot between nW in Greek and nW in the other languages listed in Table 1, with the regression lines (1); it shows in greater detail what is reported in Tables 1 and 2. We can notice that translations can use more or fewer words than Greek, and that Latin (red line) is one of the closest translations to Greek. Table 3 lists the values of the correlation coefficient r and slope m. Latin is the translation best correlated with Greek (r = 0.994), Hebrew the worst (r = 0.949).
Figure 1. Left: Scatterplot between nW in Greek and nW in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual number of words in a given translation and the number of words in that translation calculated from the regression line, for a given Greek value.
According to the regression lines, i.e., to the relationship between the average values in Y for assigned values in X (Greek), the translations that reduce the number of words (regression lines below the 45˚ line in Figure 1) mostly belong to the Balto-Slavic family, while the translations that increase this number belong to the Romance and Anglo-Saxon families (except Finnish). The range is . Figure 1 also shows the histogram of the difference (“error”) between the actual number of words in a given translation and the average number of words in that translation calculated from the regression line, for a given Greek value. The spread of these latter values makes r < 1. The probability density function deducible from Figure 1 can be modelled as Gaussian.
Figure 2 shows the results concerning nS. All languages/translations have more sentences than Greek, ranging from Latin (m = 1.123) to Haitian (m = 2.085), Table 3, therefore implying a multiplicative bias larger than the words bias, and saying that translations have very different distributions of full stops and, in general, interpunctions, not only compared to Greek, but also compared to each other. The correlation coefficients are all significantly lower than those concerning nW, in the range , Table 3. All translations convey the same meaning but with different quantities of words and sentences.
Figure 3 shows the results concerning nI. Most translations use more interpunctions than Greek, ranging from Swedish (m = 1.107) to Haitian (m = 1.730), see Table 3, therefore implying, again, a multiplicative bias larger than that found with words and sentences. Interpunctions impact directly on readers’ reading ability and short-term memory capacity. The correlation coefficient varies in the range .
Table 3. Slope m and correlation coefficient r of the regression line y = mx between a given stochastic variable in a translation and the corresponding variable in the original Greek text.
Figure 2. Left: Scatterplot between nS in Greek and nS in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual number of sentences in a given translation and the number of sentences in that translation calculated from the regression line, for a given Greek value.
Figure 3. Left: Scatterplot between nI in Greek and nI in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual number of interpunctions in a given translation and the number of interpunctions in that translation calculated from the regression line, for a given Greek value.
A larger spread can be noticed in the deep-language variables PF, IP and MF, Table 3 and Figures 4-6. The slopes and correlation coefficients of these variables clearly underline the fact that the distribution of interpunctions, within a chapter, introduced in any text for better conveying the meaning to readers, can be quite different from translation to translation. Compared to nW, nS and nI, the multiplicative bias increases for all languages, with very few exceptions (e.g. Esperanto and Welsh in the variable PF), and the correlation coefficients become smaller.
Now, to study the chaotic data reported in Tables 1-3, it is very useful to consider a translated text as the output of a communication channel fed by the original text. The characteristics of this channel (one for each stochastic variable) can give us more insight into the mathematical/statistical deep structure of alphabetical (and possibly human) languages. Before doing so, in the next section we define a useful parameter, namely the noise-to-signal power ratio of a real translation channel compared to the ideal channel.
4. Noise-to-Signal Power Ratio and Its Universal Geometrical Representation
We characterize any translation and its linguistic stochastic variables as a complex communication channel, made of parallel channels (one for each variable) affected by “noise”. The input language is the “signal”; the output language is a “replica” of the input language, but largely perturbed by noise. From the point of view of the output language this noise is, of course, indispensable for conveying the meaning to readers of the output language. To study these channels, we define a suitable noise-to-signal power ratio and use a geometrical representation borrowed from the author’s design of deep-space radio links. This geometrical representation is universal.
Two variables y and x, linked by a regression line y = mx, where m is the slope of the line, are perfectly correlated if the correlation coefficient r = 1, and are not biased if m = 1, in other words, if the regression line is y = x (45˚ line, m = 1) and all y-values lie on the line (r = 1). If these conditions are not met, we consider the variance of the difference between the regression line values (m ≠ 1) and the ideal line y = x values, at given x-values, as the “regression noise” power Nm, and the variance of the difference between the values not lying on the line and the regression line y = mx, (r ≠ 1), as the “correlation noise” power Nr.
Let us apply these concepts to language translation. Let $s_x^2$ be the variance of language x and $s_y^2$ that of language y. The difference y - x between the regression line of the real translation channel and that of the ideal channel is given by $y - x = (m - 1)x$, therefore the variance (or power) of the regression noise is given by:
$$N_m = (m - 1)^2 s_x^2 \qquad (2)$$
Figure 4. Left: Scatterplot between PF in Greek and PF in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual value of PF in a given translation and the value of PF in that translation calculated from the regression line, for a given Greek value.
Figure 5. Left: Scatterplot between IP in Greek and IP in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual value of IP in a given translation and the value of IP in that translation calculated from the regression line, for a given Greek value.
Figure 6. Left: Scatterplot between MF in Greek and MF in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual value of MF in a given translation and the value of MF in that translation calculated from the regression line, for a given Greek value.
Then, the regression noise-to-signal power ratio, Rm, is given by:
$$R_m = \frac{N_m}{s_x^2} = (m - 1)^2 \qquad (3)$$
Notice that in (3) what counts is the absolute difference $|m - 1|$, because Rm is an even function (parabola) around m = 1.
According to the theory of regression lines, the fraction of the variance due to the y-values not belonging to the line (correlation noise power, Nr) is given by:
$$N_r = (1 - r^2) s_y^2 \qquad (4)$$
This noise power is correlated with the slope m, because the fraction of the variance due to the regression line y = mx, namely $r^2 s_y^2$, is related to m according to the following relationship:
$$r^2 s_y^2 = m^2 s_x^2 \qquad (5)$$
Therefore, the correlation noise-to-signal power ratio, Rr, is given by:
$$R_r = \frac{N_r}{s_x^2} = \frac{1 - r^2}{r^2} m^2 \qquad (6)$$
Now, because the two noise sources are disjoint, the total noise-to-signal power ratio of the channel is given by:
$$R = R_m + R_r \qquad (7)$$
By (3) and (6), R depends only on the two parameters m and r of the regression line (Table 3), and is given by:
$$R = (m - 1)^2 + \frac{1 - r^2}{r^2} m^2 \qquad (8)$$
For each couple of the same variable, in Greek and in a translation, we can represent Equation (8) graphically by considering the variables (not to be confused with translations):
$$X = \sqrt{R_m}, \qquad Y = \sqrt{R_r} \qquad (9)$$
By setting $X^2 + Y^2 = R_o$, with $R_o$ a constant, X and Y trace an arc of a circle with radius $\sqrt{R_o}$ in the first Cartesian quadrant. All points inside the circle correspond to $R < R_o$; the origin of the axes corresponds to R = 0 of the ideal channel, m = 1 and r = 1. The reciprocal of R is the signal-to-noise power ratio $1/R$, which becomes infinite at the origin and decreases as the radius of the circle increases.
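The following sketch computes the two noise contributions and the point (X, Y) of the geometrical representation from the slope m and the correlation coefficient r. It assumes the forms Rm = (m − 1)² and Rr = m²(1 − r²)/r², which follow from the definitions of this section; the function names are hypothetical. The numerical example uses the Haitian values of the sentences channel quoted in Section 5 (m = 2.085, r = 0.912).

```python
from math import sqrt


def noise_to_signal(m: float, r: float) -> tuple[float, float, float]:
    """Return (Rm, Rr, R) for a channel with slope m and correlation r."""
    if not 0.0 < abs(r) <= 1.0:
        raise ValueError("r must be non-zero and not larger than 1 in magnitude")
    r_m = (m - 1.0) ** 2                     # regression noise-to-signal ratio
    r_r = (1.0 - r ** 2) / r ** 2 * m ** 2   # correlation noise-to-signal ratio
    return r_m, r_r, r_m + r_r


def channel_point(m: float, r: float) -> tuple[float, float]:
    """Return (X, Y) = (sqrt(Rm), sqrt(Rr)), the point plotted for a channel."""
    r_m, r_r, _ = noise_to_signal(m, r)
    return sqrt(r_m), sqrt(r_r)


# Sentences channel, Haitian translation (values quoted in Section 5).
rm, rr, total = noise_to_signal(2.085, 0.912)
x, y = channel_point(2.085, 0.912)
print(f"Rm={rm:.3f} Rr={rr:.3f} R={total:.3f} X={x:.3f} Y={y:.3f}")
```

The squared distance of the point (X, Y) from the origin equals R, so points at the same distance from the origin share the same total noise-to-signal power ratio regardless of how it is split between the two noise sources.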
As discussed in  , among other features not of interest here, adopting the noise-to-signal power ratio instead of the more common signal-to-noise power ratio allows this graphical representation, which immediately shows how Rr and Rm, through their square roots, contribute to the total R, and which of the two pushes the translation away from the ideal self-translation.
In conclusion, the comparison between any couple of corresponding variables can be studied as a “communication channel” in which the input signal is the Greek text variable and the output signal is the translation variable. Compared to the ideal channel, the actual channel is noisy, always characterized by R > 0. Of course, as already noted, this indispensable “noise” is what actually makes the translation intelligible to the intended readers of the translated texts. In the next section we study these communication channels.
5. Linguistic Communication Channels
We compare, for each chapter, the numbers of words, sentences, interpunctions, and the so-called deep-language variables PF, IP, MF, of the original Greek texts to those of another language. The values of the slope m of the linear model (1) and the correlation r for all variables and translations can be read in Table 3. From these data we can calculate $\sqrt{R_m}$, $\sqrt{R_r}$ and the noise-to-signal power ratio $R$.
Let us first consider the words channel nW. Figure 7 shows the results obtained according to the geometrical representation discussed in Section 4. The closer a point is to the origin, the less noisy the channel, i.e. the closer the communication channel is to the ideal channel. Latin, Basque, Russian and Croatian are the least noisy languages (the black circles will be discussed in Section 6). The values of all other languages lie approximately along the regression line:
$$Y = aX + b \qquad (10)$$
A regression line with a > 0, as in Equation (10), is due to languages with m > 1, while a regression line with a < 0 is due to languages with m < 1.
From Equation (10) it turns out that, even though some translations can be practically unbiased (m ≈ 1), as is the case of Slovak, they can never be perfectly correlated with the Greek texts, i.e., their correlation coefficient can never approach 1. In fact, when m = 1, i.e. X = 0, from Equation (10) we get Y = 0.157 and, by setting m = 1 in Equation (6), we can calculate the corresponding “irreducible” (minimum) correlation coefficient:
$$r = \frac{1}{\sqrt{1 + Y^2}} = \frac{1}{\sqrt{1 + 0.157^2}} \approx 0.988 \qquad (11)$$
This value has to be compared with the minimum value 0.949 of Hebrew (Table 3).
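The step from the intercept Y = 0.157 to the irreducible correlation coefficient can be checked numerically. The sketch below assumes that, for m = 1, the correlation noise-to-signal ratio reduces to Y² = (1 − r²)/r², which gives r = 1/√(1 + Y²); the function name is hypothetical.

```python
from math import sqrt


def irreducible_correlation(y_intercept: float) -> float:
    """Correlation coefficient of an unbiased channel (m = 1) with sqrt(Rr) = y_intercept."""
    return 1.0 / sqrt(1.0 + y_intercept ** 2)


r_words = irreducible_correlation(0.157)   # intercept quoted for the words channel
print(f"irreducible r = {r_words:.3f}")
```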
In conclusion, even though the channel is very close to being ideal for the slope (m ≈ 1, no bias on the average, very small regression noise), it can never be ideal for the correlation coefficient, therefore there is always some significant correlation noise around the 45˚ line. Notice that there is no clear trend for the various language families, except for the Balto-Slavic family, which minimizes the regression noise X, because m ≈ 1, therefore these translations are grouped towards the Y-axis. The noisiest languages are Norwegian, Cebuano and Haitian.
Let us consider the sentences channel nS, whose results are shown in Figure 8. Now, both $\sqrt{R_m}$ and $\sqrt{R_r}$ are further away from the origin than those of the words channel, therefore the noise-to-signal power ratio is greater than that of the words channel. Latin is, again, the least noisy language, together with Croatian and Basque. Moreover, as already noticed, the number of sentences tends to be larger than in Greek, therefore m > 1. The noisiest language is Haitian because of the extreme values m = 2.085 and r = 0.912. According to the regression line drawn in Figure 8, the irreducible correlation coefficient is approximately the same as that of the words channel. In other words, if there were no multiplicative bias (m = 1), the spread of words and sentences around the regression lines, Table 3, would be very similar. Now, because characters and words are very much correlated (r > 0.990 for all languages, not shown but verified, just like for Italian literature), this observation applies also to the characters channel.
Let us consider the interpunctions channel nI, whose results are also shown in Figure 8. This channel is noisier than the nW and nS channels. Swedish is the least noisy language, Haitian the noisiest. Each language, in fact, introduces a very different distribution of interpunctions in a chapter, both in type (full-stops, question marks, exclamation marks, commas, colons, semicolons) and quantity, therefore changing the length of sentences, word intervals, and interpunctions per sentence. According to the regression line drawn in Figure 8, the irreducible correlation coefficient (11) is the lowest of the three channels examined so far.
Figure 7. Scatterplot between $\sqrt{R_m}$ and $\sqrt{R_r}$ in the words nW channel. The origin represents the ideal channel. The black arcs of circles give contours of equal channel capacity C (Section 6).
Figure 8. Scatterplot between $\sqrt{R_m}$ and $\sqrt{R_r}$. The origin represents the ideal channel. Left: nS channel. Right: nI channel. The black arcs of circles give contours of equal channel capacity C (Section 6).
Let us consider the number of words per sentence, PF, a deep-language variable. Figure 9 shows the results obtained. This channel has a large correlation noise, as we can see from the range of Y, a consequence of the very low correlation coefficients (Table 3). The least noisy language, again, is Latin, the noisiest is Norwegian.
The results of the channel concerning the number of words per interpunction, i.e. the word interval IP, are also shown in Figure 9. The least noisy languages are Basque, Latin, Estonian and Croatian, the noisiest is Haitian. In general, the IP channel is less noisy than the PF channel. It seems that IP cannot be set as much independently from Greek as PF seems it can be. A likely explanation is that the word interval is empirically correlated with the short-term memory capacity, and this capacity not only is limited according to the 7 ± 2 Miller’s law  , but it cannot change so much in humans, regardless, of course, of the language used, therefore it varies less from language to language. This is not the case for PF, a variable more linked to the output language, or translation style and intended readers through a readability index (see Section 8), than to human short-term memory capacity.
The results of the channel concerning the number of interpunctions per sentence, MF, are also shown in Figure 9. The least noisy language is again Croatian, the noisiest is again Haitian, with Y ≈ 60 (due to the very low correlation coefficient 0.012, practically zero) and X ≈ 0.3, not shown because much out of scale. Notice that IP and MF channels are quite similar for most languages.
Compared to the nW, nS and nI channels, the deep-language variable channels are the noisiest. The reason seems to be, again, the different distribution of interpunctions. For these channels we have not drawn regression lines because the correlation coefficient is small.
Let us summarize the main results of this section. The channels studied are differently affected by the translation noise. The most accurate channel is the word channel nW, a finding that seems reasonable. Humans seem to express a given meaning with a number of words—i.e. finite strings of abstract signs (characters)—which cannot vary so much even if some languages (Hebrew, Welsh, Basque etc.) do not share, according to scholars, a common ancestor with most other languages. This result seems to be something basic to human processing capabilities.
The number of sentences and their length in words, i.e. PF, can be treated more freely. We know that PF affects readability indices very much, as shown for Italian  , therefore, this variable tends to be better matched to the intended readers, with specific reading ability, not to the original Greek readers of the Roman Empire.
Finally, we observe that, independently of the different channels, the correlation noise is always larger than the regression noise, therefore indicating that every translation tries as much as possible not to be biased, but it cannot avoid being decorrelated, with correlation coefficients which approximately decrease from words, to sentences, to interpunctions and down to the deep-language variables.
Figure 9. Scatterplot between $\sqrt{R_m}$ and $\sqrt{R_r}$. The origin represents the ideal channel. Upper: PF channel. Middle: IP channel. Lower: MF channel. The black arcs of circles give contours of equal channel capacity C (Section 6).
Besides the noise-to-signal power ratio, communication channels can be also characterized by the channel capacity, as we discuss in the next section.
6. Channel Capacity
The noise-to-signal power ratio and its universal geometrical representation are not the only interesting way of studying noisy channels. Noisy channels can also be characterized by a single variable, namely the channel capacity or mutual information defined by Shannon, between the stochastic variables x (input) and y (output). In the following subsections, firstly we recall the channel capacity of communication theory and define what we mean by “symbol”; secondly, we assess, for the first time, the size of the channel capacity obtainable with linguistic variables.
6.1. Channel Capacity According to Communication Theory
According to Shannon, under some assumptions, the capacity (bits per symbol) of the channel is related directly to the channel signal-to-noise power ratio 1/R, according to:
$$C = \frac{1}{2} \log_2 \left( 1 + \frac{1}{R} \right) \qquad (12)$$
In our analysis the term “symbol” is defined according to the linguistic variable under study. For example, in the words channel the “symbol” is defined as the number of words per chapter, i.e. the actual values nW of input and output chapters. For example, in Matthew, Chapter 5, the input symbol (Greek) is 823, while the output symbol is 1006 in English, 932 in Italian and 765 in Russian. Therefore, the additive noise is 1006 − 823 = 183 in English, 932 − 823 = 109 in Italian and 765 − 823 = −58 in Russian. This noise can be relatively large, as it peaks at 22.2% of the input value in the English translation. The signal-to-noise power ratio of this sample is, therefore, $(823/183)^2 = 20.2$ in English and $(823/58)^2 = 201.3$ in Russian, concisely underlining that the Russian translation is closer to Greek than the English translation.
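The Matthew 5 example can be reproduced with a few lines of code. The sketch assumes the Gaussian-channel form C = ½ log₂(1 + 1/R) for the capacity and applies it to a single sample purely for illustration; the function names are hypothetical.

```python
from math import inf, log2


def sample_snr(input_symbol: float, output_symbol: float) -> float:
    """Signal-to-noise power ratio of one sample: (input / additive noise)^2."""
    noise = output_symbol - input_symbol
    if noise == 0:
        return inf   # ideal sample: output identical to input
    return (input_symbol / noise) ** 2


def capacity_bits(snr: float) -> float:
    """Capacity in bits per symbol, assuming C = 0.5 * log2(1 + SNR)."""
    return 0.5 * log2(1.0 + snr)


greek = 823   # words in Matthew 5 (Greek), as quoted in the text
outputs = {"English": 1006, "Italian": 932, "Russian": 765}
for name, words in outputs.items():
    snr = sample_snr(greek, words)
    print(f"{name}: noise={words - greek:+d} SNR={snr:.1f} C={capacity_bits(snr):.2f}")
```

The ordering of the three capacities follows the ordering of the signal-to-noise ratios, so the Russian sample, with the smallest noise, gives the largest value.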
In other words, we do not consider the classical information content of texts according to communication/information theory, which, to a first approximation, is measured by the entropy of letters  , a concept applicable to machine translation but not to human information processing, which is based on words, sentences and interpunctions distribution. Indeed, the short-term memory responds to words not to bits, therefore the use of entropy can be highly misleading in estimating the characteristics of the linguistic channels defined in the present paper (Appendix B).
For a constant R, Equation (12) gives the minimum channel capacity if the noise is Gaussian. If the noise is not Gaussian, the actual channel capacity is larger than that given by Equation (12)  .
Of the two noise sources defined in Section 4, the correlation noise and the regression noise, the latter is deterministic (it could be cancelled by dividing the variables of the output language by the corresponding m, if known), while the former can be modelled approximately as Gaussian, Figures 1-6. Therefore, if we assume that both sources of noise are Gaussian, then the channel capacity calculated with Equation (12) is pessimistic. In any case, this is not of concern here because Equation (12) can be used for comparing different translations.
We have already shown contours of constant capacity C (given, of course, by constant R) in Figures 7-9, namely the black arcs of circles. At the origin of the Cartesian coordinates R = 0, therefore 1/R = ∞ and C = ∞. This last result, valid for the continuous channel assumed in Equation (12), merely means that the channel does not impose any limit on the output information, therefore in this case the mutual information coincides with the input self-information of the Greek texts.
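The one-to-one link between the capacity and the noise-to-signal power ratio can be sketched numerically. The snippet below assumes that Equation (12) has the standard Gaussian-channel form C = 0.5 log2(1 + 1/R); the function names are hypothetical and introduced only for illustration.

```python
import math


def capacity_bits_per_symbol(R):
    """Capacity for a noise-to-signal power ratio R.

    ASSUMPTION: Equation (12) is C = 0.5 * log2(1 + 1/R).
    """
    if R < 0:
        raise ValueError("R must be non-negative")
    if R == 0:
        return math.inf  # ideal channel: no limit on the output information
    return 0.5 * math.log2(1.0 + 1.0 / R)


def noise_to_signal_for_capacity(C):
    """Inverse relation: the R that yields a given capacity C > 0."""
    if C <= 0:
        raise ValueError("C must be positive")
    return 1.0 / (2.0 ** (2.0 * C) - 1.0)


# Capacity decreases monotonically as the channel gets noisier.
for R in (0.01, 0.05, 0.2, 1.0):
    print(f"R = {R:.2f} -> C = {capacity_bits_per_symbol(R):.3f} bits per symbol")
```

Because the relation is monotonic, a contour of constant C is the same curve as a contour of constant R, which is the reason the capacity contours can be drawn directly on the noise-to-signal charts.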
Of the channels studied in Section 5, the words channel nW has the largest channel capacity for most translations. Figure 10 shows the scatterplot between the capacities of nW and nS channels. We notice that the two channels are quite correlated; for Welsh the two capacities are even practically identical. Figure 10 shows also the scatterplot between the capacities of nW and nI channels. The two capacities are practically uncorrelated. In Appendix C we report the scatterplots of the capacities of words channel and sentences channel with the deep-language channels capacities. In all cases, we notice a poor correlation, except partially for the PF channel, therefore evidencing, again, the fact that every translation has its own pattern of interpunctions within a chapter, which determines PF, IP and MF.
Some interesting observations can be made on the mixed scatterplots shown in Figure 11, between IP and the capacities of the nW, nS and IP channels. The correlation between these variables is evident: as IP increases, thus loading the reader’s short-term memory more, the channel capacities decrease. In other words, as this important deep-language variable, IP, decreases, the channels tend to be closer to the ideal channels of words, sentences and IP itself.
Unlike the word interval IP, the number of words per sentence PF is appreciably correlated only with its channel capacity, Figure 12. As PF approaches the Greek value (23.07, Table 2), the channel capacity increases. This behavior differs from that of Figure 11, where the IP channel capacity decreases as IP approaches the Greek value 7.47, and underlines that IP seems to be more related to how the human brain processes texts (short-term memory), regardless of the particular language. In other words, translations do not follow the high Greek IP. On the contrary, PF is more related and matched to the intended readers through the readability index, which does not consider IP  .
In the next subsection we discuss how large the capacity of linguistic channels is.
6.2. Channel Capacity Size
Two questions arise: 1) Are the channel capacities large? 2) How can we assess how large they are? Let us start by studying the sensitivity of the channel capacity to the parameters m and r. Figure 13 shows a universal chart, drawn from Equations (12) and (8), which describes the relationship between the channel capacity C and the slope m, as a function of the correlation coefficient r. For illustration, we have also reported the values of the words channel capacity of some translations.
Figure 10. Upper: Scatterplot between the capacities of nW and nS channels. Middle: Scatterplot between the capacities of nW and nI channels. Lower: symbols caption.
Figure 11. Upper: Scatterplot between IP and the capacity of nW channel. Middle: Scatterplot between IP and the capacity of nS channel. Lower: Scatterplot between IP and the capacity of IP channel.
Figure 12. Upper: Scatterplot between PF and the capacity of nW channel. Middle: Scatterplot between PF and the capacity of nS channel. Lower: Scatterplot between PF and the capacity of MF channel.
Figure 13. Upper: Universal chart describing the relationship between the channel capacity C and the slope m, as a function of the correlation coefficient r. For illustration, the values of the nW channel capacity of some translations are also shown. Middle: C/Cmax of the nS channel. Lower: symbols caption.
The maxima of C are found from Equation (12) when , which occurs if:
Therefore, from (8) it follows
Consequently, from (12) we get:
Because of (15), in Figure 13 we notice a very sharp increase only for very high correlation coefficients. In actual translations, however, the capacity can be significantly large, not too far from the maximum value obtainable from Equation (15). In fact, having defined the normalized capacity C/Cmax, Figures 13 and 14 show how C/Cmax varies. Notice that C/Cmax practically follows the same mathematical function, regardless of the channel (words or sentences), when the correlation coefficient r is about the same for all languages (Table 3). The same result is also found for the interpunctions channel (not shown for brevity). For the PF and IP channels (Figure 14) no regularity emerges because of poor correlation coefficients, another sign that these deep-language variables depend more profoundly on the particular translation, not on the language. The MF channel follows the same trend (not shown).
In conclusion, the capacities of the nW, nS and nI channels follow the universal chart very closely because of similarly high correlation coefficients; on the contrary, the capacities of the PF and IP channels are more spread out because their correlation coefficients vary greatly from translation to translation.
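The sensitivity of the capacity to m and r, and the normalized capacity C/Cmax, can be explored numerically. The sketch below rests on explicit assumptions that should be checked against Section 4: it takes the total noise-to-signal power ratio of Equation (8) to be R = (m − 1)^2 + m^2 (1 − r^2)/r^2 and Equation (12) to be C = 0.5 log2(1 + 1/R). Under these assumptions the minimum of R occurs at m = r^2, where R = 1 − r^2. All function names are hypothetical.

```python
import math


def noise_to_signal(m, r):
    """Total noise-to-signal power ratio for slope m and correlation r.

    ASSUMPTION: regression term (m - 1)^2 plus correlation term
    m^2 * (1 - r^2) / r^2 (assumed forms of Equations (3), (6) and (8)).
    """
    if not 0.0 < abs(r) < 1.0:
        raise ValueError("r must satisfy 0 < |r| < 1")
    return (m - 1.0) ** 2 + m ** 2 * (1.0 - r ** 2) / r ** 2


def capacity(m, r):
    """Capacity in bits per symbol (assumed Gaussian form of Equation (12))."""
    return 0.5 * math.log2(1.0 + 1.0 / noise_to_signal(m, r))


def max_capacity(r):
    """Closed form implied by the assumptions: R is minimum at m = r^2."""
    return 0.5 * math.log2((2.0 - r ** 2) / (1.0 - r ** 2))


def best_slope_on_grid(r, steps=2000):
    """Brute-force check: slope in (0, 2] that maximizes the capacity."""
    grid = [2.0 * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda m: capacity(m, r))


for r in (0.80, 0.90, 0.99):
    m_best = best_slope_on_grid(r)
    ratio = capacity(1.0, r) / max_capacity(r)  # normalized capacity at m = 1
    print(f"r = {r:.2f}: best m = {m_best:.3f}, r^2 = {r * r:.3f}, "
          f"Cmax = {max_capacity(r):.3f}, C(m=1)/Cmax = {ratio:.3f}")
```

The grid search is deliberately naive; it only serves to confirm that the closed-form maximum and the numerical maximum agree under the stated assumptions.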
7. Word Interval and Short-Term Memory
As studied and discussed in  , the number of words per interpunction, namely the word interval Ip, varies in the same range as the short-term memory capacity (given by Miller’s 7 ± 2 law  , a range where 95% of all occurrences are found) and is very likely related to it, because interpunctions organize small portions of more complex arguments in short chunks of text. Moreover, when Ip is plotted against the number of words per sentence PF, Ip tends to saturate to a horizontal asymptote as PF increases. In other words, even if sentences get longer, Ip cannot get larger than about the upper limit of Miller’s law (namely 9), because of the constraints imposed by the short-term memory capacity of readers.
Figure 14. Upper: C/Cmax of the nS channel. Middle: C/Cmax of the PF channel. Lower: C/Cmax of the IP channel.
Empirically (best-fit) the average value of Ip is related to the average value of PF according to the relationship  :
where IP∞ gives the horizontal asymptote, and PFo gives the value of PF at which the exponential falls to 1/e of its maximum value. We apply Equation (16) to the NT translations. Because both Ip and PF depend on the translation, we find different constants in Equation (16), listed in Table 4, together with data concerning the readability index discussed in Section 8.
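The saturating behavior described by Equation (16) can be illustrated with a short sketch. The functional form used below, a shifted exponential that equals 1 for one-word sentences, is an assumption chosen to be consistent with the description of IP∞ and PFo given above; it is not taken from this section. The two constants are hypothetical and are not the values of Table 4.

```python
import math


def word_interval(pf, ip_inf, pf_o):
    """Average word interval as a function of the words per sentence pf.

    ASSUMED FORM: Ip = (ip_inf - 1) * (1 - exp(-(pf - 1) / (pf_o - 1))) + 1,
    so that Ip = 1 when pf = 1, Ip -> ip_inf as pf grows, and the
    exponential equals 1/e when pf = pf_o.
    """
    return (ip_inf - 1.0) * (1.0 - math.exp(-(pf - 1.0) / (pf_o - 1.0))) + 1.0


# Hypothetical constants, for illustration only (not from Table 4).
IP_INF = 7.0
PF_O = 8.0

for pf in (1, 5, 10, 20, 40, 80):
    print(f"PF = {pf:3d} -> Ip = {word_interval(pf, IP_INF, PF_O):.2f}")
```

Whatever the exact constants, longer sentences add interpunctions rather than longer word intervals, which is the saturation toward the asymptote discussed in the text.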
Figure 15 shows the scatterplot concerning Greek, Latin and Hebrew. As for Italian literature (see Figure 16 of  ), Ip spreads within Miller’s range. Not surprisingly, the ancient readers of these texts had the same short-term memory capacity as modern readers, i.e. they followed Miller’s 7 ± 2 law. This finding is confirmed by the results concerning modern languages for which, however, the spread within Miller’s range can differ from translation to translation. Some translations tend to use smaller values of Ip, such as Latin and Hebrew (Figure 15), therefore loading the reader’s short-term memory less than other translations do, e.g. Italian, French and English (see the asymptote values IP∞ in Table 4). In Appendix D we show more graphical examples.
Figure 16 shows all best-fit models of Table 4 and also the best-fit for Greek, with ±1 standard deviation calculated from the models of Table 4. We see that Miller’s lower bound corresponds to , therefore this value sets approximately a lower bound to the average length of sentences, a result generally valid for all languages considered.
In conclusion, each translation tends to address readers with different reading abilities because small Ip values are better matched to readers with small short-term memory capacity, who, therefore, can handle only short sentences, which correlates well with a large readability index, as we show in the next section.
8. Readability Index
As discussed in  , after an in-depth review based on the many references listed there (to which we refer readers for further details), a readability formula gives an index that anyone can calculate directly and easily, so that a writer can adequately match text and expected readers. Its “ingredients” are understandable by anyone, because they are interwoven with long-lasting writing and reading experience based on characters, words and sentences. A readability formula gives an index based on the same stochastic variables, regardless of the text considered; thus it provides an objective measurement for comparing different texts or authors. A final objective readability formula (or software-developed method) is very unlikely to be found or accepted by everyone. Instead of absolute readability, readability differences can be more useful and meaningful. The classical readability formulae provide these differences easily and directly.
Table 4. Constants IP∞ and PFo of Equation (16) for each translation. Average and standard deviation of the readability index G for each translation. Slope m and correlation coefficient r of the regression line between G in Greek and G in the other languages.
Figure 15. Upper: IP versus PF in Greek. Middle: IP versus PF in Latin. Lower: IP versus PF in Hebrew. Miller’s bounds: magenta lines.
Figure 16. Upper: IP versus PF: best-fit from Table 4. Greek: red line; Latin: blue line; Esperanto: green line. Lower: IP versus PF: Greek, red line; ±1 standard deviation calculated from the relationships of Table 4. Miller’s bounds: magenta lines.
In particular, the last observation can justify our present proposal to adopt a readability formula that can be used for comparing texts of different languages, because most languages do not have a readability formula, and a few adapt to their texts some formulae studied for English texts   . The proposed formula, of course, does not exclude using other readability formulae (e.g., the large choice for English  ), but it allows comparing, on the same ground, the readability of texts written in different languages.
For this purpose, we propose to adopt, as a calque, the readability formula used for Italian, amply studied in  , known by the acronym GULPEASE  , and given by:
$$G = 89 - 10\,\frac{c}{p} + 300\,\frac{f}{p} \qquad (17a)$$
In Equation (17a) p is the total number of words in the text considered, c is the number of characters contained in the p words, f is the number of sentences contained in the p words.
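As a minimal sketch, the index can be computed directly from the three counts just defined. The snippet uses the standard GULPEASE constants 89, 10 and 300, which are the constants discussed in this section; the function name and the sample counts are hypothetical and serve only as an illustration.

```python
def gulpease(n_chars, n_words, n_sentences):
    """GULPEASE readability index from raw counts.

    n_chars     -- c, characters contained in the words
    n_words     -- p, total number of words
    n_sentences -- f, number of sentences
    """
    if n_words <= 0:
        raise ValueError("the text must contain at least one word")
    return 89.0 - 10.0 * n_chars / n_words + 300.0 * n_sentences / n_words


# Hypothetical sample: 1000 words, 4480 characters, 50 sentences.
sample_g = gulpease(n_chars=4480, n_words=1000, n_sentences=50)
print(f"G = {sample_g:.1f}")
```

With the same number of characters and words, a text split into more sentences yields a larger G, that is, a more readable text.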
Notice that Equation (17a), as all readability formulae found in the literature, does not contain any reference to interpunctions, therefore it does not consider the very important parameter linked to the short-term memory capacity, namely the word interval IP.
G can be interpreted as a readability index by considering the number of years of school attended in Italy’s school system, as shown in Figure 17. The larger G, the more readable the text. By noting that c/p = CP (characters per word) and p/f = PF (words per sentence), G can be written as:
$$G = 89 - 10\,C_P + \frac{300}{P_F} \qquad (17b)$$
In  we have shown that the term GC = 10 × CP (loosely referred to as the semantic term) varies very little from text to text and across centuries, while the term GF = 300/PF (loosely referred to as the syntactic term) varies very much and, in practice, determines the readability index. We propose to use this formula also for the other languages listed in Table 1, by scaling the constant 10 of the semantic term according to the ratio between the average number of characters per word in Italian, 4.48, and the average number of characters per word in another language, e.g., 4.86 in Greek, see Table 1. The rationale for this choice is that CP is typical of a language and, if not scaled, would bias G without really quantifying the change in reading difficulty for readers, who are accustomed to reading in their language words that are shorter or longer, on average, than those found in Italian. In other words, this scaling avoids changing G for the sole reason that a language has, on average, words shorter or longer than Italian.
Figure 17. Readability index G versus school years in Italy, with regions of different reading difficulty.
On the other hand, we maintain the constant 300 because PF depends significantly on the reader’s reading ability and short-term memory capacity  , in other words on the translator’s choice. Therefore, the formula already takes care of the reader to whom the translation is addressed. Finally, notice that the constant 89 just sets the ordinate scale, therefore it has no impact on comparisons.
Therefore, the readability formula of a text written in a language with average number of characters per word ⟨CP⟩ is given by:
$$G = 89 - 10\,\frac{4.48}{\langle C_P \rangle}\,C_P + \frac{300}{P_F} \qquad (18)$$
By using Equation (18), we force the average value of GC to be equal to that found in Italian, namely 10 × 4.48 = 44.8. For example (see Table 1), for Greek CP is multiplied by 10 × 4.48/4.86 = 9.22, instead of 10, for Finnish (longer words) CP is multiplied by 10 × 4.48/6.22 = 7.20 and for Haitian (shorter words) by 10 × 4.48/3.37 = 13.29.
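The scaling of the semantic constant can be checked with a short sketch. The average numbers of characters per word are those quoted above (Italian 4.48, Greek 4.86, Finnish 6.22, Haitian 3.37); the function names are hypothetical, and the form of the scaled index follows the description in the text (constant 10 scaled, constants 300 and 89 unchanged).

```python
ITALIAN_AVERAGE_CP = 4.48  # average characters per word in Italian

# Average characters per word quoted in the text (Table 1).
AVERAGE_CP = {"Greek": 4.86, "Finnish": 6.22, "Haitian": 3.37}


def semantic_constant(average_cp_language):
    """Constant multiplying CP, scaled from the Italian value of 10."""
    return 10.0 * ITALIAN_AVERAGE_CP / average_cp_language


def scaled_readability(cp, pf, average_cp_language):
    """Readability index with the language-scaled semantic term.

    cp -- characters per word of the text under study
    pf -- words per sentence of the text under study
    """
    g_c = semantic_constant(average_cp_language) * cp  # semantic term
    g_f = 300.0 / pf                                   # syntactic term
    return 89.0 - g_c + g_f


for language, average_cp in AVERAGE_CP.items():
    print(f"{language}: CP is multiplied by {semantic_constant(average_cp):.2f}")
```

For a text whose CP equals the average of its language, the semantic term equals 44.8 whatever the language, so that differences in G come from the syntactic term only.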
Figure 18 shows GC and GF versus G, for Greek, Latin and for all languages, with some other examples shown in Appendix E. We can notice that GF largely determines G, compared to GC. The regression line relating GF to G, drawn in Figure 18, is given by . The correlation coefficient is 0.720, therefore 0.720^2 = 0.518 is the fraction of the variance of GF due to Equation (19). The remaining fraction 1 − 0.518 = 0.482 is due to the values scattered around the line. On the contrary, the correlation coefficient between GC and G of the regression line also drawn in Figure 18, is −0.074, practically zero, therefore confirming that G is mainly determined by GF.
Figure 19 shows the scatterplot and the regression lines between the values of G in a translation and those in Greek, and the histogram of the difference (error) between the actual values and the regression line values. Table 4 reports average values and standard deviations for all translations, together with the slope and correlation coefficient of the regression lines shown in Figure 19. As we can notice, each translation sets different readability values for their intended readers, in a large spread. In other words, as mentioned above, the number of words per sentence PF distinguishes significantly the translations. From Table 4 we notice that Welsh, Albanian and Greek have the lowest average G (57 - 58), making them the least readable translations, while Hebrew (69.64), followed by Polish and Czech, are the most readable translations. Now, the texts of these two extremes, to be “easy” to read according to Figure 17, require 8 years of equivalent Italian schooling for G ≈ 57 and 6.5 years for G ≈ 70. They would become “difficult”, “very difficult” or even “almost unintelligible” to readers with very few years of schooling.
In conclusion, Equation (18) can be useful for comparing the readability of texts (not necessarily translations) written in different languages because of a “common ground” for interpreting them, namely Figure 17, which can be used as a first guide to assess readability according to the years of schooling.
Figure 18. Upper: GC (blue) and GF (red) versus G in Greek. Middle: In Latin. Lower: GC (blue) and GF (cyan) versus G in all languages.
Figure 19. Left: Scatterplot between G in Greek and G in the other translations listed in Table 1, together with the regression lines (1). The black line is the line y = x. The red line is the regression line between Latin and Greek. Right: Histogram of the difference (“error”) between the actual number of words in a given translation and the number of words in that translation calculated from the regression line, for a given Greek value.
9. Different NT Translations within the Same Language
If we consider different translations of the NT within the same language, do the statistics of the linguistic parameters change? In other words, are different versions of the NT in the same language very similar, or do they differ from each other, maybe as much as NT versions belonging to different languages do? Indeed, for some languages there is a huge number of distinct translations: we have counted at least 60 English and 20 Spanish versions39, which means that at least 60 different audiences have been considered in the English case and 20 in the Spanish case, which is really remarkable.
In this section, just for a very preliminary investigation, we report the average values of the most important linguistic parameters concerning 6 languages and 18 distinct versions, 3 per language, of Matthew’s gospel, namely English, German, Polish, Russian, Spanish and Swedish, Table 5.
In Table 5 we notice that even the number of words and sentences can change within the same language, in versions sometimes labelled as “easy-to-read”, or “modern” language etc. In English, for example, it is clear that St. James’ version is the most difficult to read (G = 57.2) but it loads the reader’s short-term memory less ( ) than the Contemporary English Edition (CEV) ( ). In German, the versions tend to be much closer, even Luther’s, so that they seem to address very similar audiences.
The spread of the values within the same language can be a sizeable fraction of the overall range calculable from Table 1 and Table 2. For example, for English, the spread in W, 8%, is to be compared to the overall 61.9% (Table 1); for S, the spread, 75.3%, is to be compared to 106.9%. Therefore, an English translation can be confused, mathematically, with a translation into another language.
Table 5. Matthew’s Gospel (28 chapters). Total number of words W and sentences S; average number of words per sentence PF, average number of words per interpunction IP, average number of interpunctions per sentence MF and readability index G for the indicated translations. The source of the unnoted translations is reported in Table 1. The range (%) is defined as the ratio between the difference between maximum and minimum values (range) and the Greek value, multiplied by 100.
In conclusion, it is clear that each different NT translation within the same language addresses a different audience, as can be noticed from the range of the linguistic parameters, but, more interestingly, a translation in one language can be confused, mathematically, with a translation into another language. In other words, this preliminary sampling seems to confirm that language does not play the only role in translation, but that this role has to be shared mainly with the reader’s reading ability (i.e., PF, G) and short-term memory (IP).
10. Literary Text Translations: Treasure Island
Another question arises: Are the above results only applicable to NT translations, or can they be also applied to translations of literary texts, such as novels? In this section we show, preliminarily with just one example, that novels tend to show similar statistics, but with more constraints on the translations than those found in the NT translations.
We have done the following exercise. We have studied the translations of Treasure Island (by R.L. Stevenson) from the original English text to Italian, French and German, by considering each chapter as text unit (34 chapters).
The comparison to the NT translation must be done, of course, by starting first with the English version of the NT and then studying its translations. Only after this study, we can consider Treasure Island as input text and calculate the same statistics. Therefore, we take the English NT as the reference (input) language and Italian, French and German as output languages, as if these NT versions were obtained by translating the English text, not the original Greek text. This hypothesis assumes, of course, that if the Italian, French and German translators had started from the English version of the NT, they would have ended up with the same text translated from Greek. This might be reasonable, although not directly controllable. We show below that the assumption can be justified. Table 6 reports the statistics concerning Treasure Island original text and its translations.
Tables 7 and 8 report the results on channel capacity obtained by considering English as the original NT text, while Tables 9 and 10 report the results on channel capacity concerning the direct translations of Treasure Island into Italian, French and German. We notice that the Italian translation uses the smallest number of words and sentences, and also has the highest correlation coefficients for all variables; therefore, its channels also have the largest capacities. In other words, the Italian translation is, mathematically, the closest to the English text, which appears surprising if we consider the different linguistic family.
Let us examine the single channels. In the words channel nW we notice that the slope m and correlation coefficient r of the three languages are about the same in both cases (Table 7 and Table 9), therefore our hypothesis, mentioned in the previous paragraph, on the translation of the English NT into the other languages is justified. More interestingly, the channel capacity is about the same in both cases and very close to the maxima given by Equation (15).
Table 6. (a) Treasure Island statistics. Key: total; average (standard deviation); slope m and correlation coefficient r between the translation and the original English text. (b) Treasure Island statistics. Average (standard deviation); slope m and correlation coefficient r between the translation and the original English text.
Table 7. NT statistics on channel capacity (bits per symbol): Translations from English to Italian, French and German; nW and nS channels; n.a. stands for “not applicable”.
Table 8. NT statistics on channel capacity (bits per symbol): Translations from English to Italian, French and German; PF and IP channels.
Table 9. Treasure Island statistics on channel capacity (bits per symbol): Translations from English to Italian, French and German; nW and nS channels.
Table 10. Treasure Island statistics on channel capacity (bits per symbol): Translations from English to Italian, French and German; PF and IP channels.
In the sentences channel nS, on the contrary, m and r of the three languages are significantly different in the two cases. This is, of course, confirmed by the different capacities. This trend is further enhanced in the PF and IP channels (Table 8 and Table 10), further evidence that, as we pass from words to sentences, to PF and to IP (or MF), each translation has quite a different way of using interpunctions for its intended readers, therefore matching the reader’s reading ability and short-term memory capacity more closely.
Finally, it is very interesting to notice in nW and nS channels (Table 7 and Table 9), that the NT translation, mathematically, is more accurate and respectful of the original Greek text than the translation of Treasure Island. On the contrary, in PF and IP channels, Treasure Island translations are more accurate than NT translations because, very likely, all dialogues must be strictly respected in any translation.
In conclusion, the statistics of words and sentences of a novel seem to be similar to those found in the NT translations. For example, the ranking of the number of sentences, from minimum to maximum, is the same both in the NT and in the Treasure Island translations: Italian, English, French, German. It is almost the same for words, namely, Italian, German, English, French for the NT translations; Italian, English, German, French for the Treasure Island translations. The translation of a novel seems to be more respectful of the original text than the NT translations as far as PF and IP are concerned, mainly because the translators must consider the presence of dialogues, whose fraction of the total text can, however, vary largely from novel to novel, according to the author’s style etc. Because these results refer to just one particular case, they should be further assessed with other translations of literary texts (novels), a study well beyond the aim of this paper.
11. A General Theory of Translation: From Any Language to Any Other Language
It is possible to extend the statistical theory outlined in the previous sections so as to arrive at a general theory of translation applicable to any alphabetical language. By knowing the statistics of the various linguistic variables studied in the previous sections (obtained in the translation channel from Greek to other languages), it is possible, as we show below, to estimate the statistics obtainable in the translation channel from any language to any other language of those listed in Table 1. The necessary data for extending the theory are those reported in Tables 3 and 4.
The theory can also be applied to channels of texts belonging to the same language (not shown for brevity): for example, the channel that transforms words into sentences in a text can be compared to the channel that transforms words into sentences in a different text, both written in the same language. This comparison can be useful to study how texts of the same author may have changed over time, or to compare texts of different authors.
Figure 20 shows, schematically, the block diagram of the direct channels from language Yk ( , Greek in Figure 20) to language Yj (channel ; ) and the flow chart of the reverse channels, from any language Yj to the same language Yk (channels , Greek in Figure 20). In other words, in the direct channel the translation is from a single language (Greek, or Latin, or Esperanto etc.) to another language, therefore, if the starting language is Greek, the translations are those discussed in the previous sections. In the reverse channel the output language is the same for all translations, therefore if the output language is Greek, the translations are from input languages Latin, Esperanto etc. So far, we have studied only one possible direct channel (from Greek to the other languages) and none of the reverse channels. In this section we study all possible direct and reverse channels for proposing a statistical general theory of translation.
We first calculate the noise-to-signal power ratio obtainable in the general theory from the data reported in Tables 3 and 4. Afterwards, we show that direct and reverse channels concerning any pair of languages are not symmetric.
Figure 20. Left: Direct Channel: translation from a language (common input) to all other languages (output). Right: Reverse channel: translation from all languages (different input) to one language (common output).
11.1. Noise-to-Signal Power Ratio
Let us consider two languages Yk and Yj, and let us refer to Greek explicitly as language X. With reference to the ideal channel whose output is X (self-translation), we have found that the same variable of languages k and j are related by linear relationships with the corresponding Greek variable x:
In Equation (19) nk and nj are the noise sources added to the regression lines . The slope m is the source of the regression noise—because —the correlation coefficient r is the source of the correlation noise—because —as discussed in Section 4. For example, in the words channel between Greek and English, and (Table 3).
Let us refer to the 36 possible translations from language —including Greek—to language j. In other words, language k plays now the role played before by Greek. By eliminating x, i.e. Greek, from Equation (19), we get the linear relationship between the input language k and the output language j :
Compared to the reference language yk, the slope is given by:
Therefore, the regression noise-to-signal power ratio, Rm, of the channel is readily found, according to Equation (3), as:
Notice that Rm depends only on the known slopes of the translations from Greek (Table 3).
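The elimination of the Greek variable x can be written out step by step. The following is only a sketch: it assumes that the two linear relationships have the form y = m x + n, with n the additive noise, and that Equation (3) gives the regression noise-to-signal power ratio as the squared distance of the slope from 1. The symbol m_{kj} for the slope of the channel from language k to language j is introduced here for illustration.

```latex
\begin{align*}
y_k &= m_k x + n_k, \qquad y_j = m_j x + n_j \\
x &= \frac{y_k - n_k}{m_k} \\
y_j &= \frac{m_j}{m_k}\, y_k + \left( n_j - \frac{m_j}{m_k}\, n_k \right) \\
m_{kj} &= \frac{m_j}{m_k}, \qquad
R_m = \left( \frac{m_j}{m_k} - 1 \right)^2
\end{align*}
```

Under these assumptions the term in parentheses in the third line is the total noise added to the regression line between the two languages, and it combines the two original noise sources.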
Let us calculate the correlation noise-to-signal power ratio, Rr. To apply Equation (6), we must insert the unknown correlation coefficient between yj and yk due, of course, to the two noise sources in Equation (20). We can calculate its value from the correlation coefficients rk and rj reported in Table 3. First, we notice that the total noise added to the regression line relating the output variable yj to the input variable yk is given by:
As we can see from (22), the two noise sources are correlated, with unknown correlation coefficient r. Let and be the single noise powers, then the total noise power due to is given by (  , p.127):
Equation (24) has a geometric representation  . It can be seen as an application of the law of cosine to the vectors and snj, which form the angle between them. By applying this representation also to the vectors snk and sx (Greek) forming the angle and to the vectors snj and sx, forming the angle , the angle is given by , therefore r is given by:
Now, by Equation (6), the correlation noise-to-signal power ratio in the translation channel from language k to language j is given by:
In conclusion, the total noise-to-signal power ratio in the translation channel from language k to language j, for a given stochastic variable, is given by:
Figures 21-23 show the geometrical representation of Rm and Rr in the first Cartesian quadrant as discussed in Section 4, for all linguistic variables. Notice that the regression lines from Greek to other languages, drawn from Figure 7 and Figure 8, are approximately upper bounds to the general theory in the words, sentences and interpunctions channels. Moreover, also for the other variables, Greek direct and reverse channels are noisier than other languages. In other words, modern languages and Latin are statistically closer to each other than to Greek. We also notice two different features: the words nW, sentences nS and interpunctions nI channels are mostly dominated by Rm, because for most languages , i.e. . This result underlines, again, the greater freedom used in these channels in sizing the number of words, sentences and interpunctions, whose average values may vary substantially (Table 1 and Table 2), while keeping very high correlation coefficients (Table 3). In the words channel for example , and . On the contrary, in the channels concerning the deep-language variables PF, IP (with some exceptions), MF, and the readability index G, we mostly observe , i.e. . In the PF channel, for example, in Table 3 we read and , with a significant impact on the noise-to-signal power ratio.
From Figures 21-23 we can calculate direct and reverse channels capacities. Figure 24 shows the scatterplots between Ckj (direct channel) and Cjk (reverse channel) for some languages in the words channel nW.
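The chain of calculations from the Greek-based statistics to the capacities of a direct and a reverse channel can be sketched as follows. Several assumptions are made and should be checked against the equations of this section and of Section 4: the slope of the channel from k to j is taken as m_j/m_k; the correlation coefficient is taken as the cosine of the difference between the angles arccos(r_j) and arccos(r_k); the regression and correlation noise-to-signal ratios are taken as (m − 1)^2 and m^2 (1 − r^2)/r^2; and the capacity as 0.5 log2(1 + 1/R). The numerical values of m and r are hypothetical and are not taken from Table 3.

```python
import math
from typing import NamedTuple


class PairChannel(NamedTuple):
    slope: float
    correlation: float
    regression_ratio: float   # Rm
    correlation_ratio: float  # Rr
    total_ratio: float        # R = Rm + Rr
    capacity: float           # bits per symbol


def pair_channel(m_k, r_k, m_j, r_j):
    """Channel from language k (input) to language j (output).

    m_k, r_k and m_j, r_j are the slopes and correlation coefficients of
    the channels from Greek to k and from Greek to j. ALL formulas below
    are assumed forms, used for illustration only.
    """
    if not (0.0 < r_k <= 1.0 and 0.0 < r_j <= 1.0):
        raise ValueError("correlation coefficients must lie in (0, 1]")
    slope = m_j / m_k
    correlation = math.cos(abs(math.acos(r_j) - math.acos(r_k)))
    regression_ratio = (slope - 1.0) ** 2
    correlation_ratio = (1.0 - correlation ** 2) / correlation ** 2 * slope ** 2
    total = regression_ratio + correlation_ratio
    cap = math.inf if total == 0.0 else 0.5 * math.log2(1.0 + 1.0 / total)
    return PairChannel(slope, correlation, regression_ratio,
                       correlation_ratio, total, cap)


# Hypothetical statistics of two translations from Greek.
direct = pair_channel(m_k=1.10, r_k=0.95, m_j=0.90, r_j=0.90)   # k -> j
reverse = pair_channel(m_k=0.90, r_k=0.90, m_j=1.10, r_j=0.95)  # j -> k

print(f"direct : m = {direct.slope:.3f}, r = {direct.correlation:.4f}, "
      f"C = {direct.capacity:.3f}")
print(f"reverse: m = {reverse.slope:.3f}, r = {reverse.correlation:.4f}, "
      f"C = {reverse.capacity:.3f}")
```

In this sketch the two directions share the same correlation coefficient but have reciprocal slopes, so their noise-to-signal ratios, and hence their capacities, generally differ.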
Figure 25 shows the scatterplot of the averages of all languages for the words channel. Notice that the perfect even symmetry around the 45˚ line is due to how the table from which the data are taken is built. However, the interesting point is the very small data scattering around the 45˚ line, which yields a small . Similar scatterplots are also obtained for the other channels (see Appendix F).
These scatterplots show that direct and reverse channels are not very different. Although , as we establish in the next subsection, they are, however, very similar for all variables and languages, regardless of their absolute value. In other words, a common underlying structure emerges from considering channel capacities, which seems to govern textual/verbal communication channels defined here, as we can see in Figure 25. In Appendix F we show results for the other linguistic channels.
In the next subsection we show that .
Figure 21. Upper: Scatterplot between and in nW channels. The origin represents the ideal channel. For each language 36 identical symbols are shown, because it is the common output of the translations from the remaining 36 languages. The regression line is redrawn from Figure 7. Lower: symbols caption.
Figure 22. Scatterplot between and . Upper: nS channels. Middle: nI channels. Lower: PF channels.
Figure 23. Scatterplot between and . Upper: IP channels. Middle: MF channels. Lower: G channels.
Figure 24. nW channels. Upper: Scatterplot between direct channel capacity (from … to) and the reverse channel capacity (to … from) for Greek. Middle: (to … from) for Latin. Lower: (to … from) for English.
Figure 25. Upper: Scatterplot between direct channel capacity (from … to) and the reverse channel capacity (to … from) for all languages, nW channel. The origin represents the ideal channel. The large red symbol is the overall average value. Lower: symbols caption.
11.2. Direct and Reverse Channels Are Not Symmetric
Are the direct and reverse channels between a pair of languages, e.g. translations from Greek to English and from English to Greek, symmetric? We can answer this question by considering the channel capacity.
The specific question now becomes: is the capacity Ckj (bits per symbol) of the (direct) channel from language k to language j equal to the capacity Cjk of the (reverse) channel from language j to language k? In other words, can the two languages be exchanged in the input-output relationship without changing the statistical characteristics of the translation channel? According to communication theory  , this happens in telecommunication channels affected by additive white Gaussian noise, but it is not true in translation channels, as we show next.
We now establish that any pair of direct and reverse channels is not symmetric, unless and , a case never found. The reason for this asymmetry is that the noise added to any ideal (self-translation) channel to get the text in another language is statistically always different.
According to Equations (12) and (27), and recalling that , the two channel capacities are equal if:
Let . After standard algebraic steps, we get the following solution for the unknown correlation coefficient:
To yield real values, the radicand in Equation (29) must be positive and, to be a valid correlation coefficient, the solution must not exceed 1; therefore we get the range:
The lower limit in (31) is always satisfied because ; the upper limit gives:
The inequality (31) is never satisfied, unless x = 1, therefore only if , in which case, from Equation (29) . In other words, in translation channels . Only in the ideal channel (self-translation) . In the next subsection we assess how large the capacity difference is, in other words, how asymmetric direct and reverse channels are.
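The asymmetry can also be checked numerically. The sketch below is only illustrative and rests on assumptions not restated in this section: that the capacity takes the Gaussian-channel form C = 0.5 log2(1 + 1/R), that R = (m - 1)^2 + m^2 (1 - r^2)/r^2, and that the reverse channel has slope 1/m with the same correlation coefficient r as the direct channel. Function names and sample values are hypothetical.

```python
import math


def capacity(m: float, r: float) -> float:
    """Capacity (bits per symbol) under the assumed Gaussian-channel form."""
    noise_to_signal = (m - 1.0) ** 2 + m * m * (1.0 - r * r) / (r * r)
    if noise_to_signal == 0.0:
        return math.inf  # ideal (self-translation) channel: no noise
    return 0.5 * math.log2(1.0 + 1.0 / noise_to_signal)


def capacity_pair(m: float, r: float) -> tuple[float, float]:
    """Direct (slope m) and reverse (slope 1/m) capacities, same r assumed."""
    return capacity(m, r), capacity(1.0 / m, r)


# Hypothetical slopes and correlation coefficient, for illustration only.
for m, r in [(0.8, 0.95), (1.25, 0.95)]:
    direct, reverse = capacity_pair(m, r)
    print(f"m={m:.2f} r={r:.2f}  C_direct={direct:.3f}  C_reverse={reverse:.3f}")
```

Under these assumptions, any slope different from 1 gives two different noise-to-signal ratios for the direct and reverse channels, and therefore two different capacities.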
11.3. Direct and Reverse Channels Capacity Difference
Figures 26-28 show the main statistics of the capacity difference between each pair of direct and reverse channels, for the same linguistic variable and for each language. They plot, against the average capacity difference, its standard deviation, its root mean square (RMS) value (bits per symbol) and its relative (normalized) value RMS (%), the latter obtained by dividing the RMS of the capacity difference by the average direct channel capacity. Table 11 reports averages.
Several interesting observations can be made. First, we notice that , and RMS vary in about the same range. The average value, for example, is almost always in the range (bits per symbol), regardless of the variable. Only Greek is clearly distinct from the other languages, with larger values. The standard deviation is even more stable as (bits per symbol) in most cases. Only RMS has larger variations, between 0.2 and 0.6 (bits per symbol). As already noticed, Latin and modern languages are closer to each other than to Greek.
By contrast, the variations of the normalized RMS (%) differ significantly from channel to channel. In the words channel the normalized RMS varies between 10% and 30%, and similarly for the sentences channel (10% to 40%) and the interpunctions channel (10% to 20%); it varies over a much larger range, up to 300%, in the deep-language channels PF, IP and MF.
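As a minimal sketch of how these statistics might be computed for one language and one linguistic variable, the function below takes paired lists of direct and reverse capacities. The sign convention (reverse minus direct), the use of the population standard deviation, the function name and the sample values are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def capacity_difference_stats(direct: list[float], reverse: list[float]) -> dict[str, float]:
    """Summarize the differences between reverse and direct capacities.

    direct[i] and reverse[i] are the capacities (bits per symbol) of the
    i-th pair of channels for one language and one linguistic variable.
    """
    if len(direct) != len(reverse) or not direct:
        raise ValueError("need two non-empty lists of equal length")
    # Assumed sign convention: reverse capacity minus direct capacity.
    diffs = [c_rev - c_dir for c_dir, c_rev in zip(direct, reverse)]
    rms = math.sqrt(fmean([d * d for d in diffs]))
    return {
        "mean": fmean(diffs),
        "std": pstdev(diffs),  # population standard deviation (assumed)
        "rms": rms,
        # Normalized RMS: RMS divided by the average direct capacity.
        "rms_percent": 100.0 * rms / fmean(direct),
    }


# Hypothetical capacities (bits per symbol), not taken from the figures.
stats = capacity_difference_stats(direct=[2.0, 3.0], reverse=[2.5, 3.1])
print({name: round(value, 3) for name, value in stats.items()})
```

With the population standard deviation, the three quantities are linked by RMS^2 = mean^2 + std^2, which is a convenient consistency check on tabulated values.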
We can rank the channels according to the normalized RMS (%). Table 12 shows its overall average. The least variable channel is the readability channel, followed by the interpunction channel, the words and sentences channels, then the deep-language channels, therefore confirming that these latter variables are treated by translators with fewer constraints than the number of words or sentences, unless dialogues have to be respected, as seen with Treasure Island translations. In other words, in the NT translations differences are mainly due to specific linguistic variables, not to the particular language.
Figure 26. Standard deviation, RMS and normalized RMS (%) values versus average capacity difference of the reverse and direct nW channels.
Figure 27. Standard deviation, RMS and normalized RMS (%) values versus the average capacity difference of the reverse and direct channels. Upper: nS channels. Middle: nI channels. Lower: PF channels.
Figure 28. Standard deviation, RMS and normalized RMS (%) values versus the average capacity difference of the reverse and direct channels. Upper: IP channels. Middle: MF channels. Lower: G channels.
Table 11. Direct and reverse channels average statistics.
Table 12. Channels ranking according to the overall normalized RMS (%).
We have proposed a unifying statistical theory of translation, based on communication theory, which involves linguistic stochastic variables, some of which are not considered by scholars. Its main mathematical characteristics have emerged by studying the translation of most NT books.
When a text written in one language is translated into another language, all linguistic variables change numerically. To study these apparently chaotic data we have characterized any translation as a complex communication channel affected by “noise”, studied according to communication theory, applied for the first time to this channel. The new theory deals with aspects of languages more complex than those currently considered in machine translations. The input language is the “signal”; the output language is a “replica” of the input language, but largely perturbed by noise. For the output language, this noise is indispensable for conveying the meaning of the input language to its readers.
All channels studied are affected differently by translation noise. The most accurate channel is the words channel nW, a finding that seems reasonable. It emerges that humans seem to express a given meaning with a number of words (i.e. finite strings of abstract signs, or characters) which cannot vary much, even between languages that do not share a common ancestor. On the contrary, the number of sentences and especially their length in words, i.e. PF, are treated more freely by translators. PF strongly affects readability indices, therefore this variable tends to be better matched to the intended readers and their specific reading ability.
Independently of the different parallel channels (one for each variable), the correlation noise (due to a regression line slope ) is mostly larger than the regression noise (due to a regression correlation coefficient ), therefore indicating that every translation tries as much as possible to be not biased, but it cannot avoid being decorrelated, with correlation coefficients which approximately decrease from words, to sentences, to interpunctions and down to the deep-language variables PF, IP, MF and CP.
Different translations of the NT within the same language, mathematically, can be quite different and they can even seem to belong to different languages. In other words, in language translations differences are mainly due to specific linguistic variables, not to the particular language. Clearly, they are matched to different audiences, an aspect not explicitly considered in machine translations.
Besides the noise-to-signal power ratio, communication channels can also be characterized by the channel capacity (bits per symbol, the latter suitably defined). This parameter can be relatively large, very close to the maximum value obtainable, for the nW, nS and nI channels, and smaller for the PF, IP and MF channels. We have found that the NT translations are similar to translations of literary texts, as shown for the novel Treasure Island translated from English to Italian, French and German, for the nW, nS and nI channels. On the contrary, the translation of novels seems to set more stringent constraints on the translators for the PF and IP channels, because dialogues must be strictly maintained. This topic deserves further research.
The number of words per interpunction IP varies in the same range as the short-term memory capacity. Plotted against the number of words per sentence PF, IP tends to saturate to a horizontal asymptote as PF increases: even though sentences get longer, IP cannot exceed about the upper limit of Miller’s law, because of the constraints imposed by readers’ short-term memory capacity.
We have defined a formula for the readability index of any alphabetical language, based on a calque of the readability formula used in Italian, both to provide one for languages that have none and to estimate, on common grounds, the readability of texts belonging to different languages/translations.
Finally, we have extended the statistical theory outlined before to a general theory of translation applicable to any alphabetical language, even to texts written in the same language. The general theory shows that direct and reverse channels are not symmetric.
In conclusion, a common underlying statistical structure governing human textual/verbal communication channels, not defeated by the mythical biblical Tower of Babel, seems to emerge from the findings. The main result is that the statistical and communication characteristics of a text and of its translations into other languages seem to depend not only on the particular language, mainly through the number of words and sentences, but also on the particular translation, because the text is strongly characterized by the reading ability and short-term memory capacity of the intended readers, aspects not explicitly considered in machine translations. These conclusions seem to be everlasting because they also apply to ancient Roman and Greek readers. Future research should extend the general theory to non-alphabetical languages.
The author wishes to warmly thank all those scholars who, with continuous great care and dedication, keep online the texts of the Bible in many languages, for the benefit of everyone.
Appendix A. List of mathematical symbols.
Appendix B. Entropy and human information-processing
The short-term memory capacity follows Miller’s 7 ± 2 law  . Notice, however, that the range of Miller’s law does not refer to bits, but to a “buffer” in which are stored “chunks” of information of the type that can be “compressed”, as are sequences of words or sequences of numbers (see  and the references there cited). In other words, humans process information differently from translation machines. As a consequence, the entropy of a language may be misleading in studying the linguistic channels defined in this paper. This point is now illustrated with an example.
Let us consider the total number of words W (Table 1) of translations into English, French, German, Italian, Russian and Spanish. The entropy of a language referred to single letters is termed F1 by Shannon  . Estimated values of F1 for the mentioned languages are reported in Table B1.
Now the total number of information bits produced
Table B1 reports the values calculated from Equation (B.1). It is clear that each language/translation has a different number of words and bits. Table B2 reports the ranking of languages (from minimum to maximum) according to the number of words (left column) or the number of bits (right column). The first column is what humans perceive; the second column is what machines process. The two lists are identical only for the first three lines (Russian, Greek and Italian), then they diverge. Now, the short-term memory responds to words, not to bits, therefore the use of entropy can be highly misleading (e.g., see German, English, French and Spanish) in estimating quantities and the characteristics of the linguistic channels defined in this paper.
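The divergence between the two rankings can be reproduced with a few lines of code. The sketch below assumes that the total number of bits is the product of the single-letter entropy F1 (bits per letter), the average number of letters per word CP, and the total number of words W; this product is inferred from the quantities listed in the caption of Table B1 and is an assumption, since Equation (B.1) is not reproduced here. The three entries are invented placeholders, not the values of Table B1.

```python
def total_bits(f1: float, cp: float, words: int) -> float:
    """Assumed form: bits per letter x letters per word x number of words."""
    return f1 * cp * words


# Placeholder entries: (F1 bits per letter, CP letters per word, W words).
# These numbers are invented for illustration and are NOT those of Table B1.
texts = {
    "text_a": (4.0, 5.0, 100_000),
    "text_b": (4.2, 4.2, 110_000),
    "text_c": (3.9, 4.6, 105_000),
}

# Rank from minimum to maximum, first by words, then by bits.
by_words = sorted(texts, key=lambda name: texts[name][2])
by_bits = sorted(texts, key=lambda name: total_bits(*texts[name]))

print("ranking by words:", by_words)
print("ranking by bits: ", by_bits)
```

Because the number of bits also depends on F1 and CP, a text with fewer words can still carry more bits, so the two rankings need not agree.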
Table B1. Entropy F1 (bits per letter), total number of words W, average number of letters (characters) per word CP, difference between the number of bits in the indicated language and in Greek . Source of F1 data:  for Greek;  for French, German, Italian and Spanish;  for English;  for Russian.
Table B2. Ranking (from minimum to maximum).
Appendix C. Scatterplots of average channel capacity for the indicated channels. Languages are distinguished according to the symbols listed below.
Appendix D. Scatterplots of IP versus PF for the indicated languages. The horizontal magenta lines are Miller’s bounds, 5 and 9.
Appendix E. Scatterplots of GC (blue) and GF (red) versus G for the indicated languages.
Appendix F. Scatterplots between direct channel capacity (from … to) and the reverse channel capacity (to … from) for all languages. Red circles are the average values of each translation. The red symbol is the overall average value.
 Barbancon, F., Evans, S., Nakhleh, L., Ringe, D. and Warnow, T. (2013) An Experimental Study Comparing Linguistic Phylogenetic Reconstruction Methods. Diachronica, 30, 143-170.
 Collins-Thompson, K. (2014) Computational Assessment of Text Readability: A Survey of Past, in Present and Future Research, Recent Advances in Automatic Readability Assessment and Text Simplification, ITL. International Journal of Applied Linguistics, 165, 97-135.
 Bakker, D., Muller, A., Velupillai, V., Wichmann, S., Brown, C.H., Brown, P., Egorov, D., Mailhammer, R., Grant, A. and Holman, E.W. (2009) Adding Typology to Lexicostatistics: A Combined Approach to Language Classification. Linguistic Typology, 13, 169-181.
 Petroni, F. and Serva, M. (2010) Measures of Lexical Distance between Languages. Physica A: Statistical Mechanics and Its Applications, 389, 2280-2283.
 Gómez-Adorno, E., Sidorov, G., Pinto, D., Vilarino, D. and Gelbukh, A. (2016) Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs. Sensors, 16, 19 p.
 Carling, G., Larsson, F., Cathcart, C., Johansson, N., Holmer, A., Round, E. and Verhoeven, R. (2018) Diachronic Atlas of Comparative Linguistics (DiACL)—A Database for Ancient Language Typology. PLoS ONE, 13, e0205313.
 Gao, Y., Liang, W., Shi, Y. and Huang, Q. (2014) Comparison of Directed and Weighted Co-Occurrence Networks of Six Languages. Physica A: Statistical Mechanics and Its Applications, 393, 579-589.
 Pichel, J.R., Gamallo, P. and Alegria, I. (2019) Measuring Diachronic Language Distance Using Perplexity: Application to English, Portuguese, and Spanish. Natural Language Engineering, 26, 433-454.
 Brown, P.F., Cocke, J., Della Pietra, A., Della Pietra, V.J., Jelinek, F., Lafferty, J.D., Mercer, R.L. and Roossin, P.S. (1990) A Statistical Approach to Machine Translation. Computational Linguistics, 16, 79-85.
 Koehn, P., Och, F.J. and Marcu, D. (2003) Statistical Phrase-Based Translation. Proceedings of HLT-NAACL 2003, Main Papers, Edmonton, 27 May-1 June 2003, 48-54.
 Carl, M. and Schaeffer, M. (2017) Sketch of a Noisy Channel Model for the Translation Process. In: Hansen-Schirra, S., Czulo, O. and Hofmann, S., Eds., Empirical Modelling of Translation and Interpreting, Language Science Press, Berlin, 71-116.
 Miller, G.A. (1955) The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review, 101, 343-352.
 Matricciani, E. (2019) Deep Language Statistics of Italian throughout Seven Centuries of Literature and Empirical Connections with Miller’s 7 ± 2 Law and Short-Term Memory. Open Journal of Statistics, 9, 373-406.
 Matricciani, E. (2009) An Optimum Design of Deep-Space Downlinks Affected by Tropospheric Attenuation. International Journal of Satellite Communications and Networking, 27, 312-329.
 De Caro, L., Giannini, C., Lassandro, R., Scattarella, F., Sibillano, T., Matricciani, E. and Fanti, G. (2019) X-Ray Dating of Ancient Linen Fabrics. Heritage, 2, 2763-2783.
 Francois, T. (2014) An Analysis of a French as Foreign Language Corpus for Readability Assessment. Proceedings of the 3rd Workshop on NLP for CALL, NEALT, Proceedings 107, Series 22, Linkoping, 13-32.