Decomposition of the Sum of Cubes, the Sum Raised to the Power of Four and Codeviance

1. Introduction

When a population can be split into groups, the decomposition of the deviance into two parts is well known in statistics (Girone, 2009) [1]: the sum of the partial (within-group) deviances and the sum of the squared deviations between the partial and general averages, weighted by the group sizes (Mickey et al., 2004) [2]. The purpose of this paper is to obtain similar decomposition formulas for the sum of cubed deviations, the sum of deviations raised to the fourth power, and the codeviance. To this end, we denote by k the group index and by i the index of a statistical unit, by r the number of groups, by N the size of the population, by N_{k} the size of the k-th group, and by (x_{ki}, y_{ki}) the i-th observation of the two variables X and Y in the k-th group.

2. Decomposition of a Sum of Deviation Cubes

For the variable X, the averages, deviances and sums of cubed deviations of the groups, for $k=1,2,\cdots ,r$, are

$\bar{x}_{k}=\frac{\sum_{i=1}^{N_{k}}x_{ki}}{N_{k}}$ (1)

$Dev\left(X_{k}\right)=\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)^{2}$ (2)

$Sc\left(X_{k}\right)=\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)^{3}$ (3)

The general average, the general deviance and the general sum of cubed deviations are

$\bar{x}=\frac{\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}x_{ki}}{N}$ (4)

$Dev\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}\right)^{2}$ (5)

$Sc\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}\right)^{3}$ (6)

As in the decomposition of the deviance, we start from the general sum of cubed deviations and subtract and add the average of the k-th group inside the brackets:

$Sc\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left[\left(x_{ki}-\bar{x}_{k}\right)+\left(\bar{x}_{k}-\bar{x}\right)\right]^{3}$ (7)

Expanding the cube and simplifying, the outcome is

$Sc\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)^{3}+3\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)Dev\left(X_{k}\right)+\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)^{3}N_{k}$ (8)

that is,

$Sc\left(X\right)=\sum_{k=1}^{r}Sc\left(X_{k}\right)+3\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)Dev\left(X_{k}\right)+\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)^{3}N_{k}$ (9)

which shows that the general sum of cubed deviations equals the sum of the partial sums of cubed deviations, plus three times the sum of the deviations between the partial averages and the general average weighted by the partial deviances, plus the sum of the cubed deviations between the partial averages and the general average weighted by the group sizes. In other words, the within-group component is complemented by two components that depend on the differences between the partial averages and the general average, on the partial deviances and on the group sizes.
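The simplification that leads from (7) to (8) can be spelled out as follows. This is only a sketch of the expansion; the shorthand $a_{ki}=x_{ki}-\bar{x}_{k}$ and $b_{k}=\bar{x}_{k}-\bar{x}$ is introduced here for illustration and is not part of the original notation. The term that is linear in $a_{ki}$ drops out because the deviations from a group average sum to zero within that group.

```latex
\begin{aligned}
\sum_{k=1}^{r}\sum_{i=1}^{N_k}\left(a_{ki}+b_k\right)^{3}
 &= \sum_{k}\sum_{i} a_{ki}^{3}
  + 3\sum_{k} b_k \sum_{i} a_{ki}^{2}
  + 3\sum_{k} b_k^{2} \sum_{i} a_{ki}
  + \sum_{k} N_k\, b_k^{3} \\
 &= \sum_{k} Sc(X_k) + 3\sum_{k} b_k\, Dev(X_k) + 0 + \sum_{k} N_k\, b_k^{3},
 \qquad \text{since } \sum_{i=1}^{N_k} a_{ki} = 0 .
\end{aligned}
```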

Needless to say, if the partial averages are all equal, the last two components vanish, so that the general sum of cubed deviations equals the sum of the partial sums of cubed deviations.
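The decomposition in (9) can be checked numerically. The following is a minimal sketch in plain Python; the function names and the small data set are hypothetical and serve only to illustrate the identity.

```python
import math

def mean(values):
    return sum(values) / len(values)

def power_sum(values, center, p):
    """Sum of the p-th powers of the deviations of `values` from `center`."""
    return sum((v - center) ** p for v in values)

def decompose_cubes(groups):
    """Return (total, within, mixed, between) for the sum of cubed deviations.

    total   : general sum of cubed deviations, Sc(X)
    within  : sum of the partial sums Sc(X_k)
    mixed   : 3 * sum of (group mean - general mean) * Dev(X_k)
    between : sum of N_k * (group mean - general mean)^3
    """
    pooled = [v for g in groups for v in g]
    grand = mean(pooled)
    total = power_sum(pooled, grand, 3)
    within = sum(power_sum(g, mean(g), 3) for g in groups)
    mixed = 3 * sum((mean(g) - grand) * power_sum(g, mean(g), 2) for g in groups)
    between = sum(len(g) * (mean(g) - grand) ** 3 for g in groups)
    return total, within, mixed, between

# Hypothetical data: three small groups of unequal size.
groups = [[1.0, 2.0, 6.0], [4.0, 5.0, 9.0, 10.0], [3.0, 3.0]]
total, within, mixed, between = decompose_cubes(groups)
print(total, within, mixed, between)
# The three components add up to the general sum, as stated by (9).
assert math.isclose(total, within + mixed + between, rel_tol=1e-9, abs_tol=1e-9)
```

When all groups share the same average, `mixed` and `between` are both zero and `total` coincides with `within`.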

Other special cases of interest are those of equal partial deviances and of equal group sizes.

The formula takes a particularly simple form in the case of two groups:

$\begin{array}{c}Sc\left(X\right)=\sum_{k=1}^{2}Sc\left(X_{k}\right)+\frac{3\left(\bar{x}_{1}-\bar{x}_{2}\right)\left(\sigma_{1}^{2}-\sigma_{2}^{2}\right)N_{1}N_{2}}{N_{1}+N_{2}}\\ +\frac{\left(\bar{x}_{1}-\bar{x}_{2}\right)^{3}\left(N_{2}-N_{1}\right)N_{1}N_{2}}{\left(N_{1}+N_{2}\right)^{2}}\end{array}$ (10)

where $\sigma_{1}^{2}$ and $\sigma_{2}^{2}$ are the variances of the two groups. The second addend is positive (negative) if the averages and the variances are concordant (discordant), that is, if the group with the larger average has the larger (smaller) variance. The third addend, instead, is positive (negative) if the averages and the sizes of the two groups are discordant (concordant).
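The two-group form can be checked against a direct computation. The sketch below assumes population variances (division by the group size) and takes the third addend with denominator $(N_{1}+N_{2})^{2}$, which is what the general decomposition for r groups reduces to when r = 2; the function names and data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: deviance divided by the group size."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def sc(values):
    """Sum of cubed deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 3 for v in values)

def two_group_cubes(g1, g2):
    """Return the three addends of the two-group decomposition of Sc(X)."""
    n1, n2 = len(g1), len(g2)
    n = n1 + n2
    d = mean(g1) - mean(g2)
    within = sc(g1) + sc(g2)
    second = 3 * d * (variance(g1) - variance(g2)) * n1 * n2 / n
    third = d ** 3 * (n2 - n1) * n1 * n2 / n ** 2
    return within, second, third

# Hypothetical data for two groups.
g1 = [1.0, 2.0, 6.0]
g2 = [4.0, 5.0, 9.0, 10.0]
within, second, third = two_group_cubes(g1, g2)
# The pooled list g1 + g2 gives the general sum of cubed deviations.
assert math.isclose(sc(g1 + g2), within + second + third, rel_tol=1e-9, abs_tol=1e-9)
```

In this example the first group has the smaller average and the smaller variance, so averages and variances are concordant and `second` is positive.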

3. Decomposition of the Sum of Deviation Raised to the Power of Four

For the variable X, the sums of deviations raised to the fourth power within the groups, that is, the partial sums of fourth powers of the deviations, for $k=1,2,\cdots ,r$, are

$Sq\left(X_{k}\right)=\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)^{4}$ (11)

The general sum of deviations raised to the fourth power is

$Sq\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}\right)^{4}$ (12)

As in the previous section, subtracting and adding the average of the k-th group inside the brackets of the general sum of fourth powers gives

$Sq\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left[\left(x_{ki}-\bar{x}_{k}\right)+\left(\bar{x}_{k}-\bar{x}\right)\right]^{4}$ (13)

Expanding the fourth power and simplifying, the outcome is

$\begin{array}{c}Sq\left(X\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)^{4}+4\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)Sc\left(X_{k}\right)\\ +6\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)^{2}Dev\left(X_{k}\right)+\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)^{4}N_{k}\end{array}$ (14)

that is,

$\begin{array}{c}Sq\left(X\right)=\sum_{k=1}^{r}Sq\left(X_{k}\right)+4\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)Sc\left(X_{k}\right)\\ +6\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)^{2}Dev\left(X_{k}\right)+\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)^{4}N_{k}\end{array}$ (15)

which shows that the general sum of deviations raised to the fourth power equals the sum of the partial sums of fourth powers, plus four times the sum of the deviations between the partial averages and the general average weighted by the partial sums of cubed deviations, plus six times the sum of the squared deviations between the partial averages and the general average weighted by the partial deviances, plus the sum of the fourth powers of the deviations between the partial averages and the general average weighted by the group sizes. In other words, the within-group component is complemented by three components that depend on the differences between the partial averages and the general average, on the partial sums of cubed deviations, on the partial deviances and on the group sizes.
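The expansion behind this result can be sketched as follows, with the illustrative shorthand $a_{ki}=x_{ki}-\bar{x}_{k}$ and $b_{k}=\bar{x}_{k}-\bar{x}$ (not part of the original notation). Of the five binomial terms, only the one that is linear in $a_{ki}$ vanishes, which is why three components remain besides the within-group one.

```latex
\begin{aligned}
\sum_{k=1}^{r}\sum_{i=1}^{N_k}\left(a_{ki}+b_k\right)^{4}
 &= \sum_{k}\sum_{i} a_{ki}^{4}
  + 4\sum_{k} b_k \sum_{i} a_{ki}^{3}
  + 6\sum_{k} b_k^{2} \sum_{i} a_{ki}^{2}
  + 4\sum_{k} b_k^{3} \sum_{i} a_{ki}
  + \sum_{k} N_k\, b_k^{4} \\
 &= \sum_{k} Sq(X_k) + 4\sum_{k} b_k\, Sc(X_k) + 6\sum_{k} b_k^{2}\, Dev(X_k) + 0 + \sum_{k} N_k\, b_k^{4},
 \qquad \text{since } \sum_{i=1}^{N_k} a_{ki} = 0 .
\end{aligned}
```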

Needless to say, the last three components vanish if the partial averages are all equal, so that the general sum of fourth powers equals the sum of the partial sums of fourth powers.
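A numerical check of the four-component decomposition is sketched below in plain Python. The third component is computed with the squared deviations of the group averages, as in (14); function names and data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def dev_power_sum(values, p):
    """Sum of the p-th powers of the deviations of `values` from their own mean."""
    m = mean(values)
    return sum((v - m) ** p for v in values)

def decompose_fourth(groups):
    """Return (total, within, with_cubes, with_deviances, between) for Sq(X)."""
    pooled = [v for g in groups for v in g]
    grand = mean(pooled)
    total = dev_power_sum(pooled, 4)
    within = sum(dev_power_sum(g, 4) for g in groups)
    # 4 * sum of (group mean - general mean) * Sc(X_k)
    with_cubes = 4 * sum((mean(g) - grand) * dev_power_sum(g, 3) for g in groups)
    # 6 * sum of (group mean - general mean)^2 * Dev(X_k)
    with_deviances = 6 * sum((mean(g) - grand) ** 2 * dev_power_sum(g, 2) for g in groups)
    # sum of N_k * (group mean - general mean)^4
    between = sum(len(g) * (mean(g) - grand) ** 4 for g in groups)
    return total, within, with_cubes, with_deviances, between

# Hypothetical data: three small groups of unequal size.
groups = [[1.0, 2.0, 6.0], [4.0, 5.0, 9.0, 10.0], [3.0, 3.0]]
total, *components = decompose_fourth(groups)
print(total, components)
assert math.isclose(total, sum(components), rel_tol=1e-9, abs_tol=1e-9)
```

The last two components are sums of non-negative terms, while `with_cubes` can take either sign.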

Other special cases of interest are those of equal partial sums of cubed deviations, of equal partial deviances and of equal group sizes.

Here too, the formula takes a particularly simple form in the case of two groups:

$\begin{array}{c}Sq\left(X\right)=\sum_{k=1}^{2}Sq\left(X_{k}\right)+\frac{4\left(\bar{x}_{1}-\bar{x}_{2}\right)\left(\gamma_{1}-\gamma_{2}\right)N_{1}N_{2}}{N_{1}+N_{2}}\\ +\frac{6\left(\bar{x}_{1}-\bar{x}_{2}\right)^{2}\left(N_{2}\sigma_{1}^{2}+N_{1}\sigma_{2}^{2}\right)N_{1}N_{2}}{\left(N_{1}+N_{2}\right)^{2}}\\ +\frac{\left(\bar{x}_{1}-\bar{x}_{2}\right)^{4}\left(N_{1}^{2}-N_{1}N_{2}+N_{2}^{2}\right)N_{1}N_{2}}{\left(N_{1}+N_{2}\right)^{3}}\end{array}$ (16)

where γ_{1} and γ_{2} are the asymmetry indices of the two groups (here the third central moments, $\gamma_{k}=Sc\left(X_{k}\right)/N_{k}$). The second addend is positive (negative) if the averages and the asymmetry indices are concordant (discordant). The third and fourth addends are always non-negative.
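The two-group form can also be verified numerically. The sketch below rests on two assumptions that follow from setting r = 2 in the general decomposition: the asymmetry index is taken as the third central moment $Sc\left(X_{k}\right)/N_{k}$, and the third addend is taken as $6\left(\bar{x}_{1}-\bar{x}_{2}\right)^{2}\left(N_{2}\sigma_{1}^{2}+N_{1}\sigma_{2}^{2}\right)N_{1}N_{2}/\left(N_{1}+N_{2}\right)^{2}$. Function names and data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def central_moment(values, p):
    """p-th central moment: sum of p-th powers of deviations divided by the size."""
    m = mean(values)
    return sum((v - m) ** p for v in values) / len(values)

def sq(values):
    """Sum of fourth powers of the deviations from the mean of `values`."""
    return central_moment(values, 4) * len(values)

def two_group_fourth(g1, g2):
    """Return the four addends of the two-group decomposition of Sq(X)."""
    n1, n2 = len(g1), len(g2)
    n = n1 + n2
    d = mean(g1) - mean(g2)
    var1, var2 = central_moment(g1, 2), central_moment(g2, 2)
    gamma1, gamma2 = central_moment(g1, 3), central_moment(g2, 3)
    within = sq(g1) + sq(g2)
    second = 4 * d * (gamma1 - gamma2) * n1 * n2 / n
    third = 6 * d ** 2 * (n2 * var1 + n1 * var2) * n1 * n2 / n ** 2
    fourth = d ** 4 * (n1 ** 2 - n1 * n2 + n2 ** 2) * n1 * n2 / n ** 3
    return within, second, third, fourth

# Hypothetical data for two groups.
g1 = [1.0, 2.0, 6.0]
g2 = [4.0, 5.0, 9.0, 10.0]
addends = two_group_fourth(g1, g2)
assert math.isclose(sq(g1 + g2), sum(addends), rel_tol=1e-9, abs_tol=1e-9)
```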

4. Decomposition of Codeviance

For the partial and general averages of the variable Y, the formulas of the previous sections apply with y in place of x.

The codeviances between X and Y within the groups, for $k=1,2,\cdots ,r$, are

$Codev\left(X_{k},Y_{k}\right)=\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)\left(y_{ki}-\bar{y}_{k}\right)$ (17)

The general codeviance is

$Codev\left(X,Y\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}\right)\left(y_{ki}-\bar{y}\right)$ (18)

As in the previous sections, subtracting and adding the group averages inside the brackets gives

$Codev\left(X,Y\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left[\left(x_{ki}-\bar{x}_{k}\right)+\left(\bar{x}_{k}-\bar{x}\right)\right]\left[\left(y_{ki}-\bar{y}_{k}\right)+\left(\bar{y}_{k}-\bar{y}\right)\right]$ (19)

Expanding the product and dropping the two terms that are equal to zero, we have

$Codev\left(X,Y\right)=\sum_{k=1}^{r}\sum_{i=1}^{N_{k}}\left(x_{ki}-\bar{x}_{k}\right)\left(y_{ki}-\bar{y}_{k}\right)+\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)\left(\bar{y}_{k}-\bar{y}\right)N_{k}$ (20)

that is,

$Codev\left(X,Y\right)=\sum_{k=1}^{r}Codev\left(X_{k},Y_{k}\right)+\sum_{k=1}^{r}\left(\bar{x}_{k}-\bar{x}\right)\left(\bar{y}_{k}-\bar{y}\right)N_{k}$ (21)

which shows that the general codeviance equals the sum of the partial codeviances plus the sum of the products of the deviations of the partial averages of X from the general average of X and the corresponding deviations for Y, weighted by the group sizes. The latter sum can also be called the codeviance of the averages.
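The two zero-value terms dropped in passing from (19) to (20) can be made explicit. In the sketch below the shorthand $a_{ki}=x_{ki}-\bar{x}_{k}$, $b_{k}=\bar{x}_{k}-\bar{x}$, $c_{ki}=y_{ki}-\bar{y}_{k}$ and $d_{k}=\bar{y}_{k}-\bar{y}$ is introduced for illustration only; the two cross terms vanish because the deviations from a group average sum to zero within that group.

```latex
\begin{aligned}
\sum_{k=1}^{r}\sum_{i=1}^{N_k}\left(a_{ki}+b_k\right)\left(c_{ki}+d_k\right)
 &= \sum_{k}\sum_{i} a_{ki} c_{ki}
  + \sum_{k} d_k \sum_{i} a_{ki}
  + \sum_{k} b_k \sum_{i} c_{ki}
  + \sum_{k} N_k\, b_k d_k \\
 &= \sum_{k} Codev(X_k, Y_k) + 0 + 0 + \sum_{k} N_k\, b_k d_k .
\end{aligned}
```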

Needless to say, the codeviance of the averages is zero when the partial averages are all equal (for one or both variables), so that the general codeviance equals the sum of the partial codeviances.
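The decomposition of the codeviance can be checked with a short sketch in plain Python. Each group is represented as a pair of equally long lists; the representation, the function names and the data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def codev(xs, ys):
    """Codeviance: sum of the products of the deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def decompose_codev(groups):
    """Return (total, within, between) for a list of (xs, ys) groups."""
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    gx, gy = mean(all_x), mean(all_y)
    total = codev(all_x, all_y)
    within = sum(codev(xs, ys) for xs, ys in groups)
    # Codeviance of the averages, weighted by the group sizes.
    between = sum(len(xs) * (mean(xs) - gx) * (mean(ys) - gy) for xs, ys in groups)
    return total, within, between

# Hypothetical height (cm) and weight (kg) values for two groups.
groups = [
    ([170.0, 175.0, 180.0, 185.0], [65.0, 70.0, 80.0, 85.0]),
    ([158.0, 162.0, 166.0], [52.0, 55.0, 60.0]),
]
total, within, between = decompose_codev(groups)
print(total, within, between)
assert math.isclose(total, within + between, rel_tol=1e-9, abs_tol=1e-9)
```

If the group averages of X (or of Y) coincide, `between` is zero and `total` equals `within`.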

Here too, the formula takes a particularly simple form in the case of two groups:

$Codev\left(X,Y\right)=\sum_{k=1}^{2}Codev\left(X_{k},Y_{k}\right)+\frac{\left(\bar{x}_{1}-\bar{x}_{2}\right)\left(\bar{y}_{1}-\bar{y}_{2}\right)N_{1}N_{2}}{N_{1}+N_{2}}$ (22)

The second addend is positive (negative) if the averages of the two variables are concordant (discordant).
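The two-group form of the codeviance decomposition can be sketched in the same way. The function names and the data are hypothetical; the second addend is computed from the differences between the two group averages, as in (22).

```python
import math

def mean(values):
    return sum(values) / len(values)

def codev(xs, ys):
    """Codeviance: sum of the products of the deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def two_group_codev(x1, y1, x2, y2):
    """Return (within, between) for two groups of paired observations."""
    n1, n2 = len(x1), len(x2)
    within = codev(x1, y1) + codev(x2, y2)
    between = (mean(x1) - mean(x2)) * (mean(y1) - mean(y2)) * n1 * n2 / (n1 + n2)
    return within, between

# Hypothetical data: the first group has the larger average on both variables.
x1, y1 = [170.0, 175.0, 180.0, 185.0], [65.0, 70.0, 80.0, 85.0]
x2, y2 = [158.0, 162.0, 166.0], [52.0, 55.0, 60.0]
within, between = two_group_codev(x1, y1, x2, y2)
# Concatenating the two groups gives the general codeviance.
assert math.isclose(codev(x1 + x2, y1 + y2), within + between, rel_tol=1e-9, abs_tol=1e-9)
assert between > 0  # averages are concordant in this example
```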

5. Application

As an application, we consider a group of 278 students (144 males and 134 females) attending a first-year course at the University of Bari, whose height and body weight were recorded. The averages, deviations, sums of cubed deviations and sums of deviations raised to the fourth power are as follows (Tables 1-4).

From these results, the following decompositions are obtained (Tables 5-8).

Table 1. Number of cases.

Table 2. Height (cm).

Table 3. Weights (kg).

Table 4. Heights × weights.

Table 5. Decomposition of deviance.

Table 6. Decomposition of the sum of deviation cubes.

Table 7. Decomposition of the sum of deviation raised to the power of four.

Table 8. Decomposition of codeviance.

These results allow the following considerations. Let us start with the values:

- the two groups are of similar size (Males are slightly more numerous);

- the averages of both variables are larger among Males;

- the standard deviations of both variables indicate greater variability among Males;

- the asymmetry indices of height take slightly negative values in both sexes, whereas those of weight are clearly positive;

- the kurtosis (disnormality) indices of height take small values of opposite sign (negative for Males and positive for Females), whereas those of weight are positive for both sexes, although only the Male one takes a high value;

- both correlation coefficients are positive, and the Male one is the higher.

Let us now turn to decompositions.

The decomposition of the deviance highlights a substantial share of deviance between the two sexes, due to a marked difference between the averages.

This share is more pronounced for height, whose variability depends almost exclusively on genetic factors, than for weight, whose variability also depends on exogenous factors.
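Percentage shares of this kind can be obtained from raw data as in the following sketch, which uses the classical decomposition of the deviance recalled in the Introduction (within-group deviances plus squared deviations of the group averages weighted by the group sizes). The function name is hypothetical and the data are invented for illustration; they are not the Bari data.

```python
import math

def mean(values):
    return sum(values) / len(values)

def deviance(values):
    """Sum of squared deviations from the mean of `values`."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def deviance_shares(groups):
    """Return (total, within, between, within_pct, between_pct)."""
    pooled = [v for g in groups for v in g]
    grand = mean(pooled)
    total = deviance(pooled)
    within = sum(deviance(g) for g in groups)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return total, within, between, 100 * within / total, 100 * between / total

# Invented heights (cm) for two groups; not taken from the tables of the paper.
males = [172.0, 176.0, 179.0, 183.0, 185.0]
females = [158.0, 161.0, 165.0, 168.0]
total, within, between, within_pct, between_pct = deviance_shares([males, females])
print(round(within_pct, 1), round(between_pct, 1))
assert math.isclose(total, within + between, rel_tol=1e-9)
```

The same pattern, dividing each component by the general sum, applies to the components of the sum of cubed deviations, of the sum of fourth powers and of the codeviance; in those cases individual shares can be negative or exceed 100%.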

For the decomposition of the sum of cubed deviations, the two variables must be considered separately.

For height (Table 6), the sum of cubed deviations is moderately negative for Males and negligible for Females: overall, the within-group share is −14.1%. An even larger negative contribution (−26.7%) comes from the sum of the cubed deviations between the average height of each sex and the general average height, weighted by the group sizes. The largest contribution, and a positive one (140.8%), comes from the deviations between the average height of each sex and the general average height, weighted by the respective deviances. This is the component that makes the general sum of cubed deviations positive, indicating a positive asymmetry of the general distribution, while the partial distributions are slightly negatively asymmetric.

As Table 6 shows, the situation for weight is very different. First, the sums of cubed deviations of weight are positive for both sexes, so that the within-group sum of cubes accounts for more than half of the general one (51.1%). The share of the component due to the deviations between the average weight of each sex and the general average weight, weighted by the deviances, is identical (51.1%). By contrast, the contribution of the sum of cubed deviations between the average weight of each sex and the general average weight, weighted by the group sizes, is marginal (−2.2%). In short, about one half of the positive general sum of cubed deviations of weight comes from the sums of cubed deviations within each sex, and the other half from the deviations between the average weight of each sex and the general average weight, weighted by the respective deviances. In other words, this second component adds to the within-group sum of cubed deviations, accentuating the positive asymmetry of the general distribution relative to the positive asymmetries of the distributions of the two sexes.

Height and weight must also be considered separately for the sum of fourth powers of the deviations.

For height (Table 7), almost a third (32.4%) of the general sum of fourth powers comes from the sums of fourth powers within each sex. The largest share (59.5%), however, comes from the sum of squared deviations between the average height of each sex and the general average height, weighted by the deviances. The contribution of the sum of fourth powers of the deviations between the average height of each sex and the general average height, weighted by the group sizes, is modest but still positive (9.4%). Finally, there is a negative contribution (−1.3%) from the sum of deviations between the average height of each sex and the general average height, weighted by the corresponding sums of cubed deviations.

For weight (Table 7), all four components are positive: the largest is the sum of the partial fourth powers (44.2%), followed by the component involving the squared deviations and the deviances (36.0%) and by the one involving the deviations and the sums of cubed deviations (15.9%). Last comes the component of highest order, the fourth powers of the deviations between the average weight of each sex and the general average weight, weighted by the group sizes, with a modest contribution (3.9%). The general sum of fourth powers, and hence the hyper-normal (leptokurtic) shape of the general distribution, is therefore due to all four components together.

Finally, Table 8 gives the decomposition of the codeviance: the within-group codeviance of the two sexes (with a clear prevalence of the Male one, as can be seen from both the codeviances and the correlation coefficients of the two sexes) explains 41.4% of the general codeviance, while the remaining 58.6% is due to the codeviance between the averages, which are concordant across the two variables.

6. Conclusion

As with the decomposition of the deviance (Scheffé, 1999) [3], decomposition formulas have been obtained for the sum of cubed deviations, the sum of deviations raised to the fourth power, and the codeviance. These formulas make it possible to evaluate the contributions of the different components of these three absolute measures of asymmetry, disnormality and concordance. Each of the three measures is the sum of a part internal to the partial distributions and of other parts involving the averages, the deviances, the partial sums of cubed deviations and the codeviance of the averages. In addition to the formulas valid for r groups, simpler and more readily interpretable ones, valid only for two groups, have been obtained. An example has illustrated the usefulness of these formulas.