Pivot Points in Bivariate Linear Regression
Abstract: There are little-noticed points in the plane that are artifacts of linear regression. These points, called pivot points, are the intersections of sets of regression lines. We derive the coordinates of the pivot point and explain its sources. We show how a pivot point arises in a certain notable data set, which has often been analyzed for points of high leverage. We present an application of pivot points that shortens calculations when a set of bivariate observations is updated by adding a new point.

1. Introduction

It is common to produce many lines to fit bivariate data as the observations are altered in some way. For example, to determine a particular data point’s influence on the best fit, the point may be moved by changing its y-coordinate and a new line created. Some diagnostic tests are based on this idea. A point called the pivot point is the intersection of certain lines that are often used for examining influence.

An example of a pivot point is presented in Section 2. In Section 3, we derive the coordinates of the pivot point. We show that a pivot point can be created in two ways. One way is augmenting an original set of bivariate observations with an additional point, which can have arbitrary multiplicity. Another way is altering an existing observation’s y-coordinate as described above. Section 4 presents the benefit of the pivot point in that it can be useful to shorten calculations when adding a new observation.

2. Illustrative Example

Consider the data in Table 1. The predictor variable (x) is the age in months at which a child says their first word, and the response variable (y) is the child’s Gesell Adaptive Score from an aptitude test. These data have been analyzed many times for influential and outlying observations. Using various criteria, Cases 2, 18, and 19 have been identified as noteworthy. For illustrative purposes, we focus on Case 18.

When examining an individual observation’s influence on a bivariate least-squares linear regression, it is common to generate a sequence of regression lines. These lines fit the same set of observations, except that the y-coordinate of the specified data point of interest is made to vary while its x-coordinate is unchanged. The influence of Case 18 on the least-squares regression line is examined by keeping its x-coordinate of 42 and giving its y-coordinate the values 57, 77, 97, 117, and 137. This produces the five regression lines in Figure 1. Clearly, Case 18 could have a large influence on the regression line. Some authors have illustrated and evaluated leverage in this way. All these regression lines pass through a common point, called the pivot point. In Figure 1, the pivot point (12.3, 96.1) is shared by the five lines, and its location is indicated by the symbol D.
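This fan of lines is easy to reproduce numerically. The sketch below uses a small made-up data set (an assumption; the Gesell data are not reproduced here), varies one point’s y-coordinate, fits a least-squares line for each value, and checks that every line passes through the intersection of the first two.

```python
# Varying one observation's y-coordinate produces regression lines
# that all pass through a common (pivot) point.
# The data below are illustrative, not the Gesell data of Table 1.

def fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [2.0, 3.0, 4.0, 5.0, 9.0]

# Vary the last point's y-coordinate and fit a line for each value.
lines = [fit(xs, ys[:-1] + [v]) for v in (5.0, 9.0, 13.0, 17.0)]

# Intersection of the first two lines.
(a1, b1), (a2, b2) = lines[0], lines[1]
x_star = (a2 - a1) / (b1 - b2)
y_star = a1 + b1 * x_star

# Every remaining line passes through (x_star, y_star).
for a, b in lines[2:]:
    assert abs(a + b * x_star - y_star) < 1e-9
```

For these numbers the common point is (7/3, 10/3); for the Gesell data, the same computation would locate the pivot point (12.3, 96.1) of Figure 1.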

3. Derivation of the Pivot Point

We derive the formula for the coordinates of the pivot point. The pivot point can be created by augmenting an original set of bivariate observations with an additional point of arbitrary multiplicity, which is another method of diagnosing influence on the line. We show that this formulation is equivalent to varying the location of a single point while keeping its first coordinate fixed, as is done in Figure 1.

Consider the bivariate data set ${S}_{0}=\left\{\left({x}_{i},{y}_{i}\right):i=1,2,\cdots ,n\right\}$. For simplicity, assume that coordinates are selected so that $\left(\sum x/n,\sum y/n\right)=\left(0,0\right)$. Unindexed summations are over the elements of S0. Define $V=\sum {x}^{2}/n$. Introduce m copies of the new point R(u,v). If R is a point in S0, these are additional copies. The aggregate of S0 and m > 0 copies of R is denoted Sm.

For m = 0, the least-squares regression line of S0 is

$y={a}_{0}+{b}_{0}x=\left(\sum xy/\sum {x}^{2}\right)x$.
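A quick numerical check of this formula, using illustrative centered data (an assumption, not the Table 1 values): with both coordinate sums zero, the fitted intercept vanishes and the slope reduces to $\sum xy/\sum x^2$.

```python
# With data centered at the origin, the least-squares intercept is 0
# and the slope reduces to sum(x*y) / sum(x**2).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # sums to 0
ys = [-3.0, -1.0, 0.0, 2.0, 2.0]   # sums to 0

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n          # both 0 here
b0 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
     / sum((x - xbar) ** 2 for x in xs)
a0 = ybar - b0 * xbar

assert a0 == 0.0
assert b0 == sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```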

Table 1. Age at First Word (x) and Gesell Adaptive Score (y).


Figure 1. Altering Case 18’s y-coordinate from 57 to 77, 97, 117, and 137, yielding five lines through a pivot point. (a) y = 57; (b) y = 77; (c) y = 97; (d) y = 117; (e) y = 137.

For any integer m ≥ 0, the least-squares regression line of Sm is

$y={a}_{m}+{b}_{m}x=\frac{mV\left(v-{b}_{0}u\right)}{\left(m+n\right)V+m{u}^{2}}+\frac{\left(m+n\right)V{b}_{0}+muv}{\left(m+n\right)V+m{u}^{2}}x$, (1)

and the point of means is

${M}_{m}=\left(\frac{m}{m+n}u,\frac{m}{m+n}v\right)$, (2)

which is on line (1) for Sm.

When m > 0 and u ≠ 0, the pivot point

$P=\left(-\frac{V}{u},-\frac{V{b}_{0}}{u}\right)$ (3)

is on the least-squares line for all sets Sm. This can be seen by substituting point (3) into the equation of the line (1), that is,

${a}_{m}+{b}_{m}\left(-V/u\right)=-V{b}_{0}/u$.

Point P in (3) is called the pivot point of R with respect to S0, because P is on all regression lines for Sm, which have different slopes. Because the y-coordinate v of R is absent from the coordinates of P, it is also called the pivot point of u with respect to S0. The set of regression lines created by adding copies of R is called a pencil of lines or fan of lines through P.
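The identity above can also be checked numerically. The sketch below builds an illustrative centered set S0 (an assumption; any centered S0 works), augments it with m copies of a point R(u, v), and verifies that each fitted line passes through P = (−V/u, −Vb0/u).

```python
# Numeric check: for any multiplicity m, the least-squares line of Sm
# passes through the pivot point P = (-V/u, -V*b0/u) of Equation (3).
def fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered illustrative data
ys = [-3.0, -1.0, 0.0, 2.0, 2.0]
n = len(xs)
V = sum(x * x for x in xs) / n                       # V = 2.0
b0 = sum(x * y for x, y in zip(xs, ys)) / (n * V)    # b0 = 1.3
u, v = 3.0, 1.0                                      # the new point R
px, py = -V / u, -V * b0 / u                         # pivot point (3)

for m in (1, 2, 5, 20):
    a_m, b_m = fit(xs + [u] * m, ys + [v] * m)       # line (1) for Sm
    assert abs(a_m + b_m * px - py) < 1e-9           # P lies on it
```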

When u = 0, the best-fit line (1) translates in the y-direction as m increases, and the pivot point is said to be at infinity. The pivot point is solely an artifact of the least-squares regression equations. Initially, it was found and explained in a linear-algebraic setting.

The regression lines in a fan, which is formed by vertically moving one point in the data set, intersect at the pivot point. In particular, the regression line formed by adding m copies of the point R(u,v) to S0 is identical to the line formed by adding the single point (u, vm) with

${v}_{m}=\frac{n\left(1-m\right)V{b}_{0}u}{\left(m+n\right)V+m{u}^{2}}+\frac{m\left(\left(1+n\right)V+{u}^{2}\right)}{\left(m+n\right)V+m{u}^{2}}v,$

which can be verified algebraically: setting m = 1 and $v={v}_{m}$ in line (1) yields the same line (1) that m copies of R produce.
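A numeric confirmation of this equivalence, again on illustrative centered data (an assumption): the fit with m copies of R matches the fit with the single substitute point (u, vm).

```python
# Numeric check: adding m copies of R(u, v) gives the same regression
# line as adding the single point (u, v_m), with v_m from the formula.
def fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered illustrative data
ys = [-3.0, -1.0, 0.0, 2.0, 2.0]
n = len(xs)
V = sum(x * x for x in xs) / n
b0 = sum(x * y for x, y in zip(xs, ys)) / (n * V)
u, v, m = 3.0, 1.0, 3

D = (m + n) * V + m * u * u        # common denominator in (1)
v_m = (n * (1 - m) * V * b0 * u + m * ((1 + n) * V + u * u) * v) / D

line_m_copies = fit(xs + [u] * m, ys + [v] * m)
line_single = fit(xs + [u], ys + [v_m])
assert all(abs(p - q) < 1e-9 for p, q in zip(line_m_copies, line_single))
```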

Pivot points also occur when the data are not centered at the origin. All best-fit lines can be rigidly translated so that the new center is $\left(\stackrel{¯}{x},\stackrel{¯}{y}\right)$. The slope of each line can be found from

$\frac{\sum \left(x-\stackrel{¯}{x}\right)\left(y-\stackrel{¯}{y}\right)}{\sum {\left(x-\stackrel{¯}{x}\right)}^{2}}$,

which shows the dependence solely on the differences of each coordinate from its mean. The observations in Figure 1 are centered at the data set’s mean point $\left(\stackrel{¯}{x},\stackrel{¯}{y}\right)$.
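The translation argument is easy to confirm numerically: rigidly shifting both coordinates by constants leaves the slope unchanged. A minimal sketch with illustrative numbers:

```python
# The slope depends only on deviations from the means, so a rigid
# translation of the data leaves it unchanged.
def slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
           / sum((x - xbar) ** 2 for x in xs)

xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [2.0, 3.0, 4.0, 5.0, 9.0]
dx, dy = 7.5, -3.25                      # arbitrary translation
shifted = slope([x + dx for x in xs], [y + dy for y in ys])
assert abs(slope(xs, ys) - shifted) < 1e-9
```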

4. Computational Shortcuts When Augmenting a Bivariate Set

The pivot point offers two shortcuts for computing equations of regression lines. They are analogous to the familiar shortcut for updating a mean: when the (n + 1)st value a is added to the data set $\left\{{x}_{i}:i=1,2,\cdots ,n\right\}$, whose mean is $\stackrel{¯}{x}$, the new mean can be calculated as $\left(n\stackrel{¯}{x}+a\right)/\left(n+1\right)$, which requires considerably less computation than recomputing the mean from all n + 1 values.
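The mean-update analogy in one line, with illustrative numbers:

```python
# Updating a mean when one value is appended, without revisiting
# the original data.
xs = [4.0, 7.0, 9.0, 12.0]
n, xbar = len(xs), sum(xs) / len(xs)      # n = 4, xbar = 8.0
a = 18.0                                  # the (n+1)st value
new_mean = (n * xbar + a) / (n + 1)       # shortcut
assert new_mean == sum(xs + [a]) / (n + 1)
```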

One shortcut is that, given the set S0, the regression line for Sm can be computed as the line containing the point of means (2) and the pivot point (3). Recall that in (3), V and b0 are based only on the unaugmented data set.
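A sketch of this first shortcut, on illustrative centered data (an assumption): the line through M_m and P reproduces the direct least-squares fit of Sm without refitting.

```python
# Shortcut 1: the regression line of Sm is the line through the point
# of means M_m of (2) and the pivot point P of (3).
def fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered illustrative data
ys = [-3.0, -1.0, 0.0, 2.0, 2.0]
n = len(xs)
V = sum(x * x for x in xs) / n
b0 = sum(x * y for x, y in zip(xs, ys)) / (n * V)
u, v, m = 3.0, 1.0, 4

Mx, My = m * u / (m + n), m * v / (m + n)    # point of means (2)
Px, Py = -V / u, -V * b0 / u                 # pivot point (3)
b_short = (My - Py) / (Mx - Px)              # slope through M_m and P
a_short = My - b_short * Mx

a_direct, b_direct = fit(xs + [u] * m, ys + [v] * m)
assert abs(a_short - a_direct) < 1e-9
assert abs(b_short - b_direct) < 1e-9
```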

The second shortcut involves the limiting line as the multiplicity m becomes very large: the line (1) approaches the line

$y={a}_{\infty }+{b}_{\infty }x=\frac{V\left(v-{b}_{0}u\right)}{V+{u}^{2}}+\frac{V{b}_{0}+uv}{V+{u}^{2}}x,$ (4)

which contains the new point R and the pivot point P. The coefficients in (4) provide a tool for rapid computation of the line (1) for any m, including m = 1 for a single additional point. In (1), am is a weighted average of a0 and ${a}_{\infty }$, and bm is a weighted average of b0 and ${b}_{\infty }$ with the same weights; in particular,

${a}_{m}=w{a}_{0}+\left(1-w\right){a}_{\infty }$ and ${b}_{m}=w{b}_{0}+\left(1-w\right){b}_{\infty },$ (5)

where

$w=\frac{nV}{\left(m+n\right)V+m{u}^{2}}.$ (6)

Equations (5) can be verified by substituting a0 and b0 from (1), ${a}_{\infty }$ and ${b}_{\infty }$ from (4), and w from (6) into the right-hand sides of (5), which yields am and bm as in (1).
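A sketch of this second shortcut, again on illustrative centered data (an assumption): the weighted averages (5) with weight (6) reproduce the direct fit for several multiplicities.

```python
# Shortcut 2: the coefficients for any multiplicity m are weighted
# averages of the m = 0 and m -> infinity coefficients, with weight
# w = n*V / ((m+n)*V + m*u**2) from (6).
def fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return ybar - b * xbar, b

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered illustrative data
ys = [-3.0, -1.0, 0.0, 2.0, 2.0]
n = len(xs)
V = sum(x * x for x in xs) / n
b0 = sum(x * y for x, y in zip(xs, ys)) / (n * V)
a0 = 0.0                                       # data are centered
u, v = 3.0, 1.0

a_inf = V * (v - b0 * u) / (V + u * u)         # limiting line (4)
b_inf = (V * b0 + u * v) / (V + u * u)

for m in (1, 3, 10):
    w = n * V / ((m + n) * V + m * u * u)      # weight (6)
    a_m = w * a0 + (1 - w) * a_inf             # equations (5)
    b_m = w * b0 + (1 - w) * b_inf
    a_d, b_d = fit(xs + [u] * m, ys + [v] * m) # direct fit of Sm
    assert abs(a_m - a_d) < 1e-9 and abs(b_m - b_d) < 1e-9
```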

5. Conclusion

Pivot points are omnipresent in applications of bivariate linear regression. In particular, they are points through which new lines pass when a data point is altered. One important purpose of altering a point is to determine its influence. We have displayed this phenomenon with the well-known data set of ages at first word versus Gesell scores, which has been analyzed by many authors from many points of view. A pivot point is a handy and efficient tool for shortening calculations when new data arise.

Acknowledgements

We are grateful to many of our colleagues who have frequently and freely shared their knowledge about regression and computational statistics.

Cite this paper: Lutzer, C. and Farnsworth, D. (2021) Pivot Points in Bivariate Linear Regression. Open Journal of Statistics, 11, 393-399. doi: 10.4236/ojs.2021.113023.
References

   Mickey, R.M., Dunn, O.J. and Clark, V. (1967) Note on the Use of Stepwise Regression in Detecting Outliers. Computer and Biomedical Research, 1, 105-111.
https://doi.org/10.1016/0010-4809(67)90009-2

   Andrews, D.F. and Pregibon, D. (1978) Finding the Outliers that Matter. Journal of the Royal Statistical Society, Series B (Methodological), 40, 85-93.
https://doi.org/10.1111/j.2517-6161.1978.tb01652.x

   Dempster, A.P. and Gasko-Green, M. (1981) New Tools for Residual Analysis. Annals of Statistics, 9, 945-959.
https://doi.org/10.1214/aos/1176345575

   Draper, N.R. and John, J.A. (1981) Influential Observations and Outliers in Regression. Technometrics, 23, 21-26.
https://doi.org/10.1080/00401706.1981.10486232

   Moore, D.S., Notz, W.I. and Fligner, M.A. (2017) The Basic Practice of Statistics. 8th Edition, Freeman, New York.

   Paul, S.R. (1983) Sequential Detection of Unusual Points in Regression. Journal of the Royal Statistical Society, Series D (The Statistician), 32, 417-424.
https://doi.org/10.2307/2987543

   Rousseeuw, P.J. and Leroy, A.M. (1987) Robust Regression and Outlier Detection. Wiley, New York.
https://doi.org/10.1002/0471725382

   Chatterjee, S. and Hadi, A.S. (1986) Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science, 1, 379-393.
https://doi.org/10.1214/ss/1177013630

   Hoaglin, D.C. (1988) Using Leverage and Influence to Introduce Regression Diagnostics. College Mathematics Journal, 19, 387-416.
https://doi.org/10.1080/07468342.1988.11973146

   Hoaglin, D.C. (1992) Diagnostics. In: Hoaglin, D.C. and Moore, D.S., Eds., Perspectives on Contemporary Statistics, Mathematical Association of America, Washington, 123-144.

   Montgomery, D.C., Runger, G.C. and Hubele, N.F. (2011) Engineering Statistics. 5th Edition, Wiley, New York.

   Lutzer, C.V. (2017) A Curious Feature of Regression. College Mathematics Journal, 48, 189-198.
https://doi.org/10.4169/college.math.j.48.3.189

   Brase, C.H. and Brase, C.P. (2017) Understandable Statistics: Concepts and Methods. 12th Edition, Cengage Learning, Boston.

   Larose, D.T. (2015) Discovering Statistics. 3rd Edition, Freeman, New York.

   Triola, M.F. (2017) Elementary Statistics. 13th Edition, Pearson, Boston.
