Standardization of Winning Streaks in Sports


1. Introduction

Sports are rife with statistics; some are useful, while others are not. In baseball, a player’s batting average is an important statistic. It is the number of base hits per at-bat. A batting average of 0.300 (or 30%) is considered very good, while a batting average of about 0.200 or less is not considered good. The Earned Run Average (ERA), which is the number of earned runs allowed by a pitcher per nine innings, is another important statistic; it gives us an idea of the pitcher’s ability to prevent the opponent from scoring. An ERA of 3.00 is considered good, whereas an ERA of 2.00 or less is considered exceptional. In basketball, a shooting percentage of 50% or better is typically considered good.

All of the above statistics are informative because the baseline is firmly established. That is, we know the minimum possible values and, with the exception of the earned run average in baseball, we also know the maximum possible values. The maximum possible ERA is, in theory, infinite, but such a value will not occur in practice, as a pitcher ineffective to that degree will not be permitted to continue pitching. Typically, an ERA of 4.50 or above is considered ineffective.

Unfortunately, not all sports statistics fall into the “tidy” category of having firmly established baselines. Streaks are one such statistic. Reporting a streak, such as a winning streak or losing streak, is intended to inform the fan of recent success or lack thereof. While this can be informative, it can also be inconsistent. Consider, for example, a baseball team that has won its last nine consecutive games. Clearly, this is an impressive run. Also consider that this same team has won 12 out of its last fifteen games, an 80% winning percentage. Which measure is more important and/or informative? There is no clear answer to this question. Let us further assume that before this team won 12 out of fifteen games, they lost five consecutive games. Therefore, this team has won 60% of its last 20 games, which is slightly better than average. When we “stretch” the chronology of the measurement, the success becomes less impressive. Let us review this via a table:

When sports media talk about streaks, and/or recent runs of success or lack of success, there is a tendency to “package” the information in such a way that maximizes or enhances the success or lack of success. While the general point of this is understandable, such statistics are usually biased due to a small sample size.

The above is not intended to be critical of studying streaks. Streaks are important because they show that binary outcomes in sports can on occasion defy expectation, and this is worth study. The intent here is to standardize streaks across a larger time frame. In short, we study streaks across an entire season and isolate the teams that tend to show more “streakiness” than others. This standardization consists of a few new metrics for studying the consistency of winning and losing across an entire season. These metrics also take into account each team’s winning percentage across the season; in other words, the analyses of streaks are adjusted for the team’s success. After the metrics are presented, they are used to assess the performance of all teams in American Major League Baseball and all teams in the American National Basketball Association for their respective 2016 seasons.

2. Literature Review

Much work has been done to study streaks in sports. Perhaps one reason for this is Joe DiMaggio’s 56-game hitting streak in 1941, which many consider the most impressive streak in sports history. Much effort has been put into understanding the forces at work during that streak, which has led to a deeper understanding of streaks in general and of the factors that could be considered possible “causes” of, or contributors to, streaks [1]. Effort has gone into deciding whether a given set of data actually qualifies as a “streak” [2], and much work has been done on trying to predict streaks [3] [4] [5]. Gilovich, Vallone and Tversky [6] demonstrated that single outcomes in sports are not related to prior outcomes. Streaks have even been studied so that gamblers can improve their chances of successful sports betting [7].

Streaks are difficult to measure because there are no rules defining what constitutes a meaningful streak. Because of this, there are many opportunities to research what is considered a streak and how meaningful a streak is [8]. In certain ways, a streak can be related to the degree of variation that exists in the data: the “streakier” the data, the more variation the data will show. Conversely, the less streaky the data, the smaller the variation. This problem has been addressed [9]. The work in this paper attempts to extend that work by first studying descriptive statistics associated with the win/loss streak performance of two sports leagues, and secondly understanding the relationship between expected wins and actual wins throughout a single season. This second motivation exploits previous work on production scheduling [10] to compare actual win/loss performance with expected win/loss performance.

3. Methodology

Here, we first describe descriptive statistics associated with win/loss streaks of sports teams. Next, we describe a “Gap” measure, which compares a team’s actual win/loss performance to their expected performance through each game of the season. Finally, a “runs” test is described. The section concludes by illustrating the methodology via a simple, simulated data set.

3.1. Streak Analysis

The first part of our methodology pertains to actual streaks: winning streaks and losing streaks. Here, we compute all descriptive statistics relevant to winning and losing streaks. Prior to delving into the mathematics of these metrics, a table of definitions is provided (see Table 1).

Let us assume that there are n games in a season, m teams, and w_{ij} represents team j winning game i, shown via the following:

${w}_{ij}=\begin{cases}1 & \text{if team } j \text{ wins game } i\\ 0 & \text{otherwise}\end{cases},\quad \forall i,j$ (1)

The total number of wins for team j (Wins_{j}) is computed as follows:

${\text{Wins}}_{j}=\underset{i=1}{\overset{n}{{\displaystyle \sum}}}{w}_{ij},\forall j$ (2)

Table 1. Terms used for analysis.

Similarly, the total number of losses for team j (Losses_{j}) is computed as follows:

${\text{Losses}}_{j}=n-\underset{i=1}{\overset{n}{{\displaystyle \sum}}}{w}_{ij},\forall j$ (3)

The winning percentage for team j (Pct_{j}) is computed as follows:

${\text{Pct}}_{j}=\frac{1}{n}\underset{i=1}{\overset{n}{{\displaystyle \sum}}}{w}_{ij},\forall j$ (4)

The above is trivial: we are simply counting wins and losses for each team. More importantly, we wish to glean information from winning and losing streaks. To do this, we use the w_{ij} values to construct a list of winning and losing streaks for each team. We define the I^{th} winning streak as follows:

${w}_{aj}=1,\text{}{w}_{a+1,j}=1,\cdots ,{w}_{b-1,j}=1,\text{}{w}_{bj}=1,\text{}\forall j$ (5)

In other words, team j wins all games, starting with game a and ending with game b. The values of both w_{a−1,j} and w_{b+1,j} are zero (where those games exist), so the streak cannot be extended in either direction. This results in the I^{th} winning streak having the following length:

${\text{WSL}}_{Ij}=1+\left(b-a\right),\forall j$ (6)

The count of winning streaks for team j is incremented by one via the following:

${\text{WSC}}_{j}={\text{WSC}}_{j}+1,\forall j$ (7)

It should be noted that for all j, WSC_{j} is initialized to zero prior to analysis. The longest winning streak for team j is determined as follows:

$\text{Max}{W}_{j}=\underset{I}{\mathrm{max}}\left({\text{WSL}}_{Ij}\right),\forall j$ (8)

Calculating losing streak characteristics is done in similar fashion to winning streaks. First of all, the J^{th} losing streak is defined as follows:

${w}_{aj}=0,\text{}{w}_{a+1,j}=0,\cdots ,{w}_{b-1,j}=0,\text{}{w}_{bj}=0,\text{}\forall j$ (9)

Analogous to the case for winning streaks, the losing streak above has w_{a−1,j} and w_{b+1,j} values equal to 1 (where those games exist). The length of this J^{th} losing streak for team j is computed as follows:

${\text{LSL}}_{Jj}=1+\left(b-a\right),\forall j$ (10)

For team j, the count of the losing streaks is incremented as follows:

${\text{LSC}}_{j}={\text{LSC}}_{j}+1,\forall j$ (11)

As was the case with the winning streak counts, all teams have their LSC_{j} values initialized to zero prior to analysis. The longest losing streak length for team j is as follows:

$\text{Max}{L}_{j}=\underset{J}{\mathrm{max}}\left({\text{LSL}}_{Jj}\right),\forall j$ (12)
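The streak definitions above amount to run-length encoding of a team's win/loss sequence. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names `streak_lengths` and `streak_summary` and the ten-game season are hypothetical, and the use of the sample (n − 1) standard deviation for streak lengths is an assumption, since the text does not state which form is used.

```python
from itertools import groupby
from statistics import mean, stdev


def streak_lengths(results, outcome):
    """Lengths of all maximal runs of `outcome` (1 = win, 0 = loss)."""
    return [len(list(run)) for value, run in groupby(results) if value == outcome]


def streak_summary(results):
    """Summarize one team's season, given as a list of 1s (wins) and 0s (losses)."""
    n = len(results)
    wins = sum(results)  # Equation (2)
    summary = {"wins": wins, "losses": n - wins, "pct": wins / n}
    for label, outcome in (("win", 1), ("loss", 0)):
        lengths = streak_lengths(results, outcome)
        summary[label + "_streaks"] = {
            "count": len(lengths),             # WSC_j or LSC_j
            "mean": mean(lengths) if lengths else 0.0,
            # Assumption: sample standard deviation, which needs two or more streaks.
            "stdev": stdev(lengths) if len(lengths) > 1 else 0.0,
            "max": max(lengths, default=0),    # MaxW_j or MaxL_j
        }
    return summary


# Hypothetical ten-game season: W W L W L L L W W W
season = [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]
print(streak_summary(season))
```

Using `itertools.groupby` avoids tracking the start and end indices a and b explicitly; each group it yields is one maximal streak, so its length equals 1 + (b − a).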

3.2. Gap Analysis

There is another measure of importance that is not directly related to streaks: the “smoothness” of a team’s success throughout the season. For example, if a team wins 66.67% of its games by winning two games and then losing one, with this pattern repeating throughout the season, its winning pattern would map as closely as possible to its winning percentage, and the “smoothness” of its winning would be optimal. In reality, of course, this does not happen, so it is important to quantify the smoothness of teams’ winning patterns. We can quantify this via a variation of the “smoothness index” that has been used to study many scheduling algorithms [10]. Given the above definitions, we call this smoothness index the “Gap” measure, and each team has such a measure. It is calculated as follows:

${\text{Gap}}_{j}=\sqrt{{\displaystyle \sum_{i=1}^{n}{\left(\left({\displaystyle \sum_{h=1}^{i}{w}_{hj}}\right)-i\cdot {\text{Pct}}_{j}\right)}^{2}}},\forall j$ (13)

This metric essentially tells us how many games team j has won through game i compared to how many games they are expected to win through game i. This difference is then squared, summed for all n games, and then the square root of this quantity is taken for standardization purposes. In layman’s terms, this metric tells us the smoothness of a team’s winning pattern. Lower quantities suggest more consistency in the winning patterns, while higher quantities suggest less consistency in winning patterns.
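As a hedged illustration of the “Gap” measure, the sketch below squares each game's deviation between cumulative wins and expected wins before summing, as the description above states, and then takes the square root. The function name `gap_measure` and the two six-game seasons are hypothetical and are not taken from the paper's data.

```python
from itertools import accumulate
from math import sqrt


def gap_measure(results):
    """Gap measure of Equation (13) for one team's sequence of 1s and 0s."""
    n = len(results)
    pct = sum(results) / n
    squared_deviations = (
        (actual_wins - game * pct) ** 2
        for game, actual_wins in enumerate(accumulate(results), start=1)
    )
    return sqrt(sum(squared_deviations))


# Two hypothetical six-game seasons with the same 4-2 record.
smooth = [1, 1, 0, 1, 1, 0]   # two wins, then a loss, repeated
streaky = [1, 1, 1, 1, 0, 0]  # all wins bunched at the start
print(gap_measure(smooth), gap_measure(streaky))
```

Both seasons have the same winning percentage, but the bunched sequence drifts further from its expected win total and so receives the larger “Gap” value.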

3.3. Runs Test

A “runs test” is a popular way to determine whether a sequence of binary outcomes is random [11]. The test uses three quantities gathered from the sequence: n_{1} is the total number of one type of outcome (Wins_{j} in this study), n_{2} is the total number of the other type of outcome (Losses_{j} in this study), and r is the number of streaks (or “runs”), which corresponds to WSC_{j} + LSC_{j} in this study. We compute the mean, standard deviation, and associated z-score of r as follows:

${\mu}_{r}=\frac{2{n}_{1}{n}_{2}}{{n}_{1}+{n}_{2}}+1$ (14)

${\sigma}_{r}=\sqrt{\frac{\left({\mu}_{r}-1\right)\left({\mu}_{r}-2\right)}{{n}_{1}+{n}_{2}-1}}$ (15)

$z=\frac{r-{\mu}_{r}}{{\sigma}_{r}}$ (16)

Given the standardized normal deviate, we can determine the two-tailed p-value as follows:

$p=\frac{2}{\sqrt{2\pi}}{\displaystyle \int_{-\infty}^{-\left|z\right|}{e}^{-{x}^{2}/2}\,\text{d}x}$ (17)

If the p-value associated with the test is less than a pre-specified significance level, we reject the null hypothesis and claim that the sequence is not random. Otherwise, we fail to reject the null hypothesis, and the sequence is treated as consistent with randomness.

This research effort employs the runs test to see if the win-loss distribution is random or not.
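The runs test can be sketched in a few lines of Python. This is an illustrative version under stated assumptions rather than the authors' code. The standard deviation is computed as the square root of the variance term (μ_r − 1)(μ_r − 2)/(n_1 + n_2 − 1), and the two-tailed p-value of Equation (17) is evaluated with the complementary error function, since 2Φ(−|z|) = erfc(|z|/√2). The function name `runs_test` and the alternating example sequence are hypothetical.

```python
from itertools import groupby
from math import erfc, sqrt


def runs_test(results):
    """Runs test for a sequence of 1s and 0s.

    Returns (r, mu_r, sigma_r, z, p), following Equations (14)-(17).
    """
    n1 = sum(results)          # wins
    n2 = len(results) - n1     # losses
    if n1 == 0 or n2 == 0 or n1 + n2 < 3:
        raise ValueError("need both outcomes and at least three observations")
    r = sum(1 for _ in groupby(results))  # number of runs = WSC_j + LSC_j
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = sqrt((mu - 1) * (mu - 2) / (n1 + n2 - 1))
    z = (r - mu) / sigma
    p = erfc(abs(z) / sqrt(2))            # two-tailed p-value, 2 * Phi(-|z|)
    return r, mu, sigma, z, p


# Hypothetical sequence that alternates win/loss for twenty games.
r, mu, sigma, z, p = runs_test([1, 0] * 10)
print(r, mu, round(z, 2), p)
```

A perfectly alternating sequence has far more runs than a random one would, so the test rejects randomness here; a sequence with very few, very long runs would be rejected from the other tail.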

3.4. Example Problem

A “toy” data set is presented to provide an illustration as to how the presented metrics work. The data set is binary data on the passing success of an American football quarterback. A “1” means an attempted pass was completed, a “0” means the attempted pass was not completed. The data set is simulated such that the percentage of completed passes is 57%. One hundred simulated passes were generated. The simulated data is as follows:

1 0 0 0 1 1 1 1 1 1 1 0 1 1 0 1 0 0 0 1 1 1 0 1 1 1 1 0 0 1 0 0 1 1 0 0 1 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 1 0 1 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1 0 1 0 0 1 1 0

This data set was used for the presented formulae, and the statistics are summarized accordingly. For the winning streak statistics, “n” is used to represent WSC_{j}, “$\bar{x}$” is used to represent the mean winning streak length $\bar{x}_{W_j}$, “s” is used to represent the standard deviation of winning streak lengths s_{Wj}, and “max” is used to represent MaxW_{j}. Similarly for losing streaks, “n” is used to represent LSC_{j}, “$\bar{x}$” is used to represent $\bar{x}_{L_j}$, “s” is used to represent s_{Lj}, and “max” is used to represent MaxL_{j}. Since there is only a single entity (one “team,” so to speak) in this example, the subscript j is omitted for convenience. The winning streak data and the losing streak data are presented separately below (see Table 2):

In the context of this example, a completion is considered a success (analogous to a “win”) while an incompletion is considered a failure (analogous to a “loss”).

Table 2. Results for “Toy” data set.

The summary tells us that there were 57 completed passes and 43 incomplete passes for a 57% completion percentage. There were 29 completion streaks, with a mean length of 1.97 completions and a standard deviation of 1.43 completions. The maximum streak of completed passes was 7. There were 30 incompletion streaks, with a mean length of 1.47 incompletions and a standard deviation of 0.73 incompletions, and a maximum streak of incomplete passes of 3. The “Gap” measure, or “smoothness” measure, here is 12.92. This value is not particularly informative here because there is no basis for comparison. This measure will be more informative when there are several entities used for comparison. Given the high p-value associated with the runs test, we cannot reject the null hypothesis of randomness.

3.5. Experimentation

The simple example above is used to merely illustrate the use of the presented statistics. It is important to apply the presented methodology to real data to better understand the “streakiness,” consistency and/or winning patterns for real sports teams. As such, the 2016 performances of all National Basketball Association (NBA) teams and all Major League Baseball (MLB) teams are studied and compared. Both the NBA and MLB are very popular sports leagues in the United States and abroad. In particular, the NBA has gained much international interest in recent years, while MLB has gained interest in the Caribbean region, along with Japan and South Korea.

It is of particular interest to understand which teams are most “streaky,” or inconsistent. The presented methodology is intended to shed some light on this fundamental question.

4. Results

This section presents the results for the aforementioned leagues, Major League Baseball and the National Basketball Association. The results are then discussed in some detail.

4.1. Major League Baseball

MLB has 162 games on its regular season schedule. However, not all of the thirty teams play this number of games, due to rain postponements and the like. If a team affected by a postponement is eligible for a post-season playoff berth, any missed games will be made up if the result of the makeup game impacts the post-season scenario. Any other postponed games will not be made up unless a makeup game is convenient for both affected teams. In 2016, most teams did in fact play 162 games, but a few played 161 games due to weather postponements^{1}.

^{1}The Miami Marlins and the Atlanta Braves cancelled a game in observance of the unexpected death of a Miami player. This game was not made up because neither team was playoff eligible, and a makeup game was not convenient for either team.

With this said, the Chicago Cubs were the best team in the regular season with a winning percentage of 0.6398, while the Minnesota Twins were the worst team in the league with a winning percentage of 0.3642. Full results are shown in Table 3.

Table 3. MLB results.

In terms of winning streaks, the Chicago Cubs had the longest average winning streak, at 2.68 wins/streak. They were closely followed by the Washington Nationals (2.5), Texas Rangers (2.5) and the Cleveland Indians (2.33). It is also worth noting that the Cleveland Indians had a 14-game winning streak in 2016, the longest of the season. For losing streaks, the Minnesota Twins had the longest average losing streak, at 3.3 losses/streak, while the Tampa Bay Rays and Atlanta Braves also had long average losing streaks (2.61 and 2.53, respectively).

Clearly, better teams will have longer average winning streaks, while lesser teams will have longer average losing streaks. In fact, the correlation between average winning streak length and average losing streak length is −0.5235, which is statistically significant (p = 0.0030).

In terms of consistency, or lack of “streakiness,” the following teams did well: St. Louis, San Diego, Milwaukee, Oakland, Arizona, Washington, Cleveland, Los Angeles Dodgers and Detroit, all having “Gap” measures of less than 20. These teams, in effect, consistently won and lost games at a rate in accordance with their overall winning percentage. They had relatively small maximum winning streaks and relatively small maximum losing streaks. The exception to this was Cleveland, whose long 14-game winning streak was largely offset by a very short maximum losing streak of 3 games.

In terms of inconsistency, or “streakiness,” the Atlanta Braves are the most salient, with a “Gap” measure of 76.3. This high number can be understood by noting their maximum winning streak of 7 games and their maximum losing streak of 9 games; both are long streaks, which detract from their consistency. San Francisco was also “streaky,” with a maximum winning streak of 8 games and a maximum losing streak of 6 games.

It should be noted that there is no significant correlation between a team’s winning percentage and their “Gap” measure: the correlation is −0.2175, with a p-value of 0.2483. As such, the “Gap” measure does provide information beyond a team’s winning percentage.
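The text does not say how this p-value was obtained. Assuming the usual t-test for a Pearson correlation r with m − 2 degrees of freedom, where m = 30 is the number of MLB teams (both the choice of test and the symbol m are assumptions made here for illustration), the test statistic works out as follows:

$t=r\sqrt{\frac{m-2}{1-{r}^{2}}}=-0.2175\sqrt{\frac{28}{1-0.0473}}\approx -1.18$

With 28 degrees of freedom, a statistic of this size is well short of conventional significance thresholds, which is in line with the reported p-value.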

The runs test for this data set is not informative because, in all instances, it fails to reject the hypothesis that the sequence of wins and losses is random.

4.2. National Basketball Association

In the NBA, there are 82 regular-season games scheduled. Unlike MLB, these games are not postponed due to weather. As such, all teams play 82 games in a regular season.

For the 2016 regular NBA season, the Golden State Warriors had the best winning percentage (0.8902), which was the best record in league history: no team had ever won 73 games in a season before Golden State accomplished this. Conversely, the Philadelphia 76ers had the worst winning percentage in the league (0.1220). All findings are shown in Table 4.

Table 4. NBA results.

In terms of winning streaks, Golden State had the longest average winning streak, at 7.3 wins/streak, followed by San Antonio, with 5.15 average wins/streak. It is also worth noting that Golden State started the season with a record-breaking 24-game win streak. For losing streaks, Philadelphia had the longest average losing streak, at 6.55 losses/streak, followed by the Los Angeles Lakers, who averaged 5 losses/streak. Also, it is clear that there is a correlation between the average lengths of winning streaks and losing streaks (0.5342), which is statistically significant (p = 0.0024). It is also worth noting that Golden State’s average losing streak length was 1 loss/streak, and Philadelphia’s average winning streak length was 1 win/streak. In other words, Golden State never had a losing streak exceed 1 game, and Philadelphia never had a winning streak exceed 1 game.

The most consistent, or least “streaky,” teams were the Milwaukee Bucks, Los Angeles Lakers and the Denver Nuggets, all with “Gap” measures under 9. None of these teams had winning streaks in excess of 4 wins, and their maximum losing streaks, while as high as 10 losses for Los Angeles, were nevertheless consistent with their dismal winning percentages. In short, these three teams won their games fairly proportionally to their winning percentages.

There are three teams that are very streaky, or inconsistent, throughout the season: the Portland Trail Blazers, the Memphis Grizzlies, and the Charlotte Hornets. Despite all of these teams being good, with winning percentages above 0.500 and playoff berths, inconsistency throughout the regular season is prevalent. Portland had a maximum winning streak of 6 games and a maximum losing streak of 7 games. Memphis had a maximum winning streak of 5 games and a maximum losing streak of 6 games. Charlotte had maximum winning and losing streaks of 7 games each. These lengthy winning and losing streaks push these teams’ “Gap” measures to the top of the list.

As is the case with MLB, there is no correlation between a team’s winning percentage and their “Gap” measure. The correlation is 0.0197, with an associated p-value of 0.9178. Given this lack of relationship, the “Gap” measure does provide information beyond winning percentage.

As was the case for the runs test used for MLB, the result is not informative, as the runs test informs us that the sequence of wins and losses is random.

4.3. Comparison of MLB and NBA

There is a vast difference in the win/loss dynamic between the NBA and MLB. There is much more disparity in the NBA as compared to MLB. Table 5 shows a general breakdown of winning percentage statistics.

The average winning percentage for any league will always be 50%, because every team’s win is offset by another team’s loss. The standard deviation of winning percentage [8] across the league is a different story, however. In MLB this value is 6.62%, but in the NBA it is 16.92%; the NBA has much more performance disparity than MLB. The same applies to the winning percentage gap between the best and worst teams in the league. The difference in winning percentage between the Chicago Cubs and the Minnesota Twins (the best and worst teams in MLB) is 27.56 percentage points. Similarly, the difference in winning percentage between the Golden State Warriors and the Philadelphia 76ers is 76.82 percentage points, an immense difference in success.

Table 5. Winning percentage comparison.

Figure 1 and Figure 2 also show a difference in the win/loss dynamic when MLB is compared against the NBA. These two plots are organized such that the horizontal axis represents the winning percentage, while the vertical axis is the sum of the number of winning streaks and losing streaks (WSC_{j} + LSC_{j}) for each team.

Figure 1. Winning percentage vs number of streaks for MLB.

Figure 2. Winning percentage vs number of streaks for NBA.

Figure 1 shows no relationship between winning percentage and number of streaks; there is seemingly nothing interesting to report. Figure 2, however, shows a nonlinear relationship between winning percentage and number of streaks. The mediocre teams have more streaks than the teams that perform very poorly or very well.
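One possible reading of this nonlinear pattern, offered as a sketch rather than as a result of the analysis, follows from Equation (14). Writing p for a team’s winning percentage over n games (notation introduced here for illustration), so that n_{1} = np and n_{2} = n(1 − p), the expected number of streaks in a random sequence is:

${\mu}_{r}=\frac{2\left(np\right)\left(n\left(1-p\right)\right)}{n}+1=2np\left(1-p\right)+1$

This expression is largest at p = 0.5 and falls away symmetrically toward either extreme. For an 82-game season it gives 42 expected streaks for a 41-41 record, but only about 17 for a 73-9 record, so even purely random sequences would show fewer streaks for teams with extreme winning percentages.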

Figure 3 shows the same data as Figure 2, but restricted to teams whose winning percentages fall within the range of MLB winning percentages. In other words, this filtered data set omits teams whose winning percentages are more extreme than those in the MLB data set.

Figure 3. Winning percentage vs number of streaks for filtered NBA data set.

Figure 3 more closely resembles Figure 1: there is no relationship between winning percentage and the number of streaks when teams with extreme records are filtered out of the analysis.

The high disparity in success for the NBA as compared to MLB is a surprising finding. Nevertheless, it enables us to conclude that NBA team performance is more “streaky,” or less consistent, than MLB team performance. Our streak statistics and “Gap” measure have demonstrated that.

5. Concluding Comments

A methodology has been presented in an attempt to standardize streaks in sports. We have presented these metrics in two different forms: studying streaks via descriptive statistics associated with the streaks themselves, and studying how a team performs throughout the season as compared to how they should perform according to their season winning percentage. The methodology was applied to the 2016 NBA and 2016 MLB seasons. Our findings have shown that NBA performance involves much more disparity than MLB performance. The reasons for this disparity lie outside statistical analysis and are beyond the scope of the paper.

This type of binary analysis involves winning or losing. Our “toy” problem data set used completions vs. incompletions for an American football quarterback. The binary nature of our data can be used for many other sporting applications: a baseball player’s batting average for all at bats (hit vs. no hit), a soccer player’s success regarding penalty kicks (goal vs no goal), a hockey player’s success regarding penalty shots (goal vs. no goal), etc.

This type of analysis can also be used for other binary outcomes outside the world of sports. For example, we can study market streaks at the close of a stock exchange (market increase vs. market decrease). We can study streaks regarding the success of salespeople: a successful sales call (customer places an order) vs. an unsuccessful sales call. In short, the applications for studying streaks with binary outcomes are only constrained by one’s imagination.

References

[1] Albert, J. (2008) Streaky Hitting in Baseball. Journal of Quantitative Analysis in Sports, 4.

https://doi.org/10.2202/1559-0410.1085

[2] Albright, C. (1993) A Statistical Analysis of Hitting Streaks in Baseball. Journal of the American Statistical Association, 88, 1175-1183.

https://doi.org/10.1080/01621459.1993.10476395

[3] Rockoff, D.M. and Yates, P.A. (2009) Chasing DiMaggio: Streaks in Simulated Season Using Non-Constant At-Bats. Journal of Quantitative Analysis in Sports, 5.

https://doi.org/10.2202/1559-0410.1167

[4] Arkes, J. (2010) Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework. Journal of Quantitative Analysis in Sports, 6.

[5] Thomas, A.C. (2010) That’s the Second-Biggest Hitting Streak I’ve Ever Seen! Verifying Simulated Historical Extremes in Baseball. Journal of Quantitative Analysis in Sports, 6.

https://doi.org/10.2202/1559-0410.1266

[6] Gilovich, T., Vallone, R. and Tversky, A. (1985) The Hot Hand in Basketball: On the Misperception of Random Sequences. Cognitive Psychology, 17, 295-314.

https://doi.org/10.1016/0010-0285(85)90010-6

[7] Bowman, R.A., Ashman, T. and Lambrinos, J. (2015) Using Sports Wagering Markets to Evaluate and Compare Team Winning Streaks in Sports. American Journal of Operations Research, 5, 357-366.

https://doi.org/10.4236/ajor.2015.55029

[8] Albert, J. (2012) Streakiness in Team Performance. Chance, 17, 37-43.

https://doi.org/10.1080/09332480.2004.10554913

[9] Albert, J. (2013) Looking at Spacings to Assess Streakiness. Journal of Quantitative Analysis in Sports, 9, 151-163.

https://doi.org/10.1515/jqas-2012-0015

[10] Miltenburg, J. (1989) Level Schedules for Mixed-Model Assembly Lines in Just-in-Time Production Systems. Management Science, 35, 192-207.

https://doi.org/10.1287/mnsc.35.2.192

[11] Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D. (2001) Business Statistics: A Decision-Making Approach. 5th Edition, Prentice-Hall, New Jersey.