1. Historical Development of Polls
In 1824 Andrew Jackson ran against John Quincy Adams for president. A newspaper conducted a straw poll of about 500 people, in which Jackson received 2 out of every 3 votes, and proclaimed that Jackson would win. When Jackson received the most popular votes¹, the era of the “straw poll” began. In 1936, the Literary Digest² mailed out 10 million questionnaires and 2.3 million people responded. On that basis it predicted that Alfred Landon would win in a landslide over Franklin D. Roosevelt. At the same time, George Gallup conducted another poll in which trained interviewers surveyed demographically representative samples, or quotas. Mr. Gallup’s survey correctly projected a Roosevelt victory. This ended the widespread use of straw polls in favor of quota sampling.
Then came the election of 1948. Gallup™, Roper™ and Crossley™ all proclaimed that Dewey would defeat Truman. When Harry Truman won the election, the newspapers blamed the quota sampling technique. One little-known study at the University of Michigan conducted a poll based on a “random probability” sampling method. This poll projected that Truman would win. This ushered in the statistical methodology used today.
In 2016, the opinion polls relating to Trump were off by about the same amount as the error that brought down quota sampling in the 1948 Dewey-Truman fiasco. The following is a general discussion of basic statistics and sampling methods.
2. Basic Statistics
Historically, empires collected data that helped in making important decisions. In the 5th century BC, the Athenians calculated the height of walls by counting the number of bricks in the wall. The generals found that repeating the count several times allowed them to determine the most frequent brick count, which was then used to calculate the height of the ladders necessary to scale the walls.
Mathematicians became involved in calculating and formulating probabilities based on the simple coin flip. The frequencies of heads and tails were plotted on a graph. Since there were only two equally likely outcomes, the majority of the results fell at or near the mean. However, the flips also showed lower-probability events such as three heads in a row or five tails in a row. This resulted in a bell-shaped curve with the most frequent numbers located at its peak and the less frequent numbers tapering away from the center.
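The bell shape described above can be reproduced with a short calculation. The following is a minimal sketch, assuming a fair coin and an arbitrarily chosen run of 10 flips; the function name `heads_distribution` is illustrative and does not come from the article.

```python
from math import comb

def heads_distribution(flips):
    """Probability of seeing k heads (k = 0..flips) when a fair coin is flipped."""
    total_outcomes = 2 ** flips
    return [comb(flips, k) / total_outcomes for k in range(flips + 1)]

# Assumption: 10 flips, chosen only to keep the printout short.
probs = heads_distribution(10)
for k, p in enumerate(probs):
    # A crude text histogram: one '#' per percentage point.
    print(f"{k:2d} heads: {p:.3f} {'#' * round(p * 100)}")
```

The printed histogram peaks at 5 heads and tapers symmetrically toward 0 and 10 heads, which is the shape the normal distribution approximates as the number of flips grows.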
2.1. Normal Distribution Curve
As shown in Figure 1, this bell-shaped distribution curve is called a normal distribution. The area under the curve over a specific interval equals the total probability for that interval. This analysis and calculation depend on the shape of the curve. If the curve has multiple peaks, flat sections, or other changes, then the calculated area under the curve will yield different total probabilities. Hence, for basic statistics to work as intended, the data must follow a normal distribution.
The centerline of a normal distribution is located at zero standard deviations. The interval extending one standard deviation from the center in each direction contains 68.3% of all potential values. Two standard deviations contain 95.4% of the values and three standard deviations contain 99.7%.
Figure 1. Normal distribution curve. A sample standard deviation is plotted on the x-axis and the probability is plotted on the y-axis. The percentage of each section is shown for each standard deviation. The total of ± two standard deviations will be 34.1 + 34.1 + 13.6 + 13.6 equals 95.4% or ~95%.
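The percentages quoted above can be checked numerically. This is a small sketch that uses the standard relationship between the normal distribution and the error function; the helper name `coverage_within` is an assumption made for illustration.

```python
from math import erf, sqrt

def coverage_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} standard deviation(s): {coverage_within(k) * 100:.1f}%")
# +/- 1 standard deviation(s): 68.3%
# +/- 2 standard deviation(s): 95.4%
# +/- 3 standard deviation(s): 99.7%
```

The 68.2% obtained by adding the two 34.1% sections in Figure 1 differs from 68.3% only because each section is rounded separately.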
2.2. Confidence Interval
This is the total probability that the real number is contained within a particular area under the normal distribution curve. It does not mean that the selected number has a 95% chance of being correct. It only suggests that there is a 95% chance that the real number is contained within ± two standard deviations from the mean. The two are vastly different. The probability of a number being the correct number could be 1 in 100 and still be within the 95% total probability.
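One way to see what the 95% figure does and does not claim is to simulate many polls of a population whose true value is known and count how often the computed interval contains it. The sketch below rests on assumptions that are not in the article: a true support of 52%, 500 respondents per poll, 2,000 simulated polls, and the usual normal-approximation interval with z = 1.96.

```python
import random
from math import sqrt

def interval_covers(true_share, n, rng, z=1.96):
    """Simulate one poll of n respondents; report whether its interval contains the truth."""
    hits = sum(rng.random() < true_share for _ in range(n))
    p_hat = hits / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width <= true_share <= p_hat + half_width

def coverage_rate(true_share=0.52, n=500, polls=2000, seed=1):
    """Fraction of simulated polls whose interval contains the true share."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = sum(interval_covers(true_share, n, rng) for _ in range(polls))
    return covered / polls

rate = coverage_rate()
print(f"Intervals containing the true value: {rate:.1%}")
```

The fraction should come out close to 95%. That is a statement about the interval-building procedure over many polls, not about any single reported number being correct.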
2.3. Margin of Error
A margin of error is the amount of error expected in a survey. A smaller margin of error corresponds to a tighter concentration of the probability density around the mean. Once the desired margin is chosen, the minimum number of random selections needed to narrow in on the real number can be calculated. The lower the margin of error, the larger the number of random samples required.
2.4. Sample Size
Calculations for determining the size of the random sample are simple when based on the normal distribution. Two major factors are the margin of error and the confidence level. For large populations, there are published tables and automatic calculators (Smith, 2017). For example, assuming the worst-case situation (an even 50/50 split of opinion), a 95% confidence level and a margin of error of ±3% give a calculated sample size of 1,068 people. These variables (95% confidence and ±3% margin) are used by most pollsters in political elections.
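The 1,068 figure can be reproduced with the standard large-population formula n = z² · p(1 − p) / e². The sketch below assumes z = 1.96 for 95% confidence and the worst case p = 0.5; the function names are illustrative, not taken from the cited calculators.

```python
from math import ceil, sqrt

Z_95 = 1.96  # z-score conventionally used for a 95% confidence level

def sample_size(margin, p=0.5, z=Z_95):
    """Minimum random sample for a given margin of error (large population assumed)."""
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error obtained from a random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

print(sample_size(0.03))                        # 1068
print(sample_size(0.05))                        # 385
print(round(margin_of_error(1068) * 100, 2))    # 3.0
```

Because the margin appears squared in the denominator, halving the margin of error roughly quadruples the required sample.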
3. Sampling Methods
3.1. Random Sampling
A core principle in statistics is to select a random sample. Of course, that is easier said than done. For small populations, a unique number is assigned to each member of the population, and then a random number generator selects the participants. For large populations, pollsters equate the general population with telephone owners. They randomly generate telephone numbers and an interviewer asks questions of those who respond. Although landlines dominated telephone usage between 1950 and 2000, cell phones now reach a majority of households. However, cell phones have their own demographic and other complications (Blumberg & Luke, 2007).
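The small-population procedure described above (number every member, then let a random number generator pick) might look like the following. This is a sketch only: the population of 500 members, the sample of 25, and the helper name `draw_sample` are assumptions for illustration.

```python
import random

def draw_sample(population_ids, k, seed=None):
    """Select k distinct members, each with equal chance, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population_ids, k)

# Assumption: 500 members, each assigned a unique number from 1 to 500.
members = list(range(1, 501))
chosen = draw_sample(members, 25, seed=42)
print(sorted(chosen))
```

Passing a seed makes the draw repeatable for auditing; omitting it gives a fresh selection each run.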
3.2. Non-Statistical On-Line Sampling
On-line sampling involves direct communications between the pollster and the respondents via smartphones and computers. One polling company predicted a Trump victory using on-line direct communications (Jomeh & Lauter, 2016). Merely increasing the sample size does not improve accuracy, as shown by the 1936 Literary Digest study.
4. Polling Data
Table A1³, as set forth in Appendix A, lists opinion poll results from the 50 states taken immediately before the 2016 election. The difference between the actual vote percent and the polling projection percent is the polling error. A positive error means that the poll underestimated the actual vote, and a negative value indicates that the poll overestimated the vote. The polling error data in Table A1 are graphically displayed in Figure 2.
The polls underestimated both Clinton (3%) and Trump (6.9%). The average absolute⁴ polling error for Clinton was 3.8% and for Trump was 6.9%. The fact that Trump’s absolute polling error and his average polling error were the same indicates that there were almost no polls that overestimated his support.
The state polling errors relating to Trump varied from −0.4% (MN) to +17.1% (N.D.) with an average underestimating error of 6.9%. His standard deviation was 4.54. The standard deviation is a measure of the variations in the data.
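The three summary measures used in this section (average polling error, average absolute polling error, and standard deviation) can be sketched as follows. The state names and percentages are hypothetical and are NOT values from Table A1. The sketch also assumes that the absolute error is the mean of the absolute values and that the sample standard deviation is used, since the article does not specify either.

```python
from statistics import mean, stdev

# Hypothetical (poll %, actual vote %) pairs; not taken from Table A1.
results = {
    "State A": (44.0, 50.5),
    "State B": (47.0, 46.6),
    "State C": (40.0, 52.0),
    "State D": (45.0, 49.0),
}

# Polling error = actual vote minus poll; positive means the poll underestimated.
errors = [actual - poll for poll, actual in results.values()]

mean_error = mean(errors)                     # signed average; shows direction of the miss
mean_abs_error = mean(abs(e) for e in errors) # average size of the miss
spread = stdev(errors)                        # sample standard deviation (assumption)

print(f"average error:          {mean_error:+.2f}")
print(f"average absolute error: {mean_abs_error:.2f}")
print(f"standard deviation:     {spread:.2f}")
```

The signed and absolute averages coincide only when no error is negative. Here the single overestimate in State B separates them slightly, which is the reasoning the text applies to Trump's identical 6.9% figures.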
The first dotted vertical line at point 0 represents where the errors on the positive side (underestimated) equal those errors on the negative side (overestimated). This would be the expected result from a neutral and unbiased poll. If the poll acted in a statistical manner, it would follow a bell shape⁵ similar to the one superimposed on the graph.
In Figure 2 the polling errors did not peak at the expected mean, i.e. 0, but were shifted +6.9% to the right as shown by the second dotted line. This indicates that the polls seriously underestimated Trump’s actual votes. The individual points also failed to follow a bell-shaped curve, suggesting that factors other than statistics probably affected the polling results.
Figure 2. This is a plot of the Trump state polling errors on the x-axis in 1% intervals for the US Presidential election of 2016. The number of occurrences in each interval is plotted on the y-axis. A bell-shaped probability curve is superimposed on the graph to illustrate how the actual points lined up with a normal distribution.
Although the Clinton state polls’ mean (average) was 3%, her individual state polling errors varied from −4.6% (S.D.) to +15.9% (VT). The Clinton standard deviation was 3.9.
Figure 3 is a plot of Mrs. Clinton’s polling errors. The peak was shifted to the right at 3%. The frequencies of the individual polling errors failed to align with the bell-shaped curve.
Data similar to that set forth in Table A1 were prepared for the 2004, 2008 and 2012 elections, and showed a similar right shift in the peaks. This indicates that, on average, the polls consistently underestimated all candidates each year.
The election of 2004 had an average polling error that underestimated both Kerry (2.1%) and Bush (2.7%). Both had higher state polling errors ranging from −2.5% to +12.1% (Kerry) and −2.1% to +9.4% (Bush). The average absolute polling errors were 2.5% (Kerry) and 2.9% (Bush). The standard deviation was 2.9 for Kerry and 2.4 for Bush.
In 2008, the polls underestimated Obama (2.4%) and McCain (2%). Both candidates had higher state differences ranging from −4.5% to +10.5% (Obama) and −5.5% to +7.6% (McCain). The average absolute polling error was 3% for Obama and 2.6% for McCain. The standard deviation was 2.6 for Obama and 2.7 for McCain.
Figure 3. This is a plot of Mrs. Clinton’s polling errors for the 2016 US Presidential election with a sample bell-shaped curve superimposed.
Figure 4. This is a plot of Mr. Kerry’s polling errors for the 2004 US Presidential election with a sample bell-shaped curve superimposed.
Figure 5. This is a plot of Mr. McCain’s polling errors for the 2008 US Presidential election with a sample bell-shaped curve superimposed.
In the Presidential election of 2012, the polls underestimated Obama by 3% and Romney by 1.5%, although both had higher state deviations, i.e. −3% to +9.6% (Obama) and −6.2% to +12.6% (Romney). The average absolute polling error was 3.3% for Obama and 2.5% for Romney. The standard deviation was 2.7 for Obama and 3.5 for Romney.
The data and plots suggest that the polls consistently underestimated both candidates for every election. These polls (excluding Trump) showed a mean error between 1.6% and 3% with an average of 2.4%. The Trump opinion polling errors were 2.8 times higher than the average (2.4%) and considerably higher than the margin of error.
Figure 6 is a plot of the average absolute polling error for multiple US Presidential election years. It shows the absolute error applicable to the Republican candidates curving significantly higher since 2012.
Figure 7 shows the standard deviation of the state polling errors across multiple elections. The standard deviation measures the degree to which the data fluctuate from the mean or centerline. A high value raises the possibility of missing variables. A continuous worsening of the standard deviation argues against treating any one election as an isolated anomaly.
Figure 6. The average absolute polling error is plotted on the y-axis and the year is plotted on the x-axis.
Figure 7. Standard deviation of the state polling error is plotted on the y-axis and the year on the x-axis.
The data identify four significant polling problems, i.e. the polls consistently underestimated the candidates’ actual performance; a substantial variation appeared between state polls; the standard deviation increased each year; and an unexpectedly large polling error occurred relating to Trump.
5. Likely Causation of Polling Errors
5.1. Opinion versus Fact
Statistics and probabilities are founded on “facts”. The result of flipping a coin is a fact. Each flip produces a head or a tail, and each flip can be counted. These results are ascertainable and indisputable. The throwing of a die results in a number and each throw can be counted. Cutting a deck of cards is countable and each cut will result in a particular card. These facts have a few things in common:
They are certain; they do not change, and they are verifiable.
“Who are you going to vote for?” is an opinion based on a person’s state of mind for a future event. It is essentially a feeling that cannot be physically or objectively measured and is far from certain. This opinion can change multiple times before the survey is completed. In addition, a responder can lie; and there is no way for the observer to know or correct it. A responder must also understand the question, whereas a fact does not depend on the competence or incompetence of a person. Therefore, treating opinions as facts is a fundamental error.
Marrying fact-based statistics and feeling-based opinion polls seems incompatible if not bizarre. However, the results show that the merger has been somewhat successful. For example, prior to 2016 all but one of the presidential pre-election polls⁶ since Truman have been within the margin of error (NCPP, 2017). This success rate is a partial verification of the merger. However, proving cause and effect is far more difficult. For example, there is no proof that the prior successes were a result of statistics as opposed to the expertise of the pollsters.
One company conducted a study⁷ (the Pollster Accuracy Study) involving 370 different pollsters (Silver et al., 2016). This study was done before the 2016 election. It showed that the polling error varied from 1.2% to 23.8%. Some pollsters (116 companies) were within the margin of error 100% of the time and others (42 companies) were always outside the margin of error. Some pollsters (28) never called a race correctly while others (154) had a 100% success rate. One pollster with a 100% success rate called 465 races correctly but only received a C minus rating. This suggests that poll accuracy was related to the expertise of the pollster as opposed to mathematics.
Exit polls are not the same as opinion polls. An exit poll asks people how they actually voted. This is much closer to a fact than to an opinion, although it is still subject to lying and similar problems. The exit polling data for the 2016 election showed an average absolute exit polling error of 2% for Clinton and 2.8% for Trump. Both exit polls were well within the margin of error. In contrast, the average absolute errors of the pre-election opinion polls (Trump 6.9% and Clinton 3.8%) were both considerably higher than projected. This raises the question of why the two polling results (opinion and exit) were so different in the same wildly contentious election. Historical elections since Truman also show that exit polls have been more accurate, which argues against treating the 2016 election as an anomaly.
The Pollster Accuracy Study and the exit poll/opinion poll comparison provide a cogent argument that opinion poll accuracy is related more to the pollster’s expertise than to mathematics.
5.2. Nonresponse Rate
In 2000, the percentage of people opting not to respond to polling inquiries was 72% (Kennedy & Deane, 2017). This increased to 76% in 2004, and then to 84% by 2008. By 2012 it rose to 91% and stayed at that level for the 2016 elections. Many experts questioned whether a random sample could be obtained when 91 percent of the population is excluded. Some assigned a “nonresponse bias” to the polling survey. Others ignored the nonresponse rate, contending that the statistical distribution of the whole is the same as the statistical distribution of the portion. Studies investigating how response rates affect poll results could not find a reliable connection (Kennedy & Deane, 2017; Groves & Peytcheva, 2008).
In Figure 6, the polling error for Romney was 2.5%, which rose to 6.9% for Trump while at the same time the nonresponse rate remained unchanged at 91%. Between the 2008 and 2012 elections, there was no change in the Republican voting error but the nonresponse rate went up (83% to 91%). Between 2004 and 2008 the Republican voting accuracy went down while the nonresponse rate went up. These observations suggest that a meaningful short term relationship between accuracy and nonresponse rates is uncertain and complex. It could be based on a simple coincidence.
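The comparison above can be put on a numerical footing by correlating the two series. The sketch below uses only figures quoted in this article: nonresponse rates of 76%, 84%, 91% and 91% for 2004 to 2016 (taking the 84% stated for 2008 earlier in this section) and Republican average absolute polling errors of 2.9%, 2.6%, 2.5% and 6.9%. With only four data points, the result is illustrative at best, and the helper `pearson_r` is written out by hand rather than taken from any study.

```python
from math import sqrt

years = [2004, 2008, 2012, 2016]
nonresponse_pct = [76, 84, 91, 91]           # nonresponse rates quoted in this section
republican_abs_error = [2.9, 2.6, 2.5, 6.9]  # Bush, McCain, Romney, Trump

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(nonresponse_pct, republican_abs_error)
print(f"correlation over {len(years)} elections: {r:.2f}")
```

The coefficient comes out moderately positive, but it is driven almost entirely by the 2016 point, so it neither establishes nor rules out a short-term relationship.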
On the other hand, the standard deviation as shown in Figure 7 showed an increasing deviation with years, which roughly corresponded to the increasing nonresponse rate.
The fact that one cannot prove a connection between polling accuracy and nonresponse rates does not mean that a connection does not exist. The mathematics of random sampling reveals that a relationship must exist, i.e. a 100% nonresponse rate means no random sample.
5.3. Extrapolating National Polling Data to States
Some of the state polling data included in Table A1 may have been obtained by extrapolation and not by an actual survey. Disaggregation is a gross comparison using census data. It combines large national databases and then disaggregates the data so as to calculate percentages by state. This is a generalization and is a known source of errors. Another method is Multilevel Regression with Poststratification (MRP). The latter uses the same national survey databases but divides the data into multilevel demographic and geographic predictors (Kastellac et al., 2010). Whether disaggregation or MRP was used for the Table A1 data is not known, but either would provide a logical reason for the wide deviations between some state polls. Many of the highest polling errors did occur in the less populated states where extrapolation and/or MRP is likely to have occurred, i.e. Alaska (15.9%), North Dakota (17%), South Dakota (14.5%), Wyoming (13.7%), Hawaii (11.8%), Idaho (9.2%).
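To make the idea of poststratification concrete, the sketch below shows only the final reweighting step: support estimated within demographic cells is combined according to each cell's share of a state's population. The cell names, support levels and population counts are hypothetical, and the multilevel regression that would normally produce the cell estimates in MRP is not shown.

```python
def poststratify(cell_support, cell_population):
    """Population-weighted average of per-cell support estimates."""
    total = sum(cell_population.values())
    weighted = sum(cell_support[cell] * cell_population[cell] for cell in cell_population)
    return weighted / total

# Hypothetical inputs: support for a candidate estimated from national data per cell,
# and census-style counts of how many state residents fall in each cell.
support_by_cell = {"urban": 0.40, "rural": 0.60}
state_population_by_cell = {"urban": 200_000, "rural": 600_000}

estimate = poststratify(support_by_cell, state_population_by_cell)
print(f"state-level estimate: {estimate:.1%}")  # 55.0%
```

Because the state estimate inherits the national cell estimates, any cell whose local behavior differs from the national pattern produces an error that no amount of reweighting removes.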
5.4. Third-Party Candidates
It is possible that the state polling data were skewed by the presence of third-party candidates (Cassino, 2016; Smith, 2016). During the pre-election polling period, third-party candidates provide a convenient way to protest. Protest votes are usually not a factor at election time. Some pre-election polls limit the choices to the main candidates. However, this has invoked strong criticism and some lawsuits. To avoid this, many pollsters present results for both, i.e. one poll for a two-party race and one that includes significant third-party candidates. The presence of third-party candidates provides a good explanation for the opinion polls consistently underestimating the major candidates.
There are many types of biases that can affect opinion polls. This is supported by the Pollster Accuracy Study that showed extremely divergent results depending on who did the survey.
5.5.1. Financial/Political Bias
This bias would apply to all elements of the polling methodology, i.e. sample selection, interviewer bias, question bias, response bias, weighting bias, etc. The underestimate/overestimate polling errors suggest a bias, particularly when the error falls outside the margin of error. In the 2016 election, Trump was underestimated⁸ by 6.9%, which was more than the margin of error. An error of this size suggests a fundamental flaw (equating opinion with fact) or severe bias in the polling.
The Pollster Accuracy Study indicated that 74 pollsters leaned toward the Democrats and 27 leaned toward the Republicans. That is a ratio of 2.7 Democrat-leaning pollsters for each Republican-leaning pollster. The amount of “mean reverted bias” associated with these pollsters differed even more, i.e. the Democrat-leaning pollsters had a total bias of 63 with an average of 0.84 per pollster, whereas the total bias for the Republican-leaning pollsters was 7 with an average of 0.25 per pollster. Hence, not only did far more pollsters favor the Democrats, but the degree of bias of each pollster was also much greater. This accuracy study was done months before the election (updated Aug 5, 2016). A review of the Pollster Accuracy Study indicates that more than 60% of all polling entities are associated with universities. Donors in academia are 90% Democrat (Kiersz & Walker, 2014), which may be one of the reasons why the Democrat leaning is much higher.
The results of the 2016 election indicate that something was seriously flawed, and a review of the pollsters used in the Pollster Accuracy Study exposed a major bias component, both in number and amount.
5.5.2. Sampling Bias
There is a difference between a sampling bias and a sampling error. The sampling error is a methodology-related issue that should be included within the margin of error. The sampling bias is based on a conscious or unconscious sample selection. This point is illustrated by agricultural workers, who may not be reachable during the working season. There were 3.2 million farm/ranch workers in 2012 (USDA, 2014), and farmers lean toward the Republican Party by approximately 80% based on the number of donors (Kiersz & Walker, 2014). This indicates that telephone surveys may significantly under-sample farmers. A similar analysis would need to be done for the Mining Industry (90% Republican), Construction Industry (65%), Oil & Gas (70%) and Real Estate (60%). An offsetting analysis would have to be done on those favoring Democrats, i.e. Entertainment (90%), Academia (90%), Newsprint (85%), On-line Computer Services (70%), Legal (70%), and Pharmaceuticals (65%). Determining whether sampling bias existed in the Table A1 data would require access to unavailable internal data from each pollster. This is a potential source of errors in the 2016 election.
5.5.3. Publicity Bias
The hypothesis is that the publication of an opinion poll directly affects turnout and voter preferences, thereby potentially making the poll a self-fulfilling prophecy. However, studies show mixed results. There is evidence that the party gaining in the polls will benefit from a bandwagon-type effect (Dahlgaard et al., 2016). A study of voters exposed and unexposed to the opinion polls showed no statistical difference (Knappen, 2014). One study is not sufficient to support or negate a relationship.
5.5.4. Search Results Bias
Another potential problem is SEME (Search Engine Manipulation Effect) by computer search companies (Bing, Google, DuckDuckGo, Chrome, etc.) (Epstein & Robertson, 2017). Technically, SEME is not a deficiency in the statistical polling methodology. It is more of a direct attack on the voting process. The Epstein et al. study indicates that SEME is much stronger than a bias and could possibly qualify as an interference with the election process itself (Epstein, 2019). Although SEME is a serious problem, it is not analyzed or discussed further in this article.
6. Potential Remedial Measures
6.1. Statistical Based Polls
Solutions to problems associated with “opinions” in statistical polling methods are not fully studied in this article. These are issues that should be addressed by the professional organizations such as AAPOR (American Association for Public Opinion Research) and the various governmental entities that regulate this area.
6.2. Statistical Error Analysis Method
A new methodology was discovered while trying to compress polling data. It is based on using error data rather than polling data. The error data are the differences between the polling data and the actual data. This methodology is covered in a patent application publication (Nelson, 2019). It was 92% accurate in predicting the outcome in the last 4 elections, and 100% accurate in predicting a Trump victory in the 2016 election.
6.3. On-Line Polling Methods
On-line methods include indirect and unknown communications, i.e. sampling without the knowledge of the person. This includes searching social media and Twitter™ accounts for word frequency (Lampos & Cohn, 2013), the ratio of positive words to negative words (O’Connor et al., 2010), tweet counts (Tavabi, 2015), etc. Businesses have successfully used on-line search queries, word mining, and credit transactions for years. These methods have been used to accurately track events such as contagious diseases (Ginsberg et al., 2009), hurricanes (Vlachos et al., 2004), and earthquakes (Sakaki et al., 2010).
On-line nonprobability surveys are becoming more popular and more accurate (Kennedy & Caumont, 2016; AAPOR, 2013). In the 2012 presidential election, the on-line polls outperformed both telephone and live polls (Silver, 2012).
A major concern exists with on-line polling. There will be entities that could use computer programs to plant words, tweets, queries, etc. in a manner to influence the poll outcome. Programs attempting to detect these intrusions can be circumvented, and are usually not prepared until after the intrusion becomes known. In addition, on-line polling may be affected by countries and entities outside the United States.
In the past, every time a polling scheme was overwhelmingly wrong, a new methodology was adopted. This occurred in 1936 in the election of Franklin Roosevelt, when Landon was the polling favorite, in 1948 when Dewey was favored over Truman, and again in 2016 when almost all polls projected that Clinton would win over Trump. In the 2016 election, Mr. Trump was underestimated by 6.9%, which was higher than the margin of error. Graphs show that the peak of the polling errors did not occur at the expected value, nor did the data align with a normal bell-shaped distribution. Major vulnerabilities exist in combining fact-based statistical analysis with feeling-based opinions. Basic statistical equations do not cover feeling-based factors, i.e. biases, truthfulness, competency, nonresponse rates, etc. A comprehensive pollster accuracy study showed that the most widely used pollsters had significant biases favoring Democrats over Republicans. The 2016 polling failures illustrate deficiencies in the existing approach, supporting the view that a new polling methodology is needed.
Table A1. 2016 presidential polling and election results.
¹ Jackson did not receive the majority from the Electoral College and John Quincy Adams became the sixth President.
² A weekly magazine published by Funk & Wagnalls™ until 1938.
³ The source for the information in Table A1 is the Real Clear Politics™ website (http://www.realclearpolitics.com). For four states, the information is from the Washington Post™ (http://www.washingtonpost.com/2016-election-results/alabama/) for Alabama; Election Projection™ (http://electionprojection.com/presidential-elections.php) for Hawaii and North Dakota; and 270 to Win™ (http://www.270towin.com/2016-polls-clinton-trump/wyoming/) for Wyoming.
⁴ Absolute polling error is based on the average change from the mean where the negative numbers are converted into positive ones. The absolute value is more appropriate when compared with the margin of error.
⁵ The shape of the bell-shaped curve is illustrative only and not to scale.
⁶ Maximum difference between average poll and actual vote: 1956 Eisenhower/Stevenson, 1.5%; 1960 Kennedy/Nixon, 1%; 1964 Johnson/Goldwater, 3%; 1968 Nixon/Humphrey, 1.5%; 1972 Nixon/McGovern, 2.6%; 1976 Carter/Ford, 3%; 1984 Reagan/Mondale, 2.6%; 1988 Bush/Dukakis, 2.2%; 1992 Bush/Clinton, 1.8%; 1996 Clinton/Dole, 2.7%; 2000 Gore/Bush, 2.3%; 2004 Bush/Kerry, 2.9%; 2008 Obama/McCain, 3.0%.
⁷ FiveThirtyEight™ is a polling aggregation website and made the raw data for the pollster accuracy study available for download via https://github.com/fivethirtyeight/data/tree/master/pollster-ratings under the file name “pollster-stats-ful.xlsx”. Of the 370 pollsters studied, the Democrat-leaning/Republican-leaning bias was the same for each, i.e. 0.5, suggesting a bias-neutral study.
⁸ This was based on the average of the top opinion polls done by RealClearPolitics as set forth in Table A1 for each state and then the average of all states.