A Bayesian’s Journey in Elections Season
by Tiger Gao, Jack Edmondson, Tom Bearpark, Michael Psenka
This semester, Tiger, Jack, and Tom have been taking Princeton’s 1st-year PhD econometrics sequence with Prof. Chris Sims, who won the Nobel Prize in Economics in 2011 for his work in macroeconomics – more specifically, his pathbreaking application of Bayesian inference to evaluate economic policies.
Nate Silver, widely regarded as the preeminent election forecaster, uses Bayesian methods in his forecasting. Taking Sims's course expanded our conception of statistics and probability theory like never before, and we thought it might be interesting to apply some of the influential Bayesian concepts he taught us to today's political debates. We hope this brief exploration helps explain the foundational methodology that Silver uses to forecast.
The questions we seek to answer here are:
Was Nate Silver right in 2016?
Can we even judge whether a forecaster is right or wrong?
Are elections chaotic systems that we cannot predict or controlled systems that we can?
Should forecasters incorporate the likelihood of a coup / contested elections in their models?
Do alternative facts (or truths) exist?
Do we have enough data to make predictions for someone like Trump?
… and many more
Our co-author Michael is a math major at Princeton, and those who have contributed to this article through comments and informal conversations include professors and graduate students in economics, mathematics, and political science. We would sincerely appreciate any feedback and hope this is only the start of many exciting conversations to come.
When Tiger first told his parents that he was writing a long article on the theory and applications of election forecasting, they said: "nobody cares about your math; just tell us who the winner will be." This is what millions of voters truly want – clarity, simplicity, and accuracy. The fact that forecasting has become so complex that it would take us pages to explore even the most fundamental concepts shows not only the progress made by political scientists but also the unnecessary over-complication of simple ideas. The result is that the public receives far more, and noisier, information, while its understanding of the election has not improved. We find this a great tragedy, and in this article we hope to simplify and deconstruct some of those debates for you.
What is Bayesian statistics? (You may skip this technical segment if you just want to read why Nate Silver is worse than Crackhead Jim…)
Let us first set the scene. When we talk about statistical inference – the process that draws conclusions from sample data – two popular frameworks are the frequentist and Bayesian methods.
Say you want to infer what percentage of American people want to vote for Trump. Frequentists would say: I don't know what that percentage is, but I know the value is fixed, meaning it is a number that is not random. As long as you can ask everyone (and everyone answers truthfully), you'll get that number. You can collect some data and form your estimate, and then only two things can happen: either your estimate is consistent with the actual "true" percentage, or it's not. The tricky thing is that you can never really test this hypothesis unless you literally go out there and ask every single American.
Bayesians would say: sure, we may never be able to ask every American's opinion of Trump, but given our polling of the people around us, we can assign a probability distribution to the unknown percentage we're interested in. For instance, I would assign almost zero probability to the idea that fewer than 10% of people actually want to vote for Trump (highly improbable!); maybe I'll assign a 52% probability to the percentage falling between 25% and 35%; and so on…
So, the Bayesian method allows you to start making predictions even with very small datasets! But the true beauty of Bayesian inference doesn't stop there – you keep updating your beliefs as you see more data. For instance, maybe the first 50 people you talked to are all liberal college kids, so you might arrive at the belief that nobody supports Trump. But as you ask more people (hopefully now including some people on the Right), you'll realize that your "prior" belief (probability distribution) was off, and you can update it into a "posterior" belief based on the conservatives you've just talked to. You then go ask more people; and based on how right or wrong you were, you keep updating your belief and continue down this process…
In Bayesian statistics, you assign a probability distribution to all of your unknown parameters and predictions. The way you solve a problem using Bayesian inference is this: you construct a joint probability distribution over your knowns and unknowns, and then use the laws of probability to make statements about the unknowns given the knowns.
There are just three things in the equation: you have a "prior" (your existing belief about the true percentage), and a "likelihood" (how consistent each possible value of that percentage is with the data you have) – together, they make up the "posterior" (your updated belief about the true percentage). In the rest of this article, "prior" simply means the belief you held before seeing any new data; "posterior" simply means the updated belief after seeing new facts and data.
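For readers who like symbols, the "equation" here is just Bayes' rule. Writing θ for the unknown percentage and "data" for the polls you have seen so far, it reads:

```latex
% Bayes' rule: the posterior is proportional to the likelihood times the prior
p(\theta \mid \text{data})
  = \frac{p(\text{data} \mid \theta)\, p(\theta)}{p(\text{data})}
  \;\propto\; p(\text{data} \mid \theta)\, p(\theta)
```

The denominator is just a normalizing constant, which is why you usually see the proportional version.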
It sounds complicated, but in practice it's an intuitive, iterative process: you start with some preconceived prior belief; you gather some historical or new data; you go through it; you arrive at an updated posterior belief about the unknowns; you test this new belief against what you observe in reality; and based on how right or wrong you were, you update your belief and continue down this process…
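Here is a minimal sketch of that updating loop in Python, assuming (purely for illustration) that we model the unknown vote share with a Beta prior and each batch of poll responses as Binomial data, so the posterior stays a Beta distribution; the batch numbers below are made up:

```python
# Minimal Beta-Binomial updating sketch (illustrative numbers only).
# The unknown is the share of voters who support the candidate.
# A Beta(a, b) prior updated with k "yes" answers out of n responses
# becomes a Beta(a + k, b + n - k) posterior -- the conjugate-prior shortcut.

from scipy import stats

a, b = 1.0, 1.0  # flat Beta(1, 1) prior: every share from 0% to 100% equally plausible

# Hypothetical polling batches: (responses in batch, "yes" answers in batch)
batches = [(50, 5),     # first 50 people: mostly liberal college kids
           (100, 48),   # a broader sample
           (200, 88)]   # an even broader sample

for n, k in batches:
    a, b = a + k, b + (n - k)          # posterior after this batch becomes the next prior
    posterior = stats.beta(a, b)
    lo, hi = posterior.ppf([0.05, 0.95])
    print(f"after {int(a + b - 2)} responses: "
          f"mean {posterior.mean():.1%}, 90% credible interval [{lo:.1%}, {hi:.1%}]")
```

The property being exploited here is conjugacy: with a Beta prior and Binomial data, updating is just adding counts, so yesterday's posterior literally becomes today's prior.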
To get to the “truth,” you don't need to start with a lot of data; you just need to be willing to update your beliefs as you see more data, and update them especially strongly when something unexpected happens. You may wonder about all these “simulations” on 538 and why Silver’s predictions change every now and then – it’s because Silver keeps testing his beliefs against new data and updating his predictions. What Nate Silver does is a fundamentally beautiful statistical process.
Nate Silver was right – you just don’t understand statistics
Nate Silver is a Bayesian, and his forecasting isn't just popular with the public; it is also highly regarded by many seasoned econometricians we've talked to. Silver's final prediction going into the 2016 election night gave Trump roughly a 30% likelihood of winning, and in the weeks before that his model had fluctuated around a 16% likelihood.
The side supporting Silver believes that Nate Silver wasn't wrong in 2016. We should not forget that 16% is roughly the probability of rolling a six (or any other particular number) with a fair die, which is actually quite high.
So, the argument goes, the people who read a 16% likelihood as "oh, Trump is definitely going to lose" simply didn't understand statistics. Because of that lack of statistical knowledge, Silver's supporters would say, they could not grasp the true meaning of his forecast.
The same thing is happening in this election cycle. For the past few months, Trump has consistently polled below 45%, and Silver currently assigns him about a 10% probability of winning. At first sight this sounds absurd to most people. The difference is just 5% of the vote – how can that drastically reduce his winning odds to 10%?!
But if you seriously reason through the probability, Silver is correct: in a two-person game where whoever gets above 50% wins, it is entirely reasonable to assign less than a 20% chance of winning to a candidate who has consistently polled at 45% or below for months. "I think you will be surprised at how far away 45% is from 50%," a senior economics graduate student deeply knowledgeable in econometrics and statistics explained to us. Assuming the remaining 55% are with Biden, you are asking over 7.5 million Biden or undecided supporters to suddenly change their minds on election day. 5% sounds small, but in reality it would be a dramatic shift.
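To see how far 45% really is from 50%, here is a back-of-the-envelope calculation with made-up numbers. It treats the race as a single two-way national contest with roughly 150 million voters and asks how likely a candidate whose true support is 45% is to end up above 50%, under different assumed sizes of systematic polling error. This is a toy illustration of the arithmetic, not a reconstruction of Silver's model (which, among other things, works through the Electoral College):

```python
# How far is 45% from 50%? A toy calculation, not a real forecast model.
# We treat the race as a single two-way national contest and ask: if a
# candidate's true support is 45%, how likely is his realized share to
# exceed 50%, under different amounts of (normally distributed) error?

from math import sqrt
from statistics import NormalDist

true_support = 0.45
n_voters = 150_000_000  # rough 2020-scale turnout, for illustration

# Pure sampling noise across 150 million voters is absurdly small:
sampling_sd = sqrt(true_support * (1 - true_support) / n_voters)
print(f"sampling-noise SD: {sampling_sd:.6%}")  # about 0.004 percentage points

# The real uncertainty is systematic polling error. Even if the polls could
# be off by a few points, the chance of crossing 50% stays modest:
for polling_error_sd in (0.01, 0.02, 0.03):  # 1, 2, 3 percentage points
    p_win = 1 - NormalDist(true_support, polling_error_sd).cdf(0.50)
    print(f"error SD {polling_error_sd:.0%}: P(share > 50%) ≈ {p_win:.1%}")
```

Even allowing a three-point systematic error, the crossover probability in this toy setup comes out in the single digits, which is why 45% in the polls can translate into something like 10% odds of winning.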
So, it is entirely mathematically sound for Silver to have made his predictions, both back in 2016 and today. He was right then, and he is right again today in saying that Trump has a 10% likelihood of winning.
Nate Silver was NOT right – because he can never be wrong!
Those against Silver, however, would argue that his forecast was misleading and that expecting the public to understand the nuances of probability is unrealistic. You cannot expect the American public to react to a 16% likelihood with "oh, Trump actually has pretty good odds!"
But we think there is an even deeper, more philosophical argument to be made here: Nate Silver cannot really be right or wrong when there is no strict standard by which to judge him.
Consider the example of Crackhead Jim. Every election, he just says that the Republican candidate has a 50% chance of winning, and the Democratic candidate has a 50% chance of winning.
No matter who wins, he will argue that he’s right and a genius – if Silver’s 16% were good odds, Crackhead Jim’s 50% would be amazing odds. So, is Jim much better than Silver? The question remains: how do you call out Crackhead Jim for the fraud he is?
Our inability to judge who is right and who is wrong about an election prediction mirrors what the physics community says about the famous "String Theory": it cannot be called right or wrong because there is no way to verify it. Sure, the math checks out in many String Theory models, but there is no fundamental way to say whether it is a good theory, because we still cannot run experiments to show it is in line with reality. This is why physicists refuse to definitively conclude whether String Theory is right or wrong. (This part is Michael trying to show off his physics knowledge.)
Likewise, any verification of an election prediction would require a reasonably good simulation of American voters that we could run repeatedly to see how often Trump or Biden wins. We obviously don't have anything close to that (as we explain further below).
One possibly good metric for accuracy would be a metric on the forecaster themselves, rather than on any single forecast. We could tally up how many elections a forecaster got right, since that gives some notion of verifiability for the forecaster. But again, forecasters have cleverly transformed their predictions from binary outcomes into continuous random variables.
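For concreteness, here is a minimal sketch of the kind of tally we mean, with entirely invented forecast histories. Note that turning a probabilistic forecast into a countable binary call already requires an arbitrary choice (here, "did the forecaster put more than 50% on the eventual winner?"), which is precisely the wiggle room we complain about next:

```python
# A toy "score the forecaster, not the forecast" tally with invented data.
# Each number is the probability the forecaster assigned to the eventual winner.
# We convert each probability into a binary call -- "did the forecaster put
# more than 50% on the eventual winner?" -- and count the hits.

def tally(forecasts_for_winner):
    """Fraction of races where the forecaster favored the eventual winner."""
    calls = [p > 0.5 for p in forecasts_for_winner]
    return sum(calls) / len(calls)

# Hypothetical histories over five races (NOT real numbers):
nate_like = [0.71, 0.84, 0.29, 0.90, 0.64]      # favored the winner in 4 of 5
crackhead_jim = [0.50, 0.50, 0.50, 0.50, 0.50]  # never actually commits

print(f"Nate-like forecaster: right in {tally(nate_like):.0%} of races")
print(f"Crackhead Jim:        right in {tally(crackhead_jim):.0%} of races")
# Jim's 50/50 calls never clear the bar -- but notice how much the verdict
# depends on our arbitrary 50% threshold, which is exactly the problem.
```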
If Nate Silver predicted ONE winner for every election, it would be easy to verify him. Instead, he gives a probability like 16% (whose true meaning and calculation few people understand). This allows him to explain in hindsight whether that was in fact a really good or really bad number.
The issue is that the forecasters, through their complex probability models, have made this game easier for themselves. It would be a much more transparent and stricter benchmark to judge them on binary outcomes (whether the winner of the election is the winner they predicted), but they've created a "heads I win, tails you lose" situation. A brilliant move by these "revisionist statisticians."
The punchline is: IF THERE’S NO WAY THAT I CAN TELL YOU’RE WRONG, I WILL NEVER SAY THAT YOU’RE RIGHT!
Should forecasters incorporate “Black Swan” events into their models?
UBS estimated only a 15-20% probability that we get a winner by election night; essentially zero probability that the courts decide on a winner in the week that follows; and, after that, a roughly uniform probability, stretching up to the inauguration date, over when we will finally get the results. In other words, we will likely not have a clear winner on election night, and there is no precise date by which we can expect the results.
Given the radical uncertainty we're facing, shouldn't any forecast model seriously adjust for these factors beyond the election itself? After all, the likelihood that anyone's prediction is even resolved on election night is only around 15%.
The question now is: if you think a regime-changing event like a contested election or Trump pulling off a coup could happen, should you take it into account in your probability model? Has Nate Silver already done that, folding, say, an 8% likelihood of such an event into the total probability of Trump winning the actual election? We're not sure, but most likely not. For any pollster or election forecaster, modeling these events would mean incurring serious reputational risk. Silver is only looking at voter sentiment as it stands and making predictions from that data, rather than incorporating possibilities like a coup.
The questions on our mind are: What if Trump instigates a coup and refuses to leave? Does the recent appointment of Justice Amy Coney Barrett change how the courts would decide any contested election? Will the announcement of record 7.4% GDP growth one week before the election sway voters? What about the recent record Covid-19 case counts in many regions of the country?
Are elections a chaotic or a controlled system?
As mentioned above, frequentists wouldn't assign a probability distribution to an unknown parameter. They say that since it is a fixed, non-random value, there is no point in giving it a probability. Bayesians would say: sure, we don't know it, but given our samples, we may be able to assign a probability distribution to the parameters we're interested in.
Michael's view is that if you're not giving him the actual model, he's going to take your prediction with a grain of salt: the variance of such a prediction is far too high, and it's hard to say much. The only way to measure how consistent our data really are is for the election to actually happen.
The random variable people really care about, let’s call it X, is who is going to win the election, which is largely dependent on how many people vote for each candidate at some future date.
An intuitive way to estimate X is to estimate how many people are voting for each candidate right now. As long as your polls are unbiased, people won't change their votes too much before the election (probably an easier assumption than unbiasedness), and you polled enough people, basic probability theory guarantees that simple polling will give a good estimate. Obviously the polls were off last election; perhaps bias is to blame, but that's hard to say.
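The "guarantee" here is just the textbook margin-of-error calculation for a proportion; a quick sketch with an illustrative poll of 1,000 respondents:

```python
# Margin of error for a simple random poll of a proportion.
# With an unbiased sample of n respondents and observed support p_hat,
# the standard error is sqrt(p_hat * (1 - p_hat) / n), and an approximate
# 95% confidence interval is p_hat +/- 1.96 * SE.

from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Illustrative poll: 1,000 respondents, 45% support
p_hat, n = 0.45, 1_000
moe = margin_of_error(p_hat, n)
print(f"estimate {p_hat:.0%} ± {moe:.1%} (95% CI, assuming an unbiased sample)")
# Roughly ±3.1 points -- tight enough to be useful, but only if the sample
# is actually unbiased, which is the assumption that failed in 2016.
```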
Another way to estimate X is to go beyond polling and use, say, historical election data with some Bayesian inference method. These are things Michael will take with a grain of salt unless he is told exactly what the method of inference is. As your Bayesian inference gets more complicated, bias in the inference data is much harder to detect rigorously than bias in polling data. Furthermore, elections clearly aren't a well-posed mathematical system. The law of large numbers is all we need to guarantee that basic polling can give a good estimate of X; that kind of guarantee is harder to obtain for Bayesian inference methods, and it depends heavily on what your inference method even is.
All in all, it is much easier to look at a poll number and see whether it's a "good guess." If someone, even someone you think is really smart, gives you a prediction based on a black-box Bayesian inference method (or any machine learning algorithm you don't know), Michael's personal hypothesis is that the underlying distribution of X is far too complicated for whatever prior information you assume not to induce too much bias error.
If you see every American's voting decision as a random variable, the system as a whole could be chaotic: small, unpredictable factors can result in dramatically different outcomes.
Are polls even accurate?
If you follow any political polling accounts on Twitter, then you've no doubt seen certain replies to any tweet that isn't favorable to Trump: "The polls are inaccurate!" people say. "2016 was a real winner for the pollsters!" "Why don't you ask Hillary how the polls turned out?" Are they right?
Well, we first need to ask ourselves a question: what does it mean for a poll to be accurate? Does it mean that the poll closely matches the final voting outcome? Or maybe that it just closely captures the current sentiment among voters? If the latter is the case, then why does it really matter what polls say, and how can they even be useful for prediction?
For the most part, we should see polls as the latter: when properly executed, a poll measures the preferences of voters at a certain point in time. The question for those building models is how strong a predictor those preferences are of the actual election outcome.
So, were the polls wrong in 2016? Well, yes and no. It's true that most state polls had Hillary leading going into election day, though her lead had narrowed considerably after the "October surprise" from Mr. Comey. The very last poll from Pennsylvania showed Trump with a lead, but most others showed the Democratic nominee with a slight advantage. National polls, by contrast, were pretty much dead-on, with a polling average of around 2% in favor of Clinton, which is roughly what she ultimately got. Why the disparity between state and national polls?
Some theories say that people were too shy to admit their support for Trump over the phone, or at least that many of 2016's undecideds were actually leaning Trump but didn't want to say so. Political scientists have invoked this type of effect to explain poll-result disparities before: in 1982, Democratic candidate Tom Bradley, a Black man, ran for governor of California; despite leading in the polls, he lost narrowly to the white Republican, George Deukmejian. Some suggested that poll respondents didn't want to admit they opposed Bradley, lest they seem to oppose him because of his race, which inflated his apparent support in polling. Could there have been "shy Trump supporters" who didn't want to admit their support on account of his perceived sexism or racism? Possibly, but there's a better explanation, especially when we remember that Senate candidates like Ron Johnson (R-WI) and Pat Toomey (R-PA) were also down in the polls and pulled off wins, and we don't think voters were too shy to report a preference for more traditional conservatives.
What plagued many polls was probably an issue of weighting. A poll's sample often skews Democratic, Republican, or independent. If the results are not weighted to match the population's balance, the poll gives a false sense of the race, because voters who lean one way or the other are overrepresented. Polls are typically weighted using demographic data on education, race, marital status, and political affiliation. The problem, though, is that for state races this data is not as accurate as it is nationally, and many pollsters neglected to weight their polls at all. The result? Clinton's support was overstated. Of course, there were other factors too: high turnout in rural areas of Rust Belt states, undecided voters breaking largely for Trump, turnout and enthusiasm in some Democratic areas down compared to 2012 and 2008, and so on.
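To make the weighting point concrete, here is a tiny, made-up example of reweighting a skewed sample back to assumed population shares. Real pollsters weight on several variables at once (education, race, and so on), not just party identification:

```python
# Toy example of reweighting a skewed poll sample by party identification.
# All numbers are invented; real pollsters weight on several demographics.

# Raw sample: share of respondents and candidate-A support within each group
sample_share = {"Dem": 0.45, "Rep": 0.30, "Ind": 0.25}   # Democrats overrepresented
support_A    = {"Dem": 0.90, "Rep": 0.10, "Ind": 0.50}

# Known (assumed) population shares we want the sample to match
population_share = {"Dem": 0.35, "Rep": 0.35, "Ind": 0.30}

unweighted = sum(sample_share[g] * support_A[g] for g in sample_share)
weighted   = sum(population_share[g] * support_A[g] for g in population_share)

print(f"unweighted estimate of support for A: {unweighted:.1%}")  # 56.0%
print(f"weighted estimate of support for A:   {weighted:.1%}")    # 50.0%
# The unweighted poll overstates A's support by several points simply
# because one group answered the phone more often.
```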
Regardless, there were key signs in 2016 that state polls missed: polling in some congressional district races showed heavy Trump support, even in districts that Obama had carried or nearly carried. A late poll in NY-22 showed Trump with a 12-point lead over Clinton, despite Obama and Romney having effectively tied there, 49-49, in 2012. Trump ultimately won that district by over 10 points.
So what can we conclude for this year? There are two big takeaways. One, it's very possible for undecided voters to skew heavily toward either Trump or Biden (remember from before that unlikely outcomes still happen!), which can make a less probable election outcome come to pass. Two, polling can be and has been wrong, but it's usually a matter of weighting. National averages will very likely be quite accurate, but polls with less careful weighting (or none at all) should be viewed with more skepticism. In fact, some pollsters openly admit to using unconventional samples or weights because of a qualitative belief that doing so will better match the true outcome; such partisan pollsters, often marked with a (D) or (R) on aggregators like RealClearPolitics, can make support seem higher for a preferred candidate. In general, though, if weighting issues affect good polls in one state, it's very possible the same issues will appear in states with similar demographics – look at 2016, where the Rust Belt states all had state polling problems. Make sure to read the fine print on state polls and understand their methodology, lest you repeat the mistakes of 2016!
Do alternative facts exist?
Do we assume there is a set of facts (the impact of a policy like President Trump's tax cut, or the scientific truth behind climate change) that is fixed but unknown to us, and therefore something we should treat as random, or do we treat those facts as non-random?
The connection between Bayesian econometrics and "alternative facts" may seem a bit far-fetched. But one of the objective things Bayesian inference theory shows is how people update beliefs. Two people may start with different prior beliefs and may take a long time to converge on the truth, but in Bayesian theory they will eventually agree on the distribution of uncertainty, unless one of them puts zero probability on the other person's model. In other words, theoretically, as long as they give some serious consideration to the other side's argument, they will eventually agree with each other. You don't need to believe there is a fixed truth; you just need to be willing to update your beliefs, and to update them especially strongly when something unexpected happens. This is the optimism of Bayesian theory.
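A small simulation of that optimism, assuming (for illustration only) two people who model the same unknown proportion with very different Beta priors but then observe the same stream of data:

```python
# Two Bayesians with very different priors, updating on the same data.
# Both model an unknown proportion (say, support for some policy) with a
# Beta prior; as shared evidence accumulates, their posteriors converge.

import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.60                       # the "truth" generating the data (illustrative)

# Beta(a, b) priors: one person starts near 80%, the other near 20%
beliefs = {"optimist": [8.0, 2.0], "skeptic": [2.0, 8.0]}

seen = 0
for batch in (10, 100, 1000):
    data = rng.binomial(1, true_rate, size=batch)   # shared observations
    k, n = int(data.sum()), batch
    seen += n
    for a_b in beliefs.values():                    # conjugate Beta-Binomial update
        a_b[0] += k
        a_b[1] += n - k
    summary = ", ".join(f"{name} believes {a / (a + b):.1%}"
                        for name, (a, b) in beliefs.items())
    print(f"after {seen} shared observations: {summary}")
# However far apart the priors start, enough shared data pulls the two
# posteriors together -- provided neither person rules the data out entirely.
```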
In reality, however, we're not so optimistic. First, there is psychology at work: people don't seek out facts they disagree with, and they might not update their beliefs when they encounter new facts. In other words, people have really, really strong (and often inconsistent) priors, and their updating procedure itself depends on those priors. For example, if someone believes that climate change isn't caused by human activity, then the effect of new information on that person's posterior may depend heavily on whether it agrees with the existing prior – a fact about CO2 emissions will influence the person very little, while some fact about the "unpredictability of weather" may deeply reinforce the conviction that climate change is not due to human actions.
Second, even if convergence of beliefs is guaranteed to happen, it may simply take so long that people lose patience. Probability theory says that any event with nonzero probability will eventually happen if you repeat the experiment enough times. So you will eventually get hit by a car if you cross the road enough times, but "enough times" might mean 2,000 years of crossings, which is entirely unrealistic to wait for.
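The arithmetic behind that intuition: if each crossing independently carries a small probability p of an accident, then

```latex
% Chance of at least one accident in n independent crossings
P(\text{at least one accident in } n \text{ crossings}) = 1 - (1 - p)^n ,
\qquad
n_{50\%} \approx \frac{\ln 2}{p} \ \text{crossings to reach a 50\% chance (for small } p\text{).}
```

For a very small p, that number of crossings can be astronomically large, which is the whole point.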
Do we have enough data to predict someone like Trump?
Say we're interested in the percentage of people who will vote for Trump versus Biden on Election Day. This is something we cannot predict for sure, so a Bayesian would put a probability distribution on this number, and we might look at previous elections to come up with that distribution.
But one hypothesis is that Trump is simply so different from every other political candidate we have known. He appeals to his base in a unique way that few rational minds on the Left can understand; his frequent outrageous statements have disrupted our conventional measures of what is acceptable in political discourse…
So here's the dilemma proposed by Michael. You can use all the previous election data, which gives your prediction less variance, but the prediction may well be skewed because that data doesn't accurately represent Trump. It doesn't accurately represent Trump because he is more or less a dramatic "regime change" who broke most prediction and polling models back in 2016. His rise to power and day-to-day conduct continue to surprise and puzzle people, so it seems we won't arrive at very accurate results by resorting to conventional wisdom. The alternative, then, is to use only the data we have gathered since 2016, which may simply not be enough for us to come up with anything meaningful either!
The counter-argument would be: well, we have already experienced four more years of Trump, which means four more years of careful analysis and repeated polling of voter sentiment. Four years sounds short in the long run of history, but in today's age it yields so much data that it should be sufficient for plenty of analysis. Also, Trump may not be that "ground-breaking" anyway: many previous presidents, like Nixon, had their own unique ways of appealing to their bases and upsetting the conventional wisdom of their day. Trump is not that different.