Model Differences

Instead of engaging with the frenzy of incredulity on the right over whether the popular statistical models tracked by this site (one in particular, no surprise) will be proven right, I think it is worth reminding ourselves how the models are actually structured so that we do not lose sight of their overall philosophy. Having a good grip on model architecture will enable us to enumerate and evaluate the differences between them (and any other competing approaches) more convincingly.

All three models plus Votamatic can be thought of as probabilistic state poll aggregators. The aggregation is on two levels: 1) the models call the candidate vote share in each state separately, though not necessarily independently, using the consensus of (mostly) state-based polls, then 2) compute the Obama/Romney Electoral Vote (EV) count as the sum total of individual state outcomes. This is sensible since the election is decided by the Electoral College and not the popular vote. Crucially, each call is not a single value of the vote share but a probabilistic range, reflecting the uncertainty in the call from various sources. This is what sets these models apart from conventional poll aggregators such as RealClearPolitics and TPM PollTracker. Where the models do differ is in which elements contribute to the uncertainty measure. The table below summarizes the key modelling assumptions that affect this measure.
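To make the second level of aggregation concrete, here is a minimal sketch of how per-state win probabilities can be rolled up into a distribution over EV totals. It assumes, purely for simplicity, that state outcomes are independent (the models above generally do not assume this), and the function names and the three-state example are hypothetical illustrations, not any model's actual code or data.

```python
def ev_distribution(states):
    """Exact distribution of one candidate's EV total.

    states: list of (electoral_votes, win_probability) pairs.
    ASSUMPTION: state outcomes are independent of each other.
    Returns a list dist where dist[k] = P(candidate wins exactly k EV).
    """
    total = sum(ev for ev, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0  # before any state is counted, the candidate has 0 EV
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[k] += mass * (1.0 - p)   # candidate loses this state
            new[k + ev] += mass * p      # candidate wins this state
        dist = new
    return dist


def probability_of_at_least(dist, needed):
    """Probability that the EV total reaches `needed` or more."""
    return sum(dist[needed:])


# Hypothetical three-state map: (EV, probability the candidate wins)
example_states = [(10, 0.9), (20, 0.5), (5, 0.2)]
example_dist = ev_distribution(example_states)
expected_ev = sum(k * mass for k, mass in enumerate(example_dist))
```

With the real map one would pass 51 entries (fifty states plus DC) and ask for `probability_of_at_least(dist, 270)`. Dropping the independence assumption typically means simulating correlated state outcomes instead of using this exact convolution.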

1. Forecast or Snapshot?
This is perhaps the most important distinction between the models, which may not be immediately apparent to the generic obsessive poll watcher (myself included). To the extent that current polls only reflect current outcomes, snapshots make an inference about the present state of the race, and can thus be thought of as providing the likely range of EV outcomes if the election were literally held today. On the other hand, forecasts attempt to call the EV outcome on Election Day, so additional assumptions about how the polls are going to move between now and Nov 6 are required, which introduces an extra component to the uncertainty measure (this is why Votamatic is not tracked by this site, being a pure forecast model). The added uncertainty tends to ‘even out’ the probability distributions of forecast models, which makes outcomes away from the center relatively more likely but also stabilizes the center of the distribution (thus stabilizing the forecast). However, the closer we get to election day, the smaller this uncertainty element becomes, until the forecast model essentially converges to a snapshot on election day.
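The snapshot/forecast distinction can be illustrated with a toy calculation. The sketch below assumes a normally distributed margin and treats poll movement between now and election day as a random walk, so the extra variance grows linearly with the days remaining. Both are my simplifying assumptions, and `daily_drift_sd` is a hypothetical parameter, not something any of the models publishes.

```python
from math import sqrt
from statistics import NormalDist


def win_probability(margin, snapshot_sd, days_left=0, daily_drift_sd=0.0):
    """Probability that the final margin is above zero.

    margin:         current poll-based lead, in percentage points
    snapshot_sd:    uncertainty about where the race stands today (must be > 0)
    days_left:      days until the election (0 gives a pure snapshot)
    daily_drift_sd: ASSUMED standard deviation of day-to-day poll movement
    """
    # Independent sources of uncertainty add in variance, not in sd.
    total_sd = sqrt(snapshot_sd ** 2 + days_left * daily_drift_sd ** 2)
    return 1.0 - NormalDist(mu=margin, sigma=total_sd).cdf(0.0)


snapshot = win_probability(2.0, 1.0)                                    # election held today
forecast = win_probability(2.0, 1.0, days_left=60, daily_drift_sd=0.3)  # 60 days out
on_the_day = win_probability(2.0, 1.0, days_left=0, daily_drift_sd=0.3)
```

With the same lead, the forecast is less confident than the snapshot, and the two coincide once `days_left` reaches zero, which is the convergence described above.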

2,3. Poll aggregate measure and polling adjustments
This step determines the foundation of the snapshot call for a given state. Given a multitude of state polls up to a particular day, what kind of measure appropriately estimates the candidate share of the vote (and thus its surrounding error margins) for that day? The simplest philosophy is that of RealClearPolitics, which takes the crude mean of the polls. FiveThirtyEight (538) and HuffPost Pollster (HuffPo) use adjusted means as they believe that systematic pollster biases such as house effects will skew the aggregate on days when certain polling firms happen to dominate. Polls are also weighted according to sample size and/or pollster quality. Princeton Election Consortium (PEC) tackles this problem using the median of polls, which is robust to outliers and performs well with a large volume of polls.

*While Votamatic does not have a snapshot aggregate measure as it starts with a predicted baseline outcome (the forecast ‘prior’) and only then incorporates state poll information, the methodology is consistent with treating all polls equally i.e. a simple mean. Drew Linzer has done some excellent validation to demonstrate that the polls can indeed be assumed to be simple random samples of the true candidate vote share i.e. reflecting the same mean with most deviations coming from sampling error and not systematic error.
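The three aggregate measures discussed in this section can be compared side by side. The sketch below is illustrative only: the polls, the pollster names and the house-effect value are hypothetical, and weighting by sample size alone is my simplification of whatever weighting each model really applies.

```python
from statistics import mean, median


def simple_mean(polls):
    """Crude mean of poll margins, every poll counted equally."""
    return mean(p["margin"] for p in polls)


def poll_median(polls):
    """Median of poll margins, insensitive to a single extreme poll."""
    return median(p["margin"] for p in polls)


def weighted_adjusted_mean(polls, house_effects):
    """Sample-size-weighted mean after subtracting each pollster's house effect.

    house_effects maps pollster name -> ASSUMED systematic lean in points;
    pollsters not listed are treated as unbiased.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for p in polls:
        adjusted = p["margin"] - house_effects.get(p["pollster"], 0.0)
        weighted_sum += p["n"] * adjusted
        total_weight += p["n"]
    return weighted_sum / total_weight


# Hypothetical polls for one state (margin = candidate lead in points)
polls = [
    {"pollster": "A", "margin": 2.0, "n": 1000},
    {"pollster": "B", "margin": 3.0, "n": 500},
    {"pollster": "C", "margin": 4.0, "n": 500},
    {"pollster": "D", "margin": 12.0, "n": 500},  # an outlying poll
]
hypothetical_house_effects = {"D": 8.0}
```

On this made-up data the crude mean is pulled up to 5.25 by the outlying poll, the median sits at 3.5, and the adjusted, weighted mean comes out at 3.0. If polls really do behave like simple random samples around the same true value, as the footnote above suggests, the three measures should agree closely once enough polls are available.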

4. Trendline (interstate correlation) model
One important aspect of predicting the Electoral Vote outcome by aggregating fifty different state calls is that the quality of information is not consistent across all states on any given day. Some states never get polled, other states are polled less frequently, while on particular days a state may be dominated by polling outfits with a systematic bias. Calling each state independently thus assumes that the error margins we construct for one state are as credible as those for any other, which may not be true. A trendline element attempts to correct for this inconsistency by borrowing information from other states when updating the call for a particular state. 538 and HuffPo use a nearest-neighbour model to determine which groups of states can best fill in the gaps for one another on sparse polling days. In addition, 538, HuffPo and Votamatic include a national trend that impacts all fifty states, with 538 and HuffPo using national polls and Votamatic using a unified trend estimate gleaned simultaneously from all available state polls.
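As a rough illustration of what borrowing information can look like, the sketch below carries a sparsely polled state forward using the change in a national trend, then blends that with whatever fresh state polls exist. The uniform-shift assumption and the `prior_weight` blending rule are hypothetical choices of mine, not the procedure used by 538, HuffPo or Votamatic.

```python
def trend_adjusted_estimate(last_state_margin, national_then, national_now,
                            recent_state_polls=(), prior_weight=2.0):
    """Update a state's margin using a national trend plus any fresh state polls.

    last_state_margin:  the state's margin when it was last well polled
    national_then/now:  national margin at that time and today
    recent_state_polls: margins from fresh polls in this state (may be empty)
    prior_weight:       ASSUMED number of polls the trend estimate is 'worth'
    """
    # Assume the state moved by the same amount as the national margin.
    trend_estimate = last_state_margin + (national_now - national_then)
    n = len(recent_state_polls)
    if n == 0:
        return trend_estimate  # nothing local to go on, rely on the trend
    poll_average = sum(recent_state_polls) / n
    w = n / (n + prior_weight)  # more fresh polls -> more weight on them
    return w * poll_average + (1.0 - w) * trend_estimate


no_polls = trend_adjusted_estimate(3.0, national_then=2.0, national_now=1.0)
two_polls = trend_adjusted_estimate(3.0, 2.0, 1.0, recent_state_polls=[4.0, 4.0])
```

The point of the design is only that a state with no fresh polls inherits the national movement in full, while a heavily polled state is left largely to speak for itself.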

5. Baseline (prior) model
This element is perhaps the strongest modelling assumption that sets each model apart from the others. Implicit in a forecasting philosophy is not only how the polls are going to move between now and election day but where. It is easy to see that for a model that does not include a forecast component (HuffPo), the best guess for the state of the race on any day is simply to take the previous day’s estimate. However, the other three models have a definite idea of where the candidate vote share should roughly end up on any given day, and thus by extension on election day, even in the absence of polls. 538 thinks that nationally, economic fundamentals determine the overall contours of the race (which therefore shift with changing economic conditions), while in individual states demographic variables (including partisan voting history) are influential. PEC says that come election day, the range of outcomes should not veer far off from that described by the average and standard deviation of the meta-margin (Obama’s Electoral College lead/deficit translated into a grand poll lead/deficit). This is a form of ‘empirical’ prior and fits in with the idea that voting preferences are generally stable. Finally, Votamatic anchors each state with the candidate vote share predicted by the well-regarded Abramowitz Time-for-Change model, which estimates the magnitude of uniform swing on Obama’s 2008 vote share using incumbency, economic and presidential approval variables.
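One textbook way to see how a baseline and the polls trade off is a precision-weighted (normal-normal) update. The sketch below is a generic illustration under that assumption and is not the actual machinery of 538, PEC or Votamatic. All numbers are hypothetical.

```python
def combine_prior_and_polls(prior_mean, prior_sd, poll_mean, poll_sd, n_polls):
    """Blend a baseline vote share with a poll average.

    ASSUMPTION: both the baseline and each poll are normally distributed
    around the true vote share, and polls are independent of one another.
    Returns (posterior_mean, posterior_sd).
    """
    if n_polls == 0:
        return prior_mean, prior_sd  # no polls: the baseline is all we have
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n_polls / poll_sd ** 2  # grows with the number of polls
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * poll_mean) / post_precision
    return post_mean, post_precision ** -0.5


# Hypothetical baseline of 52% (sd 2) against polls averaging 50% (sd 2 each)
early_mean, early_sd = combine_prior_and_polls(52.0, 2.0, 50.0, 2.0, n_polls=1)
late_mean, late_sd = combine_prior_and_polls(52.0, 2.0, 50.0, 2.0, n_polls=99)
```

With a single poll the estimate sits halfway between baseline and poll (51%), while with 99 polls it is 50.02%, so the baseline has been almost entirely washed out by the data.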

Most of the model differences (factors 2, 3 and 4) have a larger impact on the construction of each model’s snapshot, which means that if these differences are significant this should become more apparent the closer we get to election day. As we can observe from the ModelTracker at the top of the page, this is not the case. Such behaviour is consistent with the idea that the polls do tell the same story** i.e. modeling assumptions affect the outlook less and less as more polls become available, allowing systematic biases to be canceled out and polling gaps ‘filled in’ just by the sheer volume. Where I think the models have truly diverged is in the quality of their forecasts, i.e. in which model got it right first. Here, the baseline model assumed (factor 5) is critical. If it turns out that Obama wins 332 EV for example, and the distribution of outcomes is consistent with Votamatic’s uncertainty calibration (i.e. 3 out of 5 states called for Obama at 60% confidence actually go to him) then Votamatic’s incumbency-heavy baseline is a far more informative prior than 538’s economic variables. It bears mentioning that in all forecasts the contribution of the baseline model steadily diminishes until it is negligible by election day, but it is interesting to examine which set of baseline assumptions is able to approximate the outcome far in advance. This is one thing that is really worth looking out for come election day.
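The calibration idea mentioned above (3 out of 5 states called at 60% confidence should actually go to the favoured candidate) can be checked mechanically once results are in. Here is a minimal sketch. The ten-point binning and the example calls are hypothetical and are not real model output.

```python
def calibration_table(calls):
    """Compare stated confidence with the observed hit rate.

    calls: list of (stated_probability, won) pairs, one per state call,
           where `won` is True if the called candidate carried the state.
    Returns {bin_start_percent: (number_of_calls, observed_hit_rate)},
    with calls grouped into ten-point bins (60 covers 60% to 69%).
    """
    counts = {}
    for prob, won in calls:
        # Work in whole percentages to avoid floating-point bin edges.
        bin_start = int(round(prob * 100)) // 10 * 10
        n, hits = counts.get(bin_start, (0, 0))
        counts[bin_start] = (n + 1, hits + (1 if won else 0))
    return {b: (n, hits / n) for b, (n, hits) in sorted(counts.items())}


# Hypothetical calls: five states at 60% confidence, two states at 90%
example_calls = [(0.6, True), (0.6, True), (0.6, True), (0.6, False),
                 (0.6, False), (0.9, True), (0.9, True)]
table = calibration_table(example_calls)
```

A well-calibrated forecast has hit rates close to the stated confidence in every bin, although with only a handful of states per bin a fair amount of noise is to be expected.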

**What if the consensus of polls is off?
As Simon Jackman has pointed out, same story does not mean the right story. The polls may converge but all be off in the same direction. However, a fair accounting of this possibility is to acknowledge that the polls are just as likely to be underestimating Obama’s level of support as they are overestimating it, as a historical assessment of the polls shows that there is no pattern to the direction of the miss by the consensus (large misses do happen, but in both directions). When Simon does incorporate this additional uncertainty element into his HuffPo calls, Obama’s odds shift from being something like a 9:1 favourite to a 7:3 favourite.

