The students in David Stein's Political Statistics class at Montgomery Blair High School in Silver Spring, Maryland, have built a statistical model for predicting the outcomes of the upcoming midterm elections: the model makes assumptions about voter turnout and the way that polling data will translate into votes in 2018.

More interesting than the choices that the class made for its model is the thorough and thoroughly accessible documentation of those choices: the class's explanatory notes on its statistical assumptions are the best I've seen -- the kind of thing that we usually only get after the fact, when trusted predictions fall short of the mark and pollsters set out to explain how they got it so wrong.

Getting this kind of information with the predictions themselves, ahead of time, gives us a way to understand conflicting predictions and decide whether, and which ones, to trust.

At this point, we have a predicted vote share and a standard deviation for that vote share in each district, so it might seem that calculating the overall chance that a party wins a majority of seats is rather simple. However, this is not the case.

To avoid heavy computation, and the need to calculate an exact number, we simulate the entire House election 10,000,000 times each time we run our model. The probability that each party wins a majority of seats in the House is then approximately the number of simulations in which this happens divided by 10,000,000.
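To make the counting step concrete, here is a minimal Python sketch, assuming hypothetical per-district inputs: `means` is the party's predicted two-party vote share in each district and `sds` its standard deviation, with a district won above 50%. It draws every district independently, which is exactly the naïve simplification the class rejects, and it runs far fewer than 10,000,000 simulations so that it stays fast in pure Python. The function name and inputs are illustrative, not the class's code.

```python
import random


def simulate_majority_probability(means, sds, n_sims=10_000, seed=0):
    """Estimate P(party wins a majority) as (simulations with a majority) / n_sims.

    means, sds: hypothetical per-district predicted vote shares (0-1) and their
    standard deviations. Each district is drawn independently here.
    """
    rng = random.Random(seed)            # seeded so runs are reproducible
    majority = len(means) // 2 + 1       # e.g. 218 of 435 House seats
    majorities = 0
    for _ in range(n_sims):
        # Count districts where the simulated vote share exceeds 50%.
        seats = sum(1 for mu, sd in zip(means, sds) if rng.gauss(mu, sd) > 0.5)
        if seats >= majority:
            majorities += 1
    return majorities / n_sims


if __name__ == "__main__":
    # Hypothetical inputs: 435 districts, each leaning slightly toward the party.
    means = [0.51] * 435
    sds = [0.05] * 435
    print(simulate_majority_probability(means, sds, n_sims=2_000))
```

The estimate's sampling error shrinks roughly with the square root of the number of simulations, which is why a production model would run many more than this sketch does.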

A naïve approach here would be to simulate each district separately, using the numbers we already have. However, this approach implicitly assumes that the district vote shares are all independent, which is certainly not the case. For one, the systematic bias of polls is often driven by the same factors from district to district. Also, since our model is not perfect, it may be consistently off in one direction for most districts. Thus, our simulation must introduce some correlation between districts.
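One common way to introduce that correlation is a one-factor model: add a single national error, drawn once per simulation and shared by every district, on top of an independent district-level error. The sketch below assumes that construction (this excerpt does not say exactly how the class correlates districts) and uses a hypothetical `national_sd` parameter. The district-level SD is shrunk so that each district's total SD still matches its input, which gives any two districts i and j a correlation of national_sd² / (sd_i · sd_j).

```python
import math
import random


def simulate_correlated(means, sds, national_sd, n_sims=10_000, seed=0):
    """Estimate P(majority) with a shared national error (one-factor model).

    Simulated share = mean + national error (same draw for every district in a
    simulation) + district-specific error. The district error's SD is chosen so
    that each district's total SD still equals sds[i].
    """
    if any(national_sd > sd for sd in sds):
        raise ValueError("national_sd must not exceed any district sd")
    # Var(total) = national_sd**2 + local_sd**2 = sd**2
    local_sds = [math.sqrt(sd * sd - national_sd * national_sd) for sd in sds]
    rng = random.Random(seed)
    majority = len(means) // 2 + 1
    wins = 0
    for _ in range(n_sims):
        national = rng.gauss(0.0, national_sd)  # shared by all districts this run
        seats = sum(
            1
            for mu, local_sd in zip(means, local_sds)
            if mu + national + rng.gauss(0.0, local_sd) > 0.5
        )
        if seats >= majority:
            wins += 1
    return wins / n_sims


if __name__ == "__main__":
    means = [0.51] * 101
    sds = [0.05] * 101
    print("independent:", simulate_correlated(means, sds, national_sd=0.0, n_sims=2_000))
    print("correlated: ", simulate_correlated(means, sds, national_sd=0.045, n_sims=2_000))
```

Setting `national_sd` to 0 recovers independent districts. Raising it usually makes a lopsided majority probability less extreme, because a single polling miss now moves every race in the same direction at once.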

