Predicting November Turnout

Predicting November
I’ve spent the last few months trying to see if there’s a way I could plausibly predict November. Sites like FiveThirtyEight do a plenty good job of national races, but what can we say about state races? Could Democrats win the Pennsylvania House? The PA Senate?

Well, I finally think I’ve got a model that does a plausible job.

Soon, I’ll publish some predictions for the winners. But first, let’s look at turnout.

Trends in PA Turnout

Today, I’m focusing on turnout in even-year general elections (so all Presidential or Gubernatorial races) since 2002. I’m going to use only the two-party vote, taking the total votes for President or Governor as turnout rather than the actual turnout. This ignores third-party voters and people who skipped the topline election altogether. The difference between this and actual turnout won’t be large, and it makes the later predictions easier.
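To make that definition concrete, here is a minimal sketch of the calculation. The row layout, party codes, and vote counts are all made up for illustration; they are not the real data format or real results.

```python
from collections import defaultdict

# Hypothetical rows: (year, office, party, votes). The numbers are invented.
rows = [
    (2014, "Governor", "DEM", 120),
    (2014, "Governor", "REP", 100),
    (2014, "Governor", "GRN", 5),       # third party: dropped
    (2014, "State House", "DEM", 90),   # not the topline race: dropped
    (2016, "President", "DEM", 200),
    (2016, "President", "REP", 180),
    (2016, "President", "LIB", 10),     # third party: dropped
]

TOPLINE = {"President", "Governor"}
MAJOR_PARTIES = {"DEM", "REP"}

def two_party_turnout(rows):
    """Sum Democratic plus Republican votes for the topline office, by year."""
    totals = defaultdict(int)
    for year, office, party, votes in rows:
        if office in TOPLINE and party in MAJOR_PARTIES:
            totals[year] += votes
    return dict(totals)

turnout = two_party_turnout(rows)
print(turnout)
```

Anyone who voted third party, or who skipped the topline race, simply never shows up in these totals.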

Between 3.5M and 4.1M Pennsylvanians voted in the midterms since 2002.

What’s a good guess for turnout this year? 2006 seems like an obvious benchmark. In that year, an incumbent Democratic Governor and Senate candidate Bob Casey, Jr capitalized on a national Democratic surge against an unpopular president. Sounds familiar. In that election, 4.09 million Pennsylvanians voted for governor. The other high was from the other wave election in the period: Republicans’ sweep of 2010.

The Model
I’ve built a model that predicts turnout of every precinct using data from even years from 2002 – 2016. The model uses information on the election (if it’s a midterm, the party in the presidency, whether local races are contested, the incumbency of local races, the presence of female candidates, and district population), and allows for different precinct-level responses to midterm elections, presidential party, and turnout growth or shrinkage over time.

The thing that makes predicting state races so hard is that there aren’t surveys. Without them, it’s really hard to find good proxies for voter excitement and disproportionate interest. Instead, I’ve built the model to simulate the full distribution of types of elections, from very Democratic to very Republican, and then give the entire range of possible results. We can then use that to either (a) examine the full range of possible outcomes, or (b) plug in specific values and see the results, for example “what if the election looked like 2006?”

To achieve this, I’ve modeled the correlations in turnout among precincts, to identify groups of precincts that all turn out together. Some precincts all come out disproportionately in midterms, others come out only in Democratic wave years. It’s this factor that is the biggest unknown moving into November: what type of election it will be. These correlations create a lot of uncertainty: you can’t rely on the Law of Large Numbers to cancel out all of the districts’ idiosyncrasies.
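A toy simulation shows why correlated precincts blow up the uncertainty. This is only a sketch, not the actual model: the precinct count, baseline turnout, and shock sizes are all assumptions chosen for illustration. Each simulated election adds a statewide shock (the "type of election") on top of independent precinct-level noise.

```python
import random
import statistics

random.seed(0)

N_PRECINCTS = 500     # assumed, for illustration
BASE_TURNOUT = 1000   # hypothetical votes per precinct
N_SIMS = 1000

def simulate_total(shared_sd, local_sd):
    """One simulated election: a statewide shock plus precinct-level noise."""
    shared = random.gauss(0, shared_sd)  # same shock for every precinct
    total = 0.0
    for _ in range(N_PRECINCTS):
        local = random.gauss(0, local_sd)  # precinct idiosyncrasy
        total += BASE_TURNOUT * (1 + shared + local)
    return total

# Same precinct-level noise in both cases; only the shared shock differs.
independent = [simulate_total(0.0, 0.1) for _ in range(N_SIMS)]
correlated = [simulate_total(0.1, 0.1) for _ in range(N_SIMS)]

sd_independent = statistics.stdev(independent)
sd_correlated = statistics.stdev(correlated)
print(f"sd of statewide total, independent noise only: {sd_independent:,.0f}")
print(f"sd of statewide total, with a shared shock:    {sd_correlated:,.0f}")
```

With independent noise, the precinct errors mostly cancel in the statewide sum. The shared shock moves every precinct in the same direction, so it does not average away no matter how many precincts there are.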

So, does the model work?

Testing the Model: 2016

To test the model, let’s pretend it’s September 2016. Using only data from 2002-2014, I fit the model and then generate predictions for 2016 turnout.

In 2016, I would have estimated 5.68 million votes cast statewide, with a 95% credible interval of 5.16M – 6.29M (the uncertainty is huge, but listen, science is hard, and I’m a serious person). In reality, 6.01 million votes were cast for President. I undershot it by a little bit, but the result is well within the interval.
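For the record, here is the arithmetic behind "a little bit", using only the figures quoted above (the variable names are mine):

```python
# Figures quoted in the text for the 2016 holdout test.
predicted = 5.68e6
ci_low, ci_high = 5.16e6, 6.29e6
actual = 6.01e6

miss = actual - predicted               # how many votes the prediction fell short by
pct_miss = miss / actual                # shortfall as a share of the actual total
inside = ci_low <= actual <= ci_high    # did the interval cover the result?

print(f"Undershot by {miss:,.0f} votes ({pct_miss:.1%} of actual)")
print(f"Actual inside 95% interval: {inside}")
```

That is a miss of about 330,000 votes, or roughly 5.5% of the actual total.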

Capturing relative turnout is arguably more important for final results than overall turnout. Which places voted more than usual, and which less? Let’s compare the model’s predictions for Vote in 2016 / Vote in 2012 with the actual values.
I did less well on that. Above is a plot of the observed turnout growth in each geography (measured as turnout in 2016 divided by turnout in 2014) versus what the model would have predicted. A perfect prediction would have all of the points on the 45-degree line.

There may be some correlation between my predicted growths and the observed results, but it’s weak. It turns out that the growth depends heavily on the partisanship of the election: the correlation factors that I discussed above. Since I don’t know what those are ahead of time, I have to simulate them from all of the possibilities, resulting in the elliptical blob above. The model easily identifies these factors retrospectively–I can say for example that 2006 was a very strong Democratic year–but I don’t in general have a way to predict them for an upcoming election.
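The comparison in that plot boils down to two growth ratios per geography and the correlation between them. A minimal sketch follows, with invented vote counts for four hypothetical precincts; the real comparison runs over every geography and uses the model’s simulated predictions.

```python
from math import sqrt

# Invented numbers for four hypothetical precincts.
baseline = [400, 250, 600, 320]    # votes in the earlier comparison year
observed = [520, 300, 690, 448]    # actual votes in the test year
predicted = [500, 310, 720, 400]   # what a model might have predicted

# Growth is the later turnout divided by the baseline turnout.
observed_growth = [o / b for o, b in zip(observed, baseline)]
predicted_growth = [p / b for p, b in zip(predicted, baseline)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(predicted_growth, observed_growth)
print(f"Correlation between predicted and observed growth: {r:.2f}")
```

Points on the 45-degree line would mean predicted growth equals observed growth in every precinct. A correlation near zero is what a shapeless blob looks like.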

The Predictions
Enough delay. What do I predict for turnout in 2018?

There will be 4,295,981 votes for Governor.

This strikes me as high. It’s higher turnout than any midterm in my dataset. But the model did relatively well in the holdout test of 2016, and I don’t want to commit the sin of post-hoc adjusting. So this is my prediction, and I’m sticking to it.

What are the arguments for this astronomical number? You, a person who somehow reads this blog and thus are well down the elections analysis rabbit-hole, might have noticed unprecedented excitement for a midterm, and be unsurprised by a high prediction. But the model doesn’t have that info. Instead, it does see that (a) a Republican is president, which increases midterm turnout in Democratic precincts more than a Democratic president increases it in Republican ones, (b) many more races are contested, including in the newly-redrawn congressional districts and a ton of contested state house seats, and (c) after all of the adjustments, turnout has been steadily increasing since 2002. All of these combine to create a prediction for midterm turnout that is unprecedented in the dataset. And some of those features, particularly the contested races, are probably serving as proxies for voter enthusiasm.

There’s a lot of uncertainty in the prediction because, again, science is hard. The 95% credible interval is 3.85M to 4.72M. That interval would include the turnout of the last two wave midterm elections—4.09M in 2006 and 4.00M in 2010—and exclude the lower-turnout years of 2002 and 2014.

Within Philadelphia, I project 460,000 voters, with a 95% CI of (410,000, 517,000). Even at the lower end, that would beat out the 2006 and 2010 turnout highs.

What does the model have to say about precinct-specific changes? Below is a plot of its predictions in Philadelphia, relative to turnout in 2014. Keep in mind that these predictions are equivalent to the blob plot above: there’s a loose predictive power, but a ton of noise based on what type of election this ends up being.

I predict particularly high turnout in Center City East and the River Wards, upwards of 60% growth over 2014. That one bright yellow precinct in the River Wards is because of population changes that have seen increasing midterm turnout, and a competitive State House election in a neighborhood that hasn’t seen one for years. West Philly, North Philly, up to West Oak Lane, will likely turn out similarly to 2014, given their largely uncontested races.

Coming Soon
So I tentatively expect record turnout, at least among elections since 2002. Will it happen? I’ve over-predicted turnout before. We’ll see if I learned my lesson.

Until that test comes, let’s brazenly barrel forward and predict the actual results. Coming soon.

Sources
Data comes, as always, from the amazing Open Elections Project.
I also leaned heavily on Ballotpedia to complement and extend the data.
GIS data is from the US Census.

District Profile: Chester County’s PA-155

Welcome to the Sixty-Six Wards District Profiles!

Democrats need to pick up 20 seats to tie up the PA House. There are 19 districts currently held by Republicans that voted for Clinton in 2016. All of them are in the Philadelphia region. I’ve profiled Delco’s District 168, Philadelphia’s Districts 170 and 177, and Bucks’ District 178.

Today, let’s look at Chester County, which showed some of the region’s strongest anti-Trump shifts. If Democrats want to have any chance to win the House, they need to win districts like these, which have strong Republican incumbents but voted for Clinton in 2016. District 155 exemplifies these important districts: Republican incumbent Becky Corbin won by 16 points in 2016 even as Clinton won the district by 8. She’s being challenged by Danielle Friel Otten, running from the left with endorsements by Emily’s List, Planned Parenthood, and the SEIU.

District 155

District 155 gerrymanders through Chester, combining slices of Democratic Phoenixville and Spring City in the East with vast swaths of lower-density Republican lands.

Corbin was first elected in 2012, and won reelection by 21 and 16 points in 2014 and 2016, respectively. In 2016, voters gave her a landslide victory even while voting strongly for Clinton, providing a 24 point swing that was typical of districts in Chesco.
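The 24-point swing is just the gap between the two Republican margins in 2016. A quick sketch of the arithmetic, with both margins measured from the Republican side (the function name is mine):

```python
def swing(down_ballot_margin, top_ballot_margin):
    """Gap in points between two margins measured for the same party."""
    return down_ballot_margin - top_ballot_margin

corbin_margin = 16   # Corbin (R) won the State House race by 16 points
trump_margin = -8    # Clinton won by 8, so the Republican margin is -8

print(swing(corbin_margin, trump_margin))
```

The same function works at the precinct level: feed it each precinct’s Corbin and Trump margins to see which ones split their tickets the most.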

[Interactive State House Map] [Interactive Presidential Map]

All but one precinct voted more for Corbin than for Trump, with 18 of the 30 precincts showing an over-20 point difference between the Republicans.

[Interactive Plot]

While Phoenixville and Spring City are denser than the rest of the area, the sweeping Republican stretches in Uwchlan (above Downingtown and Exton) still dominate the vote, by virtue of their huge landmasses. Friel Otten lives in Uwchlan, which may prove important in swaying this traditionally Republican stronghold.

[Interactive Map]

The district is basically entirely White, with the most diverse Census Tract being in the heart of Phoenixville, which is still 71% non-Hispanic White, and 19% Hispanic (and only a third of that tract is even in the district). No other tract is less than 79% White.

[Interactive Map]

Lining up the precinct votes in order of 2016 vote shows the limit of the Democratic precincts’ impact, and the steep hill for Friel Otten. She needs many of the broadly Republican precincts to shift–like they did for Clinton–and can’t just rely on the local urban cores.

[Interactive Plot]

Unlike the other districts I’ve looked at, this one shows almost no difference in relative turnouts between 2014 and 2016: all precincts increased their votes by 75%, regardless of party. In every other district, Democrats turned out disproportionately more in 2016. In those cases, I thought it boded well for Democrats: if an engaged Democratic party looked more like 2016 than 2014, that would help them. Here, that looks less likely: the huge engagement differences between 2014 and 2016 didn’t yield any Democratic edge, so it’s hard to imagine them getting a turnout edge here. Any progress will have to come from changing the minds of voters.

[Interactive Plot]

This district is a stretch for Democrats, but it is the type of district–with a strong Republican incumbent but steep anti-Trump sentiment–where they will have to do well to take the state house in November.

Sources:
Election data from the Open Elections Project
Population data from the 2016 American Community Survey 5-year estimates.
Boundaries and GIS data from election-geodata
Base maps provided by maps.stamen.com/