Covid-19 and PA’s first-ever mail-in ballot election.

With Coronavirus upon us, Philadelphians are re-evaluating how we might vote in the June 2nd Primary. Coincidentally, this is also the first year that Pennsylvanians can request mail-in ballots even if they won’t be absentee. This combination has led to a huge amount of requests, and big questions about what this might mean for turnout and the results.

The friendly folks at the Commissioners’ Office have sent me all of Philadelphia’s requests through Thursday May 7th. Let’s dig in.

An astounding 91,587 voters have requested ballots. Philadelphia has 913,000 active registered voters. In the 2019 Primary, 217,000 people voted. For a brand new program, that number is a huge percent.

Who is requesting ballots?

As with everything political in this city, that pattern wasn’t equal across the city.

View code

library(tidyr)
library(dplyr)
library(readr)
library(ggplot2)
library(sf)
library(magrittr)
library(lubridate)

setwd("C:/Users/Jonathan Tannen/Dropbox/sixty_six/posts/mail_in/")
source("../../admin_scripts/util.R")

fve <- readRDS("../../data/processed_data/voter_db/fve.Rds")
v2e <- readRDS("../../data/processed_data/voter_db/voters_to_elections.Rds")
elections <- readRDS("../../data/processed_data/voter_db/elections.Rds")

mail_in <- read_csv("../../data/voter_registration/mail_in/Absentee_Mail-in Ballot Listing (5-7-20).csv")

mail_in %<>% filter(!duplicated(`ID Number`))
mail_in %<>% left_join(fve, by=c("ID Number" = "voter_id"))
# table(mail_in$voter_status, useNA = "always")

divs <- st_read("../../data/gis/warddivs/201911/Political_Divisions.shp")divs %<>% mutate(warddiv = pretty_div(DIVISION_N))

mail_in_div <- mail_in %>% 
  group_by(warddiv) %>%
  summarise(n_mail = n())

fve_div <- fve %>%
  filter(voter_status == "A") %>%
  group_by(warddiv) %>%
  summarise(n_reg = n())

df_major <- readRDS("../../data/processed_data/df_major_20191203.Rds")
df_major %<>% mutate(warddiv = pretty_div(warddiv))

turnout <- df_major %>%
  filter(is_topline_office) %>%
  group_by(warddiv, year, election) %>%
  summarise(turnout = sum(votes))

turnout_wide <- turnout %>%
  unite("key", year, election) %>%
  spread(key, turnout)

divs %<>% left_join(mail_in_div) %>% mutate(n_mail = ifelse(is.na(n_mail), 0, n_mail))
divs %<>% left_join(fve_div)
divs %<>% left_join(turnout_wide)

These requestors are much more likely to have voted than a typical Philadelphian. A whopping 53% of them voted in the 2019 Primary (overall turnout among active voters was 24%). Some 52% of them voted in at least three of the last four elections, and 44% in six of the last eight (the number is presumably lower because of people who didn’t live in the state eight years ago).

View code

mail_in_v2e <- v2e %>%
  mutate(mail_in_reqd = voter_id %in% mail_in$`ID Number`) %>%
  inner_join(
    elections %>% filter(is_general | is_primary, year >= 2016) %>%
      select(election_date, year, fve_election_id)
  ) %>%
  group_by(voter_id, mail_in_reqd) %>%
  summarise(
    voted_2019 = !is.na(vote_method[election_date == "2019-05-21"]),
    voted_l4 = sum(!is.na(vote_method[year >= 2018])),
    voted_l8 = sum(!is.na(vote_method)),
    cnt = n()
  ) %>%
  group_by(mail_in_reqd) %>%
  summarise(
    voted_2019 = sum(voted_2019),
    voted_l4 = sum(voted_l4 >= 3),
    voted_l8 = sum(voted_l8 >= 6),
    cnt = n()
  )

mail_in_v2e <- bind_rows(
  mail_in_v2e %>% mutate(mail_in_reqd = ifelse(mail_in_reqd, "Requested Mail-In", "No Request")),
  mail_in_v2e %>% mutate(mail_in_reqd = "All") %>% group_by(mail_in_reqd) %>% summarise_all(sum)
)

ggplot(
  mail_in_v2e %>% filter(mail_in_reqd %in% c("Requested Mail-In", "All")) %>%
    mutate(mail_in_reqd = forcats::fct_inorder(mail_in_reqd)) %>%
    gather("key", "value", voted_2019, voted_l4, voted_l8) %>%
    mutate(
      key = case_when(
        key == "voted_2019" ~ "Voted 2019 Primary",
        key == "voted_l4" ~ "Voted in 3 of last 4",
        key == "voted_l8" ~ "Voted in 6 of last 8"
      )
    ),
  aes(x=key, y=100*value / cnt, fill=mail_in_reqd)
) + 
  geom_bar(stat="identity", position="dodge") +
  theme_sixtysix() +
  scale_fill_manual(values = c(strong_blue, grey(0.70))) +
  labs(
    x = NULL,
    y = "Percent",
    fill = NULL,
    title = "Requestors are much more likely to have voted.",
    subtitle = "Percents among voters who requested mail-in ballots"
  )

By Age

The mail-in requests are coming disproportionately from younger voters. The cohort aged 30-40 made up only 15% of votes in the 2019 Primary, but over 23% of the requests so far.

View code

age_cuts <- c(18, seq(10, 120, 10))
fve_age <- fve %>%
  filter(voter_status == "A") %>%
  mutate(
    age = (lubridate::ymd("2020-06-02") - lubridate::ymd(dob)) / dyears(1),
    age_cat = cut(age, age_cuts)
  ) %>%
  group_by(age_cat) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(prop = n / sum(n))

fve_age_2019 <- fve %>%
  inner_join(
    v2e %>% 
      inner_join(elections %>% filter(election_date == "2019-05-21")) %>%
      filter(!is.na(vote_method))
  ) %>%
  mutate(
    age = (lubridate::ymd("2020-06-02") - lubridate::ymd(dob)) / dyears(1),
    age_cat = cut(age, age_cuts)
  ) %>%
  group_by(age_cat) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(prop = n / sum(n))

mail_in_age <- mail_in %>%
  mutate(
    age = (lubridate::ymd("2020-06-02") - lubridate::ymd(dob)) / dyears(1),
    age_cat = cut(age, age_cuts)
  ) %>%
  group_by(age_cat) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(prop = n / sum(n))

age_cats <- bind_rows(
  mail_in_age %>% mutate(source = "Mail-In Requests"),
  fve_age_2019 %>% mutate(source = "2019 Primary Voters"),
  fve_age %>% mutate(source = "Active Registered Voters")
) %>%
  mutate(source = forcats::fct_inorder(source))


ggplot(
  age_cats %>% 
    filter(!is.na(age_cat), !age_cat %in% c("(0,10]","(100,110]","(110,120]")) %>%
    mutate(
      age_cat = droplevels(age_cat)
    ),
  aes(x = age_cat, y = 100 * prop, fill=source)
) + geom_bar(
  stat="identity",
  position="dodge"
) +
  theme_sixtysix() +
  labs(
    y="Percent of Total",
    x = "Age",
    fill = ""
  ) +
  scale_fill_manual(values = c(strong_blue, grey(0.70), grey(0.9))) +
  ggtitle("Voters under 40 are disproportionately requesting ballots")

By Neighborhood

Geographically, the patterns are even starker. More than 13,000 of the requests came from the two Center City wards–5 and 8–alone.

[See the Appendix for Division-level maps.]

View code

library(leaflet)

make_leaflet <- function(
  sf, 
  color_col, 
  title,
  is_percent=FALSE,
  pal="viridis"
){
  zoom <- 11
  
  min_value <- min(sf[[color_col]], na.rm=TRUE)
  max_value <- max(sf[[color_col]], na.rm=TRUE)
  
  sigfig <- round(log10(max_value-min_value)) - 1
  step_size <- 2 * 10^(sigfig)
  
  min_value <- step_size * (min_value %/% step_size)
  max_value <- step_size * (max_value %/% step_size + 1)
  
  color_numeric <- colorNumeric(
    pal, 
    domain=c(min_value, max_value)
  )
  
  legend_values <- seq(min_value, max_value, step_size)
  legend_colors <- color_numeric(legend_values)
  if(is_percent){
    legend_labels <- sprintf("%s%%", legend_values)
  } else{
    legend_labels <- scales::comma(legend_values)
  }
  
  bbox <- st_bbox(sf)
  
  leaflet(
    options=leafletOptions(
      minZoom=zoom, 
      # maxZoom=zoom,
      zoomControl=TRUE
      # dragging=FALSE
    )
  )%>% 
    addProviderTiles(providers$Stamen.TonerLite) %>%
    addPolygons(
      data=sf$geometry,
      weight=0, 
      color="white", 
      opacity=1, 
      fillOpacity = 0.8, 
      smoothFactor = 0,
      fillColor = color_numeric(sf[[color_col]]),
      popup=sf$popup
    ) %>%
    setMaxBounds(
      lng1=bbox["xmin"],
      lng2=bbox["xmax"],
      lat1=bbox["ymin"],
      lat2=bbox["ymax"]
    ) %>%
    addControl(title, position="topright", layerId="map_title") %>% 
    addLegend(
      layerId="geom_legend",
      position="bottomright",
      colors=legend_colors,
      labels=legend_labels
    )
}

wards <- divs %>% 
  mutate(ward = substr(warddiv, 1, 2)) %>%
  group_by(ward) %>%
  summarise_at(
    .funs=list(sum), 
    .vars=vars(n_mail, n_reg, `2002_general`:`2019_primary`)
  )

wards %<>% 
  mutate(
    popup = sprintf(
      paste(
        c(
          "<b>Ward %s</b>",
          "Active Registered Voters: %s",
          "Mail-In Requests: %s",
          "Votes in the 2019 Primary: %s",
          "Votes in the 2016 Primary: %s"
        ),
        collapse = "<br>"
      ),
      ward,
      scales::comma(n_reg),
      scales::comma(n_mail),
      scales::comma(round(`2019_primary`)),
      scales::comma(round(`2016_primary`))
    )
  )

wards$color_mail <- wards$n_mail
wards_mail <- make_leaflet(
  wards,
  "color_mail",
  "Number of Mail-In Requests"
)
library(widgetframe)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE) 
frameWidget(wards_mail)

As a percent of all voters, Center City, the wealthy ring around it, and Mount Airy/Chestnut Hill stand out. These Wards make up what I’ve dubbed the Wealthy Progressives.

View code

wards$color_mail_vs_reg <- 100 * wards$n_mail / wards$n_reg
mail_reg <- make_leaflet(
  wards,
  "color_mail_vs_reg",
  "Mail-In Requests as % of Active Registered Voters",
  is_percent=TRUE
)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE)
frameWidget(mail_reg)

View code

wards$color_mail_vs_2019 <- 100 * wards$n_mail / wards$`2019_primary`
mail_2019 <- make_leaflet(
  wards,
  "color_mail_vs_2019",
  "Mail-in requests as % of 2019 Primary voters",
  is_percent=TRUE
)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE)
frameWidget(mail_2019)

In fact, aggregating by those Division Cohorts paints a stark picture.

View code

div_cats <- readRDS("../../data/processed_data/div_cats_20200411.RDS")
divs %<>% left_join(div_cats %>% select(warddiv, cat))
cat_colors <- c(
  "Black Voters" = light_blue, 
  "Wealthy Progressives" = light_red, 
  "White Moderates" = light_orange, 
  "Hispanic North Philly" = light_green
)

div_cats <- divs %>% 
  as.data.frame() %>%
  select(-geometry) %>%
  group_by(cat) %>%
  summarise_at(
    .funs=list(sum), 
    .vars=vars(n_mail, n_reg, `2002_general`:`2019_primary`)
  ) %>%
  ungroup() 

ggplot(
  div_cats,
  aes(x = cat, y = n_reg, fill=cat)
) + geom_bar(
  stat="identity",
  position="dodge"
) +
  theme_sixtysix() + # %+replace% theme(axis.text.x = element_blank())+
  scale_y_continuous(labels=scales::comma) +
  labs(
    y=NULL,
    x = NULL,
    fill = "",
    title="Black Voter Divisions represent nearly half of Active Voters",
    subtitle="Active Registered Voters"
  ) +
  scale_fill_manual(values = cat_colors, guide=FALSE)

View code

ggplot(
  div_cats %>%
    mutate(prop_mail = n_mail / n_reg),
  aes(x = cat, y = 100 * prop_mail, fill=cat)
) + geom_bar(
  stat="identity",
  position="dodge"
) +
  theme_sixtysix() %+replace% theme(title = element_text(size = 10)) +
  labs(
    y=NULL,
    x = NULL,
    fill = "",
    title="But Wealthy Progressive Divisions have the most mail-in requests",
    subtitle="Mail-in requests as a percent of active registered voters"
  ) +
  scale_fill_manual(values = cat_colors, guide=FALSE)

View code

ggplot(
  div_cats %>%
    mutate(prop_2019 = n_mail / `2019_primary`),
  aes(x = cat, y = 100 * prop_2019, fill=cat)
) + geom_bar(
  stat="identity",
  position="dodge"
) +
  theme_sixtysix() + # %+replace% theme(axis.text.x = element_blank())+
  labs(
    y=NULL,
    x = NULL,
    fill = "",
    title="Mail-in requests as percent of 2019 Primary turnout",
    subtitle="Requests from Wealthy Progressive Divisions are nearly 80% of their 2019 turnout."
  ) +
  scale_fill_manual(values = cat_colors, guide=FALSE)

What this means for the election

Since we’ve never had mail-in voting before and it’s been over 100 years since the last pandemic, it’s impossible to know what this means for total turnout in June.

Two things are possible:

These mail-in percents represent the actual turnout. Voters decide not to vote in person, and Wealthy Progressives vastly over-represent the electorate.
Voters that aren’t requesting mail-in ballots are just planning to vote in person, and the gap totally closes on Election Day. The city-wide turnout looks normal.

Probably, the outcome will be somewhere in between. What does that mean for PA-1 and other local races? Coming soon!

Appendix: Division Maps

View code

divs %<>% 
  mutate(
    popup = sprintf(
      paste(
        c(
          "<b>Division %s</b>",
          "Active registered voters: %s",
          "Mail-in requests: %s",
          "Votes in the 2019 Primary: %s",
          "Votes in the 2016 Primary: %s"
        ),
        collapse = "<br>"
      ),
      warddiv,
      scales::comma(n_reg),
      scales::comma(n_mail),
      scales::comma(round(`2019_primary`)),
      scales::comma(round(`2016_primary`))
    )
  )


divs$color_mail <- divs$n_mail
div_color_mail <- make_leaflet(
  divs,
  "color_mail",
  "Number of mail-in requests"
)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE)
frameWidget(div_color_mail)

View code

divs$color_mail_vs_reg <- 100 * divs$n_mail / divs$n_reg
l0 <- make_leaflet(
  divs,
  "color_mail_vs_reg",
  "Mail-in requests as % of active registered voters",
  is_percent=TRUE
)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE)
frameWidget(l0)

View code

divs$color_mail_vs_2019 <- 100 * divs$n_mail / divs$`2019_primary`
l1 <- make_leaflet(
  divs,
  "color_mail_vs_2019",
  "Mail-in requests as % of 2019 Primary voters",
  is_percent=TRUE
)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE)
frameWidget(l1)

View code

divs$color_mail_vs_2016 <- 100 * divs$n_mail / divs$`2016_primary`
l2 <- make_leaflet(
  divs,
  "color_mail_vs_2016",
  "Mail-in requests as % of 2016 Primary voters",
  is_percent=TRUE
)
knitr::opts_chunk$set(widgetframe_self_contained = TRUE)
frameWidget(l2)

Author: Jonathan Tannen

How to follow an unusual election with Sixty-Six Wards

The Turnout Tracker

Sixty-Six Wards After Dark: The Election Needle

Who’s requesting mail-in ballots?

Covid-19 and PA’s first-ever mail-in ballot election.

Who is requesting ballots?

By Age

By Neighborhood

What this means for the election

Appendix: Division Maps