Mayoral supporters and their votes for Council At-Large

Jonathan Tannen, Sixty Six Wards
Seth Bluestein, Philadelphia City Commissioner

Today, Sixty Six Wards is partnering with Philadelphia City Commissioner Seth Bluestein to look at individual voter patterns in the 2023 Democratic Primary.

The 2023 Democratic Primary saw mayoral candidates running in distinct lanes (Sixty Six Wards’ recap, preview). Parker won handily–in retrospect–as Rhynhart and Gym came in distant second and third. In Council, all of the winners were endorsed by the Democratic City Committee, with Santamoor, McIllmurray and Almirón–in 6th, 7th, and 8th–also apparently splitting various non-machine coalitions.

In February, Sixty Six Wards teamed up with Commissioner Bluestein to look beyond Division-aggregated data at individual voter-level patterns, examining correlations and diversity within Divisions. We’re doing it again.

The Commissioner’s office has generated a dataset of anonymized, aggregated counts of candidate combinations at the Division level by vote mode–election day, mail-in, provisional–for the 2023 primary.
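To make the shape of that data concrete, here is a minimal sketch in base R. The column names (`candidate.x`, `candidate.y`, `Type`, `Division`, `n`) mirror the ones used in the analysis code in this post, but every row, candidate label, and vote-mode label is invented for illustration, and the real files may be laid out differently.

```r
# Invented pair counts: how many voters in a Division chose BOTH candidates.
pairs_toy <- data.frame(
  Division    = c("01-01", "01-01", "01-01", "01-01"),
  Type        = c("Election Day", "Election Day", "Mail", "Mail"),
  candidate.y = c("Mayor A", "Mayor B", "Mayor A", "Mayor B"),  # mayoral choice
  candidate.x = c("Council C", "Council C", "Council C", "Council C"),
  n           = c(30, 5, 12, 3)
)

# Invented topline totals: all votes cast for each mayoral candidate.
toplines_toy <- data.frame(
  candidate = c("Mayor A", "Mayor B"),
  votes     = c(60, 40)
)

# Sum pair counts across vote modes, then divide by the mayoral topline
# to get the percent of each mayoral candidate's voters who backed Council C.
pair_counts <- aggregate(n ~ candidate.y, data = pairs_toy, FUN = sum)
merged <- merge(pair_counts, toplines_toy, by.x = "candidate.y", by.y = "candidate")
merged$pct <- 100 * merged$n / merged$votes
print(merged)
```

Because only combination counts are stored, this kind of conditional percentage can be computed without any individual ballot ever being identified.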

Rather than a gigantic piece like last time, we will provide a series of short pieces with interesting insights from the data.

How did Mayoral supporters vote for Council?

Today, we examine how each mayoral candidate’s supporters voted for council. Among Divisions, there were strong correlations between Parker and Thomas performance, or Gym and Landau. Some of that may be ecological: supporters tend to live in the same places, but might not be the same people. Just how strong is the voter-level trend?

View code
library(dplyr)
library(ggplot2)
library(tidyr)

pairs <- readr::read_csv(
  "../../data/raw_election_data/pairs_new.csv",
  locale = readr::locale(encoding = "Latin1")
)
pairs_backup <- pairs
# pairs <- pairs_backup

toplines <- readr::read_csv(
  "../../data/raw_election_data/primary_2023_combinations_toplines.csv",
  locale = readr::locale(encoding = "Latin1")
)
voters <- readr::read_csv(
  "../../data/raw_election_data/primary_2023_combinations_voters.csv"
)
histogram <- readr::read_csv(
  "../../data/raw_election_data/primary_2023_combinations_histogram.csv",
  locale = readr::locale(encoding = "Latin1")
)

toplines$office_candidate <- factor(toplines$office_candidate)

levels_df <- data.frame(
  office_candidate=levels(toplines$office_candidate)
) |>
  separate(office_candidate, into=c("office_orig", "candidate_orig"), sep="::", remove=FALSE) |>
  mutate(
    office_orig = stringr::str_trim(office_orig),
    office = stringi::stri_trans_totitle(office_orig),
    office = gsub("(Dem|Rep)$", "(\\1)", office),
    candidate_orig = stringr::str_trim(candidate_orig),
    candidate = gsub("\\s*(DEM|REP) \\([0-9]+\\)$", "", candidate_orig),
    candidate = stringi::stri_trans_totitle(stringr::str_trim(candidate)),
    candidate = gsub("\\bMc([a-z])", "Mc\\U\\1", candidate, perl=TRUE),
    candidate = gsub("\\b([Ii]+)\\b", "\\U\\1", candidate, perl=TRUE),
  )

pairs <- pairs |> filter(candidate.x != "overvote")
pairs <- pairs |> filter(candidate.y != "overvote")

pairs$office.x <- factor(pairs$office.x)
pairs$office.y <- factor(pairs$office.y)
pairs$candidate.x <- factor(pairs$candidate.x)
pairs$candidate.y <- factor(pairs$candidate.y)

levels(pairs$office.x) <- levels_df$office[match(levels(pairs$office.x), levels_df$office_orig)]
levels(pairs$office.y) <- levels_df$office[match(levels(pairs$office.y), levels_df$office_orig)]
levels(pairs$candidate.x) <- levels_df$candidate[match(levels(pairs$candidate.x), levels_df$candidate_orig)]
levels(pairs$candidate.y) <- levels_df$candidate[match(levels(pairs$candidate.y), levels_df$candidate_orig)]

testthat::expect_false(any(is.na(pairs$office.x)))
testthat::expect_false(any(is.na(pairs$office.y)))
testthat::expect_false(any(is.na(pairs$candidate.x)))
testthat::expect_false(any(is.na(pairs$candidate.y)))

histogram$office_candidate <- factor(histogram$office_candidate, levels=levels(toplines$office_candidate))
histogram$office <- levels_df$office[histogram$office_candidate]
histogram$candidate <- levels_df$candidate[histogram$office_candidate]

toplines$office <- levels_df$office[toplines$office_candidate]
toplines$candidate <- levels_df$candidate[toplines$office_candidate]

Council Results by Mayor

View code
mayor_council <- pairs |>
  filter(
    office.y == "Mayor (Dem)", 
    office.x == "Council At-Large (Dem)"
  ) |> 
  rename(mayor = candidate.y, council=candidate.x) |>
  group_by(mayor, council, Type) |>
  summarise(counts = sum(n), .groups="drop")

# Add candidate, Type pairs that are missing
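mayor_council <- expand.grid(
      mayor=unique(mayor_council$mayor), 
      Type=unique(mayor_council$Type),
      council=unique(mayor_council$council),
      stringsAsFactors=FALSE  # keep Type as character so it matches the join keys
    ) |>
  left_join(mayor_council, by=c("mayor", "Type", "council")) |>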
  mutate(counts = replace_na(counts, 0)) |>
  left_join(
    toplines |> 
      filter(office == "Mayor (Dem)") |> 
      group_by(candidate, Type) |> 
      summarise(topline_votes=sum(votes), .groups="drop"), 
    by=c("mayor" = "candidate", "Type"="Type")
  ) |>
  mutate(pvote = counts / topline_votes)

get_order <- function(office){
  toplines |>
    filter(office == !!office) |>
    group_by(candidate) |>
    summarise(votes = sum(votes), .groups="drop") |>
    arrange(desc(votes)) |>
    with(candidate)
}

council_order <- get_order("Council At-Large (Dem)")
mayor_order <- get_order("Mayor (Dem)")

mayor_council <- mayor_council |>
  mutate(
    council = factor(council, levels=council_order),
    mayor = factor(mayor, levels=mayor_order)
  )

ggplot(
  mayor_council |> 
    group_by(mayor, council) |> 
    summarise(topline_votes=sum(topline_votes), counts=sum(counts)),
  aes(x=council, y=100*counts/topline_votes)
) + 
  geom_bar(stat="identity") +
  facet_wrap(~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Votes for Council At-Large by Mayoral Vote (Dem)",
    x=NULL,
    y="Percent of Mayoral voters who cast\nvote for Council candidate"
  ) 

Among Parker voters, the five eventual topline winners were on top. But the order was starkly different: third-place Rue Landau finished a distant fifth among this group. Conversely, Landau was the highest vote-getter among Gym voters, and among Rhynhart voters was on par with Thomas and Gilmore Richardson. There are a few other noticeable peaks in the plot above: Santamoor and Itzkowitz received 36% and 25% of votes from Rhynhart voters, and McIllmurray and Almirón a whopping 48% and 46% from Gym voters.

Amusingly, voters who wrote in choices for Mayor were vastly more likely to write in for council, too.

Those patterns didn’t change across vote mode, with Election Day voters and Mail-In voters showing similar preferences (the plots below show only the top four mayoral candidates, for sanity).

View code
ggplot(
  mayor_council |> 
    filter(mayor %in% c("Cherelle L Parker", "Rebecca Rhynhart", "Helen Gym", "Allan Domb")),
  aes(x=council, y=100*counts/topline_votes)
) + 
  geom_bar(stat="identity") +
  facet_grid(Type~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Votes for Council At-Large by Mayoral Vote and Vote Type",
    x=NULL,
    y="Percent of Mayoral voters who cast\nvote for Council candidate"
  ) 

Notice that the average heights of the bars are different between facets; some mayoral candidates’ supporters cast more votes for Council. The heights of the bars would sum to 500 if every voter used all five votes, but only 400 if they used on average four, 300 if three, etc.
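As a quick illustration of that arithmetic, the sketch below uses invented bar heights (not values from the dataset) for one hypothetical mayoral candidate. Dividing the summed heights by 100 gives the average number of At-Large votes cast per supporter.

```r
# Invented bar heights: percent of one mayoral candidate's voters who
# voted for each council candidate (illustrative numbers only).
bar_heights <- c(60, 55, 50, 45, 40, 30, 20)

# Each voter can cast up to five votes, so the heights can sum to at most 500.
votes_per_voter <- sum(bar_heights) / 100
print(votes_per_voter)
```

Here the heights sum to 300, so these hypothetical supporters used three of their five votes on average.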

View code
mayor_council |>
  group_by(mayor, Type, topline_votes) |>
  summarise(counts = sum(counts), .groups="drop") |>
  mutate(votes_per_voter = counts/topline_votes) |>
  ggplot(
    aes(x=mayor, y=votes_per_voter)
  ) +
  geom_bar(stat="identity") +
  facet_grid(Type~.) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=60, vjust=1, hjust=1)) +
  labs(
    title="Council At-Large votes per voter, by Mayoral choice",
    x=NULL,
    y="Average number of votes cast\nfor Council At-Large"
  )

Among Election Day voters, Gym and Rhynhart supporters cast an average of 3.4 and 3.3 votes for Council At-Large, respectively. Parker supporters cast 2.7, and the rest of voters about 2.5. That difference vanished among Mail-In voters: all candidates’ supporters cast between 3.6 and 4.1 At-Large votes. An obvious explanation is that mail-in voters, with the ballot in hand, have time to look up down-ballot candidates in a way that Election Day voters do not.

Suppose we recreated the above plot at the Division level: categorize Divisions by which mayoral candidate won, and calculate each council candidate’s percent (we’ll use total votes for Mayor as the denominator). The patterns are reasonably similar to the person-level results.

View code
div_mayor <- toplines |> filter(office == "Mayor (Dem)") |>
  group_by(Division, candidate) |> 
  summarise(votes = sum(votes)) |>
  group_by(Division) |>
  summarise(
    winner = candidate[which.max(votes)],
    pvote = votes[which.max(votes)] / sum(votes),
    mayoral_votes = sum(votes)
  ) |>
  mutate(mayor = factor(winner, levels=mayor_order))

div_council <- toplines |> filter(office == "Council At-Large (Dem)") |>
  group_by(Division, candidate) |> 
  summarise(votes = sum(votes)) |>
  mutate(council = factor(candidate, levels=council_order))

div_council |> 
  left_join(div_mayor, by="Division") |>
  group_by(mayor, council) |>
  summarise(
    votes = sum(votes),
    mayoral_votes = sum(mayoral_votes)
  ) |>
  ggplot(
    aes(x=council, y=100*votes / mayoral_votes)
  ) +
  geom_bar(stat="identity") +
  facet_wrap(~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Votes for Council At-Large by Division's Mayoral Winner",
    x=NULL,
    y="Percent (Votes / Total votes for office of Mayor)"
  ) 

There are some interesting, small differences between the plots: Rhynhart Divisions supported McIllmurray over Harrity by 3.2pp, but Rhynhart voters supported Harrity over McIllmurray by 2pp. This is due to ecological correlations: Rhynhart voters live disproportionately in Divisions where the other voters were more likely to support McIllmurray (presumably Gym voters). Similarly, Santamoor beat Harrity by 3.9pp in Gym Divisions, but only by 0.6pp among Gym voters.

Regression Analysis

A form of ecological effects could permeate even the voter-level results above. Maybe Rhynhart voters came from Divisions more likely to support Santamoor, but within each Division they were just like the others. We can solve this by fitting a regression, using Division fixed effects. This allows us to say, for example, that within a given Division, Cherelle Parker supporters were on average 3.2 percentage points more likely to support Jim Harrity, and less likely to support Amanda McIllmurray and Rue Landau (8.8 and 7.9pp, respectively).
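To see why the Division fixed effects matter, here is a toy sketch in base R. It assumes two invented Divisions and hypothetical candidates X (mayor) and Y (council). It is not the model fitted on the real data, only an illustration of the idea. Within each toy Division, X supporters and everyone else back Y at the same rate, but X supporters are concentrated in the Division where Y is popular.

```r
# Invented cell counts: voters by Division and mayoral choice, and how many
# of them also voted for council candidate Y.
toy <- data.frame(
  Division = c("D1", "D1", "D2", "D2"),
  mayor_x  = c(1, 0, 1, 0),      # 1 = voted for mayoral candidate X
  voters   = c(80, 20, 20, 80),  # voters in each cell
  votes    = c(64, 16, 4, 16)    # of those, votes for council candidate Y
)

# Expand each cell into a "voted for Y" row and a "did not" row,
# weighted by the number of voters each row represents.
long <- rbind(
  transform(toy, vote_for = 1, w = votes),
  transform(toy, vote_for = 0, w = voters - votes)
)

# Pooled comparison: ignores where voters live.
pooled <- lm(vote_for ~ mayor_x, weights = w, data = long)

# Division fixed effects: compares voters only within the same Division.
within <- lm(vote_for ~ Division + mayor_x, weights = w, data = long)

print(coef(pooled)[["mayor_x"]])
print(coef(within)[["mayor_x"]])
```

The pooled coefficient comes out around 0.36, suggesting X supporters favor Y, while the fixed-effects coefficient is essentially zero: in this toy setup the whole apparent relationship comes from geography.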

View code
mayor_council_div <- toplines |> 
    filter(office == "Mayor (Dem)") |>
    rename(voters=votes, mayor=candidate) |>
    select(Type, Division, mayor, voters) |>
    # Need this cross join in case there are zero votes for a council candidate.
    cross_join(
        data.frame(council=unique(mayor_council$council))
    ) |>
    left_join(
      pairs |> 
        filter(
          office.y == "Mayor (Dem)", 
          office.x == "Council At-Large (Dem)"
        ) |>
        rename(mayor = candidate.y, council=candidate.x, votes = n) |>
        select(mayor, council, Type, Division, votes),
      by=c("mayor", "council", "Type", "Division")
    ) |>
  mutate(votes=replace_na(votes, 0)) |>
  group_by(Division, mayor, council) |>
  summarise(
    voters=sum(voters),
    votes=sum(votes)
  )

RERUN <- FALSE

library(purrr)

if(RERUN){
  df_cand <- mayor_council_div |>
    ungroup() |>
    filter(council != "Write-In") |>
    # filter(council == !!council_candidate) |>
    cross_join(data.frame(vote_for=c(0,1))) |>
    mutate(votes = ifelse(vote_for==1, votes, voters-votes))
  fits <- list()
  
  for(council in unique(df_cand$council)){
    print(council)
    if(!council %in% names(fits)){
      fits[[council]] <- lm(
        vote_for ~ Division + mayor, 
        weights=votes,  # spell out `weights` rather than relying on partial matching of `w`
        data=df_cand |> filter(council==!!council)
      )
    }
  }  
  coefs_full <- lapply(fits, broom::tidy) |> bind_rows(.id="council")
  saveRDS(coefs_full, file="coefs_full.RDS")
} else {
  coefs_full <- readRDS("coefs_full.RDS")
}

mayor_coef <- coefs_full |> 
  filter(substr(term, 1,5) == 'mayor') |>
  mutate(mayor = gsub("^mayor", "", term))  |>
  select(council, mayor, estimate, std.error) |>
  bind_rows(
    data.frame(
      council = council_order,
      mayor = "Allan Domb", # Reference mayor
      estimate = 0,
      std.error = NA
    )
  ) |>
  left_join(
    toplines |> filter(office == "Mayor (Dem)") |>
      group_by(candidate) |>
      summarise(total_votes = sum(votes)),
    by=c("mayor"="candidate")
  ) |>
  group_by(council) |>
  mutate(weighted_coef = estimate - weighted.mean(estimate, w=total_votes))|>
  mutate(
    council = factor(council, levels=council_order),
    mayor = factor(mayor, levels=mayor_order)
  ) 


ggplot(
  mayor_coef,
  aes(x=council, y=weighted_coef)
) +
  geom_bar(stat='identity') +
  facet_wrap(~mayor) +
  theme_minimal() %+replace% 
  theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1, size=5)) +
  labs(
    title="Mayoral supporters' relative vote for council",
    subtitle="Coefficient of mayoral support on council votes, controlling for Division fixed effects",
    x=NULL,
    y="Coefficient of Mayoral Support on Council Vote"
  )

Within a Division, Gym supporters were vastly more likely to support Thomas, Gilmore Richardson, Landau, McIllmurray, and Almirón, and Rhynhart supporters more likely to support Santamoor, Itzkowitz, and then Gilmore Richardson, Landau, and Ahmad.

Parker voters were relatively similar to their broader Divisions, but less likely to support Landau, McIllmurray, and Almirón.

Supporters of all mayoral candidates from Domb on were much less likely to support the top four council candidates–Thomas, Gilmore Richardson, Landau, and Ahmad–than supporters of the top three mayoral candidates.

Does ballot position matter for City Council?

This May, Philadelphia will be voting for City Council. This includes five city-wide Democratic At Large positions. We don’t yet know exactly how many At Large candidates there will be, but in 2019 there were 28 names on the ballot.

In order to arrange those names on the ballot, we famously draw names from a coffee can.

In the past, I’ve demonstrated that our judicial elections are determined by the random luck of drawing a good ballot position: being in the first column nearly triples your votes, and is more important than a Democratic City Committee endorsement and Philadelphia Bar Association Recommendation combined. I even proposed an NBA-wheel style ballot procedure that would fix the problem.

I’ve wondered if the same effect exists for City Council. There are reasons to expect not: voters pay more attention to City Council races and candidates spend more money, so it’s less likely that a voter will just push a button in the first column. But with voters choosing up to five names out of a pool of around 28 candidates, it’s certainly plausible they’ll take shortcuts.

I tried this analysis in January 2019 but didn’t have quite enough data. This time around I’ve added in 2019’s 28 candidates, and can finally measure some effects.

In 2019, all three incumbents plus Isaiah Thomas won handily. The fifth winner was Katherine Gilmore Richardson with 6.8% of the vote. Following her were Justin DiBerardinis with 6.3%, then Adrián Rivera-Reyes, Eryn Santamoor, and Erika Almirón at 5.3, 5.2, and 5.1% respectively.

View code
library(tidyverse)
library(sf)

source("../../admin_scripts/util.R")


setwd("C:/Users/Jonathan Tannen/Dropbox/sixty_six/posts/council_ballot_position_23/")
df_major <- readRDS("../../data/processed_data/df_major_type_20220523.Rds")
ballot_position <- read.csv("../../data/processed_data/ballot_layout.csv")

Encoding(ballot_position$candidate) <- "latin1"
ballot_position$candidate <- gsub("\\s+", " ", ballot_position$candidate)

format_name <- function(x){
  x <- tolower(x)
  x <- gsub("(\\b)([a-z])", "\\1\\U\\2", x, perl=TRUE)
  x <- gsub("(á|ñ|ó)([A-Z]+)", "\\1\\L\\2", x, perl=TRUE)
  x <- gsub("\\s+", " ", x)
  x <- gsub("(^\\s)|(\\s$)", "", x)
  return(x)
}

council <- df_major %>% 
  filter(
    election_type == "primary",
      party == "DEMOCRATIC",
      office == "COUNCIL AT LARGE",
      year %in% c(2011, 2015, 2019)
  ) %>%
  mutate(year = as.integer(year))

council <- council %>% 
  left_join(ballot_position, by = c("year" = "year", "candidate" = "candidate"))

council$candidate <- factor(council$candidate)
levels(council$candidate) <- format_name(levels(council$candidate))
council <- council %>% filter(candidate != 'Write In')

council <- council %>%
  group_by(year) %>%
  mutate(ncand = length(unique(candidate)))

total_results <- council %>%
  group_by(candidate, year, row, column, ncand, incumbent) %>%
  summarise(votes = sum(votes)) %>%
  group_by(year) %>%
  mutate(
    pvote = votes/sum(votes),
    winner = rank(desc(votes)) <= 5
  )

YEAR <- 2019
ggplot(
  total_results %>% 
    filter(year == YEAR) %>% 
    mutate(
      lastname=format_name(gsub(".*\\s(\\S+)$", "\\1", candidate)),
      lastname=ifelse(lastname == "Jr",format_name(gsub(".*\\s(\\S+\\s\\S+)$", "\\1", candidate)),lastname),
    ) %>%
    arrange(votes),
  aes(y=row, x=column)
) +
  geom_tile(
    aes(fill=pvote*100, color=winner),
    size=2
  ) +
  geom_text(
    aes(
      label = ifelse(incumbent==1, "Incumbent", ""),
      x=column-0.45,
      y=row+0.45
    ),
    color="grey70",
    hjust=0, vjust=0
  ) +
  geom_text(
    aes(label = sprintf("%s\n%0.1f%%", lastname, 100*pvote)),
    color="black"
    # fontface="bold"
  ) +
  scale_y_reverse(NULL) +
  scale_x_continuous(NULL)+
  scale_fill_viridis_c(guide="none") +
  scale_color_manual(values=c(`FALSE`=rgb(0,0,0,0), `TRUE`="yellow"), guide="none") +
  expand_limits(x=3.5)+
  theme_sixtysix() %+replace% 
  theme(
    panel.grid.major=element_blank(),
    axis.text=element_blank()
  ) +
  ggtitle(
    paste(YEAR, "Council At Large Results"),
    "Democratic Primary, arranged by the ballot layout. Winners are outlined."
  )

Ballot position appears weaker than for judges: many candidates win from later columns. Incumbency is obviously the strongest factor.

But looking farther back, we see instances where ballot position appears to help. In 2015, Derek Green led the entire field as a challenger with the top position. And in 2011 Sherrie Cohen came in a close sixth place from the first column, and two more first column candidates were in the top nine.

View code
ggplot(
  total_results,
  aes(y = 100 * pvote, color = interaction(incumbent, column==1))
) + 
  geom_text(
    aes(label = candidate),
    x=0, 
    hjust=0
  ) +
  facet_grid(. ~ year) +
  theme_sixtysix() +
  scale_y_continuous(breaks = seq(0,20,2.5)) +
  geom_text(
    data = tribble(
      ~votes, ~candidate, ~incumbent, ~year, ~pvote, ~column,
      # 1e3, "Challenger", 0, 2011, -0.007, 0,
      7e3, "Incumbent", 1, 2011, 0.007, 0,
      4e3, "First Column", 0, 2011, 0.000, 1
    ),
    fontface="bold",
    x=0.45,
    aes(label = candidate),
    hjust = 0,
    vjust=0
  ) +
  scale_color_manual(
    values=c(
      '1.FALSE' = strong_blue, 
      '1.TRUE' = strong_blue, 
      '0.FALSE'= "black", 
      '0.TRUE'=strong_green
    ),
    guide = "none"
  ) +
  expand_limits(y=0) +
  labs(
    title="Incumbents Swept 2011 and 2019, but not 2015",
    y = "% of Vote"
  )

Let’s use regression to tease apart the effects. I’ll regress the percent of the vote received by a candidate on being in the first column and being in the first row, incumbency, and year fixed effects. The regression is simplistic, but since ballot position is randomized we don’t need anything more. (The substantive findings below are robust to more controls and to using log(votes).)

View code
ols_fit <- lm(
  100 * pvote ~ 
    as.character(year) +
    incumbent +
    (row == 1) +
    (column == 1) + 
    # (column == 1 & row == 1) +
    # (column == 1 & row != 1) +
    # (column == 2) +
    1,
  data = total_results #%>% filter(!incumbent)
)
# summary(ols_fit)

print_coef <- function(fit, coef){
  # Use the model passed in as `fit` rather than the global `ols_fit`
  val <- round(fit$coefficients[coef], 1)
  se <- summary(fit)$coefficients[,2][coef]
  # stars <- case_when(p<0.01 ~ " (p < 0.01)", p < 0.05 ~ " (p < 0.05)", TRUE ~ "")
  se_text <- paste0(" (",round(se, 1),")")
  prefix <- (if(val > 0) "+" else "")
  paste0(prefix, val, se_text)
}

tribble(
  ~Effect, ~"% Vote in pp (standard error)",
  "Baseline Votes 2019", "2.6",
  "Incumbency", ols_fit %>% print_coef('incumbent'),
  "First Column",  ols_fit %>% print_coef('column == 1TRUE'),
  "First Row",  ols_fit %>% print_coef('row == 1TRUE')
) %>% 
  knitr::kable("html") %>% 
  kableExtra::kable_styling(full_width = F)

Effect                 % Vote in pp (standard error)
Baseline Votes 2019    2.6
Incumbency             +6.4 (0.9)
First Column           +2.4 (0.9)
First Row              -0.3 (0.8)

Non-incumbent candidates in the second or later column started with an average 2.6% of the vote in 2019. Incumbents on average receive 6.4 percentage points more of the vote. Candidates in the first column receive on average 2.4 pp more. Being in the first row doesn’t appear to help.

So the first-column effect for City Council is smaller than for Common Pleas, but still nearly doubles a typical challenger’s votes. And in 2019 it would have been enough to put any of the close challengers (Gilmore Richardson, DiBerardinis, Santamoor, Almirón) over the top.
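The arithmetic behind those two claims can be checked in a few lines of R. This is a back-of-the-envelope sketch using only the table's coefficients and the 2019 vote shares quoted earlier. It assumes the first-column bump would simply be added to a candidate's share, ignoring that other candidates' shares would shift in response.

```r
# Coefficients as reported in the regression table (percentage points).
baseline_2019 <- 2.6
first_column  <- 2.4

# Ratio of a first-column challenger's expected share to the baseline share.
boost_ratio <- (baseline_2019 + first_column) / baseline_2019

# 2019 shares quoted earlier: fifth place versus the next four finishers.
fifth_place <- 6.8
challengers <- c("DiBerardinis" = 6.3, "Rivera-Reyes" = 5.3,
                 "Santamoor" = 5.2, "Almirón" = 5.1)
gap_to_fifth <- fifth_place - challengers

print(round(boost_ratio, 2))
print(gap_to_fifth < first_column)
```

The ratio is a bit over 1.9, and each runner-up's gap to fifth place is smaller than the estimated first-column effect.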